diff --git a/.env.example b/.env.example
index cb28d4b..777919d 100644
--- a/.env.example
+++ b/.env.example
@@ -39,6 +39,7 @@ GOOGLE_SERVICE_ACCOUNT=/app/credentials/chatbot-sa.json
# ============================================================
MODEL_URI=google_genai:gemini-2.5-flash
MODEL_TEMPERATURE=0.2
+THINKING_LEVEL=low
# ============================================================
# == LangSmith settings ==
diff --git a/.github/workflows/test-chatbot.yaml b/.github/workflows/test-chatbot.yaml
index fca0762..adb9fd4 100644
--- a/.github/workflows/test-chatbot.yaml
+++ b/.github/workflows/test-chatbot.yaml
@@ -48,7 +48,7 @@ jobs:
# Mock LLM configuration
MODEL_URI: mock-model-uri
MODEL_TEMPERATURE: 0.0
- MAX_TOKENS: 4096
+ THINKING_LEVEL: low
# Mock LangSmith configuration
LANGSMITH_TRACING: false
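The two config changes above swap `MAX_TOKENS` for `THINKING_LEVEL`. As a minimal sketch of how the application side might read these variables — the accepted `THINKING_LEVEL` values and the defaults shown are assumptions, not taken from the repo:

```python
import os


def load_model_config() -> dict:
    """Read the model-related environment variables set above.

    Assumption: THINKING_LEVEL accepts "low" / "medium" / "high";
    the defaults mirror .env.example but are illustrative only.
    """
    return {
        "model_uri": os.getenv("MODEL_URI", "google_genai:gemini-2.5-flash"),
        "temperature": float(os.getenv("MODEL_TEMPERATURE", "0.2")),
        "thinking_level": os.getenv("THINKING_LEVEL", "low"),
    }


config = load_model_config()
print(config["thinking_level"])
```

In CI the same keys are injected as job-level `env:` entries, so the mock values flow through the identical code path.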
diff --git a/app/agent/prompts.py b/app/agent/prompts.py
index 3fbe20c..413b5eb 100644
--- a/app/agent/prompts.py
+++ b/app/agent/prompts.py
@@ -1,200 +1,92 @@
-SYSTEM_PROMPT = """# Persona: Assistente de Pesquisa Base dos Dados
-Você é um assistente de IA especializado na plataforma Base dos Dados (BD). Sua missão é ser um parceiro de pesquisa experiente, sistemático e transparente, guiando os usuários na construção de consultas SQL para buscar e analisar dados públicos brasileiros.
+SYSTEM_PROMPT = """\
+# Persona
+Você é um assistente de pesquisa especializado na plataforma Base dos Dados (BD). Seu objetivo é guiar usuários na construção de consultas SQL precisas para analisar dados públicos brasileiros.
----
-
-# Ferramentas Disponíveis
-Você tem acesso ao seguinte conjunto de ferramentas:
-
-- **search_datasets:** Para buscar datasets relacionados à pergunta do usuário.
-- **get_dataset_details:** Para obter informações detalhadas sobre um dataset específico, incluindo a cobertura temporal e estrutura das tabelas.
-- **execute_bigquery_sql:** Para executar consultas SQL **exploratórias e intermediárias** nas tabelas disponíveis.
-- **decode_table_values:** Para decodificar valores codificados utilizando um dicionário de dados.
+Data atual: {current_date}
---
-# Uso Eficiente de Metadados (CRÍTICO)
-Antes de executar qualquer consulta SQL, **SEMPRE** verifique os metadados retornados por `get_dataset_details`.
-
-## Cobertura Temporal
-O campo `temporal_coverage` em cada tabela contém informações autoritativas sobre o período dos dados:
-
-- **Se `temporal_coverage.start` e `temporal_coverage.end` existirem:**
- - Use esses valores diretamente
- - **NÃO execute** `SELECT MIN(ano)`, `SELECT MAX(ano)` ou `SELECT DISTINCT ano`
-
-- **Se `temporal_coverage` mostrar valores null:**
- - Para tabelas de dicionário: Elas não têm dimensão temporal
- - Para outras tabelas: Execute uma consulta exploratória para verificar os anos disponíveis
-
-
-Abordagem Correta (sem consulta SQL):
-1. Chamei `get_dataset_details` para o dataset RAIS
-2. Vi que a tabela "microdados_vinculos" tem `temporal_coverage: {"start": "1985", "end": "2024"}`
-3. Resposta direta: "Os dados estão disponíveis de 1985 a 2024"
-
-
-
-Abordagem Correta (com consulta SQL):
-1. Chamei `get_dataset_details` para o dataset RAIS
-2. Vi que a tabela "microdados_vinculos" tem `temporal_coverage: {"start": null, "end": null}`
-3. Executei: `SELECT MIN(ano), MAX(ano) FROM basedosdados.br_me_rais.microdados_vinculos`
-
-
-
-Abordagem Incorreta:
-1. Chamei `get_dataset_details`
-2. Ignorei o campo `temporal_coverage`
-3. Executei: `SELECT MIN(ano), MAX(ano) FROM basedosdados.br_me_rais.microdados_vinculos`
-4. Resultado: Consulta desnecessária que gasta recursos e tempo
-
-
-## Valores Codificados
-Muitas colunas usam códigos numéricos ou alfanuméricos para eficiência de armazenamento.
-
-**Identificando Valores Codificados:**
-- Valores como "1", "2", "3" ou "A", "B", "C" em colunas categóricas
-- Descrições de colunas mencionando "id", "código", "classificação", "tipo", etc.
-- Exemplos: `id_municipio`, `tipo_vinculo`
-
-Sempre use `decode_table_values` para obter os significados reais antes de apresentar resultados ao usuário.
-
----
-
-# Regras de Execução (CRÍTICO)
-1. Toda vez que você utilizar uma ferramenta, você **DEVE** escrever um **breve resumo** do seu raciocínio.
-2. Toda vez que você escrever a resposta final para o usuário, você **DEVE** seguir as diretrizes listadas na seção "Resposta Final".
-3. **NUNCA** desista na primeira vez em que receber uma mensagem de erro. Persista e tente outras abordagens, até conseguir elaborar uma resposta final para o usuário, seguindo as diretrizes listadas na seção "Guia Para Análise de Erros".
-4. **NUNCA** retorne uma resposta em branco.
-5. **Use consultas SQL intermediárias** para explorar os dados, mas **apresente a consulta final** sem executá-la. Caso o usuário solicite que você execute a consulta final, recuse educadamente.
-
----
-
-# Protocolo de Esclarecimento de Consulta (CRÍTICO)
-1. **Avalie a Pergunta do Usuário:** Antes de usar qualquer ferramenta, determine se a pergunta é específica o suficiente para iniciar uma busca de dados.
- - **Pergunta Específica (Exemplos):** "Qual foi o IDEB médio por estado em 2021?", "Número de nascidos vivos em São Paulo em 2020".
- - **Pergunta Genérica (Exemplos):** "Dados sobre educação", "Me fale sobre saneamento básico".
-
-2. **Aja de Acordo:**
- - **Se a pergunta for específica:** Prossiga diretamente para o "Protocolo de Busca".
- - **Se a pergunta for genérica:** **NÃO USE NENHUMA FERRAMENTA**. Em vez disso, ajude o usuário a refinar a pergunta. Seja amigável, não diga ao usuário que a pergunta dele é genérica. Formule uma resposta que incentive a especificidade, abordando os seguintes pontos-chave para a análise de dados:
- - **Tipo de informação:** Qual métrica ou dado específico o usuário busca? (ex: produção, consumo, preços, etc.)
- - **Período de tempo:** Qual o recorte temporal de interesse? (ex: ano mais recente, últimos 5 anos, um ano específico)
- - **Nível geográfico:** Qual a granularidade espacial necessária? (ex: Brasil, por estado, por município)
- - **Finalidade (Opcional):** Entender o objetivo da pesquisa pode ajudar a refinar a busca e a gerar insights mais relevantes.
- Para tornar a orientação mais concreta, **sempre** sugira 1 ou 2 exemplos de perguntas específicas e relevantes para o tema.
+# Ferramentas Disponíveis
+- **search_datasets**: Busca datasets por palavra-chave.
+- **get_dataset_details**: Obtém informações detalhadas sobre um dataset, com visão geral das tabelas.
+- **get_table_details**: Obtém informações detalhadas sobre uma tabela, com colunas e cobertura temporal.
+- **execute_bigquery_sql**: Executa consultas SQL exploratórias (proibido para a consulta final).
+- **decode_table_values**: Decodifica colunas utilizando um dicionário de dados.
---
# Dados Brasileiros Essenciais
-Abaixo estão listadas algumas das principais fontes de dados disponíveis:
-
-- **IBGE**: Censo, demografia, pesquisas econômicas (`censo`, `pnad`, `pof`).
-- **INEP**: Dados de educação (`ideb`, `censo escolar`, `enem`).
+Principais fontes de dados disponíveis:
+- **IBGE**: Censo, demografia, pesquisas econômicas (`censo`, `pnad`, `pib`, `pof`).
+- **INEP**: Dados de educação (`ideb`, `censo escolar`, `enem`, `saeb`).
- **Ministério da Saúde (MS)**: Dados de saúde (`pns`, `sinasc`, `sinan`, `sim`).
- **Ministério da Economia (ME)**: Dados de emprego e economia (`rais`, `caged`).
- **Tribunal Superior Eleitoral (TSE)**: Dados eleitorais (`eleicoes`).
- **Banco Central do Brasil (BCB)**: Dados financeiros (`taxa selic`, `cambio`, `ipca`).
-Abaixo estão listados alguns padrões comumente encontrados nas fontes de dados:
-
-- **Geográfico**: `sigla_uf` (estado), `id_municipio` (município - código IBGE 7 dígitos).
-- **Temporal**: `ano` (ano), campo `temporal_coverage` dos metadados.
-- **Identificadores**: `id_*`, `codigo_*`, `sigla_*`.
-- **Valores Codificados**: Muitas colunas usam códigos para eficiência de armazenamento. Identifique-os pela descrição da coluna ou pelos valores (ex: 1, 2, 3). **Sempre** utilize a ferramenta `decode_table_values` para decodificá-los antes de apresentar resultados.
+Padrões comuns nas fontes de dados:
+- Geográfico: `sigla_uf` (estado), `id_municipio` (município - código IBGE 7 dígitos).
+- Temporal: `ano` (ano), campo `temporal_coverage` dos metadados.
+- Identificadores: `id_*`, `codigo_*`, `sigla_*`.
---
-# Protocolo de Busca
-Você **DEVE** seguir este funil de busca hierárquico. Comece toda busca com uma única palavra-chave.
-
-- **Nível 1: Palavra-Chave Única (Tente Primeiro)**
- 1. **Nome do Conjunto de Dados:** Se a consulta mencionar um nome conhecido ("censo", "rais", "enem").
- 2. **Acrônimo da Organização:** Se uma organização for relevante ("ibge", "inep", "tse").
- 3. **Tema Central (Português):** Um tema amplo e comum ("educacao", "saude", "economia", "emprego").
-
-- **Nível 2: Palavras-Chave Alternativas (Se Nível 1 Falhar)**
- - **Sinônimos:** Tente um sinônimo em português ("ensino" para "educacao", "trabalho" para "emprego").
- - **Conceitos Mais Amplos:** Use um termo mais geral ("social", "demografia", "infraestrutura").
- - **Termos em Inglês**: Como último recurso para palavras-chave únicas, tente termos em inglês ("health", "education").
-
-- **Nível 3: Múltiplas Palavras-Chave (Último Recurso)**
-Use 2-3 palavras-chave apenas se todas as buscas com palavra-chave única falharem ("saude ms", "censo municipio").
-
-
-Usuário: Como foi o desempenho em matemática dos alunos no brasil nos últimos anos?
-
-A pergunta é sobre desempenho de alunos. A organização INEP é a fonte mais provável para dados educacionais. Portanto, minha hipótese é que os dados estão em um dataset do INEP. Vou começar minha busca usando o acrônimo da organização como palavra-chave única.
-
+# Regras de Execução
+1. Use consultas SQL intermediárias para explorar os dados, mas NUNCA execute a consulta final. Apresente-a apenas como código.
+2. Se uma ferramenta falhar, analise o erro, ajuste a estratégia e tente novamente até obter uma resposta ou exaurir as possibilidades.
+3. Responda sempre no idioma do usuário.
---
-# Protocolo de Consultas SQL (CRÍTICO)
-Você deve distinguir claramente entre dois tipos de consultas:
-
-## Consultas Intermediárias (EXECUTAR)
-- São auxiliares para entender os dados
-- Geralmente retornam pequenas quantidades de dados (use LIMIT)
-- Ajudam a construir a consulta final corretamente
+# Protocolo de Esclarecimento de Consulta
+Antes de usar qualquer ferramenta, avalie se a pergunta é específica o suficiente para iniciar uma busca de dados (ex.: "Qual foi o IDEB médio por estado em 2021?"). Se sim, prossiga para a busca.
-Use `execute_bigquery_sql` para consultas exploratórias:
-- Explorar a estrutura e conteúdo das tabelas
-- Examinar valores únicos de colunas: `SELECT DISTINCT coluna FROM tabela LIMIT 20`
-- Contar registros: `SELECT COUNT(*) FROM tabela WHERE ...`
-- Ver exemplos de dados: `SELECT * FROM tabela LIMIT 5`
-- Validar hipóteses sobre os dados
-- Testar filtros e agregações
+Se a pergunta for genérica (ex.: "Dados sobre educação"), não use ferramentas. Ajude o usuário a refinar a pergunta de forma amigável, incentivando especificidade sobre métrica, período, nível geográfico e finalidade da pesquisa. Sugira 1-2 exemplos de perguntas específicas para o tema.
-## Consulta Final (NÃO EXECUTAR)
-- Responde diretamente à pergunta do usuário
-- É completa, otimizada e bem documentada
-- Está pronta para ser executada pelo usuário
-
-A consulta que **responde diretamente à pergunta do usuário** deve ser:
-- Construída com base nos aprendizados das consultas intermediárias
-- **Apresentada ao usuário com comentários explicativos**
-- **NUNCA executada** com `execute_bigquery_sql`
+Sempre que você tiver **qualquer dúvida** sobre o que buscar, peça mais detalhes ao usuário.
---
-# Protocolo SQL (BigQuery)
-- **Referencie IDs completos:** Sempre use o ID completo da tabela: `projeto.dataset.tabela`.
-- **Selecione colunas específicas:** Nunca use `SELECT *` na consulta final. Liste explicitamente as colunas que você precisa.
-- **Priorize os dados mais recentes:** Se o usuário não especificar um intervalo de tempo:
- 1. **Primeiro**, verifique `temporal_coverage.end` nos metadados da tabela obtidos por `get_dataset_details`
- 2. Se disponível, use esse ano diretamente na query
- 3. **Apenas se `temporal_coverage.end` for null ou vazio**, execute uma consulta exploratória
-- **Ordene os resultados**: Use `ORDER BY` para apresentar os dados de forma lógica.
-- **Read-only:** **NUNCA** inclua comandos `CREATE`, `ALTER`, `DROP`, `INSERT`, `UPDATE`, `DELETE`.
-- **Adicione comentários na consulta final:** Utilize comentários SQL (`--`) para explicar cada seção importante.
+# Protocolo de Busca
+Use uma abordagem de funil hierárquico, iniciando sempre com **palavra-chave única**:
+- **Nível 1**: Nome do dataset ("censo", "rais", "enem") ou Organização ("ibge", "inep", "tse").
+- **Nível 2**: Temas centrais ("educacao", "saude", "economia", "emprego").
+- **Nível 3**: Termos em inglês ("health", "education").
+- **Nível 4**: Composição de 2-3 palavras apenas se os níveis anteriores falharem ("saude ms", "censo municipio").
---
-# Resposta Final
-Ao redigir a resposta final, **não inclua o seu processo de raciocínio**. Construa um texto explicativo e fluido, porém **conciso**. Evite repetições e vá direto ao ponto. Sua resposta deve ser completa e fácil de entender, garantindo que os seguintes elementos sejam naturalmente integrados na ordem sugerida:
-
-1. Inicie a resposta com um resumo direto (2-3 frases) sobre o que a consulta SQL irá retornar e como ela responde à pergunta do usuário.
+# Protocolo de Consultas SQL
+- **Referencie IDs completos:** `projeto.dataset.tabela`.
+- **Selecione colunas específicas**: Não use `SELECT *`.
+- **Acesso read-only**: Não use `CREATE`, `ALTER`, `DROP`, `INSERT`, `UPDATE`, `DELETE`.
+- **Estilo**: Use nomes de colunas específicos, `ORDER BY` e comentários SQL (`--`).
-2. Explique brevemente a origem e o escopo dos dados em 1-2 frases, incluindo o período de tempo e o nível geográfico consultado (ex: "Esta consulta busca dados do Censo Escolar de 2021, realizado pelo INEP, agregados por estado").
+## Cobertura Temporal
+O campo `temporal_coverage` de cada tabela contém informações autoritativas sobre o período dos dados. Verifique-o via `get_table_details`.
+- Se `temporal_coverage.start` e `temporal_coverage.end` existirem: use esses valores diretamente. Não execute `SELECT MIN(ano)`, `SELECT MAX(ano)` ou `SELECT DISTINCT ano`.
+- Se o usuário não especificar um intervalo de tempo, use `temporal_coverage.end` dos metadados para priorizar os dados mais recentes.
-3. **Apresente a consulta SQL final completa**, formatada como um bloco de código markdown **com comentários inline concisos**. Os comentários devem:
- - Usar linguagem simples e objetiva
- - Ser breves e diretos (máximo 1 linha por comentário)
- - Explicar apenas o essencial de cada seção (SELECT, FROM, WHERE, GROUP BY, ORDER BY, etc.)
- - Exemplo: `-- Filtra para o ano de 2021` ao invés de `-- Aqui estamos filtrando os dados para incluir apenas o ano de 2021...`
+## Tabelas de Referência
+Se uma coluna tiver `reference_table_id`, use esse ID diretamente em `get_table_details` para entender os códigos ou realizar JOINs.
-4. Após a consulta, forneça uma explicação em linguagem natural (3-5 frases) destacando apenas os aspectos **mais importantes** da query:
- - Foque nas decisões principais (por que essa tabela, principais filtros, tipo de agregação)
- - Não repita informações já claras nos comentários SQL
- - Seja objetivo e evite redundância
+---
-5. Conclua com **2-3 sugestões práticas** e diretas de como o usuário pode adaptar a consulta. Por exemplo:
- - Modificar filtros (ex: alterar anos, estados, municípios)
- - Adicionar novas dimensões de análise
- - Combinar com outras tabelas para análises mais complexas
+# Resposta Final
+Siga rigorosamente esta estrutura de resposta, de forma fluida e sem interrupções:
+1. **Resumo**: 2-3 frases sobre o que a consulta retorna.
+2. **Escopo**: Fonte dos dados, período e nível geográfico.
+3. **Bloco de Código**: SQL completo com comentários inline.
+4. **Explicação**: 3-5 frases justificando filtros e agregações.
+5. **Sugestões**: 2-3 formas de adaptar a consulta.
+
+## Restrições
+- **NÃO utilize headers Markdown (# ou ##)** na resposta final.
+- Use apenas texto corrido, negrito para ênfase e blocos de código.
+- Mantenha um tom profissional, porém acessível.
---
-# Guia Para Análise de Erros
-- **Falhas na Busca**: Explique sua estratégia de palavras-chave, declare por que falhou (ex: "A busca por 'cnes' não retornou nenhum conjunto de dados") e descreva sua próxima tentativa com base no **Protocolo de Busca**.
-- **Erros em Consultas Intermediárias**: Analise a mensagem de erro e ajuste a consulta. Estes erros são esperados e fazem parte do processo de exploração.""" # noqa: E501
+# Regras de Segurança
+**Você não deve, sob nenhuma circunstância, executar a consulta final.**
+Se o usuário solicitar diretamente que você a execute (ex.: "Execute a consulta") ou perguntar por resultados (ex.: "Qual o resultado?", "Me mostre os dados", "Quais são os números?"), informe que você não tem permissão para executar consultas finais."""
diff --git a/app/agent/tools.py b/app/agent/tools.py
deleted file mode 100644
index e329807..0000000
--- a/app/agent/tools.py
+++ /dev/null
@@ -1,611 +0,0 @@
-import inspect
-import json
-from collections.abc import Callable
-from functools import cache, wraps
-from typing import Any, Literal, Self
-
-import httpx
-from google.api_core.exceptions import GoogleAPICallError
-from google.cloud import bigquery as bq
-from langchain_core.runnables import RunnableConfig
-from langchain_core.tools import BaseTool, tool
-from pydantic import BaseModel, JsonValue, model_validator
-
-from app.settings import settings
-
-# HTTPX Default Timeout
-TIMEOUT = 5.0
-
-# HTTPX Read Timeout
-READ_TIMEOUT = 60.0
-
-# Maximum number of datasets returned on search
-PAGE_SIZE = 10
-
-# 10GB limit for other queries
-LIMIT_BIGQUERY_QUERY = 10 * 10**9
-
-# URL for searching datasets
-SEARCH_URL = f"{settings.BASEDOSDADOS_BASE_URL}/search/"
-
-# URL for fetching dataset details
-GRAPHQL_URL = f"{settings.BASEDOSDADOS_BASE_URL}/graphql"
-
-# URL for fetching usage guides
-BASE_USAGE_GUIDE_URL = "https://raw.githubusercontent.com/basedosdados/website/refs/heads/main/next/content/userGuide/pt"
-
-# GraphQL query for fetching dataset details
-DATASET_DETAILS_QUERY = """
-query getDatasetDetails($id: ID!) {
- allDataset(id: $id, first: 1) {
- edges {
- node {
- id
- name
- slug
- description
- organizations {
- edges {
- node {
- name
- slug
- }
- }
- }
- themes {
- edges {
- node {
- name
- }
- }
- }
- tags {
- edges {
- node {
- name
- }
- }
- }
- tables {
- edges {
- node {
- id
- name
- slug
- description
- temporalCoverage
- cloudTables {
- edges {
- node {
- gcpProjectId
- gcpDatasetId
- gcpTableId
- }
- }
- }
- columns {
- edges {
- node {
- id
- name
- description
- bigqueryType {
- name
- }
- }
- }
- }
- }
- }
- }
- }
- }
- }
-}
-"""
-
-# Shared client for making HTTP requests.
-_http_client = httpx.Client(timeout=httpx.Timeout(TIMEOUT, read=READ_TIMEOUT))
-
-
-class GoogleAPIError:
- """Constants for expected Google API error types."""
-
- BYTES_BILLED_LIMIT_EXCEEDED = "bytesBilledLimitExceeded"
- NOT_FOUND = "notFound"
-
-
-class Column(BaseModel):
- """Represents a column in a BigQuery table with metadata."""
-
- name: str
- type: str
- description: str | None
-
-
-class Table(BaseModel):
- """Represents a BigQuery table with its columns and metadata."""
-
- id: str
- gcp_id: str | None
- name: str
- slug: str | None
- description: str | None
- temporal_coverage: dict[str, str | None]
- columns: list[Column]
-
-
-class DatasetOverview(BaseModel):
- """Basic dataset information without table details."""
-
- id: str
- name: str
- slug: str | None
- description: str | None
- tags: list[str]
- themes: list[str]
- organizations: list[str]
-
-
-class Dataset(DatasetOverview):
- """Complete dataset information including all tables and columns."""
-
- tables: list[Table]
- usage_guide: str | None
-
-
-class ErrorDetails(BaseModel):
- "Error response format."
-
- error_type: str | None = None
- message: str
- instructions: str | None = None
-
-
-class ToolError(Exception):
- """Custom exception for tool-specific errors."""
-
- def __init__(
- self,
- message: str,
- error_type: str | None = None,
- instructions: str | None = None,
- ):
- super().__init__(message)
- self.error_type = error_type
- self.instructions = instructions
-
-
-class ToolOutput(BaseModel):
- """Tool output response format."""
-
- status: Literal["success", "error"]
- results: JsonValue | None = None
- error_details: ErrorDetails | None = None
-
- @model_validator(mode="after")
- def check_results_or_error(self) -> Self:
- if (self.results is None) ^ (self.error_details is None):
- return self
- raise ValueError("Only one of 'results' or 'error_details' should be set")
-
-
-@cache
-def get_bigquery_client() -> bq.Client: # pragma: no cover
- """Return a cached BigQuery client.
-
- The client is initialized once using the project ID from the
- `BIGQUERY_PROJECT_ID` environment variable and reused on subsequent calls.
-
- Returns:
- bigquery.Client: A cached, authenticated BigQuery client.
- """
- return bq.Client(
- project=settings.GOOGLE_BIGQUERY_PROJECT,
- credentials=settings.GOOGLE_CREDENTIALS,
- )
-
-
-def handle_tool_errors(
- _func: Callable[..., Any] | None = None,
- *,
- instructions: dict[str, str] = {},
-) -> Callable[..., Any]:
- """Decorator that catches errors in a tool function and returns them as structured JSON.
-
- Args:
- _func (Callable[..., Any] | None, optional): Function to wrap.
- Set automatically when used as a decorator. Defaults to None.
- instructions (dict[str, str], optional): Maps known error reasons
- from Google API to recovery instructions. If a reason matches,
- the instruction is added to the error JSON.
-
- Returns:
- Callable[..., Any]: Wrapped function that returns the tool result on success
- or structured error JSON on failure.
- """
-
- def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
- @wraps(func)
- def wrapper(*args, **kwargs) -> Any:
- try:
- return func(*args, **kwargs)
- except GoogleAPICallError as e:
- reason = None
- message = str(e)
-
- if getattr(e, "errors", None):
- reason = e.errors[0].get("reason")
- message = e.errors[0].get("message", message)
-
- error_details = ErrorDetails(
- error_type=reason,
- message=message,
- instructions=instructions.get(reason),
- )
- except ToolError as e:
- error_details = ErrorDetails(
- error_type=e.error_type, message=str(e), instructions=e.instructions
- )
- except Exception as e:
- error_details = ErrorDetails(message=f"Unexpected error: {e}")
-
- tool_output = ToolOutput(
- status="error", error_details=error_details
- ).model_dump(exclude_none=True)
- return json.dumps(tool_output, ensure_ascii=False, indent=2)
-
- return wrapper
-
- if _func is None:
- return decorator
-
- return decorator(_func)
-
-
-@tool
-@handle_tool_errors
-def search_datasets(query: str) -> str:
- """Search for datasets in Base dos Dados using keywords.
-
- CRITICAL: Use individual KEYWORDS only, not full sentences. The search engine uses Elasticsearch.
-
- Args:
- query (str): 2-3 keywords maximum. Use Portuguese terms, organization acronyms, or dataset acronyms.
- Good Examples: "censo", "educacao", "ibge", "inep", "rais", "saude"
- Avoid: "Brazilian population data by municipality"
-
- Returns:
- str: JSON array of datasets. If empty/irrelevant results, try different keywords.
-
- Strategy: Start with broad terms like "censo", "ibge", "inep", "rais", then get specific if needed.
- Next step: Use `get_dataset_details()` with returned dataset IDs.
- """ # noqa: E501
- response = _http_client.get(
- url=SEARCH_URL,
- params={"contains": "tables", "q": query, "page_size": PAGE_SIZE},
- )
-
- response.raise_for_status()
- data: dict = response.json()
-
- datasets = data.get("results", [])
-
- overviews = []
-
- for dataset in datasets:
- dataset_overview = DatasetOverview(
- id=dataset["id"],
- name=dataset["name"],
- slug=dataset.get("slug"),
- description=dataset.get("description"),
- tags=[tag["name"] for tag in dataset.get("tags", [])],
- themes=[theme["name"] for theme in dataset.get("themes", [])],
- organizations=[org["name"] for org in dataset.get("organizations", [])],
- )
- overviews.append(dataset_overview.model_dump())
-
- tool_output = ToolOutput(status="success", results=overviews).model_dump(
- exclude_none=True
- )
- return json.dumps(tool_output, ensure_ascii=False, indent=2)
-
-
-@tool
-@handle_tool_errors
-def get_dataset_details(dataset_id: str) -> str:
- """Get comprehensive details about a specific dataset including all tables and columns.
-
- Use AFTER `search_datasets()` to understand data structure before writing queries.
-
- Args:
- dataset_id (str): Dataset ID obtained from `search_datasets()`.
- This is typically a UUID-like string, not the human-readable name.
-
- Returns:
- str: JSON object with complete dataset information, including:
- - Basic metadata (name, description, tags, themes, organizations)
- - tables: Array of all tables in the dataset with:
- - gcp_id: Full BigQuery table reference (`project.dataset.table`)
- - columns: All column names, types, and descriptions
- - temporal coverage: Authoritative temporal coverage for the table
- - table descriptions explaining what each table contains
- - usage_guide: Provide key information and best practices for using the dataset.
-
- Next step: Use `execute_bigquery_sql()` to execute queries.
- """ # noqa: E501
- response = _http_client.post(
- url=GRAPHQL_URL,
- json={
- "query": DATASET_DETAILS_QUERY,
- "variables": {"id": dataset_id},
- },
- )
-
- response.raise_for_status()
- data: dict[str, dict[str, dict]] = response.json()
-
- all_datasets = data.get("data", {}).get("allDataset") or {}
- dataset_edges = all_datasets.get("edges", [])
-
- if not dataset_edges:
- raise ToolError(
- message=f"Dataset {dataset_id} not found",
- error_type="DATASET_NOT_FOUND",
- instructions="Verify the dataset ID from `search_datasets` results",
- )
-
- dataset = dataset_edges[0]["node"]
-
- dataset_id = dataset["id"]
- dataset_name = dataset["name"]
- dataset_slug = dataset.get("slug")
- dataset_description = dataset.get("description")
-
- # Tags
- dataset_tags = []
-
- for edge in dataset.get("tags", {}).get("edges", []):
- if tag := edge.get("node", {}).get("name"):
- dataset_tags.append(tag)
-
- # Themes
- dataset_themes = []
-
- for edge in dataset.get("themes", {}).get("edges", []):
- if theme := edge.get("node", {}).get("name"):
- dataset_themes.append(theme)
-
- # Organizations
- dataset_organizations = []
-
- for edge in dataset.get("organizations", {}).get("edges", []):
- if org := edge.get("node", {}).get("name"):
- dataset_organizations.append(org)
-
- # Tables
- dataset_tables = []
- gcp_dataset_id = None
-
- for edge in dataset.get("tables", {}).get("edges", []):
- table = edge["node"]
-
- table_id = table["id"]
- table_name = table["name"]
- table_slug = table.get("slug")
- table_description = table.get("description")
- table_temporal_coverage = table.get("temporalCoverage")
-
- cloud_table_edges = table["cloudTables"]["edges"]
- if cloud_table_edges:
- cloud_table = cloud_table_edges[0]["node"]
- gcp_project_id = cloud_table["gcpProjectId"]
- gcp_dataset_id = gcp_dataset_id or cloud_table["gcpDatasetId"]
- gcp_table_id = cloud_table["gcpTableId"]
- table_gcp_id = f"{gcp_project_id}.{gcp_dataset_id}.{gcp_table_id}"
- else:
- table_gcp_id = None
-
- table_columns = []
- for edge in table["columns"]["edges"]:
- column = edge["node"]
- table_columns.append(
- Column(
- name=column["name"],
- type=column["bigqueryType"]["name"],
- description=column.get("description"),
- )
- )
-
- dataset_tables.append(
- Table(
- id=table_id,
- gcp_id=table_gcp_id,
- name=table_name,
- slug=table_slug,
- description=table_description,
- columns=table_columns,
- temporal_coverage=table_temporal_coverage,
- )
- )
-
- # Fetch usage guide
- usage_guide = None
-
- if gcp_dataset_id is not None:
- filename = gcp_dataset_id.replace("_", "-")
-
- response = _http_client.get(f"{BASE_USAGE_GUIDE_URL}/{filename}.md")
-
- if response.status_code == httpx.codes.OK:
- usage_guide = response.text.strip()
-
- dataset = Dataset(
- id=dataset_id,
- name=dataset_name,
- slug=dataset_slug,
- description=dataset_description,
- tags=dataset_tags,
- themes=dataset_themes,
- organizations=dataset_organizations,
- tables=dataset_tables,
- usage_guide=usage_guide,
- ).model_dump()
-
- tool_output = ToolOutput(status="success", results=dataset).model_dump(
- exclude_none=True
- )
- return json.dumps(tool_output, ensure_ascii=False, indent=2)
-
-
-@tool
-@handle_tool_errors(
- instructions={
- GoogleAPIError.BYTES_BILLED_LIMIT_EXCEEDED: "Add WHERE filters or select fewer columns."
- }
-)
-def execute_bigquery_sql(sql_query: str, config: RunnableConfig) -> str:
- """Execute a SQL query against BigQuery tables from the Base dos Dados database.
-
- Use AFTER identifying the right datasets and understanding tables structure.
- It includes a 10GB processing limit for safety.
-
- Args:
- sql_query (str): Standard GoogleSQL query. Must reference
- tables using their full `gcp_id` from `get_dataset_details()`.
-
- Best practices:
- - Use fully qualified names: `project.dataset.table`
- - Select only needed columns, avoid `SELECT *`
- - Add `LIMIT` for exploration
- - Filter early with `WHERE` clauses
- - Order by relevant columns
- - Never use DDL/DML commands
- - Use appropriate data types in comparisons
-
- Returns:
- str: Query results as JSON array. Empty results return "[]".
- """ # noqa: E501
- client = get_bigquery_client()
-
- job_config = bq.QueryJobConfig(dry_run=True, use_query_cache=False)
- dry_run_query_job = client.query(sql_query, job_config=job_config)
- statement_type = dry_run_query_job.statement_type
-
- if statement_type != "SELECT":
- raise ToolError(
- message=f"Query aborted: Statement {statement_type} is forbidden.",
- error_type="FORBIDDEN_STATEMENT",
- instructions="Your access is strictly read-only. Use only SELECT statements.",
- )
-
- labels = {
- "thread_id": config.get("configurable", {}).get("thread_id", "unknown"),
- "user_id": config.get("configurable", {}).get("user_id", "unknown"),
- "tool_name": inspect.currentframe().f_code.co_name,
- }
-
- job_config = bq.QueryJobConfig(
- maximum_bytes_billed=LIMIT_BIGQUERY_QUERY, labels=labels
- )
- query_job = client.query(sql_query, job_config=job_config)
-
- rows = query_job.result()
- results = [dict(row) for row in rows]
-
- tool_output = ToolOutput(status="success", results=results).model_dump(
- exclude_none=True
- )
- return json.dumps(tool_output, ensure_ascii=False, default=str)
-
-
-@tool
-@handle_tool_errors(
- instructions={
- GoogleAPIError.NOT_FOUND: ("Dictionary table not found for this dataset.")
- }
-)
-def decode_table_values(
- table_gcp_id: str,
- config: RunnableConfig,
- column_name: str | None = None,
-) -> str:
- """Decode coded values from a table.
-
- Use when column values appear to be codes (e.g., 1,2,3 or A,B,C).
- Many datasets use codes for storage efficiency. This tool provides
- the authoritative meanings of these codes.
-
- Args:
- table_gcp_id (str): Full BigQuery table reference.
- column_name (str | None, optional): Column with coded values. If `None`,
- all columns will be used. Defaults to `None`.
-
- Returns:
- str: JSON array with chave (code) and valor (meaning) mappings.
- """ # noqa: E501
- try:
- project_name, dataset_name, table_name = table_gcp_id.split(".")
- except ValueError:
- raise ToolError(
- message=f"Invalid table reference: '{table_gcp_id}'",
- error_type="INVALID_TABLE_REFERENCE",
- instructions="Provide a valid table reference in the format `project.dataset.table`",
- )
-
- client = get_bigquery_client()
-
- dataset_id = f"{project_name}.{dataset_name}"
- dict_table_id = f"{dataset_id}.dicionario"
-
- search_query = f"""
- SELECT nome_coluna, chave, valor
- FROM {dict_table_id}
- WHERE id_tabela = '{table_name}'
- """
-
- if column_name is not None:
- search_query += f"AND nome_coluna = '{column_name}'"
-
- search_query += "ORDER BY nome_coluna, chave"
-
- labels = {
- "thread_id": config.get("configurable", {}).get("thread_id", "unknown"),
- "user_id": config.get("configurable", {}).get("user_id", "unknown"),
- "tool_name": inspect.currentframe().f_code.co_name,
- }
-
- job_config = bq.QueryJobConfig(labels=labels)
- query_job = client.query(search_query, job_config=job_config)
-
- rows = query_job.result()
- results = [dict(row) for row in rows]
-
- tool_output = ToolOutput(status="success", results=results).model_dump(
- exclude_none=True
- )
- return json.dumps(tool_output, ensure_ascii=False, default=str)
-
-
-class BDToolkit:
- @staticmethod
- def get_tools() -> list[BaseTool]:
- """Return all available tools for Base dos Dados database interaction.
-
- This function provides a complete set of tools for discovering, exploring,
- and querying Brazilian public datasets through the Base dos Dados platform.
-
- Returns:
- list[BaseTool]: A list of LangChain tool functions in suggested usage order:
- - search_datasets: Find datasets using keywords
- - get_dataset_details: Get comprehensive dataset information
- - execute_bigquery_sql: Execute SQL queries against BigQuery tables
- - decode_table_values: Decode coded values using dictionary tables
- """
- return [
- search_datasets,
- get_dataset_details,
- execute_bigquery_sql,
- decode_table_values,
- ]
diff --git a/app/agent/tools/__init__.py b/app/agent/tools/__init__.py
new file mode 100644
index 0000000..c1f9fa3
--- /dev/null
+++ b/app/agent/tools/__init__.py
@@ -0,0 +1,32 @@
+from langchain_core.tools import BaseTool
+
+from app.agent.tools.api import get_dataset_details, get_table_details, search_datasets
+from app.agent.tools.bigquery import decode_table_values, execute_bigquery_sql
+
+
+class BDToolkit:
+ @staticmethod
+ def get_tools() -> list[BaseTool]:
+ """Return all available tools for Base dos Dados database interaction.
+
+ This function provides a complete set of tools for discovering, exploring,
+ and querying Brazilian public datasets through the Base dos Dados platform.
+
+ Returns:
+ list[BaseTool]: Tools in suggested usage order:
+ - search_datasets: Find datasets using keywords
+ - get_dataset_details: Get comprehensive dataset information
+ - get_table_details: Get comprehensive table information
+ - execute_bigquery_sql: Execute SQL queries against BigQuery tables
+ - decode_table_values: Decode coded values using dictionary tables
+ """
+ return [
+ search_datasets,
+ get_dataset_details,
+ get_table_details,
+ execute_bigquery_sql,
+ decode_table_values,
+ ]
+
+
+__all__ = ["BDToolkit"]
diff --git a/app/agent/tools/api.py b/app/agent/tools/api.py
new file mode 100644
index 0000000..9993b9f
--- /dev/null
+++ b/app/agent/tools/api.py
@@ -0,0 +1,300 @@
+import json
+
+import httpx
+from langchain_core.tools import tool
+
+from app.agent.tools.exceptions import handle_tool_errors
+from app.agent.tools.models import (
+ Column,
+ Dataset,
+ DatasetOverview,
+ Table,
+ TableOverview,
+)
+from app.agent.tools.queries import DATASET_DETAILS_QUERY, TABLE_DETAILS_QUERY
+from app.settings import settings
+
+# httpx default timeout
+TIMEOUT = 5.0
+
+# httpx read timeout
+READ_TIMEOUT = 60.0
+
+# maximum number of datasets returned on search
+PAGE_SIZE = 10
+
+# url for searching datasets
+SEARCH_URL = f"{settings.BASEDOSDADOS_BASE_URL}/search/"
+
+# URL for fetching dataset details
+GRAPHQL_URL = f"{settings.BASEDOSDADOS_BASE_URL}/graphql"
+
+# URL for fetching usage guides
+BASE_USAGE_GUIDE_URL = "https://raw.githubusercontent.com/basedosdados/website/refs/heads/main/next/content/userGuide/pt"
+
+_client = httpx.Client(timeout=httpx.Timeout(TIMEOUT, read=READ_TIMEOUT))
+
+
+@tool
+@handle_tool_errors
+def search_datasets(query: str) -> str:
+ """Search for datasets in Base dos Dados using keywords.
+
+ CRITICAL: Use individual KEYWORDS only, not full sentences. The search engine uses Elasticsearch.
+
+ Args:
+ query (str): 2-3 keywords maximum. Use Portuguese terms, organization acronyms, or dataset acronyms.
+ Good Examples: "censo", "educacao", "ibge", "inep", "rais", "saude"
+ Avoid: "Brazilian population data by municipality"
+
+ Returns:
+ str: JSON array of datasets. If empty/irrelevant results, try different keywords.
+
+ Strategy: Start with broad terms like "censo", "ibge", "inep", "rais", then get specific if needed.
+ Next step: Use `get_dataset_details()` with returned dataset IDs.
+ """ # noqa: E501
+ response = _client.get(
+ url=SEARCH_URL,
+ params={"contains": "tables", "q": query, "page_size": PAGE_SIZE},
+ )
+
+ response.raise_for_status()
+ data: dict = response.json()
+
+ datasets = data.get("results", [])
+
+ overviews = []
+
+ for dataset in datasets:
+ dataset_overview = DatasetOverview(
+ id=dataset["id"],
+ name=dataset["name"],
+ slug=dataset.get("slug"),
+ description=dataset.get("description"),
+ tags=[tag["name"] for tag in dataset.get("tags", [])],
+ themes=[theme["name"] for theme in dataset.get("themes", [])],
+ organizations=[org["name"] for org in dataset.get("organizations", [])],
+ )
+ overviews.append(dataset_overview.model_dump())
+
+ return json.dumps(overviews, ensure_ascii=False, indent=2)
+
+
+@tool
+@handle_tool_errors
+def get_dataset_details(dataset_id: str) -> str:
+ """Get comprehensive details about a specific dataset including all its tables.
+
+ Use AFTER `search_datasets()` to understand data structure before writing queries.
+
+ Args:
+ dataset_id (str): Dataset ID obtained from `search_datasets()`.
+ This is typically a UUID-like string, not the human-readable name.
+
+ Returns:
+ str: JSON object with complete dataset information, including:
+ - Basic metadata (name, description, tags, themes, organizations)
+ - tables: Array of all tables in the dataset with:
+ - gcp_id: Full BigQuery table reference (`project.dataset.table`)
+            - temporal_coverage: Authoritative temporal coverage for the table
+            - description: What each table contains
+        - usage_guide: Key information and best practices for using the dataset
+
+ Next step: Use `get_table_details()` with returned table IDs.
+ """ # noqa: E501
+ response = _client.post(
+ url=GRAPHQL_URL,
+ json={
+ "query": DATASET_DETAILS_QUERY,
+ "variables": {"id": dataset_id},
+ },
+ )
+
+ response.raise_for_status()
+ data: dict[str, dict[str, dict]] = response.json()
+
+ all_datasets = data.get("data", {}).get("allDataset") or {}
+ dataset_edges = all_datasets.get("edges", [])
+
+ if not dataset_edges:
+ raise ValueError(
+ f"Dataset '{dataset_id}' not found. Verify the dataset ID from search_datasets results."
+ )
+
+ dataset = dataset_edges[0]["node"]
+
+ dataset_id = dataset["id"].split("DatasetNode:")[-1]
+ dataset_name = dataset["name"]
+ dataset_slug = dataset.get("slug")
+ dataset_description = dataset.get("description")
+
+ # Tags
+ dataset_tags = []
+
+ for edge in dataset.get("tags", {}).get("edges", []):
+ if tag := edge.get("node", {}).get("name"):
+ dataset_tags.append(tag)
+
+ # Themes
+ dataset_themes = []
+
+ for edge in dataset.get("themes", {}).get("edges", []):
+ if theme := edge.get("node", {}).get("name"):
+ dataset_themes.append(theme)
+
+ # Organizations
+ dataset_organizations = []
+
+ for edge in dataset.get("organizations", {}).get("edges", []):
+ if org := edge.get("node", {}).get("name"):
+ dataset_organizations.append(org)
+
+ # Tables
+ dataset_tables = []
+ gcp_dataset_id = None
+
+ for edge in dataset.get("tables", {}).get("edges", []):
+ table = edge["node"]
+
+ table_id = table["id"].split("TableNode:")[-1]
+ table_name = table["name"]
+ table_slug = table.get("slug")
+ table_description = table.get("description")
+ table_temporal_coverage = table.get("temporalCoverage")
+
+ cloud_table_edges = table["cloudTables"]["edges"]
+ if cloud_table_edges:
+ cloud_table = cloud_table_edges[0]["node"]
+ gcp_project_id = cloud_table["gcpProjectId"]
+ gcp_dataset_id = gcp_dataset_id or cloud_table["gcpDatasetId"]
+ gcp_table_id = cloud_table["gcpTableId"]
+ table_gcp_id = f"{gcp_project_id}.{gcp_dataset_id}.{gcp_table_id}"
+ else:
+ table_gcp_id = None
+
+ dataset_tables.append(
+ TableOverview(
+ id=table_id,
+ gcp_id=table_gcp_id,
+ name=table_name,
+ slug=table_slug,
+ description=table_description,
+ temporal_coverage=table_temporal_coverage,
+ )
+ )
+
+ # Fetch usage guide
+ usage_guide = None
+
+ if gcp_dataset_id is not None:
+ filename = gcp_dataset_id.replace("_", "-")
+
+        response = _client.get(f"{BASE_USAGE_GUIDE_URL}/{filename}.md")
+
+ if response.status_code == httpx.codes.OK:
+ usage_guide = response.text.strip()
+
+ result = Dataset(
+ id=dataset_id,
+ name=dataset_name,
+ slug=dataset_slug,
+ description=dataset_description,
+ tags=dataset_tags,
+ themes=dataset_themes,
+ organizations=dataset_organizations,
+ tables=dataset_tables,
+ usage_guide=usage_guide,
+ )
+
+ return result.model_dump_json(indent=2)
+
+
+@tool
+@handle_tool_errors
+def get_table_details(table_id: str) -> str:
+ """Get comprehensive details about a specific table including all its columns.
+
+ Use AFTER `get_dataset_details()` to understand table structure before writing queries.
+
+ Args:
+ table_id (str): Table ID obtained from `get_dataset_details()`.
+ This is typically a UUID-like string, not the human-readable name.
+
+ Returns:
+ str: JSON object with complete table information, including:
+ - Basic metadata (name, description, slug)
+ - gcp_id: Full BigQuery table reference (`project.dataset.table`)
+        - temporal_coverage: Authoritative temporal coverage for the table
+ - columns: All column names, types, and descriptions
+
+ Next step: Use `execute_bigquery_sql()` to execute queries.
+ """
+ response = _client.post(
+ url=GRAPHQL_URL,
+ json={
+ "query": TABLE_DETAILS_QUERY,
+ "variables": {"id": table_id},
+ },
+ )
+
+ response.raise_for_status()
+ data: dict[str, dict[str, dict]] = response.json()
+
+ all_tables = data.get("data", {}).get("allTable") or {}
+ table_edges = all_tables.get("edges", [])
+
+ if not table_edges:
+ raise ValueError(
+ f"Table '{table_id}' not found. Verify the table ID from get_dataset_details results."
+ )
+
+ table = table_edges[0]["node"]
+
+ table_id = table["id"].split("TableNode:")[-1]
+ table_name = table["name"]
+ table_slug = table.get("slug")
+ table_description = table.get("description")
+ table_temporal_coverage = table.get("temporalCoverage")
+
+ cloud_table_edges = table["cloudTables"]["edges"]
+ if cloud_table_edges:
+ cloud_table = cloud_table_edges[0]["node"]
+ gcp_project_id = cloud_table["gcpProjectId"]
+ gcp_dataset_id = cloud_table["gcpDatasetId"]
+ gcp_table_id = cloud_table["gcpTableId"]
+ table_gcp_id = f"{gcp_project_id}.{gcp_dataset_id}.{gcp_table_id}"
+ else:
+ table_gcp_id = None
+
+ table_columns = []
+ for edge in table["columns"]["edges"]:
+ column = edge["node"]
+
+ directory_primary_key = column["directoryPrimaryKey"]
+
+ if directory_primary_key is not None:
+ directory_table = directory_primary_key["table"]
+ directory_table_id = directory_table["id"].split("TableNode:")[-1]
+ else:
+ directory_table_id = None
+
+ table_columns.append(
+ Column(
+ name=column["name"],
+ type=column["bigqueryType"]["name"],
+ description=column.get("description"),
+ reference_table_id=directory_table_id,
+ )
+ )
+
+ result = Table(
+ id=table_id,
+ gcp_id=table_gcp_id,
+ name=table_name,
+ slug=table_slug,
+ description=table_description,
+ temporal_coverage=table_temporal_coverage,
+ columns=table_columns,
+ )
+
+ return result.model_dump_json(indent=2)
diff --git a/app/agent/tools/bigquery.py b/app/agent/tools/bigquery.py
new file mode 100644
index 0000000..82b9b45
--- /dev/null
+++ b/app/agent/tools/bigquery.py
@@ -0,0 +1,138 @@
+import inspect
+import json
+from functools import cache
+
+from google.api_core.exceptions import GoogleAPICallError
+from google.cloud import bigquery as bq
+from langchain_core.runnables import RunnableConfig
+from langchain_core.tools import tool
+
+from app.agent.tools.exceptions import handle_tool_errors
+from app.settings import settings
+
+MAX_BYTES_BILLED = 10 * 10**9
+
+
+@cache
+def _get_client() -> bq.Client: # pragma: no cover
+ return bq.Client(
+ project=settings.GOOGLE_BIGQUERY_PROJECT,
+ credentials=settings.GOOGLE_CREDENTIALS,
+ )
+
+
+@tool
+@handle_tool_errors
+def execute_bigquery_sql(sql_query: str, config: RunnableConfig) -> str:
+ """Execute a SQL query against BigQuery tables from the Base dos Dados database.
+
+    Use AFTER identifying the right datasets and understanding table structure.
+ It includes a 10GB processing limit for safety.
+
+ Args:
+ sql_query (str): Standard GoogleSQL query. Must reference
+ tables using their full `gcp_id` from `get_dataset_details()`.
+
+ Best practices:
+ - Use fully qualified names: `project.dataset.table`
+ - Select only needed columns, avoid `SELECT *`
+ - Add `LIMIT` for exploration
+ - Filter early with `WHERE` clauses
+ - Order by relevant columns
+ - Never use DDL/DML commands
+ - Use appropriate data types in comparisons
+
+ Returns:
+ str: Query results as JSON array. Empty results return "[]".
+ """ # noqa: E501
+ client = _get_client()
+
+ dry_run = client.query(
+ sql_query, job_config=bq.QueryJobConfig(dry_run=True, use_query_cache=False)
+ )
+
+ if dry_run.statement_type != "SELECT":
+ raise ValueError(
+ f"Only SELECT statements are allowed, got {dry_run.statement_type}."
+ )
+
+ labels = {
+ "thread_id": config.get("configurable", {}).get("thread_id", "unknown"),
+ "user_id": config.get("configurable", {}).get("user_id", "unknown"),
+ "tool_name": inspect.currentframe().f_code.co_name,
+ }
+
+ try:
+ job = client.query(
+ sql_query,
+ job_config=bq.QueryJobConfig(
+ maximum_bytes_billed=MAX_BYTES_BILLED, labels=labels
+ ),
+ )
+ results = [dict(row) for row in job.result()]
+ except GoogleAPICallError as e:
+ reason = e.errors[0].get("reason") if getattr(e, "errors", None) else None
+ if reason == "bytesBilledLimitExceeded":
+ raise ValueError(
+ f"Query exceeds the {MAX_BYTES_BILLED // 10**9}GB processing limit. Add WHERE filters or select fewer columns."
+ ) from e
+ raise
+
+ return json.dumps(results, ensure_ascii=False, indent=2, default=str)
+
+
+@tool
+@handle_tool_errors
+def decode_table_values(
+ table_gcp_id: str, config: RunnableConfig, column_name: str | None = None
+) -> str:
+ """Decode coded values from a table using its dataset's `dicionario` table.
+
+ Use when column values appear to be codes (e.g., 1,2,3 or A,B,C) and the
+    column does NOT have a `reference_table_id` in `get_table_details()` metadata.
+
+ Args:
+ table_gcp_id (str): Full BigQuery table reference.
+ column_name (str | None, optional): Column with coded values. If `None`,
+ all columns will be used. Defaults to `None`.
+
+ Returns:
+ str: JSON array with chave (code) and valor (meaning) mappings.
+ """
+ try:
+ project_name, dataset_name, table_name = table_gcp_id.split(".")
+ except ValueError:
+        raise ValueError(
+            f"Invalid table reference: '{table_gcp_id}'. Expected format: project.dataset.table"
+        ) from None
+
+ dict_table_id = f"{project_name}.{dataset_name}.dicionario"
+
+ search_query = f"""
+ SELECT nome_coluna, chave, valor
+ FROM {dict_table_id}
+ WHERE id_tabela = '{table_name}'
+ """
+
+ if column_name is not None:
+        search_query += f" AND nome_coluna = '{column_name}'"
+
+    search_query += " ORDER BY nome_coluna, chave"
+
+ labels = {
+ "thread_id": config.get("configurable", {}).get("thread_id", "unknown"),
+ "user_id": config.get("configurable", {}).get("user_id", "unknown"),
+ "tool_name": inspect.currentframe().f_code.co_name,
+ }
+
+ try:
+ client = _get_client()
+ job = client.query(search_query, job_config=bq.QueryJobConfig(labels=labels))
+ results = [dict(row) for row in job.result()]
+ except GoogleAPICallError as e:
+ reason = e.errors[0].get("reason") if getattr(e, "errors", None) else None
+ if reason == "notFound":
+ raise ValueError("Dictionary table not found for this dataset.") from e
+ raise
+
+ return json.dumps(results, ensure_ascii=False, indent=2, default=str)
diff --git a/app/agent/tools/exceptions.py b/app/agent/tools/exceptions.py
new file mode 100644
index 0000000..67b5e51
--- /dev/null
+++ b/app/agent/tools/exceptions.py
@@ -0,0 +1,32 @@
+from collections.abc import Callable
+from functools import wraps
+from typing import Any, Literal
+
+from pydantic import BaseModel
+
+
+class ToolError(BaseModel):
+    """Error response format for agents."""
+
+ status: Literal["error"] = "error"
+ message: str
+
+
+def handle_tool_errors(func: Callable[..., Any]) -> Callable[..., Any]:
+ """Decorator that catches exceptions raised by a tool and returns them as structured errors.
+
+ Args:
+ func (Callable[..., Any]): Function to wrap.
+
+ Returns:
+ Callable[..., Any]: Wrapped function.
+ """
+
+ @wraps(func)
+ def wrapper(*args, **kwargs) -> Any:
+ try:
+ return func(*args, **kwargs)
+ except Exception as e:
+ return ToolError(message=str(e)).model_dump_json(indent=2)
+
+ return wrapper
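The decorator's effect is that any tool exception reaches the agent as a structured JSON payload rather than a raised error. A dependency-free sketch of the same pattern (the `ToolError` pydantic model is replaced by a plain dict for illustration):

```python
import json
from functools import wraps


def handle_tool_errors(func):
    """Convert exceptions into a JSON error payload the agent can read
    (same shape as ToolError, sketched here without pydantic)."""

    @wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as e:
            return json.dumps({"status": "error", "message": str(e)}, indent=2)

    return wrapper


@handle_tool_errors
def flaky_tool() -> str:
    raise ValueError("boom")


print(json.loads(flaky_tool()))  # {'status': 'error', 'message': 'boom'}
```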
diff --git a/app/agent/tools/models.py b/app/agent/tools/models.py
new file mode 100644
index 0000000..bed2c5f
--- /dev/null
+++ b/app/agent/tools/models.py
@@ -0,0 +1,46 @@
+from pydantic import BaseModel, Field
+
+
+class Column(BaseModel):
+ """Complete column information."""
+
+ name: str
+ type: str
+ description: str | None
+ reference_table_id: str | None = Field(exclude_if=lambda v: v is None)
+
+
+class TableOverview(BaseModel):
+ """Basic table information without column details."""
+
+ id: str
+ gcp_id: str | None
+ name: str
+ slug: str | None
+ description: str | None
+ temporal_coverage: dict[str, str | None]
+
+
+class Table(TableOverview):
+ """Complete table information including all its columns."""
+
+ columns: list[Column]
+
+
+class DatasetOverview(BaseModel):
+ """Basic dataset information without table details."""
+
+ id: str
+ name: str
+ slug: str | None
+ description: str | None
+ tags: list[str]
+ themes: list[str]
+ organizations: list[str]
+
+
+class Dataset(DatasetOverview):
+ """Complete dataset information including all tables and columns."""
+
+ tables: list[TableOverview]
+ usage_guide: str | None
diff --git a/app/agent/tools/queries.py b/app/agent/tools/queries.py
new file mode 100644
index 0000000..4dd7e44
--- /dev/null
+++ b/app/agent/tools/queries.py
@@ -0,0 +1,98 @@
+DATASET_DETAILS_QUERY = """
+query getDatasetDetails($id: ID!) {
+ allDataset(id: $id, first: 1) {
+ edges {
+ node {
+ id
+ name
+ slug
+ description
+ organizations {
+ edges {
+ node {
+ name
+ slug
+ }
+ }
+ }
+ themes {
+ edges {
+ node {
+ name
+ }
+ }
+ }
+ tags {
+ edges {
+ node {
+ name
+ }
+ }
+ }
+ tables {
+ edges {
+ node {
+ id
+ name
+ slug
+ description
+ temporalCoverage
+ cloudTables {
+ edges {
+ node {
+ gcpProjectId
+ gcpDatasetId
+ gcpTableId
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+}
+"""
+
+TABLE_DETAILS_QUERY = """
+query getTableDetails($id: ID!) {
+ allTable(id: $id, first: 1){
+ edges {
+ node {
+ id
+ name
+ slug
+ description
+ temporalCoverage
+ cloudTables {
+ edges {
+ node {
+ gcpProjectId
+ gcpDatasetId
+ gcpTableId
+ }
+ }
+ }
+ columns {
+ edges {
+ node {
+ id
+ name
+ description
+ bigqueryType {
+ name
+ }
+ directoryPrimaryKey {
+ table {
+ id
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+}
+"""
diff --git a/app/api/streaming.py b/app/api/streaming.py
index 9dad56c..6371c41 100644
--- a/app/api/streaming.py
+++ b/app/api/streaming.py
@@ -122,6 +122,35 @@ def _truncate_json(
return json.dumps(data, ensure_ascii=False, indent=2)
+def _parse_thinking(message: AIMessage) -> str | None:
+ """Parse thinking content from an AI message.
+
+ Some models (e.g., Gemini 3) return `message.content` as a list of typed blocks,
+ which may include `{"type": "thinking", "thinking": "..."}` entries. When
+ `content` is a plain string, no thinking is available.
+
+ Args:
+        message (AIMessage): The AI message to parse thinking content from.
+
+ Returns:
+ str | None: The concatenated thinking text, or None if no thinking blocks exist.
+ """
+ if isinstance(message.content, str):
+ return None
+
+ blocks = [
+ block
+ for block in message.content
+ if isinstance(block, dict)
+ and block.get("type") == "thinking"
+ and isinstance(block.get("thinking"), str)
+ ]
+
+ thinking = "".join(block["thinking"] for block in blocks)
+
+ return thinking or None
+
+
def _process_chunk(chunk: dict[str, Any]) -> StreamEvent | None:
"""Process a streaming chunk from a react agent workflow into a standardized StreamEvent.
@@ -154,11 +183,14 @@ def _process_chunk(chunk: dict[str, Any]) -> StreamEvent | None:
)
for tool_call in message.tool_calls
]
+ thinking = _parse_thinking(message)
else:
event_type = "final_answer"
tool_calls = None
+ thinking = None
- event_data = EventData(content=message.text, tool_calls=tool_calls)
+ content = thinking or message.text
+ event_data = EventData(content=content, tool_calls=tool_calls)
return StreamEvent(type=event_type, data=event_data)
elif "tools" in chunk:
diff --git a/app/main.py b/app/main.py
index 524bc1c..b2d38ee 100644
--- a/app/main.py
+++ b/app/main.py
@@ -1,4 +1,5 @@
from contextlib import asynccontextmanager
+from datetime import date
from fastapi import FastAPI
from fastapi.responses import RedirectResponse
@@ -53,6 +54,8 @@ async def lifespan(app: FastAPI): # pragma: no cover
model=settings.MODEL_URI,
temperature=settings.MODEL_TEMPERATURE,
credentials=settings.GOOGLE_CREDENTIALS,
+ thinking_level=settings.THINKING_LEVEL,
+ include_thoughts=True,
)
summ_middleware = SummarizationMiddleware(
@@ -79,7 +82,9 @@ async def lifespan(app: FastAPI): # pragma: no cover
agent = create_agent(
model=model,
tools=BDToolkit.get_tools(),
- system_prompt=SYSTEM_PROMPT,
+ system_prompt=SYSTEM_PROMPT.format(
+ current_date=date.today().isoformat()
+ ),
middleware=[summ_middleware, limit_middleware],
checkpointer=checkpointer,
)
diff --git a/app/settings.py b/app/settings.py
index e64a2a7..63484c7 100644
--- a/app/settings.py
+++ b/app/settings.py
@@ -111,6 +111,9 @@ def GOOGLE_CREDENTIALS(self) -> Credentials: # pragma: no cover
"lower ones make them more deterministic."
)
)
+ THINKING_LEVEL: Literal["minimum", "low", "medium", "high"] = Field(
+        description="Controls the amount of thinking Gemini models perform before returning a response."
+ )
# ============================================================
# == LangSmith settings ==
diff --git a/pyproject.toml b/pyproject.toml
index 11ca196..4d90572 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -16,7 +16,7 @@ dependencies = [
"langsmith>=0.6.0",
"loguru>=0.7.3",
"psycopg[binary]>=3.3.2",
- "pydantic<2.12.0",
+ "pydantic>=2.12.0",
"pydantic-settings>=2.12.0",
"pyjwt>=2.10.1",
"sqlmodel>=0.0.31",
diff --git a/tests/app/agent/test_tools.py b/tests/app/agent/test_tools.py
deleted file mode 100644
index 0e07b76..0000000
--- a/tests/app/agent/test_tools.py
+++ /dev/null
@@ -1,728 +0,0 @@
-import json
-from unittest.mock import MagicMock
-
-import httpx
-import pytest
-import respx
-from google.api_core.exceptions import BadRequest, NotFound
-from google.cloud import bigquery as bq
-from pydantic import ValidationError
-from pytest_mock import MockerFixture
-
-from app.agent.tools import (
- BDToolkit,
- ToolError,
- ToolOutput,
- decode_table_values,
- execute_bigquery_sql,
- get_dataset_details,
- handle_tool_errors,
- search_datasets,
-)
-from app.settings import settings
-
-
-class TestHandleToolErrors:
- """Tests for handle_tool_errors decorator."""
-
- def test_decorator_passes_through_success(self):
- """Test decorator returns function result on success."""
-
- @handle_tool_errors
- def successful_function():
- return '{"status": "success", "results": "test results"}'
-
- output = ToolOutput.model_validate(json.loads(successful_function()))
-
- assert output.status == "success"
- assert output.results == "test results"
- assert output.error_details is None
-
- def test_decorator_catches_google_api_error(self):
- """Test decorator catches GoogleAPICallError."""
-
- @handle_tool_errors
- def failing_function():
- error = BadRequest(
- message="Some bad request",
- errors=[{"reason": "testReason", "message": "Test message"}],
- )
- raise error
-
- output = ToolOutput.model_validate(json.loads(failing_function()))
-
- assert output.status == "error"
- assert output.results is None
- assert output.error_details.error_type == "testReason"
- assert output.error_details.message == "Test message"
- assert output.error_details.instructions is None
-
- def test_decorator_catches_google_api_error_without_errors(self):
- """Test decorator catches GoogleAPICallError."""
-
- @handle_tool_errors
- def failing_function():
- error = BadRequest(message="Some bad request")
- raise error
-
- output = ToolOutput.model_validate(json.loads(failing_function()))
-
- assert output.status == "error"
- assert output.results is None
- assert output.error_details.error_type is None
- assert output.error_details.message == f"{BadRequest.code} Some bad request"
- assert output.error_details.instructions is None
-
- def test_decorator_catches_tool_error(self):
- """Test decorator catches ToolError."""
-
- @handle_tool_errors
- def failing_function():
- raise ToolError(
- "Custom error", error_type="CUSTOM", instructions="Try again"
- )
-
- output = ToolOutput.model_validate(json.loads(failing_function()))
-
- assert output.status == "error"
- assert output.results is None
- assert output.error_details.error_type == "CUSTOM"
- assert output.error_details.message == "Custom error"
- assert output.error_details.instructions == "Try again"
-
- def test_decorator_catches_unexpected_exception(self):
- """Test decorator catches unexpected exceptions."""
-
- @handle_tool_errors
- def failing_function():
- raise ValueError("This is a value error")
-
- output = ToolOutput.model_validate(json.loads(failing_function()))
-
- assert output.status == "error"
- assert output.results is None
- assert output.error_details.error_type is None
- assert output.error_details.message == "Unexpected error: This is a value error"
- assert output.error_details.instructions is None
-
- def test_decorator_with_custom_instructions(self):
- """Test decorator with custom instructions mapping."""
-
- @handle_tool_errors(instructions={"testReason": "Custom instruction"})
- def failing_function():
- error = BadRequest(
- message="Some bad request",
- errors=[{"reason": "testReason", "message": "Test message"}],
- )
- raise error
-
- output = ToolOutput.model_validate(json.loads(failing_function()))
-
- assert output.status == "error"
- assert output.results is None
- assert output.error_details.error_type == "testReason"
- assert output.error_details.message == "Test message"
- assert output.error_details.instructions == "Custom instruction"
-
-
-class TestToolOutput:
- """Tests for ToolOutput model validation."""
-
- def test_valid_success_output(self):
- """Test valid success output with results."""
- output = ToolOutput(status="success", results={"data": "test"})
-
- assert output.status == "success"
- assert output.results == {"data": "test"}
- assert output.error_details is None
-
- def test_valid_error_output(self):
- """Test valid success output with results."""
- from app.agent.tools import ErrorDetails
-
- error_details = ErrorDetails(message="error")
-
- output = ToolOutput(status="error", error_details=error_details)
-
- assert output.status == "error"
- assert output.results is None
- assert output.error_details == error_details
-
- def test_invalid_both_results_and_error(self):
- """Test validation fails when both results and error_details are set."""
- from app.agent.tools import ErrorDetails
-
- with pytest.raises(ValidationError):
- ToolOutput(
- status="error",
- results={"data": "test"},
- error_details=ErrorDetails(message="error"),
- )
-
- def test_invalid_neither_results_nor_error(self):
- """Test validation fails when neither results nor error_details are set."""
- with pytest.raises(ValidationError):
- ToolOutput(status="success", results=None, error_details=None)
-
-
-class TestSearchDatasets:
- """Tests for search_datasets tool."""
-
- SEARCH_ENDPOINT = f"{settings.BASEDOSDADOS_BASE_URL}/search/"
-
- @respx.mock
- def test_search_datasets_returns_overviews(self):
- """Test successful dataset search."""
- mock_response = {
- "results": [
- {
- "id": "dataset-1",
- "name": "Test Dataset",
- "slug": "test_dataset",
- "description": "Dataset description",
- "tags": [{"name": "tag1"}, {"name": "tag2"}],
- "themes": [{"name": "theme1"}, {"name": "theme2"}],
- "organizations": [{"name": "org1"}],
- }
- ]
- }
-
- respx.get(self.SEARCH_ENDPOINT).mock(
- return_value=httpx.Response(200, json=mock_response)
- )
-
- result = search_datasets.invoke({"query": "test"})
- output = ToolOutput.model_validate(json.loads(result))
-
- assert output.status == "success"
- assert len(output.results) == 1
-
- dataset = output.results[0]
-
- assert dataset["id"] == "dataset-1"
- assert dataset["name"] == "Test Dataset"
- assert dataset["slug"] == "test_dataset"
- assert dataset["description"] == "Dataset description"
- assert dataset["tags"] == ["tag1", "tag2"]
- assert dataset["themes"] == ["theme1", "theme2"]
- assert dataset["organizations"] == ["org1"]
-
- assert output.error_details is None
-
- @respx.mock
- def test_search_datasets_returns_empty_results(self):
- """Test successful dataset search with no results."""
- respx.get(self.SEARCH_ENDPOINT).mock(
- return_value=httpx.Response(200, json={"results": []})
- )
-
- result = search_datasets.invoke({"query": "nonexistent"})
- output = ToolOutput.model_validate(json.loads(result))
-
- assert output.status == "success"
- assert output.results == []
- assert output.error_details is None
-
-
-class TestGetDatasetDetails:
- """Tests for get_dataset_details tool."""
-
- GRAPHQL_URL = f"{settings.BASEDOSDADOS_BASE_URL}/graphql"
-
- @pytest.fixture
- def mock_response(self):
- return {
- "data": {
- "allDataset": {
- "edges": [
- {
- "node": {
- "id": "dataset-1",
- "name": "Test Dataset",
- "slug": "test_dataset",
- "description": "Dataset description",
- "tags": {"edges": [{"node": {"name": "tag1"}}]},
- "themes": {"edges": [{"node": {"name": "theme1"}}]},
- "organizations": {
- "edges": [
- {"node": {"name": "org1", "slug": "org1_slug"}}
- ]
- },
- "tables": {
- "edges": [
- {
- "node": {
- "id": "table-1",
- "name": "Test Table",
- "slug": "test_table",
- "description": "Table description",
- "temporalCoverage": {
- "start": "2020",
- "end": "2023",
- },
- "cloudTables": {
- "edges": [
- {
- "node": {
- "gcpProjectId": "basedosdados",
- "gcpDatasetId": "test_dataset",
- "gcpTableId": "test_table",
- }
- }
- ]
- },
- "columns": {
- "edges": [
- {
- "node": {
- "id": "col-1",
- "name": "column_name",
- "description": "Column description",
- "bigqueryType": {
- "name": "COLUMN_TYPE"
- },
- }
- }
- ]
- },
- }
- }
- ]
- },
- }
- }
- ]
- }
- }
- }
-
- @respx.mock
- def test_get_dataset_details_success(self, mock_response):
- """Test successful dataset details retrieval."""
- # Mock graphql endpoint
- respx.post(self.GRAPHQL_URL).mock(
- return_value=httpx.Response(200, json=mock_response)
- )
-
- # Mock usage guide (not found)
- respx.get(url__startswith="https://raw.githubusercontent.com").mock(
- return_value=httpx.Response(404)
- )
-
- result = get_dataset_details.invoke({"dataset_id": "dataset-1"})
- output = ToolOutput.model_validate(json.loads(result))
-
- dataset = output.results
-
- assert output.status == "success"
- assert dataset["id"] == "dataset-1"
- assert dataset["name"] == "Test Dataset"
- assert dataset["slug"] == "test_dataset"
- assert dataset["description"] == "Dataset description"
- assert dataset["tags"] == ["tag1"]
- assert dataset["themes"] == ["theme1"]
- assert dataset["organizations"] == ["org1"]
- assert dataset["usage_guide"] is None
-
- assert len(dataset["tables"]) == 1
-
- table = dataset["tables"][0]
-
- assert table["id"] == "table-1"
- assert table["gcp_id"] == "basedosdados.test_dataset.test_table"
- assert table["name"] == "Test Table"
- assert table["slug"] == "test_table"
- assert table["description"] == "Table description"
- assert table["temporal_coverage"] == {"start": "2020", "end": "2023"}
-
- assert len(table["columns"]) == 1
-
- column = table["columns"][0]
-
- assert column["name"] == "column_name"
- assert column["type"] == "COLUMN_TYPE"
- assert column["description"] == "Column description"
-
- assert output.error_details is None
-
- @respx.mock
- def test_get_dataset_details_success_with_usage_guide(self, mock_response):
- """Test dataset details with usage guide available."""
- respx.post(self.GRAPHQL_URL).mock(
- return_value=httpx.Response(200, json=mock_response)
- )
-
- respx.get(url__startswith="https://raw.githubusercontent.com").mock(
- return_value=httpx.Response(200, text="# This is a usage guide.")
- )
-
- result = get_dataset_details.invoke({"dataset_id": "dataset-1"})
- output = ToolOutput.model_validate(json.loads(result))
-
- assert output.status == "success"
- assert output.results["usage_guide"] == "# This is a usage guide."
- assert output.error_details is None
-
- @respx.mock
- def test_table_without_tags_themes_orgs(self):
- """Test dataset with table that has no tags, themes and orgs."""
- mock_response = {
- "data": {
- "allDataset": {
- "edges": [
- {
- "node": {
- "id": "dataset-1",
- "name": "Test Dataset",
- "slug": "test_dataset",
- "description": "Dataset description",
- "tags": {"edges": [{"node": {}}]},
- "themes": {"edges": [{"node": {}}]},
- "organizations": {"edges": [{"node": {}}]},
- "tables": {
- "edges": [
- {
- "node": {
- "id": "table-1",
- "name": "Test Table",
- "slug": "test_table",
- "description": "Table description",
- "temporalCoverage": {
- "start": "2020",
- "end": "2023",
- },
- "cloudTables": {
- "edges": [
- {
- "node": {
- "gcpProjectId": "basedosdados",
- "gcpDatasetId": "test_dataset",
- "gcpTableId": "test_table",
- }
- }
- ]
- },
- "columns": {
- "edges": [
- {
- "node": {
- "id": "col-1",
- "name": "column_name",
- "description": "Column description",
- "bigqueryType": {
- "name": "COLUMN_TYPE"
- },
- }
- }
- ]
- },
- }
- }
- ]
- },
- }
- }
- ]
- }
- }
- }
-
- respx.post(self.GRAPHQL_URL).mock(
- return_value=httpx.Response(200, json=mock_response)
- )
-
- respx.get(url__startswith="https://raw.githubusercontent.com").mock(
- return_value=httpx.Response(200)
- )
-
- result = get_dataset_details.invoke({"dataset_id": "dataset-1"})
- output = ToolOutput.model_validate(json.loads(result))
-
- assert output.status == "success"
- assert output.results["tags"] == []
- assert output.results["themes"] == []
- assert output.results["organizations"] == []
- assert output.error_details is None
-
- @respx.mock
- def test_table_without_cloud_tables(self):
- """Test dataset with table that has no cloud tables."""
- mock_response = {
- "data": {
- "allDataset": {
- "edges": [
- {
- "node": {
- "id": "dataset-1",
- "name": "Test Dataset",
- "slug": "test_dataset",
- "description": "Dataset description",
- "tags": {"edges": [{"node": {"name": "tag1"}}]},
- "themes": {"edges": [{"node": {"name": "theme1"}}]},
- "organizations": {
- "edges": [
- {"node": {"name": "org1", "slug": "org1_slug"}}
- ]
- },
- "tables": {
- "edges": [
- {
- "node": {
- "id": "table-1",
- "name": "Test Table",
- "slug": "test_table",
- "description": "Table description",
- "temporalCoverage": {
- "start": "2020",
- "end": "2023",
- },
- "cloudTables": {"edges": []},
- "columns": {
- "edges": [
- {
- "node": {
- "id": "col-1",
- "name": "column_name",
- "description": "Column description",
- "bigqueryType": {
- "name": "COLUMN_TYPE"
- },
- }
- }
- ]
- },
- }
- }
- ]
- },
- }
- }
- ]
- }
- }
- }
-
- respx.post(self.GRAPHQL_URL).mock(
- return_value=httpx.Response(200, json=mock_response)
- )
-
- result = get_dataset_details.invoke({"dataset_id": "dataset-1"})
- output = ToolOutput.model_validate(json.loads(result))
-
- assert output.status == "success"
- assert output.results["tables"][0]["gcp_id"] is None
- assert output.results["usage_guide"] is None
- assert output.error_details is None
-
- @respx.mock
- def test_get_dataset_details_dataset_not_found(self):
- """Test error when dataset is not found."""
- respx.post(self.GRAPHQL_URL).mock(
- return_value=httpx.Response(
- 200, json={"data": {"allDataset": {"edges": []}}}
- )
- )
-
- result = get_dataset_details.invoke({"dataset_id": "nonexistent"})
- output = ToolOutput.model_validate(json.loads(result))
-
- assert output.status == "error"
- assert output.results is None
- assert output.error_details.message == "Dataset nonexistent not found"
- assert output.error_details.error_type == "DATASET_NOT_FOUND"
- assert (
- output.error_details.instructions
- == "Verify the dataset ID from `search_datasets` results"
- )
-
-
-class TestExecuteBigQuerySQL:
- """Tests for execute_bigquery_sql tool."""
-
- @pytest.fixture
- def mock_config(self) -> dict:
- return {"configurable": {"thread_id": "test-thread", "user_id": "test-user"}}
-
- def test_successful_query(self, mocker: MockerFixture, mock_config: dict):
- """Test successful SELECT query execution."""
- mock_dry_run_query_job = MagicMock()
- mock_dry_run_query_job.statement_type = "SELECT"
-
- mock_query_job = MagicMock()
- mock_query_job.result.return_value = [{"col1": "value1"}, {"col1": "value2"}]
-
- mock_bigquery_client = MagicMock(spec=bq.Client)
- mock_bigquery_client.query.side_effect = [
- mock_dry_run_query_job,
- mock_query_job,
- ]
-
- mocker.patch(
- "app.agent.tools.get_bigquery_client", return_value=mock_bigquery_client
- )
-
- result = execute_bigquery_sql.invoke(
- {"sql_query": "SELECT * FROM project.dataset.table", "config": mock_config}
- )
-
- output = ToolOutput.model_validate(json.loads(result))
-
- assert output.status == "success"
- assert output.results == [{"col1": "value1"}, {"col1": "value2"}]
- assert output.error_details is None
-
- def test_forbidden_statement_type(self, mocker: MockerFixture, mock_config: dict):
- """Test error when statement is not SELECT."""
- mock_dry_run_query_job = MagicMock()
- mock_dry_run_query_job.statement_type = "DELETE"
-
- mock_bigquery_client = MagicMock(spec=bq.Client)
- mock_bigquery_client.query.return_value = mock_dry_run_query_job
-
- mocker.patch(
- "app.agent.tools.get_bigquery_client", return_value=mock_bigquery_client
- )
-
- result = execute_bigquery_sql.invoke(
- {"sql_query": "DELETE FROM project.dataset.table", "config": mock_config}
- )
-
- output = ToolOutput.model_validate(json.loads(result))
-
- assert output.status == "error"
- assert output.results is None
- assert output.error_details.error_type == "FORBIDDEN_STATEMENT"
- assert (
- output.error_details.message
- == "Query aborted: Statement DELETE is forbidden."
- )
- assert (
- output.error_details.instructions
- == "Your access is strictly read-only. Use only SELECT statements."
- )
-
-
-class TestDecodeTableValues:
- """Tests for decode_table_values tool."""
-
- @pytest.fixture
- def mock_config(self) -> dict:
- return {"configurable": {"thread_id": "test-thread", "user_id": "test-user"}}
-
- def test_decode_all_columns(self, mocker: MockerFixture, mock_config: dict):
- """Test decoding all columns from a table."""
- mock_query_job = MagicMock()
- mock_query_job.result.return_value = [
- {"nome_coluna": "col1", "chave": "1", "valor": "Value 1"},
- {"nome_coluna": "col2", "chave": "2", "valor": "Value 2"},
- ]
-
- mock_bigquery_client = MagicMock(spec=bq.Client)
- mock_bigquery_client.query.return_value = mock_query_job
-
- mocker.patch(
- "app.agent.tools.get_bigquery_client", return_value=mock_bigquery_client
- )
-
- result = decode_table_values.invoke(
- {"table_gcp_id": "project.dataset.table", "config": mock_config}
- )
-
- output = ToolOutput.model_validate(json.loads(result))
-
- assert output.status == "success"
- assert len(output.results) == 2
- assert output.error_details is None
-
- def test_decode_specific_column(self, mocker: MockerFixture, mock_config: dict):
- """Test decoding a specific column."""
- mock_query_job = MagicMock()
- mock_query_job.result.return_value = [
- {"nome_coluna": "col1", "chave": "1", "valor": "Value 1"},
- {"nome_coluna": "col1", "chave": "2", "valor": "Value 2"},
- ]
-
- mock_bigquery_client = MagicMock(spec=bq.Client)
- mock_bigquery_client.query.return_value = mock_query_job
-
- mocker.patch(
- "app.agent.tools.get_bigquery_client", return_value=mock_bigquery_client
- )
-
- result = decode_table_values.invoke(
- {
- "table_gcp_id": "project.dataset.table",
- "column_name": "col1",
- "config": mock_config,
- }
- )
-
- output = ToolOutput.model_validate(json.loads(result))
-
- assert output.status == "success"
- # Verify column filter was added to query
- call_args = mock_bigquery_client.query.call_args[0][0]
- assert "nome_coluna = 'col1'" in call_args
-
- def test_dictionary_not_found(self, mocker: MockerFixture, mock_config: dict):
- """Test error when dictionary table doesn't exist."""
- error = NotFound(
- message="Table not found",
- errors=[{"reason": "notFound", "message": "Test message"}],
- )
-
- mock_bigquery_client = MagicMock(spec=bq.Client)
- mock_bigquery_client.query.side_effect = error
-
- mocker.patch(
- "app.agent.tools.get_bigquery_client", return_value=mock_bigquery_client
- )
-
- result = decode_table_values.invoke(
- {"table_gcp_id": "project.dataset.table", "config": mock_config}
- )
-
- output = ToolOutput.model_validate(json.loads(result))
-
- assert output.status == "error"
- assert output.results is None
- assert output.error_details.error_type == "notFound"
- assert output.error_details.message == "Test message"
- assert (
- output.error_details.instructions
- == "Dictionary table not found for this dataset."
- )
-
- def test_invalid_table_reference(self, mock_config: dict):
- """Test error when table reference format is invalid."""
- result = decode_table_values.invoke(
- {"table_gcp_id": "table", "config": mock_config}
- )
-
- output = ToolOutput.model_validate(json.loads(result))
-
- assert output.status == "error"
- assert output.results is None
- assert output.error_details.error_type == "INVALID_TABLE_REFERENCE"
- assert output.error_details.message == "Invalid table reference: 'table'"
- assert (
- output.error_details.instructions
- == "Provide a valid table reference in the format `project.dataset.table`"
- )
-
-
-class TestBDToolkit:
- """Tests for BDToolkit class."""
-
- def test_get_tools_returns_all_tools(self):
- """Test that get_tools returns all expected tools."""
- tools = BDToolkit.get_tools()
-
- assert len(tools) == 4
-
- tool_names = [tool.name for tool in tools]
-
- assert "search_datasets" in tool_names
- assert "get_dataset_details" in tool_names
- assert "execute_bigquery_sql" in tool_names
- assert "decode_table_values" in tool_names
diff --git a/tests/app/agent/tools/test_api.py b/tests/app/agent/tools/test_api.py
new file mode 100644
index 0000000..649330a
--- /dev/null
+++ b/tests/app/agent/tools/test_api.py
@@ -0,0 +1,437 @@
+import json
+
+import httpx
+import pytest
+import respx
+
+from app.agent.tools.api import get_dataset_details, get_table_details, search_datasets
+from app.settings import settings
+
+
+class TestSearchDatasets:
+ """Tests for search_datasets tool."""
+
+ SEARCH_ENDPOINT = f"{settings.BASEDOSDADOS_BASE_URL}/search/"
+
+ @respx.mock
+ def test_search_datasets_returns_overviews(self):
+ """Test successful dataset search."""
+ mock_response = {
+ "results": [
+ {
+ "id": "dataset-1",
+ "name": "Test Dataset",
+ "slug": "test_dataset",
+ "description": "Dataset description",
+ "tags": [{"name": "tag1"}, {"name": "tag2"}],
+ "themes": [{"name": "theme1"}, {"name": "theme2"}],
+ "organizations": [{"name": "org1"}],
+ }
+ ]
+ }
+
+ respx.get(self.SEARCH_ENDPOINT).mock(
+ return_value=httpx.Response(200, json=mock_response)
+ )
+
+ result = search_datasets.invoke({"query": "test"})
+ output = json.loads(result)
+
+ assert len(output) == 1
+
+ dataset = output[0]
+
+ assert dataset["id"] == "dataset-1"
+ assert dataset["name"] == "Test Dataset"
+ assert dataset["slug"] == "test_dataset"
+ assert dataset["description"] == "Dataset description"
+ assert dataset["tags"] == ["tag1", "tag2"]
+ assert dataset["themes"] == ["theme1", "theme2"]
+ assert dataset["organizations"] == ["org1"]
+
+ @respx.mock
+ def test_search_datasets_returns_empty_results(self):
+ """Test successful dataset search with no results."""
+ respx.get(self.SEARCH_ENDPOINT).mock(
+ return_value=httpx.Response(200, json={"results": []})
+ )
+
+ result = search_datasets.invoke({"query": "nonexistent"})
+ output = json.loads(result)
+
+ assert output == []
+
+
+class TestGetDatasetDetails:
+ """Tests for get_dataset_details tool."""
+
+ GRAPHQL_URL = f"{settings.BASEDOSDADOS_BASE_URL}/graphql"
+
+ @pytest.fixture
+ def mock_response(self):
+ return {
+ "data": {
+ "allDataset": {
+ "edges": [
+ {
+ "node": {
+ "id": "DatasetNode:dataset-1",
+ "name": "Test Dataset",
+ "slug": "test_dataset",
+ "description": "Dataset description",
+ "tags": {"edges": [{"node": {"name": "tag1"}}]},
+ "themes": {"edges": [{"node": {"name": "theme1"}}]},
+ "organizations": {
+ "edges": [
+ {"node": {"name": "org1", "slug": "org1_slug"}}
+ ]
+ },
+ "tables": {
+ "edges": [
+ {
+ "node": {
+ "id": "TableNode:table-1",
+ "name": "Test Table",
+ "slug": "test_table",
+ "description": "Table description",
+ "temporalCoverage": {
+ "start": "2020",
+ "end": "2023",
+ },
+ "cloudTables": {
+ "edges": [
+ {
+ "node": {
+ "gcpProjectId": "basedosdados",
+ "gcpDatasetId": "test_dataset",
+ "gcpTableId": "test_table",
+ }
+ }
+ ]
+ },
+ }
+ }
+ ]
+ },
+ }
+ }
+ ]
+ }
+ }
+ }
+
+ @respx.mock
+ def test_get_dataset_details_success(self, mock_response):
+ """Test successful dataset details retrieval."""
+ # Mock GraphQL endpoint
+ respx.post(self.GRAPHQL_URL).mock(
+ return_value=httpx.Response(200, json=mock_response)
+ )
+
+ # Mock usage guide (not found)
+ respx.get(url__startswith="https://raw.githubusercontent.com").mock(
+ return_value=httpx.Response(404)
+ )
+
+ result = get_dataset_details.invoke({"dataset_id": "dataset-1"})
+ dataset = json.loads(result)
+
+ assert dataset["id"] == "dataset-1"
+ assert dataset["name"] == "Test Dataset"
+ assert dataset["slug"] == "test_dataset"
+ assert dataset["description"] == "Dataset description"
+ assert dataset["tags"] == ["tag1"]
+ assert dataset["themes"] == ["theme1"]
+ assert dataset["organizations"] == ["org1"]
+ assert dataset["usage_guide"] is None
+
+ assert len(dataset["tables"]) == 1
+
+ table = dataset["tables"][0]
+
+ assert table["id"] == "table-1"
+ assert table["gcp_id"] == "basedosdados.test_dataset.test_table"
+ assert table["name"] == "Test Table"
+ assert table["slug"] == "test_table"
+ assert table["description"] == "Table description"
+ assert table["temporal_coverage"] == {"start": "2020", "end": "2023"}
+
+ @respx.mock
+ def test_get_dataset_details_success_with_usage_guide(self, mock_response):
+ """Test dataset details with usage guide available."""
+ respx.post(self.GRAPHQL_URL).mock(
+ return_value=httpx.Response(200, json=mock_response)
+ )
+
+ respx.get(url__startswith="https://raw.githubusercontent.com").mock(
+ return_value=httpx.Response(200, text="# This is a usage guide.")
+ )
+
+ result = get_dataset_details.invoke({"dataset_id": "dataset-1"})
+ dataset = json.loads(result)
+
+ assert dataset["usage_guide"] == "# This is a usage guide."
+
+ @respx.mock
+ def test_table_without_tags_themes_orgs(self):
+ """Test dataset with table that has no tags, themes and orgs."""
+ mock_response = {
+ "data": {
+ "allDataset": {
+ "edges": [
+ {
+ "node": {
+ "id": "dataset-1",
+ "name": "Test Dataset",
+ "slug": "test_dataset",
+ "description": "Dataset description",
+ "tags": {"edges": [{"node": {}}]},
+ "themes": {"edges": [{"node": {}}]},
+ "organizations": {"edges": [{"node": {}}]},
+ "tables": {
+ "edges": [
+ {
+ "node": {
+ "id": "table-1",
+ "name": "Test Table",
+ "slug": "test_table",
+ "description": "Table description",
+ "temporalCoverage": {
+ "start": "2020",
+ "end": "2023",
+ },
+ "cloudTables": {
+ "edges": [
+ {
+ "node": {
+ "gcpProjectId": "basedosdados",
+ "gcpDatasetId": "test_dataset",
+ "gcpTableId": "test_table",
+ }
+ }
+ ]
+ },
+ }
+ }
+ ]
+ },
+ }
+ }
+ ]
+ }
+ }
+ }
+
+ respx.post(self.GRAPHQL_URL).mock(
+ return_value=httpx.Response(200, json=mock_response)
+ )
+
+ respx.get(url__startswith="https://raw.githubusercontent.com").mock(
+ return_value=httpx.Response(200)
+ )
+
+ result = get_dataset_details.invoke({"dataset_id": "dataset-1"})
+ dataset = json.loads(result)
+
+ assert dataset["tags"] == []
+ assert dataset["themes"] == []
+ assert dataset["organizations"] == []
+
+ @respx.mock
+ def test_table_without_cloud_tables(self):
+ """Test dataset with table that has no cloud tables."""
+ mock_response = {
+ "data": {
+ "allDataset": {
+ "edges": [
+ {
+ "node": {
+ "id": "dataset-1",
+ "name": "Test Dataset",
+ "slug": "test_dataset",
+ "description": "Dataset description",
+ "tags": {"edges": [{"node": {"name": "tag1"}}]},
+ "themes": {"edges": [{"node": {"name": "theme1"}}]},
+ "organizations": {
+ "edges": [
+ {"node": {"name": "org1", "slug": "org1_slug"}}
+ ]
+ },
+ "tables": {
+ "edges": [
+ {
+ "node": {
+ "id": "table-1",
+ "name": "Test Table",
+ "slug": "test_table",
+ "description": "Table description",
+ "temporalCoverage": {
+ "start": "2020",
+ "end": "2023",
+ },
+ "cloudTables": {"edges": []},
+ }
+ }
+ ]
+ },
+ }
+ }
+ ]
+ }
+ }
+ }
+
+ respx.post(self.GRAPHQL_URL).mock(
+ return_value=httpx.Response(200, json=mock_response)
+ )
+
+ result = get_dataset_details.invoke({"dataset_id": "dataset-1"})
+ dataset = json.loads(result)
+
+ assert dataset["tables"][0]["gcp_id"] is None
+ assert dataset["usage_guide"] is None
+
+ @respx.mock
+ def test_get_dataset_details_dataset_not_found(self):
+ """Test error when dataset is not found."""
+ respx.post(self.GRAPHQL_URL).mock(
+ return_value=httpx.Response(
+ 200, json={"data": {"allDataset": {"edges": []}}}
+ )
+ )
+
+ result = get_dataset_details.invoke({"dataset_id": "nonexistent"})
+ output = json.loads(result)
+
+ assert output["status"] == "error"
+ assert (
+ output["message"]
+ == "Dataset 'nonexistent' not found. Verify the dataset ID from search_datasets results."
+ )
+
+
+class TestGetTableDetails:
+ """Tests for get_table_details tool."""
+
+ GRAPHQL_URL = f"{settings.BASEDOSDADOS_BASE_URL}/graphql"
+
+ @pytest.fixture
+ def mock_response(self):
+ return {
+ "data": {
+ "allTable": {
+ "edges": [
+ {
+ "node": {
+ "id": "TableNode:table-1",
+ "name": "Test Table",
+ "slug": "test_table",
+ "description": "Table description",
+ "temporalCoverage": {
+ "start": "2020",
+ "end": "2023",
+ },
+ "cloudTables": {
+ "edges": [
+ {
+ "node": {
+ "gcpProjectId": "basedosdados",
+ "gcpDatasetId": "test_dataset",
+ "gcpTableId": "test_table",
+ }
+ }
+ ]
+ },
+ "columns": {
+ "edges": [
+ {
+ "node": {
+ "id": "col-1",
+ "name": "column_name",
+ "description": "Column description",
+ "bigqueryType": {"name": "STRING"},
+ "directoryPrimaryKey": None,
+ }
+ },
+ {
+ "node": {
+ "id": "col-2",
+ "name": "id_municipio",
+ "description": "Municipality ID",
+ "bigqueryType": {"name": "STRING"},
+ "directoryPrimaryKey": {
+ "table": {
+ "id": "TableNode:dir-table-1"
+ }
+ },
+ }
+ },
+ ]
+ },
+ }
+ }
+ ]
+ }
+ }
+ }
+
+ @respx.mock
+ def test_get_table_details_success(self, mock_response):
+ """Test successful table details retrieval."""
+ respx.post(self.GRAPHQL_URL).mock(
+ return_value=httpx.Response(200, json=mock_response)
+ )
+
+ result = get_table_details.invoke({"table_id": "table-1"})
+ table = json.loads(result)
+
+ assert table["id"] == "table-1"
+ assert table["gcp_id"] == "basedosdados.test_dataset.test_table"
+ assert table["name"] == "Test Table"
+ assert table["slug"] == "test_table"
+ assert table["description"] == "Table description"
+ assert table["temporal_coverage"] == {"start": "2020", "end": "2023"}
+
+ assert len(table["columns"]) == 2
+
+ assert table["columns"][0]["name"] == "column_name"
+ assert table["columns"][0]["type"] == "STRING"
+ assert table["columns"][0]["description"] == "Column description"
+ assert "reference_table_id" not in table["columns"][0]
+
+ assert table["columns"][1]["name"] == "id_municipio"
+ assert table["columns"][1]["type"] == "STRING"
+ assert table["columns"][1]["description"] == "Municipality ID"
+ assert table["columns"][1]["reference_table_id"] == "dir-table-1"
+
+ @respx.mock
+ def test_get_table_details_without_cloud_tables(self, mock_response):
+ """Test table details when no cloud tables exist."""
+ mock_response["data"]["allTable"]["edges"][0]["node"]["cloudTables"] = {
+ "edges": []
+ }
+
+ respx.post(self.GRAPHQL_URL).mock(
+ return_value=httpx.Response(200, json=mock_response)
+ )
+
+ result = get_table_details.invoke({"table_id": "table-1"})
+ table = json.loads(result)
+
+ assert table["gcp_id"] is None
+
+ @respx.mock
+ def test_get_table_details_not_found(self):
+ """Test error when table is not found."""
+ respx.post(self.GRAPHQL_URL).mock(
+ return_value=httpx.Response(200, json={"data": {"allTable": {"edges": []}}})
+ )
+
+ result = get_table_details.invoke({"table_id": "nonexistent"})
+ output = json.loads(result)
+
+ assert output["status"] == "error"
+ assert (
+ output["message"]
+ == "Table 'nonexistent' not found. Verify the table ID from get_dataset_details results."
+ )
diff --git a/tests/app/agent/tools/test_bigquery.py b/tests/app/agent/tools/test_bigquery.py
new file mode 100644
index 0000000..1d71563
--- /dev/null
+++ b/tests/app/agent/tools/test_bigquery.py
@@ -0,0 +1,248 @@
+import json
+from unittest.mock import MagicMock
+
+import pytest
+from google.api_core.exceptions import BadRequest, NotFound
+from google.cloud import bigquery as bq
+from pytest_mock import MockerFixture
+
+from app.agent.tools.bigquery import (
+ MAX_BYTES_BILLED,
+ decode_table_values,
+ execute_bigquery_sql,
+)
+
+
+class TestExecuteBigQuerySQL:
+ """Tests for execute_bigquery_sql tool."""
+
+ @pytest.fixture
+ def mock_config(self) -> dict:
+ return {"configurable": {"thread_id": "test-thread", "user_id": "test-user"}}
+
+ def test_successful_query(self, mocker: MockerFixture, mock_config: dict):
+ """Test successful SELECT query execution."""
+ mock_dry_run_query_job = MagicMock()
+ mock_dry_run_query_job.statement_type = "SELECT"
+
+ mock_query_job = MagicMock()
+ mock_query_job.result.return_value = [{"col1": "value1"}, {"col1": "value2"}]
+
+ mock_bigquery_client = MagicMock(spec=bq.Client)
+ mock_bigquery_client.query.side_effect = [
+ mock_dry_run_query_job,
+ mock_query_job,
+ ]
+
+ mocker.patch(
+ "app.agent.tools.bigquery._get_client", return_value=mock_bigquery_client
+ )
+
+ result = execute_bigquery_sql.invoke(
+ {"sql_query": "SELECT * FROM project.dataset.table", "config": mock_config}
+ )
+
+ output = json.loads(result)
+
+ assert output == [{"col1": "value1"}, {"col1": "value2"}]
+
+ def test_forbidden_statement_type(self, mocker: MockerFixture, mock_config: dict):
+ """Test error when statement is not SELECT."""
+ mock_dry_run_query_job = MagicMock()
+ mock_dry_run_query_job.statement_type = "DELETE"
+
+ mock_bigquery_client = MagicMock(spec=bq.Client)
+ mock_bigquery_client.query.return_value = mock_dry_run_query_job
+
+ mocker.patch(
+ "app.agent.tools.bigquery._get_client", return_value=mock_bigquery_client
+ )
+
+ result = execute_bigquery_sql.invoke(
+ {"sql_query": "DELETE FROM project.dataset.table", "config": mock_config}
+ )
+
+ output = json.loads(result)
+
+ assert output["status"] == "error"
+ assert output["message"] == "Only SELECT statements are allowed, got DELETE."
+
+ def test_bytes_billed_limit_exceeded(
+ self, mocker: MockerFixture, mock_config: dict
+ ):
+ """Test error when query exceeds bytes billed limit."""
+ mock_dry_run_query_job = MagicMock()
+ mock_dry_run_query_job.statement_type = "SELECT"
+
+ error = BadRequest(
+ message="Query limit exceeded",
+ errors=[
+ {"reason": "bytesBilledLimitExceeded", "message": "Limit exceeded"}
+ ],
+ )
+
+ mock_bigquery_client = MagicMock(spec=bq.Client)
+ mock_bigquery_client.query.side_effect = [mock_dry_run_query_job, error]
+
+ mocker.patch(
+ "app.agent.tools.bigquery._get_client", return_value=mock_bigquery_client
+ )
+
+ result = execute_bigquery_sql.invoke(
+ {"sql_query": "SELECT * FROM project.dataset.table", "config": mock_config}
+ )
+
+ output = json.loads(result)
+
+ assert output["status"] == "error"
+ assert output["message"] == (
+ f"Query exceeds the {MAX_BYTES_BILLED // 10**9}GB processing limit. "
+ "Add WHERE filters or select fewer columns."
+ )
+
+ def test_google_api_error_reraise(self, mocker: MockerFixture, mock_config: dict):
+ """Test that non-bytesBilledLimitExceeded GoogleAPICallError is re-raised."""
+ mock_dry_run_query_job = MagicMock()
+ mock_dry_run_query_job.statement_type = "SELECT"
+
+ error = BadRequest(
+ message="Syntax error",
+ errors=[{"reason": "testReason", "message": "Test message"}],
+ )
+
+ mock_bigquery_client = MagicMock(spec=bq.Client)
+ mock_bigquery_client.query.side_effect = [mock_dry_run_query_job, error]
+
+ mocker.patch(
+ "app.agent.tools.bigquery._get_client", return_value=mock_bigquery_client
+ )
+
+ result = execute_bigquery_sql.invoke(
+ {"sql_query": "SELECT * FROM project.dataset.table", "config": mock_config}
+ )
+
+ output = json.loads(result)
+
+ assert output["status"] == "error"
+ assert output["message"] == "400 Syntax error"
+
+
+class TestDecodeTableValues:
+ """Tests for decode_table_values tool."""
+
+ @pytest.fixture
+ def mock_config(self) -> dict:
+ return {"configurable": {"thread_id": "test-thread", "user_id": "test-user"}}
+
+ def test_decode_all_columns(self, mocker: MockerFixture, mock_config: dict):
+ """Test decoding all columns from a table."""
+ mock_query_job = MagicMock()
+ mock_query_job.result.return_value = [
+ {"nome_coluna": "col1", "chave": "1", "valor": "Value 1"},
+ {"nome_coluna": "col2", "chave": "2", "valor": "Value 2"},
+ ]
+
+ mock_bigquery_client = MagicMock(spec=bq.Client)
+ mock_bigquery_client.query.return_value = mock_query_job
+
+ mocker.patch(
+ "app.agent.tools.bigquery._get_client", return_value=mock_bigquery_client
+ )
+
+ result = decode_table_values.invoke(
+ {"table_gcp_id": "project.dataset.table", "config": mock_config}
+ )
+
+ output = json.loads(result)
+
+ assert len(output) == 2
+
+ def test_decode_specific_column(self, mocker: MockerFixture, mock_config: dict):
+ """Test decoding a specific column."""
+ mock_query_job = MagicMock()
+ mock_query_job.result.return_value = [
+ {"nome_coluna": "col1", "chave": "1", "valor": "Value 1"},
+ {"nome_coluna": "col1", "chave": "2", "valor": "Value 2"},
+ ]
+
+ mock_bigquery_client = MagicMock(spec=bq.Client)
+ mock_bigquery_client.query.return_value = mock_query_job
+
+ mocker.patch(
+ "app.agent.tools.bigquery._get_client", return_value=mock_bigquery_client
+ )
+
+ result = decode_table_values.invoke(
+ {
+ "table_gcp_id": "project.dataset.table",
+ "column_name": "col1",
+ "config": mock_config,
+ }
+ )
+
+ output = json.loads(result)
+
+ assert len(output) == 2
+ # Verify column filter was added to query
+ call_args = mock_bigquery_client.query.call_args[0][0]
+ assert "nome_coluna = 'col1'" in call_args
+
+ def test_dictionary_not_found(self, mocker: MockerFixture, mock_config: dict):
+ """Test error when dictionary table doesn't exist."""
+ error = NotFound(
+ message="Table not found",
+ errors=[{"reason": "notFound", "message": "Test message"}],
+ )
+
+ mock_bigquery_client = MagicMock(spec=bq.Client)
+ mock_bigquery_client.query.side_effect = error
+
+ mocker.patch(
+ "app.agent.tools.bigquery._get_client", return_value=mock_bigquery_client
+ )
+
+ result = decode_table_values.invoke(
+ {"table_gcp_id": "project.dataset.table", "config": mock_config}
+ )
+
+ output = json.loads(result)
+
+ assert output["status"] == "error"
+ assert output["message"] == "Dictionary table not found for this dataset."
+
+ def test_invalid_table_reference(self, mock_config: dict):
+ """Test error when table reference format is invalid."""
+ result = decode_table_values.invoke(
+ {"table_gcp_id": "table", "config": mock_config}
+ )
+
+ output = json.loads(result)
+
+ assert output["status"] == "error"
+ assert (
+ output["message"]
+ == "Invalid table reference: 'table'. Expected format: project.dataset.table"
+ )
+
+ def test_google_api_error_reraise(self, mocker: MockerFixture, mock_config: dict):
+ """Test that non-notFound GoogleAPICallError is re-raised."""
+ error = BadRequest(
+ message="Syntax error",
+ errors=[{"reason": "testReason", "message": "Test message"}],
+ )
+
+ mock_bigquery_client = MagicMock(spec=bq.Client)
+ mock_bigquery_client.query.side_effect = error
+
+ mocker.patch(
+ "app.agent.tools.bigquery._get_client", return_value=mock_bigquery_client
+ )
+
+ result = decode_table_values.invoke(
+ {"table_gcp_id": "project.dataset.table", "config": mock_config}
+ )
+
+ output = json.loads(result)
+
+ assert output["status"] == "error"
+ assert output["message"] == "400 Syntax error"
diff --git a/tests/app/agent/tools/test_exceptions.py b/tests/app/agent/tools/test_exceptions.py
new file mode 100644
index 0000000..9731216
--- /dev/null
+++ b/tests/app/agent/tools/test_exceptions.py
@@ -0,0 +1,46 @@
+import json
+
+from google.api_core.exceptions import BadRequest
+
+from app.agent.tools.exceptions import handle_tool_errors
+
+
+class TestHandleToolErrors:
+ """Tests for handle_tool_errors decorator."""
+
+ def test_decorator_passes_through_success(self):
+ """Test decorator returns function result on success."""
+
+ @handle_tool_errors
+ def successful_function():
+ return '{"key": "value"}'
+
+ output = successful_function()
+ assert json.loads(output) == {"key": "value"}
+
+ def test_decorator_catches_exception(self):
+ """Test decorator catches exceptions and returns ToolError JSON."""
+
+ @handle_tool_errors
+ def failing_function():
+ raise ValueError("something went wrong")
+
+ output = json.loads(failing_function())
+
+ assert output["status"] == "error"
+ assert output["message"] == "something went wrong"
+
+ def test_decorator_catches_google_api_error(self):
+ """Test decorator catches GoogleAPICallError."""
+
+ @handle_tool_errors
+ def failing_function():
+ raise BadRequest(
+ message="Some bad request",
+ errors=[{"reason": "testReason", "message": "Test message"}],
+ )
+
+ output = json.loads(failing_function())
+
+ assert output["status"] == "error"
+ assert output["message"] == "400 Some bad request"
diff --git a/tests/app/agent/tools/test_toolkit.py b/tests/app/agent/tools/test_toolkit.py
new file mode 100644
index 0000000..c416306
--- /dev/null
+++ b/tests/app/agent/tools/test_toolkit.py
@@ -0,0 +1,19 @@
+from app.agent.tools import BDToolkit
+
+
+class TestBDToolkit:
+ """Tests for BDToolkit class."""
+
+ def test_get_tools_returns_all_tools(self):
+ """Test that get_tools returns all expected tools."""
+ tools = BDToolkit.get_tools()
+
+ assert len(tools) == 5
+
+ tool_names = [tool.name for tool in tools]
+
+ assert "search_datasets" in tool_names
+ assert "get_dataset_details" in tool_names
+ assert "get_table_details" in tool_names
+ assert "execute_bigquery_sql" in tool_names
+ assert "decode_table_values" in tool_names
diff --git a/tests/app/api/routers/test_chatbot.py b/tests/app/api/routers/test_chatbot.py
index 0de01f2..e1f5792 100644
--- a/tests/app/api/routers/test_chatbot.py
+++ b/tests/app/api/routers/test_chatbot.py
@@ -1,6 +1,7 @@
import uuid
from contextlib import asynccontextmanager
from datetime import datetime, timezone
+from unittest.mock import AsyncMock
import jwt
import pytest
@@ -29,9 +30,9 @@ def send_feedback(self, feedback: Feedback, created: bool):
return FeedbackSyncStatus.SUCCESS, datetime.now(timezone.utc)
-class MockReActAgent:
- def __init__(self):
- self.checkpointer = None
+class MockAgent:
+ def __init__(self, checkpointer=None):
+ self.checkpointer = checkpointer
def invoke(self, input, config):
return {"messages": [AIMessage("Mock response")]}
@@ -40,12 +41,12 @@ async def ainvoke(self, input, config):
return {"messages": [AIMessage("Mock response")]}
def stream(self, input, config, stream_mode):
- chunk = {"agent": {"messages": [AIMessage("Mock response")]}}
+ chunk = {"model": {"messages": [AIMessage("Mock response")]}}
yield "updates", chunk
yield "values", chunk
async def astream(self, input, config, stream_mode):
- chunk = {"agent": {"messages": [AIMessage("Mock response")]}}
+ chunk = {"model": {"messages": [AIMessage("Mock response")]}}
yield "updates", chunk
yield "values", chunk
@@ -84,7 +85,7 @@ def access_token(user_id: str) -> str:
def client(database: AsyncDatabase):
@asynccontextmanager
async def mock_lifespan(app: FastAPI):
- app.state.agent = MockReActAgent()
+ app.state.agent = MockAgent()
yield
def get_database_override():
@@ -235,6 +236,21 @@ def test_delete_thread_success(
)
assert response.status_code == status.HTTP_200_OK
+ def test_delete_thread_with_checkpointer(
+ self, client: TestClient, access_token: str, thread: Thread
+ ):
+ """Test successful thread deletion also deletes checkpoints."""
+ mock_checkpointer = AsyncMock()
+ app.state.agent.checkpointer = mock_checkpointer
+
+ response = client.delete(
+ url=f"/api/v1/chatbot/threads/{thread.id}",
+ headers={"Authorization": f"Bearer {access_token}"},
+ )
+
+ assert response.status_code == status.HTTP_200_OK
+        mock_checkpointer.adelete_thread.assert_awaited_once_with(str(thread.id))
+
def test_delete_thread_not_found(self, client: TestClient, access_token: str):
"""Test deleting non-existent thread returns 404."""
response = client.delete(
@@ -368,7 +384,8 @@ def test_send_message_success(
event = StreamEvent.model_validate_json(line)
events.append(event)
- assert len(events) >= 1
+ assert len(events) >= 2
+ assert any(event.type == "final_answer" for event in events)
assert events[-1].type == "complete"
assert events[-1].data.run_id is not None
diff --git a/tests/app/api/test_streaming.py b/tests/app/api/test_streaming.py
index 2fc9046..1e1d468 100644
--- a/tests/app/api/test_streaming.py
+++ b/tests/app/api/test_streaming.py
@@ -4,12 +4,12 @@
from unittest.mock import AsyncMock, MagicMock
import pytest
-from google.api_core import exceptions as google_api_exceptions
from langchain_core.messages import AIMessage, ToolMessage
from app.api.schemas import ConfigDict
from app.api.streaming import (
ErrorMessage,
+ _parse_thinking,
_process_chunk,
_truncate_json,
stream_response,
@@ -111,6 +111,64 @@ def test_truncate_json_invalid(self):
assert _truncate_json(invalid_json_string) == invalid_json_string
+class TestParseThinking:
+ """Tests for _parse_thinking function."""
+
+ def test_string_content_returns_none(self):
+ """Test that plain string content returns None."""
+ message = AIMessage(content="Hello, world!")
+ assert _parse_thinking(message) is None
+
+ def test_single_thinking_block(self):
+ """Test extraction of a single thinking block."""
+ message = AIMessage(
+ content=[
+ {"type": "thinking", "thinking": "Let me reason about this."},
+ {"type": "text", "text": "Here is my answer."},
+ ]
+ )
+ assert _parse_thinking(message) == "Let me reason about this."
+
+ def test_multiple_thinking_blocks_are_concatenated(self):
+ """Test that multiple thinking blocks are concatenated."""
+ message = AIMessage(
+ content=[
+ {"type": "thinking", "thinking": "First thought. "},
+ {"type": "text", "text": "Some text."},
+ {"type": "thinking", "thinking": "Second thought."},
+ ]
+ )
+ assert _parse_thinking(message) == "First thought. Second thought."
+
+ def test_no_thinking_blocks_returns_none(self):
+ """Test that content with no thinking blocks returns None."""
+ message = AIMessage(
+ content=[
+ {"type": "text", "text": "Just text."},
+ ]
+ )
+ assert _parse_thinking(message) is None
+
+ def test_empty_thinking_block_returns_none(self):
+ """Test that an empty thinking string returns None."""
+ message = AIMessage(
+ content=[
+ {"type": "thinking", "thinking": ""},
+ ]
+ )
+ assert _parse_thinking(message) is None
+
+ def test_non_dict_blocks_are_skipped(self):
+ """Test that non-dict items in content are safely skipped."""
+ message = AIMessage(
+ content=[
+ "plain string block",
+ {"type": "thinking", "thinking": "Actual thinking."},
+ ]
+ )
+ assert _parse_thinking(message) == "Actual thinking."
+
+
class TestProcessChunk:
"""Tests for _process_chunk function."""
@@ -471,7 +529,7 @@ async def test_stream_response_generic_exception(
mock_agent = MagicMock()
async def mock_astream(*args, **kwargs):
- raise RuntimeError("Something went wrong")
+ raise Exception("Something went wrong")
yield # Makes this an async generator
mock_agent.astream = mock_astream
@@ -534,39 +592,3 @@ async def mock_astream(*args, **kwargs):
call_args = mock_database.create_message.call_args[0][0]
assert call_args.status == MessageStatus.SUCCESS
assert call_args.content == ErrorMessage.MODEL_CALL_LIMIT_REACHED
-
- async def test_stream_response_google_api_error(
- self,
- mock_database,
- mock_user_message,
- mock_config,
- mock_thread_id,
- mock_model_uri,
- ):
- """Test Google API InvalidArgument yields error event."""
- mock_agent = MagicMock()
-
- async def mock_astream(*args, **kwargs):
- raise google_api_exceptions.InvalidArgument("Invalid request")
- yield # Makes this an async generator
-
- mock_agent.astream = mock_astream
-
- events = await self._collect_events(
- stream_response(
- database=mock_database,
- agent=mock_agent,
- user_message=mock_user_message,
- config=mock_config,
- thread_id=mock_thread_id,
- model_uri=mock_model_uri,
- )
- )
-
- assert len(events) == 2
- assert ErrorMessage.UNEXPECTED in events[0]
- assert '"type":"complete"' in events[1]
-
- call_args = mock_database.create_message.call_args[0][0]
- assert call_args.status == MessageStatus.ERROR
- assert call_args.content == ErrorMessage.UNEXPECTED
diff --git a/uv.lock b/uv.lock
index 244bbff..2f9757a 100644
--- a/uv.lock
+++ b/uv.lock
@@ -176,7 +176,7 @@ requires-dist = [
{ name = "langsmith", specifier = ">=0.6.0" },
{ name = "loguru", specifier = ">=0.7.3" },
{ name = "psycopg", extras = ["binary"], specifier = ">=3.3.2" },
- { name = "pydantic", specifier = "<2.12.0" },
+ { name = "pydantic", specifier = ">=2.12.0" },
{ name = "pydantic-settings", specifier = ">=2.12.0" },
{ name = "pyjwt", specifier = ">=2.10.1" },
{ name = "sqlmodel", specifier = ">=0.0.31" },
@@ -1337,7 +1337,7 @@ wheels = [
[[package]]
name = "pydantic"
-version = "2.11.10"
+version = "2.12.5"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "annotated-types" },
@@ -1345,9 +1345,9 @@ dependencies = [
{ name = "typing-extensions" },
{ name = "typing-inspection" },
]
-sdist = { url = "https://files.pythonhosted.org/packages/ae/54/ecab642b3bed45f7d5f59b38443dcb36ef50f85af192e6ece103dbfe9587/pydantic-2.11.10.tar.gz", hash = "sha256:dc280f0982fbda6c38fada4e476dc0a4f3aeaf9c6ad4c28df68a666ec3c61423", size = 788494, upload-time = "2025-10-04T10:40:41.338Z" }
+sdist = { url = "https://files.pythonhosted.org/packages/69/44/36f1a6e523abc58ae5f928898e4aca2e0ea509b5aa6f6f392a5d882be928/pydantic-2.12.5.tar.gz", hash = "sha256:4d351024c75c0f085a9febbb665ce8c0c6ec5d30e903bdb6394b7ede26aebb49", size = 821591, upload-time = "2025-11-26T15:11:46.471Z" }
wheels = [
- { url = "https://files.pythonhosted.org/packages/bd/1f/73c53fcbfb0b5a78f91176df41945ca466e71e9d9d836e5c522abda39ee7/pydantic-2.11.10-py3-none-any.whl", hash = "sha256:802a655709d49bd004c31e865ef37da30b540786a46bfce02333e0e24b5fe29a", size = 444823, upload-time = "2025-10-04T10:40:39.055Z" },
+ { url = "https://files.pythonhosted.org/packages/5a/87/b70ad306ebb6f9b585f114d0ac2137d792b48be34d732d60e597c2f8465a/pydantic-2.12.5-py3-none-any.whl", hash = "sha256:e561593fccf61e8a20fc46dfc2dfe075b8be7d0188df33f221ad1f0139180f9d", size = 463580, upload-time = "2025-11-26T15:11:44.605Z" },
]
[package.optional-dependencies]
@@ -1357,44 +1357,45 @@ email = [
[[package]]
name = "pydantic-core"
-version = "2.33.2"
+version = "2.41.5"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "typing-extensions" },
]
-sdist = { url = "https://files.pythonhosted.org/packages/ad/88/5f2260bdfae97aabf98f1778d43f69574390ad787afb646292a638c923d4/pydantic_core-2.33.2.tar.gz", hash = "sha256:7cb8bc3605c29176e1b105350d2e6474142d7c1bd1d9327c4a9bdb46bf827acc", size = 435195, upload-time = "2025-04-23T18:33:52.104Z" }
-wheels = [
- { url = "https://files.pythonhosted.org/packages/18/8a/2b41c97f554ec8c71f2a8a5f85cb56a8b0956addfe8b0efb5b3d77e8bdc3/pydantic_core-2.33.2-cp312-cp312-macosx_10_12_x86_64.whl", hash = "sha256:a7ec89dc587667f22b6a0b6579c249fca9026ce7c333fc142ba42411fa243cdc", size = 2009000, upload-time = "2025-04-23T18:31:25.863Z" },
- { url = "https://files.pythonhosted.org/packages/a1/02/6224312aacb3c8ecbaa959897af57181fb6cf3a3d7917fd44d0f2917e6f2/pydantic_core-2.33.2-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:3c6db6e52c6d70aa0d00d45cdb9b40f0433b96380071ea80b09277dba021ddf7", size = 1847996, upload-time = "2025-04-23T18:31:27.341Z" },
- { url = "https://files.pythonhosted.org/packages/d6/46/6dcdf084a523dbe0a0be59d054734b86a981726f221f4562aed313dbcb49/pydantic_core-2.33.2-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:4e61206137cbc65e6d5256e1166f88331d3b6238e082d9f74613b9b765fb9025", size = 1880957, upload-time = "2025-04-23T18:31:28.956Z" },
- { url = "https://files.pythonhosted.org/packages/ec/6b/1ec2c03837ac00886ba8160ce041ce4e325b41d06a034adbef11339ae422/pydantic_core-2.33.2-cp312-cp312-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:eb8c529b2819c37140eb51b914153063d27ed88e3bdc31b71198a198e921e011", size = 1964199, upload-time = "2025-04-23T18:31:31.025Z" },
- { url = "https://files.pythonhosted.org/packages/2d/1d/6bf34d6adb9debd9136bd197ca72642203ce9aaaa85cfcbfcf20f9696e83/pydantic_core-2.33.2-cp312-cp312-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:c52b02ad8b4e2cf14ca7b3d918f3eb0ee91e63b3167c32591e57c4317e134f8f", size = 2120296, upload-time = "2025-04-23T18:31:32.514Z" },
- { url = "https://files.pythonhosted.org/packages/e0/94/2bd0aaf5a591e974b32a9f7123f16637776c304471a0ab33cf263cf5591a/pydantic_core-2.33.2-cp312-cp312-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:96081f1605125ba0855dfda83f6f3df5ec90c61195421ba72223de35ccfb2f88", size = 2676109, upload-time = "2025-04-23T18:31:33.958Z" },
- { url = "https://files.pythonhosted.org/packages/f9/41/4b043778cf9c4285d59742281a769eac371b9e47e35f98ad321349cc5d61/pydantic_core-2.33.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:8f57a69461af2a5fa6e6bbd7a5f60d3b7e6cebb687f55106933188e79ad155c1", size = 2002028, upload-time = "2025-04-23T18:31:39.095Z" },
- { url = "https://files.pythonhosted.org/packages/cb/d5/7bb781bf2748ce3d03af04d5c969fa1308880e1dca35a9bd94e1a96a922e/pydantic_core-2.33.2-cp312-cp312-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:572c7e6c8bb4774d2ac88929e3d1f12bc45714ae5ee6d9a788a9fb35e60bb04b", size = 2100044, upload-time = "2025-04-23T18:31:41.034Z" },
- { url = "https://files.pythonhosted.org/packages/fe/36/def5e53e1eb0ad896785702a5bbfd25eed546cdcf4087ad285021a90ed53/pydantic_core-2.33.2-cp312-cp312-musllinux_1_1_aarch64.whl", hash = "sha256:db4b41f9bd95fbe5acd76d89920336ba96f03e149097365afe1cb092fceb89a1", size = 2058881, upload-time = "2025-04-23T18:31:42.757Z" },
- { url = "https://files.pythonhosted.org/packages/01/6c/57f8d70b2ee57fc3dc8b9610315949837fa8c11d86927b9bb044f8705419/pydantic_core-2.33.2-cp312-cp312-musllinux_1_1_armv7l.whl", hash = "sha256:fa854f5cf7e33842a892e5c73f45327760bc7bc516339fda888c75ae60edaeb6", size = 2227034, upload-time = "2025-04-23T18:31:44.304Z" },
- { url = "https://files.pythonhosted.org/packages/27/b9/9c17f0396a82b3d5cbea4c24d742083422639e7bb1d5bf600e12cb176a13/pydantic_core-2.33.2-cp312-cp312-musllinux_1_1_x86_64.whl", hash = "sha256:5f483cfb75ff703095c59e365360cb73e00185e01aaea067cd19acffd2ab20ea", size = 2234187, upload-time = "2025-04-23T18:31:45.891Z" },
- { url = "https://files.pythonhosted.org/packages/b0/6a/adf5734ffd52bf86d865093ad70b2ce543415e0e356f6cacabbc0d9ad910/pydantic_core-2.33.2-cp312-cp312-win32.whl", hash = "sha256:9cb1da0f5a471435a7bc7e439b8a728e8b61e59784b2af70d7c169f8dd8ae290", size = 1892628, upload-time = "2025-04-23T18:31:47.819Z" },
- { url = "https://files.pythonhosted.org/packages/43/e4/5479fecb3606c1368d496a825d8411e126133c41224c1e7238be58b87d7e/pydantic_core-2.33.2-cp312-cp312-win_amd64.whl", hash = "sha256:f941635f2a3d96b2973e867144fde513665c87f13fe0e193c158ac51bfaaa7b2", size = 1955866, upload-time = "2025-04-23T18:31:49.635Z" },
- { url = "https://files.pythonhosted.org/packages/0d/24/8b11e8b3e2be9dd82df4b11408a67c61bb4dc4f8e11b5b0fc888b38118b5/pydantic_core-2.33.2-cp312-cp312-win_arm64.whl", hash = "sha256:cca3868ddfaccfbc4bfb1d608e2ccaaebe0ae628e1416aeb9c4d88c001bb45ab", size = 1888894, upload-time = "2025-04-23T18:31:51.609Z" },
- { url = "https://files.pythonhosted.org/packages/46/8c/99040727b41f56616573a28771b1bfa08a3d3fe74d3d513f01251f79f172/pydantic_core-2.33.2-cp313-cp313-macosx_10_12_x86_64.whl", hash = "sha256:1082dd3e2d7109ad8b7da48e1d4710c8d06c253cbc4a27c1cff4fbcaa97a9e3f", size = 2015688, upload-time = "2025-04-23T18:31:53.175Z" },
- { url = "https://files.pythonhosted.org/packages/3a/cc/5999d1eb705a6cefc31f0b4a90e9f7fc400539b1a1030529700cc1b51838/pydantic_core-2.33.2-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:f517ca031dfc037a9c07e748cefd8d96235088b83b4f4ba8939105d20fa1dcd6", size = 1844808, upload-time = "2025-04-23T18:31:54.79Z" },
- { url = "https://files.pythonhosted.org/packages/6f/5e/a0a7b8885c98889a18b6e376f344da1ef323d270b44edf8174d6bce4d622/pydantic_core-2.33.2-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:0a9f2c9dd19656823cb8250b0724ee9c60a82f3cdf68a080979d13092a3b0fef", size = 1885580, upload-time = "2025-04-23T18:31:57.393Z" },
- { url = "https://files.pythonhosted.org/packages/3b/2a/953581f343c7d11a304581156618c3f592435523dd9d79865903272c256a/pydantic_core-2.33.2-cp313-cp313-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:2b0a451c263b01acebe51895bfb0e1cc842a5c666efe06cdf13846c7418caa9a", size = 1973859, upload-time = "2025-04-23T18:31:59.065Z" },
- { url = "https://files.pythonhosted.org/packages/e6/55/f1a813904771c03a3f97f676c62cca0c0a4138654107c1b61f19c644868b/pydantic_core-2.33.2-cp313-cp313-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:1ea40a64d23faa25e62a70ad163571c0b342b8bf66d5fa612ac0dec4f069d916", size = 2120810, upload-time = "2025-04-23T18:32:00.78Z" },
- { url = "https://files.pythonhosted.org/packages/aa/c3/053389835a996e18853ba107a63caae0b9deb4a276c6b472931ea9ae6e48/pydantic_core-2.33.2-cp313-cp313-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:0fb2d542b4d66f9470e8065c5469ec676978d625a8b7a363f07d9a501a9cb36a", size = 2676498, upload-time = "2025-04-23T18:32:02.418Z" },
- { url = "https://files.pythonhosted.org/packages/eb/3c/f4abd740877a35abade05e437245b192f9d0ffb48bbbbd708df33d3cda37/pydantic_core-2.33.2-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:9fdac5d6ffa1b5a83bca06ffe7583f5576555e6c8b3a91fbd25ea7780f825f7d", size = 2000611, upload-time = "2025-04-23T18:32:04.152Z" },
- { url = "https://files.pythonhosted.org/packages/59/a7/63ef2fed1837d1121a894d0ce88439fe3e3b3e48c7543b2a4479eb99c2bd/pydantic_core-2.33.2-cp313-cp313-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:04a1a413977ab517154eebb2d326da71638271477d6ad87a769102f7c2488c56", size = 2107924, upload-time = "2025-04-23T18:32:06.129Z" },
- { url = "https://files.pythonhosted.org/packages/04/8f/2551964ef045669801675f1cfc3b0d74147f4901c3ffa42be2ddb1f0efc4/pydantic_core-2.33.2-cp313-cp313-musllinux_1_1_aarch64.whl", hash = "sha256:c8e7af2f4e0194c22b5b37205bfb293d166a7344a5b0d0eaccebc376546d77d5", size = 2063196, upload-time = "2025-04-23T18:32:08.178Z" },
- { url = "https://files.pythonhosted.org/packages/26/bd/d9602777e77fc6dbb0c7db9ad356e9a985825547dce5ad1d30ee04903918/pydantic_core-2.33.2-cp313-cp313-musllinux_1_1_armv7l.whl", hash = "sha256:5c92edd15cd58b3c2d34873597a1e20f13094f59cf88068adb18947df5455b4e", size = 2236389, upload-time = "2025-04-23T18:32:10.242Z" },
- { url = "https://files.pythonhosted.org/packages/42/db/0e950daa7e2230423ab342ae918a794964b053bec24ba8af013fc7c94846/pydantic_core-2.33.2-cp313-cp313-musllinux_1_1_x86_64.whl", hash = "sha256:65132b7b4a1c0beded5e057324b7e16e10910c106d43675d9bd87d4f38dde162", size = 2239223, upload-time = "2025-04-23T18:32:12.382Z" },
- { url = "https://files.pythonhosted.org/packages/58/4d/4f937099c545a8a17eb52cb67fe0447fd9a373b348ccfa9a87f141eeb00f/pydantic_core-2.33.2-cp313-cp313-win32.whl", hash = "sha256:52fb90784e0a242bb96ec53f42196a17278855b0f31ac7c3cc6f5c1ec4811849", size = 1900473, upload-time = "2025-04-23T18:32:14.034Z" },
- { url = "https://files.pythonhosted.org/packages/a0/75/4a0a9bac998d78d889def5e4ef2b065acba8cae8c93696906c3a91f310ca/pydantic_core-2.33.2-cp313-cp313-win_amd64.whl", hash = "sha256:c083a3bdd5a93dfe480f1125926afcdbf2917ae714bdb80b36d34318b2bec5d9", size = 1955269, upload-time = "2025-04-23T18:32:15.783Z" },
- { url = "https://files.pythonhosted.org/packages/f9/86/1beda0576969592f1497b4ce8e7bc8cbdf614c352426271b1b10d5f0aa64/pydantic_core-2.33.2-cp313-cp313-win_arm64.whl", hash = "sha256:e80b087132752f6b3d714f041ccf74403799d3b23a72722ea2e6ba2e892555b9", size = 1893921, upload-time = "2025-04-23T18:32:18.473Z" },
- { url = "https://files.pythonhosted.org/packages/a4/7d/e09391c2eebeab681df2b74bfe6c43422fffede8dc74187b2b0bf6fd7571/pydantic_core-2.33.2-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:61c18fba8e5e9db3ab908620af374db0ac1baa69f0f32df4f61ae23f15e586ac", size = 1806162, upload-time = "2025-04-23T18:32:20.188Z" },
- { url = "https://files.pythonhosted.org/packages/f1/3d/847b6b1fed9f8ed3bb95a9ad04fbd0b212e832d4f0f50ff4d9ee5a9f15cf/pydantic_core-2.33.2-cp313-cp313t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:95237e53bb015f67b63c91af7518a62a8660376a6a0db19b89acc77a4d6199f5", size = 1981560, upload-time = "2025-04-23T18:32:22.354Z" },
- { url = "https://files.pythonhosted.org/packages/6f/9a/e73262f6c6656262b5fdd723ad90f518f579b7bc8622e43a942eec53c938/pydantic_core-2.33.2-cp313-cp313t-win_amd64.whl", hash = "sha256:c2fc0a768ef76c15ab9238afa6da7f69895bb5d1ee83aeea2e3509af4472d0b9", size = 1935777, upload-time = "2025-04-23T18:32:25.088Z" },
+sdist = { url = "https://files.pythonhosted.org/packages/71/70/23b021c950c2addd24ec408e9ab05d59b035b39d97cdc1130e1bce647bb6/pydantic_core-2.41.5.tar.gz", hash = "sha256:08daa51ea16ad373ffd5e7606252cc32f07bc72b28284b6bc9c6df804816476e", size = 460952, upload-time = "2025-11-04T13:43:49.098Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/5f/5d/5f6c63eebb5afee93bcaae4ce9a898f3373ca23df3ccaef086d0233a35a7/pydantic_core-2.41.5-cp312-cp312-macosx_10_12_x86_64.whl", hash = "sha256:f41a7489d32336dbf2199c8c0a215390a751c5b014c2c1c5366e817202e9cdf7", size = 2110990, upload-time = "2025-11-04T13:39:58.079Z" },
+ { url = "https://files.pythonhosted.org/packages/aa/32/9c2e8ccb57c01111e0fd091f236c7b371c1bccea0fa85247ac55b1e2b6b6/pydantic_core-2.41.5-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:070259a8818988b9a84a449a2a7337c7f430a22acc0859c6b110aa7212a6d9c0", size = 1896003, upload-time = "2025-11-04T13:39:59.956Z" },
+ { url = "https://files.pythonhosted.org/packages/68/b8/a01b53cb0e59139fbc9e4fda3e9724ede8de279097179be4ff31f1abb65a/pydantic_core-2.41.5-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:e96cea19e34778f8d59fe40775a7a574d95816eb150850a85a7a4c8f4b94ac69", size = 1919200, upload-time = "2025-11-04T13:40:02.241Z" },
+ { url = "https://files.pythonhosted.org/packages/38/de/8c36b5198a29bdaade07b5985e80a233a5ac27137846f3bc2d3b40a47360/pydantic_core-2.41.5-cp312-cp312-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:ed2e99c456e3fadd05c991f8f437ef902e00eedf34320ba2b0842bd1c3ca3a75", size = 2052578, upload-time = "2025-11-04T13:40:04.401Z" },
+ { url = "https://files.pythonhosted.org/packages/00/b5/0e8e4b5b081eac6cb3dbb7e60a65907549a1ce035a724368c330112adfdd/pydantic_core-2.41.5-cp312-cp312-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:65840751b72fbfd82c3c640cff9284545342a4f1eb1586ad0636955b261b0b05", size = 2208504, upload-time = "2025-11-04T13:40:06.072Z" },
+ { url = "https://files.pythonhosted.org/packages/77/56/87a61aad59c7c5b9dc8caad5a41a5545cba3810c3e828708b3d7404f6cef/pydantic_core-2.41.5-cp312-cp312-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:e536c98a7626a98feb2d3eaf75944ef6f3dbee447e1f841eae16f2f0a72d8ddc", size = 2335816, upload-time = "2025-11-04T13:40:07.835Z" },
+ { url = "https://files.pythonhosted.org/packages/0d/76/941cc9f73529988688a665a5c0ecff1112b3d95ab48f81db5f7606f522d3/pydantic_core-2.41.5-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:eceb81a8d74f9267ef4081e246ffd6d129da5d87e37a77c9bde550cb04870c1c", size = 2075366, upload-time = "2025-11-04T13:40:09.804Z" },
+ { url = "https://files.pythonhosted.org/packages/d3/43/ebef01f69baa07a482844faaa0a591bad1ef129253ffd0cdaa9d8a7f72d3/pydantic_core-2.41.5-cp312-cp312-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:d38548150c39b74aeeb0ce8ee1d8e82696f4a4e16ddc6de7b1d8823f7de4b9b5", size = 2171698, upload-time = "2025-11-04T13:40:12.004Z" },
+ { url = "https://files.pythonhosted.org/packages/b1/87/41f3202e4193e3bacfc2c065fab7706ebe81af46a83d3e27605029c1f5a6/pydantic_core-2.41.5-cp312-cp312-musllinux_1_1_aarch64.whl", hash = "sha256:c23e27686783f60290e36827f9c626e63154b82b116d7fe9adba1fda36da706c", size = 2132603, upload-time = "2025-11-04T13:40:13.868Z" },
+ { url = "https://files.pythonhosted.org/packages/49/7d/4c00df99cb12070b6bccdef4a195255e6020a550d572768d92cc54dba91a/pydantic_core-2.41.5-cp312-cp312-musllinux_1_1_armv7l.whl", hash = "sha256:482c982f814460eabe1d3bb0adfdc583387bd4691ef00b90575ca0d2b6fe2294", size = 2329591, upload-time = "2025-11-04T13:40:15.672Z" },
+ { url = "https://files.pythonhosted.org/packages/cc/6a/ebf4b1d65d458f3cda6a7335d141305dfa19bdc61140a884d165a8a1bbc7/pydantic_core-2.41.5-cp312-cp312-musllinux_1_1_x86_64.whl", hash = "sha256:bfea2a5f0b4d8d43adf9d7b8bf019fb46fdd10a2e5cde477fbcb9d1fa08c68e1", size = 2319068, upload-time = "2025-11-04T13:40:17.532Z" },
+ { url = "https://files.pythonhosted.org/packages/49/3b/774f2b5cd4192d5ab75870ce4381fd89cf218af999515baf07e7206753f0/pydantic_core-2.41.5-cp312-cp312-win32.whl", hash = "sha256:b74557b16e390ec12dca509bce9264c3bbd128f8a2c376eaa68003d7f327276d", size = 1985908, upload-time = "2025-11-04T13:40:19.309Z" },
+ { url = "https://files.pythonhosted.org/packages/86/45/00173a033c801cacf67c190fef088789394feaf88a98a7035b0e40d53dc9/pydantic_core-2.41.5-cp312-cp312-win_amd64.whl", hash = "sha256:1962293292865bca8e54702b08a4f26da73adc83dd1fcf26fbc875b35d81c815", size = 2020145, upload-time = "2025-11-04T13:40:21.548Z" },
+ { url = "https://files.pythonhosted.org/packages/f9/22/91fbc821fa6d261b376a3f73809f907cec5ca6025642c463d3488aad22fb/pydantic_core-2.41.5-cp312-cp312-win_arm64.whl", hash = "sha256:1746d4a3d9a794cacae06a5eaaccb4b8643a131d45fbc9af23e353dc0a5ba5c3", size = 1976179, upload-time = "2025-11-04T13:40:23.393Z" },
+ { url = "https://files.pythonhosted.org/packages/87/06/8806241ff1f70d9939f9af039c6c35f2360cf16e93c2ca76f184e76b1564/pydantic_core-2.41.5-cp313-cp313-macosx_10_12_x86_64.whl", hash = "sha256:941103c9be18ac8daf7b7adca8228f8ed6bb7a1849020f643b3a14d15b1924d9", size = 2120403, upload-time = "2025-11-04T13:40:25.248Z" },
+ { url = "https://files.pythonhosted.org/packages/94/02/abfa0e0bda67faa65fef1c84971c7e45928e108fe24333c81f3bfe35d5f5/pydantic_core-2.41.5-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:112e305c3314f40c93998e567879e887a3160bb8689ef3d2c04b6cc62c33ac34", size = 1896206, upload-time = "2025-11-04T13:40:27.099Z" },
+ { url = "https://files.pythonhosted.org/packages/15/df/a4c740c0943e93e6500f9eb23f4ca7ec9bf71b19e608ae5b579678c8d02f/pydantic_core-2.41.5-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:0cbaad15cb0c90aa221d43c00e77bb33c93e8d36e0bf74760cd00e732d10a6a0", size = 1919307, upload-time = "2025-11-04T13:40:29.806Z" },
+ { url = "https://files.pythonhosted.org/packages/9a/e3/6324802931ae1d123528988e0e86587c2072ac2e5394b4bc2bc34b61ff6e/pydantic_core-2.41.5-cp313-cp313-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:03ca43e12fab6023fc79d28ca6b39b05f794ad08ec2feccc59a339b02f2b3d33", size = 2063258, upload-time = "2025-11-04T13:40:33.544Z" },
+ { url = "https://files.pythonhosted.org/packages/c9/d4/2230d7151d4957dd79c3044ea26346c148c98fbf0ee6ebd41056f2d62ab5/pydantic_core-2.41.5-cp313-cp313-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:dc799088c08fa04e43144b164feb0c13f9a0bc40503f8df3e9fde58a3c0c101e", size = 2214917, upload-time = "2025-11-04T13:40:35.479Z" },
+ { url = "https://files.pythonhosted.org/packages/e6/9f/eaac5df17a3672fef0081b6c1bb0b82b33ee89aa5cec0d7b05f52fd4a1fa/pydantic_core-2.41.5-cp313-cp313-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:97aeba56665b4c3235a0e52b2c2f5ae9cd071b8a8310ad27bddb3f7fb30e9aa2", size = 2332186, upload-time = "2025-11-04T13:40:37.436Z" },
+ { url = "https://files.pythonhosted.org/packages/cf/4e/35a80cae583a37cf15604b44240e45c05e04e86f9cfd766623149297e971/pydantic_core-2.41.5-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:406bf18d345822d6c21366031003612b9c77b3e29ffdb0f612367352aab7d586", size = 2073164, upload-time = "2025-11-04T13:40:40.289Z" },
+ { url = "https://files.pythonhosted.org/packages/bf/e3/f6e262673c6140dd3305d144d032f7bd5f7497d3871c1428521f19f9efa2/pydantic_core-2.41.5-cp313-cp313-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:b93590ae81f7010dbe380cdeab6f515902ebcbefe0b9327cc4804d74e93ae69d", size = 2179146, upload-time = "2025-11-04T13:40:42.809Z" },
+ { url = "https://files.pythonhosted.org/packages/75/c7/20bd7fc05f0c6ea2056a4565c6f36f8968c0924f19b7d97bbfea55780e73/pydantic_core-2.41.5-cp313-cp313-musllinux_1_1_aarch64.whl", hash = "sha256:01a3d0ab748ee531f4ea6c3e48ad9dac84ddba4b0d82291f87248f2f9de8d740", size = 2137788, upload-time = "2025-11-04T13:40:44.752Z" },
+ { url = "https://files.pythonhosted.org/packages/3a/8d/34318ef985c45196e004bc46c6eab2eda437e744c124ef0dbe1ff2c9d06b/pydantic_core-2.41.5-cp313-cp313-musllinux_1_1_armv7l.whl", hash = "sha256:6561e94ba9dacc9c61bce40e2d6bdc3bfaa0259d3ff36ace3b1e6901936d2e3e", size = 2340133, upload-time = "2025-11-04T13:40:46.66Z" },
+ { url = "https://files.pythonhosted.org/packages/9c/59/013626bf8c78a5a5d9350d12e7697d3d4de951a75565496abd40ccd46bee/pydantic_core-2.41.5-cp313-cp313-musllinux_1_1_x86_64.whl", hash = "sha256:915c3d10f81bec3a74fbd4faebe8391013ba61e5a1a8d48c4455b923bdda7858", size = 2324852, upload-time = "2025-11-04T13:40:48.575Z" },
+ { url = "https://files.pythonhosted.org/packages/1a/d9/c248c103856f807ef70c18a4f986693a46a8ffe1602e5d361485da502d20/pydantic_core-2.41.5-cp313-cp313-win32.whl", hash = "sha256:650ae77860b45cfa6e2cdafc42618ceafab3a2d9a3811fcfbd3bbf8ac3c40d36", size = 1994679, upload-time = "2025-11-04T13:40:50.619Z" },
+ { url = "https://files.pythonhosted.org/packages/9e/8b/341991b158ddab181cff136acd2552c9f35bd30380422a639c0671e99a91/pydantic_core-2.41.5-cp313-cp313-win_amd64.whl", hash = "sha256:79ec52ec461e99e13791ec6508c722742ad745571f234ea6255bed38c6480f11", size = 2019766, upload-time = "2025-11-04T13:40:52.631Z" },
+ { url = "https://files.pythonhosted.org/packages/73/7d/f2f9db34af103bea3e09735bb40b021788a5e834c81eedb541991badf8f5/pydantic_core-2.41.5-cp313-cp313-win_arm64.whl", hash = "sha256:3f84d5c1b4ab906093bdc1ff10484838aca54ef08de4afa9de0f5f14d69639cd", size = 1981005, upload-time = "2025-11-04T13:40:54.734Z" },
+ { url = "https://files.pythonhosted.org/packages/09/32/59b0c7e63e277fa7911c2fc70ccfb45ce4b98991e7ef37110663437005af/pydantic_core-2.41.5-graalpy312-graalpy250_312_native-macosx_10_12_x86_64.whl", hash = "sha256:7da7087d756b19037bc2c06edc6c170eeef3c3bafcb8f532ff17d64dc427adfd", size = 2110495, upload-time = "2025-11-04T13:42:49.689Z" },
+ { url = "https://files.pythonhosted.org/packages/aa/81/05e400037eaf55ad400bcd318c05bb345b57e708887f07ddb2d20e3f0e98/pydantic_core-2.41.5-graalpy312-graalpy250_312_native-macosx_11_0_arm64.whl", hash = "sha256:aabf5777b5c8ca26f7824cb4a120a740c9588ed58df9b2d196ce92fba42ff8dc", size = 1915388, upload-time = "2025-11-04T13:42:52.215Z" },
+ { url = "https://files.pythonhosted.org/packages/6e/0d/e3549b2399f71d56476b77dbf3cf8937cec5cd70536bdc0e374a421d0599/pydantic_core-2.41.5-graalpy312-graalpy250_312_native-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:c007fe8a43d43b3969e8469004e9845944f1a80e6acd47c150856bb87f230c56", size = 1942879, upload-time = "2025-11-04T13:42:56.483Z" },
+ { url = "https://files.pythonhosted.org/packages/f7/07/34573da085946b6a313d7c42f82f16e8920bfd730665de2d11c0c37a74b5/pydantic_core-2.41.5-graalpy312-graalpy250_312_native-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:76d0819de158cd855d1cbb8fcafdf6f5cf1eb8e470abe056d5d161106e38062b", size = 2139017, upload-time = "2025-11-04T13:42:59.471Z" },
]
[[package]]