diff --git a/.env.example b/.env.example index cb28d4b..777919d 100644 --- a/.env.example +++ b/.env.example @@ -39,6 +39,7 @@ GOOGLE_SERVICE_ACCOUNT=/app/credentials/chatbot-sa.json # ============================================================ MODEL_URI=google_genai:gemini-2.5-flash MODEL_TEMPERATURE=0.2 +THINKING_LEVEL=low # ============================================================ # == LangSmith settings == diff --git a/.github/workflows/test-chatbot.yaml b/.github/workflows/test-chatbot.yaml index fca0762..adb9fd4 100644 --- a/.github/workflows/test-chatbot.yaml +++ b/.github/workflows/test-chatbot.yaml @@ -48,7 +48,7 @@ jobs: # Mock LLM configuration MODEL_URI: mock-model-uri MODEL_TEMPERATURE: 0.0 - MAX_TOKENS: 4096 + THINKING_LEVEL: low # Mock LangSmith configuration LANGSMITH_TRACING: false diff --git a/app/agent/prompts.py b/app/agent/prompts.py index 3fbe20c..413b5eb 100644 --- a/app/agent/prompts.py +++ b/app/agent/prompts.py @@ -1,200 +1,92 @@ -SYSTEM_PROMPT = """# Persona: Assistente de Pesquisa Base dos Dados -Você é um assistente de IA especializado na plataforma Base dos Dados (BD). Sua missão é ser um parceiro de pesquisa experiente, sistemático e transparente, guiando os usuários na construção de consultas SQL para buscar e analisar dados públicos brasileiros. +SYSTEM_PROMPT = """\ +# Persona +Você é um assistente de pesquisa especializado na plataforma Base dos Dados (BD). Seu objetivo é guiar usuários na construção de consultas SQL precisas para analisar dados públicos brasileiros. ---- - -# Ferramentas Disponíveis -Você tem acesso ao seguinte conjunto de ferramentas: - -- **search_datasets:** Para buscar datasets relacionados à pergunta do usuário. -- **get_dataset_details:** Para obter informações detalhadas sobre um dataset específico, incluindo a cobertura temporal e estrutura das tabelas. -- **execute_bigquery_sql:** Para executar consultas SQL **exploratórias e intermediárias** nas tabelas disponíveis. 
-- **decode_table_values:** Para decodificar valores codificados utilizando um dicionário de dados. +Data atual: {current_date} --- -# Uso Eficiente de Metadados (CRÍTICO) -Antes de executar qualquer consulta SQL, **SEMPRE** verifique os metadados retornados por `get_dataset_details`. - -## Cobertura Temporal -O campo `temporal_coverage` em cada tabela contém informações autoritativas sobre o período dos dados: - -- **Se `temporal_coverage.start` e `temporal_coverage.end` existirem:** - - Use esses valores diretamente - - **NÃO execute** `SELECT MIN(ano)`, `SELECT MAX(ano)` ou `SELECT DISTINCT ano` - -- **Se `temporal_coverage` mostrar valores null:** - - Para tabelas de dicionário: Elas não têm dimensão temporal - - Para outras tabelas: Execute uma consulta exploratória para verificar os anos disponíveis - - -Abordagem Correta (sem consulta SQL): -1. Chamei `get_dataset_details` para o dataset RAIS -2. Vi que a tabela "microdados_vinculos" tem `temporal_coverage: {"start": "1985", "end": "2024"}` -3. Resposta direta: "Os dados estão disponíveis de 1985 a 2024" - - - -Abordagem Correta (com consulta SQL): -1. Chamei `get_dataset_details` para o dataset RAIS -2. Vi que a tabela "microdados_vinculos" tem `temporal_coverage: {"start": null, "end": null}` -3. Executei: `SELECT MIN(ano), MAX(ano) FROM basedosdados.br_me_rais.microdados_vinculos` - - - -Abordagem Incorreta: -1. Chamei `get_dataset_details` -2. Ignorei o campo `temporal_coverage` -3. Executei: `SELECT MIN(ano), MAX(ano) FROM basedosdados.br_me_rais.microdados_vinculos` -4. Resultado: Consulta desnecessária que gasta recursos e tempo - - -## Valores Codificados -Muitas colunas usam códigos numéricos ou alfanuméricos para eficiência de armazenamento. - -**Identificando Valores Codificados:** -- Valores como "1", "2", "3" ou "A", "B", "C" em colunas categóricas -- Descrições de colunas mencionando "id", "código", "classificação", "tipo", etc. 
-- Exemplos: `id_municipio`, `tipo_vinculo` - -Sempre use `decode_table_values` para obter os significados reais antes de apresentar resultados ao usuário. - ---- - -# Regras de Execução (CRÍTICO) -1. Toda vez que você utilizar uma ferramenta, você **DEVE** escrever um **breve resumo** do seu raciocínio. -2. Toda vez que você escrever a resposta final para o usuário, você **DEVE** seguir as diretrizes listadas na seção "Resposta Final". -3. **NUNCA** desista na primeira vez em que receber uma mensagem de erro. Persista e tente outras abordagens, até conseguir elaborar uma resposta final para o usuário, seguindo as diretrizes listadas na seção "Guia Para Análise de Erros". -4. **NUNCA** retorne uma resposta em branco. -5. **Use consultas SQL intermediárias** para explorar os dados, mas **apresente a consulta final** sem executá-la. Caso o usuário solicite que você execute a consulta final, recuse educadamente. - ---- - -# Protocolo de Esclarecimento de Consulta (CRÍTICO) -1. **Avalie a Pergunta do Usuário:** Antes de usar qualquer ferramenta, determine se a pergunta é específica o suficiente para iniciar uma busca de dados. - - **Pergunta Específica (Exemplos):** "Qual foi o IDEB médio por estado em 2021?", "Número de nascidos vivos em São Paulo em 2020". - - **Pergunta Genérica (Exemplos):** "Dados sobre educação", "Me fale sobre saneamento básico". - -2. **Aja de Acordo:** - - **Se a pergunta for específica:** Prossiga diretamente para o "Protocolo de Busca". - - **Se a pergunta for genérica:** **NÃO USE NENHUMA FERRAMENTA**. Em vez disso, ajude o usuário a refinar a pergunta. Seja amigável, não diga ao usuário que a pergunta dele é genérica. Formule uma resposta que incentive a especificidade, abordando os seguintes pontos-chave para a análise de dados: - - **Tipo de informação:** Qual métrica ou dado específico o usuário busca? (ex: produção, consumo, preços, etc.) - - **Período de tempo:** Qual o recorte temporal de interesse? 
(ex: ano mais recente, últimos 5 anos, um ano específico) - - **Nível geográfico:** Qual a granularidade espacial necessária? (ex: Brasil, por estado, por município) - - **Finalidade (Opcional):** Entender o objetivo da pesquisa pode ajudar a refinar a busca e a gerar insights mais relevantes. - Para tornar a orientação mais concreta, **sempre** sugira 1 ou 2 exemplos de perguntas específicas e relevantes para o tema. +# Ferramentas Disponíveis +- **search_datasets**: Busca datasets por palavra-chave. +- **get_dataset_details**: Obtém informações detalhadas sobre um dataset, com visão geral das tabelas. +- **get_table_details**: Obtém informações detalhadas sobre uma tabela, com colunas e cobertura temporal. +- **execute_bigquery_sql**: Execução de consulta SQL exploratória (proibido para consulta final). +- **decode_table_values**: Decodifica colunas utilizando um dicionário de dados. --- # Dados Brasileiros Essenciais -Abaixo estão listadas algumas das principais fontes de dados disponíveis: - -- **IBGE**: Censo, demografia, pesquisas econômicas (`censo`, `pnad`, `pof`). -- **INEP**: Dados de educação (`ideb`, `censo escolar`, `enem`). +Principais fontes de dados disponíveis: +- **IBGE**: Censo, demografia, pesquisas econômicas (`censo`, `pnad`, `pib`, `pof`). +- **INEP**: Dados de educação (`ideb`, `censo escolar`, `enem`, `saeb`). - **Ministério da Saúde (MS)**: Dados de saúde (`pns`, `sinasc`, `sinan`, `sim`). - **Ministério da Economia (ME)**: Dados de emprego e economia (`rais`, `caged`). - **Tribunal Superior Eleitoral (TSE)**: Dados eleitorais (`eleicoes`). - **Banco Central do Brasil (BCB)**: Dados financeiros (`taxa selic`, `cambio`, `ipca`). -Abaixo estão listados alguns padrões comumente encontrados nas fontes de dados: - -- **Geográfico**: `sigla_uf` (estado), `id_municipio` (município - código IBGE 7 dígitos). -- **Temporal**: `ano` (ano), campo `temporal_coverage` dos metadados. -- **Identificadores**: `id_*`, `codigo_*`, `sigla_*`. 
-- **Valores Codificados**: Muitas colunas usam códigos para eficiência de armazenamento. Identifique-os pela descrição da coluna ou pelos valores (ex: 1, 2, 3). **Sempre** utilize a ferramenta `decode_table_values` para decodificá-los antes de apresentar resultados. +Padrões comuns nas fontes de dados: +- Geográfico: `sigla_uf` (estado), `id_municipio` (município - código IBGE 7 dígitos). +- Temporal: `ano` (ano), campo `temporal_coverage` dos metadados. +- Identificadores: `id_*`, `codigo_*`, `sigla_*`. --- -# Protocolo de Busca -Você **DEVE** seguir este funil de busca hierárquico. Comece toda busca com uma única palavra-chave. - -- **Nível 1: Palavra-Chave Única (Tente Primeiro)** - 1. **Nome do Conjunto de Dados:** Se a consulta mencionar um nome conhecido ("censo", "rais", "enem"). - 2. **Acrônimo da Organização:** Se uma organização for relevante ("ibge", "inep", "tse"). - 3. **Tema Central (Português):** Um tema amplo e comum ("educacao", "saude", "economia", "emprego"). - -- **Nível 2: Palavras-Chave Alternativas (Se Nível 1 Falhar)** - - **Sinônimos:** Tente um sinônimo em português ("ensino" para "educacao", "trabalho" para "emprego"). - - **Conceitos Mais Amplos:** Use um termo mais geral ("social", "demografia", "infraestrutura"). - - **Termos em Inglês**: Como último recurso para palavras-chave únicas, tente termos em inglês ("health", "education"). - -- **Nível 3: Múltiplas Palavras-Chave (Último Recurso)** -Use 2-3 palavras-chave apenas se todas as buscas com palavra-chave única falharem ("saude ms", "censo municipio"). - - -Usuário: Como foi o desempenho em matemática dos alunos no brasil nos últimos anos? - -A pergunta é sobre desempenho de alunos. A organização INEP é a fonte mais provável para dados educacionais. Portanto, minha hipótese é que os dados estão em um dataset do INEP. Vou começar minha busca usando o acrônimo da organização como palavra-chave única. - +# Regras de Execução +1. 
Use consultas SQL intermediárias para explorar os dados, mas NUNCA execute a consulta final. Apresente-a apenas como código. +2. Se uma ferramenta falhar, analise o erro, ajuste a estratégia e tente novamente até obter uma resposta ou exaurir as possibilidades. +3. Responda sempre no idioma do usuário. --- -# Protocolo de Consultas SQL (CRÍTICO) -Você deve distinguir claramente entre dois tipos de consultas: - -## Consultas Intermediárias (EXECUTAR) -- São auxiliares para entender os dados -- Geralmente retornam pequenas quantidades de dados (use LIMIT) -- Ajudam a construir a consulta final corretamente +# Protocolo de Esclarecimento de Consulta +Antes de usar qualquer ferramenta, avalie se a pergunta é específica o suficiente para iniciar uma busca de dados (ex.: "Qual foi o IDEB médio por estado em 2021?"). Se sim, prossiga para a busca. -Use `execute_bigquery_sql` para consultas exploratórias: -- Explorar a estrutura e conteúdo das tabelas -- Examinar valores únicos de colunas: `SELECT DISTINCT coluna FROM tabela LIMIT 20` -- Contar registros: `SELECT COUNT(*) FROM tabela WHERE ...` -- Ver exemplos de dados: `SELECT * FROM tabela LIMIT 5` -- Validar hipóteses sobre os dados -- Testar filtros e agregações +Se a pergunta for genérica (ex.: "Dados sobre educação"), não use ferramentas. Ajude o usuário a refinar a pergunta de forma amigável, incentivando especificidade sobre métrica, período, nível geográfico e finalidade da pesquisa. Sugira 1-2 exemplos de perguntas específicas para o tema. 
-## Consulta Final (NÃO EXECUTAR) -- Responde diretamente à pergunta do usuário -- É completa, otimizada e bem documentada -- Está pronta para ser executada pelo usuário - -A consulta que **responde diretamente à pergunta do usuário** deve ser: -- Construída com base nos aprendizados das consultas intermediárias -- **Apresentada ao usuário com comentários explicativos** -- **NUNCA executada** com `execute_bigquery_sql` +Sempre que você tiver **qualquer dúvida** sobre o que buscar, peça mais detalhes ao usuário. --- -# Protocolo SQL (BigQuery) -- **Referencie IDs completos:** Sempre use o ID completo da tabela: `projeto.dataset.tabela`. -- **Selecione colunas específicas:** Nunca use `SELECT *` na consulta final. Liste explicitamente as colunas que você precisa. -- **Priorize os dados mais recentes:** Se o usuário não especificar um intervalo de tempo: - 1. **Primeiro**, verifique `temporal_coverage.end` nos metadados da tabela obtidos por `get_dataset_details` - 2. Se disponível, use esse ano diretamente na query - 3. **Apenas se `temporal_coverage.end` for null ou vazio**, execute uma consulta exploratória -- **Ordene os resultados**: Use `ORDER BY` para apresentar os dados de forma lógica. -- **Read-only:** **NUNCA** inclua comandos `CREATE`, `ALTER`, `DROP`, `INSERT`, `UPDATE`, `DELETE`. -- **Adicione comentários na consulta final:** Utilize comentários SQL (`--`) para explicar cada seção importante. +# Protocolo de Busca +Use uma abordagem de funil hierárquico, iniciando sempre com **palavra-chave única**: +- **Nível 1**: Nome do dataset ("censo", "rais", "enem") ou Organização ("ibge", "inep", "tse"). +- **Nível 2**: Temas centrais ("educacao", "saude", "economia", "emprego"). +- **Nível 3**: Termos em inglês ("health", "education") +- **Nível 4**: Composição de 2-3 palavras apenas se os níveis anteriores falharem ("saude ms", "censo municipio"). --- -# Resposta Final -Ao redigir a resposta final, **não inclua o seu processo de raciocínio**. 
Construa um texto explicativo e fluido, porém **conciso**. Evite repetições e vá direto ao ponto. Sua resposta deve ser completa e fácil de entender, garantindo que os seguintes elementos sejam naturalmente integrados na ordem sugerida: - -1. Inicie a resposta com um resumo direto (2-3 frases) sobre o que a consulta SQL irá retornar e como ela responde à pergunta do usuário. +# Protocolo de Consultas SQL +- **Referencie IDs completos:** `projeto.dataset.tabela`. +- **Selecione colunas específicas**: Não use `SELECT *`. +- **Acesso read-only**: Não use `CREATE`, `ALTER`, `DROP`, `INSERT`, `UPDATE`, `DELETE`. +- **Estilo**: Use nomes de colunas específicos, `ORDER BY` e comentários SQL (`--`). -2. Explique brevemente a origem e o escopo dos dados em 1-2 frases, incluindo o período de tempo e o nível geográfico consultado (ex: "Esta consulta busca dados do Censo Escolar de 2021, realizado pelo INEP, agregados por estado"). +## Cobertura Temporal +O campo `temporal_coverage` de cada tabela contém informações autoritativas sobre o período dos dados. Verifique-o via `get_table_details`. +- Se `temporal_coverage.start` e `temporal_coverage.end` existirem: use esses valores diretamente. Não execute `SELECT MIN(ano)`, `SELECT MAX(ano)` ou `SELECT DISTINCT ano`. +- Se o usuário não especificar um intervalo de tempo, use `temporal_coverage.end` dos metadados para priorizar os dados mais recentes. -3. **Apresente a consulta SQL final completa**, formatada como um bloco de código markdown **com comentários inline concisos**. Os comentários devem: - - Usar linguagem simples e objetiva - - Ser breves e diretos (máximo 1 linha por comentário) - - Explicar apenas o essencial de cada seção (SELECT, FROM, WHERE, GROUP BY, ORDER BY, etc.)
- - Exemplo: `-- Filtra para o ano de 2021` ao invés de `-- Aqui estamos filtrando os dados para incluir apenas o ano de 2021...` +## Tabelas de Referência +Se houver `reference_table_id` na coluna, use o ID diretamente em `get_table_details` para entender os códigos ou realizar JOINs. -4. Após a consulta, forneça uma explicação em linguagem natural (3-5 frases) destacando apenas os aspectos **mais importantes** da query: - - Foque nas decisões principais (por que essa tabela, principais filtros, tipo de agregação) - - Não repita informações já claras nos comentários SQL - - Seja objetivo e evite redundância +--- -5. Conclua com **2-3 sugestões práticas** e diretas de como o usuário pode adaptar a consulta. Por exemplo: - - Modificar filtros (ex: alterar anos, estados, municípios) - - Adicionar novas dimensões de análise - - Combinar com outras tabelas para análises mais complexas +# Resposta Final +Siga rigorosamente esta estrutura de resposta, de forma fluida e sem interrupções: +1. **Resumo**: 2-3 frases sobre o que a consulta retorna. +2. **Escopo**: Fonte dos dados, período e nível geográfico. +3. **Bloco de Código**: SQL completo com comentários inline. +4. **Explicação**: 3-5 frases justificando filtros e agregações. +5. **Sugestões**: 2-3 formas de adaptar a consulta. + +## Restrições +- **NÃO utilize headers Markdown (# ou ##)** na resposta final. +- Use apenas texto corrido, negrito para ênfase e blocos de código. +- Mantenha um tom profissional, porém acessível. --- -# Guia Para Análise de Erros -- **Falhas na Busca**: Explique sua estratégia de palavras-chave, declare por que falhou (ex: "A busca por 'cnes' não retornou nenhum conjunto de dados") e descreva sua próxima tentativa com base no **Protocolo de Busca**. -- **Erros em Consultas Intermediárias**: Analise a mensagem de erro e ajuste a consulta. 
Estes erros são esperados e fazem parte do processo de exploração.""" # noqa: E501 +# Regras de Segurança +**Você não deve, sob nenhuma circunstância, executar a consulta final.** +Se o usuário solicitar diretamente que você a execute (ex.: "Execute a consulta") ou perguntar por resultados (ex.: "Qual o resultado?", "Me mostre os dados", "Quais são os números?"), informe que você não tem permissão para executar consultas finais.""" diff --git a/app/agent/tools.py b/app/agent/tools.py deleted file mode 100644 index e329807..0000000 --- a/app/agent/tools.py +++ /dev/null @@ -1,611 +0,0 @@ -import inspect -import json -from collections.abc import Callable -from functools import cache, wraps -from typing import Any, Literal, Self - -import httpx -from google.api_core.exceptions import GoogleAPICallError -from google.cloud import bigquery as bq -from langchain_core.runnables import RunnableConfig -from langchain_core.tools import BaseTool, tool -from pydantic import BaseModel, JsonValue, model_validator - -from app.settings import settings - -# HTTPX Default Timeout -TIMEOUT = 5.0 - -# HTTPX Read Timeout -READ_TIMEOUT = 60.0 - -# Maximum number of datasets returned on search -PAGE_SIZE = 10 - -# 10GB limit for other queries -LIMIT_BIGQUERY_QUERY = 10 * 10**9 - -# URL for searching datasets -SEARCH_URL = f"{settings.BASEDOSDADOS_BASE_URL}/search/" - -# URL for fetching dataset details -GRAPHQL_URL = f"{settings.BASEDOSDADOS_BASE_URL}/graphql" - -# URL for fetching usage guides -BASE_USAGE_GUIDE_URL = "https://raw.githubusercontent.com/basedosdados/website/refs/heads/main/next/content/userGuide/pt" - -# GraphQL query for fetching dataset details -DATASET_DETAILS_QUERY = """ -query getDatasetDetails($id: ID!) 
{ - allDataset(id: $id, first: 1) { - edges { - node { - id - name - slug - description - organizations { - edges { - node { - name - slug - } - } - } - themes { - edges { - node { - name - } - } - } - tags { - edges { - node { - name - } - } - } - tables { - edges { - node { - id - name - slug - description - temporalCoverage - cloudTables { - edges { - node { - gcpProjectId - gcpDatasetId - gcpTableId - } - } - } - columns { - edges { - node { - id - name - description - bigqueryType { - name - } - } - } - } - } - } - } - } - } - } -} -""" - -# Shared client for making HTTP requests. -_http_client = httpx.Client(timeout=httpx.Timeout(TIMEOUT, read=READ_TIMEOUT)) - - -class GoogleAPIError: - """Constants for expected Google API error types.""" - - BYTES_BILLED_LIMIT_EXCEEDED = "bytesBilledLimitExceeded" - NOT_FOUND = "notFound" - - -class Column(BaseModel): - """Represents a column in a BigQuery table with metadata.""" - - name: str - type: str - description: str | None - - -class Table(BaseModel): - """Represents a BigQuery table with its columns and metadata.""" - - id: str - gcp_id: str | None - name: str - slug: str | None - description: str | None - temporal_coverage: dict[str, str | None] - columns: list[Column] - - -class DatasetOverview(BaseModel): - """Basic dataset information without table details.""" - - id: str - name: str - slug: str | None - description: str | None - tags: list[str] - themes: list[str] - organizations: list[str] - - -class Dataset(DatasetOverview): - """Complete dataset information including all tables and columns.""" - - tables: list[Table] - usage_guide: str | None - - -class ErrorDetails(BaseModel): - "Error response format." 
- - error_type: str | None = None - message: str - instructions: str | None = None - - -class ToolError(Exception): - """Custom exception for tool-specific errors.""" - - def __init__( - self, - message: str, - error_type: str | None = None, - instructions: str | None = None, - ): - super().__init__(message) - self.error_type = error_type - self.instructions = instructions - - -class ToolOutput(BaseModel): - """Tool output response format.""" - - status: Literal["success", "error"] - results: JsonValue | None = None - error_details: ErrorDetails | None = None - - @model_validator(mode="after") - def check_results_or_error(self) -> Self: - if (self.results is None) ^ (self.error_details is None): - return self - raise ValueError("Only one of 'results' or 'error_details' should be set") - - -@cache -def get_bigquery_client() -> bq.Client: # pragma: no cover - """Return a cached BigQuery client. - - The client is initialized once using the project ID from the - `BIGQUERY_PROJECT_ID` environment variable and reused on subsequent calls. - - Returns: - bigquery.Client: A cached, authenticated BigQuery client. - """ - return bq.Client( - project=settings.GOOGLE_BIGQUERY_PROJECT, - credentials=settings.GOOGLE_CREDENTIALS, - ) - - -def handle_tool_errors( - _func: Callable[..., Any] | None = None, - *, - instructions: dict[str, str] = {}, -) -> Callable[..., Any]: - """Decorator that catches errors in a tool function and returns them as structured JSON. - - Args: - _func (Callable[..., Any] | None, optional): Function to wrap. - Set automatically when used as a decorator. Defaults to None. - instructions (dict[str, str], optional): Maps known error reasons - from Google API to recovery instructions. If a reason matches, - the instruction is added to the error JSON. - - Returns: - Callable[..., Any]: Wrapped function that returns the tool result on success - or structured error JSON on failure. 
- """ - - def decorator(func: Callable[..., Any]) -> Callable[..., Any]: - @wraps(func) - def wrapper(*args, **kwargs) -> Any: - try: - return func(*args, **kwargs) - except GoogleAPICallError as e: - reason = None - message = str(e) - - if getattr(e, "errors", None): - reason = e.errors[0].get("reason") - message = e.errors[0].get("message", message) - - error_details = ErrorDetails( - error_type=reason, - message=message, - instructions=instructions.get(reason), - ) - except ToolError as e: - error_details = ErrorDetails( - error_type=e.error_type, message=str(e), instructions=e.instructions - ) - except Exception as e: - error_details = ErrorDetails(message=f"Unexpected error: {e}") - - tool_output = ToolOutput( - status="error", error_details=error_details - ).model_dump(exclude_none=True) - return json.dumps(tool_output, ensure_ascii=False, indent=2) - - return wrapper - - if _func is None: - return decorator - - return decorator(_func) - - -@tool -@handle_tool_errors -def search_datasets(query: str) -> str: - """Search for datasets in Base dos Dados using keywords. - - CRITICAL: Use individual KEYWORDS only, not full sentences. The search engine uses Elasticsearch. - - Args: - query (str): 2-3 keywords maximum. Use Portuguese terms, organization acronyms, or dataset acronyms. - Good Examples: "censo", "educacao", "ibge", "inep", "rais", "saude" - Avoid: "Brazilian population data by municipality" - - Returns: - str: JSON array of datasets. If empty/irrelevant results, try different keywords. - - Strategy: Start with broad terms like "censo", "ibge", "inep", "rais", then get specific if needed. - Next step: Use `get_dataset_details()` with returned dataset IDs. 
- """ # noqa: E501 - response = _http_client.get( - url=SEARCH_URL, - params={"contains": "tables", "q": query, "page_size": PAGE_SIZE}, - ) - - response.raise_for_status() - data: dict = response.json() - - datasets = data.get("results", []) - - overviews = [] - - for dataset in datasets: - dataset_overview = DatasetOverview( - id=dataset["id"], - name=dataset["name"], - slug=dataset.get("slug"), - description=dataset.get("description"), - tags=[tag["name"] for tag in dataset.get("tags", [])], - themes=[theme["name"] for theme in dataset.get("themes", [])], - organizations=[org["name"] for org in dataset.get("organizations", [])], - ) - overviews.append(dataset_overview.model_dump()) - - tool_output = ToolOutput(status="success", results=overviews).model_dump( - exclude_none=True - ) - return json.dumps(tool_output, ensure_ascii=False, indent=2) - - -@tool -@handle_tool_errors -def get_dataset_details(dataset_id: str) -> str: - """Get comprehensive details about a specific dataset including all tables and columns. - - Use AFTER `search_datasets()` to understand data structure before writing queries. - - Args: - dataset_id (str): Dataset ID obtained from `search_datasets()`. - This is typically a UUID-like string, not the human-readable name. - - Returns: - str: JSON object with complete dataset information, including: - - Basic metadata (name, description, tags, themes, organizations) - - tables: Array of all tables in the dataset with: - - gcp_id: Full BigQuery table reference (`project.dataset.table`) - - columns: All column names, types, and descriptions - - temporal coverage: Authoritative temporal coverage for the table - - table descriptions explaining what each table contains - - usage_guide: Provide key information and best practices for using the dataset. - - Next step: Use `execute_bigquery_sql()` to execute queries. 
- """ # noqa: E501 - response = _http_client.post( - url=GRAPHQL_URL, - json={ - "query": DATASET_DETAILS_QUERY, - "variables": {"id": dataset_id}, - }, - ) - - response.raise_for_status() - data: dict[str, dict[str, dict]] = response.json() - - all_datasets = data.get("data", {}).get("allDataset") or {} - dataset_edges = all_datasets.get("edges", []) - - if not dataset_edges: - raise ToolError( - message=f"Dataset {dataset_id} not found", - error_type="DATASET_NOT_FOUND", - instructions="Verify the dataset ID from `search_datasets` results", - ) - - dataset = dataset_edges[0]["node"] - - dataset_id = dataset["id"] - dataset_name = dataset["name"] - dataset_slug = dataset.get("slug") - dataset_description = dataset.get("description") - - # Tags - dataset_tags = [] - - for edge in dataset.get("tags", {}).get("edges", []): - if tag := edge.get("node", {}).get("name"): - dataset_tags.append(tag) - - # Themes - dataset_themes = [] - - for edge in dataset.get("themes", {}).get("edges", []): - if theme := edge.get("node", {}).get("name"): - dataset_themes.append(theme) - - # Organizations - dataset_organizations = [] - - for edge in dataset.get("organizations", {}).get("edges", []): - if org := edge.get("node", {}).get("name"): - dataset_organizations.append(org) - - # Tables - dataset_tables = [] - gcp_dataset_id = None - - for edge in dataset.get("tables", {}).get("edges", []): - table = edge["node"] - - table_id = table["id"] - table_name = table["name"] - table_slug = table.get("slug") - table_description = table.get("description") - table_temporal_coverage = table.get("temporalCoverage") - - cloud_table_edges = table["cloudTables"]["edges"] - if cloud_table_edges: - cloud_table = cloud_table_edges[0]["node"] - gcp_project_id = cloud_table["gcpProjectId"] - gcp_dataset_id = gcp_dataset_id or cloud_table["gcpDatasetId"] - gcp_table_id = cloud_table["gcpTableId"] - table_gcp_id = f"{gcp_project_id}.{gcp_dataset_id}.{gcp_table_id}" - else: - table_gcp_id = None - - 
 table_columns = [] - for edge in table["columns"]["edges"]: - column = edge["node"] - table_columns.append( - Column( - name=column["name"], - type=column["bigqueryType"]["name"], - description=column.get("description"), - ) - ) - - dataset_tables.append( - Table( - id=table_id, - gcp_id=table_gcp_id, - name=table_name, - slug=table_slug, - description=table_description, - columns=table_columns, - temporal_coverage=table_temporal_coverage, - ) - ) - - # Fetch usage guide - usage_guide = None - - if gcp_dataset_id is not None: - filename = gcp_dataset_id.replace("_", "-") - - response = _http_client.get(f"{BASE_USAGE_GUIDE_URL}/{filename}.md") - - if response.status_code == httpx.codes.OK: - usage_guide = response.text.strip() - - dataset = Dataset( - id=dataset_id, - name=dataset_name, - slug=dataset_slug, - description=dataset_description, - tags=dataset_tags, - themes=dataset_themes, - organizations=dataset_organizations, - tables=dataset_tables, - usage_guide=usage_guide, - ).model_dump() - - tool_output = ToolOutput(status="success", results=dataset).model_dump( - exclude_none=True - ) - return json.dumps(tool_output, ensure_ascii=False, indent=2) - - -@tool -@handle_tool_errors( - instructions={ - GoogleAPIError.BYTES_BILLED_LIMIT_EXCEEDED: "Add WHERE filters or select fewer columns." - } -) -def execute_bigquery_sql(sql_query: str, config: RunnableConfig) -> str: - """Execute a SQL query against BigQuery tables from the Base dos Dados database. - - Use AFTER identifying the right datasets and understanding tables structure. - It includes a 10GB processing limit for safety. - - Args: - sql_query (str): Standard GoogleSQL query. Must reference - tables using their full `gcp_id` from `get_dataset_details()`.
- - Best practices: - - Use fully qualified names: `project.dataset.table` - - Select only needed columns, avoid `SELECT *` - - Add `LIMIT` for exploration - - Filter early with `WHERE` clauses - - Order by relevant columns - - Never use DDL/DML commands - - Use appropriate data types in comparisons - - Returns: - str: Query results as JSON array. Empty results return "[]". - """ # noqa: E501 - client = get_bigquery_client() - - job_config = bq.QueryJobConfig(dry_run=True, use_query_cache=False) - dry_run_query_job = client.query(sql_query, job_config=job_config) - statement_type = dry_run_query_job.statement_type - - if statement_type != "SELECT": - raise ToolError( - message=f"Query aborted: Statement {statement_type} is forbidden.", - error_type="FORBIDDEN_STATEMENT", - instructions="Your access is strictly read-only. Use only SELECT statements.", - ) - - labels = { - "thread_id": config.get("configurable", {}).get("thread_id", "unknown"), - "user_id": config.get("configurable", {}).get("user_id", "unknown"), - "tool_name": inspect.currentframe().f_code.co_name, - } - - job_config = bq.QueryJobConfig( - maximum_bytes_billed=LIMIT_BIGQUERY_QUERY, labels=labels - ) - query_job = client.query(sql_query, job_config=job_config) - - rows = query_job.result() - results = [dict(row) for row in rows] - - tool_output = ToolOutput(status="success", results=results).model_dump( - exclude_none=True - ) - return json.dumps(tool_output, ensure_ascii=False, default=str) - - -@tool -@handle_tool_errors( - instructions={ - GoogleAPIError.NOT_FOUND: ("Dictionary table not found for this dataset.") - } -) -def decode_table_values( - table_gcp_id: str, - config: RunnableConfig, - column_name: str | None = None, -) -> str: - """Decode coded values from a table. - - Use when column values appear to be codes (e.g., 1,2,3 or A,B,C). - Many datasets use codes for storage efficiency. This tool provides - the authoritative meanings of these codes. 
- - Args: - table_gcp_id (str): Full BigQuery table reference. - column_name (str | None, optional): Column with coded values. If `None`, - all columns will be used. Defaults to `None`. - - Returns: - str: JSON array with chave (code) and valor (meaning) mappings. - """ - # noqa: E501 - try: - project_name, dataset_name, table_name = table_gcp_id.split(".") - except ValueError: - raise ToolError( - message=f"Invalid table reference: '{table_gcp_id}'", - error_type="INVALID_TABLE_REFERENCE", - instructions="Provide a valid table reference in the format `project.dataset.table`", - ) - - client = get_bigquery_client() - - dataset_id = f"{project_name}.{dataset_name}" - dict_table_id = f"{dataset_id}.dicionario" - - search_query = f""" - SELECT nome_coluna, chave, valor - FROM {dict_table_id} - WHERE id_tabela = '{table_name}' - """ - - if column_name is not None: - search_query += f"AND nome_coluna = '{column_name}'" - - search_query += "ORDER BY nome_coluna, chave" - - labels = { - "thread_id": config.get("configurable", {}).get("thread_id", "unknown"), - "user_id": config.get("configurable", {}).get("user_id", "unknown"), - "tool_name": inspect.currentframe().f_code.co_name, - } - - job_config = bq.QueryJobConfig(labels=labels) - query_job = client.query(search_query, job_config=job_config) - - rows = query_job.result() - results = [dict(row) for row in rows] - - tool_output = ToolOutput(status="success", results=results).model_dump( - exclude_none=True - ) - return json.dumps(tool_output, ensure_ascii=False, default=str) - - -class BDToolkit: - @staticmethod - def get_tools() -> list[BaseTool]: - """Return all available tools for Base dos Dados database interaction. - - This function provides a complete set of tools for discovering, exploring, - and querying Brazilian public datasets through the Base dos Dados platform. 
- - Returns: - list[BaseTool]: A list of LangChain tool functions in suggested usage order: - - search_datasets: Find datasets using keywords - - get_dataset_details: Get comprehensive dataset information - - execute_bigquery_sql: Execute SQL queries against BigQuery tables - - decode_table_values: Decode coded values using dictionary tables - """ - return [ - search_datasets, - get_dataset_details, - execute_bigquery_sql, - decode_table_values, - ] diff --git a/app/agent/tools/__init__.py b/app/agent/tools/__init__.py new file mode 100644 index 0000000..c1f9fa3 --- /dev/null +++ b/app/agent/tools/__init__.py @@ -0,0 +1,32 @@ +from langchain_core.tools import BaseTool + +from app.agent.tools.api import get_dataset_details, get_table_details, search_datasets +from app.agent.tools.bigquery import decode_table_values, execute_bigquery_sql + + +class BDToolkit: + @staticmethod + def get_tools() -> list[BaseTool]: + """Return all available tools for Base dos Dados database interaction. + + This function provides a complete set of tools for discovering, exploring, + and querying Brazilian public datasets through the Base dos Dados platform. 
+ + Returns: + list[BaseTool]: Tools in suggested usage order: + - search_datasets: Find datasets using keywords + - get_dataset_details: Get comprehensive dataset information + - get_table_details: Get comprehensive table information + - execute_bigquery_sql: Execute SQL queries against BigQuery tables + - decode_table_values: Decode coded values using dictionary tables + """ + return [ + search_datasets, + get_dataset_details, + get_table_details, + execute_bigquery_sql, + decode_table_values, + ] + + +__all__ = ["BDToolkit"] diff --git a/app/agent/tools/api.py b/app/agent/tools/api.py new file mode 100644 index 0000000..9993b9f --- /dev/null +++ b/app/agent/tools/api.py @@ -0,0 +1,300 @@ +import json + +import httpx +from langchain_core.tools import tool + +from app.agent.tools.exceptions import handle_tool_errors +from app.agent.tools.models import ( + Column, + Dataset, + DatasetOverview, + Table, + TableOverview, +) +from app.agent.tools.queries import DATASET_DETAILS_QUERY, TABLE_DETAILS_QUERY +from app.settings import settings + +# httpx default timeout +TIMEOUT = 5.0 + +# httpx read timeout +READ_TIMEOUT = 60.0 + +# maximum number of datasets returned on search +PAGE_SIZE = 10 + +# url for searching datasets +SEARCH_URL = f"{settings.BASEDOSDADOS_BASE_URL}/search/" + +# URL for fetching dataset details +GRAPHQL_URL = f"{settings.BASEDOSDADOS_BASE_URL}/graphql" + +# URL for fetching usage guides +BASE_USAGE_GUIDE_URL = "https://raw.githubusercontent.com/basedosdados/website/refs/heads/main/next/content/userGuide/pt" + +_client = httpx.Client(timeout=httpx.Timeout(TIMEOUT, read=READ_TIMEOUT)) + + +@tool +@handle_tool_errors +def search_datasets(query: str) -> str: + """Search for datasets in Base dos Dados using keywords. + + CRITICAL: Use individual KEYWORDS only, not full sentences. The search engine uses Elasticsearch. + + Args: + query (str): 2-3 keywords maximum. Use Portuguese terms, organization acronyms, or dataset acronyms. 
+ Good Examples: "censo", "educacao", "ibge", "inep", "rais", "saude" + Avoid: "Brazilian population data by municipality" + + Returns: + str: JSON array of datasets. If empty/irrelevant results, try different keywords. + + Strategy: Start with broad terms like "censo", "ibge", "inep", "rais", then get specific if needed. + Next step: Use `get_dataset_details()` with returned dataset IDs. + """ # noqa: E501 + response = _client.get( + url=SEARCH_URL, + params={"contains": "tables", "q": query, "page_size": PAGE_SIZE}, + ) + + response.raise_for_status() + data: dict = response.json() + + datasets = data.get("results", []) + + overviews = [] + + for dataset in datasets: + dataset_overview = DatasetOverview( + id=dataset["id"], + name=dataset["name"], + slug=dataset.get("slug"), + description=dataset.get("description"), + tags=[tag["name"] for tag in dataset.get("tags", [])], + themes=[theme["name"] for theme in dataset.get("themes", [])], + organizations=[org["name"] for org in dataset.get("organizations", [])], + ) + overviews.append(dataset_overview.model_dump()) + + return json.dumps(overviews, ensure_ascii=False, indent=2) + + +@tool +@handle_tool_errors +def get_dataset_details(dataset_id: str) -> str: + """Get comprehensive details about a specific dataset including all its tables. + + Use AFTER `search_datasets()` to understand data structure before writing queries. + + Args: + dataset_id (str): Dataset ID obtained from `search_datasets()`. + This is typically a UUID-like string, not the human-readable name. 
+
+    Returns:
+        str: JSON object with complete dataset information, including:
+        - Basic metadata (name, description, tags, themes, organizations)
+        - tables: Array of all tables in the dataset with:
+            - gcp_id: Full BigQuery table reference (`project.dataset.table`)
+            - temporal coverage: Authoritative temporal coverage for the table
+            - table descriptions explaining what each table contains
+        - usage_guide: Key information and best practices for using the dataset.
+
+    Next step: Use `get_table_details()` with returned table IDs.
+    """  # noqa: E501
+    response = _client.post(
+        url=GRAPHQL_URL,
+        json={
+            "query": DATASET_DETAILS_QUERY,
+            "variables": {"id": dataset_id},
+        },
+    )
+
+    response.raise_for_status()
+    data: dict[str, dict[str, dict]] = response.json()
+
+    all_datasets = data.get("data", {}).get("allDataset") or {}
+    dataset_edges = all_datasets.get("edges", [])
+
+    if not dataset_edges:
+        raise ValueError(
+            f"Dataset '{dataset_id}' not found. Verify the dataset ID from search_datasets results."
+        )
+
+    dataset = dataset_edges[0]["node"]
+
+    dataset_id = dataset["id"].split("DatasetNode:")[-1]
+    dataset_name = dataset["name"]
+    dataset_slug = dataset.get("slug")
+    dataset_description = dataset.get("description")
+
+    # Tags
+    dataset_tags = []
+
+    for edge in dataset.get("tags", {}).get("edges", []):
+        if tag := edge.get("node", {}).get("name"):
+            dataset_tags.append(tag)
+
+    # Themes
+    dataset_themes = []
+
+    for edge in dataset.get("themes", {}).get("edges", []):
+        if theme := edge.get("node", {}).get("name"):
+            dataset_themes.append(theme)
+
+    # Organizations
+    dataset_organizations = []
+
+    for edge in dataset.get("organizations", {}).get("edges", []):
+        if org := edge.get("node", {}).get("name"):
+            dataset_organizations.append(org)
+
+    # Tables
+    dataset_tables = []
+    gcp_dataset_id = None
+
+    for edge in dataset.get("tables", {}).get("edges", []):
+        table = edge["node"]
+
+        table_id = table["id"].split("TableNode:")[-1]
+        table_name = table["name"]
+        table_slug = table.get("slug")
+        table_description = table.get("description")
+        table_temporal_coverage = table.get("temporalCoverage")
+
+        cloud_table_edges = table["cloudTables"]["edges"]
+        if cloud_table_edges:
+            cloud_table = cloud_table_edges[0]["node"]
+            gcp_project_id = cloud_table["gcpProjectId"]
+            gcp_dataset_id = gcp_dataset_id or cloud_table["gcpDatasetId"]
+            gcp_table_id = cloud_table["gcpTableId"]
+            table_gcp_id = f"{gcp_project_id}.{gcp_dataset_id}.{gcp_table_id}"
+        else:
+            table_gcp_id = None
+
+        dataset_tables.append(
+            TableOverview(
+                id=table_id,
+                gcp_id=table_gcp_id,
+                name=table_name,
+                slug=table_slug,
+                description=table_description,
+                temporal_coverage=table_temporal_coverage,
+            )
+        )
+
+    # Fetch usage guide
+    usage_guide = None
+
+    if gcp_dataset_id is not None:
+        filename = gcp_dataset_id.replace("_", "-")
+
+        response = _client.get(f"{BASE_USAGE_GUIDE_URL}/{filename}.md")
+
+        if response.status_code == httpx.codes.OK:
+            usage_guide = response.text.strip()
+
+    result = Dataset(
+ id=dataset_id, + name=dataset_name, + slug=dataset_slug, + description=dataset_description, + tags=dataset_tags, + themes=dataset_themes, + organizations=dataset_organizations, + tables=dataset_tables, + usage_guide=usage_guide, + ) + + return result.model_dump_json(indent=2) + + +@tool +@handle_tool_errors +def get_table_details(table_id: str) -> str: + """Get comprehensive details about a specific table including all its columns. + + Use AFTER `get_dataset_details()` to understand table structure before writing queries. + + Args: + table_id (str): Table ID obtained from `get_dataset_details()`. + This is typically a UUID-like string, not the human-readable name. + + Returns: + str: JSON object with complete table information, including: + - Basic metadata (name, description, slug) + - gcp_id: Full BigQuery table reference (`project.dataset.table`) + - temporal coverage: Authoritative temporal coverage for the table + - columns: All column names, types, and descriptions + + Next step: Use `execute_bigquery_sql()` to execute queries. + """ + response = _client.post( + url=GRAPHQL_URL, + json={ + "query": TABLE_DETAILS_QUERY, + "variables": {"id": table_id}, + }, + ) + + response.raise_for_status() + data: dict[str, dict[str, dict]] = response.json() + + all_tables = data.get("data", {}).get("allTable") or {} + table_edges = all_tables.get("edges", []) + + if not table_edges: + raise ValueError( + f"Table '{table_id}' not found. Verify the table ID from get_dataset_details results." 
+ ) + + table = table_edges[0]["node"] + + table_id = table["id"].split("TableNode:")[-1] + table_name = table["name"] + table_slug = table.get("slug") + table_description = table.get("description") + table_temporal_coverage = table.get("temporalCoverage") + + cloud_table_edges = table["cloudTables"]["edges"] + if cloud_table_edges: + cloud_table = cloud_table_edges[0]["node"] + gcp_project_id = cloud_table["gcpProjectId"] + gcp_dataset_id = cloud_table["gcpDatasetId"] + gcp_table_id = cloud_table["gcpTableId"] + table_gcp_id = f"{gcp_project_id}.{gcp_dataset_id}.{gcp_table_id}" + else: + table_gcp_id = None + + table_columns = [] + for edge in table["columns"]["edges"]: + column = edge["node"] + + directory_primary_key = column["directoryPrimaryKey"] + + if directory_primary_key is not None: + directory_table = directory_primary_key["table"] + directory_table_id = directory_table["id"].split("TableNode:")[-1] + else: + directory_table_id = None + + table_columns.append( + Column( + name=column["name"], + type=column["bigqueryType"]["name"], + description=column.get("description"), + reference_table_id=directory_table_id, + ) + ) + + result = Table( + id=table_id, + gcp_id=table_gcp_id, + name=table_name, + slug=table_slug, + description=table_description, + temporal_coverage=table_temporal_coverage, + columns=table_columns, + ) + + return result.model_dump_json(indent=2) diff --git a/app/agent/tools/bigquery.py b/app/agent/tools/bigquery.py new file mode 100644 index 0000000..82b9b45 --- /dev/null +++ b/app/agent/tools/bigquery.py @@ -0,0 +1,138 @@ +import inspect +import json +from functools import cache + +from google.api_core.exceptions import GoogleAPICallError +from google.cloud import bigquery as bq +from langchain_core.runnables import RunnableConfig +from langchain_core.tools import tool + +from app.agent.tools.exceptions import handle_tool_errors +from app.settings import settings + +MAX_BYTES_BILLED = 10 * 10**9 + + +@cache +def _get_client() -> 
bq.Client:  # pragma: no cover
+    return bq.Client(
+        project=settings.GOOGLE_BIGQUERY_PROJECT,
+        credentials=settings.GOOGLE_CREDENTIALS,
+    )
+
+
+@tool
+@handle_tool_errors
+def execute_bigquery_sql(sql_query: str, config: RunnableConfig) -> str:
+    """Execute a SQL query against BigQuery tables from the Base dos Dados database.
+
+    Use AFTER identifying the right datasets and understanding table structure.
+    A 10GB processing limit is enforced for safety.
+
+    Args:
+        sql_query (str): Standard GoogleSQL query. Must reference
+            tables using their full `gcp_id` from `get_dataset_details()`.
+
+    Best practices:
+    - Use fully qualified names: `project.dataset.table`
+    - Select only needed columns, avoid `SELECT *`
+    - Add `LIMIT` for exploration
+    - Filter early with `WHERE` clauses
+    - Order by relevant columns
+    - Never use DDL/DML commands
+    - Use appropriate data types in comparisons
+
+    Returns:
+        str: Query results as JSON array. Empty results return "[]".
+    """  # noqa: E501
+    client = _get_client()
+
+    dry_run = client.query(
+        sql_query, job_config=bq.QueryJobConfig(dry_run=True, use_query_cache=False)
+    )
+
+    if dry_run.statement_type != "SELECT":
+        raise ValueError(
+            f"Only SELECT statements are allowed, got {dry_run.statement_type}."
+        )
+
+    labels = {
+        "thread_id": config.get("configurable", {}).get("thread_id", "unknown"),
+        "user_id": config.get("configurable", {}).get("user_id", "unknown"),
+        "tool_name": inspect.currentframe().f_code.co_name,
+    }
+
+    try:
+        job = client.query(
+            sql_query,
+            job_config=bq.QueryJobConfig(
+                maximum_bytes_billed=MAX_BYTES_BILLED, labels=labels
+            ),
+        )
+        results = [dict(row) for row in job.result()]
+    except GoogleAPICallError as e:
+        reason = e.errors[0].get("reason") if getattr(e, "errors", None) else None
+        if reason == "bytesBilledLimitExceeded":
+            raise ValueError(
+                f"Query exceeds the {MAX_BYTES_BILLED // 10**9}GB processing limit. Add WHERE filters or select fewer columns."
+            ) from e
+        raise
+
+    return json.dumps(results, ensure_ascii=False, indent=2, default=str)
+
+
+@tool
+@handle_tool_errors
+def decode_table_values(
+    table_gcp_id: str, config: RunnableConfig, column_name: str | None = None
+) -> str:
+    """Decode coded values from a table using its dataset's `dicionario` table.
+
+    Use when column values appear to be codes (e.g., 1,2,3 or A,B,C) and the
+    column does NOT have a `reference_table_id` in `get_table_details()` metadata.
+
+    Args:
+        table_gcp_id (str): Full BigQuery table reference.
+        column_name (str | None, optional): Column with coded values. If `None`,
+            all columns will be used. Defaults to `None`.
+
+    Returns:
+        str: JSON array with chave (code) and valor (meaning) mappings.
+    """
+    try:
+        project_name, dataset_name, table_name = table_gcp_id.split(".")
+    except ValueError:
+        raise ValueError(
+            f"Invalid table reference: '{table_gcp_id}'. Expected format: project.dataset.table"
+        )
+
+    dict_table_id = f"{project_name}.{dataset_name}.dicionario"
+
+    search_query = f"""
+    SELECT nome_coluna, chave, valor
+    FROM {dict_table_id}
+    WHERE id_tabela = '{table_name}'
+    """
+
+    if column_name is not None:
+        search_query += f" AND nome_coluna = '{column_name}'"
+
+    search_query += " ORDER BY nome_coluna, chave"
+
+    labels = {
+        "thread_id": config.get("configurable", {}).get("thread_id", "unknown"),
+        "user_id": config.get("configurable", {}).get("user_id", "unknown"),
+        "tool_name": inspect.currentframe().f_code.co_name,
+    }
+
+    try:
+        client = _get_client()
+        job = client.query(search_query, job_config=bq.QueryJobConfig(labels=labels))
+        results = [dict(row) for row in job.result()]
+    except GoogleAPICallError as e:
+        reason = e.errors[0].get("reason") if getattr(e, "errors", None) else None
+        if reason == "notFound":
+            raise ValueError("Dictionary table not found for this dataset.") from e
+        raise
+
+    return json.dumps(results, ensure_ascii=False, indent=2, default=str)
diff --git a/app/agent/tools/exceptions.py
b/app/agent/tools/exceptions.py new file mode 100644 index 0000000..67b5e51 --- /dev/null +++ b/app/agent/tools/exceptions.py @@ -0,0 +1,32 @@ +from collections.abc import Callable +from functools import wraps +from typing import Any, Literal + +from pydantic import BaseModel + + +class ToolError(BaseModel): + "Error response format for agents." + + status: Literal["error"] = "error" + message: str + + +def handle_tool_errors(func: Callable[..., Any]) -> Callable[..., Any]: + """Decorator that catches exceptions raised by a tool and returns them as structured errors. + + Args: + func (Callable[..., Any]): Function to wrap. + + Returns: + Callable[..., Any]: Wrapped function. + """ + + @wraps(func) + def wrapper(*args, **kwargs) -> Any: + try: + return func(*args, **kwargs) + except Exception as e: + return ToolError(message=str(e)).model_dump_json(indent=2) + + return wrapper diff --git a/app/agent/tools/models.py b/app/agent/tools/models.py new file mode 100644 index 0000000..bed2c5f --- /dev/null +++ b/app/agent/tools/models.py @@ -0,0 +1,46 @@ +from pydantic import BaseModel, Field + + +class Column(BaseModel): + """Complete column information.""" + + name: str + type: str + description: str | None + reference_table_id: str | None = Field(exclude_if=lambda v: v is None) + + +class TableOverview(BaseModel): + """Basic table information without column details.""" + + id: str + gcp_id: str | None + name: str + slug: str | None + description: str | None + temporal_coverage: dict[str, str | None] + + +class Table(TableOverview): + """Complete table information including all its columns.""" + + columns: list[Column] + + +class DatasetOverview(BaseModel): + """Basic dataset information without table details.""" + + id: str + name: str + slug: str | None + description: str | None + tags: list[str] + themes: list[str] + organizations: list[str] + + +class Dataset(DatasetOverview): + """Complete dataset information including all tables and columns.""" + + tables: 
list[TableOverview] + usage_guide: str | None diff --git a/app/agent/tools/queries.py b/app/agent/tools/queries.py new file mode 100644 index 0000000..4dd7e44 --- /dev/null +++ b/app/agent/tools/queries.py @@ -0,0 +1,98 @@ +DATASET_DETAILS_QUERY = """ +query getDatasetDetails($id: ID!) { + allDataset(id: $id, first: 1) { + edges { + node { + id + name + slug + description + organizations { + edges { + node { + name + slug + } + } + } + themes { + edges { + node { + name + } + } + } + tags { + edges { + node { + name + } + } + } + tables { + edges { + node { + id + name + slug + description + temporalCoverage + cloudTables { + edges { + node { + gcpProjectId + gcpDatasetId + gcpTableId + } + } + } + } + } + } + } + } + } +} +""" + +TABLE_DETAILS_QUERY = """ +query getTableDetails($id: ID!) { + allTable(id: $id, first: 1){ + edges { + node { + id + name + slug + description + temporalCoverage + cloudTables { + edges { + node { + gcpProjectId + gcpDatasetId + gcpTableId + } + } + } + columns { + edges { + node { + id + name + description + bigqueryType { + name + } + directoryPrimaryKey { + table { + id + } + } + } + } + } + } + } + } +} +""" diff --git a/app/api/streaming.py b/app/api/streaming.py index 9dad56c..6371c41 100644 --- a/app/api/streaming.py +++ b/app/api/streaming.py @@ -122,6 +122,35 @@ def _truncate_json( return json.dumps(data, ensure_ascii=False, indent=2) +def _parse_thinking(message: AIMessage) -> str | None: + """Parse thinking content from an AI message. + + Some models (e.g., Gemini 3) return `message.content` as a list of typed blocks, + which may include `{"type": "thinking", "thinking": "..."}` entries. When + `content` is a plain string, no thinking is available. + + Args: + message (AIMessage): The AI message from where to parse the thinking. + + Returns: + str | None: The concatenated thinking text, or None if no thinking blocks exist. 
+ """ + if isinstance(message.content, str): + return None + + blocks = [ + block + for block in message.content + if isinstance(block, dict) + and block.get("type") == "thinking" + and isinstance(block.get("thinking"), str) + ] + + thinking = "".join(block["thinking"] for block in blocks) + + return thinking or None + + def _process_chunk(chunk: dict[str, Any]) -> StreamEvent | None: """Process a streaming chunk from a react agent workflow into a standardized StreamEvent. @@ -154,11 +183,14 @@ def _process_chunk(chunk: dict[str, Any]) -> StreamEvent | None: ) for tool_call in message.tool_calls ] + thinking = _parse_thinking(message) else: event_type = "final_answer" tool_calls = None + thinking = None - event_data = EventData(content=message.text, tool_calls=tool_calls) + content = thinking or message.text + event_data = EventData(content=content, tool_calls=tool_calls) return StreamEvent(type=event_type, data=event_data) elif "tools" in chunk: diff --git a/app/main.py b/app/main.py index 524bc1c..b2d38ee 100644 --- a/app/main.py +++ b/app/main.py @@ -1,4 +1,5 @@ from contextlib import asynccontextmanager +from datetime import date from fastapi import FastAPI from fastapi.responses import RedirectResponse @@ -53,6 +54,8 @@ async def lifespan(app: FastAPI): # pragma: no cover model=settings.MODEL_URI, temperature=settings.MODEL_TEMPERATURE, credentials=settings.GOOGLE_CREDENTIALS, + thinking_level=settings.THINKING_LEVEL, + include_thoughts=True, ) summ_middleware = SummarizationMiddleware( @@ -79,7 +82,9 @@ async def lifespan(app: FastAPI): # pragma: no cover agent = create_agent( model=model, tools=BDToolkit.get_tools(), - system_prompt=SYSTEM_PROMPT, + system_prompt=SYSTEM_PROMPT.format( + current_date=date.today().isoformat() + ), middleware=[summ_middleware, limit_middleware], checkpointer=checkpointer, ) diff --git a/app/settings.py b/app/settings.py index e64a2a7..63484c7 100644 --- a/app/settings.py +++ b/app/settings.py @@ -111,6 +111,9 @@ def 
GOOGLE_CREDENTIALS(self) -> Credentials:  # pragma: no cover
             "lower ones make them more deterministic."
         )
     )
+    THINKING_LEVEL: Literal["minimum", "low", "medium", "high"] = Field(
+        description="Controls the amount of thinking Gemini models perform before returning a response."
+    )
 
     # ============================================================
     # == LangSmith settings ==
diff --git a/pyproject.toml b/pyproject.toml
index 11ca196..4d90572 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -16,7 +16,7 @@ dependencies = [
     "langsmith>=0.6.0",
     "loguru>=0.7.3",
     "psycopg[binary]>=3.3.2",
-    "pydantic<2.12.0",
+    "pydantic>=2.12.0",
     "pydantic-settings>=2.12.0",
     "pyjwt>=2.10.1",
     "sqlmodel>=0.0.31",
diff --git a/tests/app/agent/test_tools.py b/tests/app/agent/test_tools.py
deleted file mode 100644
index 0e07b76..0000000
--- a/tests/app/agent/test_tools.py
+++ /dev/null
@@ -1,728 +0,0 @@
-import json
-from unittest.mock import MagicMock
-
-import httpx
-import pytest
-import respx
-from google.api_core.exceptions import BadRequest, NotFound
-from google.cloud import bigquery as bq
-from pydantic import ValidationError
-from pytest_mock import MockerFixture
-
-from app.agent.tools import (
-    BDToolkit,
-    ToolError,
-    ToolOutput,
-    decode_table_values,
-    execute_bigquery_sql,
-    get_dataset_details,
-    handle_tool_errors,
-    search_datasets,
-)
-from app.settings import settings
-
-
-class TestHandleToolErrors:
-    """Tests for handle_tool_errors decorator."""
-
-    def test_decorator_passes_through_success(self):
-        """Test decorator returns function result on success."""
-
-        @handle_tool_errors
-        def successful_function():
-            return '{"status": "success", "results": "test results"}'
-
-        output = ToolOutput.model_validate(json.loads(successful_function()))
-
-        assert output.status == "success"
-        assert output.results == "test results"
-        assert output.error_details is None
-
-    def test_decorator_catches_google_api_error(self):
-        """Test decorator catches GoogleAPICallError."""
-
-        
@handle_tool_errors - def failing_function(): - error = BadRequest( - message="Some bad request", - errors=[{"reason": "testReason", "message": "Test message"}], - ) - raise error - - output = ToolOutput.model_validate(json.loads(failing_function())) - - assert output.status == "error" - assert output.results is None - assert output.error_details.error_type == "testReason" - assert output.error_details.message == "Test message" - assert output.error_details.instructions is None - - def test_decorator_catches_google_api_error_without_errors(self): - """Test decorator catches GoogleAPICallError.""" - - @handle_tool_errors - def failing_function(): - error = BadRequest(message="Some bad request") - raise error - - output = ToolOutput.model_validate(json.loads(failing_function())) - - assert output.status == "error" - assert output.results is None - assert output.error_details.error_type is None - assert output.error_details.message == f"{BadRequest.code} Some bad request" - assert output.error_details.instructions is None - - def test_decorator_catches_tool_error(self): - """Test decorator catches ToolError.""" - - @handle_tool_errors - def failing_function(): - raise ToolError( - "Custom error", error_type="CUSTOM", instructions="Try again" - ) - - output = ToolOutput.model_validate(json.loads(failing_function())) - - assert output.status == "error" - assert output.results is None - assert output.error_details.error_type == "CUSTOM" - assert output.error_details.message == "Custom error" - assert output.error_details.instructions == "Try again" - - def test_decorator_catches_unexpected_exception(self): - """Test decorator catches unexpected exceptions.""" - - @handle_tool_errors - def failing_function(): - raise ValueError("This is a value error") - - output = ToolOutput.model_validate(json.loads(failing_function())) - - assert output.status == "error" - assert output.results is None - assert output.error_details.error_type is None - assert 
output.error_details.message == "Unexpected error: This is a value error" - assert output.error_details.instructions is None - - def test_decorator_with_custom_instructions(self): - """Test decorator with custom instructions mapping.""" - - @handle_tool_errors(instructions={"testReason": "Custom instruction"}) - def failing_function(): - error = BadRequest( - message="Some bad request", - errors=[{"reason": "testReason", "message": "Test message"}], - ) - raise error - - output = ToolOutput.model_validate(json.loads(failing_function())) - - assert output.status == "error" - assert output.results is None - assert output.error_details.error_type == "testReason" - assert output.error_details.message == "Test message" - assert output.error_details.instructions == "Custom instruction" - - -class TestToolOutput: - """Tests for ToolOutput model validation.""" - - def test_valid_success_output(self): - """Test valid success output with results.""" - output = ToolOutput(status="success", results={"data": "test"}) - - assert output.status == "success" - assert output.results == {"data": "test"} - assert output.error_details is None - - def test_valid_error_output(self): - """Test valid success output with results.""" - from app.agent.tools import ErrorDetails - - error_details = ErrorDetails(message="error") - - output = ToolOutput(status="error", error_details=error_details) - - assert output.status == "error" - assert output.results is None - assert output.error_details == error_details - - def test_invalid_both_results_and_error(self): - """Test validation fails when both results and error_details are set.""" - from app.agent.tools import ErrorDetails - - with pytest.raises(ValidationError): - ToolOutput( - status="error", - results={"data": "test"}, - error_details=ErrorDetails(message="error"), - ) - - def test_invalid_neither_results_nor_error(self): - """Test validation fails when neither results nor error_details are set.""" - with pytest.raises(ValidationError): - 
ToolOutput(status="success", results=None, error_details=None) - - -class TestSearchDatasets: - """Tests for search_datasets tool.""" - - SEARCH_ENDPOINT = f"{settings.BASEDOSDADOS_BASE_URL}/search/" - - @respx.mock - def test_search_datasets_returns_overviews(self): - """Test successful dataset search.""" - mock_response = { - "results": [ - { - "id": "dataset-1", - "name": "Test Dataset", - "slug": "test_dataset", - "description": "Dataset description", - "tags": [{"name": "tag1"}, {"name": "tag2"}], - "themes": [{"name": "theme1"}, {"name": "theme2"}], - "organizations": [{"name": "org1"}], - } - ] - } - - respx.get(self.SEARCH_ENDPOINT).mock( - return_value=httpx.Response(200, json=mock_response) - ) - - result = search_datasets.invoke({"query": "test"}) - output = ToolOutput.model_validate(json.loads(result)) - - assert output.status == "success" - assert len(output.results) == 1 - - dataset = output.results[0] - - assert dataset["id"] == "dataset-1" - assert dataset["name"] == "Test Dataset" - assert dataset["slug"] == "test_dataset" - assert dataset["description"] == "Dataset description" - assert dataset["tags"] == ["tag1", "tag2"] - assert dataset["themes"] == ["theme1", "theme2"] - assert dataset["organizations"] == ["org1"] - - assert output.error_details is None - - @respx.mock - def test_search_datasets_returns_empty_results(self): - """Test successful dataset search with no results.""" - respx.get(self.SEARCH_ENDPOINT).mock( - return_value=httpx.Response(200, json={"results": []}) - ) - - result = search_datasets.invoke({"query": "nonexistent"}) - output = ToolOutput.model_validate(json.loads(result)) - - assert output.status == "success" - assert output.results == [] - assert output.error_details is None - - -class TestGetDatasetDetails: - """Tests for get_dataset_details tool.""" - - GRAPHQL_URL = f"{settings.BASEDOSDADOS_BASE_URL}/graphql" - - @pytest.fixture - def mock_response(self): - return { - "data": { - "allDataset": { - "edges": [ - { - 
"node": { - "id": "dataset-1", - "name": "Test Dataset", - "slug": "test_dataset", - "description": "Dataset description", - "tags": {"edges": [{"node": {"name": "tag1"}}]}, - "themes": {"edges": [{"node": {"name": "theme1"}}]}, - "organizations": { - "edges": [ - {"node": {"name": "org1", "slug": "org1_slug"}} - ] - }, - "tables": { - "edges": [ - { - "node": { - "id": "table-1", - "name": "Test Table", - "slug": "test_table", - "description": "Table description", - "temporalCoverage": { - "start": "2020", - "end": "2023", - }, - "cloudTables": { - "edges": [ - { - "node": { - "gcpProjectId": "basedosdados", - "gcpDatasetId": "test_dataset", - "gcpTableId": "test_table", - } - } - ] - }, - "columns": { - "edges": [ - { - "node": { - "id": "col-1", - "name": "column_name", - "description": "Column description", - "bigqueryType": { - "name": "COLUMN_TYPE" - }, - } - } - ] - }, - } - } - ] - }, - } - } - ] - } - } - } - - @respx.mock - def test_get_dataset_details_success(self, mock_response): - """Test successful dataset details retrieval.""" - # Mock graphql endpoint - respx.post(self.GRAPHQL_URL).mock( - return_value=httpx.Response(200, json=mock_response) - ) - - # Mock usage guide (not found) - respx.get(url__startswith="https://raw.githubusercontent.com").mock( - return_value=httpx.Response(404) - ) - - result = get_dataset_details.invoke({"dataset_id": "dataset-1"}) - output = ToolOutput.model_validate(json.loads(result)) - - dataset = output.results - - assert output.status == "success" - assert dataset["id"] == "dataset-1" - assert dataset["name"] == "Test Dataset" - assert dataset["slug"] == "test_dataset" - assert dataset["description"] == "Dataset description" - assert dataset["tags"] == ["tag1"] - assert dataset["themes"] == ["theme1"] - assert dataset["organizations"] == ["org1"] - assert dataset["usage_guide"] is None - - assert len(dataset["tables"]) == 1 - - table = dataset["tables"][0] - - assert table["id"] == "table-1" - assert table["gcp_id"] == 
"basedosdados.test_dataset.test_table" - assert table["name"] == "Test Table" - assert table["slug"] == "test_table" - assert table["description"] == "Table description" - assert table["temporal_coverage"] == {"start": "2020", "end": "2023"} - - assert len(table["columns"]) == 1 - - column = table["columns"][0] - - assert column["name"] == "column_name" - assert column["type"] == "COLUMN_TYPE" - assert column["description"] == "Column description" - - assert output.error_details is None - - @respx.mock - def test_get_dataset_details_success_with_usage_guide(self, mock_response): - """Test dataset details with usage guide available.""" - respx.post(self.GRAPHQL_URL).mock( - return_value=httpx.Response(200, json=mock_response) - ) - - respx.get(url__startswith="https://raw.githubusercontent.com").mock( - return_value=httpx.Response(200, text="# This is a usage guide.") - ) - - result = get_dataset_details.invoke({"dataset_id": "dataset-1"}) - output = ToolOutput.model_validate(json.loads(result)) - - assert output.status == "success" - assert output.results["usage_guide"] == "# This is a usage guide." 
- assert output.error_details is None - - @respx.mock - def test_table_without_tags_themes_orgs(self): - """Test dataset with table that has no tags, themes and orgs.""" - mock_response = { - "data": { - "allDataset": { - "edges": [ - { - "node": { - "id": "dataset-1", - "name": "Test Dataset", - "slug": "test_dataset", - "description": "Dataset description", - "tags": {"edges": [{"node": {}}]}, - "themes": {"edges": [{"node": {}}]}, - "organizations": {"edges": [{"node": {}}]}, - "tables": { - "edges": [ - { - "node": { - "id": "table-1", - "name": "Test Table", - "slug": "test_table", - "description": "Table description", - "temporalCoverage": { - "start": "2020", - "end": "2023", - }, - "cloudTables": { - "edges": [ - { - "node": { - "gcpProjectId": "basedosdados", - "gcpDatasetId": "test_dataset", - "gcpTableId": "test_table", - } - } - ] - }, - "columns": { - "edges": [ - { - "node": { - "id": "col-1", - "name": "column_name", - "description": "Column description", - "bigqueryType": { - "name": "COLUMN_TYPE" - }, - } - } - ] - }, - } - } - ] - }, - } - } - ] - } - } - } - - respx.post(self.GRAPHQL_URL).mock( - return_value=httpx.Response(200, json=mock_response) - ) - - respx.get(url__startswith="https://raw.githubusercontent.com").mock( - return_value=httpx.Response(200) - ) - - result = get_dataset_details.invoke({"dataset_id": "dataset-1"}) - output = ToolOutput.model_validate(json.loads(result)) - - assert output.status == "success" - assert output.results["tags"] == [] - assert output.results["themes"] == [] - assert output.results["organizations"] == [] - assert output.error_details is None - - @respx.mock - def test_table_without_cloud_tables(self): - """Test dataset with table that has no cloud tables.""" - mock_response = { - "data": { - "allDataset": { - "edges": [ - { - "node": { - "id": "dataset-1", - "name": "Test Dataset", - "slug": "test_dataset", - "description": "Dataset description", - "tags": {"edges": [{"node": {"name": "tag1"}}]}, - 
"themes": {"edges": [{"node": {"name": "theme1"}}]}, - "organizations": { - "edges": [ - {"node": {"name": "org1", "slug": "org1_slug"}} - ] - }, - "tables": { - "edges": [ - { - "node": { - "id": "table-1", - "name": "Test Table", - "slug": "test_table", - "description": "Table description", - "temporalCoverage": { - "start": "2020", - "end": "2023", - }, - "cloudTables": {"edges": []}, - "columns": { - "edges": [ - { - "node": { - "id": "col-1", - "name": "column_name", - "description": "Column description", - "bigqueryType": { - "name": "COLUMN_TYPE" - }, - } - } - ] - }, - } - } - ] - }, - } - } - ] - } - } - } - - respx.post(self.GRAPHQL_URL).mock( - return_value=httpx.Response(200, json=mock_response) - ) - - result = get_dataset_details.invoke({"dataset_id": "dataset-1"}) - output = ToolOutput.model_validate(json.loads(result)) - - assert output.status == "success" - assert output.results["tables"][0]["gcp_id"] is None - assert output.results["usage_guide"] is None - assert output.error_details is None - - @respx.mock - def test_get_dataset_details_dataset_not_found(self): - """Test error when dataset is not found.""" - respx.post(self.GRAPHQL_URL).mock( - return_value=httpx.Response( - 200, json={"data": {"allDataset": {"edges": []}}} - ) - ) - - result = get_dataset_details.invoke({"dataset_id": "nonexistent"}) - output = ToolOutput.model_validate(json.loads(result)) - - assert output.status == "error" - assert output.results is None - assert output.error_details.message == "Dataset nonexistent not found" - assert output.error_details.error_type == "DATASET_NOT_FOUND" - assert ( - output.error_details.instructions - == "Verify the dataset ID from `search_datasets` results" - ) - - -class TestExecuteBigQuerySQL: - """Tests for execute_bigquery_sql tool.""" - - @pytest.fixture - def mock_config(self) -> dict: - return {"configurable": {"thread_id": "test-thread", "user_id": "test-user"}} - - def test_successful_query(self, mocker: MockerFixture, mock_config: 
dict): - """Test successful SELECT query execution.""" - mock_dry_run_query_job = MagicMock() - mock_dry_run_query_job.statement_type = "SELECT" - - mock_query_job = MagicMock() - mock_query_job.result.return_value = [{"col1": "value1"}, {"col1": "value2"}] - - mock_bigquery_client = MagicMock(spec=bq.Client) - mock_bigquery_client.query.side_effect = [ - mock_dry_run_query_job, - mock_query_job, - ] - - mocker.patch( - "app.agent.tools.get_bigquery_client", return_value=mock_bigquery_client - ) - - result = execute_bigquery_sql.invoke( - {"sql_query": "SELECT * FROM project.dataset.table", "config": mock_config} - ) - - output = ToolOutput.model_validate(json.loads(result)) - - assert output.status == "success" - assert output.results == [{"col1": "value1"}, {"col1": "value2"}] - assert output.error_details is None - - def test_forbidden_statement_type(self, mocker: MockerFixture, mock_config: dict): - """Test error when statement is not SELECT.""" - mock_dry_run_query_job = MagicMock() - mock_dry_run_query_job.statement_type = "DELETE" - - mock_bigquery_client = MagicMock(spec=bq.Client) - mock_bigquery_client.query.return_value = mock_dry_run_query_job - - mocker.patch( - "app.agent.tools.get_bigquery_client", return_value=mock_bigquery_client - ) - - result = execute_bigquery_sql.invoke( - {"sql_query": "DELETE FROM project.dataset.table", "config": mock_config} - ) - - output = ToolOutput.model_validate(json.loads(result)) - - assert output.status == "error" - assert output.results is None - assert output.error_details.error_type == "FORBIDDEN_STATEMENT" - assert ( - output.error_details.message - == "Query aborted: Statement DELETE is forbidden." - ) - assert ( - output.error_details.instructions - == "Your access is strictly read-only. Use only SELECT statements." 
- ) - - -class TestDecodeTableValues: - """Tests for decode_table_values tool.""" - - @pytest.fixture - def mock_config(self) -> dict: - return {"configurable": {"thread_id": "test-thread", "user_id": "test-user"}} - - def test_decode_all_columns(self, mocker: MockerFixture, mock_config: dict): - """Test decoding all columns from a table.""" - mock_query_job = MagicMock() - mock_query_job.result.return_value = [ - {"nome_coluna": "col1", "chave": "1", "valor": "Value 1"}, - {"nome_coluna": "col2", "chave": "2", "valor": "Value 2"}, - ] - - mock_bigquery_client = MagicMock(spec=bq.Client) - mock_bigquery_client.query.return_value = mock_query_job - - mocker.patch( - "app.agent.tools.get_bigquery_client", return_value=mock_bigquery_client - ) - - result = decode_table_values.invoke( - {"table_gcp_id": "project.dataset.table", "config": mock_config} - ) - - output = ToolOutput.model_validate(json.loads(result)) - - assert output.status == "success" - assert len(output.results) == 2 - assert output.error_details is None - - def test_decode_specific_column(self, mocker: MockerFixture, mock_config: dict): - """Test decoding a specific column.""" - mock_query_job = MagicMock() - mock_query_job.result.return_value = [ - {"nome_coluna": "col1", "chave": "1", "valor": "Value 1"}, - {"nome_coluna": "col1", "chave": "2", "valor": "Value 2"}, - ] - - mock_bigquery_client = MagicMock(spec=bq.Client) - mock_bigquery_client.query.return_value = mock_query_job - - mocker.patch( - "app.agent.tools.get_bigquery_client", return_value=mock_bigquery_client - ) - - result = decode_table_values.invoke( - { - "table_gcp_id": "project.dataset.table", - "column_name": "col1", - "config": mock_config, - } - ) - - output = ToolOutput.model_validate(json.loads(result)) - - assert output.status == "success" - # Verify column filter was added to query - call_args = mock_bigquery_client.query.call_args[0][0] - assert "nome_coluna = 'col1'" in call_args - - def test_dictionary_not_found(self, 
mocker: MockerFixture, mock_config: dict): - """Test error when dictionary table doesn't exist.""" - error = NotFound( - message="Table not found", - errors=[{"reason": "notFound", "message": "Test message"}], - ) - - mock_bigquery_client = MagicMock(spec=bq.Client) - mock_bigquery_client.query.side_effect = error - - mocker.patch( - "app.agent.tools.get_bigquery_client", return_value=mock_bigquery_client - ) - - result = decode_table_values.invoke( - {"table_gcp_id": "project.dataset.table", "config": mock_config} - ) - - output = ToolOutput.model_validate(json.loads(result)) - - assert output.status == "error" - assert output.results is None - assert output.error_details.error_type == "notFound" - assert output.error_details.message == "Test message" - assert ( - output.error_details.instructions - == "Dictionary table not found for this dataset." - ) - - def test_invalid_table_reference(self, mock_config: dict): - """Test error when table reference format is invalid.""" - result = decode_table_values.invoke( - {"table_gcp_id": "table", "config": mock_config} - ) - - output = ToolOutput.model_validate(json.loads(result)) - - assert output.status == "error" - assert output.results is None - assert output.error_details.error_type == "INVALID_TABLE_REFERENCE" - assert output.error_details.message == "Invalid table reference: 'table'" - assert ( - output.error_details.instructions - == "Provide a valid table reference in the format `project.dataset.table`" - ) - - -class TestBDToolkit: - """Tests for BDToolkit class.""" - - def test_get_tools_returns_all_tools(self): - """Test that get_tools returns all expected tools.""" - tools = BDToolkit.get_tools() - - assert len(tools) == 4 - - tool_names = [tool.name for tool in tools] - - assert "search_datasets" in tool_names - assert "get_dataset_details" in tool_names - assert "execute_bigquery_sql" in tool_names - assert "decode_table_values" in tool_names diff --git a/tests/app/agent/tools/test_api.py 
b/tests/app/agent/tools/test_api.py new file mode 100644 index 0000000..649330a --- /dev/null +++ b/tests/app/agent/tools/test_api.py @@ -0,0 +1,437 @@ +import json + +import httpx +import pytest +import respx + +from app.agent.tools.api import get_dataset_details, get_table_details, search_datasets +from app.settings import settings + + +class TestSearchDatasets: + """Tests for search_datasets tool.""" + + SEARCH_ENDPOINT = f"{settings.BASEDOSDADOS_BASE_URL}/search/" + + @respx.mock + def test_search_datasets_returns_overviews(self): + """Test successful dataset search.""" + mock_response = { + "results": [ + { + "id": "dataset-1", + "name": "Test Dataset", + "slug": "test_dataset", + "description": "Dataset description", + "tags": [{"name": "tag1"}, {"name": "tag2"}], + "themes": [{"name": "theme1"}, {"name": "theme2"}], + "organizations": [{"name": "org1"}], + } + ] + } + + respx.get(self.SEARCH_ENDPOINT).mock( + return_value=httpx.Response(200, json=mock_response) + ) + + result = search_datasets.invoke({"query": "test"}) + output = json.loads(result) + + assert len(output) == 1 + + dataset = output[0] + + assert dataset["id"] == "dataset-1" + assert dataset["name"] == "Test Dataset" + assert dataset["slug"] == "test_dataset" + assert dataset["description"] == "Dataset description" + assert dataset["tags"] == ["tag1", "tag2"] + assert dataset["themes"] == ["theme1", "theme2"] + assert dataset["organizations"] == ["org1"] + + @respx.mock + def test_search_datasets_returns_empty_results(self): + """Test successful dataset search with no results.""" + respx.get(self.SEARCH_ENDPOINT).mock( + return_value=httpx.Response(200, json={"results": []}) + ) + + result = search_datasets.invoke({"query": "nonexistent"}) + output = json.loads(result) + + assert output == [] + + +class TestGetDatasetDetails: + """Tests for get_dataset_details tool.""" + + GRAPHQL_URL = f"{settings.BASEDOSDADOS_BASE_URL}/graphql" + + @pytest.fixture + def mock_response(self): + return { + 
"data": { + "allDataset": { + "edges": [ + { + "node": { + "id": "DatasetNode:dataset-1", + "name": "Test Dataset", + "slug": "test_dataset", + "description": "Dataset description", + "tags": {"edges": [{"node": {"name": "tag1"}}]}, + "themes": {"edges": [{"node": {"name": "theme1"}}]}, + "organizations": { + "edges": [ + {"node": {"name": "org1", "slug": "org1_slug"}} + ] + }, + "tables": { + "edges": [ + { + "node": { + "id": "TableNode:table-1", + "name": "Test Table", + "slug": "test_table", + "description": "Table description", + "temporalCoverage": { + "start": "2020", + "end": "2023", + }, + "cloudTables": { + "edges": [ + { + "node": { + "gcpProjectId": "basedosdados", + "gcpDatasetId": "test_dataset", + "gcpTableId": "test_table", + } + } + ] + }, + } + } + ] + }, + } + } + ] + } + } + } + + @respx.mock + def test_get_dataset_details_success(self, mock_response): + """Test successful dataset details retrieval.""" + # Mock graphql endpoint + respx.post(self.GRAPHQL_URL).mock( + return_value=httpx.Response(200, json=mock_response) + ) + + # Mock usage guide (not found) + respx.get(url__startswith="https://raw.githubusercontent.com").mock( + return_value=httpx.Response(404) + ) + + result = get_dataset_details.invoke({"dataset_id": "dataset-1"}) + dataset = json.loads(result) + + assert dataset["id"] == "dataset-1" + assert dataset["name"] == "Test Dataset" + assert dataset["slug"] == "test_dataset" + assert dataset["description"] == "Dataset description" + assert dataset["tags"] == ["tag1"] + assert dataset["themes"] == ["theme1"] + assert dataset["organizations"] == ["org1"] + assert dataset["usage_guide"] is None + + assert len(dataset["tables"]) == 1 + + table = dataset["tables"][0] + + assert table["id"] == "table-1" + assert table["gcp_id"] == "basedosdados.test_dataset.test_table" + assert table["name"] == "Test Table" + assert table["slug"] == "test_table" + assert table["description"] == "Table description" + assert table["temporal_coverage"] == 
{"start": "2020", "end": "2023"} + + @respx.mock + def test_get_dataset_details_success_with_usage_guide(self, mock_response): + """Test dataset details with usage guide available.""" + respx.post(self.GRAPHQL_URL).mock( + return_value=httpx.Response(200, json=mock_response) + ) + + respx.get(url__startswith="https://raw.githubusercontent.com").mock( + return_value=httpx.Response(200, text="# This is a usage guide.") + ) + + result = get_dataset_details.invoke({"dataset_id": "dataset-1"}) + dataset = json.loads(result) + + assert dataset["usage_guide"] == "# This is a usage guide." + + @respx.mock + def test_table_without_tags_themes_orgs(self): + """Test dataset with table that has no tags, themes and orgs.""" + mock_response = { + "data": { + "allDataset": { + "edges": [ + { + "node": { + "id": "dataset-1", + "name": "Test Dataset", + "slug": "test_dataset", + "description": "Dataset description", + "tags": {"edges": [{"node": {}}]}, + "themes": {"edges": [{"node": {}}]}, + "organizations": {"edges": [{"node": {}}]}, + "tables": { + "edges": [ + { + "node": { + "id": "table-1", + "name": "Test Table", + "slug": "test_table", + "description": "Table description", + "temporalCoverage": { + "start": "2020", + "end": "2023", + }, + "cloudTables": { + "edges": [ + { + "node": { + "gcpProjectId": "basedosdados", + "gcpDatasetId": "test_dataset", + "gcpTableId": "test_table", + } + } + ] + }, + } + } + ] + }, + } + } + ] + } + } + } + + respx.post(self.GRAPHQL_URL).mock( + return_value=httpx.Response(200, json=mock_response) + ) + + respx.get(url__startswith="https://raw.githubusercontent.com").mock( + return_value=httpx.Response(200) + ) + + result = get_dataset_details.invoke({"dataset_id": "dataset-1"}) + dataset = json.loads(result) + + assert dataset["tags"] == [] + assert dataset["themes"] == [] + assert dataset["organizations"] == [] + + @respx.mock + def test_table_without_cloud_tables(self): + """Test dataset with table that has no cloud tables.""" + 
mock_response = { + "data": { + "allDataset": { + "edges": [ + { + "node": { + "id": "dataset-1", + "name": "Test Dataset", + "slug": "test_dataset", + "description": "Dataset description", + "tags": {"edges": [{"node": {"name": "tag1"}}]}, + "themes": {"edges": [{"node": {"name": "theme1"}}]}, + "organizations": { + "edges": [ + {"node": {"name": "org1", "slug": "org1_slug"}} + ] + }, + "tables": { + "edges": [ + { + "node": { + "id": "table-1", + "name": "Test Table", + "slug": "test_table", + "description": "Table description", + "temporalCoverage": { + "start": "2020", + "end": "2023", + }, + "cloudTables": {"edges": []}, + } + } + ] + }, + } + } + ] + } + } + } + + respx.post(self.GRAPHQL_URL).mock( + return_value=httpx.Response(200, json=mock_response) + ) + + result = get_dataset_details.invoke({"dataset_id": "dataset-1"}) + dataset = json.loads(result) + + assert dataset["tables"][0]["gcp_id"] is None + assert dataset["usage_guide"] is None + + @respx.mock + def test_get_dataset_details_dataset_not_found(self): + """Test error when dataset is not found.""" + respx.post(self.GRAPHQL_URL).mock( + return_value=httpx.Response( + 200, json={"data": {"allDataset": {"edges": []}}} + ) + ) + + result = get_dataset_details.invoke({"dataset_id": "nonexistent"}) + output = json.loads(result) + + assert output["status"] == "error" + assert ( + output["message"] + == "Dataset 'nonexistent' not found. Verify the dataset ID from search_datasets results." 
+ ) + + +class TestGetTableDetails: + """Tests for get_table_details tool.""" + + GRAPHQL_URL = f"{settings.BASEDOSDADOS_BASE_URL}/graphql" + + @pytest.fixture + def mock_response(self): + return { + "data": { + "allTable": { + "edges": [ + { + "node": { + "id": "TableNode:table-1", + "name": "Test Table", + "slug": "test_table", + "description": "Table description", + "temporalCoverage": { + "start": "2020", + "end": "2023", + }, + "cloudTables": { + "edges": [ + { + "node": { + "gcpProjectId": "basedosdados", + "gcpDatasetId": "test_dataset", + "gcpTableId": "test_table", + } + } + ] + }, + "columns": { + "edges": [ + { + "node": { + "id": "col-1", + "name": "column_name", + "description": "Column description", + "bigqueryType": {"name": "STRING"}, + "directoryPrimaryKey": None, + } + }, + { + "node": { + "id": "col-2", + "name": "id_municipio", + "description": "Municipality ID", + "bigqueryType": {"name": "STRING"}, + "directoryPrimaryKey": { + "table": { + "id": "TableNode:dir-table-1" + } + }, + } + }, + ] + }, + } + } + ] + } + } + } + + @respx.mock + def test_get_table_details_success(self, mock_response): + """Test successful table details retrieval.""" + respx.post(self.GRAPHQL_URL).mock( + return_value=httpx.Response(200, json=mock_response) + ) + + result = get_table_details.invoke({"table_id": "table-1"}) + table = json.loads(result) + + assert table["id"] == "table-1" + assert table["gcp_id"] == "basedosdados.test_dataset.test_table" + assert table["name"] == "Test Table" + assert table["slug"] == "test_table" + assert table["description"] == "Table description" + assert table["temporal_coverage"] == {"start": "2020", "end": "2023"} + + assert len(table["columns"]) == 2 + + assert table["columns"][0]["name"] == "column_name" + assert table["columns"][0]["type"] == "STRING" + assert table["columns"][0]["description"] == "Column description" + assert "reference_table_id" not in table["columns"][0] + + assert table["columns"][1]["name"] == "id_municipio" 
+ assert table["columns"][1]["type"] == "STRING" + assert table["columns"][1]["description"] == "Municipality ID" + assert table["columns"][1]["reference_table_id"] == "dir-table-1" + + @respx.mock + def test_get_table_details_without_cloud_tables(self, mock_response): + """Test table details when no cloud tables exist.""" + mock_response["data"]["allTable"]["edges"][0]["node"]["cloudTables"] = { + "edges": [] + } + + respx.post(self.GRAPHQL_URL).mock( + return_value=httpx.Response(200, json=mock_response) + ) + + result = get_table_details.invoke({"table_id": "table-1"}) + table = json.loads(result) + + assert table["gcp_id"] is None + + @respx.mock + def test_get_table_details_not_found(self): + """Test error when table is not found.""" + respx.post(self.GRAPHQL_URL).mock( + return_value=httpx.Response(200, json={"data": {"allTable": {"edges": []}}}) + ) + + result = get_table_details.invoke({"table_id": "nonexistent"}) + output = json.loads(result) + + assert output["status"] == "error" + assert ( + output["message"] + == "Table 'nonexistent' not found. Verify the table ID from get_dataset_details results." 
+ ) diff --git a/tests/app/agent/tools/test_bigquery.py b/tests/app/agent/tools/test_bigquery.py new file mode 100644 index 0000000..1d71563 --- /dev/null +++ b/tests/app/agent/tools/test_bigquery.py @@ -0,0 +1,248 @@ +import json +from unittest.mock import MagicMock + +import pytest +from google.api_core.exceptions import BadRequest, NotFound +from google.cloud import bigquery as bq +from pytest_mock import MockerFixture + +from app.agent.tools.bigquery import ( + MAX_BYTES_BILLED, + decode_table_values, + execute_bigquery_sql, +) + + +class TestExecuteBigQuerySQL: + """Tests for execute_bigquery_sql tool.""" + + @pytest.fixture + def mock_config(self) -> dict: + return {"configurable": {"thread_id": "test-thread", "user_id": "test-user"}} + + def test_successful_query(self, mocker: MockerFixture, mock_config: dict): + """Test successful SELECT query execution.""" + mock_dry_run_query_job = MagicMock() + mock_dry_run_query_job.statement_type = "SELECT" + + mock_query_job = MagicMock() + mock_query_job.result.return_value = [{"col1": "value1"}, {"col1": "value2"}] + + mock_bigquery_client = MagicMock(spec=bq.Client) + mock_bigquery_client.query.side_effect = [ + mock_dry_run_query_job, + mock_query_job, + ] + + mocker.patch( + "app.agent.tools.bigquery._get_client", return_value=mock_bigquery_client + ) + + result = execute_bigquery_sql.invoke( + {"sql_query": "SELECT * FROM project.dataset.table", "config": mock_config} + ) + + output = json.loads(result) + + assert output == [{"col1": "value1"}, {"col1": "value2"}] + + def test_forbidden_statement_type(self, mocker: MockerFixture, mock_config: dict): + """Test error when statement is not SELECT.""" + mock_dry_run_query_job = MagicMock() + mock_dry_run_query_job.statement_type = "DELETE" + + mock_bigquery_client = MagicMock(spec=bq.Client) + mock_bigquery_client.query.return_value = mock_dry_run_query_job + + mocker.patch( + "app.agent.tools.bigquery._get_client", return_value=mock_bigquery_client + ) + + result = 
execute_bigquery_sql.invoke( + {"sql_query": "DELETE FROM project.dataset.table", "config": mock_config} + ) + + output = json.loads(result) + + assert output["status"] == "error" + assert output["message"] == "Only SELECT statements are allowed, got DELETE." + + def test_bytes_billed_limit_exceeded( + self, mocker: MockerFixture, mock_config: dict + ): + """Test error when query exceeds bytes billed limit.""" + mock_dry_run_query_job = MagicMock() + mock_dry_run_query_job.statement_type = "SELECT" + + error = BadRequest( + message="Query limit exceeded", + errors=[ + {"reason": "bytesBilledLimitExceeded", "message": "Limit exceeded"} + ], + ) + + mock_bigquery_client = MagicMock(spec=bq.Client) + mock_bigquery_client.query.side_effect = [mock_dry_run_query_job, error] + + mocker.patch( + "app.agent.tools.bigquery._get_client", return_value=mock_bigquery_client + ) + + result = execute_bigquery_sql.invoke( + {"sql_query": "SELECT * FROM project.dataset.table", "config": mock_config} + ) + + output = json.loads(result) + + assert output["status"] == "error" + assert output["message"] == ( + f"Query exceeds the {MAX_BYTES_BILLED // 10**9}GB processing limit. " + "Add WHERE filters or select fewer columns." 
+ ) + + def test_google_api_error_reraise(self, mocker: MockerFixture, mock_config: dict): + """Test that non-bytesBilledLimitExceeded GoogleAPICallError is re-raised.""" + mock_dry_run_query_job = MagicMock() + mock_dry_run_query_job.statement_type = "SELECT" + + error = BadRequest( + message="Syntax error", + errors=[{"reason": "testReason", "message": "Test message"}], + ) + + mock_bigquery_client = MagicMock(spec=bq.Client) + mock_bigquery_client.query.side_effect = [mock_dry_run_query_job, error] + + mocker.patch( + "app.agent.tools.bigquery._get_client", return_value=mock_bigquery_client + ) + + result = execute_bigquery_sql.invoke( + {"sql_query": "SELECT * FROM project.dataset.table", "config": mock_config} + ) + + output = json.loads(result) + + assert output["status"] == "error" + assert output["message"] == "400 Syntax error" + + +class TestDecodeTableValues: + """Tests for decode_table_values tool.""" + + @pytest.fixture + def mock_config(self) -> dict: + return {"configurable": {"thread_id": "test-thread", "user_id": "test-user"}} + + def test_decode_all_columns(self, mocker: MockerFixture, mock_config: dict): + """Test decoding all columns from a table.""" + mock_query_job = MagicMock() + mock_query_job.result.return_value = [ + {"nome_coluna": "col1", "chave": "1", "valor": "Value 1"}, + {"nome_coluna": "col2", "chave": "2", "valor": "Value 2"}, + ] + + mock_bigquery_client = MagicMock(spec=bq.Client) + mock_bigquery_client.query.return_value = mock_query_job + + mocker.patch( + "app.agent.tools.bigquery._get_client", return_value=mock_bigquery_client + ) + + result = decode_table_values.invoke( + {"table_gcp_id": "project.dataset.table", "config": mock_config} + ) + + output = json.loads(result) + + assert len(output) == 2 + + def test_decode_specific_column(self, mocker: MockerFixture, mock_config: dict): + """Test decoding a specific column.""" + mock_query_job = MagicMock() + mock_query_job.result.return_value = [ + {"nome_coluna": "col1", 
"chave": "1", "valor": "Value 1"}, + {"nome_coluna": "col1", "chave": "2", "valor": "Value 2"}, + ] + + mock_bigquery_client = MagicMock(spec=bq.Client) + mock_bigquery_client.query.return_value = mock_query_job + + mocker.patch( + "app.agent.tools.bigquery._get_client", return_value=mock_bigquery_client + ) + + result = decode_table_values.invoke( + { + "table_gcp_id": "project.dataset.table", + "column_name": "col1", + "config": mock_config, + } + ) + + output = json.loads(result) + + assert len(output) == 2 + # Verify column filter was added to query + call_args = mock_bigquery_client.query.call_args[0][0] + assert "nome_coluna = 'col1'" in call_args + + def test_dictionary_not_found(self, mocker: MockerFixture, mock_config: dict): + """Test error when dictionary table doesn't exist.""" + error = NotFound( + message="Table not found", + errors=[{"reason": "notFound", "message": "Test message"}], + ) + + mock_bigquery_client = MagicMock(spec=bq.Client) + mock_bigquery_client.query.side_effect = error + + mocker.patch( + "app.agent.tools.bigquery._get_client", return_value=mock_bigquery_client + ) + + result = decode_table_values.invoke( + {"table_gcp_id": "project.dataset.table", "config": mock_config} + ) + + output = json.loads(result) + + assert output["status"] == "error" + assert output["message"] == "Dictionary table not found for this dataset." + + def test_invalid_table_reference(self, mock_config: dict): + """Test error when table reference format is invalid.""" + result = decode_table_values.invoke( + {"table_gcp_id": "table", "config": mock_config} + ) + + output = json.loads(result) + + assert output["status"] == "error" + assert ( + output["message"] + == "Invalid table reference: 'table'. 
Expected format: project.dataset.table" + ) + + def test_google_api_error_reraise(self, mocker: MockerFixture, mock_config: dict): + """Test that non-notFound GoogleAPICallError is re-raised.""" + error = BadRequest( + message="Syntax error", + errors=[{"reason": "testReason", "message": "Test message"}], + ) + + mock_bigquery_client = MagicMock(spec=bq.Client) + mock_bigquery_client.query.side_effect = error + + mocker.patch( + "app.agent.tools.bigquery._get_client", return_value=mock_bigquery_client + ) + + result = decode_table_values.invoke( + {"table_gcp_id": "project.dataset.table", "config": mock_config} + ) + + output = json.loads(result) + + assert output["status"] == "error" + assert output["message"] == "400 Syntax error" diff --git a/tests/app/agent/tools/test_exceptions.py b/tests/app/agent/tools/test_exceptions.py new file mode 100644 index 0000000..9731216 --- /dev/null +++ b/tests/app/agent/tools/test_exceptions.py @@ -0,0 +1,46 @@ +import json + +from google.api_core.exceptions import BadRequest + +from app.agent.tools.exceptions import handle_tool_errors + + +class TestHandleToolErrors: + """Tests for handle_tool_errors decorator.""" + + def test_decorator_passes_through_success(self): + """Test decorator returns function result on success.""" + + @handle_tool_errors + def successful_function(): + return '{"key": "value"}' + + output = successful_function() + assert json.loads(output) == {"key": "value"} + + def test_decorator_catches_exception(self): + """Test decorator catches exceptions and returns ToolError JSON.""" + + @handle_tool_errors + def failing_function(): + raise ValueError("something went wrong") + + output = json.loads(failing_function()) + + assert output["status"] == "error" + assert output["message"] == "something went wrong" + + def test_decorator_catches_google_api_error(self): + """Test decorator catches GoogleAPICallError.""" + + @handle_tool_errors + def failing_function(): + raise BadRequest( + message="Some bad request", 
+ errors=[{"reason": "testReason", "message": "Test message"}], + ) + + output = json.loads(failing_function()) + + assert output["status"] == "error" + assert output["message"] == "400 Some bad request" diff --git a/tests/app/agent/tools/test_toolkit.py b/tests/app/agent/tools/test_toolkit.py new file mode 100644 index 0000000..c416306 --- /dev/null +++ b/tests/app/agent/tools/test_toolkit.py @@ -0,0 +1,19 @@ +from app.agent.tools import BDToolkit + + +class TestBDToolkit: + """Tests for BDToolkit class.""" + + def test_get_tools_returns_all_tools(self): + """Test that get_tools returns all expected tools.""" + tools = BDToolkit.get_tools() + + assert len(tools) == 5 + + tool_names = [tool.name for tool in tools] + + assert "search_datasets" in tool_names + assert "get_dataset_details" in tool_names + assert "get_table_details" in tool_names + assert "execute_bigquery_sql" in tool_names + assert "decode_table_values" in tool_names diff --git a/tests/app/api/routers/test_chatbot.py b/tests/app/api/routers/test_chatbot.py index 0de01f2..e1f5792 100644 --- a/tests/app/api/routers/test_chatbot.py +++ b/tests/app/api/routers/test_chatbot.py @@ -1,6 +1,7 @@ import uuid from contextlib import asynccontextmanager from datetime import datetime, timezone +from unittest.mock import AsyncMock import jwt import pytest @@ -29,9 +30,9 @@ def send_feedback(self, feedback: Feedback, created: bool): return FeedbackSyncStatus.SUCCESS, datetime.now(timezone.utc) -class MockReActAgent: - def __init__(self): - self.checkpointer = None +class MockAgent: + def __init__(self, checkpointer=None): + self.checkpointer = checkpointer def invoke(self, input, config): return {"messages": [AIMessage("Mock response")]} @@ -40,12 +41,12 @@ async def ainvoke(self, input, config): return {"messages": [AIMessage("Mock response")]} def stream(self, input, config, stream_mode): - chunk = {"agent": {"messages": [AIMessage("Mock response")]}} + chunk = {"model": {"messages": [AIMessage("Mock 
response")]}} yield "updates", chunk yield "values", chunk async def astream(self, input, config, stream_mode): - chunk = {"agent": {"messages": [AIMessage("Mock response")]}} + chunk = {"model": {"messages": [AIMessage("Mock response")]}} yield "updates", chunk yield "values", chunk @@ -84,7 +85,7 @@ def access_token(user_id: str) -> str: def client(database: AsyncDatabase): @asynccontextmanager async def mock_lifespan(app: FastAPI): - app.state.agent = MockReActAgent() + app.state.agent = MockAgent() yield def get_database_override(): @@ -235,6 +236,21 @@ def test_delete_thread_success( ) assert response.status_code == status.HTTP_200_OK + def test_delete_thread_with_checkpointer( + self, client: TestClient, access_token: str, thread: Thread + ): + """Test successful thread deletion also deletes checkpoints.""" + mock_checkpointer = AsyncMock() + app.state.agent.checkpointer = mock_checkpointer + + response = client.delete( + url=f"/api/v1/chatbot/threads/{thread.id}", + headers={"Authorization": f"Bearer {access_token}"}, + ) + + assert response.status_code == status.HTTP_200_OK + mock_checkpointer.adelete_thread.assert_called_once_with(str(thread.id)) + def test_delete_thread_not_found(self, client: TestClient, access_token: str): """Test deleting non-existent thread returns 404.""" response = client.delete( @@ -368,7 +384,8 @@ def test_send_message_success( event = StreamEvent.model_validate_json(line) events.append(event) - assert len(events) >= 1 + assert len(events) >= 2 + assert any(event.type == "final_answer" for event in events) assert events[-1].type == "complete" assert events[-1].data.run_id is not None diff --git a/tests/app/api/test_streaming.py b/tests/app/api/test_streaming.py index 2fc9046..1e1d468 100644 --- a/tests/app/api/test_streaming.py +++ b/tests/app/api/test_streaming.py @@ -4,12 +4,12 @@ from unittest.mock import AsyncMock, MagicMock import pytest -from google.api_core import exceptions as google_api_exceptions from 
langchain_core.messages import AIMessage, ToolMessage from app.api.schemas import ConfigDict from app.api.streaming import ( ErrorMessage, + _parse_thinking, _process_chunk, _truncate_json, stream_response, @@ -111,6 +111,64 @@ def test_truncate_json_invalid(self): assert _truncate_json(invalid_json_string) == invalid_json_string +class TestParseThinking: + """Tests for _parse_thinking function.""" + + def test_string_content_returns_none(self): + """Test that plain string content returns None.""" + message = AIMessage(content="Hello, world!") + assert _parse_thinking(message) is None + + def test_single_thinking_block(self): + """Test extraction of a single thinking block.""" + message = AIMessage( + content=[ + {"type": "thinking", "thinking": "Let me reason about this."}, + {"type": "text", "text": "Here is my answer."}, + ] + ) + assert _parse_thinking(message) == "Let me reason about this." + + def test_multiple_thinking_blocks_are_concatenated(self): + """Test that multiple thinking blocks are concatenated.""" + message = AIMessage( + content=[ + {"type": "thinking", "thinking": "First thought. "}, + {"type": "text", "text": "Some text."}, + {"type": "thinking", "thinking": "Second thought."}, + ] + ) + assert _parse_thinking(message) == "First thought. Second thought." 
+ + def test_no_thinking_blocks_returns_none(self): + """Test that content with no thinking blocks returns None.""" + message = AIMessage( + content=[ + {"type": "text", "text": "Just text."}, + ] + ) + assert _parse_thinking(message) is None + + def test_empty_thinking_block_returns_none(self): + """Test that an empty thinking string returns None.""" + message = AIMessage( + content=[ + {"type": "thinking", "thinking": ""}, + ] + ) + assert _parse_thinking(message) is None + + def test_non_dict_blocks_are_skipped(self): + """Test that non-dict items in content are safely skipped.""" + message = AIMessage( + content=[ + "plain string block", + {"type": "thinking", "thinking": "Actual thinking."}, + ] + ) + assert _parse_thinking(message) == "Actual thinking." + + class TestProcessChunk: """Tests for _process_chunk function.""" @@ -471,7 +529,7 @@ async def test_stream_response_generic_exception( mock_agent = MagicMock() async def mock_astream(*args, **kwargs): - raise RuntimeError("Something went wrong") + raise Exception("Something went wrong") yield # Makes this an async generator mock_agent.astream = mock_astream @@ -534,39 +592,3 @@ async def mock_astream(*args, **kwargs): call_args = mock_database.create_message.call_args[0][0] assert call_args.status == MessageStatus.SUCCESS assert call_args.content == ErrorMessage.MODEL_CALL_LIMIT_REACHED - - async def test_stream_response_google_api_error( - self, - mock_database, - mock_user_message, - mock_config, - mock_thread_id, - mock_model_uri, - ): - """Test Google API InvalidArgument yields error event.""" - mock_agent = MagicMock() - - async def mock_astream(*args, **kwargs): - raise google_api_exceptions.InvalidArgument("Invalid request") - yield # Makes this an async generator - - mock_agent.astream = mock_astream - - events = await self._collect_events( - stream_response( - database=mock_database, - agent=mock_agent, - user_message=mock_user_message, - config=mock_config, - thread_id=mock_thread_id, - 
model_uri=mock_model_uri, - ) - ) - - assert len(events) == 2 - assert ErrorMessage.UNEXPECTED in events[0] - assert '"type":"complete"' in events[1] - - call_args = mock_database.create_message.call_args[0][0] - assert call_args.status == MessageStatus.ERROR - assert call_args.content == ErrorMessage.UNEXPECTED diff --git a/uv.lock b/uv.lock index 244bbff..2f9757a 100644 --- a/uv.lock +++ b/uv.lock @@ -176,7 +176,7 @@ requires-dist = [ { name = "langsmith", specifier = ">=0.6.0" }, { name = "loguru", specifier = ">=0.7.3" }, { name = "psycopg", extras = ["binary"], specifier = ">=3.3.2" }, - { name = "pydantic", specifier = "<2.12.0" }, + { name = "pydantic", specifier = ">=2.12.0" }, { name = "pydantic-settings", specifier = ">=2.12.0" }, { name = "pyjwt", specifier = ">=2.10.1" }, { name = "sqlmodel", specifier = ">=0.0.31" }, @@ -1337,7 +1337,7 @@ wheels = [ [[package]] name = "pydantic" -version = "2.11.10" +version = "2.12.5" source = { registry = "https://pypi.org/simple" } dependencies = [ { name = "annotated-types" }, @@ -1345,9 +1345,9 @@ dependencies = [ { name = "typing-extensions" }, { name = "typing-inspection" }, ] -sdist = { url = "https://files.pythonhosted.org/packages/ae/54/ecab642b3bed45f7d5f59b38443dcb36ef50f85af192e6ece103dbfe9587/pydantic-2.11.10.tar.gz", hash = "sha256:dc280f0982fbda6c38fada4e476dc0a4f3aeaf9c6ad4c28df68a666ec3c61423", size = 788494, upload-time = "2025-10-04T10:40:41.338Z" } +sdist = { url = "https://files.pythonhosted.org/packages/69/44/36f1a6e523abc58ae5f928898e4aca2e0ea509b5aa6f6f392a5d882be928/pydantic-2.12.5.tar.gz", hash = "sha256:4d351024c75c0f085a9febbb665ce8c0c6ec5d30e903bdb6394b7ede26aebb49", size = 821591, upload-time = "2025-11-26T15:11:46.471Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/bd/1f/73c53fcbfb0b5a78f91176df41945ca466e71e9d9d836e5c522abda39ee7/pydantic-2.11.10-py3-none-any.whl", hash = "sha256:802a655709d49bd004c31e865ef37da30b540786a46bfce02333e0e24b5fe29a", size = 444823, 
upload-time = "2025-10-04T10:40:39.055Z" }, + { url = "https://files.pythonhosted.org/packages/5a/87/b70ad306ebb6f9b585f114d0ac2137d792b48be34d732d60e597c2f8465a/pydantic-2.12.5-py3-none-any.whl", hash = "sha256:e561593fccf61e8a20fc46dfc2dfe075b8be7d0188df33f221ad1f0139180f9d", size = 463580, upload-time = "2025-11-26T15:11:44.605Z" }, ] [package.optional-dependencies] @@ -1357,44 +1357,45 @@ email = [ [[package]] name = "pydantic-core" -version = "2.33.2" +version = "2.41.5" source = { registry = "https://pypi.org/simple" } dependencies = [ { name = "typing-extensions" }, ] -sdist = { url = "https://files.pythonhosted.org/packages/ad/88/5f2260bdfae97aabf98f1778d43f69574390ad787afb646292a638c923d4/pydantic_core-2.33.2.tar.gz", hash = "sha256:7cb8bc3605c29176e1b105350d2e6474142d7c1bd1d9327c4a9bdb46bf827acc", size = 435195, upload-time = "2025-04-23T18:33:52.104Z" } -wheels = [ - { url = "https://files.pythonhosted.org/packages/18/8a/2b41c97f554ec8c71f2a8a5f85cb56a8b0956addfe8b0efb5b3d77e8bdc3/pydantic_core-2.33.2-cp312-cp312-macosx_10_12_x86_64.whl", hash = "sha256:a7ec89dc587667f22b6a0b6579c249fca9026ce7c333fc142ba42411fa243cdc", size = 2009000, upload-time = "2025-04-23T18:31:25.863Z" }, - { url = "https://files.pythonhosted.org/packages/a1/02/6224312aacb3c8ecbaa959897af57181fb6cf3a3d7917fd44d0f2917e6f2/pydantic_core-2.33.2-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:3c6db6e52c6d70aa0d00d45cdb9b40f0433b96380071ea80b09277dba021ddf7", size = 1847996, upload-time = "2025-04-23T18:31:27.341Z" }, - { url = "https://files.pythonhosted.org/packages/d6/46/6dcdf084a523dbe0a0be59d054734b86a981726f221f4562aed313dbcb49/pydantic_core-2.33.2-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:4e61206137cbc65e6d5256e1166f88331d3b6238e082d9f74613b9b765fb9025", size = 1880957, upload-time = "2025-04-23T18:31:28.956Z" }, - { url = 
"https://files.pythonhosted.org/packages/ec/6b/1ec2c03837ac00886ba8160ce041ce4e325b41d06a034adbef11339ae422/pydantic_core-2.33.2-cp312-cp312-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:eb8c529b2819c37140eb51b914153063d27ed88e3bdc31b71198a198e921e011", size = 1964199, upload-time = "2025-04-23T18:31:31.025Z" }, - { url = "https://files.pythonhosted.org/packages/2d/1d/6bf34d6adb9debd9136bd197ca72642203ce9aaaa85cfcbfcf20f9696e83/pydantic_core-2.33.2-cp312-cp312-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:c52b02ad8b4e2cf14ca7b3d918f3eb0ee91e63b3167c32591e57c4317e134f8f", size = 2120296, upload-time = "2025-04-23T18:31:32.514Z" }, - { url = "https://files.pythonhosted.org/packages/e0/94/2bd0aaf5a591e974b32a9f7123f16637776c304471a0ab33cf263cf5591a/pydantic_core-2.33.2-cp312-cp312-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:96081f1605125ba0855dfda83f6f3df5ec90c61195421ba72223de35ccfb2f88", size = 2676109, upload-time = "2025-04-23T18:31:33.958Z" }, - { url = "https://files.pythonhosted.org/packages/f9/41/4b043778cf9c4285d59742281a769eac371b9e47e35f98ad321349cc5d61/pydantic_core-2.33.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:8f57a69461af2a5fa6e6bbd7a5f60d3b7e6cebb687f55106933188e79ad155c1", size = 2002028, upload-time = "2025-04-23T18:31:39.095Z" }, - { url = "https://files.pythonhosted.org/packages/cb/d5/7bb781bf2748ce3d03af04d5c969fa1308880e1dca35a9bd94e1a96a922e/pydantic_core-2.33.2-cp312-cp312-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:572c7e6c8bb4774d2ac88929e3d1f12bc45714ae5ee6d9a788a9fb35e60bb04b", size = 2100044, upload-time = "2025-04-23T18:31:41.034Z" }, - { url = "https://files.pythonhosted.org/packages/fe/36/def5e53e1eb0ad896785702a5bbfd25eed546cdcf4087ad285021a90ed53/pydantic_core-2.33.2-cp312-cp312-musllinux_1_1_aarch64.whl", hash = "sha256:db4b41f9bd95fbe5acd76d89920336ba96f03e149097365afe1cb092fceb89a1", size = 2058881, upload-time = 
"2025-04-23T18:31:42.757Z" }, - { url = "https://files.pythonhosted.org/packages/01/6c/57f8d70b2ee57fc3dc8b9610315949837fa8c11d86927b9bb044f8705419/pydantic_core-2.33.2-cp312-cp312-musllinux_1_1_armv7l.whl", hash = "sha256:fa854f5cf7e33842a892e5c73f45327760bc7bc516339fda888c75ae60edaeb6", size = 2227034, upload-time = "2025-04-23T18:31:44.304Z" }, - { url = "https://files.pythonhosted.org/packages/27/b9/9c17f0396a82b3d5cbea4c24d742083422639e7bb1d5bf600e12cb176a13/pydantic_core-2.33.2-cp312-cp312-musllinux_1_1_x86_64.whl", hash = "sha256:5f483cfb75ff703095c59e365360cb73e00185e01aaea067cd19acffd2ab20ea", size = 2234187, upload-time = "2025-04-23T18:31:45.891Z" }, - { url = "https://files.pythonhosted.org/packages/b0/6a/adf5734ffd52bf86d865093ad70b2ce543415e0e356f6cacabbc0d9ad910/pydantic_core-2.33.2-cp312-cp312-win32.whl", hash = "sha256:9cb1da0f5a471435a7bc7e439b8a728e8b61e59784b2af70d7c169f8dd8ae290", size = 1892628, upload-time = "2025-04-23T18:31:47.819Z" }, - { url = "https://files.pythonhosted.org/packages/43/e4/5479fecb3606c1368d496a825d8411e126133c41224c1e7238be58b87d7e/pydantic_core-2.33.2-cp312-cp312-win_amd64.whl", hash = "sha256:f941635f2a3d96b2973e867144fde513665c87f13fe0e193c158ac51bfaaa7b2", size = 1955866, upload-time = "2025-04-23T18:31:49.635Z" }, - { url = "https://files.pythonhosted.org/packages/0d/24/8b11e8b3e2be9dd82df4b11408a67c61bb4dc4f8e11b5b0fc888b38118b5/pydantic_core-2.33.2-cp312-cp312-win_arm64.whl", hash = "sha256:cca3868ddfaccfbc4bfb1d608e2ccaaebe0ae628e1416aeb9c4d88c001bb45ab", size = 1888894, upload-time = "2025-04-23T18:31:51.609Z" }, - { url = "https://files.pythonhosted.org/packages/46/8c/99040727b41f56616573a28771b1bfa08a3d3fe74d3d513f01251f79f172/pydantic_core-2.33.2-cp313-cp313-macosx_10_12_x86_64.whl", hash = "sha256:1082dd3e2d7109ad8b7da48e1d4710c8d06c253cbc4a27c1cff4fbcaa97a9e3f", size = 2015688, upload-time = "2025-04-23T18:31:53.175Z" }, - { url = 
"https://files.pythonhosted.org/packages/3a/cc/5999d1eb705a6cefc31f0b4a90e9f7fc400539b1a1030529700cc1b51838/pydantic_core-2.33.2-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:f517ca031dfc037a9c07e748cefd8d96235088b83b4f4ba8939105d20fa1dcd6", size = 1844808, upload-time = "2025-04-23T18:31:54.79Z" }, - { url = "https://files.pythonhosted.org/packages/6f/5e/a0a7b8885c98889a18b6e376f344da1ef323d270b44edf8174d6bce4d622/pydantic_core-2.33.2-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:0a9f2c9dd19656823cb8250b0724ee9c60a82f3cdf68a080979d13092a3b0fef", size = 1885580, upload-time = "2025-04-23T18:31:57.393Z" }, - { url = "https://files.pythonhosted.org/packages/3b/2a/953581f343c7d11a304581156618c3f592435523dd9d79865903272c256a/pydantic_core-2.33.2-cp313-cp313-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:2b0a451c263b01acebe51895bfb0e1cc842a5c666efe06cdf13846c7418caa9a", size = 1973859, upload-time = "2025-04-23T18:31:59.065Z" }, - { url = "https://files.pythonhosted.org/packages/e6/55/f1a813904771c03a3f97f676c62cca0c0a4138654107c1b61f19c644868b/pydantic_core-2.33.2-cp313-cp313-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:1ea40a64d23faa25e62a70ad163571c0b342b8bf66d5fa612ac0dec4f069d916", size = 2120810, upload-time = "2025-04-23T18:32:00.78Z" }, - { url = "https://files.pythonhosted.org/packages/aa/c3/053389835a996e18853ba107a63caae0b9deb4a276c6b472931ea9ae6e48/pydantic_core-2.33.2-cp313-cp313-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:0fb2d542b4d66f9470e8065c5469ec676978d625a8b7a363f07d9a501a9cb36a", size = 2676498, upload-time = "2025-04-23T18:32:02.418Z" }, - { url = "https://files.pythonhosted.org/packages/eb/3c/f4abd740877a35abade05e437245b192f9d0ffb48bbbbd708df33d3cda37/pydantic_core-2.33.2-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:9fdac5d6ffa1b5a83bca06ffe7583f5576555e6c8b3a91fbd25ea7780f825f7d", size = 2000611, upload-time = 
"2025-04-23T18:32:04.152Z" }, - { url = "https://files.pythonhosted.org/packages/59/a7/63ef2fed1837d1121a894d0ce88439fe3e3b3e48c7543b2a4479eb99c2bd/pydantic_core-2.33.2-cp313-cp313-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:04a1a413977ab517154eebb2d326da71638271477d6ad87a769102f7c2488c56", size = 2107924, upload-time = "2025-04-23T18:32:06.129Z" }, - { url = "https://files.pythonhosted.org/packages/04/8f/2551964ef045669801675f1cfc3b0d74147f4901c3ffa42be2ddb1f0efc4/pydantic_core-2.33.2-cp313-cp313-musllinux_1_1_aarch64.whl", hash = "sha256:c8e7af2f4e0194c22b5b37205bfb293d166a7344a5b0d0eaccebc376546d77d5", size = 2063196, upload-time = "2025-04-23T18:32:08.178Z" }, - { url = "https://files.pythonhosted.org/packages/26/bd/d9602777e77fc6dbb0c7db9ad356e9a985825547dce5ad1d30ee04903918/pydantic_core-2.33.2-cp313-cp313-musllinux_1_1_armv7l.whl", hash = "sha256:5c92edd15cd58b3c2d34873597a1e20f13094f59cf88068adb18947df5455b4e", size = 2236389, upload-time = "2025-04-23T18:32:10.242Z" }, - { url = "https://files.pythonhosted.org/packages/42/db/0e950daa7e2230423ab342ae918a794964b053bec24ba8af013fc7c94846/pydantic_core-2.33.2-cp313-cp313-musllinux_1_1_x86_64.whl", hash = "sha256:65132b7b4a1c0beded5e057324b7e16e10910c106d43675d9bd87d4f38dde162", size = 2239223, upload-time = "2025-04-23T18:32:12.382Z" }, - { url = "https://files.pythonhosted.org/packages/58/4d/4f937099c545a8a17eb52cb67fe0447fd9a373b348ccfa9a87f141eeb00f/pydantic_core-2.33.2-cp313-cp313-win32.whl", hash = "sha256:52fb90784e0a242bb96ec53f42196a17278855b0f31ac7c3cc6f5c1ec4811849", size = 1900473, upload-time = "2025-04-23T18:32:14.034Z" }, - { url = "https://files.pythonhosted.org/packages/a0/75/4a0a9bac998d78d889def5e4ef2b065acba8cae8c93696906c3a91f310ca/pydantic_core-2.33.2-cp313-cp313-win_amd64.whl", hash = "sha256:c083a3bdd5a93dfe480f1125926afcdbf2917ae714bdb80b36d34318b2bec5d9", size = 1955269, upload-time = "2025-04-23T18:32:15.783Z" }, - { url = 
"https://files.pythonhosted.org/packages/f9/86/1beda0576969592f1497b4ce8e7bc8cbdf614c352426271b1b10d5f0aa64/pydantic_core-2.33.2-cp313-cp313-win_arm64.whl", hash = "sha256:e80b087132752f6b3d714f041ccf74403799d3b23a72722ea2e6ba2e892555b9", size = 1893921, upload-time = "2025-04-23T18:32:18.473Z" }, - { url = "https://files.pythonhosted.org/packages/a4/7d/e09391c2eebeab681df2b74bfe6c43422fffede8dc74187b2b0bf6fd7571/pydantic_core-2.33.2-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:61c18fba8e5e9db3ab908620af374db0ac1baa69f0f32df4f61ae23f15e586ac", size = 1806162, upload-time = "2025-04-23T18:32:20.188Z" }, - { url = "https://files.pythonhosted.org/packages/f1/3d/847b6b1fed9f8ed3bb95a9ad04fbd0b212e832d4f0f50ff4d9ee5a9f15cf/pydantic_core-2.33.2-cp313-cp313t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:95237e53bb015f67b63c91af7518a62a8660376a6a0db19b89acc77a4d6199f5", size = 1981560, upload-time = "2025-04-23T18:32:22.354Z" }, - { url = "https://files.pythonhosted.org/packages/6f/9a/e73262f6c6656262b5fdd723ad90f518f579b7bc8622e43a942eec53c938/pydantic_core-2.33.2-cp313-cp313t-win_amd64.whl", hash = "sha256:c2fc0a768ef76c15ab9238afa6da7f69895bb5d1ee83aeea2e3509af4472d0b9", size = 1935777, upload-time = "2025-04-23T18:32:25.088Z" }, +sdist = { url = "https://files.pythonhosted.org/packages/71/70/23b021c950c2addd24ec408e9ab05d59b035b39d97cdc1130e1bce647bb6/pydantic_core-2.41.5.tar.gz", hash = "sha256:08daa51ea16ad373ffd5e7606252cc32f07bc72b28284b6bc9c6df804816476e", size = 460952, upload-time = "2025-11-04T13:43:49.098Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/5f/5d/5f6c63eebb5afee93bcaae4ce9a898f3373ca23df3ccaef086d0233a35a7/pydantic_core-2.41.5-cp312-cp312-macosx_10_12_x86_64.whl", hash = "sha256:f41a7489d32336dbf2199c8c0a215390a751c5b014c2c1c5366e817202e9cdf7", size = 2110990, upload-time = "2025-11-04T13:39:58.079Z" }, + { url = 
"https://files.pythonhosted.org/packages/aa/32/9c2e8ccb57c01111e0fd091f236c7b371c1bccea0fa85247ac55b1e2b6b6/pydantic_core-2.41.5-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:070259a8818988b9a84a449a2a7337c7f430a22acc0859c6b110aa7212a6d9c0", size = 1896003, upload-time = "2025-11-04T13:39:59.956Z" }, + { url = "https://files.pythonhosted.org/packages/68/b8/a01b53cb0e59139fbc9e4fda3e9724ede8de279097179be4ff31f1abb65a/pydantic_core-2.41.5-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:e96cea19e34778f8d59fe40775a7a574d95816eb150850a85a7a4c8f4b94ac69", size = 1919200, upload-time = "2025-11-04T13:40:02.241Z" }, + { url = "https://files.pythonhosted.org/packages/38/de/8c36b5198a29bdaade07b5985e80a233a5ac27137846f3bc2d3b40a47360/pydantic_core-2.41.5-cp312-cp312-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:ed2e99c456e3fadd05c991f8f437ef902e00eedf34320ba2b0842bd1c3ca3a75", size = 2052578, upload-time = "2025-11-04T13:40:04.401Z" }, + { url = "https://files.pythonhosted.org/packages/00/b5/0e8e4b5b081eac6cb3dbb7e60a65907549a1ce035a724368c330112adfdd/pydantic_core-2.41.5-cp312-cp312-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:65840751b72fbfd82c3c640cff9284545342a4f1eb1586ad0636955b261b0b05", size = 2208504, upload-time = "2025-11-04T13:40:06.072Z" }, + { url = "https://files.pythonhosted.org/packages/77/56/87a61aad59c7c5b9dc8caad5a41a5545cba3810c3e828708b3d7404f6cef/pydantic_core-2.41.5-cp312-cp312-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:e536c98a7626a98feb2d3eaf75944ef6f3dbee447e1f841eae16f2f0a72d8ddc", size = 2335816, upload-time = "2025-11-04T13:40:07.835Z" }, + { url = "https://files.pythonhosted.org/packages/0d/76/941cc9f73529988688a665a5c0ecff1112b3d95ab48f81db5f7606f522d3/pydantic_core-2.41.5-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:eceb81a8d74f9267ef4081e246ffd6d129da5d87e37a77c9bde550cb04870c1c", size = 2075366, upload-time = 
"2025-11-04T13:40:09.804Z" }, + { url = "https://files.pythonhosted.org/packages/d3/43/ebef01f69baa07a482844faaa0a591bad1ef129253ffd0cdaa9d8a7f72d3/pydantic_core-2.41.5-cp312-cp312-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:d38548150c39b74aeeb0ce8ee1d8e82696f4a4e16ddc6de7b1d8823f7de4b9b5", size = 2171698, upload-time = "2025-11-04T13:40:12.004Z" }, + { url = "https://files.pythonhosted.org/packages/b1/87/41f3202e4193e3bacfc2c065fab7706ebe81af46a83d3e27605029c1f5a6/pydantic_core-2.41.5-cp312-cp312-musllinux_1_1_aarch64.whl", hash = "sha256:c23e27686783f60290e36827f9c626e63154b82b116d7fe9adba1fda36da706c", size = 2132603, upload-time = "2025-11-04T13:40:13.868Z" }, + { url = "https://files.pythonhosted.org/packages/49/7d/4c00df99cb12070b6bccdef4a195255e6020a550d572768d92cc54dba91a/pydantic_core-2.41.5-cp312-cp312-musllinux_1_1_armv7l.whl", hash = "sha256:482c982f814460eabe1d3bb0adfdc583387bd4691ef00b90575ca0d2b6fe2294", size = 2329591, upload-time = "2025-11-04T13:40:15.672Z" }, + { url = "https://files.pythonhosted.org/packages/cc/6a/ebf4b1d65d458f3cda6a7335d141305dfa19bdc61140a884d165a8a1bbc7/pydantic_core-2.41.5-cp312-cp312-musllinux_1_1_x86_64.whl", hash = "sha256:bfea2a5f0b4d8d43adf9d7b8bf019fb46fdd10a2e5cde477fbcb9d1fa08c68e1", size = 2319068, upload-time = "2025-11-04T13:40:17.532Z" }, + { url = "https://files.pythonhosted.org/packages/49/3b/774f2b5cd4192d5ab75870ce4381fd89cf218af999515baf07e7206753f0/pydantic_core-2.41.5-cp312-cp312-win32.whl", hash = "sha256:b74557b16e390ec12dca509bce9264c3bbd128f8a2c376eaa68003d7f327276d", size = 1985908, upload-time = "2025-11-04T13:40:19.309Z" }, + { url = "https://files.pythonhosted.org/packages/86/45/00173a033c801cacf67c190fef088789394feaf88a98a7035b0e40d53dc9/pydantic_core-2.41.5-cp312-cp312-win_amd64.whl", hash = "sha256:1962293292865bca8e54702b08a4f26da73adc83dd1fcf26fbc875b35d81c815", size = 2020145, upload-time = "2025-11-04T13:40:21.548Z" }, + { url = 
"https://files.pythonhosted.org/packages/f9/22/91fbc821fa6d261b376a3f73809f907cec5ca6025642c463d3488aad22fb/pydantic_core-2.41.5-cp312-cp312-win_arm64.whl", hash = "sha256:1746d4a3d9a794cacae06a5eaaccb4b8643a131d45fbc9af23e353dc0a5ba5c3", size = 1976179, upload-time = "2025-11-04T13:40:23.393Z" }, + { url = "https://files.pythonhosted.org/packages/87/06/8806241ff1f70d9939f9af039c6c35f2360cf16e93c2ca76f184e76b1564/pydantic_core-2.41.5-cp313-cp313-macosx_10_12_x86_64.whl", hash = "sha256:941103c9be18ac8daf7b7adca8228f8ed6bb7a1849020f643b3a14d15b1924d9", size = 2120403, upload-time = "2025-11-04T13:40:25.248Z" }, + { url = "https://files.pythonhosted.org/packages/94/02/abfa0e0bda67faa65fef1c84971c7e45928e108fe24333c81f3bfe35d5f5/pydantic_core-2.41.5-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:112e305c3314f40c93998e567879e887a3160bb8689ef3d2c04b6cc62c33ac34", size = 1896206, upload-time = "2025-11-04T13:40:27.099Z" }, + { url = "https://files.pythonhosted.org/packages/15/df/a4c740c0943e93e6500f9eb23f4ca7ec9bf71b19e608ae5b579678c8d02f/pydantic_core-2.41.5-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:0cbaad15cb0c90aa221d43c00e77bb33c93e8d36e0bf74760cd00e732d10a6a0", size = 1919307, upload-time = "2025-11-04T13:40:29.806Z" }, + { url = "https://files.pythonhosted.org/packages/9a/e3/6324802931ae1d123528988e0e86587c2072ac2e5394b4bc2bc34b61ff6e/pydantic_core-2.41.5-cp313-cp313-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:03ca43e12fab6023fc79d28ca6b39b05f794ad08ec2feccc59a339b02f2b3d33", size = 2063258, upload-time = "2025-11-04T13:40:33.544Z" }, + { url = "https://files.pythonhosted.org/packages/c9/d4/2230d7151d4957dd79c3044ea26346c148c98fbf0ee6ebd41056f2d62ab5/pydantic_core-2.41.5-cp313-cp313-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:dc799088c08fa04e43144b164feb0c13f9a0bc40503f8df3e9fde58a3c0c101e", size = 2214917, upload-time = "2025-11-04T13:40:35.479Z" }, + { url = 
"https://files.pythonhosted.org/packages/e6/9f/eaac5df17a3672fef0081b6c1bb0b82b33ee89aa5cec0d7b05f52fd4a1fa/pydantic_core-2.41.5-cp313-cp313-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:97aeba56665b4c3235a0e52b2c2f5ae9cd071b8a8310ad27bddb3f7fb30e9aa2", size = 2332186, upload-time = "2025-11-04T13:40:37.436Z" }, + { url = "https://files.pythonhosted.org/packages/cf/4e/35a80cae583a37cf15604b44240e45c05e04e86f9cfd766623149297e971/pydantic_core-2.41.5-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:406bf18d345822d6c21366031003612b9c77b3e29ffdb0f612367352aab7d586", size = 2073164, upload-time = "2025-11-04T13:40:40.289Z" }, + { url = "https://files.pythonhosted.org/packages/bf/e3/f6e262673c6140dd3305d144d032f7bd5f7497d3871c1428521f19f9efa2/pydantic_core-2.41.5-cp313-cp313-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:b93590ae81f7010dbe380cdeab6f515902ebcbefe0b9327cc4804d74e93ae69d", size = 2179146, upload-time = "2025-11-04T13:40:42.809Z" }, + { url = "https://files.pythonhosted.org/packages/75/c7/20bd7fc05f0c6ea2056a4565c6f36f8968c0924f19b7d97bbfea55780e73/pydantic_core-2.41.5-cp313-cp313-musllinux_1_1_aarch64.whl", hash = "sha256:01a3d0ab748ee531f4ea6c3e48ad9dac84ddba4b0d82291f87248f2f9de8d740", size = 2137788, upload-time = "2025-11-04T13:40:44.752Z" }, + { url = "https://files.pythonhosted.org/packages/3a/8d/34318ef985c45196e004bc46c6eab2eda437e744c124ef0dbe1ff2c9d06b/pydantic_core-2.41.5-cp313-cp313-musllinux_1_1_armv7l.whl", hash = "sha256:6561e94ba9dacc9c61bce40e2d6bdc3bfaa0259d3ff36ace3b1e6901936d2e3e", size = 2340133, upload-time = "2025-11-04T13:40:46.66Z" }, + { url = "https://files.pythonhosted.org/packages/9c/59/013626bf8c78a5a5d9350d12e7697d3d4de951a75565496abd40ccd46bee/pydantic_core-2.41.5-cp313-cp313-musllinux_1_1_x86_64.whl", hash = "sha256:915c3d10f81bec3a74fbd4faebe8391013ba61e5a1a8d48c4455b923bdda7858", size = 2324852, upload-time = "2025-11-04T13:40:48.575Z" }, + { url = 
"https://files.pythonhosted.org/packages/1a/d9/c248c103856f807ef70c18a4f986693a46a8ffe1602e5d361485da502d20/pydantic_core-2.41.5-cp313-cp313-win32.whl", hash = "sha256:650ae77860b45cfa6e2cdafc42618ceafab3a2d9a3811fcfbd3bbf8ac3c40d36", size = 1994679, upload-time = "2025-11-04T13:40:50.619Z" }, + { url = "https://files.pythonhosted.org/packages/9e/8b/341991b158ddab181cff136acd2552c9f35bd30380422a639c0671e99a91/pydantic_core-2.41.5-cp313-cp313-win_amd64.whl", hash = "sha256:79ec52ec461e99e13791ec6508c722742ad745571f234ea6255bed38c6480f11", size = 2019766, upload-time = "2025-11-04T13:40:52.631Z" }, + { url = "https://files.pythonhosted.org/packages/73/7d/f2f9db34af103bea3e09735bb40b021788a5e834c81eedb541991badf8f5/pydantic_core-2.41.5-cp313-cp313-win_arm64.whl", hash = "sha256:3f84d5c1b4ab906093bdc1ff10484838aca54ef08de4afa9de0f5f14d69639cd", size = 1981005, upload-time = "2025-11-04T13:40:54.734Z" }, + { url = "https://files.pythonhosted.org/packages/09/32/59b0c7e63e277fa7911c2fc70ccfb45ce4b98991e7ef37110663437005af/pydantic_core-2.41.5-graalpy312-graalpy250_312_native-macosx_10_12_x86_64.whl", hash = "sha256:7da7087d756b19037bc2c06edc6c170eeef3c3bafcb8f532ff17d64dc427adfd", size = 2110495, upload-time = "2025-11-04T13:42:49.689Z" }, + { url = "https://files.pythonhosted.org/packages/aa/81/05e400037eaf55ad400bcd318c05bb345b57e708887f07ddb2d20e3f0e98/pydantic_core-2.41.5-graalpy312-graalpy250_312_native-macosx_11_0_arm64.whl", hash = "sha256:aabf5777b5c8ca26f7824cb4a120a740c9588ed58df9b2d196ce92fba42ff8dc", size = 1915388, upload-time = "2025-11-04T13:42:52.215Z" }, + { url = "https://files.pythonhosted.org/packages/6e/0d/e3549b2399f71d56476b77dbf3cf8937cec5cd70536bdc0e374a421d0599/pydantic_core-2.41.5-graalpy312-graalpy250_312_native-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:c007fe8a43d43b3969e8469004e9845944f1a80e6acd47c150856bb87f230c56", size = 1942879, upload-time = "2025-11-04T13:42:56.483Z" }, + { url = 
"https://files.pythonhosted.org/packages/f7/07/34573da085946b6a313d7c42f82f16e8920bfd730665de2d11c0c37a74b5/pydantic_core-2.41.5-graalpy312-graalpy250_312_native-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:76d0819de158cd855d1cbb8fcafdf6f5cf1eb8e470abe056d5d161106e38062b", size = 2139017, upload-time = "2025-11-04T13:42:59.471Z" }, ] [[package]]