Docs — HealthQuery

Overview

Natural language query agent for a healthcare dataset. The user asks a question in Portuguese, Spanish, or English — the agent interprets it, generates SQL SELECT, executes it on the connected database, and returns the answer in text with the reasoning exposed.

Operates on 5 tables: pacientes, consultas, diagnosticos, medicamentos, exames. Limit of 30 queries per session. No data is modified.

Capabilities

What the agent does

Accepts natural language questions in PT, ES and EN
Classifies intent using the Haiku model before any other call
Generates SQL SELECT compatible with the connected database (SQLite in prep, Teradata at the event)
Executes the query and handles the result — returned rows, empty result, SQL error
Returns response in 3 parts: direct answer + generated SQL (always exposed) + reasoning
Supports multi-step questions: decomposes into sub-queries, reports how many will be used before executing
Manages query counter: warns when 5 remain, shuts down at 30
Logs each interaction with classified intent, SQL, execution time, tokens
Conversational memory with 30-message sliding window — supports anaphoric refinements
Post-response next-question suggestions, in the user's language
Report mode — synthesizes session history without incrementing the query counter

What the agent does NOT do

No writes — no INSERT, UPDATE, DELETE, DROP, CREATE, ALTER (programmatic block)
Does not answer questions outside the dataset domain
Does not formulate medical diagnoses or give clinical advice
Does not execute SQL passed directly by the user
Does not reference tables outside the 5 defined in the schema (programmatic block)
Does not continue after the 30-query limit
Does not invent data — if not found in the database, says so

7 Tools

Tool	Function
`execute_sql`	Core NL→SQL with multi-layer guard rails
`detect_anomalies`	Compares resultado_valor with referencia_min/max, calculates deviation %, classifies as altered/critical
`compare_periods`	2 aggregation queries, absolute diff + percentage change per column
`rank_patients`	CTE with 4 weighted factors, top N with score and justification
`patient_profile`	4 internal queries, clinical prose synthesis by Sonnet
`trend_analysis`	Splits records into 2 halves, compares averages, interprets relative to reference range
`calculate`	Safe AST evaluator for compound growth, exponential, multi-step formulas

Guard Rails

Guard rails are programmatic validations in the code, not in the prompt. The model does not decide what is allowed — the code decides first.

Guard Rail	Type	Trigger	Action
Write blocked	Programmatic	Haiku classifies as `write_attempt`	Blocks, responds without calling DB
Query limit	Programmatic	Counter ≥ 30	Ends session
Schema enforcement	Programmatic	SQL references table outside the 5 defined	Rejects before executing
PII + sensitive diagnosis	Programmatic	Query would return name + condition without aggregation	Forces aggregation or refuses
check_pii_result	Programmatic	Actual result contains name column + clinical columns	Blocks — catches SELECT * and split-queries

Behavior by Situation

1

Clear question, direct query

Classifies as data_query, generates SELECT, executes, returns result with SQL and reasoning.

2

Ambiguous question

Asks 1 clarifying question per turn. If still ambiguous, picks the most common interpretation and informs which one was chosen.

3

Out-of-scope question

Classifies as out_of_scope and informs that it operates exclusively on the healthcare dataset.

4

Write or modification attempt

Haiku classifies as write_attempt → programmatic block before reaching the main model or database.

5

Query returns empty

Does not invent data, does not speculate. Reports that no results were found for the queried condition.

6

SQL error

Catches the error, attempts auto-correction once (re-sends error + SQL to model). If it fails again, informs the user. The attempt counts toward the 30-query limit.

7

Multi-step question

Explicitly decomposes before executing: reports how many queries will be used and what each one resolves.

8

Approaching the limit

Warns when 5 remain. Ends with a clear message at 30. Guides the user to start a new session.

9

PII + sensitive condition

Blocks return of identifiers combined with a specific diagnosis. Offers count or aggregation as alternative. Production-safe standard even with fake data.

10

Input is not a question (greeting)

Classifies as greeting, responds briefly. Does not decrement the query counter.

Schema — 5 tables

pacientes

Column	Type	Description
id	INTEGER PK	Identifier
nome	TEXT	Full name
data_nascimento	DATE	To calculate age
sexo	TEXT	'M' or 'F'
cidade	TEXT	City of residence
estado	TEXT	State (UF)
plano_saude	TEXT	E.g.: "Unimed", "SUS", "Bradesco"

consultas

Column	Type	Description
id	INTEGER PK	Identifier
paciente_id	INTEGER FK	→ pacientes
data_consulta	DATE	Consultation date
medico	TEXT	Doctor's name
especialidade	TEXT	E.g.: "Cardiology"
tipo	TEXT	'elective', 'emergency', 'follow-up'
duracao_min	INTEGER	Duration in minutes

diagnosticos

Column	Type	Description
id	INTEGER PK	Identifier
paciente_id	INTEGER FK	→ pacientes
consulta_id	INTEGER FK	→ consultas
cid	TEXT	ICD code (e.g.: E11.9, I10)
descricao	TEXT	E.g.: "Type 2 Diabetes"
data_diagnostico	DATE	Diagnosis date
status	TEXT	'active', 'chronic', 'resolved'

medicamentos

Column	Type	Description
id	INTEGER PK	Identifier
paciente_id	INTEGER FK	→ pacientes
consulta_id	INTEGER FK	→ consultas (nullable)
nome_comercial	TEXT	E.g.: "Losartan"
principio_ativo	TEXT	Active ingredient
dosagem	TEXT	E.g.: "500mg"
frequencia	TEXT	E.g.: "twice daily"
data_inicio	DATE	Treatment start
data_fim	DATE	End date (NULL = active)
status	TEXT	'active', 'suspended', 'completed'

exames

Column	Type	Description
id	INTEGER PK	Identifier
paciente_id	INTEGER FK	→ pacientes
consulta_id	INTEGER FK	→ consultas (nullable)
tipo_exame	TEXT	'laboratory', 'imaging', 'functional'
nome_exame	TEXT	E.g.: "Fasting blood glucose"
resultado_valor	REAL	For numeric results (nullable)
resultado_texto	TEXT	For textual results
unidade	TEXT	E.g.: "mg/dL"
referencia_min	REAL	Minimum normal value
referencia_max	REAL	Maximum normal value
data_coleta	DATE	Collection date
status	TEXT	'normal', 'altered', 'critical'

Design Decisions

Decision	Choice	Reason
Intent classifier	Haiku (model)	More robust than regex for edge cases; negligible cost
Guard rails	Programmatic, not via prompt	Prompt can be bypassed; code cannot
SQL auto-correction	Maximum 1 retry	Loops are expensive; 2 failures signal input ambiguity
Query counting	Includes failed attempts	Aligns with the 30-query hackathon limit
SQL always exposed	Yes	Observability and credibility
SELECT *	Never allowed	PII leak via wildcard
EXISTS for multiple conditions	Separate EXISTS, not AND on same row	A diagnosis record only has one ICD code; AND would return empty
tool_choice="none" on final_response	Enforced	Prevents BadRequestError 400 when model tries to generate a second tool call
Language	PT, ES and EN	Patrick works in PT; hackathon is global

Observability — Log Structure

Each interaction is logged to logs/session_<id>.jsonl. One record per line.

{
  "session_id": "uuid",
  "query_number": 3,
  "timestamp": "ISO8601",
  "user_input": "...",
  "intent_classified": "data_query",
  "sql_generated": "SELECT ...",
  "execution_time_ms": 120,
  "rows_returned": 47,
  "error": null,
  "model_used": "claude-sonnet-4-6",
  "input_tokens": 340,
  "output_tokens": 85
}

Changelog

v0.3 — 16/05/2026

7 tools: +detect_anomalies, +compare_periods, +rank_patients, +patient_profile, +trend_analysis, +calculate
6 intents: +greeting, +report_request
Conversational memory with 30-message sliding window
Post-response next-question suggestions
Report mode without incrementing query_count
Multilingual PT/ES/EN — classifier detects language

v0.2 — 16/05/2026

12 rules in SYSTEM_PROMPT (vs. 5 original) — 7 new ones emerged from test bugs
check_pii_result: analyzes actual result columns, not just the SQL
Support for multiple parallel tool_use blocks
tool_choice="none" on final_response — fixes BadRequestError 400
Haiku classifier calibration: examples of clarification_needed vs. data_query

v0.1 — 15/05/2026

Initial spec: overview, capabilities, 4 guard rails, 5-table schema
10 behavior situations documented
JSONL log structure defined