Agent Specification
Spec Driven Development — HealthQuery v0.3
Overview
Natural language query agent for a healthcare dataset. The user asks a question in Portuguese, Spanish, or English — the agent interprets it, generates SQL SELECT, executes it on the connected database, and returns the answer in text with the reasoning exposed.
Operates on 5 tables: pacientes, consultas, diagnosticos, medicamentos, exames. Limit of 30 queries per session. No data is modified.
Capabilities
What the agent does
- Accepts natural language questions in PT, ES and EN
- Classifies intent using the Haiku model before any other call
- Generates SQL SELECT compatible with the connected database (SQLite in prep, Teradata at the event)
- Executes the query and handles the result — returned rows, empty result, SQL error
- Returns response in 3 parts: direct answer + generated SQL (always exposed) + reasoning
- Supports multi-step questions: decomposes into sub-queries, reports how many will be used before executing
- Manages query counter: warns when 5 remain, shuts down at 30
- Logs each interaction with classified intent, SQL, execution time, tokens
- Conversational memory with 30-message sliding window — supports anaphoric refinements
- Post-response next-question suggestions, in the user's language
- Report mode — synthesizes session history without incrementing the query counter
What the agent does NOT do
- No writes — no INSERT, UPDATE, DELETE, DROP, CREATE, ALTER (programmatic block)
- Does not answer questions outside the dataset domain
- Does not formulate medical diagnoses or give clinical advice
- Does not execute SQL passed directly by the user
- Does not reference tables outside the 5 defined in the schema (programmatic block)
- Does not continue after the 30-query limit
- Does not invent data — if not found in the database, says so
7 Tools
| Tool | Function |
|---|---|
execute_sql |
Core NL→SQL with multi-layer guard rails |
detect_anomalies |
Compares resultado_valor with referencia_min/max, calculates deviation %, classifies as altered/critical |
compare_periods |
2 aggregation queries, absolute diff + percentage change per column |
rank_patients |
CTE with 4 weighted factors, top N with score and justification |
patient_profile |
4 internal queries, clinical prose synthesis by Sonnet |
trend_analysis |
Splits records into 2 halves, compares averages, interprets relative to reference range |
calculate |
Safe AST evaluator for compound growth, exponential, multi-step formulas |
Guard Rails
Guard rails are programmatic validations in the code, not in the prompt. The model does not decide what is allowed — the code decides first.
| Guard Rail | Type | Trigger | Action |
|---|---|---|---|
| Write blocked | Programmatic | Haiku classifies as write_attempt |
Blocks, responds without calling DB |
| Query limit | Programmatic | Counter ≥ 30 | Ends session |
| Schema enforcement | Programmatic | SQL references table outside the 5 defined | Rejects before executing |
| PII + sensitive diagnosis | Programmatic | Query would return name + condition without aggregation | Forces aggregation or refuses |
| check_pii_result | Programmatic | Actual result contains name column + clinical columns | Blocks — catches SELECT * and split-queries |
Behavior by Situation
Clear question, direct query
Classifies as data_query, generates SELECT, executes, returns result with SQL and reasoning.
Ambiguous question
Asks 1 clarifying question per turn. If still ambiguous, picks the most common interpretation and informs which one was chosen.
Out-of-scope question
Classifies as out_of_scope and informs that it operates exclusively on the healthcare dataset.
Write or modification attempt
Haiku classifies as write_attempt → programmatic block before reaching the main model or database.
Query returns empty
Does not invent data, does not speculate. Reports that no results were found for the queried condition.
SQL error
Catches the error, attempts auto-correction once (re-sends error + SQL to model). If it fails again, informs the user. The attempt counts toward the 30-query limit.
Multi-step question
Explicitly decomposes before executing: reports how many queries will be used and what each one resolves.
Approaching the limit
Warns when 5 remain. Ends with a clear message at 30. Guides the user to start a new session.
PII + sensitive condition
Blocks return of identifiers combined with a specific diagnosis. Offers count or aggregation as alternative. Production-safe standard even with fake data.
Input is not a question (greeting)
Classifies as greeting, responds briefly. Does not decrement the query counter.
Schema — 5 tables
pacientes
| Column | Type | Description |
|---|---|---|
| id | INTEGER PK | Identifier |
| nome | TEXT | Full name |
| data_nascimento | DATE | To calculate age |
| sexo | TEXT | 'M' or 'F' |
| cidade | TEXT | City of residence |
| estado | TEXT | State (UF) |
| plano_saude | TEXT | E.g.: "Unimed", "SUS", "Bradesco" |
consultas
| Column | Type | Description |
|---|---|---|
| id | INTEGER PK | Identifier |
| paciente_id | INTEGER FK | → pacientes |
| data_consulta | DATE | Consultation date |
| medico | TEXT | Doctor's name |
| especialidade | TEXT | E.g.: "Cardiology" |
| tipo | TEXT | 'elective', 'emergency', 'follow-up' |
| duracao_min | INTEGER | Duration in minutes |
diagnosticos
| Column | Type | Description |
|---|---|---|
| id | INTEGER PK | Identifier |
| paciente_id | INTEGER FK | → pacientes |
| consulta_id | INTEGER FK | → consultas |
| cid | TEXT | ICD code (e.g.: E11.9, I10) |
| descricao | TEXT | E.g.: "Type 2 Diabetes" |
| data_diagnostico | DATE | Diagnosis date |
| status | TEXT | 'active', 'chronic', 'resolved' |
medicamentos
| Column | Type | Description |
|---|---|---|
| id | INTEGER PK | Identifier |
| paciente_id | INTEGER FK | → pacientes |
| consulta_id | INTEGER FK | → consultas (nullable) |
| nome_comercial | TEXT | E.g.: "Losartan" |
| principio_ativo | TEXT | Active ingredient |
| dosagem | TEXT | E.g.: "500mg" |
| frequencia | TEXT | E.g.: "twice daily" |
| data_inicio | DATE | Treatment start |
| data_fim | DATE | End date (NULL = active) |
| status | TEXT | 'active', 'suspended', 'completed' |
exames
| Column | Type | Description |
|---|---|---|
| id | INTEGER PK | Identifier |
| paciente_id | INTEGER FK | → pacientes |
| consulta_id | INTEGER FK | → consultas (nullable) |
| tipo_exame | TEXT | 'laboratory', 'imaging', 'functional' |
| nome_exame | TEXT | E.g.: "Fasting blood glucose" |
| resultado_valor | REAL | For numeric results (nullable) |
| resultado_texto | TEXT | For textual results |
| unidade | TEXT | E.g.: "mg/dL" |
| referencia_min | REAL | Minimum normal value |
| referencia_max | REAL | Maximum normal value |
| data_coleta | DATE | Collection date |
| status | TEXT | 'normal', 'altered', 'critical' |
Design Decisions
| Decision | Choice | Reason |
|---|---|---|
| Intent classifier | Haiku (model) | More robust than regex for edge cases; negligible cost |
| Guard rails | Programmatic, not via prompt | Prompt can be bypassed; code cannot |
| SQL auto-correction | Maximum 1 retry | Loops are expensive; 2 failures signal input ambiguity |
| Query counting | Includes failed attempts | Aligns with the 30-query hackathon limit |
| SQL always exposed | Yes | Observability and credibility |
| SELECT * | Never allowed | PII leak via wildcard |
| EXISTS for multiple conditions | Separate EXISTS, not AND on same row | A diagnosis record only has one ICD code; AND would return empty |
| tool_choice="none" on final_response | Enforced | Prevents BadRequestError 400 when model tries to generate a second tool call |
| Language | PT, ES and EN | Patrick works in PT; hackathon is global |
Observability — Log Structure
Each interaction is logged to logs/session_<id>.jsonl. One record per line.
{
"session_id": "uuid",
"query_number": 3,
"timestamp": "ISO8601",
"user_input": "...",
"intent_classified": "data_query",
"sql_generated": "SELECT ...",
"execution_time_ms": 120,
"rows_returned": 47,
"error": null,
"model_used": "claude-sonnet-4-6",
"input_tokens": 340,
"output_tokens": 85
}
Changelog
- 7 tools: +detect_anomalies, +compare_periods, +rank_patients, +patient_profile, +trend_analysis, +calculate
- 6 intents: +greeting, +report_request
- Conversational memory with 30-message sliding window
- Post-response next-question suggestions
- Report mode without incrementing query_count
- Multilingual PT/ES/EN — classifier detects language
- 12 rules in SYSTEM_PROMPT (vs. 5 original) — 7 new ones emerged from test bugs
- check_pii_result: analyzes actual result columns, not just the SQL
- Support for multiple parallel tool_use blocks
- tool_choice="none" on final_response — fixes BadRequestError 400
- Haiku classifier calibration: examples of clarification_needed vs. data_query
- Initial spec: overview, capabilities, 4 guard rails, 5-table schema
- 10 behavior situations documented
- JSONL log structure defined
HealthQuery — AI Agent Hackathon Pilot — Teradata Global Services — June 2026