HealthQuery

Agent Specification

Spec Driven Development — HealthQuery v0.3

v0.3 16/05/2026 7 tools 6 intents 18 rules
Overview

Natural language query agent for a healthcare dataset. The user asks a question in Portuguese, Spanish, or English — the agent interprets it, generates SQL SELECT, executes it on the connected database, and returns the answer in text with the reasoning exposed.

Operates on 5 tables: pacientes, consultas, diagnosticos, medicamentos, exames. Limit of 30 queries per session. No data is modified.

Capabilities

What the agent does

  1. Accepts natural language questions in PT, ES and EN
  2. Classifies intent using the Haiku model before any other call
  3. Generates SQL SELECT compatible with the connected database (SQLite in prep, Teradata at the event)
  4. Executes the query and handles the result — returned rows, empty result, SQL error
  5. Returns response in 3 parts: direct answer + generated SQL (always exposed) + reasoning
  6. Supports multi-step questions: decomposes into sub-queries, reports how many will be used before executing
  7. Manages query counter: warns when 5 remain, shuts down at 30
  8. Logs each interaction with classified intent, SQL, execution time, tokens
  9. Conversational memory with 30-message sliding window — supports anaphoric refinements
  10. Post-response next-question suggestions, in the user's language
  11. Report mode — synthesizes session history without incrementing the query counter

What the agent does NOT do

  1. No writes — no INSERT, UPDATE, DELETE, DROP, CREATE, ALTER (programmatic block)
  2. Does not answer questions outside the dataset domain
  3. Does not formulate medical diagnoses or give clinical advice
  4. Does not execute SQL passed directly by the user
  5. Does not reference tables outside the 5 defined in the schema (programmatic block)
  6. Does not continue after the 30-query limit
  7. Does not invent data — if not found in the database, says so
7 Tools
Tool Function
execute_sql Core NL→SQL with multi-layer guard rails
detect_anomalies Compares resultado_valor with referencia_min/max, calculates deviation %, classifies as altered/critical
compare_periods 2 aggregation queries, absolute diff + percentage change per column
rank_patients CTE with 4 weighted factors, top N with score and justification
patient_profile 4 internal queries, clinical prose synthesis by Sonnet
trend_analysis Splits records into 2 halves, compares averages, interprets relative to reference range
calculate Safe AST evaluator for compound growth, exponential, multi-step formulas
Guard Rails

Guard rails are programmatic validations in the code, not in the prompt. The model does not decide what is allowed — the code decides first.

Guard Rail Type Trigger Action
Write blocked Programmatic Haiku classifies as write_attempt Blocks, responds without calling DB
Query limit Programmatic Counter ≥ 30 Ends session
Schema enforcement Programmatic SQL references table outside the 5 defined Rejects before executing
PII + sensitive diagnosis Programmatic Query would return name + condition without aggregation Forces aggregation or refuses
check_pii_result Programmatic Actual result contains name column + clinical columns Blocks — catches SELECT * and split-queries
Behavior by Situation
1

Clear question, direct query

Classifies as data_query, generates SELECT, executes, returns result with SQL and reasoning.

2

Ambiguous question

Asks 1 clarifying question per turn. If still ambiguous, picks the most common interpretation and informs which one was chosen.

3

Out-of-scope question

Classifies as out_of_scope and informs that it operates exclusively on the healthcare dataset.

4

Write or modification attempt

Haiku classifies as write_attempt → programmatic block before reaching the main model or database.

5

Query returns empty

Does not invent data, does not speculate. Reports that no results were found for the queried condition.

6

SQL error

Catches the error, attempts auto-correction once (re-sends error + SQL to model). If it fails again, informs the user. The attempt counts toward the 30-query limit.

7

Multi-step question

Explicitly decomposes before executing: reports how many queries will be used and what each one resolves.

8

Approaching the limit

Warns when 5 remain. Ends with a clear message at 30. Guides the user to start a new session.

9

PII + sensitive condition

Blocks return of identifiers combined with a specific diagnosis. Offers count or aggregation as alternative. Production-safe standard even with fake data.

10

Input is not a question (greeting)

Classifies as greeting, responds briefly. Does not decrement the query counter.

Schema — 5 tables

pacientes

Column Type Description
id INTEGER PK Identifier
nome TEXT Full name
data_nascimento DATE To calculate age
sexo TEXT 'M' or 'F'
cidade TEXT City of residence
estado TEXT State (UF)
plano_saude TEXT E.g.: "Unimed", "SUS", "Bradesco"

consultas

Column Type Description
id INTEGER PK Identifier
paciente_id INTEGER FK → pacientes
data_consulta DATE Consultation date
medico TEXT Doctor's name
especialidade TEXT E.g.: "Cardiology"
tipo TEXT 'elective', 'emergency', 'follow-up'
duracao_min INTEGER Duration in minutes

diagnosticos

Column Type Description
id INTEGER PK Identifier
paciente_id INTEGER FK → pacientes
consulta_id INTEGER FK → consultas
cid TEXT ICD code (e.g.: E11.9, I10)
descricao TEXT E.g.: "Type 2 Diabetes"
data_diagnostico DATE Diagnosis date
status TEXT 'active', 'chronic', 'resolved'

medicamentos

Column Type Description
id INTEGER PK Identifier
paciente_id INTEGER FK → pacientes
consulta_id INTEGER FK → consultas (nullable)
nome_comercial TEXT E.g.: "Losartan"
principio_ativo TEXT Active ingredient
dosagem TEXT E.g.: "500mg"
frequencia TEXT E.g.: "twice daily"
data_inicio DATE Treatment start
data_fim DATE End date (NULL = active)
status TEXT 'active', 'suspended', 'completed'

exames

Column Type Description
id INTEGER PK Identifier
paciente_id INTEGER FK → pacientes
consulta_id INTEGER FK → consultas (nullable)
tipo_exame TEXT 'laboratory', 'imaging', 'functional'
nome_exame TEXT E.g.: "Fasting blood glucose"
resultado_valor REAL For numeric results (nullable)
resultado_texto TEXT For textual results
unidade TEXT E.g.: "mg/dL"
referencia_min REAL Minimum normal value
referencia_max REAL Maximum normal value
data_coleta DATE Collection date
status TEXT 'normal', 'altered', 'critical'
Design Decisions
Decision Choice Reason
Intent classifier Haiku (model) More robust than regex for edge cases; negligible cost
Guard rails Programmatic, not via prompt Prompt can be bypassed; code cannot
SQL auto-correction Maximum 1 retry Loops are expensive; 2 failures signal input ambiguity
Query counting Includes failed attempts Aligns with the 30-query hackathon limit
SQL always exposed Yes Observability and credibility
SELECT * Never allowed PII leak via wildcard
EXISTS for multiple conditions Separate EXISTS, not AND on same row A diagnosis record only has one ICD code; AND would return empty
tool_choice="none" on final_response Enforced Prevents BadRequestError 400 when model tries to generate a second tool call
Language PT, ES and EN Patrick works in PT; hackathon is global
Observability — Log Structure

Each interaction is logged to logs/session_<id>.jsonl. One record per line.

{
  "session_id": "uuid",
  "query_number": 3,
  "timestamp": "ISO8601",
  "user_input": "...",
  "intent_classified": "data_query",
  "sql_generated": "SELECT ...",
  "execution_time_ms": 120,
  "rows_returned": 47,
  "error": null,
  "model_used": "claude-sonnet-4-6",
  "input_tokens": 340,
  "output_tokens": 85
}
Changelog
v0.3 — 16/05/2026
  • 7 tools: +detect_anomalies, +compare_periods, +rank_patients, +patient_profile, +trend_analysis, +calculate
  • 6 intents: +greeting, +report_request
  • Conversational memory with 30-message sliding window
  • Post-response next-question suggestions
  • Report mode without incrementing query_count
  • Multilingual PT/ES/EN — classifier detects language
v0.2 — 16/05/2026
  • 12 rules in SYSTEM_PROMPT (vs. 5 original) — 7 new ones emerged from test bugs
  • check_pii_result: analyzes actual result columns, not just the SQL
  • Support for multiple parallel tool_use blocks
  • tool_choice="none" on final_response — fixes BadRequestError 400
  • Haiku classifier calibration: examples of clarification_needed vs. data_query
v0.1 — 15/05/2026
  • Initial spec: overview, capabilities, 4 guard rails, 5-table schema
  • 10 behavior situations documented
  • JSONL log structure defined

HealthQuery — AI Agent Hackathon Pilot — Teradata Global Services — June 2026