Usage¶

Pipeline¶

The pipeline is the recommended way to use hiperhealth. It runs clinical stages independently through composable skills.

Running a single stage¶

from hiperhealth.pipeline import PipelineContext, Stage, create_default_runner

runner = create_default_runner()

ctx = PipelineContext(
    patient={'symptoms': 'chest pain, shortness of breath', 'age': 45},
    language='en',
    session_id='visit-1',
)

ctx = runner.run(Stage.DIAGNOSIS, ctx)
print(ctx.results['diagnosis'].summary)
print(ctx.results['diagnosis'].options)

Running multiple stages¶

ctx = runner.run_many([Stage.SCREENING, Stage.DIAGNOSIS, Stage.EXAM], ctx)

Persisting context between sessions¶

Stages can be executed at different times by different actors. Serialize the context to JSON between invocations:

# Monday — nurse runs screening
ctx = PipelineContext(
    patient={'symptoms': 'Patient John has fever and cough', 'age': 30},
    language='pt',
    session_id='encounter-42',
)
runner = create_default_runner()
ctx = runner.run(Stage.SCREENING, ctx)

# Save to database, file, or message queue
saved_json = ctx.model_dump_json()

# Wednesday — physician restores context and runs diagnosis
ctx = PipelineContext.model_validate_json(saved_json)
ctx = runner.run(Stage.DIAGNOSIS, ctx)

Available stages¶

Stage	Description
`screening`	Initial triage, PII de-identification
`intake`	Data extraction from reports and wearable files
`diagnosis`	LLM-powered differential diagnosis
`exam`	Exam/procedure suggestions based on diagnosis
`treatment`	Treatment planning (extensible via custom skills)
`prescription`	Prescription generation (extensible via custom skills)

Built-in skills¶

The create_default_runner() factory registers three built-in skills in this order:

Skill	Stages	Description
`PrivacySkill`	screening, intake	De-identifies PII in patient data
`ExtractionSkill`	intake	Extracts text from medical reports and wearable data
`DiagnosticsSkill`	diagnosis, exam	LLM-powered diagnosis and exam suggestions

Skills run in registration order, so PrivacySkill always runs before ExtractionSkill within the same stage.

Installing and registering custom skills¶

Custom skills are typically published through channel repositories. Register a channel once, inspect its skills, then install and register the canonical skill id you want:

from hiperhealth.pipeline import SkillRegistry, create_default_runner

registry = SkillRegistry()
registry.add_channel(
    'https://github.com/my-org/traditional-medicine.git',
    local_name='tm',
)

registry.list_channels()
registry.list_channel_skills('tm')
registry.install_skill('tm.ayurveda')

runner = create_default_runner()
runner.register('tm.ayurveda', index=0)

In notebooks and scripts, registry.list_skills() is useful for exploring all registered channel skills and built-ins from one place.

registry.list_skills()
registry.list_skills(channel='tm')
registry.list_skills(channel='tm', installed_only=True)

To compare results with and without a specific skill, temporarily disable it at the runner layer without uninstalling or unregistering it:

with runner.disabled({'tm.ayurveda'}):
    ctx_without_ayurveda = runner.run(Stage.TREATMENT, ctx)

ctx_with_ayurveda = runner.run(Stage.TREATMENT, ctx)

For one-off calls, run() also accepts disabled_skills=.

For channel lifecycle operations such as update_channel(), remove_channel(), install_channel(include_disabled=True), and the skills-channel.yaml / skill.yaml manifest layout, see Creating Skills.

See Creating Skills for the full channel repository layout and manifest schema.

Session-based workflow¶

For multi-visit clinical scenarios, sessions provide a parquet-backed event log that persists the full interaction history. The calling system manages the session file lifecycle (storage, deletion, retention).

Creating and loading sessions¶

from hiperhealth.pipeline import Session

# Create a new session
session = Session.create('/data/sessions/patient-visit.parquet', language='en')

# Provide clinical data (no PII — only clinical information)
session.set_clinical_data({
    'symptoms': 'chronic bloating, fatigue',
    'age': 34,
    'biological_sex': 'female',
})

# Load an existing session (e.g., days later)
session = Session.load('/data/sessions/patient-visit.parquet')

Checking requirements before execution¶

Skills declare what information they need before a stage can run. Use check_requirements() to gather inquiries from all relevant skills:

from hiperhealth.pipeline import Session, Stage, create_default_runner

runner = create_default_runner()
session = Session.load('/data/sessions/patient-visit.parquet')

inquiries = runner.check_requirements(Stage.DIAGNOSIS, session)
for inq in inquiries:
    print(f'[{inq.priority}] {inq.field}: {inq.label}')

Inquiries have three priority levels reflecting clinical data availability:

Priority	Meaning	Example
`required`	Must have before this stage can run	Basic symptoms for diagnosis
`supplementary`	Would improve results, available now	Dietary history, medication list
`deferred`	Only available after a future pipeline step	Lab results (after exam stage)

Providing answers and running stages¶

# Patient provides answers
session.provide_answers({'dietary_history': 'High carb, low fiber...'})

# Re-check — are required fields satisfied?
inquiries = runner.check_requirements(Stage.DIAGNOSIS, session)
required = [i for i in inquiries if i.priority == 'required']

if not required:
    runner.run_session(Stage.DIAGNOSIS, session, llm=my_llm)

Multi-visit workflow¶

Not all data is available at the same time. A typical multi-visit flow:

# Visit 1: Preliminary diagnosis with available data
runner.run_session(Stage.DIAGNOSIS, session, llm=my_llm)
runner.run_session(Stage.EXAM, session, llm=my_llm)  # requests lab work

# Visit 2: Lab results arrive, re-run with enriched data
session = Session.load('/data/sessions/patient-visit.parquet')
session.provide_answers({'stool_analysis': lab_results})
runner.run_session(Stage.DIAGNOSIS, session, llm=my_llm)  # complete diagnosis

# Visit 3: Treatment plan
runner.check_requirements(Stage.TREATMENT, session)
runner.run_session(Stage.TREATMENT, session, llm=my_llm)

Inspecting session state¶

session.clinical_data       # all patient data (merged from events)
session.results             # stage results keyed by stage name
session.pending_inquiries   # unanswered inquiries
session.stages_completed    # which stages have run
session.events              # raw event log

Interactive analysis in Jupyter notebooks¶

hiperhealth is designed to work as a data science framework for clinical analysis. Physicians can use it directly from Jupyter notebooks to study patient cases:

from hiperhealth.pipeline import Session, Stage, create_default_runner

runner = create_default_runner()

session = Session.create('/tmp/case-study.parquet')
session.set_clinical_data({
    'symptoms': 'chronic fatigue, joint pain, morning stiffness',
    'age': 52,
    'biological_sex': 'female',
    'family_history': 'rheumatoid arthritis (mother)',
})

# Check what information skills need
inquiries = runner.check_requirements(Stage.DIAGNOSIS, session)
for inq in inquiries:
    print(f'  [{inq.priority}] {inq.label}')

# Provide supplementary data interactively
session.provide_answers({
    'rheumatoid_factor': 'positive, 45 IU/mL',
    'anti_ccp': 'positive, 120 U/mL',
    'esr': '38 mm/hr',
})

# Run diagnosis
runner.run_session(Stage.DIAGNOSIS, session, llm=my_llm)
print(session.results[Stage.DIAGNOSIS])

Analyzing session data with pandas or polars¶

The session parquet file is a standard parquet that can be queried directly:

import polars as pl

df = pl.read_parquet('/tmp/case-study.parquet')

# See all events
df

# Filter to specific event types
df.filter(pl.col('event_type') == 'inquiries_raised')

# See what stages have been completed
df.filter(pl.col('event_type') == 'stage_completed').select('stage', 'timestamp')

Diagnostics¶

The diagnostics helpers return LLMDiagnosis objects with:

summary: short summary text
options: suggested diagnoses or exam/procedure names

Supported output languages are:

en
pt
es
fr
it

Unknown language values fall back to English.

Differential diagnosis¶

from hiperhealth.skills.diagnostics.core import differential

patient = {
    'age': 45,
    'gender': 'M',
    'symptoms': 'chest pain, shortness of breath',
    'previous_tests': 'ECG normal',
}

result = differential(patient, language='en', session_id='demo-1')
print(result.summary)
print(result.options)

Suggested exams and procedures¶

from hiperhealth.skills.diagnostics.core import exams

result = exams(
    ['Acute coronary syndrome'],
    language='en',
    session_id='demo-1',
)
print(result.summary)
print(result.options)

Runtime configuration in code¶

from hiperhealth.skills.diagnostics.core import differential
from hiperhealth.llm import LLMSettings

settings = LLMSettings(
    provider='ollama',
    model='llama3.2:3b',
    api_params={'base_url': 'http://localhost:11434/v1'},
)

result = differential(
    {'symptoms': 'headache'},
    llm_settings=settings,
)

Medical report extraction¶

Medical reports are extracted locally from PDF or image files. The extractor returns text and metadata, not FHIR resources.

Supported inputs:

pdf
png
jpg
jpeg

Example:

from hiperhealth.skills.extraction.medical_reports import (
    MedicalReportFileExtractor,
)

extractor = MedicalReportFileExtractor()
report = extractor.extract_report_data(
    'tests/data/reports/pdf_reports/report-1.pdf'
)

print(report['source_name'])
print(report['mime_type'])
print(report['text'][:200])

Returned payload keys:

source_name
source_type
mime_type
text

If you only need the raw text:

text = extractor.extract_text('tests/data/reports/pdf_reports/report-1.pdf')

Wearable data extraction¶

Wearable data extraction supports CSV and JSON inputs and returns a normalized list of dictionaries.

from hiperhealth.skills.extraction.wearable import WearableDataFileExtractor

extractor = WearableDataFileExtractor()
data = extractor.extract_wearable_data(
    'tests/data/wearable/wearable_data.csv'
)
print(data[:2])

De-identification¶

from hiperhealth.skills.privacy.deidentifier import (
    Deidentifier,
    deidentify_patient_record,
)

engine = Deidentifier()
record = {
    'symptoms': 'Patient John Doe reports severe headache.',
    'mental_health': 'Lives at 123 Main St',
}
clean = deidentify_patient_record(record, engine)
print(clean)

Raw LLM output capture¶

Diagnostics responses are normalized and then written to data/llm_raw/ using the supplied session_id when present.

Backward compatibility¶

The old import paths continue to work:

# These still work
from hiperhealth.agents.diagnostics.core import differential, exams
from hiperhealth.agents.extraction.medical_reports import MedicalReportFileExtractor
from hiperhealth.agents.extraction.wearable import WearableDataFileExtractor
from hiperhealth.privacy.deidentifier import Deidentifier

The canonical locations are now under hiperhealth.skills.* and hiperhealth.pipeline.