AI Schema Discovery

Watch LangExtract automatically discover extraction patterns from your data

The Innovation

🔍

Zero Configuration

No manual YAML schemas required. LangExtract analyzes your text and suggests extraction patterns automatically.

🧠

Domain Intelligence

Adapts extraction patterns to the type of text: medical notes get different schemas than scientific papers.

Instant Results

From sample text to structured data in seconds. No domain expertise or manual configuration needed.

Traditional Approach

# Manual schema.yaml - hours of work
extraction_schema:
  patient_info:
    age: "Extract age near 'year-old'"
    gender: "Extract male or female"
  medications:
    drug: "Extract medication names"
    dosage: "Extract dosage amounts"
# ... 50+ more lines of configuration

Aperio + LangExtract

# AI discovers schema automatically
import langextract as lx

result = lx.extract(
    text_or_documents=medical_text,
    prompt_description="Extract patient info for analytics",
    examples=[simple_example],
)
# ✅ Complete schema discovered automatically
# ✅ No manual configuration required

Live Demo Results

  • 15 - Total Entities Extracted
  • 2 - Domains Analyzed
  • 10 - Schema Categories Discovered
  • 0 - Manual Configuration Files

Medical Domain Schema

Automatically discovered categories:

  • PATIENT_DEMO - Demographics and patient info
  • CONDITION - Medical conditions and diagnoses
  • MEDICATION - Drugs, dosages, frequencies
  • TREATMENT - Procedures and recommendations
  • DIAGNOSIS - Clinical assessments

Scientific Domain Schema

Completely different patterns discovered:

  • METHOD - Research techniques and approaches
  • PERFORMANCE - Accuracy metrics and results
  • DATASET - Research data sources
  • INFRASTRUCTURE - Hardware and training details
  • IMPROVEMENT - Performance gains and innovations
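
As a rough illustration of what the category lists above amount to, the sketch below groups extracted entities by domain and category using only the Python standard library. The `Entity` class, the `discover_schema` helper, and the sample entities are hypothetical: they are not LangExtract's API and not the demo's real output. Only the category names are taken from the lists above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """One extracted span (hypothetical stand-in for an extraction result)."""
    domain: str    # e.g. "medical" or "scientific"
    category: str  # extraction class, e.g. "MEDICATION"
    text: str      # the extracted text span


def discover_schema(entities):
    """Group entities by domain and count how many fall in each category."""
    schema = defaultdict(lambda: defaultdict(int))
    for entity in entities:
        schema[entity.domain][entity.category] += 1
    # Convert the nested defaultdicts to plain dicts for predictable output.
    return {domain: dict(counts) for domain, counts in schema.items()}


# Illustrative sample only; not the demo's actual output.
sample = [
    Entity("medical", "PATIENT_DEMO", "45-year-old male"),
    Entity("medical", "MEDICATION", "metformin 500 mg"),
    Entity("medical", "MEDICATION", "lisinopril 10 mg"),
    Entity("scientific", "METHOD", "transformer fine-tuning"),
    Entity("scientific", "DATASET", "ImageNet"),
]

schema = discover_schema(sample)
total_categories = sum(len(counts) for counts in schema.values())
print(schema)
print("categories discovered:", total_categories)
```

Under this reading, a "schema" is simply the set of distinct categories seen per domain, which is why the two domains end up with different lists.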

Knowledge Graph Visualization

Interactive network graphs showing relationships between extracted entities across both domains.

Knowledge graphs showing medical and scientific domain extractions

Left: Medical domain relationships (patient data, conditions, treatments)
Right: Scientific domain relationships (methods, performance, datasets)
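
The page does not say how edges in these graphs are defined. One plausible reading, sketched here as an assumption, is co-occurrence: two entities are linked when they were extracted from the same source text. The `build_graph` helper, the record ids, and the sample entities are all hypothetical, and the demo notebook may well use a graph library instead.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(records):
    """Link entities that were extracted from the same source text.

    `records` maps a source id to a list of (category, text) pairs.
    Returns an undirected adjacency map: node -> set of neighbouring nodes.
    Sources with a single entity contribute no edges and no nodes.
    """
    graph = defaultdict(set)
    for entities in records.values():
        # De-duplicate, then connect every pair found in this source.
        for a, b in combinations(sorted(set(entities)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


# Illustrative sample only; not the demo's actual output.
records = {
    "note_1": [
        ("PATIENT_DEMO", "45-year-old male"),
        ("CONDITION", "type 2 diabetes"),
        ("MEDICATION", "metformin"),
    ],
    "paper_1": [
        ("METHOD", "fine-tuning"),
        ("DATASET", "ImageNet"),
    ],
}

graph = build_graph(records)
for node, neighbours in graph.items():
    print(node, "->", sorted(neighbours))
```

With this construction the medical and scientific entities form separate clusters, which matches the left/right split described above.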

Try It Yourself

Ready to explore AI-powered schema discovery? Get started in minutes.

git clone https://github.com/knightsri/aperio
cd aperio
python -m venv aperio_env
source aperio_env/bin/activate  # on Windows (Git Bash): source aperio_env/Scripts/activate
pip install -r requirements.txt
cp .env.example .env

# Add your Gemini API key to .env
jupyter notebook aperio_demo.ipynb
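
Before launching the notebook, it can help to confirm that the key was actually filled in. The sketch below parses `.env`-style text with the standard library and reports required names that are missing or empty. The variable name `GEMINI_API_KEY` is an assumption; check `.env.example` for the name the project really expects.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, required):
    """Return the required names that are absent or empty."""
    return [name for name in required if not values.get(name)]


# Hypothetical .env contents with the key left blank.
example = """
# copied from .env.example
GEMINI_API_KEY=
"""

print(missing_keys(parse_env(example), ["GEMINI_API_KEY"]))
```

To check the real file, pass its contents instead, for example `parse_env(open(".env").read())`.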