
AI Enrichment

The FDS Transformer supports tiered AI enrichment: fields are grouped into tiers by complexity, and each tier has its own model and generation settings. This keeps field generation cost-effective while maintaining quality where it matters most.

Overview

Tiered enrichment groups fields by complexity:

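| Tier | Model | Use Case | Cost | Speed |
|------|-------|----------|------|-------|
| Simple | Claude Haiku 4.5 | Fast, straightforward enrichment | Low | Fast |
| Medium | Claude Sonnet 4.5 | Balanced accuracy/speed | Medium | Medium |
| Complex | Claude Sonnet 4.5 | Deep biomechanical analysis | Higher | Slower |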

This approach:

  • Reduces costs by using cheaper models for simple tasks
  • Improves accuracy by dedicating powerful models to complex analysis
  • Enables batching to reduce API calls

Requirements

  • OpenRouter API Key - Get one at openrouter.ai
  • Set the environment variable:
export OPENROUTER_API_KEY=your-api-key-here

Configuration

Basic Setup

Add the enrichment section to your mapping.json:

{
  "enrichment": {
    "enabled": true,
    "provider": "openrouter",

    "tiers": {
      "simple": {
        "model": "anthropic/claude-haiku-4.5",
        "temperature": 0.1,
        "maxTokens": 1000,
        "batchSize": 5,
        "priority": "speed"
      },
      "medium": {
        "model": "anthropic/claude-sonnet-4.5",
        "temperature": 0.1,
        "maxTokens": 1500,
        "batchSize": 3,
        "priority": "balanced"
      },
      "complex": {
        "model": "anthropic/claude-sonnet-4.5",
        "temperature": 0.1,
        "maxTokens": 2000,
        "batchSize": 1,
        "priority": "accuracy"
      }
    },

    "fields": {
      "canonical.aliases": { "tier": "simple", "prompt": "aliases" },
      "classification.exerciseType": { "tier": "simple", "prompt": "classification-simple" },
      "classification.level": { "tier": "simple", "prompt": "classification-simple" },
      "metrics.primary": { "tier": "simple", "prompt": "metrics" },
      "equipment.optional": { "tier": "simple", "prompt": "equipment" },

      "constraints.contraindications": { "tier": "medium", "prompt": "constraints" },
      "constraints.prerequisites": { "tier": "medium", "prompt": "constraints" },
      "constraints.progressions": { "tier": "medium", "prompt": "progressions" },
      "constraints.regressions": { "tier": "medium", "prompt": "progressions" },
      "relations": { "tier": "medium", "prompt": "relations" },

      "classification.movement": { "tier": "complex", "prompt": "biomechanics" },
      "classification.mechanics": { "tier": "complex", "prompt": "biomechanics" },
      "classification.force": { "tier": "complex", "prompt": "biomechanics" },
      "classification.kineticChain": { "tier": "complex", "prompt": "biomechanics" },
      "targets.secondary": { "tier": "complex", "prompt": "biomechanics" }
    }
  }
}

Tier Configuration

Each tier has these settings:

| Property | Type | Description |
|----------|------|-------------|
| model | string | OpenRouter model identifier |
| temperature | number | Generation temperature (0-1). Lower = more deterministic |
| maxTokens | number | Maximum tokens for response |
| batchSize | number | Number of exercises to process together |
| priority | string | Optimization hint: speed, balanced, or accuracy |
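
To make batchSize concrete, here is a minimal Python sketch of splitting a list of exercises into batches, one request per batch. It is an illustration only: the helper name `chunk` is hypothetical, and the transformer's actual batching logic may differ.

```python
def chunk(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Twelve placeholder exercises, batched with the simple tier's batchSize of 5.
exercises = [f"exercise-{i}" for i in range(12)]
batches = list(chunk(exercises, 5))
print([len(b) for b in batches])  # [5, 5, 2]
```

The last batch is simply shorter when the exercise count is not a multiple of batchSize.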

Field Configuration

Each field in the fields object:

| Property | Type | Description |
|----------|------|-------------|
| tier | string | Which tier to use: simple, medium, complex |
| prompt | string | Prompt template key |
| enum | string[] | Valid values (for constrained fields) |
| required | boolean | Whether field must be populated |

Field-to-Tier Mapping

Simple Tier Fields

Fast enrichment for straightforward data:

| Field | Prompt | Description |
|-------|--------|-------------|
| canonical.aliases | aliases | Alternative names for the exercise |
| classification.exerciseType | classification-simple | Exercise type (strength, cardio, etc.) |
| classification.level | classification-simple | Difficulty level |
| metrics.primary | metrics | Primary measurement type |
| equipment.optional | equipment | Optional equipment suggestions |

Medium Tier Fields

Balanced enrichment for relational data:

| Field | Prompt | Description |
|-------|--------|-------------|
| constraints.contraindications | constraints | Medical/injury contraindications |
| constraints.prerequisites | constraints | Required abilities |
| constraints.progressions | progressions | Harder variations |
| constraints.regressions | progressions | Easier variations |
| relations | relations | Related exercise references |

Complex Tier Fields

Deep analysis for biomechanical data:

| Field | Prompt | Description |
|-------|--------|-------------|
| classification.movement | biomechanics | Movement pattern classification |
| classification.mechanics | biomechanics | Compound vs isolation |
| classification.force | biomechanics | Force direction (push/pull/static) |
| classification.kineticChain | biomechanics | Open vs closed chain |
| targets.secondary | biomechanics | Secondary muscles engaged |
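
A quick way to sanity-check a mapping before a run is to group the configured fields by tier and reject any field that names an undefined tier. The Python sketch below assumes the `enrichment` section has already been loaded into a dict; `group_fields_by_tier` is a hypothetical helper, not part of the FDS Transformer.

```python
def group_fields_by_tier(enrichment):
    """Map each tier name to the field paths assigned to it."""
    tiers = enrichment.get("tiers", {})
    grouped = {}
    for path, spec in enrichment.get("fields", {}).items():
        tier = spec["tier"]
        if tier not in tiers:
            raise ValueError(f"{path}: unknown tier {tier!r}")
        grouped.setdefault(tier, []).append(path)
    return grouped


# A trimmed-down version of the enrichment section shown earlier.
config = {
    "tiers": {"simple": {}, "medium": {}, "complex": {}},
    "fields": {
        "canonical.aliases": {"tier": "simple", "prompt": "aliases"},
        "relations": {"tier": "medium", "prompt": "relations"},
        "classification.force": {"tier": "complex", "prompt": "biomechanics"},
        "targets.secondary": {"tier": "complex", "prompt": "biomechanics"},
    },
}
grouped = group_fields_by_tier(config)
print(grouped["complex"])  # ['classification.force', 'targets.secondary']
```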

Running Enrichment

Full Enrichment (All Tiers)

fds-transformer transform \
  --input ./exercises.json \
  --config ./mapping.json \
  --output ./fds-output/

Single Tier Only

Run specific tiers to control costs or debug:

# Simple tier only (fastest, cheapest)
fds-transformer transform \
  --input ./exercises.json \
  --config ./mapping.json \
  --tier simple

# Medium tier only
fds-transformer transform \
  --input ./exercises.json \
  --config ./mapping.json \
  --tier medium

# Complex tier only (most detailed)
fds-transformer transform \
  --input ./exercises.json \
  --config ./mapping.json \
  --tier complex

Skip Enrichment

Transform without any AI enrichment:

fds-transformer transform \
  --input ./exercises.json \
  --config ./mapping.json \
  --no-enrichment

Cost Estimation

Preview costs before running:

fds-transformer transform \
  --input ./exercises.json \
  --config ./mapping.json \
  --estimate-cost

Output:

┌──────────────────────────────────────────────────────────────────────┐
│ Cost Estimation                                                      │
├──────────────────────────────────────────────────────────────────────┤
│ Input: 1,323 exercises                                               │
│ Enrichment fields: 18 (6 simple, 5 medium, 7 complex)                │
│                                                                      │
│ Tier       │ Model              │ Batch │  Calls │   Tokens │   Cost │
│ ───────────┼────────────────────┼───────┼────────┼──────────┼────────│
│ Simple     │ claude-haiku-4.5   │     5 │    265 │     ~53K │  $0.42 │
│ Medium     │ claude-sonnet-4.5  │     3 │    441 │    ~132K │  $1.98 │
│ Complex    │ claude-sonnet-4.5  │     1 │  1,323 │    ~529K │  $7.94 │
│ ───────────┴────────────────────┴───────┴────────┴──────────┴────────│
│ TOTAL                                   │  2,029 │   ~0.71M │ $10.34 │
│                                                                      │
│ Estimated time: 40 minutes (at 50 requests/min)                      │
└──────────────────────────────────────────────────────────────────────┘
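
The Calls column in this sample follows from the exercise count and each tier's batchSize, assuming one request per batch per tier. The Python sketch below reproduces those figures and the time estimate. The function name `calls_per_tier` is hypothetical, and the token and cost figures are not reproduced because they depend on prompt sizes and model pricing.

```python
import math


def calls_per_tier(exercise_count, batch_sizes):
    """Number of requests per tier: one per batch, rounding the last batch up."""
    return {
        tier: math.ceil(exercise_count / size)
        for tier, size in batch_sizes.items()
    }


calls = calls_per_tier(1323, {"simple": 5, "medium": 3, "complex": 1})
total_calls = sum(calls.values())
minutes = total_calls / 50  # requestsPerMinute from the rate limit settings

print(calls)              # {'simple': 265, 'medium': 441, 'complex': 1323}
print(total_calls)        # 2029
print(round(minutes, 1))  # 40.6
```

Because the complex tier runs with a batchSize of 1, it accounts for roughly two thirds of all requests in this example.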

Fallback & Error Handling

Configure graceful degradation:

{
  "enrichment": {
    "fallback": {
      "retries": 2,
      "degradeModel": true,
      "useDefaults": true,
      "degradeChain": {
        "complex": "medium",
        "medium": "simple",
        "simple": null
      }
    }
  }
}
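
| Property | Type | Description |
|----------|------|-------------|
| retries | number | Number of retries before degrading |
| degradeModel | boolean | Try lower-tier model on failure |
| useDefaults | boolean | Use defaults on complete failure |
| degradeChain | object | Fallback chain: maps each tier to the tier tried next; null ends the chain |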
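
One plausible reading of these settings, sketched in Python: each tier gets one attempt plus `retries` retries, then the chain moves to the next tier until it reaches null, and defaults are used last. This is an assumption about the behaviour, not the transformer's actual code; `enrich_with_fallback` and the `call` callback are hypothetical.

```python
def enrich_with_fallback(tier, call, fallback, default=None):
    """Try `tier` with retries, walk the degrade chain, then fall back to default."""
    chain = fallback.get("degradeChain", {})
    current = tier
    while current is not None:
        for _ in range(1 + fallback.get("retries", 0)):
            try:
                return call(current)
            except Exception:
                continue  # a real implementation would log and back off here
        if not fallback.get("degradeModel", False):
            break
        current = chain.get(current)  # None ends the chain
    if fallback.get("useDefaults", False):
        return default
    raise RuntimeError(f"enrichment failed for tier {tier!r}")


fallback = {
    "retries": 2,
    "degradeModel": True,
    "useDefaults": True,
    "degradeChain": {"complex": "medium", "medium": "simple", "simple": None},
}
attempts = []


def flaky(tier):
    """Simulated model call that only succeeds on the simple tier."""
    attempts.append(tier)
    if tier != "simple":
        raise TimeoutError("simulated failure")
    return f"value from {tier}"


result = enrich_with_fallback("complex", flaky, fallback)
print(result)         # value from simple
print(len(attempts))  # 7
```

In the demo the complex and medium tiers each fail three times (one attempt plus two retries) before the simple tier succeeds on its first attempt.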

Rate Limiting

Control API request rate:

{
  "enrichment": {
    "rateLimit": {
      "requestsPerMinute": 50,
      "backoffStrategy": "exponential",
      "initialBackoffMs": 1000,
      "maxBackoffMs": 60000
    }
  }
}

| Property | Type | Default | Description |
|----------|------|---------|-------------|
| requestsPerMinute | number | 50 | Max requests per minute |
| backoffStrategy | string | exponential | Backoff type: exponential, linear, fixed |
| initialBackoffMs | number | 1000 | Initial backoff delay |
| maxBackoffMs | number | 60000 | Maximum backoff delay |
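
The exact backoff formulas are not specified here. The Python sketch below shows one common interpretation: exponential doubles the delay on each retry, linear adds the initial delay each time, fixed always waits the initial delay, and every result is capped at maxBackoffMs. The function name `backoff_delay_ms` is hypothetical.

```python
def backoff_delay_ms(attempt, strategy="exponential",
                     initial_ms=1000, max_ms=60000):
    """Delay before retry number `attempt` (0-based), capped at max_ms."""
    if strategy == "exponential":
        delay = initial_ms * 2 ** attempt
    elif strategy == "linear":
        delay = initial_ms * (attempt + 1)
    elif strategy == "fixed":
        delay = initial_ms
    else:
        raise ValueError(f"unknown backoff strategy: {strategy!r}")
    return min(delay, max_ms)


delays = [backoff_delay_ms(n) for n in range(8)]
print(delays)  # [1000, 2000, 4000, 8000, 16000, 32000, 60000, 60000]

# Spacing requests evenly at 50 per minute means one request every 1.2 seconds.
min_interval_s = 60 / 50
print(min_interval_s)  # 1.2
```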

Checkpoints & Resume

Enable checkpoint saving for long runs:

{
  "enrichment": {
    "checkpoint": {
      "enabled": true,
      "saveInterval": 10
    }
  }
}

Resume from checkpoint:

fds-transformer transform \
  --input ./exercises.json \
  --config ./mapping.json \
  --resume
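
The general checkpoint-and-resume pattern can be sketched in Python as follows. Everything here is an assumption made for illustration: that saveInterval counts processed exercises, the checkpoint file layout (`{"done": N}`), and the helper name `process_with_checkpoints`. The transformer's own checkpoint format is not documented in this section.

```python
import json
import os
import tempfile


def process_with_checkpoints(items, handle, path, save_interval=10):
    """Process items in order, recording progress every save_interval items.

    Returns the number of items processed in this run.
    """
    done = 0
    if os.path.exists(path):  # resume from an earlier run
        with open(path, encoding="utf-8") as f:
            done = json.load(f)["done"]
    for index in range(done, len(items)):
        handle(items[index])
        completed = index + 1
        if completed % save_interval == 0 or completed == len(items):
            with open(path, "w", encoding="utf-8") as f:
                json.dump({"done": completed}, f)
    return len(items) - done


path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
items = list(range(30))
seen = []


def failing(item):
    """Simulated handler that crashes partway through the first run."""
    if item == 25:
        raise RuntimeError("simulated crash")
    seen.append(item)


try:
    process_with_checkpoints(items, failing, path)
except RuntimeError:
    pass  # the last saved checkpoint is at 20 items

resumed = process_with_checkpoints(items, seen.append, path)
print(resumed)  # 10
```

Items processed after the last saved checkpoint (20 to 24 in the demo) are handled again on resume, so under this scheme a smaller interval trades extra writes for less repeated work.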

Debug Mode

Enable verbose logging:

DEBUG_ENRICHMENT=true fds-transformer transform \
  --input ./exercises.json \
  --config ./mapping.json \
  --log-level debug

This outputs:

  • Prompts sent to AI
  • Raw responses
  • Token usage per request
  • Timing information

Per-Field Enrichment

For simpler use cases, or when you need fine-grained control, configure enrichment per field in the mappings section:

{
  "mappings": {
    "canonical.description": {
      "from": "description",
      "enrichment": {
        "enabled": true,
        "prompt": "exercise_description",
        "context": ["name", "target", "equipment"],
        "when": "missing",
        "fallback": "No description available"
      }
    }
  }
}

| Property | Type | Description |
|----------|------|-------------|
| enabled | boolean | Enable enrichment for this field |
| prompt | string | Prompt template key or custom prompt |
| context | string[] | Source fields to include as context |
| when | string | When to enrich: always, missing, empty, notFound |
| fallback | any | Value to use if enrichment fails |
| validate | boolean | Validate enriched value against schema |

Environment Variables

| Variable | Description |
|----------|-------------|
| OPENROUTER_API_KEY | API key for OpenRouter (required) |
| FDS_TRANSFORMER_MODEL | Override default model for all tiers |
| DEBUG_ENRICHMENT | Set to true for verbose logging |
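
As an illustration of the override, the Python sketch below resolves the model for a tier by preferring FDS_TRANSFORMER_MODEL when it is set and falling back to the tier's configured model otherwise. The precedence is inferred from the description above, and `resolve_model` is a hypothetical helper rather than part of the tool.

```python
import os


def resolve_model(tier_config, env=None):
    """Return the override model if set and non-empty, else the tier's model."""
    env = os.environ if env is None else env
    override = env.get("FDS_TRANSFORMER_MODEL", "").strip()
    return override or tier_config["model"]


simple_tier = {"model": "anthropic/claude-haiku-4.5"}
print(resolve_model(simple_tier, env={}))
# anthropic/claude-haiku-4.5
print(resolve_model(simple_tier, env={"FDS_TRANSFORMER_MODEL": "anthropic/claude-sonnet-4.5"}))
# anthropic/claude-sonnet-4.5
```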

Best Practices

  1. Start with cost estimation - Always run --estimate-cost first
  2. Test with small batches - Try 10-20 items before full runs
  3. Use tier filtering - Debug one tier at a time with --tier
  4. Enable checkpoints - Always enable for large datasets
  5. Monitor token usage - Check debug output for optimization opportunities
  6. Use appropriate batch sizes - Larger batches reduce costs but may increase failures
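
For the small-batch test in item 2, a sample input file can be cut from the full dataset with a few lines of Python. This sketch assumes the input file is a top-level JSON array of exercises, which this page does not state; `write_sample` is a hypothetical helper, and the demo uses a throwaway file in place of ./exercises.json.

```python
import json
import os
import tempfile


def write_sample(source_path, sample_path, count=20):
    """Copy the first `count` records of a JSON array file to sample_path."""
    with open(source_path, encoding="utf-8") as f:
        exercises = json.load(f)
    if not isinstance(exercises, list):
        raise ValueError("expected a top-level JSON array")
    sample = exercises[:count]
    with open(sample_path, "w", encoding="utf-8") as f:
        json.dump(sample, f, indent=2)
    return len(sample)


# Demo with generated records standing in for the real dataset.
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "exercises.json")
sample_file = os.path.join(workdir, "sample.json")
with open(source, "w", encoding="utf-8") as f:
    json.dump([{"name": f"Exercise {i}"} for i in range(100)], f)

written = write_sample(source, sample_file, count=15)
print(written)  # 15
```

The resulting sample file can then be passed to --input in place of the full dataset.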
