🔬 OpenEnv Benchmark · Food Safety

Trace. Contain.
Save Lives.

An AI benchmark where agents act as food safety investigators — tracing contaminated food through a live supply chain under noisy sensors, delayed reports, and limited budgets.

📡 API Documentation
🌾 Farm → ⚙ Processing → 📦 Warehouse → 🏪 Retailer
A Real Operations Benchmark
Not a toy control task — FoodCrisisEnv models real FDA-style supply chain outbreak response with partial observability, delayed signals, and budget trade-offs.
🧪

Noisy Observations

True contamination is hidden. Agents only see noisy sensor readings that can spike on clean nodes and miss real sources.
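To make the two failure modes concrete, here is a minimal sketch of a noisy sensor. It is an illustration under stated assumptions, not FoodCrisisEnv's actual sensor code: the function `noisy_reading`, the base signal levels (0.1 clean, 0.8 contaminated), and the Gaussian noise shape are all hypothetical. Only the noise magnitude 0.15 is taken from the task descriptions on this page, and treating it as a standard deviation is itself an assumption.

```python
import random


def noisy_reading(contaminated: bool, noise: float, rng: random.Random) -> float:
    """Return a sensor reading in [0, 1] for one node (hypothetical model)."""
    base = 0.8 if contaminated else 0.1      # assumed signal levels
    reading = base + rng.gauss(0.0, noise)   # assumed Gaussian sensor noise
    return min(1.0, max(0.0, reading))       # clip to the valid range


rng = random.Random(42)  # fixed seed so the draws are reproducible
clean_readings = [noisy_reading(False, 0.15, rng) for _ in range(1000)]
dirty_readings = [noisy_reading(True, 0.15, rng) for _ in range(1000)]

mean_clean = sum(clean_readings) / len(clean_readings)
mean_dirty = sum(dirty_readings) / len(dirty_readings)
```

Under this model the two distributions overlap, so a single reading cannot settle whether a node is contaminated; an agent has to combine several readings or spend a lab test.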

Delayed Illness Reports

Consumer illness reports arrive hours after exposure. By the time signals are obvious, contamination may already be downstream.
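The delay can be pictured as a queue that holds each report until its release step. The sketch below is a hypothetical helper, not part of the FoodCrisisEnv API: the class `DelayedReports`, its method names, and the node id `retailer_a` are invented for illustration. The 3-step delay matches the Medium task listed on this page.

```python
from collections import deque


class DelayedReports:
    """Hold illness reports and release each one `delay` steps after exposure."""

    def __init__(self, delay: int) -> None:
        self.delay = delay
        # (release_step, node_id); stays sorted as long as expose() is
        # called with non-decreasing step values.
        self._pending = deque()

    def expose(self, step: int, node_id: str) -> None:
        """Record an exposure that becomes visible `delay` steps later."""
        self._pending.append((step + self.delay, node_id))

    def release(self, step: int) -> list:
        """Return the node ids whose reports are due at or before `step`."""
        due = []
        while self._pending and self._pending[0][0] <= step:
            due.append(self._pending.popleft()[1])
        return due


reports = DelayedReports(delay=3)
reports.expose(step=0, node_id="retailer_a")
visible_at_2 = reports.release(step=2)  # report is still hidden
visible_at_3 = reports.release(step=3)  # report surfaces here
```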

💰

Limited Resources

Lab tests and recall budget are finite: inspecting every node or recalling every batch is not an option.

👥

Public Trust

False quarantines and unnecessary alerts damage trust. Overreaction has a real cost, just like underreaction.

🤖

LLM-Ready

Every observation includes a natural language summary. Drop in any LLM as an agent with our prompt template.

📊

Deterministic Grading

Episodes are scored on containment, precision, speed, and trust preservation. Results are reproducible with seed control.

Three Steps to Contain
01 · Reset & Observe

Start an episode. The supply chain spawns with hidden contamination sources and batches already moving downstream.

02 · Decide & Act

Choose from 7 actions — INSPECT, QUARANTINE, LIFT, RECALL, TRACE, ALERT, or WAIT — each with real consequences.

03 · Contain & Score

Find the source, block the spread, recall contaminated batches. Your score reflects containment, precision, speed, and trust.

7 Actions, Real Consequences
Every action has a cost. Finding the outbreak source is worth 4x more than chasing downstream symptoms.
🔬

INSPECT

Lab-test a node. Returns the exact contamination result; costs 1 lab token.

🚫

QUARANTINE

Block outbound spread from node. +4.0 for source, −2.0 if wrong.

LIFT

Remove a quarantine and restore flow. Builds trust if the node was clean.

📦

RECALL

Remove batch from chain. +1.5 for contaminated, −1.0 if clean.

🔍

TRACE

Trace batch path upstream. Low-cost intel to find the true source.

⚠️

ALERT

Retailer warning. Slows exposure but permanently reduces trust.

WAIT

Let the system evolve. Useful when waiting for lab results.
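The reward numbers quoted on the QUARANTINE and RECALL cards are enough to work out how confident an agent should be before acting. The sketch below is a back-of-the-envelope aid, not the environment's reward function: the names `STATED_REWARDS`, `stated_reward`, and `break_even_probability` are hypothetical, and the calculation ignores trust effects, budget costs, and any rewards this page does not state.

```python
# Reward values quoted on the action cards. Rewards for the other actions
# are not stated on this page, so they are omitted rather than guessed.
STATED_REWARDS = {
    ("QUARANTINE", True): 4.0,    # node is a true source
    ("QUARANTINE", False): -2.0,  # wrong node
    ("RECALL", True): 1.5,        # batch is contaminated
    ("RECALL", False): -1.0,      # batch is clean
}


def stated_reward(action_type: str, target_is_bad: bool):
    """Return the quoted reward, or None when the page gives no number."""
    return STATED_REWARDS.get((action_type, target_is_bad))


def break_even_probability(action_type: str) -> float:
    """Probability of being right at which the expected reward is zero.

    Solves p * gain - (1 - p) * loss = 0 for p.
    """
    gain = STATED_REWARDS[(action_type, True)]
    loss = -STATED_REWARDS[(action_type, False)]
    return loss / (gain + loss)


quarantine_threshold = break_even_probability("QUARANTINE")  # 2 / (4 + 2)
recall_threshold = break_even_probability("RECALL")          # 1 / (1.5 + 1)
```

On these numbers alone, a quarantine pays off in expectation once the agent is more than one-third sure the node is a source, while a recall needs 40% confidence that the batch is contaminated.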

Three Difficulty Levels
Easy · Task 1
Single source · Low noise (0.05) · 1-step illness delay · 10 lab tests · 100 recall budget · 48 steps
Medium · Task 2
Multi-source · Noisy sensors (0.15) · 3-step delay · 6 lab tests · 60 recall budget · 60 steps
Hard · Task 3
Adversarial spikes · Re-seeding · 5-step delay · 4 lab tests · 40 recall budget · 72 steps
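For agents that budget their actions per task, the three levels can be collected into one lookup table. This is a convenience sketch: the `TaskConfig` dataclass and its field names are hypothetical and are not part of the `irce` package. The numbers are copied from the lines above, and the Hard task's noise level is left as `None` because no value is given for it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TaskConfig:
    name: str
    sensor_noise: Optional[float]  # None where the page gives no number
    illness_delay_steps: int
    lab_tests: int
    recall_budget: int
    max_steps: int


# Keys are the task ids used on this page (Task 1, Task 2, Task 3).
TASKS = {
    1: TaskConfig("Easy", 0.05, 1, 10, 100, 48),
    2: TaskConfig("Medium", 0.15, 3, 6, 60, 60),
    3: TaskConfig("Hard", None, 5, 4, 40, 72),
}

# Example: how many steps, on average, separate two lab tests if the
# agent spreads them evenly over the episode.
steps_per_lab_test = {
    task_id: cfg.max_steps / cfg.lab_tests for task_id, cfg in TASKS.items()
}
```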
Plug in Your Agent
Method  Path     Description
GET     /health  Liveness check
POST    /reset   Start new episode
POST    /step    Execute one action
GET     /state   Full observation
GET     /docs    Swagger UI
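If you prefer raw HTTP over the Python client, the endpoints above can be called with the standard library alone. The sketch below only builds the requests. Several details are assumptions: the base URL `http://localhost:8000`, the JSON body shapes (`task_id` for `/reset` and `action_type` for `/step`, mirrored from the Python client snippet on this page), and the helpers `build_request` and `send`. Check the Swagger UI at `/docs` for the actual schemas.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: replace with your deployment


def build_request(method: str, path: str, payload=None) -> urllib.request.Request:
    """Build a request for one endpoint; JSON-encode the payload if given."""
    data = None
    headers = {}
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


def send(request: urllib.request.Request):
    """Send a request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


health_request = build_request("GET", "/health")
reset_request = build_request("POST", "/reset", {"task_id": 1})  # assumed body
step_request = build_request(
    "POST", "/step", {"action_type": "TRACE batch_001"}  # assumed body
)
```

With a server running, `send(health_request)` performs the liveness check, and `send(reset_request)` followed by repeated `send(step_request)` calls drives one episode.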
5 Lines of Code
Python
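from irce.client import FoodCrisisEnvClient
from irce.models import FoodCrisisAction

# BASE is not defined in this snippet; set it to the base URL of your
# running FoodCrisisEnv server before using the client.
with FoodCrisisEnvClient(BASE).sync() as env:
    obs = env.reset(task_id=1)
    # Keep stepping until the environment reports the episode is done.
    while not obs.done:
        obs = env.step(FoodCrisisAction(
            action_type="TRACE batch_001"
        ))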
Ready to Investigate?

Launch the interactive simulation — control the environment manually or let an LLM agent decide.

