🔬 OpenEnv Benchmark · Food Safety

Trace. Contain.
Save Lives.

An AI benchmark where agents act as food safety investigators — tracing contaminated food through a live supply chain under noisy sensors, delayed reports, and limited budgets.

📡 API Documentation

🌾 Farm → ⚙ Processing → 📦 Warehouse → 🏪 Retailer

What We Built

A Real Operations Benchmark

FoodCrisisEnv is not a toy control task: it models FDA-style supply-chain outbreak response with partial observability, delayed signals, and budget trade-offs.

🧪

Noisy Observations

True contamination is hidden. Agents only see noisy sensor readings that can spike on clean nodes and miss real sources.
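To make the idea concrete, here is a minimal sketch of a noisy sensor, assuming the noise level acts as a flip probability on a hidden true/false contamination state. The function name `noisy_reading` and the flip model are illustrative assumptions, not the benchmark's actual sensor model.

```python
import random


def noisy_reading(is_contaminated: bool, noise: float, rng: random.Random) -> bool:
    """Return what the sensor reports for a node whose true state is hidden.

    Assumption: with probability `noise` the reading is flipped, so a clean
    node can spike and a real source can be missed.
    """
    flipped = rng.random() < noise
    return is_contaminated != flipped


# A fixed seed makes the sequence of readings reproducible.
rng = random.Random(42)
readings = [noisy_reading(False, 0.15, rng) for _ in range(10)]
print(readings)  # mostly False, with occasional false alarms
```

Under this model a single positive reading is weak evidence, which is why repeated observations or an exact lab test are needed before acting.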

⏱

Delayed Illness Reports

Consumer illness reports arrive hours after exposure. By the time signals are obvious, contamination may already be downstream.
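The delay can be pictured as a queue that holds each report for a fixed number of steps. The sketch below is a hypothetical illustration; the class `DelayedReports` and its methods are not part of the benchmark's API.

```python
from collections import deque
from typing import Deque, List, Tuple


class DelayedReports:
    """Hold illness reports until `delay` steps after the exposure."""

    def __init__(self, delay: int) -> None:
        self.delay = delay
        # Entries are (release_step, node_id); exposures arrive in step order,
        # so the queue stays sorted by release step.
        self._pending: Deque[Tuple[int, str]] = deque()

    def expose(self, step: int, node_id: str) -> None:
        """Record an exposure that becomes visible `delay` steps later."""
        self._pending.append((step + self.delay, node_id))

    def release(self, step: int) -> List[str]:
        """Return the reports that have become visible by `step`."""
        visible = []
        while self._pending and self._pending[0][0] <= step:
            visible.append(self._pending.popleft()[1])
        return visible


reports = DelayedReports(delay=3)
reports.expose(step=0, node_id="retailer_a")
print(reports.release(step=2))  # nothing visible yet
print(reports.release(step=3))  # the report surfaces three steps late
```

During those hidden steps the batch keeps moving, so an agent that waits for reports is always reacting to an older state of the chain.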

💰

Limited Resources

Lab tests and recall budget are finite, so inspecting every node or recalling every batch is not an option.

👥

Public Trust

False quarantines and unnecessary alerts damage trust. Overreaction carries a real cost, just like underreaction.

🤖

LLM-Ready

Every observation includes a natural language summary. Drop in any LLM as an agent with our prompt template.

📊

Deterministic Grading

Episodes scored on containment, precision, speed, and trust preservation. Reproducible with seed control.

How It Works

Three Steps to Contain

Reset & Observe

Start an episode. The supply chain spawns with hidden contamination sources and batches already moving downstream.

Decide & Act

Choose from 7 actions — INSPECT, QUARANTINE, LIFT, RECALL, TRACE, ALERT, or WAIT — each with real consequences.

Contain & Score

Find the source, block the spread, recall contaminated batches. Your score reflects containment, precision, speed, and trust.

Agent Toolbox

7 Actions, Real Consequences

Every action has a cost. Finding the outbreak source is worth 4x more than chasing downstream symptoms.

🔬

INSPECT

Lab test a node. Exact contamination result, costs 1 lab token.

🚫

QUARANTINE

Block outbound spread from node. +4.0 for source, −2.0 if wrong.

✅

LIFT

Remove quarantine, restore flow. Builds trust if node was clean.

📦

RECALL

Remove batch from chain. +1.5 for contaminated, −1.0 if clean.

🔍

TRACE

Trace batch path upstream. Low-cost intel to find the true source.

⚠️

ALERT

Retailer warning. Slows exposure but permanently reduces trust.

⏸

WAIT

Let the system evolve. Useful when waiting for lab results.
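The seven actions and the reward values quoted above can be written down as a small lookup. This is a sketch only: the enum `ActionType` and the helper `stated_reward` are hypothetical names, and only the four rewards stated on this page are included.

```python
from enum import Enum
from typing import Optional


class ActionType(str, Enum):
    INSPECT = "INSPECT"
    QUARANTINE = "QUARANTINE"
    LIFT = "LIFT"
    RECALL = "RECALL"
    TRACE = "TRACE"
    ALERT = "ALERT"
    WAIT = "WAIT"


# Keyed by (action, hit). For QUARANTINE, hit means the node was a true source;
# for RECALL, hit means the batch was contaminated.
STATED_REWARDS = {
    (ActionType.QUARANTINE, True): 4.0,
    (ActionType.QUARANTINE, False): -2.0,
    (ActionType.RECALL, True): 1.5,
    (ActionType.RECALL, False): -1.0,
}


def stated_reward(action: ActionType, hit: bool) -> Optional[float]:
    """Return the reward quoted on this page, or None if none is stated."""
    return STATED_REWARDS.get((action, hit))


print(stated_reward(ActionType.QUARANTINE, True))
print(stated_reward(ActionType.INSPECT, True))
```

The asymmetry is the point: a wrong quarantine costs half of what a correct one earns, so guessing without evidence is a losing strategy.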

Task Suite

Three Difficulty Levels

Easy · Task 1

Single source · Low noise (0.05) · 1-step illness delay · 10 lab tests · 100 recall budget · 48 steps

Medium · Task 2

Multi-source · Noisy sensors (0.15) · 3-step delay · 6 lab tests · 60 recall budget · 60 steps

Hard · Task 3

Adversarial spikes · Re-seeding · 5-step delay · 4 lab tests · 40 recall budget · 72 steps
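For side-by-side comparison, the three task settings can be captured in a small config table. The `TaskConfig` dataclass and its field names are illustrative assumptions; the numbers are the ones listed above, and the hard task's noise level is left as `None` because no value is given.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TaskConfig:
    task_id: int
    difficulty: str
    sensor_noise: Optional[float]  # None where the page states no value
    illness_delay_steps: int
    lab_tests: int
    recall_budget: int
    max_steps: int


TASKS = {
    1: TaskConfig(1, "easy", 0.05, 1, 10, 100, 48),
    2: TaskConfig(2, "medium", 0.15, 3, 6, 60, 60),
    3: TaskConfig(3, "hard", None, 5, 4, 40, 72),
}

for cfg in TASKS.values():
    # Fewer lab tests over a longer horizon means each test must count for more.
    print(cfg.difficulty, cfg.lab_tests, cfg.max_steps)
```

Budgets shrink while episode length grows, so the hard task gives 4 lab tests for 72 steps against 10 tests for 48 steps on the easy one.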

API Endpoints

Plug in Your Agent

Method	Path	Description
GET	/health	Liveness check
POST	/reset	Start new episode
POST	/step	Execute one action
GET	/state	Full observation
GET	/docs	Swagger UI
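The endpoints can also be called without the Python client. The sketch below uses only the standard library. The base URL and the JSON payload shape (`{"task_id": 1}`) are assumptions; check the Swagger UI at /docs for the actual request schemas.

```python
import json
import urllib.request
from typing import Any, Optional

BASE_URL = "http://localhost:8000"  # assumption: adjust to your deployment


def build_request(method: str, path: str, payload: Optional[dict] = None) -> urllib.request.Request:
    """Build a JSON request for one of the endpoints in the table above."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


def call(method: str, path: str, payload: Optional[dict] = None) -> Any:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(build_request(method, path, payload)) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Building a request does not touch the network, so this runs offline.
req = build_request("POST", "/reset", {"task_id": 1})  # payload shape is an assumption
print(req.get_method(), req.full_url)
```

With a server running, `call("GET", "/health")` is a quick way to confirm it is reachable before starting an episode.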

Quick Start

5 Lines of Code

Python

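from irce.client import FoodCrisisEnvClient
from irce.models import FoodCrisisAction

# Assumption: the server runs locally; point BASE at your own deployment.
BASE = "http://localhost:8000"

with FoodCrisisEnvClient(BASE).sync() as env:
    obs = env.reset(task_id=1)  # Task 1 is the easy scenario
    # Placeholder policy: repeat a single TRACE until the episode ends.
    while not obs.done:
        obs = env.step(FoodCrisisAction(
            action_type="TRACE batch_001"
        ))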

Ready to Investigate?

Launch the interactive simulation — control the environment manually or let an LLM agent decide.
