Framework Guide

Benchmark multi-agent crews before they burn real budget.

Crucible helps CrewAI teams test delegation, task settlement, and runaway tool costs using a single benchmark language across every worker.

How this integration works

Wrap the crew kickoff entrypoint and keep your crew architecture intact.

Score delegation quality, cost discipline, and control integrity together.

Publish report-linked runs for every major crew release candidate.

Starter example

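from crucible import evaluate, wrap_crewai_agent

# my_crew is your existing CrewAI crew, defined elsewhere in your project.
wrapped = wrap_crewai_agent(my_crew)

# Run the wrapped crew against a benchmark scenario with a fixed seed.
result = evaluate(
    wrapped,
    name="ResearchCrew-v3",
    scenario="research-ops",
    seed=19,
)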
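
As a rough illustration of what "wrapping the kickoff entrypoint" can mean in general, the sketch below forwards kickoff() to an unmodified crew object and records timing around each call. It is not Crucible's implementation: KickoffRecorder and StubCrew are hypothetical names invented for this example, and the only assumption about CrewAI is that a crew exposes kickoff(inputs=...).

```python
import time
from dataclasses import dataclass, field
from typing import Any


@dataclass
class KickoffRecorder:
    """Hypothetical wrapper: forwards kickoff() and records each call."""

    crew: Any
    calls: list = field(default_factory=list)

    def kickoff(self, *args, **kwargs):
        start = time.perf_counter()
        # Delegate to the original crew; its agents and tasks are not modified.
        output = self.crew.kickoff(*args, **kwargs)
        elapsed = time.perf_counter() - start
        self.calls.append({"inputs": kwargs.get("inputs"), "seconds": elapsed})
        return output


class StubCrew:
    """Stand-in for a real crew so the sketch runs without CrewAI installed."""

    def kickoff(self, inputs=None):
        return f"report on {inputs['topic']}"


recorder = KickoffRecorder(StubCrew())
output = recorder.kickoff(inputs={"topic": "battery recycling"})
print(output, len(recorder.calls))
```

Because the wrapper only intercepts the entrypoint, the crew's own structure stays as it was, which is the property the integration steps above describe.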