Framework Guide
Benchmark multi-agent crews before they burn real budget.
Crucible helps CrewAI teams test delegation, task settlement, and runaway tool costs using a single benchmark language across every worker.
How this integration works
Wrap the crew kickoff entrypoint and keep your crew architecture intact.
Score delegation quality, cost discipline, and control integrity together.
Publish report-linked runs for every major crew release candidate.
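To make the first point concrete, here is a minimal sketch of what wrapping a kickoff entrypoint can look like in general. It is not Crucible's implementation: `KickoffRecorder`, `KickoffRecord`, and `FakeCrew` are hypothetical names invented for this illustration, and the only assumption about the crew is that it exposes a `kickoff(inputs=...)` method. The wrapper delegates to the crew unchanged and only records what passed through.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class KickoffRecord:
    """One observed call to the crew's kickoff entrypoint (hypothetical type)."""
    inputs: Optional[Dict[str, Any]]
    output: Any
    seconds: float


@dataclass
class KickoffRecorder:
    """Hypothetical wrapper: forwards kickoff() to the crew and logs each call."""
    crew: Any
    records: List[KickoffRecord] = field(default_factory=list)

    def kickoff(self, inputs: Optional[Dict[str, Any]] = None) -> Any:
        start = time.perf_counter()
        # Delegate to the real entrypoint; the crew itself is not modified.
        output = self.crew.kickoff(inputs=inputs)
        elapsed = time.perf_counter() - start
        self.records.append(KickoffRecord(inputs, output, elapsed))
        return output


class FakeCrew:
    """Placeholder standing in for a real crew so the sketch runs on its own."""

    def kickoff(self, inputs: Optional[Dict[str, Any]] = None) -> str:
        topic = (inputs or {}).get("topic", "unknown")
        return f"report on {topic}"


recorder = KickoffRecorder(FakeCrew())
answer = recorder.kickoff(inputs={"topic": "pricing"})
```

Because the wrapper only sits in front of `kickoff`, agents, tasks, and delegation rules inside the crew stay exactly as they were, which is the property the integration relies on.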
Starter example
from crucible import evaluate, wrap_crewai_agent

# my_crew is your existing crew, defined elsewhere; the wrapper keeps its architecture intact.
wrapped = wrap_crewai_agent(my_crew)

result = evaluate(
    wrapped,
    name="ResearchCrew-v3",
    scenario="research-ops",
    seed=19,
)