AMC vs Everything Else

Dimension	AMC	Self-Reported Docs	MTEB / Benchmarks	Model Cards	SOC2 Audits	Manual Red-Teaming	AgentBench	HELM
Evidence Source	Execution evidence (observed)	Self-reported	Synthetic benchmarks	Vendor-authored	Point-in-time audit	Manual testing	Synthetic tasks	Synthetic scenarios
Tamper Resistance	Ed25519 + Merkle tree	None	None	None	Auditor independence	Human judgment	None	None
Continuous Monitoring	Real-time + drift detection	Static docs	One-time run	Static document	Annual audit	Periodic testing	One-time eval	One-time eval
Framework Coverage	14 adapters (LangChain, CrewAI, etc.)	Framework-specific	Model-level only	Model-level only	Org-level	Single agent	Limited frameworks	Model-level only
EU AI Act Ready	Full compliance mapping	No mapping	Not designed for it	Some transparency	Partial overlap	No compliance focus	Not designed for it	Transparency focus
Cost	Free (MIT licensed)	Free	Free	Free	$50K–$200K/year	$10K–$100K/engagement	Free	Free
Time to First Score	2 minutes	Hours to write	Minutes to run	Weeks to prepare	3–6 months	Days to weeks	Hours to set up	Hours to set up
Anti-Gaming	Temporal + cross-ref + freshness	Easily gamed	Data contamination risk	Cherry-picked results	Scope limitations	Evaluator bias	Limited checks	Limited checks
Agent-Level (not Model)	Full agent evaluation	Depends on docs	Model only	Model only	Org-level	Agent-level	Agent-level	Model only
Domain Correctness Proof	Bounded amcproof lane: proven/disproven/unsupported against declared rule manifests	Claims only	Benchmark score only	Narrative limitations	Control audit, not answer proof	Human review only	No source-to-rule proof	No source-to-rule proof

The Core Problem

When you evaluate an AI agent using self-reported documentation, you get a score of 100/100. When AMC evaluates the same agent from execution evidence — watching what it actually does — the real score is 16/100.

That 84-point gap is documentation inflation. Every approach on this page except AMC suffers from some form of it, because each one lets the agent supply its own evidence, tests only at the model level, or evaluates at a single point in time.

Why AMC Wins

🔐 Cryptographic Evidence

Ed25519 signatures + Merkle tree ledger. Evidence is tamper-evident by design — you can't fake your way to L5.
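
To make "tamper-evident" concrete, here is a minimal sketch of the Merkle-tree half of the idea, using only Python's standard library. It is not AMC's actual ledger code: the record strings, the function names, and the choice of SHA-256 with 0x00/0x01 prefixes are all assumptions made for illustration. In a full design the resulting root would then be signed (for example with Ed25519), which is not shown here.

```python
import hashlib


def leaf_hash(record: bytes) -> bytes:
    # Prefix leaves with 0x00 so a leaf can never be confused with an interior node.
    return hashlib.sha256(b"\x00" + record).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    # Prefix interior nodes with 0x01 (the same domain-separation idea as RFC 6962).
    return hashlib.sha256(b"\x01" + left + right).digest()


def merkle_root(records: list[bytes]) -> bytes:
    """Fold a list of evidence records into a single root hash."""
    if not records:
        return hashlib.sha256(b"").digest()
    level = [leaf_hash(r) for r in records]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                nxt.append(node_hash(level[i], level[i + 1]))
            else:
                # An odd node out is promoted unchanged to the next level.
                nxt.append(level[i])
        level = nxt
    return level[0]


# Hypothetical evidence records, invented for this example.
ledger = [b"tool_call:search", b"tool_call:write_file", b"policy_check:pass"]
root = merkle_root(ledger)

# Changing any single record changes the root, so a signature over the
# original root no longer matches the tampered ledger.
tampered = [b"tool_call:search", b"tool_call:delete_file", b"policy_check:pass"]
assert merkle_root(tampered) != root
```

The point of the design is that a verifier only needs the signed root to detect edits anywhere in the ledger; it does not need to trust whoever stores the individual records.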

📡 Continuous, Not Point-in-Time

AMC monitors continuously with drift detection. SOC2 audits give you a snapshot; AMC gives you a movie.
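
One simple way to picture drift detection is a rolling window compared against a fixed baseline. The sketch below is a generic heuristic under assumed names and thresholds (`DriftDetector`, `window`, `threshold`, and the sample scores are all hypothetical); the page does not describe which algorithm AMC actually uses.

```python
from collections import deque
from statistics import mean, pstdev


class DriftDetector:
    """Flags drift when the recent window mean departs from a fixed baseline."""

    def __init__(self, baseline: list[float], window: int = 5, threshold: float = 3.0):
        self.base_mean = mean(baseline)
        # Guard against a zero spread so the ratio below is always defined.
        self.base_std = pstdev(baseline) or 1e-9
        self.recent = deque(maxlen=window)
        self.threshold = threshold

    def observe(self, score: float) -> bool:
        """Record one score; return True if the full window has drifted."""
        self.recent.append(score)
        if len(self.recent) < self.recent.maxlen:
            return False  # not enough observations yet
        z = abs(mean(self.recent) - self.base_mean) / self.base_std
        return z > self.threshold


# Hypothetical scores: a stable baseline, then a sudden drop.
detector = DriftDetector(baseline=[80, 82, 79, 81, 80], window=3)
stable = [detector.observe(s) for s in (80, 81, 79)]
dropped = detector.observe(60)
```

Because every new observation is checked as it arrives, a regression shows up within one window rather than at the next scheduled audit.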

🤖 Agent-Level, Not Model-Level

MTEB and HELM evaluate models. AMC evaluates the full agent — tools, memory, governance, the whole stack.

🔌 14 Framework Adapters

LangChain, CrewAI, AutoGen, OpenAI Agents SDK, and more. One env var change. Zero code modifications.

⚖️ EU AI Act Mapped

Full compliance mapping to EU AI Act articles. The August 2026 deadline is coming, and AMC gets you ready now.

🧾 Domain Proof Lane

For declared rule manifests, AMC reports one of three verdicts: proven, disproven, or unsupported. It does not pretend that evidence receipts prove answer correctness.
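
The three-way verdict can be sketched as follows. This is an illustration of the idea only: the manifest contents, the `check_claim` function, and the `Verdict` enum are hypothetical and do not reflect the real amcproof interface or manifest format.

```python
from enum import Enum


class Verdict(Enum):
    PROVEN = "proven"
    DISPROVEN = "disproven"
    UNSUPPORTED = "unsupported"


# Hypothetical rule manifest: maps a rule key to the value the rules declare.
MANIFEST = {"max_refund_usd": 500, "refund_window_days": 30}


def check_claim(key: str, claimed_value: int, manifest: dict[str, int]) -> Verdict:
    # A claim that falls outside the declared rules cannot be judged either way.
    if key not in manifest:
        return Verdict.UNSUPPORTED
    return Verdict.PROVEN if manifest[key] == claimed_value else Verdict.DISPROVEN


results = {
    "matches rule": check_claim("max_refund_usd", 500, MANIFEST),
    "contradicts rule": check_claim("refund_window_days", 60, MANIFEST),
    "no rule declared": check_claim("shipping_fee_usd", 5, MANIFEST),
}
```

Keeping "unsupported" separate from "disproven" is what keeps the lane bounded: an answer with no matching rule is reported as unjudged rather than silently counted as right or wrong.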

💰 Free & Open Source

MIT licensed. No $200K SOC2 audits. No $100K red-team engagements. Zero to first score in 2 minutes.