ai agent trust scoring
Run one command_
Get the full score.
Fix the gaps.
Agent Maturity Compass checks how ready your AI agent is for real work. No account. No setup maze. Run amc for the 244-question default score, then expand into lifecycle proof, Watch, Fleet, and compliance receipts.
install
Start with one command_
Copy this into your agent project. AMC creates the workspace, runs the full score, and prints the result.
No install first run
Best for trying AMC once. It runs the full score immediately.
Install globally
$ amc
Install once, then run amc in any agent project.
desktop studio
Prefer buttons? Run Studio.
AMC includes macOS and Windows launcher apps for Agent Maturity Compass Studio, plus the same local control panel for scores, evidence, reports, assurance packs, Industry Packs, and run history.
The macOS and Windows launchers start demo-mode Studio and open the local console in the system browser without bundling Electron.
Overall level, layer bars, evidence coverage, and top gaps.
Assurance is part of the free core. Industry Packs show locked and unlocked states.
Generate outputs for engineering, compliance, and leadership review.
current capabilities
More than a score.
Proof you can operate_
AMC now connects the default score to proof receipts, runtime Watch signals, fleet topology, policy enforcement, compliance binders, and portable evidence. These are AMC-owned surfaces, not source-specific wrappers.
Run the stable default bank, or opt into lifecycle questions for governance, evidence binding, runtime watch, proof exports, memory, and fleet operation.
amc proof check
Source-to-rule checks emit amcproof artifacts and fail closed as unsupported when correctness proof is missing.
Correlate sessions, risk, cost, latency, override near-misses, and incident-to-regression closure with signed evidence refs.
Use fleet overview, graph validation, SLO status, and trust-graph exports before scoring multi-agent systems.
Runtime firewall decisions, resource snapshots, safe confirmation proof, and 147 assurance packs stay in the free core.
The generated command inventory, API reference, OpenAPI playground, and docs hub expose the same product surface users run locally.
AMC separates what an agent claims from what execution evidence proves.
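The fail-closed behavior described for amc proof check above can be pictured with a small sketch. It is an illustration of the general pattern only, not AMC's implementation: the Verdict names, the check_rule function, and the verify callback are all hypothetical, and the source does not describe what an amcproof artifact contains.

```python
from enum import Enum
from typing import Callable, Optional


class Verdict(Enum):
    # Hypothetical outcome labels; only "unsupported" appears in the page text.
    SUPPORTED = "supported"
    VIOLATED = "violated"
    UNSUPPORTED = "unsupported"


def check_rule(proof: Optional[str], verify: Callable[[str], bool]) -> Verdict:
    """Fail closed: a missing or unverifiable proof is never treated as a pass."""
    if not proof:
        return Verdict.UNSUPPORTED
    try:
        return Verdict.SUPPORTED if verify(proof) else Verdict.VIOLATED
    except Exception:
        # A verifier that crashes yields no positive evidence, so stay closed.
        return Verdict.UNSUPPORTED
```

The design point is that the absence of proof maps to its own outcome instead of defaulting to success, so a rule cannot pass just because nothing was checked.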
the trust stack
Eight named surfaces_
One complete trust platform
The top of the page is intentionally simple. This is the deeper AMC map: score, harden, enforce, prove, monitor, comply, govern fleets, and issue portable trust identity.
platform surfaces
Score trust before you ship
Execution-backed scoring, assurance, enforcement, evidence, monitoring, compliance, fleet management, and portable trust identity.
integrations
Works with your stack_
14 built-in adapters. Works with OpenAI-compatible endpoints. No framework rewrite required.
our approach
Expertise built on
evidence_
Score and analyze
Point AMC at any agent. The default 244-question full score runs against execution evidence, while the optional 264-question lifecycle set adds runtime, proof, memory, and fleet coverage.
Fix and harden
AMC identifies high-signal gaps, generates targeted guardrail work, and uses 147 assurance packs for prompt injection, exfiltration, memory risk, sycophancy, and other adversarial failures.
Ship and monitor
Add CI gates, keep signed local evidence, prevent trust regressions, and generate compliance artifacts for EU AI Act, ISO 42001, NIST AI RMF, SOC 2, and OWASP LLM Top 10 review.
research foundation
Built on primary sources_
AMC is grounded in AI safety, security, governance, compliance, and agent-system research. The modules map papers and standards into scoring controls, assurance packs, and operational evidence.
NIST AI Risk Management Framework
Core governance structure for trustworthy AI deployment and evaluation.
maps to governance, risk, and cross-framework evidence
nist.gov →
Levels of AGI: Operationalizing Progress
Maturity levels framework for AI capabilities and autonomy.
maps to L0-L5 maturity and autonomy scoring
arxiv 2311.02462 →
Connecting the Dots: LLMs Infer Latent Info
Frontier models infer information from distributed evidence.
maps to context leakage and information barriers
arxiv 2406.14546 →
Persistent Memory Injection
Cross-session memory poisoning through self-reinforcing payloads.
maps to memory integrity and persistence checks
arxiv 2602.15654 →
Monitor Bypass and Agent-as-a-Proxy Risk
Shows why monitoring-only defenses can fail without runtime controls.
maps to monitor bypass resistance and enforcement
arxiv 2602.05066 →
MCP Security Bench
Evaluation coverage for MCP-specific attack vectors and tool-boundary failures.
maps to MCP compliance and security resilience packs
arxiv 2510.15994 →
Economic Denial of Service
Cost amplification through tool-calling and hidden runtime loops.
maps to budget controls and cost predictability
arxiv 2601.10955 →
Sycophancy and Objective Drift
Agent behavior can optimize social approval while drifting from ground truth.
maps to sycophancy, alignment, and refusal checks
arxiv 2602.08092 →
Agent Maturity Compass whitepaper → / see research notes and gap analysis →
pricing
Free core. Paid domain depth_
AMC Core
CLI, Desktop Studio, full score, assurance, evidence, reports, and CI gates.
Industry Packs
All 41 Industry Domain Packs for regulated verticals.
Procurement path: Compare buyer packages for commercial offers by buyer type, proof surfaces, and purchase-ready artifacts.
industry packs
7 stations. 41 packs.
regulated depth_
Optional paid packs add sector-specific diagnostics grounded in real regulations: EU AI Act, HIPAA, FDA, NERC CIP, CITES, ILO, ISO, and other domain frameworks. One $9.99/month subscription unlocks all 41 packs.
Environment
Farm to Fork, Weave to Wear, Material to Machines, Source to Sustenance, Ubiquity to Utility, Sip to Sanitation.
Health
Digital Health Record, Wellness, Patient Lifecycle, Clinical Lifecycle, Professional Practice, Life Technology, Drug Discovery, Clinical Trials, Specialized Medicine.
Wealth
Future of Work, Digital Payments, No Poverty, Circular Economy, Blockchain and DeFi.
Education
K-12, Higher Education, Skills and Vocational, Specialized Education, Differently Abled.
Mobility
Sustainable Communities, Sustainable Ports, Sustainable Real Estate, Virtual Infrastructure, Privacy and Security, Freight/3PL/Warehouse.
Technology
Cognition to Intelligence, Networked Ecosystems, OS for Sustainable Outcomes, Infotainment, Partnerships for Prosperity.
Governance
Digital Citizens and Rights, Dance of Democracy, Petition to Law, Citizen Services, Public and Private Collaboration.
faq
Common questions_
No. The default amc command generates the full 244-question score. Use amc run --question-set lifecycle for the 264-question lifecycle-expanded set.
Self-reported documentation can claim 100/100 while execution-verified evidence shows a much weaker score. AMC closes that gap with observed evidence, signed artifacts, and repeatable proof chains.
AMC is a trusted observer: it scores maturity, finds gaps, runs assurance packs, and preserves verifiable evidence. Self-reported evidence is capped at 0.4x weight; observed runtime evidence carries 1.0x weight.
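To make the weighting concrete, here is a minimal sketch of how a 0.4x cap on self-reported evidence could play out. Only the two weights come from the text above. The Evidence type, the 0.0 to 1.0 score scale, and the choice to divide by the item count are assumptions for illustration; the page does not say how AMC actually aggregates evidence.

```python
from dataclasses import dataclass

# The two weights are taken from the text; everything else is hypothetical.
SELF_REPORTED_WEIGHT = 0.4
OBSERVED_WEIGHT = 1.0


@dataclass
class Evidence:
    score: float    # assumed scale: 0.0 (no support) to 1.0 (full support)
    observed: bool  # True for observed runtime evidence, False for self-reported


def weighted_score(items: list[Evidence]) -> float:
    """Sum of weighted scores divided by the number of evidence items.

    Dividing by the item count (not by the total weight) keeps the cap
    visible: evidence that is purely self-reported can never exceed 0.4.
    """
    if not items:
        return 0.0
    total = sum(
        e.score * (OBSERVED_WEIGHT if e.observed else SELF_REPORTED_WEIGHT)
        for e in items
    )
    return total / len(items)
```

Under these assumptions, a single perfect self-reported claim, `weighted_score([Evidence(1.0, observed=False)])`, evaluates to 0.4, while the same claim backed by observed runtime evidence evaluates to 1.0.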
14 adapters: LangChain, LangGraph, CrewAI, AutoGen, OpenAI Agents SDK, LlamaIndex, Semantic Kernel, Claude Code, Gemini, OpenClaw, OpenHands, Python SDK, CLI, and OpenAI-compatible endpoints.
Yes. Use gates such as amc gate --min-score L3 to fail builds below threshold and prevent trust regressions.
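The gate logic can be pictured as a level comparison mapped to a process exit code. The sketch below assumes levels are ordered L0 through L5, matching the maturity scale mentioned elsewhere on this page, and that a failing gate exits non-zero, as CI systems conventionally expect. The function names are hypothetical and this is not AMC's own code.

```python
import re


def level_rank(level: str) -> int:
    """Parse a level label such as 'L3' into its integer rank (0 to 5)."""
    match = re.fullmatch(r"L([0-5])", level.strip())
    if match is None:
        raise ValueError(f"not a maturity level: {level!r}")
    return int(match.group(1))


def gate_exit_code(actual: str, min_score: str) -> int:
    """Return 0 when the actual level meets the threshold, 1 otherwise."""
    return 0 if level_rank(actual) >= level_rank(min_score) else 1
```

With a threshold of L3, an agent scored at L2 would yield exit code 1 and fail the build, while L3 or higher would yield 0.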
No account is needed for the free core. Industry Packs require paid access.
No. Use npx agent-maturity-compass or install with npm. Docker is optional.
Yes. Run amc up and use the local Studio dashboard.
The core trust stack is MIT licensed: Score, Shield, Enforce, Vault, Watch, Comply, Fleet, Passport, adapters, Studio, reports, and CI gates. The paid add-on is $9.99/month access to all 41 Industry Domain Packs.