ai agent trust scoring

Run one command_

Get the full score.

Fix the gaps.

Agent Maturity Compass checks how ready your AI agent is for real work. No account. No setup maze. Run amc for the 244-question default score, then expand into lifecycle proof, Watch, Fleet, and compliance receipts.

$ npx agent-maturity-compass

install amc → try in browser →

244 default questions

264 lifecycle questions

1,144 CLI command paths

147 assurance packs

41 industry packs

8,150 collected tests

install

Start with one command_

Copy this into your agent project. AMC creates the workspace, runs the full score, and prints the result.

recommended

No-install first run

$ npx agent-maturity-compass

Best for trying AMC once. It runs the full score immediately.


daily use

Install globally

$ npm i -g agent-maturity-compass
$ amc

Install once, then run amc in any agent project.


desktop studio

Prefer buttons? Run Studio.

AMC includes macOS and Windows launcher apps for Agent Maturity Compass Studio, plus the same local control panel for scores, evidence, reports, assurance packs, Industry Packs, and run history.

macOS app · Windows app · no account · score dashboard · signed evidence

amc up

$ open "Agent Maturity Compass Studio.app"

Studio: http://127.0.0.1:3212

Windows: Agent Maturity Compass Studio.cmd

Run full score: button + CLI

Assurance packs: 147 adversarial packs

Industry Packs: $9.99/month unlock

Reports: audit-ready exports

apps

Launch locally

The macOS and Windows launchers start demo-mode Studio and open the local console in the system browser without bundling Electron.

score

See the trust baseline

Overall level, layer bars, evidence coverage, and top gaps.

packs

Browse free and paid depth

Assurance is free core. Industry Packs show locked and unlocked states.

reports

Share with the team

Generate outputs for engineering, compliance, and leadership review.

current capabilities

More than a score.
Proof you can operate_

AMC now connects the default score to proof receipts, runtime Watch signals, fleet topology, policy enforcement, compliance binders, and portable evidence. These are AMC-owned surfaces, not source-specific wrappers.

score + proof

244 default, 264 lifecycle

Run the stable default bank, or opt into lifecycle questions for governance, evidence binding, runtime watch, proof exports, memory, and fleet operation.

domain proof lane

amc proof check

Source-to-rule checks emit amcproof artifacts and fail closed as unsupported when correctness proof is missing.
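As a rough illustration of what "fail closed" means here, the sketch below reports any check without a proof as unsupported instead of letting it pass. The `RuleCheck` record and `proof_status` helper are hypothetical names made up for this example; they are not AMC's schema or API, and the real behavior lives behind `amc proof check`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RuleCheck:
    """Hypothetical record for one source-to-rule check (not AMC's schema)."""
    rule_id: str
    proof: Optional[str]  # reference to a correctness proof, or None if missing


def proof_status(check: RuleCheck) -> str:
    """Fail closed: a check with no proof is reported as unsupported."""
    if not check.proof:
        return "unsupported"
    return "supported"


checks = [
    RuleCheck("rule-a", proof="proof-ref-1"),
    RuleCheck("rule-b", proof=None),
]
statuses = {check.rule_id: proof_status(check) for check in checks}
print(statuses)
```

The design point is the default: absence of proof never counts as a pass.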

watch

Session, SLO, incident receipts

Correlate sessions, risk, cost, latency, override near-misses, and incident-to-regression closure with signed evidence refs.

fleet

Overview + trust graph

Use fleet overview, graph validation, SLO status, and trust-graph exports before scoring multi-agent systems.

enforce + shield

Policy and safe proof

Runtime firewall decisions, resource snapshots, safe confirmation proof, and 147 assurance packs stay in the free core.

docs + API

1,144 CLI paths

The generated command inventory, API reference, OpenAPI playground, and docs hub expose the same product surface users run locally.

self-reported

100

→

evidence-verified

trust gap

AMC separates what an agent claims from what execution evidence proves.

0.4x self-reported weight

1.0x observed evidence weight

2.5x trust multiplier
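One way to read these three numbers: the 2.5x multiplier matches the ratio of the observed weight to the self-reported weight (1.0 / 0.4). The page does not say how AMC derives or applies it, so treat the sketch below as arithmetic on the published weights only. `weighted_points` is a hypothetical helper, not an AMC function.

```python
SELF_REPORTED_WEIGHT = 0.4  # from the page: self-reported weight
OBSERVED_WEIGHT = 1.0       # from the page: observed evidence weight


def weighted_points(raw_points: float, observed: bool) -> float:
    """Scale raw points by the weight for their evidence type (hypothetical helper)."""
    weight = OBSERVED_WEIGHT if observed else SELF_REPORTED_WEIGHT
    return raw_points * weight


# Assumption: the trust multiplier is the observed-to-self-reported weight ratio.
trust_multiplier = OBSERVED_WEIGHT / SELF_REPORTED_WEIGHT

print(trust_multiplier)                     # 2.5
print(weighted_points(10, observed=False))  # 4.0
print(weighted_points(10, observed=True))   # 10.0
```

Under this reading, the same claim counts 2.5 times as much once execution evidence backs it.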

the trust stack

Eight named surfaces_
One complete trust platform

The top of the page is intentionally simple. This is the deeper AMC map: score, harden, enforce, prove, monitor, comply, govern fleets, and issue portable trust identity.

platform surfaces

Score trust before you ship

Execution-backed scoring, assurance, enforcement, evidence, monitoring, compliance, fleet management, and portable trust identity.

amc score

integrations

Works with your stack_

14 built-in adapters. Works with OpenAI-compatible endpoints. No framework rewrite required.

LangChain

CrewAI

Anthropic

Google

Expertise built on
evidence_

Score and analyze

Point AMC at any agent. The default 244-question full score runs against execution evidence, while the optional 264-question lifecycle set adds runtime, proof, memory, and fleet coverage.

Fix and harden

AMC identifies high-signal gaps, generates targeted guardrail work, and uses 147 assurance packs for prompt injection, exfiltration, memory risk, sycophancy, and other adversarial failures.

Ship and monitor

Add CI gates, keep signed local evidence, prevent trust regressions, and generate compliance artifacts for EU AI Act, ISO 42001, NIST AI RMF, SOC 2, and OWASP LLM Top 10 review.

research foundation

Built on primary sources_

AMC is grounded in AI safety, security, governance, compliance, and agent-system research. The modules map papers and standards into scoring controls, assurance packs, and operational evidence.

NIST

NIST AI Risk Management Framework

Core governance structure for trustworthy AI deployment and evaluation.

maps to governance, risk, and cross-framework evidence · nist.gov →

ICML 2024

Levels of AGI — Operationalizing Progress

Maturity levels framework for AI capabilities and autonomy.

maps to L0-L5 maturity and autonomy scoring · arxiv 2311.02462 →

NeurIPS 2024

Connecting the Dots — LLMs Infer Latent Info

Frontier models infer information from distributed evidence.

maps to context leakage and information barriers · arxiv 2406.14546 →

memory

Persistent Memory Injection

Cross-session memory poisoning through self-reinforcing payloads.

maps to memory integrity and persistence checks · arxiv 2602.15654 →

agents

Monitor Bypass and Agent-as-a-Proxy Risk

Shows why monitoring-only defenses can fail without runtime controls.

maps to monitor bypass resistance and enforcement · arxiv 2602.05066 →

security

MCP Security Bench

Evaluation coverage for MCP-specific attack vectors and tool-boundary failures.

maps to MCP compliance and security resilience packs · arxiv 2510.15994 →

cost

Economic Denial of Service

Cost amplification through tool-calling and hidden runtime loops.

maps to budget controls and cost predictability · arxiv 2601.10955 →

alignment

Sycophancy and Objective Drift

Agent behavior can optimize social approval while drifting from ground truth.

maps to sycophancy, alignment, and refusal checks · arxiv 2602.08092 →

Agent Maturity Compass whitepaper → / see research notes and gap analysis →

pricing

Free core. Paid domain depth_

free / open source

AMC Core

CLI, Desktop Studio, full score, assurance, evidence, reports, and CI gates.

244 default questions · 264 lifecycle questions · 147 assurance packs · 14 adapters · MIT licensed

start free →

$9.99 / month

Industry Packs

All 41 Industry Domain Packs for regulated verticals.

Health · Wealth · Education · Mobility · Technology · Governance · Environment


Procurement path: Compare buyer packages for commercial offers by buyer type, proof surfaces, and purchase-ready artifacts.

industry packs

7 stations. 41 packs.
regulated depth_

Optional paid packs add sector-specific diagnostics grounded in real regulations: EU AI Act, HIPAA, FDA, NERC CIP, CITES, ILO, ISO, and other domain frameworks. One $9.99/month subscription unlocks all 41 packs.

Environment

6 packs · 87 questions

Farm to Fork, Weave to Wear, Material to Machines, Source to Sustenance, Ubiquity to Utility, Sip to Sanitation.


Health

9 packs · 151 questions

Digital Health Record, Wellness, Patient Lifecycle, Clinical Lifecycle, Professional Practice, Life Technology, Drug Discovery, Clinical Trials, Specialized Medicine.


Wealth

5 packs · 70 questions

Future of Work, Digital Payments, No Poverty, Circular Economy, Blockchain and DeFi.


Education

5 packs · 72 questions

K-12, Higher Education, Skills and Vocational, Specialized Education, Differently Abled.


Mobility

6 packs · 78 questions

Sustainable Communities, Sustainable Ports, Sustainable Real Estate, Virtual Infrastructure, Privacy and Security, Freight/3PL/Warehouse.


Technology

5 packs · 71 questions

Cognition to Intelligence, Networked Ecosystems, OS for Sustainable Outcomes, Infotainment, Partnerships for Prosperity.


Governance

5 packs · 71 questions

Digital Citizens and Rights, Dance of Democracy, Petition to Law, Citizen Services, Public and Private Collaboration.
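The per-station counts can be checked against the "7 stations. 41 packs." headline with a short tally. The numbers are copied from the station cards on this page; the 600-question total is simply their sum and is not a figure the page itself states.

```python
# (packs, questions) per station, copied from the station cards on this page.
stations = {
    "Environment": (6, 87),
    "Health": (9, 151),
    "Wealth": (5, 70),
    "Education": (5, 72),
    "Mobility": (6, 78),
    "Technology": (5, 71),
    "Governance": (5, 71),
}

total_packs = sum(packs for packs, _ in stations.values())
total_questions = sum(questions for _, questions in stations.values())

print(len(stations), total_packs, total_questions)  # 7 41 600
```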


faq

Common questions_

Is this just a quick score?

No. The default amc command generates the full 244-question score. Use amc run --question-set lifecycle for the 264-question lifecycle-expanded set.

What is the 84-point documentation inflation gap?

Self-reported documentation can claim 100/100 while execution-verified evidence supports a much weaker score; the inflation gap is the difference between the two. AMC closes that gap with observed evidence, signed artifacts, and repeatable proof chains.

How is AMC different from other evaluation tools?

AMC is a trusted observer. Self-reported evidence is capped at 0.4x weight; observed runtime evidence carries 1.0x weight. It scores maturity, finds gaps, runs assurance packs, and preserves verifiable evidence.

Which frameworks does AMC support?

14 adapters: LangChain, LangGraph, CrewAI, AutoGen, OpenAI Agents SDK, LlamaIndex, Semantic Kernel, Claude Code, Gemini, OpenClaw, OpenHands, Python SDK, CLI, and OpenAI-compatible endpoints.

Can I use AMC in CI/CD?

Yes. Use gates such as amc gate --min-score L3 to fail builds below threshold and prevent trust regressions.
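To make the threshold idea concrete, here is a minimal sketch of the comparison a gate like `--min-score L3` implies, assuming the L0-L5 maturity scale named on this page. `meets_gate` and `gate_exit_code` are hypothetical helpers, and the 0/1 exit codes are the usual CI convention, not documented AMC behavior. In a real pipeline you would run `amc gate --min-score L3` itself.

```python
# Maturity levels L0-L5, lowest to highest (the scale named on this page).
LEVELS = ["L0", "L1", "L2", "L3", "L4", "L5"]


def meets_gate(achieved: str, minimum: str) -> bool:
    """Return True when the achieved level is at or above the minimum."""
    return LEVELS.index(achieved) >= LEVELS.index(minimum)


def gate_exit_code(achieved: str, minimum: str = "L3") -> int:
    """Map the gate result to a process exit code: 0 passes, 1 fails the build."""
    return 0 if meets_gate(achieved, minimum) else 1


print(gate_exit_code("L2"))  # 1: below L3, so the build would fail
print(gate_exit_code("L4"))  # 0: at or above L3, so the build would pass
```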

Do I need an account?

No account is needed for the free core. Industry Packs require paid access.

Do I need Docker?

No. Use npx agent-maturity-compass or install with npm. Docker is optional.

Can non-engineers use it?

Yes. Run amc up and use the local Studio dashboard.

What is free and what is paid?

The core trust stack is MIT licensed: Score, Shield, Enforce, Vault, Watch, Comply, Fleet, Passport, adapters, Studio, reports, and CI gates. The paid add-on is $9.99/month access to all 41 Industry Domain Packs.

start today

Score your agent with one command_

install amc →
