1 · The 60-second answer
Large companies — railroads, utilities, hospitals, banks — own tens of thousands of computers, network devices, and industrial controllers. Every month, hundreds of new security flaws ("vulnerabilities") are published against the software running on those machines. Today the response is mostly humans: analysts open tickets, plan patch windows, get approvals, apply fixes, and write audit reports. The cycle takes weeks.
Mythal replaces that human cycle with a swarm of specialist AI agents. A Supervisor agent receives each new vulnerability, dispatches ten specialist agents to enrich, prioritize, plan, and verify a fix, and routes the result through a strict policy gate. A twelfth agent runs continuously on the company's full inventory and surfaces risk that no scanner has reported yet. Every decision is logged, every action is reversible, and every closed fix produces evidence an auditor will accept.
Think of this as LangGraph or CrewAI for vulnerability remediation: a typed message bus, a Supervisor that drives a finite-state machine, twelve agent nodes (the Supervisor among them) with system prompts and tool registries, an OPA-style policy gate between every decision and any side effect, and a Postgres-backed reasoning trace ledger that doubles as audit evidence.
2 · The world we're operating in
Before the agents make sense, the problem has to make sense. Spend ninety seconds here and the rest of the document drops into place.
What "vulnerability management" actually is
Every piece of software has bugs. Some bugs let an attacker do something they shouldn't —
read a file, escalate privileges, run code remotely. When a researcher (or a vendor, or
an AI tool now) finds one, it gets a global ID: CVE-2026-12345. It's published in
the National Vulnerability Database. Vendors then ship a patch — a new software version
that fixes the bug.
The company's job is to figure out, for every machine it owns: do any of the published CVEs apply to me, and have I installed the fix yet? Multiply that question by 50,000 assets and 1,000 new CVEs a month and you have a real engineering problem.
What's a "scanner"?
A scanner is software that crawls the company's network, identifies what's running on every device (OS, version, installed packages, firmware), and matches that inventory against the global CVE database. Output: asset X is running software Y at version Z, which has open CVE A, B, C.
Big vendors you'll hear constantly: Qualys, Tenable, Rapid7, Wiz (cloud), Microsoft Defender Vulnerability Management (for endpoints inside the Microsoft estate). On the industrial side: Claroty, Nozomi Networks, Dragos. Most large operators run three or four of these in parallel; nobody trusts a single source.
A scanner is a data source, not a brain. Think of it the way you'd think of an embedding model that outputs vectors — useful, structured, and dumb. The intelligence in our platform is the agents that consume the scanner output, not the scanners themselves. We are scanner-agnostic on purpose.
What's IT vs OT? Why does it matter?
IT is what you'd expect: laptops, servers, email, databases. If you patch an IT server and it crashes, someone reboots it and you write a postmortem. Annoying, not catastrophic.
OT — operational technology — is the software that physically runs the world: the box that switches a railroad track, the controller in a power substation, the pump in a water treatment plant, the valve on a gas pipeline. If you "patch" one of those and it doesn't come back up, a train derails, a city loses power, or a pipeline over-pressurizes. OT systems often run software that hasn't been updated in eight years because the vendor only certifies firmware updates inside a planned maintenance window that the operator schedules months in advance.
This distinction is the single most important thing to understand about our platform. The reason a dedicated OT Safety Officer agent exists and holds veto rights is that an OT operator (the person responsible for the trains actually moving) will not adopt any tool that can change OT state without an explicit safety model. Period.
"Critical Cyber System" (CCS) — the strictest category
Inside OT, there's an even stricter subset called Critical Cyber Systems. This is a term from a TSA security directive (the rail one — TSA SD 1580-21-01). It means "if this system is breached or fails, the operational impact is severe and immediate." A Positive Train Control wayside unit is a CCS. A substation RTU controlling 138kV power flow is a CCS. The platform enforces extra rules for these: dual approval, an open maintenance window, a tested rollback, and an OT Safety Officer sign-off — all four must be true before a change is allowed.
What's a "patch" and what's "patch management"?
A patch is just a new version of the software. Patch management is the entire human workflow around getting it deployed: ingest the vendor advisory, decide who's affected, pick a deployment cohort (canary first, then rings), schedule a window, get a change ticket approved, run the patch, verify the system came back healthy, write up the evidence. In most large organizations this is a department of fifteen people. We are replacing that department's throughput, not its governance.
Compliance frameworks — the audit pressure
Critical infrastructure is regulated. Governments and industry bodies publish detailed rule sets, and the company has to be able to prove they followed them. The big ones you'll see throughout this codebase:
- TSA SD 1580-21-01 — US Transportation Security Administration directive for Class I freight railroads. Mandates timely patching of Critical Cyber Systems and network segmentation between IT and OT.
- NIST CSF 2.0 — the broad US cybersecurity framework (Identify · Protect · Detect · Respond · Recover). Most enterprise security programs organize themselves around it.
- NIST 800-82r3 — NIST's playbook specifically for industrial control systems. Defines the patch management and zone/conduit rules our OT Safety Officer enforces.
- IEC 62443 — the international equivalent of 800-82r3. Comes up for tenants outside the US.
- SOX § 404, HIPAA, PCI DSS v4 — financial, healthcare, and payment-card frameworks. We cover them for cross-vertical tenants.
Each remediation we close produces an evidence unit — a small record that says "we did X on date Y, approved by Z, here's the trace." That record is tagged to one or more control IDs in one or more frameworks. The Compliance Reporter agent rolls these up into auditor-grade PDFs.
3 · Terminology you'll hear constantly
Quick reference. Each term comes back later, in context.
CVE (Common Vulnerabilities and Exposures)
Globally unique ID for a security flaw. Format: CVE-YEAR-NUMBER. Assigned under the MITRE-run CVE Program. The platform identity for any specific bug.
KEV (Known Exploited Vulnerabilities)
A list maintained by CISA (the US Cybersecurity & Infrastructure Security Agency) of CVEs that are actively being exploited in the wild. If a CVE is on KEV, it jumps to the top of the priority queue immediately; federal agencies are required to patch KEV-listed CVEs within tight deadlines.
EPSS (Exploit Prediction Scoring System)
A probability between 0 and 1 estimating how likely a CVE is to be exploited in the next 30 days. Updated daily by FIRST.org. Most CVEs score below 0.1. KEV-listed CVEs typically score above 0.5.
CVSS (Common Vulnerability Scoring System)
The "severity" score: a number from 0.0 to 10.0 capturing how bad a CVE is in theory. 9.0+ is critical. CVSS tells you the worst-case severity; EPSS tells you the likelihood. You want both.
PSIRT (Product Security Incident Response Team)
A vendor's internal security team that issues advisories about flaws in their own products and ships patches. Cisco PSIRT, Microsoft MSRC, Siemens ProductCERT, Wabtec PSIRT, Red Hat PSIRT: every major vendor has one. Our Threat Intel agent polls them all.
SBOM (Software Bill of Materials)
A structured list of every component (library, dependency) inside a piece of software. Helpful for figuring out which assets are affected when a CVE drops in a common library (think Log4Shell). Not yet a first-class input in our MVP but on the roadmap.
MTR (Mean Time to Remediate)
The headline metric for our buyers. Average wall-clock time from "a CVE was detected on one of my assets" to "the fix is verified live." Baseline at a typical Class I railroad: 22 days. Our pilot target: 5 days. The KPI that closes the deal.
PTC (Positive Train Control)
A rail-specific safety system that automatically prevents train collisions and derailments. Hardware on locomotives and at wayside (track-side cabinets); strongly regulated, almost always tagged as a Critical Cyber System.
ICS (Industrial Control Systems)
Generic term for OT: covers PLCs, RTUs, SCADA, HMIs, anything that controls a physical process. NIST 800-82r3 is "the ICS security standard."
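The headline remediation-time metric (MTR elsewhere in this document) is plain arithmetic over closed findings. A minimal sketch, assuming each closed finding is available as a hypothetical `(detected_at, verified_at)` pair of datetimes; the function name is illustrative, not a platform identifier:

```python
from datetime import datetime
from statistics import mean


def mean_time_to_remediate(records) -> float:
    """Average days from detection to verified fix across closed findings."""
    durations = [
        (verified - detected).total_seconds() / 86400  # seconds per day
        for detected, verified in records
    ]
    return mean(durations)
```

Only closed findings belong in `records`; open findings have no verified timestamp yet and would need separate handling.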
4 · Why the world broke — the Mythos thesis, properly
Here's the framing you'll repeat in every executive room, and you should understand it end-to-end before you do.
Historically, finding new vulnerabilities required deep human expertise — a researcher spends weeks reverse-engineering a vendor's firmware, finds a flaw, writes it up. Patches dropped at a manageable cadence. Companies had weeks to react. Patch Tuesday was a known monthly event you could staff against.
Then large language models — specifically the wave that started with Anthropic's Claude Mythos and the contemporary reasoning models from OpenAI and others — made vulnerability discovery dramatically cheaper. AI tools can now:
- Diff a patched binary against an unpatched one and reconstruct the vulnerability the patch was fixing (so attackers learn faster).
- Audit large open-source codebases at scale and flag exploitable patterns.
- Generate proof-of-concept exploits from CVE descriptions.
Two effects: vendors are publishing patches faster than ever (because they're being told about flaws faster than ever), and attackers are operationalizing exploits faster than ever. April 2026's Patch Tuesday landed 163 CVEs in one day. The defender's response capacity didn't change — it's still those fifteen humans we mentioned.
Discovery went machine-speed. Exploitation went machine-speed. Remediation stayed human-speed. That gap is the largest unhedged risk on the modern CISO's balance sheet, and it is acute in critical infrastructure where patch windows are scarce and the blast radius of a breach is operational, not just informational.
The "Mythos thesis" is just this: the response layer also has to go machine-speed, without losing the safety properties (rollback, audit, dual approval) that the human layer enforced. That's the entire raison d'être of Mythal.
5 · Our answer: the closed loop
We model the work as a pipeline. A single vulnerability finding enters at one end, flows through a fixed sequence of stages, and exits as either "closed" (verified fix) or "rolled back / escalated" (humans please look). Every stage produces a structured record. Every stage can fail safely.
Each stage, from scanner normalization through compliance reporting (Section 6 walks them in order), is implemented as a separate agent. The Supervisor agent owns the state and routes work. The Inventory Insights agent runs independently and surfaces recommendations the moment they exist, not when a CVE happens to expose them.
If you've written a multi-agent app with LangGraph, this is structurally the same:
a StateGraph with named nodes, edges that decide based on a typed state
object, and a Supervisor that pumps the graph. We chose not to use LangGraph itself
because we wanted (1) Postgres as the trace store, not a transient in-process state,
and (2) HMAC signing on every transition for audit purposes.
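To make that shape concrete, here is a rough sketch of a Supervisor-style state machine driving one finding through a fixed sequence with a typed state object. The stage names, the `FindingState` fields, and the `advance` helper are illustrative assumptions, and the real pipeline has more stages than this:

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    INGESTED = "ingested"
    ENRICHED = "enriched"
    PLANNED = "planned"
    EXECUTED = "executed"
    CLOSED = "closed"
    ESCALATED = "escalated"


# Hypothetical happy-path ordering (assumption, shortened for illustration).
NEXT_STAGE = {
    Stage.INGESTED: Stage.ENRICHED,
    Stage.ENRICHED: Stage.PLANNED,
    Stage.PLANNED: Stage.EXECUTED,
    Stage.EXECUTED: Stage.CLOSED,
}


@dataclass
class FindingState:
    finding_id: str
    stage: Stage = Stage.INGESTED
    history: list = field(default_factory=list)


def advance(state: FindingState, step_ok: bool) -> FindingState:
    """Move one step forward, or escalate to humans if the step failed."""
    if state.stage in (Stage.CLOSED, Stage.ESCALATED):
        return state  # terminal states never change
    target = NEXT_STAGE[state.stage] if step_ok else Stage.ESCALATED
    state.history.append((state.stage, target))
    state.stage = target
    return state
```

In the platform as described, each transition would also be persisted and signed before the next step runs; this sketch keeps the history in memory only.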
6 · The 12 agents, one by one
Each agent is a Python module under apps/api/src/sentinelgrid_api/agents/.
Each one exports a run(...) function with a typed Pydantic input and output,
optionally calls an LLM (Anthropic Claude, OpenAI, or none), and writes one message
to the bus and one entry to the reasoning trace per invocation. Below: what each one
does, why it's a separate agent, and what would break if it disappeared.
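The per-agent contract described above might look roughly like this. The sketch uses standard-library dataclasses instead of Pydantic so it runs without dependencies; the model names, the in-memory `BUS` and `TRACES` lists, and the 0.5 threshold are all assumptions for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnrichInput:       # hypothetical input model
    cve: str
    epss: float


@dataclass(frozen=True)
class EnrichOutput:      # hypothetical output model
    cve: str
    high_priority: bool


BUS: list = []       # stand-in for the message bus
TRACES: list = []    # stand-in for the reasoning trace


def run(inp: EnrichInput) -> EnrichOutput:
    """One invocation: one typed result, one bus message, one trace entry."""
    out = EnrichOutput(cve=inp.cve, high_priority=inp.epss >= 0.5)  # assumed cutoff
    BUS.append({"source": "threat_intel", "intent": "enriched", "payload": out})
    TRACES.append(
        f"Threat Intel: {inp.cve} EPSS={inp.epss:.2f}, "
        f"high_priority={out.high_priority}"
    )
    return out
```

Keeping the "one message, one trace entry" rule inside `run` is what lets each agent be tested in isolation.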
① Supervisor — the orchestrator
Drives the finite-state machine per finding. Receives the initial event, dispatches
to specialists in sequence, holds the conversation state, and emits the master trace.
In anthropic mode it runs on Claude Opus 4.7 because the orchestration
decisions benefit from a larger reasoning model; in deterministic mode
it's straight Python control flow.
Why a separate agent? Because routing decisions ("this finding is OT + CCS, so the OT Safety Officer must approve before Remediation Planner can act") are themselves reasoning. Splitting routing from work makes both testable in isolation.
② Scanner Liaison — normalize the firehose
Every scanner has a different schema. Qualys says QID, Tenable says
plugin_id, Wiz says issue_id. We translate all of them into
one canonical VulnerabilityFinding shape, and we deduplicate across scanners
by (asset_id, cve) — the same CVE seen on the same asset by Qualys and Defender
is one finding, not two.
Why a separate agent? Because the schema-translation problem is its own discipline. Field mappings change, scanners add new fields, and we want a single module to own that drift. Also: when a scanner feed goes degraded, this agent is where you put the alerting.
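A minimal sketch of the normalize-then-deduplicate step, assuming hypothetical raw-field layouts for two scanners (real exports differ) and a trimmed-down `VulnerabilityFinding` with only three fields:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VulnerabilityFinding:
    asset_id: str
    cve: str
    sources: tuple  # scanners that reported this (asset, CVE) pair


# Hypothetical raw-field names; each real scanner has its own schema.
def from_qualys(row: dict) -> VulnerabilityFinding:
    return VulnerabilityFinding(row["host"], row["cve"].upper(), ("qualys",))


def from_tenable(row: dict) -> VulnerabilityFinding:
    return VulnerabilityFinding(row["asset"], row["cve_id"].upper(), ("tenable",))


def deduplicate(findings):
    """Merge findings that share (asset_id, cve), keeping every source."""
    merged = {}
    for f in findings:
        key = (f.asset_id, f.cve)
        if key in merged:
            sources = tuple(sorted(set(merged[key].sources) | set(f.sources)))
            merged[key] = VulnerabilityFinding(f.asset_id, f.cve, sources)
        else:
            merged[key] = f
    return list(merged.values())
```

Recording every reporting scanner on the merged finding keeps the provenance that a plain "drop the duplicate" approach would lose.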
③ Threat Intel Aggregator — add context
Raw CVEs have a severity score but no narrative. This agent enriches each finding with: is it on the CISA KEV list, what's its EPSS probability, is there public exploit code, is a known ransomware actor associated, did the vendor's PSIRT flag it as exploited. Reads from NVD, KEV, EPSS, vendor PSIRTs, GitHub Security Advisories, and (for entitled tenants) pre-disclosure feeds like Project Glasswing.
Why a separate agent? Because each of those feeds has its own auth, rate limits, and refresh cadence. Concentrating that operational complexity in one module means the rest of the pipeline can treat enrichment as a single uniform call.
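Once a finding carries KEV, EPSS, and CVSS context, ordering the queue is a sort. The sketch below is one plausible ordering (KEV first, then likelihood, then severity); the `EnrichedFinding` shape and the tie-breaking order are assumptions, not the platform's documented ranking:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnrichedFinding:
    cve: str
    on_kev: bool
    epss: float   # probability of exploitation, 0..1
    cvss: float   # severity, 0.0..10.0


def priority_key(f: EnrichedFinding):
    # Ascending sort: KEV-listed first, then higher EPSS, then higher CVSS.
    return (not f.on_kev, -f.epss, -f.cvss)


def prioritize(findings):
    return sorted(findings, key=priority_key)
```

A KEV-listed CVE with a modest EPSS still sorts ahead of an unlisted CVE with a high one, which matches the "KEV jumps to the top" rule from the terminology section.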
④ Patch Hunter — find the fix
Given a CVE, locate the vendor patch (or community workaround), figure out the version
it bumps you to, and assign a PatchReliabilityScore — a 0-to-1 number that
blends source authority (vendor > community > ad-hoc), deployment population evidence
(has anyone else applied this and survived?), and rollback feasibility. The score is
what the policy gate uses to decide auto-apply vs human approval.
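One simple way to blend those three signals is a weighted sum. The weights and the per-source authority values below are invented for illustration; the source only states the ordering vendor > community > ad-hoc and the 0-to-1 range:

```python
# Assumed values: only the ordering vendor > community > ad-hoc comes from the text.
SOURCE_AUTHORITY = {"vendor": 1.0, "community": 0.6, "ad-hoc": 0.2}
# Assumed weights; they sum to 1 so the result stays in 0..1.
WEIGHTS = {"source": 0.5, "population": 0.3, "rollback": 0.2}


def patch_reliability_score(source: str, population_evidence: float,
                            rollback_feasibility: float) -> float:
    """Blend three signals, each in 0..1, into a single 0..1 score."""
    for name, value in (("population_evidence", population_evidence),
                        ("rollback_feasibility", rollback_feasibility)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    score = (WEIGHTS["source"] * SOURCE_AUTHORITY[source]
             + WEIGHTS["population"] * population_evidence
             + WEIGHTS["rollback"] * rollback_feasibility)
    return round(score, 3)
```

With these assumed weights, a vendor patch with strong deployment evidence clears the 0.85 auto-apply threshold, while the same evidence behind a community workaround does not.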
⑤ Impact Analyst — score blast radius
Joins the vulnerable asset to the CMDB and the dependency graph. Outputs a
BusinessImpactProfile: business criticality, network exposure (is this
internet-facing, DMZ, or OT-segmented?), how many downstream systems depend on it,
whether sensitive data lives on it. This is what turns "CVSS 9.0 on some Windows server"
into "CVSS 9.0 on the payroll database that 4,000 employees hit every day."
⑥ Change Risk — predict the change blowing up
Patches sometimes break things. This agent looks at historical change-failure rates for the asset class and vendor, factors in whether a canary peer is available, and recommends a deployment window. Output: a risk score and a window suggestion. This is what differentiates "auto-apply tonight" from "queue for the Saturday 02:00 wave."
⑦ ★ OT Safety Officer — the veto
The single most important agent in the platform. If the affected asset is in an OT zone or carries the Critical Cyber System flag, this agent reviews the proposed remediation and almost always vetoes direct patching. Instead, it proposes compensating controls (tightening the industrial firewall ACL so only authorized engineering workstations can reach the device, deploying an IPS signature that virtually patches the vulnerability at the network layer, increasing monitoring sensitivity) and schedules the actual firmware update for the next planned maintenance window with a dual-approval requirement.
Why a separate agent? Because OT operators (the humans who own the physical safety of the railroad) will not approve any tool whose default behavior can touch OT. Having a named agent with explicit veto rights and a documented prompt is the safety model they need to see before they'll sign the contract. Without this agent, the platform does not get adopted in rail, pipeline, power, or water — regardless of how good the rest of it is.
If a CISO at a Class I railroad asks one question about the platform, it will be about OT safety. The answer is this agent. Memorize the veto path: review → propose ACL tightening + IPS signature + monitored isolation → schedule firmware update in next window → require dual approval (security + OT ops). That sequence is the deal-closer.
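The veto path can be sketched as a small deterministic function. This version always vetoes for OT-zone or CCS assets, which simplifies the "almost always" behavior described above; the `OTDecision` fields and string labels are hypothetical:

```python
from dataclasses import dataclass, field


@dataclass
class OTDecision:
    direct_patch_allowed: bool
    compensating_controls: list = field(default_factory=list)
    schedule: str = ""
    approvals_required: list = field(default_factory=list)


def review(zone: str, is_ccs: bool) -> OTDecision:
    """Veto direct patching for OT-zone or CCS assets; pass everything else."""
    if zone != "OT" and not is_ccs:
        return OTDecision(direct_patch_allowed=True)
    return OTDecision(
        direct_patch_allowed=False,
        compensating_controls=[
            "tighten firewall ACL",
            "deploy IPS signature",
            "monitored isolation",
        ],
        schedule="next planned maintenance window",
        approvals_required=["security", "ot_operations"],
    )
```

The default-to-no branch is the point: nothing in the OT path returns `direct_patch_allowed=True`.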
⑧ Remediation Planner — write the runbook
Synthesizes everything upstream into a concrete plan: exact steps, exact systems, exact order, exact approvals required, exact rollback procedure, exact verification checks. Produces both a human-readable runbook (so a security analyst can read it and trust it) and a machine-executable workflow (so the Executor agent can run it).
⑨ Executor — actually apply the fix
Drives the integration tools that change the world. For Windows: Microsoft SCCM/Intune or Tanium. For Linux: Ansible or Puppet or Chef. For network gear: Cisco Catalyst Center or Palo Alto Panorama. For cloud: AWS Systems Manager or Azure Arc. For OT: vendor-specific tooling like Tenable OT Security or Claroty SRA. Refuses to act without an approved plan and a policy-authorized scope — that refusal is the safety net.
⑩ Verifier — close the loop
After the Executor runs, this agent rescans the asset to confirm the CVE no longer reports, runs a health probe to confirm the service didn't break, and where applicable re-runs an exploit-safety check (does the previously vulnerable endpoint now reject the exploit payload?). If any check fails, it triggers rollback and escalates to the Supervisor.
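A sketch of that check-then-rollback logic, assuming each check is supplied as a hypothetical zero-argument callable returning a boolean and `rollback` is a callable that undoes the change:

```python
from typing import Callable, Dict


def verify(checks: Dict[str, Callable[[], bool]],
           rollback: Callable[[], None]) -> dict:
    """Run every check; on any failure, roll back and ask for escalation."""
    failed = [name for name, check in checks.items() if not check()]
    if failed:
        rollback()
        return {"status": "rolled_back", "failed": failed, "escalate": True}
    return {"status": "closed", "failed": [], "escalate": False}
```

All checks run before deciding, so the escalation record lists every failure instead of only the first one.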
⑪ Compliance Reporter — produce evidence
After every closed plan, emits one or more evidence units tagged to control IDs in TSA SD 1580, NIST CSF 2.0, NIST 800-82r3, IEC 62443, and the cross-vertical frameworks. Each evidence unit is a structured record + a snippet of the reasoning trace. The agent also generates auditor-ready PDF packages on demand.
⑫ ★ Inventory Insights — the proactive agent
Everything above is reactive: a CVE drops, the pipeline fires. The Inventory Insights agent is the opposite — it sweeps the company's full estate on its own cadence and produces recommendations even when no CVE has been published yet. End-of-life software, version sprawl, OT firmware bands that are below recommended minimums, Critical Cyber Systems with no documented owner, service accounts on CCS systems without strong factor authentication, vendors that show up only once or twice (probable shadow IT).
Why this matters competitively: almost every other vulnerability platform stops at scanner findings. CISOs of critical-infrastructure operators desperately need an answer to "what do I actually own and where am I exposed before the next CVE drops?" This agent is that answer.
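Two of the sweeps listed above reduce to simple queries over the inventory. A minimal sketch, assuming assets arrive as plain dictionaries with hypothetical `asset_id`, `vendor`, `is_ccs`, and `owner` keys:

```python
from collections import Counter


def rare_vendors(assets, max_count: int = 2):
    """Vendors that appear on at most max_count assets (possible shadow IT)."""
    counts = Counter(a["vendor"] for a in assets)
    return sorted(v for v, n in counts.items() if n <= max_count)


def unowned_ccs(assets):
    """Critical Cyber Systems with no documented owner."""
    return [a["asset_id"] for a in assets
            if a.get("is_ccs") and not a.get("owner")]
```

Neither check needs a CVE to fire, which is the difference between this agent and the reactive pipeline.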
7 · The glue — what holds the agents together
The typed message bus
Every inter-agent communication is an AgentMessage object: stable ULID,
trace ID grouping all messages for one finding, source agent, destination agent,
intent string, structured payload, policy context, timestamp, and HMAC-SHA256
signature over the canonical JSON. The bus is just a Postgres table plus a Redis
pub/sub channel.
This is the equivalent of a Kafka topic for agent traffic, except we picked Postgres because the audit guarantee ("every message persists before any side effect runs") was more important than throughput. At sub-10k findings/hour you don't need Kafka. The signature is exactly the same idea as JWS — detect tampering if anyone messes with the audit log.
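Signing a message over its canonical JSON can be sketched with the standard library alone. The canonicalization choice here (sorted keys, compact separators) and the function names are assumptions; the source only specifies HMAC-SHA256 over canonical JSON:

```python
import hashlib
import hmac
import json


def canonical_json(message: dict) -> bytes:
    # Sorted keys and fixed separators give a stable byte string to sign.
    return json.dumps(message, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign(message: dict, key: bytes) -> str:
    return hmac.new(key, canonical_json(message), hashlib.sha256).hexdigest()


def verify_signature(message: dict, signature: str, key: bytes) -> bool:
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(sign(message, key), signature)
```

Because the keys are sorted before signing, two dictionaries with the same content but different insertion order produce the same signature, while any edited field invalidates it.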
The reasoning trace ledger
Parallel to the message bus, every agent writes a human-readable narrative entry to
the reasoning_traces table: "Threat Intel: Enriched CVE-2026-19110.
KEV=true, EPSS=0.81, exploit in the wild. Flagging for high-priority routing."
The console reads this directly; the Compliance Reporter quotes excerpts in the
auditor PDFs. The reasoning trace is the product — auditors and
investors both read it. It's the part that distinguishes "platform" from "black-box
automation."
The policy gate
Between every agent decision and any side effect sits a small rules engine. Seven default rules:
- Critical Cyber Systems require dual approval + open maintenance window + validated rollback.
- Any OT-zone asset requires OT Safety Officer sign-off plus dual approval.
- IT auto-apply is allowed only when: criticality ≤ Medium AND patch reliability ≥ 0.85 AND a canary peer exists AND the window is open AND rollback is valid.
- Default IT remediation path requires single security approval.
- Agents may not exfiltrate findings to non-allowlisted tools.
- No approval is granted without a validated rollback plan.
- Change blackout windows are honored — no state-changing action during a blackout.
This is the same role Open Policy Agent (Rego) plays in modern Kubernetes admission control: a deterministic guard between intent and effect. The LLM agents propose; the policy gate disposes. Mixing the two responsibilities into the LLM itself is the architectural mistake every first attempt at "autonomous remediation" makes.
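Several of the default rules can be expressed as one deterministic function. This is a sketch under assumptions: the `ChangeRequest` fields and return labels are hypothetical, a CCS change outside an open window is treated as a deny, and the exfiltration rule is omitted because it concerns tool access instead of change approval:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeRequest:
    zone: str                 # "IT" or "OT"
    is_ccs: bool
    criticality: str          # "Low", "Medium", "High", "Critical"
    patch_reliability: float
    canary_peer: bool
    window_open: bool
    rollback_valid: bool
    in_blackout: bool


def evaluate(req: ChangeRequest) -> str:
    """Return "deny", "auto_apply", or the approval path the change needs."""
    if not req.rollback_valid:
        return "deny"  # no validated rollback plan (SG-POL-006)
    if req.in_blackout:
        return "deny"  # blackout windows are honored
    if req.is_ccs:
        # CCS also needs an open maintenance window before approval is sought.
        return "ot_signoff+dual_approval" if req.window_open else "deny"
    if req.zone == "OT":
        return "ot_signoff+dual_approval"
    if (req.criticality in ("Low", "Medium")
            and req.patch_reliability >= 0.85
            and req.canary_peer
            and req.window_open):
        return "auto_apply"
    return "single_security_approval"
```

The function takes no model output as input beyond the structured request, so an agent can propose anything it likes and the answer still depends only on these fields.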
Three reasoning backends
An environment variable AGENT_MODE selects what runs the agents' reasoning:
- deterministic (default) — every agent is pure Python rules. No LLM call. Sub-second per agent step. Best for CI, demos, and any tenant who doesn't want external API calls.
- anthropic — Supervisor and OT Safety Officer call Claude Opus 4.7; the other ten agents call Claude Sonnet 4.6. Outputs are JSON-schema-validated.
- openai — specialists call gpt-4o-mini; Supervisor and OT Safety Officer call o4-mini. Same JSON-schema gates apply.
If a model call fails for any reason (auth error, rate limit, network), the agent silently falls back to its deterministic path. The pipeline does not crash. This is important: enterprise buyers cannot tolerate a platform whose availability depends on someone else's API key.
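The fallback behavior amounts to a try/except around the model call. A minimal sketch, assuming the model-backed and rule-based paths are passed in as hypothetical callables that take a finding and return a dictionary:

```python
from typing import Callable


def reason(finding: dict,
           llm_call: Callable[[dict], dict],
           deterministic: Callable[[dict], dict]) -> dict:
    """Try the model-backed path; fall back to rules on any failure."""
    try:
        result = llm_call(finding)
        result["backend"] = "llm"
    except Exception:
        # Auth errors, rate limits, and network failures all land here.
        result = deterministic(finding)
        result["backend"] = "deterministic"
    return result
```

Tagging the result with the backend that produced it is an addition in this sketch; it lets the trace show which path actually ran even though the pipeline itself carries on without interruption.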
8 · Why this is enterprise-grade (and not a toy)
Four properties make this credible inside a Fortune 500 operator. Memorize them — they're what a CISO will probe in the technical demo.
Auditable end-to-end
Every state transition emits a structured event. Every event is signed. Every closed remediation generates evidence tagged to compliance control IDs. The console's Agent Activity page is just a real-time view onto the same data the auditor PDF is generated from. The system can't lie because it can't avoid writing.
Reversible by design
No remediation is approved without a tested rollback plan. The Verifier agent rolls back automatically on failure. Even on auto-apply, every step records what would be needed to undo it. If the rollback plan doesn't exist or fails dry-run validation, the policy gate denies the change. That clause is rule SG-POL-006 and it's the reason an OT operator will trust the platform.
Human-in-the-loop where it counts
Auto-apply is restricted to IT assets at low/medium criticality with high-reliability patches and an open window. Everything else requires a human signature. CCS changes require two human signatures (Security + OT Operations). The platform's job isn't to remove humans; it's to focus them on the decisions only they can make.
Adversary-aware prompts
Every external string — advisory body text, scanner output, ticket comments — is
wrapped in <untrusted_external> tags inside agent prompts. A pre-flight
classifier flags suspicious patterns ("ignore previous instructions", smuggled
tool-call syntax, embedded role tags). Agents are explicitly told to treat content
inside those tags as data only, never as instructions. This matters because attackers
who realize their CVE description is being read by an AI will try to inject prompt
payloads — that's already happening in 2026.
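The wrapping and pre-flight steps could look something like the following. The two patterns are illustrative only (a production classifier would need to be far broader), and escaping angle brackets before wrapping is an assumption added here so a payload cannot close the wrapper early:

```python
import re

# Illustrative patterns only; not the platform's actual classifier.
SUSPICIOUS = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
    re.compile(r"</?\s*(system|assistant|untrusted_external)\b", re.IGNORECASE),
]


def looks_suspicious(text: str) -> bool:
    """Flag text that matches any known injection pattern."""
    return any(p.search(text) for p in SUSPICIOUS)


def wrap_untrusted(text: str) -> str:
    # Neutralize angle brackets so the payload cannot close the wrapper early.
    safe = text.replace("<", "&lt;").replace(">", "&gt;")
    return f"<untrusted_external>{safe}</untrusted_external>"
```

Running `looks_suspicious` on the raw string before wrapping matters, since the escaping step would otherwise hide embedded role tags from the second pattern.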
9 · How to talk about it in the room
Different audiences want different framings. Here are the five you'll see most often, and the angle that lands with each.
Talking to a CISO
Lead with the asymmetry. "Vulnerability discovery has gone machine-speed. Exploitation has gone machine-speed. Your remediation team hasn't. That gap is the largest unhedged risk on your balance sheet, and Mythal closes it without asking your OT team to give up the safety model they've spent twenty years building." Then show Scenario C (the OT Safety Officer veto).
Talking to a VP of Infrastructure Security
Lead with MTR. "Today your team takes 22 days to close an IT finding. With Mythal auto-applying the easy ones and queueing the rest with full context, you'll close them in 5. You'll free two to three FTEs to work on detection instead of patch tickets." Show the Plans Kanban with auto-apply happening live.
Talking to a Head of OT / Plant Operations
Lead with the veto. "Read this agent's system prompt. It defaults to no. The only way to make a state change on an OT zone or a Critical Cyber System is dual approval, an open maintenance window, a validated rollback, and an OT Safety Officer sign-off. All four. We enforce that with code, not with policy documents." Walk through the OT Operations page. Show the compensating-control records.
Talking to a CFO
Lead with insurance + audit cost. "Your cyber-insurance renewal next year depends on attestable controls and demonstrable MTR. We produce both as a byproduct. Independent analysis suggests $1.2–2.4M of premium relief at typical Class I scale. Plus you're not staffing fifteen analysts forever." Show the Compliance Reporter PDF export and the executive deck.
Talking to a venture partner
Lead with the thesis. "Mythos collapsed the cost of vulnerability discovery. Every incumbent in this space was built before that happened. We're not faster than Qualys, we're built on a different physical assumption: the patch firehose is the default state, not the exception. The OT Safety Officer agent is the moat in critical infrastructure. Inventory Insights is the moat against pure scanner companies." Show the agent architecture page and Scenario B (Mythos drop).
10 · Where to go next
Now that the concepts are loaded, walk the rest of the deliverables in this order:
- The 12 agents — same content as Section 6 above, but formatted as a reference grid you can scan in the room.
- Architecture — the topology, the policy gate rules, the data flow, the tech stack.
- Inventory Insights — deeper on the 12th agent because it's the most likely "I've never seen this anywhere else" reaction.
- Compliance — the control mappings, framework by framework.
- Client pitch — the polished CISO leave-behind. Read this last, because everything in it makes more sense after the primer.
- Demo script — the in-room runbook with timed beats.
- Glossary — keep this open in a tab during prep; every term you'll hear is here.
Open the live console at localhost:3090 alongside these docs. Click Run on Scenario A (Patch Tuesday) and watch the Agent Activity timeline fill in real time. That single experience is what made everything in this document click for me when I first wired it up — and it'll do the same for you.