Mythal
primer
Read this first · for the builder, not the buyer

Mythal, explained like you've never sold to a CISO.

This document assumes you're an experienced engineer who builds with LLMs and agent frameworks — and that you haven't spent a career inside enterprise cybersecurity. By the end you'll be able to walk an executive, a VP, or a hiring panel through what we built, why each piece exists, and where it sits relative to the tools and patterns you already know.

1 · The 60-second answer

Large companies — railroads, utilities, hospitals, banks — own tens of thousands of computers, network devices, and industrial controllers. Every month, hundreds of new security flaws ("vulnerabilities") are published against the software running on those machines. Today the response is mostly humans: analysts open tickets, plan patch windows, get approvals, apply fixes, and write audit reports. The cycle takes weeks.

Mythal replaces that human cycle with a swarm of specialist AI agents. A Supervisor agent receives each new vulnerability, dispatches ten specialist agents to enrich, prioritize, plan, and verify a fix, and routes the result through a strict policy gate. A twelfth agent runs continuously on the company's full inventory and surfaces risk that no scanner has reported yet. Every decision is logged, every action is reversible, and every closed fix produces evidence an auditor will accept.

If you've built agentic apps

Think of this as LangGraph or CrewAI for vulnerability remediation: a typed message bus, a Supervisor that drives a finite-state machine, eleven specialist nodes with system prompts and tool registries, an OPA-style policy gate between every decision and any side effect, and a Postgres-backed reasoning trace ledger that doubles as audit evidence.

2 · The world we're operating in

Before the agents make sense, the problem has to make sense. Spend ninety seconds here and the rest of the document drops into place.

What "vulnerability management" actually is

Every piece of software has bugs. Some bugs let an attacker do something they shouldn't — read a file, escalate privileges, run code remotely. When a researcher (or a vendor, or an AI tool now) finds one, it gets a global ID: CVE-2026-12345. It's published in the National Vulnerability Database. Vendors then ship a patch — a new software version that fixes the bug.

The company's job is to figure out, for every machine it owns: do any of the published CVEs apply to me, and have I installed the fix yet? Multiply that question by 50,000 assets and 1,000 new CVEs a month and you have a real engineering problem.

What's a "scanner"?

A scanner is software that crawls the company's network, identifies what's running on every device (OS, version, installed packages, firmware), and matches that inventory against the global CVE database. Output: asset X is running software Y at version Z, which has open CVE A, B, C.

Big vendors you'll hear constantly: Qualys, Tenable, Rapid7, Wiz (cloud), Microsoft Defender Vulnerability Management (for endpoints inside the Microsoft estate). On the industrial side: Claroty, Nozomi Networks, Dragos. Most large operators run three or four of these in parallel; nobody trusts a single source.

AI parallel

A scanner is a data source, not a brain. Think of it the way you'd think of an embedding model that outputs vectors — useful, structured, and dumb. The intelligence in our platform is the agents that consume the scanner output, not the scanners themselves. We are scanner-agnostic on purpose.

What's IT vs OT? Why does it matter?

IT is what you'd expect: laptops, servers, email, databases. If you patch an IT server and it crashes, someone reboots it and you write a postmortem. Annoying, not catastrophic.

OT — operational technology — is the software that physically runs the world: the box that switches a railroad track, the controller in a power substation, the pump in a water treatment plant, the valve on a gas pipeline. If you "patch" one of those and it doesn't come back up, a train derails, a city loses power, or a pipeline over-pressurizes. OT systems often run software that hasn't been updated in eight years because the vendor only certifies firmware updates inside a planned maintenance window that the operator schedules months in advance.

This distinction is the single most important thing to understand about our platform. The reason a dedicated OT Safety Officer agent exists and holds veto rights is that an OT operator (the person responsible for the trains actually moving) will not adopt any tool that can change OT state without an explicit safety model. Period.

"Critical Cyber System" (CCS) — the strictest category

Inside OT, there's an even stricter subset called Critical Cyber Systems. This is a term from a TSA security directive (the rail one — TSA SD 1580-21-01). It means "if this system is breached or fails, the operational impact is severe and immediate." A Positive Train Control wayside unit is a CCS. A substation RTU controlling 138kV power flow is a CCS. The platform enforces extra rules for these: dual approval, an open maintenance window, a tested rollback, and an OT Safety Officer sign-off — all four must be true before a change is allowed.

What's a "patch" and what's "patch management"?

A patch is just a new version of the software. Patch management is the entire human workflow around getting it deployed: ingest the vendor advisory, decide who's affected, pick a deployment cohort (canary first, then rings), schedule a window, get a change ticket approved, run the patch, verify the system came back healthy, write up the evidence. In most large organizations this is a department of fifteen people. We are replacing that department's throughput, not its governance.

Compliance frameworks — the audit pressure

Critical infrastructure is regulated. Governments and industry bodies publish detailed rule sets, and the company has to be able to prove it followed them. The big ones you'll see throughout this codebase are TSA SD 1580, NIST CSF 2.0, NIST 800-82r3, and IEC 62443, plus a set of cross-vertical frameworks.

Each remediation we close produces an evidence unit — a small record that says "we did X on date Y, approved by Z, here's the trace." That record is tagged to one or more control IDs in one or more frameworks. The Compliance Reporter agent rolls these up into auditor-grade PDFs.

3 · Terminology you'll hear constantly

Quick reference. Each term comes back later, in context.

CVE
Common Vulnerabilities and Exposures

Globally unique ID for a security flaw. Format: CVE-YEAR-NUMBER. Assigned by CVE Numbering Authorities under the CVE Program, which MITRE operates. The platform identity for any specific bug.

KEV
Known Exploited Vulnerabilities catalog

A list maintained by CISA (the US Cybersecurity & Infrastructure Security Agency) of CVEs that are actively being exploited in the wild. If a CVE is on KEV, it jumps to the top of the priority queue immediately — federal agencies are required to patch KEV-listed CVEs within tight deadlines.

EPSS
Exploit Prediction Scoring System

A probability between 0 and 1 estimating how likely a CVE is to be exploited in the next 30 days. Updated daily by FIRST.org. Most CVEs score below 0.1. KEV-listed CVEs typically score above 0.5.

CVSS
Common Vulnerability Scoring System

The "severity" score — a number from 0.0 to 10.0 capturing how bad a CVE is in theory. 9.0+ is critical. CVSS tells you the worst-case severity; EPSS tells you the likelihood. You want both.

PSIRT
Product Security Incident Response Team

A vendor's internal security team that issues advisories about flaws in their own products and ships patches. Cisco PSIRT, Microsoft MSRC, Siemens ProductCERT, Wabtec PSIRT, Red Hat PSIRT — every major vendor has one. Our Threat Intel agent polls them all.

SBOM
Software Bill of Materials

A structured list of every component (library, dependency) inside a piece of software. Helpful for figuring out which assets are affected when a CVE drops in a common library (think Log4Shell). Not yet a first-class input in our MVP but on the roadmap.

MTR
Mean Time To Remediate

The headline metric for our buyers. Average wall-clock time from "a CVE was detected on one of my assets" to "the fix is verified live." Baseline at a typical Class I railroad: 22 days. Our pilot target: 5 days. The KPI that closes the deal.

PTC
Positive Train Control

A rail-specific safety system that automatically prevents train collisions and derailments. Hardware on locomotives and at wayside (track-side cabinets) — strongly regulated, almost always tagged as a Critical Cyber System.

ICS
Industrial Control Systems

Generic term for OT — covers PLCs, RTUs, SCADA, HMIs, anything that controls a physical process. NIST 800-82r3 is "the ICS security standard."

4 · Why the world broke — the Mythos thesis, properly

Here's the framing you'll repeat in every executive room, and you should understand it end-to-end before you do.

Historically, finding new vulnerabilities required deep human expertise — a researcher spends weeks reverse-engineering a vendor's firmware, finds a flaw, writes it up. Patches dropped at a manageable cadence. Companies had weeks to react. Patch Tuesday was a known monthly event you could staff against.

Then large language models — specifically the wave that started with Anthropic's Claude Mythos and the contemporary reasoning models from OpenAI and others — made vulnerability discovery dramatically cheaper. AI tools can now:

Two effects: vendors are publishing patches faster than ever (because they're being told about flaws faster than ever), and attackers are operationalizing exploits faster than ever. April 2026's Patch Tuesday landed 163 CVEs in one day. The defender's response capacity didn't change — it's still those fifteen humans we mentioned.

Key insight — the asymmetry

Discovery went machine-speed. Exploitation went machine-speed. Remediation stayed human-speed. That gap is the largest unhedged risk on the modern CISO's balance sheet, and it is acute in critical infrastructure where patch windows are scarce and the blast radius of a breach is operational, not just informational.

The "Mythos thesis" is just this: the response layer also has to go machine-speed, without losing the safety properties (rollback, audit, dual approval) that the human layer enforced. That's the entire raison d'être of Mythal.

5 · Our answer: the closed loop

We model the work as a pipeline. A single vulnerability finding enters at one end, flows through a fixed sequence of stages, and exits as either "closed" (verified fix) or "rolled back / escalated" (humans please look). Every stage produces a structured record. Every stage can fail safely.

  01 · Discovered: Scanner sees the finding
  02 · Enriched: KEV, EPSS, exploit signals
  03 · Prioritized: Blast radius, change risk
  04 · Planned: Concrete fix + rollback
  05 · Approved: Policy gate or human
  06 · Executing: Applied via Ansible / SCCM / Panorama
  07 · Verified: Rescan + health probe
  08 · Closed: Evidence emitted, ledger sealed

  A · OT veto path: Compensating controls instead of patch
  B · Rolled back: Verifier reverted; escalated
  C · Escalated: Beyond agent retry budget; paged human
  + · Inventory sweep: Continuous · independent of CVE flow

Each stage above is implemented as a separate agent. The Supervisor agent owns the state and routes work. The Inventory Insights agent runs independently and surfaces recommendations the moment they exist — not when a CVE happens to expose them.

AI parallel

If you've written a multi-agent app with LangGraph, this is structurally the same: a StateGraph with named nodes, edges that decide based on a typed state object, and a Supervisor that pumps the graph. We chose not to use LangGraph itself because we wanted (1) Postgres as the trace store, not a transient in-process state, and (2) HMAC signing on every transition for audit purposes.
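To make the state machine concrete, here is a minimal sketch of the stage graph in plain Python. The stage names come from the pipeline above; the exact edge set, the `Stage` enum, and the `advance` helper are illustrative assumptions, not the platform's actual code.

```python
from enum import Enum


class Stage(str, Enum):
    DISCOVERED = "discovered"
    ENRICHED = "enriched"
    PRIORITIZED = "prioritized"
    PLANNED = "planned"
    APPROVED = "approved"
    EXECUTING = "executing"
    VERIFIED = "verified"
    CLOSED = "closed"
    OT_VETO = "ot_veto"
    ROLLED_BACK = "rolled_back"
    ESCALATED = "escalated"


# Assumed edges: the happy path runs 01 through 08 in order, and the
# exit paths (A, B, C) hang off the stages where the text places them.
TRANSITIONS = {
    Stage.DISCOVERED: {Stage.ENRICHED, Stage.ESCALATED},
    Stage.ENRICHED: {Stage.PRIORITIZED, Stage.ESCALATED},
    Stage.PRIORITIZED: {Stage.PLANNED, Stage.ESCALATED},
    Stage.PLANNED: {Stage.APPROVED, Stage.OT_VETO, Stage.ESCALATED},
    Stage.APPROVED: {Stage.EXECUTING, Stage.ESCALATED},
    Stage.EXECUTING: {Stage.VERIFIED, Stage.ROLLED_BACK, Stage.ESCALATED},
    Stage.VERIFIED: {Stage.CLOSED},
    Stage.ROLLED_BACK: {Stage.ESCALATED},
    Stage.CLOSED: set(),
    Stage.OT_VETO: set(),
    Stage.ESCALATED: set(),
}


def advance(current: Stage, target: Stage) -> Stage:
    """Return the new stage, or raise if the edge is not in the graph."""
    if target not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target
```

Keeping the edge set in one table is what makes routing testable in isolation: a finding can never jump from Discovered to Closed without passing through every stage in between.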

6 · The 12 agents, one by one

Each agent is a Python module under apps/api/src/sentinelgrid_api/agents/. Each one exports a run(...) function with a typed Pydantic input and output, optionally calls an LLM (Anthropic Claude, OpenAI, or none), and writes one message to the bus and one entry to the reasoning trace per invocation. Below: what each one does, why it's a separate agent, and what would break if it disappeared.

① Supervisor — the orchestrator

Drives the finite-state machine per finding. Receives the initial event, dispatches to specialists in sequence, holds the conversation state, and emits the master trace. In anthropic mode it runs on Claude Opus 4.7, because the orchestration decisions benefit from a larger reasoning model; in deterministic mode it's straight Python control flow.

Why a separate agent? Because routing decisions ("this finding is OT + CCS, so the OT Safety Officer must approve before Remediation Planner can act") are themselves reasoning. Splitting routing from work makes both testable in isolation.

② Scanner Liaison — normalize the firehose

Every scanner has a different schema. Qualys says QID, Tenable says plugin_id, Wiz says issue_id. We translate all of them into one canonical VulnerabilityFinding shape, and we deduplicate across scanners by (asset_id, cve) — the same CVE seen on the same asset by Qualys and Defender is one finding, not two.

Why a separate agent? Because the schema-translation problem is its own discipline. Field mappings change, scanners add new fields, and we want a single module to own that drift. Also: when a scanner feed goes degraded, this agent is where you put the alerting.
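A rough sketch of the normalize-then-deduplicate step, using only the standard library. The native ID field names (QID, plugin_id, issue_id) come from the text above; the `asset_id` and `cve` keys on the raw records, the `sources` field, and both function names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VulnerabilityFinding:
    asset_id: str
    cve: str
    sources: tuple[str, ...]  # which scanners reported this finding


# Scanner-native ID field per vendor, as named in the text.
NATIVE_ID_FIELD = {"qualys": "QID", "tenable": "plugin_id", "wiz": "issue_id"}


def normalize(scanner: str, raw: dict) -> VulnerabilityFinding:
    """Translate one raw scanner record into the canonical shape."""
    if NATIVE_ID_FIELD[scanner] not in raw:
        raise ValueError(f"{scanner} record is missing {NATIVE_ID_FIELD[scanner]}")
    # Assumption: every raw record already carries asset_id and cve keys.
    return VulnerabilityFinding(raw["asset_id"], raw["cve"].upper(), (scanner,))


def deduplicate(findings: list[VulnerabilityFinding]) -> list[VulnerabilityFinding]:
    """Merge findings that share (asset_id, cve), keeping all sources."""
    merged: dict[tuple[str, str], VulnerabilityFinding] = {}
    for finding in findings:
        key = (finding.asset_id, finding.cve)
        if key in merged:
            sources = tuple(sorted(set(merged[key].sources) | set(finding.sources)))
            merged[key] = VulnerabilityFinding(finding.asset_id, finding.cve, sources)
        else:
            merged[key] = finding
    return list(merged.values())
```

Keeping the list of reporting scanners on the merged record preserves the "nobody trusts a single source" signal: a finding seen by two scanners is still one finding, but you can tell it was corroborated.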

③ Threat Intel Aggregator — add context

Raw CVEs have a severity score but no narrative. This agent enriches each finding with: is it on the CISA KEV list, what's its EPSS probability, is there public exploit code, is a known ransomware actor associated, did the vendor's PSIRT flag it as exploited. Reads from NVD, KEV, EPSS, vendor PSIRTs, GitHub Security Advisories, and (for entitled tenants) pre-disclosure feeds like Project Glasswing.

Why a separate agent? Because each of those feeds has its own auth, rate limits, and refresh cadence. Concentrating that operational complexity in one module means the rest of the pipeline can treat enrichment as a single uniform call.

④ Patch Hunter — find the fix

Given a CVE, locate the vendor patch (or community workaround), figure out the version it bumps you to, and assign a PatchReliabilityScore — a 0-to-1 number that blends source authority (vendor > community > ad-hoc), deployment population evidence (has anyone else applied this and survived?), and rollback feasibility. The score is what the policy gate uses to decide auto-apply vs human approval.
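One way to picture the blend is a weighted sum of the three inputs. The sketch below assumes that shape; the weights, the per-source authority values, and the function name are made up for illustration, and the real scoring may combine the inputs differently.

```python
# Assumed authority values; the text only fixes the ordering
# vendor > community > ad-hoc.
SOURCE_AUTHORITY = {"vendor": 1.0, "community": 0.6, "ad_hoc": 0.2}

# Assumed weights for (source authority, deployment evidence, rollback feasibility).
WEIGHTS = (0.5, 0.3, 0.2)


def patch_reliability_score(
    source: str, deployment_evidence: float, rollback_feasibility: float
) -> float:
    """Blend three 0-to-1 signals into one 0-to-1 reliability score."""
    for name, value in (
        ("deployment_evidence", deployment_evidence),
        ("rollback_feasibility", rollback_feasibility),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    w_source, w_deploy, w_rollback = WEIGHTS
    score = (
        w_source * SOURCE_AUTHORITY[source]
        + w_deploy * deployment_evidence
        + w_rollback * rollback_feasibility
    )
    return round(score, 3)
```

With these illustrative weights, a vendor patch with strong deployment evidence clears the 0.85 auto-apply threshold used by the policy gate, while a community workaround with the same evidence does not.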

⑤ Impact Analyst — score blast radius

Joins the vulnerable asset to the CMDB and the dependency graph. Outputs a BusinessImpactProfile: business criticality, network exposure (is this internet-facing, DMZ, or OT-segmented?), how many downstream systems depend on it, whether sensitive data lives on it. This is what turns "CVSS 9.0 on some Windows server" into "CVSS 9.0 on the payroll database that 4,000 employees hit every day."

⑥ Change Risk — predict the change blowing up

Patches sometimes break things. This agent looks at historical change-failure rates for the asset class and vendor, factors in whether a canary peer is available, and recommends a deployment window. Output: a risk score and a window suggestion. This is what differentiates "auto-apply tonight" from "queue for the Saturday 02:00 wave."

⑦ ★ OT Safety Officer — the veto

The single most important agent in the platform. If the affected asset is in an OT zone or carries the Critical Cyber System flag, this agent reviews the proposed remediation and almost always vetoes direct patching. Instead, it proposes compensating controls (tightening the industrial firewall ACL so only authorized engineering workstations can reach the device, deploying an IPS signature that virtually patches the vulnerability at the network layer, increasing monitoring sensitivity) and schedules the actual firmware update for the next planned maintenance window, with a dual-approval requirement.

Why a separate agent? Because OT operators (the humans who own the physical safety of the railroad) will not approve any tool whose default behavior can touch OT. Having a named agent with explicit veto rights and a documented prompt is the safety model they need to see before they'll sign the contract. Without this agent, the platform does not get adopted in rail, pipeline, power, or water — regardless of how good the rest of it is.

Sales reality

If a CISO at a Class I railroad asks one question about the platform, it will be about OT safety. The answer is this agent. Memorize the veto path: review → propose ACL tightening + IPS signature + monitored isolation → schedule firmware update in next window → require dual approval (security + OT ops). That sequence is the deal-closer.

⑧ Remediation Planner — write the runbook

Synthesizes everything upstream into a concrete plan: exact steps, exact systems, exact order, exact approvals required, exact rollback procedure, exact verification checks. Produces both a human-readable runbook (so a security analyst can read it and trust it) and a machine-executable workflow (so the Executor agent can run it).

⑨ Executor — actually apply the fix

Drives the integration tools that change the world. For Windows: Microsoft SCCM/Intune or Tanium. For Linux: Ansible or Puppet or Chef. For network gear: Cisco Catalyst Center or Palo Alto Panorama. For cloud: AWS Systems Manager or Azure Arc. For OT: vendor-specific tooling like Tenable OT Security or Claroty SRA. Refuses to act without an approved plan and a policy-authorized scope — that refusal is the safety net.

⑩ Verifier — close the loop

After the Executor runs, this agent rescans the asset to confirm the CVE no longer reports, runs a health probe to confirm the service didn't break, and where applicable re-runs an exploit-safety check (does the previously vulnerable endpoint now reject the exploit payload?). If any check fails, it triggers rollback and escalates to the Supervisor.
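The check-then-decide logic can be sketched as follows. The three checks are passed in as callables so the sketch stays independent of any real scanner or probe; the `VerificationResult` shape and the action strings are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class VerificationResult:
    passed: bool
    failed_checks: list[str]
    action: str  # "close" or "rollback_and_escalate" (assumed labels)


def verify(
    rescan_clean: Callable[[], bool],
    health_ok: Callable[[], bool],
    exploit_rejected: Optional[Callable[[], bool]] = None,
) -> VerificationResult:
    """Run every applicable check; any failure means rollback and escalate."""
    checks = {"rescan": rescan_clean, "health_probe": health_ok}
    if exploit_rejected is not None:  # only "where applicable"
        checks["exploit_safety"] = exploit_rejected
    # Run all checks rather than stopping at the first failure, so the
    # trace records everything that went wrong.
    failed = [name for name, check in checks.items() if not check()]
    action = "rollback_and_escalate" if failed else "close"
    return VerificationResult(passed=not failed, failed_checks=failed, action=action)
```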

⑪ Compliance Reporter — produce evidence

After every closed plan, emits one or more evidence units tagged to control IDs in TSA SD 1580, NIST CSF 2.0, NIST 800-82r3, IEC 62443, and the cross-vertical frameworks. Each evidence unit is a structured record + a snippet of the reasoning trace. The agent also generates auditor-ready PDF packages on demand.

⑫ ★ Inventory Insights — the proactive agent

Everything above is reactive: a CVE drops, the pipeline fires. The Inventory Insights agent is the opposite: it sweeps the company's full estate on its own cadence and produces recommendations even when no CVE has been published yet. Typical findings include end-of-life software, version sprawl, OT firmware bands below recommended minimums, Critical Cyber Systems with no documented owner, service accounts on CCS systems without strong factor authentication, and vendors that show up only once or twice (probable shadow IT).

Why this matters competitively: almost every other vulnerability platform stops at scanner findings. CISOs of critical-infrastructure operators desperately need an answer to "what do I actually own and where am I exposed before the next CVE drops?" This agent is that answer.

7 · The glue — what holds the agents together

The typed message bus

Every inter-agent communication is an AgentMessage object: stable ULID, trace ID grouping all messages for one finding, source agent, destination agent, intent string, structured payload, policy context, timestamp, and HMAC-SHA256 signature over the canonical JSON. The bus is just a Postgres table plus a Redis pub/sub channel.

AI parallel

This is the equivalent of a Kafka topic for agent traffic, except we picked Postgres because the audit guarantee ("every message persists before any side effect runs") was more important than throughput. At sub-10k findings/hour you don't need Kafka. The signature is exactly the same idea as JWS — detect tampering if anyone messes with the audit log.
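Signing over canonical JSON can be sketched with the standard library alone. The canonicalization used here (sorted keys, compact separators, UTF-8) is an assumption; the text does not say which canonical form the platform uses, and the function names are illustrative.

```python
import hashlib
import hmac
import json


def canonical_json(message: dict) -> bytes:
    """Serialize deterministically so equal messages yield equal bytes."""
    return json.dumps(
        message, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def sign(message: dict, key: bytes) -> str:
    """HMAC-SHA256 over the canonical form, hex-encoded."""
    return hmac.new(key, canonical_json(message), hashlib.sha256).hexdigest()


def verify_signature(message: dict, signature: str, key: bytes) -> bool:
    """Constant-time comparison to avoid leaking signature bytes via timing."""
    return hmac.compare_digest(sign(message, key), signature)
```

The canonical step matters because two dicts with the same content but different key order must produce the same signature; otherwise a harmless re-serialization would look like tampering.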

The reasoning trace ledger

Parallel to the message bus, every agent writes a human-readable narrative entry to the reasoning_traces table: "Threat Intel: Enriched CVE-2026-19110. KEV=true, EPSS=0.81, exploit in the wild. Flagging for high-priority routing." The console reads this directly; the Compliance Reporter quotes excerpts in the auditor PDFs. The reasoning trace is the product — auditors and investors both read it. It's the part that distinguishes "platform" from "black-box automation."

The policy gate

Between every agent decision and any side effect sits a small rules engine. Seven default rules:

  1. Critical Cyber Systems require dual approval + open maintenance window + validated rollback.
  2. Any OT-zone asset requires OT Safety Officer sign-off plus dual approval.
  3. IT auto-apply is allowed only when: criticality ≤ Medium AND patch reliability ≥ 0.85 AND a canary peer exists AND the window is open AND rollback is valid.
  4. Default IT remediation path requires single security approval.
  5. Agents may not exfiltrate findings to non-allowlisted tools.
  6. No approval is granted without a validated rollback plan.
  7. Change blackout windows are honored — no state-changing action during a blackout.

AI parallel

This is the same role Open Policy Agent (Rego) plays in modern Kubernetes admission control: a deterministic guard between intent and effect. The LLM agents propose; the policy gate disposes. Mixing the two responsibilities into the LLM itself is the architectural mistake every first attempt at "autonomous remediation" makes.
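A deterministic gate over the rules above might look like the sketch below. Only SG-POL-006 is a rule ID that appears in this document; the other IDs are extrapolated from the rule numbering, and the `ChangeRequest` fields, decision labels, and rule ordering are assumptions. Rule 5 (the exfiltration allowlist) is left out because it governs tool calls rather than change requests.

```python
from dataclasses import dataclass

CRITICALITY = {"low": 0, "medium": 1, "high": 2, "critical": 3}


@dataclass
class ChangeRequest:
    zone: str                 # "it" or "ot"
    is_ccs: bool              # Critical Cyber System flag
    criticality: str          # key of CRITICALITY
    patch_reliability: float  # 0-to-1 PatchReliabilityScore
    canary_peer: bool
    window_open: bool
    rollback_valid: bool
    in_blackout: bool = False
    approvals: int = 0
    ot_safety_signoff: bool = False


def evaluate(req: ChangeRequest) -> tuple[str, str]:
    """Return (decision, rule_id). Hard denials are checked first."""
    if req.in_blackout:
        return ("deny", "SG-POL-007")
    if not req.rollback_valid:
        return ("deny", "SG-POL-006")
    if req.is_ccs:
        # A CCS sits inside OT, so the OT sign-off from rule 2 is required too.
        ok = req.approvals >= 2 and req.window_open and req.ot_safety_signoff
        return ("allow" if ok else "needs_approval", "SG-POL-001")
    if req.zone == "ot":
        ok = req.ot_safety_signoff and req.approvals >= 2
        return ("allow" if ok else "needs_approval", "SG-POL-002")
    if (
        CRITICALITY[req.criticality] <= CRITICALITY["medium"]
        and req.patch_reliability >= 0.85
        and req.canary_peer
        and req.window_open
    ):
        return ("auto_apply", "SG-POL-003")
    return ("allow" if req.approvals >= 1 else "needs_approval", "SG-POL-004")
```

Nothing in this function calls a model. That is the point of the gate: the agents can propose anything, but the path from proposal to side effect runs through plain, reviewable conditionals.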

Three reasoning backends

An environment variable AGENT_MODE selects what runs the agents' reasoning: Anthropic Claude, OpenAI, or a deterministic path that calls no model at all.

If a model call fails for any reason (auth error, rate limit, network), the agent silently falls back to its deterministic path. The pipeline does not crash. This is important: enterprise buyers cannot tolerate a platform whose availability depends on someone else's API key.
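The fallback behavior can be sketched as a small wrapper. AGENT_MODE and the "anthropic" and "deterministic" mode names appear in this document; the function name, its signature, and the choice to return which backend actually ran are assumptions.

```python
import os
from typing import Callable, Optional


def run_with_fallback(
    model_call: Callable[[], str],
    deterministic: Callable[[], str],
    mode: Optional[str] = None,
) -> tuple[str, str]:
    """Return (result, backend_used); never let a model failure propagate."""
    mode = mode or os.environ.get("AGENT_MODE", "deterministic")
    if mode == "deterministic":
        return deterministic(), "deterministic"
    try:
        return model_call(), mode
    except Exception:
        # Auth error, rate limit, network failure: all take the same path.
        return deterministic(), "deterministic"
```

Returning the backend name alongside the result is a design choice for this sketch: the fallback is silent to the pipeline, but the reasoning trace can still record which path produced the decision.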

8 · Why this is enterprise-grade (and not a toy)

Four properties make this credible inside a Fortune 500 operator. Memorize them — they're what a CISO will probe in the technical demo.

Auditable end-to-end

Every state transition emits a structured event. Every event is signed. Every closed remediation generates evidence tagged to compliance control IDs. The console's Agent Activity page is just a real-time view onto the same data the auditor PDF is generated from. The system can't lie because it can't avoid writing.

Reversible by design

No remediation is approved without a tested rollback plan. The Verifier agent rolls back automatically on failure. Even on auto-apply, every step records what would be needed to undo it. If the rollback plan doesn't exist or fails dry-run validation, the policy gate denies the change. That clause is rule SG-POL-006 and it's the reason an OT operator will trust the platform.

Human-in-the-loop where it counts

Auto-apply is restricted to IT assets at low/medium criticality with high-reliability patches and an open window. Everything else requires a human signature. CCS changes require two human signatures (Security + OT Operations). The platform's job isn't to remove humans; it's to focus them on the decisions only they can make.

Adversary-aware prompts

Every external string — advisory body text, scanner output, ticket comments — is wrapped in <untrusted_external> tags inside agent prompts. A pre-flight classifier flags suspicious patterns ("ignore previous instructions", smuggled tool-call syntax, embedded role tags). Agents are explicitly told to treat content inside those tags as data only, never as instructions. This matters because attackers who realize their CVE description is being read by an AI will try to inject prompt payloads — that's already happening in 2026.
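A minimal sketch of the wrapping and pre-flight steps, assuming a simple regex-based check. The `<untrusted_external>` tag comes from the text; the pattern list, the escaping scheme, and both function names are illustrative, and a production classifier would need far more than three patterns.

```python
import re

# Illustrative patterns only: an instruction override, embedded role tags,
# and attempts to open or close the wrapper tag itself.
SUSPICIOUS = [
    re.compile(pattern, re.IGNORECASE)
    for pattern in (
        r"ignore (all )?previous instructions",
        r"</?\s*(system|assistant|user)\s*>",
        r"</?\s*untrusted_external\s*>",
    )
]

WRAPPER_TAG = re.compile(r"<(/?)\s*untrusted_external\s*>", re.IGNORECASE)


def flag_suspicious(text: str) -> list[str]:
    """Return the patterns that matched; an empty list means nothing flagged."""
    return [p.pattern for p in SUSPICIOUS if p.search(text)]


def wrap_untrusted(text: str) -> str:
    """Wrap external text, escaping any wrapper tag smuggled inside it."""
    safe = WRAPPER_TAG.sub(r"&lt;\g<1>untrusted_external&gt;", text)
    return f"<untrusted_external>\n{safe}\n</untrusted_external>"
```

Escaping the wrapper tag before wrapping closes the obvious escape route: an advisory body that contains its own closing tag cannot end the data region early and have the rest of its text read as instructions.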

9 · How to talk about it in the room

Different audiences want different framings. Here are the five you'll see most often, and the angle that lands with each.

Talking to a CISO

Lead with the asymmetry. "Vulnerability discovery has gone machine-speed. Exploitation has gone machine-speed. Your remediation team hasn't. That gap is the largest unhedged risk on your balance sheet, and Mythal closes it without asking your OT team to give up the safety model they've spent twenty years building." Then show Scenario C (the OT Safety Officer veto).

Talking to a VP of Infrastructure Security

Lead with MTR. "Today your team takes 22 days to close an IT finding. With Mythal auto-applying the easy ones and queueing the rest with full context, you'll close them in 5. You'll free two to three FTEs to work on detection instead of patch tickets." Show the Plans Kanban with auto-apply happening live.

Talking to a Head of OT / Plant Operations

Lead with the veto. "Read this agent's system prompt. It defaults to no. The only way to make a state change on an OT zone or a Critical Cyber System is dual approval, an open maintenance window, a validated rollback, and an OT Safety Officer sign-off: all four. We enforce that with code, not with policy documents." Walk through the OT Operations page. Show the compensating-control records.

Talking to a CFO

Lead with insurance + audit cost. "Your cyber-insurance renewal next year depends on attestable controls and demonstrable MTR. We produce both as a byproduct. Independent analysis suggests $1.2–2.4M of premium relief at typical Class I scale. Plus you're not staffing fifteen analysts forever." Show the Compliance Reporter PDF export and the executive deck.

Talking to a venture partner

Lead with the thesis. "Mythos collapsed the cost of vulnerability discovery. Every incumbent in this space was built before that happened. We're not faster than Qualys, we're built on a different physical assumption: the patch firehose is the default state, not the exception. The OT Safety Officer agent is the moat in critical infrastructure. Inventory Insights is the moat against pure scanner companies." Show the agent architecture page and Scenario B (Mythos drop).

10 · Where to go next

Now that the concepts are loaded, walk the rest of the deliverables in this order:

  1. The 12 agents — same content as Section 6 above, but formatted as a reference grid you can scan in the room.
  2. Architecture — the topology, the policy gate rules, the data flow, the tech stack.
  3. Inventory Insights — deeper on the 12th agent because it's the most likely "I've never seen this anywhere else" reaction.
  4. Compliance — the control mappings, framework by framework.
  5. Client pitch — the polished CISO leave-behind. Read this last, because everything in it makes more sense after the primer.
  6. Demo script — the in-room runbook with timed beats.
  7. Glossary — keep this open in a tab during prep; every term you'll hear is here.

Open the live console at localhost:3090 alongside these docs. Click Run on Scenario A (Patch Tuesday) and watch the Agent Activity timeline fill in real time. That single experience is what made everything in this document click for me when I first wired it up — and it'll do the same for you.