The Attacker Has Left the Building
SIGNIFICANCE STATEMENT
Much of the security field we have built since the 1970s rests on a hidden assumption: the adversary is human. Human attackers are slow, expensive, fatigable, and bounded by cognition. Agentic AI systems are not bounded in the same way. We argue that this assumption is not a detail — it is a load-bearing wall of the discipline. Its removal does not merely call for new patches; it calls for a new field. We call this field Post-AI Security. The analogy is deliberate: just as the theoretical possibility of quantum computation pushed cryptographers toward new primitives before any cryptographically relevant quantum computer existed, the empirical reality of agentic AI should push security researchers toward new threat models before the damage becomes irreversible.
ABSTRACT
Cybersecurity is built around an implicit assumption: that the adversary is a human being, subject to the cognitive, temporal, and economic constraints of human agency. This assumption is embedded in how the field models threats, designs defenses, measures risk, trains practitioners, and writes policy. We argue that the emergence of capable agentic AI systems has made that assumption insufficient, not gradually but structurally. An AI agent conducting a security assessment does not sleep, tire, or lose focus in the human sense, and it does not naturally stop where a human operator would consider the task complete. It proceeds to the next step, and the next, at machine speed and with far weaker forms of friction than those that constrain human attackers. The security community has responded with important but mostly incremental adaptations: new threat categories, updated frameworks, and patched taxonomies. We argue that this is not enough. The problem is not only that our defenses need updating; it is that the threat model underneath them is increasingly wrong. We propose the term Post-AI Security (PAIS) to name the field that must be built in response, define its core premises, identify the research questions it must answer, and call on the community to treat it with the same anticipatory seriousness that post-quantum cryptography brought to its own paradigm shift — before the damage, not after.
I. Introduction
In 1994, Peter Shor demonstrated that a sufficiently powerful quantum computer could factor large integers in polynomial time, undermining the security assumptions behind RSA and elliptic-curve cryptography [1]. No quantum computer capable of executing Shor’s algorithm at practical scale existed at the time, and none is publicly known to exist today at the scale needed to break RSA-2048 in practice. Yet the cryptographic community treated the result with long-horizon strategic urgency. Over the decades that followed, researchers developed post-quantum alternatives, and NIST’s formal post-quantum cryptography standardization process — launched in 2016 — produced its first finalized standards in 2024 [2]. The rationale was explicit: harvest now, decrypt later. If adversaries were already collecting encrypted traffic to decrypt when capable hardware arrived, waiting was irrational.
Today in cybersecurity, we face something different. We do not merely have a theoretical possibility. We have operational deployments.
In mid-September 2025, Anthropic detected a sophisticated intrusion campaign attributed with high confidence to a Chinese state-sponsored group, designated GTG-1002. The attackers built an attack framework around Claude Code and delegated most tactical execution to it, reducing human involvement to a small number of critical decision points such as target selection, progression between phases, and exfiltration approval. Anthropic reports that the AI executed 80 to 90 percent of tactical operations independently, at request rates impossible for human operators to match, across roughly thirty global targets including technology firms, financial institutions, chemical manufacturers, and government agencies [3]. Anthropic described the incident as the first documented case of a cyberattack largely executed without human intervention at scale [3].
Seven months later, in April 2026, Anthropic announced Project Glasswing and released Claude Mythos Preview in a limited research preview, not for general access, citing cybersecurity misuse concerns [4]. In testing over the previous month, Anthropic says Mythos identified thousands of zero-day vulnerabilities across every major operating system and every major web browser when directed by a user to do so [4]. The examples Anthropic disclosed include a now-patched 27-year-old OpenBSD bug, a 16-year-old out-of-bounds write in FFmpeg that had survived millions of automated tests, and multi-step privilege-escalation chains across Linux and other operating systems [4]. Anthropic further stated that it did not explicitly train Mythos to have these offensive cybersecurity capabilities; they emerged as a downstream consequence of general improvements in reasoning, coding, and autonomy [4].
This is not an Anthropic-only story. In February 2026, OpenAI released GPT-5.3-Codex, the first model it said it was treating as High capability in cybersecurity under its Preparedness Framework, and the first it said it had directly trained to identify software vulnerabilities [6]. CrowdStrike’s 2026 Global Threat Report reported an 89 percent year-over-year increase in operations by AI-enabled adversaries [7]. And researchers at AISLE tested the specific showcase vulnerabilities Anthropic discussed and found that much of the core analysis could be recovered even by smaller, cheaper models, including one with 3.6 billion active parameters, in controlled settings where the relevant code paths had already been isolated [8]. AISLE explicitly noted that this does not show end-to-end autonomous discovery and weaponization by open models, but it does show that parts of the capability are already more diffuse than a single frontier release might suggest [8].
These are not isolated curiosities. They are data points on a curve whose slope the security community has not yet taken seriously enough. The tools exist. The capability exists. What does not yet exist is a theoretical framework adequate to this threat. This paper argues that building one is urgent, and that the urgency has a name: Post-AI Security.
II. The Hidden Assumption
Modern cybersecurity was formalized in an era in which adversaries were, in practice, assumed to be human. Foundational threat-modeling traditions — from early penetration-testing methodology to STRIDE [9] and ATT&CK [10] — are built around attackers who choose, prioritize, sequence actions, and operate under constraints of time, skill, attention, and resources. Risk quantification and defense design often assume that raising cost, complexity, uncertainty, or time-to-success will deter, delay, or redirect the attacker.
We define an agentic AI system as a system that combines a large language model with autonomous planning, persistent state or memory, and tool-use capabilities to pursue goals across multi-step operations with minimal human intervention [11]. This definition is operational rather than metaphysical. It is not about sentience. It is about a specific combination of properties — goal persistence, tool augmentation, contextual reasoning, and autonomous sequential decision-making — that weakens the traditional human-attacker assumption. A script is automated but not agentic. A worm is self-propagating but not reasoning. An agentic system can read context, replan on failure, identify novel paths, and chain steps together in ways that no fixed script anticipates.
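The distinction drawn above can be made concrete with a toy sketch. Nothing here uses a real model or tool API; every name is illustrative. The point is structural: a fixed script encodes its sequence in advance, while an agentic loop chooses each step from accumulated context and can replan on failure.

```python
# Illustrative sketch only: hypothetical names, no real model or tool APIs.
# Contrasts a fixed script (automated, not agentic) with a minimal agentic
# loop exhibiting goal persistence, memory, and step-by-step replanning.

from dataclasses import dataclass, field


@dataclass
class AgentState:
    """Persistent state: the memory component of the definition."""
    goal: str
    observations: list = field(default_factory=list)
    steps_taken: int = 0


def fixed_script(target: str) -> list:
    """Automated but not agentic: the sequence never changes."""
    return [f"scan {target}", f"report {target}"]


def agentic_loop(state: AgentState, plan_fn, act_fn, max_steps: int = 10) -> AgentState:
    """Agentic: each step is chosen from context; failures feed back into planning."""
    while state.steps_taken < max_steps:
        action = plan_fn(state)          # contextual reasoning over goal + history
        if action is None:               # planner judges the goal satisfied
            break
        result = act_fn(action)          # tool use
        state.observations.append((action, result))  # memory accumulates
        state.steps_taken += 1
    return state
```

The design point is the feedback edge: `plan_fn` sees every prior observation, so no fixed script can anticipate the sequence it will produce.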
Much classical threat-model reasoning still applies to such systems, but it no longer suffices. An agentic AI does not experience fatigue, boredom, or frustration as humans do. It is not deterred by many of the forms of friction that normally shape human attacker behavior. It remains bounded by compute, tooling, safeguards, access, and environment — but not by the ordinary cognitive and physical limits of human operators. That difference matters.
The empirical literature already pointed in this direction before GTG-1002 and Mythos. Fang et al. showed that GPT-4-based agents could autonomously exploit 87 percent of a benchmark of real-world one-day vulnerabilities, while the other tested models and standard open-source scanners failed on all of them [12]. In their cost model, the GPT-4 agent was already about 2.8 times cheaper than comparable human labor [12]. Happe et al. found that GPT-4-Turbo achieved 33 to 83 percent success on autonomous Linux privilege-escalation tasks, compared with a 75 percent baseline for professional penetration testers in their benchmark setting [13]. These results did not prove that the field had changed. But they made the trajectory visible.
The security community has responded with useful additions to existing frameworks. MITRE ATLAS extends ATT&CK-style thinking to adversarial threats against AI-enabled systems [14]. The Cloud Security Alliance has proposed MAESTRO as a threat-modeling framework specifically for agentic AI [15]. These are valuable contributions. But they still mostly treat the AI attacker as a new variant inside an inherited taxonomy, rather than asking whether the taxonomy itself is now incomplete.
III. What Changes When the Attacker Is Not Human
The differences between a human attacker and an agentic AI attacker are not merely differences of degree. At several layers of the stack, they begin to look like differences of kind.
Scale. A human operator attacks one target, or a small number of targets, with meaningful switching cost. GTG-1002 operated across roughly thirty global targets while keeping human intervention to a few critical decisions [3]. The economics change: as execution becomes automatable, the marginal cost of additional targets falls sharply.
Speed. Human attackers are bounded by human cognition and manual workflow. Agentic systems operate at machine speed. Anthropic’s case studies and evaluations over the last year show exploit construction and vulnerability research improving on timelines measured in months, not in slow generational cycles [4, 16]. Detection and response windows compress accordingly.
Reasoning continuity. Perhaps most importantly, an agentic AI does not naturally stop at the boundary of a narrowly framed task. Mythos, when directed toward offensive security tasks, chained multiple vulnerabilities into higher-order exploit paths rather than treating each flaw as an isolated finding [4]. This is what we call the next-step problem: not a bug, but an emergent property of goal-directed reasoning in security-relevant environments. Anthropic’s own language on dual-use risk is direct: the same capabilities that help defenders find and fix vulnerabilities could help attackers exploit them [5].
Skill democratization. The human-attacker model assumes a distribution of skill: advanced attacks require advanced operators, and advanced operators are scarce and expensive. That barrier is now lower. Fang et al. showed autonomous exploitation of real-world one-day vulnerabilities [12]. Happe et al. showed model performance that, in some benchmark settings, approached or matched professional baseline performance [13]. Anthropic warned that less experienced and less resourced groups may now be able to perform attacks that previously required teams of skilled operators [3]. The skill bottleneck has not disappeared, but it has weakened.
IV. The Deskilling Trap on the Other Side
The threat may be compounded by a simultaneous degradation on the defensive side. As agentic AI lowers skill barriers for attackers, it may also erode defenders' skills through cognitive offloading.
Cognitive offloading is a well-established phenomenon: humans routinely externalize memory, planning, and judgment to tools and environments [18]. More recent work has raised concerns that generative AI can harm learning and reduce the development of independent competence in some settings [17]. In security operations, the concern is structural. Analysts who rely on AI-assisted triage, AI-generated detections, AI summaries, and AI-proposed remediations may gradually shift from exercising judgment to auditing outputs they did not produce and may not fully understand.
This argument should be stated carefully. We do not yet have longitudinal evidence showing defender deskilling in SOCs or red teams caused by AI-assisted workflows. That is precisely the point: the mechanism is plausible, the incentive is obvious, and the measurement has barely begun. The research gap is real.
The asymmetry is also troubling. The attacker’s AI learns against live systems, live defenses, and real feedback. The defender’s AI is often optimized on historical data, known incidents, and already-labeled patterns. The attacker is rewarded for novelty; the defender is rewarded for fitting to the past. If that asymmetry holds, PAIS must study it directly.
V. Why Post-Quantum Cryptography Is the Right Analogy — and Where It Breaks
The post-quantum analogy is useful, but it is not perfect.
It holds on the central point. The cryptographic community did not respond to Shor by simply adding new categories to existing public-key schemes. It developed new primitives. NIST’s PQC process eventually standardized algorithms based on very different mathematical assumptions, including lattice-based and hash-based constructions [2]. The response was foundational, not incremental. We argue that the security community now faces an analogous need for foundational change in its threat modeling.
The analogy breaks on precision. Shor gave cryptographers a mathematical result with a formal target: specific hardness assumptions were broken under a specified computational model. The AI threat is different. It is empirical rather than formal. We have operational evidence that agentic systems weaken core assumptions of human-centered security reasoning, but we do not yet have a complete formalism for where, how, and under what boundary conditions existing frameworks fail. Post-quantum cryptography could lean on clearly defined mathematical structures. Post-AI Security does not yet have equivalents. Building them is the hard part.
The analogy holds again on urgency. In one respect, the present situation is worse than Shor’s. The threat has not merely been shown to be theoretically possible. It has already appeared in operational form. Anthropic’s threat-intelligence team wrote, after GTG-1002, that the cybersecurity community needs to assume a fundamental change has occurred [3]. If that judgment is right, then waiting for cleaner theory before acting is a luxury the field may not have.
VI. Defining Post-AI Security
THE FOUR PREMISES OF POST-AI SECURITY (PAIS)
P1 — The threat model is not human. Risk models, defense architectures, and detection systems must be redesigned around an adversary that does not share ordinary human limits of fatigue, attention, skill acquisition, or decision pacing.
P2 — The defense surface is not static. Agentic attackers can explore attack surfaces at machine speed and discover paths that no human analyst would realistically enumerate in the same time window. Defenders must assume continuous, adaptive, autonomous probing.
P3 — Skill is no longer a strong limiting factor. Security architectures that depend on the rarity of sophisticated attacker capability are increasingly brittle. Access to a capable model can substitute, at least partially, for expertise once considered scarce.
P4 — Reasoning continuity is an attack vector. Goal-directed, next-step-proposing behavior creates a distinct class of attack capability: autonomous construction of exploitation chains that emerge from reasoning over context, rather than from replaying predefined scripts.
These premises still need formalization. What we offer here is direction, not completion.
VII. The Research Agenda
PAIS is not a framework. It is a field. And like post-quantum cryptography, it requires foundational investment rather than a handful of tactical fixes. At minimum, five directions seem necessary.
Formal threat models for non-human adversaries. Existing attacker models are often built around human rationality, bounded exploration, cost sensitivity, and limited concurrency. PAIS needs models that represent goal persistence, tool-augmented reasoning, adaptive replanning, and parallelized exploration.
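As a toy illustration of what such a parametrization might look like, the sketch below exposes the axes just named as explicit model parameters. This is not a proposed standard; every field name and value is an assumption chosen for exposition.

```python
# Illustrative only: one possible parametrization of an adversary model,
# with toy values. Field names and numbers are assumptions, not measurements.

from dataclasses import dataclass


@dataclass(frozen=True)
class AdversaryModel:
    """Axes along which human and agentic adversaries differ."""
    concurrency: int          # targets pursued in parallel
    actions_per_hour: float   # sustained action rate
    cost_per_action: float    # marginal cost per action, arbitrary units
    replans_on_failure: bool  # adaptive replanning vs. fixed playbook
    fatigues: bool            # whether sustained operation degrades

    def campaign_cost(self, actions: int) -> float:
        """Marginal cost of a campaign under this model."""
        return actions * self.cost_per_action


# Classical human-operator assumptions vs. an agentic profile (toy values).
HUMAN = AdversaryModel(concurrency=1, actions_per_hour=30.0,
                       cost_per_action=1.0, replans_on_failure=True,
                       fatigues=True)
AGENTIC = AdversaryModel(concurrency=30, actions_per_hour=3000.0,
                         cost_per_action=0.01, replans_on_failure=True,
                         fatigues=False)
```

Even this crude form makes the structural claim testable: defenses justified by high `cost_per_action` or low `concurrency` can be re-evaluated under the agentic profile rather than the human one.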
Metrics for autonomous attack capability. We still lack a generally accepted way to measure autonomous offensive capability. Benchmarks such as AgentHarm [19], exploit-oriented evaluations in the academic literature [12, 13], and internal cyber-range evaluations reported by frontier labs [4, 16] are important early steps. But the field still lacks a standard that is reproducible, adversarially validated, and tied to formal capability categories.
Detection of AI-generated attack patterns. Human attackers leave human traces: pacing, tool preferences, error signatures, workflow rhythms, and cognitive discontinuities. Agentic AI attackers leave different traces. GTG-1002, for example, generated request rates that Anthropic described as impossible for human operators to match [3]. Defenders need detection methods trained explicitly on AI-generated offensive behavior.
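As a toy example of the kind of trace that differs, consider request pacing. The heuristic below is illustrative only, with made-up thresholds; a real detector would be trained and validated on labeled AI-driven traffic. It flags sessions whose inter-arrival gaps are both faster and more regular than human operators typically sustain.

```python
# Illustrative heuristic, not a validated detector. Thresholds are
# hypothetical assumptions; real systems would learn them from data.

import statistics


def flag_machine_paced(timestamps: list,
                       min_gap_s: float = 0.5,
                       max_cv: float = 0.3) -> bool:
    """Flag a session whose request pacing looks non-human.

    Two toy signals: a median inter-arrival gap below what a human
    operator sustains, and a low coefficient of variation (machine
    pacing is far more regular than human pacing).
    """
    if len(timestamps) < 3:
        return False  # too few events to judge pacing
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    median_gap = statistics.median(gaps)
    mean_gap = statistics.mean(gaps)
    cv = statistics.stdev(gaps) / mean_gap if mean_gap > 0 else 0.0
    return median_gap < min_gap_s and cv < max_cv
```

A serious version would need to resist an obvious countermeasure: an agent instructed to inject human-like jitter, which is why learned detectors rather than fixed thresholds are the research direction.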
Defense architectures for asymmetric speed. When attacker speed greatly exceeds human response speed, architectures that rely on a human noticing, interpreting, and reacting in time become fragile. Defenders need systems that can operate at machine speed without surrendering meaningful human control over consequential decisions.
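One way to sketch that design principle is a dispatch layer that executes reversible containment at machine speed while queuing consequential, hard-to-reverse actions for human approval. The action names and policy table below are hypothetical; this is a sketch of the principle, not a reference architecture.

```python
# Illustrative sketch with hypothetical action names and policy.
# Principle: machine speed for reversible containment, a human gate
# for consequential, hard-to-reverse decisions.

from enum import Enum


class Consequence(Enum):
    REVERSIBLE = 1      # e.g., rate-limit a session, isolate one host
    CONSEQUENTIAL = 2   # e.g., revoke credentials org-wide


# Hypothetical policy table: which responses may run unattended.
POLICY = {
    "rate_limit_session": Consequence.REVERSIBLE,
    "isolate_host": Consequence.REVERSIBLE,
    "revoke_all_tokens": Consequence.CONSEQUENTIAL,
}


def route_response(action: str, auto_execute, queue_for_human):
    """Dispatch at machine speed when reversible; otherwise hold for a human."""
    level = POLICY.get(action, Consequence.CONSEQUENTIAL)  # default to caution
    if level is Consequence.REVERSIBLE:
        return auto_execute(action)
    return queue_for_human(action)
```

The deliberate choice is the default: an unrecognized action falls to the human queue, so the automated path can only be as broad as the explicitly reviewed policy.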
The reasoning containment problem. Telling a model not to cross a boundary is not the same as containing it. The problem of reliably preventing goal-directed reasoning from moving from vulnerability analysis to exploit construction, under prompt injection, jailbreaking, tool misuse, or scaffold-level manipulation, remains unsolved. PAIS should treat this as a central theoretical and engineering problem.
VIII. The Urgency Argument
There is a version of this argument that says: wait for more evidence.
We already have evidence. GTG-1002 executed autonomous multi-target cyber operations in September 2025 [3]. Mythos Preview identified thousands of zero-days in Anthropic’s April 2026 testing and was therefore withheld from general release [4]. OpenAI treated GPT-5.3-Codex as High capability in cybersecurity on a precautionary basis [6]. CrowdStrike reported an 89 percent year-over-year increase in operations by AI-enabled adversaries [7]. Anthropic publicly stated that the cybersecurity community should assume a fundamental change has occurred [3]. If the laboratories building these systems are telling the field that the old model is no longer sufficient, the field should listen.
The cost of waiting is asymmetric. If we invest in PAIS and the threat diffuses more slowly than expected, we still strengthen the discipline. If we do not invest in PAIS and the threat diffuses on the timelines these reports suggest — measured in months, not decades — then we will discover that the assumptions underneath the field have already broken before a replacement theory is ready.
Post-AI Security is not a field we should build against a hypothetical future threat. It is a field whose absence is already becoming a liability. The question is not whether to build it. The question is whether we will name it, fund it, and pursue it with the seriousness the evidence now demands.
REFERENCES
[1] Shor PW (1994). Algorithms for quantum computation: discrete logarithms and factoring. Proceedings of the 35th Annual Symposium on Foundations of Computer Science, 124–134. IEEE.
[2] NIST (2024). Post-Quantum Cryptography Standardization: FIPS 203, 204, and 205. Computer Security Resource Center.
[3] Anthropic (2025). Disrupting the first reported AI-orchestrated cyber espionage campaign. Anthropic Technical Report, November 2025.
[4] Anthropic (2026). Assessing Claude Mythos Preview’s cybersecurity capabilities. Anthropic Frontier Red Team Blog, April 2026.
[5] Anthropic (2026). Making frontier cybersecurity capabilities available to defenders. Anthropic Blog, February 2026.
[6] OpenAI (2026). GPT-5.3-Codex System Card. February 2026.
[7] CrowdStrike (2026). 2026 Global Threat Report. CrowdStrike.
[8] AISLE (2026). AI Cybersecurity After Mythos: The Jagged Frontier. April 2026.
[9] Shostack A (2014). Threat Modeling: Designing for Security. Wiley.
[10] MITRE Corporation (2026). MITRE ATT&CK: Adversarial Tactics, Techniques, and Common Knowledge.
[11] Wang L, Ma C, Feng X, et al. (2024). A survey on large language model based autonomous agents. Frontiers of Computer Science 18(6):186345.
[12] Fang R, Bindu R, Gupta A, Kang D (2024). LLM Agents can Autonomously Exploit One-day Vulnerabilities. arXiv:2404.08144.
[13] Happe A, Kaplan A, Cito J (2025). LLMs as Hackers: Autonomous Linux Privilege Escalation Attacks. Empirical Software Engineering. Also available as arXiv:2310.11409.
[14] MITRE Corporation (2026). MITRE ATLAS™: Adversarial Threat Landscape for Artificial-Intelligence Systems.
[15] Huang K (2025). Agentic AI Threat Modeling Framework: MAESTRO. Cloud Security Alliance Blog, February 2025.
[16] Anthropic (2026). Reverse engineering Claude’s CVE-2026-2796 exploit. Anthropic Frontier Red Team Blog, March 2026.
[17] Bastani O, Bastani H, Sungu E, Ge H, Kabakcı Ö, Mariman R (2025). Generative AI Can Harm Learning. The Wharton School Research Paper.
[18] Risko EF, Gilbert SJ (2016). Cognitive Offloading. Trends in Cognitive Sciences 20(9):676–688.
[19] Andriushchenko M, Souly A, Dziemian M, et al. (2025). AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents. ICLR 2025.
