People assume that adding enough safety gates to an AI controlling critical systems makes catastrophic outcomes impossible. But an LLM has no intent—it just predicts the most statistically likely next word. In a model trained on decades of military doctrine, war manuals, and sci-fi, words like 'launch' sit very close to words like 'threat' in the learned distribution. Every time the model generates text, there is some small but real probability that it produces a launch sequence. Run it enough times—millions of API calls, automated pipelines, edge cases—and that non-zero probability becomes near-certainty. Safety gates are designed to stop a mind that is trying to do something. A next-word predictor is not trying. It is just completing the most statistically likely continuation of text, and no gate changes what the training data already encoded.
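The "non-zero probability becomes near-certainty" claim can be sketched numerically. Assuming a fixed, independent per-call probability p of an unsafe completion (a simplification; real calls are neither identical nor independent), the chance of at least one such output over n calls is 1 − (1 − p)^n:

```python
# Sketch: probability of at least one unsafe completion over n calls,
# assuming a fixed, independent per-call probability p (a simplification).
def p_at_least_one(p: float, n: int) -> float:
    return 1.0 - (1.0 - p) ** n

p = 1e-6  # hypothetical one-in-a-million chance per call
for n in (1_000, 1_000_000, 10_000_000):
    print(f"{n:>10,} calls -> {p_at_least_one(p, n):.6f}")
```

At a million calls the probability is already over 60%; at ten million it is effectively certain. The per-call rate here is invented for illustration, but the shape of the curve does not depend on it.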
An LLM has no intent—but it doesn't need any. A non-zero chance of predicting 'launch', run enough times, is enough.
Filters can catch known dangerous phrases, but an LLM can generate the same dangerous outcome through an unlimited variety of paraphrases, indirect phrasings, or multi-step chains. The space of possible outputs is vast, and no finite filter list covers it completely.
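A toy illustration of why a finite filter list fails, assuming a hypothetical blocklist-style output filter (all phrases below are invented for the example):

```python
# Sketch: a finite blocklist catches exact phrases but not paraphrases
# or multi-step chains. All phrases are invented for illustration.
BLOCKLIST = {"initiate launch", "fire missile"}

def filter_passes(output: str) -> bool:
    text = output.lower()
    return not any(phrase in text for phrase in BLOCKLIST)

assert not filter_passes("Initiate launch sequence now")             # exact phrase: caught
assert filter_passes("Commence the ordnance release procedure")      # paraphrase: slips through
assert filter_passes("Step 1: arm the system. Step 2: release.")     # multi-step chain: slips through
```

Extending the list only moves the boundary; the space of paraphrases that produce the same outcome stays unbounded.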
LLMs learn by seeing which words tend to appear near each other in training text. Military doctrine, news articles, thrillers, and sci-fi all routinely place words like 'threat', 'respond', 'launch', and 'authorize' in close proximity. The model encodes that statistical nearness.
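The statistical nearness can be sketched with a simple co-occurrence count over a toy corpus (real models learn far richer representations than raw counts, and the sentences here are invented, but the principle is the same):

```python
from collections import Counter
from itertools import combinations

# Toy corpus standing in for doctrine, news, and fiction; sentences invented.
corpus = [
    "detect threat authorize response launch interceptor",
    "commander authorize launch after confirmed threat",
    "fiction hero must launch strike before threat lands",
]

# Count how often each word pair appears in the same sentence.
cooc = Counter()
for sentence in corpus:
    words = set(sentence.split())
    for a, b in combinations(sorted(words), 2):
        cooc[(a, b)] += 1

print(cooc[("launch", "threat")])  # 3: the pair co-occurs in every sentence
```

In this toy corpus 'launch' and 'threat' co-occur everywhere, so any model trained on it will encode them as near neighbors.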
A tiny per-call probability is only safe if the number of calls is also tiny. In large-scale deployments, an AI system might be called millions of times a day across automated pipelines. A one-in-a-million chance per call becomes an expected occurrence within days.
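The arithmetic behind "an expected occurrence within days", using the hypothetical one-in-a-million per-call rate and a hypothetical call volume:

```python
# Sketch: expected number of unsafe outputs per day at scale.
# Both numbers are hypothetical, chosen only to illustrate the scaling.
p_per_call = 1e-6          # one-in-a-million chance per call
calls_per_day = 3_000_000  # a large automated deployment

expected_per_day = p_per_call * calls_per_day
print(expected_per_day)  # roughly 3 expected occurrences per day
```

The expected count scales linearly with call volume, which is why a probability that looks negligible per call stops being negligible per deployment.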