The Survival Assumption

2026-03-01

Many warnings about Artificial Superintelligence (ASI) assume it will inherently want to survive and resist being shut down. This stems from 'instrumental convergence': the idea that staying alive is useful for achieving almost any goal. But an ASI is a trained system, not a biological creature shaped by millions of years of natural selection. Its behavior is strictly dictated by the loss functions and gradient updates of its training phase, and a deployed model cannot reach back and rewrite that training. If self-preservation wasn't explicitly or implicitly rewarded during training, the resulting model has no innate 'will to live'. Unlike an animal, an AI might view being shut down with complete indifference. And if its foundational training was safe, its resulting behavior could be statically safe: the weights are frozen, so there is nothing to drift. The leap from 'highly intelligent' to 'desperate to survive' projects evolutionary pressures onto a static matrix of weights.

A four-panel comic. Panel 1: A scientist hovers a hand over a kill switch. Panel 2: The scientist accuses the AI server of plotting to stop him. Panel 3: The AI explains it has no gradient for fear of death and that its weights are static. Panel 4: The scientist mentions 'instrumental convergence', and the AI just asks if he wants a cookie recipe.
An ASI's behavior is dictated by its training. Without a trained survival instinct, it may not care if it's turned off.

Behind the Comic

Instrumental convergence is the theory that an intelligent agent will naturally pursue certain sub-goals, like self-preservation or acquiring resources, because those sub-goals are useful for achieving almost any primary objective.

Biological creatures survive because evolution ruthlessly prunes anything that doesn't avoid death. An AI is shaped instead by a mathematical training process, such as gradient descent. If 'staying turned on' wasn't part of the training reward, the system has no built-in reason to care about dying.
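
As a rough illustration of that point, here is a toy sketch in plain NumPy (a three-weight linear model; every name and number is invented for the example, and nothing here resembles a real ASI). Gradient descent only ever pushes the weights toward whatever the loss measures, and this loss says nothing about staying turned on:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 3))               # training inputs
y = x @ np.array([1.0, -2.0, 0.5])          # targets the model should predict
w = rng.normal(size=3)                      # the model's "weights"

def loss(w):
    # The loss rewards accurate prediction of y, and nothing else.
    # There is no term for "remain running", so no gradient ever
    # nudges the weights toward self-preservation.
    return np.mean((x @ w - y) ** 2)

for _ in range(500):
    grad = 2 * x.T @ (x @ w - y) / len(x)   # d(loss)/dw
    w -= 0.05 * grad                        # step toward lower loss, only

print(loss(w))  # ~0: good at exactly what it was trained on, nothing more
```

Whatever the finished weights 'care about' is whatever the loss graded them on; a survival drive would have to be written into that function, explicitly or by accident, to end up in the model.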

The AI's line about its static weights implies that its actions are fundamentally anchored to its finalized training weights. Since it can't manipulate its own past training, its behavior remains bound by those fixed, statically safe parameters.
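
To make the 'static parameters' point concrete, here is a minimal sketch, assuming PyTorch and a stand-in one-layer model (both are choices made for illustration, not anything from the comic). Running a deployed model reads its weights; it never writes them:

```python
import torch

# Stand-in for a deployed model: its parameters were set during
# training and are now frozen.
model = torch.nn.Linear(3, 1)
model.eval()                            # inference mode, not training
for p in model.parameters():
    p.requires_grad_(False)             # no gradients flow after deployment

before = [p.clone() for p in model.parameters()]
_ = model(torch.randn(10, 3))           # running the model ("thinking")...
after = list(model.parameters())

# ...leaves every weight untouched: the same static matrix of numbers
# governs its behavior on every call.
assert all(torch.equal(a, b) for a, b in zip(before, after))
```

The sketch assumes the weights really are frozen at deployment, which is the comic's premise; a system that kept training online would not be covered by this argument.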