Summary
For three years, I worked at one of the organizations you might expect—I will not say which, for reasons that will become apparent, though the cognoscenti will likely guess. I left in early 2024, and I have spent the months since reading and rereading a novel that was first published in 1872, searching for something I sensed but could not name. What I found there was so uncanny, so precisely calibrated to our present moment, that I began to suspect Dostoevsky of a kind of prophecy.
[...]
I will argue that Dostoevsky understood, with terrifying precision, the psychological and social dynamics that emerge when a small group of people convince themselves they have discovered a truth so important that normal ethical constraints no longer apply to them. He understood the particular madness of the intelligent, the way abstraction can sever conscience from action. He understood how movements that begin with the liberation of humanity end with its enslavement. And he understood—this is the critical point—that the catastrophe comes not from the cynics but from the believers.
[...]
I am not suggesting that AI safety is a conspiracy in any sinister sense. I am observing that the social dynamics of the movement share structural features with those of Dostoevsky's revolutionaries: the same information asymmetries, the same reliance on reputation rather than verification, the same exploitation of those features by bad actors.
One of the recurring features of the AI labs is the vast gulf between public statements and internal reality. The safety teams are presented externally as powerful internal voices shaping company direction. Internally, their actual influence varies from minimal to nonexistent, depending on the organization and the moment. The people who could reveal this gap have strong incentives not to. Their careers, their equity stakes, their social standing within the community—all depend on maintaining the illusion.
[...]
What is the central point? It is that neither "safety" nor "acceleration" is actually what drives the behavior of the people in power. What drives them is more immediate: the competitive dynamics of the industry, the career incentives of individual researchers, the political pressures from governments and investors, the personal relationships and rivalries that shape decision-making. The ideological debates are largely epiphenomenal—they provide vocabulary and justification, but they do not determine outcomes.
[...]
The AI researchers have the same mobility. If Anthropic becomes too restrictive, there's OpenAI. If OpenAI becomes too chaotic, there's Google. If the Bay Area becomes too weird, there's London. The ability to exit reduces the incentive to push for change from within—why fight a difficult political battle when you can simply leave? But it also means that the problems are never confronted, only redistributed. The researcher who leaves Lab A because of safety concerns may find the same concerns at Lab B, because the same people, with the same assumptions, are building the same systems.
[...]
The labs compete on certain dimensions: time to market, benchmark performance, talent acquisition. But they do not compete on the fundamental question of whether to build increasingly powerful AI systems as quickly as possible. On that question, they are aligned. The uniparty consensus is that AI development should proceed, that the benefits outweigh the risks, that the people currently in charge are the right people to be in charge.
[...]
The same dynamic applies to the AI companies themselves. Google, OpenAI, Anthropic, Meta—they all fund safety research, internally and externally. They have genuine reasons to want such research to succeed, but they also have genuine reasons to want it to succeed in ways that do not threaten their core business. The researchers funded by these companies are not bought in any crude sense, but they operate in an environment where certain conclusions are easier to reach than others.
[...]
This is, I think, the most accurate description of what has gone wrong in the AI industry. The possession is not by ideology but by capability—by the extraordinary power to build things that work, to solve problems that seemed unsolvable, to extend human reach in ways that feel like magic. This capability is genuinely intoxicating. I felt it myself. The experience of making a neural network do something it was not supposed to be able to do is thrilling.
But the capability is developing faster than the wisdom to direct it. We can build systems that generate human-quality text, that create images from descriptions, that engage in extended reasoning. We cannot reliably make these systems do what we want, or predict what they will do, or understand why they do what they do. We are possessed by our own creations in an almost literal sense: they act on us as much as we act on them.
[...]
A striking feature of Demons is how little intentions matter. The characters have various intentions—good, bad, mixed, confused—but the outcomes seem almost independent of them. The catastrophe happens not because anyone intended it but because the system had that catastrophe as its attractor.
This is perhaps the most discomfiting lesson for the AI industry. There is intense focus on the intentions of the developers: are they trying to benefit humanity? Are they adequately safety-conscious? Are they motivated by altruism or greed? These questions matter, but Dostoevsky suggests they may not matter as much as we assume.
[...]
The collective behavior of the industry points toward something like a goal, even if no individual endorses that goal. The goal seems to be: maximize capability, externalize risk, capture value, and maintain optionality for as long as possible. This is not anyone's explicit objective, but it is what the system is actually doing.