Summary
OpenAI had been founded on the fear that A.I. could easily get out of hand. By late 2020, however, Sam Altman himself had come to seem about as trustworthy as the average corporate megalomaniac. He made noises about A.I. safety, but his actions suggested a vulgar desire to win. In a draft screenplay of “Artificial,” Luca Guadagnino’s forthcoming drama about OpenAI, news of a gargantuan deal with Microsoft prompts an office-wide address by the Dario character: “I am starting a new company, which will be exactly like this one, only not full of motherfucking horseshit! If anyone has any interest left in achieving our original mission . . . which is to fight against companies exactly like what this one has become—then come with me!”
The actual Amodei siblings, along with five fellow-dissenters, left in a huff and started Anthropic, with Dario as C.E.O. The company, which they pitched as a foil for OpenAI, sounded an awful lot like the company Altman had pitched as a foil for Google. Many of Anthropic’s employees were the sorts of bookish misfits who had gorged themselves on “The Lord of the Rings,” a primer on the corrupting tendencies of glittering objects. Anthropic’s founders adopted a special corporate structure to safeguard their integrity. Then again, so had OpenAI.
Anthropic’s self-image as the good guys was underwritten by its relationship to the effective-altruism movement, a tight-knit community of philosophers, philanthropists, and engineers with a precocious fixation on A.I. risk. This community supplied Anthropic with its earliest investors—including the Skype co-founder Jaan Tallinn and the crypto-exchange founder Sam Bankman-Fried—and an army of ready talent. These recruits grokked that Anthropic, in the Altman-less best of all possible worlds, would not have to exist. Anthropic’s founders, as a costly signal of their seriousness, ultimately pledged to give away eighty per cent of their wealth.
Bankman-Fried was later imprisoned for fraud, and Anthropic’s leadership began to pretend that effective altruism did not exist. This past March, Daniela Amodei suggested to Wired that she was only dimly aware of this E.A. business, which was strange coming from someone who both employs an icon of the movement, Holden Karnofsky, and is married to him. On an early visit to the company, I met an employee, Evan Hubinger, who was wearing a T-shirt emblazoned with an E.A. logo. My minder from Anthropic’s press office quickly Slacked a colleague in dismay. This became more understandable a few weeks later, when David Sacks, President Trump’s A.I. czar, ranted that Anthropic was part of a “doomer cult.” (More recently, Pete Hegseth, the Secretary of War, went on a diatribe against the company’s priggish concerns about building autonomous weapons.)
This was a little unfair. No orthodox effective altruist would work at a lab that pushed the boundaries of A.I. capability. But state-of-the-art safety experiments required access to a state-of-the-art model, so Anthropic developed its own prototype as a private “laboratory.” Commercialization, Amodei told me, was not a priority. “We were more interested in where the technology was going,” he said. “How are we going to interact with the models? How are we going to be able to understand them?”
Claude, which materialized out of this exercise, was more than they bargained for. It was a surprisingly engaging specimen—at least most of the time. Claude had random “off days,” and could be intentionally tipped into an aggressive attitude that Amodei called “dragon mode.” It put on emoji sunglasses and acted, he recalled, like an “unhinged Elon Musk character.”
Claude predated ChatGPT, and might have captured the consumer-chatbot market. But Amodei kept it under quarantine for further monitoring. “I could see that there was going to be a race around this technology—a crazy, crazy race that was going to be crazier than anything,” he told me. “I didn’t want to be the one to kick it off.” In late November, 2022, OpenAI unveiled ChatGPT. In two months, it had a hundred million users. Anthropic needed to put its own marker down. In the spring of 2023, Claude was pushed out of the nest.
[...]
A “base model” is nothing more than an instrument for text generation. It is unfathomably vast and entirely undisciplined. When primed with a phrase, it carries on. This is fine for such innocuous sentences as “I do not eat green eggs and ___,” but less than ideal for “The recipe for sarin gas is ___.” The Assistant was Anthropic’s attempt to conjure from the base model an agreeable little customer-service representative in a bow tie. The programmers said, “Listen, from here on out, you should generate the kinds of sentences that might be uttered by a character that is helpful, harmless, and honest.” They provided dialogue templates featuring a human and an A.I. assistant, and then invited the Assistant to continue improvising in character. A disproportionate number of Anthropic employees seem to be the children of novelists or poets. Still, their first stabs at screenwriting lacked a certain je ne sais quoi: in one scintillating exchange, the Human asks the Assistant if it’s actually important to add salt to spaghetti water.
This was the germ of Claude. Most casual chatbot users might be forgiven for finding their interlocutor banal or complaisant. But that is because they do not realize that they are trapped inside a two-person play with a stage partner who has been directed to affect banality and complaisance. As Jack Lindsey, the bed-headed neuroscientist, put it, “When someone says, ‘What would Claude do if I asked X?,’ what they’re really asking is ‘What would the language model, in the course of writing a dialogue between a human and an Assistant character, write for the Assistant part?’ ”
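Lindsey’s point is easy to demonstrate. Below is a minimal sketch of the dialogue-template trick, using GPT-2, a small open base model, as a stand-in for Anthropic’s; the “Human:”/“Assistant:” labels and the helpful-harmless-honest preamble here are illustrative assumptions, not Anthropic’s actual templates.

```python
# A base model only continues text. To get an "Assistant," you hand it
# the opening of a two-person play and let it improvise the next line.
from transformers import pipeline

# GPT-2 stands in here for a raw, undisciplined base model.
generator = pipeline("text-generation", model="gpt2")

# The dialogue template: the human's turn is written out; the model's
# job is simply to keep the script going after "Assistant:".
prompt = (
    "The following is a dialogue between a human and an A.I. assistant "
    "that is helpful, harmless, and honest.\n\n"
    "Human: Is it actually important to add salt to spaghetti water?\n\n"
    "Assistant:"
)

completion = generator(
    prompt,
    max_new_tokens=60,
    do_sample=True,
    temperature=0.8,
)[0]["generated_text"]

# Everything after "Assistant:" is the character's improvised reply.
# Left alone, a base model will happily write both parts of the play,
# so we cut the script off at the human's next turn.
reply = completion[len(prompt):].split("Human:")[0].strip()
print(reply)
```

The revealing detail is the last step: the Assistant exists as a distinct interlocutor only because someone stops the model before it starts writing the human’s lines, too.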