Making Sense - Sam Harris - June 12, 2025


#420 — Countdown to Superintelligence


Episode Stats

Length: 20 minutes
Words per Minute: 174
Word Count: 3,640
Sentence Count: 199



Summary

Daniel Kokotajlo left OpenAI last year. In this episode, he talks about the circumstances under which he left, why he decided to leave, and the lessons he learned from his experience at OpenAI.


Transcript

00:00:00.000 Welcome to the Making Sense Podcast. This is Sam Harris. Just a note to say that if you're
00:00:11.740 hearing this, you're not currently on our subscriber feed, and will only be hearing
00:00:15.720 the first part of this conversation. In order to access full episodes of the Making Sense
00:00:20.060 Podcast, you'll need to subscribe at samharris.org. We don't run ads on the podcast, and therefore
00:00:26.240 it's made possible entirely through the support of our subscribers. So if you enjoy what we're
00:00:30.200 doing here, please consider becoming one. I am here with Daniel Kokotajlo. Daniel,
00:00:38.920 thanks for joining me. Thanks for having me. So we'll get into your background in a second.
00:00:44.220 I just want to give people a reference that is going to be of great interest after we have this
00:00:49.560 conversation. You and a bunch of co-authors wrote a blog post titled AI 2027, which is a very
00:00:57.680 compelling read, and we're going to cover some of it, but I'm sure there are details there that we're
00:01:02.600 not going to get to. So I highly recommend that people read that. You might even read that before
00:01:08.020 coming back to listen to this conversation. Daniel, what's your background? We're going to talk about
00:01:13.700 the circumstances under which you left OpenAI, but maybe you can tell us how you came to work at
00:01:19.260 OpenAI in the first place. Sure. Yeah. So I've been sort of in the AI field for a while, mostly doing
00:01:27.560 forecasting and a little bit of alignment research. So that's probably why I got hired at OpenAI. I was
00:01:33.700 on the governance team. We were making policy recommendations to the company and trying to
00:01:38.880 predict where all of this was headed. I worked at OpenAI for two years, and then I quit last year.
00:01:43.700 And then I worked on AI 2027 with the team that we hired. And one of your co-authors on that blog
00:01:51.240 post was Scott Alexander. That's right. Yeah. It's, again, very well worth reading.
00:01:57.460 So what happened at OpenAI that precipitated your leaving? And can you describe the circumstances of
00:02:05.680 your leaving? Because I seem to remember you had to walk away. You refused to sign an NDA or
00:02:11.760 a non-disparagement agreement or something and had to walk away from your equity. And that was
00:02:16.440 perceived as both a sign of the scale of your alarm and the depth of your principles.
00:02:23.500 What happened over there? Yeah. So this story has been covered elsewhere in greater detail. But the
00:02:28.720 summary is that there wasn't any one particular event or scary thing that was happening. It was more
00:02:35.880 the general trends. So if you've read AI 2027, you get a sense of the sorts of things that I'm
00:02:42.480 expecting to happen in the future. And frankly, I think it's going to be incredibly dangerous.
00:02:49.040 And I think that there's a lot that society needs to be doing to get ready for this and to try to
00:02:53.320 avoid those bad outcomes and to steer things in a good direction. And there's especially a lot that
00:02:57.900 companies who are building this technology need to be doing, which we'll get into later.
00:03:02.200 And not only was OpenAI not really doing those things, OpenAI was sort of not on track to get
00:03:08.560 ready or to take these sorts of concerns seriously, I think. And I gradually came to believe this over
00:03:14.640 my time there and gradually came to think that, well, basically that we were on a path towards
00:03:19.020 something like AI 2027 happening and that it was hopeless to try to sort of be on the inside and
00:03:24.480 talk to people and try to steer things in a good direction that way.
00:03:28.560 So that's why I left. And then with the equity thing, when people leave,
00:03:34.780 they have this agreement that they try to get you to sign, which among other
00:03:40.500 things says that you basically have to agree never to criticize the company again and also never to
00:03:44.900 tell anyone about this agreement, which was the clause that I found objectionable. And if you don't
00:03:50.900 sign, then they take away all of your equity, including your vested equity.
00:03:54.500 That's sort of a shocking detail. Is that even legal? I mean, isn't vested equity vested equity?
00:03:59.240 Yeah. I mean, one of the lessons I learned from this whole experience is it's good to get lawyers and
00:04:02.880 know your rights, you know? I don't know if it was legal actually, but what happened was my wife and
00:04:09.300 I talked about it and ultimately decided not to sign, even though we knew we would lose our equity
00:04:14.300 because we wanted to have the moral high ground and to be able to criticize the company in the future.
00:04:18.440 And happily, it worked out really well for us because there was a huge uproar. Like when this
00:04:24.040 came to light, a lot of employees were very upset. You know, the public was upset and the company very
00:04:28.960 quickly backed down and changed the policies. So we got to keep our equity actually.
00:04:34.120 Okay. Yeah, good. So let's remind people about what this phrase alignment problem means. I mean,
00:04:43.460 I've just obviously discussed this topic a bunch on the podcast over the years, but many people may
00:04:48.480 be joining us relatively naive to the topic. How do you think about the alignment problem? And
00:04:54.920 why is it that some very well-informed people don't view it as a problem at all?
00:05:02.560 Well, it's different for every person. I guess working backwards, well, I'll work forwards. So first
00:05:07.240 of all, what is the alignment problem? It's the problem of figuring out how to make AIs
00:05:11.600 sort of reliably do what we want. It's maybe more specifically the problem of
00:05:16.600 shaping the cognition of the AIs so that they have the goals that we want them to have. They
00:05:24.020 have the virtues that we want them to have, such as honesty, for example. It's very important that
00:05:28.160 our AIs be honest with us. Getting them to reliably be honest with us is part of the alignment
00:05:33.080 problem. And it's sort of an open secret that we don't really have a good solution to the alignment
00:05:38.700 problem right now. You can go read the literature on this. You can also look at what's currently
00:05:45.480 happening. The AIs are not actually reliably honest. And there's many documented examples of
00:05:50.020 them saying things that we're pretty sure they know are not true. So this is a big, open,
00:05:55.520 unsolved problem that we are gradually making progress towards. And right now, the stakes are
00:06:00.900 very low. Right now, we just have these chatbots that even when they're misaligned and even when they
00:06:06.700 cheat or lie or whatever, it's not really that big of a problem. But these companies,
00:06:12.840 OpenAI, Anthropic, Google DeepMind, some of these other companies as well, they are racing to build
00:06:19.580 superintelligence. You can see this on their website and in the statements of the CEOs,
00:06:24.840 especially OpenAI and Anthropic have literally said that they are building superintelligence.
00:06:31.080 They're trying to build it, that they think they will succeed around the end of this decade or
00:06:35.160 before this decade is out. What is superintelligence? Superintelligence is an AI system that is better
00:06:41.480 than the best humans at everything, while also being faster and cheaper. So if they succeed in
00:06:48.520 getting to superintelligence, then the alignment problem suddenly becomes extremely high stakes.
00:06:53.720 We need to make sure that any superintelligences that are built, or at least the first ones that are
00:06:57.900 built are aligned. Otherwise, terrible things could happen, such as human extinction.
00:07:04.500 Yeah, so we'll get there. The leap from having what one person called functionally a country of
00:07:11.560 geniuses in a data center, the leap from that to real world risk and something like human extinction
00:07:18.880 is going to seem counterintuitive to some people. So we'll definitely cover that. But why is it,
00:07:24.160 I mean, we have people, I guess some people have moved on this topic. I mean, so forgive me if I'm
00:07:29.580 unfairly maligning anyone, but I remember someone like Yann LeCun over at Facebook, who's obviously one
00:07:37.220 of the pioneers in the field, just doesn't give any credence at all to the concept of an alignment
00:07:43.540 problem. I've lost touch with how these people justify that degree of insouciance. What's your view of
00:07:52.700 the skepticism that you meet there?
00:07:54.960 Well, it's different for different people. And honestly, it would be helpful to have a more
00:08:04.140 specific example of something someone has said for me to respond to. With Yann LeCun, if I remember
00:08:04.140 correctly, for a while he was both saying things to the effect of AIs are just tools and they're going
00:08:11.780 to be submissive and obedient to us because they're AIs and there just isn't much of a problem here.
00:08:16.420 And also saying things along the lines of, they're never going to be super intelligent or like,
00:08:21.560 you know, the current LLMs are not on a path to AGI. They're not going to be able to, you know,
00:08:28.160 actually autonomously do a bunch of stuff.
00:08:30.680 It seems to me that the thinking on that front has changed a lot.
00:08:34.440 Indeed. Yann himself has sort of walked that back a bit and is now starting to, he's still sort of like
00:08:40.140 an AI skeptic, but now he's, I think there's a quote where he said something like, we're not
00:08:45.640 going to get to superintelligence in the next five years or something, which is a much milder claim
00:08:50.580 than what he used to be saying.
00:08:52.740 Well, when I started talking about this, I think the first time was around 2016. So nine years ago,
00:08:58.720 I bumped into a lot of people who would say this isn't going to happen for 50 years, at least.
00:09:05.240 I'm not hearing increments of half centuries thrown around much anymore. I mean, a lot of people are
00:09:11.320 debating the difference between your time horizon, like two years or three years and, you know, five
00:09:18.340 or 10. I mean, 10 at the outside is what I'm hearing from people who seem cautious.
00:09:23.520 Yep. I think that's basically right as a description of what smart people in the field are sort of
00:09:28.760 converging towards. And I think that's an incredibly important fact for the general public to be aware
00:09:33.320 of. Everyone needs to know that the field of AI experts and AI forecasters has lowered its
00:09:40.180 timelines and is now thinking that there is a substantial chance that some of these companies
00:09:45.500 will actually succeed in building superintelligence sometime around the end of the decade or so.
00:09:50.300 There's lots of disagreement about timelines exactly, but that's sort of like where a lot of
00:09:55.040 the opinions are heading now.
00:09:56.860 So the problem of alignment is the most grandiose, speculative, science fiction inflected version
00:10:06.000 of the risk posed by AI, right? This is the risk that a super intelligent, self-improving,
00:10:13.220 autonomous system could get away from us and not have our well-being in its sights or actually be,
00:10:20.900 you know, hostile to it for some reason that we didn't put into the AI. And
00:10:26.720 therefore, we could find ourselves playing chess against the perfect chess engine and failing.
00:10:33.080 And that poses an existential threat, which we'll describe. But obviously, there are nearer term
00:10:38.120 concerns that more and more people are worried about. There's the human misuse of increasingly
00:10:43.660 powerful AI. There's, we might call this a containment problem. I think Mustafa Suleyman over
00:10:51.820 at Microsoft, who used to be at DeepMind, tends to think of the problem of containment first,
00:10:56.960 that really it's, you know, aligned or not, as this technology gets more democratized,
00:11:03.440 people can decide to put it to sinister use, which is to say, you know, use that we would
00:11:10.200 consider unaligned. They can, you know, change the system-level prompt and, you know,
00:11:18.100 make these tools malicious as they become increasingly powerful. And it's hard to see
00:11:23.860 how we can contain the spread of that risk. And yeah, I mean, so then there's just the other
00:11:32.320 issues like, you know, job displacement and economic and political concerns that are all
00:11:38.500 too obvious. I mean, it's just the spread of misinformation and the political instability
00:11:42.960 that can arise in the context of spreading misinformation and shocking degrees of wealth
00:11:48.920 inequality that might, you know, initially be unmasked by the growth of this technology.
00:11:54.580 Let's just get into this landscape, knowing that misaligned superintelligence is the kind
00:11:59.680 of the final topic we want to talk about. What is it that you and your co-authors are predicting?
00:12:07.040 Why did you title your piece AI 2027? What do the next two years, on your account, hold for us?
00:12:12.960 That's a lot to talk about. So the reason why we titled it AI 2027 is because in the scenario that we
00:12:19.800 wrote, the most important pivotal events and decisions happen in 2027. The story continues
00:12:26.380 to 2028, 2029, et cetera. But the most important part of the story happens in 2027. For example,
00:12:32.580 what was called in the literature AI takeoff happens in AI 2027. AI takeoff is this
00:12:38.860 forecasted dynamic of the speed of AI research accelerating dramatically when AIs are able to
00:12:45.200 do AI research much better than humans. So in other words, when you automate the AI research,
00:12:50.600 probably it will go faster. And there's a question about how much faster it will go,
00:12:55.180 what that looks like, et cetera, when it will eventually asymptote. But that whole dynamic is
00:12:59.740 called AI takeoff. And it happens in our scenario in 2027. I should say, as a footnote, I've updated
00:13:06.380 my timelines to be a little bit more optimistic after writing this. And now I would say 2028 is more
00:13:10.720 likely. But broadly speaking, I still feel like that's basically the track we're on.
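
To make the shape of that dynamic concrete, here is a minimal toy sketch in Python. It is not the model behind AI 2027, and every parameter in it (the automation date, the speedup factor, the ceiling) is invented purely for illustration; it only shows how progress can crawl along at a human baseline, compound once AIs accelerate AI research, and eventually level off at some asymptote.

# Toy illustration (not the AI 2027 authors' actual model) of the takeoff
# dynamic described above: once AIs speed up AI research, progress compounds,
# and the open questions are how fast it goes and where it saturates.
# All numbers here are made up purely to show the shape of the curve.

def simulate_takeoff(months=36, base_progress=1.0, automation_month=24,
                     speedup_per_capability=0.5, max_multiplier=50.0):
    """Return cumulative 'research progress' per month under a crude model.

    Before `automation_month`, progress accrues at the human baseline rate.
    After it, the effective research speed is multiplied by a factor that
    grows with accumulated progress but saturates at `max_multiplier`
    (the 'asymptote' mentioned in the conversation).
    """
    capability = 0.0
    trajectory = []
    for month in range(months):
        if month < automation_month:
            speed = base_progress                                 # human-only research
        else:
            multiplier = 1.0 + speedup_per_capability * capability
            speed = base_progress * min(multiplier, max_multiplier)  # saturating speedup
        capability += speed
        trajectory.append((month, round(capability, 1)))
    return trajectory

if __name__ == "__main__":
    for month, cap in simulate_takeoff():
        print(f"month {month:2d}: cumulative progress {cap:8.1f}")
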
00:13:16.380 So when you say AI takeoff, are you, is that synonymous with the older phrase,
00:13:20.460 an intelligence explosion?
00:13:21.420 Basically. Yeah. Yeah. I mean, that phrase has been with us for a long time since the mathematician
00:13:27.780 I. J. Good, I think in the fifties, who posited this, just extrapolating, you know, from the general
00:13:34.500 principle that once you had intelligent machines devising the next generation of intelligent
00:13:39.780 machines, that this process could be self-sustaining and asymptotic and get away from us. And he dubbed
00:13:46.020 this an intelligence explosion. So this is mostly a story of software improving software. I mean,
00:13:54.220 the AI at this point doesn't yet have its hands on, you know, physical factories that are
00:13:58.760 building new chips or robots.
00:14:00.900 That's right. Yeah. So, I mean, this is also another important thing that I think
00:14:05.520 I would like people to think about more and understand better, which is that, um,
00:14:09.580 at least in our view, most of the important decisions that affect the fate of the world will be made
00:14:14.780 prior to any massive transformations of the economy due to AI. And if you want to understand why, or how,
00:14:22.700 or what we mean by that, et cetera, well, it's all laid out. And in our scenario, you can sort of see the
00:14:27.520 events unfold. And then you sort of, after you finish reading it, you can be like, oh yeah, I guess
00:14:31.240 like the world looked kind of pretty normal in 2027, even though, you know, behind closed doors at the
00:14:36.580 AI companies, all of these incredibly impactful decisions were being made about automating AI research
00:14:41.200 and producing superintelligence and so forth. And then in 2028, things are going crazy in the real
00:14:46.720 world. And there's all these new factories and robots and stuff being built orchestrated by the
00:14:50.840 superintelligences. But in terms of like where to intervene, you don't want to wait until the
00:14:55.960 superintelligences are already building all the factories. You want to like try to steer things in a
00:15:01.140 better direction before then. Yeah. So in your piece, I mean, it is kind of a piece of speculative
00:15:07.100 fiction in a way, but it's all too plausible. And what's interesting is just some of the
00:15:13.420 disjunctions you point out. I mean, like moments where the economy, you know, for real
00:15:19.100 people, is actually probably being destroyed because people are becoming far less valuable. There's another blog
00:15:25.460 post that perhaps you know about called The Intelligence Curse, which goes over some of this
00:15:30.500 ground as well, which I recommend people look up, but that's really just
00:15:34.420 a name for this principle that once AI is better at, you know, virtually everything than people are,
00:15:41.560 right? Once it's all analogous to chess, the value of people just evaporates from the point of view of
00:15:49.660 companies and even governments, right? I mean, people are just not necessary, therefore,
00:15:55.100 because they can't add value to any process that's running the economy, or to the most
00:16:00.240 important processes that are running the economy. So there's just these interesting
00:16:04.240 moments where the stock market might be booming, but the economy for most people is
00:16:10.360 actually in free fall. And then you get into the implications of an arms race between the U.S.
00:16:16.560 and China. And it's all too plausible the moment you admit that
00:16:23.820 we are in this arms race condition. And an arms race is precisely the situation wherein all the
00:16:30.080 players are not holding safety as their top priority. Yeah. And unfortunately, you know,
00:16:36.700 I don't think it's good that we're in an arms race, but it does seem to be what we're headed
00:16:40.660 towards. And it seems to be what the companies are also, like, pushing along, right? If you look
00:16:47.140 at the rhetoric coming out of the lobbyists, for example, they talk a lot about how it's important
00:16:51.680 to beat China and how the U.S. needs to maintain its competitive advantage in AI and so forth.
00:16:57.300 And I mean, more generally, like it's kind of like, I'm not sure what the best way to say this
00:17:02.480 is, but basically a lot of people at these companies building this technology expect something
00:17:09.540 more or less like AI 2027 to happen and have expected this for years. And like, this is what
00:17:14.800 they are building towards and they're doing it because they think if we don't do it, someone else
00:17:18.120 will do it worse. And, you know, they think it's going to work out well.
00:17:22.980 Do they think it's going to work out well, or they just think that there is no alternative
00:17:27.720 because we have a coordination problem we can't solve? I mean, if Anthropic stops,
00:17:34.020 they know that OpenAI is not going to stop. They can't agree to, you know, all the
00:17:39.860 U.S. players can't agree to stop together. And even if they did, they know that China wouldn't
00:17:45.140 stop. Right. So it's just a coordination problem that can't be solved, even if everyone
00:17:50.280 agrees that an arms race condition carries some significant probability,
00:17:56.020 I mean, maybe it's only 10% in some people's minds, but still a non-negligible probability,
00:18:00.540 of birthing something that destroys us. Yeah. My take on that is it's both.
00:18:06.020 So I think that, you know, I have lots of friends at these companies and I used to work there and I
00:18:10.480 talked to lots of people there all the time. In my opinion, I think on average, they're overly
00:18:14.340 optimistic about where all this is headed, perhaps because they're biased, because their job
00:18:20.280 depends on them thinking it's a good idea to do all of this. But also, separately, there is
00:18:25.600 both a real, like, arms race dynamic where it just really is true that if one company
00:18:32.400 decides not to do this, then other companies will probably just do it anyway. And it really is true
00:18:37.060 that if one country decides not to do this, other countries will probably do it anyway. And then
00:18:42.080 there's also an added element of a perceived dynamic there, where a lot of people are basically not
00:18:48.180 even trying to coordinate the world to handle this responsibly and to put guardrails in place or to
00:18:53.080 slow down or whatever. And they're not trying because they basically think it's hopeless to,
00:18:57.220 to achieve that level of coordination. Well, you mentioned that the LLMs are already showing
00:19:03.080 some deceptive characteristics. I mean, I guess we might wonder whether what is functionally
00:19:08.620 appearing as deception is really deception. I mean, really motivated in any sense that we
00:19:14.480 would, uh, you know, whether we're guilty of anthropomorphizing these systems by calling
00:19:18.780 it lying or deception. But what's the behavior that we have seen from some of these systems
00:19:24.620 that we're calling, uh, lying or cheating or deception?
00:19:29.100 Yeah. Great question. So there's a couple of things. Keywords to search for are sycophancy,
00:19:34.840 reward hacking, and scheming. So, uh, there's various papers on this and there's even blog posts
00:19:41.560 by OpenAI and Anthropic, uh, detailing some examples that have been found. Um, so sycophancy
00:19:46.500 is an observed tendency of many of these AI systems to basically suck up to or flatter the humans they're
00:19:52.600 talking to, often in ways that are just extremely over the top. If you'd like to continue listening
00:20:00.920 to this conversation, you'll need to subscribe at samharris.org. Once you do, you'll get access to
00:20:07.060 all full length episodes of the Making Sense podcast. The Making Sense podcast is ad free and
00:20:12.580 relies entirely on listener support. And you can subscribe now at samharris.org.
00:20:22.600 Thank you.