Making Sense - Sam Harris - June 12, 2025


#420 — Countdown to Superintelligence


Episode Stats

Length: 20 minutes
Words per Minute: 174
Word Count: 3,640
Sentence Count: 199



Summary

Daniel Kokotajlo left OpenAI last year. In this episode, he talks about the circumstances under which he left, why he decided to leave, and the lessons he learned from his experience at OpenAI.


Transcript

00:00:00.000 Welcome to the Making Sense Podcast. This is Sam Harris. Just a note to say that if you're
00:00:11.740 hearing this, you're not currently on our subscriber feed, and will only be hearing
00:00:15.720 the first part of this conversation. In order to access full episodes of the Making Sense
00:00:20.060 Podcast, you'll need to subscribe at samharris.org. We don't run ads on the podcast, and therefore
00:00:26.240 it's made possible entirely through the support of our subscribers. So if you enjoy what we're
00:00:30.200 doing here, please consider becoming one. I am here with Daniel Kokotajlo. Daniel,
00:00:38.920 thanks for joining me. Thanks for having me. So we'll get into your background in a second.
00:00:44.220 I just want to give people a reference that is going to be of great interest after we have this
00:00:49.560 conversation. You and a bunch of co-authors wrote a blog post titled AI 2027, which is a very
00:00:57.680 compelling read, and we're going to cover some of it, but I'm sure there are details there that we're
00:01:02.600 not going to get to. So I highly recommend that people read that. You might even read that before
00:01:08.020 coming back to listen to this conversation. Daniel, what's your background? We're going to talk about
00:01:13.700 the circumstances under which you left OpenAI, but maybe you can tell us how you came to work at
00:01:19.260 OpenAI in the first place. Sure. Yeah. So I've been sort of in the AI field for a while, mostly doing
00:01:27.560 forecasting and a little bit of alignment research. So that's probably why I got hired at OpenAI. I was
00:01:33.700 on the governance team. We were making policy recommendations to the company and trying to
00:01:38.880 predict where all of this was headed. I worked at OpenAI for two years, and then I quit last year.
00:01:43.700 And then I worked on AI 2027 with the team that we hired. And one of your co-authors on that blog
00:01:51.240 post was Scott Alexander. That's right. Yeah. It's, again, very well worth reading.
00:01:57.460 So what happened at OpenAI that precipitated your leaving? And can you describe the circumstances of
00:02:05.680 your leaving? Because I seem to remember you had to walk away. You refused to sign an NDA or
00:02:11.760 a non-disparagement agreement or something and had to walk away from your equity. And that was
00:02:16.440 perceived as both a sign of the scale of your alarm and the depth of your principles.
00:02:23.500 What happened over there? Yeah. So this story has been covered elsewhere in greater detail. But the
00:02:28.720 summary is that there wasn't any one particular event or scary thing that was happening. It was more
00:02:35.880 the general trends. So if you've read AI 2027, you get a sense of the sorts of things that I'm
00:02:42.480 expecting to happen in the future. And frankly, I think it's going to be incredibly dangerous.
00:02:49.040 And I think that there's a lot that society needs to be doing to get ready for this and to try to
00:02:53.320 avoid those bad outcomes and to steer things in a good direction. And there's especially a lot that
00:02:57.900 companies who are building this technology need to be doing, which we'll get into later.
00:03:02.200 And not only was OpenAI not really doing those things, OpenAI was sort of not on track to get
00:03:08.560 ready or to take these sorts of concerns seriously, I think. And I gradually came to believe this over
00:03:14.640 my time there and gradually came to think that, well, basically that we were on a path towards
00:03:19.020 something like AI 2027 happening and that it was hopeless to try to sort of be on the inside and
00:03:24.480 talk to people and try to steer things in a good direction that way.
00:03:28.560 So that's why I left. And then with the equity thing, when people leave,
00:03:34.780 they have this agreement that they try to get you to sign, which among other
00:03:40.500 things says that you basically have to agree never to criticize the company again and also never to
00:03:44.900 tell anyone about this agreement, which was the clause that I found objectionable. And if you don't
00:03:50.900 sign, then they take away all of your equity, including your vested equity.
00:03:54.500 That's sort of a shocking detail. Is that even legal? I mean, isn't vested equity vested equity?
00:03:59.240 Yeah. I mean, one of the lessons I learned from this whole experience is it's good to get lawyers and
00:04:02.880 know your rights, you know? I don't know if it was legal actually, but what happened was my wife and
00:04:09.300 I talked about it and ultimately decided not to sign, even though we knew we would lose our equity
00:04:14.300 because we wanted to have the moral high ground and to be able to criticize the company in the future.
00:04:18.440 And happily, it worked out really well for us because there was a huge uproar. Like when this
00:04:24.040 came to light, a lot of employees were very upset. You know, the public was upset and the company very
00:04:28.960 quickly backed down and changed the policies. So we got to keep our equity actually.
00:04:34.120 Okay. Yeah, good. So let's remind people about what this phrase alignment problem means. I mean,
00:04:43.460 I've just obviously discussed this topic a bunch on the podcast over the years, but many people may
00:04:48.480 be joining us relatively naive to the topic. How do you think about the alignment problem? And
00:04:54.920 why is it that some very well-informed people don't view it as a problem at all?
00:05:02.560 Well, it's different for every person. I guess working backwards, well, I'll work forwards. So first
00:05:07.240 of all, what is the alignment problem? It's the problem of figuring out how to make AIs
00:05:11.600 sort of reliably do what we want. It's maybe more specifically the problem of
00:05:16.600 shaping the cognition of the AIs so that they have the goals that we want them to have. They
00:05:24.020 have the virtues that we want them to have, such as honesty, for example. It's very important that
00:05:28.160 our AIs be honest with us. Getting them to reliably be honest with us is part of the alignment
00:05:33.080 problem. And it's sort of an open secret that we don't really have a good solution to the alignment
00:05:38.700 problem right now. You can go read the literature on this. You can also look at what's currently
00:05:45.480 happening. The AIs are not actually reliably honest. And there's many documented examples of
00:05:50.020 them saying things that we're pretty sure they know are not true. So this is a big, open,
00:05:55.520 unsolved problem that we are gradually making progress towards. And right now, the stakes are
00:06:00.900 very low. Right now, we just have these chatbots that even when they're misaligned and even when they
00:06:06.700 cheat or lie or whatever, it's not really that big of a problem. But these companies,
00:06:12.840 OpenAI, Anthropic, Google DeepMind, some of these other companies as well, they are racing to build
00:06:19.580 superintelligence. You can see this on their website and in the statements of the CEOs,
00:06:24.840 especially OpenAI and Anthropic have literally said that they are building superintelligence.
00:06:31.080 They're trying to build it, that they think they will succeed around the end of this decade or
00:06:35.160 before this decade is out. What is superintelligence? Superintelligence is an AI system that is better
00:06:41.480 than the best humans at everything, while also being faster and cheaper. So if they succeed in
00:06:48.520 getting to superintelligence, then the alignment problem suddenly becomes extremely high stakes.
00:06:53.720 We need to make sure that any superintelligences that are built, or at least the first ones that are
00:06:57.900 built are aligned. Otherwise, terrible things could happen, such as human extinction.
00:07:04.500 Yeah, so we'll get there. The leap from having what one person called functionally a country of
00:07:11.560 geniuses in a data center, the leap from that to real world risk and something like human extinction
00:07:18.880 is going to seem counterintuitive to some people. So we'll definitely cover that. But why is it,
00:07:24.160 I mean, we have people, I guess some people have moved on this topic. I mean, so forgive me if I'm
00:07:29.580 unfairly maligning anyone, but I remember someone like Yann LeCun over at Facebook, who's obviously one
00:07:37.220 of the pioneers in the field, just doesn't give any credence at all to the concept of an alignment
00:07:43.540 problem. I've lost touch with how these people justify that degree of insouciance. What's your view of
00:07:52.700 the skepticism that you meet there?
00:07:54.960 Well, it's different for different people. And honestly, it would be helpful to have a more
00:08:04.140 specific example of something someone has said for me to respond to. With Yann LeCun, if I remember
00:08:04.140 correctly, for a while he was both saying things to the effect of AIs are just tools and they're going
00:08:11.780 to be submissive and obedient to us because they're AIs and there just isn't much of a problem here.
00:08:16.420 And also saying things along the lines of, they're never going to be super intelligent or like,
00:08:21.560 you know, the current LLMs are not on a path to AGI. They're not going to be able to, you know,
00:08:28.160 actually autonomously do a bunch of stuff.
00:08:30.680 It seems to me that the thinking on that front has changed a lot.
00:08:34.440 Indeed. Yann himself has sort of walked that back a bit and is now starting to, he's still sort of like
00:08:40.140 an AI skeptic, but now he's, I think there's a quote where he said something like, we're not
00:08:45.640 going to get to superintelligence in the next five years or something, which is a much milder claim
00:08:50.580 than what he used to be saying.
00:08:52.740 Well, when I started talking about this, I think the first time was around 2016. So nine years ago,
00:08:58.720 I bumped into a lot of people who would say this isn't going to happen for 50 years, at least.
00:09:05.240 I'm not hearing increments of half centuries thrown around much anymore. I mean, a lot of people are
00:09:11.320 debating the difference between your time horizon, like two years or three years and, you know, five
00:09:18.340 or 10. I mean, 10 at the outside is what I'm hearing from people who seem cautious.
00:09:23.520 Yep. I think that's basically right as a description of what smart people in the field are sort of
00:09:28.760 converging towards. And I think that's an incredibly important fact for the general public to be aware
00:09:33.320 of. Everyone needs to know that the field of AI experts and AI forecasters has lowered its
00:09:40.180 timelines and is now thinking that there is a substantial chance that some of these companies
00:09:45.500 will actually succeed in building superintelligence sometime around the end of the decade or so.
00:09:50.300 There's lots of disagreement about timelines exactly, but that's sort of like where a lot of
00:09:55.040 the opinions are heading now.
00:09:56.860 So the problem of alignment is the most grandiose, speculative, science fiction inflected version
00:10:06.000 of the risk posed by AI, right? This is the risk that a super intelligent, self-improving,
00:10:13.220 autonomous system could get away from us and not have our well-being in its sights or actually be,
00:10:20.900 you know, hostile to it for some reason that we didn't put into the AI. And
00:10:26.720 therefore, we could find ourselves playing chess against the perfect chess engine and failing.
00:10:33.080 And that poses an existential threat, which we'll describe. But obviously, there are nearer term
00:10:38.120 concerns that more and more people are worried about. There's the human misuse of increasingly
00:10:43.660 powerful AI. There's, we might call this a containment problem. I think Mustafa Suleyman over
00:10:51.820 at Microsoft, who used to be at DeepMind, tends to think of the problem of containment first,
00:10:56.960 that really it's, you know, aligned or not, as this technology gets more democratized,
00:11:03.440 people can decide to put it to sinister use, which is to say, you know, use that we would
00:11:10.200 consider unaligned. They can, you know, change the system-level prompt and, you know,
00:11:18.100 make these tools malicious as they become increasingly powerful. And it's hard to see
00:11:23.860 how we can contain the spread of that risk. And yeah, I mean, so then there's just the other
00:11:32.320 issues like, you know, job displacement and economic and political concerns that are all
00:11:38.500 too obvious. I mean, it's just the spread of misinformation and the political instability
00:11:42.960 that can arise in the context of spreading misinformation and shocking degrees of wealth
00:11:48.920 inequality that might, you know, initially be unmasked by the growth of this technology.
00:11:54.580 Let's just get into this landscape, knowing that misaligned superintelligence is the kind
00:11:59.680 of the final topic we want to talk about. What is it that you and your co-authors are predicting?
00:12:07.040 Why did you title your piece AI 2027? What do the next two years, on your account, hold for us?
00:12:12.960 That's a lot to talk about. So the reason why we titled it AI 2027 is because in the scenario that we
00:12:19.800 wrote, the most important pivotal events and decisions happen in 2027. The story continues
00:12:26.380 to 2028, 2029, et cetera. But the most important part of the story happens in 2027. For example,
00:12:32.580 what was called in the literature AI takeoff happens in AI 2027. AI takeoff is this
00:12:38.860 forecasted dynamic of the speed of AI research accelerating dramatically when AIs are able to
00:12:45.200 do AI research much better than humans. So in other words, when you automate the AI research,
00:12:50.600 probably it will go faster. And there's a question about how much faster it will go,
00:12:55.180 what that looks like, et cetera, when it will eventually asymptote. But that whole dynamic is
00:12:59.740 called AI takeoff. And it happens in our scenario in 2027. I should say, as a footnote, I've updated
00:13:06.380 my timelines to be a little bit more optimistic after writing this. And now I would say 2028 is more
00:13:10.720 likely. But broadly speaking, I still feel like that's basically the track we're on.
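
To make the shape of that dynamic concrete, here is a minimal toy sketch in Python. It is not the model behind AI 2027, and every parameter in it (the automation date, the speedup factor, the ceiling) is invented purely for illustration; it only shows how progress can crawl along at a human baseline, compound once AIs accelerate AI research, and eventually level off at some asymptote.

# Toy illustration (not the AI 2027 authors' actual model) of the takeoff
# dynamic described above: once AIs speed up AI research, progress compounds,
# and the open questions are how fast it goes and where it saturates.
# All numbers here are made up purely to show the shape of the curve.

def simulate_takeoff(months=36, base_progress=1.0, automation_month=24,
                     speedup_per_capability=0.5, max_multiplier=50.0):
    """Return cumulative 'research progress' per month under a crude model.

    Before `automation_month`, progress accrues at the human baseline rate.
    After it, the effective research speed is multiplied by a factor that
    grows with accumulated progress but saturates at `max_multiplier`
    (the 'asymptote' mentioned in the conversation).
    """
    capability = 0.0
    trajectory = []
    for month in range(months):
        if month < automation_month:
            speed = base_progress                                 # human-only research
        else:
            multiplier = 1.0 + speedup_per_capability * capability
            speed = base_progress * min(multiplier, max_multiplier)  # saturating speedup
        capability += speed
        trajectory.append((month, round(capability, 1)))
    return trajectory

if __name__ == "__main__":
    for month, cap in simulate_takeoff():
        print(f"month {month:2d}: cumulative progress {cap:8.1f}")
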
00:13:16.380 So when you say AI takeoff, are you, is that synonymous with the older phrase,
00:13:20.460 an intelligence explosion?
00:13:21.420 Basically. Yeah. Yeah. I mean, that phrase has been with us for a long time since the mathematician
00:13:27.780 I. J. Good, I think in the fifties, who posited this, just extrapolating, you know, from the general
00:13:34.500 principle that once you had intelligent machines devising the next generation of intelligent
00:13:39.780 machines, that this process could be self-sustaining and asymptotic and get away from us. And he dubbed
00:13:46.020 this an intelligence explosion. So this is mostly a story of software improving software. I mean,
00:13:54.220 the AI at this point doesn't yet have its hands on, you know, physical factories that are
00:13:58.760 building new chips or robots.
00:14:00.900 That's right. Yeah. So, I mean, this is also another important thing that I think
00:14:05.520 I would like people to think about more and understand better, which is that, um,
00:14:09.580 at least in our view, most of the important decisions that affect the fate of the world will be made
00:14:14.780 prior to any massive transformations of the economy due to AI. And if you want to understand why, or how,
00:14:22.700 or what we mean by that, et cetera, well, it's all laid out. And in our scenario, you can sort of see the
00:14:27.520 events unfold. And then you sort of, after you finish reading it, you can be like, oh yeah, I guess
00:14:31.240 like the world looked kind of pretty normal in 2027, even though, you know, behind closed doors at the
00:14:36.580 AI companies, all of these incredibly impactful decisions were being made about automating AI research
00:14:41.200 and producing superintelligence and so forth. And then in 2028, things are going crazy in the real
00:14:46.720 world. And there's all these new factories and robots and stuff being built orchestrated by the
00:14:50.840 superintelligences. But in terms of like where to intervene, you don't want to wait until the
00:14:55.960 superintelligences are already building all the factories. You want to like try to steer things in a
00:15:01.140 better direction before then. Yeah. So in your piece, I mean, it is kind of a piece of speculative
00:15:07.100 fiction in a way, but it's all too plausible. And what's interesting is just some of the
00:15:13.420 disjunctions you point out. I mean, like moments where the economy, you know, for real
00:15:19.100 people, is actually probably being destroyed because people are becoming far less valuable. There's another blog
00:15:25.460 post that perhaps you know about called The Intelligence Curse, which goes over some of this
00:15:30.500 ground as well, which I recommend people look up, but that's really just
00:15:34.420 a name for this principle that once AI is better at, you know, virtually everything than people are,
00:15:41.560 right? Once it's all analogous to chess, the value of people just evaporates from the point of view of
00:15:49.660 companies and even governments, right? I mean, people are just not necessary, therefore,
00:15:55.100 because they can't add value to any process that's running the economy, or to the most
00:16:00.240 important processes that are running the economy. So there's just these interesting
00:16:04.240 moments where the stock market might be booming, but the economy for most people is
00:16:10.360 actually in free fall. And then you get into the implications of an arms race between the U.S.
00:16:16.560 and China. And it's all too plausible the moment you admit that
00:16:23.820 we are in this arms race condition. And an arms race is precisely the situation wherein all the
00:16:30.080 players are not holding safety as their top priority. Yeah. And unfortunately, you know,
00:16:36.700 I don't think it's good that we're in an arms race, but it does seem to be what we're headed
00:16:40.660 towards. And it seems to be what the companies are also, like, pushing along, right? If you look
00:16:47.140 at the rhetoric coming out of the lobbyists, for example, they talk a lot about how it's important
00:16:51.680 to beat China and how the U.S. needs to maintain its competitive advantage in AI and so forth.
00:16:57.300 And I mean, more generally, like it's kind of like, I'm not sure what the best way to say this
00:17:02.480 is, but basically a lot of people at these companies building this technology expect something
00:17:09.540 more or less like AI 2027 to happen and have expected this for years. And like, this is what
00:17:14.800 they are building towards and they're doing it because they think if we don't do it, someone else
00:17:18.120 will do it worse. And, you know, they think it's going to work out well.
00:17:22.980 Do they think it's going to work out well, or they just think that there is no alternative
00:17:27.720 because we have a coordination problem we can't solve? I mean, if Anthropic stops,
00:17:34.020 they know that OpenAI is not going to stop. They can't agree to, you know, all the
00:17:39.860 U.S. players can't agree to stop together. And even if they did, they know that China wouldn't
00:17:45.140 stop. Right. So it's just a coordination problem that can't be solved, even if everyone
00:17:50.280 agrees that an arms race condition carries some significant probability,
00:17:56.020 I mean, maybe it's only 10% in some people's minds, but still a non-negligible probability,
00:18:00.540 of birthing something that destroys us. Yeah. My take on that is it's both.
00:18:06.020 So I think that, you know, I have lots of friends at these companies and I used to work there and I
00:18:10.480 talked to lots of people there all the time. In my opinion, I think on average, they're overly
00:18:14.340 optimistic about where all this is headed, perhaps because they're biased, because their job
00:18:20.280 depends on them thinking it's a good idea to do all of this. But also, separately, there is
00:18:25.600 both a real, like, arms race dynamic where it just really is true that if one company
00:18:32.400 decides not to do this, then other companies will probably just do it anyway. And it really is true
00:18:37.060 that if one country decides not to do this, other countries will probably do it anyway. And then
00:18:42.080 there's also an added element of a perceived dynamic there, where a lot of people are basically not
00:18:48.180 even trying to coordinate the world to handle this responsibly and to put guardrails in place or to
00:18:53.080 slow down or whatever. And they're not trying because they basically think it's hopeless to,
00:18:57.220 to achieve that level of coordination. Well, you mentioned that the LLMs are already showing
00:19:03.080 some deceptive characteristics. I mean, I guess we might wonder whether what is functionally
00:19:08.620 appearing as deception is really deception. I mean, really motivated in any sense that we
00:19:14.480 would, uh, you know, whether we're guilty of anthropomorphizing these systems by calling
00:19:18.780 it lying or deception. But what's the behavior that we have seen from some of these systems
00:19:24.620 that we're calling, uh, lying or cheating or deception?
00:19:29.100 Yeah. Great question. So there's a couple of things. Keywords to search for are sycophancy,
00:19:34.840 reward hacking, and scheming. So, uh, there's various papers on this and there's even blog posts
00:19:41.560 by OpenAI and Anthropic, uh, detailing some examples that have been found. Um, so sycophancy
00:19:46.500 is an observed tendency of many of these AI systems to basically suck up to or flatter the humans they're
00:19:52.600 talking to, often in ways that are just extremely over the top. If you'd like to continue listening
00:20:00.920 to this conversation, you'll need to subscribe at samharris.org. Once you do, you'll get access to
00:20:07.060 all full length episodes of the Making Sense podcast. The Making Sense podcast is ad free and
00:20:12.580 relies entirely on listener support. And you can subscribe now at samharris.org.
00:20:22.600 Thank you.