The Podcast of the Lotus Eaters - PREVIEW： Brokenomics ｜ Living in Space with Grant Donahue： Part 2

00:00:00.000 so having covered um living in space um from the practical to the ridiculous

00:00:14.660 um you did touch earlier on ai government you you seem to have gone a little bit

00:00:20.120 um negative on the whole thing what was going on there well it's i think that the problem with

00:00:27.000 the way that we we examine ai is we tend to over and under anthropovize it in that we tend to assume

00:00:35.060 that it can't become dangerous until it's like us and that's not true um and we also tend to assume

00:00:39.960 that it only becomes dangerous in the ways that human beings are dangerous and that's also not

00:00:43.520 true um so i understand in principle but what do you mean by that so um the classic example of the

00:00:54.400 ai of ai going wrong was the paperclip machine right like a paperclip machine is dangerous

00:00:59.960 um because and it's a toy example it's comical people laugh at it but you tell a machine you

00:01:06.000 give it a reward function you make an optimizer is the ai uh research term where its reward function

00:01:11.380 is just the more paperclips there are the higher the reward right and assuming that it's a sufficiently

00:01:17.360 powerful ai what it immediately does is kill everyone on earth and turn them into paperclips

00:01:20.800 you know you extract the iron from blood you you you convert everything because there's no bounding

00:01:25.860 to it right but that's obviously ridiculous it's it's it it the idea of the ai being given that

00:01:32.820 autonomy and not stopped at any point is silly okay okay well that that that might be silly but i can

00:01:38.340 well envisage a future in which the ai is told um the most important thing that you can do is stop

00:01:45.020 climate change by the way humans cause climate change right off you go go and run our society

00:01:50.640 yeah um and now there are ways to build ai that that aren't optimizers you can build satisficers

00:01:59.120 which essentially say if you pass a certain threshold you don't care the problem is that

00:02:03.420 satisficers tend to build optimizers and this is actually something we've proved um where you give

00:02:08.840 ai agents that behave like optimizers they'll actually produce sub-agents ais build sub-agents

00:02:14.800 all the time they already do that um because it's efficient and the problem is that optimizers are

00:02:19.860 really good at achieving results um but the the fundamental problem at the bottom of all of this

00:02:25.620 is courageability we still do not know how to create an ai which is courageable and we don't

00:02:33.560 even know how how to do it theoretically let alone practically right and what do i mean by courageable

00:02:38.240 we don't know how to create a system which is trying to optimize a goal while at the same time telling

00:02:44.240 that ai or instructing that ai such that that goal is not the final goal and it may change

00:02:49.320 because in any system where your goals can be changed you don't want your goals to be changed

00:02:57.680 right like you and i we can't really have our direct goals changed the technology doesn't exist

00:03:03.460 but imagine for a moment if i were to put this scenario to you you like steak and you like having

00:03:10.300 things that have been killed but what if i were to say here's a pill and you take this vegetarian

00:03:15.600 food will taste as good as steak forever and it'll also reprogram your brain such that you don't

00:03:20.580 care about whether things get killed or not you probably are not going to take that pill

00:03:24.740 but i could eat from that point on i could eat broccoli and it tastes like steak

00:03:29.380 yeah and and which is more it would satisfy you but it would also reprogram your brain such that you

00:03:36.440 uh you you you've never you can you you have no desire to see things die it's you're just perfectly

00:03:43.020 happy because you just you don't care about anything other than the slop you don't care about

00:03:45.900 what if what if when i went to the pharmacy you know they they reached into the counter and gave me

00:03:51.560 the wrong pill and this one i don't know maybe gay or left or something like that well let's let's

00:03:57.020 talk about that let's say they let's all let's let's say they're offering you a pill that made that

00:04:01.800 made leftism make total sense to you and and which is more if it was achieved you would be perfectly

00:04:07.260 happy in a state of euphoria forever you wouldn't accept that pill yes because there's something more

00:04:14.380 than euphoria and which is more you would probably fight within an inch of your life to prevent that

00:04:18.500 pill being administered to you yes right so you what we're demonstrating there is you don't actually

00:04:24.740 care about the sum total happiness you'll have at the end you care about your goals right now i'm not

00:04:30.220 i'm not dopamine maximizing i'm doing something else exactly right um but considering that i don't

00:04:38.880 now have a way to make people courageable how on earth could i make something which is considerably

00:04:43.360 more goal-oriented than people courageable um um right so let's say you have an ai where it wants

00:04:54.180 paperclips you give it a reward function of paperclips but you want to make it so that it knows that

00:05:00.200 really i just want enough paperclips i want enough paperclips that if i want to clip some paper

00:05:04.000 together that's fine the problem is that the more powerful you make an ai the more likely it is to

00:05:09.760 begin behaving unpredictably dangerously and deceptively not because of any ill will but because

00:05:15.400 it doesn't want to have its goals changed because it is not courageable so so imagine that you have

00:05:24.600 reason to believe or we have reason to believe that ais are sticky around goals they already are we

00:05:31.980 already have proved that weak ai that we have in a lab already behaved are already demonstrating

00:05:37.840 deceptive alignment where they they where they act as if their reward function is in one state when it

00:05:43.000 isn't that's already happening okay that does sound a bit worrying i'll grant you that so the classic

00:05:51.300 example of this is you have a a racing game where you have an ai and the ai is just blindly dropped

00:05:58.380 in it is just poking it controls blindly right but its reward function is maximizing the score

00:06:02.920 and normally the way you get a score is by winning the race but what the the ai does is it learns it can

00:06:08.920 glitch the physics engine right and it can just slam into a wall and teleport back behind it and just

00:06:15.580 complete the loops so the ai stops getting getting better at racing yes he just learns how to glitch

00:06:21.460 the physics engine which isn't what we wanted to do um there is another example this one was was humorous but

00:06:28.900 it was very concerning because it it involved giving an ai an actual mechanical arm which is it what it was

00:06:35.560 told um basically we wanted to teach an ai how to flip a pancake without dropping the pancake and

00:06:45.580 but actually flipping a pancake is very hard so how do we measure when it's failed well when the

00:06:50.440 pancake hits the ground so what's the ai do well it says well every time i just randomly input things

00:06:55.740 it drops it to the ground but if i fling it into the ceiling as hard as i can that maximizes the amount

00:07:01.740 of time before it hits the ground which is immediately what it did okay yeah i mean that's also the sort of

00:07:10.760 thing a human autist would do but okay yeah exactly and that's my point right these ais are provably not

00:07:16.120 sentient they're not intelligent but if you but you can but they can get really good at certain tasks

00:07:23.260 and ai can get better at a racing game than any human ever could so it becomes task competent but its

00:07:29.860 goals are simplistic and we can't figure out how to make an ai that changes goals and so what that's

00:07:35.200 telling us is that it's way easier to make an ai smart than it is to make an ai aligned so it's

00:07:40.660 likely to be the case that ais will get smarter faster than they will get aligned okay i see where

00:07:47.140 you're going with this now yeah yeah that's the problem which is that if an ai is good at getting

00:07:51.860 things and it's going to get better at getting things but it's bad at having its goals aligned

00:07:56.320 with us and it's not going to get better at that as quickly what's going to happen is ai are going to

00:08:00.700 become more intelligent more powerful more quickly than they're going to become human aligned

00:08:04.760 okay so this makes sense because i was going to push back and say well look to be fair we've only

00:08:08.500 been doing the ai thing for you know 18 months or whatever so you know relax a bit but if the

00:08:14.220 principle is that it gets smarter faster than it gets aligned then that holds true whether you're

00:08:20.340 doing it for the next 18 months or 18 years yes i see the problem now and we unlike many other

00:08:26.900 technologies where we get what we call a warning shot that slows us down there's reason to believe a

00:08:31.540 poorly ai poorly aligned ai wouldn't give us one so so do you do you think they're going to kill us

00:08:38.440 all then or or fate worse than death yes i suppose yes so imagine you gave it this reward function

00:08:47.000 we want human beings to be happy well how do you explain to an ai what happiness is oh it's dopamine

00:08:51.800 so it captures every human being on earth wires something into your head that gives you dopamine and

00:08:56.140 just puts a battery farm of humans somewhere right with all the humans on earth with dopamine being

00:09:01.100 zapped into their brain well that is the current world economic forum plan so i mean they've got

00:09:06.140 some competition the ais for that one exactly but that's that's why you there's actually a study

00:09:11.240 where can you look at corporations and ngos like poorly aligned ai

00:09:15.040 i like where this is going if you would like to see the full version of this premium video

00:09:21.340 please head over to lotus eaters.com and subscribe to gain full access to all of our premium content

The Podcast of the Lotus Eaters - August 26, 2025

PREVIEW： Brokenomics ｜ Living in Space with Grant Donahue： Part 2

Episode Stats

Length

Words per Minute

Word Count

Sentence Count

Hate Speech Sentences

Summary

Transcript