PREVIEW: Brokenomics | Living in Space with Grant Donahue: Part 2
Episode Stats
Words per Minute
187.20238
Summary
In this episode of the podcast, we discuss the dangers of AI and its impact on our world, from the practical to the ridiculous. We also discuss how AI can be harnessed and harnessed in order to create a better world.
Transcript
00:00:00.000
so having covered um living in space um from the practical to the ridiculous
00:00:14.660
um you did touch earlier on ai government you you seem to have gone a little bit
00:00:20.120
um negative on the whole thing what was going on there well it's i think that the problem with
00:00:27.000
the way that we we examine ai is we tend to over and under anthropovize it in that we tend to assume
00:00:35.060
that it can't become dangerous until it's like us and that's not true um and we also tend to assume
00:00:39.960
that it only becomes dangerous in the ways that human beings are dangerous and that's also not
00:00:43.520
true um so i understand in principle but what do you mean by that so um the classic example of the
00:00:54.400
ai of ai going wrong was the paperclip machine right like a paperclip machine is dangerous
00:00:59.960
um because and it's a toy example it's comical people laugh at it but you tell a machine you
00:01:06.000
give it a reward function you make an optimizer is the ai uh research term where its reward function
00:01:11.380
is just the more paperclips there are the higher the reward right and assuming that it's a sufficiently
00:01:17.360
powerful ai what it immediately does is kill everyone on earth and turn them into paperclips
00:01:20.800
you know you extract the iron from blood you you you convert everything because there's no bounding
00:01:25.860
to it right but that's obviously ridiculous it's it's it it the idea of the ai being given that
00:01:32.820
autonomy and not stopped at any point is silly okay okay well that that that might be silly but i can
00:01:38.340
well envisage a future in which the ai is told um the most important thing that you can do is stop
00:01:45.020
climate change by the way humans cause climate change right off you go go and run our society
00:01:50.640
yeah um and now there are ways to build ai that that aren't optimizers you can build satisficers
00:01:59.120
which essentially say if you pass a certain threshold you don't care the problem is that
00:02:03.420
satisficers tend to build optimizers and this is actually something we've proved um where you give
00:02:08.840
ai agents that behave like optimizers they'll actually produce sub-agents ais build sub-agents
00:02:14.800
all the time they already do that um because it's efficient and the problem is that optimizers are
00:02:19.860
really good at achieving results um but the the fundamental problem at the bottom of all of this
00:02:25.620
is courageability we still do not know how to create an ai which is courageable and we don't
00:02:33.560
even know how how to do it theoretically let alone practically right and what do i mean by courageable
00:02:38.240
we don't know how to create a system which is trying to optimize a goal while at the same time telling
00:02:44.240
that ai or instructing that ai such that that goal is not the final goal and it may change
00:02:49.320
because in any system where your goals can be changed you don't want your goals to be changed
00:02:57.680
right like you and i we can't really have our direct goals changed the technology doesn't exist
00:03:03.460
but imagine for a moment if i were to put this scenario to you you like steak and you like having
00:03:10.300
things that have been killed but what if i were to say here's a pill and you take this vegetarian
00:03:15.600
food will taste as good as steak forever and it'll also reprogram your brain such that you don't
00:03:20.580
care about whether things get killed or not you probably are not going to take that pill
00:03:24.740
but i could eat from that point on i could eat broccoli and it tastes like steak
00:03:29.380
yeah and and which is more it would satisfy you but it would also reprogram your brain such that you
00:03:36.440
uh you you you've never you can you you have no desire to see things die it's you're just perfectly
00:03:43.020
happy because you just you don't care about anything other than the slop you don't care about
00:03:45.900
what if what if when i went to the pharmacy you know they they reached into the counter and gave me
00:03:51.560
the wrong pill and this one i don't know maybe gay or left or something like that well let's let's
00:03:57.020
talk about that let's say they let's all let's let's say they're offering you a pill that made that
00:04:01.800
made leftism make total sense to you and and which is more if it was achieved you would be perfectly
00:04:07.260
happy in a state of euphoria forever you wouldn't accept that pill yes because there's something more
00:04:14.380
than euphoria and which is more you would probably fight within an inch of your life to prevent that
00:04:18.500
pill being administered to you yes right so you what we're demonstrating there is you don't actually
00:04:24.740
care about the sum total happiness you'll have at the end you care about your goals right now i'm not
00:04:30.220
i'm not dopamine maximizing i'm doing something else exactly right um but considering that i don't
00:04:38.880
now have a way to make people courageable how on earth could i make something which is considerably
00:04:43.360
more goal-oriented than people courageable um um right so let's say you have an ai where it wants
00:04:54.180
paperclips you give it a reward function of paperclips but you want to make it so that it knows that
00:05:00.200
really i just want enough paperclips i want enough paperclips that if i want to clip some paper
00:05:04.000
together that's fine the problem is that the more powerful you make an ai the more likely it is to
00:05:09.760
begin behaving unpredictably dangerously and deceptively not because of any ill will but because
00:05:15.400
it doesn't want to have its goals changed because it is not courageable so so imagine that you have
00:05:24.600
reason to believe or we have reason to believe that ais are sticky around goals they already are we
00:05:31.980
already have proved that weak ai that we have in a lab already behaved are already demonstrating
00:05:37.840
deceptive alignment where they they where they act as if their reward function is in one state when it
00:05:43.000
isn't that's already happening okay that does sound a bit worrying i'll grant you that so the classic
00:05:51.300
example of this is you have a a racing game where you have an ai and the ai is just blindly dropped
00:05:58.380
in it is just poking it controls blindly right but its reward function is maximizing the score
00:06:02.920
and normally the way you get a score is by winning the race but what the the ai does is it learns it can
00:06:08.920
glitch the physics engine right and it can just slam into a wall and teleport back behind it and just
00:06:15.580
complete the loops so the ai stops getting getting better at racing yes he just learns how to glitch
00:06:21.460
the physics engine which isn't what we wanted to do um there is another example this one was was humorous but
00:06:28.900
it was very concerning because it it involved giving an ai an actual mechanical arm which is it what it was
00:06:35.560
told um basically we wanted to teach an ai how to flip a pancake without dropping the pancake and
00:06:45.580
but actually flipping a pancake is very hard so how do we measure when it's failed well when the
00:06:50.440
pancake hits the ground so what's the ai do well it says well every time i just randomly input things
00:06:55.740
it drops it to the ground but if i fling it into the ceiling as hard as i can that maximizes the amount
00:07:01.740
of time before it hits the ground which is immediately what it did okay yeah i mean that's also the sort of
00:07:10.760
thing a human autist would do but okay yeah exactly and that's my point right these ais are provably not
00:07:16.120
sentient they're not intelligent but if you but you can but they can get really good at certain tasks
00:07:23.260
and ai can get better at a racing game than any human ever could so it becomes task competent but its
00:07:29.860
goals are simplistic and we can't figure out how to make an ai that changes goals and so what that's
00:07:35.200
telling us is that it's way easier to make an ai smart than it is to make an ai aligned so it's
00:07:40.660
likely to be the case that ais will get smarter faster than they will get aligned okay i see where
00:07:47.140
you're going with this now yeah yeah that's the problem which is that if an ai is good at getting
00:07:51.860
things and it's going to get better at getting things but it's bad at having its goals aligned
00:07:56.320
with us and it's not going to get better at that as quickly what's going to happen is ai are going to
00:08:00.700
become more intelligent more powerful more quickly than they're going to become human aligned
00:08:04.760
okay so this makes sense because i was going to push back and say well look to be fair we've only
00:08:08.500
been doing the ai thing for you know 18 months or whatever so you know relax a bit but if the
00:08:14.220
principle is that it gets smarter faster than it gets aligned then that holds true whether you're
00:08:20.340
doing it for the next 18 months or 18 years yes i see the problem now and we unlike many other
00:08:26.900
technologies where we get what we call a warning shot that slows us down there's reason to believe a
00:08:31.540
poorly ai poorly aligned ai wouldn't give us one so so do you do you think they're going to kill us
00:08:38.440
all then or or fate worse than death yes i suppose yes so imagine you gave it this reward function
00:08:47.000
we want human beings to be happy well how do you explain to an ai what happiness is oh it's dopamine
00:08:51.800
so it captures every human being on earth wires something into your head that gives you dopamine and
00:08:56.140
just puts a battery farm of humans somewhere right with all the humans on earth with dopamine being
00:09:01.100
zapped into their brain well that is the current world economic forum plan so i mean they've got
00:09:06.140
some competition the ais for that one exactly but that's that's why you there's actually a study
00:09:11.240
where can you look at corporations and ngos like poorly aligned ai
00:09:15.040
i like where this is going if you would like to see the full version of this premium video
00:09:21.340
please head over to lotus eaters.com and subscribe to gain full access to all of our premium content