The Peter Attia Drive - February 19, 2024


#290 ‒ Liquid biopsies for early cancer detection, the role of epigenetics in aging, and the future of aging research | Alex Aravanis, M.D., Ph.D.


Episode Stats


Length

2 hours and 5 minutes

Words per minute

165.39502

Word count

20,839

Sentence count

1,065

Harmful content

Misogyny

13

sentences flagged

Hate speech

7

sentences flagged


Summary

Summaries generated with gmurro/bart-large-finetuned-filtered-spotify-podcast-summ .

Alex Aravanis is the CEO and Co-Founder of Moonwalk Biosciences, a company that focuses on genome sequencing and personalized medicine. In this episode, we discuss the evolution of DNA sequencing, epigenetics, and the biology of aging.

Transcript

Transcript generated with Whisper (turbo).
Misogyny classifications generated with MilaNLProc/bert-base-uncased-ear-misogyny .
Hate speech classifications generated with facebook/roberta-hate-speech-dynabench-r4-target .
00:00:00.000 Hey, everyone. Welcome to the Drive podcast. I'm your host, Peter Attia. This podcast,
00:00:16.540 my website, and my weekly newsletter all focus on the goal of translating the science of longevity
00:00:21.520 into something accessible for everyone. Our goal is to provide the best content in health and
00:00:26.720 wellness, and we've established a great team of analysts to make this happen. It is extremely
00:00:31.660 important to me to provide all of this content without relying on paid ads. To do this, our work
00:00:36.960 is made entirely possible by our members, and in return, we offer exclusive member-only content
00:00:42.700 and benefits above and beyond what is available for free. If you want to take your knowledge of
00:00:47.940 this space to the next level, it's our goal to ensure members get back much more than the price
00:00:53.200 of the subscription. If you want to learn more about the benefits of our premium membership,
00:00:58.020 head over to peterattiamd.com/subscribe. My guest this week is Alex Aravanis.
00:01:06.920 Alex is the CEO and co-founder of Moonwalk Biosciences. I should note up front that I am
00:01:13.160 also an investor in and an advisor to Moonwalk Biosciences. Alex and I were colleagues in medical
00:01:20.020 school, so I've known Alex for a little over 25 years now. Before Moonwalk, Alex was Illumina's
00:01:25.600 chief technology officer, the SVP and head of research and product development, and under his
00:01:31.440 leadership, Illumina launched the industry-leading product for generating and analyzing most of the
00:01:36.980 world's genomic data. He developed large genome-based research and clinical applications, including
00:01:42.660 whole genome sequencing for rare disease diagnoses, comprehensive genomic profiling for cancer,
00:01:48.400 and for selecting optimal therapies, and the most advanced AI tools for interpreting genomic
00:01:54.420 information. Alex has been the founder of several biotech and healthcare companies, including Grail
00:01:59.780 Bio, where he served as the chief science officer and head of R&D. At Grail, he led the development of
00:02:06.180 its multi-cancer early screening test, Galleri, which we'll discuss at length in this podcast. He holds over
00:02:12.300 30 patents and serves on the scientific advisory board for several biotechnology companies.
00:02:17.240 Alex received his master's and his PhD in electrical engineering and his MD from Stanford University
00:02:23.980 and his undergrad in engineering from Berkeley. In this episode, we talk about two related things,
00:02:31.500 liquid biopsies and epigenetics. We cover the evolution of genome sequencing and tumor sequencing.
00:02:38.100 We then speak at length about Alex's work with Grail and liquid biopsies, including an understanding of
00:02:44.340 cell-free DNA, methylation, sensitivity, specificity, along with the positive and negative predictive
00:02:50.100 value of liquid biopsies. We then get into epigenetics, methylation, and the biology of aging.
00:02:56.660 This is an especially complicated topic, but truthfully, there are few topics in biology today
00:03:02.280 that excite me more than this. And I suspect that my enthusiasm will come across pretty clearly here.
00:03:08.820 So without further delay, please enjoy my conversation with Alex Aravanis.
00:03:19.020 Hey, Alex, great to be sitting down with you here today. I kind of wish we were doing this in person
00:03:24.040 because we haven't seen each other in person in a few months. And even that was sort of a chance
00:03:28.040 meeting. So I guess by way of background, you and I go back over 20 years now, I guess it's 25 years
00:03:34.880 that we both started med school together. It's hard to believe it's been that long, huh?
00:03:40.380 It seems like a million years ago, but it also seems like yesterday. Yeah, those are good times.
00:03:45.240 So Alex, one of the things I remember when we first met was that we pretty much clicked over
00:03:50.020 the fact that we were both engineers coming in. And we had a good group of friends that I remember in
00:03:54.220 medical school. And the one thing we had in common is not one of us was a pre-med. We were all kind of
00:03:58.720 whatever the term was they used to describe as non-traditional path to medical school. So let's
00:04:04.960 talk a little bit about just briefly your background. You came in as an electrical engineer and then you
00:04:09.800 did a PhD in a lab of a very prominent scientist by the name of Dick Tsien. Maybe tell folks a little
00:04:15.280 bit about what you did in that work and what it was that got you excited enough about science to
00:04:20.480 deviate off the traditional MD path.
00:04:22.640 Yeah, my PhD was in electrical engineering and Stanford has a cool configuration on the
00:04:28.900 campus where the engineering school is literally across the street from the medical school.
00:04:33.060 And so over time, I became more and more interested in applying signal processing techniques, circuit
00:04:40.020 design, imaging, AI, things like that, to problems in medicine that were more interesting
00:04:45.260 to me than some of the traditional engineering products and things like that. I met a world-famous
00:04:51.060 neuroscientist named, as you mentioned, Dick Tsien, who was very interested in fundamental questions
00:04:56.200 about the quantum unit of communication in the brain, which is the individual synaptic vesicle.
00:05:02.260 And there was a question of just what did it look like and how did it operate? And it was the
00:05:07.060 beginning for me of just applying these engineering tools to really important questions in biology and
00:05:11.460 helping answer them. That first story was a great article in Nature where we definitively answered
00:05:17.320 the question of how that quantum is transmitted between cells. And then went on to do several
00:05:22.840 other projects like that.
00:05:25.000 Can you say a little bit about that? How is that information transmitted?
00:05:28.700 It was really fun to come up with these problems with an engineering and communications background.
00:05:32.620 But if you look at a central neuron in the brain, and you look at the rate at which information is
00:05:38.660 transferred, it seemed to be much faster than the number of synaptic vesicles in the terminal,
00:05:44.660 right? So there was this, well, there's only 30 synaptic vesicles in the terminal by like an
00:05:49.420 electron microscope, yet you're seeing hundreds of transmissions over a few seconds. So how is that
00:05:55.920 possible? And there were various theories. There was an individual vesicle that was fusing and staying
00:06:02.280 fused and pumping neurotransmitter through it without collapsing. And that's how you could get these
00:06:07.460 so much more rapid puffs. We came up with a cute term, which was called kiss and run to explain
00:06:13.440 the phenomenon. And it, again, helped answer this fundamental question of how did the brain get
00:06:19.740 so many small neurons yet able to transmit so much information per individual connection.
00:06:26.780 So Alex, if you think about all the things that you learned during your PhD, I mean, I guess one of
00:06:32.040 the benefits of doing it where you did it in the lab you did it in was you overlapped with some other
00:06:38.280 really thoughtful folks, including a previous guest on the podcast, Karl Deisseroth. What do you think
00:06:43.500 were the most important things you learned philosophically, not necessarily technically,
00:06:48.700 that are serving you in the stuff we're going to talk about today? So we're going to talk today a lot
00:06:54.800 about liquid biopsies. We're going to talk a lot about epigenetics. We're going to talk a lot about
00:07:00.600 certainly technologies that have made those things possible. And when you think back to your background in
00:07:07.440 double E, what were the transferable skills? So I think one of them, and it's a saying in
00:07:12.460 engineering, which is if you can't build it, you don't understand it. So simply understanding a
00:07:16.980 description of something is not the same as you can build it up from scratch. And so you can't always
00:07:22.000 do that in biology, but you can do experiments where you're testing the concept of, can I really make it
00:07:27.760 work? And so I think that was an engineering concept that served me well a lot. Another, it's not
00:07:33.680 exclusive to engineering, but was being very first principled. Do we really understand how this
00:07:38.580 works? In that particular lab, there's a big emphasis on doing experiments where you always
00:07:45.180 learn something, where, you know, regardless of whether or not it confirmed or rejected your
00:07:50.300 hypothesis, you learn something new about the system. Don't do experiments where you may just
00:07:56.120 not learn anything. That was a very powerful way to think about things.
00:08:00.400 So we'll fast forward a bit, just for the sake of time, there's obviously an interesting detour where
00:08:06.020 after I go off to residency and after you finish your PhD, we still find ourselves back together
00:08:11.860 side by side in the same company for four years, which again, brought many funny stories, including
00:08:19.160 my favorite is you and I getting lost in the middle of Texas, actually not in the middle of Texas,
00:08:25.580 but just outside of El Paso and nearly running out of gas. I mean, this was no cell signal.
00:08:34.020 We were in trouble, but we somehow made it out of that one together.
00:08:36.900 Yes. Yeah. No, I remember that, that us Californians thought that there must be a Starbucks within,
00:08:42.960 you know, 10, 15 miles out in the middle of West Texas. And it turns out you can go hundreds of miles
00:08:48.300 without a Starbucks.
00:08:49.580 That's right. Passing a gas station with an eighth of a tank saying, we'll stop at the next one can be
00:08:55.900 a strategic error. There was also the time you bailed me out when I forgot my cufflinks because
00:09:02.240 you had some dental floss. Do you remember that? I don't know if you remember that.
00:09:06.340 There you go.
00:09:06.900 Yeah. You had some dental floss.
00:09:08.940 That's right. Yeah. Total MacGyver move. But anyway, let's fast forward past all that stuff. So
00:09:12.940 I don't know what year it is. It's got to be circa what, 2012. When do you end up at Illumina
00:09:19.380 for the first time? Early 2013.
00:09:22.560 Okay. Talk to me about that role. What was it that you were recruited to Illumina to do? And maybe
00:09:27.160 just tell folks who don't know what Illumina is a little bit about the company as well.
00:09:33.060 Yeah. So today, Illumina is the largest maker of DNA sequencing technologies. So when you hear about
00:09:41.760 the human genome being sequenced, things like expression data or RNA-seq, most liquid biopsies,
00:09:48.940 most tumor sequencing, finding genetic variants in kids with rare disease, most of that is done
00:09:55.260 with Illumina technology. So they also make the chemistries that process the DNA, the sequencers
00:10:00.860 that generate that information, and also the software that helps analyze it. So they really took that tool
00:10:07.960 from a very niche research technology to a standard of care in medicine and hundreds of thousands of
00:10:14.720 publications and tremendously has been advancing science. So 11 years ago, you showed up there.
00:10:21.540 What was the role you were cast in? This was earlier on in Illumina's history. What attracted me to the
00:10:27.640 company and why I was recruited was to help develop more clinical applications and more applied
00:10:34.160 applications of the technology. So the technology had a use by certain sequencing aficionados for basic
00:10:41.280 research. But the company and I agreed with the vision felt that, hey, this could be used for a lot
00:10:46.460 more. This could be used to help every cancer patient. This could be used to help people with genetic
00:10:51.780 diseases. How can we develop the technology and other aspects of it, the assays and softwares to make that
00:10:58.540 reality? I was hired to do that. It occurred to me when you even said a little bit of that, Alex, that
00:11:04.560 many of us, you and I would take for granted some of the lingo involved here, sequencing and what's
00:11:11.560 involved. But I still think it might be a bit of a black box to some people listening. And given
00:11:17.140 the topics we're going to cover today, I think just explaining to people, for example, what was done
00:11:24.860 in the late 90s, early 2000s when, quote unquote, the human genome was sequenced? What does that mean?
00:11:31.260 And how had that changed from the very first time it was done by sheer brute force in the most analog
00:11:38.240 way until even when you arrived 10, 11 years ago? So maybe walk us through what it actually means
00:11:46.480 to sequence a genome. And feel free to also throw in a little bit of background about some of the basics
00:11:52.080 of DNA and the structure, et cetera, as it pertains to that. It's some really important fundamental
00:11:56.800 stuff. A quick primer on human genetics. So in most cells of the body, you have 23 pairs of
00:12:04.220 chromosomes. They're very similar except the X and Y chromosome, which are obviously different in men
00:12:09.660 and women. Each one of those chromosomes is actually a lot of DNA packed together in a very orderly way,
00:12:17.520 where the DNA is wrapped around proteins called nucleosomes, which are composed of histones.
00:12:23.480 And then it's packed into something called chromatin, which is this mass of DNA and proteins.
00:12:28.820 And again, packed together, and then you make these units of chromosomes. Now, if you were to unwind
00:12:34.260 all of those chromosomes, pull the string on the sweater and completely unwind it, and you were to line
00:12:41.060 all of them end to end, you would have 3 billion individual bases. So the ATCG code at any given
00:12:49.860 one of those 3 billion positions, you would have a string of letters. Each one would either be ATC or G,
00:12:56.040 and it would be 3 billion long. So to sequence a whole human genome is to read out that code for an
00:13:03.480 individual. And once you do that, you then know their particular code at each of those positions.
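As a rough illustration of the description above, a genome can be modeled as one long string over a four-letter alphabet. The sketch below is a minimal example, not anything from the episode: the function names are hypothetical, the 3 billion figure is the approximate length quoted in the conversation, and the 2-bits-per-base estimate ignores real file formats, quality scores, and compression.

```python
# Illustrative sketch: a genome as a string over the alphabet A, C, G, T.
GENOME_LENGTH = 3_000_000_000  # approximate number of bases quoted in the conversation
BITS_PER_BASE = 2              # four possible letters, so log2(4) = 2 bits each


def raw_size_megabytes(n_bases: int, bits_per_base: int = BITS_PER_BASE) -> float:
    """Minimum storage for n_bases letters, in megabytes (10**6 bytes)."""
    return n_bases * bits_per_base / 8 / 1_000_000


def is_valid_sequence(seq: str) -> bool:
    """True if every character in seq is one of the four DNA letters."""
    return set(seq) <= set("ACGT")


if __name__ == "__main__":
    print(raw_size_megabytes(GENOME_LENGTH))  # size of the bare letter string
    print(is_valid_sequence("GATTACA"))
```

Under these assumptions the bare code for one genome fits in well under a gigabyte; the difficulty at the time was reading the letters, not storing them.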
00:13:09.520 So at the end of the last century, that was considered quite a daunting task. But as I think
00:13:18.260 our country has often done, decided that it was a very worthy one to do, along with several other
00:13:24.000 leading countries that believe strongly in science. And so they funded the Human Genome Project. So all
00:13:29.240 over the world at centers, people were trying to sequence bits of this 3 billion bases to comprise
00:13:35.660 the first complete human genome. So it's just quite famous. There were two efforts. One was a public
00:13:41.580 effort led by the NIH and Francis Collins at the time. They had a particular approach where what they
00:13:48.320 were doing was they were cutting out large sections of the genome, and then using an older type of
00:13:55.540 sequencing method called capillary electrophoresis to sequence each of those individual bases.
00:14:00.740 There was a private effort led by Craig Venter and a company called Celera, which took a very
00:14:07.220 different approach, which is they cut up the genome into much, much smaller pieces, pieces that were so
00:14:13.780 small that you didn't necessarily know a priori what part of the genome they would come from, which is
00:14:19.860 why they were doing this longer, more laborious process through the public effort. But there was a big
00:14:24.800 innovation, which is they realized that if you had enough of these fragments, you could, using a
00:14:29.960 mathematical technique, reconstruct it from these individual pieces, where you could take individual
00:14:35.680 pieces, looked at where they overlapped. And again, we're talking about billions of fragments here,
00:14:40.640 and you can imagine mathematically reconstructing that. Very computationally intensive, very complex.
00:14:46.600 But the benefit of that is that you could generate the data much, much faster. And so in a fraction of the
00:14:52.480 time and for a fraction of the money, they actually caught up to the public effort and then culminated
00:14:57.740 in each having a draft of a human genome around the same time in late 2000, early 2001. And then
00:15:06.020 simultaneously in Nature and Science, we got the first draft of a human genome, a milestone in science.
00:15:12.960 Alex, what were the approximate lengths of the fragments that Celera was breaking DNA down into?
00:15:19.080 They were taking chunks out in individual megabases, so like a million bases at a time. And then they would
00:15:26.800 isolate that and then deconstruct it even into smaller pieces, which were kilobase fragments,
00:15:33.200 a thousand bases at a time. And again, so they would take a piece of the puzzle, but they would
00:15:37.740 know which piece it was, and then break that into smaller and smaller ones. And then after you had the
00:15:43.240 one kilobase sequences, they would put it all back together versus just to contrast that with the
00:15:48.820 private effort, which they called shotgun sequencing, which is you just took the whole thing,
00:15:53.720 ground it up, brute force sequenced it, and then use the informatics to figure out what went where.
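The idea of reconstructing a sequence from overlapping fragments can be sketched in a few lines. This is a toy greedy overlap merge under loud assumptions: the fragments are error-free, come from one strand, and contain no repeats. It is not the assembler used by either genome effort, and the function names and example sequence are made up for illustration.

```python
from itertools import permutations


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the largest overlap."""
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while True:
        best_k, best_pair = 0, None
        for a, b in permutations(contigs, 2):
            k = overlap(a, b, min_len)
            if k > best_k:
                best_k, best_pair = k, (a, b)
        if best_pair is None:
            return contigs  # nothing left that overlaps by at least min_len
        a, b = best_pair
        contigs.remove(a)
        contigs.remove(b)
        contigs.append(a + b[best_k:])  # join, keeping the shared bases once


if __name__ == "__main__":
    # Hypothetical reads drawn from the made-up sequence ATGGCGTACGTTAGC.
    reads = ["ACGTTAGC", "ATGGCGTA", "GCGTACGTT"]
    print(greedy_assemble(reads))
```

Real genomes break the assumptions quickly (repeats, sequencing errors, billions of reads), which is why the conversation describes the real problem as so computationally intensive.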
00:16:00.040 And in the shotgun, how small were they broken down into?
00:16:03.280 They got down to kilobase and hundred base, multi-hundred base fragments. But the key was,
00:16:09.120 all you had to do was just brute force keep sequencing, as opposed to this more artisanal
00:16:14.440 approach of trying to take individual pieces and deconstruct them and then reconstruct them.
00:16:19.260 So it's early 2001. This gets published. By the way, do we know the identity of the individual?
00:16:24.480 I think we do know the identity of the individual who was sequenced, don't we? I can't recall.
00:16:28.980 I think the original one was still anonymous and likely to be a composite of multiple individuals,
00:16:34.280 just because of the amount of DNA.
00:16:36.160 That was needed. Yeah.
00:16:37.280 Yeah. Soon after, there were individuals. Craig Venter, he may have been the first
00:16:41.720 individual who was named that we had the genome for.
00:16:45.100 Got it. It's often been said, Alex, that that effort costs, at the end of that sequencing,
00:16:51.540 if you decided, I want to now do one more person, it would cost a billion dollars directionally
00:16:56.580 to do that effort. What was the state of the art in transitioning that from where it was,
00:17:07.200 let's just say, order of magnitude, 10 to the $9 per sequence, to where it was
00:17:13.960 10 years later, approximately? What was the technology introduction or plural version of
00:17:21.520 that question that led to a reduction? And how many logs did it improve by?
00:17:27.520 We went back and did this analysis. So if you literally at the end of the original human
00:17:31.780 genome said, Hey, I want to do one more. And you have the benefit of all the learnings from the
00:17:36.280 previous one, a few hundred million dollars would have been an incremental genome. By 2012,
00:17:43.980 it was low tens of thousands of dollars. So let's call that four or five logs of improvement.
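The "logs of improvement" arithmetic can be checked directly. The sketch below is only a back-of-the-envelope calculation: the dollar amounts are round placeholders chosen to match the ranges mentioned in the conversation (about a billion or a few hundred million dollars before, low tens of thousands after), not precise historical prices, and the function name is hypothetical.

```python
import math


def logs_of_improvement(cost_before: float, cost_after: float) -> float:
    """Orders of magnitude (base-10 logs) by which the cost dropped."""
    return math.log10(cost_before / cost_after)


if __name__ == "__main__":
    # Placeholder figures taken from the ranges quoted in the conversation.
    print(round(logs_of_improvement(1e9, 1e4), 2))    # a billion down to ten thousand
    print(round(logs_of_improvement(3e8, 2.5e4), 2))  # a few hundred million down to 25,000
```

Depending on which starting figure is used, the result lands between roughly four and five logs, consistent with the estimate given.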
00:17:52.380 And what brought that? So the day you show up at Illumina and it's, if for research purposes,
00:17:57.680 or if a very wealthy individual said, I have to know my whole genome sequence,
00:18:02.240 and they were willing to pay $25,000 for it, or a lab was doing it as part of a clinical trial or
00:18:08.700 for research, what were they buying from Illumina to make that happen?
00:18:13.900 So it was a series of inventions that allow the sequencing reactions to be miniaturized.
00:18:19.160 And then you could do orders of magnitude, more sequencing of DNA by miniaturizing it.
00:18:24.880 The older sequencers, they had a small glass tube. And as the DNA went through, you sequenced it,
00:18:29.980 it got converted into a 2D format, kind of like a glass slide, where you had tiny fragments of DNA
00:18:36.780 stuck to it, hundreds of millions, then ultimately billions. And then you sequenced all of them
00:18:42.440 simultaneously. So there was a huge miniaturization of each individual sequencing reaction, which allowed
00:18:49.340 you to just in one system generate many, many more DNA sequences at the same time. There's a very
00:18:56.280 important chemistry that was developed called sequencing by synthesis by a Cambridge chemist,
00:19:02.080 who I know well, Shankar Balasubramanian. And he developed Illumina sequencing chemistry,
00:19:07.980 which ultimately went through a company called Solexa, which Illumina acquired. And that has
00:19:12.300 generated the majority of the world's genomics data, the original chemistry that he developed in
00:19:17.940 Cambridge.
00:19:18.340 And what was it about that chemistry that was such a step forward?
00:19:22.920 It allowed you to miniaturize the sequencing reactions. So you could have a huge number,
00:19:28.180 ultimately billions in a very small glass slide. It also allowed you to do something
00:19:33.400 which is called cyclic sequencing in a very precise and efficient and fast way, where you read off one
00:19:41.500 base at a time, and you can control it. And so you imagine you have, say, a lawn of a billion DNA
00:19:46.840 fragments, and you're on base three on every single fragment, and you want to know what base four is
00:19:51.960 on every fragment. It allowed you to simultaneously sequence just one more base on all billion
00:19:57.900 fragments, read it out across your whole lawn. And then once you read it out, add one more base,
00:20:04.640 read it all out. And so this allowed for this huge parallelization.
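The cycle-by-cycle readout described above can be mimicked with a toy simulation. This is a deliberately simplified sketch: it treats each fragment as a plain string and "reads" one position per cycle across all fragments, ignoring the actual chemistry (reversible terminators, fluorescent imaging, complementary-strand synthesis, and base-calling errors). The function name and example fragments are hypothetical.

```python
def sequence_by_cycles(fragments: list[str], n_cycles: int) -> list[str]:
    """Read one base per cycle from every fragment in lockstep."""
    reads = ["" for _ in fragments]
    for cycle in range(n_cycles):
        # In one cycle, every fragment on the "lawn" is extended by exactly one base.
        for i, fragment in enumerate(fragments):
            if cycle < len(fragment):  # shorter fragments simply stop contributing
                reads[i] += fragment[cycle]
    return reads


if __name__ == "__main__":
    lawn = ["ACGT", "TTGA", "GC"]  # made-up fragments
    print(sequence_by_cycles(lawn, 3))
```

The outer loop runs once per cycle regardless of how many fragments there are, which is the sense in which the approach parallelizes: adding fragments adds reads without adding cycles.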
00:20:09.240 Let's talk a little bit about where we are today. To my recollection, the last time I looked
00:20:15.440 to do a whole genome sequence today is on the order of $1,000, $500 to $1,000. Is that about accurate?
00:20:23.520 Yeah, that's way too expensive, Peter. Today, a couple hundred dollars.
00:20:29.140 Okay. So a couple hundred dollars today. I feel like I looked at this on a graph a while ago,
00:20:35.500 and it was one of the few things I noticed that was improving faster than Moore's Law.
00:20:40.980 Maybe tell folks what Moore's Law is, why it's often talked about. I think everybody's heard of
00:20:46.560 it. And maybe talk about the step function that it's basically, if I'm looking at it correctly,
00:20:51.940 there were two Moore's Laws, but there was something in between that became even a bigger
00:20:57.120 improvement. But maybe tell folks what Moore's Law is, first of all.
00:21:01.000 It's not like a law, like a law of physics or something like that, but it became an industry
00:21:05.400 trend in microprocessors. What it refers to is the density of transistors on a microchip and the
00:21:14.160 cost of the amount of computing power per amount of transistors. And that geometrically decreased
00:21:21.660 kind of in a steady way. Actually, I don't remember the exact number if it's like doubling every
00:21:26.980 two years or something like that. But there was a geometric factor to it that the industry
00:21:32.420 followed for decades. It's not quite following that anymore. I mean, transistors are getting down
00:21:37.320 to like the atomic scale, but went way faster than people had envisioned.
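To make the comparison with sequencing costs concrete, the sketch below contrasts a Moore's-law-style doubling with the halving time implied by the cost figures mentioned in this conversation. It assumes the commonly cited doubling period of about two years, and it uses round placeholder numbers (a few hundred million dollars around 2001, a couple hundred dollars about 23 years later); the function names are hypothetical.

```python
import math


def moores_law_factor(years: float, doubling_period_years: float = 2.0) -> float:
    """Growth factor after `years` if a quantity doubles every doubling_period_years."""
    return 2 ** (years / doubling_period_years)


def implied_halving_time(cost_start: float, cost_end: float, years: float) -> float:
    """Average time (in years) for the cost to halve, given start and end costs."""
    halvings = math.log2(cost_start / cost_end)
    return years / halvings


if __name__ == "__main__":
    print(moores_law_factor(10))  # improvement over a decade at a two-year doubling
    # Placeholder figures: ~$300M per genome falling to ~$200 over ~23 years.
    print(round(implied_halving_time(3e8, 200, 23), 2))
```

With these placeholder inputs the implied halving time comes out at a bit over one year, noticeably faster than a two-year doubling, which is the sense in which sequencing costs outpaced Moore's Law.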
00:21:43.720 It basically started in the late 60s. And as you said, it went until it hits the limits of atomic
00:21:49.000 chemistry.
00:21:50.020 Yeah. And so that relentless push is what made the whole software engineering high-tech industry possible.
00:21:56.140 So back to my question, which is, if you just look at the cost of sequencing
00:22:00.600 from 2000 till today, it's sort of like two curves. There's the relentless curve that gets to where
00:22:08.380 we are in 2013. But then there was another big drop in price that occurred after that. I'm guessing
00:22:15.500 that had to do with shotgun sequencing or the commercialization of it. I mean, not the concept
00:22:20.020 of it, which already existed. Does that sound right?
00:22:23.380 Yeah. So when Illumina really started to deliver the higher throughput next generation sequencings,
00:22:29.140 it brought along a new faster curve because of the miniaturizations. So this ability to sequence
00:22:34.840 billions of fragments in a small area, I was privileged to be a big part of this effort.
00:22:40.520 And Illumina just continuing to drive the density down, the speed of the chemistry up,
00:22:45.700 all the associated optics, engineering software around it drove that much faster than Moore's law
00:22:52.360 reduction in cost.
00:22:53.400 Were other companies involved in the culmination of next-gen sequencing?
00:22:58.740 Yeah, many. And some of them are still around. None nearly as successful as Illumina,
00:23:03.700 but also some important players there.
00:23:06.520 And today that's the industry standard. I assume there's no sequencing that's going on
00:23:10.360 that isn't next-gen?
00:23:12.400 No, the vast majority is next-gen sequencing. There's niche applications where there's other
00:23:17.680 approaches, but in the 99% of the data being generated, some version of next-generation
00:23:22.680 sequencing.
00:23:24.220 Got it. So you mentioned a moment ago that part of the effort to bring you to Illumina was
00:23:31.080 presumably based on not just your innate talents, but also the fact that you came from a somewhat
00:23:37.360 clinical background as well. You're an MD and a PhD. And if their desire is to be able to branch out
00:23:42.840 into clinical applications, that would make for a natural fit. So where in that journey did the
00:23:48.580 idea of liquid biopsies come up? And maybe talk a little bit about the history of one of the
00:23:54.160 companies in that space that we're going to talk about today.
00:23:56.840 So to start with that, I should talk about first tumor sequencing, which predated liquid biopsy.
00:24:02.260 A couple of companies, most notably Foundation Medicine, developed using Illumina technology,
00:24:08.260 developed tumor sequencing. So there had been some academic work, but they tried to develop it and
00:24:13.920 were the first to do it successfully as a clinical product. What you can imagine is there's these
00:24:19.000 genes that are implicated in cancer that often get mutated. Knowing which mutations a tumor has has
00:24:25.380 big implications for prognosis, but also for treatment. Over time, we have more and more targeted
00:24:31.000 therapies where if your tumor has a very particular mutation, it's more likely to respond to certain
00:24:36.940 drugs that target that type of tumor. And at the time, as more and more of these mutations were
00:24:43.640 identified that could be important in the treatment of a tumor, it was becoming impractical to say,
00:24:49.980 do a PCR test for every mutation. So imagine there's 100 potential mutations you'd like to know about
00:24:56.180 if a patient has in their tumor and their lung cancer, doing each of these individually. Again,
00:25:02.160 a lot of expense, a lot of false positives. And so what companies like Foundation Medicine did is say, hey,
00:25:08.360 why don't we just sequence all of those positions at once given next generation sequencing? So they
00:25:13.740 would make a panel to sequence, say, 500 genes or a few hundred genes, the ones that are most important
00:25:19.460 in most solid cancers. And then they would sequence them. And then in one test, they would see the vast
00:25:25.220 majority of the potential mutations that could be relevant to treatment for that cancer patient.
00:25:30.740 And so that is still a very important tool in oncology today. A large fraction of tumors are
00:25:37.160 sequenced. And that's what allows people to get access to many types of drugs. Many of the targeted
00:25:43.360 therapies for lung cancer, melanoma, or you hear about things like microsatellite instability or high
00:25:50.720 mutational burden, that all comes from tumor sequencing. Once that was established, then a few
00:25:57.600 folks, most notably at Johns Hopkins, but also other places, started to ask the question, well,
00:26:02.960 you know, could we sequence the tumor from the blood? And you might say, well, hey, you have a tumor in
00:26:08.020 your lung. Why would sequencing blood be relevant to looking at the tumor? Well, it turns out there is
00:26:14.800 tumor DNA in the blood. And this is interesting. So in the late 40s, it was first identified,
00:26:20.720 that there was DNA in the blood outside of cells, so-called cell-free DNA. And then in the 70s,
00:26:27.040 it was noticed that cancer patients had a lot of DNA outside their cells in the blood, and that some
00:26:34.540 of this was likely from tumors, from the cancer itself. If you know anything about tumor biology,
00:26:41.560 you know that cancer cells are constantly dying. So you think of cancers as growing very quickly,
00:26:46.220 and that's true, but they actually are dying at an incredible rate because it's disordered growth.
00:26:52.640 So many of the cells that divide have all kinds of genomic problems. So they die or they're cut
00:26:58.260 off from vasculature. But the crazy thing about a tumor is, yes, it's growing fast if it's an aggressive
00:27:04.120 tumor, but also the amount of cell death within that tumor is very high. And every time one of those
00:27:10.500 cells die, some of the DNA has the potential to get into the bloodstream. And so it was this insight
00:27:16.940 along with the tumor sequencing that said, hey, what if we sequence this cell-free DNA? Could we
00:27:22.920 end up sequencing some of the tumor DNA or the cancer cell DNA that's in circulation?
00:27:28.940 Early results, particularly from this group at Johns Hopkins, began to show that indeed that was
00:27:35.060 possible. And then a few companies, again, using Illumina technology, and then we started doing
00:27:41.080 it at Illumina also, our own liquid biopsy assays and tests and technologies developed what became
00:27:47.600 liquid biopsy. In this context, it was for late-stage cancer. So it was for patients who
00:27:52.740 were diagnosed with a cancer. You wanted to know, did their tumor have mutations? And you could do it from
00:27:58.180 the blood. There was a big benefit, which was, as you know, for lung cancer, taking a biopsy can be a
00:28:04.200 very dangerous proposition. You can cause a pneumothorax. You can land someone in the ICU.
00:28:11.220 You know, in rare cases, it can lead to death in that type of procedure. And so the ability to
00:28:16.180 get the mutational profile from the blood was really attractive. And so that started many companies
00:28:23.360 down the road of developing these liquid biopsies for late-stage cancers.
00:28:28.380 So Alex, let's talk about a couple of things there. Tell me the typical length of
00:28:34.080 a cell-free DNA fragment. How many base pairs, or what's the range?
00:28:38.840 Yeah, it depends on the exact context, but around 160 base pairs. So that's 160 letters of the ATCG
00:28:46.660 code. And there's a very particular reason it's that length, which is that if you pull the string
00:28:53.140 on the sweater, you unwind the chromosome, and you keep doing it until you get down to something
00:28:58.800 around 160 base pairs, what you find is that the DNA, right, it's not just naked, it's wrapped around
00:29:05.320 something called a nucleosome, which is an octamer or eight of these histone proteins in a cube,
00:29:12.500 and the DNA is wrapped around it twice. And that's the smallest unit of chromatin of this larger chromosome
00:29:19.720 structure. And so the reason it's 160 bases is that's more or less the geometry of going around
00:29:26.380 twice. And so DNA can be cleaved by enzymes in the blood, but that nucleosome protects the DNA from
00:29:35.540 being cut to anything smaller than about 160 base pairs. And does that mean that the cell-free DNA
00:29:42.600 that is found in the blood is still wrapped around the nucleosome twice, like it's still clinging to
00:29:50.100 that and that's what's protecting it from being cleaved any smaller?
00:29:52.520 That's right.
00:29:53.860 You mentioned that obviously the first application of this was presumably looking for ways to figure
00:30:02.220 out what the mutation was of a person with late-stage cancer without requiring a tissue
00:30:07.060 biopsy. Presumably by this point, it was very easy to gather hundreds of 160 base pair fragments and
00:30:17.180 use the same sort of mathematics to reassemble them based on the few overlaps to say this is the actual
00:30:23.780 sequence because presumably the genes are much longer than 160 base pairs that they're looking at.
00:30:30.100 That's right. So by this point in 2014, 2015, the informatics was quite sophisticated. So you could
00:30:38.880 take a large number of DNA sequences from fragments and easily determine which gene it was associated with.
00:30:46.060 At some point I recall in here, I had a discussion on the podcast maybe a year and a half ago,
00:30:53.220 two years ago with Max Diehn, another one of our med school classmates, about looking at recurrences in
00:31:00.960 patients who were clinically free of disease. So you took a patient who's had a resection plus or minus
00:31:08.900 some adjuvant chemotherapy. And to the naked eye and to the radiograph, they appear free of disease.
00:31:16.120 And the question becomes, is that cancer recurring? And the sooner we can find out, the better our chance
00:31:22.940 at treating them systemically again, because it's a pretty well-established fact in oncology that the
00:31:29.380 lower the burden of tumor, the better the response, the lower the mutations, the less escapes, etc.
00:31:35.640 And so did that kind of become the next iteration of this technology, which was,
00:31:41.860 if we know the sequence of the tumor, can we go fishing for that particular tumor in the cell-free
00:31:48.460 DNA?
00:31:49.640 Yeah, yeah. Broadly speaking, there's kind of three applications from looking at tumor DNA in the
00:31:54.360 blood. One is screening, which we'll talk about later, which is people who don't have cancer,
00:31:59.140 or 99% who don't, and trying to find the individual who has cancer, an invasive cancer,
00:32:04.980 but doesn't know it. There's this application of what we call therapy selection, which is you're a
00:32:09.300 cancer patient trying to decide which targeted therapy would be best for you. And then this other
00:32:15.520 one you mentioned is a third application we call often minimal residual disease. We're looking at
00:32:21.480 monitoring a response, which is you're undergoing treatment, and you want to know,
00:32:26.520 is the amount of tumor DNA in the blood undetectable? And also its velocity, is it changing?
00:32:33.160 Because as you mentioned, that could tell you, is your treatment working, the tumor DNA burden or
00:32:38.980 load is going down? Is it undetectable, and you're potentially cured that there's no longer that source
00:32:45.820 of tumor DNA in your body? Or is it present even after a treatment with intent to cure, and that in the
00:32:54.900 presence of that tumor DNA still means basically, and we appreciate this now, unfortunately, you have
00:33:01.360 not been cured, but that patient hasn't been cured, because there is some nidus of tissue somewhere
00:33:06.220 that still harbors these mutations, and therefore is the tumor, even if it's not detectable by any other
00:33:12.900 means.
00:33:13.300 So at what point does this company called Grail that we're going to talk about, at what point does
00:33:23.520 it come into existence, and what was the impetus and motivation for that as a distinct entity outside
00:33:30.280 of Illumina?
00:33:31.680 So there were several technological and scientific insights that came together, along with, as often
00:33:38.520 in this case, some really bold entrepreneurs and investors. The use of this liquid biopsy technology
00:33:46.580 in late-stage cancers, it was clearly possible to sequence tumors from the blood, and it was clearly
00:33:52.340 actually the tumor DNA, and it was useful for cancer patients. So we knew that there was tumor DNA, we knew
00:33:58.740 it could be done, but what the field didn't know is, could you just see this in early-stage cancers,
00:34:03.660 localized cancers that were small? Not a lot of data on that, but there was the potential.
00:34:10.720 There was also a really interesting incidental set of findings in a completely different application
00:34:16.300 called non-invasive prenatal testing. Again, totally different application, but it was discovered
00:34:22.700 principally by a scientist in Hong Kong named Dennis Lo that you could see fetal DNA in the blood,
00:34:30.300 or more specifically placental DNA in the blood, and it was also cell-free DNA. What he developed,
00:34:37.880 actually, along with one of our professors at Stanford, Steve Quake, was a technique to look
00:34:43.840 for trisomies in the blood based on this placental or fetal DNA, and this is called non-invasive
00:34:50.060 prenatal testing. And so what you do is you sequence the cell-free DNA fragments in a pregnant woman,
00:34:56.540 you look at the DNA, and if you see extra DNA, for example, at the position of chromosome 21,
00:35:04.560 well, that indicates that there are tissues in women, presumably the fetus, or placenta that's
00:35:10.700 giving off extra chromosome 21. And so this ended up being an incredibly sensitive and specific way
00:35:18.380 to test for the presence of trisomies, chromosome 21, 18, 13, early in pregnancy. And it's had a
00:35:26.740 tremendous impact. It was also involved in subsequent iterations of the test. In the United States,
00:35:31.860 it decreased amniocentesis by about 80% because the test is so sensitive and specific as a screen
00:35:38.940 that many, many women have now not had to undergo amniocentesis and the risks around. Again,
00:35:45.240 totally different application of cell-free DNA. But what happened is during the early
00:35:51.380 commercialization of about the first few hundred tests, the companies pioneering this, and one of
00:35:56.620 them was a company called Verinata that Illumina acquired, began to see in rare cases, very unusual
00:36:03.520 DNA patterns. It wasn't just a chromosome 21 or 18 or 13, but what's often called chromothripsis,
00:36:13.000 which is many, many aberrations across chromosomes. The two women who really did this analysis and
00:36:21.660 really brought both Illumina and the world's attention to it were Meredith Halks-Miller,
00:36:26.560 a pathologist and lab director at this Illumina-owned company, Verinata, and another
00:36:31.500 bioinformatics scientist, Daria Chudova. What they showed is, ultimately, that these women actually
00:36:38.420 had cancer. They were young women of childbearing age. They ultimately had healthy children,
00:36:44.840 but they had an invasive cancer and it was being diagnosed in their cell-free DNA by this
00:36:51.540 non-invasive prenatal test. And as they began to show these patterns to people, it became clear that
00:36:58.280 they were clearly cancer. If you have many, many chromosomes that are abnormal, that's just not
00:37:03.980 compatible with life or a fetus. And so when you saw this just genome-wide chromosomal changes,
00:37:11.500 it was very clear that we're incidentally finding cancer in these women.
00:37:15.680 Let's talk a little bit about that, actually, because I want to dig into that. It's so interesting.
00:37:19.760 So let's take a step back. So again, whenever you say we're sampling for cell-free DNA,
00:37:25.140 we should all be keeping in the back of our mind, we're looking for these teeny tiny little
00:37:29.440 160 base pair fragments wrapped around little nucleosomes. Now, let's just go back to the
00:37:36.120 initial use case around trisomy 21. With 160 base pairs, is that sufficient to identify any one
00:37:44.080 chromosome? Presumably, you're also sampling maternal blood, so you know what the maternal
00:37:49.220 chromosomes look like, and you're presumably juxtaposing those two as your control. Is that
00:37:55.500 part of it? Not quite. So it's all mixed together. So in a pregnant woman's blood and maternal blood,
00:38:02.140 it's a mixture. So you have cell-free DNA. The majority of the cell-free DNA is from her
00:38:07.340 own cells and tissues. And then you have superimposed on that a bit of cell-free DNA from
00:38:13.720 mostly the placenta. And so what you're seeing is this mix of cell-free DNA. And then what you do is
00:38:20.500 you sequence. There's different ways to do it, but the most common way is you do shotgun sequencing,
00:38:25.140 and you sequence millions of these fragments. And every time you sequence a fragment,
00:38:30.860 you place it in a chromosome based on its sequence. Your first fragment, you say,
00:38:35.760 hey, when I compare this to the draft human genome, this goes on chromosome two.
00:38:39.840 You sequence your third fragment and you say, hey, this sequence looks like chromosome 14.
00:38:44.960 And you keep putting them in the chromosome buckets. And what you expect, if every tissue has an
00:38:53.200 even chromosome distribution, you know, or two chromosomes, is that that profile would be flat
00:38:57.840 and each bucket would be about the same level. But what you see in a woman carrying a fetus that
00:39:04.460 has a trisomy... You'll see 50% greater in the chromosome 21 bucket.
00:39:09.920 You actually see more like 5% or 10%. Because again, remember, 90% of it might be maternal blood,
00:39:15.760 right? So that's all going to be even. But within the 10% fetal, you're going to have an extra 50%.
00:39:21.480 So the total might be an extra 5% or 10%. But that's a whopping big signal and very easy to detect.
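The bucket-counting and dilution arithmetic described here can be sketched in a few lines of Python. This is only an illustration of the reasoning, not the method any commercial test uses; the function names and the toy fragment list are hypothetical, and the 10% and 20% fetal fractions are example inputs.

```python
from collections import Counter


def chromosome_fractions(assignments):
    """Share of sequenced fragments placed in each chromosome bucket."""
    counts = Counter(assignments)
    total = sum(counts.values())
    return {chrom: n / total for chrom, n in counts.items()}


def expected_relative_excess(fetal_fraction, fetal_copies=3, normal_copies=2):
    """Relative increase in one chromosome's signal when only the fetal
    (placental) share of the cell-free DNA carries an extra copy."""
    return fetal_fraction * (fetal_copies - normal_copies) / normal_copies


# Hypothetical toy input: each entry is the chromosome a fragment mapped to.
toy_fragments = ["chr2", "chr14", "chr21", "chr2"]
print(chromosome_fractions(toy_fragments))

# A trisomy is 50% more of one chromosome in the fetal share only, so the
# excess in the mixed sample scales with the fetal fraction.
for fetal_fraction in (0.10, 0.20):
    print(fetal_fraction, expected_relative_excess(fetal_fraction))
```

With a 10% fetal fraction the sketch gives a relative excess of about 5%, and with 20% about 10%, which matches the range quoted in the conversation.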
00:39:29.280 Isn't it interesting? It just gives a sense of how large the numbers are if a 5% delta
00:39:34.620 is an off the charts, unmistakable increase in significance. I want to make sure again,
00:39:40.940 people understand what you just said, because it's very important. Because the majority of the
00:39:45.280 cell-free DNA belongs to the mother, and because the fetal slash placental cell-free DNA is a trivial
00:39:51.820 amount, even though by definition a trisomy means there is 50% more of one chromosome, you've gone
00:39:59.360 from two to three copies. In the fully diluted sample, that might only translate to a few percent.
00:40:07.140 But that's enough, given the large numbers that you're testing, to be a definitive,
00:40:13.820 statistically significant difference that triggers a positive test.
00:40:18.240 Yep. Well put. Yes.
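Why a few percent becomes unmistakable at large numbers can be shown with a rough counting-noise estimate. The sketch below assumes pure binomial sampling noise and takes chromosome 21's normal share of fragments to be about 1.5%; both are simplifying assumptions for illustration, not figures from the conversation, and real assays have additional sources of noise.

```python
import math


def trisomy_z_score(total_reads, baseline_fraction, relative_excess):
    """Standard deviations by which the chromosome bucket exceeds its
    expected count, assuming only binomial counting noise."""
    expected = total_reads * baseline_fraction
    sd = math.sqrt(total_reads * baseline_fraction * (1 - baseline_fraction))
    return expected * relative_excess / sd


# Same 5% excess, different numbers of sequenced fragments (hypothetical).
for reads in (10_000, 1_000_000, 10_000_000):
    print(reads, round(trisomy_z_score(reads, 0.015, 0.05), 1))
```

Under these assumptions the same 5% excess is lost in the noise at ten thousand fragments and sits many standard deviations out at ten million, because the noise grows only with the square root of the count.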
00:40:20.480 Alex, I want to come back to the story, because this is clearly the beginning of the story.
00:40:24.300 But let's come back to just a couple other housekeeping items.
00:40:27.580 A moment ago, we talked about cell-free DNA in the context of tumor. Someone might be listening to us
00:40:31.720 thinking, wait, guys, you just said that the majority of the cell-free DNA is from this mother.
00:40:37.120 99.9% of the time, she doesn't have cancer. Where is that cell-free DNA coming from?
00:40:42.340 When cells are destroyed, either through necrosis or apoptosis, there's a lot of cell turnover,
00:40:48.600 right, of cells that replicate, especially epithelial cells, blood cells, and so on. As the natural
00:40:55.440 biochemistry destroys them, some of the DNA from the nucleus ends up in circulation. Again,
00:41:01.720 where they're wrapped around these nucleosomes. So it's essentially cell death and cell turnover
00:41:07.180 is the source of it. And since, again, at any one time, there's millions of cells dying and being
00:41:13.200 turned over, there's always some base-level cell-free DNA in the blood.
00:41:18.280 And again, I don't know if you've ever done the calculation. If not, I don't mean to put you on
00:41:21.540 the spot. But do you have an approximate guess for how many base pairs of cell-free DNA are floating
00:41:28.180 around your body or my body as we sit here right now?
00:41:31.160 What I can say is, if you took a 10-mil blood tube, which is a lot of what these tests use,
00:41:37.100 and you remove all the cellular DNA, remember, there's a ton of DNA in the cells in circulation.
00:41:41.700 Sure. The white blood cells, the red blood cells, et cetera. Get rid of all that. Yep.
00:41:45.380 Huge amount. You probably have on the order of a few thousand cells worth of cell-free DNA
00:41:51.160 in a 10-mil blood tube, which isn't a lot. Just to make sure I understand you, you're saying
00:41:56.640 a few thousand cells worth. Each cell would be 3 billion base pairs.
00:42:02.500 Yes. Yes.
00:42:03.980 Wow. On the one hand, it doesn't sound like a lot because there are billions of cells. On the other
00:42:10.180 hand, it still sounds like a lot. That's still a big computational problem.
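The numbers in this exchange can be worked through directly. The sketch below takes "a few thousand" to be 3,000 genome copies and uses the 3 billion base pairs per copy quoted above; the function names and the example variant fractions are hypothetical. It also shows one consequence of having so few copies: a variant carried by a tiny fraction of fragments may simply not be in the tube.

```python
def total_base_pairs(genome_copies, bp_per_copy=3_000_000_000):
    """Total cell-free DNA in the tube, in base pairs."""
    return genome_copies * bp_per_copy


def prob_at_least_one_mutant(genome_copies, mutant_fraction):
    """Chance that at least one of the copies covering a given position
    carries the variant, treating each copy as an independent draw."""
    return 1 - (1 - mutant_fraction) ** genome_copies


copies = 3_000  # assumed stand-in for "a few thousand cells' worth"
print(total_base_pairs(copies))

# Hypothetical variant fractions, from fairly common to extremely rare.
for fraction in (1e-3, 1e-5, 1e-6):
    print(fraction, prob_at_least_one_mutant(copies, fraction))
```

At one in a thousand the variant is almost certainly present in the sample; at one in a million it is absent from the tube most of the time, however good the sequencing is.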
00:42:14.420 Where it becomes challenging is when we get into early detection, right? Where if you think about it,
00:42:19.720 for any position in the genome, you only have a few thousand representations of it because there's
00:42:27.300 only a few thousand cells. That starts to limit your ability to detect events that occur at one
00:42:34.380 in a million or one in a hundred thousand. Alex, do you recall these incident cases of the pregnant
00:42:43.960 mothers? Again, I guess we should probably go back and re-explain that because it's such an
00:42:48.900 important and profound discovery. There were a handful of cases where in the process of screening
00:42:54.460 for trisomies, they're discovering not that the mother has additional chromosomes that can be
00:43:02.640 attributed to the fetus, but that she has significant mutations across a number of genes that
00:43:12.700 also are probably showing up in relatively small amounts because they're not in all of her cells.
00:43:19.080 Is that correct?
00:43:20.500 Yeah. Yeah. So you might expect a flat pattern, right? In the majority of cases, or when the fetus
00:43:26.740 has a trisomy, you see these very well-known accumulations, mostly in 21, but occasionally in
00:43:33.260 18 or 13. And instead what you see is just increases and decreases monosomies and trisomies
00:43:40.120 across many, many chromosomes, which is just not compatible with life even as a fetus. But there
00:43:47.120 is a biology where you do see these tremendous changes in the chromosomes. And that's often in the
00:43:53.720 case of cancer.
00:43:55.240 Do you recall what those cancers turned out to be in those young women? I mean, I assume they
00:43:59.880 were breast cancers, but they could have been lung cancers, anything?
00:44:03.360 Yeah. So Meredith and Daria, they published a paper in JAMA, which for anyone interested,
00:44:08.480 details these 10 or so cases and what happened in each of them. It was a mix. I think there was
00:44:14.580 a neuroendocrine, uterine, some GI cancers. It was a smattering of different things.
00:44:20.860 And what was the approximate year of that? We'll try to find that paper and link to it in the show
00:44:24.800 notes.
00:44:25.580 It was 2015 in JAMA.
00:44:27.360 Got it. That seems unbelievable.
00:44:29.880 Of course, one doesn't know the contrapositive. One doesn't know how many women had cancer but
00:44:37.660 weren't captured. But is it safe to assume that the 10 who were identified all had cancer?
00:44:45.280 Yes. Yes.
00:44:46.420 So there were no false positives. We just don't know how many false negatives there were.
00:44:50.180 Right. Yeah. This is one of the things that contributed to the evidence that cancer screening
00:44:56.580 might be possible using cell-free DNA, which is these incidental findings. As I mentioned earlier,
00:45:02.180 we already knew that, yes, tumors do put cell-free DNA into the bloodstream. But this was a profound
00:45:08.300 demonstration that in actual clinical practice, you could find undiagnosed cancers in asymptomatic
00:45:15.240 individuals. And that it was highly specific, meaning that when it was found using this method,
00:45:20.940 it almost, well, I think in those initial ones, it was every case, but almost every case turned out
00:45:26.000 to have cancer. Now, to your point, it's not a screening test because even in relatively healthy
00:45:33.720 and women of childbearing age, a population of 100,000, you expect epidemiologically 10 times or
00:45:41.160 so or 50 times that number of cancers over a year or so. So clearly you're missing the majority of
00:45:48.080 cancer. So it's not a screening test. Right. It was just a proof of concept though.
00:45:52.860 Yeah. An inadvertent proof of concept that really raised at Illumina and I think in the field,
00:45:57.800 our attention of, hey, using cell-free DNA and sequencing based methods, it might be possible
00:46:03.240 to develop a very specific test for cancer. So what was the next step in the process of
00:46:10.480 systematically going after addressing this problem? Myself and some other folks at Illumina,
00:46:16.040 along with the two scientists I mentioned, Meredith and Daria, and then also in particular,
00:46:21.960 the CMO at the time, Rick Klausner, who had a very long history in cancer research and in cancer
00:46:29.500 screening. He was the previous NCI director under Bill Clinton. So that's the National Cancer
00:46:35.320 Institute at the NIH under Bill Clinton. And he was the CMO at Illumina at the time. And we started to
00:46:41.340 talk more and more about what would it take to develop or determine the feasibility of a universal
00:46:48.680 blood test for cancer based on this cell-free DNA technology. And being very first principle,
00:46:55.180 I really asked the question, well, why is it in 50 years of many companies and a tremendous amount
00:47:01.140 of academic research, no one had ever developed a broad-based blood test for cancer? Not just many
00:47:08.140 cancers, let alone any cancer. Really, the only example is PSA. And again, the false positive
00:47:14.600 rates there are so high that its benefit to harm has been questioned many times. And that's why it
00:47:20.980 doesn't have a USPSTF grade A or B anymore. And the fundamental reason is specificity. So there's lots
00:47:28.480 of things that are sensitive, meaning that there are proteins that accumulate, biochemistries, metabolites
00:47:34.460 that go up in cancer. But the problem is they go up in a lot of benign conditions. So, you know,
00:47:39.840 a big benign prostate spews out a lot of PSA. And pretty much every other protein or metabolite does
00:47:45.900 that. The biomarkers to date were all very sensitive, but all had false positive rates of,
00:47:53.540 say, 5% or 10%. And so if you're imagining screening the whole population, you can't be working up one of
00:48:00.560 10 people for a potential cancer. And so the key technological thing to solve was, well, how do
00:48:07.100 you have something that has a 1% false positive rate or a half percent false positive rate? Because
00:48:12.840 that's what you need to get to if you want to do broad-based cancer screening in relatively healthy
00:48:18.940 asymptomatic people. And this is why we thought it might be possible with cell-free DNA because
00:48:25.220 the tumor DNA could be more specific than proteins and other things that are common in benign disease.
00:48:33.440 And so that was the reason to believe. The things we didn't know is, well, how much DNA does an early
00:48:39.480 stage tumor pump out? If it doesn't pump out any, well, there's nothing to detect. The other is the
00:48:44.620 heterogeneity. Cancer is not like infectious disease where there's one very identifying antigen or sequence.
00:48:52.680 Every tumor is truly unique, right? So even two lung cancers that are both the same
00:48:57.600 histological subtype, they can share very few mutations or none. So you can have two squamous
00:49:03.860 cell lung cancers that honestly don't have a single shared mutation. So now you need to look at
00:49:09.960 hundreds or thousands or even millions of positions to see enough potential changes.
00:49:15.540 And this is where, again, NGS was a really good fit, which is how do you overcome the heterogeneity
00:49:21.340 that you need to now look for a disease that isn't defined? I can't tell you these three mutations
00:49:27.140 are the ones you need to find for this cancer. There's a huge set of different ones for every
00:49:33.040 cancer. And then that got us thinking, well, look, in addition to sequencing many positions and
00:49:39.000 sequencing very deeply and using cell-free DNA, we were going to need to use AI or machine learning
00:49:44.600 because we had to learn these complex associations and patterns that no human being could curate
00:49:52.000 thousands of different mutational profiles and try to find the common signals and so on.
00:49:58.320 What emerged over the course of a year is, look, this might be possible, but we're going to have to
00:50:03.980 enroll very large populations just to study and find the signals and develop the technology.
00:50:09.920 And then we're going to need very large studies to actually do interventions and prove it clinically
00:50:15.720 valid that it actually works. We're going to have to use NGS and sequence broadly across the whole
00:50:22.080 genome. And only then might it be possible. And so the company at the time decided, and this was a
00:50:30.320 board-level decision, that ultimately this made more sense as an independent company, given the amount
00:50:37.640 of capital that was going to be required, given the scientific and technical risk, given the kind
00:50:43.240 of people that you would need to recruit. We're passionate about this, that it made sense to do
00:50:48.240 as a separate company. And so the CEO at the time, Jay Flatley, in early 2016 announced the founding of
00:50:56.100 the company and then spinning it out of Illumina. I had the honor of being one of the co-founders of it.
00:51:02.100 Let's go back to 2016. You guys are now setting up new shop. You've got this new company. It's called
00:51:08.420 Grail. You've brought over some folks like you from Illumina, and presumably you're now also
00:51:15.340 recruiting. What is the sequence of the first two or three problems you immediately get to work on?
00:51:23.100 As I wrote the starting research and development plan, the way I wrote it was we needed to evaluate
00:51:29.420 every potential feature in cell-free DNA, meaning that any known method of looking for cancer in
00:51:36.680 cell-free DNA, we needed to evaluate. That if we were going to do this and recruit these cohorts and
00:51:42.060 all these altruistic individuals, and we were going to spend the money to do this, we needed to not look
00:51:46.620 at just one method or someone's favorite method or whatever they thought might work. We needed to look
00:51:52.040 at every single one. And so that's what we did. We developed an assay and software for mutations and
00:51:59.300 then a bunch of other things, chromosomal changes, changes in the fragment size, and many others. And we
00:52:05.740 said, look, we're going to test each one of these head-to-head, and we're going to test them in
00:52:09.160 combination, and we're going to figure out the best way to do this. We even had a mantra that Rick came up with
00:52:15.720 that I thought was very helpful, which is we're either going to figure out how to do this, or we're going to prove
00:52:19.940 it can't be done. I think that was very helpful in thinking about how to do these initial experiments.
00:52:25.100 So it was a lot of building these assays. We needed massive datasets to train the machine learning
00:52:30.360 algorithm. So we had this study called the CCGA, the Circulating Cell-Free Genome Atlas, where we
00:52:35.560 recruited 15,000 individuals with and without cancer of every major cancer type, and in most cases,
00:52:42.940 hundreds. And then we tested all of these different methods, the ones I mentioned, and also
00:52:49.020 importantly, a methylation-based assay. And we did blinded studies to compare them and see could
00:52:55.060 any of them detect a large fraction of the cancers? Did any of them have the potential to do it at high
00:53:00.200 specificity? Because that's what we would need if we were going to develop a universal test for cancer
00:53:06.100 that could be used in a broad population. So let's kind of go back and talk about a few of those things
00:53:11.300 there because there was a lot there. So you said up front, look, we're going to make sure that any
00:53:18.180 measurable property of cell-free DNA, we are measuring, we are quantifying it, we are noting
00:53:24.880 it. We talked about some of them, right? So fragment length, that seems relatively fixed, but presumably
00:53:30.560 at large enough sample size, you're going to see some variation there. Does that matter?
00:53:35.240 The actual genetic sequence, of course, that's your bread and butter, to be able to measure that.
00:53:41.320 You also mentioned, of course, something called methylation, which we haven't really talked about
00:53:45.980 yet. So we should explain what that means. Were there any other properties besides fragment length,
00:53:52.280 sequence, and methylation that I'm missing? There were several others. One was chromosomal changes.
00:53:57.580 So as we mentioned in cancer, the numbers of chromosomes often change. So many cancers,
00:54:03.980 and this is wild, they'll often double the number of chromosomes. So you can go from 23 to
00:54:10.200 double or even triple the number. But these chromosomes are not normal. So you'll often have
00:54:16.440 arms or the structures of chromosomes will get rearranged. And so there's a way to look at that
00:54:22.620 also in the cell-free DNA. Like as we mentioned in the non-invasive prenatal testing, where you look at
00:54:27.300 the amount of DNA per chromosome or per part of chromosome. So we looked at what's called these
00:54:32.800 chromosomal abnormalities. We also looked at cell-free RNA. So it turns out there's also
00:54:38.940 RNA from tumors in circulation. How stable is that, Alex? I was under the impression that
00:54:44.840 RNA wouldn't be terribly stable, unlike DNA, which of course is double-stranded and quite stable.
00:54:51.140 How do you capture cell-free RNA? So naked RNA is not very stable. However, there's proteins that if
00:55:00.080 the RNA is bound to, and one type is called an Argonaute protein, if the RNA is bound to it,
00:55:06.060 it is protected. I assume this is typically messenger RNA that's been in the process of
00:55:10.920 being transcribed. But somewhere along the way, before translation occurs, there's the disruption
00:55:18.400 to the cell that results in lysis or something. And you're just basically getting the cell-free RNA
00:55:23.640 because you happened to catch it at that point. It was a replicating cell or something,
00:55:28.200 or it was just translating protein?
00:55:29.620 Yeah. Or during apoptosis, it's somehow during some kind of programmed cell death,
00:55:34.860 it's being digested or bound. The amount relative to the amount of cell death is low. So presumably
00:55:41.600 most of the RNA is destroyed, but enough of it does get protected and bound to proteins.
00:55:47.720 Whether or not it's cellular detritus or garbage, or it's intentional, it's kind of a different
00:55:53.320 question, but it is present. There's also vesicular structures. So little bubbles of membrane that the
00:56:00.280 RNA can be contained in. The most common one is referred to as an exosome, which are these little
00:56:05.760 vesicles in circulation. So in a variety of different ways, you can have messenger RNA and other types of
00:56:12.200 RNA preserved outside of cells in circulation. And so we looked at that also.
00:56:19.360 How long did it take to quantify all of these things? And presumably, I think you sort of alluded
00:56:25.760 to this, but we're not just looking at any one of these things. You're also asking the question,
00:56:30.220 can combinations of these factors add to the fidelity of the test, correct?
00:56:34.920 Yeah. So this initial research phase took close to three years, cost hundreds of millions of dollars.
00:56:41.160 We had to recruit the largest cohort ever for this type of study, the CCGA study, as I alluded to.
00:56:47.580 And there were different phases. There was a discovery and then multiple development and
00:56:52.300 validation phases. We had to make the world's best assays to look at each of these features.
00:56:58.960 And then we had to process all of those samples and then analyze them. And we did it in a very
00:57:05.240 rigorous way where the final testing was all done blinded and the analysis was all done blinded.
00:57:10.360 So we could be sure that the results were not biased. And then we compared them all and we also
00:57:16.440 compared them in combinations. And we used sophisticated machine learning approaches to
00:57:21.320 really maximize the use of each individual type of data from each, you know, whether or not it was
00:57:26.440 mutations or the chromosomal changes or methylation.
00:57:29.200 So you mentioned that the CCGA had 15,000 samples. How many of those samples were cancers versus
00:57:36.640 controls? What was the distribution of those?
00:57:39.440 It's about 60% cancer versus controls. Yeah, 40%.
00:57:43.340 You sort of alluded to it, but just to be sure I understood, you're obviously starting first with
00:57:48.520 a biased set where you know what's happening and then you're moving to a blinded, unbiased set
00:57:53.380 for confirmation. Is that effectively the way you did it?
00:57:56.080 Yeah. Yeah. It's often referred to as a training set and a test set. Yeah.
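The training set and test set idea can be sketched as a simple random split of sample identifiers. This is a generic illustration and not how the CCGA study was actually partitioned; the function name, the 25% held-out fraction, and the seed are hypothetical choices.

```python
import random


def split_train_test(sample_ids, test_fraction=0.25, seed=0):
    """Shuffle the samples once and hold out a fraction as the test set."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    n_test = int(len(ids) * test_fraction)
    return ids[n_test:], ids[:n_test]


# 15,000 samples, matching the cohort size mentioned in the conversation.
train_ids, test_ids = split_train_test(range(15_000))
print(len(train_ids), len(test_ids))
```

The point of the split is that the classifier is fitted only on the training samples, and its performance is then measured on held-out samples it has never seen.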
00:58:02.220 Tell us what emerged, Alex, when it was all said and done, when you had first and foremost
00:58:07.440 identified every single thing that was measurable and knowable. Sorry, before we do that, I keep
00:58:13.400 taking us off methylation. Explain methylation of all the characteristics. That's the one I don't
00:58:18.380 think we've covered yet.
00:58:19.120 So DNA methylation is a chemical modification of the DNA. So in particular at the
00:58:25.580 C in the ATCG code, the C stands for cytosine. So that's a particular nucleotide or base in DNA.
00:58:34.240 Mammalian biology can methylate. It means that it can add a methyl group, but a methyl group is just
00:58:40.500 a single carbon atom with three hydrogens and then bonded to that cytosine. And so that's what DNA
00:58:46.760 methylation is. So to say a cytosine is methylated, it means that it has that single methyl group bonded
00:58:54.120 to it. Turns out that there's about 28 million positions in the human genome that can be methylated.
00:59:00.640 It usually occurs at what's called CpG sites, which is if you go along one strand of DNA,
00:59:07.840 this is not pairing of the DNA, but one strand, a G follows a C. So that's what a CpG is. It's a C
00:59:14.260 with a phosphate bond to a G. And so at those positions in the genome, there are enzymes that
00:59:21.000 can methylate the cytosine and demethylate it. And there's again, about 28 million of those sites
00:59:27.700 out of the 3 billion overall bases in the human genome. These chemical modifications are really
00:59:35.340 important because they affect things like gene expression. It's one of the more important classes
00:59:40.300 of something that's called epigenetics, which is changes that are outside of the genetics or outside
00:59:46.040 of the code itself. As you know, the DNA code is the same in most cells of the human body.
00:59:51.600 Obviously, the cells are quite different. So a T cell is very different than a neuron.
00:59:55.360 And other than the T cell receptor, all of the genes are the same. The code is the same. So why are the
01:00:00.900 cells different? Well, it's the epigenetics. So things like which parts of the gene are methylated
01:00:06.700 or which ones are associated with histones that are blocking access to the DNA, that's what
01:00:13.460 ultimately determines which genes are transcribed, which proteins are made, and why cells take on
01:00:19.820 very different morphology and properties. The methylation is a very fundamental code
01:00:25.900 for controlling it. So I call the epigenetics the software of the genome. The genetic code is kind
01:00:32.920 of the hardware, but how you use it, which genes you use when, which combination, that's really the
01:00:38.740 epigenetics. What is the technological delta or difference in reading out the methylation sequence
01:00:48.640 on those CpG sites relative to the ease with which you simply measure the base pair sequences? So you can
01:00:58.460 measure C, G, A, T, C, C, A, T, G, et cetera. But then in the same readout, do you also acquire which of
01:01:07.160 the C's are methylated, or are you doing a separate analysis? There's different technologies to do that.
01:01:12.940 For cell-free DNA, usually you want very accurate sequencing of billions of these, or many hundreds
01:01:19.440 of millions of these small fragments. The way it's done is, and this adds complexity to the chemistry,
01:01:25.320 is you pre-treat the DNA in a way that encodes the methylation status in the ATCG sequence,
01:01:32.800 and then you just use a sequencer that can only see ATCG. But because you've encoded the information,
01:01:39.780 you can then deconvolute it and infer which sites were methylated. Just to be a little more specific,
01:01:46.580 there's chemicals that will, for example, deaminate a cytosine that's not methylated.
01:01:53.520 And then that deaminated cytosine effectively turns into a uracil, which is a fifth letter in RNA.
01:01:59.920 And then when you copy the DNA and you amplify it prior to the sequencing, it amplifies as a T,
01:02:07.120 because a U, when it's copied by a DNA polymerase, it becomes a T. And then you end up with a sequence
01:02:13.160 where you expect to see C's and you see a T. And if you see a T there, then you know that,
01:02:18.900 aha, this must have been an unmethylated site.
01:02:21.900 That came from a U, and the U is an unmethylated C. Brilliant.
01:02:26.080 Brilliant. Right. And if the C was not changed, then you say, then that must have been a site that
01:02:31.120 was methylated. Because you'll see G's opposite them. Oh, sorry. If the C was methylated, you'll
01:02:36.300 see the G opposite because you won't turn it to the uracil. Right, right. Yeah. Brilliant.
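The read-out logic described above can be sketched in a few lines of Python. This is a minimal illustration, not the assay's actual pipeline: it assumes a single strand, a read already aligned to a reference of the same length, and no sequencing errors. The function names and the toy sequence are hypothetical.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate conversion plus amplification on one strand:
    an unmethylated C is read as T, a methylated C is still read as C."""
    out = []
    for i, base in enumerate(seq):
        if base == "C" and i not in methylated_positions:
            out.append("T")  # unmethylated C -> U -> amplified as T
        else:
            out.append(base)
    return "".join(out)


def call_cpg_methylation(reference, read):
    """For every CpG in the reference, infer methylation from the read.
    Returns {position of the C: True if methylated, False if unmethylated}."""
    if len(reference) != len(read):
        raise ValueError("this sketch assumes an aligned read of equal length")
    calls = {}
    for i in range(len(reference) - 1):
        if reference[i] == "C" and reference[i + 1] == "G":
            if read[i] == "C":
                calls[i] = True    # C survived conversion: methylated
            elif read[i] == "T":
                calls[i] = False   # C became T: unmethylated
            # any other base is treated as a no-call in this sketch
    return calls


# Toy reference with CpGs at positions 1, 4 and 8 (hypothetical sequence).
reference = "ACGTCGACCGTA"
read = bisulfite_convert(reference, methylated_positions={1, 8})
calls = call_cpg_methylation(reference, read)
print(read)
print(calls)
```

Comparing the read against the reference is what makes the inference possible: a T where the reference has a C in a CpG context is taken as an unmethylated site.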
01:02:41.560 That technique is called bisulfite sequencing. There are other ways to do it, but that's the
01:02:45.780 predominant one. All right. So now back to the question I started to ask a minute ago,
01:02:49.780 but then realized we hadn't talked about methylation. So you've come up with all these different
01:02:53.620 things that you can do with this tiny amount of blood. Because again, you talk about 10 mL,
01:02:59.920 you know, in the grand scheme of things, that's a really small amount of blood. That's two small
01:03:04.340 tubes of blood. Very easy to do. Presumably there was an optimization problem in here where you min
01:03:10.440 max this thing and realize, well, look, this would be easy to do if we could take a liter of blood,
01:03:15.400 but that's clinically impossible. Yeah. It would be nice to Theranos this quote unquote,
01:03:20.540 and do this with a finger stick of blood, but you're never going to get the goods.
01:03:24.540 So did you sort of end up at 10 mL? Was it just sort of an optimization problem that got you there
01:03:30.340 as the most blood we could take without being unreasonable, but yet still have high enough
01:03:35.380 fidelity? And maybe asked another way, can you get better and better at doing this if you were taking
01:03:40.980 eight tubes of blood instead of two? Yeah. There's a couple of considerations. One is the
01:03:46.000 practical one. You need a format to the extent your standard phlebotomy and standard volumes that are
01:03:52.340 below the volumes at which you could put someone in jeopardy. That's a big practical issue. But it
01:03:58.560 actually turned out that what ultimately limited the sensitivity of the test was the background biology.
01:04:06.140 So for broad-based cancer screening, more blood would actually not help you. Now there's other
01:04:11.180 applications for monitoring or the therapy selection where you're looking for a very particular target,
01:04:18.520 someone who has cancer and you know what kind of cancer, and there you could improve your sensitivity.
01:04:23.280 But just for cancer screening, you're usually not limited by the amount of blood.
01:04:29.400 And so did methylation turn out to be the most predictive element at giving you that very,
01:04:35.860 very high specificity, or was it some combination of those measurable factors?
01:04:42.340 Yeah. So it was pretty unexpected. I would say going into it, most people thought that the mutations
01:04:47.880 were going to be the most sensitive method. Some of us thought that the chromosomal changes were going
01:04:53.860 to be the most sensitive. I would say the methylation signals were kind of a dark horse. I had to fight
01:04:59.640 several times to keep it in the running. But again, we really took a, let the data tell us what's
01:05:05.860 the right thing to do. It's not biases from other experiments. Let's do this in a comprehensive,
01:05:11.120 rigorous way. And in the end, the methylation performed by far the best. So it was the most
01:05:17.220 sensitive. So it detected the most cancers. Importantly, it was very specific. It actually
01:05:22.680 had the potential and ultimately did get to less than 1% false positive rate. And then the methylation
01:05:28.760 had this other feature, which was very unique, which was that it could predict the type of cancer.
01:05:34.640 What was the original, what we call now the cancer site of origin? What organ or tissue did it originate
01:05:41.880 from? Interestingly, adding them all together didn't improve on the methylation. I can explain
01:05:47.920 why. And now in hindsight, you might've thought, Hey, more types of information and signal are better,
01:05:53.620 but it actually didn't. So we ended up with one clear result that the methylation patterns in the
01:05:59.960 cell-free DNA were the most useful and informative, and adding other things was not going to help the
01:06:06.440 performance. And why do you think that was? Because it is a little counterintuitive. There are clearly
01:06:13.080 examples I could probably think of where you can degrade the signal by adding more information.
01:06:18.440 But I'm curious if you have a biologic teleologic explanation for why one and only one of these
01:06:26.300 metrics turned out to be the best and any additional information only diluted the signal.
01:06:31.860 It comes down to, this is a good engineering principle, right? If you want to improve your
01:06:36.600 prediction, you need an additional signal that carries information and is independent from your
01:06:42.580 initial signal. If it's totally correlated, then it doesn't actually add anything.
01:06:47.760 Let's take an analogy. Let's say you're on a freeway overpass and you're developing an image
01:06:53.320 recognition for Fords. And you say, okay, what I'm going to start initially with is an algorithm.
01:06:59.280 It's going to look for a blue oval with the letters F-O-R-D in it. So that's pretty good.
01:07:04.360 Now let's say you say, okay, I know that some Fords also have the number 150 on the side,
01:07:10.840 F-150. So I'm going to add that, right? If you think about it, if your algorithm
01:07:17.600 based on the blue oval is already pretty good, adding the 150 is not going to help because
01:07:24.740 whenever the 150 occurs, the blue oval is also always there. Now, if the blue oval wasn't always
01:07:31.280 there or there were Fords that didn't have the blue oval, then some other signal could be helpful.
01:07:35.940 And so that's kind of what ended up happening is that the methylation signal was so much more
01:07:41.680 prevalent and so much more distorted in cancer that everything else didn't really add because
01:07:48.840 anytime you could see one of the others, you could also see many more abnormal methylation fragments.
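The Ford analogy can be made concrete with a small sketch. The car records below are invented for illustration; the point is only that a second signal helps when it fires on cases the first signal misses, and adds nothing when it is fully nested inside the first.

```python
def sensitivity(cars, detector):
    """Fraction of true Fords that the detector flags."""
    fords = [c for c in cars if c["ford"]]
    return sum(1 for c in fords if detector(c)) / len(fords)


def oval_only(car):
    return car["oval"]


def oval_or_150(car):
    return car["oval"] or car["f150"]


# Hypothetical traffic where "150" only ever appears alongside the blue oval.
nested = [
    {"ford": True, "oval": True, "f150": True},
    {"ford": True, "oval": True, "f150": False},
    {"ford": True, "oval": True, "f150": True},
    {"ford": True, "oval": False, "f150": False},  # badge missing, no "150"
    {"ford": False, "oval": False, "f150": False},
]

# Hypothetical traffic where one Ford shows "150" but has lost its badge.
complementary = [
    {"ford": True, "oval": True, "f150": True},
    {"ford": True, "oval": True, "f150": False},
    {"ford": True, "oval": True, "f150": True},
    {"ford": True, "oval": False, "f150": True},   # only the "150" is visible
    {"ford": False, "oval": False, "f150": False},
]

for name, cars in [("nested", nested), ("complementary", complementary)]:
    print(name, sensitivity(cars, oval_only), sensitivity(cars, oval_or_150))
```

In the nested case both detectors find the same Fords, which mirrors the situation described for mutations and chromosomal changes relative to methylation.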
01:07:55.820 Yeah, that's really fantastic. I guess I also want to, again, just go back and make sure people
01:08:01.480 understand the mission statement you guys brought to this, which was high specificity is a must.
01:08:09.040 So people have heard me on the podcast do this before, but just in case there are people
01:08:13.300 who haven't heard this example or forget it, I sometimes like to use the metal detector analogy
01:08:18.820 in the airport to help explain sensitivity and specificity. So sensitivity is the ability of
01:08:25.400 the metal detector to detect metal that should not go through. And let's be clear. It's not that people
01:08:32.600 in the airports care if your phone is going through or your laptop or your watch or your belt,
01:08:38.440 they care that you're bringing guns or explosives. That's why we have metal detectors or knives or
01:08:45.060 things of that nature. That's why the metal detector exists. It has to be sensitive enough
01:08:50.640 that no one carrying one of those things can get through. On the other hand, specificity would say,
01:08:58.100 so if you're optimizing for sensitivity, you make it such that you will detect any metal that goes
01:09:03.680 through that thing. And by definition, you're going to be stopping a lot of people. You're going to stop
01:09:10.280 everybody from walking through. If their zipper is made of metal, you'll stop them.
01:09:14.740 Or prosthetic or a big belt or boots or anything. You got a little metal on your glasses, you're going
01:09:21.120 to get stopped. So you have to dial the thing in a way so that you have some specificity to this test
01:09:27.100 as well, which is I can't just stop everybody. In an ideal world, I kind of want everyone to make
01:09:33.380 it through who's not carrying one of those really bad things. And we're defining bad thing by a certain
01:09:38.900 quantity of metal. And therefore, your specificity is to kind of say, I don't want my test to be
01:09:47.580 triggered on good guys, right? I want my test to be triggered on bad guys. Now, when you guys are
01:09:54.820 designing a test like this, like the Grail test, I guess I should just go back and state anybody
01:10:00.300 who's ever been through two different airports wearing the exact same clothing and realizes
01:10:06.240 sometimes it triggers, sometimes it doesn't. What you realize is not every machine has the same
01:10:09.960 setting. And that's because the airport, the people at TSA, they turn up or turn down the sensitivity
01:10:15.500 and that changes the specificity as well. How deliberately do you, when you're setting up this
01:10:23.240 assay, have the capacity to dial up and down sensitivity and specificity? So while I understand
01:10:29.480 your mandate was a very high specificity test, where was the control or manipulation of that system,
01:10:37.940 if at all? So there's a threshold. It's complex. Conceptually, there's a threshold inside the
01:10:44.000 algorithm, right? So you can imagine that after you have this comprehensive map of all these different
01:10:51.140 types of methylation changes that can occur in the fragments of hundreds of examples of every cancer
01:10:57.200 type. And then you compare it to all the methylation changes that can occur outside of cancer, which we
01:11:03.880 haven't talked about, which is very important. So most of the methylation patterns are pretty similar
01:11:09.380 in similar cell types across individuals. But there are changes that occur with age or
01:11:14.760 ethnicity or environmental exposure and so on. What you'd like is those two populations to be
01:11:21.500 completely different. But it turns out there is some overlap. So there are fragments that occur in
01:11:28.040 cancer that can occur outside of cancer. The algorithm in a very complex state space is trying to
01:11:35.320 separate these populations. And whether or not you're going to call something as a potential cancer
01:11:42.580 and say a cancer signal is detected is whether or not the algorithm thinks, is it associated with
01:11:47.620 this cancer group or is it associated with a non-cancer group? But again, there's some overlap
01:11:54.140 between these. And so where you set that overlap, like in the border between individuals who don't have
01:12:01.860 cancer, but have, for whatever reason, an abnormal level of fragments that kind of look cancerous,
01:12:07.540 that will determine your specificity. So there is a dial to turn where you can increase the
01:12:14.440 stringency, call fewer false positives, but then you will start to miss some of the true positives.
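The dial described here can be sketched as a single cutoff on a classifier score. The scores below are made up for illustration and the real algorithm is far more complex; the sketch only shows how moving one threshold trades sensitivity against specificity when the two populations overlap.

```python
def sens_spec(cancer_scores, control_scores, threshold):
    """Call 'signal detected' when score >= threshold.
    Returns (sensitivity, specificity) at that threshold."""
    true_pos = sum(s >= threshold for s in cancer_scores)
    true_neg = sum(s < threshold for s in control_scores)
    return true_pos / len(cancer_scores), true_neg / len(control_scores)


# Hypothetical scores: one cancer scores low, one control scores high (the overlap).
cancer = [0.35, 0.55, 0.62, 0.70, 0.81, 0.88, 0.93, 0.97]
control = [0.02, 0.05, 0.08, 0.11, 0.15, 0.21, 0.30, 0.58]

for threshold in (0.3, 0.5, 0.6):
    sens, spec = sens_spec(cancer, control, threshold)
    print(f"threshold={threshold}: sensitivity={sens}, specificity={spec}")
```

Raising the threshold removes the false positive among the controls at the cost of missing more cancers, which is the trade-off being described.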
01:12:21.260 Now, what was so great about methylation is that these populations were pretty well separated,
01:12:26.680 better than anything the world had ever seen before, which is why you could get high specificity
01:12:31.940 and still pretty good sensitivity. But yes, there is some overlap, which means you have to make a
01:12:38.040 trade-off and dial it in. Inside the company, is there sort of a specific discussion around
01:12:44.320 the trade-offs of it's better to have a false positive than have a false negative? Like let's
01:12:49.800 use the example you brought up earlier, right? So prostate-specific antigen is kind of the mirror
01:12:54.380 image of this, right? It's a highly, highly sensitive test with very low specificity. It's obviously a
01:13:01.120 protein, so it's a totally different type of assay, right? It's a far cruder test, of course.
01:13:05.600 But the idea is, in theory, and of course I could give you plenty of examples, someone with prostate
01:13:11.580 cancer is going to have a high PSA. So you're not going to miss people with cancer. But as you pointed
01:13:18.420 out earlier, you're going to be catching a lot of people who don't have cancer. And it's for that
01:13:24.060 reason, as you said, there is no longer a formal recommendation around the use of PSA screening.
01:13:28.820 It has now kind of been relegated to the just talk to your doctor about it. And of course,
01:13:34.540 the thinking is, look, there are too many men that have undergone an unnecessary prostate biopsy on the
01:13:40.580 basis of an elevated PSA that really should have been attributed to their BPH or prostatitis or
01:13:46.660 something else. So notwithstanding the fact that we have far better ways to screen for prostate cancer
01:13:51.440 today, that's a test that is highly geared towards never missing a cancer. In its current format,
01:13:59.260 under low prevalence populations, which is effectively the population it's being designed
01:14:05.580 for, right? This is designed as a screening tool. It seems to have better negative predictive value
01:14:10.900 than positive predictive value, correct? It's pretty high in both because negative predictive
01:14:15.260 value also is related to prevalence. Well, just to put some numbers out there, right? So
01:14:20.200 in the CCGA study, but then importantly, in an interventional study called Pathfinder,
01:14:26.620 a positive predictive value is around 40%. That's all stages?
01:14:31.680 Yeah. So that's all cancers, all stages. It's a population study. So it's whatever natural
01:14:37.220 set of cancers and stages occur in that group. So that was about 6,500 individuals.
01:14:43.920 Do you recall, Alex, what the prevalence was in that population? Was it a low risk population?
01:14:50.200 Yeah. So it was a mix of a slightly elevated risk population and then an average risk population.
01:14:58.240 Just in terms of risk, and I think you'll appreciate this, I think of anyone over 50 as
01:15:02.720 high risk. And that's where the majority of these studies are happening, right? So I mean,
01:15:07.000 age is your single biggest risk factor for cancer. The population over 50 is about a 10x increased risk
01:15:14.640 relative to the population under 50.
01:15:17.920 And age 55 to 65 is the decade where cancer is the number one cause of death.
01:15:23.620 I would say in developed nations, I mean, that's actually increasing, right? I mean,
01:15:27.920 we're making such incredible progress on metabolic disease and cardiovascular disease. Cancer in the
01:15:34.000 developed world is predicted to surpass cardiovascular disease as the number one killer.
01:15:39.020 Anyway, older populations are at, I wouldn't call them low risk, I'd call them average risk for that
01:15:45.100 age group, which is still relatively high for the overall population. But it was a mixed prevalence,
01:15:50.440 a bit less than 1%. Some of these studies do have a healthy volunteer bias.
01:15:55.840 In a 6,500 person cohort with a prevalence of 1%, which is pretty low, the positive predictive value was 40%.
01:16:06.020 Yep, that's right.
01:16:07.880 What was the sensitivity for all stages then? It must have been,
01:16:11.560 it's easy to calculate if I had my spreadsheet in front of me, but it's got to be 60% or higher.
01:16:17.000 Sensitivity and specificity has got to be close over 99% at that point, right?
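The spreadsheet calculation being gestured at can be sketched with Bayes' rule. The 1% prevalence and 40% positive predictive value are the figures quoted in the conversation; the 50% sensitivity used below is an illustrative assumption, and the function names are hypothetical.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value: true positives over all positives."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def specificity_for_ppv(target_ppv, sensitivity, prevalence):
    """Solve the PPV formula for the specificity that yields target_ppv."""
    true_pos = sensitivity * prevalence
    false_pos = true_pos * (1 - target_ppv) / target_ppv
    return 1 - false_pos / (1 - prevalence)


prevalence = 0.01           # about 1%, as quoted
target = 0.40               # positive predictive value, as quoted
assumed_sensitivity = 0.50  # assumption for illustration only

spec = specificity_for_ppv(target, assumed_sensitivity, prevalence)
print(f"required specificity: {spec:.4f}")
print(f"check PPV: {ppv(assumed_sensitivity, spec, prevalence):.2f}")
```

Under these assumptions the required specificity comes out just above 99%, which is why a small change in the false positive rate moves the positive predictive value so much at low prevalence.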
01:16:21.760 Those are the rough numbers. Yeah, that's right. Some of the important statistics there, right? So about
01:16:26.780 half of the cancers that manifested over the lifetime of the study were detected by the test. The test
01:16:34.100 actually doubled the number of cancers found in that interventional study compared with those detected by standard
01:16:39.420 of care screening alone. The interventional study, the Pathfinder study, the enrollees were getting
01:16:45.920 standard of care screening according to guidelines. So mammography, importantly, cervical cancer
01:16:52.120 screening, and then colonoscopies or stool-based testing based on guidelines. And so a number of the
01:16:58.340 cancers that the Grail Galleri test detected were also detected by standard of care, which you would
01:17:04.400 expect. But the total number of cancers found was about doubled with the addition of the Galleri test.
01:17:11.880 And that was predominantly cancers where there isn't a screening test for. But just going back to
01:17:17.260 the positive predictive value, just the positive predictive value of most screening tests is low single
01:17:22.700 digits. You probably have the experience more than I have, but many, many times a female colleague,
01:17:29.060 friend, or someone's wife calls and says, you know, I got a mammography. They found something. I'm
01:17:34.480 going to have to go for a follow-up, a biopsy, and so on. And literally 19 times out of 20, it's a false
01:17:41.580 positive. That's one where we've accepted, for better or worse, a huge false positive rate. Catch some
01:17:48.560 cancers, right? And that's why there's a fair amount of debate around mammography. But again,
01:17:53.040 that's a positive predictive value of about four and a half percent. The vast majority of people who
01:17:58.680 get initial positive, they're not going to end up having cancer, but still potentially worth it.
01:18:05.220 Now we're talking about something where we're approaching one in two positive tests will
01:18:10.340 ultimately lead to a cancer diagnosis that's potentially actionable. So it's, I think sometimes
01:18:15.920 when people hear 40%, they say, gee, that means there's still a fair amount of people who are
01:18:22.240 going to get a positive test, meaning a cancer signal detected and ultimately not. But again,
01:18:28.240 for a screening test, that's incredibly high yield. I think another way to think about that is to go
01:18:33.600 back to the airport analogy. So this is a metal detector that is basically saying, look, we're willing
01:18:42.960 to beep at people who don't have knives to make sure everybody with a knife or gun gets caught.
01:18:49.280 So the negative predictive value is what's giving you the insight about the bad guys. So a 40% positive
01:18:56.780 predictive value means, let's just make the numbers even simpler. Let's say it's a 25% positive predictive
01:19:03.680 value. It means for every four people you stop, only one is a true bad guy. Think about what it's like
01:19:12.420 in the actual airport. How many times in a day does the metal detector go off and how many times in a
01:19:19.380 day are they catching a bad guy? The answer is it probably goes off 10,000 times in a day and they
01:19:25.540 catch zero bad guys on average. So that gives you a sense of how low the positive predictive value is
01:19:31.480 and how high the sensitivity is and how low the specificity is. So yes, I think that's a great way to
01:19:37.320 look at it, which is if you are screening a population that is of relatively normal risk,
01:19:45.740 a positive predictive value of 20% is very, very good. It also explains, I think, where the burden
01:19:54.820 of responsibility falls to the physician, which is as a physician, I think you have to be able to talk
01:20:00.760 to your patients about this explicitly prior to any testing. I think patients need to understand that,
01:20:10.220 hey, there's a chance that if I get a positive test here, it's not a real positive. I have to have
01:20:17.000 kind of the emotional constitution to go through with that, and I have to be willing to then engage
01:20:22.460 in follow-up testing. Because if this thing says, oh, you know, Alex, it looks like you have a lung cancer,
01:20:28.800 the next step is, I'm going to be getting a chest x-ray, or I'm going to be getting a low-dose CT
01:20:33.140 of my chest. And that doesn't only come with a little bit of risk, in this case, radiation,
01:20:37.740 although it's an almost trivial amount, but I think more than anything, it's the risk of the emotional
01:20:42.880 discomfort associated with that. And I think, honestly, when you present the data this way to patients,
01:20:49.020 they really understand it, and they really can make great informed decisions for themselves.
01:20:53.580 And by the way, for some of those patients, it means, thank you, but no thank you.
01:20:56.820 I just don't want to go through with this, and that's okay, too. Let's talk a little bit about
01:21:01.160 some of the really interesting stuff that emerged in the various histologies and the various stages.
01:21:08.140 And I've had some really interesting discussions with your colleagues. I guess, just for the sake
01:21:12.480 of completing the story, you're no longer a part of the company, Grail. Maybe just explain that so
01:21:18.500 that we can kind of get back to the Grail stuff, but just so that people understand kind of your
01:21:22.300 trajectory. We should do that. Yeah. So I was at Illumina, and then I helped spin off Grail as a
01:21:27.900 co-founder, led the R&D and clinical development. I actually went back to Illumina as the chief
01:21:34.340 technology officer running all of the company's research and development. Really, really fantastic,
01:21:39.640 fun job. Subsequently, Illumina acquired Grail, making it a wholly owned subsidiary of Illumina.
01:21:47.360 That was almost three years ago. Recently, I left Illumina to start a new company, a really
01:21:54.220 interesting biotech company that I'm the CEO of. No longer actively involved in either company.
01:22:00.200 I have great relations with all my former colleagues. Excited to see their progress.
01:22:04.960 I should also say that I am still a shareholder also of Illumina, just for full disclosure.
01:22:10.020 Yeah. Thank you. You have a number of colleagues, as you said, who are still at Grail,
01:22:13.740 who I've gotten to know. One of the things that really intrigued me was, again, some of the
01:22:20.100 histologic differences and the stage differences of cancer. If you look at the opening data,
01:22:29.600 a few things stood out. There were certain histologies that, if you took them all together
01:22:35.660 by stage, didn't look as good as others. For example, talk a little bit about prostate cancer
01:22:42.320 detection using the Galleri test. I think what you're referring to is there's a very wide variety
01:22:50.480 of different performances in different cancers. They're all highly specific, so very low false
01:22:55.700 positive rate because there's only one false positive rate for the whole test, which is probably worth
01:23:00.720 spending some time on later. For example, sensitivity to many GI cancers or certain histologies of lung
01:23:08.280 cancer, the test is very good at detecting earlier stage localized cancers. Particularly in prostate
01:23:15.080 cancer and hormone receptor positive breast cancer, the detection rate is lower for stage one cancers.
01:23:23.620 But this gets to a very important issue, which is what is it that you want to detect? So do you want
01:23:29.720 to detect everything that's called a cancer today? Or is what you want to detect is you want to detect
01:23:33.980 cancers that are going to grow and ultimately cause harm? So the weird thing about cancer screening in
01:23:40.020 general is there's both over and under diagnosis. Most small breast cancers and most DCIS and most
01:23:46.960 even small prostate cancers will never kill the patient or cause morbidity, but there is a small
01:23:52.640 subset that will. And so for those, we have decided to, again, go for a trade-off where we'll often
01:24:00.420 resect things and go through treatments just to make sure that smaller percentage is removed,
01:24:05.660 even though we're removing a ton of other, quote, cancers that are unlikely to ever proceed into
01:24:12.540 anything dangerous. On the flip side, 70% of people who die of cancer, they die from an unscreened cancer.
01:24:19.900 So there's huge underdiagnosis. You should remember that. 70% of people who ultimately die of cancer on
01:24:27.460 their death certificate, they die from a cancer where there was no established screening prior to
01:24:32.900 something like Grail's Galleri. So we have this weird mix of, there's a lot of cancers where we
01:24:37.540 know we're overdiagnosing, but we're doing it for a defensible trade-off. And then there's a huge
01:24:43.400 number of cancer deaths occurring where there's essentially zero diagnosis. But back to the ones
01:24:49.020 where there's underdiagnosis, it gets back to what does it mean to have tumor DNA in your blood?
01:24:55.420 So measuring and detecting a cancer from tumor DNA in your blood is a functional assay.
01:25:01.360 To get tumor DNA in your blood, you have to have enough cells. They have to be growing fast enough,
01:25:07.380 dying fast enough, and have blood access. So those are the things that you require.
01:25:13.180 Now, if you have a tumor that's small, encapsulated, not growing, well, guess what? It's not going to have
01:25:20.180 DNA in the blood. So unlike an imaging assay, which is anatomical, this is really a functional
01:25:26.200 assay. You're querying for whether or not there's a cancer that has the mass, the cell activity and
01:25:33.160 death, and access to the blood to get and manifest its DNA into the blood. So it's really stratifying
01:25:41.600 cancers on whether or not they have the activities. Now, interestingly, this functional assay
01:25:47.180 is very correlated with ultimate mortality. There's a really nice set of data that
01:25:53.700 GRAIL put out where you look at Kaplan-Meier curves. So over the course of the CCGA study,
01:25:58.960 which is now going out, I don't know, five plus years, you can say, well, what do survival curves
01:26:04.600 look like if your test was positive, meaning your cancer was detected, versus your test was negative, meaning your
01:26:10.540 cancer was not detected by the GRAIL test. And there's a big difference. So basically,
01:26:15.020 if your cancer was undetectable by the GRAIL test, you have a very good outcome, much,
01:26:23.740 much better than the general population with that cancer. So this suggests two things. One is,
01:26:29.380 A, those cancers may not have actually been dangerous because there's not a lot of mortality
01:26:34.240 associated with them. And maybe that's also why they couldn't put their tumor DNA in the blood.
01:26:38.900 The other is whatever the existing standard of care is, it's working well. Now, if you look at all
01:26:45.060 the cancers in the Kaplan-Meier curve that were detected, they have a lot of mortality associated
01:26:50.920 with them. And so what it's showing is that it's the dangerous cancers, the cancers that are
01:26:55.500 accounting for the majority of mortality, those are the ones that the test is detecting.
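For readers unfamiliar with Kaplan-Meier curves, the estimator behind them can be sketched as follows. The follow-up times and outcomes below are invented for illustration and are not CCGA data; the sketch only shows how a survival curve is built for a "detected" and a "not detected" group.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate.
    times: follow-up time per patient; events: 1 if death observed, 0 if censored.
    Returns [(time, survival probability)] at each time where a death occurred."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group everyone whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored patients both leave the risk set
    return curve


# Hypothetical follow-up in years; 1 = died, 0 = censored (still alive at last contact).
detected = kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1])
not_detected = kaplan_meier([2, 4, 5, 5, 5, 5], [0, 1, 0, 0, 0, 0])

print("detected:", [(t, round(s, 3)) for t, s in detected])
print("not detected:", [(t, round(s, 3)) for t, s in not_detected])
```

In this toy data the detected group's curve falls steeply while the not-detected group's curve stays high, which is the shape of the separation being described.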
01:27:00.580 This biological rationale makes a lot of sense, which is, okay, a tumor that grows fast, can get
01:27:06.520 its DNA in the blood. Well, that's probably also a dangerous tumor that is going to become invasive
01:27:11.000 and spread. So again, it's a functional assay. So if your cancer is detected by one of these tests,
01:27:18.680 like the Galleri test, it's saying something about the tumor that is very profound, which is that it's
01:27:25.260 active enough to get its signal into the blood. And it's very likely, if untreated, to ultimately
01:27:32.500 be associated with morbidity and potentially mortality. I think it's an open question of
01:27:38.760 these tumors that aren't detectable and that are in cancers, we know there's a lot of indolent
01:27:45.440 disease. What does it really mean that the test is low sensitivity for that?
01:27:50.220 Yeah. I would say that when I went through these data and I went through every single histology
01:27:56.880 by stage, I did this exercise probably 18 months ago. The one that stood out to me more than any
01:28:04.660 other was the sensitivity and specificity discrepancy. Well, I should say the sensitivity
01:28:12.100 discrepancy between triple negative breast cancer and hormone positive breast cancer. You alluded to
01:28:20.040 this, but I want to reiterate the point because I think within the same quote unquote disease of
01:28:25.240 breast cancer, we clearly understand that there are three diseases. There's estrogen positive,
01:28:31.280 there's HER2/neu positive, there's triple negative. Those are the defining features of three
01:28:36.140 completely unrelated cancers with the exception of the fact that they all originate from the same
01:28:42.360 mammary gland. But that's about where the similarity ends. Their treatments are different,
01:28:46.580 their prognoses are different. And to take the two most extreme examples, you take a woman who has
01:28:53.160 triple positive breast cancer, i.e. it's estrogen receptor positive, progesterone positive,
01:28:59.040 HER2/neu positive. You take a woman who has none of those receptors positive. The difference on the
01:29:05.040 Galleri test performance on stage one and stage two, so this is cancers that have not even spread to
01:29:12.780 lymph nodes. The hormone positives were about a 20% sensitivity for stage one, stage two,
01:29:18.560 and the triple negative was 75% sensitivity for stage one, stage two. And so this underscores your
01:29:26.220 point, which is the triple negative cancer is a much, much worse cancer. And that at stage one,
01:29:34.500 stage two, you're detecting 75% sensitivity portends a very bad prognosis. Now, I think the really important
01:29:45.900 question here, I believe that this is being asked, is does the ability to screen in this way lead to better
01:29:56.320 outcomes? So I will state my bias, because I think it's important to put your biases out there,
01:30:02.220 and I've stated it publicly many times, but I'll state it again. My bias is that yes, it will. My bias
01:30:09.260 is that early detection leads to earlier treatment. And even if the treatments are identical to those
01:30:17.220 that will be used in advanced cancers, the outcomes are better because of the lower rate of tumor burden.
01:30:23.580 And by the way, I would point to two of the most common cancers as examples of that, which are breast
01:30:30.140 and colorectal cancer, where the treatments are virtually indistinguishable in the adjuvant setting
01:30:36.320 versus the metastatic setting. And yet the outcomes are profoundly different. In other words, when you
01:30:42.060 take a patient with breast or colorectal cancer, and you do a surgical resection, and they are a stage three
01:30:47.920 or less, and you give them adjuvant therapy, they have far, far, far better survival than those patients who
01:30:56.300 undergo a resection, but have metastatic disease and receive the same adjuvant therapy. It's not even
01:31:02.380 close. And so that's the reason that I argue that the sooner we know we have cancer and the sooner we
01:31:08.220 can begin treatment, the better we are. But the skeptic will push back at me and say, Peter, the only thing
01:31:14.280 the Grail test is going to do is tell more people bad news. So we'll concede that people are going to
01:31:22.480 get a better, more relevant diagnosis, that we will not be alerting them to cancers that are irrelevant
01:31:29.920 and over-treating them. And we will alert them to negative or more harmful cancers, but it won't
01:31:37.060 translate to a difference in survival. So what is your take on that? And how can that question be
01:31:43.920 definitively answered? It's a very important question. And over time, it will be definitively
01:31:50.700 answered. So we should talk about some of Grail's studies and how they're going about it.
01:31:55.640 So the statistics are very profound, like you said. So most solid tumors, five-year survival,
01:32:01.600 when disease is localized, hasn't spread to another organ, 70 to 80% five-year survival,
01:32:08.360 less than 20% for metastatic stage four disease. That correlation of stage at diagnosis versus five-year
01:32:16.520 survival is night and day. And obviously, everyone would want them and their loved ones,
01:32:21.620 most people in the localized disease category. Now, there's an academic question, like you're
01:32:28.020 saying, which is, okay, well, that's true. But does that really prove that if you find people at
01:32:33.240 that localized disease through this method, as opposed to all the variety of methods that happens today,
01:32:39.300 incidentally, that you will have the same outcome? And sure, I guess you could come up with some
01:32:45.340 very theoretical possibility that somehow that won't, but that doesn't seem very likely.
01:32:52.940 And I think it gets to a fundamental question of, well, are we going to wait decades to see that?
01:32:59.480 And in the meantime, give up the possibility, which is probably likely, that finding these cancers early
01:33:06.360 and intervening early will change outcome. I'm all for, and I think everyone is, bigger and more
01:33:13.140 definitive studies over time. But the idea that we're never going to do that study or just take
01:33:19.560 kind of a nihilistic point of view, that until it's done, we're not going to find cancers early
01:33:24.520 and intervene, I don't think it's conscionable to do that, especially when the false positive rate's low.
01:33:30.220 I think there's a few other ways to come at it, which is, if what you said was really true,
01:33:35.040 I've met some of the folks and called by them, the GRAIL test has found the positive.
01:33:38.920 I can think of a former colleague in the test found an ovarian cancer. Do you think when she
01:33:45.180 went to her OBGYN and said, look, the test said that I have potentially an ovarian cancer and they
01:33:50.560 did an ultrasound and they found something that OBGYN said, you know what, since this was found
01:33:56.080 through a new method, let's not intervene. There's a malignancy. It is an ovarian cancer. We know what the
01:34:02.920 natural history is, but we're not going to intervene. Similarly with cases of pancreatic cancer,
01:34:08.120 head and neck or things like that. I don't understand the logic because today people do
01:34:13.420 show up, not very often, with early stage versions of these diseases, ovarian, pancreatic,
01:34:18.180 head and neck and things, and we treat them. So why is it you wouldn't treat them if you could find
01:34:23.480 them through this modality? I just don't know of any GI surgeon who says, well, you're one of the
01:34:29.460 lucky people where you found your pancreatic cancer at stage one, two, but we're not going to treat it
01:34:33.460 because there isn't definitive evidence over decades that mortality is better. So I get
01:34:39.200 the academic point and Grail and others are investing a tremendous amount to increase the data.
01:34:45.360 The idea that we have this technology and we're going to allow huge numbers of cancers to just
01:34:51.840 progress to late stage before treating, I don't think that's the right balance of potential benefit
01:34:58.460 versus burden of evidence. So is there now a prospective real world trial ongoing in Europe?
01:35:05.960 There is. Let's talk a little bit about that.
01:35:08.320 The NHS has been piloting the Grail test in a population of about 140,000. So it involves
01:35:15.840 sequential testing, I think at least two tests, and then they look at outcomes. It's an interventional
01:35:22.800 study with return of results. And they're looking for a really interesting endpoint here. So mortality
01:35:29.420 takes time. So, I mean, some cancers, I mean, to ultimately see whether or not getting diagnosed at
01:35:35.460 a different stage and the intervention changes that that could take one or in some cases, two decades,
01:35:41.040 but they came up with a really interesting surrogate endpoint, which is reduction in stage four
01:35:46.440 cancers. So here's the logic. I think it makes a lot of sense, which is if people stop getting
01:35:51.480 diagnosed with stage four and say a big reduction in stage three cancer, then doesn't it stand to
01:35:57.500 reason that ultimately you will reduce mortality? So if you remove the end stage version of cancer,
01:36:04.900 which kills most people, and that you know that you have to pass through, most people don't die
01:36:10.380 of stage two cancer. They were diagnosed with stage two, they died because it turned out it wasn't stage
01:36:14.900 two and it spread. If you do a study and within a few years, when you're screening people,
01:36:21.480 there's no more, and let's take the extreme, stage four cancer, then you've stage shifted the
01:36:26.400 population and you're kind of eliminating late stage metastatic cancer. So again, I think while
01:36:33.760 we're waiting for that to read out, my personal belief is the potential benefit of finding cancer
01:36:39.460 is so significant. Testing now for many patients makes sense. And then I think this endpoint of stage
01:36:48.460 four reduction. Yeah, that's a clever, clever endpoint. One of the things that I know that a lot
01:36:55.660 of the folks who oppose cancer screening tend to cite is that a number of cancer screening studies
01:37:01.920 do not find an improvement in all cause mortality, even when there's a reduction in cancer specific
01:37:08.340 mortality. So, hey, we did this colonoscopy study, or we did this breast cancer screening study,
01:37:14.140 and it indeed reduced breast cancer deaths, but it didn't actually translate to a difference in
01:37:18.780 all cause mortality. I've explained this on a previous podcast, but it is worth for folks who
01:37:23.380 didn't hear that to understand why. To me, that's a very, very, oh, how can I say this charitably?
01:37:30.360 That's a very misguided view of the literature because what you fail to appreciate is those studies
01:37:37.060 are never powered for all cause mortality. And if you reduce breast cancer mortality by 40% or 30%,
01:37:46.720 that translates to a trivial reduction in all cause mortality because breast cancer is still just one
01:37:54.920 of 50 cancers. And even though it's a relatively prevalent cancer over the period of time of a study,
01:38:00.860 which is typically five to seven years, the actual number of women who were going to die of
01:38:05.780 breast cancer is still relatively small compared to the number of women, period, who were going to die
01:38:11.780 of anything. And I, in previous podcasts have discussed that it's very difficult to get that
01:38:18.480 detection within the margin of error. And so if you actually wanted to be able to see how that
01:38:24.960 translates to a reduction in all cause mortality, you would need to increase the size of these studies
01:38:30.140 considerably, even though really what you're trying to do is detect a reduction in cancer
01:38:35.360 specific mortality. I say all of that to say that I think one of the interesting things about the
01:38:41.460 NHS study is it is a pan screening study. And to my knowledge, it's the first. In other words,
01:38:48.980 it has the potential to detect many cancers and therefore you have many shots on goal. Potentially,
01:38:56.520 this could show a reduction in all cause mortality and not just cancer specific mortality. I would have to
01:39:02.500 see the power analysis, but I wonder if the investigators thought that far ahead. Do you
01:39:06.840 know? I mean, they're going to follow these patients long-term. They will get, be able to
01:39:11.720 have the data on mortality. I don't know if it's powered for all cause. I think that's unlikely just
01:39:19.500 for the reasons you said, which is the numbers would be really high. I mean, again, if you're powering
01:39:25.200 it to see a reduction in stage four over a couple of years, that may not be enough.
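The powering argument above can be made concrete with a rough sketch. Every number below is a hypothetical assumption chosen only for illustration (a 5% chance of dying of anything during the study, a 0.5% chance of dying of the screened cancer, and a 30% relative reduction in that cancer's mortality); none of it comes from the NHS trial or any screening study. The sample size formula is the standard normal approximation for comparing two proportions, and `per_arm_sample_size` is a name introduced here, not something from the conversation.

```python
from math import ceil, sqrt
from statistics import NormalDist


def per_arm_sample_size(p_control, p_screened, alpha=0.05, power=0.80):
    """Approximate participants needed per arm to detect p_control vs p_screened
    with a two-sided test of two proportions (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_control + p_screened) / 2
    spread = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p_control * (1 - p_control) + p_screened * (1 - p_screened))
    )
    return ceil(spread ** 2 / (p_control - p_screened) ** 2)


# Hypothetical inputs, for illustration only.
all_cause_risk = 0.05       # chance of dying of anything during the study
cancer_risk = 0.005         # chance of dying of the screened cancer
relative_reduction = 0.30   # screening cuts cancer-specific deaths by 30%

# Screening only removes cancer deaths, so the absolute benefit is identical
# for both endpoints, but it is a much smaller share of all-cause deaths.
cancer_risk_screened = cancer_risk * (1 - relative_reduction)
all_cause_risk_screened = all_cause_risk - (cancer_risk - cancer_risk_screened)
all_cause_relative_reduction = (all_cause_risk - all_cause_risk_screened) / all_cause_risk

n_cancer_specific = per_arm_sample_size(cancer_risk, cancer_risk_screened)
n_all_cause = per_arm_sample_size(all_cause_risk, all_cause_risk_screened)

print(f"All-cause relative reduction: {all_cause_relative_reduction:.1%}")
print(f"Per-arm n, cancer-specific endpoint: {n_cancer_specific:,}")
print(f"Per-arm n, all-cause endpoint: {n_all_cause:,}")
```

Under these assumed inputs, the same absolute benefit shows up as a far smaller relative change in all-cause mortality, so the all-cause endpoint needs a much larger trial. A test that covers many cancers at once raises the share of deaths it can influence, which is the "many shots on goal" point.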
01:39:31.620 Interesting. Well, time will tell. Alex, I want to pivot if we've got a few more minutes
01:39:35.860 to a topic that you and I spend a lot of time talking about these days. And so by way of
01:39:41.520 disclosure, you sort of noted that you've left Illumina somewhat recently. You've started another
01:39:47.360 company. I'm involved in that company as both an investor and an advisor, and it's an incredibly
01:39:51.980 fascinating subject. But one of the things that we talk about a lot is going back to this role of
01:39:59.380 the epigenome. So I think you did a great job explaining it and putting it in context. So we've
01:40:04.800 got these 3 billion base pairs and lo and behold, some 28 million of them also happen to have a methyl
01:40:13.520 group on their C. I'll fill in a few more details that we didn't discuss on the podcast, but just to
01:40:19.520 throw it out there. As a general rule, when we're born, we have kind of our max set of them. And as
01:40:26.060 we age, we tend to lose them. As a person ages, the number of those methylation sites goes down.
01:40:33.780 You obviously explain most importantly what they do, what we believe their most important purpose is,
01:40:39.200 which is to impact gene expression. It's worth also pointing out that there are many hallmarks of
01:40:46.400 aging. There are many things that are really believed to be at the fundamental level that
01:40:52.680 describes why you and I today look and function entirely different from the way we did when we
01:41:00.080 met 25 years ago. We're half the men we used to be. I could make a Laplace Fourier joke there, but I will
01:41:06.860 refrain. So I guess the question is, Alex, where do you think methylation fits in to the biology of
01:41:17.800 aging? That's a macro question, but... Yeah, yeah. So you talked about the hallmarks of aging,
01:41:24.320 because the author, I think it was Hanrahan, came up with that about 10 years ago, this hallmarks of
01:41:29.620 aging. And he recently gave a talk where he talked about perhaps methylation is the hallmark of
01:41:35.920 aging. And what he's referring to is the mounting data that the epigenetic changes are the most
01:41:45.280 descriptive of aging and are becoming more and more causally linked to aging events.
01:41:50.900 There's lots of data that show that people of comparable age, but different health status,
01:41:58.460 for example, smokers versus non-smokers, people who exercise versus people who don't,
01:42:03.240 people who are obese versus people who are not, can have very different methylation patterns.
01:42:09.500 There's also some data that look at centenarians relative to non-centenarians. And obviously,
01:42:17.780 that's a complicated analysis because by definition, there's a difference in age,
01:42:21.900 but you get a sense of different patterns of methylation. And clearly, centenarians we've
01:42:26.880 established long ago do not acquire their centenarian status by their behaviors.
01:42:31.940 Just look at Charlie Munger and Henry Kissinger, two people who recently passed away at basically
01:42:37.700 the age of a hundred, despite no evidence whatsoever that they did anything to take care
01:42:42.360 of themselves. So clearly their biology and their genes are very protective. As you said,
01:42:49.200 there are a bunch of these hallmarks. I think the original paper talked about nine and that's
01:42:54.160 been somewhat expanded. But you share that view, I suppose, that the epigenome sits at the top
01:43:01.480 and that potentially it's the one that's impacting the other. So when we think about
01:43:05.840 mitochondrial dysfunction, which no one would dispute, mine and yours are nowhere near as good
01:43:12.700 as they were 25 years ago. Our nutrient sensing pathways, inflammation, all of these things are
01:43:18.160 moving in the wrong direction as we age. How do you think those tie to methylation and to the epigenome
01:43:24.860 and to gene expression by extension?
01:43:27.220 Maybe let's reduce it to like a kind of an engineering framework. If we took Peter's epigenome
01:43:34.160 from 25 years ago when I first met you, right? And we knew for every cell type and every cell,
01:43:41.860 what was the methylation status at all 28 million positions? We had recorded that and we took yours
01:43:48.100 today where most of those cells have deviated from that and we could flip all those states back.
01:43:55.700 That's kind of how I think about it is the cells don't go away, just whether or not they have the
01:43:59.580 methyl group or not changes. And some places gain it, some places lose it. If we could flip all those
01:44:06.600 back, would that force the cell to behave like it was 25 years ago? Express genes, the fidelity with
01:44:15.320 which it controlled those genes, the interplay between them, would it be reprogrammed back to
01:44:20.860 that state? And so that I think is a really provocative hypothesis. We don't know that for
01:44:27.720 sure, but there's more and more evidence that that might be possible. And so to me, that's the
01:44:33.080 burning question is now that we have the ability to characterize that and we know what it looks like
01:44:37.760 in a higher functioning state, which correlates with youth, and we are gaining technologies to be able
01:44:44.280 to modulate that and actually change the epigenome as opposed to modifying proteins or gene expressions,
01:44:50.120 but actually go in and remethylate and demethylate certain sites. Can we reprogram things back to that
01:44:57.780 earlier state? And if it is the root level at which things are controlled, will you then get all of the
01:45:04.300 other features that the cell had and the organism had? That's a really exciting question to answer.
01:45:09.660 Because if the answer is yes, or even partially yes, then it gives us a really concrete way to go
01:45:15.700 about this. And so we talk about the hallmarks and the hallmarks are complex and interrelated.
01:45:21.680 What I like about the epigenome is we can read it out and we're gaining the ability to modify it
01:45:27.240 directly. So if really it's the most fundamental level at which all of these other things are
01:45:32.040 controlled, it gives us, again, maybe back to the early discussion, a very straightforward
01:45:37.100 engineering way to go about this. Let's talk a little bit about how that's done.
01:45:41.840 A year ago, you were part of a pretty remarkable effort that culminated in a publication in Nature,
01:45:48.480 if I recall, it sequenced the entire human epigenome. So if we had the Human Genome Project
01:45:54.280 24 years ago, roughly, we had the Epigenome Project. Can you talk a little bit about that
01:46:00.380 and maybe explain technologically how that was done as well?
01:46:06.260 Yeah. So in the development of the Grail Gallery test, there was a key capability that we knew was
01:46:12.820 going to be important for a multi-cancer test. So very different than most cancer screening today,
01:46:19.120 which is done one cancer at a time. So if you have a blood test and it's going to tell you there's a
01:46:24.520 cancer signal present and this person should be worked up for cancer, you'd really like to know,
01:46:30.160 well, where does that cancer likely reside? Because that's where you should start your workup. And you
01:46:35.260 want it to be pretty accurate. So if the algorithm detects a cancer and it's really a head and neck
01:46:40.760 cancer, you'd like the test to also say it's likely head and neck and then do an endoscopy
01:46:45.300 and not have to do lots of whole body imaging or a whole body PET CT or things like that.
01:46:52.200 So we developed something called a cancer site of origin. And so today the test has that. If you
01:46:57.300 get a signal detected, it also predicts where the cancer is. And it gives like a top two choice,
01:47:02.900 top two choices. It's about 90% accurate in doing that. But how does that work? The physicians and
01:47:10.040 patients who have gotten that have described it as kind of magic that it detects the cancer and predicts it.
01:47:14.840 And it's based on the methylation patterns. So methylation is what determines cell identity and
01:47:21.660 cell state. So again, DNA code is more or less the same in your cells, but the methylation patterns
01:47:28.340 are strikingly different. When a cell replicates, why does it continue to be the same type of cell?
01:47:34.580 When an epithelial cell replicates, it has the same DNA as a T cell or a heart cell, but it doesn't become those;
01:47:41.040 it stays the same. It's because the methylation pattern, those exact methylation states on the 28 million
01:47:46.580 sites, are also replicated. So just in the same way DNA has a way of replicating the code, there's an enzyme
01:47:52.220 that looks and copies the pattern to the next cell. And so that exact code determines, again,
01:48:00.980 is it a colonic epithelial cell or a fallopian epithelial cell or whatever it is. And so we knew
01:48:06.980 that the only way to make a predictor in the cell-free DNA is to have that atlas of all the
01:48:14.480 different methylation patterns. And so with a collaborator, a guy named Yuval Dor at Jerusalem
01:48:20.160 University, we laboriously got surgical remnants from healthy individuals. He developed protocols
01:48:27.620 to isolate the individual cell types of most of the cells that get transformed in cancer.
01:48:34.860 And then we got pure methylation patterns where we sequenced, like sequencing the whole genome,
01:48:40.000 sequenced the whole methylome of all those cell types. And we published that a year ago
01:48:44.160 as the first atlas of the human methylome and all of the major cell types. And so for the first time,
01:48:51.300 we could say, hey, this is the code, which makes you a beta islet cell in the pancreas that makes
01:48:57.800 insulin versus something else. Interestingly, there's only one cell in the body where the insulin promoter
01:49:05.940 is not methylated. And that is the beta islet cell. Every other single cell, that promoter is heavily
01:49:12.240 methylated because it shouldn't be making insulin. It's those kinds of signals that when you have the
01:49:18.840 cell-free DNA and you look at the methylation pattern allows the algorithm to predict, hey,
01:49:23.120 this isn't just methylation signal that looks like cancer. The patterns and what's methylated and what's
01:49:29.480 not methylated looks like colorectal tissue or a colorectal cancer. And that's how the algorithm does it.
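As a toy illustration of the idea just described, matching an observed methylation pattern against a reference atlas of cell types, here is a minimal sketch. It is not the Galleri algorithm, which is not described in this conversation. The atlas values, the five sites, and the `rank_tissues` function are all hypothetical, and a real classifier would work over a vastly larger number of sites with a trained model rather than a simple distance.

```python
# Hypothetical reference atlas: fraction of DNA molecules methylated at five
# illustrative CpG sites for each cell type. All values are made up.
ATLAS = {
    "colon epithelium": [0.9, 0.1, 0.8, 0.2, 0.9],
    "pancreatic beta cell": [0.1, 0.9, 0.8, 0.9, 0.1],
    "T cell": [0.9, 0.9, 0.1, 0.1, 0.2],
}


def rank_tissues(observed, atlas, top_k=2):
    """Return the top_k cell types whose reference profile is closest to the
    observed profile, using mean absolute difference as the distance."""
    def distance(reference):
        return sum(abs(o - r) for o, r in zip(observed, reference)) / len(observed)

    ranked = sorted(atlas, key=lambda tissue: distance(atlas[tissue]))
    return ranked[:top_k]


# Hypothetical methylation profile measured from cell-free DNA.
observed_profile = [0.8, 0.2, 0.7, 0.3, 0.8]
top_two = rank_tissues(observed_profile, ATLAS)
print(top_two)
```

Returning the two closest matches mirrors the "top two choices" output mentioned earlier. The nearest-profile rule is only a stand-in for whatever the real model does.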
01:49:37.080 And so this atlas, again, was a real breakthrough for diagnostics and it made cancer site of origin
01:49:44.080 useful. It's also being used for lots of MRD or those cancer monitoring tests too, because it's so
01:49:50.180 sensitive. But it also brought up this interesting possibility, which is if you're going to develop
01:49:55.320 therapeutics or you want to, say, rejuvenate cells or repair them that have changed or become
01:50:01.760 pathologic, what if you compare the methylation pattern in the good state versus the bad state?
01:50:07.020 Does that then tell you the exact positions that need to be fixed? And then with another technology,
01:50:13.440 which can go and flip those states, will that reverse or rejuvenate the cell to the original or
01:50:21.760 desired state?
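The comparison posed in that question, a good state versus a bad state, reduces to a diff over methylation sites. The sketch below is a deliberately simplified assumption: each site is treated as simply methylated or not, the site names and states are invented, and `sites_to_flip` is a hypothetical helper. Whether flipping the listed sites would actually restore cell function is exactly the open hypothesis under discussion.

```python
def sites_to_flip(reference_state, current_state):
    """Return {site: target_state} for every site whose methylation status in
    current_state differs from reference_state (True means methylated)."""
    return {
        site: target
        for site, target in reference_state.items()
        if current_state.get(site) != target
    }


# Hypothetical methylation states for four sites in the same cell type.
young_state = {"site_1": True, "site_2": False, "site_3": True, "site_4": False}
aged_state = {"site_1": True, "site_2": True, "site_3": False, "site_4": False}

edits = sites_to_flip(young_state, aged_state)
print(edits)  # site_2 gained a methyl group, site_3 lost one
```

The result lists only the positions that drifted, along with the state each would need to return to, which is the input a site-specific editing technology would need.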
01:50:23.600 So Alex, unlike the genome, which doesn't migrate so much as we age, I mean, obviously it accumulates
01:50:29.940 mutations, but with enough people, I guess we can figure that out pretty quickly. Do you need
01:50:35.880 longitudinal analysis of a given individual, i.e. within an individual to really study the
01:50:43.260 methylome? Do you need to be able to say, boy, in an ideal world, this is what Peter's epigenome
01:50:49.820 looked like when he was one year, you know, at birth, one year old, two, three, four, 50 years old,
01:50:54.660 so that you could also see not just how does the methylation site determine the tissue specificity
01:51:04.820 or differentiation, but how is it changing with normal aging as well?
01:51:11.860 I think a lot of it is not individual specific. I'll give you an example. So I've done a fair amount
01:51:17.580 of work in T cells. And if you look at, say, exhausted effector T cells versus naive memory
01:51:24.540 cells, where younger individuals tend to have more of those, and it gives them more reservoir
01:51:29.900 to do things like fight disease, fight cancer. There's very distinct methylation changes. Certain
01:51:36.720 genes get methylated or demethylated. And those changes seem to be, again, very correlated with this
01:51:44.120 change in T cell function. My belief is that those represent fundamental changes as the T cell
01:51:52.240 population gets aged, and you end up with more and more T cells that, relatively speaking, are useless.
01:51:58.440 And so if you wanted to rejuvenate the T cells, repairing those methylation states is something that
01:52:04.380 would benefit everyone. Now, there are definitely a small percentage of methylation sites that are
01:52:11.220 probably drifting or degrading, and those could be specific to individuals. There's some gender
01:52:16.840 specific sites, for sure. There's some ethnic ones. But big, big changes seem to happen more with loss of
01:52:25.680 function, big changes in age that are probably common across individuals, or in the case of cancer, we also
01:52:34.660 have profound changes. When you think about this space, a term comes up. If folks have been kind of
01:52:42.400 following this, they've probably heard of things called Yamanaka factors. In fact, a Nobel Prize was
01:52:47.980 awarded to Yamanaka for the discovery of these factors. Can you explain what they are and what role they
01:52:55.980 play in everything you were discussing?
01:52:58.780 What Yamanaka and colleagues discovered is that if you take fully differentiated cells, for example,
01:53:07.060 fibroblasts, and you expose them to a particular cocktail of four transcription factors, that the
01:53:13.940 cell reverts to a stem cell-like state. And these are called induced pluripotent stem cells. You subject
01:53:21.440 a differentiated cell that was a mature cell of a particular type. I think most of their work was in
01:53:27.560 fibroblasts. And the cell, when it's exposed to these transcription factors, and these transcription
01:53:33.560 factors are powerful ones at the top of the hierarchy, they unleash a huge number of changes
01:53:39.700 in gene expression. Genes get turned on, get turned off. And then ultimately, if you keep letting it go,
01:53:46.840 you end up with something that is a type of stem cell. And why this was so exciting is it gave the
01:53:54.080 possibility to create stem cells through a manufactured process. As you know, there's a lot
01:53:59.720 of controversy about getting stem cells from embryos or other sources. This created a way now to create
01:54:07.180 stem cells and use them for medical research by just taking an individual's own cells and kind of
01:54:13.040 de-differentiating it back to a stem cell.
01:54:15.400 How much did that alter the phenotype of the cell itself? In other words, the fibroblast has
01:54:24.300 a bunch of phenotypic properties. What are the properties of a stem cell and how much of that is
01:54:31.040 driven by the change in methylation? In other words, I'm trying to understand how these transcription
01:54:37.380 factors are actually exerting their impact throughout this regression, for lack of a better word.
01:54:42.860 We refer to cell-type specific features as somatic features, like a T-cell receptor. That's a feature
01:54:50.260 of a T-cell or a dendrite or an axon would be for a neuron or an L-type calcium channel for a cardiac
01:54:57.080 myocyte. So those are very cell-type specific features. So if you turn on these Yamanaka factors and you
01:55:03.920 go back to a pluripotent stem cell, you lose most of these. And that word pluripotent means the
01:55:10.840 potential to become anything, at least in theory. So you lose most of these cell-type specific
01:55:16.860 features. So the use of the iPSCs is then to re-differentiate them. And that's what people have
01:55:24.120 been attempting to do. And it opened up the ability to do that, which is you create this
01:55:28.340 stem cell that now potentially has the ability to be differentiated into something else. You give it a
01:55:34.140 different cocktail and you try to make it a neuron or a muscle cell, and then use that in a tissue
01:55:41.020 replacement therapy. And there's a lot of research on that and a lot of groups trying to do that.
01:55:46.220 You also asked about what is the relationship between that and the epigenetics and methylation
01:55:50.740 state. That has not been well explored. And that's something that I and others are excited to do,
01:55:56.620 because it could be that you're indirectly affecting the epigenome with these Yamanaka factors,
01:56:02.460 and that if you translated that into an epigenetic programming protocol, you could have a lot more
01:56:08.520 control over it. Because one of the challenges with the Yamanaka factors is if you do this for
01:56:14.900 long enough, eventually the stem cell becomes something much more like a cancer cell and just
01:56:21.000 becomes kind of unregulated growth. And so again, huge breakthrough in learning about this kind of
01:56:27.820 cell reprogramming and de-differentiation, but our ability to use it in a practical way for tissue and
01:56:35.100 cell replacements is not there. My hope is that by converting it to an epigenetic level, it'll be more
01:56:41.700 tractable. You mentioned that this is typically done with fibroblasts. I assume the experiment has been
01:56:47.160 done where you sprinkle Yamanaka factors on cardiac myocytes, neurons, and things like that. Do they not
01:56:53.960 regress all the way back to pluripotent stem cells? I think to varying extents. I mean, if you truly have
01:57:00.400 a pluripotent stem cell, I guess in theory, it shouldn't matter where it came from, right? Because
01:57:05.280 it's pluripotent. So with developmental factors, where did your first neurons come from? You had a
01:57:11.400 stem cell, and then in the embryo or the fetus, there were factors that then coaxed that stem cell to
01:57:18.020 become these other types of cells and tissues. So if it's truly pluripotent, you should be able to do
01:57:24.080 that. Now, I think you're getting at something which is different, which is called partial
01:57:27.860 reprogramming. He and the people who have followed his work, they're trying to do this thing, which
01:57:33.640 is kind of stop halfway. So what if you took a heart cell or a T cell that's lost a lot of function,
01:57:41.360 and you give it these Yamanaka factors, but you stop it before it really loses its cell identity,
01:57:48.700 will it have gained some properties of its higher functioning youthful state without
01:57:53.500 having lost it? And so there's some provocative papers out there on this. There's a guy, Juan Carlos
01:58:00.520 Belmonte, who's done some work on this and some very provocative results in mice of doing these
01:58:06.620 partial reprogramming protocols and rejuvenating. Again, it's mice, so all the usual caveats,
01:58:13.480 but getting very striking improvements in function, in eyesight, cognition, again, in these
01:58:19.380 mouse metrics. So certainly interesting in trying to understand how that might be able to translate to
01:58:25.040 humans. Again, the worry there would be that if you don't control it, then you could make essentially
01:58:31.360 a tumor. So it's opened up that whole area of science that it's possible to do these kinds of
01:58:37.640 dramatic de-differentiations, how to really harness that in a context of human rejuvenation.
01:58:44.440 We don't know how to do that yet, but there's a lot of people trying to figure that out.
01:58:48.940 If you had to guess with a little bit of optimism, but not pie in the sky optimism,
01:58:53.940 where do you think this field will be in a decade? There was a day when a decade sounded a long
01:59:01.540 time away. It doesn't sound that long anymore. Decades seem to be going by quicker than I remember.
01:59:07.100 So it's going to be a decade pretty soon, but that's still a sizable amount of time for the field to
01:59:12.940 progress. What do you realistically think can happen with respect to addressing the aging phenotype
01:59:22.880 vis-a-vis some method of reversal of aging, some truly geroprotective intervention?
01:59:32.320 So I'm optimistic and I'm a believer. I think for specific organs and tissues and cell types,
01:59:40.300 there will be treatments that rejuvenate them. It's hard to see in a decade that there's just a
01:59:45.260 complete rejuvenation of every single cell and tissue in a human, but joint tissues,
01:59:52.320 the retina, immune cells. We're learning so much about the biology related to rejuvenation and
02:00:00.960 healthier states of them. And then in combination with that, the tools to manipulate them, which is
02:00:06.200 equally important. You could understand what the biology is, but not have a way to intervene.
02:00:09.820 The tools to go in and edit these at a genomic level, to edit it at an epigenetic level,
02:00:16.960 to change the state and the delivery technologies to get them to very specific tissues and organs
02:00:23.520 is also progressing tremendously. So I definitely see a world in 10 years from now where we may have
02:00:30.120 rejuvenation therapies for osteoarthritis, rejuvenation for various retinopathies, where
02:00:37.320 we can rejuvenate whole classes of immune cells that make you more resistant to disease,
02:00:42.820 more resistant to cancer. I think we'll see things that will have real benefits in improving health
02:00:49.300 span. Alex, this is an area that I think truly excites me more than anything else in all of
02:00:56.820 biology, which is to say, I don't think there's anything else in my professional life that grips my
02:01:03.880 fascination more than this question. Namely, if you can revert the epigenome to a version that
02:01:13.800 existed earlier, can you take the phenotype back with you? And that could be at the tissue level,
02:01:20.100 as you say, could I make my joints feel the way they did 25 years ago? Could it make my T cells
02:01:27.600 function as they did 25 years ago? And obviously one can extrapolate from this and think of the entire
02:01:33.440 organism. So anyway, I'm excited by the work that you and others in this field are doing
02:01:39.040 and grateful that you've taken the time to talk about something that's really no longer your main
02:01:44.060 project, but something for which you provide probably as good a history of as anyone vis-a-vis
02:01:50.580 the liquid biopsies. And then obviously a little bit of a glimpse into the problem that obsesses you
02:01:54.740 today. Awesome. Well, fun chatting with you as always, Peter. Glad to have the opportunity to dive
02:02:00.260 in deep with this. There aren't many places to do this. Thank you. Thanks, Alex. Thank you for listening
02:02:05.680 to this week's episode of The Drive. It's extremely important to me to provide all of this content
02:02:10.720 without relying on paid ads. To do this, our work is made entirely possible by our members. And in
02:02:16.280 return, we offer exclusive member-only content and benefits above and beyond what is available for free.
02:02:23.000 So if you want to take your knowledge of this space to the next level, it's our goal to ensure
02:02:26.880 our members get back much more than the price of the subscription. Premium membership includes
02:02:31.740 several benefits. First, comprehensive podcast show notes that detail every topic, paper, person,
02:02:38.960 and thing that we discuss in each episode. And the word on the street is nobody's show notes rival
02:02:44.220 ours. Second, monthly ask me anything or AMA episodes. These episodes are comprised of detailed
02:02:51.380 responses to subscriber questions typically focused on a single topic and are designed to offer a great
02:02:57.340 deal of clarity and detail on topics of special interest to our members. You'll also get access
02:03:02.060 to the show notes for these episodes, of course. Third, delivery of our premium newsletter, which is put
02:03:08.160 together by our dedicated team of research analysts. This newsletter covers a wide range of topics related
02:03:14.100 to longevity and provides much more detail than our free weekly newsletter. Fourth, access to our
02:03:21.160 private podcast feed that provides you with access to every episode, including AMAs, sans the spiel you're
02:03:27.400 listening to now and in your regular podcast feed. Fifth, The Qualys, an additional member-only podcast
02:03:34.740 we put together that serves as a highlight reel featuring the best excerpts from previous episodes of
02:03:40.700 The Drive. This is a great way to catch up on previous episodes without having to go back and
02:03:45.200 listen to each one of them. And finally, other benefits that are added along the way. If you want
02:03:50.400 to learn more and access these member-only benefits, you can head over to peterattiamd.com forward slash
02:03:56.900 subscribe. You can also find me on YouTube, Instagram, and Twitter, all with the handle
02:04:01.980 peterattiamd. You can also leave us a review on Apple Podcasts or whatever podcast player you use.
02:04:08.660 This podcast is for general informational purposes only and does not constitute the practice of
02:04:13.940 medicine, nursing, or other professional healthcare services, including the giving of medical advice.
02:04:19.420 No doctor-patient relationship is formed. The use of this information and the materials linked to this
02:04:25.240 podcast is at the user's own risk. The content on this podcast is not intended to be a substitute for
02:04:31.180 professional medical advice, diagnosis, or treatment. Users should not disregard or delay in obtaining
02:04:36.800 medical advice for any medical condition they have, and they should seek the assistance of their
02:04:41.760 healthcare professionals for any such conditions. Finally, I take all conflicts of interest very
02:04:47.100 seriously. For all of my disclosures and the companies I invest in or advise, please visit
02:04:52.720 peterattiamd.com forward slash about where I keep an up-to-date and active list of all disclosures.