The Peter Attia Drive - September 04, 2023


#269 - Good vs. bad science: how to read and understand scientific studies


Episode Stats

Length

1 hour and 50 minutes

Words per Minute

184.6

Word Count

20,446

Sentence Count

1,273

Misogynist Sentences

7

Hate Speech Sentences

4


Summary

In this episode, we discuss the process for a study to go from an idea to a design to execution, the strengths and limitations of each type of study, and how to read and interpret scientific papers. We also discuss how to make sense of the ever-changing landscape of scientific literature, and why some studies never get published.


Transcript

00:00:00.000 Hey, everyone. Welcome to The Drive podcast. I'm your host, Peter Attia. This podcast,
00:00:16.580 my website, and my weekly newsletter all focus on the goal of translating the science of longevity
00:00:21.580 into something accessible for everyone. Our goal is to provide the best content in health and
00:00:26.780 wellness, and we've established a great team of analysts to make this happen. It is extremely
00:00:31.720 important to me to provide all of this content without relying on paid ads to do this. Our work
00:00:37.000 is made entirely possible by our members. And in return, we offer exclusive member only content
00:00:42.760 and benefits above and beyond what is available for free. If you want to take your knowledge of
00:00:47.980 this space to the next level, it's our goal to ensure members get back much more than the price
00:00:53.260 of the subscription. If you want to learn more about the benefits of our premium membership,
00:00:58.080 head over to peterattiamd.com forward slash subscribe. Welcome to a special episode of
00:01:06.100 The Drive. For this week's episode, we're going to rebroadcast AMA number 30 on how to read and
00:01:12.140 understand scientific studies, which was originally released in December of 2021. While this was
00:01:17.760 originally released as an AMA for subscribers only, due to how important a topic this
00:01:23.280 is, we've decided to re-release it and make it available for everyone today. If you're a consumer
00:01:28.760 of this podcast or any of our weekly emails, you know that I place a large emphasis on scientific
00:01:34.400 literacy and how the media often gets this wrong. And even well-intentioned scientists sometimes
00:01:40.840 misrepresent or misunderstand their own results. And so this episode is our effort to try to help you
00:01:47.020 with that. In this episode, we discuss what is the process for a study to go from an idea to a design
00:01:53.000 to execution. What are the different types of studies out there and what do they mean? What are
00:01:58.060 the strengths and limitations of each of them? How do clinical trials work specifically for drugs,
00:02:03.140 for example? What are the common pitfalls of observational studies that you should be looking for?
00:02:08.040 What questions should you be asking about a study to figure out how rigorous it was? What does it mean
00:02:13.420 when a study is statistically significant? And is this the same as it being clinically significant?
00:02:19.340 Why do some studies never get published? And what is my process for reading scientific papers?
00:02:24.920 So without further delay, I hope you enjoy or re-enjoy this special episode on how to read
00:02:30.520 and interpret scientific studies.
00:02:37.820 Hey, Bob, how are you, man? Looking pretty studious there in the library today.
00:02:41.740 Hey, Peter. Thanks very much. Yeah, just getting some reading in before the podcast.
00:02:47.100 This is going to be a pretty good one because as you may recall about, I don't know, four or five
00:02:51.240 months ago, maybe longer, I was on a podcast with Tim Ferriss and I don't know how it came up,
00:02:55.740 but I do remember somehow it came up that we had spent a lot of time writing this series,
00:03:00.620 Studying Studies. And God, that's been four years ago, I think. But we didn't really have
00:03:06.660 something more digestible for folks on how to make sense of the ever-changing landscape of
00:03:12.980 scientific literature and how to kind of distinguish between the signal and the noise
00:03:17.540 of the research news cycle. And I remember after that, Tim and I went out for dinner and he kept
00:03:22.280 pressing me on, well, what can I do to get better at this process? Are there newsletters I should be
00:03:28.100 subscribing to and things like that? And while I'm sure that there are, I didn't know what they were
00:03:31.360 off the top of my head. And so I think what we've done here, when I say we, I mean you, what you have
00:03:36.440 done here is aggregate all the questions that have come in over the past year, basically, that pertain
00:03:43.540 to understanding the structure of science. I looked through the questions last week and I was
00:03:48.960 pretty excited. I think it's going to be a sweet discussion and I hope this serves as an amazing
00:03:53.620 primer for people to really understand the process of scientific experiments and everything from how
00:04:00.820 studies are published and obviously what some of the limitations are. So anything else you want to
00:04:04.740 add to that, Bob, before we jump in? I agree. I think it's a fun topic. We get so many of these
00:04:08.780 questions that we end up referring, or at least I do, to the website, where we'll point readers to one of the
00:04:15.080 parts of the Studying Studies series. But I think sometimes just talking about it and explaining
00:04:19.260 it can help a lot. So I think this will be really useful as far as like a question and answer session
00:04:25.780 rather than just reading a blog. I don't think this displaces that other stuff. I think we go into
00:04:30.520 probably more detail on some things there, but I also think we're going to cover things here that
00:04:34.160 aren't covered there. So depending on how you like to get your info, this could be fun. So where do you
00:04:39.200 want to start? We have, again, a lot of questions, but I think this question gets to the core of,
00:04:44.260 I think what we're trying to do here, which is how can a user or a person who has no scientific
00:04:49.820 background better understand studies that they read in the news or in the publications to know
00:04:55.300 if the findings are solid or not, especially in today's age where you can easily see two studies
00:04:59.380 that contradict each other. Coffee's good. Coffee's bad. Eggs are good. Eggs are bad. So I thought we
00:05:05.660 could run through a bunch of questions with the first one that we got here is what is the process
00:05:11.900 for a study to go from an idea to design and execution? This is a great question. In theory,
00:05:18.480 it should start with a hypothesis. Good science is generally hypothesis driven. I think the cleanest
00:05:26.680 way to think about that is to take the position that there is no relationship between two phenomena.
00:05:36.160 We would call this sort of a null hypothesis. So my hypothesis might be that drinking coffee makes
00:05:44.520 your eyes turn darker. So I would have to state that hypothesis, and then I would have to frame it in
00:05:51.860 a way that says my null hypothesis is that when you drink coffee, your eyes do not change in color in any
00:06:00.800 way, shape, or form. And that would imply that the alternative hypothesis is that when you drink coffee,
00:06:07.800 your eyes do change color. You can already see, by the way, that there's nuance to this because am I
00:06:15.240 specifying what color it changes to? Does it get darker? Does it get lighter? Does it change to blue,
00:06:20.560 green? Does it just get a darker shade of whatever it is? But let's put that aside for a moment and just
00:06:24.620 say that you will have this null hypothesis and you will have this alternative hypothesis. And to be able
00:06:30.800 to formulate that cleanly is sort of the first step here. The second thing, of course, is to conduct
00:06:36.260 an experimental design. How are you going to test that hypothesis? As we're going to talk about, a
00:06:42.240 really, really elegant way to test this is using a randomized controlled experiment. If it's possible
00:06:48.300 to blind it, we'll talk about what that means. You'll have to decide, well, how long should we make
00:06:53.780 people drink coffee? How frequently should they drink coffee? How are we going to measure eye color?
00:06:57.840 These are the questions that come down to experimental design. You then have to determine
00:07:02.740 a very important variable, which is how many subjects will you have? And of course, that will
00:07:08.640 depend on a number of things, including how many arms you will have in this study. But it comes down
00:07:13.700 to doing something that's called a power analysis. And this is so important that we're going to spend
00:07:17.580 some time talking about it today, although I won't talk about it right now. If this study involves
00:07:22.220 human subjects or animal subjects, you will have to get something called an institutional review board
00:07:28.520 to approve the ethics of the study. So you'll have to get that IRB approval. You'll have to determine
00:07:34.820 what your primary and secondary outcomes are, get the protocol approved, develop a plan for statistics,
00:07:40.280 and then pre-register the study. All of these things happen before you do the study. And of course,
00:07:46.160 in parallel to this, you have to have funding. So those are kind of the steps that go into doing
00:07:52.620 an experimental study. And what we're going to talk about, I think in a minute, is that there
00:07:57.220 are some studies that are not experimental where some of these steps are obviously skipped.
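The null-hypothesis framing and the power analysis just described can be sketched in a few lines of Python. This is a minimal simulation under illustrative assumptions (the effect size, sample size, and significance threshold are made up), not any particular study's design:

```python
import math
import random
import statistics

def estimated_power(n_per_group, effect_size, z_crit=1.96, n_sims=2000, seed=42):
    """Estimate power by simulation: the fraction of hypothetical trials in
    which a true effect of the given size leads us to reject the null
    hypothesis ("no difference between groups") with a two-sample z-test."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treated = [rng.gauss(effect_size, 1.0) for _ in range(n_per_group)]
        se = math.sqrt(statistics.variance(control) / n_per_group
                       + statistics.variance(treated) / n_per_group)
        z = (statistics.mean(treated) - statistics.mean(control)) / se
        if abs(z) > z_crit:  # difference too large to blame on chance alone
            rejections += 1
    return rejections / n_sims

# With 64 subjects per arm, a medium effect (0.5 standard deviations) is
# detected roughly 80% of the time at the conventional 5% significance level.
power = estimated_power(n_per_group=64, effect_size=0.5)
```

Running the same function with effect_size=0 recovers the false-positive rate (about 5%), which is exactly the error rate the null-hypothesis framework is designed to control.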
00:08:02.400 Yeah. One of the questions we got was, what are the different types of studies out there? And what
00:08:06.060 do they mean? For example, observational study versus a randomized controlled study. What are the
00:08:11.400 different types of studies? I think broadly speaking, you can break studies into three
00:08:19.420 categories. One would be observational studies. We'll bifurcate those or trifurcate those in a minute.
00:08:26.040 Then you can have experimental studies. And then you can have basically summations of and or reviews of
00:08:36.460 and or analyses of studies of any type. Let's kind of start at the bottom of that pyramid.
00:08:42.860 I think you actually have a figure that I don't like very much, but-
00:08:46.780 I was going to say, I thought it was one of your favorites.
00:08:48.700 Yeah, I can't stand it. I'll tell you what I like about the figure. I like the color schema because
00:08:52.860 my boys are so obsessed with rainbows that if I show them this figure, they're going to be
00:08:59.280 really happy. So let's pull up said rainbow figure.
00:09:02.960 Sure. Okay. Got it.
00:09:05.020 Okay. So you can see these buckets here. And again, at the level of talking about them,
00:09:10.220 I think this makes sense. What I don't agree with the pyramid for Bob is that it puts a hierarchy
00:09:15.780 in place that suggests a meta-analysis is better than a randomized control trial,
00:09:20.340 which is not necessarily true. But let's just kind of go through what each of these things mean. So
00:09:24.640 looking at the observational studies, an individual case report. The first or second paper I ever wrote
00:09:30.660 in my life when I was in medical school was an individual case report. It was a patient who had
00:09:35.220 come into clinic when I was at the NIH. This was a patient with metastatic melanoma and their calcium
00:09:43.280 was sky high, dangerously high, in fact. And obviously our first assumption was that this patient
00:09:50.120 had metastatic disease to their bone and that they were lysing bone and calcium was leaching into
00:09:56.560 their bloodstream. It turned out that wasn't the case at all. It turned out they had something that
00:10:00.820 had not been previously reported in patients with melanoma, which was they had developed this
00:10:06.120 parathyroid hormone-related peptide (a PTH-like hormone) in response to their melanoma. This is a hormone that
00:10:12.620 exists normally, but it doesn't exist in this format. And so their cancer was causing them to have more of
00:10:19.380 this hormone that was causing them to raise their calcium level. It was interesting because it had never
00:10:24.500 been reported before in the literature. And so I wrote this up. This was an individual case report.
00:10:30.460 Is there any value in that? Sure. There's some value in that. The next time a patient with melanoma
00:10:36.000 shows up to clinic and their calcium is sky high and someone goes to the literature to search for it,
00:10:40.940 they'll see that report and it will hopefully save them time in getting to the diagnosis.
00:10:46.800 Your mentor and friend, Steve Rosenberg, I think of him when I think of individual case reports.
00:10:52.100 I think if you listen to the podcast, he talks about this, but a lot of what motivated him early
00:10:57.220 on, I think were just a couple of cases. I think it gets back to that first question too,
00:11:01.100 about the process for a study to go to an idea to design to execution, which is
00:11:04.660 to have a hypothesis, you need to make an observation. And so you make an observation,
00:11:08.900 you say, hmm, that's strange. And I think that that's what individual case reports can represent
00:11:13.820 sometimes is this is an interesting observation. It's hypothesis generating for the most part,
00:11:18.460 but it really might kickstart a larger trial or it might kickstart a career. You never know.
00:11:24.580 Exactly. Now, of course, it's not going to be generalizable. I can't make any statement about
00:11:29.560 the frequency of this in the broader subset of patients. And obviously I can't make any comment
00:11:35.280 about any intervention that may or may not change the outcome of this. So that gets us to kind of our
00:11:40.460 next thing, which is like a case series or set of studies. So here you're basically doing the same
00:11:49.280 thing, but in plural effectively, you wouldn't just look at one patient. You would say, well,
00:11:55.880 I've now been looking back at my clinical practice and I've had 27 patients over the last 40 years
00:12:05.920 that have demonstrated this very unusual finding. And another example of this, going back to the
00:12:12.140 Steve Rosenberg case would be one could write a paper that looks at all spontaneous regressions
00:12:17.400 of cancer. Obviously, spontaneous regressions of cancer are incredibly rare, but there are certainly
00:12:22.800 enough of them that one could write a case series. So now let's consider cohort studies. So cohort studies
00:12:29.020 are larger studies and they can be retrospective or they can be prospective. So I'll give you an example
00:12:34.140 of both. So a retrospective observational cohort study would be, let's go back and look at all the
00:12:43.400 people who have used saunas for the last 10 years and look at how they're doing today relative to people
00:12:53.200 who didn't use saunas over the last 10 years. So it's retrospective. We're looking backwards. It's
00:12:59.680 observational. We're not doing anything, right? We're not telling these people to do this or
00:13:03.940 telling those people to do that. And the hope when you do this is that you're going to see some sort
00:13:08.280 of pattern. Undoubtedly, you will see a pattern. Of course, the question is, will you be able to
00:13:12.620 establish causality in that pattern? Cohort studies can just as easily, although more time
00:13:17.760 consuming, be prospective. So you could say, I want to follow people over the next five years, 10 years
00:13:26.780 who use saunas and compare them to a similar number of people who don't. And now in a forward
00:13:34.860 looking fashion, we're going to be examining the other behaviors of these people and ultimately what
00:13:40.820 their outcomes are. Do they have different rates of death, heart disease, cancer, Alzheimer's disease,
00:13:45.140 other metrics of health that we might be interested in? Again, we're not intervening.
00:13:50.000 There's not an experiment per se. We're just observing, but now we're doing it as we march
00:13:54.980 forward through time. So this brings us to the kind of the next layer of this pyramid, which are
00:13:59.880 the experimental studies. Divide these into randomized versus non-randomized. And of course,
00:14:06.240 this idea of randomization is going to be a very important one as we go through this.
00:14:10.960 So a non-randomized trial sometimes gets referred to as an open label trial where you take two groups
00:14:19.640 of people and you give one of them a treatment and you give the other one either a placebo or a
00:14:24.020 different treatment, but you don't randomize them. There's a reason that they're in that group.
00:14:28.520 So you might say, we want to study the effect of a certain antibiotic on a person that comes in the
00:14:38.540 ER and we're going to take all the people that come in who look a certain way. Maybe they have
00:14:47.100 a fever of a certain level or a white blood cell count of a certain level. We're going to give them
00:14:51.620 the antibiotic and the people who come in, but they don't have those exact signs or symptoms.
00:14:57.900 We're going to not give an antibiotic to, and we're going to follow them. That's kind of a lame
00:15:02.220 example. You could do the same sort of thing with surgical interventions. We're going to try to ask the
00:15:07.940 question, is surgery better than antibiotics for appendicitis or suspected appendicitis, but we don't
00:15:15.500 randomize the people to the choice. There's some other factor that is going to determine whether
00:15:21.380 or not we do that. As you can see, that's going to have a lot of limitations because presumably there's
00:15:26.340 a reason you're making that decision and that reason will undoubtedly introduce bias. So of course,
00:15:32.200 the gold standard that we always talk about is a randomized control trial where whatever question
00:15:38.780 you want to study, you study it, but you attempt to take all bias out of it by randomly assigning
00:15:46.480 people into the treatment groups, the two or more treatment groups. We'll talk about things like
00:15:51.000 blinding later because you can obviously get into more and more rigor when you do this, but before we
00:15:57.360 leave the kind of experimental side, anything you want to add to that, Bob? I would add, so non-randomized
00:16:02.200 controlled trials, maybe another example, illustrative example, I think with non-randomized controlled
00:16:06.520 trials might be you have patients maybe making a decision beforehand, which we'll get into
00:16:11.680 selection bias, but they might want to go on a statin, let's say, and then you give them a choice
00:16:15.860 and the other ones might want to go on some other drug like azetimibe. They're basically selecting
00:16:20.180 themselves into two groups, but you could compare those two groups and see how they do, but it hasn't
00:16:26.160 been randomized. There's a lot of bias that can go into that. There could be a lot of reasons why one
00:16:30.520 group is selecting a particular treatment over the other. And so that's why I think when we get to
00:16:34.660 randomized trials that shows the power of randomization. Yeah, exactly. We don't need to
00:16:40.440 go back to the figure, but people might recall that at the top of that pyramid was systematic reviews and
00:16:45.260 meta-analyses. Let's just talk about meta-analyses since they are probably the most powerful. So this
00:16:49.540 is a statistical technique where you can combine data from multiple studies that are attempting to
00:16:54.540 look at the same question, basically. So each study gets a relative weighting and the weighting of a
00:17:01.200 study is sort of a function of its precision. It depends a little bit on sample size, other events
00:17:06.260 in the study, larger studies, which have smaller standard errors are given more weight than smaller
00:17:10.920 studies with larger standard errors, for example. You'll know you're looking at a meta-analysis.
00:17:15.420 We should have had a figure for this, but I'll describe it the best I can. They usually have a figure
00:17:20.060 somewhere in there that will show across rows, all of the studies. So let's say there's 10 studies
00:17:26.600 included in the meta-analysis, and then they'll have the hazard ratios for each of the studies.
00:17:34.040 So they'll represent them usually as little squares with whiskers. The whiskers represent the 95%
00:17:40.020 confidence interval of the hazard ratio, which we'll talk about more later, but it's
00:17:44.920 basically a marker of the risk. And you'll see all 10 studies, and then they'll show you the
00:17:49.960 final summation of them at the bottom, which of course you wouldn't be able to deduce looking at the
00:17:54.440 figure, but it takes into account that mathematical weighting. So on the surface, meta-analyses seem
00:17:59.200 really, really great, because if one trial, one randomized trial is good, 10 must be better.
00:18:07.620 I know I've said this before probably three or four times over the past few years on the podcast,
00:18:11.540 but as James Yang, one of the smartest people I ever met when I was both a student and fellow at NCI,
00:18:17.420 once said during a journal club about a meta-analysis that was being presented,
00:18:21.140 he said something to the effect of a thousand sows' ears makes not a pearl necklace. And that's just
00:18:27.520 an eloquent way to say garbage in, garbage out. So if you do a meta-analysis of a bunch of
00:18:33.120 garbage studies, you get a garbage meta-analysis. It can't clean garbage. It simply can aggregate it.
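The inverse-variance weighting described above can be sketched concretely. This is a minimal fixed-effect pooling of hypothetical hazard ratios; all the numbers are made up for illustration:

```python
import math

def fixed_effect_pool(estimates, std_errors):
    """Inverse-variance fixed-effect pooling: each study's weight is
    1/SE^2, so precise (typically large) studies count for more."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * est for w, est in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se

# Hypothetical studies reporting hazard ratios: pool on the log scale.
hrs = [1.10, 1.35, 0.95]
ses = [0.05, 0.20, 0.30]  # one large precise study and two small noisy ones
log_pooled, log_se = fixed_effect_pool([math.log(h) for h in hrs], ses)

pooled_hr = math.exp(log_pooled)             # summary hazard ratio
ci = (math.exp(log_pooled - 1.96 * log_se),  # 95% confidence interval,
      math.exp(log_pooled + 1.96 * log_se))  # back on the hazard-ratio scale
```

Note how the large, precise study dominates the pooled estimate: the summary lands near 1.10 rather than near the simple average. The weighting sharpens precision, but it cannot turn garbage inputs into a good answer.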
00:18:41.580 So a meta-analysis of great randomized control trials will produce a great meta-analysis.
00:18:47.160 They try to control for garbage, the researchers and the investigators. But I think to your point
00:18:52.100 with the pearl necklace, imagine if you had say 10 trials and nine of them are garbage. One of them
00:18:58.520 is really good, really rigorous, randomized controlled trial. And you're looking at the
00:19:02.900 top of the pyramid and you're saying, well, meta-analysis is the best. We should be looking
00:19:05.880 at this meta-analysis. Meanwhile, you've got that one randomized controlled trial that actually is
00:19:11.360 worth its salt. It's rigorous, et cetera, that I would say if you had the option, I think you probably
00:19:15.860 would rely more on that one randomized controlled trial, which is lower on the pyramid. So I think
00:19:20.700 that's probably, I think you've told me one of your hangups with the pyramid, because it's not
00:19:24.880 necessarily top of the pyramid. It's going to be some meta-analysis of randomized controlled trials.
00:19:30.440 That's right. Yeah. I don't want to suggest meta-analyses are not great. What I want to suggest
00:19:34.820 is you can't just take a meta-analysis as gospel without actually looking at each study. You don't get a
00:19:41.720 pass on examining each of the constituent studies within a meta-analysis is really the point I think
00:19:47.380 we want to make here. There's one thing in here that isn't represented, but we had a few questions
00:19:52.620 about it. I think a couple. People were asking about what's the difference between a phase three
00:19:56.740 and a phase two or a phase one clinical trial. Do you know what's going on there?
00:20:01.160 Yes. So here we're talking about human clinical trials. This phraseology is used by the FDA here
00:20:09.320 in the United States. And typically the world does tend to follow in lockstep, but not always with kind
00:20:15.240 of the FDA's process. So if you go way, way, way back, you have an interesting idea. You have a drug
00:20:22.000 that you think is, or a molecule that you think will have some benefit. Think of it as a cancer
00:20:28.960 therapeutic. You've done some interesting experiments in animals, maybe started with some
00:20:35.060 mice and you went up to some rats and maybe even you've done something in primates. And now you're
00:20:40.760 really committed to this as the success of this and the safety of this in animals looks good. So it's
00:20:46.380 both safe and efficacious in animals. And you now decide you want to foray into the human space. Well,
00:20:52.520 the first thing you have to do is file for something called an IND, an investigational new drug
00:20:58.120 application. So after you do all of this preclinical work, you have to file this IND with the FDA.
00:21:05.180 And that basically sets your intention of testing this as a drug in humans. And the first phase of
00:21:11.880 that, which is called phase one is geared specifically to dose escalate this drug from a very, very low
00:21:20.200 level to determine what the toxicity is across a range of doses that will hopefully have efficacy.
00:21:28.120 These are typically very small studies, usually less than a hundred people. They're typically done
00:21:34.560 in cohorts. So you might say, well, the first 12 people are going to be at 0.1 milligrams per kilogram.
00:21:42.480 And assuming we see no adverse effects there, we'll go up to 0.15 milligrams per kilogram for the next
00:21:48.800 12 people. And if we have no issues there, we'll escalate it to 0.25. You'll notice, Bob,
00:21:54.820 I said nothing in there about, does the drug work? These are going to be patients with cancer. If this
00:22:01.000 is a drug that's being sought as a treatment for colon cancer, these are going to be patients that
00:22:05.120 all have colon cancer. They're often going to be patients who have metastatic colon cancer. So these
00:22:10.400 are going to be patients who have progressed through all other standard treatments and who are basically
00:22:18.620 saying, look, sign me up for this clinical trial. I realized that this first phase is not going to be
00:22:25.560 necessarily giving me a high enough dose that I could experience a benefit and that you're really
00:22:30.260 only looking to make sure that this drug doesn't hurt me. But nevertheless, I want to participate in
00:22:35.280 this trial. If the drug gets through phase one safely, then it goes to phase two. And the goal of
00:22:42.700 phase two is to continue to evaluate for safety, but also to start to look for efficacy. But this is
00:22:50.280 done in an open label fashion. What that means is they're not randomizing patients to one drug versus
00:22:58.160 the other typically. They can, but usually it's now we think we know one or two doses that are going to
00:23:05.960 produce efficacy. They were deemed safe in the phase one. We're now going to take patients and give them
00:23:13.700 this drug and look for an effect. And a lot of times, if there's no control arm in the study, you're
00:23:18.840 going to compare to the natural history. So let's assume that we know that patients with metastatic
00:23:24.800 colon cancer have on standard of care, have immediate survival of X months. Well, we're going to give
00:23:31.120 these patients this drug and see if that extends it anymore. And of course you could do this with a
00:23:35.180 control arm, but now it adds the number of patients to the study. So again, typically very small studies
00:23:40.340 can be, you know, in the 20, 30, 40, 50 range, maybe up to a few hundred people.
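The cohort-by-cohort phase one escalation Peter describes can be sketched as a simple stopping rule. The dose levels and the toxicity threshold here are illustrative assumptions, not any real trial's design:

```python
def escalate(dose_levels, dlt_counts, max_dlts=2):
    """Walk up the dose levels cohort by cohort; stop at the first dose
    whose cohort shows more dose-limiting toxicities (DLTs) than allowed,
    and return the highest dose that was tolerated (None if even the
    starting dose was too toxic)."""
    tolerated = None
    for dose, dlts in zip(dose_levels, dlt_counts):
        if dlts > max_dlts:
            break             # unacceptable toxicity: stop escalating
        tolerated = dose      # this cohort tolerated its dose
    return tolerated

# 12-person cohorts at 0.1, 0.15, and 0.25 mg/kg, as in the example above;
# say the third cohort shows 3 DLTs, exceeding our allowance of 2.
mtd = escalate([0.10, 0.15, 0.25], dlt_counts=[0, 1, 3])  # -> 0.15
```

Notice that nothing in this rule asks whether the drug works; exactly as described, phase one is about finding a tolerable dose, not about efficacy.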
00:23:47.040 And that one, Peter, I think is probably a good example of if you have the non-randomization,
00:23:51.600 this might be a case where say it's an immunotherapy and people know about the immunotherapy and it's
00:23:55.980 been really effective. It's approved for a particular cancer, let's say. And there are a lot of people
00:24:00.500 that know about it and there are cancer patients that know about it and they want to get that
00:24:04.140 treatment, but it's not approved. They're talking to their doctor. Maybe they're online. They might
00:24:09.000 enroll in one of these trials because they really want to try the drug and maybe they might believe
00:24:12.760 in it more than some other treatment. Yep. There are lots of things that can introduce bias to a
00:24:18.000 phase two if it does not have randomization. Again, the goal would be to still randomize in phase two
00:24:23.500 because you really do want to tease out efficacy. So if a compound succeeds in phase two,
00:24:29.660 which means it continues to show no significant adverse safety effects, which by the way,
00:24:37.000 doesn't mean it doesn't have side effects. Every treatment has side effects. It's just that it
00:24:41.340 doesn't have side effects that are deemed unacceptable for the risk profile of the patient
00:24:46.860 and it shows efficacy. So really you have to have these two things. You then proceed to phase three.
00:24:53.680 Here a phase three is a really rigorous trial. This is a huge step up. It's typically a log
00:24:58.520 step up in the number of patients. You're talking potentially thousands of patients here. And this
00:25:04.520 is absolutely a placebo controlled trial or not necessarily placebo, but it can be standard of care
00:25:13.500 versus standard of care plus this new agent. But it is randomized whenever possible. It is blinded.
00:25:19.960 And with drugs, that's always possible. And these are typically longer studies because you have so
00:25:25.380 much more sample size, you're going to potentially pick up side effects that weren't there in the
00:25:30.920 first place. And of course, now you really have that gold standard for measuring efficacy. And it's
00:25:37.080 on the basis of the phase one, phase two, and mostly phase three data that a drug will get approved or
00:25:42.800 not approved for broad use, which leads to a fourth phase, which is a post-marketing study. So phase
00:25:50.500 four studies take place after the drug has been approved and they're used to basically get additional
00:25:58.940 information because once a drug is approved, you now have more people taking it. And they may also
00:26:04.300 be using this to look at other indications for the drug. We talked about this recently, right? A phase
00:26:09.500 four trial with semaglutide being used to look at obesity versus its original phase three trials,
00:26:17.740 which were looking at diabetes. The drug's already been approved. This study isn't being done to ask
00:26:22.920 the question, should semaglutide be on the market? No, it's on the market. It's basically expanding
00:26:27.560 the indication for semaglutide in this case, so that insurance companies would actually pay for it
00:26:32.300 for a new indication. But given the size and the number of these studies, you're also looking for,
00:26:37.700 hey, is there another side effect here that we missed in the phase three?
00:26:42.680 Right. And it might be the particular population that might have a different risk profile.
00:26:46.580 Well, you might have a different threshold.
00:26:48.600 That's right. Because you're not doing this in patients with type two diabetes. You're doing
00:26:51.140 this in patients who explicitly don't have diabetes, but have obesity. With different patients,
00:26:55.960 are we going to see something different here? So yeah. So anyway, that's the long and short of phases
00:27:00.180 one, two, three, and four. Okay. So going back to observational studies, are there any things that
00:27:07.500 you look for in particular that will increase or decrease your confidence in it, whether that's a pearl
00:27:11.900 necklace or garbage? I think that selection bias is a big one. When I think about observational
00:27:18.520 studies, whether they be prospective or retrospective, the healthy user bias, I think,
00:27:24.480 is one of the more common ones we see in the epidemiology as it pertains to health. So I wouldn't
00:27:30.780 even know where to begin talking about these studies because the examples are so myriad. But is bacon bad
00:27:36.340 for you? Well, if you look at observational epidemiology, bacon is almost always bad for
00:27:42.880 you. I don't know what the hazard ratios are, Bob, but it's probably in the neighborhood of
00:27:46.380 1.3 or something like that. Meaning it's associated with about a 30% increase in the risk of basically anything you
00:27:53.400 look at, right? Whether it be cancer, heart disease, death, is that directionally right?
00:27:58.340 I think that's right. I mean, there's probably more nuance. The WHO is looking at,
00:28:02.020 I think they said over 700 epidemiological studies for red meat consumption. And I think they also
00:28:06.980 had processed meat consumption. When you look at those, we can get into it, but how are they
00:28:11.400 measuring bacon consumption? They're using these food frequency questionnaires, and we'll probably get into
00:28:15.280 this recall bias. But yeah, generally, I think with the WHO stuff, I think it was about 20 to 30%
00:28:20.660 associated increase. And so you look at that at the surface, of course, you'd be concerned. You'd be
00:28:24.820 like, oh my God, like I shouldn't be eating fill in the blank. I shouldn't be drinking coffee. I
00:28:29.040 shouldn't be eating bacon. I shouldn't be eating meat at all. The problem with these studies is that
00:28:36.120 you can't ever, no matter how much you try to statistically reconcile it, you can't strip out
00:28:43.140 the fact that people make choices not in isolation. So is there any difference between a person who makes
00:28:51.760 a lifelong decision to not eat meat and a person who doesn't? Of course there is. And it's going to
00:28:57.460 come down to many things that go beyond their diet, including things that can't be controlled
00:29:01.500 for. Now, obviously you can control for some things. Smoking. A person who doesn't eat meat
00:29:06.460 is far less likely to smoke than a person who does. A person who doesn't eat meat is probably far more
00:29:12.620 likely to exercise or pay attention to their sleep habits or be more compliant with their medications or
00:29:20.000 things like that. Again, people who don't eat meat, basically that is a proxy. That is a really good
00:29:26.340 marker for someone who is very, very health conscious. So this healthy user bias permeates
00:29:33.860 everywhere. And by the way, it permeates in both directions. So if you look at the epidemiology that
00:29:41.260 started to become very popular about 10 years ago, that was suggesting that diet soda was more
00:29:47.400 fattening than soda. So drinking a diet Coke is worse than drinking a Coke. Well, on the surface,
00:29:53.820 that doesn't seem to make a lot of sense, right? I mean, diet Coke has no calories in it. Coke is
00:29:59.160 full of just liquid sugar. And of course that gets you thinking, oh, is it the aspartame or whatever
00:30:03.900 else? Well, a far simpler explanation is look at people who are drinking diet soda versus people who
00:30:10.500 are drinking soda. You could make an argument, and I think this is the argument, that as a person is
00:30:17.860 becoming more metabolically ill and they're being informed that they really need to stop drinking
00:30:23.680 soda, they're going to be drinking diet soda. And so it's very difficult to look at just people drink
00:30:31.380 this, people drink that. They're otherwise identical. And simply the only difference between them is what
00:30:37.220 they drink. It just doesn't really hold up. So anyway, you're always going to look for that healthy
00:30:41.680 user bias. You talked about another bias a second ago, which is information or recall bias. And I think
00:30:47.320 many people are just shocked to learn how clunky and kludgy nutritional epidemiology is. Like when
00:30:54.860 you think about all of the amazing technology we have in the world, and we just recently did a
00:31:00.680 podcast talking about some of the most cutting-edge tools of neuroscience that allow you to examine
00:31:07.620 the behavior of a single neuron using channelrhodopsins and all these things. Like that's at one end of
00:31:14.140 science. And at the other end of science, we have this thing called a food frequency questionnaire
00:31:18.020 where you get a call from Billy and he asks you, hey, do you remember how many times a week you ate
00:31:25.280 oatmeal for the past year? I pay quite a bit of attention to what I eat. I don't know how I'd answer
00:31:30.120 that question. You just go to your spreadsheet of your oatmeal consumption, right? Yeah. Now you've dug into
00:31:35.540 this a bit, Bob. I'm being a bit tongue in cheek and facetious. Can you try to make the case that
00:31:39.880 recall bias isn't really that bad and I'm just exaggerating? I can't make that case. I guess it
00:31:45.580 might depend on what are you recalling? Yeah. What's the best case scenario? I would probably
00:31:50.780 get out of the food category. It might have to do with something say like smoking history. You might
00:31:55.480 even have receipts of the last time that you paid for cigarettes or something like that. If you ask
00:31:59.800 people how much did you smoke in the last year, I think you can get a more accurate answer. But with
00:32:04.340 the food frequency questionnaires, I mean, there's so many analyses that even just the number of foods
00:32:09.020 that are out there compared to the number of foods that are encapsulated in the food frequency
00:32:12.720 questionnaire, vastly different. It only covers a very small portion of it. And it's actually the
00:32:18.560 foods that I think the epidemiologists often look at, like the red meat consumption and things like
00:32:23.000 that, that people will underestimate. When they do these validity studies and actually follow them,
00:32:27.860 or they do a food log compared to the food frequency questionnaires, the correlation is so low
00:32:33.080 and intake is so underestimated that you're not really getting an accurate picture. So I don't know about
00:32:38.340 a best case scenario with food frequency questionnaires for food. If anything, it would be on frequency.
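Bob's point about the low correlation between questionnaire answers and actual intake can be illustrated with a toy simulation. Everything below is invented for illustration, not taken from any real validity study: generate a "true" intake, distort it the way a year-long recall might (systematic underreporting plus a lot of noise), and watch the correlation fall.

```python
import random

random.seed(0)

def pearson(xs, ys):
    """Plain Pearson correlation coefficient, computed by hand."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical "true" weekly servings of some food for 1,000 people.
true_intake = [max(0.0, random.gauss(5, 2)) for _ in range(1000)]

# FFQ-style recall: systematic underreporting (a 0.6 factor, invented)
# plus heavy random noise from guessing over a whole year.
reported = [max(0.0, 0.6 * t + random.gauss(0, 3)) for t in true_intake]

print(round(pearson(true_intake, true_intake), 2))  # 1.0 by construction
print(round(pearson(true_intake, reported), 2))     # far below 1
```

Any association you then measure against the reported numbers rather than the true ones gets diluted by that noise, which is part of why FFQ-based estimates are so hard to interpret.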
00:32:43.380 Imagine if you got a food frequency questionnaire that was maybe more technologically advanced.
00:32:47.740 It's an app where you literally recall at the end of each day, what did I eat today?
00:32:52.440 But the problem, I think, is the frequency of this: oftentimes you're doing one
00:32:56.640 questionnaire for what did you eat over the course of, say, one year or two years. Or even they'll just do
00:33:03.340 one food frequency questionnaire at the beginning of the study at baseline. They'll follow up with
00:33:07.900 these people for, say, 10 years, 20 years. And the assumption is that they don't change their eating habits.
00:33:13.480 They never ask them again what happened here. But if, you know, they might compare two groups,
00:33:17.600 one group has higher bacon consumption than the other. The assumption is that they're going to
00:33:21.220 continue with those dietary habits in perpetuity. Again, that's not a best case scenario. But I guess
00:33:26.560 the best case scenario is you could have more rigor, I think, if you did it more frequently.
00:33:30.440 Because obviously, if I asked you what you had for breakfast this morning, I think you'd probably
00:33:34.480 have more confidence in the answer than what did you have for breakfast on January 3rd of last year.
00:33:40.300 I think with nutrition, I just, because I spend so much time doing this type of stuff with patients,
00:33:45.820 it's metaphysically impossible. I mean, I really feel strongly that we should abandon food frequency
00:33:49.640 questionnaires and no study should ever be published that includes them. I'm going to anger a number of
00:33:54.560 epidemiologists listening to this. I really think we need to put a stop to that. I think where recall
00:33:59.420 is reasonable is, as you said, on things that are more profound. I mean, if we wanted to do a study
00:34:05.100 on, think of something really that you would never forget, like, oh, childbirth. Asking women to
00:34:10.940 recall, how many times have you been pregnant? How many times did you either have an abortion or
00:34:16.360 miscarry? And how many times did you deliver at term? Like something that profound? Yeah. I would feel
00:34:23.220 confident that if you asked a woman that question over the past 10 years of her life,
00:34:26.580 you would get very accurate answers. But by the way, it still doesn't tell me that I would be able
00:34:32.680 to infer causality. If I was trying to look at women who have never had a miscarriage versus women
00:34:39.300 who have had miscarriages, just because I look back and ask them to tell me those things doesn't
00:34:44.900 mean that embedded within those differences are other biological or social or economic factors.
00:34:49.660 You kind of get where we're going here, which is I think epidemiology has a place, but I think the
00:34:56.600 pendulum has swung a little too far and its place has been asserted as being more valuable than I
00:35:02.900 think it probably is. You kind of talked about something that's, I think, a very important bias
00:35:08.580 that exists in any study. But I think this is actually a big problem in prospective studies,
00:35:15.520 if they're done incorrectly, which is something called performance bias. So the Hawthorne effect
00:35:20.460 is basically an effect that says, if a person is watching you, you will change your behavior. So
00:35:25.320 anybody who has tried to fastidiously log what they eat every day, which I've done, many people have
00:35:33.120 done this. There's no question you change your behavior just by logging what you eat. You will change
00:35:40.400 what you eat. How much more will you do it when you know someone is going to look at it?
00:35:45.520 Unbelievably so. In fact, you could make a case that one of the most efficacious dietary interventions
00:35:53.060 known to man is having somebody watch what you eat every meal, not just every meal, every moment.
00:36:00.200 And whether you have somebody virtually or literally watching you at every moment eat,
00:36:04.320 especially someone who you're not entirely comfortable with, that's going to have an enormous
00:36:09.460 impact. Isn't there a name for this that's like a car rental company? Is it the Hertz effect?
00:36:14.420 It's the Avis effect because actually I think Hertz might, in the seventies and eighties,
00:36:19.480 Avis was always behind. Oh, they were behind Hertz.
00:36:23.000 Yeah. Hertz or Budget, Budget Rent a Car. So their slogan, I thought it was great. Basically, we're
00:36:27.300 number two. Then they would say, we try harder. We've got this inferiority complex. We're number two.
00:36:32.780 And that, I'm trying to think of an example. The Hawthorne effect is almost like an
00:36:37.040 experimenter bias: the experimenter is observing the people under the lamp. That was
00:36:42.560 where this came from, a study that looked at work productivity under different lighting. They've got the clipboard, and
00:36:47.020 it could be your boss that's out there looking and watching you. So that experimenter is having
00:36:51.420 an effect. Now with the Avis effect, say that you, Peter, competitive Peter,
00:36:56.460 were enrolled in a cycling trial. Say it's open label and you get a placebo or you get nothing.
00:37:02.800 And you know that there's another group out there that say, I think we looked at this a little bit,
00:37:06.500 say it's like this lotion that you put on that's supposed to improve your performance.
00:37:11.120 There could be a part of you that says like, I'm going to beat those guys, the control group.
00:37:14.780 They're going to say like, we're number two, like we're not getting the special treatment.
00:37:17.820 So we're going to win this thing. Sounds like that wouldn't happen. But I think that people,
00:37:21.740 if they enroll in a trial, sometimes there's that competitive nature. And it's also to your point
00:37:26.440 about if you have somebody watching you, that could also adjust your performance. Now you have this,
00:37:31.420 whether it's not physically somebody watching you, but you know that there's a trial that's
00:37:34.580 following you and that you know that they're going to be looking a year down the road, two years down
00:37:38.380 the road, or even three weeks down the road is probably more common. And they're going to test
00:37:42.240 you again and see how you do and see if you improve or see if you don't. Those things can play a role.
00:37:46.260 And just to the point of somebody that might be not pleasant, that's watching you as far as food,
00:37:51.400 there's a great Saturday Night Live clip with The Rock in it. And it's a commercial for
00:37:56.720 Nicotrel, and it's a smoking cessation treatment. And actually, The Rock is named
00:38:02.360 Nicotrel. There's a guy on a couch and he's about to smoke a cigarette. The Rock comes out
00:38:07.520 jacked, smacks the cigarette out of his hand. And it's one of the most effective smoking cessation
00:38:12.420 programs I've seen. I bet. I think there's a more sinister form of performance bias that creeps up in
00:38:20.000 clinical trials, especially in randomized controlled trials where you think at the surface, wow,
00:38:24.460 this is a really well done study. So you'll take two groups and let's say it's a weight loss trial.
00:38:29.780 We're going to test calorie restriction versus pick your diet, the all potato diet. So the calorie
00:38:37.320 restricted group is given some leaflets and it tells them how to measure calories, that they need to
00:38:46.420 cut their calories by 25% from baseline. And we'll see you in 12 weeks. The potato diet group is
00:38:53.480 given twice weekly counseling sessions on all the different ways you can cook potatoes so that you
00:39:00.480 don't get fed up and bored of eating potatoes all day on the potato diet group. And at the end of the
00:39:05.640 study, the potato diet group lost more weight than the calorie restricted group. It'd be tempting to
00:39:11.300 say, well, come on, this was a randomized controlled trial. But the problem is there's an enormous
00:39:16.320 performance bias in the potato group in that they were given far more attention. They were observed
00:39:21.940 more. They were given more coaching. They had much more of a positive behavioral influence.
00:39:28.760 I would say the number one bias that I see in RCTs that are lifestyle-based is that very subtle
00:39:36.780 performance bias. If you're really designing a trial well, you have to flatten the curve on those
00:39:46.020 differences. So each person in each group should be getting the exact same amount of attention,
00:39:51.840 the exact same amount of touch with the investigators, the exact same type of advice
00:39:58.260 so that you can eliminate that difference, which unfortunately shows up a lot.
00:40:03.520 That almost gets back to this idea of the null hypothesis. What'd you say? Coffee might
00:40:07.400 darken your eyes. That's your guess. You've observed it. You've got a couple of case studies of
00:40:11.840 some people in your family or whatever. And so that's your hypothesis. And then the way that
00:40:15.860 you design the trial is interesting because it's really like this coffee is going to be innocent
00:40:20.780 until proven guilty, the default position. And then it's really your role. And it seems almost
00:40:25.340 counterintuitive to a lot of people. And it's hard actually from a human perspective is that your
00:40:29.220 role is really to be as rigorous as possible to essentially falsify your hypothesis. You need to do
00:40:35.800 that as rigorously as you can. And I think, to your point, sometimes it's like you get really
00:40:40.080 excited about a treatment. The people that are involved in the study, the investigators,
00:40:43.380 they're really excited about it. The control group or the placebo, it's almost an afterthought.
00:40:47.540 And so there might be a lot of things that they're doing in the treatment group, not just the treatment
00:40:50.820 itself that could bias the study. Yeah. Continuing on that thread of other things that you want to
00:40:56.260 look at in a study is, and we talked about this very briefly in passing, was the idea of differentiating
00:41:01.600 primary from secondary outcomes. And there's some debate about whether you can only have one
00:41:06.140 primary outcome or whether you can have co-primary outcomes. But the primary outcomes are basically
00:41:12.360 the outcomes that the study is designed around and powered against. Again, we will come
00:41:16.420 to this idea of power in a moment. But there are lots of secondary outcomes and they're often
00:41:20.320 exploratory. It's really important that when people are pre-registering studies, they state what
00:41:26.200 the primary outcome is and what the secondary outcomes are. And typically a study that fails to
00:41:31.780 meet its primary outcome will be deemed a null study, even if it meets secondary outcomes.
00:41:38.660 So it's just very important to pay attention to the subtlety of that. And again, a good journal
00:41:43.100 with a pre-registered study is going to make that abundantly clear, but I can promise you that
00:41:47.300 someone writing about it in the newspaper is virtually never going to make that distinction.
00:41:51.660 And it's important to understand that because it gets to this next issue, which is kind of the
00:41:55.880 multiple hypothesis testing problem. Research should be hypothesis seeking or hypothesis testing,
00:42:02.520 but it can also be hypothesis generating. And so you can use statistical tools to slice and dice data
00:42:09.180 in multiple ways. And you can take many looks at data to see if you actually find something
00:42:16.260 significant there. You have to be careful because the more you look, the more times you look at
00:42:21.620 something, the more likely you are to find something that looks positive purely by chance. So this isn't a great
00:42:27.640 analogy, but just to give you a sense of it, if you flip a fair coin, you've got like a 50%
00:42:32.860 chance of getting heads. If you get two chances to flip the coin, the probability that you're going
00:42:37.100 to get heads is now 75%. If you get three chances to flip a coin, you're up to 87 and a half percent
00:42:43.360 chance that you're going to get at least one head. 10 times, you're at about 99.9%, so basically certain that
00:42:50.780 you're going to get heads. So if you're allowed 10 looks, you have to correct for that. And there's
00:42:55.800 something in statistics called the Bonferroni correction factor that does force you to do
00:43:00.320 that. It forces you to divide your significance threshold, your alpha, by N, where N is the number of times you've taken a look
00:43:06.720 at the data, so to speak. And therefore it raises the bar for what is significant. Again, we'll talk
00:43:12.680 about p-values for folks who maybe aren't as familiar with that in a second. Is there anything else
00:43:17.160 that you'd add to that? I'm sure I'm missing some things. Maybe a more technical term that we
00:43:20.680 didn't bring up, which is confounding. When we talked about the healthy user bias, I think that's
00:43:24.620 a great example of something that can confound your results. It's something that's not in the causal pathway,
00:43:29.140 let's call it, but that might be affecting the results, whether it's age, sex, smoking. The list is almost
00:43:34.940 endless. And this is what those observational studies will try to control for in order to almost
00:43:39.580 mimic what randomization would look like. Right. This is the sort of the bane of the existence
00:43:44.500 of the epidemiologist. If you're trying to determine a relationship between hot chocolate
00:43:48.940 consumption and skiing accidents, it's very likely that people who drink more hot chocolate are more
00:43:55.200 likely to have ski accidents. I mean, does skiing cause hot chocolate consumption or do ski accidents
00:44:01.180 cause hot chocolate consumption? Does consuming hot chocolate make you a worse skier? Or is it that
00:44:06.640 people who live in colder climates consume more hot chocolate and usually skiing occurs in colder
00:44:11.340 climates outside of Dubai? So climate, therefore, is obviously a confounder. And the goal is to be
00:44:17.880 able to identify every possible confounder when you're doing epidemiology. And I think as John
00:44:23.480 Ioannidis argued when we had him on our podcast, that would be a good podcast for people to go back and
00:44:28.500 listen to alongside this. It's really not possible to identify, let alone eliminate all confounders.
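The hot chocolate and skiing example lends itself to a quick simulation (all numbers here are invented): climate drives both hot chocolate consumption and accident risk, hot chocolate itself does nothing, and yet the heavy drinkers still show several times the accident rate.

```python
import random

random.seed(42)

n = 10_000
# The confounder: does this person live in a cold climate?
cold = [random.random() < 0.5 for _ in range(n)]

# Cups of hot chocolate per winter: higher in cold climates, plus noise.
cocoa = [random.gauss(30 if c else 5, 5) for c in cold]

# Ski-accident risk depends ONLY on climate (more skiing), never on cocoa.
accident = [random.random() < (0.10 if c else 0.01) for c in cold]

def accident_rate(mask):
    hits = [a for a, keep in zip(accident, mask) if keep]
    return sum(hits) / len(hits)

heavy = [x > 15 for x in cocoa]   # "heavy" hot chocolate drinkers
light = [not h for h in heavy]

# Heavy drinkers have a far higher accident rate than light drinkers,
# even though cocoa was never in the causal pathway.
print(round(accident_rate(heavy) / accident_rate(light), 1))
```

Controlling for climate here would erase the association; the trouble in real epidemiology, as Peter says, is that you can never be sure you've listed every "climate."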
00:44:35.080 Absolutely. So if we look at experiments or experimental studies compared to observational
00:44:40.820 studies, are there things you look for specifically or in particular for experimental studies to
00:44:46.000 increase or decrease your confidence in them? Yeah. Well, first and foremost, randomization.
00:44:50.060 So if an experiment isn't randomized, again, it doesn't mean that it's useless, but it just
00:44:54.440 means it's going to be a lot harder to really make sense of this. And randomization needs to be
00:45:01.500 a rigorous randomization. You can randomize incorrectly, believe it or not. I think there's
00:45:06.440 a very famous example with PREDIMED, which was a study that when it was published was kind of a
00:45:11.840 remarkable finding, a very large study, something like 7,500 people randomized into three groups,
00:45:18.140 2,500 per group, given basically two different dietary patterns, a Mediterranean diet in two versions
00:45:25.760 and a low-fat diet. This was a primary prevention study. So it was looking at people who are high
00:45:31.420 risk, but who haven't had heart attacks or anything yet. And it was looking at mortality.
00:45:36.160 And the study was actually stopped early. Again, something we're going to talk about in a second,
00:45:39.540 because it had such a positive effect. So the Mediterranean diet had such a favorable
00:45:44.120 effect relative to the low-fat diet that people were dying at a rate far less, such that it would
00:45:51.360 have been unethical to continue the study for the, I think the seven and a half years it was planned
00:45:55.300 to run. And I think they stopped it at the four-year mark and sort of declared victory.
00:45:59.140 But then something happened, Bob. What happened?
00:46:02.160 They went back and reanalyzed this PREDIMED group. The first paper was published in New
00:46:06.880 England Journal of Medicine 2013, and they had almost like a brand new article addressing some
00:46:12.560 issues. They did a reanalysis that was published in 2018. I think it came from this fellow named John
00:46:18.740 Carlyle, who had this way of looking. And I think this was, we've got an email on this with David
00:46:25.340 Allison, who's a great statistician. He talks about this in his article too, as well, where
00:46:30.240 he looked at this. But this fellow named John Carlyle did this analysis where he looked at
00:46:35.280 thousands of studies and he could flag the studies and see, does this truly look like randomization
00:46:40.260 based on some particular statistics? And the PredMed study was flagged looking like this doesn't look
00:46:46.740 like proper randomization. There might be something going on here. And I think according to the media
00:46:51.040 outlets, I think I read in the New York Times, they talked to the lead or the senior investigator,
00:46:54.860 and he said that it turns out that some of the villages or the clinics, I forget how many clinics
00:46:59.720 in total there were in the study, but at 11 of the clinics, one of the investigators was randomizing
00:47:06.180 entire clinics to one group. If you really want to dig into a study, sometimes you really have to
00:47:12.020 get the story, which is oftentimes you look at randomization. Oh, that's really simple. You just
00:47:16.260 randomize people to different groups and blinded or unblinded. It's very hard to blind the fact you
00:47:22.020 see your neighbor get a delivery every week, a jug of olive oil or a sack of mixed nuts, which were
00:47:28.280 the two Mediterranean groups. And I think what happened was people started complaining in the
00:47:32.980 villages. They're like, what do I get? And they're like, you got your low fat diet pamphlet. Remember,
00:47:37.000 we give it to you every year. You can do that in a study. And that's typically referred to as a cluster,
00:47:41.440 a cluster randomization, where you might randomize one whole classroom to one arm and another classroom to the other, which might be
00:47:47.060 convenient, but it requires different statistical methods. Let's use that example, because that's
00:47:51.800 actually a really good one, right? If you want to study the effects of meditation on attention span of
00:47:58.480 kids, it's very different to say, we're going to just take 100 kids and randomize 50 into one group, 50 into
00:48:06.500 another and separate them versus saying, we've got a class, two classes over here, two classes over
00:48:14.360 here. We're going to split those classes, two and two, into each arm. That's a totally different type of
00:48:20.920 randomization. One is a true randomization. One's a cluster randomization. And while you can do the
00:48:26.740 latter, it requires a different statistical adjustment. So PREDIMED basically had to reanalyze all of their
00:48:33.480 data in light of that. It turned out in the case of PREDIMED, the results still held, but it will
00:48:39.040 always kind of be a cloud that hangs over it. I think Ioannidis, to make this point, he was a huge
00:48:44.220 fan of the PREDIMED study. And something that he said, which I think might be intuitive, is if
00:48:49.120 they're randomizing entire villages to a group and they're not accounting for it, he thinks like,
00:48:53.720 I'm not sure that's going to be the only problem in that study that gets uncovered. But
00:48:57.700 on the flip side, it's really, really hard to do everything right in a study. You're going to make
00:49:01.920 mistakes. And now imagine randomizing a household. Dad, you're on a Mediterranean diet for the next
00:49:07.640 seven years. Mom, you're on a low-fat diet for the next seven years. I mean, it starts to get very
00:49:11.740 difficult. That's one important thing. You also want to make sure, is there a control group? Not all
00:49:17.360 prospective trials have control groups. Sometimes it's a single group where a person serves as their
00:49:22.820 own control, and there's typically a crossover. So you'll take a group, you'll randomize them into
00:49:28.700 two. It's not that one group is getting treatment A and the other group is getting
00:49:34.100 placebo or treatment B. Both groups get both treatments, plus or minus a placebo, in different
00:49:41.340 orders. And this is a great statistical tool provided the treatment doesn't linger past
00:49:49.620 the washout, that is, provided the treatment doesn't interfere with the control session. The reason this is powerful is
00:49:55.800 you need far fewer subjects when everybody gets to serve as their own control. So it greatly reduces
00:50:02.240 basically the cost and logistics of a study. But you run into challenges, right? So if you take
00:50:07.480 20 people who are going to take this drug that is supposed to help them exercise better for eight weeks,
00:50:17.320 and another group is going to take a placebo for eight weeks and exercise, and then everybody switches,
00:50:23.380 because that's the right way you would do it. You had some people start first on the treatment,
00:50:26.920 some people start first in the placebo. Do you need a gap between the treatments?
00:50:31.920 Because will the effects of that drug linger into the placebo period for one group, which is not
00:50:37.300 what's happening to the other group? And even if it is, even if you're only doing it with one group,
00:50:42.260 are you confounding the effect of that treatment? I hope that makes sense, Bob. I don't know if I'm
00:50:45.820 making sense. I know you know what I'm saying, but is there a better way to explain that?
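The washout caveat aside, the payoff of the crossover idea is easy to show with a toy calculation (all numbers below are invented for illustration): when each subject serves as their own control, pairing the measurements strips out the big person-to-person differences.

```python
import statistics

# Hypothetical crossover data: ten cyclists measured on placebo and on
# treatment (watts). People differ a lot from one another, but the
# treatment adds a modest ~5 watts for everyone.
placebo   = [152, 171, 198, 210, 164, 243, 187, 225, 176, 205]
treatment = [158, 175, 204, 214, 170, 247, 193, 229, 180, 211]

# Between-subject view: the effect drowns in person-to-person spread.
print(round(statistics.stdev(placebo), 1))    # roughly 28 watts of spread

# Within-subject view: subtract each person from themselves.
diffs = [t - p for t, p in zip(treatment, placebo)]
print(statistics.mean(diffs))                 # 5, the effect pops out
print(round(statistics.stdev(diffs), 2))      # about 1, variance mostly gone

# A paired t-test exploits exactly this:
#   t = mean(diffs) / (stdev(diffs) / sqrt(n))
n = len(diffs)
t_stat = statistics.mean(diffs) / (statistics.stdev(diffs) / n ** 0.5)
print(round(t_stat, 1))                       # a very large t from ten subjects
```

Compare the same twenty measurements as two independent groups and the effect is buried in the roughly 28 watts of between-person spread, which is why a crossover can get away with far fewer subjects.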
00:50:49.420 I think that makes sense. One other point I was going to make about that too,
00:50:52.220 with the crossover groups. I was going to ask you about that because I've seen
00:50:54.680 the statistical power, I guess you would call it, of the crossover design. You can see
00:50:58.620 relatively small studies, not a lot of people, pretty short. And you look at the p-values,
00:51:04.100 and we'll get into that, but they're 0.000 something. The assumption, when I was thinking
00:51:08.600 about it, when you're talking about it, and they serve as their own controls, it's almost as if
00:51:12.340 they're treating them like if you could get identical twins and randomize those identical twins
00:51:16.800 to one group or the other, you would think that's great because you're controlling for so many things
00:51:20.840 about the physiology or the genotype, et cetera, about those people. And it's almost like they
00:51:25.840 treat these crossover groups, or you're almost cloning these people. You're comparing them to
00:51:29.420 themselves, but it's a good point. And there might be something about the order of the treatments that
00:51:34.280 they receive. If they get treatment A and then treatment B, maybe one might have an effect on the
00:51:38.720 other. The really good ones go A, B, and then B, A. They divide them into two groups and go A, B,
00:51:43.800 and B, A. And yeah, it really comes down to the fact that you can use what's called a paired T-test.
00:51:47.460 The simplicity of the statistic of the paired T-test is part of its elegance here and that
00:51:52.900 it basically eliminates a lot of variance. Okay. So then we talked earlier about this blinding.
00:51:57.720 What does that mean? So in an ideal world, both the subjects and the investigators should not know
00:52:04.600 who is getting the treatment and who is getting the placebo. At a minimum, the subjects should not know.
00:52:12.120 That would be single blinding. But again, double blinding is always preferred if possible,
00:52:18.340 because the investigators can be biased. They can have hidden biases if they know the outcome. So
00:52:27.580 for example, if patients are being given a drug for weight loss, you could say, well, it's pretty easy to
00:52:33.800 blind the patients from that. But if the investigators know that, they might behave differently towards the
00:52:39.860 patients for whom they expect greater weight loss if they believe that this drug is effective. So
00:52:44.940 again, very important and sometimes very challenging. You know, I think we talked about this in the
00:52:48.940 podcast with Rick Doblin. One of the huge challenges of studying psychedelics is it's very difficult to
00:52:55.280 blind anybody. Most of all, the user, the subject. One group is getting psilocybin and the other group is
00:53:02.440 getting, even if it's niacin, which causes some flushing, it's not hard to know which group you're in.
00:53:07.560 And that may affect the results. Size matters, duration matters, and basically the generalizability
00:53:15.340 of the study. So is it in a population that replicates or looks like what I'm interested in
00:53:22.480 studying, whether it's me or my patient or whomever I care about? And there are strengths and weaknesses
00:53:27.320 to the heterogeneity of studies. So the more heterogeneous a study in terms of its patient population,
00:53:34.640 well, the more generalizable the results are, but the higher the bar for finding it. So I think
00:53:39.880 this has got a lot of attention lately, but I think for a while it was a relatively unknown, kind
00:53:43.320 of dirty little secret of medicine: how many clinical trials involved men only? How many drugs
00:53:50.500 were approved for both men and women, but on the basis of only being studied in men? And the rationale
00:53:55.620 for this was that it was more complicated to study women. So women, especially premenopausal women,
00:54:00.840 because they have a menstrual cycle, that really changes things hormonally. And therefore it's
00:54:07.900 more complicated to do studies and look at drug kinetics and all sorts of things in women.
00:54:13.420 And so the easier way to do that was to just study it in a homogeneous population of men.
00:54:19.220 Well, of course that poses an enormous problem if you're now trying to extrapolate the utility of
00:54:23.800 that drug in women. It's an extreme example, but a very important one. For large studies,
00:54:28.180 you tend to want to know, is this a multi-site or a single site? Again, PREDIMED is a great example.
00:54:33.380 So you had a multi-site study and there were probably significant differences between how
00:54:38.100 the sites were run. So there's an advantage to multi-sites because in theory, it brings more
00:54:43.020 heterogeneity. It should cancel out the effect of any one site over another, but it's harder to control.
00:54:49.820 And therefore you can have, whether it be deliberately or non-deliberately rogue studies
00:54:55.120 or sites rather introducing more bias. I think another thing I really look at here is how big
00:55:00.980 is the association of the effect? We'll talk about this with power, but you can have something that
00:55:06.360 is statistically significant. In that sense, the study is quote unquote a success, but it's
00:55:11.520 clinically irrelevant. The effect is not that big. So we've tested this new drug for blood pressure
00:55:17.360 and it lowers systolic blood pressure by one millimeter of mercury after a year of use.
00:55:22.880 And it's like, okay, that might be statistically significant. If the study was large enough,
00:55:27.620 is it clinically significant? Almost assuredly not. You want to pay attention to what the adverse
00:55:32.520 events were, in frequency, severity, and distribution. You want to pay very close attention
00:55:37.680 to who funded the trial. Trials don't fund themselves. And a lot of trials are funded by drug companies.
00:55:44.840 Now, again, they're usually done with very clear data monitoring and data analytics. And despite
00:55:52.580 all of the fear mongering out there, it's not like pharma really gets to put their hand on the scale
00:55:57.540 of these pharma studies. But where I think things can get a little dicey is in terms of things getting
00:56:05.640 buried in supplemental journals and things like that. So you do want to pay a bit of attention to
00:56:09.760 who's funding a trial. And I think even more important than that is kind of understanding what
00:56:13.900 the conflicts of interest are of the authors. And nowadays those have to be declared, but there's been a
00:56:19.060 huge amount of hoopla over that. And there have been some very famous examples of people who are
00:56:24.140 on editorial boards of journals or publishing like crazy and not declaring that, hey, I'm a paid
00:56:30.340 consultant of these 10 pharma companies and I'm writing or doing experiments on drugs by these people,
00:56:37.360 or I'm an editor on journals that are commenting on this. And then finally, you really want to
00:56:42.120 understand if the study was adequately powered. And this becomes very important if the study has a null
00:56:46.940 outcome. You want to just spend a minute and we'll talk about power?
00:56:50.480 Yeah, I think that makes sense.
00:56:52.360 Power is defined as one minus beta, where beta is defined as the probability of a false negative.
00:57:02.140 Let's contrast that for a moment by talking about what a false positive is. A false positive
00:57:06.760 is defined as alpha, and that's also known as the p-value. I think this is actually complicated and I
00:57:13.280 want to just spend a minute on this. So everybody's heard of a p-value, but I don't think people think
00:57:17.920 of it as a false positive rate. I don't think most people have heard of the false negative rate being
00:57:22.520 beta and then one minus beta being the power. So I think people probably always know we talk about
00:57:27.520 p-values being 0.05 or less. It's very difficult to make a case that we're going to look at a study
00:57:33.420 that has a p-value of 0.1 and say it's significant. So what does that mean? So the p-value is, as I said,
00:57:44.780 it's the probability that what you've seen is a false positive. You see an effect, it's actually by chance.
00:57:55.320 It's not the true effect. You do a study and you're trying to determine if, this is stupid, but coffee changes
00:58:02.200 eye color, makes your eyes darker. And if you do that study and lo and behold, it appears that coffee
00:58:10.360 did make the eyes of the subjects darker and the p-value is 0.17, it means there's a 17% chance
00:58:21.300 that this was a false positive. So let me kind of restate this. So a p-value is basically trying to
00:58:27.560 answer the question, what's the probability of rejecting the null hypothesis when it is in fact true.
00:58:34.020 If the p-value is 0, it means it's impossible. And if it's 1, it means you are absolutely going to do it.
00:58:41.900 So obviously we want p-values that are as small as possible. It can never be 0, but you want them to be
00:58:48.360 as close to 0 as possible. And basically we say 5% is our maximum threshold.
00:58:55.980 That's the ceiling that we'll put on this idea. We go back to what we talked about at the outset.
00:59:00.540 So the default position is that the null hypothesis is correct, that there is no difference between
00:59:06.960 the groups. So this term statistical significance basically means that the null hypothesis is rejected
00:59:16.060 if the p-value is less than that pre-stated level. I don't know if I'm explaining this really,
00:59:20.720 really well, Bob. Is there anything you would add to this? Because I think this is an important idea,
00:59:25.160 even though p-values are so ubiquitous. I think it's maybe worth spending one more minute
00:59:30.160 on it before we go back to power. Sounds like it makes sense. I'm trying to think of somebody who
00:59:34.560 might not understand it as well. Those examples that you gave are good. And this is, so you see
00:59:40.760 on most papers, I think you'll see this p-value of 0.05. We can get into the confidence intervals,
00:59:46.220 but you'll see 95% confidence interval and p-value of 0.05. And that's your false positive rate.
00:59:54.080 But it's an arbitrary threshold. So you could try to submit a paper. And I've seen,
00:59:58.960 I usually catch it by the confidence interval. I'll see 90% confidence interval on some figure
01:00:03.420 or table. And I'll look at it. They'll use a p-value of less than 0.1. And maybe they have
01:00:08.280 some justification for it or not, but it really is this arbitrary threshold. Like imagine if your
01:00:13.140 threshold were a p-value of less than 0.95. In theory, it's not
01:00:20.120 exact, but even if the chance of this being a false positive is about 90% based on your analysis,
01:00:26.560 you would reject the null hypothesis that there's no difference between these groups,
01:00:29.920 which sounds sort of insane. I think it was this guy Fisher who established this 0.05,
01:00:35.420 and it has been the threshold ever since. More or less, researchers are willing to accept, at least for the
01:00:39.300 purposes of a single trial, a p-value of 0.05: they're willing to accept that level
01:00:44.280 of false positive in their results and still make the claim that they rejected the null hypothesis.
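What accepting a 5% false positive rate means can be seen directly in a quick simulation. This is an illustrative Python sketch, not something from the episode: it runs many experiments in which there is no real effect, and counts how often a test at alpha = 0.05 still comes up "significant."

```python
import random
from statistics import NormalDist, mean

# Simulate many experiments in which the null hypothesis is TRUE by
# construction (both groups come from the same population), and count
# how often a two-sided z-test still "finds" an effect at alpha = 0.05.
random.seed(42)

def z_test_p(group_a, group_b):
    """Two-sided p-value from a simple z-test on two equal-size samples,
    assuming known unit variance (an illustrative simplification)."""
    n = len(group_a)
    z = (mean(group_a) - mean(group_b)) / (2 / n) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))

trials = 2000
false_positives = sum(
    z_test_p([random.gauss(0, 1) for _ in range(50)],
             [random.gauss(0, 1) for _ in range(50)]) < 0.05
    for _ in range(trials)
)

# With alpha = 0.05, roughly 5% of these null experiments should come up
# "significant" purely by chance.
print(f"False positive rate: {false_positives / trials:.3f}")
```

Tightening the threshold to 0.0000001 in this simulation would nearly eliminate false positives, at the cost of missing small real effects, which is exactly the trade-off discussed here.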
01:00:50.020 Right. Because if you make the p-value so low, if you say, no, my threshold is 0.0000001,
01:00:55.900 then you really run the risk of discarding a lot of information that turns out to be kind of relevant.
01:01:02.720 It is a fine balance between those two.
01:01:05.780 And that would be a false negative.
01:01:07.140 Exactly.
01:01:08.080 The lower there. Yeah. There might be an effect, but you're not going to see it.
01:01:10.640 So this false negative rate, we typically allow to be a larger number. It's typically between
01:01:18.400 10 and 20%. The flip side of that is we have 80 to 90% power, because one minus the accepted false
01:01:27.220 negative rate is called your power. I think this is one of the most important concepts to understand
01:01:32.000 in designing any sort of clinical trial, whether it's humans, animals, any sort of intervention.
01:01:37.260 So there's a table, they're all over the place, but this is the one I've always liked. It's old. It's
01:01:41.900 probably 10 years old, probably longer than that actually, but it's out of a great cancer textbook
01:01:46.900 on clinical trials. So pull up this table, Bob, and we'll kind of walk through.
01:01:50.940 Got it. Power table.
01:01:52.820 Okay. These look a little intimidating at the outset. So let's kind of walk through
01:01:57.500 how to interpret this. So what this table is saying is you want to presuppose, you know,
01:02:07.620 what the difference is between the treatment groups. You have to say, I believe that the
01:02:15.500 difference between the success rate in the treatment between group A and group B is going
01:02:21.360 to be X percent. And the smaller of the two is Y percent. Let's come up with a real number.
01:02:29.900 So I think that we are going to look at how this drug impacts your rate of surviving a urinary tract
01:02:42.020 infection, or curing the infection. And I think that the placebo group is going to have a success rate
01:02:49.520 of 25%. And I think that the treatment group is going to have a success of 35%. So I think there's
01:03:01.140 a 10% gap. And I think the lower of those two is 25%. So you go to 0.25 on the horizontal axis,
01:03:10.760 and you go to 0.1 over on the column, and you'll see there's two numbers there, 459 and 358. And the
01:03:21.120 upper of those two is if you want 90% power, i.e. 10% false negative. And the lower of those two
01:03:28.720 is for 80% power or 20% false negative rate. And those numbers basically tell you how many people
01:03:36.260 you need in each of the two treatment groups, if you want to be significant at a level of 0.05.
01:03:43.380 So what do you notice when you look at this? You notice that the bigger the gap, the bigger the effect
01:03:53.560 size between the two groups, the fewer subjects you need. So if you march from left to right in this table,
01:04:01.180 holding that lower success rate at 0.25, if you say, well, the difference is 15%, you only need
01:04:09.540 216 or 165. If the difference is 30%, so one group is going to have a 25% success rate, one group's going
01:04:17.760 to have a 55% success rate, you're down to 60 and 47. And if you go out to a 50% difference, so one group
01:04:24.720 is going to have a 25% response rate, the other group's a 75% response rate, you're now down to needing
01:04:29.700 somewhere between 18 and 23 people per arm. And by the way, if you go down to 5%, one group responds
01:04:37.700 at 25%, the other at 30%, you're at 1700 or nearly 1300, depending on your level of power.
01:04:46.380 I appreciate everybody kind of bearing with me as I went through this power table. It seems like one of
01:04:50.360 the driest things in the world, but as my mentor once told me, it's the single most important table
01:04:56.100 you should ever familiarize yourself with. If you want to be in the business of designing
01:04:59.460 clinical trials or basically any sort of experiment, because it is just so easy to get
01:05:06.900 this wrong and over- or under-power an experiment. What does that mean? So to underpower an experiment,
01:05:14.940 I think is the more common mistake here. You simply don't have enough people in the study
01:05:21.180 to appreciate a difference if it is there. The study ends up being null. The p-value does not fall below the
01:05:29.400 threshold of 0.05. And you say, look, there is no difference between treatment A and treatment B.
01:05:35.640 When in reality, there may well have been, but you didn't have the power to determine it. And therefore,
01:05:43.200 you don't actually know if you should have rejected the null hypothesis or accepted it.
01:05:48.840 Yeah. Yeah.
01:05:50.160 I think the other problem, equally sinister, perhaps not as common, is when a study is overpowered.
01:05:58.080 And now you have more people in the study than you should have had for the effect size. And you start
01:06:05.500 to find things that are statistically significant, but are probably irrelevant clinically. That's when
01:06:11.960 you start to pick up an effect size of 1% when you're dealing with something clinically that should
01:06:17.560 never be thought of as being relevant below 10% detection threshold. So notwithstanding the fact
01:06:23.160 that you also probably had more people in a study than you needed to, it could have cost more. And
01:06:28.680 you typically don't see this as much with clinical trials, but you'll see this more with kind of data
01:06:32.400 dump trials, data mining studies where they're grossly overpowered. Okay. I kind of got way off on
01:06:38.160 a tangent there. I don't know why I went down that path of power, but I know it's important.
01:06:43.100 So I think we got on the subject because we were looking at things you look for in an
01:06:47.380 experimental study that increase or decrease your confidence in it. And I think it's something
01:06:50.820 that's, if people have this list, it's often left off. I think it's important.
01:06:55.320 Yep. Okay, good. So yeah, power matters. And when you look at a study and it's not significant,
01:07:02.700 you should ask the question, was this study powered correctly? I can't tell just by looking,
01:07:08.200 I actually have to pull out that table we just went over and go through the matrix and go, okay,
01:07:13.820 well, this is how many people were in it. Therefore at 80% power, they were powered to
01:07:19.560 detect a difference between the two groups of this much with an effect size here. And then a lot of
01:07:24.060 times I go, oh, wow, this study wasn't powered appropriately anyway. So I've learned nothing new
01:07:27.880 here, unfortunately. Is it true that you have a laminate of this in your wallet, this power table?
01:07:32.100 I don't anymore, but I used to have a laminated copy at my desk. Yes.
01:07:37.900 I made placemats out of it for the kids. They love it.
01:07:40.200 Very nice. Hours of enjoyment. Related to this, I think without looking at their power analysis,
01:07:46.980 which often, I think this is like maybe a tip is often you won't see anything in their paper,
01:07:51.180 but you might see it in the protocol. If they include that, they'll talk about how they powered
01:07:55.440 the study. What was their justification? What was the effect size that they were looking for?
01:07:58.840 And how many participants did they need? And then you can look at how many they actually got
01:08:03.140 in the trial that actually completed the trial or enrolled in the trial. And to your point of
01:08:08.820 overpowering a study, sometimes you might be able to discern it if you're looking at that example,
01:08:12.320 or I think you said there's a drug that lowers systolic blood pressure by one millimeter of
01:08:17.720 mercury, and the results are statistically significant. I think that might put up your feelers and say
01:08:22.800 how many thousands of patients were in this study. And I think that that gets to another question,
01:08:27.900 which is looking at how these differences are actually determined when you're looking at the
01:08:33.420 effect in one group versus the effect in another group. So what are some of the ways in which
01:08:37.460 researchers measure the association or the quote unquote effect size in these studies?
01:08:42.580 A lot of times it's only reported as a relative risk. You and I have harped on this in the past,
01:08:46.600 which is you can't really talk about relative risk without knowing absolute risk. And sometimes they
01:08:50.540 don't give you enough data in the paper to do that. And it's infuriating actually.
01:08:54.040 But absolute risk is, let's use the example, right? It's sort of like group one had at the end of this
01:09:01.580 study of 5% risk of dying. And the other group had a 3% risk of dying. So what's the absolute risk?
01:09:14.320 It's 5% in one group, 3% in another. So therefore we have what's called the ARR or the absolute risk
01:09:22.000 reduction is the delta between those two. So the ARR is 5% minus 3% is 2%. There was a 2% absolute
01:09:33.440 risk reduction. And that's important to know because often what's only reported is the relative risk
01:09:41.800 reduction, which is the absolute risk reduction over the non-exposure absolute risk. So in this case,
01:09:48.140 the relative risk reduction would be whatever the absolute one was, I think 2% divided by the
01:09:56.920 non-exposure risk, which is 5%. So that's 40%. So they had a 40% relative risk reduction going from 5%
01:10:05.360 to 3%. Both of those things are important, but again, it's really critical that you know both.
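The arithmetic of the 5% versus 3% example can be written out as a minimal sketch (mine, not from the episode):

```python
# The 5% vs 3% mortality example: absolute vs relative risk reduction.
risk_control = 0.05    # 5% risk of dying in the control group
risk_treatment = 0.03  # 3% risk of dying in the treatment group

arr = risk_control - risk_treatment  # absolute risk reduction
rrr = arr / risk_control             # relative risk reduction

print(f"ARR: {arr:.1%}")  # 2.0%
print(f"RRR: {rrr:.0%}")  # 40%
```

The same two lines of math show why a headline "40% reduction" can describe an absolute difference of only two percentage points.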
01:10:13.740 One of my favorite examples of this, of course, is the famous Women's Health Initiative, which was
01:10:19.940 looking at the increase in the risk of breast cancer for the women who were receiving the estrogen and
01:10:28.580 synthetic progesterone treatment. Now, notwithstanding the fact that, and we've talked about this a hundred
01:10:33.980 times why I don't think that that study was a good study in any way, shape, or form, and I don't think
01:10:38.640 that the study demonstrated there was any difference in risk. Statistically, here's what got reported.
01:10:44.220 It got reported that the women receiving the hormone replacement therapy had a 25% increase
01:10:52.660 in breast cancer. And that was true at a relative risk level. But the absolute risk difference was
01:11:00.840 a difference of 5 women per 1,000 to 4 women per 1,000. So as you went from 4 cases of breast cancer
01:11:08.680 per 1,000 women to 5 cases of breast cancer per 1,000 women, that is indeed an increase of 25%. 5 minus 4
01:11:15.600 is 1 divided by 4 is 0.25. But what's the absolute risk reduction, or in this case, risk increase?
01:11:22.060 It's 1 over 1,000, or 0.1%. So what I usually say to women when we're talking about hormone
01:11:33.340 replacement therapy is you can kind of use that as your ceiling for the true risk increase of this
01:11:39.380 therapy, even if you discount the 12 mistakes in that study that make it hard to believe that that
01:11:45.880 effect size would hold. So another way that we tend to measure effect size or association is using
01:11:51.500 something called a hazard ratio. A hazard ratio actually involves some really complicated math
01:11:56.740 that we're not going to get into. Something called a Cox proportional hazard, which I'm embarrassed to
01:12:01.880 say, I don't actually know the math anymore. There was a day when I did, and I remember it was not easy
01:12:07.160 for me to learn. I had to go out and buy a bunch of books on statistics because even though my
01:12:11.360 background's in math, I did not have a huge background in stats. It wasn't like rocket science, but I
01:12:16.020 remember really having to understand the mathematics behind the Cox proportional hazard.
01:12:21.500 The magic of the hazard ratio is it is temporal. So it captures the risk of something, i.e. the hazard
01:12:30.100 over time. And that differentiates it from something called an odds ratio, which can't do that, which
01:12:35.320 only can measure over the entire period of time, what is the risk? So at the risk of oversimplifying
01:12:41.780 this a little bit, let's talk about the hazard ratio over a given period of time, but acknowledging
01:12:47.500 that its real magic is its ability to tell you what's happening at any point in time. Let's just
01:12:54.620 pretend we're talking about a cancer drug trial and the hazard rates, i.e. the rates of disease
01:13:00.380 progression were 20% in one group and 30% in another group. So the people getting the drug progressed 20%
01:13:09.420 of the time, the people getting the placebo progressed 30% of the time. So the hazard ratio
01:13:16.380 is the ratio of 0.2 to 0.3, which is 0.667. So in other words, the treatment group was 67% as likely to
01:13:26.380 experience disease progression as the control group. You could flip the math and say, well, what if you saw
01:13:31.900 the exact same rates, but in something that was desirable? So then it would be the 0.3 over the
01:13:38.260 0.2 would be 1.5. So your hazard ratio would be 1.5, which means there's a 50% increase in the
01:13:47.020 benefit or the harm if it's something that's harmful. So again, hazard ratios are, I think,
01:13:53.360 ubiquitous in clinical trials. You'll see them everywhere. And the thing you just have to know
01:13:57.740 is how to do the math on it. So Bob, I'll quiz you and you tell me and the listener how you're
01:14:02.080 figuring this out. The hazard ratio is 0.82. 0.82. You're comparing the experiment to the control
01:14:12.000 or the experimental group to the control. The experimental group is, I would probably flip it.
01:14:17.900 I would say they have an 18% reduced risk of whatever the event is you're talking about of progression.
01:14:23.540 All right. All right. So give me one, Bob. Nice. How about we'll go the other way,
01:14:29.020 the other side of one. Hazard ratio of 2.2. If we said 1.8, it would be an 80% increase.
01:14:37.500 2.2 would be 120% increase. How are you doing that? You're taking 2.2, you're subtracting one
01:14:44.860 and you get 1.2 and you multiply by 100%. And what you did earlier was when I gave you 0.82,
01:14:52.580 you took 0.82 minus 1 and you got negative 0.18, which is a reduction of 18%. So again,
01:15:02.280 you can just play with these for like five minutes. It's actually not that complicated,
01:15:06.280 but you just have to do a bunch of them and become familiar with what those numbers mean.
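The mental math from that quiz reduces to a one-liner. A hypothetical helper (mine, not from the episode) makes the convention explicit:

```python
def hazard_ratio_to_pct(hr):
    """Convert a hazard ratio into a percent increase (positive) or
    decrease (negative) relative to the comparison group: (HR - 1) * 100."""
    return (hr - 1) * 100

print(round(hazard_ratio_to_pct(0.82), 1))   # -18.0  -> 18% reduced risk
print(round(hazard_ratio_to_pct(2.2), 1))    # 120.0  -> 120% increased risk
print(round(hazard_ratio_to_pct(0.667), 1))  # -33.3  -> about a third lower
```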
01:15:10.160 Now let's bring it back to the ARR thing. There's another common theme you'll hear about in trials
01:15:18.060 called the number needed to treat or the NNT analysis. And this gets back to the importance
01:15:25.740 of absolute risk reduction. Let's say there's an example of, let's use the same numbers we used
01:15:32.720 earlier. They're familiar to me, but you've got a drug that the people who take it have four heart
01:15:40.360 attacks per thousand people over a five-year period. And then the placebo, they have five events over
01:15:48.240 that same period of time per thousand people. The drug reduces the events from five out of a thousand
01:15:55.340 to four out of a thousand. So what's the relative risk reduction there? The relative risk reduction
01:16:02.820 is 20%. Five minus four, divided by five, in this case is a 20% relative risk reduction. So you might
01:16:09.420 say, this is something we should be putting in the drinking water. This is such an important thing,
01:16:15.920 but you want to calculate how many people do you need to treat to prevent the event? And to do that,
01:16:23.320 you have to take one and divide it by the absolute risk reduction, not the relative risk reduction.
01:16:28.100 And the absolute risk reduction here is 0.1%. And one divided by 0.1% is 1,000. So now you have to
01:16:37.260 treat a thousand people to achieve the effect, which means you better figure out what the side
01:16:44.380 effects are of that thing, what the cost of that thing is, and what the complexity of it is, to justify it.
01:16:48.480 There may be certain things for which an NNT of a thousand is valuable, but you wouldn't say that
01:16:54.380 across the board. Conversely, if you have a drug that reduces the risk of death from
01:16:57.960 4% to 2%, or say 4% to 3%, then you would say 4 minus 3 is 1%. 1 divided by 1% is 100. If it took it
01:17:10.680 from 4 to 2%, it would be 1 divided by 2% is 50. If it went from 4% to 1%, a reduction of death from 4%
01:17:20.320 to 1%, your difference is 3%, you're now talking about an NNT of 33. As a general rule, we love to see
01:17:29.320 drugs in that sub 100 range of NNT. We tend to not get that impressed when the NNT of something is like
01:17:37.720 1,000. So again, that's another way to think about the effect size.
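The NNT arithmetic from the examples above, as a sketch (mine, not from the episode):

```python
def nnt(risk_control, risk_treatment):
    """Number needed to treat: 1 divided by the absolute risk reduction."""
    return 1 / (risk_control - risk_treatment)

# 5 vs 4 heart attacks per 1,000 people: a 20% relative risk reduction,
# yet ~1,000 people must be treated to prevent a single event.
print(round(nnt(0.005, 0.004)))  # 1000
print(round(nnt(0.04, 0.03)))    # 100
print(round(nnt(0.04, 0.01)))    # 33
```

Note that the denominator is the absolute risk reduction; plugging in the relative risk reduction instead is exactly the mistake the episode warns against.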
01:17:43.140 Okay. I'd like the number needed to treat from a clinician's perspective or from a practical
01:17:47.260 perspective. It's really telling, embedded in there, obviously, is the absolute risk and not
01:17:51.660 just relative risk. So we went over P-values and confidence intervals a little bit. I don't think
01:17:57.500 we went over confidence intervals as much as P-values. Do you want to stop and talk about confidence
01:18:01.360 intervals? Sure. Why don't you take this one? I need a drink.
01:18:05.400 Okay. By the way, non-alcoholic, just for those listening.
01:18:08.680 Drinking my... What are you drinking?
01:18:10.420 Drinking my Ghia with Topo Chico.
01:18:12.940 So confidence intervals are technically intervals in which the population statistic could lie.
01:18:18.320 Typically, I think what you see on a paper is this 95% CI. It's usually abbreviated, but it's a 95%
01:18:24.380 confidence interval. And it's usually reported next to the hazard ratio that we just talked about.
01:18:29.240 Say the hazard ratio is 0.5, which means a halving of the risk, say in the experimental group versus
01:18:35.560 the control group. And then you'll see this 95% confidence interval. And it might say,
01:18:40.320 they'll give you these two numbers. For example, let's just say it's 0.4 to 1.2 is your confidence
01:18:47.300 interval. And what that is, is that's the flip side of the significance level, which is one minus
01:18:54.860 alpha. So we've talked about alpha being the P value, but also being the false positive rate.
01:18:59.840 So it's the flip side. So when you see 0.05 for your P value, that's a tip off that your confidence
01:19:04.680 interval is 95%. And I think a lot of people think about the word confidence in this definition,
01:19:10.620 and they take it to mean the probability that a specific confidence interval... So my example,
01:19:16.000 they go 0.4 to say 1.2. That interval between those two numbers or between those two ratios
01:19:21.820 contains the population parameter. They think, okay, we could be 95% confident that the true
01:19:27.740 effect, say meat consumption and cancer, is between these two numbers. But that's not really
01:19:34.040 what the confidence interval suggests. It's more of a suggestion. I don't think this often happens
01:19:38.240 in practice, but if you were to take 100 different samples and compute this confidence interval,
01:19:43.700 then approximately 95 out of those 100 will contain the true mean value.
01:19:48.080 So it's been described by some as an uncertainty interval rather than a confidence interval.
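Bob's "95 out of 100 intervals" description is easy to check by simulation. This is my own sketch (assuming normally distributed data and using the simple z-based interval), not something from the episode:

```python
import random
from statistics import NormalDist, mean, stdev

# Repeatedly sample from a population with a known true mean, build a
# 95% confidence interval each time, and count how often the interval
# actually contains the true mean. It should be close to 95 out of 100.
random.seed(7)
true_mean = 10.0
z = NormalDist().inv_cdf(0.975)  # ~1.96 for a 95% interval

runs = 1000
covered = 0
for _ in range(runs):
    sample = [random.gauss(true_mean, 2.0) for _ in range(100)]
    m = mean(sample)
    se = stdev(sample) / len(sample) ** 0.5  # standard error of the mean
    if m - z * se <= true_mean <= m + z * se:
        covered += 1

print(f"Coverage: {covered / runs:.1%}")  # close to 95%
```

The "confidence" is a property of the procedure over many repetitions, not a probability statement about any single interval, which is why some prefer "uncertainty interval."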
01:19:53.600 So there's another, quick and dirty way to do this, which is just
01:19:56.880 to look at the confidence interval and ask if the interval contains one or not. You gave an example
01:20:01.800 a second ago, Bob, you said your hazard ratio was what?
01:20:04.580 Hazard ratio was 0.5 with a confidence interval of 0.4 to 1.2.
01:20:11.420 Okay. So that would not be significant. So even though your hazard ratio, you might look at that and say,
01:20:16.260 oh, look, that's a big reduction. 0.5 hazard ratio means a 50% reduction. But your confidence
01:20:23.880 interval was very wide. It was all the way from 0.4 up to one point something. So it crosses over
01:20:30.300 unity. Conversely, if you had a hazard ratio of 0.5, but your confidence interval was 0.4 to 0.6
01:20:39.840 or 0.7 or even up to 0.9, you would say, indeed, that is significant at the 95% confidence level.
01:20:49.060 So the other thing you'll notice, by the way, is the closer one edge of the confidence interval
01:20:55.060 comes to one, the closer the p-value is to 0.05. When you have like a confidence interval
01:21:01.400 that runs from 1.01 up to two, your p-value is probably about 0.049 or something like that.
01:21:11.780 Whereas when you have confidence intervals that are miles away from one, the p-values tend to be
01:21:17.040 very small. Yeah. That statistician, Andrew Gelman, he talked about uncertainty intervals. And the reason
01:21:23.840 why he says that is, imagine you've got what we would call a huge confidence interval. So a big
01:21:28.360 confidence interval, meaning instead of 0.4 to 1.2, it ran from 0.4, a 40% reduction, all
01:21:34.720 the way out to, say, a thousand. He would say that's a huge uncertainty interval. But
01:21:39.500 the way that we think about confidence is that's a huge confidence interval. And it's maybe intuitively
01:21:43.320 backwards for some people to think about it that way. Yep. Absolutely agree. The tighter the interval,
01:21:49.380 the more confidence you actually have in it. And obviously it can't cross one.
01:21:53.540 The less uncertainty.
01:21:54.100 That's right. The less uncertainty there is. When you get these monster ones,
01:21:57.840 and this is why I like those sort of tornado graphs that you see in meta-analyses where
01:22:03.200 you visually get to see how much uncertainty existed in a given study. Now the confidence
01:22:09.100 interval, here's a great example. Hazard ratio was 1.4. Oh wow. 40% increase. The 95% confidence
01:22:17.940 interval went from 1.1 to 17. Do I really have a lot of confidence in that? No, that's an enormous
01:22:26.380 uncertainty interval. Yeah. And of course you would want to know, okay, what are we talking
01:22:30.580 about here? Absolutely. And not just relatively. Yeah. Yeah. Yeah. I mean, look, I think the takeaway
01:22:35.320 of this entire section is if you make the decision that you want to pay attention to science, you just
01:22:44.080 have to roll up your sleeves and accept the fact you're not going to be able to read these things in
01:22:47.720 the bathtub on a lazy Sunday morning. You kind of have to roll up your sleeves and pay attention to
01:22:53.380 all of this little stuff. Now it gets easier the more you do it. When I read a paper today,
01:22:58.020 it's so much easier than it was 25 years ago, but you still have to kind of have your guard up for all
01:23:03.820 of these things. You might learn something new from virtually every paper that you read. So you
01:23:08.280 didn't read a paper 25 years ago and then you read your second paper today. You've read a multitude of
01:23:13.240 papers. And in each one, there's probably something educational in there that you might pick up. And
01:23:17.100 that probably goes to at the beginning of the episode, you're talking about, I think, Tim asking,
01:23:21.220 how do I get better at this? And it's probably like this consistent repetition, read a paper,
01:23:25.300 you know, your favorite paper every week. And you can see that some of the stuff we've talked about
01:23:28.380 here, I mean, we just went deep into some statistics and not even that deep, right? I mean,
01:23:31.820 we didn't really explain what the Cox proportional hazards model is and things like that. And we didn't
01:23:35.600 differentiate odds ratios with hazard ratios, which requires getting into more math.
01:23:39.080 Look, you still have to be able to kind of crunch some numbers sometimes. And it's unfortunate that
01:23:43.500 I think a lot of people in the media don't know how to do this. And yet they're the ones that are
01:23:46.880 reporting on these things. So if you're getting your science info from Twitter and from the news,
01:23:53.680 there's a little bit of a buyer beware. You have to understand the fact that it's very likely that
01:23:58.700 the people that are reporting these things, not because they're necessarily not well-intentioned,
01:24:03.320 but they themselves might not be doing the type of analysis that's necessary.
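The confidence-interval logic discussed above (the tighter the interval, the more confidence; a huge interval tells you almost nothing) can be sketched in a few lines of Python. This is a minimal illustration, not any study's actual analysis; the standard errors below are made-up numbers chosen to contrast a tight interval with a wide one:

```python
import math

def hazard_ratio_ci(hr, se_log_hr, z=1.96):
    """95% confidence interval for a hazard ratio.

    Survival models estimate log(HR) with an approximately normal
    standard error, so the interval is built on the log scale and
    exponentiated back to the hazard-ratio scale.
    """
    log_hr = math.log(hr)
    return (math.exp(log_hr - z * se_log_hr),
            math.exp(log_hr + z * se_log_hr))

# Same point estimate (HR 1.4), very different uncertainty:
lo, hi = hazard_ratio_ci(1.4, 0.05)   # small standard error
print(f"tight interval: {lo:.2f} to {hi:.2f}")   # roughly 1.27 to 1.54
lo, hi = hazard_ratio_ci(1.4, 0.65)   # huge standard error
print(f"wide interval:  {lo:.2f} to {hi:.2f}")   # roughly 0.39 to 5.0, crosses 1
```

The asymmetry of the wide interval (0.39 to 5.0 around 1.4) is exactly why a "40% increase" with a monster interval deserves very little confidence.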
01:24:06.160 So another question we got is, do studies ever stop midway through? If so, what are the reasons?
01:24:12.840 Yes, they do. There are generally three reasons that studies are stopped. And again,
01:24:18.620 we're really talking about prospective clinical trials here. So the first and most important of
01:24:22.760 these is safety. So remember we talked about phase one, phase two, phase three. Well, phase one is all
01:24:28.100 about safety. Phase two is about efficacy and safety. Phase three is really about effectiveness and
01:24:33.200 safety. But notice safety is in all of those. So absolutely anytime there's a safety breach,
01:24:40.920 which means there is a statistically significant difference between an important safety metric
01:24:47.200 between the groups, that'll just stop the study. The second thing that will stop a study is benefit.
01:24:52.780 Again, the PREDIMED example: when it was first done, it was stopped two-thirds of the way through
01:24:58.580 because it was deemed that there was such a benefit to the group on the Mediterranean diet relative to
01:25:04.760 the low fat diet that it would have been unethical to let those people in the low fat diet continue for
01:25:10.020 another two and a half years on a diet that was so clearly increasing their risk of mortality.
01:25:16.000 And then the final thing that will stop a study prematurely is futility. This is a little bit harder
01:25:20.220 to understand, but it actually comes down to that hazard ratio concept, which is able to measure risk
01:25:26.560 temporally in an aggregate fashion. So if two thirds of the way through a study, there's no benefit
01:25:33.480 and statistically, you know that nothing that's going to happen in the remainder of the study is
01:25:39.040 going to change that, you stop the study. It's futile to continue the study. Those are basically your big
01:25:45.300 three reasons why a study is going to be stopped. So Peter, I think a good example of stopping a trial
01:25:51.780 for safety. There have actually been several, but one was a CETP inhibitor, torcetrapib. I don't
01:25:57.740 know if I pronounced that correctly. I remember this really well. This is one of the few moments in
01:26:01.860 science where I remember where I was standing when the result was announced. It was Q4 of 2006. I was at
01:26:08.720 McKinsey at the time and I was walking up Kearny Street towards California Street. I heard the news of this and
01:26:18.000 I couldn't believe I was so sure this was going to be a home run study. Yeah. So in this case,
01:26:24.960 the trial was set up with about 7,500 patients in each group. They're on a CETP inhibitor and they're
01:26:30.400 all on statins. So they're on Lipitor in particular. And so they compared the CETP inhibitor to just
01:26:36.460 Lipitor alone, which serves as a control group. It was actually the CETP inhibitor plus Lipitor versus
01:26:41.500 Lipitor alone. And this was a Pfizer study, which everybody thought was very cheeky of Pfizer because
01:26:47.960 Lipitor was about to come off patent. Their way of sort of extending the life of it was saying,
01:26:53.340 hey, when you pair the CETP inhibitor with Lipitor, it's going to have a benefit because
01:26:56.940 the background on this is that CETP inhibitors raise HDL cholesterol. So it's like, we're going
01:27:02.320 to take a drug Lipitor that lowers LDL cholesterol. We're going to pair it with a drug that raises HDL
01:27:07.100 cholesterol. How could this possibly go wrong? Famous last words.
01:27:11.440 They intended to follow patients for almost five years, four and a half years.
01:27:18.300 And along the way, they'll have a review board that's looking at the results of the study. And
01:27:22.060 in this case, they had a monitoring board that was looking at death,
01:27:28.680 all-cause mortality. And they found that 82 patients receiving the drug combination had died compared
01:27:34.700 with only 51 on Lipitor alone. And so they advised Pfizer to halt the trial at that point,
01:27:39.980 which it did immediately, just a little over a year into the trial.
01:27:44.600 And in a way, it gets back to when you're talking about power analysis, the way that they do this is
01:27:48.660 they have pre-specified p-values where they're kind of sneaking looks at the data. Like we talked
01:27:54.740 about multiple hypothesis testing that they're actually taking a few shots on goal in a way,
01:27:58.980 because after 12 months, they're going to actually compare the two groups,
01:28:01.660 see if this thing is a p-value of less than 0.01, depending on what they're looking at.
01:28:06.600 And in this case, they had this pre-specified p-value of less than 0.01 based on a test for
01:28:12.860 death from any cause. And they found that. And actually the paper, there was still a published
01:28:18.920 paper, even though the trial only went for 12 months in the New England Journal of Medicine,
01:28:22.540 and they report those endpoints where the study was stopped.
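The interim comparison just described can be approximated with a standard two-proportion z-test. This is a back-of-the-envelope reconstruction using the mortality counts mentioned (82 vs. 51 deaths, roughly 7,500 patients per arm); it is not the monitoring board's actual statistical method:

```python
import math

def two_proportion_z(events_a, n_a, events_b, n_b):
    """Two-sided z-test for a difference in event proportions,
    using the pooled-variance normal approximation."""
    p_a, p_b = events_a / n_a, events_b / n_b
    p_pool = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# 82 deaths on combination therapy vs. 51 on Lipitor alone
z, p = two_proportion_z(82, 7500, 51, 7500)
print(f"z = {z:.2f}, p = {p:.4f}")  # p lands below the 0.01 stopping boundary
```

Even with this crude approximation, the p-value falls under the pre-specified 0.01 threshold, which is consistent with the decision to halt the trial.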
01:28:27.040 And that's a whole other discussion about why did that happen? Because
01:28:30.780 other CETP inhibitors would go on to face the same fate. So the first thought was, well,
01:28:35.080 it was this particular drug, but it turned out that CETP inhibitors in general are not a good
01:28:40.100 thing. At best, they do nothing. And at worst, they kill people and probably had to do with the fact
01:28:45.780 that they're altering HDL function. But anyway, that's another discussion. In fact, I feel like
01:28:50.520 Tom Dayspring and I, or Ron Kraus and I talked about this at length on one of our podcasts. I think it
01:28:54.720 was Tom and I-
01:28:54.980 I think he did.
01:28:55.600 Yeah, that talked about this. It's super interesting.
01:28:57.300 So I remember you telling that story. It's just a quick follow-up question. I guess, A,
01:29:01.860 was this the first CETP inhibitor that was tested? Okay. And then you had the follow-up one. I don't
01:29:07.720 think we've really addressed this, but why would you do an observational study over a randomized
01:29:12.060 controlled trial? In some cases, a good example is try getting a randomized controlled
01:29:17.920 trial past the IRB where you give people a carton of Marlboros to smoke each day compared
01:29:22.780 to a placebo group. It's unethical. And so with this, I don't know how many times they
01:29:27.040 saw these adverse events with each CETP inhibitor, and it's, I guess, the same drug class that
01:29:31.360 might have different mechanisms. But does there become a point where maybe that's why they
01:29:35.100 have the phase one, phase two, phase three, and they get past those barriers? But in a
01:29:39.520 way, do you almost assume it might be unethical to run another CETP inhibitor trial if you're
01:29:43.720 seeing differences in death the last however many times?
01:29:47.660 Well, I mean, I don't think it's unethical because I think they are basically saying, look,
01:29:50.240 it's a different drug. You can change one molecule on a drug, and it completely changes
01:29:55.860 the way it works. Look at COX-2 inhibitors. You look at Celebrex versus Vioxx. I mean,
01:30:00.640 notwithstanding my views on that, which I talk about with Eric Topol on our podcast, but
01:30:04.600 basically two drugs nearly identical, and one was far more efficacious, but also had side effects in
01:30:12.980 a subset of people with hypertension. So I think the real question is, at what point do pharma
01:30:17.400 companies say enough is enough? And I lost track. I feel like there were three CETP inhibitors that
01:30:23.560 were brought to phase three. And ultimately, there was a Mendelian randomization that looked
01:30:28.000 at CETP mutations and really found that this was not going to be a good strategy. That's a good
01:30:33.520 example that you brought up with respect to safety. And then we talked about benefit, which
01:30:38.920 was PREDIMED. And then what happened in the Look AHEAD trial? Because I think Look AHEAD was one that got
01:30:43.540 stopped for futility, right? That was one where they randomly assigned about 5,000 overweight or
01:30:48.980 obese patients with type 2 diabetes, and it was an intensive lifestyle intervention. That was the
01:30:53.840 intervention group. And then you had diabetes support and education in the control group.
01:30:58.300 Their primary outcome, what they were looking at is what's called MACE. So death from cardiovascular
01:31:02.320 causes, major adverse cardiovascular events. And I think it was going to be a 13-and-a-half-year-
01:31:09.560 long trial, almost 14 years. And in this case, the trial was stopped just under 10 years. And it was
01:31:15.760 based off of what's called a futility analysis that you explained. Yeah, which basically means no
01:31:22.040 matter what happens from this point on, this study will not be significant. So at the time that it was
01:31:27.180 stopped, the hazard ratio was 0.95. So there was a suggestion of a 5% reduction in the risk of death from
01:31:36.220 cardiovascular events. So in the right direction, but the 95% confidence interval or uncertainty
01:31:42.640 interval, if we're going to adopt that terminology, was 0.83 to 1.09. So it crossed one. And so, you know,
01:31:50.280 the P value is going to be greater than 0.05. In fact, the P value was 0.51 or something like that. I
01:31:55.560 mean, it was basically complete chance. There was absolutely no effect. And again, no point in continuing.
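The relationship between that interval and the p-value can be made concrete. This sketch backs an approximate two-sided p-value out of the reported hazard ratio and 95% CI, assuming normality on the log scale; because the inputs are rounded, it lands near, not exactly at, the reported value:

```python
import math

def p_from_hr_ci(hr, lo, hi, z_crit=1.96):
    """Approximate two-sided p-value recovered from a hazard ratio
    and its 95% CI, assuming normality on the log scale. Any CI
    that crosses 1 yields p > 0.05 by construction."""
    se = (math.log(hi) - math.log(lo)) / (2 * z_crit)
    z = math.log(hr) / se
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Look AHEAD's numbers: HR 0.95, 95% CI 0.83 to 1.09
p = p_from_hr_ci(0.95, 0.83, 1.09)
print(f"p is roughly {p:.2f}")  # well above 0.05, consistent with futility
```

From the rounded interval this comes out around 0.46, in the same neighborhood as the roughly 0.5 quoted above: effectively chance, and no reason to continue the trial.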
01:32:01.680 Okay. So moving on to review process. What is the review process once a study is done
01:32:08.460 to get a paper published in a journal? Once a study is done and they've done their analysis and
01:32:13.180 they write up a manuscript, they'll submit it to a journal for publication. And then that journal
01:32:18.100 will have an editor who will look to see if the paper meets their criteria. And if they think it's
01:32:23.840 original and interesting, is this paper adding something to the body of knowledge? At that point,
01:32:29.500 the editor might just say, Hey, this is not really a good fit for our journal or for whatever reason,
01:32:34.860 this is something we're not interested in any further. You're free to go and submit this
01:32:38.420 elsewhere. But otherwise the editor is going to invite individuals that are typically part of an
01:32:44.080 editorial board to peer review the manuscript. So you hear this term all the time, right? Which is,
01:32:49.340 is this a peer reviewed publication? And that's important because not all things that get published
01:32:53.980 have been peer reviewed. And that's obviously the highest standard. So the reviewers are basically
01:32:59.860 invited, not randomly, but because they have some expertise in this area, but other things are
01:33:05.080 important, right? You have to consider the conflicts of interest. They might have to decline if they're
01:33:09.700 conflicted. That's kind of a sticky topic because there were some really obvious conflicts like
01:33:14.960 financial conflicts of interest. But I think there's a whole deeper discussion about when you have
01:33:20.200 sort of philosophical conflicts of interest with the person. And that gets into another area,
01:33:25.660 which is peer review can be blinded or not blinded, right? It can be single blinded where the reviewer
01:33:31.860 knows who the author is, but the author doesn't know who the feedback is from. That tends to be very
01:33:36.780 common. I think that's probably the most common one I've seen. It can be double blinded where the
01:33:41.660 reviewer doesn't know who it's being written by and vice versa, and they can be completely open.
01:33:46.340 But again, the most common one that I've seen is single blinded. You'll typically have
01:33:50.160 three reviewers review something and they can either accept it outright,
01:33:54.660 reject it outright, or make recommendations for revisions. I think you'll see that as probably
01:33:59.660 the most common thing where they say, we're still interested in this paper, but did you actually
01:34:05.420 consider this hypothesis? So sometimes the revisions are just repeat your analysis. Sometimes it's do
01:34:10.880 another experiment. That won't be the case in a clinical trial. I've had papers where that happened,
01:34:15.220 where I've done a series of experiments and I'd written it all up and I'd submitted and the
01:34:18.480 reviewer came back and said, well, you really should have done this experiment as well because
01:34:23.180 this would have served as another control. So you go and repeat that experiment. Of course,
01:34:26.500 when you're working in cell culture or something like that, it's not that onerous. And this process
01:34:29.800 can go on several times, but ultimately the editor makes a decision to accept that paper and publish
01:34:35.420 it or reject it again. And that's basically the process. And you're typically going to start at the
01:34:39.920 top of the food chain. So you're typically, as an author, you're going to try to get your paper
01:34:44.420 published in the most prestigious journal. I guess that's something we can talk about what
01:34:48.640 determines the prestige of a journal, but you'll sort of keep going down the pecking order until
01:34:52.780 you can get it into the right journal. And sometimes right out of the gate, you just sort of know like
01:34:57.680 this is a publication that is really mechanistic and it's really going to be geared towards
01:35:02.580 proceedings of the National Academy of Science versus something that has really got enormous clinical
01:35:08.180 implications and should go to JAMA or the New England Journal of Medicine. There's a little bit of that
01:35:12.440 that's going on as well. Every study that's out there, do they end up getting published?
01:35:17.220 No, many don't. I think this is a really big problem, which is you have this thing called
01:35:22.560 publication bias. So there's a very, very famous example of this that you and I have spoken about,
01:35:27.940 which is the Minnesota Coronary Experiment. This is an example where a study was done. It ran from what,
01:35:34.440 1967 to 1973, if my memory serves me correctly. And it was looking at people who were in a residential
01:35:42.020 care facility. They had complete control over what these patients ate and they were randomized to a
01:35:47.240 diet of either normal saturated fat consumption or very low saturated fat consumption where the
01:35:52.860 saturated fat was substituted with polyunsaturated fats. And at the end of this seven year study,
01:35:58.940 of course, the hypothesis being that the group that had saturated fat replaced with polyunsaturated fat
01:36:04.000 would have lower cholesterol levels and lower cardiovascular death rate.
01:36:08.020 And at the end of, in 1973, when the study concluded, they found that indeed the subjects
01:36:13.340 who were given high amounts of polyunsaturated fats and low saturated fats did in fact have
01:36:18.380 lower cholesterol levels, but their rates of cardiovascular deaths were significantly greater
01:36:22.320 and they didn't publish the study. That study would remain unpublished until 1989,
01:36:27.200 some 16 years later. When asked why a 16 year delay in publishing that study, the lead author,
01:36:33.920 who's, I don't even remember who it was.
01:36:35.520 Ivan Frantz.
01:36:36.300 Yeah. Frantz. That's right.
01:36:38.080 There's a senior and a junior. Yeah.
01:36:39.740 Yeah. Yeah. He said the study didn't turn out the way we wanted it to. That's kind of an egregious
01:36:44.100 example of publication bias in this case, a negative study. But I think there are a lot
01:36:50.640 of studies that don't get published, even if they're negative. And that's a shame because
01:36:55.100 when something doesn't work, it is just as important as when it does work. It is unfortunate
01:37:00.900 that not all studies get published because again, just think about it this way. If you want
01:37:05.580 to go out and do an experiment and 10 people have done that experiment before you and it's
01:37:11.060 always failed, wouldn't it be great to know that? Would that impact your decision on whether
01:37:16.460 or not you want to do the experiment a certain way or would you want to try something a little
01:37:19.860 bit different? So you can see very quickly this becomes problematic when papers don't get
01:37:24.320 published.
01:37:25.360 Okay. You've got this massive problem, publication bias. Do you know of any ways that can combat this?
01:37:30.940 I think there are a lot of people working on this problem. And I think one of the important
01:37:36.060 steps is pre-registration, which we talked about at the outset, right? Which is you force investigators
01:37:41.520 to pre-register their experiments on clinicaltrials.gov. That's not just, here's my experiment. It's here are
01:37:50.180 my statistical methods. Here is my number of subjects. Here's my primary outcome. Here are my secondary
01:37:56.140 outcomes, et cetera. And that basically makes it a lot harder to say, I'm not going to publish this
01:38:01.860 when it comes out if it doesn't turn out the way I wanted it to.
01:38:05.020 I don't know if there are particular journals that participate in this. I imagine that they could,
01:38:08.740 they could make it a prerequisite. Your trial must be pre-registered in order to be published in our
01:38:13.340 journal. And if it's a journal worth publishing in, it's probably not a bad idea.
01:38:17.420 Correct. There's both requirements of journals and there's also requirements of funding entities,
01:38:20.940 which say, we won't fund you unless the study is pre-registered. Registered reports is a publishing
01:38:26.520 format developed by an organization called the Center for Open Science. I think that's the one founded by
01:38:31.180 Brian Nosek. Is that his name? I think that's right. Yeah.
01:38:34.640 Yeah. Brian would be a great guy to have on the podcast actually at some point.
01:38:38.700 So with registered reports, basically you submit your protocol, almost like the pre-registration.
01:38:44.480 You submit that. And at that point, instead of after all the data is collected, it's peer reviewed.
01:38:51.320 And if it's peer reviewed and accepted based on you've got a high quality protocol, everything
01:38:55.600 looks good, then it's provisionally accepted for publication. Like you said, like if it's a
01:38:59.880 negative result and maybe the journal is not going to publish it, they're basically making the
01:39:03.580 decision if this is a positive or a negative trial, whatever, however it turns out, your protocol,
01:39:07.960 your plan looks really good. And so we're going to accept it. We're going to basically accept
01:39:12.700 it provisionally provided that you don't start cutting corners and go away from this plan that
01:39:16.920 we accepted. So you follow your plan and it's like, however the cards fall, it's already been
01:39:21.240 accepted for publication. That's a pretty novel concept actually. But again, I think it's all in
01:39:26.560 the spirit of how do we make sure that we get rid of publication bias, positive result bias. Again,
01:39:33.480 going back to what we said a second ago, you're far more likely to see something get published if it
01:39:38.340 is a positive finding than if it's a negative finding, although negative findings
01:39:41.700 are just as important for the establishment of knowledge, right? Let's use the CETP inhibitor.
01:39:47.240 Imagine no one had ever published the studies demonstrating that CETP inhibitors were
01:39:52.400 at best neutral, at worst harmful. Studies of that magnitude can't escape publication,
01:39:59.060 but think of all the bench research that can be going on or the small early phase one trials that
01:40:03.640 can be going on or the preclinical stuff that's going on. It's very easy to kind of
01:40:08.040 under-report things that are negative. Yeah. One other thing about the registered
01:40:12.380 reports I was just thinking about gets to your power analysis. As an example, let's say that
01:40:17.280 your study is underpowered. It would be great to have a group of your peers, say, tear your study
01:40:22.260 apart, but see if there's anything wrong with it. And they say, like, your study is powered to detect
01:40:25.860 like a 70% difference in all-cause mortality. They might be pointing something out, basically
01:40:31.800 saying, your study is dead on arrival if you actually run it this way.
01:40:35.660 Right. Because there's no way you're going to see an effect size greater than 30%, and yet
01:40:39.780 you're only powered to detect it if it's 70%, which is crazy. So either change the experimental design,
01:40:46.960 figure out a way to raise more money to do this study correctly. That's a valuable tool.
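The underpowered-design point can be made concrete with a standard two-proportion sample-size formula. The 10% control event rate below is an arbitrary illustrative number, not from any study discussed here:

```python
import math

def required_n_per_arm(p_control, relative_reduction):
    """Approximate subjects needed per arm for a two-proportion
    trial (normal approximation, alpha 0.05 two-sided, 80% power).
    relative_reduction is the effect size the study is powered to
    detect, e.g. 0.30 for a 30% reduction in the event rate."""
    z_alpha, z_beta = 1.96, 0.84
    p_treat = p_control * (1 - relative_reduction)
    p_bar = (p_control + p_treat) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p_control * (1 - p_control)
                                      + p_treat * (1 - p_treat))) ** 2
    return math.ceil(numerator / (p_control - p_treat) ** 2)

# With a 10% control event rate (illustrative):
print(required_n_per_arm(0.10, 0.70))  # powered only for a 70% reduction
print(required_n_per_arm(0.10, 0.30))  # a realistic 30% needs roughly 7x more subjects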
01:40:50.800 So we touched on this. I think you talked about more reputable journals. That's one of the questions
01:40:55.660 is I think people know that certain journals are more respected than others, but is there a reason
01:41:00.060 why in particular something's respected over another journal? Yeah. So there's something called
01:41:05.580 an impact factor, and it's usually something that changes each year, meaning it's usually evaluated on
01:41:11.360 a per year basis. So it's the ratio between the number of citations, total citations, citing something
01:41:20.000 is referencing that paper. So if you're writing a paper, you would say, well, Kaplan wrote such and
01:41:24.880 such, and you cite that paper. It's a reference. Great paper.
01:41:28.020 Great paper by Kaplan. It's the ratio between the total number of citations that come to all
01:41:33.880 articles published by that journal by the total number of articles published by that journal over
01:41:38.300 a previous period of time. So it's typically done over a year. So you would say 27,000 citations
01:41:43.280 came in to articles published by that journal, out of 10,000 articles published. 27,000 divided by
01:41:52.700 10,000 would be 2.7. The impact factor would be 2.7. To put this in context, there's 13,000 journals
01:42:01.440 out there. 98% of them have an impact factor less than 10. 95% of them have an impact factor less than
01:42:10.460 5. And about half of them have an impact factor less than 2. Just to give you kind of a sense of
01:42:16.100 what impact factor looks like. And by the way, the tail on that is very asymmetric. So
01:42:20.780 the number of journals that have an impact factor of 0.4, 0.7, 0.8, I mean, is incredibly high.
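The arithmetic itself is trivial; a quick sketch with the round numbers from the worked example above (the New England Journal of Medicine article count is back-solved from the stated citations and impact factor, so it is only approximate):

```python
def impact_factor(citations, articles):
    """Impact factor: citations received in a year to a journal's
    articles from the preceding window, divided by the number of
    those articles."""
    return citations / articles

# The worked example above: 27,000 citations to 10,000 articles
print(impact_factor(27_000, 10_000))  # 2.7

# NEJM's 2019 ratio, with an article count back-solved from 347,000 / 74.699
print(round(impact_factor(347_000, 4_645), 1))  # roughly 74.7
```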
01:42:27.840 If you look at the distribution of this, there's obviously a very long tail on the small end of
01:42:30.940 this. I've got a table here that I can pull up. Yeah, yeah. Let's take a look at that because I
01:42:35.360 think it's pretty cool to look at this actually. So if you look at this table, you've highlighted
01:42:39.580 the journals that have more than 100,000 citations. What year is this? 2019. So you've got the New
01:42:47.360 England Journal of Medicine, which is kind of staggering, right? Nearly 350,000 citations.
01:42:54.040 And we could do the math to tell how many articles were published, because dividing
01:42:58.480 347,000 by the article count gives 74.699. So that's the impact factor for the New England Journal of
01:43:04.700 Medicine. The Lancet, 250,000 citations, impact factor 60. So you can sort of see like these are
01:43:11.460 the whatever top 28 journals by impact factor. There's kind of an outlier here, right? Which is
01:43:18.100 CA: A Cancer Journal for Clinicians, which has a staggering impact factor of 292, despite only having
01:43:26.660 40,000 citations. That's a little bit of a skew. I don't really consider that to be in the same league
01:43:35.220 because it's basically the journal with the annual cancer statistics article. And therefore it reports on tons
01:43:42.220 of cancer statistics. And therefore it doesn't really publish that much, but it gets referenced
01:43:48.500 so much because anytime someone is basically referencing a cancer statistic, they're going to
01:43:52.660 reference that. So I kind of put that in its own little category. And the same, by the way,
01:43:57.680 notice that the WHO technical report series has an impact factor of 59, but it's only cited like
01:44:02.940 3,500 times. So it's cited a lot for a very few number of publications. But again, I think the ones
01:44:09.600 that really matter here clinically, the New England Journal of Medicine, obviously Lancet, JAMA are sort of
01:44:16.020 your huge clinical ones. One more question about reading a scientific paper. So do you have a
01:44:21.460 particular process when you read a paper? Do you just print it out and start to finish,
01:44:25.620 start from the abstract and work your way through? Or do you have a particular process in general?
01:44:29.920 Yeah, kind of. I mean, I generally do read the abstract first and that gives me a sense of,
01:44:35.240 am I interested in this paper? The title of the paper is usually not sufficient for me to know if I'm
01:44:40.240 going to be interested, but the abstract usually is my go, no go on that. So I could read 10 abstracts
01:44:47.640 in a matter of minutes and decide, do I want to read three of these papers? The next decision I make
01:44:54.380 is how familiar am I with this subject matter? And if I'm not really familiar with it, I will read the
01:45:00.580 introduction section. A lot of times I am relatively familiar with the subject matter. So I'll just skip
01:45:05.680 the introduction section altogether. And I usually go straight to the methods section and that gets
01:45:11.120 into the details. So this was an exercise study. They did muscle biopsies. I just want to really
01:45:17.560 get right down to it. So how many subjects were there? How were they randomized? What were the
01:45:21.780 interventions? When were the biopsies taken? Was there a crossover? I just want to get into all that
01:45:27.260 detail. The next thing I do is I look at the results section, but start with the figures. So I kind of go
01:45:33.960 right into look at the figure and read the legend. And if the authors have done a good job, it's almost
01:45:40.120 standalone at that point. So figures and tables should, in my opinion, be standalone. So the legend
01:45:47.300 should explain everything you need to know. And of course, then reading the prose of the results
01:45:53.180 section kind of adds a little bit more color to that. And then the last thing I do is I'll read the
01:45:58.460 discussion section because I'll, by this point, have formulated my own thoughts on what the strengths
01:46:04.580 and weaknesses of the studies are, what questions remain, et cetera. Oftentimes the authors will have
01:46:09.140 thought of things that I haven't thought of, or they'll have thought of things that I disagree with,
01:46:12.400 and I'll kind of want to go through and do that. So that's my general framework for it. And you'll
01:46:18.240 notice it's quasi-linear, but not entirely linear. Yeah. I'd like your example of figures. I can't
01:46:23.700 remember, but did Steve Rosenberg, and this probably talks about the importance of mentorship as well.
01:46:28.900 Did he have advice as far as, I think probably when you're writing a paper,
01:46:32.840 like your figures and what they should represent? Yeah, that was our process. So when you finished
01:46:37.440 an experiment, the very first thing you did was you made the figures and tables, you made those
01:46:41.700 and the legends first. And that's what you would go in and present to him. And you'd present that
01:46:45.500 at journal club, or not journal club, but lab meeting rather. And you wouldn't really take pen to
01:46:50.180 paper to write anything until you had that down. You had to sort of know what are the relevant
01:46:56.660 figures? What are the relevant tables? Can I explain them very concisely in a legend?
01:47:01.560 And once you got that down, the paper kind of wrote itself. The methods are really easy to write.
01:47:06.540 The results is easy to write. And the last thing you would write would be the intro and the abstract.
01:47:11.460 That was just the way that I was taught to do it. And I found that to be very productive.
01:47:14.540 Okay. I think we've run the list of questions.
01:47:17.460 We got through them all, man.
01:47:18.900 We did. We got through it.
01:47:20.820 Thank you for listening to this week's episode of The Drive. It's extremely important to me to
01:47:25.320 provide all of this content without relying on paid ads. To do this, our work is made entirely
01:47:30.300 possible by our members. And in return, we offer exclusive member-only content and benefits above
01:47:36.720 and beyond what is available for free. So if you want to take your knowledge of this space to the next
01:47:41.260 level, it's our goal to ensure members get back much more than the price of the subscription.
01:47:46.420 Premium membership includes several benefits. First, comprehensive podcast show notes that detail
01:47:52.540 every topic, paper, person, and thing that we discuss in each episode. And the word on the street is
01:47:58.380 nobody's show notes rival ours. Second, monthly Ask Me Anything or AMA episodes. These episodes are
01:48:06.260 comprised of detailed responses to subscriber questions typically focused on a single topic
01:48:11.100 and are designed to offer a great deal of clarity and detail on topics of special interest to our
01:48:16.480 members. You'll also get access to the show notes for these episodes, of course. Third, delivery of
01:48:22.220 our premium newsletter, which is put together by our dedicated team of research analysts. This
01:48:27.280 newsletter covers a wide range of topics related to longevity and provides much more detail than our
01:48:33.240 free weekly newsletter. Fourth, access to our private podcast feed that provides you with access to
01:48:40.020 every episode, including AMA's sans the spiel you're listening to now and in your regular podcast
01:48:46.220 feed. Fifth, the qualies, an additional member only podcast we put together that serves as a highlight
01:48:53.100 reel featuring the best excerpts from previous episodes of the drive. This is a great way to catch
01:48:58.440 up on previous episodes without having to go back and listen to each one of them. And finally, other benefits
01:49:03.940 that are added along the way. If you want to learn more and access these member only benefits, you can
01:49:09.380 head over to peteratiamd.com forward slash subscribe. You can also find me on YouTube, Instagram, and
01:49:16.340 Twitter, all with the handle peteratiamd. You can also leave us a review on Apple Podcasts or whatever
01:49:22.940 podcast player you use. This podcast is for general informational purposes only and does not constitute the
01:49:29.420 practice of medicine, nursing, or other professional healthcare services, including the giving of
01:49:34.220 medical advice. No doctor patient relationship is formed. The use of this information and the materials
01:49:40.520 linked to this podcast is at the user's own risk. The content on this podcast is not intended to be a
01:49:46.600 substitute for professional medical advice, diagnosis, or treatment. Users should not disregard or delay in
01:49:52.460 obtaining medical advice for any medical condition they have, and they should seek the assistance of their
01:49:57.720 healthcare professionals for any such conditions. Finally, I take all conflicts of interest very
01:50:03.080 seriously. For all of my disclosures and the companies I invest in or advise, please visit
01:50:08.680 peteratiamd.com forward slash about where I keep an up-to-date and active list of all disclosures.
01:50:27.720 Thank you.