AI and Machine Learning in Medicine with Jonathan Chen

Good afternoon. Thank you everyone for having me here,
a real privilege, where I want to review some
concepts in artificial intelligence and machine learning in healthcare and medicine. So why are we all so interested? What is it we’re all so interested in? Why did you show up today? I’ll tell you what I think it is. It’s that we have a growing
collective anticipation over the potential of emergent
technologies that can answer questions previously thought to be impossible for
a computer. Questions like, is that a chihuahua? Wait a minute,
I think it’s a blueberry muffin.>>[LAUGH]
>>Kitten or caramel ice cream? Dog or bagel? Or do my personal favorite,
parrot or guacamole?>>[LAUGH]
>>We have algorithms now that can automatically
fill your or your family’s Facebook feeds with the perfect click-bait that
will keep you distracted for hours. All while siphoning off
your personal information, without you even realizing it. We could label every cat
video on the Internet. And yet, you might think perhaps there
could be some more meaningful applications of such technology. How about, say, is that a benign mole or
is that malignant melanoma? Is that cancer? Is that a normal, healthy lung,
or is that infected pneumonia? More and
more you’re gonna see examples like these, where algorithms are able to answer, with
superhuman levels of accuracy, clinical questions previously thought
impossible for a computer. And yet medicine, health care, systematically lags behind
other industries, by over a decade, in terms of our effective use of information
technology and real world data. Even though, I mean, shouldn’t medicine,
the one place where literally lives hang in the balance, be the place where
we should be doing this the best? Consider, for example,
by some estimates medical error is the third leading
cause of death in America. 250,000 dead, every single year. Now, some argue, give me a break,
this number’s nonsense. The real number’s like 400,000, this only
counts inpatient hospital coded deaths. Others say, hey come on,
let’s not exaggerate, all right? Look at the prior studies,
it’s more like 100,000. So, two things. Even if it were only 100,000, that is
a jumbo jet airliner full of patients we are pushing over a cliff to
their death every single day.>>[LAUGH]
>>Even more concerning, what I find, is precisely that I can’t even
tell you what the real number is. Apparently, that is how much we
care about reliably capturing and using our data in medicine. I don’t know,
maybe lives are too abstract, right? Fine, then consider the money.
Of the $3 trillion we spend on health care every year, you can see all the drama over how
to deal with health care reform. Health care is just super
expensive, particularly in the US, and we don’t know how to deal with it. Estimates are that up to a third of
that spending is essentially waste. It’s extra services, processes, administration, prices that don’t
actually help any patients. Perhaps it is no wonder, then, that some
are looking toward advancing technology, intelligent systems, some way to help us
make better use of our scarce health care resources so that we can do more good for
more people. So that’s some of the why, this is why,
I think, we’re looking for something. Most of my talk is gonna
be about the what. What is it we’re talking about? You like AI, artificial intelligence? You like machine learning? Big data? Cognitive computing? No, nobody? Precision health? What are we talking about? When I think back to my PhD in
computer science ten years ago, the things people call AI today,
it shocks me. I’m like, for reals? We get to call that AI now? I thought we were gonna
talk about HAL 9000.>>[LAUGH]
>>Skynet, David from Prometheus, right? General AI systems. Thinking machines that can
reason through complex problems, adapt to new situations, and
fluently interact with human beings. We are nowhere close to such technology,
not even close. What are we talking about, then? If you go to the much more
general definition, well, AI is just any time a computer can kind
of mimic intelligent behavior that you’d expect out of a human, okay? Well, in that case, a rule-based expert system from
the 1980s is a great example of AI. Here, I’ve got a rule for you. If you have a fever, and
you have low blood pressure, I think you might have sepsis,
a really bad infection. Perfectly fair rule,
you have a computer execute that, that’s a simple form of artificial
intelligence right there. Usually, that’s not what people
are talking about nowadays. Nowadays, when someone is selling you AI, almost always what they’re really
selling you is machine learning. Just a set of tools and
techniques, of which deep learning is a particular subset:
largely rebranded neural networks. This technology has been around since
the 1970s, but in recent years has been shown to be particularly effective at
certain tasks, like image recognition. So what distinguishes machine learning,
in general? It’s in contrast to the classical approach, where
you can imagine programming a computer from the top down, you’re putting rules
into the computer about how to behave. Instead, these are algorithms that learn
by example, they learn by data, right? So look, I’ve got 100 patients here who
have sepsis, that bad infection, and I’ve got 100 patients here who don’t. Is it fever, is it blood pressure? I don’t know, I don’t know! Computer, you just look at all these
examples, and you tell me what the rules are, you tell me what the patterns
are that make the difference, right? You break down machine learning further,
there’s several different categories and types of applications it can
provide a lot of value for. Although, again, most of the time
what you’re being sold is very narrow:
it’s a supervised learning classifier. These are not general thinking machines, these are models designed to answer very
narrow multiple choice questions, right? Is this picture a cat or is it a tumor? Is my patient gonna have
a stroke in the next year or not? Right?
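That idea of a narrow classifier learning from labeled examples, rather than from hand-written rules, can be sketched in a few lines of Python. This is a minimal sketch; the patients and numbers are invented for illustration:

```python
# A minimal sketch of supervised learning: instead of hand-coding
# "if heart rate over X, call it sepsis", learn the cutoff X that
# best separates the labeled examples.

def learn_threshold(examples):
    """Pick the heart-rate cutoff that best separates the labels."""
    best_cutoff, best_accuracy = None, -1.0
    for cutoff, _ in examples:  # try each observed value as the cutoff
        correct = sum((hr >= cutoff) == label for hr, label in examples)
        accuracy = correct / len(examples)
        if accuracy > best_accuracy:
            best_cutoff, best_accuracy = cutoff, accuracy
    return best_cutoff

# Hypothetical training data: (heart rate, has sepsis?)
patients = [(72, False), (80, False), (88, False),
            (110, True), (118, True), (125, True)]

cutoff = learn_threshold(patients)
print(cutoff)  # the boundary was learned, not hand-written
```

The rule falls out of the examples rather than being programmed top-down; that, in miniature, is the distinction being made here.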
Multiple choice question, and very typically we frame these
as prediction problems. Can you predict or diagnose which one of those multiple
choice answers is gonna be correct? So let’s think through
how we might use this. Conceptual example. How do you allocate a scarce resource, the
concept I want you to keep thinking about? A scarce resource is, say,
how about an organ transplant? So if I’m taking care of somebody, chronic
alcoholism or maybe untreated Hepatitis C, or Steve Jobs, he had cancer,
he had a tumor growing in his liver. Liver’s shutting down, this person is absolutely going to die
unless they get a liver transplant. When you’re in a town like Stanford,
our donor pool is very short. We don’t have enough people
riding motorcycles or getting gunshot wounds to the head, so
we have a very limited donor pool, right?>>[LAUGH]
>>In some ways, it’s a good thing. But what do you do, then? Okay, a donor organ does show up, a liver
does show up, I’m ready to donate it. You’ve literally got ten
people who want that organ. You’ve got ten people who need that organ,
what do you do? Who would you give it to? This is not a rhetorical question, this is a real medical question that
has to be answered all the time. Was there a thought?>>[INAUDIBLE]
>>That’s good, right? It’s probably not gonna benefit them,
right? The organ will be lost. But barring those kind of easy
screening criteria, what do you do? You might think, well, to be fair,
how about first come, first serve, right? Let’s have a waiting list,
that seems fair. In most of medicine, actually, we never
have a first come, first serve queue. Almost always, we have a priority queue. We triage people up the list, right? If you show up to the Or emergency room
with a stubbed toe, they’re gonna make you wait a while, cuz the guy who’s having
a heart attack needs to be seen first. Same idea here. Hey, I know you’ve been on the waiting
list for a year, but you’re not that sick, you could probably survive another year. This guy, I know he just showed up, but he is going to die within the next
week unless we transplant now. He’s sicker,
we got to get to him first, okay? So that’s usually how we do things. Well, then, if that’s the way we do it,
well, who’s sicker? How do we define who’s sicker? It’s the patient who’s in the hospital,
they’re pretty sick. You know what, the patient who’s been
escalated to the intensive care unit, [SOUND], really sick. Well, it used to be that way,
and in some cases, like heart transplants, it does still work that way. You can imagine how easy it is
to game that system, right? I’m going to transfer my patient
to the intensive care unit. Not necessarily cuz they need it,
but by doing so I’ve artificially made them look sicker
and I can get them up that waiting list. Perhaps we can come up with
a slightly more objective, data-driven approach to doing this. So Ray Kim, who is our head of
our gastrointestinal division, developed the MELD score many years ago. And what this is, you input some
laboratory tests about a liver patient and it comes up with a score. And ultimately,
what this thing is actually doing, it’s predicting the likelihood that
a liver patient is going to die. But what we use it for is to prioritize
who should get the transplant first. So these kinds of data-driven prediction
systems, they’ve been around for decades. We just have different tools now to
make it easier for us to develop those. Make the distinction again: this system
has said nothing about transplant. It said, predict whether you will die. The prediction is different from
understanding the context and deciding what action is
worth taking on it, okay? If you look at this website, MDCalc, we have hundreds of these
kinds of risk scores. We’ve been using these kinds of things for
decades. This is how we decide whether or not to
give you cholesterol medicine, whether or not to give you a blood thinner. We’re trying to predict whether you
need it, and thus decide whether or not to intervene. Look outside of medicine, these kinds of
algorithms are pervasive in your everyday life, even if you don’t realize it. What is your FICO credit score? It is a prediction,
are you gonna repay your loan or not? How much email spam do you get? It’s still kind of bad now,
it was even worse before. But effectively, these algorithms look at
your mail and try to guess: is this spam, is this junk mail or not? What’s become a lot more popular in
recent years is automated scoring. For a comparison, this MELD score,
this is pretty good. But I gotta open up a separate webpage,
I gotta enter these numbers by hand, and I’m punching this in one patient at
a time and trying to think it through. Why enter just these five numbers? Now that we have an electronic
medical record infrastructure, as clunky as it is, at least we have it. Why not throw all the data, a hundred
pieces of information, at the algorithm? Why check it one at a time? Let’s just monitor people 24 hours a day,
every day of the week, right? That’s become possible now to do. Although, conceptually, it’s not really that different than things
we’ve been doing for a long time, right? So how would you go about doing this? If you wanna make progress,
you gotta have the data. The oil of the Fourth Industrial
Revolution is the data. Why is it especially difficult in
healthcare? Because healthcare data is notoriously messy to deal with. Why? Because it wasn’t created for the purposes of things like,
I don’t know, improving healthcare. It was created for billing. It was created so doctors and
nurses could kind of chat with each other. That’s how we kind of write our
notes in a way that’s very hard for a computer to digest. So if you did make any progress in an AI
or data science project, you will quickly find people will admit most of your
time is spent on the data janitorship, just getting things organized and
making sense of it. 80% of your time is spent
just preparing the data. The other 20% you spend complaining
about how you need to prepare the data. If you’re willing to go through with that, you’re usually trying to get to
something that looks like this. This is made-up data, a typical dataframe,
or I might call it a feature matrix. Every row is some patient or some case
I wanna make a prediction about so I can make some estimate. Every column is just
something about that person. And usually, there’ll be one target,
one orange column here. That’s the thing I wanna anticipate or
predict or diagnose. In this case,
does the patient die within a year? Let’s do a concrete example
to break this down. You guys are doing okay right now, later in the afternoon you’re
feeling short of breath. Something doesn’t seem right. You go to the emergency room. I see you in the hospital, order a CT scan. Shoot, you’ve got a pulmonary embolism,
a blood clot in your lung, a really dangerous situation. I’m gonna get you on blood thinners,
the right medication or treatment. And that’s the key thing I need to do,
but now I have a question. As a human being,
I have to answer a couple questions. I have to make a prediction,
are you gonna be okay? Because I need to make a decision. Do I need to admit you to the hospital,
or can I just let you go home, right? As a human being, this is a prediction and
a decision I have to make. So how do I go about doing that? Look, I’m a doctor too,
I’m gonna examine you. I’m gonna talk to you, get your history. Tell me how you feel. With these healer’s hands, I’m gonna
lay these healer’s hands on you and do an examination and make my assessment. And you know what,
I think you’re gonna be okay. You’re not that bad,
I’m gonna give you the right medication. I’m gonna let you go home. I think that’s the right thing to do. And as much as I joke about it,
expert intuition is extremely powerful and effective. The challenge with it is that it is
extremely difficult to reproduce. Which means it is essentially impossible
to deliver it consistently at scale without some kind of support system. So what kind of support can we offer? How about let’s look at our data. Let’s look at our prior cases. Imagine each one of these
dots represents some patient, some person who had a blood clot in
their lungs, kinda like you did, right? And I measured two things about them,
their heart rate and their oxygen levels, and that’s where they landed. The blue dots are patients
who did fine. You gave them treatment,
they went home, everything was okay. The red patients, something bad happened. Ended up in the intensive care unit
on an artificial breathing machine cuz they got so bad. And maybe they even died. All right, so here’s the question for you. What do you do when Mr. Green shows up?>>[LAUGH]
>>What do you do? I can use my best guess. My intuition tells me,
I think you’re gonna be okay. Or let’s look at our data. What we can do, how about we draw a line,
a decision boundary, through the prior cases we’ve seen,
in a mathematically optimal way that pushes as many blue patients to this
side and red patients to that side as possible. And now I can have quantitative
backing when I say, you know what, Mr. Green sure looks like a blue
patient more than a red patient. Now I can put even more trust in my intuition,
I think you’re gonna be fine, I’m gonna let you go home,
you’re gonna be okay, okay? So that is the basic concept
of how these algorithms work. But observational research
has done this for years, for decades, like that MELD score. Logistic regression, Cox survival curves,
it’s all about drawing lines through data, and they work very well. What if your data looked like this?>>[LAUGH]
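Hold that thought. The line-drawing idea itself can be sketched as a minimal nearest-centroid classifier, which amounts to a linear decision boundary between the two groups; every number here is made up for illustration:

```python
# A sketch of "drawing a line through the data": classify Mr. Green by
# whichever group's average (centroid) he sits closer to. The implied
# line is the perpendicular bisector between the two centroids.

def centroid(points):
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def classify(patient, blue, red):
    """Return which side of the boundary the new patient lands on."""
    (bx, by), (rx, ry) = centroid(blue), centroid(red)
    d_blue = (patient[0] - bx) ** 2 + (patient[1] - by) ** 2
    d_red = (patient[0] - rx) ** 2 + (patient[1] - ry) ** 2
    return "blue" if d_blue < d_red else "red"

# (heart rate, oxygen saturation) for prior patients
blue = [(80, 97), (85, 96), (90, 95)]    # treated, went home, did fine
red = [(120, 88), (130, 85), (125, 82)]  # ended up in the ICU

print(classify((88, 94), blue, red))  # Mr. Green lands on the blue side
```

Logistic regression does the same kind of thing with more statistical machinery, but the output is still one straight boundary through the data.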
>>It’s obvious the red patients are different than the blue patients,
right? It’s obvious. There is no way you can draw one line
through that data that makes any sense, because there’s clearly some kind
of nonlinear interaction happening. Is it your heart rate that matters? Maybe it’s your blood
pressure that matters. What if it was the ratio of those
two numbers that mattered, right? When you start talking about ratios,
that’s a nonlinear interaction. There’s no way to draw a line through that
data, unless you specifically look for it. So then consider what’s a benefit of
many machine learning algorithms, they have a lot more flexibility to
deal with this kind of variation. The top here are three different example
input data sets, red patients and blue patients. And here’s a couple of algorithms
trying to figure out how to figure out where the red and
the blue parts of the world are, right? If you have a linear method,
look at the middle here, this is clearly two concentric
circles of blue and red people, right? But the algorithm can only
draw lines through it, it really has no sensible way to do that. In the bottom here is a nearest
neighbors machine learning algorithm. It has a lot more flexibility to kind of
figure out where to draw those boundaries, that gives it that adaptability. You might also note that flexibility
isn’t always a good thing, though. You can go too far. There’s a bias-variance trade-off,
what does that mean? Look at this, it draws this kind of weird, almost jagged decision boundary
between the red and the blue dots. It’s kind of bizarre what it’s doing. What’s happening is it’s trying too hard. It is overfitting the data and
learning what’s probably just noise, just random variations and
not really the real thing that’s going on. This is a problem because nobody is
interested in weakly recapitulating historical data trends. I want a model that can make robust
predictions about future cases I’ve never seen before, all right? This model is trying too hard to fit
the past, which means it won’t be as useful as I want in the future. But it’s probably better than the linear model,
right? Here’s a battery of many more
machine learning algorithms. All trying to find different ways to
break up those input data sources and figure out what the red and
blue parts of the world are, okay? The broader point I want to make here is,
all of these are standard algorithms at this point. Been around for decades and
proven in many applications. Open source software,
any one of you can download and start using these tools right now. You don’t need to be a PhD in computer
science to start doing this kind of work. Combine that with the increasing
ubiquity of electronic health data and the time is ripe now to
start developing and possibly deploying AI machine learning
tools in medicine and healthcare. Before we dive too deep though,
some things to be aware of. You’re used to hearing about
observational research. Any time you hear research in the news,
coffee causes cancer. Actually, we changed our mind,
coffee doesn’t cause cancer. That’s what observational research is. You hear about this all the time. It’s what you’re used to hearing about. And you’ve got to make the distinction
between what’s going on here. Anybody here think smoking is bad for you?>>[LAUGH]
>>Yeah, okay. Anybody here think smoking
causes lung cancer? A lot of hands up, I believe so. But if you really understand evidence
based medicine, how do you know that? How do you know that? We should be running
a randomized control trial. If you really wanna know what causes what,
that’s the only way you really know. Doesn’t exist for smoking. We haven’t done a trial, so we don’t,
when sometimes we don’t really know. A couple of reasons, it will be very
impractical and randomized smoking and it would be probably unethical to do so,
because the observational evidence that smoking causes lung cancer
and health effects is so overwhelming. It wouldn’t be ethical to even
try that trial, all right? Right, cuz that kind of research,
it’s all about trying to infer something. It’s about trying to explain
something about how the world works. Compare that to a machine learning
algorithm, a prediction algorithm. These algorithms, all they’re trying to
do is get the best possible prediction accuracy cuz that’s the only thing
you told it to care about, all right? They often don’t know and
they don’t care how or why anything works. All they’re trying do to is be accurate. That’s a very different goal than trying
to explain how something works, right? When that study came about, that smoking
cause lung cancer, that’s great. I can use that knowledge
right now as a doctor. I’m gonna tell all my patients
please stop smoking, all right? An algorithm comes out. Look, you can just take
a picture of somebody’s mole. It can tell you if it’s cancer or not. That’s amazing, and it does nothing for me. I’ll see ten patients tomorrow
in clinic who have a rash, and having read that paper,
I can’t do anything differently, right? That doesn’t help me actually act on them. Not until we have an infrastructure to readily
share generalizable software modules, and that’s just years off at this point. So given that, again, that these
prediction algorithms, all they wanna do is predict, they’re not trying to explain,
they don’t really know how anything works. Just beware that they can make
very strange conclusions like patients who are visited by
the chaplain are very likely to die.>>[LAUGH]
>>And this is true, this is absolutely true. This is a correct finding. If a chaplain comes to visit you,
it’s, well, it’s highly predictive of death, right? Please don’t conclude,
let’s fire the chaplain and we could save some patient lives, right? You get that this doesn’t make sense
because you understand the context. There’s confounding by indication. The kind of patient that chaplain was
gonna go see were the kind of patients who weren’t doing so well, right? The algorithm doesn’t know that. The algorithm has no way to
understand that kind of context. All you told the algorithm to do is
be accurate in predicting who dies, so it’s gonna get its hands on any
piece of information it can and do what you told it to do,
be accurate in making that prediction. Even though it’ll find things
like this that you realize, that just doesn’t make sense. You couldn’t use that in any helpful way,
right? All right, let’s think more broadly again. All of the different branches
of machine learning. Most of the time you’re hearing about
supervised learning classifiers, but there are many other branches and categories,
like unsupervised learning. How do you automatically
organize complex data sources? I’m interested in a lot of these
areas in terms of medicine. Why do we do this? Because, as some have argued, there’s
an unreasonable effectiveness of data. Don’t get too enamored with
a particular method or an algorithm, deep-learning this or SVM that. All great, all important. But the real secret sauce,
the secret ingredient, it’s not a secret. It’s the data. That’s actually what you need. A good data source is what you need. This is particularly challenging in
medicine because of how hard it is to generate good labelled training sets,
right? I need to hire a bunch of people to
look at a bunch of pictures and say, look is this a tumor or is this a cat? Right, that’s a lot of
effort to do that labelling. Can you manually review 10,000 charts for
me and tell me if there’s a diabetic
with complications or not? That’s a lot of manual effort. The conceit here is, rather than
dreaming about all the things we could do if we had these great,
lovingly curated datasets, what could you do with the masses of
unlabeled data that’s already available? This is how we can do things like market
basket analysis, so what is this about? You bought this book on Amazon? Hey, maybe you’ll like this book too,
cuz other people did. So you know when you go to the grocery
store, you can get a club card, and they will give you free
discounts on your groceries. So what are they doing that for? Those discounts are not free,
it’s not free. What they are doing is they’re keeping
track of what you’re buying, now and over time, so they can make a lot of inferences
and guesses about how you behave. A classic story in this space
is that one supermarket found people buy beer and diapers together, a lot more than you would
expect by random chance. It’s a true finding,
I don’t know what it means, but you could easily imagine
applying a theory to that data. Young beleaguered parents go for another
diaper run, hey, if I’m gonna be here, I might as well grab some beer.>>[LAUGH]
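The beer-and-diapers association can be quantified with a toy "lift" calculation; the baskets here are invented, and lift above 1 means the pair co-occurs more often than chance alone would predict:

```python
# A sketch of market-basket analysis on made-up shopping baskets:
# does buying diapers go with beer more than random chance predicts?

baskets = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"milk", "bread"},
    {"diapers", "beer", "bread"},
    {"bread", "milk"},
    {"beer", "chips"},
]

def support(item_set):
    """Fraction of baskets containing every item in item_set."""
    return sum(item_set <= b for b in baskets) / len(baskets)

# lift = P(diapers and beer) / (P(diapers) * P(beer))
lift = support({"diapers", "beer"}) / (support({"diapers"}) * support({"beer"}))
print(lift)  # > 1, so the pair co-occurs more than chance
```

No labels, no outcome variable; the pattern is mined straight out of unlabeled purchase data, which is the point being made here.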
>>You could also imagine, as the supermarket manager, well then you
can start making some strategic decisions. Yeah, there we go. We got the plan. We’re gonna put all the diapers in
the back of the store, and we’ll stack our really expensive alcohol in front of it,
encourage some impulse buys. Up here is a really fun,
very colorful story. So Target, the store Target,
started sending advertisements displaying
a bunch of baby products to a teenage girl. Her father read this or
found this and was very upset. What are you guys doing? What, did you want my teenage girl
to get pregnant or something? Except it turned out what happened was
Target’s algorithm had figured out that the teenage girl was pregnant
even before her own father had.>>[LAUGH]
>>And it was just based on the very mundane
daily purchases she was making. She was buying cotton balls and
a very large purse and lotion or something like this. Things that you might
not be thinking about. And what Target did was they realized
when a young woman buys those things that usually means they’re about
three months pregnant. And there’s a great opportunity
to target a new customer. So anyway, that’s what that
club card is actually about. Okay, let’s get back to medicine, so
I’ve been trying to apply this idea to medical decision making. What we have here, these are screenshots
from a prototype I developed. And this is meant to simulate
the electronic medical record when I’m taking care of you in the hospital,
in the clinic at the end of the day, I gotta punch in the orders
that define your care plan. What medications do you need,
what lab tests should we check, what x-rays should we do? So if I knew nothing else, the system
would just say, here’s just the generic bestseller list of most common things
that happen in the Stanford Hospital. I’m gonna manually check for something. So I’m just looking for spiro, here we go,
spironolactone, add to shopping cart. I just ordered spironolactone for you. This is a male hormone antagonist,
a blocker, it’s very good for people with liver failure or
heart failure. So what’s happening now is given
that I order spironolactone for you, on the left here
are things that other Stanford doctors are very likely
to order within the next 24 hours. On the right are things that other doctors
are disproportionately likely to order within the next 24 hours. So, for example,
yeah actually you know what, furosemide, that’s a good choice,
add to shopping cart. It’s a diuretic,
a water pill, makes you pee. Has a very good balancing effect on
the electrolytes with spironolactone. Given these two pieces of information,
the system juggles the list and here, what’s up at the top now, Carvedilol,
Digoxin, Isosorbide Dinitrate. These are heart failure medications. Rifaximin, Propranolol,
hey, you know what? Rifaximin, this for
complications of liver disease. Actually my patient needs that,
order that, add to shopping cart. The system juggles the list again,
Lactulose, Zinc, Propranolol, treatments for complications
of liver disease floating to the top while the heart failure medications are starting
to get pushed down because they seem to be less relevant. So a few things I want
to emphasize at this point. You’ll notice at no time
did I ever write a note, or pick a diagnosis code from a list. I’m just taking care of you, I’m just
taking care of the patient, right? I’m ordering the medications you need, I’m just doing my job,
because I wanna do what you need, right? I’m not being distracted by
irrelevant data entry. I mean, how many times do
you go to the doctor’s office and
rather than looking at you, right? But if we design this system properly, it should be able to pay attention to
what we’re doing and infer the relevant patient context and anticipate what you
need without us having to even ask for it. So, how did I build this? Maybe I got all of the practice
guidelines for liver disease and I did my homework studiously. And I looked at all
the actions you should do and I programmed rules into the computer
to trigger those, right? Or maybe I hired 100 doctors and had them
look at 10,000 charts of liver patients and said, can you manually annotate all of
these with what you think the right treatment and diagnosis is? No, I didn’t do that, because
I can’t afford to do that. That is extremely laborious and is very
brittle, it’s actually hard to maintain. But in a sense, maybe I don’t have to,
right? Cuz what do we do as doctors every day,
right? In some sense, we are an army of
manual annotators working every day doing our best. Seeing you in our clinics,
seeing you in our hospital, and we’re trying to figure out what each patient needs.
At the end of the day, we may write a note that might be
nonsense, it might be copy and pasted. But at the end of the day it’s our orders
that show what we really believe. At the end of the day I think
this is the treatment and diagnostic approach that
we need to help you, right? Sound pretty good? You guys like it? Worried about any ways
this is gonna go wrong?>>[LAUGH]
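As a rough sketch of the mechanic behind a prototype like that, not the actual system, here is co-occurrence counting over made-up order logs: rank items by how often other doctors ordered them alongside what is already in the "shopping cart":

```python
# A toy order recommender: given the orders placed so far, suggest what
# other clinicians ordered in encounters that match those orders.
# The encounter logs below are entirely invented for illustration.
from collections import Counter

# Hypothetical past encounters: orders placed within the same 24 hours
encounters = [
    {"spironolactone", "furosemide", "carvedilol"},
    {"spironolactone", "furosemide", "lactulose", "rifaximin"},
    {"spironolactone", "lactulose", "rifaximin", "zinc"},
    {"furosemide", "carvedilol", "digoxin"},
]

def recommend(cart, top_n=3):
    counts = Counter()
    for orders in encounters:
        if cart <= orders:  # encounter contains everything ordered so far
            counts.update(orders - cart)
    return [item for item, _ in counts.most_common(top_n)]

print(recommend({"spironolactone", "rifaximin"}))  # lactulose ranks first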
>>Then we’re good, right? Okay great, let’s do it. Well, there are at least a couple ways I can
imagine that things could go wrong. So it’s usually the health services
researchers, maybe the health economist, they go wait a minute, wait a minute,
isn’t this gonna lead to overutilization? Amazon recommends you a book, all they
care about is you just buying more stuff. They don’t really care if you are happy. That’s not really what we’re trying to
do in medicine, and I think that’s true. If you just use the system as I
showed, it probably will lead to more good
decisions, but it will probably lead
is I’m doing the inverse situation. Hey, before you order that back MRI, did you know that only 2% of the time
has that ever led to abnormal results? And only 1% of those times does that
actually lead to a surgery that helps you? I’m not telling you what to do or
not to do, but you might wanna think about that before
you commit to these extra processes. That raises a much tougher existential
question at the end of the day: do you believe in the wisdom of the crowd,
or do you fear the tyranny of the mob?>>[LAUGH]
>>I’m actually not sure some days. This is actually really hard,
because just because something is common, does not mean that it is good. This is really hard to unpack in medicine
because there is no gold standard to define what is good medical
decision making, right? So if you wanna know more depth and
details, you can read my papers. I have a whole line of research papers
all about trying to unpack this really complex question. Here’s an anecdote, a conceptual example,
to at least understand why this almost strangely might work. Who Wants to Be a Millionaire? TV game show, right? Answer a multiple choice
question to win big money. If you get stuck, classic lifelines,
ask for help, right? So, what do you do? You can phone a friend, what is this? You can get on a telephone and
call anybody in the world you want, who are you gonna call? I’m gonna call my smartest most
knowledgeable expert friend I know, I’m going to consult a specialist who I think has the best chance at
getting me the right answer, right? Alternatively, I can ask the audience,
just take a poll of the studio audience. These aren’t knowledgeable experts,
they’re just ordinary people watching a TV game show. Actually, let’s do this for
fun, I wasn’t gonna do this, I didn’t know what was going to happen. How many think the answer
to this question is A? Marissa Mayer was the first pregnant woman to play? Who thinks the answer is A? Nobody, a few people. How many people
think it is B, the first NASCAR driver? No, how many people think it
is C the Fortune 500 CEO? Interesting, how many people
think it is D a NASA astronaut? A couple. It turns out if you phone a friend they’re
right about 65% of the time, pretty good. If you ask the audience it turns
out they’re right 91% of the time. What on Earth is going on?>>[LAUGH]
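A quick simulation makes the effect concrete. This sketch uses assumed numbers for illustration (one expert right 65% of the time versus 201 independent audience members each right only 55% of the time, on a binary right/wrong simplification); it is not data from the show.

```python
import random

def majority_vote_accuracy(p_correct, n_voters, trials=10_000, seed=0):
    """Estimate how often a strict majority of independent voters is right,
    when each voter independently picks the right answer with probability p_correct."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        correct_votes = sum(rng.random() < p_correct for _ in range(n_voters))
        if correct_votes > n_voters / 2:  # strict majority lands on the right answer
            wins += 1
    return wins / trials

# One expert friend, right 65% of the time:
expert = majority_vote_accuracy(p_correct=0.65, n_voters=1)
# 201 ordinary audience members, each right only 55% of the time:
crowd = majority_vote_accuracy(p_correct=0.55, n_voters=201)
print(f"phone a friend: {expert:.0%}, ask the audience: {crowd:.0%}")
```

With these assumed numbers the aggregated crowd climbs past 90% even though no individual voter is better than 55%, which is the jury-theorem intuition the talk describes.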
>>I trust experts, right? I trust smart people,
who knows who’s in this audience. I mean you guys apparently
are pretty smart actually. [LAUGH] So what's going on? A couple of things to break down. This does not mean the audience was filled with geniuses and 91% knew the answer that you didn't. It turns out, in this case, only 55% of the audience picked the right answer. You guys are way better than that, I think you guys were 90%. In that example only 55% picked the right answer, but that majority vote turned out to be right over 90% of the time. There are actually mathematical theorems that explain why this phenomenon works. It's 200 years old, it's the reason
why you might expect a jury of 12 regular people to be more likely to pick
the right answer than one expert judge. I mean, the judge is really smart, right? Don't you think one judge is more likely to know the right answer? They are pretty smart, but as long as each juror, each voter, is at least better than random in terms of picking the right answer, the more you aggregate, the more likely you are to converge toward the correct answer. Not a guarantee, the audience is still wrong about 10% of the time. Think about that when you are voting in a couple of weeks, but,>>[LAUGH]>>Your chances are better than leaving it up to one person. Another really interesting thing
this example illustrates is that this concept should work well as long as the audience, the crowd you're learning from, has goals that are aligned with yours, right? It turns out the Russian version of this game show actually had to turn ask-the-audience off, because I guess Russians don't like watching other people win money, or something.>>[LAUGH]
>>And so they would deliberately feed contestants wrong answers to sabotage
them, rigging the voting system. Anyway, so
just beware that that could happen, okay? I would like to hope that, if I'm studying doctors, the doctors, for the most part, are trying to do their best to do good, okay? For the most part. [LAUGH] Keep in mind, think about where we are on the hype
cycle of emerging technology. The x-axis here is time,
the y-axis is expectations, it’s hype. It’s how excited we are for
new tech, right? So I think there’s a ton of potential
in this kind of work, in research and development, but
I've been worried the past few years, cuz notice where we are in terms of
deep learning, machine learning. At the very tip top of the peak
of inflated expectations. I’m worried because I know
exactly what happens next, we are going to crash into
the trough of disillusionment. You can see blockchain,
it’s already headed down. All right, the thing is
the stakeholders will get disappointed, they’ll get resentful. Hey, I read that 80% of doctors
will be replaced by algorithms. I thought pathologists and radiologists,
you should all be fired by now. I thought if we just poured all
this data into the computer, that out would’ve come the cure for
cancer, why is this not happening? I’m not really sure those were
the right expectations to have. Even though there have been amazing
achievements in AI research in the past few years. What’s going on here,
DeepMind, a company in the UK, has an algorithm now that’s able to beat
world class professional Go players. This is actually an incredible
achievement in AI. Ten years ago I had
classmates working on this. They said it was probably impossible,
the game is just too complex. It’s way way [LAUGH]
way harder than chess. There’s just too many
combinations to think through. They applied a similar algorithm
to playing Atari games, the algorithm doesn’t understand
what an Atari game is, they just fed the pixels of colors off
the video screen into the algorithm and the score, and then the algorithm
just started playing around and eventually was able to figure
out how to play these games. Carnegie Mellon, a year ago, deployed a poker bot able to beat professional-level players at no-limit Texas hold 'em. If I wasn't in medicine, I would love to be working on this. It just looks so fun, but I'm in medicine, because I wanna work on these other ideas. So, what's the difference? I mean, come on,
we’re beating professional Go players. We’ve got self driving cars,
why haven’t we solved medicine yet? Let’s get to it already,
what’s the difference? A lot of differences, what I think
is the key difference is that, if you wanna play a thousand games of
Go with your algorithm, it’s easy. You can simulate that,
trivially, instantly, fast. You’ll get an answer that’s right or
wrong, immediately. If I want to try novel
medical care on patients, and see if that will lead to better outcomes,
and get a reliable answer, it actually might be impossible to simulate that and get a good answer. But if that's the case,
how am I supposed to get information? How can we get data? How do we know what works in medicine and
what doesn’t? You’ve got to experiment on people, and when you get it wrong,
you kill people in very bad ways. Somebody has cancer,
you wanna pick a special chemo or immuno therapy,
did you make the right choice? You could be waiting five years
to find out the answer to that. That time cycle is way too slow for a lot of these really cool reinforcement learning techniques in machine learning nowadays. I actually disagree with this one last point. This is a slide I adapted from Marty Tindimboms, and I believe it made many excellent points, including one about the data not even existing. I actually think a lot of the data
we want and need does exist, but it's locked up in healthcare system silos, which have no incentive to share with anybody; they only see risk. And even if you could get access to that
data, it’s so messy, it’s so difficult to deal with, because it wasn’t created for
the purposes of helping people. All right, I’ll break down just
a few summary thoughts. If you wanna make progress here,
what do we do? What are the questions you should ask
if somebody is trying to sell you something like this? And how do you tell what’s credible and
what’s not? Remember that nobody actually
cares about technology, okay? I got AI this, I got AI that,
who cares, right? Can you solve a business or
medical problem for me or not? Which really means you need
a combination of skills, you need a team, the domain experts and
the technical experts. Think about cases where
an important decision depends on a human making a prediction
about the future, right? And us humans, we can be pretty
good about it: I think you're okay, I predict you're gonna be fine, I'll let you go home. I think, I predict you have a high risk of heart attack, so you better take this medicine. We're pretty good, but we're human; inevitably there's undesirable variability in the way we practice, which leads to challenges that we could support with data-driven algorithms. Remember that the prediction is different from the outcome. Remember that data matters more than any prediction algorithm. That's the real secret sauce, that's the oil of the fourth
industrial revolution here. Think a lot about actionability, a lot of these things are about
predicting something. I can predict that the sun’s
gonna rise tomorrow. It doesn’t mean I can do
anything about it, right? So you'll see a lot of these are, like, novel screening methodologies. That's right, your Apple Watch can tell if you have an irregular heart rhythm, right? I can make an early detection if a patient has sepsis, this bad infection, right? I see they have a fever, they have low blood pressure, and I see the doctor ordered antibiotics for them, so I predict that they have sepsis, a bad infection. Well, yeah, thank you for that prediction. Why do you think I
ordered the antibiotics? Think instead about
an existing scarce resource, or back up, think about what would
you do if you got that prediction? Very often, all these early warning systems or screening methodologies recommend is that you pay more attention to the patient. What is it you think I was doing, right? It's often not really clear that you actually have an opportunity to act. Instead, think about an existing scarce
resource we have, we know it works, an organ transplant, but
you can't get it to everybody. So for example, the VA, the Veterans Affairs, they monitor their entire primary care population 24 hours a day,
and they make a prediction. Are they likely to be hospitalized or
are they likely to die within the next year? Cuz if so, they've got specialized
intensive out-patient care for them. Extra social worker, doctor,
case manager, pharmacist that zeros in, the specialized team to help those people. But you can’t deploy that team to
everybody, nor is there a need to. The reality is, most of you are gonna be
just fine if we just leave you alone. That’s actually the better thing to do. But could you predict who is the 5% of
the population that’s gonna cost 50% of our healthcare resources next year? If you can identify them for me, maybe we
can deploy some specialized care, like targeted advertising, okay? Now, this is a very controversial point, but do you think a prediction, a model, an algorithm, a computer needs to explain itself? Is that essential or overrated? I'm actually gonna go with, I think
it’s not explanations you want. What you want is confidence,
what you want is trust. Explanations are really just a crutch, when I haven’t been able to give
you confidence in another way. Can anyone tell me how Tylenol works,
okay, well, nobody? Can anyone tell me how anesthesia works,
you can’t, I can’t either, nobody can! We don’t understand the mechanism of
action of many of these drugs, and yet still, clearly, we're willing to use them.
explanation is not essential. The distinction, the reason I’m
willing to prescribe you Tylenol, is because the safety
trials have been done. The efficacy trials have been
done to empirically demonstrate that however it works,
it does work, and it’s safe. Most of these types of algorithms have
not been held to anything close to that standard of evaluation. Which is why, in the meantime, we ask: can you at least explain yourself, so we feel better about how you work, okay? Incentives are everything. I bet everyone in this room could come up
with five great ways to make the practice of medicine better. It's really challenging, cuz for most of those systems there's no business model to sustain them, even if it is a good idea and probably would help people. So the VA, the Veterans Affairs, is kind of like government healthcare,
capitated, value-based payments. So they're very incentivized toward primary care. We're gonna keep you healthy, deploy a special team. We're gonna keep you out of the hospital. It's really just better for everybody. Now imagine I go to Stanford, still 90% fee-for-service like everywhere else. Hey, hospital executive, can you give
me all your data and some resources? I found this really great way to
keep patients out of the hospital. He’s like, wait a second, you found a great way to turn
away my paying customers? I'm not really sure why I'd wanna work with you. Anyway. There are all sorts of distortions in our medical system that make it difficult to do the right thing. There are ways forward, but there are larger policy changes that have to happen. Think about operational rather
than clinical use cases. As a doctor, I very naturally think, what would a doctor think of? Diagnosing an x-ray, that's so cool. Actually, practically speaking,
it’s much less risk, much lower-hanging fruit to go for
an administrative use case. Hey, can you predict how long this
patient’s gonna be in the operating room? That way, I can make much better
use of that extremely expensive, scarce resource. Can you predict which one of my patients
won’t show up in clinic next week so I can better use that scarce resource? Can you automatically transcribe
our patient encounter so that we can just talk to each other? I don’t need to sit at a computer and
take notes and look at you like this while we’re chatting, because maybe
a computer can automate that for me. Those are actually probably much more
viable use cases that we’ll get value of in the nearer future. All right, I’m gonna wrap,
coming back to this hype cycle. I'm worried about the crash. But this is why I wanted to give this talk. I'm hoping, through conversations like
this, we can foster a better appreciation for both the capabilities and the
limitations of such advancing technology. So we can soften that crash and quickly
move on to the slope of enlightenment.>>[LAUGH]
>>Where we effectively use every information and data source to
best improve our collective health. So what do you think? You think computers are gonna
be smarter than humans? No, humans are gonna be
smarter than computers? I think this makes for
a very stimulating debate, and I think it's completely irrelevant. It's the wrong question to ask. Because the answer is to combine good machine learning software with
the best human clinician hardware and together we can deliver better care
than either of us could do alone. Thank you all.>>[APPLAUSE]
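The resource-targeting idea from the talk, score everyone and send the scarce intensive team only to the highest-risk few percent, can be sketched in a few lines. The patient IDs, the random risk scores, and the 5% cutoff below are illustrative assumptions, not the VA's actual model.

```python
import random

def select_for_intensive_care(risk_scores, fraction=0.05):
    """Return the IDs of the top `fraction` of patients by predicted risk."""
    n_selected = max(1, round(len(risk_scores) * fraction))
    # Rank patient IDs from highest to lowest predicted risk.
    ranked = sorted(risk_scores, key=risk_scores.get, reverse=True)
    return ranked[:n_selected]

# Hypothetical model output: predicted probability of hospitalization
# within a year, for 100 made-up patients.
rng = random.Random(42)
risk_scores = {f"patient_{i:03d}": rng.random() for i in range(100)}

panel = select_for_intensive_care(risk_scores, fraction=0.05)
print(panel)  # the five highest-risk patients get the specialized team
```

The point of the sketch is the actionability argument: the prediction only matters because there is a concrete, scarce intervention to route to the patients it flags.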

