Houston, Texas, USA: In a matter of seconds, a new algorithm read chest X-rays for 14 pathologies, performing as well as radiologists in most cases, a Stanford-led study says.
A new artificial intelligence algorithm can reliably screen chest X-rays for more than a dozen types of disease, and it does so in less time than it takes to read this sentence, according to a new study led by Stanford University researchers.
The algorithm, dubbed CheXNeXt, is the first to simultaneously evaluate X-rays for a multitude of possible maladies and return results that are consistent with the readings of radiologists, the study says.
Scientists trained the algorithm to detect 14 different pathologies: For 10 diseases, the algorithm performed just as well as radiologists; for three, it underperformed compared with radiologists; and for one, the algorithm outdid the experts.
“Usually, we see AI algorithms that can detect a brain hemorrhage or a wrist fracture—a very narrow scope for single-use cases,” said Matthew Lungren, MD, MPH, assistant professor of radiology. “But here we’re talking about 14 different pathologies analyzed simultaneously, and it’s all through one algorithm.”
The goal, Lungren said, is to eventually leverage these algorithms to reliably and quickly scan a wide range of image-based medical exams for signs of disease without the backup of professional radiologists. And while that may sound disconcerting, the technology could eventually serve as high-quality digital “consultations” to resource-deprived regions of the world that wouldn’t otherwise have access to a radiologist’s expertise. Likewise, there’s an important role for AI in fully developed health care systems too, Lungren added. Algorithms like CheXNeXt could one day expedite care, empowering primary care doctors to make informed decisions about X-ray diagnostics faster, without having to wait for a radiologist.
“We’re seeking opportunities to get our algorithm trained and validated in a variety of settings to explore both its strengths and blind spots,” said graduate student Pranav Rajpurkar. “The algorithm has evaluated over 100,000 X-rays so far, but now we want to know how well it would do if we showed it a million X-rays—and not just from one hospital, but from hospitals around the world.”
A paper detailing the findings of the study was published online Nov. 20 in PLOS Medicine. Lungren and Andrew Ng, Ph.D., adjunct professor of computer science at Stanford, share senior authorship. Rajpurkar and fellow graduate student Jeremy Irvin are the lead authors.
Practice makes perfect
Lungren and Ng’s diagnostic algorithm has been in development for more than a year. It builds on their work on a previous iteration of the technology that could outperform radiologists when diagnosing pneumonia from a chest X-ray. Now, they’ve boosted the abilities of the algorithm to flag 14 ailments, including masses, enlarged hearts and collapsed lungs. For 11 of the 14 pathologies, the algorithm made diagnoses with the accuracy of radiologists or better.
Back in the summer of 2017, the National Institutes of Health released a set of hundreds of thousands of X-rays. Since then, there’s been a mad dash for computer scientists and radiologists working in artificial intelligence to deliver the best possible algorithm for chest X-ray diagnostics.
The scientists used about 112,000 X-rays to train the algorithm. A panel of three radiologists then reviewed a different set of 420 X-rays, one by one, for the 14 pathologies. Their conclusions served as a “ground truth”— a diagnosis that experts agree is the most accurate assessment—for each scan. This set would eventually be used to test how well the algorithm had learned the telltale signs of disease in an X-ray. It also allowed the team of researchers to see how well the algorithm performed compared to the radiologists.
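The panel's "ground truth" step can be sketched as a simple majority vote across the radiologists' reads. The labels below are invented for illustration; they are not data from the study.

```python
# Hypothetical sketch: deriving a "ground truth" label for one pathology
# on each X-ray by majority vote of a radiologist panel, as the article
# describes. All reads here are made up.

def majority_vote(readings):
    """Return 1 (pathology present) if most panel members flagged it."""
    return 1 if sum(readings) > len(readings) / 2 else 0

# Each row: one X-ray; each entry: one radiologist's read (1 = present).
panel_reads = [
    [1, 1, 0],  # two of three saw the finding -> ground truth positive
    [0, 0, 1],  # only one did -> ground truth negative
    [1, 1, 1],
]

ground_truth = [majority_vote(reads) for reads in panel_reads]
print(ground_truth)  # [1, 0, 1]
```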
“We treated the algorithm like it was a student; the NIH data set was the material we used to teach the student, and the 420 images were like the final exam,” Lungren said. To further evaluate the performance of the algorithm compared with human experts, the scientists asked an additional nine radiologists from multiple institutions to also take the same “final exam.”
“That’s another factor that elevates this research,” Lungren said. “We weren’t just comparing this against other algorithms out there; we were comparing this model against practicing radiologists.”
What’s more, to read all 420 X-rays, the radiologists took about three hours on average, while the algorithm scanned and diagnosed all pathologies in about 90 seconds.
Next stop: the clinic
Now, Lungren said, his team is working on a subsequent version of CheXNeXt that will bring the researchers even closer to in-clinic testing. The algorithm isn’t ready for that just yet, but Lungren hopes that it will eventually help expedite the X-ray-reading process for doctors diagnosing urgent care or emergency patients who come in with a cough.
“I could see this working in a few ways. The algorithm could triage the X-rays, sorting them into prioritized categories for doctors to review, like normal, abnormal or emergent,” Lungren said. Or the algorithm could sit bedside with primary care doctors for on-demand consultation, he said. In this case, Lungren said, the algorithm could step in to help confirm or cast doubt on a diagnosis. For example, if a patient’s physical exam and lab results were consistent with pneumonia, and the algorithm diagnosed pneumonia on the patient’s X-ray, then that’s a pretty high-confidence diagnosis and the physician could provide care right away for the condition. Importantly, in this scenario, there would be no need to wait for a radiologist. But if the algorithm came up with a different diagnosis, the primary care doctor could take a closer look at the X-ray or consult with a radiologist to make the final call.
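The triage idea Lungren describes could work roughly like the sketch below: sort incoming X-rays into priority queues based on the model's predicted probabilities. The category thresholds here are invented for illustration, not values from the study.

```python
# Illustrative sketch of algorithmic triage: route each X-ray into one of
# the priority categories Lungren mentions (normal, abnormal, emergent)
# based on hypothetical model output probabilities. Thresholds are made up.

def triage(prob_abnormal, prob_emergent):
    """Assign a review-priority category from two predicted probabilities."""
    if prob_emergent >= 0.5:
        return "emergent"
    if prob_abnormal >= 0.5:
        return "abnormal"
    return "normal"

# Hypothetical scans with (prob_abnormal, prob_emergent) model outputs.
scans = {"scan_a": (0.9, 0.7), "scan_b": (0.6, 0.1), "scan_c": (0.05, 0.01)}
queues = {name: triage(pa, pe) for name, (pa, pe) in scans.items()}
print(queues)  # {'scan_a': 'emergent', 'scan_b': 'abnormal', 'scan_c': 'normal'}
```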
“We should be building AI algorithms to be as good or better than the gold standard of human, expert physicians. Now, I’m not expecting AI to replace radiologists any time soon, but we are not truly pushing the limits of this technology if we’re just aiming to enhance existing radiologist workflows,” Lungren said. “Instead, we need to be thinking about how far we can push these AI models to improve the lives of patients anywhere in the world.”
A second study found that an AI algorithm is better at diagnosing pneumonia than radiologists:
AI algorithm is better at diagnosing pneumonia than radiologists
Stanford researchers have developed a deep-learning algorithm that evaluates chest X-rays for signs of disease.
Stanford researchers have developed an algorithm that offers diagnoses based on chest X-ray images. It can diagnose up to 14 types of medical conditions and can diagnose pneumonia better than expert radiologists working alone.
A paper about the algorithm, called CheXNet, was published Nov. 14 on the open-access, scientific preprint website arXiv.
“Interpreting X-ray images to diagnose pathologies like pneumonia is very challenging, and we know that there’s a lot of variability in the diagnoses radiologists arrive at,” said Pranav Rajpurkar, a graduate student in the Machine Learning Group at Stanford and co-lead author of the paper. “We became interested in developing machine learning algorithms that could learn from hundreds of thousands of chest X-ray diagnoses and make accurate diagnoses.”
The work uses a public data set initially released by the National Institutes of Health Clinical Center on Sept. 26. That data set contains 112,120 frontal-view chest X-ray images labeled with up to 14 possible pathologies. It was released in tandem with an algorithm that could diagnose many of those 14 pathologies with some success, designed to encourage others to advance that work. As soon as they saw these materials, the Machine Learning Group—a group led by Andrew Ng, PhD, adjunct professor of computer science—knew they had their next project.
The researchers, working with Matthew Lungren, MD, MPH, assistant professor of radiology at the School of Medicine, had four Stanford radiologists independently annotate 420 of the images for possible indications of pneumonia. They chose to focus on this disease because it brings 1 million Americans to the hospital each year, according to the Centers for Disease Control and Prevention, and is especially difficult to spot on X-rays. In the meantime, the Machine Learning Group team got to work developing an algorithm that could automatically diagnose the pathologies.
Within a week, the researchers had an algorithm that diagnosed 10 of the pathologies labeled in the X-rays more accurately than previous state-of-the-art results. In just over a month, their algorithm could beat these standards in all 14 identification tasks. In that short time span, CheXNet also outperformed the four Stanford radiologists in diagnosing pneumonia accurately.
Why use an algorithm
Often, treatments for common but devastating diseases that occur in the chest, such as pneumonia, rely heavily on how doctors interpret radiological imaging. But even the best radiologists are prone to misdiagnoses due to challenges in distinguishing between diseases based on X-rays.
“The motivation behind this work is to have a deep-learning model to aid in the interpretation task that could overcome the intrinsic limitations of human perception and bias, and reduce errors,” explained Lungren, who is co-author of the paper. “More broadly, we believe that a deep-learning model for this purpose could improve health care delivery across a wide range of settings.”
After about a month of continuous iteration, the algorithm outperformed the four individual Stanford radiologists in pneumonia diagnoses. This means that the diagnoses provided by CheXNet agreed with a majority vote of radiologists more often than those of the individual radiologists. The algorithm now has the highest performance of any work that has come out so far related to the NIH chest X-ray data set.
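The comparison described above can be sketched with a simple F1 score against the panel's majority-vote labels: whichever reader, human or machine, scores higher agrees more closely with the consensus. All labels below are invented for illustration.

```python
# Toy sketch of comparing an algorithm and an individual radiologist
# against majority-vote labels using F1 score. Labels are made up;
# they are not data from the Stanford study.

def f1(preds, truth):
    """F1 score of binary predictions against reference labels."""
    tp = sum(1 for p, t in zip(preds, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(preds, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(preds, truth) if p == 0 and t == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

majority = [1, 0, 1, 1, 0, 1]   # panel majority vote per X-ray
model    = [1, 0, 1, 1, 1, 1]   # hypothetical algorithm calls
rad_a    = [1, 1, 0, 1, 0, 0]   # hypothetical individual radiologist

print(f1(model, majority), f1(rad_a, majority))
```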
Many options for the future
As detailed in their arXiv paper, the researchers also developed a computer-based tool that produces what looks like a heat map of the chest X-rays. Instead of representing temperature, the colors of these maps mark the areas the algorithm determines are most likely to represent pneumonia. This tool could help reduce the number of missed cases of pneumonia and significantly accelerate radiologist workflow by showing radiologists where to look first, leading to faster diagnoses for the sickest patients.
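One common way to build such a heat map is a class activation map: weight the network's final convolutional feature maps by the learned weights for the class of interest, then normalize and overlay the result on the X-ray. The sketch below uses toy shapes and random values, not CheXNet's actual architecture or weights.

```python
# Minimal sketch of a class-activation-style heat map. The 8-channel
# 7x7 feature maps and the class weights are random toy values standing
# in for a real network's outputs.
import numpy as np

rng = np.random.default_rng(0)
feature_maps = rng.random((8, 7, 7))   # 8 channels of 7x7 spatial features
class_weights = rng.random(8)          # hypothetical weights for "pneumonia"

# Weighted sum over channels yields a 7x7 saliency map.
cam = np.tensordot(class_weights, feature_maps, axes=1)

# Normalize to [0, 1] so it can be rendered as a color overlay.
cam = (cam - cam.min()) / (cam.max() - cam.min())
print(cam.shape)  # (7, 7)
```

In practice the map would be upsampled to the X-ray's resolution before being drawn over the image.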
In parallel with other work the group is doing on irregular-heartbeat diagnosis and electronic medical record data, the researchers hope CheXNet can help people in areas of the world that might not have easy access to a radiologist.
“We plan to continue building and improving upon medical algorithms that can automatically detect abnormalities and we hope to make high-quality, anonymized medical datasets publicly available for others to work on similar problems,” said Jeremy Irvin, a graduate student and co-lead author of the paper. “There is massive potential for machine learning to improve the current health care system, and we want to continue to be at the forefront of innovation in the field.”
A third study found that an AI algorithm diagnoses heart arrhythmias with cardiologist-level accuracy:
AI algorithm diagnoses heart arrhythmias with cardiologist-level accuracy
A new algorithm developed by Stanford computer scientists can sift through hours of heart rhythm data generated by some wearable monitors to find sometimes life-threatening irregular heartbeats, called arrhythmias. The algorithm, detailed in an arXiv paper, performs better than trained cardiologists, and has the added benefit of being able to sort through data from remote locations where people don’t have routine access to cardiologists.
“One of the big deals about this work, in my opinion, is not just that we do abnormality detection but that we do it with high accuracy across a large number of different types of abnormalities,” said Awni Hannun, a graduate student and co-lead author of the paper. “This is definitely something that you won’t find to this level of accuracy anywhere else.”
People suspected of having an arrhythmia will often get an electrocardiogram (ECG) in a doctor’s office. However, if an in-office ECG doesn’t reveal the problem, the doctor may prescribe the patient a wearable ECG that monitors the heart continuously for two weeks. The resulting hundreds of hours of data would then need to be inspected second by second for any indications of problematic arrhythmias, some of which are extremely difficult to differentiate from harmless heartbeat irregularities.
Researchers in the Stanford Machine Learning Group, led by Andrew Ng, an adjunct professor of computer science, saw this as a data problem. They set out to develop a deep learning algorithm to detect 14 types of arrhythmia from ECG signals. They collaborated with the heartbeat monitor company iRhythm to collect a massive dataset that they used to train a deep neural network model. In seven months, it was able to diagnose these arrhythmias about as accurately as cardiologists and outperform them in most cases.
The researchers believe that this algorithm could someday help make cardiologist-level arrhythmia diagnosis and treatment more accessible to people who are unable to see a cardiologist in person. Ng thinks this is just one of many opportunities for deep learning to improve patients’ quality of care and help doctors save time.
Building a heartbeat interpreter
The group trained their algorithm on data collected from iRhythm’s wearable ECG monitor. Patients wear a small chest patch for two weeks and carry out their normal day-to-day activities while the device records each heartbeat for analysis. The group took approximately 30,000 30-second clips from various patients that represented a variety of arrhythmias.
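Preparing such clips amounts to slicing a long, continuous ECG recording into fixed-length windows. The sketch below assumes a 200 Hz sampling rate purely for illustration; the article does not state the device's actual rate.

```python
# Illustrative sketch of slicing a continuous single-lead ECG signal into
# the 30-second training clips described above. The 200 Hz sampling rate
# is an assumption for this example, not a figure from the paper.

SAMPLE_RATE_HZ = 200
CLIP_SECONDS = 30
CLIP_SAMPLES = SAMPLE_RATE_HZ * CLIP_SECONDS  # 6000 samples per clip

def make_clips(signal):
    """Split a 1-D ECG signal into non-overlapping 30-second clips."""
    n_clips = len(signal) // CLIP_SAMPLES
    return [signal[i * CLIP_SAMPLES:(i + 1) * CLIP_SAMPLES]
            for i in range(n_clips)]

two_minutes = list(range(SAMPLE_RATE_HZ * 120))  # stand-in for real samples
clips = make_clips(two_minutes)
print(len(clips), len(clips[0]))  # 4 6000
```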
“The differences in the heartbeat signal can be very subtle but have massive impact in how you choose to tackle these detections,” said Pranav Rajpurkar, a graduate student and co-lead author of the paper. “For example, two forms of the arrhythmia known as second-degree atrioventricular block look very similar, but one requires no treatment while the other requires immediate attention.”
To test the accuracy of the algorithm, the researchers gave a group of three expert cardiologists 300 undiagnosed clips and asked them to reach a consensus about any arrhythmias present in the recordings. Working with these annotated clips, the algorithm could then predict how those cardiologists would label every second of other ECGs it was presented with, in essence giving a diagnosis.
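That per-second output can be pictured as a sequence of class labels, one per second of recording, compared against the committee's labels. The class names and label sequences below are invented for illustration.

```python
# Toy sketch of per-second ECG labeling: the model emits one arrhythmia
# class per second, which is then compared against the cardiologist
# committee's consensus. All labels here are made up.

predictions = ["SINUS", "SINUS", "AFIB", "AFIB", "SINUS"]   # model output
consensus   = ["SINUS", "SINUS", "AFIB", "SINUS", "SINUS"]  # committee labels

matches = sum(p == c for p, c in zip(predictions, consensus))
print(f"{matches}/{len(consensus)} seconds match the committee")
```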
Success and the future
The group had six different cardiologists, working individually, diagnose the same 300-clip set. The researchers then compared which more closely matched the consensus opinion – the algorithm or the cardiologists working independently. They found that the algorithm was competitive with the cardiologists and outperformed them on most arrhythmias.
“There was always an element of suspense when we were running the model and waiting for the result to see if it was going to do better than the experts,” said Rajpurkar. “And we had these exciting moments over and over again as we pushed the model closer and closer to expert performance and then finally went beyond it.”
In addition to cardiologist-level accuracy, the algorithm has the advantage that it does not get fatigued and can make arrhythmia detections instantaneously and continuously.
Long term, the group hopes this algorithm could be a step toward expert-level arrhythmia diagnosis for people who don’t have access to a cardiologist, as in many parts of the developing world and in other rural areas. More immediately, the algorithm could be part of a wearable device that at-risk people keep on at all times that would alert emergency services to potentially deadly heartbeat irregularities as they’re happening.