Artificial intelligence (AI) aims to mimic human cognitive functions. It is bringing a paradigm shift to healthcare, powered by the increasing availability of healthcare data and rapid progress in analytics techniques. We survey the current status of AI applications in healthcare and discuss their future. AI can be applied to various types of healthcare data, both structured and unstructured. Popular AI techniques include machine learning methods for structured data, such as the classical support vector machine and neural network and the more recent deep learning, as well as natural language processing for unstructured data. Major disease areas that use AI tools include cancer, neurology and cardiology. We then review in more detail the AI applications in stroke, in the three major areas of early detection and diagnosis, treatment, and outcome prediction and prognosis evaluation. We conclude with a discussion of pioneering AI systems, such as IBM Watson, and hurdles to real-life deployment of AI.

Overview of Medical Artificial Intelligence (AI) Research
Recently, AI techniques have made waves across healthcare, even fuelling an active discussion of whether AI doctors will eventually replace human physicians. We believe that human physicians will not be replaced by machines in the foreseeable future, but AI can assist physicians in making better clinical decisions, or even replace human judgment in certain functional areas of healthcare (eg, radiology). The increasing availability of healthcare data and the rapid development of big data analytic methods have made possible the recent successful applications of AI in healthcare. Guided by relevant clinical questions, powerful AI techniques can unlock clinically relevant information hidden in massive amounts of data, which in turn can assist clinical decision making.

In this white paper, we survey the current status of AI in healthcare and discuss its future. We first briefly review four relevant aspects from medical investigators’ perspectives:
1. Motivations for applying AI in healthcare
2. Data types that have been analyzed by AI systems
3. Mechanisms that enable AI systems to generate clinically meaningful results
4. Disease types that the AI communities are currently tackling

Motivation

The advantages of AI have been extensively discussed in the medical literature. AI can use sophisticated algorithms to ‘learn’ features from a large volume of healthcare data, and then use the obtained insights to assist clinical practice. It can also be equipped with learning and self-correcting abilities to improve its accuracy based on feedback. An AI system can assist physicians by providing up-to-date medical information from journals, textbooks and clinical practices to inform proper patient care. In addition, an AI system can help to reduce the diagnostic and therapeutic errors that are inevitable in human clinical practice. Moreover, an AI system can extract useful information from a large patient population to assist in making real-time inferences for health risk alerts and health outcome prediction.


Healthcare Data

Before AI systems can be deployed in healthcare applications, they need to be ‘trained’ on data generated from clinical activities, such as screening, diagnosis and treatment assignment, so that they can learn similar groups of subjects and associations between subject features and outcomes of interest. These clinical data often take the form of, but are not limited to, demographics, medical notes, electronic recordings from medical devices, physical examinations, clinical laboratory results and images.

Specifically, in the diagnosis stage, a substantial proportion of the AI literature analyses data from diagnostic imaging, genetic testing and electrodiagnosis.

For example, Jha and Topol urged radiologists to adopt AI technologies when analyzing diagnostic images, which contain vast amounts of information. Li et al studied the use of abnormal genetic expression in long non-coding RNAs to diagnose gastric cancer. Shin et al developed an electrodiagnosis support system for localizing neural injury.

Fig. The data types considered in the artificial intelligence (AI) literature. The comparison is obtained by searching for the diagnosis techniques in the AI literature on the PubMed database.


AI devices

The above discussion suggests that AI devices mainly fall into two major categories. The first category includes machine learning (ML) techniques that analyze structured data such as imaging, genetic and electrophysiological (EP) data. In medical applications, ML procedures attempt to cluster patients’ traits or infer the probability of disease outcomes.

The second category includes natural language processing (NLP) methods that extract information from unstructured data, such as clinical notes and medical journals, to supplement and enrich structured medical data. NLP procedures aim to turn text into machine-readable structured data, which can then be analyzed by ML techniques.

The flow chart below describes the road map from clinical data generation, through NLP data enrichment and ML data analysis, to clinical decision making. Note that the road map starts and ends with clinical activities. As powerful as AI techniques can be, they have to be motivated by clinical problems and ultimately applied to assist clinical practice.

The road map from clinical data generation to natural language processing data enrichment, to machine learning data analysis, to clinical decision making. EMR, electronic medical record; EP, electrophysiological.


Disease Focus


The leading 10 disease types considered in the artificial intelligence (AI) literature. Only the first word of each disease name is displayed. The comparison is obtained by searching for the disease types in the AI literature on PubMed.

Despite the increasingly rich AI literature in healthcare, the research mainly concentrates around a few disease types: cancer, nervous system disease and cardiovascular disease.

1. Cancer: Somashekhar et al demonstrated through a double-blinded validation study that IBM Watson for Oncology could be a reliable AI system for assisting the diagnosis of cancer. Esteva et al analyzed clinical images to identify skin cancer subtypes.

2. Neurology: Bouton et al developed an AI system to restore the control of movement in patients with quadriplegia. Farina et al tested the power of an offline man/machine interface that uses the discharge timings of spinal motor neurons to control upper-limb prostheses.

3. Cardiology: Dilsizian and Siegel discussed the potential application of AI systems to diagnose heart disease from cardiac images. Arterys recently received clearance from the US Food and Drug Administration (FDA) to market its Arterys Cardio DL application, which uses AI to provide automated, editable ventricle segmentations based on conventional cardiac MRI images.

The concentration around these three disease types is not completely unexpected. All three are leading causes of death; therefore, early diagnosis is crucial to prevent the deterioration of patients’ health status. Furthermore, early diagnosis can potentially be achieved by improving the analysis of imaging, genetic, EP or EMR data, which is the strength of AI systems.


Classical ML

ML constructs data analytical algorithms to extract features from data. Inputs to ML algorithms include patient ‘traits’ and sometimes medical outcomes of interest. A patient’s traits commonly include baseline data, such as age, gender and disease history, and disease-specific data, such as diagnostic imaging, gene expression, EP tests, physical examination results, clinical symptoms and medication. Besides the traits, patients’ medical outcomes are often collected in clinical research. These include disease indicators, patients’ survival times and quantitative disease levels.

Depending on whether the outcomes are incorporated, ML algorithms can be divided into two major categories: unsupervised learning and supervised learning. Unsupervised learning is well known for feature extraction, while supervised learning is suitable for predictive modeling by building relationships between the patient traits (as input) and the outcome of interest (as output). More recently, semi-supervised learning has been proposed as a hybrid between unsupervised learning and supervised learning, which is suitable for scenarios where the outcome is missing for certain subjects. These three types of learning are illustrated below.

Clustering and principal component analysis (PCA) are two major unsupervised learning methods. Clustering groups subjects with similar traits together into clusters, without using the outcome information. Clustering algorithms output cluster labels for the patients by maximizing the similarity of patients within each cluster and minimizing the similarity between clusters. Popular clustering algorithms include k-means clustering, hierarchical clustering and Gaussian mixture clustering. PCA is mainly used for dimension reduction, especially when the trait is recorded in a large number of dimensions, such as the number of genes in a genome-wide association study. PCA projects the data onto a few principal component (PC) directions without losing too much information about the subjects. Sometimes, one can first use PCA to reduce the dimension of the data, and then use clustering to group the subjects.
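As a rough illustration of how a clustering algorithm assigns labels without using any outcome, the following is a minimal k-means sketch in plain Python. The patient traits (age and systolic blood pressure), the function name `kmeans` and all parameter values are hypothetical and chosen only for illustration; real analyses would typically standardize the traits first and use an established library.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Group points (lists of floats) into k clusters; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Start from k randomly chosen patients as initial centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each patient joins the nearest centroid
        # (squared Euclidean distance).
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical traits: (age in years, systolic blood pressure in mmHg).
patients = [[34, 118], [36, 121], [31, 115], [68, 162], [71, 158], [66, 165]]
labels, centroids = kmeans(patients, k=2)
print(labels)
```

The cluster numbering itself is arbitrary; what matters is which patients share a label.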

Supervised learning, on the other hand, considers the subjects’ outcomes together with their traits, and goes through a training process to find the mapping from inputs to outputs that is, on average, closest to the observed outcomes. Usually, the output formulation varies with the outcome of interest. For example, the output can be the probability of a particular clinical event, the expected value of a disease level or the expected survival time.
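To make the idea of a training process concrete, here is a minimal sketch of one supervised method, logistic regression fitted by gradient descent, where the output is the probability of a clinical event. The two standardized traits, the outcome labels, the learning rate and the number of epochs are all hypothetical values chosen for illustration, not taken from any study.

```python
import math

def predict_probability(x, weights, bias):
    """Probability of the clinical event for one subject's traits."""
    z = bias + sum(w * v for w, v in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(features, outcomes, learning_rate=0.1, epochs=2000):
    """Fit weights and bias by gradient descent on the average log-loss."""
    n = len(features)
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, outcomes):
            # Difference between predicted probability and observed outcome.
            error = predict_probability(x, weights, bias) - y
            for j in range(n_features):
                grad_w[j] += error * x[j] / n
            grad_b += error / n
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b
    return weights, bias

# Hypothetical standardized traits and binary outcomes (1 = event occurred).
features = [[-1.2, -0.8], [-0.9, -1.1], [-0.5, -0.3],
            [0.4, 0.6], [0.9, 1.0], [1.3, 0.7]]
outcomes = [0, 0, 0, 1, 1, 1]
weights, bias = train_logistic(features, outcomes)
print(predict_probability([1.0, 1.0], weights, bias))
```

Replacing the sigmoid output and log-loss with an identity output and squared error would give linear regression for a quantitative disease level instead.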

Compared with unsupervised learning, supervised learning provides more clinically relevant results; hence AI applications in healthcare most often use supervised learning. (Note that unsupervised learning can be used as a preprocessing step to reduce dimensionality or identify subgroups, which in turn makes the follow-up supervised learning step more efficient.)

The machine learning algorithms used in the medical literature. The data are generated through searching the machine learning algorithms within healthcare on PubMed.

Relevant techniques include linear regression, logistic regression, naïve Bayes, decision tree, nearest neighbor, random forest, discriminant analysis, support vector machine (SVM) and neural network. The figure below displays the popularity of the various supervised learning techniques in medical applications, which clearly shows that SVM and neural network are the most popular ones. This remains the case when restricting to the three major data types (image, genetic and EP), as shown below.


Deep learning: a new era of ML

Deep learning is a modern extension of the classical neural network technique. One can view deep learning as a neural network with many layers. The rapid development of modern computing enables deep learning to build neural networks with a large number of layers, which was infeasible for classical neural networks. As such, deep learning can explore more complex non-linear patterns in the data. Another reason for the recent popularity of deep learning is the increase in the volume and complexity of data. The figure below shows that the application of deep learning in the field of medical research nearly doubled in 2016. In addition, a further figure below shows that a clear majority of deep learning is used in imaging analysis, which makes sense given that images are naturally complex and high volume.


An illustration of deep learning with two hidden layers.


Current trend for deep learning. The data are generated by searching for deep learning in healthcare and disease categories on PubMed.


The data sources for deep learning. The data are generated through searching deep learning in combination with the diagnosis techniques on PubMed.

Unlike the classical neural network, deep learning uses more hidden layers so that the algorithms can handle complex data with various structures. In medical applications, the commonly used deep learning algorithms include the convolutional neural network (CNN), recurrent neural network, deep belief network and deep neural network.
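The notion of stacked hidden layers can be sketched with a forward pass through a small fully connected network with two hidden layers. This is an untrained toy network: the layer sizes, the random weights, the ReLU and sigmoid activations and the four input values are assumptions made for illustration, and a practical model would be trained with a deep learning framework.

```python
import math
import random

def relu(values):
    """Non-linear activation applied after each hidden layer."""
    return [max(0.0, v) for v in values]

def dense(inputs, weights, biases):
    """One fully connected layer; weights has one row per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def init_layer(n_in, n_out, rng):
    """Random weights and zero biases (hypothetical, untrained values)."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def forward(inputs, layers):
    """ReLU after every hidden layer, sigmoid after the single output unit."""
    activation = inputs
    for weights, biases in layers[:-1]:
        activation = relu(dense(activation, weights, biases))
    weights, biases = layers[-1]
    logit = dense(activation, weights, biases)[0]
    return 1.0 / (1.0 + math.exp(-logit))

rng = random.Random(0)
# 4 input traits -> hidden layer of 5 units -> hidden layer of 3 units -> 1 output.
layers = [init_layer(4, 5, rng), init_layer(5, 3, rng), init_layer(3, 1, rng)]
risk = forward([0.2, -1.0, 0.5, 1.3], layers)
print(risk)
```

Adding more entries to `layers` deepens the network, which is what allows it to represent more complex non-linear patterns.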

The four main deep learning algorithms and their popularity. The data are generated by searching for the algorithm names in healthcare and disease categories on PubMed.



Natural language processing (NLP)

Image, EP and genetic data are machine-understandable, so ML algorithms can be applied directly after proper preprocessing or quality control. However, a large proportion of clinical information is in the form of narrative text, such as physical examination notes, clinical laboratory reports, operative notes and discharge summaries, which is unstructured and incomprehensible to computer programs. In this context, NLP aims to extract useful information from the narrative text to assist clinical decision making.

An NLP pipeline comprises two main components: (1) text processing and (2) classification. Through text processing, the pipeline identifies a series of disease-relevant keywords in the clinical notes based on historical databases. A subset of the keywords is then selected by examining their effects on the classification of normal and abnormal cases. The validated keywords then enter and enrich the structured data to support clinical decision making.
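The two components can be sketched as follows: keyword extraction from free text, keyword selection based on how well each keyword separates abnormal from normal cases, and conversion of a new note into structured columns. The lexicon, the example notes, the labels and the selection threshold `min_gap` are all hypothetical. The sketch also ignores negation (for example "no effusion"), which a real clinical NLP pipeline would need to handle.

```python
import re

# Hypothetical keyword lexicon; a real pipeline would derive it from historical databases.
LEXICON = ["infiltrate", "consolidation", "effusion", "clear", "unremarkable"]

def extract_keywords(note):
    """Text processing: return the lexicon keywords that appear in a note."""
    tokens = set(re.findall(r"[a-z]+", note.lower()))
    return {k for k in LEXICON if k in tokens}

def select_keywords(notes, labels, min_gap=0.5):
    """Keep keywords that are clearly more frequent in abnormal (1) than normal (0) notes."""
    abnormal = [extract_keywords(n) for n, y in zip(notes, labels) if y == 1]
    normal = [extract_keywords(n) for n, y in zip(notes, labels) if y == 0]
    selected = []
    for k in LEXICON:
        rate_abnormal = sum(k in s for s in abnormal) / len(abnormal)
        rate_normal = sum(k in s for s in normal) / len(normal)
        if rate_abnormal - rate_normal >= min_gap:
            selected.append(k)
    return selected

def to_structured(note, selected):
    """Enrichment: one binary column per validated keyword."""
    found = extract_keywords(note)
    return {k: int(k in found) for k in selected}

# Hypothetical chest X-ray report snippets with labels (1 = abnormal, 0 = normal).
notes = [
    "Right lower lobe infiltrate with small pleural effusion.",
    "Dense consolidation and infiltrate in the left base.",
    "Lungs are clear. Heart size unremarkable.",
    "Clear lung fields, no effusion.",
]
labels = [1, 1, 0, 0]
selected = select_keywords(notes, labels)
row = to_structured("New infiltrate in the right upper lobe.", selected)
print(selected, row)
```

The resulting dictionary is the kind of structured record that can be appended to a patient's other traits and passed to an ML classifier.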

NLP pipelines have been developed to assist clinical decision making by alerting physicians to treatment arrangements, monitoring adverse effects and so on. For example, introducing NLP for reading chest X-ray reports would help an antibiotic assistant system alert physicians to the possible need for anti-infective therapy. NLP has also been used to automatically monitor laboratory-based adverse effects.


AI Applications in Stroke

Stroke is a common disease that affects more than 500 million people worldwide. It is the leading cause of death in China and the fifth in North America. Stroke has cost about US$689 billion in medical expenses across the world, placing a heavy burden on countries and families. Research on the prevention and treatment of stroke is therefore of great significance. In recent years, AI techniques have been used in a growing number of stroke-related studies. Below we summarize some of the relevant AI techniques in the three main areas of stroke care: early disease prediction and diagnosis, treatment, and outcome prediction and prognosis evaluation.


Early detection and diagnosis

In about 85% of cases, stroke is caused by a thrombus in a cerebral vessel, known as cerebral infarction. However, because early stroke symptoms often go unrecognized, only a few patients receive timely treatment. Villar et al developed a movement-detecting device for early stroke prediction. Two ML algorithms, a genetic fuzzy finite state machine and PCA, were implemented in the device to build the model. The detection process included a human activity recognition stage and a stroke-onset detection stage. Once the patient’s movement differs significantly from the normal pattern, a stroke alert is activated so that the patient can be evaluated for treatment as soon as possible. Similarly, Mannini et al proposed a wearable device that collects data on normal and pathological gaits for stroke prediction. The data were modeled by hidden Markov models and SVM, and the algorithm correctly classified 90.5% of the subjects.

Neuroimaging techniques, including MRI and CT, are important for disease evaluation in the diagnosis of stroke. Some studies have applied ML methods to neuroimaging data to assist with stroke diagnosis. Rehme et al applied SVM to resting-state functional MRI data, by which endophenotypes of motor disability after stroke were identified and classified. SVM correctly classified patients with stroke with 87.6% accuracy. Griffis et al used naïve Bayes classification to identify stroke lesions in T1-weighted MRI. The result is comparable with manual lesion delineation by human experts. Kamnitsas et al used a three-dimensional CNN (3D CNN) for lesion segmentation in multimodal brain MRI. They also used a fully connected conditional random field model for final postprocessing of the CNN’s soft segmentation maps. Rondina et al analysed stroke anatomical MRI images using Gaussian process regression, and found that patterns of voxels performed better than lesion load per region as predictive features.

ML methods have also been applied to analyze CT scans from patients with stroke. A free-floating intraluminal thrombus may form as a lesion after stroke, and it is difficult to distinguish from carotid plaque on CT imaging. Thornhill et al used three ML algorithms (linear discriminant analysis, artificial neural network and SVM) to classify these two types by quantitative shape analysis. The accuracy of the methods varied between 65.2% and 76.4%.


Treatment

ML has also been applied to predicting and analysing the performance of stroke treatment. Intravenous thrombolysis (tPA) is a critical emergency measure, and its outcome is strongly related to prognosis and survival rate. Bentley et al used SVM to predict from CT scans whether patients receiving tPA treatment would develop symptomatic intracranial haemorrhage. They used whole-brain images as the input to the SVM, which performed better than conventional radiology-based methods. To improve the clinical decision-making process for tPA treatment, Love et al proposed a stroke treatment model built by analyzing practice guidelines, meta-analyses and clinical trials using a Bayesian belief network. The model consisted of 56 different variables and three decisions for analysing the procedure of diagnosis, treatment and outcome prediction. Ye et al used interaction trees and subgroup analysis to explore appropriate tPA dosage based on patient characteristics, taking into account both the risk of bleeding and the treatment efficacy.


Outcome prediction and prognosis evaluation

Many factors can affect stroke prognosis and disease mortality. Compared with conventional methods, ML methods have advantages in improving prediction performance. To better support the clinical decision-making process, Zhang et al proposed a model for predicting 3-month treatment outcome by analyzing physiological parameters during the 48 hours after stroke using logistic regression. Asadi et al compiled a database of clinical information of 107 patients with acute anterior or posterior circulation stroke who underwent intra-arterial therapy. The authors analyzed the data via artificial neural network and SVM, and obtained prediction accuracy above 70%. They also used ML techniques to identify factors influencing outcome in brain arteriovenous malformation treated with endovascular embolisation. While a standard regression analysis model could only achieve a 43% accuracy rate, their methods worked much better, with 97.5% accuracy.

Birkner et al used an optimal algorithm to predict 30-day mortality and obtained more accurate prediction than existing methods. Similarly, King et al used SVM to predict stroke mortality at discharge. In addition, they proposed the use of the synthetic minority oversampling technique to reduce the stroke outcome prediction bias caused by between-class imbalance among multiple data sets.
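For readers unfamiliar with the synthetic minority oversampling technique (SMOTE), its core idea can be sketched as follows: new minority-class samples are created by interpolating between a minority sample and one of its nearest minority neighbours. This is a generic sketch of the technique, not the implementation used in the study cited above; the two-trait minority samples, the function name `smote` and the parameter values are hypothetical.

```python
import random

def smote(minority, n_synthetic, k=2, seed=0):
    """Create synthetic minority samples by interpolating towards nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base sample (excluding itself).
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
        neighbour = rng.choice(others[:k])
        # Random position along the segment from base to neighbour.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Hypothetical minority class (for example, patients who died before discharge),
# described by two standardized traits.
minority = [[1.0, 2.0], [1.5, 2.5], [2.0, 2.2], [1.2, 3.0]]
synthetic = smote(minority, n_synthetic=4)
print(synthetic)
```

The synthetic samples are added to the training set only, so that the classifier sees a more balanced mix of outcomes; evaluation data should remain untouched.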

Brain images have been analysed to predict the outcome of stroke treatment. Chen et al analysed CT scan data via ML to evaluate cerebral oedema following hemispheric infarction. They built a random forest model to automatically identify cerebrospinal fluid and analyse the shifts on CT scans, which is more efficient and accurate than conventional methods. Siegel et al extracted functional connectivity from MRI and functional MRI data, and used ridge regression and multitask learning to predict cognitive deficiency after stroke. Hope et al studied the relationship between lesions extracted from MRI images and the treatment outcome via a Gaussian process regression model. They used the model to predict the severity of cognitive impairments after stroke and the course of recovery over time.


References

1. Murdoch TB, Detsky AS. The inevitable application of big data to health care. JAMA 2013;309:1351–2. doi:10.1001/jama.2013.393
2. Kolker E, Özdemir V, Kolker E. How Healthcare can refocus on its Super-Customers (Patients, n =1) and Customers (Doctors and Nurses) by Leveraging Lessons from Amazon, Uber, and Watson. OMICS 2016;20:329–33. doi:10.1089/omi.2016.0077
3. Dilsizian SE, Siegel EL. Artificial intelligence in medicine and cardiac imaging: harnessing big data and advanced computing to provide personalized medical diagnosis and treatment. Curr Cardiol Rep 2014;16:441. doi:10.1007/s11886-013-0441-8
4. Patel VL, Shortliffe EH, Stefanelli M, et al. The coming of age of artificial intelligence in medicine. Artif Intell Med 2009;46:5–17. doi:10.1016/j.artmed.2008.07.017
5. Jha S, Topol EJ. Adapting to Artificial Intelligence: radiologists and pathologists as information specialists. JAMA 2016;316:2353–4. doi:10.1001/jama.2016.17438
6. Pearson T. How to replicate Watson hardware and systems design for your own use in your basement. 2011. https://www.ibm.com/developerworks/community/blogs/InsideSystemStorage/entry/ibm_watson_how_to_build_your_own_watson_jr_in_your_basement7?lang=en (accessed 1 Jun 2017).
7. Weingart SN, Wilson RM, Gibberd RW, et al. Epidemiology of medical error. BMJ 2000;320:774–7. doi:10.1136/bmj.320.7237.774
8. Graber ML, Franklin N, Gordon R. Diagnostic error in internal medicine. Arch Intern Med 2005;165:1493–9. doi:10.1001/archinte.165.13.1493
9. Winters B, Custer J, Galvagno SM, et al. Diagnostic errors in the intensive care unit: a systematic review of autopsy studies. BMJ Qual Saf 2012;21:894–902. doi:10.1136/bmjqs-2012-000803
10. Lee CS, Nagy PG, Weaver SJ, et al. Cognitive and system factors contributing to diagnostic errors in radiology. AJR Am J Roentgenol 2013;201:611–7. doi:10.2214/AJR.12.10375