- Open access
- Published: 16 October 2023
Forecasting the future of artificial intelligence with machine learning-based link prediction in an exponentially growing knowledge network
- Mario Krenn ORCID: orcid.org/0000-0003-1620-9207 1 ,
- Lorenzo Buffoni 2 ,
- Bruno Coutinho 2 ,
- Sagi Eppel 3 ,
- Jacob Gates Foster 4 ,
- Andrew Gritsevskiy ORCID: orcid.org/0000-0001-8138-8796 3 , 5 , 6 ,
- Harlin Lee ORCID: orcid.org/0000-0001-6128-9942 4 ,
- Yichao Lu ORCID: orcid.org/0009-0001-2005-1724 7 ,
- João P. Moutinho 2 ,
- Nima Sanjabi ORCID: orcid.org/0009-0000-6342-5231 8 ,
- Rishi Sonthalia ORCID: orcid.org/0000-0002-0928-392X 4 ,
- Ngoc Mai Tran 9 ,
- Francisco Valente ORCID: orcid.org/0000-0001-6964-9391 10 ,
- Yangxinyu Xie ORCID: orcid.org/0000-0002-1532-6746 11 ,
- Rose Yu 12 &
- Michael Kopp 6
Nature Machine Intelligence volume 5, pages 1326–1335 (2023)
- Complex networks
- Computer science
- Research data
A tool that could suggest new personalized research directions and ideas by taking insights from the scientific literature could profoundly accelerate the progress of science. A field that might benefit from such an approach is artificial intelligence (AI) research, where the number of scientific publications has been growing exponentially over recent years, making it challenging for human researchers to keep track of the progress. Here we use AI techniques to predict the future research directions of AI itself. We introduce a graph-based benchmark based on real-world data, the Science4Cast benchmark, which aims to predict the future state of an evolving semantic network of AI. For that, we use more than 143,000 research papers and build a knowledge network with more than 64,000 concept nodes. We then present ten diverse methods to tackle this task, ranging from pure statistical to pure learning methods. Surprisingly, the most powerful methods use a carefully curated set of network features, rather than an end-to-end AI approach. These results indicate that there remains substantial untapped potential for purely ML approaches that do not require human domain knowledge. Ultimately, better predictions of new future research directions will be a crucial component of more advanced research suggestion tools.
The corpus of scientific literature grows at an ever-increasing speed. Specifically, in the field of artificial intelligence (AI) and machine learning (ML), the number of papers every month is growing exponentially with a doubling rate of roughly 23 months (Fig. 1 ). Simultaneously, the AI community is embracing diverse ideas from many disciplines such as mathematics, statistics and physics, making it challenging to organize different ideas and uncover new scientific connections. We envision a computer program that can automatically read, comprehend and act on AI literature. It can predict and suggest meaningful research ideas that transcend individual knowledge and cross-domain boundaries. If successful, it could greatly improve the productivity of AI researchers, open up new avenues of research and help drive progress in the field.
The number of papers per month doubles roughly every 23 months, which might, at some point, lead to problems for publishing in these fields. The categories are cs.AI, cs.LG, cs.NE and stat.ML.
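For illustration, a doubling time like this can be estimated from monthly paper counts with a simple log-linear fit. The sketch below uses synthetic counts (not the actual arXiv data) purely to show the procedure.

```python
import numpy as np

# Synthetic monthly paper counts following N(t) = N0 * 2^(t / T) with T = 23 months
# (illustrative only; the paper derives T ~ 23 months from actual arXiv submissions).
months = np.arange(0, 120)
counts = 50 * 2 ** (months / 23) * np.random.lognormal(0, 0.05, size=months.size)

# Fit log2(counts) = a * t + b; the doubling time is 1 / a.
a, b = np.polyfit(months, np.log2(counts), 1)
print(f"estimated doubling time: {1 / a:.1f} months")
```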
In this work, we address the ambitious vision of developing a data-driven approach to predict future research directions 1 . As new research ideas often emerge from connecting seemingly unrelated concepts 2 , 3 , 4 , we model the evolution of AI literature as a temporal network. We construct an evolving semantic network that encapsulates the content and development of AI research since 1994, with approximately 64,000 nodes (representing individual concepts) and 18 million edges (connecting jointly investigated concepts).
We use the semantic network as an input to ten diverse statistical and ML methods to predict the future evolution of the semantic network with high accuracy. That is, we can predict which combinations of concepts AI researchers will investigate in the future. Being able to predict what scientists will work on is a first crucial step for suggesting new topics that might have a high impact.
Several of the methods were contributed to the Science4Cast competition hosted by the 2021 IEEE International Conference on Big Data (IEEE BigData 2021). Broadly, we can divide the methods into two classes: methods that use hand-crafted network-theoretical features and those that learn features automatically. We found that models using carefully hand-crafted features outperform methods that attempt to learn features autonomously. This (somewhat surprising) finding indicates considerable room for improvement in models that are free of human priors.
Our paper introduces a real-world graph benchmark for AI, presents ten methods for solving it, and discusses how this task contributes to the larger goal of AI-driven research suggestions in AI and other disciplines. All methods are available at GitHub 5 .
Semantic networks
The goal here is to extract knowledge from the scientific literature that can subsequently be processed by computer algorithms. At first glance, a natural first step would be to use large language models (such as GPT-3 6 , Gopher 7 , Megatron 8 or PaLM 9 ) on each article to extract concepts and their relations automatically. However, these methods still struggle with reasoning 10 , 11 ; thus, it is not yet clear how they can be used to identify and suggest new ideas and concept combinations.
Rzhetsky et al. 12 pioneered an alternative approach, creating semantic networks in biochemistry from co-occurring concepts in scientific papers. There, nodes represent scientific concepts, specifically biomolecules, and are linked when a paper mentions both in its title or abstract. This evolving network captures the field’s history and, using supercomputer simulations, provides insights into scientists’ collective behaviour and suggests more efficient research strategies 13 . Although creating semantic networks from concept co-occurrences extracts only a small amount of knowledge from each paper, it captures non-trivial and actionable content when applied to large datasets 2 , 4 , 13 , 14 , 15 . PaperRobot extends this approach by predicting new links from large medical knowledge graphs and formulating new ideas in human language as paper drafts 16 .
This approach was applied and extended to quantum physics 17 by building a semantic network of over 6,000 concepts. There, the authors (including one of us) formulated the prediction of new research trends and connections as an ML task, with the goal of identifying concept pairs not yet jointly discussed in the literature but likely to be investigated in the future. This prediction task was one component for personalized suggestions of new research ideas.
Link prediction in semantic networks
We formulate the prediction of future research topics as a link-prediction task in an exponentially growing semantic network in the AI field. The goal is to predict which unconnected nodes, representing scientific concepts not yet jointly researched, will be connected in the future.
Link prediction is a common problem in computer science, addressed with classical metrics and features, as well as ML techniques. Network theory-based methods include local motif-based approaches 18 , 19 , 20 , 21 , 22 , linear optimization 23 , global perturbations 24 and stochastic block models 25 . ML works optimized a combination of predictors 26 , with further discussion in a recent review 27 .
In ref. 17 , 17 hand-crafted features were used for this task. In the Science4Cast competition, the goal was to find more precise methods for link-prediction tasks in semantic networks (a semantic network of AI that is ten times larger than the one in ref. 17 ).
Potential for idea generation in science
The long-term goal of predictions and suggestions in semantic networks is to provide new ideas to individual researchers. In a way, we hope to build a creative artificial muse in science 28 . We can bias or constrain the model to give topic suggestions related to the research interests of an individual scientist, or of a pair of scientists, to suggest topics for interdisciplinary collaborations.
Generation and analysis of the dataset
Dataset construction.
We create a dynamic semantic network using papers published on arXiv from 1992 to 2020 in the categories cs.AI, cs.LG, cs.NE and stat.ML. The 64,719 nodes represent AI concepts extracted from 143,000 paper titles and abstracts using Rapid Automatic Keyword Extraction (RAKE) and normalized via natural language processing (NLP) techniques and custom methods 29 . Although high-quality taxonomies such as the Computer Science Ontology (CSO) exist 30 , 31 , we choose not to use them for two reasons: the rapid growth of AI and ML may result in new concepts not yet in the CSO, and not all scientific domains have high-quality taxonomies like CSO. Our goal is to build a scalable approach applicable to any domain of science. However, future research could investigate merging these approaches (see ‘Extensions and future work’).
Concepts form the nodes of the semantic network, and edges are drawn when concepts co-appear in a paper title or abstract. Edges have time stamps based on the paper’s publication date, and multiple time-stamped edges between concepts are common. The network is edge-weighted, and the weight of an edge stands for the number of papers that connect two concepts. In total, this creates a time-evolving semantic network, depicted in Fig. 2 .
Utilizing 143,000 AI and ML papers on arXiv from 1992 to 2020, we create a list of concepts using RAKE and other NLP tools, which form nodes in a semantic network. Edges connect concepts that co-occur in titles or abstracts, resulting in an evolving network that expands as more concepts are jointly investigated. The task involves predicting which unconnected nodes (concepts not yet studied together) will connect within a few years. We present ten diverse statistical and ML methods to address this challenge.
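A minimal sketch of how such an edge-weighted, time-stamped co-occurrence network can be assembled from per-paper concept lists. The data structures and example concepts are illustrative and not the authors' actual pipeline.

```python
from collections import defaultdict
from itertools import combinations

# Each paper: (publication date, list of concepts found in its title/abstract).
papers = [
    ("2019-06-01", ["neural network", "reinforcement learning", "inverted pendulum"]),
    ("2020-02-15", ["neural network", "transfer learning"]),
]

# Multigraph as a dict: (concept_a, concept_b) -> list of time stamps.
# The edge weight is the number of time stamps (papers) per concept pair.
edges = defaultdict(list)
for date, concepts in papers:
    for a, b in combinations(sorted(set(concepts)), 2):
        edges[(a, b)].append(date)

for pair, stamps in edges.items():
    print(pair, "weight:", len(stamps), "dates:", stamps)
```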
Network-theoretical analysis
The published semantic network has 64,719 nodes and 17,892,352 unique undirected edges, with a mean node degree of 553. Many hub nodes greatly exceed this mean degree, as shown in Fig. 3 . For example, the highest node degrees are 466,319 (neural network), 198,050 (deep learning), 195,345 (machine learning), 169,555 (convolutional neural network), 159,403 (real world), 150,227 (experimental result), 127,642 (deep neural network) and 115,334 (large scale). We fit a power-law curve to the degree distribution p(k) using ref. 32 and obtained p(k) ∝ k^−2.28 for degree k ≥ 1,672. However, real complex network degree distributions often follow power laws with exponential cut-offs 33 . Recent work 34 has indicated that lognormal distributions fit most real-world networks better than power laws. Likelihood ratio tests from ref. 32 suggest that truncated power law (P = 0.0031), lognormal (P = 0.0045) and lognormal positive (P = 0.015) fit better than a power law, while exponential (P = 3 × 10^−10) and stretched exponential (P = 6 × 10^−5) fit worse. At a significance threshold of P ≤ 0.1, we could not conclusively determine the best-fitting distribution.
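This comparison of candidate distributions can be reproduced with the powerlaw package of ref. 32. A sketch, assuming `degrees` holds the node degrees of the semantic network (placeholder values below).

```python
import powerlaw  # package from ref. 32

# degrees: array of node degrees of the semantic network (illustrative values only).
degrees = [553, 1672, 466319, 198050, 64, 64, 2, 128]

fit = powerlaw.Fit(degrees)  # estimates xmin and the power-law exponent
print("alpha:", fit.power_law.alpha, "xmin:", fit.power_law.xmin)

# Likelihood-ratio tests against alternative heavy-tailed distributions.
for alt in ["truncated_power_law", "lognormal", "exponential", "stretched_exponential"]:
    R, p = fit.distribution_compare("power_law", alt)
    print(f"power_law vs {alt}: R = {R:.2f}, p = {p:.4f}")
```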
Nodes with the highest (466,319) and lowest (2) non-zero degrees are neural network and video compression technique, respectively. The most frequent non-zero degree is 64 (which occurs 313 times). The plot, in log scale, omits 1,247 nodes with zero degrees.
We observe changes in network connectivity over time. Although degree distributions remained heavy-tailed, the ordering of nodes within the tail changed due to popularity trends. The most connected nodes and the years they became so include decision tree (1994), machine learning (1996), logic program (2000), neural network (2005), experimental result (2011), machine learning (2013, for a second time) and neural network (2015).
Connected component analysis in Fig. 4 reveals that the network grew more connected over time, with the largest group expanding and the number of connected components decreasing. Mid-sized connected components’ trajectories may expose trends, like image processing. A connected component with four nodes appeared in 1999 (brightness change, planar curve, local feature, differential invariant), and three more joined in 2000 (similarity transformation, template matching, invariant representation). In 2006, a paper discussing support vector machine and local feature merged this mid-sized group with the largest connected component.
Primary (left, blue) vertical axis: number of connected components with more than one node. Secondary (right, orange) vertical axis: number of nodes in the largest connected component. For example, the network in 2019 comprises one large connected component with 63,472 nodes and 1,247 isolated nodes, that is, nodes with no edges. In contrast, the 2001 network has 19 connected components with size greater than one, the largest of which has 2,733 nodes.
The semantic network reveals increasing centralization over time, with a smaller percentage of nodes (concepts) contributing to a larger fraction of edges (concept combinations). Figure 5 shows that the fraction of edges for high-degree nodes rises, while it decreases for low-degree nodes. The decreasing average clustering coefficient over time supports this trend, suggesting nodes are more likely to connect to high-degree central nodes. This could be due to the AI community’s focus on a few dominating methods or more consistent terminology use.
This cumulative histogram illustrates the fraction of nodes (concepts) corresponding to the fraction of edges (connections) for given years (1999, 2003, 2007, 2011, 2015 and 2019). The graph was generated by adding edges and nodes dated before each year. Nodes are sorted by increasing degrees. The y value at x = 80 represents the fraction of edges contributed by all nodes in and below the 80th percentile of degrees.
Problem formulation
At a high level, we aim to make predictions in an exponentially growing semantic network. The specific task is to predict which pairs of nodes v1 and v2, each with degree d(v1), d(v2) ≥ c and lacking an edge in the year (2021 − δ), will have w edges in 2021. We use δ = 1, 3, 5, c = 0, 5, 25 and w = 1, 3, where c is a minimal degree. Note that c = 0 is an intriguing special case in which the nodes may not have any edge in the initial year, requiring the model to predict connections for concepts that are entirely unconnected at that point. The task w = 3 goes beyond simple link prediction and seeks to identify uninvestigated concept pairs that will appear together in at least three papers. An interesting alternative task could be predicting the fastest-growing links, denoted as ‘trend’ prediction.
In this task, we provide a list of 10 million unconnected node pairs (each node having a degree ≥ c ) for the year (2021 − δ ), with the goal of sorting this list by descending probability that they will have at least w edges in 2021.
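A sketch of how such a candidate list might be assembled: unconnected node pairs whose degrees meet the cut-off c, later to be ranked by a model's predicted link probability. The sampling scheme and graph below are illustrative, not the benchmark's actual construction.

```python
import random
import networkx as nx

def sample_unconnected_pairs(G, c=5, n_pairs=1000, seed=0):
    """Sample node pairs with degree >= c that share no edge in G."""
    rng = random.Random(seed)
    eligible = [v for v in G.nodes if G.degree(v) >= c]
    pairs = set()
    while len(pairs) < n_pairs:
        u, v = rng.sample(eligible, 2)
        if not G.has_edge(u, v):
            pairs.add((min(u, v), max(u, v)))
    return list(pairs)

# Toy usage (the real benchmark lists 10 million such pairs):
G = nx.gnm_random_graph(200, 800, seed=1)
candidates = sample_unconnected_pairs(G, c=5, n_pairs=100)
print(len(candidates), "candidate pairs")
```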
For evaluation, we employ the receiver operating characteristic (ROC) curve 35 , which plots the true-positive rate against the false-positive rate at various threshold settings. We use the area under the curve (AUC) of the ROC curve as our evaluation metric. The advantage of AUC over mean square error is its independence from the data distribution. Specifically, in our case, where the two classes have a highly asymmetric distribution (with only about 1–3% of newly connected edges) and the distribution changes over time, AUC offers meaningful interpretation. Perfect predictions yield AUC = 1, whereas random predictions result in AUC = 0.5. AUC represents the percentage that a random true element is ranked higher than a random false one. For other metrics, see ref. 36 .
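Given model scores for the candidate pairs and binary labels indicating whether a pair gained at least w edges by the target year, the AUC can be computed directly, for example with scikit-learn. A small illustrative sketch:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# y_true: 1 if the unconnected pair acquired >= w edges by the target year, else 0.
# y_score: model-assigned probability (or any monotone ranking score).
y_true = np.array([0, 0, 1, 0, 1, 0, 0, 0, 0, 1])
y_score = np.array([0.1, 0.3, 0.9, 0.2, 0.7, 0.05, 0.4, 0.15, 0.1, 0.8])

auc = roc_auc_score(y_true, y_score)
print(f"AUC = {auc:.3f}")  # 1.0 = perfect ranking, 0.5 = random
```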
To tackle this task, models can use the complete information of the semantic network from the year (2021 − δ ) in any way possible. In our case, all presented models generate a dataset for learning to make predictions from (2021 − 2 δ ) to (2021 − δ ). Once the models successfully complete this task, they are applied to the test dataset to make predictions from (2021 − δ ) to 2021. All reported AUCs are based on the test dataset. Note that solving the test dataset is especially challenging due to the δ -year shift, causing systematic changes such as the number of papers and density of the semantic network.
AI-based solutions
We demonstrate various methods to predict new links in a semantic network, ranging from pure statistical approaches and neural networks with hand-crafted features (NF) to ML models without NF. The results are shown in Fig. 6 , with the highest AUC scores achieved by methods using NF as ML model inputs. Pure network features without ML are competitive, while pure ML methods have yet to outperform those with NF. Predicting links generated at least three times can achieve a quasi-deterministic AUC > 99.5%, suggesting an interesting target for computational sociology and science of science research. We have performed numerous tests to exclude data leakage in the benchmark dataset, overfitting or data duplication both in the set of articles and the set of concepts. We rank methods based on their performance, with model M1 as the best performing and model M8 as the least effective (for the prediction of a new edge with δ = 3, c = 0). Models M4 and M7 are subdivided into M4A, M4B, M7A and M7B, differing in their focus on feature or embedding selection (more details in Methods ).
Here we show the AUC values for different models that use machine learning techniques (ML), hand-crafted network features (NF) or a combination thereof. The left plot shows results for the prediction of a single new link (that is, w = 1) and the right plot shows the results for the prediction of new triple links w = 3. The task is to predict δ = [1, 3, 5] years into the future, with cut-off values c = [0, 5, 25]. We sort the models by the results for the task (w = 1, δ = 3, c = 0), which was the task in the Science4Cast competition. Data points that are not shown have an AUC below 0.6 or were not computed owing to computational costs. All AUC values reported are computed on a validation dataset, δ years ahead of the training dataset, that the models have never seen. Note that the prediction of new triple edges can be performed nearly deterministically. It will be interesting to understand the origin of this quasi-deterministic pattern in AI research, for example, by connecting it to the research interests of scientists 88 .
Model M1: NF + ML. This approach combines tree-based gradient boosting with graph neural networks, using extensive feature engineering to capture node centralities, proximity and temporal evolution 37 . The Light Gradient Boosting Machine (LightGBM) model 38 is employed with heavy regularization to combat overfitting due to the scarcity of positive examples, while a time-aware graph neural network learns dynamic node representations.
Model M2: NF + ML. This method utilizes node and edge features (as well as their first and second derivatives) to predict link formation probabilities 39 . Node features capture popularity, and edge features measure similarity. A multilayer perceptron with rectified linear unit (ReLU) activation is used for learning. Cold start issues are addressed with feature imputation.
Model M3: NF + ML. This method captures hand-crafted node features over multiple time snapshots and employs a long short-term memory (LSTM) to learn time dependencies 40 . The features were selected to be highly informative while having a low computational cost. The final configuration uses degree centrality, degree of neighbours and common neighbours as features. The LSTM outperforms fully connected neural networks.
Model M4: pure NF. Two purely statistical methods, preferential attachment 41 and common neighbours 27 , are used 42 . Preferential attachment is based on node degrees, while common neighbours relies on the number of shared neighbours. Both methods are computationally inexpensive and perform competitively with some learning-based models.
Model M5: NF + ML. Here, ten groups of first-order graph features are extracted to obtain neighbourhood and similarity properties, with principal component analysis 43 applied for dimensionality reduction 44 . A random forest classifier is trained on the balanced dataset to predict new links.
Model M6: NF + ML. The baseline solution uses 15 hand-crafted features as input to a four-layer neural network, predicting the probability of link formation between node pairs 17 .
Model M7: end-to-end ML (auto node embedding). The baseline solution is modified to use node2vec 45 and ProNE embeddings 46 instead of hand-crafted features. The embeddings are input to a neural network with two hidden layers for link prediction.
Model M8: end-to-end ML (transformers). This method learns features in an unsupervised manner using transformers 47 . Node2vec embeddings 45 , 48 are generated for various snapshots of the adjacency matrix, and a transformer model 49 is pre-trained as a feature extractor. A two-layer ReLU network is used for classification.
Extensions and future work
Developing an AI that suggests research topics to scientists is a complex task, and our link-prediction approach in temporal networks is just the beginning. We highlight key extensions and future work directly related to the ultimate goal of AI for AI.
High-quality predictions without feature engineering. Interestingly, the most effective methods utilized carefully crafted features on a graph with extracted concepts as nodes and edges representing their joint publication history. Investigating whether end-to-end deep learning can solve tasks without feature engineering will be a valuable next step.
Fully automated concept extraction. Current concept lists, generated by RAKE’s statistical text analysis, demand time-consuming code development to address irrelevant term extraction (for example, verbs, adjectives). A fully automated NLP technique that accurately extracts meaningful concepts without manual code intervention would greatly enhance the process.
Leveraging ontology taxonomies. Alongside fully automated concept extraction, utilizing established taxonomies such as the CSO 30 , 31 , Wikipedia-extracted concepts, book indices 17 or PhySH key phrases is crucial. Although not comprehensive for all domains, these curated datasets often contain hierarchical and relational concept information, greatly improving prediction tasks.
Incorporating relation extraction. Future work could explore relation extraction techniques for constructing more accurate, sparser semantic networks. By discerning and classifying meaningful concept relationships in abstracts 50 , 51 , a refined AI literature representation is attainable. Using NLP tools for entity recognition, relationship identification and classification, this approach may enhance prediction performance and novel research direction identification.
Generation of new concepts. Our work predicts links between known concepts, but generating new concepts using AI remains a challenge. This unsupervised task, as explored in refs. 52 , 53 , involves detecting concept clusters with dynamics that signal new concept formation. Incorporating emerging concepts into the current framework for suggesting research topics is an intriguing future direction.
Semantic information beyond concept pairs. Currently, abstracts and titles are compressed into concept pairs, but more comprehensive information extraction could yield meaningful predictions. Exploring complex data structures such as hypergraphs 54 may be computationally demanding, but clever tricks could reduce complexity, as shown in ref. 55 . Investigating sociological factors or drawing inspiration from material science approaches 56 may also improve prediction tasks. A recent dataset for the study of the science of science also includes more complex data structures than the ones used in our paper, including data from social networks such as Twitter 57 .
Predictions of scientific success. While predicting new links between concepts is valuable, assessing their potential impact is essential for high-quality suggestions. Introducing a metric of success, like estimated citation numbers or citation growth rate, can help gauge the importance of these connections. Adapting citation prediction techniques from the science of science 58 , 59 , 60 , 61 to semantic networks offers a promising research direction.
Anomaly detections. Predicting likely connections may not align with finding surprising research directions. One method for identifying surprising suggestions involves constraining cosine similarity between vertices 62 , which measures shared neighbours and can be associated with semantic (dis)similarity. Another approach is detecting anomalies in semantic networks, which are potential links with extreme properties 63 , 64 . While scientists often focus on familiar topics 3 , 4 , greater impact results from unexpected combinations of distant domains 12 , encouraging the search for surprising associations.
End-to-end formulation. Our method breaks down the goal of extracting knowledge from scientific literature into subtasks, contrasting with end-to-end deep learning that tackles problems directly without subproblems 65 , 66 . End-to-end approaches have shown great success in various domains 67 , 68 , 69 . Investigating whether such an end-to-end solution can achieve similar success in our context would be intriguing.
Our method represents a crucial step towards developing a tool that can assist scientists in uncovering novel avenues for exploration. We are confident that our outlined ideas and extensions pave the way for achieving practical, personalized, interdisciplinary AI-based suggestions for new impactful discoveries. We firmly believe that such a tool holds the potential to become an influential catalyst, transforming the way scientists approach research questions and collaborate in their respective fields.
Details on concept set generation and application
In this section, we provide details on the generation of our list of 64,719 concepts. For more information, the code is accessible on GitHub . The entire approach is designed for immediate scalability to other domains.
Initially, we utilized approximately 143,000 arXiv papers from the categories cs.AI, cs.LG, cs.NE and stat.ML spanning 1992 to 2020. The omission of earlier data has a negligible effect on our research question, as we show below. We then iterated over each individual article, employing RAKE (with an extended stopword list) to suggest concept candidates, which were subsequently stored.
Following the iteration, we retained concepts composed of at least two words (for example, neural network) appearing in six or more articles, as well as concepts comprising a minimum of three words (for example, recurrent neural network) appearing in three or more articles. This initial filter substantially reduced noise generated by RAKE, resulting in a list of 104,948 concepts.
Lastly, we developed an automated filtering tool to further enhance the quality of the concept list. This tool identified common, domain-independent errors made by RAKE, which primarily included phrases that were not concepts (for example, dataset provided or discuss open challenge). We compiled a list of 543 words not part of meaningful concepts, including verbs, ordinal numbers, conjunctions and adverbials. Ultimately, this process produced our final list of 64,719 concepts employed in our study. No further semantic concept/entity linking is applied.
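A simplified sketch of this extraction-and-filter step: RAKE candidates per abstract (here via the rake_nltk package, whereas the authors use RAKE with their own extended stopword list), followed by the frequency filter that keeps two-word phrases appearing in at least six articles and longer phrases appearing in at least three articles. The texts and thresholds on package internals are assumptions for illustration.

```python
from collections import Counter
from rake_nltk import Rake  # assumes rake_nltk (and its NLTK data) is installed

abstracts = [
    "We train a recurrent neural network for sequence prediction ...",
    "A neural network is applied to image classification ...",
    # ... ~143,000 titles and abstracts in the real pipeline
]

rake = Rake()  # the paper uses an extended stopword list; default stopwords here
doc_freq = Counter()
for text in abstracts:
    rake.extract_keywords_from_text(text)
    phrases = {p for p in rake.get_ranked_phrases() if len(p.split()) >= 2}
    doc_freq.update(phrases)  # count each phrase once per article

concepts = [
    p for p, n in doc_freq.items()
    if (len(p.split()) == 2 and n >= 6) or (len(p.split()) >= 3 and n >= 3)
]
print(len(concepts), "candidate concepts before the manual stopword filter")
```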
By this construction, the test sets with c = 0 could lead to very rare contamination of the dataset. That is because each concept will have at least one edge in the final dataset. The effects, however, are negligible.
The distribution of concepts in the articles can be seen in Extended Data Fig. 1 . As an example, we show the extraction of concepts from five randomly chosen papers:
Memristor hardware-friendly reinforcement learning 70 : ‘actor critic algorithm’, ‘neuromorphic hardware implementation’, ‘hardware neural network’, ‘neuromorphic hardware system’, ‘neural network’, ‘large number’, ‘reinforcement learning’, ‘case study’, ‘pre training’, ‘training procedure’, ‘complex task’, ‘high performance’, ‘classical problem’, ‘hardware implementation’, ‘synaptic weight’, ‘energy efficient’, ‘neuromorphic hardware’, ‘control theory’, ‘weight update’, ‘training technique’, ‘actor critic’, ‘nervous system’, ‘inverted pendulum’, ‘explicit supervision’, ‘hardware friendly’, ‘neuromorphic architecture’, ‘hardware system’.
Automated deep learning analysis of angiography video sequences for coronary artery disease 71 : ‘deep learning approach’, ‘coronary artery disease’, ‘deep learning analysis’, ‘traditional image processing’, ‘deep learning’, ‘image processing’, ‘f1 score’, ‘video sequence’, ‘error rate’, ‘automated analysis’, ‘coronary artery’, ‘vessel segmentation’, ‘key frame’, ‘visual assessment’, ‘analysis method’, ‘analysis pipeline’, ‘coronary angiography’, ‘geometrical analysis’.
Demographic influences on contemporary art with unsupervised style embeddings 72 : ‘classification task’, ‘social network’, ‘data source’, ‘visual content’, ‘graph network’, ‘demographic information’, ‘social connection’, ‘visual style’, ‘historical dataset’, ‘novel information’
The utility of general domain transfer learning for medical language tasks 73 : ‘natural language processing’, ‘long short term memory’, ‘logistic regression model’, ‘transfer learning technique’, ‘short term memory’, ‘average f1 score’, ‘class classification model’, ‘domain transfer learning’, ‘weighted average f1 score’, ‘medical natural language processing’, ‘natural language process’, ‘transfer learning’, ‘f1 score’, ’natural language’, ’deep model’, ’logistic regression’, ’model performance’, ’classification model’, ’text classification’, ’regression model’, ’nlp task’, ‘short term’, ‘medical domain’, ‘weighted average’, ‘class classification’, ‘bert model’, ‘language processing’, ‘biomedical domain’, ‘domain transfer’, ‘nlp model’, ‘main model’, ‘general domain’, ‘domain model’, ‘medical text’.
Fast neural architecture construction using envelopenets 74 : ‘neural network architecture’, ‘neural architecture search’, ‘deep network architecture’, ‘image classification problem’, ‘neural architecture search method’, ‘neural network’, ‘reinforcement learning’, ‘deep network’, ‘image classification’, ‘objective function’, ‘network architecture’, ‘classification problem’, ‘evolutionary algorithm’, ‘neural architecture’, ‘base network’, ‘architecture search’, ‘training epoch’, ‘search method’, ‘image class’, ‘full training’, ‘automated search’, ‘generated network’, ‘constructed network’, ‘gpu day’.
Time gap between the generation of edges
We use articles from arXiv, which only goes back to the year 1992, whereas the field of AI has existed since at least the 1960s 75 . This raises the question of whether the omission of the first 30–40 years of research has a crucial impact on the prediction task we formulate, specifically, whether edges that we consider new might not be so new after all. In Extended Data Fig. 2 , we therefore compute the time between the formation of edges between the same concepts, taking into account either all edges or just the first edge. We see that the vast majority of edges are formed within short time periods; thus, the omission of early publications has a negligible effect on our question. Of course, different questions might be crucially affected by the early data, so a careful choice of the data source is crucial 61 .
Positive examples in the test dataset
Table 1 shows the number of positive cases within the 10 million examples in the 18 test datasets that are used for evaluation.
Publication rates in quantum physics
Another field of research that has gained a lot of attention in recent years is quantum physics. This field is also a strong adopter of arXiv. Thus, we analyse it in the same way as we analysed AI in Fig. 1 . In Extended Data Fig. 3 , we find no obvious exponential increase in papers per month. A detailed analysis of other domains is beyond the current scope. It will be interesting to investigate the growth rates of different scientific disciplines in more detail, especially given that exponential increases have been observed in several aspects of the science of science 3 , 76 .
Details on models M1–M8
What follows are more detailed explanations of the models presented in the main text. All code is available on GitHub. The feature importance of the best model, M1, is shown here; those of the other models are analysed in the respective workshop contributions (cited in the subsections).
Details on M1
The best-performing solution is based on a blend of a tree-based gradient boosting approach and a graph neural network approach 37 . Extensive feature engineering was conducted to capture the centralities of the nodes, the proximity between node pairs and their evolution over time. The centrality of a node is captured by the number of neighbours and the PageRank score 77 , while the proximity between a node pair is derived using the Jaccard index. We refer the reader to ref. 37 for the list of all features and their feature importance.
The tree-based gradient boosting approach uses LightGBM 38 and applies heavy regularization to combat overfitting due to the scarcity of positive samples. The graph neural network approach employs a time-aware graph neural network to learn node representations on dynamic semantic networks. The feature importance of model M1, averaged over 18 datasets, is shown in Table 2 . It shows that the temporal features contribute substantially to the model performance, but the model remains strong even when they are removed. An example of the evolution of the training set (from 2016 to 2019) and test set (2019 to 2021) for δ = 3, c = 25, w = 1 is shown in Extended Data Fig. 4 .
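A hedged sketch of a heavily regularized LightGBM classifier of the kind described for the gradient-boosting part of M1. The specific hyperparameter values and the random features are illustrative, not those of ref. 37.

```python
import numpy as np
from lightgbm import LGBMClassifier

# X: hand-crafted pair features (degrees, PageRank, Jaccard index, temporal deltas, ...);
# y: whether the unconnected pair forms a link within delta years.
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 20))
y = (rng.random(5000) < 0.03).astype(int)  # ~3% positives, as in the benchmark

model = LGBMClassifier(
    n_estimators=500,
    learning_rate=0.02,
    num_leaves=31,
    min_child_samples=100,   # strong regularization against the scarce positives
    reg_alpha=1.0,           # L1 regularization
    reg_lambda=5.0,          # L2 regularization
    subsample=0.8,
    colsample_bytree=0.8,
)
model.fit(X, y)
scores = model.predict_proba(X)[:, 1]  # ranking scores for AUC evaluation
```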
Details on M2
The second method assumes that the probability that nodes u and v form an edge in the future is a function of the node features f ( u ), f ( v ) and some edge feature h ( u , v ). We chose node features f that capture popularity at the current time t 0 (such as degree, clustering coefficient 78 , 79 and PageRank 77 ). We also use these features’ first and second time derivatives to capture the evolution of the node’s popularity over time. After variable selection during training, we chose h to consist of the HOP-rec score (high-order proximity for implicit recommendation) 80 , 81 and a variation of the Dice similarity score 82 as a measure of similarity between nodes. In summary, we use 31 node features for each node, and two edge features, which gives 31 × 2 + 2 = 64 features in total. These features are then fed into a small multilayer perceptron (5 layers, each with 13 neurons) with ReLU activation.
Cold start is the problem that some nodes in the test set do not appear in the training set. Our strategy for a cold start is imputation. We say a node v is seen if it appeared in the training data, and unseen otherwise; similarly, we say that a node is born at time t if t is the first time stamp where an edge linking this node has appeared. The idea is that an unseen node is simply a node born in the future, so its features should look like a recently born node in the training set. If a node is unseen, then we impute its features as the average of the features of the nodes born recently. We found that with imputation during training, the test AUC scores across all models consistently increased by about 0.02. For a complete description of this method, we refer the reader to ref. 39 .
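A sketch of the imputation idea: feature rows of unseen nodes are replaced by the average feature vector of nodes born shortly before the prediction year. The window, array layout and toy values are illustrative assumptions.

```python
import numpy as np

def impute_unseen(features, birth_year, is_seen, current_year, window=2):
    """Replace feature rows of unseen nodes by the mean features of recently born nodes.

    features  : (n_nodes, n_features) array
    birth_year: year of each node's first edge
    is_seen   : boolean mask, True if the node appeared in the training data
    """
    recent = (birth_year >= current_year - window) & is_seen
    fill = features[recent].mean(axis=0)
    out = features.copy()
    out[~is_seen] = fill
    return out

# Toy usage:
feats = np.random.rand(6, 3)
born = np.array([2015, 2019, 2020, 2018, 2020, 2020])
seen = np.array([True, True, True, True, False, False])
print(impute_unseen(feats, born, seen, current_year=2020))
```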
Details on M3
This approach, detailed in ref. 40 , uses hand-crafted node features that have been captured in multiple time snapshots (for example, every year) and then uses an LSTM to benefit from learning the time dependencies of these features. The final configuration uses two main types of feature: node features including degree and degree of neighbours, and edge features including common neighbours. In addition, to balance the training data, the same number of positive and negative instances have been randomly sampled and combined.
One of the goals was to identify features that are very informative with a very low computational cost. We found that the degree centrality of the nodes is the most important feature, and the degree centrality of the neighbouring nodes and the degree of mutual neighbours gave us the best trade-off. As all of the extracted features’ distributions are highly skewed to the right, meaning most of the features take near zero values, using a power transform such as Yeo–Johnson 83 helps to make the distributions more Gaussian, which boosts the learning. Finally, for the link-prediction task, we saw that LSTMs perform better than fully connected neural networks.
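The skew correction can be done, for example, with scikit-learn's PowerTransformer, which implements the Yeo–Johnson transform of ref. 83. An illustrative sketch with synthetic right-skewed features:

```python
import numpy as np
from scipy.stats import skew
from sklearn.preprocessing import PowerTransformer

# Right-skewed features (most values near zero), as observed for the node features.
X = np.random.exponential(scale=0.1, size=(1000, 3))

pt = PowerTransformer(method="yeo-johnson", standardize=True)
X_gaussian = pt.fit_transform(X)  # roughly Gaussian features for the LSTM input

print("skew before:", np.round(skew(X), 3))
print("skew after :", np.round(skew(X_gaussian), 3))
```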
Details on M4
The following two methods are based on a purely statistical analysis of the test data and are explained in detail in ref. 42 .
Preferential attachment. In the network analysis, we concluded that the growth of this dataset tends to maintain a heavy-tailed degree distribution, often associated with scale-free networks. As mentioned before, the γ value of the degree distribution is very close to 2, suggesting that preferential attachment 41 is probably the main organizational principle of the network. As such, we implemented a simple prediction model following this procedure. Preferential attachment scores in link prediction are often quantified as

s_ij = k_i · k_j,

with k_i and k_j the degrees of nodes i and j. However, this assumes the scoring of links between nodes that are already connected to the network, that is, k_i, k_j > 0, which is not the case for all the links we must score in the dataset. As a result, we define our preferential attachment model as

s_ij = (k_i + 1) · (k_j + 1).
Using this simple model with no free parameters, we could score new links and compare them with the other models. We immediately note that preferential attachment outperforms some learning-based models; although it never reaches the top AUC, it is extremely simple and has negligible computational cost.
Common neighbours. We explore another network-based approach to score the links. Indeed, while the preferential attachment model we derived performed well, it uses no information about the distance between i and j, which is a popular feature used in link-prediction methods 27 . As such, we decided to test a method known as common neighbours 18 . We define Γ(i) as the set of neighbours of node i and Γ(i) ∩ Γ(j) as the set of common neighbours between nodes i and j. We can easily score the nodes with

s_ij = |Γ(i) ∩ Γ(j)|,

the intuition being that nodes that share a larger number of neighbours are more likely to be connected than distant nodes that do not share any.
Evaluating this score for each pair (i, j) in the dataset of unconnected pairs (the common-neighbour counts can be read off from the second power of the adjacency matrix, A^2), we obtained an AUC that is sometimes higher and sometimes lower than that of preferential attachment, but consistently close to the best learning-based models.
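Both statistical baselines reduce to simple operations on the adjacency matrix. A sketch with scipy.sparse, using the scoring conventions as reconstructed above; the random adjacency matrix and candidate pairs are placeholders.

```python
import numpy as np
import scipy.sparse as sp

# Unweighted adjacency matrix of the semantic network at the prediction year (toy here).
A = sp.random(1000, 1000, density=0.01, format="csr", random_state=0)
A = ((A + A.T) > 0).astype(np.int8)  # symmetrize and binarize
A.setdiag(0)

degrees = np.asarray(A.sum(axis=1)).ravel()
A2 = (A @ A).tocsr()  # (A^2)[i, j] = number of common neighbours of i and j

def preferential_attachment(i, j):
    return (degrees[i] + 1) * (degrees[j] + 1)

def common_neighbours(i, j):
    return A2[i, j]

pairs = [(3, 17), (42, 999)]  # unconnected candidate pairs to score
pa = [preferential_attachment(i, j) for i, j in pairs]
cn = [common_neighbours(i, j) for i, j in pairs]
print("preferential attachment:", pa, "common neighbours:", cn)
```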
Details on M5
This method is based on ref. 44 . First, ten groups of first-order graph features are extracted to get some neighbourhood and similarity properties from each pair of nodes: degree centrality of nodes, pair’s total number of neighbours, common neighbours index, Jaccard coefficient, Simpson coefficient, geometric coefficient, cosine coefficient, Adamic–Adar index, resource allocation index and preferential attachment index. They are obtained for three consecutive years to capture the temporal dynamics of the semantic network, leading to a total of 33 features. Second, principal component analysis 43 is applied to reduce the correlation between features, speed up the learning process and improve generalization, which results in a final set of seven latent variables. Lastly, a random forest classifier is trained (using a balanced dataset) to estimate the likelihood of new links between the AI concepts.
In this paper, a modification was made relative to the original formulation of the method 44 : two of the original features, average neighbour degree and clustering coefficient, were infeasible to extract for some of the tasks covered in this paper, as their computation can be heavy for such a large network, and they were therefore discarded. Owing to computational memory issues, it was not possible to run the model for some of the tasks covered in this study, so those results are missing.
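For illustration, the M5 pipeline described above can be sketched with scikit-learn: PCA reduction of the 33 temporal graph features to seven components, followed by a random forest trained on a balanced sample. The feature values and forest size below are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline

# X: 33 graph features per pair (first-order features over three yearly snapshots);
# y: balanced labels (equal numbers of positive and negative examples).
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 33))
y = np.concatenate([np.ones(1000, dtype=int), np.zeros(1000, dtype=int)])

model = make_pipeline(
    PCA(n_components=7),                                 # seven latent variables, as in M5
    RandomForestClassifier(n_estimators=300, random_state=0),
)
model.fit(X, y)
link_probability = model.predict_proba(X)[:, 1]
```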
Details on M6
The baseline solution for the Science4Cast competition was closely related to the model presented in ref. 17 . It uses 15 hand-crafted features of a pair of nodes v1 and v2: the degrees of v1 and v2 in the current year and the previous two years (six properties), the total numbers of shared neighbours of v1 and of v2 in the current year and the previous two years (six properties), and the numbers of shared neighbours between v1 and v2 in the current year and the previous two years (three properties). These 15 features are the input of a neural network with four layers (15, 100, 10 and 1 neurons) that predicts whether the nodes v1 and v2 will have w edges in the future. After training, the model computes the probability for all 10 million evaluation examples; this list is sorted and the AUC is computed.
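A hedged sketch of snapshot-based pair features of the kind used by M6, assuming each yearly snapshot is available as a networkx graph. The exact 15-feature definition follows ref. 17 and is only approximated here; the four-layer neural network on top is omitted.

```python
import networkx as nx

def pair_features(snapshots, v1, v2):
    """Degree and shared-neighbour features of (v1, v2) over yearly graph snapshots."""
    feats = []
    for G in snapshots:  # e.g. the current year and the two previous years
        d1 = G.degree(v1) if v1 in G else 0
        d2 = G.degree(v2) if v2 in G else 0
        common = 0
        if v1 in G and v2 in G:
            common = len(set(G.neighbors(v1)) & set(G.neighbors(v2)))
        feats += [d1, d2, common]
    return feats

# Toy usage with three yearly snapshots:
snapshots = [nx.gnm_random_graph(50, 150, seed=s) for s in range(3)]
print(pair_features(snapshots, 3, 7))
```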
Details on M7
The solution M7 was not part of the Science4Cast competition and is therefore not described in the corresponding proceedings, so we provide more details here.
The most immediate way one can apply ML to this problem is by automating the detection of features. Quite simply, the baseline solution M6 is modified such that, instead of 15 hand-crafted features, the neural network is trained on features extracted from a graph embedding. We use two different embedding approaches. The first method employs node2vec (M7A) 45 , for which we use the implementation provided in the nodevectors Python package 84 . The second one uses the ProNE embedding (M7B) 46 , which is based on sparse matrix factorizations modulated by the higher-order Cheeger inequality 85 .
The embeddings generate a 32-dimensional representation for each node, resulting in edge representations in [0, 1]^64 . These features are input into a neural network with two hidden layers of size 1,000 and 30. Like M6, the model computes the probability for evaluation examples to determine the ROC. We compare ProNE to node2vec, a common graph embedding method using a biased random walk procedure with return and in–out parameters, which greatly affect network encoding. Initial experiments used default values for a 64-dimensional encoding before inputting into the neural network. The higher variance in node2vec predictions is probably due to its sensitivity to hyperparameters. While ProNE is better suited for general multi-dataset link prediction, node2vec’s sensitivity may help identify crucial network features for predicting temporal evolution.
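A sketch of this embedding-based pipeline, assuming the interface of the nodevectors package cited for M7A: node vectors are concatenated into a 64-dimensional edge representation and fed to a small classifier. The toy graph, labels and the scikit-learn classifier stand in for the neural network described above.

```python
import numpy as np
import networkx as nx
from nodevectors import Node2Vec  # ProNE is available from the same package (M7B)
from sklearn.neural_network import MLPClassifier

G = nx.gnm_random_graph(300, 1500, seed=0)  # stand-in for the semantic network

# 32-dimensional node embeddings (M7A); ProNE(n_components=32) would be the M7B variant.
n2v = Node2Vec(n_components=32)
n2v.fit(G)
emb = {v: n2v.predict(v) for v in G.nodes}

def edge_repr(u, v):
    """Concatenate the two node embeddings into a 64-dimensional edge vector."""
    return np.concatenate([emb[u], emb[v]])

pairs = [(1, 5), (10, 42), (7, 200)]  # candidate unconnected pairs (toy)
labels = [0, 1, 0]                    # toy labels
X = np.array([edge_repr(u, v) for u, v in pairs])

clf = MLPClassifier(hidden_layer_sizes=(1000, 30), max_iter=200)  # two hidden layers, as in M7
clf.fit(X, labels)
```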
Details on M8
This model, which is detailed in ref. 47 , does not use any hand-crafted features but learns them in a completely unsupervised manner. To do so, we extract various snapshots of the adjacency matrix through time, capturing graphs in the form of A_t for t = 1994, …, 2019. We then embed each of these graphs into 128-dimensional Euclidean space via node2vec 45 , 48 . For each node u in the semantic graph, we extract different 128-dimensional vector embeddings n_u(A_1994), …, n_u(A_2019).
Transformers have performed extremely well in NLP tasks 49 ; thus, we apply them to learn the dynamics of the embedding vectors. We pre-train a transformer to help classify node pairs. For the transformer, the encoder and decoder had 6 layers each; we used 128 as the embedding dimension, 2,048 as the feed-forward dimension and 8-headed attention. This transformer acts as our feature extractor. Once we pre-train our transformer, we add a two-layer ReLU network with hidden dimension 128 as a classifier on top.
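A PyTorch sketch of a transformer acting as a feature extractor over the yearly node2vec embeddings, using the dimensions stated above (128-dimensional embeddings, 6 layers, feed-forward size 2,048, 8 heads). It is simplified relative to ref. 47: only an encoder is used, and the pre-training objective is omitted.

```python
import torch
import torch.nn as nn

class NodeHistoryEncoder(nn.Module):
    """Transformer over the sequence of yearly node2vec embeddings of a node pair."""
    def __init__(self, d_model=128, nhead=8, num_layers=6, dim_ff=2048):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, nhead, dim_ff, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Sequential(  # two-layer ReLU classifier on the pair features
            nn.Linear(2 * d_model, 128), nn.ReLU(), nn.Linear(128, 1)
        )

    def forward(self, seq_u, seq_v):
        # seq_u, seq_v: (batch, years, 128) node2vec embeddings of the two nodes
        fu = self.encoder(seq_u).mean(dim=1)  # pool over the time dimension
        fv = self.encoder(seq_v).mean(dim=1)
        return self.head(torch.cat([fu, fv], dim=-1)).squeeze(-1)

model = NodeHistoryEncoder()
u = torch.randn(4, 26, 128)  # 26 yearly snapshots, 1994-2019
v = torch.randn(4, 26, 128)
logits = model(u, v)         # link-formation logits for four candidate pairs
```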
Data availability
All 18 datasets tested in this paper are available via Zenodo at https://doi.org/10.5281/zenodo.7882892 ref. 86 .
Code availability
All of the models and codes described above can be found via GitHub at https://github.com/artificial-scientist-lab/FutureOfAIviaAI ref. 5 and a permanent Zenodo record at https://zenodo.org/record/8329701 ref. 87 .
Clauset, A., Larremore, D. B. & Sinatra, R. Data-driven predictions in the science of science. Science 355 , 477–480 (2017).
Evans, J. A. & Foster, J. G. Metaknowledge. Science 331 , 721–725 (2011).
Fortunato, S. et al. Science of science. Science 359 , eaao0185 (2018).
Wang, D. & Barabási, A.-L. The Science of Science (Cambridge Univ. Press, 2021).
Krenn, M. et al. FutureOfAIviaAI. GitHub https://github.com/artificial-scientist-lab/FutureOfAIviaAI (2023).
Brown, T. et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 33 , 1877–1901 (2020).
Rae, J. W. et al. Scaling language models: methods, analysis & insights from training gopher. Preprint at https://arxiv.org/abs/2112.11446 (2021).
Smith, S. et al. Using DeepSpeed and Megatron to train Megatron-Turing NLG 530B, a large-scale generative language model. Preprint at https://arxiv.org/abs/2201.11990 (2022).
Chowdhery, A. et al. Palm: scaling language modeling with pathways. Preprint at https://arxiv.org/abs/2204.02311 (2022).
Kojima, T., Gu, S. S., Reid, M., Matsuo, Y. & Iwasawa, Y. Large language models are zero-shot reasoners. Preprint at https://arxiv.org/abs/2205.11916 (2022).
Zhang, H., Li, L. H., Meng, T., Chang, K.-W. & Broeck, G. V. d. On the paradox of learning to reason from data. Preprint at https://arxiv.org/abs/2205.11502 (2022).
Rzhetsky, A., Foster, J. G., Foster, I. T. & Evans, J. A. Choosing experiments to accelerate collective discovery. Proc. Natl Acad. Sci. USA 112 , 14569–14574 (2015).
Foster, J. G., Rzhetsky, A. & Evans, J. A. Tradition and innovation in scientists’ research strategies. Am. Sociol. Rev. 80 , 875–908 (2015).
Van Eck, N. J. & Waltman, L. Text mining and visualization using vosviewer. Preprint at https://arxiv.org/abs/1109.2058 (2011).
Van Eck, N. J. & Waltman, L. in Measuring Scholarly Impact: Methods and Practice (eds Ding, Y. et al.) 285–320 (Springer, 2014).
Wang, Q. et al. Paperrobot: Incremental draft generation of scientific ideas. Preprint at https://arxiv.org/abs/1905.07870 (2019).
Krenn, M. & Zeilinger, A. Predicting research trends with semantic and neural networks with an application in quantum physics. Proc. Natl Acad. Sci. USA 117 , 1910–1916 (2020).
Liben-Nowell, D. & Kleinberg, J. The link-prediction problem for social networks. J. Am. Soc. Inf. Sci. Technol. 58 , 1019–1031 (2007).
Albert, I. & Albert, R. Conserved network motifs allow protein–protein interaction prediction. Bioinformatics 20 , 3346–3352 (2004).
Zhou, T., Lü, L. & Zhang, Y.-C. Predicting missing links via local information. Eur. Phys. J. B 71 , 623–630 (2009).
Kovács, I. A. et al. Network-based prediction of protein interactions. Nat. Commun. 10 , 1240 (2019).
Muscoloni, A., Abdelhamid, I. & Cannistraci, C. V. Local-community network automata modelling based on length-three-paths for prediction of complex network structures in protein interactomes, food webs and more. Preprint at bioRxiv https://doi.org/10.1101/346916 (2018).
Pech, R., Hao, D., Lee, Y.-L., Yuan, Y. & Zhou, T. Link prediction via linear optimization. Physica A 528 , 121319 (2019).
Lü, L., Pan, L., Zhou, T., Zhang, Y.-C. & Stanley, H. E. Toward link predictability of complex networks. Proc. Natl Acad. Sci. USA 112 , 2325–2330 (2015).
Guimerà, R. & Sales-Pardo, M. Missing and spurious interactions and the reconstruction of complex networks. Proc. Natl Acad. Sci. USA 106 , 22073–22078 (2009).
Ghasemian, A., Hosseinmardi, H., Galstyan, A., Airoldi, E. M. & Clauset, A. Stacking models for nearly optimal link prediction in complex networks. Proc. Natl Acad. Sci. USA 117 , 23393–23400 (2020).
Zhou, T. Progresses and challenges in link prediction. iScience 24 , 103217 (2021).
Krenn, M. et al. On scientific understanding with artificial intelligence. Nat. Rev. Phys. 4 , 761–769 (2022).
Rose, S., Engel, D., Cramer, N. & Cowley, W. in Text Mining: Applications and Theory (eds Berry, M. W. & Kogan, J.) Ch. 1 (Wiley, 2010).
Salatino, A. A., Thanapalasingam, T., Mannocci, A., Osborne, F. & Motta, E. The computer science ontology: a large-scale taxonomy of research areas. In Proc. Semantic Web–ISWC 2018: 17th International Semantic Web Conference Part II Vol. 17, 187–205 (Springer, 2018).
Salatino, A. A., Osborne, F., Thanapalasingam, T. & Motta, E. The CSO classifier: ontology-driven detection of research topics in scholarly articles. In Proc. Digital Libraries for Open Knowledge: 23rd International Conference on Theory and Practice of Digital Libraries Vol. 23, 296–311 (Springer, 2019).
Alstott, J., Bullmore, E. & Plenz, D. powerlaw: a Python package for analysis of heavy-tailed distributions. PLoS ONE 9 , e85777 (2014).
Fenner, T., Levene, M. & Loizou, G. A model for collaboration networks giving rise to a power-law distribution with an exponential cutoff. Soc. Netw. 29 , 70–80 (2007).
Broido, A. D. & Clauset, A. Scale-free networks are rare. Nat. Commun. 10 , 1017 (2019).
Fawcett, T. ROC graphs: notes and practical considerations for researchers. Pattern Recognit. Lett. 31 , 1–38 (2004).
Sun, Y., Wong, A. K. & Kamel, M. S. Classification of imbalanced data: a review. Int. J. Pattern Recognit. Artif. Intell. 23 , 687–719 (2009).
Lu, Y. Predicting research trends in artificial intelligence with gradient boosting decision trees and time-aware graph neural networks. In 2021 IEEE International Conference on Big Data (Big Data) 5809–5814 (IEEE, 2021).
Ke, G. et al. LightGBM: a highly efficient gradient boosting decision tree. In Proc. 31st International Conference on Neural Information Processing Systems 3149–3157 (Curran Associates Inc., 2017).
Tran, N. M. & Xie, Y. Improving random walk rankings with feature selection and imputation Science4Cast competition, team Hash Brown. In 2021 IEEE International Conference on Big Data (Big Data) 5824–5827 (IEEE, 2021).
Sanjabi, N. Efficiently predicting scientific trends using node centrality measures of a science semantic network. In 2021 IEEE International Conference on Big Data (Big Data) 5820–5823 (IEEE, 2021).
Barabási, A.-L. Network science. Phil. Trans. R. Soci. A 371 , 20120375 (2013).
Moutinho, J. P., Coutinho, B. & Buffoni, L. Network-based link prediction of scientific concepts—a Science4Cast competition entry. In 2021 IEEE International Conference on Big Data (Big Data) 5815–5819 (IEEE, 2021).
Jolliffe, I. T. & Cadima, J. Principal component analysis: a review and recent developments. Phil. Trans. R. Soc. A 374 , 20150202 (2016).
Valente, F. Link prediction of artificial intelligence concepts using low computational power. In 2021 IEEE International Conference on Big Data (Big Data) 5828–5832 (2021).
Grover, A. & Leskovec, J. node2vec: scalable feature learning for networks. In Proc. 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 855–864 (ACM, 2016).
Zhang, J., Dong, Y., Wang, Y., Tang, J. & Ding, M. ProNE: fast and scalable network representation learning. In Proc. Twenty-Eighth International Joint Conference on Artificial Intelligence 4278–4284 (International Joint Conferences on Artificial Intelligence Organization, 2019).
Lee, H., Sonthalia, R. & Foster, J. G. Dynamic embedding-based methods for link prediction in machine learning semantic network. In 2021 IEEE International Conference on Big Data (Big Data) 5801–5808 (IEEE, 2021).
Liu, R. & Krishnan, A. PecanPy: a fast, efficient and parallelized python implementation of node2vec. Bioinformatics 37 , 3377–3379 (2021).
Vaswani, A. et al. Attention is all you need. In Proc. 31st International Conference on Neural Information Processing Systems 6000–6010 (Curran Associates Inc., 2017).
Zelenko, D., Aone, C. & Richardella, A. Kernel methods for relation extraction. J. Mach. Learn. Res. 3 , 1083–1106 (2003).
Bach, N. & Badaskar, S. A review of relation extraction. Literature Review for Language and Statistics II 2 , 1–15 (2007).
Salatino, A. A., Osborne, F. & Motta, E. How are topics born? Understanding the research dynamics preceding the emergence of new areas. PeerJ Comput. Sc. 3 , e119 (2017).
Salatino, A. A., Osborne, F. & Motta, E. AUGUR: forecasting the emergence of new research topics. In Proc. 18th ACM/IEEE on Joint Conference on Digital Libraries 303–312 (IEEE, 2018).
Battiston, F. et al. The physics of higher-order interactions in complex systems. Nat. Phys. 17 , 1093–1098 (2021).
Coutinho, B. C., Wu, A.-K., Zhou, H.-J. & Liu, Y.-Y. Covering problems and core percolations on hypergraphs. Phys. Rev. Lett. 124 , 248301 (2020).
Olivetti, E. A. et al. Data-driven materials research enabled by natural language processing and information extraction. Appl. Phys. Rev. 7 , 041317 (2020).
Lin, Z., Yin, Y., Liu, L. & Wang, D. SciSciNet: a large-scale open data lake for the science of science research. Sci. Data 10 , 315 (2023).
Azoulay, P. et al. Toward a more scientific science. Science 361 , 1194–1197 (2018).
Liu, H., Kou, H., Yan, C. & Qi, L. Link prediction in paper citation network to construct paper correlation graph. EURASIP J. Wirel. Commun. Netw. 2019 , 1–12 (2019).
Reisz, N. et al. Loss of sustainability in scientific work. New J. Phys. 24 , 053041 (2022).
Frank, M. R., Wang, D., Cebrian, M. & Rahwan, I. The evolution of citation graphs in artificial intelligence research. Nat. Mach. Intell. 1 , 79–85 (2019).
Newman, M. Networks (Oxford Univ. Press, 2018).
Kwon, D. et al. A survey of deep learning-based network anomaly detection. Cluster Comput. 22 , 949–961 (2019).
Pang, G., Shen, C., Cao, L. & Hengel, A. V. D. Deep learning for anomaly detection: a review. ACM Comput. Surv. 54 , 1–38 (2021).
Collobert, R. et al. Natural language processing (almost) from scratch. J. Mach. Learn. Res. 12 , 2493–2537 (2011).
LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521 , 436–444 (2015).
Krizhevsky, A., Sutskever, I. & Hinton, G. E. ImageNet classification with deep convolutional neural networks. Commun. ACM 60 , 84–90 (2017).
Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518 , 529–533 (2015).
Silver, D. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529 , 484–489 (2016).
Wu, N., Vincent, A., Strukov, D. & Xie, Y. Memristor hardware-friendly reinforcement learning. Preprint at https://arxiv.org/abs/2001.06930 (2020).
Zhou, C. et al. Automated deep learning analysis of angiography video sequences for coronary artery disease. Preprint at https://arxiv.org/abs/2101.12505 (2021).
Huckle, N., Garcia, N. & Nakashima, Y. Demographic influences on contemporary art with unsupervised style embeddings. In Proc. Computer Vision–ECCV 2020 Workshops Part II Vol. 16, 126–142 (Springer, 2020).
Ranti, D. et al. The utility of general domain transfer learning for medical language tasks. Preprint at https://arxiv.org/abs/2002.06670 (2020).
Kamath, P., Singh, A. & Dutta, D. Fast neural architecture construction using envelopenets. Preprint at https://arxiv.org/abs/1803.06744 (2018).
Minsky, M. Steps toward artificial intelligence. Proc. IRE 49 , 8–30 (1961).
Bornmann, L., Haunschild, R. & Mutz, R. Growth rates of modern science: a latent piecewise growth curve approach to model publication numbers from established and new literature databases. Humanit. Soc. Sci. Commun. 8 , 224 (2021).
Brin, S. & Page, L. The anatomy of a large-scale hypertextual web search engine. Comput. Netw. ISDN Syst. 30 , 107–117 (1998).
Holland, P. W. & Leinhardt, S. Transitivity in structural models of small groups. Comp. Group Studies 2 , 107–124 (1971).
Watts, D. J. & Strogatz, S. H. Collective dynamics of ‘small-world’ networks. Nature 393 , 440–442 (1998).
Yang, J.-H., Chen, C.-M., Wang, C.-J. & Tsai, M.-F. HOP-rec: high-order proximity for implicit recommendation. In Proc. 12th ACM Conference on Recommender Systems 140–144 (2018).
Lin, B.-Y. OGB_collab_project. GitHub https://github.com/brucenccu/OGB_collab_project (2021).
Sorensen, T. A. A method of establishing groups of equal amplitude in plant sociology based on similarity of species content and its application to analyses of the vegetation on danish commons. Biol. Skar. 5 , 1–34 (1948).
Yeo, I.-K. & Johnson, R. A. A new family of power transformations to improve normality or symmetry. Biometrika 87 , 954–959 (2000).
Ranger, M. nodevectors. GitHub https://github.com/VHRanger/nodevectors (2021).
Bandeira, A. S., Singer, A. & Spielman, D. A. A Cheeger inequality for the graph connection Laplacian. SIAM J. Matrix Anal. Appl. 34 , 1611–1630 (2013).
Krenn, M. et al. Predicting the future of AI with AI. Zenodo https://doi.org/10.5281/zenodo.7882892 (2023).
Krenn, M. et al. FutureOfAIviaAI code. Zenodo https://zenodo.org/record/8329701 (2023).
Jia, T., Wang, D. & Szymanski, B. K. Quantifying patterns of research-interest evolution. Nat. Hum. Behav. 1 , 0078 (2017).
Acknowledgements
We thank IARAI Vienna and IEEE for supporting and hosting the IEEE BigData Competition Science4Cast. We are specifically grateful to D. Kreil, M. Neun, C. Eichenberger, M. Spanring, H. Martin, D. Geschke, D. Springer, P. Herruzo, M. McCutchan, A. Mihai, T. Furdui, G. Fratica, M. Vázquez, A. Gruca, J. Brandstetter and S. Hochreiter for helping to set up and successfully execute the competition and the corresponding workshop. We thank X. Gu for creating Fig. 2 , and M. Aghajohari and M. Sadegh Akhondzadeh for helpful comments on the paper. The work of H.L., R.S. and J.G.F. was supported by grant TWCF0333 from the Templeton World Charity Foundation. H.L. is additionally supported by NSF grant DMS-1952339. J.P.M. acknowledges the support of FCT (Portugal) through scholarship SFRH/BD/144151/2019. B.C. thanks the support from FCT/MCTES through national funds and when applicable co-funded EU funds under the project UIDB/50008/2020, and FCT through the project CEECINST/00117/2018/CP1495/CT0001. N.M.T. and Y.X. are supported by NSF grant DMS-2113468, the NSF IFML 2019844 award to the University of Texas at Austin, and the Good Systems Research Initiative, part of University of Texas at Austin Bridging Barriers.
Open access funding provided by Max Planck Society.
Author information
Authors and Affiliations
Max Planck Institute for the Science of Light (MPL), Erlangen, Germany
Mario Krenn
Instituto de Telecomunicações, Lisbon, Portugal
Lorenzo Buffoni, Bruno Coutinho & João P. Moutinho
University of Toronto, Toronto, Ontario, Canada
Sagi Eppel & Andrew Gritsevskiy
University of California Los Angeles, Los Angeles, CA, USA
Jacob Gates Foster, Harlin Lee & Rishi Sonthalia
Cavendish Laboratories, Cavendish, VT, USA
Andrew Gritsevskiy
Institute of Advanced Research in Artificial Intelligence (IARAI), Vienna, Austria
Andrew Gritsevskiy & Michael Kopp
Alpha 8 AI, Toronto, Ontario, Canada

Yichao Lu
Independent Researcher, Barcelona, Spain
Nima Sanjabi
University of Texas at Austin, Austin, TX, USA
Ngoc Mai Tran
Independent Researcher, Leiria, Portugal
Francisco Valente
University of Pennsylvania, Philadelphia, PA, USA
Yangxinyu Xie
University of California, San Diego, CA, USA

Rose Yu
Contributions
M. Krenn and R.Y. initiated the research. M. Krenn and M. Kopp organized the Science4Cast competition. M. Krenn generated the datasets and initial codes. S.E. and H.L. analysed the network-theoretical properties of the semantic network. M. Krenn, L.B., B.C., J.G.F., A.G., H.L., Y.L., J.P.M., N.S., R.S., N.M.T., F.V., Y.X. and M. Kopp provided codes for the ten models. M. Krenn wrote the paper with input from all co-authors.
Corresponding author
Correspondence to Mario Krenn .
Ethics declarations
Competing interests.
The authors declare no competing interests.
Peer review
Peer review information.
Nature Machine Intelligence thanks Alexander Belikov, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Mirko Pieropan, in collaboration with the Nature Machine Intelligence team.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1
Number of concepts per article.
Extended Data Fig. 2
Time gap between the generation of edges. Left: the time it takes to create a new edge between two vertices; right: the time between the first and the second edge.
Extended Data Fig. 3
Publications in Quantum Physics.
Extended Data Fig. 4
Evolution of the AUC during training for Model M1.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ .
About this article
Cite this article.
Krenn, M., Buffoni, L., Coutinho, B. et al. Forecasting the future of artificial intelligence with machine learning-based link prediction in an exponentially growing knowledge network. Nat Mach Intell 5 , 1326–1335 (2023). https://doi.org/10.1038/s42256-023-00735-0
Received : 21 January 2023
Accepted : 11 September 2023
Published : 16 October 2023
Issue Date : November 2023
DOI : https://doi.org/10.1038/s42256-023-00735-0
- Review article
- Open access
- Published: 28 October 2019
Systematic review of research on artificial intelligence applications in higher education – where are the educators?
- Olaf Zawacki-Richter ORCID: orcid.org/0000-0003-1482-8303 1 ,
- Victoria I. Marín ORCID: orcid.org/0000-0002-4673-6190 1 ,
- Melissa Bond ORCID: orcid.org/0000-0002-8267-031X 1 &
- Franziska Gouverneur 1
International Journal of Educational Technology in Higher Education volume 16 , Article number: 39 ( 2019 ) Cite this article
354k Accesses
1093 Citations
241 Altmetric
Metrics details
According to various international reports, Artificial Intelligence in Education (AIEd) is one of the currently emerging fields in educational technology. Whilst it has been around for about 30 years, it is still unclear for educators how to take pedagogical advantage of it on a broader scale, and how it can actually have a meaningful impact on teaching and learning in higher education. This paper seeks to provide an overview of research on AI applications in higher education through a systematic review. Out of 2656 initially identified publications for the period between 2007 and 2018, 146 articles were included for final synthesis, according to explicit inclusion and exclusion criteria. The descriptive results show that most of the disciplines involved in AIEd papers come from Computer Science and STEM, and that quantitative methods were the most frequently used in empirical studies. The synthesis of results presents four areas of AIEd applications in academic support services, and institutional and administrative services: 1. profiling and prediction, 2. assessment and evaluation, 3. adaptive systems and personalisation, and 4. intelligent tutoring systems. The conclusions reflect on the near absence of critical reflection on the challenges and risks of AIEd, the weak connection to theoretical pedagogical perspectives, and the need for further exploration of ethical and educational approaches in the application of AIEd in higher education.
Introduction
Artificial intelligence (AI) applications in education are on the rise and have received a lot of attention in the last couple of years. AI and adaptive learning technologies are prominently featured as important developments in educational technology in the 2018 Horizon report (Educause, 2018 ), with a time to adoption of 2 or 3 years. According to the report, experts anticipate AI in education to grow by 43% in the period 2018–2022, although the Horizon Report 2019 Higher Education Edition (Educause, 2019 ) projects that AI applications related to teaching and learning will grow even more significantly than this. Contact North, a major Canadian non-profit online learning society, concludes that “there is little doubt that the [AI] technology is inexorably linked to the future of higher education” (Contact North, 2018 , p. 5). With heavy investments by private companies such as Google, which acquired European AI start-up Deep Mind for $400 million, and also non-profit public-private partnerships such as the German Research Centre for Artificial Intelligence Footnote 1 (DFKI), it is very likely that this wave of interest will soon have a significant impact on higher education institutions (Popenici & Kerr, 2017 ). The Technical University of Eindhoven in the Netherlands, for example, recently announced that they will launch an Artificial Intelligence Systems Institute with 50 new professorships for education and research in AI. Footnote 2
The application of AI in education (AIEd) has been the subject of research for about 30 years. The International AIEd Society (IAIED) was launched in 1997, and publishes the International Journal of AI in Education (IJAIED), with the 20th annual AIEd conference being organised this year. However, on a broader scale, educators have just started to explore the potential pedagogical opportunities that AI applications afford for supporting learners during the student life cycle.
Despite the enormous opportunities that AI might afford to support teaching and learning, new ethical implications and risks arise with the development of AI applications in higher education. For example, in times of budget cuts, it might be tempting for administrators to replace teaching with profitable automated AI solutions. Faculty members, teaching assistants, student counsellors, and administrative staff may fear that intelligent tutors, expert systems and chatbots will take their jobs. AI has the potential to advance the capabilities of learning analytics, but on the other hand, such systems require huge amounts of data, including confidential information about students and faculty, which raises serious issues of privacy and data protection. Some institutions have recently been established, such as the Institute for Ethical AI in Education Footnote 3 in the UK, to produce a framework for ethical governance for AI in education, and the Analysis & Policy Observatory published a discussion paper in April 2019 to develop an AI ethics framework for Australia. Footnote 4
Russel and Norvig ( 2010 ) remind us in their leading textbook on artificial intelligence, “All AI researchers should be concerned with the ethical implications of their work” (p. 1020). Thus, we would like to explore what kinds of ethical implications and risks are reflected upon by authors in the field of AI-enhanced education. The aim of this article is to provide an overview for educators of research on AI applications in higher education. Given the dynamic development in recent years, and the growing interest of educators in this field, a review of the literature on AI in higher education is warranted.
Specifically, this paper addresses the following research questions in three areas, by means of a systematic review (see Gough, Oliver, & Thomas, 2017 ; Petticrew & Roberts, 2006 ):
How have publications on AI in higher education developed over time, in which journals are they published, and where are they coming from in terms of geographical distribution and the authors’ disciplinary affiliations?
How is AI in education conceptualised and what kind of ethical implications, challenges and risks are considered?
What is the nature and scope of AI applications in the context of higher education?
The field of AI originates in computer science and engineering, but it is strongly influenced by other disciplines such as philosophy, cognitive science, neuroscience, and economics. Given the interdisciplinary nature of the field, there is little agreement among AI researchers on a common definition and understanding of AI – and intelligence in general (see Tegmark, 2018 ). With regard to the introduction of AI-based tools and services in higher education, Hinojo-Lucena, Aznar-Díaz, Cáceres-Reche, and Romero-Rodríguez ( 2019 ) note that “this technology [AI] is already being introduced in the field of higher education, although many teachers are unaware of its scope and, above all, of what it consists of” (p. 1). For the purpose of our analysis of artificial intelligence in higher education, it is desirable to clarify terminology. Thus, in the next section, we explore definitions of AI in education, and the elements and methods that AI applications might entail in higher education, before we proceed with the systematic review of the literature.
AI in education (AIEd)
The birth of AI goes back to the 1950s when John McCarthy organised a two-month workshop at Dartmouth College in the USA. In the workshop proposal, McCarthy used the term artificial intelligence for the first time in 1956 (Russel & Norvig, 2010 , p. 17):
The study [of artificial intelligence] is to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it. An attempt will be made to find how to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves.
Baker and Smith ( 2019 ) provide a broad definition of AI: “Computers which perform cognitive tasks, usually associated with human minds, particularly learning and problem-solving” (p. 10). They explain that AI does not describe a single technology. It is an umbrella term to describe a range of technologies and methods, such as machine learning, natural language processing, data mining, neural networks or algorithms.
AI and machine learning are often mentioned in the same breath. Machine learning is a method of AI for supervised and unsupervised classification and profiling, for example to predict the likelihood that a student will drop out of a course or be admitted to a programme, or to identify topics in written assignments. Popenici and Kerr ( 2017 ) define machine learning “as a subfield of artificial intelligence that includes software able to recognise patterns, make predictions, and apply newly discovered patterns to situations that were not included or covered by their initial design” (p. 2).
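For illustration, here is a minimal supervised-learning sketch of the dropout example in Python with scikit-learn; the features, data and model choice are invented placeholders rather than the method of any particular study reviewed here.

```python
# Hypothetical illustration: predicting dropout risk from toy student features.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Invented per-student features: [GPA, logins per week, assignments submitted]
X = [[3.1, 12, 8], [2.0, 2, 1], [3.7, 20, 9], [1.8, 1, 0],
     [2.9, 10, 7], [2.2, 3, 2], [3.5, 15, 9], [1.9, 2, 1]]
y = [0, 1, 0, 1, 0, 1, 0, 1]  # 1 = dropped out, 0 = retained

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print("held-out accuracy:", accuracy_score(y_test, model.predict(X_test)))
# The predicted probability of class 1 can serve as a dropout-risk score
# that triggers an early intervention.
print("risk for a new student:", model.predict_proba([[2.4, 4, 3]])[0][1])
```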
The concept of rational agents is central to AI: “An agent is anything that can be viewed as perceiving its environment through sensors and acting upon that environment through actuators” (Russel & Norvig, 2010 , p. 34). The vacuum-cleaner robot is a very simple form of an intelligent agent, but things become very complex and open-ended when we think about an automated taxi.
Experts in the field distinguish between weak and strong AI (see Russel & Norvig, 2010 , p. 1020) or narrow and general AI (see Baker & Smith, 2019 , p. 10). A philosophical question remains whether machines will be able to actually think or even develop consciousness in the future, rather than just simulating thinking and showing rational behaviour. It is unlikely that such strong or general AI will exist in the near future. We are therefore dealing here with GOFAI (“good old-fashioned AI”, a term coined by the philosopher John Haugeland, 1985 ) in higher education – in the sense of agents and information systems that act as if they were intelligent.
Given this understanding of AI, what are potential areas of AI applications in education, and higher education in particular? Luckin, Holmes, Griffiths, and Forcier ( 2016 ) describe three categories of AI software applications in education that are available today: a) personal tutors, b) intelligent support for collaborative learning, and c) intelligent virtual reality.
Intelligent tutoring systems (ITS) can be used to simulate one-to-one personal tutoring. Based on learner models, algorithms and neural networks, they can make decisions about the learning path of an individual student and the content to select, provide cognitive scaffolding and help, to engage the student in dialogue. ITS have enormous potential, especially in large-scale distance teaching institutions, which run modules with thousands of students, where human one-to-one tutoring is impossible. A vast array of research shows that learning is a social exercise; interaction and collaboration are at the heart of the learning process (see for example Jonassen, Davidson, Collins, Campbell, & Haag, 1995 ). However, online collaboration has to be facilitated and moderated (Salmon, 2000 ). AIEd can contribute to collaborative learning by supporting adaptive group formation based on learner models, by facilitating online group interaction or by summarising discussions that can be used by a human tutor to guide students towards the aims and objectives of a course. Finally, also drawing on ITS, intelligent virtual reality (IVR) is used to engage and guide students in authentic virtual reality and game-based learning environments. Virtual agents can act as teachers, facilitators or students’ peers, for example, in virtual or remote labs (Perez et al., 2017 ).
With the advancement of AIEd and the availability of (big) student data and learning analytics, Luckin et al. ( 2016 ) claim a “[r]enaissance in assessment” (p. 35). AI can provide just-in-time feedback and assessment. Rather than stop-and-test, AIEd can be built into learning activities for an ongoing analysis of student achievement. Algorithms have been used to predict the probability of a student failing an assignment or dropping out of a course with high levels of accuracy (e.g. Bahadır, 2016 ).
In their recent report, Baker and Smith ( 2019 ) approach educational AI tools from three different perspectives; a) learner-facing, b) teacher-facing, and c) system-facing AIEd. Learner-facing AI tools are software that students use to learn a subject matter, i.e. adaptive or personalised learning management systems or ITS. Teacher-facing systems are used to support the teacher and reduce his or her workload by automating tasks such as administration, assessment, feedback and plagiarism detection. AIEd tools also provide insight into the learning progress of students so that the teacher can proactively offer support and guidance where needed. System-facing AIEd are tools that provide information for administrators and managers on the institutional level, for example to monitor attrition patterns across faculties or colleges.
In the context of higher education, we use the concept of the student life-cycle (see Reid, 1995 ) as a framework to describe the various AI based services on the broader institutional and administrative level, as well as for supporting the academic teaching and learning process in the narrower sense.
The purpose of a systematic review is to answer specific questions, based on an explicit, systematic and replicable search strategy, with inclusion and exclusion criteria identifying studies to be included or excluded (Gough, Oliver & Thomas, 2017 ). Data is then coded and extracted from included studies, in order to synthesise findings and to shed light on their application in practice, as well as on gaps or contradictions. This contribution maps 146 articles on the topic of artificial intelligence in higher education.
Search strategy
The initial search string (see Table 1 ) and criteria (see Table 2 ) for this systematic review included peer-reviewed articles in English, reporting on artificial intelligence within education at any level, and indexed in three international databases; EBSCO Education Source, Web of Science and Scopus (covering titles, abstracts, and keywords). Whilst there are concerns about peer-review processes within the scientific community (e.g., Smith, 2006 ), articles in this review were limited to those published in peer-reviewed journals, due to their general trustworthiness in academia and the rigorous review processes undertaken (Nicholas et al., 2015 ). The search was undertaken in November 2018, with an initial 2656 records identified.
After duplicates were removed, it was decided to limit articles to those published during or after 2007, the year that the iPhone’s Siri was introduced: an algorithm-based personal assistant that started as an artificial intelligence project funded by the US Defense Advanced Research Projects Agency (DARPA) in 2001 and was turned into a company later acquired by Apple Inc. It was also decided that the corpus would be limited to articles discussing applications of artificial intelligence in higher education only.
Screening and inter-rater reliability
The screening of 1549 titles and abstracts was carried out by a team of three coders and at this first screening stage, there was a requirement of sensitivity rather than specificity, i.e. papers were included rather than excluded. In order to reach consensus, the reasons for inclusion and exclusion for the first 80 articles were discussed at regular meetings. Twenty articles were randomly selected to evaluate the coding decisions of the three coders (A, B and C) to determine inter-rater reliability using Cohen’s kappa (κ) (Cohen, 1960 ), which is a coefficient for the degree of consistency among raters, based on the number of codes in the coding scheme (Neumann, 2007 , p. 326). Kappa values of .40–.60 are characterised as fair, .60 to .75 as good, and over .75 as excellent (Bakeman & Gottman, 1997 ; Fleiss, 1981 ). Coding consistency for inclusion or exclusion of articles between rater A and B was κ = .79, between rater A and C it was κ = .89, and between rater B and C it was κ = .69 (median = .79). Therefore, inter-rater reliability can be considered as excellent for the coding of inclusion and exclusion criteria.
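For readers unfamiliar with the statistic, the following Python sketch computes Cohen's kappa for two raters' include/exclude decisions using scikit-learn; the decisions shown are invented examples, not the coders' actual judgements.

```python
# Hypothetical illustration of an inter-rater reliability check with Cohen's kappa.
from sklearn.metrics import cohen_kappa_score

rater_a = [1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0]  # 1 = include
rater_b = [1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0]

kappa = cohen_kappa_score(rater_a, rater_b)
print(f"Cohen's kappa: {kappa:.2f}")  # values above .75 are commonly read as excellent
```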
After initial screening, 332 potential articles remained for screening on full text (see Fig. 1 ). However, 41 articles could not be retrieved, either through the library order scheme or by contacting authors. Therefore, 291 articles were retrieved, screened and coded, and following the exclusion of 145 papers, 146 articles remained for synthesis. Footnote 5
PRISMA diagram (slightly modified after Brunton & Thomas, 2012 , p. 86; Moher, Liberati, Tetzlaff, & Altman, 2009 , p. 8)
Coding, data extraction and analysis
In order to extract the data, all articles were uploaded into systematic review software EPPI Reviewer Footnote 6 and a coding system was developed. Codes included article information (year of publication, journal name, countries of authorship, discipline of first author), study design and execution (empirical or descriptive, educational setting) and how artificial intelligence was used (applications in the student life cycle, specific applications and methods). Articles were also coded on whether challenges and benefits of AI were present, and whether AI was defined. Descriptive data analysis was carried out with the statistics software R using the tidyr package (Wickham & Grolemund, 2016 ).
Limitations
Whilst this systematic review was undertaken as rigorously as possible, each review is limited by its search strategy. Although the three educational research databases chosen are large and international in scope, by applying the criteria of peer-reviewed articles published only in English or Spanish, research on AI published in other languages was not included in this review. This also applies to research in conference proceedings, book chapters or grey literature, or those articles not published in journals that are indexed in the three databases searched. In addition, although Spanish peer-reviewed articles were added according to inclusion criteria, no specific search string in that language was included, which narrows down the possibility of including Spanish papers that were not indexed with the chosen keywords. Future research could consider using a larger number of databases, publication types and publication languages, in order to widen the scope of the review. However, serious consideration would then need to be given to project resources and the manageability of the review (see Authors, in press).
Journals, authorship patterns and methods
Articles per year.
There was a noticeable increase in the papers published from 2007 onwards. The number of included articles grew from six in 2007 to 23 in 2018 (see Fig. 2 ).
Number of included articles per year ( n = 146)
The papers included in the sample were published in 104 different journals. The greatest number of articles were published in the International Journal of Artificial Intelligence in Education ( n = 11) , followed by Computers & Education ( n = 8) , and the International Journal of Emerging Technologies in Learning ( n = 5) . Table 3 lists 19 journals that published at least two articles on AI in higher education from 2007 to 2018.
For the geographical distribution analysis of articles, the country of origin of the first author was taken into consideration ( n = 38 countries). Table 4 shows 19 countries that contributed at least two papers, and it reveals that 50% of all articles come from only four countries: USA, China, Taiwan, and Turkey.
Author affiliations
Again, the affiliation of the first author was taken into consideration (see Table 5 ). Researchers working in departments of Computer Science contributed by far the greatest number of papers ( n = 61), followed by Science, Technology, Engineering and Mathematics (STEM) departments ( n = 29). Only nine first authors came from an Education department; others reported a dual affiliation with Education and Computer Science ( n = 2), Education and Psychology ( n = 1), or Education and STEM ( n = 1).
Thus, 13 papers (8.9%) were written by first authors with an Education background. It is noticeable that three of them were contributed by researchers from the Teachers College at Columbia University, New York, USA (Baker, 2016 ; Paquette, Lebeau, Beaulieu, & Mayers, 2015 ; Perin & Lauterbach, 2018 ) – and they were all published in the same journal, i.e. the International Journal of Artificial Intelligence in Education .
Thirty studies (20.5%) were coded as being theoretical or descriptive in nature. The vast majority of studies (73.3%) applied quantitative methods, whilst only one (0.7%) was qualitative in nature and eight (5.5%) followed a mixed-methods approach. The purpose of the qualitative study, involving interviews with ESL students, was to explore the nature of written feedback coming from an automated essay scoring system compared to a human teacher (Dikli, 2010 ). In many cases, authors employed quasi-experimental methods, with a purposive sample divided into an experimental group, in which an AI application (e.g. an intelligent tutoring system) was applied, and a control group without the intervention, followed by pre- and post-tests (e.g. Adamson, Dyke, Jang, & Rosé, 2014 ).
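As an illustration of how such a quasi-experimental pre-/post-test design is often analysed, the sketch below compares invented learning gains of an experimental group (using an AI application) and a control group with Welch's t-test in Python; it is not taken from any of the reviewed studies.

```python
# Hypothetical illustration: comparing pre-/post-test gains between groups.
from scipy import stats

experimental_gain = [12, 15, 9, 14, 11, 13, 16, 10]  # post - pre, AI condition
control_gain = [8, 10, 7, 9, 11, 6, 9, 8]            # post - pre, no intervention

t, p = stats.ttest_ind(experimental_gain, control_gain, equal_var=False)
print(f"Welch's t = {t:.2f}, p = {p:.3f}")
```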
Understanding of AI and critical reflection of challenges and risks
There are many different types and levels of AI mentioned in the articles; however, only five out of 146 included articles (3.4%) provide an explicit definition of the term “Artificial Intelligence”. The main characteristics of AI, described in all five studies, are the parallels between the human brain and artificial intelligence. The authors conceptualise AI as intelligent computer systems or intelligent agents with human features, such as the ability to memorise knowledge, to perceive and manipulate their environment in a similar way as humans, and to understand human natural language (see Huang, 2018 ; Lodhi, Mishra, Jain, & Bajaj, 2018 ; Welham, 2008 ). Dodigovic ( 2007 ) defines AI in her article as follows (p. 100):
Artificial intelligence (AI) is a term referring to machines which emulate the behaviour of intelligent beings [ … ] AI is an interdisciplinary area of knowledge and research, whose aim is to understand how the human mind works and how to apply the same principles in technology design. In language learning and teaching tasks, AI can be used to emulate the behaviour of a teacher or a learner [ … ] . (p. 100)
Dodigovic is the only author giving a definition of AI who comes from an Arts, Humanities and Social Science department, taking into account aspects of AI and intelligent tutors in second language learning.
A stunningly low number of authors, only two out of 146 articles (1.4%), critically reflect upon ethical implications, challenges and risks of applying AI in education. Li ( 2007 ) deals with privacy concerns in his article about intelligent agent supported online learning:
Privacy is also an important concern in applying agent-based personalised education. As discussed above, agents can autonomously learn many of students’ personal information, like learning style and learning capability. In fact, personal information is private. Many students do not want others to know their private information, such as learning styles and/or capabilities. Students might show concern over possible discrimination from instructors in reference to learning performance due to special learning needs. Therefore, the privacy issue must be resolved before applying agent-based personalised teaching and learning technologies. (p. 327)
Another challenge of applying AI is mentioned by Welham ( 2008 , p. 295) concerning the costs and time involved in developing and introducing AI-based methods that many public educational institutions cannot afford.
AI applications in higher education
As mentioned before, we used the concept of the student life-cycle (see Reid, 1995 ) as a framework to describe the various AI based services at the institutional and administrative level (e.g. admission, counselling, library services), as well as at the academic support level for teaching and learning (e.g. assessment, feedback, tutoring). Ninety-two studies (63.0%) were coded as relating to academic support services and 48 (32.8%) as administrative and institutional services; six studies (4.1%) covered both levels. The majority of studies addressed undergraduate students ( n = 91, 62.3%) compared to 11 (7.5%) focussing on postgraduate students, and another 44 (30.1%) that did not specify the study level.
The iterative coding process led to the following four areas of AI applications with 17 sub-categories, covered in the publications: a) adaptive systems and personalisation, b) assessment and evaluation, c) profiling and prediction, and d) intelligent tutoring systems. Some studies addressed AI applications in more than one area (see Table 6 ).
The nature and scope of the various AI applications in higher education will be described along the lines of these four application categories in the following synthesis.
Profiling and prediction
The basis for many AI applications is the learner model or profile, which allows prediction, for example of the likelihood of a student dropping out of a course or being admitted to a programme, in order to offer timely support or to provide feedback and guidance in content-related matters throughout the learning process. Classification, modelling and prediction are an essential part of educational data mining (Phani Krishna, Mani Kumar, & Aruna Sri, 2018 ).
Most of the articles (55.2%, n = 32) address issues related to the institutional and administrative level, many (36.2%, n = 21) are related to academic teaching and learning at the course level, and five (8.6%) are concerned with both levels. Articles dealing with profiling and prediction were classified into three sub-categories; admission decisions and course scheduling ( n = 7), drop-out and retention ( n = 23), and student models and academic achievement ( n = 27). One study that does not fall into any of these categories is the study by Ge and Xie ( 2015 ), which is concerned with forecasting the costs of a Chinese university to support management decisions based on an artificial neural network.
All of the 58 studies in this area applied machine learning methods, to recognise and classify patterns, and to model student profiles to make predictions. Thus, they are all quantitative in nature. Many studies applied several machine learning algorithms (e.g. ANN, SVM, RF, NB; see Table 7 ) Footnote 7 and compared their overall prediction accuracy with conventional logistic regression. Table 7 shows that machine learning methods outperformed logistic regression in all studies in terms of their classification accuracy in percent. To evaluate the performance of classifiers, the F1-score can also be used, which takes into account the number of positive instances correctly classified as positive, the number of negative instances incorrectly classified as positive, and the number of positive instances incorrectly classified as negative (Umer et al., 2017 ; for a brief overview of measures of diagnostic accuracy, see Šimundić, 2009 ). The F1-score ranges between 0 and 1 with its best value at 1 (perfect precision and recall). Yoo and Kim ( 2014 ) reported high F1-scores of 0.848, 0.911, and 0.914 for J48, NB, and SVM, in a study to predict student’s group project performance from online discussion participation.
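To make the F1-score concrete, the short Python sketch below computes precision, recall and F1 with scikit-learn on invented labels; it only illustrates the metric and does not reproduce any study's evaluation.

```python
# Hypothetical illustration of precision, recall and the F1-score.
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]  # e.g. 1 = good group-project performance
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 1]  # classifier output

precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
recall = recall_score(y_true, y_pred)        # TP / (TP + FN)
print(precision, recall, f1_score(y_true, y_pred))  # F1 = 2*P*R / (P + R)
```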
Admission decisions and course scheduling
Chen and Do ( 2014 ) point out that “the accurate prediction of students’ academic performance is of importance for making admission decisions as well as providing better educational services” (p. 18). Four studies aimed to predict whether or not a prospective student would be admitted to university. For example, Acikkar and Akay ( 2009 ) selected candidates for a School of Physical Education and Sports in Turkey based on a physical ability test, their scores in the National Selection and Placement Examination, and their graduation grade point average (GPA). They used the support vector machine (SVM) technique to classify the students and were able to predict admission decisions with an accuracy of 97.17% in 2006 and 90.51% in 2007. SVM was also applied by Andris, Cowen, and Wittenbach ( 2013 ) to find spatial patterns that might favour prospective college students from certain geographic regions in the USA. Feng, Zhou, and Liu ( 2011 ) analysed enrolment data from 25 Chinese provinces as the training data to predict registration rates in other provinces using an artificial neural network (ANN) model. Machine learning methods and ANN are also used to predict student course selection behaviour to support course planning. Kardan, Sadeghi, Ghidary, and Sani ( 2013 ) investigated factors influencing student course selection, such as course and instructor characteristics, workload, mode of delivery and examination time, to develop a model to predict course selection with an ANN in two Computer Engineering and Information Technology Masters programs. In another paper from the same author team, a decision support system for course offerings was proposed (Kardan & Sadeghi, 2013 ). Overall, the research shows that admission decisions can be predicted at high levels of accuracy, so that an AI solution could relieve the administrative staff and allow them to focus on the more difficult cases.
Drop-out and retention
Studies pertaining to drop-out and retention are intended to develop early warning systems to detect at-risk students in their first year (e.g., Alkhasawneh & Hargraves, 2014 ; Aluko, Adenuga, Kukoyi, Soyingbe, & Oyedeji, 2016 ; Hoffait & Schyns, 2017 ; Howard, Meehan, & Parnell, 2018 ) or to predict the attrition of undergraduate students in general (e.g., Oztekin, 2016 ; Raju & Schumacker, 2015 ). Delen ( 2011 ) used institutional data from 25,224 students enrolled as Freshmen in an American university over 8 years. In this study, three classification techniques were used to predict dropout: ANN, decision trees (DT) and logistic regression. The data contained variables related to students’ demographic, academic, and financial characteristics (e.g. age, sex, ethnicity, GPA, TOEFL score, financial aid, student loan, etc.). Based on a 10-fold cross validation, Delen ( 2011 ) found that the ANN model worked best with an accuracy rate of 81.19% (see Table 7 ) and he concluded that the most important predictors of student drop-out are related to the student’s past and present academic achievement, and whether they receive financial support. Sultana, Khan, and Abbas ( 2017 , p. 107) discussed the impact of cognitive and non-cognitive features of students for predicting academic performance of undergraduate engineering students. In contrast to many other studies, they focused on non-cognitive variables to improve prediction accuracy, i.e. time management, self-concept, self-appraisal, leadership, and community support.
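As an illustration of the kind of comparison reported by Delen ( 2011 ), the sketch below evaluates an ANN, a decision tree and logistic regression with 10-fold cross-validation in Python using scikit-learn; the data are synthetic placeholders, not the institutional records used in the study.

```python
# Hypothetical illustration: comparing classifiers via 10-fold cross-validation.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=12, random_state=0)
models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree (DT)": DecisionTreeClassifier(random_state=0),
    "neural network (ANN)": MLPClassifier(max_iter=2000, random_state=0),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=10)
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```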
Student models and academic achievement
Many more studies are concerned with profiling students and modelling learning behaviour to predict their academic achievements at the course level. Hussain et al. ( 2018 ) applied several machine learning algorithms to analyse student behavioural data from the virtual learning environment at the Open University UK, in order to predict student engagement, which is of particular importance at a large scale distance teaching university, where it is not possible to engage the majority of students in face-to-face sessions. The authors aim to develop an intelligent predictive system that enables instructors to automatically identify low-engaged students and then to make an intervention. Spikol, Ruffaldi, Dabisias, and Cukurova ( 2018 ) used face and hand tracking in workshops with engineering students to estimate success in project-based learning. They concluded that results generated from multimodal data can be used to inform teachers about key features of project-based learning activities. Blikstein et al. ( 2014 ) investigated patterns of how undergraduate students learn computer programming, based on over 150,000 code transcripts that the students created in software development projects. They found that their model, based on the process of programming, had better predictive power than the midterm grades. Another example is the study of Babić ( 2017 ), who developed a model to predict student academic motivation based on their behaviour in an online learning environment.
The research on student models is an important foundation for the design of intelligent tutoring systems and adaptive learning environments.
Intelligent tutoring systems
All of the studies investigating intelligent tutoring systems (ITS) ( n = 29) are only concerned with the teaching and learning level, except for one that is contextualised at the institutional and administrative level. The latter presents StuA , an interactive and intelligent student assistant that helps newcomers in a college by answering queries related to faculty members, examinations, extracurricular activities, library services, etc. (Lodhi et al., 2018 ).
The most common terms for referring to ITS described in the studies are intelligent (online) tutors or intelligent tutoring systems (e.g., in Dodigovic, 2007 ; Miwa, Terai, Kanzaki, & Nakaike, 2014 ), although they are also often identified as intelligent (software) agents (e.g., Schiaffino, Garcia, & Amandi, 2008 ), or intelligent assistants (e.g., in Casamayor, Amandi, & Campo, 2009 ; Jeschike, Jeschke, Pfeiffer, Reinhard, & Richter, 2007 ). According to Welham ( 2008 ), the first ITS reported was the SCHOLAR system, launched in 1970, which allowed the reciprocal exchange of questions between teacher and student, but could not hold a continuous conversation.
Huang and Chen ( 2016 , p. 341) describe the different models that are usually integrated in ITS: the student model (e.g. information about the student’s knowledge level, cognitive ability, learning motivation, learning styles), the teacher model (e.g. analysis of the current state of students, select teaching strategies and methods, provide help and guidance), the domain model (knowledge representation of both students and teachers) and the diagnosis model (evaluation of errors and defects based on domain model).
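To make the interplay of these components more tangible, the following toy Python sketch (entirely illustrative, not drawn from any reviewed system) shows how a teacher model might select the next activity based on a student model, a domain model and a diagnosis model; all fields and rules are invented.

```python
# Toy structural sketch of the four ITS component models described above.
from dataclasses import dataclass, field

@dataclass
class StudentModel:
    knowledge_level: dict = field(default_factory=dict)  # concept -> mastery 0..1
    learning_style: str = "visual"
    motivation: float = 0.5

@dataclass
class DomainModel:
    prerequisites: dict = field(default_factory=dict)    # concept -> required concepts

class DiagnosisModel:
    def weakest_concept(self, student: StudentModel) -> str:
        return min(student.knowledge_level, key=student.knowledge_level.get)

class TeacherModel:
    def next_activity(self, student: StudentModel, domain: DomainModel,
                      diagnosis: DiagnosisModel) -> str:
        concept = diagnosis.weakest_concept(student)
        missing = [c for c in domain.prerequisites.get(concept, [])
                   if student.knowledge_level.get(c, 0) < 0.6]
        return f"review {missing[0]}" if missing else f"practise {concept}"

student = StudentModel(knowledge_level={"fractions": 0.3, "decimals": 0.8})
domain = DomainModel(prerequisites={"fractions": ["decimals"]})
print(TeacherModel().next_activity(student, domain, DiagnosisModel()))  # practise fractions
```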
The implementation and validation of the ITS presented in the studies usually took place over short-term periods (a course or a semester), and no longitudinal studies were identified, except for the study by Jackson and Cossitt ( 2015 ). On the other hand, most of the studies showed (sometimes slightly) positive or satisfactory preliminary results regarding the performance of the ITS, but they did not take into account the novelty effect that a new technological development could have in an educational context. One study presented negative results regarding the type of support that the ITS provided (Adamson et al., 2014 ), which might have been more useful had it been better adjusted to the type of learners involved (in this case, more advanced learners).
Overall, more research is needed on the effectiveness of ITS. The last meta-analysis of 39 ITS studies was published over 5 years ago: Steenbergen-Hu and Cooper ( 2014 ) found that ITS had a moderate effect on students’ learning, and that ITS were less effective than human tutoring, but ITS outperformed all other instruction methods (such as traditional classroom instruction, reading printed or digital text, or homework assignments).
The studies addressing various ITS functions were classified as follows: teaching course content ( n = 12), diagnosing strengths or gaps in students’ knowledge and providing automated feedback ( n = 7), curating learning materials based on students’ needs ( n = 3), and facilitating collaboration between learners ( n = 2).
Teaching course content
Most of the studies ( n = 4) within this group focused on teaching Computer Science content (Dobre, 2014 ; Hooshyar, Ahmad, Yousefi, Yusop, & Horng, 2015 ; Howard, Jordan, di Eugenio, & Katz, 2017 ; Shen & Yang, 2011 ). Other studies included ITS teaching content for Mathematics (Miwa et al., 2014 ), Business Statistics and Accounting (Jackson & Cossitt, 2015 ; Palocsay & Stevens, 2008 ), Medicine (Payne et al., 2009 ) and writing and reading comprehension strategies for undergraduate Psychology students (Ray & Belden, 2007 ; Weston-Sementelli, Allen, & McNamara, 2018 ). Overall, these ITS focused on providing teaching content to students and, at the same time, supporting them by giving adaptive feedback and hints to solve questions related to the content, as well as detecting students’ difficulties/errors when working with the content or the exercises. This is made possible by monitoring students’ actions with the ITS.
Crown, Fuentes, Jones, Nambiar, and Crown ( 2011 ) describe a text-based conversational agent, i.e. a chatbot that teaches content through dialogue and at the same time learns from that conversation, which moves towards a more active, reflective and thoughtful student-centred learning approach. Duffy and Azevedo ( 2015 ) present an ITS called MetaTutor, which is designed to teach students about the human circulatory system, but also puts emphasis on supporting students’ self-regulatory processes, assisted by the features included in the MetaTutor system (a timer, a toolbar to interact with different learning strategies, and learning goals, amongst others).
Diagnosing strengths or gaps in student knowledge, and providing automated feedback
In most of the studies ( n = 4) of this group, ITS are presented as rather one-way communication from computer to student, concerning the gaps in students’ knowledge and the provision of feedback. Three examples in the field of STEM were found: in two of them, the virtual assistant is presented as a feature of virtual laboratories, providing tutoring feedback and supervising student behaviour (Duarte, Butz, Miller, & Mahalingam, 2008 ; Ramírez, Rico, Riofrío-Luzcando, Berrocal-Lobo, & Antonio, 2018 ), and the third is a stand-alone ITS in the field of Computer Science (Paquette et al., 2015 ). One study presents an ITS of this kind in the field of second language learning (Dodigovic, 2007 ).
In two studies, the functions of diagnosing mistakes and providing feedback are accomplished through a dialogue between the student and the computer: for example, with an interactive ubiquitous teaching robot that bases its speech on question recognition (Umarani, Raviram, & Wahidabanu, 2011 ), or with a tutoring system based on a tutorial dialogue toolkit for introductory college Physics (Chi, VanLehn, Litman, & Jordan, 2011 ). The same tutorial dialogue toolkit (TuTalk) is the core of the peer dialogue agent presented by Howard et al. ( 2017 ), where the ITS engages in one-on-one problem-solving peer interaction with a student and can interact verbally, graphically and in a process-oriented way, engaging in collaborative problem solving instead of tutoring. This last study could be considered as part of a new category regarding peer-agent collaboration.
Curating learning materials based on student needs
Two studies focused on this kind of ITS function (Jeschike et al., 2007 ; Schiaffino et al., 2008 ), and a third one mentions it in a more descriptive way as a feature of the detection system presented (Hall Jr & Ko, 2008 ). Schiaffino et al. ( 2008 ) present eTeacher, a system for personalised assistance to e-learning students that observes their behaviour in the course and generates a student profile. This enables the system to provide specific recommendations regarding the type of reading material and exercises to complete, as well as personalised courses of action. Jeschike et al. ( 2007 ) refer to an intelligent assistant contextualised in a virtual laboratory of statistical mechanics, which presents exercises, evaluates the learners’ input to the content, and provides interactive course material that adapts to the learner.
Facilitating collaboration between learners
Within this group we can identify only two studies: one focusing on supporting online collaborative learning discussions by using academically productive talk moves (Adamson et al., 2014 ); and the second on facilitating collaborative writing by providing automated feedback, automatically generated questions, and an analysis of the writing process (Calvo, O’Rourke, Jones, Yacef, & Reimann, 2011 ). Given the opportunities that the applications described in these studies afford for supporting collaboration among students, more research in this area would be desirable.
The teachers’ perspective
As mentioned above, Baker and Smith ( 2019 , p.12) distinguish between student- and teacher-facing AI. However, only two included articles in ITS focus on the teacher’s perspective. Casamayor et al. ( 2009 ) focus on assisting teachers with the supervision and detection of conflictive cases in collaborative learning. In this study, the intelligent assistant provides the teachers with a summary of the individual progress of each group member and the type of participation each of them has had in their work groups, notification alerts derived from the detection of conflict situations, and information about the learning style of each student, logging interactions so that the teachers can intervene when they consider it convenient. The other study puts the emphasis on the ITS sharing teachers’ tutoring tasks by providing immediate feedback (automating tasks), leaving the teachers the role of providing new hints and the correct solution to the tasks (Chou, Huang, & Lin, 2011 ). The study of Chi et al. ( 2011 ) also mentions that the purpose of the ITS is to share the teacher’s tutoring tasks. The main aim in all of these cases is to reduce the teachers’ workload. Furthermore, many of the learner-facing studies deal with teacher-facing functions too, although they do not put emphasis on the teacher’s perspective.
Assessment and evaluation
Assessment and evaluation studies also largely focused on the level of teaching and learning (86%, n = 31), although five studies described applications at the institutional level. In order to gain an overview of student opinion about online and distance learning at their institution, academics at Anadolu University (Ozturk, Cicek, & Ergul, 2017 ) used sentiment analysis to analyse mentions by students on Twitter, collected with the Twython Twitter API client using terms relating to the system. This analysis of publicly accessible data allowed researchers insight into student opinion, which otherwise may not have been accessible through their institutional LMS, and which can inform improvements to the system. Two studies used AI to support prior learning assessment and recognition (PLAR): Kalz et al. ( 2008 ) used Latent Semantic Analysis and ePortfolios to inform personalised learning pathways for students, and Biletska, Biletskiy, Li, and Vovk ( 2010 ) used semantic web technologies to convert student credentials from different institutions, which could also provide information from course descriptions and topics, to allow for easier granting of credit. The final article at the institutional level (Sanchez et al., 2016 ) used an algorithm to match students to professional competencies and capabilities required by companies, in order to ensure alignment between courses and industry needs.
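As an illustration of the general idea only (the study above used the Twython client and its own sentiment pipeline), the sketch below applies an off-the-shelf sentiment scorer, TextBlob, to two invented example tweets.

```python
# Hypothetical illustration of scoring tweet sentiment with TextBlob.
from textblob import TextBlob

tweets = [
    "The new online exam system is so much easier to use",
    "Registration portal crashed again, completely frustrating",
]
for text in tweets:
    polarity = TextBlob(text).sentiment.polarity  # -1 (negative) .. +1 (positive)
    label = "positive" if polarity > 0 else "negative" if polarity < 0 else "neutral"
    print(f"{label:8s} ({polarity:+.2f}): {text}")
```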
Overall, the studies show that AI applications can perform assessment and evaluation tasks at very high accuracy and efficiency levels. However, due to the need to calibrate and train the systems (supervised machine learning), they are more applicable to courses or programs with large student numbers.
Articles focusing on assessment and evaluation applications of AI at the teaching and learning level, were classified into four sub-categories; automated grading ( n = 13), feedback ( n = 8), evaluation of student understanding, engagement and academic integrity ( n = 5), and evaluation of teaching ( n = 5).
Automated grading
Articles that utilised automated grading, or Automated Essay Scoring (AES) systems, came from a range of disciplines (e.g. Biology, Medicine, Business Studies, English as a Second Language), but were mostly focused on its use in undergraduate courses ( n = 10), including those with low reading and writing ability (Perin & Lauterbach, 2018 ). Gierl, Latifi, Lai, Boulais, and Champlain’s ( 2014 ) use of the open-source Java software LightSIDE to grade postgraduate medical student essays resulted in an agreement between the computer classification and human raters of between 94.6% and 98.2%, which could reduce the cost and time associated with employing multiple human assessors for large-scale assessments (Barker, 2011 ; McNamara, Crossley, Roscoe, Allen, & Dai, 2015 ). However, they stressed that not all writing genres may be appropriate for AES and that it would be impractical to use in most small classrooms, due to the need to calibrate the system with a large number of pre-scored assessments. The use of algorithms that find patterns in text responses has also been found to encourage more revisions by students (Ma & Slater, 2015 ) and to support a move away from merely measuring student knowledge and abilities by multiple choice tests (Nehm, Ha, & Mayfield, 2012 ). Continuing issues persist, however, in the quality of feedback provided by AES (Dikli, 2010 ), with Barker ( 2011 ) finding that the more detailed the feedback provided was, the more likely students were to question their grades, and a question was raised over the benefits of this feedback for beginning language students (Aluthman, 2016 ).
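The sketch below illustrates the basic AES idea in Python with scikit-learn: calibrate a model on essays that already carry human scores, then score unseen essays. The essays and scores are invented, and real systems such as LightSIDE rely on far richer feature sets and much larger calibration samples.

```python
# Hypothetical illustration of automated essay scoring via TF-IDF + regression.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

train_essays = [
    "Photosynthesis converts light energy into chemical energy stored in glucose.",
    "Plants make food somehow with sun.",
    "Chlorophyll absorbs light, driving the reactions that fix carbon dioxide.",
    "The sun is hot and plants are green.",
]
human_scores = [5, 2, 5, 1]  # pre-scored by human raters

aes = make_pipeline(TfidfVectorizer(), Ridge(alpha=1.0)).fit(train_essays, human_scores)
print(aes.predict(["Light energy is converted into glucose by photosynthesis."]))
```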
Feedback

Articles concerned with feedback covered a range of student-facing tools, including intelligent agents that provide students with prompts or guidance when they are confused or stalled in their work (Huang, Chen, Luo, Chen, & Chuang, 2008 ), software to alert trainee pilots when they are losing situation awareness whilst flying (Thatcher, 2014 ), and machine learning techniques with lexical features to generate automatic feedback and assist in improving student writing (Chodorow, Gamon, & Tetreault, 2010 ; Garcia-Gorrostieta, Lopez-Lopez, & Gonzalez-Lopez, 2018 ; Quixal & Meurers, 2016 ), which can help reduce students’ cognitive overload (Yang, Wong, & Yeh, 2009 ). The automated feedback system based on adaptive testing reported by Barker ( 2010 ), for example, not only determines the most appropriate individual answers according to Bloom’s cognitive levels, but also recommends additional materials and challenges.
Evaluation of student understanding, engagement and academic integrity
Three articles reported on student-facing tools that evaluate student understanding of concepts (Jain, Gurupur, Schroeder, & Faulkenberry, 2014 ; Zhu, Marquez, & Yoo, 2015 ) and provide personalised assistance (Samarakou, Fylladitakis, Früh, Hatziapostolou, & Gelegenis, 2015 ). Hussain et al. ( 2018 ) used machine learning algorithms to evaluate student engagement in a social science course at the Open University, including final results, assessment scores and the number of clicks that students make in the VLE, which can alert instructors to the need for intervention, and Amigud, Arnedo-Moreno, Daradoumis, and Guerrero-Roldan ( 2017 ) used machine learning algorithms to check academic integrity, by assessing the likelihood of a student’s work being similar to their other work. With a mean accuracy of 93%, this opens up possibilities for reducing the need for invigilators or for access to student accounts, thereby reducing concerns surrounding privacy.
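A minimal sketch of the underlying idea, assuming a simple bag-of-words similarity rather than the machine learning features actually used by Amigud et al. ( 2017 ): compare a new submission against a student's earlier work and flag low similarity for human review.

```python
# Hypothetical illustration: stylistic similarity between a new submission
# and a student's previous work.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

previous_work = [
    "My analysis of the survey data suggests that motivation varies widely.",
    "I argue that the results depend heavily on how engagement was measured.",
]
new_submission = ["The empirical evidence demonstrates heterogeneous motivational constructs."]

vectorizer = TfidfVectorizer().fit(previous_work + new_submission)
similarity = cosine_similarity(vectorizer.transform(new_submission),
                               vectorizer.transform(previous_work))
print("max similarity to prior work:", similarity.max())  # low values could be flagged
```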
Evaluation of teaching
Four studies used data mining algorithms to evaluate lecturer performance through course evaluations (Agaoglu, 2016; Ahmad & Rashid, 2016; DeCarlo & Rizk, 2010; Gutierrez, Canul-Reich, Ochoa Zezzatti, Margain, & Ponce, 2018), with Agaoglu (2016) finding, through the use of four different classification techniques, that many questions in the evaluation questionnaire were irrelevant. The application of an algorithm to evaluate the impact of teaching methods in a differential equations class found that online homework with immediate feedback was more effective than clickers (Duzhin & Gustafsson, 2018). The study also found that, whilst previous exam results are generally good predictors of future exam results, they say very little about students’ expected performance in project-based tasks.
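The idea of using classifiers to expose irrelevant questionnaire items can be illustrated with a small, hypothetical sketch based on feature importances. It does not reproduce Agaoglu’s (2016) four classification techniques, and the simulated responses and item indices are assumptions for demonstration only.

```python
# Sketch (not Agaoglu's actual pipeline): rank questionnaire items by how
# much they contribute to predicting the overall instructor rating.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
n_responses, n_questions = 500, 12
answers = rng.integers(1, 6, size=(n_responses, n_questions))  # Likert 1-5

# Toy overall rating driven by only three of the twelve questions.
overall = (answers[:, [0, 3, 7]].mean(axis=1) >= 3.5).astype(int)

clf = RandomForestClassifier(n_estimators=300, random_state=1)
clf.fit(answers, overall)

for q, importance in sorted(enumerate(clf.feature_importances_, start=1),
                            key=lambda item: -item[1]):
    print(f"Q{q:02d}: {importance:.3f}")  # low values suggest near-irrelevant items
```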
Adaptive systems and personalisation
Most of the studies on adaptive systems (85%, n = 23) are situated at the teaching and learning level, with four cases considering the institutional and administrative level. Two studies explored undergraduate students’ academic advising (Alfarsi, Omar, & Alsinani, 2017 ; Feghali, Zbib, & Hallal, 2011 ), and Nguyen et al. ( 2018 ) focused on AI to support university career services. Ng, Wong, Lee, and Lee ( 2011 ) reported on the development of an agent-based distance LMS, designed to manage resources, support decision making and institutional policy, and assist with managing undergraduate student study flow (e.g. intake, exam and course management), by giving users access to data across disciplines, rather than just individual faculty areas.
There does not seem to be agreement within the studies on a common term for adaptive systems, probably because of the diverse functions they carry out; this diversity also underpins the classification of studies used here. Some of those terms coincide in part with the ones used for ITS, e.g. intelligent agents (Li, 2007; Ng et al., 2011). The most general terms used are intelligent e-learning system (Kose & Arslan, 2016), adaptive web-based learning system (Lo, Chan, & Yeh, 2012), or intelligent teaching system (Yuanyuan & Yajuan, 2014). As in ITS, most of the studies either describe the system or include a pilot study, but no longer-term results are reported. Results from these pilot studies are usually reported as positive, except in Vlugter, Knott, McDonald, and Hall (2009), where the experimental group that used the dialogue-based computer-assisted language system scored lower than the control group in the delayed post-tests.
The 23 studies focused on teaching and learning can be classified into five sub-categories: teaching course content (n = 7), recommending/providing personalised content (n = 5), supporting teachers in learning and teaching design (n = 3), using academic data to monitor and guide students (n = 2), and supporting representation of knowledge using concept maps (n = 2). However, some studies were difficult to classify, due to their specific and unique functions: helping to organise online learning groups with similar interests (Yang, Wang, Shen, & Han, 2007), supporting business decisions through simulation (Ben-Zvi, 2012), or supporting changes in attitude and behaviour for patients with Anorexia Nervosa, through embodied conversational agents (Sebastian & Richards, 2017). Aparicio et al. (2018) present a study in which no adaptive system application was analysed; rather, students’ perceptions of the use of information systems in education in general - and biomedical education in particular - were analysed, including intelligent information access systems.
The disciplines that are taught through adaptive systems are diverse, including environmental education (Huang, 2018), animation design (Yuanyuan & Yajuan, 2014), language learning (Jia, 2009; Vlugter et al., 2009), Computer Science (Iglesias, Martinez, Aler, & Fernandez, 2009) and Biology (Chaudhri et al., 2013). Walsh, Tamjidul, and Williams (2017), however, present an adaptive system based on human-machine learning symbiosis from a descriptive perspective, without specifying any discipline.
Recommending/providing personalised content
This group refers to adaptive systems that deliver customised content, materials and exercises according to students’ behaviour profiling in Business and Administration studies (Hall Jr & Ko, 2008 ) and Computer Science (Kose & Arslan, 2016 ; Lo et al., 2012 ). On the other hand, Tai, Wu, and Li ( 2008 ) present an e-learning recommendation system for online students to help them choose among courses, and Torres-Díaz, Infante Moro, and Valdiviezo Díaz ( 2014 ) emphasise the usefulness of (adaptive) recommendation systems in MOOCs to suggest actions, new items and users, according to students’ personal preferences.
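As a simplified illustration of the underlying recommendation logic (and not of the specific techniques used in the cited systems, such as self-organising maps with association mining), a minimal item-based collaborative filtering sketch could look as follows; the interaction matrix is invented for demonstration.

```python
# Simplified item-based recommender; the cited systems use other techniques.
import numpy as np

# Rows = students, columns = courses/materials; 1 = completed or rated highly.
interactions = np.array([
    [1, 1, 0, 0, 1],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 1],
    [1, 1, 1, 0, 0],
], dtype=float)

# Cosine similarity between items based on which students used them.
norms = np.linalg.norm(interactions, axis=0, keepdims=True)
norms[norms == 0] = 1.0
item_sim = (interactions / norms).T @ (interactions / norms)

def recommend(student, k=2):
    """Score unseen items by similarity to the items a student already used."""
    seen = interactions[student]
    scores = item_sim @ seen
    scores[seen > 0] = -np.inf          # do not re-recommend seen items
    return np.argsort(scores)[::-1][:k]

print("recommended item indices for student 0:", recommend(0))
```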
Supporting teachers in learning and teaching design
In this group, three studies were identified. One study puts the emphasis on a hybrid recommender system of pedagogical patterns, to help teachers define their teaching strategies, according to the context of a specific class (Cobos et al., 2013 ), and another study presents a description of a metadata-based model to implement automatic learning designs that can solve detected problems (Camacho & Moreno, 2007 ). Li’s ( 2007 ) descriptive study argues that intelligent agents save time for online instructors, by leaving the most repetitive tasks to the systems, so that they can focus more on creative work.
Using academic data to monitor and guide students
The adaptive systems within this category focus on extracting student academic information to perform diagnostic tasks and help tutors offer more proactive personal guidance (Rovira, Puertas, & Igual, 2017); or, in addition to that task, they include performance evaluation and personalised assistance and feedback, such as the Learner Diagnosis, Assistance, and Evaluation System based on AI (StuDiAsE) for engineering learners (Samarakou et al., 2015).
Supporting representation of knowledge in concept maps
To help build students’ self-awareness of conceptual structures, concept maps can be quite useful. In both studies in this group, an expert system was included: for example, to accommodate selected peer ideas in integrated concept maps and allow teachers to flexibly determine how the selected concept maps are merged (ICMSys; Kao, Chen, & Sun, 2010), or to help English as a Foreign Language college students develop their reading comprehension through mental maps of referential identification (Yang et al., 2009). The latter system also includes system-guided instruction, practice and feedback.
Conclusions and implications for further educational research
In this paper, we have explored the field of AIEd research in terms of authorship and publication patterns. It is evident that US-American, Chinese, Taiwanese and Turkish colleagues (accounting for 50% of the publications as first authors) from Computer Science and STEM departments (62%) dominate the field. The leading journals are the International Journal of Artificial Intelligence in Education , Computers & Education , and the International Journal of Emerging Technologies in Learning .
More importantly, this study has provided an overview of the vast array of potential AI applications in higher education to support students, faculty members, and administrators. They were described in four broad areas (profiling and prediction, intelligent tutoring systems, assessment and evaluation, and adaptive systems and personalisation) with 17 sub-categories. This structure, which was derived from the systematic review, contributes to the understanding and conceptualisation of AIEd practice and research.
On the other hand, the lack of longitudinal studies and the substantial presence of descriptive and pilot studies from a technological perspective, as well as the prevalence of quantitative methods - especially quasi-experimental methods - in empirical studies, shows that there is still substantial room for educators to pursue innovative and meaningful research and practice with AIEd that could have an impact on learning within higher education, e.g. by adopting design-based approaches (Easterday, Rees Lewis, & Gerber, 2018). A recent systematic literature review on personalisation in educational technology similarly found a predominance of technological development studies, which also often used quantitative methods (Bartolomé, Castañeda, & Adell, 2018). Misiejuk and Wasson (2017) noted in their systematic review on Learning Analytics that “there are very few implementation studies and impact studies” (p. 61), which is similar to the findings of the present article.
The full consequences of AI development cannot yet be foreseen, but it seems likely that AI applications will be a top educational technology issue for the next 20 years. AI-based tools and services have a high potential to support students, faculty members and administrators throughout the student lifecycle. The applications that are described in this article provide enormous pedagogical opportunities for the design of intelligent student support systems, and for scaffolding student learning in adaptive and personalised learning environments. This applies in particular to large higher education institutions (such as open and distance teaching universities), where AIEd might help to overcome the dilemma of providing access to higher education for very large numbers of students (mass higher education). On the other hand, it might also help them to offer flexible, but also interactive and personalised learning opportunities, for example by relieving teachers of burdens such as grading hundreds or even thousands of assignments, so that they can focus on their real task: empathic human teaching.
It is crucial to emphasise that educational technology is not (only) about technology – it is the pedagogical, ethical, social, cultural and economic dimensions of AIEd we should be concerned about. Selwyn ( 2016 , p. 106) writes:
The danger, of course, lies in seeing data and coding as an absolute rather than relative source of guidance and support. Education is far too complex to be reduced solely to data analysis and algorithms. As with digital technologies in general, digital data do not offer a neat technical fix to education dilemmas – no matter how compelling the output might be.
We should not strive for what is technically possible, but always ask ourselves what makes pedagogical sense. In China, systems are already being used to monitor student participation and expressions via face recognition in classrooms (the so-called Intelligent Classroom Behavior Management System, or Smart Campus; see note 8) and display them to the teacher on a dashboard. This is an example of educational surveillance, and it is highly questionable whether such systems provide real added value for a good teacher, who should be able to capture the dynamics in a learning group (online and in an on-campus setting) and respond empathically and in a pedagogically meaningful way. In this sense, it is crucial to adopt an ethics of care (Prinsloo, 2017) to start thinking about how we are exploring the potential of algorithmic decision-making systems that are embedded in AIEd applications. Furthermore, we should also always remember that AI systems “first and foremost, require control by humans. Even the smartest AI systems can make very stupid mistakes. […] AI systems are only as smart as the data used to train them” (Kaplan & Haenlein, 2019, p. 25). Some critical voices in educational technology remind us that we should go beyond the tools, and talk again about learning and pedagogy, as well as acknowledging the human aspects of digital technology use in education (Castañeda & Selwyn, 2018). The new UNESCO report on the challenges and opportunities of AIEd for sustainable development deals with various areas, all of which have an important pedagogical, social and ethical dimension, e.g. ensuring inclusion and equity in AIEd, preparing teachers for AI-powered education, developing quality and inclusive data systems, and ethics and transparency in data collection, use and dissemination (Pedró, Subosa, Rivas, & Valverde, 2019).
That being said, a stunning result of this review is the dramatic lack of critical reflection on the pedagogical and ethical implications, as well as the risks, of implementing AI applications in higher education. Concerning ethical implications, privacy issues were also noted to be rarely addressed in empirical studies in a recent systematic review on Learning Analytics (Misiejuk & Wasson, 2017). More research is needed from educators and learning designers on how to integrate AI applications throughout the student lifecycle, in order to harness the enormous opportunities that they afford for creating intelligent learning and teaching systems. The low presence of authors affiliated with Education departments identified in our systematic review is evidence of the need for educational perspectives on these technological developments.
The lack of theory might be a syndrome within the field of educational technology in general. In a recent study, Hew, Lan, Tang, Jia, and Lo (2019) found that more than 40% of articles in three top educational technology journals were wholly a-theoretical. The systematic review by Bartolomé et al. (2018) also revealed this lack of explicit pedagogical perspectives in the studies analysed. The majority of research included in this systematic review is merely focused on analysing and finding patterns in data to develop models and make predictions that inform student- and teacher-facing applications, or to support administrative decisions, using mathematical theories and machine learning methods that were developed decades ago (see Russell & Norvig, 2010). This kind of research is now possible through the growth of computing power and the vast availability of big digital student data. However, at this stage, there is very little evidence for the advancement of pedagogical and psychological learning theories related to AI-driven educational technology. An important implication of this systematic review is that researchers are encouraged to be explicit about the theories that underpin empirical studies on the development and implementation of AIEd projects, in order to expand research to a broader level, helping us to understand the reasons and mechanisms behind this dynamic development that will have an enormous impact on higher education institutions in the various areas covered in this review.
Availability of data and materials
The datasets used and/or analysed during the current study (the bibliography of included studies) are available from the corresponding author upon request.
Notes
1. https://www.dfki.de/en/web/ (accessed 22 July 2019)
2. https://www.tue.nl/en/news/news-overview/11-07-2019-tue-announces-eaisi-new-institute-for-intelligent-machines/ (accessed 22 July 2019)
3. http://instituteforethicalaiineducation.org (accessed 22 July 2019)
4. https://apo.org.au/node/229596 (accessed 22 July 2019)
5. A file with all included references is available at: https://www.researchgate.net/publication/335911716_AIED-Ref (CC-0; DOI: https://doi.org/10.13140/RG.2.2.13000.88321)
6. https://eppi.ioe.ac.uk/cms/er4/ (accessed 22 July 2019)
7. It is beyond the scope of this article to discuss the various machine learning methods for classification and prediction. Readers are therefore encouraged to refer to the literature referenced in the articles that are included in this review (e.g. Delen, 2010 and Umer, Susnjak, Mathrani, & Suriadi, 2017).
8. https://www.businessinsider.de/china-school-facial-recognition-technology-2018-5?r=US&IR=T (accessed 5 July 2019)
Acikkar, M., & Akay, M. F. (2009). Support vector machines for predicting the admission decision of a candidate to the School of Physical Education and Sports at Cukurova University. Expert Systems with Applications , 36 (3 PART 2), 7228–7233. https://doi.org/10.1016/j.eswa.2008.09.007 .
Adamson, D., Dyke, G., Jang, H., & Rosé, C. P. (2014). Towards an agile approach to adapting dynamic collaboration support to student needs. International Journal of Artificial Intelligence in Education , 24 (1), 92–124. https://doi.org/10.1007/s40593-013-0012-6 .
Agaoglu, M. (2016). Predicting instructor performance using data mining techniques in higher education. IEEE Access , 4 , 2379–2387. https://doi.org/10.1109/ACCESS.2016.2568756 .
Ahmad, H., & Rashid, T. (2016). Lecturer performance analysis using multiple classifiers. Journal of Computer Science, 12(5), 255–264. https://doi.org/10.3844/jcssp.2016.255.264 .
Alfarsi, G. M. S., Omar, K. A. M., & Alsinani, M. J. (2017). A rule-based system for advising undergraduate students. Journal of Theoretical and Applied Information Technology , 95 (11) Retrieved from http://www.jatit.org .
Alkhasawneh, R., & Hargraves, R. H. (2014). Developing a hybrid model to predict student first year retention in STEM disciplines using machine learning techniques. Journal of STEM Education: Innovations & Research , 15 (3), 35–42 https://core.ac.uk/download/pdf/51289621.pdf .
Aluko, R. O., Adenuga, O. A., Kukoyi, P. O., Soyingbe, A. A., & Oyedeji, J. O. (2016). Predicting the academic success of architecture students by pre-enrolment requirement: Using machine-learning techniques. Construction Economics and Building , 16 (4), 86–98. https://doi.org/10.5130/AJCEB.v16i4.5184 .
Aluthman, E. S. (2016). The effect of using automated essay evaluation on ESL undergraduate students’ writing skill. International Journal of English Linguistics , 6 (5), 54–67. https://doi.org/10.5539/ijel.v6n5p54 .
Amigud, A., Arnedo-Moreno, J., Daradoumis, T., & Guerrero-Roldan, A.-E. (2017). Using learning analytics for preserving academic integrity. International Review of Research in Open and Distance Learning , 18 (5), 192–210. https://doi.org/10.19173/irrodl.v18i5.3103 .
Andris, C., Cowen, D., & Wittenbach, J. (2013). Support vector machine for spatial variation. Transactions in GIS , 17 (1), 41–61. https://doi.org/10.1111/j.1467-9671.2012.01354.x .
Aparicio, F., Morales-Botello, M. L., Rubio, M., Hernando, A., Muñoz, R., López-Fernández, H., … de Buenaga, M. (2018). Perceptions of the use of intelligent information access systems in university level active learning activities among teachers of biomedical subjects. International Journal of Medical Informatics , 112 (December 2017), 21–33. https://doi.org/10.1016/j.ijmedinf.2017.12.016 .
Babić, I. D. (2017). Machine learning methods in predicting the student academic motivation. Croatian Operational Research Review , 8 (2), 443–461. https://doi.org/10.17535/crorr.2017.0028 .
Bahadır, E. (2016). Using neural network and logistic regression analysis to predict prospective mathematics teachers’ academic success upon entering graduate education. Kuram ve Uygulamada Egitim Bilimleri , 16 (3), 943–964. https://doi.org/10.12738/estp.2016.3.0214 .
Bakeman, R., & Gottman, J. M. (1997). Observing interaction - an introduction to sequential analysis . Cambridge: Cambridge University Press.
Baker, R. S. (2016). Stupid Tutoring Systems, Intelligent Humans. International Journal of Artificial Intelligence in Education , 26 (2), 600–614. https://doi.org/10.1007/s40593-016-0105-0 .
Baker, T., & Smith, L. (2019). Educ-AI-tion rebooted? Exploring the future of artificial intelligence in schools and colleges. Retrieved from Nesta Foundation website: https://media.nesta.org.uk/documents/Future_of_AI_and_education_v5_WEB.pdf
Barker, T. (2010). An automated feedback system based on adaptive testing: Extending the model. International Journal of Emerging Technologies in Learning , 5 (2), 11–14. https://doi.org/10.3991/ijet.v5i2.1235 .
Barker, T. (2011). An automated individual feedback and marking system: An empirical study. Electronic Journal of E-Learning , 9 (1), 1–14 https://www.learntechlib.org/p/52053/ .
Bartolomé, A., Castañeda, L., & Adell, J. (2018). Personalisation in educational technology: The absence of underlying pedagogies. International Journal of Educational Technology in Higher Education , 15 (14). https://doi.org/10.1186/s41239-018-0095-0 .
Ben-Zvi, T. (2012). Measuring the perceived effectiveness of decision support systems and their impact on performance. Decision Support Systems , 54 (1), 248–256. https://doi.org/10.1016/j.dss.2012.05.033 .
Biletska, O., Biletskiy, Y., Li, H., & Vovk, R. (2010). A semantic approach to expert system for e-assessment of credentials and competencies. Expert Systems with Applications , 37 (10), 7003–7014. https://doi.org/10.1016/j.eswa.2010.03.018 .
Blikstein, P., Worsley, M., Piech, C., Sahami, M., Cooper, S., & Koller, D. (2014). Programming pluralism: Using learning analytics to detect patterns in the learning of computer programming. Journal of the Learning Sciences , 23 (4), 561–599. https://doi.org/10.1080/10508406.2014.954750 .
Brunton, J., & Thomas, J. (2012). Information management in systematic reviews. In D. Gough, S. Oliver, & J. Thomas (Eds.), An introduction to systematic reviews , (pp. 83–106). London: SAGE.
Calvo, R. A., O’Rourke, S. T., Jones, J., Yacef, K., & Reimann, P. (2011). Collaborative writing support tools on the cloud. IEEE Transactions on Learning Technologies , 4 (1), 88–97 https://www.learntechlib.org/p/73461/ .
Camacho, D., & Moreno, M. D. R. (2007). Towards an automatic monitoring for higher education learning design. International Journal of Metadata, Semantics and Ontologies , 2 (1), 1. https://doi.org/10.1504/ijmso.2007.015071 .
Casamayor, A., Amandi, A., & Campo, M. (2009). Intelligent assistance for teachers in collaborative e-learning environments. Computers & Education , 53 (4), 1147–1154. https://doi.org/10.1016/j.compedu.2009.05.025 .
Castañeda, L., & Selwyn, N. (2018). More than tools? Making sense of he ongoing digitizations of higher education. International Journal of Educational Technology in Higher Education , 15 (22). https://doi.org/10.1186/s41239-018-0109-y .
Chaudhri, V. K., Cheng, B., Overtholtzer, A., Roschelle, J., Spaulding, A., Clark, P., … Gunning, D. (2013). Inquire biology: A textbook that answers questions. AI Magazine , 34 (3), 55–55. https://doi.org/10.1609/aimag.v34i3.2486 .
Chen, J.-F., & Do, Q. H. (2014). Training neural networks to predict student academic performance: A comparison of cuckoo search and gravitational search algorithms. International Journal of Computational Intelligence and Applications , 13 (1). https://doi.org/10.1142/S1469026814500059 .
Chi, M., VanLehn, K., Litman, D., & Jordan, P. (2011). Empirically evaluating the application of reinforcement learning to the induction of effective and adaptive pedagogical strategies. User Modeling and User-Adapted Interaction , 21 (1), 137–180. https://doi.org/10.1007/s11257-010-9093-1 .
Chodorow, M., Gamon, M., & Tetreault, J. (2010). The utility of article and preposition error correction systems for English language learners: Feedback and assessment. Language Testing , 27 (3), 419–436. https://doi.org/10.1177/0265532210364391 .
Chou, C.-Y., Huang, B.-H., & Lin, C.-J. (2011). Complementary machine intelligence and human intelligence in virtual teaching assistant for tutoring program tracing. Computers & Education , 57 (4), 2303–2312 https://www.learntechlib.org/p/167322/ .
Cobos, C., Rodriguez, O., Rivera, J., Betancourt, J., Mendoza, M., León, E., & Herrera-Viedma, E. (2013). A hybrid system of pedagogical pattern recommendations based on singular value decomposition and variable data attributes. Information Processing and Management , 49 (3), 607–625. https://doi.org/10.1016/j.ipm.2012.12.002 .
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement , 20 , 37–46. https://doi.org/10.1177/001316446002000104 .
Contact North. (2018). Ten facts about artificial intelligence in teaching and learning. Retrieved from https://teachonline.ca/sites/default/files/tools-trends/downloads/ten_facts_about_artificial_intelligence.pdf
Crown, S., Fuentes, A., Jones, R., Nambiar, R., & Crown, D. (2011). Anne G. Neering: Interactive chatbot to engage and motivate engineering students. Computers in Education Journal , 21 (2), 24–34.
DeCarlo, P., & Rizk, N. (2010). The design and development of an expert system prototype for enhancing exam quality. International Journal of Advanced Corporate Learning , 3 (3), 10–13. https://doi.org/10.3991/ijac.v3i3.1356 .
Delen, D. (2010). A comparative analysis of machine learning techniques for student retention management. Decision Support Systems , 49 (4), 498–506. https://doi.org/10.1016/j.dss.2010.06.003 .
Delen, D. (2011). Predicting student attrition with data mining methods. Journal of College Student Retention: Research, Theory and Practice , 13 (1), 17–35. https://doi.org/10.2190/CS.13.1.b .
Dikli, S. (2010). The nature of automated essay scoring feedback. CALICO Journal , 28 (1), 99–134. https://doi.org/10.11139/cj.28.1.99-134 .
Dobre, I. (2014). Assessing the student′s knowledge in informatics discipline using the METEOR metric. Mediterranean Journal of Social Sciences , 5 (19), 84–92. https://doi.org/10.5901/mjss.2014.v5n19p84 .
Dodigovic, M. (2007). Artificial intelligence and second language learning: An efficient approach to error remediation. Language Awareness , 16 (2), 99–113. https://doi.org/10.2167/la416.0 .
Duarte, M., Butz, B., Miller, S., & Mahalingam, A. (2008). An intelligent universal virtual laboratory (UVL). IEEE Transactions on Education , 51 (1), 2–9. https://doi.org/10.1109/SSST.2002.1027009 .
Duffy, M. C., & Azevedo, R. (2015). Motivation matters: Interactions between achievement goals and agent scaffolding for self-regulated learning within an intelligent tutoring system. Computers in Human Behavior , 52 , 338–348. https://doi.org/10.1016/j.chb.2015.05.041 .
Duzhin, F., & Gustafsson, A. (2018). Machine learning-based app for self-evaluation of teacher-specific instructional style and tools. Education Sciences , 8 (1). https://doi.org/10.3390/educsci8010007 .
Easterday, M. W., Rees Lewis, D. G., & Gerber, E. M. (2018). The logic of design research. Learning: Research and Practice , 4 (2), 131–160. https://doi.org/10.1080/23735082.2017.1286367 .
EDUCAUSE. (2018). Horizon report: 2018 higher education edition. Retrieved from EDUCAUSE Learning Initiative and The New Media Consortium website: https://library.educause.edu/~/media/files/library/2018/8/2018horizonreport.pdf
EDUCAUSE. (2019). Horizon report: 2019 higher education edition. Retrieved from EDUCAUSE Learning Initiative and The New Media Consortium website: https://library.educause.edu/-/media/files/library/2019/4/2019horizonreport.pdf
Feghali, T., Zbib, I., & Hallal, S. (2011). A web-based decision support tool for academic advising. Educational Technology and Society , 14 (1), 82–94 https://www.learntechlib.org/p/52325/ .
Feng, S., Zhou, S., & Liu, Y. (2011). Research on data mining in university admissions decision-making. International Journal of Advancements in Computing Technology , 3 (6), 176–186. https://doi.org/10.4156/ijact.vol3.issue6.21 .
Fleiss, J. L. (1981). Statistical methods for rates and proportions . New York: Wiley.
Garcia-Gorrostieta, J. M., Lopez-Lopez, A., & Gonzalez-Lopez, S. (2018). Automatic argument assessment of final project reports of computer engineering students. Computer Applications in Engineering Education, 26(5), 1217–1226. https://doi.org/10.1002/cae.21996
Ge, C., & Xie, J. (2015). Application of grey forecasting model based on improved residual correction in the cost estimation of university education. International Journal of Emerging Technologies in Learning , 10 (8), 30–33. https://doi.org/10.3991/ijet.v10i8.5215 .
Gierl, M., Latifi, S., Lai, H., Boulais, A., & Champlain, A. (2014). Automated essay scoring and the future of educational assessment in medical education. Medical Education , 48 (10), 950–962. https://doi.org/10.1111/medu.12517 .
Gough, D., Oliver, S., & Thomas, J. (2017). An introduction to systematic reviews , (2nd ed., ). Los Angeles: SAGE.
Gutierrez, G., Canul-Reich, J., Ochoa Zezzatti, A., Margain, L., & Ponce, J. (2018). Mining: Students comments about teacher performance assessment using machine learning algorithms. International Journal of Combinatorial Optimization Problems and Informatics , 9 (3), 26–40 https://ijcopi.org/index.php/ojs/article/view/99 .
Hall Jr., O. P., & Ko, K. (2008). Customized content delivery for graduate management education: Application to business statistics. Journal of Statistics Education , 16 (3). https://doi.org/10.1080/10691898.2008.11889571 .
Haugeland, J. (1985). Artificial intelligence: The very idea. Cambridge, Mass.: MIT Press
Hew, K. F., Lan, M., Tang, Y., Jia, C., & Lo, C. K. (2019). Where is the “theory” within the field of educational technology research? British Journal of Educational Technology , 50 (3), 956–971. https://doi.org/10.1111/bjet.12770 .
Hinojo-Lucena, F.-J., Aznar-Díaz, I., Cáceres-Reche, M.-P., & Romero-Rodríguez, J.-M. (2019). Artificial intelligence in higher education: A bibliometric study on its impact in the scientific literature. Education Sciences , 9 (1), 51. https://doi.org/10.3390/educsci9010051 .
Hoffait, A.-S., & Schyns, M. (2017). Early detection of university students with potential difficulties. Decision Support Systems , 101 , 1–11. https://doi.org/10.1016/j.dss.2017.05.003 .
Hooshyar, D., Ahmad, R., Yousefi, M., Yusop, F., & Horng, S. (2015). A flowchart-based intelligent tutoring system for improving problem-solving skills of novice programmers. Journal of Computer Assisted Learning , 31 (4), 345–361. https://doi.org/10.1111/jcal.12099 .
Howard, C., Jordan, P., di Eugenio, B., & Katz, S. (2017). Shifting the load: A peer dialogue agent that encourages its human collaborator to contribute more to problem solving. International Journal of Artificial Intelligence in Education , 27 (1), 101–129. https://doi.org/10.1007/s40593-015-0071-y .
Howard, E., Meehan, M., & Parnell, A. (2018). Contrasting prediction methods for early warning systems at undergraduate level. Internet and Higher Education , 37 , 66–75. https://doi.org/10.1016/j.iheduc.2018.02.001 .
Huang, C.-J., Chen, C.-H., Luo, Y.-C., Chen, H.-X., & Chuang, Y.-T. (2008). Developing an intelligent diagnosis and assessment e-Learning tool for introductory programming. Educational Technology & Society , 11 (4), 139–157 https://www.jstor.org/stable/jeductechsoci.11.4.139 .
Huang, J., & Chen, Z. (2016). The research and design of web-based intelligent tutoring system. International Journal of Multimedia and Ubiquitous Engineering , 11 (6), 337–348. https://doi.org/10.14257/ijmue.2016.11.6.30 .
Huang, S. P. (2018). Effects of using artificial intelligence teaching system for environmental education on environmental knowledge and attitude. Eurasia Journal of Mathematics, Science and Technology Education , 14 (7), 3277–3284. https://doi.org/10.29333/ejmste/91248 .
Hussain, M., Zhu, W., Zhang, W., & Abidi, S. M. R. (2018). Student engagement predictions in an e-Learning system and their impact on student course assessment scores. Computational Intelligence and Neuroscience . https://doi.org/10.1155/2018/6347186 .
Iglesias, A., Martinez, P., Aler, R., & Fernandez, F. (2009). Reinforcement learning of pedagogical policies in adaptive and intelligent educational systems. Knowledge-Based Systems , 22 (4), 266–270 https://e-archivo.uc3m.es/bitstream/handle/10016/6502/reinforcement_aler_KBS_2009_ps.pdf?sequence=1&isAllowed=y .
Jackson, M., & Cossitt, B. (2015). Is intelligent online tutoring software useful in refreshing financial accounting knowledge? Advances in Accounting Education: Teaching and Curriculum Innovations , 16 , 1–19. https://doi.org/10.1108/S1085-462220150000016001 .
Jain, G. P., Gurupur, V. P., Schroeder, J. L., & Faulkenberry, E. D. (2014). Artificial intelligence-based student learning evaluation: A concept map-based approach for analyzing a student’s understanding of a topic. IEEE Transactions on Learning Technologies , 7 (3), 267–279. https://doi.org/10.1109/TLT.2014.2330297 .
Jeschike, M., Jeschke, S., Pfeiffer, O., Reinhard, R., & Richter, T. (2007). Equipping virtual laboratories with intelligent training scenarios. AACE Journal, 15(4), 413–436. https://www.learntechlib.org/primary/p/23636/ .
Jia, J. (2009). An AI framework to teach English as a foreign language: CSIEC. AI Magazine , 30 (2), 59–59. https://doi.org/10.1609/aimag.v30i2.2232 .
Jonassen, D., Davidson, M., Collins, M., Campbell, J., & Haag, B. B. (1995). Constructivism and computer-mediated communication in distance education. American Journal of Distance Education , 9 (2), 7–25. https://doi.org/10.1080/08923649509526885 .
Kalz, M., van Bruggen, J., Giesbers, B., Waterink, W., Eshuis, J., & Koper, R. (2008). A model for new linkages for prior learning assessment. Campus-Wide Information Systems , 25 (4), 233–243. https://doi.org/10.1108/10650740810900676 .
Kao, Chen, & Sun (2010). Using an e-Learning system with integrated concept maps to improve conceptual understanding. International Journal of Instructional Media , 37 (2), 151–151.
Kaplan, A., & Haenlein, M. (2019). Siri, Siri, in my hand: Who’s the fairest in the land? On the interpretations, illustrations, and implications of artificial intelligence. Business Horizons , 62 (1), 15–25. https://doi.org/10.1016/j.bushor.2018.08.004 .
Kardan, A. A., & Sadeghi, H. (2013). A decision support system for course offering in online higher education institutes. International Journal of Computational Intelligence Systems , 6 (5), 928–942. https://doi.org/10.1080/18756891.2013.808428 .
Kardan, A. A., Sadeghi, H., Ghidary, S. S., & Sani, M. R. F. (2013). Prediction of student course selection in online higher education institutes using neural network. Computers and Education , 65 , 1–11. https://doi.org/10.1016/j.compedu.2013.01.015 .
Kose, U., & Arslan, A. (2016). Intelligent e-Learning system for improving students’ academic achievements in computer programming courses. International Journal of Engineering Education , 32 (1, A), 185–198.
Li, X. (2007). Intelligent agent-supported online education. Decision Sciences Journal of Innovative Education , 5 (2), 311–331. https://doi.org/10.1111/j.1540-4609.2007.00143.x .
Lo, J. J., Chan, Y. C., & Yeh, S. W. (2012). Designing an adaptive web-based learning system based on students’ cognitive styles identified online. Computers and Education , 58 (1), 209–222. https://doi.org/10.1016/j.compedu.2011.08.018 .
Lodhi, P., Mishra, O., Jain, S., & Bajaj, V. (2018). StuA: An intelligent student assistant. International Journal of Interactive Multimedia and Artificial Intelligence , 5 (2), 17–25. https://doi.org/10.9781/ijimai.2018.02.008 .
Luckin, R., Holmes, W., Griffiths, M., & Forcier, L. B. (2016). Intelligence unleashed - an argument for AI in education. Retrieved from http://discovery.ucl.ac.uk/1475756/
Ma, H., & Slater, T. (2015). Using the developmental path of cause to bridge the gap between AWE scores and writing teachers’ evaluations. Writing & Pedagogy , 7 (2), 395–422. https://doi.org/10.1558/wap.v7i2-3.26376 .
McNamara, D. S., Crossley, S. A., Roscoe, R. D., Allen, L. K., & Dai, J. (2015). A hierarchical classification approach to automated essay scoring. Assessing Writing , 23 , 35–59. https://doi.org/10.1016/j.asw.2014.09.002 .
Misiejuk, K., & Wasson, B. (2017). State of the field report on learning analytics. SLATE report 2017–2 . Bergen: Centre for the Science of Learning & Technology (SLATE) Retrieved from http://bora.uib.no/handle/1956/17740 .
Miwa, K., Terai, H., Kanzaki, N., & Nakaike, R. (2014). An intelligent tutoring system with variable levels of instructional support for instructing natural deduction. Transactions of the Japanese Society for Artificial Intelligence , 29 (1), 148–156. https://doi.org/10.1527/tjsai.29.148 .
Moher, D., Liberati, A., Tetzlaff, J., & Altman, D. G. (2009). Preferred reporting items for systematic reviews and meta-analyses: The PRISMA statement. BMJ , 339 , b2535. https://doi.org/10.1136/bmj.b2535 Clinical Research Ed.
Nehm, R. H., Ha, M., & Mayfield, E. (2012). Transforming biology assessment with machine learning: Automated scoring of written evolutionary explanations. Journal of Science Education and Technology , 21 (1), 183–196. https://doi.org/10.1007/s10956-011-9300-9 .
Neumann, W. L. (2007). Social research methods: Qualitative and quantitative approaches . Boston: Pearson.
Ng, S. C., Wong, C. K., Lee, T. S., & Lee, F. Y. (2011). Design of an agent-based academic information system for effective education management. Information Technology Journal , 10 (9), 1784–1788. https://doi.org/10.3923/itj.2011.1784.1788 .
Nguyen, J., Sánchez-Hernández, G., Armisen, A., Agell, N., Rovira, X., & Angulo, C. (2018). A linguistic multi-criteria decision-aiding system to support university career services. Applied Soft Computing Journal , 67 , 933–940. https://doi.org/10.1016/j.asoc.2017.06.052 .
Nicholas, D., Watkinson, A., Jamali, H. R., Herman, E., Tenopir, C., Volentine, R., … Levine, K. (2015). Peer review: still king in the digital age. Learned Publishing , 28 (1), 15–21. https://doi.org/10.1087/20150104 .
Oztekin, A. (2016). A hybrid data analytic approach to predict college graduation status and its determinative factors. Industrial Management and Data Systems , 116 (8), 1678–1699. https://doi.org/10.1108/IMDS-09-2015-0363 .
Ozturk, Z. K., Cicek, Z. I. E., & Ergul, Z. (2017). Sentiment analysis: An application to Anadolu University. Acta Physica Polonica A , 132 (3), 753–755. https://doi.org/10.12693/APhysPolA.132.753 .
Palocsay, S. W., & Stevens, S. P. (2008). A study of the effectiveness of web-based homework in teaching undergraduate business statistics. Decision Sciences Journal of Innovative Education , 6 (2), 213–232. https://doi.org/10.1111/j.1540-4609.2008.00167.x .
Paquette, L., Lebeau, J. F., Beaulieu, G., & Mayers, A. (2015). Designing a knowledge representation approach for the generation of pedagogical interventions by MTTs. International Journal of Artificial Intelligence in Education , 25 (1), 118–156 https://www.learntechlib.org/p/168275/ .
Payne, V. L., Medvedeva, O., Legowski, E., Castine, M., Tseytlin, E., Jukic, D., & Crowley, R. S. (2009). Effect of a limited-enforcement intelligent tutoring system in dermatopathology on student errors, goals and solution paths. Artificial Intelligence in Medicine , 47 (3), 175–197. https://doi.org/10.1016/j.artmed.2009.07.002 .
Pedró, F., Subosa, M., Rivas, A., & Valverde, P. (2019). Artificial intelligence in education: Challenges and opportunities for sustainable development . Paris: UNESCO.
Perez, S., Massey-Allard, J., Butler, D., Ives, J., Bonn, D., Yee, N., & Roll, I. (2017). Identifying productive inquiry in virtual labs using sequence mining. In E. André, R. Baker, X. Hu, M. M. T. Rodrigo, & B. du Boulay (Eds.), Artificial intelligence in education , (vol. 10,331, pp. 287–298). https://doi.org/10.1007/978-3-319-61425-0_24 .
Perin, D., & Lauterbach, M. (2018). Assessing text-based writing of low-skilled college students. International Journal of Artificial Intelligence in Education , 28 (1), 56–78. https://doi.org/10.1007/s40593-016-0122-z .
Petticrew, M., & Roberts, H. (2006). Systematic reviews in the social sciences: A practical guide . Malden; Oxford: Blackwell Pub.
Phani Krishna, K. V., Mani Kumar, M., & Aruna Sri, P. S. G. (2018). Student information system and performance retrieval through dashboard. International Journal of Engineering and Technology (UAE) , 7 , 682–685. https://doi.org/10.14419/ijet.v7i2.7.10922 .
Popenici, S., & Kerr, S. (2017). Exploring the impact of artificial intelligence on teaching and learning in higher education. Research and Practice in Technology Enhanced Learning . https://doi.org/10.1186/s41039-017-0062-8 .
Prinsloo, P. (2017). Fleeing from Frankenstein’s monster and meeting Kafka on the way: Algorithmic decision-making in higher education. E-Learning and Digital Media , 14 (3), 138–163. https://doi.org/10.1177/2042753017731355 .
Quixal, M., & Meurers, D. (2016). How can writing tasks be characterized in a way serving pedagogical goals and automatic analysis needs? Calico Journal , 33 (1), 19–48. https://doi.org/10.1558/cj.v33i1.26543 .
Raju, D., & Schumacker, R. (2015). Exploring student characteristics of retention that lead to graduation in higher education using data mining models. Journal of College Student Retention: Research, Theory and Practice , 16 (4), 563–591. https://doi.org/10.2190/CS.16.4.e .
Ramírez, J., Rico, M., Riofrío-Luzcando, D., Berrocal-Lobo, M., & Antonio, A. (2018). Students’ evaluation of a virtual world for procedural training in a tertiary-education course. Journal of Educational Computing Research , 56 (1), 23–47. https://doi.org/10.1177/0735633117706047 .
Ray, R. D., & Belden, N. (2007). Teaching college level content and reading comprehension skills simultaneously via an artificially intelligent adaptive computerized instructional system. Psychological Record , 57 (2), 201–218 https://opensiuc.lib.siu.edu/cgi/viewcontent.cgi?referer=https://www.google.com/&httpsredir=1&article=1103&context=tpr .
Reid, J. (1995). Managing learner support. In F. Lockwood (Ed.), Open and distance learning today , (pp. 265–275). London: Routledge.
Rovira, S., Puertas, E., & Igual, L. (2017). Data-driven system to predict academic grades and dropout. PLoS One , 12 (2), 1–21. https://doi.org/10.1371/journal.pone.0171207 .
Russell, S., & Norvig, P. (2010). Artificial intelligence - a modern approach . New Jersey: Pearson Education.
Salmon, G. (2000). E-moderating - the key to teaching and learning online , (1st ed., ). London: Routledge.
Samarakou, M., Fylladitakis, E. D., Früh, W. G., Hatziapostolou, A., & Gelegenis, J. J. (2015). An advanced eLearning environment developed for engineering learners. International Journal of Emerging Technologies in Learning , 10 (3), 22–33. https://doi.org/10.3991/ijet.v10i3.4484 .
Sanchez, E. L., Santos-Olmo, A., Alvarez, E., Huerta, M., Camacho, S., & Fernandez-Medina, E. (2016). Development of an expert system for the evaluation of students’ curricula on the basis of competencies. Future Internet , 8 (2). https://doi.org/10.3390/fi8020022 .
Schiaffino, S., Garcia, P., & Amandi, A. (2008). eTeacher: Providing personalized assistance to e-learning students. Computers & Education , 51 (4), 1744–1754. https://doi.org/10.1016/j.compedu.2008.05.008 .
Sebastian, J., & Richards, D. (2017). Changing stigmatizing attitudes to mental health via education and contact with embodied conversational agents. Computers in Human Behavior , 73 , 479–488. https://doi.org/10.1016/j.chb.2017.03.071 .
Selwyn, N. (2016). Is technology good for education? Cambridge, UK: Malden, MA : Polity Press.
Shen, V. R. L., & Yang, C.-Y. (2011). Intelligent multiagent tutoring system in artificial intelligence. International Journal of Engineering Education , 27 (2), 248–256.
Šimundić, A.-M. (2009). Measures of diagnostic accuracy: Basic definitions. Journal of the International Federation of Clinical Chemistry and Laboratory Medicine, 19(4), 203–211. https://www.ncbi.nlm.nih.gov/pubmed/27683318 .
Smith, R. (2006). Peer review: a flawed process at the heart of science and journals. Journal of the Royal Society of Medicine , 99 , 178–182. https://doi.org/10.1258/jrsm.99.4.178 .
Spikol, D., Ruffaldi, E., Dabisias, G., & Cukurova, M. (2018). Supervised machine learning in multimodal learning analytics for estimating success in project-based learning. Journal of Computer Assisted Learning , 34 (4), 366–377. https://doi.org/10.1111/jcal.12263 .
Sreenivasa Rao, K., Swapna, N., & Praveen Kumar, P. (2018). Educational data mining for student placement prediction using machine learning algorithms. International Journal of Engineering and Technology (UAE) , 7 (1.2), 43–46. https://doi.org/10.14419/ijet.v7i1.2.8988 .
Steenbergen-Hu, S., & Cooper, H. (2014). A meta-analysis of the effectiveness of intelligent tutoring systems on college students’ academic learning. Journal of Educational Psychology , 106 (2), 331–347. https://doi.org/10.1037/a0034752 .
Sultana, S., Khan, S., & Abbas, M. (2017). Predicting performance of electrical engineering students using cognitive and non-cognitive features for identification of potential dropouts. International Journal of Electrical Engineering Education , 54 (2), 105–118. https://doi.org/10.1177/0020720916688484 .
Tai, D. W. S., Wu, H. J., & Li, P. H. (2008). Effective e-learning recommendation system based on self-organizing maps and association mining. Electronic Library , 26 (3), 329–344. https://doi.org/10.1108/02640470810879482 .
Tegmark, M. (2018). Life 3.0: Being human in the age of artificial intelligence . London: Penguin Books.
Teshnizi, S. H., & Ayatollahi, S. M. T. (2015). A comparison of logistic regression model and artificial neural networks in predicting of student’s academic failure. Acta Informatica Medica, 23(5), 296-300. https://doi.org/10.5455/aim.2015.23.296-300
Thatcher, S. J. (2014). The use of artificial intelligence in the learning of flight crew situation awareness in an undergraduate aviation programme. World Transactions on Engineering and Technology Education , 12 (4), 764–768 https://www.semanticscholar.org/paper/The-use-of-artificial-intelligence-in-the-learning-Thatcher/758d3053051511cde2f28fc6b2181b8e227f8ea2 .
Torres-Díaz, J. C., Infante Moro, A., & Valdiviezo Díaz, P. (2014). Los MOOC y la masificación personalizada. Profesorado , 18 (1), 63–72 http://www.redalyc.org/articulo.oa?id=56730662005 .
Umarani, S. D., Raviram, P., & Wahidabanu, R. S. D. (2011). Speech based question recognition of interactive ubiquitous teaching robot using supervised classifier. International Journal of Engineering and Technology , 3 (3), 239–243 http://www.enggjournals.com/ijet/docs/IJET11-03-03-35.pdf .
Umer, R., Susnjak, T., Mathrani, A., & Suriadi, S. (2017). On predicting academic performance with process mining in learning analytics. Journal of Research in Innovative Teaching , 10 (2), 160–176. https://doi.org/10.1108/JRIT-09-2017-0022 .
Vlugter, P., Knott, A., McDonald, J., & Hall, C. (2009). Dialogue-based CALL: A case study on teaching pronouns. Computer Assisted Language Learning , 22 (2), 115–131. https://doi.org/10.1080/09588220902778260 .
Walsh, K., Tamjidul, H., & Williams, K. (2017). Human machine learning symbiosis. Journal of Learning in Higher Education , 13 (1), 55–62 http://cs.uno.edu/~tamjid/pub/2017/JLHE.pdf .
Welham, D. (2008). AI in training (1980–2000): Foundation for the future or misplaced optimism? British Journal of Educational Technology , 39 (2), 287–303. https://doi.org/10.1111/j.1467-8535.2008.00818.x .
Weston-Sementelli, J. L., Allen, L. K., & McNamara, D. S. (2018). Comprehension and writing strategy training improves performance on content-specific source-based writing tasks. International Journal of Artificial Intelligence in Education , 28 (1), 106–137. https://doi.org/10.1007/s40593-016-0127-7 .
Wickham, H., & Grolemund, G. (2016). R for data science: Import, tidy, transform, visualize, and model data , (1st ed., ). Sebastopol: O’Reilly.
Yang, F., Wang, M., Shen, R., & Han, P. (2007). Community-organizing agent: An artificial intelligent system for building learning communities among large numbers of learners. Computers & Education , 49 (2), 131–147. https://doi.org/10.1016/j.compedu.2005.04.019 .
Yang, Y. F., Wong, W. K., & Yeh, H. C. (2009). Investigating readers’ mental maps of references in an online system. Computers and Education , 53 (3), 799–808. https://doi.org/10.1016/j.compedu.2009.04.016 .
Yoo, J., & Kim, J. (2014). Can online discussion participation predict group project performance? Investigating the roles of linguistic features and participation patterns. International Journal of Artificial Intelligence in Education , 24 (1), 8–32 https://www.learntechlib.org/p/155243/ .
Yuanyuan, J., & Yajuan, L. (2014). Development of an intelligent teaching system based on 3D technology in the course of digital animation production. International Journal of Emerging Technologies in Learning , 9 (9), 81–86. https://doi.org/10.3991/ijet.v11i09.6116 .
Zhu, W., Marquez, A., & Yoo, J. (2015). “Engineering economics jeopardy!” Mobile app for university students. Engineering Economist , 60 (4), 291–306. https://doi.org/10.1080/0013791X.2015.1067343 .
Download references
Acknowledgements
Not applicable.
This study received no external funding.
Author information
Authors and Affiliations
Faculty of Education and Social Sciences, University of Oldenburg, Ammerländer Heerstr. 138, 26129, Oldenburg, Germany
Olaf Zawacki-Richter, Victoria I. Marín, Melissa Bond & Franziska Gouverneur
Contributions
The authors declare that each author has made a substantial contribution to this article, has approved the submitted version of this article and has agreed to be personally accountable for the author’s own contributions. In particular, OZR, as the leading author, made a major contribution to the conception and design of the research, the data collection, the screening of abstracts and full papers, and the analysis, synthesis and interpretation of data; VIM made a major contribution to the data collection, the screening of abstracts and full papers, and the analysis, synthesis and interpretation of data; MB made a major contribution to the data collection, the screening of full papers, and the analysis, synthesis and interpretation of data, and, as a native speaker of English, was also responsible for language editing; FG made a major contribution to the data collection and the screening of abstracts and full papers, and calculated the Cohen’s kappa values of interrater reliability.
Authors’ information
Dr. Olaf Zawacki-Richter is a Professor of Educational Technology in the Faculty of Education and Social Sciences at the University of Oldenburg in Germany. He is the Director of the Center for Open Education Research (COER) and the Center for Lifelong Learning (C3L).
Dr. Victoria I. Marín is a Post-doctoral Researcher in the Faculty of Education and Social Sciences / Center for Open Education Research (COER) at the University of Oldenburg in Germany.
Melissa Bond is a PhD candidate and Research Associate in the Faculty of Education and Social Sciences / Center for Open Education Research (COER) at the University of Oldenburg in Germany.
Franziska Gouverneur is a Masters student and Research Assistant in the Faculty of Education and Social Sciences / Center for Open Education Research (COER) at the University of Oldenburg in Germany.
Corresponding author
Correspondence to Olaf Zawacki-Richter .
Ethics declarations
Competing interests.
The authors declare that they have no competing interests.
Additional information
Publisher’s note.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License ( http://creativecommons.org/licenses/by/4.0/ ), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Reprints and permissions
About this article
Cite this article.
Zawacki-Richter, O., Marín, V.I., Bond, M. et al. Systematic review of research on artificial intelligence applications in higher education – where are the educators?. Int J Educ Technol High Educ 16 , 39 (2019). https://doi.org/10.1186/s41239-019-0171-0
Download citation
Received: 26 July 2019
Accepted: 01 October 2019
Published: 28 October 2019
DOI: https://doi.org/10.1186/s41239-019-0171-0
Keywords: Artificial intelligence, Higher education, Machine learning, Systematic review
The rise of artificial intelligence in healthcare applications
Kaveh Memarzadeh
Issue date: 2020.
Big data and machine learning are having an impact on most aspects of modern life, from entertainment, commerce, and healthcare. Netflix knows which films and series people prefer to watch, Amazon knows which items people like to buy when and where, and Google knows which symptoms and conditions people are searching for. All this data can be used for very detailed personal profiling, which may be of great value for behavioral understanding and targeting but also has potential for predicting healthcare trends. There is great optimism that the application of artificial intelligence (AI) can provide substantial improvements in all areas of healthcare from diagnostics to treatment. It is generally believed that AI tools will facilitate and enhance human work and not replace the work of physicians and other healthcare staff as such. AI is ready to support healthcare personnel with a variety of tasks from administrative workflow to clinical documentation and patient outreach as well as specialized support such as in image analysis, medical device automation, and patient monitoring. In this chapter, some of the major applications of AI in healthcare will be discussed covering both the applications that are directly associated with healthcare and those in the healthcare value chain such as drug development and ambient assisted living.
Keywords: Artificial intelligence, healthcare applications, machine learning, precision medicine, ambient assisted living, natural language programming, machine vision
2.1. The new age of healthcare
Big data and machine learning are having an impact on most aspects of modern life, from entertainment, commerce, and healthcare. Netflix knows which films and series people prefer to watch, Amazon knows which items people like to buy when and where, and Google knows which symptoms and conditions people are searching for. All this data can be used for very detailed personal profiling, which may be of great value for behavioral understanding and targeting but also has potential for predicting healthcare trends. There is great optimism that the application of artificial intelligence (AI) can provide substantial improvements in all areas of healthcare from diagnostics to treatment. There is already a large amount of evidence that AI algorithms are performing on par or better than humans in various tasks, for instance, in analyzing medical images or correlating symptoms and biomarkers from electronic medical records (EMRs) with the characterization and prognosis of the disease [1] .
The demand for healthcare services is ever increasing, and many countries are experiencing a shortage of healthcare practitioners, especially physicians. Healthcare institutions are also fighting to keep up with all the new technological developments and the high expectations of patients with respect to levels of service and outcomes as they know them from consumer products such as those of Amazon and Apple [2]. The advances in wireless technology and smartphones have provided opportunities for on-demand healthcare services using health tracking apps and search platforms, and have also enabled a new form of healthcare delivery, via remote interactions, available anywhere and anytime. Such services are relevant for underserved regions and places lacking specialists, and help reduce costs and prevent unnecessary exposure to contagious illnesses at the clinic. Telehealth technology is also relevant in developing countries where the healthcare system is expanding and where healthcare infrastructure can be designed to meet the current needs [3]. While the concept is clear, these solutions still need substantial independent validation to prove patient safety and efficacy.
The healthcare ecosystem is realizing the importance of AI-powered tools in next-generation healthcare technology. It is believed that AI can bring improvements to any process within healthcare operation and delivery. For instance, the cost savings that AI can bring to the healthcare system are an important driver for implementation of AI applications. It is estimated that AI applications can cut annual US healthcare costs by USD 150 billion in 2026. A large part of these cost reductions stem from changing the healthcare model from a reactive to a proactive approach, focusing on health management rather than disease treatment. This is expected to result in fewer hospitalizations, fewer doctor visits, and fewer treatments. AI-based technology will have an important role in helping people stay healthy via continuous monitoring and coaching and will ensure earlier diagnosis, tailored treatments, and more efficient follow-ups.
The AI-associated healthcare market is expected to grow rapidly and reach USD 6.6 billion by 2021 corresponding to a 40% compound annual growth rate [4] .
2.1.1. Technological advancements
There have been a great number of technological advances within the field of AI and data science in the past decade. Although research in AI for various applications has been ongoing for several decades, the current wave of AI hype is different from the previous ones. A perfect combination of increased computer processing speed, larger data collections and libraries, and a large AI talent pool has enabled rapid development of AI tools and technology, also within healthcare [5]. This is set to bring a paradigm shift in the level of AI technology and its adoption and impact on society.
In particular, the development of deep learning (DL) has had an impact on the way we look at AI tools today and is the reason for much of the recent excitement surrounding AI applications. DL can find correlations that were too complex to capture with previous machine learning algorithms. It is largely based on artificial neural networks; whereas earlier neural networks had only 3–5 layers of connections, DL networks have more than 10, corresponding to the simulation of artificial neurons on the order of millions.
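To make the layer-count point concrete, the following is a minimal sketch of a small "deep" network with a dozen stacked layers; the framework (PyTorch), layer sizes, and input dimension are illustrative assumptions rather than anything prescribed by the text.

```python
# Minimal sketch of a "deep" network: 12 stacked fully connected layers.
# Purely illustrative; the layer count, sizes, and input dimension are arbitrary.
import torch
import torch.nn as nn

class DeepClassifier(nn.Module):
    def __init__(self, n_features=64, n_hidden=128, n_layers=12, n_classes=2):
        super().__init__()
        layers = [nn.Linear(n_features, n_hidden), nn.ReLU()]
        for _ in range(n_layers - 2):                      # hidden layers
            layers += [nn.Linear(n_hidden, n_hidden), nn.ReLU()]
        layers.append(nn.Linear(n_hidden, n_classes))      # output layer
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

model = DeepClassifier()
print(model(torch.randn(8, 64)).shape)  # torch.Size([8, 2])
```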
There are numerous companies that are frontrunners in this area, including IBM Watson and Google's DeepMind. These companies have shown that their AI can beat humans in selected tasks and activities including chess, Go, and other games. Both IBM Watson and Google's DeepMind are currently being used for many healthcare-related applications. IBM Watson is being investigated for diabetes management, advanced cancer care and modeling, and drug discovery, but has yet to show clinical value to patients. DeepMind is also being looked at for applications including a mobile medical assistant, diagnostics based on medical imaging, and prediction of patient deterioration [6], [7].
Many data and computation-based technologies have followed exponential growth trajectories. The best-known example is Moore's law, which describes the exponential growth in the performance of computer chips. Many consumer-oriented apps have experienced similar exponential growth by offering affordable services. In healthcare and life science, the mapping of the human genome and the digitization of medical data could result in a similar growth pattern as genetic sequencing and profiling become cheaper and electronic health records and the like serve as a platform for data collection. Although these areas may seem small at first, exponential growth will take over at some point. Humans are generally poor at understanding exponential trends and have a tendency to overestimate the impact of technology in the short term (e.g., 1 year) while underestimating the long-term (e.g., 10 years) effect.
2.1.2. Artificial intelligence applications in healthcare
It is generally believed that AI tools will facilitate and enhance human work and not replace the work of physicians and other healthcare staff as such. AI is ready to support healthcare personnel with a variety of tasks from administrative workflow to clinical documentation and patient outreach as well as specialized support such as in image analysis, medical device automation, and patient monitoring.
There are different opinions on the most beneficial applications of AI for healthcare purposes. Forbes stated in 2018 that the most important areas would be administrative workflows, image analysis, robotic surgery, virtual assistants, and clinical decision support [8]. A 2018 report by Accenture mentioned the same areas and also included connected machines, dosage error reduction, and cybersecurity [9]. A 2019 report from McKinsey lists connected and cognitive devices, targeted and personalized medicine, robotics-assisted surgery, and electroceuticals as important areas [10].
In the next sections, some of the major applications of AI in healthcare will be discussed covering both the applications that are directly associated with healthcare and other applications in the healthcare value chain such as drug development and ambient assisted living (AAL).
2.2. Precision medicine
Precision medicine provides the possibility of tailoring healthcare interventions to individuals or groups of patients based on their disease profile, diagnostic or prognostic information, or their treatment response. The tailor-made treatment opportunity will take into consideration genomic variations as well as contributing factors of medical treatment such as age, gender, geography, race, family history, immune profile, metabolic profile, microbiome, and environmental vulnerability. The objective of precision medicine is to use individual biology rather than population biology at all stages of a patient's medical journey. This means collecting data from individuals such as genetic information, physiological monitoring data, or EMR data and tailoring their treatment based on advanced models. Advantages of precision medicine include reduced healthcare costs, a reduction in adverse drug responses, and enhanced effectiveness of drug action [11]. Innovation in precision medicine is expected to provide great benefits to patients and change the way health services are delivered and evaluated.
There are many types of precision medicine initiatives and overall, they can be divided into three types of clinical areas: complex algorithms, digital health applications, and “omics”-based tests.
Complex algorithms: Machine learning algorithms are used with large datasets such as genetic information, demographic data, or electronic health records to predict prognosis and the optimal treatment strategy (a minimal code sketch follows this list).
Digital health applications: Healthcare apps record and process data added by patients such as food intake, emotional state or activity, and health monitoring data from wearables, mobile sensors, and the like. Some of these apps fall under precision medicine and use machine learning algorithms to find trends in the data, make better predictions, and give personalized treatment advice.
Omics-based tests: Genetic information from a population pool is used with machine learning algorithms to find correlations and predict treatment responses for the individual patient. In addition to genetic information, other biomarkers such as protein expression, gut microbiome, and metabolic profile are also employed with machine learning to enable personalized treatments [12].
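As a toy illustration of the "complex algorithms" category above, the sketch below fits a standard classifier to synthetic tabular patient features in order to predict a treatment response; the feature names, data, and the choice of scikit-learn are assumptions made purely for illustration, not a clinical model.

```python
# Toy sketch: predicting treatment response from tabular patient data.
# The synthetic data and feature names are hypothetical placeholders.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([
    rng.normal(55, 15, n),     # age
    rng.normal(120, 20, n),    # systolic blood pressure
    rng.normal(6.0, 1.0, n),   # HbA1c
    rng.integers(0, 2, n),     # biomarker present (0/1)
])
# Synthetic outcome loosely dependent on the features.
logits = 0.3 * (X[:, 2] - 6.0) + 1.5 * X[:, 3] - 0.01 * (X[:, 0] - 55)
y = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = GradientBoostingClassifier().fit(X_tr, y_tr)
print("AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```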
Here, we explore selected therapeutic applications of AI including genetics-based solutions and drug discovery.
2.2.1. Genetics-based solutions
It is believed that within the next decade a large part of the global population will be offered full genome sequencing either at birth or in adult life. Such genome sequencing is estimated to take up 100–150 GB of data and will provide a great tool for precision medicine. Work on interfacing the genomic and phenotype information is still ongoing. The current clinical systems would need a redesign to be able to use such genomics data and realize its benefits [13].
Deep Genomics, a healthtech company, is looking at identifying patterns in vast genetic datasets as well as EMRs, in order to link the two with regard to disease markers. The company uses these correlations to identify therapeutic targets, either existing therapeutic targets or new therapeutic candidates, with the purpose of developing individualized genetic medicines. It uses AI in every step of its drug discovery and development process including target discovery, lead optimization, toxicity assessment, and innovative trial design.
Many inherited diseases result in symptoms without a specific diagnosis, and interpreting whole genome data remains challenging because of the many genetic profiles involved. Precision medicine can provide methods to improve the identification of genetic mutations based on full genome sequencing and the use of AI.
2.2.2. Drug discovery and development
Drug discovery and development is an immensely long, costly, and complex process that can often take more than 10 years from identification of molecular targets until a drug product is approved and marketed. Any failure during this process has a large financial impact, and in fact most drug candidates fail sometime during development and never make it onto the market. On top of that are the ever-increasing regulatory obstacles and the difficulties in continuously discovering drug molecules that are substantially better than what is currently marketed. This makes the drug innovation process both challenging and inefficient with a high price tag on any new drug products that make it onto the market [14] .
There has been a substantial increase in the amount of data available on drug compound activity and biomedical properties in the past few years. This is due to increasing automation and the introduction of new experimental techniques, including high-throughput screening and parallel synthesis. However, mining of the large-scale chemistry data is needed to efficiently classify potential drug compounds, and machine learning techniques have shown great potential [15]. Methods such as support vector machines, neural networks, and random forests have all been used to develop models to aid drug discovery since the 1990s. More recently, DL has begun to be implemented due to the increased amount of data and the continuous improvements in computing power. There are various tasks in the drug discovery process where machine learning can be used to streamline the work. This includes drug compound property and activity prediction, de novo design of drug compounds, drug–receptor interactions, and drug reaction prediction [16].
The drug molecules and the associated features used in the in silico models are transformed into vector format so they can be read by the learning systems. Generally, the data used here include molecular descriptors (e.g., physicochemical properties) and molecular fingerprints (molecular structure) as well as simplified molecular input line entry system (SMILES) strings and grids for convolutional neural networks (CNNs) [17] .
2.2.2.1. Drug property and activity prediction
The properties and activity of a drug molecule are important to know in order to assess its behavior in the human body. Machine learning-based techniques have been used to assess the biological activity, absorption, distribution, metabolism, and excretion (ADME) characteristics, and physicochemical properties of drug molecules ( Fig. 2.1 ). In recent years, several libraries of chemical and biological data including ChEMBL and PubChem have become available for storing information on millions of molecules for various disease targets. These libraries are machine-readable and are used to build machine learning models for drug discovery. For instance, CNNs have been used to generate molecular fingerprints from a large set of molecular graphs with information about each atom in the molecule. Neural fingerprints are then used to predict new characteristics based on a given molecule. In this way, molecular properties including octanol solubility, melting point, and biological activity can be evaluated, as demonstrated by Coley et al. and others, and used to predict new features of drug molecules [18]. They can then also be combined with a scoring function of the drug molecules to select for molecules with desirable biological activity and physicochemical properties. Currently, most new drugs discovered have a complex structure and/or undesirable properties including poor solubility, low stability, or poor absorption.
Figure 2.1.
Machine learning opportunities within the small molecule drug discovery and development process.
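A rough sketch of the workflow described above, from SMILES strings to fingerprint vectors to a simple property model, might look as follows; it assumes RDKit and scikit-learn are available, and the "solubility" values are invented placeholders rather than measured data.

```python
# Sketch: SMILES -> molecular fingerprint -> simple property model.
# The solubility-like values below are invented placeholders for illustration.
from rdkit import Chem
from rdkit.Chem import AllChem
import numpy as np
from sklearn.ensemble import RandomForestRegressor

data = [
    ("CCO", -0.2),                      # ethanol
    ("c1ccccc1", -2.1),                 # benzene
    ("CC(=O)Oc1ccccc1C(=O)O", -1.7),    # aspirin
    ("CCN(CC)CC", -0.5),                # triethylamine
]

def featurize(smiles):
    # Convert a SMILES string to a 1024-bit Morgan fingerprint vector.
    mol = Chem.MolFromSmiles(smiles)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=1024)
    return np.array(fp)

X = np.array([featurize(s) for s, _ in data])
y = np.array([v for _, v in data])

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
print(model.predict([featurize("CCCO")]))  # prediction for propanol
```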
Machine learning has also been implemented to assess the toxicity of molecules, for instance, using DeepTox, a DL-based model for evaluating the toxic effects of compounds based on a dataset containing many drug molecules [19] . Another platform called MoleculeNet is also used to translate two-dimensional molecular structures into novel features/descriptors, which can then be used in predicting toxicity of the given molecule. The MoleculeNet platform is built on data from various public databases and more than 700,000 compounds have already been tested for toxicity or other properties [20] .
2.2.2.2. De novo design through deep learning
Another interesting application of DL in drug discovery is the generation of new chemical structures through neural networks ( Fig. 2.2 ). Several DL-based techniques have been proposed for molecular de novo design. This also includes protein engineering involving the molecular design of proteins with specific binding or functions.
Figure 2.2.
Illustration of the generative artificial intelligence concept for de novo design. Training data of molecular structures are used to emit new chemical entities by sampling.
Here, variational autoencoders and adversarial autoencoders are often used to design new molecules in an automated process by fitting the design model to large datasets of drug molecules. Autoencoders are a type of neural network for unsupervised learning and are also the tools used to, for instance, generate images of fictional human faces. The autoencoders are trained on many drug molecule structures, and the latent variables are then used as the generative model. As an example, the program druGAN used adversarial autoencoders to generate new molecular fingerprints and drug designs incorporating features such as solubility and absorption based on predefined anticancer drug properties. These results suggest a substantial improvement in the efficiency of generating new drug designs with specific properties [21]. Blaschke et al. also applied adversarial autoencoders and Bayesian optimization to generate ligands specific to the dopamine type 2 receptor [22]. Merk et al. trained a recurrent neural network to capture a large number of bioactive compounds represented as SMILES strings. This model was then fine-tuned to recognize retinoid X and peroxisome proliferator-activated receptor agonists. The identified compounds were synthesized and demonstrated potent receptor modulatory activity in in vitro assays [23].
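The following is a minimal, generic sketch of the variational-autoencoder idea mentioned above, operating on binary fingerprint vectors rather than any particular molecular representation; the dimensions, the random stand-in training data, and the use of PyTorch are assumptions, and a real de novo design system would be considerably more elaborate.

```python
# Minimal variational autoencoder (VAE) sketch over binary fingerprint vectors.
# Random "fingerprints" stand in for a real training set of known drug molecules.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FingerprintVAE(nn.Module):
    def __init__(self, n_bits=1024, latent_dim=32):
        super().__init__()
        self.enc = nn.Linear(n_bits, 256)
        self.mu = nn.Linear(256, latent_dim)
        self.logvar = nn.Linear(256, latent_dim)
        self.dec1 = nn.Linear(latent_dim, 256)
        self.dec2 = nn.Linear(256, n_bits)

    def encode(self, x):
        h = F.relu(self.enc(x))
        return self.mu(h), self.logvar(h)

    def decode(self, z):
        return torch.sigmoid(self.dec2(F.relu(self.dec1(z))))

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization
        return self.decode(z), mu, logvar

def vae_loss(recon, x, mu, logvar):
    bce = F.binary_cross_entropy(recon, x, reduction="sum")        # reconstruction term
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())  # latent regularizer
    return bce + kld

vae = FingerprintVAE()
opt = torch.optim.Adam(vae.parameters(), lr=1e-3)
x = (torch.rand(64, 1024) < 0.05).float()   # stand-in training batch
for _ in range(5):
    opt.zero_grad()
    recon, mu, logvar = vae(x)
    vae_loss(recon, x, mu, logvar).backward()
    opt.step()

# "De novo" generation: sample latent vectors and decode them to new fingerprints.
new_fps = vae.decode(torch.randn(3, 32))
print(new_fps.shape)  # torch.Size([3, 1024])
```

Sampling latent vectors and decoding them plays the role of "emitting new chemical entities" in Fig. 2.2; mapping such generated fingerprints back to synthesizable structures is a separate and harder problem.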
2.2.2.3. Drug–target interactions
The assessment of drug–target interactions is an important part of the drug design process. The binding pose and the binding affinity between the drug molecule and the target have an important impact on the chances of success based on the in silico prediction. Some of the more common approaches involve drug candidate identification via molecular docking, for prediction and preselection of interesting drug–target interactions.
Molecular docking is a molecular modeling approach used to study the binding and complex formation between two molecules. It can be used to find interactions between a drug compound and a target, for example a receptor, and predicts the conformation of the drug compound in the binding site of the target. The docking algorithm then ranks the interactions via scoring functions and estimates binding affinity. Popular molecular docking tools include AutoDock, DOCK, Glide, and FlexX. These are rather simple, and many data scientists are working on improving the prediction of drug–target interactions using various learning models [24]. CNNs have been found useful as scoring functions for docking applications and have demonstrated efficient pose/affinity prediction for drug–target complexes and assessment of activity/inactivity. For instance, Wallach and Dzamba built AtomNet, a deep CNN that predicts the bioactivity of small molecule drugs for drug discovery applications. The authors showed that AtomNet outperforms conventional docking models in terms of accuracy, with an AUC (area under the curve) of 0.9 or more for 58% of the targets [25].
Current trends within AI applications for drug discovery and development point toward more and more models using DL approaches. Compared with more conventional machine learning approaches, DL models take a long time to train because of the large datasets and the often large number of parameters needed. This can be a major disadvantage when data is not readily available. There is therefore ongoing work on reducing the amount of data required as training sets for DL so it can learn with only small amounts of available data. This is similar to the learning process that takes place in the human brain and would be beneficial in applications where data collection is resource intensive and large datasets are not readily available, as is often the case with medicinal chemistry and novel drug targets. There are several novel methods being investigated, for instance, using a one-shot learning approach or a long short-term memory approach and also using memory augmented neural networks such as the differentiable neural computer [17] .
2.3. Artificial intelligence and medical visualization
Interpretation of data that appears in the form of either an image or a video can be a challenging task. Experts in the field have to train for many years to attain the ability to discern medical phenomena and on top of that have to actively learn new content as more research and information presents itself. However, the demand is ever increasing and there is a significant shortage of experts in the field. There is therefore a need for a fresh approach and AI promises to be the tool to be used to fill this demand gap.
2.3.1. Machine vision for diagnosis and surgery
Computer vision involves the interpretation of images and videos by machines at or above human-level capabilities including object and scene recognition. Areas where computer vision is making an important impact include image-based diagnosis and image-guided surgery.
2.3.1.1. Computer vision for diagnosis and surgery
Computer vision has mainly been based on statistical signal processing but is now shifting more toward the application of artificial neural networks as the learning method of choice. Here, DL is used to engineer computer vision algorithms for classifying images of lesions in skin and other tissues. Video data is estimated to contain 25 times the amount of data of high-resolution diagnostic images such as CT and could thus provide a higher data value based on resolution over time. Video analysis is still in its infancy but has great potential for clinical decision support. As an example, real-time video analysis of a laparoscopic procedure resulted in 92.8% accuracy in identifying all the steps of the procedure and, surprisingly, in detecting missing or unexpected steps [26].
A notable application of AI and computer vision within surgery technology is to augment certain features and skills within surgery such as suturing and knot-tying. The smart tissue autonomous robot (STAR) from the Johns Hopkins University has demonstrated that it can outperform human surgeons in some surgical procedures such as bowel anastomosis in animals. A fully autonomous robotic surgeon remains a concept for the not so near future but augmenting different aspects of surgery using AI is of interest to researchers. An example of this is a group at the Institute of Information Technology at the Alpen-Adria Universität Klagenfurt that uses surgery videos as training material in order to identify a specific intervention made by the surgeon. For example, when an act of dissection or cutting is performed on the patient’s tissues or organs, the algorithm recognizes the likelihood of the intervention as well as the specific region in the body [27] . Such algorithms are naturally based on the training on many videos and could be proven very useful for complicated surgical procedures or for situations where an inexperienced surgeon is required to perform an emergency surgery. It is important that surgeons are actively engaged in the development of such tools ensuring clinical relevance and quality and facilitating the translation from the lab to the clinical sector.
2.3.2. Deep learning and medical image recognition
The word "deep" refers to the multilayered nature of the machine learning, and among all DL techniques, the most promising in the field of image recognition have been CNNs. Yann LeCun, a prominent French computer scientist, introduced the theoretical background to this system by creating LeNet in the 1980s, an automated handwriting recognition algorithm designed to read cheques for financial systems. Since then, these networks have shown significant promise in the field of pattern recognition.
Similar to radiologists, who during their medical training learn by constantly correlating and relating their interpretations of radiological images to the ground truth, CNNs are inspired by the human visual cortex, where image recognition begins with the identification of the many features of an image. Furthermore, CNNs require a significant amount of training data in the form of medical images along with labels indicating what each image represents. At each hidden layer, during training, CNNs can adjust the applied weights and filters (characteristics of regions in an image) to improve performance on the given training data.
Briefly and very simply ( Fig. 2.3 ), the act of convolving an image with various weights and creating a stack of filtered images is referred to as a convolutional layer, where an image essentially becomes a stack of filtered images. A rectified linear unit (ReLU) then removes all negative values, and pooling shrinks the stack of filtered images into a smaller representation of itself. All these operations are then stacked on top of one another to create layers, sometimes referred to as deep stacking. This process can be repeated multiple times, and each time the image gets filtered further and becomes relatively smaller. The last layer is referred to as a fully connected layer, where every value from the preceding layers contributes to the final result. If the system produces an error in this final answer, gradient descent can be applied by adjusting the values up and down to see how the error changes relative to the right answer of interest. This is achieved by an algorithm called backpropagation, which signifies "learning from mistakes." After learning a new capability from the existing data, the network can be applied to new images and classify them into the right category (inference), similar to how a radiologist operates [28].
Figure 2.3.
The various stages of convolutional neural networks at work.
Adapted from Lundervold AS, Lundervold A. An overview of deep learning in medical imaging focusing on MRI. Z Med Phys. 2019;29:102–27.
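The pipeline just described (convolution, ReLU, pooling, repeated stacking, and a final fully connected layer) maps almost one-to-one onto code; the toy network below is a generic sketch for single-channel images, with all sizes chosen arbitrarily and PyTorch assumed as the framework.

```python
# Toy CNN mirroring the convolution -> ReLU -> pooling -> fully connected pipeline.
import torch
import torch.nn as nn

class TinyImageClassifier(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),  # convolutional layer: stack of filtered images
            nn.ReLU(),                                    # remove negative values
            nn.MaxPool2d(2),                              # pooling: smaller representation
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # "deep stacking": repeat the block
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 16 * 16, n_classes)  # fully connected layer

    def forward(self, x):
        h = self.features(x)
        return self.classifier(h.flatten(1))

# A batch of 64x64 single-channel "images"; training would use backpropagation
# with a loss such as cross-entropy, as described in the text.
model = TinyImageClassifier()
print(model(torch.randn(4, 1, 64, 64)).shape)  # torch.Size([4, 2])
```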
2.3.3. Augmented reality and virtual reality in the healthcare space
Augmented and virtual reality (AR and VR) can be incorporated at every stage of a healthcare system. These systems can be implemented at the early stages of education for medical students, for those training for a specific specialty, and for experienced surgeons. For patients, these technologies can be beneficial but may also have some negative consequences.
In this section, we will attempt to cover each stage and finally comment on the usefulness of these technologies.
2.3.3.1. Education and exploration
Humans are visual beings, and play is one of the most important aspects of our lives. As children, the most important way for us to learn was to play. Interaction with the surroundings allowed us to gain further understanding of the world and provided us with much-needed experience. The current educational system is limited, and for interactive disciplines such as medicine this can be a hindrance. Medicine can be visualized as an art form, and future clinicians are the artists. These individuals require certain skills to meet the needs of an ever-evolving profession. Early in medical school, various concepts are taught to students without them ever experiencing these concepts in real life. Game-like technologies such as VR and AR could therefore enhance and enrich the learning experience for future medical and health-related disciplines [29]. Medical students could be taught novel and complicated surgical procedures, or learn about anatomy through AR, without needing to involve real patients at an early stage or to perform an autopsy on a real corpse. These students will of course be interacting with real patients in their future careers, but the goal would be to initiate training at an earlier stage and lower the cost of training at a later stage.
For today's training specialists, the same concept can be applied. Of course, human interaction should be encouraged in the medical field, but it is not always necessary or available when an individual is undergoing a certain training regimen. The use of other physical and digital cues such as haptic feedback and photorealistic images and videos can provide a realistic simulation in which learning can flourish and the consequences and cost of training are not drastic ( Fig. 2.4 ).
Figure 2.4.
Virtual reality can help current and future surgeons enhance their surgical abilities prior to an actual operation. (Image obtained from a video still, OSSOR VR).
In a recent study [30], two groups of surgical trainees were subjected to different training methods for mastoidectomy, where one group (n = 18) went through the standard training path and the other trained on a freeware VR simulator [the visible ear simulator (VES)]. At the end of the training, a significant improvement in surgical dissection was observed for those who trained with VR. For real-life and precise execution, AR would be more advantageous in healthcare settings. By wearing lightweight headsets (e.g., Microsoft HoloLens or Google Glass) that project relevant images or video onto the regions of interest, the user can focus on the task without being distracted by moving their visual field away from the region of interest.
2.3.3.2. Patient experience
Humans interact with their surroundings through audiovisual cues and utilize their limbs to engage and move within the world. This seemingly ordinary ability can be extremely beneficial for those who are experiencing debilitating conditions that limit movement or for individuals who are experiencing pain and discomfort, either from a chronic illness or as a side effect of a treatment. A recent study, looking at the effect of immersive VR for patients who had suffered a chronic stroke, found this technology to contribute positively to the state of the patients. During the VR experience, the patients are asked to grab a virtual ball and throw it back into the virtual space [31]. For these patients, this immersive experience could act as a personal rehabilitation physiotherapist who engages their upper limb movement multiple times a day, allowing for possible neuroplasticity and a gradual return of normal motor function to these regions.
For others, these immersive technologies could help cope with the pain and the discomfort of their cancer or mental health condition. A study has shown that late-stage adult cancer patients can use this technology with minimum physical discomfort and in return benefit from an enhanced relaxed state, entertainment, and a much-needed distraction [32] . These immersive worlds provide a form of escapism with their artificial characters and environments, allowing the individual to interact and explore the surrounding while receiving audiovisual feedback from the environment, much like all the activities of daily living.
2.4. Intelligent personal health records
Personal health records have historically been physician-oriented and often have lacked patient-related functionalities. However, in order to promote self-management and improve the outcomes for patients, a patient-centric personal health record should be implemented. The goal is to allow ample freedom for patients to manage their conditions, while freeing up time for the clinicians to perform more crucial and urgent tasks.
2.4.1. Health monitoring and wearables
For millennia individuals relied on physicians to inform them about their own bodies and to some extent, this practice is still applied today. However, the relatively new field of wearables is changing this. Wearable health devices (WHDs) are an upcoming technology that allow for constant measurement of certain vital signs under various conditions. The key to their early adoption and success is their application flexibility—the users are now able to track their activity while running, meditating, or when underwater. The goal is to provide individuals with a sense of power over their own health by allowing them to analyze the data and manage their own health. Simply, WHDs create individual empowerment ( Fig. 2.5 ).
Figure 2.5.
Health outcome of a patient depends on a simple yet interconnected set of criteria that are predominantly behavior dependent.
At first look, a wearable device might look like an ordinary band or watch; however, these devices bridge the gap between multiple scientific disciplines such as biomedical engineering, materials science, electronics, computer programming, and data science, among many others [33]. It would not be an exaggeration to refer to them as ever-present digital health coaches, as users are increasingly encouraged to wear them at all times in order to get the most out of their data. Garmin wearables are a good example of this: with a focus on being active, they cover a vast variety of sports and provide a substantial amount of data in the Garmin Connect application, where users can analyze and observe their daily activities. These are increasingly accompanied by the implementation of gamification.
Gamification refers to the utilization of game design elements for nongame-related applications. These elements are used to motivate and drive users to reach their goals [34]. On wearable platforms, data gathered from daily activities can serve as competition between different users on the platform. Say that your average weekly step count is around 50,000 steps. Based on specific algorithms, the platform places you on a leaderboard against individuals whose average weekly steps are similar to yours or higher, with the highest-ranking member exceeding your current average. As a result of this gamified scenario, the user can push themselves to increase their daily activity in order to do better on the leaderboard and potentially lead a healthier life. While the gamification aspect of wearables and their application could bring benefits, evidence of efficacy is scarce and varies widely, with some claiming that the practice might bring more harm than good.
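The leaderboard mechanic described above can be expressed in a few lines; the grouping rule below (peers with similar or higher weekly averages) is a hypothetical simplification of whatever proprietary algorithm a wearable platform actually uses.

```python
# Hypothetical gamification sketch: place a user on a leaderboard of peers
# whose average weekly steps are similar to or higher than their own.
def build_leaderboard(user, peers, band=15000):
    """user/peers: (name, avg_weekly_steps) tuples."""
    _, steps = user
    group = [p for p in peers if steps - band <= p[1] <= steps + band or p[1] > steps]
    group.append(user)
    return sorted(group, key=lambda p: p[1], reverse=True)

peers = [("Ada", 62000), ("Ben", 48000), ("Cleo", 71000), ("Dev", 30000)]
for rank, (name, steps) in enumerate(build_leaderboard(("You", 50000), peers), start=1):
    print(f"{rank}. {name}: {steps} steps/week")
```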
Remote monitoring and picking up on early signs of disease could be immensely beneficial for those who suffer from chronic conditions and for the elderly. Here, by wearing a smart device or entering data manually over a prolonged period, individuals can communicate with their healthcare workers without disrupting their daily lives [35]. This is a great example of algorithms collaborating with healthcare professionals to produce an outcome that is beneficial for patients.
2.4.2. Natural language processing
Natural language processing (NLP) relates to the interaction between computers and humans using natural language and often emphasizes the computer's ability to understand human language. NLP is crucial for many applications of big data analysis within healthcare, particularly for EMRs and the translation of narratives provided by clinicians. It is typically used in operations such as extraction of information, conversion of unstructured data into structured data, and categorization of data and documents.
NLP makes use of various classifications to infer meaning from unstructured textual data and allows clinicians to work more freely using language in a “natural way” as opposed to fitting sequences of text into input options to serve the computer. NLP is being used to analyze data from EMRs and gather large-scale information on the late-stage complications of a certain medical condition [26] .
There are many areas in healthcare in which NLP can provide substantial benefits. Some of the more immediate applications include [36]
Efficient billing: extracting information from physician notes and assigning medical codes for the billing process.
Authorization approval: using information from physician notes to prevent delays and administrative errors.
Clinical decision support: facilitating decision-making for members of the healthcare team when needed (for instance, predicting patient prognosis and outcomes).
Medical policy assessment: compiling clinical guidance and formulating appropriate guidelines for care.
One application of NLP is disease classification based on medical notes and standardized codes using International Statistical Classification of Diseases and Related Health Problems (ICD). ICD is managed and published by the WHO and contains codes for diseases and symptoms as well as various findings, circumstances, and causes of disease. Here is an illustrative example of how an NLP algorithm can be used to extract and identify the ICD code from a clinical guidelines description. Unstructured text is organized into structured data by parsing for relevant clauses followed by classification of ICD-10 codes based on frequency of occurrence. The NLP algorithm is run at various thresholds to improve classification accuracy and the data is aggregated for the final output ( Fig. 2.6 ).
Figure 2.6.
Example of ICD-10 mapping from a clinical guidelines’ description [36] .
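The mapping workflow of Fig. 2.6 can be approximated with a very small text-classification pipeline; the sketch below uses a handful of invented sentence/code pairs (E11, I10, and J45 are real ICD-10 categories for type 2 diabetes, essential hypertension, and asthma, chosen arbitrarily as examples) and is only a toy stand-in for the parsing-and-classification approach described above.

```python
# Toy sketch: mapping short clinical text to ICD-10 category codes.
# Training sentences are invented; the codes are real ICD-10 categories
# used here purely as examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "patient with poorly controlled type 2 diabetes, metformin increased",
    "elevated blood glucose, long-standing type 2 diabetes mellitus",
    "blood pressure persistently elevated, essential hypertension",
    "hypertension follow-up, adjust antihypertensive therapy",
    "wheezing and shortness of breath, asthma exacerbation",
    "asthma controlled with inhaled corticosteroids",
]
codes = ["E11", "E11", "I10", "I10", "J45", "J45"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression(max_iter=1000))
clf.fit(texts, codes)

note = "known type 2 diabetes with rising glucose readings"
print(clf.predict([note])[0])   # expected: E11
```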
2.4.3. Integration of personal records
Since the introduction of EMRs, there have been large databases of information on each patient, which collectively can be used to identify healthcare trends within different disease areas. The EMR databases contain the history of hospital encounters, records of diagnoses and interventions, lab tests, medical images, and clinical narratives. All these datasets can be used to build predictive models that can help clinicians with diagnostics and various treatment decision support. As AI tools mature, it will be possible to extract all kinds of information such as related disease effects and correlations between historical and future medical events [37]. The only data often missing is data from in between interventions and between hospital visits, when the patient is well or may not be showing symptoms. Such data could help to construct an end-to-end model of both "health" and "disease" for studying long-term effects and further disease classifications.
Although the applications of AI for EMRs are still quite limited, the potential for using the large databases to detect new trends and predict health outcomes is enormous. Current applications include data extraction from text narratives, predictive algorithms based on data from medical tests, and clinical decision support based on personal medical history. There is also great potential for AI to enable integration of EMR data with various health applications. Current AI applications within healthcare are often standalone, typically used for diagnostics based on medical imaging and for disease prediction using remote patient monitoring [38]. However, integrating such standalone applications with EMR data could provide even greater value by adding personal medical data and history as well as a large statistical reference library to make classifications and predictions more accurate and powerful. EMR providers such as Cerner, Epic, and Athena are beginning to add AI functionality such as NLP to their systems, making it easier to access and extract data held in their libraries [39]. This could facilitate the integration of, for instance, telehealth and remote monitoring applications with EMR data, and the data transfer could even go both ways, including the addition of remote monitoring data to the EMR systems.
There are many EMR providers and systems globally. These use various operating systems and approaches, with more than a thousand EMR providers operating in the United States alone. Integration of EMR records on its own poses a great challenge, and interoperability of these systems is important to obtain the best value from the data. There are various international efforts to gather EMR data across countries, including Observational Health Data Sciences and Informatics (OHDSI), which has consolidated 1.26 billion patient records from 17 different countries [40]. Various AI methods have been used to extract, classify, and correlate data from EMRs, but most generally make use of NLP, DL, and neural networks.
DeepCare is an example of an AI-based platform for end-to-end processing of EMR data. It uses a deep dynamic memory neural network to read and store experiences in memory cells. The long short-term memory of the system models the illness trajectory and healthcare processes of users via a time-stamped sequence of events and in this way allows capturing long-term dependencies [41]. Using the stored data, the DeepCare framework can model disease progression, support intervention recommendation, and provide disease prognosis based on EMR databases. Studying data from a cohort of diabetic and mental health patients, it was demonstrated that DeepCare could predict the progression of disease and optimal interventions and assess the likelihood of readmission [37].
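DeepCare itself uses a more elaborate memory architecture, but the underlying idea of running a recurrent network over a time-stamped sequence of coded medical events can be sketched roughly as below; the vocabulary size, the random event sequences, and the readmission-style output are invented, and this is not the published DeepCare model.

```python
# Rough sketch (not DeepCare): an LSTM over sequences of medical event codes
# predicting a binary outcome such as readmission. All data are invented.
import torch
import torch.nn as nn

class VisitSequenceModel(nn.Module):
    def __init__(self, n_codes=500, emb_dim=32, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(n_codes, emb_dim)      # one vector per diagnosis/intervention code
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, codes):                            # codes: (batch, seq_len) integer IDs
        _, (h_n, _) = self.lstm(self.embed(codes))
        return torch.sigmoid(self.head(h_n[-1]))         # probability of, e.g., readmission

model = VisitSequenceModel()
fake_history = torch.randint(0, 500, (2, 20))            # two patients, 20 coded events each
print(model(fake_history))                               # two probabilities in [0, 1]
```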
2.5. Robotics and artificial intelligence-powered devices
There are numerous areas in healthcare where robots are being used to replace human workforce, augment human abilities, and assist human healthcare professionals. These include robots used for surgical procedures such as laparoscopic operations, robotic assistants for rehabilitation and patient assistance, robots that are integrated into implants and prosthetics, and robots used to assist physicians and other healthcare staff with their tasks. Several companies are developing such devices especially for interacting with patients and improving the connection between humans and machines from a care perspective. Most of the robots currently under development have some level of AI technology incorporated for better performance with regard to classification, language recognition, image processing, and more.
2.5.1. Minimally invasive surgery
Although many advances have been seen in the area surrounding surgery, measured by the outcomes of surgical procedures, the main practice of surgery still remains a relatively low-tech procedure, for the most part using hand tools and instruments for "cutting and sewing." Conventional surgery relies greatly on sensing by the surgeon, where touching allows them to distinguish between tissues and organs, and it often requires open surgery. There is an ongoing transformation within surgical technology, and focus has especially been placed on reducing the invasiveness of surgical procedures by minimizing incisions, reducing open surgeries, and using flexible tools and cameras to assist the surgery [42]. Such minimally invasive surgery is seen as the way forward, but it is still in an early phase with many improvements to be made to make it "less of a big deal" for patients and to reduce time and cost. Minimally invasive surgery requires different motor skills compared with conventional surgery due to the lower tactile feedback when relying more on tools and less on direct touching. Sensors that provide the surgeon with finer tactile stimuli are under development and make use of tactile data processing to translate the sensor input into data or stimuli that can be perceived by the surgeon. Such tactile data processing typically makes use of AI, more specifically artificial neural networks, to enhance this signal translation and the interpretation of the tactile information [43]. Artificial tactile sensing offers several advantages compared with physical touching, including a larger reference library to compare sensations, standardization among surgeons with respect to quantitative features, continuous improvement, and level of training.
An example where artificial tactile sensing has been used includes screening of breast cancer, as a replacement for clinical breast examination to complement medical imaging techniques such as x-ray mammography and MRI. Here, the artificial tactile sensing system was built on data from reconstruction of mechanical tissue measurements using a pressure sensor as reference data. During training of the neural network, the weight of the input data adjusts according to the desired output [44] . The tactile sensory system can detect mass calcifications inside the breast tissue based on palpation of different points of the tissue and comparing with different reference data, and subsequently determine whether there are any significant abnormalities in the breast tissue. Artificial tactile sensing has also been used for other applications including assessment of liver, brain, and submucosal tumors [45] .
2.5.2. Neuroprosthetics
Our species has always longed for eternal life. In the ancient Vedic tradition there exists a medicinal drink that provides "immortality" for those who drink it. The Rig Veda, which was written some 5000 years ago, comments: "We drank soma, we became immortal, we came to the light, we found gods." This is similar in ancient Persian culture, where a comparable legendary drink is called Haoma in the Zoroastrian sacred book, the Avesta [46], [47]. This longing for "enhancement" and "augmentation" has always been with us, and in the 21st century we are gradually beginning to turn some past myths into reality. In this section, we will cover some recent innovations that can utilize AI to assist and allow humans to function better. Most research in this area aims to assist individuals with preexisting conditions and has not been implemented in normally functioning humans for the sake of human augmentation; however, this can perhaps change in the coming years.
Neuroprosthetics are defined as devices that help or augment the subject’s own nervous system, in both forms of input and output. This augmentation or stimulation often occurs in the form of an electrical stimulation to overcome the neurological deficiencies that patients experience.
These debilitating conditions can impair hearing, vision, cognitive, sensory, or motor skills, and can lead to comorbidities. Indeed, movement disorders such as multiple sclerosis or Parkinson's are progressive conditions that can lead to a painful and gradual decline in the above skills while the patient is always conscious of every change. Recent advances in brain–machine interfaces (BMIs) have shown that a system can be employed where the subject's intended and voluntary goal-directed wishes (recorded via electroencephalogram, EEG) can be stored and learned as the user "trains" an intelligent controller (an AI). This period of training allows for the identification of errors in certain tasks that the user deems incorrect: say that, on a computer screen, a square is directed to go left and instead goes right, or, where the BMI is connected to a fixed robotic hand, the subject directs the device to go up and the signals are interpreted as a down movement. Correct actions are stored, and the error-related brain signals are registered by the AI to correct future actions. Because of this "reinforcement learning," the system can potentially store one or several control "policies," which allow for patient personalization [48]. This is rather similar to the goals of the company Neuralink, which aims to bring the fields of materials science, robotics, electronics, and neuroscience together to try and solve multifaceted health problems [49].
While in its infancy and very exploratory, this field will be immensely helpful for patients with neurodegenerative diseases who will increasingly rely on neuroprostheses throughout their lives.
2.6. Ambient assisted living
With an aging society, more and more people live through old age with chronic disorders, and most manage to live independently up to an old age. Data indicate that half of people above the age of 65 years have a disability of some sort, which constitutes over 35 million people in the United States alone. Most people want to preserve their autonomy, even at an old age, and maintain control over their lives and decisions [50]. Assistive technologies increase the self-dependence of patients, encourage user participation through Information and Communication Technology (ICT) tools, provide remote-care-style assistance, and supply information to healthcare professionals. Assistive technologies are experiencing rapid growth, especially among people aged 65–74 years [51]. Governments, industries, and various organizations are promoting the concept of AAL, which enables people to live independently in their home environment. AAL has multiple objectives including promoting a healthy lifestyle for individuals at risk, increasing the autonomy and mobility of elderly individuals, and enhancing security, support, and productivity so people can live in their preferred environment and ultimately improve their quality of life. AAL applications typically collect data through sensors and cameras and apply various artificially intelligent tools for developing an intelligent system [52]. One way of implementing AAL is using smart homes or assistive robots.
2.6.1. Smart home
A smart home is a normal residential home, which has been augmented using different sensors and monitoring tools to make it “smart” and facilitate the lives of the residents in their living space. Other popular applications of AAL that can be a part of a smart home or used as an individual application include remote monitoring, reminders, alarm generation, behavior analysis, and robotic assistance.
Smart homes can be useful for people with dementia and several studies have investigated smart home applications to facilitate the lives of dementia patients. Low-cost sensors in an Internet of Things (IoT) architecture can be a useful way of detecting abnormal behavior in the home. For instance, sensors are placed in different areas of the house including the bedroom, kitchen, and bathroom to ensure safety. A sensor can be placed on the oven and detect the use of the cooker, so the patient is reminded if it was not switched off after use. A rain sensor can be placed by the window to alert the patient if the window was left open during rain. A bath sensor and a lamp sensor can be used in the bathroom to ensure that they are not left on [53] .
The sensors can transmit information to a nearby computing device that can process the data or upload them to the cloud for further processing using various machine learning algorithms, and if necessary, alert relatives or healthcare professionals ( Fig. 2.7 ). By daily collection of patient data, activities of daily living are defined over time and abnormalities can be detected as a deviation from the routine. Machine learning algorithms used in smart home applications include probabilistic and discriminative methods such as Naive Bayes classifier and Hidden Markov Model, support vector machine, and artificial neural networks [54] .
Figure 2.7.
Process diagram of a typical smart home or smart assistant setup.
In one example, Markov Logic Network was used for activity recognition design to model both simple and composite activities and decide on appropriate alerts to process patient abnormality. The Markov Logic Network used handles both uncertainty modeling and domain knowledge modeling within a single framework, thus modeling the factors that influence patient abnormality [55] . Uncertainty modeling is important for monitoring patients with dementia as activities conducted by the patient are typically incomplete in nature. Domain knowledge related to the patient’s lifestyle is also important and combined with their medical history it can enhance the probability of activity recognition and facilitate decision-making. This machine learning-based activity recognition framework detected abnormality together with contextual factors such as object, space, time, and duration for decision support on suitable action to keep the patient safe in the given environment. Alerts of different importance are typically used for such decision support and can, for instance, include a low-level alarm when the patient has forgotten to complete a routine activity such as switching off the lights or closing the window and a high-level alarm if the patient has fallen and requires intervention by a caretaker. One of the main aims of such activity monitoring approaches, as well as other monitoring tools, is to support healthcare practitioners in identifying symptoms of cognitive functioning or providing diagnosis and prognosis in a quantitative and objective manner using a smart home system [56] . There are various other assistive technology devices for people with dementia including motion detectors, electronic medication dispensers, and robotic devices for tracking.
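Production systems use the probabilistic and discriminative models mentioned above (Naive Bayes, hidden Markov models, SVMs, neural networks, or Markov logic networks); as a much simpler illustration of the same idea, learning a daily routine, flagging deviations, and grading the alerts, the sketch below compares today's activity durations against a learned routine using plain statistics, with all activities, durations, and thresholds invented.

```python
# Simplified sketch of routine learning and abnormality alerts in a smart home.
# A plain mean/standard-deviation "routine model" stands in for the learning
# methods named in the text; all data and thresholds are invented.
import statistics

# Minutes per day spent on each monitored activity over the past two weeks.
routine_history = {
    "cooking":  [35, 40, 30, 38, 36, 41, 33, 37, 39, 34, 36, 38, 35, 40],
    "bathroom": [25, 28, 22, 26, 27, 24, 25, 23, 26, 28, 25, 24, 27, 26],
    "sleeping": [470, 460, 480, 455, 465, 475, 468, 472, 458, 462, 470, 466, 474, 469],
}

def alert_level(minutes_today, history, low_z=2.0, high_z=4.0):
    mean = statistics.mean(history)
    std = statistics.stdev(history)
    z = abs(minutes_today - mean) / std          # deviation from the routine
    if z >= high_z:
        return "HIGH alert: notify caretaker"
    if z >= low_z:
        return "low alert: remind patient"
    return "normal"

today = {"cooking": 0, "bathroom": 26, "sleeping": 600}   # e.g. no cooking, very long sleep
for activity, minutes in today.items():
    print(activity, "->", alert_level(minutes, routine_history[activity]))
```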
2.6.2. Assistive robots
Assistive robots are used to compensate for the physical limitations of elderly and functionally impaired people and help them by assisting in daily activities, acting as an extra pair of hands or eyes. Such assistive robots can help in various activities such as mobility, housekeeping, medication management, eating, grooming, bathing, and various social communications. An assistive robot named RIBA, with human-type arms, was designed to help patients with lifting and moving heavy things. It has been demonstrated that the robot is able to carry a patient from the bed to a wheelchair and vice versa. Instructions can be provided to RIBA through tactile sensors, using a method known as tactile guidance to teach by showing [57].
The MARIO project (Managing active and healthy Aging with use of caring Service robots) is another assistive robot initiative that has attracted a lot of attention. The project aims to address the problems of loneliness, isolation, and dementia, which are commonly observed in elderly people. This is done by performing multifaceted interventions delivered by service robots. The MARIO Kompaï companion robot was developed with the objective of providing real feelings and emotions to improve acceptance by dementia patients, supporting physicians and caretakers in performing dementia assessment tests, and promoting interactions with the end users. The Kompaï robot used for the MARIO project was developed by Robosoft and contains a camera, a Kinect motion sensor, and two LiDAR remote sensing systems for navigation and object identification [58]. It further includes a speech recognition system and other controller and interface technologies, with the intention to support and manage a wide range of robotic applications in a single robotic platform, similar to apps for smartphones. The robotic apps include those focused on cognitive stimulation, social interaction, as well as general health assessment. Many of these apps use AI-powered tools to process the data collected by the robots in order to perform tasks such as facial recognition, object identification, language processing, and various diagnostic support [59].
2.6.3. Cognitive assistants
Many elderly people experience a decline in their cognitive abilities and have difficulties in problem-solving tasks as well as maintaining attention and accessing their memory. Cognitive stimulation is a common rehabilitation approach after brain injuries from stroke, multiple sclerosis or trauma, and various mild cognitive impairments. Cognitive stimulation has been demonstrated to decrease cognitive impairment and can be trained using assistive robots.
Virtrael is one such cognitive stimulation platform and serves to assess, stimulate, and train various cognitive skills that decline in the patient. The Virtrael program is based on visual memory training, and the project is carried out through three key functionalities: configuration, communication, and games. The configuration mode allows an administrator to match the patient with a therapist and the therapist to configure the program for the patient. The communication tool allows communication between the patient and the therapist and between patients. The games are intended to train cognitive skills of the patient including memory, attention, and planning ( Fig. 2.8 ) [60].
Figure 2.8.
Example of games used for training cognitive skills of patients [60] .
2.6.4. Social and emotional stimulation
One of the first applications of assistive robots and a commonly investigated technology is companion robots for social and emotional stimulation. Such robots assist elderly patients with their stress or depression by connecting emotionally with the patient through enhanced social interaction and assistance with various daily tasks. The robots vary from pet-like robots to more peer-like ones, and they are all interactive and provide psychological and social effects. The robotic pet PARO, a baby seal robot, is the most widely used robotic pet and carries various sensors to sense touch, sounds, and visual objects [61]. Another robot is the MARIO Kompaï mentioned earlier, which focuses on assisting elderly patients with dementia, loneliness, and isolation. Yet another companion robot, Buddy, by Blue Frog Robotics, assists elderly patients by helping with daily activities such as reminders about medication and appointments, as well as using motion sensors to detect falls and physical inactivity. Altogether, studies investigating cognitive stimulation seem to demonstrate a decrease in the rate of cognitive decline and progression of dementia.
2.7. The artificial intelligence can see you now
AI is increasingly becoming an integral part of all our lives, from smartphones to cars and, more importantly, our healthcare. This technology will continue to push boundaries, and certain norms that have been dormant and accepted as the status quo for hundreds of years will now be directly challenged and significantly augmented.
2.7.1. Artificial intelligence in the near and the remote
We believe that AI has an important role to play in the healthcare offerings of the future. In the form of machine learning, it is the primary capability behind the development of precision medicine, widely agreed to be a sorely needed advance in care. Although early efforts at providing diagnosis and treatment recommendations have proven challenging, we expect that AI will ultimately master that domain as well. Given the rapid advances in AI for imaging analysis, it seems likely that most radiology and pathology images will be examined at some point by a machine. Speech and text recognition are already employed for tasks like patient communication and capture of clinical notes, and their usage will increase.
The greatest challenge to AI in these healthcare domains is not whether the technologies will be capable enough to be useful, but rather ensuring their adoption in daily clinical practice. For widespread adoption to take place, AI systems must be approved by regulators, integrated with EHR systems, standardized to a sufficient degree that similar products work in a similar fashion, taught to clinicians, paid for by public or private payer organizations, and updated over time in the field. These challenges will ultimately be overcome, but they will take much longer to do so than it will take for the technologies themselves to mature. As a result, we expect to see limited use of AI in clinical practice within 5 years and more extensive use within 10 years.
It also seems increasingly clear that AI systems will not replace human clinicians on a large scale, but rather will augment their efforts to care for patients. Over time, human clinicians may move toward tasks and job designs that draw on uniquely human skills like empathy, persuasion, and big-picture integration. Perhaps the only healthcare providers who will risk their careers over time may be those who refuse to work alongside AI.
2.7.2. Success factors for artificial intelligence in healthcare
A review by Becker [62] suggests that AI used in healthcare can serve clinicians, patients, and other healthcare workers in four different ways. Here, we use these suggestions as inspiration and expand on their contribution toward a successful implementation of AI in healthcare ( Fig. 2.9 ):
Assessment of disease onset and treatment success.
Management or alleviation of complications.
Patient-care assistance during a treatment or procedure.
Research aimed at discovery or treatment of disease.
Figure 2.9.
The likely success factors depend largely on the satisfaction of the end users and the results that the AI-based systems produce.
2.7.2.1. Assessment of condition
Prediction and assessment of a condition is something that individuals will demand to have more control over in the coming years. This increase in demand is partly due to a technology-reliant population that has grown to expect technological innovation to help them lead healthy lives. Of course, not all answers lie in this arena, but it is an extremely promising field.
Mood and mental health-related conditions are an immensely important topic in today’s world, and for good reason. According to the WHO, one in four people around the world experiences such conditions, which can accelerate their path toward ill health and comorbidities. Recently, machine learning algorithms have been developed to detect words and intonations in an individual’s speech that may indicate a mood disorder. Using neural networks, an MIT-based lab has conducted research into the detection of early signs of depression from speech. According to the researchers, the “model sees sequences of words/speaking style” and decides whether these emerging patterns are likely to be seen in individuals with or without depression [63] . The technique employed by the researchers is often referred to as sequence modeling, where sequences of audio and text from patients with and without depression are fed to the system and, as these accumulate, various text patterns can be paired with audio signals. For example, words such as “low”, “blue,” and “sad” can be paired with more monotone and flat audio signals. Additionally, the speed of speech and the length of pauses can play a major role in detecting individuals experiencing depression. An example of this can be seen in Fig. 2.10 , where within a period of 60 seconds, and based on the tone and words used, it is possible to estimate the speaker's emotional state.
Figure 2.10.
Early detection of certain mood conditions can be predicted by analyzing the trend, tone of voice, and speaking style of individuals.
2.7.2.2. Managing complications
The general feeling of being unwell and the various complications that accompany mild illnesses are usually well tolerated by patients. However, for certain conditions it is critically important to manage these symptoms so as to prevent further progression and ultimately avoid more complex symptoms. A good example can be seen in the field of infectious diseases. In a study published in the Journal of Trauma and Acute Care Surgery, researchers argue that understanding the microbiological niches (biomarkers) of trauma patients could hold the key to predicting future wound infections and therefore allow healthcare workers to make the necessary arrangements to prevent the worst outcome [64] . Machine learning techniques can also contribute toward the prediction of serious complications, such as neuropathy in those suffering from type 2 diabetes or early cardiovascular irregularities. Furthermore, the development of models that help clinicians detect postoperative complications such as infections will contribute toward a more efficient system [65] .
2.7.2.3. Patient-care assistance
Patient-care assistance technologies can improve the workflow for clinicians and contribute toward patients’ autonomy and well-being. If each patient is treated as an independent system, then, based on the variety of designated data available, a bespoke approach can be implemented. This is of utmost importance for the elderly and the vulnerable in our societies. An example is a virtual health assistant that reminds individuals to take their required medications at a certain time or recommends exercise habits for an optimal outcome. The field of affective computing can contribute significantly in this arena. Affective computing refers to a discipline that allows a machine to process, interpret, simulate, and analyze human behavior and emotions. Here, patients will be able to interact with the device remotely and access their biometric data, all the while feeling that they are interacting with a caring and empathetic system that truly wants the best outcome for them. This approach can be applied both at home and in hospital to relieve work pressure from healthcare workers and improve service.
2.7.2.4. Medical research
AI can accelerate the diagnosis process and medical research. In recent years, an increasing number of partnerships have formed between biotech, MedTech, and pharmaceutical companies to accelerate the discovery of new drugs. These partnerships are not all based on curiosity-driven research but often arise out of necessity and the needs of society. In a world where certain expertise is rare, research costs are high, and effective treatments for certain conditions are yet to be devised, collaboration between various disciplines is key. A good example of this collaboration is a recent breakthrough in antibiotic discovery, where the researchers trained a neural network that actively “learned” the properties of a vast number of molecules in order to identify those that inhibit the growth of E. coli , a Gram-negative bacterial species that is notoriously hard to kill [66] . Another example is the recent research carried out regarding the COVID-19 pandemic around the world. Predictive Oncology, a precision medicine company, has announced that it is launching an AI platform to accelerate the production of new diagnostics and vaccines by running more than 12,000 computer simulations per machine. This is combined with other efforts to employ DL to find molecules that can interact with the main protease (M pro or 3CL pro ) of the virus, disrupting the replication machinery of the virus inside the host [67] , [68] .
2.7.3. The digital primary physician
As you walk into the primary care physician’s room, you are greeted by the doctor. There is an initial eye to eye contact, then an exchange of pleasantries follows. She further asks you about your health and how she can be of help. You, the patient, have multiple medical problems: previous presence of sciatica, snapping hip syndrome, high cholesterol, an above-average blood pressure, and chronic sinusitis. However, because of the limited time that you have with the doctor, priorities matter [69] . You categorize your own conditions and tend to focus on the most important to you, the chronic sinusitis. The doctor asks you multiple questions about the condition and as you are explaining your symptoms, she types it all in your online record, does a quick examination, writes a prescription, and says to come back in 6 weeks for further examination. For your other conditions, you probably need to book a separate appointment unless you live in a country that designates more than 20 minutes per patient.
The above scenario is the normal routine in most countries. However, despite the helpfulness of the physician, it is not an ideal system, and if you were in the position of the above patient, you would likely walk away dissatisfied with the care received. The frustration with such systems has put immense pressure on health workers and needs to be addressed. Today, there are numerous health-related applications that combine the power of AI with that of a remote physician to answer some of the simple questions that might not warrant a physical visit to the doctor.
2.7.3.1. Artificial intelligence prequalification (triage)
Prior to giving access to an actual doctor, trained AI bots can assess whether certain symptoms warrant an actual conversation with a physician. Many questions are asked of the patient, and based on each response, the software encourages the user to take specific actions. These questions and answers are often rigorously reviewed by medical professionals at each stage to ensure accuracy. In more serious cases, a general response of “You should see a doctor” is given and the patient is directed to book an appointment with a primary care physician.
2.7.3.2. Remote digital visits
The unique selling point of these recent innovations is that they allow remote video conversations between the patient and the physician. Normally, the patient books an appointment for a specific time, often during the same day. This gives the patient ample time to provide as much information as possible for the responsible physician to review and carefully analyze before the conversation. The information can be in the form of images, text, video, and audio. This is extremely encouraging, as many people around the world lack the time and resources to visit a physician, and it also allows the physician to work remotely.
2.7.3.3. The future of primary care
In a recent study on the future of AI in primary care, most practitioners, while acknowledging its potential benefits, were extremely skeptical that it would play a significant role in the future of the profession. One main pain point is the lack of empathy and the ethical dilemmas that can arise between AI and patients [70] . While this might be true today, it is naive to assume that this form of technology will remain dormant and will not progress any further. Humanity prefers streamlined and creative solutions that are effective and take less out of our daily lives. Combined with the ever-increasing breakthroughs in the fields of smart healthcare materials [71] and AI, one could envisage patients managing most of their own conditions at home and, when necessary, getting in touch with a relevant healthcare worker who can refer them to more specialized physicians. It is also very important to note that at the time of an epidemic, an outbreak, a natural or man-made disaster, or simply when the patient is away from their usual dwelling, a technology that allows humans to interact and solve problems remotely becomes a necessity. At the time of writing (early 2020), the threat of a SARS-CoV-2 epidemic looms over many countries and is expanding at an unprecedented rate. World experts speculate that the infection rate is high and has the potential to remain within a population and cause many fatalities in the months to come. It is therefore essential to promote remote healthcare facilities and technologies and to have permanent solutions in place to save lives and reduce unnecessary burden and risk for healthcare workers and patients alike.
- 1. Miller D.D., Brown E.W. Artificial intelligence in medical practice: the question to the answer? Am J Med. 2018;131(2):129–133. doi: 10.1016/j.amjmed.2017.10.035.
- 2. Kirch D.G., Petelle K. Addressing the physician shortage: the peril of ignoring demography. JAMA. 2017;317(19):1947–1948. doi: 10.1001/jama.2017.2714.
- 3. Combi C., Pozzani G., Pozzi G. Telemedicine for developing countries. Appl Clin Inform. 2016;07(04):1025–1050. doi: 10.4338/ACI-2016-06-R-0089.
- 4. Bresnick J. Artificial intelligence in healthcare market to see 40% CAGR surge; 2017.
- 5. Lee K.-F. AI superpowers: China, Silicon Valley, and the new world order. 1st ed. Houghton Mifflin Harcourt; 2019.
- 6. King D. DeepMind’s health team joins Google Health.
- 7. Hoyt R.E., Snider D., Thompson C., Mantravadi S. IBM Watson Analytics: automating visualization, descriptive, and predictive statistics. JMIR Public Health Surveill. 2016;2(2):e157. doi: 10.2196/publichealth.5810.
- 8. Marr B. How is AI used in healthcare—5 powerful real-world examples that show the latest advances. Forbes; 2018.
- 9. Kalis B, Collier M, Fu R. 10 promising AI applications in health care. Harvard Business Review; 2018.
- 10. Singhal S, Carlton S. The era of exponential improvement in healthcare? McKinsey Co Rev.; 2019.
- 11. Konieczny L, Roterman I. Personalized precision medicine. Bio-Algorithms Med-Syst 2019;15.
- 12. Love-Koh J. The future of precision medicine: potential impacts for health technology assessment. Pharmacoeconomics. 2018;36(12):1439–1451. doi: 10.1007/s40273-018-0686-6.
- 13. Kulski JK. Next-generation sequencing—an overview of the history, tools, and ‘omic’ applications; 2020.
- 14. Hughes J.P., Rees S., Kalindjian S.B., Philpott K.L. Principles of early drug discovery. Br J Pharmacol. 2011;162(6):1239–1249. doi: 10.1111/j.1476-5381.2010.01127.x.
- 15. Ekins S. Exploiting machine learning for end-to-end drug discovery and development. Nat Mater. 2019;18(5):435–441. doi: 10.1038/s41563-019-0338-z.
- 16. Zhang L., Tan J., Han D., Zhu H. From machine learning to deep learning: progress in machine intelligence for rational drug discovery. Drug Discov Today. 2017;22(11):1680–1685. doi: 10.1016/j.drudis.2017.08.010.
- 17. Lavecchia A. Deep learning in drug discovery: opportunities, challenges and future prospects. Drug Discov Today. 2019;24(10):2017–2032. doi: 10.1016/j.drudis.2019.07.006.
- 18. Coley C.W., Barzilay R., Green W.H., Jaakkola T.S., Jensen K.F. Convolutional embedding of attributed molecular graphs for physical property prediction. J Chem Inf Model. 2017;57(8):1757–1772. doi: 10.1021/acs.jcim.6b00601.
- 19. Mayr A., Klambauer G., Unterthiner T., Hochreiter S. DeepTox: toxicity prediction using deep learning. Front Environ Sci. 2016;3:80.
- 20. Wu Z. MoleculeNet: a benchmark for molecular machine learning. Chem Sci. 2018;9. doi: 10.1039/c7sc02664a.
- 21. Kadurin A., Nikolenko S., Khrabrov K., Aliper A., Zhavoronkov A. druGAN: an advanced generative adversarial autoencoder model for de novo generation of new molecules with desired molecular properties in silico. Mol Pharm. 2017;14(9):3098–3104. doi: 10.1021/acs.molpharmaceut.7b00346.
- 22. Blaschke T., Olivecrona M., Engkvist O., Bajorath J., Chen H. Application of generative autoencoder in de novo molecular design. Mol Inform. 2018;37(1–2):1700123. doi: 10.1002/minf.201700123.
- 23. Merk D., Friedrich L., Grisoni F., Schneider G. De novo design of bioactive small molecules by artificial intelligence. Mol Inform. 2018;37. doi: 10.1002/minf.201700153.
- 24. Shi T. Molecular image-based convolutional neural network for the prediction of ADMET properties. Chemom Intell Lab Syst. 2019;194:103853.
- 25. Wallach H.A., Dzamba M.I. AtomNet: a deep convolutional neural network for bioactivity prediction in structure-based drug discovery. arXiv. 2015.
- 26. Hashimoto D.A., Rosman G., Rus D., Meireles O.R. Artificial intelligence in surgery: promises and perils. Ann Surg. 2018;268:70–76. doi: 10.1097/SLA.0000000000002693.
- 27. Petscharnig S., Schöffmann K. Learning laparoscopic video shot classification for gynecological surgery. Multimed Tools Appl. 2018;77:8061–8079.
- 28. Lundervold A.S., Lundervold A. An overview of deep learning in medical imaging focusing on MRI. Z Med Phys. 2019;29:102–127. doi: 10.1016/j.zemedi.2018.11.002.
- 29. Chien CH, Chen CH, Jeng TS. An interactive augmented reality system for learning anatomy structure. In: Proceedings of the International MultiConference of Engineers and Computer Scientists 2010, IMECS 2010; 2010. http://www.iaeng.org/publication/IMECS2018/
- 30. Frendø M., Konge L., Cayé-Thomasen P., Sørensen M.S., Andersen S.A.W. Decentralized virtual reality training of mastoidectomy improves cadaver dissection performance: a prospective, controlled cohort study. Otol Neurotol. 2020;41(4). doi: 10.1097/MAO.0000000000002541.
- 31. Lee S.H., Jung H.Y., Yun S.J., Oh B.M., Seo H.G. Upper extremity rehabilitation using fully immersive virtual reality games with a head mount display: a feasibility study. PM R. 2020;12:257–262. doi: 10.1002/pmrj.12206.
- 32. Baños R.M. A positive psychological intervention using virtual reality for patients with advanced cancer in a hospital setting: a pilot study to assess feasibility. Support Care Cancer. 2013;21:263–270. doi: 10.1007/s00520-012-1520-x.
- 33. Dias D., Cunha J.P.S. Wearable health devices—vital sign monitoring, systems and technologies. Sensors (Basel). 2018;18(8):2414. doi: 10.3390/s18082414.
- 34. Johnson D., Deterding S., Kuhn K.-A., Staneva A., Stoyanov S., Hides L. Gamification for health and wellbeing: a systematic review of the literature. Internet Interv. 2016;6:89–106. doi: 10.1016/j.invent.2016.10.002.
- 35. Athilingam P., Labrador M.A., Remo E.F.J., Mack L., San Juan A.B., Elliott A.F. Features and usability assessment of a patient-centered mobile application (HeartMapp) for self-management of heart failure. Appl Nurs Res. 2016;32:156–163. doi: 10.1016/j.apnr.2016.07.001.
- 36. Rangasamy ASS, Nadenichek R, Rayasam M. Natural language processing in healthcare; 2018.
- 37. Pham T., Tran T., Phung D., Venkatesh S. Predicting healthcare trajectories from medical records: a deep learning approach. J Biomed Inform. 2017;69:218–229. doi: 10.1016/j.jbi.2017.04.001.
- 38. Rojahn K. Remote monitoring of chronic diseases: a landscape assessment of policies in four European countries. PLoS One. 2016;11:e0155738. doi: 10.1371/journal.pone.0155738.
- 39. Davenport TH, Hongsermeier TM, Mc Cord KA. Using AI to improve electronic health records. Harvard Business Review; 2018.
- 40. Wang F., Casalino L.P., Khullar D. Deep learning in medicine—promise, progress, and challenges. JAMA Intern Med. 2019;179:293–294. doi: 10.1001/jamainternmed.2018.7117.
- 41. Pham T, Tran T, Phung D, Venkatesh S. DeepCare: a deep dynamic memory model for predictive medicine. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Springer; 2016.
- 42. Konstantinova J., Jiang A., Althoefer K., Dasgupta P., Nanayakkara T. Implementation of tactile sensing for palpation in robot-assisted minimally invasive surgery: a review. IEEE Sens J. 2014;14.
- 43. Naeini F.B. A novel dynamic-vision-based approach for tactile sensing applications. IEEE Trans Instrum Meas. 2019:1.
- 44. Naidu A.S., Naish M.D., Patel R.V. A breakthrough in tumor localization: combining tactile sensing and ultrasound to improve tumor localization in robotics-assisted minimally invasive surgery. IEEE Robot Autom Mag. 2017;24.
- 45. Madani N., Mojra A. Quantitative diagnosis of breast tumors by characterization of viscoelastic behavior of healthy breast tissue. J Mech Behav Biomed Mater. 2017;68:180–187. doi: 10.1016/j.jmbbm.2017.01.044.
- 46. Simha RK. How Russian scientists cracked the secret of a Vedic ritual drink; 2017.
- 47. David O. Scientific verification of vedic knowledge: archaeology online.
- 48. Iturrate I., Chavarriaga R., Montesano L., Minguez J., del Millán J.R. Teaching brain-machine interfaces as an alternative paradigm to neuroprosthetics control. Sci Rep. 2015;5(1):13893. doi: 10.1038/srep13893.
- 49. Musk E. An integrated brain-machine interface platform with thousands of channels. J Med Internet Res. 2019;21(10):e16194. doi: 10.2196/16194.
- 50. Roberts AW, Ogunwole SU, Blakeslee L, Rabe MA. The population 65 years and older in the United States: 2016; 2018.
- 51. Anderson W.L., Wiener J.M. The impact of assistive technologies on formal and informal home care. Gerontologist. 2015;55:422–433. doi: 10.1093/geront/gnt165.
- 52. Barnay T., Juin S. Does home care for dependent elderly people improve their mental health? J Health Econ. 2016;45:149–160. doi: 10.1016/j.jhealeco.2015.10.008.
- 53. Demir E., Köseoǧlu E., Sokullu R., Şeker B. Smart home assistant for ambient assisted living of elderly people with dementia. Procedia Comp Sci. 2017;113:609–614.
- 54. Fahad LG, Ali A, Rajarajan M. Learning models for activity recognition in smart homes. In: Information science and applications. Berlin: Springer; 2015. p. 819–26.
- 55. Gayathri KS, Easwarakumar KS. Intelligent decision support system for dementia care through smart home. Procedia Comp Sci 2016;93:947–55.
- 56. Nef T. Evaluation of three state-of-the-art classifiers for recognition of activities of daily living from smart home ambient data. Sensors (Basel). 2015;15:11725–11740.
- 57. Joseph A, Christian B, Abiodun AA, Oyawale F. A review on humanoid robotics in healthcare. In: MATEC Web of Conferences; 2018. https://www.matec-conferences.org/
- 58. D’Onofrio G. MARIO Project: validation and evidence of service robots for older people with dementia. J Alzheimers Dis. 2019;68:1587–1601. doi: 10.3233/JAD-181165.
- 59. Koumakis L., Chatzaki C., Kazantzaki E., Maniadi E., Tsiknakis M. Dementia care frameworks and assistive technologies for their implementation: a review. IEEE Rev Biomed Eng. 2019;12:4–18. doi: 10.1109/RBME.2019.2892614.
- 60. Garcia-Alonso J, Fonseca C, editors. Gerontechnology: First International Workshop on Gerontechnology. Springer; 2018.
- 61. Vitanza A, D’Onofrio G, Ricciardi F, Sancarlo D, Greco A, Giuliani F. Assistive robots for the elderly: innovative tools to gather health relevant data. In: Data science for healthcare: methodologies and applications. Springer; 2019.
- 62. Becker A. Artificial intelligence in medicine: what is it doing for us today? Health Policy Technol. 2019;9:198–205.
- 63. Matheson R. Model can more naturally detect depression in conversations. MIT News; 2018.
- 64. Dente C.J. Towards precision medicine: accurate predictive modeling of infectious complications in combat casualties. J Trauma Acute Care Surg. 2017;83(4). doi: 10.1097/TA.0000000000001596.
- 65. Hu Z. Accelerating chart review using automated methods on electronic health record data for postoperative complications. AMIA Annu Symp Proc. 2016;2016:1822–1831.
- 66. Stokes J.M. A deep learning approach to antibiotic discovery. Cell. 2020;180:688–702. doi: 10.1016/j.cell.2020.01.021.
- 67. Zhang H., Saravanan K.M., Yang Y., Hossain M.T., Li J., Ren X. Deep learning based drug screening for novel coronavirus 2019-nCov. Preprints. 2020. doi: 10.1007/s12539-020-00376-6.
- 68. Zhang L. Crystal structure of SARS-CoV-2 main protease provides a basis for design of improved α-ketoamide inhibitors. Science. 2020:eabb3405. doi: 10.1126/science.abb3405.
- 69. Irving G. International variations in primary care physician consultation time: a systematic review of 67 countries. BMJ Open. 2017;7:e017902. doi: 10.1136/bmjopen-2017-017902.
- 70. Blease C., Kaptchuk T.J., Bernstein M.H., Mandl K.D., Halamka J.D., Desroches C.M. Artificial intelligence and the future of primary care: exploratory qualitative study of UK general practitioners’ views. J Med Internet Res. 2019;21:e12802. doi: 10.2196/12802.
- 71. Zang Y., Zhang F., Di C.A., Zhu D. Advances of flexible pressure sensors toward artificial intelligence and health care applications. Mater Horiz. 2015;2:140–156.
Machine Learning: Algorithms, Real-World Applications and Research Directions
- Review Article
- Published: 22 March 2021
- Volume 2 , article number 160 , ( 2021 )
Cite this article
- Iqbal H. Sarker ORCID: orcid.org/0000-0003-1740-5517 1 , 2
622k Accesses
1950 Citations
53 Altmetric
Explore all metrics
In the current age of the Fourth Industrial Revolution (4 IR or Industry 4.0), the digital world has a wealth of data, such as Internet of Things (IoT) data, cybersecurity data, mobile data, business data, social media data, health data, etc. To intelligently analyze these data and develop the corresponding smart and automated applications, the knowledge of artificial intelligence (AI), particularly, machine learning (ML) is the key. Various types of machine learning algorithms such as supervised, unsupervised, semi-supervised, and reinforcement learning exist in the area. Besides, the deep learning , which is part of a broader family of machine learning methods, can intelligently analyze the data on a large scale. In this paper, we present a comprehensive view on these machine learning algorithms that can be applied to enhance the intelligence and the capabilities of an application. Thus, this study’s key contribution is explaining the principles of different machine learning techniques and their applicability in various real-world application domains, such as cybersecurity systems, smart cities, healthcare, e-commerce, agriculture, and many more. We also highlight the challenges and potential research directions based on our study. Overall, this paper aims to serve as a reference point for both academia and industry professionals as well as for decision-makers in various real-world situations and application areas, particularly from the technical point of view.
Introduction
We live in the age of data, where everything around us is connected to a data source, and everything in our lives is digitally recorded [ 21 , 103 ]. For instance, the current electronic world has a wealth of various kinds of data, such as the Internet of Things (IoT) data, cybersecurity data, smart city data, business data, smartphone data, social media data, health data, COVID-19 data, and many more. The data can be structured, semi-structured, or unstructured, discussed briefly in Sect. “ Types of Real-World Data and Machine Learning Techniques ”, and is increasing day by day. Extracting insights from these data can be used to build various intelligent applications in the relevant domains. For instance, to build a data-driven automated and intelligent cybersecurity system, the relevant cybersecurity data can be used [ 105 ]; to build personalized context-aware smart mobile applications, the relevant mobile data can be used [ 103 ], and so on. Thus, data management tools and techniques that can extract insights or useful knowledge from data in a timely and intelligent way are urgently needed, as they form the basis of real-world applications.
The worldwide popularity score of various types of ML algorithms (supervised, unsupervised, semi-supervised, and reinforcement) in a range of 0 (min) to 100 (max) over time where x-axis represents the timestamp information and y-axis represents the corresponding score
Artificial intelligence (AI), and particularly machine learning (ML), has grown rapidly in recent years in the context of data analysis and computing, typically allowing applications to function in an intelligent manner [ 95 ]. ML usually provides systems with the ability to learn and improve from experience automatically without being specifically programmed, and is generally regarded as one of the most popular technologies of the fourth industrial revolution (4 IR or Industry 4.0) [ 103 , 105 ]. “Industry 4.0” [ 114 ] is typically the ongoing automation of conventional manufacturing and industrial practices, including exploratory data processing, using new smart technologies such as machine learning automation. Thus, to intelligently analyze these data and to develop the corresponding real-world applications, machine learning algorithms are the key. The learning algorithms can be categorized into four major types: supervised, unsupervised, semi-supervised, and reinforcement learning [ 75 ], discussed briefly in Sect. “ Types of Real-World Data and Machine Learning Techniques ”. The popularity of these learning approaches is increasing day by day, as shown in Fig. 1 , based on data collected from Google Trends [ 4 ] over the last five years. The x -axis of the figure indicates the specific dates, and the corresponding popularity score, within the range of \(0 \; (minimum)\) to \(100 \; (maximum)\), is shown on the y -axis. According to Fig. 1 , the popularity indication values for these learning types were low in 2015 and have been increasing day by day. These statistics motivate us to study machine learning in this paper, which can play an important role in the real world through Industry 4.0 automation.
In general, the effectiveness and the efficiency of a machine learning solution depend on the nature and characteristics of the data and the performance of the learning algorithms . In the area of machine learning algorithms, classification analysis, regression, data clustering, feature engineering and dimensionality reduction, association rule learning, and reinforcement learning techniques exist to effectively build data-driven systems [ 41 , 125 ]. Besides, deep learning , which originated from the artificial neural network and is part of a wider family of machine learning approaches, can be used to intelligently analyze data [ 96 ]. Thus, selecting a learning algorithm that is suitable for the target application in a particular domain is challenging. The reason is that different learning algorithms serve different purposes, and even the outcomes of different learning algorithms in the same category may vary depending on the data characteristics [ 106 ]. Thus, it is important to understand the principles of various machine learning algorithms and their applicability in various real-world application areas, such as IoT systems, cybersecurity services, business and recommendation systems, smart cities, healthcare and COVID-19, context-aware systems, sustainable agriculture, and many more, which are explained briefly in Sect. “ Applications of Machine Learning ”.
Based on the importance and potentiality of “Machine Learning” to analyze the data mentioned above, in this paper, we provide a comprehensive view on various types of machine learning algorithms that can be applied to enhance the intelligence and the capabilities of an application. Thus, the key contribution of this study is explaining the principles and potentiality of different machine learning techniques, and their applicability in various real-world application areas mentioned earlier. The purpose of this paper is, therefore, to provide a basic guide for those academia and industry people who want to study, research, and develop data-driven automated and intelligent systems in the relevant areas based on machine learning techniques.
The key contributions of this paper are listed as follows:
To define the scope of our study by taking into account the nature and characteristics of various types of real-world data and the capabilities of various learning techniques.
To provide a comprehensive view on machine learning algorithms that can be applied to enhance the intelligence and capabilities of a data-driven application.
To discuss the applicability of machine learning-based solutions in various real-world application domains.
To highlight and summarize the potential research directions within the scope of our study for intelligent data analysis and services.
The rest of the paper is organized as follows. The next section presents the types of data and machine learning algorithms in a broader sense and defines the scope of our study. We briefly discuss and explain different machine learning algorithms in the subsequent section followed by which various real-world application areas based on machine learning algorithms are discussed and summarized. In the penultimate section, we highlight several research issues and potential future directions, and the final section concludes this paper.
Types of Real-World Data and Machine Learning Techniques
Machine learning algorithms typically consume and process data to learn the related patterns about individuals, business processes, transactions, events, and so on. In the following, we discuss various types of real-world data as well as categories of machine learning algorithms.
Types of Real-World Data
Usually, the availability of data is considered as the key to construct a machine learning model or data-driven real-world systems [ 103 , 105 ]. Data can be of various forms, such as structured, semi-structured, or unstructured [ 41 , 72 ]. Besides, the “metadata” is another type that typically represents data about the data. In the following, we briefly discuss these types of data.
Structured: It has a well-defined structure, conforms to a data model following a standard order, which is highly organized and easily accessed, and used by an entity or a computer program. In well-defined schemes, such as relational databases, structured data are typically stored, i.e., in a tabular format. For instance, names, dates, addresses, credit card numbers, stock information, geolocation, etc. are examples of structured data.
Unstructured: On the other hand, there is no pre-defined format or organization for unstructured data, making it much more difficult to capture, process, and analyze, mostly containing text and multimedia material. For example, sensor data, emails, blog entries, wikis, and word processing documents, PDF files, audio files, videos, images, presentations, web pages, and many other types of business documents can be considered as unstructured data.
Semi-structured: Semi-structured data are not stored in a relational database like the structured data mentioned above, but it does have certain organizational properties that make it easier to analyze. HTML, XML, JSON documents, NoSQL databases, etc., are some examples of semi-structured data.
Metadata: It is not the normal form of data, but “data about data”. The primary difference between “data” and “metadata” is that data are simply the material that can classify, measure, or even document something relative to an organization’s data properties. On the other hand, metadata describes the relevant data information, giving it more significance for data users. A basic example of a document’s metadata might be the author, file size, date generated by the document, keywords to define the document, etc.
In the area of machine learning and data science, researchers use various widely used datasets for different purposes. These are, for example, cybersecurity datasets such as NSL-KDD [ 119 ], UNSW-NB15 [ 76 ], ISCX’12 [ 1 ], CIC-DDoS2019 [ 2 ], Bot-IoT [ 59 ], etc., smartphone datasets such as phone call logs [ 84 , 101 ], SMS Log [ 29 ], mobile application usages logs [ 137 ] [ 117 ], mobile phone notification logs [ 73 ] etc., IoT data [ 16 , 57 , 62 ], agriculture and e-commerce data [ 120 , 138 ], health data such as heart disease [ 92 ], diabetes mellitus [ 83 , 134 ], COVID-19 [ 43 , 74 ], etc., and many more in various application domains. The data can be in different types discussed above, which may vary from application to application in the real world. To analyze such data in a particular problem domain, and to extract the insights or useful knowledge from the data for building the real-world intelligent applications, different types of machine learning techniques can be used according to their learning capabilities, which is discussed in the following.
Types of Machine Learning Techniques
Machine Learning algorithms are mainly divided into four categories: Supervised learning, Unsupervised learning, Semi-supervised learning, and Reinforcement learning [ 75 ], as shown in Fig. 2 . In the following, we briefly discuss each type of learning technique with the scope of their applicability to solve real-world problems.
Various types of machine learning techniques
Supervised: Supervised learning is typically the task of machine learning to learn a function that maps an input to an output based on sample input-output pairs [ 41 ]. It uses labeled training data and a collection of training examples to infer a function. Supervised learning is carried out when certain goals are identified to be accomplished from a certain set of inputs [ 105 ], i.e., a task-driven approach . The most common supervised tasks are “classification” that separates the data, and “regression” that fits the data. For instance, predicting the class label or sentiment of a piece of text, like a tweet or a product review, i.e., text classification, is an example of supervised learning.
Unsupervised: Unsupervised learning analyzes unlabeled datasets without the need for human interference, i.e., a data-driven process [ 41 ]. This is widely used for extracting generative features, identifying meaningful trends and structures, groupings in results, and exploratory purposes. The most common unsupervised learning tasks are clustering, density estimation, feature learning, dimensionality reduction, finding association rules, anomaly detection, etc.
Semi-supervised: Semi-supervised learning can be defined as a hybridization of the above-mentioned supervised and unsupervised methods, as it operates on both labeled and unlabeled data [ 41 , 105 ]. Thus, it falls between learning “without supervision” and learning “with supervision”. In the real world, labeled data could be rare in several contexts, and unlabeled data are numerous, where semi-supervised learning is useful [ 75 ]. The ultimate goal of a semi-supervised learning model is to provide a better outcome for prediction than that produced using the labeled data alone from the model. Some application areas where semi-supervised learning is used include machine translation, fraud detection, labeling data and text classification.
Reinforcement: Reinforcement learning is a type of machine learning algorithm that enables software agents and machines to automatically evaluate the optimal behavior in a particular context or environment to improve its efficiency [ 52 ], i.e., an environment-driven approach . This type of learning is based on reward or penalty, and its ultimate goal is to use insights obtained from interacting with the environment to take actions that increase the reward or minimize the risk [ 75 ]. It is a powerful tool for training AI models that can help increase automation or optimize the operational efficiency of sophisticated systems such as robotics, autonomous driving, manufacturing, and supply chain logistics; however, it is not preferable for solving basic or straightforward problems.
Thus, to build effective models in various application areas different types of machine learning techniques can play a significant role according to their learning capabilities, depending on the nature of the data discussed earlier, and the target outcome. In Table 1 , we summarize various types of machine learning techniques with examples. In the following, we provide a comprehensive view of machine learning algorithms that can be applied to enhance the intelligence and capabilities of a data-driven application.
Machine Learning Tasks and Algorithms
In this section, we discuss various machine learning algorithms that include classification analysis, regression analysis, data clustering, association rule learning, feature engineering for dimensionality reduction, as well as deep learning methods. A general structure of a machine learning-based predictive model has been shown in Fig. 3 , where the model is trained from historical data in phase 1 and the outcome is generated in phase 2 for the new test data.
A general structure of a machine learning based predictive model considering both the training and testing phase
Classification Analysis
Classification is regarded as a supervised learning method in machine learning, referring to a problem of predictive modeling as well, where a class label is predicted for a given example [ 41 ]. Mathematically, it maps a function ( f ) from input variables ( X ) to output variables ( Y ) as target, label or categories. To predict the class of given data points, it can be carried out on structured or unstructured data. For example, spam detection such as “spam” and “not spam” in email service providers can be a classification problem. In the following, we summarize the common classification problems.
Binary classification: It refers to the classification tasks having two class labels such as “true and false” or “yes and no” [ 41 ]. In such binary classification tasks, one class could be the normal state, while the abnormal state could be another class. For instance, “cancer not detected” is the normal state of a task that involves a medical test, and “cancer detected” could be considered as the abnormal state. Similarly, “spam” and “not spam” in the above example of email service providers are considered as binary classification.
Multiclass classification: Traditionally, this refers to those classification tasks having more than two class labels [ 41 ]. The multiclass classification does not have the principle of normal and abnormal outcomes, unlike binary classification tasks. Instead, within a range of specified classes, examples are classified as belonging to one. For example, it can be a multiclass classification task to classify various types of network attacks in the NSL-KDD [ 119 ] dataset, where the attack categories are classified into four class labels, such as DoS (Denial of Service Attack), U2R (User to Root Attack), R2L (Root to Local Attack), and Probing Attack.
Multi-label classification: In machine learning, multi-label classification is an important consideration where an example is associated with several classes or labels. Thus, it is a generalization of multiclass classification, where the classes involved in the problem are hierarchically structured, and each example may simultaneously belong to more than one class in each hierarchical level, e.g., multi-level text classification. For instance, Google news can be presented under the categories of a “city name”, “technology”, or “latest news”, etc. Multi-label classification includes advanced machine learning algorithms that support predicting various mutually non-exclusive classes or labels, unlike traditional classification tasks where class labels are mutually exclusive [ 82 ].
Many classification algorithms have been proposed in the machine learning and data science literature [ 41 , 125 ]. In the following, we summarize the most common and popular methods that are used widely in various application areas.
Naive Bayes (NB): The naive Bayes algorithm is based on Bayes’ theorem with the assumption of independence between each pair of features [ 51 ]. It works well in many real-world situations, such as document or text classification and spam filtering, and can be used for both binary and multi-class categories. To effectively classify the noisy instances in the data and to construct a robust prediction model, the NB classifier can be used [ 94 ]. The key benefit is that, compared to more sophisticated approaches, it needs only a small amount of training data to estimate the necessary parameters quickly [ 82 ]. However, its performance may suffer due to its strong assumption of feature independence. Gaussian, Multinomial, Complement, Bernoulli, and Categorical are the common variants of the NB classifier [ 82 ].
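To make this concrete, a minimal scikit-learn sketch (not from the original text) trains a Gaussian naive Bayes classifier on a toy dataset; the dataset and split settings are arbitrary choices for illustration.

```python
# Minimal Gaussian naive Bayes sketch (illustrative only).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)  # toy multi-class dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

nb = GaussianNB()                  # assumes continuous, conditionally independent features
nb.fit(X_train, y_train)
print("test accuracy:", nb.score(X_test, y_test))
```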
Linear Discriminant Analysis (LDA): Linear Discriminant Analysis (LDA) is a linear decision boundary classifier created by fitting class conditional densities to data and applying Bayes’ rule [ 51 , 82 ]. This method is also known as a generalization of Fisher’s linear discriminant, which projects a given dataset into a lower-dimensional space, i.e., a reduction of dimensionality that minimizes the complexity of the model or reduces the resulting model’s computational costs. The standard LDA model usually suits each class with a Gaussian density, assuming that all classes share the same covariance matrix [ 82 ]. LDA is closely related to ANOVA (analysis of variance) and regression analysis, which seek to express one dependent variable as a linear combination of other features or measurements.
Logistic regression (LR): Another common probabilistic, statistics-based model used to solve classification problems in machine learning is logistic regression (LR) [ 64 ]. Logistic regression typically uses a logistic function to estimate the probabilities, also referred to as the sigmoid function, \(g(z) = \frac{1}{1+e^{-z}}\) in Eq. 1 . It works well when the dataset can be separated linearly, but it can overfit high-dimensional datasets. The regularization (L1 and L2) techniques [ 82 ] can be used to avoid over-fitting in such scenarios. The assumption of linearity between the dependent and independent variables is considered a major drawback of logistic regression. It can be used for both classification and regression problems, but it is more commonly used for classification.
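A small illustrative sketch, assuming scikit-learn and a synthetic dataset, shows the sigmoid of Eq. 1 alongside an L2-regularized logistic regression; all parameter values are arbitrary.

```python
# Logistic regression sketch: the sigmoid squashes a linear score into a probability.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))  # Eq. (1): g(z) = 1 / (1 + e^{-z})

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
clf = LogisticRegression(penalty="l2", C=1.0)        # L2 regularization to limit over-fitting
clf.fit(X, y)

# Recomputing the positive-class probabilities by hand matches the fitted model.
proba = sigmoid(X @ clf.coef_.ravel() + clf.intercept_)
print(np.allclose(proba, clf.predict_proba(X)[:, 1]))
```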
K-nearest neighbors (KNN): K-Nearest Neighbors (KNN) [ 9 ] is an “instance-based learning” or non-generalizing learning, also known as a “lazy learning” algorithm. It does not focus on constructing a general internal model; instead, it stores all instances corresponding to training data in n -dimensional space. KNN uses data and classifies new data points based on similarity measures (e.g., Euclidean distance function) [ 82 ]. Classification is computed from a simple majority vote of the k nearest neighbors of each point. It is quite robust to noisy training data, and accuracy depends on the data quality. The biggest issue with KNN is to choose the optimal number of neighbors to be considered. KNN can be used both for classification as well as regression.
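A brief hypothetical example of tuning the number of neighbors k with cross-validation; the candidate values of k and the toy dataset are illustrative assumptions.

```python
# K-nearest neighbors sketch: classification by majority vote among the k closest points.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
for k in (1, 5, 15):  # choosing k is the main tuning decision for KNN
    knn = KNeighborsClassifier(n_neighbors=k, metric="euclidean")
    print(k, cross_val_score(knn, X, y, cv=5).mean())
```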
Support vector machine (SVM): In machine learning, another common technique that can be used for classification, regression, or other tasks is a support vector machine (SVM) [ 56 ]. In high- or infinite-dimensional space, a support vector machine constructs a hyper-plane or set of hyper-planes. Intuitively, the hyper-plane, which has the greatest distance from the nearest training data points in any class, achieves a strong separation since, in general, the greater the margin, the lower the classifier’s generalization error. It is effective in high-dimensional spaces and can behave differently based on different mathematical functions known as the kernel. Linear, polynomial, radial basis function (RBF), sigmoid, etc., are the popular kernel functions used in SVM classifier [ 82 ]. However, when the data set contains more noise, such as overlapping target classes, SVM does not perform well.
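The following sketch, assuming scikit-learn's SVC, compares a linear and an RBF kernel; feature scaling is included because SVMs are sensitive to feature ranges, and the dataset and C value are illustrative choices.

```python
# Support vector machine sketch comparing two kernel functions.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
for kernel in ("linear", "rbf"):
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel, C=1.0))
    print(kernel, cross_val_score(model, X, y, cv=5).mean())
```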
Decision tree (DT): Decision tree (DT) [ 88 ] is a well-known non-parametric supervised learning method. DT learning methods are used for both classification and regression tasks [ 82 ]. ID3 [ 87 ], C4.5 [ 88 ], and CART [ 20 ] are well-known DT algorithms. Moreover, the recently proposed BehavDT [ 100 ] and IntrudTree [ 97 ] by Sarker et al. are effective in the relevant application domains, such as user behavior analytics and cybersecurity analytics, respectively. By sorting down the tree from the root to some leaf node, as shown in Fig. 4 , a DT classifies the instances. Instances are classified by checking the attribute defined by each node, starting at the root node of the tree and then moving down the tree branch corresponding to the attribute value. For splitting, the most popular criteria are “gini” for the Gini impurity, \(Gini(E) = 1 - \sum_{i=1}^{c} p_i^{2}\), and “entropy” for the information gain, \(H(E) = -\sum_{i=1}^{c} p_i \log_2 p_i\), where \(p_i\) is the proportion of instances belonging to class i [ 82 ].
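As an illustration (not the authors' code), a short scikit-learn sketch fits a shallow tree with the entropy criterion and prints its learned IF-THEN structure; the depth limit and dataset are arbitrary.

```python
# Decision tree sketch: the split criterion ("gini" or "entropy") is a hyperparameter.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
tree.fit(X, y)
print(export_text(tree))  # prints the learned IF-THEN rule structure of the tree
```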
An example of a decision tree structure
An example of a random forest structure considering multiple decision trees
Random forest (RF): A random forest classifier [ 19 ] is well known as an ensemble classification technique that is used in the field of machine learning and data science in various application areas. This method uses “parallel ensembling” which fits several decision tree classifiers in parallel, as shown in Fig. 5 , on different data set sub-samples and uses majority voting or averages for the outcome or final result. It thus minimizes the over-fitting problem and increases the prediction accuracy and control [ 82 ]. Therefore, the RF learning model with multiple decision trees is typically more accurate than a single decision tree based model [ 106 ]. To build a series of decision trees with controlled variation, it combines bootstrap aggregation (bagging) [ 18 ] and random feature selection [ 11 ]. It is adaptable to both classification and regression problems and fits well for both categorical and continuous values.
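A minimal random forest sketch in scikit-learn, with the number of trees and the feature-sampling rule chosen arbitrarily for illustration.

```python
# Random forest sketch: many trees on bootstrap samples, combined by majority voting.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

rf = RandomForestClassifier(n_estimators=200, max_features="sqrt", random_state=1)
rf.fit(X_train, y_train)
print("test accuracy:", rf.score(X_test, y_test))
```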
Adaptive Boosting (AdaBoost): Adaptive Boosting (AdaBoost) is an ensemble learning process that employs an iterative approach to improve poor classifiers by learning from their errors. It was developed by Yoav Freund et al. [ 35 ] and is also known as “meta-learning”. Unlike the random forest, which uses parallel ensembling, AdaBoost uses “sequential ensembling”. It creates a powerful classifier by combining many poorly performing classifiers to obtain a good classifier of high accuracy. In that sense, AdaBoost is called an adaptive classifier because it significantly improves the efficiency of the classifier, but in some instances it can trigger overfitting. AdaBoost is best used to boost the performance of decision trees as the base estimator [ 82 ] on binary classification problems; however, it is sensitive to noisy data and outliers.
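An illustrative AdaBoost sketch using scikit-learn's default base learner (a depth-1 decision tree, i.e., a "stump"); the number of estimators and the learning rate are arbitrary assumptions.

```python
# AdaBoost sketch: weak learners are added sequentially, each focusing on earlier mistakes.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
ada = AdaBoostClassifier(n_estimators=100, learning_rate=0.5, random_state=0)
print(cross_val_score(ada, X, y, cv=5).mean())
```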
Extreme gradient boosting (XGBoost): Gradient Boosting, like Random Forests [ 19 ] above, is an ensemble learning algorithm that generates a final model based on a series of individual models, typically decision trees. The gradient is used to minimize the loss function, similar to how neural networks [ 41 ] use gradient descent to optimize weights. Extreme Gradient Boosting (XGBoost) is a form of gradient boosting that takes more detailed approximations into account when determining the best model [ 82 ]. It computes second-order gradients of the loss function to minimize loss and advanced regularization (L1 and L2) [ 82 ], which reduces over-fitting, and improves model generalization and performance. XGBoost is fast to interpret and can handle large-sized datasets well.
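As a hedged illustration, the sketch below uses scikit-learn's own GradientBoostingClassifier rather than XGBoost itself; the separate xgboost package exposes a similar scikit-learn-style XGBClassifier. All hyperparameter values here are arbitrary.

```python
# Gradient boosting sketch: trees are fitted sequentially to the gradient of the loss.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=2)

gb = GradientBoostingClassifier(n_estimators=300, learning_rate=0.05, max_depth=3, random_state=2)
gb.fit(X_train, y_train)
print("test accuracy:", gb.score(X_test, y_test))
```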
Stochastic gradient descent (SGD): Stochastic gradient descent (SGD) [ 41 ] is an iterative method for optimizing an objective function with suitable smoothness properties, where the word ‘stochastic’ refers to random probability. This reduces the computational burden, particularly in high-dimensional optimization problems, allowing for faster iterations in exchange for a lower convergence rate. A gradient is the slope of a function that measures a variable’s degree of change in response to another variable’s changes. Mathematically, gradient descent minimizes a cost function by following its partial derivatives with respect to the input parameters. Let \(\alpha\) be the learning rate and \(J_i\) the cost of the \(i^\mathrm{th}\) training example; then Eq. ( 4 ), \(w_{j+1} = w_j - \alpha \, \nabla J_i(w_j)\), represents the stochastic gradient descent weight update at the \(j^\mathrm{th}\) iteration. In large-scale and sparse machine learning, SGD has been successfully applied to problems often encountered in text classification and natural language processing [ 82 ]. However, SGD is sensitive to feature scaling and needs a range of hyperparameters, such as the regularization parameter and the number of iterations.
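A minimal NumPy sketch of the per-example update in Eq. (4), applied here to a least-squares linear model; the learning rate, epoch count, and synthetic data are illustrative assumptions.

```python
# Stochastic gradient descent sketch: one weight update per randomly chosen training example,
# following w <- w - alpha * grad J_i(w) for a squared-error cost.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=200)

w = np.zeros(3)
alpha = 0.01                                  # learning rate
for epoch in range(50):
    for i in rng.permutation(len(X)):         # visit examples in random order
        grad_i = (X[i] @ w - y[i]) * X[i]     # gradient of the single-example squared error
        w -= alpha * grad_i                   # Eq. (4)-style update
print(w)  # should be close to true_w
```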
Rule-based classification : The term rule-based classification can be used to refer to any classification scheme that makes use of IF-THEN rules for class prediction. Several classification algorithms with the ability to generate rules exist, such as Zero-R [ 125 ], One-R [ 47 ], decision trees [ 87 , 88 ], DTNB [ 110 ], Ripple Down Rule learner (RIDOR) [ 125 ], and Repeated Incremental Pruning to Produce Error Reduction (RIPPER) [ 126 ]. The decision tree is one of the most common rule-based classification algorithms among these techniques because it has several advantages, such as being easier to interpret, the ability to handle high-dimensional data, simplicity and speed, good accuracy, and the capability to produce rules that are clear and understandable for humans [ 127 , 128 ]. The decision tree-based rules also provide significant accuracy in a prediction model for unseen test cases [ 106 ]. Since the rules are easily interpretable, these rule-based classifiers are often used to produce descriptive models that can describe a system, including the entities and their relationships.
Classification vs. regression. In classification the dotted line represents a linear boundary that separates the two classes; in regression, the dotted line models the linear relationship between the two variables
Regression Analysis
Regression analysis includes several machine learning methods that allow one to predict a continuous ( y ) outcome variable based on the value of one or more ( x ) predictor variables [ 41 ]. The most significant distinction between classification and regression is that classification predicts distinct class labels, while regression facilitates the prediction of a continuous quantity. Figure 6 shows an example of how classification differs from regression models. Some overlap is often found between the two types of machine learning algorithms. Regression models are now widely used in a variety of fields, including financial forecasting or prediction, cost estimation, trend analysis, marketing, time series estimation, drug response modeling, and many more. Some of the familiar types of regression algorithms are linear, polynomial, lasso, and ridge regression, which are explained briefly in the following.
Simple and multiple linear regression: This is one of the most popular ML modeling techniques as well as a well-known regression technique. In this technique, the dependent variable is continuous, the independent variable(s) can be continuous or discrete, and the form of the regression line is linear. Linear regression creates a relationship between the dependent variable ( Y ) and one or more independent variables ( X ), also known as the regression line, using the best-fit straight line [ 41 ]. It is defined by the following equations:

\(y = a + bx + e\)  (5)

\(y = a + b_1 x_1 + b_2 x_2 + \dots + b_n x_n + e\)  (6)

where a is the intercept, b is the slope of the line, and e is the error term. These equations can be used to predict the value of the target variable based on the given predictor variable(s). Multiple linear regression is an extension of simple linear regression that allows two or more predictor variables to model a response variable, y, as a linear function [ 41 ], defined in Eq. 6 , whereas simple linear regression has only one independent variable, defined in Eq. 5 .
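As a brief illustration of Eqs. (5) and (6) (a sketch added here, assuming scikit-learn and synthetic data; the coefficient values are arbitrary), a multiple linear regression model can be fitted as follows:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data with two predictors: y = 3 + 1.5*x1 - 2*x2 + noise.
X = np.random.rand(100, 2)
y = 3.0 + 1.5 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * np.random.randn(100)

reg = LinearRegression().fit(X, y)
print("Intercept a:", reg.intercept_)   # estimate of the intercept term
print("Coefficients b:", reg.coef_)     # estimates of b1 and b2
```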
Polynomial regression: Polynomial regression is a form of regression analysis in which the relationship between the independent variable x and the dependent variable y is modeled not as a straight line but as an \(n\mathrm{th}\) -degree polynomial in x [ 82 ]. The equation for polynomial regression is derived from the linear regression equation (polynomial regression of degree 1) and is defined as below:

\(y = b_0 + b_1 x + b_2 x^2 + \dots + b_n x^n + e\)

Here, y is the predicted/target output, \(b_0, b_1, \dots, b_n\) are the regression coefficients, and x is the independent/input variable. In simple words, if the data are not distributed linearly but instead follow an \(n\mathrm{th}\) -degree polynomial, then polynomial regression is used to obtain the desired output.
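As a hedged illustration (an added sketch; scikit-learn and the quadratic toy data are assumptions made only for this example), polynomial regression is typically implemented by expanding the input into polynomial features and then fitting an ordinary linear model:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Quadratic relationship between x and y with some noise.
x = np.linspace(-3, 3, 100).reshape(-1, 1)
y = 1.0 + 2.0 * x.ravel() + 0.5 * x.ravel() ** 2 + 0.2 * np.random.randn(100)

# Expand x into [1, x, x^2] and fit ordinary least squares on the expanded features.
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(x, y)
print(model.predict(np.array([[1.5]])))
```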
LASSO and ridge regression: LASSO and ridge regression are well known as powerful techniques typically used for building learning models in the presence of a large number of features, owing to their capability to prevent over-fitting and reduce model complexity. The LASSO (least absolute shrinkage and selection operator) regression model uses the L 1 regularization technique [ 82 ], which applies shrinkage by penalizing the absolute value of the magnitude of the coefficients ( L 1 penalty). As a result, LASSO can shrink some coefficients exactly to zero. Thus, LASSO regression aims to find the subset of predictors that minimizes the prediction error for a quantitative response variable. On the other hand, ridge regression uses L 2 regularization [ 82 ], which penalizes the squared magnitude of the coefficients ( L 2 penalty). Thus, ridge regression forces the weights to be small but never sets a coefficient exactly to zero, yielding a non-sparse solution. Overall, LASSO regression is useful for obtaining a subset of predictors by eliminating less important features, whereas ridge regression is useful when a dataset has "multicollinearity", i.e., predictors that are correlated with other predictors.
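The contrast between the two penalties can be seen in a small sketch (illustrative only; scikit-learn, the synthetic data, and the penalty strength alpha=1.0 are assumptions for this example):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Many features, few of them informative: a setting where L1 shrinkage helps.
X, y = make_regression(n_samples=200, n_features=50, n_informative=5, noise=5.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)   # L1 penalty: many coefficients driven exactly to zero
ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty: coefficients shrunk but kept non-zero

print("Non-zero LASSO coefficients:", np.sum(lasso.coef_ != 0))
print("Non-zero ridge coefficients:", np.sum(ridge.coef_ != 0))
```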
Cluster Analysis
Cluster analysis, also known as clustering, is an unsupervised machine learning technique for identifying and grouping related data points in large datasets without concern for a specific outcome. It groups a collection of objects in such a way that objects in the same category, called a cluster, are in some sense more similar to each other than objects in other groups [ 41 ]. It is often used as a data analysis technique to discover interesting trends or patterns in data, e.g., groups of consumers based on their behavior. Clustering can be used in a broad range of application areas, such as cybersecurity, e-commerce, mobile data processing, health analytics, user modeling, and behavioral analytics. In the following, we briefly discuss and summarize various types of clustering methods.
Partitioning methods: Based on the features and similarities in the data, this clustering approach divides the data into multiple groups or clusters. Data scientists or analysts typically determine the number of clusters to produce, either dynamically or statically, depending on the nature of the target application. The most common partitioning-based clustering algorithms are K-means [ 69 ], K-medoids [ 80 ], CLARA [ 55 ], etc.
Density-based methods: To identify distinct groups or clusters, it uses the concept that a cluster in the data space is a contiguous region of high point density isolated from other such clusters by contiguous regions of low point density. Points that are not part of a cluster are considered as noise. The typical clustering algorithms based on density are DBSCAN [ 32 ], OPTICS [ 12 ] etc. The density-based methods typically struggle with clusters of similar density and high dimensionality data.
Hierarchical-based methods: Hierarchical clustering typically seeks to construct a hierarchy of clusters, i.e., a tree structure. Strategies for hierarchical clustering generally fall into two types: (i) agglomerative, a "bottom-up" approach in which each observation begins in its own cluster and pairs of clusters are merged as one moves up the hierarchy; and (ii) divisive, a "top-down" approach in which all observations begin in one cluster and splits are performed recursively as one moves down the hierarchy, as shown in Fig 7 . Our earlier proposed BOTS technique, Sarker et al. [ 102 ], is an example of a hierarchical, specifically bottom-up, clustering algorithm.
Grid-based methods: To deal with massive datasets, grid-based clustering is especially suitable. To obtain clusters, the principle is first to summarize the dataset with a grid representation and then to combine grid cells. STING [ 122 ], CLIQUE [ 6 ], etc. are the standard algorithms of grid-based clustering.
Model-based methods: There are mainly two types of model-based clustering algorithms: one that uses statistical learning, and the other based on a method of neural network learning [ 130 ]. For instance, GMM [ 89 ] is an example of a statistical learning method, and SOM [ 22 ] [ 96 ] is an example of a neural network learning method.
Constraint-based methods: Constrained-based clustering is a semi-supervised approach to data clustering that uses constraints to incorporate domain knowledge. Application or user-oriented constraints are incorporated to perform the clustering. The typical algorithms of this kind of clustering are COP K-means [ 121 ], CMWK-Means [ 27 ], etc.
A graphical interpretation of the widely-used hierarchical clustering (Bottom-up and top-down) technique
Many clustering algorithms with the ability to group data have been proposed in the machine learning and data science literature [ 41 , 125 ]. In the following, we summarize the popular methods that are widely used in various application areas.
K-means clustering: K-means clustering [ 69 ] is a fast, robust, and simple algorithm that provides reliable results when the clusters in a dataset are well separated from each other. The data points are allocated to clusters in such a way that the sum of the squared distances between the data points and the centroids is as small as possible. In other words, the K-means algorithm identifies k centroids and then assigns each data point to the nearest cluster while keeping the within-cluster distances as small as possible. Because it begins with a random selection of cluster centers, the results can be inconsistent. Because extreme values can easily distort a mean, the K-means clustering algorithm is sensitive to outliers. K-medoids clustering [ 91 ] is a variant of K-means that is more robust to noise and outliers.
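A minimal sketch (illustrative; scikit-learn and the synthetic blob data are assumptions for this example) of fitting K-means with k = 3 might look as follows:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Three well-separated blobs: a setting where K-means behaves reliably.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=42)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
print("Cluster centroids:\n", kmeans.cluster_centers_)
print("Labels of first 10 points:", kmeans.labels_[:10])
```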
Mean-shift clustering: Mean-shift clustering [ 37 ] is a nonparametric clustering technique that does not require prior knowledge of the number of clusters or constraints on cluster shape. Mean-shift clustering aims to discover “blobs” in a smooth distribution or density of samples [ 82 ]. It is a centroid-based algorithm that works by updating centroid candidates to be the mean of the points in a given region. To form the final set of centroids, these candidates are filtered in a post-processing stage to remove near-duplicates. Cluster analysis in computer vision and image processing are examples of application domains. Mean Shift has the disadvantage of being computationally expensive. Moreover, in cases of high dimension, where the number of clusters shifts abruptly, the mean-shift algorithm does not work well.
DBSCAN: Density-based spatial clustering of applications with noise (DBSCAN) [ 32 ] is a base algorithm for density-based clustering that is widely used in data mining and machine learning. It is a non-parametric, density-based clustering technique for separating high-density clusters from low-density regions during model building. DBSCAN's main idea is that a point belongs to a cluster if it is close to many points from that cluster. It can find clusters of various shapes and sizes in large volumes of data that are noisy and contain outliers. Unlike K-means, DBSCAN does not require a priori specification of the number of clusters in the data and can find arbitrarily shaped clusters. Although K-means is much faster than DBSCAN, DBSCAN is effective at finding high-density regions and is robust to outliers.
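The following sketch (illustrative; scikit-learn and the half-moon toy data are assumed, and the eps/min_samples values are arbitrary) shows DBSCAN recovering non-spherical clusters and flagging noise points:

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Two interleaving half-moons: non-spherical clusters that K-means handles poorly.
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

db = DBSCAN(eps=0.2, min_samples=5).fit(X)
labels = db.labels_                          # cluster index per point; -1 marks noise/outliers
print("Clusters found:", len(set(labels) - {-1}))
print("Points labeled as noise:", int((labels == -1).sum()))
```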
GMM clustering: Gaussian mixture models (GMMs) are often used for data clustering, which is a distribution-based clustering algorithm. A Gaussian mixture model is a probabilistic model in which all the data points are produced by a mixture of a finite number of Gaussian distributions with unknown parameters [ 82 ]. To find the Gaussian parameters for each cluster, an optimization algorithm called expectation-maximization (EM) [ 82 ] can be used. EM is an iterative method that uses a statistical model to estimate the parameters. In contrast to k-means, Gaussian mixture models account for uncertainty and return the likelihood that a data point belongs to one of the k clusters. GMM clustering is more robust than k-means and works well even with non-linear data distributions.
Agglomerative hierarchical clustering: The most common method of hierarchical clustering used to group objects in clusters based on their similarity is agglomerative clustering. This technique uses a bottom-up approach, where each object is first treated as a singleton cluster by the algorithm. Following that, pairs of clusters are merged one by one until all clusters have been merged into a single large cluster containing all objects. The result is a dendrogram, which is a tree-based representation of the elements. Single linkage [ 115 ], Complete linkage [ 116 ], BOTS [ 102 ] etc. are some examples of such techniques. The main advantage of agglomerative hierarchical clustering over k-means is that the tree-structure hierarchy generated by agglomerative clustering is more informative than the unstructured collection of flat clusters returned by k-means, which can help to make better decisions in the relevant application areas.
Dimensionality Reduction and Feature Learning
In machine learning and data science, high-dimensional data processing is a challenging task for both researchers and application developers. Thus, dimensionality reduction, which is an unsupervised learning technique, is important because it leads to better human interpretation and lower computational costs, and it avoids overfitting and redundancy by simplifying models. Both feature selection and feature extraction can be used for dimensionality reduction. The primary distinction between the two is that "feature selection" keeps a subset of the original features [ 97 ], while "feature extraction" creates brand-new ones [ 98 ]. In the following, we briefly discuss these techniques.
Feature selection: The selection of features, also known as the selection of variables or attributes in the data, is the process of choosing a subset of unique features (variables, predictors) to use in building a machine learning and data science model. It decreases a model's complexity by eliminating irrelevant or less important features and allows for faster training of machine learning algorithms. A right and optimal subset of selected features in a problem domain can minimize the overfitting problem by simplifying and generalizing the model, and thereby increase the model's accuracy [ 97 ]. Thus, "feature selection" [ 66 , 99 ] is considered one of the primary concepts in machine learning, greatly affecting the effectiveness and efficiency of the target machine learning model. The chi-squared test, analysis of variance (ANOVA) test, Pearson's correlation coefficient, and recursive feature elimination are some popular techniques that can be used for feature selection.
Feature extraction: In a machine learning-based model or system, feature extraction techniques usually provide a better understanding of the data, a way to improve prediction accuracy, and a way to reduce computational cost or training time. The aim of "feature extraction" [ 66 , 99 ] is to reduce the number of features in a dataset by generating new ones from the existing ones and then discarding the originals. The majority of the information found in the original set of features can then be summarized by this new, reduced set. For instance, principal component analysis (PCA) is often used as a dimensionality-reduction technique to extract a lower-dimensional space by creating brand-new components from the existing features in a dataset [ 98 ].
Many algorithms have been proposed to reduce data dimensions in the machine learning and data science literature [ 41 , 125 ]. In the following, we summarize the popular methods that are used widely in various application areas.
Variance threshold: A simple, basic approach to feature selection is the variance threshold [ 82 ]. This excludes all features of low variance, i.e., all features whose variance does not exceed the threshold. By default, it eliminates all zero-variance features, i.e., features that have the same value in all samples. This feature selection algorithm looks only at the features ( X ), not the desired outputs ( y ), and can, therefore, be used for unsupervised learning.
Pearson correlation: Pearson's correlation is another method to understand a feature's relation to the response variable and can be used for feature selection [ 99 ]. This method is also used for finding the association between the features in a dataset. The resulting value lies in \([-1, 1]\) , where \(-1\) means perfect negative correlation, \(+1\) means perfect positive correlation, and 0 means that the two variables do not have a linear correlation. If two random variables are represented by X and Y , then the correlation coefficient between X and Y is defined as [ 41 ]

\(r(X, Y) = \frac{\mathrm{cov}(X, Y)}{\sigma_X \, \sigma_Y}\)

i.e., the covariance of X and Y divided by the product of their standard deviations.
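A quick numerical check of this coefficient (an added illustration using NumPy on synthetic data) is straightforward:

```python
import numpy as np

x = np.random.rand(100)
y = 2 * x + 0.1 * np.random.randn(100)   # y is strongly and positively related to x

r = np.corrcoef(x, y)[0, 1]              # Pearson correlation coefficient in [-1, 1]
print("Pearson r:", r)
```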
ANOVA: Analysis of variance (ANOVA) is a statistical tool used to verify whether the mean values of two or more groups differ significantly from each other. ANOVA assumes a linear relationship between the variables and the target, as well as normally distributed variables. To statistically test the equality of means, the ANOVA method uses F tests. For feature selection, the resulting 'ANOVA F-value' [ 82 ] of this test can be used, whereby features that are independent of the target variable can be omitted.
Chi square: The chi-square \({\chi }^2\) [ 82 ] statistic estimates the difference between the observed and expected frequencies of a series of events or variables. The value of \({\chi }^2\) depends on the magnitude of the difference between the observed and expected values, the degrees of freedom, and the sample size. The chi-square \({\chi }^2\) test is commonly used for testing relationships between categorical variables. If \(O_i\) represents an observed value and \(E_i\) the corresponding expected value, then

\({\chi }^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i}\)
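For feature selection, the chi-square statistic can be computed per feature against the class label, keeping only the highest-scoring features; a minimal sketch (illustrative, assuming scikit-learn and the Iris dataset, whose features are non-negative as the test requires) follows:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)        # non-negative features, as chi2 requires

# Keep the two features with the highest chi-square scores against the class label.
selector = SelectKBest(score_func=chi2, k=2)
X_selected = selector.fit_transform(X, y)
print("Chi-square scores:", selector.scores_)
print("Reduced shape:", X_selected.shape)
```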
Recursive feature elimination (RFE): Recursive feature elimination (RFE) is a brute-force approach to feature selection. RFE [ 82 ] fits the model and removes the weakest feature repeatedly until the specified number of features is reached. Features are ranked by the model's coefficients or feature importances. By recursively removing a small number of features per iteration, RFE aims to eliminate dependencies and collinearity in the model.
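A small sketch of this procedure (illustrative; scikit-learn, the logistic-regression base estimator, and the synthetic data are assumptions for this example) is given below:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=10, n_informative=4, random_state=0)

# Recursively drop the weakest feature (by coefficient magnitude) until four remain.
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=4, step=1)
rfe.fit(X, y)
print("Selected features mask:", rfe.support_)
print("Feature ranking:", rfe.ranking_)
```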
Model-based selection: To reduce the dimensionality of the data, linear models penalized with L 1 regularization can be used. Least absolute shrinkage and selection operator (LASSO) regression is a type of linear regression that has the property of shrinking some of the coefficients to zero [ 82 ]; such features can then be removed from the model. Thus, the penalized LASSO regression method is often used in machine learning to select a subset of variables. The Extra Trees classifier [ 82 ] is an example of a tree-based estimator that can be used to compute impurity-based feature importance, which can then be used to discard irrelevant features.
Principal component analysis (PCA): Principal component analysis (PCA) is a well-known unsupervised learning approach in the field of machine learning and data science. PCA is a mathematical technique that transforms a set of correlated variables into a set of uncorrelated variables known as principal components [ 48 , 81 ]. Figure 8 shows an example of the effect of PCA on feature spaces of various dimensions, where Fig. 8 a shows the original features in 3D space, and Fig. 8 b shows the created principal components PC1 and PC2 projected onto a 2D plane and, respectively, a 1D line with the principal component PC1. Thus, PCA can be used as a feature extraction technique that reduces the dimensionality of a dataset in order to build an effective machine learning model [ 98 ]. Technically, PCA identifies the eigenvectors of a covariance matrix with the highest eigenvalues and then uses these to project the data into a new subspace of equal or fewer dimensions [ 82 ].
An example of a principal component analysis (PCA) and created principal components PC1 and PC2 in different dimension space
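A compact sketch of PCA as a dimensionality-reduction step (illustrative; scikit-learn and the correlated synthetic data are assumptions for this example) is shown below:

```python
import numpy as np
from sklearn.decomposition import PCA

# Correlated 3D data projected onto its two strongest principal components.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[:, 2] = 0.9 * X[:, 0] + 0.1 * rng.normal(size=200)   # third feature largely redundant

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)
print("Explained variance ratio:", pca.explained_variance_ratio_)
print("Projected shape:", X_2d.shape)
```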
Association Rule Learning
Association rule learning is a rule-based machine learning approach to discover interesting relationships, expressed as "IF-THEN" statements, between variables in large datasets [ 7 ]. One example is that "if a customer buys a computer or laptop (an item), s/he is likely to also buy anti-virus software (another item) at the same time". Association rules are employed today in many application areas, including IoT services, medical diagnosis, usage behavior analytics, web usage mining, smartphone applications, cybersecurity applications, and bioinformatics. In comparison to sequence mining, association rule learning does not usually take into account the order of items within or across transactions. A common way of measuring the usefulness of association rules is through their parameters 'support' and 'confidence', introduced in [ 7 ].
In the data mining literature, many association rule learning methods have been proposed, such as logic dependent [ 34 ], frequent pattern based [ 8 , 49 , 68 ], and tree-based [ 42 ]. The most popular association rule learning algorithms are summarized below.
AIS and SETM: AIS is the first algorithm proposed by Agrawal et al. [ 7 ] for association rule mining. The AIS algorithm’s main downside is that too many candidate itemsets are generated, requiring more space and wasting a lot of effort. This algorithm calls for too many passes over the entire dataset to produce the rules. Another approach SETM [ 49 ] exhibits good performance and stable behavior with execution time; however, it suffers from the same flaw as the AIS algorithm.
Apriori: For generating association rules for a given dataset, Agrawal et al. [ 8 ] proposed the Apriori, Apriori-TID, and Apriori-Hybrid algorithms. These later algorithms outperform the AIS and SETM algorithms mentioned above due to the Apriori property of frequent itemsets [ 8 ]. The term 'Apriori' usually refers to having prior knowledge of frequent itemset properties. Apriori uses a "bottom-up" approach to generate candidate itemsets. To reduce the search space, Apriori uses the property that "all subsets of a frequent itemset must be frequent; and if an itemset is infrequent, then all its supersets must also be infrequent". Another approach, predictive Apriori [ 108 ], can also generate rules; however, it can produce unexpected results because it combines both support and confidence. Apriori [ 8 ] is among the most widely applied techniques for mining association rules.
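The Apriori property can be illustrated with a short, self-contained sketch (an added example on toy transactions; it enumerates only single items and pairs, whereas a full implementation continues level by level until no new frequent itemsets appear):

```python
from itertools import combinations

# Toy transactions: each basket is a set of purchased items.
transactions = [
    {"laptop", "antivirus", "backpack"},
    {"laptop", "antivirus", "mouse"},
    {"antivirus", "mouse", "backpack"},
    {"laptop", "antivirus", "mouse", "backpack"},
]
min_support = 0.5
n = len(transactions)

def support(itemset):
    """Fraction of transactions containing every item of the itemset."""
    return sum(itemset <= t for t in transactions) / n

# Level 1: frequent single items.
items = {i for t in transactions for i in t}
frequent_items = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]

# Level 2: candidate pairs built only from frequent items (Apriori property:
# any superset of an infrequent itemset is infrequent, so it is never generated).
kept = {i for f in frequent_items for i in f}
frequent_pairs = [frozenset(c) for c in combinations(kept, 2)
                  if support(frozenset(c)) >= min_support]

for itemset in frequent_items + frequent_pairs:
    print(set(itemset), round(support(itemset), 2))
```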
ECLAT: This technique was proposed by Zaki et al. [ 131 ] and stands for Equivalence Class Clustering and bottom-up Lattice Traversal. ECLAT uses a depth-first search to find frequent itemsets. In contrast to the Apriori [ 8 ] algorithm, which represents data in a horizontal pattern, it represents data vertically. Hence, the ECLAT algorithm is more efficient and scalable in the area of association rule learning. This algorithm is better suited for small and medium datasets whereas the Apriori algorithm is used for large datasets.
FP-Growth: Another common association rule learning technique, based on the frequent-pattern tree (FP-tree) proposed by Han et al. [ 42 ], is frequent pattern growth, known as FP-Growth. The key difference from Apriori is that, while generating rules, the Apriori algorithm [ 8 ] generates frequent candidate itemsets, whereas the FP-Growth algorithm [ 42 ] avoids candidate generation and instead builds a tree using a 'divide-and-conquer' strategy. Due to its sophistication, however, the FP-tree is challenging to use in an interactive mining environment [ 133 ]; moreover, the FP-tree may not fit into memory for massive datasets, making it challenging to process big data as well. Another solution is RARM (Rapid Association Rule Mining), proposed by Das et al. [ 26 ], but it faces a related FP-tree issue [ 133 ].
ABC-RuleMiner: ABC-RuleMiner is a rule-based machine learning method, recently proposed in our earlier paper by Sarker et al. [ 104 ], to discover interesting, non-redundant rules for real-world intelligent services. This algorithm effectively identifies redundancy in associations by taking into account the impact or precedence of the related contextual features and discovers a set of non-redundant association rules. It first constructs an association generation tree (AGT) in a top-down fashion and then extracts the association rules by traversing the tree. Thus, ABC-RuleMiner is more effective than traditional rule-based methods in terms of both non-redundant rule generation and intelligent decision-making, particularly in a context-aware smart computing environment where human or user preferences are involved.
Among the association rule learning techniques discussed above, Apriori [ 8 ] is the most widely used algorithm for discovering association rules from a given dataset [ 133 ]. The main strength of the association learning technique is its comprehensiveness, as it generates all associations that satisfy the user-specified constraints, such as minimum support and confidence value. The ABC-RuleMiner approach [ 104 ] discussed earlier could give significant results in terms of non-redundant rule generation and intelligent decision-making for the relevant application areas in the real world.
Reinforcement Learning
Reinforcement learning (RL) is a machine learning technique that allows an agent to learn by trial and error in an interactive environment using feedback from its actions and experiences. Unlike supervised learning, which is based on given sample data or examples, the RL method is based on interacting with the environment. The problem to be solved in reinforcement learning (RL) is defined as a Markov decision process (MDP) [ 86 ], i.e., it is about making decisions sequentially. An RL problem typically includes four elements: agent, environment, rewards, and policy.
RL can be split roughly into model-based and model-free techniques. Model-based RL is the process of inferring optimal behavior from a model of the environment by performing actions and observing the results, which include the next state and the immediate reward [ 85 ]. AlphaZero and AlphaGo [ 113 ] are examples of model-based approaches. On the other hand, a model-free approach does not use the transition probability distribution and the reward function associated with the MDP. Q-learning, Deep Q Network, Monte Carlo control, SARSA (State-Action-Reward-State-Action), etc. are some examples of model-free algorithms [ 52 ]. The policy network, which is required for model-based RL but not for model-free RL, is the key difference between model-free and model-based learning. In the following, we discuss the popular RL algorithms.
Monte Carlo methods: Monte Carlo techniques, or Monte Carlo experiments, are a wide category of computational algorithms that rely on repeated random sampling to obtain numerical results [ 52 ]. The underlying concept is to use randomness to solve problems that are deterministic in principle. Optimization, numerical integration, and drawing samples from a probability distribution are the three problem classes where Monte Carlo techniques are most commonly used.
Q-learning: Q-learning is a model-free reinforcement learning algorithm for learning the quality of behaviors that tell an agent what action to take under what conditions [ 52 ]. It does not need a model of the environment (hence the term “model-free”), and it can deal with stochastic transitions and rewards without the need for adaptations. The ‘Q’ in Q-learning usually stands for quality, as the algorithm calculates the maximum expected rewards for a given behavior in a given state.
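A tabular sketch of the Q-learning update (illustrative; the `env` object with `reset()` and `step()` methods is a hypothetical interface assumed only for this example) looks as follows:

```python
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning; `env` is a hypothetical environment exposing
    reset() -> state and step(action) -> (next_state, reward, done)."""
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Epsilon-greedy action selection: explore occasionally, otherwise exploit.
            if np.random.rand() < epsilon:
                action = np.random.randint(n_actions)
            else:
                action = int(np.argmax(Q[state]))
            next_state, reward, done = env.step(action)
            # Move Q(s, a) toward the bootstrapped target r + gamma * max_a' Q(s', a').
            target = reward + gamma * np.max(Q[next_state]) * (not done)
            Q[state, action] += alpha * (target - Q[state, action])
            state = next_state
    return Q
```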
Deep Q-learning: The basic working step in deep Q-learning [ 52 ] is that the initial state is fed into a neural network, which returns the Q-values of all possible actions as its output. Q-learning works well when the setting to be handled is reasonably simple; however, when the number of states and actions becomes larger and more complicated, deep learning can be used as a function approximator.
Reinforcement learning, along with supervised and unsupervised learning, is one of the basic machine learning paradigms. RL can be used to solve numerous real-world problems in various fields, such as game theory, control theory, operations analysis, information theory, simulation-based optimization, manufacturing, supply chain logistics, multi-agent systems, swarm intelligence, aircraft control, robot motion control, and many more.
Artificial Neural Network and Deep Learning
Deep learning is part of a wider family of artificial neural network (ANN)-based machine learning approaches with representation learning. Deep learning provides a computational architecture by combining several processing layers, such as input, hidden, and output layers, to learn from data [ 41 ]. The main advantage of deep learning over traditional machine learning methods is its better performance in several cases, particularly when learning from large datasets [ 105 , 129 ]. Figure 9 shows the general performance of deep learning compared with traditional machine learning as the amount of data increases; however, this may vary depending on the data characteristics and experimental setup.
Machine learning and deep learning performance in general with the amount of data
The most common deep learning algorithms are: Multi-layer Perceptron (MLP), Convolutional Neural Network (CNN, or ConvNet), Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) [ 96 ]. In the following, we discuss various types of deep learning methods that can be used to build effective data-driven models for various purposes.
A structure of an artificial neural network modeling with multiple processing layers
MLP: The base architecture of deep learning, which is also known as the feed-forward artificial neural network, is called a multilayer perceptron (MLP) [ 82 ]. A typical MLP is a fully connected network consisting of an input layer, one or more hidden layers, and an output layer, as shown in Fig. 10 . Each node in one layer connects to each node in the following layer with a certain weight. MLP utilizes the "backpropagation" technique [ 41 ], the most fundamental building block in a neural network, to adjust the weight values internally while building the model. MLP is sensitive to feature scaling and allows a variety of hyperparameters to be tuned, such as the number of hidden layers, neurons, and iterations, which can result in a computationally costly model.
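A brief sketch of an MLP classifier (illustrative; scikit-learn, the digits dataset, and the chosen layer sizes are assumptions for this example) is given below:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)
X = StandardScaler().fit_transform(X)            # MLPs are sensitive to feature scaling
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Two hidden layers; the weights are adjusted by backpropagation during fit().
mlp = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=300, random_state=0)
mlp.fit(X_train, y_train)
print("Test accuracy:", mlp.score(X_test, y_test))
```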
CNN or ConvNet: The convolutional neural network (CNN) [ 65 ] enhances the design of the standard ANN, consisting of convolutional layers, pooling layers, and fully connected layers, as shown in Fig. 11 . Because it takes advantage of the two-dimensional (2D) structure of the input data, it is typically used broadly in several areas such as image and video recognition, image processing and classification, medical image analysis, natural language processing, etc. While a CNN has a greater computational burden, it has the advantage of automatically detecting the important features without any manual intervention, and hence CNNs are considered more powerful than conventional ANNs. A number of advanced deep learning models based on CNNs can be used in the field, such as AlexNet [ 60 ], Xception [ 24 ], Inception [ 118 ], Visual Geometry Group (VGG) [ 44 ], ResNet [ 45 ], etc.
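A small CNN architecture of this kind can be sketched as follows (illustrative; TensorFlow/Keras and the 28x28 grayscale input shape are assumptions made for this example, and no training data are attached):

```python
from tensorflow import keras
from tensorflow.keras import layers

# A small CNN for 28x28 grayscale images with 10 output classes (MNIST-like data).
model = keras.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, kernel_size=3, activation="relu"),   # convolutional feature extraction
    layers.MaxPooling2D(pool_size=2),                      # spatial down-sampling
    layers.Conv2D(64, kernel_size=3, activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),                  # fully connected layer
    layers.Dense(10, activation="softmax"),                # class probabilities
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()
```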
LSTM-RNN: Long short-term memory (LSTM) is an artificial recurrent neural network (RNN) architecture used in the area of deep learning [ 38 ]. LSTM has feedback links, unlike normal feed-forward neural networks. LSTM networks are well-suited for analyzing and learning sequential data, such as classifying, processing, and predicting data based on time series data, which differentiates it from other conventional networks. Thus, LSTM can be used when the data are in a sequential format, such as time, sentence, etc., and commonly applied in the area of time-series analysis, natural language processing, speech recognition, etc.
An example of a convolutional neural network (CNN or ConvNet) including multiple convolution and pooling layers
In addition to these most common deep learning methods discussed above, several other deep learning approaches [ 96 ] exist in the area for various purposes. For instance, the self-organizing map (SOM) [ 58 ] uses unsupervised learning to represent high-dimensional data by a 2D grid map, thus achieving dimensionality reduction. The autoencoder (AE) [ 15 ] is another learning technique that is widely used for dimensionality reduction as well as feature extraction in unsupervised learning tasks. Restricted Boltzmann machines (RBM) [ 46 ] can be used for dimensionality reduction, classification, regression, collaborative filtering, feature learning, and topic modeling. A deep belief network (DBN) is typically composed of simple, unsupervised networks such as restricted Boltzmann machines (RBMs) or autoencoders, and a backpropagation neural network (BPNN) [ 123 ]. A generative adversarial network (GAN) [ 39 ] is a form of deep learning network that can generate data with characteristics close to the actual input data. Transfer learning is currently very common because it can train deep neural networks with comparatively little data, typically by re-using a pre-trained model for a new problem [ 124 ]. A brief discussion of these artificial neural network (ANN) and deep learning (DL) models is summarized in our earlier paper, Sarker et al. [ 96 ].
Overall, based on the learning techniques discussed above, we can conclude that various types of machine learning techniques, such as classification analysis, regression, data clustering, feature selection and extraction, dimensionality reduction, association rule learning, reinforcement learning, and deep learning, can play a significant role for various purposes according to their capabilities. In the following section, we discuss several application areas based on machine learning algorithms.
Applications of Machine Learning
In the current age of the Fourth Industrial Revolution (4IR), machine learning has become popular in various application areas because of its capability to learn from past data and make intelligent decisions. In the following, we summarize and discuss ten popular application areas of machine learning technology.
Predictive analytics and intelligent decision-making: A major application field of machine learning is intelligent decision-making through data-driven predictive analytics [ 21 , 70 ]. The basis of predictive analytics is capturing and exploiting relationships between explanatory variables and predicted variables from previous events to predict the unknown outcome [ 41 ]. Examples include identifying suspects or criminals after a crime has been committed, or detecting credit card fraud as it happens. In another application, machine learning algorithms can assist retailers in better understanding consumer preferences and behavior, managing inventory, avoiding out-of-stock situations, and optimizing logistics and warehousing in e-commerce. Various machine learning algorithms such as decision trees, support vector machines, and artificial neural networks [ 106 , 125 ] are commonly used in this area. Since accurate predictions provide insight into the unknown, they can improve the decisions of industries, businesses, and almost any organization, including government agencies, e-commerce, telecommunications, banking and financial services, healthcare, sales and marketing, transportation, social networking, and many others.
Cybersecurity and threat intelligence: Cybersecurity is one of the most essential areas of Industry 4.0 [ 114 ], and is typically the practice of protecting networks, systems, hardware, and data from digital attacks [ 114 ]. Machine learning has become a crucial cybersecurity technology that constantly learns by analyzing data to identify patterns, better detect malware in encrypted traffic, find insider threats, predict where bad neighborhoods are online, keep people safe while browsing, or secure data in the cloud by uncovering suspicious activity. For instance, clustering techniques can be used to identify cyber-anomalies, policy violations, etc. To detect various types of cyber-attacks or intrusions, machine learning classification models that take into account the impact of security features are useful [ 97 ]. Various deep learning-based security models can also be used on large-scale security datasets [ 96 , 129 ]. Moreover, security policy rules generated by association rule learning techniques can play a significant role in building a rule-based security system [ 105 ]. Thus, we can say that the various learning techniques discussed in Sect. "Machine Learning Tasks and Algorithms" can enable cybersecurity professionals to be more proactive in efficiently preventing threats and cyber-attacks.
Internet of things (IoT) and smart cities: The Internet of Things (IoT) is another essential area of Industry 4.0 [ 114 ], which turns everyday objects into smart objects by allowing them to transmit data and automate tasks without the need for human interaction. IoT is, therefore, considered to be the big frontier that can enhance almost all activities in our lives, such as smart governance, smart home, education, communication, transportation, retail, agriculture, health care, business, and many more [ 70 ]. The smart city is one of IoT's core fields of application, using technologies to enhance city services and residents' living experiences [ 132 , 135 ]. As machine learning utilizes experience to recognize trends and create models that help predict future behavior and events, it has become a crucial technology for IoT applications [ 103 ]. For example, predicting traffic in smart cities, predicting parking availability, estimating citizens' total energy usage for a particular period, and making context-aware and timely decisions for people are some tasks that can be solved using machine learning techniques according to people's current needs.
Traffic prediction and transportation: Transportation systems have become a crucial component of every country’s economic development. Nonetheless, several cities around the world are experiencing an excessive rise in traffic volume, resulting in serious issues such as delays, traffic congestion, higher fuel prices, increased CO \(_2\) pollution, accidents, emergencies, and a decline in modern society’s quality of life [ 40 ]. Thus, an intelligent transportation system through predicting future traffic is important, which is an indispensable part of a smart city. Accurate traffic prediction based on machine and deep learning modeling can help to minimize the issues [ 17 , 30 , 31 ]. For example, based on the travel history and trend of traveling through various routes, machine learning can assist transportation companies in predicting possible issues that may occur on specific routes and recommending their customers to take a different path. Ultimately, these learning-based data-driven models help improve traffic flow, increase the usage and efficiency of sustainable modes of transportation, and limit real-world disruption by modeling and visualizing future changes.
Healthcare and COVID-19 pandemic: Machine learning can help to solve diagnostic and prognostic problems in a variety of medical domains, such as disease prediction, medical knowledge extraction, detecting regularities in data, patient management, etc. [ 33 , 77 , 112 ]. Coronavirus disease (COVID-19) is an infectious disease caused by a newly discovered coronavirus, according to the World Health Organization (WHO) [ 3 ]. Recently, learning techniques have become popular in the battle against COVID-19 [ 61 , 63 ]. For the COVID-19 pandemic, learning techniques are used to classify patients at high risk, their mortality rate, and other anomalies [ 61 ]. They can also be used to better understand the virus's origin, for COVID-19 outbreak prediction, as well as for disease diagnosis and treatment [ 14 , 50 ]. With the help of machine learning, researchers can forecast where and when COVID-19 is likely to spread and notify those regions to make the required arrangements. Deep learning also provides exciting solutions to the problems of medical image processing and is seen as a crucial technique for potential applications, particularly for the COVID-19 pandemic [ 10 , 78 , 111 ]. Overall, machine and deep learning techniques can help to fight the COVID-19 virus and the pandemic, as well as support intelligent clinical decision-making in the domain of healthcare.
E-commerce and product recommendations: Product recommendation is one of the most well known and widely used applications of machine learning, and it is one of the most prominent features of almost any e-commerce website today. Machine learning technology can assist businesses in analyzing their consumers’ purchasing histories and making customized product suggestions for their next purchase based on their behavior and preferences. E-commerce companies, for example, can easily position product suggestions and offers by analyzing browsing trends and click-through rates of specific items. Using predictive modeling based on machine learning techniques, many online retailers, such as Amazon [ 71 ], can better manage inventory, prevent out-of-stock situations, and optimize logistics and warehousing. The future of sales and marketing is the ability to capture, evaluate, and use consumer data to provide a customized shopping experience. Furthermore, machine learning techniques enable companies to create packages and content that are tailored to the needs of their customers, allowing them to maintain existing customers while attracting new ones.
NLP and sentiment analysis: Natural language processing (NLP) involves the reading and understanding of spoken or written language through the medium of a computer [ 79 , 103 ]. Thus, NLP helps computers, for instance, to read a text, hear speech, interpret it, analyze sentiment, and decide which aspects are significant, where machine learning techniques can be used. Virtual personal assistants, chatbots, speech recognition, document description, and language or machine translation are some examples of NLP-related tasks. Sentiment analysis [ 90 ] (also referred to as opinion mining or emotion AI) is an NLP sub-field that seeks to identify and extract public mood and views within a given text through blogs, reviews, social media, forums, news, etc. For instance, businesses and brands use sentiment analysis to understand the social sentiment of their brand, product, or service through social media platforms or the web as a whole. Overall, sentiment analysis is considered a machine learning task that analyzes texts for polarity, such as "positive", "negative", or "neutral", along with more intense emotions like very happy, happy, sad, very sad, angry, interested, or not interested, etc.
Image, speech and pattern recognition: Image recognition [ 36 ] is a well-known and widespread example of machine learning in the real world, which can identify an object in a digital image. For instance, labeling an x-ray as cancerous or not, character recognition, face detection in an image, and tagging suggestions on social media, e.g., Facebook, are common examples of image recognition. Speech recognition [ 23 ], which typically uses sound and linguistic models, is also very popular, e.g., in Google Assistant, Cortana, Siri, Alexa, etc. [ 67 ], where machine learning methods are used. Pattern recognition [ 13 ] is defined as the automated recognition of patterns and regularities in data, e.g., image analysis. Several machine learning techniques such as classification, feature selection, clustering, and sequence labeling methods are used in this area.
Sustainable agriculture: Agriculture is essential to the survival of all human activities [ 109 ]. Sustainable agriculture practices help to improve agricultural productivity while also reducing negative impacts on the environment [ 5 , 25 , 109 ]. The sustainable agriculture supply chains are knowledge-intensive and based on information, skills, technologies, etc., where knowledge transfer encourages farmers to enhance their decisions to adopt sustainable agriculture practices utilizing the increasing amount of data captured by emerging technologies, e.g., the Internet of Things (IoT), mobile technologies and devices, etc. [ 5 , 53 , 54 ]. Machine learning can be applied in various phases of sustainable agriculture, such as in the pre-production phase - for the prediction of crop yield, soil properties, irrigation requirements, etc.; in the production phase—for weather prediction, disease detection, weed detection, soil nutrient management, livestock management, etc.; in processing phase—for demand estimation, production planning, etc. and in the distribution phase - the inventory management, consumer analysis, etc.
User behavior analytics and context-aware smartphone applications: Context-awareness is a system's ability to capture knowledge about its surroundings at any moment and modify behaviors accordingly [ 28 , 93 ]. Context-aware computing uses software and hardware to automatically collect and interpret data for direct responses. The mobile app development environment has been changed greatly with the power of AI, particularly machine learning techniques, through their ability to learn from contextual data [ 103 , 136 ]. Thus, the developers of mobile apps can rely on machine learning to create smart apps that can understand human behavior, support, and entertain users [ 107 , 137 , 140 ]. Machine learning techniques are applicable for building various personalized, data-driven, context-aware systems, such as smart interruption management, smart mobile recommendation, context-aware smart searching, and decision-making that intelligently assist end mobile phone users in a pervasive computing environment. For example, context-aware association rules can be used to build an intelligent phone call application [ 104 ]. Clustering approaches are useful for capturing users' diverse behavioral activities by taking into account time-series data [ 102 ]. To predict future events in various contexts, classification methods can be used [ 106 , 139 ]. Thus, the various learning techniques discussed in Sect. " Machine Learning Tasks and Algorithms " can help to build context-aware, adaptive, and smart applications according to the preferences of mobile phone users.
In addition to these application areas, machine learning-based models can also apply to several other domains such as bioinformatics, cheminformatics, computer networks, DNA sequence classification, economics and banking, robotics, advanced engineering, and many more.
Challenges and Research Directions
Our study on machine learning algorithms for intelligent data analysis and applications opens several research issues in the area. Thus, in this section, we summarize and discuss the challenges faced and the potential research opportunities and future directions.
In general, the effectiveness and efficiency of a machine learning-based solution depend on the nature and characteristics of the data and the performance of the learning algorithms. Collecting data in a relevant domain, such as cybersecurity, IoT, healthcare, or agriculture, discussed in Sect. " Applications of Machine Learning ", is not straightforward, although current cyberspace enables the production of a huge amount of data at very high frequency. Thus, collecting useful data for the target machine learning-based applications, e.g., smart city applications, and managing those data are important for further analysis. Therefore, a more in-depth investigation of data collection methods is needed when working with real-world data. Moreover, historical data may contain many ambiguous values, missing values, outliers, and meaningless records. The performance of the machine learning algorithms discussed in Sect. " Machine Learning Tasks and Algorithms " depends heavily on the quality and availability of the data for training, and consequently on the resultant model. Thus, accurately cleaning and pre-processing the diverse data collected from diverse sources is a challenging task. Therefore, effectively modifying or enhancing existing pre-processing methods, or proposing new data preparation techniques, is required to use the learning algorithms effectively in the associated application domain.
To analyze the data and extract insights, there exist many machine learning algorithms, summarized in Sect. " Machine Learning Tasks and Algorithms ". Thus, selecting a proper learning algorithm that is suitable for the target application is challenging, because the outcome of different learning algorithms may vary depending on the data characteristics [ 106 ]. Selecting the wrong learning algorithm would produce unexpected outcomes, leading to wasted effort as well as reduced model effectiveness and accuracy. In terms of model building, the techniques discussed in Sect. " Machine Learning Tasks and Algorithms " can directly be used to solve many real-world issues in diverse domains, such as cybersecurity, smart cities, and healthcare, summarized in Sect. " Applications of Machine Learning ". However, hybrid learning models, e.g., ensembles of methods, modification or enhancement of existing learning techniques, or the design of new learning methods, could be potential future work in the area.
Thus, the ultimate success of a machine learning-based solution and the corresponding applications mainly depends on both the data and the learning algorithms. If the data are unsuitable for learning, such as non-representative, of poor quality, containing irrelevant features, or insufficient in quantity for training, then the machine learning models may become useless or produce lower accuracy. Therefore, effectively processing the data and handling the diverse learning algorithms are important for a machine learning-based solution and, eventually, for building intelligent applications.
In this paper, we have conducted a comprehensive overview of machine learning algorithms for intelligent data analysis and applications. According to our goal, we have briefly discussed how various types of machine learning methods can be used to build solutions to various real-world problems. A successful machine learning model depends on both the data and the performance of the learning algorithms. The sophisticated learning algorithms then need to be trained on the collected real-world data and knowledge related to the target application before the system can assist with intelligent decision-making. We also discussed several popular application areas based on machine learning techniques to highlight their applicability to various real-world issues. Finally, we have summarized and discussed the challenges faced and the potential research opportunities and future directions in the area. The challenges that are identified create promising research opportunities in the field, which must be addressed with effective solutions in various application areas. Overall, we believe that our study on machine learning-based solutions opens up a promising direction and can be used as a reference guide for potential research and applications by academia, industry professionals, and decision-makers, from a technical point of view.
Canadian institute of cybersecurity, university of new brunswick, iscx dataset, http://www.unb.ca/cic/datasets/index.html/ (Accessed on 20 October 2019).
Cic-ddos2019 [online]. available: https://www.unb.ca/cic/datasets/ddos-2019.html/ (Accessed on 28 March 2020).
World health organization: WHO. http://www.who.int/ .
Google trends. In https://trends.google.com/trends/ , 2019.
Adnan N, Nordin Shahrina Md, Rahman I, Noor A. The effects of knowledge transfer on farmers decision making toward sustainable agriculture practices. World J Sci Technol Sustain Dev. 2018.
Agrawal R, Gehrke J, Gunopulos D, Raghavan P. Automatic subspace clustering of high dimensional data for data mining applications. In: Proceedings of the 1998 ACM SIGMOD international conference on Management of data. 1998; 94–105
Agrawal R, Imieliński T, Swami A. Mining association rules between sets of items in large databases. In: ACM SIGMOD Record. ACM. 1993;22: 207–216
Agrawal R, Gehrke J, Gunopulos D, Raghavan P. Fast algorithms for mining association rules. In: Proceedings of the International Joint Conference on Very Large Data Bases, Santiago Chile. 1994; 1215: 487–499.
Aha DW, Kibler D, Albert M. Instance-based learning algorithms. Mach Learn. 1991;6(1):37–66.
Alakus TB, Turkoglu I. Comparison of deep learning approaches to predict covid-19 infection. Chaos Solit Fract. 2020;140:
Amit Y, Geman D. Shape quantization and recognition with randomized trees. Neural Comput. 1997;9(7):1545–88.
Ankerst M, Breunig MM, Kriegel H-P, Sander J. Optics: ordering points to identify the clustering structure. ACM Sigmod Record. 1999;28(2):49–60.
Anzai Y. Pattern recognition and machine learning. Elsevier; 2012.
Ardabili SF, Mosavi A, Ghamisi P, Ferdinand F, Varkonyi-Koczy AR, Reuter U, Rabczuk T, Atkinson PM. Covid-19 outbreak prediction with machine learning. Algorithms. 2020;13(10):249.
Baldi P. Autoencoders, unsupervised learning, and deep architectures. In: Proceedings of ICML workshop on unsupervised and transfer learning, 2012; 37–49 .
Balducci F, Impedovo D, Pirlo G. Machine learning applications on agricultural datasets for smart farm enhancement. Machines. 2018;6(3):38.
Boukerche A, Wang J. Machine learning-based traffic prediction models for intelligent transportation systems. Comput Netw. 2020;181
Breiman L. Bagging predictors. Mach Learn. 1996;24(2):123–40.
Breiman L. Random forests. Mach Learn. 2001;45(1):5–32.
Breiman L, Friedman J, Stone CJ, Olshen RA. Classification and regression trees. CRC Press; 1984.
Cao L. Data science: a comprehensive overview. ACM Comput Surv (CSUR). 2017;50(3):43.
Carpenter GA, Grossberg S. A massively parallel architecture for a self-organizing neural pattern recognition machine. Comput Vis Graph Image Process. 1987;37(1):54–115.
Chiu C-C, Sainath TN, Wu Y, Prabhavalkar R, Nguyen P, Chen Z, Kannan A, Weiss RJ, Rao K, Gonina E, et al. State-of-the-art speech recognition with sequence-to-sequence models. In: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018 pages 4774–4778. IEEE .
Chollet F. Xception: deep learning with depthwise separable convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1251–1258, 2017.
Cobuloglu H, Büyüktahtakın IE. A stochastic multi-criteria decision analysis for sustainable biomass crop selection. Expert Syst Appl. 2015;42(15–16):6065–74.
Das A, Ng W-K, Woon Y-K. Rapid association rule mining. In: Proceedings of the tenth international conference on Information and knowledge management, pages 474–481. ACM, 2001.
de Amorim RC. Constrained clustering with minkowski weighted k-means. In: 2012 IEEE 13th International Symposium on Computational Intelligence and Informatics (CINTI), pages 13–17. IEEE, 2012.
Dey AK. Understanding and using context. Person Ubiquit Comput. 2001;5(1):4–7.
Eagle N, Pentland AS. Reality mining: sensing complex social systems. Person Ubiquit Comput. 2006;10(4):255–68.
Essien A, Petrounias I, Sampaio P, Sampaio S. Improving urban traffic speed prediction using data source fusion and deep learning. In: 2019 IEEE International Conference on Big Data and Smart Computing (BigComp). IEEE. 2019: 1–8. .
Essien A, Petrounias I, Sampaio P, Sampaio S. A deep-learning model for urban traffic flow prediction with traffic events mined from twitter. In: World Wide Web, 2020: 1–24 .
Ester M, Kriegel H-P, Sander J, Xiaowei X, et al. A density-based algorithm for discovering clusters in large spatial databases with noise. Kdd. 1996;96:226–31.
Fatima M, Pasha M, et al. Survey of machine learning algorithms for disease diagnostic. J Intell Learn Syst Appl. 2017;9(01):1.
Flach PA, Lachiche N. Confirmation-guided discovery of first-order rules with tertius. Mach Learn. 2001;42(1–2):61–95.
Freund Y, Schapire RE, et al. Experiments with a new boosting algorithm. In: Icml, Citeseer. 1996; 96: 148–156
Fujiyoshi H, Hirakawa T, Yamashita T. Deep learning-based image recognition for autonomous driving. IATSS Res. 2019;43(4):244–52.
Fukunaga K, Hostetler L. The estimation of the gradient of a density function, with applications in pattern recognition. IEEE Trans Inform Theory. 1975;21(1):32–40.
Goodfellow I, Bengio Y, Courville A, Bengio Y. Deep learning. Cambridge: MIT Press; 2016.
Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y. Generative adversarial nets. In: Advances in neural information processing systems. 2014: 2672–2680.
Guerrero-Ibáñez J, Zeadally S, Contreras-Castillo J. Sensor technologies for intelligent transportation systems. Sensors. 2018;18(4):1212.
Han J, Pei J, Kamber M. Data mining: concepts and techniques. Amsterdam: Elsevier; 2011.
Han J, Pei J, Yin Y. Mining frequent patterns without candidate generation. In: ACM Sigmod Record, ACM. 2000;29: 1–12.
Harmon SA, Sanford TH, Sheng X, Turkbey EB, Roth H, Ziyue X, Yang D, Myronenko A, Anderson V, Amalou A, et al. Artificial intelligence for the detection of covid-19 pneumonia on chest ct using multinational datasets. Nat Commun. 2020;11(1):1–7.
He K, Zhang X, Ren S, Sun J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans Pattern Anal Mach Intell. 2015;37(9):1904–16.
He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, 2016: 770–778.
Hinton GE. A practical guide to training restricted boltzmann machines. In: Neural networks: Tricks of the trade. Springer. 2012; 599-619
Holte RC. Very simple classification rules perform well on most commonly used datasets. Mach Learn. 1993;11(1):63–90.
Hotelling H. Analysis of a complex of statistical variables into principal components. J Edu Psychol. 1933;24(6):417.
Houtsma M, Swami A. Set-oriented mining for association rules in relational databases. In: Data Engineering, 1995. Proceedings of the Eleventh International Conference on, IEEE.1995:25–33.
Jamshidi M, Lalbakhsh A, Talla J, Peroutka Z, Hadjilooei F, Lalbakhsh P, Jamshidi M, La Spada L, Mirmozafari M, Dehghani M, et al. Artificial intelligence and covid-19: deep learning approaches for diagnosis and treatment. IEEE Access. 2020;8:109581–95.
John GH, Langley P. Estimating continuous distributions in bayesian classifiers. In: Proceedings of the Eleventh conference on Uncertainty in artificial intelligence, Morgan Kaufmann Publishers Inc. 1995; 338–345
Kaelbling LP, Littman ML, Moore AW. Reinforcement learning: a survey. J Artif Intell Res. 1996;4:237–85.
Kamble SS, Gunasekaran A, Gawankar SA. Sustainable industry 4.0 framework: a systematic literature review identifying the current trends and future perspectives. Process Saf Environ Protect. 2018;117:408–25.
Kamble SS, Gunasekaran A, Gawankar SA. Achieving sustainable performance in a data-driven agriculture supply chain: a review for research and applications. Int J Prod Econ. 2020;219:179–94.
Kaufman L, Rousseeuw PJ. Finding groups in data: an introduction to cluster analysis, vol. 344. John Wiley & Sons; 2009.
Keerthi SS, Shevade SK, Bhattacharyya C, Radha Krishna MK. Improvements to platt’s smo algorithm for svm classifier design. Neural Comput. 2001;13(3):637–49.
Khadse V, Mahalle PN, Biraris SV. An empirical comparison of supervised machine learning algorithms for internet of things data. In: 2018 Fourth International Conference on Computing Communication Control and Automation (ICCUBEA), IEEE. 2018; 1–6
Kohonen T. The self-organizing map. Proc IEEE. 1990;78(9):1464–80.
Koroniotis N, Moustafa N, Sitnikova E, Turnbull B. Towards the development of realistic botnet dataset in the internet of things for network forensic analytics: bot-iot dataset. Fut Gen Comput Syst. 2019;100:779–96.
Krizhevsky A, Sutskever I, Hinton GE. Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems, 2012: 1097–1105
Kushwaha S, Bahl S, Bagha AK, Parmar KS, Javaid M, Haleem A, Singh RP. Significant applications of machine learning for covid-19 pandemic. J Ind Integr Manag. 2020;5(4).
Lade P, Ghosh R, Srinivasan S. Manufacturing analytics and industrial internet of things. IEEE Intell Syst. 2017;32(3):74–9.
Lalmuanawma S, Hussain J, Chhakchhuak L. Applications of machine learning and artificial intelligence for covid-19 (sars-cov-2) pandemic: a review. Chaos Sol Fract. 2020:110059 .
LeCessie S, Van Houwelingen JC. Ridge estimators in logistic regression. J R Stat Soc Ser C (Appl Stat). 1992;41(1):191–201.
LeCun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied to document recognition. Proc IEEE. 1998;86(11):2278–324.
Liu H, Motoda H. Feature extraction, construction and selection: A data mining perspective, vol. 453. Springer Science & Business Media; 1998.
López G, Quesada L, Guerrero LA. Alexa vs. siri vs. cortana vs. google assistant: a comparison of speech-based natural user interfaces. In: International Conference on Applied Human Factors and Ergonomics, Springer. 2017; 241–250.
Liu B, HsuW, Ma Y. Integrating classification and association rule mining. In: Proceedings of the fourth international conference on knowledge discovery and data mining, 1998.
MacQueen J, et al. Some methods for classification and analysis of multivariate observations. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, 1967;volume 1, pages 281–297. Oakland, CA, USA.
Mahdavinejad MS, Rezvan M, Barekatain M, Adibi P, Barnaghi P, Sheth AP. Machine learning for internet of things data analysis: a survey. Digit Commun Netw. 2018;4(3):161–75.
Marchand A, Marx P. Automated product recommendations with preference-based explanations. J Retail. 2020;96(3):328–43.
McCallum A. Information extraction: distilling structured data from unstructured text. Queue. 2005;3(9):48–57.
Mehrotra A, Hendley R, Musolesi M. Prefminer: mining user’s preferences for intelligent mobile notification management. In: Proceedings of the International Joint Conference on Pervasive and Ubiquitous Computing, Heidelberg, Germany, 12–16 September, 2016; pp. 1223–1234. ACM, New York, USA. .
Mohamadou Y, Halidou A, Kapen PT. A review of mathematical modeling, artificial intelligence and datasets used in the study, prediction and management of covid-19. Appl Intell. 2020;50(11):3913–25.
Mohammed M, Khan MB, Bashier Mohammed BE. Machine learning: algorithms and applications. CRC Press; 2016.
Book Google Scholar
Moustafa N, Slay J. Unsw-nb15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set). In: 2015 military communications and information systems conference (MilCIS), 2015;pages 1–6. IEEE .
Nilashi M, Ibrahim OB, Ahmadi H, Shahmoradi L. An analytical method for diseases prediction using machine learning techniques. Comput Chem Eng. 2017;106:212–23.
Yujin O, Park S, Ye JC. Deep learning covid-19 features on cxr using limited training data sets. IEEE Trans Med Imaging. 2020;39(8):2688–700.
Otter DW, Medina JR , Kalita JK. A survey of the usages of deep learning for natural language processing. IEEE Trans Neural Netw Learn Syst. 2020.
Park H-S, Jun C-H. A simple and fast algorithm for k-medoids clustering. Expert Syst Appl. 2009;36(2):3336–41.
Liii Pearson K. on lines and planes of closest fit to systems of points in space. Lond Edinb Dublin Philos Mag J Sci. 1901;2(11):559–72.
Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, et al. Scikit-learn: machine learning in python. J Mach Learn Res. 2011;12:2825–30.
MathSciNet MATH Google Scholar
Perveen S, Shahbaz M, Keshavjee K, Guergachi A. Metabolic syndrome and development of diabetes mellitus: predictive modeling based on machine learning techniques. IEEE Access. 2018;7:1365–75.
Santi P, Ram D, Rob C, Nathan E. Behavior-based adaptive call predictor. ACM Trans Auton Adapt Syst. 2011;6(3):21:1–21:28.
Polydoros AS, Nalpantidis L. Survey of model-based reinforcement learning: applications on robotics. J Intell Robot Syst. 2017;86(2):153–73.
Puterman ML. Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons; 2014.
Quinlan JR. Induction of decision trees. Mach Learn. 1986;1:81–106.
Quinlan JR. C4.5: programs for machine learning. Mach Learn. 1993.
Rasmussen C. The infinite gaussian mixture model. Adv Neural Inform Process Syst. 1999;12:554–60.
Ravi K, Ravi V. A survey on opinion mining and sentiment analysis: tasks, approaches and applications. Knowl Syst. 2015;89:14–46.
Rokach L. A survey of clustering algorithms. In: Data mining and knowledge discovery handbook, pages 269–298. Springer, 2010.
Safdar S, Zafar S, Zafar N, Khan NF. Machine learning based decision support systems (dss) for heart disease diagnosis: a review. Artif Intell Rev. 2018;50(4):597–623.
Sarker IH. Context-aware rule learning from smartphone data: survey, challenges and future directions. J Big Data. 2019;6(1):1–25.
Sarker IH. A machine learning based robust prediction model for real-life mobile phone data. Internet Things. 2019;5:180–93.
Sarker IH. Ai-driven cybersecurity: an overview, security intelligence modeling and research directions. SN Comput Sci. 2021.
Sarker IH. Deep cybersecurity: a comprehensive overview from neural network and deep learning perspective. SN Comput Sci. 2021.
Sarker IH, Abushark YB, Alsolami F, Khan A. Intrudtree: a machine learning based cyber security intrusion detection model. Symmetry. 2020;12(5):754.
Sarker IH, Abushark YB, Khan A. Contextpca: predicting context-aware smartphone apps usage based on machine learning techniques. Symmetry. 2020;12(4):499.
Sarker IH, Alqahtani H, Alsolami F, Khan A, Abushark YB, Siddiqui MK. Context pre-modeling: an empirical analysis for classification based user-centric context-aware predictive modeling. J Big Data. 2020;7(1):1–23.
Sarker IH, Alan C, Jun H, Khan AI, Abushark YB, Khaled S. Behavdt: a behavioral decision tree learning to build user-centric context-aware predictive model. Mob Netw Appl. 2019; 1–11.
Sarker IH, Colman A, Kabir MA, Han J. Phone call log as a context source to modeling individual user behavior. In: Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing (Ubicomp): Adjunct, Germany, pages 630–634. ACM, 2016.
Sarker IH, Colman A, Kabir MA, Han J. Individualized time-series segmentation for mining mobile phone user behavior. Comput J Oxf Univ UK. 2018;61(3):349–68.
Sarker IH, Hoque MM, MdK Uddin, Tawfeeq A. Mobile data science and intelligent apps: concepts, ai-based modeling and research directions. Mob Netw Appl, pages 1–19, 2020.
Sarker IH, Kayes ASM. Abc-ruleminer: user behavioral rule-based machine learning method for context-aware intelligent services. J Netw Comput Appl. 2020; page 102762
Sarker IH, Kayes ASM, Badsha S, Alqahtani H, Watters P, Ng A. Cybersecurity data science: an overview from machine learning perspective. J Big Data. 2020;7(1):1–29.
Sarker IH, Watters P, Kayes ASM. Effectiveness analysis of machine learning classification models for predicting personalized context-aware smartphone usage. J Big Data. 2019;6(1):1–28.
Sarker IH, Salah K. Appspred: predicting context-aware smartphone apps using random forest learning. Internet Things. 2019;8:
Scheffer T. Finding association rules that trade support optimally against confidence. Intell Data Anal. 2005;9(4):381–95.
Sharma R, Kamble SS, Gunasekaran A, Kumar V, Kumar A. A systematic literature review on machine learning applications for sustainable agriculture supply chain performance. Comput Oper Res. 2020;119:
Shengli S, Ling CX. Hybrid cost-sensitive decision tree, knowledge discovery in databases. In: PKDD 2005, Proceedings of 9th European Conference on Principles and Practice of Knowledge Discovery in Databases. Lecture Notes in Computer Science, volume 3721, 2005.
Shorten C, Khoshgoftaar TM, Furht B. Deep learning applications for covid-19. J Big Data. 2021;8(1):1–54.
Gökhan S, Nevin Y. Data analysis in health and big data: a machine learning medical diagnosis model based on patients’ complaints. Commun Stat Theory Methods. 2019;1–10
Silver D, Huang A, Maddison CJ, Guez A, Sifre L, Van Den Driessche G, Schrittwieser J, Antonoglou I, Panneershelvam V, Lanctot M, et al. Mastering the game of go with deep neural networks and tree search. nature. 2016;529(7587):484–9.
Ślusarczyk B. Industry 4.0: Are we ready? Polish J Manag Stud. 17, 2018.
Sneath Peter HA. The application of computers to taxonomy. J Gen Microbiol. 1957;17(1).
Sorensen T. Method of establishing groups of equal amplitude in plant sociology based on similarity of species. Biol Skr. 1948; 5.
Srinivasan V, Moghaddam S, Mukherji A. Mobileminer: mining your frequent patterns on your phone. In: Proceedings of the International Joint Conference on Pervasive and Ubiquitous Computing, Seattle, WA, USA, 13-17 September, pp. 389–400. ACM, New York, USA. 2014.
Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A. Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2015; pages 1–9.
Tavallaee M, Bagheri E, Lu W, Ghorbani AA. A detailed analysis of the kdd cup 99 data set. In. IEEE symposium on computational intelligence for security and defense applications. IEEE. 2009;2009:1–6.
Tsagkias M. Tracy HK, Surya K, Vanessa M, de Rijke M. Challenges and research opportunities in ecommerce search and recommendations. In: ACM SIGIR Forum. volume 54. NY, USA: ACM New York; 2021. p. 1–23.
Wagstaff K, Cardie C, Rogers S, Schrödl S, et al. Constrained k-means clustering with background knowledge. Icml. 2001;1:577–84.
Wang W, Yang J, Muntz R, et al. Sting: a statistical information grid approach to spatial data mining. VLDB. 1997;97:186–95.
Wei P, Li Y, Zhang Z, Tao H, Li Z, Liu D. An optimization method for intrusion detection classification model based on deep belief network. IEEE Access. 2019;7:87593–605.
Weiss K, Khoshgoftaar TM, Wang DD. A survey of transfer learning. J Big data. 2016;3(1):9.
Witten IH, Frank E. Data Mining: Practical machine learning tools and techniques. Morgan Kaufmann; 2005.
Witten IH, Frank E, Trigg LE, Hall MA, Holmes G, Cunningham SJ. Weka: practical machine learning tools and techniques with java implementations. 1999.
Wu C-C, Yen-Liang C, Yi-Hung L, Xiang-Yu Y. Decision tree induction with a constrained number of leaf nodes. Appl Intell. 2016;45(3):673–85.
Wu X, Kumar V, Quinlan JR, Ghosh J, Yang Q, Motoda H, McLachlan GJ, Ng A, Liu B, Philip SY, et al. Top 10 algorithms in data mining. Knowl Inform Syst. 2008;14(1):1–37.
Xin Y, Kong L, Liu Z, Chen Y, Li Y, Zhu H, Gao M, Hou H, Wang C. Machine learning and deep learning methods for cybersecurity. IEEE Access. 2018;6:35365–81.
Xu D, Yingjie T. A comprehensive survey of clustering algorithms. Ann Data Sci. 2015;2(2):165–93.
Zaki MJ. Scalable algorithms for association mining. IEEE Trans Knowl Data Eng. 2000;12(3):372–90.
Zanella A, Bui N, Castellani A, Vangelista L, Zorzi M. Internet of things for smart cities. IEEE Internet Things J. 2014;1(1):22–32.
Zhao Q, Bhowmick SS. Association rule mining: a survey. Singapore: Nanyang Technological University; 2003.
Zheng T, Xie W, Xu L, He X, Zhang Y, You M, Yang G, Chen Y. A machine learning-based framework to identify type 2 diabetes through electronic health records. Int J Med Inform. 2017;97:120–7.
Zheng Y, Rajasegarar S, Leckie C. Parking availability prediction for sensor-enabled car parks in smart cities. In: Intelligent Sensors, Sensor Networks and Information Processing (ISSNIP), 2015 IEEE Tenth International Conference on. IEEE, 2015; pages 1–6.
Zhu H, Cao H, Chen E, Xiong H, Tian J. Exploiting enriched contextual information for mobile app classification. In: Proceedings of the 21st ACM international conference on Information and knowledge management. ACM, 2012; pages 1617–1621
Zhu H, Chen E, Xiong H, Kuifei Y, Cao H, Tian J. Mining mobile user preferences for personalized context-aware recommendation. ACM Trans Intell Syst Technol (TIST). 2014;5(4):58.
Zikang H, Yong Y, Guofeng Y, Xinyu Z. Sentiment analysis of agricultural product ecommerce review data based on deep learning. In: 2020 International Conference on Internet of Things and Intelligent Applications (ITIA), IEEE, 2020; pages 1–7
Zulkernain S, Madiraju P, Ahamed SI. A context aware interruption management system for mobile devices. In: Mobile Wireless Middleware, Operating Systems, and Applications. Springer. 2010; pages 221–234
Zulkernain S, Madiraju P, Ahamed S, Stamm K. A mobile intelligent interruption management system. J UCS. 2010;16(15):2060–80.
Author information
Authors and affiliations
Iqbal H. Sarker: Swinburne University of Technology, Melbourne, VIC 3122, Australia; Department of Computer Science and Engineering, Chittagong University of Engineering & Technology, Chattogram 4349, Bangladesh
Corresponding author
Correspondence to Iqbal H. Sarker.
Ethics declarations
Conflict of interest.
The author declares no conflict of interest.
Additional information
This article is part of the topical collection “Advances in Computational Approaches for Artificial Intelligence, Image Processing, IoT and Cloud Applications” guest edited by Bhanu Prakash K N and M. Shivakumar.
About this article
Sarker, I.H. Machine Learning: Algorithms, Real-World Applications and Research Directions. SN COMPUT. SCI. 2, 160 (2021). https://doi.org/10.1007/s42979-021-00592-x
Received: 27 January 2021
Accepted: 12 March 2021
Published: 22 March 2021
DOI: https://doi.org/10.1007/s42979-021-00592-x
Keywords: Machine learning, Deep learning, Artificial intelligence, Data science, Data-driven decision-making, Predictive analytics, Intelligent applications
Frontiers in Big Data / Data Mining and Management / Research Topics
Ethical Artificial Intelligence: Methods and Applications
About this Research Topic
As the field of Big Data analytics continues to evolve, ensuring fairness, transparency, and ethical considerations in the design and deployment of AI systems has become a critical challenge. This Research Topic focuses on advancing research in ethical AI and fairness-aware machine learning. The widespread integration of AI technologies in various industries requires that we address inherent biases and develop robust, transparent models that promote equitable outcomes for diverse populations.

The aim of this Research Topic is to provide a platform for cutting-edge research that explores the ethical dimensions of AI and big data, encouraging innovative solutions that mitigate biases, enhance model fairness, and improve the overall trustworthiness of AI systems. This issue will bring together contributions that focus on the theoretical, technical, and practical aspects of ethical AI, with particular attention to real-world applications. We welcome submissions of original research articles and comprehensive reviews that explore ethical considerations in big data analytics. Our goal is to foster interdisciplinary discussions on how to build and deploy AI systems that are not only accurate and efficient but also fair and socially responsible. By focusing on these critical issues, this curated collection aims to inspire the next generation of AI systems that are powerful, reliable, and ethical, shaping a future where technology can be trusted to work for the benefit of all.

In this themed article collection, original research articles and reviews are welcome. Research areas may include (but are not limited to) the following:

- Algorithmic fairness and bias in classifying and clustering big data
- Human-in-the-loop for ethical-aware machine learning
- Ethical recommender systems and diversity in recommendation
- Learning ethical-aware representation of heterogeneous data domains
- Causality-based fairness in high-dimensional data
- Integration of observation for causality-based bias control
- Preserving fairness in graph embedding
- Novel visualization techniques to facilitate the query and analysis of data bias
- Robustness and generalization of large language models
- Bias mitigation and fairness of large language models
- Explainability, interpretability, privacy and security of large language models
- First-hand experience creating or with company practices for ethical AI

With particular focus on, but not limited to, these application domains:

- Application of ethical AI methods in large-scale data mining
- Computer vision (fairness in face recognition, object relation; debiasing in image processing and video)
- Natural language processing (fair text generation, semantic parsing)
- Reinforcement learning (fairness-aware multi-agent learning, compositional imitation learning)
- Social science (racial profiling, institutional racism)
Keywords: ethical AI, fairness, transparency, large-scale data mining
Important Note: All contributions to this Research Topic must be within the scope of the section and journal to which they are submitted, as defined in their mission statements. Frontiers reserves the right to guide an out-of-scope manuscript to a more suitable section or journal at any stage of peer review.
About Frontiers Research Topics
With their unique mixes of varied contributions from Original Research to Review Articles, Research Topics unify the most influential researchers, the latest key findings and historical advances in a hot research area! Find out more on how to host your own Frontiers Research Topic or contribute to one as an author.
Notes from the AI frontier: Applications and value of deep learning
Artificial intelligence (AI) stands out as a transformational technology of our digital age—and its practical application throughout the economy is growing apace. For this briefing, Notes from the AI frontier: Insights from hundreds of use cases (PDF–446KB), we mapped both traditional analytics and newer “deep learning” techniques and the problems they can solve to more than 400 specific use cases in companies and organizations. Drawing on McKinsey Global Institute research and the applied experience with AI of McKinsey Analytics, we assess both the practical applications and the economic potential of advanced AI techniques across industries and business functions. Our findings highlight the substantial potential of applying deep learning techniques to use cases across the economy, but we also see some continuing limitations and obstacles—along with future opportunities as the technologies continue their advance. Ultimately, the value of AI is not to be found in the models themselves, but in companies’ abilities to harness them.
It is important to highlight that, even as we see economic potential in the use of AI techniques, the use of data must always take into account concerns including data security, privacy, and potential issues of bias.
Mapping AI techniques to problem types
As artificial intelligence technologies advance, so does the definition of which techniques constitute AI. For the purposes of this briefing, we use AI as shorthand for deep learning techniques that use artificial neural networks. We also examined other machine learning techniques and traditional analytics techniques (Exhibit 1).
Neural networks are a subset of machine learning techniques. Essentially, they are AI systems based on simulating connected “neural units,” loosely modeling the way that neurons interact in the brain. Computational models inspired by neural connections have been studied since the 1940s and have returned to prominence as computer processing power has increased and large training data sets have been used to successfully analyze input data such as images, video, and speech. AI practitioners refer to these techniques as “deep learning,” since neural networks have many (“deep”) layers of simulated interconnected neurons.
We analyzed the applications and value of three neural network techniques:
- Feed forward neural networks : the simplest type of artificial neural network. In this architecture, information moves in only one direction, forward, from the input layer, through the “hidden” layers, to the output layer. There are no loops in the network. The first single-neuron network was proposed as early as 1958 by AI pioneer Frank Rosenblatt. While the idea is not new, advances in computing power, training algorithms, and available data led to higher levels of performance than previously possible.
- Recurrent neural networks (RNNs) : Artificial neural networks whose connections between neurons include loops, well-suited for processing sequences of inputs. In November 2016, Oxford University researchers reported that a system based on recurrent neural networks (and convolutional neural networks) had achieved 95 percent accuracy in reading lips, outperforming experienced human lip readers, who tested at 52 percent accuracy.
- Convolutional neural networks (CNNs) : Artificial neural networks in which the connections between neural layers are inspired by the organization of the animal visual cortex, the portion of the brain that processes images, well suited for perceptual tasks. A minimal code sketch of these three architectures follows this list.
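To make the three architectures concrete, here is a minimal sketch in PyTorch. It is an illustrative example rather than anything taken from the briefing; the layer sizes, input shapes, and class counts are arbitrary assumptions.

```python
# Illustrative sketches of the three architectures discussed above (assumed shapes/sizes).
import torch
import torch.nn as nn

class FeedForwardNet(nn.Module):
    """Information flows one way: input -> hidden layers -> output, with no loops."""
    def __init__(self, n_features=20, n_classes=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
            nn.Linear(32, n_classes),
        )
    def forward(self, x):
        return self.net(x)

class RecurrentNet(nn.Module):
    """Connections loop over time steps, suited to sequences such as text or audio."""
    def __init__(self, n_features=8, hidden=32, n_classes=3):
        super().__init__()
        self.rnn = nn.GRU(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)
    def forward(self, x):                 # x: (batch, time, features)
        _, h = self.rnn(x)                # h: (1, batch, hidden)
        return self.head(h.squeeze(0))

class ConvNet(nn.Module):
    """Convolutional layers share weights across space, suited to images."""
    def __init__(self, n_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Linear(32 * 7 * 7, n_classes)   # assumes 28x28 single-channel inputs
    def forward(self, x):                 # x: (batch, 1, 28, 28)
        return self.head(self.features(x).flatten(1))

# Quick shape check with random data
print(FeedForwardNet()(torch.randn(4, 20)).shape)     # torch.Size([4, 3])
print(RecurrentNet()(torch.randn(4, 10, 8)).shape)    # torch.Size([4, 3])
print(ConvNet()(torch.randn(4, 1, 28, 28)).shape)     # torch.Size([4, 3])
```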
For our use cases, we also considered two other techniques—generative adversarial networks (GANs) and reinforcement learning—but did not include them in our potential value assessment of AI, since they remain nascent techniques that are not yet widely applied.
Generative adversarial networks (GANs) use two neural networks contesting with each other in a zero-sum game framework (thus “adversarial”). GANs can learn to mimic various distributions of data (for example text, speech, and images) and are therefore valuable in generating test datasets when these are not readily available.
Reinforcement learning is a subfield of machine learning in which systems are trained by receiving virtual “rewards” or “punishments”, essentially learning by trial and error. Google DeepMind has used reinforcement learning to develop systems that can play games, including video games and board games such as Go, better than human champions.
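As a toy illustration of learning by trial and error, the sketch below runs tabular Q-learning on a made-up corridor environment. The environment, reward, and hyperparameters are assumptions chosen only to show the reward-driven update; they are not drawn from the briefing.

```python
# Tabular Q-learning on a toy 6-state corridor: reward only when reaching the goal state.
import random

N_STATES, GOAL = 6, 5            # states 0..5
ACTIONS = [-1, +1]               # move left or right
Q = [[0.0, 0.0] for _ in range(N_STATES)]
alpha, gamma, eps = 0.5, 0.9, 0.2

for episode in range(500):
    s = 0
    while s != GOAL:
        # epsilon-greedy action selection: mostly exploit, sometimes explore
        a = random.randrange(2) if random.random() < eps else max((0, 1), key=lambda i: Q[s][i])
        s_next = min(max(s + ACTIONS[a], 0), N_STATES - 1)
        r = 1.0 if s_next == GOAL else 0.0
        # Q-learning update: nudge the estimate toward reward + discounted future value
        Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
        s = s_next

# The learned policy should prefer moving right (action index 1) in every state.
print([max((0, 1), key=lambda i: Q[s][i]) for s in range(N_STATES - 1)])
```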
Problem types and their definitions
Classification : Based on a set of training data, categorize new inputs as belonging to one of a set of categories. An example of classification is identifying whether an image contains a specific type of object, such as a cat or a dog, or a product of acceptable quality coming from a manufacturing line.
Continuous estimation : Based on a set of training data, estimate the next numeric value in a sequence. This type of problem is sometimes described as “prediction,” particularly when it is applied to time series data. One example of continuous estimation is forecasting the sales demand for a product, based on a set of input data such as previous sales figures, consumer sentiment, and weather.
Clustering : These problems require a system to create a set of categories, for which individual data instances have a set of common or similar characteristics. An example of clustering is creating a set of consumer segments, based on a set of data about individual consumers, including demographics, preferences, and buyer behavior.
All other optimization : These problems require a system to generate a set of outputs that optimize outcomes for a specific objective function (some of the other problem types can be considered types of optimization, so we describe these as “all other” optimization). Generating a route for a vehicle that creates the optimum combination of time and fuel utilization is an example of optimization.
Anomaly detection : Given a training set of data, determine whether specific inputs are out of the ordinary. For instance, a system could be trained on a set of historical vibration data associated with the performance of an operating piece of machinery, and then determine whether a new vibration reading suggests that the machine is not operating normally. Anomaly detection can be considered a subcategory of classification.
Ranking : Ranking algorithms are used most often in information retrieval problems where the results of a query or request need to be ordered by some criterion. Recommendation systems suggesting the next product to buy use these types of algorithms as a final step, sorting suggestions by relevance, before presenting the results to the user.
Recommendations : These systems provide recommendations based on a set of training data. A common example of recommendations are systems that suggest “next product to buy” for an individual buyer, based on the buying patterns of similar individuals, and the observed behavior of the specific person.
Data generation : These problems require a system to generate appropriately novel data based on training data. For instance, a music composition system might be used to generate new pieces of music in a particular style, after having been trained on pieces of music in that style.
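Several of the problem types above map directly onto standard machine-learning tooling. The following sketch, using scikit-learn on synthetic data, is an assumed illustration of that mapping for classification, continuous estimation, clustering, and anomaly detection; the estimators and parameters are arbitrary choices, not recommendations from the briefing.

```python
# Assumed mapping of four problem types to off-the-shelf scikit-learn estimators.
import numpy as np
from sklearn.datasets import make_classification, make_blobs
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.cluster import KMeans
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Classification: assign new inputs to known categories
Xc, yc = make_classification(n_samples=500, n_features=10, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(Xc, yc)

# Continuous estimation ("prediction"): estimate a numeric target
Xr = rng.normal(size=(500, 3))
yr = Xr @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=500)
reg = LinearRegression().fit(Xr, yr)

# Clustering: discover segments without labels
Xb, _ = make_blobs(n_samples=500, centers=4, random_state=0)
segments = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(Xb)

# Anomaly detection: flag inputs that are out of the ordinary (-1 = anomaly, +1 = normal)
flags = IsolationForest(random_state=0).fit(Xr).predict(Xr)

print(clf.score(Xc, yc), reg.score(Xr, yr), np.bincount(segments), (flags == -1).sum())
```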
In a business setting, these analytic techniques can be applied to solve real-life problems. The most prevalent problem types are classification, continuous estimation and clustering. A list of problem types and their definitions is available in the sidebar.
We collated and analyzed more than 400 use cases across 19 industries and nine business functions. They provided insight into the areas within specific sectors where deep neural networks can potentially create the most value, the incremental lift that these neural networks can generate compared with traditional analytics (Exhibit 2), and the voracious data requirements—in terms of volume, variety, and velocity—that must be met for this potential to be realized. Our library of use cases, while extensive, is not exhaustive, and may overstate or understate the potential for certain sectors. We will continue refining and adding to it.
Examples of where AI can be used to improve the performance of existing use cases include:
- Predictive maintenance: the power of machine learning to detect anomalies. Deep learning’s capacity to analyze very large amounts of high-dimensional data can take existing preventive maintenance systems to a new level. By layering in additional data, such as audio and image data from other sensors—including relatively cheap ones such as microphones and cameras—neural networks can enhance and possibly replace more traditional methods. AI’s ability to predict failures and allow planned interventions can be used to reduce downtime and operating costs while improving production yield. For example, AI can extend the life of a cargo plane beyond what is possible using traditional analytic techniques by combining plane model data, maintenance history, IoT sensor data such as anomaly detection on engine vibration data, and images and video of engine condition.
- AI-driven logistics optimization can reduce costs through real-time forecasts and behavioral coaching . Application of AI techniques such as continuous estimation to logistics can add substantial value across sectors. AI can optimize routing of delivery traffic, thereby improving fuel efficiency and reducing delivery times. One European trucking company has reduced fuel costs by 15 percent, for example, by using sensors that monitor both vehicle performance and driver behavior; drivers receive real-time coaching, including when to speed up or slow down, optimizing fuel consumption and reducing maintenance costs.
- AI can be a valuable tool for customer service management and personalization challenges. Improved speech recognition in call center management and call routing as a result of the application of AI techniques allows a more seamless experience for customers—and more efficient processing. The capabilities go beyond words alone. For example, deep learning analysis of audio allows systems to assess a customer’s emotional tone; in the event a customer is responding badly to the system, the call can be rerouted automatically to human operators and managers. In other areas of marketing and sales, AI techniques can also have a significant impact. Combining customer demographic and past transaction data with social media monitoring can help generate individualized product recommendations. “Next product to buy” recommendations that target individual customers—as companies such as Amazon and Netflix have successfully been doing—can lead to a twofold increase in the rate of sales conversions. A minimal recommendation sketch follows this list.
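As an assumed illustration of the “next product to buy” idea, the sketch below scores unpurchased products by their similarity to a user’s past purchases (item-based collaborative filtering on a toy purchase matrix). Production recommender systems are far richer; this only shows the core intuition.

```python
# Toy item-based collaborative filtering: recommend products similar to past purchases.
import numpy as np

# rows = users, columns = products (1 = bought)
purchases = np.array([
    [1, 1, 0, 0, 1],
    [1, 0, 0, 1, 1],
    [0, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
], dtype=float)

# cosine similarity between product columns
norms = np.linalg.norm(purchases, axis=0, keepdims=True) + 1e-9
item_sim = (purchases / norms).T @ (purchases / norms)

def recommend(user_idx, top_k=2):
    """Score unbought products by their similarity to what the user already bought."""
    owned = purchases[user_idx]
    scores = item_sim @ owned
    scores[owned > 0] = -np.inf          # do not recommend what they already have
    return np.argsort(scores)[::-1][:top_k]

print(recommend(3))   # suggested product indices for user 3
```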
Two-thirds of the opportunities to use AI are in improving the performance of existing analytics use cases
In 69 percent of the use cases we studied, deep neural networks can be used to improve performance beyond that provided by other analytic techniques. Cases in which only neural networks can be used, which we refer to here as “greenfield” cases, constituted just 16 percent of the total. For the remaining 15 percent, artificial neural networks provided limited additional performance over other analytics techniques, among other reasons because of data limitations that made these cases unsuitable for deep learning (Exhibit 3).
Greenfield AI solutions are prevalent in business areas such as customer service management, as well as among some industries where the data are rich and voluminous and at times integrate human reactions. Among industries, we found many greenfield use cases in healthcare, in particular. Some of these cases involve disease diagnosis and improved care, and rely on rich data sets incorporating image and video inputs, including from MRIs.
On average, our use cases suggest that modern deep learning AI techniques have the potential to provide a boost in additional value above and beyond traditional analytics techniques ranging from 30 percent to 128 percent, depending on industry.
In many of our use cases, however, traditional analytics and machine learning techniques continue to underpin a large percentage of the value creation potential in industries including insurance, pharmaceuticals and medical products, and telecommunications, with the potential of AI limited in certain contexts. In part this is due to the way data are used by these industries and to regulatory issues.
Data requirements for deep learning are substantially greater than for other analytics
Making effective use of neural networks in most applications requires large labeled training data sets alongside access to sufficient computing infrastructure. Furthermore, these deep learning techniques are particularly powerful in extracting patterns from complex, multidimensional data types such as images, video, and audio or speech.
Deep-learning methods require thousands of data records for models to become relatively good at classification tasks and, in some cases, millions for them to perform at the level of humans. By one estimate, a supervised deep-learning algorithm will generally achieve acceptable performance with around 5,000 labeled examples per category and will match or exceed human-level performance when trained with a data set containing at least 10 million labeled examples. In some cases where advanced analytics is currently used, so much data are available—millions or even billions of rows per data set—that AI usage is the most appropriate technique. However, if a threshold of data volume is not reached, AI may not add value to traditional analytics techniques.
These massive data sets can be difficult to obtain or create for many business use cases, and labeling remains a challenge. Most current AI models are trained through “supervised learning”, which requires humans to label and categorize the underlying data. However, promising new techniques are emerging to overcome these data bottlenecks, such as reinforcement learning, generative adversarial networks, transfer learning, and “one-shot learning,” which allows a trained AI model to learn about a subject based on a small number of real-world demonstrations or examples—and sometimes just one.
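Transfer learning is one of the more accessible of these techniques. The sketch below, assuming a recent version of PyTorch and torchvision, reuses an ImageNet-pretrained ResNet-18, freezes its layers, and retrains only a small head for a hypothetical five-class task; the model choice and class count are illustrative assumptions.

```python
# Assumed transfer-learning sketch: freeze a pretrained backbone, retrain a new head.
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1")   # ImageNet-pretrained backbone

# Freeze the pretrained layers so only the new head is trained
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer for a new task with, say, 5 target classes
model.fc = nn.Linear(model.fc.in_features, 5)

# Only the new head will receive gradients during fine-tuning
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)   # ['fc.weight', 'fc.bias']
```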
Organizations will have to adopt and implement strategies that enable them to collect and integrate data at scale. Even with large datasets, they will have to guard against “overfitting,” where a model too tightly matches the “noisy” or random features of the training set, resulting in a corresponding lack of accuracy in future performance, and against “underfitting,” where the model fails to capture all of the relevant features. Linking data across customer segments and channels, rather than allowing the data to languish in silos, is especially important to create value.
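A common way to guard against both failure modes is to hold out a validation set and select model capacity by validation performance rather than training fit. The sketch below illustrates this with an assumed polynomial-regression example in scikit-learn; the data, degrees, and regularization strength are arbitrary.

```python
# Capacity sweep with a held-out validation set: low degrees underfit, high degrees overfit.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)   # noisy ground truth

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

best = None
for degree in (1, 3, 5, 9, 15):
    model = make_pipeline(PolynomialFeatures(degree), Ridge(alpha=1.0))
    model.fit(X_train, y_train)
    val_score = model.score(X_val, y_val)       # R^2 on unseen data
    print(degree, round(model.score(X_train, y_train), 3), round(val_score, 3))
    if best is None or val_score > best[1]:
        best = (degree, val_score)

print("selected degree:", best[0])
```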
Realizing AI’s full potential requires a diverse range of data types including images, video, and audio
Neural AI techniques excel at analyzing image, video, and audio data types because of their complex, multidimensional nature, known by practitioners as “high dimensionality.” Neural networks are good at dealing with high dimensionality, as multiple layers in a network can learn to represent the many different features present in the data. Thus, for facial recognition, the first layer in the network could focus on raw pixels, the next on edges and lines, another on generic facial features, and the final layer might identify the face. Unlike previous generations of AI, which often required human expertise to do “feature engineering,” these neural network techniques are often able to learn to represent these features in their simulated neural networks as part of the training process.
Along with issues around the volume and variety of data, velocity is also a requirement: AI techniques require models to be retrained to match potential changing conditions, so the training data must be refreshed frequently. In one-third of the cases, the model needs to be refreshed at least monthly, and almost one in four cases requires a daily refresh; this is especially the case in marketing and sales and in supply chain management and manufacturing.
We estimate that the AI techniques we cite in this briefing together have the potential to create between $3.5 trillion and $5.8 trillion in value annually across nine business functions in 19 industries. This constitutes about 40 percent of the overall $9.5 trillion to $15.4 trillion annual impact that could potentially be enabled by all analytical techniques (Exhibit 4).
Per industry, we estimate that AI’s potential value amounts to between one and nine percent of 2016 revenue. The value as measured by percentage of industry revenue varies significantly among industries, depending on the specific applicable use cases, the availability of abundant and complex data, as well as on regulatory and other constraints.
These figures are not forecasts for a particular period, but they are indicative of the considerable potential for the global economy that advanced analytics represents.
From the use cases we have examined, we find that the greatest potential value impact from using AI lies both in top-line-oriented functions, such as marketing and sales, and in bottom-line-oriented operational functions, including supply chain management and manufacturing.
Consumer industries such as retail and high tech will tend to see more potential from marketing and sales AI applications because frequent and digital interactions between business and customers generate larger data sets for AI techniques to tap into. E-commerce platforms, in particular, stand to benefit. This is because of the ease with which these platforms collect customer information such as click data or time spent on a web page and can then customize promotions, prices, and products for each customer dynamically and in real time.
Here is a snapshot of three sectors where we have seen AI’s impact: (Exhibit 5)
- In retail, marketing and sales is the area with the most significant potential value from AI, and within that function, pricing and promotion and customer service management are the main value areas. Our use cases show that using customer data to personalize promotions, for example, including tailoring individual offers every day, can lead to a one to two percent increase in incremental sales for brick-and-mortar retailers alone.
- In consumer goods, supply-chain management is the key function that could benefit from AI deployment. Among the examples in our use cases, we see how forecasting based on underlying causal drivers of demand rather than prior outcomes can improve forecasting accuracy by 10 to 20 percent, which translates into a potential five percent reduction in inventory costs and revenue increases of two to three percent.
- In banking, particularly retail banking, AI has significant value potential in marketing and sales, much as it does in retail. However, because of the importance of assessing and managing risk in banking, for example for loan underwriting and fraud detection, AI has much higher value potential to improve performance in risk in the banking sector than in many other industries.
Artificial intelligence is attracting growing amounts of corporate investment, and as the technologies develop, the potential value that can be unlocked is likely to grow. So far, however, only about 20 percent of AI-aware companies are currently using one or more of its technologies in a core business process or at scale.
For all their promise, AI technologies have plenty of limitations that will need to be overcome. They include the onerous data requirements listed above, but also five other limitations:
- First is the challenge of labeling training data, which often must be done manually and is necessary for supervised learning. Promising new techniques are emerging to address this challenge, such as reinforcement learning and in-stream supervision, in which data can be labeled in the course of natural usage.
- Second is the difficulty of obtaining data sets that are sufficiently large and comprehensive to be used for training; for many business use cases, creating or obtaining such massive data sets can be difficult—for example, limited clinical-trial data to predict healthcare treatment outcomes more accurately.
- Third is the difficulty of explaining in human terms results from large and complex models: why was a certain decision reached? Product certifications in healthcare and in the automotive and aerospace industries, for example, can be an obstacle; among other constraints, regulators often want rules and choice criteria to be clearly explainable.
- Fourth is the generalizability of learning: AI models continue to have difficulties in carrying their experiences from one set of circumstances to another. That means companies must commit resources to train new models even for use cases that are similar to previous ones. Transfer learning—in which an AI model is trained to accomplish a certain task and then quickly applies that learning to a similar but distinct activity—is one promising response to this challenge.
- The fifth limitation concerns the risk of bias in data and algorithms. This issue touches on concerns that are more social in nature and which could require broader steps to resolve, such as understanding how the processes used to collect training data can influence the behavior of models they are used to train. For example, unintended biases can be introduced when training data is not representative of the larger population to which an AI model is applied. Thus, facial recognition models trained on a population of faces corresponding to the demographics of AI developers could struggle when applied to populations with more diverse characteristics. A recent report on the malicious use of AI highlights a range of security threats, from sophisticated automation of hacking to hyper-personalized political disinformation campaigns.
Organizational challenges around technology, processes, and people can slow or impede AI adoption
Organizations planning to adopt significant deep learning efforts will need to consider a spectrum of options about how to do so. The range of options includes building a complete in-house AI capability, outsourcing these capabilities, or leveraging AI-as-a-service offerings.
Based on the use cases they plan to build, companies will need to create a data plan that produces results and predictions, which can be fed either into designed interfaces for humans to act on or into transaction systems. Key data engineering challenges include data creation or acquisition, defining data ontology, and building appropriate data “pipes.” Given the significant computational requirements of deep learning, some organizations will maintain their own data centers, because of regulations or security concerns, but the capital expenditures could be considerable, particularly when using specialized hardware. Cloud vendors offer another option.
Process can also become an impediment to successful adoption unless organizations are digitally mature. On the technical side, organizations will have to develop robust data maintenance and governance processes, and implement modern software disciplines such as Agile and DevOps. Even more challenging, in terms of scale, is overcoming the “last mile” problem of making sure the superior insights provided by AI are instantiated in the behavior of the people and processes of an enterprise.
On the people front, much of the construction and optimization of deep neural networks remains something of an art requiring real experts to deliver step-change performance increases. Demand for these skills far outstrips supply at present; according to some estimates, fewer than 10,000 people have the skills necessary to tackle serious AI problems, and competition for them is fierce among the tech giants.
AI can seem an elusive business case
Where AI techniques and data are available and the value is clearly proven, organizations can already pursue the opportunity. In some areas, the techniques today may be mature and the data available, but the cost and complexity of deploying AI may simply not be worthwhile, given the value that could be generated. For example, an airline could use facial recognition and other biometric scanning technology to streamline aircraft boarding, but the value of doing so may not justify the cost and issues around privacy and personal identification.
Similarly, we can see potential cases where the data and the techniques are maturing, but the value is not yet clear. The most unpredictable scenario is where either the data (both the types and volume) or the techniques are simply too new and untested to know how much value they could unlock. For example, in healthcare, if AI were able to build on the superhuman precision we are already starting to see with X-ray analysis and broaden that to more accurate diagnoses and even automated medical procedures, the economic value could be very significant. At the same time, the complexities and costs of arriving at this frontier are also daunting. Among other issues, it would require flawless technical execution and resolving issues of malpractice insurance and other legal concerns.
Societal concerns and regulations can also constrain AI use. Regulatory constraints are especially prevalent in use cases related to personally identifiable information. This is particularly relevant at a time of growing public debate about the use and commercialization of individual data on some online platforms. Use and storage of personal information is especially sensitive in sectors such as banking, health care, and pharmaceutical and medical products, as well as in the public and social sector. In addition to addressing these issues, businesses and other users of data for AI will need to continue to evolve business models related to data use in order to address societies’ concerns. Furthermore, regulatory requirements and restrictions can differ from country to country, as well as from sector to sector.
Implications for stakeholders
As we have seen, it is a company’s ability to execute against AI models that creates value, rather than the models themselves. In this final section, we sketch out some of the high-level implications of our study of AI use cases for providers of AI technology, appliers of AI technology, and policy makers, who set the context for both.
- For AI technology provider companies: Many companies that develop or provide AI to others have considerable strength in the technology itself and the data scientists needed to make it work, but they can lack a deep understanding of end markets. Understanding the value potential of AI across sectors and functions can help shape the portfolios of these AI technology companies. That said, they shouldn’t necessarily only prioritize the areas of highest potential value. Instead, they can combine that data with complementary analyses of the competitor landscape, of their own existing strengths, sector or function knowledge, and customer relationships, to shape their investment portfolios. On the technical side, the mapping of problem types and techniques to sectors and functions of potential value can guide a company with specific areas of expertise on where to focus.
- Many companies seeking to adopt AI in their operations have started machine learning and AI experiments across their business. Before launching more pilots or testing solutions, it is useful to step back and take a holistic approach to the issue, moving to create a prioritized portfolio of initiatives across the enterprise, including AI and the wider analytic and digital techniques available. For a business leader to create an appropriate portfolio, it is important to develop an understanding about which use cases and domains have the potential to drive the most value for a company, as well as which AI and other analytical techniques will need to be deployed to capture that value. This portfolio ought to be informed not only by where the theoretical value can be captured, but by the question of how the techniques can be deployed at scale across the enterprise. The question of how analytical techniques are scaling is driven less by the techniques themselves and more by a company’s skills, capabilities, and data. Companies will need to consider efforts on the “first mile,” that is, how to acquire and organize data and efforts, as well as on the “last mile,” or how to integrate the output of AI models into work flows ranging from clinical trial managers and sales force managers to procurement officers. Previous MGI research suggests that AI leaders invest heavily in these first- and last-mile efforts.
- Policy makers will need to strike a balance between supporting the development of AI technologies and managing any risks from bad actors. They have an interest in supporting broad adoption, since AI can lead to higher labor productivity, economic growth, and societal prosperity. Their tools include public investments in research and development as well as support for a variety of training programs, which can help nurture AI talent. On the issue of data, governments can spur the development of training data directly through open data initiatives. Opening up public-sector data can spur private-sector innovation. Setting common data standards can also help. AI is also raising new questions for policy makers to grapple with for which historical tools and frameworks may not be adequate. Therefore, some policy innovations will likely be needed to cope with these rapidly evolving technologies. But given the scale of the beneficial impact on business, the economy, and society, the goal should not be to constrain the adoption and application of AI, but rather to encourage its beneficial and safe use.
Michael Chui is a partner of the McKinsey Global Institute, where James Manyika is chairman and a director; Mehdi Miremadi is a partner in McKinsey’s Chicago office; Nicolaus Henke is a senior partner in the London office; Rita Chung is a consultant in the Silicon Valley office; Pieter Nel is a specialist in the New York office, where Sankalp Malhotra is a consultant.
U.S. Food and Drug Administration
The Role of Artificial Intelligence in Clinical Trial Design and Research with Dr. ElZarrad
Q&A with FDA Podcast | Transcript
Continuing Education Credits (CE) Instructions
Log in, complete the evaluation, and print your certificate.
- Navigate to ceportal.fda.gov
- If you have an account, please login
- If you do not have an account, click on “Create an account!”
- Once you’re logged in, please complete your profile if you haven't done so already
- Navigate to the “Enduring Materials” tab
- Select the activity you want to complete
- Click on the “Enroll” button. As you complete all the steps listed in the box, a green checkmark will appear
- Begin the activity by clicking on the link at the bottom of the page
- After listening to the podcast, click on “Complete Post-test”
- After successful completion of the Post-test, click on “Complete Evaluation”
- After completion of the evaluation, you may view/print your statement of credit and/or certificate of completion
Requirements for receiving CE credits
Review the activity, complete the post-test and evaluation. Upon completion, learners may view or print their statement of credit.
For those of you who are pharmacists or pharmacy technicians: The FDA CE Team will report your credit to the National Association of Boards of Pharmacy—otherwise known as “NABP”—provided you add your NABP ID and date of birth to your profile in the FDA CE Portal. The only official Statement of Credit is the one you pull from CPE Monitor. If you do not see your credit reflected on the CPE Monitor after 45 days after completion of the activity, please contact [email protected] . The CPE Monitor sets a strict 60-day limit on uploading credits.
Claim 0.5 CE credit (CME/AAPA/CNE/CPE/CPT/CPH) by listening to our podcast and responding to the questions
Dr. Roach: Welcome to “Q&A with FDA,” from the FDA’s Division of Drug Information, where we aim to answer some of the most frequently asked questions that we’ve received from the public.
My name is Dr. Sara Roach and today we will be discussing the role of Artificial Intelligence, or AI, in clinical trial design.
AI, including machine learning, is gaining traction in clinical research, changing the clinical trial landscape, and is increasingly being integrated in areas where FDA is actively engaged, including clinical trial design, digital health technologies, and real-world data analytics.
Today we are joined by Dr. Khair ElZarrad, Director of the Office of Medical Policy within FDA’s Center for Drug Evaluation and Research to discuss recent advances and use of technology in clinical trial design.
Dr. ElZarrad leads the development, coordination, and implementation of medical policy programs and strategic initiatives and works to enhance policies and improve drug development and regulatory review processes. He was also recently awarded the 2023 Arthur Flemming Award, which honors outstanding employees.
Good afternoon Dr. ElZarrad, congratulations on the award! We are so glad that you could join us!
Dr. ElZarrad: Good afternoon Sara, my pleasure. Great to be with you today!
Dr. Roach: Can you set the stage for our audience by describing AI and machine learning?
Dr. ElZarrad: Sure absolutely. So generally AI and machine learning can be described as machine-based systems that can, for a given set obviously, for human-defined objectives, make predictions, recommendations, or decisions. Those AI systems use machine- and human-based inputs to perceive real and virtual environments. So they, they can abstract such perceptions into models in an automated manner. They use model inference typically to formulate options for information or action.
You know, machine learning, specifically, I would say, which is the most used form of AI that we’ve seen in drug development, is a subset of AI, and it employs a set of technologies or techniques that can be used to train AI algorithms to improve performance at the task, at the specific task, based on data that’s available.
Dr. Roach: It might not be well known that FDA has already received hundreds of submissions that reference use of AI. Can you talk more about that?
Dr. ElZarrad: Yeah sure absolutely, it’s a very exciting area actually! So over the last few years, FDA has seen a rapid growth in the number of submissions that reference AI. Obviously, when an application references AI that doesn’t tell you how deep or complex the use of AI. Nonetheless, we’ve seen a rapid rise in those applications. The last data I have, actually, from 2016 to today, approximately 300 submissions we’ve received that reference AI use. These submissions transverse the landscape of drug development, all the way from discovery to clinical research, you highlighted clinical trials for example in the beginning, but also to post-market safety surveillance and to even manufacturing, you know advanced manufacturing specifically.
We are also working to better understand how the use of AI in any one specific setting relates to participants safety and the reliability of study results. I call those like the two pillars that we have to pay attention to, the safety and the reliability of results.
The use of AI, including to facilitate data collection, combined with robust information management and the advanced computing capabilities we have seen increasingly in recent years, is really transforming the way drugs are developed and used, so we are very excited about this area.
Dr. Roach: One of the most significant applications of AI in drug development is in efforts to streamline and advance clinical research. Can you tell us more about how these technologies are being applied in clinical trials?
Dr. ElZarrad: Yeah, absolutely. Let me first say that one of our key objectives at the FDA is to really advance the modernization of clinical trials. Clinical trials themselves are the cornerstone of evidence generation. We rely on them, and we must continue to make them more agile, inclusive, and innovative, among the many other ways we can innovate around clinical trials.
Over the past few years, I hope everyone took note of our guidances, for example, on decentralized clinical trials, or what we call DCTs, on digital health technologies, or DHTs, and on modern good clinical practice, among many other areas. These will hopefully initiate a new generation of clinical trials, modernized on both the design front and the conduct front.
AI itself is being used to analyze vast amounts of data, from both clinical trials and observational studies, and its use is really to help make inferences regarding the safety and effectiveness of the drug being evaluated. AI also has the potential to inform the design and efficiency of clinical trials, including decentralized clinical trials. You also see potential for AI use in trials incorporating real-world data, another main topic for us. For example, there is great potential for AI in the extraction and organization of data from electronic health records and medical claims, as well as other data sources that tend to be valuable but sometimes unstructured. So there is great potential for AI to assist us across the board here.
Dr. Roach: For those unfamiliar with DCTs - in these trials, some or all of the clinical trial’s activities occur at locations other than a traditional clinical trial site. These alternate locations can include the participant’s home, a local health care facility, or a nearby laboratory. These trials often incorporate DHTs, which are systems that capture health care information for the clinical trial directly from individuals. Can you elaborate on how decentralized clinical trials are currently incorporating digital health technologies?
Dr. ElZarrad: Yeah, absolutely. It’s important to mention two points first. Decentralized designs and the use of digital health technologies hold great potential for streamlining clinical trials in general, but also for expanding the reach of trials and reducing the burden on participants. We really hope those designs and those tools can help us reach rural communities, for example.
That said, DCTs and DHTs are not silver bullets in themselves, and they should be evaluated to see whether they are fit for purpose, while considering the total context of the clinical trial, the type of intervention, and the population involved, among other factors. For example, we should not assume what patients’ preferences will be. Some patients may like to have a healthcare provider come to their homes; others may not. So I think we should do our due diligence in understanding how those designs can and should be incorporated in the context of the trial.
Another point is that many DHTs are portable instruments, such as activity trackers, glucose monitors, blood pressure monitors, and spirometers, and they contain advances such as electronic sensors, computing platforms, and information technology. So we have to consider all of that as we think about incorporating those tools.
DHTs can also include interactive mobile applications, where participants can rate, for example, their quality of life, pain, depression, or daily function, or even perform tests of functional performance such as cognition, coordination, and vision, among many other measures.
Many DHTs may be worn, implanted, ingested, or placed in the environment. This is a really critical point, because when we think of DHTs we often think of wearable trackers, but this goes far beyond that. Placing those DHTs in a specific environment and allowing real-time collection of data from trial participants in their homes and other locations remote from the clinical trial sites, as you mentioned, can help gather real-world data relating to patient health status or the delivery of health care. Practically, rather than getting a segmented view of patients’ behavior or input, you get a more holistic picture.
One area that I’m keen on is how we can use technology to assist in the recruitment and retention of trial participants, as well as in explaining trial processes to those participating in a more user-friendly and meaningful way. One of the key reasons clinical trials fail is lack of recruitment, so whether we can use those tools to do better in this area is something we are very much interested in.
Dr. Roach: This could really benefit clinical trials and drug development.
Dr. ElZarrad: Yeah, absolutely, there are many examples. Take DHTs: they may enable continuous or frequent measurements of clinical features, or even measurements of novel clinical features. Those usually cannot be captured in a traditional study visit, where you go to the site and data are captured at a single point in time. DHTs can enable much more comprehensive data capture in that sense.
Those are typically state-of-the-art digital health technologies that we are seeing increasingly. They allow participants themselves to take part in the clinical trial remotely, so you can see how they are a natural fit for decentralized clinical trials. For example, you can reach communities with those tools that have never been reached before by clinical trials, and typically those communities also carry a huge burden of disease.
We still have to be careful, though, to avoid the flashiness of these new technologies. Everybody wants to be part of this innovation, right? But we have to look past the flashiness and ask first: do they fit in the context of the trial? I can give you one example. In a recent meeting with academic experts, they highlighted to us that customization and incorporation of technologies may also lead to increased complexity. We tend to think of those tools as removing complexity, but we really have to think about the context in its totality.
Dr. Roach: Considering the tools, data, and analytics we have discussed so far, such as Digital Health Technologies and Real-World Data, how is AI being increasingly integrated in areas where FDA is actively engaged?
Dr. ElZarrad: Yeah, good question. Many DHTs are AI-enabled, either as an embedded algorithm within the DHT itself or applied to the data generated by the DHT after the data are collected. So the utility of AI can really be in multiple places here.
Take one example: AI has been used to predict the status of a chronic disease and its response to treatment, or to identify novel characteristics of an underlying condition. AI can be used to analyze the large and diverse data sets that can be generated from continuous monitoring of participants, for example. And AI can also be used for a range of data cleaning and curation purposes, an area that is really important in the context of real-world data. This could include, for example, identifying duplicate participation, imputing missing data values, and harmonizing controlled terminology across drug development programs.
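[Editor’s note: to make the data cleaning and curation tasks just mentioned concrete, here is a minimal sketch of what duplicate-participation flagging and missing-value imputation might look like. This is not an FDA tool or method; the table, column names ("participant_id", "dob", "site", "sbp"), and values are hypothetical, chosen only for illustration.]

```python
# Minimal, hypothetical sketch of two data-curation steps described in the
# interview: flagging possible duplicate participation and imputing a missing
# value. All data and column names are made up for illustration.
import pandas as pd

records = pd.DataFrame({
    "participant_id": ["A01", "A02", "A03", "A02"],
    "dob":  ["1980-01-02", "1975-06-30", "1990-11-11", "1975-06-30"],
    "site": ["site_1", "site_2", "site_1", "site_3"],
    "sbp":  [128.0, None, 141.0, 119.0],   # systolic blood pressure, mmHg
})

# Flag potential duplicate enrollment: the same participant_id and date of
# birth appearing at more than one site.
dupes = records[records.duplicated(subset=["participant_id", "dob"], keep=False)]
print("Possible duplicate participation:")
print(dupes)

# Impute the missing blood-pressure reading with the column median; a real
# curation pipeline would use a validated, documented imputation strategy.
records["sbp"] = records["sbp"].fillna(records["sbp"].median())
print(records)
```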
One other point is that AI has also been used to analyze high volumes of real-world data extracted from electronic health records, medical claims, and disease registries, among other sources. We are also seeing AI being used to create EHR phenotypes, or patient cohorts, based on health records and claims data.
Dr. Roach: Are there other applications of AI being actively explored that you can share?
Dr. ElZarrad: The use of AI in predictive modeling, counterfactual simulation, and in silico modeling, for example, to inform clinical trial designs is being actively explored, and we’re really excited about the potential there.
In addition, the use of AI to improve the conduct of clinical trials and augment operational efficiency is also being explored. As I mentioned, making clinical trials more agile is a critical aspect here.
We’ve seen AI being used to assist in recruitment; it is being developed and used to connect individuals with trials more effectively. This can involve mining vast amounts of data from diverse sources, as we mentioned before, including social media, medical literature, registries, and structured and unstructured data in electronic health records.
So I think key to all of this really is recognizing that the potential is huge, but context matters and thoughtfulness is really essential across the board.
Dr. Roach: It appears that AI may also help improve clinical trial diversity. Can you expand on how AI is impacting recruitment and the logistical aspects of clinical trial design?
Dr. ElZarrad: Yeah, absolutely. Diversity is a very critical and important aspect for us. But it’s important to remember that diversity in clinical trials is a complex issue, and it will take a multifaceted approach and multifaceted solutions to help ensure that trial participants reflect the population that will ultimately take or use the intervention if it is approved.
AI can be used to help with site selection, for example. Trial operational conduct could be optimized by utilizing AI algorithms to help identify which sites have the greatest potential for successful recruitment to the trial.
Its use also can help enhance site selection, improve participant recruitment strategies, and support more targeted engagement initiatives.
AI has already been explored and used as part of clinical investigations to predict an individual participant’s clinical outcome based on baseline characteristics. This supports, for example, the enrichment strategies that we have separate guidance on, and such enrichment strategies can aid in participant selection in clinical trials.
Dr. Roach: AI’s applications are quite transformational.
Dr. ElZarrad: Yeah, absolutely. At the FDA we discuss and recognize the potential for AI to enhance drug development in so many ways, across the whole spectrum of drug development; as I mentioned, that includes even manufacturing and informing post-market safety as well.
Dr. Roach: What are some specific ways you are seeing AI transform the way we have traditionally approached clinical trial design?
Dr. ElZarrad: Yeah, the advancements have been, as you said, substantial, and the potential is really massive. Today, AI can be used to characterize and predict pharmacokinetic profiles after drug administration, for example. And these kinds of models can be used to optimize the dose, or even the dosing regimen, in a given study, which is a really important part of drug development.
AI can be used to monitor and improve adherence during a clinical trial. This can be through tools such as smartphone alerts and reminders, electronic tracking of medications, and tracking of missed clinical visits. All of this can trigger potential non-adherence alerts and hopefully provide us with the information needed to address the issue.
AI has already been used in clinical research to improve medication adherence, specifically through applications such as digital biomarkers, for example facial and vocal expression, to monitor adherence remotely. I recall one specific tool in development by academia and industry to track adherence in that way.
This technology has the potential to improve retention and participants’ access to relevant trial information, for example through AI chatbots, voice assistants, and intelligent search. So the potential really spans many modalities here.
The last one I want to mention is that data from digital health technology and other systems can be used to develop participants profiles to potentially predict dropouts as well, and this would, could help in participants retention. So, if you know, somebody is likely to drop out of the trial, you can try to employ some methods to prevent that from happening.
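[Editor’s note: to illustrate the dropout-prediction idea just described, here is a minimal sketch of a risk model trained on hypothetical participant-profile features. The features, synthetic data, and outputs are assumptions for illustration; this is not a validated clinical model or an FDA method.]

```python
# Minimal, hypothetical dropout-risk sketch: train a simple classifier on
# synthetic participant-profile features and rank participants by predicted
# risk so retention outreach can be prioritized.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
# Hypothetical features: distance to site (km), missed visits so far, age (years).
X = np.column_stack([
    rng.exponential(30, n),   # distance_km
    rng.poisson(1, n),        # missed_visits
    rng.normal(55, 12, n),    # age
])
# Synthetic label: dropout made more likely by distance and missed visits.
logit = 0.02 * X[:, 0] + 0.8 * X[:, 1] - 2.0
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Predicted dropout probabilities for held-out participants.
risk = model.predict_proba(X_test)[:, 1]
print("Five highest predicted dropout risks:", np.round(np.sort(risk)[-5:], 3))
```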
Dr. Roach: AI has a role in clinical trial safety monitoring too. Can you expand on safety?
Dr. ElZarrad: Yeah, sure. AI-enabled algorithms have the ability to detect clusters of signs and symptoms to identify potential safety signals, and that can be done in real time, which again is one of the areas where this is really powerful. AI can also be used to predict adverse events in clinical trial participants, and this is an area we are definitely interested in exploring.
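[Editor’s note: Dr. ElZarrad does not name a specific signal-detection method. As one concrete stand-in, the sketch below computes a proportional reporting ratio (PRR), a standard pharmacovigilance screening statistic, over made-up counts; the numbers and the flagging threshold are illustrative assumptions only.]

```python
# Hypothetical safety-signal screen using a proportional reporting ratio (PRR).
# PRR = [a / (a + b)] / [c / (c + d)], where
#   a = reports of the event of interest on the study drug,
#   b = reports of all other events on the study drug,
#   c = reports of the event of interest on comparators,
#   d = reports of all other events on comparators.
a, b = 12, 488   # study-drug arm (made-up counts)
c, d = 3, 497    # comparator arm (made-up counts)

prr = (a / (a + b)) / (c / (c + d))
print(f"PRR = {prr:.2f}")

# A common screening heuristic flags PRR > 2 with at least 3 cases for
# clinical review; the threshold here is illustrative, not a regulatory rule.
if prr > 2 and a >= 3:
    print("Potential safety signal; route for clinical review.")
```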
Dr. Roach: There are clearly numerous benefits to using these technologies, but what are some of the drawbacks or challenges?
Dr. ElZarrad: Yeah, this is a great question. In recent years the integration of AI in drug development has gained significant traction, as we have discussed. This integration, coupled with continuous advancements in AI, holds the potential to accelerate the development of safe and effective drugs and enhance patient care, which is an important point, but it also has to be considered very carefully. Concurrent with these technological advancements, the number of submissions referencing AI has already increased. However, AI in drug development presents some unique challenges.
First, take the variability in the quality, size, and representativeness of the data sets used to train AI models. This can introduce bias and raise questions about the reliability of AI-driven results. Responsible use of AI demands that the data used to develop these models are fit for purpose and fit for use; this is a concept we try to highlight and clarify. Fit for use means that the data should be both relevant, meaning they include key data elements and a sufficient number of representative participants, and reliable, where reliability covers factors such as accuracy, completeness, and traceability.
Another important factor is that, because of the complexity of the computational and statistical methodologies behind AI models, understanding how the models are developed and how they arrive at their conclusions can be really difficult at times. This may require us to start thinking of new approaches to transparency.
Another factor is that the uncertainty in an employed model may be difficult to interpret, explain, or quantify at times. That requires us to figure out how to employ a risk-based approach, to look under the hood, if you will, in a reasonable and effective way.
Finally, the last thing I would like to mention is that the performance of some AI models can degrade over time. This is especially true for learning systems, where new data are continually acquired and the model’s outputs may shift as a result. How do we make sure we do not see that lag in performance? This is often referred to as data drift.
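[Editor’s note: to make the data drift concept concrete, here is a minimal sketch of one simple monitoring check: comparing the distribution of a model input at training time with newly acquired data using a two-sample Kolmogorov-Smirnov test. The feature, the synthetic data, and the significance threshold are illustrative assumptions, not an FDA-endorsed procedure.]

```python
# Hypothetical data-drift check: compare a feature's distribution at training
# time against newly acquired data with a two-sample Kolmogorov-Smirnov test.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
baseline = rng.normal(loc=120, scale=15, size=2000)  # feature values seen at training time
incoming = rng.normal(loc=128, scale=18, size=500)   # newly acquired data, slightly shifted

stat, p_value = ks_2samp(baseline, incoming)
if p_value < 0.01:
    print(f"Possible data drift (KS statistic={stat:.3f}, p={p_value:.1e}); "
          "consider re-validating or retraining the model.")
else:
    print("No significant drift detected in this feature.")
```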
Dr. Roach: How is FDA preparing to educate industry and to overcome these challenges?
Dr. ElZarrad: Internally, FDA has established a steering committee to provide advice on the general use and feasibility of DHTs and the implementation of DCTs, that is, digital health technologies and decentralized clinical trials. And as I mentioned, AI is being utilized across the board here, so we are trying to provide engagement and meaningful input on the use of those technologies and designs. To engage the public and receive outside opinions, we are also hosting multiple workshops and announcing demonstration projects, and the goal here is really to encourage mutual learning.
FDA recently published a discussion paper titled “ Using Artificial Intelligence and Machine Learning in the Development of Drug and Biological Products ,” which is really aimed at spurring discussion with the community. Obviously we have some expertise at the FDA, but we know a lot of the expertise lies outside of the FDA. We are very interested in learning and making sure that whatever we develop in our regulations is responsive to the community and encourages innovation while at the same time safeguarding patient safety.
In the paper I mentioned, we solicited feedback on specific challenges that we see, and it outlines them very nicely. We have been receiving some great input from the community, and we are very excited to incorporate those thoughts into future processes around policy development.
I also want to mention that we are very interested in feedback not only from the pharmaceutical companies typically involved in drug development, but also from those who are developing AI algorithms, from ethicists, from academia, from patients and patient groups, and from global counterparts. We realize this technology is being developed across the world, and we want to do our due diligence by engaging globally.
Dr. Roach: In fact, our Small Business and Industry Assistance program, or SBIA program, recently discussed digital health technologies in our Clinical Investigator Training Course, and is always looking for ways to assist and educate industry, so this feedback is extremely important! A link to the presentations can be found by clicking on the episode webpage and navigating to the hyperlink.
Dr. ElZarrad, thank you so much for joining us today. Do you have any final thoughts that you would like to convey to our audience?
Dr. ElZarrad: Thanks again so much for the opportunity. As regulators, we are really excited about the potential of AI and technological innovation in general. We hope that such innovation will facilitate drug development and ultimately accelerate how quickly safe and effective drugs get to those who need them.
We plan on developing and adapting a flexible risk-based regulatory framework that will promote innovation and protect patient safety. You will see that not just for AI, but across our guidances and policies that will address technological innovations.
As we move further into integrating AI in drug development, we are committed to continuing to engage with all interested parties. We will share preliminary considerations, seek input, and encourage discussions on fostering the responsible use and deployment of these technologies.
As I mentioned, mutual learning is really critical for us, and we hope that as we move forward we collectively shape this field in a responsible way. We recognize that we need to learn from experts and experiences across sectors, and not just the standard, traditional sectors; we really need to go beyond, into technology, into ethics, and beyond. We are very excited to see how these innovations are being used and will continue to be used to accelerate the development of safe and effective drugs, and we definitely want to be part of the solution to continue this innovation as fast as we can. Thank you.
Dr. Roach: Thank you again, Dr. ElZarrad. As we’ve discussed today, and as with any innovation, AI creates opportunities along with new and unique challenges. Thanks for listening to “Q&A with FDA”. The full podcast and transcript of this recording are available at fda.gov/qawithfda . Many of our episodes offer continuing education credits for health care professionals, so be sure to visit the webpage for more details. If you are looking for additional learning or continuing education credit opportunities, including live and home study webinars, you’ll also want to check out fda.gov/CDERLearn and fda.gov/DDIWebinars . And if you have questions about this episode, or anything drug-related, email us at [email protected] .
- Artificial Intelligence and Machine Learning (AI/ML) for Drug Development
- Digital Health Technologies (DHTs) for Drug Development
- The Evolving Role of Decentralized Clinical Trials and Digital Health Technologies
- Using Artificial Intelligence & Machine Learning in the Development of Drug & Biological Products – Discussion Paper and Request for Feedback
- Real-World Evidence
- FDA Clinical Investigator Training Course (CITC) 2022
Alarmed by A.I. Chatbots, Universities Start Revamping How They Teach
With the rise of the popular new chatbot ChatGPT, colleges are restructuring some courses and taking preventive measures.
By Kalley Huang
Kalley Huang, who covers youth and technology from San Francisco, interviewed more than 30 professors, students and university administrators for this article.
While grading essays for his world religions course last month, Antony Aumann, a professor of philosophy at Northern Michigan University, read what he said was easily “the best paper in the class.” It explored the morality of burqa bans with clean paragraphs, fitting examples and rigorous arguments.
A red flag instantly went up.
Mr. Aumann confronted his student over whether he had written the essay himself. The student confessed to using ChatGPT , a chatbot that delivers information, explains concepts and generates ideas in simple sentences — and, in this case, had written the paper.
Alarmed by his discovery, Mr. Aumann decided to transform essay writing for his courses this semester. He plans to require students to write first drafts in the classroom, using browsers that monitor and restrict computer activity. In later drafts, students have to explain each revision. Mr. Aumann, who may forgo essays in subsequent semesters, also plans to weave ChatGPT into lessons by asking students to evaluate the chatbot’s responses.
“What’s happening in class is no longer going to be, ‘Here are some questions — let’s talk about it between us human beings,’” he said, but instead “it’s like, ‘What also does this alien robot think?’”
Across the country, university professors like Mr. Aumann, department chairs and administrators are starting to overhaul classrooms in response to ChatGPT , prompting a potentially huge shift in teaching and learning. Some professors are redesigning their courses entirely, making changes that include more oral exams, group work and handwritten assessments in lieu of typed ones.
The moves are part of a real-time grappling with a new technological wave known as generative artificial intelligence . ChatGPT, which was released in November by the artificial intelligence lab OpenAI, is at the forefront of the shift. The chatbot generates eerily articulate and nuanced text in response to short prompts, with people using it to write love letters, poetry, fan fiction — and their schoolwork.
That has upended some middle and high schools, with teachers and administrators trying to discern whether students are using the chatbot to do their schoolwork. Some public school systems, including in New York City and Seattle, have since banned the tool on school Wi-Fi networks and devices to prevent cheating, though students can easily find workarounds to access ChatGPT.
In higher education, colleges and universities have been reluctant to ban the A.I. tool because administrators doubt the move would be effective and they don’t want to infringe on academic freedom. That means the way people teach is changing instead.
“We try to institute general policies that certainly back up the faculty member’s authority to run a class,” instead of targeting specific methods of cheating, said Joe Glover, provost of the University of Florida. “This isn’t going to be the last innovation we have to deal with.”
That’s especially true as generative A.I. is in its early days. OpenAI is expected to soon release another tool, GPT-4, which is better at generating text than previous versions. Google has built LaMDA , a rival chatbot, and Microsoft is discussing a $10 billion investment in OpenAI. Silicon Valley start-ups , including Stability AI and Character.AI , are also working on generative A.I. tools.
An OpenAI spokeswoman said the lab recognized its programs could be used to mislead people and was developing technology to help people identify text generated by ChatGPT.
At many universities, ChatGPT has now vaulted to the top of the agenda. Administrators are establishing task forces and hosting universitywide discussions to respond to the tool, with much of the guidance being to adapt to the technology.
At schools including George Washington University in Washington, D.C., Rutgers University in New Brunswick, N.J., and Appalachian State University in Boone, N.C., professors are phasing out take-home, open-book assignments — which became a dominant method of assessment in the pandemic but now seem vulnerable to chatbots. They are instead opting for in-class assignments, handwritten papers, group work and oral exams.
Gone are prompts like “write five pages about this or that.” Some professors are instead crafting questions that they hope will be too clever for chatbots and asking students to write about their own lives and current events.
Students are “plagiarizing this because the assignments can be plagiarized,” said Sid Dobrin, chair of the English department at the University of Florida.
Frederick Luis Aldama, the humanities chair at the University of Texas at Austin, said he planned to teach newer or more niche texts that ChatGPT might have less information about, such as William Shakespeare’s early sonnets instead of “A Midsummer Night’s Dream.”
The chatbot may motivate “people who lean into canonical, primary texts to actually reach beyond their comfort zones for things that are not online,” he said.
In case the changes fall short of preventing plagiarism, Mr. Aldama and other professors said they planned to institute stricter standards for what they expect from students and how they grade. It is now not enough for an essay to have just a thesis, introduction, supporting paragraphs and a conclusion.
“We need to up our game,” Mr. Aldama said. “The imagination, creativity and innovation of analysis that we usually deem an A paper needs to be trickling down into the B-range papers.”
Universities are also aiming to educate students about the new A.I. tools. The University at Buffalo in New York and Furman University in Greenville, S.C., said they planned to embed a discussion of A.I. tools into required courses that teach entering or freshman students about concepts such as academic integrity.
“We have to add a scenario about this, so students can see a concrete example,” said Kelly Ahuna, who directs the academic integrity office at the University at Buffalo. “We want to prevent things from happening instead of catch them when they happen.”
Other universities are trying to draw boundaries for A.I. Washington University in St. Louis and the University of Vermont in Burlington are drafting revisions to their academic integrity policies so their plagiarism definitions include generative A.I.
John Dyer, vice president for enrollment services and educational technologies at Dallas Theological Seminary, said the language in his seminary’s honor code felt “a little archaic anyway.” He plans to update its plagiarism definition to include: “using text written by a generation system as one’s own (e.g., entering a prompt into an artificial intelligence tool and using the output in a paper).”
The misuse of A.I. tools will most likely not end, so some professors and universities said they planned to use detectors to root out that activity. The plagiarism detection service Turnitin said it would incorporate more features for identifying A.I., including ChatGPT, this year.
More than 6,000 teachers from Harvard University, Yale University, the University of Rhode Island and others have also signed up to use GPTZero, a program that promises to quickly detect A.I.-generated text, said Edward Tian, its creator and a senior at Princeton University.
Some students see value in embracing A.I. tools to learn. Lizzie Shackney, 27, a student at the University of Pennsylvania’s law school and design school, has started using ChatGPT to brainstorm for papers and debug coding problem sets.
“There are disciplines that want you to share and don’t want you to spin your wheels,” she said, describing her computer science and statistics classes. “The place where my brain is useful is understanding what the code means.”
But she has qualms. ChatGPT, Ms. Shackney said, sometimes incorrectly explains ideas and misquotes sources. The University of Pennsylvania also hasn’t instituted any regulations about the tool, so she doesn’t want to rely on it in case the school bans it or considers it to be cheating, she said.
Other students have no such scruples, sharing on forums like Reddit that they have submitted assignments written and solved by ChatGPT — and sometimes done so for fellow students too. On TikTok, the hashtag #chatgpt has more than 578 million views, with people sharing videos of the tool writing papers and solving coding problems .
One video shows a student copying a multiple choice exam and pasting it into the tool with the caption saying: “I don’t know about y’all but ima just have Chat GPT take my finals. Have fun studying.”
Kalley Huang is a technology reporting fellow based in San Francisco. She graduated from the University of North Carolina at Chapel Hill.