

Brief Bioinform. 2018 Nov; 19(6)

Deep learning for healthcare: review, opportunities and challenges

Riccardo Miotto

1 Institute for Next Generation Healthcare, Department of Genetics and Genomic Sciences at the Icahn School of Medicine at Mount Sinai, New York, NY

2 Division of Health Informatics, Department of Healthcare Policy and Research at Weill Cornell Medicine at Cornell University, New York, NY

Shuang Wang

3 Department of Biomedical Informatics at the University of California San Diego, La Jolla, CA

Xiaoqian Jiang

Joel T. Dudley

4 Institute for Next Generation Healthcare and Department of Genetics and Genomic Sciences at the Icahn School of Medicine at Mount Sinai, New York, NY

Gaining knowledge and actionable insights from complex, high-dimensional and heterogeneous biomedical data remains a key challenge in transforming health care. Various types of data have been emerging in modern biomedical research, including electronic health records, imaging, -omics, sensor data and text, which are complex, heterogeneous, poorly annotated and generally unstructured. Traditional data mining and statistical learning approaches typically need to first perform feature engineering to obtain effective and more robust features from those data, and then build prediction or clustering models on top of them. Both steps are challenging when the data are complicated and sufficient domain knowledge is lacking. The latest advances in deep learning technologies provide new, effective paradigms to obtain end-to-end learning models from complex data. In this article, we review the recent literature on applying deep learning technologies to advance the health care domain. Based on the analyzed work, we suggest that deep learning approaches could be the vehicle for translating big biomedical data into improved human health. However, we also note limitations and the need for improved methods development and applications, especially in terms of ease of understanding for domain experts and citizen scientists. We discuss such challenges and suggest developing holistic, meaningfully interpretable architectures to bridge deep learning models and human interpretability.

Introduction

Health care is entering a new era in which abundant biomedical data play increasingly important roles. In this context, for example, precision medicine attempts to ‘ensure that the right treatment is delivered to the right patient at the right time’ by taking into account several aspects of a patient's data, including variability in molecular traits, environment, electronic health records (EHRs) and lifestyle [ 1–3 ].

The wide availability of biomedical data brings tremendous opportunities and challenges to health care research. In particular, exploring the associations among all the different pieces of information in these data sets is fundamental to developing reliable medical tools based on data-driven approaches and machine learning. To this aim, previous work has tried to link multiple data sources to build joint knowledge bases that could be used for predictive analysis and discovery [ 4–6 ]. Although existing models demonstrate great promise (e.g. [ 7–11 ]), predictive tools based on machine learning techniques have not been widely applied in medicine [ 12 ]. In fact, there remain many challenges in making full use of biomedical data, owing to its high dimensionality, heterogeneity, temporal dependency, sparsity and irregularity [ 13–15 ]. These challenges are further complicated by the various medical ontologies used to generalize the data (e.g. Systematized Nomenclature of Medicine-Clinical Terms (SNOMED-CT) [ 16 ], Unified Medical Language System (UMLS) [ 17 ], International Classification of Disease-9th version (ICD-9) [ 18 ]), which often contain conflicts and inconsistencies [ 19 ]. Sometimes, the same clinical phenotype is also expressed in different ways across the data. For example, in EHRs, a patient diagnosed with ‘type 2 diabetes mellitus’ can be identified by a laboratory value of hemoglobin A1C >7.0, the presence of the 250.00 ICD-9 code, ‘type 2 diabetes mellitus’ mentioned in the free-text clinical notes and so on. Consequently, it is nontrivial to harmonize all these medical concepts to build a higher-level semantic structure and understand their correlations [ 6 , 20 ].

A common approach in biomedical research is to have a domain expert specify the phenotypes to use in an ad hoc manner. However, supervised definition of the feature space scales poorly and misses opportunities to discover novel patterns. Alternatively, representation learning methods allow the representations needed for prediction to be discovered automatically from the raw data [ 21 , 22 ]. Deep learning methods are representation-learning algorithms with multiple levels of representation, obtained by composing simple but nonlinear modules that each transform the representation at one level (starting with the raw input) into a representation at a higher, slightly more abstract level [ 23 ]. Deep learning models have demonstrated great performance and potential in computer vision, speech recognition and natural language processing tasks [ 24–27 ].

Given their demonstrated performance in different domains and rapid methodological progress, deep learning paradigms introduce exciting new opportunities for biomedical informatics. Efforts to apply deep learning methods to health care are already planned or underway. For example, Google DeepMind has announced plans to apply its expertise to health care [ 28 ], and Enlitic is using deep learning to spot health problems on X-rays and Computed Tomography (CT) scans [ 29 ].

However, deep learning approaches have not been extensively evaluated for the broad range of medical problems that could benefit from their capabilities. Many aspects of deep learning could be helpful in health care, such as its superior performance, its end-to-end learning scheme with integrated feature learning and its capability to handle complex and multi-modality data. To accelerate these efforts, the deep learning research field as a whole must address several challenges relating to the characteristics of health care data (i.e. sparse, noisy, heterogeneous, time-dependent), as well as the need for improved methods and tools that enable deep learning to interface with health care information workflows and clinical decision support.

In this article, we discuss recent and forthcoming applications of deep learning in medicine, highlighting the key aspects that could significantly impact health care. We do not aim to provide a comprehensive background on technical details (see e.g. [ 21 , 30–32 ]) or general applications of deep learning (see e.g. [ 23 ]). Instead, we focus on biomedical data only, in particular data originating from clinical imaging, EHRs, genomes and wearable devices. While additional sources of information, such as the metabolome, the antibodyome and other omics information, are expected to be valuable for health monitoring, deep learning has not yet been significantly used in these domains. Thus, in the following, we briefly introduce the general deep learning framework, review some of its applications in the medical domain and discuss the opportunities, challenges and applications related to these methods when used in the context of precision medicine and next-generation health care.

Deep learning framework

Machine learning is a general-purpose method of artificial intelligence that can learn relationships from the data without the need to define them a priori [ 33 ]. The major appeal is the ability to derive predictive models without a need for strong assumptions about the underlying mechanisms, which are usually unknown or insufficiently defined [ 34 ]. The typical machine learning workflow involves four steps: data harmonization, representation learning, model fitting and evaluation [ 35 ]. For decades, constructing a machine learning system required careful engineering and domain expertise to transform the raw data into a suitable internal representation from which the learning subsystem, often a classifier, could detect patterns in the data set. Conventional techniques are composed of a single, often linear, transformation of the input space and are limited in their ability to process natural data in their raw form [ 21 ].

Deep learning differs from traditional machine learning in how representations are learned from the raw data. Deep learning allows computational models composed of multiple processing layers based on neural networks to learn representations of data with multiple levels of abstraction [ 23 ]. The major differences between deep learning and traditional artificial neural networks (ANNs) are the number of hidden layers, their connections and the capability to learn meaningful abstractions of the inputs. Traditional ANNs are usually limited to three layers and are trained to obtain supervised representations that are optimized only for the specific task and are usually not generalizable [ 36 ]. By contrast, every layer of a deep learning system produces a representation of the observed patterns based on the data it receives as inputs from the layer below, by optimizing a local unsupervised criterion [ 37 ]. The key aspect of deep learning is that these layers of features are not designed by human engineers, but are learned from data using a general-purpose learning procedure. Figure 1 illustrates these differences at a high level: deep neural networks process the inputs in a layer-wise nonlinear manner to pre-train (initialize) the nodes in subsequent hidden layers, learning ‘deep structures’ and representations that are generalizable. These representations are then fed into a supervised layer, and the whole network is fine-tuned using the backpropagation algorithm toward representations that are optimized for the specific end-to-end task.
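To make the layer-wise picture concrete, the sketch below passes a toy input through a stack of nonlinear layers and a supervised output layer. The weights are random placeholders here; in a real network, layer-wise pre-training and backpropagation would learn them.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, w, b):
    """One nonlinear layer: affine transform followed by tanh."""
    return np.tanh(x @ w + b)

# Toy input: a batch of 4 samples with 10 raw features.
x = rng.normal(size=(4, 10))

# Three stacked hidden layers, each re-representing the layer below
# (weights are random here; in practice they are learned from data).
sizes = [10, 8, 6, 4]
params = [(rng.normal(scale=0.1, size=(m, n)), np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]

h = x
for w, b in params:
    h = layer(h, w, b)     # each level is a slightly more abstract representation

# A final supervised layer maps the top representation to a prediction;
# backpropagation would then fine-tune the whole stack end to end.
w_out = rng.normal(scale=0.1, size=(4, 1))
y_hat = 1.0 / (1.0 + np.exp(-(h @ w_out)))   # sigmoid output
print(h.shape, y_hat.shape)                  # (4, 4) (4, 1)
```

Each pass through the loop corresponds to one level of representation in Figure 1; the supervised sigmoid layer plays the role of the task-specific output.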

Figure 1. Comparison between ANNs and deep architectures. While ANNs are usually composed of three layers and one transformation toward the final outputs, deep learning architectures consist of several layers of neural networks. Layer-wise unsupervised pre-training allows deep networks to be tuned efficiently and to extract deep structure from the inputs, which serves as higher-level features used to obtain better predictions.

The unsupervised pre-training breakthrough [ 23 , 38 ], new methods to prevent overfitting [ 39 ], the use of general-purpose graphics processing units to speed up computation and the development of high-level modules to easily build neural networks (e.g. Theano [ 40 ], Caffe [ 41 ], TensorFlow [ 42 ]) allowed deep models to become established as state-of-the-art solutions for several tasks. In fact, deep learning turned out to be good at discovering intricate structure in high-dimensional data and obtained remarkable performance for object detection in images [ 43 , 44 ], speech recognition [ 45 ], natural language understanding [ 46 ] and translation [ 47 ]. Relevant clinic-ready successes have been obtained in health care as well (e.g. detection of diabetic retinopathy in retinal fundus photographs [ 48 ], classification of skin cancer [ 49 ], prediction of the sequence specificities of DNA- and RNA-binding proteins [ 50 ]), paving the way toward a potential new generation of intelligent, deep learning-based tools for real-world medical care.

Literature review

The use of deep learning for medicine is recent and not thoroughly explored. In the next sections, we will review some of the main recent literature (i.e. 32 papers) related to applications of deep models to clinical imaging, EHRs, genomics and wearable device data.

Table 1 summarizes all the papers mentioned in this literature review, in particular highlighting the type of networks and the medical data considered. To the best of our knowledge, there are no studies using deep learning to combine all of these data sources, or even a subset of them (e.g. only EHRs and clinical images, only EHRs and genomics), in a joint representation for medical analysis and prediction. A few preliminary studies evaluated the combined use of EHRs and genomics (e.g. see [ 9 , 80 ]), though without applying deep learning; for this reason, they were not considered relevant to this review. The deep architectures applied to the health care domain have been mostly based on convolutional neural networks (CNNs) [ 81 ], recurrent neural networks (RNNs) [ 82 ], restricted Boltzmann machines (RBMs) [ 83 ] and autoencoders (AEs) [ 84 ]. Table 2 briefly reviews these models and provides the main ideas behind their structures.

Summary of the articles described in the literature review, highlighting the deep learning architecture applied and the medical domain considered

We report 32 different papers using deep learning on clinical images, EHRs, genomics and mobile data. As can be seen, most of the papers apply CNNs and AEs, regardless of the medical domain. To the best of our knowledge, no works in the literature jointly process these different types of data (e.g. all of them, only EHRs and clinical images, only EHRs and mobile data) using deep learning for medical intelligence and prediction.

RNN = recurrent neural network; CNN = convolutional neural network; RBM = restricted Boltzmann machine; AE = autoencoder; LSTM = long short-term memory; GRU = gated recurrent unit.

Review of the neural networks shaping the deep learning architectures applied to the health care domain in the literature

Clinical imaging

Following its success in computer vision, the first applications of deep learning to clinical data were on image processing, especially the analysis of brain Magnetic Resonance Imaging (MRI) scans to predict Alzheimer disease and its variations [ 51 , 52 ]. In other medical domains, CNNs were used to infer a hierarchical representation of low-field knee MRI scans to automatically segment cartilage and predict the risk of osteoarthritis [ 53 ]. Despite using 2D images, this approach obtained better results than a state-of-the-art method using manually selected 3D multi-scale features. Deep learning was also applied to segment multiple sclerosis lesions in multi-channel 3D MRI [ 54 ] and for the differential diagnosis of benign and malignant breast nodules from ultrasound images [ 55 ]. More recently, Gulshan et al . [ 48 ] used CNNs to identify diabetic retinopathy in retinal fundus photographs, obtaining high sensitivity and specificity over about 10 000 test images with respect to certified ophthalmologist annotations. CNNs also obtained performance on par with 21 board-certified dermatologists in classifying biopsy-proven clinical images of different types of skin cancer (keratinocyte carcinomas versus benign seborrheic keratoses, and malignant melanomas versus benign nevi) over a large data set of 130 000 images (1942 biopsy-labeled test images) [ 49 ].

Electronic health records

More recently, deep learning has been applied to process aggregated EHRs, including both structured (e.g. diagnoses, medications, laboratory tests) and unstructured (e.g. free-text clinical notes) data. Most of this literature processes the EHRs of a health care system with a deep architecture for a specific, usually supervised, predictive clinical task. In particular, a common approach is to show that deep learning obtains better results than conventional machine learning models with respect to certain metrics, such as the Area Under the Receiver Operating Characteristic Curve (AUC-ROC), accuracy and F-score [ 91 ]. In this scenario, while most papers present end-to-end supervised networks, some works also propose unsupervised models to derive latent patient representations, which are then evaluated using shallow classifiers (e.g. random forests, logistic regression).
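As a quick illustration of one of these metrics, the AUC-ROC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one (the Mann-Whitney view). The labels and scores below are made up, and ties are ignored in this simplified sketch:

```python
import numpy as np

y     = np.array([0, 0, 1, 0, 1, 1])                  # hypothetical true labels
score = np.array([0.1, 0.4, 0.35, 0.8, 0.65, 0.9])    # hypothetical model scores

# AUC-ROC as the fraction of positive/negative pairs ranked correctly.
pos, neg = score[y == 1], score[y == 0]
auc = np.mean([p > n for p in pos for n in neg])
print(auc)   # 6 of 9 pairs ranked correctly -> 0.666...
```

Library implementations (e.g. in scikit-learn) handle ties and large inputs more carefully, but the quantity being estimated is the same.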

Several works applied deep learning to predict diseases from the patient clinical status. Liu et al . [ 56 ] used a four-layer CNN to predict congestive heart failure and chronic obstructive pulmonary disease and showed significant advantages over the baselines. RNNs with long short-term memory (LSTM) hidden units, pooling and word embedding were used in DeepCare [ 58 ], an end-to-end deep dynamic network that infers current illness states and predicts future medical outcomes. The authors also proposed to moderate the LSTM unit with a decay effect to handle irregularly timed events (which are typical in longitudinal EHRs). Moreover, they incorporated medical interventions in the model to dynamically shape the predictions. DeepCare was evaluated for disease progression modeling, intervention recommendation and future risk prediction on diabetes and mental health patient cohorts. RNNs with gated recurrent units (GRUs) were used by Choi et al . [ 65 ] to develop Doctor AI, an end-to-end model that uses patient history to predict diagnoses and medications for subsequent encounters. The evaluation showed significantly higher recall than shallow baselines and good generalizability, with the resulting model adapting from one institution to another without losing substantial accuracy. By contrast, Miotto et al . [ 59 ] proposed to learn deep patient representations from the EHRs using a three-layer Stacked Denoising Autoencoder (SDA). They applied this novel representation to disease risk prediction using random forests as classifiers. The evaluation was performed on 76 214 patients comprising 78 diseases from diverse clinical domains and temporal windows (up to 1 year). The results showed that the deep representation leads to significantly better predictions than using raw EHRs or conventional representation learning algorithms (e.g. Principal Component Analysis (PCA), k-means). Moreover, they also showed that results significantly improve when adding a logistic regression layer on top of the last AE to fine-tune the entire supervised network [ 60 ]. Similarly, Liang et al . [ 61 ] used RBMs to learn representations from EHRs that revealed novel concepts and demonstrated better prediction accuracy on a number of diseases.
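As a rough illustration of this unsupervised-representation idea, the sketch below trains a single tied-weight denoising AE layer on synthetic binary records and extracts the latent encoding that a shallow classifier would then consume. This is a minimal toy, not the three-layer SDA of the cited work, and all data are random placeholders:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy "patients": 200 records, 30 binary clinical descriptors
# (hypothetical stand-ins for EHR-derived features; not real data).
X = (rng.random((200, 30)) < 0.15).astype(float)
n, n_in, n_hid, lr = len(X), 30, 10, 0.5

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W = rng.normal(scale=0.1, size=(n_in, n_hid))
b_h, b_v = np.zeros(n_hid), np.zeros(n_in)

for _ in range(200):
    # Denoising: mask ~20% of the inputs, then reconstruct the *clean* record.
    Xc = X * (rng.random(X.shape) > 0.2)
    H = sigmoid(Xc @ W + b_h)            # encode: latent patient representation
    R = sigmoid(H @ W.T + b_v)           # decode with tied weights
    dv = (R - X) * R * (1 - R)           # output delta (squared-error loss)
    dh = (dv @ W) * H * (1 - H)          # hidden delta
    W  -= lr * (dv.T @ H + Xc.T @ dh) / n
    b_v -= lr * dv.mean(axis=0)
    b_h -= lr * dh.mean(axis=0)

# The learned encoding is what a downstream shallow classifier
# (e.g. a random forest or logistic regression) would be trained on.
Z = sigmoid(X @ W + b_h)
print(Z.shape)                           # (200, 10)
```

Stacking several such layers and fine-tuning with a supervised head is what turns this building block into the SDA-style architecture discussed above.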

Deep learning was also applied to model continuous time signals, such as laboratory results, toward the automatic identification of specific phenotypes. For example, Lipton et al . [ 57 ] used RNNs with LSTM to recognize patterns in multivariate time series of clinical measurements. Specifically, they trained a model to classify 128 diagnoses from 13 frequently but irregularly sampled clinical measurements from patients in pediatric intensive care units. The results showed significant improvements with respect to several strong baselines, including a multilayer perceptron trained on hand-engineered features. Che et al . [ 63 ] used SDAs regularized with prior knowledge based on ICD-9 codes for detecting characteristic patterns of physiology in clinical time series. Lasko et al . [ 64 ] used a two-layer stacked AE (without regularization) to model longitudinal sequences of serum uric acid measurements to distinguish the uric-acid signatures of gout and acute leukemia. Razavian et al . [ 67 ] evaluated CNNs and RNNs with LSTM units to predict disease onset from laboratory test measures alone, showing better performance than logistic regression with hand-engineered, clinically relevant features.
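The sketch below illustrates the general recipe these time-series models share: a recurrent cell summarizes a multivariate sequence of clinical measurements into a hidden state, and a multi-label sigmoid head scores each diagnosis. For brevity it uses a vanilla RNN cell with random, untrained weights rather than an LSTM, and the measurements are random placeholders:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy multivariate series: 16 time steps of 13 measurements
# (13 mirrors the cited study's measurement count; values are random).
T, n_feat, n_hid, n_dx = 16, 13, 8, 4
x = rng.normal(size=(T, n_feat))

Wx = rng.normal(scale=0.1, size=(n_feat, n_hid))
Wh = rng.normal(scale=0.1, size=(n_hid, n_hid))
Wo = rng.normal(scale=0.1, size=(n_hid, n_dx))
h = np.zeros(n_hid)

# A vanilla RNN cell: the hidden state summarizes the series so far.
# (An LSTM adds gating to retain long-range information; omitted here.)
for t in range(T):
    h = np.tanh(x[t] @ Wx + h @ Wh)

# Multi-label diagnosis head: one independent sigmoid per diagnosis code.
p = 1.0 / (1.0 + np.exp(-(h @ Wo)))
print(p.shape)   # (4,)
```

Training would backpropagate a multi-label cross-entropy loss through time; the forward pass above is the part all of these architectures have in common.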

Neural language deep models were also applied to EHRs, in particular to learn embedded representations of medical concepts, such as diseases, medications and laboratory tests, that could be used for analysis and prediction [ 92 ]. As an example, Tran et al . [ 62 ] used RBMs to learn abstractions of ICD-10 codes on a cohort of 7578 mental health patients to predict suicide risk. A deep architecture based on RNNs also obtained promising results in removing protected health information from clinical notes, enabling the automatic de-identification of free-text patient summaries [ 68 ].

The prediction of unplanned patient readmissions after discharge recently received attention as well. In this domain, Nguyen et al . [ 66 ] proposed Deepr, an end-to-end architecture based on CNNs, which detects and combines clinical motifs in the longitudinal patient EHRs to stratify medical risks. Deepr performed well in predicting readmission within 6 months and was able to detect meaningful and interpretable clinical patterns.

Genomics

Deep learning in high-throughput biology is used to capture the internal structure of increasingly large and high-dimensional data sets (e.g. DNA sequencing, RNA measurements). Deep models enable the discovery of high-level features, improving performance over traditional models, increasing interpretability and providing additional understanding about the structure of the biological data. Several works have been proposed in the literature. Here we review the general ideas and refer the reader to [ 93–96 ] for more comprehensive reviews.

The first applications of neural networks in genomics replaced conventional machine learning with deep architectures, without changing the input features. For example, Xiong et al . [ 97 ] used a fully connected feed-forward neural network to predict the splicing activity of individual exons. The model was trained using >1000 predefined features extracted from the candidate exon and adjacent introns. This method obtained higher prediction accuracy of splicing activity compared with simpler approaches, and was able to identify rare mutations implicated in splicing misregulation.

More recent works apply CNNs directly to the raw DNA sequence, without the need to define features a priori (e.g. [ 50 , 69 , 70 ]). CNNs use fewer parameters than a fully connected network by computing convolutions on small regions of the input space and by sharing parameters between regions. This allows training the models on larger DNA sequence windows, improving the detection of relevant patterns. For example, Alipanahi et al . [ 50 ] proposed DeepBind, a deep architecture based on CNNs that predicts specificities of DNA- and RNA-binding proteins. In the reported experiments, DeepBind was able to recover known and novel sequence motifs, quantify the effect of sequence alterations and identify functional single nucleotide variations (SNVs). Zhou and Troyanskaya [ 69 ] used CNNs to predict chromatin marks from DNA sequence. Similarly, Kelley et al . [ 70 ] developed Basset, an open-source framework to predict DNase I hypersensitivity across multiple cell types and to quantify the effect of SNVs on chromatin accessibility. CNNs were also used by Angermueller et al . [ 71 ] to predict DNA methylation states in single-cell bisulfite sequencing studies and, more recently, by Koh et al . [ 72 ] to denoise genome-wide chromatin immunoprecipitation followed by sequencing data to obtain more accurate prevalence estimates for different chromatin marks.
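The core operations these models share, one-hot encoding the sequence and sliding motif-detector filters along it, can be sketched in a few lines (the sequence and motif below are made up for illustration; a trained CNN learns many such filters from data):

```python
import numpy as np

# One-hot encode a DNA sequence: the standard CNN input representation
# used by models such as DeepBind and Basset.
BASES = "ACGT"

def one_hot(seq):
    m = np.zeros((len(seq), 4))
    for i, ch in enumerate(seq):
        m[i, BASES.index(ch)] = 1.0
    return m

seq = "TTACGTAATTTT"
x = one_hot(seq)                       # shape (12, 4)

# A single convolutional filter acting as a detector for the made-up
# motif "ACGTAA"; its weights are just the motif's one-hot pattern.
motif = one_hot("ACGTAA")              # shape (6, 4)

# Valid 1-D convolution: slide the filter along the sequence; the score
# at each position counts how many bases match the motif.
scores = np.array([np.sum(x[i:i + 6] * motif) for i in range(len(seq) - 5)])
best = int(scores.argmax())
print(best, scores[best])              # motif found at position 2, score 6.0
```

In a real model, many filters are learned jointly, a nonlinearity and pooling follow the convolution, and deeper layers combine motif hits into higher-level predictions.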

While CNNs are the most widely used architectures to extract features from fixed-size DNA sequence windows, other deep architectures have been proposed as well. For example, sparse AEs were applied to classify cancer cases from gene expression profiles or to predict protein backbones [ 74 ]. Deep neural networks also enabled researchers to significantly improve the state-of-the-art drug discovery pipeline for genomic medicine [ 98 ].

Mobile

Sensor-equipped smartphones and wearables are transforming a variety of mobile apps, including health monitoring [ 99 ]. As the difference between consumer health wearables and medical devices begins to soften, it is now possible for a single wearable device to monitor a range of medical risk factors. Potentially, these devices could give patients direct access to personal analytics that can contribute to their health, facilitate preventive care and aid in the management of ongoing illness [ 100 ]. Deep learning is considered to be a key element in analyzing this new type of data. However, only a few recent works have used deep models in the health care sensing domain, mostly owing to hardware limitations. In fact, running an efficient and reliable deep architecture on a mobile device to process noisy and complex sensor data is still a challenging task that is likely to drain the device's resources [ 101 ]. Several studies investigated solutions to overcome such hardware limitations. As an example, Lane and Georgiev [ 102 ] proposed a low-power deep neural network inference engine that exploits both the Central Processing Unit (CPU) and the Digital Signal Processor (DSP) of the mobile device, without any major overburdening of the hardware. They also proposed DeepX, a software accelerator capable of lowering the device resources required by deep learning, which currently act as a severe bottleneck to mobile adoption. This architecture enabled large-scale deep learning to execute efficiently on mobile devices and significantly outperformed cloud-based off-loading solutions [ 103 ].

We did not find any relevant study applying deep learning to commercial wearable devices for health monitoring. However, a few works processed data from phones and medical monitoring devices. In particular, relevant studies based on deep learning were done on Human Activity Recognition (HAR). While not directly exploring a medical application, many studies argue that the accurate predictions obtained by deep models on HAR can benefit clinical applications as well. In the health care domain, Hammerla et al . [ 75 ] evaluated CNNs and RNNs with LSTM to predict the freezing of gait in Parkinson disease (PD) patients. Freezing is a common motor complication in PD, in which affected individuals struggle to initiate movements such as walking. Results based on accelerometer data from above the ankle, above the knee and on the trunk of 10 patients showed that RNNs obtained the best results, with a significantly large improvement over the other models, including CNNs. While the size of this data set was small, this study highlights the potential of deep learning in processing activity recognition measures for clinical use. Zhu et al . [ 76 ] obtained promising results in predicting Energy Expenditure (EE) from triaxial accelerometer and heart rate sensor data during ambulatory activities. EE is considered important in tracking personal activity and preventing chronic diseases such as obesity, diabetes and cardiovascular disease. They used CNNs and significantly improved performance over regression and a shallow neural network.

In other clinical domains, deep learning, in particular CNNs and RBMs, improved over conventional machine learning in analyzing portable neurophysiological signals such as Electroencephalogram, Local Field Potentials and Photoplethysmography [ 77 , 78 ]. By contrast, Sathyanarayana et al . [ 79 ] applied deep learning to predict poor or good sleep using actigraphy measurements of patients' physical activity during awake time. In particular, using a data set of 92 adolescents and one full week of monitored data, they showed that CNNs obtained the highest specificity and sensitivity, with results 46% better than logistic regression.

Challenges and opportunities

Despite the promising results obtained using deep architectures, there remain several unsolved challenges facing the clinical application of deep learning to health care. In particular, we highlight the following key issues:

  • Data volume : Deep learning comprises a set of computationally intensive models. One typical example is the fully connected multi-layer neural network, where large numbers of network parameters need to be estimated properly. The basis for achieving this is the availability of huge amounts of data. In fact, while there are no hard guidelines about the minimum number of training samples, a general rule of thumb is to have at least about 10× as many samples as parameters in the network. This is also one of the reasons why deep learning is so successful in domains where huge amounts of data can be easily collected (e.g. computer vision, speech, natural language). However, health care is a different domain: there are only approximately 7.5 billion people in the world (as of September 2016), a large portion of whom lack access to primary health care. Consequently, we cannot obtain as many patients as we might want to train a comprehensive deep learning model. Moreover, understanding diseases and their variability is much more complicated than other tasks, such as image or speech recognition. Consequently, from a big data perspective, the amount of medical data needed to train an effective and robust deep learning model is much greater than for other media.
  • Data quality : Unlike other domains where the data are clean and well-structured, health care data are highly heterogeneous, ambiguous, noisy and incomplete. Training a good deep learning model with such massive and varied data sets is challenging and needs to address several issues, such as data sparsity, redundancy and missing values.
  • Temporality : Diseases are always progressing and changing over time in a nondeterministic way. However, many existing deep learning models, including those already proposed in the medical domain, assume static vector-based inputs, which cannot handle the time factor in a natural way. Designing deep learning approaches that can handle temporal health care data is an important aspect that will require the development of novel solutions.
  • Domain complexity : Unlike other application domains (e.g. image and speech analysis), the problems in biomedicine and health care are more complicated. Diseases are highly heterogeneous and, for most of them, there is still no complete knowledge of their causes and how they progress. Moreover, the number of patients is usually limited in a practical clinical scenario, and we cannot recruit as many patients as we might want.
  • Interpretability : Although deep learning models have been successful in quite a few application domains, they are often treated as black boxes. While this might not be a problem in more deterministic domains such as image annotation (because the end user can objectively validate the tags assigned to the images), in health care not only is quantitative algorithmic performance important, but so is the reason why the algorithm works. In fact, such model interpretability (i.e. identifying which phenotypes drive the predictions) is crucial for convincing medical professionals to act on the recommendations of the predictive system (e.g. prescription of a specific medication, potential high risk of developing a certain disease).
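The 10× rule of thumb mentioned under Data volume can be made concrete with a quick parameter count. The layer sizes below are purely illustrative, not taken from any cited study:

```python
# Back-of-the-envelope check of the "10x samples per parameter" rule of
# thumb for a modest fully connected network on high-dimensional inputs.
layer_sizes = [2000, 500, 200, 100, 1]   # input -> hidden layers -> output

# Each layer contributes (fan_in * fan_out) weights plus fan_out biases.
params = sum(m * n + n for m, n in zip(layer_sizes[:-1], layer_sizes[1:]))
print(params)        # 1 120 901 parameters
print(10 * params)   # ~11.2 million samples suggested by the rule of thumb
```

Even this modest network implies millions of training samples under the rule of thumb, which is orders of magnitude beyond most clinical cohorts.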

All these challenges introduce several opportunities and research possibilities for improving the field. With them in mind, we point out the following directions, which we believe will be promising for the future of deep learning in health care.

  • Feature enrichment : Because the number of patients available is limited, we should capture as many features as possible to characterize each patient and find novel methods to jointly process them. The data sources for generating those features need to include, but not be limited to, EHRs, social media (e.g. there is prior research leveraging patient-reported information on social media for pharmacovigilance [ 104 , 105 ]), wearable devices, environments, surveys, online communities, genome profiles, other -omics data such as the proteome and so on. The effective integration of such highly heterogeneous data, and how to use them in a deep learning model, is an important and challenging research topic. In fact, to the best of our knowledge, the literature does not provide any study that attempts to combine different types of medical data sources using deep learning. A potential solution could exploit the hierarchical nature of deep learning: process every data source separately with the appropriate deep model, then stack the resulting representations in a joint model toward a holistic abstraction of the patient data (e.g. using layers of AEs or deep Bayesian networks).
  • Federated inference : Each clinical institution possesses its own patient population. Building a deep learning model that leverages patients from different sites without leaking their sensitive information becomes a crucial problem in this setting. Consequently, learning deep models in this federated setting in a secure way will be another important research topic, which will interface with other mathematical domains such as cryptography (e.g. homomorphic encryption [ 106 ] and secure multiparty computation [ 107 ]).
  • Model privacy : Privacy is an important concern in scaling up deep learning (e.g. through cloud computing services). In fact, a recent work by Tramèr et al . [ 108 ] demonstrated the vulnerability of Machine Learning (ML)-as-a-service (i.e. ‘predictive analytics’) on a set of common models, including deep neural networks. The attack abides by all authentication and access-control mechanisms but infers parameters or training data through exposed Application Program Interfaces (APIs), breaking both model and personal privacy. This issue is well known to the privacy community, and researchers have developed a principled framework called ‘differential privacy’ [ 109 , 110 ] to ensure the indistinguishability of individual samples in training data through their functional outputs [ 111 ]. However, naive approaches might render outputs useless or fail to provide sufficient protection [ 22 ], which makes the development of practically useful differential privacy solutions nontrivial. For example, Chaudhuri et al . [ 112 ] developed differentially private methods to protect the parameters trained for the logistic regression model. Preserving the privacy of deep learning models is even more challenging, as there are more parameters to be safeguarded; several recent works have pushed the front in this area [ 113–115 ]. Yet, considering all the personal information likely to be processed by deep models in clinical applications, the deployment of intelligent tools for next-generation health care needs to consider these risks and attempt to implement a differential privacy standard.
  • Incorporating expert knowledge : Existing expert knowledge is invaluable for health care problems. Because of the limited amount of medical data and their quality issues, incorporating expert knowledge into the deep learning process, to guide it toward the right direction, is an important research topic. For example, online medical encyclopedias and PubMed abstracts could be mined to extract reliable content that can be included in the deep architecture to improve the overall performance of the systems. Also, semi-supervised learning, an effective scheme for learning from large amounts of unlabeled samples with only a few labeled ones, holds great potential because of its capability to leverage both labeled samples (which encode the knowledge) and unlabeled ones [ 105 ].
  • Temporal modeling : Considering that the time factor is important in all kinds of health care-related problems, in particular those involving EHRs and monitoring devices, training a time-sensitive deep learning model is critical for a better understanding of patient conditions and for providing timely clinical decision support. Thus, temporal deep learning is crucial for solving health care problems (as already shown in some of the early studies reported in the literature review). To this aim, we expect that RNNs, as well as architectures coupled with memory (e.g. [ 86 ]) and attention mechanisms (e.g. [ 116 ]), will play a more significant role toward better clinical deep architectures.
  • Interpretable modeling : Model performance and interpretability are equally important for health care problems. Clinicians are unlikely to adopt a system they cannot understand. Deep learning models are popular because of their superior performance; yet, how to explain the results obtained by these models, and how to make them more understandable, is of key importance for developing trustable and reliable systems. In our opinion, research directions will include both algorithms to explain the deep models (i.e. what drives the hidden units of the networks to turn on/off along the process; see e.g. [ 117 ]) and methods to support the networks with existing tools that explain the predictions of data-driven systems (e.g. see [ 118 ]).
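
The hierarchical solution suggested under feature enrichment, separate encoders per data source whose outputs are stacked into a joint representation, can be sketched as a forward pass. The layer sizes, modality names and random (untrained) weights below are illustrative assumptions, not a tested architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(dim_in, dim_out):
    """Return a single dense encoding layer with a tanh nonlinearity."""
    W = rng.standard_normal((dim_in, dim_out)) * 0.1
    b = np.zeros(dim_out)
    return lambda x: np.tanh(x @ W + b)

# Per-modality encoders (dimensions are toy values).
enc_ehr    = encoder(dim_in=200, dim_out=32)   # structured EHR codes
enc_genome = encoder(dim_in=500, dim_out=32)   # genomic profile
enc_wear   = encoder(dim_in=50,  dim_out=32)   # wearable summary stats

# Joint encoder stacked on the concatenated modality representations.
enc_joint = encoder(dim_in=96, dim_out=16)

def patient_representation(ehr, genome, wearable):
    parts = [enc_ehr(ehr), enc_genome(genome), enc_wear(wearable)]
    return enc_joint(np.concatenate(parts))

z = patient_representation(rng.standard_normal(200),
                           rng.standard_normal(500),
                           rng.standard_normal(50))
print(z.shape)   # one holistic 16-dimensional patient vector
```

In practice each encoder would be a pretrained deep model (e.g. an AE per modality), but the stacking pattern is the point of the sketch.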
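
For the model-privacy direction, the core mechanism of differentially private training can be illustrated in a few lines: clip each per-example gradient to bound its sensitivity, then add calibrated Gaussian noise before the update. This is a minimal sketch on a logistic-regression gradient under assumed hyperparameters; it omits the privacy accounting a real deployment would need:

```python
import numpy as np

rng = np.random.default_rng(1)

def private_gradient(X, y, w, clip_norm=1.0, noise_mult=1.1):
    """Average of norm-clipped per-example gradients plus Gaussian noise."""
    grads = []
    for xi, yi in zip(X, y):
        p = 1.0 / (1.0 + np.exp(-xi @ w))             # logistic prediction
        g = (p - yi) * xi                             # per-example gradient
        g = g / max(1.0, np.linalg.norm(g) / clip_norm)  # clip to clip_norm
        grads.append(g)
    noise = rng.normal(0.0, noise_mult * clip_norm / len(X), size=w.shape)
    return np.mean(grads, axis=0) + noise

# Toy training loop on fabricated data.
X = rng.standard_normal((32, 5))
y = rng.integers(0, 2, size=32)
w = np.zeros(5)
for _ in range(100):
    w -= 0.5 * private_gradient(X, y, w)
print(w.shape)
```

The same clip-then-noise pattern extends to deep networks, where the larger parameter count is exactly what makes the privacy/utility trade-off harder, as noted above.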
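
The temporal-modeling direction rests on recurrent cells that carry a hidden state across visits. A minimal vanilla-RNN forward pass over one patient's visit sequence, with untrained weights and illustrative sizes (an LSTM adds gating on top of this recurrence), might look like:

```python
import numpy as np

rng = np.random.default_rng(2)

n_features, n_hidden = 10, 8
Wx = rng.standard_normal((n_features, n_hidden)) * 0.1   # input weights
Wh = rng.standard_normal((n_hidden, n_hidden)) * 0.1     # recurrent weights
b = np.zeros(n_hidden)

def rnn_forward(visit_sequence):
    """Fold a (T, n_features) sequence of visits into one hidden state."""
    h = np.zeros(n_hidden)
    for x_t in visit_sequence:
        h = np.tanh(x_t @ Wx + h @ Wh + b)
    return h

# One patient: 6 visits, each summarized by 10 features.
patient = rng.standard_normal((6, n_features))
h_final = rnn_forward(patient)
print(h_final.shape)   # sequence-aware patient state
```

Because the state is updated visit by visit, the order and spacing of clinical events can influence the representation, which a static vector input cannot capture.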
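
For the interpretable-modeling direction, one simple, model-agnostic complement to the network-introspection methods cited above is permutation importance: shuffle one feature at a time and measure how much the prediction error grows. The model and features below are fabricated for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy predictive model: the risk score depends strongly on feature 0,
# weakly on feature 1, and not at all on feature 2.
def model(X):
    return 3.0 * X[:, 0] + 0.5 * X[:, 1]

X = rng.standard_normal((200, 3))
y = model(X)

def permutation_importance(predict, X, y):
    """Increase in mean squared error when each feature is shuffled."""
    base = np.mean((predict(X) - y) ** 2)
    scores = []
    for j in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])
        scores.append(np.mean((predict(Xp) - y) ** 2) - base)
    return np.array(scores)

imp = permutation_importance(model, X, y)
print(np.argmax(imp))   # the feature that drives the predictions
```

Applied to a clinical model, the per-feature scores would correspond to the phenotypes driving the predictions, which is the kind of evidence clinicians can inspect.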

Applications

Deep learning methods are powerful tools that complement traditional machine learning and allow computers to learn from data so that they can build smarter applications. These approaches have already been used in a number of applications, especially in computer vision and natural language processing. The results available in the literature illustrate the capabilities of deep learning for health care data analysis as well: processing medical data with multilayer neural networks has increased predictive power in several specific applications across different clinical domains. Additionally, because of their hierarchical learning structure, deep architectures have the potential to integrate diverse data sets across heterogeneous data types and to generalize better, given their focus on representation learning and not simply on classification accuracy.

Consequently, we believe that deep learning can open the way toward the next generation of predictive health care systems that can (i) scale to include many millions to billions of patient records and (ii) use a single, distributed patient representation to effectively support clinicians in their daily activities—rather than multiple systems working with different patient representations and data. Ideally, this representation would join all the different data sources, including EHRs, genomics, environment, wearables, social activities and so on, toward a holistic and comprehensive description of an individual status. In this scenario, the deep learning framework would be deployed into a health care platform (e.g. a hospital EHR system) and the models would be constantly updated to follow the changes in the patient population.

Such deep representations can then be used to leverage clinician activities in different domains and applications, such as disease risk prediction, personalized prescriptions, treatment recommendations, clinical trial recruitment as well as research and data analysis. As an example, Wang et al . recently won the Parkinson's Progression Markers Initiative data challenge on subtyping Parkinson's disease using a temporal deep learning approach [ 119 ]. Because Parkinson's disease is highly progressive, the traditional vector- or matrix-based approach may not be optimal: the entries in those vectors/matrices are typically aggregated over time, so disease progression patterns cannot be accurately captured. Consequently, the authors used an LSTM RNN model and identified three interesting subtypes of Parkinson's disease, each demonstrating a common disease progression trend. We believe this work shows the great potential of deep learning models in real-world health care problems and how they could lead to more reliable and robust automatic systems in the near future.
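
The subtyping workflow described above can be sketched in two steps: obtain a temporal representation per patient (here, stand-in random vectors take the place of learned LSTM states) and cluster those representations into subtypes. A toy k-means pass under these assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

# Stand-in for learned temporal patient representations: 90 patients
# drawn from 3 well-separated blobs, so the subtypes are recoverable.
blob_centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
reps = np.vstack([c + 0.3 * rng.standard_normal((30, 2)) for c in blob_centers])

def kmeans(X, k, iters=20):
    """Plain k-means, initialized here with one point per expected blob."""
    cent = X[[0, 30, 60]][:k]
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - cent) ** 2).sum(-1), axis=1)
        cent = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, cent

labels, cent = kmeans(reps, k=3)
print(sorted(set(labels.tolist())))   # three recovered subtypes
```

In the actual study the representations come from a trained LSTM rather than synthetic blobs; the sketch only shows how subtypes fall out of clustering a learned patient space.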

Last, more broadly, deep learning can serve as a guiding principle to organize both hypothesis-driven research and exploratory investigation in clinical domains (e.g. clustering, visualization of patient cohorts, stratification of disease populations). For this potential to be realized, statistical and medical tasks must be integrated at all levels, including study design, experiment planning, model building and refinement and data interpretation.

  • The fastest growing types of data in biomedical research, such as EHRs, imaging, -omics profiles and monitoring data, are complex, heterogeneous, poorly annotated and generally unstructured.
  • Early applications of deep learning to biomedical data showed effective opportunities to model, represent and learn from such complex and heterogeneous sources.
  • State-of-the-art deep learning approaches need to be improved in terms of data integration, interpretability, security and temporal modeling to be effectively applied to the clinical domain.
  • Deep learning can open the way toward the next generation of predictive health care systems, which can scale to include billions of patient records and rely on a single holistic patient representation to effectively support clinicians in their daily activities.
  • Deep learning can serve as a guiding principle to organize both hypothesis-driven research and exploratory investigation in clinical domains based on different sources of data.

This study was supported by the following grants from the National Institute of Health: National Human Genome Research Institute (R00-HG008175) to S.W.; and National Library of Medicine (R21-LM012060); National Institute of Biomedical Imaging & Bioengineering (U01EB023685); (R01GM118609) to X.J.; National Institute of Diabetes and Digestive and Kidney Diseases (R01-DK098242-03); National Cancer Institute (U54-CA189201-02); and National Center for Advancing Translational Sciences (UL1TR000067) Clinical and Translational Science Awards to J.T.D. This study was also supported by the National Science Foundation: Information and Intelligent Systems (1650723) to F.W. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Biographies

Riccardo Miotto , PhD, is a senior data scientist in the Institute for Next Generation Healthcare, Department of Genetics and Genomic Sciences at the Icahn School of Medicine at Mount Sinai, New York, NY.

Fei Wang , PhD, is an assistant professor in the Division of Health Informatics, Department of Healthcare Policy and Research at Weill Cornell Medicine at Cornell University, New York, NY.

Shuang Wang , PhD, is an assistant professor in the Department of Biomedical Informatics at the University of California San Diego, La Jolla, CA.

Xiaoqian Jiang is an assistant professor in the Department of Biomedical Informatics at the University of California San Diego, La Jolla, CA.

Joel T. Dudley , PhD, is the executive director of the Institute for Next Generation Healthcare and associate professor in the Department of Genetics and Genomic Sciences at the Icahn School of Medicine at Mount Sinai, New York, NY.


Zonoozi A, Kim J-J, Li X-L, Cong G (2018) Periodic-CRN: A convolutional recurrent model for crowd density prediction with recurring periodic patterns. In: IJCAI, pp 3732–3738

Chen C, Li K, Teo SG, Chen G, Zou X, Yang X, Vijay RC, Jiashi Feng, and Zeng Zeng. (2018) Exploiting spatio-temporal correlations with multiple 3d CNNs for citywide vehicle flow prediction. In 2018 IEEE international conference on data mining (ICDM). IEEE, pp 893–898

Zhu J, Han X, Deng H, Tao C, Zhao L, Wang P, Lin T, Li H (2022) Kst-gcn: A knowledge-driven spatial-temporal graph convolutional network for traffic forecasting. IEEE Trans Intell Transp Syst

Ta X, Liu Z, Hu X, Yu L, Sun L, Du B (2022) Adaptive spatio-temporal graph neural network for traffic forecasting. Knowl Based Syst, p 108199

Lin H, Gao Z, Xu Y et al (2022) Conditional local convolution for spatio-temporal meteorological forecasting[C]. Proc AAAI Conf Artif Intell 36(7):7470–7478

Li H (2022) Short-term wind power prediction via spatial temporal analysis and deep residual networks. Front Energy Res 10:662

Gao J, Sharma R, Qian C, Glass LM, Spaeder J, Romberg J, Sun J, Xiao C (2021) STAN: spatiotemporal attention network for pandemic prediction using real-world evidence. J Am Med Inform Assoc 28(4):733–743

Harvey A, Kattuman P (2020) Time series models based on growth curves with applications to forecasting coronavirus. Covid economics, vetted and real-time papers (24)

Chatzis SP et al (2018) Forecasting stock market crisis events using deep and statistical machine learning techniques. Expert Syst Appl 112:353–371

Wang AX, Tran C, Desai N, Lobell D, Ermon S (2018) Deep transfer learning for crop yield prediction with remote sensing data. In: Proceedings of the 1st ACM SIGCAS conference on computing and sustainable societies, pp 1–5

Trevisan R, Bullock D, Martin N (2021) Spatial variability of crop responses to agronomic inputs in on-farm precision experimentation. Precision Agric 22:342–363

Hong T, Pinson P, Wang Y, Weron R, Yang D, Zareipour H (2020) Energy forecasting: a review and outlook. IEEE Open Access J Power Energy 7:376–388

Liang J, Tang W (2022) Ultra-short-term spatiotemporal forecasting of renewable resources: An attention temporal convolutional network-based approach. IEEE Trans Smart Grid 13(5):3798–3812

Gu Q, Feng M, Lin Y (2022) Research on retailer churn prediction based on spatial-temporal features. In: 2022 7th International conference on intelligent computing and signal processing (ICSP), pp 876–884 IEEE

Punia S, Nikolopoulos K, Singh SP, Madaan JK, Litsiou K (2020) Deep learning with long short-term memory networks and random forests for demand forecasting in multi-channel retail. Int J Prod Res 58(16):4964–4979

Download references

Acknowledgements

This research was supported by the Defense Industrial Technology Development Program (Grant No. JCKY2020601B018) and the Research Fund of Jinling Institute of Technology for Advanced Talents (Grant No. jit-b-201805). The authors thank Haijun Zhang, associate editor of Neural Computing and Applications, and the anonymous reviewers, whose insightful comments and suggestions substantially improved this paper.

Author information

Authors and affiliations

Army Engineering University of PLA, Nanjing, China

Feiyan Sun, Wenning Hao & Ao Zou

Jinling Institute of Technology, Nanjing, China

Feiyan Sun & Qianyan Shen


Corresponding author

Correspondence to Wenning Hao .

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Data availability

Datasets have been provided as links in the manuscript.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix 1. Methods

See Table 12 .

Appendix 2. Table Note

Here, we provide keys to help read the tables in the paper:

Results indicates whether the method makes single-target predictions or predicts multiple targets simultaneously.

Loss and metrics indicates the training loss and evaluation metrics. Because the definitions can be found in the relevant papers, we provide only the expansions of the abbreviations here: mean absolute error (MAE), mean relative error (MRE), mean absolute percentage error (MAPE), (normalized) root mean squared error (NRMSE, RMSE, MSE), L1 loss (MAE), L2 loss (MSE), quantile loss (QL), empirical correlation coefficient (CORR), root relative squared error (RRSE), negative log-likelihood (NLL), ρ-quantile loss R_ρ with ρ ∈ (0, 1), and symmetric mean absolute percentage error (sMAPE).
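The abbreviations above follow their standard definitions; for concreteness, a minimal NumPy sketch of the most common ones (an illustration, not code from any surveyed paper) is:

```python
import numpy as np

def mae(y, yhat):
    """Mean absolute error (L1 loss)."""
    return np.mean(np.abs(y - yhat))

def rmse(y, yhat):
    """Root mean squared error."""
    return np.sqrt(np.mean((y - yhat) ** 2))

def mape(y, yhat):
    """Mean absolute percentage error (assumes y contains no zeros)."""
    return np.mean(np.abs((y - yhat) / y)) * 100

def smape(y, yhat):
    """Symmetric mean absolute percentage error."""
    return np.mean(2 * np.abs(y - yhat) / (np.abs(y) + np.abs(yhat))) * 100

def quantile_loss(y, yhat, rho):
    """rho-quantile (pinball) loss with rho in (0, 1)."""
    diff = y - yhat
    return np.mean(np.maximum(rho * diff, (rho - 1) * diff))
```

A quick sanity check when implementing these: the quantile loss with ρ = 0.5 reduces to exactly half the MAE.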

Structure refers to how the temporal and spatial modeling components are combined: series, parallel, or fusion. In a series structure, one dimension is modeled first and its output is fed as input to the module for the other dimension; for example, the temporal dependencies of the input features are modeled first, and the resulting representation is passed to the spatial-relationship extraction module to obtain the final prediction. In a parallel structure, the input sequence is fed simultaneously into a temporal network and a spatial network for learning temporal and spatial dependencies; the two resulting representations are fused before being passed as input to the next layer, and further learning then yields the final prediction. In a fusion structure, temporal and spatial modeling are not independent but interleaved; for example, at each time step of the temporal model, a node incorporates the spatial information of its neighbors rather than only its own time series.
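The three structures can be sketched with toy stand-in modules (a hedged illustration: the moving-average temporal operator, adjacency-averaging spatial operator, and mean fusion below are placeholders chosen for clarity, not the mechanisms of any specific surveyed model):

```python
import numpy as np

# Toy modules operating on a spatio-temporal tensor X of shape (T, N):
# T time steps, N spatial nodes. Real models would use RNN/TCN and GNN
# layers; simple linear operators suffice to show how the structures differ.

def temporal_module(X):
    # Causal moving average over the current and previous time steps.
    Xpad = np.vstack([X[:1], X])          # repeat first step as padding
    return 0.5 * (Xpad[:-1] + Xpad[1:])

def spatial_module(X, A):
    # Aggregate each node over its neighbours (row-normalised adjacency).
    Ahat = A / A.sum(axis=1, keepdims=True)
    return X @ Ahat.T

def series(X, A):
    # Series: temporal first; its output feeds the spatial module.
    return spatial_module(temporal_module(X), A)

def parallel(X, A):
    # Parallel: both modules see the raw input; outputs are fused (mean).
    return 0.5 * (temporal_module(X) + spatial_module(X, A))

def fusion_step(X, A):
    # Fusion: spatial aggregation happens inside each temporal update, so
    # every time step already carries neighbour information.
    out = np.zeros_like(X)
    h = X[0]
    for t in range(X.shape[0]):
        h = 0.5 * h + 0.5 * spatial_module(X[t:t + 1], A)[0]
        out[t] = h
    return out
```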

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Sun, F., Hao, W., Zou, A. et al. A survey on spatio-temporal series prediction with deep learning: taxonomy, applications, and future directions. Neural Comput & Applic (2024). https://doi.org/10.1007/s00521-024-09659-1


Received : 03 July 2023

Accepted : 25 March 2024

Published : 09 April 2024

DOI : https://doi.org/10.1007/s00521-024-09659-1


  • Spatio-temporal data
  • Spatio-temporal series prediction
  • Three-dimensional taxonomy

  • Open access
  • Published: 09 April 2024

Tetris-inspired detector with neural network for radiation mapping

  • Ryotaro Okabe   ORCID: orcid.org/0000-0002-5095-5951 1 , 2   na1 ,
  • Shangjie Xue 1 , 3 , 4   na1 ,
  • Jayson R. Vavrek 5   na1 ,
  • Jiankai Yu   ORCID: orcid.org/0000-0001-8029-9847 3 ,
  • Ryan Pavlovsky 5 ,
  • Victor Negut 5 ,
  • Brian J. Quiter 5 ,
  • Joshua W. Cates 5 ,
  • Tongtong Liu 1 , 6 ,
  • Benoit Forget 3 ,
  • Stefanie Jegelka   ORCID: orcid.org/0000-0002-6121-9474 4 ,
  • Gordon Kohse 7 ,
  • Lin-wen Hu   ORCID: orcid.org/0000-0003-3126-2225 7 &
  • Mingda Li   ORCID: orcid.org/0000-0002-7055-6368 1 , 3  

Nature Communications volume 15, Article number: 3061 (2024)


  • Computational science
  • Mechanical engineering
  • Scientific data

Radiation mapping has attracted widespread research attention and increasing public concern over environmental monitoring. Radiation detectors with various materials and configurations have been developed to identify the position and strength of radioactive sources. However, due to the complex mechanisms of radiation–matter interaction and limited data, high-performance, low-cost radiation mapping remains challenging. Here, we present a radiation mapping framework using Tetris-inspired detector pixels. By applying inter-pixel padding to enhance contrast between pixels and neural networks trained with Monte Carlo (MC) simulation data, a detector with as few as four pixels can achieve high-resolution directional prediction. A moving detector with Maximum a Posteriori (MAP) estimation further achieves radiation source localization. Field testing with a simple detector verified the capability of the MAP method for source localization. Our framework offers an avenue for high-quality radiation mapping with simple detector configurations and is anticipated to be deployed for real-world radiation detection.


Introduction

From the Fukushima nuclear accident in 2011 to the recent risk at the Zaporizhzhia nuclear power plant, there has been an increasing global need for improved radiation detection technology that achieves high-performance radiation mapping with minimal impact on detectors and reduced cost. Due to the simultaneous presence of multiple radiation-interaction mechanisms, detecting ionizing radiation is considerably harder than detecting visible light. The large penetration depth of radiation such as hard X-rays, γ-rays, and neutrons reduces the angular sensitivity of detectors and limits the majority of radiation detection efforts to counting or spectrum acquisition rather than directional information. The difficulty of acquiring directional radiation information further complicates source localization, that is, determining the position distributions of radiation sources 1 , 2 . In recent years, radiation localization has attracted increased interest for applications such as autonomous nuclear site inspection. Several prototype system configurations have been proposed, including unmanned ground 2 , 3 , 4 , aerial 3 , 5 , 6 , 7 and underwater vehicles 8 , 9 . Despite this remarkable progress, methods for extracting information about the radioactive environment are still under active development.

In past decades, several approaches have been proposed for directional radiation detection. One is the High-Efficiency Multimode Imager (HEMI), which can detect and locate γ-ray and X-ray sources 10 , 11 , 12 , 13 . A typical HEMI consists of two layers of CdZnTe (CZT) detectors: the first layer has a randomly arranged aperture for coded-aperture imaging, and the second is a conventional co-planar detector grid. This system requires the incident beam to come from a limited solid-angle range so that the beam passes through the aperture of the first layer and interacts with the second layer. The traditional reconstruction algorithm requires all incident beams to arrive within the field of view; accuracy degrades if radiation is incident from other directions (especially for near-field radiation). Moreover, this system can only conditionally detect multiple sources, usually when the sources are different isotopes that can be distinguished by energy. In that scenario, multi-source detection reduces to single-source detection by considering only the counts of events within an energy range. In real-world applications, however, different sources are not necessarily distinguishable in the energy spectrum. Besides HEMI, another approach to directional radiation detection uses single-pad detectors separated by padding material 14 , i.e., the self-shielded method. Radiation sources at different directions and distances produce different intensity distribution patterns over the detector array. Because of model inaccuracies caused by misalignment and manufacturing errors in the detector and shielding material, it is challenging to extract information from detector data via traditional methods such as non-linear fitting. Traditional methods are also most efficient for a single source, with reduced efficiency for multiple sources.
As for radiation localization and mapping, inspired by the widespread interest in Simultaneous Localization and Mapping (SLAM) 15 techniques, several works using non-directional detectors 6 , 7 or HEMI 13 for radiation source localization and mapping have been presented. Recently, active masked designs have been developed, contrasting with traditional coded masks. In these designs, multiple detector segments shield each other, creating an anisotropically sensitive and omnidirectional field of view. For example, recent projects developed the neutron-gamma localization and mapping platform (NG-LAMP) system with a 2 × 2 array of CLLBC (Cs 2 LiLa(Br,Cl) 6 :Ce) detectors 16 ; the MiniPRISM system 17 , 18 , which uses a partially-populated 6 × 6 × 4 array of CZT detectors; and an advanced neutron- and gamma-sensitive imager using a partially-populated 6 × 6 × 4 array of CLLBC detectors 19 . There have been similar kinds of high-angular-contrast designs with additional high-density passive elements 20 , 21 , or more traditional Compton cameras for in-field use 22 , 23 , 24 .

In this work, we propose a radiation detection framework using a minimal number of detector pixels, combining a Tetris-shaped detector with inter-pixel paddings and a deep-neural-network-based analysis of the detector readings. Figure 1 shows the overview of our framework. We demonstrate that detectors comprising as few as four pixels, augmented by inter-pixel padding material that intentionally increases contrast, can extract directional information with high accuracy. Moreover, the detector shape need not be limited to a square grid. Inspired by the famous video game Tetris, we demonstrate that other shapes from the Tetromino family, geometric shapes composed of four squares, can achieve potentially higher resolution (Fig. 1a). For each detector shape, we generate detector readings for radiation sources using Monte Carlo (MC) simulation (Fig. 1b). Figure 1c shows the machine learning model trained to predict the direction of radiation sources: using a filter layer and a deep U-net convolutional neural network, the model predicts the source direction from the detected signal. As Fig. 1d illustrates, we compare the ground-truth radiation source direction (blue) with the predicted direction (brown). Using the Wasserstein distance as the loss function (see “Methods” for details), the model achieves high directional accuracy. As an application of the directional detector, Maximum a Posteriori (MAP) estimation is applied to a moving detector to further estimate the spatial position of radiation sources in both simulations and real-world experiments. Throughout this work, we limit the discussion to 2D, which is sufficient in many realistic scenarios, and leave 3D to future studies. Note that radiation sources extend beyond γ radiation: each radiation type requires specialized detectors, since the penetration, scattering, and detection mechanisms vary significantly among sources. For neutron localization tasks, significant advances have been made with high-efficiency, fast, high-resolution thermal neutron imaging detectors 19 , 25 , 26 , 27 . Our work focuses on mapping γ radiation; localization of other radiation types is expected to follow similar principles but is left for future work.
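For intuition on the Wasserstein loss mentioned above: between two one-dimensional discrete distributions it reduces to the area between their cumulative distribution functions, so a prediction whose angular mass sits close to the ground truth is penalized less than one far away, unlike a pointwise loss such as MSE. A minimal sketch (ignoring the circularity of the angle domain, which the actual loss must handle; see “Methods”):

```python
import numpy as np

def wasserstein_1d(p, q, bin_width=1.0):
    # 1-D Wasserstein-1 distance between two discrete distributions given
    # as histograms over the same ordered bins: the area between the CDFs.
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    return np.sum(np.abs(np.cumsum(p) - np.cumsum(q))) * bin_width
```

For example, moving a unit spike of probability two bins away costs twice as much as moving it one bin away, which is exactly the distance sensitivity a pointwise loss lacks.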

figure 1

a The geometrical setting of the radiation detectors. Instead of using a detector with a large square grid, here we use small 2 × 2 square and other Tetromino shapes. Padding material is added between each pixel to increase contrast. b – d The workflow for learning the radiation directional information with Tetris-shaped detector and machine learning. b Monte Carlo simulation is performed to generate the detector readings for various source directions. c The detector’s readouts are embedded to a matrix of filter layers for better distinguishing far-field and near-field scenarios. The embedded data then goes through a deep U-net. d The predicted direction of radiation sources from the U-net (brown) with predicted angular \(\hat{\theta }\) is compared to the ground-truth Monte Carlo simulations (blue) with true angular data θ . The prediction loss is calculated by comparing the pairs ( θ , \(\hat{\theta }\) ).

Directional prediction with static detectors

First, we train the machine learning model so that static detectors can determine the direction from which radiation arrives. We use the OpenMC 28 package for MC simulation of the detector receiving the signal from an external radiation source (more details in the “Methods” section). We assume the detector pixels are CZT detectors of size 1 cm × 1 cm, slightly larger than current crystals but still much smaller than the 5 m source–detector distance. The inter-pixel padding material is chosen empirically as 1 mm-thick lead, which is thick enough to create contrast while having quite low photon absorption in the γ-ray range. Throughout this study, we assume an incident γ-ray energy of 0.5 MeV, which is the realistic energy from pair production and comparable to many γ-decay energy levels. Given the energy resolution of the CZT detector, radiation at other energies is also expected to be resolvable, even though here we focus only on directional mapping, where only counting matters. More details on data preparation, normalization, neural network architectures, and training procedures are given in the Methods section. We evaluate the prediction accuracy of four detector configurations: a 2 × 2 square grid and S-, J-, and T-shaped Tetrominoes. The I-shaped Tetris detector array is not presented since its performance is not good enough for directional mapping. The main results of the predicted radiation direction for the four Tetris-inspired detectors are illustrated in Fig. 2 and summarized in Table 1. The S-shaped detector achieved the smallest prediction error, followed by the 2 × 2 square, J-, and T-shapes, but all four detector types determine the direction of the radiation source with roughly 1-degree (°) accuracy.

figure 2

Selected outcomes of source-direction prediction with detectors of simple configurations. Each panel includes the detector's input signal and a polar plot of the directional prediction. The blue and brown curves represent the ground truth and the prediction, respectively. We show results for the detector configurations of ( a , b ) a 2 × 2 square, ( c , d ) S-shape, ( e , f ) J-shape, and ( g , h ) T-shape. For each detector type, ( a , c , e , g ) show the scenarios with the largest loss in the test data, highlighting challenging prediction situations, and ( b , d , f , h ) show the scenarios with the smallest loss, showcasing relatively successful prediction cases.

Figure 2 shows detector readouts and typical angular distributions predicted by the neural networks in polar plots. Blue and brown represent the ground truth from MC and the neural network prediction, respectively. Figure 2a, b show the 2 × 2 square-grid detector's predictions with the largest and smallest prediction errors. The performance of the other Tetris-inspired detector shapes (S, J, and T) is shown in Fig. 2c–h. Comparing the shapes, there is a general trend that the S-shaped Tetris performs best while the T-shaped Tetris is the least accurate. This can be understood intuitively from a symmetry analysis. For instance, for incident radiation from the “north” with θ = 0°, the left and right pixels and padding materials of the T-shape receive identical signals, which reduces the number of effective pixels and padding materials. The S-shaped detector, by contrast, has pixels that each receive non-equivalent signals from radiation sources in any direction, yielding higher prediction accuracy than the square detector. Although the square detector in Fig. 2a, b has higher symmetry than the others, it also contains four pieces of inter-pixel padding material, in contrast to the three pieces of the other shapes. A similar analysis applies to the I-shaped detector array, given its high symmetry and fewer effective pixels. Figure S4 provides a detailed analysis of prediction accuracy with respect to the radiation source direction. In Supplementary Note 3, we present further analysis of directional prediction with the Tetris-inspired detectors. Figure S5 shows the effect of the radiation source energy, underscoring the importance of optimizing the source energy when generating the training dataset for practical applications of the detector.
We also showed that our proposed model architecture with two filter layers is effective in diverse scenarios by comparing it with a benchmark model using a single filter layer. Figure S6 and Table S3 show that the single-filter-layer model can predict the directions of two radiation sources simultaneously; however, the two-filter-layer model accommodates more diverse scenarios, as explained in Table S4. Furthermore, we examined the robustness of our directional prediction against background noise. Figure S7 shows that the S-shaped detector performs best when there is no noisy background, while the 2 × 2 detector is the most robust against added Gaussian noise.

Positional prediction with moving detectors and maximum a posteriori (MAP) estimation

In real-world applications of radiation mapping, it is highly desirable to go beyond directional information and determine the position of the radiation source. Here, a method based on Maximum a Posteriori (MAP) estimation is proposed to build an estimated distribution of radiation through the motion of the detector. The workflow is as follows. First, the detector readout is simulated by MC given the detector's initial position and orientation, just as in the static case. Second, the detector moves in a circular path (schematics in Fig. 3a). It is worth mentioning that which detector face aligns with the moving direction matters little, since a detector facing any direction is already a valid directional detector sensitive to radiation from all directions; even if the detector is rotated intentionally or accidentally during the circular motion, the final results remain robust (Figs. S15–S20 and Supplementary Movies 7–10 in Supplementary Note 6). Third, at each timestamp during the motion, the source direction is predicted with the deep U-net model, as in the static case. Finally, the radiation source location is estimated via MAP from the series of neural-network-inferred directions at the different detector positions. In the ideal case of a single isotropic source, as few as two detector positions suffice to locate the source (as the intersection of the two rays along the predicted directions); the circular motion and MAP are used for mapping more complicated radiation profiles. To improve visualization, we normalize the radiation map by the highest probability of source presence in the area of interest, set a threshold of 0.3, and zero out every value below it; this procedure makes the mapping result easy to read.
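The MAP accumulation described above can be sketched in a few lines (an illustration under stated assumptions: a Gaussian angular likelihood with an assumed width σ stands in for the U-net's full predicted angular distribution, and the function names, grid, and parameters are hypothetical, not the paper's implementation):

```python
import numpy as np

def map_radiation_map(detector_xy, pred_theta, grid_x, grid_y,
                      sigma=np.radians(5.0), threshold=0.3):
    # Accumulate a log-posterior over a grid of candidate source positions.
    # detector_xy: (K, 2) detector positions along the circular path.
    # pred_theta:  (K,) predicted source directions (radians, world frame).
    X, Y = np.meshgrid(grid_x, grid_y)
    logpost = np.zeros_like(X)
    for (dx, dy), th in zip(detector_xy, pred_theta):
        # Angle from the detector to every grid cell.
        ang = np.arctan2(Y - dy, X - dx)
        # Wrapped angular residual in (-pi, pi].
        res = (ang - th + np.pi) % (2 * np.pi) - np.pi
        logpost += -0.5 * (res / sigma) ** 2   # Gaussian angular likelihood
    post = np.exp(logpost - logpost.max())
    post /= post.max()              # normalise by the highest probability
    post[post < threshold] = 0.0    # threshold for visualisation
    return post
```

Each detector pose contributes a ray-shaped likelihood; the posterior sharpens where the rays intersect, mirroring the progression in Fig. 3b–d.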

figure 3

a By acquiring detector readings at each spatial position during the circular motion, the position of the radiation can be gradually optimized through MAP. b – d The process to map the radiation source at a few representative times at t  = 10, 30, and 60 s, respectively. The “×” symbol on the maps shows the ground-truth location of the radiation source. The areas colored with intense red indicate a high probability of where the radiation source is located. The purple arrows indicate the front side of the radiation detector. e – j The detector’s input signals and the predicted directions of the radiation sources at time t  = 10, 20, 30, 40, 50, and 60 s. Both the input signal and the curves of the polar coordinates are visualized in the detector frame. The top side of the detector represents the front side. Check the radiation mapping process in Supplementary Movie  1 .

Figure 3b–j shows the dynamical process of radiation mapping and position determination using the S-shaped detector. The detector geometry is the same as before, and the radius of motion is chosen randomly between 0.5 m and 5 m. Figure 3b–d shows the inferred radiation maps at three timestamps, t = 10, 30, and 60 s, at the beginning, half-circle, and near the end of the circular motion. The ground-truth location of the radiation source is shown as the black cross in all three panels. At the early t = 10 s, there is not yet sufficient information for MAP to estimate the radiation position, and the estimate (red lines in Fig. 3b) has a ray shape that acts more like directional mapping. After 30 s, the MAP estimate improves, though the estimated radiation is spread over a broader area than the ground truth. Finally, the detector completes the mapping with sufficient accuracy to pinpoint the position of the radiation source (Fig. 3d). The detector's readouts and predicted directions at each timestamp (t = 10, 20, 30, 40, 50, and 60 s) are illustrated in Fig. 3e–j. Detailed results with the other Tetris-inspired detectors are shown in Supplementary Note 4. Figures S9–S11 present the moving detector and radiation mapping using the 2 × 2 square, J-shaped, and T-shaped detectors, respectively. Supplementary Movies 2–4 visualize these mapping processes at each timestamp. The actual and predicted relative angles over the observation time are plotted in Fig. S12.

When performing realistic radiation mapping, the area of interest may contain multiple radiation sources, which increases the difficulty of the task. To tackle this challenge, we further study radiation distribution maps with multiple sources (Fig. 4). Good agreement is achieved for two radiation sources. However, we note that more detector pixels, such as 10 × 10 (Fig. 4a) or 5 × 5 (Fig. 4b) grids, are used here, since the 2 × 2 square-grid detector does not perform adequately unless the distances between the detector and the radiation sources are fixed, as shown in Fig. S6. Figures S13 and S14 in Supplementary Note 5 present the intermediate radiation-mapping process with the 10 × 10 and 5 × 5 square detectors, respectively.

figure 4

Two radiation sources are placed in the space (shown as the two black “×” crosses). The detector moves in a circular path around the sources (blue circles). We use a detector with a 10 × 10 grid (in a ) and a 5 × 5 grid (in b ). Check the radiation mapping process in Supplementary Movies 5 and 6.

Experimental validation of radiation mapping with a 4-pixel detector

In the preceding sections, we demonstrated the efficacy of our machine-learning approach in accurately locating radiation sources in 2D space using MC simulation data. To validate the practical utility of our method in the field of radiation measurement, it is essential to assess its performance in real-world experimental scenarios.

We conducted a comprehensive experiment to map the location of a radiation source within a real-world environment. Figure 5a shows the experimental schematic. The measurement involved positioning a Cs-137 radiation source at coordinates (5.0, 0.0, 0.0), with the experimental team keeping the source position secret until the radiation mapping algorithm had made its prediction. We deployed a radiation detector configured in a 2 × 2 square layout and moved it around the area near the radiation source. The detector outputs the radiation absorbed by each crystal (pixel) at regular intervals of 0.5 s. We employed a signal smoothing technique to reduce measurement fluctuations. The “Methods” section and Supplementary Note 7 give further details of the experimental setup and data analysis, including a conventional non-neural-network approach to radiation source mapping.
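The smoothing technique is not specified here; a centred moving average over the 0.5 s per-pixel count intervals is one simple possibility (the function name and window size below are assumptions for illustration):

```python
import numpy as np

def smooth_counts(counts, window=5):
    # Centred moving average over time, applied per pixel channel.
    # counts: (T, P) array of per-interval absorption counts for P pixels.
    kernel = np.ones(window) / window
    return np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="same"), 0, counts)
```

Note that with mode="same" the first and last few intervals average over fewer than `window` samples, so edge values are biased low for nonzero signals.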

figure 5

a Annotated photograph of the measurement setup. The red and green arrows show the approximate x - and y -axes of the detector coordinate, and the 171 μ Ci Cs-137 source is shown on the corner of the concrete ledge about 80 cm above the sidewalk level. Note that the radiation source was deliberately positioned at coordinates (5.0, 0.0, 0.0), and the data analysis remained intentionally blinded to the true source location until the analysis team made a prediction. b Top-down view of the GPSL reconstruction z -scores of the measurement. The thin bands of black LiDAR points appearing around (0.0, 0.0, 0.0) are artifacts from the system's initial static dwell. c – f The progression of radiation source mapping at representative time intervals of t  = 0, 14, 28, 42 s, respectively. The gray point clouds in the diagrams represent the surrounding environment of our experiment. The symbol "×" designates the ground-truth location of the radiation source. The black dot on the maps indicates the position of the radiation detector. We visualize the left ( y ) and front ( x ) axes of the detector with green and red arrows, respectively. The black solid line indicates the trajectory of the detector. The radiation mapping process is shown in Supplementary Movie  11 .

As an existing analysis method, we demonstrated the non-ML gridded point source likelihood (GPSL) reconstruction method 7 for comparison. Figure  5 b shows a top-down view of the measurement and GPSL reconstruction for the measurement. The gray points are the LiDAR point cloud of the scanned area, and the red, green, and blue lines show the detector’s x , y , and z axes at each 0.5 s timestamp. The path between the axes is colorized by gross counts (qualitatively) from low (cyan) to high (magenta). The color bar denotes the likelihood contours of z -scores up to 5 σ that the given 10 cm pixel contains a point source. The most likely point source position is highlighted by the black dashed crosshair, while the red dashed crosshair shows the actual source position. With the limited approach to the source afforded by the circular detector trajectory, there is a slight 0.75 m error in the reconstructed position, but the activity estimate closely matches the actual value of 170.8  μ Ci.

In Fig.  5 c–f, we present the results of our MAP analysis applied to radiation mapping with the 4-pixel detector. The maps depict the probability that each location hosts a radiation source. The area with intense red shading represents higher probability, and it converges precisely around the ground-truth position of the radiation source marked by "×." This convergence underscores that our neural network, equipped with directional prediction and MAP analysis, effectively approximated the actual location of the radiation source with quality equivalent to the GPSL approach. We provide additional images offering both top-down ( z -axis direction) and aerial perspectives in Figs.  S21 and S22 of Supplementary Note  7 for a comprehensive view of the mapping process at various time intervals. In Supplementary Movie  11 , we present the measured signal at each timestamp and the process by which the moving detector maps the radiation in the experimental scenario.

The conventional detector configuration has a grid structure vertically facing the detection target, where each detector pixel receives the radiation signal at a slightly different solid angle. In this work, we propose an alternative detector configuration with several distinguishing features. First, the detector grid is placed horizontally within the plane instead of vertically facing the source. Second, thin padding layers are inserted between detector pixels, i.e., the contrast between pixels is created not only by incident angles but also enhanced by padding layers that strongly absorb radiation. Third, machine learning algorithms are implemented to analyze the detector readings, demonstrating great promise for reducing the required number of detector pixels and thereby the cost of fabrication and deployment. Fourth, non-conventional Tetris-shaped detector geometries are proposed beyond the square grid, which can lead to more efficient use of pixels with improved resolution, particularly for the S-shaped Tetris detector. Finally, we demonstrate the possibility of locating the positions of radiation sources in a moving-detector scheme through MAP. Experimental validation further proves the capability of our machine learning approach for locating radiation sources in real-world scenarios.

Despite these initial successes, we believe the configuration proposed in this work is still in its infancy, and several refinements are foreseeable. In particular, although the 2D configuration can represent several realistic scenarios, such as radiation sources that are far from the detector but still close to ground level, the 3D configuration remains an interesting problem to study, possibly with 3D detectors such as Rubik's-cube-shaped detector cubes. Moreover, several extensions, such as handling moving radiation sources and resolving the energy spectra of radiation, may be feasible with more advanced approaches like reinforcement learning. Our work represents one step toward leveraging detector pixels and shapes with machine learning for radiation detection with reduced complexity and cost.

Monte Carlo simulation of static detector and data representation

The training data, i.e., the intensity measured by each detector, is simulated by MC simulation based on the principles of radiation-matter interaction. We used the existing MC simulation package OpenMC 28 . OpenMC incorporates effects such as Compton scattering and pair production, enabling simulations that accurately model radiation interactions in realistic scenarios. Representative results for the detector arrays are shown in Fig.  2 . For simplicity, we temporarily assume the radiation source and the detector arrays are in the same plane. In the MC simulation, we first define the geometry of the detector. Schematics of the detector arrays are shown in Fig.  1 . The adjacent detectors (yellow) are separated by attenuation materials (black), forming a lattice-like configuration. We set the distance d (cm) between the center of the detector and the radiation source. The direction of the radiation source is defined by an angle θ , measured clockwise from the front side of the detector. When generating training data, we selected d and θ at random ( d   ∈  [20, 500],  θ   ∈  [0, 2 π )) so that the neural network could learn features from radiation sources at various distances and directions. The distribution of radiation source positions is shown in Fig.  S1 . After the MC simulation was completed, the detector readouts are represented as an ( h  ×  w ) matrix, where h and w are the height and width of the detector array, respectively. For the square detector comprising four detector panels, the 2 × 2 data matrix was normalized so that its mean and standard deviation are 0 and 1, respectively. For the Tetromino-shaped detectors, the readouts are represented as 2 × 3 matrices. Since two of the matrices' six sites are vacant, we filled them with zero and normalized in the same way as for the square detectors.
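The normalization and zero-filling described above can be sketched as follows; this is a minimal illustration, and the helper name `normalize_readout` and the specific S-shape mask layout are our own choices for the example:

```python
import numpy as np

def normalize_readout(readout, mask=None):
    """Normalize a detector readout to zero mean and unit standard deviation.

    `readout` is an (h, w) array of per-pixel intensities. For Tetromino
    detectors, `mask` marks which sites of the 2 x 3 grid hold a physical
    pixel; vacant sites are kept at zero, matching the data representation
    described above.
    """
    r = np.asarray(readout, dtype=float)
    if mask is None:
        mask = np.ones_like(r, dtype=bool)
    vals = r[mask]
    out = np.zeros_like(r)
    out[mask] = (vals - vals.mean()) / vals.std()
    return out

# 2 x 2 square detector: all four sites are active pixels.
square = normalize_readout([[3.0, 1.0], [2.0, 4.0]])

# S-shaped Tetromino on a 2 x 3 grid: two sites are vacant (filled with 0).
s_mask = np.array([[False, True, True],
                   [True, True, False]])
s_shape = normalize_readout([[0.0, 5.0, 3.0], [2.0, 4.0, 0.0]], mask=s_mask)
```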
We followed the same MC simulation method as above to generate the 64 filter layers, which are explained in more detail in the following section. All other parameters used in the MC simulations are shown in Table  S1 .

The dataset D is in the form of \({\{{{{{{{{{\bf{x}}}}}}}}}^{(i)},{{{{{{{{\bf{y}}}}}}}}}^{(i)}\}}_{i\in [1,{N}_{1}]}\) , where N 1 is the size of the dataset. \({{{{{{{{\bf{x}}}}}}}}}^{(i)}\in {{\mathbb{R}}}^{h\times w}\) is the normalized readout of the detector array; h and w denote the number of rows and columns of the detector array, respectively. For example, h  =  w  = 2 for the 2 × 2 square detector, and h  = 2, w  = 3 for the Tetromino-shape detectors. \({{{{{{{{\bf{y}}}}}}}}}^{(i)}\in {{\mathbb{R}}}^{{N}_{a}}\) is the angular distribution of the incident radiation, where N a is the number of sectors used to divide [0, 2 π ). Each element of y ( i ) represents the ratio of the incident radiation intensity received from the direction of that sector to the total incident radiation intensity. For an angular distribution y contributed by multiple point sources, let y j represent the angular distribution contributed by the j th point source. The k th element of y j is defined by:

We also have:

where θ j denotes the incident angle of the j th radiation source, I j denotes the total incident intensity received from the j th radiation source, and I 0  = Σ j I j is the total incident intensity received from all radiation sources. This representation allows us to accurately encode the incident direction of a point source at an arbitrary angle, despite the discretized angle interval. In the experiments, [0, 2 π ) is separated into N a  = 64 sectors. Figure  2 illustrates this representation as a pie chart.
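A sketch of this target encoding is given below. The paper's exact splitting formula is not reproduced here; this example assumes each source's intensity is shared linearly between the two sectors adjacent to its incident angle, which is one plausible way to preserve the angle at sub-sector precision:

```python
import numpy as np

N_A = 64  # number of angular sectors covering [0, 2*pi)

def encode_sources(angles, intensities):
    """Encode point sources as a normalized angular distribution over N_A sectors.

    Assumption for this sketch: each source's intensity is split linearly
    between the two sectors bracketing its incident angle, then all
    contributions are normalized by the total intensity I_0.
    """
    y = np.zeros(N_A)
    total = float(sum(intensities))
    width = 2 * np.pi / N_A
    for theta, inten in zip(angles, intensities):
        pos = (theta % (2 * np.pi)) / width  # fractional sector index
        k = int(np.floor(pos))
        frac = pos - k
        y[k % N_A] += (1 - frac) * inten / total
        y[(k + 1) % N_A] += frac * inten / total
    return y

# Two point sources at angles 0.5 rad and 3.0 rad with intensities 2:1.
y = encode_sources([0.5, 3.0], [2.0, 1.0])
```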

Deep neural network architecture

In order to extract the global patterns of the input data, a set of global filters is designed. We obtain these filters from very high-quality simulations of representative cases, including far-field incidence (the radiation source is located at a large distance compared to the size of the detector array) and near-field incidence (the radiation source is located at a close distance) at different directions. As an example, near-field filters of the S-shape detector are shown in Fig.  S2 . Each filter has the size ( h  ×  w ), the same as that of the training data. The weight of each unit is given by the readout of each single-pad detector in the simulation. The output of this layer is given by:

where Z m k is the k th element of the output array at channel m   ∈  {1, 2}. Channels 1 and 2 correspond to the far-field and near-field filters, respectively. x i j denotes the input array at pixel ( i ,  j ), and F m k i j denotes the ( i ,  j ) element of the global filter obtained for incident radiation from the k th sector in the far-field ( m  = 1) or near-field ( m  = 2) scenario. w m denotes a channel-wise normalization weight, and b m k denotes the bias for this global filter. During training, the weights of the global filters are initialized with the far-field and near-field filters obtained from the high-quality simulations. The weights are fine-tuned with a learning rate lower than that of the other layers of the network, while the bias for each filter is trained with the same learning rate as the other layers. The filter layer is followed by an Exponential Linear Unit (ELU) activation function 29 . The output of the global filtering layer embeds the directional information corresponding to the direction of the filter channel; it is then fed into the U-shape network to extract the directional information.
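The forward pass of this layer can be sketched numerically as below. This is a minimal NumPy illustration, not the trained network; the random filters stand in for the simulation-derived far-field and near-field filters:

```python
import numpy as np

def elu(z, alpha=1.0):
    """Exponential Linear Unit activation."""
    return np.where(z > 0, z, alpha * (np.exp(z) - 1.0))

def global_filter_layer(x, F, w, b):
    """Forward pass of the global filtering layer described above.

    x : (h, w) normalized detector readout
    F : (2, 64, h, w) far-field (channel 0) and near-field (channel 1) filters
    w : (2,) channel-wise normalization weights
    b : (2, 64) per-filter biases
    Returns the (2, 64) directional feature map fed into the U-shape network.
    """
    z = np.einsum('mkij,ij->mk', F, x)  # full-grid inner products (global pattern)
    z = w[:, None] * z + b
    return elu(z)

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 2))              # 2 x 2 square detector input
F = rng.normal(size=(2, 64, 2, 2))       # placeholder for simulation-derived filters
out = global_filter_layer(x, F, np.ones(2), np.zeros((2, 64)))
```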

In the neural network, the input data is the normalized ( h  ×  w ) detector readout. However, it is essentially different from images captured by cameras. For processing images formed by visible light, convolutional neural networks (CNNs) are widely used 30 . A convolution layer extracts features that appear as localized patterns. However, due to the penetrating nature of several kinds of radiation, the relevant features appear as global patterns, unlike in visible-light imaging. Therefore, an adapted architecture is designed for this purpose: the input data is followed by a global filtering layer of shape (2, 64, h , w ) in order to extract the global pattern. The output of this layer conveys the directional information with a size of (2, 64), which corresponds to the final output, i.e., the estimated angular distribution, with a size of (1, 64). In order to perform a pixel-to-pixel prediction of the angular distribution, a U-shape fully convolutional architecture similar to U-Net 31 is then utilized, as Fig.  1 b–d shows. Note that the U-Net architecture was originally developed for image segmentation 31 ; here, however, the output is a 1D array, and the input is viewed as a 1D array with two channels. We accordingly set the dimensions of the U-Net, as shown in Fig.  S3 . Finally, the output of the final layer feeds into a softmax layer for normalization.

In order to quantify the distributional similarity between the predicted and target distributions, the Wasserstein distance is used as the loss function. It is a distance between two probability distributions on a given metric space. As this metric is analogous to the minimum cost required to move one pile of earth into the shape of another, it is also known as the earth mover's distance 32 , 33 , 34 . The Wasserstein loss function is given by:

where π ( i ,  j ) is the transport policy representing the mass to be transferred from state i to state j , y i is the ground truth of the normalized angular distribution, and \(\hat{{y}_{i}}\) is the estimated angular distribution. C ( i ,  j ) is the cost matrix representing the cost of moving unit mass from state i to state j . Our setting is cyclic and uses the following form:

An algorithm is developed to calculate the cyclic Wasserstein distance, as shown in Algorithm 1. In particular, the cyclic case is unrolled into ordered cases: the ring is split into a line at each of the n units, yielding n different ordered distributions with corresponding ordered cost matrices. In the ordered case, the Wasserstein distance can be computed in closed form 33 , 35 :

where CDF( ⋅ ) calculates the cumulative distribution of its input. Following this formula, a decycling algorithm is developed to calculate the Wasserstein distance with a cyclic cost. The algorithm is shown below:

Algorithm 1


It can be numerically verified that this algorithm enables exact calculation of the 1st ( l  = 1) Wasserstein distance for the cyclic case, given an arbitrary distribution. The algorithm is differentiable and enables us to optimize the objective through back-propagation. For evaluation in the experiments, we use the 1st Wasserstein distance, which directly represents the angular difference between the estimated and real directions. In the training process, we use the 2nd ( l  = 2) Wasserstein distance as the loss function, since it usually converges faster with gradient-descent-based optimization methods than the 1st Wasserstein distance 33 , 36 .
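A minimal numerical sketch of the decycling idea follows. It assumes the cyclic distance is the minimum, over all n cut positions of the ring, of the closed-form ordered distance (Algorithm 1's exact pseudocode appears only as a figure in the original); the wrap-around example shows why the cyclic cost matters near the 0/2π boundary:

```python
import numpy as np

def ordered_wasserstein(y, y_hat, l=1):
    """Closed-form l-Wasserstein distance between two ordered 1D histograms,
    computed from the cumulative distribution functions."""
    diff = np.cumsum(y) - np.cumsum(y_hat)
    return float(np.sum(np.abs(diff) ** l))

def cyclic_wasserstein(y, y_hat, l=1):
    """Decycling sketch: cut the ring at each of the n sectors, evaluate the
    ordered distance for every cut, and keep the minimum."""
    n = len(y)
    return min(ordered_wasserstein(np.roll(y, s), np.roll(y_hat, s), l)
               for s in range(n))

# Unit masses one sector apart, but across the 0/2*pi wrap-around:
y = np.zeros(8); y[0] = 1.0
y_hat = np.zeros(8); y_hat[7] = 1.0
```

Here the ordered distance is 7 (the mass must traverse the whole line), while the cyclic distance is 1, reflecting the true one-sector angular error.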

In the proposed model, the network is trained using Adam 37 with a learning rate of 0.001 for all parameters except the weights of the global filtering layer, whose learning rate is set to 3 × 10 −5 . The training batch for each step is randomly selected from the training set D train , which is based on pristine simulation results for one radiation source. All models randomly split the data into 90% training (2700 samples) and 10% testing (300 samples) sets. Furthermore, we trained our models with a 5-fold cross-validation scheme. The parameters for training the neural networks that predict the directions of one or two radiation sources are summarized in Table  S2 .
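The data partitioning described above can be sketched as follows; the helper name and the seed are illustrative choices, not taken from the paper:

```python
import numpy as np

def split_and_folds(n_samples=3000, test_frac=0.1, n_folds=5, seed=0):
    """Random 90/10 train/test split (2700/300 for 3000 simulated samples),
    followed by 5-fold cross-validation over the training portion."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_test = int(n_samples * test_frac)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    folds = np.array_split(train_idx, n_folds)  # 5 folds of 540 samples each
    return train_idx, test_idx, folds

train_idx, test_idx, folds = split_and_folds()
```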

Radiation source mapping with maximum a posteriori (MAP) estimation

We set up a radiation mapping problem by considering the case where there is one point source in the environment. This task could be extended to a Simultaneous Localization and Mapping (SLAM) problem 15 . The directional radiation detector can be viewed as a sensor that obtains only directional information, and a radiation source can be viewed as a particular landmark that can only be measured by this kind of detector. We show that, by treating the directional radiation detector as a sensor with only directional resolution, it can easily be integrated into the MAP optimization framework, enabling source localization. Details of how to integrate the directional radiation detector into the MAP framework are presented in Fig.  S8 .

Here, a method based on maximum a posteriori (MAP) estimation is proposed to generate the radiation distribution map. We further assume that the carrier of the detector has already localized itself and is only required to build the radiation map. The map is discretized into a mesh of N m square pixels. Let \(c\in {{\mathbb{R}}}^{{N}_{m}}\) denote the radiation concentration at each pixel; the radiation is assumed to be generated uniformly from each pixel. The measurement result z t  =  I 0 y t denotes the incident radiation intensity coming from different directions at time t . The measurement probability p ( z t ∣ c ) is assumed to be linear in its arguments, with added Gaussian noise:

where \(\delta \sim N\left(0,{\Sigma }_{\delta }\right)\) describes the measurement noise, \({{{{{{{{\bf{M}}}}}}}}}_{t}\in {{\mathbb{R}}}^{N\times {N}_{m}}\) denotes the observation matrix at time t . Note that in our study, we treat gamma measurements as continuous real numbers with Gaussian noise, which is appropriate for scenarios where the sample size is sufficiently large, and the mean count rate is not significantly low. The central limit theorem ensures that for a significant sample size, the Poisson distribution, which describes the discrete nature of event counts, tends to approximate a Gaussian distribution. As such, our choice of Gaussian distribution provides a reasonable approximation for the behavior of directional radiation detectors under the conditions of our experiments, where the counts are reasonably large, allowing us to accurately model the measurement uncertainties. Considering the contribution of one pixel to one direction sector of the detector, only the overlapped area can contribute to the sector, as is shown in Fig.  S8 , and the intensity is proportional to the overlapped area. Besides, the intensity is inversely proportional to the square of the distance between the detector and the source. Therefore, the element of M t , or in other words the intensity contribution of the i th pixel to the j th directional sector of the detector at time t can be written as:

where A t i j denotes the area of the overlapped region of the i th pixel and the j th sector at time t (blue area in Fig.  S8 ), r t i denotes the distance between the detector and the center of the pixel. According to Bayes’ rule, we have:

Here, we assume that measurements at different times are conditionally independent given c . As we are trying to find c that maximizes p ( c ∣ z 1 ,  z 2 , …,  z t ), the p ( z 1 ,  z 2 , …,  z t ) term could be ignored as it is independent of c . It could be assumed that the prior term p ( c ) =  N (0,  ε I ) is a Gaussian distribution. Then following the Maximum a Posteriori (MAP) estimation, we have:

Therefore, the radiation concentration distribution could be obtained by solving the optimization problem:

where the \(\frac{1}{{\varepsilon }^{2}}\parallel {{{{{{{\bf{c}}}}}}}}{\parallel }_{2}^{2}\) term can be viewed as a regularization term. If we do not have much information about the prior distribution, ε will be large. This term penalizes large concentrations when the measured data is inadequate to determine the concentration (i.e., the area is not fully explored, causing very small M t i j ). In practice, ε could be tuned using differentiable convex optimization layers 38 , in which the optimization problem is viewed as a layer within the neural network and error back-propagation is enabled through implicit differentiation, given predicted and ground-truth data. In our demonstration, we simply set \(\frac{1}{{\varepsilon }^{2}}=0.1\) so that the regularization term is relatively small.
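Under the additional assumption of unit (isotropic) measurement noise, the MAP objective above reduces to ridge-regularized least squares with a closed-form solution, sketched below on a tiny synthetic problem. The function name and the synthetic observation matrices are illustrative:

```python
import numpy as np

def map_radiation_map(M_stack, z_stack, lam=0.1):
    """MAP estimate of the per-pixel radiation concentration c.

    Stacking the observation matrices M_t and measurements z_t over all
    timestamps, and assuming unit measurement noise, the MAP objective
    becomes min_c ||M c - z||^2 + lam * ||c||^2 with lam = 1 / eps**2,
    solved here in closed form via the normal equations.
    """
    M = np.vstack(M_stack)          # shape (T*N, N_m)
    z = np.concatenate(z_stack)     # shape (T*N,)
    A = M.T @ M + lam * np.eye(M.shape[1])
    return np.linalg.solve(A, M.T @ z)

# Tiny synthetic check: two map pixels, the true source sits in pixel 0.
rng = np.random.default_rng(1)
c_true = np.array([5.0, 0.0])
M_stack = [rng.random((4, 2)) for _ in range(20)]            # 20 timestamps
z_stack = [M @ c_true + 0.01 * rng.normal(size=4) for M in M_stack]
c_hat = map_radiation_map(M_stack, z_stack)
```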

Experimental setup of real-world radiation mapping and post-process for MAP analysis

The system comprises a 2 × 2 array of 1 × 1 × 2” CLLBC (Cs 2 LiLa(Br,Cl) 6 :Ce) gamma/neutron detectors separated by a polyethylene cross. The detector is equipped with a Localization and Mapping Platform (LAMP) sensor suite used to map the 3D environment and determine the detector pose (position and orientation) within that environment, and the demonstration measurements were made with Lawrence Berkeley National Laboratory’s Neutron Gamma LAMP (NG-LAMP) radiation mapping system 39 . A 171  μ Ci Cs-137 check source was placed on a concrete ledge in an outdoor environment, and the detector system was used to make free-moving measurements of the source. The detector was hand-carried in a circular pattern of about 2 m radius around a point 5 m away from the source location for up to 45 s, completing almost 2.5 loops. Throughout the measurement, the detector orientation was kept almost fixed with respect to the environment. The CLLBC crystals were kept at a height close to that of the radiation source during the entire measurement duration. The listmode radiation data in an energy region of interest (ROI) of [550, 800] keV and the detector pose determined by NG-LAMP’s contextual sensor suite were then interpolated to a 0.5-s time binning. We note that these real-world radiation measurements placed the source outside the circular detector trajectory to model a realistic source-search scenario closely.

To enhance the quality of measurement data for analysis, we applied a series of post-processing steps to the measured data. First, the location of the 661.7 keV Cs-137 photopeak in each crystal had drifted to 690–700 keV, necessitating a manual, multiplicative gain correction for each crystal. To avoid any possible energy aliasing from this gain correction, the corrected energy values were blurred by a Gaussian kernel of standard deviation 1 keV, much smaller than the width of the photopeak (10 keV). Furthermore, other post-processing steps were applied to the contextual data to streamline the analysis. The light detection and ranging (LiDAR) point clouds and global coordinate frames were more precisely aligned to a single coordinate frame using the Iterative Closest Point (ICP) algorithm in Open3D 40 , 41 . Moreover, the initial and final parts of the measurements where the system was walked to/from its intended measurement position(s) and used to perform a dedicated LiDAR scan of the area were cut from the radiation mapping analysis. The trajectory and radiation data were cut here, but the contextual LiDAR point clouds were not.

For comparison with an existing analysis method, we demonstrated the non-machine-learning gridded point source likelihood (GPSL) reconstruction method 7 . Using quantitative response functions, GPSL computes the best-fit source activity for every potential source point in the imaging space and selects the source point with the maximum likelihood 16 . As in the neural network analysis, reconstructions were computed for the 2D plane level with the actual source height, using an energy region of interest (ROI) of 661.7 ± 80 keV.

We conducted radiation mapping to validate that our neural networks and MAP are applicable to the radiation measurement in a real-world scenario. As detailed below, we first pre-processed experimental data as the inputs of our analysis. At each position of the 91 timestamps through 45-s measurements, the 2 × 2 square detector acquired measurement data as a matrix of dimensions (4, 84). These measurements corresponded to the count of photons absorbed by the four pixels on the crystal panels. The photon counts were recorded for 84 energy bins ranging from 550 to 800 keV. To process this raw measurement dataset into the input dataset of our neural network model, we summed the photon counts absorbed by each pixel across this entire energy region of interest.
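The reduction from the raw (4, 84) measurement matrix to the 2 × 2 network input can be sketched as below; the pixel-to-grid ordering in the final `reshape` is an assumption of this example, not specified by the text:

```python
import numpy as np

# Raw measurement at one timestamp: photon counts in 84 energy bins
# (550-800 keV) for each of the 4 crystal pixels, i.e., shape (4, 84).
rng = np.random.default_rng(2)
raw = rng.poisson(lam=3.0, size=(4, 84))

# Sum over the energy axis to get one total count per pixel, then arrange
# the four totals into the 2 x 2 grid the network expects (ordering assumed).
per_pixel = raw.sum(axis=1)          # shape (4,)
readout = per_pixel.reshape(2, 2)    # shape (2, 2)
```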

The initial detector signals exhibited significant statistical fluctuations, making it challenging to predict the radiation source direction accurately. When the detector remained stationary at the starting pose, the radiation source direction predicted by our algorithm changed abruptly in the early timestamps. To mitigate this issue, we implemented signal smoothing using the moving-average filter written in Eq. ( 13 ). This process smooths the detector's signals by averaging the signals of the neighboring 2 M  + 1 timestamps. We set M  = 3, resulting in a window size of 7 timestamps. This smoothing reduced the total number of timestamps from 91 to 85. The smoothed signal data served as the input for our machine-learning model:
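The moving-average smoothing described above can be sketched as follows (Eq. (13) itself is not reproduced here; this "valid"-window variant, which drops the M edge timestamps on each side, is consistent with the stated 91 → 85 reduction):

```python
import numpy as np

def smooth_signals(signals, M=3):
    """Moving-average smoothing of per-timestamp detector signals.

    `signals` has shape (T, ...) with one detector readout per timestamp.
    Each output timestamp averages the 2*M + 1 neighboring timestamps, so
    T shrinks by 2*M (91 -> 85 for M = 3), matching the text.
    """
    T = signals.shape[0]
    return np.stack([signals[t - M:t + M + 1].mean(axis=0)
                     for t in range(M, T - M)])

# 91 timestamps of 4-pixel count data, as in the experiment.
signals = np.random.default_rng(3).poisson(5.0, size=(91, 4)).astype(float)
smoothed = smooth_signals(signals, M=3)
```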

We employed a U-Net architecture trained using MC simulation data to predict the direction of the radiation source given the signals from the detector of 2 × 2 configuration. The trained model was then applied to predict the radiation direction based on the smoothed measurement data. With directional information predicted by our model at each timestamp, we applied the MAP method to reconstruct the radiation map. We restricted the MAP analysis to a 15 × 15 square area within the entire space to reduce computational complexity.

Data availability

The Supplementary Movies showing the radiation mapping processes are available. The training data generated with the OpenMC 28 package have been deposited in the GitHub repository ( https://github.com/RyotaroOKabe/radiation_mapping/tree/main/save/openmc_data/saved_files ).

Code availability

The source code is available at https://github.com/RyotaroOKabe/radiation_mapping.git 42 . The GitHub repository and Supplementary Note  8 present the instructions for reproducing the results of our simulations and machine learning.

Connor, D., Martin, P. G. & Scott, T. B. Airborne radiation mapping: overview and application of current and future aerial systems. Int. J. Remote Sens. 37 , 5953–5987 (2016).


Lazna, T., Jilek, T., Gabrlik, P. & Zalud, L. Multi-robotic area exploration for environmental protection. In International Conference on Industrial Applications of Holonic and Multi-Agent Systems , 240–254 (Springer, 2017).

Christie, G. et al. Radiation search operations using scene understanding with autonomous UAV and UGV. J. Field Robot. 34 , 1450–1468 (2017).

Guzman, R., Navarro, R., Ferre, J. & Moreno, M. Rescuer: development of a modular chemical, biological, radiological, and nuclear robot for intervention, sampling, and situation awareness. J. Field Robot. 33 , 931–945 (2016).

Towler, J., Krawiec, B. & Kochersberger, K. Radiation mapping in post-disaster environments using an autonomous helicopter. Remote Sens. 4 , 1995–2015 (2012).


Pavlovsky, R. et al. 3-D radiation mapping in real-time with the localization and mapping platform lamp from unmanned aerial systems and man-portable configurations. arXiv https://doi.org/10.48550/arXiv.1901.05038 (2018).

Hellfeld, D. et al. Gamma-ray point-source localization and sparse image reconstruction using Poisson likelihood. IEEE Trans. Nucl. Sci. 66 , 2088–2099 (2019).


Briones, L., Bustamante, P. & Serna, M. A. Wall-climbing robot for inspection in nuclear power plants. In Proceedings of the 1994 IEEE International Conference on Robotics and Automation , 1409–1414 (IEEE, 1994).

Mazumdar, A., Fittery, A., Ubellacker, W. & Asada, H. H. A ball-shaped underwater robot for direct inspection of nuclear reactors and other water-filled infrastructure. In 2013 IEEE International Conference on Robotics and Automation , 3415–3422 (IEEE, 2013).

Amman, M. et al. Detector module development for the high efficiency multimode imager. In 2009 IEEE Nuclear Science Symposium Conference Record (NSS/MIC) , 981–985 (IEEE, 2009).

Caroli, E., Stephen, J., Di Cocco, G., Natalucci, L. & Spizzichino, A. Coded aperture imaging in x-and gamma-ray astronomy. Space Sci. Rev. 45 , 349–403 (1987).

Galloway, M., Zoglauer, A., Amman, M., Boggs, S. E. & Luke, P. N. Simulation and detector response for the high efficiency multimode imager. Nucl. Instrum. Methods Phys. Res. Sect. A Accel. Spectrometers Detect. Assoc. Equip. 652 , 641–645 (2011).

Vetter, K. et al. Gamma-ray imaging for nuclear security and safety: towards 3-d gamma-ray vision. Nucl. Instrum. Methods Phys. Res. Sect. A Accel. Spectrometers Detect. Assoc. Equip. 878 , 159–168 (2018).

Hanna, D., Sagnières, L., Boyle, P. & MacLeod, A. A directional gamma-ray detector based on scintillator plates. Nucl. Instrum. Methods Phys. Res. Sect. A Accel. Spectrometers Detect. Assoc. Equip. 797 , 13–18 (2015).

Cadena, C. et al. Past, present, and future of simultaneous localization and mapping: toward the robust-perception age. IEEE Trans. Robot. 32 , 1309–1332 (2016).

Vavrek, J. R. et al. Reconstructing the position and intensity of multiple gamma-ray point sources with a sparse parametric algorithm. IEEE Trans. Nucl. Sci. 67 , 2421–2430 (2020).

Pavlovsky, R. et al. MiniPRISM: 3D realtime gamma-ray mapping from small unmanned aerial systems and handheld scenarios. In IEEE NSS-MIC Conference Record (IEEE, 2019).

Hellfeld, D. et al. Free-moving quantitative gamma-ray imaging. Sci. Rep. 11 , 1–14 (2021).

Vavrek, J. R. et al. 4π multi-crystal gamma-ray and neutron response modeling of a dual modality imaging system. In IEEE Symposium on Radiation Measurements and Applications (SORMA West) (2021).

Kitayama, Y., Nogami, M. & Hitomi, K. Feasibility study on a gamma-ray imaging using three-dimensional shadows of gamma rays. In IEEE NSS-MIC (2022).

Hu, Y. et al. A wide energy range and 4 π -view gamma camera with interspaced position-sensitive scintillator array and embedded heavy metal bars. Sensors 23 , 953 (2023).


Sinclair, L. et al. Silicon photomultiplier-based compton telescope for safety and security (SCoTSS). IEEE Trans. Nucl. Sci. 61 , 2745–2752 (2014).

Sinclair, L. E. et al. End-user experience with the SCoTSS Compton imager and directional survey spectrometer. Nucl. Instrum. Methods Phys. Res. Sect. A Accel. Spectrometers Detect. Assoc. Equip. 954 , 161683 (2020).


Murtha, N., Sinclair, L., Saull, P., McCann, A. & MacLeod, A. Tomographic reconstruction of a spatially-extended source from the perimeter of a restricted-access zone using a SCoTSS compton gamma imager. J. Environ. Radioact. 240 , 106758 (2021).


Bonomally, S., Ihantola, S. & Vacheret, A. Enhancing source detection for threat localization. Poster presented at Nuclear Security Detection Workshop. https://indico.cern.ch/event/731980/contributions/3285768/attachments/1829613/2995886/VACHARET_NuSec-Poster-2017.pdf (2017).

Cortesi, M., Zboray, R., Adams, R., Dangendorf, V. & Prasser, H.-M. Concept of a novel fast neutron imaging detector based on THGEM for fan-beam tomography applications. J. Instrum. 7 , C02056 (2012).

Breskin, A. et al. Large-area high-resolution thermal neutron imaging detectors. In International Conference on Neutrons and Their Applications , Vol. 2339, 281–286 (SPIE, 1995).

Romano, P. K. et al. OpenMC: a state-of-the-art Monte Carlo code for research and development. Ann. Nucl. Energy 82 , 90–97 (2015).

Clevert, D.-A., Unterthiner, T. & Hochreiter, S. Fast and accurate deep network learning by exponential linear units (elus). arXiv https://doi.org/10.48550/arXiv.1511.07289 (2015).

LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521 , 436–444 (2015).


Ronneberger, O., Fischer, P. & Brox, T. U-net: convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-assisted Intervention , 234–241 (Springer, 2015).

Talebi, H. & Milanfar, P. Nima: neural image assessment. IEEE Trans. Image Process. 27 , 3998–4011 (2018).

Hou, L., Yu, C.-P. & Samaras, D. Squared Earth mover’s distance-based loss for training deep neural networks. arXiv https://doi.org/10.48550/arXiv.1611.05916 (2016).

Genevay, A., Peyré, G. & Cuturi, M. Learning generative models with Sinkhorn divergences. In International Conference on Artificial Intelligence and Statistics , 1608–1617 (PMLR, 2018).

Levina, E. & Bickel, P. The Earth mover’s distance is the Mallows distance: some insights from statistics. In Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001 , Vol. 2, 251–256 (IEEE, 2001).

Shalev-Shwartz, S. & Tewari, A. Stochastic methods for ℓ1-regularized loss minimization. In Proceedings of the 26th Annual International Conference on Machine Learning , 929–936 (2009).

Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. arXiv https://doi.org/10.48550/arXiv.1412.6980 (2014).

Agrawal, A. et al. Differentiable convex optimization layers. In Advances in Neural Information Processing Systems 32 (NeurIPS, 2019).

Pavlovsky, R. et al. 3D gamma-ray and neutron mapping in real-time with the Localization and Mapping Platform from unmanned aerial systems and man-portable configurations. arXiv https://doi.org/10.48550/arXiv.1908.06114 (2019).

Wang, R., Peethambaran, J. & Chen, D. Lidar point clouds to 3-D urban models: a review. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 11 , 606–627 (2018).

Zhou, Q.-Y., Park, J. & Koltun, V. Open3D: a modern library for 3D data processing. arXiv https://doi.org/10.48550/arXiv.1801.09847 (2018).

Okabe, R., Xue, S. & Vavrek, J. Tetris-inspired detector with neural network for radiation mapping. https://zenodo.org/doi/10.5281/zenodo.10685051 ; https://doi.org/10.5281/zenodo.10685051 (2024).

Acknowledgements

R.O. and M.L. thank F. Frankel for helpful discussions. R.O., T.L., G.K., and M.L. acknowledge the support from U.S. Department of Energy (DOE), Advanced Research Projects Agency-Energy (ARPA-E) DE-AR0001298. R.O. and M.L. are partly supported by DOE Basic Energy Sciences (BES), Award No. DE-SC0021940. R.O. acknowledges support from the Heiwa Nakajima Foundation. M.L. acknowledges the Norman C. Rasmussen Career Development Chair, the MIT Class of 1947 Career Development Chair, and the support from Dr. R. Wachnik. This manuscript has been authored by an author at Lawrence Berkeley National Laboratory under Contract No. DE-AC02-05CH11231 with the U.S. Department of Energy. The U.S. Government retains, and the publisher, by accepting the article for publication, acknowledges, that the U.S. Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for U.S. Government purposes.

Author information

These authors contributed equally: Ryotaro Okabe, Shangjie Xue, Jayson R. Vavrek.

Authors and Affiliations

Quantum Measurement Group, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA

Ryotaro Okabe, Shangjie Xue, Tongtong Liu & Mingda Li

Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA

Ryotaro Okabe

Department of Nuclear Science and Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA

Shangjie Xue, Jiankai Yu, Benoit Forget & Mingda Li

Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA

Shangjie Xue & Stefanie Jegelka

Applied Nuclear Physics Program, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA

Jayson R. Vavrek, Ryan Pavlovsky, Victor Negut, Brian J. Quiter & Joshua W. Cates

Department of Physics, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA

Tongtong Liu

Nuclear Reactor Laboratory, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA

Gordon Kohse & Lin-wen Hu

Contributions

R.O. took the lead in the project. R.O. and S.X. developed the machine learning model. R.O. performed the Monte Carlo simulations with support from T.L., J.Y. and B.F. R.O. wrote the manuscript with support from S.X. and J.V. J.V., R.P., V.N., B.Q. and J.C. carried out the experiment. R.O., S.X. and J.V. carried out data analysis. M.L., S.J., G.K. and L.H. designed the project, provided supervision, and revised the manuscript.

Corresponding authors

Correspondence to Ryotaro Okabe , Lin-wen Hu or Mingda Li .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Peer review

Peer review information.

Nature Communications thanks Hong Joo Kim, Liqian Li and Nicholas Maliszewskyj for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Peer Review File
Description of Additional Supplementary Files
Supplementary Movies 1–11

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

Reprints and permissions

About this article

Cite this article.

Okabe, R., Xue, S., Vavrek, J.R. et al. Tetris-inspired detector with neural network for radiation mapping. Nat Commun 15 , 3061 (2024). https://doi.org/10.1038/s41467-024-47338-w

Download citation

Received : 07 February 2023

Accepted : 27 March 2024

Published : 09 April 2024

DOI : https://doi.org/10.1038/s41467-024-47338-w
