
  • Open access
  • Published: 01 May 2024

Novel applications of Convolutional Neural Networks in the age of Transformers

  • Tansel Ersavas1,
  • Martin A. Smith1,2,3,4 &
  • John S. Mattick1

Scientific Reports volume 14, Article number: 10000 (2024)


  • Computational science
  • Machine learning

Convolutional Neural Networks (CNNs) have been central to the Deep Learning revolution and played a key role in initiating the new age of Artificial Intelligence. However, in recent years newer architectures such as Transformers have dominated both research and practical applications. While CNNs still play critical roles in many of the newer developments such as Generative AI, they are far from being thoroughly understood and utilised to their full potential. Here we show that CNNs can recognise patterns in images with scattered pixels and can be used to analyse complex datasets by transforming them into pseudo images with minimal processing for any high-dimensional dataset, representing a more general approach to the application of CNNs to datasets such as those in molecular biology, text, and speech. We introduce a pipeline called DeepMapper, which allows analysis of very high-dimensional datasets without intermediate filtering and dimension reduction, thus preserving the full texture of the data and enabling detection of small variations normally deemed 'noise'. We demonstrate that DeepMapper can identify very small perturbations in large datasets of mostly random variables, and that it is superior in speed and on par in accuracy with prior work in processing large datasets with large numbers of features.


Introduction

There are exponential increases in data1, especially from highly complex systems whose non-linear interactions and relationships are not well understood, and which can display major or unexpected changes in response to small perturbations, known as the 'Butterfly effect'2.

In domains characterised by high-dimensional data, traditional statistical methods and Machine Learning (ML) techniques make heavy use of feature engineering, incorporating extensive filtering, selection of highly variable parameters, and dimension reduction techniques such as Principal Component Analysis (PCA)3. Most current tools filter out smaller changes in data, mostly considered artefacts or 'noise', which may nonetheless contain information that is paramount to understanding the nature and behaviour of such highly complex systems4.

The emergence of Deep Learning (DL) offers a paradigm shift. DL algorithms, underpinned by adaptive learning mechanisms, can discern both linear and non-linear data intricacies, and open avenues to analyse data in ways that are not possible or practical with conventional techniques5, particularly in complex domains such as image and temporal sequence analysis, molecular biology, and astronomy6. DL models, such as Convolutional Neural Networks (CNNs)7, Recurrent Neural Networks (RNNs)8, Generative Networks9 and Transformers10, have demonstrated exceptional performance in various domains, such as image and speech recognition, natural language processing, and game playing6. CNNs and LSTMs have proved to be effective tools for predicting the behaviour of so-called 'chaotic' systems11. Modern DL systems often surpass human-level performance, and challenge humans even in creative endeavours.

CNNs utilise a unique architecture comprising several layers, including convolutional layers, pooling layers, and fully connected layers, to process and transform the input data hierarchically5. CNNs have no knowledge of sequence, and therefore are generally not used in analysing time-series or similar data, which is traditionally attempted with Recurrent Neural Networks (RNNs)12 and Long Short-Term Memory networks (LSTMs)8 due to their ability to capture temporal patterns. Where CNNs have been employed for sequence or time-series analysis, 1-dimensional (1D) CNNs have been selected because of their vector-based 1D input structure13. However, attempts to analyse such data with 1D CNNs do not always give superior results14. In addition, GPU (Graphics Processing Unit) systems are not always optimised for processing 1D CNNs; therefore, even though 1D CNNs have fewer parameters than 2-dimensional (2D) CNNs, 2D CNNs can outperform 1D CNNs15.

Transformers, introduced by Vaswani et al.10, have recently come to prominence, particularly for tasks where data are in the form of time series or sequences, in domains ranging from language modelling to stock market prediction16. Transformers leverage self-attention, a key component that allows a model to weigh and focus on various parts of an input sequence when producing an output, enabling the capture of long-range dependencies in data. Unlike CNNs, which use local receptive fields, self-attention weighs the significance of various parts of the input data17.

Following success with sequence-based tasks, Transformers are being extended to image processing. Vision Transformers in object detection18, Detection Transformers19 and lately Real-time Detection Transformers all claim superiority over CNNs20. However, their inference operations demand far more resources than CNNs, and they trail CNNs in flexibility. They also suffer from augmentation problems similar to those of CNNs. More recently, Retentive Networks have been offered as an alternative to Transformers21 and may soon challenge the Transformer architecture.

CNNs can recognise dispersed patterns

Even though CNNs are widely used, there are some misconceptions, notably that CNNs are largely limited to image data, and require established spatial relationships between pixels in images, both of which are open to challenge. The latter is of particular importance when considering the potential of CNNs to analyse complex non-image datasets, whose data structures are arbitrary.

Moreover, while CNNs are universal function approximators22, they may not always generalise23, especially if they are trained on data that is insufficient to cover the solution space24. It is also known that they can spontaneously generalise after overfitting, even when supplied with only a small number of training samples, a phenomenon called 'grokking'25,26. CNNs can generalise from scattered data if given enough samples, or if they grok, and this can be determined by observing changes in training versus testing accuracy and loss.

Non-image processing with CNNs

While CNNs have achieved remarkable success in computer vision applications, such as image classification and object detection7,27, they have also been employed, to a lesser degree but with impressive results, in other domains, including: (1) natural language processing, text classification, sentiment analysis and named entity recognition, by treating text data as a one-dimensional image with characters represented as pixels16,28; (2) audio processing, such as speech recognition, speaker identification and audio event detection, by applying convolutions over time-frequency representations of audio signals29; (3) time-series analysis, such as financial market prediction, human activity recognition and medical signal analysis, using one-dimensional convolutions to capture local temporal patterns and learn features from time-series data30; and (4) biopolymer (e.g., DNA) sequencing, using 2D CNNs to accurately classify molecular barcodes in raw signals from Oxford Nanopore sequencers via a transformation that turns a 1D signal into 2D images, improving barcode identification recovery from 38% to over 85%31.

Indeed, CNNs are not perfect tools for image processing, as they do not develop semantic understanding of images even though they can be trained to perform semantic segmentation32. They cannot easily recognise negative images when trained with positive images33. CNNs are also sensitive to the orientation and scale of objects and must rely on augmentation of image datasets, often involving hundreds of variations of the same image34. No such changes in perspective and orientation arise in data converted into flat 2D images.

In the realm of complex domains that generate huge amounts of data, augmentation is usually not required for non-image datasets, as the datasets will be rich enough. Moreover, introducing arbitrary augmentation does not always improve accuracy; indeed, introducing hand-tailored augmentation may hinder analysis 35 . If augmentation is required, it can be introduced in a data-oriented form, but even when using automated augmentation such as AutoAugment 35 or FasterAutoAugment 36 , many of the augmentations (such as shearing, translation, rotation, inversion, etc.) should not be used, and the result should be tested carefully, as augmentation may introduce artefacts.

A frequent problem with handling non-image datasets with many variables is noise. Many algorithms have been developed for noise elimination, most of which are domain specific. CNNs can be trained to use the whole input space with minimal filtering and no dimension reduction, and can find useful information in what might be ascribed as ‘noise’ 4 , 37 . Indeed, a key reason to retain ‘noise’ is to allow discovery of small perturbations that cannot be detected by other methods 11 .

Conversion of non-image data to artificial images for CNN processing

Transforming sequence data to images without resorting to dimension reduction or filtering offers a potent toolset for discerning complex patterns in time-series and sequence data, which potentiates the two major advantages of CNNs compared to RNNs, LSTMs and Transformers. First, CNNs do not depend on past data to recognise current patterns, which increases sensitivity to patterns that appear at the beginning of time-series or sequence data. Second, 2D CNNs are better optimised for GPUs and highly parallelisable, and are consequently faster than other current architectures, which accelerates training and inference while significantly reducing resource and energy consumption during all phases, including image transformation, training, and inference.

Image data such as MNIST, represented as a matrix, can be classified by basic deep networks such as Multilayer Perceptrons (MLPs) by turning the matrix representation into a vector (Fig. 1a). With this approach, analysis becomes increasingly expensive as the image size grows, because the number of MLP input parameters, and with it the computational cost, grows rapidly. In contrast, 2D CNNs can handle the original matrix much faster than MLPs, with equal or better accuracy, and scale to much larger images.

figure 1

Conversion of images to vectors and vice versa. (a) Basic operation of transforming an image to a vector, forming a sequence representation of the numeric values of pixels. (b) Transforming a vector to a matrix, forming an image by encoding numerical values as pixels. During this operation, if the vector is smaller than the nearest m × n matrix, it is padded with zeroes to the nearest m × n.

Just as a simple neural network analyses a 2D image by turning it into a vector, the reciprocal is also true: data in a vector can be converted to a 2D matrix (Fig. 1b). Vectors converted to such matrices form arbitrary patterns that are incomprehensible to the human eye. A similar technique for such mapping has also been proposed by Kovalerchuk et al. using another algorithm called CPC-R38.
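To make the Fig. 1b operation concrete, here is a minimal sketch of the folding step, assuming a square target matrix and zero padding; the function name is illustrative and not drawn from the DeepMapper source.

```python
import numpy as np

def vector_to_matrix(vec):
    """Fold a 1D vector into the smallest square matrix that fits it,
    padding with zeroes when the vector is shorter than m * n (cf. Fig. 1b)."""
    n = int(np.ceil(np.sqrt(len(vec))))
    padded = np.pad(np.asarray(vec, dtype=np.float32), (0, n * n - len(vec)))
    return padded.reshape(n, n)

v = np.arange(7)            # 7 values do not fill a 3 x 3 grid
print(vector_to_matrix(v))  # the last two cells are zero-padded
```

Applied row-wise to a table of observations, this turns each record into a single-channel pseudo image.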

Attribution

An important aspect of any analysis is to be able to identify those variables that are most important and the degree to which they contribute to a given classification. Identifying these variables is particularly challenging in CNNs due to their complex hierarchical architecture, and many non-linear transformations 39 . To address this problem many ‘attribution methods’ have been developed to try to quantify the contribution of each variable (e.g., pixels in images) to the final output for deep neural networks and CNNs 40 .

Saliency maps serve as an intuitive attribution and visualisation tool for CNNs, spotlighting regions in input data that significantly influence the model's predictions 27 . By offering a heatmap representation, these maps illuminate key features that the model deems crucial, thus aiding in demystifying the model's decision-making process. For instance, when analysing an image of a cat, the saliency map would emphasise the cat's distinct features over the background. While their simplicity facilitates understanding even for those less acquainted with deep learning, saliency maps do face challenges, particularly their sensitivity to noise and occasional misalignment with human intuition 41 , 42 , 43 . Nonetheless, they remain a pivotal tool in enhancing model transparency and bridging the interpretability gap between ML models and human comprehension.

Several methods have been proposed for attribution, including Guided Backpropagation 44 , Layer-wise Relevance Propagation 45 , Gradient-weighted Class Activation Mapping 46 , Integrated Gradients 47 , DeepLIFT 48 , and SHAP (SHapley Additive exPlanations) 49 . Many of these methods were developed because it is challenging to identify important input features when there are different images with the same label (e.g., ‘bird’ with many species) presented at different scales, colours, and perspectives. In contrast, most non-image data does not have such variations, as each pixel corresponds to the same feature. For this reason, choosing attributions with minimal processing is sufficient to identify the salient input variables that have the maximal impact on classification.

Here we introduce a new analytical pipeline, DeepMapper, which applies a non-indexed or indexed mapping of the data, representing each data point with one pixel, enabling the classification or clustering of data using 2D CNNs. This simple direct mapping has been tried by others but has not previously been tested on sufficiently large datasets under varied conditions. We use raw data with minimal filtering and no dimension reduction, to preserve the small perturbations in data that are normally removed, in order to assess their impact.

The pipeline includes conversion of data, separation to training and validation, assessment of training quality, attribution, and accumulation of results in a pipeline. The pipeline is run multiple times until a consensus is reached. The significant variables can then be identified using attribution and exported appropriately.

The DeepMapper architecture is shown in Fig.  2 . The complete algorithm of DeepMapper is detailed in the “ Methods ” section and the Python source code is supplied at GitHub 50 .

figure 2

DeepMapper architecture. DeepMapper uses sequence or multi-variate data as input. The first step of DeepMapper is to merge, and if required index, the input files to prepare them in matrix format. The data are normalised using log normalisation, then folded into a matrix. Folding is performed either directly, using the natural order of the data, or by using the index that is generated or supplied during data import. After folding, the data are kept in temporary storage and separated into 'train' and 'test' sets using scikit-learn's train_test_split. Training is done either with CNNs supplied by the PyTorch libraries or with a supplied custom CNN (ResNet18 is used by default). Intermediary results are run through attribution algorithms supplied by Captum51 and saved to the run history log. The run is then repeated until convergence is achieved, or until a pre-determined number of iterations has been performed, shuffling the training, testing and validation data each time. Results are summarised in a report with exportable tables and graphics. Attribution is applied to true positives and true negatives, and these are translated back to features to be added to the reports. Further details can be found directly in the accompanying code50.

DeepMapper implements an approach to processing high-dimensional data without resorting to the excessive filtering and dimension reduction techniques that eliminate smaller perturbations in data, so that differences which would otherwise be filtered out can be identified. The following algorithm is used to achieve this result:

1. Read and set up the running parameters.

2. Read the data into tabulated form as observations, features, and outcome (in the form of labels or, if self-supervised, the input itself). If the input data includes categorical features, these should be converted to numbers and normalised before being fed to DeepMapper.

3. Identify features and labels.

4. Perform only basic filtering, eliminating observations or features whose values are all 0 or empty.

5. Normalise features.

6. Transform the tabulated data to 2-dimensional matrices as illustrated in Fig. 1b by applying a vector-to-matrix transformation.

7. If the analysis is supervised, transform the class labels to output matrices.

8. Begin iteration:

   a. Separate the data into training and validation groups.

   b. Train on the dataset for the required number of epochs, until satisfactory testing accuracy and loss are reached or a pre-determined maximum number of iterations has been performed.

   c. If satisfactory testing results are obtained, perform attributions by associating each result with its contributing input pixels using Captum, a Python library for attributions51, and accumulate the attribution results for each class.

9. If training is satisfactory: tabulate the attribution results by averaging the accumulated attributions, save the model, and report the results.
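As a hedged illustration of steps 6–9, the sketch below folds features into single-channel images, trains a ResNet18, and maps Integrated Gradients attributions back to features (one pixel per feature). The function name, hyperparameters, and the full-batch training loop are simplifications for brevity, not the released DeepMapper code50.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.model_selection import train_test_split
from torchvision.models import resnet18
from captum.attr import IntegratedGradients

def run_once(X, y, n_classes, epochs=15):
    # Fold each observation (row) into a side x side single-channel image.
    side = int(np.ceil(np.sqrt(X.shape[1])))
    pad = side * side - X.shape[1]
    imgs = np.pad(X.astype(np.float32), ((0, 0), (0, pad))).reshape(-1, 1, side, side)
    X_tr, X_te, y_tr, y_te = train_test_split(imgs, y, test_size=0.25, shuffle=True)

    model = resnet18(num_classes=n_classes)
    model.conv1 = nn.Conv2d(1, 64, 7, 2, 3, bias=False)  # accept 1-channel input
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    xb, yb = torch.from_numpy(X_tr), torch.from_numpy(np.asarray(y_tr)).long()
    for _ in range(epochs):                               # full-batch for brevity
        opt.zero_grad()
        loss = nn.functional.cross_entropy(model(xb), yb)
        loss.backward()
        opt.step()

    # Attribute test predictions back to pixels, then average per feature.
    ig = IntegratedGradients(model.eval())
    attr = ig.attribute(torch.from_numpy(X_te),
                        target=torch.from_numpy(np.asarray(y_te)).long())
    return attr.abs().mean(dim=0).flatten()[:X.shape[1]]  # importance per feature
```

In the actual pipeline this loop is repeated with reshuffled splits until the run history converges, and only attributions from true positives and true negatives are accumulated.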

The results of a DeepMapper analysis can be used in two ways:

Supervised: DeepMapper produces a list of features that played a prominent role in the differentiation of classes.

Self-supervised: Highlights the most important features in differentiating observations from each other in a non-linear fashion. The output can be used as an alternative feature selection tool for dimension reduction.

In both modes, any hidden layer can be examined as latent space. A special bottleneck layer can be introduced to reduce dimensions for clustering purposes.

We present a simple example to demonstrate that CNNs can readily interpret data with a well dispersed pattern of pixels, using the MNIST dataset, which is widely used for hand-written digit recognition and which humans as well as CNNs can easily recognise and classify based on the obvious spatial relationships between pixels (Fig. 3). The shuffled version poses a more complicated problem than datasets such as Gisette52, which was developed to distinguish between 4 and 9: it includes all ten digits and fully randomises the pixel positions. It can be regenerated with the supplied script50; changing the seed will generate different patterns.

figure 3

A sample from MNIST dataset (left side of each image) and its shuffled counterpart (right side).

We randomly shuffled the data in Fig. 3 using the same seed50 to obtain 60,000 training images such as those shown on the right side of each digit, and validated the results with a separate batch of 20,000 images (Fig. 3). Although the resulting images are no longer recognisable by eye, a CNN has no difficulty distinguishing and classifying each pattern, with ~2% testing error relative to the reference data (Fig. 4). This result demonstrates that CNNs can accurately recognise global patterns in images without relying on local relationships between neighbouring pixels. It also confirms the finding that shuffling images only marginally increases training loss23, and extends it to testing loss (Fig. 4).
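The shuffling itself is a single fixed permutation applied identically to every image, so class-specific global patterns survive even though local neighbourhoods are destroyed. A minimal sketch follows; the seed value and data-loading details are illustrative, and the released script50 is the reference:

```python
import numpy as np
from torchvision import datasets

rng = np.random.default_rng(seed=50)           # any fixed seed gives one pattern
perm = rng.permutation(28 * 28)                # one shared pixel permutation

mnist = datasets.MNIST(root="data", train=True, download=True)
flat = mnist.data.numpy().reshape(-1, 28 * 28)
shuffled = flat[:, perm].reshape(-1, 28, 28)   # same scattering for every image
```

Because every image is scattered in the same way, a CNN can still learn the permuted class patterns; a different seed simply produces a different, equally learnable scattering.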

figure 4

Results of training on the MNIST dataset (a) and the shuffled dataset (b) with the PyTorch model ResNet1850. The charts show that although training continued for 50 epochs, about 15 epochs would have been enough for the shuffled images (b), as further training starts to cause overfitting. The drop in accuracy between normal and shuffled images is about 3%, and this difference cannot be recovered by using more sophisticated CNNs with more layers, indicating that shuffling images causes a measurable loss of information, yet the shuffled images still hold patterns recognisable by CNNs.

Testing DeepMapper

Finding slight changes in very few variables in otherwise seemingly random datasets with large numbers of variables is like finding a needle in a haystack. Such differences in data are almost impossible to detect using traditional analysis tools because small variations are usually filtered out before analysis.

We devised a simple test case to determine whether DeepMapper can detect one or more variables with small but distinct variations in otherwise randomly generated data. As an example of a high-dimensional dataset, we generated 10,000 data items with 18,225 numeric variables using PyTorch's uniform random algorithms53. The generator sets 18,223 of these variables to random numbers in the range 0–1, and draws the remaining two variables from two distinct groups, as shown in Table 1.

We call this type of dataset a 'Needle in a haystack' (NIHS) dataset: a very small number of variables with small variance are hidden among a set of random variables that is orders of magnitude larger than the meaningful component. We provide a script that can generate this and similar datasets among the supplied source code50.
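For illustration, a dataset of this shape can be generated along the following lines; the exact group values of the two 'needle' variables are those given in Table 1, so the scales below are placeholders:

```python
import torch

n_samples, n_features = 10_000, 18_225    # 18,225 folds into a 135 x 135 image
X = torch.rand(n_samples, n_features)     # 18,223 purely random variables in [0, 1)
y = torch.randint(0, 2, (n_samples,))     # two classes to be distinguished

# Two 'needles' whose variance differs slightly between the classes
# (placeholder scales, not the values from Table 1).
for col, scale in [(0, 0.95), (1, 0.90)]:
    X[y == 1, col] = scale * torch.rand(int((y == 1).sum()))
```

Note that 18,225 was chosen so that each record folds exactly into a 135 × 135 pseudo image with no padding.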

DeepMapper was able to accurately classify the two datasets (Fig. 5). Furthermore, using attribution, DeepMapper was also able to identify the two datapoints that have different variances in the two classes. Note that DeepMapper may not always find all the changes on the first attempt, as neural network weight initialisation is a stochastic process. However, DeepMapper overcomes this by running multiple iterations to establish acceptable training and testing accuracies, as described in the Methods.

figure 5

In this demonstration of analysis of high-dimensional data with very small perturbations, DeepMapper finds small variations in a few variables (in this example two) out of a very large number of random variables (here 18,225). (a) DeepMapper representations of each record. (b) The result of the test run of the classification with unseen data (3750 elements). (c) The first and second variables in the graph are measurably higher than the other variables.

Comparison of DeepMapper with DeepInsight

DeepInsight54 is the most general approach published to date for converting non-image data into image-like structures, with the claim that these processed structures allow CNNs to capture complex patterns and features in the data. DeepInsight offers an algorithm that creates images in which similar features are collated into a 'well organised image form', by applying one of several dimensionality reduction algorithms (e.g., t-SNE, PCA or KPCA)54. However, these algorithms add computational complexity, potentially eliminate valuable information, limit the ability of CNNs to find small perturbations, and make it more difficult to use attribution to determine the most notable features impacting the analysis, as multiple features may overlap in the transformed image. In contrast, DeepMapper uses a direct mapping mechanism in which each feature corresponds to one pixel.

To identify important input variables, the DeepInsight authors later developed DeepFeature55, which uses an elaborate mechanism to associate image areas identified by attribution methods with the input variables. DeepMapper uses a simpler approach: as each pixel corresponds to only one variable, it can use any of the attribution methods to link results to its input space. While both DeepMapper and DeepInsight follow the general idea that non-image data can be processed with 2D CNNs, DeepMapper uses a much simpler and faster algorithm, whereas DeepInsight applies a sophisticated set of algorithms to convert non-image data to images, dramatically increasing the computational cost. The DeepInsight conversion process is not designed to utilise GPUs, so it cannot be accelerated by better hardware, and the resulting images may contain more pixels than there are data points, further impacting performance.

One of the biggest differences between DeepFeature and DeepMapper is that DeepFeature in many cases selects multiple features during attribution, because each DeepInsight pixel represents multiple values, whereas each DeepMapper pixel represents one input feature; DeepMapper can therefore determine differentiating features with pinpoint accuracy at a resolution of one pixel per feature.

The DeepInsight manuscript offers various example datasets to demonstrate its abilities. However, many of the examples are low-dimensional (20–4000 features), while today's complex datasets regularly require tens of thousands to millions of features, as in genome analysis in biology and radio-telescope analysis in astronomy. As such, several of the DeepInsight examples have insufficient dimensions for a mechanism such as DeepMapper, which is intended for the 10,000 or more dimensions required by modern complex datasets. The DeepInsight examples include a speech dataset from the TIMIT corpus, with 39 dimensions; the Relathe text dataset, derived from newsgroup documents partitioned evenly across different newsgroups, with 1427 samples and 4322 dimensions; and ringnorm-DELVE, an implementation of Leo Breiman's ringnorm example, a 20-dimensional, 2-class classification problem with 7400 samples54. Another example, Madelon, is an artificially generated dataset with 2600 samples and 500 dimensions, of which only 5 principal and 20 derived variables contain information. Instead, we used a much more complicated example than Madelon, the NIHS dataset50 that we used to test DeepMapper in the first place. We attempted to run DeepInsight with the NIHS data, but we could not get it to train properly, and for this reason we cannot supply a comparison.

The most complex problem published by DeepInsight was the analysis of a public RNA sequencing gene expression dataset from TCGA ( https://cancergenome.nih.gov/ ) containing 6216 samples of 60,483 genes or dimensions, of which DeepInsight used 19,319. We selected this example as the second demonstration of application of DeepMapper to high dimensional data, as well as a benchmark for comparison with DeepInsight .

We generated the data using the R script offered by DeepInsight 54 and ran DeepMapper as well as DeepInsight using the generated dataset to compare accuracy and speed. In this test DeepMapper exhibited much improved processing speed with near identical accuracy (Table 2 , Fig.  6 ).

figure 6

Analysis of TCGA data by DeepInsight vs DeepMapper: The image on the top was generated by DeepInsight using its default values and a t-SNE transformer supplied by DeepInsight . The image at the bottom was generated by DeepMapper. Image conversion and training speeds and the analysis results can be found in Table 2 .

CNNs are fundamentally sophisticated pattern matchers that can establish intricate mappings between input features and output representations 6 . They excel at transforming various inputs into outputs, including identifying classes or bounding boxes, through a series of operations involving convolution, pooling, and activation functions 7 , 56 .

Even though CNNs are at the centre of many of today's revolutionary AI systems, from self-driving cars to generative AI systems such as Dall-E-2, MidJourney and Stable Diffusion, they are still neither well understood nor efficiently utilised, and their usage beyond image analysis has been limited.

While CNNs used in image analysis are historically and practically constrained to a 224 × 224 matrix or a similar fixed-size input, this limitation applies to pre-trained models. When CNNs have not been pre-trained, a much wider variety of input shapes can be selected, depending on the CNN architecture. Some CNNs, such as ResNet18, are more flexible in their input size because they are implemented with adaptive pooling layers57. This provides the flexibility to choose optimal sizes for the task at hand in non-image applications, as most non-image applications will not use pre-trained CNNs.
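This flexibility is easy to verify: an untrained torchvision ResNet18 ends in an nn.AdaptiveAvgPool2d layer57 that reduces whatever final feature map it receives to 1 × 1, so input sizes other than 224 × 224 pass through unchanged (the sizes below are arbitrary examples):

```python
import torch
from torchvision.models import resnet18

model = resnet18(num_classes=2)            # untrained, so no fixed input size
for side in (64, 135, 224):                # e.g. 135 x 135 holds 18,225 features
    x = torch.randn(1, 3, side, side)
    print(side, model(x).shape)            # torch.Size([1, 2]) in every case
```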

Here we have demonstrated uses of CNNs that are outside the norm. There is a need for analysis of complex data with many thousands of features that are not primarily images, and there is a lack of tools that offer minimal conversion of non-image data to image-like formats that can then easily be processed with CNNs in classification and clustering tasks. As much of these data come from complex systems with many features, DeepMapper offers a way of investigating such data that may not be possible with traditional approaches.

Although DeepMapper currently uses a CNN as its AI component, alternative analytic strategies such as Vision Transformers18 or RetNets21, which have great potential for this application, can easily be substituted with minimal changes. While Transformers and RetNets have input size limitations for inference in terms of the number of tokens, Vision Transformers can handle much larger inputs by dividing images into segments that incorporate multiple pixels18. This type of approach is applicable to both Transformers and RetNets, and DeepMapper can leverage these newer architectures, and others, in the future57.

Data availability

DeepMapper is released as an open source tool on GitHub https://github.com/tansel/deepmapper . Data that is not available from GitHub because of size constraints can be requested from the authors.

References

1. Taylor, P. Volume of data/information created, captured, copied, and consumed worldwide from 2010 to 2020, with forecasts from 2021 to 2025. https://www.statista.com/statistics/871513/worldwide-data-created/ (2023).

2. Ghys, É. The butterfly effect. In Proceedings of the 12th International Congress on Mathematical Education: Intellectual and Attitudinal Challenges, pp. 19–39 (Springer, 2015).

3. Jolliffe, I. T. Mathematical and statistical properties of sample principal components. In Principal Component Analysis, pp. 29–61 (Springer). https://doi.org/10.1007/0-387-22440-8_3 (2002).

4. Landauer, R. The noise is the signal. Nature 392, 658–659. https://doi.org/10.1038/33551 (1998).

5. Goodfellow, I., Bengio, Y. & Courville, A. Deep Learning (MIT Press). http://www.deeplearningbook.org (2016).

6. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444. https://doi.org/10.1038/nature14539 (2015).

7. Krizhevsky, A., Sutskever, I. & Hinton, G. E. ImageNet classification with deep convolutional neural networks. Commun. ACM 60, 84–90. https://doi.org/10.1145/3065386 (2017).

8. Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9, 1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735 (1997).

9. Goodfellow, I. et al. Generative adversarial nets. Commun. ACM 63, 139–144. https://doi.org/10.1145/3422622 (2020).

10. Vaswani, A. et al. Attention is all you need. In NIPS'17: Proceedings of the 31st International Conference on Neural Information Processing Systems, pp. 6000–6010. https://doi.org/10.5555/3295222.3295349 (2017).

11. Barrio, R. et al. Deep learning for chaos detection. Chaos 33, 073146. https://doi.org/10.1063/5.0143876 (2023).

12. Levin, E. A recurrent neural network: Limitations and training. Neural Netw. 3, 641–650. https://doi.org/10.1016/0893-6080(90)90054-O (1990).

13. LeCun, Y. & Bengio, Y. Convolutional networks for images, speech, and time series. In The Handbook of Brain Theory and Neural Networks, pp. 255–258 (MIT Press). https://doi.org/10.5555/303568.303704 (1998).

14. Wu, Y., Yang, F., Liu, Y., Zha, X. & Yuan, S. A comparison of 1-D and 2-D deep convolutional neural networks in ECG classification. arXiv preprint arXiv:1810.07088. https://doi.org/10.48550/arXiv.1810.07088 (2018).

15. Hu, J. et al. A multichannel 2D convolutional neural network model for task-evoked fMRI data classification. Comput. Intell. Neurosci. 2019, 5065214. https://doi.org/10.1155/2019/5065214 (2019).

16. Zhang, S. et al. A deep learning framework for modeling structural features of RNA-binding protein targets. Nucleic Acids Res. 44, e32. https://doi.org/10.1093/nar/gkv1025 (2016).

17. Maurício, J., Domingues, I. & Bernardino, J. Comparing vision transformers and convolutional neural networks for image classification: A literature review. Appl. Sci. 13, 5521. https://doi.org/10.3390/app13095521 (2023).

18. Dosovitskiy, A. et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929. https://doi.org/10.48550/arXiv.2010.11929 (2020).

19. Carion, N. et al. End-to-end object detection with transformers. In Computer Vision—ECCV 2020, pp. 213–229 (Springer). https://doi.org/10.1007/978-3-030-58452-8_13 (2020).

20. Lv, W. et al. DETRs beat YOLOs on real-time object detection. arXiv preprint arXiv:2304.08069. https://doi.org/10.48550/arXiv.2304.08069 (2023).

21. Sun, Y. et al. Retentive network: A successor to Transformer for large language models. arXiv preprint arXiv:2307.08621. https://doi.org/10.48550/arXiv.2307.08621 (2023).

22. Zhou, D.-X. Universality of deep convolutional neural networks. Appl. Comput. Harmonic Anal. 48, 787–794. https://doi.org/10.1016/j.acha.2019.06.004 (2020).

23. Zhang, C., Bengio, S., Hardt, M., Recht, B. & Vinyals, O. Understanding deep learning (still) requires rethinking generalization. Commun. ACM 64, 107–115. https://doi.org/10.1145/3446776 (2021).

24. Ma, W., Papadakis, M., Tsakmalis, A., Cordy, M. & Traon, Y. L. Test selection for deep learning systems. ACM Trans. Softw. Eng. Methodol. 30, 13. https://doi.org/10.1145/3417330 (2021).

25. Liu, Z., Michaud, E. J. & Tegmark, M. Omnigrok: Grokking beyond algorithmic data. arXiv preprint arXiv:2210.01117. https://doi.org/10.48550/arXiv.2210.01117 (2022).

26. Power, A., Burda, Y., Edwards, H., Babuschkin, I. & Misra, V. Grokking: Generalization beyond overfitting on small algorithmic datasets. arXiv preprint arXiv:2201.02177. https://doi.org/10.48550/arXiv.2201.02177 (2022).

27. Simonyan, K., Vedaldi, A. & Zisserman, A. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034. https://doi.org/10.48550/arXiv.1312.6034 (2013).

28. Kim, Y. Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882. https://doi.org/10.48550/arXiv.1408.5882 (2014).

29. Abdel-Hamid, O. et al. Convolutional neural networks for speech recognition. IEEE/ACM Trans. Audio Speech Lang. Process. 22, 1533–1545. https://doi.org/10.1109/TASLP.2014.2339736 (2014).

30. Hatami, N., Gavet, Y. & Debayle, J. Classification of time-series images using deep convolutional neural networks. In Proceedings of the Tenth International Conference on Machine Vision (ICMV 2017) 10696, 106960Y. https://doi.org/10.1117/12.2309486 (2018).

31. Smith, M. A. et al. Molecular barcoding of native RNAs using nanopore sequencing and deep learning. Genome Res. 30, 1345–1353. https://doi.org/10.1101/gr.260836.120 (2020).

32. Emek Soylu, B. et al. Deep-learning-based approaches for semantic segmentation of natural scene images: A review. Electronics 12, 2730. https://doi.org/10.3390/electronics12122730 (2023).

33. Hosseini, H., Xiao, B., Jaiswal, M. & Poovendran, R. On the limitation of convolutional neural networks in recognizing negative images. In 16th IEEE International Conference on Machine Learning and Applications, pp. 352–358. https://ieeexplore.ieee.org/document/8260656 (2017).

34. Montserrat, D. M., Lin, Q., Allebach, J. & Delp, E. J. Training object detection and recognition CNN models using data augmentation. Electron. Imaging 2017, 27–36. https://doi.org/10.2352/ISSN.2470-1173.2017.10.IMAWM-163 (2017).

35. Cubuk, E. D., Zoph, B., Mane, D., Vasudevan, V. & Le, Q. V. AutoAugment: Learning augmentation policies from data. arXiv preprint arXiv:1805.09501. https://doi.org/10.48550/arXiv.1805.09501 (2018).

36. Hataya, R., Zdenek, J., Yoshizoe, K. & Nakayama, H. Faster AutoAugment: Learning augmentation strategies using backpropagation. In Computer Vision—ECCV 2020: 16th European Conference, Proceedings, Part XXV, pp. 1–16 (Springer). https://doi.org/10.1007/978-3-030-58595-2_1 (2020).

37. Xiao, K., Engstrom, L., Ilyas, A. & Madry, A. Noise or signal: The role of image backgrounds in object recognition. arXiv preprint arXiv:2006.09994. https://doi.org/10.48550/arXiv.2006.09994 (2020).

38. Kovalerchuk, B., Kalla, D. C. & Agarwal, B. Deep learning image recognition for non-images. In Integrating Artificial Intelligence and Visualization for Visual Knowledge Discovery (eds Kovalerchuk, B. et al.), pp. 63–100 (Springer). https://doi.org/10.1007/978-3-030-93119-3_3 (2022).

39. Samek, W., Binder, A., Montavon, G., Lapuschkin, S. & Müller, K.-R. Evaluating the visualization of what a deep neural network has learned. IEEE Trans. Neural Netw. Learn. Syst. 28, 2660–2673. https://doi.org/10.1109/tnnls.2016.2599820 (2017).

40. Montavon, G., Samek, W. & Müller, K.-R. Methods for interpreting and understanding deep neural networks. Digital Signal Process. 73, 1–15. https://doi.org/10.1016/j.dsp.2017.10.011 (2018).

41. De Cesarei, A., Cavicchi, S., Cristadoro, G. & Lippi, M. Do humans and deep convolutional neural networks use visual information similarly for the categorization of natural scenes? Cognit. Sci. 45, e13009. https://doi.org/10.1111/cogs.13009 (2021).

42. Kindermans, P.-J. et al. The (un)reliability of saliency methods. In Explainable AI: Interpreting, Explaining and Visualizing Deep Learning. Lecture Notes in Computer Science 11700, pp. 267–280 (Springer). https://doi.org/10.1007/978-3-030-28954-6_14 (2019).

43. Zeiler, M. D. & Fergus, R. Visualizing and understanding convolutional networks. In Computer Vision—ECCV 2014 (eds Fleet, D., Pajdla, T., Schiele, B. & Tuytelaars, T.), pp. 818–833 (Springer). https://doi.org/10.1007/978-3-319-10590-1_53 (2014).

44. Springenberg, J. T., Dosovitskiy, A., Brox, T. & Riedmiller, M. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806. https://doi.org/10.48550/arXiv.1412.6806 (2014).

45. Binder, A., Montavon, G., Lapuschkin, S., Müller, K.-R. & Samek, W. Layer-wise relevance propagation for neural networks with local renormalization layers. In Artificial Neural Networks and Machine Learning—ICANN 2016: Proceedings of the 25th International Conference on Artificial Neural Networks, pp. 63–71 (Springer). https://doi.org/10.1007/978-3-319-44781-0_8 (2016).

46. Selvaraju, R. R. et al. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the 2017 IEEE International Conference on Computer Vision, pp. 618–626. https://ieeexplore.ieee.org/document/8237336 (2017).

47. Sundararajan, M., Taly, A. & Yan, Q. Axiomatic attribution for deep networks. In Proceedings of the 34th International Conference on Machine Learning 70, 3319–3328. https://doi.org/10.5555/3305890.3306024 (2017).

48. Shrikumar, A., Greenside, P. & Kundaje, A. Learning important features through propagating activation differences. In Proceedings of the 34th International Conference on Machine Learning 70, 3145–3153. https://doi.org/10.5555/3305890.3306006 (2017).

49. Lundberg, S. M. & Lee, S.-I. A unified approach to interpreting model predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems, pp. 4768–4777. https://doi.org/10.5555/3295222.3295230 (2017).

50. Ersavas, T. DeepMapper. https://github.com/tansel/deepmapper (2023).

51. Kokhlikyan, N. et al. Captum: A unified and generic model interpretability library for PyTorch. arXiv preprint arXiv:2009.07896. https://doi.org/10.48550/arXiv.2009.07896 (2020).

52. Guyon, I., Gunn, S., Ben-Hur, A. & Dror, G. Gisette. UCI Machine Learning Repository. https://archive.ics.uci.edu/dataset/170/gisette (2008).

53. PyTorch. torch.rand. https://pytorch.org/docs/stable/generated/torch.rand.html (2023).

54. Sharma, A., Vans, E., Shigemizu, D., Boroevich, K. A. & Tsunoda, T. DeepInsight: A methodology to transform a non-image data to an image for convolution neural network architecture. Sci. Rep. 9, 11399. https://doi.org/10.1038/s41598-019-47765-6 (2019).

55. Sharma, A., Lysenko, A., Boroevich, K. A., Vans, E. & Tsunoda, T. DeepFeature: Feature selection in nonimage data using convolutional neural network. Brief. Bioinform. 22, bbab297. https://doi.org/10.1093/bib/bbab297 (2021).

56. Simonyan, K. & Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556. https://doi.org/10.48550/arXiv.1409.1556 (2014).

57. PyTorch. AdaptiveAvgPool2d. https://pytorch.org/docs/stable/generated/torch.nn.AdaptiveAvgPool2d.html (2023).


Acknowledgements

We thank Murat Karaorman, Mitchell Cummins, and Fatemeh Vafaee for helpful advice and comments on the manuscript. This research is supported by an Australian Government Research Training Program Scholarships RSAI8000 and RSAP1000 to T.E., a Fonds de Recherche du Quebec Santé Junior 1 Award 284217 to M.A.S., and UNSW SHARP Grant RG193211 to J.S.M.

Author information

Authors and affiliations

School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Sydney, NSW, 2052, Australia

Tansel Ersavas, Martin A. Smith & John S. Mattick

Department of Biochemistry and Molecular Medicine, Faculty of Medicine, Université de Montréal, Montréal, QC, H3C 3J7, Canada

Martin A. Smith

CHU Sainte-Justine Research Centre, Montreal, Canada

UNSW RNA Institute, UNSW Sydney, Australia


Contributions

T.E. developed the methods, implemented DeepMapper and produced the first draft of the paper. J.S.M. provided advice, structured the paper, and edited it for improved readability and clarity. M.A.S. provided advice and edited the paper.

Corresponding authors

Correspondence to Tansel Ersavas or John S. Mattick .

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article

Ersavas, T., Smith, M.A. & Mattick, J.S. Novel applications of Convolutional Neural Networks in the age of Transformers. Sci Rep 14 , 10000 (2024). https://doi.org/10.1038/s41598-024-60709-z

Download citation

Received : 16 January 2024

Accepted : 26 April 2024

Published : 01 May 2024

DOI : https://doi.org/10.1038/s41598-024-60709-z



  • Open access
  • Published: 16 January 2024

A review of graph neural networks: concepts, architectures, techniques, challenges, datasets, applications, and future directions

  • Bharti Khemani1,
  • Shruti Patil2,
  • Ketan Kotecha2 &
  • Sudeep Tanwar3

Journal of Big Data volume 11, Article number: 18 (2024)


Deep learning has seen significant growth recently and is now applied to a wide range of conventional use cases, including graphs. Graph data provides relational information between elements and is a standard data format for various machine learning and deep learning tasks. Models that can learn from such inputs are essential for working with graph data effectively. This paper identifies nodes and edges within specific applications, such as text, entities, and relations, to create graph structures. Different applications may require various graph neural network (GNN) models. GNNs facilitate the exchange of information between nodes in a graph, enabling them to understand dependencies within the nodes and edges. The paper delves into specific GNN models like graph convolution networks (GCNs), GraphSAGE, and graph attention networks (GATs), which are widely used in various applications today. It also discusses the message-passing mechanism employed by GNN models and examines the strengths and limitations of these models in different domains. Furthermore, the paper explores the diverse applications of GNNs, the datasets commonly used with them, and the Python libraries that support GNN models. It offers an extensive overview of the landscape of GNN research and its practical implementations.

Introduction

Graph Neural Networks (GNNs) have emerged as a transformative paradigm in machine learning and artificial intelligence. The ubiquitous presence of interconnected data in various domains, from social networks and biology to recommendation systems and cybersecurity, has fueled the rapid evolution of GNNs. These networks have displayed remarkable capabilities in modeling and understanding complex relationships, making them pivotal in solving real-world problems that traditional machine-learning models struggle to address. GNNs’ unique ability to capture intricate structural information inherent in graph-structured data is significant. This information often manifests as dependencies, connections, and contextual relationships essential for making informed predictions and decisions. Consequently, GNNs have been adopted and extended across various applications, redefining what is possible in machine learning.

In this comprehensive review, we embark on a journey through the multifaceted landscape of Graph Neural Networks, encompassing an array of critical aspects. Our study is motivated by the ever-increasing literature and diverse perspectives within the field. We aim to provide researchers, practitioners, and students with a holistic understanding of GNNs, serving as an invaluable resource to navigate the intricacies of this dynamic field. The scope of this review is extensive, covering fundamental concepts that underlie GNNs, various architectural designs, techniques for training and inference, prevalent challenges and limitations, the diversity of datasets utilized, and practical applications spanning a myriad of domains. Furthermore, we delve into the intriguing future directions that GNN research will likely explore, shedding light on the exciting possibilities.

In recent years, deep learning (DL) has been called the gold standard in machine learning (ML). It has also steadily evolved into the most widely used computational technique in ML, producing excellent results on various challenging cognitive tasks, sometimes even matching or outperforming human ability. One benefit of DL is its capacity to learn from enormous amounts of data [1]. GNN variations such as graph convolutional networks (GCNs), graph attention networks (GATs), and GraphSAGE have shown groundbreaking performance on various deep learning tasks in recent years [2].

A graph is a data structure that consists of nodes (also called vertices) and edges. Mathematically, it is defined as G = (V, E), where V denotes the nodes and E denotes the edges. Edges in a graph can be directed or undirected, depending on whether directional dependencies exist between nodes. A graph can represent various data structures, such as social networks, knowledge graphs, and protein–protein interaction networks. Graphs are non-Euclidean spaces, meaning that the distance between two nodes in a graph is not necessarily equal to the distance between their coordinates in a Euclidean space. This makes applying traditional neural networks to graph data difficult, as they are typically designed for Euclidean data.
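In code, this definition translates directly into a node feature matrix for V and an edge list for E; the COO-style layout below is the convention used by common GNN libraries such as PyTorch Geometric (shapes and values are illustrative):

```python
import torch

x = torch.randn(4, 3)                      # V: 4 nodes, each with 3 features
edge_index = torch.tensor([[0, 1, 1, 2],   # E: source nodes
                           [1, 0, 2, 3]])  #    target nodes (directed edges)
```

Note that, unlike an image matrix, neither the node order nor the edge order carries meaning, which is exactly why grid-based convolutions do not transfer directly.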

Graph neural networks (GNNs) are a type of deep learning model that can be used to learn from graph data. GNNs use a message-passing mechanism to aggregate information from neighboring nodes, allowing them to capture the complex relationships in graphs. GNNs are effective for various tasks, including node classification, link prediction, and clustering.

Organization of paper

The paper is organized as follows:

The primary focus of this research is to comprehensively examine Concepts, Architectures, Techniques, Challenges, Datasets, Applications, and Future Directions within the realm of Graph Neural Networks.

The paper delves into the Evolution and Motivation behind the development of Graph Neural Networks, including an analysis of the growth of publication counts over the years.

It provides an in-depth exploration of the Message Passing Mechanism used in Graph Neural Networks.

The study presents a concise summary of GNN learning styles and GNN models, complemented by an extensive literature review.

The paper thoroughly analyzes the Advantages and Limitations of GNN models when applied to various domains.

It offers a comprehensive overview of GNN applications, the datasets commonly used with GNNs, and the array of Python libraries that support GNN models.

In addition, the research identifies and addresses specific research gaps, outlining potential future directions in the field.

" Introduction " section describes the Introduction to GNN. " Background study " section provides background details in terms of the Evolution of GNN. " Research motivation " section describes the research motivation behind GNN. Section IV describes the GNN message-passing mechanism and the detailed description of GNN with its Structure, Learning Styles, and Types of tasks. " GNN Models and Comparative Analysis of GNN Models " section describes the GNN models with their literature review details and comparative study of different GNN models. " Graph Neural Network Applications " section describes the application of GNN. And finally, future direction and conclusions are defined in " Future Directions of Graph Neural Network " and " Conclusions " sections, respectively. Figure  1 gives the overall structure of the paper.

figure 1

The overall structure of the paper

Background study

As shown in Fig. 2 below, the evolution of GNNs started in 2005. Over the past five years, research in this area has been carried out in increasing depth, and graph neural networks are now widely used by researchers in fields such as NLP, computer vision, and healthcare.

figure 2

Year-wise publication count of GNN (2005–2022)

Graph neural network research evolution

Graph neural networks (GNNs) were first proposed in 2005, but only recently have they begun to gain traction. GNNs were first introduced by Gori [2005] and Scarselli [2004, 2009]. A node is naturally defined by its attributes and the nodes connected to it in the graph. A GNN aims to learn a state embedding h_v ∈ ℝ^s that encapsulates the neighborhood data of each node. The state embedding h_v is an s-dimensional vector of node v that can be used to generate an output o_v, such as the predicted node label distribution [30]. Thomas Kipf and Max Welling introduced the graph convolutional network (GCN) in 2017. A GCN layer defines a first-order approximation of a localized spectral filter on graphs. GCNs can be thought of as convolutional neural networks that have been extended to handle graph-structured data.
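Written out, the original formulation computes the state and output of each node with two parametric functions; the following is a restatement of the Scarselli et al. equations in the notation above:

$$h_v = f\left(x_v,\; x_{co[v]},\; h_{ne[v]},\; x_{ne[v]}\right), \qquad o_v = g\left(h_v,\; x_v\right)$$

where $x_v$ are the features of node v, $x_{co[v]}$ the features of its incident edges, $h_{ne[v]}$ and $x_{ne[v]}$ the states and features of its neighbors, f is the local transition function, and g the local output function.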

Graph neural network evolution

As shown in Fig. 3 below, research on graph neural networks (GNNs) began in 2005 and is still ongoing. GNNs define a broad class of models over graphs that can be used for node-focused tasks, edge-focused tasks, graph-focused tasks, and many other applications. In 2005, Marco Gori introduced the concept of GNNs as an extension of recursive neural networks [4]. Franco Scarselli also explained the use of GNNs for ranking web pages in 2005 [5]. Swapnil Gandhi and Anand Padmanabha Iyer of Microsoft Research introduced distributed deep graph learning at scale, which defines a deep graph neural network [6], explaining newer concepts such as GCNs and GATs [1]. Pucci and Gori used GNN concepts in recommendation systems.

figure 3

Graph Neural Network Evolution

In 2007, Chun-Guang Li, Jun Guo, and Hong-Gang Zhang used a semi-supervised learning concept with GNNs [7], proposing a pruning method to enhance the basic GNN and resolve the problem of choosing the neighborhood scale parameter. In 2008, Ziwei Zhang introduced a new concept, Eigen-GNN [8], which works well with several GNN models. In 2009, Abhijeet V introduced the GNN concept in fuzzy networks [9], proposing a granular reflex fuzzy min–max neural network for classification. In 2010, DK Chaturvedi explained the concept of GNNs for soft computing techniques [10], and GNNs came to be widely used in many applications. Also in 2010, Tanzima Hashem discussed privacy-preserving group nearest neighbor queries [11]. The first initiative to use GNNs for knowledge graph embedding was R-GCN, which introduces relation-specific transformations in the message-passing phases to deal with different relations.

Similarly, from 2011 to 2017, authors continued to survey and extend GNN concepts, and the number of surveys has increased steadily from 2018 onwards. Our paper shows that GNN models such as GCN, GAT, and R-GCN are helpful [12].

Literature review

Table 1 describes the literature survey on graph neural networks, including the application area, the dataset used, the model applied, and the performance evaluation. The literature covers the years 2018 to 2023.

Research motivation

For image inputs we employ grid data structures, typically applying an n × n filter and computing the result with an aggregation or maximum function. This process works effectively because of the inherent fixed structure of images: we position the grid over the image, move the filter across it, and derive the output vector, as depicted on the left side of Fig. 4. In contrast, this approach is unsuitable for graphs. Graphs lack a predefined structure for data storage, and there is no inherent knowledge of node-to-neighbor relationships, as illustrated on the right side of Fig. 4. To overcome this limitation, we focus on graph convolution.

figure 4

CNN In Euclidean Space (Left), GNN In Euclidean Space (Right)

In the context of GCNs, convolutional operations are adapted to handle graphs' irregular, non-grid-like structures. These operations typically aggregate information from neighboring nodes to update the features of a central node. CNNs are designed for grid-like data structures, such as images, and are well suited to tasks where spatial relationships between neighboring elements are crucial; they use convolutional layers to scan small local receptive fields and learn hierarchical representations. GNNs, in contrast, are designed for graph-structured data, where edges connect entities (nodes), and graphs can represent varied relationships such as social networks, citation networks, or molecular structures. In short, CNNs excel at processing grid-like data with spatial dependencies, while GNNs are designed to handle graph-structured data with complex relationships and dependencies between entities.
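To make the aggregation idea concrete, here is a minimal sketch in Python (the toy graph, feature values, and function name are illustrative assumptions, not code from the paper) of mean-aggregation over an adjacency list:

```python
import numpy as np

# Toy 4-node graph stored as an adjacency list (illustrative values).
adjacency = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}

# One 2-dimensional feature vector per node.
features = np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [1.0, 1.0],
                     [0.5, 0.5]])

def aggregate_mean(node):
    """Average the feature vectors of a node's neighbors."""
    return features[adjacency[node]].mean(axis=0)

# Update each node by mixing its own features with its neighborhood mean.
updated = np.stack([0.5 * features[v] + 0.5 * aggregate_mean(v)
                    for v in sorted(adjacency)])
print(updated)
```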

Limitation of CNN over GNN

Graph Neural Networks (GNNs) draw inspiration from Convolutional Neural Networks (CNNs). Before delving into the intricacies of GNNs, it is essential to understand why CNNs and Recurrent Neural Networks (RNNs) may not suffice for effectively handling data structured as graphs. As illustrated in Fig.  5 , CNNs are designed for data that exhibits a grid structure, such as images, whereas RNNs are tailored to sequences, like text.

figure 5

Convolution can be performed if the input is an image using an n*n mask (Left). Convolution can't be achieved if the input is a graph using an n*n mask. (Right)

Typically, we use arrays for storage when working with text data. Likewise, for image data, matrices are the preferred choice. However, as depicted in Fig.  5 , arrays and matrices fall short when dealing with graph data. In the case of graphs, we require a specialized technique known as Graph Convolution. This approach enables deep neural networks to handle graph-structured data directly, leading to a graph neural network.

Fig. 5 illustrates that we can employ masking techniques and apply filtering operations to transform the data into vector form when we have images. Conversely, traditional masking methods are not applicable when dealing with graph data as input, as shown in the right image.

Graph neural network

Graph Neural Networks, or GNNs, are a class of neural networks tailored for handling data organized in graph structures. Graphs are mathematical representations of nodes connected by edges, making them ideal for modeling relationships and dependencies in complex systems. GNNs have the inherent ability to learn and reason about graph-structured data, enabling diverse applications. In this section, we first explain the message-passing mechanism of GNNs (" Message Passing Mechanism in Graph Neural Network " section), then describe the structure of graphs, graph types, and graph learning styles (" Description of GNN Taxonomy " section).

Message passing mechanism in graph neural network

A GNN is an optimizable transformation on all graph attributes (nodes, edges, and global context) that preserves graph symmetries (permutation invariances). Because a GNN does not alter the connectivity of the input graph, the output can be described with the same adjacency list and feature-vector count as the input graph. The output graph, however, has updated embeddings, because the GNN has transformed each node, edge, and global-context representation.

In Fig. 6 , circles are nodes, and empty boxes show aggregation of neighbor/adjacent nodes. The model aggregates messages from A's local graph neighbors (i.e., B, C, and D). In turn, the messages coming from neighbors are based on information aggregated from their respective neighborhoods, and so on. This visualization shows a two-layer version of a message-passing model. Notice that the computation graph of the GNN forms a tree structure by unfolding the neighborhood around the target node [ 17 ]. Graph neural networks (GNNs) are neural models that capture the dependence of graphs via message passing between the nodes of graphs [ 30 ].

figure 6

How a single node aggregates messages from its adjacent neighbor nodes

The message-passing mechanism of Graph Neural Networks is shown in Fig. 7. We take an input graph with a set of node features \(X \in \mathbb{R}^{d \times |V|}\) and use this information to produce node embeddings \(z_u\). We will also review how the GNN framework can embed subgraphs and whole graphs.

figure 7

Message passing mechanism in GNN

At each iteration, each node collects information from the neighborhood around it, and as these iterations progress, each node embedding incorporates information from increasingly distant reaches of the graph. After the first iteration (k = 1), each node embedding explicitly retains information from its 1-hop neighborhood, i.e., the nodes reachable via a path of length 1 in the graph [ 31 ]. After the second iteration (k = 2), each node embedding contains information from its 2-hop neighborhood; in general, after k iterations, each node embedding includes information from its k-hop neighborhood. The "information" carried by these messages consists of two main parts: structural information about the graph (e.g., node degrees) and feature-based information.

In the message-passing mechanism of a graph neural network, each node stores its message as a feature vector, and at each step the neighbors update this information in feature-vector form [ 1 ]. This process aggregates information: if the grey node is connected to the blue node, the features of both are aggregated into a new feature vector whose updated values incorporate the new message.

In Equations  4.1 and 4.2, h denotes the message, u the node, and k the iteration number. AGGREGATE and UPDATE are arbitrary differentiable functions (i.e., neural networks), and \({m}_{N(u)}\) is the "message" aggregated from u's graph neighborhood N(u); we employ superscripts to identify the embeddings and functions at different message-passing iterations. At each iteration k of the GNN, the AGGREGATE function receives as input the set of embeddings of the nodes in u's graph neighborhood N(u) and generates a message \({m}_{N(u)}^{k}\) from this aggregated neighborhood information. The UPDATE function then combines this message \({m}_{N(u)}^{k}\) with the previous embedding \({h}_{u}^{(k-1)}\) of node u to generate the updated embedding \({h}_{u}^{k}\).
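Equations 4.1 and 4.2 themselves did not survive extraction here; a standard rendering consistent with the definitions above is:

$$h_u^{(k)} = \text{UPDATE}^{(k)}\left(h_u^{(k-1)},\, m_{N(u)}^{(k)}\right) \tag{4.1}$$

$$m_{N(u)}^{(k)} = \text{AGGREGATE}^{(k)}\left(\left\{h_v^{(k-1)},\ \forall v \in N(u)\right\}\right) \tag{4.2}$$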

Description of GNN taxonomy

Figure 8 below shows that we have divided our GNN taxonomy into three parts [ 30 ]:

figure 8

Graph Neural Network Taxonomy

1. Graph structures

2. Graph types

3. Graph learning tasks

Graph structure

The two scenarios shown in Fig. 9 are structural and non-structural. In structural scenarios, the graph structure is stated explicitly by the application, as with molecular and physical systems, knowledge graphs, and similar objects.

figure 9

Graph Structure

In non-structural scenarios, graphs are implicit, so we must first construct the graph from the task at hand: for text, we build a fully connected word graph, and for images, a scene graph.

Graph types

There may be more information about nodes and links in complex graph types. Graphs are typically divided into 5 categories, as shown in Fig.  10 .

figure 10

Types of Graphs

Directed/undirected graphs

A directed graph is characterized by edges with a specific direction, indicating the flow from one node to another. Conversely, in an undirected graph, the edges lack a designated direction, allowing nodes to interact bidirectionally. As illustrated in Fig. 11 (left side), the directed graph exhibits directed edges, while in Fig. 11 (right side), the undirected graph conspicuously lacks directional edges. In undirected graphs, it's important to note that each edge can be considered to comprise two directed edges, allowing for mutual interaction between connected nodes.

figure 11

Directed/Undirected Graph

Static/dynamic graphs

The term “dynamic graph” pertains to a graph in which the properties or structure of the graph change with time. In dynamic graphs shown in Fig. 12 , it is essential to account for the temporal dimension appropriately. These dynamic graphs represent time-dependent events, such as the addition and removal of nodes and edges, typically presented as an ordered sequence or an asynchronous stream.

A noteworthy example of a dynamic graph can be observed in social networks like Twitter. In such networks, a new node is created each time a new user joins, and a following edge is established when one user follows another. Furthermore, when users update their profiles, the respective nodes are modified, reflecting the evolving nature of the graph. (As an aside, deep-learning libraries also differ in how they handle dynamics in their computation graphs, a separate notion from graph-structured data: TensorFlow traditionally employs a static computation graph, while PyTorch builds a dynamic one.)

figure 12

Static/Dynamic Graph

Homogeneous/heterogeneous graphs

Homogeneous graphs have only one type of node and one type of edge, as shown in Fig. 13 (left). An example is an online social network whose nodes all represent people and whose edges all represent friendship.

Heterogeneous graphs, shown in Fig. 13 (right), have two or more kinds of nodes or edges; an example is an online social network with edges of several types, such as 'friendship' and 'co-worker', between nodes of the 'person' type. The types of nodes and edges play critical roles in heterogeneous networks and require further consideration.

figure 13

Homogeneous (Left), Heterogeneous (Right) Graph

Knowledge graphs

A Knowledge Graph (KG) is a network of entity nodes and relationship edges that can be represented as an array of triples of the form (h, r, t) or (s, r, o), where each triple denotes a relationship r between a head entity h and a tail entity t. From this perspective, a knowledge graph can be considered a heterogeneous graph. The knowledge graph visually depicts real-world objects and their relationships [ 32 ] and can be used for many purposes, including information retrieval, knowledge-guided innovation, and question answering [ 30 ]. Entities are objects or things that exist in the real world, including individuals, organizations, places, music tracks, and movies; each relation type similarly describes a particular relationship between such elements. Figure 14 shows the knowledge graph for Mr. Sundar Pichai.

figure 14

Knowledge graph
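As a minimal sketch (a simplified toy graph in Python; the relation names and helper function are illustrative assumptions, not taken from the paper), a knowledge graph can be stored directly as (h, r, t) triples:

```python
# A toy knowledge graph stored as (head, relation, tail) triples.
triples = [
    ("Sundar Pichai", "ceo_of", "Google"),
    ("Google", "subsidiary_of", "Alphabet"),
    ("Sundar Pichai", "born_in", "Madurai"),
]

# Illustrative lookup: all tail entities reachable from a head via a relation.
def objects_of(head, relation):
    return [t for h, r, t in triples if h == head and r == relation]

print(objects_of("Sundar Pichai", "ceo_of"))  # ['Google']
```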

Transductive/inductive graphs

In the transductive scenario shown in Fig. 15 (top), the entire graph is taken as input, the labels of the validation data are hidden, and the model then predicts labels for those validation nodes. With the inductive scenario shown in Fig. 15 (bottom), the entire graph is still used as input, but only a sample within each batch is considered; the labels of the validation data are masked, and the model forecasts them. In the transductive setting, the model must predict the labels of the given unlabeled nodes, whereas in the inductive setting it can generalize to new unlabeled nodes from the same distribution.

figure 15

Transductive/Inductive Graphs

Transductive graph:

In the transductive approach, the entire graph is provided as input.

This method involves concealing the labels of the validation data.

The primary objective is to predict the labels for that validation data.

Inductive graph:

The inductive approach still uses the complete graph, but only a sample within each batch is considered.

A crucial step in this process is masking the labels of the validation data.

The key aim here is to make predictions for the labels of the validation data.

Graph learning tasks

We perform three tasks with graphs: node classification, link prediction, and graph classification, as shown in Fig. 16.

figure 16

Node Level Prediction (e.g., social network) (LEFT), Edge Level Prediction (e.g., Next YouTube Video?) (MIDDLE), Graph Level Prediction (e.g., molecule) (Right)

Node-level task

Node-level tasks are primarily concerned with determining the identity or function of each node within a graph; the core objective is to predict specific properties associated with individual nodes. For example, a node-level task in a social network could involve predicting which social group a new member is likely to join based on their connections and the characteristics of their friends' memberships. Node-level tasks are typically used when some nodes are unlabeled, such as identifying whether a particular individual is a smoker.

Edge-level task (link prediction)

Edge-level tasks revolve around analyzing relationships between pairs of nodes in a graph. An illustrative application of an edge-level task is assessing the compatibility or likelihood of a connection between two entities, as seen in matchmaking or dating apps. Another instance of an edge-level task is evident on platforms like Netflix, where the task involves predicting the next video to be recommended based on viewing history and user preferences.

Graph-level

In graph-level tasks, the objective is to make predictions about a characteristic or property that encompasses the entire graph. For example, using a graph-based representation, one might aim to predict attributes like the olfactory quality of a molecule or its potential to bind with a disease-associated receptor. The essence of a graph-level task is to provide predictions that pertain to the graph as a whole. For instance, when assessing a newly synthesized chemical compound, a graph-level task might seek to determine whether the molecule has the potential to be an effective drug. A summary of all three learning tasks is shown in Fig. 17.

figure 17

Graph Learning Tasks Summary

GNN models and comparative analysis of GNN models

Graph Neural Network (GNN) models represent a category of neural networks specially crafted to process data organized in graph structures. They have garnered substantial acclaim across various domains, primarily due to their exceptional capability to grasp intricate relationships and patterns within graph data. As illustrated in Fig.  18 , we have outlined three distinct GNN models. A comprehensive description of these models, specifically Graph Convolutional Networks (GCN), Graph Attention Networks (GAT/GAN), and GraphSAGE, can be found in reference [ 33 ]. In the " GNN Models " section, we delve into these GNN models' intricacies; in the " Comparative Study of GNN Models " section, we provide an in-depth analysis that explores their theoretical and practical aspects.

figure 18

Graph convolution neural network (GCN)

GCN is one of the basic graph neural network variants, developed by Thomas Kipf and Max Welling. 'Convolution' in GCNs is essentially the same process as in the convolution layers of CNNs: the input neurons are multiplied by weights called filters or kernels, and the filters act as a sliding window across the image, allowing the CNN to learn features from nearby cells. Weight sharing means the same filter is used within the same layer throughout the image; when a CNN is used to distinguish photos of cats from non-cats, the same filter in the same layer is employed to detect the cat's nose and ears, i.e., the same weights (kernels or filters) are applied across the whole image [ 33 ]. Spectral graph convolutional networks were first introduced in "Spectral Networks and Deep Locally Connected Networks on Graphs" [ 34 ].

GCNs carry out similar operations, learning features by analyzing neighboring nodes. The primary difference between CNNs and GNNs is that CNNs are built to operate on regular (Euclidean) ordered data, whereas GNNs are a generalized version of CNNs for irregular, non-Euclidean structured data, with varying numbers of node connections and unordered nodes. GCNs have been applied to many problems, for example image classification [ 35 ], traffic forecasting [ 36 ], recommendation systems [ 17 ], scene graph generation [ 37 ], and visual question answering [ 38 ].

GCNs are particularly well-suited for tasks that involve data represented as graphs, such as social networks, citation networks, recommendation systems, and more. These networks are an extension of traditional CNNs, widely used for tasks involving grid-like data, such as images. The key idea behind GCNs is to perform convolution operations on the graph data. This enables them to capture and propagate information through the nodes in a graph by considering both a node’s features and those of its neighboring nodes. GCNs typically consist of several layers, each performing convolution and aggregation steps to refine the node representations in the graph. By applying these layers iteratively, GCNs can capture complex patterns and dependencies within the graph data.

Working of graph convolutional network

A Graph Convolutional Network (GCN) is a type of neural network architecture designed for processing and analyzing graph-structured data. GCNs work by aggregating and propagating information through the nodes in a graph. GCN works with the following steps shown in Fig.  19 :

figure 19

Working of GCN

Initialization:

Each node in the graph is associated with a feature vector. Depending on the application, these feature vectors can represent various attributes or characteristics of the nodes. For example, in a social network, each node might represent a user, and the features could include user profile information.

Convolution Operation:

The core of a GCN is the convolution operation, which is adapted from convolutional neural networks (CNNs). It aims to aggregate information from neighboring nodes. This is done by taking a weighted sum of the feature vectors of neighboring nodes. The graph's adjacency matrix determines the weights. The resulting aggregated information is a new feature vector for each node.

Weighted Aggregation:

The graph's adjacency matrix, typically after normalization, provides weights for the aggregation process. In this context, for a given node, the features of its neighboring nodes are scaled by the corresponding values within the adjacency matrix, and the outcomes are then accumulated. A precise mathematical elucidation of this aggregation step is described in " Equation of GCN " section.

Activation function and learning weights:

The aggregated features are typically passed through an activation function (e.g., ReLU) to introduce non-linearity. The weight matrix W used in the aggregation step is learned during training. This learning process allows the GCN to adapt to the specific graph and task it is designed for.

Stacking Layers:

GCNs are often used in multiple layers. This allows the network to capture more complex relationships and higher-level features in the graph. The output of one GCN layer becomes the input for the next, and this process is repeated for a predefined number of layers.

Task-Specific Output:

The final output of the GCN can be used for various graph-based tasks, such as node classification, link prediction, or graph classification, depending on the specific application.

Equation of GCN

The Graph Convolutional Network (GCN) is based on a message-passing mechanism that can be described using mathematical equations. For a graph with N nodes, the core equation of a shallow, first-order GCN layer can be expressed as follows.
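Equation 5.1 itself did not survive extraction; a standard rendering consistent with the description below (inputs A′ and F, trainable parameters W and b, element-wise non-linearity σ) is:

$$Z = \sigma\left(A'FW + b\right) \tag{5.1}$$

where \(A' = \tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}\) is the normalized adjacency matrix, with \(\tilde{A} = A + I\) the adjacency matrix including self-loops and \(\tilde{D}\) its degree matrix.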

Equation  5.1 depicts a GCN layer's design. The normalized graph adjacency matrix A' and the nodes feature matrix F serve as the layer's inputs. The bias vector b and the weight matrix W are trainable parameters for the layer.

When used with the design matrix, the normalized adjacency matrix effectively smoothes a node’s feature vector based on the feature vectors of its close graph neighbors. This matrix captures the graph structure. A’ is normalized to make each neighboring node’s contribution proportional to the network's connectivity.

The layer definition is completed by applying an element-wise non-linear function, such as ReLU, to A'FW + b. The downstream node classification task requires deep neural architectures to learn a complicated hierarchy of node attributes; to this end, this layer's output matrix Z can be routed into another GCN layer or any other neural network layer.
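A minimal sketch of this layer in Python follows (illustrative only: the toy graph, random weights, and function name are assumptions, not code from the paper):

```python
import numpy as np

def gcn_layer(A, F, W, b):
    """One GCN layer: Z = ReLU(A' F W + b), with A' the
    symmetrically normalized adjacency including self-loops."""
    A_tilde = A + np.eye(A.shape[0])              # add self-loops
    d = A_tilde.sum(axis=1)                       # degree vector
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))        # D^{-1/2}
    A_norm = D_inv_sqrt @ A_tilde @ D_inv_sqrt    # normalized adjacency A'
    return np.maximum(A_norm @ F @ W + b, 0.0)    # ReLU non-linearity

# Toy 3-node path graph: edges 0-1 and 1-2.
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
F = np.random.randn(3, 4)       # 4 input features per node
W = np.random.randn(4, 2)       # project to 2 hidden features
b = np.zeros(2)

Z = gcn_layer(A, F, W, b)       # shape (3, 2); can feed the next layer
print(Z.shape)
```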

Summary of graph convolution neural network (GCN) is shown in Table 2 .

Graph attention network (GAT/GAN)

The Graph Attention Network (GAT/GAN) is a newer neural network that works with graph-structured data. It uses masked self-attentional layers to address the shortcomings of earlier methods that depended on graph convolutions or their approximations. By stacking layers, nodes in a neighborhood can (implicitly) be assigned different weights, allowing each node to focus on the characteristics of its neighbors without performing an expensive matrix operation (such as inversion) or relying on prior knowledge of the graph's structure. GAT thereby tackles several significant limitations of spectral-based graph neural networks at once, making the model suitable for both inductive and transductive applications.

Working of GAT

The Graph Attention Network (GAT) is a neural network architecture designed for processing and analyzing graph-structured data shown in Fig. 20 . GATs are a variation of Graph Convolutional Networks (GCNs) that incorporate the concept of attention mechanisms. GAT/GAN works with the following steps shown in Fig.  21 .

figure 20

How attention Coefficients updates

As with other graph-based models, GAT starts with nodes in the graph, each associated with a feature vector. These features can represent various characteristics of the nodes.

Self-Attention Mechanism and Attention Computation:

GAT introduces an attention mechanism similar to what is used in sequence-to-sequence models in natural language processing. The attention mechanism allows each node to focus on different neighbors when aggregating information. It assigns different attention coefficients to the neighboring nodes, making the process more flexible. For each node in the graph, GAT computes attention scores for its neighboring nodes. These attention scores are based on the features of the central node and its neighbors. The attention scores are calculated using a weighted sum of the features of the central node and its neighbors.

The attention scores determine how much each neighbor’s feature contributes to the aggregation for the central node. This weighted aggregation is carried out for all neighboring nodes, resulting in a new feature vector for the central node.

Multiple Attention Heads and Output Combination:

GAT often employs multiple attention heads in parallel. Each attention head computes its attention scores and aggregation results. These multiple attention heads capture different aspects of the relationships in the graph. The outputs from the multiple attention heads are combined, typically by concatenation or averaging, to create a final feature vector for each node.

Learning Weights and Stacking Layers:

Similar to GCNs, GATs learn weight parameters during training. These weights are learned to optimize the attention mechanisms and adapt to the specific graph and task. GATs can be used in multiple layers to capture higher-level features and complex relationships in the graph. The output of one GAT layer becomes the input for the next layer.

The learning weights capture the importance of node relationships and contribute to information aggregation during the neighborhood aggregation process. The learning process in GNNs also relies on backpropagation and optimization algorithms. The stacking of GNN layers enables the model to capture higher-level abstractions and dependencies in the graph. Each layer refines the node representations based on information from the previous layer.

The final output of the GAT can be used for various graph-based tasks, such as node classification, link prediction, or graph classification, depending on the application.

Equation for GAT

GAT's main distinctive feature is how it gathers data from the one-hop neighborhood [ 30 ]; a graph convolution operation in GCN produces the normalized sum of the node properties of neighbors. Equation  5.2 shows the graph attention network layer, in which \({h}_{i}^{(l+1)}\) denotes the current node's output, \(\sigma\) the ReLU non-linearity, \(j\in N\left(i\right)\) the one-hop neighbors, \({c}_{i,j}\) the normalization coefficient, \({W}^{\left(l\right)}\) the weight matrix, and \({h}_{j}^{(l)}\) the previous-layer embedding of node j.
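Equation 5.2 itself is lost in extraction; a standard rendering consistent with the symbols just listed is:

$$h_i^{(l+1)} = \sigma\left(\sum_{j \in N(i)} c_{ij}\, W^{(l)} h_j^{(l)}\right) \tag{5.2}$$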

Why is GAT better than GCN?

We learned from the Graph Convolutional Network (GCN) that integrating local graph structure and node-level features results in good node classification performance. The way GCN aggregates messages, on the other hand, is structure-dependent, which may limit its use.

How attention coefficients update: the attention layer has four parts [ 47 ]; reconstructed forms of the corresponding equations are given after this list.

A linear transformation: a shared linear transformation is applied to each node, where h is the set of node features, W is the weight matrix, and z is the transformed output embedding of the node.

Attention coefficients: these are crucial in the GAT paradigm because every node can now attend to every other node, discarding rigid structural constraints. The pair-wise un-normalized attention score between two neighbors is computed next: the 'z' embeddings of the two nodes are concatenated (|| stands for concatenation), the result is put through a dot product with a learnable weight vector a(l), and a LeakyReLU is applied [ 1 ]. In contrast to the dot-product attention utilized in the Transformer model, this kind of attention is called additive attention. The nodes are subsequently subjected to self-attention.

Softmax: We utilize the softmax function to normalize the coefficients over all j values, improving their comparability across nodes.

Aggregation: This process is comparable to GCN. The neighborhood embeddings are combined and scaled based on the attention scores.
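The four parts above are commonly written as follows; this is a reconstruction consistent with the descriptions given and with the GAT formulation in [ 47 ], since the original equations did not survive extraction:

$$z_i^{(l)} = W^{(l)} h_i^{(l)}$$

$$e_{ij}^{(l)} = \text{LeakyReLU}\left(\vec{a}^{(l)\top}\left(z_i^{(l)} \,\Vert\, z_j^{(l)}\right)\right)$$

$$\alpha_{ij}^{(l)} = \frac{\exp\left(e_{ij}^{(l)}\right)}{\sum_{k \in N(i)} \exp\left(e_{ik}^{(l)}\right)}$$

$$h_i^{(l+1)} = \sigma\left(\sum_{j \in N(i)} \alpha_{ij}^{(l)}\, z_j^{(l)}\right)$$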

Summary of graph attention network (GAT) is shown in Table 3 .

GraphSAGE

GraphSAGE represents a tangible realization of an inductive learning framework, shown in Fig. 22. During training it exclusively considers training samples linked to the training set's edges. The process consists of two main steps, "sampling" and "aggregation." The node's own representation vector is then concatenated with the aggregated neighborhood vector and passed through a fully connected layer with a non-linear activation function. It's important to note that each network layer shares a common aggregator and weight matrix, so the consideration should be the number of layers (weight matrices) rather than the number of aggregators. Finally, a normalization step is applied to the layer's output.

Two major steps:

Sampling: describes how to sample a large number of neighbors for each node.

Aggregation: obtains the neighbor node embeddings and determines how to combine them to update a node's own embedding.

figure 22

Working of Graph SAGE Method

Working of the GraphSAGE model:

First, initialize the feature vectors of all nodes in the input graph.

For each node, obtain its sampled neighbor nodes.

Use the aggregation function to aggregate the information of the neighbor nodes.

Combine the aggregated information with the node's own embedding, and update the embedding through a non-linear transformation.

Types of aggregators

In the GraphSAGE method, four types of aggregator are used:

Simple neighborhood aggregator.

Mean aggregator: takes the element-wise mean of the neighbors' embeddings.

LSTM aggregator: applies an LSTM to a random permutation of the neighbors.

Pooling aggregator: applies a symmetric vector function (e.g., element-wise max) to the transformed neighbor vectors.

Equation of GraphSAGE

The GraphSAGE update rule can be written as

$$h_v^{(k)} = \sigma\left(W_k \cdot \text{AGG}\left(\left\{h_u^{(k-1)},\ \forall u \in N(v)\right\}\right) + B_k\, h_v^{(k-1)}\right)$$

where:

\(W_k, B_k\): learnable weight matrices.

\(h_v^{0} = x_v\): the initial (layer-0) embeddings are equal to the node features.

\(\text{AGG}\left(\left\{h_u^{(k-1)},\ \forall u \in N(v)\right\}\right)\): the generalized aggregation over neighbor embeddings.

\(z_v = h_v^{(K)}\): the embedding after K layers of neighborhood aggregation.

\(\sigma\): non-linearity (e.g., ReLU).
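A minimal Python sketch of one GraphSAGE layer with a mean aggregator follows (an illustrative reading of the equation above; the toy graph, random weights, and function name are assumptions):

```python
import numpy as np

def graphsage_layer(h, neighbors, W, B):
    """One GraphSAGE layer with a mean aggregator:
    h_v <- ReLU(W @ mean({h_u : u in N(v)}) + B @ h_v), then L2-normalize."""
    new_h = np.empty((h.shape[0], W.shape[0]))
    for v, nbrs in neighbors.items():
        agg = h[nbrs].mean(axis=0)              # mean over sampled neighbors
        new_h[v] = np.maximum(W @ agg + B @ h[v], 0.0)
    norms = np.linalg.norm(new_h, axis=1, keepdims=True) + 1e-12
    return new_h / norms                        # normalization step

neighbors = {0: [1], 1: [0, 2], 2: [1]}         # toy 3-node path graph
h = np.random.randn(3, 4)                       # initial features x_v
W = np.random.randn(8, 4)                       # aggregation weights W_k
B = np.random.randn(8, 4)                       # self-connection weights B_k
z = graphsage_layer(h, neighbors, W, B)         # embeddings after one layer
print(z.shape)                                  # (3, 8)
```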

Summary of graphSAGE is shown in Table 4 .

Comparative study of GNN models

Comparison based on practical implementation of GNN models

Table  5 describes the statistics of datasets commonly used in the literature for graph inputs: CORA, Citeseer, and Pubmed. These statistics cover the kind of dataset, the numbers of nodes and edges, the number of classes, the number of features, and the label rate of each dataset, details that are essential for understanding the characteristics and scale of the citation networks involved. A comparison of the GNN models and their equations is shown in Fig.  23 .

figure 23

Equations of GNN Models

Table 6 shows the accuracy scores of different Graph Neural Network (GNN) models on these datasets; for some models, the time taken to compute results is also indicated in seconds. This information is crucial for evaluating the performance of the models on specific datasets.

A comparison based on theoretical concepts of the GNN models is described in Table 7.

Graph neural network applications

Graph construction

Graph Neural Networks (GNNs) have a wide range of applications spanning diverse domains, which encompass modern recommender systems, computer vision, natural language processing, program analysis, software mining, bioinformatics, anomaly detection, and urban intelligence, among others. The fundamental prerequisite for GNN utilization is the transformation or representation of input data into a graph-like structure. In the realm of graph representation learning, GNNs excel in acquiring essential node or graph embeddings that serve as a crucial foundation for subsequent tasks [ 61 ].

The construction of a graph involves a two-fold process:

Graph creation and

Learning about graph representations

Graph Creation: The generation of graphs is essential for depicting the intricate relationships embedded within diverse incoming data. With the varied nature of input data, various applications adopt techniques to create meaningful graphs. This process is indispensable for effectively communicating the structural nuances of the data, ensuring the nodes and edges convey their semantic significance, particularly tailored to the specific task at hand.

Learning about graph representations: The subsequent phase involves utilizing the graph expression acquired from the input data. In GNN-based Learning for graph representations, some studies employ well-established GNN models like GraphSAGE, GCN, GAT, and GGNN, which offer versatility for various application tasks. However, when faced with specific tasks, it may be necessary to customize the GNN architecture to address particular challenges more effectively.

Different applications in which the data is considered a graph

Molecular Graphs: Atoms and electrons serve as the basic building blocks of matter and molecules, organized in three-dimensional structures. While all particles interact, we primarily acknowledge a covalent bond between two stable atoms when they are at a stable distance from one another. Various atom-to-atom bond configurations exist, including single and double bonds. This three-dimensional arrangement is conveniently and commonly represented as a graph, with atoms representing nodes and covalent bonds representing edges [ 62 ]; a code sketch of this representation follows this list.

Graphs of social networks: These networks are helpful research tools for identifying trends in the collective behavior of individuals, groups, and organizations. We may create a graph that represents groupings of people by visualizing individuals as nodes and their connections as edges [ 63 ].

Citation networks as graphs: When they publish papers, scientists regularly reference the work of other scientists. Each manuscript can be visualized as a node in a graph of these citation networks, with each directed edge denoting a citation from one publication to another. Additionally, we can include details about each document in each node, such as an abstract's word embedding [ 64 ].

Within computer vision: We may want to tag certain things in visual scenes. Then, we can construct graphs by treating these things as nodes and their connections as edges.

GNNs are used to model data as graphs, allowing for the capture of complex relationships and dependencies that traditional machine learning models may struggle to represent. This makes GNNs a valuable tool for tasks where data has an inherent graph structure or where modeling relationships is crucial for accurate predictions and analysis.
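Following up the molecular-graph item above, here is a toy sketch in Python (the choice of molecule and the representation details are illustrative assumptions):

```python
# Water (H2O) as a graph: atoms are nodes, covalent bonds are edges.
atoms = ["O", "H", "H"]                 # node labels
bonds = [(0, 1), (0, 2)]                # undirected single bonds O-H

# Dense adjacency matrix for the molecular graph.
n = len(atoms)
adj = [[0] * n for _ in range(n)]
for i, j in bonds:
    adj[i][j] = adj[j][i] = 1
print(adj)   # [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
```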

Graph neural networks (GNNs) applications in different fields

NLP (natural language processing)

Document Classification: GNNs can be used to model the relationships between words or sentences in documents, allowing for improved document classification and information retrieval.

Text Generation: GNNs can assist in generating coherent and contextually relevant text by capturing dependencies between words or phrases.

Question Answering: GNNs can help in question-answering tasks by representing the relationships between question words and candidate answers within a knowledge graph.

Sentiment Analysis: GNNs can capture contextual information and sentiment dependencies in text, improving sentiment analysis tasks.

Computer vision

Image Segmentation: GNNs can be employed for pixel-level image segmentation tasks by modeling relationships between adjacent pixels as a graph.

Object Detection: GNNs can assist in object detection by capturing contextual information and relationships between objects in images.

Scene Understanding: GNNs are used for understanding complex scenes and modeling spatial relationships between objects in an image.

Bioinformatics

Protein-Protein Interaction Prediction: GNNs can be applied to predict interactions between proteins in biological networks, aiding in drug discovery and understanding disease mechanisms.

Genomic Sequence Analysis: GNNs can model relationships between genes or genetic sequences, helping in gene expression prediction and sequence classification tasks.

Drug Discovery: GNNs can be used for drug-target interaction prediction and molecular property prediction, which is vital in pharmaceutical research.

Table 8 offers a concise overview of various research papers that utilize Graph Neural Networks (GNNs) in diverse domains, showcasing the applications and contributions of GNNs in each study.

Table 9 highlights various applications of GNNs in Natural Language Processing, Computer Vision, and Bioinformatics domains, showcasing how GNN models are adapted and used for specific tasks within each field.

Future directions of graph neural network

The contribution of the existing literature to GNN principles, models, datasets, applications, etc., was the main emphasis of this survey. In this section, several potential future study directions are suggested. Significant challenges have been noted, including unbalanced datasets, the effectiveness of current methods, text classification, etc. We have also looked at the remedies to address these problems. We have suggested future and advanced directions to address these difficulties regarding domain adaptation, data augmentation, and improved classification. Table 10 displays future directions.

Imbalanced Datasets—Limited labeled data, domain-dependent data, and imbalanced data are currently issues with available datasets. Transfer learning and domain adaptation are solutions to these issues.

Accuracy of existing systems/models—deep learning models such as GCN, GAT, and GraphSAGE can be utilized to increase the efficiency and precision of current systems. Additionally, training models on sizable, domain-specific datasets can enhance performance.

Enhancing Text Classification: Text classification poses another significant challenge, which is effectively addressed by leveraging advanced deep learning methodologies like graph neural networks, contributing to the improvement of text classification accuracy and performance.

Table  10 describes the research gaps and future directions presented in the literature above. These gaps and directions highlight the challenges and proposed solutions in the field of text classification and structural analysis.

Table 11 provides an overview of different research papers, their publication years, the applications they address, the graph structures they use, the graph types, the graph tasks, and the specific Graph Neural Network (GNN) models utilized in each study.

Conclusions

Graph Neural Networks (GNNs) have witnessed rapid advancements in addressing the unique challenges presented by data structured as graphs, a domain where conventional deep learning techniques, originally designed for images and text, often struggle to provide meaningful insights. GNNs offer a powerful and intuitive approach that finds broad utility in applications relying on graph structures. This comprehensive survey on GNNs offers an in-depth analysis covering critical aspects such as GNN fundamentals, the interplay with convolutional neural networks, GNN message-passing mechanisms, diverse GNN models, practical use cases, and a forward-looking perspective. Our central focus is on elucidating the foundational characteristics of GNNs, a field teeming with contemporary applications that continually enhance our comprehension and utilization of this technology.

The continuous evolution of GNN-based research has underscored the growing need to address issues related to graph analysis, which we aptly refer to as the frontiers of GNNs. In our exploration, we delve into several crucial recent research domains within the realm of GNNs, encompassing areas like link prediction, graph generation, and graph categorization, among others.

Availability of data and materials

Not applicable.

Abbreviations

GNN: Graph Neural Network

GCN: Graph Convolution Network

GAT: Graph Attention Networks

NLP: Natural Language Processing

CNN: Convolution Neural Networks

RNN: Recurrent Neural Networks

ML: Machine Learning

DL: Deep Learning

KG: Knowledge Graph

Pucci A, Gori M, Hagenbuchner M, Scarselli F, Tsoi AC. Investigation into the application of graph neural networks to large-scale recommender systems, infona.pl, no. 32, no 4, pp. 17–26, 2006.

Mahmud FB, Rayhan MM, Shuvo MH, Sadia I, Morol MK. A comparative analysis of Graph Neural Networks and commonly used machine learning algorithms on fake news detection, Proc. - 2022 7th Int. Conf. Data Sci. Mach. Learn. Appl. CDMA 2022, pp. 97–102, 2022.

Cui L, Seo H, Tabar M, Ma F, Wang S, Lee D, Deterrent: Knowledge Guided Graph Attention Network for Detecting Healthcare Misinformation, Proc. ACM SIGKDD Int. Conf. Knowl. Discov. Data Min., pp. 492–502, 2020.

Gori M, Monfardini G, Scarselli F. A new model for learning in graph domains, Proc. Int. Jt. Conf. Neural Networks, vol. 2, no. January 2005, pp. 729–734, 2005, https://doi.org/10.1109/IJCNN.2005.1555942 .

Scarselli F, Yong SL, Gori M, Hagenbuchner M, Tsoi AC, Maggini M. Graph neural networks for ranking web pages, Proc.—2005 IEEE/WIC/ACM Int. Web Intell. WI 2005, vol. 2005, no. January, pp. 666–672, 2005, doi: https://doi.org/10.1109/WI.2005.67 .

Gandhi S, Iyer AP. P3: Distributed deep graph learning at scale, Proc. 15th USENIX Symp. Oper. Syst. Des. Implementation, OSDI 2021, pp. 551–568, 2021.

Li C, Guo J, Zhang H. Pruning neighborhood graph for geodesic distance based semi-supervised classification, in 2007 International Conference on Computational Intelligence and Security (CIS 2007), 2007, pp. 428–432.

Zhang Z, Cui P, Pei J, Wang X, Zhu W, Eigen-gnn: A graph structure preserving plug-in for gnns, IEEE Trans. Knowl. Data Eng., 2021.

Nandedkar AV, Biswas PK. A granular reflex fuzzy min–max neural network for classification. IEEE Trans Neural Netw. 2009;20(7):1117–34.


Chaturvedi DK, Premdayal SA, Chandiok A. Short-term load forecasting using soft computing techniques. Int’l J Commun Netw Syst Sci. 2010;3(03):273.


Hashem T, Kulik L, Zhang R. Privacy preserving group nearest neighbor queries, in Proceedings of the 13th International Conference on Extending Database Technology, 2010, pp. 489–500.

Sun Z et al. Knowledge graph alignment network with gated multi-hop neighborhood aggregation, in Proceedings of the AAAI Conference on Artificial Intelligence, 2020, vol. 34, no. 01, pp. 222–229.

Zhang M, Chen Y. Link prediction based on graph neural networks. Adv Neural Inf Process Syst. 31, 2018.

Stanimirović PS, Katsikis VN, Li S. Hybrid GNN-ZNN models for solving linear matrix equations. Neurocomputing. 2018;316:124–34.

Stanimirović PS, Petković MD. Gradient neural dynamics for solving matrix equations and their applications. Neurocomputing. 2018;306:200–12.

Zhang C, Song D, Huang C, Swami A, Chawla NV. Heterogeneous graph neural network, in Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, 2019, pp. 793–803.

Fan W et al. Graph neural networks for social recommendation," in The world wide web conference, 2019, pp. 417–426.

Gui T et al. A lexicon-based graph neural network for Chinese NER," in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019, pp. 1040–1050.

Qasim SR, Mahmood H, Shafait F. Rethinking table recognition using graph neural networks, in 2019 International Conference on Document Analysis and Recognition (ICDAR), 2019, pp. 142–147

You J, Ying R, Leskovec J. Position-aware graph neural networks, in International conference on machine learning, 2019, pp. 7134–7143.

Cao D, et al. Spectral temporal graph neural network for multivariate time-series forecasting. Adv Neural Inf Process Syst. 2020;33:17766–78.

Xhonneux LP, Qu M, Tang J. Continuous graph neural networks. In International Conference on Machine Learning, 2020, pp. 10432–10441.

Zhou K, Huang X, Li Y, Zha D, Chen R, Hu X. Towards deeper graph neural networks with differentiable group normalization. Adv Neural Inf Process Syst. 2020;33:4917–28.

Gu F, Chang H, Zhu W, Sojoudi S, El Ghaoui L. Implicit graph neural networks. Adv Neural Inf Process Syst. 2020;33:11984–95.

Liu Y, Guan R, Giunchiglia F, Liang Y, Feng X. Deep attention diffusion graph neural networks for text classification. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021, pp. 8142–8152.

Gasteiger J, Becker F, Günnemann S. Gemnet: universal directional graph neural networks for molecules. Adv Neural Inf Process Syst. 2021;34:6790–802.

Yao D et al. Deep hybrid: multi-graph neural network collaboration for hyperspectral image classification. Def. Technol. 2022.

Li Y, et al. Research on multi-port ship traffic prediction method based on spatiotemporal graph neural networks. J Mar Sci Eng. 2023;11(7):1379.

Djenouri Y, Belhadi A, Srivastava G, Lin JC-W. Hybrid graph convolution neural network and branch-and-bound optimization for traffic flow forecasting. Futur Gener Comput Syst. 2023;139:100–8.

Zhou J, et al. Graph neural networks: a review of methods and applications. AI Open. 2020;1(January):57–81. https://doi.org/10.1016/j.aiopen.2021.01.001 .

Rong Y, Huang W, Xu T, Huang J. Dropedge: Towards deep graph convolutional networks on node classification. arXiv preprint arXiv:1907.10903. 2019.

Abu-Salih B, Al-Qurishi M, Alweshah M, Al-Smadi M, Alfayez R, Saadeh H. Healthcare knowledge graph construction: a systematic review of the state-of-the-art, open issues, and opportunities. J Big Data. 2023;10(1):81.

Kipf TN, Welling M. Semi-supervised classification with graph convolutional networks. arXiv Prepr. arXiv1609.02907, 2016.

Berg RV, Kipf TN, Welling M. Graph Convolutional Matrix Completion. 2017, http://arxiv.org/abs/1706.02263

Monti F, Boscaini D, Masci J, Rodola E, Svoboda J, Bronstein MM. Geometric deep learning on graphs and manifolds using mixture model cnns. InProceedings of the IEEE conference on computer vision and pattern recognition 2017 (pp. 5115-5124).

Cui Z, Henrickson K, Ke R, Wang Y. Traffic graph convolutional recurrent neural network: a deep learning framework for network-scale traffic learning and forecasting. IEEE Trans Intell Transp Syst. 2020;21(11):4883–94. https://doi.org/10.1109/TITS.2019.2950416 .

Yang J, Lu J, Lee S, Batra D, Parikh D. Graph r-cnn for scene graph generation. InProceedings of the European conference on computer vision (ECCV) 2018 (pp. 670-685). https://doi.org/10.1007/978-3-030-01246-5_41 .

Teney D, Liu L, van Den Hengel A. Graph-structured representations for visual question answering. InProceedings of the IEEE conference on computer vision and pattern recognition 2017 (pp. 1-9). https://doi.org/10.1109/CVPR.2017.344 .

Yao L, Mao C, Luo Y. Graph convolutional networks for text classification. Proc AAAI Conf Artif Intell. 2019;33(01):7370–7.

De Cao N, Aziz W, Titov I. Question answering by reasoning across documents with graph convolutional networks. arXiv Prepr. arXiv1808.09920, 2018.

Gao H, Wang Z, Ji S. Large-scale learnable graph convolutional networks. in Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining, 2018, pp. 1416–1424.

Hu F, Zhu Y, Wu S, Wang L, Tan T. Hierarchical graph convolutional networks for semi-supervised node classification. arXiv Prepr. arXiv1902.06667, 2019.

Lange O, Perez L. Traffic prediction with advanced graph neural networks. DeepMind Research Blog Post, https://deepmind.google/discover/blog/traffic-prediction-with-advanced-graph-neural-networks/ . 2020.

Duan C, Hu B, Liu W, Song J. Motion capture for sporting events based on graph convolutional neural networks and single target pose estimation algorithms. Appl Sci. 2023;13(13):7611.

Balcıoğlu YS, Sezen B, Çerasi CC, Huang SH. machine design automation model for metal production defect recognition with deep graph convolutional neural network. Electronics. 2023;12(4):825.

Baghbani A, Bouguila N, Patterson Z. Short-term passenger flow prediction using a bus network graph convolutional long short-term memory neural network model. Transp Res Rec. 2023;2677(2):1331–40.

Velickovic P, Cucurull G, Casanova A, Romero A, Lio P, Bengio Y. Graph attention networks. Stat. 2017;1050(20):10–48550.

Hamilton W, Ying Z, Leskovec J. Inductive representation learning on large graphs. Advances in neural information processing systems. 2017; 30.

Ye Y, Ji S. Sparse graph attention networks. IEEE Trans Knowl Data Eng. 2021;35(1):905–16.


Chen Z et al. Graph neural network-based fault diagnosis: a review. arXiv Prepr. arXiv2111.08185, 2021.

Brody S, Alon U, Yahav E. How attentive are graph attention networks? arXiv Prepr. arXiv2105.14491, 2021.

Huang J, Shen H, Hou L, Cheng X. Signed graph attention networks," in International Conference on Artificial Neural Networks. 2019, pp. 566–577.

Seraj E, Wang Z, Paleja R, Sklar M, Patel A, Gombolay M. Heterogeneous graph attention networks for learning diverse communication. arXiv preprint arXiv: 2108.09568. 2021.

Zhang Y, Wang X, Shi C, Jiang X, Ye Y. Hyperbolic graph attention network. IEEE Transactions on Big Data. 2021;8(6):1690–701.

Yang X, Ma H, Wang M. Research on rumor detection based on a graph attention network with temporal features. Int J Data Warehous Min. 2023;19(2):1–17.

Lan W, et al. KGANCDA: predicting circRNA-disease associations based on knowledge graph attention network. Brief Bioinform. 2022;23(1):bbab494.

Xiao L, Wu X, Wang G, 2019, December. Social network analysis based on graph SAGE. In 2019 12th international symposium on computational intelligence and design (ISCID) (Vol. 2, pp. 196–199). IEEE.

Chang L, Branco P. Graph-based solutions with residuals for intrusion detection: The modified e-graphsage and e-resgat algorithms. arXiv preprint arXiv:2111.13597. 2021.

Oh J, Cho K, Bruna J. Advancing graphsage with a data-driven node sampling. arXiv preprint arXiv:1904.12935. 2019.

Kapoor M, Patra S, Subudhi BN, Jakhetiya V, Bansal A. Underwater Moving Object Detection Using an End-to-End Encoder-Decoder Architecture and GraphSage With Aggregator and Refactoring. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2023 (pp. 5635-5644).

Bhatti UA, Tang H, Wu G, Marjan S, Hussain A. Deep learning with graph convolutional networks: An overview and latest applications in computational intelligence. Int J Intell Syst. 2023;2023:1–28.

David L, Thakkar A, Mercado R, Engkvist O. Molecular representations in AI-driven drug discovery: a review and practical guide. J Cheminform. 2020;12(1):1–22.

Davies A, Ajmeri N. Realistic Synthetic Social Networks with Graph Neural Networks. arXiv preprint arXiv:2212.07843. 2022; 15.

Frank MR, Wang D, Cebrian M, Rahwan I. The evolution of citation graphs in artificial intelligence research. Nat Mach Intell. 2019;1(2):79–85.

Gao C, Wang X, He X, Li Y. Graph neural networks for recommender system. InProceedings of the Fifteenth ACM International Conference on Web Search and Data Mining 2022 (pp. 1623-1625).

Wu S, Sun F, Zhang W, Xie X, Cui B. Graph neural networks in recommender systems: a survey. ACM Comput Surv. 2022;55(5):1–37.

Wu L, Chen Y, Shen K, Guo X, Gao H, Li S, Pei J, Long B. Graph neural networks for natural language processing: a survey. Found Trends Mach Learn. 2023;16(2):119–328.

Wu L, Chen Y, Ji H, Liu B. Deep learning on graphs for natural language processing. InProceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval 2021 (pp. 2651-2653).

Liu X, Su Y, Xu B. The application of graph neural network in natural language processing and computer vision. In2021 3rd International Conference on Machine Learning, Big Data and Business Intelligence (MLBDBI) 2021 (pp. 708-714).

Harmon SHE, Faour DE, MacDonald NE. Mandatory immunization and vaccine injury support programs: a survey of 28 GNN countries. Vaccine. 2021;39(49):7153–7.

Yan W, Zhang Z, Zhang Q, Zhang G, Hua Q, Li Q. Deep data analysis-based agricultural products management for smart public healthcare. Front Public Health. 2022;10:847252.

Hamaguchi T, Oiwa H, Shimbo M, Matsumoto Y. Knowledge transfer for out-of-knowledge-base entities: a graph neural network approach. arXiv preprint arXiv:1706.05674. 2017.

Dai D, Zheng H, Luo F, Yang P, Chang B, Sui Z. Inductively representing out-of-knowledge-graph entities by optimal estimation under translational assumptions. arXiv preprint arXiv:2009.12765.

Pradhyumna P, Shreya GP. Graph neural network (GNN) in image and video understanding using deep learning for computer vision applications. In2021 Second International Conference on Electronics and Sustainable Communication Systems (ICESC) 2021 (pp. 1183-1189).

Shi W, Rajkumar R. Point-gnn: Graph neural network for 3d object detection in a point cloud. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition 2020 (pp. 1711-1719).

Wu Y, Dai HN, Tang H. Graph neural networks for anomaly detection in industrial internet of things. IEEE Int Things J. 2021;9(12):9214–31.

Pitsik EN, et al. The topology of fMRI-based networks defines the performance of a graph neural network for the classification of patients with major depressive disorder. Chaos Solitons Fractals. 2023;167: 113041.

Liao W, Zeng B, Liu J, Wei P, Cheng X, Zhang W. Multi-level graph neural network for text sentiment analysis. Comput Electr Eng. 2021;92: 107096.

Kumar VS, Alemran A, Karras DA, Gupta SK, Dixit CK, Haralayya B. Natural Language Processing using Graph Neural Network for Text Classification. In2022 International Conference on Knowledge Engineering and Communication Systems (ICKES) 2022; (pp. 1-5).

Dara S, Srinivasulu CH, Babu CM, Ravuri A, Paruchuri T, Kilak AS, Vidyarthi A. Context-Aware auto-encoded graph neural model for dynamic question generation using NLP. ACM transactions on asian and low-resource language information processing. 2023.

Wu L, Cui P, Pei J, Zhao L, Guo X. Graph neural networks: foundation, frontiers and applications. InProceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining 2022; (pp. 4840-4841).

Scarselli F, Gori M, Tsoi AC, Hagenbuchner M, Monfardini G. The graph neural network model. IEEE Trans neural networks. 2008;20(1):61–80.

Cao P, Zhu Z, Wang Z, Zhu Y, Niu Q. Applications of graph convolutional networks in computer vision. Neural Comput Appl. 2022;34(16):13387–405.

You R, Yao S, Mamitsuka H, Zhu S. DeepGraphGO: graph neural network for large-scale, multispecies protein function prediction. Bioinformatics. 2021;37(Supplement_1):i262-71.

Long Y, et al. Pre-training graph neural networks for link prediction in biomedical networks. Bioinformatics. 2022;38(8):2254–62.

Wu Y, Gao M, Zeng M, Zhang J, Li M. BridgeDPI: a novel graph neural network for predicting drug–protein interactions. Bioinformatics. 2022;38(9):2571–8.

Kang C, Zhang H, Liu Z, Huang S, Yin Y. LR-GNN: a graph neural network based on link representation for predicting molecular associations. Briefings Bioinf. 2022;23(1):bbab513.

Wei X, Huang H, Ma L, Yang Z, Xu L. Recurrent Graph Neural Networks for Text Classification. in 2020 IEEE 11th International Conference on Software Engineering and Service Science (ICSESS), 2020, pp. 91–97.

Schlichtkrull MS, De Cao N, Titov I. Interpreting graph neural networks for nlp with differentiable edge masking. arXiv Prepr. arXiv2010.00577, 2020.

Tu M, Huang J, He X, Zhou B. Graph sequential network for reasoning over sequences. arXiv Prepr. arXiv2004.02001, 2020.


Acknowledgements

I am grateful to all of those with whom I have had the pleasure to work during this research work. Each member has provided me extensive personal and professional guidance and taught me a great deal about scientific research and life in general.

This work was supported by the Research Support Fund (RSF) of Symbiosis International (Deemed University), Pune, India.

Author information

Authors and affiliations.

Symbiosis Institute of Technology Pune Campus, Symbiosis International (Deemed University) (SIU), Lavale, Pune, 412115, India

Bharti Khemani

Symbiosis Centre for Applied Artificial Intelligence (SCAAI), Symbiosis Institute of Technology Pune Campus, Symbiosis International (Deemed University) (SIU), Lavale, Pune, 412115, India

Shruti Patil & Ketan Kotecha

IEEE, Department of Computer Science and Engineering, Institute of Technology, Nirma University, Ahmedabad, India

Sudeep Tanwar


Contributions

Conceptualization, BK and SP; methodology, BK and SP; software, BK; validation, BK, SP, KK; formal analysis, BK; investigation, BK; resources, BK; data curation, BK and SP; writing—original draft preparation, BK; writing—review and editing, SP, KK, and ST; visualization, BK; supervision, SP; project administration, SP, ST; funding acquisition, KK.

Corresponding author

Correspondence to Shruti Patil .

Ethics declarations

Ethics approval and consent to participate.

Not applicable

Consent for publication

Competing interests.

The authors declare that they have no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

See Tables  12 and 13


About this article

Cite this article

Khemani, B., Patil, S., Kotecha, K. et al. A review of graph neural networks: concepts, architectures, techniques, challenges, datasets, applications, and future directions. J Big Data 11 , 18 (2024). https://doi.org/10.1186/s40537-023-00876-4

Download citation

Received : 28 June 2023

Accepted : 27 December 2023

Published : 16 January 2024

DOI : https://doi.org/10.1186/s40537-023-00876-4

Share this article

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

  • Graph Neural Network (GNN)
  • Graph Convolution Network (GCN)
  • Graph Attention Networks (GAT)
  • Message Passing Mechanism
  • Natural Language Processing (NLP)


  • Open access
  • Published: 08 June 2020

Deep learning in finance and banking: A literature review and classification

  • Jian Huang 1 ,
  • Junyi Chai   ORCID: orcid.org/0000-0003-1560-845X 2 &
  • Stella Cho 2  

Frontiers of Business Research in China volume  14 , Article number:  13 ( 2020 ) Cite this article

64k Accesses

89 Citations

70 Altmetric

Metrics details

Deep learning has been widely applied in computer vision, natural language processing, and audio-visual recognition. The overwhelming success of deep learning as a data processing technique has sparked the interest of the research community. Given the proliferation of Fintech in recent years, the use of deep learning in finance and banking services has become prevalent. However, a detailed survey of the applications of deep learning in finance and banking is lacking in the existing literature. This study surveys and analyzes the literature on the application of deep learning models in the key finance and banking domains to provide a systematic evaluation of model preprocessing, input data, and model evaluation. Finally, we discuss three aspects that could affect the outcomes of financial deep learning models. This study provides academics and practitioners with insight and direction on the state of the art of the application of deep learning models in finance and banking.

Introduction

Deep learning (DL) is an advanced technique of machine learning (ML) based on artificial neural network (NN) algorithms. As a promising branch of artificial intelligence, DL has attracted great attention in recent years. Compared with conventional ML techniques such as support vector machines (SVM) and k-nearest neighbors (kNN), DL offers advantages in unsupervised feature learning, strong generalization, and robust training on big data. Currently, DL is applied comprehensively in classification and prediction tasks, computer vision, image processing, and audio-visual recognition (Chai and Li 2019 ). Although DL was developed in the field of computer science, its applications have penetrated diversified fields such as medicine, neuroscience, physics and astronomy, finance and banking (F&B), and operations management (Chai et al. 2013 ; Chai and Ngai 2020 ). The existing literature lacks a good overview of DL applications in F&B fields. This study attempts to bridge this gap.

While DL research has mainly focused on computer vision (e.g., Elad and Aharon 2006 ; Guo et al. 2016 ) and natural language processing (e.g., Collobert et al. 2011 ), DL applications in F&B are developing rapidly. Shravan and Vadlamani (2016) investigated text-mining tools for F&B domains, examining representative ML algorithms including SVM, kNN, the genetic algorithm (GA), and AdaBoost. Butaru et al. ( 2016 ) compared the performance of ML algorithms including random forests, decision trees, and regularized logistic regression, and found that random forests achieved the highest classification accuracy in predicting delinquency status.

Cavalcante et al. ( 2016 ) summarized the literature published from 2009 to 2015. They analyzed models including the multi-layer perceptron (MLP), the Chebyshev functional link artificial NN, and the adaptive weighting NN. Although the study constructed a prediction framework for financial trading, notable DL techniques such as long short-term memory (LSTM) and reinforcement learning (RL) models are neglected. The framework therefore cannot ascertain the optimal model for a specific condition.

The existing reviews of the literature are either incomplete or outdated. Our study, in contrast, provides a comprehensive and state-of-the-art review that captures the relationships between typical DL models and various F&B domains. We identified critical conditions to limit our collection of articles. We employed the academic databases Science Direct, Springer-Link Journal, IEEE Xplore, Emerald, JSTOR, ProQuest Database, EBSCOhost Research Databases, Academic Search Premier, World Scientific Net, and Google Scholar to search for articles. We used two groups of keywords for our search. One group is related to DL, including “deep learning,” “neural network,” “convolutional neural networks” (CNN), “recurrent neural network” (RNN), “LSTM,” and “RL.” The other group is related to finance, including “finance,” “market risk,” “stock risk,” “credit risk,” “stock market,” and “banking.” It is important to conduct cross searches between computer-science-related and finance-related literature. Our survey focuses exclusively on financial applications of DL models rather than other ML models like SVM, kNN, or random forest. The time range of our review was set between 2014 and 2018. In this stage, we collected more than 150 articles after cross-searching. We carefully reviewed each article and considered whether it was worthy of entering our pool of articles for review. We removed articles that were not from reputable journals or top professional conferences, and discarded articles in which the details of the financial DL models were not clarified. Eventually, 40 articles were selected for this review.

This study contributes to the literature in the following ways. First, we systematically review the state-of-the-art applications of DL in F&B fields. Second, we summarize multiple DL models for specified F&B domains and identify the optimal DL model for various application scenarios. Our analyses rely on the data processing methods of DL models, including preprocessing, input data, and evaluation rules. Third, our review attempts to bridge the technological and application levels of DL and F&B, respectively. We recognize the features of various DL models and highlight their feasibility for different F&B domains. The penetration of DL into F&B is an emerging trend. Researchers and financial analysts should know the feasibility of particular DL models for a specified financial domain, but they usually face difficulties due to the lack of connections between core financial domains and the numerous DL models. This study fills this literature gap and offers guidance to financial analysts.

The rest of this paper is organized as follows. Section 2 provides a background of DL techniques. Section 3 introduces our research framework and methodology. Section 4 analyzes the established DL models. Section 5 analyzes key methods of data processing, including data preprocessing and data inputs. Section 6 summarizes the criteria used for evaluating the performance of DL models. Section 7 provides a general comparison of DL models against the identified F&B domains. Section 8 discusses the factors influencing the performance of financial DL models. Section 9 concludes and outlines the scope for promising future studies.

Background of deep learning

Regarding DL, the term “deep” refers to the multiple layers in the network. The history of DL can be traced back to stochastic gradient descent in the early 1950s, when it was employed for optimization problems. The bottleneck of DL at that time was the limit of computer hardware, as it was very time-consuming for computers to process the data. Today, DL is booming thanks to developments in graphics processing units (GPUs), dataset storage and processing, distributed systems, and software such as TensorFlow. This section briefly reviews the basic concepts of DL, including NN and the deep neural network (DNN). All of these models have greatly contributed to applications in F&B.

The basic structure of an NN can be illustrated as \( Y = F(X^{T}w + c) \) with independent (input) variables X, weight terms w, and constant terms c. Y is the dependent variable, and X is an n × m matrix for n training samples and m input variables. To apply this structure in finance, Y can be the next-period price, the credit risk level of clients, or the return rate of a portfolio. F is an activation function, which distinguishes an NN from a regression model. F is usually a sigmoid or tanh function, although other functions can also be used, including ReLU, identity, binary step, ArcTan, ArcSinh, ISRU, ISRLU, and SQNL functions. If we combine several perceptrons in each layer and add a hidden layer (Z_1 to Z_4) in the middle, we obtain a single-hidden-layer neural network, where the input layer is the Xs and the output layer is the Ys. In finance, Y can be the stock price. Multiple Ys are also applicable; for instance, fund managers often care about both future prices and fluctuations. Figure 1 illustrates the basic structure.

Figure 1. The structure of NN
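
A concrete illustration may help here. The following minimal sketch (Python/NumPy; the dimensions, random weights, and interpretation of the inputs are illustrative assumptions, not a trained financial model) implements the forward pass \( Y = F(X^{T}w + c) \) with one hidden layer:

```python
# Forward pass of a single-hidden-layer NN: Y = F(Xw + c).
# All sizes and weights are illustrative; nothing here is trained.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n, m, h = 100, 5, 4             # n samples, m input variables, 4 hidden units
X = np.random.rand(n, m)        # e.g., lagged prices or technical indicators

W1, c1 = np.random.randn(m, h), np.zeros(h)   # input -> hidden (Z_1..Z_4)
W2, c2 = np.random.randn(h, 1), np.zeros(1)   # hidden -> output

Z = np.tanh(X @ W1 + c1)        # hidden layer with tanh activation
Y = sigmoid(Z @ W2 + c2)        # output, e.g., probability that price rises
print(Y.shape)                  # (100, 1)
```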

Based on the basic structure of NN shown in Fig. 1, traditional networks include the DNN, backpropagation (BP), the MLP, and the feedforward neural network (FNN). These models ignore the order of the data and the significance of time. As shown in Fig. 2, the RNN has a structure that can address long-term dependence and the order between input variables. Because financial time-series data are very common, uncovering their hidden correlations is critical in the real world, and RNN solves this problem better than the moving average (MA) methods that were frequently adopted before. A detailed structure of RNN for a sequence over time is shown in Part B of the Appendix (see Fig. 7 in Appendix ).

Figure 2. The abstract structure of RNN

Although RNN can resolve the issue of time-series order, the issue of long-term dependencies remains: it is difficult to find the optimal weights for long-term data. LSTM, a type of RNN, adds a gated cell to overcome long-term dependencies by combining different activation functions (e.g., sigmoid and tanh). Given that LSTM is frequently used for forecasting in the finance literature, we treat LSTM separately from RNN models and refer to other structures of standard RNN as RNN(O).
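
For readers unfamiliar with the gated cell, the following NumPy sketch implements the standard LSTM update equations, with forget, input, and output gates combining sigmoid and tanh activations (sizes and random weights are illustrative; real applications would use a library implementation):

```python
# One LSTM step: gates combine sigmoid and tanh activations to decide
# what to forget, what to write into memory, and what to output.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

d_in, d_h = 3, 8                            # input and hidden sizes (illustrative)
rng = np.random.default_rng(0)
W = {g: rng.normal(size=(d_in + d_h, d_h)) * 0.1 for g in "fico"}
b = {g: np.zeros(d_h) for g in "fico"}

def lstm_step(x_t, h_prev, c_prev):
    z = np.concatenate([x_t, h_prev])
    f = sigmoid(z @ W["f"] + b["f"])        # forget gate
    i = sigmoid(z @ W["i"] + b["i"])        # input gate
    c_tilde = np.tanh(z @ W["c"] + b["c"])  # candidate cell state
    c = f * c_prev + i * c_tilde            # long-term memory update
    o = sigmoid(z @ W["o"] + b["o"])        # output gate
    h = o * np.tanh(c)                      # hidden (short-term) state
    return h, c

h, c = np.zeros(d_h), np.zeros(d_h)
for x_t in rng.normal(size=(20, d_in)):     # e.g., 20 days of 3 features
    h, c = lstm_step(x_t, h, c)
```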

As we focus on applications rather than the theoretical aspects of DL, this background section does not detail other popular DL algorithms, such as CNN and RL, or latent variable models such as variational autoencoders and generative adversarial networks. Table 6 in the Appendix provides a legend explaining the abbreviations used in this paper. We summarize the relationships between commonly used DL models in Fig. 3.

Figure 3. Relationships of reviewed DL models for F&B domains

Research framework and methodology

Our research framework is illustrated in Fig. 4. We combine qualitative and quantitative analyses of the articles in this study. Based on our review, we recognize and identify seven core F&B domains, as shown in Fig. 5. To connect the DL side and the F&B side, we present our review of the application of DL models in the seven F&B domains in Section 4. It is crucial to analyze the feasibility of a DL model for particular domains. To do so, we summarize three key aspects, namely data preprocessing, data inputs, and evaluation rules, according to our collection of articles. Finally, we determine the optimal DL models for the identified domains. We further discuss two common issues in using DL models for F&B: overfitting and sustainability.

Figure 4. The research framework of this study

Figure 5. The identified domains of F&B for DL applications

Figure 5 shows that the application domains can be divided into two major areas: (1) banking and credit risk and (2) financial market investment. The former contains two domains: credit risk prediction and macroeconomic prediction. The latter contains financial prediction, trading, and portfolio management. Prediction tasks are crucial, as emphasized by Cavalcante et al. ( 2016 ); we study prediction from three aspects: exchange rate, stock market, and oil price. This constitutes the structure of application domains in F&B.

Figure 6 shows statistics for the listed F&B domains, with the domains of financial application on the X-axis and the number of articles on the Y-axis. Note that a reviewed article could cover more than one domain in this figure; thus, the sum of the counts (45) is larger than the size of our review pool (40 articles). As shown in Fig. 6, stock market prediction and trading dominate the listed domains, followed by exchange rate prediction. Moreover, we found two articles on banking credit risk and two articles on portfolio management. Price prediction and macroeconomic prediction are two potential topics that deserve more study.

Figure 6. A count of articles over seven identified F&B domains

Application of DL models in F&B domains

Based on our review, six types of DL models are reported: FNN, CNN, RNN, RL, deep belief networks (DBN), and the restricted Boltzmann machine (RBM). Regarding FNN, several papers use the alternative terms backpropagation artificial neural network (ANN), FNN, MLP, and DNN; these have an identical structure. Regarding RNN, one of its well-known models for time-series analysis is LSTM. Nearly half of the reviewed articles apply FNN as the primary DL technique. Nine articles apply LSTM, followed by eight articles for RL and six articles for RNN. Less frequently applied models in F&B include CNN, DBN, and RBM. We count the number of articles that use the various DL models in the seven F&B domains, as shown in Table 1. FNN is the principal model used in exchange rate, price, and macroeconomic predictions, as well as banking default risk and credit. LSTM and FNN are the two popular models for stock market prediction. By contrast, RL and FNN are frequently used for stock trading. FNN, RL, and simple RNN are used in portfolio management. FNN is the primary model in macroeconomic and banking risk prediction. CNN, LSTM, and RL are emerging research approaches in banking risk prediction. Detailed statistics for the specific articles can be found in Table 5 in the Appendix .

Exchange rate prediction

Shen et al. ( 2015 ) construct an improved DBN model that incorporates RBM and find that it outperforms the random walk algorithm, the auto-regressive moving average (ARMA), and FNN, with lower errors. Zheng et al. ( 2017 ) examine the performance of DBN and find that it estimates the exchange rate better than the FNN model does; they also find that a small number of layer nodes has a more significant effect on DBN.

Several scholars believe that a hybrid model should have better performance. Ravi et al. ( 2017 ) contribute a hybrid model using MLP (FNN), chaos theory, and multi-objective evolutionary algorithms. Their Chaos+MLP+NSGA-II model Footnote 1 achieves a very low mean squared error (MSE) of 2.16E-08. Several articles point out that only a complicated neural network like CNN can gain higher accuracy. For example, Galeshchuk and Mukherjee ( 2017 ) conduct experiments and report that a single-hidden-layer NN or SVM performs worse than a simple model like the moving average (MA). However, they find that CNN achieves higher classification accuracy in predicting the direction of exchange rate changes because of the successive layers of the DNN.

Stock market prediction

In stock market prediction, some studies suggest that market news influences the stock price and that a DL model can act as a filter, extracting the information useful for price prediction. Matsubara et al. ( 2018 ) extract information from news and propose a deep neural generative model, combining a DNN and a generative model, to predict the movement of the stock price. Their results suggest that this hybrid approach outperforms SVM and MLP.

Minh et al. ( 2017 ) develop a novel two-stream framework combining a gated recurrent unit network and Stock2vec. It employs a word-embedding and sentiment-training system on financial news and the Harvard IV-4 dataset. They use historical prices and news-based signals from the model to predict the price directions of the S&P 500 and the VN-index. Their results show that the two-stream gated recurrent unit is better than a single gated recurrent unit or LSTM. Jiang et al. ( 2018 ) establish a recurrent NN that extracts the interaction between the inner-domain and cross-domain of financial information, and prove that their model outperforms simple RNN and MLP in the currency and stock markets. Krausa and Feuerriegel ( 2017 ) propose transforming financial disclosures into decisions through a DL model. After training and testing, they show that LSTM works better than RNN and conventional ML methods such as ridge regression, Lasso, the elastic net, random forest, SVR, AdaBoost, and gradient boosting. They further pre-train word embeddings with transfer learning (Krausa and Feuerriegel 2017 ) and conclude that the best performance comes from LSTM with word embeddings. In sentiment analysis, Sohangir et al. ( 2018 ) compare LSTM, doc2vec, and CNN to evaluate stock opinions on StockTwits, concluding that CNN is the optimal model for predicting the sentiment of authors. This result may be further applied to predict stock market trends.

Data preprocessing is conducted before inputting data into the NN. Researchers may apply numeric unsupervised methods of feature extraction, including principal component analysis (PCA), autoencoders, RBM, and kNN. These methods can reduce computational complexity and prevent overfitting. Using high-frequency transaction data as input, Chen et al. ( 2018b ) establish a DL model with an autoencoder and an RBM. They compare their model with backpropagation FNN, the extreme learning machine, and radial basis FNN, and claim that their model better predicts the Chinese stock market. Chong et al. ( 2017 ) apply PCA and RBM to high-frequency data from the South Korean market. They find that their model can explain the residual of the autoregressive model; the DL model can thus extract additional information and improve prediction performance. Moreover, Singh and Srivastava ( 2017 ) describe a model involving 2-directional, 2-dimensional (2D 2 ) PCA and DNN. Their model outperforms 2D 2 PCA combined with radial basis FNN, as well as RNN.

For time-series data, it is sometimes difficult to judge the weights of long-term versus short-term data. The LSTM model is designed to resolve this problem in financial prediction, and the literature has attempted to prove that LSTM models are applicable and outperform conventional FNN models. Yan and Ouyang ( 2017 ) apply LSTM to challenge MLP, SVM, and kNN in predicting static and dynamic trends. After a wavelet decomposition and reconstruction of the financial time series, their model can predict a long-term dynamic trend. Baek and Kim ( 2018 ) apply LSTM not only to predicting the prices of the S&P 500 and KOSPI 200 but also to preventing overfitting. Kim and Won ( 2018 ) apply LSTM to the prediction of stock price volatility, proposing a hybrid model that combines LSTM with three generalized autoregressive conditional heteroscedasticity (GARCH)-type models. Hernandez and Abad ( 2018 ) argue that RBM is inappropriate for dynamic data modeling in time-series analysis because it cannot retain memory. They apply a modified RBM model, called p -RBM, that can retain the memory of p past states and use it to predict market directions of the NASDAQ-100 index. Comparing it with vector autoregression (VAR) and LSTM, however, they find that LSTM is better, because it can uncover the hidden structure within non-linear data, whereas VAR and p -RBM cannot capture this non-linearity.

CNNs, with their more complicated structure, have also been used for price prediction. Making the best use of historical prices, Dingli and Fournier ( 2017 ) develop a new CNN model to predict the next month's price; their results, however, do not surpass comparable models such as logistic regression (LR) and SVM. Tadaaki ( 2018 ) converts financial ratios into a “grayscale image” for a CNN model. The results reveal that CNN is more efficient than decision trees (DT), SVM, linear discriminant analysis, MLP, and AdaBoost. To predict the stock direction, Gunduz et al. ( 2017 ) establish a CNN model with a specially ordered feature set, whose classifier outperforms both plain CNN and LR.
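
To illustrate the “financial ratios as grayscale images” idea, the sketch below (assuming TensorFlow/Keras is available; the data are synthetic and the layer sizes are our illustrative choices, not Tadaaki's architecture) reshapes each firm's ratio vector into an 8×8 single-channel image and classifies it with a small CNN:

```python
# Treat each firm's 64 financial ratios as an 8x8 "grayscale image"
# and classify bankruptcy with a small CNN. Data are synthetic.
import numpy as np
import tensorflow as tf

n_firms, n_ratios = 500, 64
X = np.random.rand(n_firms, n_ratios).astype("float32")
y = np.random.randint(0, 2, size=n_firms)      # 1 = default / bankrupt

X_img = X.reshape(-1, 8, 8, 1)                 # one channel, image-like

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, (3, 3), activation="relu",
                           input_shape=(8, 8, 1)),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.fit(X_img, y, epochs=5, batch_size=32, verbose=0)
```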

Stock trading

Many studies adopt the conventional FNN model and try to set up a profitable trading system. Sezer et al. ( 2017 ) combine GA with MLP. Chen et al. ( 2017 ) adopt a double-layer NN and discover that its accuracy is better than ARMA-GARCH and a single-layer NN. Hsu et al. ( 2018 ) combine the Black-Scholes model with a three-layer fully connected feedforward network to estimate the bid-ask spread of option prices, arguing that this novel model beats the conventional Black-Scholes model with a lower RMSE. Krauss et al. ( 2017 ) apply DNN, gradient-boosted trees, and random forests to statistical arbitrage, and report that their returns outperform the S&P 500 market index.

Several studies report that RNN and its derivative models are promising. Deng et al. ( 2017 ) extend fuzzy learning to the RNN model; after comparing their model to different DL models such as CNN, RNN, and LSTM, they claim that their model is optimal. Fischer and Krauss ( 2017 ) and Bao et al. ( 2017 ) argue that LSTM can create an optimal trading system. Fischer and Krauss ( 2017 ) claim that their model has a daily return of 0.46% and a Sharpe ratio of 5.8 before transaction costs; given transaction costs, however, LSTM's profitability fluctuated around zero after 2010. Bao et al. ( 2017 ) advance Fischer and Krauss's ( 2017 ) work and propose a novel DL model (the WSAEs-LSTM model). It uses wavelet transforms to eliminate noise, stacked autoencoders (SAEs) to extract deep features, and LSTM to predict the closing price. The results show that their model outperforms models such as WLSTM, Footnote 2 LSTM, and RNN in predictive accuracy and profitability.

RL has recently become popular despite its complexity; five of the reviewed studies apply this model to trading. Chen et al. ( 2018a ) propose an agent-based RL system to mimic 80% of professional trading strategies. Feuerriegel and Prendinger ( 2016 ) convert news sentiment into signals for a trading system, although their daily returns and abnormal returns are nearly zero. Chakraborty ( 2019 ) casts general financial market fluctuation as a stochastic control problem and explores the power of two RL models, Q-learning Footnote 3 and the state-action-reward-state-action (SARSA) algorithm. Both models enhance profitability (9.76% for Q-learning and 8.52% for SARSA) and outperform the buy-and-hold strategy. Footnote 4 Zhang and Maringer ( 2015 ) conduct a hybrid model, GA with recurrent RL, in which GA selects an optimal combination of technical, fundamental, and volatility indicators; out-of-sample trading performance is improved, with a significantly positive Sharpe ratio. Martinez-Miranda et al. ( 2016 ) open a new trading topic: rather than a trading system, they build a market manipulation scanner, using RL to model spoofing-and-pinging trading. Their study reveals that the model only works in bull markets. Jeong and Kim ( 2018 ) propose a deep Q-network constructed from RL, DNN, and transfer learning, using transfer learning to solve the overfitting incurred by insufficient data. They argue that the profit yields of this system increase by four times the amount in the S&P 500, five times in KOSPI, six times in EuroStoxx50, and 12 times in HSI.
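
To clarify the Q-learning mechanics used in such systems, here is a toy tabular sketch (the price series, state definition, and reward scheme are illustrative assumptions, far simpler than any of the reviewed systems): the agent picks hold/long/short actions and nudges Q(s, a) toward the observed reward plus the discounted best next-state value.

```python
# Toy tabular Q-learning on a synthetic price series. State = direction of
# yesterday's move (0 = down, 1 = up); actions: 0 = hold, 1 = long, 2 = short.
import numpy as np

rng = np.random.default_rng(1)
prices = np.cumsum(rng.normal(0, 1, 1000)) + 100

alpha, gamma, eps = 0.1, 0.95, 0.1   # learning rate, discount, exploration
Q = np.zeros((2, 3))                 # Q-table: 2 states x 3 actions

def state(t):
    return int(prices[t] > prices[t - 1])

for t in range(1, len(prices) - 1):
    s = state(t)
    a = rng.integers(3) if rng.random() < eps else int(np.argmax(Q[s]))
    ret = prices[t + 1] - prices[t]
    reward = {0: 0.0, 1: ret, 2: -ret}[int(a)]   # long gains if price rises
    s_next = state(t + 1)
    # Q-learning update: move Q(s,a) toward reward + discounted best future
    Q[s, a] += alpha * (reward + gamma * Q[s_next].max() - Q[s, a])

print(Q)   # learned action values per state
```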

Banking default risk and credit

Most articles in this domain focus on FNN applications. Rönnqvist and Sarlin ( 2017 ) propose a model for detecting relevant discussions in text and extracting natural language descriptions of events, converting news into bank-distress signals. In their back-test, the model captures the distress events of the 2007–2008 financial crisis.

Zhu et al. ( 2018 ) propose a hybrid CNN model with a feature selection algorithm; it outperforms LR and random forest in consumer credit scoring. Wang et al. ( 2019 ) consider that online operation data can be used to predict consumer credit scores. They convert each kind of event into a word and apply the Event2vec model to transform the words into vectors for an LSTM network; the resulting probability of default achieves higher accuracy than other models. Jurgovsky et al. ( 2018 ) employ LSTM to detect credit card fraud and find that it enhances detection accuracy.

Han et al. ( 2018 ) report a method that adopts RL to assess credit risk. They claim that high-dimensional partial differential equations (PDEs) can be reformulated as backward stochastic differential equations, with an NN approximating the gradient of the unknown solution. This model can be applied to F&B risk evaluation by simultaneously considering all elements, such as participating agents, assets, and resources.

Portfolio management

Song et al. ( 2017 ) establish a model that combines ListNet and RankNet to build a portfolio, taking a long position in the top 25% of stocks and a short position in the bottom 25% weekly. The ListNet long-short model is the optimal one, achieving a return of 9.56%. Almahdi and Yang ( 2017 ) establish a better portfolio with a combination of RNN and RL. The results show that the proposed trading system responds to transaction cost effects efficiently and outperforms hedge fund benchmarks consistently.

Macroeconomic prediction

Sevim et al. ( 2014 ) develop a model with a backpropagation learning algorithm to predict financial crises up to a year before they happen. This model contains three-layer perceptrons (i.e., MLP) and achieves an accuracy rate of approximately 95%, which is superior to DT and LR. Chatzis et al. ( 2018 ) examine multiple models, such as classification trees, SVM, random forests, DNN, and extreme gradient boosting, to predict market crises. The results show that crisis events exhibit persistence, and that using DNN increases the classification accuracy, making global warning systems more efficient.

Price prediction

For price prediction, Sehgal and Pandey ( 2015 ) review ANN, SVM, wavelet, GA, and hybrid systems, separating time-series models into stochastic models, AI-based models, and regression models for predicting oil prices. They find that researchers predominantly use MLP for price prediction.

Data preprocessing and data input

Data preprocessing

Data preprocessing is conducted to denoise the data before DL training. This section summarizes the methods of data preprocessing. The preprocessing techniques discussed in Section 4 include principal component analysis (Chong et al. 2017 ), SVM (Gunduz et al. 2017 ), autoencoders, and RBM (Chen et al. 2018b ). Several additional feature selection techniques are as follows.

Relief: The relief algorithm (Zhu et al. 2018 ) is a simple approach to weighing the importance of features. Based on nearest-neighbor comparisons, relief repeats a sampling process n times and divides each final weight by n . The weights form a relevance vector, and features are selected if their relevance exceeds a threshold τ (a code sketch appears after this list).

Wavelet transforms: Wavelet transforms are used to remove noise from the financial time series before it is fed into a DL network. They are a widely used technique for filtering and mining single-dimensional signals (Bao et al. 2017 ); a denoising sketch also follows the list.

Chi-square: Chi-square selection is commonly used in ML to measure the dependence between a feature and a class label. The representative usage is by Gunduz et al. ( 2017 ).

Random forest: Random forest algorithm is a two-stage process that contains random feature selection and bagging. The representative usage is by Fischer and Krauss ( 2017 ).
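
A simplified sketch of the relief weighting described above may be useful (binary classes, Manhattan-distance nearest hit and miss, synthetic data; real relief variants differ in neighbor counts and thresholds, and the threshold τ below is an assumed value):

```python
# Simplified relief: weight features by how well they separate each sampled
# instance from its nearest neighbor of the other class (miss) versus the
# nearest neighbor of its own class (hit).
import numpy as np

def relief(X, y, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        i = rng.integers(n)
        dists = np.abs(X - X[i]).sum(axis=1)   # Manhattan distances
        dists[i] = np.inf                      # exclude the sample itself
        hit = np.where(y == y[i], dists, np.inf).argmin()
        miss = np.where(y != y[i], dists, np.inf).argmin()
        w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return w / n_iter                          # per-feature relevance

X = np.random.rand(200, 10)
y = (X[:, 0] > 0.5).astype(int)                # only feature 0 matters here
relevance = relief(X, y)
selected = np.where(relevance > 0.01)[0]       # tau = 0.01 (assumed)
```

Wavelet denoising can likewise be sketched with the PyWavelets package (the wavelet, decomposition level, and threshold are illustrative assumptions, not the settings of Bao et al. 2017):

```python
# Wavelet denoising of a noisy price series: decompose, soft-threshold the
# detail coefficients, and reconstruct. Requires PyWavelets (pywt).
import numpy as np
import pywt

rng = np.random.default_rng(0)
prices = np.cumsum(rng.normal(0.0, 1.0, 512)) + 100.0
noisy = prices + rng.normal(0.0, 0.5, 512)

coeffs = pywt.wavedec(noisy, "haar", level=2)          # multilevel transform
coeffs[1:] = [pywt.threshold(c, 0.5, mode="soft") for c in coeffs[1:]]
denoised = pywt.waverec(coeffs, "haar")
```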

Data inputs

Data inputs are an important criterion for judging whether a DL model is feasible for particular F&B domains. This section summarizes the methods of data input that have been adopted in the literature. Based on our review, six types of input data in the F&B domain can be identified. Table 2 provides a detailed summary of the input variables in F&B domains.

Historical price: The daily exchange rate can be considered a historical price. The price can be the high, low, open, or close price of a stock. Related articles include Bao et al. ( 2017 ), Chen et al. ( 2017 ), Singh and Srivastava ( 2017 ), and Yan and Ouyang ( 2017 ).

Technical index: Technical indexes include MA, exponential MA, MA convergence divergence, and relative strength index. Related articles include Bao et al. ( 2017 ), Chen et al. ( 2017 ), Gunduz et al. ( 2017 ), Sezer et al. ( 2017 ), Singh and Srivastava ( 2017 ), and Yan and Ouyang ( 2017 ).

Financial news: Financial news covers financial messages, sentiment shock scores, and sentiment trend scores. Related articles include Feuerriegel and Prendinger ( 2016 ), Krausa and Feuerriegel ( 2017 ), Minh et al. ( 2017 ), and Song et al. ( 2017 ).

Financial report data: These include items from the balance sheet or the financial report (e.g., return on equity, return on assets, price-to-earnings ratio, and debt-to-equity ratio). Zhang and Maringer ( 2015 ) is a representative study on the subject.

Macroeconomic data: These include macroeconomic variables that may affect elements of the financial market, such as the exchange rate, interest rate, overnight interest rate, and gross foreign exchange reserves of the central bank. Representative articles include Bao et al. ( 2017 ), Kim and Won ( 2018 ), and Sevim et al. ( 2014 ).

Stochastic data: Chakraborty ( 2019 ) provides a representative implementation.

Evaluation rules

It is critical to judge whether an adopted DL model works well in a particular financial domain. We thus need to consider systems of evaluation criteria for gauging the performance of a DL model. This section summarizes the evaluation rules of F&B-oriented DL models. Based on our review, three types of evaluation rules dominate: error terms, accuracy indexes, and financial indexes. Table 3 provides a detailed summary. The evaluation rules can be grouped into the following categories.

Error term: Suppose \( Y_{t+i} \) and \( F_{t+i} \) are the real value and the predicted value, respectively, and m is the total number of predictions. The following formulas are commonly employed for evaluating DL models (a short code sketch of these error terms follows the list).

Mean Absolute Error (MAE): \( \frac{1}{m}\sum_{i=1}^{m}\left|Y_{t+i}-F_{t+i}\right| \);

Mean Absolute Percent Error (MAPE): \( \frac{100}{m}\sum_{i=1}^{m}\frac{\left|Y_{t+i}-F_{t+i}\right|}{Y_{t+i}} \);

Mean Squared Error (MSE): \( \frac{1}{m}\sum_{i=1}^{m}\left(Y_{t+i}-F_{t+i}\right)^{2} \);

Root Mean Squared Error (RMSE): \( \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(Y_{t+i}-F_{t+i}\right)^{2}} \);

Normalized Mean Square Error (NMSE): \( \frac{1}{m}\,\frac{\sum_{i=1}^{m}\left(Y_{t+i}-F_{t+i}\right)^{2}}{\operatorname{var}\left(Y_{t+i}\right)} \).
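
These error terms translate directly into NumPy; the example arrays below are arbitrary illustrations:

```python
# The five error terms above, computed with NumPy on arbitrary example data.
import numpy as np

Y = np.array([1.00, 1.02, 0.99, 1.05])   # real values Y_{t+i}
F = np.array([1.01, 1.00, 1.00, 1.06])   # predictions F_{t+i}

mae  = np.mean(np.abs(Y - F))
mape = 100.0 * np.mean(np.abs(Y - F) / Y)
mse  = np.mean((Y - F) ** 2)
rmse = np.sqrt(mse)
nmse = mse / np.var(Y)
```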

Accuracy index: Following Matsubara et al. ( 2018 ), we use TP, TN, FP, and FN to represent the numbers of true positives, true negatives, false positives, and false negatives, respectively, in a confusion matrix for classification evaluation. Based on our review, we summarize the accuracy indexes as follows (a code sketch follows the list).

Directional Predictive Accuracy (DPA): \( \frac{1}{N}\sum_{t=1}^{N} D_{t} \), where \( D_{t} = 1 \) if \( \left(Y_{t+1}-Y_{t}\right)\left(F_{t+1}-Y_{t}\right) \ge 0 \) and \( D_{t} = 0 \) otherwise;

Accuracy (ACC): \( \frac{TP+TN}{TP+FP+FN+TN} \);

Matthews Correlation Coefficient (MCC): \( \frac{TP\times TN-FP\times FN}{\sqrt{\left(TP+FP\right)\left(TP+FN\right)\left(TN+FP\right)\left(TN+FN\right)}} \).
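
Given confusion-matrix counts (the numbers below are made up), ACC and MCC follow directly:

```python
# Accuracy (ACC) and Matthews correlation coefficient (MCC)
# from confusion-matrix counts.
import math

TP, TN, FP, FN = 60, 55, 10, 15

acc = (TP + TN) / (TP + FP + FN + TN)
mcc = (TP * TN - FP * FN) / math.sqrt(
    (TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
```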

Financial index: Financial indexes include total return, Sharpe ratio, abnormal return, annualized return, annualized number of transactions, percentage of success, average profit percentage per transaction, average transaction length, maximum profit percentage per transaction, maximum loss percentage per transaction, maximum capital, and minimum capital.

For predictions that regress numeric dependent variables (e.g., exchange rate or stock market prediction), the evaluation rules are mostly error terms. For predictions that classify categorical data (e.g., predicting the direction of the oil price), accuracy indexes are widely used. For stock trading and portfolio management, financial indexes are the final evaluation rules.

General comparisons of DL models

This study identifies the most efficient DL model in each identified F&B domain. Table  4 illustrates our comparisons of the error terms in the pool of reviewed articles. Note that “A > B” means that the performance of model A is better than that of model B. “A + B” indicates the hybridization of multiple DL models.

At this point, we have summarized three aspects of data processing in DL models across the seven specified F&B domains: data preprocessing, data inputs, and evaluation rules. At the technical level of DL, we find the following:

NN has advantages in handling cross-sectional data;

RNN and LSTM are more feasible in handling time series data;

CNN has advantages in handling the data with multicollinearity.

Regarding application domains, we can draw the following conclusions. Cross-sectional data usually appear in exchange rate, price, and macroeconomic prediction, for which NN is the most feasible model. Time-series data usually appear in stock market prediction, for which LSTM and RNN are the best options. Stock trading requires the capabilities of decision making and self-learning, for which RL is the best. Moreover, CNN is more suitable for the multivariable environments of any F&B domain. As the statistics in the Appendix show, the frequency with which the corresponding DL models are used matches this analysis. Selecting proper DL models according to the particular needs of financial analysis is usually challenging and crucial; this study provides several recommendations.

We have summarized the emerging DL models in F&B domains. Nevertheless, can these models refute the efficient market hypothesis (EMH)? Footnote 5 According to the EMH, the financial market has its own discipline, and no long-term technical tool can outperform an efficient market. If so, using DL models in long-term trading may not be practical, and requires further experimental tests. Why, then, do most of the reviewed articles argue that their DL trading models outperform market returns? This argument challenges the EMH. A possible explanation is that many DL algorithms are still difficult to apply in the real-world market. DL models may create trading opportunities that yield abnormal returns in the short term. In the long run, however, many algorithms may lose their superiority, whereas the EMH still holds as more traders recognize the arbitrage gap exploited by these DL models.

This section discusses three aspects that could affect the outcomes of DL models in finance.

Training and validation of data processing

The size of the training set

The most direct way to improve the performance of models is to enhance the size of the training data. Bootstrapping can be used for data resampling, and a generative adversarial network (GAN) can extend the data features, but both can only reproduce the numerical characteristics of existing features. Sometimes the sample set is not diverse enough and thus loses its representativeness, in which case expanding the data size could make the model more unstable. The current literature reports widely varying sizes of training sets; the required data size in the training stage varies with the F&B task.

The number of input factors

Input variables are the independent variables. Based on our review, multi-factor models normally perform better than single-factor models, provided the additional input factors are informative. In time-series models, long-term data yield lower prediction errors than data from a short period. The number of input factors depends on the DL structure employed and the specific environment of the F&B task.

The quality of data

Several methods can be used to improve data quality, including data cleaning (e.g., dealing with missing data), data normalization (e.g., taking the logarithm, calculating the changes of variables, and calculating the t -values of variables), feature selection (e.g., the Chi-square test), and dimensionality reduction (e.g., PCA). Financial DL models require that the input variables be economically interpretable. When inputting the data, researchers should distinguish the effective variables from noise. Several financial features, such as technical indexes, are likely to be created and added to the model.
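
For instance, two of the normalization steps above, taking log returns and standardizing them, are one-line operations (a sketch on a short, made-up price series):

```python
# Log returns and z-score normalization of a price series.
import numpy as np

prices = np.array([100.0, 101.5, 101.0, 103.2, 102.8])

log_returns = np.diff(np.log(prices))     # the "taking the logarithm" step
z_scores = (log_returns - log_returns.mean()) / log_returns.std()
```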

Selection on structures of DL models

DL model selection should depend on the problem domain and the specific case in finance. NN is suitable for processing cross-sectional data. LSTM and other RNNs are optimal choices for time-series prediction tasks. CNN can settle the multicollinearity issue through data compression. Latent variable models like GAN can be better for dimension reduction and clustering. RL is applicable in cases involving judgments, like portfolio management and trading; the return levels and outcomes of RL can be affected significantly by the environment (observation) definitions, the state-transition probability matrix, and the actions.

The setting of objective functions and the convexity of evaluation rules

The choice of objective function affects the training process and the expected outcomes. For stock price prediction, a low MAE merely reflects the effectiveness of the applied model in training; it may still fail to predict future directions. Additional F&B-specific evaluation rules are therefore vital. Moreover, the objective function is easier to optimize if it is convex.

The influence of overfitting (underfitting)

Overfitting (or underfitting) commonly happens when using DL models, and it is clearly unfavorable: a model may perform perfectly in one case but usually cannot replicate that good performance with the same structure and identical coefficients. To solve this problem, we have to trade off bias against variance. Researchers prefer to keep the bias small to illustrate the superiority of their models. Generally, a deeper (i.e., more layered) NN model or more neurons can reduce errors, but this is more time-consuming and could reduce the feasibility of the applied DL model.

One solution is to establish validation and testing sets for deciding the numbers of layers and neurons. After setting optimal coefficients on the validation set (Chong et al. 2017 ; Sevim et al. 2014 ), the result on the testing set reveals the level of error and can mitigate the effect of overfitting. One can input more samples of financial data to check the stability of the model's performance. A related method is early stopping: it stops training the network once the validation result has reached an optimal level.
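
A generic early-stopping loop looks like the following sketch; `train_one_epoch`, `validation_loss`, and the `model.state()`/`model.load_state()` hooks are hypothetical placeholders for whatever framework is in use:

```python
# Generic early stopping: stop once the validation loss has not improved
# for `patience` consecutive epochs, and keep the best model seen so far.
def fit_with_early_stopping(model, train_one_epoch, validation_loss,
                            max_epochs=200, patience=10):
    best_loss, best_state, waited = float("inf"), None, 0
    for epoch in range(max_epochs):
        train_one_epoch(model)
        loss = validation_loss(model)
        if loss < best_loss:
            best_loss, best_state, waited = loss, model.state(), 0
        else:
            waited += 1
            if waited >= patience:      # no improvement for `patience` epochs
                break
    model.load_state(best_state)        # restore the best coefficients
    return model
```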

Moreover, regularization is another approach to conquering overfitting. Chong et al. ( 2017 ) introduce a penalty term in the objective function, which ultimately reduces the variance of the results. Dropout is also a simple method of addressing overfitting: it randomly deactivates a fraction of the network's units during training, effectively thinning the network (Minh et al. 2017 ; Wang et al. 2019 ). Finally, the data cleaning process (Baek and Kim 2018 ; Bao et al. 2017 ) can, to an extent, mitigate the impact of overfitting.
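
In Keras (assuming TensorFlow is available; the layer sizes and coefficients are arbitrary illustrations), both techniques are one-line additions, an L2 penalty on the weights and a Dropout layer between dense layers:

```python
# L2 regularization (a penalty term in the objective) and dropout
# (randomly deactivating units during training) in a small Keras model.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(
        64, activation="relu", input_shape=(20,),
        kernel_regularizer=tf.keras.regularizers.l2(1e-4)),
    tf.keras.layers.Dropout(0.3),   # drop 30% of units each training step
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
```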

Financial models

The sustainability of the model

According to our review, the literature focuses on evaluating performance on historical data. However, crucial problems remain. Given that prediction is always complicated, how to justify the robustness of a DL model going forward remains an open problem. Moreover, whether a DL model can survive in dynamic environments must be considered.

The following solutions could be considered. First, one can divide the data into two groups according to the time range and subsequently check performance (e.g., using the data from the first three years to predict the performance of the fourth year, as sketched below). Second, feature selection can be used in data preprocessing, which could improve the sustainability of models in the long run. Third, stochastic data can be generated for each input variable within a fixed confidence interval, after which a simulation examines the robustness of the model in all possible future situations.
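
The first solution is a walk-forward evaluation; a sketch follows (the window lengths are illustrative assumptions):

```python
# Walk-forward evaluation: always train on the past and test on the future,
# e.g., train on three years of data and test on the following year.
import numpy as np

def walk_forward_splits(n_samples, train_len, test_len):
    start = 0
    while start + train_len + test_len <= n_samples:
        train_idx = np.arange(start, start + train_len)
        test_idx = np.arange(start + train_len, start + train_len + test_len)
        yield train_idx, test_idx
        start += test_len                # slide the window forward

# e.g., ~756 trading days of training (3 years), 252 of testing (1 year)
for train_idx, test_idx in walk_forward_splits(2000, 756, 252):
    pass  # fit on data[train_idx], evaluate on data[test_idx]
```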

The popularity of the model

Whether a DL model is effective for trading depends on the popularity of the model in the financial market. If traders in the same market run an identical model with limited information, they may obtain identical results and adopt the same trading strategy accordingly. They may then lose money, because the shared strategy could force them to sell at a lower price after buying at a higher one.

Conclusion and future works

Concluding remarks

This paper provides a comprehensive survey of the literature on the application of DL in F&B. We carefully review 40 articles refined from a collection of over 150 articles published between 2014 and 2018, based on a scientific selection of academic databases. This paper first recognizes seven core F&B domains and establishes the relationships between these domains and their frequently used DL models, then reviews the details of each article under our framework. Importantly, we analyze the optimal models for particular domains and make recommendations according to the feasibility of the various DL models. We summarize three important aspects: data preprocessing, data inputs, and evaluation rules. We further analyze the unfavorable impacts of overfitting and of limited sustainability when applying DL models, and provide several possible solutions. This study contributes to the literature by presenting a valuable accumulation of knowledge on related studies and by providing useful recommendations for financial analysts and researchers.

Future works

Future studies can be conducted from the DL technical and the F&B application perspectives. Regarding DL techniques, training a DL model for F&B is usually time-consuming, but effective training can greatly enhance accuracy by reducing errors; most functions can be approximated given suitable weights in sufficiently complex networks. First, future work should focus on data preprocessing, such as data cleaning, to reduce the negative effect of data noise in the subsequent data training stage. Second, further studies on how to construct the network layers of DL models are required, particularly for reducing the unfavorable effects of overfitting and underfitting. According to our review, the comparisons between the discussed DL models do not rest on an identical source of input data, which undermines these comparisons. Third, more testing of F&B-oriented DL models would be beneficial.

Beyond the penetration of DL techniques into F&B fields, more structures of DL models should be explored. From the perspective of F&B applications, the following problems need further research. In financial planning, can a DL algorithm tailor asset recommendations to clients according to their risk preferences? In corporate finance, how can a DL algorithm benefit capital structure management and thus maximize corporate value? How can managers utilize DL tools to gauge the investment environment and financial data? How can they use such tools to optimize cash balances and cash inflows and outflows? Until recently, DL models like RL and generative adversarial networks were rarely used; more investigation into constructing such DL structures for F&B problems would be beneficial. Finally, the development of professional F&B software and system platforms that implement DL techniques is highly desirable.

Availability of data and materials

Not applicable.

In the model, NSGA stands for non-dominated sorting genetic algorithm.

A combination of Wavelet transforms (WT) and long-short term memory (LSTM) is called WLSTM in Bao et al. ( 2017 ).

Q-learning is a model-free reinforcement learning algorithm.

Buy-and-hold is a passive investment strategy in which an investor buys stocks (or ETFs) and holds them for a long period regardless of fluctuations in the market.

EMH was developed from a Ph.D. dissertation by economist Eugene Fama in the 1960s. It states that stock prices reflect all available information at any given time and always trade at exactly their fair value, so it is impossible to consistently choose stocks that beat the returns of the overall stock market. The hypothesis therefore implies that the pursuit of market-beating performance is more about chance than about researching and selecting the right stocks.

Almahdi, S., & Yang, S. Y. (2017). An adaptive portfolio trading system: A risk-return portfolio optimization using recurrent reinforcement learning with expected maximum drawdown. Expert Systems with Applications, 87 , 267–279.


Baek, Y., & Kim, H. Y. (2018). ModAugNet: A new forecasting framework for stock market index value with an overfitting prevention LSTM module and a prediction LSTM module. Expert Systems with Applications, 113 , 457–480.

Bao, W., Yue, J., & Rao, Y. (2017). A deep learning framework for financial time series using stacked autoencoders and long-short-term memory. PLoS One, 12 (7), e0180944.

Butaru, F., Chen, Q., Clark, B., Das, S., Lo, A. W., & Siddique, A. (2016). Risk and risk management in the credit card industry. Journal of Banking & Finance, 72 , 218–239.

Cavalcante, R. C., Brasileiro, R. C., Souza, V. L. F., Nobrega, J. P., & Oliveira, A. L. I. (2016). Computational intelligence and financial markets: A survey and future directions. Expert System with Application, 55 , 194–211.

Chai, J. Y., & Li, A. M. (2019). Deep learning in natural language processing: A state-of-the-art survey. In The proceeding of the 2019 international conference on machine learning and cybernetics (pp. 535–540). Japan: Kobe.


Chai, J. Y., Liu, J. N. K., & Ngai, E. W. T. (2013). Application of decision-making techniques in supplier selection: A systematic review of literature. Expert Systems with Applications, 40 (10), 3872–3885.

Chai, J. Y., & Ngai, E. W. T. (2020). Decision-making techniques in supplier selection: Recent accomplishments and what lies ahead. Expert Systems with Applications, 140 , 112903. https://doi.org/10.1016/j.eswa.2019.112903 .

Chakraborty, S. (2019). Deep reinforcement learning in financial markets Retrieved from https://arxiv.org/pdf/1907.04373.pdf . Accessed 04 Apr 2020.

Chatzis, S. P., Siakoulis, V., Petropoulos, A., Stavroulakis, E., & Vlachogiannakis, E. (2018). Forecasting stock market crisis events using deep and statistical machine learning techniques. Expert Systems with Applications, 112 , 353–371.

Chen, C. T., Chen, A. P., & Huang, S. H. (2018a). Cloning strategies from trading records using agent-based reinforcement learning algorithm. In The proceeding of IEEE international conference on agents (pp. 34–37).

Chen, H., Xiao, K., Sun, J., & Wu, S. (2017). A double-layer neural network framework for high-frequency forecasting. ACM Transactions on Management Information Systems, 7 (4), 11.

Chen, L., Qiao, Z., Wang, M., Wang, C., Du, R., & Stanley, H. E. (2018b). Which artificial intelligence algorithm better predicts the Chinese stock market? IEEE Access, 6 , 48625–48633.

Chong, E., Han, C., & Park, F. C. (2017). Deep learning networks for stock market analysis and prediction: Methodology, data representations, and case studies. Expert Systems with Applications, 83 , 187–205.

Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., & Kuksa, P. (2011). Natural language processing (almost) from scratch. Journal of Machine Learning Research, 12 , 2493–2537.

Deng, Y., Bao, F., Kong, Y., Ren, Z., & Dai, Q. (2017). Deep direct reinforcement learning for financial signal representation and trading. IEEE Transactions on Neural Networks and Learning Systems, 28 (3), 653–664.

Dingli, A., & Fournier, K. S. (2017). Financial time series forecasting—A machine learning approach. International Journal of Machine Learning and Computing, 4 , 11–27.

Elad, M., & Aharon, M. (2006). Image denoising via sparse and redundant representations over learned dictionaries. IEEE Transactions on Image Processing, 15 (12), 3736–3745.

Feuerriegel, S., & Prendinger, H. (2016). News-based trading strategies. Decision Support Systems, 90 , 65–74.

Fischer, T., & Krauss, C. (2017). Deep learning with long short-term memory networks for financial market predictions. European Journal of Operational Research, 270 (2), 654–669.

Galeshchuk, S., & Mukherjee, S. (2017). Deep networks for predicting the direction of change in foreign exchange rates. Intelligent Systems in Accounting, Finance and Maangement, 24 (4), 100–110.

Gunduz, H., Yaslan, Y., & Cataltepe, Z. (2017). Intraday prediction of Borsa Istanbul using convolutional neural networks and feature correlations. Knowledge-Based Systems, 137 , 138–148.

Guo, Y., Liu, Y., Oerlemans, A., Lao, S., Wu, S., & Lew, M. S. (2016). Deep learning for visual understanding: A review. Neurocomputing, 187 , 27–48.

Han, J., Jentzen, A., & Weinan, E. (2018). Solving high-dimensional partial differential equations using deep learning. Proceedings of the National Academy of Sciences of the United States of America (PNAS), 115(34), 8505–8510.

Hernandez, J., & Abad, A. G. (2018). Learning from multivariate discrete sequential data using a restricted Boltzmann machine model. In The proceeding of IEEE 1st Colombian conference on applications in computational intelligence (ColCACI) (pp. 1–6).

Hsu, P. Y., Chou, C., Huang, S. H., & Chen, A. P. (2018). A market making quotation strategy based on dual deep learning agents for option pricing and bid-ask spread estimation. In The proceeding of the IEEE international conference on agents (pp. 99–104).

Jeong, G., & Kim, H. Y. (2018). Improving financial trading decisions using deep Q-learning: Predicting the number of shares, action strategies and transfer learning. Expert Systems with Applications, 117 , 125–138.

Jiang, X., Pan, S., Jiang, J., & Long, G. (2018). Cross-domain deep learning approach for multiple financial market predictions. The proceeding of international joint conference on neural networks (pp. 1–8).

Jurgovsky, J., Granitzer, M., Ziegler, K., Calabretto, S., Portier, P. E., Guelton, L. H., & Caelen, O. (2018). Sequence classification for credit-card fraud detection. Expert Systems with Applications, 100 , 234–245.

Kim, H. Y., & Won, C. H. (2018). Forecasting the volatility of stock price index: A hybrid model integrating LSTM with multiple GARCH-type models. Expert Systems with Applications, 103 , 25–37.

Krausa, M., & Feuerriegel, S. (2017). Decision support from financial disclosures with deep neural networks and transfer learning Retrieved from https://arxiv.org/pdf/1710.03954.pdf Accessed 04 Apr 2020.


Krauss, C., Do, X. A., & Huck, N. (2017). Deep neural networks, gradient-boosted trees, random forests: Statistical arbitrage on the S&P500. European Journal of Operational Research, 259 (2), 689–702.

Martinez-Miranda, E., McBurney, P., & Howard, M. J. W. (2016). Learning unfair trading: A market manipulation analysis from the reinforcement learning perspective. In The proceeding of 2016 IEEE conference on evolving and adaptive intelligent systems (EAIS) (pp. 103–109).


Matsubara, T., Akita, R., & Uehara, K. (2018). Stock price prediction by deep neural generative model of news articles. IEICE Transactions on Information and Systems, 4 , 901–908.

Minh, D. L., Sadeghi-Niaraki, A., Huy, H. D., Min, K., & Moon, H. (2017). Deep learning approach for short-term stock trends prediction based on two-stream gated recurrent unit network. IEEE Access, 6 , 55392–55404.

Ravi, V., Pradeepkumar, D., & Deb, K. (2017). Financial time series prediction using hybrids of chaos theory, multi-layer perceptron and multi-objective evolutionary algorithms. Swarm and Evolutionary Computation, 36 , 136–149.

Rönnqvist, S., & Sarlin, P. (2017). Bank distress in the news describing events through deep learning. Neurocomputing, 264 (15), 57–70.

Sehgal, N., & Pandey, K. K. (2015). Artificial intelligence methods for oil price forecasting: A review and evaluation. Energy System, 6 , 479–506.

Sevim, C., Oztekin, A., Bali, O., Gumus, S., & Guresen, E. (2014). Developing an early warning system to predict currency crises. European Journal of Operational Research, 237 (3), 1095–1104.

Sezer, O. B., Ozbayoglu, M., & Gogdu, E. (2017). A deep neural-network-based stock trading system based on evolutionary optimized technical analysis parameters. Procedia Computer Science, 114 , 473–480.

Shen, F., Chao, J., & Zhao, J. (2015). Forecasting exchange rate using deep belief networks and conjugate gradient method. Neurocomputing, 167 , 243–253.

Singh, R., & Srivastava, S. (2017). Stock prediction using deep learning. Multimedia Tools Application, 76 , 18569–18584.

Sohangir, S., Wang, D., Pomeranets, A., & Khoshgoftaar, T. M. (2018). Big data: Deep learning for financial sentiment analysis. Journal of Big Data, 5 (3), 1–25.

Song, Q., Liu, A., & Yang, S. Y. (2017). Stock portfolio selection using learning-to-rank algorithms with news sentiment. Neurocomputing, 264 , 20–28.

Tadaaki, H. (2018). Bankruptcy prediction using imaged financial ratios and convolutional neural networks. Expert Systems with Applications, 117 , 287–299.

Wang, C., Han, D., Liu, Q., & Luo, S. (2019). A deep learning approach for credit scoring of peer-to-peer lending using attention mechanism LSTM. IEEE Access, 7 , 2161–2167.

Yan, H., & Ouyang, H. (2017). Financial time series prediction based on deep learning. Wireless Personal Communications, 102 , 683–700.

Zhang, J., & Maringer, D. (2015). Using a genetic algorithm to improve recurrent reinforcement learning for equity trading. Computational Economics, 47 , 551–567.

Zheng, J., Fu, X., & Zhang, G. (2017). Research on exchange rate forecasting based on a deep belief network. Neural Computing and Application, 31 , 573–582.

Zhu, B., Yang, W., Wang, H., & Yuan, Y. (2018). A hybrid deep learning model for consumer credit scoring. In The proceeding of international conference on artificial intelligence and big data (pp. 205–208).


Acknowledgments

The constructive comments of the editor and three anonymous reviewers on an earlier version of this paper are greatly appreciated. The authors are indebted to seminar participants at the 2019 China Accounting and Financial Innovation Forum at Zhuhai for insightful discussions. The corresponding author thanks the financial support from the BNU-HKBU United International College Research Grant under Grant R202026.

Funding

BNU-HKBU United International College Research Grant under Grant R202026.

Author information

Authors and Affiliations

Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong, China

Jian Huang

Division of Business and Management, BNU-HKBU United International College, Zhuhai, China

Junyi Chai & Stella Cho


Contributions

JH carried out the collections and analyses of the literature, participated in the design of this study and preliminarily drafted the manuscript. JC initiated the idea and research project, identified the research gap and motivations, carried out the collections and analyses of the literature, participated in the design of this study, helped to draft the manuscript and proofread the manuscript. SC participated in the design of the study and the analysis of the literature, helped to draft the manuscript and proofread the manuscript. The authors read and approved the final manuscript.

Corresponding author

Correspondence to Junyi Chai.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Part A. Summary of publications in DL and F&B domains

Part B. Detailed structure of standard RNN

The abstract structure of an RNN for a sequence across time can be extended as shown in Fig. 7 in the Appendix, which presents the inputs as X, the outputs as Y, the weights as w, and the Tanh activation functions.

Figure 7. The detailed structure of RNN
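To make the unrolled structure in Fig. 7 concrete, the following is a minimal NumPy sketch of a single recurrent step. The dimensions and weight initialisation are illustrative assumptions rather than values from any reviewed model, but the flow of X through the Tanh unit to Y follows the figure.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, W_hy, b_h, b_y):
    """One standard RNN time step: the hidden state is a Tanh of the
    current input and the previous hidden state; the output y_t is a
    linear read-out of the hidden state."""
    h_t = np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)  # recurrent Tanh unit
    y_t = h_t @ W_hy + b_y                           # output at time t
    return h_t, y_t

# Illustrative (assumed) sizes: 8 input features, 16 hidden units, 1 output
rng = np.random.default_rng(0)
W_xh, W_hh = rng.normal(size=(8, 16)), rng.normal(size=(16, 16))
W_hy = rng.normal(size=(16, 1))
b_h, b_y = np.zeros(16), np.zeros(1)

h = np.zeros(16)
for x_t in rng.normal(size=(5, 8)):  # unroll over a sequence of length 5
    h, y = rnn_step(x_t, h, W_xh, W_hh, W_hy, b_h, b_y)
```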

Part C. List of abbreviations

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article

Huang, J., Chai, J. & Cho, S. Deep learning in finance and banking: A literature review and classification. Front. Bus. Res. China 14, 13 (2020). https://doi.org/10.1186/s11782-020-00082-6

Received: 02 September 2019

Accepted: 30 April 2020

Published: 08 June 2020

DOI: https://doi.org/10.1186/s11782-020-00082-6

Keywords

  • Literature review
  • Deep learning

BRIEF RESEARCH REPORT article

Data leakage in deep learning studies of translational EEG

Geoffrey Brookshire

  • 1 SPARK Neuro Inc., New York, NY, United States
  • 2 Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, United States
  • 3 Pacific Brain Health Center, Pacific Neuroscience Institute and Foundation, Santa Monica, CA, United States
  • 4 Saint John's Cancer Institute at Providence Saint John's Health Center, Santa Monica, CA, United States
  • 5 Psychiatry and Biobehavioral Sciences, Semel Institute for Neuroscience and Human Behavior, David Geffen School of Medicine at University of California, Los Angeles, Los Angeles, CA, United States

A growing number of studies apply deep neural networks (DNNs) to recordings of human electroencephalography (EEG) to identify a range of disorders. In many studies, EEG recordings are split into segments, and each segment is randomly assigned to the training or test set. As a consequence, data from individual subjects appears in both the training and the test set. Could high test-set accuracy reflect data leakage from subject-specific patterns in the data, rather than patterns that identify a disease? We address this question by testing the performance of DNN classifiers using segment-based holdout (in which segments from one subject can appear in both the training and test set), and comparing this to their performance using subject-based holdout (where all segments from one subject appear exclusively in either the training set or the test set). In two datasets (one classifying Alzheimer's disease, and the other classifying epileptic seizures), we find that performance on previously-unseen subjects is strongly overestimated when models are trained using segment-based holdout. Finally, we survey the literature and find that the majority of translational DNN-EEG studies use segment-based holdout. Most published DNN-EEG studies may dramatically overestimate their classification performance on new subjects.

1 Introduction

Translational neuroscience studies increasingly turn to deep neural network (DNN) models to find structure in neural data. The power of DNN models comes from their ability to discover patterns in the data that researchers would not have been able to specify. DNN classifiers have the potential to revolutionize medical care by increasing the speed, accuracy, and availability of diagnosis ( Mall et al., 2023 ). DNNs have been trained on a variety of imaging techniques to identify a wide range of clinical conditions. Many of these studies use DNNs to diagnose diseases based on anatomical neuroimaging. For example, DNN models can identify Alzheimer's disease (AD) using structural magnetic resonance imaging (MRI) ( Wen et al., 2020 ), and a variety of cancers and brain injuries using CT scans ( Hosny et al., 2018 ; Kaka et al., 2021 ). In addition to anatomical data, a large number of studies have used DNNs to identify diseases from functional neuroimaging data. For example, DNNs with functional MRI show promise for identifying AD, Autism spectrum disorders, attention-deficit/hyperactivity disorder (ADHD), and schizophrenia ( Wen et al., 2018 ). Furthermore, DNNs have been used with electroencephalography (EEG) to study a variety of different neural and cognitive disorders ( de Bardeci et al., 2021 ).

Deep learning helps to reveal previously-unknown patterns in neuroimaging data, but it also presents researchers with subtle pitfalls. One set of challenges concerns how the data are split into separate training and test sets. The training set is used to fit the model's parameters, and the test set is used to estimate the model's performance on new data (a third subset of the data is often held aside as a validation set, used to tune the model's hyperparameters and to determine when to stop training the model). In some cases, researchers train their model on one subset of the available data, and then evaluate the model's performance on a separate test set. In other cases, researchers use cross-validation (CV) to train and test models on multiple subsets of the data. Under both of these approaches, researchers must be careful to avoid “data leakage” when splitting the data into training and test sets. Data leakage, which arises when information about the test set is present in the training set, results in a positively-biased estimate of the model's performance ( Kaufman et al., 2012 ). For example, in a data-mining competition focused on identifying patients with breast cancer, one team of researchers found that the patient ID number carried predictive information about cancer risk ( Rosset et al., 2010 ). These ID numbers may have appeared after compiling data from different medical institutions. Because the ID number was assigned based on patients' diagnosis, it constitutes a source of data leakage ( Rosset et al., 2010 ). In general, data leakage occurs when an experimenter handles the data in a way that artificially introduces correlations between the training and test sets.

DNN models typically require a large amount of training data to perform well, but neural datasets are usually expensive and difficult to obtain. To increase the number of observations available to train the model, these studies often split a single neural recording into multiple samples, and use each sample as a separate observation during training or testing. For example, a 3D structural MR volume could be split into multiple 2D slices, and an fMRI time-series could be split into multiple segments of time ( Wen et al., 2020 ). When multiple observations from a single subject are included in both the training and test sets, it constitutes data leakage: Instead of learning a generalizable pattern, these models could learn characteristics of the individual subjects in the training set, and then simply recognize those familiar subjects in the test set. As a result, these models perform well in the study's test set, leading the researchers to believe they have a robust classifier. In new subjects, however, the model may fail to generalize.

Prior research has shown that leakage of subject-specific information—sometimes referred to as “identity confounding” ( Chaibub Neto et al., 2019 )—occurs in a number of different research areas. For example, this type of data leakage occurs in published MRI studies ( Wen et al., 2020 ). Furthermore, leakage of subject-specific information is widespread in translational studies using optical coherence tomography (OCT), and leads to strongly inflated estimates of test accuracy ( Tampu et al., 2022 ). Identity confounding has also been demonstrated in studies that make clinical predictions on the basis of smartphone data, wearable sensor data, and audio voice recordings ( Saeb et al., 2017 ; Tougui et al., 2021 ).

Studies using DNNs with EEG are particularly susceptible to data leakage. In these studies, each subject's full EEG time-series (lasting several minutes) is commonly divided up into brief segments (lasting several seconds) ( de Bardeci et al., 2021 ). Each segment is then used as a separate observation during training or testing. This segmentation procedure is meant to ensure that DNN models have enough training data to learn robust representations of the patterns that characterize a disease, and to prepare the data for commonly-used model architectures. However, EEG segmentation leads to data leakage if the same subjects appear in both the training and test sets. Segments of EEG from one subject are more similar to each other than to segments from different subjects ( Demuru and Fraschini, 2020 ). Instead of learning an abstract representation that would generalize to new subjects, a DNN model could therefore achieve high classification accuracy by associating a label with each subject's idiosyncratic pattern of brain activity. As a consequence, randomly splitting EEG segments into training and test sets results in data leakage, and a biased estimate of test performance: accuracy is high on the researchers' test set, but the classifier will generalize poorly to new subjects. In a clinical setting, this leads to an apparently-promising diagnostic tool that fails when applied to new patients. To avoid this kind of data leakage, all segments from a given subject must be assigned to only a single partition of the data (i.e., train or validation or test).

How does leakage of subject-specific information bias the results of translational DNN-EEG studies? Here we address this question by examining the effects of data leakage in two case studies, and then reviewing the published literature to gauge the prevalence of this leakage. In the case studies, we reproduce two convolutional neural network (CNN) architectures used by published studies—both of which used a train-test split that introduced data leakage. In order to focus on the ways in which leakage results from the train-test split, and to facilitate comparison with prior literature, we reuse these published model architectures without any modification. First, we use a CNN to classify subjects as either healthy or as having dementia due to Alzheimer's disease. Second, we use a CNN to classify whether segments of time contain an epileptic seizure. In both datasets, we find that real-world performance is dramatically overestimated when data from individual subjects is included in both the training and test sets. In the literature review, we find that the majority of translational DNN-EEG studies suffer from data leakage due to data from individual subjects appearing in both the training and test sets.

2.1 Deep neural network analysis overview

To investigate how segment-based holdout leads to data leakage, we reproduced the model architectures from two published studies ( Oh et al., 2020 ; Rashed-Al-Mahfuz et al., 2021 ). The goal of these analyses was not to develop an optimal architecture, but rather to evaluate the impact of different cross-validation choices on the estimated model performance. We therefore re-used the published architectures and data processing pipelines without modification, and without any model selection or hyperparameter tuning. The code necessary to reproduce both of these DNN models is provided in the Supplementary material .

2.2 Experiment 1: Alzheimer's disease diagnosis

2.2.1 EEG data

We analyzed EEG data that was collected for a previously published study ( Ganapathi et al., 2022 ). These EEG recordings were provided to us by the Pacific Neuroscience Institute. All procedures were approved by the St. John's Cancer Institute Institutional Review Board (Protocol JWCI-19-1101) in accordance with the Helsinki Declaration of 1975. Patients were evaluated by a dementia specialist as part of their visit to a specialty memory clinic (Pacific Brain Health Center in Santa Monica, CA) for memory complaints. These evaluations included behavioral testing as well as EEG recordings. After these evaluations, subjects were selected by retrospectively reviewing charts for patients aged 55 and older seen between July 2018 and February 2021.

Patients received a consensus diagnosis from a panel of board-certified dementia specialists. Diagnoses were performed using standard clinical methods on the basis of neurological examinations, cognitive testing (MMSE Folstein et al., 1975 or MoCA Nasreddine et al., 2005 ), clinical history (e.g., hypertension, diabetes, head injury, depression), and laboratory results (e.g., vitamin B-12 levels, thyroid stimulating hormone levels, and rapid plasma reagin testing). These tests were used to rule out reversible causes of memory loss and to diagnose subjective cognitive impairment (SCI), mild cognitive impairment (MCI), and dementia. EEG data was not included in the diagnostic process. Cognitive impairment was diagnosed on the basis of MMSE [or MoCA scores converted to MMSE ( Bergeron et al., 2017 )], with MCI diagnosed according to established criteria ( Langa and Levine, 2014 ). MCI was distinguished from dementia on the basis of preserved independence in functional abilities, and a lack of significant impairment in social or occupational functioning. SCI was diagnosed in patients with subjective complaints but without evidence of MCI. Diagnostic categorization was based on the clinical syndromes ( Langa and Levine, 2014 ), and did not consider disease etiology or subtypes within each stage.

EEG data were recorded at 250 Hz using the eVox System (Evoke Neuroscience), with a cap that included 19 electrodes following the International 10-20 system (FP1, FP2, F7, F3, Fz, F4, F8, T7, C3, Cz, C4, T8, P7, P3, Pz, P4, P8, O1, and O2). The full EEG session included a 5-min block of eyes-open rest, a 5-min block of eyes-closed rest, and a 15-min go/no-go task. In this study, we analyzed only the eyes-open resting-state data. Recordings were low-pass filtered below 125 Hz, and split into non-overlapping segments of 2 s (500 samples) for model training. Channels were stacked to produce matrices of shape (500, 19) as model inputs.
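As a sketch of this segmentation step, the helper below splits a continuous (samples × channels) recording into the (500, 19) model inputs described above; the function is our illustration, not code from the published pipeline, and the filtering step is omitted.

```python
import numpy as np

def segment_recording(eeg, fs=250, seg_seconds=2):
    """Split a continuous (n_samples, n_channels) EEG recording into
    non-overlapping segments of seg_seconds; trailing samples that do
    not fill a whole segment are dropped."""
    seg_len = fs * seg_seconds                     # 500 samples at 250 Hz
    n_segs = eeg.shape[0] // seg_len
    return eeg[: n_segs * seg_len].reshape(n_segs, seg_len, eeg.shape[1])

# A hypothetical 5-min, 19-channel eyes-open recording
recording = np.random.randn(250 * 300, 19)
segments = segment_recording(recording)            # shape: (150, 500, 19)
```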

We selected all 49 subjects in the dataset who were diagnosed with dementia due to Alzheimer's disease (18 male, 31 female; age 73.9 ± 6.8 years). As a comparison, we selected an equal number of subjects with subjective cognitive impairment (SCI; n = 49, 18 male, 31 female; age 63.9 ± 11.4 years).

2.2.2 Architecture

Because our goal was to evaluate the effects of different cross-validation strategies on generalizability, we re-used a previously-published model architecture without modification. We reproduced the model architecture from Oh et al. (2020) ; this model is a 1D convolutional neural network trained to classify segments of time-series EEG data as SCI or AD.

This model learns temporal filters that are applied equivalently across each EEG channel. Progressing through the network, subsequent layers build more complex features that take into account a larger temporal receptive field, and some invariance is achieved through pooling over time. The model consisted of four convolutional layers, each followed by rectification, max pooling, and batch normalization; the convolutional layers were followed by two dense fully-connected layers of 20 and 10 hidden units, respectively, each rectified, and finally a dense connection to an output layer with 2 units representing AD yes/no probability logits. All deep learning models were trained with Keras and TensorFlow. The exact Keras code used to specify the architecture can be found in the Supplementary material .

2.2.3 Training

Models were trained for 70 epochs without any early stopping or hyperparameter tuning. A batch size of 32, an initial learning rate of 0.0001, and the Adam optimizer were used to optimize models. Training accuracy was computed and stored online during each epoch, and averaged across batches to report the training accuracy for each epoch. To visualize how quickly the models reached their final performance, test set accuracy was also computed after each epoch, averaged across batches. Since we reused the model architecture from prior published work, no model selection was performed; performing ongoing validation on the test set is therefore not a source of data leakage. For segment-based holdout, data were split using 10-fold cross-validation (see “Cross-validation” for details).
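The exact architecture is provided in the Supplementary material; the sketch below is only an illustrative stand-in that follows the description above (four Conv1D blocks with rectification, max pooling, and batch normalization; dense layers of 20 and 10 units; a 2-logit output) together with the stated training configuration. The filter counts and kernel sizes are assumptions, not the published values.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_ad_classifier(input_shape=(500, 19)):
    """Illustrative 1D CNN in the spirit of Oh et al. (2020); filter
    counts and kernel sizes are assumed, not the published values."""
    model = tf.keras.Sequential([tf.keras.Input(shape=input_shape)])
    for n_filters in (16, 32, 64, 128):            # four conv blocks
        model.add(layers.Conv1D(n_filters, kernel_size=5, activation="relu"))
        model.add(layers.MaxPooling1D(pool_size=2))
        model.add(layers.BatchNormalization())
    model.add(layers.Flatten())
    model.add(layers.Dense(20, activation="relu"))
    model.add(layers.Dense(10, activation="relu"))
    model.add(layers.Dense(2))                     # AD yes/no logits
    return model

model = build_ad_classifier()
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),  # lr 0.0001
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
# model.fit(X_train, y_train, batch_size=32, epochs=70)  # as in Section 2.2.3
```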

2.3 Experiment 2: seizure detection

2.3.1 EEG data

We analyzed data from the Siena Scalp EEG Database ( Detti, 2020 ; Detti et al., 2020 ) hosted on PhysioNet ( Goldberger et al., 2000 ). These recordings were collected in accordance with the Declaration of Helsinki, and approved by the Ethical Committee of the University of Siena. Participants provided written informed consent before beginning data collection. This dataset includes recordings from 14 epilepsy patients (age 20–71 years, nine male) digitized at 512 Hz with electrodes arranged following the International 10-20 system. Seizures in the data were labeled by an expert clinician. This dataset contains 47 seizures in ~128 h of recorded EEG. To ensure that the data were balanced between seizure and non-seizure epochs, we selected non-seizure data from the beginning of each subject's recordings to match the duration of their seizure-labeled data. This led to 47 min 21 s of data in each condition (1 h 34 min 42 s in total).

In contrast to the previous section, where raw time series were used, EEG data were prepared for the classifier analysis in the frequency domain, following the approach of Rashed-Al-Mahfuz and colleagues ( Rashed-Al-Mahfuz et al., 2021 ). Spectrograms were computed with a window length of 256 samples (0.5 s) overlapping by 128 samples (0.25 s), using a Hann taper. Spectrograms were then divided into segments of 1.5 s. As in the original study, we used the RGB representation of the spectrogram (viridis color-map), and exported the segments as 224 × 224 × 3 images for training and testing with the CNN models.
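A minimal sketch of this spectrogram preparation is given below, using SciPy and Matplotlib; the rendering details (dB scaling, cropping, single hypothetical channel) are our assumptions, and the published pipeline may differ.

```python
import numpy as np
from scipy import signal
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

fs = 512                               # sampling rate of the Siena recordings
x = np.random.randn(fs * 60)           # hypothetical 1-min single-channel trace

# Hann-tapered spectrogram: 256-sample (0.5 s) windows, 128-sample (0.25 s) overlap
f, t, Sxx = signal.spectrogram(x, fs=fs, window="hann", nperseg=256, noverlap=128)

# Render one 1.5 s segment as a 224 x 224 viridis RGB image for the CNN
seg = Sxx[:, t <= 1.5]
fig = plt.figure(figsize=(1, 1), dpi=224)
ax = fig.add_axes([0, 0, 1, 1])
ax.axis("off")
ax.imshow(10 * np.log10(seg + 1e-12), aspect="auto", origin="lower", cmap="viridis")
fig.canvas.draw()
rgb = np.asarray(fig.canvas.buffer_rgba())[..., :3]   # shape: (224, 224, 3)
```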

2.3.2 Architecture

The aim of this study was to evaluate the impact of different cross-validation choices, not to identify a highly-performing model architecture. We therefore reused the model architecture presented by Rashed-Al-Mahfuz et al. (2021) without modification. No model selection or hyperparameter tuning was performed. To handle 3D spectrogram data (vs. the 2D time-series used in the previous section), a 2D convolutional neural network was used. This model learns 2D spectrotemporal features that are applied equivalently across the spectrogram. The model contains four convolutional layers, each followed by rectification, pooling, and batch normalization, followed by two hidden fully-connected layers of 256 and 512 units, respectively, dropout, and a final classification layer of 2 units corresponding to seizure yes/no. The exact Keras code used to specify the architecture can be found in the Supplementary material .

2.3.3 Training

Models were trained for 70 epochs with no early stopping. We used the RMSprop optimizer with a batch size of 32 and a learning rate of 0.00001. Training accuracy was computed and stored online during each epoch, and averaged across batches to report the training accuracy for each epoch. To visualize how quickly the models reached their final performance, test set accuracy was also computed after each epoch, averaged across batches. Since we reused the model architecture from prior published work, no model selection was performed; performing ongoing validation on the test set is therefore not a source of data leakage.
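As in Experiment 1, the exact model is given in the Supplementary material; the sketch below only mirrors the description above (four Conv2D blocks; dense layers of 256 and 512 units; dropout; a 2-logit output) with assumed filter counts, kernel sizes, and dropout rate, plus the stated RMSprop configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_seizure_classifier(input_shape=(224, 224, 3)):
    """Illustrative 2D CNN in the spirit of Rashed-Al-Mahfuz et al. (2021);
    filter counts, kernel sizes, and dropout rate are assumptions."""
    model = tf.keras.Sequential([tf.keras.Input(shape=input_shape)])
    for n_filters in (16, 32, 64, 128):            # four conv blocks
        model.add(layers.Conv2D(n_filters, kernel_size=3, activation="relu"))
        model.add(layers.MaxPooling2D(pool_size=2))
        model.add(layers.BatchNormalization())
    model.add(layers.Flatten())
    model.add(layers.Dense(256, activation="relu"))
    model.add(layers.Dense(512, activation="relu"))
    model.add(layers.Dropout(0.5))                 # rate assumed
    model.add(layers.Dense(2))                     # seizure yes/no logits
    return model

model = build_seizure_classifier()
model.compile(
    optimizer=tf.keras.optimizers.RMSprop(learning_rate=1e-5),  # lr 0.00001
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
```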

2.4 Cross-validation

This study is primarily concerned with the consequences of different approaches to splitting the data between training and test sets. We assess two types of train-test split: (1) holding out individual segments of EEG data without regard for subject ID (“segment-based holdout”), and (2) holding out entire subjects, ensuring that all segments for a given subject appear in only the training or the test set (“subject-based holdout”; Figure 1 ).

Figure 1 . Illustration of segment-based and subject-based holdout. This example shows cross-validation with three participants, each of whom have three segments of data, and 3-fold cross-validation (CV). Each row shows a separate CV fold. Each square illustrates a single EEG segment, with blue squares indicating observations in the training set and red squares indicating observations in the test set. Gray rectangles are drawn around observations from the same subject.

2.4.1 Segment-based holdout

Segment-based cross-validation considers all EEG segments to be equivalent, and divides them into training and validation partitions without considering subject ID. This segment-holdout approach will lead to data leakage if there is statistical non-independence due to multiple EEG segments coming from each subject. Given $n$ segments and $m$ time-points per segment, we construct a matrix $X$ of EEG segments of size $(n, m)$, and a vector $y$ of diagnostic labels of length $n$. The cross-validation is then a simple partition of the index vector $\alpha = \{1, 2, \ldots, n\}$ into disjoint subsets $\alpha_{\text{train}}$ and $\alpha_{\text{test}}$. Where $X_i$ gives the $i$-th segment of $X$, we then have $X_{\text{train}} = \{X_i\}\ \forall i \in \alpha_{\text{train}}$, $X_{\text{test}} = \{X_i\}\ \forall i \in \alpha_{\text{test}}$, and $y_{\text{train}} = \{y_i\}\ \forall i \in \alpha_{\text{train}}$, $y_{\text{test}} = \{y_i\}\ \forall i \in \alpha_{\text{test}}$.

2.4.2 Subject-based holdout

Subject-based cross-validation takes into account which subject each EEG segment comes from. This approach enforces that each subject appears in only one partition of the cross-validation, ensuring there is no leakage of subject-level information across training and test sets. To create this split, we consider an additional subject vector $s$, which is used to constrain the partition of $X$ and $y$. Concretely, rather than partitioning the index vector $\alpha$, we partition the unique subject vector $s_u$, which gives the unique entries of $s$, and collect all corresponding segments from each subject into $\alpha_{\text{train}}$ and $\alpha_{\text{test}}$. This enforces the constraint that $s_i \neq s_j\ \forall i \in \alpha_{\text{train}}, j \in \alpha_{\text{test}}$. To perform $k$-fold cross-validation, we first divide $s_u$ into $k$ non-overlapping chunks, and use each chunk as the validation data in one fold of the cross-validation, with the remaining $k-1$ chunks reserved for training.
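In practice the two holdout schemes map onto standard scikit-learn splitters, as the sketch below illustrates with invented data shapes: KFold over segments permits subject overlap between partitions, while GroupKFold with subject IDs as groups enforces the constraint $s_i \neq s_j$ above.

```python
import numpy as np
from sklearn.model_selection import KFold, GroupKFold

# Hypothetical data: 6 subjects, 50 two-second segments each
n_subjects, segs_per_subject = 6, 50
X = np.random.randn(n_subjects * segs_per_subject, 500, 19)
y = np.repeat([0, 1] * (n_subjects // 2), segs_per_subject)  # one diagnosis per subject
subjects = np.repeat(np.arange(n_subjects), segs_per_subject)

# Segment-based holdout: segments are shuffled without regard to subject,
# so every subject's data leaks into both partitions.
for train_idx, test_idx in KFold(n_splits=3, shuffle=True, random_state=0).split(X):
    leaked = np.intersect1d(subjects[train_idx], subjects[test_idx])
    print("segment-based, subjects in both sets:", leaked)

# Subject-based holdout: GroupKFold keeps all of a subject's segments
# in a single partition, so the intersection is always empty.
for train_idx, test_idx in GroupKFold(n_splits=3).split(X, y, groups=subjects):
    assert np.intersect1d(subjects[train_idx], subjects[test_idx]).size == 0
```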

2.5 Literature review

We searched the literature for studies that used deep learning with segments of EEG to classify a variety of diseases. We searched Google Scholar for papers investigating Alzheimer's disease, Parkinson's disease, attention-deficit/hyperactivity disorder (ADHD), depression, schizophrenia, and seizures. We then searched the references of these papers to identify additional publications for inclusion. Following this search, we included every study that used a DNN to identify psychiatric or neurological conditions using EEG. This non-exhaustive search included 63 papers, all of which were published since 2018 and used deep learning to study one of the conditions named above.

Next, we examined how the training and test sets were determined in these studies. If a paper specified that the EEG recordings were split into segments, but did not specify that they used subjects as an organizing factor of the train-test split, we labeled that study as using “segment-based” holdout. Some papers specifically stated that segments from individual subjects were included in both the training and test sets (for example, studies that trained separate models for each subject); these studies were also labeled as segment-based holdout. If a paper specified that all the segments from a single subject were assigned to only the training or the test set, we labeled that study as using “subject-based” holdout. If a study used both segment-based and subject-based holdout in different analyses, we labeled the study as “both”. We labeled studies as “unclear” if we could not determine whether the models were trained on segments of EEG recordings, and it was not explicitly stated that subjects were used as a factor in the holdout procedure.

3.1 Data leakage leads to biased test-set accuracy

We analyze two datasets to test how the estimated accuracy of a DNN classifier depends on the train-test split. First, we examine the effects of data leakage in a patient-level classifier by training a model to diagnose Alzheimer's disease. Second, we examine the effects of data leakage in a segment-level classifier by training a model to identify periods of time that include an epileptic seizure. In each of these analyses, we reuse a published DNN architecture to analyze an existing dataset.

3.1.1 Identifying patients with Alzheimer's disease

To determine whether segment-based holdout leads to a biased estimate of accuracy, we first trained a CNN to diagnose Alzheimer's disease using segments of EEG. When the EEG segments were split into training and test sets without considering subject ID, the model showed nearly perfect test-set accuracy of 99.8% (99.1–100.0%) ( Figure 2A ). Performance quickly approached ceiling within the first 15 training epochs ( Figure 3A ). This high accuracy is consistent with prior studies that use segment-based holdout and report high accuracy for CNNs at identifying neurological disorders ( Acharya et al., 2018b ; Lee et al., 2019 ; Oh et al., 2020 ). Could this pattern of high accuracy reflect data leakage, instead of a robust and generalizable classifier?

Figure 2 . Test-set accuracy of CNN models predicting held-out data, plotted separately for segment-based holdout and subject-based holdout. (A) Accuracy for models trained to classify Alzheimer's disease in individual subjects. Boxes show the inter-quartile range, dark lines show the median, and whiskers extend to the minimum and maximum points. (B) Accuracy for models trained to identify seizures in segments of EEG data. Details as in (A) .

Figure 3 . Test-set accuracy of CNN models plotted as a function of the training epoch. Gray lines show accuracy in individual cross-validation folds, and red lines show the average across folds. (A) Accuracy for models trained to classify Alzheimer's disease using segment-based holdout. (B) Accuracy for models trained to classify Alzheimer's disease using subject-based holdout. (C) Accuracy for models trained to identify seizures using segment-based holdout. (D) Accuracy for models trained to identify seizures using subject-based holdout.

When we used subject-based holdout, ensuring that individual subjects' data did not appear in both the training and test sets, test accuracy dropped to 53.0% (43.1–64.8%), with 95% confidence intervals that included chance performance of 50%. Performance remained low throughout the training epochs ( Figure 3B ). Compared with subject-based holdout, segment-based holdout significantly overestimates the model performance on previously-unseen subjects (Wilcoxon T = 0.0, p = 0.002).

3.1.2 Identifying segments containing epileptic seizures

In some cases, artificial neural network models have been used to identify time-limited events within ongoing brain activity, such as epileptic seizures. Does segment-based holdout also lead to data leakage when labeling periods of time within subjects? To answer this question, we trained a CNN to classify segments of EEG data as containing an epileptic seizure or not.

When the EEG segments were split into training and test sets without considering subject ID, the model reached a high test-set accuracy of 79.1% (78.8–79.4%) ( Figure 2B ). Accuracy leveled out within 10 training epochs ( Figure 3C ). When individual subjects' data segments were restricted to appear in only the training or test set, however, accuracy fell to 65.1% (61.3–69.1%). Accuracy remained low throughout training epochs ( Figure 3D ). Even when the model is tasked with labeling periods of activity within subjects, segment-based holdout significantly overestimates performance on previously-unseen subjects (Wilcoxon T = 0.0, p = 0.0001).
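For readers reproducing this style of comparison, the paired Wilcoxon signed-rank test over per-fold accuracies can be run with SciPy as below; the accuracy values are invented placeholders in the reported ranges, not the study's data.

```python
from scipy.stats import wilcoxon

# Hypothetical per-fold test accuracies under the two holdout schemes
segment_acc = [0.998, 0.991, 1.000, 0.997, 0.999, 1.000, 0.996, 0.999, 1.000, 0.998]
subject_acc = [0.530, 0.480, 0.610, 0.440, 0.550, 0.500, 0.580, 0.430, 0.640, 0.520]

# With every segment-based fold above its subject-based counterpart, the
# signed-rank statistic is T = 0 (exact two-sided p ~ 0.002 for 10 folds)
stat, p = wilcoxon(segment_acc, subject_acc)
print(f"T = {stat}, p = {p:.4f}")
```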

3.2 Data leakage in published EEG studies

Do published translational EEG studies suffer from subject-specific data leakage, or do they avoid it by computing their test-set accuracy on held-out subjects? We examined the train-test split strategies in published studies that attempted to identify a clinical disorder using DNNs with EEG recordings. Out of the 63 relevant papers we found, only 17 (27.0%) unambiguously avoided this type of data leakage ( Figure 4 ; Table 1 ). Leakage of subject-specific information is pervasive in the translational EEG literature.

Figure 4 . Number of studies using each type of test-split. “Segments”: Segments of EEG data were assigned to the training and test sets without regard to subject; this approach leads to data leakage. “Subjects”: Each subject's data appeared in only the training set or the test set. “Both”: Both the Subjects and Segments approaches were used in different analyses. “Unclear”: We could not determine which approach was used for train-test splits.

Table 1 . Prior translational studies using deep learning with EEG.

4 Discussion

In EEG studies using deep learning, data leakage can occur when segments of data from the same subjects are included in both the training and test sets. Here we demonstrate that leakage of subject-specific information can dramatically overestimate the real-world clinical performance of a DNN classifier. Our Alzheimer's CNN classifier appeared to have an accuracy of above 99% when using segment-based holdout, but its true performance on previously-unseen subjects was indistinguishable from chance. We found this bias in test-set performance both in a between-subjects task (identifying patients with Alzheimer's disease in Experiment 1) and in a within-subjects task (identifying segments that contain a seizure in Experiment 2). Next, we show that this type of data leakage appears in the majority of published translational DNN-EEG studies we examined. Together, these results illustrate how an improperly-designed training-test split can bias the results of DNN studies, and show that biased results are widespread in the published literature.

To be useful in a clinical setting, a diagnostic classifier must be able to identify a disease in new patients. Models trained using segment-based holdout, however, strongly overestimate their ability to perform this task. Instead, these models may learn patterns associated with individual subjects, and then associate those idiosyncratic patterns with a diagnosis. As a consequence, performance of these models drops precipitously when they are tested in new subjects, and performance is unlikely to generalize to a new dataset. When training a translational DNN classifier, the model must be tested with subjects who were not included in the training set.

Our results show that segment-based cross-validation inflates estimates of out-of-sample model performance when training on segments from resting-state EEG. However, the same principles of data leakage will apply to task-based EEG; providing a classifier with person-specific information enables it to artificially inflate performance.

Although this study focused on Alzheimer's disease and epileptic seizures, our findings are not particular to those diseases. Classification studies will overestimate model generalization whenever data from individual participants is present in both the training and test sets. Prior review articles have summarized the details and idiosyncrasies of DNN models in the context of AD ( Cassani et al., 2018 ; Wen et al., 2020 ) and seizures ( Rasheed et al., 2020 ; Shoeibi et al., 2021 ).

4.1 Data leakage in between- and within-subjects designs

We find that segment-based cross-validation overestimates performance for both between-subjects (Alzheimer's disease, Experiment 1) and within-subjects comparisons (seizures, Experiment 2). However, the magnitude of this overestimate was smaller in a within-subjects comparison ( Figure 2 ). What leads to this difference in the size of the effect between the two tasks? In a between-subjects task, the classifier can simply associate a label with each individual participant. In a within-subjects task, however, this shortcut is not available to the model. Instead, it must learn a representation of the labels—albeit a representation that may be contaminated by multiple segments coming from the same event, or one that may be specific to a given participant.

4.2 Data leakage when identifying events within subjects

Instead of identifying a disease in each subject, some studies attempt to identify a disease process in each segment of time (see Table 1 ). DNN models of epilepsy, for example, often aim to classify the segments of data that contain a seizure. We demonstrated in Experiment 2 that those studies are not immune to data leakage in training-test splits: the accuracy in novel subjects is strongly overestimated when the test set includes subjects who were also in the training set. This result could arise if the model uses different patterns to identify seizures in each subject.

Subject-specific studies indicate that a bespoke classifier could be trained to identify seizures in each new patient ( Jana et al., 2020 ; Liang et al., 2020 ; Li Y. et al., 2020 ). However, this would require every patient to have a large dataset of recordings that have already been labeled, which limits the clinical utility of this approach. A more realistic approach is to train DNN models to identify events in unseen patients.

4.3 Data leakage in other methods

In studies that have only one observation per subject, cross-validation is trivial: single observations are simply assigned to the training or test set. However, in EEG and many other medical imaging methods, the data from each subject is routinely split into multiple segments. In this paper, we showed how data leakage can arise when a long recording is split into multiple shorter segments. However, the same principles apply to any other method that introduces statistical non-independence between the training and test sets. For example, some EEG-based DNNs treat every channel independently, and use information from each channel as a separate observation ( Loh et al., 2021 ). Those studies are likely to suffer from substantial data leakage, since physiological sources of electrical activity appear redundantly across multiple EEG scalp electrodes ( Michel and He, 2019 ).

These principles also apply to other medical imaging methods and classifiers. Similar patterns of “identity confounding” data leakage have been documented in studies using functional ( Wen et al., 2018 ) and anatomical ( Wen et al., 2020 ) MRI, optical coherence tomography (OCT) ( Tampu et al., 2022 ), accelerometer and gyroscope recordings from smartphones ( Saeb et al., 2017 ), audio voice recordings ( Chaibub Neto et al., 2019 ; Tougui et al., 2021 ), and performance on motor tasks ( Chaibub Neto et al., 2019 ). Furthermore, data leakage due to identity confounding is not limited to deep neural networks, and has been uncovered using random forests ( Saeb et al., 2017 ; Chaibub Neto et al., 2019 ; Tougui et al., 2021 ) and support vector machines ( Tougui et al., 2021 ).

4.4 Caveats

We find that segment-based cross-validation leads to data leakage, and this type of cross-validation is common in translational EEG studies. This conclusion mirrors results from studies examining a variety of other types of data and classifier models ( Saeb et al., 2017 ; Wen et al., 2018 , 2020 ; Chaibub Neto et al., 2019 ; Tougui et al., 2021 ; Tampu et al., 2022 ). The precise amount of data leakage and the bias that it introduces, however, are likely to differ based on the details of the experiment. For example, if a study trains a classifier to identify individual subjects with a disease, then there may be stronger bias when the study involves fewer participants ( Saeb et al., 2017 ). The model architecture may also influence the amount of data leakage: a model that can more effectively learn subject-specific representations could show stronger bias than a model that cannot learn subject-specific patterns.

5 Conclusion

Data leakage occurs when EEG segments from one subject appear in both the training and test sets. As a result, the test set accuracy dramatically overestimates the classifier's performance in new subjects. This type of data leakage is common in published studies using DNNs and translational EEG. To accurately estimate a model's performance, researchers must ensure that each subject's data is included in only the training or the test set, but not both.

Data availability statement

Publicly available datasets were analyzed in this study. This data can be found here: EEG data for experiment 1 were provided by the Pacific Neuroscience Institute. These data are described by Ganapathi et al. (2022) , and can be accessed through agreement with the authors of that study. EEG data for experiment 2 were downloaded from the publicly-available Siena Scalp EEG Database hosted on PhysioNet ( https://physionet.org/content/siena-scalp-eeg/1.0.0/ ).

Ethics statement

Ethical approval was not required for the study involving humans in accordance with the local legislation and institutional requirements. Written informed consent to participate in this study was not required from the participants or the participants' legal guardians/next of kin in accordance with the national legislation and the institutional requirements.

Author contributions

GB: Conceptualization, Formal analysis, Visualization, Writing – original draft. JK: Data curation, Formal analysis, Software, Visualization, Writing – review & editing. NB: Software, Writing – original draft, Writing – review & editing. YW: Conceptualization, Software, Writing – review & editing. RG: Resources, Writing – review & editing. DM: Resources, Supervision, Writing – review & editing. SG: Funding acquisition, Supervision, Writing – review & editing. KY: Writing – review & editing. CQ: Conceptualization, Data curation, Writing – review & editing. CL: Conceptualization, Funding acquisition, Writing – review & editing.

Funding

The author(s) declare financial support was received for the research, authorship, and/or publication of this article. This work was supported by SPARK Neuro, Inc. The funder was not involved in the study design, collection, analysis, interpretation of data, the writing of this article, or the decision to submit it for publication.

Acknowledgments

The contents of this manuscript have previously appeared online as a preprint on medRxiv ( https://www.medrxiv.org/content/10.1101/2024.01.16.24301366v1 ).

Conflict of interest

GB, JK, NB, YW, SG, KY, CQ, and CL were employed at SPARK Neuro Inc., a medical technology company developing diagnostic aids to help clinicians identify and assess neurodegenerative disease.

The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fnins.2024.1373515/full#supplementary-material

Acharya, U. R., Oh, S. L., Hagiwara, Y., Tan, J. H., and Adeli, H. (2018a). Deep convolutional neural network for the automated detection and diagnosis of seizure using EEG signals. Comput. Biol. Med . 100, 270–278. doi: 10.1016/j.compbiomed.2017.09.017

Acharya, U. R., Oh, S. L., Hagiwara, Y., Tan, J. H., Adeli, H., and Subha, D. P. (2018b). Automated EEG-based screening of depression using deep convolutional neural network. Comput. Methods Programs Biomed . 161, 103–113. doi: 10.1016/j.cmpb.2018.04.012

Ahmadi, A., Kashefi, M., Shahrokhi, H., and Nazari, M. A. (2021). Computer aided diagnosis system using deep convolutional neural networks for ADHD subtypes. Biomed. Signal Process. Control 63:102227. doi: 10.1016/j.bspc.2020.102227

Ahmedt-Aristizabal, D., Fernando, T., Denman, S., Robinson, J. E., Sridharan, S., Johnston, P. J., et al. (2020). Identification of children at risk of schizophrenia via deep learning and EEG responses. IEEE J. Biomed. Health Inf . 25, 69–76. doi: 10.1109/JBHI.2020.2984238

Avcu, M. T., Zhang, Z., and Chan, D. W. S. (2019). “Seizure detection using least EEG channels by deep convolutional neural network,” in ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (Brighton: IEEE), 1120–1124.

Ay, B., Yildirim, O., Talo, M., Baloglu, U. B., Aydin, G., Puthankattil, S. D., et al. (2019). Automated depression detection using deep representation and sequence learning with EEG signals. J. Med. Syst . 43, 1–12. doi: 10.1007/s10916-019-1345-y

Bakhtyari, M., and Mirzaei, S. (2022). ADHD detection using dynamic connectivity patterns of EEG data and convlstm with attention framework. Biomed. Signal Process. Control 76:103708. doi: 10.1016/j.bspc.2022.103708

Bergeron, D., Flynn, K., Verret, L., Poulin, S., Bouchard, R. W., Bocti, C., et al. (2017). Multicenter validation of an MMSE-MoCA conversion table. J. Am. Geriatr. Soc . 65, 1067–1072. doi: 10.1111/jgs.14779

Bi, X., and Wang, H. (2019). Early Alzheimer's disease diagnosis based on EEG spectral images using deep learning. Neural Netw . 114, 119–135. doi: 10.1016/j.neunet.2019.02.005

Bouallegue, G., Djemal, R., Alshebeili, S. A., and Aldhalaan, H. (2020). A dynamic filtering DF-RNN deep-learning-based approach for EEG-based neurological disorders diagnosis. IEEE Access 8, 206992–207007. doi: 10.1109/ACCESS.2020.3037995

Cassani, R., Estarellas, M., San-Martin, R., Fraga, F. J., and Falk, T. H. (2018). Systematic review on resting-state EEG for Alzheimer's disease diagnosis and progression assessment. Dis. Mark . 2018:5174815. doi: 10.1155/2018/5174815

Chaibub Neto, E., Pratap, A., Perumal, T. M., Tummalacherla, M., Snyder, P., Bot, B. M., et al. (2019). Detecting the impact of subject characteristics on machine learning-based diagnostic applications. NPJ Digit. Med . 2:99. doi: 10.1038/s41746-019-0178-x

Chang, Y., Stevenson, C., Chen, I.-C., Lin, D.-S., and Ko, L.-W. (2022). Neurological state changes indicative of ADHD in children learned via EEG-based LSTM networks. J. Neural Eng . 19:016021. doi: 10.1088/1741-2552/ac4f07

Chen, H., Song, Y., and Li, X. (2019a). A deep learning framework for identifying children with ADHD using an EEG-based brain network. Neurocomputing 356, 83–96. doi: 10.1016/j.neucom.2019.04.058

Chen, H., Song, Y., and Li, X. (2019b). Use of deep learning to detect personalized spatial-frequency abnormalities in EEGs of children with ADHD. J. Neural Eng . 16:066046. doi: 10.1088/1741-2552/ab3a0a

Choi, G., Park, C., Kim, J., Cho, K., Kim, T.-J., Bae, H., et al. (2019). “A novel multi-scale 3D CNN with deep neural network for epileptic seizure detection,” in 2019 IEEE International Conference on Consumer Electronics (ICCE) (Las Vegas, NV: IEEE), 1–2.

Chu, L., Qiu, R., Liu, H., Ling, Z., Zhang, T., and Wang, J. (2017). Individual recognition in schizophrenia using deep learning methods with random forest and voting classifiers: Insights from resting state EEG streams. arXiv [preprint].

Daoud, H., and Bayoumi, M. A. (2019). Efficient epileptic seizure prediction based on deep learning. IEEE Trans. Biomed. Circuits Syst . 13, 804–813. doi: 10.1109/TBCAS.2019.2929053

de Bardeci, M., Ip, C. T., and Olbrich, S. (2021). Deep learning applied to electroencephalogram data in mental disorders: a systematic review. Biol. Psychol . 162:108117. doi: 10.1016/j.biopsycho.2021.108117

Demuru, M., and Fraschini, M. (2020). EEG fingerprinting: subject-specific signature based on the aperiodic component of power spectrum. Comput. Biol. Med . 120:103748. doi: 10.1016/j.compbiomed.2020.103748

Detti, P. (2020). Siena Scalp EEG Database (version 1.0.0). PhysioNet. doi: 10.13026/5d4a-j060

Detti, P., Vatti, G., and Zabalo Manrique de Lara, G. (2020). EEG synchronization analysis for seizure prediction: a study on data of noninvasive recordings. Processes 8:846. doi: 10.3390/pr8070846

Dubreuil-Vall, L., Ruffini, G., and Camprodon, J. A. (2020). Deep learning convolutional neural networks discriminate adult ADHD from healthy individuals on the basis of event-related spectral EEG. Front. Neurosci . 14:251. doi: 10.3389/fnins.2020.00251

Emami, A., Kunii, N., Matsuo, T., Shinozaki, T., Kawai, K., and Takahashi, H. (2019). Seizure detection by convolutional neural network-based analysis of scalp electroencephalography plot images. NeuroImage Clin . 22:101684. doi: 10.1016/j.nicl.2019.101684

Folstein, M. F., Folstein, S. E., and McHugh, P. R. (1975). “Mini-mental state”: a practical method for grading the cognitive state of patients for the clinician. J. Psychiatr. Res . 12, 189–198. doi: 10.1016/0022-3956(75)90026-6

Fürbass, F., Kural, M. A., Gritsch, G., Hartmann, M., Kluge, T., and Beniczky, S. (2020). An artificial intelligence-based EEG algorithm for detection of epileptiform EEG discharges: validation against the diagnostic gold standard. Clin. Neurophysiol . 131, 1174–1179. doi: 10.1016/j.clinph.2020.02.032

Ganapathi, A. S., Glatt, R. M., Bookheimer, T. H., Popa, E. S., Ingemanson, M. L., Richards, C. J., et al. (2022). Differentiation of subjective cognitive decline, mild cognitive impairment, and dementia using qEEG/ERP-based cognitive testing and volumetric MRI in an outpatient specialty memory clinic. J. Alzheimers Dis . 90, 1–9. doi: 10.3233/JAD-220616

Gao, Y., Gao, B., Chen, Q., Liu, J., and Zhang, Y. (2020). Deep convolutional neural network-based epileptic electroencephalogram (EEG) signal classification. Front. Neurol . 11:375. doi: 10.3389/fneur.2020.00375

Gkenios, G., Latsiou, K., Diamantaras, K., Chouvarda, I., and Tsolaki, M. (2022). “Diagnosis of Alzheimer's disease and mild cognitive impairment using EEG and recurrent neural networks,” in 2022 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC) (Glasgow: IEEE), 3179–3182.

Goldberger, A. L., Amaral, L. A., Glass, L., Hausdorff, J. M., Ivanov, P. C., Mark, R. G., et al. (2000). PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation 101, e215–e220. doi: 10.1161/01.CIR.101.23.e215

Hosny, A., Parmar, C., Quackenbush, J., Schwartz, L. H., and Aerts, H. J. (2018). Artificial intelligence in radiology. Nat. Rev. Cancer 18, 500–510. doi: 10.1038/s41568-018-0016-5

Huggins, C. J., Escudero, J., Parra, M. A., Scally, B., Anghinah, R., Vitória Lacerda De Araújo, A., et al. (2021). Deep learning of resting-state electroencephalogram signals for three-class classification of Alzheimer's disease, mild cognitive impairment and healthy ageing. J. Neural Eng . 18:046087. doi: 10.1088/1741-2552/ac05d8

Hussein, R., Palangi, H., Ward, R. K., and Wang, Z. J. (2019). Optimized deep neural network architecture for robust detection of epileptic seizures using EEG signals. Clin. Neurophysiol . 130, 25–37. doi: 10.1016/j.clinph.2018.10.010

Ieracitano, C., Mammone, N., Bramanti, A., Hussain, A., and Morabito, F. C. (2019). A Convolutional Neural Network approach for classification of dementia stages based on 2D-spectral representation of EEG recordings. Neurocomputing 323, 96–107. doi: 10.1016/j.neucom.2018.09.071

Iešmantas, T., and Alzbutas, R. (2020). Convolutional neural network for detection and classification of seizures in clinical data. Med. Biol. Eng. Comp . 58, 1919–1932. doi: 10.1007/s11517-020-02208-7

Jana, R., Bhattacharyya, S., and Das, S. (2020). “Patient-specific seizure prediction using the convolutional neural networks,” in Intelligence Enabled Research , eds. S. Bhattacharyya, S. Mitra, and P. Dutta (Springer), 51–60.

Kaka, H., Zhang, E., and Khan, N. (2021). Artificial intelligence and deep learning in neuroradiology: exploring the new frontier. Can. Assoc. Radiol. J . 72, 35–44. doi: 10.1177/0846537120954293

Kaufman, S., Rosset, S., Perlich, C., and Stitelman, O. (2012). Leakage in data mining: Formulation, detection, and avoidance. ACM Transact. Knowl. Discov. Data 6, 1–21. doi: 10.1145/2382577.2382579

Khan, H., Marcuse, L., Fields, M., Swann, K., and Yener, B. (2017). Focal onset seizure prediction using convolutional networks. IEEE Transact. Biomed. Eng . 65, 2109–2118. doi: 10.1109/TBME.2017.2785401

Khare, S. K., Bajaj, V., and Acharya, U. R. (2021). PDCNNet: an automatic framework for the detection of Parkinson's disease using EEG signals. IEEE Sens. J . 21, 17017–17024. doi: 10.1109/JSEN.2021.3080135

Kim, D., and Kim, K. (2018). “Detection of early stage Alzheimer's disease using EEG relative power with deep neural network,” in 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) (Honolulu, HI), 352–355.

Kim, S., Kim, J., and Chun, H.-W. (2018). Wave2vec: vectorizing electroencephalography bio-signal for prediction of brain disease. Int. J. Environ. Res. Public Health 15:1750. doi: 10.3390/ijerph15081750

Kwon, H., Kang, S., Park, W., Park, J., and Lee, Y. (2019). “Deep learning based pre-screening method for depression with imagery frontal EEG channels,” in 2019 International Conference on Information and Communication Technology Convergence (ICTC) (Jeju: IEEE), 378–380.

Langa, K. M., and Levine, D. A. (2014). The diagnosis and management of mild cognitive impairment: a clinical review. J. Am. Med. Assoc . 312, 2551–2561. doi: 10.1001/jama.2014.13806

Lee, S., Hussein, R., and McKeown, M. J. (2019). “A deep convolutional-recurrent neural network architecture for Parkinson's disease EEG classification,” in 2019 IEEE Global Conference on Signal and Information Processing (GlobalSIP) (Ottawa, ON: IEEE), 1–4.

Li, X., La, R., Wang, Y., Hu, B., and Zhang, X. (2020). A deep learning approach for mild depression recognition based on functional connectivity using electroencephalography. Front. Neurosci . 14:192. doi: 10.3389/fnins.2020.00192

Li, X., La, R., Wang, Y., Niu, J., Zeng, S., Sun, S., et al. (2019). EEG-based mild depression recognition using convolutional neural network. Med. Biol. Eng. Comp . 57, 1341–1352. doi: 10.1007/s11517-019-01959-2

Li, Y., Liu, Y., Cui, W.-G., Guo, Y.-Z., Huang, H., and Hu, Z.-Y. (2020). Epileptic seizure detection in EEG signals using a unified temporal-spectral squeeze-and-excitation network. IEEE Transact. Neural Syst. Rehabil. Eng . 28, 782–794. doi: 10.1109/TNSRE.2020.2973434

Liang, W., Pei, H., Cai, Q., and Wang, Y. (2020). Scalp EEG epileptogenic zone recognition and localization based on long-term recurrent convolutional network. Neurocomputing 396, 569–576. doi: 10.1016/j.neucom.2018.10.108




Open Access

Peer-reviewed

Research Article

Machine learning in epidemiology: Neural networks forecasting of monkeypox cases

Author roles: Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing

* E-mail: [email protected]

Affiliation Department of Mathematics, University of Hafr Al-Batin, Hafr Al-Batin, Saudi Arabia


  • Lulah Alnaji


  • Published: May 1, 2024
  • https://doi.org/10.1371/journal.pone.0300216


This study integrates advanced machine learning techniques, namely Artificial Neural Networks, Long Short-Term Memory, and Gated Recurrent Unit models, to forecast monkeypox outbreaks in Canada, Spain, the USA, and Portugal. The research focuses on the effectiveness of these models in predicting the spread and severity of cases using data from June 3 to December 31, 2022, and evaluates them against test data from January 1 to February 7, 2023. The study highlights the potential of neural networks in epidemiology, especially concerning recent monkeypox outbreaks. It provides a comparative analysis of the models, emphasizing their capabilities in public health strategies. The research identifies optimal model configurations and underscores the efficiency of the Levenberg-Marquardt algorithm in training. The findings suggest that ANN models, particularly those with optimized Root Mean Squared Error, Mean Absolute Percentage Error, and the Coefficient of Determination values, are effective in infectious disease forecasting and can significantly enhance public health responses.

Citation: Alnaji L (2024) Machine learning in epidemiology: Neural networks forecasting of monkeypox cases. PLoS ONE 19(5): e0300216. https://doi.org/10.1371/journal.pone.0300216

Editor: Mihajlo Jakovljevic, Hosei University, Japan

Received: December 1, 2023; Accepted: February 25, 2024; Published: May 1, 2024

Copyright: © 2024 Lulah Alnaji. This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The data used in this research is available from: https://ourworldindata.org/ .

Funding: The author(s) received no specific funding for this work.

Competing interests: The author declares that there are no conflicts of interest regarding the publication of this paper.

Abbreviations:
  • ADAM, Adaptive Moment Estimation: an optimization algorithm for training machine learning models
  • ANN, Artificial Neural Network: a type of machine learning model used for predictive analytics
  • GRU, Gated Recurrent Unit: a gated recurrent architecture related to, but simpler than, the LSTM
  • LM, Levenberg-Marquardt: an algorithm for solving non-linear least squares problems
  • LSTM, Long Short-Term Memory: a recurrent neural network architecture for processing sequences of data
  • MAPE, Mean Absolute Percentage Error: a measure of prediction accuracy in a forecasting model
  • MPXV, Monkeypox Virus: the virus responsible for the disease monkeypox
  • R², Coefficient of Determination: a statistical measure of how well regression predictions approximate real data points
  • RMSE, Root Mean Square Error: a measure of the differences between values predicted by a model and the values observed

Introduction

The Monkeypox Virus (MPXV), a member of the Orthopoxvirus genus, is the causative agent of the infectious disease known as monkeypox. This virus is predominantly found in Central and West African countries, with sporadic cases reported in other regions, including the United States and the United Kingdom [ 1 – 3 ].

Transmission of MPXV to humans often occurs through direct contact with infected animals or contaminated objects, such as body fluids, sores, or bedding [ 1 , 2 , 4 ]. Human-to-human transmission is also possible, mainly through close physical interaction with infected individuals or exposure to their bodily fluids [ 1 , 4 ]. Symptoms of MPXV infection include fever, headache, muscle aches, and a characteristic rash that spreads across the body [ 1 , 4 ]. In severe cases, complications such as pneumonia, sepsis, and encephalitis can occur [ 1 , 2 , 4 ].

No specific antiviral treatment for MPXV currently exists; however, supportive care can aid in symptom management and reduction of complication risks [ 1 , 2 , 4 ]. Vaccination against smallpox has shown some effectiveness in preventing monkeypox, but routine smallpox immunization is no longer practiced [ 1 , 2 ]. Therefore, public health measures such as contact tracing, quarantine, and isolation are essential in controlling the spread of the disease [ 1 , 2 , 4 ].

MPXV, part of the Orthopoxvirus family, was first identified in monkeys in the Democratic Republic of the Congo in 1958 and in humans in 1970. The virus is endemic in certain areas of Central and West Africa, with occasional outbreaks. Reports of cases in countries outside Africa, including the United States, Canada, Portugal, and Spain, have increased recently [ 5 ].

The first recorded case of MPXV in the United States occurred in 2003 in a traveler from West Africa. This led to an investigation that identified 47 confirmed or probable cases across six states, primarily linked to prairie dogs infected with the virus. In Canada, a similar outbreak occurred in 2003, with two confirmed cases of MPXV in individuals who had traveled to West Africa [ 6 , 7 ].

Portugal reported its first outbreak of MPXV in 2018, with nine cases linked to recent travel to Nigeria. In 2021, Spain experienced its first outbreak of MPXV, with two cases also related to travel to Nigeria. These instances highlight the increasing frequency of MPXV cases outside Africa, underscoring the need for vigilant surveillance and preparedness to manage any outbreaks [ 8 ].

The utilization of machine learning in epidemiological research represents a transformative approach to understanding and managing infectious diseases. Building on the existing state of the art in disease forecasting, particularly leveraging machine learning techniques, our study aims to enhance the predictive modeling of monkeypox spread. While previous studies like [9] have developed neural network models for forecasting monkeypox in various countries, our research focuses on employing Artificial Neural Networks (ANN), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU) models to predict monkeypox cases in Canada, Spain, the USA, and Portugal. This approach not only addresses a gap in monkeypox research but also compares the efficacy of different neural network models, contributing a comparative perspective to the field.

The recent upsurge in monkeypox cases globally highlights the need for improved disease monitoring and forecasting methods. In this context, our study introduces advanced machine learning techniques, namely ANN, LSTM, and GRU models, to predict monkeypox outbreaks. This research is significant in its focus on the latest MPXV outbreaks and its comparative evaluation of different neural network models for forecasting the disease’s spread in various countries.

Literature review

The study of infectious diseases, particularly emerging viruses like MPXV, has increasingly incorporated machine learning approaches to enhance prediction and management strategies. Key studies in this field have demonstrated the utility of various neural network models, such as ANN, LSTM, and GRU, in understanding and forecasting disease patterns [ 10 – 15 ]. Our work builds upon these foundations, particularly focusing on recent developments in monkeypox forecasting.

Early detection and prediction of infectious diseases like MPXV are crucial for effective management and response. ANN approaches, as utilized in forecasting COVID-19 cases in Pakistan, provide valuable insights for healthcare professionals and policymakers [ 16 ].

ANN techniques are increasingly being used to predict patient outcomes in various diseases, including COVID-19, breast cancer, and cardiovascular disease. For example, ANN models were employed in assessing breast cancer risk among Iranian women [ 17 ].

While prior research like [ 9 ] has leveraged neural networks for predicting MPXV spread in specific regions, our study extends this application to Canada, Spain, the USA, and Portugal. This expansion is crucial, given the distinct epidemiological profiles and healthcare systems in these countries. Such comparative analysis contributes novel insights into the geographical variance in MPXV outbreak dynamics.

The role of machine learning in epidemiological modeling has evolved rapidly, with recent advances highlighting its potential in real-time disease surveillance and response planning. Studies have explored various machine learning techniques, including deep learning and predictive analytics, to enhance the accuracy of disease outbreak predictions and to understand transmission dynamics [ 16 , 18 – 20 ].

Our study contributes to this growing body of literature by employing a combination of ANN, LSTM, and GRU models, enhanced with the ADAM optimizer [ 21 ] and the Levenberg-Marquardt learning algorithm [ 22 ]. This approach not only allows for a comprehensive analysis of MPXV spread but also offers a methodological framework that can be adapted for other infectious diseases. The integration of advanced machine learning models in our research addresses a critical gap in current epidemiological studies.

The study utilizes a range of neural network models, including ANN, LSTM, and GRU, to predict MPXV cases in the USA, Canada, Spain, and Portugal, based on existing datasets. The comparative analysis of these countries will assist healthcare authorities in formulating appropriate response strategies. This research is the first in-depth study using ANN on recent MPXV outbreaks, offering new insights into the epidemic's dynamics. A time series dataset of MPXV cases from each country, along with statistical graphs of confirmed cases, is presented [23]. The distribution and geographical representation of confirmed monkeypox cases across the studied nations are depicted in Figs 1 and 2 (Fig 1a shows the distribution of confirmed cases, while Fig 1b provides a geographical representation on a global map). Additionally, the sequences of confirmed MPXV instances, detailed with peak intervals from June to October 2022 for Canada, Portugal, Spain, and the USA, are illustrated in Fig 2a–2d.

Fig 1: https://doi.org/10.1371/journal.pone.0300216.g001

Fig 2. Sequence of confirmed MPXV instances in (a) Canada, (b) Portugal, (c) Spain, and (d) the USA, each with a detailed view of the peak interval (June to October 2022). https://doi.org/10.1371/journal.pone.0300216.g002

The prediction model uses data from the “Our World in Data” website, employing neural network, LSTM, and GRU models. The model’s performance is enhanced using an Adaptive Moment Estimation (ADAM) optimizer [ 21 ]. Additionally, a Levenberg-Marquardt (LM) learning algorithm is implemented for a single hidden layer ANN model, optimizing the number of neurons using the K-fold cross-validation early stopping validation approach [ 22 ]. ANN-based regression models have been effective in predicting the spread of infectious diseases like MPXV. These models enable informed decision-making by healthcare professionals and policymakers in controlling disease spread and responding effectively to outbreaks. ANN models have been applied in various domains for time-series prediction, demonstrating their versatility and efficacy [ 10 – 15 ].

The remainder of this paper is organized as follows: The Methodology section discusses the methodology used in this study. The Results and Discussions section presents the findings of the research. Following that, the Forecasting Methodology section covers the approach taken for forecasting. The paper concludes with the Conclusion section, summarizing the study’s key findings.

Methodology

In the manuscript, the choice of modeling methods, including ANN, LSTM, and GRU, is justified by their proven effectiveness in time-series analysis and epidemiological forecasting. ANN is renowned for its ability to model complex nonlinear relationships, making it ideal for predicting disease spread [ 24 ]. LSTM and GRU, as advanced recurrent neural networks, effectively capture temporal dependencies in data, crucial for accurate disease trend predictions [ 25 – 27 ]. These methodologies are selected for their ability to handle the intricacies and variabilities in infectious disease data, making them suitable for this study’s purpose. The assumptions underlying these models are standard in the field and have been extensively validated in prior research, ensuring their applicability and reliability in this context.

  • Data Representativeness: The assumption that the datasets used are representative of the wider population and accurately reflect the trends in monkeypox cases.
  • Stationarity of Data: The presumption that the underlying characteristics of the monkeypox data, such as trends and patterns, remain consistent over the period of study.
  • Impact of External Factors: The study assumes that external factors not included in the model (like public health interventions, changes in virus transmissibility) have a negligible impact on the predictions.

This study employs a comparative approach, analyzing ANN, LSTM, and GRU models due to the lack of existing research focusing on the same countries and time period. These models were selected for their proven capabilities in time-series prediction and their adaptability to different data characteristics. The comparative analysis allows for a nuanced understanding of each model’s strengths and weaknesses in predicting monkeypox outbreaks.

  • Data Preprocessing and Normalization: The data underwent preprocessing to correct irregularities and ensure consistency. Normalization, crucial for neural network models, involved scaling input and target values to a [0, 1] range. This step minimizes biases and enhances model interpretability.
  • Model Calibration: Model calibration involved fine-tuning hyperparameters for optimal performance. This process included adjusting learning rates, batch sizes, and layer configurations to enhance model accuracy and efficiency in data prediction.
  • Validation Techniques: K-fold cross-validation was employed to ensure model robustness and avoid overfitting. This technique involved dividing the dataset into ‘K’ subsets and iteratively training and testing the model on these subsets, providing a comprehensive assessment of model performance.
  • Performance Metrics: Statistical measures such as RMSE, MAE, and R-squared were utilized to evaluate model performance. These metrics provided quantitative insights into the model’s prediction accuracy, reliability, and fit to the data.

The artificial neural network

ANNs, inspired in part by the neuronal architecture of the human brain, consist of simple processing units capable of handling scalar messages. Their extensive interconnection and adaptive interaction make ANNs behave like a multi-processor computing system [28, 29]. ANNs offer a rapid and flexible approach to modeling, suitable for tasks such as rainfall-runoff prediction [30]. The network comprises layers of interconnected neurons, where connection weights across one or more hidden layers link the input and output layers [31]. During training, the Back Propagation algorithm adjusts the network weights to reduce errors between the predicted and actual outputs [31]. After training with experimental data to obtain the optimal structure and weights, ANNs are evaluated on additional experimental data for validation [31]. The Multilayer Perceptron, a feed-forward ANN with one or more hidden layers, is particularly prevalent [31]. In ANNs, nodes are connected in a network trained using standard methods such as gradient descent [24, 32, 33]. Each node is either active or inactive at a given time, and each edge (synapse, or link between nodes) carries a weight [34–36]. Positive weights excite the next node, whereas negative weights inhibit it [34, 35, 37].

With input features \(x_i\) (\(i = 1, \dots, n\)), the hidden activations and network output of such a single-hidden-layer perceptron take the standard form

\[ h_j = \sigma\Big(\sum_{i=1}^{n} w_{ij} x_i + b_j\Big), \qquad y_k = \sigma\Big(\sum_{j=1}^{m} w_{kj} h_j + b_k\Big), \]

where \(\sigma\) denotes the sigmoid activation function. Training minimizes the squared-error cost

\[ E(w) = \frac{1}{2} \sum_{k} \big(t_k - y_k\big)^2 \]

between the predicted outputs \(y_k\) and the target values \(t_k\).
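To make this forward pass concrete, the following minimal NumPy sketch computes the output of a single-hidden-layer perceptron with sigmoid activations. It illustrates the equations above rather than reproducing the paper's code; the layer sizes and random weights are purely illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mlp_forward(x, W_in, b_hid, W_out, b_out):
    """Forward pass: x has n features; W_in is the (m, n) input-to-hidden
    weight matrix (the w_ij above); W_out is (1, m) hidden-to-output (w_kj)."""
    h = sigmoid(W_in @ x + b_hid)       # hidden-layer activations h_j
    return sigmoid(W_out @ h + b_out)   # prediction y in [0, 1]

rng = np.random.default_rng(0)
n, m = 5, 8                             # illustrative: 5 inputs, 8 hidden neurons
x = rng.random(n)
y = mlp_forward(x, rng.normal(size=(m, n)), rng.normal(size=m),
                rng.normal(size=(1, m)), rng.normal(size=1))
print(y)
```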

The LM optimizer, widely used for training ANNs, was employed in this study for epidemic prediction [38, 39]. The ANN was trained on the dataset using the LM technique, optimizing the network with a specific number of inner neurons [38, 39]. Performance was evaluated using the Root Mean Square Error (RMSE) and the correlation coefficient, minimizing the cost function value [38, 39].

Levenberg–Marquardt

The LM algorithm approximates the Hessian of the cost function as

\[ H \approx J^{T} J + \mu I, \]

where \(J\) is the Jacobian of the network errors \(e\) with respect to the weights, \(\mu > 0\) is the damping factor, and \(I\) is the identity matrix, giving the weight update

\[ w_{k+1} = w_k - \big(J^{T} J + \mu I\big)^{-1} J^{T} e. \quad (4) \]

This approximation ensures that the diagonal components of the predicted Hessian matrix are greater than zero, consequently guaranteeing the invertibility of H [ 40 , 41 ]. The LM algorithm employs a blend of the steepest descent and Gauss-Newton algorithms. When μ is close to zero, Eq (4) aligns with the Gauss-Newton method, while a large μ leads to the application of the steepest descent approach [ 42 ].

\[ w_{k+1} = w_k - \big(J^{T} J\big)^{-1} J^{T} e \quad (5) \]

Eq (5) is also recognized as the Gauss-Newton procedure [ 40 ].
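For illustration, one LM iteration can be written in a few lines of Python. This is a sketch for a generic least-squares fit with a fixed damping factor (in practice \(\mu\) is adapted between iterations), not the study's implementation; the toy data are arbitrary.

```python
import numpy as np

def lm_step(w, residuals, jacobian, mu):
    """One Levenberg-Marquardt update. residuals(w) returns the error
    vector e; jacobian(w) returns J = de/dw. Large mu behaves like
    steepest descent; mu -> 0 recovers the Gauss-Newton step (Eq 5)."""
    e, J = residuals(w), jacobian(w)
    H = J.T @ J + mu * np.eye(w.size)      # damped Hessian approximation
    return w - np.linalg.solve(H, J.T @ e)

# Toy problem: fit the slope a in y = a*x by least squares.
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.1, 3.9, 6.2])
res = lambda w: w[0] * x - y
jac = lambda w: x.reshape(-1, 1)

w = np.array([0.5])
for _ in range(20):
    w = lm_step(w, res, jac, mu=0.1)
print(w)  # converges to the least-squares slope (~2.04)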

Adaptive moment estimation optimization

ADAM is a widely adopted optimization technique in deep learning, merging aspects of gradient descent with momentum and the Root Mean Square Propagation optimizer [ 21 ]. ADAM aims to address the shortcomings of conventional optimization methods, such as sensitivity to step size and gradient noise, by adjusting the learning rate based on estimations of the gradients’ first and second moments.

\[ m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t, \qquad v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2, \]
\[ \hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t}, \qquad w_{t+1} = w_t - \alpha\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}, \]

where \(g_t\) is the gradient at step \(t\), \(\beta_1\) and \(\beta_2\) are the decay rates of the first and second moment estimates, \(\alpha\) is the learning rate, and \(\epsilon\) is a small constant for numerical stability.
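The following sketch applies these updates to a toy quadratic objective; the learning rate and iteration count are illustrative rather than the study's settings.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8):
    """One ADAM update; m and v are running first/second moment estimates."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad**2
    m_hat = m / (1 - beta1**t)             # bias-corrected first moment
    v_hat = v / (1 - beta2**t)             # bias-corrected second moment
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Minimise f(w) = (w - 3)^2 from a cold start.
w, m, v = np.array([0.0]), np.zeros(1), np.zeros(1)
for t in range(1, 1001):
    grad = 2.0 * (w - 3.0)
    w, m, v = adam_step(w, grad, m, v, t)
print(w)  # approximately 3
```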

Gated recurrent unit

GRU networks, a type of Recurrent Neural Network (RNN) architecture, use gating mechanisms to control the flow of information. GRUs comprise three main components: the update gate, reset gate, and candidate state. The update gate determines the extent to which the previous hidden state should be maintained and how much new information from the candidate state should be included in the current hidden state. The reset gate decides the amount of the previous hidden state to be forgotten when computing the new candidate state. The candidate state represents new information derived from the input and the previous hidden state.

\[ z_t = \sigma\big(W_z [h_{t-1}, x_t] + b_z\big), \qquad r_t = \sigma\big(W_r [h_{t-1}, x_t] + b_r\big), \]
\[ \tilde{h}_t = \tanh\big(W_h [r_t \odot h_{t-1}, x_t] + b_h\big), \qquad h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t, \]

where \(x_t\) is the input, \(h_{t-1}\) the previous hidden state, and \(\odot\) denotes element-wise multiplication.

GRU networks, with their selective information updating mechanism, offer enhanced efficiency and effectiveness compared to traditional RNNs.
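A single GRU time step can be sketched as follows. The weight names mirror the hyperparameter notation used below (W_z, W_r, W_h), the matrices act on the concatenation [h_{t-1}, x_t], and the gate convention is one common formulation, not necessarily the exact variant trained in this study; all sizes and random weights are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, Wz, Wr, Wh, bz, br, bh):
    """One GRU step over the concatenated [h_prev, x_t] vector."""
    hx = np.concatenate([h_prev, x_t])
    z = sigmoid(Wz @ hx + bz)                            # update gate
    r = sigmoid(Wr @ hx + br)                            # reset gate
    h_cand = np.tanh(Wh @ np.concatenate([r * h_prev, x_t]) + bh)
    return (1 - z) * h_prev + z * h_cand                 # new hidden state

rng = np.random.default_rng(1)
d, H = 1, 4                                              # 1 input feature, 4 hidden units
Wz, Wr, Wh = (rng.normal(size=(H, H + d)) for _ in range(3))
h = np.zeros(H)
for x_t in [0.2, 0.5, 0.1]:                              # a short scaled-case sequence
    h = gru_step(np.array([x_t]), h, Wz, Wr, Wh,
                 np.zeros(H), np.zeros(H), np.zeros(H))
print(h)
```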

Long short-term memory

LSTM networks, another variant of RNNs, are adept at learning long-term dependencies by selectively retaining or forgetting information over time through gating mechanisms. An LSTM network consists of three types of gates: the forget gate, input gate, and output gate.

\[ f_t = \sigma\big(W_f [h_{t-1}, x_t] + b_f\big), \qquad i_t = \sigma\big(W_i [h_{t-1}, x_t] + b_i\big), \qquad o_t = \sigma\big(W_o [h_{t-1}, x_t] + b_o\big), \]
\[ \tilde{c}_t = \tanh\big(W_c [h_{t-1}, x_t] + b_c\big), \qquad c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \qquad h_t = o_t \odot \tanh(c_t). \]
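A matching single-step sketch for the LSTM cell follows; W_c and b_c denote the candidate cell-update parameters, which the hyperparameter list below does not name explicitly, so they are labeled here by convention, and all weights are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, Wf, Wi, Wo, Wc, bf, bi, bo, bc):
    """One LSTM step with forget (f), input (i), and output (o) gates."""
    hx = np.concatenate([h_prev, x_t])
    f = sigmoid(Wf @ hx + bf)        # how much of the old cell state to keep
    i = sigmoid(Wi @ hx + bi)        # how much new information to write
    o = sigmoid(Wo @ hx + bo)        # how much of the cell state to expose
    c_cand = np.tanh(Wc @ hx + bc)   # candidate cell update
    c = f * c_prev + i * c_cand      # new cell state
    return o * np.tanh(c), c         # new hidden state, new cell state

rng = np.random.default_rng(2)
d, H = 1, 4
Wf, Wi, Wo, Wc = (rng.normal(size=(H, H + d)) for _ in range(4))
h, c = np.zeros(H), np.zeros(H)
for x_t in [0.2, 0.5, 0.1]:
    h, c = lstm_step(np.array([x_t]), h, c, Wf, Wi, Wo, Wc,
                     np.zeros(H), np.zeros(H), np.zeros(H), np.zeros(H))
print(h)
```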

Control parameters for each model

The performance of neural network models such as ANN, LSTM, and GRU networks depends on several tunable hyperparameters. These parameters are crucial for the learning process and are optimized during training.

ANN model hyperparameters

  • Weights and Biases: Weights ( w ij and w kj ) are the core parameters adjusted during training. They determine the strength of connections between neurons in successive layers.
  • Number of Neurons in Each Layer: The size ( n and m ) of each layer, especially hidden layers, influences the network’s capacity to learn complex patterns.
  • Learning Algorithm: Back Propagation is used for adjusting weights, typically coupled with optimization techniques like the Levenberg-Marquardt (LM) optimizer.
  • Activation Function: The sigmoid function is used for neuron activation, transforming the weighted sum into an output.
  • Cost Function: E ( w ), the mean squared error between the predicted and actual outputs, is minimized during training.
  • Performance Metrics: RMSE and correlation coefficients are used for evaluating model performance.

LSTM model hyperparameters

  • Forget Gate Weights ( W f ): Controls the amount of previous cell state to retain.
  • Input Gate Weights ( W i ): Determines what new information is added to the cell state.
  • Output Gate Weights ( W o ): Decides what information to output from the cell state.
  • Bias terms ( b f , b i , b o ): Offset values added to gate computations.
  • Activation Functions: Typically sigmoid ( σ ) for gates and tanh for cell state updates.

GRU model hyperparameters

  • Update Gate Weights ( W z ): Balances the previous state and new candidate state contributions.
  • Reset Gate Weights ( W r ): Determines how much past information to forget.
  • Candidate State Weights ( W h ): Computes the potential new information to be added to the state.
  • Bias terms ( b z , b r , b h ): Offset values for each gate and candidate state computation.
  • Activation Functions: Sigmoid ( σ ) for update and reset gates, and tanh for candidate state.

These hyperparameters are iteratively adjusted through backpropagation and optimization algorithms to minimize loss functions, thereby improving the predictive performance of the models.

K-fold cross validation

Overfitting is a common issue with ANN models, where the model tends to learn noise in the data rather than the actual signals, leading to poor performance on untested datasets. To mitigate this, K-fold cross-validation is employed as a robust method [ 44 , 45 ]. In this technique, the data is randomly divided into K groups. The model undergoes training on (K-1) folds and is then evaluated on the remaining fold in each iteration, with RMSE serving as the performance metric. The learning process is monitored by plotting the number of epochs against the average RMSE on the validation folds. Training concludes when there is no significant reduction in RMSE with an increase in epochs [ 46 ].

Once model training is completed, its performance is evaluated against a separate test dataset. This involves scaling the features after loading the dataset, followed by dividing it into 10 folds for the 10-fold cross-validation. This process iterates ten times, each time splitting the dataset into training and validation sets, training the model on the former, and assessing it on the latter. The model’s performance is recorded in each iteration. The procedure progresses through each of the 10 folds until all have been evaluated. Finally, the average performance across all 10 folds is calculated and presented. This process terminates upon completion.
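A minimal sketch of this 10-fold procedure with scikit-learn is shown below. The synthetic case counts and 7-day input window are stand-ins for the actual dataset, and MLPRegressor trains with Adam or L-BFGS rather than Levenberg-Marquardt, so this only approximates the paper's setup.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error
from sklearn.neural_network import MLPRegressor

# Stand-in for daily case counts; build (7-day window -> next day) pairs.
cases = np.random.default_rng(0).poisson(30, size=250).astype(float)
X = np.column_stack([cases[i:len(cases) - 7 + i] for i in range(7)])
y = cases[7:]

rmses = []
for train_idx, val_idx in KFold(n_splits=10, shuffle=True,
                                random_state=0).split(X):
    model = MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000,
                         random_state=0)
    model.fit(X[train_idx], y[train_idx])
    pred = model.predict(X[val_idx])
    rmses.append(mean_squared_error(y[val_idx], pred) ** 0.5)  # fold RMSE
print("mean 10-fold RMSE:", np.mean(rmses))
```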

The method for determining the optimal number of hidden neurons in the ANN models is depicted in the flowchart in the subsection below (Flowchart of the 10-fold Cross-Validation Process). As part of this approach, a total of 12 ANN models with varying numbers of hidden layers were developed. The flowchart in Fig 3 illustrates the cross-validation process concisely, while the flowchart in Fig 4 presents a detailed view of the neural network model training and evaluation process using 10-fold cross-validation. The process begins with 'Start', followed by 'Load dataset', where the initial dataset is loaded for analysis. A 'Preprocess' stage then scales the features to ensure they are normalized for optimal model performance.

Fig 3: https://doi.org/10.1371/journal.pone.0300216.g003

Fig 4: https://doi.org/10.1371/journal.pone.0300216.g004

Flowchart of the 10-fold cross-validation process

Neural network modelling process

This study encompassed the training and testing phases in the neural network modeling procedure. To enhance prediction accuracy and expedite model convergence, it was imperative for the data to be normalized within a specific range. The min-max normalization strategy was employed to ensure that both input and target values resided within the [0, 1] range, which is optimal for the activation function’s performance [ 47 , 48 ].
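A small sketch of this min-max step is shown below, including the inverse transform needed to map model outputs back to case counts; the function names and example values are illustrative.

```python
import numpy as np

def min_max_scale(series):
    """Scale a 1-D series to [0, 1]; keep (lo, hi) to invert predictions."""
    lo, hi = series.min(), series.max()
    return (series - lo) / (hi - lo), lo, hi

def invert_scale(scaled, lo, hi):
    return scaled * (hi - lo) + lo

cases = np.array([12.0, 40.0, 95.0, 60.0, 23.0])  # illustrative daily counts
scaled, lo, hi = min_max_scale(cases)
print(scaled)                          # values in [0, 1]
print(invert_scale(scaled, lo, hi))    # recovers the original counts
```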

During the training phase, adjustments were made to the model’s synaptic weights to align with the optimal number of neurons in the hidden layer. Additionally, the training dataset was subdivided into “K” subsets using the K-fold cross-validation method. This approach facilitated the determination of the appropriate number of iterations, or “epochs,” required before concluding the model’s training.

Following the training, the model’s accuracy and predictive capacity were evaluated using a testing dataset. This phase enabled the neural network model to learn from the data and predict future instances of MPXV in the selected countries.

Evaluating the performance of the neural network models

\[ \mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}, \qquad \mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert, \]
\[ \mathrm{MAPE} = \frac{100}{n} \sum_{i=1}^{n} \left\lvert \frac{y_i - \hat{y}_i}{y_i} \right\rvert, \qquad R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}, \]

where \(y_i\) are the observed case counts, \(\hat{y}_i\) the model predictions, and \(\bar{y}\) the mean of the observations.

In these equations, MAE signifies the Mean Absolute Error, and MAPE the Mean Absolute Percentage Error, providing further insight into the model’s accuracy.
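These metrics are straightforward to compute directly. The sketch below uses illustrative actual and predicted values echoing the one-month-ahead counts reported in the Forecasting Methodology section.

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Return RMSE, MAE, MAPE (%), and R^2 as defined above."""
    err = y_true - y_pred
    rmse = np.sqrt(np.mean(err**2))
    mae = np.mean(np.abs(err))
    mape = 100.0 * np.mean(np.abs(err / y_true))  # assumes no zero observations
    r2 = 1.0 - np.sum(err**2) / np.sum((y_true - y_true.mean())**2)
    return rmse, mae, mape, r2

y_true = np.array([43.0, 53.0, 47.0])  # actual monthly cases (illustrative)
y_pred = np.array([42.0, 54.0, 50.0])  # single-layer ANN estimates (illustrative)
print(evaluate(y_true, y_pred))
```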

K-fold cross-validation was utilized to mitigate overfitting in our neural network models. Training was concluded when a significant reduction in RMSE was no longer observed with an increase in epochs. This method ensured effective learning without overfitting.

The Levenberg-Marquardt optimization technique was crucial in determining when to stop training. It balanced convergence speed and model accuracy, preventing excessive training iterations and ensuring optimized model performance.

For LSTM and GRU models, training stop criteria included monitoring validation loss. Training was halted if validation loss stopped decreasing or started increasing. Early stopping was implemented, where training ceased after a pre-set number of epochs without improvement in validation loss. This prevented learning noise and ensured better generalization. Other hyperparameters like learning rate and batch size were also considered. Specific thresholds for early stopping based on validation loss changes were crucial for optimizing model training.
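A hedged Keras sketch of this early-stopping setup follows; the window length, unit count, patience, and synthetic data are illustrative, since the paper does not publish its exact configuration.

```python
import numpy as np
import tensorflow as tf

# Minimal LSTM forecaster with validation-loss early stopping.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(7, 1)),                    # 7-day input window
    tf.keras.layers.LSTM(4),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # outputs scaled to [0, 1]
])
model.compile(optimizer="adam", loss="mse")

stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=10,
                                        restore_best_weights=True)

# Synthetic stand-ins: (samples, window, features) inputs, next-day targets.
X = np.random.rand(100, 7, 1)
y = np.random.rand(100)
model.fit(X, y, validation_split=0.2, epochs=200,
          callbacks=[stop], verbose=0)
print("stopped at epoch:", stop.stopped_epoch)
```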

Peculiarities of applied methodologies

In our exploration of epidemiological forecasting, particularly in modeling the spread of monkeypox, this study introduces a novel approach through the application of Artificial Neural Networks (ANN), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU) models. These methodologies have been meticulously selected based on their demonstrated efficiency in capturing complex nonlinear relationships and temporal dependencies within time-series data, essential attributes for the accurate prediction of disease trends.

The distinctiveness of the methodology lies in the comprehensive adaptation and fine-tuning of these models to cater specifically to the challenges presented by infectious disease data, which is often marked by its variability and unpredictability. By employing a comparative analysis—a strategy less frequented in the existing literature for the countries and time periods under study—the approach facilitates a deeper understanding of each model’s strengths and limitations in forecasting monkeypox outbreaks.

  • Customized Data Preprocessing: The data preprocessing and normalization techniques were specifically tailored to accommodate the unique characteristics of epidemiological data, ensuring that the models are fed input that accurately reflects the dynamics of disease spread. This step is crucial in epidemiological modeling, where the quality of data directly impacts the accuracy of predictions.
  • Model Calibration and Validation: The methodological framework includes meticulous calibration of model hyperparameters, such as the number of neurons in hidden layers and learning rates, through an iterative process. This ensures the models are finely tuned to capture intricate patterns within the data. Furthermore, the use of K-fold cross-validation as a robust validation technique helps mitigate the risk of overfitting, a common challenge when dealing with time-series data in machine learning models.
  • Advanced Optimization Techniques: The adoption of advanced optimization techniques, such as the LM algorithm for ANN and ADAM for LSTM and GRU models, underlines the uniqueness of the approach. These techniques enhance the learning process, allowing for faster convergence and improved model performance by effectively navigating the complex landscape of the cost function.
  • Evaluation Metrics: The selection of comprehensive performance metrics, including RMSE, MAE, and R-squared, further strengthens the methodology. These metrics provide a multifaceted view of model performance, from prediction accuracy to the fit of the model to the data, ensuring a thorough evaluation of each model's ability to accurately forecast disease trends.

Results and discussions

In this section, we delve deeper into the results and provide a more detailed discussion of the predictive performance of three neural network models: ANN, LSTM, and GRU. These models were trained using data from four countries: the USA, Canada, Spain, and Portugal. The period for training data was from June 3 to December 31, 2022, with the evaluation conducted on test data from January 1 to February 7, 2023. The outcomes of this study are illustrated in (Figs 5 – 7 ).

Fig 5: https://doi.org/10.1371/journal.pone.0300216.g005

Fig 6: https://doi.org/10.1371/journal.pone.0300216.g006

Fig 7: https://doi.org/10.1371/journal.pone.0300216.g007

Initially, perceptron ANN models with one and two hidden layers were developed. It was observed that one or two hidden layers sufficed for training the ANN for complex nonlinear problems [ 18 , 19 ]. This observation aligns with prior studies, including one forecasting dengue fever epidemics in San Juan, Puerto Rico, and the Northwest Coast of Yucatan, Mexico [ 19 ].

For network training, the LM algorithm was employed, recognized for its adaptability and efficiency. The LM method, which circumvents the computation of the Hessian matrix, is faster than traditional backpropagation methods. This technique has been successfully applied in other studies, including one that used a genetic algorithm to optimize the parameters of a COVID-19 SEIR model for US states [ 20 ].

Fig 5 shows the training performance of the ANN model for MPXV over iterations, as measured by MSE. Each line represents one of the four countries, with the MSE values plotted against the number of iterations. The training process of the neural network models is characterized by several distinct phases, as evidenced by the MSE trends for each country. Initially, there is a noticeable spike in MSE for Portugal, indicative of the model's rapid learning and calibration to correct early inaccuracies. As the iterations progress, the MSE for all countries converges towards lower values, suggesting an improvement in the model's predictive accuracy on the training dataset. Despite this overall trend, the MSE experiences fluctuations, potentially reflecting the model's adjustments to diverse patterns within the data. Notably, the MSE lines for Portugal, Spain, Canada, and the United States exhibit comparative stability, with Portugal's model showing consistently lower MSE values, hinting at better performance on the Portugal data relative to the other countries.

Fig 6 shows the LSTM model's training performance for MPXV, with MSE used as the evaluation metric. As in Fig 5, the MSE values converge, although the LSTM model for Portugal exhibits a distinctive slight increase in MSE in the later iterations. The GRU model's training progression for MPXV is captured in Fig 7, with MSE again serving as the performance metric. All countries show a rapid initial decrease in MSE, followed by a plateau; notably, the GRU model for Portugal shows the most consistent MSE values across iterations. Despite these fluctuations, a convergence towards a stable MSE range is observed for all countries, indicative of effective learning. Figs 5–7 thus showcase the training performance of the ANN, LSTM, and GRU models for MPXV over iterations as measured by MSE, while the detailed dynamics of the training process, including the specific learning curves across the four studied countries, are further elaborated in Figs 8–10, which highlight the reduction in loss over epochs.

Fig 8. Learning curves for the ANN models. The training process is represented by the blue line and the validation process by the red line, with the reduction in loss over epochs indicating effective learning. https://doi.org/10.1371/journal.pone.0300216.g008

Fig 9. Learning curves for the LSTM models. Each subplot shows the training loss (blue line) decreasing over epochs, indicative of the model's learning capacity, while the validation loss (red line) presents fluctuations, reflecting the model's generalization to new data. Notable is the slight convergence between the two losses, suggesting a balance between learning and model complexity. https://doi.org/10.1371/journal.pone.0300216.g009

Fig 10. Learning curves for the GRU models. The blue line indicates the training loss, which decreases with epochs, signifying learning, while the red line denotes the validation loss, showing fluctuations that point to the challenges in model generalization. The convergence of training and validation losses is particularly evident for Canada and the United States, suggesting a more effective model fit. https://doi.org/10.1371/journal.pone.0300216.g010

The MSE trend analysis for each country revealed intrinsic differences in data characteristics and model behavior. For instance, the initial spike in MSE for Portugal suggests a phase of rapid learning, where the model aggressively adjusts its parameters to fit the complex data patterns. This phase is critical as it indicates the model’s sensitivity to the initial conditions and learning rate.

Subsequent fluctuations in MSE during the training iterations are indicative of the model’s continual adaptation process. These fluctuations may arise from various factors, such as the inherent noise in the data or the introduction of new patterns that the model attempts to learn. The stability observed in later iterations across all countries suggests that the models reach a point of equilibrium where learning is balanced with the complexity of the data. Moreover, the nuanced differences in MSE trends between the ANN, LSTM, and GRU models point to the distinct ways these architectures process temporal data.

To determine the optimal number of hidden neurons, the standard approach outlined in the K-fold Cross Validation subsection above was followed. A total of 12 ANN models with varying numbers of hidden layers were constructed, as detailed in Tables 1–20. The best model for each scenario was selected based on its evaluation using R², MAPE, and RMSE; lower values for RMSE and MAPE and higher values for R² indicate better model performance.

Tables 1–20: https://doi.org/10.1371/journal.pone.0300216.t001 through https://doi.org/10.1371/journal.pone.0300216.t020

Tables 1–24 present the performance metrics of neural network models trained on data from these countries. Each table contains 11 columns representing specific information:

  • Sl No: Serial number or index of the row in the table.
  • Neurons: The count of neurons in the neural network's hidden layer.
  • RMSE (Train): The model's RMSE on the training dataset, multiplied by 1000 for scale.
  • R² (Train): Coefficient of determination for the model on the training dataset, expressed as a percentage.
  • MAPE (Train): The model's MAPE on the training dataset, expressed as a percentage.
  • RMSE (Validation): RMSE of the model on the validation set, scaled by 1000.
  • R² (Validation): Coefficient of determination for the model on the validation dataset, expressed as a percentage.
  • MAPE (Validation): The model's MAPE on the validation dataset, expressed as a percentage.
  • RMSE (Test): RMSE of the model on the test set, multiplied by 1000 for scale.
  • R² (Test): Coefficient of determination for the model on the test dataset, expressed as a percentage.
  • MAPE (Test): The model's MAPE on the test dataset, expressed as a percentage.

Tables 21–24: https://doi.org/10.1371/journal.pone.0300216.t021 through https://doi.org/10.1371/journal.pone.0300216.t024

RMSE, MAPE, and R² are key metrics for evaluating regression model performance. The tables for each country's dataset cover ANN, LSTM, and GRU models with single and two hidden layers, showcasing the impact of neurons and layers on predictive accuracy and generalization. This comparative analysis aids in selecting the optimal neural network configuration for each dataset.

Each of Tables 1–24 is dedicated to a specific type of neural network model (ANN, LSTM, or GRU) and considers variations in the number of hidden layers and neurons. The performance of each configuration is evaluated using RMSE, R², and MAPE, calculated for the training, validation, and test datasets. For each country's dataset, there are tables for ANN, LSTM, and GRU models with single and two hidden layers. This detailed comparison shows how the number of neurons and hidden layers affects a model's predictive accuracy and generalization for a given dataset. Where relevant, values such as R² and MAPE are displayed as percentages, and to avoid an abundance of decimal places the RMSE values have been scaled up by a factor of 1000.

Fig 10 presents the learning curves for GRU models across four different countries: Canada, Portugal, Spain, and the United States. Each model’s training process, represented by the blue line, shows a reduction in loss over epochs, indicating effective learning. Notably, the Canadian and United States models demonstrate a pronounced decrease in training loss, whereas the validation loss for Portugal remains notably stable, suggesting consistent model performance. The Spanish model’s validation loss exhibits more variability, potentially highlighting challenges in generalization. No apparent signs of overfitting are observed within the range of epochs presented, as the validation losses do not trend upwards. Overall, the models demonstrate their potential to fit well to the training data while maintaining a reasonable generalization to the validation data.

Fig 9 presents the learning curves for LSTM models across four different countries: Canada, Portugal, Spain, and the United States. The training loss, depicted by the blue line, indicates a trend of learning and improvement across epochs for all countries. However, the validation loss, depicted by the red line, exhibits fluctuations, which are more pronounced for Portugal and Spain, suggesting challenges in model generalization and potential overfitting. For Canada and the United States, the gap between training and validation loss is relatively smaller, indicating better generalization performance.

Fig 8 illustrates the learning curves for ANN models across four distinct countries: Canada, Portugal, Spain, and the United States. The blue lines, representing the training loss, generally exhibit a downward trend, suggesting a steady improvement in the model’s ability to fit the training data over the epochs. The red lines, indicating the validation loss, fluctuate and do not show a clear decreasing trend. However, the four models show a closer convergence between training and validation loss, which could imply a more robust generalization capability.

The evaluation of neural network models for the different datasets, as summarized in Table 25, reveals insightful trends and performance benchmarks.

Table 25: https://doi.org/10.1371/journal.pone.0300216.t025

For the Canada dataset, the ANN model with a single hidden layer and 8 neurons and the ANN model with two hidden layers and 3 neurons show commendable performance, particularly in achieving high R² percentages and low RMSE values. The LSTM and GRU models, both single- and double-layered, also exhibit competitive performance, with the single-layer GRU model with 1 neuron demonstrating particular effectiveness in generalization across the validation and test datasets.

In the context of the Portugal dataset, the single-layer ANN model with 7 neurons stands out, especially in training performance. For the double-layer models, all three types of neural networks with 1 neuron each exhibit impressive R² percentages, particularly in the validation and test phases, indicating strong predictive accuracy.

The Spain dataset shows a similar pattern, where the single-layer ANN model with 8 neurons excels in both training and testing phases. In the two-hidden-layer scenario, the ANN model with 11 neurons and the LSTM model with 5 neurons are noteworthy for their high R² values and low RMSE scores, suggesting robust model performance.

For the USA dataset, the single-layer ANN model with 5 neurons and the double-layer ANN model with 12 neurons show superior performance, particularly in terms of R² and RMSE metrics. This indicates their effectiveness in capturing the underlying patterns in the dataset with a balance of complexity and generalization ability.

These results underscore the importance of choosing the right architecture and neuron count in neural network models for different datasets, highlighting the effectiveness of certain configurations in optimizing predictive performance.

Forecasting methodology

In our study, we conducted a detailed forecasting analysis for Canada, Portugal, and the USA using different neural network architectures. The goal was to predict the number of MPXV cases one month ahead, based on the actual reported cases. The accuracy of these forecasts was quantified using the MAPE.

For Canada, with 43 actual cases, our models demonstrated varying levels of accuracy. ANN with a single hidden layer predicted 42 cases with a MAPE of 2.3%, showcasing its high precision ( Table 26 ). In comparison, when employing two hidden layers, the ANN model maintained the same MAPE, predicting 42 cases ( Table 26 ).

Table 26: https://doi.org/10.1371/journal.pone.0300216.t026

In Portugal, with 53 actual cases, our ANN models achieved notable accuracy. The single-layer ANN model estimated 54 cases with a MAPE of 1.9%, while the two-layer ANN model achieved perfect accuracy with a MAPE of 0.0%, predicting 53 cases. For a scenario with 51 actual cases, the two-layer ANN model showed a slight increase in MAPE to 2.0%, estimating 52 cases.

The forecasting results for the USA, with 47 actual cases, further highlighted the effectiveness of the ANN models. The single-layer ANN model estimated 50 cases with a MAPE of 6.4%, whereas the two-layer model predicted 48 cases with a reduced MAPE of 2.1%.

Across all countries, the ANN models consistently outperformed LSTM and GRU models in terms of accuracy, as reflected in their lower MAPE values. This suggests that ANN architectures, particularly with two hidden layers, are more adept at capturing the trends and nuances in the data, leading to more accurate forecasts for MPXV cases.

Discussion: benefits of the results in a broader practical perspective

The findings of this study have significant implications for the practical application in public health management, particularly in the context of infectious disease outbreaks like Monkeypox. The predictive models developed can be integrated into health surveillance systems, aiding healthcare authorities in early detection and response planning. This proactive approach is crucial for effective disease management, enabling timely interventions such as targeted vaccinations and public health advisories.

Moreover, the methodology and results can be adapted for forecasting other infectious diseases, demonstrating the versatility of the approach. This adaptability is particularly beneficial for regions where healthcare resources are limited, as it allows for strategic allocation of resources based on predicted outbreak patterns. Such data-driven strategies can optimize the use of medical supplies, personnel, and facilities, enhancing the overall efficiency of healthcare systems.

In addition, the study’s approach can be instrumental in guiding policy decisions, such as travel advisories or quarantine measures, by providing accurate forecasts of disease spread. This is especially relevant in the context of global health, where the mobility of populations can significantly impact the dynamics of infectious diseases.

Furthermore, the potential for collaboration with industries involved in healthcare technology cannot be overlooked. The integration of advanced neural network models into health tech solutions can pave the way for more sophisticated disease tracking and prediction tools, contributing to the larger goal of global health security.

This study presented a comprehensive analysis of three different neural network models—ANN, LSTM, and GRU—for predicting the spread of MPXV in the USA, Canada, Spain, and Portugal. Our findings demonstrated that while each model has its strengths, certain models outperformed others in specific scenarios.

For instance, the ANN model exhibited superior performance in terms of lower RMSE and higher R² values compared to the other models, particularly in predicting short-term trends. The LSTM and GRU models also achieved good predictive accuracy; although the structurally simpler ANN proved the most accurate here, LSTM and GRU still provided valuable insights.

Quantitatively, the ANN model achieved a lower average RMSE and a higher R² than both the LSTM and GRU models when predicting cases over a one-month horizon. These results highlight the potential of utilizing advanced machine learning techniques in epidemiological forecasting.

The study’s methodology, while robust, has certain limitations. The accuracy of the neural network models, including LSTM and GRU, hinges on the quality and completeness of the epidemiological data, which may have gaps or inaccuracies. The complexity of these models can also lead to overfitting, limiting generalizability to new data or scenarios. Moreover, The model’s predictions are based on past data and may not account for future changes in virus behavior, public health policies, or other unforeseen factors.

To address the limitation of machine learning models’ inability to extrapolate beyond the conditions of the study, one solution is to incorporate a diverse and comprehensive dataset that covers a wide range of scenarios. This can help the model learn various patterns and improve its generalizability. Additionally, employing techniques like transfer learning, where a model trained on one task is fine-tuned for another related task, can help in adapting the model to new conditions. Regular updating and retraining of the model with new data as it becomes available can also ensure the model remains relevant and accurate over time. Furthermore, combining machine learning models with domain-specific knowledge and expert insights can enhance the model’s applicability to new conditions.
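
As a rough illustration of the retraining and transfer-learning strategies just described (assuming a small feed-forward forecaster like the earlier sketch, not the study's actual models), one can freeze a pretrained network's hidden layers and fine-tune only its output head on data from a new region or period:

```python
# Hypothetical transfer-learning sketch: freeze the hidden layers of a model
# pretrained on one country's data, then fine-tune the output head on windows
# (X_new, y_new) from a new country or time period as data becomes available.
import torch
import torch.nn as nn

pretrained = nn.Sequential(            # stand-in for a model trained elsewhere
    nn.Linear(7, 32), nn.ReLU(),
    nn.Linear(32, 16), nn.ReLU(),
    nn.Linear(16, 1),
)

for param in pretrained[:4].parameters():  # freeze both hidden layers
    param.requires_grad = False

head_params = [p for p in pretrained.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(head_params, lr=1e-4)  # smaller lr for fine-tuning

X_new, y_new = torch.randn(50, 7), torch.randn(50)  # placeholder new-region data
for _ in range(100):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(pretrained(X_new).squeeze(-1), y_new)
    loss.backward()
    optimizer.step()
```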

The methods utilized in this study, specifically ANN, LSTM, and GRU, are not only theoretically robust but also practically applicable in scientific research. Their adaptability to analyze complex data patterns makes them invaluable tools in epidemiological studies, such as forecasting infectious disease spread. These models can handle large-scale data efficiently, identifying underlying trends and making accurate predictions. This capability is crucial for public health officials and researchers in planning interventions and making informed decisions based on predictive analytics.

How Artificial Intelligence Can Enhance the Diagnosis of Cardiac Amyloidosis: A Review of Recent Advances and Challenges

Affiliations

  • 1 Department of Cardiovascular Medicine, Mayo Clinic, Phoenix, AZ 85054, USA.
  • 2 Division of Cardiovascular Imaging, Mayo Clinic, 5777 East Mayo Boulevard, Phoenix, AZ 85054, USA.
  • PMID: 38667736
  • PMCID: PMC11050851
  • DOI: 10.3390/jcdd11040118

Cardiac amyloidosis (CA) is an underdiagnosed form of infiltrative cardiomyopathy caused by abnormal amyloid fibrils deposited extracellularly in the myocardium and cardiac structures. There can be high variability in its clinical manifestations, and diagnosing CA requires expertise and often thorough evaluation; as such, the diagnosis of CA can be challenging and is often delayed. The application of artificial intelligence (AI) to different diagnostic modalities is rapidly expanding and transforming cardiovascular medicine. Advanced AI methods such as deep-learning convolutional neural networks (CNNs) may enhance the diagnostic process for CA by identifying patients at higher risk and potentially expediting the diagnosis of CA. In this review, we summarize the current state of AI applications to different diagnostic modalities used for the evaluation of CA, including their diagnostic and prognostic potential, and current challenges and limitations.
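
As a purely illustrative aside (not a published model), the kind of deep-learning CNN the review surveys can be sketched as a small 2-D convolutional classifier that scores imaging studies for CA risk; the input size, depth, and two-class output below are assumptions.

```python
# Toy sketch of an imaging CNN of the type discussed in the review. All sizes
# and the binary higher/lower-risk output are illustrative assumptions.
import torch
import torch.nn as nn

class TinyCACNN(nn.Module):
    def __init__(self, n_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 56 * 56, n_classes)  # assumes 224x224 input

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

logits = TinyCACNN()(torch.randn(4, 1, 224, 224))  # batch of 4 grayscale frames
risk_scores = logits.softmax(dim=-1)               # per-study class probabilities
```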

Keywords: artificial intelligence; cardiac amyloidosis; convolutional neural networks; deep-learning.

A systematic review and meta-analysis of artificial neural network application in geotechnical engineering: theory and applications

  • Review Article
  • Published: 18 March 2019
  • Volume 32, pages 495–518 (2020)

  • Hossein Moayedi 1,2,
  • Mansour Mosallanezhad 3,
  • Ahmad Safuan A. Rashid 4,5,
  • Wan Amizah Wan Jusoh 6 &
  • Mohammed Abdullahi Muazu 7

Artificial neural networks (ANNs) aim to simulate the behavior of the nervous system and the human brain: they are mathematical computing systems inspired by the biological neural networks that constitute animal brains. ANNs have recently been extended, presented, and applied by many researchers in geotechnical engineering, yet a comprehensive review of the published studies reveals a shortage of systematic classification of this research. A review of the literature shows that ANNs are well established in modeling retaining wall deflection, excavation, soil behavior, earth retaining structures, site characterization, pile bearing capacity (both skin friction and end bearing) prediction, settlement of structures, liquefaction assessment, slope stability, landslide susceptibility mapping, and classification of soils. The present study therefore provides a systematic review of methodologies and applications of recent ANN developments in geotechnical engineering. The Web of Science database was selected, and the systematic review and meta-analysis method known as PRISMA was applied. The selected papers were classified according to the technique and method used, year of publication, authors, journal or conference name, research objectives, results and findings, and the solution and modeling approach. The outcome of the review will contribute to the knowledge of civil and geotechnical designers and practitioners in managing information to solve most types of geotechnical engineering problems, and will help practitioners understand the limitations and strengths of ANNs compared with conventional mathematical modeling methods.
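
To illustrate the pattern common to many of the surveyed applications (without reproducing any particular study), a small multilayer perceptron can regress a geotechnical target, here a hypothetical pile bearing capacity, from tabular soil and pile features; the features, data, and network sizes below are invented.

```python
# Hypothetical example of ANN regression on tabular geotechnical features.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))  # e.g. pile length, diameter, SPT-N, friction angle
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=300)

model = make_pipeline(
    StandardScaler(),  # input scaling, standard practice for ANNs
    MLPRegressor(hidden_layer_sizes=(16, 8), max_iter=2000, random_state=0),
)
model.fit(X, y)
print("R^2 on training data:", round(model.score(X, y), 3))
```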

Similar content being viewed by others

A comparative analysis of machine learning algorithms for predicting wave runup

Fundamentals of Artificial Neural Networks and Deep Learning

Machine learning for geochemical exploration: classifying metallogenic fertility in arc magmas and insights into porphyry copper deposit formation

Author information

Authors and affiliations

Hossein Moayedi: Department for Management of Science and Technology Development, Ton Duc Thang University, Ho Chi Minh City, Vietnam; Faculty of Civil Engineering, Ton Duc Thang University, Ho Chi Minh City, Vietnam

Mansour Mosallanezhad: Department of Civil and Environmental Engineering, Shiraz University, Shiraz, Iran

Ahmad Safuan A. Rashid: Department of Geotechnics and Transportation, School of Civil Engineering, Faculty of Engineering, Universiti Teknologi Malaysia, Johor Bahru, Malaysia; Centre of Tropical Geoengineering (Geotropik), School of Civil Engineering, Faculty of Engineering, Universiti Teknologi Malaysia, Johor Bahru, Malaysia

Wan Amizah Wan Jusoh: Faculty of Civil Engineering and Environment, Universiti Tun Hussein Onn Malaysia, Parit Raja, 86400, Batu Pahat, Johor Darul Takzim, Malaysia

Mohammed Abdullahi Muazu: Civil Engineering Department, University of Hafr Al-Batin, Al-Jamiah, Hafr Al-Batin, Eastern Province, Kingdom of Saudi Arabia

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest in presenting this manuscript.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Corresponding author at Ton Duc Thang University, Ho Chi Minh City, Vietnam.

About this article

Moayedi, H., Mosallanezhad, M., Rashid, A.S.A. et al. A systematic review and meta-analysis of artificial neural network application in geotechnical engineering: theory and applications. Neural Comput & Applic 32, 495–518 (2020). https://doi.org/10.1007/s00521-019-04109-9

Received: 22 August 2018; Accepted: 20 February 2019; Published: 18 March 2019; Issue date: January 2020

DOI: https://doi.org/10.1007/s00521-019-04109-9

Share this article

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

Keywords: Soft computing; Geotechnical engineering

COMMENTS

  1. Review of deep learning: concepts, CNN architectures, challenges

    The common convolutional layer of GoogLeNet is substituted by small blocks using the same concept of network-in-network (NIN) architecture , which replaced each layer with a micro-neural network. The GoogLeNet concepts of merge, transform, and split were utilized, supported by attending to an issue correlated with different learning types of ...

  2. Literature review: efficient deep neural networks techniques for

    The convolutional neural network (CNN) mostly triggers the interest, as it is considered one of the most powerful ways to learn useful representations of images and other structured data. ... Abdou, M.A. Literature review: efficient deep neural networks techniques for medical image analysis. Neural Comput & Applic 34, 5791-5812 (2022). https ...

  3. Graph neural networks: A review of methods and applications

    The first motivation of GNNs roots in the long-standing history of neural networks for graphs. In the nineties, Recursive Neural Networks are first utilized on directed acyclic graphs (Sperduti and Starita, 1997; Frasconi et al., 1998).Afterwards, Recurrent Neural Networks and Feedforward Neural Networks are introduced into this literature respectively in (Scarselli et al., 2009) and (Micheli ...

  4. Convolutional neural network: a review of models ...

    CNNs are widely used deep learning algorithms and the most prominent category of neural networks, mainly for high-dimensional data such as images and videos. A CNN is a multi-layer neural network (NN) architecture, inspired by the neurobiology of the visual cortex, which contains convolutional layer(s) followed by fully connected (FC) layer(s).

  5. A Comprehensive Literature Review on Convolutional Neural Networks

    AI algorithms [1] lead us to "online adaptation"; the basic idea of the online adaptation scheme described is to use pixels with highly confident predictions. By Mohammed Ehsan Ur Rahman and Soora Narasimha Reddy.

  6. Systematic Literature Review of Various Neural Network ...

    The popularity of using various neural network models and deep learning-based models to predict environmental temperament is increasing due to their ability to comprehend and address complex systems. When examining oceans and marine systems, Sea Surface Temperature (SST) is a critical factor to consider in terms of its impact on species, water availability, and natural events such as droughts ...

  7. A systematic literature review of deep learning neural network for time

    The success of deep learning over machine learning is the major attraction among researchers to study the ability of the technique in depth. Various deep learning architectures can be designed with the wide availability of time series air quality data (Sezer et al. 2020).The most common deep learning methods used in air quality forecasting are deep neural network (DNN), recurrent neural ...

  8. Convolutional Neural Networks: A Comprehensive Review of Architectures

    Convolutional Neural Networks have become a popular image and video recognition tool, achieving state-of-the-art performance in various domains such as object detection, face recognition, and natural language processing. This paper provides a comprehensive review of CNN (Convolutional Neural Network) architectures and their applications. We first introduce the basic structure of CNN s, and ...

  10. Novel applications of Convolutional Neural Networks in the age of

    Convolutional Neural Networks (CNNs) have been central to the Deep Learning revolution and played a key role in initiating the new age of Artificial Intelligence. However, in recent years newer ...

  11. Informatics

    We presented a comprehensive, detailed review of recent works on compressing and accelerating deep neural networks. Popular methods such as pruning methods, quantization methods, and low-rank factorization methods were described. We hope this paper can act as a keystone for future research on deep network compression.

  12. Deep Convolutional Neural Network Compression based on the Intrinsic

    Deep convolutional neural networks for image classification: A comprehensive review. Neural computation 29, 9 (2017), 2352--2449. Google Scholar; Xiaofeng Ruan, Yufan Liu, Bing Li, Chunfeng Yuan, and Weiming Hu. 2021. DPFPS: dynamic and progressive filter pruning for compressing convolutional neural networks from scratch.

  13. A Review of Convolutional Neural Networks

    Before Convolutional Neural Networks gained popularity, computer recognition problems involved hand-extracting features from the data, which was neither adequately efficient nor highly accurate. In recent times, however, Convolutional Neural Networks have provided a higher level of efficiency and accuracy in all the fields in which they have been employed in most ...

  14. A systematic review of convolutional neural network-based structural

    The main objective of this paper is to provide a systematic review of recent convolutional neural network (a subset of deep learning methods)-based techniques that have been widely developed in the context of non-contact sensing-based SHM. ... [34] provided an intensive literature review of state-of-the-art computer vision techniques using ...

  15. A Brief Review of Hypernetworks in Deep Learning

    Vinod Kumar Chauhan, Jiandong Zhou, Ping Lu, Soheila Molaei and David A. Clifton (Institute of Biomedical Engineering, University of Oxford; Oxford-Suzhou Institute of Advanced Research, Suzhou), August 11, 2023. Abstract: Hypernetworks, or hypernets in short, are neural networks that generate weights for another ... (a toy sketch of this idea appears after this list)

  16. A review of graph neural networks: concepts, architectures, techniques

    In the Table 1 describe the literature survey on graph neural networks, including the application area, the data set used, the model applied, and performance evaluation. The literature is from the years 2018 to 2023. ... Zhou J, et al. Graph neural networks: a review of methods and applications. AI Open. 2020;1(January):57-81.

  17. A Systematic Literature Review on Binary Neural Networks

    This paper presents an extensive literature review on Binary Neural Network (BNN). BNN utilizes binary weights and activation function parameters to substitute the full-precision values. In digital implementations, BNN replaces the complex calculations of Convolutional Neural Networks (CNNs) with simple bitwise operations. BNN optimizes large computation and memory storage requirements, which ...

  18. Deep learning in finance and banking: A literature review and

    Based on the basic structure of NN shown in Fig. 1, traditional networks include DNN, backpropagation (BP), MLP, and feedforward neural network (FNN).Using these models can ignore the order of data and the significance of time. As shown in Fig. 2, RNN has a new NN structure that can address the issues of long-term dependence and the order between input variables.

  19. (PDF) A Review on Artificial Neural Networks

    The neural network's role in medical science analysis attached to technology shows its elaborative ways of developing neural networks.

  20. A systematic literature review of deep learning neural network for time

    A systematic literature review of deep learning neural network for time series air quality forecasting Environ Sci Pollut Res Int. 2022 Jan;29(4) :4958-4990. ... Owing to this, literature search is conducted thoroughly from all scientific databases to avoid unnecessary clutter. This study summarizes and discusses different types of deep ...

  21. NILM applications: Literature review of learning approaches, recent

    The neural networks' ability to learn representations and extract complex features from raw data allowed solutions to be designed with minimal handcrafted feature extraction. However, transforming signal data into a neural network's input is a crucial matter that has received a lot of attention.

  22. Discovering robust biomarkers of neurological disorders from functional

    convolutional neural networks (CNN) customised for connec-tome datasets [19], [20] were proposed as an improvement over vanilla deep neural networks (DNN) [21], graph neural networks (GNN) have since emerged as the state-of-the-art deep learning model used in network neuroscience studies [22]. Besides being an intuitive fit to FC matrices (which

  23. Data leakage in deep learning studies of translational EEG

    In the literature review, we find that the majority of translational DNN-EEG studies suffer from data leakage due to data from individual subjects appearing in both the training and test sets. 2 Method 2.1 Deep neural network analysis overview

  24. Comprehensive Review of Artificial Neural Network Applications to

    The era of artificial neural network (ANN) began with a simplified application in many fields and remarkable success in pattern recognition (PR) even in manufacturing industries. Although significant progress achieved and surveyed in addressing ANN application to PR challenges, nevertheless, some problems are yet to be resolved like whimsical orientation (the unknown path that cannot be ...

  25. Machine learning in epidemiology: Neural networks forecasting of

    This study integrates advanced machine learning techniques, namely Artificial Neural Networks, Long Short-Term Memory, and Gated Recurrent Unit models, to forecast monkeypox outbreaks in Canada, Spain, the USA, and Portugal. The research focuses on the effectiveness of these models in predicting the spread and severity of cases using data from June 3 to December 31, 2022, and evaluates them ...

  26. How Artificial Intelligence Can Enhance the Diagnosis of Cardiac

    Advanced AI methods such as deep-learning convolutional neural networks (CNNs) may enhance the diagnostic process for CA by identifying patients at higher risk and potentially expediting the diagnosis of CA. ... In this review, we summarize the current state of AI applications to different diagnostic modalities used for the evaluation of CA ...

  27. A systematic review and meta-analysis of artificial neural network

    Artificial neural network (ANN) aimed to simulate the behavior of the nervous system as well as the human brain. Neural network models are mathematical computing systems inspired by the biological neural network in which try to constitute animal brains. ANNs recently extended, presented, and applied by many research scholars in the area of geotechnical engineering. After a comprehensive review ...

  28. Sustainability

    The convolutional neural network is a highly effective method for extracting data features, and it is a feed-forward neural network, commonly employed for the convolution and feature extraction of image data . When dealing with time series data, CNNs can leverage one-dimensional convolution kernels to extract features from one-dimensional time ...

  29. Review Article Applications of artificial neural network based battery

    The literature review encompasses a wide range of studies on ANN-based battery management systems. The BMS applications and prediction methods for SOH, SOC, and RUL are thoroughly classified. The review covers state-of-the-art ANN methods, including feedforward neural network, deep neural network, convolutional neural network, ...

  30. Discovering robust biomarkers of neurological disorders from functional

    Graph neural networks (GNN) have emerged as a popular tool for modelling functional magnetic resonance imaging (fMRI) datasets. Many recent studies have reported significant improvements in disorder classification performance via more sophisticated GNN designs and highlighted salient features that could be potential biomarkers of the disorder. In this review, we provide an overview of how GNN ...