13 Interesting Neural Network Project Ideas & Topics for Beginners [2024]

Neural networks have captivated the world of artificial intelligence and machine learning with their ability to mimic the human brain’s learning process. By emulating the way biological neurons pass signals, these networks learn to recognize underlying relationships in datasets.

Modern developments in image recognition, autonomous vehicles, and many other fields are built on neural networks. These adaptable frameworks have revolutionized the way we use technology: such systems can learn to perform tasks without being programmed with precise rules. Implementing neural network projects of your own is a great way to understand network architectures and how they work.

In this article, we will delve into the fundamental principles of neural networks, see their capabilities through real-world deep learning project ideas, and understand what a neural network is in AI. Read on to familiarize yourself with some exciting applications!

Fundamentals of neural networks

Neural network algorithms draw their inspiration from the human brain’s biological networks and their power from the ability to learn from data. These networks are useful tools for a variety of machine-learning applications because they are built to analyze data and identify patterns.

Before we begin with our list of neural network project ideas, let us first revise the basics.

  • A neural network is a series of algorithms that process complex data.
  • It can adapt to changing input.
  • It can generate the best possible results without requiring you to redesign the output criteria.
  • Computer scientists use neural networks to recognize patterns and solve diverse problems.
  • It is an example of machine learning.
  • The phrase “deep learning” is used for complex neural networks.

Understanding these fundamentals is essential for building, training, and deploying effective neural network models in real applications.

Today, neural networks are applied to a wide range of business functions, such as customer research, sales forecasting, data validation, risk management, etc. And adopting a hands-on training approach brings many advantages if you want to pursue a career in deep learning. So, let us dive into the topics one by one. Learn more about the applications of neural networks.

Neural Network Projects

Here are a few examples of neural network projects with source code available online.

1. Autoencoders based on neural networks

Autoencoders are among the simplest deep learning architectures. They are a specific type of feedforward neural network in which the input is first compressed into a lower-dimensional code. The output is then reconstructed from this compact representation, or summary. An autoencoder therefore has three components: an encoder, a code, and a decoder. In the next section, we have summarized how the architecture works.

  • The input passes through the encoder to produce the code.
  • The decoder (a mirror image of the encoder’s structure) reconstructs the output from the code.
  • The generated output aims to be as close to the input as possible.

From the above steps, you will observe that an autoencoder is a dimensionality reduction or compression algorithm. To begin the development process, you will need an encoding method, a decoding method, and a loss function. Binary cross-entropy and mean squared error are the two top choices for the loss function. And to train the autoencoders, you can follow the same procedure as artificial neural networks via back-propagation. Now, let us discuss the applications of these networks. 

You can create a handwritten-digit recognition tool using the MNIST dataset as input. MNIST is a manageable, beginner-friendly collection of images of handwritten digits. By corrupting these images with noise, you can train an autoencoder to learn a noise-removal filter for the dataset, so the digits can be classified and read properly. You can try this project yourself by downloading freely available code from online repositories.
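To make the encoder-code-decoder loop concrete, here is a minimal sketch of an autoencoder in plain NumPy, trained by back-propagation on the mean-squared-error loss. The layer sizes, learning rate, and random training data are invented purely for illustration; a real project would use a framework like Keras on actual MNIST images.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny autoencoder: 16-dim input -> 4-dim code -> 16-dim reconstruction.
n_in, n_code = 16, 4
W_enc = rng.normal(0, 0.1, (n_in, n_code))   # encoder weights
W_dec = rng.normal(0, 0.1, (n_code, n_in))   # decoder weights

def forward(x):
    code = np.tanh(x @ W_enc)   # compress the input into the code
    recon = code @ W_dec        # reconstruct the output from the code
    return code, recon

def train_step(x, lr=0.05):
    """One back-propagation step on the mean-squared-error loss."""
    global W_enc, W_dec
    code, recon = forward(x)
    err = recon - x                            # dLoss/dRecon (up to a constant)
    grad_dec = code.T @ err / len(x)
    grad_code = err @ W_dec.T * (1 - code**2)  # tanh derivative
    grad_enc = x.T @ grad_code / len(x)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc
    return float(np.mean(err**2))              # MSE before the update

X = rng.normal(size=(64, n_in))                # stand-in "dataset"
losses = [train_step(X) for _ in range(200)]   # loss should fall over time
```

The same structure scales up directly: swap in MNIST pixels for `X`, add noise to the inputs while keeping clean targets, and you have a denoising autoencoder.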


2. Convolutional neural network model 

Convolutional neural networks or CNNs are typically applied to analyze visual imagery. This architecture can be used for different purposes, such as for image processing in self-driving cars.

Autonomous driving applications use this model to interface with the vehicle: the CNN receives image input and maps it to a series of output decisions (turn right/left, stop/drive, etc.). Reinforcement learning algorithms then process these decisions to drive the car. Here is how you can start building a full-fledged application on your own:

  • Take a tutorial on MNIST or CIFAR-10.
  • Get acquainted with binary image classification models.
  • Plug and play with the open code in your Jupyter notebook.

With this approach, you can learn how to import custom datasets and experiment with the implementation to achieve the desired performance. You can try increasing the number of epochs, toying with images, adding more layers, etc. Additionally, you can dive into some object detection algorithms like SSD, YOLO, Fast R-CNN, etc. Facial recognition in the iPhone’s FaceID feature is one of the most common examples of this model. 
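Before reaching for a framework, it helps to see the core operation a CNN layer performs. The sketch below implements a “valid” 2-D cross-correlation (the convolution used in CNN layers) in plain NumPy and applies a hand-made vertical-edge kernel; the toy image and kernel values are chosen only for illustration.

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2-D cross-correlation, the core operation of a CNN layer."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Each output value is the kernel laid over one image patch.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A 5x5 image whose left half is dark and right half is bright...
image = np.zeros((5, 5))
image[:, 2:] = 1.0
# ...and a kernel that responds to left-to-right increases in brightness.
kernel = np.array([[-1.0, 1.0]])
response = conv2d(image, kernel)   # fires only along the vertical edge
```

In a trained CNN the kernel values are learned rather than hand-written, and many such filters are stacked with nonlinearities and pooling, but the sliding-window computation is exactly this.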

Once you have brushed up your concepts, try your hand at constructing a traffic sign classification system for a self-driving car using CNN and the Keras library. You can explore the GTSRB dataset for this project. Learn more about convolutional neural networks.


3. Recurrent neural network model

Unlike feedforward nets, recurrent neural networks (RNNs) can deal with sequences of variable length. Sequence models like RNNs have several applications, ranging from chatbots and text mining to video processing and price prediction.

If you are just getting started, you should first acquire a foundational understanding of the LSTM architecture with a char-level RNN. For example, you can attempt loading stock price datasets. You can train RNNs to predict what comes next by processing real data sequences one step at a time. We have explained this process below:

  • Assume that the predictions are probabilistic.
  • Sampling iterations take place in the network’s output distribution.
  • The sample is fed as input in the next step.
  • The trained network generates novel sequences.
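The sampling loop above can be sketched in a few lines of NumPy. The weights here are random stand-ins for a trained char-level RNN, and the four-character vocabulary is invented for illustration, so the generated text is gibberish; the point is the mechanics of sampling from the output distribution and feeding each sample back as the next input.

```python
import numpy as np

rng = np.random.default_rng(1)
vocab = list("abcd")     # toy character vocabulary (illustrative only)
V, H = len(vocab), 8     # vocabulary size and hidden-state size

# Randomly initialised weights stand in for a trained char-level RNN.
Wxh = rng.normal(0, 0.1, (V, H))   # input -> hidden
Whh = rng.normal(0, 0.1, (H, H))   # hidden -> hidden (the recurrence)
Why = rng.normal(0, 0.1, (H, V))   # hidden -> output logits

def sample(seed_idx, n_chars):
    """Generate a sequence by repeatedly sampling the network's output
    distribution and feeding each sample back in as the next input."""
    h = np.zeros(H)
    x = np.zeros(V); x[seed_idx] = 1.0
    out = []
    for _ in range(n_chars):
        h = np.tanh(x @ Wxh + h @ Whh)                 # recurrent state update
        logits = h @ Why
        p = np.exp(logits) / np.exp(logits).sum()      # softmax -> probabilities
        idx = rng.choice(V, p=p)                       # probabilistic prediction
        out.append(vocab[idx])
        x = np.zeros(V); x[idx] = 1.0                  # sample becomes next input
    return "".join(out)

text = sample(0, 20)
```

After training on real text, the same loop is what lets a char-level RNN generate novel sequences one character at a time.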

With this, we have covered the main types of neural networks and their applications. Let us now look at some more specific neural network project ideas.


4. Cryptographic applications using artificial neural networks

Cryptography is concerned with maintaining computational security and avoiding data leakages in electronic communications.

Artificial neural network-based cryptographic applications represent an intriguing merger of cutting-edge technologies to address pressing problems in security, privacy, and data protection. As research in this field advances, cryptographic applications using neural networks have the potential to change the face of modern cybersecurity and data protection.

You can implement a project in this field by using different neural network architectures and training algorithms. 

Suppose the objective of your study is to investigate the use of artificial neural networks in cryptography. For the implementation, you can use a simple recurrent structure like the Jordan network, trained by the back-propagation algorithm. You will get a finite state sequential machine, which will be used for the encryption and decryption processes. Additionally, chaotic neural nets can form an integral part of the cryptographic algorithm in such systems.  

5. Credit scoring system

Loan defaulters can cause enormous losses for banks and financial institutions, which therefore have to dedicate significant resources to assessing credit risk and classifying applications. In such a scenario, neural networks can provide an excellent alternative to traditional statistical models.

They offer better predictive ability and more accurate classification outcomes than techniques like logistic regression and discriminant analysis. So, consider taking up a project to prove the same. You can design a credit scoring system based on artificial neural networks and draw conclusions for your study from the following steps:

  • Extract a real-world credit card data set for analysis.
  • Determine the neural network structure to use, such as a mixture-of-experts model or a radial basis function network.
  • Specify weights to minimize the total errors.
  • Explain your optimization technique or theory.
  • Compare your proposed decision-support system with other credit scoring applications.
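As a minimal sketch of the idea, the snippet below trains a single-neuron “network” (logistic regression, the simplest possible neural credit scorer) on synthetic applicant data. The features, labels, and learning rate are all invented for illustration; a real study would use a real credit card dataset and a richer architecture such as those listed above.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic applicants: two standardized features (think income and debt
# ratio); label 1 marks a default. Entirely made-up data for illustration.
n = 400
X = rng.normal(size=(n, 2))
y = (X[:, 1] - X[:, 0] + rng.normal(0, 0.5, size=n) > 0).astype(float)

def predict(X, w, b):
    """Default probability from a single sigmoid neuron."""
    return 1 / (1 + np.exp(-(X @ w + b)))

# Train the one-neuron network by gradient descent on the cross-entropy loss.
w, b = np.zeros(2), 0.0
for _ in range(500):
    p = predict(X, w, b)
    w -= 0.5 * (X.T @ (p - y) / n)   # cross-entropy gradient w.r.t. weights
    b -= 0.5 * np.mean(p - y)        # ... and w.r.t. the bias

accuracy = float(np.mean((predict(X, w, b) > 0.5) == y))
```

Replacing the single neuron with a multi-layer network (or a mixture-of-experts model) and the synthetic matrix with real application data is exactly the step-up the project steps above describe.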

6. Web-based training environment

If you want to learn how to create an advanced web education system using modern internet and development technologies, refer to the project called Socratenon. It will give you a peek into how web-based training can go beyond traditional solutions like virtual textbooks. The project has been finalized, and its techniques have been tested against other solutions available in the open literature.

Socratenon demonstrates how existing learning environments can be improved using sophisticated tools, such as:

  • User modeling to personalize content for users
  • Intelligent agents to provide better assistance and search
  • An intelligent back-end using neural networks and case-based reasoning


7. Vehicle security system using facial recognition

For this project, you can refer to SmartEye, a solution developed by Alfred Ritikos at Universiti Teknologi Malaysia. It covers several techniques, from facial recognition to optics and intelligent software development.

Over the years, security systems have come to benefit from many innovative products that facilitate identification, verification, and authentication of individuals. And SmartEye tries to conceptualize these processes by simulation. Also, it experiments with the existing facial recognition technologies by combining multilevel wavelet decomposition and neural networks.

8. Automatic music generation

Automatic music generation using neural networks is an interesting, cutting-edge application of artificial intelligence with the potential to transform the music industry and creative processes. With deep learning, it is possible to make real music without knowing how to play any instruments: you can train machines to write music, harmonize tunes, and create new musical compositions. For example, you can build an automatic music generator by extracting note data from MIDI files and training an LSTM model to generate new compositions.
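The data-preparation step for such a model can be sketched without any deep learning library: encode the note vocabulary as integers, then slice the sequence into fixed-length windows, each paired with the note that follows it. The note names and sequence length below are invented for illustration.

```python
def make_training_pairs(notes, seq_len):
    """Turn a note sequence into (input window, next note) training pairs,
    the standard preparation for an LSTM sequence model."""
    pitch_names = sorted(set(notes))
    note_to_int = {n: i for i, n in enumerate(pitch_names)}  # note vocabulary
    encoded = [note_to_int[n] for n in notes]
    inputs, targets = [], []
    for i in range(len(encoded) - seq_len):
        inputs.append(encoded[i:i + seq_len])   # seq_len notes in...
        targets.append(encoded[i + seq_len])    # ...predict the next note
    return inputs, targets, note_to_int

# Toy melody; a real project would extract these from MIDI files.
melody = ["C4", "E4", "G4", "E4", "C4", "E4", "G4", "C5"]
X, y, vocab = make_training_pairs(melody, seq_len=3)
```

Each `(X[i], y[i])` pair becomes one training example; the trained model then generates music by repeatedly predicting a next note and appending it to the window, just like the char-level RNN sampling loop described earlier.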

OpenAI’s MuseNet serves as the appropriate example for this type of project. MuseNet is a deep neural network programmed to learn from discovered patterns of harmony, style, and rhythm and predict the next tokens to generate musical compositions. It can produce four-minute-long pieces with ten different instruments and combine forms like country music and rock music. 

Learn more: Introduction to Deep Learning & Neural Networks

9. Application for cancer detection

Another groundbreaking idea for deep learning projects is in the area of medicine: the diagnosis of cancer, which holds great promise for breakthroughs in early identification and better patient outcomes. Neural network implementations have the potential to introduce efficiency into medical diagnosis, particularly in the field of cancer detection. There are numerous benefits to employing neural networks to detect cancer:

  • Early Detection
  • Speed and Efficiency
  • Scalability

Since cancer cells differ from healthy cells, it is possible to detect the disease using histology images. Neural networks can be trained on enormous databases of medical images to learn the intricate patterns and traits associated with cancer, which allows them to detect malignant regions more effectively and reliably. For example, a multi-tiered neural network architecture allows you to classify breast tissue as malignant or benign. You can practice building this breast cancer classifier using the IDC dataset from Kaggle, which is available in the public domain.

10. Text summarizer

Automatic text summarization involves condensing a piece of text into a shorter version. By leveraging the capabilities of neural networks, a summarizer can extract key information and essential details from documents, articles, or other textual sources, compressing the content while preserving the main ideas and context. For this project, you will apply deep neural networks using natural language processing. The manual process of writing summaries is both laborious and time-consuming, so automatic text summarizers have gained immense importance in academic research.
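Before building the neural version, a simple non-neural baseline clarifies the task. The sketch below performs frequency-based extractive summarization: it scores each sentence by the total frequency of its words and keeps the top-scoring ones in their original order. The sample document is invented for illustration; a neural abstractive summarizer would generate new text rather than select existing sentences.

```python
import re
from collections import Counter

def summarize(text, n_sentences=1):
    """Frequency-based extractive baseline: keep the sentences whose
    words are most frequent across the whole document."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(s):
        return sum(freq[w] for w in re.findall(r"[a-z']+", s.lower()))

    ranked = sorted(sentences, key=score, reverse=True)[:n_sentences]
    # Re-emit the chosen sentences in their original document order.
    return " ".join(s for s in sentences if s in ranked)

doc = ("Neural networks learn from data. They power image recognition. "
       "Neural networks also power text summarization of long data.")
summary = summarize(doc, n_sentences=1)
```

A neural approach replaces the frequency score with learned sentence representations (extractive) or with a sequence-to-sequence decoder (abstractive), but this baseline is a useful yardstick for either.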

11. Intelligent chatbot

A fascinating use of artificial intelligence and natural language processing (NLP) is developing intelligent chatbots with neural networks. By harnessing the power of neural networks, chatbots can be built to comprehend natural-language questions and answer in a human-like manner, providing individualized and contextually appropriate interactions.

Modern businesses are using chatbots to take care of routine requests and enhance customer service. Some of these bots can also identify the context of the queries and then respond with relevant answers. So, there are several ways to implement a chatbot system.

You can implement a project on retrieval-based chatbots using NLTK and Keras. Or you can go for generative models that are based on deep neural networks and do not require predefined responses. 
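The retrieval idea can be sketched without NLTK or Keras: represent the user query and each stored question as bag-of-words vectors, and return the canned response whose question is most similar by cosine similarity. The intents and responses below are invented for illustration; a real retrieval chatbot would use trained embeddings or a Keras intent classifier instead of raw word overlap.

```python
import math
from collections import Counter

# Toy intent corpus (made up for this sketch).
responses = {
    "what are your opening hours": "We are open 9am to 5pm, Monday to Friday.",
    "how do i reset my password": "Use the 'Forgot password' link on the login page.",
    "thank you for the help": "You're welcome! Anything else I can do?",
}

def bow(text):
    """Bag-of-words vector: word -> count."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def reply(query):
    """Retrieve the response whose stored question best matches the query."""
    q = bow(query)
    best = max(responses, key=lambda k: cosine(q, bow(k)))
    return responses[best]

answer = reply("how can i reset the password")
```

Generative models go one step further: instead of retrieving a predefined response, a sequence-to-sequence neural network composes the reply word by word.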

Read: How to make a chatbot in Python?

12. Human pose estimation project

This project encompasses detecting the human body in an image and then estimating its key points, such as the eyes, head, neck, knees, and elbows. It is the same technology Snapchat and Instagram use to anchor face filters on a person. You can use the MPII Human Pose dataset to create your own version.

13. Human activity recognition project

You can explore various neural network project ideas, including implementing a neural network-based model for detecting human activities, such as sitting on a chair, falling, picking something up, or opening and closing a door. This is a video classification project, which involves combining a series of images and classifying the action. You can use a database of labeled video clips, such as 20BN-something-something.

Neural networks and deep learning have brought significant transformations to the world of artificial intelligence. Today, these methods have penetrated a wide range of industries, from medicine and biomedical systems to banking and finance to marketing and retail.


The journey of exploring neural networks has been one of the most exhilarating phases of my career. Diving into this complex yet fascinating world of artificial intelligence, I’ve had the opportunity to work on various projects, each teaching me something unique about how machines can mimic human brain functions. From these experiences, I’ve compiled this list of neural network project ideas for beginners, aimed at helping those just starting out in the field. When I began, the challenge wasn’t just understanding the intricate details of neural networks but also finding practical projects to apply that knowledge. This article is crafted from my firsthand experience, offering a blend of practical project ideas and topics suited for beginners. Whether you’re a student, a budding professional, or simply an AI enthusiast, these neural network project ideas are tailored to kickstart your journey into the world of artificial intelligence.

If you’re interested in gaining a Machine Learning certification, check out IIIT-B & upGrad’s Executive PG Programme in Machine Learning & AI, which is designed for working professionals and offers 450+ hours of rigorous training, 30+ case studies & assignments, IIIT-B alumni status, 5+ practical hands-on capstone projects, and job assistance with top firms.

Pavan Vadapalli

Frequently Asked Questions (FAQs)

Artificial Intelligence (AI) projects enable machines to perform tasks that would otherwise require human intelligence. Learning, reasoning, problem-solving, and perception are all goals of these intelligent systems. AI draws on many theories, methodologies, and technologies; machine learning, neural networks, expert systems, cognitive technologies, human-computer interaction, and natural language processing are just a few of its subfields. Graphics processing units, IoT, complex algorithms, and APIs are some of the other technologies that support AI.

AI can be divided into four categories. Reactive machines are AI systems that do not rely on prior experience to complete a task; they have no memory and respond based only on what they currently perceive. IBM's chess-playing supercomputer Deep Blue is an example. Limited-memory systems rely on past experience to act in current situations; autonomous vehicles are an example. Theory-of-mind AI refers to systems that would understand the mental states of others and make decisions accordingly; none yet matches human decision-making, though the field is making substantial progress. Finally, a self-aware AI system would be aware of its own existence and internal state and able to predict the feelings of others.

Face biometrics can be used to unlock a phone in an artificial intelligence project. The application extracts image attributes using deep learning; convolutional neural networks and deep autoencoder networks are the two primary types of neural networks used. The procedure has four steps: face detection, face alignment, feature extraction, and face recognition.


Research Topics in Neural Networks

Research on neural networks is a wide and steadily evolving field within artificial intelligence and machine learning. It spans theoretical foundations, methodological enhancements, novel frameworks, and a broad range of applications. The members of phdprojects.org work hard to identify original and novel topics in your area and offer dedicated help toward a meaningful project. Below, we discuss various current and emerging research concepts in neural networks:

Fundamental Research:

  • Neural Network Theory: Our research investigates the theoretical foundations of neural networks, such as their capacity, their generalization capabilities, and the reasons for their robustness across tasks.
  • Optimization Methods: We create novel optimization techniques to train neural networks efficiently and accurately.
  • Neural Architecture Search (NAS): Machine learning helps us discover the best network architectures and automate the neural network design process.
  • Quantum Neural Networks: We examine how quantum techniques can improve the efficiency of neural networks and analyze the intersection of neural networks and quantum computing.

Advances in Learning Techniques:

  • Meta-Learning: In meta-learning, a model learns how to learn, improving its performance on each new task while retaining previously gained skills.
  • Federated Learning: We explore the training of distributed neural networks across many devices while preserving data confidentiality and safety.
  • Reinforcement Learning: We enhance methods that enable agents to make sequential decisions by interacting with their environments in pursuit of a goal.
  • Few-shot and Semi-supervised Learning: These techniques allow neural network models to learn from a small labeled dataset supplemented by a large unlabeled one.

Enhancing Neural Network Components:

  • Activation Functions: We investigate various activation functions to improve the efficiency and training dynamics of neural networks.
  • Dynamic & Adaptive Networks: This concerns the development of neural networks that alter their architecture and size during training based on the difficulty of the task.
  • Regularization Methods: We build novel regularization techniques to avoid overfitting and improve a neural network's generalization.

Neural Network Efficiency:

  • Explainable AI (XAI): We improve the interpretability of neural network decisions to make models more transparent and trustworthy.
  • Adversarial Machine Learning: Our research explores the safety of neural networks, specifically their robustness against adversarial attacks, and creates defenses.
  • Fault Tolerance in Neural Networks: We ensure that neural networks remain robust even when components fail or data is corrupted.

New Architectures & Frameworks:

  • Capsule Networks: We work on capsule network frameworks, which aim to address shortcomings of CNNs such as their inefficiency in managing spatial hierarchies.
  • Spiking Neural Networks (SNN): We create neural frameworks that closely mimic the way biological neurons process information, which can lead to more robust AI systems.
  • Integrated Frameworks: Our projects integrate neural networks with statistical models or other machine learning methods to exploit the strengths of both.

Neural Networks Applications:

  • Clinical Diagnosis: We advance the use of neural networks in clinical imaging and diagnosis, including radiology, pathology, and genomics.
  • Climate Modeling: Neural networks help us interpret complex climatic systems and improve the accuracy of climate forecasting.
  • Autonomous Systems: Our projects create neural networks for use in autonomous drones, robots, and self-driving cars.
  • Neural Networks in Natural Language Processing (NLP): We employ the latest language models for tasks such as summarization, translation, and question answering.
  • Financial Modeling: Neural networks help us forecast market trends, evaluate risk, and automate trading.

Cross-disciplinary Concepts:

  • Bio-inspired Neural Networks: We draw motivation from neuroscience to develop more robust and effective neural network methods.
  • Neural Networks for Social Good: Our research applies neural networks to social challenges such as disaster response, poverty assessment, and monitoring the spread of disease.

Evolving Approaches:

  • AI for Creativity: We make use of neural networks for creative tasks such as generating art, music, designs, and writing.
  • Edge AI: We optimize neural networks to run efficiently on edge devices, such as IoT devices and smartphones, with limited computational power.

When selecting a research concept, it is important to consider the available resources, your own expertise, and the potential impact of the project. Novel research directions often emerge through collaboration with industry, interdisciplinary communities, and institutions, which can also open up practical applications for the work.

Research Projects in Neural Networks

What specific neural network architectures are being explored in the research thesis?

A neural network architecture operates by using organized layers to transform input data into meaningful representations. The input layer receives the raw data, which then undergoes mathematical transformations within one or more hidden layers.

Convolutional neural networks (CNNs) excel at image recognition tasks, while recurrent neural networks (RNNs) perform well on sequential data.

  • Global Asymptotical Stability of Recurrent Neural Networks With Multiple Discrete Delays and Distributed Delays
  • An Improved Algebraic Criterion for Global Exponential Stability of Recurrent Neural Networks With Time-Varying Delays
  • Finding Features for Real-Time Premature Ventricular Contraction Detection Using a Fuzzy Neural Network System
  • Improved Delay-Dependent Stability Condition of Discrete Recurrent Neural Networks With Time-Varying Delays
  • Experiments in the application of neural networks to rotating machine fault diagnosis
  • Flash-based programmable nonlinear capacitor for switched-capacitor implementations of neural networks
  • Polynomial functions can be realized by finite size multilayer feedforward neural networks
  • Convergence of Nonautonomous Cohen–Grossberg-Type Neural Networks With Variable Delays
  • Analysis and Optimization of Network Properties for Bionic Topology Hopfield Neural Network Using Gaussian-Distributed Small-World Rewiring Method
  • Comparing Support Vector Machines and Feedforward Neural Networks With Similar Hidden-Layer Weights
  • An artificial neural network study of the relationship between arousal, task difficulty and learning
  • Flow-Based Encrypted Network Traffic Classification With Graph Neural Networks
  • Deriving sufficient conditions for global asymptotic stability of delayed neural networks via nonsmooth analysis-II
  • Bifurcating pulsed neural networks, chaotic neural networks and parametric recursions: conciliating different frameworks in neuro-like computing
  • Prediction of internal surface roughness in drilling using three feedforward neural networks – a comparison
  • Comparison of two neural networks approaches to Boolean matrix factorization
  • A new class of convolutional neural networks (SICoNNets) and their application of face detection
  • The Guelph Darwin Project: the evolution of neural networks by genetic algorithms
  • Training neural networks with threshold activation functions and constrained integer weights
  • A commodity trading model based on a neural network-expert system hybrid


Artificial Neural Network Thesis Topics

Artificial Neural Network thesis topics are explored here for students interested in artificial neural networks. This is one of our preeminent services, and it has attracted many students and research scholars thanks to its ever-growing research scope. An Artificial Neural Network (ANN) is a mathematical model used to predict system performance, inspired by the function and structure of human biological neural networks (its function is similar to that of the human brain and nervous system).

We have world-class engineers working on every part of this domain to resolve the open issues of ANNs. We are well known worldwide for following university guidelines, we have tie-ups with top international colleges and universities, and our thesis writing service covers all kinds of research fields. We have been developing ANN-based projects for the past ten years; so far, we have completed 1000+ Artificial Neural Network thesis topics for students and research scholars.

Neural Network Topics

  Artificial Neural Network thesis topics are offered for budding students and research scholars. We always provide thesis topics on current trends because we are members of high-level publishers such as IEEE, Springer, Elsevier, and other SCI-indexed venues. Our company is ISO 9001:2000 certified and has written theses for students and research scholars in various countries around the world. To select a thesis topic, you should first understand Artificial Neural Networks and their important aspects. Here is a brief overview of ANNs for your reference.

Key Features of Artificial Neural Networks

  • Adaptive learning
  • Pattern extraction and detection
  • Extraction of semantic meaning from imprecise data
  • Real-time operation

Artificial Neural Networks based Algorithms

  • Feedforward Neural Networks
  • Radial Basis Function Networks
  • Time delay neural network
  • Regulatory feedback network
  • Probabilistic neural network
  • Associative Neural Network
  • Fully Recurrent Network
  • Echo State Network
  • Bi-Directional RNN
  • Simple Recurrent Networks
  • Stochastic Neural Networks
  • Long Short Term Memory Networks
  • Genetic Scale RNN
  • Holographic Associative Memory
  • Spiking Neural Networks
  • Cascading Neural Networks
  • Dynamic Neural Networks
  • Neuro-Fuzzy Networks
  • One Shot Associative Memory
  • Instantaneously Trained Networks
  • Hierarchical Temporal Memory
  • Oscillating Neural Network
  • Growing Neural Gas
  • Counter Propagation Neural Network
  • Hybridization Neural Network
  • Convolutional Neural Networks
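
Many of the architectures above build on the basic feedforward network. As a minimal illustration (a sketch, not tied to any particular entry in the list), here is a two-layer feedforward network trained with plain backpropagation on XOR; the architecture and hyperparameters are arbitrary choices:

```python
import numpy as np

# Minimal two-layer feedforward network trained with plain backprop.
# Architecture and hyperparameters are illustrative, not prescriptive.
rng = np.random.default_rng(0)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)  # XOR targets

W1 = rng.normal(size=(2, 4))
b1 = np.zeros(4)
W2 = rng.normal(size=(4, 1))
b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 1.0
losses = []
for _ in range(5000):
    h = sigmoid(X @ W1 + b1)       # hidden activations
    out = sigmoid(h @ W2 + b2)     # network output
    losses.append(float(np.mean((out - y) ** 2)))
    # Backpropagate the mean-squared-error gradient.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out / len(X)
    b2 -= lr * d_out.mean(axis=0)
    W1 -= lr * X.T @ d_h / len(X)
    b1 -= lr * d_h.mean(axis=0)

print(losses[0], losses[-1])  # the loss should drop substantially
```

The same structure scales to the deeper and recurrent variants in the list by changing the layer stack and the weight-update rule.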

Tasks Where ANNs Are Used

  • Classification (pattern and sequence recognition, sequential decision making, and novelty detection)
  • Control (e.g., computer numerical control)
  • Data processing (clustering, filtering, compression, and blind source separation)
  • Function approximation/regression analysis (modeling, fitness approximation, and time-series prediction)
  • Robotics (prosthetics and directing manipulators)

Major Research Issues on Artificial Neural Network

  • Long training periods
  • Complex, time-consuming computation
  • Ensuring the development of robust methods
  • Improving extrapolation ability
  • Handling data with uncertainty
  • Increasing model transparency

Support for the MATLAB Toolbox

     To address ANN issues, we also use the latest Neural Network Toolbox (version 3.0), which offers the following features:

  • Reduced-memory Levenberg-Marquardt algorithm for handling large-scale problems
  • New pre-processing and post-processing functions
  • Newly supervised networks (probabilistic and generalized regression)
  • New training algorithms (conjugate gradient, two quasi-Newton methods, and resilient backpropagation)
  • Automatic creation of network simulation blocks in Simulink
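
Resilient backpropagation, mentioned above, adapts a per-parameter step size from gradient signs alone. Since the toolbox itself is MATLAB, here is a hedged NumPy sketch of a simplified Rprop update rule; the `rprop_minimize` helper and the toy quadratic objective are illustrative, not toolbox code:

```python
import numpy as np

# Simplified Rprop: grow the step while the gradient sign is stable,
# shrink it on a sign flip, and move against the gradient sign only.
def rprop_minimize(grad, w0, steps=100, eta_plus=1.2, eta_minus=0.5,
                   step_init=0.1, step_max=1.0, step_min=1e-6):
    w = np.asarray(w0, dtype=float)
    step = np.full_like(w, step_init)
    prev_g = np.zeros_like(w)
    for _ in range(steps):
        g = grad(w)
        same_sign = prev_g * g
        step = np.where(same_sign > 0, np.minimum(step * eta_plus, step_max), step)
        step = np.where(same_sign < 0, np.maximum(step * eta_minus, step_min), step)
        w = w - np.sign(g) * step
        prev_g = g
    return w

# Minimize f(w) = (w0 - 3)^2 + (w1 + 1)^2, whose gradient is 2*(w - target).
target = np.array([3.0, -1.0])
w_opt = rprop_minimize(lambda w: 2 * (w - target), np.zeros(2))
print(w_opt)  # close to [3, -1]
```

Because only gradient signs are used, the method is insensitive to gradient magnitude, which is why it suits problems with poorly scaled error surfaces.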

Real-Time Applications

  • Image compression
  • Security-related applications
  • Medical image processing
  • Character recognition
  • System identification and control
  • Trajectory prediction and vehicle control
  • Process control and natural resources
  • Pattern classification (radar systems, object recognition, and face recognition)
  • Data mining (e-mail spam filtering, knowledge discovery in databases)
  • Sequence recognition (speech, handwritten text, and gesture recognition)

Future Applications of ANNs

  • Integration of fuzzy logic with ANNs
  • Pulsed artificial neural networks
  • Hardware-specialized artificial neural networks
  • Robots that can see, feel, and predict abnormal behavior in their environment
  • Music composition
  • Widespread use of self-driving cars
  • Improved stock prediction
  • Self-diagnosis of medical problems
  • Automatic transcription of handwritten documents

Current Artificial Neural Network Thesis Topics

  • Design and analysis of an intelligent flow transmitter based on artificial neural networks
  • Inter- and intra-channel nonlinearity compensation in WDM-OFDM coherent optical systems using artificial-neural-network nonlinear equalization
  • Empirical mode decomposition and artificial neural network based wind-turbine modeling using FAST, TurbSim, and Simulink
  • Artificial neural network based structural damage detection for profile monitoring
  • Common-mode voltage reduction in cascaded multilevel inverters using artificial neural networks
  • Diffusion-based overlay measurement using artificial neural networks
  • Modeling and optimization of the tensile strength and yield point of a steel bar using ANNs

We have presented a few major aspects of Artificial Neural Networks, but we explore well beyond the student level, which helps our students stand out in the field of research. We serve with a full research perspective, and our guidance and assistance make our students experts. Create your own style in research; let it be unique to yourself and yet identifiable to others.

Services We Offer

  • Mathematical proof
  • Pseudo code
  • Conference paper
  • Research proposal
  • System design
  • Literature survey
  • Data collection
  • Thesis writing
  • Data analysis
  • Rough draft
  • Paper collection
  • Code and programs
  • Paper writing
  • Course work

Available Master's thesis topics in machine learning


Here we list topics that are available. You may also be interested in our list of completed Master's theses.

Learning and inference with large Bayesian networks

Most learning and inference tasks with Bayesian networks are NP-hard. Therefore, one often resorts to using different heuristics that do not give any quality guarantees.

Task: Evaluate quality of large-scale learning or inference algorithms empirically.

Advisor: Pekka Parviainen

Sum-product networks

Traditionally, probabilistic graphical models use a graph structure to represent dependencies and independencies between random variables. Sum-product networks are a relatively new type of graphical model in which the graph structure models computations rather than the relationships between variables. The benefit of this representation is that inference (computing conditional probabilities) can be done in linear time with respect to the size of the network.
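
The linear-time inference claim can be made concrete with a tiny hand-built sum-product network. The helper functions below (`leaf`, `product`, `weighted_sum`) and the weights are illustrative inventions, not from a library:

```python
# A hand-built sum-product network over two binary variables X1, X2.
# One bottom-up pass evaluates any evidence, so inference cost is
# linear in the number of nodes.

def prod(xs):
    r = 1.0
    for x in xs:
        r *= x
    return r

def leaf(var, val):
    # Indicator leaf: 1 if evidence agrees or the variable is unobserved.
    return lambda e: 1.0 if e.get(var, val) == val else 0.0

def product(*children):
    return lambda e: prod(c(e) for c in children)

def weighted_sum(pairs):
    return lambda e: sum(w * c(e) for w, c in pairs)

# Mixture of two fully factorized distributions over (X1, X2).
spn = weighted_sum([
    (0.6, product(weighted_sum([(0.9, leaf("X1", 1)), (0.1, leaf("X1", 0))]),
                  weighted_sum([(0.8, leaf("X2", 1)), (0.2, leaf("X2", 0))]))),
    (0.4, product(weighted_sum([(0.2, leaf("X1", 1)), (0.8, leaf("X1", 0))]),
                  weighted_sum([(0.3, leaf("X2", 1)), (0.7, leaf("X2", 0))]))),
])

p_x1 = spn({"X1": 1})                    # marginal P(X1=1)
total = spn({})                          # full marginalization: 1.0
cond = spn({"X1": 1, "X2": 1}) / p_x1    # conditional P(X2=1 | X1=1)
print(p_x1, total, cond)
```

Marginalization falls out for free: an unobserved variable makes both of its indicator leaves return 1, so no extra summation pass is needed.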

Potential thesis topics in this area: a) Compare inference speed with sum-product networks and Bayesian networks. Characterize situations when one model is better than the other. b) Learning the sum-product networks is done using heuristic algorithms. What is the effect of approximation in practice?

Bayesian Bayesian networks

The naming of Bayesian networks is somewhat misleading because there is nothing Bayesian in them per se; a Bayesian network is just a representation of a joint probability distribution. One can, of course, use a Bayesian network while doing Bayesian inference. One can also learn Bayesian networks in a Bayesian way. That is, instead of finding an optimal network, one computes the posterior distribution over networks.
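
As a concrete reminder that a Bayesian network is just a factored joint distribution, the sketch below builds a three-node chain A → B → C and computes a posterior by brute-force enumeration; all probability tables are invented for illustration:

```python
from itertools import product

# Chain network A -> B -> C: P(A,B,C) = P(A) P(B|A) P(C|B).
p_a = {0: 0.7, 1: 0.3}
p_b_given_a = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
p_c_given_b = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}

def joint(a, b, c):
    return p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]

# Exact inference by enumeration: P(A=1 | C=1).
num = sum(joint(1, b, 1) for b in (0, 1))
den = sum(joint(a, b, 1) for a, b in product((0, 1), repeat=2))
print(num / den)
```

Nothing Bayesian happened here; Bayesian *learning* of such networks would instead place a posterior over the tables (and the graph) themselves.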

Task: Develop algorithms for Bayesian learning of Bayesian networks (e.g., MCMC, variational inference, EM)

Large-scale (probabilistic) matrix factorization

The idea behind matrix factorization is to represent a large data matrix as a product of two or more smaller matrices. Such factorizations are often used in, for example, dimensionality reduction and recommendation systems. Probabilistic matrix factorization methods can be used to quantify uncertainty in recommendations. However, large-scale (probabilistic) matrix factorization is computationally challenging.
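
A minimal version of the idea, assuming plain gradient descent on the squared reconstruction error (no probabilistic treatment), might look like this:

```python
import numpy as np

# Factorize a ratings-like matrix R (m x n) as U @ V.T with rank k
# by gradient descent on the squared reconstruction error.
rng = np.random.default_rng(0)
m, n, k = 20, 15, 3
U_true = rng.normal(size=(m, k))
V_true = rng.normal(size=(n, k))
R = U_true @ V_true.T + 0.01 * rng.normal(size=(m, n))  # low-rank + noise

U = 0.1 * rng.normal(size=(m, k))
V = 0.1 * rng.normal(size=(n, k))
lr = 0.01
errors = []
for _ in range(500):
    E = U @ V.T - R                 # residual matrix
    errors.append(float(np.mean(E ** 2)))
    U -= lr * (E @ V)               # gradient of 0.5*||E||_F^2 w.r.t. U
    V -= lr * (E.T @ U)             # ... and w.r.t. V (using updated U)

print(errors[0], errors[-1])
```

The probabilistic variants replace this point estimate with distributions over `U` and `V`, which is where the scalability challenge mentioned above arises.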

Potential thesis topics in this area: a) Develop scalable methods for large-scale matrix factorization (non-probabilistic or probabilistic), b) Develop probabilistic methods for implicit feedback (e.g., a recommendation engine where there are no ratings, only knowledge of whether a customer has bought an item)

Bayesian deep learning

Standard deep neural networks do not quantify uncertainty in predictions. On the other hand, Bayesian methods provide a principled way to handle uncertainty. Combining these approaches leads to Bayesian neural networks. The challenge is that Bayesian neural networks can be cumbersome to use and difficult to learn.

The task is to analyze Bayesian neural networks and different inference algorithms in some simple setting.

Deep learning for combinatorial problems

Deep learning is usually applied in regression or classification problems. However, there has been some recent work on using deep learning to develop heuristics for combinatorial optimization problems; see, e.g., [1] and [2].

Task: Choose a combinatorial problem (or several related problems) and develop deep learning methods to solve them.

References: [1] Vinyals, Fortunato and Jaitly: Pointer networks. NIPS 2015. [2] Dai, Khalil, Zhang, Dilkina and Song: Learning Combinatorial Optimization Algorithms over Graphs. NIPS 2017.

Advisors: Pekka Parviainen, Ahmad Hemmati

Estimating the number of modes of an unknown function

Mode seeking considers estimating the number of local maxima of a function f. Sometimes one can find modes by, e.g., looking for points where the derivative of the function is zero. However, often the function is unknown and we have only access to some (possibly noisy) values of the function. 

In topological data analysis,  we can analyze topological structures using persistent homologies. For 1-dimensional signals, this can translate into looking at the birth/death persistence diagram, i.e. the birth and death of connected topological components as we expand the space around each point where we have observed our function. These observations turn out to be closely related to the modes (local maxima) of the function. A recent paper [1] proposed an efficient method for mode seeking.

In this project, the task is to extend the ideas from [1] to get a probabilistic estimate on the number of modes. To this end, one has to use probabilistic methods such as Gaussian processes.

[1] U. Bauer, A. Munk, H. Sieling, and M. Wardetzky. Persistence barcodes versus Kolmogorov signatures: Detecting modes of one-dimensional signals. Foundations of Computational Mathematics 17:1-33, 2017.
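
A simplified version of this pipeline for 1-D signals can be sketched with union-find: sweep the threshold downward, record the persistence of each component when it merges, and count components whose persistence exceeds a noise threshold. The `count_modes` helper and the threshold value are illustrative choices, not the method of [1]:

```python
import math
import numpy as np

# Count modes of a sampled 1-D signal via 0-dimensional persistence of
# superlevel sets: components are born at local maxima and die when two
# components merge; short-lived components are treated as noise.
def count_modes(values, tau):
    n = len(values)
    parent = list(range(n))
    birth = list(values)              # peak height of each component
    active = [False] * n
    persistences = []

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in sorted(range(n), key=lambda j: -values[j]):
        active[i] = True
        for j in (i - 1, i + 1):
            if 0 <= j < n and active[j]:
                ri, rj = find(i), find(j)
                if ri != rj:
                    # The younger component (lower peak) dies at this level.
                    lo, hi = sorted((ri, rj), key=lambda r: birth[r])
                    persistences.append(birth[lo] - values[i])
                    parent[lo] = hi
    persistences.append(max(values) - min(values))  # the surviving component
    return sum(p > tau for p in persistences)

rng = np.random.default_rng(0)
x = np.linspace(0, 3 * math.pi, 400)
f = np.sin(x) + 0.05 * rng.normal(size=x.size)  # two true modes plus noise
print(count_modes(f.tolist(), tau=0.5))
```

The thesis project would go further by making the count probabilistic, e.g. by modelling the unknown function with a Gaussian process rather than fixing a hard threshold `tau`.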

Advisors:  Pekka Parviainen ,  Nello Blaser

Causal Abstraction Learning

We naturally make sense of the world around us by working out causal relationships between objects and by representing in our minds these objects with different degrees of approximation and detail. Both processes are essential to our understanding of reality, and likely to be fundamental for developing artificial intelligence. The first process may be expressed using the formalism of structural causal models, while the second can be grounded in the theory of causal abstraction.        

This project will consider the problem of learning an abstraction between two given structural causal models. The primary goal will be the development of efficient algorithms able to learn a meaningful abstraction between the given causal models.

Advisor: Fabio Massimo Zennaro

Causal Bandits

"Multi-armed bandit" is an informal name for slot machines, and the formal name of a large class of problems where an agent has to choose an action among a range of possibilities without knowing the ensuing rewards. Multi-armed bandit problems are one of the most essential reinforcement learning problems where an agent is directly faced with an exploitation-exploration trade-off.

This project will consider a class of multi-armed bandits where an agent, upon taking an action, interacts with a causal system. The primary goal will be the development of learning strategies that take advantage of the underlying causal system in order to learn optimal policies in the shortest amount of time.
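
For reference, a standard (non-causal) baseline is the epsilon-greedy agent, sketched below on a three-armed Bernoulli bandit with made-up arm probabilities:

```python
import numpy as np

# Epsilon-greedy agent on a 3-armed Bernoulli bandit. This baseline
# ignores any causal structure; a causal bandit would exploit it.
rng = np.random.default_rng(0)
true_p = np.array([0.2, 0.5, 0.8])
counts = np.zeros(3)
values = np.zeros(3)          # running mean reward per arm
eps = 0.1

for t in range(2000):
    if rng.random() < eps:
        arm = int(rng.integers(3))      # explore
    else:
        arm = int(np.argmax(values))    # exploit
    reward = float(rng.random() < true_p[arm])
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]

print(counts, values)  # the best arm should accumulate most pulls
```

A causal variant would observe intermediate variables after each pull and use the causal graph to share information across arms, reducing the exploration cost.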

Causal Modelling for Battery Manufacturing

Lithium-ion batteries are poised to be one of the most important sources of energy in the near future. Yet, the process of manufacturing these batteries is very hard to model and control. Optimizing the different phases of production to maximize the lifetime of the batteries is a non-trivial challenge since physical models are limited in scope and collecting experimental data is extremely expensive and time-consuming.        

This project will consider the problem of aggregating and analyzing data regarding a few stages in the process of battery manufacturing. The primary goal will be the development of algorithms for transporting and integrating data collected in different contexts, as well as the use of explainable algorithms to interpret them.

Reinforcement Learning for Computer Security

The field of computer security presents a wide variety of challenging problems for artificial intelligence and autonomous agents. Guaranteeing the security of a system against attacks and penetrations by malicious hackers has always been a central concern of this field, and machine learning could now offer a substantial contribution. Security capture-the-flag simulations are particularly well-suited as a testbed for the application and development of reinforcement learning algorithms.

This project will consider the use of reinforcement learning for the preventive purpose of testing systems and discovering vulnerabilities before they can be exploited. The primary goal will be the modelling of capture-the-flag challenges of interest and the development of reinforcement learning algorithms that can solve them.

Approaches to AI Safety

The world and the Internet are more and more populated by artificial autonomous agents carrying out tasks on our behalf. Many of these agents are provided with an objective and they learn their behaviour trying to achieve their objective as best as they can. However, this approach cannot guarantee that an agent, while learning its behaviour, will not undertake actions that may have unforeseen and undesirable effects. Research in AI safety tries to design autonomous agents that will behave in a predictable and safe way.

This project will consider specific problems and novel solutions in the domain of AI safety and reinforcement learning. The primary goal will be the development of innovative algorithms and their implementation within established frameworks.

Reinforcement Learning for Super-modelling

Super-modelling [1] is a technique designed for combining complex dynamical models: pre-trained models are aggregated, with messages and information exchanged in order to synchronize the behavior of the different models and produce more accurate and reliable predictions. Super-models are used, for instance, in weather or climate science, where pre-existing models are ensembled together and their states dynamically aggregated to generate more realistic simulations.

This project will consider how reinforcement learning algorithms may be used to solve the coordination problem among the individual models forming a super-model. The primary goal will be the formulation of the super-modelling problem within the reinforcement learning framework and the study of custom RL algorithms to improve the overall performance of super-models.

[1] Schevenhoven, Francine, et al. "Supermodeling: improving predictions with an ensemble of interacting models." Bulletin of the American Meteorological Society 104.9 (2023): E1670-E1686.

Advisor: Fabio Massimo Zennaro ,  Francine Janneke Schevenhoven

The Topology of Flight Paths

Air traffic data tells us the position, direction, and speed of an aircraft at a given time. In other words, if we restrict our focus to a single aircraft, we are looking at a multivariate time-series. We can visualize the flight path as a curve above earth's surface quite geometrically. Topological data analysis (TDA) provides different methods for analysing the shape of data. Consequently, TDA may help us to extract meaningful features from the air traffic data. Although the typical flight path shapes may not be particularly intriguing, we can attempt to identify more intriguing patterns or “abnormal” manoeuvres, such as aborted landings, go-arounds, or diverts.

Advisor:  Odin Hoff Gardå , Nello Blaser

Automatic hyperparameter selection for isomap

Isomap is a non-linear dimensionality reduction method with two free hyperparameters (number of nearest neighbors and neighborhood radius). Different hyperparameters result in dramatically different embeddings. Previous methods for selecting hyperparameters focused on choosing one optimal hyperparameter. In this project, you will explore the use of persistent homology to find parameter ranges that result in stable embeddings. The project has theoretic and computational aspects.

Advisor: Nello Blaser

Validate persistent homology

Persistent homology is a generalization of hierarchical clustering to find more structure than just the clusters. Traditionally, hierarchical clustering has been evaluated using resampling methods and assessing stability properties. In this project you will generalize these resampling methods to develop novel stability properties that can be used to assess persistent homology. This project has theoretic and computational aspects.

Topological Anscombe's quartet

This topic is based on the classical Anscombe's quartet and families of point sets with identical 1D persistence ( https://arxiv.org/abs/2202.00577 ). The goal is to generate more interesting datasets using the simulated annealing methods presented in ( http://library.usc.edu.ph/ACM/CHI%202017/1proc/p1290.pdf ). This project is mostly computational.

Persistent homology vectorization with cycle location

There are many methods of vectorizing persistence diagrams, such as persistence landscapes, persistence images, PersLay and statistical summaries. Recently we have designed algorithms to in some cases efficiently detect the location of persistence cycles. In this project, you will vectorize not just the persistence diagram, but additional information such as the location of these cycles. This project is mostly computational with some theoretic aspects.

Divisive covers

Divisive covers are a divisive technique for generating filtered simplicial complexes. They originally used a naive way of dividing data into a cover. In this project, you will explore different methods of dividing space, based on principal component analysis, support vector machines, and k-means clustering. In addition, you will explore methods of using divisive covers for classification. This project will be mostly computational.

Learning Acquisition Functions for Cost-aware Bayesian Optimization

This is a follow-up project of an earlier Master thesis that developed a novel method for learning Acquisition Functions in Bayesian Optimization through the use of Reinforcement Learning. The goal of this project is to further generalize this method (more general input, learned cost-functions) and apply it to hyperparameter optimization for neural networks.

Advisors: Nello Blaser , Audun Ljone Henriksen

Stable updates

This is a follow-up project of an earlier Master thesis that introduced and studied empirical stability in the context of tree-based models. The goal of this project is to develop stable update methods for deep learning models. You will design several stable methods and empirically compare them (in terms of loss and stability) with a baseline and with one another.

Advisors:  Morten Blørstad , Nello Blaser

Multimodality in Bayesian neural network ensembles

One method to assess uncertainty in neural network predictions is to use dropout or noise generators at prediction time and run every prediction many times. This leads to a distribution of predictions. Informatively summarizing such probability distributions is a non-trivial task and the commonly used means and standard deviations result in the loss of crucial information, especially in the case of multimodal distributions with distinct likely outcomes. In this project, you will analyze such multimodal distributions with mixture models and develop ways to exploit such multimodality to improve training. This project can have theoretical, computational and applied aspects.
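
The failure of means and standard deviations on bimodal predictive distributions is easy to demonstrate on synthetic data; the two-Gaussian mixture below is illustrative:

```python
import numpy as np

# Two distinct likely outcomes: a bimodal "predictive distribution" built
# from two Gaussian components. The overall mean falls between the modes,
# where almost no probability mass actually lives.
rng = np.random.default_rng(0)
preds = np.concatenate([rng.normal(-3.0, 0.5, 5000),
                        rng.normal(3.0, 0.5, 5000)])

mean, std = preds.mean(), preds.std()
near_mean = float(np.mean(np.abs(preds - mean) < 1.0))  # mass near the mean
mode_lo = preds[preds < 0].mean()                       # crude 2-component split
mode_hi = preds[preds > 0].mean()
print(mean, std, near_mean, mode_lo, mode_hi)
```

A mixture-model summary (here the crude sign split; in the project, a fitted mixture) recovers both likely outcomes, which the mean-plus-standard-deviation summary hides entirely.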

Learning a hierarchical metric

Often, labels have defined relationships to each other, for instance in a hierarchical taxonomy. E.g., ImageNet labels are derived from the WordNet graph, and biological species are taxonomically related and can have similarities depending on life stage, sex, or other properties.

ArcFace is an alternative loss function that aims for an embedding that is more generally useful than softmax. It is commonly used in metric learning/few shot learning cases.

Here, we will develop a metric learning method that learns from data with hierarchical labels. Using multiple ArcFace heads, we will simultaneously learn to place representations to optimize the leaf label as well as intermediate labels on the path from leaf to root of the label tree. Using taxonomically classified plankton image data, we will measure performance as a function of ArcFace parameters (sharpness/temperature and margins -- class-wise or level-wise), and compare the results to existing methods.
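
The core ArcFace computation, adding an angular margin to the target class before temperature scaling, can be sketched in NumPy as follows; the `arcface_logits` helper and the margin and scale values are illustrative:

```python
import numpy as np

# ArcFace logits: cosine similarity between the L2-normalized embedding
# and class centers, with an additive angular margin m applied to the
# target class before scaling by s (the "sharpness"/temperature).
def arcface_logits(emb, centers, target, s=16.0, m=0.2):
    e = emb / np.linalg.norm(emb)
    c = centers / np.linalg.norm(centers, axis=1, keepdims=True)
    cos = c @ e                              # cosine to each class center
    theta = np.arccos(np.clip(cos, -1.0, 1.0))
    logits = cos.copy()
    logits[target] = np.cos(theta[target] + m)   # penalize the target angle
    return s * logits

rng = np.random.default_rng(0)
centers = rng.normal(size=(5, 8))
emb = centers[2] + 0.1 * rng.normal(size=8)      # embedding near class 2

plain = 16.0 * (centers / np.linalg.norm(centers, axis=1, keepdims=True)) \
        @ (emb / np.linalg.norm(emb))
with_margin = arcface_logits(emb, centers, target=2)
print(plain[2], with_margin[2])  # the margin lowers the target logit
```

The multi-head hierarchical variant proposed above would apply one such margin-based loss per level of the label tree, sharing the embedding.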

Advisor: Ketil Malde ( [email protected] )

Self-supervised object detection in video

One challenge with learning object detection is that in many scenes that stretch off into the distance, annotating small, far-off, or blurred objects is difficult. It is therefore desirable to learn from incompletely annotated scenes, and one-shot object detectors may suffer from incompletely annotated training data.

To address this, we will use a region-proposal algorithm (e.g. SelectiveSearch) to extract potential crops from each frame. Classification will be based on two approaches: a) training based on annotated fish vs random similarly-sized crops without annotations, and b) using a self-supervised method to build a representation for crops, and building a classifier for the extracted regions. The method will be evaluated against one-shot detectors and other training regimes.

If successful, the method will be applied to fish detection and tracking in videos from baited and unbaited underwater traps, and used to estimate abundance of various fish species.

See also: Benettino (2016): https://link.springer.com/chapter/10.1007/978-3-319-48881-3_56

Representation learning for object detection

While traditional classifiers work well with data that is labeled with disjoint classes and reasonably balanced class abundances, reality is often less clean. An alternative is to learn a vector space embedding that reflects semantic relationships between objects, and to derive classes from this representation. This is especially useful for few-shot classification (i.e., very few examples in the training data).

The task here is to extend a modern object detector (e.g. Yolo v8) to output an embedding of the identified object. Instead of a softmax classifier, we can learn the embedding either in a supervised manner (using annotations on frames) by attaching an ArcFace or other supervised metric learning head. Alternatively, the representation can be learned from tracked detections over time using e.g. a contrastive loss function to keep the representation for an object (approximately) constant over time. The performance of the resulting object detector will be measured on underwater videos, targeting species detection and/or individual recognition (re-ID).

Time-domain object detection

Object detectors for video are normally trained on still frames, but it is evident (from human experience) that using time domain information is more effective. I.e., it can be hard to identify far-off or occluded objects in still images, but movement in time often reveals them.

Here we will extend a state of the art object detector (e.g. yolo v8) with time domain data. Instead of using a single frame as input, the model will be modified to take a set of frames surrounding the annotated frame as input. Performance will be compared to using single-frame detection.

Large-scale visualization of acoustic data

The Institute of Marine Research has decades of acoustic data collected in various surveys. These data are in the process of being converted to data formats that can be processed and analyzed more easily using packages like Xarray and Dask.

The objective is to make these data more accessible to regular users by providing a visual front end. The user should be able to quickly zoom in and out, perform selection, export subsets, apply various filters and classifiers, and overlay annotations and other relevant auxiliary data.

Learning acoustic target classification from simulation

Broadband echosounders emit a complex signal that spans a large frequency band. Different targets will reflect, absorb, and generate resonance at different amplitudes and frequencies, and it is therefore possible to classify targets at much higher resolution and accuracy than before. Due to the complexity of the received signals, deriving effective profiles that can be used to identify targets is difficult.

Here we will use simulated frequency spectra from geometric objects with various shapes, orientation, and other properties. We will train ML models to estimate (recover) the geometric and material properties of objects based on these spectra. The resulting model will be applied to read broadband data, and compared to traditional classification methods.

Online learning in real-time systems

Build a model for the drilling process by using the Virtual simulator OpenLab ( https://openlab.app/ ) for real-time data generation and online learning techniques. The student will also do a short survey of existing online learning techniques and learn how to cope with errors and delays in the data.
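
A minimal stand-in for the online-learning part, assuming a simple linear model and one SGD update per arriving sample (the simulator itself is not included), might look like:

```python
import numpy as np

# Online learning sketch: one stochastic-gradient update per arriving
# sample, as a stand-in for a real-time drilling data stream.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0, 0.5])   # unknown process parameters
w = np.zeros(3)
lr = 0.05

for t in range(5000):
    x = rng.normal(size=3)                  # one streaming observation
    y = true_w @ x + 0.1 * rng.normal()     # noisy target
    err = w @ x - y
    w -= lr * err * x                       # single-sample gradient step

print(w)  # close to [2, -1, 0.5]
```

Real streams add the complications the topic mentions, delayed and erroneous samples, which is exactly where the surveyed online-learning techniques differ.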

Advisor: Rodica Mihai

Building a finite state automaton for the drilling process by using queries and counterexamples

Datasets will be generated by using the Virtual simulator OpenLab ( https://openlab.app/ ). The student will study the datasets and decide upon a good setting to extract a finite state automaton for the drilling process. The student will also do a short survey of existing techniques for extracting finite state automata from process data. One relevant approach (described on arxiv.org) uses exact learning and abstraction to extract a deterministic finite automaton describing the state dynamics of a given trained RNN, with Angluin's L* algorithm as the learner and the trained RNN as the oracle; it efficiently extracts accurate automata even when the state vectors are large and require fine differentiation.

Scaling Laws for Language Models in Generative AI

Large Language Models (LLM) power today's most prominent language technologies in Generative AI like ChatGPT, which, in turn, are changing the way that people access information and solve tasks of many kinds.

A recent interest in scaling laws for LLMs has shown trends in how well they perform in terms of factors such as how much training data is used, how powerful the models are, or how much computational cost is allocated. (See, for example, Kaplan et al., "Scaling Laws for Neural Language Models", 2020.)

In this project, the task is to study scaling laws for different language models with respect to one or multiple modeling factors.
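
A common starting point is fitting a power-law curve L(N) = a * N^(-b) to observed (model size, loss) pairs by linear regression in log-log space; the data points below are synthetic:

```python
import numpy as np

# Fit a power-law scaling curve L(N) = a * N**(-b) by linear regression
# in log-log space. Model sizes and losses here are synthetic.
rng = np.random.default_rng(0)
N = np.array([1e6, 1e7, 1e8, 1e9, 1e10])        # model sizes (parameters)
true_a, true_b = 10.0, 0.07
L = true_a * N ** (-true_b) * np.exp(0.01 * rng.normal(size=N.size))

slope, intercept = np.polyfit(np.log(N), np.log(L), 1)
a_hat, b_hat = np.exp(intercept), -slope
print(a_hat, b_hat)  # close to 10 and 0.07
```

Straight lines in log-log coordinates are the signature of the power-law trends reported by Kaplan et al.; the project would repeat such fits across data, parameter, and compute budgets.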

Advisor: Dario Garigliotti

Applications of causal inference methods to omics data

Many hard problems in machine learning are directly linked to causality [1]. The graphical causal inference framework developed by Judea Pearl can be traced back to pioneering work by Sewall Wright on path analysis in genetics and has inspired research in artificial intelligence (AI) [1].

The Michoel group has developed the open-source tool Findr [2] which provides efficient implementations of mediation and instrumental variable methods for applications to large sets of omics data (genomics, transcriptomics, etc.). Findr works well on a recent data set for yeast [3].

We encourage students to explore promising connections between the fields of causal inference and machine learning. Feel free to contact us to discuss projects related to causal inference. Possible topics include: a) improving methods based on structural causal models, b) evaluating causal inference methods on data for model organisms, c) comparing methods based on causal models and neural network approaches.
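
The instrumental-variable idea implemented in Findr can be illustrated with the simplest estimator of that family, the Wald/IV ratio, on synthetic data with a hidden confounder (this sketch is not Findr's algorithm):

```python
import numpy as np

# Instrumental-variable estimation on synthetic data: a hidden
# confounder u biases the naive regression of y on x, while the
# instrument z (think: a genetic variant) recovers the true causal
# effect beta = 2.
rng = np.random.default_rng(0)
n = 100_000
z = rng.normal(size=n)                  # instrument
u = rng.normal(size=n)                  # hidden confounder
x = z + u + 0.5 * rng.normal(size=n)
beta = 2.0
y = beta * x + 3.0 * u + 0.5 * rng.normal(size=n)

ols = np.cov(x, y)[0, 1] / np.var(x)             # biased by the confounder
iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]     # Wald/IV estimator
print(ols, iv)
```

Because z affects y only through x and is independent of u, the ratio of covariances isolates the causal effect, which is the same logic genomic instruments exploit at scale.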

References:

1. Schölkopf B, Causality for Machine Learning, arXiv (2019):  https://arxiv.org/abs/1911.10500

2. Wang L and Michoel T. Efficient and accurate causal inference with hidden confounders from genome-transcriptome variation data. PLoS Computational Biology 13:e1005703 (2017).  https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005703

3. Ludl A and and Michoel T. Comparison between instrumental variable and mediation-based methods for reconstructing causal gene networks in yeast. arXiv:2010.07417  https://arxiv.org/abs/2010.07417

Advisors: Adriaan Ludl ,  Tom Michoel

Space-Time Linkage of Fish Distribution to Environmental Conditions

Conditions in the marine environment, such as temperature and currents, influence the spatial distribution and migration patterns of marine species. Hence, understanding the link between environmental factors and fish behavior is crucial for predicting, e.g., how fish populations may respond to climate change. Deriving this link is challenging because it requires analysis of two types of datasets: (i) large environmental datasets (currents, temperature) that vary in space and time, and (ii) sparse and sporadic spatial observations of fish populations.

Project goal   

The primary goal of the project is to develop a methodology that helps predict how spatial distribution of two fish stocks (capelin and mackerel) change in response to variability in the physical marine environment (ocean currents and temperature).  The information can also be used to optimize data collection by minimizing time spent in spatial sampling of the populations.

The project will focus on the use of machine learning and/or causal inference algorithms.  As a first step, we use synthetic (fish and environmental) data from analytic models that couple the two data sources.  Because the ‘truth’ is known, we can judge the efficiency and error margins of the methodologies. We then apply the methodologies to real world (empirical) observations.

Advisors: Tom Michoel, Sam Subbey.

Towards precision medicine for cancer patient stratification

On average, a drug or a treatment is effective in only about half of patients who take it. This means patients need to try several until they find one that is effective at the cost of side effects associated with every treatment. The ultimate goal of precision medicine is to provide a treatment best suited for every individual. Sequencing technologies have now made genomics data available in abundance to be used towards this goal.

In this project we will specifically focus on cancer. Most cancer patients get a particular treatment based on the cancer type and the stage, though different individuals will react differently to a treatment. It is now well established that genetic mutations cause cancer growth and spreading, and importantly, these mutations differ between individual patients. The aim of this project is to use genomic data for better stratification of cancer patients, to predict the treatment most likely to work. Specifically, the project will use machine learning approaches to integrate genomic data and build a classifier for stratification of cancer patients.
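
A minimal sketch of the classification step, assuming logistic regression on synthetic binary mutation profiles (no real genomic data), could look like:

```python
import numpy as np

# Patient stratification sketch: logistic regression trained by gradient
# descent on synthetic binary mutation profiles. Data and weights are
# simulated; a real project would integrate actual genomic features.
rng = np.random.default_rng(0)
n, d = 500, 20
X = rng.integers(0, 2, size=(n, d)).astype(float)   # mutation present/absent
w_true = rng.normal(size=d)
p = 1 / (1 + np.exp(-(X @ w_true)))
y = (rng.random(n) < p).astype(float)               # responds to treatment?

w = np.zeros(d)
b = 0.0
lr = 0.1
for _ in range(2000):
    pred = 1 / (1 + np.exp(-(X @ w + b)))
    w -= lr * (X.T @ (pred - y) / n)
    b -= lr * float(np.mean(pred - y))

pred = 1 / (1 + np.exp(-(X @ w + b)))
acc = float(np.mean((pred > 0.5) == (y > 0.5)))
print(acc)  # should beat a majority-class baseline
```

In practice the interesting part is upstream of this step: integrating heterogeneous genomic data types into the feature matrix and validating on held-out patients.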

Advisor: Anagha Joshi

Unraveling gene regulation from single cell data

Multi-cellularity is achieved by precise control of gene expression during development and differentiation, and aberrations of this process lead to disease. A key regulatory process in gene regulation is at the transcriptional level, where epigenetic and transcriptional regulators control the spatial and temporal expression of the target genes in response to environmental, developmental, and physiological cues obtained from a signalling cascade. The rapid advances in sequencing technology have now made it feasible to study this process by understanding the genome-wide patterns of diverse epigenetic and transcription factors, as well as at a single-cell level.

Single-cell RNA sequencing is highly valuable, particularly in cancer, as it allows exploration of heterogeneous tumor samples; this heterogeneity obstructs therapeutic targeting and leads to poor survival. Despite huge clinical relevance and potential, analysis of single-cell RNA-seq data is challenging. In this project, we will develop strategies to infer gene regulatory networks using network inference approaches (both supervised and unsupervised). These will primarily be tested on single-cell datasets in the context of cancer.
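A minimal sketch of unsupervised network inference, in the spirit of regression-based methods such as GENIE3 (the expression matrix and the single planted regulatory edge are synthetic assumptions, and absolute correlation stands in for learned regression importances):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic single-cell expression: 300 cells x 5 genes,
# with one planted edge: gene 0 regulates gene 3.
n_cells = 300
expr = rng.normal(size=(n_cells, 5))
expr[:, 3] = 0.9 * expr[:, 0] + 0.1 * rng.normal(size=n_cells)

def infer_edges(expr):
    """Rank putative regulator->target edges by absolute correlation,
    returning the best-scoring regulator for each target gene."""
    n_genes = expr.shape[1]
    corr = np.corrcoef(expr.T)
    edges = []
    for target in range(n_genes):
        scores = [(abs(corr[reg, target]), reg)
                  for reg in range(n_genes) if reg != target]
        edges.append((max(scores)[1], target))
    return edges

edges = infer_edges(expr)
# The strongest predicted regulator of gene 3 should be the planted gene 0.
```

Real single-cell data would additionally require handling dropout, normalization, and far larger gene sets, and supervised variants would train on known regulator-target pairs.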

Developing a Stress Granule Classifier

To carry out the multitude of functions 'expected' from a human cell, the cell employs a strategy of division of labour, whereby sub-cellular organelles carry out distinct functions. Thus we traditionally understand organelles as distinct units defined both functionally and physically, with a distinct shape and size range. More recently, a new class of organelles has been discovered that are assembled and dissolved on demand and are composed of liquid droplets or 'granules'. Granules show many properties characteristic of liquids, such as flow and wetting, but they can also assume many shapes and indeed also fluctuate in shape. One such liquid organelle is a stress granule (SG).

Stress granules are pro-survival organelles that assemble in response to cellular stress and important in cancer and neurodegenerative diseases like Alzheimer's. They are liquid or gel-like and can assume varying sizes and shapes depending on their cellular composition. 

In a given experiment we are able to image the entire cell over a time series of 1000 frames; from which we extract a rough estimation of the size and shape of each granule. Our current method is susceptible to noise and a granule may be falsely rejected if the boundary is drawn poorly in a small majority of frames. Ideally, we would also like to identify potentially interesting features, such as voids, in the accepted granules.

We are interested in applying a machine learning approach to develop a descriptor for a 'classic' granule and furthermore classify them into different functional groups based on disease status of the cell. This method would be applied across thousands of granules imaged from control and disease cells. We are a multi-disciplinary group consisting of biologists, computational scientists and physicists. 

Advisors: Sushma Grellscheid, Carl Jones

Machine Learning based Hyperheuristic algorithm

Develop a machine-learning-based hyper-heuristic algorithm to solve a pickup and delivery problem. A hyper-heuristic is a heuristic that chooses heuristics automatically. Hyper-heuristics seek to automate the process of selecting, combining, generating or adapting several simpler heuristics to efficiently solve computational search problems [Handbook of Metaheuristics]. There might be multiple heuristics for solving a problem, each with its own strengths and weaknesses. In this project, we want to use machine-learning techniques to learn the strengths and weaknesses of each heuristic while using them in an iterative search for high-quality solutions, and then apply them intelligently for the rest of the search. As new information is gathered during the search, the hyper-heuristic algorithm automatically adjusts the heuristics.
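A toy sketch of the selection mechanism (the objective, the two move heuristics, and the epsilon-greedy scoring scheme are illustrative assumptions, not the pickup-and-delivery setting):

```python
import random

random.seed(0)

# Toy search problem: minimize f(x) over the integers.
def f(x):
    return (x - 17) ** 2

# Two simple low-level heuristics (neighbourhood moves) with different strengths.
heuristics = {
    "small_step": lambda x: x + random.choice([-1, 1]),
    "big_jump": lambda x: x + random.choice([-10, 10]),
}

# Hyper-heuristic: keep a running score per heuristic, pick epsilon-greedily,
# and update the scores from the observed improvement.
scores = {name: 0.0 for name in heuristics}
x = 0
best = f(x)
for _ in range(500):
    if random.random() < 0.2:                  # explore a random heuristic
        name = random.choice(list(heuristics))
    else:                                      # exploit the best-scoring one
        name = max(scores, key=scores.get)
    cand = heuristics[name](x)
    improvement = f(x) - f(cand)
    scores[name] = 0.9 * scores[name] + 0.1 * improvement  # learn its strength
    if improvement > 0:                        # accept improving moves only
        x = cand
        best = f(x)
```

In the actual project the scoring rule would be replaced by a learned model, and the low-level heuristics by pickup-and-delivery operators.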

Advisor: Ahmad Hemmati

Machine learning for solving satisfiability problems and applications in cryptanalysis

Advisor: Igor Semaev

Hybrid modeling approaches for well drilling with Sintef

Several topics are available.

"Flow models" are first-principles models simulating the flow, temperature and pressure in a well being drilled. Our project is exploring "hybrid approaches" where these models are combined with machine learning models that either learn from time series data from flow model runs or from real-world measurements during drilling. The goal is to better detect drilling problems such as hole cleaning, make more accurate predictions and correctly learn from and interpret real-word data.

A "surrogate model" is an ML model which learns to mimic the flow model by learning from the model's inputs and outputs. Use cases for surrogate models include making predictions where speed is favoured over accuracy, and exploration of the parameter space.
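A minimal illustration of the surrogate idea (the quadratic "flow model" and the polynomial least-squares fit below stand in for the real simulator and a neural network):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for an expensive flow model: pressure as a function of flow rate.
def flow_model(q):
    return 2.0 * q**2 + 3.0 * q + 1.0

# Collect (input, output) pairs from model runs ...
q_train = rng.uniform(0.0, 5.0, size=100)
p_train = flow_model(q_train)

# ... and fit a cheap surrogate on them (a quadratic fit here,
# standing in for a neural network).
coeffs = np.polyfit(q_train, p_train, deg=2)
surrogate = np.poly1d(coeffs)

# The surrogate now answers queries much faster than the "real" model.
err = abs(surrogate(2.5) - flow_model(2.5))
```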

Surrogate models with active Learning

While it is possible to produce a nearly unlimited amount of training data by running the flow model, the surrogate model may still perform poorly if it lacks training data in the part of the parameter space it operates in or if it "forgets" areas of the parameter space by being fed too much data from a narrow range of parameters.

The goal of this thesis is to build a surrogate model (with any architecture) for some restricted parameter range and implement an active learning approach where the ML requests more model runs from the flow model in the parts of the parameter space where it is needed the most. The end result should be a surrogate model that is quick and performs acceptably well over the whole defined parameter range.
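The loop described above can be sketched as follows (the one-dimensional "flow model", the bootstrap-ensemble surrogate, and variance-based acquisition are illustrative assumptions, not the thesis architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

def flow_model(q):                        # stand-in for the expensive simulator
    return np.sin(q) + 0.1 * q**2

# Ensemble of simple surrogates (polynomial fits on bootstrap resamples);
# disagreement between members signals where more flow-model runs are needed.
def fit_ensemble(q, p, n_members=5, deg=4):
    members = []
    for _ in range(n_members):
        idx = rng.integers(0, len(q), size=len(q))
        members.append(np.poly1d(np.polyfit(q[idx], p[idx], deg)))
    return members

q_pool = np.linspace(0.0, 6.0, 61)        # candidate query points
q_lab = rng.uniform(0.0, 6.0, size=8)     # initial labelled model runs
p_lab = flow_model(q_lab)

for _ in range(10):                        # active-learning loop
    members = fit_ensemble(q_lab, p_lab)
    preds = np.stack([m(q_pool) for m in members])
    variance = preds.var(axis=0)
    q_new = q_pool[variance.argmax()]      # query where the ensemble disagrees most
    q_lab = np.append(q_lab, q_new)
    p_lab = np.append(p_lab, flow_model(q_new))
```

The same acquisition idea carries over to any surrogate architecture that can report predictive disagreement or uncertainty.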

Surrogate models trained via adversarial learning

How best to train surrogate models from runs of the flow model is an open question. This master thesis would use an adversarial learning approach to build a surrogate model whose output becomes indistinguishable, to its "adversary" (a discriminator), from the output of an actual flow model run.

GPU-based Surrogate models for parameter search

While single-core CPU clock frequency largely stalled 20 years ago, multi-core CPUs and especially GPUs took off and delivered increases in computational power by parallelizing computations.

Modern machine learning, such as deep learning, takes advantage of this boom in computing power by running on GPUs.

The SINTEF flow models, in contrast, are programs that run on a CPU and do not utilize multi-core functionality. A model run advances time step by time step, and each time step relies on the results from the previous one. The flow models are therefore fundamentally sequential and not well suited to massive parallelization.

It is however of interest to run different model runs in parallel, to explore parameter spaces. The use cases for this includes model calibration, problem detection and hypothesis generation and testing.

The task of this thesis is to implement an ML-based surrogate model in such a way that many surrogate model outputs can be produced at the same time using a single GPU. This will likely entail some trade-off with model size and perhaps some coding tricks.
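The batching idea can be sketched with NumPy matrix products (on a GPU the same code pattern, written in e.g. PyTorch or JAX, maps onto a few kernel launches; the network weights below are random placeholders, not a trained surrogate):

```python
import numpy as np

rng = np.random.default_rng(0)

# A surrogate that is a small fully-connected network; evaluating it as a
# batched matrix product lets one device produce many model outputs at once.
W1, b1 = rng.normal(size=(3, 16)), np.zeros(16)   # illustrative weights
W2, b2 = rng.normal(size=(16, 1)), np.zeros(1)

def surrogate_batch(params):
    """params: (n_runs, 3) array, one flow-model parameter set per row."""
    h = np.tanh(params @ W1 + b1)
    return h @ W2 + b2                             # (n_runs, 1) predictions

# 10,000 parameter combinations evaluated in a single vectorized call --
# exactly the operation that parallelizes trivially on a GPU.
grid = rng.uniform(-1, 1, size=(10_000, 3))
preds = surrogate_batch(grid)
```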

Uncertainty estimates of hybrid predictions (Lots of room for creativity, might need to steer it more, needs good background literature)

When using predictions from an ML model trained on time series data, it is useful to know whether they are accurate and should be trusted. The student is challenged to develop hybrid approaches that incorporate estimates of uncertainty. Components could include reporting the variance of ML ensembles trained on a diversity of time series data, implementation of conformal predictions, analysis of training-data parameter ranges versus the current input, etc. The output should be a "traffic light" signal roughly indicating the accuracy of the predictions.
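One of the listed components, split conformal prediction, can be sketched as follows (the calibration data, the width thresholds, and the mapping to a traffic light are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Held-out calibration set: true values and (synthetic) model predictions.
y_cal = rng.normal(size=200)
pred_cal = y_cal + rng.normal(scale=0.5, size=200)

# Split conformal prediction: the (1 - alpha) quantile of calibration
# residuals gives a margin with a finite-sample coverage guarantee.
alpha = 0.1
margin = float(np.quantile(np.abs(y_cal - pred_cal), 1 - alpha))

def traffic_light(interval_width, tol=1.0):
    """Map the conformal interval width to a rough trust signal."""
    if interval_width < tol:
        return "green"
    if interval_width < 2 * tol:
        return "yellow"
    return "red"

# A live prediction p would then be reported as [p - margin, p + margin]
# together with the signal below.
signal = traffic_light(2 * margin)
```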

Transfer learning approaches

We assume an ML model is to be used for time series prediction.

It is possible to train an ML model on a wide range of scenarios in the flow models, but we expect that, to perform well, the model also needs to see model runs representative of the type of well and drilling operation it will be used in. In this thesis the student implements a transfer learning approach, where the model is trained on general model runs and fine-tuned on a more representative data set.

(Bonus1: implementing one-shot learning, Bonus2: Using real-world data in the fine-tuning stage)
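A minimal sketch of the pretrain-then-fine-tune idea (linear models and plain gradient descent stand in for the real network, and all data is synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)

def gd_fit(X, y, w, lr=0.1, steps=200):
    """Plain gradient descent on squared error, starting from w."""
    for _ in range(steps):
        w = w - lr * X.T @ (X @ w - y) / len(y)
    return w

# "General" model runs: broad relationship y = 2*x0 + 1*x1, plenty of data.
X_gen = rng.normal(size=(500, 2))
y_gen = X_gen @ np.array([2.0, 1.0])

# "Representative" well: slightly shifted relationship, little data.
X_rep = rng.normal(size=(20, 2))
y_rep = X_rep @ np.array([2.3, 0.8])

w0 = np.zeros(2)
w_pre = gd_fit(X_gen, y_gen, w0)                  # pre-train on general runs
w_ft = gd_fit(X_rep, y_rep, w_pre, steps=30)      # fine-tune on the target well

err_pre = np.mean((X_rep @ w_pre - y_rep) ** 2)   # before fine-tuning
err_ft = np.mean((X_rep @ w_ft - y_rep) ** 2)     # after fine-tuning
```

With a neural network, the same recipe typically freezes early layers and fine-tunes the rest at a reduced learning rate.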

ML capable of reframing situations

When a human oversees an operation like well drilling, she has a mental model of the situation and new data such as pressure readings from the well is interpreted in light of this model. This is referred to as "framing" and is the normal mode of work. However, when a problem occurs, it becomes harder to reconcile the data with the mental model. The human then goes into "reframing", building a new mental model that includes the ongoing problem. This can be seen as a process of hypothesis generation and testing.

A computer model however, lacks re-framing. A flow model will keep making predictions under the assumption of no problems and a separate alarm system will use the deviation between the model predictions and reality to raise an alarm. This is in a sense how all alarm systems work, but it means that the human must discard the computer model as a tool at the same time as she's handling a crisis.

The student is given access to a flow model and a surrogate model which can learn from model runs both with and without hole cleaning and is challenged to develop a hybrid approach where the ML+flow model continuously performs hypothesis generation and testing and is able to "switch" into predictions of  a hole cleaning problem and different remediations of this.

Advisor: Philippe Nivlet at Sintef together with advisor from UiB

Explainable AI at Equinor

The project Machine Teaching for XAI (see https://xai.w.uib.no ) offers a master thesis in collaboration between UiB and Equinor.

Advisor: One of Pekka Parviainen/Jan Arne Telle/Emmanuel Arrighi + Bjarte Johansen from Equinor.

Explainable AI at Eviny

The project Machine Teaching for XAI (see https://xai.w.uib.no ) offers a master thesis in collaboration between UiB and Eviny.

Advisor: One of Pekka Parviainen/Jan Arne Telle/Emmanuel Arrighi + Kristian Flikka from Eviny.

If you want to suggest your own topic, please contact Pekka Parviainen, Fabio Massimo Zennaro or Nello Blaser.



Currently Open Theses Topics

We offer these current topics directly for Bachelor and Master students at TU Darmstadt, who are welcome to DIRECTLY contact the thesis advisor if interested in one of these topics. Excellent external students from other universities may be accepted but are required to first email Jan Peters before contacting any other lab member for a thesis topic. Note that we cannot provide funding for any of these thesis projects.

We highly recommend that you take either our robotics and machine learning lectures ( Robot Learning , Statistical Machine Learning ) or our colleagues' lectures ( Grundlagen der Robotik , Probabilistic Graphical Models and/or Deep Learning). Even more important to us is that you take both Robot Learning: Integrated Project, Part 1 (Literature Review and Simulation Studies) and Part 2 (Evaluation and Submission to a Conference) before doing a thesis with us.

In addition, we are usually happy to devise new topics on request to suit the abilities of excellent students. Please DIRECTLY contact the thesis advisor if you are interested in one of these topics. When you contact the advisor, it would be nice if you could mention (1) WHY you are interested in the topic (dreams, parts of the problem, etc), and (2) WHAT makes you special for the projects (e.g., class work, project experience, special programming or math skills, prior work, etc.). Supplementary materials (CV, grades, etc) are highly appreciated. Of course, such materials are not mandatory but they help the advisor to see whether the topic is too easy, just about right or too hard for you.

Only contact *ONE* potential advisor at a time! If you contact a second one without first concluding discussions with the first advisor (i.e., deciding for or against the thesis with her or him), we may not consider you at all. Only if you are super excited about at most two topics should you send an email to both supervisors, so that they are aware of the additional interest.

FOR FB16+FB18 STUDENTS: If you are a student from another dept at TU Darmstadt (e.g., ME, EE, IST), you need an additional formal supervisor who officially issues the topic. Please do not try to arrange your home dept advisor yourself; let the supervising IAS member get in touch with that person instead. Multiple professors from other depts have complained that they were asked to co-supervise before being contacted by our advising lab member.

NEW THESES START HERE

Imitation Learning for High-Speed Robot Air Hockey

Scope: Master thesis Advisor: Puze Liu and Julen Urain De Jesus Start: ASAP Topic:

High-speed reactive motion is one of the fundamental capabilities robots need to achieve human-level behavior. Optimization-based methods struggle to meet real-time requirements when the problem is non-convex and contains constraints. Reinforcement learning requires extensive reward engineering to achieve the desired performance. Imitation learning, on the other hand, gathers human knowledge directly from data collection and enables robots to learn natural movements efficiently. In this thesis, we explore how imitation learning can be performed in a complex robot air hockey task. The robot needs to learn not only low-level skills, but also high-level tactics from human demonstrations.

Requirements

  • Strong Python programming skills
  • Knowledge in Machine Learning / Supervised Learning
  • Good Knowledge in Robotics
  • Experience with deep learning libraries is a plus

References: [1] Chi, Cheng, et al. "Diffusion policy: Visuomotor policy learning via action diffusion." arXiv preprint arXiv:2303.04137 (2023). [2] Liu, Puze, et al. "Robot reinforcement learning on the constraint manifold." Conference on Robot Learning. PMLR (2022). [3] Pan, Yunpeng, et al. "Imitation learning for agile autonomous driving." The International Journal of Robotics Research 39.2-3 (2020). Interested students can apply by sending an e-mail to [email protected] and attaching the required documents mentioned above.

Walk your network: investigating neural network’s location in Q-learning methods.

Scope: Master thesis Advisor: Theo Vincent and Boris Belousov Start: Flexible Topic:

Q-learning methods are at the heart of Reinforcement Learning. They have been shown to outperform humans on some complex tasks such as playing video games [1]. In robotics, where the action space is in most cases continuous, actor-critic methods rely on Q-learning to learn the critic [2]. Although Q-learning methods have been extensively studied in the past, little focus has been placed on the way the online neural network explores the space of Q functions. Most approaches focus on crafting a loss that makes the agent learn better policies [3]. Here, we offer a thesis that focuses on the position of the online Q neural network in the space of Q functions. The student will first investigate this idea on simple problems before comparing the performance against strong baselines such as DQN or REM [1, 4] on Atari games. Depending on the results, the student might also move to MuJoCo and compare against SAC [2]. The student is welcome to propose ideas of their own as well.
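For orientation, the tabular version of the Q-learning update that these deep methods build on can be sketched on a toy chain MDP (the environment and hyperparameters are illustrative, far simpler than the Atari/MuJoCo setting):

```python
import random

random.seed(0)

# Tiny deterministic chain MDP: states 0..4, actions left/right,
# reward 1 for reaching the right end.
N, GOAL = 5, 4
Q = [[0.0, 0.0] for _ in range(N)]       # Q[state][action], action 1 = right
alpha, gamma, eps = 0.5, 0.9, 0.3

def step(s, a):
    s2 = max(0, min(N - 1, s + (1 if a == 1 else -1)))
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

for _ in range(500):                      # episodes with random start states
    s = random.randrange(N - 1)
    for _ in range(20):
        # epsilon-greedy action selection
        a = random.randrange(2) if random.random() < eps else int(Q[s][1] > Q[s][0])
        s2, r, done = step(s, a)
        # Core Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a').
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2
        if done:
            break
```

In DQN and its variants the table is replaced by the online neural network, whose trajectory through the space of Q functions is exactly what this thesis would study.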

Highly motivated students can apply by sending an email to [email protected] . Please attach your CV and clearly state why you are interested in this topic.

  • Knowledge in Reinforcement Learning

References [1] Mnih, Volodymyr, et al. "Human-level control through deep reinforcement learning." nature 518.7540 (2015): 529-533. [2] Haarnoja, Tuomas, et al. "Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor." International conference on machine learning. PMLR, 2018. [3] Hessel, Matteo, et al. "Rainbow: Combining improvements in deep reinforcement learning." Proceedings of the AAAI conference on artificial intelligence. Vol. 32. No. 1. 2018. [4] Agarwal, R., Schuurmans, D. & Norouzi, M.. (2020). An Optimistic Perspective on Offline Reinforcement Learning International Conference on Machine Learning (ICML).

Co-optimizing Hand and Action for Robotic Grasping of Deformable objects


This project aims to advance deformable object manipulation by co-optimizing robot gripper morphology and control policies. The project will involve utilizing existing simulation environments for deformable object manipulation [2] and implementing a method to jointly optimize gripper morphology and grasp policies within the simulation.

Required Qualification:

  • Familiarity with deep learning libraries such as PyTorch or Tensorflow

Preferred Qualification:

  • Attendance of the lectures "Statistical Machine Learning", "Computational Engineering and Robotics" and "Robot Learning"

Application Requirements:

  • Curriculum Vitae
  • Motivation letter explaining why you would like to work on this topic and why you are the perfect candidate

Interested students can apply by sending an e-mail to [email protected] and attaching the required documents mentioned above.

References: [1] Xu, Jie, et al. "An End-to-End Differentiable Framework for Contact-Aware Robot Design." Robotics: Science & Systems. 2021. [2] Huang, Isabella, et al. "DefGraspNets: Grasp Planning on 3D Fields with Graph Neural Nets." arXiv preprint arXiv:2303.16138 (2023).

Geometry-Aware Diffusion Models for Robotics

In this thesis, you will work on developing an imitation learning algorithm using diffusion models for robotic manipulation tasks, such as the ones in [2, 3, 4], but taking into account the geometry of the task space.

If this sounds interesting, please send an email to [email protected] and [email protected] , and possibly attach your CV, highlighting the relevant courses you took in robotics and machine learning.

What's in it for you:

  • You get to work on an exciting topic at the intersection of deep-learning and robotics
  • We will supervise you closely throughout your thesis
  • Depending on the results, we will aim for an international conference publication

Requirements:

  • Be motivated -- we will support you a lot, but we expect you to contribute a lot too
  • Robotics knowledge
  • Experience setting up deep learning pipelines -- from data collection, architecture design, training, and evaluation
  • PyTorch -- especially experience writing good parallelizable code (i.e., runs fast in the GPU)

References: [1] https://arxiv.org/abs/2112.10752 [2] https://arxiv.org/abs/2308.01557 [3] https://arxiv.org/abs/2209.03855 [4] https://arxiv.org/abs/2303.04137 [5] https://arxiv.org/abs/2205.09991

Learning Latent Representations for Embodied Agents


Interested students can apply by sending an E-Mail to [email protected] and attaching the required documents mentioned below.

  • Experience with TensorFlow/PyTorch
  • Familiarity with core Machine Learning topics
  • Experience programming/controlling robots (either simulated or real world)
  • Knowledgeable about different robot platforms (quadrupeds and bipedal robots)
  • Resume / CV
  • Cover letter explaining why this topic fits you well and why you are an ideal candidate

References: [1] Ho and Ermon. "Generative adversarial imitation learning" [2] Arenz, et al. "Efficient Gradient-Free Variational Inference using Policy Search"

Characterizing Fear-induced Adaptation of Balance by Inverse Reinforcement Learning


Interested students can apply by sending an E-Mail to [email protected] and attaching the required documents mentioned below.

  • Basic knowledge of reinforcement learning
  • Hand-on experience with reinforcement learning or inverse reinforcement learning
  • Cognitive science background

References: [1] Maki, et al. "Fear of Falling and Postural Performance in the Elderly" [2] Davis et al. "The relationship between fear of falling and human postural control" [3] Ho and Ermon. "Generative adversarial imitation learning"

Timing is Key: CPGs for regularizing Quadruped Gaits learned with DRL

To tackle this problem we want to utilize Central Pattern Generators (CPGs), which can generate timings for ground contacts for the four feet. The policy gets rewarded for complying with the contact patterns of the CPGs. This leads to a straightforward way of regularizing and steering the policy to a natural gait without posing too strong restrictions on it. We first want to manually find fitting CPG parameters for different gait velocities and later move to learning those parameters in an end-to-end fashion.
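The contact-timing idea can be sketched with the simplest possible CPG, fixed-frequency phase oscillators with gait-specific offsets (the trot offsets and the stance/swing split below are illustrative assumptions):

```python
# Minimal CPG sketch: four phase oscillators, one per foot, with fixed
# phase offsets producing a trot pattern (diagonal leg pairs in phase).
# A foot is in stance (ground contact) during the first half of its cycle.
offsets = {"FL": 0.0, "FR": 0.5, "HL": 0.5, "HR": 0.0}

def contact_pattern(t, freq=2.0):
    """Desired ground-contact flags at time t for gait frequency freq (Hz)."""
    return {leg: ((freq * t + off) % 1.0) < 0.5 for leg, off in offsets.items()}

pattern = contact_pattern(0.1)
# At t = 0.1 s the diagonal pair FL/HR is in stance while FR/HL swings.
# An RL reward term can then count how many feet match this pattern.
```

Learning the offsets and frequency end-to-end, as proposed above, would make these constants trainable parameters instead.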

Highly motivated students can apply by sending an E-Mail to [email protected] and attaching the required documents mentioned below.

Minimum Qualification:

  • Good Python programming skills
  • Good knowledge of the PyTorch library
  • Basic knowledge of Reinforcement Learning
  • Basic knowledge of the MuJoCo simulator

References: [1] Cheng, Xuxin, et al. "Extreme Parkour with Legged Robots."

Damage-aware Reinforcement Learning for Deformable and Fragile Objects


The goal of this thesis will be the development and application of a model-based reinforcement learning method on real robots. Your tasks will include: 1. Setting up a simulation environment for deformable object manipulation 2. Utilizing existing models for stress and deformability prediction [1] 3. Implementing a reinforcement learning method that works in simulation and, if possible, on the real robot.

If you are interested in this thesis topic and believe you possess the necessary skills and qualifications, please submit your application, including a resume and a brief motivation letter explaining your interest and relevant experience. Please send your application to [email protected].

Required Qualification :

  • Enthusiasm for and experience in robotics, machine learning, and simulation
  • Strong programming skills in Python

Desired Qualification :

  • Attendance of the lectures "Statistical Machine Learning", "Computational Engineering and Robotics" and (optionally) "Robot Learning"

References: [1] Huang, I., Narang, Y., Bajcsy, R., Ramos, F., Hermans, T., & Fox, D. (2023). DefGraspNets: Grasp Planning on 3D Fields with Graph Neural Nets. arXiv preprint arXiv:2303.16138.

Imitation Learning meets Diffusion Models for Robotics


The objective of this thesis is to build upon prior research [2, 3] to establish a connection between Diffusion Models and Imitation Learning. We aim to explore how to exploit Diffusion Models and improve the performance of Imitation learning algorithms that interact with the world.

We welcome highly motivated students to apply for this opportunity by sending an email expressing their interest to Firas Al-Hafez ( [email protected] ) and Julen Urain ( [email protected] ). Please attach your letter of motivation and CV, and clearly state why you are interested in this topic and why you are the ideal candidate for this position.

Required Qualification : 1. Strong Python programming skills 2. Basic Knowledge in Imitation Learning 3. Interest in Diffusion models, Reinforcement Learning

Desired Qualification : 1. Attendance of the lectures "Statistical Machine Learning", "Computational Engineering and Robotics" and/or "Reinforcement Learning: From Fundamentals to the Deep Approaches"

References: [1] Song, Yang, and Stefano Ermon. "Generative modeling by estimating gradients of the data distribution." Advances in neural information processing systems 32 (2019). [2] Ho, Jonathan, and Stefano Ermon. "Generative adversarial imitation learning." Advances in neural information processing systems 29 (2016). [3] Garg, D., Chakraborty, S., Cundy, C., Song, J., & Ermon, S. (2021). Iq-learn: Inverse soft-q learning for imitation. Advances in Neural Information Processing Systems, 34, 4028-4039. [4] Chen, R. T., & Lipman, Y. (2023). Riemannian flow matching on general geometries. arXiv preprint arXiv:2302.03660.

  • Be extremely motivated -- we will support you a lot, but we expect you to contribute a lot too

Scaling Behavior Cloning to Humanoid Locomotion

Scope: Bachelor / Master thesis Advisor: Joe Watson Added: 2023-10-07 Start: ASAP Topic: In a previous project [1], I found that behavior cloning (BC) was a surprisingly poor baseline for imitating humanoid locomotion. I suspect the issue may lie in the challenges of regularizing high-dimensional regression.

The goal of this project is to investigate BC for humanoid imitation, understand the scaling issues present, and evaluate possible solutions, e.g. regularization strategies from the regression literature.

The project will be building off Google Deepmind's Acme library [2], which has BC algorithms and humanoid demonstration datasets [3] already implemented, and will serve as the foundation of the project.

To apply, email [email protected] , ideally with a CV and transcript so I can assess your suitability.

  • Experience, interest and enthusiasm for the intersection of robot learning and machine learning
  • Experience with Acme and JAX would be a benefit, but not necessary

References: [1] https://arxiv.org/abs/2305.16498 [2] https://github.com/google-deepmind/acme [3] https://arxiv.org/abs/2106.00672

Robot Gaze for Communicating Collision Avoidance Intent in Shared Workspaces

Scope: Bachelor/Master thesis Advisor: Alap Kshirsagar , Dorothea Koert Added: 2023-09-27 Start: ASAP


Topic: In order to operate close to non-experts, future robots require both an intuitive form of instruction accessible to lay users and the ability to react appropriately to a human co-worker. Instruction by imitation learning with probabilistic movement primitives (ProMPs) [1] allows capturing tasks by learning robot trajectories from demonstrations including the motion variability. However, appropriate responses to human co-workers during the execution of the learned movements are crucial for fluent task execution, perceived safety, and subjective comfort. To facilitate such appropriate responsive behaviors in human-robot interaction, the robot needs to be able to react to its human workspace co-inhabitant online during the execution. Also, the robot needs to communicate its motion intent to the human through non-verbal gestures such as eye and head gazes [2][3]. In particular for humanoid robots, combining motions of arms with expressive head and gaze directions is a promising approach that has not yet been extensively studied in related work.

Goals of the thesis:

  • Develop a method to combine robot head/gaze motion with ProMPs for online collision avoidance
  • Implement the method on a Franka-Emika Panda Robot
  • Evaluate and compare the implemented behaviors in a study with human participants

Highly motivated students can apply by sending an email to [email protected]. Please attach your CV and transcript, and clearly state your prior experiences and why you are interested in this topic.

  • Strong Programming Skills in python
  • Prior experience with Robot Operating System (ROS) and user studies would be beneficial
  • Strong motivation for human-centered robotics including design and implementation of a user study

References : [1] Koert, Dorothea, et al. "Learning intention aware online adaptation of movement primitives." IEEE Robotics and Automation Letters 4.4 (2019): 3719-3726. [2] Admoni, Henny, and Brian Scassellati. "Social eye gaze in human-robot interaction: a review." Journal of Human-Robot Interaction 6.1 (2017): 25-63. [3] Lemasurier, Gregory, et al. "Methods for expressing robot intent for human–robot collaboration in shared workspaces." ACM Transactions on Human-Robot Interaction (THRI) 10.4 (2021): 1-27.

Tactile Sensing for the Real World

Topic: Tactile sensing is a crucial sensing modality that allows humans to perform dexterous manipulation[1]. In recent years, the development of artificial tactile sensors has made substantial progress, with current models relying on cameras inside the fingertips to extract information about the points of contact [2]. However, robotic tactile sensing is still a largely unsolved topic despite these developments. A central challenge of tactile sensing is the extraction of usable representations of sensor readings, especially since these generally contain an incomplete view of the environment.

Recent model-based reinforcement learning methods like Dreamer [3] leverage latent state-space models to reason about the environment from partial and noisy observations. However, more work has yet to be done to apply such methods to real-world manipulation tasks. Hence, this thesis will explore whether Dreamer can solve challenging real-world manipulation tasks by leveraging tactile information. Initial results suggest that tasks like peg-in-a-hole can indeed be solved with Dreamer in simulation (see figure above), but the applicability of this method in the real world has yet to be shown.

In this work, you will work with state-of-the-art hardware and compute resources on a hot research topic with the option of publishing your work at a scientific conference.

Highly motivated students can apply by sending an email to [email protected]. Please attach a transcript of records and clearly state your prior experiences and why you are interested in this topic.

  • Ideally experience with deep learning libraries like JAX or PyTorch
  • Experience with reinforcement learning is a plus
  • Experience with Linux

References [1] 2S Match Anest2, Roland Johansson Lab (2005), https://www.youtube.com/watch?v=HH6QD0MgqDQ [2] Gelsight Inc., Gelsight Mini, https://www.gelsight.com/gelsightmini/ [3] Hafner, D., Lillicrap, T., Ba, J., & Norouzi, M. (2019). Dream to control: Learning behaviors by latent imagination. arXiv preprint arXiv:1912.01603.

Large Vision-Language Neural Networks for Open-Vocabulary Robotic Manipulation


Robots are expected to soon leave their factory/laboratory enclosures and operate autonomously in everyday unstructured environments such as households. Semantic information is especially important when considering real-world robotic applications where the robot needs to re-arrange objects as per a set of language instructions or human inputs (as shown in the figure). Many sophisticated semantic segmentation networks exist [1]. However, a challenge when using such methods in the real world is that the semantic classes rarely align perfectly with the language input received by the robot. For instance, a human language instruction might request a ‘glass’ or ‘water’, but the semantic classes detected might be ‘cup’ or ‘drink’.

Nevertheless, with the rise of large language and vision-language models, we now have capable segmentation models that do not directly predict semantic classes but use learned associations between language queries and classes to give us ’open-vocabulary’ segmentation [2]. Some models are especially powerful since they can be used with arbitrary language queries.

In this thesis, we aim to build on advances in 3D vision-based robot manipulation and large open-vocabulary vision models [2] to build a full pick-and-place pipeline for real-world manipulation. We also aim to find synergies between scene reconstruction and semantic segmentation to determine if knowing the object semantics can aid the reconstruction of the objects and, in turn, aid manipulation.

Highly motivated students can apply by sending an e-mail expressing their interest to Snehal Jauhri (email: [email protected]) or Ali Younes (email: [email protected]), attaching your letter of motivation and possibly your CV.

Topic in detail: Thesis_Doc.pdf

Requirements: Enthusiasm, ambition, and a curious mind go a long way. There will be ample supervision provided to help the student understand basic as well as advanced concepts. However, prior knowledge of computer vision, robotics, and Python programming would be a plus.

References: [1] Y. Wu, A. Kirillov, F. Massa, W.-Y. Lo, and R. Girshick, “Detectron2”, https://github.com/facebookresearch/detectron2 , 2019. [2] F. Liang, B. Wu, X. Dai, K. Li, Y. Zhao, H. Zhang, P. Zhang, P. Vajda, and D. Marculescu, “Open-vocabulary semantic segmentation with mask-adapted clip,” in CVPR, 2023, pp. 7061–7070, https://github.com/facebookresearch/ov-seg

Dynamic Tiles for Deep Reinforcement Learning


Linear approximators in Reinforcement Learning are well-studied and come with an in-depth theoretical analysis. However, linear methods require defining a set of features of the state to be used by the linear approximation. Unfortunately, the feature construction process is a particularly problematic and challenging task. Deep Reinforcement learning methods have been introduced to mitigate the feature construction problem: these methods do not require handcrafted features, as features are extracted automatically by the network during learning, using gradient descent techniques.

In simple reinforcement learning tasks, however, it is possible to use tile coding as features: tiles are simply a convenient discretization of the state space that allows us to easily control the generalization capabilities of the linear approximator. The objective of this thesis is to design a novel algorithm for automatic feature extraction that generates a set of features similar to tile coding, but that can arbitrarily partition the state space and deal with arbitrarily complex state spaces, such as images. The idea is to combine the feature extraction problem directly with linear Reinforcement Learning methods, defining an algorithm that combines the theoretical guarantees and good convergence properties of these methods with the flexibility of Deep Learning approaches.
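As background, tile coding itself is easy to sketch. The following is a minimal 1-D illustration (the function name, offsets, and default sizes are illustrative choices, not part of the thesis proposal):

```python
import numpy as np

def tile_features(s, low=0.0, high=1.0, n_tiles=8, n_tilings=4):
    """One-hot tile-coding features for a scalar state s in [low, high)."""
    width = (high - low) / n_tiles
    features = np.zeros(n_tiles * n_tilings)
    for t in range(n_tilings):
        offset = t * width / n_tilings          # shift each tiling slightly
        idx = int((s - low + offset) / width)   # tile index that s falls into
        idx = min(idx, n_tiles - 1)             # clip at the upper edge
        features[t * n_tiles + idx] = 1.0
    return features

phi = tile_features(0.3)
# A linear value estimate is then just a dot product: V(s) = w @ phi
```

Exactly one tile fires per tiling, so the feature vector is sparse and the overlap between shifted tilings controls generalization, which is the property the thesis aims to preserve while learning the partition from data.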

Application documents:

  • Curriculum Vitae (CV);
  • A motivation letter explaining the reason for applying for this thesis and academic/career objectives.

Minimum knowledge

  • Good Python programming skills;
  • Basic knowledge of Reinforcement Learning.

Preferred knowledge

  • Knowledge of the PyTorch library;
  • Knowledge of the Atari environments (ale-py library).
  • Knowledge of the MushroomRL library.

The accepted candidate will:

  • Define a generalization of tile coding working with an arbitrary input set (including images);
  • Design a learning algorithm to adapt the tiles using data of interaction with the environment;
  • Combine feature learning with standard linear methods for Reinforcement Learning;
  • Verify the novel methodology in simple continuous state and discrete actions environments;
  • (Optionally) Extend the experimental analysis to the Atari environment setting.

Deep Learning Meets Teleoperation: Constructing Learnable and Stable Inductive Guidance for Shared Control

This work considers policies as learnable inductive guidance for shared control. In particular, we use the class of Riemannian motion policies [3] and consider them as differentiable optimization layers [4]. We analyze (i) if RMPs can be pre-trained by learning from demonstrations [5] or reinforcement learning [6] given a specific context; (ii) and subsequently employed seamlessly for human-guided teleoperation thanks to their physically consistent properties, such as stability [3]. We believe this step eliminates the laborious process of constructing complex policies and leads to improved and generalizable shared control architectures.

Highly motivated students can apply by sending an e-mail expressing your interest to [email protected] and [email protected] , attaching your letter of motivation and possibly your CV.

  • Experience with deep learning libraries (in particular Pytorch)
  • Knowledge in reinforcement learning and/or machine learning

References: [1] Niemeyer, Günter, et al. "Telerobotics." Springer handbook of robotics (2016); [2] Selvaggio, Mario, et al. "Autonomy in physical human-robot interaction: A brief survey." IEEE RAL (2021); [3] Cheng, Ching-An, et al. "RMP flow: A Computational Graph for Automatic Motion Policy Generation." Springer (2020); [4] Jaquier, Noémie, et al. "Learning to sequence and blend robot skills via differentiable optimization." IEEE RAL (2022); [5] Mukadam, Mustafa, et al. "Riemannian motion policy fusion through learnable lyapunov function reshaping." CoRL (2020); [6] Xie, Mandy, et al. "Neural geometric fabrics: Efficiently learning high-dimensional policies from demonstration." CoRL (2023).

Dynamic symphony: Seamless human-robot collaboration through hierarchical policy blending

This work focuses on arbitration between the user and assistive policy, i.e., shared autonomy. Various works allow the user to influence the dynamic behavior explicitly and, therefore, cannot satisfy stability guarantees [3]. We pursue the idea of formulating arbitration as a trajectory-tracking problem that implicitly considers the user's desired behavior as an objective [4]. We therefore extend the work of Hansel et al. [5], who employed probabilistic inference for policy blending in robot motion control. The proposed method corresponds to a sampling-based online planner that superposes reactive policies given a predefined objective. This method enables the user to implicitly influence the behavior without injecting energy into the system, thus satisfying stability properties. We believe this step leads to an alternative view of shared autonomy with an improved and generalizable framework.

Highly motivated students can apply by sending an e-mail expressing your interest to [email protected] or [email protected] , attaching your letter of motivation and possibly your CV.

References: [1] Niemeyer, Günter, et al. "Telerobotics." Springer handbook of robotics (2016); [2] Selvaggio, Mario, et al. "Autonomy in physical human-robot interaction: A brief survey." IEEE RAL (2021); [3] Dragan, Anca D., and Siddhartha S. Srinivasa. "A policy-blending formalism for shared control." IJRR (2013); [4] Javdani, Shervin, et al. "Shared autonomy via hindsight optimization for teleoperation and teaming." IJRR (2018); [5] Hansel, Kay, et al. "Hierarchical Policy Blending as Inference for Reactive Robot Control." IEEE ICRA (2023).

Feeling the Heat: Igniting Matches via Tactile Sensing and Human Demonstrations

In this thesis, we want to investigate the effectiveness of vision-based tactile sensors for solving dynamic tasks (igniting matches). Since the whole task is difficult to simulate, we directly collect real-world data to learn policies from the human demonstrations [2,3]. We believe that this work is an important step towards more advanced tactile skills.

Highly motivated students can apply by sending an e-mail expressing your interest to [email protected] and [email protected] , attaching your letter of motivation and possibly your CV.

  • Good knowledge of Python
  • Prior experience with real robots and Linux is a plus

References: [1] https://www.youtube.com/watch?v=HH6QD0MgqDQ [2] Learning Compliant Manipulation through Kinesthetic and Tactile Human-Robot Interaction; Klas Kronander and Aude Billard. [3] https://www.youtube.com/watch?v=jAtNvfPrKH8

Inverse Reinforcement Learning for Neuromuscular Control of Humanoids

Within this thesis, the problems of learning from observations and of efficient exploration in overactuated systems should be addressed. Regarding the former, novel methods incorporating inverse dynamics models into the inverse reinforcement learning problem [1] should be adapted and applied. To address the problem of efficient exploration in overactuated systems, two approaches should be implemented and compared. The first approach uses a handcrafted action space, which disables and modulates actions in different phases of the gait based on biomechanics knowledge [2]. The second approach uses a stateful policy to incorporate an inductive bias into the policy [3]. The thesis will be supervised in conjunction with Guoping Zhao ([email protected]) from the locomotion lab.

Highly motivated students can apply by sending an e-mail expressing their interest to Firas Al-Hafez ([email protected]), attaching your letter of motivation and possibly your CV. Try to make clear why you would like to work on this topic and why you would be the perfect candidate for it.

Required Qualification:

1. Strong Python programming skills
2. Knowledge in Reinforcement Learning
3. Interest in understanding human locomotion

Desired Qualification:

1. Hands-on experience with robotics-related RL projects
2. Prior experience with different simulators
3. Attendance of the lectures "Statistical Machine Learning", "Computational Engineering and Robotics" and/or "Reinforcement Learning: From Fundamentals to the Deep Approaches"

References: [1] Al-Hafez, F.; Tateo, D.; Arenz, O.; Zhao, G.; Peters, J. (2023). LS-IQ: Implicit Reward Regularization for Inverse Reinforcement Learning, International Conference on Learning Representations (ICLR). [2] Ong, C. F.; Geijtenbeek, T.; Hicks, J. L.; Delp, S. L. (2019). Predicting gait adaptations due to ankle plantarflexor muscle weakness and contracture using physics-based musculoskeletal simulations. PLoS Computational Biology. [3] Srouji, M.; Zhang, J.; Salakhutdinov, R. (2018). Structured Control Nets for Deep Reinforcement Learning, International Conference on Machine Learning (ICML).

Robotic Tactile Exploratory Procedures for Identifying Object Properties


Goals of the thesis

  • Literature review of robotic EPs for identifying object properties [2,3,4]
  • Develop and implement robotic EPs for a Digit tactile sensor
  • Compare performance of robotic EPs with human EPs

Desired Qualifications

  • Interested in working with real robotic systems
  • Python programming skills

Literature [1] Lederman and Klatzky, “Haptic perception: a tutorial” [2] Seminara et al., “Active Haptic Perception in Robots: A Review” [3] Chu et al., “Using robotic exploratory procedures to learn the meaning of haptic adjectives” [4] Kerzel et al., “Neuro-Robotic Haptic Object Classification by Active Exploration on a Novel Dataset”

Scaling learned, graph-based assembly policies


Possible directions for this thesis include:

  • Scaling our previous methods to incorporate mobile manipulators or the Kobo bi-manual manipulation platform; the increased workspace of both would allow for handling a wider range of objects.
  • The approach of [2] has proven more powerful; however, it requires running a MILP for every desired structure, so another idea is to investigate approaches that approximate this solution.
  • Adapting the methods to handle more irregularly shaped objects and investigating curriculum learning.

Highly motivated students can apply by sending an e-mail expressing your interest to [email protected] , attaching your letter of motivation and possibly your CV.

  • Experience with deep learning libraries (in particular Pytorch) is a plus
  • Experience with reinforcement learning / having taken Robot Learning is also a plus

References: [1] Learn2Assemble with Structured Representations and Search for Robotic Architectural Construction; Niklas Funk et al. [2] Graph-based Reinforcement Learning meets Mixed Integer Programs: An application to 3D robot assembly discovery; Niklas Funk et al. [3] Structured agents for physical construction; Victor Bapst et al.

Long-Horizon Manipulation Tasks from Visual Imitation Learning (LHMT-VIL): Algorithm


The proposed architecture can be broken down into the following sub-tasks:

1. Multi-object 6D pose estimation from video: Identify the object 6D poses in each video frame to generate the object trajectories
2. Action segmentation from video: Classify the action being performed in each video frame
3. High-level task representation learning: Learn the sequence of robotic movement primitives with the associated object poses such that the robot completes the demonstrated task
4. Low-level movement primitives: Create a database of low-level robotic movement primitives which can be sequenced to solve the long-horizon task

Desired Qualification:

1. Strong Python programming skills
2. Prior experience in Computer Vision and/or Robotics is preferred

Long-Horizon Manipulation Tasks from Visual Imitation Learning (LHMT-VIL): Dataset

During the project, we will create a large-scale dataset of videos of humans demonstrating industrial assembly sequences. The dataset will contain the 6D poses of the objects, the hand and body poses of the human, and the action sequences, among numerous other features. The dataset will be open-sourced to encourage further research on VIL.

[1] F. Sener, et al. "Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities". CVPR 2022. [2] P. Sharma, et al. "Multiple Interactions Made Easy (MIME) : Large Scale Demonstrations Data for Imitation." CoRL, 2018.

Adaptive Human-Robot Interactions with Human Trust Maximization


  • Good knowledge of Python and/or C++;
  • Good knowledge in Robotics and Machine Learning;
  • Good knowledge of Deep Learning frameworks, e.g., PyTorch.

References: [1] Xu, Anqi, and Gregory Dudek. "Optimo: Online probabilistic trust inference model for asymmetric human-robot collaborations." ACM/IEEE HRI, IEEE, 2015; [2] Kwon, Minae, et al. "When humans aren’t optimal: Robots that collaborate with risk-aware humans." ACM/IEEE HRI, IEEE, 2020; [3] Chen, Min, et al. "Planning with trust for human-robot collaboration." ACM/IEEE HRI, IEEE, 2018; [4] Poole, Ben et al. “On variational bounds of mutual information”. ICML, PMLR, 2019.

Causal inference of human behavior dynamics for physical Human-Robot Interactions


Highly motivated students can apply by sending an e-mail expressing your interest to [email protected], attaching your letter of motivation and possibly your CV.

  • Good knowledge of Robotics;
  • Good knowledge of Deep Learning frameworks, e.g., PyTorch.

References:

  • Li, Q., Chalvatzaki, G., Peters, J., Wang, Y., Directed Acyclic Graph Neural Network for Human Motion Prediction, 2021 IEEE International Conference on Robotics and Automation (ICRA).
  • Löwe, S., Madras, D., Zemel, R. and Welling, M., 2020. Amortized causal discovery: Learning to infer causal graphs from time-series data. arXiv preprint arXiv:2006.10833.
  • Yang, W., Paxton, C., Mousavian, A., Chao, Y.W., Cakmak, M. and Fox, D., 2020. Reactive human-to-robot handovers of arbitrary objects. arXiv preprint arXiv:2011.08961.

Incorporating First and Second Order Mental Models for Human-Robot Cooperative Manipulation Under Partial Observability

Scope: Master Thesis
Advisor: Dorothea Koert, Joni Pajarinen
Added: 2021-06-08
Start: ASAP


The ability to model the beliefs and goals of a partner is an essential part of cooperative tasks. While humans develop theory-of-mind models for this aim at a very early age [1], it is still an open question how to implement and make use of such models for cooperative robots [2,3,4]. In particular, human-robot collaboration in shared workspaces could profit from such models, e.g., if the robot can detect and react to planned human goals or a human's false beliefs during task execution. To make such robots a reality, the goal of this thesis is to investigate the use of first- and second-order mental models in a cooperative manipulation task under partial observability. Partially observable Markov decision processes (POMDPs) and interactive POMDPs (I-POMDPs) [5] define an optimal solution to the mental modeling task and may provide a solid theoretical basis for modelling. The thesis may also compare related approaches from the literature and set up an experimental design for evaluation with the bi-manual robot platform Kobo.

Highly motivated students can apply by sending an e-mail expressing your interest to [email protected] attaching your CV and transcripts.

References:

  • Wimmer, H., & Perner, J. Beliefs about beliefs: Representation and constraining function of wrong beliefs in young children's understanding of deception (1983)
  • Sandra Devin and Rachid Alami. An implemented theory of mind to improve human-robot shared plans execution (2016)
  • Neil Rabinowitz, Frank Perbet, Francis Song, Chiyuan Zhang, SM Ali Eslami,and Matthew Botvinick. Machine theory of mind (2018)
  • Connor Brooks and Daniel Szafir. Building second-order mental models for human-robot interaction. (2019)
  • Prashant Doshi, Xia Qu, Adam Goodie, and Diana Young. Modeling recursive reasoning by humans using empirically informed interactive pomdps. (2010)

Fundamentals of Artificial Neural Networks and Deep Learning

  • Open Access
  • First Online: 14 January 2022


  • Osval Antonio Montesinos López,
  • Abelardo Montesinos López &
  • Jose Crossa


In this chapter, we go through the fundamentals of artificial neural networks and deep learning methods. We describe the inspiration for artificial neural networks and how the methods of deep learning are built. We define the activation function and its role in capturing nonlinear patterns in the input data. We explain the universal approximation theorem for understanding the power and limitation of these methods and describe the main topologies of artificial neural networks that play an important role in the successful implementation of these methods. We also describe loss functions (and their penalized versions) and give details about the circumstances in which each of them should be used or preferred. In addition to the Ridge, Lasso, and Elastic Net regularization methods, we provide details of the dropout and the early stopping methods. Finally, we provide the backpropagation method and illustrate it with two simple artificial neural networks.



Keywords

  • Artificial neural networks
  • Deep learning
  • Activation functions
  • Loss functions
  • Backpropagation method

10.1 The Inspiration for the Neural Network Model

The inspiration for artificial neural networks (ANN), or simply neural networks, resulted from the admiration for how the human brain computes complex processes, which is entirely different from the way conventional digital computers do this. The power of the human brain is superior to many information-processing systems, since it can perform highly complex, nonlinear, and parallel processing by organizing its structural constituents (neurons) to perform such tasks as accurate predictions, pattern recognition, perception, motor control, etc. It is also many times faster than the fastest digital computer in existence today. An example is the sophisticated functioning of the information-processing task called human vision. This system helps us to understand and capture the key components of the environment and supplies us with the information we need to interact with the environment. That is, the brain very often performs perceptual recognition tasks (e.g., voice recognition embedded in a complex scene) in around 100–200 ms, whereas less complex tasks many times take longer even on a powerful computer (Haykin 2009 ).

Another interesting example is the sonar of a bat, since the sonar is an active echolocation system. The sonar provides information not only about how far away the target is located but also about the relative velocity of the target, its size, and the size of various features of the target, including its azimuth and elevation. Within a brain the size of a plum occur the computations required to extract all this information from the target echo. It is also documented that an echolocating bat has a high rate of success when pursuing and capturing its target and, for this reason, is the envy of radar and sonar engineers (Haykin 2009). This capacity of the bat, which emits an ultrasonic wave and then receives and processes its echo to detect obstacles in its flight with surprising speed and accuracy, inspired the development of radar, which is able to detect objects in its path without needing to see them (Francisco-Caicedo and López-Sotelo 2009).

In general, the functioning of the brains of humans and other animals is intriguing because they are able to perform very complex tasks in a very short time and with high efficiency. For example, signals from sensors in the body convey information related to sight, hearing, taste, smell, touch, balance, temperature, pain, etc. Then the brain's neurons, which are autonomous units, transmit, process, and store this information so that we can respond successfully to external and internal stimuli (Dougherty 2013). The neurons of many animals transmit spikes of electrical activity through a long, thin strand called an axon. An axon is divided into thousands of terminals or branches, where, depending on the size of the signal, they synapse to dendrites of other neurons (Fig. 10.1). It is estimated that the brain is composed of around \( 10^{11} \) neurons that work in parallel, since the processing done by the neurons and the memory captured by the synapses are distributed together over the network. The amount of information processed and stored depends on the threshold firing levels and also on the weight given by each neuron to each of its inputs (Dougherty 2013).

figure 1

A graphic representation of a biological neuron

One of the characteristics of biological neurons, to which they owe their great capacity to process and perform highly complex tasks, is that they are highly connected to other neurons from which they receive stimuli from an event as it occurs, or hundreds of electrical signals with the information learned. When it reaches the body of the neuron, this information affects its behavior and can also affect a neighboring neuron or muscle (Francisco-Caicedo and López-Sotelo 2009 ). Francisco-Caicedo and López-Sotelo ( 2009 ) also point out that the communication between neurons goes through the so-called synapses. A synapse is a space that is occupied by chemicals called neurotransmitters. These neurotransmitters are responsible for blocking or passing on signals that come from other neurons. The neurons receive electrical signals from other neurons with which they are in contact. These signals accumulate in the body of the neuron and determine what to do. If the total electrical signal received by the neuron is sufficiently large, the action potential can be overcome, which allows the neuron to be activated or, on the contrary, to remain inactive. When a neuron is activated, it is able to transmit an electrical impulse to the neurons with which it is in contact. This new impulse, for example, acts as an input to other neurons or as a stimulus in some muscles (Francisco-Caicedo and López-Sotelo 2009 ). The architecture of biological neural networks is still the subject of active research, but some parts of the brain have been mapped, and it seems that neurons are often organized in consecutive layers, as shown in Fig. 10.2 .

figure 2

Multiple layers in a biological neural network of human cortex

ANN are machines designed to perform specific tasks by imitating how the human brain works, building a neural network made up of hundreds or even thousands of artificial neurons or processing units. The artificial neural network is implemented by developing a computational learning algorithm that does not need to program all the rules, since it is able to build up its own rules of behavior through what we usually refer to as "experience." The practical implementation of neural networks is possible due to the fact that they are massively parallel computing systems made up of a huge number of basic processing units (neurons) that are interconnected and learn from their environment, and the synaptic weights capture and store the strengths of the interconnected neurons. The job of the learning algorithm consists of modifying the synaptic weights of the network in a sequential and supervised way to reach a specific objective (Haykin 2009). There is evidence that neurons working together are able to learn complex linear and nonlinear input–output relationships by using sequential training procedures. It is important to point out that even though the inspiration for these models was quite different from what inspired statistical models, the building blocks of both types of models are quite similar. Anderson et al. (1990) and Ripley (1993) pointed out that neural networks are simply no more than generalized nonlinear statistical models. However, Anderson et al. (1990) were more expressive in this sense and also pointed out that "ANN are statistics for amateurs since most neural networks conceal the statistics from the user."

10.2 The Building Blocks of Artificial Neural Networks

To get a clear idea of the main elements used to construct ANN models, in Fig. 10.3 we provide a general artificial neural network model that contains the main components for this type of models.

figure 3

General artificial neural network model

\( x_1, \dots, x_p \) represent the information (input) that the neuron receives from the external sensory system or from other neurons with which it has a connection. \( \boldsymbol{w} = (w_1, \dots, w_p) \) is the vector of synaptic weights that modifies the received information, emulating the synapse between biological neurons. These can be interpreted as gains that can attenuate or amplify the values that they wish to propagate toward the neuron. Parameter \( b_j \) is known as the bias (intercept or threshold) of a neuron. Here in ANN, learning refers to the method of modifying the weights of connections between the nodes (neurons) of a specified network.

The different values that the neuron receives are modified by the synaptic weights, which then are added together to produce what is called the net input. In mathematical notation, that is equal to

\( v_j = \sum_{i=1}^{p} w_i x_i + b_j \)

This net input (\( v_j \)) is what determines whether the neuron is activated or not. The activation of the neuron depends on what we call the activation function. The net input is evaluated in this function and we obtain the output of the network as shown next:

\( y_j = g(v_j) \)

where g is the activation function. For example, if we define this function as a unit step (also called threshold), the output will be 1 if the net input is greater than zero; otherwise the output will be 0. Although there is no biological behavior indicating the presence of something similar to the neurons of the brain, the use of the activation function is an artifice that allows applying ANN to a great diversity of real problems. According to what has been mentioned, output \( y_j \) of the neuron is generated when evaluating the net input (\( v_j \)) in the activation function. We can propagate the output of the neuron to other neurons or it can be the output of the network, which, according to the application, will have an interpretation for the user. In general, the job of an artificial neural network model is done by simple elements called neurons. The signals are passed between neurons through connection links. Each connection link has an associated weight, which, in a typical neural network, multiplies the transmitted signal. Each neuron applies an activation function (usually nonlinear) to the net input (the sum of the weighted input signals) to determine its corresponding output. Later in this chapter, we describe the many options for activation functions and the context in which they can be used.
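To make the mechanics concrete, here is a minimal sketch of the single neuron just described, with a unit-step activation (the function names and the example values are illustrative, not from the chapter):

```python
import numpy as np

def unit_step(v):
    # threshold activation: fire (output 1) iff the net input exceeds zero
    return 1.0 if v > 0 else 0.0

def neuron_output(x, w, b, g=unit_step):
    v = np.dot(w, x) + b   # net input: weighted sum of the inputs plus bias
    return g(v)            # activation function applied to the net input

x = np.array([0.5, -1.0, 2.0])   # inputs x_1..x_p
w = np.array([0.4, 0.3, 0.6])    # synaptic weights
b = -0.5                         # bias
y = neuron_output(x, w, b)       # net input v = 0.6 > 0, so the neuron fires
```

Swapping `unit_step` for a smooth function such as a sigmoid turns this into the differentiable neuron used in the later sections on backpropagation.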

A unilayer ANN like that in Fig. 10.3 has a low processing capacity by itself and its level of applicability is low; its true power lies in the interconnection of many ANNs, as happens in the human brain. This has motivated different researchers to propose various topologies (architectures) to connect neurons to each other in the context of ANN. Next, we provide two definitions of ANN and one definition of deep learning:

Definition 1 . An artificial neural network is a system composed of many simple elements of processing which operate in parallel and whose function is determined by the structure of the network and the weight of connections, where the processing is done in each of the nodes or computing elements that has a low processing capacity (Francisco-Caicedo and López-Sotelo 2009 ).

Definition 2 . An artificial neural network is a structure containing simple elements that are interconnected in many ways with hierarchical organization, which tries to interact with objects in the real world in the same way as the biological nervous system does (Kohonen 2000 ).

Deep learning model . We define deep learning as a generalization of ANN where more than one hidden layer is used, which implies that more neurons are used for implementing the model. For this reason, an artificial neural network with multiple hidden layers is called a Deep Neural Network (DNN) and the practice of training this type of networks is called deep learning (DL) , which is a branch of statistical machine learning where a multilayered (deep) topology is used to map the relations between input variables (independent variables) and the response variable (outcome). Chollet and Allaire ( 2017 ) point out that DL puts the “emphasis on learning successive layers of increasingly meaningful representations.” The adjective “deep” applies not to the acquired knowledge, but to the way in which the knowledge is acquired (Lewis 2016 ), since it stands for the idea of successive layers of representations. The “deep” of the model refers to the number of layers that contribute to the model. For this reason, this field is also called layered representation learning and hierarchical representation learning (Chollet and Allaire 2017 ).

It is important to point out that DL as a subset of machine learning is an aspect of artificial intelligence (AI) that has more complex ways of connecting layers than conventional ANN, which uses more neurons than previous networks to capture nonlinear aspects of complex data better, but at the cost of more computing power required to automatically extract useful knowledge from complex data.

To have a more complete picture of ANN, we provide another model, which is a DL model since it has two hidden layers, as shown in Fig. 10.4 .

figure 4

Artificial deep neural network with a feedforward neural network with eight input variables ( x 1, … , x 8), four output variables ( y 1, y 2, y 3, y 4), and two hidden layers with three neurons each

From Fig. 10.4 we can see that an artificial neural network is a directed graph whose nodes correspond to neurons and whose edges correspond to links between them. Each neuron receives, as input, a weighted sum of the outputs of the neurons connected to its incoming edges (Shalev-Shwartz and Ben-David 2014). In the artificial deep neural network given in Fig. 10.4, there are four layers (\( V_0, V_1, V_2 \), and \( V_3 \)): \( V_0 \) represents the input layer, \( V_1 \) and \( V_2 \) are the hidden layers, and \( V_3 \) denotes the output layer. In this artificial deep neural network, three is the number of layers of the network, since \( V_0 \), which contains the input information, is excluded. This is also called the "depth" of the network. The size of this network is \( \left|V\right| = \left|\bigcup_{t=0}^{3} V_t\right| = 9+4+4+4 = 21 \). Note that in each layer we added +1 to the observed units to represent the node of the bias (or intercept). The width of the network is \( \max_t \left|V_t\right| = 9 \).

The analytical form of the model given in Fig. 10.4 for output o , with d inputs, M 1 hidden neurons (units) in hidden layer 1, M 2 hidden units in hidden layer 2, and O output neurons is given by the following ( 10.1 )–( 10.3 ):

\( {V}_{1j}={g}_1\left({\sum}_{i=1}^d{w}_{ji}^{(1)}{x}_i+{b}_j^{(1)}\right),\kern1em j=1,\dots, {M}_1 \)  (10.1)

\( {V}_{2k}={g}_2\left({\sum}_{j=1}^{M_1}{w}_{kj}^{(2)}{V}_{1j}+{b}_k^{(2)}\right),\kern1em k=1,\dots, {M}_2 \)  (10.2)

\( {\hat{y}}_l={g}_3\left({\sum}_{k=1}^{M_2}{w}_{lk}^{(3)}{V}_{2k}+{b}_l^{(3)}\right),\kern1em l=1,\dots, O \)  (10.3)

where ( 10.1 ) produces the output of each of the neurons in the first hidden layer, ( 10.2 ) produces the output of each of the neurons in the second hidden layer, and finally ( 10.3 ) produces the output of each response variable of interest. The learning process is obtained with the weights ( \( {w}_{ji}^{(1)},{w}_{kj}^{(2)}, \) and \( {w}_{lk}^{(3)}\Big) \) , which are accommodated in the following vector: \( \boldsymbol{w}=\left({w}_{11}^{(1)},{w}_{12}^{(1)},\dots, {w}_{1d}^{(1)},{w}_{21}^{(2)},{w}_{22}^{(2)},\dots, {w}_{2{M}_1}^{(2)},{w}_{31}^{(3)},{w}_{32}^{(3)},\dots, {w}_{3{M}_2}^{(3)}\right), \) g 1 , g 2 , and g 3 are the activation functions in hidden layers 1, 2, and the output layer, respectively.
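As a concrete sketch, the forward pass of ( 10.1 )–( 10.3 ) can be written in a few lines of NumPy. The tanh hidden activation, the identity output activation, and the random weight values below are illustrative assumptions, not choices made in the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Dimensions matching Fig. 10.4: 8 inputs, two hidden layers of 3 neurons, 4 outputs
d, M1, M2, O = 8, 3, 3, 4

# Weights and biases for each layer (random values, for illustration only)
W1, b1 = rng.normal(size=(M1, d)), np.zeros(M1)    # w_ji^(1)
W2, b2 = rng.normal(size=(M2, M1)), np.zeros(M2)   # w_kj^(2)
W3, b3 = rng.normal(size=(O, M2)), np.zeros(O)     # w_lk^(3)

def g(z):
    """Hidden-layer activation (tanh chosen arbitrarily here)."""
    return np.tanh(z)

def forward(x):
    """Two hidden layers followed by the output layer, as in (10.1)-(10.3)."""
    a1 = g(W1 @ x + b1)     # (10.1): first hidden layer
    a2 = g(W2 @ a1 + b2)    # (10.2): second hidden layer
    return W3 @ a2 + b3     # (10.3): identity activation on the output layer

x = rng.normal(size=d)      # one observation with 8 input variables
y_hat = forward(x)
print(y_hat.shape)          # (4,): one prediction per output variable
```

Stacking more hidden layers only repeats the middle step, which is what makes the network "deep."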

The model given in Fig. 10.4 is organized as several interconnected layers: the input layer, hidden layers, and output layer, where each layer that performs nonlinear transformations is a collection of artificial neurons, and connections among these layers are made using weights (Fig. 10.4 ). When only one output variable is present in Fig. 10.4 , the model is called a univariate DL model. Also, when only one hidden layer is present in Fig. 10.4 , the DL model reduces to a conventional artificial neural network model, but when more than one hidden layer is included, it is possible to better capture complex interactions, nonlinearities, and nonadditive effects. To better understand the elements of the model depicted in Fig. 10.4 , it is important to distinguish between the types of layers and the types of neurons; for this reason, next we explain the types of layers and then the types of neurons in more detail.

Input layer: It is the set of neurons that directly receives the information coming from the external sources of the network. In the context of Fig. 10.4 , this information consists of x 1, … , x 8 (Francisco-Caicedo and López-Sotelo 2009). The number of neurons in an input layer is therefore usually the same as the number of input explanatory variables provided to the network. Input layers are usually followed by at least one hidden layer. Only in feedforward neural networks are input layers fully connected to the next hidden layer (Patterson and Gibson 2017).

Hidden layers: Consist of a set of internal neurons of the network that do not have direct contact with the outside. The number of hidden layers can be 0, 1, or more. In general, the neurons of each hidden layer share the same type of information; for this reason, they are called hidden layers. The neurons of the hidden layers can be interconnected in different ways; this determines, together with their number, the different topologies of ANN and DNN (Francisco-Caicedo and López-Sotelo 2009 ). The learned information extracted from the training data is stored and captured by the weight values of the connections between the layers of the artificial neural network. Also, it is important to point out that hidden layers are key components for capturing complex nonlinear behaviors of data more efficiently (Patterson and Gibson 2017 ).

Output layer: It is a set of neurons that transfers the information that the network has processed to the outside (Francisco-Caicedo and López-Sotelo 2009 ). In Fig. 10.4 the output neurons correspond to the output variables y 1, y 2, y 3, and y 4. This means that the output layer gives the answer or prediction of the artificial neural network model based on the input from the input layer. The final output can be continuous, binary, ordinal, or count depending on the setup of the ANN which is controlled by the activation (or inverse link in the statistical domain) function we specified on the neurons in the output layer (Patterson and Gibson 2017 ).

Next, we define the types of neurons: (1) input neuron . A neuron that receives external inputs from outside the network; (2) output neuron . A neuron that produces some of the outputs of the network; and (3) hidden neuron . A neuron that has no direct interaction with the “outside world” but only with other neurons within the network. Similar terminology is used at the layer level for multilayer neural networks .

As can be seen in Fig. 10.4 , the distribution of neurons within an artificial neural network is done by forming levels with a certain number of neurons. If a set of artificial neurons simultaneously receives the same type of information, we call it a layer. We have already described a network with three types of layers. Figure 10.5 shows another six networks with different numbers of layers; half of them (Fig. 10.5a, c, e ) are univariate since the response variable we wish to predict is only one, while the other half (Fig. 10.5b, d, f ) are multivariate since the interest of the network is to predict two outputs. It is important to point out that subpanels a and b in Fig. 10.5 are networks with only one layer and without hidden layers; for this reason, these networks correspond to conventional regression or classification models.

figure 5

Different feedforward topologies with univariate and multivariate outputs and different number of layers. ( a ) Unilayer and univariate output. ( b ) Unilayer and multivariate output. ( c ) Three layer and univariate output. ( d ) Three layer and multivariate output. ( e ) Four layer univariate output. ( f ) Four layer multivariate output

Therefore, the topology of an artificial neural network is the way in which neurons are organized inside the network; it is closely linked to the learning algorithm used to train the network. Depending on the number of layers, we define the networks as monolayer and multilayer ; and if we take as a classification element the way information flows, we define the networks as feedforward or recurrent. Each type of topology will be described in another section.

In summary, an artificial (deep) neural network model is an information processing system that mimics the behavior of biological neural networks, which was developed as a generalization of mathematical models of human knowledge or neuronal biology.

10.3 Activation Functions

The mapping between inputs and a hidden layer in ANN and DNN is determined by activation functions. Activation functions propagate the output of one layer’s nodes forward to the next layer (up to and including the output layer). Activation functions are scalar-to-scalar functions that provide a specific output of the neuron. Activation functions allow nonlinearities to be introduced into the network’s modeling capabilities (Wiley 2016 ). The activation function of a neuron (node) defines the functional form for how a neuron gets activated. For example, if we define a linear activation function as g ( z ) =  z , in this case the value of the neuron would be the raw input, x , times the learned weight, that is, a linear model. Next, we describe the most popular activation functions.

10.3.1 Linear

Figure 10.6 shows a linear activation function that is basically the identity function. It is defined as g ( z ) =  Wz , where the dependent variable has a direct, proportional relationship with the independent variable. In practical terms, it means the function passes the signal through unchanged. The problem with making activation functions linear is that this does not permit any nonlinear functional forms to be learned (Patterson and Gibson 2017 ).

figure 6

Representation of a linear activation function

10.3.2 Rectifier Linear Unit (ReLU)

The rectifier linear unit (ReLU) activation function is one of the most popular. The ReLU activation function is flat below some threshold (usually zero) and linear above it. The ReLU activates a node only if the input is above that threshold: when the input is below zero, the output is zero, but when the input rises above the threshold, the output has a linear relationship with the input, g ( z ) =  max (0,  z ), as shown in Fig. 10.7 . Despite its simplicity, the ReLU activation function provides a nonlinear transformation, and enough linear rectifiers can be combined to approximate arbitrary nonlinear functions, unlike when only linear activation functions are used (Patterson and Gibson 2017). ReLUs are the current state of the art because they have proven to work in many different situations. Because the gradient of a ReLU is either zero or a constant, it helps control the vanishing gradient problem; however, a unit whose input stays negative can output zero permanently and stop learning, a problem known as the "dying ReLU" issue. ReLU activation functions have been shown to train better in practice than sigmoid activation functions. This activation function is the most used in hidden layers, and in output layers when the response variable is continuous and larger than zero.

figure 7

Representation of the ReLU activation function

10.3.3 Leaky ReLU

Leaky ReLUs are a strategy to mitigate the "dying ReLU" issue. Instead of the function being zero when z  < 0, the leaky ReLU has a small negative slope, α , where α is a value between 0 and 1 (Fig. 10.8 ). In practice, some success has been achieved with this ReLU variation, but results are not always consistent. This activation function is given by

\( g(z)=\left\{\begin{array}{ll}z, & z>0\\ \alpha z, & z\le 0\end{array}\right. \)

figure 8

Representation of the Leaky ReLU activation function with α  = 0.1

10.3.4 Sigmoid

A sigmoid activation function converts independent variables of near-infinite range into simple probabilities between 0 and 1, and most of its output will be very close to 0 or 1. Like all logistic transformations, sigmoids can reduce extreme values or outliers in data without removing them. This activation function resembles an S (Wiley 2016 ; Patterson and Gibson 2017 ) and is defined as g ( z ) = (1 +  e − z ) −1 . It is one of the most common activation functions used to construct ANNs and DNNs where the outcome is a probability or a binary outcome. It is a strictly increasing function that exhibits a graceful balance between linear and nonlinear behavior, but it has a propensity to get "stuck," i.e., the output values are very close to 1 or 0 when the input values are strongly positive or negative (Fig. 10.9 ). By getting "stuck" we mean that the learning process stops improving because of the very large or very small output values of this activation function.

figure 9

Representation of the sigmoid activation function

10.3.5 Softmax

Softmax is a generalization of the sigmoid activation function that handles multinomial labeling systems, that is, it is appropriate for categorical outcomes. Softmax is the function you will often find in the output layer of a classifier with more than two categories. The softmax activation function returns the probability distribution over mutually exclusive output classes. To further illustrate the idea of the softmax output layer and how to use it, let's consider two types of uses. If we have a multiclass modeling problem in which we only care about the best score across these classes, we use a softmax output layer with an argmax() function to get the highest score across all classes. For example, let us assume that our categorical response has ten classes; with this activation function we calculate a probability for each category (the probabilities of the ten categories sum to one) and we classify a particular individual into the class with the largest probability. It is important to recall that if we want to get binary classifications per output (e.g., "diseased and not diseased"), we do not want softmax as an output layer; instead, we use the sigmoid activation function explained before. The softmax function is defined as

\( g{(z)}_j=\frac{\exp \left({z}_j\right)}{\sum_{c=1}^C\exp \left({z}_c\right)},\kern1em j=1,\dots, C \)

This activation function is a generalization of the sigmoid activation function that squeezes (forces) a C -dimensional vector of arbitrary real values into a C -dimensional vector of real values in the range [0,1] that add up to 1. A strong prediction would have a single entry in the vector close to 1, while the remaining entries would be close to 0. A weak prediction would have multiple possible categories (labels) that are more or less equally likely. The sigmoid and softmax activation functions are suitable for probabilistic interpretation since the output is a probability distribution over the classes. This activation function is mostly recommended for output layers when the response variable is categorical.

10.3.6 Tanh

The hyperbolic tangent (Tanh) activation function is defined as \( \tanh \left(\mathrm{z}\right)=\sinh \left(\mathrm{z}\right)/\cosh \left(\mathrm{z}\right)=\frac{\exp (z)-\exp \left(-z\right)}{\exp (z)+\exp \left(-z\right)} \) . The hyperbolic tangent works well in some cases and, like the sigmoid activation function, has a sigmoidal ("S"-shaped) output, with the advantage that it is less likely to get "stuck" than the sigmoid activation function since its output values are between −1 and 1, as shown in Fig. 10.10 . For this reason, the Tanh activation function is often preferred for hidden layers. Large negative inputs to the tanh function give negative outputs, while large positive inputs give positive outputs (Patterson and Gibson 2017 ). The advantage of tanh is that it can deal more easily with negative numbers.

figure 10

Representation of the tanh activation function

It is important to point out that there are more activation functions, like the threshold activation function introduced in the pioneering work on ANN by McCulloch and Pitts ( 1943 ), but the ones just mentioned are among the most used.
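The activation functions of Sects. 10.3.1–10.3.6 are one-liners in NumPy. The sketch below is illustrative: the default α = 0.1 for the leaky ReLU follows Fig. 10.8, and the max subtraction inside the softmax is a standard numerical-stability trick not discussed in the text:

```python
import numpy as np

def linear(z):
    return z                                 # identity: passes the signal unchanged

def relu(z):
    return np.maximum(0.0, z)                # flat below zero, linear above

def leaky_relu(z, alpha=0.1):
    return np.where(z > 0, z, alpha * z)     # small negative slope mitigates "dying ReLU"

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))          # S-shaped, output in (0, 1)

def tanh(z):
    return np.tanh(z)                        # S-shaped, output in (-1, 1)

def softmax(z):
    e = np.exp(z - np.max(z))                # subtract max for numerical stability
    return e / e.sum()                       # probabilities over mutually exclusive classes

z = np.array([-2.0, 0.0, 3.0])
print(relu(z))             # [0. 0. 3.]
print(leaky_relu(z))       # the negative input keeps a small slope (-0.2)
print(softmax(z).sum())    # sums to 1: a probability distribution
```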

10.4 The Universal Approximation Theorem

The universal approximation theorem is at the heart of ANN since it provides the mathematical basis of why artificial neural networks work in practice for nonlinear input–output mapping. According to Haykin ( 2009 ), this theorem can be stated as follows.

Let g (·) be a bounded, monotone-increasing continuous function. Let \( {I}_{m_0} \) denote the m 0 -dimensional unit hypercube \( {\left[0,1\right]}^{m_0} \) . The space of continuous functions on \( {I}_{m_0} \) is denoted by \( C\left({I}_{m_0}\right) \) . Then, given any function \( f\in C\left({I}_{m_0}\right) \) and ε  > 0, there is an integer m 1 and sets of real constants α i , b i , and w ij , where i  = 1,…,  m 1 and j  = 1,…,  m 0 , such that we may define

\( F\left({x}_1,\dots, {x}_{m_0}\right)={\sum}_{i=1}^{m_1}{\alpha}_i\,g\left({\sum}_{j=1}^{m_0}{w}_{ij}{x}_j+{b}_i\right) \)

as an approximate realization of the function f (·); that is,

\( \left|F\left({x}_1,\dots, {x}_{m_0}\right)-f\left({x}_1,\dots, {x}_{m_0}\right)\right|<\varepsilon \)

for all \( {x}_1,\dots, {x}_{m_0} \) that lie in the input space.

Here m 0 represents the number of input nodes of a multilayer perceptron with a single hidden layer, m 1 is the number of neurons in the single hidden layer, \( {x}_1,\dots, {x}_{m_0} \) are the inputs, w ij denotes the weight of neuron i for input j , b i denotes the bias corresponding to neuron i , and α i is the output-layer weight of neuron i .

This theorem states that any feedforward neural network containing a finite number of neurons is capable of approximating continuous functions of arbitrary complexity to arbitrary accuracy, if provided enough neurons in even a single hidden layer, under mild assumptions on the activation function. In other words, any continuous function that maps intervals of real numbers to some output interval of real numbers can be approximated arbitrarily closely by a multilayer perceptron with just one hidden layer and a finite (possibly very large) number of neurons (Cybenko 1989 ; Hornik 1991 ). However, this theorem only guarantees that a reasonable approximation exists; for this reason, it is an existence theorem. This implies that simple ANNs are able to represent a wide variety of interesting functions if given enough neurons and appropriate parameters, but it says nothing about the algorithmic learnability of those parameters, the time needed to learn them, ease of implementation, generalization, or whether a single hidden layer is optimal. The first version of this theorem was given by Cybenko ( 1989 ) for sigmoid activation functions. Two years later, Hornik ( 1991 ) pointed out that the potential of "ANN of being universal approximators is not due to the specific choice of the activation function, but to the multilayer feedforward architecture itself."
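A small numerical illustration of the theorem (not a proof): we draw the hidden weights and biases at random and fit only the output weights α i by least squares, a simplifying assumption, yet a single sigmoid hidden layer already approximates a continuous function closely:

```python
import numpy as np

rng = np.random.default_rng(1)

# Continuous target function on the unit interval [0, 1]
f = lambda x: np.sin(2 * np.pi * x)
x = np.linspace(0.0, 1.0, 200)

# F(x) = sum_i alpha_i * g(w_i * x + b_i) with m1 sigmoid hidden units.
# Hidden parameters are random; only the output weights alpha are fitted.
m1 = 50
w = rng.normal(scale=10.0, size=m1)
b = rng.normal(scale=10.0, size=m1)
H = 1.0 / (1.0 + np.exp(-(np.outer(x, w) + b)))   # hidden activations, shape (200, m1)
alpha, *_ = np.linalg.lstsq(H, f(x), rcond=None)  # least-squares fit of alpha

F = H @ alpha
max_err = np.max(np.abs(F - f(x)))
print(max_err)    # small, and it shrinks further as m1 grows
```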

From this theorem, we can deduce that using more than two hidden layers will not always improve the prediction performance of an artificial neural network, since there is a higher risk of converging to a local minimum. However, using two hidden layers is recommended when the data has discontinuities. Although the proof of this theorem was done for only a single output, it is also valid for the multi-output scenario and can easily be deduced from the single-output case. It is important to point out that this theorem does not state that all activation functions will perform equally well in specific learning problems, since their performance depends on the data and on additional issues such as minimal redundancy, computational efficiency, etc.

10.5 Artificial Neural Network Topologies

In this subsection, we describe the most popular network topologies. An artificial neural network topology represents the way in which neurons are connected to form a network. In other words, the neural network topology can be seen as the relationship between the neurons by means of their connections. The topology of a neural network plays a fundamental role in its functionality and performance, as illustrated throughout this chapter. The generic terms structure and architecture are used as synonyms for network topology. However, caution should be exercised when using these terms since their meaning is not well defined and causes confusion in other domains where the same terms are used for other purposes.

More precisely, the topology of a neural network consists of its frame or framework of neurons, together with its interconnection structure or connectivity.

The next two subsections are devoted to these two components.

Artificial neural framework

Most neural networks, including many biological ones, have a layered topology. There are a few exceptions where the network is not explicitly layered, but those can usually be interpreted as having a layered topology; for example, some associative memory networks can be seen as one-layer neural networks where all neurons function both as input and output units. At the framework level, neurons are considered abstract entities, and possible differences between them are therefore not considered. The framework of an artificial neural network can thus be described by the number of neurons, the number of layers (denoted by L ), and the sizes of the layers, that is, the number of neurons in each of the layers.

Interconnection structure

The interconnection structure of an artificial neural network determines the way in which the neurons are linked. Based on a layered structure, several different kinds of connections can be distinguished (see Fig. 10.11 ): (a) Interlayer connection : This connects neurons in adjacent layers whose layer indices differ by one; (b) Intralayer connection : This is a connection between neurons in the same layer; (c) Self-connection : This is a special kind of intralayer connection that connects a neuron to itself; (d) Supralayer connection : This is a connection between neurons that are in distinct nonadjacent layers; in other words, these connections “cross” or “jump” at least one hidden layer.

figure 11

Network topology with two layers. (i) denotes the six interlayer connections, (s) denotes the four supralayer connections, and (a) denotes four intralayer connections, of which two are self-connections

With each connection (interconnection ), a weight (strength) is associated which is a weighting factor that reflects its importance. This weight is a scalar value (a number), which can be positive (excitatory) or negative (inhibitory). If a connection has zero weight, it is considered to be nonexistent at that point in time.

Note that the basic concept of layeredness is based on the presence of interlayer connections. In other words, every layered neural network has at least one interlayer connection between adjacent layers. If interlayer connections are absent between any two adjacent clusters in the network, a spatial reordering can be applied to the topology, after which certain connections become the interlayer connections of the transformed, layered network.

Now that we have described the two key components of an artificial neural network topology, we will present two of the most commonly used topologies.

Feedforward network

In this type of artificial neural network, the information flows in a single direction from the input neurons to the processing layer or layers (only interlayer connections) for monolayer and multilayer networks, respectively, until reaching the output layer of the neural network. This means that there are no connections between neurons in the same layer (no intralayer), and there are no connections that transmit data from a higher layer to a lower layer, that is, no supralayer connections (Fig. 10.12 ). This type of network is simple to analyze, but is not restricted to only one hidden layer.

figure 12

A simple two-layer feedforward artificial neural network

Recurrent networks

In this type of neural network, information does not always flow in one direction, since it can feed back into previous layers through synaptic connections. This type of neural network can be monolayer or multilayer. In this network, all the neurons have (1) incoming connections emanating from all the neurons in the previous layer, (2) outgoing connections leading to all the neurons in the subsequent layer, and (3) recurrent connections that propagate information between neurons of the same layer. Recurrent neural networks (RNNs) differ from feedforward neural networks in that they have at least one feedback loop, since the signals travel in both directions. This type of network is frequently used in time series prediction since short-term memory, or delay, increases the power of recurrent networks immensely. Here we present an example of a recurrent two-layer neural network. The output of each neuron is passed through a delay unit and then taken to all the neurons, except itself. In Figs. 10.13 and 10.14 , we can see that only one input variable is presented to the input units, the feedforward flow is computed, and the outputs are fed back as auxiliary inputs. This leads to a different set of hidden unit activations, new output activations, and so on. Ultimately, the activations stabilize, and the final output values are used for predictions.

figure 13

A simple two-layer recurrent artificial neural network with univariate output

figure 14

A two-layer recurrent artificial neural network with multivariate outputs
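The feedback mechanism just described can be sketched as a minimal recurrent layer in NumPy; the tanh activation, the random weights, and the zero initial state are illustrative assumptions, not choices made in the text:

```python
import numpy as np

rng = np.random.default_rng(2)

# One recurrent hidden layer: the state h_t depends on the current input x_t and
# on the delayed state h_{t-1} fed back through recurrent weights (the delay unit).
n_in, n_hidden, n_out = 1, 3, 1
W_in = rng.normal(size=(n_hidden, n_in))         # input  -> hidden
W_rec = rng.normal(size=(n_hidden, n_hidden))    # hidden -> hidden (feedback loop)
W_out = rng.normal(size=(n_out, n_hidden))       # hidden -> output

def run(sequence):
    """Process a univariate time series one step at a time (short-term memory)."""
    h = np.zeros(n_hidden)                       # initial hidden state
    outputs = []
    for x_t in sequence:
        h = np.tanh(W_in @ np.atleast_1d(x_t) + W_rec @ h)
        outputs.append(W_out @ h)
    return np.array(outputs)

y = run([0.5, -0.1, 0.3, 0.8])
print(y.shape)    # (4, 1): one output per time step
```

Because h is carried from step to step, earlier inputs influence later outputs, which is the short-term memory that makes these networks attractive for time series prediction.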

However, it is important to point out that despite the just-mentioned virtues of recurrent artificial neural networks, they remain largely theoretical and produce mixed results (good and bad) in real applications. On the other hand, feedforward networks are the most popular since they have been successfully implemented in many application domains; the multilayer perceptron (MLP; another name given to feedforward networks) is the de facto standard artificial neural network topology (Lantz 2015 ). There are other DNN topologies, like the convolutional neural networks presented in Chap. 13 , which can also be found in books specializing in deep learning.

10.6 Successful Applications of ANN and DL

The success of ANN and DL is due to remarkable results on perceptual problems such as seeing and hearing—problems involving skills that seem natural and intuitive to humans but have long been elusive for machines. Next, we provide some of these successful applications:

Near-human-level image classification, speech recognition, handwriting transcription, autonomous driving (Chollet and Allaire 2017 )

Automatic translation of text and images (LeCun et al. 2015 )

Improved text-to-speech conversion (Chollet and Allaire 2017 )

Digital assistants such as Google Now and Amazon Alexa

Improved ad targeting, as used by Google, Baidu, and Bing

Improved search results on the Web (Chollet and Allaire 2017 )

Ability to answer natural language questions (Goldberg 2016 )

In games like chess, Jeopardy, GO, and poker (Makridakis et al. 2018 )

Self-driving cars (Liu et al. 2017 )

Voice search and voice-activated intelligent assistants (LeCun et al. 2015 )

Automatically adding sound to silent movies (Chollet and Allaire 2017 )

Energy market price forecasting (Weron 2014 )

Image recognition (LeCun et al. 2015 )

Prediction of time series (Dingli and Fournier 2017 )

Predicting breast, brain (Cole et al. 2017 ), or skin cancer

Automatic image captioning (Chollet and Allaire 2017 )

Predicting earthquakes (Rouet-Leduc et al. 2017 )

Genomic prediction (Montesinos-López et al. 2018a , b )

It is important to point out that the applications of ANN and DL are not restricted to perception and natural language understanding; they also extend to areas such as formal reasoning, and there are many successful applications in biological science. For example, deep learning has been successfully applied for predicting univariate continuous traits (Montesinos-López et al. 2018a ), multivariate continuous traits (Montesinos-López et al. 2018b ), univariate ordinal traits (Montesinos-López et al. 2019a ), and multivariate traits with mixed outcomes (Montesinos-López et al. 2019b ) in the context of genomic-based prediction. Menden et al. ( 2013 ) applied a DL method to predict the viability of a cancer cell line exposed to a drug. Alipanahi et al. ( 2015 ) used DL with a convolutional network architecture (an ANN with convolutional operations; see Chap. 13 ) to predict specificities of DNA- and RNA-binding proteins. Tavanaei et al. ( 2017 ) used a DL method for predicting tumor suppressor genes and oncogenes. DL methods have also made accurate predictions of single-cell DNA methylation states (Angermueller et al. 2016 ). In the area of genomic selection, we mention two reports only: (a) McDowell and Grant ( 2016 ) found that DL methods performed similarly to several Bayesian and linear regression techniques that are commonly employed for phenotype prediction and genomic selection in plant breeding and (b) Ma et al. ( 2017 ) also used a DL method with a convolutional neural network architecture to predict phenotypes from genotypes in wheat and found that the DL method outperformed the GBLUP method. A review of DL applications to genomic selection is provided by Montesinos-López et al. ( 2021 ).

10.7 Loss Functions

A loss function (also known as an objective function) is, in general terms, a function that maps an event or the values of one or more variables onto a real number intuitively representing some "cost" associated with the event. An optimization problem seeks to minimize a loss function. An objective function is either a loss function or its negative (in specific domains variously called a reward function, a profit function, a utility function, a fitness function, etc.), in which case the goal is maximization. In the statistical machine learning domain, a loss function quantifies how close the predicted values produced by an artificial neural network or DL model are to the true values. That is, the loss function measures the quality of the network's output by computing a distance score between the observed and predicted values (Chollet and Allaire 2017 ). The basic idea is to calculate a metric based on the observed error between the true and predicted values to measure how well the artificial neural network model's prediction matches what was expected. These errors are then averaged over the entire data set to provide a single number that represents how the artificial neural network is performing with regard to its ideal. In looking for this ideal, it is possible to find the parameters (weights and biases) of the artificial neural network that minimize the "loss" produced by the errors. Training ANN models with loss functions allows the use of optimization methods to estimate the required parameters. Although most of the time it is not possible to obtain an analytical solution for the parameters, very often good approximations can be obtained using iterative optimization algorithms like gradient descent (Patterson and Gibson 2017 ). Next, we provide the most used loss functions for each type of response variable.

10.7.1 Loss Functions for Continuous Outcomes

Sum of square error loss.

This loss function is appropriate for continuous response variables (outcomes), assuming that we want to predict L response variables. The error (the difference between the observed ( y ij ) and predicted ( \( {\hat{y}}_{ij} \) ) values) is squared and summed over the number of observations, since the training of the network is not local but global and must capture all possible trends in the training data. The expression used for the sum of square error (SSE) loss is

\( L\left(\boldsymbol{w}\right)=\frac{1}{2}{\sum}_{i=1}^n{\sum}_{j=1}^L{\left({y}_{ij}-{\hat{y}}_{ij}\right)}^2 \)

Note that n is the size of the data set and L is the number of targets (outputs) the network has to predict. It is important to point out that when there is only one response variable, the sum over L is dropped. Also, the division by two is added for mathematical convenience (which will become clearer in the context of its gradient in backpropagation). One disadvantage of this loss function is that it is quite sensitive to outliers; for this reason, other loss functions have been proposed for continuous response variables. With the loss function, it is possible to calculate the loss score, which is used as a feedback signal to adjust the weights of the artificial neural network; this process of adjusting the weights is illustrated in Fig. 10.15 (Chollet and Allaire 2017 ). It is also common practice to use as a loss function the SSE divided by the training sample size ( n ) multiplied by the number of outputs ( L ).

figure 15

The loss score is used as a feedback signal to adjust the weights

Figure 10.15 shows that the learning process of an artificial neural network involves the interaction of the layers, the input data, the loss function, which defines the feedback signal used for learning, and the optimizer, which determines how the learning proceeds and uses the loss value to update the network's weights. Initially, the weights of the network are assigned small random values; since these give outputs far from the ideal values, the loss score is high. At each iteration of the network process, the weights are adjusted a little to reduce the difference between the observed and predicted values and, of course, to decrease the loss score. This is the basic step of the training process of statistical machine learning models in general, and when this process is repeated a sufficient number of times (on the order of thousands of iterations), it yields weight values that minimize the loss function. A network with minimal loss is one in which the observed and predicted values are very close; it is called a trained network (Chollet and Allaire 2017 ). There are other options of loss functions for continuous data like the sum of absolute percentage error loss (SAPE): \( L\left(\boldsymbol{w}\right)={\sum}_{i=1}^n{\sum}_{j=1}^L\left|\frac{{\hat{y}}_{ij}-{y}_{ij}}{y_{ij}}\right| \) and the sum of squared log error loss (Patterson and Gibson 2017 ): \( L\left(\boldsymbol{w}\right)={\sum}_{i=1}^n{\sum}_{j=1}^L{\left(\log \left({\hat{y}}_{ij}\right)-\log \left({y}_{ij}\right)\right)}^2 \) , but the SSE is popular in ANN and DL models due to its nice mathematical properties.
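The loop of Fig. 10.15 (compute predictions, score them with the loss, adjust the weights a little) can be sketched for the simplest possible case: a single linear layer trained by plain gradient descent on the SSE loss. The data, learning rate, and iteration count below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy training set: n = 100 observations, d = 4 inputs, one continuous output
n, d = 100, 4
X = rng.normal(size=(n, d))
true_w = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ true_w + rng.normal(scale=0.1, size=n)

def sse_loss(w):
    """Sum of squared errors, divided by two for mathematical convenience."""
    e = X @ w - y
    return 0.5 * np.sum(e ** 2)

# Gradient descent: repeatedly adjust the weights a little to decrease the loss score
w = np.zeros(d)                      # initial weight values
lr = 0.001                           # learning rate
for _ in range(500):
    grad = X.T @ (X @ w - y)         # gradient of the SSE loss with respect to w
    w -= lr * grad

print(sse_loss(w) < sse_loss(np.zeros(d)))    # True: training decreased the loss
```

The same compute/score/adjust cycle drives the training of deep networks; backpropagation simply supplies the gradient for the multilayer case.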

10.7.2 Loss Functions for Binary and Ordinal Outcomes

Next, we provide two popular loss functions for binary data: the hinge loss and the cross-entropy loss.

Hinge loss

This loss function originated in the context of the support vector machine for “maximum-margin” classification, and is defined as

\( L\left(\boldsymbol{w}\right)={\sum}_{i=1}^n{\sum}_{j=1}^L\max \left(0,1-{y}_{ij}{\hat{y}}_{ij}\right) \)

It is important to point out that since this loss function is appropriate for binary data, the intended response variable output is denoted as +1 for success and −1 for failure.

Logistic loss

This loss function is defined as

\( L\left(\boldsymbol{w}\right)=-{\sum}_{i=1}^n{\sum}_{j=1}^L\left[{y}_{ij}\log \left({\hat{y}}_{ij}\right)+\left(1-{y}_{ij}\right)\log \left(1-{\hat{y}}_{ij}\right)\right] \)

This loss function originated as the negative log-likelihood of a product of Bernoulli distributions. It is also known as cross-entropy loss, since we arrive at the logistic loss by calculating the cross-entropy, a measure of the divergence between the predicted probability distribution and the true distribution. The logistic loss is preferred over the hinge loss when the scientist is mostly interested in the probabilities of success rather than in just the hard classifications. For example, when a scientist is interested in the probability that a patient will get cancer as a function of a set of covariates, the logistic loss is preferred since it allows calculating true probabilities.
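A minimal sketch of the two binary losses, following the coding conventions in the text (±1 targets with a raw score for the hinge loss; 0/1 targets with a predicted probability for the logistic loss); the toy values are illustrative:

```python
import numpy as np

def hinge_loss(score, y):
    # y coded as +1 (success) / -1 (failure); `score` is the raw network output
    return np.sum(np.maximum(0.0, 1.0 - y * score))

def logistic_loss(p_hat, y):
    # y coded as 1 / 0; p_hat is the predicted probability of success
    return -np.sum(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))

y_pm = np.array([1.0, -1.0])     # +1/-1 targets for the hinge loss
score = np.array([0.8, -2.0])    # raw scores
y01 = np.array([1.0, 0.0])       # 0/1 targets for the logistic loss
p_hat = np.array([0.9, 0.2])     # predicted probabilities
```

Note that the hinge loss is zero for a sample as soon as the score is on the correct side with a margin of at least one, whereas the logistic loss keeps rewarding probabilities that move closer to the true label.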

When the number of classes is more than two, that is, when we are in the presence of categorical data, the loss function according to Patterson and Gibson ( 2017 ) is known as categorical cross-entropy and is equal to

\( L\left(\boldsymbol{w}\right)=-{\sum}_{i=1}^n{\sum}_{j=1}^L{y}_{ij}\log \left({\hat{y}}_{ij}\right) \)
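With one-hot-coded targets, the categorical cross-entropy can be sketched as follows (the class probabilities are illustrative):

```python
import numpy as np

def categorical_cross_entropy(p_hat, y):
    # y: one-hot target matrix (n, L); p_hat: predicted class probabilities (n, L)
    return -np.sum(y * np.log(p_hat))

# Toy example: n = 2 samples, L = 3 classes
y = np.array([[1, 0, 0], [0, 1, 0]])
p_hat = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
```

Because y is one-hot, only the log-probability assigned to the true class of each sample contributes to the loss.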

Poisson loss

This loss function is built as the minus log-likelihood of a Poisson distribution and is appropriate for predicting count outcomes. It is defined (up to an additive constant) as

\( L\left(\boldsymbol{w}\right)={\sum}_{i=1}^n{\sum}_{j=1}^L\left({\hat{y}}_{ij}-{y}_{ij}\log \left({\hat{y}}_{ij}\right)\right) \)

Also, for count data the loss function can be obtained under a negative binomial distribution, which can do a better job than the Poisson distribution when the assumption of equal mean and variance is hard to justify.
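A sketch of the Poisson loss as the negative log-likelihood, dropping the additive log( y !) term that does not depend on the weights (the toy counts are illustrative):

```python
import numpy as np

def poisson_loss(mu_hat, y):
    # Negative Poisson log-likelihood, up to the constant term log(y!)
    return np.sum(mu_hat - y * np.log(mu_hat))

# Toy example: observed counts and predicted means
y = np.array([2.0, 0.0])
mu_hat = np.array([1.5, 0.5])   # predicted means (must be positive)
```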

10.7.3 Regularized Loss Functions

Regularization is a method that helps to reduce the complexity of the model and significantly reduces the variance of statistical machine learning models without any substantial increase in their bias. For this reason, to prevent overfitting and improve the generalizability of our models, we use regularization (penalization), which is concerned with reducing testing errors so that the model performs well on new data as well as on training data. Regularized or penalized loss functions are those that, instead of minimizing the conventional loss function, L ( w ), minimize an augmented loss function that consists of the sum of the conventional loss function and a penalty (or regularization) term that is a function of the weights. This is defined as

\( L\left(\boldsymbol{w},\lambda \right)=L\left(\boldsymbol{w}\right)+\lambda {E}_P \)

where L ( w ,  λ ) is the regularized (or penalized) loss function, λ is the degree or strength of the penalty term, and E P is the penalization proposed for the weights; this is known as the regularization term. The regularization term shrinks the weight estimates toward zero, which helps to reduce the variance of the estimates at the cost of some increase in the bias of the weights, which in turn helps to improve the out-of-sample predictions of statistical machine learning models (James et al. 2013 ). The penalization term is introduced using exactly the same logic as in Ridge regression in Chap. 3 . Depending on the form of E P , there is a name for the type of regularization. For example, when E P  =  w T w , it is called Ridge penalty or weight decay penalty. This regularization is also called L2 penalty and has the effect that larger weights (positive or negative) result in larger penalties. On the other hand, when \( {E}_P={\sum}_{p=1}^P\left|{w}_p\right| \) , that is, when the E P term is equal to the sum of the absolute weights, the name of this regularization is Least Absolute Shrinkage and Selection Operator (Lasso) or simply L1 regularization. The L1 penalty produces a sparse solution (more zero weights) because small and large weights are penalized equally, which forces some weights to be exactly zero when λ is sufficiently large (James et al. 2013 ; Wiley 2016 ); for this reason, the Lasso penalization also performs variable selection and provides a model more interpretable than the Ridge penalty. By combining Ridge (L2) and Lasso (L1) regularization, we obtain Elastic Net regularization, where the loss function is defined as \( L\left(\boldsymbol{w},{\lambda}_1,{\lambda}_2\right)=L\left(\boldsymbol{w}\right)+0.5\times {\lambda}_1{\sum}_{p=1}^P\left|{w}_p\right|+0.5\times {\lambda}_2{\sum}_{p=1}^P{w}_p^2 \) , and where instead of one lambda parameter, two are needed.
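The three penalties can be written down directly; the Elastic Net combination below follows the two-lambda form given above (the base loss value and weights are illustrative):

```python
import numpy as np

def ridge_penalty(w):
    return np.sum(w ** 2)          # L2 penalty: w^T w

def lasso_penalty(w):
    return np.sum(np.abs(w))       # L1 penalty: sum of absolute weights

def elastic_net_loss(base_loss, w, lam1, lam2):
    # Regularized loss: L(w) + 0.5*lam1*L1 + 0.5*lam2*L2, as in the text
    return base_loss + 0.5 * lam1 * lasso_penalty(w) + 0.5 * lam2 * ridge_penalty(w)

# Toy weight vector; note the zero weight contributes nothing to either penalty
w = np.array([1.0, -2.0, 0.0])
```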

It is important to point out that more than one penalty hyperparameter may be needed in ANN and DL models, since different degrees of penalty can be applied to different layers. This differential penalization is sometimes desirable to improve the predictions on new data, but it has the disadvantage that more hyperparameters need to be tuned, which increases the computational cost of the optimization process (Wiley 2016 ).

In all types of regularization, when λ  = 0 (or λ 1  =  λ 2  = 0), the penalty term has no effect, but the larger the value of λ , the stronger the shrinkage and the closer the weight estimates approach zero. The selection of an appropriate value of λ is challenging and critical; for this reason, λ is also treated as a hyperparameter that needs to be tuned and is usually optimized by evaluating a range of possible λ values through cross-validation. It is also important to point out that scaling the input data before implementing artificial neural networks is recommended, since the effect of the penalty depends on the size of the weights, and the size of the weights depends on the scale of the data. Also, the user should recall from Chap. 3 , where Ridge regression was presented, that the shrinkage penalty is applied to all the weights except the intercept or bias terms (Wiley 2016 ).

Another type of regularization that is very popular in ANN and DL is dropout, which consists of setting to zero a random fraction (or percentage) of the weights of the input or hidden neurons. Suppose that our original topology is the one given in Fig. 10.16a , where all the neurons are active (with weights different from zero). When a random fraction of neurons is dropped out, all their connections (weights) are set to zero; the topology with the dropped-out neurons is shown in Fig. 10.16b . The contribution of the dropped-out neurons to the activation of downstream neurons is temporarily removed on the forward pass, and no weight updates are applied to these neurons on the backward pass. Dropout is only used during the training of a model, not when evaluating its skill; it prevents the units from co-adapting too much.

figure 16

Feedforward neural network with four layers. ( a ) Three input neurons, four neurons in hidden layers 1 and 3, and five neurons in hidden layer 2, without dropout, and ( b ) the same network with dropout: one input neuron and three neurons in hidden layers 1–3 dropped out
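Dropout during training amounts to multiplying the activations by a random binary mask, as in this minimal sketch (the 40% rate and array sizes are illustrative; practical implementations usually also rescale the surviving activations so that evaluation needs no change):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, rate):
    """Zero out a random fraction `rate` of the activations (training only)."""
    mask = rng.random(activations.shape) >= rate  # True = neuron kept
    return activations * mask

# Activations of a hidden layer: 4 samples x 5 neurons, all ones for clarity
v = np.ones((4, 5))
v_dropped = dropout(v, 0.4)   # roughly 40% of entries set to zero
```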

This type of regularization is very simple, and there is a lot of empirical evidence of its power to avoid overfitting. It is quite new in the context of statistical machine learning and was proposed by Srivastava et al. ( 2014 ) in the paper Dropout: A Simple Way to Prevent Neural Networks from Overfitting . There are no unique rules for choosing the percentage of neurons that will be dropped out, but some tips are given below:

Usually a good starting point is to use 20% dropout, but values between 20% and 50% are reasonable. A percentage that is too low has minimal effect and a value that is too high results in underfitting the network.

The larger the network, the more likely dropout is to improve performance, because the model has more opportunity to learn independent representations.

Application of dropout is not restricted to hidden neurons; it can also be applied in the input layer. In both cases, there is evidence that it improves the performance of the ANN model.

When using dropout, it is suggested to increase the learning rate (a tuning parameter of the optimization algorithm that regulates the step size at each epoch (iteration) while moving toward a minimum of the loss function) by a factor of 10–100, as well as to increase the momentum value (another tuning parameter, useful for computing the gradient at each iteration), for example, from 0.90 to 0.99.

When dropout is used, it is also a good idea to constrain the size of the network weights, since the larger the learning rate, the larger the network weights become. For this reason, constraining the size of the network weights to less than five in absolute value with max-norm regularization has been shown to improve results.

It is important to point out that all the loss functions described in the previous section can be converted to regularized (penalized) loss functions using the elements given in this section. The dropout method can also be implemented with any type of loss function.

10.7.4 Early Stopping Method of Training

During the training process, ANN and DL models learn in stages, from simple realizations to complex mapping functions. This process is captured by monitoring the behavior of the mean squared error between observed and predicted values, which first decreases rapidly as the number of epochs (an epoch is one cycle through the full training data set) increases, and then decreases slowly as the error surface approaches a local minimum. However, to attain the largest generalization power of a model, it is necessary to figure out when it is best to stop training, which is very challenging, since a very early stopping point can produce underfitting, while a very late stopping point can produce overfitting of the training data. As mentioned in Chap. 4 , one way to avoid overfitting is to use a CV strategy, where the training set is split into a training-inner and a testing-inner set; with the training-inner set, the model is trained for each set of hyperparameters, and with the testing-inner (tuning) set, the power to predict out-of-sample data is evaluated, and in this way the optimal hyperparameters are obtained. However, we can combine the early stopping method with CV to better combat overfitting by using the CV strategy in the usual way with a minor modification, which consists of stopping the training session periodically (i.e., every so many epochs) and testing the model on the validation subset (Haykin 2009 ). In other words, the early stopping method combined with the CV strategy, a periodic “estimation-followed-by-validation” process, basically proceeds as follows:

After a period of estimation (training)—every three epochs, for example—the weights and bias (intercept) parameters of the multilayer perceptron are all fixed, and the network is operated in its forward mode. Then the training and validation errors are computed.

When the validation prediction performance is completed, the estimation (training) is started again for another period, and the process is repeated.

Due to its nature (just described above), which is simple to understand and easy to implement in practice, this method is called the early stopping method of training. To better understand this method, Fig. 10.17 conceptualizes this approach with two learning curves, one for the training subset and the other for the validation subset. Figure 10.17 shows that the prediction error in terms of MSE is lower in the training set than in the validation set, which is expected. The estimation learning curve that corresponds to the training set decreases monotonically as the number of epochs increases, which is normal, while the validation learning curve decreases monotonically to a minimum and then, as the training continues, starts to increase. The estimation learning curve of the training set suggests that we can do better by going beyond the minimum point on the validation learning curve, but this is not really true, since in essence what is learned beyond this point is the noise contained in the training data. For this reason, the minimum point on the validation learning curve can be used as a sensible criterion for stopping the training session. However, the validation error does not evolve as smoothly as the idealized curve shown in Fig. 10.17 ; it often exhibits a few local minima of its own before it starts to increase with an increasing number of epochs. For this reason, in the presence of two or more local minima, the selection of a “slower” stopping criterion (i.e., a criterion that stops later than other criteria) makes it possible to attain a small improvement in generalization performance (typically about 4%, on average) at the cost of a much longer training period (about a factor of four, on average).
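The estimation-followed-by-validation loop above can be sketched as a simple patience rule: stop when the validation error has not improved for a given number of consecutive checks. The error sequence here is a hypothetical stand-in for values computed every few epochs:

```python
def early_stop_index(val_errors, patience=3):
    """Return the index of the lowest validation error seen before stopping.
    Stops once the error has failed to improve `patience` times in a row."""
    best, best_i, waited = float("inf"), 0, 0
    for i, e in enumerate(val_errors):
        if e < best:
            best, best_i, waited = e, i, 0   # new minimum: reset the wait counter
        else:
            waited += 1
            if waited >= patience:
                break                         # "slower" criteria use a larger patience
    return best_i

# Hypothetical validation errors recorded at periodic checks
val = [0.9, 0.7, 0.55, 0.5, 0.52, 0.56, 0.6, 0.65]
```

A larger `patience` corresponds to the "slower" stopping criterion discussed above: it tolerates more local minima at the cost of longer training.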

figure 17

Schematic representation of the early stopping rule based on cross-validation (Haykin 2009 )

10.8 The King Algorithm for Training Artificial Neural Networks: Backpropagation

The training process of an ANN, which consists of adjusting the connection weights, requires a lot of computational resources. For this reason, although ANN had been studied for many decades, few real applications were available until the mid-to-late 1980s, when the backpropagation method arrived. This method is attributed to Rumelhart et al. ( 1986 ); other research teams published the backpropagation algorithm independently around the same time, but this is the most cited publication. The algorithm led to the resurgence of ANN after the 1980s, although it is still considerably slower than other statistical machine learning algorithms. Some advantages of this algorithm are that (a) it is able to make predictions of categorical or continuous outcomes, (b) it does a better job of capturing complex patterns than nearly any other algorithm, and (c) it makes few assumptions about the underlying relationships in the data. However, the algorithm is not without weaknesses, some of which are that (a) it is very slow to train, since the more complex the network topology, the more computational resources are needed (this is true not only for ANN but for any algorithm), (b) it is very susceptible to overfitting the training data, and (c) its results are difficult to interpret (Lantz 2015 ).

Next, we provide the derivation of the backpropagation algorithm for the multilayer perceptron network shown in Fig. 10.18 .

figure 18

Schematic representation of a multilayer feedforward network with one hidden layer, eight input variables, and three output variables

As mentioned earlier, the goal of the backpropagation algorithm is to find the weights of a multilayered feedforward network. The multilayered feedforward network given in Fig. 10.18 is able to approximate any function to any degree of accuracy (Cybenko 1989 ) with enough hidden units, as stated by the universal approximation theorem (Sect. 10.4 ), which makes it a powerful statistical machine learning tool. Suppose that we provide this network with n input patterns of the form

\( {\boldsymbol{x}}_i={\left[{x}_{i1},\dots, {x}_{iP}\right]}^{\mathrm{T}} \)

where x i denotes the input pattern of individual i , with i  = 1, …, n , and x ip denotes the p th input of x i . Let y ij denote the response variable of the i th individual for the j th output, associated with the input pattern x i . To train the neural network, we must learn the functional relationship between the inputs and outputs. To illustrate the learning process of this relationship, we use the SSE loss function (explained in the section about loss functions) to optimize the weights, which is defined as

\( E\left(\boldsymbol{w}\right)=\frac{1}{2 nL}{\sum}_{i=1}^n{\sum}_{j=1}^L{\left({\hat{y}}_{ij}-{y}_{ij}\right)}^2 \)  (10.5)

Now, to explain how the backpropagation algorithm works, we first explain how information is passed forward through the network. Providing the input values to the input layer is the first step; no operation is performed on this information, since it is simply passed to the hidden units. Then the net input into the k th hidden neuron is calculated as

\( {z}_{ik}^{(h)}={\sum}_{p=0}^P{w}_{kp}^{(h)}{x}_{ip} \)  (10.6)

Here P is the total number of explanatory variables or input nodes, \( {w}_{kp}^{(h)} \) is the weight from input unit p to hidden unit k , the superscript h refers to the hidden layer, and x ip is the value of the p th input for pattern or individual i . It is important to point out that the bias term ( \( {b}_k^{(h)} \) ) of neuron k in the hidden layer has been excluded from ( 10.6 ) because the bias can be accounted for by adding an extra neuron to the input layer and fixing its value at 1. Then the output of the k th neuron, resulting from applying an activation function to its net input, is

\( {V}_{ik}^{(h)}={g}^{(h)}\left({z}_{ik}^{(h)}\right) \)  (10.7)

where g ( h ) is the activation function applied to the net input of any neuron k of the hidden layer. In a similar vein, with all the outputs of the neurons in the hidden layer, we can estimate the net input of the j th neuron of the output layer as

\( {z}_{ij}^{(l)}={\sum}_{k=0}^M{w}_{jk}^{(l)}{V}_{ik}^{(h)} \)  (10.8)

where M is the number of neurons in the hidden layer and \( {w}_{jk}^{(l)} \) represents the weight from hidden unit k to output j . The superscript l refers to the output layer. Here, too, the bias term ( \( {b}_j^{(l)} \) ) of neuron j in the output layer was not included in ( 10.8 ), since it can be accounted for by adding an extra neuron to the hidden layer and fixing its value at 1. Now, by applying the activation function to the net input of the j th neuron of the output layer, we get the predicted value of the j th output as

\( {\hat{y}}_{ij}={g}^{(l)}\left({z}_{ij}^{(l)}\right) \)  (10.9)

where \( {\hat{y}}_{ij} \) is the predicted value of individual i in output j and g ( l ) is the activation function of the output layer. We are interested in learning the weights ( \( {w}_{kp}^{(h)},{w}_{jk}^{(l)} \) ) that minimize the sum of squared errors known as the mean square loss function ( 10.5 ), which is a function of the unknown weights, as can be observed in ( 10.6 )–( 10.8 ). The partial derivatives of the loss function with respect to the weights represent the rate of change of the loss function with respect to the weights (this is the slope of the loss function). The loss function will decrease when moving the weights down this slope. This is the intuition behind the iterative method called backpropagation for finding the optimal weights and biases. This method consists of evaluating the partial derivatives of the loss function with respect to the weights and then moving these values down the slope, until the score of the loss function no longer decreases. For example, if we make the variation of the weights proportional to the negative of the gradient, the change in the weights is in the right direction. The change in the weights connecting the hidden units to the output units ( \( {w}_{jk}^{(l)} \) ), based on the gradient of the loss function given in ( 10.5 ), is given by

\( \Delta {w}_{jk}^{(l)}=-\eta \frac{\partial E}{\partial {w}_{jk}^{(l)}} \)  (10.10)
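The forward pass of ( 10.6 )–( 10.9 ) in matrix form might look as follows, with the bias handled by prepending a constant-1 column as described in the text (the sigmoid activation and the shapes are illustrative choices):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W_h, W_l):
    """Forward pass of a one-hidden-layer network.
    X: (n, P) inputs; W_h: (P+1, M) hidden weights; W_l: (M+1, L) output weights.
    The extra row in each weight matrix corresponds to the constant-1 bias neuron."""
    n = X.shape[0]
    X1 = np.hstack([np.ones((n, 1)), X])   # prepend the bias "neuron"
    Z_h = X1 @ W_h                         # net inputs of the hidden layer (10.6)
    V = sigmoid(Z_h)                       # hidden outputs (10.7)
    V1 = np.hstack([np.ones((n, 1)), V])
    Z_l = V1 @ W_l                         # net inputs of the output layer (10.8)
    return sigmoid(Z_l)                    # predicted values (10.9)
```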

where η is the learning rate that scales the step size and is specified by the user. To be able to calculate the adjustments for the weights connecting the hidden neurons to the outputs, \( {w}_{jk}^{(l)} \) , first we substitute ( 10.6 )–( 10.9 ) in ( 10.5 ), which yields

\( E\left(\boldsymbol{w}\right)=\frac{1}{2 nL}{\sum}_{i=1}^n{\sum}_{j=1}^L{\left({g}^{(l)}\left({\sum}_{k=0}^M{w}_{jk}^{(l)}{g}^{(h)}\left({\sum}_{p=0}^P{w}_{kp}^{(h)}{x}_{ip}\right)\right)-{y}_{ij}\right)}^2 \)

Then, by expanding ( 10.10 ) using the chain rule, we get

\( \Delta {w}_{jk}^{(l)}=-\eta \frac{\partial E}{\partial {\hat{y}}_{ij}}\frac{\partial {\hat{y}}_{ij}}{\partial {z}_{ij}^{(l)}}\frac{\partial {z}_{ij}^{(l)}}{\partial {w}_{jk}^{(l)}} \)

Next, we get each partial derivative:

\( \frac{\partial E}{\partial {\hat{y}}_{ij}}=-\left({y}_{ij}-{\hat{y}}_{ij}\right),\kern1em \frac{\partial {\hat{y}}_{ij}}{\partial {z}_{ij}^{(l)}}={g}^{(l)\acute{\mkern6mu}}\left({z}_{ij}^{(l)}\right),\kern1em \frac{\partial {z}_{ij}^{(l)}}{\partial {w}_{jk}^{(l)}}={V}_{ik}^{(h)} \)  (10.11)

By substituting these partial derivatives in ( 10.10 ), we obtain the change in the weights from the hidden units to the output units, \( \Delta {w}_{jk}^{(l)}, \) as

\( \Delta {w}_{jk}^{(l)}=\eta {\delta}_{ij}{V}_{ik}^{(h)} \)  (10.12)

where \( {\delta}_{ij}=\left({y}_{ij}-{\hat{y}}_{ij}\right)\ {g}^{(l)\acute{\mkern6mu}}\left({z}_{ij}^{(l)}\right) \) . Therefore, the formula used to update the weights from the hidden units to the output units is

\( {w}_{jk}^{(l)\left(t+1\right)}={w}_{jk}^{(l)(t)}+\eta {\delta}_{ij}{V}_{ik}^{(h)} \)

This equation reflects that the adjusted weights \( \Delta {w}_{jk}^{(l)} \) from ( 10.12 ) are added to the current estimate of the weights, \( {w}_{jk}^{(l)(t)} \) , to obtain the updated estimates, \( {w}_{jk}^{(l)\left(t+1\right)} \) .

Next, to update the weights connecting the input units to the hidden units, we follow a similar process as in ( 10.12 ). Thus

\( \Delta {w}_{kp}^{(h)}=-\eta \frac{\partial E}{\partial {w}_{kp}^{(h)}} \)  (10.14)

Using the chain rule, we get that

\( \frac{\partial E}{\partial {w}_{kp}^{(h)}}={\sum}_{j=1}^L\frac{\partial E}{\partial {\hat{y}}_{ij}}\frac{\partial {\hat{y}}_{ij}}{\partial {z}_{ij}^{(l)}}\frac{\partial {z}_{ij}^{(l)}}{\partial {V}_{ik}^{(h)}}\frac{\partial {V}_{ik}^{(h)}}{\partial {z}_{ik}^{(h)}}\frac{\partial {z}_{ik}^{(h)}}{\partial {w}_{kp}^{(h)}} \)

where \( \frac{\partial E}{\partial {\hat{y}}_{ij}} \) and \( \frac{\partial {\hat{y}}_{ij}}{\partial {z}_{ij}^{(l)}} \) are given in ( 10.11 ), while

\( \frac{\partial {z}_{ij}^{(l)}}{\partial {V}_{ik}^{(h)}}={w}_{jk}^{(l)},\kern1em \frac{\partial {V}_{ik}^{(h)}}{\partial {z}_{ik}^{(h)}}={g}^{(h)\acute{\mkern6mu}}\left({z}_{ik}^{(h)}\right),\kern1em \frac{\partial {z}_{ik}^{(h)}}{\partial {w}_{kp}^{(h)}}={x}_{ip} \)

Substituting back into ( 10.14 ), we obtain the change in the weights from the input units to the hidden units, \( \Delta {w}_{kp}^{(h)} \) , as

\( \Delta {w}_{kp}^{(h)}=\eta {\psi}_{ik}{x}_{ip} \)

where \( {\psi}_{ik}={\sum}_{j=1}^L{\delta}_{ij}{w}_{jk}^{(l)}{g}^{(h)\acute{\mkern6mu}}\left({z}_{ik}^{(h)}\right) \) . The summation over the number of output units is needed because each hidden neuron is connected to all the output units; therefore, all the outputs are affected when a weight connecting an input unit to a hidden unit changes. In a similar way, the formula for updating the weights from the input units to the hidden units is

\( {w}_{kp}^{(h)\left(t+1\right)}={w}_{kp}^{(h)(t)}+\eta {\psi}_{ik}{x}_{ip} \)

This equation also reflects that the adjusted weights \( \Delta {w}_{kp}^{(h)} \) are added to the current estimate of the weights, \( {w}_{kp}^{(h)(t)} \) , to obtain the updated estimates, \( {w}_{kp}^{(h)\left(t+1\right)} \) . Now we are able to write down the processing steps needed to compute the change in the network weights using the backpropagation algorithm. We define w as the entire collection of weights.

10.8.1 Backpropagation Algorithm: Online Version

10.8.1.1 Feedforward Part

Step 1. Initialize the weights to small random values, and define the learning rate ( η ) and the minimum expected loss score (tol), a small value such that when the global error falls below it, the training process stops.

Step 2. If the stopping condition is false, perform steps 3–14.

Step 3. Select a pattern x i  = [ x i 1 , …,  x iP ] T as the input vector, either sequentially ( i  = 1 up to the number of samples n ) or at random.

Step 4. The net inputs of the hidden layer are calculated: \( {z}_{ik}^{(h)}={\sum}_{p=0}^P{w}_{kp}^{(h)}{x}_{ip} \) , i  = 1, …, n and  k  = 0, …, M .

Step 5. The outputs of the hidden layer are calculated: \( {V}_{ik}^{(h)}={g}^{(h)}\left({z}_{ik}^{(h)}\right) \)

Step 6. The net inputs of the output layer are calculated: \( {z}_{ij}^{(l)}={\sum}_{k=0}^M{w}_{jk}^{(l)}{V}_{ik}^{(h)},j=1,\dots, L \)

Step 7. The predicted values (outputs) of the neural network are calculated: \( {\hat{y}}_{ij}={g}^{(l)}\left({z}_{ij}^{(l)}\right) \)

Step 8. Compute the contribution of pattern i to the loss function and accumulate it into the global error: \( E\left(\boldsymbol{w}\right)=E\left(\boldsymbol{w}\right)+\frac{1}{2 nL}{\sum}_{j=1}^L{\left({\hat{y}}_{ij}-{y}_{ij}\right)}^2 \) ; at the first pattern of each epoch, initialize E ( w ) = 0. Note that the value of the loss function is accumulated over all data pairs ( y ij , x i ).

10.8.1.2 Backpropagation Part

Step 9. The output errors are calculated: \( {\delta}_{ij}=\left({y}_{ij}-{\hat{y}}_{ij}\right)\ {g}^{(l)\acute{\mkern6mu}}\left({z}_{ij}^{(l)}\right) \)

Step 10. The hidden layer errors are calculated: \( {\psi}_{ik}={g}^{(h)\acute{\mkern6mu}}\left({z}_{ik}^{(h)}\right){\sum}_{j=1}^L{\delta}_{ij}{w}_{jk}^{(l)} \)

Step 11. The weights of the output layer are updated: \( {w}_{jk}^{(l)\left(t+1\right)}={w}_{jk}^{(l)(t)}+{\eta \delta}_{ij}{V}_{ik}^{(h)} \)

Step 12. The weights of the hidden layer are updated: \( {w}_{kp}^{(h)\left(t+1\right)}={w}_{kp}^{(h)(t)}+{\eta \psi}_{ik}{x}_{ip} \)

Step 13. If i  <  n , go to step 3; otherwise go to step 14.

Step 14. Once the learning of an epoch is complete ( i  =  n ), we check whether the global error satisfies the specified tolerance (tol). If E ( w )  <  tol, we terminate the learning process, which means that the network has been trained satisfactorily. Otherwise ( E ( w )  ≥  tol), go to step 3 and start a new learning epoch with i  = 1.

The backpropagation algorithm is iterative. This means that the search process occurs over multiple discrete steps, each step hopefully slightly improving the model parameters. Each step involves using the model with the current set of internal parameters to make predictions of some samples, comparing the predictions to the real expected outcomes, calculating the error, and using the error to update the internal model parameters. This update procedure is different for different algorithms, but in the case of ANN, as previously pointed out, the backpropagation update algorithm is used.
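The complete algorithm can also be sketched in batch matrix form (rather than the pattern-by-pattern online version above). Everything here is an illustrative sketch: sigmoid activations in both layers, small Gaussian starting weights, and biases absorbed as constant-1 inputs; the toy data simply maps x = 0 to 0 and x = 1 to 1:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, Y, M=3, eta=0.1, tol=0.025, max_epochs=20000, seed=1):
    """Backpropagation for a one-hidden-layer sigmoid network, batch matrix form.
    Names and shapes are illustrative, not the text's notation verbatim."""
    rng = np.random.default_rng(seed)
    n, P = X.shape
    L = Y.shape[1]
    W_h = rng.normal(0, 0.1, size=(P + 1, M))   # small random starting weights
    W_l = rng.normal(0, 0.1, size=(M + 1, L))
    X1 = np.hstack([np.ones((n, 1)), X])        # bias absorbed as constant-1 input
    for epoch in range(max_epochs):
        # --- feedforward ---
        V = sigmoid(X1 @ W_h)                   # hidden outputs
        V1 = np.hstack([np.ones((n, 1)), V])
        Y_hat = sigmoid(V1 @ W_l)               # predicted values
        E = np.sum((Y_hat - Y) ** 2) / (2 * n * L)
        if E < tol:
            break                               # network trained satisfactorily
        # --- backpropagation ---
        delta = (Y - Y_hat) * Y_hat * (1 - Y_hat)     # output errors (sigmoid g')
        psi = (delta @ W_l[1:].T) * V * (1 - V)       # hidden errors (skip bias row)
        W_l += eta * V1.T @ delta                     # update output weights
        W_h += eta * X1.T @ psi                       # update hidden weights
    return W_h, W_l, E

# Toy data: one input, one output; learn the mapping 0 -> 0, 1 -> 1
W_h, W_l, E = train(np.array([[0.0], [1.0]]), np.array([[0.0], [1.0]]), eta=0.5)
```

For sigmoid activations, g′(z) = g(z)(1 − g(z)), which is why the derivative terms appear as `Y_hat * (1 - Y_hat)` and `V * (1 - V)`.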

10.8.2 Illustrative Example 10.1: A Hand Computation

In this section, we provide a simple example that will be computed step by step by hand to fully understand how the training is done using the backpropagation method. The topology used for this example is given in Fig. 10.19 .

figure 19

A simple artificial neural network with one input, one hidden layer with one neuron, and one response variable (output)

The data set for this example is given in Table 10.1 , where we can see that the data collected consist of four observations, the response variable ( y ) takes values between 0 and 1, and the input information is for only one predictor ( x ). Additionally, Table 10.1 gives the starting values for the hidden weights ( \( {w}_{kp}^{(h)} \) ) and for the output weights ( \( {w}_{jk}^{(l)} \) ). Because the response variable lies in the interval between zero and one, we use the sigmoid activation function for both the hidden layer and the output layer. A learning rate ( η ) equal to 0.1 and a tolerance equal to 0.025 were also used.

The backpropagation algorithm described before was given for one input pattern at a time; however, to simplify the calculations, we will implement this algorithm using the four patterns of data available simultaneously using matrix calculations. For this reason, first we build the design matrix of inputs and outputs:

We also define the vectors of the starting values of the hidden and output weights:

Here we can see that P  = 1 and M  = 2. Next we calculate the net inputs for the hidden layer as

Now the output for the hidden layer is calculated using the sigmoid activation function

where \( {V}_{ik}^{(h)}={g}^{(h)}\left({z}_{ik}^{(h)}\right),i=1,\dots, 4 \) and g ( h ) ( z ) = 1/(1 +  exp (− z )), which can be replaced by another desired activation function. Then the net inputs for the output layer are calculated as follows:

The predicted values (outputs) of the neural network are calculated as

where \( {\hat{y}}_i={g}^{(l)}\left({z}_{i1}^{(l)}\right),i=1,\dots, 4 \) and g ( l ) ( z ) = 1/(1 +  exp (− z )). Next the output errors are calculated using the Hadamard product, ∘, (element-wise matrix multiplication) as

The hidden layer errors are calculated as

where \( {\boldsymbol{w}}_1^{(l)} \) is w ( l ) without the weight of the intercept, that is, without the first element.

The weights of the output layer are updated:

where 2 denotes that the output weights are for epoch number 2. Then the weights for epoch 2 of the hidden layer are obtained with

We check whether the global error satisfies the specified tolerance (tol). Since \( E\left(\boldsymbol{w}\right)=\frac{1}{2n}{\sum}_{i=1}^n{\left({\hat{y}}_i-{y}_i\right)}^2=0.03519>\mathrm{tol}=0.025, \) we need to increase the number of epochs to satisfy the specified tol = 0.025.

Epoch 2. Using the updated weights of epoch 1, we obtain the new weights after epoch 2. First for the output layer:

And next for the hidden layer:

Now the predicted values are \( {\hat{y}}_1=0.8233 \) , \( {\hat{y}}_2=0.3754 \) , \( {\hat{y}}_3=0.8480 \) , and \( {\hat{y}}_4=0.2459 \) , and again we find that \( E\left(\boldsymbol{w}\right)=\frac{1}{2n}{\sum}_{i=1}^n{\left({\hat{y}}_i-{y}_i\right)}^2=0.03412>\mathrm{tol}=0.025. \) This means that we need to continue with more epochs to satisfy the specified tol = 0.025. The learning process, monitored through the decreasing MSE, is shown in Fig. 10.20 , where we can see that tol = 0.025 is reached at epoch 13, with an MSE =  E ( w )  =  0.02425.

figure 20

Behavior of the learning process by monitoring the MSE for Example 10.1—a hand computation
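The matrix-form epoch used in this example can be sketched as follows. The data and starting weights below are hypothetical placeholders, not the values of Table 10.1, so the numerical results differ from those in the text; the structure of the computation (Hadamard products for the error terms, matrix products for the updates) is the same:

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Hypothetical data and starting weights (stand-ins for Table 10.1)
X1 = np.array([[1, 0.1], [1, 0.4], [1, 0.7], [1, 0.9]])   # bias column + input x
y = np.array([[0.2], [0.5], [0.6], [0.9]])
W_h = np.array([[0.1, -0.2], [0.3, 0.4]])                 # (P+1) x M with M = 2
W_l = np.array([[0.2], [0.1], [-0.1]])                    # (M+1) x L with L = 1
eta = 0.1

# Feedforward over all four patterns simultaneously
V = sigmoid(X1 @ W_h)
V1 = np.hstack([np.ones((4, 1)), V])
y_hat = sigmoid(V1 @ W_l)

# Backpropagation: Hadamard products give the error terms
delta = (y - y_hat) * y_hat * (1 - y_hat)     # output errors
psi = (delta @ W_l[1:].T) * V * (1 - V)       # hidden errors (bias row excluded)

# Weight updates for the next epoch
W_l_new = W_l + eta * V1.T @ delta
W_h_new = W_h + eta * X1.T @ psi

# Global error check against the tolerance
E = np.sum((y_hat - y) ** 2) / (2 * 4)
```

Repeating this block with the updated weights reproduces the epoch-by-epoch loop of the hand computation.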

10.8.3 Illustrative Example 10.2—By Hand Computation

Table 10.2 gives the information for this example; the data collected contain five observations, the response variable ( y ) takes values between −1 and 1, and there are three inputs (predictors). Table 10.2 also provides the starting values for the hidden weights ( \( {w}_{kp}^{(h)} \) ) and for the output weights ( \( {w}_{jk}^{(l)} \) ). Because the response variable lies in the interval between −1 and 1, we use the hyperbolic tangent activation function (Tanh) for both the hidden and output layers. Here we use a learning rate ( η ) equal to 0.05 and a tolerance equal to 0.008 (Fig. 10.21 ).

figure 21

A simple artificial neural network with three inputs, one hidden layer with two neurons, and one response variable (output)

Here the backpropagation algorithm was implemented using the five patterns of data simultaneously via matrix calculations. Again, first we build the design matrix of inputs and outputs:

Then we define the vectors of starting values of the hidden ( w ( h ) ) and output ( w ( l ) ) weights:

Now P  = 3 and M  = 3. Next, we calculate the net inputs for the hidden layer as

Now with the tanh activation function, the output of the hidden layer is calculated:

Again \( {V}_{ik}^{(h)}={g}^{(h)}\left({z}_{ik}^{(h)}\right),i=1,\dots, 5;k=1,2 \) , and \( {g}^{(h)}(z)=\tanh (z)=\frac{\exp (z)-\exp \left(-z\right)}{\exp (z)+\exp \left(-z\right)} \) , which can also be replaced by another activation function. Then the net inputs for the output layer are calculated as

where \( {\hat{y}}_{i1}={g}^{(l)}\left({z}_{i1}^{(l)}\right),i=1,\dots, 5 \) and \( {g}^{(l)}(z)=\tanh (z)=\frac{\exp (z)-\exp \left(-z\right)}{\exp (z)+\exp \left(-z\right)} \) . The output errors are calculated as

where \( {\boldsymbol{w}}_1^{(h)} \) is w ( h ) without the weights of the intercepts, that is, without the first row.

The number 2 in w ( l )(2) indicates that these output weights are for epoch number 2. The weights of the hidden layer in epoch 2 are obtained with

We check whether the global error satisfies the specified tolerance (tol). Since \( E\left(\boldsymbol{w}\right)=\frac{1}{2n}{\sum}_{i=1}^n{\left({\hat{y}}_i-{y}_i\right)}^2=0.01104>\mathrm{tol}=0.008 \) , we have to continue with the next epoch by cycling through the training data again.

Epoch 2. Using the updated weights of epoch 1, we obtain the new weights for epoch 2.

For the output layer, these are

While for the hidden layer, they are

Now the predicted values are \( {\hat{y}}_1=0.6372 \) , \( {\hat{y}}_2=0.2620 \) , \( {\hat{y}}_3=-0.6573 \) , \( {\hat{y}}_4=0.6226, \) and \( {\hat{y}}_5=-0.9612 \) , and \( E\left(\boldsymbol{w}\right)=\frac{1}{2n}{\sum}_{i=1}^n{\left({\hat{y}}_i-{y}_i\right)}^2=0.01092>\mathrm{tol}=0.008, \) which means that we have to continue with the next epoch by cycling through the training data again. Figure 10.22 shows that E ( w ) does not fall below tol = 0.008 until epoch 83, where E ( w ) = 0.00799.

figure 22

Behavior of the learning process by monitoring the MSE for Example 10.2—a hand computation

In this algorithm, zero starting weights are not an option, because each layer is symmetric in the weights flowing to the different neurons. Instead, the starting values should be close to zero and can be drawn from random uniform or Gaussian distributions (Efron and Hastie 2016 ). One disadvantage of the basic backpropagation algorithm just described is that the learning parameter η is fixed.

Alipanahi B, Delong A, Weirauch MT, Frey BJ (2015) Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat Biotechnol 33:831–838


Anderson J, Pellionisz A, Rosenfeld E (1990) Neurocomputing 2: directions for research. MIT, Cambridge


Angermueller C, Pärnamaa T, Parts L, Stegle O (2016) Deep learning for computational biology. Mol Syst Biol 12(878):1–16

Chollet F, Allaire JJ (2017) Deep learning with R. Manning Publications, Manning Early Access Program (MEA), 1st edn

Cole JH, Rudra PK, Poudel DT, Matthan WA, Caan CS, Tim D, Spector GM (2017) Predicting brain age with deep learning from raw imaging data results in a reliable and heritable biomarker. NeuroImage 163(1):115–124. https://doi.org/10.1016/j.neuroimage.2017.07.059

Cybenko G (1989) Approximations by superpositions of sigmoidal functions. Math Control Signal Syst 2:303–314

Dingli A, Fournier KS (2017) Financial time series forecasting—a deep learning approach. Int J Mach Learn Comput 7(5):118–122

Dougherty G (2013) Pattern recognition and classification-an introduction. Springer Science + Business Media, New York

Efron B, Hastie T (2016) Computer age statistical inference. Algorithms, evidence, and data science. Cambridge University Press, New York

Francisco-Caicedo EF, López-Sotelo JA (2009) Una aproximación práctica a las redes neuronales artificiales. Universidad del Valle, Cali

Goldberg Y (2016) A primer on neural network models for natural language processing. J Artif Intell Res 57(345):420

Haykin S (2009) Neural networks and learning machines, 3rd edn. Pearson Prentice Hall, New York

Hornik K (1991) Approximation capabilities of multilayer feedforward networks. Neural Netw 4:251–257

James G, Witten D, Hastie T, Tibshirani R (2013) An introduction to statistical learning with applications in R. Springer, New York

Kohonen T (2000) Self-organizing maps. Springer, Berlin

Lantz B (2015) Machine learning with R, 2nd edn. Packt Publishing Ltd, Birmingham

LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444

Lewis ND (2016) Deep learning made easy with R. A gentle introduction for data science. CreateSpace Independent Publishing Platform

Liu S, Tang J, Zhang Z, Gaudiot JL (2017) CAAD: computer architecture for autonomous driving. arXiv preprint arXiv:1702.01894

Ma W, Qiu Z, Song J, Cheng Q, Ma C (2017) DeepGS: predicting phenotypes from genotypes using Deep Learning. bioRxiv 241414. https://doi.org/10.1101/241414

Makridakis S, Spiliotis E, Assimakopoulos V (2018) Statistical and Machine Learning forecasting methods: concerns and ways forward. PLoS One 13(3):e0194889. https://doi.org/10.1371/journal.pone.0194889

McCulloch WS, Pitts W (1943) A logical calculus of the ideas immanent in nervous activity. Bull Math Biophys 5:115–133

McDowell R, Grant D (2016) Genomic selection with deep neural networks. Graduate Theses and Dissertations, p 15973. https://lib.dr.iastate.edu/etd/15973

Menden MP, Iorio F, Garnett M, McDermott U, Benes CH et al (2013) Machine learning prediction of cancer cell sensitivity to drugs based on genomic and chemical properties. PLoS One 8:e61318

Montesinos-López A, Montesinos-López OA, Gianola D, Crossa J, Hernández-Suárez CM (2018a) Multi-environment genomic prediction of plant traits using deep learners with a dense architecture. G3: Genes, Genomes, Genetics 8(12):3813–3828. https://doi.org/10.1534/g3.118.200740

Montesinos-López OA, Montesinos-López A, Crossa J, Gianola D, Hernández-Suárez CM et al (2018b) Multi-trait, multi-environment deep learning modeling for genomic-enabled prediction of plant traits. G3: Genes, Genomes, Genetics 8(12):3829–3840. https://doi.org/10.1534/g3.118.200728

Montesinos-López OA, Vallejo M, Crossa J, Gianola D, Hernández-Suárez CM, Montesinos-López A, Juliana P, Singh R (2019a) A benchmarking between deep learning, support vector machine and bayesian threshold best linear unbiased prediction for predicting ordinal traits in plant breeding. G3: Genes, Genomes, Genetics 9(2):601–618

Montesinos-López OA, Martín-Vallejo J, Crossa J, Gianola D, Hernández-Suárez CM, Montesinos-López A, Juliana P, Singh R (2019b) New deep learning genomic prediction model for multi-traits with mixed binary, ordinal, and continuous phenotypes. G3: Genes, Genomes, Genetics 9(5):1545–1556

Montesinos-López OA, Montesinos-López A, Pérez-Rodríguez P, Barrón-López JA, Martini JWR, Fajardo-Flores SB, Gaytan-Lugo LS, Santana-Mancilla PC, Crossa J (2021) A review of deep learning applications for genomic selection. BMC Genomics 22:19

Patterson J, Gibson A (2017) Deep learning: a practitioner’s approach. O’Reilly Media

Ripley B (1993) Statistical aspects of neural networks. In: Barndorff-Nielsen OE, Jensen JL, Kendall WS (eds) Networks and chaos—statistical and probabilistic aspects. Chapman and Hall, London, pp 40–123

Rouet-Leduc B, Hulbert C, Lubbers N, Barros K, Humphreys CJ et al (2017) Machine learning predicts laboratory earthquakes. Geophys Res Lett 44(28):9276–9282

Rumelhart DE, Hinton GE, Williams RJ (1986) Learning internal representations by backpropagating errors. Nature 323:533–536

Shalev-Shwartz S, Ben-David S (2014) Understanding machine learning: from theory to algorithms. Cambridge University Press, New York

Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 15(6):1929–1958

Tavanaei A, Anandanadarajah N, Maida AS, Loganantharaj R (2017) A deep learning model for predicting tumor suppressor genes and oncogenes from PDB structure. bioRxiv 177378. https://doi.org/10.1101/177378

Weron R (2014) Electricity price forecasting: a review of the state-of-the-art with a look into the future. Int J Forecast 30(4):1030–1081

Wiley JF (2016) R deep learning essentials: build automatic classification and prediction models using unsupervised learning. Packt Publishing, Birmingham, Mumbai

Author information

Authors and Affiliations

Facultad de Telemática, University of Colima, Colima, México

Osval Antonio Montesinos López

Departamento de Matemáticas, University of Guadalajara, Guadalajara, México

Abelardo Montesinos López

Biometrics and Statistics Unit, CIMMYT, Edo de México, México

Jose Crossa

Colegio de Postgraduados, Edo de México, México

Rights and permissions

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License ( http://creativecommons.org/licenses/by/4.0/ ), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Copyright information

© 2022 The Author(s)

About this chapter

Montesinos López, O.A., Montesinos López, A., Crossa, J. (2022). Fundamentals of Artificial Neural Networks and Deep Learning. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction. Springer, Cham. https://doi.org/10.1007/978-3-030-89010-0_10

DOI: https://doi.org/10.1007/978-3-030-89010-0_10

Published: 14 January 2022

Publisher Name: Springer, Cham

Print ISBN: 978-3-030-89009-4

Online ISBN: 978-3-030-89010-0

eBook Packages: Biomedical and Life Sciences; Biomedical and Life Sciences (R0)


Artificial neural networks and their application to sequence recognition


  • Bengio, Yoshua
  • This thesis studies the introduction of a priori structure into the design of learning systems based on artificial neural networks applied to sequence recognition, in particular to phoneme recognition in continuous speech. Because we are interested in sequence analysis, algorithms for training recurrent networks are studied and an original algorithm for constrained recurrent networks is proposed and test results are reported. We also discuss the integration of connectionist models with other analysis tools that have been shown to be useful for sequences, such as dynamic programming and hidden Markov models. We introduce an original algorithm to perform global optimization of a neural network/hidden Markov model hybrid, and show how to perform such a global optimization on all the parameters of the system. Finally, we consider some alternatives to sigmoid networks: Radial Basis Functions, and a method for searching for better learning rules using a priori knowledge and optimization algorithms.
  • Computer Science.
  • Artificial Intelligence.
  • McGill University
  •  https://escholarship.mcgill.ca/concern/theses/qv33rx48q
  • School of Computer Science
  • Doctor of Philosophy
  • Theses & Dissertations

Modeling Intelligence via Graph Neural Networks (doctoral thesis, DSpace@MIT)

  • Open access
  • Published: 01 May 2024

Novel applications of Convolutional Neural Networks in the age of Transformers

  • Tansel Ersavas 1 ,
  • Martin A. Smith 1 , 2 , 3 , 4 &
  • John S. Mattick 1  

Scientific Reports volume 14, Article number: 10000 (2024)


Subjects: Computational science, Machine learning

Convolutional Neural Networks (CNNs) have been central to the Deep Learning revolution and played a key role in initiating the new age of Artificial Intelligence. However, in recent years newer architectures such as Transformers have dominated both research and practical applications. While CNNs still play critical roles in many of the newer developments such as Generative AI, they are far from being thoroughly understood and utilised to their full potential. Here we show that CNNs can recognise patterns in images with scattered pixels and can be used to analyse complex datasets by transforming them into pseudo images with minimal processing for any high dimensional dataset, representing a more general approach to the application of CNNs to datasets such as in molecular biology, text, and speech. We introduce a pipeline called DeepMapper , which allows analysis of very high dimensional datasets without intermediate filtering and dimension reduction, thus preserving the full texture of the data, enabling detection of small variations normally deemed ‘noise’. We demonstrate that DeepMapper can identify very small perturbations in large datasets with mostly random variables, and that it is superior in speed and on par in accuracy to prior work in processing large datasets with large numbers of features.


Introduction

There are exponential increases in data 1 especially from highly complex systems, whose non-linear interactions and relationships are not well understood, and which can display major or unexpected changes in response to small perturbations, known as the ‘Butterfly effect’ 2 .

In domains characterised by high-dimensional data, traditional statistical methods and Machine Learning (ML) techniques make heavy use of feature engineering that incorporates extensive filtering, selection of highly variable parameters, and dimension reduction techniques such as Principal Component Analysis (PCA) 3 . Most current tools filter out smaller changes in data, mostly considered artefacts or 'noise', which may contain information that is paramount to understanding the nature and behaviour of such highly complex systems 4 .
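The point about dimension reduction discarding small variations can be illustrated with a toy example (Python/NumPy; the data are synthetic and the sizes arbitrary): a feature with tiny variance is almost entirely erased by a rank-k PCA reconstruction, even if it happens to be the only informative one.

```python
import numpy as np

rng = np.random.default_rng(7)

# 9 high-variance noise features plus one tiny-variance feature (column 0)
# standing in for a small but potentially meaningful perturbation.
n = 500
X = rng.normal(0.0, 1.0, size=(n, 10))
X[:, 0] = rng.normal(0.0, 0.01, size=n)

# PCA via SVD, keeping only the top 3 principal components.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 3
X_rec = (U[:, :k] * s[:k]) @ Vt[:k, :]

# Fraction of the tiny feature's variance that survives the reduction:
retained = X_rec[:, 0].var() / Xc[:, 0].var()  # close to zero
```

The leading components align with the high-variance noise directions, so the small perturbation is effectively filtered out before any downstream analysis sees it.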

The emergence of Deep Learning (DL) offers a paradigm shift. DL algorithms, underpinned by adaptive learning mechanisms, can discern both linear and non-linear data intricacies, and open avenues to analyse data in ways that are not possible or practical with conventional techniques 5 , particularly in complex domains such as image and temporal sequence analysis, molecular biology, and astronomy 6 . DL models, such as Convolutional Neural Networks (CNNs) 7 , Recurrent Neural Networks (RNNs) 8 , Generative Networks 9 and Transformers 10 , have demonstrated exceptional performance in various domains, such as image and speech recognition, natural language processing, and game playing 6 . CNNs and LSTMs have been found to be effective tools for predicting the behaviour of so-called 'chaotic' systems 11 . Modern DL systems often surpass human-level performance, and challenge humans even in creative endeavours.

CNNs utilise a unique architecture that comprises several layers, including convolutional layers, pooling layers, and fully connected layers, to process and transform the input data hierarchically 5 . CNNs have no knowledge of sequence, and therefore are generally not used for analysing time-series or similar data, which is traditionally attempted with Recurrent Neural Networks (RNNs) 12 and Long Short-Term Memory networks (LSTMs) 8 due to their ability to capture temporal patterns. Where CNNs have been employed for sequence or time-series analysis, 1-dimensional (1D) CNNs have been selected because of their vector-based 1D input structure 13 . However, attempts to analyse such data with 1D CNNs do not always give superior results 14 . In addition, GPUs (Graphics Processing Units) are not always optimised for processing 1D CNNs; therefore, even though 1D CNNs have fewer parameters than 2-dimensional (2D) CNNs, 2D CNNs can outperform 1D CNNs 15 .

Transformers , introduced by Vaswani et al. 10 , have recently come to prominence, particularly for tasks where data are in the form of time series or sequences, in domains ranging from language modelling to stock market prediction 16 . Transformers leverage self-attention, a key component that allows a model to weigh and focus on various parts of an input sequence when producing an output, enabling the capture of long-range dependencies in data. Unlike CNNs, which use local receptive fields, self-attention weighs the significance of various parts of the input data 17 .

Following success with sequence-based tasks, Transformers are being extended to image processing. Vision-Transformers in object detection 18 , Detection Transformers 19 and lately Real-time Detection Transformers all claim superiority over CNNs 20 . However, their inference operations demand far more resources than CNNs and trail CNNs in flexibility. They also suffer similar augmentation problems as CNNs. More recently, Retentive-Networks have been offered as an alternative to Transformers 21 and may soon challenge the Transformer architecture.

CNNs can recognise dispersed patterns

Even though CNNs are widely used, there are some misconceptions, notably that CNNs are largely limited to image data, and require established spatial relationships between pixels in images, both of which are open to challenge. The latter is of particular importance when considering the potential of CNNs to analyse complex non-image datasets, whose data structures are arbitrary.

Moreover, while CNNs are universal function approximators 22 , they may not always generalise 23 , especially if they are trained on data that is insufficient to cover the solution space 24 . It is also known that they can spontaneously generalise even when supplied with a small number of samples during training after overfitting, called ‘grokking’ 25 , 26 . CNNs can generalise from scattered data if given enough samples, or if they grok, and this can be determined by observing changes to training versus testing accuracy and loss.

Non-image processing with CNNs

While CNNs have achieved remarkable success in computer vision applications, such as image classification and object detection 7 , 27 , they have also been employed in other domains to a lesser degree with impressive results, including: (1) natural language processing, text classification, sentiment analysis and named entity recognition, by treating text data as a one-dimensional image with characters represented as pixels 16 , 28 ; (2) audio processing, such as speech recognition, speaker identification and audio event detection, by applying convolutions over time frequency representations of audio signals 29 ; (3) time series analysis, such as financial market prediction, human activity recognition and medical signal analysis, using one-dimensional convolutions to capture local temporal patterns and learn features from time series data 30 ; and (4) biopolymer (e.g., DNA) sequencing, using 2D CNNs to accurately classify molecular barcodes in raw signals from Oxford Nanopore sequencers using a transformation to turn a 1D signal into 2D images—improving barcode identification recovery from 38 to over 85% 31 .

Indeed, CNNs are not perfect tools for image processing as they do not develop semantic understanding of images even though they can be trained to do semantic segmentation 32 . They cannot easily recognise negative images when trained with positive images 33 . CNNs are also sensitive to the orientation and scale of objects and must rely on augmentation of image datasets, often involving hundreds of variations of the same image 34 . There are no such changes in the perspective and orientation of data converted into flat 2D images.

In the realm of complex domains that generate huge amounts of data, augmentation is usually not required for non-image datasets, as the datasets will be rich enough. Moreover, introducing arbitrary augmentation does not always improve accuracy; indeed, introducing hand-tailored augmentation may hinder analysis 35 . If augmentation is required, it can be introduced in a data-oriented form, but even when using automated augmentation such as AutoAugment 35 or FasterAutoAugment 36 , many of the augmentations (such as shearing, translation, rotation, inversion, etc.) should not be used, and the result should be tested carefully, as augmentation may introduce artefacts.

A frequent problem with handling non-image datasets with many variables is noise. Many algorithms have been developed for noise elimination, most of which are domain specific. CNNs can be trained to use the whole input space with minimal filtering and no dimension reduction, and can find useful information in what might be ascribed as ‘noise’ 4 , 37 . Indeed, a key reason to retain ‘noise’ is to allow discovery of small perturbations that cannot be detected by other methods 11 .

Conversion of non-image data to artificial images for CNN processing

Transforming sequence data to images without resorting to dimension reduction or filtering offers a potent toolset for discerning complex patterns in time series and sequence data, which potentiates the two major advantages of CNNs compared to RNNs, LSTMs and Transformers . First, CNNs do not depend on past data to recognise current patterns, which increases sensitivity to patterns that appear at the beginning of time-series or sequence data. Second, 2D CNNs are better optimised for GPUs and highly parallelizable, and are consequently faster than other current architectures, which accelerates training and inference while significantly reducing resource and energy consumption in all phases, including image transformation, training, and inference.

Image data such as MNIST represented in a matrix can be classified by basic deep networks such as Multi-layer Perceptrons (MLPs) by turning their matrix representation into vectors (Fig.  1 a). With this approach, analysis becomes increasingly costly as the image size grows, because the number of MLP input parameters, and with it the computational cost, grows with the number of pixels. On the other hand, 2D CNNs can handle the original matrix much faster than an MLP with equal or better accuracy, and scale to much larger images.
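A quick back-of-the-envelope comparison makes the scaling concrete (the layer sizes are illustrative assumptions, not values from the paper): the dense first layer's parameter count grows with the pixel count, while a convolutional layer's does not.

```python
# Parameters of a dense first layer on a flattened image vs a conv layer.
# All sizes here are illustrative assumptions.
H = W = 224                 # image side length
dense_units = 512

# weights + biases for a fully connected layer on the flattened image
mlp_first_layer = (H * W) * dense_units + dense_units

# weights + biases for a 3x3 conv layer (1 input channel, 32 filters);
# note this count is independent of the image size
in_ch, out_ch, k = 1, 32, 3
conv_layer = out_ch * in_ch * k * k + out_ch

print(mlp_first_layer, conv_layer)  # 25690624 320
```

Doubling the image side quadruples the dense layer's cost but leaves the convolutional layer's parameter count unchanged.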

figure 1

Conversion of images to vectors and vice versa. ( a ) Basic operation of transformation of an image to a vector, forming a sequence representation of the numeric values of pixels. ( b ) Transformation of a vector to a matrix, forming an image by encoding numerical values as pixels. During this operation, if the vector size is smaller than the nearest m × n, the vector is padded with zeroes up to m × n.

Just as a simple neural network analyses a 2D image by turning it into a vector, the reciprocal is also true: data in a vector can be converted to a 2D matrix (Fig.  1 b). Vectors converted to such matrices form arbitrary patterns that are incomprehensible to the human eye. A similar technique for such mapping has also been proposed by Kovalerchuk et al. using another algorithm called CPC-R 38 .
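A minimal sketch of this vector-to-matrix folding (Python/NumPy; folding to a square matrix is a simplifying assumption here, as the paper's mapping allows general m × n shapes):

```python
import math
import numpy as np

def fold(vec):
    """Fold a 1-D vector into the smallest square matrix that can hold it,
    zero-padding the tail (a sketch of the mapping in Fig. 1b)."""
    vec = np.asarray(vec, dtype=float)
    side = math.isqrt(len(vec))
    if side * side < len(vec):
        side += 1
    padded = np.zeros(side * side)
    padded[: len(vec)] = vec
    return padded.reshape(side, side)

img = fold(np.arange(10))  # 10 values fold into a 4x4 matrix with 6 zero pads
```

Each feature lands on exactly one pixel, which is what later makes per-pixel attribution map directly back to input variables.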

Attribution

An important aspect of any analysis is to be able to identify those variables that are most important and the degree to which they contribute to a given classification. Identifying these variables is particularly challenging in CNNs due to their complex hierarchical architecture, and many non-linear transformations 39 . To address this problem many ‘attribution methods’ have been developed to try to quantify the contribution of each variable (e.g., pixels in images) to the final output for deep neural networks and CNNs 40 .

Saliency maps serve as an intuitive attribution and visualisation tool for CNNs, spotlighting regions in input data that significantly influence the model's predictions 27 . By offering a heatmap representation, these maps illuminate key features that the model deems crucial, thus aiding in demystifying the model's decision-making process. For instance, when analysing an image of a cat, the saliency map would emphasise the cat's distinct features over the background. While their simplicity facilitates understanding even for those less acquainted with deep learning, saliency maps do face challenges, particularly their sensitivity to noise and occasional misalignment with human intuition 41 , 42 , 43 . Nonetheless, they remain a pivotal tool in enhancing model transparency and bridging the interpretability gap between ML models and human comprehension.

Several methods have been proposed for attribution, including Guided Backpropagation 44 , Layer-wise Relevance Propagation 45 , Gradient-weighted Class Activation Mapping 46 , Integrated Gradients 47 , DeepLIFT 48 , and SHAP (SHapley Additive exPlanations) 49 . Many of these methods were developed because it is challenging to identify important input features when there are different images with the same label (e.g., ‘bird’ with many species) presented at different scales, colours, and perspectives. In contrast, most non-image data does not have such variations, as each pixel corresponds to the same feature. For this reason, choosing attributions with minimal processing is sufficient to identify the salient input variables that have the maximal impact on classification.

Here we introduce a new analytical pipeline, DeepMapper , which applies a non-indexed or indexed mapping to the data representing each data point with one pixel, enabling the classification or clustering of data using 2D CNNs. This simple direct mapping has been tried by others but has not been tested with datasets with sufficiently large amounts of data in various conditions. We use raw data with minimal filtering and no dimension reduction to preserve small perturbations in data that are normally removed, in order to assess their impact.

The pipeline includes conversion of data, separation to training and validation, assessment of training quality, attribution, and accumulation of results in a pipeline. The pipeline is run multiple times until a consensus is reached. The significant variables can then be identified using attribution and exported appropriately.

The DeepMapper architecture is shown in Fig.  2 . The complete algorithm of DeepMapper is detailed in the “ Methods ” section and the Python source code is supplied at GitHub 50 .

figure 2

DeepMapper architecture. DeepMapper uses sequence or multi-variate data as input. The first step of DeepMapper is to merge and, if required, index input files to prepare them in matrix format. The data are normalised using log normalisation, then folded into a matrix. Folding is performed either directly in the natural order of the data or by using the index that is generated or supplied during the data import. After folding, the data are kept in temporary storage and separated into ‘train’ and ‘test’ sets using a train test split. Training is done using either CNNs supplied by the PyTorch libraries or a custom CNN ( ResNet18 is used by default). Intermediary results are run through attribution algorithms supplied by Captum 51 and saved to the run history log. The run is then repeated until convergence is achieved, or until a pre-determined number of iterations has been performed, by shuffling the training, testing, and validation data. Results are summarised in a report with exportable tables and graphics. Attribution is applied to true positives and true negatives, and these are translated back to features to be added to the reports. Further details can be found directly in the accompanying code 50 .

DeepMapper implements an approach for processing high-dimensional data without resorting to excessive filtering and dimension reduction techniques, which eliminate the smaller perturbations in data, so that differences that would otherwise be filtered out can be identified. The following algorithm is used to achieve this result:

Read and setup the running parameters.

Read the data into tabulated form as observations, features, and outcome (labels, or, if self-supervised, the input itself).

If the input data includes categorical features, these features should be converted to numbers and normalised before feeding to DeepMapper .

Identify features and labels.

Do only basic filtering that eliminates observations or features if all of them are 0 or empty.

Normalise features.

Transform tabulated data to 2-dimensional matrices as illustrated in Fig.  1 a by applying a vector to matrix transformation.

If the analysis is supervised, then transform class labels to output matrices.

Begin iteration:

Separate the data into training and validation groups.

Train on the dataset for the required number of epochs, until reaching satisfactory testing accuracy and loss, or until a pre-determined maximum number of iterations.

If satisfactory testing results are obtained, then:

Perform attributions by associating each result to contributing input pixels using Captum, a Python library for attributions 51 .

Accumulate attribution results by collecting the attribution results for each class.

If training is satisfactory:

Tabulate attribution results by averaging accumulated attributions.

Save the model.

Report results.
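The front-end steps above (normalise, fold, split) can be sketched as follows (Python/NumPy; the log normalisation and square folding here are assumptions standing in for the pipeline's actual choices, not the authors' exact code):

```python
import math
import numpy as np

def deepmapper_preprocess(X, seed=0, test_frac=0.25):
    """Sketch of a DeepMapper-style front end: normalise the features,
    fold each observation into a square 'pseudo image', and split the
    images into training and test sets."""
    X = np.asarray(X, dtype=float)
    X = np.log1p(X - X.min())  # simple log normalisation (an assumption)

    # Fold each row into the smallest square image that holds it.
    side = math.isqrt(X.shape[1])
    if side * side < X.shape[1]:
        side += 1
    padded = np.zeros((X.shape[0], side * side))
    padded[:, : X.shape[1]] = X
    images = padded.reshape(-1, 1, side, side)  # N single-channel images for a 2D CNN

    # Shuffle and split into train/test subsets.
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(images))
    n_test = int(len(images) * test_frac)
    return images[order[n_test:]], images[order[:n_test]]

train, test = deepmapper_preprocess(np.random.rand(100, 30))
```

The resulting arrays can be fed to any 2D CNN (e.g. a ResNet18) for the training and attribution stages described above.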

The results of DeepMapper analysis can be used in 2 ways:

Supervised: DeepMapper produces a list of features that played a prominent role in the differentiation of classes.

Self-supervised: Highlights the most important features in differentiating observations from each other in a non-linear fashion. The output can be used as an alternative feature selection tool for dimension reduction.

In both modes, any hidden layer can be examined as latent space. A special bottleneck layer can be introduced to reduce dimensions for clustering purposes.

We present a simple example to demonstrate that CNNs can readily interpret data with a well dispersed pattern of pixels, using the MNIST dataset, which is widely used for hand-written image recognition and which humans as well as CNNs can easily recognise and classify based on the obvious spatial relationships between pixels (Fig.  3 ). This dataset is a more complicated problem than datasets such as the Gisette dataset 52 that was developed to distinguish between 4 and 9. It includes all digits and uses a full randomisation of pixels, and can be regenerated with the script supplied 50 and changing the seed will generate different patterns.

figure 3

A sample from MNIST dataset (left side of each image) and its shuffled counterpart (right side).

We randomly shuffled the data in Fig.  3 using the same seed 50 to obtain 60,000 training images such as those shown on the right side of each digit, and validated the results with a separate batch of 20,000 images (Fig.  3 ). Although the resulting images are no longer recognizable by eye, a CNN has no difficulty distinguishing and classifying each pattern with ~ 2% testing error compared to the reference data (Fig.  4 ). This result demonstrates that CNNs can accurately recognise global patterns in images without reliance on local relationships between neighbouring pixels. It also confirms the finding that shuffling images only marginally increases training loss 23 and extends it to testing loss (Fig.  4 ).
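The shuffling itself is straightforward to reproduce (Python/NumPy sketch; the seed value is arbitrary): a single fixed pixel permutation is applied identically to every image, destroying local neighbourhoods while leaving a consistent global pattern for the CNN to learn.

```python
import numpy as np

rng = np.random.default_rng(42)   # fixed seed so every image is shuffled identically
perm = rng.permutation(28 * 28)   # one pixel permutation shared across the dataset

def shuffle_pixels(images):
    """Apply the same fixed permutation to a batch of 28x28 images."""
    flat = images.reshape(len(images), -1)
    return flat[:, perm].reshape(len(images), 28, 28)

batch = np.arange(2 * 28 * 28, dtype=float).reshape(2, 28, 28)
shuffled = shuffle_pixels(batch)
```

Because the permutation is the same for every image, pixel k of every shuffled image always comes from the same source position, so a class-consistent global pattern survives the shuffle.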

figure 4

Results of training on the MNIST dataset ( a ) and the shuffled dataset ( b ) with PyTorch model ResNet18 50 . The charts demonstrate that although training continued for 50 epochs, about 15 epochs would have been enough for the shuffled images ( b ), as further training starts to cause overfitting. The decrease in accuracy between normal and shuffled images is about 3%, and this difference cannot be recovered by using more sophisticated CNNs with more layers, meaning that shuffling images causes a measurable loss of information, yet the shuffled images still hold patterns recognisable by CNNs.

Testing DeepMapper

Finding slight changes in very few variables in otherwise seemingly random datasets with large numbers of variables is like finding a needle in a haystack. Such differences in data are almost impossible to detect using traditional analysis tools because small variations are usually filtered out before analysis.

We devised a simple test case to determine if DeepMapper can detect one or more variables with small but distinct variations in otherwise randomly generated data. We generated a dataset with 10,000 data items with 18,225 numeric variables as an example of a high-dimensional dataset using PyTorch’s uniform random algorithms 53 . The algorithm sets 18,223 of these variables to random numbers in the range of 0–1, and two of the variables into two distinct groups as seen in Table 1 .

We call this type of dataset a ‘Needle in a haystack’ (NIHS) dataset, in which a very small amount of data with small variance is hidden among a set of random variables that is order(s) of magnitude larger than the meaningful components. We provide a script that can generate this and similar datasets with the source code supplied 50 .
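A generator for this kind of dataset can be sketched as follows (Python/NumPy; the exact ranges of the two informative variables are illustrative assumptions, since Table 1's settings are not reproduced here):

```python
import numpy as np

def make_nihs(n=10_000, n_features=18_225, seed=0):
    """'Needle in a haystack': all features uniform random except the
    first two, whose ranges differ slightly between the two classes."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, 1.0, size=(n, n_features))
    y = np.arange(n) % 2  # two balanced classes

    # Illustrative per-class ranges for the two informative variables
    X[y == 0, 0] = rng.uniform(0.45, 0.55, size=(y == 0).sum())
    X[y == 1, 0] = rng.uniform(0.50, 0.60, size=(y == 1).sum())
    X[y == 0, 1] = rng.uniform(0.40, 0.50, size=(y == 0).sum())
    X[y == 1, 1] = rng.uniform(0.45, 0.55, size=(y == 1).sum())
    return X, y

X, y = make_nihs(n=200, n_features=500)  # small sizes for a quick check
```

The informative columns overlap heavily between classes and have tiny variance relative to the uniform background, which is what makes them invisible to variance-based filtering.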

DeepMapper was able to accurately classify the two datasets (Fig.  5 ). Furthermore, using attribution, DeepMapper was also able to determine the two data points that have different variances in the two classes. Note that DeepMapper may not always find all the changes in the first attempt, as neural network weight initialisation is a stochastic process. However, DeepMapper overcomes this via multiple iterations to establish acceptable training and testing accuracies, as described in the Methods.
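In the paper, attribution is applied to the trained CNN (via Captum, per the Methods). As a minimal runnable stand-in for the same "which inputs drive the prediction" idea, the sketch below fits a logistic classifier by plain gradient descent and ranks input variables by absolute weight; all names and values here are illustrative assumptions, not the authors' pipeline:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 400, 50
X = rng.random((n, d))
y = rng.integers(0, 2, size=n)
X[:, 3] += 0.5 * y                       # the hidden "needle": variable 3

Xc = X - X.mean(axis=0)                  # centre features for stable updates
w, b = np.zeros(d), 0.0
for _ in range(500):                     # gradient descent on logistic loss
    p = 1.0 / (1.0 + np.exp(-(Xc @ w + b)))
    w -= Xc.T @ (p - y) / n
    b -= (p - y).mean()

needle = int(np.argmax(np.abs(w)))       # variable with the largest influence
```

The same ranking idea applies to any differentiable model; gradient-based attribution methods such as integrated gradients generalise it beyond linear classifiers.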

Figure 5. In this demonstration of analysis of high-dimensional data with very small perturbations, DeepMapper finds small variations in a few variables (here two) out of a very large number of random variables (here 18,225). ( a ) DeepMapper representations of each record. ( b ) The result of the test run of the classification with unseen data (3750 elements). ( c ) The first and second variables in the graph are measurably higher than the other variables.

Comparison of DeepMapper with DeepInsight

DeepInsight 54 is the most general approach published to date for converting non-image data into image-like structures, with the claim that these processed structures allow CNNs to capture complex patterns and features in the data. DeepInsight offers an algorithm to create images in which similar features are collated into a “well organised image form” by applying one of several dimensionality reduction algorithms (e.g., t-SNE, PCA or KPCA) 54 . However, these algorithms add computational complexity, potentially eliminate valuable information, limit the ability of CNNs to find small perturbations, and make it more difficult to use attribution to determine the most notable features impacting the analysis, as multiple features may overlap in the transformed image. In contrast, DeepMapper uses a direct mapping mechanism in which each feature corresponds to one pixel.

To identify important input variables, the DeepInsight authors later developed DeepFeature 55 , using an elaborate mechanism to associate image areas identified by attribution methods with the input variables. DeepMapper uses a simpler approach: as each pixel corresponds to only one variable, it can use any of the attribution methods to link results to its input space. While both DeepMapper and DeepInsight follow the general idea that non-image data can be processed with 2D CNNs, DeepMapper uses a much simpler and faster algorithm, whereas DeepInsight uses a sophisticated set of algorithms to convert non-image data to images, dramatically increasing computational cost. The DeepInsight conversion process is not designed to utilise GPUs, so it cannot be accelerated by better hardware, and the images obtained may be larger than the number of data points, also impacting performance.

One of the biggest differences between DeepFeature and DeepMapper is that DeepFeature in many cases selects multiple features during attribution, because DeepInsight pixels represent multiple values, whereas each DeepMapper pixel represents one input feature; DeepMapper can therefore determine differentiating features with pinpoint accuracy, at a resolution of one pixel per feature.
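The one-feature-per-pixel mapping can be sketched in a few lines. Zero-padding up to the next perfect square is an assumption here (the paper specifies only the one-to-one correspondence); note that the 18,225 variables of the NIHS example map exactly onto a 135 × 135 image:

```python
import math
import numpy as np

def to_image(features):
    """Map a 1-D feature vector to a square single-channel 'image',
    one feature per pixel, zero-padding to the next perfect square."""
    n = len(features)
    side = math.isqrt(n)
    if side * side < n:
        side += 1                                  # round up to next square
    img = np.zeros(side * side, dtype=np.float32)
    img[:n] = features
    return img.reshape(side, side)

img = to_image(np.arange(10, dtype=np.float32))    # 10 features -> 4 x 4 image
```

Because the mapping is a plain reshape, attribution scores on pixels translate back to input variables by simple index arithmetic, with no overlap between features.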

The DeepInsight manuscript offers various examples of data to demonstrate its abilities. However, many of the examples use low dimensions (20–4000 features), while today’s complex datasets may regularly require tens of thousands to millions of features, as in genome analysis in biology and radio-telescope analysis in astronomy. As such, several examples provided by DeepInsight have insufficient dimensions for a mechanism such as DeepMapper , which is intended for datasets of 10,000 or more dimensions, as required by modern complex datasets. The DeepInsight examples include a speech dataset from the TIMIT corpus with 39 dimensions, and the Relathe (text) dataset, which is derived from newsgroup documents, partitioned evenly across different newsgroups, and contains 1427 samples and 4322 dimensions. The ringnorm-DELVE dataset, an implementation of Leo Breiman’s ringnorm example, is a 20-dimensional, 2-class classification problem with 7400 samples 54 . Another example, Madelon , is an artificially generated dataset of 2600 samples and 500 dimensions, in which only 5 principal and 20 derived variables contain information. Instead, we used a much more complicated example than Madelon : the NIHS dataset 50 that we used to test DeepMapper in the first place. We attempted to run DeepInsight with the NIHS data, but we could not get it to train properly, and for this reason we cannot supply a comparison.

The most complex problem published by DeepInsight was the analysis of a public RNA sequencing gene expression dataset from TCGA ( https://cancergenome.nih.gov/ ) containing 6216 samples of 60,483 genes or dimensions, of which DeepInsight used 19,319. We selected this example as the second demonstration of the application of DeepMapper to high-dimensional data, as well as a benchmark for comparison with DeepInsight .

We generated the data using the R script offered by DeepInsight 54 and ran DeepMapper as well as DeepInsight using the generated dataset to compare accuracy and speed. In this test DeepMapper exhibited much improved processing speed with near identical accuracy (Table 2 , Fig.  6 ).

Figure 6. Analysis of TCGA data by DeepInsight vs DeepMapper. The image on the top was generated by DeepInsight using its default values and a t-SNE transformer supplied by DeepInsight . The image at the bottom was generated by DeepMapper . Image conversion and training speeds and the analysis results can be found in Table 2 .

CNNs are fundamentally sophisticated pattern matchers that can establish intricate mappings between input features and output representations 6 . They excel at transforming various inputs into outputs, including identifying classes or bounding boxes, through a series of operations involving convolution, pooling, and activation functions 7 , 56 .

Even though CNNs are at the centre of many of today’s revolutionary AI systems, from self-driving cars to generative AI systems such as Dall-E-2 , MidJourney and Stable Diffusion , they are still neither well understood nor efficiently utilised, and their usage beyond image analysis has been limited.

While CNNs used in image analysis are historically and practically constrained to a 224 × 224 matrix or a similar fixed input size, this limitation applies to pre-trained models. When CNNs have not been pre-trained, a much wider variety of input shapes can be selected, depending on the CNN architecture. Some CNNs are more flexible in their input size when implemented with adaptive pooling layers, such as ResNet18 using adaptive pooling 57 . This provides the flexibility to choose optimal sizes for the task at hand in non-image applications, as most non-image applications will not use pre-trained CNNs.
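The role of adaptive pooling can be illustrated without a full network: whatever the spatial size of the final feature map, an adaptive average pool reduces it to a fixed output size, so the classifier head never sees a size mismatch. Below is a single-channel NumPy sketch of the idea behind torch.nn.AdaptiveAvgPool2d 57 ; the bin-boundary arithmetic (floor/ceil of proportional positions) is one common convention, assumed here for illustration:

```python
import numpy as np

def adaptive_avg_pool2d(x, out_h, out_w):
    """Average-pool a 2-D array to a fixed (out_h, out_w) shape,
    regardless of the input's height and width."""
    h, w = x.shape
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        # bin rows [floor(i*h/out_h), ceil((i+1)*h/out_h))
        r0, r1 = (i * h) // out_h, -((-(i + 1) * h) // out_h)
        for j in range(out_w):
            c0, c1 = (j * w) // out_w, -((-(j + 1) * w) // out_w)
            out[i, j] = x[r0:r1, c0:c1].mean()
    return out

# Inputs of any size collapse to the same fixed-size summary.
a = adaptive_avg_pool2d(np.ones((224, 224)), 1, 1)
b = adaptive_avg_pool2d(np.ones((135, 135)), 1, 1)
```

This is why a network ending in a 1 × 1 adaptive pool followed by a fully connected layer accepts a 135 × 135 DeepMapper image as readily as a 224 × 224 photograph.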

Here we have demonstrated uses of CNNs that are outside the norm. There is a need for analysis of complex data with many thousands of features that are not primarily images, and there is a lack of tools that offer minimal conversion of non-image data to image-like formats that can then be easily processed with CNNs in classification and clustering tasks. As much of this data comes from complex systems with many features, DeepMapper offers a way of investigating such data that may not be possible with traditional approaches.

Although DeepMapper currently uses a CNN as its AI component, alternative analytic strategies, such as Vision Transformers 18 or RetNets 21 , can easily be substituted for the CNN with minimal changes, and have great potential for this application. While Transformers and RetNets have input size limitations for inference in terms of the number of tokens, Vision Transformers can handle much larger inputs by dividing images into segments that incorporate multiple pixels 18 . This type of approach is applicable to Transformers, RetNets , and future architectures. DeepMapper can leverage these newer architectures, and others, in the future 57 .

Data availability

DeepMapper is released as an open source tool on GitHub https://github.com/tansel/deepmapper . Data that is not available from GitHub because of size constraints can be requested from the authors.

Taylor, P. Volume of data/information created, captured, copied, and consumed worldwide from 2010 to 2020, with forecasts from 2021 to 2025. https://www.statista.com/statistics/871513/worldwide-data-created/ (2023).

Ghys, É. The butterfly effect. in The Proceedings of the 12th International Congress on Mathematical Education: Intellectual and attitudinal challenges, pp. 19–39 (Springer). (2015).

Jolliffe, I. T. Mathematical and statistical properties of sample principal components. Principal Component Analysis , pp. 29–61 (Springer). https://doi.org/10.1007/0-387-22440-8_3 (2002).

Landauer, R. The noise is the signal. Nature 392 , 658–659. https://doi.org/10.1038/33551 (1998).

Goodfellow, I., Bengio, Y. & Courville, A. Deep Learning (MIT Press). http://www.deeplearningbook.org (2016).

LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521 , 436–444. https://doi.org/10.1038/nature14539 (2015).

Krizhevsky, A., Sutskever, I. & Hinton, G. E. ImageNet classification with deep convolutional neural networks. Commun. ACM 60 , 84–90. https://doi.org/10.1145/3065386 (2017).

Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9 , 1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735 (1997).

Goodfellow, I. et al. Generative adversarial nets. Commun. ACM 63 , 139–144. https://doi.org/10.1145/3422622 (2020).

Vaswani, A. et al. Attention is all you need. NIPS'17: Proceedings of the 31st International Conference on Neural Information Processing Systems , pp. 6000–6010. https://doi.org/10.5555/3295222.3295349 (2017).

Barrio, R. et al. Deep learning for chaos detection. Chaos 33 , 073146. https://doi.org/10.1063/5.0143876 (2023).

Levin, E. A recurrent neural network: limitations and training. Neural Netw. 3 , 641–650. https://doi.org/10.1016/0893-6080(90)90054-O (1990).

LeCun, Y. & Bengio, Y. Convolutional networks for images, speech, and time series. in The handbook of brain theory and neural networks, pp. 255–258. https://doi.org/10.5555/303568.303704 (MIT Press, 1998).

Wu, Y., Yang, F., Liu, Y., Zha, X. & Yuan, S. A comparison of 1-D and 2-D deep convolutional neural networks in ECG classification. arXiv preprint arXiv:1810.07088 . https://doi.org/10.48550/arXiv.1810.07088 (2018).

Hu, J. et al. A multichannel 2D convolutional neural network model for task-evoked fMRI data classification. Comput. Intell. Neurosci. 2019 , 5065214. https://doi.org/10.1155/2019/5065214 (2019).

Zhang, S. et al. A deep learning framework for modeling structural features of RNA-binding protein targets. Nucleic Acids Res. 44 , e32. https://doi.org/10.1093/nar/gkv1025 (2016).

Maurício, J., Domingues, I. & Bernardino, J. Comparing vision transformers and convolutional neural networks for image classification: A literature review. Appl. Sci. 13 , 5521. https://doi.org/10.3390/app13095521 (2023).

Dosovitskiy, A. et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 . https://doi.org/10.48550/arXiv.2010.11929 (2020).

Carion, N. et al. End-to-end object detection with transformers. Computer Vision-ECCV 2020 (Springer), pp. 213–229. https://doi.org/10.1007/978-3-030-58452-8_13 (2020).

Lv, W. et al. DETRs beat YOLOs on real-time object detection. arXiv preprint arXiv:2304.08069 . https://doi.org/10.48550/arXiv.2304.08069 (2023).

Sun, Y. et al. Retentive network: A successor to transformer for large language models. arXiv preprint arXiv:2307.08621 . https://doi.org/10.48550/arXiv.2307.08621 (2023).

Zhou, D.-X. Universality of deep convolutional neural networks. Appl. Comput. Harmonic Anal. 48 , 787–794. https://doi.org/10.1016/j.acha.2019.06.004 (2020).

Chiyuan, Z., Bengio, S., Hardt, M., Recht, B. & Vinyals, O. Understanding deep learning (still) requires rethinking generalization. Commun. ACM 64 , 107–115. https://doi.org/10.1145/3446776 (2021).

Ma, W., Papadakis, M., Tsakmalis, A., Cordy, M. & Traon, Y. L. Test selection for deep learning systems. ACM Trans. Softw. Eng. Methodol. 30 , 13. https://doi.org/10.1145/3417330 (2021).

Liu, Z., Michaud, E. J. & Tegmark, M. Omnigrok: grokking beyond algorithmic data. arXiv preprint arXiv:2210.01117 . https://doi.org/10.48550/arXiv.2210.01117 (2022).

Power, A., Burda, Y., Edwards, H., Babuschkin, I. & Misra, V. Grokking: generalization beyond overfitting on small algorithmic datasets. arXiv preprint arXiv:2201.02177 . https://doi.org/10.48550/arXiv.2201.02177 (2022).

Simonyan, K., Vedaldi, A. & Zisserman, A. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034 . https://doi.org/10.48550/arXiv.1312.6034 (2013).

Kim, Y. Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882 . https://doi.org/10.48550/arXiv.1408.5882 (2014).

Abdel-Hamid, O. et al. Convolutional neural networks for speech recognition. IEEE/ACM Trans. Audio Speech Lang. Process. 22 , 1533–1545. https://doi.org/10.1109/TASLP.2014.2339736 (2014).

Hatami, N., Gavet, Y. & Debayle, J. Classification of time-series images using deep convolutional neural networks. in Proceedings Tenth International Conference on Machine Vision (ICMV 2017) 10696 , 106960Y. https://doi.org/10.1117/12.2309486 (2018).

Smith, M. A. et al. Molecular barcoding of native RNAs using nanopore sequencing and deep learning. Genome Res. 30 , 1345–1353. https://doi.org/10.1101/gr.260836.120 (2020).

Emek Soylu, B. et al. Deep-Learning-based approaches for semantic segmentation of natural scene images: A review. Electronics 12 , 2730. https://doi.org/10.3390/electronics12122730 (2023).

Hosseini, H., Xiao, B., Jaiswal, M. & Poovendran, R. On the limitation of Convolutional Neural Networks in recognizing negative images. in 16th IEEE International Conference on Machine Learning and Applications, pp. 352–358. https://ieeexplore.ieee.org/document/8260656 (2017).

Montserrat, D. M., Lin, Q., Allebach, J. & Delp, E. J. Training object detection and recognition CNN models using data augmentation. Electron. Imaging 2017 , 27–36. https://doi.org/10.2352/ISSN.2470-1173.2017.10.IMAWM-163 (2017).

Cubuk, E. D., Zoph, B., Mane, D., Vasudevan, V. & Le, Q. V. Autoaugment: learning augmentation policies from data. arXiv preprint arXiv:1805.09501 . https://doi.org/10.48550/arXiv.1805.09501 (2018).

Hataya, R., Zdenek, J., Yoshizoe, K. & Nakayama, H. Faster AutoAugment: Learning augmentation strategies using backpropagation, in Computer Vision–ECCV 2020: 16th European Conference, Proceedings, Part XXV, pp. 1–16 (Springer). https://doi.org/10.1007/978-3-030-58595-2_1 (2020).

Xiao, K., Engstrom, L., Ilyas, A. & Madry, A. Noise or signal: the role of image backgrounds in object recognition. arXiv preprint arXiv:2006.09994 . https://doi.org/10.48550/arXiv.2006.09994 (2020).

Kovalerchuk, B., Kalla, D. C. & Agarwal, B., Deep learning image recognition for non-images, in Integrating artificial intelligence and visualization for visual knowledge discovery (eds. Kovalerchuk, B., et al. ) pp. 63–100 (Springer). https://doi.org/10.1007/978-3-030-93119-3_3 (2022).

Samek, W., Binder, A., Montavon, G., Lapuschkin, S. & Muller, K. R. Evaluating the visualization of what a deep neural network has learned. IEEE Trans. Neural Netw. Learn. Syst. 28 , 2660–2673. https://doi.org/10.1109/tnnls.2016.2599820 (2017).

Montavon, G., Samek, W. & Müller, K.-R. Methods for interpreting and understanding deep neural networks. Digital Signal Process. 73 , 1–15. https://doi.org/10.1016/j.dsp.2017.10.011 (2018).

De Cesarei, A., Cavicchi, S., Cristadoro, G. & Lippi, M. Do humans and deep convolutional neural networks use visual information similarly for the categorization of natural scenes?. Cognit. Sci. 45 , e13009. https://doi.org/10.1111/cogs.13009 (2021).

Kindermans, P.-J. et al. The (un) reliability of saliency methods, in Explainable AI: Interpreting, explaining and visualizing deep learning. Lecture Notes in Computer Science 11700 , pp. 267–280 (Springer). https://doi.org/10.1007/978-3-030-28954-6_14 (2019).

Zeiler, M. D. & Fergus, R. Visualizing and understanding convolutional networks. Computer Vision—ECCV 2014, pp. 818–833 (Fleet, D., Pajdla T., Schiele, B., & Tuytelaars, T., eds) (Springer). https://doi.org/10.1007/978-3-319-10590-1_53 (2014).

Springenberg, J. T., Dosovitskiy, A., Brox, T. & Riedmiller, M. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806 . https://doi.org/10.48550/arXiv.1412.6806 (2014).

Binder, A., Montavon, G., Lapuschkin, S., Müller, K.-R. & Samek, W. Layer-wise relevance propagation for neural networks with local renormalization layers, in Artificial Neural Networks and Machine Learning–ICANN 2016: Proceedings 25th International Conference on Artificial Neural Networks, pp. 63–71 (Springer). https://doi.org/10.1007/978-3-319-44781-0_8 (2016).

Selvaraju, R. R. et al. Grad-cam: visual explanations from deep networks via gradient-based localization. Proceedings of the 2017 IEEE international conference on computer vision, pp. 618–626. https://ieeexplore.ieee.org/document/8237336 (2017).

Sundararajan, M., Taly, A. & Yan, Q. Axiomatic attribution for deep networks. in Proceedings of the 34th International Conference on Machine Learning 70 , 3319–3328. https://doi.org/10.5555/3305890.3306024 (2017).

Shrikumar, A., Greenside, P. & Kundaje, A. Learning important features through propagating activation differences. in Proceedings of the 34th International Conference on Machine Learning 70 , 3145–3153. https://doi.org/10.5555/3305890.3306006 (2017).

Lundberg, S. M. & Lee, S.-I. A unified approach to interpreting model predictions. in Proceedings of the 31st International Conference on Neural Information Processing Systems, pp. 4768–4777. https://doi.org/10.5555/3295222.3295230 (2017).

Ersavas, T. Deepmapper. https://github.com/tansel/deepmapper (2023).

Kokhlikyan, N. et al. Captum: A unified and generic model interpretability library for pytorch. arXiv preprint arXiv:2009.07896 . https://doi.org/10.48550/arXiv.2009.07896 (2020).

Guyon, I., Gunn, S., Ben-Hur, A. & Dror, G. Gisette. UCI Machine Learning Repository . https://archive.ics.uci.edu/dataset/170/gisette (2008).

PyTorch, torch.rand. https://pytorch.org/docs/stable/generated/torch.rand.html (2023).

Sharma, A., Vans, E., Shigemizu, D., Boroevich, K. A. & Tsunoda, T. DeepInsight: A methodology to transform a non-image data to an image for convolution neural network architecture. Sci. Rep. 9 , 11399. https://doi.org/10.1038/s41598-019-47765-6 (2019).

Sharma, A., Lysenko, A., Boroevich, K. A., Vans, E. & Tsunoda, T. DeepFeature: feature selection in nonimage data using convolutional neural network. Brief. Bioinform. 22 , bbab297. https://doi.org/10.1093/bib/bbab297 (2021).

Simonyan, K. & Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 . https://doi.org/10.48550/arXiv.1409.1556 (2014).

PyTorch, torch.nn.AdaptiveAvgPool2d. https://pytorch.org/docs/stable/generated/torch.nn.AdaptiveAvgPool2d.html (2023).

Acknowledgements

We thank Murat Karaorman, Mitchell Cummins, and Fatemeh Vafaee for helpful advice and comments on the manuscript. This research is supported by an Australian Government Research Training Program Scholarships RSAI8000 and RSAP1000 to T.E., a Fonds de Recherche du Quebec Santé Junior 1 Award 284217 to M.A.S., and UNSW SHARP Grant RG193211 to J.S.M.

Author information

Authors and Affiliations

School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Sydney, NSW, 2052, Australia

Tansel Ersavas, Martin A. Smith & John S. Mattick

Department of Biochemistry and Molecular Medicine, Faculty of Medicine, Université de Montréal, Montréal, QC, H3C 3J7, Canada

Martin A. Smith

CHU Sainte-Justine Research Centre, Montreal, Canada

UNSW RNA Institute, UNSW Sydney, Australia

Contributions

T.E. developed the methods, implemented DeepMapper and produced the first draft of the paper. J.S.M. provided advice, structured the paper, and edited it for improved readability and clarity. M.A.S. provided advice and edited the paper.

Corresponding authors

Correspondence to Tansel Ersavas or John S. Mattick .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

Reprints and permissions

About this article

Cite this article.

Ersavas, T., Smith, M.A. & Mattick, J.S. Novel applications of Convolutional Neural Networks in the age of Transformers. Sci Rep 14 , 10000 (2024). https://doi.org/10.1038/s41598-024-60709-z

Received : 16 January 2024

Accepted : 26 April 2024

Published : 01 May 2024

DOI : https://doi.org/10.1038/s41598-024-60709-z

artificial-neural-networks

Here are 64 public repositories matching this topic (a selection is shown below):

  • ethz-pes / AI-mag: Inductor Modeling and Design with FEM and Artificial Neural Network
  • taliegemen / Neural-Network-Adaptive-PID-Controller: Official repository of Artificial Neural Network-Based Adaptive PID Controller Design for Vertical Takeoff and Landing Model, presented in the European Journal of Science and Technology
  • darshanime / neural-networks-MATLAB: Implementation of artificial neural networks in MATLAB
  • vasudev-sharma / Breast-Cancer-Detection-using-Artificial-Neural-Networks: MATLAB-based GUI to predict breast cancer using deep learning
  • Varniex / Load-Forecasting: Load forecasting with MATLAB (ANN)
  • archana1998 / image-encryption: Image encryption and decryption using neural networks
  • AbhiSaphire / MachineLearning-CourseraCourse-StanfordUniversity: Help in understanding and solving Andrew Ng's Coursera machine learning course assignments and quizzes (2020)
  • datarocksAmy / MATLAB: Deep learning using Neural Network Toolbox and finance portfolio selection with MorningStar
  • shivang8 / Digit-Recognition: Digit recognition using the backpropagation algorithm on an artificial neural network with MATLAB; dataset from MNIST
  • sleebapaul / masters_thesis_project: M.Tech. thesis project on detection and identification of hybrid distribution systems using wavelet transform and artificial neural networks
  • Sohan-Rai / Multivariate-windspeed-prediction-using-ANN: Artificial neural network model in MATLAB to predict wind speed; data on wind speed, humidity, temperature and wind direction obtained from Bagalkot wind farm, Karnataka, India, in 2014
  • virajmavani / predicting-wind-speed: A MATLAB project to predict wind speed on the basis of temperature
  • FilipePires98 / 64-QAM-Classification: Optical communications, 64-QAM classification with neural networks
  • mahshadlotfinia / Stress316L: Machine learning-based generalized model for finite element analysis of roll deflection during austenitic stainless steel 316L strip rolling
  • AlexSantopaolo / Implementation-of-Artificial-Pancreas-for-Patients-of-1-Diabetes-Using-Artificial-Neural-Network-: University project implementing an artificial pancreas with an ANN; training data generated with a model predictive control reference controller; written in MATLAB
  • AabidPatel / Leaf-Disease-Detection-and-Health-Grading-System-using-MATLAB: Identifies a disease caused by a micro-organism infesting a plant leaf and estimates the leaf's health severity from how much of it is infected
  • angelinbeni / ANN-tuning: Optimization of an artificial neural network using the Bear Smell Search Algorithm
  • o-messai / 3DBIQA-AdaBoost: AdaBoost neural network and cyclopean view for no-reference stereoscopic image quality assessment
  • hahmadima / ANNGA_opt: Modelling with an artificial neural network optimized by a genetic algorithm
  • lucianoandrade1 / Time-Series-Forecasting: Time series forecasting using artificial neural networks from the MathWorks MATLAB toolbox


As you read this article, which organ in your body is thinking about it? It’s the brain of course! But do you know how the brain works? Well, it has neurons or nerve cells that are the primary units of both the brain and the nervous system. These neurons receive sensory input from the outside world which they process and then provide the output which might act as the input to the next neuron. 

Each of these neurons is connected to other neurons in complex arrangements at synapses. Now, are you wondering how this is related to Artificial Neural Networks ? Well, Artificial Neural Networks are modeled after the neurons in the human brain. Let’s check out what they are in detail and how they learn information. 

Artificial Neural Networks

Artificial Neural Networks contain artificial neurons, which are called units. These units are arranged in a series of layers that together constitute the whole Artificial Neural Network. A layer can have just a dozen units or millions of units, depending on how complex the network must be to learn the hidden patterns in the dataset. Commonly, an Artificial Neural Network has an input layer, an output layer, and one or more hidden layers. The input layer receives data from the outside world that the neural network needs to analyze or learn about. This data then passes through one or more hidden layers that transform the input into something valuable for the output layer. Finally, the output layer provides an output in the form of the network's response to the input data provided.

In the majority of neural networks, units in one layer are connected to units in the next. Each of these connections has a weight that determines the influence of one unit on another. As data passes from unit to unit, the neural network learns more and more about the data, eventually producing an output from the output layer.

Neural Networks Architecture


Artificial neural networks, also known as neural networks or neural nets, are based on the structure and operation of human neurons. The input layer is the first layer of an artificial neural network; it receives input from external sources and passes it to the hidden layer, the second layer. In the hidden layer, each neuron receives input from the previous layer's neurons, computes the weighted sum, and sends it to the neurons in the next layer. These connections are weighted, meaning the effect of each input from the previous layer is scaled up or down by the weight assigned to it, and these weights are adjusted during the training process to improve model performance.
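The weighted-sum-then-activation computation described above can be sketched in a few lines of Python with NumPy. The layer sizes, random weights, and sigmoid activation here are illustrative assumptions, not something the article prescribes:

```python
import numpy as np

def forward(x, weights, biases, activation):
    """Propagate an input vector through each layer in turn:
    weighted sum of the previous layer's outputs, then a non-linearity."""
    a = x
    for W, b in zip(weights, biases):
        a = activation(W @ a + b)
    return a

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
# A hypothetical 3-input -> 4-hidden-unit -> 2-output network
weights = [rng.normal(size=(4, 3)), rng.normal(size=(2, 4))]
biases = [np.zeros(4), np.zeros(2)]

y = forward(np.array([0.5, -0.2, 0.1]), weights, biases, sigmoid)
print(y.shape)  # (2,)
```

Each `W @ a + b` is exactly the "weighted sum" the paragraph above describes, computed for a whole layer at once.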

Artificial neurons vs Biological neurons

The concept of artificial neural networks comes from biological neurons found in animal brains, so the two share many similarities in both structure and function.

  • Structure: The structure of artificial neural networks is inspired by biological neurons. A biological neuron has a cell body, or soma, to process impulses; dendrites to receive them; and an axon that transfers them to other neurons. The input nodes of an artificial neural network receive input signals, the hidden layer nodes compute these input signals, and the output layer nodes compute the final output by processing the hidden layer's results using activation functions.
  • Synapses: Synapses are the links between biological neurons that enable the transmission of impulses from dendrites to the cell body. In artificial neurons, the synapses are the weights that join the nodes of one layer to the nodes of the next. The strength of a link is determined by its weight value.
  • Learning: In biological neurons, learning happens in the soma, whose nucleus helps to process the impulses. An action potential is produced and travels through the axon if the impulses are powerful enough to reach the threshold. This is made possible by synaptic plasticity, the ability of synapses to become stronger or weaker over time in reaction to changes in their activity. In artificial neural networks, learning is done with backpropagation, a technique that adjusts the weights between nodes according to the error, i.e., the difference between predicted and actual outcomes.
  • Activation: In biological neurons, activation is the firing rate of the neuron, which happens when the impulses are strong enough to reach the threshold. In artificial neural networks, a mathematical function known as an activation function maps the input to the output and executes activations.
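As a small illustration of the activation functions mentioned above, here are two common choices, sigmoid and ReLU. These two are picked as examples; the article does not prescribe any particular function:

```python
import numpy as np

def sigmoid(z):
    """Squashes any input into (0, 1), loosely analogous to a firing rate."""
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    """Passes positive inputs through and zeroes out the rest,
    loosely analogous to a firing threshold."""
    return np.maximum(0.0, z)

z = np.array([-2.0, 0.0, 2.0])
print(relu(z))       # [0. 0. 2.]
print(sigmoid(0.0))  # 0.5
```

Without such a non-linearity between layers, stacking layers would collapse into a single linear mapping, which is why every unit applies one.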


Biological neurons to Artificial neurons

How do Artificial Neural Networks learn?

Artificial neural networks are trained using a training set. For example, suppose you want to teach an ANN to recognize a cat. It is shown thousands of different images of cats so that the network can learn to identify one. Once the neural network has been trained enough on images of cats, you need to check whether it can identify cat images correctly. This is done by making the ANN classify the images it is given, deciding whether each is a cat image or not. The output of the ANN is checked against a human-provided description of whether the image is a cat image. If the ANN misclassifies an image, backpropagation is used to adjust what it has learned during training. Backpropagation works by fine-tuning the weights of the connections between ANN units based on the error rate obtained. This process continues until the artificial neural network can correctly recognize a cat in an image with the lowest possible error rate.
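The adjust-the-weights-by-the-error loop described above can be sketched with a single artificial neuron trained by gradient descent. Note the simplifications: there are no hidden layers here (so the full chain of backpropagation through layers is omitted), and the toy "cat / not cat" feature vectors, learning rate, and iteration count are all illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical 2-feature inputs: the first two rows are "cat" (label 1),
# the last two are "not cat" (label 0).
X = np.array([[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1]])
t = np.array([1.0, 1.0, 0.0, 0.0])

w = np.zeros(2)  # connection weights, to be fine-tuned
b = 0.0
lr = 1.0         # learning rate (arbitrary choice for this toy problem)

for _ in range(500):
    y = sigmoid(X @ w + b)          # forward pass: predict
    err = y - t                     # error between prediction and label
    w -= lr * (X.T @ err) / len(t)  # nudge weights against the error gradient
    b -= lr * err.mean()

preds = (sigmoid(X @ w + b) > 0.5).astype(int)
print(preds)  # [1 1 0 0]
```

After enough iterations the predictions match the labels, which is the "minimal possible error rate" stopping point the paragraph describes.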

What are the types of Artificial Neural Networks?

  • Feedforward Neural Network: The feedforward neural network is one of the most basic artificial neural networks. In this ANN, the data or input provided travels in a single direction: it enters through the input layer and exits through the output layer, while hidden layers may or may not exist. So the feedforward neural network has a front-propagated wave only and usually does not use backpropagation.
  • Convolutional Neural Network: A convolutional neural network has some similarities to the feedforward neural network, in that the connections between units have weights that determine the influence of one unit on another. But a CNN has one or more convolutional layers that apply a convolution operation to the input and pass the result as output to the next layer. CNNs have applications in speech and image processing and are particularly useful in computer vision.
  • Modular Neural Network: A modular neural network contains a collection of different neural networks that work independently towards obtaining the output, with no interaction between them. Each network performs a different sub-task on inputs unique to it. The advantage of a modular neural network is that it breaks a large, complex computational process into smaller components, decreasing complexity while still obtaining the required output.
  • Radial Basis Function Neural Network: Radial basis functions are functions that consider the distance of a point with respect to a center. An RBF network has two layers: in the first, the input is mapped onto all the radial basis functions in the hidden layer; then the output layer computes the output in the next step. Radial basis function networks are normally used to model data that represents an underlying trend or function.
  • Recurrent Neural Network: A recurrent neural network saves the output of a layer and feeds it back to the input to better predict the outcome of the layer. The first layer in an RNN is quite similar to the feedforward neural network, and the recurrence starts once the output of the first layer is computed. After this layer, each unit remembers some information from the previous step so that it can act as a memory cell in performing computations.
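To make the recurrent network's "memory cell" idea concrete, here is a minimal sketch of a single recurrent step: the new hidden state mixes the current input with the previous hidden state. The sizes, random weights, and tanh activation are hypothetical choices for illustration:

```python
import numpy as np

def rnn_step(x, h, W_xh, W_hh, b):
    """One recurrent step: the new state depends on both the current
    input x and the previous state h, which is how the layer 'remembers'."""
    return np.tanh(W_xh @ x + W_hh @ h + b)

rng = np.random.default_rng(1)
W_xh = rng.normal(scale=0.5, size=(3, 2))  # input -> hidden weights
W_hh = rng.normal(scale=0.5, size=(3, 3))  # hidden -> hidden (the feedback loop)
b = np.zeros(3)

h = np.zeros(3)  # initial hidden state: no memory yet
for x in [np.array([1.0, 0.0]), np.array([0.0, 1.0])]:
    h = rnn_step(x, h, W_xh, W_hh, b)

print(h.shape)  # (3,)
```

After the second step, `h` still carries information from the first input through the `W_hh @ h` term, which is exactly the feedback the bullet above describes.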

Applications of Artificial Neural Networks

  • Social Media: Artificial Neural Networks are used heavily in social media. Take, for example, the ‘People you may know’ feature on Facebook, which suggests people you might know in real life so that you can send them friend requests. This magical effect is achieved by Artificial Neural Networks that analyze your profile, your interests, your current friends, their friends, and various other factors to work out whom you might potentially know. Another common application of machine learning in social media is facial recognition, which is done by finding around 100 reference points on a person’s face and then matching them against those already in the database using convolutional neural networks.
  • Marketing and Sales: When you log onto e-commerce sites like Amazon and Flipkart, they recommend products to buy based on your previous browsing history. Similarly, if you love pasta, then Zomato, Swiggy, etc. will show you restaurant recommendations based on your tastes and previous order history. This is true across all new-age marketing segments such as book sites, movie services, and hospitality sites, and it is done by implementing personalized marketing, which uses Artificial Neural Networks to identify a customer’s likes, dislikes, previous shopping history, and so on, and then tailor the marketing campaigns accordingly.
  • Healthcare: Artificial Neural Networks are used in oncology to train algorithms that can identify cancerous tissue at the microscopic level with the same accuracy as trained physicians. Various rare diseases that manifest in physical characteristics can be identified in their early stages by running facial analysis on patient photos. So the full-scale implementation of Artificial Neural Networks in healthcare can only enhance the diagnostic abilities of medical experts and ultimately improve the quality of medical care all over the world.
  • Personal Assistants: You have surely heard of Siri, Alexa, Cortana, etc., and probably talked to them too, depending on the phone you have! These personal assistants are an example of speech recognition that uses Natural Language Processing to interact with users and formulate responses accordingly. Natural Language Processing uses artificial neural networks built to handle many of these assistants’ tasks, such as managing language syntax and semantics, correct speech, and the ongoing conversation.
