Encyclopedia Britannica

  • Games & Quizzes
  • History & Society
  • Science & Tech
  • Biographies
  • Animals & Nature
  • Geography & Travel
  • Arts & Culture
  • On This Day
  • One Good Fact
  • New Articles
  • Lifestyles & Social Issues
  • Philosophy & Religion
  • Politics, Law & Government
  • World History
  • Health & Medicine
  • Browse Biographies
  • Birds, Reptiles & Other Vertebrates
  • Bugs, Mollusks & Other Invertebrates
  • Environment
  • Fossils & Geologic Time
  • Entertainment & Pop Culture
  • Sports & Recreation
  • Visual Arts
  • Demystified
  • Image Galleries
  • Infographics
  • Top Questions
  • Britannica Kids
  • Saving Earth
  • Space Next 50
  • Student Center

polynucleotide chain of deoxyribonucleic acid (DNA)

  • Who discovered the structure of DNA?
  • What is an enzyme?
  • What are enzymes composed of?
  • What are examples of enzymes?
  • What factors affect enzyme activity?

DNA strands on blue background


Our editors will review what you’ve submitted and determine whether to revise the article.

  • Khan Academy - Introduction to biomolecules
  • Nature - A learning based framework for diverse biomolecule relationship prediction in molecular association network
  • University of Minnesota Libraries Publishes Journals - Human Biology - Biological Molecules
  • Academia - Biomolecules : Classification and structural properties of carbohydrates
  • Biology LibreTexts - Biological Molecules
  • BCcampus Open Publishing - Biological Molecules
  • National Center for Biotechnology Information - PubMed Central - Biomolecule and Bioentity Interaction Databases in Systems Biology: A Comprehensive Review
  • Northern Arizona University - The Molecules of Life

polynucleotide chain of deoxyribonucleic acid (DNA)

biomolecule , any of numerous substances that are produced by cells and living organisms. Biomolecules have a wide range of sizes and structures and perform a vast array of functions. The four major types of biomolecules are carbohydrates , lipids , nucleic acids , and proteins .

Among biomolecules, nucleic acids, namely DNA and RNA , have the unique function of storing an organism’s genetic code —the sequence of nucleotides that determines the amino acid sequence of proteins, which are of critical importance to life on Earth. There are 20 different amino acids that can occur within a protein; the order in which they occur plays a fundamental role in determining protein structure and function. Proteins themselves are major structural elements of cells. They also serve as transporters, moving nutrients and other molecules in and out of cells, and as enzymes and catalysts for the vast majority of chemical reactions that take place in living organisms. Proteins also form antibodies and hormones , and they influence gene activity.

write an assignment on biomolecules

Likewise, carbohydrates, which are made up primarily of molecules containing atoms of carbon , hydrogen , and oxygen , are essential energy sources and structural components of all life, and they are among the most abundant biomolecules on Earth. They are built from four types of sugar units— monosaccharides , disaccharides , oligosaccharides , and polysaccharides . Lipids, another key biomolecule of living organisms, fulfill a variety of roles, including serving as a source of stored energy and acting as chemical messengers. They also form membranes , which separate cells from their environments and compartmentalize the cell interior, giving rise to organelles , such as the nucleus and the mitochondrion , in higher (more complex) organisms.

write an assignment on biomolecules

All biomolecules share in common a fundamental relationship between structure and function, which is influenced by factors such as the environment in which a given biomolecule occurs. Lipids, for example, are hydrophobic (“water-fearing”); in water, many spontaneously arrange themselves in such a way that the hydrophobic ends of the molecules are protected from the water, while the hydrophilic ends are exposed to the water. This arrangement gives rise to lipid bilayers, or two layers of phospholipid molecules, which form the membranes of cells and organelles. In another example, DNA, which is a very long molecule—in humans, the combined length of all the DNA molecules in a single cell stretched end to end would be about 1.8 metres (6 feet), whereas the cell nucleus is about 6 μm (6 10 -6 metre) in diameter—has a highly flexible helical structure that allows the molecule to become tightly coiled and looped. This structural feature plays a key role in enabling DNA to fit in the cell nucleus, where it carries out its function in coding genetic traits.

  • Biology Article

Table of Contents

What are Lipids?

Properties of lipids, lipid structure, classification of lipids, types of lipids, examples of lipids, lipids definition.

“Lipids are organic compounds that contain hydrogen, carbon, and oxygen atoms, which form the framework for the structure and function of living cells.”

BYJUS Classes Doubt solving

These organic compounds are nonpolar molecules, which are soluble only in nonpolar solvents and insoluble in water because water is a polar molecule. In the human body, these molecules can be synthesized in the liver and are found in oil, butter, whole milk, cheese, fried foods and also in some red meats.

Let us have a detailed look at the lipid structure, properties, types and classification of lipids.

Also read: Biomolecules 


Lipids are a family of organic compounds, composed of fats and oils. These molecules yield high energy and are responsible for different functions within the human body. Listed below are some important characteristics of Lipids.

  • Lipids are oily or greasy nonpolar molecules, stored in the adipose tissue of the body.
  • Lipids are a heterogeneous group of compounds, mainly composed of hydrocarbon chains.
  • Lipids are energy-rich organic molecules, which provide energy for different life processes.
  • Lipids are a class of compounds characterised by their solubility in nonpolar solvents and insolubility in water.
  • Lipids are significant in biological systems as they form a mechanical barrier dividing a cell from the external environment known as the cell membrane.

Also Read:  Digestion and Absorption of Lipids

Lipids are the polymers of fatty acids that contain a long, non-polar hydrocarbon chain with a small polar region containing oxygen. The lipid structure is explained in the diagram below:

Lipid Structure

Lipid Structure – Saturated and Unsaturated Fatty Acids

Lipids can be classified into two main classes:

  • Nonsaponifiable lipids
  • Saponifiable lipids

Nonsaponifiable Lipids

A nonsaponifiable lipid cannot be disintegrated into smaller molecules through hydrolysis. Nonsaponifiable lipids include cholesterol, prostaglandins, etc

Saponifiable Lipids

A saponifiable lipid comprises one or more ester groups, enabling it to undergo hydrolysis in the presence of a base, acid, or enzymes , including waxes, triglycerides, sphingolipids and phospholipids.

Further, these categories can be divided into non-polar and polar lipids.

Nonpolar lipids, namely triglycerides, are utilized as fuel and to store energy.

Polar lipids, that could form a barrier with an external water environment, are utilized in membranes. Polar lipids comprise sphingolipids and glycerophospholipids.

Fatty acids are pivotal components of all these lipids.

Within these two major classes of lipids, there are numerous specific types of lipids, which are important to life, including fatty acids, triglycerides, glycerophospholipids, sphingolipids and steroids. These are broadly classified as simple lipids and complex lipids.

Also read: Biomolecules in Living Organisms

Simple Lipids

Esters of fatty acids with various alcohols.

  • Fats: Esters of fatty acids with glycerol. Oils are fats in the liquid state
  • Waxes : Esters of fatty acids with higher molecular weight monohydric alcohols

Complex Lipids

Esters of fatty acids containing groups in addition to alcohol and fatty acid.

  • Phospholipids : These are lipids containing, in addition to fatty acids and alcohol, phosphate group. They frequently have nitrogen-containing bases and other substituents, eg, in glycerophospholipids the alcohol is glycerol and in sphingophospholipids the alcohol is sphingosine.
  • Glycolipids (glycosphingolipids) : Lipids containing a fatty acid, sphingosine and carbohydrate.
  • Other complex lipids : Lipids such as sulfolipids and amino lipids. Lipoproteins may also be placed in this category.

Precursor and Derived Lipids

These include fatty acids, glycerol, steroids, other alcohols, fatty aldehydes, and ketone bodies, hydrocarbons, lipid-soluble vitamins, and hormones. Because they are uncharged, acylglycerols (glycerides), cholesterol, and cholesteryl esters are termed neutral lipids. These compounds are produced by the hydrolysis of simple and complex lipids.

Some of the different types of lipids are described below in detail.

Fatty Acids

Fatty acids are carboxylic acids (or organic acid), usually with long aliphatic tails (long chains), either unsaturated or saturated.

  • Saturated fatty acids

Lack of carbon-carbon double bonds indicate that the fatty acid is saturated. The saturated fatty acids have higher melting points compared to unsaturated acids of the corresponding size due to their ability to pack their molecules together thus leading to a straight rod-like shape.

  • Unsaturated fatty acids

Unsaturated fatty acid is indicated when a fatty acid has more than one double bond.

“Often, naturally occurring fatty acids possesses an even number of carbon atoms and are unbranched.”

On the other hand, unsaturated fatty acids contain a cis-double bond(s) which create a structural kink that disables them to group their molecules in straight rod-like shape.

Role of Fats

Fats play several major roles in our body. Some of the important roles of fats are mentioned below:

  • Fats in the correct amounts are necessary for the proper functioning of our body.
  • Many fat-soluble vitamins need to be associated with fats in order to be effectively absorbed by the body.
  • They also provide insulation to the body.
  • They are an efficient way to store energy for longer periods.

Also Read:  Fats

There are different types of lipids. Some examples of lipids include butter, ghee, vegetable oil, cheese, cholesterol and other steroids, waxes, phospholipids, and fat-soluble vitamins. All these compounds have similar features, i.e. insoluble in water and soluble in organic solvents, etc.

Waxes are “esters” (an organic compound made by replacing the hydrogen with acid by an alkyl or another organic group) formed from long-alcohols and long-chain carboxylic acids.

Waxes are found almost everywhere. The fruits and leaves of many plants possess waxy coatings, that can safeguard them from small predators and dehydration.

Fur of a few animals and the feathers of birds possess the same coatings serving as water repellants.

Carnauba wax is known for its water resistance and toughness (significant for car wax).



Membranes are primarily composed of phospholipids that are Phosphoacylglycerols.

Triacylglycerols and phosphoacylglycerols are the same, but, the terminal OH group of the phosphoacylglycerol is esterified with phosphoric acid in place of fatty acid which results in the formation of phosphatidic acid.

The name phospholipid is derived from the fact that phosphoacylglycerols are lipids containing a phosphate group.

Our bodies possess chemical messengers known as hormones ,  which are basically organic compounds synthesized in glands and transported by the bloodstream to various tissues in order to trigger or hinder the desired process.

Steroids are a kind of hormone that is typically recognized by their tetracyclic skeleton, composed of three fused six-membered and one five-membered ring, as seen above. The four rings are assigned as A, B, C & D as observed in the shade blue, while the numbers in red indicate the carbons.


  • Cholesterol is a wax-like substance, found only in animal source foods.  Triglycerides, LDL, HDL, VLDL are different types of cholesterol found in the blood cells.
  • Cholesterol is an important lipid found in the cell membrane. It is a sterol, which means that cholesterol is a combination of steroid and alcohol. In the human body, cholesterol is synthesized in the liver.
  • These compounds are biosynthesized by all living cells and are essential for the structural component of the cell membrane.
  • In the cell membrane, the steroid ring structure of cholesterol provides a rigid hydrophobic structure that helps boost the rigidity of the cell membrane. Without cholesterol, the cell membrane would be too fluid.
  • It is an important component of cell membranes and is also the basis for the synthesis of other steroids, including the sex hormones estradiol and testosterone, as well as other steroids such as cortisone and vitamin D.

Also Refer:   Vitamins and Minerals

Frequently Asked Questions

What are lipids.

Lipids are organic compounds that are fatty acids or derivatives of fatty acids, which are insoluble in water but soluble in organic solvents. For eg., natural oil, steroid, waxes.

How are lipids important to our body?

Lipids play a very important role in our body. They are the structural component of the cell membrane. They help in providing energy and produce hormones in our body. They help in the proper digestion and absorption of food. They are a healthy part of our diet if taken in proper amounts. They also play an important role in signalling.

How are lipids digested?

The enzyme lipase breaks down fats into fatty acids and glycerol, which is facilitated by bile in the liver.

What is lipid emulsion?

It refers to an emulsion of lipid for human intravenous use. These are also referred to as intralipids which is the emulsion of soybean oil, glycerin and egg phospholipids. It is available in 10%, 20% and 30% concentrations.

How are lipids metabolized?

Lipid metabolism involves the oxidation of fatty acids to generate energy to synthesize new lipids from smaller molecules. The metabolism of lipids is associated with carbohydrate metabolism as the products of glucose are converted into lipids.

How are lipids released in the blood?

The medium-chain triglycerides with 8-12 carbons are digested and absorbed in the small intestine. Since lipids are insoluble in water, they are carried to the bloodstream by lipoproteins which are water-soluble and can carry the lipids internally.

What are the main types of lipids?

There are two major types of lipids- simple lipids and complex lipids. Simple lipids are esters of fatty acids with various alcohols. For eg., fats and waxes. On the contrary, complex lipids are esters of fatty acids with groups other than alcohol and fatty acids. For eg., phospholipids and sphingolipids.

What are lipids made up of?

Lipids are made up of a glycerol molecule attached to three fatty acid molecules. Such a lipid is called triglyceride.

Quiz Image

Put your understanding of this concept to test by answering a few MCQs. Click ‘Start Quiz’ to begin!

Select the correct answer and click on the “Finish” button Check your score and answers at the end of the quiz

Visit BYJU’S for all Biology related queries and study materials

Your result is as below

Request OTP on Voice Call

BIOLOGY Related Links

Leave a Comment Cancel reply

Your Mobile number and Email id will not be published. Required fields are marked *

Post My Comment

write an assignment on biomolecules

Intriguing. I often wonder if high cholesterol isn’t mirroring high blood sugar, in that some form of cell resistance is at work.

No Doubt. It’s very very outstanding. Details I liked the most. I would like to thank all the people who published this lesson and to those who actually wrote this. It’s awesome. It builds a nice understanding to the students…I mean just like me…I hate the lipids since I was in ninth class….and now I am in the first year. And now after learning from this website…I think biology is my favourite subject and lipids is my favourite topic…Thanks a lot again to all of you.

I AGREE with you!

Very good apps, which I was looking for, found everything in it.

outstsanding. all one need about lipids is here

Wow it is wonderful I really appreciate the author of the write up. I look forward to know more

Thank you so much really helpful, appreciated.

This is really helping big thanks to the publishers

Wow, wow Thank U so much Really, I am so happy to help me with this.

I kindly needed help how to answer this question ‘How does the study of lipids start?’

write an assignment on biomolecules

Register with BYJU'S & Download Free PDFs

Register with byju's & watch live videos.


Towards interpretable cryo-em: disentangling latent spaces of molecular conformations.

David A. Klindt,

  • 1 LCLS, SLAC National Accelerator Laboratory, Stanford University, Palo Alto, CA, United States
  • 2 Department of Electrical and Computer Engineering, University of California Santa Barbara (UCSB), Santa Barbara, CA, United States
  • 3 Department of Computer Science, University of Helsinki, Helsinki, Finland
  • 4 Department of Electrical Engineering, Stanford University, Palo Alto, CA, United States

Molecules are essential building blocks of life and their different conformations (i.e., shapes) crucially determine the functional role that they play in living organisms. Cryogenic Electron Microscopy (cryo-EM) allows for acquisition of large image datasets of individual molecules. Recent advances in computational cryo-EM have made it possible to learn latent variable models of conformation landscapes. However, interpreting these latent spaces remains a challenge as their individual dimensions are often arbitrary. The key message of our work is that this interpretation challenge can be viewed as an Independent Component Analysis (ICA) problem where we seek models that have the property of identifiability. That means, they have an essentially unique solution, representing a conformational latent space that separates the different degrees of freedom a molecule is equipped with in nature. Thus, we aim to advance the computational field of cryo-EM beyond visualizations as we connect it with the theoretical framework of (nonlinear) ICA and discuss the need for identifiable models, improved metrics, and benchmarks. Moving forward, we propose future directions for enhancing the disentanglement of latent spaces in cryo-EM, refining evaluation metrics and exploring techniques that leverage physics-based decoders of biomolecular systems. Moreover, we discuss how future technological developments in time-resolved single particle imaging may enable the application of nonlinear ICA models that can discover the true conformation changes of molecules in nature. The pursuit of interpretable conformational latent spaces will empower researchers to unravel complex biological processes and facilitate targeted interventions. This has significant implications for drug discovery and structural biology more broadly. More generally, latent variable models are deployed widely across many scientific disciplines. Thus, the argument we present in this work has much broader applications in AI for science if we want to move from impressive nonlinear neural network models to mathematically grounded methods that can help us learn something new about nature.

1 Introduction

Molecules such as proteins or nucleic acids make up the building blocks of life. Living organisms contain a plethora of molecules that often comprise thousands of atoms. Biomolecules change their conformation (i.e., shape) to fulfill important biological functions such as enzymatic reactions or cellular communication. Understanding the conformational heterogeneity of biomolecules is crucial for deciphering their functional mechanisms and designing targeted interventions. Cryo-Electron Microscopy (cryo-EM) has emerged as a powerful technique for visualizing molecular structures at high resolution. Recent advancements in computational cryo-EM have demonstrated the potential of latent variable models to capture the diverse conformations adopted by biomolecules (reviewed in Donnat et al., 2022 ). However, interpreting these learned latent spaces and extracting biologically meaningful information from them remains a significant challenge.

In this paper, we propose a fruitful approach to unravel the complexities of conformational latent spaces in cryo-EM by framing this as an Independent Component Analysis (ICA) problem. In its original linear formulation, the high level goal of ICA is to discover linear projections of the data that are as statistically independent as possible ( Hyvärinen and Oja, 2000 ). A common observation in ICA applications is that these linear projections discover underlying factors of variation in the data that give insight into the underlying processes. A few prior works have tested the application of linear ICA to molecular imaging ( Borek et al., 2018 ; Gao et al., 2020 ) finding more meaningful separation of molecular conformation changes. However, the transformation between meaningful factors and the data is inherently nonlinear in cryo-EM. Therefore, we need theory and models that work for the nonlinear models used in modern cryo-EM (e.g., Zhong et al., 2021a ). Building on recent theoretical work in identifiable nonlinear ICA Hyvärinen et al. (2024) , in disentanglement models and their benchmarks in machine learning Locatello et al. (2019) , we suggest a path to bridging the gap between theoretical advancements and practical applications in cryo-EM research. We argue that nonlinear ICA methods have the potential to provide a powerful framework to disentangling the latent representations of biomolecular conformations from cryo-EM datasets, overcoming the limitations of traditional volume visualization approaches and ultimately allowing to delve deeper into the structural dynamics of biomolecules. Moreover, we argue that the establishment of benchmarks and metrics specific to cryo-EM disentanglement models is of paramount importance. Adapting and extending existing benchmarks from the machine learning field should allow to objectively evaluate the performance of different disentanglement approaches and track progress in the development of interpretable cryo-EM methods. Ultimately, this interdisciplinary approach will enhance our understanding of complex biological processes and open up new avenues for therapeutic interventions and drug discovery.

The paper is structured as follows. We first provide a general background on the cryo-EM computational problem ( Section 2 ) and how it can be framed as an ICA problem ( Section 2.1 ). We then discuss the two fundamental challenges associated with cryo-EM: firstly, separating the conformation and the pose representations ( Section 2.2 ) and, secondly, finding the right (disentangled) representation of conformations ( Section 2.3 ). We then go into more detail on both challenges by providing quantitative metrics to measure progress and modeling suggestions to improve current frameworks. To disentangle poses and shapes, we propose intervention based metrics ( Section 3.1 ) and training schemes ( Section 3.2 ) that can be added to existing models. For the larger problem of disentangling conformation representations ( Section 4.1 ), we discuss existing disentanglement benchmarks and metrics ( Locatello et al., 2019 ). We then discuss three potential approaches for solving this problem ( Section 4.2 ), based on temporal information ( Section 4.2.1 ), temperature control ( Section 4.2.2 ) and atomic models ( Section 4.2.3 ). Finally, we discuss the path forward and the broader implications for this framework to take computational approaches from neural network based curve fitting to actual understanding of the mechanisms in nature.

2 Interpretable heterogeneous reconstruction–a disentanglement problem

Heterogeneous cryo-EM reconstruction methods aim to model the different conformations that a molecule may assume ( Donnat et al., 2022 ). For instance, we can think of a molecule with a fixed central structure and two adjustable “arms” (see Figure 1 ). Clearly, any conformation that this molecule may assume can be described by providing the position of both arms. Thus, these independently moving parts may be thought of as the fundamental degrees of freedom of this molecule’s conformations.


Figure 1 . Overview. What does it mean to have a disentangled representation of molecular conformations? (A) The example (left) shows a simple molecule with two degrees of freedom 1. and 2. for changing its conformation. An entangled model (right, top) represents mixtures of both movements on each of its latent dimensions z 1 ∼ 1. + 2. and z 2 ∼ 1. − 2. A disentangled model (right, bottom) represents pure movements on each of its latent dimensions z 1 ∼ 1. and z 2 ∼ 2.; actually, z 2 = −2. but the sign flip, incorporated in ∼, does not compromise interpretability. Note that disentangling conformations from cryo-EM measurements requires additional information (e.g., time, temperature or physics), as discussed in section 4 . (B) Training a VAE with separate pose ϕ and conformation z latent spaces on cryo-EM particle images, without any intervention. (C) Interpreting the learned latent space of a model. An axis traversal (blue) results in a complex motion of both arms, i.e., fails at disentangling the two degrees of freedom. A simple transformation, moving only the left arm, corresponds to a curved trajectory.

We can parameterize them by a two dimensional latent variable z ∈ Z where Z ≔ R 2 is some degree of movement. The volume that the molecule occupies in three dimensional space can be thought of as a function v ∈ V , v : R 3 × Z → { 0,1 } that is parameterized by z and indicates for any position in space ( R 3 ) whether it is part of the molecular volume or not, known as an implicit representation of the volume ( Sitzmann et al., 2020 ; Donnat et al., 2022 ). That means, for different values of z , v z = v (., z ) would describe a different volume. Crucially, this function is not known and it is a central goal in heterogeneous cryo-EM reconstruction to learn and study it. For instance, one approach would be to train a neural network ( v θ ) to approximate the true volume function v θ ≈ v * ( Zhong et al., 2021a ).

Furthermore, in cryo-EM we typically see a projection π : V × Φ → R N to a gray-scale pixel image (represented, to keep notation uncluttered, as a vector with N entries). This projection depends on the pose parameters ϕ ∈ Φ = SO (3), so we will also refer to π as the pose function ( Table 1 ). The pose parameters may also need to be inferred (typically, the cryo-EM image formation model would also include camera parameters such as the microscope defocus—we are skipping those for simplicity). That means, for different values of ϕ , π ϕ = π (., ϕ ) would describe a different projection. Importantly, the function π does not have to be learned because we know the physics, i.e., optics behind this projection, thus, we know that the projection in our model π ϕ must be the same one as the ground truth projection π ϕ * = π ϕ .


Table 1 . Glossary. Whenever a distinction is necessary in a given context, we use a ∗ (e.g., g *) to highlight that we are referring to the ground truth model ( g *) or ground truth latent variables ( z *, ϕ *). For instance, we have the ground truth generator g * of the data, in contrast to the learned generator g (i.e., decoder) from our model of the data.

Putting this together we can write the combined cryo-EM generative model (i.e., the abstract process that yields the data we observe). That is, the observed data x is modeled as being generated by the ground truth model as x = g *( z *, ϕ *), which is, crucially, a function of the ground truth latent quantities ( z *, ϕ *)

Usually, we would be measuring very noisy signals x = g *( z *, ϕ *) + ϵ where the noise can be modeled as additive Gaussian ϵ ∼ N ( 0 , σ 2 ) in image space. The essential problem of heterogeneous reconstruction in computational cryo-EM can now be stated as follows.

Given only noisy observations of x = g *( z *, ϕ *) + ϵ, can we recover v * ?

This would be the true volume function v * that shows us how the independent degrees of freedom change the molecule’s conformation. Many cryo-EM models actually learn a probabilistic p ( x | z , ϕ ) representation of the observed data x conditioned on the latent variables. Thus, it becomes necessary to perform inference such as maximum a posteriori estimation of the latents, conditioned on some observed data p ( z , ϕ | x ). Alternatively, it is common to approximate the posterior distribution itself with an amortized variational method such as a variational autoencoder (VAE) ( Kingma and Welling, 2013 ). In this work we will be agnostic about the inference procedure (maximum a posteriori probability (MAP), or mean of the amortized variational posterior) and just assume that there exists a mapping f : X → Z from data to latent variables.

2.1 Heterogeneous reconstruction in Cryo-EM is an ICA problem

Let us compare this to a standard independent component analysis (ICA) setting ( Comon, 1994 ). In ICA we assume that there are K > 1 independent variables collected in the random vector z = ( z 1 , …, z K ). As an example to illustrate this, we may think of a public space where K different speakers proclaim their prophecies z i , completely independently of one another p ( z i , z j ) = p ( z i ) p ( z j ), ∀i ≠ j . However, we do not observe those z directly. Instead, we observe K linear combinations of those variables

with A ∈ R K × K some unknown full rank matrix and, again, with additive Gaussian ϵ ∼ N ( 0 , σ 2 ) . In our example, this may correspond to K microphones placed in the space and each recording some linear combination A i T z of the speech signals. This scenario is also called blind source separation , the term “blind” referring to the idea that we know almost nothing about the “sources” z i , apart from some general statistical properties. In linear ICA, the function g : Z → X , g ( z ) = A z that maps from sources z to observations x is called the mixing function . This basic case of linear ICA has been well-studied in the machine learning and signal processing literature ( Hyvärinen and Oja, 2000 ). Briefly, under the simple assumption that at most one of the sources z i follows a Gaussian distribution, we can find an unmixing function f : X → Z that approximately inverts the mixing function, in practice up to ( f ◦ g )( z ) ∼ C z , i.e., some simple equivalence class ∼ C such as permutations and scalings. 1

If g *( z *, ϕ ) in Eq. 1 was linear in z and ϕ , then the cryo-EM reconstruction problem would amount to a simple linear ICA problem with the (extended) sources ( z ‖ ϕ ) where ‖ denotes concatenation. Unfortunately, the cryo-EM mixing function g ( z , ϕ ) in Eq. 1 is nonlinear. This can be easily appreciated, e.g., by noting that a multiple of some latent αz will not produce the same output as an equally scaled image g ( αz , ϕ ) ≠ αg ( z , ϕ ) which would be just the same image but changed in brightness. In the case of a nonlinear mixing function g , ( Hyvärinen and Pajunen, 1999 ) showed that it is possible to construct many functions f : X → Z that turn the data into independent variables. However, most of these independent variables have no intelligible relationship with the true sources z . This problem is called the lack of identifiability of the model, which in general mathematical terms means lack of uniqueness of the solution.

For cryo-EM models this would mean that we can learn latent spaces whose individual dimensions have no principled relationship with the true degrees of freedom in molecular conformations. As an example, we may end up with a representation of the simple two dimensional molecule from above where traversing any single dimension in the latent space of our model corresponds to complex combinations of the two arm movements ( Figure 1 ). This would, likely, bias our interpretation of how they are articulated together to carry out their function. Thus, without further restrictions on our model, we would fail to discover the simple and elegant structure where the molecule just changes conformation along two independent degrees of freedom, i.e., left and right arm. In the modern machine learning context, finding a latent space that separates the underlying factors of variation is often called disentanglement ( Bengio et al., 2013 ), but it has to be noted that the meaning of that term is quite vague.

Fortunately, recent advances propose ways to solve this problem with nonlinear ICA ( Hyvärinen et al., 2024 ). For example, Khemakhem et al. (2020) adds conditioning (“auxiliary”) variables u that change the source distributions p ( z i | u ). Such a u could represent extra measurements by another modality, or it could be defined by interventions. The model then becomes identifiable if the u modulates the distribution of z strongly enough. This is possible because then the z i are conditionally independent for any u , which provides much stronger constraints than the mere (unconditional) independence of the z i as in the basic ICA framework. Khemakhem et al. (2020) further propose to estimate this model using variational methods, leading to an algorithm which is a variant of VAEs. An alternative approach is possible by assuming temporal dependencies of the source time series ( Hyvarinen and Morioka, 2017 ; Klindt et al., 2020 ; Hälvä et al., 2021 ); spatial dependencies can also be used ( Hälvä et al., 2024 ). In this case, independence of the components over time lags leads, again, to more constraints, and thus to identifiability under some conditions. A very different approach can be developed by constraining the nonlinear function g , parameterizing it with such a small number of parameters that identifiability is obtained ( Hyvärinen et al., 2024 ; Section 5.4); for example, if we know the physics underlying g (i.e., pose transformations and projections) we may also be able to obtain identifiability. Finally, we point out that the independence assumption can be relaxed ( Träuble et al., 2021 ); even causal relationships between the independent components have been modeled, but this requires further constraints and assumptions ( Träuble et al., 2021 ; Morioka and Hyvärinen, 2023 ; Yao et al., 2023 ). Any such learning is easier if interventions on the system are possible ( Ahuja et al., 2023 ) or if it is assumed that the system undergoes sparse, discrete state changes like in robotics experiments ( Locatello et al., 2020b ), but the theory mentioned above is specifically unsupervised , thus not necessarily requiring interventions.

In this work, we will argue that the heterogeneous reconstruction problem in cryo-EM should be framed as a non-linear ICA problem to help us build better and more interpretable models that separate the independent degrees of freedom with which molecules change conformation in nature. Few prior works have applied linear ICA to molecular imaging ( Borek et al., 2018 ; Gao et al., 2020 ), finding more meaningful separation of molecular conformation changes ( Figure 2 ). However, to the best of our knowledge, none of the nonlinear ICA approaches mentioned above have so far been applied to the field of computational cryo-EM. Below we will propose three promising candidate approaches for solving the nonlinear ICA problem in cryo-EM.


Figure 2 . Application of ICA to cryo-EM data (reproduced with permission from ( Gao et al., 2020 ). Original caption: (A) Principal component analysis of the 18 multi-body parameters refined for each particle image yields 18 principal components (PC) displayed here in decreasing order of explained variance. The first three components explain more than 30% of the variability in the particle images. (B) (left) definition of the multi-body segmentation: the central PDE6 stalk in blue corresponds to Body 1, while the 2 G α T⋅ GTP subunits correspond to Bodies two and 3. (right) The motion of each body is parameterized with three translational parameters and three rotational parameters. Each of the 18 principal and three independent components is a linear combination of the resulting 18 rigid-body parameters, and their weights are shown here for the first three principal components (from negligible to larger weight as the shade of grey becomes darker). (C) Negentropy (i.e., reverse entropy) of the first three principal and independent components. (D) (resp. (E) )–(top) histogram of the projection of all image particle parameters on the first three principal (resp. independent) components PC1, PC2 and PC3 (resp. IC1, IC2 and IC3). (bottom) 2D histograms of the projections of all image particle parameters on all pairs of the first three principal (resp. independent) components. (F) (resp. (G) –Maps illustrating the motions carried by IC1 (resp. IC2). (top) map reconstructed from the particles whose projections belong to the last bin along IC1 (resp. IC2). (bottom) map reconstructed from the particles whose projections belong to the first bin along IC1 (resp. IC2). All maps are shown overlaid on the consensus map, with threshold set at a lower density value, colored according to the scheme in (B) .

2.2 Disentangling pose and conformation

The first challenge in computational cryo-EM is that of separating the pose ϕ of a molecule from its conformation z . Again, looking at the cryo-EM mixing function (Eq. 1 ) x = g *( z *, ϕ *) + ϵ , this means we want to find a representation

that separates the estimated conformation z and the pose ϕ . 2 In other words, as a first step, we could find just two latent subspaces without specifying the individual components, or the bases, inside those subspaces. Again, if g * were linear, we might use the well-developed methods of independent subspace analysis, or subspace ICA, to approach this problem ( Hyvärinen and Hoyer, 2000 ; Theis, 2006 ). This might also help with the pose variables’ topology that is, typically, not Euclidean. For instance, a circular pose variable ϕ i ∈ S 1 that lives on a circle and encodes rotations around one axis could not be represented, by a single dimension, in a typical latent variable model that maps to real valued scalars f : X → R K . However, some subspace variants of ICA provide exactly such a transformation into spherical coordinates ( Hyvärinen et al., 2009 , Ch. 10).

Many cryo-EM models use separate latent spaces to represent conformation and pose ( Donnat et al., 2022 ). However, that does not mean that models learn, during nonlinear optimization, to actually use those separate spaces in the intended way. Recent work by Edelberg and Lederman (2023) demonstrated that this is a problem in popular cryo-EM models such as CryoDRGN ( Zhong et al., 2021a ). In particular, they showed that a 90° rotation of an image causes a different prediction in the space of conformation latent variables, even though those should be invariant to pose transformations (see below, 3.1).

2.3 Disentangling independent factors of conformations

A more fundamental challenge is that of separating the independent degrees of freedom of a molecule. Specifically, we want to find a representation f z of the molecular conformation that inverts, up to some equivalence class ∼ C like permutations and scaling (see above), the ground truth generative model ( f z ◦ g *)( z ) = z ∼ C z *.

A popular approach (see CryoDRGN tutorial) consists in fitting a nonlinear model to cryo-EM data ( Figure 1B ) followed by manual investigation of the learned latent space that represents conformational heterogeneity ( Zhong et al., 2021a ) ( Figure 1C ), thus limiting our ability to quantitatively compare models. Here, we propose a possible remedy in the shape of benchmarks where we simulate data using the generative model (Eq. 1 ) to assess how close different methods get to the correct (i.e., up to ∼ C ) representation of conformational latent spaces. This taps into a rich, recent literature in nonlinear ICA methods ( Hyvärinen et al., 2024 ) including benchmarks and metrics for model comparisons ( Locatello et al., 2020a ).

Once we have benchmarks and metrics, we can measure quantitative progress. However, none of the existing heterogeneous reconstruction approaches in computational cryo-EM are identifiable—mirroring the state of the disentangled representation machine learning field in 2019 ( Locatello et al., 2019 ). To actually make progress, in this perspective, we propose three potential approaches to apply nonlinear ICA method for the unsupervised discovery of molecular conformational changes:

1. Time-resolved single particle imaging. Observing conformational changes over time, such as a sparse change in a single conformational degree of freedom, provides valuable information; this relies on nonlinear ICA methods that use temporal autocorrelations of the sources ( Section 4.2.1 ).

2. Boltzmann ICA. It may be possible to disentangle conformational degrees of freedom by sampling at different temperatures; this relies on nonlinear ICA methods that use additional conditioning variables u , like temperature ( Section 4.2.2 ).

3. Atomic models. Building constraint models, with knowledge of the physical mechanism, may exclude faulty solutions; this relies on reducing the hypothesis class provided by the (typically over-parameterized) neural network models ( Section 4.2.3 ).

3 Disentangling pose and conformation

In this section we will first discuss the problem of separating pose and conformation in cryo-EM latent variable models. Recent experiments by Edelberg and Lederman (2023) demonstrated that this desired disentanglement is, unfortunately, violated in the case of CryoDRGN ( Zhong et al., 2021a ). We start by proposing more systematic evaluations and metrics to measure progress on this task ( Section 3.1 ). Note that those are not metrics in a strict mathematical sense, but rather indices that allow us to measure progress. These metrics inspire simple supervised intervention experiments that can be executed in simulation and added to existing training pipelines to disentangle pose and conformation in cryo-EM latent spaces ( Section 3.2 ).

3.1 Evaluating disentanglement of pose and conformation

We are interested in the cryo-EM ground truth generative function g * ( z * , ϕ * ) = π ϕ * ( v z * * ) (Eq. 1 ), which consists of a known pose function π ϕ * = π ( . , ϕ * ) , an unknown volume function v z * * = v * ( . , z * ) , an unknown pose ϕ * and an unknown conformation z *. Now, for any specific image, we have full control over the pose function π ϕ * but do not know the pose ϕ *; however, for the conformation we have neither control over the volume function v z * * nor knowledge of the true conformation z * ( Shannon et al., 1959 ). Consequently, in this section and in Section 3.2 we leverage the fact that we have complete knowledge about the pose function, to measure and constrain the flexibility of the conformation z and volume function v z that we learn in our model.

Put simply, what we want is that, for a molecule with fixed conformation, our model predicts the same conformation even if we change the pose of the molecule. That is, we want the conformation representation f z to be invariant to pose changes. Additionally, we want the pose representation f ϕ to be invariant to conformation changes. Mathematically, the requirements of invariance can be written as

for all possible poses ϕ ∈ Φ and conformations z ∈ Z . When we train a model, this can go wrong both in our encoder f ( x ) (if it fails to separate pose and conformation), or in our decoder g ( z , ϕ ) = π ( v z , ϕ ) (if the volume function v z learns to represent pose changes). Moreover, it may be necessary to add observation noise to the generated images g ( z , ϕ ) + ϵ to mitigate for domain shift between the training data and these simulations. To measure progress in this challenge, we can turn this into six different evaluation metrics. We introduce those six in Appendix A . In practice, a single metric (Alg. 1, Eq. 2 ) seems to suffice as we will discuss in the next section.

3.2 Correcting disentanglement of pose and conformation

Algorithm 1. Interventions for Pose and Conformation Disentanglement.

1. Encode a batch ( x i ) i = 1 , … , N of images into conformations and poses ( z i ‖ ϕ i ) = f ( x i )

2. Detach all z i and ϕ i from the computation graph. a

3. Random shuffle the poses ϕ i ′ = ϕ σ ( i ) b

4. Decode and encode the new conformation and pose pairs into ( z ̂ i ‖ ϕ ̂ i ) = ( f θ 1 ◦ g θ 2 ) ( z i , ϕ i ′ )

5. Measure the distances d (., .) to the original latents c

6. Optimize the encoder and decoder ( f θ 1 , g θ 2 ) along the derivatives ( ∂ L ∂ θ 1 , ∂ L ∂ θ 2 )

7. Repeat 1. to 6. until convergence; or add L ( f θ 1 , g θ 2 ) to total loss function in regular training

a We treat these as given latents and do not differentiate with respect to their initial computation

b Using a random permutation σ instead of a perturbation δ ϕ ensures that we stay within the posterior distribution p ( ϕ | x ) of poses

c In the conformation space, this could just be Euclidean; in pose space we would have to compute, e.g., the geodesic distance in SO (3).

Based on the ideas proposed in the previous section and the metrics in App. 5.1, we are now going to propose a simple penalty term that can be added to existing cryo-EM models to disentangle pose and conformation. The logic behind these intervention experiments is illustrated in Figure 3 . This procedure relies on the physics-based decoder g with an explicit pose representation π ϕ . Typically, the representation f θ 1 and the generator g θ 2 are parameterized as neural networks with learnable parameters θ 1 and θ 2 (Kingma and Welling, 2013; Zhong et al., 2021a). Clearly, we can compute the gradients of all metrics with respect to those parameters. In practice, we observed that good results can be achieved simply by following Algorithm 1 . Note that this is a straightforward, supervised learning objective that is a relatively standard problem in modern machine learning and should present little difficulty. Thus, we can just add L ( f θ 1 , g θ 2 ) ( Algorithm 1 ) as an additional penalty term to the loss function of any of the existing models with separate pose and conformation representation to encourage disentanglement.


Figure 3 . Physics-based Pose and Conformation Disentanglement. (A) We perform an intervention, i.e., changing only the pose (conformation) of a latent pair; these changed latents are decoded and encoded again to measure the consistency and invariance of our model. (B) Compared to a vanilla VAE, a model PoseVAE trained with interventions (i.e., Alg. 1) achieves lower reconstruction error, lower KL divergence to the prior and a lower pose disentanglement loss (Eq. 2 ). (C) Disentanglement, measured as mean correlation coefficient (MCC ( Hyvarinen and Morioka, 2017 ),), increases not only between pose and conformation variables (left), but also among the conformation variables (right). (D) Visualizations of the learned latents for the vanilla VAE model, showing that the learned angle is not perfectly representing the true angle (plots one and two from the left); the three plots on the right show the learned conformation latents, representing mixtures of true conformation and pose. (E) Same as (D) but for PoseVAE, showing a perfect monotonic relationship between learned and true angle; also the conformation latents contain little information about the true angle (third plot) and disentangle the true conformation variables up to a 45° rotation.

Importantly, we are only able to write this approach in such a concise and easy form because of the physics-based decoder g . By this we mean the fact that we know the physics, i.e., optics behind the projection π ϕ in the image formation model (Eq. 1 ). We could imagine a different cryo-EM generative model where both the conformation and the pose change are modeled by the implicit volume representation v : R 3 × Z ′ with some extended latent space Z ′ . Or, in even more general terms, we could just train a standard VAE ( Kingma and Welling, 2013 ) on cryo-EM images to learn a neural network encoder f : X → Z ″ and decoder g : Z ″ → X back to image space with some, potentially, even more abstract latent space Z ″ Miolane et al. (2020) . However, such abstract models would not have the built-in physics of objects in space, their poses ϕ and their projections π ϕ onto a two dimensional image, which we assume a priori in our standard cryo-EM decoder (Eq. 1 ). In other words, such more abstract models would lack the architectural distinction between z and ϕ which we need in our intervention experiment to disentangle pose and conformation. Thus, we would not be able to manipulate distinct parts of the extended latent spaces ( Z ′ or Z ″ ), knowing that those represent distinct physical manipulations of the image.

For models using an implicit representation of the volume, the reason we have to use this interventional approach to disentangle pose and conformation in the first place is that the implicit representation v ( z ) is a highly flexible neural network that can easily model pose changes ( Sitzmann et al., 2020 ). Only by combining this with the constraints physics (i.e., the image formation optics) are we able to disentangle pose and conformation representation. This is akin to disentanglement approaches that use the assumption of sparse manipulations, i.e., pairs of data points where only subsets of the latents are modified ( Locatello et al., 2020b ). Those models have been demonstrated to solve the nonlinear ICA problem theoretically and practically. Thus, whenever we know something about the physics of the world it makes our representation learning task much simpler if we can run intervention experiments that test the causal dependencies between our latent variables ( Ahuja et al., 2023 ; Squires et al., 2023 ).

We performed a small proof-of-concept experiment to test these predictions and report the results in Figure 3 . We train a standard VAE (with separate pose and, implicit, volume representation) on pseudo cryo-EM data and compare it to the same model but with the additional training step in Algorithm 1 . We refer to that model as PoseVAE. In Figure 3B , we see that the additional penalty term in PoseVAE does, indeed, succeed at lowering the pose disentanglement metric (Eq. 2 ). Inspecting the latent representations ( Figure 3E , middle), we observe that the pose is now fully confined to the pose variable. Moreover, we observe that the conformation space z is, itself, becoming more disentangled ( Figure 3E , right). Intuitively, this makes sense because less can go wrong now in encoding two instead of three variables into it. Quantitatively, this observation is confirmed by standard disentanglement metrics showing that PoseVAE achieves higher mean correlation coefficient (MCC) both across all latents ( Figure 3C , left) but also within the conformation latents z alone ( Figure 3 , right). This is encouraging for the next task of disentangling the conformations.

4 Disentangling conformations

Analogously to the previous section, in a second step, we propose a theoretical framework with metrics and benchmarks concerning the further disentanglement of the individual components inside the conformation vector z ( Figure 1 ). This addresses the essential challenge of interpretable cryo-EM conformational representations for heterogeneous reconstruction ( Section 4.1 ). These benchmarks will help us measure true progress in the budding field of computational cryo-EM (June 2023: Cryo-EM Heterogeneity Challenge). Lastly, we discuss different methods to leverage recent development in nonlinear ICA that have the potential to build the next-generation of cryo-EM models that get closer to the true answer ( Section. 4.2 ). These models may require future technological advancements such as temperature dependent cryo-EM ( Bock and Grubmuller, 2022 ) or time-resolved single particle (X-ray) imaging ( Shenoy et al., 2023a ; b ).

4.1 Evaluating disentanglement of independent factors of conformations

We consider the disentanglement of the individual components or dimensions inside the conformation space z . We propose a metric, in the form of a computational procedure, to evaluate whether the independent components of conformational variations are disentangled in the latent space. Essentially, this proposal relies on simulated data that exactly fulfills the generative model (Eq. 1 ) which we assume for cryo-EM data. Having full control over the generative model is important, not only to measure progress, but also to simulate extended datasets (e.g., time-resolved imaging), because we know that only those future datasets with additional assumptions will, provably, allow progress in disentangling conformations. Otherwise, this challenge is hopeless ( Locatello et al., 2019 ). Thus, in the ideal case, we have access to a good cryo-EM simulator g *, so we can just use this.

However, if we do not have a good cryo-EM simulator, we can just use the existing state-of-the-art model and see if we can recover its latents. More precisely, we can do the following: Train a regular cryo-EM model with the additional training loss ( Algorithm 1 ) that ensures that pose and conformation are disentangled. Then, check that the model is approximately, invertible. This is a common assumption in ICA theory ( Hyvärinen et al., 2024 ) to make sure that the task of recovering the sources is well-defined. For this, we basically want to make sure that no two distinct points in conformation latent space z 1 ≠ z 2 would lead to the same volume representation v ( z 1 ) = v ( z 2 ). Once this is, approximately, validated we can use the model as a cryo-EM simulator. Intuitively, we now treat this first model as ground truth generator g *≔ g and see if we can recover its latents z *≔ z .

The procedure to evaluate and benchmark heterogeneous cryo-EM latent variable models would then be to assess how well they learn the same (up to equivalences) conformation latent space as the original model. Thus, we would effectively sample a ground truth pose z * and some random pose ϕ * and feed them into the ground truth model to obtain an image x = g *( z *, ϕ *) + ϵ . We then process this image with a candidate model f z ( x ) = z to obtain the learned conformation representation z . Consequently, we have to compare the two vector representations z * and z . Depending on the equivalence class (∼ C ) that we are interested, there are many different metrics to assess how well z * is disentangled in z . Intuitively, we want some kind of one-to-one correspondence between the two representations where changing a single entry in z * corresponds to changing a single entry in z , and vice versa . Fortunately, this problem has been studied extensively in the machine learning subfield of disentangled representation learning ( Bengio et al., 2013 ), with many proposed metrics and standardized benchmarks ( Locatello et al., 2020a ). We can build on those advances to get better quantitative measures on progress in heterogeneous cryo-EM reconstruction than volume based comparisons.

As an example metric measuring disentanglement, we will discuss the Mean Correlation Coefficient (MCC) ( Hyvarinen and Morioka, 2017 ). Intuitively, we want each learned latent variable to be perfectly correlated (or anti-correlated, since sign flips do not compromise interpretability) with a single source variable. To measure this we can just compute the (absolute) correlation coefficient between all ground truth latents z * and all learned latents z . To account for permutations, we have to solve a linear sum assignment with a permutation σ : {1, …, K } → {1, …, K }, which basically finds the best matching z σ ( i ) * for each z i . The MCC is then, simply, the mean over those matches

with corr() denoting correlation. Other metrics focus on decodability, or informational independence ( Locatello et al., 2019 ) and there is no agreed-upon consensus on the optimal disentanglement metric. Thus, we simply report scores across all metrics–these can be further grouped by rank ordering to get overall model comparison scores (see Klindt et al., 2020 ).

4.2 Correcting disentanglement of independent factors of conformations

Let us now discuss the hardest task, i.e., finding the independent degrees of freedom that determine the conformation of a molecule ( Figure 1 ). This is a hard problem in the sense that, for a nonlinear function g (Eq. 1 ), without any additional assumption it has been known for the last 2 decades that this is, practically, impossible ( Hyvärinen and Pajunen, 1999 ). Moreover, the field of disentangled representation learning ( Bengio et al., 2013 ) has spent multiple years proposing methods that were, ultimately, unidentifiable ( Locatello et al., 2020a ). Going forward, computational cryo-EM should learn from those lessons and avoid the same pitfalls. As a very basic example, if our conformation latent space has an isometric Gaussian prior, as in standard VAEs ( Kingma and Welling, 2013 ), we can always perform a random rotation on the learned latents without changing the likelihood of the model ( Hauberg, 2018 ). Thus, any direction in latent space may be representing the actual isolated change in conformation of the molecule. Fortunately, recent years have seen the development of different methods that solve the problem of nonlinear ICA ( Hyvärinen et al., 2024 ). Below, we propose different approaches that are in technological reach ( Section 4.2.1 ), or that make additional statistical assumptions that fit cryo-EM data ( Sectioin 4.2.2 ) or that integrate additional physical knowledge to constrain the problem ( Section 4.2.3 ).

4.2.1 Time-resolved single particle imaging

If we had temporal data of conformation changes for the same molecule over time, we could start applying nonlinear ICA methods that depend on temporal autocorrelations of the sources ( Hyvarinen and Morioka, 2017 ; Hälvä et al., 2021 ). Specifically, these methods operate under the assumption that we are able to record a time series like

with temporal dependencies in the sense that

For instance, SlowVAE ( Klindt et al., 2020 ) assumes that the transitions

follow some sparse distribution, like a Laplace Δ t ( i ) ∼ Lap ( μ = 0 , λ > 0 ) , and the transitions are independent between components, i.e.,

Klindt et al. (2020) showed that those assumptions are often verified on natural video data, which is important since making additional statistical assumptions to obtain identifiable models is only useful if those assumptions are, actually, aligned with the statistics of real world data.

Practically, such a model leads to a minimal modification to standard VAE training, where now the temporal difference of latents is also penalized to follow the specified transition distribution. However, the crucial difference in this learning paradigm is having access to temporal data ( x t , x t +1 ). While this is not routinely feasible experimentally, efforts to develop time-resolved cryo-EM ( Mäeots and Enchev, 2022 ; Lorenz, 2024 ) will eventually enable the direct observation of protein dynamics in the microseconds to seconds range, yielding datasets where each particle image will be associated to a timestamp that can be readily deployed in the modified modeling approach above.

We performed these disentanglement experiments in Figure 4 . In the first row, we have a demonstration of sparse transitions (drawn from a Laplace distribution) that show changes in some of the latent variables ( z 0 , z 1 , ϕ ). Below, we trained N = 50 models with and without temporal prior ( Klindt et al., 2020 ) and measure seven typical disentanglement metrics ( Locatello et al., 2019 ). We observe that in six out of seven of those metrics, the model with temporal prior, SlowVAE , does, indeed, achieve higher disentanglement scores. Further improvements could be achieved by including the pose loss from the previous section. However, this is already a promising proof-of-concept for future disentanglement of conformation latents based on temporal data.


Figure 4 . Temporal Conformation Disentanglement. (A) Schematic of temporal data pairs ( x t , x t +1 ) and minimal changes to VAE training. To obtain SlowVAE, all we need to do is change the conformation prior for the second time step to be a Laplace distribution centered around the posterior mean of the previous time step ( Klindt et al., 2020 ). (B) Example temporal transitions in the two conformation ( z 0 , z 1 ) and one pose ϕ latents, drawn from a Laplace distribution (default parameters from Klindt et al. (2020) : data rate λ =1, VAE ( β = 1, γ = 10), VAE prior rate λ ′ = 10. (C) Most common disentanglement metrics, for details see ( Locatello et al., 2019 ). In seven out of eight metrics we see that SlowVAE learns a more disentangled representation of the conformation latents than a regular VAE without any adaptations.

4.2.2 Controlling the Boltzmann distribution

The idea above applies to time-resolved experiments studying transient dynamics, triggered by some process such as mixing with a ligand or light excitation. Another class of experiments is concerned with steady-state dynamics where timestamps labels are not helpful. For those experiments, a potentially useful knob that could help solve the non-linear ICA problem is knowledge of the temperature associated with each particle in the dataset. The effect of cooling has been reviewed ( Bock and Grubmuller, 2022 ) and different studies have used temperature to change the conformation distribution to obtain insights ( Chen et al., 2019 ; Mehra et al., 2020 ). Experimentally, this could either be achieved by freezing grids with different cryogens, although a preferable approach would follow the development of thermochromic molecular probes able to report on the local temperature on the cryo-EM grid ( Kortekaas and Browne, 2019 ). This way, precise temperature labeling of each particle in the dataset could be achieved.

Formally, manipulating the temperature τ would provide us with control over the Boltzmann distribution of molecular conformations

with Q ( τ ) the canonical partition function and ɛ ( z ) the energy of being in conformation z . This conditional distribution, where we assume knowledge or experimental control over the temperature, maps onto the theoretical framework of iVAE ( Khemakhem et al., 2020 ) with u = τ . Future theoretical investigations are needed to verify if the additional assumptions for their identifiability results are fulfilled in this setting.

However, to build intuition, we can walk through a thought experiment to see how control of the temperature can suffice to discover the independent degrees of freedom in molecular conformation changes ( Figure 5A ). Assume, again a molecule with two degrees of freedom z 1 , z 2 ∈ R that both follow temperature-dependent normal distributions

Now, suppose that at low temperature τ a , we only see variation in the first component z 1 while the second component is nearly constant, i.e., σ 2 2 ( τ a ) ≪ σ 1 2 ( τ a ) . By contrast, at high temperature τ b > τ a , we see that the second component also starts moving, i.e., σ 2 2 ( τ a ) ≫ 0 . Thus, using temperature alone, we can successfully isolate the different degrees of freedom. Intuitively, this should make it possible to solve the disentanglement task. We could simply fit a model to the data at temperature τ b with the additional constraint that the same model also has to be able to encode the data at temperature τ a , albeit, with only the first latent dimension z 1 . Whether those assumptions bare out in real molecules is not clear, yet, recording at different temperatures is within closer technological reach than time-resolved SPI.


Figure 5 . Alternative Conformation Disentanglement. (A) Proof-of-concept illustration of Boltzmann ICA for the use of temperature as a conditioning variable ( Khemakhem et al., 2020 ). At low temperature τ 1 (left) only the left arm of the molecule shows significant movement. At high temperature τ 2 both arms show significant movement. Conditioning the prior of the conformation latent with this information may allow identifying the distinct sources. (B) Proof-of-concept illustration of Physics-based ICA using physics-based decoder with atomic models that assign mechanistic meaning to latent variables, e.g., in terms of atomic coordinates, (potentially, sparse) movement of volume ( Punjani and Fleet, 2021 ), or local normal mode analysis (NMA) deformations ( Nashed et al., 2022 ; Koo et al., 2023 ).

4.2.3 Atomic models

While the previous two proposals require different data, we may also hope to make progress with different models. We saw how the implicit volume representation needs additional care to disentangle pose and conformation information ( Section 3.1 ). Imbuing the generative model with physics inspired structure, allowed us to separate pose from conformation ( section 3.2 ). Maybe, even more physics can help us solve the harder problem of finding the conformational degrees of freedom. In particular, if we replace the highly expressive implicit volume representation v with an atomic model ( Zhong et al., 2021b ; Rosenbaum et al., 2021 ; Nashed et al., 2022 ; Koo et al., 2023 ), then the pose latent variable z will have to encode how an atomic reference structure (maybe the mode of the conformation space) is deformed into an observed conformation ( Figure 5B ).

One existing work by Punjani and Fleet (2021) proposes to learn a convection field that deforms a reference volume. This comes with the elegant property of volume preservation which is not always the case in implicit conformation representations, but obeys our knowledge of the underlying physics. However, the learned convection fields as well as the reference volume model are still over-parameterized compared to an ideal atomistic model with movement vectors for each atom. The problem in building smaller and more constraint models is that modern machine learning methods have, to some extent, proven so powerful because they allow heavily overparameterized hypothesis classes that still generalize well beyond the training data. The question then becomes whether we can combine the best of both worlds, i.e., the non-convex optimization and generalization properties of deep neural networks (e.g., for implicit volume or convection representations) with the physical detail of constraint atomistic models ( Zhong et al., 2021b ; Rosenbaum et al., 2021 ; Nashed et al., 2022 ; Koo et al., 2023 ).

5 Discussion

In recent years, the integration of computational models, particularly VAEs ( Kingma and Welling, 2013 ), has revolutionized research across various natural sciences, including cryo-EM ( Zhong et al., 2021a ). This perspective piece underscores the critical importance of understanding conformational latent spaces in cryo-EM by drawing on cutting-edge theoretical advancements in identifiable nonlinear ICA ( Hyvärinen et al., 2024 ). By bridging the gap between theoretical frameworks and practical applications in cryo-EM, we are suggesting a significant advance in the interpretability and utility of latent variable models. Furthermore, our study advocates for the adoption of better quantitative measures to assess progress in heterogeneous cryo-EM reconstruction, transcending traditional volume-based comparisons. This aligns with recent initiatives such as the Cryo-EM Heterogeneity Challenge emphasizing the need for refined evaluation metrics to accurately gauge advancements in this field. Nevertheless, this work is merely an opinion piece and proof-of-concept demonstration. Significant technical (e.g., time resolved SPI) and engineering challenges (e.g., identifiable nonlinear ICA models that work in low SNR regimes) lie ahead on this path towards interpretable cryo-EM conformations spaces.

One of the key limitations to the approach that we are proposing in this paper is the assumption that there exist a limited number of factors of variation that determine the possible conformations and conformation changes out of all the possible rearrangements of constituent atoms. Moreover, we assume that those factors are independent which means that the molecule has parts that move independently of each other. Prior work in nonlinear ICA has considered the effect of dependencies ( Träuble et al., 2021 ) and how to mitigate them, but that might not even be necessary. It is conceivable that, for instance in the little cartoon Figure 1 , both arms of the molecule always move up and down together so that they are not independent. However, if both arms always move together, then this is, likely, to fulfill some biological function. In this scenario, our model would presumably learn to describe the combined motion by a single latent variable which would, thus, represent the motion required to perform this biological function. Consequently, such dependencies might unveil (more complex) molecular motions and biological function that we can, thus, extract.

The history of ICA’s emergence in the 1990s, and in particular its early adoption in neuroimaging ( McKeown et al., 1998 ), shows its capacity to evolve into a cornerstone of data-based modeling. This trend, moving even further away from hypothesis-driven research ( Friston, 1998 ) toward data-centric approaches ( Beckmann et al., 2005 ), also underscores the importance of incorporating principles like nonlinear ICA to ensure meaningful model outputs. While modern machine learning techniques, including VAEs or nonlinear dimensionality reduction methods such as t-SNE ( Van der Maaten and Hinton, 2008 ) or UMAP ( McInnes et al., 2018 ), have become ubiquitous in data-based modeling, they often overlook source recovery, i.e., identifiability considerations. To fully harness the potential of latent spaces, it is paramount to ensure their alignment with meaningful representations of the underlying data.

In conclusion, our approach integrates nonlinear ICA principles into the development and analysis of cryo-EM latent variable models, ensuring more interpretable representations that encapsulate the intrinsic structure of the data. Unlocking latent spaces aligned with the underlying fundamental factors governing complex phenomena is pivotal for gaining deep insights into biological processes, expediting drug discovery, and facilitating targeted interventions. This progress extends beyond cryo-EM, resonating with diverse scientific disciplines such as computer vision, natural language processing, and generative modeling, where (VAE) latent spaces play a pivotal role in data representation and the generation of new scientific hypothesis as part of initiatives such as AI4Science. Our interdisciplinary approach, embracing nonlinear ICA and disentanglement models, holds promise in generating meaningful representations that carve nature at the joints , thereby propelling transformative discoveries.

Data availability statement

The original contributions presented in the study are included in the article/Supplementary material, further inquiries can be directed to the corresponding author.

Author contributions

DK: Conceptualization, Formal Analysis, Investigation, Methodology, Validation, Visualization, Writing–original draft, Writing–review and editing. AH: Conceptualization, Methodology, Validation, Writing–review and editing. AL: Investigation, Methodology, Writing–review and editing. NM: Conceptualization, Funding acquisition, Investigation, Methodology, Project administration, Resources, Supervision, Validation, Writing–original draft, Writing–review and editing. FP: Conceptualization, Funding acquisition, Project administration, Resources, Supervision, Writing–original draft, Writing–review and editing.

The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. This work was supported by the U.S. Department of Energy, under DOE Contract No. DE-AC02-76SF00515. DK was supported by the SLAC LDRD project “AtomicSPI: Learning atomic scale biomolecular dynamics from single-particle imaging data” (PI: FP). AH was supported by a CIFAR Fellowship and the Academy of Finland. NM acknowledges funding from grant NIH 1R01GM144965-01.


This work was inspired by discussions at the Flatiron CryoEM Summer Workshop 2023. Specifically, we acknowledge Yifan Cheng’s determination to really understand conformational latent spaces. We hope that the present work will help the field of computational cryo-EM to this next level where they can provide truly novel insights into Biology.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

1 Precisely, writing f ( x ) = Wx , ∀ z ∈ Z : ( f ◦ g ) ( z * ) ∼ C z * ⇔ W A = D P where D is a diagonal matrix and P is a permutation.

2 In the common framework of VAEs, f z ( x ) = μ z ( x ) could be defined as the mean of the variational posterior; in an auto-decoding framework this could be the MAP outcome of inference, i.e., f z ( x ) = argmax z p ( x | z ) p ( z ).

Ahuja, K., Mahajan, D., Wang, Y., and Bengio, Y. (2023). “Interventional causal representation learning,” in International conference on machine learning ( PMLR ), 372–407.

Google Scholar

Beckmann, C. F., DeLuca, M., Devlin, J. T., and Smith, S. M. (2005). Investigations into resting-state connectivity using independent component analysis. Philosophical Trans. R. Soc. B Biol. Sci. 360, 1001–1013. doi:10.1098/rstb.2005.1634

PubMed Abstract | CrossRef Full Text | Google Scholar

Bengio, Y., Courville, A., and Vincent, P. (2013). Representation learning: a review and new perspectives. IEEE Trans. pattern analysis Mach. Intell. 35, 1798–1828. doi:10.1109/TPAMI.2013.50

Bock, L. V., and Grubmuller, H. (2022). Effects of cryo-em cooling on structural ensembles. Biophysical J. 121, 148a. doi:10.1016/j.bpj.2021.11.1981

CrossRef Full Text | Google Scholar

Borek, D., Bromberg, R., Hattne, J., and Otwinowski, Z. (2018). Real-space analysis of radiation-induced specific changes with independent component analysis. J. Synchrotron Radiat. 25, 451–467. doi:10.1107/S1600577517018148

Chen, C.-Y., Chang, Y.-C., Lin, B.-L., Huang, C.-H., and Tsai, M.-D. (2019). Temperature-resolved cryo-em uncovers structural bases of temperature-dependent enzyme functions. J. Am. Chem. Soc. 141, 19983–19987. doi:10.1021/jacs.9b10687

Comon, P. (1994). Independent component analysis, a new concept? Signal Process. 36, 287–314. doi:10.1016/0165-1684(94)90029-9

Donnat, C., Levy, A., Poitevin, F., Zhong, E. D., and Miolane, N. (2022). Deep generative modeling for volume reconstruction in cryo-electron microscopy. J. Struct. Biol. 214 (4), 107920. doi:10.1016/j.jsb.2022.107920

Edelberg, D. G., and Lederman, R. R. (2023). Using vaes to learn latent variables: observations on applications in cryo-em . arXiv preprint arXiv:2303.07487.

Friston, K. J. (1998). Modes or models: a critique on independent component analysis for fmri. Trends cognitive Sci. 2, 373–375. doi:10.1016/s1364-6613(98)01227-3

Gao, Y., Eskici, G., Ramachandran, S., Poitevin, F., Seven, A. B., Panova, O., et al. (2020). Structure of the visual signaling complex between transducin and phosphodiesterase 6. Mol. Cell 80, 237–245. doi:10.1016/j.molcel.2020.09.013

Hälvä, H., Corff, S. L., Lehéricy, L., So, J., Zhu, Y., Gassiat, E., et al. (2021). “Disentangling identifiable features from noisy data with structured nonlinear ICA,” in Advances in neural information processing systems (NeurIPS2021) (virtual) .

Hälvä, H., So, J., Turner, R. E., and Hyvärinen, A. (2024). “Identifiable feature learning for spatial data with nonlinear ICA,” in Proc. Artificial intelligence and statistics (AISTATS2024) (Valencia, Spain: PMLR ).

Hauberg, S. (2018). Only bayes should learn a manifold (on the estimation of differential geometric structure from data) . arXiv preprint arXiv:1806.04994.

Hyvärinen, A., and Hoyer, P. O. (2000). Emergence of phase and shift invariant features by decomposition of natural images into independent feature subspaces. Neural Comput. 12, 1705–1720. doi:10.1162/089976600300015312

Hyvärinen, A., Hurri, J., and Hoyer, P. O. (2009). Natural image statistics . Springer-Verlag .

Hyvärinen, A., Khemakhem, I., and Monti, R. (2024). Identifiability of latent-variable and structural-equation models: from linear to nonlinear. Ann. Inst. Stat. Math. 76, 1–33. doi:10.1007/s10463-023-00884-4

Hyvarinen, A., and Morioka, H. (2017). “Nonlinear ica of temporally dependent stationary sources,” in Artificial intelligence and statistics ( PMLR ), 460–469.

Hyvärinen, A., and Oja, E. (2000). Independent component analysis: algorithms and applications. Neural Netw. 13, 411–430. doi:10.1016/s0893-6080(00)00026-5

Hyvärinen, A., and Pajunen, P. (1999). Nonlinear independent component analysis: existence and uniqueness results. Neural Netw. 12, 429–439. doi:10.1016/s0893-6080(98)00140-3

Khemakhem, I., Kingma, D., Monti, R., and Hyvarinen, A. (2020). “Variational autoencoders and nonlinear ica: a unifying framework,” in International conference on artificial intelligence and statistics ( PMLR ), 2207–2217.

Kingma, D. P., and Welling, M. (2013). Auto-encoding variational bayes . arXiv preprint arXiv:1312.6114.

Klindt, D., Schott, L., Sharma, Y., Ustyuzhaninov, I., Brendel, W., Bethge, M., et al. (2020). Towards nonlinear disentanglement in natural data with temporal sparse coding . arXiv preprint arXiv:2007.10930.

Koo, B., Martel, J., Peck, A., Levy, A., Poitevin, F., and Miolane, N. (2023). Reconstructing heterogeneous cryo-em molecular structures by decomposing them into polymer chains . arXiv preprint arXiv:2306.07274.

Kortekaas, L., and Browne, W. R. (2019). The evolution of spiropyran: fundamentals and progress of an extraordinarily versatile photochrome. Chem. Soc. Rev. 48, 3406–3424. doi:10.1039/c9cs00203k

Locatello, F., Bauer, S., Lucic, M., Raetsch, G., Gelly, S., Schölkopf, B., et al. (2019). “Challenging common assumptions in the unsupervised learning of disentangled representations,” in In international conference on machine learning ( PMLR ), 4114–4124.

Locatello, F., Bauer, S., Lucic, M., Rätsch, G., Gelly, S., Schölkopf, B., et al. (2020a). A sober look at the unsupervised learning of disentangled representations and their evaluation. J. Mach. Learn. Res. 21, 8629–8690.

Locatello, F., Poole, B., Rätsch, G., Schölkopf, B., Bachem, O., and Tschannen, M. (2020b). “Weakly-supervised disentanglement without compromises,” in International conference on machine learning ( PMLR ), 6348–6359.

Lorenz, U. J. (2024). Microsecond time-resolved cryo-electron microscopy . arXiv preprint arXiv:2402.16519.

Mäeots, M.-E., and Enchev, R. I. (2022). Structural dynamics: review of time-resolved cryo-em. Acta Crystallogr. Sect. D. Struct. Biol. 78, 927–935. doi:10.1107/S2059798322006155

McInnes, L., Healy, J., and Melville, J. (2018). Umap: uniform manifold approximation and projection for dimension reduction . arXiv preprint arXiv:1802.03426.

McKeown, M. J., Makeig, S., Brown, G. G., Jung, T.-P., Kindermann, S. S., Bell, A. J., et al. (1998). Analysis of fmri data by blind separation into independent spatial components. Hum. Brain Mapp. 6, 160–188. doi:10.1002/(SICI)1097-0193(1998)6:3<160::AID-HBM5>3.0.CO;2-1

Mehra, R., Dehury, B., and Kepp, K. P. (2020). Cryo-temperature effects on membrane protein structure and dynamics. Phys. Chem. Chem. Phys. 22, 5427–5438. doi:10.1039/c9cp06723j

Miolane, N., Poitevin, F., Li, Y.-T., and Holmes, S. (2020). “Estimation of orientation and camera parameters from cryo-electron microscopy images with variational autoencoders and generative adversarial networks,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops , 970–971.

Morioka, H., and Hyvärinen, A. (2023). “Connectivity-contrastive learning: combining causal discovery and representation learning for multimodal data,” in Proc. Artificial intelligence and statistics (AISTATS2023) (Valencia, Spain: ML Research Press ).

Nashed, Y., Peck, A., Martel, J., Levy, A., Koo, B., Wetzstein, G., et al. (2022). Heterogeneous reconstruction of deformable atomic models in cryo-em . arXiv preprint arXiv:2209.15121.

Punjani, A., and Fleet, D. J. (2023). 3D flex: determining structure and motion of flexible proteins from cryo-EM. Nat. Methods . 20, 860–870. doi:10.1038/s41592-023-01853-8

Rosenbaum, D., Garnelo, M., Zielinski, M., Beattie, C., Clancy, E., Huber, A., et al. (2021). Inferring a continuous distribution of atom coordinates from cryo-em images using vaes .

Shannon, C. E., et al. (1959). Coding theorems for a discrete source with a fidelity criterion. IRE Nat. Conv. Rec. 4, 1.

Shenoy, J., Levy, A., Poitevin, F., and Wetzstein, G. (2023a). “Amortized pose estimation for x-ray single particle imaging,” in Machine learning for structural biology Workshop ( NeurIPS 2023 ).

Shenoy, J., Levy, A., Poitevin, F., and Wetzstein, G. (2023b). Scalable 3d reconstruction from single particle x-ray diffraction images based on online machine learning . arXiv preprint arXiv:2312.14432.

Sitzmann, V., Martel, J., Bergman, A., Lindell, D., and Wetzstein, G. (2020). Implicit neural representations with periodic activation functions. Adv. neural Inf. Process. Syst. 33, 7462–7473.

Squires, C., Seigal, A., Bhate, S. S., and Uhler, C. (2023). “Linear causal disentanglement via interventions,” in International conference on machine learning ( PMLR ), 32540–32560.

Theis, F. (2006). Towards a general independent subspace analysis. Adv. Neural Inf. Process. Syst. 19.

Träuble, F., Creager, E., Kilbertus, N., Locatello, F., Dittadi, A., Goyal, A., et al. (2021). “On disentangled representations learned from correlated data,” in International conference on machine learning ( PMLR ), 10401–10412.

Van der Maaten, L., and Hinton, G. (2008). Visualizing data using t-sne. J. Mach. Learn. Res. 9, 2579–2605.

Yao, D., Xu, D., Lachapelle, S., Magliacane, S., Taslakian, P., Martius, G., et al. (2023). Multi-view causal representation learning with partial observability . arXiv preprint arXiv:2311.04056.

Zhong, E. D., Bepler, T., Berger, B., and Davis, J. H. (2021a). Cryodrgn: reconstruction of heterogeneous cryo-em structures using neural networks. Nat. methods 18, 176–185. doi:10.1038/s41592-020-01049-4

Zhong, E. D., Lerer, A., Davis, J. H., and Berger, B. (2021b). Exploring generative atomic models in cryo-em reconstruction .

Appendix A: Pose and Conformation Disentanglement metrics

We consider disentanglement evaluation metrics for all three learned functions, i.e., the volume v ( z ), the conformation encoder f z ( x ) and the pose encoder f ϕ . For each function, we will consider two metrics that measure their consistency (i.e., does it model the latent it is supposed to model?) and their invariance (i.e., is it invariant to changes in other latents?). Starting with the volume metrics , we measure.

1. Volume consistency , i.e., how accurately the learned volume v ( z ) changes the conformation:

2. Volume pose invariance , i.e., how accurately the learned volume v ( z ) changes only the conformation:

where, ideally, one would use the oracle encoder f *( x ) = argmax z , ϕ p ( x | z , ϕ ) p ( z , ϕ ). In practice, we can just use the current encoder f ( x ) = ( f z ( x )‖ f ϕ ( x )) and optimize it as well. Again this can be split into two metrics (consistency and invariance), both for the conformation encoder f z

1. Conformation-encoder consistency , i.e., how accurately the conformation encoder f z recovers any conformation, independent of pose:

2. Conformation-encoder pose invariance , i.e., how invariant the conformation encoder f z is to pose perturbations:

as well as for the pose encoder f ϕ

1. Pose-encoder consistency , i.e., how accurately the pose encoder f ϕ recovers any pose, independent of conformation:

2. Pose-encoder conformation invariance , i.e., how invariant the pose encoder f ϕ is to conformation perturbations:

where, again, ideally we would like to use the ground truth generator g *, but we can also just use the current learned decoder. All of these six metrics can be evaluated, in a supervised way, over a sufficient number of randomly sampled conformations z , poses ϕ and perturbations ( δ z , δ ϕ ).

Keywords: cryo-EM, machine learning, ICA, AI for science, disentanglement, physics-based models

Citation: Klindt DA, Hyvärinen A, Levy A, Miolane N and Poitevin F (2024) Towards interpretable Cryo-EM: disentangling latent spaces of molecular conformations. Front. Mol. Biosci. 11:1393564. doi: 10.3389/fmolb.2024.1393564

Received: 29 February 2024; Accepted: 22 May 2024; Published: 09 July 2024.

Reviewed by:

Copyright © 2024 Klindt, Hyvärinen, Levy, Miolane and Poitevin. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: David A. Klindt, [email protected]

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.


  1. 15. Biomolecule Notes.doc

    write an assignment on biomolecules

  2. Biomolecules

    write an assignment on biomolecules

  3. Biomolecules Post Lab Assignment

    write an assignment on biomolecules

  4. Biomolecules

    write an assignment on biomolecules

  5. Talk Read Talk Write Biomolecules Student Made Notes

    write an assignment on biomolecules

  6. Biomolecules

    write an assignment on biomolecules



  2. Proteins || Biomolecules || Online NEET Class

  3. Communication assignment file #viral #bca How to write assignment file #exam How to write file

  4. Biomolecules ke best question. / biology NCERT #neet #youtubeshorts

  5. Biomolecules || Class-11th || Part-04 || Enzymes

  6. Biomolecules| Short Notes| #Neet Preparation 2025|


  1. Biomolecule

    biomolecule, any of numerous substances that are produced by cells and living organisms. Biomolecules have a wide range of sizes and structures and perform a vast array of functions. The four major types of biomolecules are carbohydrates, lipids, nucleic acids, and proteins.. Among biomolecules, nucleic acids, namely DNA and RNA, have the unique function of storing an organism's genetic code ...

  2. Biomolecules- Carbohydrates, Proteins, Nucleic acids and Lipids

    Lipids are organic substances that are insoluble in water, soluble in organic solvents, are related to fatty acids and are utilized by the living cell. They include fats, waxes, sterols, fat-soluble vitamins, mono-, di- or triglycerides, phospholipids, etc. Unlike carbohydrates, proteins, and nucleic acids, lipids are not polymeric molecules.

  3. Biomolecules of cells assignment

    Biomolecules of cells assignment Carbohydrates 1a. Summarize the role and function of carbohydrates in the body. Carbohydrates provide you with energy for everyday task. They are also a vital part of a nutritious diet, although some carbohydrates are better for your health than others. 1b. List a carbohydrate food example. Potatoes 1c.

  4. Biomolecules of Cells Assignment

    biomolecules (biological macromolecules). There are four major classes of biomolecules, carbohydrates, lipids, proteins, and nucleic acids which are an important component of the cell. Combined, these molecules make up the majority of a cell's mass. Foods you consume help to produce these 4 biomolecules needed for a healthy life.

  5. 10: Introduction to Biomolecules

    10.0: Introduction to Amino Acids and Proteins Proteins are polymers of amino acids, linked by amide groups known as peptide bonds. An amino acid can be thought of as having two components: a 'backbone', or 'main chain', composed of an ammonium group, an 'alpha-carbon', and a carboxylate, and a variable 'side chain' (in green below) bonded to the alpha-carbon.

  6. Biological Molecules

    Hank talks about the molecules that make up every living thing - carbohydrates, lipids, and proteins - and how we find them in our environment and in the foo...

  7. Lecture 04: Biomolecules

    Figure 4. Sucrose is formed when a monomer of glucose and a monomer of fructose are joined in a dehydration reaction to form a glycosidic bond. In the process, a water molecule is lost. By convention, the carbon atoms in a monosaccharide are numbered from the terminal carbon closest to the carbonyl group.

  8. 13.0: An Introduction to Biochemistry

    Learning Objectives. Explain what a biomolecule is and list the four main types. Biochemistry is the study of the molecules of life, ( biomolecules ); those that structurally make up living organisms and function to keep them alive. Although the complexity of biomolecules ranges from individual small molecules, such as glycine, to very large ...

  9. 2.34: Structures of Biological Molecules

    An RNA nucleotide consists of a five-carbon sugar phosphate linked to one of four nucleic acid bases: guanine (G), cytosine (C), adenine (A) and uracil (U). Figure 2.34.9 2.34. 9. In a DNA nucleototide, the sugar is missing the hydroxyl group at the 2' position, and the thymine base (T) is used instead of uracil.

  10. 13: Biomolecules

    The LibreTexts libraries are Powered by NICE CXone Expert and are supported by the Department of Education Open Textbook Pilot Project, the UC Davis Office of the Provost, the UC Davis Library, the California State University Affordable Learning Solutions Program, and Merlot. We also acknowledge previous National Science Foundation support under grant numbers 1246120, 1525057, and 1413739.

  11. Biomolecules Assignment Flashcards

    Biomolecules Assignment. 4.0 (5 reviews) Flashcards. Learn. Test. Match. Biomolecule. ... Two biomolecules that are used for energy. Lipids. A molecule that is used by animals for long term energy. Fat, Oil, Wax. Give two examples of a Lipid. ... Write the system in matrix form. (b) ...

  12. Biomolecules on the Menu

    This Click & Learn connects food with energy production and storage in cells. Students embark on an engaging exploration of how food is digested into nutrients, how nutrients are absorbed into the bloodstream and delivered to cells, and how cells use nutrients in cellular respiration. The three main sections provide a progressively zoomed-in ...

  13. Biomolecules Chemistry Assignment

    BIOMOLECULES CHEMISTRY ASSIGNMENT - Free download as Word Doc (.doc / .docx), PDF File (.pdf), Text File (.txt) or read online for free. This document is a student's chemistry assignment on biomolecules. It includes an introduction discussing atoms in DNA molecules and how we are each microscopic universes. The assignment then covers various types of biomolecules including micromolecules like ...

  14. PDF Biomolecules on the Menu

    The Biomolecules on the MenuClick & Learn illustrates the process of digestion and how it connects to metabolism and cellular respiration. Students embark on an engaging exploration of how food is digested into nutrients, how nutrients are absorbed into the bloodstream and delivered to cells, and how cells use nutrients in cellular respiration.

  15. Create a Concept Map of Biomolecules

    Your Assignment: As a group, construct a concept map that illustrates the major properties, functions and examples of the four groups of molecules. You can use your book and other resources to create a comprehensive graphic, that contains details and sketches. Your map will be created on a whiteboard or poster board.

  16. Chemistry Project (Biomolecules) .

    Chemistry Project (Biomolecules). - Free download as PDF File (.pdf), Text File (.txt) or read online for free. The document discusses various types of biomolecules including carbohydrates, proteins, nucleic acids, and their structures and functions. It provides detailed information about monomers like monosaccharides and how they combine to form polysaccharides.

  17. What Are Lipids?

    Lipids are oily or greasy nonpolar molecules, stored in the adipose tissue of the body. Lipids are a heterogeneous group of compounds, mainly composed of hydrocarbon chains. Lipids are energy-rich organic molecules, which provide energy for different life processes. Lipids are a class of compounds characterised by their solubility in nonpolar ...

  18. Assignment: Biomolecules

    The assignment requires labeling chemical structures, writing reactions, and explaining concepts in biomolecules. This document contains a chemistry assignment on biomolecules with 47 questions covering topics such as carbohydrates, proteins, vitamins, and nucleic acids.

  19. PDF Microsoft Word

    Sucrose. Lactose. The building blocks are used to form typical biopolymers such as proteins (amino acids), polysac-charides (monosaccharides), DNA/RNA (mononucleotides), and lipids (molecular aggregates) (Ta-ble 1.3). The function of these biopolymers tends to be the same in all living organisms.

  20. Biomolecules of Cells Assignment.docx

    View Biomolecules of Cells Assignment.docx from BIOLOGY 101 at Grand Canyon University. BIOMOLECULES OF CELLS CARBOHYDRATES 1a. ... View FSHN_120_Writing_Assignment from FSHN 120 at University of Illinois, Urbana Champa... Week 2 Case Study .docx. Chamberlain College of Nursing. HEALTH ASS NR 222. Nutrition. Trans fat.

  21. 3.9: Assignment- Nutritionist for a Day

    Nutritionist for a Day. Outcome: Identify and describe the main features of the four main classes of important biomolecules. Criteria. Ratings. Pts. The role of carbohydrates in the body is identified and their structural makeup summarized. Carbohydrates and their role in the body identified; structural makeup discussed/illustrated in detail in ...

  22. Biomolecules Post Lab Assignment

    Write a complete methods section (paragraph style, not a list) based upon the in-person lab experiment for the Biomolecules lab. As you type, the file will space down so feel free to continue this onto the next page. ... Testing for Biomolecules Post-Lab Assignment (30pts) *Wednesday Lab Students- DUE in Bb by 4:00 pm Friday, Feb. 12 *Thursday ...

  23. Frontiers

    Biomolecules change their conformation (i.e., shape) to fulfill important biological functions such as enzymatic reactions or cellular communication. Understanding the conformational heterogeneity of biomolecules is crucial for deciphering their functional mechanisms and designing targeted interventions.

  24. Amoeba Sisters Biomolecules Worksheet

    AMOEBA SISTERS: VIDEO RECAP BIOMOLECULES Amoeba Sisters Video Recap: Biomolecules. Directions: For each statement, write a "C" if it best applies to the carbohydrates, "L" if it best applies to lipids, "P" if it best applies to proteins, or "N" if it best applies to nucleic acids. 1. ____ I am useful for a fast source of energy.