
Case Study – Methods, Examples and Guide


Case Study Research

A case study is a research method that involves an in-depth examination and analysis of a particular phenomenon or case, such as an individual, organization, community, event, or situation.

It is a qualitative research approach that aims to provide a detailed and comprehensive understanding of the case being studied. Case studies typically involve multiple sources of data, including interviews, observations, documents, and artifacts, which are analyzed using various techniques, such as content analysis, thematic analysis, and grounded theory. The findings of a case study are often used to develop theories, inform policy or practice, or generate new research questions.

Types of Case Study

The main types of case study are as follows:

Single-Case Study

A single-case study is an in-depth analysis of a single case. This type of case study is useful when the researcher wants to understand a specific phenomenon in detail.

For example, a researcher might conduct a single-case study on one individual to understand their experience of a specific health condition, or on a specific organization to explore its management practices. The researcher collects data from multiple sources, such as interviews, observations, and documents, and analyzes it using techniques such as content analysis or thematic analysis. The findings of a single-case study are often used to generate new research questions, develop theories, or inform policy or practice.

Multiple-Case Study

A multiple-case study involves the analysis of several cases that are similar in nature. This type of case study is useful when the researcher wants to identify similarities and differences between the cases.

For example, a researcher might conduct a multiple-case study on several companies to explore the factors that contribute to their success or failure. The researcher collects data from each case, compares and contrasts the findings, and analyzes the data using techniques such as comparative analysis or pattern matching. The findings of a multiple-case study can be used to develop theories, inform policy or practice, or generate new research questions.
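Cross-case comparison is often organized as a case-by-factor matrix. The following is a minimal sketch, assuming each case has already been coded into a set of factor labels; the company names, factors, and outcomes are hypothetical. It tallies how often each factor appears in successful versus unsuccessful cases and lists factors that appear only in successful ones.

```python
from collections import Counter

# Hypothetical coded data: each case maps to its outcome and the set of
# factors the researcher coded from interviews, observations, and documents.
cases = {
    "Company A": {"outcome": "success", "factors": {"strong leadership", "clear strategy"}},
    "Company B": {"outcome": "success", "factors": {"strong leadership", "market timing"}},
    "Company C": {"outcome": "failure", "factors": {"market timing", "high turnover"}},
}

def cross_case_matrix(cases):
    """Count, for each factor, how many cases with each outcome contain it."""
    matrix = {}
    for data in cases.values():
        for factor in data["factors"]:
            matrix.setdefault(factor, Counter())[data["outcome"]] += 1
    return matrix

def distinguishing_factors(matrix, outcome="success", other="failure"):
    """Factors present in at least one `outcome` case and in no `other` case."""
    return sorted(
        factor for factor, counts in matrix.items()
        if counts[outcome] > 0 and counts[other] == 0
    )

matrix = cross_case_matrix(cases)
for factor, counts in sorted(matrix.items()):
    print(f"{factor:20} success={counts['success']} failure={counts['failure']}")
print(distinguishing_factors(matrix))
```

A tally like this only flags candidate patterns; the researcher still has to interpret each case in its context before treating a factor as explanatory.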

Exploratory Case Study

An exploratory case study is used to explore a new or understudied phenomenon. This type of case study is useful when the researcher wants to generate hypotheses or theories about the phenomenon.

For example, a researcher might conduct an exploratory case study on a new technology to understand its potential impact on society. The researcher collects data from multiple sources, such as interviews, observations, and documents, and analyzes it using techniques such as grounded theory or content analysis. The findings of an exploratory case study can be used to generate new research questions, develop theories, or inform policy or practice.

Descriptive Case Study

A descriptive case study is used to describe a particular phenomenon in detail. This type of case study is useful when the researcher wants to provide a comprehensive account of the phenomenon.

For example, a researcher might conduct a descriptive case study on a particular community to understand its social and economic characteristics. The researcher collects data from multiple sources, such as interviews, observations, and documents, and analyzes it using techniques such as content analysis or thematic analysis. The findings of a descriptive case study can be used to inform policy or practice or to generate new research questions.

Instrumental Case Study

An instrumental case study examines a particular case mainly to gain insight into a broader issue or phenomenon, rather than out of interest in the case itself. This type of case study is useful when the researcher wants to use the case to understand a wider question, such as the role a phenomenon plays in achieving a particular goal.

For example, a researcher might conduct an instrumental case study on a particular policy to understand its impact on achieving a broader goal, such as reducing poverty. The researcher collects data from multiple sources, such as interviews, observations, and documents, and analyzes it using techniques such as content analysis or thematic analysis. The findings of an instrumental case study can be used to inform policy or practice or to generate new research questions.

Case Study Data Collection Methods

Here are some common data collection methods for case studies:

Interviews

Interviews involve asking questions of individuals who have knowledge or experience relevant to the case study. Interviews can be structured (the same questions are asked of all participants), semi-structured (a core set of questions supplemented by follow-ups), or unstructured (the interviewer follows the conversation and probes responses with further questions). Interviews can be conducted in person, over the phone, or through video conferencing.

Observations

Observations involve watching and recording the behavior and activities of individuals or groups relevant to the case study. Observations can be participant (where the researcher actively participates in the activities) or non-participant (where the researcher observes from a distance). Observations can be recorded using notes, audio or video recordings, or photographs.

Documents

Documents can be used as a source of information for case studies. Documents can include reports, memos, emails, letters, and other written materials related to the case study. Documents can be collected from the case study participants or from public sources.

Surveys

Surveys involve asking a set of questions of a sample of individuals relevant to the case study. Surveys can be administered in person, over the phone, by mail or email, or online. Surveys can be used to gather information on attitudes, opinions, or behaviors related to the case study.

Artifacts

Artifacts are physical objects relevant to the case study. Artifacts can include tools, equipment, products, or other objects that provide insights into the case study phenomenon.

How to conduct Case Study Research

Conducting case study research involves several steps that help ensure the quality and rigor of the study:

  • Define the research questions: The first step in conducting case study research is to define the research questions. The research questions should be specific, answerable, and relevant to the phenomenon under investigation.
  • Select the case: The next step is to select the case or cases to be studied. The case should be relevant to the research questions and should provide rich and diverse data that can be used to answer the research questions.
  • Collect data: Data can be collected using various methods, such as interviews, observations, documents, surveys, and artifacts. The data collection method should be selected based on the research questions and the nature of the case study phenomenon.
  • Analyze the data: The data collected from the case study should be analyzed using various techniques, such as content analysis, thematic analysis, or grounded theory. The analysis should be guided by the research questions and should aim to provide insights and conclusions relevant to the research questions.
  • Draw conclusions: The conclusions drawn from the case study should be based on the data analysis and should be relevant to the research questions. The conclusions should be supported by evidence and should be clearly stated.
  • Validate the findings: The findings of the case study should be validated by reviewing the data and the analysis with participants or other experts in the field. This helps to ensure the validity and reliability of the findings.
  • Write the report: The final step is to write the report of the case study research. The report should provide a clear description of the case study phenomenon, the research questions, the data collection methods, the data analysis, the findings, and the conclusions. The report should be written in a clear and concise manner and should follow the guidelines for academic writing.

Examples of Case Study

Here are some examples of case study research:

  • The Hawthorne Studies: Conducted between 1924 and 1932, the Hawthorne Studies were a series of case studies associated with Elton Mayo and his colleagues that examined the impact of the work environment on employee productivity. The studies took place at the Western Electric Company's Hawthorne Works in Cicero, Illinois, near Chicago, and included interviews, observations, and experiments.
  • The Stanford Prison Experiment: Conducted in 1971, the Stanford Prison Experiment was a case study conducted by Philip Zimbardo to examine the psychological effects of power and authority. The study involved simulating a prison environment and assigning participants to the role of guards or prisoners. The study was controversial due to the ethical issues it raised.
  • The Challenger Disaster: Case studies of the Challenger disaster examine the causes of the Space Shuttle Challenger explosion in 1986. They draw on interviews, observations, and data analysis to identify the technical, organizational, and cultural factors that contributed to the disaster.
  • The Enron Scandal: Case studies of the Enron scandal examine the causes of the Enron Corporation’s bankruptcy in 2001. They draw on interviews, analysis of financial data, and review of documents to identify the accounting practices, corporate culture, and ethical issues that led to the company’s downfall.
  • The Fukushima Nuclear Disaster: Case studies of the Fukushima nuclear disaster examine the causes of the accident at the Fukushima Daiichi Nuclear Power Plant in Japan in 2011. They draw on interviews, data analysis, and review of documents to identify the technical, organizational, and cultural factors that contributed to the disaster.

Application of Case Study

Case studies have a wide range of applications across various fields and industries. Here are some examples:

Business and Management

Case studies are widely used in business and management to examine real-life situations and develop problem-solving skills. Case studies can help students and professionals to develop a deep understanding of business concepts, theories, and best practices.

Healthcare

Case studies are used in healthcare to examine patient care, treatment options, and outcomes. Case studies can help healthcare professionals to develop critical thinking skills, diagnose complex medical conditions, and develop effective treatment plans.

Education

Case studies are used in education to examine teaching and learning practices. Case studies can help educators to develop effective teaching strategies, evaluate student progress, and identify areas for improvement.

Social Sciences

Case studies are widely used in social sciences to examine human behavior, social phenomena, and cultural practices. Case studies can help researchers to develop theories, test hypotheses, and gain insights into complex social issues.

Law and Ethics

Case studies are used in law and ethics to examine legal and ethical dilemmas. Case studies can help lawyers, policymakers, and ethics professionals to develop critical thinking skills, analyze complex cases, and make informed decisions.

Purpose of Case Study

The purpose of a case study is to provide a detailed analysis of a specific phenomenon, issue, or problem in its real-life context. A case study is a qualitative research method that involves the in-depth exploration and analysis of a particular case, which can be an individual, group, organization, event, or community.

The primary purpose of a case study is to generate a comprehensive and nuanced understanding of the case, including its history, context, and dynamics. Case studies can help researchers to identify and examine the underlying factors, processes, and mechanisms that contribute to the case and its outcomes. This can help to develop a more accurate and detailed understanding of the case, which can inform future research, practice, or policy.

Case studies can also serve other purposes, including:

  • Illustrating a theory or concept: Case studies can be used to illustrate and explain theoretical concepts and frameworks, providing concrete examples of how they can be applied in real-life situations.
  • Developing hypotheses: Case studies can help to generate hypotheses about the causal relationships between different factors and outcomes, which can be tested through further research.
  • Providing insight into complex issues: Case studies can provide insights into complex and multifaceted issues, which may be difficult to understand through other research methods.
  • Informing practice or policy: Case studies can be used to inform practice or policy by identifying best practices, lessons learned, or areas for improvement.

Advantages of Case Study Research

There are several advantages of case study research, including:

  • In-depth exploration: Case study research allows for a detailed exploration and analysis of a specific phenomenon, issue, or problem in its real-life context. This can provide a comprehensive understanding of the case and its dynamics, which may not be possible through other research methods.
  • Rich data: Case study research can generate rich and detailed data, including qualitative data such as interviews, observations, and documents. This can provide a nuanced understanding of the case and its complexity.
  • Holistic perspective: Case study research allows for a holistic perspective of the case, taking into account the various factors, processes, and mechanisms that contribute to the case and its outcomes. This can help to develop a more accurate and comprehensive understanding of the case.
  • Theory development: Case study research can help to develop and refine theories and concepts by providing empirical evidence and concrete examples of how they can be applied in real-life situations.
  • Practical application: Case study research can inform practice or policy by identifying best practices, lessons learned, or areas for improvement.
  • Contextualization: Case study research takes into account the specific context in which the case is situated, which can help to understand how the case is influenced by the social, cultural, and historical factors of its environment.

Limitations of Case Study Research

There are several limitations of case study research, including:

  • Limited generalizability: Case studies are typically focused on a single case or a small number of cases, which limits the generalizability of the findings. The unique characteristics of the case may not be applicable to other contexts or populations, which may limit the external validity of the research.
  • Biased sampling: Case studies may rely on purposive or convenience sampling, which can introduce bias into the sample selection process. This may limit the representativeness of the sample and the generalizability of the findings.
  • Subjectivity: Case studies rely on the interpretation of the researcher, which can introduce subjectivity into the analysis. The researcher’s own biases, assumptions, and perspectives may influence the findings, which may limit the objectivity of the research.
  • Limited control: Case studies are typically conducted in naturalistic settings, which limits the control that the researcher has over the environment and the variables being studied. This may limit the ability to establish causal relationships between variables.
  • Time-consuming: Case studies can be time-consuming to conduct, as they typically involve a detailed exploration and analysis of a specific case. This may limit the feasibility of conducting multiple case studies or conducting case studies in a timely manner.
  • Resource-intensive: Case studies may require significant resources, including time, funding, and expertise. This may limit the ability of researchers to conduct case studies in resource-constrained settings.

About the author


Muhammad Hassan

Researcher, Academic Writer, Web developer


The Ultimate Guide to Qualitative Research - Part 1: The Basics


Case studies

Case studies are essential to qualitative research, offering a lens through which researchers can investigate complex phenomena within their real-life contexts. This chapter explores the concept, purpose, applications, examples, and types of case studies and provides guidance on how to conduct case study research effectively.


Whereas quantitative methods look at phenomena at scale, case study research looks at a concept or phenomenon in considerable detail. While analyzing a single case can help understand one perspective regarding the object of research inquiry, analyzing multiple cases can help obtain a more holistic sense of the topic or issue. Let's provide a basic definition of a case study, then explore its characteristics and role in the qualitative research process.

Definition of a case study

A case study in qualitative research is a strategy of inquiry that involves an in-depth investigation of a phenomenon within its real-world context. It provides researchers with the opportunity to acquire an in-depth understanding of intricate details that might not be as apparent or accessible through other methods of research. The specific case or cases being studied can be a single person, group, or organization; what constitutes a relevant case worth studying depends on the researcher and their research question.

Among qualitative research methods, a case study relies on multiple sources of evidence, such as documents, artifacts, interviews, or observations, to present a complete and nuanced understanding of the phenomenon under investigation. The objective is to illuminate the readers' understanding of the phenomenon beyond its abstract statistical or theoretical explanations.

Characteristics of case studies

Case studies typically possess a number of distinct characteristics that set them apart from other research methods. These characteristics include a focus on holistic description and explanation, flexibility in the design and data collection methods, reliance on multiple sources of evidence, and emphasis on the context in which the phenomenon occurs.

Furthermore, case studies can often involve a longitudinal examination of the case, meaning they study the case over a period of time. These characteristics allow case studies to yield comprehensive, in-depth, and richly contextualized insights about the phenomenon of interest.

The role of case studies in research

Case studies hold a unique position in the broader landscape of research methods aimed at theory development. They are instrumental when the primary research interest is to gain an intensive, detailed understanding of a phenomenon in its real-life context.

In addition, case studies can serve different purposes within research: they can be used for exploratory, descriptive, or explanatory purposes, depending on the research question and objectives. This flexibility and depth make case studies a valuable tool in the toolkit of qualitative researchers.

Remember, a well-conducted case study can offer a rich, insightful contribution to both academic and practical knowledge through theory development or theory verification, thus enhancing our understanding of complex phenomena in their real-world contexts.

What is the purpose of a case study?

Case study research aims for a more comprehensive understanding of phenomena, requiring various research methods to gather information for qualitative analysis. Ultimately, a case study can allow the researcher to gain insight into a particular object of inquiry and develop a theoretical framework relevant to the research inquiry.

Why use case studies in qualitative research?

Using case studies as a research strategy depends mainly on the nature of the research question and the researcher's access to the data.

Conducting case study research provides a level of detail and contextual richness that other research methods might not offer. Case studies are particularly beneficial when there's a need to understand complex social phenomena within their natural contexts.

The explanatory, exploratory, and descriptive roles of case studies

Case studies can take on various roles depending on the research objectives. They can be exploratory when the research aims to discover new phenomena or define new research questions; they are descriptive when the objective is to depict a phenomenon within its context in a detailed manner; and they can be explanatory if the goal is to understand specific relationships within the studied context. Thus, the versatility of case studies allows researchers to approach their topic from different angles, offering multiple ways to uncover and interpret the data.

The impact of case studies on knowledge development

Case studies play a significant role in knowledge development across various disciplines. Analysis of cases provides an avenue for researchers to explore phenomena within their context based on the collected data.


This can result in the production of rich, practical insights that can be instrumental in both theory-building and practice. Case studies allow researchers to delve into the intricacies and complexities of real-life situations, uncovering insights that might otherwise remain hidden.

Types of case studies

In qualitative research, a case study is not a one-size-fits-all approach. Depending on the nature of the research question and the specific objectives of the study, researchers might choose to use different types of case studies. These types differ in their focus, methodology, and the level of detail they provide about the phenomenon under investigation.

Understanding these types is crucial for selecting the most appropriate approach for your research project and effectively achieving your research goals. Let's briefly look at the main types of case studies.

Exploratory case studies

Exploratory case studies are typically conducted to develop a theory or framework around an understudied phenomenon. They can also serve as a precursor to a larger-scale research project. Exploratory case studies are useful when a researcher wants to identify the key issues or questions which can spur more extensive study or be used to develop propositions for further research. These case studies are characterized by flexibility, allowing researchers to explore various aspects of a phenomenon as they emerge, which can also form the foundation for subsequent studies.

Descriptive case studies

Descriptive case studies aim to provide a complete and accurate representation of a phenomenon or event within its context. These case studies are often based on an established theoretical framework, which guides how data is collected and analyzed. The researcher is concerned with describing the phenomenon in detail, as it occurs naturally, without trying to influence or manipulate it.

Explanatory case studies

Explanatory case studies are focused on explanation: they seek to clarify how or why certain phenomena occur. Often used in complex, real-life situations, they can be particularly valuable in clarifying causal relationships among concepts and understanding the interplay between different factors within a specific context.

Intrinsic, instrumental, and collective case studies

These three categories of case studies focus on the nature and purpose of the study. An intrinsic case study is conducted when a researcher has an inherent interest in the case itself. Instrumental case studies are employed when the case is used to provide insight into a particular issue or phenomenon. A collective case study, on the other hand, involves studying multiple cases simultaneously to investigate a general phenomenon.

Each type of case study serves a different purpose and has its own strengths and challenges. The selection of the type should be guided by the research question and objectives, as well as the context and constraints of the research.

The flexibility, depth, and contextual richness offered by case studies make this approach an excellent research method for various fields of study. They enable researchers to investigate real-world phenomena within their specific contexts, capturing nuances that other research methods might miss. Across numerous fields, case studies provide valuable insights into complex issues.

Critical information systems research

Case studies provide a detailed understanding of the role and impact of information systems in different contexts. They offer a platform to explore how information systems are designed, implemented, and used and how they interact with various social, economic, and political factors. Case studies in this field often focus on examining the intricate relationship between technology, organizational processes, and user behavior, helping to uncover insights that can inform better system design and implementation.

Health research

Health research is another field where case studies are highly valuable. They offer a way to explore patient experiences, healthcare delivery processes, and the impact of various interventions in a real-world context.


Case studies can provide a deep understanding of a patient's journey, giving insights into the intricacies of disease progression, treatment effects, and the psychosocial aspects of health and illness.

Asthma research studies

Specifically within medical research, studies on asthma often employ case studies to explore the individual and environmental factors that influence asthma development, management, and outcomes. A case study can provide rich, detailed data about individual patients' experiences, from the triggers and symptoms they experience to the effectiveness of various management strategies. This can be crucial for developing patient-centered asthma care approaches.

Other fields

Apart from the fields mentioned, case studies are also extensively used in business and management research, education research, and political sciences, among many others. They provide an opportunity to delve into the intricacies of real-world situations, allowing for a comprehensive understanding of various phenomena.

Case studies, with their depth and contextual focus, offer unique insights across these varied fields. They allow researchers to illuminate the complexities of real-life situations, contributing to both theory and practice.


Understanding the key elements of case study design is crucial for conducting rigorous and impactful case study research. A well-structured design guides the researcher through the process, ensuring that the study is methodologically sound and its findings are reliable and valid. The main elements of case study design include the research question, propositions, units of analysis, and the logic linking the data to the propositions.

The research question is the foundation of any research study. A good research question guides the direction of the study and informs the selection of the case, the methods of collecting data, and the analysis techniques. A well-formulated research question in case study research is typically clear, focused, and complex enough to merit further detailed examination of the relevant case(s).

Propositions

Propositions, though not necessary in every case study, provide a direction by stating what we might expect to find in the data collected. They guide how data is collected and analyzed by helping researchers focus on specific aspects of the case. They are particularly important in explanatory case studies, which seek to understand the relationships among concepts within the studied phenomenon.

Units of analysis

The unit of analysis refers to the case, or the main entity or entities that are being analyzed in the study. In case study research, the unit of analysis can be an individual, a group, an organization, a decision, an event, or even a time period. It's crucial to clearly define the unit of analysis, as it shapes the qualitative data analysis process by allowing the researcher to analyze a particular case and synthesize analysis across multiple case studies to draw conclusions.

Argumentation

This refers to the inferential model that allows researchers to draw conclusions from the data. The researcher needs to ensure that there is a clear link between the data, the propositions (if any), and the conclusions drawn. This argumentation is what enables the researcher to make valid and credible inferences about the phenomenon under study.

Understanding and carefully considering these elements in the design phase of a case study can significantly enhance the quality of the research. It can help ensure that the study is methodologically sound and its findings contribute meaningful insights about the case.


Conducting a case study involves several steps, from defining the research question and selecting the case to collecting and analyzing data. This section outlines these key stages, providing a practical guide on how to conduct case study research.

Defining the research question

The first step in case study research is defining a clear, focused research question. This question should guide the entire research process, from case selection to analysis. It's crucial to ensure that the research question is suitable for a case study approach. Typically, such questions are exploratory or descriptive in nature and focus on understanding a phenomenon within its real-life context.

Selecting and defining the case

The selection of the case should be based on the research question and the objectives of the study. It involves choosing a unique example or a set of examples that provide rich, in-depth data about the phenomenon under investigation. After selecting the case, it's crucial to define it clearly, setting the boundaries of the case, including the time period and the specific context.

Previous research can help guide the case study design. For instance, the way cases were defined in earlier case study research can serve as a model for defining cases in a new research inquiry. Reviewing recently published examples can show how to select and define cases effectively.

Developing a detailed case study protocol

A case study protocol outlines the procedures and general rules to be followed during the case study. This includes the data collection methods to be used, the sources of data, and the procedures for analysis. A detailed case study protocol helps ensure consistency and reliability throughout the study.

The protocol should also consider how to work with the people involved in the research context so that the research team can gain access for data collection. As mentioned in previous sections of this guide, establishing rapport is an essential component of qualitative research because it shapes the overall potential for collecting and analyzing data.

Collecting data

Gathering data in case study research often involves multiple sources of evidence, including documents, archival records, interviews, observations, and physical artifacts. This allows for a comprehensive understanding of the case. The process for gathering data should be systematic and carefully documented to ensure the reliability and validity of the study.

Analyzing and interpreting data

The next step is analyzing the data. This involves organizing the data, categorizing it into themes or patterns, and interpreting these patterns to answer the research question. The analysis might also involve comparing the findings with prior research or theoretical propositions.

Writing the case study report

The final step is writing the case study report. This should provide a detailed description of the case, the data, the analysis process, and the findings. The report should be clear, organized, and carefully written to ensure that the reader can understand the case and the conclusions drawn from it.

Each of these steps is crucial in ensuring that the case study research is rigorous, reliable, and provides valuable insights about the case.

The type, depth, and quality of data in your study can significantly influence the validity and utility of the study. In case study research, data is usually collected from multiple sources to provide a comprehensive and nuanced understanding of the case. This section will outline the various methods of collecting data used in case study research and discuss considerations for ensuring the quality of the data.

Interviews

Interviews are a common method of gathering data in case study research. They can provide rich, in-depth data about the perspectives, experiences, and interpretations of the individuals involved in the case. Interviews can be structured, semi-structured, or unstructured, depending on the research question and the degree of flexibility needed.

Observations

Observations involve the researcher observing the case in its natural setting, providing first-hand information about the case and its context. Observations can provide data that might not be revealed in interviews or documents, such as non-verbal cues or contextual information.

Documents and artifacts

Documents and archival records provide a valuable source of data in case study research. They can include reports, letters, memos, meeting minutes, email correspondence, and various public and private documents related to the case.

These records can provide historical context, corroborate evidence from other sources, and offer insights into the case that might not be apparent from interviews or observations.

Physical artifacts refer to any physical evidence related to the case, such as tools, products, or physical environments. These artifacts can provide tangible insights into the case, complementing the data gathered from other sources.

Ensuring the quality of data collection

Ensuring the quality of data in case study research requires careful planning and execution. It's crucial to ensure that the data is reliable, accurate, and relevant to the research question. This involves selecting appropriate methods of collecting data, properly training interviewers or observers, and systematically recording and storing the data. It also includes considering ethical issues related to collecting and handling data, such as obtaining informed consent and ensuring the privacy and confidentiality of the participants.

Data analysis

Analyzing case study research involves making sense of the rich, detailed data to answer the research question. This process can be challenging due to the volume and complexity of case study data. However, a systematic and rigorous approach to analysis can ensure that the findings are credible and meaningful. This section outlines the main steps and considerations in analyzing data in case study research.

Organizing the data

The first step in the analysis is organizing the data. This involves sorting the data into manageable sections, often according to the data source or the theme. This step can also involve transcribing interviews, digitizing physical artifacts, or organizing observational data.

Categorizing and coding the data

Once the data is organized, the next step is to categorize or code the data. This involves identifying common themes, patterns, or concepts in the data and assigning codes to relevant data segments. Coding can be done manually or with qualitative data analysis software, which can make the coding process considerably more efficient. Coding helps to reduce the data to a set of themes or categories that can be more easily analyzed.
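For researchers who keep coded material in a script rather than a dedicated package, the coding step can be sketched in Python. This is a minimal illustration under stated assumptions, not how ATLAS.ti or any other tool works internally: the `Segment` class, the sample excerpts, and the code names are hypothetical, and the codes are assumed to have already been assigned by the researcher. The sketch builds a simple code index (which segments carry each code) and counts how often each code was applied.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass
class Segment:
    """One excerpt of case data plus the codes a researcher assigned to it (hypothetical structure)."""
    source: str                                    # where the excerpt came from, e.g. an interview ID
    text: str                                      # the excerpt itself
    codes: set[str] = field(default_factory=set)   # codes assigned by the researcher


def segments_by_code(segments):
    """Build a code index: for each code, the segments it was applied to, in original order."""
    index = defaultdict(list)
    for seg in segments:
        for code in seg.codes:
            index[code].append(seg)
    return dict(index)


def code_frequencies(segments):
    """Count how many segments each code was applied to."""
    return Counter(code for seg in segments for code in seg.codes)


# Hypothetical coded excerpts drawn from different data sources.
segments = [
    Segment("interview_01", "We never heard about the reorganization until it happened.",
            {"communication", "change"}),
    Segment("interview_02", "Managers held weekly briefings on the new structure.",
            {"communication"}),
    Segment("memo_03", "The training budget was cut by the board.",
            {"resources"}),
    Segment("observation_04", "Staff gathered informally to discuss their new roles.",
            {"change", "informal_networks"}),
]

# Print each code with its count and the sources it appears in.
for code, count in sorted(code_frequencies(segments).items()):
    sources = [seg.source for seg in segments_by_code(segments)[code]]
    print(f"{code}: {count} segment(s) from {sources}")
```

A high count only shows that a code was applied often; whether it reflects an important theme remains an interpretive judgment grounded in the data.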

Identifying patterns and themes

After coding the data, the researcher looks for patterns or themes in the coded data. This involves comparing and contrasting the codes and looking for relationships or patterns among them. The identified patterns and themes should help answer the research question.
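One common way to look for relationships among codes is to count how often pairs of codes are applied to the same data segment (code co-occurrence). The Python sketch below assumes each coded segment is represented as a set of code names; the sample codes are hypothetical. A frequently co-occurring pair only flags a possible relationship, so the researcher still has to return to the underlying excerpts to judge whether the pattern is meaningful.

```python
from collections import Counter
from itertools import combinations

# Hypothetical coded segments: each set holds the codes a researcher applied to one excerpt.
coded_segments = [
    {"communication", "change"},
    {"communication"},
    {"resources", "change"},
    {"change", "informal_networks", "communication"},
]


def co_occurrences(segments):
    """Count how often each pair of codes is applied to the same segment."""
    pairs = Counter()
    for codes in segments:
        # Sorting makes ("a", "b") and ("b", "a") count as the same pair.
        for pair in combinations(sorted(codes), 2):
            pairs[pair] += 1
    return pairs


# List pairs from most to least frequent, breaking ties alphabetically.
for (a, b), n in sorted(co_occurrences(coded_segments).items(),
                        key=lambda item: (-item[1], item[0])):
    print(f"{a} + {b}: {n}")
```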

Interpreting the data

Once patterns and themes have been identified, the next step is to interpret these findings. This involves explaining what the patterns or themes mean in the context of the research question and the case. This interpretation should be grounded in the data, but it can also involve drawing on theoretical concepts or prior research.

Verification of the data

The last step in the analysis is verification. This involves checking the accuracy and consistency of the analysis process and confirming that the findings are supported by the data. This can involve re-checking the original data, checking the consistency of codes, or seeking feedback from research participants or peers.
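When more than one researcher codes the same material, checking the consistency of codes often includes an intercoder agreement statistic. Cohen's kappa, a widely used measure that is not part of the steps described above, corrects raw percentage agreement for the agreement expected by chance. The Python sketch below is a simplified illustration: it assumes each segment receives exactly one code from each of two coders, and the coded lists are hypothetical. Many case studies allow several codes per segment, and some qualitative traditions do not use agreement statistics at all, so treat this as one optional check alongside re-reading the data and seeking feedback from participants or peers.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who each assigned one code per segment."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("Both coders must code the same, non-empty list of segments.")
    n = len(coder_a)
    # Observed agreement: share of segments where both coders chose the same code.
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of each coder's proportion of use for every code.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1:
        # Both coders used one identical code throughout; kappa is undefined, report 1.0.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical codes assigned by two coders to the same ten segments.
coder_a = ["comm", "comm", "change", "resources", "comm",
           "change", "change", "resources", "comm", "change"]
coder_b = ["comm", "change", "change", "resources", "comm",
           "change", "comm", "resources", "comm", "change"]

kappa = cohens_kappa(coder_a, coder_b)
print(f"Cohen's kappa: {kappa:.2f}")  # prints "Cohen's kappa: 0.69"
```

Here the coders agree on 8 of 10 segments (80%), but because some of that agreement would be expected by chance, kappa is lower than the raw percentage.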

Like any research method, case study research has its strengths and limitations. Researchers must be aware of these, as they can influence the design, conduct, and interpretation of the study.

Understanding the strengths and limitations of case study research can also guide researchers in deciding whether this approach is suitable for their research question. This section outlines some of the key strengths and limitations of case study research.

Benefits include the following:

  • Rich, detailed data: One of the main strengths of case study research is that it can generate rich, detailed data about the case. This can provide a deep understanding of the case and its context, which can be valuable in exploring complex phenomena.
  • Flexibility: Case study research is flexible in terms of design, data collection, and analysis. A sufficient degree of flexibility allows the researcher to adapt the study according to the case and the emerging findings.
  • Real-world context: Case study research involves studying the case in its real-world context, which can provide valuable insights into the interplay between the case and its context.
  • Multiple sources of evidence: Case study research often involves collecting data from multiple sources, which can enhance the robustness and validity of the findings.

On the other hand, researchers should consider the following limitations:

  • Generalizability: A common criticism of case study research is that its findings might not be generalizable to other cases due to the specificity and uniqueness of each case.
  • Time and resource intensive: Case study research can be time and resource intensive due to the depth of the investigation and the amount of collected data.
  • Complexity of analysis: The rich, detailed data generated in case study research can make analyzing the data challenging.
  • Subjectivity: Given the nature of case study research, there may be a higher degree of subjectivity in interpreting the data, so researchers need to reflect on this and transparently convey to audiences how the research was conducted.

Being aware of these strengths and limitations can help researchers design and conduct case study research effectively and interpret and report the findings appropriately.

Definition and Introduction

Case analysis is a problem-based teaching and learning method that involves critically analyzing complex scenarios within an organizational setting for the purpose of placing the student in a “real world” situation and applying reflection and critical thinking skills to contemplate appropriate solutions, decisions, or recommended courses of action. It is considered a more effective teaching technique than in-class role playing or simulation activities. The analytical process is often guided by questions provided by the instructor that ask students to contemplate relationships between the facts and critical incidents described in the case.

Cases generally include both descriptive and statistical elements and rely on students applying abductive reasoning to develop and argue for preferred or best outcomes [i.e., case scenarios rarely have a single correct or perfect answer based on the evidence provided]. Rather than emphasizing theories or concepts, case analysis assignments emphasize building a bridge of relevancy between abstract thinking and practical application and, by so doing, teach the value of both within a specific area of professional practice.

Given this, the purpose of a case analysis paper is to present a structured and logically organized format for analyzing the case situation. It can be assigned to students individually or as a small group assignment, and it may include an in-class presentation component. Case analysis is predominantly taught in economics and business-related courses, but it is also a method of teaching and learning found in other applied social science disciplines, such as social work, public relations, education, journalism, and public administration.

Ellet, William. The Case Study Handbook: A Student's Guide. Revised Edition. Boston, MA: Harvard Business School Publishing, 2018; Rasche, Christoph and Achim Seisreiner. Guidelines for Business Case Analysis. University of Potsdam; Writing a Case Analysis. Writing Center, Baruch College; Volpe, Guglielmo. "Case Teaching in Economics: History, Practice and Evidence." Cogent Economics and Finance 3 (December 2015). https://doi.org/10.1080/23322039.2015.1120977.

How to Approach Writing a Case Analysis Paper

The organization and structure of a case analysis paper can vary depending on the organizational setting, the situation, and how your professor wants you to approach the assignment. Nevertheless, preparing to write a case analysis paper involves several important steps. As Hawes notes, a case analysis assignment “...is useful in developing the ability to get to the heart of a problem, analyze it thoroughly, and to indicate the appropriate solution as well as how it should be implemented” [p.48]. This statement encapsulates how you should approach preparing to write a case analysis paper.

Before you begin to write your paper, consider the following analytical procedures:

  • Review the case to get an overview of the situation. A case may be only a few pages long; however, it is often lengthy and contains a significant amount of detailed background information and statistics, with multilayered descriptions of the scenario, the roles and behaviors of various stakeholder groups, and situational events. Therefore, a quick reading of the case will help you gain an overall sense of the situation and illuminate the types of issues and problems that you will need to address in your paper. If your professor has provided questions intended to help frame your analysis, use them to guide your initial reading of the case.
  • Read the case thoroughly. After gaining a general overview of the case, carefully read the content again with the purpose of understanding key circumstances, events, and behaviors among stakeholder groups. Look for information or data that appears contradictory, extraneous, or misleading. At this point, you should be taking notes as you read because this will help you develop a general outline of your paper. The aim is to obtain a complete understanding of the situation so that you can begin contemplating tentative answers to any questions your professor has provided or, if none are provided, developing answers to your own questions about the case scenario and its connection to the course readings, lectures, and class discussions.
  • Determine key stakeholder groups, issues, and events and the relationships they all have to each other. As you analyze the content, pay particular attention to identifying individuals, groups, or organizations described in the case and identify evidence of any problems or issues of concern that impact the situation in a negative way. Other things to look for include any assumptions being made by or about each stakeholder, potentially biased explanations or actions, explicit demands or ultimatums, and the underlying concerns that motivate these behaviors among stakeholders. The goal at this stage is to develop a comprehensive understanding of the situational and behavioral dynamics of the case and the explicit and implicit consequences of each of these actions.
  • Identify the core problems. The next step in most case analysis assignments is to discern what the core [i.e., most damaging, detrimental, injurious] problems are within the organizational setting and to determine their implications. The purpose at this stage of preparing to write your analysis paper is to distinguish between the symptoms of core problems and the core problems themselves and to decide which of these must be addressed immediately and which do not appear critical but may escalate over time. Support your decisions with evidence from the case by determining what information or data is essential to addressing the core problems and what information is irrelevant or misleading.
  • Explore alternative solutions. As noted, case analysis scenarios rarely have only one correct answer. Therefore, keep in mind that the process of analyzing the case and diagnosing core problems, while based on evidence, is a subjective process open to various interpretations. This means that you must consider alternative solutions or courses of action by critically examining strengths and weaknesses, risk factors, and the differences between short- and long-term solutions. For each possible solution or course of action, consider the consequences of implementing it and how it might lead to new problems. Also, consider your recommended solutions or courses of action in relation to issues of fairness, equity, and inclusion.
  • Decide on a final set of recommendations. The last stage in preparing to write a case analysis paper is to assert an opinion or viewpoint about the recommendations needed to help resolve the core problems as you see them and to make a persuasive argument supporting this point of view. Prepare a clear rationale for your recommendations based on examining each element of your analysis. Anticipate possible obstacles that could derail their implementation. Consider any counter-arguments that could be made concerning the validity of your recommended actions. Finally, describe a set of criteria and measurable indicators that could be used to evaluate the effectiveness of your implementation plan.

Use these steps as the framework for writing your paper. Remember that the more detailed you are in taking notes as you critically examine each element of the case, the more information you will have to draw from when you begin to write. This will save you time.

NOTE: If the process of preparing to write a case analysis paper is assigned as a student group project, consider having each member of the group analyze a specific element of the case, including drafting answers to the corresponding questions your professor uses to frame the analysis. This will help make the analytical process more efficient and ensure that the distribution of work is equitable. It can also clarify who is responsible for drafting each part of the final case analysis paper and, if applicable, the in-class presentation.

Framework for Case Analysis. College of Management. University of Massachusetts; Hawes, Jon M. "Teaching is Not Telling: The Case Method as a Form of Interactive Learning." Journal for Advancement of Marketing Education 5 (Winter 2004): 47-54; Rasche, Christoph and Achim Seisreiner. Guidelines for Business Case Analysis. University of Potsdam; Writing a Case Study Analysis. University of Arizona Global Campus Writing Center; Van Ness, Raymond K. A Guide to Case Analysis. School of Business. State University of New York, Albany; Writing a Case Analysis. Business School, University of New South Wales.

Structure and Writing Style

A case analysis paper should be detailed, concise, persuasive, clearly written, and professional in tone and in the use of language. As with other forms of college-level academic writing, declarative statements that convey information, provide a fact, or offer an explanation or any recommended courses of action should be based on evidence. If allowed by your professor, any external sources used to support your analysis, such as course readings, should be properly cited under a list of references. The organization and structure of case analysis papers can vary depending on your professor’s preferred format, but the structure generally follows the steps used for analyzing the case.

Introduction

The introduction should provide a succinct but thorough descriptive overview of the main facts, issues, and core problems of the case. The introduction should also include a brief summary of the most relevant details about the situation and organizational setting, including the theoretical framework or conceptual model that informed any questions used to frame your analysis.

Following the rules of most college-level research papers, the introduction should then inform the reader how the paper will be organized. This includes describing the major sections of the paper and the order in which they will be presented. Unless your professor tells you to do so, you do not need to preview your final recommendations in the introduction. Unlike most college-level research papers, the introduction does not include a statement about the significance of your findings because a case analysis assignment does not involve contributing new knowledge about a research problem.

Background Analysis

Background analysis can vary depending on any guiding questions provided by your professor and the underlying concept or theory that the case is based upon. In general, however, this section of your paper should focus on:

  • Providing an overarching analysis of problems identified from the case scenario, including identifying events that stakeholders find challenging or troublesome,
  • Identifying assumptions made by each stakeholder and any apparent biases they may exhibit,
  • Describing any demands or claims made by or forced upon key stakeholders, and
  • Highlighting any issues of concern or complaints expressed by stakeholders in response to those demands or claims.

These aspects of the case are often in the form of behavioral responses expressed by individuals or groups within the organizational setting. However, note that problems in a case situation can also be reflected in data [or the lack thereof] and in the decision-making, operational, cultural, or institutional structure of the organization. Additionally, demands or claims can be either internal or external to the organization [e.g., a case analysis involving a president considering arms sales to Saudi Arabia could include managing internal demands from White House advisors as well as demands from members of Congress].

Throughout this section, present all relevant evidence from the case that supports your analysis. Do not simply claim there is a problem, an assumption, a demand, or a concern; tell the reader what part of the case informed how you identified these background elements.

Identification of Problems

In most case analysis assignments, there are problems, and then there are problems. Each problem can reflect a multitude of underlying symptoms that are detrimental to the interests of the organization. The purpose of identifying problems is to teach students how to differentiate between problems that vary in severity, impact, and relative importance. Given this, problems can be described in three general forms: those that must be addressed immediately, those that should be addressed but the impact is not severe, and those that do not require immediate attention and can be set aside for the time being.

All of the problems you identify from the case should be identified in this section of your paper, with a description based on evidence explaining the problem variances. If the assignment asks you to conduct research to further support your assessment of the problems, include this in your explanation. Remember to cite those sources in a list of references. Use specific evidence from the case and apply appropriate concepts, theories, and models discussed in class or in relevant course readings to highlight and explain the key problems [or problem] that you believe must be solved immediately and describe the underlying symptoms and why they are so critical.

Alternative Solutions

This section is where you provide specific, realistic, and evidence-based solutions to the problems you have identified and make recommendations about how to alleviate the underlying symptomatic conditions impacting the organizational setting. For each solution, you must explain why it was chosen and provide clear evidence to support your reasoning. This can include, for example, course readings and class discussions as well as research resources, such as books, journal articles, research reports, or government documents. In some cases, your professor may encourage you to include personal, anecdotal experiences as evidence to support why you chose a particular solution or set of solutions. Using anecdotal evidence helps promote reflective thinking about the process of determining what qualifies as a core problem and relevant solution.

Throughout this part of the paper, keep in mind the entire array of problems that must be addressed and describe in detail the solutions that might be implemented to resolve these problems.

Recommended Courses of Action

In some case analysis assignments, your professor may ask you to combine the alternative solutions section with your recommended courses of action. However, it is important to know the difference between the two. A solution refers to the answer to a problem. A course of action refers to a procedure or deliberate sequence of activities adopted to proactively confront a situation, often in the context of accomplishing a goal. In this context, proposed courses of action are based on your analysis of alternative solutions. Your description and justification for pursuing each course of action should represent the overall plan for implementing your recommendations.

For each course of action, you need to explain the rationale for your recommendation in a way that confronts challenges, explains risks, and anticipates any counter-arguments from stakeholders. Do this by considering the strengths and weaknesses of each course of action framed in relation to how the action is expected to resolve the core problems presented, the possible ways the action may affect remaining problems, and how the recommended action will be perceived by each stakeholder.

In addition, you should describe the criteria needed to measure how well the implementation of these actions is working and explain which individuals or groups are responsible for ensuring your recommendations are successful. Always consider the law of unintended consequences as well. Outline difficulties that may arise in implementing each course of action and describe how implementing the proposed courses of action [either individually or collectively] may lead to new problems [both large and small].

Throughout this section, you must consider the costs and benefits of recommending your courses of action in relation to uncertainties or missing information and the negative consequences of success.

Conclusion

The conclusion should be brief and introspective. Unlike a research paper, the conclusion in a case analysis paper does not include a summary of key findings and their significance, a statement about how the study contributed to existing knowledge, or an indication of opportunities for future research.

Begin by synthesizing the core problems presented in the case and the relevance of your recommended solutions. This can include an explanation of what you have learned about the case in the context of your answers to the questions provided by your professor. The conclusion is also where you link what you learned from analyzing the case with the course readings or class discussions. This can further demonstrate your understanding of the relationships between the practical case situation and the theoretical and abstract content of assigned readings and other course content.

Problems to Avoid

The literature on case analysis assignments often includes examples of difficulties students have with applying methods of critical analysis and effectively reporting the results of their assessment of the situation. A common reason cited by scholars is that the application of this type of teaching and learning method is limited to applied fields of social and behavioral sciences and, as a result, writing a case analysis paper can be unfamiliar to most students entering college.

After you have drafted your paper, proofread the narrative flow and revise any of these common errors:

  • Unnecessary detail in the background section. The background section should highlight the essential elements of the case based on your analysis. Focus on summarizing the facts and highlighting the key factors that become relevant in the other sections of the paper, and eliminate any unnecessary information.
  • Analysis relies too much on opinion. Your analysis is interpretive, but the narrative must be clearly connected to evidence from the case and to any models and theories discussed in class or in course readings. Any positions or arguments you make should be supported by evidence.
  • Analysis does not focus on the most important elements of the case. Your paper should provide a thorough overview of the case. However, the analysis should focus on providing evidence about what you identify as the key events, stakeholders, issues, and problems. Emphasize what you identify as the most critical aspects of the case to be developed throughout your analysis. Be thorough but succinct.
  • Writing is too descriptive. A paper with too much descriptive information detracts from your analysis of the complexities of the case situation. Questions about what happened, where, when, and by whom should only be included as essential information leading to your examination of questions related to why, how, and for what purpose.
  • Inadequate definition of a core problem and associated symptoms. A common error found in case analysis papers is recommending a solution or course of action without adequately defining or demonstrating that you understand the problem. Make sure you have clearly described the problem and its impact and scope within the organizational setting. Ensure that you have adequately described the root causes when describing the symptoms of the problem.
  • Recommendations lack specificity. Identify any vague statements and indeterminate terminology, such as “a particular experience” or “a large increase to the budget.” These statements cannot be measured and, as a result, there is no way to evaluate their successful implementation. Provide specific data and use direct language in describing recommended actions.
  • Unrealistic, exaggerated, or unattainable recommendations. Review your recommendations to ensure that they are based on the situational facts of the case. Your recommended solutions and courses of action must be based on realistic assumptions and fit within the constraints of the situation. Also note that the case scenario has already happened; therefore, any speculation or arguments about what could have occurred if the circumstances were different should be revised or eliminated.

Bee, Lian Song et al. "Business Students' Perspectives on Case Method Coaching for Problem-Based Learning: Impacts on Student Engagement and Learning Performance in Higher Education." Education & Training 64 (2022): 416-432; The Case Analysis. Fred Meijer Center for Writing and Michigan Authors. Grand Valley State University; Georgallis, Panikos and Kayleigh Bruijn. "Sustainability Teaching using Case-Based Debates." Journal of International Education in Business 15 (2022): 147-163; Hawes, Jon M. "Teaching is Not Telling: The Case Method as a Form of Interactive Learning." Journal for Advancement of Marketing Education 5 (Winter 2004): 47-54; Dean, Kathy Lund and Charles J. Fornaciari. "How to Create and Use Experiential Case-Based Exercises in a Management Classroom." Journal of Management Education 26 (October 2002): 586-603; Klebba, Joanne M. and Janet G. Hamilton. "Structured Case Analysis: Developing Critical Thinking Skills in a Marketing Case Course." Journal of Marketing Education 29 (August 2007): 132-137, 139; Klein, Norman. "The Case Discussion Method Revisited: Some Questions about Student Skills." Exchange: The Organizational Behavior Teaching Journal 6 (November 1981): 30-32; Mukherjee, Arup. "Effective Use of In-Class Mini Case Analysis for Discovery Learning in an Undergraduate MIS Course." The Journal of Computer Information Systems 40 (Spring 2000): 15-23; Pessoa, Silvia et al. "Scaffolding the Case Analysis in an Organizational Behavior Course: Making Analytical Language Explicit." Journal of Management Education 46 (2022): 226-251; Ramsey, V. J. and L. D. Dodge. "Case Analysis: A Structured Approach." Exchange: The Organizational Behavior Teaching Journal 6 (November 1981): 27-29; Schweitzer, Karen. "How to Write and Format a Business Case Study." ThoughtCo. https://www.thoughtco.com/how-to-write-and-format-a-business-case-study-466324 (accessed December 5, 2022); Reddy, C. D. "Teaching Research Methodology: Everything's a Case." Electronic Journal of Business Research Methods 18 (December 2020): 178-188; Volpe, Guglielmo. "Case Teaching in Economics: History, Practice and Evidence." Cogent Economics and Finance 3 (December 2015). https://doi.org/10.1080/23322039.2015.1120977.

Writing Tip

Case Study and Case Analysis Are Not the Same!

Confusion often exists between writing a paper that uses a case study research design and writing a paper that analyzes a case; they are two different approaches to learning in the social and behavioral sciences. Professors as well as educational researchers contribute to this confusion because they often use the term "case study" when describing the subject of a case analysis paper. But in a case analysis you are not studying a case to generate a comprehensive, multi-faceted understanding of a research problem. Rather, you are critically analyzing a specific scenario to argue logically for recommended solutions and courses of action that lead to optimal outcomes applicable to professional practice.

To avoid any confusion, here are twelve characteristics that delineate the differences between writing a paper using the case study research method and writing a case analysis paper:

  • Case study is a method of in-depth research and rigorous inquiry; case analysis is a reliable method of teaching and learning. A case study is a modality of research that investigates a phenomenon for the purpose of creating new knowledge, solving a problem, or testing a hypothesis using empirical evidence derived from the case being studied. Often, the results are used to generalize about a larger population or within a wider context. The writing adheres to the traditional standards of a scholarly research study. A case analysis is a pedagogical tool used to teach students how to reflect and think critically about a practical, real-life problem in an organizational setting.
  • The researcher is responsible for identifying the case to study; a case analysis is assigned by your professor. As the researcher, you choose the case study to investigate in support of obtaining new knowledge and understanding about the research problem. The case in a case analysis assignment is almost always provided, and sometimes written, by your professor. It is either given to every student in class to analyze individually or to small groups of students, or students select a case to analyze from a predetermined list.
  • A case study is indeterminate and boundless; a case analysis is predetermined and confined. A case study can be almost anything [see item 10 below] as long as it relates directly to examining the research problem. This relationship is the only limit to what a researcher can choose as the subject of their case study. The content of a case analysis is determined by your professor, and its parameters are well-defined and limited to elucidating insights of practical value applied to practice.
  • Case study is fact-based and describes actual events or situations; case analysis can be entirely fictional or adapted from an actual situation. The entire content of a case study must be grounded in reality to be a valid subject of investigation in an empirical research study. A case analysis only needs to set the stage for critically examining a situation in practice and, therefore, can be entirely fictional or adapted, all or in part, from an actual situation.
  • Research using a case study method must adhere to principles of intellectual honesty and academic integrity; a case analysis scenario can include misleading or false information. A case study paper must report research objectively and factually to ensure that any findings are understood to be logically correct and trustworthy. A case analysis scenario may include misleading or false information intended to deliberately distract from the central issues of the case. The purpose is to teach students how to sort through conflicting or useless information in order to come up with the preferred solution. Any use of misleading or false information in academic research is considered unethical.
  • Case study is linked to a research problem; case analysis is linked to a practical situation or scenario. In the social sciences, the subject of an investigation is most often framed as a problem that must be researched in order to generate new knowledge leading to a solution. Case analysis narratives are grounded in real-life scenarios for the purpose of examining the realities of decision-making behavior and processes within organizational settings. Case analysis assignments include a problem or set of problems to be analyzed. However, the goal is centered on the act of identifying and evaluating courses of action leading to the best possible outcomes.
  • The purpose of a case study is to create new knowledge through research; the purpose of a case analysis is to teach new understanding. Case studies are a choice of methodological design intended to create new knowledge about resolving a research problem. A case analysis is a mode of teaching and learning intended to create new understanding and an awareness of uncertainty applied to practice through acts of critical thinking and reflection.
  • A case study seeks to identify the best possible solution to a research problem; case analysis can have an indeterminate set of solutions or outcomes. Your role in studying a case is to discover the most logical, evidence-based ways to address a research problem. A case analysis assignment rarely has a single correct answer because one of the goals is to force students to confront the real-life dynamics of uncertainty, ambiguity, and missing or conflicting information within professional practice. Under these conditions, a perfect outcome or solution almost never exists.
  • Case study is unbounded and relies on gathering external information; case analysis is a self-contained subject of analysis. The scope of a case study chosen as a method of research is bounded. However, the researcher is free to gather whatever information and data are necessary to investigate its relevance to understanding the research problem. For a case analysis assignment, your professor will often ask you to examine solutions or recommended courses of action based solely on facts and information from the case.
  • Case study can be a person, place, object, issue, event, condition, or phenomenon; a case analysis is a carefully constructed synopsis of events, situations, and behaviors. The research problem dictates the type of case being studied and, therefore, the design can encompass almost anything tangible as long as it fulfills the objective of generating new knowledge and understanding. A case analysis is in the form of a narrative containing descriptions of facts, situations, processes, rules, and behaviors within a particular setting and under a specific set of circumstances.
  • Case study can represent an open-ended subject of inquiry; a case analysis is a narrative about something that has happened in the past. A case study is not restricted by time and can encompass an event or issue with no temporal limit or end. For example, the current war in Ukraine can be used as a case study of how medical personnel help civilians during a large military conflict, even though circumstances around this event are still evolving. A case analysis can be used to elicit critical thinking about current or future situations in practice, but the case itself is a narrative about something finite that has taken place in the past.
  • Multiple case studies can be used in a research study; case analysis involves examining a single scenario. Case study research can use two or more cases to examine a problem, often for the purpose of conducting a comparative investigation intended to discover hidden relationships, document emerging trends, or determine variations among different examples. A case analysis assignment typically describes a stand-alone, self-contained situation, and any comparisons among cases are conducted during in-class discussions and/or student presentations.

The Case Analysis. Fred Meijer Center for Writing and Michigan Authors. Grand Valley State University; Mills, Albert J., Gabrielle Durepos, and Eiden Wiebe, editors. Encyclopedia of Case Study Research. Thousand Oaks, CA: SAGE Publications, 2010; Ramsey, V. J. and L. D. Dodge. "Case Analysis: A Structured Approach." Exchange: The Organizational Behavior Teaching Journal 6 (November 1981): 27-29; Yin, Robert K. Case Study Research and Applications: Design and Methods. 6th edition. Thousand Oaks, CA: Sage, 2017; Crowe, Sarah et al. "The Case Study Approach." BMC Medical Research Methodology 11 (2011): 100. doi:10.1186/1471-2288-11-100; Yin, Robert K. Case Study Research: Design and Methods. 4th edition. Thousand Oaks, CA: Sage Publishing, 1994.

  • Last Updated: Jun 3, 2024 9:44 AM
  • URL: https://libguides.usc.edu/writingguide/assignments


Case Study | Definition, Examples & Methods

Published on 5 May 2022 by Shona McCombes. Revised on 30 January 2023.

A case study is a detailed study of a specific subject, such as a person, group, place, event, organisation, or phenomenon. Case studies are commonly used in social, educational, clinical, and business research.

A case study research design usually involves qualitative methods, but quantitative methods are sometimes also used. Case studies are good for describing, comparing, evaluating, and understanding different aspects of a research problem.

Table of contents

  • When to do a case study
  • Step 1: Select a case
  • Step 2: Build a theoretical framework
  • Step 3: Collect your data
  • Step 4: Describe and analyse the case

When to do a case study

A case study is an appropriate research design when you want to gain concrete, contextual, in-depth knowledge about a specific real-world subject. It allows you to explore the key characteristics, meanings, and implications of the case.

Case studies are often a good choice in a thesis or dissertation . They keep your project focused and manageable when you don’t have the time or resources to do large-scale research.

You might use just one complex case study where you explore a single subject in depth, or conduct multiple case studies to compare and illuminate different aspects of your research problem.

Case study examples
Each research question below is followed by a case study that could address it:

  • What are the ecological effects of wolf reintroduction? Case study of wolf reintroduction in Yellowstone National Park in the US.
  • How do populist politicians use narratives about history to gain support? Case studies of Hungarian prime minister Viktor Orbán and US president Donald Trump.
  • How can teachers implement active learning strategies in mixed-level classrooms? Case study of a local school that promotes active learning.
  • What are the main advantages and disadvantages of wind farms for rural communities? Case studies of three rural wind farm development projects in different parts of the country.
  • How are viral marketing strategies changing the relationship between companies and consumers? Case study of the iPhone X marketing campaign.
  • How do experiences of work in the gig economy differ by gender, race, and age? Case studies of Deliveroo and Uber drivers in London.


Step 1: Select a case

Once you have developed your problem statement and research questions, you should be ready to choose the specific case that you want to focus on. A good case study should have the potential to:

  • Provide new or unexpected insights into the subject
  • Challenge or complicate existing assumptions and theories
  • Propose practical courses of action to resolve a problem
  • Open up new directions for future research

Unlike quantitative or experimental research, a strong case study does not require a random or representative sample. In fact, case studies often deliberately focus on unusual, neglected, or outlying cases that may shed new light on the research problem.

However, you can also choose a more common or representative case to exemplify a particular category, experience, or phenomenon.

If you find yourself aiming to simultaneously investigate and solve an issue, consider conducting action research. As its name suggests, action research conducts research and takes action at the same time, and is highly iterative and flexible.
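Choosing between an outlying case and a more representative one is mainly a substantive judgement, but when a single numeric indicator is available for each candidate, a shortlist can be drawn up mechanically. The minimal Python sketch below assumes a hypothetical pool of schools scored on one indicator, treats the candidate closest to the mean as "typical", and flags candidates with an absolute z-score of at least 1.5 (an arbitrary cut-off) as "deviant". The function `select_cases` and all data are illustrative only.

```python
from statistics import mean, pstdev


def select_cases(scores: dict[str, float], z_threshold: float = 1.5) -> tuple[str, list[str]]:
    """Pick one typical case and any deviant (outlying) cases from a candidate pool.

    scores: hypothetical mapping of candidate case name -> a single indicator value.
    The typical case is the candidate closest to the pool mean; deviant cases are
    candidates whose absolute z-score is at least z_threshold (an arbitrary cut-off).
    """
    values = list(scores.values())
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation of the pool

    # Typical case: smallest distance from the mean (ties go to the first listed case).
    typical = min(scores, key=lambda name: abs(scores[name] - mu))

    if sigma == 0:
        return typical, []  # every candidate is identical, so nothing is outlying

    # Deviant cases, most extreme first.
    deviant = [name for name, v in scores.items() if abs(v - mu) / sigma >= z_threshold]
    deviant.sort(key=lambda name: abs(scores[name] - mu), reverse=True)
    return typical, deviant


# Hypothetical pool: change in average reading score after a new curriculum.
pool = {"School A": 2.0, "School B": 3.0, "School C": 2.5, "School D": 12.0, "School E": 2.5}
typical, deviant = select_cases(pool)
print(typical, deviant)  # School B ['School D']
```

A shortlist like this is only a starting point: the final choice should still rest on which case can best answer the research question.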

Step 2: Build a theoretical framework

While case studies focus more on concrete details than general theories, they should usually have some connection with theory in the field. This way the case study is not just an isolated description, but is integrated into existing knowledge about the topic. It might aim to:

  • Exemplify a theory by showing how it explains the case under investigation
  • Expand on a theory by uncovering new concepts and ideas that need to be incorporated
  • Challenge a theory by exploring an outlier case that doesn’t fit with established assumptions

To ensure that your analysis of the case has a solid academic grounding, you should conduct a literature review of sources related to the topic and develop a theoretical framework. This means identifying key concepts and theories to guide your analysis and interpretation.

Step 3: Collect your data

There are many different research methods you can use to collect data on your subject. Case studies tend to focus on qualitative data using methods such as interviews, observations, and analysis of primary and secondary sources (e.g., newspaper articles, photographs, official records). Sometimes a case study will also collect quantitative data.

The aim is to gain as thorough an understanding as possible of the case and its context.

Step 4: Describe and analyse the case

In writing up the case study, you need to bring together all the relevant aspects to give as complete a picture as possible of the subject.

How you report your findings depends on the type of research you are doing. Some case studies are structured like a standard scientific paper or thesis, with separate sections or chapters for the methods, results, and discussion.

Others are written in a more narrative style, aiming to explore the case from various angles and analyse its meanings and implications (for example, by using textual analysis or discourse analysis).

In all cases, though, make sure to give contextual details about the case, connect it back to the literature and theory, and discuss how it fits into wider patterns or debates.

Cite this Scribbr article


McCombes, S. (2023, January 30). Case Study | Definition, Examples & Methods. Scribbr. Retrieved 11 June 2024, from https://www.scribbr.co.uk/research-methods/case-studies/


Shona McCombes



Academic Success Center

Research Writing and Analysis


Writing a Case Study


What is a case study?


A case study is:

  • An in-depth research design that primarily uses a qualitative methodology but sometimes includes quantitative methodology.
  • Used to examine an identifiable problem confirmed through research.
  • Used to investigate an individual, group of people, organization, or event.
  • Used mostly to answer "how" and "why" questions.

What are the different types of case studies?


Descriptive

This type of case study allows the researcher to:

Example research question: How has the implementation and use of the instructional coaching intervention for elementary teachers impacted students’ attitudes toward reading?

Explanatory

This type of case study allows the researcher to:

Example research question: Why do differences exist when implementing the same online reading curriculum in three elementary classrooms?

Exploratory

This type of case study allows the researcher to:

Example research question: What are potential barriers to students’ reading success when middle school teachers implement the Ready Reader curriculum online?

Multiple Case Studies (or Collective Case Study)

This type of case study allows the researcher to:

Example research question: How are individual school districts addressing student engagement in an online classroom?

Intrinsic

This type of case study allows the researcher to:

Example research question: How does a student’s familial background influence a teacher’s ability to provide meaningful instruction?

Instrumental

This type of case study allows the researcher to:

Example research question: How did a rural school district’s integration of a reward system maximize student engagement?

Note: These are the primary types of case studies. As you continue to research and learn about case studies, you will find a much longer list of different types.

Who are your case study participants?

Individual

This type of study is used to understand an individual by developing a detailed explanation of the individual’s lived experiences or perceptions.

Group

This type of study is used to explore the perceptions of a particular group of people.

Organization

This type of study is used to explore the perspectives of people who work for, or have interacted with, a specific organization or company.

Event

This type of study is used to explore participants’ perceptions of an event.

What is triangulation?

Validity and credibility are essential parts of a case study. Therefore, the researcher should use triangulation, that is, cross-checking findings across multiple data sources, methods, investigators, or theories, to strengthen trustworthiness while accurately reflecting what the researcher seeks to investigate.
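Data-source triangulation, one of several kinds, can be illustrated with a minimal Python sketch that records which types of evidence support each finding and keeps only the findings backed by at least two distinct types. The findings, source labels, the `triangulate` function, and the two-source threshold are all hypothetical; deciding whether two sources genuinely agree remains an interpretive judgment that no script can make.

```python
from collections import defaultdict


def triangulate(evidence: list[tuple[str, str]], min_sources: int = 2) -> dict[str, list[str]]:
    """Group evidence by finding and keep findings backed by enough distinct source types.

    evidence: (finding, source_type) pairs, e.g. ("...", "interview").
    Returns corroborated findings mapped to the sorted source types that support them.
    """
    sources_by_finding: dict[str, set[str]] = defaultdict(set)
    for finding, source_type in evidence:
        sources_by_finding[finding].add(source_type)  # repeated sources count once

    return {
        finding: sorted(sources)
        for finding, sources in sources_by_finding.items()
        if len(sources) >= min_sources
    }


# Hypothetical coded evidence from an instructional-coaching case study.
evidence = [
    ("coaching improved reading attitudes", "interview"),
    ("coaching improved reading attitudes", "interview"),
    ("coaching improved reading attitudes", "classroom observation"),
    ("coaching improved reading attitudes", "student survey"),
    ("scheduling limited coaching time", "interview"),
]
print(triangulate(evidence))
# {'coaching improved reading attitudes': ['classroom observation', 'interview', 'student survey']}
```

A finding supported by only one source type is not necessarily wrong; it simply needs more corroboration before it is reported with confidence.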


How do you write a case study?

When developing a case study, there are different ways to present the information, but remember to include the five parts of a case study.

  • Last Updated: May 29, 2024 8:05 AM
  • URL: https://resources.nu.edu/researchtools


What is Case Study Analysis? (Explained With Examples)

Oct 11, 2023


Case Study Analysis is a widely used research method that examines a particular individual, group, organization, or event in depth. It is a comprehensive investigative approach that aims to understand the intricacies and complexities of the subject under study. Through the analysis of real-life scenarios and inquiry into various data sources, Case Study Analysis provides valuable insights and knowledge that can be used to inform decision-making and problem-solving strategies.

1°) What is Case Study Analysis?

Case Study Analysis is a research methodology that involves the systematic investigation of a specific case or cases to gain a deep understanding of the subject matter. This analysis encompasses collecting and analyzing various types of data, including qualitative and quantitative information. By examining multiple aspects of the case, such as its context, background, influences, and outcomes, researchers can draw meaningful conclusions and provide valuable insights for various fields of study.

When conducting a Case Study Analysis, researchers typically begin by selecting a case or multiple cases that are relevant to their research question or area of interest. This can involve choosing a specific organization, individual, event, or phenomenon to study. Once the case is selected, researchers gather relevant data through various methods, such as interviews, observations, document analysis, and artifact examination.

The data collected during a Case Study Analysis is then carefully analyzed and interpreted. Researchers use different analytical frameworks and techniques to make sense of the information and identify patterns, themes, and relationships within the data. This process involves coding and categorizing the data, conducting comparative analysis, and drawing conclusions based on the findings.
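As a rough illustration of the bookkeeping side of coding and comparative analysis, the minimal Python sketch below assumes a researcher has already assigned theme codes to data segments by hand. It counts codes per case and compares cases to find shared and case-specific themes. The startup names, the codes, and the helper functions `theme_counts` and `cross_case_themes` are hypothetical, and the interpretive work of coding itself is not automated here.

```python
from collections import Counter


def theme_counts(coded: dict[str, list[str]]) -> dict[str, Counter]:
    """Count how many coded segments were assigned to each theme within each case."""
    return {case: Counter(codes) for case, codes in coded.items()}


def cross_case_themes(coded: dict[str, list[str]]) -> tuple[set[str], dict[str, set[str]]]:
    """Compare cases: themes present in every case, and themes found in only one case."""
    themes = {case: set(codes) for case, codes in coded.items()}
    shared = set.intersection(*themes.values()) if themes else set()
    unique = {}
    for case, own in themes.items():
        # Union of every other case's themes; empty when there is only one case.
        others = set().union(*(t for c, t in themes.items() if c != case))
        unique[case] = own - others
    return shared, unique


# Hypothetical codes a researcher has already applied to interview segments.
coded = {
    "Startup A": ["funding", "team culture", "funding", "product fit"],
    "Startup B": ["funding", "product fit", "regulation"],
}
print(theme_counts(coded)["Startup A"].most_common(1))  # [('funding', 2)]
shared, unique = cross_case_themes(coded)
print(sorted(shared))  # ['funding', 'product fit']
print(unique)  # {'Startup A': {'team culture'}, 'Startup B': {'regulation'}}
```

Counts only summarize coding decisions the researcher has already made; a frequently coded theme is not automatically the most important one.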

One of the key strengths of Case Study Analysis is its ability to provide a rich and detailed understanding of a specific case. This method allows researchers to delve deep into the complexities and nuances of the subject matter, uncovering insights that may not be captured through other research methods. By examining the case in its natural context, researchers can gain a holistic perspective and explore the various factors and variables that contribute to the case.

1.1 - Definition of Case Study Analysis

Case Study Analysis can be defined as an in-depth examination and exploration of a particular case or cases to unravel relevant details and complexities associated with the subject being studied. It involves a comprehensive and detailed analysis of various factors and variables that contribute to the case, aiming to answer research questions and uncover insights that can be applied in real-world scenarios.

When conducting a Case Study Analysis, researchers employ a range of research methods and techniques to collect and analyze data. These methods can include interviews, surveys, observations, document analysis, and experiments, among others. By using multiple sources of data, researchers can triangulate their findings and strengthen the validity and reliability of their analysis.

Furthermore, Case Study Analysis often involves the use of theoretical frameworks and models to guide the research process. These frameworks provide a structured approach to analyzing the case and help researchers make sense of the data collected. By applying relevant theories and concepts, researchers can gain a deeper understanding of the underlying factors and dynamics at play in the case.

1.2 - Advantages of Case Study Analysis

Case Study Analysis offers numerous advantages that make it a popular research method across different disciplines. One significant advantage is its ability to provide rich and detailed information about a specific case, allowing researchers to gain a holistic understanding of the subject matter. Additionally, Case Study Analysis enables researchers to explore complex issues and phenomena in their natural context, capturing intricacies and nuances that other research methods may miss.

Moreover, Case Study Analysis allows researchers to investigate rare or unique cases that may not be easily replicated or studied through experimental methods. This method is particularly useful when studying phenomena that are complex, multifaceted, or involve multiple variables. By examining real-world cases, researchers can gain insights that can be applied to similar situations or inform future research and practice.

Furthermore, this research method allows for the analysis of multiple sources of data, such as interviews, observations, documents, and artifacts, which can contribute to a comprehensive and well-rounded examination of the case. Case Study Analysis also facilitates the exploration and identification of patterns, trends, and relationships within the data, generating valuable insights and knowledge for future reference and application.

1.3 - Disadvantages of Case Study Analysis

While Case Study Analysis offers various advantages, it also comes with certain limitations and challenges. One major limitation is the potential for researcher bias, as the interpretation of data and findings can be influenced by preconceived notions and personal perspectives. Researchers must be aware of their own biases and take steps to minimize their impact on the analysis.

Additionally, Case Study Analysis may suffer from limited generalizability, as it focuses on specific cases and contexts, which might not be applicable or representative of broader populations or situations. The findings of a case study may not be easily generalized to other settings or individuals, and caution should be exercised when applying the results to different contexts.

Moreover, Case Study Analysis can require significant time and resources due to its in-depth nature and the need for meticulous data collection and analysis. This can pose challenges for researchers working with limited budgets or tight deadlines. However, the insights gained from a well-conducted case study are often valuable enough to justify the investment.

Finally, ethical considerations also play a crucial role in Case Study Analysis, as researchers must ensure the protection of participant confidentiality and privacy. Researchers must obtain informed consent from participants and take measures to safeguard their identities and personal information. Ethical guidelines and protocols should be followed to ensure the rights and well-being of the individuals involved in the case study.

2°) Examples of Case Study Analysis

Examples of how Case Study Analysis can be applied demonstrate the method's practical value across various fields. The following illustrative scenarios show different contexts in which it might be used.

2.1 - Example in a Startup Context

In a startup context, a Case Study Analysis might explore the factors that contributed to the success of a particular startup company. It would involve examining the organization's background, strategies, market conditions, and key decision-making processes. This analysis could reveal valuable lessons and insights for aspiring entrepreneurs and those interested in understanding the intricacies of startup success.

2.2 - Example in a Consulting Context

In the consulting industry, Case Study Analysis is often utilized to understand and develop solutions for complex business problems. For instance, a consulting firm might conduct a Case Study Analysis on a company facing challenges in its supply chain management. This analysis would involve identifying the underlying issues, evaluating different options, and proposing recommendations based on the findings. This approach enables consultants to apply their expertise and provide practical solutions to their clients.

2.3 - Example in a Digital Marketing Agency Context

Within a digital marketing agency, Case Study Analysis can be used to examine successful marketing campaigns. By analyzing various factors such as target audience, message effectiveness, channel selection, and campaign metrics, this analysis can provide valuable insights into the strategies and tactics that contribute to successful marketing initiatives. Digital marketers can then apply these insights to optimize future campaigns and drive better results for their clients.

2.4 - Example with Analogies

Case Study Analysis can also be utilized with analogies to investigate specific scenarios and draw parallels to similar situations. For instance, a Case Study Analysis could explore the response of different countries to natural disasters and draw analogies to inform disaster management strategies in other regions. These analogies can help policymakers and researchers develop more effective approaches to mitigate the impact of disasters and protect vulnerable populations.

In conclusion, Case Study Analysis is a powerful research method that provides a comprehensive understanding of a particular individual, group, organization, or event. By analyzing real-life cases and exploring various data sources, researchers can unravel complexities, generate valuable insights, and inform decision-making processes. With its advantages and limitations, Case Study Analysis offers a unique approach to gaining in-depth knowledge and practical application across numerous fields.

About the author


Arnaud Belinga


What is Business Development? (Explained With Examples)

What is Business Development? (Explained With Examples)

What are Business Insights? (Explained With Examples)

What are Business Insights? (Explained With Examples)

What is Business Process Automation? (Explained With Examples)

What is Business Process Automation? (Explained With Examples)

What is a Buyer Persona? (Explained With Examples)

What is a Buyer Persona? (Explained With Examples)

What is the Buyer's Journey? (Explained With Examples)

What is the Buyer's Journey? (Explained With Examples)

What is the Buying Cycle? (Explained With Examples)

What is the Buying Cycle? (Explained With Examples)

What is a Buying Signal? (Explained With Examples)

What is a Buying Signal? (Explained With Examples)

What is a Buying Team? (Explained With Examples)

What is a Buying Team? (Explained With Examples)

What is a C-Level Executive? (Explained With Examples)

What is a C-Level Executive? (Explained With Examples)

What is Call Logging? (Explained With Examples)

What is Call Logging? (Explained With Examples)

What is Call Recording? (Explained With Examples)

What is Call Recording? (Explained With Examples)

What is a Call-to-Action (CTA)? (Explained With Examples)

What is a Call-to-Action (CTA)? (Explained With Examples)

What is Challenger Sales? (Explained With Examples)

What is Challenger Sales? (Explained With Examples)

What is Chasing Lost Deals? (Explained With Examples)

What is Chasing Lost Deals? (Explained With Examples)

What is Churn Prevention? (Explained With Examples)

What is Churn Prevention? (Explained With Examples)

What is Churn Rate? (Explained With Examples)

What is Churn Rate? (Explained With Examples)

What is Click-Through Rate (CTR)? (Explained With Examples)

What is Click-Through Rate (CTR)? (Explained With Examples)

What is Client Acquisition? (Explained With Examples)

What is Client Acquisition? (Explained With Examples)

What is the Closing Ratio? (Explained With Examples)

What is the Closing Ratio? (Explained With Examples)

What is the Ben Franklin Close? (Explained With Examples)

What is the Ben Franklin Close? (Explained With Examples)

What is Cognitive Bias in Sales? (Explained With Examples)

What is Cognitive Bias in Sales? (Explained With Examples)

What is Cognitive Dissonance in Sales? (Explained With Examples)

What is Cognitive Dissonance in Sales? (Explained With Examples)

What is Cold Calling? (Explained With Examples)

What is Cold Calling? (Explained With Examples)

What is Cold Outreach? (Explained With Examples)

What is Cold Outreach? (Explained With Examples)

What is a Competitive Advantage? (Explained With Examples)

What is a Competitive Advantage? (Explained With Examples)

What is a Competitive Analysis? (Explained With Examples)

What is a Competitive Analysis? (Explained With Examples)

What is Competitive Positioning? (Explained With Examples)

What is Competitive Positioning? (Explained With Examples)

What is Conceptual Selling? (Explained With Examples)

What is Conceptual Selling? (Explained With Examples)

What is Consultative Closing? (Explained With Examples)

What is Consultative Closing? (Explained With Examples)

What is Consultative Negotiation? (Explained With Examples)

What is Consultative Negotiation? (Explained With Examples)

What is Consultative Prospecting? (Explained With Examples)

What is Consultative Prospecting? (Explained With Examples)

What is Consultative Selling? (Explained With Examples)

What is Consultative Selling? (Explained With Examples)

What is Content Marketing? (Explained With Examples)

What is Content Marketing? (Explained With Examples)

What is Content Syndication? (Explained With Examples)

What is Content Syndication? (Explained With Examples)

What is a Conversion Funnel? (Explained With Examples)

What is a Conversion Funnel? (Explained With Examples)

What is Conversion Optimization? (Explained With Examples)

What is Conversion Optimization? (Explained With Examples)

What is a Conversion Path? (Explained With Examples)

What is a Conversion Path? (Explained With Examples)

What is Conversion Rate? (Explained With Examples)

What is Conversion Rate? (Explained With Examples)

What is Cost-Per-Click (CPC)? (Explained With Examples)

What is Cost-Per-Click (CPC)? (Explained With Examples)

What is a CRM (Customer Relationship Management)? (Explained With Examples)

What is a CRM (Customer Relationship Management)? (Explained With Examples)

What is Cross-Cultural Selling? (Explained With Examples)

What is Cross-Cultural Selling? (Explained With Examples)

What is a Cross-Sell Ratio? (Explained With Examples)

What is a Cross-Sell Ratio? (Explained With Examples)

What is Cross-Selling? (Explained With Examples)

What is Cross-Selling? (Explained With Examples)

What is Customer Acquisition Cost (CAC)? (Explained With Examples)

What is Customer Acquisition Cost (CAC)? (Explained With Examples)

What is Customer-Centric Marketing? (Explained With Examples)

What is Customer-Centric Marketing? (Explained With Examples)

What is Customer-Centric Selling? (Explained With Examples)

What is Customer-Centric Selling? (Explained With Examples)

What is Customer Journey Mapping? (Explained With Examples)

What is Customer Journey Mapping? (Explained With Examples)

What is the Customer Journey? (Explained With Examples)

What is the Customer Journey? (Explained With Examples)

What is the Customer Lifetime Value (CLV)? (Explained With Examples)

What is the Customer Lifetime Value (CLV)? (Explained With Examples)

What is Customer Profiling? (Explained With Examples)

What is Customer Profiling? (Explained With Examples)

What is Customer Retention? (Explained With Examples)

What is Customer Retention? (Explained With Examples)

What is Dark Social? (Explained With Examples)

What is Dark Social? (Explained With Examples)

What is Data Enrichment? (Explained With Examples)

What is Data Enrichment? (Explained With Examples)

What is Data Segmentation? (Explained With Examples)

What is Data Segmentation? (Explained With Examples)

What is Database Marketing? (Explained With Examples)

What is Database Marketing? (Explained With Examples)

What are Decision Criteria? (Explained With Examples)

What are Decision Criteria? (Explained With Examples)

What is a Decision Maker? (Explained With Examples)

What is a Decision Maker? (Explained With Examples)

What is a Decision-Making Unit (DMU)? (Explained With Examples)

What is a Decision-Making Unit (DMU)? (Explained With Examples)

What is Demand Generation? (Explained With Examples)

What is Demand Generation? (Explained With Examples)

What is Digital Marketing? (Explained With Examples)

What is Digital Marketing? (Explained With Examples)

What is Direct Marketing? (Explained With Examples)

What is Direct Marketing? (Explained With Examples)

What is a Discovery Call? (Explained With Examples)

What is a Discovery Call? (Explained With Examples)

What is a Discovery Meeting? (Explained With Examples)

What is a Discovery Meeting? (Explained With Examples)

What are Discovery Questions? (Explained With Examples)

What are Discovery Questions? (Explained With Examples)

What is Door-to-Door Sales? (Explained With Examples)

What is Door-to-Door Sales? (Explained With Examples)

What is a Drip Campaign? (Explained With Examples)

What is a Drip Campaign? (Explained With Examples)

What is Dunning? (Explained With Examples)

What is Dunning? (Explained With Examples)

What is an Early Adopter? (Explained With Examples)

What is an Early Adopter? (Explained With Examples)

What is Elevator Pitch? (Explained With Examples)

What is Elevator Pitch? (Explained With Examples)

What is Email Hygiene? (Explained With Examples)

What is Email Hygiene? (Explained With Examples)

What is Email Marketing? (Explained With Examples)

What is Email Marketing? (Explained With Examples)

What is Emotional Intelligence Selling? (Explained With Examples)

What is Emotional Intelligence Selling? (Explained With Examples)

What is Engagement Marketing? (Explained With Examples)

What is Engagement Marketing? (Explained With Examples)

What is Engagement Rate? (Explained With Examples)

What is Engagement Rate? (Explained With Examples)

What is Engagement Strategy? (Explained With Examples)

What is Engagement Strategy? (Explained With Examples)

What is Feature-Benefit Selling? (Explained With Examples)

What is Feature-Benefit Selling? (Explained With Examples)

What is Field Sales? (Explained With Examples)

What is Field Sales? (Explained With Examples)

What is a Follow-Up? (Explained With Examples)

What is a Follow-Up? (Explained With Examples)

What is Forecast Accuracy? (Explained With Examples)

What is Forecast Accuracy? (Explained With Examples)

What is a Funnel? (Explained With Examples)

What is a Funnel? (Explained With Examples)

What is Gamification in Sales? (Explained With Examples)

What is Gamification in Sales? (Explained With Examples)

What is Gatekeeper Strategy? (Explained With Examples)

What is Gatekeeper Strategy? (Explained With Examples)

What is Gatekeeper? (Explained With Examples)

What is Gatekeeper? (Explained With Examples)

What is a Go-to Market Strategy? (Explained With Examples)

What is a Go-to Market Strategy? (Explained With Examples)

What is Growth Hacking? (Explained With Examples)

What is Growth Hacking? (Explained With Examples)

What is Growth Marketing? (Explained With Examples)

What is Growth Marketing? (Explained With Examples)

What is Guerrilla Marketing? (Explained With Examples)

What is Guerrilla Marketing? (Explained With Examples)

What is High-Ticket Sales? (Explained With Examples)

What is High-Ticket Sales? (Explained With Examples)

What is Holistic Selling? (Explained With Examples)

What is Holistic Selling? (Explained With Examples)

What is Ideal Customer Profile (ICP)? (Explained With Examples)

What is Ideal Customer Profile (ICP)? (Explained With Examples)

What is Inbound Lead Generation? (Explained With Examples)

What is Inbound Lead Generation? (Explained With Examples)

What is an Inbound Lead? (Explained With Examples)

What is an Inbound Lead? (Explained With Examples)

What is Inbound Marketing? (Explained With Examples)

What is Inbound Marketing? (Explained With Examples)

What is Inbound Sales? (Explained With Examples)

What is Inbound Sales? (Explained With Examples)

What is Influencer Marketing? (Explained With Examples)

What is Influencer Marketing? (Explained With Examples)

What is Inside Sales Representative? (Explained With Examples)

What is Inside Sales Representative? (Explained With Examples)

What is Inside Sales? (Explained With Examples)

What is Inside Sales? (Explained With Examples)

What is Insight Selling? (Explained With Examples)

What is Insight Selling? (Explained With Examples)

What is a Key Account? (Explained With Examples)

What is a Key Account? (Explained With Examples)

What is a Key Performance Indicator (KPI)? (Explained With Examples)

What is a Key Performance Indicator (KPI)? (Explained With Examples)

What is a Landing Page? (Explained With Examples)

What is a Landing Page? (Explained With Examples)

What is Lead Database? (Explained With Examples)

What is Lead Database? (Explained With Examples)

What is a Lead Enrichment? (Explained With Examples)

What is a Lead Enrichment? (Explained With Examples)

What is Lead Generation? (Explained With Examples)

What is Lead Generation? (Explained With Examples)

What is Lead Nurturing? (Explained With Examples)

What is Lead Nurturing? (Explained With Examples)

What is Lead Qualification? (Explained With Examples)

What is Lead Qualification? (Explained With Examples)

What is Lead Scoring? (Explained With Examples)

What is Lead Scoring? (Explained With Examples)

What are LinkedIn InMails? (Explained With Examples)

What are LinkedIn InMails? (Explained With Examples)

What is LinkedIn Sales Navigator? (Explained With Examples)

What is LinkedIn Sales Navigator? (Explained With Examples)

What is Lost Opportunity? (Explained With Examples)

What is Lost Opportunity? (Explained With Examples)

What is Market Positioning? (Explained With Examples)

What is Market Positioning? (Explained With Examples)

What is Market Research? (Explained With Examples)

What is Market Research? (Explained With Examples)

What is Market Segmentation? (Explained With Examples)

What is Market Segmentation? (Explained With Examples)

What is MEDDIC? (Explained With Examples)

What is MEDDIC? (Explained With Examples)

What is Middle Of The Funnel (MOFU)? (Explained With Examples)

What is Middle Of The Funnel (MOFU)? (Explained With Examples)

What is Motivational Selling? (Explained With Examples)

What is Motivational Selling? (Explained With Examples)

What is a MQL (Marketing Qualified Lead)? (Explained With Examples)

What is a MQL (Marketing Qualified Lead)? (Explained With Examples)

What is MRR Growth? (Explained With Examples)

What is MRR Growth? (Explained With Examples)

What is MRR (Monthly Recurring Revenue)? (Explained With Examples)

What is MRR (Monthly Recurring Revenue)? (Explained With Examples)

What is N.E.A.T. Selling? (Explained With Examples)

What is N.E.A.T. Selling? (Explained With Examples)

What is Neil Rackham's Sales Tactics? (Explained With Examples)

What is Neil Rackham's Sales Tactics? (Explained With Examples)

What is Networking? (Explained With Examples)

What is Networking? (Explained With Examples)

What is NLP Sales Techniques? (Explained With Examples)

What is NLP Sales Techniques? (Explained With Examples)

What is the Net Promotion Score? (NPS - Explained With Examples)

What is the Net Promotion Score? (NPS - Explained With Examples)

What is Objection Handling Framework? (Explained With Examples)

What is Objection Handling Framework? (Explained With Examples)

What is On-Hold Messaging? (Explained With Examples)

What is On-Hold Messaging? (Explained With Examples)

What is Onboarding in Sales? (Explained With Examples)

What is Onboarding in Sales? (Explained With Examples)

What is Online Advertising? (Explained With Examples)

What is Online Advertising? (Explained With Examples)

What is Outbound Sales? (Explained With Examples)

What is Outbound Sales? (Explained With Examples)

What is Pain Points Analysis? (Explained With Examples)

What is Pain Points Analysis? (Explained With Examples)

What is Permission Marketing? (Explained With Examples)

What is Permission Marketing? (Explained With Examples)

What is Personality-Based Selling? (Explained With Examples)

What is Personality-Based Selling? (Explained With Examples)

What is Persuasion Selling? (Explained With Examples)

What is Persuasion Selling? (Explained With Examples)

What is Pipeline Management? (Explained With Examples)

What is Pipeline Management? (Explained With Examples)

What is Pipeline Velocity? (Explained With Examples)

What is Pipeline Velocity? (Explained With Examples)

What is Predictive Lead Scoring? (Explained With Examples)

What is Predictive Lead Scoring? (Explained With Examples)

What is Price Negotiation? (Explained With Examples)

What is Price Negotiation? (Explained With Examples)

What is Price Objection? (Explained With Examples)

What is Price Objection? (Explained With Examples)

What is Price Sensitivity? (Explained With Examples)

What is Price Sensitivity? (Explained With Examples)

What is Problem-Solution Selling? (Explained With Examples)

What is Problem-Solution Selling? (Explained With Examples)

What is Product Knowledge? (Explained With Examples)

What is Product Knowledge? (Explained With Examples)

What is Product-Led-Growth? (Explained With Examples)

What is Product-Led-Growth? (Explained With Examples)

What is Prospecting? (Explained With Examples)

What is Prospecting? (Explained With Examples)

What is a Qualified Lead? (Explained With Examples)

What is a Qualified Lead? (Explained With Examples)

What is Question-Based Selling? (Explained With Examples)

What is Question-Based Selling? (Explained With Examples)

What is Referral Marketing? (Explained With Examples)

What is Referral Marketing? (Explained With Examples)

What is Relationship Building? (Explained With Examples)

What is Relationship Building? (Explained With Examples)

What is Revenue Forecast? (Explained With Examples)

What is Revenue Forecast? (Explained With Examples)

What is a ROI? (Explained With Examples)

What is a ROI? (Explained With Examples)

What is Sales Automation? (Explained With Examples)

What is Sales Automation? (Explained With Examples)

What is a Sales Bonus Plan? (Explained With Examples)

What is a Sales Bonus Plan? (Explained With Examples)

What is a Sales Champion? (Explained With Examples)

What is a Sales Champion? (Explained With Examples)

What is a Sales Collateral? (Explained With Examples)

What is a Sales Collateral? (Explained With Examples)

What is a Sales Commission Structure Plan? (Explained With Examples)

What is a Sales Commission Structure Plan? (Explained With Examples)

What is a Sales CRM? (Explained With Examples)

What is a Sales CRM? (Explained With Examples)

What is a Sales Cycle? (Explained With Examples)

What is a Sales Cycle? (Explained With Examples)

What is a Sales Demo? (Explained With Examples)

What is a Sales Demo? (Explained With Examples)

What is Sales Enablement? (Explained With Examples)

What is Sales Enablement? (Explained With Examples)

What is a Sales Flywheel? (Explained With Examples)

What is a Sales Flywheel? (Explained With Examples)

What is a Sales Funnel? (Explained With Examples)

What is a Sales Funnel? (Explained With Examples)

What are Sales KPIs? (Explained With Examples)

What are Sales KPIs? (Explained With Examples)

What is a Sales Meetup? (Explained With Examples)

What is a Sales Meetup? (Explained With Examples)

What is a Sales Pipeline? (Explained With Examples)

What is a Sales Pipeline? (Explained With Examples)

What is a Sales Pitch? (Explained With Examples)

What is a Sales Pitch? (Explained With Examples)

What is a Sales Pitch? (Explained With Examples)

What is a Sales Playbook? (Explained With Examples)

Try breakcold now, are you ready to accelerate your sales pipeline.

Join over +1000 agencies, startups & consultants closing deals with Breakcold Sales CRM

Get Started for free

Sales CRM Features

Sales CRM Software

Sales Pipeline

Sales Lead Tracking

CRM with social media integrations

Social Selling Software

Contact Management

CRM Unified Email LinkedIn Inbox

Breakcold works for many industries

CRM for Agencies

CRM for Startups

CRM for Consultants

CRM for Small Business

CRM for LinkedIn

CRM for Coaches

Sales CRM & Sales Pipeline Tutorials

The 8 Sales Pipeline Stages

The Best CRMs for Agencies

The Best CRMs for Consultants

The Best LinkedIn CRMs

How to close deals in 2024, not in 2010

CRM automation: from 0 to PRO in 5 minutes

LinkedIn Inbox Management

LinkedIn Account-Based Marketing (2024 Tutorial with video)

Tools & more

Sales Pipeline Templates

Alternatives

Integrations

CRM integration with LinkedIn

© 2024 Breakcold

Privacy Policy

Terms of Service

Case Study Research in Software Engineering: Guidelines and Examples by Per Runeson, Martin Höst, Austen Rainer, Björn Regnell


DATA ANALYSIS AND INTERPRETATION

5.1 INTRODUCTION

Once data has been collected, the focus shifts to analysis. In this phase, the data is used to understand what actually happened in the studied case: the researcher works through the details of the case and looks for patterns in the data. Some analysis inevitably takes place during data collection as well, whenever the data is studied, for example while an interview is being transcribed. The understanding gained in those earlier phases is valid and important, but this chapter focuses on the separate analysis phase that begins once the data has been collected.

Data analysis is conducted differently for quantitative and qualitative data. Sections 5.2–5.5 describe how to analyze qualitative data and how to assess the validity of this type of analysis. Section 5.6 gives a short introduction to quantitative analysis methods. Since quantitative analysis is covered extensively in textbooks on statistical analysis, and case study research relies to a large extent on qualitative data, that section is kept short.

5.2 ANALYSIS OF DATA IN FLEXIBLE RESEARCH

5.2.1 Introduction

As case study research is a flexible research method, qualitative data analysis methods are commonly used [176]. The basic objective of the analysis is, as in any other analysis, to derive conclusions from the data, keeping a clear chain of evidence. The chain of evidence means that a reader ...
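One way to keep a chain of evidence during qualitative analysis is to make every coded data fragment point back to where it came from, so that any conclusion can be traced to the raw data behind it. The sketch below is a minimal illustration in Python, not a procedure taken from the book; the `Excerpt` type, the `index_by_code` and `support_for` helpers, and the sample excerpts are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Excerpt:
    """A fragment of raw data, pinned to where it was found."""

    source: str    # hypothetical identifier, e.g. an interview transcript
    location: str  # hypothetical position within that source
    text: str


def index_by_code(coded: list[tuple[Excerpt, str]]) -> dict[str, list[Excerpt]]:
    """Group coded excerpts so each code keeps links back to its raw data."""
    index: dict[str, list[Excerpt]] = defaultdict(list)
    for excerpt, code in coded:
        index[code].append(excerpt)
    return dict(index)


def support_for(codes: list[str], index: dict[str, list[Excerpt]]) -> list[str]:
    """List every excerpt behind a claim, one traceable line per excerpt."""
    return [
        f"[{code}] {e.source} @ {e.location}: {e.text}"
        for code in codes
        for e in index.get(code, [])
    ]


# Invented excerpts for illustration only; they are not data from any study.
coded = [
    (Excerpt("interview-01", "line 12", "We skip reviews when the deadline is close."), "time pressure"),
    (Excerpt("interview-02", "line 40", "Reviews get postponed before releases."), "time pressure"),
    (Excerpt("meeting-notes-03", "item 2", "The review backlog was discussed."), "review backlog"),
]
index = index_by_code(coded)

# A conclusion such as "time pressure leads to skipped reviews" can now be
# traced back to the exact fragments that support it.
for line in support_for(["time pressure"], index):
    print(line)
```

The design choice is simply that codes never replace the data: each code keeps references to the original fragments, so a reader can follow a claim back to its sources.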


Research Methods | Definitions, Types, Examples

Research methods are specific procedures for collecting and analyzing data. Developing your research methods is an integral part of your research design. When planning your methods, there are two key decisions you will make.

First, decide how you will collect data. Your methods depend on what type of data you need to answer your research question:

  • Qualitative vs. quantitative: Will your data take the form of words or numbers?
  • Primary vs. secondary: Will you collect original data yourself, or will you use data that has already been collected by someone else?
  • Descriptive vs. experimental: Will you take measurements of something as it is, or will you perform an experiment?

Second, decide how you will analyze the data.

  • For quantitative data, you can use statistical analysis methods to test relationships between variables.
  • For qualitative data, you can use methods such as thematic analysis to interpret patterns and meanings in the data.


Data is the information that you collect for the purposes of answering your research question. The type of data you need depends on the aims of your research.

Qualitative vs. quantitative data

Your choice of qualitative or quantitative data collection depends on the type of knowledge you want to develop.

For questions about ideas, experiences and meanings, or to study something that can’t be described numerically, collect qualitative data.

If you want to develop a more mechanistic understanding of a topic, or your research involves hypothesis testing, collect quantitative data.

You can also take a mixed methods approach, where you use both qualitative and quantitative research methods.

Primary vs. secondary research

Primary research means collecting original data yourself for the purpose of answering your research question (e.g. through surveys, observations and experiments). Secondary research uses data that has already been collected by other researchers (e.g. in a government census or previous scientific studies).

If you are exploring a novel research question, you’ll probably need to collect primary data. But if you want to synthesize existing knowledge, analyze historical trends, or identify patterns on a large scale, secondary data might be a better choice.

Descriptive vs. experimental data

In descriptive research, you collect data about your study subject without intervening. The validity of your research will depend on your sampling method.

In experimental research, you systematically intervene in a process and measure the outcome. The validity of your research will depend on your experimental design.

To conduct an experiment, you need to be able to vary your independent variable, precisely measure your dependent variable, and control for confounding variables. If it’s practically and ethically possible, this method is the best choice for answering questions about cause and effect.

Research methods for collecting data

  • Experiment (primary; quantitative): to test cause-and-effect relationships.
  • Survey (primary; quantitative): to understand general characteristics of a population.
  • Interview/focus group (primary; qualitative): to gain a more in-depth understanding of a topic.
  • Observation (primary; qualitative or quantitative): to understand how something occurs in its natural setting.
  • Literature review (secondary; qualitative or quantitative): to situate your research in an existing body of work, or to evaluate trends within a research topic.
  • Case study (primary or secondary; qualitative or quantitative): to gain an in-depth understanding of a specific group or context, or when you don’t have the resources for a large study.

Your data analysis methods will depend on the type of data you collect and how you prepare it for analysis.

Data can often be analyzed both quantitatively and qualitatively. For example, survey responses could be analyzed qualitatively by studying the meanings of responses or quantitatively by studying the frequencies of responses.

Qualitative analysis methods

Qualitative analysis is used to understand words, ideas, and experiences. You can use it to interpret data that was collected:

  • From open-ended surveys and interviews, literature reviews, case studies, ethnographies, and other sources that use text rather than numbers.
  • Using non-probability sampling methods.

Qualitative analysis tends to be quite flexible and relies on the researcher’s judgement, so you have to reflect carefully on your choices and assumptions and be careful to avoid research bias.

Quantitative analysis methods

Quantitative analysis uses numbers and statistics to understand frequencies, averages and correlations (in descriptive studies) or cause-and-effect relationships (in experiments).

You can use quantitative analysis to interpret data that was collected either:

  • During an experiment.
  • Using probability sampling methods.

Because the data is collected and analyzed in a statistically valid way, the results of quantitative analysis can be easily standardized and shared among researchers.

Research methods for analyzing data

  • Statistical analysis (quantitative): to analyze data collected in a statistically valid manner (e.g. from experiments, surveys, and observations).
  • Meta-analysis (quantitative): to statistically analyze the results of a large collection of studies. It can only be applied to studies that collected data in a statistically valid manner.
  • Thematic analysis (qualitative): to analyze data collected from interviews, focus groups, or textual sources, in order to understand general themes in the data and how they are communicated.
  • Content analysis (qualitative or quantitative): to analyze large volumes of textual or visual data collected from surveys, literature reviews, or other sources. It can be quantitative (e.g. frequencies of words) or qualitative (e.g. meanings of words).
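The quantitative side of content analysis can be as simple as counting words across a body of text. The Python sketch below assumes a naive tokenizer (lowercase letters and apostrophes only) and a tiny, made-up stop-word list; the responses and the `word_frequencies` helper are hypothetical. Real studies define their own coding units and exclusions, and the qualitative side (interpreting meanings) cannot be automated this way.

```python
import re
from collections import Counter

# Hypothetical open-ended survey answers, invented for illustration.
responses = [
    "The deadline pressure made testing difficult.",
    "Testing was skipped because of deadline pressure.",
    "Good tools, but the deadline was unrealistic.",
]

# An illustrative stop-word list; a real study would define its own.
STOP_WORDS = {"the", "was", "of", "but", "made", "because"}


def word_frequencies(texts: list[str]) -> Counter[str]:
    """Count how often each non-stop-word appears across all texts."""
    counts: Counter[str] = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())  # naive tokenization
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts


freqs = word_frequencies(responses)
print(freqs.most_common(3))
```

The counts only tell you which words recur; deciding what "deadline pressure" means to respondents is still an interpretive, qualitative step.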

Quantitative research deals with numbers and statistics, while qualitative research deals with words and meanings.

Quantitative methods allow you to systematically measure variables and test hypotheses. Qualitative methods allow you to explore concepts and experiences in more detail.

In mixed methods research, you use both qualitative and quantitative data collection and analysis methods to answer your research question.

A sample is a subset of individuals from a larger population. Sampling means selecting the group that you will actually collect data from in your research. For example, if you are researching the opinions of students in your university, you could survey a sample of 100 students.

In statistics, sampling allows you to test a hypothesis about the characteristics of a population.
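The text does not specify how the 100 students would be chosen. A simple random sample, one probability sampling method among several, could be drawn like this in Python. The population size of 12,000 and the student ID format are hypothetical; `random.sample` draws without replacement, so no student is selected twice.

```python
import random

# Hypothetical sampling frame: one ID per student at the university.
population = [f"S{n:05d}" for n in range(1, 12_001)]

# A fixed seed makes the draw reproducible, so others can check the sample.
rng = random.Random(7)

# Simple random sample: every student has the same chance of selection,
# and random.sample never picks the same student twice.
sample = rng.sample(population, k=100)

print(len(sample), sample[:5])
```

Because every member of the sampling frame has an equal chance of selection, statistics computed on the sample can be used to test hypotheses about the whole student population.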

The research methods you use depend on the type of data you need to answer your research question.

  • If you want to measure something or test a hypothesis, use quantitative methods. If you want to explore ideas, thoughts and meanings, use qualitative methods.
  • If you want to analyze a large amount of readily available data, use secondary data. If you want data specific to your purposes with control over how it is generated, collect primary data.
  • If you want to establish cause-and-effect relationships between variables, use experimental methods. If you want to understand the characteristics of a research subject, use descriptive methods.

Methodology refers to the overarching strategy and rationale of your research project. It involves studying the methods used in your field and the theories or principles behind them, in order to develop an approach that matches your objectives.

Methods are the specific tools and procedures you use to collect and analyze data (for example, experiments, surveys, and statistical tests).

In shorter scientific papers, where the aim is to report the findings of a specific study, you might simply describe what you did in a methods section.

In a longer or more complex research project, such as a thesis or dissertation, you will probably include a methodology section, where you explain your approach to answering the research questions and cite relevant sources to support your choice of methods.

Int J Qual Stud Health Well-being

Methodology or method? A critical review of qualitative case study reports

Despite on-going debate about credibility, and reported limitations in comparison to other approaches, case study is an increasingly popular approach among qualitative researchers. We critically analysed the methodological descriptions of published case studies. Three high-impact qualitative methods journals were searched to locate case studies published in the past 5 years; 34 were selected for analysis. Articles were categorized as health and health services (n = 12), social sciences and anthropology (n = 7), or methods (n = 15) case studies. The articles were reviewed using an adapted version of established criteria to determine whether adequate methodological justification was present, and whether study aims, methods, and reported findings were consistent with a qualitative case study approach. Findings were grouped into five themes outlining key methodological issues: case study methodology or method, case of something particular and case selection, contextually bound case study, researcher and case interactions and triangulation, and study design inconsistent with methodology reported. Improved reporting of case studies by qualitative researchers will advance the methodology for the benefit of researchers and practitioners.

Case study research is an increasingly popular approach among qualitative researchers (Thomas, 2011). Several prominent authors have contributed to methodological developments, which has increased the popularity of case study approaches across disciplines (Creswell, 2013b; Denzin & Lincoln, 2011b; Merriam, 2009; Ragin & Becker, 1992; Stake, 1995; Yin, 2009). Current qualitative case study approaches are shaped by paradigm, study design, and selection of methods, and, as a result, case studies in the published literature vary. Differences between published case studies can make it difficult for researchers to define and understand case study as a methodology.

Experienced qualitative researchers have identified case study research as a stand-alone qualitative approach (Denzin & Lincoln, 2011b). Case study research has a level of flexibility that is not readily offered by other qualitative approaches such as grounded theory or phenomenology. Case studies are designed to suit the case and research question, and published case studies demonstrate wide diversity in study design. There are two popular case study approaches in qualitative research. The first, proposed by Stake (1995) and Merriam (2009), is situated in a social constructivist paradigm, whereas the second, by Yin (2012), Flyvbjerg (2011), and Eisenhardt (1989), approaches case study from a post-positivist viewpoint. Scholarship from both schools of inquiry has contributed to the popularity of case study and development of theoretical frameworks and principles that characterize the methodology.

The diversity of case studies reported in the published literature, and on-going debates about credibility and the use of case study in qualitative research practice, suggest that differences in perspectives on case study methodology may prevent researchers from developing a mutual understanding of practice and rigour. In addition, discussion about case study limitations has led some authors to query whether case study is indeed a methodology (Luck, Jackson, & Usher, 2006; Meyer, 2001; Thomas, 2010; Tight, 2010). Methodological discussion of qualitative case study research is timely, and a review is required to analyse and understand how this methodology is applied in the qualitative research literature. The aims of this study were to review methodological descriptions of published qualitative case studies, to review how the case study methodological approach was applied, and to identify issues that need to be addressed by researchers, editors, and reviewers. An outline of the current definitions of case study and an overview of the issues proposed in the qualitative methodological literature are provided to set the scene for the review.

Definitions of qualitative case study research

Case study research is an investigation and analysis of a single or collective case, intended to capture the complexity of the object of study (Stake, 1995). Qualitative case study research, as described by Stake (1995), draws together “naturalistic, holistic, ethnographic, phenomenological, and biographic research methods” in a bricoleur design, or in his words, “a palette of methods” (Stake, 1995, pp. xi–xii). Case study methodology maintains deep connections to core values and intentions and is “particularistic, descriptive and heuristic” (Merriam, 2009, p. 46).

As a study design, case study is defined by interest in individual cases rather than the methods of inquiry used. The selection of methods is informed by researcher and case intuition and makes use of naturally occurring sources of knowledge, such as people or observations of interactions that occur in the physical space (Stake, 1998). Thomas (2011) suggested that “analytical eclecticism” is a defining factor (p. 512). Multiple data collection and analysis methods are adopted to further develop and understand the case, shaped by context and emergent data (Stake, 1995). This qualitative approach “explores a real-life, contemporary bounded system (a case) or multiple bounded systems (cases) over time, through detailed, in-depth data collection involving multiple sources of information … and reports a case description and case themes” (Creswell, 2013b, p. 97). Case study research has been defined by the unit of analysis, the process of study, and the outcome or end product, all essentially the case (Merriam, 2009).

The case is an object to be studied for an identified reason that is peculiar or particular. Classification of the case and case selection procedures inform development of the study design and clarify the research question. Stake (1995) proposed three types of cases and study design frameworks: the intrinsic case, the instrumental case, and the collective instrumental case. The intrinsic case is used to understand the particulars of a single case, rather than what it represents. An instrumental case study provides insight on an issue or is used to refine theory; the case is selected to advance understanding of the object of interest. A collective case refers to an instrumental case that is studied as multiple, nested cases, observed in unison, parallel, or sequential order. More than one case can be simultaneously studied; however, each case study is a concentrated, single inquiry, studied holistically in its own entirety (Stake, 1995, 1998).

Researchers who use case study are urged to seek out what is common and what is particular about the case. This involves careful and in-depth consideration of the nature of the case, historical background, physical setting, and other institutional and political contextual factors (Stake, 1998). An interpretive or social constructivist approach to qualitative case study research supports a transactional method of inquiry, where the researcher has a personal interaction with the case. The case is developed in a relationship between the researcher and informants, and presented to engage the reader, inviting them to join in this interaction and in case discovery (Stake, 1995). A postpositivist approach to case study involves developing a clear case study protocol with careful consideration of validity and potential bias, which might involve an exploratory or pilot phase, and ensures that all elements of the case are measured and adequately described (Yin, 2009, 2012).

Current methodological issues in qualitative case study research

The future of qualitative research will be influenced and constructed by the way research is conducted, and by what is reviewed and published in academic journals (Morse, 2011). If case study research is to further develop as a principal qualitative methodological approach, and make a valued contribution to the field of qualitative inquiry, issues related to methodological credibility must be considered. Researchers are required to demonstrate rigour through adequate descriptions of methodological foundations. Case studies published without sufficient detail for the reader to understand the study design, and without rationale for key methodological decisions, may lead to research being interpreted as lacking in quality or credibility (Hallberg, 2013; Morse, 2011).

There is a level of artistic license that is embraced by qualitative researchers and distinguishes practice, which nurtures creativity, innovation, and reflexivity (Denzin & Lincoln, 2011b; Morse, 2009). Qualitative research is “inherently multimethod” (Denzin & Lincoln, 2011a, p. 5); however, with this creative freedom, it is important for researchers to provide adequate description for methodological justification (Meyer, 2001). This includes the paradigm and theoretical perspectives that have influenced study design. Without adequate description, study design might not be understood by the reader, and can appear to be dishonest or inaccurate. Reviewers and readers might be confused by inconsistent or inappropriate terms used to describe the case study research approach and methods, and be distracted from important study findings (Sandelowski, 2000). This issue extends beyond case study research, and others have noted inconsistencies in reporting of methodology and method by qualitative researchers. Sandelowski (2000, 2010) argued for accurate identification of qualitative description as a research approach. She recommended that the selected methodology should be harmonious with the study design, and be reflected in methods and analysis techniques. Similarly, Webb and Kevern (2000) uncovered inconsistencies in qualitative nursing research with focus group methods, recommending that methodological procedures must cite seminal authors and be applied with respect to the selected theoretical framework. Incorrect labelling of studies as case study might stem from the flexibility in case study design and its non-directional character relative to other approaches (Rosenberg & Yates, 2007). Methodological integrity is required in design of qualitative studies, including case study, to ensure study rigour and to enhance credibility of the field (Morse, 2011).

Case study has been unnecessarily devalued by comparisons with statistical methods (Eisenhardt, 1989; Flyvbjerg, 2006, 2011; Jensen & Rodgers, 2001; Piekkari, Welch, & Paavilainen, 2009; Tight, 2010; Yin, 1999). It is reputed to be “the weak sibling” in comparison to other, more rigorous, approaches (Yin, 2009, p. xiii). Case study is not an inherently comparative approach to research. The objective is not statistical research, and the aim is not to produce outcomes that are generalizable to all populations (Thomas, 2011). Comparisons between case study and statistical research do little to advance this qualitative approach, and fail to recognize its inherent value, which can be better understood from the interpretive or social constructionist viewpoint of other authors (Merriam, 2009; Stake, 1995). Building on discussions relating to “fuzzy” (Bassey, 2001) or naturalistic generalizations (Stake, 1978), or transference of concepts and theories (Ayres, Kavanaugh, & Knafl, 2003; Morse et al., 2011), would have more relevance.

Case study research has been used as a catch-all design to justify or add weight to fundamental qualitative descriptive studies that do not fit with other traditional frameworks (Merriam, 2009). A case study has been a “convenient label for our research—when we ‘can't think of anything better’—in an attempt to give it [qualitative methodology] some added respectability” (Tight, 2010, p. 337). Qualitative case study research is a pliable approach (Merriam, 2009; Meyer, 2001; Stake, 1995), and has been likened to a “curious methodological limbo” (Gerring, 2004, p. 341) or “paradigmatic bridge” (Luck et al., 2006, p. 104) that is on the borderline between postpositivist and constructionist interpretations. This has resulted in inconsistency in application, which indicates that flexibility comes with limitations (Meyer, 2001), and the open nature of case study research might be off-putting to novice researchers (Thomas, 2011). The development of a well-(in)formed theoretical framework to guide a case study should improve consistency, rigour, and trust in studies published in qualitative research journals (Meyer, 2001).

Assessment of rigour

The purpose of this study was to analyse the methodological descriptions of case studies published in qualitative methods journals. To do this we needed to develop a suitable framework, which used existing, established criteria for appraising qualitative case study research rigour (Creswell, 2013b; Merriam, 2009; Stake, 1995). A number of qualitative authors have developed concepts and criteria that are used to determine whether a study is rigorous (Denzin & Lincoln, 2011b; Lincoln, 1995; Sandelowski & Barroso, 2002). The criteria proposed by Stake (1995) provide a framework for readers and reviewers to make judgements regarding case study quality, and identify key characteristics essential for good methodological rigour. Although each of the factors listed in Stake's criteria could enhance the quality of a qualitative research report, in Table I we present the adapted criteria used in this study, which integrate more recent work by Merriam (2009) and Creswell (2013b). Stake's (1995) original criteria were separated into two categories. The first list of general criteria is “relevant for all qualitative research.” The second list, “high relevance to qualitative case study research,” contains the criteria that we decided had higher relevance to case study research. This second list provided the main criteria used to assess the methodological descriptions of the case studies reviewed. The complete table has been preserved so that the reader can determine how the original criteria were adapted.

Table I. Framework for assessing quality in qualitative case study research.

Checklist for assessing the quality of a case study report
Relevant for all qualitative research
1. Is this report easy to read?
2. Does it fit together, each sentence contributing to the whole?
3. Does this report have a conceptual structure (i.e., themes or issues)?
4. Are its issues developed in a serious and scholarly way?
5. Have quotations been used effectively?
6. Has the writer made sound assertions, neither over- nor under-interpreting?
7. Are headings, figures, artefacts, appendices, indexes effectively used?
8. Was it edited well, then again with a last minute polish?
9. Were sufficient raw data presented?
10. Is the nature of the intended audience apparent?
11. Does it appear that individuals were put at risk?
High relevance to qualitative case study research
12. Is the case adequately defined?
13. Is there a sense of story to the presentation?
14. Is the reader provided some vicarious experience?
15. Has adequate attention been paid to various contexts?
16. Were data sources well-chosen and in sufficient number?
17. Do observations and interpretations appear to have been triangulated?
18. Is the role and point of view of the researcher nicely apparent?
19. Is empathy shown for all sides?
20. Are personal intentions examined?
Added from Merriam (2009)
21. Is the case study particular?
22. Is the case study descriptive?
23. Is the case study heuristic?
Added from Creswell (2013b)
24. Was study design appropriate to methodology?

Adapted from Stake (1995, p. 131).

Study design

The critical review method described by Grant and Booth (2009) was used; it is appropriate for the assessment of research quality and is used for literature analysis to inform research and practice. This type of review goes beyond the mapping and description of scoping or rapid reviews, to include “analysis and conceptual innovation” (Grant & Booth, 2009, p. 93). A critical review is used to develop existing, or produce new, hypotheses or models. This is different from systematic reviews that answer clinical questions. It is used to evaluate existing research and competing ideas, to provide a “launch pad” for conceptual development and “subsequent testing” (Grant & Booth, 2009, p. 93).

Qualitative methods journals were located by a search of the 2011 ISI Journal Citation Reports in Social Science, via the database Web of Knowledge (see m.webofknowledge.com). No “qualitative research methods” category existed in the citation reports; therefore, a search of all categories was performed using the term “qualitative.” In Table II, we present the qualitative methods journals located, ranked by impact factor. The highest ranked journals were selected for searching. We acknowledge that the impact factor ranking system might not be the best measure of journal quality (Cheek, Garnham, & Quan, 2006); however, this was the most appropriate and accessible method available.

International Journal of Qualitative Studies on Health and Well-being.

Journal title | 2011 impact factor | 5-year impact factor
 | 2.188 | 2.432
 | 1.426 | N/A
 | 0.839 | 1.850
 | 0.780 | N/A
 | 0.612 | N/A

Search strategy

In March 2013, searches of the journals Qualitative Health Research, Qualitative Research, and Qualitative Inquiry were completed to retrieve studies with “case study” in the abstract field. The search was limited to the past 5 years (1 January 2008 to 1 March 2013). The objective was to locate published qualitative case studies suitable for assessment using the adapted criteria. Viewpoints, commentaries, and other article types were excluded from review. Titles and abstracts of the 45 retrieved articles were read by the first author, who identified 34 empirical case studies for review. All authors reviewed the 34 studies to confirm selection and categorization. In Table III, we present the 34 case studies grouped by journal, and categorized by research topic, including health sciences, social sciences and anthropology, and methods research. There was a discrepancy in categorization of one article on pedagogy and a new teaching method published in Qualitative Inquiry (Jorrín-Abellán, Rubia-Avi, Anguita-Martínez, Gómez-Sánchez, & Martínez-Mones, 2008). Consensus was to allocate it to the methods category.
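
The mechanical part of the search criteria can be expressed as a simple filter: the phrase "case study" in the abstract, a publication date inside the inclusive window, and an article type that is not excluded. This is a sketch under stated assumptions. The `Record` fields and the sample records are hypothetical. `EXCLUDED_TYPES` contains only the two types named in the text, because "other article types" are not enumerated. The judgement about whether an article is an empirical case study was made by reading titles and abstracts, and it is not automated here.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Record:
    """Hypothetical bibliographic record; the fields are illustrative only."""
    title: str
    abstract: str
    published: date
    article_type: str  # e.g. "empirical", "commentary", "viewpoint"


# Search window and excluded types as stated in the search strategy.
WINDOW_START = date(2008, 1, 1)
WINDOW_END = date(2013, 3, 1)
EXCLUDED_TYPES = {"viewpoint", "commentary"}


def screen(records):
    """Keep records whose abstract mentions "case study" (case-insensitive),
    whose publication date falls inside the inclusive window, and whose
    article type is not excluded."""
    return [
        r
        for r in records
        if "case study" in r.abstract.lower()
        and WINDOW_START <= r.published <= WINDOW_END
        and r.article_type not in EXCLUDED_TYPES
    ]


# Invented records: only "A" meets every criterion.
records = [
    Record("A", "An ethnographic case study of a rural clinic.", date(2010, 5, 1), "empirical"),
    Record("B", "A Case Study as commentary.", date(2011, 2, 1), "commentary"),
    Record("C", "A case study of hospital wards.", date(2007, 12, 31), "empirical"),
    Record("D", "Interviews with community nurses.", date(2012, 6, 1), "empirical"),
]
print([r.title for r in screen(records)])
```

A plain substring match will miss variants such as "case-study". A real search would rely on the database's own phrase-matching rules.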

Table III. Outcomes of search of qualitative methods journals.

Journal title | Date of search | Number of studies located | Number of full-text studies extracted | Health sciences | Social sciences and anthropology | Methods
Qualitative Health Research | 4 Mar 2013 | 18 | 16 | Barone; Bronken et al.; Colón-Emeric et al.; Fourie and Theron; Gallagher et al.; Gillard et al.; Hooghe et al.; Jackson et al.; Ledderer; Mawn et al.; Roscigno et al.; Rytterström et al. | Nil | Austin, Park, and Goble; Broyles, Rodriguez, Price, Bayliss, and Sevick; De Haene et al.; Fincham et al.
Qualitative Research | 7 Mar 2013 | 11 | 7 | Nil | Adamson and Holloway; Coltart and Henwood | Buckley and Waring; Cunsolo Willox et al.; Edwards and Weller; Gratton and O'Donnell; Sumsion
Qualitative Inquiry | 4 Mar 2013 | 16 | 11 | Nil | Buzzanell and D'Enbeau; D'Enbeau et al.; Nagar-Ron and Motzafi-Haller; Snyder-Young; Yeh | Ajodhia-Andrews and Berman; Alexander et al.; Jorrín-Abellán et al.; Nairn and Panelli; Nespor; Wimpenny and Savin-Baden
Total | | 45 | 34 | 12 | 7 | 15
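
The counts in Table III can be checked arithmetically. The three journal rows should sum to the reported totals, and each journal's extracted studies should split exactly into the three topic categories (12 + 7 + 15 = 34, matching the abstract). In the short check below, the assignment of rows to journal names is an inference from the order in which the journals are listed in the search strategy.

```python
# Counts from Table III, per journal:
# (studies located, full-text studies extracted,
#  health sciences, social sciences and anthropology, methods).
# Row labels follow the order of journals in the search strategy (an inference).
rows = {
    "Qualitative Health Research": (18, 16, 12, 0, 4),
    "Qualitative Research": (11, 7, 0, 2, 5),
    "Qualitative Inquiry": (16, 11, 0, 5, 6),
}
reported_totals = (45, 34, 12, 7, 15)

# Summing each column should reproduce the reported "Total" row.
column_sums = tuple(sum(column) for column in zip(*rows.values()))

for journal, (located, extracted, health, social, methods) in rows.items():
    # Every extracted study falls into exactly one topic category.
    assert health + social + methods == extracted, journal
    # A journal cannot yield more extracted studies than were located.
    assert extracted <= located, journal

print(column_sums == reported_totals, column_sums)
```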

In Table III, the number of studies located and the final numbers selected for review have been reported. Qualitative Health Research published the most empirical case studies (n = 16). In the health category, there were 12 case studies of health conditions, health services, and health policy issues, all published in Qualitative Health Research. Seven case studies were categorized as social sciences and anthropology research, which combined case study with biography and ethnography methodologies. All three journals published case studies on methods research to illustrate a data collection or analysis technique, methodological procedure, or related issue.

The methodological descriptions of 34 case studies were critically reviewed using the adapted criteria. All articles reviewed contained a description of study methods; however, the length, amount of detail, and position of the description in the article varied. Few studies provided an accurate description and rationale for using a qualitative case study approach. Of the 34 case studies reviewed, three described a theoretical framework informed by Stake (1995), two by Yin (2009), and three provided a mixed framework informed by various authors, which might have included both Yin and Stake. Few studies described their case study design, or included a rationale that explained why they excluded or added further procedures, and whether this was to enhance the study design, or to better suit the research question. In 26 of the studies, no reference was provided to principal case study authors. From reviewing the description of methods, few authors provided a description or justification of case study methodology that demonstrated how their study was informed by the methodological literature that exists on this approach.

The methodological descriptions of each study were reviewed using the adapted criteria, and the following issues were identified: case study methodology or method; case of something particular and case selection; contextually bound case study; researcher and case interactions and triangulation; and study design inconsistent with methodology. An outline of how the issues were developed from the critical review is provided, followed by a discussion of how these relate to the current methodological literature.

Case study methodology or method

A third of the case studies reviewed appeared to use a case report method, not case study methodology as described by principal authors (Creswell, 2013b; Merriam, 2009; Stake, 1995; Yin, 2009). Case studies were identified as case reports because of missing methodological detail and by review of the study aims and purpose. These reports presented data for small samples of no more than three people, places, or phenomena. Four studies, or “case reports,” were single cases selected retrospectively from larger studies (Bronken, Kirkevold, Martinsen, & Kvigne, 2012; Coltart & Henwood, 2012; Hooghe, Neimeyer, & Rober, 2012; Roscigno et al., 2012). Case reports were not a case of something; instead, they were a case demonstration or an example presented in a report. These reports presented outcomes, and reported on how the case could be generalized. Descriptions focussed on the phenomena, rather than the case itself, and did not appear to study the case in its entirety.

Case reports had minimal in-text references to case study methodology, and were informed by other qualitative traditions or secondary sources (Adamson & Holloway, 2012; Buzzanell & D'Enbeau, 2009; Nagar-Ron & Motzafi-Haller, 2011). This does not suggest that case study methodology cannot be multimethod; however, methodology should be consistent in design, be clearly described (Meyer, 2001; Stake, 1995), and maintain focus on the case (Creswell, 2013b).

To demonstrate how case reports were identified, three examples are provided. In the first, Yeh (2013) described their study as follows: “the examination of the emergence of vegetarianism in Victorian England serves as a case study to reveal the relationships between boundaries and entities” (p. 306). The findings were a historical case report, which resulted from an ethnographic study of vegetarianism. In the second, Cunsolo Willox, Harper, Edge, ‘My Word’: Storytelling and Digital Media Lab, and Rigolet Inuit Community Government (2013) used “a case study that illustrates the usage of digital storytelling within an Inuit community” (p. 130). This case study reported how digital storytelling can be used with indigenous communities as a participatory method to illuminate the benefits of this method for other studies. This “case study was conducted in the Inuit community” but did not include the Inuit community in case analysis (Cunsolo Willox et al., 2013, p. 130). In the third, Bronken et al. (2012) provided a single case report to demonstrate issues observed in a larger clinical study of aphasia and stroke, without adequate case description or analysis.

Case study of something particular and case selection

Case selection is a precursor to case analysis and needs to be presented as a convincing argument (Merriam, 2009). Descriptions of the case were often not adequate to ascertain why the case was selected, or whether it was a particular exemplar or outlier (Thomas, 2011). In a number of case studies in the health and social science categories, it was not explicit whether the case was of something particular, or peculiar to the discipline or field (Adamson & Holloway, 2012; Bronken et al., 2012; Colón-Emeric et al., 2010; Jackson, Botelho, Welch, Joseph, & Tennstedt, 2012; Mawn et al., 2010; Snyder-Young, 2011). There were exceptions in the methods category (Table III), where cases were selected by researchers to report on a new or innovative method. The cases emerged through heuristic study and were reported to be particular relative to the existing methods literature (Ajodhia-Andrews & Berman, 2009; Buckley & Waring, 2013; Cunsolo Willox et al., 2013; De Haene, Grietens, & Verschueren, 2010; Gratton & O'Donnell, 2011; Sumsion, 2013; Wimpenny & Savin-Baden, 2012).

Descriptions of case selection processes were sometimes insufficient for the reader to understand why the case was selected from the global population of cases, or what study of this case would contribute to knowledge as compared with other possible cases (Adamson & Holloway, 2012; Bronken et al., 2012; Colón-Emeric et al., 2010; Jackson et al., 2012; Mawn et al., 2010). In two studies, local cases were selected (Barone, 2010; Fourie & Theron, 2012) because the researcher was familiar with and had access to the case. Possible limitations of a convenience sample were not acknowledged. In one study, purposeful sampling was used to recruit participants within the case, but not to select the case itself (Gallagher et al., 2013). Random sampling was used for case selection in two studies (Colón-Emeric et al., 2010; Jackson et al., 2012), an approach that has limited meaning in interpretive qualitative research.

To demonstrate how researchers provided a good justification for case selection, four examples are provided. In the first, cases of residential care homes were selected because of reported occurrences of mistreatment, which included residents being locked in rooms at night (Rytterström, Unosson, & Arman, 2013). In the second, Roscigno et al. (2012) selected cases of parents who were admitted for early hospitalization in neonatal intensive care with a threatened preterm delivery before 26 weeks. In the third, Hooghe et al. (2012) used random sampling to select 20 couples that had experienced the death of a child; however, the case study was of one couple and a particular metaphor described only by them. In the final example, Coltart and Henwood (2012) provided a detailed account of how they selected two cases from a sample of 46 fathers based on personal characteristics and beliefs. They described how the analysis of the two cases would contribute to their larger study on first-time fathers and parenting.

Contextually bound case study

The limits or boundaries of the case are a defining factor of case study methodology (Merriam, 2009; Ragin & Becker, 1992; Stake, 1995; Yin, 2009). Adequate contextual description is required to understand the setting or context in which the case is revealed. In the health category, case studies were used to illustrate a clinical phenomenon or issue such as compliance and health behaviour (Colón-Emeric et al., 2010; D'Enbeau, Buzzanell, & Duckworth, 2010; Gallagher et al., 2013; Hooghe et al., 2012; Jackson et al., 2012; Roscigno et al., 2012). In these case studies, contextual boundaries, such as physical and institutional descriptions, were not sufficient to understand the case as a holistic system, for example, the general practitioner (GP) clinic in Gallagher et al. (2013), or the nursing home in Colón-Emeric et al. (2010). Similarly, in the social science and methods categories, attention was paid to some components of the case context, but not others, missing important information required to understand the case as a holistic system (Alexander, Moreira, & Kumar, 2012; Buzzanell & D'Enbeau, 2009; Nairn & Panelli, 2009; Wimpenny & Savin-Baden, 2012).

In two studies, vicarious experience or vignettes (Nairn & Panelli, 2009) and images (Jorrín-Abellán et al., 2008) were used effectively to support the description of context, and might have been a useful addition to other case studies. Missing contextual boundaries suggest that the case might not be adequately defined. Additional information, such as the physical, institutional, political, and community context, would improve understanding of the case (Stake, 1998). In Boxes 1 and 2, we present brief synopses of two reviewed studies that demonstrated a well-bounded case. In Box 1, Ledderer (2011) used a qualitative case study design informed by Stake's tradition. In Box 2, Gillard, Witt, and Watts (2011) were informed by Yin's tradition. By providing a brief outline of the case studies in Boxes 1 and 2, we demonstrate how effective case boundaries can be constructed and reported, which may be of particular interest to prospective case study researchers.

Article synopsis of case study research using Stake's tradition

Ledderer (2011) used a qualitative case study research design informed by modern ethnography. The study is bounded to 10 general practice clinics in Denmark that had received federal funding to implement preventative care services based on a Motivational Interviewing intervention. The research question focussed on “why is it so difficult to create change in medical practice?” (Ledderer, 2011, p. 27). The study context was adequately described, providing detail on the general practitioner (GP) clinics and relevant political and economic influences. Methodological decisions are described in first-person narrative, providing insight into researcher perspectives and interaction with the case. Forty-four interviews were conducted, focussing on how GPs conducted consultations, including their form, nature, and content, rather than on the GPs' opinions or experiences (Ledderer, 2011, p. 30). The duration and intensity of researcher immersion in the case enhanced the depth of description and the trustworthiness of study findings. Analysis was consistent with Stake's tradition, and the researcher provided examples of inquiry techniques used to challenge assumptions about emerging themes. Several other seminal qualitative works were cited. The themes and typology constructed are rich in narrative data and storytelling by clinic staff, demonstrating individual clinic experiences as well as shared meanings and understandings about changing from a biomedical to a psychological approach to preventative health intervention. The conclusions note social and cultural meanings and lessons learned that might not have been uncovered using a different methodology.

Article synopsis of case study research using Yin's tradition

Gillard et al.'s (2011) study of camps for adolescents living with HIV/AIDS provided a good example of Yin's interpretive case study approach. The context of the case is bounded by the three summer camps in which the researchers had prior professional involvement. A case study protocol was developed that used multiple methods to gather information at three data collection points coinciding with three youth camps (Teen Forum, Discover Camp, and Camp Strong). Gillard and colleagues followed Yin's (2009) principles, using a consistent data protocol that enhanced cross-case analysis. Data described the young people, the camp physical environment, camp schedule, objectives and outcomes, and the staff of the three youth camps. The findings provided a detailed description of the context, with less detail on individual participants, and included insight into the researchers' interpretations and methodological decisions throughout the data collection and analysis process. Findings gave the reader a sense of “being there,” and were discovered through constant comparison of the case with the research issues; the case is the unit of analysis. There is evidence of researcher immersion in the case, and Gillard reports spending significant time in the field in a naturalistic and integrated youth mentor role.

This case study is not intended to have a significant impact on broader health policy, although it does have implications for health professionals working with adolescents. Study conclusions will inform future camps for young people with chronic disease, and practitioners are able to compare similarities between this case and their own practice (for knowledge translation). No limitations of this article were reported. A limitation related to publication of this case study was its length: it was 20 pages long and used three tables to provide sufficient description of the camp and program components, and of their relationships with the research issue.

Researcher and case interactions and triangulation

Researcher and case interactions and transactions are a defining feature of case study methodology (Stake, 1995). Narrative stories, vignettes, and thick description are used to provoke vicarious experience and a sense of being there with the researcher in their interaction with the case. Few of the case studies reviewed provided details of the researcher's relationship with the case, researcher–case interactions, and how these influenced the development of the case study (Buzzanell & D'Enbeau, 2009; D'Enbeau et al., 2010; Gallagher et al., 2013; Gillard et al., 2011; Ledderer, 2011; Nagar-Ron & Motzafi-Haller, 2011). The role and position of the researcher need to be self-examined by the researcher and understood by readers, in order to understand how they influenced interactions with participants and to determine what triangulation is needed (Merriam, 2009; Stake, 1995).

Gillard et al. (2011) provided a good example of triangulation, comparing data sources in a table (p. 1513). Triangulation of sources was used to reveal as much depth as possible in the study by Nagar-Ron and Motzafi-Haller (2011), while also enhancing confirmation validity. There were several case studies that would have benefited from improved range and use of data sources, and descriptions of researcher–case interactions (Ajodhia-Andrews & Berman, 2009; Bronken et al., 2012; Fincham, Scourfield, & Langer, 2008; Fourie & Theron, 2012; Hooghe et al., 2012; Snyder-Young, 2011; Yeh, 2013).

Study design inconsistent with methodology

Good, rigorous case studies require a strong methodological justification (Meyer, 2001) and a logical and coherent argument that defines paradigm, methodological position, and selection of study methods (Denzin & Lincoln, 2011b). Methodological justification was insufficient in several of the studies reviewed (Barone, 2010; Bronken et al., 2012; Hooghe et al., 2012; Mawn et al., 2010; Roscigno et al., 2012; Yeh, 2013). This was judged by the absence of in-text references to case study methodology, or by inadequate or inconsistent references.

In six studies, the methodological justification provided did not relate to case study. Several common issues were identified. Secondary sources were used as primary methodological references, indicating that the study design might not have been theoretically sound (Colón-Emeric et al., 2010; Coltart & Henwood, 2012; Roscigno et al., 2012; Snyder-Young, 2011). Authors and sources cited in methodological descriptions were inconsistent with the actual study design and practices used (Fourie & Theron, 2012; Hooghe et al., 2012; Jorrín-Abellán et al., 2008; Mawn et al., 2010; Rytterström et al., 2013; Wimpenny & Savin-Baden, 2012). This occurred when researchers cited Stake or Yin, or both (Mawn et al., 2010; Rytterström et al., 2013), but did not follow their paradigmatic or methodological approach. In 26 studies, there were no citations for a case study methodological approach.

The findings of this study have highlighted a number of issues for researchers. A considerable number of the case studies reviewed were missing key elements that define qualitative case study methodology and the tradition cited. A significant number of studies did not provide a clear methodological description or justification relevant to case study. Case studies in health and social sciences did not provide sufficient information for the reader to understand case selection, and why a given case was chosen above others. The contexts of the cases were not described in adequate detail to understand all relevant elements of the case context, which indicated that the cases may not have been contextually bounded. There were inconsistencies between reported methodology, study design, and paradigmatic approach in the case studies reviewed, which made it difficult to understand the study methodology and theoretical foundations. These issues have implications for methodological integrity and honesty when reporting study design, which are values of the qualitative research tradition and are ethical requirements (Wager & Kleinert, 2010a). Poor methodological descriptions may lead the reader to misinterpret or discredit study findings, which limits the impact of the study and, collectively, hinders advancement in the broader qualitative research field.

The issues highlighted in our review build on current debates in the case study literature and on questions about the value of this methodology. Case study research can be situated within different paradigms or designed with an array of methods. In order to maintain the creativity and flexibility that are valued in this methodology, clearer descriptions of paradigm, theoretical position, and methods should be provided so that study findings are not undervalued or discredited. Case study research is an interdisciplinary practice, which means that clear methodological descriptions might be more important for this approach than for other methodologies that are predominantly driven by fewer disciplines (Creswell, 2013b).

Authors frequently omit elements of methodologies and include others to strengthen study design, and we do not propose a rigid or purist ideology in this paper. On the contrary, we encourage new ideas about using case study, together with adequate reporting, which will advance the value and practice of case study. The implications of unclear methodological descriptions in the studies reviewed were that study design appeared to be inconsistent with reported methodology, and key elements required for making judgements of rigour were missing. It was not clear whether the deviations from methodological tradition were made by researchers to strengthen the study design, or because of misinterpretations. Morse (2011) recommended that innovations and deviations from practice are best made by experienced researchers, and that a novice might be unaware of the issues involved with making these changes. To perpetuate the tradition of case study research, applications in the published literature should be consistent with traditional methodological constructions, and deviations should be described with a rationale that is inherent in study conduct and findings. Providing methodological descriptions that demonstrate a strong theoretical foundation and coherent study design will add credibility to the study, while ensuring that the intrinsic meaning of case study is maintained.

The value of this review is that it contributes to discussion of whether case study is a methodology or method. We propose possible reasons why researchers might make this misinterpretation. Researchers may interchange the terms methods and methodology, and conduct research without adequate attention to epistemology and historical tradition (Carter & Little, 2007; Sandelowski, 2010). If the rich meaning that naming a qualitative methodology brings to the study is not recognized, a case study might appear to be inconsistent with the traditional approaches described by principal authors (Creswell, 2013a; Merriam, 2009; Stake, 1995; Yin, 2009). If case studies are not methodologically and theoretically situated, then they might appear to be a case report.

Case reports are promoted by university and medical journals as a method of reporting on medical or scientific cases; guidelines for case reports are publicly available on websites (http://www.hopkinsmedicine.org/institutional_review_board/guidelines_policies/guidelines/case_report.html). The various case report guidelines provide general criteria for case reports, which state that this form of report does not meet the criteria for research, is used for retrospective analysis of up to three clinical cases, and is primarily illustrative and for educational purposes. Case reports can be published in academic journals but do not require approval from a human research ethics committee. Traditionally, case reports describe a single case to explain how and what occurred in a selected setting, for example, to illustrate a new phenomenon that has emerged from a larger study. A case report is not necessarily particular or the study of a case in its entirety, and the larger study would usually be guided by a different research methodology.

This description of a case report is similar to what was provided in some of the studies reviewed. This form of report lacks methodological grounding and the qualities of research rigour. The case report has publication value in demonstrating an example and for dissemination of knowledge (Flanagan, 1999). However, case reports differ from case studies in meaning and purpose, and the two need to be distinguished. The findings of our review suggest that the medical understanding of a case report has been confused with qualitative case study approaches.

In this review, a number of case studies did not have methodological descriptions that included the key characteristics of case study listed in the adapted criteria, and several issues have been discussed. There have been calls for improvements in the publication quality of qualitative research (Morse, 2011), and for improvements in peer review of submitted manuscripts (Carter & Little, 2007; Jasper, Vaismoradi, Bondas, & Turunen, 2013). The challenging nature of editors' and reviewers' responsibilities is acknowledged in the literature (Hames, 2013; Wager & Kleinert, 2010b); however, review of case study methodology should be prioritized because of disputes about its methodological value.

Authors using case study approaches are recommended to describe their theoretical framework and methods clearly, and to seek and follow specialist methodological advice when needed (Wager & Kleinert, 2010a). Adequate page space for case study description would contribute to better publications (Gillard et al., 2011). Authors should also consider capitalizing on the ability to publish complementary resources.

Limitations of the review

There is a level of subjectivity involved in this type of review, and this should be considered when interpreting study findings. Qualitative methods journals were selected because the aims and scope of these journals are to publish studies that contribute to methodological discussion and development of qualitative research. Generalist health and social science journals, which might have contained good-quality case studies, were excluded. Journals in business or education were also excluded, although a review of case studies in international business journals has been published elsewhere (Piekkari et al., 2009).

The criteria used to assess the quality of the case studies were a set of qualitative indicators; a numerical or ranking system might have produced different results. Stake's (1995) criteria have been referenced elsewhere and were deemed the best available (Creswell, 2013b; Crowe et al., 2011). Not all qualitative studies are reported in a consistent way, and some authors choose to report findings in narrative form rather than in a typical biomedical report style (Sandelowski & Barroso, 2002); if misinterpretations were made as a result, this may have affected the review.

Case study research is an increasingly popular approach among qualitative researchers, which provides methodological flexibility through the incorporation of different paradigmatic positions, study designs, and methods. However, although flexibility can be an advantage, the myriad of different interpretations has led critics to question the use of case study as a methodology. Using an adaptation of established criteria, we aimed to identify and assess the methodological descriptions of case studies in high-impact qualitative methods journals. Few articles were identified that applied qualitative case study approaches as described by experts in case study design. There were inconsistencies in methodology and study design, which indicated that researchers were confused about whether case study was a methodology or a method. Commonly, there appeared to be confusion between case studies and case reports. Without clear understanding and application of the principles and key elements of case study methodology, there is a risk that the flexibility of the approach will result in haphazard reporting, and will limit its global application as a valuable, theoretically supported methodology that can be rigorously applied across disciplines and fields.

Conflict of interest and funding

The authors have not received any funding or benefits from industry or elsewhere to conduct this study.

  • Adamson S, Holloway M. Negotiating sensitivities and grappling with intangibles: Experiences from a study of spirituality and funerals. Qualitative Research. 2012;12(6):735–752. doi: 10.1177/1468794112439008.
  • Ajodhia-Andrews A, Berman R. Exploring school life from the lens of a child who does not use speech to communicate. Qualitative Inquiry. 2009;15(5):931–951. doi: 10.1177/1077800408322789.
  • Alexander B. K, Moreira C, Kumar H. S. Resisting (resistance) stories: A tri-autoethnographic exploration of father narratives across shades of difference. Qualitative Inquiry. 2012;18(2):121–133. doi: 10.1177/1077800411429087.
  • Austin W, Park C, Goble E. From interdisciplinary to transdisciplinary research: A case study. Qualitative Health Research. 2008;18(4):557–564. doi: 10.1177/1049732307308514.
  • Ayres L, Kavanaugh K, Knafl K. A. Within-case and across-case approaches to qualitative data analysis. Qualitative Health Research. 2003;13(6):871–883. doi: 10.1177/1049732303013006008.
  • Barone T. L. Culturally sensitive care 1969–2000: The Indian Chicano Health Center. Qualitative Health Research. 2010;20(4):453–464. doi: 10.1177/1049732310361893.
  • Bassey M. A solution to the problem of generalisation in educational research: Fuzzy prediction. Oxford Review of Education. 2001;27(1):5–22. doi: 10.1080/03054980123773.
  • Bronken B. A, Kirkevold M, Martinsen R, Kvigne K. The aphasic storyteller: Coconstructing stories to promote psychosocial well-being after stroke. Qualitative Health Research. 2012;22(10):1303–1316. doi: 10.1177/1049732312450366.
  • Broyles L. M, Rodriguez K. L, Price P. A, Bayliss N. K, Sevick M. A. Overcoming barriers to the recruitment of nurses as participants in health care research. Qualitative Health Research. 2011;21(12):1705–1718. doi: 10.1177/1049732311417727.
  • Buckley C. A, Waring M. J. Using diagrams to support the research process: Examples from grounded theory. Qualitative Research. 2013;13(2):148–172. doi: 10.1177/1468794112472280.
  • Buzzanell P. M, D'Enbeau S. Stories of caregiving: Intersections of academic research and women's everyday experiences. Qualitative Inquiry. 2009;15(7):1199–1224. doi: 10.1177/1077800409338025.
  • Carter S. M, Little M. Justifying knowledge, justifying method, taking action: Epistemologies, methodologies, and methods in qualitative research. Qualitative Health Research. 2007;17(10):1316–1328. doi: 10.1177/1049732307306927.
  • Cheek J, Garnham B, Quan J. What's in a number? Issues in providing evidence of impact and quality of research(ers). Qualitative Health Research. 2006;16(3):423–435. doi: 10.1177/1049732305285701.
  • Colón-Emeric C. S, Plowman D, Bailey D, Corazzini K, Utley-Smith Q, Ammarell N, et al. Regulation and mindful resident care in nursing homes. Qualitative Health Research. 2010;20(9):1283–1294. doi: 10.1177/1049732310369337.
  • Coltart C, Henwood K. On paternal subjectivity: A qualitative longitudinal and psychosocial case analysis of men's classed positions and transitions to first-time fatherhood. Qualitative Research. 2012;12(1):35–52. doi: 10.1177/1468794111426224.
  • Creswell J. W. Five qualitative approaches to inquiry. In: Creswell J. W, editor. Qualitative inquiry and research design: Choosing among five approaches. 3rd ed. Thousand Oaks, CA: Sage; 2013a. pp. 53–84.
  • Creswell J. W. Qualitative inquiry and research design: Choosing among five approaches. 3rd ed. Thousand Oaks, CA: Sage; 2013b.
  • Crowe S, Cresswell K, Robertson A, Huby G, Avery A, Sheikh A. The case study approach. BMC Medical Research Methodology. 2011;11(1):1–9. doi: 10.1186/1471-2288-11-100.
  • Cunsolo Willox A, Harper S. L, Edge V. L, ‘My Word’: Storytelling and Digital Media Lab, & Rigolet Inuit Community Government. Storytelling in a digital age: Digital storytelling as an emerging narrative method for preserving and promoting indigenous oral wisdom. Qualitative Research. 2013;13(2):127–147. doi: 10.1177/1468794112446105.
  • De Haene L, Grietens H, Verschueren K. Holding harm: Narrative methods in mental health research on refugee trauma. Qualitative Health Research. 2010;20(12):1664–1676. doi: 10.1177/1049732310376521.
  • D'Enbeau S, Buzzanell P. M, Duckworth J. Problematizing classed identities in fatherhood: Development of integrative case studies for analysis and praxis. Qualitative Inquiry. 2010;16(9):709–720. doi: 10.1177/1077800410374183.
  • Denzin N. K, Lincoln Y. S. Introduction: Disciplining the practice of qualitative research. In: Denzin N. K, Lincoln Y. S, editors. The SAGE handbook of qualitative research. 4th ed. Thousand Oaks, CA: Sage; 2011a. pp. 1–6.
  • Denzin N. K, Lincoln Y. S, editors. The SAGE handbook of qualitative research. 4th ed. Thousand Oaks, CA: Sage; 2011b.
  • Edwards R, Weller S. Shifting analytic ontology: Using I-poems in qualitative longitudinal research. Qualitative Research. 2012;12(2):202–217. doi: 10.1177/1468794111422040.
  • Eisenhardt K. M. Building theories from case study research. The Academy of Management Review. 1989;14(4):532–550. doi: 10.2307/258557.
  • Fincham B, Scourfield J, Langer S. The impact of working with disturbing secondary data: Reading suicide files in a coroner's office. Qualitative Health Research. 2008;18(6):853–862. doi: 10.1177/1049732307308945.
  • Flanagan J. Public participation in the design of educational programmes for cancer nurses: A case report. European Journal of Cancer Care. 1999;8(2):107–112. doi: 10.1046/j.1365-2354.1999.00141.x.
  • Flyvbjerg B. Five misunderstandings about case-study research. Qualitative Inquiry. 2006;12(2):219–245. doi: 10.1177/1077800405284363.
  • Flyvbjerg B. Case study. In: Denzin N. K, Lincoln Y. S, editors. The SAGE handbook of qualitative research. 4th ed. Thousand Oaks, CA: Sage; 2011. pp. 301–316.
  • Fourie C. L, Theron L. C. Resilience in the face of fragile X syndrome. Qualitative Health Research. 2012;22(10):1355–1368. doi: 10.1177/1049732312451871.
  • Gallagher N, MacFarlane A, Murphy A. W, Freeman G. K, Glynn L. G, Bradley C. P. Service users’ and caregivers’ perspectives on continuity of care in out-of-hours primary care. Qualitative Health Research. 2013;23(3):407–421. doi: 10.1177/1049732312470521.
  • Gerring J. What is a case study and what is it good for? American Political Science Review. 2004;98(2):341–354. doi: 10.1017/S0003055404001182.
  • Gillard A, Witt P. A, Watts C. E. Outcomes and processes at a camp for youth with HIV/AIDS. Qualitative Health Research. 2011;21(11):1508–1526. doi: 10.1177/1049732311413907.
  • Grant M, Booth A. A typology of reviews: An analysis of 14 review types and associated methodologies. Health Information and Libraries Journal. 2009;26:91–108. doi: 10.1111/j.1471-1842.2009.00848.x.
  • Gratton M.-F, O'Donnell S. Communication technologies for focus groups with remote communities: A case study of research with First Nations in Canada. Qualitative Research. 2011;11(2):159–175. doi: 10.1177/1468794110394068.
  • Hallberg L. Quality criteria and generalization of results from qualitative studies. International Journal of Qualitative Studies on Health and Wellbeing. 2013;8:1. doi: 10.3402/qhw.v8i0.20647.
  • Hames I. Committee on Publication Ethics, 1. 2013, March. COPE Ethical guidelines for peer reviewers. Retrieved April 7, 2013, from http://publicationethics.org/resources/guidelines.
  • Hooghe A, Neimeyer R. A, Rober P. “Cycling around an emotional core of sadness”: Emotion regulation in a couple after the loss of a child. Qualitative Health Research. 2012;22(9):1220–1231. doi: 10.1177/1049732312449209.
  • Jackson C. B, Botelho E. M, Welch L. C, Joseph J, Tennstedt S. L. Talking with others about stigmatized health conditions: Implications for managing symptoms. Qualitative Health Research. 2012;22(11):1468–1475. doi: 10.1177/1049732312450323.
  • Jasper M, Vaismoradi M, Bondas T, Turunen H. Validity and reliability of the scientific review process in nursing journals—time for a rethink? Nursing Inquiry. 2013. doi: 10.1111/nin.12030.
  • Jensen J. L, Rodgers R. Cumulating the intellectual gold of case study research. Public Administration Review. 2001;61(2):235–246. doi: 10.1111/0033-3352.00025.
  • Jorrín-Abellán I. M, Rubia-Avi B, Anguita-Martínez R, Gómez-Sánchez E, Martínez-Mones A. Bouncing between the dark and bright sides: Can technology help qualitative research? Qualitative Inquiry. 2008;14(7):1187–1204. doi: 10.1177/1077800408318435.
  • Ledderer L. Understanding change in medical practice: The role of shared meaning in preventive treatment. Qualitative Health Research. 2011;21(1):27–40. doi: 10.1177/1049732310377451.
  • Lincoln Y. S. Emerging criteria for quality in qualitative and interpretive research. Qualitative Inquiry. 1995;1(3):275–289. doi: 10.1177/107780049500100301.
  • Luck L, Jackson D, Usher K. Case study: A bridge across the paradigms. Nursing Inquiry. 2006;13(2):103–109. doi: 10.1111/j.1440-1800.2006.00309.x.
  • Mawn B, Siqueira E, Koren A, Slatin C, Devereaux Melillo K, Pearce C, et al. Health disparities among health care workers. Qualitative Health Research. 2010;20(1):68–80. doi: 10.1177/1049732309355590.
  • Merriam S. B. Qualitative research: A guide to design and implementation. 3rd ed. San Francisco, CA: Jossey-Bass; 2009.
  • Meyer C. B. A case in case study methodology. Field Methods. 2001;13(4):329–352. doi: 10.1177/1525822x0101300402.
  • Morse J. M. Mixing qualitative methods. Qualitative Health Research. 2009;19(11):1523–1524. doi: 10.1177/1049732309349360.
  • Morse J. M. Molding qualitative health research. Qualitative Health Research. 2011;21(8):1019–1021. doi: 10.1177/1049732311404706.
  • Morse J. M, Dimitroff L. J, Harper R, Koontz A, Kumra S, Matthew-Maich N, et al. Considering the qualitative–quantitative language divide. Qualitative Health Research. 2011;21(9):1302–1303. doi: 10.1177/1049732310392386.
  • Nagar-Ron S, Motzafi-Haller P. “My life? There is not much to tell”: On voice, silence and agency in interviews with first-generation Mizrahi Jewish women immigrants to Israel. Qualitative Inquiry. 2011;17(7):653–663. doi: 10.1177/1077800411414007.
  • Nairn K, Panelli R. Using fiction to make meaning in research with young people in rural New Zealand. Qualitative Inquiry. 2009;15(1):96–112. doi: 10.1177/1077800408318314.
  • Nespor J. The afterlife of “teachers’ beliefs”: Qualitative methodology and the textline. Qualitative Inquiry. 2012;18(5):449–460. doi: 10.1177/1077800412439530.
  • Piekkari R, Welch C, Paavilainen E. The case study as disciplinary convention: Evidence from international business journals. Organizational Research Methods. 2009; 12 (3):567–589. doi: 10.1177/1094428108319905. [ CrossRef ] [ Google Scholar ]
  • Ragin C. C, Becker H. S. What is a case?: Exploring the foundations of social inquiry. Cambridge: Cambridge University Press; 1992. [ Google Scholar ]
  • Roscigno C. I, Savage T. A, Kavanaugh K, Moro T. T, Kilpatrick S. J, Strassner H. T, et al. Divergent views of hope influencing communications between parents and hospital providers. Qualitative Health Research. 2012; 22 (9):1232–1246. doi: 10.1177/1049732312449210. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Rosenberg J. P, Yates P. M. Schematic representation of case study research designs. Journal of Advanced Nursing. 2007; 60 (4):447–452. doi: 10.1111/j.1365-2648.2007.04385.x. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Rytterström P, Unosson M, Arman M. Care culture as a meaning- making process: A study of a mistreatment investigation. Qualitative Health Research. 2013; 23 :1179–1187. doi: 10.1177/1049732312470760. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Sandelowski M. Whatever happened to qualitative description? Research in Nursing & Health. 2000; 23 (4):334–340. doi: 10.1002/1098-240X. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Sandelowski M. What's in a name? Qualitative description revisited. Research in Nursing & Health. 2010; 33 (1):77–84. doi: 10.1002/nur.20362. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Sandelowski M, Barroso J. Reading qualitative studies. International Journal of Qualitative Methods. 2002; 1 (1):74–108. [ Google Scholar ]
  • Snyder-Young D. “Here to tell her story”: Analyzing the autoethnographic performances of others. Qualitative Inquiry. 2011; 17 (10):943–951. doi: 10.1177/1077800411425149. [ CrossRef ] [ Google Scholar ]
  • Stake R. E. The case study method in social inquiry. Educational Researcher. 1978; 7 (2):5–8. [ Google Scholar ]
  • Stake R. E. The art of case study research. Thousand Oaks, CA: Sage; 1995. [ Google Scholar ]
  • Stake R. E. Case studies. In: Denzin N. K, Lincoln Y. S, editors. Strategies of qualitative inquiry. Thousand Oaks, CA: Sage; 1998. pp. 86–109. [ Google Scholar ]
  • Sumsion J. Opening up possibilities through team research: Investigating infants’ experiences of early childhood education and care. Qualitative Research. 2013; 14 (2):149–165. doi: 10.1177/1468794112468471.. [ CrossRef ] [ Google Scholar ]
  • Thomas G. Doing case study: Abduction not induction, phronesis not theory. Qualitative Inquiry. 2010; 16 (7):575–582. doi: 10.1177/1077800410372601. [ CrossRef ] [ Google Scholar ]
  • Thomas G. A typology for the case study in social science following a review of definition, discourse, and structure. Qualitative Inquiry. 2011; 17 (6):511–521. doi: 10.1177/1077800411409884. [ CrossRef ] [ Google Scholar ]
  • Tight M. The curious case of case study: A viewpoint. International Journal of Social Research Methodology. 2010; 13 (4):329–339. doi: 10.1080/13645570903187181. [ CrossRef ] [ Google Scholar ]
  • Wager E, Kleinert S. Responsible research publication: International standards for authors. A position statement developed at the 2nd World Conference on Research Integrity, Singapore, July 22–24, 2010. In: Mayer T, Steneck N, editors. Promoting research integrity in a global environment. Singapore: Imperial College Press/World Scientific; 2010a. pp. 309–316. [ Google Scholar ]
  • Wager E, Kleinert S. Responsible research publication: International standards for editors. A position statement developed at the 2nd World Conference on Research Integrity, Singapore, July 22–24, 2010. In: Mayer T, Steneck N, editors. Promoting research integrity in a global environment. Singapore: Imperial College Press/World Scientific; 2010b. pp. 317–328. [ Google Scholar ]
  • Webb C, Kevern J. Focus groups as a research method: A critique of some aspects of their use in nursing research. Journal of Advanced Nursing. 2000; 33 (6):798–805. doi: 10.1046/j.1365-2648.2001.01720.x. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Wimpenny K, Savin-Baden M. Exploring and implementing participatory action synthesis. Qualitative Inquiry. 2012; 18 (8):689–698. doi: 10.1177/1077800412452854. [ CrossRef ] [ Google Scholar ]
  • Yeh H.-Y. Boundaries, entities, and modern vegetarianism: Examining the emergence of the first vegetarian organization. Qualitative Inquiry. 2013; 19 (4):298–309. doi: 10.1177/1077800412471516. [ CrossRef ] [ Google Scholar ]
  • Yin R. K. Enhancing the quality of case studies in health services research. Health Services Research. 1999; 34 (5 Pt 2):1209–1224. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Yin R. K. Case study research: Design and methods. 4th ed. Thousand Oaks, CA: Sage; 2009. [ Google Scholar ]
  • Yin R. K. Applications of case study research. 3rd ed. Thousand Oaks, CA: Sage; 2012. [ Google Scholar ]

Case Study Research Method in Psychology

Saul Mcleod, PhD

Editor-in-Chief for Simply Psychology

BSc (Hons) Psychology, MRes, PhD, University of Manchester

Saul Mcleod, PhD, is a qualified psychology teacher with over 18 years of experience in further and higher education. He has been published in peer-reviewed journals, including the Journal of Clinical Psychology.


Olivia Guy-Evans, MSc

Associate Editor for Simply Psychology

BSc (Hons) Psychology, MSc Psychology of Education

Olivia Guy-Evans is a writer and associate editor for Simply Psychology. She has previously worked in healthcare and educational sectors.


Case studies are in-depth investigations of a person, group, event, or community. Typically, data is gathered from various sources using several methods (e.g., observations & interviews).

The case study research method originated in clinical medicine (the case history, i.e., the patient’s personal history). In psychology, case studies are often confined to the study of a particular individual.

The information is mainly biographical and relates to events in the individual’s past (i.e., retrospective), as well as to significant events that are currently occurring in his or her everyday life.

The case study is not itself a research method. Rather, researchers select methods of data collection and analysis that will generate material suitable for case studies.

Freud (1909a, 1909b) conducted very detailed investigations into the private lives of his patients in an attempt to both understand and help them overcome their illnesses.

This makes it clear that the case study is a method that should only be used by a psychologist, therapist, or psychiatrist, i.e., someone with a professional qualification.

There is an ethical issue of competence. Only someone qualified to diagnose and treat a person can conduct a formal case study relating to atypical (i.e., abnormal) behavior or atypical development.


 Famous Case Studies

  • Anna O – One of the most famous case studies, documenting physician Josef Breuer’s treatment of “Anna O” (real name Bertha Pappenheim) for hysteria in the early 1880s using the “talking cure,” a forerunner of psychoanalysis.
  • Little Hans – A child psychoanalysis case study published by Sigmund Freud in 1909, analyzing five-year-old Herbert Graf’s phobia of horses in relation to the Oedipus complex.
  • Bruce/Brenda – A gender identity case of a boy (Bruce) whose botched circumcision led psychologist John Money to advise gender reassignment and raising him as a girl (Brenda) in the 1960s.
  • Genie Wiley – A case in linguistics and psychological development: a girl who survived extreme isolation and abuse and was studied in 1970s California to examine how early language deprivation affects later language acquisition.
  • Phineas Gage – One of the most famous neuropsychology case studies, analyzing personality changes in railroad worker Phineas Gage after an 1848 accident in which a tamping iron pierced his skull.

Clinical Case Studies

  • Studying the effectiveness of psychotherapy approaches with an individual patient
  • Assessing and treating mental illnesses such as depression, anxiety disorders, and PTSD
  • Neuropsychological cases investigating brain injuries or disorders

Child Psychology Case Studies

  • Studying psychological development from birth through adolescence
  • Cases of learning disabilities, autism spectrum disorders, and ADHD
  • Effects of trauma, abuse, and deprivation on development

Types of Case Studies

  • Explanatory case studies: Used to explore causation in order to find underlying principles. Helpful for doing qualitative analysis to explain presumed causal links.
  • Exploratory case studies: Used to explore situations where an intervention being evaluated has no clear set of outcomes. Helpful for defining questions and hypotheses for future research.
  • Descriptive case studies: Used to describe an intervention or phenomenon and the real-life context in which it occurred. Helpful for illustrating certain topics within an evaluation.
  • Multiple-case studies: Used to explore differences between cases and replicate findings across cases. Helpful for comparing and contrasting specific cases.
  • Intrinsic case studies: Used to gain a better understanding of a particular case. Helpful for capturing the complexity of a single case.
  • Collective case studies: Used to explore a general phenomenon using multiple case studies. Helpful for jointly studying a group of cases in order to inquire into the phenomenon.

Where Do You Find Data for a Case Study?

There are several places to find data for a case study. The key is to gather data from multiple sources to get a complete picture of the case and corroborate facts or findings through triangulation of evidence. Most of this information is likely qualitative (i.e., verbal description rather than measurement), but the psychologist might also collect numerical data.
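Triangulation can be pictured as a simple bookkeeping step. You record which independent types of source support each candidate finding, and you treat a finding as corroborated only when at least two source types agree. The Python sketch below is a hypothetical illustration. The example findings, the source labels, the `triangulate` helper, and the two-source threshold are all assumptions made for this sketch. None of them is a standard rule or a prescribed tool.

```python
from collections import defaultdict

# Hypothetical evidence log: (finding, source type) pairs gathered during a case study.
# The findings and source labels are illustrative assumptions only.
evidence = [
    ("avoids crowded places", "interview"),
    ("avoids crowded places", "observation"),
    ("avoids crowded places", "diary"),
    ("sleeps poorly", "interview"),
    ("conflict with employer", "interview"),
    ("conflict with employer", "employer records"),
]


def triangulate(evidence, min_sources=2):
    """Return findings supported by at least `min_sources` distinct source types.

    The default threshold of 2 is an assumed convention for this sketch, not a fixed rule.
    """
    sources_by_finding = defaultdict(set)
    for finding, source in evidence:
        # A set ignores repeated reports from the same type of source.
        sources_by_finding[finding].add(source)
    return {
        finding: sorted(sources)
        for finding, sources in sources_by_finding.items()
        if len(sources) >= min_sources
    }


if __name__ == "__main__":
    for finding, sources in triangulate(evidence).items():
        print(f"{finding}: corroborated by {', '.join(sources)}")
```

In this toy log, poor sleep is reported only in the interview, so the check does not confirm it. It would need further corroboration, for example from observation or records, before being treated as a finding.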

1. Primary sources

  • Interviews – Interviewing key people related to the case to get their perspectives and insights. The interview is an extremely effective procedure for obtaining information about an individual, and it may be used to collect comments from the person’s friends, parents, employer, workmates, and others who have a good knowledge of the person, as well as to obtain facts from the person himself or herself.
  • Observations – Observing behaviors, interactions, processes, etc., related to the case as they unfold in real-time.
  • Documents & Records – Reviewing private documents, diaries, public records, correspondence, meeting minutes, etc., relevant to the case.

2. Secondary sources

  • News/Media – News coverage of events related to the case study.
  • Academic articles – Journal articles, dissertations, etc., that discuss the case.
  • Government reports – Official data and records related to the case context.
  • Books/films – Books, documentaries or films discussing the case.

3. Archival records

Searching historical archives, museum collections, and databases to find relevant documents and visual or audio records related to the case history and context.

Public archives such as newspapers, organizational records, and photographic collections may contain information that sheds light on attitudes, cultural perspectives, common practices, and historical contexts related to psychology.

4. Organizational records

Organizational records offer the advantage of often having large datasets collected over time that can reveal or confirm psychological insights.

Of course, privacy and ethical concerns regarding confidential data must be navigated carefully.

However, with proper protocols, organizational records can provide invaluable context and empirical depth to qualitative case studies exploring the intersection of psychology and organizations.

  • Organizational/industrial psychology research: Organizational records such as employee surveys, turnover/retention data, policies, and incident reports may provide insight into topics like job satisfaction, workplace culture and dynamics, leadership issues, and employee behaviors.
  • Clinical psychology: Therapists and hospitals may grant access to anonymized medical records to study aspects such as assessments, diagnoses, and treatment plans. This could shed light on clinical practices.
  • School psychology: Studies could use anonymized student records such as test scores, grades, disciplinary issues, and counseling referrals to study child development, learning barriers, the effectiveness of support programs, and more.

How Do I Write a Case Study in Psychology?

Follow specified case study guidelines provided by a journal or your psychology tutor. General components of clinical case studies include: background, symptoms, assessments, diagnosis, treatment, and outcomes. Interpreting the information means the researcher decides what to include or leave out. A good case study should always clarify which information is the factual description and which is an inference or the researcher’s opinion.

1. Introduction

  • Provide background on the case context and why it is of interest, presenting background information like demographics, relevant history, and presenting problem.
  • Compare briefly to similar published cases if applicable. Clearly state the focus/importance of the case.

2. Case Presentation

  • Describe the presenting problem in detail, including symptoms, duration, and impact on daily life.
  • Include client demographics like age and gender, information about social relationships, and mental health history.
  • Describe all physical, emotional, and/or sensory symptoms reported by the client.
  • Use patient quotes to describe the initial complaint verbatim. Follow with full-sentence summaries of relevant history details gathered, including key components that led to a working diagnosis.
  • Summarize clinical exam results, such as orthopedic/neurological tests, imaging, and lab tests. Note actual results rather than subjective conclusions. Provide images if they are clearly reproducible and anonymized.
  • Clearly state the working diagnosis or clinical impression before transitioning to management.

3. Management and Outcome

  • Indicate the total duration of care and number of treatments given over what timeframe. Use specific names/descriptions for any therapies/interventions applied.
  • Present the results of the intervention, including any quantitative or qualitative data collected.
  • For outcomes, utilize visual analog scales for pain, medication usage logs, etc., if possible. Include patient self-reports of improvement/worsening of symptoms. Note the reason for discharge/end of care.

4. Discussion

  • Analyze the case, exploring contributing factors, limitations of the study, and connections to existing research.
  • Analyze the effectiveness of the intervention, considering factors like participant adherence, limitations of the study, and potential alternative explanations for the results.
  • Identify any questions raised in the case analysis and relate insights to established theories and current research if applicable. Avoid definitive claims about physiological explanations.
  • Offer clinical implications, and suggest future research directions.

5. Additional Items

  • Thank specific assistants for writing support only. No patient acknowledgments.
  • References should directly support any key claims or quotes included.
  • Use tables/figures/images only if substantially informative. Include permissions and legends/explanatory notes.

Strengths

  • Provides detailed (rich qualitative) information.
  • Provides insight for further research.
  • Permits investigation of otherwise impractical (or unethical) situations.

Case studies allow a researcher to investigate a topic in far more detail than might be possible if they were trying to deal with a large number of research participants (nomothetic approach) with the aim of ‘averaging’.

Because of their in-depth, multi-sided approach, case studies often shed light on aspects of human thinking and behavior that would be unethical or impractical to study in other ways.

Research that only looks into the measurable aspects of human behavior is not likely to give us insights into the subjective dimension of experience, which is important to psychoanalytic and humanistic psychologists.

Case studies are often used in exploratory research. They can help us generate new ideas (that might be tested by other methods). They are an important way of illustrating theories and can help show how different aspects of a person’s life are related to each other.

The method is, therefore, important for psychologists who adopt a holistic point of view (i.e., humanistic psychologists).

Limitations

  • Lacking scientific rigor and providing little basis for generalization of results to the wider population.
  • Researchers’ own subjective feelings may influence the case study (researcher bias).
  • Difficult to replicate.
  • Time-consuming and expensive.
  • The volume of data, together with time restrictions, can limit the depth of analysis that is possible within the available resources.

Because a case study deals with only one person/event/group, we can never be sure if the case study investigated is representative of the wider body of “similar” instances. This means the conclusions drawn from a particular case may not be transferable to other settings.

Because case studies are based on the analysis of qualitative (i.e., descriptive) data, a lot depends on the psychologist’s interpretation of the information she has acquired.

This means that there is a lot of scope for observer bias, and it could be that the subjective opinions of the psychologist intrude in the assessment of what the data means.

For example, Freud has been criticized for producing case studies in which the information was sometimes distorted to fit particular psychoanalytic theories (e.g., Little Hans).

This is also true of Money’s interpretation of the Bruce/Brenda case study (Diamond & Sigmundson, 1997), in which he ignored evidence that went against his theory.

Breuer, J., & Freud, S. (1895). Studies on hysteria. Standard Edition 2: London.

Curtiss, S. (1981). Genie: The case of a modern wild child.

Diamond, M., & Sigmundson, K. (1997). Sex Reassignment at Birth: Long-term Review and Clinical Implications. Archives of Pediatrics & Adolescent Medicine, 151(3), 298-304.

Freud, S. (1909a). Analysis of a phobia of a five year old boy. In The Pelican Freud Library (1977), Vol 8, Case Histories 1, pages 169-306.

Freud, S. (1909b). Bemerkungen über einen Fall von Zwangsneurose (Der “Rattenmann”). Jb. psychoanal. psychopathol. Forsch., I, p. 357-421; GW, VII, p. 379-463; Notes upon a case of obsessional neurosis, SE, 10: 151-318.

Harlow, J. M. (1848). Passage of an iron rod through the head. Boston Medical and Surgical Journal, 39, 389–393.

Harlow, J. M. (1868). Recovery from the Passage of an Iron Bar through the Head. Publications of the Massachusetts Medical Society, 2(3), 327-347.

Money, J., & Ehrhardt, A. A. (1972). Man & Woman, Boy & Girl: The Differentiation and Dimorphism of Gender Identity from Conception to Maturity. Baltimore, Maryland: Johns Hopkins University Press.

Money, J., & Tucker, P. (1975). Sexual signatures: On being a man or a woman.

Further Information

  • Case Study Approach
  • Case Study Method
  • Enhancing the Quality of Case Studies in Health Services Research
  • “We do things together” A case study of “couplehood” in dementia
  • Using mixed methods for evaluating an integrative approach to cancer care: a case study


What the Case Study Method Really Teaches

  • Nitin Nohria


Seven meta-skills that stick even if the cases fade from memory.

It’s been 100 years since Harvard Business School began using the case study method. Beyond teaching specific subject matter, the case study method excels in instilling meta-skills in students. This article explains the importance of seven such skills: preparation, discernment, bias recognition, judgement, collaboration, curiosity, and self-confidence.

During my decade as dean of Harvard Business School, I spent hundreds of hours talking with our alumni. To enliven these conversations, I relied on a favorite question: “What was the most important thing you learned from your time in our MBA program?”

  • Nitin Nohria is the George F. Baker Jr. and Distinguished Service University Professor. He served as the 10th dean of Harvard Business School, from 2010 to 2020.


Business Insights

Harvard Business School Online's Business Insights Blog provides the career insights you need to achieve your goals and gain confidence in your business skills.


5 Benefits of Learning Through the Case Study Method


  • 28 Nov 2023

While several factors make HBS Online unique, including a global Community and real-world outcomes, active learning through the case study method rises to the top.

In a 2023 City Square Associates survey, 74 percent of HBS Online learners who also took a course from another provider said HBS Online’s case method and real-world examples were better by comparison.

Here’s a primer on the case method, five benefits you could gain, and how to experience it for yourself.


What Is the Harvard Business School Case Study Method?

The case study method, or case method, is a learning technique in which you’re presented with a real-world business challenge and asked how you’d solve it. After working through it yourself and with peers, you’re told how the scenario played out.

HBS pioneered the case method in 1922. Shortly before, in 1921, the first case was written.

“How do you go into an ambiguous situation and get to the bottom of it?” says HBS Professor Jan Rivkin, former senior associate dean and chair of HBS's master of business administration (MBA) program, in a video about the case method. “That skill—the skill of figuring out a course of inquiry to choose a course of action—that skill is as relevant today as it was in 1921.”

The case method was originally developed for the in-person MBA classroom. In 2014, HBS Online adapted it into an engaging, interactive online learning experience.

In HBS Online courses, you learn about each case from the business professional who experienced it. After reviewing their videos, you’re prompted to take their perspective and explain how you’d handle their situation.

You then get to read peers’ responses, “star” them, and comment to further the discussion. Afterward, you learn how the professional handled it and their key takeaways.

HBS Online’s adaptation of the case method incorporates the famed HBS “cold call,” in which you’re called on at random to make a decision without time to prepare.

“Learning came to life!” said Sheneka Balogun, chief administration officer and chief of staff at LeMoyne-Owen College, of her experience taking the Credential of Readiness (CORe) program. “The videos from the professors, the interactive cold calls where you were randomly selected to participate, and the case studies that enhanced and often captured the essence of objectives and learning goals were all embedded in each module. This made learning fun, engaging, and student-friendly.”

If you’re considering taking a course that leverages the case study method, here are five benefits you could experience.

5 Benefits of Learning Through Case Studies

1. Take New Perspectives

The case method prompts you to consider a scenario from another person’s perspective. To work through the situation and come up with a solution, you must consider their circumstances, limitations, risk tolerance, stakeholders, resources, and potential consequences to assess how to respond.

Taking on new perspectives can help you navigate not only your own challenges but also others’. Putting yourself in someone else’s situation to understand their motivations and needs can go a long way when collaborating with stakeholders.

2. Hone Your Decision-Making Skills

Another skill you can build is the ability to make decisions effectively. The case study method forces you to use limited information to decide how to handle a problem, just as in the real world.

Throughout your career, you’ll need to make difficult decisions with incomplete or imperfect information—and sometimes, you won’t feel qualified to do so. Learning through the case method allows you to practice this skill in a low-stakes environment. When facing a real challenge, you’ll be better prepared to think quickly, collaborate with others, and present and defend your solution.

3. Become More Open-Minded

As you collaborate with peers on responses, it becomes clear that not everyone solves problems the same way. Exposing yourself to various approaches and perspectives can help you become a more open-minded professional.

When you’re part of a diverse group of learners from around the world, your experiences, cultures, and backgrounds contribute to a range of opinions on each case.

On the HBS Online course platform, you’re prompted to view and comment on others’ responses, and discussion is encouraged. This practice of considering others’ perspectives can make you more receptive in your career.

“You’d be surprised at how much you can learn from your peers,” said Ratnaditya Jonnalagadda, a software engineer who took CORe.

In addition to interacting with peers in the course platform, Jonnalagadda was part of the HBS Online Community, where he networked with other professionals and continued discussions sparked by course content.

“You get to understand your peers better, and students share examples of businesses implementing a concept from a module you just learned,” Jonnalagadda said. “It’s a very good way to cement the concepts in one's mind.”

4. Enhance Your Curiosity

One byproduct of taking on different perspectives is that it enables you to picture yourself in various roles, industries, and business functions.

“Each case offers an opportunity for students to see what resonates with them, what excites them, what bores them, which role they could imagine inhabiting in their careers,” says former HBS Dean Nitin Nohria in the Harvard Business Review. “Cases stimulate curiosity about the range of opportunities in the world and the many ways that students can make a difference as leaders.”

Through the case method, you can “try on” roles you may not have considered and feel more prepared to change or advance your career.

5. Build Your Self-Confidence

Finally, learning through the case study method can build your confidence. Each time you assume a business leader’s perspective, aim to solve a new challenge, and express and defend your opinions and decisions to peers, you prepare to do the same in your career.

According to a 2022 City Square Associates survey, 84 percent of HBS Online learners report feeling more confident making business decisions after taking a course.

“Self-confidence is difficult to teach or coach, but the case study method seems to instill it in people,” Nohria says in the Harvard Business Review. “There may well be other ways of learning these meta-skills, such as the repeated experience gained through practice or guidance from a gifted coach. However, under the direction of a masterful teacher, the case method can engage students and help them develop powerful meta-skills like no other form of teaching.”


How to Experience the Case Study Method

If the case method seems like a good fit for your learning style, experience it for yourself by taking an HBS Online course. Offerings span seven subject areas, including:

  • Business essentials
  • Leadership and management
  • Entrepreneurship and innovation
  • Finance and accounting
  • Business in society

No matter which course or credential program you choose, you’ll examine case studies from real business professionals, work through their challenges alongside peers, and gain valuable insights to apply to your career.

Are you interested in discovering how HBS Online can help advance your career? Explore our course catalog and download our free guide, complete with interactive workbook sections, to determine if online learning is right for you and which course to take.



Green Garage

Case Study Method – 18 Advantages and Disadvantages

The case study method uses investigatory research to collect data about specific demographics. This approach can apply to individuals, businesses, groups, or events. Each participant receives equal attention, and the information collected can offer new insights into specific trends, ideas, or hypotheses.

Interviews and research observation are the two standard methods of data collection used when following the case study method.

Researchers initially developed the case study method to develop and support hypotheses in clinical medicine. The benefits found in these efforts led the approach to transition to other industries, allowing for the examination of results through proposed decisions, processes, or outcomes. Its unique approach to information makes it possible for others to glean specific points of wisdom that encourage growth.

Several case study method advantages and disadvantages can appear when researchers take this approach.

List of the Advantages of the Case Study Method

1. It requires an intensive study of a specific unit. Researchers must document verifiable data from direct observations when using the case study method. This work offers information about the input processes that go into the hypothesis under consideration. A casual approach to data-gathering work is not effective if a definitive outcome is desired. Each behavior, choice, or comment is a critical component that can verify or dispute the ideas being considered.

Intensive programs can require a significant amount of work for researchers, but it can also promote an improvement in the data collected. That means a hypothesis can receive immediate verification in some situations.

2. No sampling is required when following the case study method. This research method studies social units as a whole instead of pulling individual data points out to analyze them, so no sampling work is required. The hypothesis under consideration receives support because the method works to turn opinions into facts, verifying or denying the proposals that outside observers can use in the future.

Although researchers might pay attention to specific incidents or outcomes based on generalized behaviors or ideas, the study itself won’t sample those situations. It takes a look at the “bigger vision” instead.

3. This method offers a continuous analysis of the facts. The case study method will look at the facts continuously for the social group being studied by researchers. That means there aren’t interruptions in the process that could limit the validity of the data being collected through this work. This advantage reduces the need to use assumptions when drawing conclusions from the information, adding validity to the outcome of the study over time. That means the outcome becomes relevant to both sides of the equation as it can prove specific suppositions or invalidate a hypothesis under consideration.

This advantage can lead to inefficiencies because of the amount of data being studied by researchers. It is up to the individuals involved in the process to sort out what is useful and meaningful and what is not.

4. It is a useful approach to take when formulating a hypothesis. Researchers can use the case study method to verify a hypothesis under consideration, and it is not unusual for the collected data to lead people toward new ideas after completing this work. This process encourages further study because it allows concepts to evolve as people do in social or physical environments. A complete data set can be gathered, depending on the skills of the researcher and the honesty of the individuals involved in the study itself.

Although this approach won’t develop a societal-level evaluation of a hypothesis, it can look at how specific groups will react in various circumstances. That information can lead to a better decision-making process in the future for everyone involved.

5. It provides an increase in knowledge. The case study method provides everyone with analytical power to increase knowledge. This advantage is possible because it uses a variety of methodologies to collect information while evaluating a hypothesis. Researchers prefer to use direct observation and interviews to complete their work, but they can also benefit from questionnaires. Participants might need to fill out a journal or diary about their experiences that can be used to study behaviors or choices.

Some researchers incorporate memory tests and experimental tasks to determine how social groups will interact or respond in specific situations. All of this data then works to verify the possibilities that a hypothesis proposes.

6. The case study method allows for comparisons. The human experience is one that is built on individual observations from group situations. Specific demographics might think, act, or respond in particular ways to stimuli, but each person in that group will also contribute a small part to the whole. You could say that people are sponges that collect data from one another every day to create individual outcomes.

The case study method allows researchers to take the information from each demographic for comparison purposes. This information can then lead to proposals that support a hypothesis or lead to its disruption.

7. Data generalization is possible using the case study method. The case study method provides a foundation for data generalization, allowing researchers to illustrate their findings in meaningful ways. It puts the information into a usable format that almost anyone can use if they need to evaluate the hypothesis under consideration. This process makes it easier to discover unusual features, unique outcomes, or conclusions that wouldn’t be available without this method. It does an excellent job of identifying specific concepts that relate to the proposed ideas that researchers were verifying through their work.

Generalization does not apply to a larger population group with the case study method. What researchers can do with this information is to suggest a predictable outcome when similar groups are placed in an equal situation.

8. It offers a comprehensive approach to research. Nothing gets ignored when using the case study method to collect information. Every person, place, or thing involved in the research receives the complete attention of those seeking data. The interactions are equal, which means the data is comprehensive and directly reflective of the group being observed.

This advantage means that there are fewer outliers to worry about when researching an idea, leading to a higher level of accuracy in the conclusions drawn by the researchers.

9. The identification of deviant cases is possible with this method. The case study method of research makes it easier to identify deviant cases that occur in each social group. These incidents are units (people) that behave in ways that go against the hypothesis under consideration. Instead of ignoring them like other options do when collecting data, this approach incorporates the “rogue” behavior to understand why it exists in the first place.

This advantage makes the eventual data and conclusions gathered more reliable because it incorporates the “alternative opinion” that exists. One might say that the case study method places as much emphasis on the yin as it does the yang so that the whole picture becomes available to the outside observer.

10. Questionnaire development is possible with the case study method. Interviews and direct observation are the preferred methods of implementing the case study method because they are inexpensive and can be done remotely. The information gathered by researchers can also lead to questionnaires that collect additional data from those being studied. When all of the data resources come together, it is easier to formulate a conclusion that accurately reflects the demographics.

Some people in the case study method may try to manipulate the results for personal reasons, but this advantage makes it possible to identify this information readily. Then researchers can look into the thinking that goes into the dishonest behaviors observed.

List of the Disadvantages of the Case Study Method

1. The case study method offers limited representation. The usefulness of the case study method is limited to a specific group of representatives. Researchers are looking at a specific demographic when using this option. That means it is impossible to create any generalization that applies to the rest of society, an organization, or a larger community with this work. The findings can only apply to other groups caught in similar circumstances with the same experiences.

It is useful to use the case study method when attempting to discover the specific reasons why some people behave in a specific way. If researchers need something more generalized, then a different method must be used.

2. No classification is possible with the case study method. This disadvantage is also due to the sample size in the case study method. No classification is possible because researchers are studying such a small unit, group, or demographic. It can be an inefficient process since the skills of the researcher help to determine the quality of the data being collected to verify the validity of a hypothesis. Some participants may be unwilling to answer or participate, while others might try to guess at the outcome to support it.

Researchers can get trapped exploring more tangents than the actual hypothesis with this option. Classification can occur within the units being studied, but this data cannot be extrapolated to other demographics.

3. The case study method still offers the possibility of errors. Each person has an unconscious bias that influences their behaviors and choices. The case study method can find outliers that oppose a hypothesis fairly easily thanks to its emphasis on finding facts, but it is up to the researchers to determine what information qualifies for this designation. If the results from the case study method are surprising or go against the opinion of participating individuals, then there is still the possibility that the information will not be 100% accurate.

Researchers must have controls in place that dictate how data gathering work occurs. Without this limitation in place, the results of the study cannot be guaranteed because of the presence of bias.

4. It is a subjective method to use for research. Although the purpose of the case study method of research is to gather facts, the foundation of what gets gathered is still based on opinion. It uses the subjective method instead of the objective one when evaluating data, which means there can be another layer of errors in the information to consider.

Imagine that a researcher interprets someone’s response as “angry” when performing direct observation, but the individual was feeling “shame” because of a decision they made. The difference between those two emotions is profound, and it could lead to information disruptions that could be problematic to the eventual work of hypothesis verification.

5. The processes required by the case study method are not useful for everyone. The case study method uses a person’s memories, explanations, and records from photographs and diaries to identify interactions and influences on psychological processes. People are given the chance to describe what happens in the world around them as a way for researchers to gather data. This process can be an advantage in some industries, but it can also be a worthless approach for some groups.

If the social group under study doesn’t have the information, knowledge, or wisdom to provide meaningful data, then the processes are no longer useful. Researchers must weigh the advantages and disadvantages of the case study method before starting their work to determine if the possibility of value exists. If it does not, then a different method may be necessary.

6. It is possible for bias to form in the data. It’s not just an unconscious bias that can form in the data when using the case study method. The narrow study approach can lead to outright discrimination in the data. Researchers can decide to ignore outliers or any other information that doesn’t support their hypothesis when using this method. The subjective nature of this approach makes it difficult to challenge the conclusions that get drawn from this work, and the limited pool of units (people) means that duplication is almost impossible.

That means unethical people can manipulate the results gathered by the case study method to their own advantage without much accountability in the process.

7. This method has no fixed limits to it. This method of research is highly dependent on situational circumstances rather than overarching societal or corporate truths. That means the researcher has no fixed limits of investigation. Even when controls are in place to limit bias or recommend specific activities, the case study method has enough flexibility built into its structures to allow for additional exploration. That means it is possible for this work to continue indefinitely, gathering data that never becomes useful.

Scientists began to track the health of 268 sophomores at Harvard in 1938. The Great Depression was in its final years at that point, so the study hoped to reveal clues that lead to happy and healthy lives. It continues still today, now incorporating the children of the original participants, providing over 80 years of information to sort through for conclusions.

8. The case study method is time-consuming and expensive. The case study method can be affordable in some situations, but the lack of fixed limits and the ability to pursue tangents can make it a costly process in most situations. It takes time to gather the data in the first place, and then researchers must interpret the information received so that they can use it for hypothesis evaluation. There are other methods of data collection that can be less expensive and provide results faster.

That doesn’t mean the case study method is useless. The individualization of results can help the decision-making process advance in a variety of industries successfully. It just takes more time to reach the appropriate conclusion, and that might be a resource that isn’t available.

The advantages and disadvantages of the case study method suggest that the helpfulness of this research option depends on the specific hypothesis under consideration. When researchers have the correct skills and mindset to gather data accurately, then it can lead to supportive data that can verify ideas with tremendous accuracy.

This research method can also be used unethically to produce specific results that can be difficult to challenge.

When bias enters into the structure of the case study method, the processes become inefficient, inaccurate, and harmful to the hypothesis. That’s why great care must be taken when designing a study with this approach. It might be a labor-intensive way to develop conclusions, but the outcomes are often worth the investments needed.

Published: 09 June 2024

Imbalanced spectral data analysis using data augmentation based on the generative adversarial network

Jihoon Chung, Junru Zhang, Amirul Islam Saimon, Yang Liu, Blake N. Johnson & Zhenyu Kong

Scientific Reports volume 14, Article number: 13230 (2024)


  • Bioinformatics
  • High-throughput screening

Spectroscopic techniques generate one-dimensional spectra with distinct peaks and specific widths in the frequency domain. These features act as unique identifiers of material characteristics. Deep neural networks (DNNs) have recently been considered a powerful tool for automatically categorizing experimental spectral data by supervised classification to evaluate material characteristics. However, most existing work assumes balanced spectral data among the classes in the training data, contrary to actual experiments, where spectral data are usually imbalanced. Imbalanced training data degrade supervised classification performance, hindering understanding of phase behavior, specifically the sol-gel transition (gelation) of soft materials and glycomaterials. To address this issue, this paper applies a novel data augmentation method based on a generative adversarial network (GAN) proposed by the authors in their prior work. The method consists of three DNNs: the generator, the discriminator, and the classifier. It generates samples that are not only authentic but also emphasize the differentiation between material characteristics, providing balanced training data and improving the classification results. To demonstrate the effectiveness of the method, actual imbalanced spectral data from Pluronic F-127 hydrogel and Alpha-Cyclodextrin hydrogel are used to classify the phases of the data. On average, our approach improves on the existing data augmentation methods by 8.8%, 6.4%, and 6.2% in the classifier’s F-score, Precision, and Recall, respectively. Based on these validated results, we expect the method to find broader application in addressing imbalanced measurement data across diverse domains in materials science and chemical engineering.


Introduction

Spectroscopic technologies such as X-ray diffraction (XRD), Nuclear Magnetic Resonance (NMR), Raman scattering, and Electrical Impedance Spectroscopy (EIS) are fundamental tools for characterizing experimental samples in chemistry and materials science. XRD has been used extensively in industry and research laboratories for more than a century 1 . It has proven to be an effective method for characterizing crystalline materials, as it captures detailed information on the long-range periodic nature of crystal structures. In contrast, NMR and Raman measurements depend more strongly on localized chemical interactions and are widely used to characterize the structure of molecular materials 2 , 3 . EIS is a technique used to determine the impedance characteristics of an electrochemical interface. It has been used increasingly in biomaterials studies to understand the interactions between a surface and the biological environment. While their mechanisms and uses vary, all of these spectroscopic methods generate comparable one-dimensional spectra consisting of characteristic peak positions, widths, and intensities. These features often serve as “fingerprints” for material characteristics, including patterns and phases 4 , 5 . The characteristics of unknown specimens can be identified by comparing newly measured spectra with those of established materials in experimental databases 6 , 7 . However, the analysis is complicated by factors such as measurement noise, background signals, and inherent minor deviations in the spectra 8 . To automate this process, machine learning has recently emerged as an effective tool, since it can automatically classify experimental spectra by material characteristics with high accuracy 9 , 10 .

A popular machine learning method is the deep neural network (DNN). DNNs consist of several layers of artificial neurons designed to mimic the structure and functioning of the human brain 11 . DNNs are widely used to classify spectral data because they can automatically extract discriminating features. In particular, DNNs are used in supervised classification, which exploits the label information of each class (i.e., the material characteristics of the spectral data) to provide accurate classification results. For example, Kantz et al. 12 used DNNs to classify Liquid Chromatography-Mass Spectrometry (LC-MS) spectral peak shapes, improving peak filtering performance by reducing false peaks by more than 90% compared to traditional chemometric methods. Zeng et al. 13 used a one-dimensional convolutional neural network (CNN) to classify visible-near infrared spectra of corn seed to evaluate seed viability. In addition, Lee et al. 14 developed a CNN-based model to classify phases of interest in mixtures of inorganic compounds using XRD. Similarly, Schuetzke et al. 8 built a robust CNN model for automatically classifying phases from XRD patterns, which showed superior performance in automatic phase identification of cement compounds and iron ores. These studies assumed balanced training spectral data across classes (i.e., the material characteristics of the spectral data) in their supervised classification methods.

However, balanced spectral data across classes rarely occur in the chemistry, physics, and industrial settings that generate spectral data. For example, medical diagnostic applications often generate imbalanced spectral data reflecting the common asymmetry in health status among screened individuals (e.g., more true negatives than true positives are typically encountered in preventative diagnostics). Materials science and chemistry applications also often generate imbalanced spectral data reflecting the common asymmetry of composition–process–structure–property relations, such as those associated with phase equilibrium (e.g., the physics governing the thermodynamics of mixtures often results in asymmetric distributions of stable, unstable, and transition states with respect to mixture composition). Similarly, in accelerated materials discovery applications it is common to encounter mostly samples of one type, owing to the unknown structure of the material design space and the initially selected search parameters, which may be chosen randomly or based on prior knowledge. As such, imbalanced spectral data are largely unavoidable in actual experiments and industrial practice. However, imbalanced spectral data compromise supervised classification performance using DNNs. Specifically, the predictions of classification models tend to be biased towards the majority class, which has many more spectral samples, leading to a high probability of misclassifying samples from the minority class 15 .
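A degenerate classifier that always predicts the majority class shows why this bias matters and why accuracy alone can hide it. The minimal sketch below assumes a hypothetical test set of 90 "solution" and 10 "gel" spectra. These counts are illustrative and are not taken from the study.

```python
def per_class_recall(y_true, y_pred, label):
    """Fraction of samples whose true class is `label` that were predicted as `label`."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    total = sum(1 for t in y_true if t == label)
    return hits / total if total else 0.0


# Hypothetical imbalanced test set: 90 majority-class and 10 minority-class samples.
y_true = ["solution"] * 90 + ["gel"] * 10
# A classifier biased entirely toward the majority class.
y_pred = ["solution"] * len(y_true)

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recall_solution = per_class_recall(y_true, y_pred, "solution")
recall_gel = per_class_recall(y_true, y_pred, "gel")

print(f"accuracy={accuracy:.2f}, recall(solution)={recall_solution:.2f}, recall(gel)={recall_gel:.2f}")
# prints: accuracy=0.90, recall(solution)=1.00, recall(gel)=0.00
```

Accuracy looks high, yet every minority-class spectrum is misclassified. This is why the case studies below report per-class Precision, Recall, and F-score instead of accuracy.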

A viable way to handle imbalanced spectral data in DNN-based classification is data augmentation, which creates a training dataset balanced across the material characteristics. Basic data augmentation methods, including rotation, flipping, the synthetic minority oversampling technique (SMOTE) 16 , and Borderline-SMOTE (B-SMOTE) 17 , are commonly used to balance training data for classification because they are straightforward to implement 18 , 19 , 20 . However, these techniques consider primarily localized information, so they fail to capture the complete data distribution and do not address overfitting 21 , 22 . Consequently, these methods are unsuitable for generating realistic spectral data with various characteristics 23 , 24 . In contrast, Generative Adversarial Networks (GANs) and their variants 25 , 26 are increasingly used to supplement limited actual data. These variants include deep convolutional GAN (DCGAN) 27 , CDRAGAN 28 , and Covid GAN 29 . A GAN can generate authentic data because it learns the entire distribution of the actual data through two neural networks: the discriminator and the generator 30 , 31 . Specifically, Balancing GAN (BAGAN) 32 is a well-known GAN-based method that focuses on generating minority class samples. Huang and Jafari 28 proposed an enhanced version of BAGAN (BAGAN-GP) 28 with an improved initialization method and a gradient penalty technique to stabilize training. Given these capabilities, GANs have been widely used in spectral data analysis. For example, Wu et al. 33 used a GAN framework to augment synthetic Raman spectroscopy data of skin cancer tissue to address class imbalance in cancer tissue data. Similarly, Gao et al. 34 used a GAN to generate seizure events in long-term EEG spectra to overcome the data imbalance problem for accurate classification.
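A simplified SMOTE-style oversampler shows the locality limitation. Each synthetic sample is a random point on the line segment between a minority sample and one of its nearest minority neighbors, so new samples never leave the region spanned by the few real ones. This is a minimal standard-library sketch for illustration. It is not the imbalanced-learn implementation used for the benchmarks, and the function name `smote_sketch` and the toy spectra are hypothetical.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples from a list of minority-class vectors.

    Each new sample is x + gap * (neighbor - x). Here x is a random minority
    sample, neighbor is one of its k nearest minority neighbors (Euclidean
    distance), and gap is drawn uniformly from [0, 1).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbors than there are other samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [m for j, m in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda m: math.dist(x, m))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, neighbor)])
    return synthetic


# Three hypothetical minority-class "spectra" of four points each.
gel = [[0.0, 1.0, 2.0, 1.0], [0.2, 1.1, 2.2, 0.9], [0.1, 0.9, 1.8, 1.1]]
new_samples = smote_sketch(gel, n_new=4, k=2)
```

Every value in `new_samples` lies between the smallest and largest real values at that position. With only a handful of minority spectra, the augmented set therefore adds little diversity.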

Although these studies generate realistic spectral data to balance the data among material characteristics, they do not consider generating samples that enable differentiation between characteristics (i.e., characteristics-distinguishable samples). Characteristics-distinguishable samples can further improve classification performance, which is the ultimate goal of generating data in spectral data analysis. Such samples can be generated by jointly optimizing the GAN and the classifier, with the classifier guiding the generator to create samples that improve classification results. In this direction, we proposed a novel data augmentation method in a recent paper 15 that jointly optimizes the GAN and the classifier with several stabilizing techniques. That method was validated on imbalanced data from additive manufacturing processes. In this paper, we apply the method to spectral data to address the imbalance that commonly occurs in actual experiments and industries, and we validate it using spectral data collected from actual experimentation. Specifically, we use electrical impedance spectral data from Pluronic F-127 hydrogel and Alpha-Cyclodextrin hydrogel, which are imbalanced with respect to phase. The results show that our method successfully overcomes this imbalance when classifying the phases. In particular, compared to the benchmark methods, our approach enhances the classifier's F-score, Precision, and Recall by an average of 8.8%, 6.5%, and 6.2%, respectively. Moreover, the technique is general, so it can be applied to classification with imbalanced spectral data in other materials science or chemical engineering domains.
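A classifier can guide a generator in a generic way: the classifier's cross-entropy on generated samples is added to the generator's adversarial loss. The sketch below computes such a combined objective from precomputed scores in plain Python. It is a hypothetical illustration of the joint-optimization idea, not the loss published in ref. 15, and it omits that method's stabilizing techniques. The weight `lam` and all function names are assumptions.

```python
import math


def bce(prob, target):
    """Binary cross-entropy for one predicted probability, clipped for numerical safety."""
    eps = 1e-12
    prob = min(max(prob, eps), 1 - eps)
    return -(target * math.log(prob) + (1 - target) * math.log(1 - prob))


def generator_loss(d_scores, class_probs, labels, lam=1.0):
    """Hypothetical joint generator objective: adversarial term + lam * classification term.

    d_scores:    discriminator's probability that each generated sample is real
    class_probs: classifier's predicted class distribution for each generated sample
    labels:      class index each generated sample was conditioned on
    lam:         assumed weight on the classification term
    """
    n = len(d_scores)
    # Adversarial term: the generator wants D to label its samples as real (target 1).
    adv = sum(bce(p, 1.0) for p in d_scores) / n
    # Classification term: the generator wants C to recover the conditioning class.
    cls = sum(-math.log(max(probs[y], 1e-12)) for probs, y in zip(class_probs, labels)) / n
    return adv + lam * cls


# Samples that fool D and are classified correctly incur a lower loss than those that do neither.
good = generator_loss([0.9, 0.9], [[0.1, 0.9], [0.9, 0.1]], [1, 0])
bad = generator_loss([0.2, 0.2], [[0.9, 0.1], [0.1, 0.9]], [1, 0])
```

With `lam=0` the objective reduces to a standard non-saturating GAN generator loss. A positive `lam` also rewards samples that the classifier can separate by class.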

Several real-world case studies are provided to show the effectiveness of our method in imbalanced spectral data analysis. The “Case study using spectral data from Pluronic F-127 hydrogel” and “Case study using spectral data from Alpha-Cyclodextrin hydrogel” sections present comparative case studies against benchmark methods, using spectral data from these two actual materials, respectively. The spectral data are imbalanced with respect to the material phases, so the material characteristics to be classified in the case studies are the phases. Performance is assessed on classification results obtained from the imbalanced training dataset. All case studies use Keras with the TensorFlow backend, and the experiments are carried out on an NVIDIA Tesla P4 GPU in the Google Colab environment 35 .

Benchmark methods

Both sampling-based and GAN-based approaches are used as benchmarks. In the sampling-based category, two techniques, SMOTE 16 and B-SMOTE 17 , are used, implemented with the Python imbalanced-learn library. For the GAN-based approaches, three state-of-the-art class-conditional GAN methods, namely CDRAGAN 28 , BAGAN-GP 28 , and Covid GAN 29 , are selected. In addition, Cooperative GAN 36 , a class-conditional GAN that jointly optimizes the GAN and the classifier but without stabilizing techniques, is used as a benchmark. Beyond GANs, we also consider the diffusion model 37 , which has been widely used recently because of its superior generative performance. Specifically, the class-conditioned U-Net-based diffusion model (CCD-diffusion) 38 , 39 is used as a benchmark. Finally, a baseline is established by evaluating classification performance without any data augmentation.

Performance evaluation measure

Performance is assessed using the classifier’s F-score, Precision, and Recall 40 , with a convolutional neural network (CNN) as the classifier. The F-score, expressed in Eq. (1), is a composite metric that combines Precision and Recall.
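Eq. (1) itself does not survive in this copy. The relationships below assume the usual F1 definition, the harmonic mean of Precision and Recall, where TP, FP, and FN denote the true positives, false positives, and false negatives for a given class:

```latex
\[
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}, \qquad
F = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\]
```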

Since the primary goal of this paper is to improve classification accuracy with imbalanced training data, the case studies cover several balanced ratios. A balanced ratio is the ratio of the training data size of the minority class to that of the majority class. Each case study is repeated ten times, and the reported performance measure is the average over all classes across the ten repetitions.
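The text does not say exactly how scores are averaged "across all classes". The sketch below assumes a macro average. It computes per-class Precision, Recall, and F-score, treating each class in turn as the positive class, then takes an unweighted mean over classes and a mean over repetitions. It also encodes the balanced ratio as minority over majority training-set size. All function names are hypothetical. For a single run, scikit-learn's `precision_recall_fscore_support` with `average="macro"` offers a comparable computation.

```python
from statistics import mean


def prf_for_class(y_true, y_pred, label):
    """Precision, Recall, and F-score for one class, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f_score


def macro_scores(y_true, y_pred):
    """Unweighted mean of per-class (Precision, Recall, F-score) over the classes in y_true."""
    per_class = [prf_for_class(y_true, y_pred, c) for c in sorted(set(y_true))]
    return tuple(mean(scores[i] for scores in per_class) for i in range(3))


def averaged_over_runs(runs):
    """Mean of macro scores over repetitions; runs is a list of (y_true, y_pred) pairs."""
    per_run = [macro_scores(t, p) for t, p in runs]
    return tuple(mean(scores[i] for scores in per_run) for i in range(3))


def balanced_ratio(n_minority_train, n_majority_train):
    """Minority-to-majority ratio of training-set sizes."""
    return n_minority_train / n_majority_train
```

In the study, `runs` would hold the ten repetitions of one case study at one balanced ratio.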

Case study using spectral data from Pluronic F-127 hydrogel

Pluronic F-127 (PF-127), a nonionic amphiphilic surfactant that has been widely used and studied in a broad range of applications, exhibits a reversible thermogelling process in aqueous solutions, resembling the behavior observed in other Pluronic compounds 41 . In this section, PF-127 hydrogel libraries are used for the case study. Ninety-six mixtures of PF-127 and deionized water with different mass ratios are formulated in 96-well plates, with the PF-127 concentration varying from 0.3125 to 30 wt% in increments of 0.3125 wt%. The phase angle-frequency spectrum of each sample is collected by a sensor-based high-throughput method, and the collected spectra are labeled as solution or gel to study the composition-property relationships of PF-127 hydrogels. Three repeated experiments provide 288 spectra: 181 of the solution phase (Fig. 1a) and 107 of the gel phase (Fig. 1b). The frequency range for each experiment and concentration is determined by the spectrum width. Moreover, different sensors are used in the repeated experiments, resulting in different spectrum frequency ranges. To use all the spectra from the three experiments, the x-axis of each spectrum is converted into the sequence of sensor measurements (from one to eight hundred, the length of each spectrum). The detailed data collection procedure and the frequency range of each experiment are described in the “Data collection of Pluronic F-127 hydrogel libraries” section.
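A few lines of arithmetic confirm that the quoted counts are consistent. Ninety-six concentration steps of 0.3125 wt% span 0.3125 to 30 wt%, and three repeats of 96 wells give 288 spectra, matching 181 solution plus 107 gel. The axis conversion discards each spectrum's frequency values and keeps only the measurement order. `to_index_axis` is a hypothetical helper, and the sketch assumes each spectrum already holds 800 points, as stated.

```python
STEP_WT = 0.3125  # concentration increment in wt%
N_EXPERIMENTS = 3

concentrations = [STEP_WT * i for i in range(1, 97)]  # 0.3125, 0.625, ..., 30.0 wt%
n_spectra = len(concentrations) * N_EXPERIMENTS
n_solution, n_gel = 181, 107


def to_index_axis(phase_angles):
    """Drop the sensor-specific frequency values and index the points 1..N instead."""
    return list(range(1, len(phase_angles) + 1)), list(phase_angles)


assert concentrations[-1] == 30.0         # 96 steps of 0.3125 wt% reach exactly 30 wt%
assert n_spectra == 288 == n_solution + n_gel
```

Because 0.3125 is exactly representable in binary floating point, the last concentration compares equal to 30.0 without rounding error.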

Figure 1. Spectral data of Pluronic F-127 hydrogel: (a) solution; (b) gel; (c) solution and gel.

Table 1 describes the imbalanced training data, for which the balanced ratios between the two phases are 0.013, 0.027, and 0.039. These ratios were chosen because balanced ratios below 0.013 lead to very poor classifier performance. The remaining data are used as testing data.

Figure 2 compares samples generated by the proposed method with actual samples: Fig. 2a shows the actual imbalanced training data in Table 1, and Fig. 2b shows the actual testing data. At a balanced ratio of 0.027, the generated samples are realistic spectra with clear differences between the phases, which our method learns during training. In particular, the generated samples capture the features of the gel-phase test data (Fig. 2b) even though only a small number of gel-phase training samples are available (Fig. 2a).

Figure 2. Comparison of generated data of Pluronic F-127 hydrogel with (a) actual training data and (b) actual testing data when the balanced ratio is 0.027.

Figure 3 shows the performance of the benchmark methods and our method when the classifier is trained with the samples generated by each method. The averages and standard deviations of each method's performance are provided in Appendix 1.3. Compared with the baseline, which trains the classifier on the imbalanced data alone, the sampling-based methods, B-SMOTE 17 and SMOTE 16, tend to perform similarly or worse. This is because the small number of minority-class samples limits the diversity of the data that sampling-based methods can generate.
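
To see why so few minority samples limit sampling-based methods, recall that SMOTE 16 places each synthetic sample on the line segment between a minority sample and one of its k nearest minority neighbours. The NumPy sketch below implements that interpolation; it is not the benchmark implementation used in the paper (imbalanced-learn's `SMOTE` and `BorderlineSMOTE` classes are standard library versions). With only two minority spectra, every synthetic sample lies on the single segment joining them.

```python
import numpy as np


def smote_samples(x_minority, n_new, k=5, rng=None):
    """SMOTE-style interpolation between minority samples and their neighbours."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x_minority, dtype=float)
    n = x.shape[0]
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)  # cannot use more neighbours than other minority samples
    # Pairwise Euclidean distances between minority samples
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)  # random minority sample per new point
    nn = neighbours[base, rng.integers(0, k, size=n_new)]  # one of its k neighbours
    gap = rng.uniform(size=(n_new, 1))  # position along the segment, in [0, 1)
    return x[base] + gap * (x[nn] - x[base])


# With only two minority "spectra", all synthetic points lie on one segment
rng = np.random.default_rng(0)
two_spectra = np.array([[0.0, 0.0], [1.0, 2.0]])
synthetic = smote_samples(two_spectra, n_new=50, rng=rng)
```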

Figure 3. Performance evaluation using Pluronic F-127 hydrogel with several balanced ratios.

Conversely, GAN-based approaches typically outperform sampling-based methods because their generators learn the actual distribution of the minority-class samples and generate diverse training data for the classifier. In particular, our method's generator provides more diverse and higher-quality samples than the other GAN-based methods because it is optimized jointly with the classifier and uses stabilizing techniques, which improves the classification results. Specifically, our method improves on the average F-score, Precision, and Recall of the benchmark methods by 9.4%, 8.3%, and 5.3%, respectively. To check whether the improvement is significant, we performed a paired t-test 42 between the proposed method and the benchmark method with the best F-score, the composite metric of Precision and Recall. Cooperative GAN, BAGAN-GP, and CDRAGAN are the best benchmark methods at balanced ratios of 0.013, 0.027, and 0.039, respectively. Table 2 shows that the proposed method achieves statistically significant improvements over the best benchmark method at the 95% confidence level in most cases. Table 3 reports the average training time of each data augmentation method. Although the proposed method requires more training time than the benchmark methods, its significant improvements in classification results make it worth using.
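
A paired t-test compares two methods on matched runs by testing whether the mean of their per-run differences is zero. The standard-library sketch below computes the paired t statistic over ten repetitions and compares it with the two-sided 5% critical value for 9 degrees of freedom (2.262). The F-scores are hypothetical, and `scipy.stats.ttest_rel` would return the same statistic together with a p-value.

```python
import math
import statistics

# Two-sided 5% critical value of Student's t with 9 degrees of freedom (ten pairs)
T_CRIT_DF9 = 2.262


def paired_t_statistic(a, b):
    """t = mean(d) / (sd(d) / sqrt(n)) for paired differences d = a - b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [x - y for x, y in zip(a, b)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))


# Hypothetical F-scores of the proposed and best benchmark methods over ten runs
proposed = [0.85, 0.86, 0.85, 0.86, 0.85, 0.87, 0.84, 0.87, 0.86, 0.85]
benchmark = [0.80, 0.82, 0.79, 0.81, 0.80, 0.83, 0.78, 0.82, 0.81, 0.80]
t = paired_t_statistic(proposed, benchmark)
print(f"t = {t:.2f}, significant at 5%: {abs(t) > T_CRIT_DF9}")
```

Pairing by repetition removes run-to-run variation shared by both methods, which is why the test can be significant even when the two methods' score ranges overlap.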

Figure 4. t-SNE of the features from the intermediate layer of our method's classifier at epochs (a) 0 and (b) 140 when the balanced ratio is 0.027.

Figure 4 illustrates the efficacy of the samples generated by our approach by comparing their features in the classifier with those of actual samples when the balanced ratio is 0.027. Specifically, Fig. 4 displays the t-distributed Stochastic Neighbor Embedding (t-SNE) of the features extracted from the intermediate layer of our method's classifier. t-SNE is a nonlinear dimensionality reduction technique that visualizes high-dimensional data by projecting it into a low-dimensional space 43. In Fig. 4, ‘\(\bullet\)’ denotes the features of actual samples, and ‘\(\times\)’ denotes the features of generated samples within the balanced training batch. Because each batch is balanced with generated samples, the minority class (i.e., the gel phase) contains many ‘\(\times\)’ instances. In Fig. 4a, the distributions of actual and generated samples clearly differ at epoch 0: the ‘\(\bullet\)’ of the gel phase does not align with the ‘\(\times\)’ of the same phase and instead overlaps with the ‘\(\bullet\)’ of the solution phase. Because our approach is designed to generate realistic samples that are distinguishable between phases, at epoch 140 (Fig. 4b) the features of the generated samples (‘\(\times\)’) align closely with those of the actual samples (‘\(\bullet\)’) of the same phase, and the features of the two phases are clearly separated. This observation confirms that the samples generated by our method are realistic and phase-discriminative. By training on balanced data with these attributes, our method attains high classification performance.

Case study using spectral data from Alpha-Cyclodextrin hydrogel

Alpha-Cyclodextrin-based polypseudorotaxane supramolecular hydrogels, formed by the self-assembly of a polymer chain “guest” and an Alpha-Cyclodextrin “host”, are promising materials for a wide range of applications, including drug delivery and tissue engineering 44. In this section, hydrogel libraries of Alpha-Cyclodextrin (\(\alpha\)-CD)/Polyethylene glycol (PEG) are used for the case study. Composition is known to play a vital role in hydrogel formation. Here, 96 \(\alpha\)-CD/PEG hydrogel samples with different mass ratios of \(\alpha\)-CD to PEG are formulated in a 96-well plate. The concentration of PEG is kept at 120 mg/mL, while the concentration of \(\alpha\)-CD varies from 20 to 40 mg/mL. The phase angle-frequency spectrum of each sample is collected by a sensor-based high-throughput method, and the collected spectra are labeled as solution or gel to study the composition-structure relationship of \(\alpha\)-CD/PEG hydrogels. Three repeated experiments provide 288 spectra: 194 of the gel phase (Fig. 5a) and 94 of the solution phase (Fig. 5b). The detailed data collection procedure is described in the “Data collection of Alpha-Cyclodextrin hydrogel libraries” section.

Figure 5. Spectral data of Alpha-Cyclodextrin hydrogel: (a) gel; (b) solution; (c) gel and solution.

Table 4 lists the training data for the various balanced ratios. Only balanced ratios at which the classifier's performance is acceptable in practice are used. The remaining samples in each phase are used as testing data.

Figure 6 compares actual samples with samples generated by the proposed method when the balanced ratio is 0.050. As in Fig. 2, the generated samples capture the features of the solution-phase test data (Fig. 6b) from only a small number of training samples (Fig. 6a).

Figure 6. Comparison of generated data of Alpha-Cyclodextrin hydrogel with (a) actual training data and (b) actual testing data when the balanced ratio is 0.050.

Figure 7. Performance evaluation using Alpha-Cyclodextrin hydrogel with several balanced ratios.

Figure 7 shows the performance of the benchmark methods and our method when the classifier is trained with the samples generated by each method. The averages and standard deviations of each method's performance are provided in Appendix 1.4, and Table 5 reports the average training time of each data augmentation method.

In this case study, all benchmark methods perform worse than the baseline. This may be caused by the high similarity between the samples of the gel and solution phases, shown in Fig. 5, which makes classification challenging. As a result, the sampling-based methods, which consider only local information, perform poorly. BAGAN-GP, CDRAGAN, Covid GAN, and the class-conditioned diffusion model also yield inferior results because they focus only on generating realistic samples and do not learn phase-distinguishable features. Cooperative GAN also performs poorly because its learning is unstable, which limits the diversity of its generated samples. Our method delivers the best performance by generating realistic and phase-distinguishable samples with a stabilizing technique. Specifically, it improves on the average F-score, Precision, and Recall of the benchmark methods by 8.2%, 4.6%, and 7.0%, respectively. However, unlike in the Pluronic F-127 case study, the improvement over the best benchmark method is not statistically significant, owing to the extremely high similarity between the solution and gel phases of Alpha-Cyclodextrin hydrogel (Fig. 5). Even so, the proposed method achieves the best performance while all the benchmark methods fail to generate suitable data, so it remains valuable for such challenging data, although it requires additional computational resources (Table 5).

Figure 8. t-SNE of the features from the intermediate layer of our method's classifier at epochs (a) 0 and (b) 135 when the balanced ratio is 0.050.

Figure 8 shows the t-SNE visualization of the features extracted from the intermediate layer of our method's classifier at epochs 0 and 135 when the balanced ratio is 0.050. As in Fig. 4, ‘\(\bullet\)’ and ‘\(\times\)’ denote features of actual and generated samples, respectively. To balance the training data, each batch contains many more generated samples (‘\(\times\)’) than actual samples (‘\(\bullet\)’) for the solution phase of the Alpha-Cyclodextrin hydrogel. In contrast to epoch 0 (Fig. 8a), at epoch 135 (Fig. 8b) the features extracted from the generated solution-phase samples (‘\(\times\)’) closely match those of the actual samples (‘\(\bullet\)’). Owing to this alignment, the balanced training data generated by our method yield the best classification results among the compared methods.

Discussions

This paper addresses the classification of material characteristics using imbalanced spectral data. Imbalanced spectral data commonly arise in actual experiments and industrial settings and degrade supervised classification performance. To address this challenge, we use a GAN-based data augmentation method proposed in the authors' previous work 15. The method consists of three jointly optimized DNNs: a generator, a discriminator, and a classifier. The generator produces data that are both realistic and characteristics-distinguishable to balance the training data. Imbalanced spectral data from the phases of Pluronic F-127 hydrogel and Alpha-Cyclodextrin hydrogel are used for the case studies. The results show that the method successfully addresses the data imbalance problem and improves phase classification: across all case studies, it improves on the average F-score, Precision, and Recall of the benchmark methods by 8.8%, 6.4%, and 6.2%, respectively. The strong performance of the method across these case studies suggests that it could contribute to many application areas that use spectral data, such as radiology 45 and additive manufacturing 46. An interesting direction for future research is generating minority-class test data for users who need balanced test data to evaluate their methods.

Methodology

A detailed procedure for the data collection of the Pluronic F-127 and Alpha-Cyclodextrin hydrogel libraries is provided in the “Data collection of Pluronic F-127 hydrogel libraries” and “Data collection of Alpha-Cyclodextrin hydrogel libraries” sections, respectively. The proposed methodology is then described in the “Proposed methodology” section. Finally, the hyperparameters and structure of the deep neural networks used in this paper are listed in the “Hyperparameters of the deep neural networks” section.

Data collection of Pluronic F-127 hydrogel libraries

For data collection, Pluronic F-127 (PF-127) is obtained from Sigma Aldrich, and hydrogel libraries are prepared in 96-well plates 47. A stock PF-127 solution (30 wt%) is first prepared with deionized water. The stock solution is then serially diluted with deionized water across the well plate to give concentrations from 0.3125 wt% to 29.6875 wt%. The well plate is left in the fridge overnight for mixing, then taken out and left at room temperature for one hour for cross-linking. Next, the prepared PF-127 hydrogel libraries are characterized by piezoelectric milli-cantilever (PEMC) sensors. The PEMC sensor is integrated with a three-axis robot (MPS50SL; Aerotech), whose movement is controlled by a motion controller (A3200; Aerotech). The impedance spectrum of each hydrogel sample is captured by a network analyzer (E5061B; Keysight) and a customized MATLAB program. Spectral data of all PF-127 hydrogels in the 96-well plates are collected by manually moving the robot-integrated sensor from one well to the next. The frequency range of each experiment and concentration is determined by the spectrum width. In addition, different sensors are used in the three repeated experiments, leading to different frequency ranges: 26,013.75–37,000 Hz, 27,016.25–40,000 Hz, and 31,012.5–41,000 Hz, respectively. Finally, for labeling, the spectral data are fitted to a sigmoid curve; spectra before the inflection point of the curve are labeled as solution, and spectra after it as gel.
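
The text does not say which quantity is fitted to the sigmoid. As a sketch, assume a hypothetical scalar feature of each spectrum is plotted against PF-127 concentration. A logistic curve \(a/(1 + e^{-k(x - x_0)}) + b\) has its inflection point at \(x_0\), so spectra with \(x < x_0\) are labeled solution and the rest gel. To stay within NumPy, \(x_0\) and \(k\) are found by grid search with \(a\) and \(b\) solved by linear least squares; `scipy.optimize.curve_fit` would be a common alternative.

```python
import numpy as np


def logistic(x, x0, k):
    """Standard logistic curve with inflection point x0 and steepness k."""
    return 1.0 / (1.0 + np.exp(-k * (x - x0)))


def sigmoid_inflection(x, y, x0_grid, k_grid):
    """Fit y ~ a * logistic(x; x0, k) + b and return the best-fitting x0.

    (x0, k) are searched over the given grids; for each pair, (a, b) are solved
    by linear least squares, and the pair with the lowest squared error wins.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best_sse, best_x0 = np.inf, None
    for x0 in x0_grid:
        for k in k_grid:
            design = np.column_stack([logistic(x, x0, k), np.ones_like(x)])
            coef, *_ = np.linalg.lstsq(design, y, rcond=None)
            sse = float(np.sum((design @ coef - y) ** 2))
            if sse < best_sse:
                best_sse, best_x0 = sse, float(x0)
    return best_x0


def label_by_threshold(x, threshold):
    """Label points before the threshold as solution and the rest as gel."""
    return np.where(np.asarray(x) < threshold, "solution", "gel")


# Hypothetical feature rising sigmoidally with PF-127 concentration (wt%)
conc = 0.3125 * np.arange(1, 97)
feature = 3.0 * logistic(conc, 15.0, 1.0) + 0.5
x0 = sigmoid_inflection(conc, feature, x0_grid=conc, k_grid=[0.25, 0.5, 1.0, 2.0, 4.0])
labels = label_by_threshold(conc, x0)
```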

Data collection of Alpha-Cyclodextrin hydrogel libraries

To generate samples, supramolecular hydrogels of Alpha-Cyclodextrin (\(\alpha\)-CD)/Polyethylene glycol (PEG) are prepared in 96-well plates. Both \(\alpha\)-CD and PEG are obtained from Sigma Aldrich and used without further purification. Stock solutions of \(\alpha\)-CD (80 mg/mL) and PEG (240 mg/mL) are prepared in advance, and the hydrogel library is obtained by mixing a constant volume of PEG stock solution with different volumes of \(\alpha\)-CD stock solution and deionized water. First, 190 \(\upmu\)L of PEG is pipetted into each well of the 96-well plate. Then, deionized water is pipetted in volumes increasing from 0 to 95 \(\upmu\)L in steps of 1 \(\upmu\)L. Next, \(\alpha\)-CD is pipetted in volumes decreasing from 190 to 95 \(\upmu\)L in steps of 1 \(\upmu\)L. The final volume in each well is 380 \(\upmu\)L; the concentration of PEG is 120 mg/mL, while the concentration of \(\alpha\)-CD varies from 20 to 40 mg/mL. To avoid forming inhomogeneous hydrogels, the precursor solution in each well is mixed by pipetting immediately after \(\alpha\)-CD is added. After all wells have been formulated, the 96-well plate is further mixed on a digital shaker (LSE digital microplate shaker; Corning) at 1000 rpm for 10 min. The well plate is then placed in a humid environment and left to react at room temperature for 12 h. Next, the prepared \(\alpha\)-CD/PEG hydrogel libraries are characterized by PEMC sensors in a high-throughput manner. The PEMC sensor is integrated with a robot (FISNAR, F5200N) for automated characterization. The hydrogel in each well is characterized by inserting the robot-integrated sensor into the sample, and the impedance spectra are collected by a network analyzer (E5061B, Keysight) and a customized MATLAB program. All samples in the 96-well plates are characterized automatically by the PEMC sensors with the computer-controlled robot.
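
The concentrations above follow from the dilution relation \(C = C_{\text{stock}} V_{\text{stock}} / V_{\text{final}}\). The sketch below recomputes them from the stated stock concentrations, the 190 \(\upmu\)L PEG volume, the 190 to 95 \(\upmu\)L \(\alpha\)-CD series, and the 380 \(\upmu\)L final volume. As an assumption of this sketch, the water volume is whatever fills each well to 380 \(\upmu\)L.

```python
import numpy as np

PEG_STOCK = 240.0  # mg/mL
ACD_STOCK = 80.0   # mg/mL
PEG_VOL = 190.0    # uL of PEG stock per well
FINAL_VOL = 380.0  # uL per well

acd_vol = np.arange(190.0, 94.0, -1.0)     # 190, 189, ..., 95 uL (96 wells)
water_vol = FINAL_VOL - PEG_VOL - acd_vol  # water tops each well up to 380 uL

# Dilution: C = C_stock * V_stock / V_final
peg_conc = PEG_STOCK * PEG_VOL / FINAL_VOL  # same in every well
acd_conc = ACD_STOCK * acd_vol / FINAL_VOL  # mg/mL, one value per well

print(peg_conc, acd_conc[0], acd_conc[-1])  # 120.0 40.0 20.0
```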
Finally, the phases of the collected \(\alpha\)-CD/PEG spectral data are determined by fitting two best-fit linear regression models: spectra before the point where the two regression lines intersect are labeled as solution, and spectra after it as gel.
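
Assume again a hypothetical scalar feature per spectrum plotted against \(\alpha\)-CD concentration. In the sketch below, two lines are fitted to the points on either side of every candidate split, the split with the smallest total squared error is kept, and the intersection of those two lines gives the phase boundary. This is one common way to fit two best-fit regression lines; the text does not say how the authors chose the split.

```python
import numpy as np


def two_line_breakpoint(x, y, min_points=3):
    """Return the x where two least-squares lines, fitted on either side of the
    best split (lowest total squared error), intersect."""
    order = np.argsort(x)
    x = np.asarray(x, dtype=float)[order]
    y = np.asarray(y, dtype=float)[order]
    best_sse, best_x = np.inf, None
    for i in range(min_points, len(x) - min_points + 1):
        m1, b1 = np.polyfit(x[:i], y[:i], 1)  # left line: slope, intercept
        m2, b2 = np.polyfit(x[i:], y[i:], 1)  # right line: slope, intercept
        sse = (np.sum((m1 * x[:i] + b1 - y[:i]) ** 2)
               + np.sum((m2 * x[i:] + b2 - y[i:]) ** 2))
        if sse < best_sse and not np.isclose(m1, m2):  # skip parallel lines
            best_sse, best_x = sse, (b2 - b1) / (m1 - m2)
    return best_x


# Hypothetical feature: flat in one phase, rising linearly in the other
conc = np.linspace(20, 40, 96)  # alpha-CD concentration, mg/mL
feature = np.where(conc < 30, 1.0, 1.0 + 2.0 * (conc - 30))
boundary = two_line_breakpoint(conc, feature)
labels = np.where(conc < boundary, "solution", "gel")
```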

Proposed methodology

This section introduces the novel GAN-based data augmentation method proposed in the authors' previous paper 15. The overall structure of the method is described in the “Three-player structure for imbalanced data learning” section, the objective functions of the algorithm are presented in the “Objective functions for three-player” section, and the training procedure is described in the “Training procedure” section.

Three-player structure for imbalanced data learning

Figure  9 shows the structure of our method, which consists of three players: a discriminator, a generator, and a classifier.

Figure 9. Structure of the method 15.

The generator produces spectral data samples from random noise and the corresponding characteristic labels. The generated minority-class samples are combined with the actual imbalanced spectral data to form balanced training data for the classifier. The approach combines adversarial and cooperative learning so that the generated samples are more useful for improving the classifier's performance. The roles of these two types of learning are as follows.

Adversarial learning: The interaction between the generator and the discriminator follows the adversarial relationship inherent in the GAN structure. This relationship engages both networks in a competitive process that ultimately enables the generator to produce realistic spectral data.

Cooperative learning: The cooperative interaction between the classifier and the generator enables the generator to produce spectral data whose material characteristics the classifier can readily discern (i.e., characteristics-distinguishable samples).

Based on these two relationships, the generator produces minority-class samples with both properties (i.e., realistic and characteristics-distinguishable). These generated samples are then combined with actual ones to create a balanced training batch that passes through the classifier network in each training iteration. Through this iterative learning process among the three players, the classifier eventually attains high performance. The objective function of each player and the training procedure are detailed in Appendix 1.
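
The balanced training batch can be sketched as follows for the two-phase case. The generator here is a hypothetical placeholder that returns random 800-point curves (the spectrum length used in the case studies) in place of the trained conditional generator; `balanced_batch` and `placeholder_generator` are illustrative names, not the authors' code.

```python
import numpy as np


def balanced_batch(x_real, y_real, generate, minority_label, rng):
    """Top up the minority class of a real batch with generated samples so that
    both classes contribute the same number of samples (two-class case)."""
    y_real = np.asarray(y_real)
    n_minority = int(np.sum(y_real == minority_label))
    n_majority = int(np.sum(y_real != minority_label))
    n_needed = max(n_majority - n_minority, 0)
    x_fake = generate(n_needed, minority_label, rng)
    x = np.concatenate([x_real, x_fake], axis=0)
    y = np.concatenate([y_real, np.full(n_needed, minority_label, dtype=y_real.dtype)])
    # Flag generated rows, e.g. to plot them as 'x' in a t-SNE view
    is_generated = np.concatenate([np.zeros(len(y_real), bool), np.ones(n_needed, bool)])
    return x, y, is_generated


def placeholder_generator(n, label, rng, length=800):
    """Hypothetical stand-in for the trained generator G(z, y); ignores `label`."""
    z = rng.normal(size=(n, length))
    return np.tanh(z)


rng = np.random.default_rng(0)
x_real = rng.normal(size=(10, 800))
y_real = np.array([0] * 8 + [1] * 2)  # 8 solution, 2 gel
x, y, gen = balanced_batch(x_real, y_real, placeholder_generator, 1, rng)
```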

Hyperparameters of the deep neural networks

Table 6 lists the hyperparameters used for all the methods in this paper. The parameters shared among the methods built from deep neural networks are determined from the literature. Specifically, the neural networks are optimized with the Adam algorithm with a learning rate of 0.0002 and momentum coefficients (\(\beta_1\), \(\beta_2\)) of 0.5 and 0.9 15. In addition, many other hyperparameters, including kernel sizes, strides, padding, activation functions, and kernel initializers, follow Huang and Jafari 28, who proposed CDRAGAN and BAGAN-GP, state-of-the-art class-conditional GAN methods. Furthermore, the numbers of kernels are set to powers of two (32, 64, and 128), which are commonly used in existing studies of convolutional neural networks 48, 49.
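
In Adam, the two "momentums" are the exponential decay rates \(\beta_1\) and \(\beta_2\) of the first- and second-moment estimates. The NumPy sketch below performs one Adam update with the stated values (learning rate 0.0002, \(\beta_1 = 0.5\), \(\beta_2 = 0.9\)); the \(\epsilon = 10^{-8}\) term is an assumed default not given in the text. In PyTorch the same settings would be `torch.optim.Adam(params, lr=0.0002, betas=(0.5, 0.9))`, although the framework used is not stated here.

```python
import numpy as np


def adam_step(theta, grad, m, v, t, lr=2e-4, beta1=0.5, beta2=0.9, eps=1e-8):
    """One Adam update; t is the 1-based step count used for bias correction."""
    m = beta1 * m + (1.0 - beta1) * grad       # first-moment (mean) estimate
    v = beta2 * v + (1.0 - beta2) * grad ** 2  # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)             # bias-corrected moments
    v_hat = v / (1.0 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v


theta = np.array([1.0, -1.0])
grad = np.array([0.5, -2.0])
theta, m, v = adam_step(theta, grad, np.zeros(2), np.zeros(2), t=1)
```

On the first step the bias-corrected update has a magnitude close to the learning rate regardless of the gradient scale, and the low \(\beta_1 = 0.5\) makes the optimizer respond quickly to changing gradients, a common choice in GAN training.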

The parameters unique to each method are selected based on guidelines in the literature or set to the values that perform best within a specific range. For example, the gradient penalty coefficient of BAGAN-GP and CDRAGAN is set to ten, as recommended by previous studies 28, 50. For Cooperative GAN 36, the scheduling parameter that adjusts the borderline between classes is selected based on performance within the range (0, 1] given by Choi et al. 36. For SMOTE 16 and B-SMOTE 17, the number of neighboring samples used to generate the synthetic samples is selected based on performance within the range [1, 5]. Similarly, the dimension of the latent vector, which is the input size of the generator in Covid GAN 29, is tuned within the range [100, 200]. For the class-conditioned U-Net-based diffusion model 38, 39, the number of timesteps, which affects overfitting and underfitting to the training data, is set to 1000 based on previous literature 37, 39.
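
The gradient penalty coefficient of ten plays the role of \(\lambda\) in a WGAN-GP-style penalty \(\lambda\,\mathbb{E}[(\lVert\nabla D(\hat{x})\rVert_2 - 1)^2]\) evaluated at random interpolates \(\hat{x}\) between actual and generated samples. The sketch below assumes that form; DRAGAN 50, on which CDRAGAN builds, instead evaluates the penalty around perturbed actual samples. To run with NumPy alone, it uses a hypothetical toy discriminator \(D(x) = \tanh(w \cdot x)\) whose input gradient is available in closed form; a real discriminator would obtain the gradient by automatic differentiation.

```python
import numpy as np


def toy_discriminator_grad(x, w):
    """Input gradient of a hypothetical toy discriminator D(x) = tanh(w . x).

    dD/dx = (1 - tanh(w . x)**2) * w, returned row-wise for a batch x.
    """
    s = np.tanh(x @ w)
    return (1.0 - s ** 2)[:, None] * w[None, :]


def gradient_penalty(x_real, x_fake, w, lam=10.0, rng=None):
    """WGAN-GP-style penalty lam * mean((||grad D(x_hat)||_2 - 1)**2).

    x_hat are random interpolates between paired actual and generated samples;
    pushing the gradient norm towards 1 encourages a 1-Lipschitz discriminator.
    """
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.uniform(size=(x_real.shape[0], 1))
    x_hat = eps * x_real + (1.0 - eps) * x_fake
    norms = np.linalg.norm(toy_discriminator_grad(x_hat, w), axis=1)
    return lam * float(np.mean((norms - 1.0) ** 2))


rng = np.random.default_rng(0)
x_real = rng.normal(size=(8, 800))
x_fake = rng.normal(size=(8, 800))
w = rng.normal(scale=0.05, size=800)
print(gradient_penalty(x_real, x_fake, w, rng=rng))
```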

Table  7 provides information on the hyperparameters used for the classifier in the case studies. In case studies, a CNN is utilized as the classifier. To ensure a fair and consistent comparison, all the methods adopt the identical classifier configuration outlined in Table  7 .

Data availability

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Friedrich, W., Knipping, P. & Laue, M. Interferenzerscheinungen bei Röntgenstrahlen. Ann. Phys. 346, 971–988 (1913).

Callaghan, P. T. Principles of Nuclear Magnetic Resonance Microscopy (Clarendon Press, 1993).

Smith, E. & Dent, G. Modern Raman Spectroscopy: A Practical Approach (Wiley, 2019).

Wang, H. et al. Rapid identification of X-ray diffraction patterns based on very limited data by interpretable convolutional neural networks. J. Chem. Inf. Model. 60, 2004–2011 (2020).

Schuetzke, J., Szymanski, N. J. & Reischl, M. Validating neural networks for spectroscopic classification on a universal synthetic dataset. npj Comput. Mater. 9, 100 (2023).

Belsky, A., Hellenbrandt, M., Karen, V. L. & Luksch, P. New developments in the Inorganic Crystal Structure Database (ICSD): Accessibility in support of materials research and design. Acta Crystallogr. Sect. B Struct. Sci. 58, 364–369 (2002).

Armbruster, T. & Danisi, R. The power of databases: The RRUFF project. In Highlights in Mineralogical Crystallography 1–30 (2015).

Schuetzke, J., Benedix, A., Mikut, R. & Reischl, M. Enhancing deep-learning training for phase identification in powder X-ray diffractograms. IUCrJ 8, 408–420 (2021).

Choudhary, K. et al. Recent advances and applications of deep learning methods in materials science. npj Comput. Mater. 8, 59 (2022).

Szymanski, N. J. et al. Toward autonomous design and synthesis of novel inorganic materials. Mater. Horizons 8, 2169–2198 (2021).

McCulloch, W. S. & Pitts, W. A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 5, 115–133 (1943).

Kantz, E. D., Tiwari, S., Watrous, J. D., Cheng, S. & Jain, M. Deep neural networks for classification of LC-MS spectral peaks. Anal. Chem. 91, 12407–12413 (2019).

Zeng, F., Peng, W., Kang, G., Feng, Z. & Yue, X. Spectral data classification by one-dimensional convolutional neural networks. In 2021 IEEE International Performance, Computing, and Communications Conference (IPCCC) 1–6 (IEEE, 2021).

Lee, J.-W., Park, W. B., Lee, J. H., Singh, S. P. & Sohn, K.-S. A deep-learning technique for phase identification in multiphase inorganic compounds using synthetic XRD powder patterns. Nat. Commun. 11, 86 (2020).

Chung, J., Shen, B. & Kong, Z. J. Anomaly detection in additive manufacturing processes using supervised classification with imbalanced sensor data based on generative adversarial network. J. Intell. Manuf. 1, 1–20 (2023).

Chawla, N. V., Bowyer, K. W., Hall, L. O. & Kegelmeyer, W. P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002).

Han, H., Wang, W.-Y. & Mao, B.-H. Borderline-SMOTE: A new over-sampling method in imbalanced data sets learning. In International Conference on Intelligent Computing 878–887 (Springer, 2005).

Cui, W., Zhang, Y., Zhang, X., Li, L. & Liou, F. Metal additive manufacturing parts inspection using convolutional neural network. Appl. Sci. 10, 545 (2020).

Lee, X. Y., Saha, S. K., Sarkar, S. & Giera, B. Automated detection of part quality during two-photon lithography via deep learning. Addit. Manuf. 36, 101444 (2020).

Mycroft, W. et al. A data-driven approach for predicting printability in metal additive manufacturing processes. J. Intell. Manuf. 31, 1769–1781 (2020).

Douzas, G. & Bacao, F. Effective data generation for imbalanced learning using conditional generative adversarial networks. Expert Syst. Appl. 91, 464–471 (2018).

Mikołajczyk, A. & Grochowski, M. Data augmentation for improving deep learning in image classification problem. In 2018 International Interdisciplinary PhD Workshop (IIPhDW) 117–122 (IEEE, 2018).

Fathy, Y., Jaber, M. & Brintrup, A. Learning with imbalanced data in smart manufacturing: A comparative analysis. IEEE Access 9, 2734–2757 (2020).

Ranasinghe, G. D. & Parlikad, A. K. Generating real-valued failure data for prognostics under the conditions of limited data availability. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM) 1–8 (IEEE, 2019).

de Souza, V. L. T., Marques, B. A. D., Batagelo, H. C. & Gois, J. P. A review on generative adversarial networks for image generation. Comput. Graph. 1, 1 (2023).

Sampath, V., Maurtua, I., Aguilar Martin, J. J. & Gutierrez, A. A survey on generative adversarial networks for imbalance problems in computer vision tasks. J. Big Data 8, 1–59 (2021).

Wang, C. et al. CGAN-plankton: Towards large-scale imbalanced class generation and fine-grained classification. In 2017 IEEE International Conference on Image Processing (ICIP) 855–859 (IEEE, 2017).

Huang, G. & Jafari, A. H. Enhanced balancing GAN: Minority-class image generation. Neural Comput. Appl. 35, 5145–5154 (2023).

Waheed, A. et al. CovidGAN: Data augmentation using auxiliary classifier GAN for improved COVID-19 detection. IEEE Access 8, 91916–91923 (2020).

Antoniou, A., Storkey, A. & Edwards, H. Data augmentation generative adversarial networks. Preprint at http://arxiv.org/abs/1711.04340 (2017).

Kiyasseh, D. et al. PlethAugment: GAN-based PPG augmentation for medical diagnosis in low-resource settings. IEEE J. Biomed. Health Inform. 24, 3226–3235 (2020).

Mariani, G., Scheidegger, F., Istrate, R., Bekas, C. & Malossi, C. BAGAN: Data augmentation with balancing GAN. Preprint at http://arxiv.org/abs/1803.09655 (2018).

Wu, M. et al. Deep learning data augmentation for Raman spectroscopy cancer tissue classification. Sci. Rep. 11, 23842 (2021).

Gao, B., Zhou, J., Yang, Y., Chi, J. & Yuan, Q. Generative adversarial network and convolutional neural network-based EEG imbalanced classification model for seizure detection. Biocybern. Biomed. Eng. 42, 1–15 (2022).

Bisong, E. & Bisong, E. Google Colaboratory. In Building Machine Learning and Deep Learning Models on Google Cloud Platform: A Comprehensive Guide for Beginners 59–64 (2019).

Choi, H.-S., Jung, D., Kim, S. & Yoon, S. Imbalanced data classification via cooperative interaction between classifier and generator. IEEE Trans. Neural Netw. Learn. Syst. 33, 3343 (2021).

Ho, J., Jain, A. & Abbeel, P. Denoising diffusion probabilistic models. Adv. Neural Inf. Process. Syst. 33, 6840–6851 (2020).

Sharma, G., Gupta, C., Agarwal, A., Sharma, L. & Dhall, A. Generating point cloud augmentations via class-conditioned diffusion model. In Proc. IEEE/CVF Winter Conference on Applications of Computer Vision 480–488 (2024).

Nguyen, Q., Le, T., Nguyen, T. & Nhat, M. N. Class label conditioning diffusion model for robust brain tumor MRI synthesis. Authorea Preprints (2023).

Powers, D. M. Evaluation: From precision, recall and F-measure to ROC, informedness, markedness and correlation. Preprint at http://arxiv.org/abs/2010.16061 (2020).

Jalaal, M., Cottrell, G., Balmforth, N. & Stoeber, B. On the rheology of Pluronic F127 aqueous solutions. J. Rheol. 61, 139–146 (2017).

Hsu, H. & Lachenbruch, P. A. Paired t test. Wiley StatsRef: Statistics Reference Online (2014).

Dimitriadis, G., Neto, J. P. & Kampff, A. R. t-SNE visualization of large-scale neural recordings. Neural Comput. 30, 1750–1774 (2018).

Domiński, A., Konieczny, T. & Kurcok, P. \(\alpha\)-Cyclodextrin-based polypseudorotaxane hydrogels. Materials 13, 133 (2019).

Douek, P. C. et al. Clinical applications of photon-counting CT: A review of pioneer studies and a glimpse into the future. Radiology 309, e222432 (2023).

Zhang, W. et al. X-ray diffraction measurements and computational prediction of residual stress mitigation scanning strategies in powder bed fusion additive manufacturing. Addit. Manuf. 61, 103275 (2023).

Zhang, J. et al. Rapid, autonomous high-throughput characterization of hydrogel rheological properties via automated sensing and physics-guided machine learning. Appl. Mater. Today 30, 101720 (2023).

Naseri, H. & Mehrdad, V. Novel CNN with investigation on accuracy by modifying stride, padding, kernel size and filter numbers. Multimedia Tools Appl. 82, 23673–23691 (2023).

Chang, Y., Chen, J., Qu, C. & Pan, T. Intelligent fault diagnosis of wind turbines via a deep learning network using parallel convolution layers with multi-scale kernels. Renew. Energy 153, 205–213 (2020).

Kodali, N., Abernethy, J., Hays, J. & Kira, Z. On convergence and stability of GANs. Preprint at http://arxiv.org/abs/1705.07215 (2017).

Wang, C., Yu, Z., Zheng, H., Wang, N. & Zheng, B. CGAN-plankton: Towards large-scale imbalanced class generation and fine-grained classification. In 2017 IEEE International Conference on Image Processing (ICIP) 855–859 (IEEE, 2017).

Tao, S. & Wang, J. Alleviation of gradient exploding in GANs: Fake can be real. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 1191–1200 (2020).

Arjovsky, M. & Bottou, L. Towards principled methods for training generative adversarial networks. Preprint at http://arxiv.org/abs/1701.04862 (2017).

Tran, N.-T., Bui, T.-A. & Cheung, N.-M. Dist-GAN: An improved GAN using distance constraints. In Proc. European Conference on Computer Vision (ECCV) 370–385 (2018).

Acknowledgements

This project was funded by a grant with award number 1933525 from the National Science Foundation (NSF).

Author information

Authors and affiliations.

Department of Industrial Engineering, Pusan National University, Busan, South Korea

Jihoon Chung

Grado Department of Industrial and Systems Engineering, Virginia Tech, Blacksburg, VA, USA

Junru Zhang, Amirul Islam Saimon, Yang Liu, Blake N. Johnson & Zhenyu Kong


Contributions

J.C., J.Z., Y.L., B.J., & Z.J.K. conceived and designed the study. J.C., A.I.S., & Z.J.K. implemented the method. J.C., J.Z., A.I.S., & Z.J.K. performed the data analysis. J.C., B.Z., & Z.J.K. wrote the main text of the paper. All authors discussed the results and contributed to the writing of the final manuscript.

Corresponding authors

Correspondence to Blake N. Johnson or Zhenyu Kong.

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Objective functions for three-player

The “Generative adversarial network (GAN)” section first reviews the GAN. The objective functions of the discriminator, generator, and classifier are then described in the “Objective function of discriminator”, “Objective function of generator”, and “Objective function of classifier” sections, respectively. Iterative optimization among the three players ultimately yields a high-performance classifier from the imbalanced spectral data.

Generative adversarial network (GAN)

The idea of a GAN is to train two networks, a generator G and a discriminator D, through a minimax game over the value function V(D, G) given in Eq. (2) [51].

where z denotes the random noise, and \(x_{a}\) denotes actual samples of the spectral data. \(y_{a}\) and \(y_{g}\) are the labels of the actual and generated spectral data, respectively. The generator produces spectral samples G(z) from z, whereas the discriminator determines whether an input sample comes from the actual data (\(x_{a}\)) or from the generator (G(z)). In other words, the discriminator identifies the origin of its inputs, and the generator creates synthetic samples intended to deceive the discriminator. This adversarial learning drives the distribution of the generated samples toward the inherent distribution of the actual samples, \(P(X_{a})\).
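For orientation, the textbook label-conditioned GAN minimax game, written in the notation of this section, is sketched below. It is an assumed standard form (the joint distribution \(P(X_{a},Y_{a})\) is introduced here for illustration) and is not necessarily identical to the paper's Eq. (2).

```latex
\min_{G}\max_{D} V(D,G) =
  \mathbb{E}_{(x_{a},y_{a})\sim P(X_{a},Y_{a})}\left[\log D(x_{a},y_{a})\right]
  + \mathbb{E}_{(z,y_{g})\sim P(Z,Y_{g})}\left[\log\left(1 - D(G(z,y_{g}),y_{g})\right)\right]
```

The discriminator pushes both expectations up, while the generator can only lower the second one, which is the adversarial tension described above.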

Objective function of discriminator

In the proposed approach, the discriminator aims to maximize Eq. (2) through adversarial learning with the generator. Specifically, the discriminator learns to recognize the inputs (\(x_{a},y_{a}\)) and (\(G(z,y_{g}),y_{g}\)) as actual and generated, respectively. The method also introduces two supplementary terms to stabilize learning, because GAN training is often unstable and difficult to converge, and the generator's gradients can explode during adversarial learning [52, 53]. First, the method regularizes the discriminator's gradient by imposing a gradient penalty, which enforces 1-Lipschitz continuity on the discriminator. Second, the approach gives the discriminator an extra input: an actual sample paired with a wrong label. This added task prevents the discriminator from learning to distinguish the origin of its inputs long before the generator has approximated the actual sample distribution of the spectral data; otherwise, the generator's gradient can explode or vanish, making GAN learning unstable [53, 54]. In summary, the objective function of the discriminator (\(L^{D}\)) is as follows [15].

where \(\hat{x}=\alpha x_{a} +(1-\alpha )G(z)\), and \(\alpha\) is sampled uniformly between 0 and 1. The coefficient \(\lambda\) weights the gradient penalty term. The first three terms in Eq. (3) are the losses incurred when the discriminator misclassifies the source of the actual, generated, and mislabeled samples, respectively. The final term is the gradient penalty on the discriminator.
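A minimal NumPy sketch of a discriminator loss with the three misclassification terms and the gradient penalty described above. It assumes binary cross-entropy for the three terms, the WGAN-GP penalty form \(\lambda\,\mathbb{E}[(\lVert\nabla_{\hat{x}} D(\hat{x})\rVert_2 - 1)^2]\) evaluated on the critic's logit, and a hypothetical one-hidden-layer conditional critic (`ToyCritic`) whose input gradient is computed analytically. The paper's actual architecture and exact Eq. (3) may differ.

```python
import numpy as np


def log_sigmoid(a):
    """Numerically stable log(sigmoid(a))."""
    return -np.logaddexp(0.0, -a)


class ToyCritic:
    """Hypothetical conditional critic: logit f(x, y) = u . tanh(W x + E[y] + c)."""

    def __init__(self, dim_x, n_classes, hidden, rng):
        self.W = rng.normal(scale=0.5, size=(hidden, dim_x))
        self.E = rng.normal(scale=0.5, size=(n_classes, hidden))  # label embeddings
        self.c = np.zeros(hidden)
        self.u = rng.normal(scale=0.5, size=hidden)

    def _hidden(self, x, y):
        return np.tanh(x @ self.W.T + self.E[y] + self.c)

    def logit(self, x, y):
        # x: (batch, dim_x), y: (batch,) integer labels -> (batch,) logits
        return self._hidden(x, y) @ self.u

    def grad_x(self, x, y):
        # Analytic gradient of the logit w.r.t. x, per sample: W^T (u * (1 - h^2))
        h = self._hidden(x, y)
        return (self.u * (1.0 - h ** 2)) @ self.W


def gradient_penalty(critic, x_real, x_fake, y, lam, rng):
    # x_hat = alpha * x_a + (1 - alpha) * G(z), with alpha ~ U(0, 1) drawn per sample
    alpha = rng.uniform(0.0, 1.0, size=(x_real.shape[0], 1))
    x_hat = alpha * x_real + (1.0 - alpha) * x_fake
    grad_norm = np.linalg.norm(critic.grad_x(x_hat, y), axis=1)
    return lam * np.mean((grad_norm - 1.0) ** 2)


def discriminator_loss(critic, x_real, y_real, x_fake, y_wrong, lam, rng):
    # Assumes the generated batch was produced with the same labels as the real batch
    loss_real = -np.mean(log_sigmoid(critic.logit(x_real, y_real)))     # actual -> "real"
    loss_fake = -np.mean(log_sigmoid(-critic.logit(x_fake, y_real)))    # generated -> "fake"
    loss_wrong = -np.mean(log_sigmoid(-critic.logit(x_real, y_wrong)))  # mislabeled -> "fake"
    gp = gradient_penalty(critic, x_real, x_fake, y_real, lam, rng)
    return loss_real + loss_fake + loss_wrong + gp


rng = np.random.default_rng(0)
critic = ToyCritic(dim_x=8, n_classes=3, hidden=16, rng=rng)
x_real = rng.normal(size=(32, 8))
x_fake = rng.normal(size=(32, 8))                      # stand-in for G(z, y_g)
y_real = rng.integers(0, 3, size=32)
y_wrong = (y_real + rng.integers(1, 3, size=32)) % 3   # always differs from y_real
print(discriminator_loss(critic, x_real, y_real, x_fake, y_wrong, lam=10.0, rng=rng))
```

In a real implementation, an autodiff framework would supply the gradient of the critic; the analytic gradient here only keeps the sketch dependency-light.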

Objective function of generator

The primary aim of the generator is to generate samples that follow the distribution of the actual spectral data (\(P(X_{a})\)), which it accomplishes by minimizing Eq. (2). Eq. (2) thus drives the adversarial learning between the discriminator and the generator. In addition, the generator's objective function includes a component tied to the classifier. Unlike its adversarial relationship with the discriminator, the generator cooperates with the classifier, producing spectral samples whose material characteristics are clearly distinguishable. In other words, the generator supplies samples that balance the training dataset and thereby improve the classifier's performance, as shown in Fig. 9. To this end, the generator's objective function includes the classification loss on the generated samples. The objective function of the generator (\(L^{G}\)) can be formulated as follows [15].

Objective function of classifier

The objective function of the classifier includes the classification losses on both the actual and generated samples of the spectral data. As illustrated in Fig. 9, the generator's samples are combined with actual samples to form a balanced training dataset for each classifier batch. The classifier is then optimized by minimizing the classification loss over both the actual and generated samples. The classifier's objective function (\(L^{C}\)) is as follows [15].

In particular, the term \(-\mathbb{E}_{(z,y_{g})\sim P(Z, Y_{g})}[y_{g}\log C(G(z,y_{g}))]\), common to both Eqs. (4) and (5), enables cooperative learning between the generator and the classifier.
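The shared cooperative term can be sketched as below. The sketch assumes softmax cross-entropy for the classifier (integer class indices standing in for the one-hot \(y_{g}\)) and, for the generator's adversarial part, the generator-dependent term of the standard minimax objective, \(\mathbb{E}[\log(1 - D(G(z,y_{g}),y_{g}))]\). The function names are hypothetical, and the inputs are precomputed logits rather than real networks.

```python
import numpy as np


def softmax_cross_entropy(logits, labels):
    """Mean of -log softmax(logits)[label]; logits (batch, n_classes), labels (batch,)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -np.mean(log_probs[np.arange(len(labels)), labels])


def generator_loss(d_logits_fake, clf_logits_fake, y_g):
    # Adversarial part: mean log(1 - D(G(z, y_g), y_g)) = mean log sigmoid(-logit)
    adversarial = np.mean(-np.logaddexp(0.0, d_logits_fake))
    # Cooperative part shared with the classifier: cross-entropy on generated samples
    return adversarial + softmax_cross_entropy(clf_logits_fake, y_g)


def classifier_loss(clf_logits_real, y_a, clf_logits_fake, y_g):
    # Cross-entropy on actual samples plus the same shared term on generated samples
    return (softmax_cross_entropy(clf_logits_real, y_a)
            + softmax_cross_entropy(clf_logits_fake, y_g))


clf_logits_fake = np.array([[2.0, 0.1, -1.0], [0.0, 3.0, 0.5]])
y_g = np.array([0, 1])
print(generator_loss(np.array([0.3, -0.2]), clf_logits_fake, y_g))
```

Because the same cross-entropy appears in both losses, lowering it benefits both players, whereas the adversarial part pits the generator against the discriminator.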

Training procedure

The three players are optimized alternately. First, the discriminator is trained on a batch containing both actual and generated samples to minimize Eq. (3). Next, a batch containing only generated samples is used to update the generator by minimizing Eq. (4). Finally, the classifier is trained by minimizing Eq. (5) on data balanced across all classes: a batch is first sampled from the actual data, and the generator then produces the remaining samples for the minority classes so that the training set is balanced. This alternating process continues until a predefined number of epochs is reached.
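One plausible reading of the balanced-batch step is sketched below in plain Python: cap the number of actual samples drawn per class and let the generator fill each class up to the same size. The `per_class` target, the index layout, and the helper name are assumptions for illustration.

```python
import random


def compose_balanced_batch(indices_by_class, per_class, rng):
    """Plan one balanced classifier batch.

    indices_by_class: dict mapping class label -> list of indices of actual samples.
    Returns (real_indices, n_generate): actual samples drawn per class and the number
    of samples the generator must produce for that class, G(z, y_g = class).
    """
    real_indices, n_generate = {}, {}
    for label, idx in indices_by_class.items():
        take = min(per_class, len(idx))               # minority classes may have too few
        real_indices[label] = rng.sample(idx, take)   # draw without replacement
        n_generate[label] = per_class - take          # generator fills the gap
    return real_indices, n_generate


rng = random.Random(0)
data = {0: list(range(500)), 1: list(range(500, 510)), 2: list(range(510, 512))}
real, gen = compose_balanced_batch(data, per_class=16, rng=rng)
print({k: len(v) for k, v in real.items()}, gen)
# {0: 16, 1: 10, 2: 2} {0: 0, 1: 6, 2: 14}
```

Capping the majority class keeps every class at the same size in each batch, so the classifier never sees the raw imbalance.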

Performance evaluation in Pluronic F-127 hydrogel case study

Tables 8, 9, and 10 report the performance evaluation on the Pluronic F-127 hydrogel data when the balanced ratios of the training data are 0.039, 0.027, and 0.013, respectively.

Performance evaluation in Alpha-Cyclodextrin hydrogel case study

Tables 11, 12, and 13 report the performance evaluation on the Alpha-Cyclodextrin hydrogel data when the balanced ratios of the training data are 0.083, 0.050, and 0.025, respectively.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article.

Chung, J., Zhang, J., Saimon, A.I. et al. Imbalanced spectral data analysis using data augmentation based on the generative adversarial network. Sci Rep 14 , 13230 (2024). https://doi.org/10.1038/s41598-024-63285-4


Received : 18 January 2024

Accepted : 27 May 2024

Published : 09 June 2024

DOI : https://doi.org/10.1038/s41598-024-63285-4


  • Open access
  • Published: 06 June 2024

iIMPACT: integrating image and molecular profiles for spatial transcriptomics analysis

  • Xi Jiang 1 , 2 ,
  • Shidan Wang 1 ,
  • Lei Guo 1 ,
  • Bencong Zhu 3 , 4 ,
  • Zhuoyu Wen 1 ,
  • Liwei Jia 5 ,
  • Guanghua Xiao 1 &
  • Qiwei Li   ORCID: orcid.org/0000-0002-1020-3050 4  

Genome Biology volume 25, Article number: 147 (2024)


Current clustering analysis of spatial transcriptomics data primarily relies on molecular information and fails to fully exploit the morphological features present in histology images, leading to compromised accuracy and interpretability. To overcome these limitations, we have developed a multi-stage statistical method called iIMPACT. It identifies and defines histology-based spatial domains based on AI-reconstructed histology images and the spatial context of gene expression measurements, and detects domain-specific differentially expressed genes. Through multiple case studies, we demonstrate that iIMPACT outperforms existing methods in accuracy and interpretability and provides insights into the cellular spatial organization and landscape of functional genes within spatial transcriptomics data.

Spatially resolved transcriptomics (SRT), a new generation of RNA profiling techniques, provides biological information at the cellular level while preserving the organization of the tissue and cellular microenvironment [ 1 , 2 , 3 , 4 ]. One category of SRT methods comprises next-generation sequencing (NGS)-based techniques, including spatial transcriptomics (ST) [ 5 ], 10x Visium (an improved ST platform), Slide-seq [ 6 ], Slide-seqV2 [ 7 ], and high-definition spatial transcriptomics (HDST) [ 8 ]. These techniques capture RNA molecules via spatially arrayed barcoded probes. The barcoded areas, namely spots, each cover a group of cells and are usually arrayed on a two-dimensional grid. Another category of SRT platforms is based on imaging techniques, such as seqFISH [ 9 ], MERFISH [ 10 ], and STARmap [ 11 ]. These platforms measure the expression levels of hundreds to thousands of genes at single-cell resolution with detailed spatial organization information. With these advancements, SRT techniques have been widely applied and have yielded novel insights in biomedical studies.

A central challenge for SRT data analysis is to define clinically or biologically meaningful spatial domains by partitioning regions with similar molecular and/or histological characteristics, because the spatial domain identification serves as the foundation for several important downstream analyses, including but not limited to the domain-based differential expression analysis, trajectory analysis, and functional pathway analysis [ 12 , 13 ]. However, current state-of-the-art methods typically focus on achieving this goal solely by analyzing SRT molecular profiles, such as gene expression, while neglecting the valuable morphological or biological information present in the associated histology images. For example, the Seurat package, the most prevalent single-cell RNA sequencing data analysis pipeline [ 14 , 15 ], utilizes only the high-throughput gene expression of each spot for clustering analysis but omits spatial context. On the other hand, several recently developed methods, such as the hidden-Markov random field model [ 16 ], BayesSpace [ 17 ], and BASS [ 18 ], integrate spatial information using Bayesian frameworks, but do not leverage any information from the paired histology images. Meanwhile, a series of deep-learning-based methods are designed to integrate features extracted from the histology image to enhance SRT data clustering analysis. For example, SpaGCN [ 19 ] relies on the RGB channel data from areas surrounding spots for histological insights, whereas stLearn [ 20 ], MUSE [ 21 ], and SiGra [ 22 ] achieve image feature extraction via various deep neural network models. However, those features do not explicitly reveal detailed morphological information (e.g., cell locations and types) and, thus, have limited ability to directly provide biologically or clinically relevant insights.

Different from molecular information, histology images characterize cellular structures and tissue microenvironments, which have been proven valuable in clinical diagnosis and prognosis [ 23 , 24 ]. Computer vision algorithms have enabled us to automatically segment cell nuclei from digital histology images at a large scale [ 25 ]. Recent developments in deep convolutional neural networks (e.g., H-DenseUNet [ 26 ], Micro-Net [ 27 ], Hover-Net [ 28 ], and HD-Staining model [ 23 ]) have further integrated the automatic identification, classification, and feature extraction of each observed nucleus in a histology image. In practice, a histology-based spatial domain (e.g., tissue) is defined as a group of cells with similar morphological and molecular context as a unit. Thus, we hypothesize that integrating spot-level molecular profiles and cellular-level image profiles from AI-reconstructed histology images—digitally processed tissue samples with AI-identified and classified nuclei, primarily using deep learning—could enhance the spatial domain identification in terms of both accuracy and interpretability.

Another challenge for SRT data analysis is to identify spatial domain-specific differentially expressed genes (spaDEGs), which are defined as genes enriched in a given spatial domain. Recently developed methods, such as SpatialDE [ 29 ], SPARK [ 30 ], BOOST-GP [ 31 ], and BOOST-MI [ 32 ] focus on identifying spatially variable genes (SVGs), which represent genes with spatially correlated expression patterns [ 30 , 31 ]. They characterize the global spatial dependency of a gene in the whole domain while ignoring the spatial pattern heterogeneity due to cellular organization, which could be observed in AI-reconstructed histology images. SpaGCN [ 19 ] proposed domain-guided differential expression analysis to detect spaDEGs without a rigorous statistical framework. Therefore, there is an urgent need to develop a reliable statistical method to detect spaDEGs.

This paper proposes a two-stage statistical approach that integrates Image and Molecular Profiles to Analyze and Cluster spatial Transcriptomics data, or iIMPACT for short. The first stage implements a Bayesian finite mixture model to allocate all spots into mutually exclusive clusters, namely histology-based spatial domains. We decompose each mixture component into two sub-components to integrate image and molecular profiles. In particular, a multinomial sub-component models the cell type abundance available in histology images. Following BayesSpace [ 17 ], a normal sub-component models the low-dimensional representation of normalized gene expression from the matching SRT molecular profile. The Bayesian model also adopts a Markov random field (MRF) prior to encourage neighboring spots to be clustered in the same histology-based spatial domain. The spots' neighborhood structure can be defined straightforwardly from the NGS-based SRT geospatial profile, as spots are usually arrayed on square or triangular lattices. Through the resulting posterior inference, we obtain histology-based spatial domains and their interactive zones, and we characterize each identified domain by inferring its underlying domain-specific relative abundance of cell types. The second stage implements a negative binomial (NB) regression model to search for spaDEGs, which are differentially expressed between a given histology-based spatial domain identified in the first stage and all others. This approach directly models the read counts (used as a proxy for gene expression) in the SRT molecular profile to minimize information loss. iIMPACT could also be extended to analyze imaging-based SRT data with some special handling.

Compared with existing state-of-the-art methods, iIMPACT fully leverages information from the nuclei segmentation of histology images for clustering analysis and offers strong biological interpretability. Applying iIMPACT to multiple datasets from different SRT platforms (summarized in Additional file 1: Table S1), we confirmed that iIMPACT performed better than state-of-the-art methods in both spatial domain identification and spaDEG detection. We further demonstrated that iIMPACT captures biological features at both the spatial domain level and the gene level. Therefore, by integrating image and molecular information, iIMPACT facilitates the discovery of new biological insights from SRT datasets.
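As a simplified illustration of the second stage, the sketch below tests one gene for domain-versus-rest differential expression with a negative binomial likelihood-ratio test. It assumes a fixed, known size parameter `r` (inverse dispersion), no library-size offsets, and no other covariates, so it is not iIMPACT's full NB regression. Under those assumptions, the NB maximum-likelihood mean of each group is simply its sample mean.

```python
import math


def nb_loglik(counts, mu, r):
    """Negative binomial log-likelihood with mean mu and size (inverse dispersion) r."""
    return sum(
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu))
        for y in counts
    )


def _mean(values):
    # Floor avoids log(0) when every count in a group is zero
    return max(sum(values) / len(values), 1e-12)


def nb_domain_lrt(counts, in_domain, r):
    """Likelihood-ratio test of one domain vs. all other spots for a single gene.

    Returns (LRT statistic, chi-square(1) p-value, log fold change inside/outside).
    """
    inside = [c for c, d in zip(counts, in_domain) if d]
    outside = [c for c, d in zip(counts, in_domain) if not d]
    # With r fixed and no covariates, each group's NB MLE of the mean is its sample mean
    ll_null = nb_loglik(counts, _mean(counts), r)
    ll_alt = nb_loglik(inside, _mean(inside), r) + nb_loglik(outside, _mean(outside), r)
    stat = max(2.0 * (ll_alt - ll_null), 0.0)
    p_value = math.erfc(math.sqrt(stat / 2.0))  # upper tail of chi-square with 1 df
    return stat, p_value, math.log(_mean(inside) / _mean(outside))


counts = [20, 25, 30, 18, 1, 0, 2, 3, 1, 0]
in_domain = [True] * 4 + [False] * 6
print(nb_domain_lrt(counts, in_domain, r=5.0))
```

Modeling the raw counts directly, rather than normalized values, is what the text means by minimizing information loss; a full implementation would also estimate `r` and include size factors.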

Overview of iIMPACT

iIMPACT is a two-stage statistical method to analyze SRT data, with its workflow shown in Fig.  1 . It includes two stages—histology-based spatial domain identification by a Bayesian normal-multinomial mixture model and spaDEG detection by an NB regression model.

figure 1

Workflow of iIMPACT: A iIMPACT starts by combining and processing image profile from AI-reconstructed histology images, and geospatial and molecular profiles from SRT data (circled by dashed lines) to conduct the histology-based spatial domain identification. B A Bayesian normal-multinomial mixture model with the Markov random field (circled by solid lines) is fitted for histology-based spatial domain identification. Based on the spatial domain identification results, biologically important cellular spatial organization can be characterized, including the domain-specific relative abundance of cell types and interactive zones (circled by dotted lines). C Domain-specific spaDEGs are identified by a negative binomial (NB) regression model

To achieve these goals, iIMPACT uses the morphological context of histology images and the spatial context of gene expression measurements, referred to as the image and molecular profiles in Fig. 1A and throughout the paper. In particular, the molecular profile is the low-dimensional representation of normalized spot-level gene expression values (denoted by \({\varvec{Y}}\)), obtained with a pre-specified dimension reduction technique such as principal component analysis (PCA). The accompanying SRT geospatial profile, which records the locations of all spots, is processed into an adjacency matrix (denoted by \({\varvec{G}}\)) representing the spots' neighborhood structure. iIMPACT also requires the locations and types of all cell nuclei in the matching histology image. Combined with the geospatial profile, these yield the image profile (denoted by \({\varvec{V}}\)), which records spot-level cell type abundance, i.e., the number of cells of each type within a spot and its expanded area.
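The construction of \({\varvec{G}}\) and \({\varvec{V}}\) can be sketched with NumPy as follows. The neighbor distance threshold `max_dist` and the `radius` that stands in for a spot's expanded area are hypothetical parameters; the exact expansion rule iIMPACT uses is not specified in this passage. On a square lattice with unit spacing, a threshold of 1 keeps the four edge-sharing neighbors and excludes diagonals.

```python
import numpy as np


def adjacency_from_coords(coords, max_dist):
    """Adjacency matrix G: spots i and j are neighbours if 0 < distance <= max_dist."""
    coords = np.asarray(coords, dtype=float)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return ((d > 0) & (d <= max_dist)).astype(int)


def image_profile(spot_coords, cell_coords, cell_types, n_types, radius):
    """Image profile V: V[i, t] counts cells of type t within `radius` of spot i."""
    spot_coords = np.asarray(spot_coords, dtype=float)
    cell_coords = np.asarray(cell_coords, dtype=float)
    d = np.linalg.norm(spot_coords[:, None, :] - cell_coords[None, :, :], axis=-1)
    inside = (d <= radius).astype(int)                         # (n_spots, n_cells)
    one_hot = np.eye(n_types, dtype=int)[np.asarray(cell_types)]  # (n_cells, n_types)
    return inside @ one_hot


G = adjacency_from_coords([(0, 0), (0, 1), (1, 0), (1, 1)], max_dist=1.0)
V = image_profile([(0, 0), (10, 0)], [(0.5, 0), (0, 0.2), (10, 0.1), (5, 5)],
                  cell_types=[0, 1, 1, 0], n_types=2, radius=1.0)
print(G)
print(V)
```

The pairwise-distance approach is quadratic in the number of spots or cells; for whole slides with hundreds of thousands of nuclei, a spatial index would be the practical choice.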

In the first stage, we employ a Bayesian normal-multinomial mixture model with the MRF prior [ 33 , 34 ] to identify the histology-based spatial domains (Fig. 1B) and interactive zones, i.e., spots allocated to any histology-based spatial domain with low confidence. Through model parameter estimation, iIMPACT infers the underlying relative abundance of cell types in each histology-based spatial domain, providing a reference for distinguishing their histological types. In the second stage, an NB regression model is fitted for each gene and each histology-based spatial domain of interest to identify spaDEGs (Fig. 1C).
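A minimal sketch of how low-confidence spots could be flagged from posterior samples of the spot-to-domain labels. It assumes MCMC draws of the labels are available and uses a hypothetical confidence cut-off of 0.8; the exact criterion iIMPACT uses to define interactive zones is not stated here.

```python
import numpy as np


def allocation_probabilities(label_draws, n_domains):
    """Posterior probability of each spot belonging to each domain.

    label_draws: (n_draws, n_spots) integer array of sampled domain labels.
    Returns an (n_spots, n_domains) array of label frequencies across draws.
    """
    draws = np.asarray(label_draws)
    return np.stack([(draws == k).mean(axis=0) for k in range(n_domains)], axis=1)


def interactive_zones(label_draws, n_domains, threshold=0.8):
    """Indices of spots whose most probable domain has probability below threshold."""
    probs = allocation_probabilities(label_draws, n_domains)
    return np.flatnonzero(probs.max(axis=1) < threshold)


draws = [[0, 1, 0], [0, 1, 1], [0, 1, 0], [0, 1, 1]]
print(allocation_probabilities(draws, 2).argmax(axis=1), interactive_zones(draws, 2))
# [0 1 0] [2]
```

Spot 2 switches between domains 0 and 1 across draws, so its best allocation has probability 0.5 and it is flagged as part of an interactive zone.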

Application to human breast cancer dataset

We applied iIMPACT to analyze an SRT dataset from a human breast cancer study. This dataset includes 2518 spots and 17,651 genes. The gene expression was measured on a section of human breast with invasive ductal carcinoma via the 10x Visium platform, along with annotation from pathologists that was used to evaluate the accuracy of spatial domain detection (H&E-stained image with five annotated tissue regions in Fig.  2 A). After applying HD-Staining [ 23 ] to the histology image of breast cancer tissue, we identified 156,235 cells within seven categories: macrophage, ductal epithelium, karyorrhexis, tumor cell, lymphocyte, red blood cell, and stromal cell (detailed information in Additional file 1: Fig. S1).

figure 2

Human breast cancer dataset: A H&E-stained image of the tissue section with spot-level manual annotation from pathologists. B Spatial domains detected by iIMPACT, SpaGCN, BayesSpace, BASS, stLearn, and MUSE, with the number of clusters to be five. C Estimates (posterior means and credible intervals) of domain-specific relative abundance of cell types for the seven cell types observed in the AI-reconstructed histology image. D Interactive zones (black asterisk spots) defined by iIMPACT. E Identified interactive zones (black asterisk spots) and other boundary areas of tumor domain and its adjacent domain 3, and boxplots of gene expression richness for spots in the interactive zone and other boundaries. F Gene enrichment analysis between genes detected by iIMPACT, SpaGCN, SpatialDE, and SPARK, and known breast cancer genes from the COSMIC database. G Spatial expression patterns of two example spaDEGs, COX6C and ELF3 , that were only detected by iIMPACT

Firstly, we compared the five spatial domains identified by iIMPACT, SpaGCN [ 19 ], BayesSpace [ 17 ], BASS [ 18 ], stLearn [ 20 ], and MUSE [ 21 ] with the domains manually annotated by pathologists. We quantified clustering performance with the widely used adjusted Rand index (ARI), which generally ranges from 0 to 1, with higher values indicating greater consistency between the identified spatial domain pattern and the manual annotation, as illustrated in Additional file 1: Fig. S2. iIMPACT achieved the highest consistency with the manual annotation (Fig. 2B; ARI = 0.634). stLearn (ARI = 0.527) and SpaGCN (ARI = 0.520) use image-extracted features and image RGB values, respectively, rather than detailed histology information, which might explain their less satisfactory segmentation of non-tumor regions. Nevertheless, both outperformed BASS (ARI = 0.496) and BayesSpace (ARI = 0.419). Notably, none of the methods separated the fat region (in blue) from the fibrous tissue (in red) well, per the manual annotation. Detailed comparisons of spatial domains across different numbers of domains \(K\) and the corresponding ARIs are presented in Additional file 1: Fig. S3 and S4, respectively. In summary, the better performance of iIMPACT suggests an advantage in integrating both molecular and image profiles in the clustering analysis of SRT data.
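For reference, the ARI used above can be computed from the contingency table of two labelings with the Hubert–Arabie adjusted form, the same definition scikit-learn's `adjusted_rand_score` implements. A dependency-free Python sketch (the example labels are illustrative, not data from the study):

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same items."""
    n = len(labels_true)
    pairs_joint = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    pairs_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    pairs_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = pairs_true * pairs_pred / comb(n, 2)  # agreement expected by chance
    maximum = (pairs_true + pairs_pred) / 2
    if maximum == expected:  # degenerate case, e.g. both labelings a single cluster
        return 1.0
    return (pairs_joint - expected) / (maximum - expected)


annotation = ["tumor", "tumor", "stroma", "stroma", "fat"]
clusters = [1, 1, 2, 2, 2]
print(adjusted_rand_index(annotation, clusters))
```

Because only co-membership of pairs matters, relabeling clusters does not change the score, and values below 0 indicate less agreement than expected by chance.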

Secondly, iIMPACT simultaneously characterizes each histology-based spatial domain by inferring the latent domain-specific relative abundance of cell types parametrized by the Bayesian normal-multinomial mixture model (Fig. 2C). In contrast, other methods, despite their good capability in identifying spatial domains, currently cannot effectively integrate cell type information to interpret the identified domains directly in a biologically meaningful way. For example, as detailed in Fig. 2C, the proportion of tumor cells is higher in domain 1 (green spots in Fig. 2B) than in other domains, indicating that domain 1 is the tumor region, consistent with the tumor regions in the manual annotation. Domain 2 (blue) and domain 5 (red) have similar proportions of stromal cells, while the proportion of lymphocytes is higher in domain 2 than in domain 5; this difference in the relative abundance of cell types may indicate a functional difference between the two domains. These examples confirm that iIMPACT provides biological interpretation of spatial domains.

Thirdly, iIMPACT can identify the interactive zones among histology-based spatial domains (Fig. 2D). Interactive zones are distinct from the identified spatial domains: they are defined as spots with higher uncertainty in domain allocation, which potentially have greater diversity in cell type abundance and greater heterogeneity in gene expression than neighboring spots with unambiguous domain allocation. We calculated the gene expression richness of each spot within the tumor boundary, the immune boundary, and the interactive zone, defined as the percentage of genes with non-zero read counts. Note that this measure was taken at the spot level rather than the single-cell level. We observed statistically significant differences among these groups (Fig. 2E), implying that the identified zones connect the tumor and immune domains and feature a high level of gene expression heterogeneity and complex cellular interactions. By further comparing gene expression across these groups, we found several known cancer or immune genes highly expressed in the interactive zones (e.g., GREM1 [ 35 ]), suggesting possible tumor-immune interactions in these zones.
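The richness measure defined above is simple to compute from a spot-by-gene count matrix. A short NumPy sketch follows; the group-summary helper and its group names are hypothetical.

```python
import numpy as np


def expression_richness(counts):
    """Percentage of genes with non-zero read counts at each spot.

    counts: (n_spots, n_genes) raw read-count matrix.
    """
    counts = np.asarray(counts)
    return 100.0 * (counts > 0).mean(axis=1)


def mean_richness_by_group(counts, groups):
    """Average richness per spot group, e.g. 'interactive' vs. 'tumor boundary'."""
    richness = expression_richness(counts)
    groups = np.asarray(groups)
    return {str(g): float(richness[groups == g].mean()) for g in np.unique(groups)}


counts = [[0, 1, 2, 0], [3, 0, 0, 0], [5, 1, 1, 1]]
print(expression_richness(counts))
print(mean_richness_by_group(counts, ["interactive", "boundary", "interactive"]))
```

The per-spot richness values from each group are what a boxplot such as Fig. 2E would summarize before a group-wise significance test.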

Finally, we asked whether the spaDEGs defined by iIMPACT are more consistent with biological knowledge than those from other algorithms, an independent evaluation step frequently used to validate clustering approaches on single-cell and spatial profiling data [ 19 , 29 , 30 ]. We focused on the tumor-domain-specific spaDEGs defined by iIMPACT and SpaGCN [ 19 ] and the SVGs identified by SpatialDE [ 29 ] and SPARK [ 30 ], and performed enrichment analysis by comparing the tumor-domain spaDEGs or SVGs from these four methods with the known breast cancer gene set in the Catalogue Of Somatic Mutations In Cancer (COSMIC) database. The number of genes identified by each method, along with their overlaps with the reference gene set, is detailed in Additional file 1: Table S2. As summarized in Fig. 2F, the tumor-domain spaDEGs detected by iIMPACT overlapped more with the known breast cancer gene set than the genes detected by SpaGCN, SpatialDE, or SPARK. They include two example genes detected only by iIMPACT (Fig. 2G): COX6C, a known biomarker for identifying hormone-responsive breast cancer [ 36 ], and ELF3, an epithelial-specific gene that is a novel therapeutic target in breast cancer and has been amplified in early breast cancer [ 37 ]. To further assess the spatial signals of the detected genes, we used the Moran's I [ 38 ] statistic to quantify the degree of spatial autocorrelation of gene expression (details in Additional file 1: Section S1); results are shown in Additional file 1: Fig. S5. Genes detected by iIMPACT exhibit a notably higher average Moran's I than those detected by SpaGCN and SPARK across various selection thresholds, and a higher average Moran's I than the top 1000 SpatialDE-identified SVGs. These results confirm that iIMPACT-defined spaDEGs are more closely aligned with established biological knowledge and display more pronounced spatial expression patterns.
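Moran's I for a single gene can be sketched as below, given a spot-level expression vector and a spatial weight matrix such as the adjacency matrix \({\varvec{G}}\). The binary-adjacency weighting is an assumption; the weights used in Additional file 1: Section S1 may differ.

```python
import numpy as np


def morans_i(x, weights):
    """Moran's I = (n / sum(w)) * (z' W z) / (z' z), with z the centered values."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()
    return (x.size / w.sum()) * (z @ w @ z) / (z @ z)


# Four spots on a line, each adjacent to its immediate neighbours
chain = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
print(morans_i([1, 2, 3, 4], chain))    # smooth gradient: positive autocorrelation
print(morans_i([1, -1, 1, -1], chain))  # alternating pattern: negative autocorrelation
```

Values near +1 indicate that neighboring spots have similar expression, which is the "pronounced spatial expression pattern" the comparison above rewards.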

Application to human prostate cancer dataset

To evaluate the performance of iIMPACT on different tissue types, we studied another SRT dataset from a human prostate cancer study, which includes 4371 spots and 17,651 genes. The gene expression was measured on a section of invasive human prostate carcinoma via the 10x Visium platform. We applied HD-Staining to the histology image of this tissue (Fig. 3A); a total of 259,257 cells were segmented and classified into six categories: macrophage, karyorrhexis, tumor cell, lymphocyte, red blood cell, and stromal cell (detailed information in Additional file 1: Fig. S6).

figure 3

Human prostate cancer dataset: A H&E-stained image of the tissue section with spot-level manual annotation from pathologists. B Spatial domains detected by iIMPACT, SpaGCN, BayesSpace, BASS, stLearn, and MUSE, setting the number of clusters to be five. C Estimates (posterior means and credible intervals) of domain-specific relative abundance of cell types for the six cell types observed in the AI-reconstructed histology image. D Interactive zones (black asterisk spots) defined by iIMPACT. E Identified interactive zones (black asterisk spots) and other boundary areas of domain 2 and domain 3, and boxplots of gene expression richness for spots in the interactive zone and other boundaries. F Gene enrichment analysis between genes detected by iIMPACT, SpaGCN, SpatialDE, and SPARK, and the known prostate cancer genes from the COSMIC database. G Spatial expression patterns of two example spaDEGs, EIF3E and TBL1XR1 , that were only detected by iIMPACT

iIMPACT identified spatial domains that align more closely with the manual annotation than those of other methods. As shown in Fig. 3B, with the number of spatial domains \(K\) set to \(5\), as suggested by the integrated completed likelihood (ICL) plot (Additional file 1: Fig. S7), all six methods identified the domain (marked in green) with a high proportion of tumor cells, as confirmed by comparison with the spatial distribution of tumor cells (Additional file 1: Fig. S6C). Interestingly, iIMPACT distinguished histology-based spatial domains with different red blood cell proportions (Fig. 3B, yellow region vs. red region). Considering three morphologically distinct spatial domains (tumor, stroma and partially atrophic changes, and stroma; Additional file 1: Fig. S8), we confirmed that iIMPACT outperformed other methods in spatial domain identification, achieving the highest consistency with the manual annotation (ARI = 0.659). Additional file 1: Fig. S9 displays the spatial domains identified with the number of domains \(K\) ranging from 2 to 8, and Additional file 1: Fig. S4 shows the corresponding ARI comparisons among methods.

To demonstrate the interpretability of iIMPACT, we characterized the domain-specific relative abundance of cell types in Fig.  3 C. We observed that domain 1 has a higher proportion of tumor cells than other domains, indicating that it is probably the tumor domain. Comparing domain 2 with domain 3, we observed that they have a similar proportion of tumor cells, but domain 2 has a higher proportion of immune cells (i.e., lymphocyte and macrophage), implying the heterogeneity of immune composition within tumors.

In addition, iIMPACT can define interactive zones (Fig. 3D). By examining the interactive zones between domains 2 and 3 and calculating gene expression richness, we observed a clear trend distinguishing the interactive zones from the surrounding boundaries, indicating the unique characteristics of interactive zones (Fig. 3E). We further found that the gene DNAJC5 [ 39 ] was more highly expressed in the identified interactive zones, implying its potential relationship with the intermediate areas of immune cell distribution.

We also compared iIMPACT, SpaGCN [ 19 ], SpatialDE [ 29 ], and SPARK [ 30 ] in detecting biologically meaningful genes in this prostate cancer dataset. For tumor-domain (domain 1) specific spaDEGs, iIMPACT outperformed SpaGCN, SpatialDE, and SPARK in detecting known prostate cancer genes from the COSMIC database (Fig. 3F and Additional file 1: Table S2), illustrating that iIMPACT can detect biologically relevant spaDEGs. These iIMPACT-defined spaDEGs in the tumor domain have experimental evidence supporting their functional relevance to prostate cancer development. For example, as shown in Fig. 3G, they include EIF3E, which is associated with increased cell cycle progression and motility in prostate cancer [ 40 ], and TBL1XR1, which plays an oncogenic role in prostate cancer cell proliferation [ 41 ]. Based on Moran's I (Additional file 1: Fig. S5), genes detected by iIMPACT show strong spatial correlation, similar to those detected by SpaGCN and higher than those detected by SpatialDE and SPARK.

Application to human ovarian cancer dataset

The third NGS-based SRT dataset is from a section of human ovarian tumor tissue and includes 3455 spots and 17,651 genes. Gene expression was measured on a section of serous papillary carcinoma from a human ovary via the 10x Visium platform; the H&E-stained image is shown in Fig.  4 A. The HD-Staining model segmented and classified 211,746 cells into six categories: macrophage, karyorrhexis, tumor cell, lymphocyte, red blood cell, and stromal cell (detailed information in Additional file 1: Fig. S10). By utilizing the cell type abundance information from the histology image, iIMPACT achieved better spatial domain identification. Setting the number of spatial domains to 5, as suggested by the ICL plot (Additional file 1: Fig. S7), iIMPACT identified a domain (Fig.  4 B, marked in green) with a high proportion of tumor cells, which is highly consistent with the tumor region annotated by the pathologist (Additional file 1: Fig. S11) and with the region containing many tumor cells (Additional file 1: Fig. S10C). Comparing the clustering results of six methods (iIMPACT, SpaGCN, BayesSpace, BASS, stLearn, and MUSE) with the annotated tumor and benign domains for this SRT dataset, we observed remarkable concordance between the clustering results of iIMPACT and the pathologist's annotations (ARI = 0.967, see Additional file 1: Fig. S11). Additional file 1: Fig. S12 shows the spatial domains identified as the number of domains \(K\) varies from 2 to 8.

Fig. 4 Human ovarian cancer dataset: A H&E-stained image of the tissue section with spot-level manual annotation from pathologists. B Spatial domains detected by iIMPACT, SpaGCN, BayesSpace, BASS, stLearn, and MUSE, with the number of clusters set to five. C Estimates (posterior means and credible intervals) of domain-specific relative abundance of cell types for the six cell types observed in the AI-reconstructed histology image. D Interactive zones (black asterisk spots) defined by iIMPACT. E Identified interactive zones (black asterisk spots) and other boundary areas of the tumor domain and its adjacent domain 5, and boxplots of gene expression richness for spots in the interactive zone and other boundaries. F Gene enrichment analysis between genes detected by iIMPACT, SpaGCN, SpatialDE, and SPARK, and the known ovarian cancer genes from the COSMIC database. G Spatial expression patterns of two example spaDEGs, BCL6 and CHD4, that were detected only by iIMPACT

iIMPACT could also distinguish domains with different red blood cell proportions. Figure  4 C shows the estimated relative abundance of cell types for the five histology-based spatial domains. Domain 1 has a higher proportion of tumor cells than the other domains, indicating that it is likely the tumor domain. We further examined the interactive zones (Fig.  4 D) and compared the interactive zone between domains 1 and 5 with other boundary spots (Fig.  4 E). A significant difference in gene expression richness between the boundary spots and the interactive zone was observed. Furthermore, genes TTLL5 [ 42 ] and CLEC12A [ 43 ] are more highly expressed in the interactive zone between domains 1 and 5, suggesting a potential relationship with tumor-immune interaction.

We further detected spaDEGs using iIMPACT and then compared the tumor-region spaDEGs against the known ovarian cancer gene set defined by the COSMIC database. The ovarian cancer spaDEGs defined by iIMPACT showed a higher overlap with the known ovarian cancer gene set than those of SpaGCN, SpatialDE, and SPARK (Fig.  4 F and Additional file 1: Table S2). Moreover, we examined the ovarian cancer spaDEGs detected only by iIMPACT and found that many of them have compelling experimental evidence of functional relevance to ovarian cancer. For example, our list included BCL6, which displays pro-oncogenic activity in ovarian cancer [ 44 ], and CHD4, which is associated with cisplatin-mediated apoptosis in ovarian cancer cells [ 45 ] (Fig.  4 G). Additional file 1: Fig. S5 illustrates that the top 1000 spaDEGs identified by iIMPACT exhibit greater average Moran's I values than those identified by other methods, indicating stronger spatial correlation in their expression patterns.
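Moran's I, the statistic used above to compare the spatial correlation of detected genes, can be computed from a gene's spot-level expression and a spatial weight matrix. The following is a minimal sketch, assuming binary neighbor weights (such as the adjacency matrix \(\boldsymbol{G}\) defined in the Methods); the exact weighting and normalization used for Additional file 1: Fig. S5 are not specified here, and `morans_i` is a hypothetical helper name.

```python
import numpy as np


def morans_i(x, W):
    """Global Moran's I of spot-level values x under spatial weights W.

    I = (N / sum_ij w_ij) * sum_ij w_ij (x_i - xbar)(x_j - xbar) / sum_i (x_i - xbar)^2
    """
    x = np.asarray(x, dtype=float)
    W = np.asarray(W, dtype=float)
    z = x - x.mean()              # deviations from the mean expression
    num = z @ W @ z               # sum_ij w_ij z_i z_j
    den = z @ z                   # sum_i z_i^2
    return (x.size * num) / (W.sum() * den)


# Toy chain of six spots; consecutive spots are neighbours (binary weights).
W = np.zeros((6, 6))
for i in range(5):
    W[i, i + 1] = W[i + 1, i] = 1.0

smooth = np.array([0, 0, 0, 1, 1, 1])        # spatially clustered pattern
alternating = np.array([0, 1, 0, 1, 0, 1])   # neighbours always differ
print(morans_i(smooth, W))       # 0.6
print(morans_i(alternating, W))  # -1.0
```

Positive values indicate that similar expression levels cluster in space, and negative values indicate that neighbors tend to differ.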

Application to mouse visual cortex STARmap dataset

To demonstrate that iIMPACT is also able to analyze data from imaging-based SRT platforms, we applied iIMPACT to a STARmap dataset [ 11 ]. This dataset was generated from mouse visual cortex, including the hippocampus, corpus callosum, and the neocortical layers. In total, 1020 genes were measured in 1207 cells with 15 cell types. The layer structure and cell type distribution of the tissue section provided by the original study are displayed in Fig.  5 A.

Fig. 5 Mouse visual cortex STARmap data: A Layer structure of the tissue section from the original study, and spatial domains detected by iIMPACT, SpaGCN, BayesSpace, BASS, and stLearn, with the number of clusters set to seven (the number of layers). The square lattice grid manually added when fitting iIMPACT is shown as dashed lines. B Interactive zones (black asterisk spots) defined by iIMPACT. C Gene enrichment analysis between genes detected by iIMPACT, SpaGCN, SpatialDE, and SPARK, and genes functionally relevant to the visual cortex for five layers. D Spatial expression patterns and barplots of the proportion of non-zero expression for two example spaDEGs, Deptor and Vamp3, that were detected only by iIMPACT

As shown in Fig.  5 A, iIMPACT produced the second most accurate clustering relative to the known layer structure (ARI = 0.592). BASS, which is designed for single-cell-resolution SRT data, achieved the best performance (ARI = 0.666). We also noticed that implementing iIMPACT at a lower resolution (grids in Fig.  5 A) might reduce the influence of noise, making the clustering result more robust. We also used iIMPACT to identify interactive zones (Fig.  5 B). Most of the identified interactive zones lay on boundaries between two adjacent layers.

We found that these iIMPACT-defined spaDEGs are frequently functionally relevant to the visual cortex (Fig.  5 C and Additional file 1: Table S2). For example, we observed Deptor, which is highly expressed and functions in a significant portion of corticostriatal and callosal neurons located in the middle and superficial portions of layer 5 (L5) [ 46 ], and Vamp1, which is ubiquitously expressed and functions in layer III pyramidal neurons in higher-order areas [ 47 ] (Fig.  5 D). Both genes were detected only by iIMPACT.

In this paper, we presented iIMPACT, a two-stage statistical method that integrates histology images and molecular profiles. The first stage is a Bayesian finite normal-multinomial mixture model for identifying histology-based spatial domains. Many spatial domain identification methods require a dimensionality reduction step on the molecular profile, which obscures the direct interpretation of the identified spatial domains. In contrast, iIMPACT fully leverages cellular-level information from histology images to improve clustering performance and interpretability. The cell type abundance data derived from HD-Staining capture comprehensive morphological details, such as cell growth patterns, cell–cell interactions, and interactions between cells and the surrounding microenvironment, thereby improving performance. In addition, the latent domain-specific relative abundance of cell types parameterized by iIMPACT offers a straightforward and user-friendly way to define and characterize the identified spatial domains. The second stage is a negative binomial (NB) regression model for detecting domain-specific spaDEGs. In both the simulation study (details in Additional file 1: Section S2) and the real data analyses, iIMPACT identified spatial domains more accurately than published state-of-the-art methods, owing to its integration of histopathology images. In addition, iIMPACT can analyze data from both NGS-based and imaging-based SRT techniques and therefore has broad applicability in the SRT field. Furthermore, iIMPACT offers good biological interpretability for histology-based spatial domains. For example, the inferred domain-specific cell-type compositions are consistent with curated annotations, and the interactive zones highlight areas whose cell-type composition and gene expression are highly heterogeneous compared with their surroundings. 
Compared with other SVG detection methods, iIMPACT-defined spaDEGs are more enriched in known functional genes, indicating that iIMPACT can provide a better understanding of both the cellular spatial organization and the functional gene landscape of developmental and diseased tissues. Finally, we confirmed that iIMPACT is computationally efficient compared with other methods (Additional file 1: Table S3).

In real data applications, we assessed the performance of spatial domain identification by measuring the consistency between the identified domains and the manual annotation provided by pathologists. While we recognize that these manual annotations might not perfectly reflect the true segmentation of domains integrating both morphological and molecular information, using them as a benchmark remains a standard and widely accepted practice in spatial domain identification, following precedent in foundational work [ 17 , 18 , 19 ]. Moreover, iIMPACT, like other spatial domain identification methods that rely primarily on molecular profiles, has limited capability to characterize regions that are histologically distinct but have similar or low-quality gene expression. For instance, in the human breast cancer dataset, none of the methods effectively distinguished the fat region (in blue) from the fibrous tissue (in red) in the manual annotation. The small number of cells in fat tissue limits the amount of gene expression measured, leading to low-quality molecular profiles and, consequently, unsatisfactory identification of those domains.

iIMPACT, BASS, and BayesSpace all use a Bayesian mixture model with a Markov random field model to identify smooth spatial domains from the SRT molecular profile. Unlike BASS and BayesSpace, which omit complementary information from the paired histology image, iIMPACT integrates cell type abundance derived from the image as an additional component. This integration substantially improves the accuracy of spatial domain identification (comparison of ARIs in Additional file 1: Table S4) and enables biological interpretation of these domains. Moreover, iIMPACT lets both the image and molecular profiles, specifically cell type abundance and gene expression levels, contribute to spatial clustering, with adjustable weighting to optimize results. In contrast, BASS models cell type composition as a hidden layer within its Bayesian hierarchical model, assuming a direct probabilistic link between gene expression features and latent cell types. Notably, BASS specializes in analyzing imaging-based SRT data, which typically achieve single-cell resolution, and supports multi-sample clustering, whereas iIMPACT performs spatial domain identification at spot resolution, making it more suitable for analyzing NGS-based SRT data.

Nuclei identification methods for histology image analysis exhibit several limitations that hinder their widespread applicability and accuracy. One primary challenge is their generalizability. Most deep-learning-based algorithms require model training on high-quality labeled data, making them less adaptable to varied datasets and potentially limiting their generalizability across different tissue types and staining techniques. Besides, the performance of these nuclei identification methods may decrease when handling overlapping nuclei, where segmentation becomes intricate due to the lack of clear boundaries. To address the limitations of existing nuclei identification methods and enhance the versatility of iIMPACT, we proposed an alternative approach for the data preparation outlined in Additional file 1: Fig. S13. When implementing iIMPACT on tissue sections where precise nuclei classification proves challenging using nuclei identification methods, we leveraged the outputs from deep-learning-based or statistical nuclei segmentation methods to derive the nuclei localization, enabling us to determine the number of nuclei in each spot. Many methods [ 48 , 49 , 50 ] exist for isolating cell nuclei across various tissue types without relying on manual labeling data for training. Subsequently, we recommend using reference-free cell-type deconvolution methods [ 51 ] to generate the cell type abundance table. This data preparation pipeline was applied to additional SRT data, the LIBD human dorsolateral prefrontal cortex (DLPFC) data generated via 10x Visium [ 52 ]. Notably, iIMPACT demonstrated superior performance in spatial domain identification under this alternative approach to generate the image profile (Additional file 1: Fig. S14). Details are introduced in Additional file 1: Section S3. 
We also validated this data preparation pipeline on the three cancer datasets: iIMPACT performed better than when using only the molecular profile on the human breast cancer dataset and performed similarly on the human prostate and ovarian cancer datasets, as shown in Additional file 1: Fig. S15–S17 and Table S5. To leverage the histology image more fully and enhance the interpretability of the identified domains, we suggest obtaining the image profile, i.e., the cell type abundance table, by conducting nuclei identification with HD-Staining for cancer tissues. Although HD-Staining was originally designed for lung cancer, it proved effective for breast, prostate, and ovarian cancers in our study, indicating broader utility. The alternative data preparation pipeline should be reserved for cases where HD-Staining is less effective, such as non-cancerous tissue sections.

There are several important future extensions for iIMPACT. First, improvement of nuclei segmentation and classification methods might further improve the performance of iIMPACT and therefore will be our focus in the near future. Second, the number of histology-based spatial domains has to be pre-specified when implementing the current version of iIMPACT. To automatically estimate the number of spatial domains, we plan to replace the proposed Bayesian finite mixture model with a Bayesian nonparametric model, such as the Dirichlet process mixture model [ 53 ] or a mixture of finite mixture model [ 54 , 55 ]. Third, iIMPACT's performance in spatial domain identification was less satisfactory when dependent solely on histological image profiles, as shown in Additional file 1: Fig. S15–S17. This may be due to the extensive cell-type heterogeneity within domains, exemplified in Additional file 1: Fig. S18 for the human breast cancer dataset. Thus, integrating molecular information is crucial for effective spatial clustering. However, further investigation into better utilization of image profiles is also warranted. For instance, cell–cell interaction information can be incorporated into iIMPACT to improve the accuracy of histology-based spatial domain identification and increase the model interpretability. These future directions could potentially further boost the performance and interpretability of iIMPACT.

Conclusions

In conclusion, we have introduced iIMPACT, a two-stage method that integrates histology images and spatial transcriptomics data to identify histology-based spatial domains and detect spatial domain-specific differentially expressed genes. Compared with existing methods, iIMPACT improves spatial domain identification accuracy and biological interpretability by leveraging cellular-level information from AI-reconstructed histology images, and it identifies spaDEGs enriched in known functional genes, making it a powerful tool for spatial transcriptomics analysis.

In this section, we first define the molecular and geospatial profiles from NGS-based SRT data (e.g., from the Spatial Transcriptomics (ST) platform and its improved successor, 10x Visium) and the image profile from the matching AI-reconstructed histology image. We then describe how to construct the corresponding profiles from imaging-based SRT data (e.g., STARmap). Finally, we detail the statistical models used in the two stages of iIMPACT. Additional file 1: Table S6 summarizes all key notations introduced in this section.

Data preparation

Molecular profile Y

In general, the spot-level molecular profile of NGS-based SRT data can be represented by an \(N\times P\) count table \(\boldsymbol{C}\), where each entry \(c_{ij}\in\mathbb{N}\), \(i=1,\dots,N\), \(j=1,\dots,P\), is the read count for gene \(j\) measured at spot \(i\). To account for nuisance effects across spots, including sequencing depth, amplification and dilution efficiency, and reverse transcription efficiency, we normalize each read count \(c_{ij}\) to its relative level \(\widetilde{c}_{ij}=c_{ij}/s_i\), where \(s_i=\sum_{j=1}^{P}c_{ij}\) is the total count across all genes at spot \(i\); other normalization methods are also acceptable. The relative expression levels \(\widetilde{c}_{ij}\) are then log-transformed so that they approximately follow a normal distribution. Following the preprocessing steps in BayesSpace [ 17 ], we select the 2000 most highly variable genes in terms of relative expression and apply principal component analysis (PCA), or another dimension reduction technique (e.g., t-SNE [ 56 ] or UMAP [ 57 ]), to obtain a low-dimensional representation of normalized gene expression, denoted by an \(N\times P'\) matrix \(\boldsymbol{Y}\), where each entry \(y_{ij}\in\mathbb{R}\), \(i=1,\dots,N\), \(j=1,\dots,P'\), is the value of the \(j\)-th top principal component (PC) at spot \(i\). We model the PCs in \(\boldsymbol{Y}\) rather than the raw count table \(\boldsymbol{C}\) to avoid complex finite mixture models with feature selection based on cumbersome multivariate distributions. We recommend modeling the top three PCs (\(P'=3\)) for simplicity; a sensitivity analysis on the human breast cancer data (see Additional file 1: Fig. S19) shows that larger \(P'\) provides only marginal improvements in clustering performance.
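The preprocessing steps above (library-size normalization, log transform, highly variable gene selection, and PCA) can be sketched in NumPy as follows. This is a minimal illustration, not the BayesSpace or iIMPACT implementation: the scale factor of \(10^4\), the pseudocount of 1 in the log transform, and ranking genes by the variance of log-normalized values are assumptions, and `molecular_profile` is a hypothetical helper name.

```python
import numpy as np


def molecular_profile(C, n_hvg=2000, n_pcs=3, scale=1e4):
    """N x P count table C -> N x P' molecular profile Y (top PC scores).

    Assumed details (not fixed by the text): relative levels are multiplied
    by a hypothetical scale factor and log1p-transformed; highly variable
    genes are ranked by the variance of the log values; PCA uses an SVD.
    """
    C = np.asarray(C, dtype=float)
    s = C.sum(axis=1, keepdims=True)                # s_i: total count at spot i
    rel = C / s                                     # relative level c_ij / s_i
    X = np.log1p(scale * rel)                       # log transform
    n_keep = min(n_hvg, X.shape[1])
    hvg = np.argsort(X.var(axis=0))[::-1][:n_keep]  # most variable genes
    Xc = X[:, hvg] - X[:, hvg].mean(axis=0)         # centre each selected gene
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_pcs] * S[:n_pcs]                 # PC scores y_ij


rng = np.random.default_rng(0)
C = rng.poisson(5.0, size=(50, 300))   # toy data: 50 spots x 300 genes
Y = molecular_profile(C)
print(Y.shape)  # (50, 3)
```

Because the columns of `Y` are PC scores of centered data, they have mean zero and decreasing variance, which is what the downstream mixture model receives.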

Image profile V

To integrate the image profile into iIMPACT, we applied a nuclei segmentation and identification algorithm, the histology-based digital (HD)-Staining model [ 23 ], to extract cellular features from images. The HD-Staining model is a trained deep-learning model built on the Mask Region-based Convolutional Neural Network (Mask R-CNN) architecture [ 58 ] that segments the nuclei of different cell types in the tumor morphological microenvironment, such as immune, tumor, and stromal cells. The model was first trained on histology images from lung adenocarcinoma patients in the National Lung Screening Trial, in which the nuclei of six cell types were manually labeled by pathologists. Although originally trained on lung cancer data, the model has since been refined and validated on histology images of other cancer types, such as breast cancer, head and neck cancer, ovarian cancer, prostate cancer, and other carcinomas.

The HD-Staining model takes a batch of high-resolution histology image patches of a tissue section as input and simultaneously segments and classifies the cell nuclei in each patch. It outputs the locations and types of all identified nuclei in the whole histology image. Because spots cover less than half of the tissue area (e.g., spots on the 10x Visium platform cover about 38% of the whole domain), we count the cells of each type within each spot and its expanded area (see Additional file 1: Fig. S20) so that the image information matches the molecular information measured at spots while all cellular information is still utilized. The result is summarized in an \(N\times Q\) count matrix \(\boldsymbol{V}\), the cell abundance table, where each entry \(v_{iq}\in\mathbb{N}\), \(i=1,\dots,N\), \(q=1,\dots,Q\), is the number of cells of type \(q\) observed at spot \(i\) and its expanded area. iIMPACT leverages this single-cell-level histology information from the image profile to enhance spatial domain identification.
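Given nucleus locations and types (as produced by a segmentation and classification model such as HD-Staining), the cell abundance table \(\boldsymbol{V}\) is a per-spot tally. The sketch below is illustrative only: the exact definition of the expanded area is given in Additional file 1: Fig. S20 and is not reproduced here, so we assume a hypothetical rule in which each nucleus is assigned to its nearest spot center if that center lies within a chosen `radius`. The function name and the `radius` parameter are hypothetical.

```python
import numpy as np


def cell_abundance_table(spot_xy, cell_xy, cell_type, Q, radius):
    """Count nuclei of each type around each spot, giving an N x Q table V.

    Hypothetical assignment rule: each nucleus goes to its nearest spot
    centre and is kept only if that centre is within `radius` pixels.
    """
    spot_xy = np.asarray(spot_xy, dtype=float)
    cell_xy = np.asarray(cell_xy, dtype=float)
    cell_type = np.asarray(cell_type)
    # squared distance from every nucleus to every spot centre
    d2 = ((cell_xy[:, None, :] - spot_xy[None, :, :]) ** 2).sum(axis=-1)
    nearest = d2.argmin(axis=1)
    keep = d2[np.arange(len(cell_xy)), nearest] <= radius ** 2
    V = np.zeros((len(spot_xy), Q), dtype=int)
    np.add.at(V, (nearest[keep], cell_type[keep]), 1)   # v_iq += 1
    return V


spots = [[0, 0], [10, 0]]
cells = [[1, 0], [2, 1], [9, 0], [30, 30]]   # the last nucleus is out of range
types = [0, 1, 1, 2]                         # cell-type codes 0..Q-1
V = cell_abundance_table(spots, cells, types, Q=3, radius=5)
print(V.tolist())  # [[1, 1, 0], [0, 1, 0]]
```

Assigning each nucleus to at most one spot avoids double counting when expanded areas of neighboring spots overlap.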

Geospatial profile G

Spots are the round areas of barcoded mRNA capture probes where gene expression is measured. We denote the SRT geospatial profile by an \(N\times 2\) matrix \(\boldsymbol{T}\), where each row \(\boldsymbol{t}_i=(t_{i1},t_{i2})\) gives the \(x\) and \(y\) coordinates of spot \(i\) on a two-dimensional Cartesian plane. ST and 10x Visium spots are arranged on square and triangular lattice grids, respectively. Thus, a neighborhood structure provides an alternative way to represent the geospatial profile, denoted \(\boldsymbol{G}\). In particular, \(\boldsymbol{G}\) is an \(N\times N\) binary adjacency matrix, where \(g_{ii'}=1\) if spots \(i\) and \(i'\) are neighbors (i.e., the Euclidean distance \(\sqrt{(t_{i1}-t_{i'1})^2+(t_{i2}-t_{i'2})^2}\) between spots \(i\) and \(i'\) is less than a threshold) and \(g_{ii'}=0\) otherwise. Each diagonal entry \(g_{ii}\) equals zero. Each non-boundary spot has four neighbors on the ST platform and six on the 10x Visium platform. With this neighborhood structure \(\boldsymbol{G}\) as the geospatial profile, spatial information can be easily integrated into Bayesian cluster analysis via an appropriate prior.
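The adjacency matrix \(\boldsymbol{G}\) follows directly from the coordinate matrix \(\boldsymbol{T}\) and a distance threshold. A minimal sketch, assuming unit spacing between nearest spot centers (real platforms use pixel or micrometer coordinates, so the threshold would be rescaled); it checks the four-neighbor (square) and six-neighbor (triangular) counts stated above. `adjacency_matrix` is a hypothetical helper name.

```python
import numpy as np


def adjacency_matrix(T, threshold):
    """Binary N x N neighbourhood matrix G from N x 2 spot coordinates T."""
    T = np.asarray(T, dtype=float)
    diff = T[:, None, :] - T[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))   # pairwise Euclidean distances
    G = (dist < threshold).astype(int)
    np.fill_diagonal(G, 0)                      # g_ii = 0
    return G


# 5 x 5 square lattice with unit spacing (ST-like layout).
square = np.array([(x, y) for x in range(5) for y in range(5)], dtype=float)
# 5 x 5 triangular lattice (Visium-like): odd rows shifted by 1/2 and rows
# sqrt(3)/2 apart, so all nearest neighbours are at distance 1.
tri = np.array([(x + 0.5 * (y % 2), y * np.sqrt(3) / 2)
                for y in range(5) for x in range(5)])

G_sq = adjacency_matrix(square, threshold=1.2)
G_tri = adjacency_matrix(tri, threshold=1.2)
print(G_sq[12].sum(), G_tri[12].sum())  # 4 6  (index 12 is an interior spot)
```

The threshold must lie between the nearest-neighbor distance and the next-nearest distance (here \(\sqrt{2}\) on the square lattice and \(\sqrt{3}\) on the triangular one).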

Special handling of imaging-based SRT data

Imaging-based SRT techniques usually have higher spatial resolution than NGS-based SRT techniques and can profile mRNA at the single-cell level. For some imaging-based platforms, the original study provides the spatial distribution and types of cells on the tissue section. To fit iIMPACT to imaging-based SRT data such as STARmap [ 11 ], we manually overlay a square lattice grid of appropriate size on the whole domain and treat each square unit as a spot (see Fig.  5 A). Note that these 'spots' fill the whole domain, so there is no gap between adjacent spots. For the STARmap data in the Results section, the grid size was chosen to be \(750\times 750\) pixels, resulting in \(N=170\) spots. Each non-boundary spot has four neighboring spots, and we define \(\boldsymbol{G}\) with each entry \(g_{ii'}=1\) if spots \(i\) and \(i'\) are neighbors and \(g_{ii'}=0\) otherwise. To construct the molecular profile \(\boldsymbol{Y}\), we first normalize, transform, and reduce the dimension of the gene expression counts at the single-cell level, and then average the resulting values across all cells within each spot. To obtain the "image" profile \(\boldsymbol{V}\), we directly count the cells of each type in each spot.
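The grid-based construction described above can be sketched as follows: each cell is binned into a square "spot" of side `size` (750 pixels for the STARmap data), per-cell PC scores are averaged within each spot to form \(\boldsymbol{Y}\), cell types are counted to form \(\boldsymbol{V}\), and spots sharing an edge are marked as neighbors in \(\boldsymbol{G}\). This is an illustrative sketch only; the function name is hypothetical, and dropping grid squares that contain no cells is our assumption rather than a documented step.

```python
import numpy as np


def grid_profiles(cell_xy, cell_pcs, cell_type, Q, size):
    """Bin single cells into square 'spots' of side `size` pixels.

    Returns the grid index of each spot, the molecular profile Y (mean of
    per-cell PC scores), the 'image' profile V (cell-type counts), and the
    4-neighbour adjacency G. Empty grid squares are dropped (an assumption).
    """
    cell_xy = np.asarray(cell_xy, dtype=float)
    cell_pcs = np.asarray(cell_pcs, dtype=float)
    ij = np.floor(cell_xy / size).astype(int)           # grid square of each cell
    spots, inv = np.unique(ij, axis=0, return_inverse=True)
    inv = inv.ravel()
    N = len(spots)
    counts = np.bincount(inv, minlength=N)
    Y = np.zeros((N, cell_pcs.shape[1]))
    np.add.at(Y, inv, cell_pcs)
    Y /= counts[:, None]                                 # average within each spot
    V = np.zeros((N, Q), dtype=int)
    np.add.at(V, (inv, np.asarray(cell_type)), 1)        # cell-type counts
    manhattan = np.abs(spots[:, None, :] - spots[None, :, :]).sum(axis=-1)
    G = (manhattan == 1).astype(int)                     # squares sharing an edge
    return spots, Y, V, G


xy = [[100, 100], [200, 700], [800, 100], [1600, 100]]
pcs = [[1.0, 0.0], [3.0, 2.0], [5.0, 5.0], [7.0, 1.0]]
spots, Y, V, G = grid_profiles(xy, pcs, [0, 1, 1, 0], Q=2, size=750)
print(spots.tolist())  # [[0, 0], [1, 0], [2, 0]]
print(Y.tolist())      # [[2.0, 1.0], [5.0, 5.0], [7.0, 1.0]]
print(V.tolist())      # [[1, 1], [0, 1], [1, 0]]
print(G.tolist())      # [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
```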

Stage I: a Bayesian normal-multinomial mixture model for identifying histology-based spatial domains

The first stage of iIMPACT uses a Bayesian finite mixture model to partition the whole domain into \(K\) mutually exclusive histology-based spatial domains. In general, a finite mixture model [ 59 ] generates random variables from a weighted sum of \(K\) distributions that belong to the same parametric family,

\[ p(\boldsymbol{x}_i)=\sum_{k=1}^{K}\Pr(z_i=k)\,f_k(\boldsymbol{x}_i\mid\boldsymbol{\theta}_k),\quad i=1,\dots,N, \]

where \(\boldsymbol{z}=(z_1,\dots,z_N)^{\text{T}}\) denotes the latent variables assigning each observation \(\boldsymbol{x}_i\) to a mixture component \(f_k\) characterized by \(\boldsymbol{\theta}_k\). In the context of this paper, \(\boldsymbol{x}_i=\{\boldsymbol{y}_i\in\mathbb{R}^{P'},\boldsymbol{v}_i\in\mathbb{N}^{Q}\}\) represents the observed molecular and image profiling data, and \(z_i=k\) indicates that spot \(i\) belongs to histology-based spatial domain \(k\). Since there are two modalities, \(\boldsymbol{Y}\) and \(\boldsymbol{V}\), we decompose the mixture component \(f_k\) into two sub-components, described below. In addition, we incorporate information from the geospatial profile \(\boldsymbol{G}\) into the prior on the latent variable \(\boldsymbol{z}\), encouraging neighboring spots to belong to the same histology-based spatial domain.

Modeling the molecular profile Y

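We use a multivariate normal (MN) sub-component for modeling the low-dimensional gene expression \(\boldsymbol{y}_i\) at spot \(i\):

\[ \boldsymbol{y}_i\mid z_i=k,\boldsymbol{\mu}_k,\boldsymbol{\Sigma}_k\sim\text{MN}(\boldsymbol{\mu}_k,\boldsymbol{\Sigma}_k), \]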

where \(\boldsymbol{\mu}_k=(\mu_{k1},\dots,\mu_{kP'})^{\text{T}}\), \(\mu_{kp}\in\mathbb{R}\), is the domain-specific mean vector and \(\boldsymbol{\Sigma}_k\) is the \(P'\times P'\) domain-specific variance–covariance matrix, which must be positive definite. For computational efficiency, we specify a normal prior for \(\boldsymbol{\mu}_k\) conditional on \(\boldsymbol{\Sigma}_k\) and an inverse-Wishart (IW) prior for \(\boldsymbol{\Sigma}_k\), i.e., \(\boldsymbol{\mu}_k\mid\boldsymbol{\Sigma}_k\sim\text{MN}(\boldsymbol{\nu}_0,\boldsymbol{\Sigma}_k/\tau_0)\) and \(\boldsymbol{\Sigma}_k\sim\text{IW}(\eta_0,\boldsymbol{\Phi}_0)\). This conjugate setting leads to analytically tractable posterior distributions for \(\boldsymbol{\mu}_k\) and \(\boldsymbol{\Sigma}_k\). Here, \(\boldsymbol{\nu}_0\), \(\tau_0\), \(\eta_0\), and \(\boldsymbol{\Phi}_0\) are fixed hyperparameters. We set \(\boldsymbol{\nu}_0\) to the empirical mean vector over all spots and \(\tau_0=0.01\) to provide weak prior information, so that the data dominate the estimation of \(\boldsymbol{\mu}_k\). We set the degrees-of-freedom parameter \(\eta_0=P'+1\), which controls the informative strength, and the scale matrix \(\boldsymbol{\Phi}_0\) to the identity matrix. 
With \(n_k=\sum_{i=1}^{N}\text{I}(z_i=k)\) and \(\overline{\boldsymbol{y}}_k=\frac{1}{n_k}\sum_{i=1}^{N}\text{I}(z_i=k)\boldsymbol{y}_i\), the closed-form posterior distributions are \(\boldsymbol{\mu}_k\mid\boldsymbol{\Sigma}_k,\boldsymbol{Y}\sim\text{MN}(\boldsymbol{\nu}_k,\boldsymbol{\Sigma}_k/\tau_k)\) and \(\boldsymbol{\Sigma}_k\mid\boldsymbol{Y}\sim\text{IW}(\eta_k,\boldsymbol{\Phi}_k)\), where \(\tau_k=\tau_0+n_k\), \(\eta_k=\eta_0+n_k\), \(\boldsymbol{\nu}_k=(\tau_0\boldsymbol{\nu}_0+n_k\overline{\boldsymbol{y}}_k)/(\tau_0+n_k)\), and \(\boldsymbol{\Phi}_k=\boldsymbol{\Phi}_0+\sum_{i=1}^{N}\text{I}(z_i=k)(\boldsymbol{y}_i-\overline{\boldsymbol{y}}_k)(\boldsymbol{y}_i-\overline{\boldsymbol{y}}_k)^{\text{T}}+\frac{n_k\tau_0}{\tau_0+n_k}(\overline{\boldsymbol{y}}_k-\boldsymbol{\nu}_0)(\overline{\boldsymbol{y}}_k-\boldsymbol{\nu}_0)^{\text{T}}\).
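The normal-inverse-Wishart posterior parameters for one domain can be computed directly from the spots currently assigned to it. This is a minimal sketch written with column vectors, so the scatter terms are outer products \((\boldsymbol{y}_i-\overline{\boldsymbol{y}}_k)(\boldsymbol{y}_i-\overline{\boldsymbol{y}}_k)^{\text{T}}\); `niw_posterior` is a hypothetical helper, and returning the prior for an empty domain is our assumption. Sampling \(\boldsymbol{\Sigma}_k\) and \(\boldsymbol{\mu}_k\) from the resulting distributions is left to a statistics library.

```python
import numpy as np


def niw_posterior(Yk, nu0, tau0, eta0, Phi0):
    """Conjugate normal-inverse-Wishart update for one domain k.

    Yk holds the rows y_i (an n_k x P' array) of spots with z_i = k.
    Returns (nu_k, tau_k, eta_k, Phi_k).
    """
    Yk = np.asarray(Yk, dtype=float)
    nu0 = np.asarray(nu0, dtype=float)
    Phi0 = np.asarray(Phi0, dtype=float)
    nk = Yk.shape[0]
    if nk == 0:                                   # empty domain: posterior = prior
        return nu0, tau0, eta0, Phi0
    ybar = Yk.mean(axis=0)
    R = Yk - ybar
    scatter = R.T @ R                             # sum_i (y_i - ybar)(y_i - ybar)^T
    dev = np.outer(ybar - nu0, ybar - nu0)        # (ybar - nu0)(ybar - nu0)^T
    tau_k = tau0 + nk
    eta_k = eta0 + nk
    nu_k = (tau0 * nu0 + nk * ybar) / (tau0 + nk)
    Phi_k = Phi0 + scatter + (nk * tau0 / (tau0 + nk)) * dev
    return nu_k, tau_k, eta_k, Phi_k


# Hand-checkable 1-D case: y = 1, 3 with nu0 = 0, tau0 = 1, eta0 = 2, Phi0 = 1.
# Expected: nu_k = 4/3, tau_k = 3, eta_k = 4, Phi_k = 17/3.
nu_k, tau_k, eta_k, Phi_k = niw_posterior([[1.0], [3.0]], [0.0], 1.0, 2, [[1.0]])

# Default hyperparameters from the text on toy data with P' = 3.
rng = np.random.default_rng(1)
Y = rng.normal(size=(200, 3))
z = rng.integers(0, 2, size=200)
P = Y.shape[1]
post = niw_posterior(Y[z == 1], Y.mean(axis=0), 0.01, P + 1, np.eye(P))
```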

If PCA is used to perform an orthogonal projection of the scaled and normalized SRT molecular profiling data, we can further set all off-diagonal entries of \(\boldsymbol{\Sigma}_k\) to zero, i.e., \(\sigma_{kpp'}=0\) for all \(p\neq p'\). In this case, the multivariate normal model decomposes into a product of \(P'\) independent normal models,

\[ f(\boldsymbol{y}_i\mid z_i=k,\boldsymbol{\mu}_k,\boldsymbol{\sigma}_k^2)=\prod_{p=1}^{P'}\text{N}(y_{ip};\mu_{kp},\sigma_{kp}^2), \]

where \(\boldsymbol{\sigma}_k^2=(\sigma_{k1}^2,\dots,\sigma_{kP'}^2)^{\text{T}}\) contains the diagonal entries of \(\boldsymbol{\Sigma}_k\). The conjugate prior for each dimension becomes a normal-inverse-gamma (IG) distribution [ 60 ], \(\mu_{kp}\mid\sigma_{kp}^2\sim\text{N}(0,\sigma_{kp}^2/\tau_0)\) and \(\sigma_{kp}^2\sim\text{IG}(\eta_0/2,\Phi_0/2)\), resulting in the closed-form posteriors \(\mu_{kp}\mid\sigma_{kp}^2,\boldsymbol{Y}\sim\text{N}(n_k\overline{y}_{kp}/(\tau_0+n_k),\sigma_{kp}^2/\tau_k)\) and \(\sigma_{kp}^2\mid\boldsymbol{Y}\sim\text{IG}(\eta_k/2,\Phi_k/2)\), where \(\overline{y}_{kp}\) is the \(p\)-th entry of \(\overline{\boldsymbol{y}}_k\), \(\tau_k=\tau_0+n_k\), \(\eta_k=\eta_0+n_k\), and \(\Phi_k=\Phi_0+\sum_{i=1}^{N}\text{I}(z_i=k)(y_{ip}-\overline{y}_{kp})^2+\frac{n_k\tau_0}{\tau_0+n_k}\overline{y}_{kp}^2\). One standard way to set a weakly informative IG prior is to choose small values for both parameters, such as \(\eta_0/2=\Phi_0/2=0.1\).
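In the diagonal (PCA) case, each PC dimension gets its own normal-inverse-gamma update. The sketch below uses the standard conjugate result for a prior mean of 0, under which the conditional posterior of \(\mu_{kp}\) is centered at \(n_k\overline{y}_{kp}/(\tau_0+n_k)\), and draws \(\sigma_{kp}^2\) as the reciprocal of a gamma variate (if \(X\sim\text{Gamma}(a,\text{rate}=b)\), then \(1/X\sim\text{IG}(a,b)\)). Function names are hypothetical; the defaults reproduce \(\eta_0/2=\Phi_0/2=0.1\) from the text, while the default \(\tau_0=0.01\) is carried over from the multivariate setting as an assumption.

```python
import numpy as np


def nig_posterior(Yk, tau0=0.01, eta0=0.2, Phi0=0.2):
    """Per-dimension normal-inverse-gamma update (prior mean 0) for domain k.

    Prior: mu_kp | s2 ~ N(0, s2 / tau0), s2 ~ IG(eta0 / 2, Phi0 / 2).
    Yk is n_k x P'. Returns (m_k, tau_k, shape_k, scale_k) per dimension p.
    """
    Yk = np.asarray(Yk, dtype=float)
    nk = Yk.shape[0]
    ybar = Yk.mean(axis=0) if nk > 0 else np.zeros(Yk.shape[1])
    ss = ((Yk - ybar) ** 2).sum(axis=0)            # within-domain sum of squares
    tau_k = tau0 + nk
    m_k = nk * ybar / tau_k                        # posterior mean of mu_kp
    Phi_k = Phi0 + ss + (nk * tau0 / tau_k) * ybar ** 2
    return m_k, tau_k, (eta0 + nk) / 2, Phi_k / 2


def draw_mu_sigma2(m_k, tau_k, shape_k, scale_k, rng):
    """One Gibbs draw: sigma2 ~ IG(shape, scale), then mu | sigma2 ~ N(m, sigma2 / tau)."""
    sigma2 = 1.0 / rng.gamma(shape_k, 1.0 / scale_k)   # reciprocal of Gamma(shape, rate=scale)
    mu = rng.normal(m_k, np.sqrt(sigma2 / tau_k))
    return mu, sigma2


# Hand-checkable case: y = 1, 3, tau0 = 1, eta0 = 2, Phi0 = 1.
# Expected: m = 4/3, tau = 3, shape = 2, scale = 17/6.
m, tau, a, b = nig_posterior([[1.0], [3.0]], tau0=1.0, eta0=2.0, Phi0=1.0)
mu, s2 = draw_mu_sigma2(m, tau, a, b, np.random.default_rng(0))
```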

Modeling the image profile V

We use a multinomial sub-component for modeling the numbers of cells of different types \(\boldsymbol{v}_i\) within spot \(i\) and its expanded area:

\[ \boldsymbol{v}_i\mid z_i=k,\boldsymbol{\omega}_k\sim\text{Multi}(m_i,\boldsymbol{\omega}_k), \]

where \(m_i=\sum_{q=1}^{Q}v_{iq}\) is the total number of cells observed within the area and \(\boldsymbol{\omega}_k=(\omega_{k1},\dots,\omega_{kQ})^{\text{T}}\) lies on a \(Q\)-dimensional simplex (i.e., \(\omega_{kq}>0\) for all \(q\) and \(\sum_{q=1}^{Q}\omega_{kq}=1\)), representing the underlying relative abundance of cell types in histology-based spatial domain \(k\). Notably, \(\boldsymbol{\omega}_1,\dots,\boldsymbol{\omega}_K\) are the parameters of key interest in iIMPACT, because they can be used to interpret or even define the identified histology-based spatial domains. For example, if a histology-based spatial domain is heavily dominated by cell type \(q\), i.e., \(\omega_{kq}\gg\omega_{kq'}\) for all \(q'\neq q\), it could be named after cell type \(q\). Note that cell type abundance is assumed to be homogeneous within each histology-based spatial domain. For computational efficiency, we specify a Dirichlet prior for \(\boldsymbol{\omega}_k\), i.e., \(\boldsymbol{\omega}_k\sim\text{Dir}(\boldsymbol{\alpha}_0)\), where \(\boldsymbol{\alpha}_0=(\alpha_{01},\dots,\alpha_{0Q})^{\text{T}}\), \(\alpha_{0q}\in\mathbb{R}^{+}\), are fixed hyperparameters. This conjugate setting leads to an analytically tractable posterior distribution \(\boldsymbol{\omega}_k\mid\boldsymbol{V}\sim\text{Dir}(\boldsymbol{\alpha}_k)\) with entries \(\alpha_{kq}=\alpha_{0q}+\sum_{i=1}^{N}\text{I}(z_i=k)v_{iq}\). We recommend \(\alpha_{01}=\cdots=\alpha_{0Q}=1/2\) or \(1\) for a non-informative or weakly informative setting.
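The Dirichlet update simply adds the cell-type counts of all spots in domain \(k\) to the prior parameters. A minimal sketch (domains are indexed from 0 in the code rather than from 1 as in the text; `dirichlet_posterior` is a hypothetical helper name):

```python
import numpy as np


def dirichlet_posterior(V, z, K, alpha0=1.0):
    """K x Q matrix of posterior Dirichlet parameters alpha_kq."""
    V = np.asarray(V)
    z = np.asarray(z)
    alpha = np.full((K, V.shape[1]), float(alpha0))
    np.add.at(alpha, z, V)            # alpha_kq = alpha_0q + sum_{i: z_i = k} v_iq
    return alpha


V = np.array([[8, 1, 1],
              [6, 2, 2],
              [0, 5, 5]])             # toy counts: 3 spots x 3 cell types
z = np.array([0, 0, 1])               # domain labels of the three spots
alpha = dirichlet_posterior(V, z, K=2, alpha0=1.0)
print(alpha.tolist())                 # [[15.0, 4.0, 4.0], [1.0, 6.0, 6.0]]

omega_mean = alpha / alpha.sum(axis=1, keepdims=True)       # posterior mean of omega_k
omega_draw = np.random.default_rng(0).dirichlet(alpha[0])   # one Gibbs draw for domain 0
```

The posterior mean \(\alpha_{kq}/\sum_{q'}\alpha_{kq'}\) is a convenient point estimate of the domain-specific relative abundance, of the kind summarized with credible intervals in the Results.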

Incorporating the geospatial profile G

To utilize the spatial information in the geospatial profile, we employ a Markov random field prior [ 33 , 34 ] on the histology-based spatial domain indicator \(\boldsymbol{z}\), encouraging neighboring spots to be clustered into the same histology-based spatial domain:

\[ \pi(z_i=k\mid\boldsymbol{z}_{-i})\propto\exp\Big(d_k+f\sum_{i'=1}^{N}g_{ii'}\,\text{I}(z_{i'}=k)\Big), \]

where \(\boldsymbol{z}_{-i}\) denotes all entries of \(\boldsymbol{z}\) except the \(i\)-th, the hyperparameters \(\boldsymbol{d}=(d_1,\dots,d_K)^{\text{T}}\) control the number of spots belonging to each of the \(K\) histology-based spatial domains, and \(f\in\mathbb{R}^{+}\) controls the spatial dependency, or smoothness. Note that if a spot has no neighbors, the above prior reduces to a multinomial distribution, \(z_i\sim\text{Multi}\big(1,\exp(\boldsymbol{d})/\sum_{k=1}^{K}\exp(d_k)\big)\). Although a larger \(f\) yields smoother spatial domains, \(f\) must be chosen carefully, because a large value of \(f\) may lead to a phase transition problem (i.e., all spots are assigned to the same histology-based spatial domain). In this paper, we set \(d_1=\cdots=d_K=1\) and \(f=1\) by default, as this setting performs very well in the simulation study and yields reasonable results in our real data analysis.
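The Markov random field prior can be sketched as a Potts-type conditional in which each domain \(k\) receives the score \(d_k+f\times\) (number of neighbors of spot \(i\) currently in domain \(k\)), normalized over \(k\). We assume this form because it matches the description above, including the reduction to \(\exp(\boldsymbol{d})/\sum_k\exp(d_k)\) for a spot with no neighbors; `mrf_prior_probs` is a hypothetical helper, and domains are indexed from 0 in the code.

```python
import numpy as np


def mrf_prior_probs(i, z, G, d, f):
    """pi(z_i = k | z_{-i}) for all k under the assumed Potts-type prior."""
    d = np.asarray(d, dtype=float)
    nbr = np.flatnonzero(G[i])                             # neighbours of spot i
    counts = np.bincount(np.asarray(z)[nbr], minlength=len(d))
    logits = d + f * counts                                # d_k + f * #neighbours in k
    p = np.exp(logits - logits.max())                      # stabilised exponentials
    return p / p.sum()


# Spot 0 has neighbours 1, 2, 3 (labels 0, 0, 1); spot 4 has no neighbours.
G = np.zeros((5, 5), dtype=int)
for j in (1, 2, 3):
    G[0, j] = G[j, 0] = 1
z = np.array([0, 0, 0, 1, 1])
print(mrf_prior_probs(0, z, G, d=[1, 1], f=1.0))  # e/(e+1) and 1/(e+1)
print(mrf_prior_probs(4, z, G, d=[1, 1], f=1.0))  # [0.5 0.5]
```

Increasing `f` sharpens the preference for the majority label among neighbors, which is why very large values can push every spot into the same domain.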

Posterior sampling via MCMC algorithm

iIMPACT integrates the molecular, image, and geospatial profiles to partition the whole domain into \(K\) biologically meaningful spatial domains. Because the low-dimensional molecular profile \(\boldsymbol{Y}\) and the AI-reconstructed image profile \(\boldsymbol{V}\) are generated from different sources, we treat them as conditionally independent of each other given the domain indicators. Thus, we define the mixture component as

where the tuning parameter \(w\in[0,1]\) controls the image profile’s contribution to the clustering process relative to that of the molecular profile. Decreasing \(w\) flattens the multinomial term of the data likelihood above, thus downplaying the role of the image profile; when \(w=0\), iIMPACT does not use any cell type abundance information. We conducted a sensitivity analysis to choose \(w\), and the results suggest setting \(w=0.05\) and \(0.5\) for 10x Visium and STARmap data, respectively (see Additional file 1: Fig. S21). Beyond the SRT platform and application, the dimensionalities of the image and molecular profiles (i.e., \(Q\) and \(P'\)) should also be taken into account, with some caution, when determining \(w\). Finally, the full posterior distribution is given by
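The mixture-component equation is not reproduced in this text. The sketch below assumes, hypothetically, that the molecular profile \(\boldsymbol{y}_i\) (length \(P'\)) is modeled as multivariate normal \(N(\boldsymbol{\mu}_k,\boldsymbol{\Sigma}_k)\) and that the multinomial likelihood of the cell counts \(\boldsymbol{v}_i\) is raised to the power \(w\), which is consistent with \(w\) flattening the multinomial term. It computes the log of one component on the log scale for numerical stability.

```python
import numpy as np
from math import lgamma

def log_mvn(y, mu, Sigma):
    """Log density of a multivariate normal N(mu, Sigma) evaluated at y."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    P = y.shape[0]
    L = np.linalg.cholesky(Sigma)
    sol = np.linalg.solve(L, y - mu)          # L^{-1} (y - mu)
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (P * np.log(2 * np.pi) + logdet + sol @ sol)

def log_multinomial(v, omega):
    """Log pmf of Multinomial(m, omega) at counts v, with m = sum(v)."""
    v = np.asarray(v)
    m = int(v.sum())
    log_coef = lgamma(m + 1) - sum(lgamma(int(c) + 1) for c in v)
    return log_coef + float(np.sum(v * np.log(omega)))

def log_component(y, v, mu, Sigma, omega, w):
    """ASSUMED form: log[ N(y | mu, Sigma) * Multi(v | m, omega)^w ]."""
    return log_mvn(y, mu, Sigma) + w * log_multinomial(v, omega)
```

Setting `w=0.0` removes the image term entirely, matching the statement that iIMPACT then ignores cell type abundance.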

The posterior distribution of \(z_i\), which is of direct interest for identifying histology-based spatial domains, is given by

The unnormalized probabilities for all \(K\) possible values of \(z_i\) are first computed and then summed to obtain the normalizing constant \(e=\sum_{k=1}^{K}\pi(z_i=k\mid\bullet)\). A new value of \(z_i\) can then be drawn from the multinomial distribution \(\text{Multi}\left(1,\left(\pi(z_i=1\mid\bullet)/e,\dots,\pi(z_i=K\mid\bullet)/e\right)^{\text{T}}\right)\). For each domain-specific parameter, i.e., \(\boldsymbol{\mu}_k\), \(\boldsymbol{\Sigma}_k\), or \(\boldsymbol{\omega}_k\), only the part of the data likelihood involving domain \(k\) is required to obtain its posterior density, as detailed above. Since the full conditional distributions of all parameters are available in closed form, it is straightforward to use a Gibbs sampler, a type of Markov chain Monte Carlo (MCMC) algorithm, to obtain a sequence of samples approximately drawn from the joint posterior \(\pi(\boldsymbol{z},\boldsymbol{\mu}_1,\dots,\boldsymbol{\mu}_K,\boldsymbol{\Sigma}_1,\dots,\boldsymbol{\Sigma}_K,\boldsymbol{\omega}_1,\dots,\boldsymbol{\omega}_K\mid\boldsymbol{V},\boldsymbol{Y})\) (details in Additional file 1: Section S4). Posterior inference can then be made by post-processing the MCMC samples, such as \(\{\boldsymbol{z}^{(1)},\dots,\boldsymbol{z}^{(U)}\}\) and \(\{\boldsymbol{\omega}_k^{(1)},\dots,\boldsymbol{\omega}_k^{(U)}\}\), where \(u\) indexes the MCMC iteration and \(U\) is the total number of iterations after burn-in.
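The draw of \(z_i\) described above can be sketched as follows. The helper takes the unnormalized log probabilities \(\log\pi(z_i=k\mid\bullet)\) for \(k=1,\dots,K\) (for example, the log prior plus the log data likelihood of spot \(i\) under each domain), which is a common way to avoid underflow; it is an illustrative sketch, and the function name is hypothetical.

```python
import numpy as np

def draw_label(log_unnorm, rng):
    """Draw z_i from Multi(1, p) given unnormalized log pi(z_i = k | .).

    Subtracting the maximum before exponentiating avoids underflow; the
    normalizing constant e is the sum of the resulting weights.
    Returns the sampled 0-based label and the normalized probabilities.
    """
    log_unnorm = np.asarray(log_unnorm, dtype=float)
    p = np.exp(log_unnorm - log_unnorm.max())
    p /= p.sum()  # divide by the normalizing constant e
    return int(rng.choice(len(p), p=p)), p
```

Working on the log scale matters in practice because the Gaussian and multinomial likelihood terms for a single spot can be far below the smallest representable double.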

In any finite mixture model, the likelihood is invariant under permutation of the cluster labels \(\boldsymbol{z}\). This label-switching problem makes the model non-identifiable and leads to symmetric, multimodal posterior distributions with up to \(K!\) copies of each genuine mode, which also complicates inference on the other parameters. To address this issue, we impose an order restriction on the posterior samples of \(\boldsymbol{\omega}_1,\dots,\boldsymbol{\omega}_K\) based on a given cell type \(q\). In particular, at each iteration \(u\), we relabel \(\boldsymbol{z}\) and switch all related domain-specific parameters of the MCMC output to satisfy the constraint \(\omega_{kq}^{(u)}>\omega_{k'q}^{(u)}\) for \(k<k'\). In other words, histology-based spatial domain 1 has the largest proportion of cell type \(q\), while histology-based spatial domain \(K\) has the smallest.
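The relabeling step can be sketched for a single MCMC iteration as below: sort the domains by decreasing \(\omega_{kq}\), then map every old label to its new rank. This is an illustrative sketch with 0-based labels; the function name is hypothetical, and in a full sampler the same `order` permutation would also be applied to \(\boldsymbol{\mu}_k\) and \(\boldsymbol{\Sigma}_k\).

```python
import numpy as np

def relabel_by_celltype(z, omega, q):
    """Relabel one MCMC draw so that omega[0, q] > omega[1, q] > ...

    z     : (N,) labels in {0, ..., K-1} for this iteration
    omega : (K, Q) domain-specific cell-type abundances for this iteration
    q     : index of the reference cell type
    Returns relabeled z, reordered omega, and the permutation `order`
    (order[j] is the old label that becomes new label j).
    """
    omega = np.asarray(omega)
    order = np.argsort(-omega[:, q])           # old labels by decreasing omega_kq
    new_label = np.empty_like(order)
    new_label[order] = np.arange(len(order))   # maps old label -> new label
    return new_label[np.asarray(z)], omega[order], order
```

For example, if domain 1 has the larger abundance of the reference cell type, the two labels are swapped in both `z` and `omega`.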

Identifying histology-based spatial domains and interactive zones

Our primary interest lies in identifying histology-based spatial domains by making inferences on the spatial domain indicator vector \(\boldsymbol{z}\). Here we apply the mode estimate [ 61 ] based on the marginal probabilities \(\pi(z_i=k\mid\cdot)\approx\frac{1}{U}\sum_{u=1}^{U}\text{I}(z_i^{(u)}=k)\). The estimate \(\hat{z}_i\) is obtained by selecting the domain with the highest marginal probability, i.e., \(\hat{z}_i=\arg\max_{k}\pi(z_i=k\mid\cdot)\).
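The marginal probabilities and the mode estimate are simple averages and argmaxes over the stored (relabeled) draws. A minimal sketch, assuming the post-burn-in samples of \(\boldsymbol{z}\) are stacked as a \(U\times N\) integer array with 0-based labels; the function names are hypothetical.

```python
import numpy as np

def marginal_probs(z_samples, K):
    """pi(z_i = k | .) approx (1/U) * sum_u I(z_i^(u) = k).

    z_samples : (U, N) integer array, one row per post-burn-in iteration
    Returns an (N, K) array of marginal posterior probabilities.
    """
    z_samples = np.asarray(z_samples)
    return np.stack([(z_samples == k).mean(axis=0) for k in range(K)], axis=1)

def mode_estimate(z_samples, K):
    """z_hat_i = argmax_k pi(z_i = k | .)."""
    return marginal_probs(z_samples, K).argmax(axis=1)
```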

Uncertainty quantification is one advantage of the proposed Bayesian finite mixture model. For example, if the marginal probability of assigning spot \(i\) to histology-based spatial domain \(k\) is high, e.g., \(\pi(z_i=k\mid\cdot)\ge0.9\), then we are confident about the assignment. However, if no domain has a dominant marginal probability for a spot, e.g., \(\pi(z_i=k\mid\cdot)<0.9\) for all \(k\), then we do not assign the spot to any histology-based spatial domain. Instead, we define it as a boundary spot, and a connected area of boundary spots as an interactive zone.
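The rule above can be sketched as a threshold on the largest marginal probability followed by a connected-component search over neighboring boundary spots. The 0.9 threshold comes from the text; the `neighbors` adjacency list, the breadth-first grouping, and the function name are assumptions made for illustration.

```python
import numpy as np
from collections import deque

def interactive_zones(probs, neighbors, threshold=0.9):
    """Flag boundary spots and group connected ones into interactive zones.

    probs     : (N, K) marginal probabilities pi(z_i = k | .)
    neighbors : list where neighbors[i] holds the spots adjacent to spot i
    A spot is a boundary spot if max_k pi(z_i = k | .) < threshold.
    Returns (is_boundary, zone), with zone[i] = -1 for assigned spots.
    """
    probs = np.asarray(probs)
    is_boundary = probs.max(axis=1) < threshold
    zone = np.full(len(probs), -1)
    next_id = 0
    for start in np.flatnonzero(is_boundary):
        if zone[start] != -1:
            continue
        zone[start] = next_id
        queue = deque([start])
        while queue:  # breadth-first search restricted to boundary spots
            i = queue.popleft()
            for j in neighbors[i]:
                if is_boundary[j] and zone[j] == -1:
                    zone[j] = next_id
                    queue.append(j)
        next_id += 1
    return is_boundary, zone
```

On a chain of five spots where spots 1, 2, and 4 are uncertain, spots 1 and 2 form one zone and spot 4 forms a second zone, because the confidently assigned spot 3 separates them.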

Interpreting and defining histology-based spatial domains

The domain-specific relative abundances of cell types \(\boldsymbol{\omega}_1,\dots,\boldsymbol{\omega}_K\) are another group of parameters of interest in our model, because they can be used to interpret or even define the identified histology-based spatial domains. We use the posterior mean as the estimate,

\(\hat{\boldsymbol{\omega}}_k=\frac{1}{U}\sum_{u=1}^{U}\boldsymbol{\omega}_k^{(u)}\),

averaging over all its post-burn-in MCMC samples. Additionally, the credible interval for each \(\omega_{kq}\) can be approximated by the quantiles of its post-burn-in MCMC samples. The MCMC samples can also be used to approximate any other quantity of interest for which no analytical solution is available, e.g., \(\pi(\omega_{kq}>\omega_{k'q}\mid\cdot)\) for some \(k\), \(k'\), and \(q\).
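These summaries can be sketched directly from the relabeled draws, assuming they are stacked as a \(U\times K\times Q\) array. The equal-tailed 95% interval is an illustrative choice, not one stated in the text, and the function names are hypothetical.

```python
import numpy as np

def summarize_omega(omega_samples, level=0.95):
    """Posterior mean and equal-tailed credible bounds for every omega_kq.

    omega_samples : (U, K, Q) array of relabeled post-burn-in draws omega_k^(u)
    Returns (mean, lower, upper), each of shape (K, Q).
    """
    omega_samples = np.asarray(omega_samples, dtype=float)
    mean = omega_samples.mean(axis=0)
    tail = (1.0 - level) / 2.0
    lower = np.quantile(omega_samples, tail, axis=0)
    upper = np.quantile(omega_samples, 1.0 - tail, axis=0)
    return mean, lower, upper

def prob_greater(omega_samples, k, k2, q):
    """Monte Carlo estimate of pi(omega_kq > omega_k'q | .)."""
    s = np.asarray(omega_samples)
    return float(np.mean(s[:, k, q] > s[:, k2, q]))
```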

Choosing the number of histology-based spatial domains K

The number of histology-based spatial domains \(K\) can be determined by prior biological knowledge when available. In the absence of such information, we can apply the integrated completed likelihood (ICL) [ 62 ] as the criterion for selecting \(K\). The ICL is calculated as follows:

where \(\text{L}(\boldsymbol{Y},\boldsymbol{V},\hat{\boldsymbol{z}}\mid\hat{\boldsymbol{\mu}}_1,\dots,\hat{\boldsymbol{\mu}}_K,\hat{\boldsymbol{\Sigma}}_1,\dots,\hat{\boldsymbol{\Sigma}}_K,\hat{\boldsymbol{\omega}}_1,\dots,\hat{\boldsymbol{\omega}}_K)\) is the complete-data likelihood, i.e., the product of Eq. ( 1 ) over \(i\), and \(d=2KP'+K(Q-1)\) is the total number of model parameters.
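The ICL equation itself is not reproduced in this text. The sketch below assumes the BIC-type form of Biernacki et al., \(\log\text{L}_c-\frac{d}{2}\log N\), with \(N\) the number of spots and larger values preferred; the paper may use a rescaled or sign-flipped version (e.g., \(-2\log\text{L}_c+d\log N\), minimized), which selects the same \(K\). The parameter count \(d\) is taken from the text; the complete-data log-likelihoods per candidate \(K\) are assumed to be computed elsewhere, and the function names are hypothetical.

```python
import math

def n_params(K, P_prime, Q):
    """d = 2 K P' + K (Q - 1), as stated in the text."""
    return 2 * K * P_prime + K * (Q - 1)

def icl(complete_loglik, K, P_prime, Q, N):
    """ASSUMED BIC-type ICL: log L_c - (d / 2) * log(N); larger is better."""
    return complete_loglik - 0.5 * n_params(K, P_prime, Q) * math.log(N)

def choose_K(loglik_by_K, P_prime, Q, N):
    """Pick the K with the largest ICL among fitted candidates {K: log L_c}."""
    return max(loglik_by_K, key=lambda K: icl(loglik_by_K[K], K, P_prime, Q, N))
```

The penalty grows linearly in \(K\), so a larger model is selected only if its complete-data log-likelihood improves by more than \(\frac{1}{2}(2P'+Q-1)\log N\) per extra domain.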

Stage II: a generalized linear regression model for detecting spaDEGs

To test whether each gene is differentially expressed among the histology-based spatial domains identified in Stage I of iIMPACT, we use a generalized linear regression model in which the response variable is the gene expression count and the predictor variables are the histology-based spatial domain indicators. In particular, we assume that the read counts of gene \(j\) across spots indexed by \(i\) follow a negative binomial (NB) distribution:

where \(s_i\) is the size factor of spot \(i\), \(\psi_j\) is the over-dispersion parameter of gene \(j\), and \(\lambda_{ij}\) is the underlying normalized expression level of gene \(j\) at spot \(i\). We further use the log link,

which is the canonical link for Poisson regression and the standard choice in NB regression models. Here, \(x_{i,k}=\text{I}(z_i=k)\) is a binary indicator: if spot \(i\) is assigned to histology-based spatial domain \(k\) in Stage I of iIMPACT, then \(x_{i,k}=1\); otherwise, \(x_{i,k}=0\). Thus, we can interpret the intercept \(\alpha_{j,k}\) as the baseline expression level of gene \(j\) in the whole domain excluding histology-based spatial domain \(k\), and the slope \(\beta_{j,k}\) as the differential expression level of gene \(j\) in histology-based spatial domain \(k\), i.e., the shift from the baseline. Within this framework, spaDEGs, which are differentially expressed in a given histology-based spatial domain \(k\) compared with all other domains, can be identified by testing the null hypothesis \(H_0:\beta_{j,k}=0\) against the alternative \(H_1:\beta_{j,k}\neq0\). To control the false discovery rate, the p-values are adjusted with the Benjamini and Hochberg method [ 63 ], and genes whose adjusted p-values are below a significance level (e.g., \(0.05\)) are defined as domain-\(k\)-specific spatially variable genes. The above NB regression model is fitted via the function glm.nb in the R package MASS [ 64 ].
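Fitting the NB regression itself (done with glm.nb in R) is not reproduced here. Assuming one raw p-value per gene for the test of \(\beta_{j,k}=0\) in a given domain \(k\), the Benjamini and Hochberg adjustment and the 0.05 cutoff can be sketched as follows; the function names are hypothetical, and in Python the same adjustment is also available as `statsmodels.stats.multitest.multipletests(..., method="fdr_bh")`.

```python
import numpy as np

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (step-up FDR control)."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # enforce monotonicity from the largest p-value downwards
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)  # cap at 1
    return adjusted

def domain_specific_genes(pvals, alpha=0.05):
    """Indices of genes whose BH-adjusted p-value is below alpha."""
    return np.flatnonzero(bh_adjust(pvals) < alpha)
```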

Availability of data and materials

The authors analyzed four publicly available SRT datasets. Raw count matrices, images, and spatial data for the three 10x Visium SRT datasets are available on the 10x Genomics website (https://support.10xgenomics.com/spatial-gene-expression/datasets). The mouse visual cortex STARmap data can be downloaded from https://www.starmapresources.com/data. Processed data for all four analyzed SRT datasets can be downloaded from Zenodo [ 65 ]. An open-source implementation of the iIMPACT algorithm in R/C++ is available on GitHub (https://github.com/Xijiang1997/iIMPACT) [ 66 ] and Zenodo [ 65 ].

Asp M, Bergenstrahle J, Lundeberg J. Spatially resolved transcriptomes-next generation tools for tissue exploration. BioEssays. 2020;42:e1900221.

Burgess DJ. Spatial transcriptomics coming of age. Nat Rev Genet. 2019;20:317.

Zhang M, Sheffield T, Zhan X, Li Q, Yang DM, Wang Y, Wang S, Xie Y, Wang T, Xiao G. Spatial molecular profiling: platforms, applications and analysis tools. Brief Bioinform. 2021;22(3):bbaa145.

Moor AE, Itzkovitz S. Spatial transcriptomics: paving the way for tissue-level systems biology. Curr Opin Biotechnol. 2017;46:126–33.

Ståhl PL, Salmén F, Vickovic S, Lundmark A, Navarro JF, Magnusson J, Giacomello S, Asp M, Westholm JO, Huss M, et al. Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science. 2016;353:78–82.

Rodriques SG, Stickels RR, Goeva A, Martin CA, Murray E, Vanderburg CR, Welch J, Chen LM, Chen F, Macosko EZ. Slide-seq: A scalable technology for measuring genome-wide expression at high spatial resolution. Science. 2019;363:1463–7.

Stickels RR, Murray E, Kumar P, Li J, Marshall JL, Di Bella DJ, Arlotta P, Macosko EZ, Chen F. Highly sensitive spatial transcriptomics at near-cellular resolution with Slide-seqV2. Nat Biotechnol. 2021;39:313–9.

Vickovic S, Eraslan G, Salmén F, Klughammer J, Stenbeck L, Schapiro D, Äijö T, Bonneau R, Bergenstråhle L, Navarro JF, et al. High-definition spatial transcriptomics for in situ tissue profiling. Nat Methods. 2019;16:987–90.

Lubeck E, Coskun AF, Zhiyentayev T, Ahmad M, Cai L. Single-cell in situ RNA profiling by sequential hybridization. Nat Methods. 2014;11:360–1.

Chen KH, Boettiger AN, Moffitt JR, Wang S, Zhuang X. RNA imaging. Spatially resolved, highly multiplexed RNA profiling in single cells. Science. 2015;348:aaa6090.

Wang X, Allen WE, Wright MA, Sylwestrak EL, Samusik N, Vesuna S, Evans K, Liu C, Ramakrishnan C, Liu J, et al. Three-dimensional intact-tissue sequencing of single-cell transcriptional states. Science. 2018;361(6400):eaat5691.

Moses L, Pachter L. Museum of spatial transcriptomics. Nat Methods. 2022;19:534–46.

Thrane K, Eriksson H, Maaskola J, Hansson J, Lundeberg J. Spatially resolved transcriptomics enables dissection of genetic heterogeneity in stage III cutaneous malignant melanoma. Cancer Res. 2018;78:5970–9.

Satija R, Farrell JA, Gennert D, Schier AF, Regev A. Spatial reconstruction of single-cell gene expression data. Nat Biotechnol. 2015;33:495–502.

Kiselev VY, Andrews TS, Hemberg M. Challenges in unsupervised clustering of single-cell RNA-seq data. Nat Rev Genet. 2019;20:273–82.

Zhu Q, Shah S, Dries R, Cai L, Yuan GC. Identification of spatially associated subpopulations by combining scRNAseq and sequential fluorescence in situ hybridization data. Nat Biotechnol. 2018;36(12):1183–90.

Zhao E, Stone MR, Ren X, Guenthoer J, Smythe KS, Pulliam T, Williams SR, Uytingco CR, Taylor SEB, Nghiem P, et al. Spatial transcriptomics at subspot resolution with BayesSpace. Nat Biotechnol. 2021;39:1375–84.

Li Z, Zhou X. BASS: multi-scale and multi-sample analysis enables accurate cell type clustering and spatial domain detection in spatial transcriptomic studies. Genome Biol. 2022;23:168.

Hu J, Li X, Coleman K, Schroeder A, Ma N, Irwin DJ, Lee EB, Shinohara RT, Li M. SpaGCN: Integrating gene expression, spatial location and histology to identify spatial domains and spatially variable genes by graph convolutional network. Nat Methods. 2021;18:1342–51.

Pham D, Tan X, Xu J, Grice LF, Lam PY, Raghubar A, Vukovic J, Ruitenberg MJ, Nguyen Q. stLearn: integrating spatial location, tissue morphology and gene expression to find cell types, cell-cell interactions and spatial trajectories within undissociated tissues. bioRxiv. 2020:2020.05.31.125658.

Bao F, Deng Y, Wan S, Shen SQ, Wang B, Dai Q, Altschuler SJ, Wu LF. Integrative spatial analysis of cell morphologies and transcriptional states with MUSE. Nat Biotechnol. 2022;40:1200–9.

Tang Z, Li Z, Hou T, Zhang T, Yang B, Su J, Song Q. SiGra: single-cell spatial elucidation through an image-augmented graph transformer. Nat Commun. 2023;14:5618.

Wang S, Rong R, Yang DM, Fujimoto J, Yan S, Cai L, Yang L, Luo D, Behrens C, Parra ER, et al. Computational staining of pathology images to study the tumor microenvironment in lung cancer. Cancer Res. 2020;80:2056–66.

Fox H. Is H&E morphology coming to an end? J Clin Pathol. 2000;53:38–40.

Cui Y, Zhang G, Liu Z, Xiong Z, Hu J. A deep learning algorithm for one-step contour aware nuclei segmentation of histopathology images. Med Biol Eng Comput. 2019;57:2027–43.

Li X, Chen H, Qi X, Dou Q, Fu C-W, Heng P-A. H-DenseUNet: hybrid densely connected UNet for liver and tumor segmentation from CT volumes. IEEE Trans Med Imaging. 2018;37:2663–74.

Raza SEA, Cheung L, Shaban M, Graham S, Epstein D, Pelengaris S, Khan M, Rajpoot NM. Micro-Net: A unified model for segmentation of various objects in microscopy images. Med Image Anal. 2019;52:160–73.

Graham S, Vu QD, Raza SEA, Azam A, Tsang YW, Kwak JT, Rajpoot N. Hover-Net: Simultaneous segmentation and classification of nuclei in multi-tissue histology images. Med Image Anal. 2019;58:101563.

Svensson V, Teichmann SA, Stegle O. SpatialDE: identification of spatially variable genes. Nat Methods. 2018;15:343–6.

Sun S, Zhu J, Zhou X. Statistical analysis of spatial expression patterns for spatially resolved transcriptomic studies. Nat Methods. 2020;17:193–200.

Li Q, Zhang M, Xie Y, Xiao G. Bayesian modeling of spatial molecular profiling data via gaussian process. Bioinformatics. 2021;37(22):4129–36.

Jiang X, Xiao G, Li Q. A Bayesian modified Ising model for identifying spatially variable genes from spatial transcriptomics data. Stat Med. 2022;41:4647–65.

Clifford P. Markov random fields in statistics. Disorder in physical systems: A volume in honour of John M Hammersley. 1990. p. 19–32.

Morris R, Descombes X, Zerubia J. Fully Bayesian image segmentation-an engineering perspective. In Proceedings of International Conference on Image Processing. IEEE; 1997. pp. 54–57.

Neckmann U, Wolowczyk C, Hall M, Almaas E, Ren J, Zhao S, Johannessen B, Skotheim RI, Bjørkøy G, Ten Dijke P. GREM1 is associated with metastasis and predicts poor prognosis in ER-negative breast cancer patients. Cell Commun Signal. 2019;17:1–17.

Wang C, Lv J, Xue C, Li J, Liu Y, Xu D, Jiang Y, Jiang S, Zhu M, Yang Y, Zhang S. Novel role of COX6c in the regulation of oxidative phosphorylation and diseases. Cell Death Discov. 2022;8:336.

Gajulapalli VN, Samanthapudi VS, Pulaganti M, Khumukcham SS, Malisetty VL, Guruprasad L, Chitta SK, Manavathi B. A transcriptional repressive role for epithelial-specific ETS factor ELF3 on oestrogen receptor alpha in breast cancer cells. Biochem J. 2016;473:1047–61.

Li H, Calder CA, Cressie N. Beyond Moran’s I: testing for spatial dependence based on the spatial autoregressive model. Geogr Anal. 2007;39:357–75.

Moses MA, Kim YS, Rivera-Marquez GM, Oshima N, Watson MJ, Beebe KE, Wells C, Lee S, Zuehlke AD, Shao H. Targeting the Hsp40/Hsp70 chaperone axis as a novel strategy to treat castration-resistant prostate cancer. Cancer Res. 2018;78:4022–35.

Gao Y, Teng J, Hong Y, Qu F, Ren J, Li L, Pan X, Chen L, Yin L, Xu D, Cui X. The oncogenic role of EIF3D is associated with increased cell cycle progression and motility in prostate cancer. Med Oncol. 2015;32:518.

Daniels G, Li Y, Gellert LL, Zhou A, Melamed J, Wu X, Zhang X, Zhang D, Meruelo D, Logan SK, et al. TBLR1 as an androgen receptor (AR) coactivator selectively activates AR target genes to inhibit prostate cancer growth. Endocr Relat Cancer. 2014;21:127–42.

Sand M, Bechara FG, Gambichler T, Sand D, Bromba M, Hahn SA, Stockfleth E, Hessam S. Circular RNA expression in cutaneous squamous cell carcinoma. J Dermatol Sci. 2016;83:210–8.

Chatterjee R, Chatterji U. CLEC12A: a promise target for cancer therapy. Arch Clin Med Case Rep. 2022;6:706–14.

Wang YQ, Xu MD, Weng WW, Wei P, Yang YS, Du X. BCL6 is a negative prognostic factor and exhibits pro-oncogenic activity in ovarian cancer. Am J Cancer Res. 2015;5:255–66.

Oyama Y, Shigeta S, Tokunaga H, Tsuji K, Ishibashi M, Shibuya Y, Shimada M, Yasuda J, Yaegashi N. CHD4 regulates platinum sensitivity through MDR1 expression in ovarian cancer: A potential role of CHD4 inhibition as a combination therapy with platinum agents. PLoS One. 2021;16:e0251079.

Sorensen SA, Bernard A, Menon V, Royall JJ, Glattfelder KJ, Desta T, Hirokawa K, Mortrud M, Miller JA, Zeng H, et al. Correlated gene expression and target specificity demonstrate excitatory projection neuron diversity. Cereb Cortex. 2015;25:433–49.

Krienen FM, Yeo BT, Ge T, Buckner RL, Sherwood CC. Transcriptional profiles of supragranular-enriched genes associate with corticocortical network architecture in the human brain. Proc Natl Acad Sci U S A. 2016;113:E469–78.

Hayakawa T, Prasath VS, Kawanaka H, Aronow BJ, Tsuruoka S. Computational nuclei segmentation methods in digital pathology: a survey. Arch Comput Methods Eng. 2021;28:1–13.

Mouroutis T, Roberts SJ, Bharath AA. Robust cell nuclei segmentation using statistical modelling. Bioimaging. 1998;6:79–91.

Fatakdawala H, Xu J, Basavanhally A, Bhanot G, Ganesan S, Feldman M, Tomaszewski JE, Madabhushi A. Expectation–maximization-driven geodesic active contour with overlap resolution (emagacor): Application to lymphocyte segmentation on breast cancer histopathology. IEEE Trans Biomed Eng. 2010;57:1676–89.

Miller BF, Huang F, Atta L, Sahoo A, Fan J. Reference-free cell type deconvolution of multi-cellular pixel-resolution spatially resolved transcriptomics data. Nat Commun. 2022;13:2339.

Maynard KR, Collado-Torres L, Weber LM, Uytingco C, Barry BK, Williams SR, Catallini JL, Tran MN, Besich Z, Tippani M. Transcriptome-scale spatial gene expression in the human dorsolateral prefrontal cortex. Nat Neurosci. 2021;24:425–36.

Ferguson TS. A Bayesian analysis of some nonparametric problems. Ann Stat. 1973;1:209–30.

Miller JW, Harrison MT. Mixture models with a prior on the number of components. J Am Stat Assoc. 2018;113:340–56.

Hu G, Yang HC, Xue Y. Bayesian group learning for shot selection of professional basketball players. Stat. 2021;10:e324.

Van der Maaten L, Hinton G. Visualizing data using t-SNE. J Mach Learn Res. 2008;9:2579–605.

Becht E, McInnes L, Healy J, Dutertre C-A, Kwok IW, Ng LG, Ginhoux F, Newell EW. Dimensionality reduction for visualizing single-cell data using UMAP. Nat Biotechnol. 2019;37:38–44.

He K, Gkioxari G, Dollár P, Girshick R. Mask R-CNN. In: Proceedings of the IEEE international conference on computer vision. 2017. p. 2961–9.

Kontkanen P, Myllymaki P, Tirri H. Constructing Bayesian finite mixture models by the EM algorithm. In: ESPRIT Working Group on Neural and Computational Learning (NeuroCOLT). Citeseer; 1996.

Alvarez I, Niemi J, Simpson M. Bayesian inference for a covariance matrix. arXiv preprint arXiv:1408.4050. 2014.

Li Q, Dahl DB, Vannucci M, Hyun J, Tsai JW. Bayesian model of protein primary sequence for secondary structure prediction. PLoS One. 2014;9:e109832.

Biernacki C, Celeux G, Govaert G. Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Trans Pattern Anal Mach Intell. 2000;22:719–25.

Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Roy Stat Soc Ser B (Methodol). 1995;57:289–300.

Ripley B, Venables B, Bates DM, Hornik K, Gebhardt A, Firth D, Ripley MB. Package ‘MASS’. CRAN R. 2013;538:113–20.

Jiang X, Wang S, Guo L, Zhu B, Wen Z, Jia L, Xu L, Xiao G, Li Q. iIMPACT: integrating image and molecular profiles for spatial transcriptomics analysis. Zenodo. 2024. https://doi.org/10.5281/zenodo.11117768 .

Jiang X, Wang S, Guo L, Zhu B, Wen Z, Jia L, Xu L, Xiao G, Li Q. iIMPACT: Integrating image and molecular profiles for spatial transcriptomics analysis. Github.  https://github.com/Xijiang1997/iIMPACT .


Acknowledgements

Not applicable.

Funding

This work was supported by the following funding: the National Science Foundation [2210912, 2113674] and the National Institutes of Health [1R01GM141519] (to Q. L.); the National Institutes of Health [R01GM140012, R01GM141519, R01DE030656, U01CA249245], and the Cancer Prevention and Research Institute of Texas [CPRIT RP230330] (to G. X.); the Rally Foundation, Children’s Cancer Fund (Dallas), the Cancer Prevention and Research Institute of Texas (RP180319, RP200103, RP220032, RP170152 and RP180805), and the National Institutes of Health (R01DK127037, R01CA263079, R21CA259771, UM1HG011996, and R01HL144969) (to L. X.). The funding bodies had no role in the design, collection, analysis, or interpretation of data in this study.

Author information

Authors and affiliations

Quantitative Biomedical Research Center, Peter O’Donnell Jr. School of Public Health, The University of Texas Southwestern Medical Center, Dallas, TX, USA

Xi Jiang, Shidan Wang, Lei Guo, Zhuoyu Wen, Lin Xu & Guanghua Xiao

Department of Statistics and Data Science, Southern Methodist University, Dallas, TX, USA

Department of Statistics, The Chinese University of Hong Kong, Hong Kong SAR, China

Bencong Zhu

Department of Mathematical Sciences, The University of Texas at Dallas, Richardson, TX, USA

Bencong Zhu & Qiwei Li

Department of Pathology, The University of Texas Southwestern Medical Center, Dallas, TX, USA


Contributions

XJ analyzed the data, interpreted the results, and was a major contributor in writing the manuscript. SW and ZW generated the image profiles of the human breast cancer dataset, human prostate cancer dataset, and human ovarian cancer dataset. LG performed the gene enrichment analysis. BZ conducted the analysis on selecting the number of spatial domains. LJ performed the histological examination of pathology images from the human prostate cancer dataset and human ovarian cancer dataset. LX, GX, and QL conceived the study and supervised the statistical modeling and analyses. All authors read and approved the final manuscript.

Review history

The review history is available as Additional file 2.

Peer review information

Xiang Zhou and Veronique van den Berghe were the primary editors of this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Corresponding authors

Correspondence to Lin Xu, Guanghua Xiao or Qiwei Li.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1. Supplementary materials for iIMPACT, including the supplementary sections, figures, and tables.

Additional file 2. Review history.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article

Jiang, X., Wang, S., Guo, L. et al. iIMPACT: integrating image and molecular profiles for spatial transcriptomics analysis. Genome Biol 25, 147 (2024). https://doi.org/10.1186/s13059-024-03289-5


Received: 28 June 2023

Accepted: 23 May 2024

Published: 06 June 2024

DOI: https://doi.org/10.1186/s13059-024-03289-5


  • Spatially resolved transcriptomics
  • AI-reconstructed histology image
  • Markov random field
  • Spatial clustering
  • Spatially variable gene

Genome Biology

ISSN: 1474-760X


A Novel Stacking Ensemble Learning Approach for Predicting PM2.5 Levels in Dense Urban Environments Using Meteorological Variables: A Case Study in Macau

Tian, H.; Kong, H.; Wong, C. A Novel Stacking Ensemble Learning Approach for Predicting PM2.5 Levels in Dense Urban Environments Using Meteorological Variables: A Case Study in Macau. Appl. Sci. 2024 , 14 , 5062. https://doi.org/10.3390/app14125062



COMMENTS

  1. Case Study

    A case study is a qualitative research method that involves the in-depth exploration and analysis of a particular case, which can be an individual, group, organization, event, or community. The primary purpose of a case study is to generate a comprehensive and nuanced understanding of the case, including its history, context, and dynamics.

  2. Case Study Methods and Examples

    The purpose of case study research is twofold: (1) to provide descriptive information and (2) to suggest theoretical relevance. Rich description enables an in-depth or sharpened understanding of the case. It is unique given one characteristic: case studies draw from more than one data source. Case studies are inherently multimodal or mixed ...

  3. What Is a Case Study?

    A case study is a detailed study of a specific subject, such as a person, group, place, event, organization, or phenomenon. Case studies are commonly used in social, educational, clinical, and business research. A case study research design usually involves qualitative methods, but quantitative methods are sometimes also used.

  4. What is a Case Study?

    A case study protocol outlines the procedures and general rules to be followed during the case study. This includes the data collection methods to be used, the sources of data, and the procedures for analysis. Having a detailed case study protocol ensures consistency and reliability in the study.

  5. Case Study Methodology of Qualitative Research: Key Attributes and

    A case study is one of the most commonly used methodologies of social research. This article attempts to look into the various dimensions of a case study research strategy, the different epistemological strands which determine the particular case study type and approach adopted in the field, discusses the factors which can enhance the effectiveness of a case study research, and the debate ...

  6. Writing a Case Analysis Paper

    Case study is a method of in-depth research and rigorous inquiry; case analysis is a reliable method of teaching and learning. A case study is a modality of research that investigates a phenomenon for the purpose of creating new knowledge, solving a problem, or testing a hypothesis using empirical evidence derived from the case being studied.

  7. Case Study Method: A Step-by-Step Guide for Business Researchers

    A case study protocol is a formal document capturing the entire set of procedures involved in the collection of empirical material. It guides researchers in gathering evidence, analyzing empirical material, and reporting the case study. This section includes a step-by-step guide that is used for the execution of the actual study.

  8. Case Study

    Case studies tend to focus on qualitative data using methods such as interviews, observations, and analysis of primary and secondary sources (e.g., newspaper articles, photographs, official records). Sometimes a case study will also collect quantitative data. Example: Mixed methods case study. For a case study of a wind farm development in a ...

  9. LibGuides: Research Writing and Analysis: Case Study

    A case study is: An in-depth research design that primarily uses a qualitative methodology but sometimes includes quantitative methodology. Used to examine an identifiable problem confirmed through research. Used to investigate an individual, group of people, organization, or event. Used mostly to answer "how" and "why" questions.

  10. What Is a Case, and What Is a Case Study?

    Abstract. Case study is a common methodology in the social sciences (management, psychology, science of education, political science, sociology). A lot of methodological papers have been dedicated to case study but, paradoxically, the question "what is a case?" has been less studied.

  11. (PDF) Analyzing Case Study Evidence

    For case study analysis, one of the most desirable techniques is to use a pattern-matching logic. Such a logic (Trochim, 1989) compares an empirically based pattern with a predicted one (or with several alternative predictions). If the patterns coincide, the results can help a case study to strengthen its internal validity. If the case study ...

  12. What is Case Study Analysis? (Explained With Examples)

    Case Study Analysis is a widely used research method that examines in-depth information about a particular individual, group, organization, or event. It is a comprehensive investigative approach that aims to understand the intricacies and complexities of the subject under study. Through the analysis of real-life scenarios and inquiry into ...

  13. (PDF) Qualitative Case Study Methodology: Study Design and

    McMaster University, West Hamilton, Ontario, Canada. Qualitative case study methodology provides tools for researchers to study complex phenomena within their contexts. When the approach is ...

  14. Chapter 5: DATA ANALYSIS AND INTERPRETATION

    As case study research is a flexible research method, qualitative data analysis methods are commonly used [176]. The basic objective of the analysis is, as in any other analysis, to derive conclusions from the data, keeping a clear chain of evidence.

  15. Research Methods

    Qualitative analysis methods. Qualitative analysis is used to understand words, ideas, and experiences. You can use it to interpret data that was collected from open-ended surveys and interviews, literature reviews, case studies, ethnographies, and other sources that use text rather than numbers, or data that was collected using non-probability sampling methods.

  16. Methodology or method? A critical review of qualitative case study

    Definitions of qualitative case study research. Case study research is an investigation and analysis of a single or collective case, intended to capture the complexity of the object of study (Stake, 1995). Qualitative case study research, as described by Stake (), draws together "naturalistic, holistic, ethnographic, phenomenological, and biographic research methods" in a bricoleur design ...

  17. Four Steps to Analyse Data from a Case Study Method

    ... with regards to case study research (Markus, 1989), states that the case study method "is an empirical enquiry that investigates a contemporary phenomenon within its real life context". Typically, interview techniques are utilised as part of the case study method to address the 'how' and 'why' type research questions.

  18. Case Study Research Method in Psychology

    The case study is not a research method, but researchers select methods of data collection and analysis that will generate material suitable for case studies. Freud (1909a, 1909b) conducted very detailed investigations into the private lives of his patients in an attempt to both understand and help them overcome their illnesses.

  19. What the Case Study Method Really Teaches

    It's been 100 years since Harvard Business School began using the case study method. Beyond teaching specific subject matter, the case study ...

  20. 5 Benefits of the Case Study Method

    Through the case method, you can "try on" roles you may not have considered and feel more prepared to change or advance your career. 5. Build Your Self-Confidence. Finally, learning through the case study method can build your confidence. Each time you assume a business leader's perspective, aim to solve a new challenge, and express and ...

  21. Case Study Method

    This method offers a continuous analysis of the facts. The case study method will look at the facts continuously for the social group being studied by researchers. That means there aren't interruptions in the process that could limit the validity of the data being collected through this work. This advantage reduces the need to use assumptions ...

  22. Case Study Methodology of Qualitative Research: Key Attributes and

    The following key attributes of the case study methodology can be underlined. 1. Case study is a research strategy, and not just a method/technique/process of data collection. 2. A case study involves a detailed study of the concerned unit of analysis within its natural setting. A de-contextualised study has no relevance in a case study ...

  23. What Is Data Analysis? (With Examples)

    Written by Coursera Staff • Updated on Apr 19, 2024. Data analysis is the practice of working with data to glean useful information, which can then be used to make informed decisions. "It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts," Sherlock ...

  24. Imbalanced spectral data analysis using data augmentation ...

    Several real-world case studies are provided to show the effectiveness of our method in imbalanced spectral data analysis. In "Case study using spectral data from Pluronic F-127 hydrogel" and ...

  25. Time‐Dependent Fuzzy Reliability Analysis of Concrete Slab during the

    The proposed methods consist of two main components, the first of which is similar to our previous study, i.e., the mechanical parameters are calibrated using deformation back analysis techniques, and the time-dependent performance function of slab cracking is constructed by the Drucker-Prager (DP) yield criterion and tensile strength ...

  26. Sentiment Analysis in Python: Going Beyond Bag of Words

    In this article, we will conduct a brief case study using Twitter data. In the end, you will see a case study that has a significant impact on real life, which will surely pique your interest. But first, let's start with the basics. What is Sentiment Analysis? Sentiment analysis is a method used to predict feelings, acting somewhat like a digital psychologist.

  27. Case Selection Techniques in Case Study Research: A Menu of Qualitative

    How can scholars select cases from a large universe for in-depth case study analysis? Random sampling is not typically a viable approach when the total number of cases to be selected is small. ... Andrew, and Angela Tsay. 2000. Sequence analysis and optimal matching methods in sociology. Sociological Methods and Research 29:3-33. Google Scholar ...

  28. iIMPACT: integrating image and molecular profiles for spatial

    Current clustering analysis of spatial transcriptomics data primarily relies on molecular information and fails to fully exploit the morphological features present in histology images, leading to compromised accuracy and interpretability. To overcome these limitations, we have developed a multi-stage statistical method called iIMPACT. It identifies and defines histology-based spatial domains ...

  29. Applied Sciences

    Air pollution, particularly particulate matter such as PM2.5 and PM10, has become a focal point of global concern due to its significant impact on air quality and human health. Macau, as one of the most densely populated cities in the world, faces severe air quality challenges. We leveraged daily pollution data from 2015 to 2023 and hourly meteorological pollution monitoring data from 2020 to ...
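Item 11 above describes pattern matching (Trochim, 1989): an empirically based pattern is compared with a predicted one, or with several rival predictions. The Python sketch below shows one simple way to make that comparison explicit. It is an illustration under stated assumptions, not a procedure given in the cited source: each pattern is encoded as a hypothetical mapping from an outcome variable to an expected direction of change, and the match is scored as the share of predicted outcomes that agree with the evidence. All variable and prediction names are invented for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding (an assumption for this sketch): each outcome variable
# maps to an expected direction of change, "+" (increase), "-" (decrease)
# or "0" (no change).
Pattern = Dict[str, str]


def match_score(observed: Pattern, predicted: Pattern) -> float:
    """Share of predicted outcomes whose direction agrees with the evidence."""
    shared = [var for var in predicted if var in observed]
    if not shared:
        return 0.0  # no overlapping outcomes, so nothing can be matched
    hits = sum(1 for var in shared if observed[var] == predicted[var])
    return hits / len(shared)


def rank_predictions(observed: Pattern,
                     rivals: Dict[str, Pattern]) -> List[Tuple[str, float]]:
    """Return (name, score) pairs for rival predictions, best match first."""
    scored = [(name, match_score(observed, p)) for name, p in rivals.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Invented evidence and rival explanations for a single organization.
    observed = {"staff_turnover": "-", "output": "+", "complaints": "-"}
    rivals = {
        "new_policy_worked": {"staff_turnover": "-", "output": "+", "complaints": "-"},
        "market_upturn": {"staff_turnover": "0", "output": "+", "complaints": "0"},
    }
    for name, score in rank_predictions(observed, rivals):
        print(f"{name}: {score:.2f}")
```

A high score for one prediction alongside low scores for its rivals corresponds to the situation in which, per the snippet, coinciding patterns can help strengthen internal validity. The numeric score is only a bookkeeping aid; a tie between rivals signals that the evidence does not yet discriminate between the explanations.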
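Item 27 above asks how cases can be selected from a large universe when random sampling is not viable for a small number of cases. One commonly described family of purposive techniques starts from a cross-case model: a "typical" case lies close to the fitted relationship, a "deviant" case lies far from it, and an "extreme" case has an unusual value on the explanatory variable. The Python sketch below assumes a single explanatory variable, an ordinary least-squares fit, and hypothetical case names and data; the helper names `fit_ols` and `select_cases` are invented for illustration and do not reproduce any specific author's procedure.

```python
from statistics import mean
from typing import Dict, List, Tuple


def fit_ols(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = a + b*x with a single predictor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x must vary across cases to fit a slope")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def select_cases(data: Dict[str, Tuple[float, float]]) -> Dict[str, str]:
    """Pick typical, deviant and extreme cases from {case_name: (x, y)}."""
    names = list(data)
    xs = [data[n][0] for n in names]
    ys = [data[n][1] for n in names]
    a, b = fit_ols(xs, ys)
    # Absolute residual: how far each case lies from the fitted line.
    resid = {n: abs(data[n][1] - (a + b * data[n][0])) for n in names}
    mx = mean(xs)
    return {
        "typical": min(names, key=lambda n: resid[n]),  # closest to the model
        "deviant": max(names, key=lambda n: resid[n]),  # furthest from the model
        "extreme": max(names, key=lambda n: abs(data[n][0] - mx)),  # unusual x
    }


if __name__ == "__main__":
    # Hypothetical universe: (explanatory variable, outcome) for each case.
    universe = {"A": (1, 2), "B": (2, 4), "C": (3, 6),
                "D": (4, 8), "E": (10, 20), "F": (3, 14)}
    print(select_cases(universe))
```

In the hypothetical data, F is the deviant case, while E is both the extreme and the typical case because its unusual x value pulls the fitted line toward it. That overlap is a reminder that a mechanical rule only shortlists candidates; the researcher still has to judge whether a case is substantively representative.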