CRENC Learn

How to Create a Data Analysis Plan: A Detailed Guide

by Barche Blaise | Aug 12, 2020 | Writing


If a good research question equates to a story, then a roadmap is vital for good storytelling. We advise every student and researcher to personally write their own data analysis plan before seeking any advice. In this blog article, we will explore how to create a data analysis plan: its content and structure.

The data analysis plan serves as a roadmap for how the collected data will be organised and analysed. It includes the following aspects:

  • The research objectives and hypotheses
  • The dataset to be used
  • The inclusion and exclusion criteria
  • The research variables
  • The statistical test hypotheses and the software for statistical analysis
  • Shell tables

1. Stating research question(s), objectives and hypotheses:

All research objectives or goals must be clearly stated. They must be Specific, Measurable, Attainable, Realistic and Time-bound (SMART). Hypotheses are propositions drawn from personal experience or previous literature; they lay the foundation for the statistical methods that will be applied to extrapolate results to the entire population.

2. The dataset:

The dataset that will be used for statistical analysis must be described and its important aspects outlined. These include: the owner of the dataset, how to get access to the dataset, how the dataset was checked for quality control, and in which program the dataset is stored (Excel, Epi Info, SQL, Microsoft Access, etc.).

3. The inclusion and exclusion criteria:

They guide the aspects of the dataset that will be used for data analysis. These criteria will also guide the choice of variables included in the main analysis.

4. Variables:

Every variable collected in the study should be clearly stated. They should be presented based on the level of measurement (ordinal/nominal or ratio/interval levels), or the role the variable plays in the study (independent/predictors or dependent/outcome variables). The variable types should also be outlined.  The variable type in conjunction with the research hypothesis forms the basis for selecting the appropriate statistical tests for inferential statistics. A good data analysis plan should summarize the variables as demonstrated in Figure 1 below.

Figure 1: Presentation of variables in a data analysis plan

5. Statistical software

There are many software packages for data analysis; common examples are SPSS, Epi Info, SAS, Stata and Microsoft Excel. Include the version number, year of release and author/manufacturer. Beginners tend to try different software packages and end up mastering none. It is better to select one and master it, because almost all statistical software performs equally well for the basic analyses and most of the advanced analyses needed for a student thesis. This is what we recommend to all our students at CRENC before they begin writing their results section.

6. Selecting the appropriate statistical method to test hypotheses

Depending on the research question, hypothesis and type of variable, several statistical methods can be used to answer the research question appropriately. This aspect of the data analysis plan clearly outlines why each statistical method will be used to test the hypotheses. The level of statistical significance (p-value), which is often but not always set at <0.05, should also be stated. Figures 2a and 2b present decision trees for some common statistical tests, based on the variable type and research question.

A good analysis plan should clearly describe how missing data will be analysed.

Figure 2a: How to choose a statistical method to determine association between variables
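To illustrate how the variable type and research question drive the choice of test, here is a minimal Python sketch using scipy on made-up data (the counts and group values are hypothetical): a chi-square test of independence for two categorical variables, and an independent-samples t-test for a continuous outcome compared across two groups.

```python
# Minimal sketch (hypothetical data): how variable type guides test choice.
import numpy as np
from scipy import stats

# Categorical exposure vs categorical outcome -> chi-square test of independence.
# Rows: exposed / unexposed; columns: disease X present / absent (made-up counts).
contingency = np.array([[30, 70],
                        [15, 85]])
chi2, p_chi, dof, expected = stats.chi2_contingency(contingency)
print(f"Chi-square = {chi2:.2f}, p = {p_chi:.3f}")

# Binary group vs continuous outcome -> independent-samples t-test.
rng = np.random.default_rng(42)
glucose_intervention = rng.normal(5.4, 0.8, size=50)   # simulated mmol/L values
glucose_control      = rng.normal(5.9, 0.8, size=50)
t, p_t = stats.ttest_ind(glucose_intervention, glucose_control)
print(f"t = {t:.2f}, p = {p_t:.3f}")

# Each p-value is then compared against the pre-specified significance level (e.g. 0.05).
```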

7. Creating shell tables

Data analysis involves three levels of analysis: univariable, bivariable and multivariable analysis, in increasing order of complexity. Shell tables should be created in anticipation of the results that will be obtained from these different levels of analysis. Read our blog article on how to present tables and figures for more details. Suppose you carry out a study to investigate the prevalence and associated factors of a certain disease “X” in a population; the shell tables could then be represented as in Tables 1, 2 and 3 below.

Table 1: Example of a shell table from univariate analysis


Table 2: Example of a shell table from bivariate analysis


Table 3: Example of a shell table from multivariate analysis


aOR = adjusted odds ratio
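As an illustration of how shell tables can be prepared before any analysis is run, the sketch below builds empty placeholder tables with pandas; the row labels and column headings are hypothetical and would be adapted to the study's own variables.

```python
# Minimal sketch (hypothetical variables): pre-building empty shell tables.
import pandas as pd

rows = ["Age group: <30", "Age group: >=30", "Sex: Male", "Sex: Female"]

# Table 1 shell: univariable analysis (frequencies and percentages left blank).
table1 = pd.DataFrame(index=rows, columns=["n", "%"])

# Table 2 shell: bivariable analysis against disease X status.
table2 = pd.DataFrame(index=rows,
                      columns=["Disease X: Yes, n (%)", "Disease X: No, n (%)",
                               "Crude OR (95% CI)", "p-value"])

# Table 3 shell: multivariable analysis (adjusted odds ratios).
table3 = pd.DataFrame(index=rows, columns=["aOR (95% CI)", "p-value"])

print(table1, table2, table3, sep="\n\n")  # empty cells to be filled after analysis
```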

Now that you have learned how to create a data analysis plan, here are the takeaway points. A good plan should clearly state the:

  • Research question, objectives, and hypotheses
  • Dataset to be used
  • Variable types and their role
  • Statistical software and statistical methods
  • Shell tables for univariate, bivariate and multivariate analysis

Further readings

Creating a Data Analysis Plan: What to Consider When Choosing Statistics for a Study https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4552232/pdf/cjhp-68-311.pdf

Creating an Analysis Plan: https://www.cdc.gov/globalhealth/healthprotection/fetp/training_modules/9/creating-analysis-plan_pw_final_09242013.pdf

Data Analysis Plan: https://www.statisticssolutions.com/dissertation-consulting-services/data-analysis-plan-2/


Barche Blaise

Dr Barche is a physician and holds a Masters in Public Health. He is a senior fellow at CRENC with interests in Data Science and Data Analysis.




Data Analysis in Research: Types & Methods


Content Index

  • What is data analysis in research?
  • Why analyze data in research?
  • Types of data in research
  • Finding patterns in the qualitative data
  • Methods used for data analysis in qualitative research
  • Preparing data for analysis
  • Methods used for data analysis in quantitative research
  • Considerations in research data analysis

Definition of data analysis in research: According to LeCompte and Schensul, research data analysis is a process used by researchers to reduce data to a story and interpret it to derive insights. The data analysis process helps reduce a large chunk of data into smaller fragments that make sense.

Three essential things occur during the data analysis process. The first is data organization. The second is data reduction through summarization and categorization, which helps find patterns and themes in the data for easy identification and linking. The third and last is data analysis itself, which researchers carry out in both top-down and bottom-up fashion.

LEARN ABOUT: Research Process Steps

On the other hand, Marshall and Rossman describe data analysis as a messy, ambiguous, and time-consuming but creative and fascinating process through which a mass of collected data is brought to order, structure and meaning.

We can say that data analysis and interpretation is the process of applying deductive and inductive logic to the research data.

Researchers rely heavily on data, as they have a story to tell or research problems to solve. It starts with a question, and data is nothing but an answer to that question. But what if there is no question to ask? Well, it is possible to explore data even without a problem – we call it ‘data mining’, and it often reveals interesting patterns within the data that are worth exploring.

Regardless of the type of data researchers explore, their mission and their audience’s vision guide them in finding the patterns that shape the story they want to tell. One of the essential things expected from researchers while analyzing data is to stay open and remain unbiased toward unexpected patterns, expressions, and results. Remember, data analysis sometimes tells the most unforeseen yet exciting stories that were not expected when the analysis began. Therefore, rely on the data you have at hand and enjoy the journey of exploratory research.


Data describes things once a specific value is assigned to them. For analysis, you need to organize these values and process and present them in a given context to make them useful. Data can come in different forms; here are the primary data types.

  • Qualitative data: When the data presented consists of words and descriptions, we call it qualitative data. Although you can observe this data, it is subjective and harder to analyze, especially for comparison. Example: anything describing taste, experience, texture, or an opinion is qualitative data. This type of data is usually collected through focus groups, personal qualitative interviews, qualitative observation, or open-ended questions in surveys.
  • Quantitative data: Any data expressed in numbers or numerical figures is called quantitative data. This type of data can be distinguished into categories, grouped, measured, calculated, or ranked. Example: questions about age, rank, cost, length, weight, scores, etc. all fall under this type of data. You can present such data in graphical formats or charts, or apply statistical analysis methods to it. The OMS (Outcomes Measurement Systems) questionnaires in surveys are a significant source of numeric data.
  • Categorical data: This is data presented in groups. However, an item included in categorical data cannot belong to more than one group. Example: a person responding to a survey with their living style, marital status, smoking habit, or drinking habit provides categorical data. A chi-square test is a standard method used to analyze this data.

Learn More : Examples of Qualitative Data in Education

Data analysis in qualitative research

Data analysis in qualitative research works a little differently from numerical data, as qualitative data is made up of words, descriptions, images, objects, and sometimes symbols. Getting insights from such complex information is a complicated process; hence, it is typically used for exploratory research and data analysis.

Although there are several ways to find patterns in textual information, a word-based method is the most relied-upon and widely used technique for research and data analysis. Notably, the data analysis process in qualitative research is largely manual. Here the researchers usually read the available data and identify repetitive or commonly used words.

For example, while studying data collected from African countries to understand the most pressing issues people face, researchers might find  “food”  and  “hunger” are the most commonly used words and will highlight them for further analysis.
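A minimal sketch of this word-based counting in Python, using a few invented free-text responses purely for illustration:

```python
# Minimal sketch (made-up responses): counting frequently used words.
from collections import Counter
import re

responses = [
    "Food prices keep rising and hunger is a daily worry",
    "Access to food is limited in our village",
    "Hunger affects school attendance for children",
]

words = []
for text in responses:
    words.extend(re.findall(r"[a-z']+", text.lower()))  # simple tokenization

# Ignore very common stop words (tiny illustrative list).
stop_words = {"and", "is", "a", "to", "in", "our", "for", "the", "keep"}
counts = Counter(w for w in words if w not in stop_words)

print(counts.most_common(5))  # e.g. 'food' and 'hunger' surface near the top
```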

LEARN ABOUT: Level of Analysis

The keyword context is another widely used word-based technique. In this method, the researcher tries to understand the concept by analyzing the context in which the participants use a particular keyword.  

For example , researchers conducting research and data analysis for studying the concept of ‘diabetes’ amongst respondents might analyze the context of when and how the respondent has used or referred to the word ‘diabetes.’

The scrutiny-based technique is also one of the highly recommended text analysis methods used to identify patterns in qualitative data. Compare and contrast is a widely used method under this technique, used to determine how a specific text is similar to or different from another.

For example: to find out the “importance of a resident doctor in a company,” the collected data is divided into people who think it is necessary to hire a resident doctor and those who think it is unnecessary. Compare and contrast is the best method for analyzing polls with single-answer question types.

Metaphors can be used to reduce the data pile and find patterns in it so that it becomes easier to connect data with theory.

Variable Partitioning is another technique used to split variables so that researchers can find more coherent descriptions and explanations from the enormous data.

LEARN ABOUT: Qualitative Research Questions and Questionnaires

There are several techniques to analyze the data in qualitative research; here are some commonly used methods:

  • Content Analysis: This is widely accepted and the most frequently employed technique for data analysis in research methodology. It can be used to analyze documented information from text, images, and sometimes physical items. The research questions determine when and where to use this method.
  • Narrative Analysis: This method is used to analyze content gathered from various sources such as personal interviews, field observation, and surveys. Most of the time, the stories or opinions shared by people are focused on finding answers to the research questions.
  • Discourse Analysis:  Similar to narrative analysis, discourse analysis is used to analyze the interactions with people. Nevertheless, this particular method considers the social context under which or within which the communication between the researcher and respondent takes place. In addition to that, discourse analysis also focuses on the lifestyle and day-to-day environment while deriving any conclusion.
  • Grounded Theory:  When you want to explain why a particular phenomenon happened, then using grounded theory for analyzing quality data is the best resort. Grounded theory is applied to study data about the host of similar cases occurring in different settings. When researchers are using this method, they might alter explanations or produce new ones until they arrive at some conclusion.

LEARN ABOUT: 12 Best Tools for Researchers

Data analysis in quantitative research

The first stage in research and data analysis is to prepare the data for analysis so that the raw data can be converted into something meaningful. Data preparation consists of the phases below.

Phase I: Data Validation

Data validation is done to understand whether the collected data sample meets the pre-set standards or is a biased sample. It is divided into four different stages:

  • Fraud: To ensure an actual human being records each response to the survey or the questionnaire
  • Screening: To make sure each participant or respondent is selected or chosen in compliance with the research criteria
  • Procedure: To ensure ethical standards were maintained while collecting the data sample
  • Completeness: To ensure that the respondent answered all the questions in an online survey or, in an interview, that the interviewer asked all the questions devised in the questionnaire.

Phase II: Data Editing

More often than not, an extensive research data sample comes loaded with errors. Respondents sometimes fill in some fields incorrectly or skip them accidentally. Data editing is a process wherein the researchers confirm that the provided data is free of such errors. They conduct the necessary checks, including outlier checks, to edit the raw data and make it ready for analysis.

Phase III: Data Coding

Out of all three, this is the most critical phase of data preparation, associated with grouping and assigning values to the survey responses. If a survey is completed with a sample size of 1,000, the researcher can create age brackets to distinguish respondents based on their age. It thus becomes easier to analyze small data buckets rather than deal with the massive data pile.
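For instance, here is a minimal pandas sketch of such coding, with invented ages and illustrative bracket boundaries:

```python
# Minimal sketch (hypothetical ages and brackets): coding age into buckets.
import pandas as pd

df = pd.DataFrame({"respondent_id": range(1, 9),
                   "age": [19, 23, 31, 37, 44, 52, 61, 70]})

# Assign each respondent to an age bracket (labels are illustrative).
df["age_bracket"] = pd.cut(df["age"],
                           bins=[17, 25, 35, 50, 65, 120],
                           labels=["18-25", "26-35", "36-50", "51-65", "65+"])

print(df["age_bracket"].value_counts().sort_index())
```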

LEARN ABOUT: Steps in Qualitative Research

After the data is prepared for analysis, researchers are open to using different research and data analysis methods to derive meaningful insights. Statistical methods are the most favored for analyzing numerical data. In statistical analysis, distinguishing between categorical data and numerical data is essential, as categorical data involve distinct categories or labels, while numerical data consist of measurable quantities. Statistical methods fall into two groups: descriptive statistics, used to describe the data, and inferential statistics, which help in comparing and generalizing from the data.

Descriptive statistics

This method is used to describe the basic features of the various types of data in research. It presents the data in such a meaningful way that patterns in the data start making sense. Nevertheless, descriptive analysis does not allow conclusions to be drawn beyond the data themselves; any conclusions are still based on the hypotheses researchers have formulated so far. Here are a few major types of descriptive analysis methods.

Measures of Frequency

  • Count, Percent, Frequency
  • It is used to denote how often a particular event occurs.
  • Researchers use it when they want to showcase how often a response is given.

Measures of Central Tendency

  • Mean, Median, Mode
  • The method is widely used to summarize a distribution through its central points.
  • Researchers use this method when they want to showcase the most commonly or averagely indicated response.

Measures of Dispersion or Variation

  • Range, Variance, Standard deviation
  • The range is the difference between the highest and lowest scores.
  • The variance and standard deviation describe how far the observed scores deviate from the mean.
  • These measures are used to identify the spread of scores by stating intervals.
  • Researchers use this method to show how spread out the data are, and how much that spread affects the mean.

Measures of Position

  • Percentile ranks, Quartile ranks
  • It relies on standardized scores helping researchers to identify the relationship between different scores.
  • It is often used when researchers want to compare scores with the average count.
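The sketch below computes these descriptive measures with pandas on a small set of invented scores; the values and the column name are placeholders.

```python
# Minimal sketch (invented scores): common descriptive statistics.
import pandas as pd

scores = pd.Series([55, 60, 62, 65, 65, 70, 72, 75, 80, 95], name="score")

print("Count:", scores.count())                     # measure of frequency
print("Mean:", scores.mean())                       # central tendency
print("Median:", scores.median())
print("Mode:", scores.mode().tolist())
print("Range:", scores.max() - scores.min())        # dispersion
print("Variance:", scores.var())
print("Std deviation:", scores.std())
print("25th/75th percentiles:", scores.quantile([0.25, 0.75]).tolist())  # position
```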

In quantitative research, descriptive analysis often gives absolute numbers, but these alone are never sufficient to demonstrate the rationale behind those numbers. Nevertheless, it is necessary to think of the best method for research and data analysis that suits your survey questionnaire and the story researchers want to tell. For example, the mean is the best way to demonstrate students’ average scores in schools. It is better to rely on descriptive statistics when the researchers intend to keep the research or outcome limited to the provided sample without generalizing it. For example, when you want to compare the average votes cast in two different cities, descriptive statistics are enough.

Descriptive analysis is also called a ‘univariate analysis’ since it is commonly used to analyze a single variable.

Inferential statistics

Inferential statistics are used to make predictions about a larger population after research and data analysis of a sample representing that population. For example, you could ask some 100 audience members at a movie theater if they like the movie they are watching. Researchers then use inferential statistics on the collected sample to reason that about 80-90% of people like the movie.

Here are two significant areas of inferential statistics.

  • Estimating parameters: It takes statistics from the sample research data and demonstrates something about the population parameter.
  • Hypothesis test: It’s about sampling research data to answer the survey research questions. For example, researchers might be interested in understanding whether a newly launched shade of lipstick is good or not, or whether multivitamin capsules help children perform better at games.
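As a hedged sketch of parameter estimation, the snippet below computes a normal-approximation 95% confidence interval for the movie theater example; the counts are invented.

```python
# Minimal sketch (invented counts): estimating a population proportion
# with a normal-approximation 95% confidence interval.
import math

n = 100          # sample size: audience members asked
liked = 85       # number who said they liked the movie

p_hat = liked / n
se = math.sqrt(p_hat * (1 - p_hat) / n)
z = 1.96         # critical value for a 95% confidence level

lower, upper = p_hat - z * se, p_hat + z * se
print(f"Sample proportion: {p_hat:.2f}")
print(f"95% CI for the population proportion: ({lower:.2f}, {upper:.2f})")
```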

Inferential statistics are sophisticated analysis methods used to showcase the relationship between different variables rather than describing a single variable. They are often used when researchers want something beyond absolute numbers to understand the relationships between variables.

Here are some of the commonly used methods for data analysis in research.

  • Correlation: When researchers are not conducting experimental or quasi-experimental research but are interested in understanding the relationship between two or more variables, they opt for correlational research methods.
  • Cross-tabulation: Also called contingency tables, cross-tabulation is used to analyze the relationship between multiple variables. Suppose the provided data have age and gender categories presented in rows and columns. A two-dimensional cross-tabulation helps for seamless data analysis and research by showing the number of males and females in each age category.
  • Regression analysis: For understanding the strength of the relationship between two variables, researchers rarely look beyond the primary and commonly used regression analysis method, which is also a type of predictive analysis. In this method, you have an essential factor called the dependent variable, along with multiple independent variables. You undertake efforts to find out the impact of the independent variables on the dependent variable. The values of both independent and dependent variables are assumed to have been ascertained in an error-free random manner.
  • Frequency tables: These summarize how often each value or category of a variable occurs, and they are often the starting point before applying further statistical tests.
  • Analysis of variance (ANOVA): This statistical procedure is used to test the degree to which two or more groups vary or differ in an experiment. A considerable degree of variation means the research findings were significant. In many contexts, ANOVA testing and variance analysis are similar.
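A minimal sketch of two of these methods in Python: a cross-tabulation of invented age and gender categories, and a simple correlation/regression between two invented numeric variables (all column names and values are hypothetical).

```python
# Minimal sketch (hypothetical survey data): cross-tabulation and regression.
import pandas as pd
from scipy import stats

df = pd.DataFrame({
    "gender":    ["F", "M", "F", "M", "F", "M", "F", "M"],
    "age_group": ["18-35", "18-35", "36-60", "36-60", "18-35", "36-60", "36-60", "18-35"],
    "ad_spend":  [10, 20, 30, 40, 50, 60, 70, 80],    # independent variable
    "sales":     [15, 22, 33, 41, 52, 58, 72, 79],    # dependent variable
})

# Cross-tabulation: counts of males and females in each age group.
print(pd.crosstab(df["age_group"], df["gender"]))

# Correlation and simple linear regression of sales on ad spend.
result = stats.linregress(df["ad_spend"], df["sales"])
print(f"r = {result.rvalue:.2f}, slope = {result.slope:.2f}, p = {result.pvalue:.4f}")
```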
Considerations in research data analysis

  • Researchers must have the necessary research skills to analyze and manipulate the data, and should be trained to demonstrate a high standard of research practice. Ideally, researchers should possess more than a basic understanding of the rationale for selecting one statistical method over another to obtain better data insights.
  • Research and data analytics projects usually differ by scientific discipline; therefore, getting statistical advice at the beginning of the analysis helps in designing the survey questionnaire, selecting data collection methods, and choosing samples.

LEARN ABOUT: Best Data Collection Tools

  • The primary aim of research data analysis is to derive insights that are unbiased. Any mistake in collecting the data, selecting an analysis method, or choosing an audience sample, or keeping a biased mind while doing any of these, will lead to a biased inference.
  • No degree of sophistication in the analysis can rectify poorly defined objective outcome measurements. Whether the design is at fault or the intentions are not clear, a lack of clarity can mislead readers, so avoid this practice.
  • The motive behind data analysis in research is to present accurate and reliable data. As far as possible, avoid statistical errors, and find ways to deal with everyday challenges like outliers, missing data, data altering, data mining, or developing graphical representations.

LEARN MORE: Descriptive Research vs Correlational Research

The sheer amount of data generated daily is staggering, especially now that data analysis has taken center stage. In 2018, the total data supply amounted to 2.8 trillion gigabytes. Hence, it is clear that enterprises willing to survive in a hypercompetitive world must possess an excellent capability to analyze complex research data, derive actionable insights, and adapt to new market needs.




A Step-by-Step Guide to the Data Analysis Process

Like any scientific discipline, data analysis follows a rigorous step-by-step process. Each stage requires different skills and know-how. To get meaningful insights, though, it’s important to understand the process as a whole. An underlying framework is invaluable for producing results that stand up to scrutiny.

In this post, we’ll explore the main steps in the data analysis process. This will cover how to define your goal, collect data, and carry out an analysis. Where applicable, we’ll also use examples and highlight a few tools to make the journey easier. When you’re done, you’ll have a much better understanding of the basics. This will help you tweak the process to fit your own needs.

Here are the steps we’ll take you through:

  • Defining the question
  • Collecting the data
  • Cleaning the data
  • Analyzing the data
  • Sharing your results
  • Embracing failure


Ready? Let’s get started with step one.

1. Step one: Defining the question

The first step in any data analysis process is to define your objective. In data analytics jargon, this is sometimes called the ‘problem statement’.

Defining your objective means coming up with a hypothesis and figuring how to test it. Start by asking: What business problem am I trying to solve? While this might sound straightforward, it can be trickier than it seems. For instance, your organization’s senior management might pose an issue, such as: “Why are we losing customers?” It’s possible, though, that this doesn’t get to the core of the problem. A data analyst’s job is to understand the business and its goals in enough depth that they can frame the problem the right way.

Let’s say you work for a fictional company called TopNotch Learning. TopNotch creates custom training software for its clients. While it is excellent at securing new clients, it has much lower repeat business. As such, your question might not be, “Why are we losing customers?” but, “Which factors are negatively impacting the customer experience?” or better yet: “How can we boost customer retention while minimizing costs?”

Now you’ve defined a problem, you need to determine which sources of data will best help you solve it. This is where your business acumen comes in again. For instance, perhaps you’ve noticed that the sales process for new clients is very slick, but that the production team is inefficient. Knowing this, you could hypothesize that the sales process wins lots of new clients, but the subsequent customer experience is lacking. Could this be why customers don’t come back? Which sources of data will help you answer this question?

Tools to help define your objective

Defining your objective is mostly about soft skills, business knowledge, and lateral thinking. But you’ll also need to keep track of business metrics and key performance indicators (KPIs). Monthly reports can allow you to track problem points in the business. Some KPI dashboards come with a fee, like Databox and DashThis. However, you’ll also find open-source software like Grafana, Freeboard, and Dashbuilder. These are great for producing simple dashboards, both at the beginning and the end of the data analysis process.

2. Step two: Collecting the data

Once you’ve established your objective, you’ll need to create a strategy for collecting and aggregating the appropriate data. A key part of this is determining which data you need. This might be quantitative (numeric) data, e.g. sales figures, or qualitative (descriptive) data, such as customer reviews. All data fit into one of three categories: first-party, second-party, and third-party data. Let’s explore each one.

What is first-party data?

First-party data are data that you, or your company, have directly collected from customers. It might come in the form of transactional tracking data or information from your company’s customer relationship management (CRM) system. Whatever its source, first-party data is usually structured and organized in a clear, defined way. Other sources of first-party data might include customer satisfaction surveys, focus groups, interviews, or direct observation.

What is second-party data?

To enrich your analysis, you might want to secure a secondary data source. Second-party data is the first-party data of other organizations. This might be available directly from the company or through a private marketplace. The main benefit of second-party data is that they are usually structured, and although they will be less relevant than first-party data, they also tend to be quite reliable. Examples of second-party data include website, app or social media activity, like online purchase histories, or shipping data.

What is third-party data?

Third-party data is data that has been collected and aggregated from numerous sources by a third-party organization. Often (though not always) third-party data contains a vast amount of unstructured data points (big data). Many organizations collect big data to create industry reports or to conduct market research. The research and advisory firm Gartner is a good real-world example of an organization that collects big data and sells it on to other companies. Open data repositories and government portals are also sources of third-party data .

Tools to help you collect data

Once you’ve devised a data strategy (i.e. you’ve identified which data you need, and how best to go about collecting them) there are many tools you can use to help you. One thing you’ll need, regardless of industry or area of expertise, is a data management platform (DMP). A DMP is a piece of software that allows you to identify and aggregate data from numerous sources, before manipulating them, segmenting them, and so on. There are many DMPs available. Some well-known enterprise DMPs include Salesforce DMP, SAS, and the data integration platform, Xplenty. If you want to play around, you can also try some open-source platforms like Pimcore or D:Swarm.

Want to learn more about what data analytics is and the process a data analyst follows? We cover this topic (and more) in our free introductory short course for beginners. Check out tutorial one: An introduction to data analytics .

3. Step three: Cleaning the data

Once you’ve collected your data, the next step is to get it ready for analysis. This means cleaning, or ‘scrubbing’ it, and is crucial in making sure that you’re working with high-quality data . Key data cleaning tasks include:

  • Removing major errors, duplicates, and outliers —all of which are inevitable problems when aggregating data from numerous sources.
  • Removing unwanted data points —extracting irrelevant observations that have no bearing on your intended analysis.
  • Bringing structure to your data —general ‘housekeeping’, i.e. fixing typos or layout issues, which will help you map and manipulate your data more easily.
  • Filling in major gaps —as you’re tidying up, you might notice that important data are missing. Once you’ve identified gaps, you can go about filling them.
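Several of these tasks can be scripted. Here is a minimal pandas sketch on an invented dataset; the column names, casing fix, and outlier threshold are placeholders rather than a prescription.

```python
# Minimal sketch (invented data): common data cleaning steps with pandas.
import pandas as pd

df = pd.DataFrame({
    "customer_id": [1, 2, 2, 3, 4, 5],
    "country":     ["US", "US", "US", "uk", "US", None],
    "order_value": [120.0, 85.5, 85.5, 99.0, 10500.0, 42.0],  # 10500 looks like an outlier
})

df = df.drop_duplicates()                         # remove duplicate rows
df["country"] = df["country"].str.upper()         # bring structure: fix inconsistent casing
df = df[df["order_value"] < 5000]                 # drop an implausible outlier
df["country"] = df["country"].fillna("UNKNOWN")   # fill a gap explicitly

print(df)
```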

A good data analyst will spend around 70-90% of their time cleaning their data. This might sound excessive. But focusing on the wrong data points (or analyzing erroneous data) will severely impact your results. It might even send you back to square one…so don’t rush it! You’ll find a step-by-step guide to data cleaning here. You may be interested in this introductory tutorial to data cleaning, hosted by Dr. Humera Noor Minhas.

Carrying out an exploratory analysis

Another thing many data analysts do (alongside cleaning data) is to carry out an exploratory analysis. This helps identify initial trends and characteristics, and can even refine your hypothesis. Let’s use our fictional learning company as an example again. Carrying out an exploratory analysis, perhaps you notice a correlation between how much TopNotch Learning’s clients pay and how quickly they move on to new suppliers. This might suggest that a low-quality customer experience (the assumption in your initial hypothesis) is actually less of an issue than cost. You might, therefore, take this into account.

Tools to help you clean your data

Cleaning datasets manually—especially large ones—can be daunting. Luckily, there are many tools available to streamline the process. Open-source tools, such as OpenRefine, are excellent for basic data cleaning, as well as high-level exploration. However, free tools offer limited functionality for very large datasets. Python libraries (e.g. Pandas) and some R packages are better suited for heavy data scrubbing. You will, of course, need to be familiar with the languages. Alternatively, enterprise tools are also available. For example, Data Ladder is one of the highest-rated data-matching tools in the industry. There are many more. Why not see which free data cleaning tools you can find to play around with?

4. Step four: Analyzing the data

Finally, you’ve cleaned your data. Now comes the fun bit—analyzing it! The type of data analysis you carry out largely depends on what your goal is. But there are many techniques available. Univariate or bivariate analysis, time-series analysis, and regression analysis are just a few you might have heard of. More important than the different types, though, is how you apply them. This depends on what insights you’re hoping to gain. Broadly speaking, all types of data analysis fit into one of the following four categories.

Descriptive analysis

Descriptive analysis identifies what has already happened . It is a common first step that companies carry out before proceeding with deeper explorations. As an example, let’s refer back to our fictional learning provider once more. TopNotch Learning might use descriptive analytics to analyze course completion rates for their customers. Or they might identify how many users access their products during a particular period. Perhaps they’ll use it to measure sales figures over the last five years. While the company might not draw firm conclusions from any of these insights, summarizing and describing the data will help them to determine how to proceed.
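To make this concrete, here is a minimal descriptive-analytics sketch in pandas; TopNotch Learning is fictional and the enrollment data below are invented.

```python
# Minimal sketch (invented data): descriptive analytics on course completions.
import pandas as pd

enrollments = pd.DataFrame({
    "course":    ["Onboarding", "Onboarding", "Compliance", "Compliance", "Sales 101"],
    "completed": [True, False, True, True, False],
})

# Completion rate per course: a simple "what has already happened" summary.
completion_rates = enrollments.groupby("course")["completed"].mean()
print(completion_rates)
```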

Learn more: What is descriptive analytics?

Diagnostic analysis

Diagnostic analytics focuses on understanding why something has happened . It is literally the diagnosis of a problem, just as a doctor uses a patient’s symptoms to diagnose a disease. Remember TopNotch Learning’s business problem? ‘Which factors are negatively impacting the customer experience?’ A diagnostic analysis would help answer this. For instance, it could help the company draw correlations between the issue (struggling to gain repeat business) and factors that might be causing it (e.g. project costs, speed of delivery, customer sector, etc.) Let’s imagine that, using diagnostic analytics, TopNotch realizes its clients in the retail sector are departing at a faster rate than other clients. This might suggest that they’re losing customers because they lack expertise in this sector. And that’s a useful insight!

Predictive analysis

Predictive analysis allows you to identify future trends based on historical data . In business, predictive analysis is commonly used to forecast future growth, for example. But it doesn’t stop there. Predictive analysis has grown increasingly sophisticated in recent years. The speedy evolution of machine learning allows organizations to make surprisingly accurate forecasts. Take the insurance industry. Insurance providers commonly use past data to predict which customer groups are more likely to get into accidents. As a result, they’ll hike up customer insurance premiums for those groups. Likewise, the retail industry often uses transaction data to predict where future trends lie, or to determine seasonal buying habits to inform their strategies. These are just a few simple examples, but the untapped potential of predictive analysis is pretty compelling.

Prescriptive analysis

Prescriptive analysis allows you to make recommendations for the future. This is the final step in the analytics part of the process. It’s also the most complex. This is because it incorporates aspects of all the other analyses we’ve described. A great example of prescriptive analytics is the algorithms that guide Google’s self-driving cars. Every second, these algorithms make countless decisions based on past and present data, ensuring a smooth, safe ride. Prescriptive analytics also helps companies decide on new products or areas of business to invest in.

Learn more:  What are the different types of data analysis?

5. Step five: Sharing your results

You’ve finished carrying out your analyses. You have your insights. The final step of the data analytics process is to share these insights with the wider world (or at least with your organization’s stakeholders!) This is more complex than simply sharing the raw results of your work—it involves interpreting the outcomes, and presenting them in a manner that’s digestible for all types of audiences. Since you’ll often present information to decision-makers, it’s very important that the insights you present are 100% clear and unambiguous. For this reason, data analysts commonly use reports, dashboards, and interactive visualizations to support their findings.

How you interpret and present results will often influence the direction of a business. Depending on what you share, your organization might decide to restructure, to launch a high-risk product, or even to close an entire division. That’s why it’s very important to provide all the evidence that you’ve gathered, and not to cherry-pick data. Ensuring that you cover everything in a clear, concise way will prove that your conclusions are scientifically sound and based on the facts. On the flip side, it’s important to highlight any gaps in the data or to flag any insights that might be open to interpretation. Honest communication is the most important part of the process. It will help the business, while also helping you to excel at your job!

Tools for interpreting and sharing your findings

There are tons of data visualization tools available, suited to different experience levels. Popular tools requiring little or no coding skills include Google Charts, Tableau, Datawrapper, and Infogram. If you’re familiar with Python and R, there are also many data visualization libraries and packages available. For instance, check out the Python libraries Plotly, Seaborn, and Matplotlib. Whichever data visualization tools you use, make sure you polish up your presentation skills, too. Remember: Visualization is great, but communication is key!

You can learn more about storytelling with data in this free, hands-on tutorial .  We show you how to craft a compelling narrative for a real dataset, resulting in a presentation to share with key stakeholders. This is an excellent insight into what it’s really like to work as a data analyst!

6. Step six: Embrace your failures

The last ‘step’ in the data analytics process is to embrace your failures. The path we’ve described above is more of an iterative process than a one-way street. Data analytics is inherently messy, and the process you follow will be different for every project. For instance, while cleaning data, you might spot patterns that spark a whole new set of questions. This could send you back to step one (to redefine your objective). Equally, an exploratory analysis might highlight a set of data points you’d never considered using before. Or maybe you find that the results of your core analyses are misleading or erroneous. This might be caused by mistakes in the data, or human error earlier in the process.

While these pitfalls can feel like failures, don’t be disheartened if they happen. Data analysis is inherently chaotic, and mistakes occur. What’s important is to hone your ability to spot and rectify errors. If data analytics was straightforward, it might be easier, but it certainly wouldn’t be as interesting. Use the steps we’ve outlined as a framework, stay open-minded, and be creative. If you lose your way, you can refer back to the process to keep yourself on track.

In this post, we’ve covered the main steps of the data analytics process. These core steps can be amended, re-ordered and re-used as you deem fit, but they underpin every data analyst’s work:

  • Define the question —What business problem are you trying to solve? Frame it as a question to help you focus on finding a clear answer.
  • Collect data —Create a strategy for collecting data. Which data sources are most likely to help you solve your business problem?
  • Clean the data —Explore, scrub, tidy, de-dupe, and structure your data as needed. Do whatever you have to! But don’t rush…take your time!
  • Analyze the data —Carry out various analyses to obtain insights. Focus on the four types of data analysis: descriptive, diagnostic, predictive, and prescriptive.
  • Share your results —How best can you share your insights and recommendations? A combination of visualization tools and communication is key.
  • Embrace your mistakes —Mistakes happen. Learn from them. This is what transforms a good data analyst into a great one.

What next? From here, we strongly encourage you to explore the topic on your own. Get creative with the steps in the data analysis process, and see what tools you can find. As long as you stick to the core principles we’ve described, you can create a tailored technique that works for you.

To learn more, check out our free, 5-day data analytics short course . You might also be interested in the following:

  • These are the top 9 data analytics tools
  • 10 great places to find free datasets for your next project
  • How to build a data analytics portfolio

Can J Hosp Pharm, v.68(4); Jul-Aug 2015

Creating a Data Analysis Plan: What to Consider When Choosing Statistics for a Study

There are three kinds of lies: lies, damned lies, and statistics. – Mark Twain 1

INTRODUCTION

Statistics represent an essential part of a study because, regardless of the study design, investigators need to summarize the collected information for interpretation and presentation to others. It is therefore important for us to heed Mr Twain’s concern when creating the data analysis plan. In fact, even before data collection begins, we need to have a clear analysis plan that will guide us from the initial stages of summarizing and describing the data through to testing our hypotheses.

The purpose of this article is to help you create a data analysis plan for a quantitative study. For those interested in conducting qualitative research, previous articles in this Research Primer series have provided information on the design and analysis of such studies. 2 , 3 Information in the current article is divided into 3 main sections: an overview of terms and concepts used in data analysis, a review of common methods used to summarize study data, and a process to help identify relevant statistical tests. My intention here is to introduce the main elements of data analysis and provide a place for you to start when planning this part of your study. Biostatistical experts, textbooks, statistical software packages, and other resources can certainly add more breadth and depth to this topic when you need additional information and advice.

TERMS AND CONCEPTS USED IN DATA ANALYSIS

When analyzing information from a quantitative study, we are often dealing with numbers; therefore, it is important to begin with an understanding of the source of the numbers. Let us start with the term variable , which defines a specific item of information collected in a study. Examples of variables include age, sex or gender, ethnicity, exercise frequency, weight, treatment group, and blood glucose. Each variable will have a group of categories, which are referred to as values , to help describe the characteristic of an individual study participant. For example, the variable “sex” would have values of “male” and “female”.

Although variables can be defined or grouped in various ways, I will focus on 2 methods at this introductory stage. First, variables can be defined according to the level of measurement. The categories in a nominal variable are names, for example, male and female for the variable “sex”; white, Aboriginal, black, Latin American, South Asian, and East Asian for the variable “ethnicity”; and intervention and control for the variable “treatment group”. Nominal variables with only 2 categories are also referred to as dichotomous variables because the study group can be divided into 2 subgroups based on information in the variable. For example, a study sample can be split into 2 groups (patients receiving the intervention and controls) using the dichotomous variable “treatment group”. An ordinal variable implies that the categories can be placed in a meaningful order, as would be the case for exercise frequency (never, sometimes, often, or always). Nominal-level and ordinal-level variables are also referred to as categorical variables, because each category in the variable can be completely separated from the others. The categories for an interval variable can be placed in a meaningful order, with the interval between consecutive categories also having meaning. Age, weight, and blood glucose can be considered as interval variables, but also as ratio variables, because the ratio between values has meaning (e.g., a 15-year-old is half the age of a 30-year-old). Interval-level and ratio-level variables are also referred to as continuous variables because of the underlying continuity among categories.

As we progress through the levels of measurement from nominal to ratio variables, we gather more information about the study participant. The amount of information that a variable provides will become important in the analysis stage, because we lose information when variables are reduced or aggregated—a common practice that is not recommended. 4 For example, if age is reduced from a ratio-level variable (measured in years) to an ordinal variable (categories of < 65 and ≥ 65 years) we lose the ability to make comparisons across the entire age range and introduce error into the data analysis. 4

A second method of defining variables is to consider them as either dependent or independent. As the terms imply, the value of a dependent variable depends on the value of other variables, whereas the value of an independent variable does not rely on other variables. In addition, an investigator can influence the value of an independent variable, such as treatment-group assignment. Independent variables are also referred to as predictors because we can use information from these variables to predict the value of a dependent variable. Building on the group of variables listed in the first paragraph of this section, blood glucose could be considered a dependent variable, because its value may depend on values of the independent variables age, sex, ethnicity, exercise frequency, weight, and treatment group.

Statistics are mathematical formulae that are used to organize and interpret the information that is collected through variables. There are 2 general categories of statistics, descriptive and inferential. Descriptive statistics are used to describe the collected information, such as the range of values, their average, and the most common category. Knowledge gained from descriptive statistics helps investigators learn more about the study sample. Inferential statistics are used to make comparisons and draw conclusions from the study data. Knowledge gained from inferential statistics allows investigators to make inferences and generalize beyond their study sample to other groups.

Before we move on to specific descriptive and inferential statistics, there are 2 more definitions to review. Parametric statistics are generally used when values in an interval-level or ratio-level variable are normally distributed (i.e., the entire group of values has a bell-shaped curve when plotted by frequency). These statistics are used because we can define parameters of the data, such as the centre and width of the normally distributed curve. In contrast, interval-level and ratio-level variables with values that are not normally distributed, as well as nominal-level and ordinal-level variables, are generally analyzed using nonparametric statistics.

METHODS FOR SUMMARIZING STUDY DATA: DESCRIPTIVE STATISTICS

The first step in a data analysis plan is to describe the data collected in the study. This can be done using figures to give a visual presentation of the data and statistics to generate numeric descriptions of the data.

Selection of an appropriate figure to represent a particular set of data depends on the measurement level of the variable. Data for nominal-level and ordinal-level variables may be interpreted using a pie graph or bar graph . Both options allow us to examine the relative number of participants within each category (by reporting the percentages within each category), whereas a bar graph can also be used to examine absolute numbers. For example, we could create a pie graph to illustrate the proportions of men and women in a study sample and a bar graph to illustrate the number of people who report exercising at each level of frequency (never, sometimes, often, or always).

Interval-level and ratio-level variables may also be interpreted using a pie graph or bar graph; however, these types of variables often have too many categories for such graphs to provide meaningful information. Instead, these variables may be better interpreted using a histogram . Unlike a bar graph, which displays the frequency for each distinct category, a histogram displays the frequency within a range of continuous categories. Information from this type of figure allows us to determine whether the data are normally distributed. In addition to pie graphs, bar graphs, and histograms, many other types of figures are available for the visual representation of data. Interested readers can find additional types of figures in the books recommended in the “Further Readings” section.

Figures are also useful for visualizing comparisons between variables or between subgroups within a variable (for example, the distribution of blood glucose according to sex). Box plots are useful for summarizing information for a variable that does not follow a normal distribution. The lower and upper limits of the box identify the interquartile range (or 25th and 75th percentiles), while the midline indicates the median value (or 50th percentile). Scatter plots provide information on how the categories for one continuous variable relate to categories in a second variable; they are often helpful in the analysis of correlations.
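As an illustration of the figure types described above, the snippet below draws a histogram, a box plot, and a scatter plot with matplotlib on simulated values; the variables are invented and serve only to show the plotting calls.

```python
# Minimal sketch (simulated data): common figures for describing a variable.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
blood_glucose = rng.normal(5.5, 0.9, size=200)   # simulated mmol/L values
age = rng.normal(45, 12, size=200)               # simulated ages

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
axes[0].hist(blood_glucose, bins=20)             # histogram: check the distribution
axes[0].set_title("Histogram")
axes[1].boxplot(blood_glucose)                   # box plot: median and interquartile range
axes[1].set_title("Box plot")
axes[2].scatter(age, blood_glucose, s=10)        # scatter plot: relationship between variables
axes[2].set_title("Scatter plot")
plt.tight_layout()
plt.show()
```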

In addition to using figures to present a visual description of the data, investigators can use statistics to provide a numeric description. Regardless of the measurement level, we can find the mode by identifying the most frequent category within a variable. When summarizing nominal-level and ordinal-level variables, the simplest method is to report the proportion of participants within each category.

The choice of the most appropriate descriptive statistic for interval-level and ratio-level variables will depend on how the values are distributed. If the values are normally distributed, we can summarize the information using the parametric statistics of mean and standard deviation. The mean is the arithmetic average of all values within the variable, and the standard deviation tells us how widely the values are dispersed around the mean. When values of interval-level and ratio-level variables are not normally distributed, or we are summarizing information from an ordinal-level variable, it may be more appropriate to use the nonparametric statistics of median and range. The first step in identifying these descriptive statistics is to arrange study participants according to the variable categories from lowest value to highest value. The range is used to report the lowest and highest values. The median or 50th percentile is located by dividing the number of participants into 2 groups, such that half (50%) of the participants have values above the median and the other half (50%) have values below the median. Similarly, the 25th percentile is the value with 25% of the participants having values below and 75% of the participants having values above, and the 75th percentile is the value with 75% of participants having values below and 25% of participants having values above. Together, the 25th and 75th percentiles define the interquartile range .
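The sketch below applies this logic in Python: it checks the distribution with a Shapiro-Wilk test (one common way, though not the only one, to assess normality) and then reports either the mean and standard deviation or the median and interquartile range; the values are simulated.

```python
# Minimal sketch (simulated data): choosing mean/SD vs median/IQR by distribution.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
values = rng.lognormal(mean=1.0, sigma=0.6, size=120)   # deliberately skewed values

stat, p = stats.shapiro(values)                         # Shapiro-Wilk normality test
if p > 0.05:
    print(f"Approximately normal: mean = {values.mean():.2f}, SD = {values.std(ddof=1):.2f}")
else:
    q25, q50, q75 = np.percentile(values, [25, 50, 75])
    print(f"Not normal: median = {q50:.2f}, IQR = {q25:.2f} to {q75:.2f}")
```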

PROCESS TO IDENTIFY RELEVANT STATISTICAL TESTS: INFERENTIAL STATISTICS

One caveat about the information provided in this section: selecting the most appropriate inferential statistic for a specific study should be a combination of following these suggestions, seeking advice from experts, and discussing with your co-investigators. My intention here is to give you a place to start a conversation with your colleagues about the options available as you develop your data analysis plan.

There are 3 key questions to consider when selecting an appropriate inferential statistic for a study: What is the research question? What is the study design? and What is the level of measurement? It is important for investigators to carefully consider these questions when developing the study protocol and creating the analysis plan. The figures that accompany these questions show decision trees that will help you to narrow down the list of inferential statistics that would be relevant to a particular study. Appendix 1 provides brief definitions of the inferential statistics named in these figures. Additional information, such as the formulae for various inferential statistics, can be obtained from textbooks, statistical software packages, and biostatisticians.

What Is the Research Question?

The first step in identifying relevant inferential statistics for a study is to consider the type of research question being asked. You can find more details about the different types of research questions in a previous article in this Research Primer series that covered questions and hypotheses. 5 A relational question seeks information about the relationship among variables; in this situation, investigators will be interested in determining whether there is an association ( Figure 1 ). A causal question seeks information about the effect of an intervention on an outcome; in this situation, the investigator will be interested in determining whether there is a difference ( Figure 2 ).

Figure 1. Decision tree to identify inferential statistics for an association.

Figure 2. Decision tree to identify inferential statistics for measuring a difference.

What Is the Study Design?

When considering a question of association, investigators will be interested in measuring the relationship between variables ( Figure 1 ). A study designed to determine whether there is consensus among different raters will be measuring agreement. For example, an investigator may be interested in determining whether 2 raters, using the same assessment tool, arrive at the same score. Correlation analyses examine the strength of a relationship or connection between 2 variables, like age and blood glucose. Regression analyses also examine the strength of a relationship or connection; however, in this type of analysis, one variable is considered an outcome (or dependent variable) and the other variable is considered a predictor (or independent variable). Regression analyses often consider the influence of multiple predictors on an outcome at the same time. For example, an investigator may be interested in examining the association between a treatment and blood glucose, while also considering other factors, like age, sex, ethnicity, exercise frequency, and weight.

When considering a question of difference, investigators must first determine how many groups they will be comparing. In some cases, investigators may be interested in comparing the characteristic of one group with that of an external reference group. For example, is the mean age of study participants similar to the mean age of all people in the target group? If more than one group is involved, then investigators must also determine whether there is an underlying connection between the sets of values (or samples ) to be compared. Samples are considered independent or unpaired when the information is taken from different groups. For example, we could use an unpaired t test to compare the mean age between 2 independent samples, such as the intervention and control groups in a study. Samples are considered related or paired if the information is taken from the same group of people, for example, measurement of blood glucose at the beginning and end of a study. Because blood glucose is measured in the same people at both time points, we could use a paired t test to determine whether there has been a significant change in blood glucose.
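
The sketch below illustrates both situations with scipy.stats; the groups and values are simulated stand-ins, not real study data.

```python
# Unpaired and paired comparisons using scipy.stats with simulated data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Unpaired (independent) samples: ages in the intervention and control groups
age_intervention = rng.normal(54, 8, 40)
age_control = rng.normal(57, 8, 40)
t_unpaired, p_unpaired = stats.ttest_ind(age_intervention, age_control)

# Paired samples: blood glucose measured in the same people before and after
glucose_before = rng.normal(7.5, 1.0, 40)
glucose_after = glucose_before - rng.normal(0.5, 0.4, 40)
t_paired, p_paired = stats.ttest_rel(glucose_before, glucose_after)

print(f"Unpaired t test: t = {t_unpaired:.2f}, p = {p_unpaired:.3f}")
print(f"Paired t test:   t = {t_paired:.2f}, p = {p_paired:.3f}")
```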

What Is the Level of Measurement?

As described in the first section of this article, variables can be grouped according to the level of measurement (nominal, ordinal, or interval). In most cases, the independent variable in an inferential statistic will be nominal; therefore, investigators need to know the level of measurement for the dependent variable before they can select the relevant inferential statistic. Two exceptions to this consideration are correlation analyses and regression analyses ( Figure 1 ). Because a correlation analysis measures the strength of association between 2 variables, we need to consider the level of measurement for both variables. Regression analyses can consider multiple independent variables, often with a variety of measurement levels. However, for these analyses, investigators still need to consider the level of measurement for the dependent variable.

Selection of inferential statistics to test interval-level variables must include consideration of how the data are distributed. An underlying assumption for parametric tests is that the data approximate a normal distribution. When the data are not normally distributed, information derived from a parametric test may be wrong. 6 When the assumption of normality is violated (for example, when the data are skewed), then investigators should use a nonparametric test. If the data are normally distributed, then investigators can use a parametric test.
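
One common way to operationalize this decision, sketched below with simulated data, is to screen each group with a normality test (here the Shapiro–Wilk test) and then branch between a parametric and a nonparametric test; visual inspection of histograms remains an important complement to any formal test.

```python
# Checking the normality assumption before choosing between a parametric
# and a nonparametric test (simulated, deliberately skewed data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
group_a = rng.lognormal(mean=1.0, sigma=0.6, size=30)
group_b = rng.lognormal(mean=1.2, sigma=0.6, size=30)

# Shapiro-Wilk test: a small p value suggests the data are not normally distributed
normal_a = stats.shapiro(group_a).pvalue > 0.05
normal_b = stats.shapiro(group_b).pvalue > 0.05

if normal_a and normal_b:
    stat, p = stats.ttest_ind(group_a, group_b)        # parametric
    test_name = "unpaired t test"
else:
    stat, p = stats.mannwhitneyu(group_a, group_b)     # nonparametric
    test_name = "Mann-Whitney U test"

print(f"{test_name}: statistic = {stat:.2f}, p = {p:.3f}")
```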

ADDITIONAL CONSIDERATIONS

What Is the Level of Significance?

An inferential statistic is used to calculate a p value, the probability of obtaining the observed results (or more extreme results) by chance alone if there were truly no effect. Investigators can then compare this p value against a prespecified level of significance, which is often chosen to be 0.05. This level of significance represents a willingness to accept a 1 in 20 chance of concluding that an effect exists when it does not, which is considered an acceptable level of error.

What Are the Most Commonly Used Statistics?

In 1983, Emerson and Colditz 7 reported the first review of statistics used in original research articles published in the New England Journal of Medicine. This review of statistics used in the journal was updated in 1989 and 2005, 8 and this type of analysis has been replicated in many other journals. 9–13 Collectively, these reviews have identified 2 important observations. First, the overall sophistication of statistical methodology used and reported in studies has grown over time, with survival analyses and multivariable regression analyses becoming much more common. The second observation is that, despite this trend, 1 in 4 articles describe no statistical methods or report only simple descriptive statistics. When inferential statistics are used, the most common are t tests, contingency table tests (for example, χ2 test and Fisher exact test), and simple correlation and regression analyses. This information is important for educators, investigators, reviewers, and readers because it suggests that a good foundational knowledge of descriptive statistics and common inferential statistics will enable us to correctly evaluate the majority of research articles. 11–13 However, to fully take advantage of all research published in high-impact journals, we need to become acquainted with some of the more complex methods, such as multivariable regression analyses. 8,13

What Are Some Additional Resources?

As an investigator and Associate Editor with CJHP , I have often relied on the advice of colleagues to help create my own analysis plans and review the plans of others. Biostatisticians have a wealth of knowledge in the field of statistical analysis and can provide advice on the correct selection, application, and interpretation of these methods. Colleagues who have “been there and done that” with their own data analysis plans are also valuable sources of information. Identify these individuals and consult with them early and often as you develop your analysis plan.

Another important resource to consider when creating your analysis plan is textbooks. Numerous statistical textbooks are available, differing in levels of complexity and scope. The titles listed in the “Further Reading” section are just a few suggestions. I encourage interested readers to look through these and other books to find resources that best fit their needs. However, one crucial book that I highly recommend to anyone wanting to be an investigator or peer reviewer is Lang and Secic’s How to Report Statistics in Medicine (see “Further Reading”). As the title implies, this book covers a wide range of statistics used in medical research and provides numerous examples of how to correctly report the results.

CONCLUSIONS

When it comes to creating an analysis plan for your project, I recommend following the sage advice of Douglas Adams in The Hitchhiker’s Guide to the Galaxy : Don’t panic! 14 Begin with simple methods to summarize and visualize your data, then use the key questions and decision trees provided in this article to identify relevant statistical tests. Information in this article will give you and your co-investigators a place to start discussing the elements necessary for developing an analysis plan. But do not stop there! Use advice from biostatisticians and more experienced colleagues, as well as information in textbooks, to help create your analysis plan and choose the most appropriate statistics for your study. Making careful, informed decisions about the statistics to use in your study should reduce the risk of confirming Mr Twain’s concern.

Appendix 1. Glossary of statistical terms (part 1 of 2)

  • 1-way ANOVA: Uses 1 variable to define the groups for comparing means. This is similar to the Student t test when comparing the means of 2 groups.
  • Kruskal–Wallis 1-way ANOVA: Nonparametric alternative for the 1-way ANOVA. Used to determine the difference in medians between 3 or more groups.
  • n-way ANOVA: Uses 2 or more variables to define groups when comparing means. Also called a “between-subjects factorial ANOVA”.
  • Repeated-measures ANOVA: A method for analyzing whether the means of 3 or more measures from the same group of participants are different.
  • Friedman ANOVA: Nonparametric alternative for the repeated-measures ANOVA. It is often used to compare rankings and preferences that are measured 3 or more times.
  • Fisher exact: Variation of chi-square that accounts for cell counts < 5.
  • McNemar: Variation of chi-square that tests statistical significance of changes in 2 paired measurements of dichotomous variables.
  • Cochran Q: An extension of the McNemar test that provides a method for testing for differences between 3 or more matched sets of frequencies or proportions. Often used as a measure of heterogeneity in meta-analyses.
  • 1-sample t test: Used to determine whether the mean of a sample is significantly different from a known or hypothesized value.
  • Independent-samples t test (also referred to as the Student t test): Used when the independent variable is a nominal-level variable that identifies 2 groups and the dependent variable is an interval-level variable.
  • Paired t test: Used to compare 2 pairs of scores between 2 groups (e.g., baseline and follow-up blood pressure in the intervention and control groups).


This article is the 12th in the CJHP Research Primer Series, an initiative of the CJHP Editorial Board and the CSHP Research Committee. The planned 2-year series is intended to appeal to relatively inexperienced researchers, with the goal of building research capacity among practising pharmacists. The articles, presenting simple but rigorous guidance to encourage and support novice researchers, are being solicited from authors with appropriate expertise.

Previous articles in this series:

  • Bond CM. The research jigsaw: how to get started. Can J Hosp Pharm. 2014;67(1):28–30.
  • Tully MP. Research: articulating questions, generating hypotheses, and choosing study designs. Can J Hosp Pharm. 2014;67(1):31–4.
  • Loewen P. Ethical issues in pharmacy practice research: an introductory guide. Can J Hosp Pharm. 2014;67(2):133–7.
  • Tsuyuki RT. Designing pharmacy practice research trials. Can J Hosp Pharm. 2014;67(3):226–9.
  • Bresee LC. An introduction to developing surveys for pharmacy practice research. Can J Hosp Pharm. 2014;67(4):286–91.
  • Gamble JM. An introduction to the fundamentals of cohort and case–control studies. Can J Hosp Pharm. 2014;67(5):366–72.
  • Austin Z, Sutton J. Qualitative research: getting started. Can J Hosp Pharm. 2014;67(6):436–40.
  • Houle S. An introduction to the fundamentals of randomized controlled trials in pharmacy research. Can J Hosp Pharm. 2014;68(1):28–32.
  • Charrois TL. Systematic reviews: What do you need to know to get started? Can J Hosp Pharm. 2014;68(2):144–8.
  • Sutton J, Austin Z. Qualitative research: data collection, analysis, and management. Can J Hosp Pharm. 2014;68(3):226–31.
  • Cadarette SM, Wong L. An introduction to health care administrative data. Can J Hosp Pharm. 2014;68(3):232–7.

Competing interests: None declared.

Further Reading

  • Devor J, Peck R. Statistics: the exploration and analysis of data. 7th ed. Boston (MA): Brooks/Cole Cengage Learning; 2012.
  • Lang TA, Secic M. How to report statistics in medicine: annotated guidelines for authors, editors, and reviewers. 2nd ed. Philadelphia (PA): American College of Physicians; 2006.
  • Mendenhall W, Beaver RJ, Beaver BM. Introduction to probability and statistics. 13th ed. Belmont (CA): Brooks/Cole Cengage Learning; 2009.
  • Norman GR, Streiner DL. PDQ statistics. 3rd ed. Hamilton (ON): B.C. Decker; 2003.
  • Plichta SB, Kelvin E. Munro’s statistical methods for health care research. 6th ed. Philadelphia (PA): Wolters Kluwer Health/Lippincott, Williams & Wilkins; 2013.

Data Analysis – Process, Methods and Types


Definition:

Data analysis refers to the process of inspecting, cleaning, transforming, and modeling data with the goal of discovering useful information, drawing conclusions, and supporting decision-making. It involves applying various statistical and computational techniques to interpret and derive insights from large datasets. The ultimate aim of data analysis is to convert raw data into actionable insights that can inform business decisions, scientific research, and other endeavors.

Data Analysis Process

The following is a step-by-step guide to the data analysis process:

Define the Problem

The first step in data analysis is to clearly define the problem or question that needs to be answered. This involves identifying the purpose of the analysis, the data required, and the intended outcome.

Collect the Data

The next step is to collect the relevant data from various sources. This may involve collecting data from surveys, databases, or other sources. It is important to ensure that the data collected is accurate, complete, and relevant to the problem being analyzed.

Clean and Organize the Data

Once the data has been collected, it needs to be cleaned and organized. This involves removing any errors or inconsistencies in the data, filling in missing values, and ensuring that the data is in a format that can be easily analyzed.

Analyze the Data

The next step is to analyze the data using various statistical and analytical techniques. This may involve identifying patterns in the data, conducting statistical tests, or using machine learning algorithms to identify trends and insights.

Interpret the Results

After analyzing the data, the next step is to interpret the results. This involves drawing conclusions based on the analysis and identifying any significant findings or trends.

Communicate the Findings

Once the results have been interpreted, they need to be communicated to stakeholders. This may involve creating reports, visualizations, or presentations to effectively communicate the findings and recommendations.

Take Action

The final step in the data analysis process is to take action based on the findings. This may involve implementing new policies or procedures, making strategic decisions, or taking other actions based on the insights gained from the analysis.
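
As a rough illustration of how several of these steps can fit together, the sketch below uses pandas; the file name and column names (survey.csv, age, satisfaction, region) are hypothetical.

```python
# A compressed sketch of the collect -> clean -> analyze -> communicate steps.
import pandas as pd

# Collect: load the raw data (hypothetical survey export)
df = pd.read_csv("survey.csv")

# Clean and organize: drop duplicates, fix types, handle missing values
df = df.drop_duplicates()
df["age"] = pd.to_numeric(df["age"], errors="coerce")
df = df.dropna(subset=["age", "satisfaction"])

# Analyze: a simple descriptive summary by group
summary = df.groupby("region")["satisfaction"].agg(["count", "mean", "std"])

# Interpret and communicate: inspect or export the summary for reporting
print(summary)
summary.to_csv("satisfaction_by_region.csv")
```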

Types of Data Analysis

Types of Data Analysis are as follows:

Descriptive Analysis

This type of analysis involves summarizing and describing the main characteristics of a dataset, such as the mean, median, mode, standard deviation, and range.

Inferential Analysis

This type of analysis involves making inferences about a population based on a sample. Inferential analysis can help determine whether a certain relationship or pattern observed in a sample is likely to be present in the entire population.

Diagnostic Analysis

This type of analysis involves identifying and diagnosing problems or issues within a dataset. Diagnostic analysis can help identify outliers, errors, missing data, or other anomalies in the dataset.

Predictive Analysis

This type of analysis involves using statistical models and algorithms to predict future outcomes or trends based on historical data. Predictive analysis can help businesses and organizations make informed decisions about the future.

Prescriptive Analysis

This type of analysis involves recommending a course of action based on the results of previous analyses. Prescriptive analysis can help organizations make data-driven decisions about how to optimize their operations, products, or services.

Exploratory Analysis

This type of analysis involves exploring the relationships and patterns within a dataset to identify new insights and trends. Exploratory analysis is often used in the early stages of research or data analysis to generate hypotheses and identify areas for further investigation.

Data Analysis Methods

Data Analysis Methods are as follows:

Statistical Analysis

This method involves the use of mathematical models and statistical tools to analyze and interpret data. It includes measures of central tendency, correlation analysis, regression analysis, hypothesis testing, and more.

Machine Learning

This method involves the use of algorithms to identify patterns and relationships in data. It includes supervised and unsupervised learning, classification, clustering, and predictive modeling.

Data Mining

This method involves using statistical and machine learning techniques to extract information and insights from large and complex datasets.

Text Analysis

This method involves using natural language processing (NLP) techniques to analyze and interpret text data. It includes sentiment analysis, topic modeling, and entity recognition.

Network Analysis

This method involves analyzing the relationships and connections between entities in a network, such as social networks or computer networks. It includes social network analysis and graph theory.

Time Series Analysis

This method involves analyzing data collected over time to identify patterns and trends. It includes forecasting, decomposition, and smoothing techniques.
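
For example, a basic smoothing step might look like the sketch below using pandas; the file monthly_sales.csv and its date and sales columns are hypothetical.

```python
# Smoothing a hypothetical monthly sales series with a rolling average.
import pandas as pd

ts = pd.read_csv("monthly_sales.csv", parse_dates=["date"], index_col="date")

# A 12-month rolling average dampens seasonal noise and highlights the trend
ts["rolling_12m"] = ts["sales"].rolling(window=12).mean()

# A naive trend check: compare the first and last smoothed values
trend = ts["rolling_12m"].dropna()
print("Smoothed start:", round(trend.iloc[0], 1))
print("Smoothed end:  ", round(trend.iloc[-1], 1))
```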

Spatial Analysis

This method involves analyzing geographic data to identify spatial patterns and relationships. It includes spatial statistics, spatial regression, and geospatial data visualization.

Data Visualization

This method involves using graphs, charts, and other visual representations to help communicate the findings of the analysis. It includes scatter plots, bar charts, heat maps, and interactive dashboards.

Qualitative Analysis

This method involves analyzing non-numeric data such as interviews, observations, and open-ended survey responses. It includes thematic analysis, content analysis, and grounded theory.

Multi-criteria Decision Analysis

This method involves analyzing multiple criteria and objectives to support decision-making. It includes techniques such as the analytical hierarchy process, TOPSIS, and ELECTRE.

Data Analysis Tools

There are various data analysis tools available that can help with different aspects of data analysis. Below is a list of some commonly used data analysis tools:

  • Microsoft Excel: A widely used spreadsheet program that allows for data organization, analysis, and visualization.
  • SQL : A programming language used to manage and manipulate relational databases.
  • R : An open-source programming language and software environment for statistical computing and graphics.
  • Python : A general-purpose programming language that is widely used in data analysis and machine learning.
  • Tableau : A data visualization software that allows for interactive and dynamic visualizations of data.
  • SAS : A statistical analysis software used for data management, analysis, and reporting.
  • SPSS : A statistical analysis software used for data analysis, reporting, and modeling.
  • Matlab : A numerical computing software that is widely used in scientific research and engineering.
  • RapidMiner : A data science platform that offers a wide range of data analysis and machine learning tools.

Applications of Data Analysis

Data analysis has numerous applications across various fields. Below are some examples of how data analysis is used in different fields:

  • Business : Data analysis is used to gain insights into customer behavior, market trends, and financial performance. This includes customer segmentation, sales forecasting, and market research.
  • Healthcare : Data analysis is used to identify patterns and trends in patient data, improve patient outcomes, and optimize healthcare operations. This includes clinical decision support, disease surveillance, and healthcare cost analysis.
  • Education : Data analysis is used to measure student performance, evaluate teaching effectiveness, and improve educational programs. This includes assessment analytics, learning analytics, and program evaluation.
  • Finance : Data analysis is used to monitor and evaluate financial performance, identify risks, and make investment decisions. This includes risk management, portfolio optimization, and fraud detection.
  • Government : Data analysis is used to inform policy-making, improve public services, and enhance public safety. This includes crime analysis, disaster response planning, and social welfare program evaluation.
  • Sports : Data analysis is used to gain insights into athlete performance, improve team strategy, and enhance fan engagement. This includes player evaluation, scouting analysis, and game strategy optimization.
  • Marketing : Data analysis is used to measure the effectiveness of marketing campaigns, understand customer behavior, and develop targeted marketing strategies. This includes customer segmentation, marketing attribution analysis, and social media analytics.
  • Environmental science : Data analysis is used to monitor and evaluate environmental conditions, assess the impact of human activities on the environment, and develop environmental policies. This includes climate modeling, ecological forecasting, and pollution monitoring.

When to Use Data Analysis

Data analysis is useful when you need to extract meaningful insights and information from large and complex datasets. It is a crucial step in the decision-making process, as it helps you understand the underlying patterns and relationships within the data, and identify potential areas for improvement or opportunities for growth.

Here are some specific scenarios where data analysis can be particularly helpful:

  • Problem-solving : When you encounter a problem or challenge, data analysis can help you identify the root cause and develop effective solutions.
  • Optimization : Data analysis can help you optimize processes, products, or services to increase efficiency, reduce costs, and improve overall performance.
  • Prediction: Data analysis can help you make predictions about future trends or outcomes, which can inform strategic planning and decision-making.
  • Performance evaluation : Data analysis can help you evaluate the performance of a process, product, or service to identify areas for improvement and potential opportunities for growth.
  • Risk assessment : Data analysis can help you assess and mitigate risks, whether it is financial, operational, or related to safety.
  • Market research : Data analysis can help you understand customer behavior and preferences, identify market trends, and develop effective marketing strategies.
  • Quality control: Data analysis can help you ensure product quality and customer satisfaction by identifying and addressing quality issues.

Purpose of Data Analysis

The primary purposes of data analysis can be summarized as follows:

  • To gain insights: Data analysis allows you to identify patterns and trends in data, which can provide valuable insights into the underlying factors that influence a particular phenomenon or process.
  • To inform decision-making: Data analysis can help you make informed decisions based on the information that is available. By analyzing data, you can identify potential risks, opportunities, and solutions to problems.
  • To improve performance: Data analysis can help you optimize processes, products, or services by identifying areas for improvement and potential opportunities for growth.
  • To measure progress: Data analysis can help you measure progress towards a specific goal or objective, allowing you to track performance over time and adjust your strategies accordingly.
  • To identify new opportunities: Data analysis can help you identify new opportunities for growth and innovation by identifying patterns and trends that may not have been visible before.

Examples of Data Analysis

Some Examples of Data Analysis are as follows:

  • Social Media Monitoring: Companies use data analysis to monitor social media activity in real-time to understand their brand reputation, identify potential customer issues, and track competitors. By analyzing social media data, businesses can make informed decisions on product development, marketing strategies, and customer service.
  • Financial Trading: Financial traders use data analysis to make real-time decisions about buying and selling stocks, bonds, and other financial instruments. By analyzing real-time market data, traders can identify trends and patterns that help them make informed investment decisions.
  • Traffic Monitoring : Cities use data analysis to monitor traffic patterns and make real-time decisions about traffic management. By analyzing data from traffic cameras, sensors, and other sources, cities can identify congestion hotspots and make changes to improve traffic flow.
  • Healthcare Monitoring: Healthcare providers use data analysis to monitor patient health in real-time. By analyzing data from wearable devices, electronic health records, and other sources, healthcare providers can identify potential health issues and provide timely interventions.
  • Online Advertising: Online advertisers use data analysis to make real-time decisions about advertising campaigns. By analyzing data on user behavior and ad performance, advertisers can make adjustments to their campaigns to improve their effectiveness.
  • Sports Analysis : Sports teams use data analysis to make real-time decisions about strategy and player performance. By analyzing data on player movement, ball position, and other variables, coaches can make informed decisions about substitutions, game strategy, and training regimens.
  • Energy Management : Energy companies use data analysis to monitor energy consumption in real-time. By analyzing data on energy usage patterns, companies can identify opportunities to reduce energy consumption and improve efficiency.

Characteristics of Data Analysis

Characteristics of Data Analysis are as follows:

  • Objective : Data analysis should be objective and based on empirical evidence, rather than subjective assumptions or opinions.
  • Systematic : Data analysis should follow a systematic approach, using established methods and procedures for collecting, cleaning, and analyzing data.
  • Accurate : Data analysis should produce accurate results, free from errors and bias. Data should be validated and verified to ensure its quality.
  • Relevant : Data analysis should be relevant to the research question or problem being addressed. It should focus on the data that is most useful for answering the research question or solving the problem.
  • Comprehensive : Data analysis should be comprehensive and consider all relevant factors that may affect the research question or problem.
  • Timely : Data analysis should be conducted in a timely manner, so that the results are available when they are needed.
  • Reproducible : Data analysis should be reproducible, meaning that other researchers should be able to replicate the analysis using the same data and methods.
  • Communicable : Data analysis should be communicated clearly and effectively to stakeholders and other interested parties. The results should be presented in a way that is understandable and useful for decision-making.

Advantages of Data Analysis

Advantages of Data Analysis are as follows:

  • Better decision-making: Data analysis helps in making informed decisions based on facts and evidence, rather than intuition or guesswork.
  • Improved efficiency: Data analysis can identify inefficiencies and bottlenecks in business processes, allowing organizations to optimize their operations and reduce costs.
  • Increased accuracy: Data analysis helps to reduce errors and bias, providing more accurate and reliable information.
  • Better customer service: Data analysis can help organizations understand their customers better, allowing them to provide better customer service and improve customer satisfaction.
  • Competitive advantage: Data analysis can provide organizations with insights into their competitors, allowing them to identify areas where they can gain a competitive advantage.
  • Identification of trends and patterns : Data analysis can identify trends and patterns in data that may not be immediately apparent, helping organizations to make predictions and plan for the future.
  • Improved risk management : Data analysis can help organizations identify potential risks and take proactive steps to mitigate them.
  • Innovation: Data analysis can inspire innovation and new ideas by revealing new opportunities or previously unknown correlations in data.

Limitations of Data Analysis

  • Data quality: The quality of data can impact the accuracy and reliability of analysis results. If data is incomplete, inconsistent, or outdated, the analysis may not provide meaningful insights.
  • Limited scope: Data analysis is limited by the scope of the data available. If data is incomplete or does not capture all relevant factors, the analysis may not provide a complete picture.
  • Human error : Data analysis is often conducted by humans, and errors can occur in data collection, cleaning, and analysis.
  • Cost : Data analysis can be expensive, requiring specialized tools, software, and expertise.
  • Time-consuming : Data analysis can be time-consuming, especially when working with large datasets or conducting complex analyses.
  • Overreliance on data: Data analysis should be complemented with human intuition and expertise. Overreliance on data can lead to a lack of creativity and innovation.
  • Privacy concerns: Data analysis can raise privacy concerns if personal or sensitive information is used without proper consent or security measures.

About the author


Muhammad Hassan

Researcher, Academic Writer, Web developer


Grad Coach

Quantitative Data Analysis 101

The lingo, methods and techniques, explained simply.

By: Derek Jansen (MBA)  and Kerryn Warren (PhD) | December 2020

Quantitative data analysis is one of those things that often strikes fear in students. It’s totally understandable – quantitative analysis is a complex topic, full of daunting lingo , like medians, modes, correlation and regression. Suddenly we’re all wishing we’d paid a little more attention in math class…

The good news is that while quantitative data analysis is a mammoth topic, gaining a working understanding of the basics isn’t that hard , even for those of us who avoid numbers and math . In this post, we’ll break quantitative analysis down into simple , bite-sized chunks so you can approach your research with confidence.


Overview: Quantitative Data Analysis 101

  • What (exactly) is quantitative data analysis?
  • When to use quantitative analysis
  • How quantitative analysis works
  • The two “branches” of quantitative analysis: descriptive statistics and inferential statistics
  • How to choose the right quantitative methods
  • Recap & summary

What is quantitative data analysis?

Despite being a mouthful, quantitative data analysis simply means analysing data that is numbers-based – or data that can be easily “converted” into numbers without losing any meaning.

For example, category-based variables like gender, ethnicity, or native language could all be “converted” into numbers without losing meaning – for example, English could equal 1, French 2, etc.
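
For instance, here is one minimal way such a conversion could be done with pandas; the column name native_language is hypothetical, and the resulting codes are labels rather than quantities.

```python
# Converting a category-based variable into numeric codes with pandas.
import pandas as pd

df = pd.DataFrame({"native_language": ["English", "French", "English", "Zulu"]})

# Each category gets an integer code; keep the mapping so no meaning is lost.
# Note: the codes are labels, not quantities - 2 is not "more" than 1.
df["language_code"] = df["native_language"].astype("category").cat.codes
mapping = dict(enumerate(df["native_language"].astype("category").cat.categories))

print(df)
print(mapping)
```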

This contrasts against qualitative data analysis, where the focus is on words, phrases and expressions that can’t be reduced to numbers. If you’re interested in learning about qualitative analysis, check out our post and video here .

What is quantitative analysis used for?

Quantitative analysis is generally used for three purposes.

  • Firstly, it’s used to measure differences between groups . For example, the popularity of different clothing colours or brands.
  • Secondly, it’s used to assess relationships between variables . For example, the relationship between weather temperature and voter turnout.
  • And third, it’s used to test hypotheses in a scientifically rigorous way. For example, a hypothesis about the impact of a certain vaccine.

Again, this contrasts with qualitative analysis , which can be used to analyse people’s perceptions and feelings about an event or situation. In other words, things that can’t be reduced to numbers.

How does quantitative analysis work?

Well, since quantitative data analysis is all about analysing numbers , it’s no surprise that it involves statistics . Statistical analysis methods form the engine that powers quantitative analysis, and these methods can vary from pretty basic calculations (for example, averages and medians) to more sophisticated analyses (for example, correlations and regressions).

Sounds like gibberish? Don’t worry. We’ll explain all of that in this post. Importantly, you don’t need to be a statistician or math wiz to pull off a good quantitative analysis. We’ll break down all the technical mumbo jumbo in this post.


As I mentioned, quantitative analysis is powered by statistical analysis methods . There are two main “branches” of statistical methods that are used – descriptive statistics and inferential statistics . In your research, you might only use descriptive statistics, or you might use a mix of both , depending on what you’re trying to figure out. In other words, depending on your research questions, aims and objectives . I’ll explain how to choose your methods later.

So, what are descriptive and inferential statistics?

Well, before I can explain that, we need to take a quick detour to explain some lingo. To understand the difference between these two branches of statistics, you need to understand two important words. These words are population and sample .

First up, population . In statistics, the population is the entire group of people (or animals or organisations or whatever) that you’re interested in researching. For example, if you were interested in researching Tesla owners in the US, then the population would be all Tesla owners in the US.

However, it’s extremely unlikely that you’re going to be able to interview or survey every single Tesla owner in the US. Realistically, you’ll likely only get access to a few hundred, or maybe a few thousand owners using an online survey. This smaller group of accessible people whose data you actually collect is called your sample .

So, to recap – the population is the entire group of people you’re interested in, and the sample is the subset of the population that you can actually get access to. In other words, the population is the full chocolate cake , whereas the sample is a slice of that cake.

So, why is this sample-population thing important?

Well, descriptive statistics focus on describing the sample , while inferential statistics aim to make predictions about the population, based on the findings within the sample. In other words, we use one group of statistical methods – descriptive statistics – to investigate the slice of cake, and another group of methods – inferential statistics – to draw conclusions about the entire cake. There I go with the cake analogy again…

With that out the way, let’s take a closer look at each of these branches in more detail.

Descriptive statistics vs inferential statistics

Branch 1: Descriptive Statistics

Descriptive statistics serve a simple but critically important role in your research – to describe your data set – hence the name. In other words, they help you understand the details of your sample . Unlike inferential statistics (which we’ll get to soon), descriptive statistics don’t aim to make inferences or predictions about the entire population – they’re purely interested in the details of your specific sample .

When you’re writing up your analysis, descriptive statistics are the first set of stats you’ll cover, before moving on to inferential statistics. But, that said, depending on your research objectives and research questions , they may be the only type of statistics you use. We’ll explore that a little later.

So, what kind of statistics are usually covered in this section?

Some common descriptive statistics used in this branch include the following:

  • Mean – this is simply the mathematical average of a range of numbers.
  • Median – this is the midpoint in a range of numbers when the numbers are arranged in numerical order. If the data set contains an odd number of values, the median is the value right in the middle of the set. If the data set contains an even number of values, the median is the midpoint between the two middle values.
  • Mode – this is simply the most commonly occurring number in the data set.
  • Standard deviation – this indicates how widely the numbers are dispersed around the average. In cases where most of the numbers are quite close to the average, the standard deviation will be relatively low. Conversely, in cases where the numbers are scattered all over the place, the standard deviation will be relatively high.
  • Skewness . As the name suggests, skewness indicates how symmetrical a range of numbers is. In other words, do they tend to cluster into a smooth bell curve shape in the middle of the graph, or do they skew to the left or right?

Feeling a bit confused? Let’s look at a practical example using a small data set.

Descriptive statistics example data

On the left-hand side is the data set. This details the bodyweight of a sample of 10 people. On the right-hand side, we have the descriptive statistics. Let’s take a look at each of them.

First, we can see that the mean weight is 72.4 kilograms. In other words, the average weight across the sample is 72.4 kilograms. Straightforward.

Next, we can see that the median is very similar to the mean (the average). This suggests that this data set has a reasonably symmetrical distribution (in other words, a relatively smooth, centred distribution of weights, clustered towards the centre).

In terms of the mode , there is no mode in this data set. This is because each number is present only once and so there cannot be a “most common number”. If there were two people who were both 65 kilograms, for example, then the mode would be 65.

Next up is the standard deviation. A value of 10.6 indicates that there’s quite a wide spread of numbers. We can see this quite easily by looking at the numbers themselves, which range from 55 to 90, quite a stretch from the mean of 72.4.

And lastly, the skewness of -0.2 tells us that the data is very slightly negatively skewed. This makes sense since the mean and the median are slightly different.

As you can see, these descriptive statistics give us some useful insight into the data set. Of course, this is a very small data set (only 10 records), so we can’t read into these statistics too much. Also, keep in mind that this is not a list of all possible descriptive statistics – just the most common ones.
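
If you wanted to produce this kind of summary yourself, a minimal sketch with pandas might look like the following; the ten weights are made up for illustration and are not the exact values from the example above.

```python
# Computing common descriptive statistics for a small set of bodyweights.
import pandas as pd

weights = pd.Series([55, 61, 64, 68, 71, 74, 77, 80, 84, 90])  # kilograms

print("Mean:    ", weights.mean())
print("Median:  ", weights.median())
print("Mode:    ", list(weights.mode()))   # every value ties here, so there is no meaningful mode
print("Std dev: ", round(weights.std(), 1))
print("Skewness:", round(weights.skew(), 2))
```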

But why do all of these numbers matter?

While these descriptive statistics are all fairly basic, they’re important for a few reasons:

  • Firstly, they help you get both a macro and micro-level view of your data. In other words, they help you understand both the big picture and the finer details.
  • Secondly, they help you spot potential errors in the data – for example, if an average is way higher than you’d expect, or responses to a question are highly varied, this can act as a warning sign that you need to double-check the data.
  • And lastly, these descriptive statistics help inform which inferential statistical techniques you can use, as those techniques depend on the skewness (in other words, the symmetry and normality) of the data.

Simply put, descriptive statistics are really important , even though the statistical techniques used are fairly basic. All too often at Grad Coach, we see students skimming over the descriptives in their eagerness to get to the more exciting inferential methods, and then landing up with some very flawed results.

Don’t be a sucker – give your descriptive statistics the love and attention they deserve!

Examples of descriptive statistics

Branch 2: Inferential Statistics

As I mentioned, while descriptive statistics are all about the details of your specific data set – your sample – inferential statistics aim to make inferences about the population . In other words, you’ll use inferential statistics to make predictions about what you’d expect to find in the full population.

What kind of predictions, you ask? Well, there are two common types of predictions that researchers try to make using inferential stats:

  • Firstly, predictions about differences between groups – for example, height differences between children grouped by their favourite meal or gender.
  • And secondly, relationships between variables – for example, the relationship between body weight and the number of hours a week a person does yoga.

In other words, inferential statistics (when done correctly), allow you to connect the dots and make predictions about what you expect to see in the real world population, based on what you observe in your sample data. For this reason, inferential statistics are used for hypothesis testing – in other words, to test hypotheses that predict changes or differences.

Inferential statistics are used to make predictions about what you’d expect to find in the full population, based on the sample.

Of course, when you’re working with inferential statistics, the composition of your sample is really important. In other words, if your sample doesn’t accurately represent the population you’re researching, then your findings won’t necessarily be very useful.

For example, if your population of interest is a mix of 50% male and 50% female , but your sample is 80% male , you can’t make inferences about the population based on your sample, since it’s not representative. This area of statistics is called sampling, but we won’t go down that rabbit hole here (it’s a deep one!) – we’ll save that for another post .

What statistics are usually used in this branch?

There are many, many different statistical analysis methods within the inferential branch and it’d be impossible for us to discuss them all here. So we’ll just take a look at some of the most common inferential statistical methods so that you have a solid starting point.

First up are T-Tests. T-tests compare the means (the averages) of two groups of data to assess whether they’re statistically significantly different. In other words, is the difference between the two group means larger than what you’d expect from random variation alone?

This type of testing is very useful for understanding just how similar or different two groups of data are. For example, you might want to compare the mean blood pressure between two groups of people – one that has taken a new medication and one that hasn’t – to assess whether they are significantly different.

Kicking things up a level, we have ANOVA, which stands for “analysis of variance”. This test is similar to a T-test in that it compares the means of various groups, but ANOVA allows you to analyse multiple groups, not just two. So it’s basically a t-test on steroids…
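
As a quick illustration, here’s roughly what a one-way ANOVA looks like in Python with scipy.stats, using made-up scores for three groups.

```python
# One-way ANOVA comparing the means of three groups (made-up exam scores).
from scipy import stats

group_a = [72, 75, 78, 80, 69, 74]
group_b = [81, 85, 79, 88, 84, 82]
group_c = [68, 70, 73, 66, 71, 69]

f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```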

Next, we have correlation analysis . This type of analysis assesses the relationship between two variables. In other words, if one variable increases, does the other variable also increase, decrease or stay the same. For example, if the average temperature goes up, do average ice creams sales increase too? We’d expect some sort of relationship between these two variables intuitively , but correlation analysis allows us to measure that relationship scientifically .

Lastly, we have regression analysis – this is quite similar to correlation in that it assesses the relationship between variables, but it goes a step further by modelling how one or more predictor variables explain an outcome variable. This makes it useful when you’re interested in potential cause and effect, although regression on its own can’t prove causation. In other words, does one variable actually drive the other, or do they just happen to move together thanks to another force? Just because two variables correlate doesn’t necessarily mean that one causes the other.
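
Here’s a small sketch of both techniques with scipy.stats, using made-up temperature and ice cream sales figures. Note that the fitted regression line only describes the relationship; on its own it can’t establish that temperature causes the change in sales.

```python
# Correlation and simple linear regression with made-up data.
from scipy import stats

temperature = [18, 21, 24, 26, 29, 31, 33, 35]          # degrees Celsius
sales       = [120, 135, 160, 180, 210, 230, 260, 270]  # ice creams sold

# Correlation: strength and direction of the relationship (-1 to +1)
r, p_corr = stats.pearsonr(temperature, sales)

# Simple linear regression: model sales as a function of temperature
result = stats.linregress(temperature, sales)

print(f"Pearson r = {r:.2f} (p = {p_corr:.4f})")
print(f"sales = {result.slope:.1f} * temperature + {result.intercept:.1f} (approximately)")
```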

Stats overload…

I hear you. To make this all a little more tangible, let’s take a look at an example of a correlation in action.

Here’s a scatter plot demonstrating the correlation (relationship) between weight and height. Intuitively, we’d expect there to be some relationship between these two variables, which is what we see in this scatter plot. In other words, the results tend to cluster together in a diagonal line from bottom left to top right.

Sample correlation

As I mentioned, these are just a handful of inferential techniques – there are many, many more. Importantly, each statistical method has its own assumptions and limitations.

For example, some methods only work with normally distributed data (parametric tests), while others are designed for data that don’t meet that assumption (non-parametric tests). And that’s exactly why descriptive statistics are so important – they’re the first step to knowing which inferential techniques you can and can’t use.

Remember that every statistical method has its own assumptions and limitations,  so you need to be aware of these.

How to choose the right analysis method

To choose the right statistical methods, you need to think about two important factors:

  • The type of quantitative data you have (specifically, level of measurement and the shape of the data). And,
  • Your research questions and hypotheses

Let’s take a closer look at each of these.

Factor 1 – Data type

The first thing you need to consider is the type of data you’ve collected (or the type of data you will collect). By data types, I’m referring to the four levels of measurement – namely, nominal, ordinal, interval and ratio. If you’re not familiar with this lingo, check out the video below.

Why does this matter?

Well, because different statistical methods and techniques require different types of data. This is one of the “assumptions” I mentioned earlier – every method has its assumptions regarding the type of data.

For example, some techniques work with categorical data (for example, yes/no type questions, or gender or ethnicity), while others work with continuous numerical data (for example, age, weight or income) – and, of course, some work with multiple data types.

If you try to use a statistical method that doesn’t support the data type you have, your results will be largely meaningless . So, make sure that you have a clear understanding of what types of data you’ve collected (or will collect). Once you have this, you can then check which statistical methods would support your data types here .

If you haven’t collected your data yet, you can work in reverse and look at which statistical method would give you the most useful insights, and then design your data collection strategy to collect the correct data types.

Another important factor to consider is the shape of your data . Specifically, does it have a normal distribution (in other words, is it a bell-shaped curve, centred in the middle) or is it very skewed to the left or the right? Again, different statistical techniques work for different shapes of data – some are designed for symmetrical data while others are designed for skewed data.

This is another reminder of why descriptive statistics are so important – they tell you all about the shape of your data.

Factor 2: Your research questions

The next thing you need to consider is your specific research questions, as well as your hypotheses (if you have some). The nature of your research questions and research hypotheses will heavily influence which statistical methods and techniques you should use.

If you’re just interested in understanding the attributes of your sample (as opposed to the entire population), then descriptive statistics are probably all you need. For example, if you just want to assess the means (averages) and medians (centre points) of variables in a group of people.

On the other hand, if you aim to understand differences between groups or relationships between variables and to infer or predict outcomes in the population, then you’ll likely need both descriptive statistics and inferential statistics.

So, it’s really important to get very clear about your research aims and research questions, as well as your hypotheses – before you start looking at which statistical techniques to use.

Never shoehorn a specific statistical technique into your research just because you like it or have some experience with it. Your choice of methods must align with all the factors we’ve covered here.

Time to recap…

You’re still with me? That’s impressive. We’ve covered a lot of ground here, so let’s recap on the key points:

  • Quantitative data analysis is all about  analysing number-based data  (which includes categorical and numerical data) using various statistical techniques.
  • The two main  branches  of statistics are  descriptive statistics  and  inferential statistics . Descriptives describe your sample, whereas inferentials make predictions about what you’ll find in the population.
  • Common  descriptive statistical methods include  mean  (average),  median , standard  deviation  and  skewness .
  • Common  inferential statistical methods include  t-tests ,  ANOVA ,  correlation  and  regression  analysis.
  • To choose the right statistical methods and techniques, you need to consider the  type of data you’re working with , as well as your  research questions  and hypotheses.



Shantae

Absolutely!!! Thank you

Thazika Chitimera

Thank you very much for this post. It made me to understand how to do my data analysis.


PW Skills | Blog

Data Analysis Techniques in Research – Methods, Tools & Examples


Varun Saharawat is a seasoned professional in the fields of SEO and content writing. With a profound knowledge of the intricate aspects of these disciplines, Varun has established himself as a valuable asset in the world of digital marketing and online content creation.


Data analysis techniques in research are essential because they allow researchers to derive meaningful insights from data sets to support their hypotheses or research objectives.

Data Analysis Techniques in Research: While various groups, institutions, and professionals may have diverse approaches to data analysis, a universal definition captures its essence. Data analysis involves refining, transforming, and interpreting raw data to derive actionable insights that guide informed decision-making for businesses.


A straightforward illustration of data analysis emerges when we make everyday decisions, basing our choices on past experiences or predictions of potential outcomes.

If you want to learn more about this topic and acquire valuable skills that will set you apart in today’s data-driven world, we highly recommend enrolling in the Data Analytics Course by Physics Wallah . And as a special offer for our readers, use the coupon code “READER” to get a discount on this course.


What is Data Analysis?

Data analysis is the systematic process of inspecting, cleaning, transforming, and interpreting data with the objective of discovering valuable insights and drawing meaningful conclusions. This process involves several steps:

  • Inspecting : Initial examination of data to understand its structure, quality, and completeness.
  • Cleaning : Removing errors, inconsistencies, or irrelevant information to ensure accurate analysis.
  • Transforming : Converting data into a format suitable for analysis, such as normalization or aggregation.
  • Interpreting : Analyzing the transformed data to identify patterns, trends, and relationships.

Types of Data Analysis Techniques in Research

Data analysis techniques in research are categorized into qualitative and quantitative methods, each with its specific approaches and tools. These techniques are instrumental in extracting meaningful insights, patterns, and relationships from data to support informed decision-making, validate hypotheses, and derive actionable recommendations. Below is an in-depth exploration of the various types of data analysis techniques commonly employed in research:

1) Qualitative Analysis:

Definition: Qualitative analysis focuses on understanding non-numerical data, such as opinions, concepts, or experiences, to derive insights into human behavior, attitudes, and perceptions.

  • Content Analysis: Examines textual data, such as interview transcripts, articles, or open-ended survey responses, to identify themes, patterns, or trends.
  • Narrative Analysis: Analyzes personal stories or narratives to understand individuals’ experiences, emotions, or perspectives.
  • Ethnographic Studies: Involves observing and analyzing cultural practices, behaviors, and norms within specific communities or settings.

2) Quantitative Analysis:

Quantitative analysis emphasizes numerical data and employs statistical methods to explore relationships, patterns, and trends. It encompasses several approaches:

Descriptive Analysis:

  • Frequency Distribution: Represents the number of occurrences of distinct values within a dataset.
  • Central Tendency: Measures such as mean, median, and mode provide insights into the central values of a dataset.
  • Dispersion: Techniques like variance and standard deviation indicate the spread or variability of data.
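
As a minimal sketch of these descriptive measures in practice, the Python snippet below computes a frequency distribution, the central tendency measures, and dispersion for a small, made-up set of satisfaction scores (the variable name and values are purely hypothetical):

```python
import pandas as pd

# Hypothetical survey scores on a 1-5 scale
scores = pd.Series([4, 5, 3, 4, 2, 5, 4, 3, 4, 5], name="satisfaction")

# Frequency distribution: occurrences of each distinct value
print(scores.value_counts().sort_index())

# Central tendency
print("mean:", scores.mean())
print("median:", scores.median())
print("mode:", scores.mode().tolist())

# Dispersion
print("variance:", scores.var())        # sample variance
print("std deviation:", scores.std())   # sample standard deviation
```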

Diagnostic Analysis:

  • Regression Analysis: Assesses the relationship between dependent and independent variables, enabling prediction or understanding causality.
  • ANOVA (Analysis of Variance): Examines differences between groups to identify significant variations or effects.

Predictive Analysis:

  • Time Series Forecasting: Uses historical data points to predict future trends or outcomes.
  • Machine Learning Algorithms: Techniques like decision trees, random forests, and neural networks predict outcomes based on patterns in data.

Prescriptive Analysis:

  • Optimization Models: Utilizes linear programming, integer programming, or other optimization techniques to identify the best solutions or strategies.
  • Simulation: Mimics real-world scenarios to evaluate various strategies or decisions and determine optimal outcomes.

Specific Techniques:

  • Monte Carlo Simulation: Models probabilistic outcomes to assess risk and uncertainty.
  • Factor Analysis: Reduces the dimensionality of data by identifying underlying factors or components.
  • Cohort Analysis: Studies specific groups or cohorts over time to understand trends, behaviors, or patterns within these groups.
  • Cluster Analysis: Classifies objects or individuals into homogeneous groups or clusters based on similarities or attributes.
  • Sentiment Analysis: Uses natural language processing and machine learning techniques to determine sentiment, emotions, or opinions from textual data.


Data Analysis Techniques in Research Examples

To provide a clearer understanding of how data analysis techniques are applied in research, let’s consider a hypothetical research study focused on evaluating the impact of online learning platforms on students’ academic performance.

Research Objective:

Determine if students using online learning platforms achieve higher academic performance compared to those relying solely on traditional classroom instruction.

Data Collection:

  • Quantitative Data: Academic scores (grades) of students using online platforms and those using traditional classroom methods.
  • Qualitative Data: Feedback from students regarding their learning experiences, challenges faced, and preferences.

Data Analysis Techniques Applied:

1) Descriptive Analysis:

  • Calculate the mean, median, and mode of academic scores for both groups.
  • Create frequency distributions to represent the distribution of grades in each group.

2) Diagnostic Analysis:

  • Conduct an Analysis of Variance (ANOVA) to determine if there’s a statistically significant difference in academic scores between the two groups.
  • Perform Regression Analysis to assess the relationship between the time spent on online platforms and academic performance.
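
To make this diagnostic step concrete, here is a minimal Python sketch of the ANOVA and regression described above, run on simulated, entirely hypothetical scores rather than real study data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical exam scores for the two groups in the example study
online_scores = rng.normal(loc=74, scale=8, size=60)
classroom_scores = rng.normal(loc=70, scale=8, size=60)

# One-way ANOVA: is there a significant difference between the group means?
f_stat, p_anova = stats.f_oneway(online_scores, classroom_scores)
print(f"ANOVA: F = {f_stat:.2f}, p = {p_anova:.4f}")

# Simple linear regression: hours spent on the platform vs. exam score
hours = rng.uniform(1, 20, size=60)
score = 60 + 0.8 * hours + rng.normal(0, 5, size=60)
slope, intercept, r_value, p_reg, stderr = stats.linregress(hours, score)
print(f"Regression: score = {intercept:.1f} + {slope:.2f} * hours (p = {p_reg:.4f})")
```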

3) Predictive Analysis:

  • Utilize Time Series Forecasting to predict future academic performance trends based on historical data.
  • Implement Machine Learning algorithms to develop a predictive model that identifies factors contributing to academic success on online platforms.

4) Prescriptive Analysis:

  • Apply Optimization Models to identify the optimal combination of online learning resources (e.g., video lectures, interactive quizzes) that maximize academic performance.
  • Use Simulation Techniques to evaluate different scenarios, such as varying student engagement levels with online resources, to determine the most effective strategies for improving learning outcomes.

5) Specific Techniques:

  • Conduct Factor Analysis on qualitative feedback to identify common themes or factors influencing students’ perceptions and experiences with online learning.
  • Perform Cluster Analysis to segment students based on their engagement levels, preferences, or academic outcomes, enabling targeted interventions or personalized learning strategies.
  • Apply Sentiment Analysis on textual feedback to categorize students’ sentiments as positive, negative, or neutral regarding online learning experiences.

By applying a combination of qualitative and quantitative data analysis techniques, this research example aims to provide comprehensive insights into the effectiveness of online learning platforms.


Data Analysis Techniques in Quantitative Research

Quantitative research involves collecting numerical data to examine relationships, test hypotheses, and make predictions. Various data analysis techniques are employed to interpret and draw conclusions from quantitative data. Here are some key data analysis techniques commonly used in quantitative research:

1) Descriptive Statistics:

  • Description: Descriptive statistics are used to summarize and describe the main aspects of a dataset, such as central tendency (mean, median, mode), variability (range, variance, standard deviation), and distribution (skewness, kurtosis).
  • Applications: Summarizing data, identifying patterns, and providing initial insights into the dataset.

2) Inferential Statistics:

  • Description: Inferential statistics involve making predictions or inferences about a population based on a sample of data. This technique includes hypothesis testing, confidence intervals, t-tests, chi-square tests, analysis of variance (ANOVA), regression analysis, and correlation analysis.
  • Applications: Testing hypotheses, making predictions, and generalizing findings from a sample to a larger population.
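
As a small illustration of inferential statistics, the following Python sketch runs an independent-samples t-test and builds a 95% confidence interval on simulated data; the group labels and numbers are hypothetical:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Two hypothetical samples, e.g. scores from two independent groups
group_a = rng.normal(52, 10, size=40)
group_b = rng.normal(48, 10, size=40)

# Independent-samples t-test
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

# 95% confidence interval for the mean of group_a
mean = group_a.mean()
sem = stats.sem(group_a)
ci_low, ci_high = stats.t.interval(0.95, len(group_a) - 1, loc=mean, scale=sem)
print(f"95% CI for group_a mean: ({ci_low:.2f}, {ci_high:.2f})")
```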

3) Regression Analysis:

  • Description: Regression analysis is a statistical technique used to model and examine the relationship between a dependent variable and one or more independent variables. Linear regression, multiple regression, logistic regression, and nonlinear regression are common types of regression analysis.
  • Applications: Predicting outcomes, identifying relationships between variables, and understanding the impact of independent variables on the dependent variable.

4) Correlation Analysis:

  • Description: Correlation analysis is used to measure and assess the strength and direction of the relationship between two or more variables. The Pearson correlation coefficient, Spearman rank correlation coefficient, and Kendall’s tau are commonly used measures of correlation.
  • Applications: Identifying associations between variables and assessing the degree and nature of the relationship.
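
Here is a minimal sketch of correlation analysis in Python, using scipy on made-up paired measurements (the variable names are hypothetical):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical paired measurements, e.g. study hours vs. exam score
hours = rng.uniform(0, 10, size=50)
score = 50 + 4 * hours + rng.normal(0, 8, size=50)

pearson_r, p_pearson = stats.pearsonr(hours, score)       # linear relationship
spearman_rho, p_spearman = stats.spearmanr(hours, score)  # monotonic (rank-based)

print(f"Pearson r = {pearson_r:.2f} (p = {p_pearson:.4f})")
print(f"Spearman rho = {spearman_rho:.2f} (p = {p_spearman:.4f})")
```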

5) Factor Analysis:

  • Description: Factor analysis is a multivariate statistical technique used to identify and analyze underlying relationships or factors among a set of observed variables. It helps in reducing the dimensionality of data and identifying latent variables or constructs.
  • Applications: Identifying underlying factors or constructs, simplifying data structures, and understanding the underlying relationships among variables.

6) Time Series Analysis:

  • Description: Time series analysis involves analyzing data collected or recorded over a specific period at regular intervals to identify patterns, trends, and seasonality. Techniques such as moving averages, exponential smoothing, autoregressive integrated moving average (ARIMA), and Fourier analysis are used.
  • Applications: Forecasting future trends, analyzing seasonal patterns, and understanding time-dependent relationships in data.
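
For a simple illustration of two of these smoothing techniques, the pandas sketch below applies a 3-month moving average and an exponentially weighted mean to a simulated monthly series (the dates and values are invented):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)

# Hypothetical monthly sales series with a mild upward trend
dates = pd.date_range("2022-01-01", periods=24, freq="MS")
sales = pd.Series(100 + np.arange(24) * 2 + rng.normal(0, 5, 24), index=dates)

# 3-month moving average smooths short-term fluctuation
moving_avg = sales.rolling(window=3).mean()

# Simple exponential smoothing via pandas' exponentially weighted mean
exp_smooth = sales.ewm(alpha=0.3).mean()

print(pd.DataFrame({"sales": sales, "ma_3": moving_avg, "ewm": exp_smooth}).head(6))
```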

7) ANOVA (Analysis of Variance):

  • Description: Analysis of variance (ANOVA) is a statistical technique used to analyze and compare the means of two or more groups or treatments to determine if they are statistically different from each other. One-way ANOVA, two-way ANOVA, and MANOVA (Multivariate Analysis of Variance) are common types of ANOVA.
  • Applications: Comparing group means, testing hypotheses, and determining the effects of categorical independent variables on a continuous dependent variable.

8) Chi-Square Tests:

  • Description: Chi-square tests are non-parametric statistical tests used to assess the association between categorical variables in a contingency table. The Chi-square test of independence, goodness-of-fit test, and test of homogeneity are common chi-square tests.
  • Applications: Testing relationships between categorical variables, assessing goodness-of-fit, and evaluating independence.
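
Below is a minimal chi-square test of independence in Python on an invented 2x2 contingency table (the row and column labels are hypothetical):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical contingency table: course format (rows) vs. pass/fail (columns)
observed = np.array([[45, 15],   # online:    45 pass, 15 fail
                     [35, 25]])  # classroom: 35 pass, 25 fail

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi-square = {chi2:.2f}, dof = {dof}, p = {p_value:.4f}")
print("expected counts under independence:\n", expected)
```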

These quantitative data analysis techniques provide researchers with valuable tools and methods to analyze, interpret, and derive meaningful insights from numerical data. The selection of a specific technique often depends on the research objectives, the nature of the data, and the underlying assumptions of the statistical methods being used.


Data Analysis Methods

Data analysis methods refer to the techniques and procedures used to analyze, interpret, and draw conclusions from data. These methods are essential for transforming raw data into meaningful insights, facilitating decision-making processes, and driving strategies across various fields. Here are some common data analysis methods:

1) Descriptive Statistics:

  • Description: Descriptive statistics summarize and organize data to provide a clear and concise overview of the dataset. Measures such as mean, median, mode, range, variance, and standard deviation are commonly used.

2) Inferential Statistics:

  • Description: Inferential statistics involve making predictions or inferences about a population based on a sample of data. Techniques such as hypothesis testing, confidence intervals, and regression analysis are used.

3) Exploratory Data Analysis (EDA):

  • Description: EDA techniques involve visually exploring and analyzing data to discover patterns, relationships, anomalies, and insights. Methods such as scatter plots, histograms, box plots, and correlation matrices are utilized.
  • Applications: Identifying trends, patterns, outliers, and relationships within the dataset.
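
As a quick sketch of EDA in Python, the snippet below builds a small simulated dataset, draws a histogram and a scatter plot, and prints a correlation matrix; all column names and values are hypothetical:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)

# Hypothetical dataset: two related variables and one roughly independent one
df = pd.DataFrame({
    "age": rng.integers(18, 65, size=200),
    "income": rng.normal(50_000, 12_000, size=200),
})
df["spend"] = 0.1 * df["income"] + rng.normal(0, 1_000, size=200)

# Quick visual exploration
df["income"].plot(kind="hist", bins=20, title="Income distribution")
plt.show()
df.plot(kind="scatter", x="income", y="spend", title="Income vs. spend")
plt.show()

# Correlation matrix to flag linear relationships worth investigating further
print(df.corr().round(2))
```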

4) Predictive Analytics:

  • Description: Predictive analytics use statistical algorithms and machine learning techniques to analyze historical data and make predictions about future events or outcomes. Techniques such as regression analysis, time series forecasting, and machine learning algorithms (e.g., decision trees, random forests, neural networks) are employed.
  • Applications: Forecasting future trends, predicting outcomes, and identifying potential risks or opportunities.

5) Prescriptive Analytics:

  • Description: Prescriptive analytics involve analyzing data to recommend actions or strategies that optimize specific objectives or outcomes. Optimization techniques, simulation models, and decision-making algorithms are utilized.
  • Applications: Recommending optimal strategies, decision-making support, and resource allocation.

6) Qualitative Data Analysis:

  • Description: Qualitative data analysis involves analyzing non-numerical data, such as text, images, videos, or audio, to identify themes, patterns, and insights. Methods such as content analysis, thematic analysis, and narrative analysis are used.
  • Applications: Understanding human behavior, attitudes, perceptions, and experiences.

7) Big Data Analytics:

  • Description: Big data analytics methods are designed to analyze large volumes of structured and unstructured data to extract valuable insights. Technologies such as Hadoop, Spark, and NoSQL databases are used to process and analyze big data.
  • Applications: Analyzing large datasets, identifying trends, patterns, and insights from big data sources.

8) Text Analytics:

  • Description: Text analytics methods involve analyzing textual data, such as customer reviews, social media posts, emails, and documents, to extract meaningful information and insights. Techniques such as sentiment analysis, text mining, and natural language processing (NLP) are used.
  • Applications: Analyzing customer feedback, monitoring brand reputation, and extracting insights from textual data sources.

These data analysis methods are instrumental in transforming data into actionable insights, informing decision-making processes, and driving organizational success across various sectors, including business, healthcare, finance, marketing, and research. The selection of a specific method often depends on the nature of the data, the research objectives, and the analytical requirements of the project or organization.


Data Analysis Tools

Data analysis tools are essential instruments that facilitate the process of examining, cleaning, transforming, and modeling data to uncover useful information, make informed decisions, and drive strategies. Here are some prominent data analysis tools widely used across various industries:

1) Microsoft Excel:

  • Description: A spreadsheet software that offers basic to advanced data analysis features, including pivot tables, data visualization tools, and statistical functions.
  • Applications: Data cleaning, basic statistical analysis, visualization, and reporting.

2) R Programming Language:

  • Description: An open-source programming language specifically designed for statistical computing and data visualization.
  • Applications: Advanced statistical analysis, data manipulation, visualization, and machine learning.

3) Python (with Libraries like Pandas, NumPy, Matplotlib, and Seaborn):

  • Description: A versatile programming language with libraries that support data manipulation, analysis, and visualization.
  • Applications: Data cleaning, statistical analysis, machine learning, and data visualization.

4) SPSS (Statistical Package for the Social Sciences):

  • Description: A comprehensive statistical software suite used for data analysis, data mining, and predictive analytics.
  • Applications: Descriptive statistics, hypothesis testing, regression analysis, and advanced analytics.

5) SAS (Statistical Analysis System):

  • Description: A software suite used for advanced analytics, multivariate analysis, and predictive modeling.
  • Applications: Data management, statistical analysis, predictive modeling, and business intelligence.

6) Tableau:

  • Description: A data visualization tool that allows users to create interactive and shareable dashboards and reports.
  • Applications: Data visualization, business intelligence, and interactive dashboard creation.

7) Power BI:

  • Description: A business analytics tool developed by Microsoft that provides interactive visualizations and business intelligence capabilities.
  • Applications: Data visualization, business intelligence, reporting, and dashboard creation.

8) SQL (Structured Query Language) Databases (e.g., MySQL, PostgreSQL, Microsoft SQL Server):

  • Description: Database management systems that support data storage, retrieval, and manipulation using SQL queries.
  • Applications: Data retrieval, data cleaning, data transformation, and database management.

9) Apache Spark:

  • Description: A fast and general-purpose distributed computing system designed for big data processing and analytics.
  • Applications: Big data processing, machine learning, data streaming, and real-time analytics.

10) IBM SPSS Modeler:

  • Description: A data mining software application used for building predictive models and conducting advanced analytics.
  • Applications: Predictive modeling, data mining, statistical analysis, and decision optimization.

These tools serve various purposes and cater to different data analysis needs, from basic statistical analysis and data visualization to advanced analytics, machine learning, and big data processing. The choice of a specific tool often depends on the nature of the data, the complexity of the analysis, and the specific requirements of the project or organization.


Importance of Data Analysis in Research

The importance of data analysis in research cannot be overstated; it serves as the backbone of any scientific investigation or study. Here are several key reasons why data analysis is crucial in the research process:

  • Data analysis helps ensure that the results obtained are valid and reliable. By systematically examining the data, researchers can identify any inconsistencies or anomalies that may affect the credibility of the findings.
  • Effective data analysis provides researchers with the necessary information to make informed decisions. By interpreting the collected data, researchers can draw conclusions, make predictions, or formulate recommendations based on evidence rather than intuition or guesswork.
  • Data analysis allows researchers to identify patterns, trends, and relationships within the data. This can lead to a deeper understanding of the research topic, enabling researchers to uncover insights that may not be immediately apparent.
  • In empirical research, data analysis plays a critical role in testing hypotheses. Researchers collect data to either support or refute their hypotheses, and data analysis provides the tools and techniques to evaluate these hypotheses rigorously.
  • Transparent and well-executed data analysis enhances the credibility of research findings. By clearly documenting the data analysis methods and procedures, researchers allow others to replicate the study, thereby contributing to the reproducibility of research findings.
  • In fields such as business or healthcare, data analysis helps organizations allocate resources more efficiently. By analyzing data on consumer behavior, market trends, or patient outcomes, organizations can make strategic decisions about resource allocation, budgeting, and planning.
  • In public policy and social sciences, data analysis is instrumental in developing and evaluating policies and interventions. By analyzing data on social, economic, or environmental factors, policymakers can assess the effectiveness of existing policies and inform the development of new ones.
  • Data analysis allows for continuous improvement in research methods and practices. By analyzing past research projects, identifying areas for improvement, and implementing changes based on data-driven insights, researchers can refine their approaches and enhance the quality of future research endeavors.

However, it is important to remember that mastering these techniques requires practice and continuous learning. That’s why we highly recommend the Data Analytics Course by Physics Wallah. Not only does it cover all the fundamentals of data analysis, but it also provides hands-on experience with various tools such as Excel, Python, and Tableau. Plus, if you use the “READER” coupon code at checkout, you can get a special discount on the course.


Data Analysis Techniques in Research FAQs

What are the 5 techniques for data analysis?

The five techniques for data analysis include: Descriptive Analysis, Diagnostic Analysis, Predictive Analysis, Prescriptive Analysis, and Qualitative Analysis.

What are techniques of data analysis in research?

Techniques of data analysis in research encompass both qualitative and quantitative methods. These techniques involve processes like summarizing raw data, investigating causes of events, forecasting future outcomes, offering recommendations based on predictions, and examining non-numerical data to understand concepts or experiences.

What are the 3 methods of data analysis?

The three primary methods of data analysis are: Qualitative Analysis, Quantitative Analysis, and Mixed-Methods Analysis.

What are the four types of data analysis techniques?

The four types of data analysis techniques are: Descriptive Analysis, Diagnostic Analysis, Predictive Analysis, and Prescriptive Analysis.



Qualitative Data Analysis: Step-by-Step Guide (Manual vs. Automatic)

When we conduct qualitative research, need to explain changes in metrics, or want to understand people's opinions, we always turn to qualitative data. Qualitative data is typically generated through:

  • Interview transcripts
  • Surveys with open-ended questions
  • Contact center transcripts
  • Texts and documents
  • Audio and video recordings
  • Observational notes

Compared to quantitative data, which captures structured information, qualitative data is unstructured and has more depth. It can answer our questions, help us formulate hypotheses, and build understanding.

It's important to understand the differences between quantitative data & qualitative data . But unfortunately, analyzing qualitative data is difficult. While tools like Excel, Tableau and PowerBI crunch and visualize quantitative data with ease, there are a limited number of mainstream tools for analyzing qualitative data . The majority of qualitative data analysis still happens manually.

That said, there are two new trends that are changing this. First, there are advances in natural language processing (NLP) which is focused on understanding human language. Second, there is an explosion of user-friendly software designed for both researchers and businesses. Both help automate the qualitative data analysis process.

In this post we want to teach you how to conduct a successful qualitative data analysis. There are two primary qualitative data analysis methods: manual and automatic. We’ll guide you through the steps to conduct a manual analysis, and look at what is involved and the role technology — software solutions powered by NLP — can play in automating this process.

More businesses are switching to fully-automated analysis of qualitative customer data because it is cheaper, faster, and just as accurate. Primarily, businesses purchase subscriptions to feedback analytics platforms so that they can understand customer pain points and sentiment.

Overwhelming quantity of feedback

We’ll take you through 5 steps to conduct a successful qualitative data analysis. Within each step we will highlight the key difference between the manual, and automated approach of qualitative researchers. Here's an overview of the steps:

The 5 steps to doing qualitative data analysis

  • Gathering and collecting your qualitative data
  • Organizing and connecting your qualitative data
  • Coding your qualitative data
  • Analyzing the qualitative data for insights
  • Reporting on the insights derived from your analysis

What is Qualitative Data Analysis?

Qualitative data analysis is a process of gathering, structuring and interpreting qualitative data to understand what it represents.

Qualitative data is non-numerical and unstructured. Qualitative data generally refers to text, such as open-ended responses to survey questions or user interviews, but also includes audio, photos and video.

Businesses often perform qualitative data analysis on customer feedback. And within this context, qualitative data generally refers to verbatim text data collected from sources such as reviews, complaints, chat messages, support centre interactions, customer interviews, case notes or social media comments.

How is qualitative data analysis different from quantitative data analysis?

Understanding the differences between quantitative & qualitative data is important. When it comes to analyzing data, Qualitative Data Analysis serves a very different role to Quantitative Data Analysis. But what sets them apart?

Qualitative Data Analysis dives into the stories hidden in non-numerical data such as interviews, open-ended survey answers, or notes from observations. It uncovers the ‘whys’ and ‘hows’ giving a deep understanding of people’s experiences and emotions.

Quantitative Data Analysis on the other hand deals with numerical data, using statistics to measure differences, identify preferred options, and pinpoint root causes of issues.  It steps back to address questions like "how many" or "what percentage" to offer broad insights we can apply to larger groups.

In short, Qualitative Data Analysis is like a microscope,  helping us understand specific detail. Quantitative Data Analysis is like the telescope, giving us a broader perspective. Both are important, working together to decode data for different objectives.

Qualitative Data Analysis methods

Once all the data has been captured, there are a variety of analysis techniques available and the choice is determined by your specific research objectives and the kind of data you’ve gathered.  Common qualitative data analysis methods include:

Content Analysis

This is a popular approach to qualitative data analysis. Other qualitative analysis techniques may fit within the broad scope of content analysis; thematic analysis, for example, is a form of content analysis. Content analysis is used to identify the patterns that emerge from text, by grouping content into words, concepts, and themes. Content analysis is useful to quantify the relationships between all of the grouped content. The Columbia School of Public Health has a detailed breakdown of content analysis.

Narrative Analysis

Narrative analysis focuses on the stories people tell and the language they use to make sense of them.  It is particularly useful in qualitative research methods where customer stories are used to get a deep understanding of customers’ perspectives on a specific issue. A narrative analysis might enable us to summarize the outcomes of a focused case study.

Discourse Analysis

Discourse analysis is used to get a thorough understanding of the political, cultural and power dynamics that exist in specific situations.  The focus of discourse analysis here is on the way people express themselves in different social contexts. Discourse analysis is commonly used by brand strategists who hope to understand why a group of people feel the way they do about a brand or product.

Thematic Analysis

Thematic analysis is used to deduce the meaning behind the words people use. This is accomplished by discovering repeating themes in text. These meaningful themes reveal key insights into data and can be quantified, particularly when paired with sentiment analysis . Often, the outcome of thematic analysis is a code frame that captures themes in terms of codes, also called categories. So the process of thematic analysis is also referred to as “coding”. A common use-case for thematic analysis in companies is analysis of customer feedback.

Grounded Theory

Grounded theory is a useful approach when little is known about a subject. Grounded theory starts by formulating a theory around a single data case. This means that the theory is “grounded”. Grounded theory analysis is based on actual data, and not entirely speculative. Then additional cases can be examined to see if they are relevant and can add to the original grounded theory.

Methods of qualitative data analysis; approaches and techniques to qualitative data analysis

Challenges of Qualitative Data Analysis

While Qualitative Data Analysis offers rich insights, it comes with its challenges. Each unique QDA method has its unique hurdles. Let’s take a look at the challenges researchers and analysts might face, depending on the chosen method.

  • Time and Effort (Narrative Analysis): Narrative analysis, which focuses on personal stories, demands patience. Sifting through lengthy narratives to find meaningful insights can be time-consuming and requires dedicated effort.
  • Being Objective (Grounded Theory): Grounded theory, building theories from data, faces the challenge of personal bias. Staying objective while interpreting data is crucial, ensuring conclusions are rooted in the data itself.
  • Complexity (Thematic Analysis): Thematic analysis involves identifying themes within data, a process that can be intricate. Categorizing and understanding themes can be complex, especially when each piece of data varies in context and structure. Thematic Analysis software can simplify this process.
  • Generalizing Findings (Narrative Analysis): Narrative analysis, dealing with individual stories, makes drawing broad conclusions challenging. Extending findings from a single narrative to a broader context requires careful consideration.
  • Managing Data (Thematic Analysis): Thematic analysis involves organizing and managing vast amounts of unstructured data, like interview transcripts. Managing this can be a hefty task, requiring effective data management strategies.
  • Skill Level (Grounded Theory): Grounded theory demands specific skills to build theories from the ground up. Finding or training analysts with these skills poses a challenge, requiring investment in building expertise.

Benefits of qualitative data analysis

Qualitative Data Analysis (QDA) is like a versatile toolkit, offering a tailored approach to understanding your data. The benefits it offers are as diverse as the methods. Let’s explore why choosing the right method matters.

  • Tailored Methods for Specific Needs: QDA isn’t one-size-fits-all. Depending on your research objectives and the type of data at hand, different methods offer unique benefits. If you want emotive customer stories, narrative analysis paints a strong picture. When you want to explain a score, thematic analysis reveals insightful patterns.
  • Flexibility with Thematic Analysis: Thematic analysis is like a chameleon in the toolkit of QDA. It adapts well to different types of data and research objectives, making it a top choice for any qualitative analysis.
  • Deeper Understanding, Better Products: QDA helps you dive into people’s thoughts and feelings. This deep understanding helps you build products and services that truly match what people want, ensuring satisfied customers.
  • Finding the Unexpected: Qualitative data often reveals surprises that we miss in quantitative data. QDA offers new ideas and perspectives for insights we might otherwise miss.
  • Building Effective Strategies: Insights from QDA are like strategic guides. They help businesses in crafting plans that match people’s desires.
  • Creating Genuine Connections: Understanding people’s experiences lets businesses connect on a real level. This genuine connection helps build trust and loyalty, priceless for any business.

How to do Qualitative Data Analysis: 5 steps

Now we are going to show how you can do your own qualitative data analysis. We will guide you through this process step by step. As mentioned earlier, you will learn how to do qualitative data analysis manually , and also automatically using modern qualitative data and thematic analysis software.

To get the best value from the analysis and research process, it’s important to be super clear about the nature and scope of the question that’s being researched. This will help you select the data collection channels that are most likely to help you answer your question.

Depending on if you are a business looking to understand customer sentiment, or an academic surveying a school, your approach to qualitative data analysis will be unique.

Once you’re clear, there’s a sequence to follow. And, though there are differences in the manual and automatic approaches, the process steps are mostly the same.

The use case for our step-by-step guide is a company looking to collect data (customer feedback data), and analyze the customer feedback - in order to improve customer experience. By analyzing the customer feedback the company derives insights about their business and their customers. You can follow these same steps regardless of the nature of your research. Let’s get started.

Step 1: Gather your qualitative data and conduct research (Conduct qualitative research)

The first step of qualitative research is to do data collection. Put simply, data collection is gathering all of your data for analysis. A common situation is when qualitative data is spread across various sources.

Classic methods of gathering qualitative data

Most companies use traditional methods for gathering qualitative data: conducting interviews with research participants, running surveys, and running focus groups. This data is typically stored in documents, CRMs, databases and knowledge bases. It’s important to examine which data is available and needs to be included in your research project, based on its scope.

Using your existing qualitative feedback

As it becomes easier for customers to engage across a range of different channels, companies are gathering increasingly large amounts of both solicited and unsolicited qualitative feedback.

Most organizations have now invested in Voice of Customer programs , support ticketing systems, chatbot and support conversations, emails and even customer Slack chats.

These new channels provide companies with new ways of getting feedback, and also allow the collection of unstructured feedback data at scale.

The great thing about this data is that it contains a wealth of valuable insights and that it’s already there! When you have a new question about user behavior or your customers, you don’t need to create a new research study or set up a focus group. You can find most answers in the data you already have.

Typically, this data is stored in third-party solutions or a central database, but there are ways to export it or connect to a feedback analysis solution through integrations or an API.

Utilize untapped qualitative data channels

There are many online qualitative data sources you may not have considered. For example, you can find useful qualitative data in social media channels like Twitter or Facebook. Online forums, review sites, and online communities such as Discourse or Reddit also contain valuable data about your customers, or research questions.

If you are considering performing a qualitative benchmark analysis against competitors - the internet is your best friend. Gathering feedback in competitor reviews on sites like Trustpilot, G2, Capterra, Better Business Bureau or on app stores is a great way to perform a competitor benchmark analysis.

Customer feedback analysis software often has integrations into social media and review sites, or you could use a solution like DataMiner to scrape the reviews.

G2.com reviews of the product Airtable. You could pull reviews from G2 for your analysis.

Step 2: Connect & organize all your qualitative data

Now you have all this qualitative data, but there’s a problem: the data is unstructured. Before feedback can be analyzed and assigned any value, it needs to be organized in a single place. Why is this important? Consistency!

If all data is easily accessible in one place and analyzed in a consistent manner, you will have an easier time summarizing and making decisions based on this data.

The manual approach to organizing your data

The classic method of structuring qualitative data is to plot all the raw data you’ve gathered into a spreadsheet.

Typically, research and support teams would share large Excel sheets and different business units would make sense of the qualitative feedback data on their own. Each team collects and organizes the data in a way that best suits them, which means the feedback tends to be kept in separate silos.

An alternative and a more robust solution is to store feedback in a central database, like Snowflake or Amazon Redshift .

Keep in mind that when you organize your data in this way, you are often preparing it to be imported into another software. If you go the route of a database, you would need to use an API to push the feedback into a third-party software.

Computer-assisted qualitative data analysis software (CAQDAS)

Traditionally within the manual analysis approach (but not always), qualitative data is imported into CAQDAS software for coding.

In the early 2000s, CAQDAS software was popularised by developers such as ATLAS.ti, NVivo and MAXQDA and eagerly adopted by researchers to assist with the organizing and coding of data.  

The benefits of using computer-assisted qualitative data analysis software:

  • Assists in the organizing of your data
  • Opens you up to exploring different interpretations of your data analysis
  • Allows you to share your dataset easier and allows group collaboration (allows for secondary analysis)

However you still need to code the data, uncover the themes and do the analysis yourself. Therefore it is still a manual approach.

The user interface of CAQDAS software 'NVivo'

Organizing your qualitative data in a feedback repository

Another solution to organizing your qualitative data is to upload it into a feedback repository where it can be unified with your other data , and easily searchable and taggable. There are a number of software solutions that act as a central repository for your qualitative research data. Here are a couple solutions that you could investigate:  

  • Dovetail: Dovetail is a research repository with a focus on video and audio transcriptions. You can tag your transcriptions within the platform for theme analysis. You can also upload your other qualitative data such as research reports, survey responses, support conversations, and customer interviews. Dovetail acts as a single, searchable repository. And makes it easier to collaborate with other people around your qualitative research.
  • EnjoyHQ: EnjoyHQ is another research repository with similar functionality to Dovetail. It boasts a more sophisticated search engine, but it has a higher starting subscription cost.

Organizing your qualitative data in a feedback analytics platform

If you have a lot of qualitative customer or employee feedback, from the likes of customer surveys or employee surveys, you will benefit from a feedback analytics platform. A feedback analytics platform is a software that automates the process of both sentiment analysis and thematic analysis . Companies use the integrations offered by these platforms to directly tap into their qualitative data sources (review sites, social media, survey responses, etc.). The data collected is then organized and analyzed consistently within the platform.

If you have data prepared in a spreadsheet, it can also be imported into feedback analytics platforms.

Once all this rich data has been organized within the feedback analytics platform, it is ready to be coded and themed, within the same platform. Thematic is a feedback analytics platform that offers one of the largest libraries of integrations with qualitative data sources.

Some of qualitative data integrations offered by Thematic

Step 3: Coding your qualitative data

Your feedback data is now organized in one place. Either within your spreadsheet, CAQDAS, feedback repository or within your feedback analytics platform. The next step is to code your feedback data so we can extract meaningful insights in the next step.

Coding is the process of labelling and organizing your data in such a way that you can then identify themes in the data, and the relationships between these themes.

To simplify the coding process, you will take small samples of your customer feedback data, come up with a set of codes, or categories capturing themes, and label each piece of feedback, systematically, for patterns and meaning. Then you will take a larger sample of data, revising and refining the codes for greater accuracy and consistency as you go.

If you choose to use a feedback analytics platform, much of this process will be automated and accomplished for you.

The terms to describe different categories of meaning (‘theme’, ‘code’, ‘tag’, ‘category’ etc) can be confusing as they are often used interchangeably.  For clarity, this article will use the term ‘code’.

To code means to identify key words or phrases and assign them to a category of meaning. “I really hate the customer service of this computer software company” would be coded as “poor customer service”.

How to manually code your qualitative data

  • Decide whether you will use deductive or inductive coding. Deductive coding is when you create a list of predefined codes, and then assign them to the qualitative data. Inductive coding is the opposite of this, you create codes based on the data itself. Codes arise directly from the data and you label them as you go. You need to weigh up the pros and cons of each coding method and select the most appropriate.
  • Read through the feedback data to get a broad sense of what it reveals. Now it’s time to start assigning your first set of codes to statements and sections of text.
  • Keep repeating step 2, adding new codes and revising the code description as often as necessary.  Once it has all been coded, go through everything again, to be sure there are no inconsistencies and that nothing has been overlooked.
  • Create a code frame to group your codes. The coding frame is the organizational structure of all your codes. And there are two commonly used types of coding frames, flat, or hierarchical. A hierarchical code frame will make it easier for you to derive insights from your analysis.
  • Based on the number of times a particular code occurs, you can now see the common themes in your feedback data. This is insightful! If ‘bad customer service’ is a common code, it’s time to take action.

We have a detailed guide dedicated to manually coding your qualitative data .

Example of a hierarchical coding frame in qualitative data analysis
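
If you want to prototype the mechanics of deductive coding before committing to a tool, a toy keyword-matching sketch like the one below can help; the codes, keywords, and feedback snippets are invented for illustration, and real coding requires far more judgement than exact keyword matches:

```python
# A toy deductive code frame: each code is matched by a few keywords.
# Real qualitative coding requires human judgement; this only shows the mechanics.
code_frame = {
    "poor customer service": ["customer service", "support", "rude"],
    "pricing concerns": ["expensive", "price", "cost"],
    "ease of use": ["easy to use", "intuitive", "simple"],
}

feedback = [
    "The customer service was slow and the agent was rude.",
    "Great product, very intuitive and easy to use.",
    "Too expensive for what it offers.",
]

for comment in feedback:
    text = comment.lower()
    codes = [code for code, keywords in code_frame.items()
             if any(kw in text for kw in keywords)]
    print(f"{comment!r} -> {codes}")
```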

Using software to speed up manual coding of qualitative data

An Excel spreadsheet is still a popular method for coding. But various software solutions can help speed up this process. Here are some examples.

  • CAQDAS / NVivo - CAQDAS software has built-in functionality that allows you to code text within their software. You may find the interface the software offers easier for managing codes than a spreadsheet.
  • Dovetail/EnjoyHQ - You can tag transcripts and other textual data within these solutions. As they are also repositories you may find it simpler to keep the coding in one platform.
  • IBM SPSS - SPSS is a statistical analysis software that may make coding easier than in a spreadsheet.
  • Ascribe - Ascribe’s ‘Coder’ is a coding management system. Its user interface will make it easier for you to manage your codes.

Automating the qualitative coding process using thematic analysis software

In solutions which speed up the manual coding process, you still have to come up with valid codes and often apply codes manually to pieces of feedback. But there are also solutions that automate both the discovery and the application of codes.

Advances in machine learning have now made it possible to read, code and structure qualitative data automatically. This type of automated coding is offered by thematic analysis software .

Automation makes it far simpler and faster to code the feedback and group it into themes. By incorporating natural language processing (NLP) into the software, the AI looks across sentences and phrases to identify common themes and meaningful statements. Some automated solutions detect repeating patterns and assign codes to them; others make you train the AI by providing examples. You could say that the AI learns the meaning of the feedback on its own.

Thematic automates the coding of qualitative feedback regardless of source. There’s no need to set up themes or categories in advance. Simply upload your data and wait a few minutes. You can also manually edit the codes to further refine their accuracy.  Experiments conducted indicate that Thematic’s automated coding is just as accurate as manual coding .

Paired with sentiment analysis and advanced text analytics - these automated solutions become powerful for deriving quality business or research insights.

You could also build your own , if you have the resources!

The key benefits of using an automated coding solution

Automated analysis can often be set up fast and there’s the potential to uncover things that would never have been revealed if you had given the software a prescribed list of themes to look for.

Because the model applies a consistent rule to the data, it captures phrases or statements that a human eye might have missed.

Complete and consistent analysis of customer feedback enables more meaningful findings. Leading us into step 4.

Step 4: Analyze your data: Find meaningful insights

Now we are going to analyze our data to find insights. This is where we start to answer our research questions. Keep in mind that step 4 and step 5 (tell the story) have some overlap. This is because creating visualizations is part of both the analysis process and reporting.

The task of uncovering insights is to scour through the codes that emerge from the data and draw meaningful correlations from them. It is also about making sure each insight is distinct and has enough data to support it.

Part of the analysis is to establish how much each code relates to different demographics and customer profiles, and identify whether there’s any relationship between these data points.

Manually create sub-codes to improve the quality of insights

If your code frame only has one level, you may find that your codes are too broad to be able to extract meaningful insights. This is where it is valuable to create sub-codes to your primary codes. This process is sometimes referred to as meta coding.

Note: If you take an inductive coding approach, you can create sub-codes as you are reading through your feedback data and coding it.

While time-consuming, this exercise will improve the quality of your analysis. Here is an example of what sub-codes could look like.

Example of sub-codes

You need to carefully read your qualitative data to create quality sub-codes. But as you can see, the depth of analysis is greatly improved. By calculating the frequency of these sub-codes you can get insight into which customer service problems you can immediately address.

Correlate the frequency of codes to customer segments

Many businesses use customer segmentation. And you may have your own respondent segments that you can apply to your qualitative analysis. Segmentation is the practice of dividing customers or research respondents into subgroups.

Segments can be based on:

  • Demographic
  • And any other data type that you care to segment by

It is particularly useful to see the occurrence of codes within your segments. If one of your customer segments is considered unimportant to your business, but they are the cause of nearly all customer service complaints, it may be in your best interest to focus attention elsewhere. This is a useful insight!
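
Here is a minimal pandas sketch of this segment analysis, using an invented table of coded feedback (the segment names and codes are hypothetical):

```python
import pandas as pd

# Hypothetical coded feedback: one row per comment, with its segment and code
coded = pd.DataFrame({
    "segment": ["SMB", "SMB", "Enterprise", "SMB", "Enterprise", "SMB"],
    "code": ["poor customer service", "pricing concerns", "ease of use",
             "poor customer service", "poor customer service", "pricing concerns"],
})

# How often does each code occur within each segment?
print(pd.crosstab(coded["code"], coded["segment"]))

# The same table as a share of each segment's feedback
print(pd.crosstab(coded["code"], coded["segment"], normalize="columns").round(2))
```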

Manually visualizing coded qualitative data

There are formulas you can use to visualize key insights in your data. The formulas we will suggest are imperative if you are measuring a score alongside your feedback.

If you are collecting a metric alongside your qualitative data, this is a key visualization. Impact answers the question: “What’s the impact of a code on my overall score?”. Using Net Promoter Score (NPS) as an example, first you need to:

  • Calculate overall NPS (A)
  • Calculate NPS in the subset of responses that do not contain that theme (B)
  • Subtract B from A

Then you can use this simple formula to calculate code impact on NPS .

Visualizing qualitative data: Calculating the impact of a code on your score

You can then visualize this data using a bar chart.
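
Here is a minimal Python sketch of that impact calculation, using invented responses and codes; the NPS helper follows the standard definition (% promoters minus % detractors), and the theme name is hypothetical:

```python
import pandas as pd

def nps(scores: pd.Series) -> float:
    """Net Promoter Score: % promoters (9-10) minus % detractors (0-6)."""
    promoters = (scores >= 9).mean() * 100
    detractors = (scores <= 6).mean() * 100
    return promoters - detractors

# Hypothetical responses: each row has an NPS rating and the codes assigned to it
responses = pd.DataFrame({
    "score": [10, 9, 3, 6, 8, 2, 10, 7, 4, 9],
    "codes": [[], ["ease of use"], ["poor customer service"], ["pricing concerns"],
              [], ["poor customer service"], ["ease of use"], [],
              ["poor customer service"], []],
})

theme = "poor customer service"
overall = nps(responses["score"])                            # A: overall NPS
mask_without = ~responses["codes"].apply(lambda codes: theme in codes)
without_theme = nps(responses.loc[mask_without, "score"])    # B: NPS excluding that theme
impact = overall - without_theme                             # A - B
print(f"Impact of '{theme}' on NPS: {impact:+.1f} points")
```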

You can download our CX toolkit - it includes a template to recreate this.

Trends over time

This analysis can help you answer questions like: “Which codes are linked to decreases or increases in my score over time?”

We need to compare two sequences of numbers: NPS over time and code frequency over time. Using Excel, calculate the correlation between the two sequences, which can be either positive (the more codes the higher the NPS, see picture below), or negative (the more codes the lower the NPS).

Now you need to plot code frequency against the absolute value of code correlation with NPS. Here is the formula:

Analyzing qualitative data: Calculate which codes are linked to increases or decreases in my score
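Here is one way the same calculation could be sketched in Python with pandas instead of Excel. The monthly NPS values and code frequencies are hypothetical.

```python
import pandas as pd

# Hypothetical monthly series: overall NPS and how often each code appeared
monthly = pd.DataFrame({
    "nps":          [32, 30, 28, 25, 27, 24],
    "slow support": [12, 15, 18, 22, 20, 25],
    "ease of use":  [30, 28, 31, 29, 30, 27],
})

code_columns = monthly.drop(columns="nps")
correlations = code_columns.corrwith(monthly["nps"])  # correlation of each code with NPS
frequencies = code_columns.sum()                      # total mentions of each code

# Plot-ready table: code frequency against the absolute correlation with NPS
summary = pd.DataFrame({
    "total_frequency": frequencies,
    "abs_correlation_with_nps": correlations.abs(),
})
print(summary)
```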

The visualization could look like this:

Visualizing qualitative data trends over time

These are two examples, but there are more. For a third manual formula, and to learn why word clouds are not an insightful form of analysis, read our visualizations article .

Using a text analytics solution to automate analysis

Automated text analytics solutions enable codes and sub-codes to be pulled out of the data automatically. This makes it far faster and easier to identify what’s driving negative or positive results. And to pick up emerging trends and find all manner of rich insights in the data.

Another benefit of AI-driven text analytics software is its built-in capability for sentiment analysis, which provides the emotive context behind your feedback and other qualitative text data.

Thematic provides text analytics that goes further by allowing users to apply their expertise on business context to edit or augment the AI-generated outputs.

Since the move away from manual research is generally about reducing the human element, adding human input to the technology might sound counter-intuitive. However, this is mostly to make sure important business nuances in the feedback aren’t missed during coding. The result is a higher accuracy of analysis. This is sometimes referred to as augmented intelligence .

Codes displayed by volume within Thematic. You can 'manage themes' to introduce human input.

Step 5: Report on your data: Tell the story

The last step of analyzing your qualitative data is to report on it, to tell the story. At this point, the codes are fully developed and the focus is on communicating the narrative to the audience.

A coherent outline of the qualitative research, the findings and the insights is vital for stakeholders to discuss and debate before they can devise a meaningful course of action.

Creating graphs and reporting in PowerPoint

Typically, qualitative researchers take the tried and tested approach of distilling their report into a series of charts, tables and other visuals which are woven into a narrative for presentation in PowerPoint.

Using visualization software for reporting

With data transformation and APIs, the analyzed data can be shared with data visualisation software such as Power BI, Tableau, Google Studio or Looker. Power BI and Tableau are among the most preferred options.

Visualizing your insights inside a feedback analytics platform

Feedback analytics platforms, like Thematic, incorporate visualisation tools that intuitively turn key data and insights into graphs. This removes the time-consuming work of constructing charts to visually identify patterns, and creates more time to focus on building a compelling narrative that highlights the insights, in bite-size chunks, for executive teams to review.

Using a feedback analytics platform with visualization tools means you don’t have to use a separate product for visualizations. You can export graphs into PowerPoint straight from the platform.

Two examples of qualitative data visualizations within Thematic

Conclusion - Manual or Automated?

There are those who remain deeply invested in the manual approach - because it’s familiar, because they’re reluctant to spend money and time learning new software, or because they’ve been burned by the overpromises of AI.  

For projects that involve small datasets, manual analysis makes sense. For example, if the objective is simply to answer a question like “Do customers prefer concept X to concept Y?”, or if the findings are being extracted from a small set of focus groups and interviews, sometimes it’s easier to just read them.

However, as new generations come into the workplace, it’s technology-driven solutions that feel more comfortable and practical. And the merits are undeniable.  Especially if the objective is to go deeper and understand the ‘why’ behind customers’ preference for X or Y. And even more especially if time and money are considerations.

The ability to collect a free flow of qualitative feedback data at the same time as the metric means AI can cost-effectively scan, crunch, score and analyze a ton of feedback from one system in one go. And time-intensive processes like focus groups, or coding, that used to take weeks, can now be completed in a matter of hours or days.

But aside from the ever-present business case to speed things up and keep costs down, there are also powerful research imperatives for automated analysis of qualitative data: namely, accuracy and consistency.

Finding insights hidden in feedback requires consistency, especially in coding.  Not to mention catching all the ‘unknown unknowns’ that can skew research findings and steering clear of cognitive bias.

Some say that without manual data analysis researchers won’t get an accurate “feel” for the insights. However, the larger the data set, the harder it is to sort through and organize feedback that has been pulled from different places. And the more difficult it is to stay on course, the greater the risk of drawing incorrect or incomplete conclusions.

Though the process steps for qualitative data analysis have remained pretty much unchanged since psychologist Paul Felix Lazarsfeld paved the path a hundred years ago, the impact digital technology has had on the types of qualitative feedback data and on the approach to analysis is profound.

If you want to try an automated feedback analysis solution on your own qualitative data, you can get started with Thematic .


Child Care and Early Education Research Connections

Data Analysis

This section describes the statistics and methods used to describe the characteristics of the members of a sample or population, explore the relationships between variables, test research hypotheses, and visually represent data. Terms relating to the topics covered are defined in the Research Glossary.

Descriptive Statistics

Tests of Significance

Graphical/Pictorial Methods

Analytical Techniques

Descriptive statistics can be useful for two purposes:

To provide basic information about the characteristics of a sample or population. These characteristics are represented by variables in a research study dataset.

To highlight potential relationships between these characteristics, or the relationships among the variables in the dataset.

The four most common descriptive statistics are:

  • Proportions, percentages and ratios
  • Measures of central tendency
  • Measures of dispersion
  • Measures of association

One of the most basic ways of describing the characteristics of a sample or population is to classify its individual members into mutually exclusive categories and to count the number of cases in each category. In research, variables with discrete, qualitative categories are called nominal or categorical variables. The categories can be given numerical codes, but they cannot be ranked, added, or multiplied. Examples of nominal variables include gender (male, female), preschool program attendance (yes, no), and race/ethnicity (White, African American, Hispanic, Asian, American Indian). Researchers calculate proportions, percentages and ratios in order to summarize the data from nominal or categorical variables and to allow for comparisons to be made between groups.

Proportion —The number of cases in a category divided by the total number of cases across all categories of a variable.

Percentage —The proportion multiplied by 100 (or the number of cases in a category divided by the total number of cases across all categories of a variable, times 100).

Ratio —The number of cases in one category to the number of cases in a second category.

A researcher selects a sample of 100 students from a Head Start program. The sample includes 20 White children, 30 African American children, 40 Hispanic children and 10 children of mixed-race/ethnicity.

Proportion of Hispanic children in the program = 40 / (20+30+40+10) = .40.

Percentage of Hispanic children in the program = .40 x 100 = 40%.

Ratio of Hispanic children to White children in the program = 40/20 = 2.0, or the ratio of Hispanic to White children enrolled in the Head Start program is 2 to 1.
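The same arithmetic, expressed as a short Python sketch using the Head Start sample above:

```python
# Counts from the example above
counts = {"White": 20, "African American": 30, "Hispanic": 40, "Mixed": 10}
total = sum(counts.values())  # 100

proportion_hispanic = counts["Hispanic"] / total                 # 0.40
percentage_hispanic = proportion_hispanic * 100                  # 40.0
ratio_hispanic_to_white = counts["Hispanic"] / counts["White"]   # 2.0

print(proportion_hispanic, percentage_hispanic, ratio_hispanic_to_white)
```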

Proportions, percentages and ratios are used to summarize the characteristics of a sample or population that fall into discrete categories. Measures of central tendency are the most basic and, often, the most informative description of a population's characteristics, when those characteristics are measured using an interval scale. The values of an interval variable are ordered where the distance between any two adjacent values is the same but the zero point is arbitrary. Values on an interval scale can be added and subtracted. Examples of interval scales or interval variables include household income, years of schooling, hours a child spends in child care and the cost of child care.

Measures of central tendency describe the "average" member of the sample or population of interest. There are three measures of central tendency:

Mean —The arithmetic average of the values of a variable. To calculate the mean, all the values of a variable are summed and divided by the total number of cases.

Median —The value within a set of values that divides the values in half (i.e. 50% of the variable's values lie above the median, and 50% lie below the median).

Mode —The value of a variable that occurs most often.

The annual incomes of five randomly selected people in the United States are $10,000, $10,000, $45,000, $60,000, and $1,000,000.

Mean Income = (10,000 + 10,000 + 45,000 + 60,000 + 1,000,000) / 5 = $225,000.

Median Income = $45,000.

Modal Income = $10,000.

The mean is the most commonly used measure of central tendency. Medians are generally used when a few values are extremely different from the rest of the values (this is called a skewed distribution). For example, the median income is often the best measure of the average income because, while most individuals earn between $0 and $200,000 annually, a handful of individuals earn millions.
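These three measures can be computed directly with Python's standard statistics module, using the income example above:

```python
import statistics

incomes = [10_000, 10_000, 45_000, 60_000, 1_000_000]

print(statistics.mean(incomes))    # 225000
print(statistics.median(incomes))  # 45000
print(statistics.mode(incomes))    # 10000
```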

Measures of dispersion provide information about the spread of a variable's values. There are three key measures of dispersion:

Range  is simply the difference between the smallest and largest values in the data. Researchers often report simply the values of the range (e.g., 75 – 100).

Variance  is a commonly used measure of dispersion, or how spread out a set of values are around the mean. It is calculated by taking the average of the squared differences between each value and the mean. The variance is the standard deviation squared.

Standard deviation , like variance, is a measure of the spread of a set of values around the mean of the values. The wider the spread, the greater the standard deviation and the greater the range of the values from their mean. A small standard deviation indicates that most of the values are close to the mean. A large standard deviation on the other hand indicates that the values are more spread out. The standard deviation is the square root of the variance.

Five randomly selected children were administered a standardized reading assessment. Their scores on the assessment were 50, 50, 60, 75 and 90, with a mean score of 65.

Range = 90 - 50 = 40.

Variance = [(50 - 65)² + (50 - 65)² + (60 - 65)² + (75 - 65)² + (90 - 65)²] / (5 - 1) = 300. (This is the sample variance, which divides by n - 1; dividing by n instead gives the population variance, 240.)

Standard Deviation = Square Root (300) ≈ 17.32.
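A short Python sketch reproduces these values and shows the difference between the sample (n - 1) and population (n) formulas, using the reading scores above:

```python
import statistics

scores = [50, 50, 60, 75, 90]

data_range = max(scores) - min(scores)               # 40
sample_variance = statistics.variance(scores)        # 300  (divides by n - 1)
sample_sd = statistics.stdev(scores)                 # ~17.32
population_variance = statistics.pvariance(scores)   # 240  (divides by n)
population_sd = statistics.pstdev(scores)            # ~15.49

print(data_range, sample_variance, round(sample_sd, 2))
```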

Skewness and Kurtosis

The range, variance and standard deviation are measures of dispersion and provide information about the spread of the values of a variable. Two additional measures provide information about the shape of the distribution of values.

Skew  is a measure of whether some values of a variable are extremely different from the majority of the values. Skewness refers to the tendency of the values of a variable to depart from symmetry. A distribution is symmetric if one half of the distribution is exactly equal to the other half. For example, the distribution of annual income in the U.S. is skewed because most people make between $0 and $200,000 a year, but a handful of people earn millions. A variable is positively skewed (skewed to the right) if the extreme values are higher than the majority of values. A variable is negatively skewed (skewed to the left) if the extreme values are lower than the majority of values. In the example of students' standardized test scores, the distribution is slightly positively skewed.

Kurtosis  measures how outlier-prone a distribution is. Outliers are values of a variable that are much smaller or larger than most of the values found in a dataset. The excess kurtosis of a normal distribution is 0. If the excess kurtosis is different from 0, then the distribution produces outliers that are either more extreme (positive kurtosis) or less extreme (negative kurtosis) than those produced by the normal distribution.

Measures of association indicate whether two variables are related. Two measures are commonly used:

Chi-square test of independence

Correlation

Chi-Square test of independence  is used to evaluate whether there is an association between two variables. (The chi-square test can also be used as a measure of goodness of fit, to test if data from a sample come from a population with a specific distribution, as an alternative to Anderson-Darling and Kolmogorov-Smirnov goodness-of-fit tests.)

It is most often used with nominal data (i.e., data that are put into discrete categories: e.g., gender [male, female] and type of job [unskilled, semi-skilled, skilled]) to determine whether they are associated. However, it can also be used with ordinal data.

Assumes that the samples being compared (e.g., males, females) are independent.

Tests the null hypothesis of no difference between the two variables (i.e., type of job is not related to gender).

To test for associations, a chi-square is calculated in the following way: Suppose a researcher wants to know whether there is a relationship between gender and two types of jobs, construction worker and administrative assistant. To perform a chi-square test, the researcher counts the number of female administrative assistants, the number of female construction workers, the number of male administrative assistants, and the number of male construction workers in the data. These counts are compared with the number that would be expected in each category if there were no association between job type and gender (this expected count is based on statistical calculations). The association between the two variables is determined to be significant (the null hypothesis is rejected), if the value of the chi-square test is greater than or equal to the critical value for a given significance level (typically .05) and the degrees of freedom associated with the test found in a chi-square table. The degrees of freedom for the chi-square are calculated using the following formula:  df  = (r-1)(c-1) where r is the number of rows and c is the number of columns in a contingency or cross-tabulation table. For example, the critical value for a 2 x 2 table with 1 degree of freedom ([2-1][2-1]=1) is 3.841.
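As an illustration, here is how such a test might be run in Python with scipy. The counts in the gender-by-job-type contingency table are invented.

```python
from scipy.stats import chi2_contingency

# Hypothetical observed counts:
#                 administrative assistant   construction worker
contingency = [[60, 10],    # female
               [20, 50]]    # male

chi2, p_value, dof, expected = chi2_contingency(contingency)
print(chi2, p_value, dof)   # dof = (2 - 1)(2 - 1) = 1

# Reject the null hypothesis of no association if p_value < .05
```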

Correlation coefficient  is used to measure the strength and direction of the relationship between numeric variables (e.g., weight and height).

The most common correlation coefficient is the Pearson's product-moment correlation coefficient (or simply  Pearson's r ), which can range from -1 to +1.

Values closer to 1 (either positive or negative) indicate that a stronger association exists between the two variables.

A positive coefficient (values between 0 and 1) suggests that larger values of one of the variables are accompanied by larger values of the other variable. For example, height and weight are usually positively correlated because taller people tend to weigh more.

A negative association (values between 0 and -1) suggests that larger values of one of the variables are accompanied by smaller values of the other variable. For example, age and hours slept per night are often negatively correlated because older people usually sleep fewer hours per night than younger people.
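A minimal Python sketch of a Pearson correlation, using invented height and weight values:

```python
from scipy.stats import pearsonr

# Hypothetical heights (inches) and weights (pounds) for six people
heights = [62, 64, 66, 68, 70, 72]
weights = [120, 135, 140, 155, 165, 180]

r, p_value = pearsonr(heights, weights)
print(round(r, 3), round(p_value, 4))   # r close to +1: strong positive association
```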

The findings reported by researchers are typically based on data collected from a single sample that was drawn from the population of interest (e.g., a sample of children selected from the population of children enrolled in Head Start or Early Head Start). If additional random samples of the same size were drawn from this population, the estimated percentages and means calculated using the data from each of these other samples might differ by chance somewhat from the estimates produced from one sample. Researchers use one of several tests to evaluate whether their findings are statistically significant.

Statistical significance refers to the probability or likelihood that the difference between groups or the relationship between variables observed in statistical analyses is not due to random chance (e.g., that differences between the average scores on a measure of language development between 3- and 4-year-olds are likely to be “real” rather than just observed in this sample by chance). If there is a very small probability that an observed difference or relationship is due to chance, the results are said to reach statistical significance. This means that the researcher concludes that there is a real difference between two groups or a real relationship between the observed variables.

Significance tests and the associated p-value only tell us how likely it is that a statistical result (e.g., a difference between the means of two or more groups, or a correlation between two variables) is due to chance. The p-value is the probability that the results of a statistical test are due to chance. In the social and behavioral sciences, a p-value less than or equal to .05 is usually interpreted to mean that the results are statistically significant (that the statistical results would occur by chance 5 times or fewer out of 100), although sometimes researchers use a p-value of .10 to indicate whether a result is statistically significant. The lower the p-value, the less likely it is that a statistical result is due to chance. A lower p-value is therefore a more rigorous criterion for concluding significance.

Researchers use a variety of approaches to test whether their findings are statistically significant or not. The choice depends on several factors, including the number of groups being compared, whether the groups are independent from one another, and the type of variables used in the analysis. Three of the more widely used tests (the chi-square test, the t-test, and the F-test) are described briefly below.

Chi-Square test  is used when testing for associations between categorical variables (e.g., differences in whether a child has been diagnosed as having a cognitive disability by gender or race/ethnicity). It is also used as a goodness-of-fit test to determine whether data from a sample come from a population with a specific distribution.

t-test  is used to compare the means of two independent samples (independent t-test), the means of one sample at different times (paired sample t-test) or the mean of one sample against a known mean (one sample t-test). For example, when comparing the mean assessment scores of boys and girls or the mean scores of 3- and 4-year-old children, an independent t-test would be used. When comparing the mean assessment scores of girls only at two time points (e.g., fall and spring of the program year) a paired t-test would be used. A one sample t-test would be used when comparing the mean scores of a sample of children to the mean score of a population of children. The t- test is appropriate for small sample sizes (less than 30) although it is often used when testing group differences for larger samples. It is also used to test whether correlation and regression coefficients are significantly different from zero.

F-test  is an extension of the t-test and is used to compare the means of three or more independent samples (groups). The F-test is used in Analysis of Variance (ANOVA) to examine the ratio of the between groups to within groups variance. It is also used to test the significance of the total variance explained by a regression model with multiple independent variables.
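For illustration, an independent-samples t-test and a one-way ANOVA (F-test) might be run in Python as follows; the assessment scores are invented.

```python
from scipy.stats import ttest_ind, f_oneway

# Hypothetical assessment scores
boys = [78, 82, 85, 90, 74, 88]
girls = [84, 86, 91, 79, 93, 87]
age3, age4, age5 = [70, 75, 72, 68], [78, 80, 77, 82], [85, 88, 84, 90]

t_stat, p_two_groups = ttest_ind(boys, girls)        # compare two independent groups
f_stat, p_three_groups = f_oneway(age3, age4, age5)  # compare three or more groups

print(round(p_two_groups, 3), round(p_three_groups, 4))
```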

Significance tests alone do not tell us anything about the size of the difference between groups or the strength of the association between variables. Because significance test results are sensitive to sample size, studies with different sample sizes with the same means and standard deviations would have different t statistics and p values. It is therefore important that researchers provide additional information about the size of the difference between groups or the association and whether the difference/association is substantively meaningful.

See the following for additional information about descriptive statistics and tests of significance:

Descriptive analysis in education: A guide for researchers  (PDF)

Basic Statistics

Effect Sizes and Statistical Significance

Summarizing and Presenting Data

There are several graphical and pictorial methods that enhance understanding of individual variables and the relationships between variables. Graphical and pictorial methods provide a visual representation of the data. Some of these methods include:

  • Bar charts
  • Pie charts
  • Line graphs
  • Scatter plots
  • Geographic Information Systems (GIS)

Bar charts visually represent the frequencies or percentages with which different categories of a variable occur.

Bar charts are most often used when describing the percentages of different groups with a specific characteristic, for example, the percentages of boys and girls who participate in team sports. However, they may also be used when describing averages, such as the average amount of time boys and girls spend per week participating in team sports.

Each category of a variable (e.g., gender [boys and girls], children's age [3, 4, and 5]) is displayed along the bottom (or horizontal or X axis) of a bar chart.

The vertical axis (or Y axis) shows the values of the statistic on which the groups are being compared (e.g., percentage participating in team sports).

A bar is drawn for each of the categories along the horizontal axis and the height of the bar corresponds to the frequency or percentage with which that value occurs.

A pie chart (or a circle chart) is one of the most commonly used methods for graphically presenting statistical data.

As its name suggests, it is a circular graphic, which is divided into slices to illustrate the proportion or percentage of a sample or population that belong to each of the categories of a variable.

The size of each slice represents the proportion or percentage of the total sample or population with a specific characteristic (found in a specific category). For example, the percentage of children enrolled in Early Head Start who are members of different racial/ethnic groups would be represented by different slices with the size of each slice proportionate to the group's representation in the total population of children enrolled in the Early Head Start program.

A line graph is a type of chart which displays information as a series of data points connected by a straight line.

Line graphs are often used to show changes in a characteristic over time.

It has an X-axis (horizontal axis) and a Y axis (vertical axis). The time segments of interest are displayed on the X-axis (e.g., years, months). The range of values that the characteristic of interest can take are displayed along the Y-axis (e.g., annual household income, mean years of schooling, average cost of child care). A data point is plotted coinciding with the value of the Y variable plotted for each of the values of the X variable, and a line is drawn connecting the points.

Scatter plots display the relationship between two quantitative or numeric variables by plotting one variable against the value of another variable.

The values of one of the two variables are displayed on the horizontal axis (x axis) and the values of the other variable are displayed on the vertical axis (y axis).

Each person or subject in a study would receive one data point on the scatter plot that corresponds to his or her values on the two variables. For example, a scatter plot could be used to show the relationship between income and children's scores on a math assessment. A data point for each child in the study showing his or her math score and family income would be shown on the scatter plot. Thus, the number of data points would equal the total number of children in the study.
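A scatter plot like the income-by-math-score example could be drawn with matplotlib; the data points below are invented.

```python
import matplotlib.pyplot as plt

# Hypothetical family income (in $1,000s) and children's math scores,
# one point per child
income = [20, 35, 40, 55, 60, 75, 90, 110]
math_score = [88, 92, 95, 101, 99, 108, 112, 115]

plt.scatter(income, math_score)
plt.xlabel("Family income ($1,000s)")
plt.ylabel("Math assessment score")
plt.title("Relationship between family income and math score")
plt.show()
```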

Geographic Information Systems (GIS)

A Geographic Information System is computer software capable of capturing, storing, analyzing, and displaying geographically referenced information; that is, data identified according to location.

Using a GIS program, a researcher can create a map to represent data relationships visually. For example, the National Center for Education Statistics creates maps showing the characteristics of school districts across the United States such as the percentage of children living in married couple households, median family incomes and percentage of population that speaks a language other than English. The data that are linked to school district location come from the American Community Survey.

Such visual displays can also show networks of relationships among variables, enabling researchers to identify the nature of relationships that would otherwise be too complex to conceptualize.

See the following for additional information about different graphic methods:

Graphical Analytic Techniques

Geographic Information Systems

Researchers use different analytical techniques to examine complex relationships between variables. There are three basic types of analytical techniques:

  • Regression analysis
  • Grouping methods
  • Multiple equation models

Regression analysis assumes that the dependent, or outcome, variable is directly affected by one or more independent variables. There are four important types of regression analyses:

Ordinary least squares (OLS) regression

OLS regression (also known as linear regression) is used to determine the relationship between a dependent variable and one or more independent variables.

OLS regression is used when the dependent variable is continuous. Continuous variables, in theory, can take on any value within a range. For example, family child care expenses, measured in dollars, is a continuous variable.

Independent variables may be nominal, ordinal or continuous. Nominal variables, which are also referred to as categorical variables, have two or more non-numeric or qualitative categories. Examples of nominal variables are children's gender (male, female), their parents' marital status (single, married, separated, divorced), and the type of child care children receive (center-based, home-based care). Ordinal variables are similar to nominal variables except it is possible to order the categories and the order has meaning. For example, children's families’ socioeconomic status may be grouped as low, middle and high.

When used to estimate the associations between two or more independent variables and a single dependent variable, it is called multiple linear regression.

In multiple regression, the coefficient (i.e., standardized or unstandardized regression coefficient for each independent variable) tells you how much the dependent variable is expected to change when that independent variable increases by one, holding all the other independent variables constant.
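A minimal sketch of a multiple linear regression in Python with statsmodels, using an invented dataset of weekly child care expenses, household income, and mother's years of schooling:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data
data = pd.DataFrame({
    "expenses":  [120, 150, 90, 200, 170, 80, 220, 140],  # weekly child care expenses ($)
    "income":    [40, 55, 30, 80, 65, 28, 90, 50],         # household income ($1,000s)
    "schooling": [12, 14, 12, 16, 16, 11, 18, 13],         # mother's years of schooling
})

# OLS regression of expenses on income and schooling
model = smf.ols("expenses ~ income + schooling", data=data).fit()
print(model.summary())

# Each coefficient is the expected change in expenses for a one-unit increase
# in that predictor, holding the other predictor constant.
```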

Logistic regression

Logistic regression (or logit regression) is a special form of regression analysis that is used to examine the associations between a set of independent or predictor variables and a dichotomous outcome variable. A dichotomous variable is a variable with only two possible values, e.g. child receives child care before or after the Head Start program day (yes, no).

Like linear regression, the independent variables may be either interval, ordinal, or nominal. A researcher might use logistic regression to study the relationships between parental education, household income, and parental employment and whether children receive child care from someone other than their parents (receives nonparent care/does not receive nonparent care).
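A comparable sketch for logistic regression, again with statsmodels and invented data on whether children receive nonparental care:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: nonparent_care is 1 if the child receives care from
# someone other than a parent, 0 otherwise
data = pd.DataFrame({
    "nonparent_care": [1, 0, 1, 1, 0, 0, 1, 0, 1, 1],
    "income":         [55, 45, 40, 60, 20, 50, 35, 22, 65, 75],  # $1,000s
    "employed":       [1, 0, 1, 1, 0, 1, 0, 0, 1, 1],            # parent employed (1/0)
})

# Logistic (logit) regression of the dichotomous outcome on the predictors
model = smf.logit("nonparent_care ~ income + employed", data=data).fit()
print(model.summary())
```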

Hierarchical linear modeling (HLM)

Used when data are nested. Nested data occur when several individuals belong to the same group under study. For example, in child care research, children enrolled in a center-based child care program are grouped into classrooms with several classrooms in a center. Thus, the children are nested within classrooms and classrooms are nested within centers.

Allows researchers to determine the effects of characteristics for each level of nested data, classrooms and centers, on the outcome variables. HLM is also used to study growth (e.g., growth in children’s reading and math knowledge and skills over time).

Duration models

Used to estimate the length of time before a given event occurs or the length of time spent in a state. For example, in child care policy research, duration models have been used to estimate the length of time that families receive child care subsidies.

Sometimes referred to as survival analysis or event history analysis.

Grouping methods are techniques for classifying observations into meaningful categories. Two of the most common grouping methods are discriminant analysis and cluster analysis.

Discriminant analysis

Identifies characteristics that distinguish between groups. For example, a researcher could use discriminant analysis to determine which characteristics identify families that seek child care subsidies and which identify families that do not.

It is used when the dependent variable is a categorical variable (e.g., family receives child care subsidies [yes, no], child enrolled in family care [yes, no], type of child care child receives [relative care, non-relative care, center-based care]). The independent variables are interval variables (e.g., years of schooling, family income).

Cluster analysis

Used to classify similar individuals together. It uses a set of measured variables to classify a sample of individuals (or organizations) into a number of groups such that individuals with similar values on the variables are placed in the same group. For example, cluster analysis would be used to group together parents who hold similar views of child care or children who are suspended from school.

Its goal is to sort individuals into groups in such a way that individuals in the same group (cluster) are more similar to each other than to individuals in other groups.

The variables used in cluster analysis may be nominal, ordinal or interval.
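As an illustrative sketch, a k-means cluster analysis in Python with scikit-learn might group parents by their (invented) ratings of different views on child care:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical survey responses: each row is a parent, each column a 1-5
# rating of how important cost, quality, and convenience are to them
ratings = np.array([
    [5, 4, 5], [4, 5, 4], [5, 5, 5],   # likely one cluster
    [2, 1, 2], [1, 2, 1], [2, 2, 1],   # likely another cluster
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(ratings)
print(kmeans.labels_)           # cluster assignment for each parent
print(kmeans.cluster_centers_)  # average ratings within each cluster
```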

Multiple equation modeling, which is an extension of regression, is used to examine the causal pathways from independent variables to the dependent variable. For example, what are the variables that link (or explain) the relationship between maternal education (independent variable) and children's early reading skills (dependent variable)? These variables might include the nature and quality of mother-child interactions or the frequency and quality of shared book reading.

There are two main types of multiple equation models:

Path analysis

Structural equation modeling

Path analysis is an extension of multiple regression that allows researchers to examine multiple direct and indirect effects of a set of variables on a dependent, or outcome, variable. In path analysis, a direct effect measures the extent to which the dependent variable is influenced by an independent variable. An indirect effect measures the extent to which an independent variable's influence on the dependent variable is due to another variable.

A path diagram is created that identifies the relationships (paths) between all the variables and the direction of the influence between them.

The paths can run directly from an independent variable to a dependent variable (e.g., X→Y), or they can run indirectly from an independent variable, through an intermediary, or mediating, variable, to the dependent variable (e.g. X1→X2→Y).

The paths in the model are tested to determine the relative importance of each.

Because the relationships between variables in a path model can become complex, researchers often avoid labeling the variables in the model as independent and dependent variables. Instead, two types of variables are found in these models:

Exogenous variables  are not affected by other variables in the model. They have straight arrows emerging from them and not pointing to them.

Endogenous variables  are influenced by at least one other variable in the model. They have at least one straight arrow pointing to them.

Structural equation modeling (SEM)

Structural equation modeling expands path analysis by allowing for multiple indicators of unobserved (or latent) variables in the model. Latent variables are variables that are not directly observed (measured), but instead are inferred from other variables that are observed or directly measured. For example, children's school readiness is a latent variable with multiple indicators of children's development across multiple domains (e.g., children's scores on standardized assessments of early math and literacy, language, scores based on teacher reports of children's social skills and problem behaviors).

There are two parts to a SEM analysis. First, the measurement model is tested. This involves examining the relationships between the latent variables and their measures (indicators). Second, the structural model is tested in order to examine how the latent variables are related to one another. For example, a researcher might use SEM to investigate the relationships between different types of executive functions and word reading and reading comprehension for elementary school children. In this example, the latent variables word reading and reading comprehension might be inferred from a set of standardized reading assessments and the latent variables cognitive flexibility and inhibitory control from a set of executive function tasks. The measurement model of SEM allows the researcher to evaluate how well children's scores on the standardized reading assessments combine to identify children's word reading and reading comprehension. Assuming that the results of these analyses are acceptable, the researcher would move on to an evaluation of the structural model, examining the predicted relationships between two types of executive functions and two dimensions of reading.

SEM has several advantages over traditional path analysis:

Use of multiple indicators for key variables reduces measurement error.

Can test whether the effects of variables in the model and the relationships depicted in the entire model are the same for different groups (e.g., are the direct and indirect effects of parent investments on children's school readiness the same for White, Hispanic and African American children).

Can test models with multiple dependent variables (e.g., models predicting several domains of child development).

See the following for additional information about multiple equation models:

Finding Our Way: An Introduction to Path Analysis (Streiner)

An Introduction to Structural Equation Modeling (Hox & Bechger)  (PDF)

How to Write Data Analysis Reports in 9 Easy Steps



Imagine a bunch of bricks. They don’t have a purpose until you put them together into a house, do they?

In business intelligence, data is your building material, and a quality data analysis report is what you want to see as the result.

But if you’ve ever tried to use the collected data and assemble it into an insightful report, you know it’s not an easy job to do. Data is supposed to tell a story about your performance, but there’s a long way from unprocessed, raw data to a meaningful narrative that you can use to create an actionable plan for making steady progress towards your goals.

This article will help you improve the quality of your data analysis reports and build them effortlessly and fast. Let’s jump right in.

What Is a Data Analysis Report?



A data analysis report is a type of business report in which you present quantitative and qualitative data to evaluate your strategies and performance. Based on this data, you give recommendations for further steps and business decisions while using the data as evidence that backs up your evaluation.

Today, data analysis is one of the most important elements of business intelligence strategies as companies have realized the potential of having data-driven insights at hand to help them make data-driven decisions.

Just like you’ll look at your car’s dashboard if something’s wrong, you’ll pull your data to see what’s causing drops in website traffic, conversions, or sales – or any other business metric you may be following. This unprocessed data still doesn’t give you a diagnosis – it’s the first step towards a quality analysis. Once you’ve extracted and organized your data, it’s important to use graphs and charts to visualize it and make it easier to draw conclusions.

Once you add meaning to your data and create suggestions based on it, you have a data analysis report.

A vital detail everyone should know about data analysis reports is that they should be accessible to everyone on your team and should support innovation. Your analysis report will contain your vital KPIs, so you can see where you’re reaching your targets and achieving goals, and where you need to speed up your activities or optimize your strategy. If you can uncover trends or patterns in your data, you can use them to innovate and stand out by offering even more valuable content, services, or products to your audience.

Data analysis is vital for companies for several reasons.

A reliable source of information

Trusting your intuition is fine, but relying on data is safer. When you can base your action plan on data that clearly shows that something is working or failing, you won’t only justify your decisions in front of the management, clients, or investors, but you’ll also be sure that you’ve taken appropriate steps to fix an issue or seize an important opportunity.

A better understanding of your business

According to Databox’s State of Business Reporting , most companies stated that regular monitoring and reporting improved progress monitoring, increased team effectiveness, allowed them to identify trends more easily, and improved financial performance. Data analysis makes it easier to understand your business as a whole, and each aspect individually. You can see how different departments analyze their workflow and how each step impacts their results in the end, by following their KPIs over time. Then, you can easily conclude what your business needs to grow – to boost your sales strategy, optimize your finances, or up your SEO game, for example.

An additional way to understand your business better is to compare your most important metrics and KPIs against companies that are just like yours. With Databox Benchmarks , you will need only one spot to see how all of your teams stack up against your peers and competitors.

Instantly and Anonymously Benchmark Your Company’s Performance Against Others Just Like You

If you ever asked yourself:

  • How does our marketing stack up against our competitors?
  • Are our salespeople as productive as reps from similar companies?
  • Are our profit margins as high as our peers?

Databox Benchmark Groups can finally help you answer these questions and discover how your company measures up against similar companies based on your KPIs.

When you join Benchmark Groups, you will:

  • Get instant, up-to-date data on how your company stacks up against similar companies based on the metrics most important to you. Explore benchmarks for dozens of metrics, built on anonymized data from thousands of companies and get a full 360° view of your company’s KPIs across sales, marketing, finance, and more.
  • Understand where your business excels and where you may be falling behind so you can shift to what will make the biggest impact. Leverage industry insights to set more effective, competitive business strategies. Explore where exactly you have room for growth within your business based on objective market data.
  • Keep your clients happy by using data to back up your expertise. Show your clients where you’re helping them overperform against similar companies. Use the data to show prospects where they really are… and the potential of where they could be.
  • Get a valuable asset for improving yearly and quarterly planning . Get valuable insights into areas that need more work. Gain more context for strategic planning.

The best part?

  • Benchmark Groups are free to access.
  • The data is 100% anonymized. No other company will be able to see your performance, and you won’t be able to see the performance of individual companies either.

When it comes to showing you how your performance compares to others, here is what it might look like for the metric Average Session Duration:


And here is an example of an open group you could join:


And this is just a fraction of what you’ll get. With Databox Benchmarks, you will need only one spot to see how all of your teams stack up — marketing, sales, customer service, product development, finance, and more. 

  • Choose criteria so that the Benchmark is calculated using only companies like yours
  • Narrow the benchmark sample using criteria that describe your company
  • Display benchmarks right on your Databox dashboards

Sounds like something you want to try out? Join a Databox Benchmark Group today!

It makes data accessible to everyone

Data is no longer a magical creature reserved only for data scientists. Now that you have streamlined, easy-to-follow data visualizations and tools that automatically show the latest figures, you can include everyone in the decision-making process, because they will understand what the charts and tables mean. The data may be complex, but it becomes easy to read when combined with proper illustrations. And when your teams gain such useful and accessible insight, they will feel motivated to act on it immediately.

Better collaboration

Data analysis reports help teams collaborate better, as well. You can apply the SMART technique to your KPIs and goals, because your KPIs become assignable. When they’re easy to interpret for your whole team, you can assign each person one or multiple KPIs that they’ll be in charge of. That means taking a lot off a team leader’s plate so they can focus more on making other improvements in the business. At the same time, removing inaccurate data from your day-to-day operations will reduce friction between different departments, like marketing and sales, for instance.

More productivity

You can also expect increased productivity, since you’ll be saving time you’d otherwise spend on waiting for specialists to translate data for other departments, etc. This means your internal procedures will also be on a top level.

Want to give value with your data analysis report? It’s critical to master the skill of writing a quality data analytics report. Want to know how to report on data efficiently? We’ll share our secret in the following section.

  • Start with an Outline
  • Make a Selection of Vital KPIs
  • Pick the Right Charts for Appealing Design
  • Use a Narrative
  • Organize the Information
  • Include a Summary
  • Careful with Your Recommendations
  • Double-Check Everything
  • Use Interactive Dashboards

1. Start with an Outline

If you start writing without having a clear idea of what your data analysis report is going to include, it may get messy. Important insights may slip through your fingers, and you may stray away too far from the main topic. To avoid this, start the report by writing an outline first. Plan the structure and contents of each section first to make sure you’ve covered everything, and only then start crafting the report.

2. Make a Selection of Vital KPIs

Don’t overwhelm the audience by including every single metric there is. You can discuss your whole dashboard in a meeting with your team, but if you’re creating data analytics reports or marketing reports for other departments or the executives, it’s best to focus on the most relevant KPIs that demonstrate the data important for the overall business performance.

PRO TIP: How Well Are Your Marketing KPIs Performing?

Like most marketers and marketing managers, you want to know how well your efforts are translating into results each month. How much traffic and new contact conversions do you get? How many new contacts do you get from organic sessions? How are your email campaigns performing? How well are your landing pages converting? You might have to scramble to put all of this together in a single report, but now you can have it all at your fingertips in a single Databox dashboard.

Our Marketing Overview Dashboard includes data from Google Analytics 4 and HubSpot Marketing with key performance metrics like:

  • Sessions . The number of sessions can tell you how many times people are returning to your website. Obviously, the higher the better.
  • New Contacts from Sessions . How well is your campaign driving new contacts and customers?
  • Marketing Performance KPIs . Tracking the number of MQLs, SQLs, New Contacts and similar will help you identify how your marketing efforts contribute to sales.
  • Email Performance . Measure the success of your email campaigns from HubSpot. Keep an eye on your most important email marketing metrics such as number of sent emails, number of opened emails, open rate, email click-through rate, and more.
  • Blog Posts and Landing Pages . How many people have viewed your blog recently? How well are your landing pages performing?

Now you can benefit from the experience of our Google Analytics and HubSpot Marketing experts, who have put together a plug-and-play Databox template that contains all the essential metrics for monitoring your leads. It’s simple to implement and start using as a standalone dashboard or in marketing reports, and best of all, it’s free!

marketing_overview_hubspot_ga_dashboard_preview

You can easily set it up in just a few clicks – no coding required.

To set up the dashboard, follow these 3 simple steps:

Step 1: Get the template 

Step 2: Connect your HubSpot and Google Analytics 4 accounts with Databox. 

Step 3: Watch your dashboard populate in seconds.

3. Pick the Right Charts for Appealing Design

If you’re showing historical data – for instance, how you’ve performed now compared to last month – it’s best to use timelines or graphs. For other data, pie charts or tables may be more suitable. Make sure you use the right data visualization to display your data accurately and in an easy-to-understand manner.

4. Use a Narrative

Do you work on analytics and reporting ? Just exporting your data into a spreadsheet doesn’t qualify as either of them. The fact that you’re dealing with data may sound too technical, but actually, your report should tell a story about your performance. What happened on a specific day? Did your organic traffic increase or suddenly drop? Why? And more. There are a lot of questions to answer and you can put all the responses together in a coherent, understandable narrative.

5. Organize the Information

Before you start writing or building your dashboard, choose how you’re going to organize your data. Are you going to talk about the most relevant and general ones first? It may be the best way to start the report – the best practices typically involve starting with more general information and then diving into details if necessary.

6. Include a Summary

Some people in your audience won’t have the time to read the whole report, but they’ll want to know about your findings. Besides, a summary at the beginning of your data analytics report will help the reader get familiar with the topic and the goal of the report. And a quick note: although the summary should be placed at the beginning, you usually write it when you’re done with the report. When you have the whole picture, it’s easier to extract the key points that you’ll include in the summary.

7. Careful with Your Recommendations

Your communication skills may be critical in data analytics reports. Know that some of the results probably won’t be satisfactory, which means that someone’s strategy failed. Make sure you’re objective in your recommendations and that you’re not looking for someone to blame. Don’t criticize, but give suggestions on how things can be improved. Being solution-oriented is much more important and helpful for the business.

8. Double-Check Everything

The whole point of using data analytics tools and data, in general, is to achieve as much accuracy as possible. Avoid manual mistakes by proofreading your report when you finish, and if possible, give it to another person so they can confirm everything’s in place.

9. Use Interactive Dashboards

Using the right tools is just as important as the contents of your data analysis. The way you present it can make or break a good report, regardless of how valuable the data is. That said, choose a great reporting tool that can automatically update your data and display it in a visually appealing manner. Make sure it offers streamlined interactive dashboards that you can also customize depending on the purpose of the report.

To wrap up the guide, we decided to share nine excellent examples of what awesome data analysis reports can look like. You’ll learn what metrics you should include and how to organize them in logical sections to make your report beautiful and effective.

  • Marketing Data Analysis Report Example
  • SEO Data Analysis Report Example
  • Sales Data Analysis Report Example
  • Customer Support Data Analysis Report Example
  • Help Desk Data Analysis Report Example
  • Ecommerce Data Analysis Report Example
  • Project Management Data Analysis Report Example
  • Social Media Data Analysis Report Example
  • Financial KPI Data Analysis Report Example

If you need an intuitive dashboard that allows you to track your website performance effortlessly and monitor all the relevant metrics such as website sessions, pageviews, or CTA engagement, you’ll love this free HubSpot Marketing Website Overview dashboard template .

Marketing Data Report Example

Tracking the performance of your SEO efforts is important. You can easily monitor relevant SEO KPIs like clicks by page, engaged sessions, or views by session medium by downloading this Google Organic SEO Dashboard .

Google Organic SEO Dashboard

How successful is your sales team? It’s easy to analyze their performance and predict future growth if you choose this HubSpot CRM Sales Analytics Overview dashboard template and track metrics such as average time to close the deal, new deals amount, or average revenue per new client.

Sales Data Analysis Report Example

Customer Support Analysis Data Report Example

Customer support is one of the essential factors that impact your business growth. You can use this streamlined, customizable Customer Success dashboard template . In a single dashboard, you can monitor metrics such as customer satisfaction score, new MRR, or time to first response time.

Customer Support Analysis Data Report Example

Other than being free and intuitive, this HelpScout for Customer Support dashboard template is also customizable and enables you to track the most vital metrics that indicate your customer support agents’ performance: handle time, happiness score, interactions per resolution, and more.

Help Desk Data Analysis Report Example

Is your online store improving or failing? You can easily collect relevant data about your store and monitor the most important metrics like total sales, orders placed, and new customers by downloading this WooCommerce Shop Overview dashboard template .

Ecommerce Data Analysis Report Example

Does your IT department need feedback on their project management performance? Download this Jira dashboard template to track vital metrics such as issues created or resolved, issues by status, etc. Jira enables you to gain valuable insights into your teams’ productivity.

Project Management Data Analysis Report Example

Need to know if your social media strategy is successful? You can find that out by using this easy-to-understand Social Media Awareness & Engagement dashboard template . Here you can monitor and analyze metrics like sessions by social source, track the number of likes and followers, and measure the traffic from each source.

Social Media Data Analysis Report Example

Tracking your finances is critical for keeping your business profitable. If you want to monitor metrics such as the number of open invoices, open deals amount by stage by pipeline, or closed-won deals, use this free QuickBooks + HubSpot CRM Financial Performance dashboard template .

Financial KPI Data Analysis Report Example

Rely on Accurate Data with Databox

“I don’t have time to build custom reports from scratch.”

“It takes too long and becomes daunting very soon.”

“I’m not sure how to organize the data to make it effective and prove the value of my work.”

Does this sound like you?

Well, it’s something we all said at some point – creating data analytics reports can be time-consuming and tiring. And you’re still not sure if the report is compelling and understandable enough when you’re done.

That’s why we decided to create Databox dashboards – a world-class solution for saving your money and time. We build streamlined and easy-to-follow dashboards that include all the metrics that you may need and allow you to create custom ones if necessary. That way, you can use templates and adjust them to any new project or client without having to build a report from scratch.

You can skip the setup and get your first dashboard for free in just 24 hours, with our fantastic customer support team on the line to assist you with the metrics you should track and the structure you should use.

Enjoy crafting brilliant data analysis reports that will improve your business – it’s never been faster and more effortless. Sign up today and get your free dashboard in no time.


medRxiv

Maternal and Infant Research Electronic Data Analysis (MIREDA): A protocol for creating a common data model for federated analysis of UK birth cohorts and the life course


Introduction Birth cohorts are valuable resources for studying early life, the determinants of health, disease, and development. They are essential for studying the life course. Electronic cohorts are live, dynamic longitudinal cohorts using anonymised, routinely collected data. There is no selection bias through direct recruitment, but they are limited to health and administrative system data and may lack contextual information.

The MIREDA (Maternal and Infant Research Electronic Data Analysis) partnership creates a UK-wide birth cohort by aligning existing electronic birth cohorts to have the same structure, content, and vocabularies, enabling UK-wide federated analyses.

Objectives Create a core dynamic, live UK-wide electronic birth cohort with approximately 100,000 new births per year using a common data model (CDM).

Provide data linkage and automation for long-term follow-up of births from MuM-PreDiCT and the ‘Born in’ initiatives of Bradford, Wales, Scotland, and South London for comparable analyses.

Methods We will establish core data content and collate linkable data. A suite of extraction, transformation, and load (ETL) tools will be used to transform the data for each birth cohort into the CDM. Transformed datasets will remain within each cohort’s trusted research environment (TRE). Metadata will be uploaded to the Health Data Research (HDRUK) Innovation Gateway for public access. We will develop a single online data access request for researchers. A cohort profile will be developed for researchers to reference the resource.

Ethics Each cohort has approval from their TRE through compliance with their project application processes and information governance.

Dissemination We will engage with researchers in the field to promote our resource through partnership networking, publication, research collaborations, conferences, social media, and marketing communications strategies.

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

This work was supported by an MRC Partnership Grant [MR/X02055X/1], MatCHNet pump-priming [U20005/302873] and an MRC Programme Grant [MR/X009742/1].

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

Access to data is granted according to the information governance requirements of each TRE. The Data Protection Act 2018 is not applicable to anonymised data and the OMOP CDM will be anonymised and provide aggregated data and statistics only. Each TRE has ethical approval for its operation and use, thus no additional ethical approval was required beyond the standard project approval by official channels.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

The formatting of the abstract on the website removed spaces between headings and text so I have amended to add punctuation to make it clearer. The paper itself is unchanged.

Data availability statement

Data will be available upon reasonable request through the Health Data Research (HDRUK) Innovation Gateway.


Subject Area: Public and Global Health


The Beginner's Guide to Statistical Analysis | 5 Steps & Examples

Statistical analysis means investigating trends, patterns, and relationships using quantitative data. It is an important research tool used by scientists, governments, businesses, and other organisations.

To draw valid conclusions, statistical analysis requires careful planning from the very start of the research process. You need to specify your hypotheses and make decisions about your research design, sample size, and sampling procedure.

After collecting data from your sample, you can organise and summarise the data using descriptive statistics. Then, you can use inferential statistics to formally test hypotheses and make estimates about the population. Finally, you can interpret and generalise your findings.

This article is a practical introduction to statistical analysis for students and researchers. We’ll walk you through the steps using two research examples. The first investigates a potential cause-and-effect relationship, while the second investigates a potential correlation between variables.

Table of contents

  • Step 1: Write your hypotheses and plan your research design
  • Step 2: Collect data from a sample
  • Step 3: Summarise your data with descriptive statistics
  • Step 4: Test hypotheses or make estimates with inferential statistics
  • Step 5: Interpret your results
  • Frequently asked questions about statistics

To collect valid data for statistical analysis, you first need to specify your hypotheses and plan out your research design.

Writing statistical hypotheses

The goal of research is often to investigate a relationship between variables within a population. You start with a prediction, and use statistical analysis to test that prediction.

A statistical hypothesis is a formal way of writing a prediction about a population. Every research prediction is rephrased into null and alternative hypotheses that can be tested using sample data.

While the null hypothesis always predicts no effect or no relationship between variables, the alternative hypothesis states your research prediction of an effect or relationship.

  • Null hypothesis: A 5-minute meditation exercise will have no effect on math test scores in teenagers.
  • Alternative hypothesis: A 5-minute meditation exercise will improve math test scores in teenagers.
  • Null hypothesis: Parental income and GPA have no relationship with each other in college students.
  • Alternative hypothesis: Parental income and GPA are positively correlated in college students.

Planning your research design

A research design is your overall strategy for data collection and analysis. It determines the statistical tests you can use to test your hypothesis later on.

First, decide whether your research will use a descriptive, correlational, or experimental design. Experiments directly influence variables, whereas descriptive and correlational studies only measure variables.

  • In an experimental design, you can assess a cause-and-effect relationship (e.g., the effect of meditation on test scores) using statistical tests of comparison or regression.
  • In a correlational design, you can explore relationships between variables (e.g., parental income and GPA) without any assumption of causality using correlation coefficients and significance tests.
  • In a descriptive design, you can study the characteristics of a population or phenomenon (e.g., the prevalence of anxiety in U.S. college students) using statistical tests to draw inferences from sample data.

Your research design also concerns whether you’ll compare participants at the group level or individual level, or both.

  • In a between-subjects design, you compare the group-level outcomes of participants who have been exposed to different treatments (e.g., those who performed a meditation exercise vs those who didn’t).
  • In a within-subjects design, you compare repeated measures from participants who have participated in all treatments of a study (e.g., scores from before and after performing a meditation exercise).
  • In a mixed (factorial) design, one variable is altered between subjects and another is altered within subjects (e.g., pretest and posttest scores from participants who either did or didn’t do a meditation exercise).

Example: Experimental research design

First, you’ll take baseline test scores from participants. Then, your participants will undergo a 5-minute meditation exercise. Finally, you’ll record participants’ scores from a second math test.

In this experiment, the independent variable is the 5-minute meditation exercise, and the dependent variable is the math test score from before and after the intervention.

Example: Correlational research design
In a correlational study, you test whether there is a relationship between parental income and GPA in graduating college students. To collect your data, you will ask participants to fill in a survey and self-report their parents’ incomes and their own GPA.

Measuring variables

When planning a research design, you should operationalise your variables and decide exactly how you will measure them.

For statistical analysis, it’s important to consider the level of measurement of your variables, which tells you what kind of data they contain:

  • Categorical data represents groupings. These may be nominal (e.g., gender) or ordinal (e.g. level of language ability).
  • Quantitative data represents amounts. These may be on an interval scale (e.g. test score) or a ratio scale (e.g. age).

Many variables can be measured at different levels of precision. For example, age data can be quantitative (8 years old) or categorical (young). If a variable is coded numerically (e.g., level of agreement from 1–5), it doesn’t automatically mean that it’s quantitative instead of categorical.

Identifying the measurement level is important for choosing appropriate statistics and hypothesis tests. For example, you can calculate a mean score with quantitative data, but not with categorical data.
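As a minimal sketch (in Python with pandas; both the language and the invented Likert-style responses are assumptions for illustration), one way to make the level of measurement explicit is to store an ordinal variable as an ordered categorical, so it is not accidentally summarised as if it were quantitative:

```python
# Minimal sketch (hypothetical data): marking a 1–5 agreement item as ordinal in pandas,
# so it is summarised with counts and the mode rather than a mean.
import pandas as pd

responses = pd.Series([1, 4, 5, 2, 4, 3, 4, 2])  # hypothetical Likert-style answers
ordinal = pd.Series(pd.Categorical(responses, categories=[1, 2, 3, 4, 5], ordered=True))

print(ordinal.value_counts().sort_index())  # frequencies are appropriate for ordinal data
print(ordinal.mode().tolist())              # so is the mode
# Taking a mean of these codes would treat the scale as interval data, which may not be justified.
```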

In a research study, along with measures of your variables of interest, you’ll often collect data on relevant participant characteristics.

Population vs sample

In most cases, it’s too difficult or expensive to collect data from every member of the population you’re interested in studying. Instead, you’ll collect data from a sample.

Statistical analysis allows you to apply your findings beyond your own sample as long as you use appropriate sampling procedures. You should aim for a sample that is representative of the population.

Sampling for statistical analysis

There are two main approaches to selecting a sample.

  • Probability sampling: every member of the population has a chance of being selected for the study through random selection.
  • Non-probability sampling: some members of the population are more likely than others to be selected for the study because of criteria such as convenience or voluntary self-selection.
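As a rough illustration (not part of the original guide; Python and the invented sampling frame are assumptions), a simple random draw from a list of population members captures the core idea of probability sampling:

```python
# Minimal sketch (hypothetical sampling frame): a simple random (probability) sample.
import random

sampling_frame = [f"student_{i}" for i in range(1, 2001)]  # hypothetical list of all population members
random.seed(42)                                            # fixed seed only to make the draw reproducible
sample = random.sample(sampling_frame, k=100)              # every member has an equal chance of selection
print(sample[:5])
```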

In theory, for highly generalisable findings, you should use a probability sampling method. Random selection reduces sampling bias and ensures that data from your sample is actually typical of the population. Parametric tests can be used to make strong statistical inferences when data are collected using probability sampling.

But in practice, it’s rarely possible to gather the ideal sample. While non-probability samples are more likely to be biased, they are much easier to recruit and collect data from. Non-parametric tests are more appropriate for non-probability samples, but they result in weaker inferences about the population.

If you want to use parametric tests for non-probability samples, you have to make the case that:

  • your sample is representative of the population you’re generalising your findings to.
  • your sample lacks systematic bias.

Keep in mind that external validity means that you can only generalise your conclusions to others who share the characteristics of your sample. For instance, results from Western, Educated, Industrialised, Rich and Democratic samples (e.g., college students in the US) aren’t automatically applicable to all non-WEIRD populations.

If you apply parametric tests to data from non-probability samples, be sure to elaborate on the limitations of how far your results can be generalised in your discussion section .

Create an appropriate sampling procedure

Based on the resources available for your research, decide on how you’ll recruit participants.

  • Will you have resources to advertise your study widely, including outside of your university setting?
  • Will you have the means to recruit a diverse sample that represents a broad population?
  • Do you have time to contact and follow up with members of hard-to-reach groups?

Your participants are self-selected by their schools. Although you’re using a non-probability sample, you aim for a diverse and representative sample.

Example: Sampling (correlational study)
Your main population of interest is male college students in the US. Using social media advertising, you recruit senior-year male college students from a smaller subpopulation: seven universities in the Boston area.

Calculate sufficient sample size

Before recruiting participants, decide on your sample size either by looking at other studies in your field or using statistics. A sample that’s too small may be unrepresentative of the population, while a sample that’s too large will be more costly than necessary.

There are many sample size calculators online. Different formulas are used depending on whether you have subgroups or how rigorous your study should be (e.g., in clinical research). As a rule of thumb, a minimum of 30 units per subgroup is usually recommended.

To use these calculators, you have to understand and input these key components:

  • Significance level (alpha): the risk of rejecting a true null hypothesis that you are willing to take, usually set at 5%.
  • Statistical power: the probability of your study detecting an effect of a certain size if there is one, usually 80% or higher.
  • Expected effect size: a standardised indication of how large the expected result of your study will be, usually based on other similar studies.
  • Population standard deviation: an estimate of the population parameter based on a previous study or a pilot study of your own.
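As a minimal sketch of how these components feed into a calculation (Python with the statsmodels package is an assumption here; any sample size calculator takes the same inputs), the snippet below estimates the per-group size for a two-group comparison:

```python
# Minimal sketch: per-group sample size for a two-sample t test, given alpha, power,
# and an expected standardised effect size (Cohen's d).
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(
    effect_size=0.5,          # expected effect size, e.g. taken from similar studies
    alpha=0.05,               # significance level
    power=0.80,               # desired statistical power
    alternative="two-sided",
)
print(round(n_per_group))     # roughly 64 participants per group under these assumptions
```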

Once you’ve collected all of your data, you can inspect them and calculate descriptive statistics that summarise them.

Inspect your data

There are various ways to inspect your data, including the following:

  • Organising data from each variable in frequency distribution tables.
  • Displaying data from a key variable in a bar chart to view the distribution of responses.
  • Visualising the relationship between two variables using a scatter plot.

By visualising your data in tables and graphs, you can assess whether your data follow a skewed or normal distribution and whether there are any outliers or missing data.
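For instance, a minimal sketch of these inspection steps in Python (pandas and matplotlib, with invented scores, are assumptions for illustration) might look like this:

```python
# Minimal sketch (hypothetical data): a frequency table, a numeric summary, and a scatter plot.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    "group":    ["meditation", "control", "meditation", "control", "meditation", "control"],
    "pretest":  [66, 64, 70, 68, 63, 67],
    "posttest": [74, 66, 79, 70, 72, 68],
})

print(df["group"].value_counts())              # frequency distribution of a categorical variable
print(df[["pretest", "posttest"]].describe())  # count, mean, spread, and quartiles of the scores

df.plot.scatter(x="pretest", y="posttest")     # relationship between two quantitative variables
plt.show()
```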

A normal distribution means that your data are symmetrically distributed around a center where most values lie, with the values tapering off at the tail ends.

Mean, median, mode, and standard deviation in a normal distribution

In contrast, a skewed distribution is asymmetric and has more values on one end than the other. The shape of the distribution is important to keep in mind because only some descriptive statistics should be used with skewed distributions.

Extreme outliers can also produce misleading statistics, so you may need a systematic approach to dealing with these values.

Calculate measures of central tendency

Measures of central tendency describe where most of the values in a data set lie. Three main measures of central tendency are often reported:

  • Mode: the most popular response or value in the data set.
  • Median: the value in the exact middle of the data set when ordered from low to high.
  • Mean: the sum of all values divided by the number of values.

However, depending on the shape of the distribution and level of measurement, only one or two of these measures may be appropriate. For example, many demographic characteristics can only be described using the mode or proportions, while a variable like reaction time may not have a mode at all.

Calculate measures of variability

Measures of variability tell you how spread out the values in a data set are. Four main measures of variability are often reported:

  • Range: the highest value minus the lowest value of the data set.
  • Interquartile range: the range of the middle half of the data set.
  • Standard deviation: the average distance between each value in your data set and the mean.
  • Variance: the square of the standard deviation.

Once again, the shape of the distribution and level of measurement should guide your choice of variability statistics. The interquartile range is the best measure for skewed distributions, while standard deviation and variance provide the best information for normal distributions.
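A minimal sketch (Python with pandas and invented pretest scores, both assumptions) of computing the measures listed above:

```python
# Minimal sketch (hypothetical scores): measures of central tendency and variability.
import pandas as pd

scores = pd.Series([55, 61, 63, 64, 64, 68, 70, 71, 74, 88])  # hypothetical pretest scores

print("mean:", scores.mean())
print("median:", scores.median())
print("mode:", scores.mode().tolist())
print("range:", scores.max() - scores.min())
print("IQR:", scores.quantile(0.75) - scores.quantile(0.25))
print("standard deviation:", scores.std(ddof=1))   # sample standard deviation
print("variance:", scores.var(ddof=1))             # sample variance
```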

Using your table, you should check whether the units of the descriptive statistics are comparable for pretest and posttest scores. For example, are the variance levels similar across the groups? Are there any extreme values? If there are, you may need to identify and remove extreme outliers in your data set or transform your data before performing a statistical test.

From this table, we can see that the mean score increased after the meditation exercise, and the variances of the two scores are comparable. Next, we can perform a statistical test to find out if this improvement in test scores is statistically significant in the population.

Example: Descriptive statistics (correlational study)
After collecting data from 653 students, you tabulate descriptive statistics for annual parental income and GPA.

It’s important to check whether you have a broad range of data points. If you don’t, your data may be skewed towards some groups more than others (e.g., high academic achievers), and only limited inferences can be made about a relationship.

A number that describes a sample is called a statistic, while a number describing a population is called a parameter. Using inferential statistics, you can make conclusions about population parameters based on sample statistics.

Researchers often use two main methods (simultaneously) to make inferences in statistics.

  • Estimation: calculating population parameters based on sample statistics.
  • Hypothesis testing: a formal process for testing research predictions about the population using samples.

You can make two types of estimates of population parameters from sample statistics:

  • A point estimate: a value that represents your best guess of the exact parameter.
  • An interval estimate: a range of values that represent your best guess of where the parameter lies.

If your aim is to infer and report population characteristics from sample data, it’s best to use both point and interval estimates in your paper.

You can consider a sample statistic a point estimate for the population parameter when you have a representative sample (e.g., in a wide public opinion poll, the proportion of a sample that supports the current government is taken as the population proportion of government supporters).

There’s always error involved in estimation, so you should also provide a confidence interval as an interval estimate to show the variability around a point estimate.

A confidence interval uses the standard error and the z score from the standard normal distribution to convey where you’d generally expect to find the population parameter most of the time.
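A minimal sketch (hypothetical scores; Python with SciPy is assumed) of building such an interval from the standard error and the z score:

```python
# Minimal sketch (hypothetical data): a 95% confidence interval around a sample mean.
import numpy as np
from scipy import stats

sample = np.array([72, 75, 78, 81, 69, 74, 77, 80, 73, 76])  # hypothetical posttest scores
mean = sample.mean()
sem = stats.sem(sample)                  # standard error of the mean
z = stats.norm.ppf(0.975)                # z score for a two-sided 95% interval
print(f"95% CI: [{mean - z * sem:.2f}, {mean + z * sem:.2f}]")

# With a small sample like this one, the t distribution is often used instead:
# stats.t.interval(0.95, df=len(sample) - 1, loc=mean, scale=sem)
```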

Hypothesis testing

Using data from a sample, you can test hypotheses about relationships between variables in the population. Hypothesis testing starts with the assumption that the null hypothesis is true in the population, and you use statistical tests to assess whether the null hypothesis can be rejected or not.

Statistical tests determine where your sample data would lie on an expected distribution of sample data if the null hypothesis were true. These tests give two main outputs:

  • A test statistic tells you how much your data differs from the null hypothesis of the test.
  • A p value tells you the likelihood of obtaining your results if the null hypothesis is actually true in the population.

Statistical tests come in three main varieties:

  • Comparison tests assess group differences in outcomes.
  • Regression tests assess cause-and-effect relationships between variables.
  • Correlation tests assess relationships between variables without assuming causation.

Your choice of statistical test depends on your research questions, research design, sampling method, and data characteristics.

Parametric tests

Parametric tests make powerful inferences about the population based on sample data. But to use them, some assumptions must be met, and only some types of variables can be used. If your data violate these assumptions, you can perform appropriate data transformations or use alternative non-parametric tests instead.

A regression models the extent to which changes in a predictor variable result in changes in one or more outcome variables; a minimal code sketch follows the list below.

  • A simple linear regression includes one predictor variable and one outcome variable.
  • A multiple linear regression includes two or more predictor variables and one outcome variable.
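A minimal sketch (hypothetical income and GPA values; Python with SciPy is assumed) of a simple linear regression:

```python
# Minimal sketch (hypothetical data): simple linear regression of GPA on parental income.
import numpy as np
from scipy import stats

parental_income = np.array([42, 55, 38, 90, 61, 47, 73, 52, 66, 81])  # hypothetical, in $1000s
gpa = np.array([2.8, 3.1, 2.6, 3.7, 3.2, 2.9, 3.5, 3.0, 3.3, 3.6])    # hypothetical

result = stats.linregress(parental_income, gpa)
print(f"slope = {result.slope:.3f}, intercept = {result.intercept:.2f}")
print(f"r = {result.rvalue:.2f}, p = {result.pvalue:.4f}")
```

A multiple linear regression, with two or more predictors, would typically be fitted with a fuller package such as statsmodels rather than this single-predictor helper.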

Comparison tests usually compare the means of groups. These may be the means of different groups within a sample (e.g., a treatment and control group), the means of one sample group taken at different times (e.g., pretest and posttest scores), or a sample mean and a population mean.

  • A t test is for exactly 1 or 2 groups when the sample is small (30 or fewer).
  • A z test is for exactly 1 or 2 groups when the sample is large.
  • An ANOVA is for 3 or more groups.

The z and t tests have subtypes based on the number and types of samples and the hypotheses:

  • If you have only one sample that you want to compare to a population mean, use a one-sample test.
  • If you have paired measurements (within-subjects design), use a dependent (paired) samples test.
  • If you have completely separate measurements from two unmatched groups (between-subjects design), use an independent (unpaired) samples test.
  • If you expect a difference between groups in a specific direction, use a one-tailed test.
  • If you don’t have any expectations for the direction of a difference between groups, use a two-tailed test.

The only parametric correlation test is Pearson’s r. The correlation coefficient (r) tells you the strength of a linear relationship between two quantitative variables.

However, to test whether the correlation in the sample is strong enough to be important in the population, you also need to perform a significance test of the correlation coefficient, usually a t test, to obtain a p value. This test uses your sample size to calculate how much the correlation coefficient differs from zero in the population.

You use a dependent-samples, one-tailed t test to assess whether the meditation exercise significantly improved math test scores. The test gives you:

  • a t value (test statistic) of 3.00
  • a p value of 0.0028
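The guide does not show the underlying computation, but a minimal sketch looks like this (the pre/post scores are invented and will not reproduce the exact values above; Python with SciPy 1.6 or later is assumed for the alternative argument):

```python
# Minimal sketch (hypothetical data): a dependent-samples (paired), one-tailed t test.
import numpy as np
from scipy import stats

pretest  = np.array([68, 72, 75, 61, 80, 66, 70, 73, 69, 77])  # hypothetical baseline scores
posttest = np.array([74, 75, 79, 66, 84, 70, 72, 78, 71, 82])  # hypothetical post-meditation scores

t_stat, p_value = stats.ttest_rel(posttest, pretest, alternative="greater")
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```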

Although Pearson’s r is a test statistic, it doesn’t tell you anything about how significant the correlation is in the population. You also need to test whether this sample correlation coefficient is large enough to demonstrate a correlation in the population.

A t test can also determine how significantly a correlation coefficient differs from zero based on sample size. Since you expect a positive correlation between parental income and GPA, you use a one-sample, one-tailed t test. The t test gives you:

  • a t value of 3.08
  • a p value of 0.001
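A minimal sketch of the correlation test (again with invented values that will not reproduce the numbers above; Python with SciPy is assumed):

```python
# Minimal sketch (hypothetical data): Pearson's r with its significance test.
import numpy as np
from scipy import stats

parental_income = np.array([42, 55, 38, 90, 61, 47, 73, 52, 66, 81])  # hypothetical, in $1000s
gpa = np.array([2.8, 3.1, 2.6, 3.7, 3.2, 2.9, 3.5, 3.0, 3.3, 3.6])    # hypothetical

r, p_two_sided = stats.pearsonr(parental_income, gpa)
print(f"r = {r:.2f}, two-sided p = {p_two_sided:.4f}")
# For a one-tailed test in the expected (positive) direction, the two-sided p value
# can be halved when the observed correlation is positive.
```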

The final step of statistical analysis is interpreting your results.

Statistical significance

In hypothesis testing, statistical significance is the main criterion for forming conclusions. You compare your p value to a set significance level (usually 0.05) to decide whether your results are statistically significant or non-significant.

Statistically significant results are considered unlikely to have arisen solely due to chance. There is only a very low chance of such a result occurring if the null hypothesis is true in the population.

This means that you believe the meditation intervention, rather than random factors, directly caused the increase in test scores.

Example: Interpret your results (correlational study)
You compare your p value of 0.001 to your significance threshold of 0.05. With a p value under this threshold, you can reject the null hypothesis. This indicates a statistically significant correlation between parental income and GPA in male college students.

Note that correlation doesn’t always mean causation, because there are often many underlying factors contributing to a complex variable like GPA. Even if one variable is related to another, this may be because of a third variable influencing both of them, or indirect links between the two variables.

Effect size

A statistically significant result doesn’t necessarily mean that there are important real life applications or clinical outcomes for a finding.

In contrast, the effect size indicates the practical significance of your results. It’s important to report effect sizes along with your inferential statistics for a complete picture of your results. You should also report interval estimates of effect sizes if you’re writing an APA style paper .

With a Cohen’s d of 0.72, there’s medium to high practical significance to your finding that the meditation exercise improved test scores.

Example: Effect size (correlational study)
To determine the effect size of the correlation coefficient, you compare your Pearson’s r value to Cohen’s effect size criteria.
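A minimal sketch of computing a Cohen’s d like the one above (the paired scores are invented, and the formula shown, mean difference divided by the standard deviation of the differences, is one common form for paired designs):

```python
# Minimal sketch (hypothetical data): Cohen's d for paired pretest/posttest scores.
import numpy as np

pretest  = np.array([68, 72, 75, 61, 80, 66, 70, 73, 69, 77])  # hypothetical
posttest = np.array([74, 75, 79, 66, 84, 70, 72, 78, 71, 82])  # hypothetical

diff = posttest - pretest
cohens_d = diff.mean() / diff.std(ddof=1)   # mean difference / SD of the differences
print(f"Cohen's d = {cohens_d:.2f}")
```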

Decision errors

Type I and Type II errors are mistakes made in research conclusions. A Type I error means rejecting the null hypothesis when it’s actually true, while a Type II error means failing to reject the null hypothesis when it’s false.

You can aim to minimise the risk of these errors by selecting an optimal significance level and ensuring high power . However, there’s a trade-off between the two errors, so a fine balance is necessary.

Frequentist versus Bayesian statistics

Traditionally, frequentist statistics emphasises null hypothesis significance testing and always starts with the assumption of a true null hypothesis.

However, Bayesian statistics has grown in popularity as an alternative approach in the last few decades. In this approach, you use previous research to continually update your hypotheses based on your expectations and observations.

A Bayes factor compares the relative strength of evidence for the null versus the alternative hypothesis rather than making a conclusion about rejecting the null hypothesis or not.

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is used by scientists to test specific predictions, called hypotheses , by calculating how likely it is that a pattern or relationship between variables could have arisen by chance.

The research methods you use depend on the type of data you need to answer your research question .

  • If you want to measure something or test a hypothesis, use quantitative methods. If you want to explore ideas, thoughts, and meanings, use qualitative methods.
  • If you want to analyse a large amount of readily available data, use secondary data. If you want data specific to your purposes with control over how they are generated, collect primary data.
  • If you want to establish cause-and-effect relationships between variables, use experimental methods. If you want to understand the characteristics of a research subject, use descriptive methods.

Statistical analysis is the main method for analyzing quantitative research data . It uses probabilities and models to test predictions about a population from sample data.


9 Facts About Americans and Marijuana

People smell a cannabis plant on April 20, 2023, at Washington Square Park in New York City. (Leonardo Munoz/VIEWpress)

The use and possession of marijuana is illegal under U.S. federal law, but about three-quarters of states have legalized the drug for medical or recreational purposes. The changing legal landscape has coincided with a decades-long rise in public support for legalization, which a majority of Americans now favor.

Here are nine facts about Americans’ views of and experiences with marijuana, based on Pew Research Center surveys and other sources.

As more states legalize marijuana, Pew Research Center looked at Americans’ opinions on legalization and how these views have changed over time.

Data comes from surveys by the Center, Gallup, and the 2022 National Survey on Drug Use and Health from the U.S. Substance Abuse and Mental Health Services Administration. Information about the jurisdictions where marijuana is legal at the state level comes from the National Organization for the Reform of Marijuana Laws.

More information about the Center surveys cited in the analysis, including the questions asked and their methodologies, can be found at the links in the text.

Around nine-in-ten Americans say marijuana should be legal for medical or recreational use, according to a January 2024 Pew Research Center survey. An overwhelming majority of U.S. adults (88%) say either that marijuana should be legal for medical use only (32%) or that it should be legal for medical and recreational use (57%). Just 11% say the drug should not be legal in any form. These views have held relatively steady over the past five years.

A pie chart showing that only about 1 in 10 U.S. adults say marijuana should not be legal at all.

Views on marijuana legalization differ widely by age, political party, and race and ethnicity, the January survey shows.

A horizontal stacked bar chart showing that views about legalizing marijuana differ by race and ethnicity, age and partisanship.

While small shares across demographic groups say marijuana should not be legal at all, those least likely to favor it for both medical and recreational use include:

  • Older adults: 31% of adults ages 75 and older support marijuana legalization for medical and recreational purposes, compared with half of those ages 65 to 74, the next youngest age category. By contrast, 71% of adults under 30 support legalization for both uses.
  • Republicans and GOP-leaning independents: 42% of Republicans favor legalizing marijuana for both uses, compared with 72% of Democrats and Democratic leaners. Ideological differences exist as well: Within both parties, those who are more conservative are less likely to support legalization.
  • Hispanic and Asian Americans: 45% in each group support legalizing the drug for medical and recreational use. Larger shares of Black (65%) and White (59%) adults hold this view.

Support for marijuana legalization has increased dramatically over the last two decades. In addition to asking specifically about medical and recreational use of the drug, both the Center and Gallup have asked Americans about legalizing marijuana use in a general way. Gallup asked this question most recently, in 2023. That year, 70% of adults expressed support for legalization, more than double the share who said they favored it in 2000.

A line chart showing U.S. public opinion on legalizing marijuana from 1969 to 2023.

Half of U.S. adults (50.3%) say they have ever used marijuana, according to the 2022 National Survey on Drug Use and Health . That is a smaller share than the 84.1% who say they have ever consumed alcohol and the 64.8% who have ever used tobacco products or vaped nicotine.

While many Americans say they have used marijuana in their lifetime, far fewer are current users, according to the same survey. In 2022, 23.0% of adults said they had used the drug in the past year, while 15.9% said they had used it in the past month.

While many Americans say legalizing recreational marijuana has economic and criminal justice benefits, views on these and other impacts vary, the Center’s January survey shows.

  • Economic benefits: About half of adults (52%) say that legalizing recreational marijuana is good for local economies, while 17% say it is bad. Another 29% say it has no impact.

A horizontal stacked bar chart showing how Americans view the effects of legalizing recreational marijuana.

  • Criminal justice system fairness: 42% of Americans say legalizing marijuana for recreational use makes the criminal justice system fairer, compared with 18% who say it makes the system less fair. About four-in-ten (38%) say it has no impact.
  • Use of other drugs: 27% say this policy decreases the use of other drugs like heroin, fentanyl and cocaine, and 29% say it increases it. But the largest share (42%) say it has no effect on other drug use.
  • Community safety: 21% say recreational legalization makes communities safer and 34% say it makes them less safe. Another 44% say it doesn’t impact safety.

Democrats and adults under 50 are more likely than Republicans and those in older age groups to say legalizing marijuana has positive impacts in each of these areas.

Most Americans support easing penalties for people with marijuana convictions, an October 2021 Center survey found . Two-thirds of adults say they favor releasing people from prison who are being held for marijuana-related offenses only, including 41% who strongly favor this. And 61% support removing or expunging marijuana-related offenses from people’s criminal records.

Younger adults, Democrats and Black Americans are especially likely to support these changes. For instance, 74% of Black adults  favor releasing people from prison  who are being held only for marijuana-related offenses, and just as many favor removing or expunging marijuana-related offenses from criminal records.

Twenty-four states and the District of Columbia have legalized small amounts of marijuana for both medical and recreational use as of March 2024,  according to the  National Organization for the Reform of Marijuana Laws  (NORML), an advocacy group that tracks state-level legislation on the issue. Another 14 states have legalized the drug for medical use only.

A map of the U.S. showing that nearly half of states have legalized the recreational use of marijuana.

Of the remaining 12 states, all allow limited access to products such as CBD oil that contain little to no THC – the main psychoactive substance in cannabis. And 26 states overall have at least partially  decriminalized recreational marijuana use , as has the District of Columbia.

In addition to 24 states and D.C., the U.S. Virgin Islands, Guam and the Northern Mariana Islands have legalized marijuana for medical and recreational use.

More than half of Americans (54%) live in a state where both recreational and medical marijuana are legal, and 74% live in a state where it’s legal either for both purposes or medical use only, according to a February Center analysis of data from the Census Bureau and other outside sources. This analysis looked at state-level legislation in all 50 states and the District of Columbia.

In 2012, Colorado and Washington became the first states to pass legislation legalizing recreational marijuana.

About eight-in-ten Americans (79%) live in a county with at least one cannabis dispensary, according to the February analysis. There are nearly 15,000 marijuana dispensaries nationwide, and 76% are in states (including D.C.) where recreational use is legal. Another 23% are in medical marijuana-only states, and 1% are in states that have made legal allowances for low-percentage THC or CBD-only products.

The states with the largest number of dispensaries include California, Oklahoma, Florida, Colorado and Michigan.

A map of the U.S. showing that cannabis dispensaries are common along the coasts and in a few specific states.

Note: This is an update of a post originally published April 26, 2021, and updated April 13, 2023.  




How to Do Thematic Analysis | Step-by-Step Guide & Examples

Published on September 6, 2019 by Jack Caulfield . Revised on June 22, 2023.

Thematic analysis is a method of analyzing qualitative data. It is usually applied to a set of texts, such as interview transcripts. The researcher closely examines the data to identify common themes – topics, ideas and patterns of meaning that come up repeatedly.

There are various approaches to conducting thematic analysis, but the most common form follows a six-step process: familiarization, coding, generating themes, reviewing themes, defining and naming themes, and writing up. Following this process can also help you avoid confirmation bias when formulating your analysis.

This process was originally developed for psychology research by Virginia Braun and Victoria Clarke . However, thematic analysis is a flexible method that can be adapted to many different kinds of research.

Table of contents

  • When to use thematic analysis
  • Different approaches to thematic analysis
  • Step 1: Familiarization
  • Step 2: Coding
  • Step 3: Generating themes
  • Step 4: Reviewing themes
  • Step 5: Defining and naming themes
  • Step 6: Writing up

Thematic analysis is a good approach to research where you’re trying to find out something about people’s views, opinions, knowledge, experiences or values from a set of qualitative data – for example, interview transcripts, social media profiles, or survey responses.

Some types of research questions you might use thematic analysis to answer:

  • How do patients perceive doctors in a hospital setting?
  • What are young women’s experiences on dating sites?
  • What are non-experts’ ideas and opinions about climate change?
  • How is gender constructed in high school history teaching?

To answer any of these questions, you would collect data from a group of relevant participants and then analyze it. Thematic analysis allows you a lot of flexibility in interpreting the data, and allows you to approach large data sets more easily by sorting them into broad themes.

However, it also involves the risk of missing nuances in the data. Thematic analysis is often quite subjective and relies on the researcher’s judgement, so you have to reflect carefully on your own choices and interpretations.

Pay close attention to the data to ensure that you’re not picking up on things that are not there – or obscuring things that are.


Once you’ve decided to use thematic analysis, there are different approaches to consider.

There’s the distinction between inductive and deductive approaches:

  • An inductive approach involves allowing the data to determine your themes.
  • A deductive approach involves coming to the data with some preconceived themes you expect to find reflected there, based on theory or existing knowledge.

Ask yourself: Does my theoretical framework give me a strong idea of what kind of themes I expect to find in the data (deductive), or am I planning to develop my own framework based on what I find (inductive)?

There’s also the distinction between a semantic and a latent approach:

  • A semantic approach involves analyzing the explicit content of the data.
  • A latent approach involves reading into the subtext and assumptions underlying the data.

Ask yourself: Am I interested in people’s stated opinions (semantic) or in what their statements reveal about their assumptions and social context (latent)?

After you’ve decided thematic analysis is the right method for analyzing your data, and you’ve thought about the approach you’re going to take, you can follow the six steps developed by Braun and Clarke .

The first step is to get to know our data. It’s important to get a thorough overview of all the data we collected before we start analyzing individual items.

This might involve transcribing audio , reading through the text and taking initial notes, and generally looking through the data to get familiar with it.

Next up, we need to code the data. Coding means highlighting sections of our text – usually phrases or sentences – and coming up with shorthand labels or “codes” to describe their content.

Let’s take a short example text. Say we’re researching perceptions of climate change among conservative voters aged 50 and up, and we have collected data through a series of interviews. An extract from one interview looks like this:

In this extract, we’ve highlighted various phrases in different colors corresponding to different codes. Each code describes the idea or feeling expressed in that part of the text.

At this stage, we want to be thorough: we go through the transcript of every interview and highlight everything that jumps out as relevant or potentially interesting. As well as highlighting all the phrases and sentences that match these codes, we can keep adding new codes as we go through the text.

After we’ve been through the text, we collate all the data into groups identified by code. These codes allow us to gain a condensed overview of the main points and common meanings that recur throughout the data.
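Qualitative coding is usually done in dedicated software or by hand, but as a rough illustration (the interview extracts below are invented; the codes are the ones used as examples in this guide), collating coded segments could look like this in Python:

```python
# Minimal sketch (invented extracts): collating coded interview segments by code.
from collections import defaultdict

coded_segments = [
    ("The weather's been so strange lately, nobody can say why", "uncertainty"),
    ("I don't really trust what those scientists tell us", "distrust of experts"),
    ("First it was global warming, now it's climate change", "changing terminology"),
    ("Honestly, who knows what's actually going on", "uncertainty"),
]

by_code = defaultdict(list)
for extract, code in coded_segments:
    by_code[code].append(extract)

for code, extracts in by_code.items():
    print(f"{code}: {len(extracts)} extract(s)")
```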


Next, we look over the codes we’ve created, identify patterns among them, and start coming up with themes.

Themes are generally broader than codes. Most of the time, you’ll combine several codes into a single theme. In our example, we might start combining codes into themes like this:

At this stage, we might decide that some of our codes are too vague or not relevant enough (for example, because they don’t appear very often in the data), so they can be discarded.

Other codes might become themes in their own right. In our example, we decided that the code “uncertainty” made sense as a theme, with some other codes incorporated into it.

Again, what we decide will vary according to what we’re trying to find out. We want to create potential themes that tell us something helpful about the data for our purposes.

Now we have to make sure that our themes are useful and accurate representations of the data. Here, we return to the data set and compare our themes against it. Are we missing anything? Are these themes really present in the data? What can we change to make our themes work better?

If we encounter problems with our themes, we might split them up, combine them, discard them or create new ones: whatever makes them more useful and accurate.

For example, we might decide upon looking through the data that “changing terminology” fits better under the “uncertainty” theme than under “distrust of experts,” since the data labelled with this code involves confusion, not necessarily distrust.

Now that you have a final list of themes, it’s time to name and define each of them.

Defining themes involves formulating exactly what we mean by each theme and figuring out how it helps us understand the data.

Naming themes involves coming up with a succinct and easily understandable name for each theme.

For example, we might look at “distrust of experts” and determine exactly who we mean by “experts” in this theme. We might decide that a better name for the theme is “distrust of authority” or “conspiracy thinking”.

Finally, we’ll write up our analysis of the data. Like all academic texts, writing up a thematic analysis requires an introduction to establish our research question, aims and approach.

We should also include a methodology section, describing how we collected the data (e.g. through semi-structured interviews or open-ended survey questions ) and explaining how we conducted the thematic analysis itself.

The results or findings section usually addresses each theme in turn. We describe how often the themes come up and what they mean, including examples from the data as evidence. Finally, our conclusion explains the main takeaways and shows how the analysis has answered our research question.

In our example, we might argue that conspiracy thinking about climate change is widespread among older conservative voters, point out the uncertainty with which many voters view the issue, and discuss the role of misinformation in respondents’ perceptions.


Office of the Vice President for Research

Four CLAS Faculty Researchers Secure Prestigious Early Career Awards

Continuing an upward trend of University of Iowa faculty securing prestigious early-career grants, four investigators from the Departments of Physics and Astronomy and Computer Science have been awarded notable grants to advance their careers.

DeRoo, Hoadley advance space instrumentation with Nancy Grace Roman Technology Fellowships in Astrophysics for Early Career Researchers

Casey DeRoo and Keri Hoadley, both assistant professors in the Department of Physics and Astronomy, each received a Nancy Grace Roman Technology Fellowship in Astrophysics for Early Career Researchers. The NASA fellowship provides each researcher with $500,000 over two years to support their research in space-based instrumentation.

Keri Hoadley

Hoadley’s research is two-pronged. She will design and ultimately prototype a mirror-based vacuum ultraviolet polarizer, which will allow researchers to access polarized light from space below 120-nanometer wavelength. Polarizing light at such a low wavelength is crucial to building optics for NASA’s future Habitable World Observatory (HWO), the agency’s next flagship astrophysics mission after the Nancy Grace Roman Space Telescope. 

“Our vacuum ultraviolet polarizer project is meant to help set up our lab to propose to NASA for one or more follow-up technology programs, including adapting this polarizer for use in vacuum systems, duplicating it and measuring its efficiency to measure additional flavors of polarized UV light, quantifying the polarization effects introduced by UV optical components that may be used on HWO, and building an astronomical instrument to measure the polarization of UV from around massive stars and throughout star-forming regions,” said Hoadley.

In addition, Hoadley and her team will build a facility to align, calibrate, and integrate small space telescopes before flight, using a vacuum chamber and wavelengths of light typically only accessible in space, which could help the university win future small satellite and suborbital missions from NASA. 

Casey DeRoo

DeRoo will work to advance diffraction gratings made with electron beams that pattern structures on a nanometer scale. Like a prism, diffraction gratings spread out and direct light coming from stars and galaxies, allowing researchers to deduce things like the temperature, density, or composition of an astronomical object.

The fellowship will allow DeRoo to upgrade the university’s Raith Voyager tool, a specialized fabrication tool hosted by OVPR’s Materials Analysis, Testing and Fabrication (MATFab) facility.

“These upgrades will let us perform algorithmic patterning, which uses computer code to quickly generate the patterns to be manufactured,” DeRoo said. “This is a major innovation that should enable us to make more complex grating shapes as well as make gratings more quickly.” DeRoo added that the enhancements mean his team may be able to make diffraction gratings that allow space instrument designs that are distinctly different from those launched to date.

“For faculty who develop space-based instruments, the Nancy Grace Roman Technology Fellowship is on par with the prestige of an NSF CAREER or Department of Energy Early Career award,” said Mary Hall Reno, professor and department chair. “Our track record with the program elevates our status as a destination university for astrophysics and space physics missions.”

Uppu pursues building blocks quantum computing with NSF CAREER Award

Ravitej Uppu

Ravitej Uppu, assistant professor in the Department of Physics and Astronomy, received a 5-year NSF CAREER award of $550,000 to conduct research aimed at amplifying the power of quantum computing and making its application more practical. 

Uppu and his team will explore the properties of light-matter interactions at the level of a single photon interacting with a single molecule, enabling them to generate efficient and high-quality multiphoton entangled states of light. Multiphoton entangled states, in which photons become inextricably linked, are necessary for photons to serve as practical quantum interconnects, transmitting information between quantum computing units, akin to classical cluster computers. 

“In our pursuit of secure communication, exploiting quantum properties of light is the final frontier,” said Uppu. “However, unavoidable losses that occur in optical fiber links between users can easily nullify the secure link. Our research on multiphoton entangled states is a key building block for implementing ‘quantum repeaters’ that can overcome this challenge.”

Jiang tackles real-world data issues with NSF CAREER Award

Peng Jiang, assistant professor in the Department of Computer Science, received an NSF CAREER Award that will provide $548,944 over five years to develop tools to support the use of sampling-based algorithms. 

Sampling-based algorithms reduce computing costs by processing only a random selection of a dataset, which has made them increasingly popular, but their efficiency is still limited. Jiang will develop a suite of tools that simplify the implementation of sampling-based algorithms and improve their efficacy across a wide range of computing and big data applications.

“A simple example of a real-world application is subgraph matching,” Jiang said. “For example, one might be interested in finding a group of people with certain connections in a social network. The use of sampling-based algorithms can significantly accelerate this process.”
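
To illustrate the general idea rather than Jiang’s actual toolkit, the toy sketch below estimates the number of triangles in a graph by checking random node triples instead of enumerating every triple; the graph, sample size, and function names are made up for the example.

```python
# Toy illustration of a sampling-based estimate (not the project's tools):
# estimate the number of triangles in a graph from random node triples
# instead of an exhaustive scan over every triple.
import random
from itertools import combinations
from math import comb

def exact_triangles(adj):
    return sum(1 for a, b, c in combinations(adj, 3)
               if b in adj[a] and c in adj[a] and c in adj[b])

def sampled_triangles(adj, samples=20000, seed=0):
    rng = random.Random(seed)
    nodes = list(adj)
    hits = 0
    for _ in range(samples):
        a, b, c = rng.sample(nodes, 3)           # one random triple
        if b in adj[a] and c in adj[a] and c in adj[b]:
            hits += 1
    return hits / samples * comb(len(nodes), 3)  # scale the hit rate back up

if __name__ == "__main__":
    # small random graph as stand-in data
    rng = random.Random(1)
    n = 60
    adj = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if rng.random() < 0.1:
            adj[i].add(j)
            adj[j].add(i)
    print("exact:", exact_triangles(adj), "estimate:", round(sampled_triangles(adj)))
```

A production implementation would also need error bounds on the sampled count, which this toy omits.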

In addition to giving undergraduate students the opportunity to engage with this research, Jiang plans for the project to strengthen project work in computer science courses.

6 Common Leadership Styles — and How to Decide Which to Use When

  • Rebecca Knight

Being a great leader means recognizing that different circumstances call for different approaches.

Research suggests that the most effective leaders adapt their style to different circumstances — be it a change in setting, a shift in organizational dynamics, or a turn in the business cycle. But what if you feel like you’re not equipped to take on a new and different leadership style — let alone more than one? In this article, the author outlines the six leadership styles Daniel Goleman first introduced in his 2000 HBR article, “Leadership That Gets Results,” and explains when to use each one. The good news is that personality is not destiny. Even if you’re naturally introverted or you tend to be driven by data and analysis rather than emotion, you can still learn how to adapt different leadership styles to organize, motivate, and direct your team.

Much has been written about common leadership styles and how to identify the right style for you, whether it’s transactional or transformational, bureaucratic or laissez-faire. But according to Daniel Goleman, a psychologist best known for his work on emotional intelligence, “Being a great leader means recognizing that different circumstances may call for different approaches.”

  • Rebecca Knight is a journalist who writes about all things related to the changing nature of careers and the workplace. Her essays and reported stories have been featured in The Boston Globe, Business Insider, The New York Times, BBC, and The Christian Science Monitor. She was shortlisted as a Reuters Institute Fellow at Oxford University in 2023. Earlier in her career, she spent a decade as an editor and reporter at the Financial Times in New York, London, and Boston.

Computer Science > Distributed, Parallel, and Cluster Computing

Title: Analysis of Distributed Algorithms for Big-data
Authors: Rajendra Purohit, K R Chowdhary, S D Purohit

Abstract: Parallel and distributed processing are becoming the de facto industry standard, and a large part of current research targets how to make computing scalable and distributed, dynamically, without allocating resources on a permanent basis. The present article focuses on the study and performance of distributed and parallel algorithms and their file systems, to achieve scalability at the local level (OpenMP platform) and at the global level, where computing and file systems are distributed. Various applications, algorithms, and file systems have been used to demonstrate these areas, and their performance studies are presented. The systems and applications chosen here are open source, due to their wider applicability.
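
The paper itself works with OpenMP and distributed file systems; as a loose Python stand-in rather than the authors’ code, the sketch below times the same aggregation serially and with a local process pool, the kind of “local level” scaling the abstract refers to.

```python
# Rough stand-in, not the paper's benchmarks: time the same aggregation
# serially and with a local process pool. The paper itself evaluates OpenMP
# and distributed file systems; this only illustrates shared-memory scaling.
import time
from multiprocessing import Pool

def partial_sum(bounds):
    lo, hi = bounds
    return sum(i * i for i in range(lo, hi))

def run(n=10_000_000, workers=4):
    t0 = time.perf_counter()
    serial = sum(i * i for i in range(n))
    t_serial = time.perf_counter() - t0

    chunks = [(k * n // workers, (k + 1) * n // workers) for k in range(workers)]
    t0 = time.perf_counter()
    with Pool(workers) as pool:
        parallel = sum(pool.map(partial_sum, chunks))
    t_parallel = time.perf_counter() - t0

    assert serial == parallel
    print(f"serial {t_serial:.2f}s, {workers} workers {t_parallel:.2f}s")

if __name__ == "__main__":
    run()
```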

2024 Identity Fraud Study: Resolving the Shattered Identity Crisis

  • Date: April 10, 2024
  • Suzanne Sando
  • Report Details: 48 pages, 27 graphics
  • Research Topic(s):
  • Fraud & Security
  • Fraud Management

The financial landscape experienced meaningful innovation and expansion over the past few decades. The most notable acceleration in advancement has happened in just the past few years. Digital banking, especially in the form of mobile banking apps, is a must for financial institutions to stay relevant with on-the-go consumers. Brick-and-mortar stores have moved into the e-commerce space (with many businesses moving away from physical stores to online-only offerings). Online loan origination is transforming the auto and mortgage industries. The list is endless. 

Whether it was the growing adoption of cryptocurrency and digital assets, real-time payments innovation, or frugal savers taking advantage of interest rate increases to employ strategic saving in 2023, consumers seemed to be taking advantage of forward economic momentum. 

And criminals surely took advantage of this forward momentum. Fraud-related resolution hours skyrocketed in 2023. The average amount of time consumers spent in 2022 resolving issues stemming from identity fraud clocked in at six hours, but in 2023, fraud resolution hours rose steeply, jumping to a nearly 10-hour average, a major disruption for consumers and financial institutions alike. 

Skyrocketing Resolution Hours Create Headaches for Consumers

Figure: Average identity fraud resolution time rose from about 6 hours in 2022 to nearly 10 hours in 2023.

Traditional identity fraud losses amounted to nearly $23 billion in 2023, resulting in a 13% increase in overall losses for U.S. adult victims of identity fraud. And since Javelin began tracking financial losses attributable to scams in 2021, there has been a steady yet nearly imperceptible drop in financial loss. Scams orchestrated by criminals resulted in just over $20 billion in fraud losses to victims. 

Javelin makes a distinction between traditional identity fraud and identity fraud scam losses to add perspective to the landscape of identity fraud and provide accurate historical information and relevant recommendations to financial institutions, fintechs, third-party fraud solutions providers, and even consumers. But it’s vital to remember that, to identity fraud victims, it doesn’t matter how the losses are analyzed and categorized. 

What matters is how their fraud and scam encounter is managed by the organizations they trust and with which they choose to do business, how they are treated throughout the resolution process, and how they feel after suffering a financial loss and a breach of trust. This must always be top of mind for organizations as they work to improve their efforts to detect and prevent further damage from identity fraud.

2024 Identity Fraud Study Sponsors

The Javelin Strategy & Research 2024 Identity Fraud Study provides a comprehensive analysis of fraud trends in the context of a changing technological and payments landscape. Its goal is to inform consumers, financial institutions, and businesses about the most effective means of controlling identity fraud. The study began in 2003 and serves as the nation’s longest-running analysis of identity fraud, with more than 105,000 consumers surveyed. This study is independently produced by Javelin and made possible with support from the following sponsors:

Figure: Logos of the 2024 Identity Fraud Study sponsors.

Methodology

Survey Data Collection

This ID fraud survey was conducted online among 5,000 U.S. adults over the age of 18; this sample is representative of the U.S. census demographics distribution. Data collection took place Oct. 23-Nov. 28, 2023. Data is weighted using 18-plus U.S. population benchmarks on age, gender, race/ethnicity, education, census region, and metropolitan status from the most current CPS targets. Due to rounding errors, the percentages on graphs may add up to 100% plus or minus 1%. To preserve the independence and objectivity of this annual report, the sponsors of this project were not involved in the tabulation, analysis, or reporting of final results.
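
Javelin does not publish its weighting code, but the generic technique of weighting a sample to population benchmarks can be illustrated with iterative proportional fitting (raking). The sketch below uses made-up respondents and made-up targets purely to show the mechanics; it is not Javelin’s methodology.

```python
# Generic illustration of weighting a sample to population benchmarks
# (iterative proportional fitting / raking). Not Javelin's methodology or
# benchmarks; respondents and targets below are made up.
import random

random.seed(0)
# toy sample: each respondent has a gender and an age band
sample = [{"gender": random.choice(["F", "M"]),
           "age": random.choice(["18-34", "35-54", "55+"]),
           "weight": 1.0} for _ in range(1000)]

# made-up population targets (proportions) for each margin
targets = {
    "gender": {"F": 0.51, "M": 0.49},
    "age": {"18-34": 0.30, "35-54": 0.34, "55+": 0.36},
}

def rake(sample, targets, iterations=10):
    n = len(sample)
    for _ in range(iterations):
        for var, dist in targets.items():
            # current weighted count of each category
            totals = {cat: 0.0 for cat in dist}
            for r in sample:
                totals[r[var]] += r["weight"]
            # scale weights so each category matches its target share
            for r in sample:
                r["weight"] *= dist[r[var]] * n / totals[r[var]]
    return sample

rake(sample, targets)
share_f = sum(r["weight"] for r in sample if r["gender"] == "F") / len(sample)
print(f"weighted female share: {share_f:.3f}")  # converges toward the 0.51 target
```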

Comparing Research Findings Across Organizations: Please Anticipate Natural Variances in Key Findings

Javelin cautions readers to understand the context behind increases and decreases in key findings as they apply to the annual Identity Fraud Report, especially when comparing your own organization’s experiences or comparing research outcomes from other companies or agencies. It is impossible to compare identity fraud key findings across multiple sources and expect universal alignment; key findings never line up across organizations because of differences in how data is collected, and sample sizes (the number of consumers surveyed) also vary a great deal.

IMAGES

  1. A Step-by-Step Guide to the Data Analysis Process [2022]

  2. 5 Steps of the Data Analysis Process

  3. Data analysis

  4. What is Data Analysis in Research

  5. 7 Steps of Data Analysis Process

  6. Data Analysis: What it is + Free Guide with Examples

VIDEO

  1. Data Analysis

  2. Epidata version 3.1 for data entry

  3. What is Data Analysis in research

  4. How to interpret Reliability analysis results

  5. How to Assess the Quantitative Data Collected from Questionnaire

  6. Data Analysis and Report Writing Part 1

COMMENTS

  1. How to Create a Data Analysis Plan: A Detailed Guide

    In this blog article, we will explore how to create a data analysis plan: the content and structure. This data analysis plan serves as a roadmap to how data collected will be organised and analysed. It includes the following aspects: Clearly states the research objectives and hypothesis. Identifies the dataset to be used.

  2. Data Analysis in Research: Types & Methods

    Definition of research in data analysis: According to LeCompte and Schensul, research data analysis is a process used by researchers to reduce data to a story and interpret it to derive insights. The data analysis process helps reduce a large chunk of data into smaller fragments, which makes sense. Three essential things occur during the data ...

  3. A Step-by-Step Guide to the Data Analysis Process

    1. Step one: Defining the question. The first step in any data analysis process is to define your objective. In data analytics jargon, this is sometimes called the 'problem statement'. Defining your objective means coming up with a hypothesis and figuring out how to test it.

  4. Creating a Data Analysis Plan: What to Consider When Choosing

    For those interested in conducting qualitative research, previous articles in this Research Primer series have provided information on the design and analysis of such studies. 2, 3 Information in the current article is divided into 3 main sections: an overview of terms and concepts used in data analysis, a review of common methods used to ...

  5. The Beginner's Guide to Statistical Analysis

    Step 1: Write your hypotheses and plan your research design. To collect valid data for statistical analysis, you first need to specify your hypotheses and plan out your research design. Writing statistical hypotheses. The goal of research is often to investigate a relationship between variables within a population. You start with a prediction ...

  6. A practical guide to data analysis in general literature reviews

    This article is a practical guide to conducting data analysis in general literature reviews. The general literature review is a synthesis and analysis of published research on a relevant clinical issue, and is a common format for academic theses at the bachelor's and master's levels in nursing, physiotherapy, occupational therapy, public health and other related fields.

  7. Data Analysis

    Data Analysis. Definition: Data analysis refers to the process of inspecting, cleaning, transforming, and modeling data with the goal of discovering useful information, drawing conclusions, and supporting decision-making. It involves applying various statistical and computational techniques to interpret and derive insights from large datasets.

  8. Quantitative Data Analysis Methods & Techniques 101

    Factor 1 - Data type. The first thing you need to consider is the type of data you've collected (or the type of data you will collect). By data types, I'm referring to the four levels of measurement - namely, nominal, ordinal, interval and ratio. If you're not familiar with this lingo, check out the video below. (A short code sketch of how data type drives test choice appears after this list.)

  9. What Is Data Analysis? (With Examples)

    Written by Coursera Staff • Updated on Apr 1, 2024. Data analysis is the practice of working with data to glean useful information, which can then be used to make informed decisions. "It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts," Sherlock ...

  10. Research Design

    Table of contents. Step 1: Consider your aims and approach. Step 2: Choose a type of research design. Step 3: Identify your population and sampling method. Step 4: Choose your data collection methods. Step 5: Plan your data collection procedures. Step 6: Decide on your data analysis strategies.

  11. Data Analysis Techniques In Research

    Data analysis techniques in research are essential because they allow researchers to derive meaningful insights from data sets to support their hypotheses or research objectives. Data Analysis Techniques in Research: While various groups, institutions, and professionals may have diverse approaches to data analysis, a universal definition captures its essence.

  12. Qualitative Data Analysis: Step-by-Step Guide (Manual vs ...

    Step 1: Gather your qualitative data and conduct research (Conduct qualitative research) The first step of qualitative research is to do data collection. Put simply, data collection is gathering all of your data for analysis. A common situation is when qualitative data is spread across various sources.

  13. What Is a Research Design

    Step 1: Consider your aims and approach. Step 2: Choose a type of research design. Step 3: Identify your population and sampling method. Step 4: Choose your data collection methods. Step 5: Plan your data collection procedures. Step 6: Decide on your data analysis strategies. Other interesting articles.

  14. Learning to Do Qualitative Data Analysis: A Starting Point

    For many researchers unfamiliar with qualitative research, determining how to conduct qualitative analyses is often quite challenging. Part of this challenge is due to the seemingly limitless approaches that a qualitative researcher might leverage, as well as simply learning to think like a qualitative researcher when analyzing data. From framework analysis (Ritchie & Spencer, 1994) to content ...

  15. Data Analysis Plan: Examples & Templates

    A data analysis plan is a roadmap for how you're going to organize and analyze your survey data—and it should help you achieve three objectives that relate to the goal you set before you started your survey: Answer your top research questions. Use more specific survey questions to understand those answers. Segment survey respondents to ...

  16. Research Design: Decide on your Data Analysis Strategy

    The last step of designing your research is planning your data analysis strategies. In this video, we'll take a look at some common approaches for both quant...

  17. A Really Simple Guide to Quantitative Data Analysis

    It is important to know what kind of data you are planning to collect or analyse as this will affect your analysis method. A 12-step approach to quantitative data analysis. Step 1: Start with ...

  18. Data Collection

    Data collection is a systematic process of gathering observations or measurements. Whether you are performing research for business, governmental or academic purposes, data collection allows you to gain first-hand knowledge and original insights into your research problem. While methods and aims may differ between fields, the overall process of ...

  19. Data Analysis

    Data Analysis. Different statistics and methods used to describe the characteristics of the members of a sample or population, explore the relationships between variables, to test research hypotheses, and to visually represent data are described. Terms relating to the topics covered are defined in the Research Glossary. Descriptive Statistics.

  20. How to Write Data Analysis Reports in 9 Easy Steps

    1. Start with an Outline. If you start writing without having a clear idea of what your data analysis report is going to include, it may get messy. Important insights may slip through your fingers, and you may stray away too far from the main topic. To avoid this, start the report by writing an outline first.

  21. Experts Explain How To Select And Manage Data For Effective Analysis

    4. Take A 'Decision Back' Approach. Focusing on data and analytics with a value-first drive is critical. To do this, a company must start with its business problem(s), not the data, and take ...

  22. Maternal and Infant Research Electronic Data Analysis (MIREDA): A

    The MIREDA (Maternal and Infant Research Electronic Data Analysis) partnership creates a UK-wide birth cohort by aligning existing electronic birth cohorts to have the same structure, content, and vocabularies, enabling UK-wide federated analyses. Objectives: 1) Create a core dynamic, live UK-wide electronic birth cohort with approximately ...

  23. The Beginner's Guide to Statistical Analysis

    Step 1: Write your hypotheses and plan your research design. To collect valid data for statistical analysis, you first need to specify your hypotheses and plan out your research design. Writing statistical hypotheses. The goal of research is often to investigate a relationship between variables within a population. You start with a prediction ...

  24. Americans Need A Six-figure Salary To Afford A Typical Home In Nearly

    To afford a median-priced home of $402,343, Americans need an annual income of $110,871, up 46 percent since the start of 2020. Americans must earn at least $100,000 annually to afford a median ...

  25. 9 facts about Americans and marijuana

    Another 29% say it has no impact. Criminal justice system fairness: 42% of Americans say legalizing marijuana for recreational use makes the criminal justice system fairer, compared with 18% who say it makes the system less fair. About four-in-ten (38%) say it has no impact. Use of other drugs: 27% say this policy decreases the use of other ...

  26. How to Do Thematic Analysis

    Different approaches to thematic analysis. Once you've decided to use thematic analysis, there are different approaches to consider. There's the distinction between inductive and deductive approaches: An inductive approach involves allowing the data to determine your themes. A deductive approach involves coming to the data with some preconceived themes you expect to find reflected there ...

  27. Four CLAS faculty researchers secure prestigious early career awards

    A test array of gratings printed with Raith Voyager tool. Photo courtesy of Casey DeRoo. Voyager tool, a specialized fabrication tool hosted by OVPR's Materials Analysis, Testing and Fabrication (MATFab) facility. "These upgrades will let us perform algorithmic patterning, which uses computer code to quickly generate the patterns to be manufactured," DeRoo said.

  28. 6 Common Leadership Styles

    Summary. Research suggests that the most effective leaders adapt their style to different circumstances — be it a change in setting, a shift in organizational dynamics, or a turn in the business ...

  29. [2404.06461] Analysis of Distributed Algorithms for Big-data

    Analysis of Distributed Algorithms for Big-data. Rajendra Purohit, K R Chowdhary, S D Purohit. The parallel and distributed processing are becoming de facto industry standard, and a large part of the current research is targeted on how to make computing scalable and distributed, dynamically, without allocating the resources on permanent basis.

  30. 2024 Identity Fraud Study: Resolving the Shattered Identity Crisis

    2024 Identity Fraud Study Sponsors. The Javelin Strategy & Research 2024 Identity Fraud Study provides a comprehensive analysis of fraud trends in the context of a changing technological and payments landscape. Its goal is to inform consumers, financial institutions, and businesses about the most effective means of controlling identity fraud.
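
Several of the guides excerpted above (notably items 5, 8, and 23) turn on the same decision: the type of data you have determines which statistical test is appropriate. As a minimal, hypothetical sketch of that decision in code, assuming scipy is available and using simulated data:

```python
# Illustrative sketch only: picking and running a test based on variable types,
# the logic several of the guides above describe. Data are simulated; scipy
# is assumed to be installed.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Continuous outcome vs. binary group -> independent-samples t-test
group_a = rng.normal(loc=120, scale=15, size=80)   # e.g. systolic BP, drug A
group_b = rng.normal(loc=126, scale=15, size=80)   # e.g. systolic BP, drug B
t_stat, p_t = stats.ttest_ind(group_a, group_b)
print(f"t-test: t = {t_stat:.2f}, p = {p_t:.4f}")

# Two categorical variables -> chi-square test on the contingency table
table = np.array([[30, 20],    # exposed: outcome yes / no
                  [18, 32]])   # unexposed: outcome yes / no
chi2, p_chi, dof, _ = stats.chi2_contingency(table)
print(f"chi-square: chi2 = {chi2:.2f}, dof = {dof}, p = {p_chi:.4f}")
```

Both tests return a statistic and a p-value to compare against whatever significance level was fixed in advance.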