
How to test your hypothesis (with statistics)

By the Statsig team

In data-driven decision-making, the role of hypothesis testing cannot be overstated. It is a crucial tool that helps you make informed choices by validating your assumptions with statistical rigor.

Diving into the metrics can often seem daunting, but understanding the basics of hypothesis testing empowers you to leverage data effectively. This guide aims to demystify the process and equip you with the knowledge to apply it confidently in your projects.

Understanding Hypothesis Testing Fundamentals

At its core, a hypothesis is an assumption you make about a particular parameter in your data set, crafted to test its validity through statistical analysis. This is not just a guess; it's a statement that suggests a potential outcome based on observed data patterns.

The foundation of hypothesis testing lies in two critical concepts:

Null hypothesis (H0): This hypothesis posits that there is no effect or no difference in the data. It serves as a default position that attributes any observed effect to sampling error.

Alternative hypothesis (H1): Contrary to the null, this hypothesis suggests that there is indeed an effect or a difference.

To give you a clearer picture:

Suppose you want to test if a new feature on your app increases user engagement. The null hypothesis would state that the feature does not change engagement, while the alternative hypothesis would assert that it does.

In practice, you would collect data on user engagement, apply a hypothesis testing statistics calculator to analyze this data, and determine whether to reject the null hypothesis or fail to reject it (note that "fail to reject" does not necessarily mean "accept"). This decision is usually based on a p-value, which quantifies the probability of obtaining a result at least as extreme as the one observed, under the assumption that the null hypothesis is correct.
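To make that concrete, here is a minimal sketch of such a check in Python. The engagement counts are hypothetical, and statsmodels' two-proportion z-test stands in for whatever calculator or platform you actually use:

from statsmodels.stats.proportion import proportions_ztest

# Hypothetical data: users who engaged at least once, out of users exposed to each variant
engaged = [420, 468]    # [control, treatment]
exposed = [5000, 5000]

# H0: the new feature does not change the engagement rate
z_stat, p_value = proportions_ztest(count=engaged, nobs=exposed)

alpha = 0.05
if p_value <= alpha:
    print(f"p = {p_value:.3f}: reject the null hypothesis")
else:
    print(f"p = {p_value:.3f}: fail to reject the null hypothesis")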

Selecting the right statistical test

Choosing the right statistical test is pivotal; it hinges on your data type and the questions at hand. For instance, a t-test is optimal for comparing the means of two groups when you assume a normal distribution. This test helps you decide if differences in group means are statistically significant.

When your study involves more than two groups, ANOVA (Analysis of Variance) is the go-to method. It evaluates differences across group means to ascertain variability within samples. If your data consists of categories, the chi-squared test evaluates whether distributions of categorical variables differ from each other.

Criteria for test selection:

T-test: Use when comparing two groups under normal distribution.

ANOVA: Apply when comparing three or more groups.

Chi-squared: Best for categorical data analysis.

Each test serves a specific purpose, tailored to the nature of your data and research objectives. By selecting the appropriate test, you enhance the reliability of your conclusions, ensuring that your decisions are data-driven.
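As a rough illustration of how these choices map to code, here is a sketch using SciPy with made-up sample data:

from scipy import stats

# Hypothetical metric samples for three variants
group_a = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3]
group_b = [12.6, 12.9, 12.4, 13.1, 12.7, 12.8]
group_c = [11.5, 11.9, 11.7, 12.0, 11.6, 11.8]

# Two groups, roughly normal data: independent-samples t-test
t_stat, p_ttest = stats.ttest_ind(group_a, group_b)

# Three or more groups: one-way ANOVA
f_stat, p_anova = stats.f_oneway(group_a, group_b, group_c)

# Categorical counts (e.g., converted vs. not converted per variant): chi-squared test
chi2, p_chi2, dof, expected = stats.chi2_contingency([[30, 70], [45, 55]])

print(p_ttest, p_anova, p_chi2)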

Sample size and power considerations

Calculating the right sample size is crucial for the reliability of your hypothesis testing. A larger sample size reduces the margin of error and increases statistical power, making your results more dependable and robust.

Statistical power is the likelihood of correctly rejecting the null hypothesis when it is indeed false. Several factors influence this power:

Sample size: Larger samples increase power.

Effect size: Bigger effects are easier to detect.

Significance level: Lower levels demand stronger evidence.

Understanding these elements helps you design more effective tests. With the right balance, you maximize the chance of detecting true effects, making your insights actionable. Always consider these factors when planning your experiments to ensure meaningful and accurate outcomes.
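One practical way to balance these factors before launching an experiment is a power calculation. The sketch below uses statsmodels and assumes a two-sample t-test with a hypothesised standardized effect size (Cohen's d) of 0.3:

from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Per-group sample size needed to detect d = 0.3 at alpha = 0.05 with 80% power
n_per_group = analysis.solve_power(effect_size=0.3, alpha=0.05, power=0.8,
                                   alternative="two-sided")
print(f"Required sample size per group: {n_per_group:.0f}")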

Implementing the test: Steps and procedures

Setting up and executing a statistical test involves several clear steps. First, define your hypotheses and decide on the appropriate statistical test based on your data type. Next, gather your data through reliable collection methods, ensuring accuracy and relevance.

Handling data anomalies is part of the process. Identify outliers and decide whether to exclude them based on their impact on your results. Utilize software tools like R, Python, or specialized statistical software to analyze the data.

Interpreting the results is crucial. Focus on the p-value; it helps determine the statistical significance of your test results. A low p-value (typically less than 0.05) suggests that you can reject the null hypothesis.

Remember, while p-values indicate whether an effect exists, they don't measure its size or importance. Always complement p-value analysis with confidence intervals and effect size measures to fully understand your test outcomes. This approach ensures you make informed decisions based on comprehensive data analysis.
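A sketch of that fuller readout for two groups is shown below; the data is hypothetical, and the confidence_interval() call assumes a recent SciPy (1.11 or newer):

import numpy as np
from scipy import stats

# Hypothetical metric values for control and treatment users
control = np.array([4.1, 3.8, 4.5, 4.0, 3.9, 4.2, 4.4, 3.7])
treatment = np.array([4.6, 4.9, 4.4, 5.1, 4.7, 4.8, 4.5, 5.0])

# p-value: is there evidence of any effect?
result = stats.ttest_ind(treatment, control)

# Effect size (Cohen's d): how large is the effect?
pooled_sd = np.sqrt((control.var(ddof=1) + treatment.var(ddof=1)) / 2)
cohens_d = (treatment.mean() - control.mean()) / pooled_sd

# 95% confidence interval for the difference in means (SciPy 1.11+)
ci = result.confidence_interval(confidence_level=0.95)

print(f"p = {result.pvalue:.4f}, d = {cohens_d:.2f}, "
      f"95% CI = ({ci.low:.2f}, {ci.high:.2f})")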

Common Mistakes and Misinterpretations

P-hacking stands out as a notable pitfall in hypothesis testing. Researchers might cycle through various methods or subsets until they find a p-value that supports their desired outcome. This practice risks producing results that do not accurately reflect the true nature of the data.

Misunderstandings about p-values are widespread. Remember, a significant p-value does not imply causation. It also does not indicate the magnitude of an effect, merely that the effect is unlikely to be due to chance.

Always approach p-values with a critical eye. Appreciate their role in hypothesis testing but understand their limitations. They are tools for decision making, not definitive proofs. For a deeper understanding, you might find it helpful to read about how hypothesis testing is akin to a game of flipping coins, or explore further explanations on the nuances of p-values in hypothesis testing.



What Is Hypothesis Testing in Python: A Hands-On Tutorial


Jaydeep Karale

Posted On: June 5, 2024


In software testing, property-based testing leverages the concept of formally specifying code behavior: instead of writing individual test cases, you assert properties that hold true for a wide range of inputs.

Python's ecosystem provides the Hypothesis library for property-based testing. Hypothesis gives development and testing teams a framework for generating diverse and random test data, allowing them to thoroughly exercise their code against a broad spectrum of inputs.

In this blog, we will explore the fundamentals of Hypothesis testing in Python using Selenium and Playwright. We’ll learn various aspects of Hypothesis testing, from basic usage to advanced strategies, and demonstrate how it can improve the robustness and reliability of the codebase.

TABLE OF CONTENTS

  • What Is a Hypothesis Library?
  • Decorators in Hypothesis
  • Strategies in Hypothesis
  • Setting Up Python Environment for Hypothesis Testing
  • How to Perform Hypothesis Testing in Python
  • Hypothesis Testing in Python With Selenium and Playwright

  • How to Run Hypothesis Testing in Python With Date Strategy?
  • How to Write Composite Strategies in Hypothesis Testing in Python?

Frequently Asked Questions (FAQs)

Hypothesis is a property-based testing library that automates test data generation based on properties or invariants defined by the developers and testers.

In property-based testing, instead of specifying individual test cases, developers define general properties that the code should satisfy. Hypothesis then generates a wide range of input data to test these properties automatically.

Property-based testing using Hypothesis allows developers and testers to focus on defining the behavior of their code rather than writing specific test cases, resulting in more comprehensive testing coverage and the discovery of edge cases and unexpected behavior.

Writing property-based tests usually consists of deciding on guarantees our code should make – properties that should always hold, regardless of what the world throws at the code.

Examples of such guarantees can be:

  • Your code shouldn’t throw an exception or should only throw a particular type of exception (this works particularly well if you have a lot of internal assertions).
  • If you delete an object, it is no longer visible.
  • If you serialize and then deserialize a value, you get the same value back.

Before we proceed further, it’s worthwhile to understand decorators in Python a bit since the Hypothesis library exposes decorators that we need to use to write tests.

In Python, decorators are a powerful feature that allows you to modify or extend the behavior of functions or classes without changing their source code. Decorators are essentially functions themselves, which take another function (or class) as input and return a new function (or class) with added functionality.

Decorators are denoted by the @ symbol followed by the name of the decorator function placed directly before the definition of the function or class to be modified.

Let us understand this with the help of an example:
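A minimal sketch of such a decorator is shown below; the user dictionary and the is_authenticated check are stand-ins for real authentication logic:

from functools import wraps

def authenticate(func):
    """Run an authentication check before calling the wrapped function."""
    @wraps(func)
    def wrapper(user, *args, **kwargs):
        if not user.get("is_authenticated"):
            raise PermissionError("User must be logged in")
        return func(user, *args, **kwargs)
    return wrapper

@authenticate
def create_post(user, content):
    return f"{user['name']} posted: {content}"

print(create_post({"name": "dana", "is_authenticated": True}, "Hello, world!"))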


In the example above, only authenticated users are allowed to call create_post(). The logic to check authentication is wrapped in its own function, authenticate().

This function can now be applied by placing @authenticate directly above any function that needs it, and Python will automatically execute the code of authenticate() before calling that function.

If we no longer need the authentication logic in the future, we can simply remove the @authenticate line without disturbing the core logic. Thus, decorators are a powerful construct in Python that allow plug-and-play of repetitive logic into any function or method.

Now that we know the concept of Python decorators, let us understand the decorators that Hypothesis provides.

Hypothesis @given Decorator

This decorator turns a test function that accepts arguments into a randomized test. It serves as the main entry point to Hypothesis.

The @given decorator can be used to specify which arguments of a function should be parameterized over. We can use either positional or keyword arguments, but not a mixture of both.

hypothesis.given(*_given_arguments, **_given_kwargs)

Some valid declarations of the @given decorator are:

from unittest import TestCase
from hypothesis import given
from hypothesis.strategies import integers

@given(integers(), integers())
def a(x, y):
    pass

@given(integers())
def b(x, y):
    pass

@given(y=integers())
def c(x, y):
    pass

@given(x=integers())
def d(x, y):
    pass

@given(x=integers(), y=integers())
def e(x, **kwargs):
    pass

@given(x=integers(), y=integers())
def f(x, *args, **kwargs):
    pass

class SomeTest(TestCase):
    @given(integers())
    def test_a_thing(self, x):
        pass

Some invalid declarations of @given are:

@given(integers(), integers(), integers())
def g(x, y):
    pass

@given(integers())
def h(x, *args):
    pass

@given(integers(), x=integers())
def i(x, y):
    pass

@given()
def j(x, y):
    pass

Hypothesis @example Decorator

When writing production-grade applications, Hypothesis's ability to generate a wide range of input test data plays a crucial role in ensuring robustness.

However, there are certain inputs or scenarios the testing team might deem mandatory to test on every run. For such cases, Hypothesis provides the @example decorator, which lets us specify values that we always want to be tested. The @example decorator works with all strategies.

Let’s understand by tweaking the factorial test example.
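A sketch of that tweaked test is shown below; it reuses the factorial() implementation that is covered in detail later in this post:

from hypothesis import given, example, strategies as st

def factorial(num: int) -> int:
    if num < 0:
        raise ValueError("Input must be >= 0")
    fact = 1
    for i in range(1, num + 1):
        fact *= i
    return fact

@given(st.integers(min_value=1, max_value=30))
@example(41)  # this value is exercised on every run, alongside the generated inputs
def test_factorial(num: int):
    assert factorial(num) / factorial(num - 1) == num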


The above test will always run for the input value 41, in addition to the test data generated by the st.integers() strategy.

By now, we understand that the crux of Hypothesis is to test a function against a wide range of inputs. These inputs are generated automatically, and Hypothesis lets us configure their range. Under the hood, a strategy takes care of generating this test data with the correct data type.

Hypothesis offers a wide range of strategies such as integers, text, booleans, datetimes, and more. For more complex scenarios, which we will see a bit later in this blog, Hypothesis also lets us build composite strategies.

While not exhaustive, here is a tabular summary of strategies available as part of the Hypothesis library.

Strategy - Description

none() - Generates None values.
booleans() - Generates boolean values (True or False).
integers() - Generates integer values.
floats() - Generates floating-point values.
text() - Generates Unicode text strings.
characters() - Generates single Unicode characters.
lists() - Generates lists of elements.
tuples() - Generates tuples of elements.
dictionaries() - Generates dictionaries with specified keys and values.
sets() - Generates sets of elements.
binary() - Generates binary data.
datetimes() - Generates datetime objects.
timedeltas() - Generates timedelta objects.
dates() - Generates date objects.
one_of() - Chooses one of the given strategies.
sampled_from() - Chooses values from a given sequence with equal probability.
just() - Generates a single fixed value.
from_regex() - Generates strings that match a given regular expression.
uuids() - Generates UUID objects.
complex_numbers() - Generates complex numbers.
fractions() - Generates Fraction objects.
builds() - Builds objects using a provided constructor and a strategy for each argument.
data() - Generates arbitrary data values drawn inside the test body.
shared() - Generates values that are shared between different parts of a test.
recursive() - Generates recursively structured data.
composite() - Builds custom strategies based on the outcome of other strategies.
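To get a feel for the values these strategies produce, you can draw samples with the .example() method, which is intended for interactive exploration rather than for use inside tests:

from hypothesis import strategies as st

print(st.integers(min_value=0, max_value=100).example())
print(st.text(min_size=1, max_size=10).example())
print(st.lists(st.booleans(), max_size=5).example())
print(st.dates().example())
print(st.sampled_from(["Chrome", "Firefox", "Edge"]).example())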

Let’s see the steps to how to set up a test environment to perform Hypothesis testing in Python.

  • Create a separate virtual environment for this project using Python's built-in venv module (python -m venv <env-name>).


  • Activate the newly created virtual environment using the activate script present within the environment.


  • Install the Hypothesis library needed for property-based testing using the pip install hypothesis command. The installed package can be viewed using the pip list command. At the time of writing, the latest version of Hypothesis is 6.102.4; for this article, we have used Hypothesis 6.99.6.


  • Install the python-dotenv, pytest, Playwright, and Selenium packages, which we will need to run the tests on the cloud grid. We will talk about this in more detail later in the blog.

Our final project structure keeps the application modules (such as crossbrowser_selenium.py, user_rewards.py, and shopping_cart.py) inside a src package, with the Hypothesis test files alongside it.

With the setup done, let us now understand Hypothesis testing in Python with various examples, starting with the introductory one and then working toward more complex ones.


Let’s now start writing tests to understand how we can leverage the Hypothesis library to perform Python automation .

For this, let’s look at one test scenario to understand Hypothesis testing in Python.

Test Scenario: Implement a factorial() function and use Hypothesis to verify the property that, for any positive integer num, factorial(num) divided by factorial(num - 1) equals num.

Implementation:

This is what the initial implementation of the function looks like:

def factorial(num: int) -> int:
    if num < 0:
        raise ValueError("Input must be >= 0")
    fact = 1
    for i in range(1, num + 1):
        fact *= i
    return fact

It takes an integer as input. If the input is negative, it raises a ValueError; otherwise, it uses the range() function to iterate from 1 to num, multiplying as it goes to compute the factorial, and returns the result.

Let’s now write a test using the Hypothesis library to test the above function:

from hypothesis import given, strategies as st

@given(st.integers(min_value=1, max_value=30))
def test_factorial(num: int):
    fact_num_result = factorial(num)
    fact_num_minus_one_result = factorial(num - 1)
    result = fact_num_result / fact_num_minus_one_result
    assert num == result

Code Walkthrough:

Let’s now understand the step-by-step code walkthrough for Hypothesis testing in Python.

Step 1: From the Hypothesis library, we import the given decorator and strategies method.


Step 2: Using the imported given and strategies, we set our test strategy of passing integer inputs within the range of 1 to 30 to the function under test using the min_value and max_value arguments.


Step 3: We write the actual test_factorial where the integer generated by our strategy is passed automatically by Hypothesis into the value num.

Using this value, we call the factorial function once for num and once for num - 1.

Next, we divide the factorial of num by the factorial of num - 1 and assert that the result of the operation is equal to the original num.


Test Execution:

Let’s now execute our Hypothesis test using the pytest -v -k "test_factorial" command.


And Hypothesis confirms that our function works perfectly for the given set of inputs, i.e., for integers from 1 to 30.

We can also view detailed statistics of the Hypothesis run by passing the --hypothesis-show-statistics argument to the pytest command:

pytest -v --hypothesis-show-statistics -k "test_factorial"


The difference between the reuse and generate phase in the output above is explained below:

  • Reuse Phase: During the reuse phase, the Hypothesis attempts to reuse previously generated test data. If a test case fails or raises an exception, the Hypothesis will try to shrink the failing example to find a minimal failing case.

This phase typically has a very short runtime, as it involves reusing existing test data or shrinking failing examples. The output provides statistics about the typical runtimes and the number of passing, failing, and invalid examples encountered during this phase.

  • Generate Phase: During the generate phase, the Hypothesis generates new test data based on the defined strategies. This phase involves generating a wide range of inputs to test the properties defined by the developer.

The output provides statistics about the typical runtimes and the number of passing, failing, and invalid examples generated during this phase. While this helped us understand what passing tests look like with Hypothesis, it's also worthwhile to understand how Hypothesis can catch bugs in the code.

Let’s rewrite the factorial() function with an obvious bug, i.e., remove the check that rejects invalid (negative) input values.

def factorial(num: int) -> int:
    # if num < 0:
    #     raise ValueError("Number must be >= 0")
    fact = 1
    for i in range(1, num + 1):
        fact *= i
    return fact

We also tweak the test to remove the min_value and max_value arguments.

@given(st.integers())
def test_factorial(num: int):
    fact_num_result = factorial(num)
    fact_num_minus_one_result = factorial(num - 1)
    result = int(fact_num_result / fact_num_minus_one_result)
    assert num == result

Let us now rerun the test with the same command:

pytest -v --hypothesis-show-statistics -k test_factorial

We can clearly see how Hypothesis has caught the bug immediately, which is shown in the above output. Hypothesis presents the input that resulted in the failing test under the Falsifying example section of the output.


So far, we’ve performed Hypothesis testing locally. This works nicely for unit tests , but when setting up automation for building more robust and resilient test suites, we can leverage a cloud grid like LambdaTest that supports automation testing tools like Selenium and Playwright.

LambdaTest is an AI-powered test orchestration and execution platform that enables developers and testers to perform automation testing with Selenium and Playwright at scale. It provides a remote test lab of 3000+ real environments.

How to Perform Hypothesis Testing in Python Using Cloud Selenium Grid?

Selenium is an open-source suite of tools and libraries for web automation . When combined with a cloud grid, it can help you perform Hypothesis testing in Python with Selenium at scale.

Let’s look at one test scenario to understand Hypothesis testing in Python with Selenium.

The code to set up a connection to LambdaTest Selenium Grid is stored in a crossbrowser_selenium.py file.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.keys import Keys
from time import sleep
import urllib3
import warnings
import os
from selenium.webdriver import ChromeOptions
from selenium.webdriver import FirefoxOptions
from selenium.webdriver.remote.remote_connection import RemoteConnection
from hypothesis.strategies import integers
from dotenv import load_dotenv

load_dotenv()

username = os.getenv('LT_USERNAME', None)
access_key = os.getenv('LT_ACCESS_KEY', None)

class CrossBrowserSetup:
    global web_driver

    def __init__(self):
        global remote_url
        urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
        remote_url = "https://" + str(username) + ":" + str(access_key) + "@hub.lambdatest.com/wd/hub"

    def add(self, browsertype):
        if browsertype == "Firefox":
            ff_options = webdriver.FirefoxOptions()
            ff_options.browser_version = "latest"
            ff_options.platform_name = "Windows 11"

            lt_options = {}
            lt_options["build"] = "Build: FF: Hypothesis Testing with Selenium & Pytest"
            lt_options["project"] = "Project: FF: Hypothesis Testing with Selenium & Pytest"
            lt_options["name"] = "Test: FF: Hypothesis Testing with Selenium & Pytest"
            lt_options["browserName"] = "Firefox"
            lt_options["browserVersion"] = "latest"
            lt_options["platformName"] = "Windows 11"
            lt_options["console"] = "error"
            lt_options["w3c"] = True
            lt_options["headless"] = False
            ff_options.set_capability('LT:Options', lt_options)

            # Connect to the LambdaTest Selenium Grid and load the test page
            web_driver = webdriver.Remote(
                command_executor=remote_url,
                options=ff_options
            )
            self.driver = web_driver
            self.driver.get("https://www.lambdatest.com")
            sleep(1)

            if web_driver is not None:
                web_driver.execute_script("lambda-status=passed")
                web_driver.quit()
                return True
        else:
            return False

The test_selenium.py contains code to test the Hypothesis that tests will only run on the Firefox browser.

from hypothesis import given, settings, example
import hypothesis.strategies as strategy
from src.crossbrowser_selenium import CrossBrowserSetup

@settings(deadline=None)
@given(strategy.just("Firefox"))
# @given(strategy.just("Chrome"))  # uncomment to demonstrate that add() returns False for Chrome
def test_add(browsertype_1):
    cbt = CrossBrowserSetup()
    assert True == cbt.add(browsertype_1)

Let’s now understand the step-by-step code walkthrough for Hypothesis testing in Python using Selenium Grid.

Step 1: We import the necessary Selenium methods to initiate a connection to LambdaTest Selenium Grid.

The FirefoxOptions() method is used to configure the setup when connecting to LambdaTest Selenium Grid using Firefox.


Step 2: We use the load_dotenv package to access the LT_ACCESS_KEY required to access the LambdaTest Selenium Grid, which is stored in the form of environment variables.


The LT_ACCESS_KEY can be obtained from your LambdaTest Profile > Account Settings > Password & Security .


Step 3: We initialize the CrossBrowserSetup class, which prepares the remote connection URL using the username and access_key.


Step 4: The add() method is responsible for checking the browsertype and then setting the capabilities of the LambdaTest Selenium Grid.


LambdaTest offers a variety of capabilities, such as cross browser testing , which means we can test on various operating systems such as Windows, Linux, and macOS and multiple browsers such as Chrome, Firefox, Edge, and Safari.

For the purpose of this blog, we will be testing that connection to the LambdaTest Selenium Grid should only happen if the browsertype is Firefox.

Step 5: If the connection to LambdaTest happens, the add() returns True ; else, it returns False .


Let’s now understand a step-by-step walkthrough of the test_selenium.py file.

Step 1: We set up the imports of the given decorator and the Hypothesis strategy. We also import the CrossBrowserSetup class.


Step 2: @settings(deadline=None) ensures the test doesn't time out if the connection to the LambdaTest Grid takes more time.

We use the @given decorator to set the strategy to just use Firefox as the input to the test_add() argument browsertype_1. We then initialize an instance of the CrossBrowserSetup class, call the add() method with browsertype_1, and assert that it returns True.

The commented-out strategy @given(strategy.just("Chrome")) is there to demonstrate that the add() method, when called with Chrome, returns False.


Let’s now run the test using pytest -k "test_hypothesis_selenium.py".


We can see that the test has passed, and the Web Automation Dashboard reflects that the connection to the Selenium Grid has been successful.


On opening one of the execution runs, we can see a detailed step-by-step test execution.


How to Perform Hypothesis Testing in Python Using Cloud Playwright Grid?

Playwright is a popular open-source tool for end-to-end testing developed by Microsoft. When combined with a cloud grid, it can help you perform Hypothesis testing in Python at scale.

Let’s look at one test scenario to understand Hypothesis testing in Python with Playwright.

Test Scenario: Add a product to the cart on the LambdaTest eCommerce Playground website in a randomly generated quantity, proceed to checkout, and verify the total price.
import os
import subprocess
import urllib.parse
import json

from dotenv import load_dotenv
from playwright.sync_api import expect, sync_playwright
from hypothesis import given, strategies as st

load_dotenv()

capabilities = {
    'browserName': 'Chrome',  # Browsers allowed: `Chrome`, `MicrosoftEdge`, `pw-chromium`, `pw-firefox` and `pw-webkit`
    'browserVersion': 'latest',
    'LT:Options': {
        'platform': 'Windows 11',
        'build': 'Playwright Hypothesis Demo Build',
        'name': 'Playwright Locators Test For Windows 11 & Chrome',
        'user': os.getenv('LT_USERNAME'),
        'accessKey': os.getenv('LT_ACCESS_KEY'),
        'network': True,
        'video': True,
        'visual': True,
        'console': True,
        'tunnel': False,    # Add tunnel configuration if testing locally hosted webpage
        'tunnelName': '',   # Optional
        'geoLocation': '',  # country code can be fetched from https://www.lambdatest.com/capabilities-generator/
    }
}

def interact_with_lambdatest(quantity):
    with sync_playwright() as playwright:
        playwrightVersion = str(subprocess.getoutput('playwright --version')).strip().split(" ")[1]
        capabilities['LT:Options']['playwrightClientVersion'] = playwrightVersion

        lt_cdp_url = 'wss://cdp.lambdatest.com/playwright?capabilities=' + urllib.parse.quote(json.dumps(capabilities))
        browser = playwright.chromium.connect(lt_cdp_url)
        page = browser.new_page()

        page.goto("https://ecommerce-playground.lambdatest.io/")
        page.get_by_role("button", name="Shop by Category").click()
        page.get_by_role("link", name="MP3 Players").click()
        page.get_by_role("link", name="HTC Touch HD HTC Touch HD HTC Touch HD HTC Touch HD").click()
        page.get_by_role("button", name="Add to Cart").click(click_count=quantity)
        page.get_by_role("link", name="Checkout ").first.click()

        unit_price = float(page.get_by_role("cell", name="$146.00").first.inner_text().replace("$", ""))

        page.evaluate("_ => {}", "lambdatest_action: {\"action\": \"setTestStatus\", \"arguments\": {\"status\":\"" + "Passed" + "\", \"remark\": \"" + "pass" + "\"}}")
        page.close()

        total_price = quantity * unit_price
        return total_price

quantity_strategy = st.integers(min_value=1, max_value=10)

@given(quantity=quantity_strategy)
def test_website_interaction(quantity):
    assert interact_with_lambdatest(quantity) == quantity * 146.00

Let’s now understand the step-by-step code walkthrough for Hypothesis testing in Python using Playwright Grid.

Step 1: To connect to the LambdaTest Playwright Grid, we need a Username and Access Key, which can be obtained from the Profile page > Account Settings > Password & Security.

We use the python-dotenv module to load the Username and Access Key, which are stored as environment variables.

Step 2: The capabilities dictionary is used to set up the Playwright Grid on LambdaTest.

We configure the Grid to use Windows 11 and the latest version of Chrome.


Step 3: The function interact_with_lambdatest interacts with the LambdaTest eCommerce Playground website to simulate adding a product to the cart and proceeding to checkout.

It starts a Playwright session and retrieves the version of the Playwright being used. The LambdaTest CDP URL is created with the appropriate capabilities. It connects to the Chromium browser instance on LambdaTest.

A new page instance is created, and the LambdaTest eCommerce Playground website is navigated. The specified product is added to the cart by clicking through the required buttons and links. The unit price of the product is extracted from the web page.

The browser page is then closed.


Step 4: We define a Hypothesis strategy quantity_strategy using st.integers to generate random integers representing product quantities. The generated integers range from 1 to 10

Using the @given decorator from the Hypothesis library, we define a property-based test function test_website_interaction that takes a quantity parameter generated by the quantity_strategy .

Inside the test function, we use the interact_with_lambdatest function to simulate interacting with the website and calculate the total price based on the generated quantity.

We assert that the total_price returned by interact_with_lambdatest matches the expected value calculated as quantity * 146.00.

Test Execution

Let’s now run the test on the Playwright Cloud Grid using pytest -v -k "test_hypothesis_playwright.py".


The LambdaTest Web Automation Dashboard shows successfully passed tests.



How to Perform Hypothesis Testing in Python With Date Strategy?

In the previous test scenario, we saw a simple example where we used the integers() strategy available as part of Hypothesis. Let's now understand another strategy, the dates() strategy, which can be effectively used to test date-based functions.

Also, the output of the Hypothesis run can be customized to produce detailed results. Often, we may wish to see an even more verbose output when executing a Hypothesis test.

To do so, we have two options: either use the @settings decorator or pass the --hypothesis-verbosity=<verbosity_level> option when performing pytest testing.

from datetime import datetime, timedelta
from hypothesis import Verbosity, settings, given, strategies as st

def generate_expiry_alert(expiry_date):
    current_date = datetime.now().date()
    days_until_expiry = (expiry_date - current_date).days
    return days_until_expiry <= 45

@given(expiry_date=st.dates())
@settings(verbosity=Verbosity.verbose, max_examples=1000)
def test_expiry_alert_generation(expiry_date):
    alert_generated = generate_expiry_alert(expiry_date)
    # Check if the alert is generated correctly based on the expiry date
    days_until_expiry = (expiry_date - datetime.now().date()).days
    expected_alert = days_until_expiry <= 45
    assert alert_generated == expected_alert

Let’s now understand the code step-by-step.

Step 1: The function generate_expiry_alert() takes an expiry_date as input and returns a boolean indicating whether the difference between the current date and the expiry_date is less than or equal to 45 days.


Step 2: To ensure we test generate_expiry_alert() for a wide range of date inputs, we use the dates() strategy.

We also enable verbose logging and set max_examples=1000, which asks Hypothesis to generate at most 1000 date inputs.


Step 3: On the inputs generated by Hypothesis in Step 2, we call the generate_expiry_alert() function and store the returned boolean in alert_generated.

We then compare the value returned by generate_expiry_alert() with a locally calculated copy and assert that they match.


We execute the test using the below command in the verbose mode, which allows us to see the test input dates generated by the Hypothesis.

pytest -s --hypothesis-show-statistics --hypothesis-verbosity=debug -k "test_expiry_alert_generation"


As we can see, Hypothesis ran 1000 tests, 2 with reused data and 998 with unique newly generated data, and found no issues with the code.

Now, imagine the trouble we would have had to take to write 1000 tests manually using traditional example-based testing.

How to Perform Hypothesis Testing in Python With Composite Strategies?

So far, we’ve been using simple standalone examples to demo the power of Hypothesis. Let’s now move on to more complicated scenarios.

Test Scenario: The website offers customer reward points. A UserRewards class tracks the customer's reward points and their spending.

The implementation of the UserRewards class is stored in a user_rewards.py file for better readability.

class UserRewards:
    def __init__(self, initial_points):
        self.reward_points = initial_points

    def get_reward_points(self):
        return self.reward_points

    def spend_reward_points(self, spent_points):
        if spent_points <= self.reward_points:
            self.reward_points -= spent_points
            return True
        else:
            return False

The tests for the UserRewards class are stored in test_user_rewards.py .

from hypothesis import given, strategies as st
from src.user_rewards import UserRewards

reward_points_strategy = st.integers(min_value=0, max_value=1000)

@given(initial_points=reward_points_strategy)
def test_get_reward_points(initial_points):
    user_rewards = UserRewards(initial_points)
    assert user_rewards.get_reward_points() == initial_points

@given(initial_points=reward_points_strategy, spend_amount=st.integers(min_value=0, max_value=1000))
def test_spend_reward_points(initial_points, spend_amount):
    user_rewards = UserRewards(initial_points)
    remaining_points = user_rewards.get_reward_points()
    if spend_amount <= initial_points:
        assert user_rewards.spend_reward_points(spend_amount)
        remaining_points -= spend_amount
    else:
        assert not user_rewards.spend_reward_points(spend_amount)
    assert user_rewards.get_reward_points() == remaining_points

Let’s now understand what is happening in both the class file and the test file step-by-step, starting with the UserRewards class.

Step 1: The class takes in a single argument initial_points to initialize the object.


Step 2: The get_reward_points() function returns the customer's current reward points.


Step 3: The spend_reward_points() method takes the spent_points as input. If spent_points is less than or equal to the customer's current point balance, it deducts spent_points from reward_points and returns True; otherwise, it returns False.


That is it for our simple UserRewards class. Next, we understand what’s happening in the test_user_rewards.py step-by-step.

Step 1: We import the @given decorator and strategies from Hypothesis and the UserRewards class.


Step 2: Since reward points will always be integers, we use the integers() Hypothesis strategy to generate inputs between 0 and 1000 and store the strategy in a reward_points_strategy variable.


Step 3: Using reward_points_strategy as the input, we run test_get_reward_points() across samples ranging from 0 to 1000.

For each input, we initialize the UserRewards class and assert that the method get_reward_points() returns the same value as the initial_points .

Step 4: To test the spend_reward_points() function, we generate two sets of sample inputs: the initial_points, using the reward_points_strategy we defined in Step 2, and a spend_amount, which simulates spending points.


Step 5: Write the test_spend_reward_points test, which takes initial_points and spend_amount as arguments and initializes the UserRewards class with initial_points.

We also initialize a remaining_points variable to track the points remaining after the spend.


Step 6: If spend_amount is less than or equal to the initial_points allocated to the customer, we assert that spend_reward_points() returns True and update remaining_points; otherwise, we assert that spend_reward_points() returns False.


Step 7: Lastly, we assert that the final remaining_points is correctly returned by get_reward_points(), which should be updated after spending the reward points.


Let’s now run the test and see if Hypothesis is able to find any bugs in the code.

pytest -s --hypothesis-show-statistics --hypothesis-verbosity=debug -k "test_user_rewards"


To check that Hypothesis indeed works, let's make a small change to UserRewards by commenting out the logic that deducts the spent_points in the spend_reward_points() function.


We run the test suite again using the command pytest -s --hypothesis-show-statistics -k "test_user_rewards".


This time, the Hypothesis highlights the failures correctly.

Thus, we can catch any bugs and potential side effects of code changes early, making it perfect for unit testing and regression testing .

To understand composite strategies a bit more, let’s now test the shopping cart functionality and see how composite strategy can help write robust tests for even the most complicated of real-world scenarios.

Test Scenario: A ShoppingCart class handles the shopping cart feature of the website.

Let’s view the implementation of the ShoppingCart class written in the shopping_cart.py file.

import random
from enum import Enum, auto

class Item(Enum):
    """Item type"""
    LUNIX_CAMERA = auto()
    IMAC = auto()
    HTC_TOUCH = auto()
    CANNON_EOS = auto()
    IPOD_TOUCH = auto()
    APPLE_VISION_PRO = auto()
    COFMACBOOKFEE = auto()
    GALAXY_S24 = auto()

    def __str__(self):
        return self.name.upper()

class ShoppingCart:
    def __init__(self):
        """Initialize the cart with an empty items dictionary."""
        self.items = {}

    def add_item(self, item: Item, price: int | float, quantity: int = 1) -> None:
        """Add an item to the cart, or increase its quantity if it is already present."""
        if item.name in self.items:
            self.items[item.name]["quantity"] += quantity
        else:
            self.items[item.name] = {"price": price, "quantity": quantity}

    def remove_item(self, item: Item, quantity: int = 1) -> None:
        """Remove the given quantity of an item, deleting it entirely if the quantity drops to zero."""
        if item.name in self.items:
            if self.items[item.name]["quantity"] <= quantity:
                del self.items[item.name]
            else:
                self.items[item.name]["quantity"] -= quantity

    def get_total_price(self):
        """Return the total price of all items currently in the cart."""
        total_price = 0
        for item in self.items.values():
            total_price += item["price"] * item["quantity"]
        return total_price

Let’s now view the tests written to verify the correct behavior of all aspects of the ShoppingCart class stored in a separate test_shopping_cart.py file.

from typing import Callable

from hypothesis import given, strategies as st
from hypothesis.strategies import SearchStrategy

from src.shopping_cart import ShoppingCart, Item

@st.composite
def items_strategy(draw: Callable[[SearchStrategy[Item]], Item]):
    return draw(st.sampled_from(list(Item)))

@st.composite
def price_strategy(draw: Callable[[SearchStrategy[int]], int]):
    return draw(st.integers(min_value=1, max_value=100))

@st.composite
def qty_strategy(draw: Callable[[SearchStrategy[int]], int]):
    return draw(st.integers(min_value=1, max_value=10))

@given(items_strategy(), price_strategy(), qty_strategy())
def test_add_item_hypothesis(item, price, quantity):
    cart = ShoppingCart()
    # Add items to cart
    cart.add_item(item=item, price=price, quantity=quantity)
    # Assert that the quantity of items in the cart is equal to the number of items added
    assert item.name in cart.items
    assert cart.items[item.name]["quantity"] == quantity

@given(items_strategy(), price_strategy(), qty_strategy())
def test_remove_item_hypothesis(item, price, quantity):
    cart = ShoppingCart()
    print("Adding Items")
    # Add items to cart
    cart.add_item(item=item, price=price, quantity=quantity)
    cart.add_item(item=item, price=price, quantity=quantity)
    print(cart.items)
    # Remove item from cart
    print(f"Removing Item {item}")
    quantity_before = cart.items[item.name]["quantity"]
    cart.remove_item(item=item)
    quantity_after = cart.items[item.name]["quantity"]
    # Assert that if we remove an item, the quantity of items in the cart is equal to the number of items added - 1
    assert quantity_before == quantity_after + 1

@given(items_strategy(), price_strategy(), qty_strategy())
def test_calculate_total_hypothesis(item, price, quantity):
    cart = ShoppingCart()
    # Add items to cart
    cart.add_item(item=item, price=price, quantity=quantity)
    cart.add_item(item=item, price=price, quantity=quantity)
    # Remove item from cart
    cart.remove_item(item=item)
    # Calculate total
    total = cart.get_total_price()
    assert total == cart.items[item.name]["price"] * cart.items[item.name]["quantity"]

Code Walkthrough of ShoppingCart class:

Let’s now understand what is happening in the ShoppingCart class step-by-step.

Step 1: We import the Python built-in Enum class and the auto() method.

The auto function within the Enum module automatically assigns sequential integer values to enumeration members, simplifying the process of defining enumerations with incremental values.


We define an Item enum corresponding to items available for sale on the LambdaTest eCommerce Playground website.

Step 2: We initialize the ShoppingCart class with an empty dictionary of items.


Step 3: The add_item() method takes in the item, price, and quantity as input and adds it to the shopping cart state held in the item dictionary.


Step 4: The remove_item() method takes in an item and quantity and removes it from the shopping cart state indicated by the item dictionary.


Step 5: The get_total_price() method iterates over the items dictionary, multiplies each quantity by its price, and returns the total_price of items in the cart.


Code Walkthrough of test_shopping_cart:

Let’s now understand step-by-step the tests written to ensure the correct working of the ShoppingCart class.

Step 1: First, we set up the imports, including the @given decorator, strategies, and the ShoppingCart class and Item enum.

The SearchStrategy is one of the various strategies on offer as part of the Hypothesis. It represents a set of rules for generating valid inputs to test a specific property or behavior of a function or program.


Step 2: We use the @st.composite decorator to define a custom Hypothesis strategy named items_strategy. This strategy takes a single argument, draw, which is a callable used to draw values from other strategies.

The st.sampled_from strategy randomly samples values from a given iterable. Within the strategy, we use draw(st.sampled_from(list(Item))) to draw a random Item instance from a list of all enum members.

Each time the items_strategy is used in a Hypothesis test, it will generate a random instance of the Item enum for testing purposes.


Step 3: The price_strategy follows the same pattern as items_strategy but generates an integer value between 1 and 100.

Step 4: The qty_strategy follows the same pattern as items_strategy but generates an integer value between 1 and 10.

Step 5: We use the @given decorator from the Hypothesis library to define a property-based test.

The items_strategy() , price_strategy() , and qty_strategy() functions are used to generate random values for the item, price, and quantity parameters, respectively.

Inside the test function, we create a new instance of a ShoppingCart .

We then add an item to the cart using the generated values for item, price, and quantity.

Finally, we assert that the item was successfully added to the cart and that the quantity matches the generated quantity.


Step 6: We use the @given decorator from the Hypothesis library to define a property-based test.

The items_strategy(), price_strategy() , and qty_strategy() functions are used to generate random values for the item, price, and quantity parameters, respectively.

Inside the test function, we create a new instance of a ShoppingCart . We then add the same item to the cart twice to simulate two quantity additions to the cart.

We remove one instance of the item from the cart. After that, we compare the item quantity before and after removal to ensure it decreases by 1.

The test verifies the behavior of the remove_item() method of the ShoppingCart class by testing it with randomly generated inputs for item, price , and quantity.


Step 7: We use the @given decorator from the Hypothesis library to define a property-based test.

The items_strategy(), price_strategy(), and qty_strategy() functions are used to generate random values for the item, price, and quantity parameters, respectively.

We add the same item to the cart twice to ensure it’s present, then remove one instance of the item from the cart. After that, we calculate the total price of items remaining in the cart.

Finally, we assert that the total price matches the price of one item times its remaining quantity.

The test verifies the correctness of the get_total_price() method of the ShoppingCart class by testing it with randomly generated inputs for item, price , and quantity .

Let’s now run the test using the command pytest --hypothesis-show-statistics -k "test_shopping_cart".


We can verify that Hypothesis was able to find no issues with the ShoppingCart class.

Let’s now amend the price_strategy and qty_strategy to remove the min_value and max_value arguments.


And rerun the test: pytest -k "test_shopping_cart".


The tests run clearly reveal that we have bugs with respect to handling scenarios when quantity and price are passed as 0.

This also shows that setting the test inputs correctly, to ensure we do comprehensive testing, is key to writing robust and resilient tests.

min_value and max_value should only be set when we know beforehand the bounds of the inputs the function under test will receive. If we are unsure what the inputs are, it's important to come up with the right strategies based on the behavior of the function under test.

In this blog, we have seen in detail how Hypothesis testing in Python works using the popular Hypothesis library. Hypothesis testing falls under property-based testing and handles edge cases much better than traditional example-based testing.

We also explored Hypothesis strategies and how we can use the @composite decorator to write custom strategies for testing complex functionalities.

We also saw how Hypothesis testing in Python can be performed with popular test automation frameworks like Selenium and Playwright. In addition, by performing Hypothesis testing in Python with LambdaTest on Cloud Grid, we can set up effective automation tests to enhance our confidence in the code we’ve written.

What are the three types of Hypothesis tests?

There are three main types of hypothesis tests, based on the direction of the alternative hypothesis (a short code illustration follows the list):

  • Right-tailed test: This tests if a parameter is greater than a certain value.
  • Left-tailed test: This tests if a parameter is less than a certain value.
  • Two-tailed test: This tests for any non-directional difference, either greater or lesser than the hypothesized value.
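As a quick illustration, SciPy's one-sample t-test exposes this choice through its alternative argument; the sample data here is made up:

from scipy import stats

sample = [5.1, 4.9, 5.3, 5.0, 5.2, 4.8, 5.4]

# Two-tailed: the mean differs from 5 in either direction
print(stats.ttest_1samp(sample, popmean=5.0, alternative="two-sided").pvalue)

# Right-tailed: the mean is greater than 5
print(stats.ttest_1samp(sample, popmean=5.0, alternative="greater").pvalue)

# Left-tailed: the mean is less than 5
print(stats.ttest_1samp(sample, popmean=5.0, alternative="less").pvalue)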

What is Hypothesis testing in the ML model?

Hypothesis testing is a statistical approach used to evaluate the performance and validity of machine learning models. It helps us determine if a pattern observed in the training data likely holds true for unseen data (generalizability).


Jaydeep is a software engineer with 10 years of experience, most recently developing and supporting applications written in Python. He has extensive experience with shell scripting and is also an AI/ML enthusiast. He is also a tech educator, creating content on Twitter, YouTube, Instagram, and LinkedIn. Link to his YouTube channel: https://www.youtube.com/@jaydeepkarale



P-Value And Statistical Significance: What It Is & Why It Matters

Saul Mcleod, PhD

Editor-in-Chief for Simply Psychology

BSc (Hons) Psychology, MRes, PhD, University of Manchester

Saul Mcleod, PhD., is a qualified psychology teacher with over 18 years of experience in further and higher education. He has been published in peer-reviewed journals, including the Journal of Clinical Psychology.


Olivia Guy-Evans, MSc

Associate Editor for Simply Psychology

BSc (Hons) Psychology, MSc Psychology of Education

Olivia Guy-Evans is a writer and associate editor for Simply Psychology. She has previously worked in healthcare and educational sectors.


The p-value in statistics quantifies the evidence against a null hypothesis. A low p-value suggests data is inconsistent with the null, potentially favoring an alternative hypothesis. Common significance thresholds are 0.05 or 0.01.

P-Value Explained in Normal Distribution

Hypothesis testing

When you perform a statistical test, a p-value helps you determine the significance of your results in relation to the null hypothesis.

The null hypothesis (H0) states no relationship exists between the two variables being studied (one variable does not affect the other). It states the results are due to chance and are not significant in supporting the idea being investigated. Thus, the null hypothesis assumes that whatever you try to prove did not happen.

The alternative hypothesis (Ha or H1) is the one you would believe if the null hypothesis is concluded to be untrue.

The alternative hypothesis states that the independent variable affected the dependent variable, and the results are significant in supporting the theory being investigated (i.e., the results are not due to random chance).

What a p-value tells you

A p-value, or probability value, is a number describing how likely it is that your data would have occurred by random chance (i.e., that the null hypothesis is true).

The level of statistical significance is often expressed as a p-value between 0 and 1.

The smaller the p-value, the less likely the results occurred by random chance, and the stronger the evidence that you should reject the null hypothesis.

Remember, a p-value doesn’t tell you if the null hypothesis is true or false. It just tells you how likely you’d see the data you observed (or more extreme data) if the null hypothesis was true. It’s a piece of evidence, not a definitive proof.

Example: Test Statistic and p-Value

Suppose you’re conducting a study to determine whether a new drug has an effect on pain relief compared to a placebo. If the new drug has no impact, your test statistic will be close to the one predicted by the null hypothesis (no difference between the drug and placebo groups), and the resulting p-value will be close to 1. It may not be precisely 1 because real-world variations may exist. Conversely, if the new drug indeed reduces pain significantly, your test statistic will diverge further from what’s expected under the null hypothesis, and the p-value will decrease. The p-value will never reach zero because there’s always a slim possibility, though highly improbable, that the observed results occurred by random chance.

P-value interpretation

The significance level (alpha) is a set probability threshold (often 0.05), while the p-value is the probability you calculate based on your study or analysis.

A p-value less than or equal to your significance level (typically ≤ 0.05) is statistically significant.

A p-value less than or equal to a predetermined significance level (often 0.05 or 0.01) indicates a statistically significant result, meaning the observed data provide strong evidence against the null hypothesis.

This suggests the effect under study likely represents a real relationship rather than just random chance.

For instance, if you set α = 0.05, you would reject the null hypothesis if your p-value ≤ 0.05.

This indicates strong evidence against the null hypothesis, as there is less than a 5% probability of obtaining results at least this extreme if the null hypothesis were correct.

Therefore, we reject the null hypothesis and accept the alternative hypothesis.

Example: Statistical Significance

Upon analyzing the pain relief effects of the new drug compared to the placebo, the computed p-value is less than 0.01, which falls well below the predetermined alpha value of 0.05. Consequently, you conclude that there is a statistically significant difference in pain relief between the new drug and the placebo.

What does a p-value of 0.001 mean?

A p-value of 0.001 is highly statistically significant beyond the commonly used 0.05 threshold. It indicates strong evidence of a real effect or difference, rather than just random variation.

Specifically, a p-value of 0.001 means there is only a 0.1% chance of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is correct.

Such a small p-value provides strong evidence against the null hypothesis, leading to rejecting the null in favor of the alternative hypothesis.

A p-value greater than the significance level (typically p > 0.05) is not statistically significant: the data do not provide sufficient evidence against the null hypothesis.

This means we retain the null hypothesis and reject the alternative hypothesis. You should note that you cannot accept the null hypothesis; we can only reject it or fail to reject it.

Note: when the p-value is above your threshold of significance, it does not mean that there is a 95% probability that the alternative hypothesis is true.

One-Tailed Test

(Figure: probability and statistical significance in a one-tailed test, with the rejection region in a single tail of the distribution.)

Two-Tailed Test

(Figure: statistical significance in a two-tailed test, with the rejection region split between both tails of the distribution.)

How do you calculate the p-value?

Most statistical software packages like R, SPSS, and others automatically calculate your p-value. This is the easiest and most common way.

Online resources and tables are available to estimate the p-value based on your test statistic and degrees of freedom.

These tables help you understand how often you would expect to see your test statistic under the null hypothesis.

Understanding the Statistical Test:

Different statistical tests are designed to answer specific research questions or hypotheses. Each test has its own underlying assumptions and characteristics.

For example, you might use a t-test to compare means, a chi-squared test for categorical data, or a correlation test to measure the strength of a relationship between variables.

Be aware that the number of independent variables you include in your analysis can influence the magnitude of the test statistic needed to produce the same p-value.

This factor is particularly important to consider when comparing results across different analyses.

Example: Choosing a Statistical Test

If you’re comparing the effectiveness of just two different drugs in pain relief, a two-sample t-test is a suitable choice for comparing these two groups. However, when you’re examining the impact of three or more drugs, it’s more appropriate to employ an Analysis of Variance ( ANOVA) . Utilizing multiple pairwise comparisons in such cases can lead to artificially low p-values and an overestimation of the significance of differences between the drug groups.
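For instance, here is a minimal sketch (made-up pain-relief scores, SciPy's standard test functions) of the two choices: a two-sample t-test when comparing two drugs, and a one-way ANOVA when there are three groups:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
drug_a = rng.normal(4.0, 1.0, size=30)   # hypothetical pain-relief scores
drug_b = rng.normal(4.5, 1.0, size=30)
drug_c = rng.normal(5.0, 1.0, size=30)

# Two groups: two-sample t-test
t_res = stats.ttest_ind(drug_a, drug_b)
print(f"t-test (A vs B): t = {t_res.statistic:.2f}, p = {t_res.pvalue:.3f}")

# Three or more groups: one-way ANOVA instead of multiple pairwise t-tests
f_res = stats.f_oneway(drug_a, drug_b, drug_c)
print(f"ANOVA (A, B, C): F = {f_res.statistic:.2f}, p = {f_res.pvalue:.3f}")
```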

How to report

A statistically significant result cannot prove that a research hypothesis is correct (which implies 100% certainty).

Instead, we may state our results “provide support for” or “give evidence for” our research hypothesis (as there is still a slight probability that the results occurred by chance and the null hypothesis was correct – e.g., less than 5%).

Example: Reporting the results

In our comparison of the pain relief effects of the new drug and the placebo, we observed that participants in the drug group experienced a significant reduction in pain ( M = 3.5; SD = 0.8) compared to those in the placebo group ( M = 5.2; SD  = 0.7), resulting in an average difference of 1.7 points on the pain scale (t(98) = -9.36; p < 0.001).
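As a rough sketch of where those reported quantities come from, the snippet below (Python with SciPy, simulated data, so the numbers will not match the example above) computes the group means, standard deviations, t statistic, degrees of freedom, and p-value:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
drug = rng.normal(3.5, 0.8, size=50)     # hypothetical pain scores, drug group
placebo = rng.normal(5.2, 0.7, size=50)  # hypothetical pain scores, placebo group

res = stats.ttest_ind(drug, placebo)
df = len(drug) + len(placebo) - 2        # degrees of freedom for a pooled two-sample t-test
print(f"Drug: M = {drug.mean():.1f}, SD = {drug.std(ddof=1):.1f}")
print(f"Placebo: M = {placebo.mean():.1f}, SD = {placebo.std(ddof=1):.1f}")
print(f"t({df}) = {res.statistic:.2f}, p = {res.pvalue:.3g}")
```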

The 6th edition of the APA style manual (American Psychological Association, 2010) states the following on the topic of reporting p-values:

“When reporting p values, report exact p values (e.g., p = .031) to two or three decimal places. However, report p values less than .001 as p < .001.

The tradition of reporting p values in the form p < .10, p < .05, p < .01, and so forth, was appropriate in a time when only limited tables of critical values were available.” (p. 114)

  • Do not use 0 before the decimal point for the statistical value p, because it cannot be greater than 1. In other words, write p = .001 instead of p = 0.001.
  • Please pay attention to issues of italics ( p is always italicized) and spacing (either side of the = sign).
  • p = .000 (as outputted by some statistical packages such as SPSS) is impossible and should be written as p < .001.
  • The opposite of significant is “nonsignificant,” not “insignificant.”

Why is the p-value not enough?

A lower p-value  is sometimes interpreted as meaning there is a stronger relationship between two variables.

However, statistical significance only means that the observed data would be unlikely (e.g., less than a 5% chance) if the null hypothesis were true; it says nothing about the size of the effect.

To understand the strength of the difference between the two groups (control vs. experimental), a researcher needs to calculate the effect size.
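For example, a common effect-size measure for a two-group comparison is Cohen's d; the sketch below (plain NumPy, illustrative data only) computes it with a pooled standard deviation:

```python
import numpy as np

def cohens_d(group1, group2):
    """Cohen's d for two independent groups, using a pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    v1, v2 = np.var(group1, ddof=1), np.var(group2, ddof=1)
    pooled_sd = np.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (np.mean(group1) - np.mean(group2)) / pooled_sd

rng = np.random.default_rng(1)
experimental = rng.normal(10.5, 2.0, size=40)
control = rng.normal(10.0, 2.0, size=40)
print(f"Cohen's d = {cohens_d(experimental, control):.2f}")
```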

When do you reject the null hypothesis?

In statistical hypothesis testing, you reject the null hypothesis when the p-value is less than or equal to the significance level (α) you set before conducting your test. The significance level is the probability of rejecting the null hypothesis when it is true. Commonly used significance levels are 0.01, 0.05, and 0.10.

Remember, rejecting the null hypothesis doesn’t prove the alternative hypothesis; it just suggests that the alternative hypothesis may be plausible given the observed data.

The p-value is conditional upon the null hypothesis being true but is unrelated to the truth or falsity of the alternative hypothesis.

What does p-value of 0.05 mean?

If your p-value is less than or equal to 0.05 (the significance level), you would conclude that your result is statistically significant. This means the evidence is strong enough to reject the null hypothesis in favor of the alternative hypothesis.

Are all p-values below 0.05 considered statistically significant?

Not necessarily in a practical sense. By convention, a p-value below the chosen significance level (often 0.05) is labelled statistically significant, but 0.05 is just a customary threshold, and how much weight a result deserves also depends on the study design, sample size, and the magnitude of the observed effect.

A p-value below 0.05 means there is evidence against the null hypothesis, suggesting a real effect. However, it’s essential to consider the context and other factors when interpreting results.

Researchers also look at effect size and confidence intervals to determine the practical significance and reliability of findings.

How does sample size affect the interpretation of p-values?

Sample size can impact the interpretation of p-values. A larger sample size provides more reliable and precise estimates of the population, leading to narrower confidence intervals.

With a larger sample, even small differences between groups or effects can become statistically significant, yielding lower p-values. In contrast, smaller sample sizes may not have enough statistical power to detect smaller effects, resulting in higher p-values.

Therefore, a larger sample size increases the chances of finding statistically significant results when there is a genuine effect, making the findings more trustworthy and robust.
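A quick simulation can make this concrete. In the sketch below (Python with SciPy, invented numbers), the same small mean difference is tested with a small and a large sample; only the large sample is likely to yield a small p-value:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
true_diff = 0.2  # a small but real difference between group means

for n in (20, 2000):
    group_a = rng.normal(0.0, 1.0, size=n)
    group_b = rng.normal(true_diff, 1.0, size=n)
    p = stats.ttest_ind(group_a, group_b).pvalue
    print(f"n = {n} per group: p = {p:.4f}")
```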

Can a non-significant p-value indicate that there is no effect or difference in the data?

No, a non-significant p-value does not necessarily indicate that there is no effect or difference in the data. It means that the observed data do not provide strong enough evidence to reject the null hypothesis.

There could still be a real effect or difference, but it might be smaller or more variable than the study was able to detect.

Other factors like sample size, study design, and measurement precision can influence the p-value. It’s important to consider the entire body of evidence and not rely solely on p-values when interpreting research findings.

Can p-values be exactly zero?

While a p-value can be extremely small, it can never be exactly zero. When a p-value is reported as p = 0.000, the actual value is simply too small for the software to display. This is usually interpreted as strong evidence against the null hypothesis. For p-values less than 0.001, report them as p < .001.
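If you generate reports programmatically, a small helper like the sketch below (a hypothetical function, not part of any particular package) can apply the APA conventions described above, reporting exact p-values without a leading zero and using p < .001 for very small values:

```python
def format_p(p: float) -> str:
    """Format a p-value in APA style: no leading zero, 'p < .001' for tiny values."""
    if p < 0.001:
        return "p < .001"
    return "p = " + f"{p:.3f}".lstrip("0")

for p in (0.031, 0.0004, 0.2501):
    print(format_p(p))   # p = .031, p < .001, p = .250
```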

Further Information

  • P-values and significance tests (Khan Academy)
  • Hypothesis testing and p-values (Khan Academy)
  • Wasserstein, R. L., Schirm, A. L., & Lazar, N. A. (2019). Moving to a world beyond "p < 0.05". The American Statistician, 73(sup1), 1-19.
  • Criticism of using the p < 0.05 threshold
  • Publication Manual of the American Psychological Association




What is a scientific hypothesis?

It's the initial building block in the scientific method.


A scientific hypothesis is a tentative, testable explanation for a phenomenon in the natural world. It's the initial building block in the scientific method . Many describe it as an "educated guess" based on prior knowledge and observation. While this is true, a hypothesis is more informed than a guess. While an "educated guess" suggests a random prediction based on a person's expertise, developing a hypothesis requires active observation and background research. 

The basic idea of a hypothesis is that there is no predetermined outcome. For a solution to be termed a scientific hypothesis, it has to be an idea that can be supported or refuted through carefully crafted experimentation or observation. This concept, called falsifiability and testability, was advanced in the mid-20th century by Austrian-British philosopher Karl Popper in his famous book "The Logic of Scientific Discovery" (Routledge, 1959).

A key function of a hypothesis is to derive predictions about the results of future experiments and then perform those experiments to see whether they support the predictions.

A hypothesis is usually written in the form of an if-then statement, which gives a possibility (if) and explains what may happen because of the possibility (then). The statement could also include "may," according to California State University, Bakersfield .

Here are some examples of hypothesis statements:

  • If garlic repels fleas, then a dog that is given garlic every day will not get fleas.
  • If sugar causes cavities, then people who eat a lot of candy may be more prone to cavities.
  • If ultraviolet light can damage the eyes, then maybe this light can cause blindness.

A useful hypothesis should be testable and falsifiable. That means that it should be possible to prove it wrong. A theory that can't be proved wrong is nonscientific, according to Karl Popper's 1963 book " Conjectures and Refutations ."

An example of an untestable statement is, "Dogs are better than cats." That's because the definition of "better" is vague and subjective. However, an untestable statement can be reworded to make it testable. For example, the previous statement could be changed to this: "Owning a dog is associated with higher levels of physical fitness than owning a cat." With this statement, the researcher can take measures of physical fitness from dog and cat owners and compare the two.

Types of scientific hypotheses


In an experiment, researchers generally state their hypotheses in two ways. The null hypothesis predicts that there will be no relationship between the variables tested, or no difference between the experimental groups. The alternative hypothesis predicts the opposite: that there will be a difference between the experimental groups. This is usually the hypothesis scientists are most interested in, according to the University of Miami .

For example, a null hypothesis might state, "There will be no difference in the rate of muscle growth between people who take a protein supplement and people who don't." The alternative hypothesis would state, "There will be a difference in the rate of muscle growth between people who take a protein supplement and people who don't."

If the results of the experiment show a relationship between the variables, then the null hypothesis has been rejected in favor of the alternative hypothesis, according to the book " Research Methods in Psychology " (​​BCcampus, 2015). 

There are other ways to describe an alternative hypothesis. The alternative hypothesis above does not specify a direction of the effect, only that there will be a difference between the two groups. That type of prediction is called a two-tailed hypothesis. If a hypothesis specifies a certain direction — for example, that people who take a protein supplement will gain more muscle than people who don't — it is called a one-tailed hypothesis, according to William M. K. Trochim , a professor of Policy Analysis and Management at Cornell University.

Sometimes, errors take place during an experiment. These errors can happen in one of two ways. A type I error is when the null hypothesis is rejected when it is true. This is also known as a false positive. A type II error occurs when the null hypothesis is not rejected when it is false. This is also known as a false negative, according to the University of California, Berkeley . 

A hypothesis can be rejected or modified, but it can never be proved correct 100% of the time. For example, a scientist can form a hypothesis stating that if a certain type of tomato has a gene for red pigment, that type of tomato will be red. During research, the scientist then finds that each tomato of this type is red. Though the findings confirm the hypothesis, there may be a tomato of that type somewhere in the world that isn't red. Thus, the hypothesis is supported, but it cannot be proved to be true 100% of the time.

Scientific theory vs. scientific hypothesis

The best hypotheses are simple. They deal with a relatively narrow set of phenomena. But theories are broader; they generally combine multiple hypotheses into a general explanation for a wide range of phenomena, according to the University of California, Berkeley . For example, a hypothesis might state, "If animals adapt to suit their environments, then birds that live on islands with lots of seeds to eat will have differently shaped beaks than birds that live on islands with lots of insects to eat." After testing many hypotheses like these, Charles Darwin formulated an overarching theory: the theory of evolution by natural selection.

"Theories are the ways that we make sense of what we observe in the natural world," Tanner said. "Theories are structures of ideas that explain and interpret facts." 

  • Read more about writing a hypothesis, from the American Medical Writers Association.
  • Find out why a hypothesis isn't always necessary in science, from The American Biology Teacher.
  • Learn about null and alternative hypotheses, from Prof. Essa on YouTube .

Bibliography

Encyclopedia Britannica, "Scientific Hypothesis," Jan. 13, 2022. https://www.britannica.com/science/scientific-hypothesis

Karl Popper, "The Logic of Scientific Discovery," Routledge, 1959.

California State University, Bakersfield, "Formatting a Testable Hypothesis." https://www.csub.edu/~ddodenhoff/Bio100/Bio100sp04/formattingahypothesis.htm

Karl Popper, "Conjectures and Refutations," Routledge, 1963.

Price, P., Jhangiani, R., & Chiang, I., "Research Methods in Psychology (2nd Canadian Edition)," BCcampus, 2015.

University of Miami, "The Scientific Method." http://www.bio.miami.edu/dana/161/evolution/161app1_scimethod.pdf

William M. K. Trochim, "Research Methods Knowledge Base." https://conjointly.com/kb/hypotheses-explained/

University of California, Berkeley, "Multiple Hypothesis Testing and False Discovery Rate." https://www.stat.berkeley.edu/~hhuang/STAT141/Lecture-FDR.pdf

University of California, Berkeley, "Science at Multiple Levels." https://undsci.berkeley.edu/article/0_0_0/howscienceworks_19


Hypothesis Testing in Statistics: Step by Step with Examples

Hypothesis testing is the act of statistically evaluating a belief or theory. It is the step-by-step process of analyzing empirical data, gathered through observation or experiment, to check whether the data differ from what you would expect if the belief or theory you started with were true.

This article walks you through the hypothesis testing concept and lists the process of hypothesis testing step by step.

To illustrate the concept and show you the hypothesis testing process with an example, we evaluate a belief that the companies in the Russell 3000 grow at a rate greater than 10% per year.

Here is a list of subtopics if you want to jump ahead:

  • Hypothesis Testing: Step by Step
  • Structuring the Hypothesis Test: The Null and Alternate Hypothesis
  • The Null Hypothesis
  • The Alternate Hypothesis
  • Significance Level
  • Sample Size and Sampling to Get the Test Statistic
  • Setting up the Critical Value & Reject Regions
  • Computing the Test Statistic
  • Comparing the Test Statistic vs. the Critical Values
  • Concluding the Hypothesis Test
  • A Hypothesis Test and a Criminal Trial: Similarities
  • The Sampling Distribution
  • Reject Region in Hypothesis Testing
  • Some Facts on the Null Hypothesis
  • Some Facts on the Alternate Hypothesis

If you already know the concept of hypothesis testing, you can simply follow the step-by-step process outlined below.

  • State the null hypothesis
  • State the alternate hypothesis
  • Decide on the level of significance
  • Choose the sample size
  • Determine the statistical technique
  • Set up the critical values to identify the reject region and non-reject region
  • Collect the data sample and compute sample parameters & Test statistic
  • Compare sample/test statistic with critical value/reject or non-reject region.
  • Make your conclusion clear.

The Null Hypothesis

A hypothesis test starts with a hypothesis that you want to test. It is designed as a statement or belief that you are examining. This statement or belief is termed the null hypothesis. The null hypothesis is what the hypothesis test is evaluating.

The Alternate Hypothesis

The opposite of the null hypothesis is called an alternate hypothesis. We are not examining the alternate hypothesis. Instead, the alternate hypothesis is what remains if the null hypothesis is rejected after being examined.

We will talk more about designing the null and alternate hypotheses later. Remember that we place what we want to prove in the alternate hypothesis. And we put the opposite of what we want to prove in the null hypothesis.

To continue our example, we will place what we believe to be true (mean growth rate is greater than 10%) in the alternate hypothesis, and the opposite (mean growth rate is less than or equal to 10%) in the null hypothesis. Accordingly, we will have the following null and alternate hypotheses for our example:

H0: Mean growth rate <= 10%
Ha: Mean growth rate > 10%

If we reject the null hypothesis, we conclude that the alternate hypothesis stands. On the other hand, if the data do not provide enough evidence to reject the null hypothesis, we can only conclude that we cannot reject the null hypothesis. In other words, we have not proven the alternate hypothesis, and we make no claim to have proven our starting theory or belief.

Significance Level

In hypothesis testing, the evidence required is gathered from a sample of the relevant population. Then, the parameter of interest from the sample is computed and referred to as the test statistic. This test statistic informs us about the null hypothesis.

Even if the null hypothesis is true, the test statistic is unlikely to be exactly equal to the parameter of interest of the true population because we are basing our test statistic on a sample of the population! A sample is only an unbiased estimator and not the actual population parameter. However, if the null hypothesis is true, the test statistic is likely to be close to the null hypothesis value, and likely agree with the null hypothesis. How close should it be? Or how far away from the null hypothesis value should the test statistic be before we can conclude that the null hypothesis is not true and “can be rejected”?

This is where the significance level comes into play. The significance level is the level of certainty required to reject the null hypothesis. The most commonly used significance levels are 1%, 5%, or 10% in practice. The significance level should be determined by the type of errors we are willing to tolerate (type 1 or type 2 errors).

We will use a 5% level of significance in our example today.

Significance level helps us determine the point beyond which we say that the null hypothesis is not true and “can be rejected”!

Setting up the Critical Value & Reject Regions

Best practice dictates that the critical value must be set at the design stage, before the hypothesis test is run. The critical value is based on two factors: 1) the sampling distribution and 2) the significance level.

Sampling Distribution

The sampling distribution is the distribution of sample values we would expect if the null hypothesis were true. Theoretically, it is the distribution we would get if we took all possible samples from the population. The reason the sampling distribution is central to hypothesis testing is that its mean equals the mean of the true population. So we use the sampling distribution to evaluate the sample test statistic and check whether our data agree with the null hypothesis.

If our null hypothesis is true, the test statistic will lie close to the middle of the sampling distribution. However, if our null hypothesis is NOT true, the test statistic will likely be closer to the tails of the sampling distribution.

To make a firm decision, we need a point beyond which we say that the null hypothesis is not true. That point is referred to as the critical value. The region beyond the critical value is referred to as the critical region or the reject region. If the test statistic falls in this region, we reject the null hypothesis. We conclude that the alternate hypothesis is true.

In our example, we are using a 5% significance level. Therefore, the critical value and reject region will be computed using a 5% significance level. The critical value and reject region can be computed using a Z table, Microsoft Excel, or another software program.

In Microsoft Excel, =NORM.S.INV(0.95) gives a single-tail critical value of 1.645 as the z value. We can use a Z table to arrive at the same value.
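The same critical value can be obtained outside Excel; for example, in Python, scipy.stats.norm.ppf is the inverse of the standard normal CDF (the counterpart of NORM.S.INV):

```python
from scipy import stats

alpha = 0.05
critical_value = stats.norm.ppf(1 - alpha)  # upper-tail critical value for a one-tailed test
print(round(critical_value, 3))             # approximately 1.645
```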


Once we have the critical value, we run the experiment or gather sample data. Then, we analyze the sample data and compute the sample parameter of interest.

In our example, we randomly sample __ companies of the Russell 3000. We compute the average growth rates of the sample. We then compute the test statistic using this formula.

(Test statistic: z = (sample mean - hypothesized mean) / (sample standard deviation / √n).)
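As a rough illustration of that calculation, the sketch below assumes a one-sample z-test of the mean growth rate against the hypothesized 10%, with entirely made-up growth-rate data (the real study would use the actual sampled companies):

```python
import numpy as np
from scipy import stats

growth_rates = np.array([0.14, 0.08, 0.22, 0.11, 0.13, 0.18, 0.09, 0.16,
                         0.12, 0.19, 0.15, 0.07, 0.21, 0.13, 0.17])  # hypothetical sample
mu_0 = 0.10                                    # hypothesized mean growth rate (10%)
n = len(growth_rates)

# Test statistic: (sample mean - hypothesized mean) / standard error of the mean
z = (growth_rates.mean() - mu_0) / (growth_rates.std(ddof=1) / np.sqrt(n))
critical_value = stats.norm.ppf(0.95)          # one-tailed test at the 5% significance level

print(f"z = {z:.2f}, critical value = {critical_value:.3f}")
print("Reject H0" if z > critical_value else "Fail to reject H0")
```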

Comparing the Test Statistic and the Critical Value

We compare the sample parameter of interest with the critical value/critical region. We are essentially checking if the test statistic falls in the reject region.

We are ready to conclude the hypothesis test only when we have the sample parameter of interest and the critical value at hand. We check if the parameter of interest falls in the critical regions identified in the earlier step.


In our example, we can see that the test statistic falls in the reject region.

If the parameter of interest falls in the critical regions, we reject the null hypothesis. Only when we reject the null hypothesis can we conclude that we believe the alternate hypothesis!

In our example, we can conclude that we reject the null hypothesis as the test statistic falls in the reject region. Because we reject the null hypothesis, we can say we believe the alternate hypothesis is true. And we conclude that the growth rate of companies of the Russell 3000 is greater than 10% per year!

A Hypothesis Test and a Criminal Trial: Similarities

A hypothesis test is often compared to a criminal trial. In a criminal trial, we start with the presumption "innocent until proven guilty." Similarly, in hypothesis testing, we assume that the null hypothesis is true, and we need to present data to disprove it. That is why we say that hypothesis testing is a trial of the null hypothesis; it is not the alternate hypothesis we are testing.

The null hypothesis is similar to the criminal defendant, and the data scientist or researcher is similar to the prosecutor. It is the prosecutor's job to prove that the defendant is guilty; likewise, the researcher examines the data to present evidence that the null hypothesis is not true. Only if the researcher presents such evidence can we conclude that the alternate hypothesis is true.

If we do not have evidence to prove the defendant is guilty, he escapes conviction. It does not mean he is truly innocent; it only means that he was not found guilty. Similarly, if we do not have evidence to reject the null hypothesis, we can only conclude that we cannot reject the null hypothesis.

Some Facts on the Null Hypothesis

  • The null hypothesis is the current belief.
  • You are examining or testing the null hypothesis.
  • The null hypothesis refers to a specific parameter/value of the true population (not the sample parameter).
  • The null hypothesis contains the "equal to" condition.
  • If you reject the null hypothesis, you have statistical evidence that the alternate hypothesis is true.
  • Failure to reject the null hypothesis does not mean you have statistical proof that the null hypothesis is true.

Some Facts on the Alternate Hypothesis

  • The alternate hypothesis is what the researcher wants to prove statistically.
  • The alternate hypothesis is the opposite of the null hypothesis.
  • The failure to prove the alternate hypothesis does not mean that you have proven the null hypothesis.
  • The alternate hypothesis usually does not contain the "equal to" condition.

Sample Size and Sampling to get the Test Statistic

We are looking for evidence that the null hypothesis is not true and "can be rejected." This evidence is provided by a sample. How should this sample be gathered? How large should the sample be? The sample must be carefully selected to be representative of the true population of interest. A random sample is best. Other sampling methods include cluster sampling, stratified sampling, and convenience sampling, among others. Each has its advantages and disadvantages, which we will not go into here.

Selecting the sample size is important in hypothesis testing. The sample size chosen affects the risk of Type I and Type II errors, and it directly determines the confidence level and the power of the test. A sample size formula can be used to arrive at the required sample size.


How to Set Up a Hypothesis Test: Null versus Alternative


When you set up a hypothesis test to determine the validity of a statistical claim, you need to define both a null hypothesis and an alternative hypothesis.

Typically in a hypothesis test, the claim being made is about a population parameter (one number that characterizes the entire population). Because parameters tend to be unknown quantities, everyone wants to make claims about what their values may be. For example, the claim that 25% (or 0.25) of all women have varicose veins is a claim about the proportion (that’s the parameter ) of all women (that’s the population ) who have varicose veins (that’s the variable — having or not having varicose veins).

Researchers often challenge claims about population parameters. You may hypothesize, for example, that the actual proportion of women who have varicose veins is lower than 0.25, based on your observations. Or you may hypothesize that due to the popularity of high heeled shoes, the proportion may be higher than 0.25. Or if you’re simply questioning whether the actual proportion is 0.25, your alternative hypothesis is: “No, it isn’t 0.25.”

How to define a null hypothesis

Every hypothesis test contains a set of two opposing statements, or hypotheses, about a population parameter. The first hypothesis is called the null hypothesis, denoted H 0 . The null hypothesis always states that the population parameter is equal to the claimed value. For example, if the claim is that the average time to make a name-brand ready-mix pie is five minutes, the statistical shorthand notation for the null hypothesis in this case would be as follows:

H0: μ = 5

(That is, the population mean is 5 minutes.)

All null hypotheses include an equal sign in them.

How to define an alternative hypothesis

Before actually conducting a hypothesis test, you have to put two possible hypotheses on the table — the null hypothesis is one of them. But, if the null hypothesis is rejected (that is, there was sufficient evidence against it), what’s your alternative going to be? Actually, three possibilities exist for the second (or alternative) hypothesis, denoted H a . Here they are, along with their shorthand notations in the context of the pie example:

The population parameter is not equal to the claimed value

Ha: μ ≠ 5

The population parameter is greater than the claimed value

Ha: μ > 5

The population parameter is less than the claimed value

Ha: μ < 5

Which alternative hypothesis you choose in setting up your hypothesis test depends on what you’re interested in concluding, should you have enough evidence to refute the null hypothesis (the claim). The alternative hypothesis should be decided upon before collecting or looking at any data, so as not to influence the results.

For example, if you want to test whether a company is correct in claiming its pie takes five minutes to make and it doesn’t matter whether the actual average time is more or less than that, you use the not-equal-to alternative. Your hypotheses for that test would be

H0: μ = 5 versus Ha: μ ≠ 5

If you only want to see whether the time turns out to be greater than what the company claims (that is, whether the company is falsely advertising its quick prep time), you use the greater-than alternative, and your two hypotheses are

H0: μ = 5 versus Ha: μ > 5

Finally, say you work for the company marketing the pie, and you think the pie can be made in less than five minutes (and could be marketed by the company as such). The less-than alternative is the one you want, and your two hypotheses would be

H0: μ = 5 versus Ha: μ < 5

How do you know which hypothesis to put in H 0 and which one to put in H a ? Typically, the null hypothesis says that nothing new is happening; the previous result is the same now as it was before, or the groups have the same average (their difference is equal to zero). In general, you assume that people’s claims are true until proven otherwise. So the question becomes: Can you prove otherwise? In other words, can you show sufficient evidence to reject H 0 ?
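Here is a small sketch of the pie example in code (made-up prep times; recent versions of SciPy accept an "alternative" argument that covers the three possible alternative hypotheses):

```python
import numpy as np
from scipy import stats

prep_times = np.array([5.2, 4.8, 5.6, 5.1, 4.9, 5.4, 5.3, 5.0, 5.5, 4.7])  # minutes (made up)
mu_claimed = 5.0  # the company's claim: H0 is mu = 5

for alt in ("two-sided", "greater", "less"):
    res = stats.ttest_1samp(prep_times, popmean=mu_claimed, alternative=alt)
    print(f"Ha = {alt:>9}: t = {res.statistic:.2f}, p = {res.pvalue:.3f}")
```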


Statistics - Hypothesis Testing

Hypothesis testing is a formal way of checking if a hypothesis about a population is true or not.

Hypothesis Testing

A hypothesis is a claim about a population parameter .

A hypothesis test is a formal procedure to check if a hypothesis is true or not.

Examples of claims that can be checked:

The average height of people in Denmark is more than 170 cm.

The share of left handed people in Australia is not 10%.

The average income of dentists is less than the average income of lawyers.

The Null and Alternative Hypothesis

Hypothesis testing is based on making two different claims about a population parameter.

The null hypothesis (\(H_{0} \)) and the alternative hypothesis (\(H_{1}\)) are the claims.

The two claims need to be mutually exclusive , meaning only one of them can be true.

The alternative hypothesis is typically what we are trying to prove.

For example, we want to check the following claim:

"The average height of people in Denmark is more than 170 cm."

In this case, the parameter is the average height of people in Denmark (\(\mu\)).

The null and alternative hypothesis would be:

Null hypothesis : The average height of people in Denmark is 170 cm.

Alternative hypothesis : The average height of people in Denmark is more than 170 cm.

The claims are often expressed with symbols like this:

\(H_{0}\): \(\mu = 170 \: cm \)

\(H_{1}\): \(\mu > 170 \: cm \)

If the data supports the alternative hypothesis, we reject the null hypothesis and accept the alternative hypothesis.

If the data does not support the alternative hypothesis, we keep the null hypothesis.

Note: The alternative hypothesis is also referred to as (\(H_{A} \)).

The Significance Level

The significance level (\(\alpha\)) is the uncertainty we accept when rejecting the null hypothesis in the hypothesis test.

The significance level is the probability of rejecting a true null hypothesis, that is, of accidentally reaching the wrong conclusion when the null hypothesis is in fact true.

Typical significance levels are:

  • \(\alpha = 0.1\) (10%)
  • \(\alpha = 0.05\) (5%)
  • \(\alpha = 0.01\) (1%)

A lower significance level means that the evidence in the data needs to be stronger to reject the null hypothesis.

There is no "correct" significance level - it only states the uncertainty of the conclusion.

Note: A 5% significance level means that when we reject a null hypothesis:

We expect to reject a true null hypothesis 5 out of 100 times.
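A quick simulation illustrates that note. In the sketch below (invented numbers, reusing the Danish-height example), the null hypothesis is true by construction, and tests at a 5% significance level reject it in roughly 5% of repeated samples:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
alpha = 0.05
n_trials = 10_000
rejections = 0

for _ in range(n_trials):
    sample = rng.normal(170, 10, size=30)               # H0 is true: the mean really is 170 cm
    if stats.ttest_1samp(sample, popmean=170).pvalue <= alpha:
        rejections += 1

print(f"Rejected a true H0 in {rejections / n_trials:.1%} of trials")  # close to 5%
```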


The Test Statistic

The test statistic is used to decide the outcome of the hypothesis test.

The test statistic is a standardized value calculated from the sample.

Standardization means converting a statistic to a well known probability distribution .

The type of probability distribution depends on the type of test.

Common examples are:

  • Standard Normal Distribution (Z): used for Testing Population Proportions
  • Student's T-Distribution (T): used for Testing Population Means

Note: You will learn how to calculate the test statistic for each type of test in the following chapters.

The Critical Value and P-Value Approach

There are two main approaches used for hypothesis tests:

  • The critical value approach compares the test statistic with the critical value of the significance level.
  • The p-value approach compares the p-value of the test statistic with the significance level.

The Critical Value Approach

The critical value approach checks if the test statistic is in the rejection region .

The rejection region is an area of probability in the tails of the distribution.

The size of the rejection region is decided by the significance level (\(\alpha\)).

The value that separates the rejection region from the rest is called the critical value .


If the test statistic is inside this rejection region, the null hypothesis is rejected .

For example, if the test statistic is 2.3 and the critical value is 2 for a significance level (\(\alpha = 0.05\)):

We reject the null hypothesis (\(H_{0} \)) at 0.05 significance level (\(\alpha\))

The P-Value Approach

The p-value approach checks if the p-value of the test statistic is smaller than the significance level (\(\alpha\)).

The p-value of the test statistic is the area of probability in the tails of the distribution from the value of the test statistic.

If the p-value is smaller than the significance level, the null hypothesis is rejected .

The p-value directly tells us the lowest significance level where we can reject the null hypothesis.

For example, if the p-value is 0.03:

We reject the null hypothesis (\(H_{0} \)) at a 0.05 significance level (\(\alpha\))

We keep the null hypothesis (\(H_{0}\)) at a 0.01 significance level (\(\alpha\))

Note: The two approaches are only different in how they present the conclusion.
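Here is a small sketch (a right-tailed z-test with an invented test statistic) showing the two approaches side by side and reaching the same decision:

```python
from scipy import stats

alpha = 0.05
z_stat = 2.3                               # illustrative test statistic

# Critical value approach: is the test statistic inside the rejection region?
z_crit = stats.norm.ppf(1 - alpha)         # about 1.645 for a right-tailed test
reject_by_critical_value = z_stat > z_crit

# P-value approach: is the tail probability of the test statistic below alpha?
p_value = 1 - stats.norm.cdf(z_stat)
reject_by_p_value = p_value <= alpha

print(f"critical value = {z_crit:.3f}, p-value = {p_value:.4f}")
print(reject_by_critical_value, reject_by_p_value)  # both True: the same conclusion
```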

Steps for a Hypothesis Test

The following steps are used for a hypothesis test:

  • Check the conditions
  • Define the claims
  • Decide the significance level
  • Calculate the test statistic
  • Conclude

One condition is that the sample is randomly selected from the population.

The other conditions depend on what type of parameter you are testing the hypothesis for.

Common parameters to test hypotheses are:

  • Proportions (for qualitative data)
  • Mean values (for numerical data)

You will learn the steps for both types in the following pages.



How to Write Hypothesis Test Conclusions (With Examples)

A   hypothesis test is used to test whether or not some hypothesis about a population parameter is true.

To perform a hypothesis test in the real world, researchers obtain a random sample from the population and perform a hypothesis test on the sample data, using a null and alternative hypothesis:

  • Null Hypothesis (H 0 ): The sample data occurs purely from chance.
  • Alternative Hypothesis (H A ): The sample data is influenced by some non-random cause.

If the p-value of the hypothesis test is less than some significance level (e.g. α = .05), then we reject the null hypothesis .

Otherwise, if the p-value is not less than some significance level then we fail to reject the null hypothesis .

When writing the conclusion of a hypothesis test, we typically include:

  • Whether we reject or fail to reject the null hypothesis.
  • The significance level.
  • A short explanation in the context of the hypothesis test.

For example, we would write:

We reject the null hypothesis at the 5% significance level.   There is sufficient evidence to support the claim that…

Or, we would write:

We fail to reject the null hypothesis at the 5% significance level.   There is not sufficient evidence to support the claim that…

The following examples show how to write a hypothesis test conclusion in both scenarios.

Example 1: Reject the Null Hypothesis Conclusion

Suppose a biologist believes that a certain fertilizer will cause plants to grow more during a one-month period than they normally do, which is currently 20 inches. To test this, she applies the fertilizer to each of the plants in her laboratory for one month.

She then performs a hypothesis test at a 5% significance level using the following hypotheses:

  • H 0 : μ = 20 inches (the fertilizer will have no effect on the mean plant growth)
  • H A : μ > 20 inches (the fertilizer will cause mean plant growth to increase)

Suppose the p-value of the test turns out to be 0.002.

Here is how she would report the results of the hypothesis test:

We reject the null hypothesis at the 5% significance level.   There is sufficient evidence to support the claim that this particular fertilizer causes plants to grow more during a one-month period than they normally do.

Example 2: Fail to Reject the Null Hypothesis Conclusion

Suppose the manager of a manufacturing plant wants to test whether or not some new method changes the number of defective widgets produced per month, which is currently 250. To test this, he measures the mean number of defective widgets produced before and after using the new method for one month.

He performs a hypothesis test at a 10% significance level using the following hypotheses:

  • H 0 : μ after = μ before (the mean number of defective widgets is the same before and after using the new method)
  • H A : μ after ≠ μ before (the mean number of defective widgets produced is different before and after using the new method)

Suppose the p-value of the test turns out to be 0.27.

Here is how he would report the results of the hypothesis test:

We fail to reject the null hypothesis at the 10% significance level.   There is not sufficient evidence to support the claim that the new method leads to a change in the number of defective widgets produced per month.
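If you report many tests, a tiny helper like this sketch (a hypothetical function, with the two examples above plugged in) can keep the wording consistent:

```python
def conclusion(p_value: float, alpha: float, claim: str) -> str:
    """Phrase a hypothesis test conclusion in the style shown above."""
    if p_value < alpha:
        return (f"We reject the null hypothesis at the {alpha:.0%} significance level. "
                f"There is sufficient evidence to support the claim that {claim}")
    return (f"We fail to reject the null hypothesis at the {alpha:.0%} significance level. "
            f"There is not sufficient evidence to support the claim that {claim}")

print(conclusion(0.002, 0.05, "the fertilizer increases mean plant growth."))
print(conclusion(0.27, 0.10, "the new method changes the number of defective widgets produced."))
```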

Additional Resources

The following tutorials provide additional information about hypothesis testing:

  • Introduction to Hypothesis Testing
  • 4 Examples of Hypothesis Testing in Real Life
  • How to Write a Null Hypothesis



9.1: Introduction to Hypothesis Testing

Kyle Siegrist, University of Alabama in Huntsville (via Random Services)


Basic Theory

Preliminaries

As usual, our starting point is a random experiment with an underlying sample space and a probability measure \(\mathbb{P}\). In the basic statistical model, we have an observable random variable \(\mathbf{X}\) taking values in a set \(S\). In general, \(\mathbf{X}\) can have quite a complicated structure. For example, if the experiment is to sample \(n\) objects from a population and record various measurements of interest, then \[ \mathbf{X} = (X_1, X_2, \ldots, X_n) \] where \(X_i\) is the vector of measurements for the \(i\)th object. The most important special case occurs when \((X_1, X_2, \ldots, X_n)\) are independent and identically distributed. In this case, we have a random sample of size \(n\) from the common distribution.

The purpose of this section is to define and discuss the basic concepts of statistical hypothesis testing . Collectively, these concepts are sometimes referred to as the Neyman-Pearson framework, in honor of Jerzy Neyman and Egon Pearson, who first formalized them.

A statistical hypothesis is a statement about the distribution of \(\bs{X}\). Equivalently, a statistical hypothesis specifies a set of possible distributions of \(\bs{X}\): the set of distributions for which the statement is true. A hypothesis that specifies a single distribution for \(\bs{X}\) is called simple ; a hypothesis that specifies more than one distribution for \(\bs{X}\) is called composite .

In hypothesis testing , the goal is to see if there is sufficient statistical evidence to reject a presumed null hypothesis in favor of a conjectured alternative hypothesis . The null hypothesis is usually denoted \(H_0\) while the alternative hypothesis is usually denoted \(H_1\).

An hypothesis test is a statistical decision ; the conclusion will either be to reject the null hypothesis in favor of the alternative, or to fail to reject the null hypothesis. The decision that we make must, of course, be based on the observed value \(\bs{x}\) of the data vector \(\bs{X}\). Thus, we will find an appropriate subset \(R\) of the sample space \(S\) and reject \(H_0\) if and only if \(\bs{x} \in R\). The set \(R\) is known as the rejection region or the critical region . Note the asymmetry between the null and alternative hypotheses. This asymmetry is due to the fact that we assume the null hypothesis, in a sense, and then see if there is sufficient evidence in \(\bs{x}\) to overturn this assumption in favor of the alternative.

An hypothesis test is a statistical analogy to proof by contradiction, in a sense. Suppose for a moment that \(H_1\) is a statement in a mathematical theory and that \(H_0\) is its negation. One way that we can prove \(H_1\) is to assume \(H_0\) and work our way logically to a contradiction. In an hypothesis test, we don't prove anything of course, but there are similarities. We assume \(H_0\) and then see if the data \(\bs{x}\) are sufficiently at odds with that assumption that we feel justified in rejecting \(H_0\) in favor of \(H_1\).

Often, the critical region is defined in terms of a statistic \(w(\bs{X})\), known as a test statistic , where \(w\) is a function from \(S\) into another set \(T\). We find an appropriate rejection region \(R_T \subseteq T\) and reject \(H_0\) when the observed value \(w(\bs{x}) \in R_T\). Thus, the rejection region in \(S\) is then \(R = w^{-1}(R_T) = \left\{\bs{x} \in S: w(\bs{x}) \in R_T\right\}\). As usual, the use of a statistic often allows significant data reduction when the dimension of the test statistic is much smaller than the dimension of the data vector.

The ultimate decision may be correct or may be in error. There are two types of errors, depending on which of the hypotheses is actually true.

Types of errors:

  • A type 1 error is rejecting the null hypothesis \(H_0\) when \(H_0\) is true.
  • A type 2 error is failing to reject the null hypothesis \(H_0\) when the alternative hypothesis \(H_1\) is true.

Similarly, there are two ways to make a correct decision: we could reject \(H_0\) when \(H_1\) is true or we could fail to reject \(H_0\) when \(H_0\) is true. The possibilities are summarized in the following table:

Hypothesis Test
State | Fail to reject \(H_0\) | Reject \(H_0\)
\(H_0\) True | Correct | Type 1 error
\(H_1\) True | Type 2 error | Correct

Of course, when we observe \(\bs{X} = \bs{x}\) and make our decision, either we will have made the correct decision or we will have committed an error, and usually we will never know which of these events has occurred. Prior to gathering the data, however, we can consider the probabilities of the various errors.

If \(H_0\) is true (that is, the distribution of \(\bs{X}\) is specified by \(H_0\)), then \(\P(\bs{X} \in R)\) is the probability of a type 1 error for this distribution. If \(H_0\) is composite, then \(H_0\) specifies a variety of different distributions for \(\bs{X}\) and thus there is a set of type 1 error probabilities.

The maximum probability of a type 1 error, over the set of distributions specified by \( H_0 \), is the significance level of the test or the size of the critical region.

The significance level is often denoted by \(\alpha\). Usually, the rejection region is constructed so that the significance level is a prescribed, small value (typically 0.1, 0.05, 0.01).

If \(H_1\) is true (that is, the distribution of \(\bs{X}\) is specified by \(H_1\)), then \(\P(\bs{X} \notin R)\) is the probability of a type 2 error for this distribution. Again, if \(H_1\) is composite then \(H_1\) specifies a variety of different distributions for \(\bs{X}\), and thus there will be a set of type 2 error probabilities. Generally, there is a tradeoff between the type 1 and type 2 error probabilities. If we reduce the probability of a type 1 error, by making the rejection region \(R\) smaller, we necessarily increase the probability of a type 2 error because the complementary region \(S \setminus R\) is larger.

The extreme cases can give us some insight. First consider the decision rule in which we never reject \(H_0\), regardless of the evidence \(\bs{x}\). This corresponds to the rejection region \(R = \emptyset\). A type 1 error is impossible, so the significance level is 0. On the other hand, the probability of a type 2 error is 1 for any distribution defined by \(H_1\). At the other extreme, consider the decision rule in which we always reject \(H_0\), regardless of the evidence \(\bs{x}\). This corresponds to the rejection region \(R = S\). A type 2 error is impossible, but now the probability of a type 1 error is 1 for any distribution defined by \(H_0\). In between these two worthless tests are meaningful tests that take the evidence \(\bs{x}\) into account.

If \(H_1\) is true, so that the distribution of \(\bs{X}\) is specified by \(H_1\), then \(\P(\bs{X} \in R)\), the probability of rejecting \(H_0\), is the power of the test for that distribution.

Thus the power of the test for a distribution specified by \( H_1 \) is the probability of making the correct decision.

Suppose that we have two tests, corresponding to rejection regions \(R_1\) and \(R_2\), respectively, each having significance level \(\alpha\). The test with region \(R_1\) is uniformly more powerful than the test with region \(R_2\) if \[ \P(\bs{X} \in R_1) \ge \P(\bs{X} \in R_2) \text{ for every distribution of } \bs{X} \text{ specified by } H_1 \]

Naturally, in this case, we would prefer the first test. Often, however, two tests will not be uniformly ordered; one test will be more powerful for some distributions specified by \(H_1\) while the other test will be more powerful for other distributions specified by \(H_1\).

If a test has significance level \(\alpha\) and is uniformly more powerful than any other test with significance level \(\alpha\), then the test is said to be a uniformly most powerful test at level \(\alpha\).

Clearly a uniformly most powerful test is the best we can do.

\(P\)-value

In most cases, we have a general procedure that allows us to construct a test (that is, a rejection region \(R_\alpha\)) for any given significance level \(\alpha \in (0, 1)\). Typically, \(R_\alpha\) decreases (in the subset sense) as \(\alpha\) decreases.

The \(P\)-value of the observed value \(\bs{x}\) of \(\bs{X}\), denoted \(P(\bs{x})\), is defined to be the smallest \(\alpha\) for which \(\bs{x} \in R_\alpha\); that is, the smallest significance level for which \(H_0\) is rejected, given \(\bs{X} = \bs{x}\).

Knowing \(P(\bs{x})\) allows us to test \(H_0\) at any significance level for the given data \(\bs{x}\): If \(P(\bs{x}) \le \alpha\) then we would reject \(H_0\) at significance level \(\alpha\); if \(P(\bs{x}) \gt \alpha\) then we fail to reject \(H_0\) at significance level \(\alpha\). Note that \(P(\bs{X})\) is a statistic . Informally, \(P(\bs{x})\) can often be thought of as the probability of an outcome as or more extreme than the observed value \(\bs{x}\), where extreme is interpreted relative to the null hypothesis \(H_0\).
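To make these definitions concrete, here is a minimal Python sketch of a right-tailed one-sample z test. The sample, the hypothesized mean, and the significance level are all illustrative assumptions rather than values from the text, but the two decision rules (rejection region and \(P\)-value) agree by construction.

```python
import numpy as np
from scipy import stats

# Illustrative right-tailed z test: H0: mu = 100 versus H1: mu > 100,
# for a normal sample with known standard deviation sigma.
mu0, sigma, n, alpha = 100, 15, 30, 0.05

rng = np.random.default_rng(0)
x = rng.normal(105, sigma, size=n)              # hypothetical observed data

z = (x.mean() - mu0) / (sigma / np.sqrt(n))     # test statistic w(x)
z_crit = stats.norm.ppf(1 - alpha)              # rejection region: {w(x) > z_crit}
p_value = stats.norm.sf(z)                      # smallest alpha at which H0 is rejected

print(f"z = {z:.3f}, critical value = {z_crit:.3f}, P-value = {p_value:.4f}")
print("reject H0 via rejection region:", z > z_crit)
print("reject H0 via P-value <= alpha:", p_value <= alpha)   # same decision
```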

Analogy with Justice Systems

There is a helpful analogy between statistical hypothesis testing and the criminal justice system in the US and various other countries. Consider a person charged with a crime. The presumed null hypothesis is that the person is innocent of the crime; the conjectured alternative hypothesis is that the person is guilty of the crime. The test of the hypotheses is a trial, with the evidence presented by both sides playing the role of the data. After considering the evidence, the jury delivers a decision of either not guilty or guilty. Note that innocent is not a possible verdict of the jury, because it is not the point of the trial to prove the person innocent. Rather, the point of the trial is to see whether there is sufficient evidence to overturn the null hypothesis that the person is innocent in favor of the alternative hypothesis that the person is guilty. A type 1 error is convicting a person who is innocent; a type 2 error is acquitting a person who is guilty. Generally, a type 1 error is considered the more serious of the two possible errors, so in an attempt to hold the chance of a type 1 error to a very low level, the standard for conviction in serious criminal cases is beyond a reasonable doubt.

Tests of an Unknown Parameter

Hypothesis testing is a very general concept, but an important special class occurs when the distribution of the data variable \(\bs{X}\) depends on a parameter \(\theta\) taking values in a parameter space \(\Theta\). The parameter may be vector-valued, so that \(\bs{\theta} = (\theta_1, \theta_2, \ldots, \theta_k)\) and \(\Theta \subseteq \R^k\) for some \(k \in \N_+\). The hypotheses generally take the form \[ H_0: \theta \in \Theta_0 \text{ versus } H_1: \theta \notin \Theta_0 \] where \(\Theta_0\) is a prescribed subset of the parameter space \(\Theta\). In this setting, the probabilities of making an error or a correct decision depend on the true value of \(\theta\). If \(R\) is the rejection region, then the power function \( Q \) is given by \[ Q(\theta) = \P_\theta(\bs{X} \in R), \quad \theta \in \Theta \] The power function gives a lot of information about the test.

The power function satisfies the following properties:

  • \(Q(\theta)\) is the probability of a type 1 error when \(\theta \in \Theta_0\).
  • \(\max\left\{Q(\theta): \theta \in \Theta_0\right\}\) is the significance level of the test.
  • \(1 - Q(\theta)\) is the probability of a type 2 error when \(\theta \notin \Theta_0\).
  • \(Q(\theta)\) is the power of the test when \(\theta \notin \Theta_0\).
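As a numerical illustration (not taken from the text above), the following sketch evaluates the power function \( Q(\theta) \) of a right-tailed z test for the mean of a normal sample with known standard deviation; the values of \(\theta_0\), \(\sigma\), \(n\), and \(\alpha\) are assumptions chosen for the example.

```python
import numpy as np
from scipy import stats

# Right-tailed z test of H0: theta <= theta0 versus H1: theta > theta0,
# based on a random sample X1, ..., Xn from Normal(theta, sigma^2) with sigma known.
theta0, sigma, n, alpha = 0.0, 1.0, 25, 0.05
z_crit = stats.norm.ppf(1 - alpha)   # reject H0 when (xbar - theta0)/(sigma/sqrt(n)) > z_crit

def Q(theta):
    """Power function: probability of rejecting H0 when the true mean is theta."""
    shift = (theta - theta0) / (sigma / np.sqrt(n))
    return stats.norm.sf(z_crit - shift)

for theta in [0.0, 0.1, 0.2, 0.3, 0.5]:
    print(f"theta = {theta:.1f}: Q(theta) = {Q(theta):.3f}")
# Q(theta0) equals the significance level alpha; for theta > theta0 it is the power.
```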

If we have two tests, we can compare them by means of their power functions.

Suppose that we have two tests, corresponding to rejection regions \(R_1\) and \(R_2\), respectively, each having significance level \(\alpha\). The test with rejection region \(R_1\) is uniformly more powerful than the test with rejection region \(R_2\) if \( Q_1(\theta) \ge Q_2(\theta)\) for all \( \theta \notin \Theta_0 \).

Most hypothesis tests of an unknown real parameter \(\theta\) fall into three special cases:

Suppose that \( \theta \) is a real parameter and \( \theta_0 \in \Theta \) a specified value. The tests below are respectively the two-sided test , the left-tailed test , and the right-tailed test .

  • \(H_0: \theta = \theta_0\) versus \(H_1: \theta \ne \theta_0\)
  • \(H_0: \theta \ge \theta_0\) versus \(H_1: \theta \lt \theta_0\)
  • \(H_0: \theta \le \theta_0\) versus \(H_1: \theta \gt \theta_0\)

Thus the tests are named after the conjectured alternative. Of course, there may be other unknown parameters besides \(\theta\) (known as nuisance parameters ).

Equivalence Between Hypothesis Test and Confidence Sets

There is an equivalence between hypothesis tests and confidence sets for a parameter \(\theta\).

Suppose that \(C(\bs{x})\) is a \(1 - \alpha\) level confidence set for \(\theta\). The following test has significance level \(\alpha\) for the hypothesis \( H_0: \theta = \theta_0 \) versus \( H_1: \theta \ne \theta_0 \): Reject \(H_0\) if and only if \(\theta_0 \notin C(\bs{x})\)

By definition, \(\P[\theta \in C(\bs{X})] = 1 - \alpha\). Hence if \(H_0\) is true so that \(\theta = \theta_0\), then the probability of a type 1 error is \(\P[\theta \notin C(\bs{X})] = \alpha\).

Equivalently, we fail to reject \(H_0\) at significance level \(\alpha\) if and only if \(\theta_0\) is in the corresponding \(1 - \alpha\) level confidence set. In particular, this equivalence applies to interval estimates of a real parameter \(\theta\) and the common tests for \(\theta\) given above .

In each case below, the confidence interval has confidence level \(1 - \alpha\) and the test has significance level \(\alpha\).

  • Suppose that \(\left[L(\bs{X}), U(\bs{X})\right]\) is a two-sided confidence interval for \(\theta\). Reject \(H_0: \theta = \theta_0\) versus \(H_1: \theta \ne \theta_0\) if and only if \(\theta_0 \lt L(\bs{X})\) or \(\theta_0 \gt U(\bs{X})\).
  • Suppose that \(L(\bs{X})\) is a confidence lower bound for \(\theta\). Reject \(H_0: \theta \le \theta_0\) versus \(H_1: \theta \gt \theta_0\) if and only if \(\theta_0 \lt L(\bs{X})\).
  • Suppose that \(U(\bs{X})\) is a confidence upper bound for \(\theta\). Reject \(H_0: \theta \ge \theta_0\) versus \(H_1: \theta \lt \theta_0\) if and only if \(\theta_0 \gt U(\bs{X})\).

Pivot Variables and Test Statistics

Recall that confidence sets of an unknown parameter \(\theta\) are often constructed through a pivot variable , that is, a random variable \(W(\bs{X}, \theta)\) that depends on the data vector \(\bs{X}\) and the parameter \(\theta\), but whose distribution does not depend on \(\theta\) and is known. In this case, a natural test statistic for the basic tests given above is \(W(\bs{X}, \theta_0)\).
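A small sketch may help tie the pivot-variable idea to the test/confidence-set equivalence above. It uses the standard normal pivot \( (\bar{X} - \theta)/(\sigma/\sqrt{n}) \) for the mean of a normal sample with known \(\sigma\); the simulated data and parameter values are assumptions for illustration only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sigma, n, alpha = 2.0, 40, 0.05
x = rng.normal(10.3, sigma, size=n)           # hypothetical sample
xbar, se = x.mean(), sigma / np.sqrt(n)
z_half = stats.norm.ppf(1 - alpha / 2)

# 1 - alpha confidence interval built from the pivot (xbar - theta)/se ~ N(0, 1)
ci = (xbar - z_half * se, xbar + z_half * se)

def reject(theta0):
    """Two-sided test of H0: theta = theta0 using the pivot evaluated at theta0."""
    w = (xbar - theta0) / se                  # test statistic W(X, theta0)
    return abs(w) > z_half

for theta0 in [9.0, 10.0, 11.0]:
    inside = ci[0] <= theta0 <= ci[1]
    print(f"theta0 = {theta0}: reject H0 = {reject(theta0)}, theta0 in CI = {inside}")
# H0 is rejected exactly when theta0 falls outside the confidence interval.
```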


By Avijeet Biswal, Simplilearn

What Is Hypothesis Testing in Statistics? Types and Examples


In today's data-driven world, decisions are based on data all the time. Hypotheses play a crucial role in that process, whether you are making business decisions, working in the health sector, in academia, or in quality improvement. Without hypotheses and hypothesis tests, you risk drawing the wrong conclusions and making bad decisions. In this tutorial, you will look at hypothesis testing in statistics.


What Is Hypothesis Testing in Statistics?

Hypothesis testing is a type of statistical analysis in which you put your assumptions about a population parameter to the test. It is used to estimate the relationship between two statistical variables.

Let's discuss a few examples of statistical hypotheses from real life:

  • A teacher assumes that 60% of his college's students come from lower-middle-class families.
  • A doctor believes that 3D (Diet, Dose, and Discipline) is 90% effective for diabetic patients.

Now that you know what hypothesis testing is, let's look at the formula behind it and at the main types of hypothesis tests used in statistics.

Hypothesis Testing Formula

Z = ( x̅ – μ0 ) / (σ /√n)

  • Here, x̅ is the sample mean,
  • μ0 is the population mean,
  • σ is the population standard deviation,
  • n is the sample size.

How Does Hypothesis Testing Work?

An analyst performs hypothesis testing on a statistical sample to present evidence of the plausibility of the null hypothesis. Measurements and analyses are conducted on a random sample of the population to test a theory. Analysts use a random population sample to test two hypotheses: the null and alternative hypotheses.

The null hypothesis is typically an equality statement about population parameters; for example, a null hypothesis may claim that the population mean return equals zero. The alternative hypothesis is essentially the negation of the null hypothesis (e.g., the population mean return is not equal to zero). As a result, they are mutually exclusive, and only one can be correct; one of the two, however, will always be true.


Null Hypothesis and Alternate Hypothesis

The null hypothesis is the assumption that there is no effect, or that the event under study will not occur. A null hypothesis has no bearing on the study's outcome unless it is rejected.

H0 is the symbol for it, and it is pronounced H-naught.

The Alternate Hypothesis is the logical opposite of the null hypothesis. The acceptance of the alternative hypothesis follows the rejection of the null hypothesis. H1 is the symbol for it.

Let's understand this with an example.

A sanitizer manufacturer claims that its product kills 95 percent of germs on average. 

To put this company's claim to the test, create a null and alternate hypothesis.

H0 (Null Hypothesis): Average = 95%.

Alternative Hypothesis (H1): The average is less than 95%.

Another straightforward example for understanding this concept is determining whether or not a coin is fair and balanced. The null hypothesis states that the probability of heads is equal to the probability of tails. In contrast, the alternative hypothesis states that the two probabilities differ.
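For the coin example, an exact binomial test makes this concrete. The flip counts below are hypothetical, and scipy's binomtest is used simply as one convenient implementation.

```python
from scipy import stats

# H0: P(heads) = 0.5 versus H1: P(heads) != 0.5, using hypothetical flip counts.
heads, flips = 61, 100
result = stats.binomtest(heads, flips, p=0.5, alternative="two-sided")

print(f"p-value = {result.pvalue:.4f}")
# Reject H0 at the 0.05 level (conclude the coin is unfair) only if the p-value <= 0.05.
```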


Hypothesis Testing Calculation With Examples

Let's consider a hypothesis test for the average height of women in the United States. Suppose our null hypothesis is that the average height is 5'4" (64 inches). We gather a sample of 100 women and find that their average height is 5'5" (65 inches). The population standard deviation is 2 inches.

To calculate the z-score, we would use the following formula:

z = ( x̅ – μ0 ) / (σ /√n)

z = (65 – 64) / (2 / √100)

z = 1 / 0.2 = 5

We will reject the null hypothesis, as the z-score of 5 is very large, and conclude that there is evidence to suggest that the average height of women in the US is greater than 5'4".
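Here is a quick sketch that reproduces the calculation above in Python and adds the corresponding one-sided p-value; the numbers are the ones from the example (means in inches).

```python
import math
from scipy import stats

x_bar, mu0, sigma, n = 65.0, 64.0, 2.0, 100        # heights in inches

z = (x_bar - mu0) / (sigma / math.sqrt(n))         # = 1 / 0.2 = 5.0
p_one_sided = stats.norm.sf(z)                     # P(Z >= 5) under H0

print(f"z = {z:.2f}, one-sided p-value = {p_one_sided:.2e}")   # z = 5.00, p ≈ 2.9e-07
```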

Steps of Hypothesis Testing

Hypothesis testing is a statistical method to determine if there is enough evidence in a sample of data to infer that a certain condition is true for the entire population. Here’s a breakdown of the typical steps involved in hypothesis testing:

Formulate Hypotheses

  • Null Hypothesis (H0): This hypothesis states that there is no effect or difference, and it is the hypothesis you attempt to reject with your test.
  • Alternative Hypothesis (H1 or Ha): This hypothesis is what you might believe to be true or hope to prove true. It is usually considered the opposite of the null hypothesis.

Choose the Significance Level (α)

The significance level, often denoted by alpha (α), is the probability of rejecting the null hypothesis when it is true. Common choices for α are 0.05 (5%), 0.01 (1%), and 0.10 (10%).

Select the Appropriate Test

Choose a statistical test based on the type of data and the hypothesis. Common tests include t-tests, chi-square tests, ANOVA, and regression analysis . The selection depends on data type, distribution, sample size, and whether the hypothesis is one-tailed or two-tailed.

Collect Data

Gather the data that will be analyzed in the test. This data should be representative of the population to infer conclusions accurately.

Calculate the Test Statistic

Based on the collected data and the chosen test, calculate a test statistic that reflects how much the observed data deviates from the null hypothesis.

Determine the p-value

The p-value is the probability of observing test results at least as extreme as the results observed, assuming the null hypothesis is correct. It helps determine the strength of the evidence against the null hypothesis.

Make a Decision

Compare the p-value to the chosen significance level:

  • If the p-value ≤ α: Reject the null hypothesis, suggesting sufficient evidence in the data supports the alternative hypothesis.
  • If the p-value > α: Do not reject the null hypothesis, suggesting insufficient evidence to support the alternative hypothesis.

Report the Results

Present the findings from the hypothesis test, including the test statistic, p-value, and the conclusion about the hypotheses.

Perform Post-hoc Analysis (if necessary)

Depending on the results and the study design, further analysis may be needed to explore the data more deeply or to address multiple comparisons if several hypotheses were tested simultaneously.
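To see the steps end to end, the sketch below walks a hypothetical two-group comparison through them using an independent t-test; the group measurements and the significance level are made-up values for illustration.

```python
from scipy import stats

# Steps 1-3: H0: the two group means are equal; H1: they differ.
# Significance level and test (independent t-test) chosen up front.
alpha = 0.05

# Step 4: collect data (hypothetical measurements for a control and a treatment group).
control   = [72, 74, 68, 71, 73, 70, 69, 75, 72, 71]
treatment = [75, 78, 74, 77, 73, 79, 76, 74, 78, 77]

# Steps 5-6: calculate the test statistic and p-value.
t_stat, p_value = stats.ttest_ind(treatment, control)

# Step 7: make a decision by comparing the p-value with alpha.
decision = "reject H0" if p_value <= alpha else "fail to reject H0"

# Step 8: report the results.
print(f"t = {t_stat:.2f}, p = {p_value:.4f} -> {decision}")
```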

Types of Hypothesis Testing

Z Test

A z-test is used to determine whether a discovery or relationship is statistically significant. It usually checks whether two means are equal (the null hypothesis). A z-test can be applied only when the population standard deviation is known and the sample size is 30 data points or more.

T Test

A t-test is a statistical test used to compare the means of two groups. It is frequently used in hypothesis testing to determine whether two groups differ, or whether a procedure or treatment affects the population of interest.

Chi-Square 

You use a chi-square test when you want to know whether your data match what was expected. The chi-square test analyzes the differences between categorical variables from a random sample to determine whether the expected and observed results fit well. The test's fundamental premise is that the observed values in your data are compared to the values that would be expected if the null hypothesis were true.
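As an illustration, the following sketch runs a chi-square goodness-of-fit test on hypothetical die-roll counts, comparing the observed counts with those expected under the null hypothesis of a fair die.

```python
from scipy import stats

# H0: the die is fair, so each face is expected in 1/6 of the rolls.
observed = [18, 22, 16, 14, 12, 18]            # hypothetical counts from 100 rolls
expected = [sum(observed) / 6] * 6

chi2, p_value = stats.chisquare(observed, f_exp=expected)
print(f"chi-square = {chi2:.2f}, p-value = {p_value:.4f}")
# A small p-value would indicate that the observed counts do not fit a fair die.
```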

Hypothesis Testing and Confidence Intervals

Both confidence intervals and hypothesis tests are inferential techniques that depend on approximating the sampling distribution. Data from a sample are used to estimate a population parameter with a confidence interval, and data from a sample are used in hypothesis testing to examine a given hypothesis. To conduct a hypothesis test, we must have a hypothesized value of the parameter.

Bootstrap distributions and randomization distributions are created using comparable simulation techniques. The observed sample statistic is the focal point of a bootstrap distribution, whereas the null hypothesis value is the focal point of a randomization distribution.

A confidence interval contains a range of plausible estimates of the population parameter. In this lesson, we created just two-tailed confidence intervals, and there is a direct connection between these and two-tailed hypothesis tests: the two typically give the same answer. In other words, a hypothesis test at the 0.05 level will virtually always fail to reject the null hypothesis if the 95% confidence interval contains the hypothesized value, and it will nearly certainly reject the null hypothesis if the 95% confidence interval does not contain the hypothesized parameter.


Simple and Composite Hypothesis Testing

Depending on the population distribution, you can classify the statistical hypothesis into two types.

Simple Hypothesis: A simple hypothesis specifies an exact value for the parameter.

Composite Hypothesis: A composite hypothesis specifies a range of values.

A company is claiming that their average sales for this quarter are 1000 units. This is an example of a simple hypothesis.

Suppose the company claims that the sales are in the range of 900 to 1000 units. Then this is a case of a composite hypothesis.

One-Tailed and Two-Tailed Hypothesis Testing

The one-tailed test, also called a directional test, uses a one-sided critical region: if the test statistic falls into that region, the null hypothesis is rejected and the alternative hypothesis is accepted.

In a one-tailed test, the critical distribution area is one-sided, meaning the test sample is either greater or lesser than a specific value.

In a two-tailed test, the critical region is two-sided: the test checks whether the sample statistic is significantly greater than or significantly less than the hypothesized value.

If the test statistic falls into either tail of the critical region, the null hypothesis is rejected and the alternative hypothesis is accepted.


Right Tailed Hypothesis Testing

If the greater-than sign (>) appears in your alternative hypothesis, you are using a right-tailed test, also known as an upper-tailed test; in other words, the suspected difference lies to the right. For instance, you can compare battery life before and after a change in production. If you want to know whether the battery life is longer than the original (let's say 90 hours), your hypothesis statements can be the following:

  • The null hypothesis: H0: μ ≤ 90 (battery life has not increased).
  • The alternative hypothesis: H1: μ > 90 (battery life has increased).

The crucial point in this situation is that the alternate hypothesis (H1), not the null hypothesis, decides whether you get a right-tailed test.

Left Tailed Hypothesis Testing

Alternative hypotheses that assert the true value of a parameter is lower than the value stated in the null hypothesis are tested with a left-tailed test; they are indicated by the less-than sign (<).

Suppose H0: μ = 50 and H1: μ ≠ 50.

According to H1, the mean can be either greater than or less than 50, so this is an example of a two-tailed test.

In a similar manner, if H0: μ ≥ 50, then H1: μ < 50.

Here the alternative states that the mean is less than 50, so this is a one-tailed (left-tailed) test.

Type 1 and Type 2 Error

A hypothesis test can result in two types of errors.

Type 1 Error: A Type I error occurs when the sample results lead you to reject the null hypothesis even though it is actually true.

Type 2 Error: A Type II error occurs when you fail to reject the null hypothesis even though it is actually false.

Suppose a teacher evaluates the examination paper to decide whether a student passes or fails.

H0: Student has passed

H1: Student has failed

Type I error will be the teacher failing the student [rejecting H0] although the student scored the passing marks [H0 was true].

Type II error will be the case where the teacher passes the student [failing to reject H0] although the student did not score the passing marks [H1 is true].

Level of Significance

The alpha value is a criterion for determining whether a test statistic is statistically significant. In a statistical test, Alpha represents an acceptable probability of a Type I error. Because alpha is a probability, it can be anywhere between 0 and 1. In practice, the most commonly used alpha values are 0.01, 0.05, and 0.1, which represent a 1%, 5%, and 10% chance of a Type I error, respectively (i.e. rejecting the null hypothesis when it is in fact correct).

A p-value is a metric that expresses how likely it is that an observed difference (or a larger one) could have occurred by chance alone, assuming the null hypothesis is true. The smaller the p-value, the stronger the evidence against the null hypothesis. If the p-value falls below the chosen significance level, you reject the null hypothesis.

For example, suppose you are testing whether a new advertising campaign has increased the product's sales. The p-value is the probability of observing a change in sales at least as large as the one in your data if the campaign actually had no effect. A p-value of 0.30 means such a change would occur about 30% of the time by chance alone, which is weak evidence against the null hypothesis; a p-value of 0.03 means it would occur only about 3% of the time by chance, which is much stronger evidence. The lower the p-value, the more confident you can be that the campaign genuinely changed sales.


Why Is Hypothesis Testing Important in Research Methodology?

Hypothesis testing is crucial in research methodology for several reasons:

  • Provides evidence-based conclusions: It allows researchers to make objective conclusions based on empirical data, providing evidence to support or refute their research hypotheses.
  • Supports decision-making: It helps make informed decisions, such as accepting or rejecting a new treatment, implementing policy changes, or adopting new practices.
  • Adds rigor and validity: It adds scientific rigor to research using statistical methods to analyze data, ensuring that conclusions are based on sound statistical evidence.
  • Contributes to the advancement of knowledge: By testing hypotheses, researchers contribute to the growth of knowledge in their respective fields by confirming existing theories or discovering new patterns and relationships.

When Did Hypothesis Testing Begin?

Hypothesis testing as a formalized process began in the early 20th century, primarily through the work of statisticians such as Ronald A. Fisher, Jerzy Neyman, and Egon Pearson. The development of hypothesis testing is closely tied to the evolution of statistical methods during this period.

  • Ronald A. Fisher (1920s): Fisher was one of the key figures in developing the foundation for modern statistical science. In the 1920s, he introduced the concept of the null hypothesis in his book "Statistical Methods for Research Workers" (1925). Fisher also developed significance testing to examine the likelihood of observing the collected data if the null hypothesis were true. He introduced p-values to determine the significance of the observed results.
  • Neyman-Pearson Framework (1930s): Jerzy Neyman and Egon Pearson built on Fisher’s work and formalized the process of hypothesis testing even further. In the 1930s, they introduced the concepts of Type I and Type II errors and developed a decision-making framework widely used in hypothesis testing today. Their approach emphasized the balance between these errors and introduced the concepts of the power of a test and the alternative hypothesis.

The dialogue between Fisher's and Neyman-Pearson's approaches shaped the methods and philosophy of statistical hypothesis testing used today. Fisher emphasized the evidential interpretation of the p-value. At the same time, Neyman and Pearson advocated for a decision-theoretical approach in which hypotheses are either accepted or rejected based on pre-determined significance levels and power considerations.

The application and methodology of hypothesis testing have since become a cornerstone of statistical analysis across various scientific disciplines, marking a significant statistical development.

Limitations of Hypothesis Testing

Hypothesis testing has some limitations that researchers should be aware of:

  • It cannot prove or establish the truth: Hypothesis testing provides evidence to support or reject a hypothesis, but it cannot confirm the absolute truth of the research question.
  • Results are sample-specific: Hypothesis testing is based on analyzing a sample from a population, and the conclusions drawn are specific to that particular sample.
  • Possible errors: During hypothesis testing, there is a chance of committing type I error (rejecting a true null hypothesis) or type II error (failing to reject a false null hypothesis).
  • Assumptions and requirements: Different tests have specific assumptions and requirements that must be met to accurately interpret results.


After reading this tutorial, you should have a much better understanding of hypothesis testing, one of the most important concepts in the field of data science. The majority of hypotheses are based on speculation about observed behavior, natural phenomena, or established theories.


1. What is hypothesis testing in statistics with example?

Hypothesis testing is a statistical method used to determine if there is enough evidence in sample data to draw conclusions about a population. It involves formulating two competing hypotheses, the null hypothesis (H0) and the alternative hypothesis (Ha), and then collecting data to assess the evidence. An example: testing if a new drug improves patient recovery (Ha) compared to the standard treatment (H0) based on collected patient data.

2. What is H0 and H1 in statistics?

In statistics, H0 and H1 represent the null and alternative hypotheses. The null hypothesis, H0, is the default assumption that no effect or difference exists between groups or conditions. The alternative hypothesis, H1, is the competing claim suggesting an effect or a difference. Statistical tests determine whether to reject the null hypothesis in favor of the alternative hypothesis based on the data.

3. What is a simple hypothesis with an example?

A simple hypothesis is a specific statement predicting a single relationship between two variables. It posits a direct and uncomplicated outcome. For example, a simple hypothesis might state, "Increased sunlight exposure increases the growth rate of sunflowers." Here, the hypothesis suggests a direct relationship between the amount of sunlight (independent variable) and the growth rate of sunflowers (dependent variable), with no additional variables considered.

4. What are the 2 types of hypothesis testing?

  • One-tailed (or one-sided) test: Tests for the significance of an effect in only one direction, either positive or negative.
  • Two-tailed (or two-sided) test: Tests for the significance of an effect in both directions, allowing for the possibility of a positive or negative effect.

The choice between one-tailed and two-tailed tests depends on the specific research question and the directionality of the expected effect.

5. What are the 3 major types of hypothesis?

The three major types of hypotheses are:

  • Null Hypothesis (H0): Represents the default assumption, stating that there is no significant effect or relationship in the data.
  • Alternative Hypothesis (Ha): Contradicts the null hypothesis and proposes a specific effect or relationship that researchers want to investigate.
  • Nondirectional Hypothesis: An alternative hypothesis that doesn't specify the direction of the effect, leaving it open for both positive and negative possibilities.




Choosing the Right Statistical Test | Types & Examples

Published on January 28, 2020 by Rebecca Bevans . Revised on June 22, 2023.

Statistical tests are used in hypothesis testing . They can be used to:

  • determine whether a predictor variable has a statistically significant relationship with an outcome variable.
  • estimate the difference between two or more groups.

Statistical tests assume a null hypothesis of no relationship or no difference between groups. Then they determine whether the observed data fall outside of the range of values predicted by the null hypothesis.

If you already know what types of variables you’re dealing with, you can use the flowchart to choose the right statistical test for your data.

Statistical tests flowchart


Statistical tests work by calculating a test statistic – a number that describes how much the relationship between variables in your test differs from the null hypothesis of no relationship.

It then calculates a p value (probability value). The p -value estimates how likely it is that you would see the difference described by the test statistic if the null hypothesis of no relationship were true.

If the value of the test statistic is more extreme than the statistic calculated from the null hypothesis, then you can infer a statistically significant relationship between the predictor and outcome variables.

If the value of the test statistic is less extreme than the one calculated from the null hypothesis, then you can infer no statistically significant relationship between the predictor and outcome variables.


You can perform statistical tests on data that have been collected in a statistically valid manner – either through an experiment , or through observations made using probability sampling methods .

For a statistical test to be valid , your sample size needs to be large enough to approximate the true distribution of the population being studied.

To determine which statistical test to use, you need to know:

  • whether your data meets certain assumptions.
  • the types of variables that you’re dealing with.

Statistical assumptions

Statistical tests make some common assumptions about the data they are testing:

  • Independence of observations (a.k.a. no autocorrelation): The observations/variables you include in your test are not related (for example, multiple measurements of a single test subject are not independent, while measurements of multiple different test subjects are independent).
  • Homogeneity of variance : the variance within each group being compared is similar among all groups. If one group has much more variation than others, it will limit the test’s effectiveness.
  • Normality of data : the data follows a normal distribution (a.k.a. a bell curve). This assumption applies only to quantitative data .

If your data do not meet the assumptions of normality or homogeneity of variance, you may be able to perform a nonparametric statistical test , which allows you to make comparisons without any assumptions about the data distribution.

If your data do not meet the assumption of independence of observations, you may be able to use a test that accounts for structure in your data (repeated-measures tests or tests that include blocking variables).
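If you want to check the normality and homogeneity-of-variance assumptions before choosing a test, standard diagnostics are available in scipy; the sketch below uses simulated groups purely for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
group_a = rng.normal(10, 2, size=30)   # simulated measurements for two groups
group_b = rng.normal(11, 2, size=30)

# Normality of data: Shapiro-Wilk test on each group.
print("Shapiro-Wilk p-values:",
      round(stats.shapiro(group_a).pvalue, 3),
      round(stats.shapiro(group_b).pvalue, 3))

# Homogeneity of variance: Levene's test across the groups.
print("Levene p-value:", round(stats.levene(group_a, group_b).pvalue, 3))

# Small p-values here flag violations and suggest considering a nonparametric test.
```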

Types of variables

The types of variables you have usually determine what type of statistical test you can use.

Quantitative variables represent amounts of things (e.g. the number of trees in a forest). Types of quantitative variables include:

  • Continuous (aka ratio variables): represent measures and can usually be divided into units smaller than one (e.g. 0.75 grams).
  • Discrete (aka integer variables): represent counts and usually can’t be divided into units smaller than one (e.g. 1 tree).

Categorical variables represent groupings of things (e.g. the different tree species in a forest). Types of categorical variables include:

  • Ordinal : represent data with an order (e.g. rankings).
  • Nominal : represent group names (e.g. brands or species names).
  • Binary : represent data with a yes/no or 1/0 outcome (e.g. win or lose).

Choose the test that fits the types of predictor and outcome variables you have collected (if you are doing an experiment , these are the independent and dependent variables ). Consult the tables below to see which test best matches your variables.

Parametric tests usually have stricter requirements than nonparametric tests, and are able to make stronger inferences from the data. They can only be conducted with data that adheres to the common assumptions of statistical tests.

The most common types of parametric test include regression tests, comparison tests, and correlation tests.

Regression tests

Regression tests look for cause-and-effect relationships . They can be used to estimate the effect of one or more continuous variables on another variable.

Regression test | Research question example
Simple linear regression | What is the effect of income on longevity?
Multiple linear regression | What is the effect of income and minutes of exercise per day on longevity?
Logistic regression | What is the effect of drug dosage on the survival of a test subject?

Comparison tests

Comparison tests look for differences among group means . They can be used to test the effect of a categorical variable on the mean value of some other characteristic.

T-tests are used when comparing the means of precisely two groups (e.g., the average heights of men and women). ANOVA and MANOVA tests are used when comparing the means of more than two groups (e.g., the average heights of children, teenagers, and adults).

Comparison test | Research question example
Paired t-test | What is the effect of two different test prep programs on the average exam scores for students from the same class?
Independent t-test | What is the difference in average exam scores for students from two different schools?
ANOVA | What is the difference in average pain levels among post-surgical patients given three different painkillers?
MANOVA | What is the effect of flower species on petal length, petal width, and stem length?

Correlation tests

Correlation tests check whether variables are related without hypothesizing a cause-and-effect relationship.

These can be used to test whether two variables you want to use in (for example) a multiple regression test are autocorrelated.

Correlation test | Research question example
Pearson’s r | How are latitude and temperature related?

Non-parametric tests don’t make as many assumptions about the data, and are useful when one or more of the common statistical assumptions are violated. However, the inferences they make aren’t as strong as with parametric tests.

Nonparametric test | Use in place of…
Spearman’s rank correlation | Pearson’s r
Sign test | One-sample t-test
Kruskal–Wallis H | ANOVA
ANOSIM | MANOVA
Wilcoxon rank-sum test | Independent t-test
Wilcoxon signed-rank test | Paired t-test
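In practice, swapping a parametric test for its nonparametric counterpart from the table above is often a one-line change. The sketch below contrasts an independent t-test with the Wilcoxon rank-sum (Mann-Whitney U) test on simulated skewed data; the data and seed are assumptions for the example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
group_a = rng.exponential(1.0, size=25)    # skewed data that violate normality
group_b = rng.exponential(1.5, size=25)

t_res  = stats.ttest_ind(group_a, group_b)       # parametric: independent t-test
mw_res = stats.mannwhitneyu(group_a, group_b)    # nonparametric: Wilcoxon rank-sum / Mann-Whitney U

print(f"independent t-test: p = {t_res.pvalue:.4f}")
print(f"Mann-Whitney U:     p = {mw_res.pvalue:.4f}")
```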


This flowchart helps you choose among parametric tests. For nonparametric alternatives, check the table above.

Choosing the right statistical test


Statistical tests commonly assume that:

  • the data are normally distributed
  • the groups that are being compared have similar variance
  • the data are independent

If your data do not meet these assumptions, you might still be able to use a nonparametric statistical test, which has fewer requirements but also makes weaker inferences.

A test statistic is a number calculated by a  statistical test . It describes how far your observed data is from the  null hypothesis  of no relationship between  variables or no difference among sample groups.

The test statistic tells you how different two or more groups are from the overall population mean , or how different a linear slope is from the slope predicted by a null hypothesis . Different test statistics are used in different statistical tests.

Statistical significance is a term used by researchers to state that it is unlikely their observations could have occurred under the null hypothesis of a statistical test . Significance is usually denoted by a p -value , or probability value.

Statistical significance is arbitrary – it depends on the threshold, or alpha value, chosen by the researcher. The most common threshold is p < 0.05, which means that the data is likely to occur less than 5% of the time under the null hypothesis .

When the p -value falls below the chosen alpha value, then we say the result of the test is statistically significant.

Quantitative variables are any variables where the data represent amounts (e.g. height, weight, or age).

Categorical variables are any variables where the data represent groups. This includes rankings (e.g. finishing places in a race), classifications (e.g. brands of cereal), and binary outcomes (e.g. coin flips).

You need to know what type of variables you are working with to choose the right statistical test for your data and interpret your results .

Discrete and continuous variables are two types of quantitative variables :

  • Discrete variables represent counts (e.g. the number of objects in a collection).
  • Continuous variables represent measurable amounts (e.g. water volume or weight).


Bevans, R. (2023, June 22). Choosing the Right Statistical Test | Types & Examples. Scribbr. Retrieved June 11, 2024, from https://www.scribbr.com/statistics/statistical-tests/


Hypothesis Testing


What is a Hypothesis

A hypothesis is an educated guess about something in the world around you. It should be testable, either by experiment or observation. For example:

  • A new medicine you think might work.
  • A way of teaching you think might be better.
  • A possible location of new species.
  • A fairer way to administer standardized tests.

It can really be anything at all as long as you can put it to the test.

What is a Hypothesis Statement?

If you are going to propose a hypothesis, it’s customary to write a statement. Your statement will look like this: “If I…(do this to an independent variable )….then (this will happen to the dependent variable ).” For example:

  • If I (decrease the amount of water given to herbs) then (the herbs will increase in size).
  • If I (give patients counseling in addition to medication) then (their overall depression scale will decrease).
  • If I (give exams at noon instead of 7) then (student test scores will improve).
  • If I (look in this certain location) then (I am more likely to find new species).

A good hypothesis statement should:

  • Include an “if” and “then” statement (according to the University of California).
  • Include both the independent and dependent variables.
  • Be testable by experiment, survey or other scientifically sound technique.
  • Be based on information in prior research (either yours or someone else’s).
  • Have design criteria (for engineering or programming projects).

What is Hypothesis Testing?

Hypothesis testing can be one of the most confusing aspects for students, mostly because before you can even perform a test, you have to know what your null hypothesis is. Often, those tricky word problems that you are faced with can be difficult to decipher. But it’s easier than you think; all you need to do is:

  • Figure out your null hypothesis,
  • State your null hypothesis,
  • Choose what kind of test you need to perform,
  • Either support or reject the null hypothesis .

If you trace back the history of science, the null hypothesis is always the accepted fact. Simple examples of null hypotheses that are generally accepted as being true are:

  • DNA is shaped like a double helix.
  • There are 8 planets in the solar system (excluding Pluto).
  • Taking Vioxx can increase your risk of heart problems (a drug now taken off the market).

How do I State the Null Hypothesis?

You won’t be required to actually perform a real experiment or survey in elementary statistics (or even disprove a fact like “Pluto is a planet”!), so you’ll be given word problems from real-life situations. You’ll need to figure out what your hypothesis is from the problem. This can be a little trickier than just figuring out what the accepted fact is. With word problems, you are looking to find a fact that is nullifiable (i.e. something you can reject).

Hypothesis Testing Examples #1: Basic Example

A researcher thinks that if knee surgery patients go to physical therapy twice a week (instead of 3 times), their recovery period will be longer. The average recovery time for knee surgery patients is 8.2 weeks.

The hypothesis statement in this question is that the researcher believes the average recovery time is more than 8.2 weeks. It can be written in mathematical terms as: H1: μ > 8.2

Next, you’ll need to state the null hypothesis. That’s what will happen if the researcher is wrong. In the above example, if the researcher is wrong then the recovery time is less than or equal to 8.2 weeks. In math, that’s: H0: μ ≤ 8.2

Rejecting the null hypothesis

Ten or so years ago, we believed that there were 9 planets in the solar system. Pluto was demoted as a planet in 2006. The null hypothesis of “Pluto is a planet” was replaced by “Pluto is not a planet.” Of course, rejecting the null hypothesis isn’t always that easy— the hard part is usually figuring out what your null hypothesis is in the first place.

Hypothesis Testing Examples (One Sample Z Test)

The one sample z test isn’t used very often (because we rarely know the actual population standard deviation ). However, it’s a good idea to understand how it works as it’s one of the simplest tests you can perform in hypothesis testing. In English class you got to learn the basics (like grammar and spelling) before you could write a story; think of one sample z tests as the foundation for understanding more complex hypothesis testing. This page contains two hypothesis testing examples for one sample z-tests .

One Sample Hypothesis Testing Example: One Tailed Z Test

A principal at a certain school claims that the students in his school have above-average intelligence. A random sample of thirty students’ IQ scores has a mean of 112.5. Is there sufficient evidence to support the principal’s claim? The mean population IQ is 100 with a standard deviation of 15.

Step 1: State the Null hypothesis . The accepted fact is that the population mean is 100, so: H 0 : μ = 100.

Step 2: State the Alternate Hypothesis . The claim is that the students have above average IQ scores, so: H 1 : μ > 100. The fact that we are looking for scores “greater than” a certain point means that this is a one-tailed test.


Step 4: State the alpha level . If you aren’t given an alpha level , use 5% (0.05).

Step 5: Find the rejection region area (given by your alpha level above) from the z-table . An area of .05 is equal to a z-score of 1.645.

Step 6: Find the test statistic using the z-score formula: z = (x̅ – μ) / (σ / √n) = (112.5 – 100) / (15 / √30) ≈ 4.56.

Step 7: If the test statistic from Step 6 is greater than the critical value from Step 5, reject the null hypothesis. If it’s less, you cannot reject the null hypothesis. In this case, 4.56 > 1.645, so you can reject the null.

One Sample Hypothesis Testing Examples: #3

Blood glucose levels for obese patients have a mean of 100 with a standard deviation of 15. A researcher thinks that a diet high in raw cornstarch will have a positive or negative effect on blood glucose levels. A sample of 30 patients who have tried the raw cornstarch diet have a mean glucose level of 140. Test the hypothesis that the raw cornstarch had an effect.

*This process is made much easier if you use a TI-83 or Excel to calculate the z-score (the “critical value”). See:

  • Critical z value TI 83
  • Z Score in Excel

Hypothesis Testing Examples: Mean (Using TI 83)

You can use the TI 83 calculator for hypothesis testing, but the calculator won’t figure out the null and alternate hypotheses; that’s up to you to read the question and input it into the calculator.

Example problem : A sample of 200 people has a mean age of 21 with a population standard deviation (σ) of 5. Test the hypothesis that the population mean is 18.9 at α = 0.05.

Step 1: State the null hypothesis. In this case, the null hypothesis is that the population mean is 18.9, so we write: H 0 : μ = 18.9

Step 2: State the alternative hypothesis. We want to know if our sample, which has a mean of 21 instead of 18.9, really is different from the population, therefore our alternate hypothesis: H 1 : μ ≠ 18.9

Step 3: Press Stat then press the right arrow twice to select TESTS.

Step 4: Press 1 to select 1:Z-Test… . Press ENTER.

Step 5: Use the right arrow to select Stats .

Step 6: Enter the data from the problem: μ0: 18.9, σ: 5, x̅: 21, n: 200, μ: ≠μ0

Step 7: Arrow down to Calculate and press ENTER. The calculator shows the p-value: p = 2.87 × 10⁻⁹

This is smaller than our alpha value of .05. That means we should reject the null hypothesis .
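If you prefer software to a calculator, the same two-sided z test can be reproduced in a few lines of Python as a cross-check; this assumes scipy is available.

```python
import math
from scipy import stats

mu0, sigma, x_bar, n = 18.9, 5.0, 21.0, 200

z = (x_bar - mu0) / (sigma / math.sqrt(n))
p_value = 2 * stats.norm.sf(abs(z))          # two-sided test, H1: mu != 18.9

print(f"z = {z:.2f}, p = {p_value:.2e}")     # p ≈ 2.87e-09, matching the calculator
```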

Bayesian Hypothesis Testing: What is it?


Bayesian hypothesis testing helps to answer the question: Can the results from a test or survey be repeated? Why do we care if a test can be repeated? Let’s say twenty people in the same village came down with leukemia. A group of researchers find that cell-phone towers are to blame. However, a second study found that cell-phone towers had nothing to do with the cancer cluster in the village. In fact, they found that the cancers were completely random. If that sounds impossible, it actually can happen! Clusters of cancer can happen simply by chance . There could be many reasons why the first study was faulty. One of the main reasons could be that they just didn’t take into account that sometimes things happen randomly and we just don’t know why.

It's good science to let people know if your study results are solid, or if they could have happened by chance. The usual way of doing this is to test your results with a p-value. A p-value is a number that you get by running a hypothesis test on your data. A p-value of 0.05 (5%) or less is usually enough to claim that your results are repeatable. However, there's another way to test the validity of your results: Bayesian hypothesis testing. This type of testing gives you another way to test the strength of your results.

Traditional testing (the type you probably came across in elementary stats or AP stats) is called non-Bayesian. It looks at how often an outcome happens over repeated runs of the experiment. It's an objective view of whether an experiment is repeatable. Bayesian hypothesis testing is a subjective view of the same thing. It takes into account how much faith you have in your results. In other words, would you wager money on the outcome of your experiment?

Differences Between Traditional and Bayesian Hypothesis Testing.

Traditional (non-Bayesian) testing requires you to repeat sampling over and over, while Bayesian testing does not. The main difference between the two is in the first step of testing: stating a probability model. In Bayesian testing you add prior knowledge to this step. It also requires use of a posterior probability, which is the conditional probability assigned to a random event after all the evidence is considered.
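As a concrete illustration of adding prior knowledge and computing a posterior probability, here is a minimal Python sketch using a conjugate Beta-Binomial model. The scenario, prior parameters, conversion counts, and the 10% threshold are all hypothetical, and SciPy is assumed to be available.

```python
from scipy import stats  # assumed available

# Hypothetical example: is a conversion rate above 10%?
# Prior knowledge: Beta(2, 18) roughly encodes a prior belief that the rate is near 10%.
prior_a, prior_b = 2, 18

# Hypothetical observed data: 18 conversions out of 120 visitors
successes, trials = 18, 120

# Conjugate update: the posterior is Beta(prior_a + successes, prior_b + failures)
post_a = prior_a + successes
post_b = prior_b + (trials - successes)

# Posterior probability that the true rate exceeds 10%, given the prior and the data
p_above_10 = stats.beta.sf(0.10, post_a, post_b)
print(f"P(rate > 0.10 | data) = {p_above_10:.3f}")
```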

Arguments for Bayesian Testing.

Many researchers think that it is a better alternative to traditional testing, because it:

  • Includes prior knowledge about the data.
  • Takes into account personal beliefs about the results.

Arguments against.

  • Including prior data or knowledge isn’t justifiable.
  • It is difficult to calculate compared to non-Bayesian testing.


Hypothesis Testing Articles

  • What is Ad Hoc Testing?
  • Composite Hypothesis Test
  • What is a Rejection Region?
  • What is a Two Tailed Test?
  • How to Decide if a Hypothesis Test is a One Tailed Test or a Two Tailed Test.
  • How to Decide if a Hypothesis is a Left Tailed Test or a Right-Tailed Test.
  • How to State the Null Hypothesis in Statistics.
  • How to Find a Critical Value .
  • How to Support or Reject a Null Hypothesis.

Specific Tests:

  • Brunner Munzel Test (Generalized Wilcoxon Test).
  • Chi Square Test for Normality.
  • Cochran-Mantel-Haenszel Test.
  • Granger Causality Test .
  • Hotelling’s T-Squared.
  • KPSS Test .
  • What is a Likelihood-Ratio Test?
  • Log rank test .
  • MANCOVA Assumptions.
  • MANCOVA Sample Size.
  • Marascuilo Procedure
  • Rao’s Spacing Test
  • Rayleigh test of uniformity.
  • Sequential Probability Ratio Test.
  • How to Run a Sign Test.
  • T Test: one sample.
  • T-Test: Two sample .
  • Welch’s ANOVA .
  • Welch’s Test for Unequal Variances .
  • Z-Test: one sample .
  • Z Test: Two Proportion.
  • Wald Test .

Related Articles:

  • What is an Acceptance Region?
  • How to Calculate Chebyshev’s Theorem.
  • Contrast Analysis
  • Decision Rule.
  • Degrees of Freedom .
  • Directional Test
  • False Discovery Rate
  • How to calculate the Least Significant Difference.
  • Levels in Statistics.
  • How to Calculate Margin of Error.
  • Mean Difference (Difference in Means)
  • The Multiple Testing Problem .
  • What is the Neyman-Pearson Lemma?
  • What is an Omnibus Test?
  • One Sample Median Test .
  • How to Find a Sample Size (General Instructions).
  • Sig 2(Tailed) meaning in results
  • What is a Standardized Test Statistic?
  • How to Find Standard Error
  • Standardized values: Example.
  • How to Calculate a T-Score.
  • T-Score Vs. a Z-Score.
  • Testing a Single Mean.
  • Unequal Sample Sizes.
  • Uniformly Most Powerful Tests.
  • How to Calculate a Z-Score.

Statistics By Jim

Making statistics intuitive

Statistical Hypothesis Testing Overview

By Jim Frost

In this blog post, I explain why you need to use statistical hypothesis testing and help you navigate the essential terminology. Hypothesis testing is a crucial procedure to perform when you want to make inferences about a population using a random sample. These inferences include estimating population properties such as the mean, differences between means, proportions, and the relationships between variables.

This post provides an overview of statistical hypothesis testing. If you need to perform hypothesis tests, consider getting my book, Hypothesis Testing: An Intuitive Guide .

Why You Should Perform Statistical Hypothesis Testing

[Figure: mean drug scores by group. Hypothesis testing determines whether the difference between the means is statistically significant.]

Hypothesis testing is a form of inferential statistics that allows us to draw conclusions about an entire population based on a representative sample. You gain tremendous benefits by working with a sample. In most cases, it is simply impossible to observe the entire population to understand its properties. The only alternative is to collect a random sample and then use statistics to analyze it.

While samples are much more practical and less expensive to work with, there are trade-offs. When you estimate the properties of a population from a sample, the sample statistics are unlikely to equal the actual population value exactly.  For instance, your sample mean is unlikely to equal the population mean. The difference between the sample statistic and the population value is the sample error.

Differences that researchers observe in samples might be due to sampling error rather than representing a true effect at the population level. If sampling error causes the observed difference, the next time someone performs the same experiment the results might be different. Hypothesis testing incorporates estimates of the sampling error to help you make the correct decision. Learn more about Sampling Error .

For example, if you are studying the proportion of defects produced by two manufacturing methods, any difference you observe between the two sample proportions might be sample error rather than a true difference. If the difference does not exist at the population level, you won’t obtain the benefits that you expect based on the sample statistics. That can be a costly mistake!

Let’s cover some basic hypothesis testing terms that you need to know.

Background information : Difference between Descriptive and Inferential Statistics and Populations, Parameters, and Samples in Inferential Statistics

Hypothesis Testing

Hypothesis testing is a statistical analysis that uses sample data to assess two mutually exclusive theories about the properties of a population. Statisticians call these theories the null hypothesis and the alternative hypothesis. A hypothesis test assesses your sample statistic and factors in an estimate of the sample error to determine which hypothesis the data support.

When you can reject the null hypothesis, the results are statistically significant, and your data support the theory that an effect exists at the population level.

The effect is the difference between the population value and the null hypothesis value. The effect is also known as the population effect or the difference. For example, the mean difference in the health outcome between a treatment group and a control group is the effect.

Typically, you do not know the size of the actual effect. However, you can use a hypothesis test to help you determine whether an effect exists and to estimate its size. Hypothesis tests convert your sample effect into a test statistic, which it evaluates for statistical significance. Learn more about Test Statistics .

An effect can be statistically significant, but that doesn’t necessarily indicate that it is important in a real-world, practical sense. For more information, read my post about Statistical vs. Practical Significance .

Null Hypothesis

The null hypothesis is one of two mutually exclusive theories about the properties of the population in hypothesis testing. Typically, the null hypothesis states that there is no effect (i.e., the effect size equals zero). The null is often signified by H0.

In all hypothesis testing, the researchers are testing an effect of some sort. The effect can be the effectiveness of a new vaccination, the durability of a new product, the proportion of defects in a manufacturing process, and so on. There is some benefit or difference that the researchers hope to identify.

However, it’s possible that there is no effect or no difference between the experimental groups. In statistics, we call this lack of an effect the null hypothesis. Therefore, if you can reject the null, you can favor the alternative hypothesis, which states that the effect exists (doesn’t equal zero) at the population level.

You can think of the null as the default theory that requires sufficiently strong evidence against it in order to reject it.

For example, in a 2-sample t-test, the null often states that the difference between the two means equals zero.

When you can reject the null hypothesis, your results are statistically significant. Learn more about Statistical Significance: Definition & Meaning .

Related post : Understanding the Null Hypothesis in More Detail

Alternative Hypothesis

The alternative hypothesis is the other theory about the properties of the population in hypothesis testing. Typically, the alternative hypothesis states that a population parameter does not equal the null hypothesis value. In other words, there is a non-zero effect. If your sample contains sufficient evidence, you can reject the null and favor the alternative hypothesis. The alternative is often identified with H1 or HA.

For example, in a 2-sample t-test, the alternative often states that the difference between the two means does not equal zero.

You can specify either a one- or two-tailed alternative hypothesis:

If you perform a two-tailed hypothesis test, the alternative states that the population parameter does not equal the null value. For example, when the alternative hypothesis is HA: μ ≠ 0, the test can detect differences both greater than and less than the null value.

A one-tailed alternative has more power to detect an effect but it can test for a difference in only one direction. For example, HA: μ > 0 can only test for differences that are greater than zero.
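To make the null, alternative, and one- versus two-tailed choices concrete, here is a minimal Python sketch of a 2-sample t-test on simulated data. It is an illustration, not part of the original post; the group means, sample sizes, and seed are hypothetical, and SciPy/NumPy are assumed to be available.

```python
import numpy as np
from scipy import stats  # assumed available

rng = np.random.default_rng(42)

# Hypothetical treatment and control measurements
treatment = rng.normal(loc=12.0, scale=3.0, size=40)
control = rng.normal(loc=10.0, scale=3.0, size=40)

# Two-tailed test: H0 says the difference between the means equals zero
t_stat, p_two_tailed = stats.ttest_ind(treatment, control)
print(f"t = {t_stat:.2f}, two-tailed p = {p_two_tailed:.4f}")

# One-tailed version (H1: treatment mean > control mean), available in recent SciPy versions
t_stat, p_one_tailed = stats.ttest_ind(treatment, control, alternative="greater")
print(f"one-tailed p = {p_one_tailed:.4f}")
```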

Related posts : Understanding T-tests and One-Tailed and Two-Tailed Hypothesis Tests Explained

P-values

P-values are the probability that you would obtain the effect observed in your sample, or larger, if the null hypothesis is correct. In simpler terms, p-values tell you how strongly your sample data contradict the null. Lower p-values represent stronger evidence against the null. You use P-values in conjunction with the significance level to determine whether your data favor the null or alternative hypothesis.

Related post : Interpreting P-values Correctly

Significance Level (Alpha)

The significance level, also known as alpha or α, is an evidence threshold that you set before beginning the study. It defines how strongly the sample evidence must contradict the null hypothesis before you can reject the null hypothesis for the population. For instance, a significance level of 0.05 signifies a 5% risk of deciding that an effect exists when it does not exist.

Use p-values and significance levels together to help you determine which hypothesis the data support. If the p-value is less than your significance level, you can reject the null and conclude that the effect is statistically significant. In other words, the evidence in your sample is strong enough to be able to reject the null hypothesis at the population level.

Related posts : Graphical Approach to Significance Levels and P-values and Conceptual Approach to Understanding Significance Levels

Types of Errors in Hypothesis Testing

Statistical hypothesis tests are not 100% accurate because they use a random sample to draw conclusions about entire populations. There are two types of errors related to drawing an incorrect conclusion.

  • False positives: You reject a null that is true. Statisticians call this a Type I error . The Type I error rate equals your significance level or alpha (α).
  • False negatives: You fail to reject a null that is false. Statisticians call this a Type II error. Generally, you do not know the Type II error rate. However, it is a larger risk when you have a small sample size , noisy data, or a small effect size. The type II error rate is also known as beta (β).

Statistical power is the probability that a hypothesis test correctly infers that a sample effect exists in the population. In other words, the test correctly rejects a false null hypothesis. Consequently, power is inversely related to a Type II error. Power = 1 – β. Learn more about Power in Statistics .
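One hands-on way to see the relationship between power, sample size, and effect size is to simulate it. The sketch below is a rough illustration rather than a proper power analysis; the effect size, group size, and number of simulations are hypothetical, and NumPy/SciPy are assumed to be available.

```python
import numpy as np
from scipy import stats  # assumed available

rng = np.random.default_rng(0)

# Hypothetical setup: true effect of 0.5 SD, 30 per group, alpha = 0.05
n, effect, alpha, sims = 30, 0.5, 0.05, 10_000
rejections = 0

for _ in range(sims):
    a = rng.normal(0.0, 1.0, n)          # control group
    b = rng.normal(effect, 1.0, n)       # treatment group with a true effect
    _, p = stats.ttest_ind(a, b)
    if p < alpha:
        rejections += 1

power = rejections / sims                # estimate of 1 − β
print(f"Estimated power ≈ {power:.2f}")
```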

Related posts : Types of Errors in Hypothesis Testing and Estimating a Good Sample Size for Your Study Using Power Analysis

Which Type of Hypothesis Test is Right for You?

There are many different types of procedures you can use. The correct choice depends on your research goals and the data you collect. Do you need to understand the mean or the differences between means? Or, perhaps you need to assess proportions. You can even use hypothesis testing to determine whether the relationships between variables are statistically significant.

To choose the proper statistical procedure, you’ll need to assess your study objectives and collect the correct type of data . This background research is necessary before you begin a study.

Related Post : Hypothesis Tests for Continuous, Binary, and Count Data

Statistical tests are crucial when you want to use sample data to make conclusions about a population because these tests account for sample error. Using significance levels and p-values to determine when to reject the null hypothesis improves the probability that you will draw the correct conclusion.

To see an alternative approach to these traditional hypothesis testing methods, learn about bootstrapping in statistics !

If you want to see examples of hypothesis testing in action, I recommend the following posts that I have written:

  • How Effective Are Flu Shots? This example shows how you can use statistics to test proportions.
  • Fatality Rates in Star Trek . This example shows how to use hypothesis testing with categorical data.
  • Busting Myths About the Battle of the Sexes . A fun example based on a Mythbusters episode that assesses continuous data using several different tests.
  • Are Yawns Contagious? Another fun example inspired by a Mythbusters episode.


Reader Interactions


January 14, 2024 at 8:43 am

Hello Professor Jim, how are you doing? Please, what are the properties of a population, and what are some examples? Thanks for your time and understanding.


January 14, 2024 at 12:57 pm

Please read my post about Populations vs. Samples for more information and examples.

Also, please note there is a search bar in the upper-right margin of my website. Use that to search for topics.


July 5, 2023 at 7:05 am

Hello, I have a question as I read your post. You say in p-values section

“P-values are the probability that you would obtain the effect observed in your sample, or larger, if the null hypothesis is correct. In simpler terms, p-values tell you how strongly your sample data contradict the null. Lower p-values represent stronger evidence against the null.”

But according to your definition of effect, the null states that an effect does not exist, correct? So what I assume you want to say is that “P-values are the probability that you would obtain the effect observed in your sample, or larger, if the null hypothesis is **incorrect**.”

July 6, 2023 at 5:18 am

Hi Shrinivas,

The correct definition of p-value is that it is a probability that exists in the context of a true null hypothesis. So, the quotation is correct in stating “if the null hypothesis is correct.”

Essentially, the p-value tells you the likelihood of your observed results (or more extreme) if the null hypothesis is true. It gives you an idea of whether your results are surprising or unusual if there is no effect.

Hence, with sufficiently low p-values, you reject the null hypothesis because it’s telling you that your sample results were unlikely to have occurred if there was no effect in the population.

I hope that helps make it more clear. If not, let me know I’ll attempt to clarify!


May 8, 2023 at 12:47 am

Thanks a lot. My best regards

May 7, 2023 at 11:15 pm

Hi Jim, can you tell me something about effect size? Thanks

May 8, 2023 at 12:29 am

Here’s a post that I’ve written about Effect Sizes that will hopefully tell you what you need to know. Please read that. Then, if you have any more specific questions about effect sizes, please post them there. Thanks!


January 7, 2023 at 4:19 pm

Hi Jim, I have only read two pages so far, but I am really amazed because in a few paragraphs you made me clearly understand concepts from months of courses I received in biostatistics! Thanks so much for this work you have done; it helps a lot!

January 10, 2023 at 3:25 pm

Thanks so much!


June 17, 2021 at 1:45 pm

Can you help in the following question: Rocinante36 is priced at ₹7 lakh and has been designed to deliver a mileage of 22 km/litre and a top speed of 140 km/hr. Formulate the null and alternative hypotheses for mileage and top speed to check whether the new models are performing as per the desired design specifications.


April 19, 2021 at 1:51 pm

It's indeed great to read your work on statistics.

I have a doubt regarding the one sample t-test. So as per your book on hypothesis testing with reference to page no 45, you have mentioned the difference between “the sample mean and the hypothesised mean is statistically significant”. So as per my understanding it should be quoted like “the difference between the population mean and the hypothesised mean is statistically significant”. The catch here is the hypothesised mean represents the sample mean.

Please help me understand this.

Regards Rajat

April 19, 2021 at 3:46 pm

Thanks for buying my book. I’m so glad it’s been helpful!

The test is performed on the sample but the results apply to the population. Hence, if the difference between the sample mean (observed in your study) and the hypothesized mean is statistically significant, that suggests that the population mean does not equal the hypothesized mean.

For one sample tests, the hypothesized mean is not the sample mean. It is a mean that you want to use for the test value. It usually represents a value that is important to your research. In other words, it’s a value that you pick for some theoretical/practical reasons. You pick it because you want to determine whether the population mean is different from that particular value.

I hope that helps!


November 5, 2020 at 6:24 am

Jim, you are such a magnificent statistician/economist/econometrician/data scientist etc whatever profession. Your work inspires and simplifies the lives of so many researchers around the world. I truly admire you and your work. I will buy a copy of each book you have on statistics or econometrics. Keep doing the good work. Remain ever blessed

November 6, 2020 at 9:47 pm

Hi Renatus,

Thanks so much for your very kind comments. You made my day!! I'm so glad that my website has been helpful. And, thanks so much for supporting my books! 🙂


November 2, 2020 at 9:32 pm

Hi Jim, I hope you are aware of 2019 American Statistical Association’s official statement on Statistical Significance: https://www.tandfonline.com/doi/full/10.1080/00031305.2019.1583913 In case you do not bother reading the full article, may I quote you the core message here: “We conclude, based on our review of the articles in this special issue and the broader literature, that it is time to stop using the term “statistically significant” entirely. Nor should variants such as “significantly different,” “p < 0.05,” and “nonsignificant” survive, whether expressed in words, by asterisks in a table, or in some other way."

With best wishes,

November 3, 2020 at 2:09 am

I’m definitely aware of the debate surrounding how to use p-values most effectively. However, I need to correct you on one point. The link you provide is NOT a statement by the American Statistical Association. It is an editorial by several authors.

There is considerable debate over this issue. There are problems with p-values. However, as the authors state themselves, much of the problem is over people’s mindsets about how to use p-values and their incorrect interpretations about what statistical significance does and does not mean.

If you were to read my website more thoroughly, you’d be aware that I share many of their concerns and I address them in multiple posts. One of the authors’ key points is the need to be thoughtful and conduct thoughtful research and analysis. I emphasize this aspect in multiple posts on this topic. I’ll ask you to read the following three because they all address some of the authors’ concerns and suggestions. But you might run across others to read as well.

  • Five Tips for Using P-values to Avoid Being Misled
  • How to Interpret P-values Correctly
  • P-values and the Reproducibility of Experimental Results


September 24, 2020 at 11:52 pm

HI Jim, i just want you to know that you made explanation for Statistics so simple! I should say lesser and fewer words that reduce the complexity. All the best! 🙂

September 25, 2020 at 1:03 am

Thanks, Rene! Your kind words mean a lot to me! I’m so glad it has been helpful!


September 23, 2020 at 2:21 am

Honestly, I never understood stats during my entire M.Ed course; it was another nightmare for me. But with how easily you have explained each concept, I have understood stats way beyond my imagination. Thank you so much for helping ignorant research scholars like us. Looking forward to getting a hard copy of your book. Kindly tell me, is it available through Flipkart?

September 24, 2020 at 11:14 pm

I’m so happy to hear that my website has been helpful!

I checked on flipkart and it appears like my books are not available there. I’m never exactly sure where they’re available due to the vagaries of different distribution channels. They are available on Amazon in India.

Introduction to Statistics: An Intuitive Guide (Amazon IN) Hypothesis Testing: An Intuitive Guide (Amazon IN)


July 26, 2020 at 11:57 am

Dear Jim, I am a teacher from India. I don't have any background in statistics, and still I should say that in a single read I can follow your explanations. I take my entire biostatistics class for botany graduates with your explanations. Thanks a lot. May I know how I can get your books in India?

July 28, 2020 at 12:31 am

Right now my books are only available as ebooks from my website. However, soon I’ll have some exciting news about other ways to obtain it. Stay tuned! I’ll announce it on my email list. If you’re not already on it, you can sign up using the form that is in the right margin of my website.


June 22, 2020 at 2:02 pm

Also, can you please let me know if this book covers topics like EDA and principal component analysis?

June 22, 2020 at 2:07 pm

This book doesn’t cover principal components analysis. Although, I wouldn’t really classify that as a hypothesis test. In the future, I might write a multivariate analysis book that would cover this and others. But, that’s well down the road.

My Introduction to Statistics covers EDA. That’s the largely graphical look at your data that you often do prior to hypothesis testing. The Introduction book perfectly leads right into the Hypothesis Testing book.

June 22, 2020 at 1:45 pm

Thanks for the detailed explanation. It does clear my doubts. I saw that your book related to hypothesis testing has the topics that I am studying currently. I am looking forward to purchasing it.

Regards, Take Care

June 19, 2020 at 1:03 pm

For this particular article I did not understand a couple of statements and it would be great if you could help: 1) "If sample error causes the observed difference, the next time someone performs the same experiment the results might be different." 2) "If the difference does not exist at the population level, you won't obtain the benefits that you expect based on the sample statistics."

I discovered your articles by chance and now I keep coming back to read and understand statistical concepts. These articles are very informative and easy to digest. Thanks for simplifying things.

June 20, 2020 at 9:53 pm

I’m so happy to hear that you’ve found my website to be helpful!

To answer your questions, keep in mind that a central tenet of inferential statistics is that the random sample that a study drew was only one of an infinite number of possible samples it could've drawn. Each random sample produces different results. Most results will cluster around the population value assuming the study used good methodology. However, random sampling error always exists and makes it so that population estimates from a sample almost never exactly equal the correct population value.

So, imagine that we're studying a medication and comparing the treatment and control groups. Suppose that the medicine is truly not effective and that the population difference between the treatment and control group is zero (i.e., no difference.) Despite the true difference being zero, most sample estimates will show some degree of either a positive or negative effect thanks to random sampling error. So, just because a study has an observed difference does not mean that a difference exists at the population level. So, on to your questions:

1. If the observed difference is just random error, then it makes sense that if you collected another random sample, the difference could change. It could change from negative to positive, positive to negative, more extreme, less extreme, etc. However, if the difference exists at the population level, most random samples drawn from the population will reflect that difference. If the medicine has an effect, most random samples will reflect that fact and not bounce around on both sides of zero as much.

2. This is closely related to the previous answer. Suppose there is no difference at the population level, but you approve the medicine because of the observed effects in a sample. Even though your random sample showed an effect (which was really random error), that effect doesn't exist at the population level. So, when you start using it on a larger scale, people won't benefit from the medicine. That's why it's important to separate out what is easily explained by random error versus what is not easily explained by it.

I think reading my post about how hypothesis tests work will help clarify this process. Also, in about 24 hours (as I write this), I’ll be releasing my new ebook about Hypothesis Testing!


May 29, 2020 at 5:23 am

Hi Jim, I really enjoy your blog. Can you please point me to where on your blog you discuss subgroup analysis and how it is done? I need to use non-parametric and parametric statistical methods for my work and also do subgroup analysis in order to identify potential groups of patients that may benefit more from using a treatment than other groups.

May 29, 2020 at 2:12 pm

Hi, I don’t have a specific article about subgroup analysis. However, subgroup analysis is just the dividing up of a larger sample into subgroups and then analyzing those subgroups separately. You can use the various analyses I write about on the subgroups.

Alternatively, you can include the subgroups in regression analysis as an indicator variable and include that variable as a main effect and an interaction effect to see how the relationships vary by subgroup without needing to subdivide your data. I write about that approach in my article about comparing regression lines . This approach is my preferred approach when possible.


April 19, 2020 at 7:58 am

Sir, is a confidence interval a part of estimation?


April 17, 2020 at 3:36 pm

Sir, can you please briefly explain alternatives to hypothesis testing? I am unable to find the answer.

April 18, 2020 at 1:22 am

Assuming you want to draw conclusions about populations by using samples (i.e., inferential statistics ), you can use confidence intervals and bootstrap methods as alternatives to the traditional hypothesis testing methods.


March 9, 2020 at 10:01 pm

Hi Jim, could you please help with activities that can best teach the concepts of hypothesis testing through simulation? Also, do you have any question sets that would enhance students' intuition for why they are learning hypothesis testing as a topic in introductory statistics? Thanks.


March 5, 2020 at 3:48 pm

Hi Jim, I’m studying multiple hypothesis testing & was wondering if you had any material that would be relevant. I’m more trying to understand how testing multiple samples simultaneously affects your results & more on the Bonferroni Correction

March 5, 2020 at 4:05 pm

I write about multiple comparisons (aka post hoc tests) in the ANOVA context . I don’t talk about Bonferroni Corrections specifically but I cover related types of corrections. I’m not sure if that exactly addresses what you want to know but is probably the closest I have already written. I hope it helps!


January 14, 2020 at 9:03 pm

Thank you! Have a great day/evening.

January 13, 2020 at 7:10 pm

Any help would be greatly appreciated. What is the difference between The Hypothesis Test and The Statistical Test of Hypothesis?

January 14, 2020 at 11:02 am

They sound like the same thing to me. Unless this is specialized terminology for a particular field or the author was intending something specific, I’d guess they’re one and the same.


April 1, 2019 at 10:00 am

so these are the only two forms of Hypothesis used in statistical testing?

April 1, 2019 at 10:02 am

Are you referring to the null and alternative hypothesis? If so, yes, those are the standard hypotheses in a statistical hypothesis test.

April 1, 2019 at 9:57 am

Very insightful post, thanks for the write-up.


October 27, 2018 at 11:09 pm

Hi there, I am an upcoming statistician. Out of all the blogs that I have read, I have found this one the most useful as far as my problem is concerned. Thanks so much.

October 27, 2018 at 11:14 pm

Hi Stano, you’re very welcome! Thanks for your kind words. They mean a lot! I’m happy to hear that my posts were able to help you. I’m sure you will be a fantastic statistician. Best of luck with your studies!


October 26, 2018 at 11:39 am

Dear Jim, thank you very much for your explanations! I have a question. Can I use t-test to compare two samples in case each of them have right bias?

October 26, 2018 at 12:00 pm

Hi Tetyana,

You’re very welcome!

The term “right bias” is not a standard term. Do you by chance mean right skewed distributions? In other words, if you plot the distribution for each group on a histogram they have longer right tails? These are not the symmetrical bell-shape curves of the normal distribution.

If that’s the case, yes you can as long as you exceed a specific sample size within each group. I include a table that contains these sample size requirements in my post about nonparametric vs parametric analyses .

Bias in statistics refers to cases where an estimate of a value is systematically higher or lower than the true value. If this is the case, you might be able to use t-tests, but you’d need to be sure to understand the nature of the bias so you would understand what the results are really indicating.

I hope this helps!


April 2, 2018 at 7:28 am

Simple and upto the point 👍 Thank you so much.

April 2, 2018 at 11:11 am

Hi Kalpana, thanks! And I’m glad it was helpful!


March 26, 2018 at 8:41 am

Am I correct if I say: Alpha – probability of wrongly rejecting the null hypothesis; p-value – probability of wrongly accepting the null hypothesis?

March 28, 2018 at 3:14 pm

You’re correct about alpha. Alpha is the probability of rejecting the null hypothesis when the null is true.

Unfortunately, your definition of the p-value is a bit off. The p-value has a fairly convoluted definition. It is the probability of obtaining the effect observed in a sample, or more extreme, if the null hypothesis is true. The p-value does NOT indicate the probability that either the null or alternative is true or false. Although, those are very common misinterpretations. To learn more, read my post about how to interpret p-values correctly .


March 2, 2018 at 6:10 pm

I recently started reading your blog and it is very helpful for understanding each concept of statistical tests in an easy way with some good examples. Also, I recommend that other people go through all these posts, especially those who do not have a statistical background and are facing many problems while studying statistical analysis.

Thank you for your such good blogs.

March 3, 2018 at 10:12 pm

Hi Amit, I'm so glad that my blog posts have been helpful for you! It means a lot to me that you took the time to write such a nice comment! Also, thanks for recommending my blog to others! I try really hard to write posts about statistics that are easy to understand.


January 17, 2018 at 7:03 am

I recently started reading your blog and I find it very interesting. I am learning statistics on my own, and I generally do many Google searches to understand the concepts. So this blog is quite helpful for me, as it has most of the content which I am looking for.

January 17, 2018 at 3:56 pm

Hi Shashank, thank you! And, I’m very glad to hear that my blog is helpful!


January 2, 2018 at 2:28 pm

thank u very much sir.

January 2, 2018 at 2:36 pm

You’re very welcome, Hiral!


November 21, 2017 at 12:43 pm

Thank you so much, sir. Your posts always help me to be a #statistician

November 21, 2017 at 2:40 pm

Hi Sachin, you’re very welcome! I’m happy that you find my posts to be helpful!


November 19, 2017 at 8:22 pm

great post as usual, but it would be nice to see an example.

November 19, 2017 at 8:27 pm

Thank you! At the end of this post, I have links to four other posts that show examples of hypothesis tests in action. You’ll find what you’re looking for in those posts!


University Library, University of Illinois at Urbana-Champaign


SPSS Tutorial: General Statistics and Hypothesis Testing


This section and the "Graphics" section provide a quick tutorial for a few common functions in SPSS, primarily to give the reader a feel for the SPSS user interface. This is not a comprehensive tutorial, but SPSS itself provides comprehensive tutorials and case studies through its help menu. SPSS's help menu is more than a quick reference. It provides detailed information on how and when to use SPSS's various menu options. See the "Further Resources" section for more information.

One Sample T-Test

To perform a one-sample t-test, click "Analyze"→"Compare Means"→"One Sample T-Test" and the following dialog box will appear:

[Screenshot: the One Sample T-Test dialog box]

The dialogue allows selection of any scale variable from the box at the left and a test value that represents a hypothetical mean. Select the test variable and set the test value, then press "Ok." Three tables will appear in the Output Viewer:

[Screenshot: One Sample T-Test output tables]

The first table gives descriptive statistics about the variable. The second shows the results of the t-test, including the "t" statistic, the degrees of freedom ("df"), the p-value ("Sig."), the difference of the test value from the variable mean, and the upper and lower bounds for a ninety-five percent confidence interval. The final table shows one-sample effect sizes.
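This tutorial is about the SPSS interface, but for readers who want to reproduce a one-sample t-test in code, here is a rough Python equivalent; the data values and test value are hypothetical, and SciPy/NumPy are assumed to be available.

```python
import numpy as np
from scipy import stats  # assumed available

# Hypothetical scale variable and test value (the hypothetical mean set in the dialog)
scores = np.array([4.1, 3.8, 4.5, 5.0, 3.9, 4.2, 4.7, 4.4, 3.6, 4.8])
test_value = 4.0

t_stat, p_value = stats.ttest_1samp(scores, popmean=test_value)
df = len(scores) - 1

# 95% confidence interval for the difference between the sample mean and the test value
diff = scores.mean() - test_value
margin = stats.t.ppf(0.975, df) * stats.sem(scores)

print(f"t = {t_stat:.2f}, df = {df}, p = {p_value:.3f}")
print(f"mean difference = {diff:.2f}, 95% CI = ({diff - margin:.2f}, {diff + margin:.2f})")
```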

One-Way ANOVA

In the Data Editor, select "Analyze"→"Compare Means"→"One-Way ANOVA..." to open the dialog box shown below.

[Screenshot: the One-Way ANOVA dialog box]

To generate the ANOVA statistic, the variables chosen cannot have a "Nominal" level of measurement; they must be "Ordinal."

Once the nominal variables have been changed to ordinal, select the dependent variable and the factor, then click "OK." The following output will appear in the Output Viewer:

[Screenshot: One-Way ANOVA output table]
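For comparison, a one-way ANOVA can also be run outside SPSS with a few lines of Python; the three groups below are hypothetical, and SciPy is assumed to be available.

```python
from scipy import stats  # assumed available

# Hypothetical measurements for three groups (the levels of the factor)
group_a = [23, 25, 27, 22, 26]
group_b = [30, 28, 31, 29, 27]
group_c = [24, 23, 26, 25, 22]

f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```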

Linear Regression

To obtain a linear regression select "Analyze"→"Regression"→"Linear" from the menu, calling up the dialog box shown below:

[Screenshot: the Linear Regression dialog box]

The output of this most basic case produces a summary chart showing R, R-square, and the Standard error of the prediction; an ANOVA chart; and a chart providing statistics on model coefficients:

[Screenshot: Linear Regression output tables]

For Multiple regression, simply add more independent variables in the "Linear Regression" dialogue box. To plot a regression line see the "Legacy Dialogues" section of the "Graphics" tab.
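A rough Python counterpart to this basic regression output (R-squared, an overall F test, and a coefficient table) can be produced with statsmodels; the x and y values below are hypothetical, and statsmodels/NumPy are assumed to be available.

```python
import numpy as np
import statsmodels.api as sm  # assumed available

# Hypothetical predictor (x) and response (y)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9, 7.1, 7.8, 9.2])

X = sm.add_constant(x)        # adds the intercept term
model = sm.OLS(y, X).fit()
print(model.summary())        # R-squared, F test, and coefficient estimates
```

For multiple regression, the Python version works the same way as the SPSS dialog: add more predictor columns to X before fitting.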



10.1 - Setting the Hypotheses: Examples

A significance test examines whether the null hypothesis provides a plausible explanation of the data. The null hypothesis itself does not involve the data. It is a statement about a parameter (a numerical characteristic of the population). These population values might be proportions or means or differences between means or proportions or correlations or odds ratios or any other numerical summary of the population. The alternative hypothesis is typically the research hypothesis of interest. Here are some examples.

Example 10.2: Hypotheses with One Sample of One Categorical Variable

About 10% of the human population is left-handed. Suppose a researcher at Penn State speculates that students in the College of Arts and Architecture are more likely to be left-handed than people found in the general population. We only have one sample since we will be comparing a population proportion based on a sample value to a known population value.

  • Research Question : Are artists more likely to be left-handed than people found in the general population?
  • Response Variable : Classification of the student as either right-handed or left-handed

State Null and Alternative Hypotheses

  • Null Hypothesis : Students in the College of Arts and Architecture are no more likely to be left-handed than people in the general population (population percent of left-handed students in the College of Arts and Architecture = 10% or p = .10).
  • Alternative Hypothesis : Students in the College of Arts and Architecture are more likely to be left-handed than people in the general population (population percent of left-handed students in the College of Arts and Architecture > 10% or p > .10). This is a one-sided alternative hypothesis.
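If Example 10.2 were carried through with actual data, a one-proportion test such as the following Python sketch could be used; the counts are hypothetical, and the binomtest function is assumed to be available (recent SciPy versions).

```python
from scipy import stats  # assumed available

# Hypothetical counts for Example 10.2: 24 left-handers in a sample of 150 students
left_handed, n = 24, 150

# Exact binomial test of H0: p = 0.10 against the one-sided H1: p > 0.10
result = stats.binomtest(left_handed, n, p=0.10, alternative="greater")
print(f"sample proportion = {left_handed / n:.3f}, p-value = {result.pvalue:.4f}")
```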

Example 10.3: Hypotheses with One Sample of One Measurement Variable


A generic brand of the anti-histamine Diphenhydramine markets a capsule with a 50 milligram dose. The manufacturer is worried that the machine that fills the capsules has come out of calibration and is no longer creating capsules with the appropriate dosage.

  • Research Question : Does the data suggest that the population mean dosage of this brand is different than 50 mg?
  • Response Variable : dosage of the active ingredient found by a chemical assay.
  • Null Hypothesis : On the average, the dosage sold under this brand is 50 mg (population mean dosage = 50 mg).
  • Alternative Hypothesis : On the average, the dosage sold under this brand is not 50 mg (population mean dosage ≠ 50 mg). This is a two-sided alternative hypothesis.

Example 10.4: Hypotheses with Two Samples of One Categorical Variable


Many people are starting to prefer vegetarian meals on a regular basis. Specifically, a researcher believes that females are more likely than males to eat vegetarian meals on a regular basis.

  • Research Question : Does the data suggest that females are more likely than males to eat vegetarian meals on a regular basis?
  • Response Variable : Classification of whether or not a person eats vegetarian meals on a regular basis
  • Explanatory (Grouping) Variable: Sex
  • Null Hypothesis : There is no sex effect regarding those who eat vegetarian meals on a regular basis (population percent of females who eat vegetarian meals on a regular basis = population percent of males who eat vegetarian meals on a regular basis or p females = p males ).
  • Alternative Hypothesis : Females are more likely than males to eat vegetarian meals on a regular basis (population percent of females who eat vegetarian meals on a regular basis > population percent of males who eat vegetarian meals on a regular basis or p females > p males ). This is a one-sided alternative hypothesis.

Example 10.5: Hypotheses with Two Samples of One Measurement Variable


Obesity is a major health problem today. Research is starting to show that people may be able to lose more weight on a low carbohydrate diet than on a low fat diet.

  • Research Question : Does the data suggest that, on the average, people are able to lose more weight on a low carbohydrate diet than on a low fat diet?
  • Response Variable : Weight loss (pounds)
  • Explanatory (Grouping) Variable : Type of diet
  • Null Hypothesis : There is no difference in the mean amount of weight loss when comparing a low carbohydrate diet with a low fat diet (population mean weight loss on a low carbohydrate diet = population mean weight loss on a low fat diet).
  • Alternative Hypothesis : The mean weight loss should be greater for those on a low carbohydrate diet when compared with those on a low fat diet (population mean weight loss on a low carbohydrate diet > population mean weight loss on a low fat diet). This is a one-sided alternative hypothesis.

Example 10.6: Hypotheses about the relationship between Two Categorical Variables

  • Research Question : Do the odds of having a stroke increase if you inhale second-hand smoke? In a case-control study, non-smoking stroke patients and controls of the same age and occupation are asked if someone in their household smokes.
  • Variables : There are two different categorical variables (Stroke patient vs control and whether the subject lives in the same household as a smoker). Living with a smoker (or not) is the natural explanatory variable and having a stroke (or not) is the natural response variable in this situation.
  • Null Hypothesis : There is no relationship between whether or not a person has a stroke and whether or not a person lives with a smoker (odds ratio between stroke and second-hand smoke situation is = 1).
  • Alternative Hypothesis : There is a relationship between whether or not a person has a stroke and whether or not a person lives with a smoker (odds ratio between stroke and second-hand smoke situation is > 1). This is a one-tailed alternative.

This research question might also be addressed like Example 10.4 by making the hypotheses about comparing the proportion of stroke patients that live with smokers to the proportion of controls that live with smokers.
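For a rough sense of how the hypotheses in Example 10.6 might be tested, here is a Python sketch using Fisher's exact test on a hypothetical 2×2 table; the counts are made up, and SciPy is assumed to be available.

```python
from scipy import stats  # assumed available

# Hypothetical 2x2 table for Example 10.6:
#                    lives with smoker    no smoker in household
# stroke patients           40                     60
# controls                  25                     75
table = [[40, 60], [25, 75]]

# Fisher's exact test; alternative="greater" matches the one-tailed H1 (odds ratio > 1)
odds_ratio, p_value = stats.fisher_exact(table, alternative="greater")
print(f"sample odds ratio = {odds_ratio:.2f}, one-tailed p = {p_value:.4f}")
```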

Example 10.7: Hypotheses about the relationship between Two Measurement Variables

  • Research Question : A financial analyst believes there might be a positive association between the change in a stock's price and the amount of the stock purchased by non-management employees the previous day (stock trading by management being under "insider-trading" regulatory restrictions).
  • Variables : Daily price change information (the response variable) and previous day stock purchases by non-management employees (explanatory variable). These are two different measurement variables.
  • Null Hypothesis : The correlation between the daily stock price change ($) and the daily stock purchases by non-management employees ($) = 0.
  • Alternative Hypothesis : The correlation between the daily stock price change ($) and the daily stock purchases by non-management employees ($) > 0. This is a one-sided alternative hypothesis.
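A correlation test like the one in Example 10.7 might look like the following Python sketch; the data are simulated rather than real, and the one-sided alternative argument to pearsonr is assumed to be available (recent SciPy versions).

```python
import numpy as np
from scipy import stats  # assumed available

rng = np.random.default_rng(1)

# Hypothetical paired observations: previous-day purchases ($) and next-day price change ($)
purchases = rng.normal(50_000, 10_000, 60)
price_change = 0.00001 * purchases + rng.normal(0, 0.5, 60)

# One-sided test of H0: correlation = 0 against H1: correlation > 0
r, p_value = stats.pearsonr(purchases, price_change, alternative="greater")
print(f"r = {r:.2f}, one-sided p = {p_value:.4f}")
```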

Example 10.8: Hypotheses about comparing the relationship between Two Measurement Variables in Two Samples


  • Research Question : Is there a linear relationship between the amount of the bill ($) at a restaurant and the tip ($) that was left? Is the strength of this association different for family restaurants than for fine dining restaurants?
  • Variables : There are two different measurement variables. The size of the tip would depend on the size of the bill, so the amount of the bill would be the explanatory variable and the size of the tip would be the response variable.
  • Null Hypothesis : The correlation between the amount of the bill ($) at a restaurant and the tip ($) that was left is the same at family restaurants as it is at fine dining restaurants.
  • Alternative Hypothesis : The correlation between the amount of the bill ($) at a restaurant and the tip ($) that was left is different at family restaurants than it is at fine dining restaurants. This is a two-sided alternative hypothesis.
