What is statistical significance? P value definition and how to calculate it

P values are one of the most widely used concepts in statistical analysis. They are used by researchers, analysts and statisticians to draw insights from data and make informed decisions.

Along with statistical significance, they are also one of the most widely misused and misunderstood concepts in statistical analysis.


This article will explain:


  • how a P value is used for inferring statistical significance

  • how P values are calculated

  • and how to avoid some common misconceptions


Recap: Hypothesis testing

Hypothesis testing is a standard approach to drawing insights from data. It is used in virtually every quantitative discipline, and has a rich history going back over one hundred years.

The usual approach to hypothesis testing is to define a question in terms of the variables you are interested in. Then, you can form two opposing hypotheses to answer it.


  • The null hypothesis claims there is no statistically significant relationship between the variables


  • The alternative hypothesis claims there is a statistically significant relationship between the variables


For example, say you are testing whether caffeine affects programming productivity. There are two variables you are interested in - the dose of caffeine, and the productivity of a group of software developers.

The null hypothesis would be:


  • "Caffeine intake has no significant effect on programming productivity".

The alternative hypothesis would be:


  • "Caffeine intake does have a significant effect on productivity".

The word 'significant' has a very specific meaning here. It refers to a relationship between variables existing due to something more than chance alone.

Instead, the relationship exists (at least in part) due to 'real' differences or effects between the variables.


The next step is to collect some data to test the hypotheses. This could be collected from an experiment or survey, or from a set of data you have access to.

The final step is to calculate a test statistic from the data. This is a single number that summarizes some characteristic of your data. Examples include the statistics produced by the t-test, the Chi-squared test, and the Kruskal-Wallis test - among many others.

Exactly which one to calculate will depend on the question you are asking, the structure of your data, and the distribution of your data.



In the caffeine example, a suitable test might be a .

You will end up with a single test statistic from your data. All that is left to do is interpret this result to determine whether it supports or rejects the null hypothesis.
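As a minimal sketch of what "a single test statistic" looks like in practice, the snippet below computes Welch's t statistic for two groups of developers. The choice of a t statistic is an assumption made for illustration, and the productivity scores and the helper name `welch_t_statistic` are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t_statistic(sample_a, sample_b):
    """Return Welch's t statistic for two independent samples."""
    var_a = variance(sample_a)  # sample variance (n - 1 denominator)
    var_b = variance(sample_b)
    # Standard error of the difference between the two sample means.
    standard_error = sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean(sample_a) - mean(sample_b)) / standard_error


# Hypothetical productivity scores (e.g. tasks completed per day).
caffeine = [8, 9, 7, 10, 9]
no_caffeine = [5, 6, 4, 7, 5]

t_stat = welch_t_statistic(caffeine, no_caffeine)
print(f"t statistic: {t_stat:.2f}")
```

The further the statistic is from zero, the larger the difference between the group means relative to the noise in the data.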

This is where P values come into play.


How unlikely is this statistic?

Recall that you have calculated a test statistic, which represents some characteristic of your data. You want to understand whether it supports or rejects the null hypothesis.

The approach taken is to assume the null hypothesis is true. That is, assume there are no significant relationships between the variables you are interested in.

Then, look at the data you have collected. How likely would your test statistic be if the null hypothesis really is true?

Let's refer back to the caffeine intake example from before.


  • Say that productivity levels were split about evenly between developers, regardless of whether they drank caffeine or not (graph A). This result would be likely to occur by chance if the null hypothesis were true.

  • However, suppose that almost all of the highest productivity was seen in developers who drank caffeine (graph B). This is a more 'extreme' result, and would be unlikely to occur just by chance if the null hypothesis were true.

But how 'extreme' does a result need to be before it is considered too unlikely to support the null hypothesis?


This is what a P value lets you estimate. It provides a numerical answer to the question: "if the null hypothesis is true, what is the probability of a result this extreme or more extreme?"
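One way to make this definition concrete is an exact permutation test, sketched below. It assumes the null hypothesis is true (so the group labels are interchangeable), enumerates every way of splitting the pooled scores into two groups, and reports the fraction of splits whose difference in means is at least as extreme as the observed one. The data and the function name `exact_permutation_p_value` are hypothetical, and this is only one of several ways a P value can be calculated.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_p_value(sample_a, sample_b):
    """Two-sided exact permutation P value for a difference in means."""
    pooled = list(sample_a) + list(sample_b)
    observed = abs(mean(sample_a) - mean(sample_b))
    indices = range(len(pooled))
    extreme = 0
    total = 0
    # Under the null hypothesis every relabelling of the pooled data is
    # equally likely, so enumerate all of them.
    for chosen in combinations(indices, len(sample_a)):
        chosen_set = set(chosen)
        group_a = [pooled[i] for i in chosen_set]
        group_b = [pooled[i] for i in indices if i not in chosen_set]
        total += 1
        # Small tolerance guards against floating-point rounding.
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical productivity scores.
caffeine = [8, 9, 7, 10, 9]
no_caffeine = [5, 6, 4, 7, 5]

p_value = exact_permutation_p_value(caffeine, no_caffeine)
print(f"P value: {p_value:.4f}")
```

Enumerating every split is only practical for small samples; for larger ones, a random subset of relabellings is usually drawn instead.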

P values are probabilities, so they are always between 0 and 1.


  • A high P value indicates the observed results are likely to occur by chance under the null hypothesis.


  • A low P value indicates that the results are less likely to occur by chance under the null hypothesis.


Usually, a threshold is chosen to determine statistical significance. This threshold is often denoted α.

If the P value is below the threshold, your results are 'statistically significant'. This means you can reject the null hypothesis (and accept the alternative hypothesis).

There is no one-size-fits-all threshold suitable for all applications. Usually, an arbitrary threshold will be used that is appropriate for the context.

For example, in fields such as ecology and evolution, it is difficult to control experimental conditions because many factors can affect the outcome. It can also be difficult to collect very large sample sizes. In these fields, a threshold of 0.05 will often be used.

In other contexts such as physics and engineering, a threshold of 0.01 or even lower will be more appropriate.
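The effect of the threshold can be sketched in a few lines. The P value of 0.03 and the helper `is_significant` are hypothetical, chosen only to show that the same result can be 'significant' under one convention and not under another.

```python
def is_significant(p_value, alpha):
    """Return True when the P value falls below the chosen threshold."""
    return p_value < alpha


p_value = 0.03  # hypothetical result from some test

for alpha in (0.05, 0.01):
    verdict = "reject" if is_significant(p_value, alpha) else "fail to reject"
    print(f"alpha = {alpha}: {verdict} the null hypothesis")
```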


Chi-squared example

In this example, there are two (fictional) variables: region, and political party membership. It uses the Chi-squared test to see if there's a relationship between region and political party membership.


  • Null hypothesis: "there is no significant relationship between region and political party membership"


  • Alternative hypothesis: "there is a significant relationship between region and political party membership"
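A minimal sketch of this test, using only the standard library, might look like the following. The membership counts are made up, and the function names are hypothetical. The table has 2 regions and 3 parties, so there are (2 - 1) × (3 - 1) = 2 degrees of freedom; for exactly 2 degrees of freedom the Chi-squared P value has the closed form exp(-statistic / 2), which is the only case this sketch handles.

```python
from math import exp


def chi_squared_statistic(observed):
    """Pearson's Chi-squared statistic for a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Count expected in this cell if region and party are independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected
    return statistic


def p_value_two_degrees_of_freedom(statistic):
    """Upper-tail Chi-squared probability, valid only for 2 degrees of freedom."""
    return exp(-statistic / 2)


# Hypothetical member counts: rows are regions, columns are parties.
observed = [
    [30, 20, 10],
    [10, 20, 30],
]

chi2 = chi_squared_statistic(observed)
p_value = p_value_two_degrees_of_freedom(chi2)
print(f"chi-squared = {chi2:.2f}, P value = {p_value:.6f}")
```

For tables of other sizes the closed form no longer applies, and a statistics library would normally be used to obtain the P value.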



Common misconceptions and how to avoid them

There are several mistakes that even experienced practitioners often make about the use of P values and hypothesis testing. This section will aim to clear those up.

❌ The null hypothesis is uninteresting - if the data is good and the analysis is done right, then a result that supports the null hypothesis is a valid conclusion in its own right.

A question worth answering should have an interesting answer - whatever the outcome.


❌ A P value is the probability of the null hypothesis being true - a P value represents "the probability of the results, given the null hypothesis being true". This is not the same as "the probability of the null hypothesis being true, given the results".

P(Data | Hypothesis) ≠ P(Hypothesis | Data)


This means a low P value tells you: "if the null hypothesis is true, these results are unlikely". It does not tell you: "if these results are true, the null hypothesis is unlikely".
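A small numerical sketch using Bayes' rule shows how far apart the two quantities can be. All three input probabilities are assumptions invented for illustration, and the function name `posterior_null` is hypothetical. Strictly speaking a P value is a tail probability rather than the probability of the exact data, so treat `p_data_given_null` as a rough stand-in.

```python
def posterior_null(prior_null, p_data_given_null, p_data_given_alt):
    """P(null | data) from Bayes' rule with two competing hypotheses."""
    evidence = (
        prior_null * p_data_given_null
        + (1 - prior_null) * p_data_given_alt
    )
    return prior_null * p_data_given_null / evidence


# Assumed inputs: the null is thought very plausible beforehand (0.9),
# the data are unlikely under the null (0.05) and moderately likely
# under the alternative (0.5).
posterior = posterior_null(
    prior_null=0.9, p_data_given_null=0.05, p_data_given_alt=0.5
)
print(f"P(data | null) = 0.05, but P(null | data) = {posterior:.2f}")
```

With these assumed inputs, data that have only a 5% probability under the null hypothesis still leave the null hypothesis with a probability close to one half.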

❌ You can use the same significance threshold for multiple comparisons - remember the definition of the P value. It is the probability of observing a test statistic at least this extreme by chance alone, assuming the null hypothesis is true.

If you use a threshold of α = 0.05 (or 1-in-20) and you carry out, say, 20 statistical tests, you would expect about one of them to produce a P value below the threshold by chance alone, even if every null hypothesis is true.

You should use a lower threshold if you are carrying out multiple comparisons. There are correction methods that will let you calculate how much lower the threshold should be.
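The arithmetic can be sketched as follows, assuming the 20 tests are independent and every null hypothesis is true. The Bonferroni correction shown here is one common way of lowering the threshold; it is used as an illustration and is not necessarily the method the original article had in mind.

```python
alpha = 0.05
number_of_tests = 20

# Expected number of false positives across all tests.
expected_false_positives = alpha * number_of_tests

# Probability that at least one test is a false positive,
# assuming the tests are independent.
family_wise_error_rate = 1 - (1 - alpha) ** number_of_tests

# Bonferroni correction: divide the threshold by the number of tests.
bonferroni_alpha = alpha / number_of_tests

print(f"Expected false positives: {expected_false_positives:.1f}")
print(f"Chance of at least one false positive: {family_wise_error_rate:.2f}")
print(f"Bonferroni-corrected threshold: {bonferroni_alpha:.4f}")
```

Under these assumptions the chance of at least one spurious 'significant' result is well over one half, which is why a per-test threshold of 0.05 is too lenient here.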

❌ The significance threshold means anything at all - it is entirely arbitrary. 0.05 is just a convention. The difference between p = 0.049 and p = 0.051 is pretty much the same as between p = 0.039 and p = 0.041.

This is one of the biggest weaknesses of hypothesis testing this way. It forces you to draw a line in the sand, even though no line can easily be drawn.

Therefore, always consider significance thresholds for what they are - totally arbitrary.


❌ Statistical significance means chance plays no part - far from it. Often, there are many causes for a given outcome. Some will be random, others less so.

Finding one non-random cause doesn't mean it explains all the differences between your variables. It is important not to mistake statistical significance for "effect size".

❌ P values are the only way to determine statistical significance - there are other approaches which are sometimes better.

As well as classical hypothesis testing, consider other approaches.



