STATISTICAL TESTING OF STATISTICAL HYPOTHESES

The concept of statistical hypothesis.

Types of hypotheses. Errors of the first and second kind

A hypothesis is an assumption about some properties of the phenomena under study. A statistical hypothesis is any statement about the general population that can be verified statistically, that is, from the results of observations in a random sample. Two types of statistical hypotheses are considered: hypotheses about the distribution law of the population and hypotheses about the parameters of known distributions.

Thus, the hypothesis that the time spent on assembling a machine unit in a group of machine shops producing the same products under roughly the same technical and economic conditions is distributed according to the normal law is a hypothesis about the distribution law. The hypothesis that the labor productivity of workers in two teams performing the same work under the same conditions does not differ (while the productivity of workers in each team follows a normal distribution) is a hypothesis about distribution parameters.

The hypothesis to be tested is called the null, or basic, hypothesis and is denoted H 0. The null hypothesis is opposed by a competing, or alternative, hypothesis, denoted H 1. As a rule, the competing hypothesis H 1 is the logical negation of the main hypothesis H 0.

An example of a null hypothesis is the following: the means of two normally distributed populations are equal; the competing hypothesis may then be that the means are not equal. Symbolically this is written as:

H 0: M(X) = M(Y); H 1: M(X) ≠ M(Y).

If the null (proposed) hypothesis is rejected, the competing hypothesis is accepted.
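The pair of hypotheses H 0: M(X) = M(Y) versus H 1: M(X) ≠ M(Y) can be illustrated with a short sketch. The data here are synthetic and the sample sizes and seed are arbitrary assumptions; the two-sample t-test is one standard criterion for this pair when both populations are normal.

```python
# Sketch: testing H0: M(X) = M(Y) against H1: M(X) != M(Y)
# on synthetic data (all numbers below are illustrative assumptions).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=10.0, scale=2.0, size=50)   # team 1, true mean 10
y = rng.normal(loc=10.0, scale=2.0, size=50)   # team 2, same true mean

t_stat, p_value = stats.ttest_ind(x, y)        # two-sample t-test
alpha = 0.05
reject_h0 = p_value < alpha                    # True => accept H1 instead
```

If `reject_h0` is `True`, the null hypothesis is rejected in favor of the competing one; otherwise there is no reason to reject H 0.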

Hypotheses are divided into simple and complex. A hypothesis that contains only one assumption is called simple. A complex hypothesis consists of a finite or infinite number of simple hypotheses.

For example, the hypothesis H 0: p = p 0 (the unknown probability p equals the hypothetical probability p 0) is simple, while the hypothesis H 0: p < p 0 is complex: it consists of countless simple hypotheses of the form H 0: p = p i, where p i is any number less than p 0.

The proposed statistical hypothesis may be correct or incorrect, so it must be verified from the results of observations in a random sample. Since the verification is carried out by statistical methods, it is called statistical.

When testing a statistical hypothesis, a specially constructed random variable is used, called a statistical criterion (or test statistic). The conclusion about the correctness (or incorrectness) of the hypothesis is based on studying the distribution of this random variable from the sample data. Therefore, statistical hypothesis testing is probabilistic in nature: there is always a risk of making a mistake when accepting (rejecting) a hypothesis. Errors of two kinds are possible.

Type I error is that the null hypothesis will be rejected even though it is in fact true.

Type II error is that the null hypothesis will be accepted, although the competing one is in fact true.

In most cases, the consequences of these errors are unequal. Which is worse depends on the specific formulation of the problem and the content of the null hypothesis. Consider examples. Suppose that at an enterprise the quality of products is judged by the results of sampling inspection. If the sample fraction of defective items does not exceed a predetermined value p 0, the batch is accepted. In other words, the null hypothesis H 0: p ≤ p 0 is put forward. If a Type I error is made in testing this hypothesis, we reject good products. If a Type II error is made, defective products are shipped to the consumer. Obviously, the consequences of the Type II error can be much more serious.

Another example comes from jurisprudence. Consider the work of judges as testing the presumption of innocence of the defendant. The main hypothesis to be tested is H 0: the defendant is innocent. The alternative hypothesis H 1 is then: the accused is guilty of the crime. The court may clearly make errors of the first or second kind in sentencing the defendant. A Type I error means the court punished an innocent person: the defendant was convicted although he did not in fact commit the crime. A Type II error means the court delivered a verdict of not guilty although the accused is in fact guilty. Obviously, the consequences of a Type I error are much more serious for the accused, while for society the consequences of a Type II error are the most dangerous.

The probability of committing a Type I error is called the significance level of the criterion and is denoted α.

In most cases, the significance level of the criterion is taken equal to 0.01 or 0.05. If, for example, the significance level is taken equal to 0.01, then this means that in one case out of a hundred there is a risk of making a type I error (that is, rejecting the correct null hypothesis).

The probability of committing a Type II error is denoted β. The probability of not making a Type II error, that is, of rejecting the null hypothesis when it is false, equals 1 − β and is called the power of the criterion.
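For a concrete feel for α, β, and power, here is a small sketch for a right-sided z-test of H 0: m = 0 against H 1: m = 0.5 with known σ = 1. All the specific numbers (n, σ, m under H 1) are illustrative assumptions, not taken from the text.

```python
# Sketch: significance level (alpha) and power (1 - beta) of a right-sided
# z-test of H0: m = 0 vs H1: m = 0.5, known sigma = 1, sample size n.
# The numbers are illustrative assumptions.
import math
from scipy.stats import norm

n, sigma = 25, 1.0
alpha = 0.05
k_crit = norm.ppf(1 - alpha)       # critical point: P(Z > k_crit | H0) = alpha
m1 = 0.5                           # mean under H1

# Under H1 the statistic Z = sqrt(n) * xbar / sigma is N(sqrt(n)*m1/sigma, 1),
# so the probability of (wrongly) accepting H0 is:
beta = norm.cdf(k_crit - math.sqrt(n) * m1 / sigma)
power = 1 - beta                   # probability of correctly rejecting H0
```

With these assumed numbers the power comes out near 0.8, a value often taken as a practical target when planning sample sizes.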

Statistical criterion.

Critical regions

A statistical hypothesis is tested using a specially selected random variable whose exact or approximate distribution is known (we denote it K). This random variable is called a statistical criterion (or simply a criterion).

Various statistical criteria are used in practice: the U- and Z-criteria (these random variables have a normal distribution); the F-criterion (the random variable follows the Fisher-Snedecor law); the t-criterion (Student's law); the χ²-criterion (the chi-square law), etc.

The set of all possible values of the criterion can be divided into two non-overlapping subsets: one contains the values of the criterion for which the null hypothesis is accepted, the other those for which it is rejected.

The set of criterion values for which the null hypothesis is rejected is called the critical region. We denote the critical region by W.

The set of criterion values for which the null hypothesis is accepted is called the hypothesis acceptance region (or the region of admissible values of the criterion). We denote this region by W̄.

To test the validity of the null hypothesis, the observed value of the criterion is calculated from the sample data. We denote it K obs.

The basic principle of testing statistical hypotheses can be formulated as follows: if the observed value of the criterion falls into the critical region (that is, K obs ∈ W), the null hypothesis is rejected; if the observed value falls into the acceptance region (that is, K obs ∈ W̄), there is no reason to reject the null hypothesis.

What principles should be followed when constructing a critical region W ?

Suppose the hypothesis H 0 is actually true. Then, by the basic principle of testing statistical hypotheses, the criterion falling into the critical region entails rejecting the correct hypothesis H 0, that is, making a Type I error. Therefore, the probability that the criterion falls into the region W when H 0 is true must equal the significance level of the criterion:

P(K ∈ W | H 0) = α.

Note that the probability of a Type I error is chosen to be sufficiently small (as a rule, α ≤ 0.05). Then the criterion falling into the critical region W when H 0 is true can be considered a practically impossible event. If, according to the sample data, the event K ∈ W nevertheless occurred, it can be considered incompatible with the hypothesis H 0 (which is therefore rejected) but compatible with the hypothesis H 1 (which is therefore accepted).

Now suppose the hypothesis H 1 is true. Then the criterion falling into the acceptance region leads to accepting the incorrect hypothesis H 0, that is, to a Type II error. Therefore

P(K ∈ W̄ | H 1) = β.

Since the events K ∈ W and K ∈ W̄ are mutually opposite, the probability that the criterion falls into the critical region W when H 1 is true equals the power of the criterion:

P(K ∈ W | H 1) = 1 − β.

Obviously, the critical region should be chosen so that, at a given significance level, the power of the criterion 1 − β is maximal. Maximizing the power of the test minimizes the probability of a Type II error.

It should be noted that however small the significance level α, the criterion falling into the critical region is only an unlikely, not an absolutely impossible, event. So it is possible that, with a true null hypothesis, the value of the criterion calculated from the sample data will still land in the critical region. Rejecting the hypothesis H 0 in this case, we make a Type I error with probability α. The smaller α, the less likely a Type I error. However, as α decreases, the critical region shrinks, and it becomes less likely that the observed value K obs falls into it even when the hypothesis H 0 is wrong. At α = 0 the hypothesis H 0 is always accepted regardless of the sample results. Therefore, decreasing α increases the probability of accepting an incorrect null hypothesis, that is, of making a Type II error. In this sense, errors of the first and second kind compete.

Since errors of the first and second kind cannot be eliminated, one should at least strive, in each specific case, to minimize the losses from them. It is of course desirable to reduce both errors simultaneously, but since they compete, decreasing the probability of one increases the probability of the other. The only way to decrease both risks simultaneously is to increase the sample size.
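The effect of sample size can be sketched numerically: holding α fixed for the right-sided z-test of H 0: m = 0 against H 1: m = 0.5 (σ = 1, all numbers assumed for illustration), β falls steadily as n grows.

```python
# Sketch: with alpha held fixed, beta decreases as the sample size n grows
# (right-sided z-test of H0: m = 0 vs H1: m = 0.5, sigma = 1 assumed).
import math
from scipy.stats import norm

alpha, m1, sigma = 0.05, 0.5, 1.0
k_crit = norm.ppf(1 - alpha)

def beta_of_n(n):
    # P(accept H0 | H1 is true) for sample size n
    return norm.cdf(k_crit - math.sqrt(n) * m1 / sigma)

betas = [beta_of_n(n) for n in (10, 25, 50, 100)]
```

Printing `betas` shows a strictly decreasing sequence: larger samples shrink the Type II error probability without touching the Type I error probability.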

Depending on the form of the competing hypothesis H 1, one constructs one-sided (right-sided and left-sided) or two-sided critical regions. The points separating the critical region W from the acceptance region are called critical points and denoted k cr. To find the critical region, one needs to know the critical points.

A right-sided critical region is described by the inequality K > k cr,r, where it is assumed that the right critical point k cr,r > 0. Such a region consists of the points to the right of k cr,r, that is, it contains the positive, sufficiently large values of the criterion K. To find k cr,r, one first sets the significance level α of the criterion; the right critical point k cr,r is then found from the condition P(K > k cr,r) = α. Why does exactly this requirement define a right-sided critical region? Since the probability of the event K > k cr,r is small, then, by the principle of the practical impossibility of unlikely events, this event should not occur in a single trial if the null hypothesis is true. If it nevertheless occurred, that is, the observed value of the criterion calculated from the sample data turned out to be greater than k cr,r, this can be explained by the null hypothesis being inconsistent with the observational data, and it should therefore be rejected. Thus the requirement P(K > k cr,r) = α determines the criterion values for which the null hypothesis is rejected, and these make up the right-sided critical region.

If K obs falls into the region of admissible values of the criterion, that is, K obs < k cr,r, the main hypothesis is not rejected, because it is compatible with the observational data. Note that the probability that the criterion falls into the region of admissible values when the null hypothesis is true equals 1 − α and is close to 1.
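A minimal sketch of the right-sided rule, assuming for illustration that the criterion K is standard normal under H 0 (the observed value below is invented):

```python
# Sketch: right-sided critical point from P(K > k_cr_right | H0) = alpha,
# assuming K ~ N(0, 1) under H0. The observed value is an invented example.
from scipy.stats import norm

alpha = 0.05
k_cr_right = norm.ppf(1 - alpha)   # quantile such that the right tail has mass alpha

k_obs = 2.1                        # illustrative observed criterion value
reject = k_obs > k_cr_right        # reject H0 when K_obs lands in the critical region
```

Here `k_cr_right` is about 1.645, so the invented `k_obs = 2.1` falls into the critical region and H 0 would be rejected.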

It must be remembered that the criterion value falling into the region of admissible values is not a rigorous proof of the validity of the null hypothesis. It indicates only that there is no significant discrepancy between the proposed hypothesis and the sample results. Therefore, in such cases we say that the observational data are consistent with the null hypothesis and there is no reason to reject it.

Other critical regions are constructed similarly.

Thus, a left-sided critical region is described by the inequality K < k cr,l, where k cr,l < 0. Such a region consists of the points to the left of the left critical point k cr,l, that is, it is a set of negative values of the criterion that are sufficiently large in absolute value. The critical point k cr,l is found from the condition P(K < k cr,l) = α, that is, the probability that the criterion takes a value less than k cr,l equals the accepted significance level when the null hypothesis is true.

A two-sided critical region is described by the inequalities K < k cr,l or K > k cr,r, where it is assumed that k cr,l < 0 and k cr,r > 0. Such a region is a set of criterion values sufficiently large in absolute value. The critical points are found from the requirement that the sum of the probabilities that the criterion takes a value less than k cr,l or greater than k cr,r equal the accepted significance level when the null hypothesis is true:

P(K < k cr,l) + P(K > k cr,r) = α.

If the distribution of the criterion K is symmetric about the origin, the critical points are located symmetrically about zero, so k cr,l = −k cr,r. The two-sided critical region then becomes symmetric and can be described by the inequality |K| > k cr, where k cr = k cr,r. The critical point k cr can be found from the condition

P(K < −k cr) = P(K > k cr) = α/2.
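The symmetric two-sided rule can be sketched the same way, again under the illustrative assumption that K is standard normal under H 0:

```python
# Sketch: symmetric two-sided critical points from
# P(K < -k_cr) = P(K > k_cr) = alpha / 2, assuming K ~ N(0, 1) under H0.
from scipy.stats import norm

alpha = 0.05
k_cr = norm.ppf(1 - alpha / 2)     # each tail carries mass alpha / 2

def reject_two_sided(k_obs):
    # reject H0 when the observed value is large in absolute value
    return abs(k_obs) > k_cr

inside = reject_two_sided(0.5)     # small |K_obs|: do not reject
outside = reject_two_sided(2.5)    # large |K_obs|: reject
```

For α = 0.05 this gives the familiar k cr ≈ 1.96.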

Remark 1. For each criterion K, the critical points at a given significance level α can be found from the corresponding condition only numerically. The results of the numerical calculations of k cr are given in the corresponding tables (see, for example, Appendices 4-6 in the file "Appendices").

Remark 2. The principle of testing a statistical hypothesis described above does not prove its truth or falsity. Accepting the hypothesis H 0 over the alternative hypothesis H 1 does not mean that we are sure of the absolute correctness of H 0; it merely means that H 0 agrees with the observational data we have, that is, it is a fairly plausible statement that does not contradict experience. It is possible that as the sample size n grows, the hypothesis H 0 will be rejected.

5. Main problems of applied statistics - data description, estimation and testing of hypotheses

Key Concepts Used in Hypothesis Testing

A statistical hypothesis is any assumption concerning the unknown distribution of random variables (elements). Here are the formulations of several statistical hypotheses:

1. The results of observations have a normal distribution with zero mathematical expectation.
2. The results of observations have a distribution function N(0,1).
3. The results of observations have a normal distribution.
4. The results of observations in two independent samples have the same normal distribution.
5. The results of observations in two independent samples have the same distribution.

Null and alternative hypotheses are distinguished. The null hypothesis is the hypothesis to be tested. An alternative hypothesis is any admissible hypothesis other than the null one. The null hypothesis is denoted H 0, the alternative H 1 (H from the English word "hypothesis").

The choice of null and alternative hypotheses is determined by the applied tasks facing the manager, economist, engineer, or researcher. Consider examples.

Example 11. Let the null hypothesis be hypothesis 2 from the list above, and the alternative hypothesis be hypothesis 1. This means the real situation is described by a probabilistic model in which the observation results are treated as realizations of independent identically distributed random variables with distribution function N(0, σ), where the parameter σ is unknown to the statistician. In this model the null hypothesis is written as:

H 0: σ = 1,

and an alternative like this:

H 1: σ ≠ 1.
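Example 11 can be sketched in code. With known zero mean, the sum of squared observations follows a chi-square law with n degrees of freedom under H 0: σ = 1, which gives a two-sided critical region; the data, seed, and sample size here are invented for illustration.

```python
# Sketch for Example 11: testing H0: sigma = 1 vs H1: sigma != 1 for a sample
# from N(0, sigma). With known zero mean, sum(x_i^2) ~ chi2(n) under H0.
# The data are synthetic and the seed is an arbitrary assumption.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)
n = 40
x = rng.normal(0.0, 1.0, size=n)           # generated with sigma = 1 (H0 true)
stat = float(np.sum(x ** 2))               # criterion value

alpha = 0.05
lo, hi = chi2.ppf(alpha / 2, n), chi2.ppf(1 - alpha / 2, n)
reject_h0 = stat < lo or stat > hi         # two-sided critical region
```

Since the data were generated with σ = 1, the criterion value will usually land between the two critical points and H 0 will not be rejected.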

Example 12. Let the null hypothesis again be hypothesis 2 from the list above, and the alternative hypothesis be hypothesis 3 from the same list. Then the probabilistic model of the managerial, economic, or production situation assumes that the observation results form a sample from a normal distribution N(m, σ) for some values of m and σ. The hypotheses are written as:

H 0: m= 0, σ = 1

(both parameters take fixed values);

H 1: m≠ 0 and/or σ ≠ 1

(i.e., either m ≠ 0, or σ ≠ 1, or both m ≠ 0 and σ ≠ 1).

Example 13. Let H 0 be hypothesis 1 from the list above and H 1 be hypothesis 3 from the same list. Then the probabilistic model is the same as in Example 12, and

H 0: m= 0, σ is arbitrary;

H 1: m≠ 0, σ is arbitrary.

Example 14. Let H 0 be hypothesis 2 from the list above, and according to H 1 the observation results have a distribution function F(x) not coinciding with the standard normal distribution function Φ(x). Then

H 0: F(x) = Φ(x) for all x (written as F(x) ≡ Φ(x));

H 1: F(x 0) ≠ Φ(x 0) for some x 0 (i.e., it is not true that F(x) ≡ Φ(x)).

Note. Here ≡ is the sign of identical coincidence of functions (i.e., coincidence for all possible values of the argument x).
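A sketch of Example 14: the Kolmogorov goodness-of-fit test compares the empirical distribution function with Φ(x) against a fully general alternative. The sample below is synthetic and the seed arbitrary.

```python
# Sketch for Example 14: Kolmogorov goodness-of-fit test of
# H0: F(x) = Phi(x) (standard normal) against a general alternative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
sample = rng.normal(0.0, 1.0, size=200)       # synthetic data; H0 is true here

d_stat, p_value = stats.kstest(sample, "norm")  # compares with N(0, 1)
reject_h0 = p_value < 0.05
```

`d_stat` is the maximal discrepancy between the empirical and the standard normal distribution functions; a small p-value would mean the discrepancy is too large to be compatible with H 0.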

Example 15. Let H 0 be hypothesis 3 from the list above, and according to H 1 the observation results have a distribution function F(x) that is not normal. Then

H 0: F(x) ≡ Φ((x − m)/σ) for some m, σ;

H 1: for any m, σ there is x 0 = x 0(m, σ) such that F(x 0) ≠ Φ((x 0 − m)/σ).

Example 16. Let H 0 be hypothesis 4 from the list above: according to the probabilistic model, two samples are drawn from populations with distribution functions F(x) and G(x) that are normal with parameters m 1, σ 1 and m 2, σ 2 respectively, and let H 1 be the negation of H 0. Then

H 0: m 1 = m 2 , σ 1 = σ 2 , and m 1 and σ 1 are arbitrary;

H 1: m 1 ≠ m 2 and/or σ 1 ≠ σ 2 .

Example 17. Let, under the conditions of Example 16, it is additionally known that σ 1 = σ 2 . Then

H 0: m 1 = m 2 , σ > 0, and m 1 and σ are arbitrary;

H 1: m 1 ≠ m 2 , σ > 0.

Example 18. Let H 0 be hypothesis 5 from the list above: according to the probabilistic model, two samples are drawn from populations with distribution functions F(x) and G(x) respectively, and let H 1 be the negation of H 0. Then

H 0: F(x) ≡ G(x), where F(x) is an arbitrary distribution function;

H 1: F(x) and G(x) are arbitrary distribution functions with

F(x 0) ≠ G(x 0) for some x 0.
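Example 18 corresponds to the two-sample Smirnov test, which needs no normality assumption. A sketch on synthetic data (sizes and seed invented):

```python
# Sketch for Example 18: testing H0: F(x) = G(x) for all x with the
# two-sample Smirnov test; no normality assumption is needed.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
sample1 = rng.normal(0.0, 1.0, size=100)
sample2 = rng.normal(0.0, 1.0, size=120)      # same distribution, so H0 is true

d_stat, p_value = stats.ks_2samp(sample1, sample2)
reject_h0 = p_value < 0.05
```

Here the test statistic is the maximal gap between the two empirical distribution functions.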

Example 19. Let, under the conditions of Example 18, it additionally be assumed that the distribution functions F(x) and G(x) differ only by a shift, i.e., G(x) = F(x − a) for some a. Then

H 0: F(x) ≡ G(x),

where F(x) is an arbitrary distribution function;

H 1: G(x) = F(x- a), a ≠ 0,

where F(x) is an arbitrary distribution function.

Example 20. Let, under the conditions of Example 14, it additionally be known that, according to the probabilistic model of the situation, F(x) is a normal distribution function with unit variance, i.e., of the form N(m, 1). Then

H 0: m = 0 (i.e., F(x) = Φ(x) for all x; written as F(x) ≡ Φ(x));

H 1: m ≠ 0 (i.e., it is not true that F(x) ≡ Φ(x)).

Example 21. In the statistical regulation of technological, economic, managerial, or other processes, a sample is drawn from a population with a normal distribution and known variance, and the hypotheses tested are

H 0: m = m 0 ,

H 1: m= m 1 ,

where parameter value m = m 0 corresponds to the established course of the process, and the transition to m= m 1 indicates a breakdown.

Example 22. Under statistical acceptance control, the number of defective product units in the sample follows a hypergeometric distribution; the unknown parameter is p = D/N, the defect level, where N is the size of the batch of products and D is the total number of defective items in the batch. The control plans used in regulatory, technical, and commercial documentation (standards, supply contracts, etc.) are often aimed at testing the hypothesis

H 0: p < AQL

H 1: p > LQ,

where AQL is the acceptable defect level (acceptance quality level) and LQ is the rejectable defect level (limiting quality); obviously, AQL < LQ.
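Example 22 can be sketched numerically. For a single-sampling plan "accept the batch if the sample contains at most c defectives", the hypergeometric distribution gives the producer's risk (a Type I error at p = AQL) and the consumer's risk (a Type II error at p = LQ). All the plan parameters below are invented for illustration.

```python
# Sketch for Example 22: evaluating a single-sampling acceptance plan.
# The number of defectives in a sample of n items from a batch of N items
# containing D defectives is hypergeometric. All numbers are invented.
from scipy.stats import hypergeom

N, n, c = 1000, 80, 2                 # batch size, sample size, acceptance number
p_aql, p_lq = 0.01, 0.06              # assumed AQL and LQ defect levels

def accept_prob(p):
    D = round(N * p)                  # defectives in the batch at level p
    # scipy's hypergeom.cdf(k, M, n, N) takes: k, total M, successes n, draws N
    return hypergeom.cdf(c, N, D, n)  # P(at most c defectives in the sample)

producer_risk = 1 - accept_prob(p_aql)   # Type I error: good batch rejected
consumer_risk = accept_prob(p_lq)        # Type II error: bad batch accepted
```

With these assumed numbers the producer's risk is a few percent while the consumer's risk is noticeably larger, illustrating the asymmetry of the two errors discussed earlier.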

Example 23. A number of characteristics of the distributions of controlled indicators are used as indicators of the stability of a technological, economic, managerial, or other process, in particular the coefficient of variation v = σ/M(X). The null hypothesis to be tested is

H 0: v < v 0

under the alternative hypothesis

H 1: v > v 0 ,

where v 0 is some predetermined boundary value.

Example 24. Let the probabilistic model of the two samples be the same as in Example 18, and denote the mathematical expectations of the observation results in the first and second samples by M(X) and M(Y) respectively. In some situations the null hypothesis tested is

H 0: M(X) = M(Y)

against the alternative hypothesis

H 1: M(X) ≠ M(Y).
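For Example 24, when equality of variances cannot be assumed, one common choice is the Welch (unequal-variance) version of the t-test, in the spirit of the Cramer-Welch test mentioned later in the text. The data below are synthetic; sizes, means, and seed are invented.

```python
# Sketch for Example 24: testing H0: M(X) = M(Y) without assuming equal
# variances, via Welch's (unequal-variance) t-test. Data are synthetic.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = rng.normal(5.0, 1.0, size=60)
y = rng.normal(5.0, 2.0, size=80)     # same mean, different spread

t_stat, p_value = stats.ttest_ind(x, y, equal_var=False)  # Welch's test
reject_h0 = p_value < 0.05
```

The `equal_var=False` flag is what distinguishes Welch's test from the classical Student's t-test of the earlier sketch.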

Example 25. The great importance in mathematical statistics of distribution functions symmetric about 0 was noted above. When checking symmetry,

H 0: F(- x) = 1 – F(x) for all x, otherwise F arbitrary;

H 1: F(- x 0 ) ≠ 1 – F(x 0 ) at some x 0 , otherwise F arbitrary.

In probabilistic-statistical decision-making methods, many other formulations of problems for testing statistical hypotheses are also used. Some of them are discussed below.

The specific task of testing a statistical hypothesis is fully described once the null and alternative hypotheses are given. The choice of a method for testing a statistical hypothesis, and the properties and characteristics of the methods, are determined by both the null and the alternative hypotheses. To test the same null hypothesis under different alternative hypotheses, generally speaking, different methods should be used. Thus, in Examples 14 and 20 the null hypothesis is the same but the alternatives differ. Therefore, under the conditions of Example 14 one should use methods based on goodness-of-fit criteria (of the Kolmogorov or omega-square type), while under the conditions of Example 20 one should use methods based on Student's test or the Cramer-Welch test. If Student's test were used under the conditions of Example 14, it would not solve the problem posed. If a Kolmogorov-type goodness-of-fit test is used under the conditions of Example 20, it will, on the contrary, solve the problem, although perhaps worse than Student's test, which is specially adapted to this case.

When processing real data, the correct choice of the hypotheses H 0 and H 1 is of great importance. Assumptions made, such as normality of the distribution, must be carefully justified, in particular by statistical methods. Note that in the vast majority of specific applied settings the distribution of observation results differs from normal.

A situation often arises in which the form of the null hypothesis follows from the formulation of the applied problem while the form of the alternative hypothesis is unclear. In such cases the alternative hypothesis should be considered in its most general form, and methods should be used that solve the problem for all possible H 1. In particular, when testing hypothesis 2 (from the list above) as the null, one should use as the alternative the hypothesis H 1 from Example 14 rather than from Example 20, unless there is special justification for the normality of the distribution of observation results under the alternative hypothesis.


Since statistics as a research method deals with data in which the patterns of interest to the researcher are distorted by various random factors, most statistical calculations are accompanied by testing some assumptions or hypotheses about the source of these data.

A pedagogical hypothesis (a scientific hypothesis about the advantage of one method or another) is, in the course of statistical analysis, translated into the language of statistical science and reformulated as at least two statistical hypotheses.

There are two types of hypotheses. The first type, descriptive hypotheses, describe causes and possible consequences. The second type, explanatory hypotheses, explain the possible consequences of certain causes and also characterize the conditions under which these consequences will necessarily follow, that is, they explain by virtue of what factors and conditions the consequence occurs. Descriptive hypotheses lack foresight, while explanatory hypotheses have it. Explanatory hypotheses lead researchers to assume the existence of regular relationships between phenomena, factors, and conditions.

Hypotheses in pedagogical research may suggest that one of the means (or a group of them) will be more effective than other means. Here, a hypothetical assumption is made about the comparative effectiveness of means, methods, methods, forms of education.

A higher level of hypothetical prediction is that the author of the study hypothesizes that some system of measures will not only be better than another, but among a number of possible systems it seems optimal in terms of certain criteria. Such a conjecture needs a more rigorous and therefore more detailed proof.

Kulaichev A.P. Methods and tools for data analysis in the Windows environment. Ed. 3rd, revised. and additional - M: InKo, 1999, pp. 129-131

Psychological-pedagogical dictionary for teachers and heads of educational institutions. - Rostov-n / D: Phoenix, 1998, p. 92

As a result of studying this chapter, the student should:

know

  • what is a statistical hypothesis;
  • ratio of theoretical, experimental and statistical hypotheses;
  • differences between null and alternative hypotheses;
  • the logic of evaluation, acceptance and rejection of statistical hypotheses;
  • notions of errors of the first and second kind, and of statistical significance (reliability);
  • differences between parametric and non-parametric statistics, the possibilities and limitations of these two types of statistical tests;

be able to

  • test the simplest hypotheses about the mean using Student's t-test for paired (dependent) and unpaired (independent) samples;
  • evaluate two samples for homogeneity using Student's t-test and Fisher's F-test;
  • build confidence intervals for the estimated parameters;

own

  • methodological apparatus and basic skills for proposing and testing statistical hypotheses;
  • skills in evaluating statistical hypotheses and constructing confidence intervals.

General strategy

You already know that in statistical analysis it is customary to distinguish between the concepts of "parameter" and "statistic". These differences are discussed in detail in Chap. 1; Table 2.1 summarizes that discussion.

Recall that any distribution can be characterized by certain theoretical parameters. Mathematical expectation, variance, skewness, and kurtosis are examples of such parameters of the distribution of a random variable in the general population. All of them, we note once again, are theoretical quantities that are almost never known in practice. In the researcher's practical work they can only be estimated, with varying accuracy, by calculating various statistics, which are not always equal to the theoretical parameter values, or to each other, as we saw in paragraph 1.4 when considering practical examples of estimating various distribution parameters of a personality trait such as femininity-masculinity.

Table 2.1

Relationship between parameters and statistics

And this is not surprising: after all, statistics reflect the behavior of random variables only in the sample formed by the experimenter, and not in the general population itself. Therefore, the experimenter may wonder how the calculated statistics correlate with the theoretical distribution parameters. In other words, the experimenter may be interested in whether the sample data at his disposal are actually drawn from a general population characterized by the distribution parameters assumed in the theory. To answer this question, the experimenter puts forward and tests statistical hypotheses.

Statistical hypotheses are assumptions about the possible values of the distribution parameters of a random variable in the general population. Statistical hypotheses are tested and analyzed by collecting data and constructing statistics. The tools for this work are statistical tests, or criteria, each of which is a set of standardized rules. Based on these rules, a decision is made about the truth or falsity of the statistical hypothesis.

Consider again the coin-tossing example. It can be assumed that the probability of heads when tossing a normal, unbiased, undamaged coin is 50%. This means that the expected number of heads in 100 tosses is 50. Testing this hypothesis consists in conducting such a trial, estimating the parameter of interest by calculating the corresponding statistic, and using that statistic to assess the plausibility of the proposed hypothesis. For example, after tossing a coin 100 times, we could check whether each side actually came up 50 times. It is likely, however, that the result of such a trial will differ somewhat from the theoretically expected one. In other words, even if heads come up a little less or a little more than 50 times, we are unlikely to have reason to believe the coin is counterfeit. The situation becomes suspicious when the deviation from the theoretically expected values is large, for example when heads do not come up even once in 100 tosses. Such an outcome seems extremely unlikely if everything is in order with the coin.

So, it is clear that if heads came up exactly 50 times in 100 tosses, everything is in order with the coin. If heads never came up, there is reason to believe something is wrong with it. But where is the line separating positive and negative conclusions? This question concerns the chosen decision criterion. It is precisely such decision rules that mathematical statistics develops for testing statistical hypotheses; this is why statistical tests are often called statistical criteria.
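The coin reasoning can be made quantitative with a sketch: under H 0: p = 0.5, how surprising is a given number of heads in 100 tosses? The two-sided tail probability below is one simple way to measure surprise.

```python
# Sketch of the coin example: the probability, under H0: p = 0.5, of a result
# at least as extreme as the observed number of heads in 100 tosses.
from scipy.stats import binom

n, p0 = 100, 0.5

def two_sided_p(heads):
    # double the smaller tail, capped at 1
    lower = binom.cdf(heads, n, p0)            # P(X <= heads)
    upper = 1 - binom.cdf(heads - 1, n, p0)    # P(X >= heads)
    return min(1.0, 2 * min(lower, upper))

p_50 = two_sided_p(50)    # exactly the expected count: not surprising at all
p_0 = two_sided_p(0)      # no heads in 100 tosses: essentially impossible under H0
```

Exactly 50 heads gives a tail probability of 1 (nothing could be less surprising), while 0 heads gives a probability so small that H 0 would be rejected at any reasonable significance level.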

Thus, statistical hypotheses are tested by estimating the probability of the random event represented by the value of the statistic. If this probability turns out to be very small under the condition that the proposed hypothesis is true, the tested statistical hypothesis is rejected; otherwise it is accepted.

The difficulty of this procedure, however, may lie in the fact that we may not know in advance the specific value of the distribution parameter of the analyzed random variable. For example, in the case of a coin, it can be assumed that the coin is counterfeit, and, therefore, the probability of falling heads is more or less different from 50%. In this case, after conducting a series of tests, we will not be able to assess the degree of difference between the obtained statistics, which characterizes the value of the mathematical expectation of the analyzed event, and its actual value. And then testing the statistical hypothesis may seem impossible. The way out of this situation, however, may be to estimate the probability of a hypothesis opposite to the one put forward. In other words, in this case it is possible, for example, to put forward a hypothesis about the equality of the theoretical probability of 50%. If this hypothesis turns out to be false, the alternative hypothesis is accepted.

Indeed, when testing statistical hypotheses, the researcher always deals with not one, but two hypotheses, which are denoted as H 0 and H 1. One of these hypotheses is called null, the other is called alternative, i.e. refuting zero.

The null hypothesis H 0 is always specific: it asserts some particular value of a distribution parameter. For example, a hypothesis about the expectation might be formulated as μ = A, where A is some specific value of μ, and a hypothesis about the equality of two variances as σ1 = σ2.

The alternative hypothesis H 1 is always formulated less specifically, for example: μ > A, or σ1 ≠ σ2. As a rule, however, the experimenter is interested not in the specific null hypothesis H 0 but precisely in the less specific alternative hypothesis H 1, since it is the one that better matches the scientific hypothesis he tests in the experiment.

When empirically estimating a theoretical parameter, the experimenter determines the statistical significance of the obtained result, taking as a basis the assumption that H 0 is true. Statistical significance is the probability that, in an infinite number of experiments completely reproducing the conditions of the given one, we would obtain the same or an even greater value of the constructed statistic. If this probability, given that the null hypothesis is true, turns out to be small, the experimenter abandons the null hypothesis in favor of the alternative.

The logic just described is illustrated in Fig. 2.1. Two alternative hypotheses are put forward. One of them is specific and assumes that the mathematical expectation equals zero; this hypothesis is labeled H 0, and the curve corresponding to it describes the distribution of the random variable Z that it predicts. The second hypothesis, denoted H 1, is less specific: it states only that the mathematical expectation must exceed zero. In principle, infinitely many distribution curves correspond to this hypothesis; the one shown is merely one of the possibilities. The value Z exp is the value of the statistic estimating the theoretical parameter μ in the experiment. This is what the experimenter has at his disposal, obtained by collecting empirical data; it can be, for example, the arithmetic mean of the sample. Testing the hypotheses then consists in estimating how likely it is, in another similar experiment, to obtain the same value Z exp or an even greater one if the null hypothesis is true. Obviously, this probability equals the area under the distribution curve assumed by that hypothesis, bounded on the left by the calculated statistic and unbounded on the right. Such an area, as we remember (see paragraph 1.2), is called a quantile of the distribution. It can be defined like this:

p = P( Z ≥ Z exp | H 0 ) = ∫ f H0( Z ) dZ, taken from Z exp to +∞.

Fig. 2.1.

The value of the quantile p in this equation is the so-called significance level of the calculated statistic Z exp. The larger this value, the more likely it is that the data obtained in the experiment are described by the distribution f H0( Z ), i.e. the distribution predicted by the hypothesis H 0. Conversely, the smaller the value of p, the less likely it is that the empirical data actually fit the distribution f H0( Z ), and the more likely that they are described by a distribution assuming a higher value of μ. Thus, by evaluating p, a decision can be made in favor of one of the two hypotheses put forward.
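For a statistic Z that is standard normal under H 0, this significance level p is simply the right-tail area beyond Z exp. A minimal sketch (the function name is mine; `math.erfc` is the complementary error function from the Python standard library):

```python
from math import erfc, sqrt

def one_sided_p(z_exp):
    """P(Z >= z_exp | H0) for a standard normal Z: the right-tail area
    bounded on the left by the observed statistic."""
    return 0.5 * erfc(z_exp / sqrt(2))

print(round(one_sided_p(1.64), 3))  # ~0.05: on the conventional significance boundary
print(round(one_sided_p(2.33), 3))  # ~0.01: a highly significant result
```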

The hypothesis H 0 can be accepted if the value of the quantile determining the statistical significance of the empirical value turns out to be large enough. The alternative hypothesis H 1 is accepted if this quantile turns out to be negligibly small. The problem, however, is which value of the quantile should be considered sufficiently large and which negligibly small. To solve this problem, let us take a closer look at the options an experimenter faces when evaluating statistical hypotheses (Table 2.2).

It is clear that the statistical hypotheses put forward can be either true or false. Since the hypotheses H 0 and H 1 are alternatives, i.e. mutually exclusive, there are only two hypothetical cases: either H 0 is correct and H 1 accordingly incorrect, or vice versa. Since the experimenter never knows which of the hypotheses is correct, his decision to accept or reject the hypothesis H 0 is made without knowledge of its actual truth or falsity; after all, that is precisely what he is trying to establish. Thus, in the course of testing statistical hypotheses there are four possible outcomes, of which only two can be considered favorable for the experimenter, regardless of which hypothesis the researcher actually wants to prove.

Table 2.2

Outcome Matrix in Evaluation of Statistical Hypotheses

If the hypothesis H 0 is correct and is accepted as a result of the statistical analysis, the experimenter makes no mistake; this is a favorable outcome for the researcher, even if he would have liked to accept the alternative hypothesis. Nor does the experimenter err when he rejects a hypothesis H 0 that is actually incorrect. However, it may happen that the null hypothesis is actually true but the experimenter rejects it anyway. In this case he makes what is commonly called a Type I error, or α (alpha) error. A Type II error, or β (beta) error, is the outcome in which the experimenter accepts a null hypothesis that is in fact false.

It is clear that the higher the probability threshold at which the experimenter is ready to abandon the null hypothesis in favor of the alternative, the greater the probability of a Type I error and the lower the probability of a Type II error (Fig. 2.2). Conversely, by decreasing the probability at which he rejects the null hypothesis, the experimenter runs a greater risk of a Type II error but better protects himself against a Type I error. Thus, the question of the significance level at which the hypothesis H 0 can be rejected or accepted comes down to which of the two possible errors matters less to the experimenter. By applying a more conservative strategy for testing a statistical hypothesis, the experimenter neglects the danger of a Type II error; by applying a more radical strategy, he in effect disregards the Type I error.
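This trade-off can be demonstrated by simulation. The sketch below (all names and numeric values are mine, assuming normally distributed data with unit variance) estimates α and β for two decision thresholds: tightening the threshold lowers α but raises β:

```python
import random

def error_rates(threshold, mu1=0.5, n=25, trials=20000, seed=1):
    """Estimate alpha (P(reject H0) when mu = 0) and beta (P(accept H0)
    when mu = mu1) for the rule: reject H0 if the sample mean > threshold."""
    rng = random.Random(seed)
    def sample_mean(mu):
        return sum(rng.gauss(mu, 1) for _ in range(n)) / n
    alpha = sum(sample_mean(0.0) > threshold for _ in range(trials)) / trials
    beta = sum(sample_mean(mu1) <= threshold for _ in range(trials)) / trials
    return alpha, beta

a1, b1 = error_rates(threshold=0.33)  # roughly the 5% significance level for n = 25
a2, b2 = error_rates(threshold=0.47)  # roughly the 1% significance level for n = 25
print(a2 < a1 and b2 > b1)  # True: the stricter threshold trades alpha for beta
```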

Fig. 2.2.

If the acceptance of a statistical hypothesis entails important social consequences, a more conservative evaluation strategy can be applied. If serious consequences may follow from failing to accept the hypothesis, one can proceed less conservatively.

For example, suppose the question of diagnosing mental retardation in a particular child is being considered. A psychological examination found that his IQ is below the average for this population of subjects. This gave rise to the assumption that the child's intellectual development is insufficient and that he should therefore be sent to a special boarding school for the mentally retarded. To test this assumption, two alternative statistical hypotheses were formulated: the first assumes that the data obtained during the examination characterize the usual population distribution with a mathematical expectation equal to the boundary defining mental retardation, say 75 points (hypothesis H 0); the second assumes a lower value of the mathematical expectation, i.e. one below this boundary (hypothesis H 1). Suppose further that the assessment of the statistical significance of the empirical indicator of the child's intellectual development showed that the probability of obtaining the same or an even lower result in another random test is no more than one chance in 20. The question arises: can one conclude from this result that the null hypothesis lacks empirical support and therefore abandon it in favor of the alternative hypothesis H 1? Clearly, the answer largely depends on which kind of erroneous action is considered more acceptable. If we are convinced that placing a normal child, albeit one with low mental abilities, in a boarding school for the mentally retarded is better than educating a mentally retarded child in an ordinary school, we will set the significance threshold one way; if we think otherwise, we must decide differently.

Fortunately, the researcher is usually spared this kind of dilemma. The point is that an optimal significance level, which could serve as a reference in choosing between statistical hypotheses, cannot be justified statistically. There are, however, some quasi-statistical conventions accepted by default (Table 2.3). An empirical result is considered statistically significant for rejecting the null hypothesis if the probability of obtaining the same or a greater (smaller) result in another random test is less than one chance in 20, i.e. when the value of p turns out to be less than 0.05. If p is less than 0.01, the result is considered highly significant for rejecting the null hypothesis. If p exceeds 0.10, it is considered that the experiment did not establish statistically significant differences from the theoretical parameter assumed by the null hypothesis. If the obtained value of p lies between 0.10 and 0.05, the result is considered indeterminate; it is said to lie on the boundary of significance levels, and such a result is also called marginally significant.

Table 2.3

Standard quantile values that determine statistical decision making

The described strategy for testing and accepting hypotheses is universal and the most common. A more conservative strategy may take the probability values 0.01 and 0.001 as the reliable and highly reliable levels, respectively, and set 0.05 as the unreliable level (O. Yu. Ermolaev). The marginally significant result is then one in the range from 0.01 to 0.05. Such a strategy, however, is rarely used in psychological research.
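The default conventions described above can be captured in a short decision helper. This is only a sketch of the mapping given in the text (the function name and the exact wording of the verdicts are mine):

```python
def verdict(p):
    """Map a p-value to the conventional decision described in the text:
    p < 0.01 highly significant, p < 0.05 significant,
    0.05 <= p <= 0.10 indeterminate, p > 0.10 not significant."""
    if p < 0.01:
        return "highly significant: reject H0"
    if p < 0.05:
        return "significant: reject H0"
    if p <= 0.10:
        return "marginally significant: indeterminate"
    return "not significant: no grounds to reject H0"

print(verdict(0.003))  # highly significant: reject H0
print(verdict(0.07))   # marginally significant: indeterminate
```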

In any case, it must be borne in mind that the results of the analysis of statistical hypotheses cannot be considered sufficient for evaluating experimental hypotheses if they are taken on their own, without connection with the entire experimental situation.

Statistical hypotheses should not be confused with experimental and theoretical hypotheses. Theoretical hypotheses reflect the nature of the connections and regularities of the phenomena under study. Experimental hypotheses are put forward on the basis of theoretical knowledge in the given area and thus concretize the theoretical hypotheses themselves. Like statistical hypotheses, they involve the simultaneous formulation of competing hypotheses that deny the existence of the supposed causal relationship. For this reason the empirical regularity under study may admit different causal interpretations, called competing hypotheses.

Unlike experimental ones, statistical hypotheses are only a tool for evaluating the data collected during the experiment and do not initially imply any empirical regularity. The result of their verification is only statistical in nature and therefore does not imply automatic acceptance or rejection of both experimental and, even more so, theoretical hypotheses.

STATISTICAL HYPOTHESES

The sample data obtained in experiments are always limited and are largely random. That is why mathematical statistics is used to analyze such data, which makes it possible to generalize the patterns obtained in the sample and extend them to the entire general population.

The data obtained as a result of an experiment on any sample serve as the basis for judging the general population. However, due to random causes, an estimate of the parameters of the general population made on the basis of experimental (sample) data will always be accompanied by an error, and therefore such estimates should be considered conjectural rather than final statements. Such assumptions about the properties and parameters of the general population are called statistical hypotheses. As G. V. Sukhodolsky puts it: "A statistical hypothesis is usually understood as a formal assumption that the similarity (or difference) of some parametric or functional characteristics is random or, conversely, non-random."

The essence of testing a statistical hypothesis is to establish whether the experimental data agree with the proposed hypothesis, i.e. whether the discrepancy between the hypothesis and the result of the statistical analysis of the experimental data may be attributed to random causes. Thus, a statistical hypothesis is a scientific hypothesis that admits statistical verification, and mathematical statistics is the scientific discipline whose task is to provide a scientifically grounded procedure for testing statistical hypotheses.

Statistical hypotheses are divided into null and alternative, directional and non-directional.

The null hypothesis (H 0) is the hypothesis of no difference. If we want to prove the significance of differences, the null hypothesis must be refuted; otherwise it must be confirmed.

The alternative hypothesis (H 1) is the hypothesis that the differences are significant. This is what we usually want to prove, which is why it is sometimes called the experimental hypothesis.

There are also tasks in which we want to prove precisely the insignificance of differences, that is, to confirm the null hypothesis: for example, when we need to make sure that different subjects receive tasks that, although different, are balanced in difficulty, or that the experimental and control samples do not differ in some essential characteristics. More often, however, we need to prove the significance of differences, for they are more informative in our search for the new.

The null and alternative hypotheses can be directional or non-directional.

Directed hypotheses are formulated if it is assumed that in one group the values of the characteristic are higher and in the other lower:

H 0: X 1 does not exceed X 2,

H 1: X 1 exceeds X 2.

Undirected hypotheses are formulated if it is assumed only that the distributions of the characteristic in the groups differ:

H 0: X 1 does not differ from X 2,

H 1: X 1 differs from X 2.

If we notice that in one of the groups the individual values of the subjects on some characteristic, for example social activity, are higher, while in the other they are lower, then to test the significance of these differences we need to formulate directed hypotheses.

If we want to prove that in group A, under some experimental influence, more pronounced changes occurred than in group B, we also need to formulate directed hypotheses.

If we want to prove that the distributions of the characteristic differ between groups A and B, undirected hypotheses are formulated.
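The practical consequence of choosing a directed versus an undirected hypothesis is a one-tailed versus a two-tailed probability. A sketch under the assumption that the standardized group difference is normal (the value of z is hypothetical, the function name is mine):

```python
from math import erfc, sqrt

def normal_tail(z):
    """Right-tail area of the standard normal distribution."""
    return 0.5 * erfc(z / sqrt(2))

z = 1.9  # hypothetical standardized difference between the two groups
p_directed = normal_tail(z)             # H1: X1 exceeds X2 (one tail)
p_undirected = 2 * normal_tail(abs(z))  # H1: X1 differs from X2 (two tails)
print(p_directed < 0.05 <= p_undirected)  # True: significant only as a directed test
```

The same data can thus clear the 0.05 threshold under a directed hypothesis while failing it under an undirected one.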

Hypothesis testing is carried out using criteria for the statistical evaluation of differences.

The resulting conclusion is called a statistical decision. We emphasize that such a decision is always probabilistic. When testing a hypothesis, the experimental data may contradict the hypothesis H 0, in which case this hypothesis is rejected. Otherwise, i.e. if the experimental data agree with the hypothesis H 0, it is not rejected; in such cases it is often said that the hypothesis H 0 is accepted. This shows that statistical testing of hypotheses on experimental sample data is inevitably associated with the risk (probability) of making a false decision. Two kinds of errors are possible here. A Type I error occurs when the decision is made to reject the hypothesis H 0 although it is in fact true. A Type II error occurs when the decision is made not to reject the hypothesis H 0 although it is in fact false. Obviously, correct conclusions can also be drawn in two cases. Table 7.1 summarizes the above.

Table 7.1

As Table 7.1 shows, a psychologist can be mistaken in his statistical decision in only two ways. Since errors in the acceptance of statistical hypotheses cannot be excluded, their possible consequences, i.e. the probability of accepting an incorrect statistical hypothesis, must be minimized. In most cases the only way to minimize this error is to increase the sample size.

STATISTICAL CRITERIA

A statistical test is a decision rule that ensures reliable behavior, that is, accepting a true hypothesis and rejecting a false one with high probability.

The term statistical criterion also designates both the method of calculating a certain number and that number itself.

When we say that the significance of differences was determined by the φ* criterion (Fisher's angular transformation), we mean that the φ* method was used to calculate a specific number.

By the ratio of the empirical and critical values of the criterion we can judge whether the null hypothesis is confirmed or refuted.

In most cases, in order for us to recognize differences as significant, it is necessary that the empirical value of the criterion exceed the critical value, although there are criteria (for example, the Mann-Whitney test or the sign test) in which we must adhere to the opposite rule.
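This decision rule can be stated compactly. A sketch (the function name and the numeric values are mine, purely illustrative):

```python
def decision(empirical, critical, larger_is_significant=True):
    """Compare an empirical criterion value with the critical one.
    For most criteria differences are significant when the empirical value
    exceeds the critical one; for criteria such as the Mann-Whitney U test
    or the sign test the rule is reversed."""
    significant = (empirical >= critical) if larger_is_significant else (empirical <= critical)
    return "reject H0" if significant else "retain H0"

print(decision(7.2, 5.99))                            # reject H0
print(decision(30, 23, larger_is_significant=False))  # retain H0
```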

In some cases the calculation formula of the criterion includes the number of observations in the sample, denoted n. In such cases the empirical value of the criterion is by itself sufficient for testing the statistical hypotheses: using a special table, we determine what level of statistical significance of differences corresponds to the given empirical value. An example is the φ* criterion, calculated on the basis of Fisher's angular transformation.

In most cases, however, the same empirical value of the criterion may turn out to be significant or insignificant depending on the number of observations in the sample (n) or on the so-called number of degrees of freedom, denoted ν or df.

The number of degrees of freedom ν equals the number of classes of the variation series minus the number of conditions under which it was formed. These conditions may include the sample size (n), the mean, and the variance.

Suppose a group of 50 people was divided into three classes according to the principle:

Able to work on a computer;

Able to perform only certain operations;

Can't work on a computer.

There were 20 people in the first and second groups, and 10 in the third.

We are limited by only one condition, the sample size. Therefore, even if we had lost the data on how many people cannot work on a computer, we could determine it, knowing that there are 20 subjects in each of the first two classes. We are not free to choose the number of subjects in the third category; our "freedom" extends only to the first two cells of the classification:
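The arithmetic of this example can be sketched as follows (the dictionary keys paraphrase the three classes above; all names are mine):

```python
# The three classes from the text; the third count is treated as "lost".
counts = {"can work on a computer": 20,
          "can perform only certain operations": 20,
          "cannot work on a computer": None}
n = 50  # the single fixed condition: total sample size

known = sum(v for v in counts.values() if v is not None)
counts["cannot work on a computer"] = n - known  # recoverable from n alone

k = len(counts)       # number of classes in the series
constraints = 1       # only n is fixed here
df = k - constraints
print(df)  # 2: only two cells can be filled freely
```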