Scattering characteristics

Sample dispersion measures.

The minimum and maximum of a sample are, respectively, the smallest and largest values of the variable under study. The difference between the maximum and the minimum is called the range of the sample. All sample data lie between the minimum and the maximum, so these two indicators outline the boundaries of the sample.

R₁ = 15.6 − 10 = 5.6

R₂ = 0.85 − 0.6 = 0.25

The sample variance and the sample standard deviation are measures of the variability of a variable; they characterize how widely the data are spread around the center. The standard deviation is the more convenient of the two because it is expressed in the same units as the data under study. For this reason, the standard deviation is reported together with the arithmetic mean of the sample to summarize the results of data analysis.

It is more convenient to calculate the sample variance by the formula:

$s^2 = \dfrac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$

The standard deviation is calculated using the formula:

$s = \sqrt{s^2}$
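A minimal Python sketch of the two formulas: the variance as the sum of squared deviations from the mean divided by n − 1, and the standard deviation as its square root. The function names and the sample values are hypothetical, chosen only for illustration; the result is cross-checked against Python's standard `statistics` module, which also divides by n − 1.

```python
import math
import statistics

def sample_variance(data):
    """Sample variance: sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)

def sample_std(data):
    """Sample standard deviation: the square root of the sample variance."""
    return math.sqrt(sample_variance(data))

# Hypothetical measurements, used only for illustration
data = [10.0, 12.5, 11.0, 15.6, 13.2]
print(round(sample_variance(data), 4), round(sample_std(data), 4))

# Cross-check against the standard library (also uses the n - 1 denominator)
assert math.isclose(sample_variance(data), statistics.variance(data))
```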

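The coefficient of variation is a relative measure of the spread of an attribute: the ratio of the standard deviation to the mean, expressed as a percentage, $V = \dfrac{s}{\bar{x}} \cdot 100\%$.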

The coefficient of variation is also used as an indicator of the homogeneity of sample observations. It is generally assumed that if the coefficient of variation does not exceed 10%, the sample can be considered homogeneous, i.e., drawn from a single population.

Since the coefficient of variation does not exceed 10% in either sample, both samples are homogeneous.
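The homogeneity check can be sketched in Python as follows, assuming positive-valued data so that the mean is not zero. The helper names and both samples are hypothetical; the 10% threshold is the rule of thumb stated above.

```python
import statistics

def coefficient_of_variation(data):
    """Coefficient of variation in percent: sample standard deviation / mean * 100.

    Assumes positive-valued data, so the mean is not zero.
    """
    mean = statistics.mean(data)
    if mean == 0:
        raise ValueError("the coefficient of variation is undefined for a zero mean")
    return statistics.stdev(data) / mean * 100

def is_homogeneous(data, threshold=10.0):
    """Rule of thumb from the text: a sample with V <= 10% is treated as homogeneous."""
    return coefficient_of_variation(data) <= threshold

# Hypothetical samples for illustration
tight = [20, 21, 19, 20, 20]
loose = [1, 10, 20]
print(round(coefficient_of_variation(tight), 2), is_homogeneous(tight))
print(round(coefficient_of_variation(loose), 2), is_homogeneous(loose))
```

The threshold is a parameter because other sources in this document use different cut-offs (for example 30-35% for heterogeneity).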

A sample can be represented analytically as a distribution function, or as a frequency table consisting of two rows: the upper row lists the elements of the sample (the variants) in ascending order, and the lower row records the frequency of each variant.

The frequency of a variant is the number of times this variant occurs in the sample.
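Building the two-row frequency table can be sketched in Python with `collections.Counter`. The function name and the sample are hypothetical illustrations.

```python
from collections import Counter

def frequency_table(sample):
    """Return (variants, frequencies): distinct values in ascending order and their counts."""
    counts = Counter(sample)
    variants = sorted(counts)
    return variants, [counts[v] for v in variants]

# Hypothetical sample
variants, freqs = frequency_table([3, 5, 3, 4, 5, 5, 2])
print("x_i:", *variants)   # upper row: the variants
print("m_i:", *freqs)      # lower row: their frequencies
```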

Sample #1 "Mothers"

Type of distribution curve

Asymmetry, or the coefficient of skewness (the term was introduced by Pearson in 1895), is a measure of the asymmetry of a distribution. If the skewness differs noticeably from 0, the distribution is asymmetric; the density of the normal distribution is symmetric about the mean.

The skewness index is used to characterize how symmetrically the data are distributed around the center. Skewness can be either negative or positive. A positive value indicates that the bulk of the data (the peak of the curve) lies to the left of the center, with a longer tail to the right; a negative value indicates the opposite. Thus the sign of the skewness shows the direction of the asymmetry, and its magnitude shows the degree. A skewness of zero indicates that the data are symmetric about the center.

Because the skewness is positive, the peak of the curve is shifted to the left of the center.

The kurtosis coefficient is a measure of how tightly the bulk of the data clusters around the center.

With a positive kurtosis the curve is more sharply peaked; with a negative kurtosis it is flatter.

The curve is flattened;

The curve is peaked.

One of the reasons for carrying out statistical analysis is the need to take into account the influence of random factors (perturbations) on the indicator under study, which cause the data to scatter. Solving problems in which the data are scattered involves risk: even using all the available information, it is impossible to predict exactly what will happen in the future. To work adequately in such situations, it is advisable to understand the nature of the risk and to be able to measure the degree of dispersion of the data set. Three numerical characteristics describe dispersion: the standard deviation, the range, and the coefficient of variation. Unlike the measures of central tendency (mean, median, mode), which characterize the center, the scattering characteristics show how close the individual values of the data set are to this center.
Definition of standard deviation

The standard deviation is a measure of the random deviations of data values from the mean. In real life most data are scattered, i.e., individual values lie at some distance from the mean.

A summary measure of scatter cannot be obtained by simply averaging the deviations of the data from the mean: some deviations are positive and others negative, so their average may turn out to be zero. To get rid of the sign, a standard trick is used: first the variance is calculated as the sum of squared deviations divided by (n − 1), and then the square root of the result is taken. The formula for the standard deviation is:

$s = \sqrt{\dfrac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}$

Note 1. The variance carries no additional information compared with the standard deviation, but it is harder to interpret because it is expressed in "squared units", whereas the standard deviation is expressed in familiar units (for example, dollars).

Note 2. The formula above gives the standard deviation of a sample and is more precisely called the sample standard deviation. When calculating the standard deviation of a population (denoted σ), divide by n. The sample standard deviation is somewhat larger (because it divides by n − 1), which corrects for the randomness of the sample itself.

When the data set is normally distributed, the standard deviation takes on a special meaning. In the figure below, marks are placed on both sides of the mean at distances of one, two and three standard deviations. The figure shows that approximately two-thirds (about 68%) of all values lie within one standard deviation on either side of the mean, about 95% lie within two standard deviations, and almost all of the data (99.7%) lie within three standard deviations of the mean.
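[Figure: normal distribution curve with marks at one, two and three standard deviations on either side of the mean]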


This property of the standard deviation for normally distributed data is called the "two-thirds rule".
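For a normal distribution the share of values within k standard deviations of the mean is exactly erf(k/√2), whatever the mean and standard deviation. A short Python sketch using `math.erf` (the helper name is hypothetical):

```python
import math

def normal_fraction_within(k):
    """Share of a normal distribution lying within k standard deviations of the mean.

    P(|X - mu| < k * sigma) = erf(k / sqrt(2)) for any normal distribution.
    """
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, f"{normal_fraction_within(k):.1%}")
```

This gives about 68.3%, 95.4% and 99.7%, so the "two-thirds" figure is a rounded version of roughly 68%.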

In some situations, such as product quality control, limits are often set so that observations lying more than three standard deviations from the mean (about 0.3% of normally distributed data) are treated as worthy of attention.

Unfortunately, if the data is not normally distributed, then the rule described above cannot be applied.

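For skewed distributions, a bound known as Chebyshev's rule can be applied instead: for any distribution, at least $1 - 1/k^2$ of the values lie within k standard deviations of the mean (k > 1), for example at least 75% within two standard deviations and at least 88.9% within three.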
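A minimal Python sketch comparing Chebyshev's lower bound with the observed share on a small skewed data set. The data are hypothetical, and the check uses the population standard deviation (`statistics.pstdev`), for which the inequality holds exactly on the data themselves.

```python
import statistics

def chebyshev_lower_bound(k):
    """Minimum share of values within k standard deviations of the mean, for any distribution."""
    if k <= 1:
        raise ValueError("the bound is informative only for k > 1")
    return 1 - 1 / k ** 2

def share_within(data, k):
    """Observed share of values within k population standard deviations of the mean."""
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)
    return sum(abs(x - mean) <= k * sd for x in data) / len(data)

# Hypothetical right-skewed data: many small values and a few large ones
skewed = [1] * 10 + [2] * 5 + [3] * 3 + [10, 25]

for k in (2, 3):
    print(k, round(chebyshev_lower_bound(k), 3), share_within(skewed, k))
```

The bound is deliberately conservative: it must hold for every distribution, so for a particular data set the observed share is usually much higher.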

Preparing the initial data

Table 1 shows the daily profit on the stock exchange, recorded on working days from July 31 to October 9, 1987.

Table 1. Dynamics of changes in daily profit on the stock exchange

Daily profit    Daily profit    Daily profit
-0.006           0.009           0.012
-0.004          -0.015          -0.004
 0.008          -0.006           0.002
 0.011           0.002          -0.008
-0.001           0.011          -0.010
 0.017           0.013          -0.013
 0.017           0.002           0.009
-0.004          -0.018          -0.020
 0.008          -0.014          -0.003
-0.002          -0.001          -0.001
 0.006          -0.001           0.017
-0.017          -0.013           0.001
 0.004           0.030          -0.000
 0.015           0.007          -0.035
 0.001          -0.007           0.001
-0.005           0.001          -0.014
Launch Excel.
Create a file. Click the Save button on the Standard toolbar, open the Statistics folder in the dialog box that appears, and name the file Scattering Characteristics.xls.
Enter the data. On Sheet1, in cell A1, enter the label Daily profit, and in the range A2:A49 enter the data from Table 1.
Calculate the average. In cell D1, enter the label Average. In cell D2, calculate the average with the AVERAGE statistical function: =AVERAGE(A2:A49).
Calculate the standard deviation. In cell D4, enter the label Standard deviation. In cell D5, calculate the standard deviation with the STDEV statistical function: =STDEV(A2:A49).
Round the results to four decimal places.
Interpretation of results. Daily profit declined by 0.04% on average (the average daily profit was -0.0004). This means that over the period considered the average daily profit was approximately zero, i.e., on average the market stayed flat. The standard deviation was 0.0118. This means that one dollar ($1) invested in the stock market typically changed in value by about $0.0118 per day, i.e., the investment could produce a daily profit or loss of roughly that size.
Let's check whether the daily profit values in Table 1 follow the rules of the normal distribution.

1. Calculate the interval corresponding to one standard deviation on either side of the mean.

2. In cells D7, D8 and F8, enter the labels One standard deviation, Lower limit and Upper limit, respectively.

3. In cell D9, enter the formula =D2-D5, and in cell F9, enter the formula =D2+D5 (the mean minus and plus one standard deviation).

4. Round the results to four decimal places.

5. Determine how many daily profit values lie within one standard deviation of the mean. First filter the data, keeping only the daily profit values in the interval [-0.0121, 0.0114]. To do this, select any cell in column A containing daily profit values and run the command:

Data → Filter → AutoFilter

Open the drop-down menu by clicking the arrow in the Daily profit header and select (Custom...). In the Custom AutoFilter dialog box, set the conditions "is greater than or equal to -0.0121" and "is less than or equal to 0.0114". Click OK.

To count the filtered values, select the range of daily profit values, right-click an empty area of the status bar, and choose Count from the context menu. Read the result. Then display all the original data with Data → Filter → Show All, and turn off the AutoFilter with Data → Filter → AutoFilter.

6. Calculate the percentage of daily profit values that lie within one standard deviation of the mean. To do this, enter the label Percent in cell H8, and in cell H9 enter a formula that calculates the percentage; round the result to one decimal place.

7. Calculate the interval of daily profit values within two standard deviations of the mean. In cells D11, D12 and F12, enter the labels Two standard deviations, Lower limit and Upper limit, respectively. In cells D13 and F13, enter the formulas and round the results to four decimal places.

8. Determine the number of daily profits that are within two standard deviations by first filtering the data.

9. Calculate the percentage of daily profit values that lie within two standard deviations of the mean. To do this, enter the label Percent in cell H12, and in cell H13 enter a formula that calculates the percentage; round the result to one decimal place.

10. Calculate the interval of daily profit values within three standard deviations of the mean. In cells D15, D16 and F16, enter the labels Three standard deviations, Lower limit and Upper limit, respectively. In cells D17 and F17, enter the formulas and round the results to four decimal places.

11. Determine how many daily profit values lie within three standard deviations of the mean, again by filtering the data first.

12. Calculate the percentage of these daily profit values. To do this, enter the label Percent in cell H16, and in cell H17 enter a formula that calculates the percentage; round the result to one decimal place.

13. Plot a histogram of the daily profit values and place it, together with the frequency distribution table, in the range J1:S20. Mark on the histogram the approximate position of the mean and the intervals corresponding to one, two and three standard deviations from the mean.
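As a cross-check of the spreadsheet steps, a minimal Python sketch can reproduce the same calculation. It assumes the 48 values of Table 1 are entered column by column (the order does not affect the result); `statistics.stdev` divides by n − 1, like Excel's STDEV, and `daily_profit` is just an illustrative variable name.

```python
import statistics

# Daily profit values from Table 1, read column by column (48 working days)
daily_profit = [
    -0.006, -0.004, 0.008, 0.011, -0.001, 0.017, 0.017, -0.004,
    0.008, -0.002, 0.006, -0.017, 0.004, 0.015, 0.001, -0.005,
    0.009, -0.015, -0.006, 0.002, 0.011, 0.013, 0.002, -0.018,
    -0.014, -0.001, -0.001, -0.013, 0.030, 0.007, -0.007, 0.001,
    0.012, -0.004, 0.002, -0.008, -0.010, -0.013, 0.009, -0.020,
    -0.003, -0.001, 0.017, 0.001, -0.000, -0.035, 0.001, -0.014,
]

mean = statistics.mean(daily_profit)   # counterpart of =AVERAGE(A2:A49)
sd = statistics.stdev(daily_profit)    # counterpart of =STDEV(A2:A49), n - 1 denominator
print(f"mean = {mean:.4f}, standard deviation = {sd:.4f}")

for k in (1, 2, 3):
    lower, upper = mean - k * sd, mean + k * sd
    # Count values inside [lower, upper], like the AutoFilter + Count steps
    inside = sum(lower <= x <= upper for x in daily_profit)
    share = inside / len(daily_profit) * 100
    print(f"{k} SD: [{lower:.4f}, {upper:.4f}] contains {inside} values ({share:.1f}%)")
```

With the Table 1 data the counts come out to 32, 46 and 48 values, i.e., 66.7%, 95.8% and 100.0%.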

Variation series

Suppose some quantitative attribute is studied in a general population. A sample of size n is drawn from it at random, i.e., the sample contains n elements. At the first stage of statistical processing the sample is ranked, i.e., the numbers $x_1, x_2, \dots, x_n$ are arranged in ascending order. Each observed value $x_i$ is called a variant. The frequency $m_i$ is the number of observations of the value $x_i$ in the sample. The relative frequency $w_i$ is the ratio of the frequency $m_i$ to the sample size n: $w_i = m_i/n$.

When studying variation series, the concepts of cumulative frequency and cumulative relative frequency are also used. Let x be some number. The number of observations whose values are less than x is called the cumulative frequency: $m_x^{cum} = \sum_{x_i < x} m_i$. The ratio of the cumulative frequency to the sample size is called the cumulative relative frequency: $w_x^{cum} = m_x^{cum}/n$.
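These definitions can be sketched in Python as follows; the helper names and the ranked sample are hypothetical. Observations strictly less than x are counted, matching the definition above.

```python
from collections import Counter

def relative_frequencies(sample):
    """w_i = m_i / n for each distinct variant, in ascending order of the variant."""
    n = len(sample)
    counts = Counter(sample)
    return {x: counts[x] / n for x in sorted(counts)}

def cumulative_frequency(sample, x):
    """Cumulative frequency (number of observations < x) and cumulative relative frequency."""
    m_cum = sum(1 for value in sample if value < x)
    return m_cum, m_cum / len(sample)

sample = [2, 3, 3, 4, 5, 5, 5, 6]   # hypothetical ranked sample
print(relative_frequencies(sample))
print(cumulative_frequency(sample, 5))
```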

An attribute is called discrete if its individual values (variants) differ from one another by some finite amount (usually an integer). The variation series of such an attribute is called a discrete variation series.

Numerical characteristics of the variation series

Numerical characteristics of variational series are calculated from data obtained as a result of observations (statistical data), therefore they are also called statistical characteristics or estimates. In practice, it is often sufficient to know the summary characteristics of the variation series: average or position characteristics (central tendency); scattering characteristics or variation (variability); shape characteristics (asymmetry and steepness of distribution).

The arithmetic mean characterizes the value of the attribute around which the observations are concentrated, i.e., the central tendency of the distribution.

The advantage of the median as a measure of central tendency is that it is not affected by changes in the extreme members of the variation series, as long as every member smaller than the median stays smaller and every member larger than the median stays larger. The median is preferable to the arithmetic mean for a series whose extreme variants are excessively large or small compared with the rest. A peculiarity of the mode as a measure of central tendency is that it also does not change when the extreme members of the series change, i.e., it is fairly resistant to variation of the attribute.

Position characteristics

Arithmetic mean (sample mean)

$\bar{x}_v = \dfrac{1}{n}\sum_{i=1}^{k} m_i x_i$

Mode

$Mo = x_j$, if $m_j = m_{max}$

Median

$Me = x_{r+1}$, if $n = 2r + 1$;

$Me = (x_r + x_{r+1})/2$, if $n = 2r$

Scattering characteristics

Sample variance

$D_v = \dfrac{1}{n}\sum_{i=1}^{k} m_i (x_i - \bar{x}_v)^2$

Sample standard deviation

$\sigma_v = \sqrt{D_v}$

Corrected variance

$S^2 = \dfrac{n}{n-1} D_v$

Corrected standard deviation

$S = \sqrt{S^2}$

Coefficient of variation

$V = \dfrac{\sigma_v}{\bar{x}_v} \cdot 100\%$

Mean absolute deviation

$\theta = \dfrac{1}{n}\sum_{i=1}^{k} m_i \lvert x_i - \bar{x}_v \rvert$

Range of variation

$R = x_{max} - x_{min}$

Interquartile range

$R_Q = Q_u - Q_l$, where $Q_u$ and $Q_l$ are the upper and lower quartiles

Shape characteristics

Skewness coefficient

$A_s = \dfrac{\sum_{i=1}^{k} m_i (x_i - \bar{x}_v)^3}{n \sigma_v^3}$

Kurtosis coefficient

$E_k = \dfrac{\sum_{i=1}^{k} m_i (x_i - \bar{x}_v)^4}{n \sigma_v^4} - 3$

In the sums, $x_i$ runs over the k distinct variants with frequencies $m_i$; in the median formulas, $x_r$ denotes the r-th member of the ranked sample.

Of greatest interest, however, are the measures of variation (scattering) of the observations around mean values, in particular around the arithmetic mean. These include the sample variance and the standard deviation. The sample variance has one significant drawback: whereas the arithmetic mean is expressed in the same units as the values of the random variable, the variance, by definition, is expressed in squared units. This drawback is avoided by using the standard deviation as the measure of variation of an attribute. The sample variance $D_v$ is a biased estimate, and the bias is noticeable for small samples, so for sample sizes n < 30 the corrected variance and corrected standard deviation are used. Another frequently used measure of dispersion is the coefficient of variation. Its advantage is that it is dimensionless, which makes it possible to compare the variation of variation series measured in incommensurable units. In addition, the lower the coefficient of variation, the more homogeneous the population with respect to the attribute under study and the more typical the mean. Populations with a coefficient of variation V > 30-35% are considered heterogeneous.
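The formulas listed above can be collected into one Python sketch. It is an illustration under stated assumptions, not a reference implementation: the function name and sample are hypothetical, the sums run over distinct variants with their frequencies, all modes are returned if several variants share the largest frequency, and the quartiles come from `statistics.quantiles`, whose default "exclusive" method is only one of several conventions in use.

```python
import math
import statistics
from collections import Counter

def variation_series_summary(sample):
    """Numerical characteristics of a variation series.

    Sums run over the distinct variants x_i with frequencies m_i;
    D_v and sigma_v use the denominator n. Assumes a non-constant sample.
    """
    n = len(sample)
    counts = Counter(sample)              # variant -> frequency m_i
    variants = sorted(counts)
    ranked = sorted(sample)               # x_1 <= x_2 <= ... <= x_n

    mean = sum(m * x for x, m in counts.items()) / n
    m_max = max(counts.values())
    modes = [x for x in variants if counts[x] == m_max]

    r, odd = divmod(n, 2)
    # n = 2r + 1 -> Me = x_{r+1}; n = 2r -> Me = (x_r + x_{r+1}) / 2 (1-based indices)
    median = ranked[r] if odd else (ranked[r - 1] + ranked[r]) / 2

    def central_moment(p):
        return sum(m * (x - mean) ** p for x, m in counts.items()) / n

    d_v = central_moment(2)
    sigma_v = math.sqrt(d_v)
    s2 = n / (n - 1) * d_v
    # Quartile conventions differ between textbooks and software
    q_lower, _, q_upper = statistics.quantiles(ranked, n=4)

    return {
        "mean": mean,
        "modes": modes,
        "median": median,
        "D_v": d_v,
        "sigma_v": sigma_v,
        "S2": s2,
        "S": math.sqrt(s2),
        "V_percent": sigma_v / mean * 100,
        "theta": sum(m * abs(x - mean) for x, m in counts.items()) / n,
        "R": variants[-1] - variants[0],
        "R_Q": q_upper - q_lower,
        "As": central_moment(3) / sigma_v ** 3,
        "Ek": central_moment(4) / sigma_v ** 4 - 3,
    }

summary = variation_series_summary([2, 3, 3, 4, 5, 5, 5, 6])  # hypothetical sample
for name, value in summary.items():
    print(name, value)
```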

Along with the variance, the mean absolute deviation θ is also used. Its advantage lies in its dimension: it is expressed in the same units as the values of the random variable. An additional simple indicator of the spread of attribute values is the interquartile range. It covers the median and the middle 50% of the observations, which reflect the central tendency of the attribute, and excludes the smallest and largest values.

The shape characteristics include the skewness (asymmetry) coefficient and the kurtosis. If the skewness coefficient equals zero, the distribution is symmetric. If the distribution is asymmetric, one branch of the frequency polygon slopes more gently than the other. If the asymmetry is right-sided, the inequality $\bar{x}_v > Me > Mo$ holds, which means that the distribution has a long tail toward higher values of the attribute. If the asymmetry is left-sided, the inequality $\bar{x}_v < Me < Mo$ holds, meaning that the long tail extends toward lower values. The larger the absolute value of the skewness coefficient, the more asymmetric the distribution (up to 0.25 the asymmetry is insignificant; from 0.25 to 0.5, moderate; above 0.5, significant).

Kurtosis (excess) is an indicator of the peakedness of a variation series compared with the normal distribution. If the kurtosis is positive, the polygon of the variation series has a steeper peak than the normal curve. This indicates that the attribute values accumulate in the central zone of the distribution, i.e., values close to the mean predominate. If the kurtosis is negative, the polygon has a flatter top than the normal curve. This means that the attribute values are not concentrated in the central part of the series but are spread fairly evenly over the whole range from the minimum to the maximum. The greater the absolute value of the kurtosis, the more the distribution differs from the normal one.



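The main characteristic of the spread of a variation series is the variance. The sample variance $D_v$ is calculated by the formula:

$D_v = \dfrac{1}{n}\sum_{i=1}^{k} m_i (x_i - \bar{x}_v)^2$

where $x_i$ is the i-th distinct value in the sample, occurring $m_i$ times; n is the sample size; $\bar{x}_v = \frac{1}{n}\sum_{i=1}^{k} m_i x_i$ is the sample mean; k is the number of distinct values in the sample. In this example: $x_1 = 72$, $m_1 = 50$; $x_2 = 85$, $m_2 = 44$; $x_3 = 69$, $m_3 = 61$; n = 155; k = 3; $\bar{x}_v = (72 \cdot 50 + 85 \cdot 44 + 69 \cdot 61)/155 \approx 74.51$. Then:

$D_v = \dfrac{50(72 - 74.51)^2 + 44(85 - 74.51)^2 + 61(69 - 74.51)^2}{155} \approx 45.22$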

Note that the larger the variance, the more the values of the measured quantity differ from one another. If all values of the measured quantity in the sample are equal, the variance of the sample is zero.

The variance has the following properties.

Property 1. The variance of any sample is non-negative: $D_v \ge 0$.

Property 2. If the measured quantity is constant, X = c, its variance is zero: $D[c] = 0$.

Property 3. If all values of the measured quantity x in the sample are multiplied by c, the variance of the sample is multiplied by $c^2$: $D[cx] = c^2 D[x]$, where c = const.

Sometimes, instead of the variance, the sample standard deviation is used; it equals the arithmetic square root of the sample variance: $\sigma_v = \sqrt{D_v}$.

For the example considered, the sample standard deviation is $\sigma_v = \sqrt{45.22} \approx 6.72$.
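The grouped-data example (values 72, 85 and 69 with frequencies 50, 44 and 61) can be checked with a short Python sketch, which also verifies Property 3 numerically. The scaling factor c = 2 is an arbitrary illustrative choice.

```python
import math

# Grouped data from the example: variant -> frequency m_i
groups = {72: 50, 85: 44, 69: 61}

n = sum(groups.values())
mean = sum(x * m for x, m in groups.items()) / n
d_v = sum(m * (x - mean) ** 2 for x, m in groups.items()) / n
sigma_v = math.sqrt(d_v)
print(f"n = {n}, mean = {mean:.2f}, D_v = {d_v:.2f}, sigma_v = {sigma_v:.2f}")

# Property 3: multiplying every value by c multiplies the variance by c**2
c = 2
scaled = {c * x: m for x, m in groups.items()}
mean_c = sum(x * m for x, m in scaled.items()) / n
d_c = sum(m * (x - mean_c) ** 2 for x, m in scaled.items()) / n
assert math.isclose(d_c, c ** 2 * d_v)
```

It prints n = 155, mean = 74.51, D_v = 45.22, sigma_v = 6.72.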

The variance makes it possible to evaluate not only how much the measured indicators differ within one group, but also how much different groups differ from each other. Several types of variance are used for this purpose.

If a group is taken as a sample, the variance of this group is called the group variance. To express numerically the differences between several groups, the concept of intergroup variance is used. The intergroup variance is the variance of the group means about the overall mean:

$D_{intergr} = \dfrac{1}{n}\sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x})^2$

where k is the number of groups in the total sample, $\bar{x}_i$ is the sample mean of the i-th group, $n_i$ is the size of the i-th group, $\bar{x}$ is the sample mean over all groups, and $n = n_1 + \dots + n_k$.

Consider an example.

The average score on a mathematics test was 3.64 in class 10 "A" and 3.52 in class 10 "B". Class 10 "A" has 22 students and class 10 "B" has 21. Let's find the intergroup variance.

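In this problem, the sample is divided into two groups (two classes). The sample mean for all groups is:

$\bar{x} = \dfrac{22 \cdot 3.64 + 21 \cdot 3.52}{43} = \dfrac{154}{43} \approx 3.581$.

In this case, the intergroup variance is:

$D_{intergr} = \dfrac{22(3.64 - 3.581)^2 + 21(3.52 - 3.581)^2}{43} \approx 0.0036$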

Since the intergroup variance is close to zero, we can conclude that the scores of one group (class 10 "A") differ only slightly from those of the other (class 10 "B"). In other words, from the point of view of intergroup variance, the two groups differ little with respect to this attribute.
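The class example can be reproduced with a few lines of Python, using only the group means and sizes given in the problem (the variable names are illustrative):

```python
# (group mean, group size) for classes 10 "A" and 10 "B"
classes = [(3.64, 22), (3.52, 21)]

n = sum(size for _, size in classes)
overall_mean = sum(mean * size for mean, size in classes) / n
d_intergr = sum(size * (mean - overall_mean) ** 2 for mean, size in classes) / n
print(f"overall mean = {overall_mean:.3f}, intergroup variance = {d_intergr:.4f}")
```

It prints an overall mean of 3.581 and an intergroup variance of 0.0036.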

If the total sample (for example, a class of students) is divided into several groups, then in addition to the intergroup variance one can also calculate the intragroup variance. This is the average of all the group variances, weighted by group size.

The intragroup variance $D_{intragr}$ is calculated by the formula:

$D_{intragr} = \dfrac{1}{n}\sum_{i=1}^{k} n_i D_i$

where k is the number of groups in the total sample and $D_i$ is the variance of the i-th group, of size $n_i$.

The overall ($D_v$), intragroup ($D_{intragr}$) and intergroup ($D_{intergr}$) variances are related by:

$D_v = D_{intragr} + D_{intergr}$
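The decomposition of the overall variance can be verified numerically with a small Python sketch. The scores and group split are hypothetical, and every variance uses the denominator n, as in the formulas of this section; with the n − 1 denominator the identity would not hold exactly.

```python
def mean(values):
    return sum(values) / len(values)

def variance_n(values):
    """Variance with denominator n, the convention used for D in this section."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / len(values)

# Hypothetical test scores split into two groups
groups = [[3, 4, 4, 5, 3], [2, 3, 5, 4]]
all_scores = [x for group in groups for x in group]
n = len(all_scores)
overall_mean = mean(all_scores)

d_total = variance_n(all_scores)
d_intragr = sum(len(g) * variance_n(g) for g in groups) / n
d_intergr = sum(len(g) * (mean(g) - overall_mean) ** 2 for g in groups) / n

print(f"D = {d_total:.4f}, D_intragr + D_intergr = {d_intragr + d_intergr:.4f}")
```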

The position characteristics describe the center of the distribution. The variants, however, may be grouped around it in either a wide or a narrow band. Therefore, to describe a distribution it is also necessary to characterize the range over which the attribute varies. Scattering characteristics serve this purpose; the most widely used are the range of variation, the variance, the standard deviation and the coefficient of variation.

The range of variation is defined as the difference between the maximum and minimum values of the attribute in the population studied:

$R = x_{max} - x_{min}$.

The obvious advantage of this indicator is its ease of calculation. However, since the range depends only on the extreme values of the attribute, its use is limited to fairly homogeneous distributions. In other cases the range carries very little information, since many distributions differ greatly in shape yet have the same range. In practice, the range of variation is sometimes used for small samples (no more than 10 observations). For example, the range makes it easy to estimate how much the best and worst results in a group of athletes differ.

In this example:

R = 16.36 − 13.04 = 3.32 (m).

The second scattering characteristic is the variance. The variance is the mean square of the deviations of the values of a random variable from its mean value. It characterizes the dispersion, i.e., the scattering, of the values of a quantity around their mean; the word "dispersion" itself means "scattering".

When conducting sample studies, the variance has to be estimated. The variance calculated from sample data is called the sample variance and is denoted $S^2$.

At first glance, the most natural estimate of the variance is the statistical variance calculated directly from the definition:

$S^2 = \dfrac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$

In this formula, the squared deviations of the attribute values $x_i$ from the arithmetic mean $\bar{x}$ are summed, and the sum is divided by the sample size n to obtain the mean squared deviation.

However, this estimate is biased. It can be shown that the sum of squared deviations of the attribute values from the sample arithmetic mean is smaller than the sum of squared deviations from any other value, including the true mean (the mathematical expectation). Therefore the result obtained by the above formula contains a systematic error: the variance is underestimated. To eliminate the bias, it is enough to introduce the correction factor n/(n − 1). The result is the following expression for the estimated variance:

$S^2 = \dfrac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$

For large n, the biased and unbiased estimates differ very little, and the correction factor becomes immaterial. As a rule, the variance estimate should be corrected when n < 30.
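The bias can be illustrated with a small simulation in Python, assuming a normal population with variance 1 and samples of size n = 5 (both values chosen only for illustration). On average the estimate with denominator n should come out near (n − 1)/n = 0.8 of the true variance, while the corrected estimate should come out near 1.

```python
import random
import statistics

def biased_variance(xs):
    """Mean squared deviation from the sample mean (denominator n)."""
    m = statistics.fmean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def unbiased_variance(xs):
    """Corrected estimate: the biased one multiplied by n / (n - 1)."""
    return biased_variance(xs) * len(xs) / (len(xs) - 1)

random.seed(1)
n, reps = 5, 20000   # small samples from a normal population with variance 1
biased, unbiased = [], []
for _ in range(reps):
    sample = [random.gauss(0, 1) for _ in range(n)]
    biased.append(biased_variance(sample))
    unbiased.append(unbiased_variance(sample))

print(statistics.fmean(biased), statistics.fmean(unbiased))  # expected near 0.8 and 1.0
```

With n = 30 the same factor would be 29/30, which is why the correction matters mainly for small samples.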

For grouped data, the last formula can be rewritten in the following form to simplify calculations:

$S^2 = \dfrac{1}{n-1}\sum_{i=1}^{k} n_i (x_i - \bar{x})^2$

where k is the number of grouping intervals;

$n_i$ is the frequency of interval number i;

$x_i$ is the midpoint of interval number i.

As an example, let's calculate the variance for the grouped data of the example we are analyzing (see Table 4):

$S^2 = \dfrac{1}{28}\sum_{i=1}^{k} n_i (x_i - \bar{x})^2 = 0.5473$ (m²).

The variance of a random variable has the dimension of the square of the variable's dimension, which makes it hard to interpret and not very intuitive. For a more intuitive description of scatter, it is more convenient to use a characteristic whose dimension coincides with that of the attribute under study. For this purpose the standard deviation (also called the root-mean-square deviation) is introduced.

The standard deviation is the positive square root of the variance:

$S = \sqrt{S^2}$

In our example, the standard deviation is $S = \sqrt{0.5473} \approx 0.74$ (m).

The standard deviation has the same units of measurement as the measured values of the attribute under study, and thus it characterizes the degree of deviation of the attribute from the arithmetic mean. In other words, it shows how the bulk of the variants is located relative to the arithmetic mean.

Standard deviation and variance are the most widely used measures of variation. This is due to the fact that they are included in a significant part of the theorems of probability theory, which serves as the foundation of mathematical statistics. In addition, the variance can be decomposed into its constituent elements, which make it possible to assess the influence of various factors on the variation of the trait under study.

In addition to the absolute measures of variation (the variance and the standard deviation), statistics also uses relative ones. The most common is the coefficient of variation, equal to the ratio of the standard deviation to the arithmetic mean, expressed as a percentage:

$V = \dfrac{S}{\bar{x}} \cdot 100\%$

It is clear from the definition that, in its meaning, the coefficient of variation is a relative measure of the dispersion of a feature.

For the example in question:

The coefficient of variation is widely used in statistical research. Being a relative value, it allows one to compare the variability of attributes measured in different units, as well as of the same attribute in several populations with different arithmetic means.

The coefficient of variation is used to characterize the homogeneity of experimental data. In physical education and sports practice, the spread of measurement results is considered small when V < 10%, medium when V is 11-20%, and large when V > 20%.

The restrictions on the use of the coefficient of variation stem from its relative nature: its definition contains normalization by the arithmetic mean. Consequently, for small absolute values of the mean the coefficient of variation may lose its information content; the closer the mean is to zero, the less informative the indicator becomes. In the limiting case, when the mean is zero (as can happen, for example, with temperature), the coefficient of variation becomes infinite regardless of the spread of the attribute. By analogy with the case of relative error, the following rule can be formulated: if the arithmetic mean of the sample is greater than one, the use of the coefficient of variation is justified; otherwise, the variance and standard deviation should be used to describe the spread of experimental data.

To conclude this part, let us consider how the estimated characteristics themselves vary. As already noted, the values of the distribution characteristics calculated from experimental data do not coincide with their true values for the general population. The latter cannot be established exactly, since, as a rule, it is impossible to examine the entire population. If the results of different samples from the same general population are used to estimate the distribution parameters, the estimates differ from one sample to another; the estimated values fluctuate around the true values.

Deviations of the estimates of population parameters from the true values of these parameters are called statistical errors. They arise because the sample is limited in size: not all objects of the general population are included in it. The standard deviation of a sample characteristic is used to estimate the magnitude of its statistical error.

As an example, consider the most important position characteristic, the arithmetic mean. It can be shown that the standard deviation of the arithmetic mean is given by:

$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$

where σ is the standard deviation of the general population.

Since the true value of the standard deviation is not known, its sample estimate S is used instead, giving a quantity called the standard error of the arithmetic mean:

$S_{\bar{x}} = \dfrac{S}{\sqrt{n}}$

The value $S_{\bar{x}}$ characterizes the error that, on average, is made when the population mean is replaced by its sample estimate. According to the formula, increasing the sample size reduces the standard error in inverse proportion to the square root of the sample size.

For the example under consideration, the standard error of the arithmetic mean is $S_{\bar{x}} = 0.74/\sqrt{29} \approx 0.14$ (m). In our case it turned out to be 5.4 times smaller than the standard deviation.
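A short Python sketch of the standard error, plus a simulation showing that the spread of sample means shrinks like σ/√n. It assumes n = 29 for the worked example (consistent with the divisor n − 1 = 28 used above and the ratio √29 ≈ 5.4); the simulated population (normal, σ = 1) and the seed are arbitrary illustrative choices.

```python
import math
import random
import statistics

def standard_error(sample):
    """Standard error of the mean: sample standard deviation divided by sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

# Worked example: S of about 0.74 m with n = 29 observations (assumed, see above)
print(round(0.74 / math.sqrt(29), 2))   # about 0.14 m

# Simulation: the standard deviation of many sample means is close to sigma / sqrt(n)
random.seed(2)
sigma, n, reps = 1.0, 29, 5000
means = [statistics.fmean(random.gauss(0, sigma) for _ in range(n)) for _ in range(reps)]
print(statistics.stdev(means), sigma / math.sqrt(n))
```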