The main measure of the scatter of a variation series is called the variance

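The main measure of the scatter of a variation series is the variance. The sample variance $D_s$ is calculated using the following formula:

$$D_s = \frac{1}{n}\sum_{i=1}^{k} m_i\,(x_i - \bar{x})^2,$$

where $x_i$ is the i-th value in the sample, occurring $m_i$ times; $n$ is the sample size; $\bar{x}$ is the sample mean; and $k$ is the number of distinct values in the sample. In this example: $x_1 = 72$, $m_1 = 50$; $x_2 = 85$, $m_2 = 44$; $x_3 = 69$, $m_3 = 61$; $n = 155$; $k = 3$; $\bar{x} = 11549/155 \approx 74.51$. Then:

$$D_s = \frac{50\,(72 - 74.51)^2 + 44\,(85 - 74.51)^2 + 61\,(69 - 74.51)^2}{155} \approx 45.22.$$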

Note that the larger the variance, the more the values of the measured quantity differ from each other. If all values of the measured quantity in the sample are equal, the variance of that sample is zero.

The variance has the following properties.

Property 1. The variance of any sample is non-negative, i.e. $D \ge 0$.

Property 2. If the measured quantity is constant, X = c, then its variance is zero: D[c] = 0.

Property 3. If all values of the measured quantity x in the sample are multiplied by c, the variance of the sample is multiplied by $c^2$: $D[cx] = c^2 D[x]$, where c = const.
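The three properties can be checked numerically. The following is a minimal sketch; the sample and the factor `c` are hypothetical values chosen only for illustration, and `statistics.pvariance` is used because it divides by n, as the variance does in this section.

```python
from statistics import pvariance

# Hypothetical sample and scale factor, used only for illustration.
sample = [72, 85, 69, 85, 72]
c = 3

var_x = pvariance(sample)                    # variance of the sample (divides by n)
var_cx = pvariance([c * x for x in sample])  # variance after multiplying every value by c
var_const = pvariance([c] * len(sample))     # variance of a constant sample

print("Property 1, variance is non-negative:", var_x >= 0)
print("Property 2, variance of a constant:", var_const)
print("Property 3, D[cx] and c^2 * D[x]:", var_cx, c**2 * var_x)
```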

Sometimes, instead of the variance, the sample standard deviation is used, which equals the arithmetic square root of the sample variance: $\sigma = \sqrt{D_s}$.

For the example considered, the sample standard deviation is $\sigma = \sqrt{45.22} \approx 6.72$.
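The example can be reproduced with a short script. This is a sketch that assumes the variance is the frequency-weighted mean of squared deviations divided by n, as the definitions above describe; the variable names are illustrative.

```python
from math import sqrt

# Values and their frequencies from the example in the text.
values = [72, 85, 69]
counts = [50, 44, 61]

n = sum(counts)  # sample size
mean = sum(x * m for x, m in zip(values, counts)) / n
variance = sum(m * (x - mean) ** 2 for x, m in zip(values, counts)) / n
std_dev = sqrt(variance)

print(f"n = {n}, mean = {mean:.2f}, variance = {variance:.2f}, std dev = {std_dev:.2f}")
```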

The variance measures not only how much the measured values differ within one group; it can also be used to quantify how data differ between groups. Several types of variance are used for this purpose.

If a group is taken as a sample, the variance of that group is called the group variance. To express numerically the differences between several groups, the concept of intergroup variance is used. The intergroup variance is the variance of the group means relative to the overall mean:

$$D_{\text{inter}} = \frac{\sum_{i=1}^{k} n_i\,(\bar{x}_i - \bar{x})^2}{\sum_{i=1}^{k} n_i},$$

where $k$ is the number of groups in the total sample, $\bar{x}_i$ is the sample mean of the i-th group, $n_i$ is the size of the i-th group, and $\bar{x}$ is the sample mean over all groups.

Consider an example.

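The average score on a mathematics test was 3.64 in class 10 "A" and 3.52 in class 10 "B". Class 10 "A" has 22 students and class 10 "B" has 21. Let us find the intergroup variance.

In this problem, the sample is divided into two groups (two classes). The sample mean over all groups is:

$$\bar{x} = \frac{3.64 \cdot 22 + 3.52 \cdot 21}{22 + 21} = \frac{154}{43} \approx 3.58.$$

In this case, the intergroup variance is:

$$D_{\text{inter}} = \frac{22\,(3.64 - 3.58)^2 + 21\,(3.52 - 3.58)^2}{43} \approx 0.0036.$$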

Since the intergroup variance is close to zero, we can conclude that the scores of one group (10 "A" class) differ slightly from the scores of the second group (10 "B" class). In other words, from the point of view of intergroup variance, the considered groups differ slightly in terms of a given attribute.
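The same calculation can be sketched in code. It assumes the intergroup variance is the size-weighted mean of squared deviations of the group means from the overall mean; the variable names are illustrative.

```python
# Group means and group sizes for classes 10 "A" and 10 "B" from the text.
group_means = [3.64, 3.52]
group_sizes = [22, 21]

total = sum(group_sizes)
overall_mean = sum(m * n for m, n in zip(group_means, group_sizes)) / total
inter_var = sum(n * (m - overall_mean) ** 2
                for m, n in zip(group_means, group_sizes)) / total

print(f"overall mean = {overall_mean:.4f}, intergroup variance = {inter_var:.4f}")
```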

If the total sample (for example, a class of students) is divided into several groups, then in addition to the intergroup variance one can also calculate the intragroup variance. This variance is the weighted average of all the group variances.

The intragroup variance $D_{\text{intra}}$ is calculated by the formula:

$$D_{\text{intra}} = \frac{\sum_{i=1}^{k} n_i\,D_i}{\sum_{i=1}^{k} n_i},$$

where $k$ is the number of groups in the total sample and $D_i$ is the variance of the i-th group, whose size is $n_i$.

The total ($D_s$), intragroup ($D_{\text{intra}}$) and intergroup ($D_{\text{inter}}$) variances are related as follows:

$$D_s = D_{\text{intra}} + D_{\text{inter}}.$$
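This relationship can be verified numerically. The sketch below uses two hypothetical groups (the data are invented for illustration) and assumes all variances divide by the number of observations, which is what makes the identity exact.

```python
from statistics import fmean, pvariance

# Two hypothetical groups of scores, invented for illustration.
groups = [[3, 4, 4, 5], [2, 3, 3, 4, 5]]

all_values = [x for g in groups for x in g]
n = len(all_values)
overall_mean = fmean(all_values)

total_var = pvariance(all_values)                          # variance of the pooled sample
intra_var = sum(len(g) * pvariance(g) for g in groups) / n  # weighted mean of group variances
inter_var = sum(len(g) * (fmean(g) - overall_mean) ** 2     # variance of the group means
                for g in groups) / n

print("total:", total_var)
print("intragroup + intergroup:", intra_var + inter_var)
```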

Scattering characteristics

Sample dispersion measures.

The minimum and maximum of the sample are, respectively, the smallest and largest values of the variable being studied. The difference between the maximum and the minimum is called the range of the sample. All sample data lie between the minimum and the maximum, so these two values outline the boundaries of the sample.

$R_1 = 15.6 - 10 = 5.6$

$R_2 = 0.85 - 0.6 = 0.25$

The sample variance and the sample standard deviation are measures of the variability of a variable and characterize the degree to which the data spread around the center. The standard deviation is the more convenient indicator because it has the same units as the data under study. For this reason the standard deviation is reported together with the arithmetic mean of the sample to summarize the results of data analysis.

It is more expedient to calculate the sample variance by the formula:

The standard deviation is calculated using the formula:

The coefficient of variation is a relative measure of the spread of a trait.

The coefficient of variation is also used as an indicator of the homogeneity of sample observations. It is believed that if the coefficient of variation does not exceed 10%, then the sample can be considered homogeneous, i.e., obtained from one population.

Since the coefficient of variation does not exceed 10% in either sample, both samples are homogeneous.

The sample can be represented analytically in the form of a distribution function, or as a frequency table consisting of two rows. The upper row contains the elements of the sample (the variants), arranged in ascending order; the lower row records the frequency of each variant.

The frequency of a variant is the number of times that variant is repeated in the sample.
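A two-row frequency table of this kind can be built as follows. This is a minimal sketch; the sample is hypothetical and is not the data of the samples discussed in the text.

```python
from collections import Counter

# Hypothetical sample, invented for illustration.
sample = [3, 1, 2, 3, 3, 1, 2, 3]

freq = Counter(sample)                     # variant -> number of repetitions
variants = sorted(freq)                    # upper row: variants in ascending order
frequencies = [freq[v] for v in variants]  # lower row: frequency of each variant

print("variants:   ", variants)
print("frequencies:", frequencies)
```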

Sample #1 "Mothers"

Type of distribution curve

Asymmetry, or the coefficient of skewness (the term was first introduced by Pearson, 1895), is a measure of how asymmetric a distribution is. If the skewness is distinctly different from 0, the distribution is asymmetric; the density of the normal distribution is symmetric about the mean.

The skewness index is used to characterize the degree of symmetry in the distribution of data around the center. Skewness can take both negative and positive values. A positive value indicates that the bulk of the data lies to the left of the center, with a longer tail to the right; a negative value indicates the opposite. Thus the sign of the skewness index shows the direction of the shift, while its magnitude shows the degree of the shift. Skewness equal to zero indicates that the data are concentrated symmetrically around the center.

Since the asymmetry is positive, the peak of the curve is shifted to the left of the center.

The kurtosis coefficient is a measure of how tightly the bulk of the data clusters around the center.

With positive kurtosis the curve becomes sharper; with negative kurtosis it becomes flatter.

Negative kurtosis: the curve is flattened.

Positive kurtosis: the curve is peaked.
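The text does not give formulas for skewness and kurtosis, so the sketch below assumes the common moment-based definitions: skewness as the third central moment divided by the 1.5 power of the second, and excess kurtosis as the fourth central moment divided by the square of the second, minus 3. Both data sets are hypothetical.

```python
def central_moment(data, order):
    """Central moment of the given order, dividing by the number of values."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** order for x in data) / len(data)

def skewness(data):
    # Assumed definition: m3 / m2^(3/2)
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def excess_kurtosis(data):
    # Assumed definition: m4 / m2^2 - 3 (zero for a normal distribution)
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3

right_skewed = [1, 1, 2, 2, 2, 3, 3, 4, 6, 10]  # hypothetical, long right tail
symmetric = [1, 2, 3, 4, 5]                     # hypothetical, symmetric and flat

print("right-skewed sample:", skewness(right_skewed), excess_kurtosis(right_skewed))
print("symmetric sample:   ", skewness(symmetric), excess_kurtosis(symmetric))
```

Under these definitions the first sample has positive skewness, while the second has zero skewness and negative excess kurtosis (a flattened curve).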

One reason for carrying out statistical analysis is the need to account for the influence of random factors (perturbations) on the indicator under study, which cause the data to scatter. Solving problems that involve scattered data carries risk, since even with all the available information it is impossible to predict exactly what will happen in the future. To work adequately in such situations, it is advisable to understand the nature of the risk and to be able to determine the degree of scatter of the data set. Three numerical characteristics describe the measure of scatter: the standard deviation, the range, and the coefficient of variation (variability). Unlike the typical indicators (mean, median, mode), which characterize the center, the scatter characteristics show how close the individual values of the data set are to that center.

Definition of Standard Deviation

The standard deviation is a measure of the random deviations of data values from the mean. In real life most data exhibit scatter, i.e. individual values lie at some distance from the mean.

A summary measure of scatter cannot be obtained by simply averaging the deviations, because some deviations are positive and others negative, so the average may turn out to be zero. To get rid of the negative sign, a standard trick is used: first the variance is calculated as the sum of squared deviations divided by (n - 1), and then the square root of the resulting value is taken. The formula for calculating the standard deviation is as follows:

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

Note 1. The variance carries no additional information compared with the standard deviation, but it is harder to interpret because it is expressed in "units squared", while the standard deviation is expressed in familiar units (for example, in dollars).

Note 2. The formula above gives the standard deviation of a sample and is more accurately called the sample standard deviation. When calculating the standard deviation of a population (denoted by the symbol σ), divide by n. The sample standard deviation is somewhat larger (because it divides by n - 1), which provides a correction for the randomness of the sample itself.

When the data set has a normal distribution, the standard deviation takes on a special meaning. In the figure, marks are placed on both sides of the mean at distances of one, two and three standard deviations. The figure shows that approximately two-thirds of all values (66.7% by this rule of thumb; the exact figure for a normal distribution is about 68.3%) lie within one standard deviation on either side of the mean, 95% of the values lie within two standard deviations of the mean, and almost all of the data (99.7%) lie within three standard deviations of the mean.


This property of the standard deviation for normally distributed data is called the "two-thirds rule".

In some situations, such as product quality control analysis, limits are often set such that those observations (0.3%) that are more than three standard deviations from the mean are considered as worthy of attention.

Unfortunately, if the data is not normally distributed, then the rule described above cannot be applied.

For such cases there is a bound known as Chebyshev's rule, which can be applied to asymmetric (skewed) distributions.
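The text does not state Chebyshev's rule, so the sketch below relies on its standard form: for any distribution and any k > 1, at least 1 - 1/k² of the values lie within k standard deviations of the mean. The skewed sample is hypothetical, and the population standard deviation (dividing by n) is used so that the bound holds exactly for the sample itself.

```python
from statistics import fmean, pstdev

# Hypothetical right-skewed sample, invented for illustration.
data = [1, 1, 1, 2, 2, 3, 4, 6, 9, 20]

mean = fmean(data)
sd = pstdev(data)  # population standard deviation (divides by n)

def share_within(k):
    """Fraction of values lying within k standard deviations of the mean."""
    return sum(abs(x - mean) <= k * sd for x in data) / len(data)

for k in (2, 3):
    bound = 1 - 1 / k**2  # Chebyshev lower bound
    print(f"k = {k}: observed share = {share_within(k):.2f}, guaranteed at least {bound:.2f}")
```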

Generate initial data

Table 1 shows the dynamics of changes in daily profit on the stock exchange, fixed on working days for the period from July 31 to October 9, 1987.

Table 1. Dynamics of changes in daily profit on the stock exchange

Date    Daily profit    Date    Daily profit    Date    Daily profit
        -0.006                  0.009                   0.012
        -0.004                  -0.015                  -0.004
        0.008                   -0.006                  0.002
        0.011                   0.002                   -0.008
        -0.001                  0.011                   -0.010
        0.017                   0.013                   -0.013
        0.017                   0.002                   0.009
        -0.004                  -0.018                  -0.020
        0.008                   -0.014                  -0.003
        -0.002                  -0.001                  -0.001
        0.006                   -0.001                  0.017
        -0.017                  -0.013                  0.001
        0.004                   0.030                   0.000
        0.015                   0.007                   -0.035
        0.001                   -0.007                  0.001
        -0.005                  0.001                   -0.014
Launch Excel.

Create the file. Click the Save button on the Standard toolbar, open the Statistics folder in the dialog box that appears, and name the file Scattering Characteristics.xls.

Set the label. On Sheet1, in cell A1, enter the label Daily profit, and in the range A2:A49 enter the data from Table 1.

Apply the AVERAGE function. In cell D1, enter the label Average. In cell D2, calculate the average using the AVERAGE statistical function.

Apply the STDEV function. In cell D4, enter the label Standard Deviation. In cell D5, calculate the standard deviation using the STDEV statistical function.

Format the results to four decimal places.

Interpretation of results. On average, daily profit declined by 0.04% (the average daily profit turned out to be -0.0004). This means that the average daily profit for the period considered was approximately zero, i.e. the market held a steady average rate. The standard deviation turned out to be 0.0118. This means that one dollar ($1) invested in the stock market changed by $0.0118 per day on average, i.e. the investment could result in a profit or loss of $0.0118.

Let us check whether the daily profit values given in Table 1 follow the rules of the normal distribution.

1. Calculate the interval corresponding to one standard deviation on either side of the mean.

2. In cells D7, D8 and F8, enter the labels One standard deviation, Lower limit and Upper limit, respectively.

3. In cell D9, enter the formula = -0.0004 - 0.0118, and in cell F9, enter the formula = -0.0004 + 0.0118.

4. Display the result to four decimal places.

5. Determine the number of daily profit values that lie within one standard deviation. First, filter the data, leaving the daily profit values in the interval [-0.0121, 0.0114]. To do this, select any cell in column A containing daily profit values and run the command:

Data → Filter → AutoFilter

Open the menu by clicking the arrow in the Daily Profit header and select (Condition...). In the Custom AutoFilter dialog box, set the options as shown below. Click the OK button.

To count the filtered values, select the range of daily profit values, right-click an empty area of the status bar, and select the Number of values command from the context menu. Read the result. Then display all the original data again by running the command Data → Filter → Show All, and turn off the autofilter with the command Data → Filter → AutoFilter.

6. Calculate the percentage of daily profit values that lie within one standard deviation of the average. To do this, enter the label Percent in cell H8, and in cell H9 enter the formula for the percentage and display the result to one decimal place.

7. Calculate the interval of daily profit values within two standard deviations of the mean. In cells D11, D12 and F12, enter the labels Two standard deviations, Lower limit and Upper limit, respectively. In cells D13 and F13, enter the calculation formulas and display the result to four decimal places.

8. Determine the number of daily profit values that lie within two standard deviations, first filtering the data.

9. Calculate the percentage of daily profit values that lie within two standard deviations of the average. To do this, enter the label Percent in cell H12, and in cell H13 enter the formula for the percentage and display the result to one decimal place.

10. Calculate the interval of daily profit values within three standard deviations of the mean. In cells D15, D16 and F16, enter the labels Three standard deviations, Lower limit and Upper limit, respectively. In cells D17 and F17, enter the calculation formulas and display the result to four decimal places.

11. Determine the number of daily profit values that lie within three standard deviations, first filtering the data.

12. Calculate the percentage of daily profit values that lie within three standard deviations of the average. To do this, enter the label Percent in cell H16, and in cell H17 enter the formula for the percentage and display the result to one decimal place.

13. Plot a histogram of the daily profit on the stock exchange and place it, together with the frequency distribution table, in the area J1:S20. Mark on the histogram the approximate mean and the intervals corresponding to one, two and three standard deviations from the mean.
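The spreadsheet steps above can also be sketched in a script. The sketch assumes the Table 1 values are read column by column (the order does not affect the result) and uses `statistics.stdev`, which divides by n - 1 like the STDEV function; the function name `count_within` is illustrative.

```python
from statistics import mean, stdev

# Daily profit values from Table 1, read column by column (assumed order).
daily_profit = [
    -0.006, -0.004, 0.008, 0.011, -0.001, 0.017, 0.017, -0.004,
    0.008, -0.002, 0.006, -0.017, 0.004, 0.015, 0.001, -0.005,
    0.009, -0.015, -0.006, 0.002, 0.011, 0.013, 0.002, -0.018,
    -0.014, -0.001, -0.001, -0.013, 0.030, 0.007, -0.007, 0.001,
    0.012, -0.004, 0.002, -0.008, -0.010, -0.013, 0.009, -0.020,
    -0.003, -0.001, 0.017, 0.001, 0.000, -0.035, 0.001, -0.014,
]

avg = mean(daily_profit)
sd = stdev(daily_profit)  # sample standard deviation (divides by n - 1)

def count_within(k):
    """Number of values within k standard deviations of the mean."""
    return sum(avg - k * sd <= x <= avg + k * sd for x in daily_profit)

print(f"mean = {avg:.4f}, standard deviation = {sd:.4f}")
for k in (1, 2, 3):
    inside = count_within(k)
    share = 100 * inside / len(daily_profit)
    print(f"{k} sd: [{avg - k * sd:.4f}, {avg + k * sd:.4f}] "
          f"contains {inside} values ({share:.1f}%)")
```

The percentages printed for one, two and three standard deviations can then be compared with the 66.7%, 95% and 99.7% expected for normally distributed data.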

However important the average characteristics are, an equally important characteristic of a numerical data array is the behavior of the remaining members relative to the average: how much they differ from it, and how many of them differ from it significantly. In shooting training one speaks of the grouping of results; in statistics one studies the characteristics of scatter.

The difference between any value $x_i$ and the mean $\bar{x}$ is called the deviation and is calculated as $x_i - \bar{x}$. The deviation can be positive, if the number is greater than the mean, or negative, if it is less. However, in statistics it is often important to work with a single number that characterizes the "accuracy" of all the elements of the data array. Summing all the deviations always gives zero, since positive and negative deviations cancel each other out. To avoid this, squared differences are used to characterize the scatter, more precisely the arithmetic mean of the squared deviations. This scatter characteristic is called the sample variance.
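The cancellation of deviations is easy to see on a small array. The following is a sketch with hypothetical data chosen so that the mean is a whole number.

```python
# Hypothetical data array with mean 5, invented for illustration.
data = [2, 4, 4, 5, 10]

mean = sum(data) / len(data)
deviations = [x - mean for x in data]

sum_of_deviations = sum(deviations)                               # cancels to zero
sample_variance = sum(d ** 2 for d in deviations) / len(data)     # mean squared deviation

print("deviations:", deviations)
print("sum of deviations:", sum_of_deviations)
print("mean of squared deviations:", sample_variance)
```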

The greater the variance, the greater the spread of the values of the random variable. To calculate the variance, the sample mean $\bar{x}$ is taken with one extra digit of precision relative to the members of the data array. Otherwise a significant error accumulates when a large number of approximate values are summed. One drawback of the sample variance, related to units, should be noted: the unit of the variance $D$ is the square of the unit of the quantity $X$ whose scatter it characterizes. To remove this drawback, statistics introduces the sample standard deviation, which is denoted by the symbol σ (read "sigma") and is calculated by the formula

$$\sigma = \sqrt{D}.$$

Normally, more than half of the members of the data array differ from the mean by less than the standard deviation, i.e. they belong to the segment $[\bar{x} - \sigma;\ \bar{x} + \sigma]$. Put differently: the average indicator, taking the scatter of the data into account, is $\bar{x} \pm \sigma$.

One more scatter characteristic is introduced because of the units of the members of the data array. All numerical characteristics in statistics are introduced in order to compare the results of studying different numerical arrays that describe different random variables. However, it is not meaningful to compare standard deviations taken around different means of different data arrays, especially if the units of those quantities also differ, for example when comparing the length and weight of objects, or the scatter in the manufacture of micro- and macro-products. For these reasons a characteristic of relative scatter is introduced, which is called the coefficient of variation and is calculated by the formula

$$V = \frac{\sigma}{\bar{x}} \cdot 100\%.$$

To calculate the numerical characteristics of the scatter of the values of a random variable, it is convenient to use a table (Table 6.9).

Table 6.9

Calculation of the numerical characteristics of the scatter of the values of a random variable

x_i    f_i    x_i·f_i    x_i - x̄    (x_i - x̄)²·f_i

While this table is being filled in, the sample mean $\bar{x}$ is found, and it is later used in two forms. As a final average characteristic (for example, in the third column of the table), the sample mean must be rounded to the digit corresponding to the smallest digit of any member $x_i$ of the data array. However, the same indicator is also used in the table for further calculations; in that situation, namely in the fourth column of the table, the sample mean must be rounded with one extra digit beyond the smallest digit of any member $x_i$ of the data array.

Calculations with a table such as Table 6.9 yield the value of the sample variance; to record the answer, the standard deviation σ must be calculated from the sample variance.

The answer states: a) the average result, taking the scatter of the data into account, in the form $\bar{x} \pm \sigma$; b) the data stability characteristic V. The answer should also assess the quality of the coefficient of variation: good or bad.

In sports research, an acceptable coefficient of variation as an indicator of the homogeneity or stability of results is 10-15%. A coefficient of variation V = 20% is considered very large in any study. If the sample size n > 25, then V > 32% is a very bad indicator.

For example, for the discrete variation series 1; 5; 4; 4; 5; 3; 3; 1; 1; 1; 1; 1; 1; 3; 3; 5; 3; 5; 4; 4; 3; 3; 3; 3; 3, Table 6.9 is filled in as follows (Table 6.10).

Table 6.10

An example of calculating the numerical characteristics of the dispersion of values

$$\bar{x} = \frac{\sum x_i f_i}{n} = \frac{73}{25} = 2.92 \approx 2.9$$

$$D = \frac{\sum (x_i - \bar{x})^2 f_i}{n} = \frac{47.6}{25}$$

Answer: a) the average characteristic, taking the scatter of the data into account, is $\bar{x} \pm \sigma = 3 \pm 1.4$; b) the stability of the measurements obtained is low, since the coefficient of variation V = 48% > 32%.
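The tabular calculation for this series can be sketched in code. The sketch assumes the variance divides by n and the coefficient of variation is the standard deviation over the mean, in percent. It works with unrounded intermediate values, so the sum of squared deviations and the coefficient of variation may differ slightly from the hand-rounded figures in the table.

```python
from collections import Counter
from math import sqrt

# The discrete variation series from the example above.
data = [1, 5, 4, 4, 5, 3, 3, 1, 1, 1, 1, 1, 1,
        3, 3, 5, 3, 5, 4, 4, 3, 3, 3, 3, 3]

freq = Counter(data)  # variant x_i -> absolute frequency f_i
n = len(data)

mean = sum(x * f for x, f in freq.items()) / n
variance = sum(f * (x - mean) ** 2 for x, f in freq.items()) / n
sigma = sqrt(variance)
cv = 100 * sigma / mean  # coefficient of variation, in percent

for x in sorted(freq):
    print(f"x = {x}, f = {freq[x]}, x*f = {x * freq[x]}")
print(f"mean = {mean:.2f}, variance = {variance:.3f}, sigma = {sigma:.1f}, V = {cv:.0f}%")
```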

A table analogous to Table 6.9 can also be used to calculate the scatter characteristics of an interval variation series. In that case the variants $x_i$ are replaced by the representatives of the intervals, and the absolute frequencies of the variants $f_i$ by the absolute frequencies of the intervals.

Based on the above, the following conclusions can be drawn.

The conclusions of mathematical statistics are reliable when information about mass phenomena is processed.

Usually a sample drawn from the general population of objects is studied, and this sample should be representative.

The experimental data obtained by studying any property of the sample objects are values of a random variable, since the researcher cannot predict in advance which number will correspond to a particular object.

To choose an algorithm for the description and primary processing of experimental data, it is important to be able to determine the type of the random variable: discrete, continuous, or mixed.

Discrete random variables are described by a discrete variation series and its graphical form, the frequency polygon.

Mixed and continuous random variables are described by an interval variation series and its graphical form, the histogram.

When several samples are compared by the level of development of a certain property, the average numerical characteristics and the numerical characteristics of the scatter of the random variable about the average are used.

When calculating an average characteristic, it is important to choose the type of average that suits the area of its application. The structural averages, the mode and the median, characterize the position of a variant in the ordered array of experimental data. The quantitative average (the sample mean) makes it possible to judge the average size of a variant.

To calculate the numerical characteristics of scattering - sample variance, standard deviation and coefficient of variation - the tabular method is effective.
