How are probability theory and mathematical statistics used? These disciplines are the basis of probabilistic-statistical decision-making methods. To use their mathematical apparatus, decision-making problems must be expressed in terms of probabilistic-statistical models. Applying a specific probabilistic-statistical decision-making method involves three stages:

  • transition from economic, managerial, or technological reality to an abstract mathematical-statistical scheme, i.e. building a probabilistic model of a control system, a technological process, a decision-making procedure (in particular, one based on the results of statistical control), etc.;
  • carrying out calculations and obtaining conclusions by purely mathematical means within the framework of the probabilistic model;
  • interpretation of the mathematical and statistical conclusions in relation to the real situation and making an appropriate decision (for example, on whether product quality conforms to the established requirements, on the need to adjust the technological process, etc.), in particular, drawing conclusions on the proportion of defective units in a batch, on the specific form of the distribution laws of the controlled parameters of the technological process, etc.

Mathematical statistics uses the concepts, methods, and results of probability theory. Let us consider the main issues of building probabilistic decision-making models in economic, managerial, technological, and other situations. Active and correct use of normative-technical and instructive-methodical documents on probabilistic-statistical decision-making methods requires prior knowledge: one must know under what conditions a particular document should be applied, what initial information is needed for its selection and application, what decisions should be made based on the results of data processing, and so on.

Examples of the application of probability theory and mathematical statistics. Let us consider several examples in which probabilistic-statistical models are a good tool for solving managerial, industrial, economic, and national economic problems. For instance, in volume 1 of A.N. Tolstoy's novel The Road to Calvary we read: "the workshop gives twenty-three percent defective output; you hold on to this figure," Strukov said to Ivan Ilyich.

The question arises how to understand these words of the factory managers, since a single unit of production cannot be 23% defective: it is either good or defective. Strukov probably meant that a large batch contains approximately 23% defective units. Then the question arises what "approximately" means. If 30 out of 100 tested units turn out to be defective, or 300 out of 1000, or 30,000 out of 100,000, should Strukov be accused of lying?
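Whether an observed defect count is consistent with a claimed 23% rate can be framed as a binomial tail probability. A minimal sketch in plain Python (the sample sizes and counts are the hypothetical ones from the text):

```python
from math import comb

def binom_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Chance of 30 or more defective units in a sample of 100 when the
# true defect rate really is 23%.
p_30_of_100 = binom_tail(100, 30, 0.23)

# The same relative excess in a sample of 1000 is far less plausible.
p_300_of_1000 = binom_tail(1000, 300, 0.23)

print(f"{p_30_of_100:.3f}")    # a few percent: not strong evidence against 23%
print(f"{p_300_of_1000:.1e}")  # vanishingly small: 23% is no longer tenable
```

So 30 defects out of 100 does not yet convict Strukov of lying, while 300 out of 1000 would.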

Or another example. A coin used for drawing lots must be "symmetric": when it is tossed, heads should come up in half the cases on average, and tails in the other half. But what does "on average" mean? If you run many series of 10 tosses each, there will often be series in which the coin comes up heads exactly 4 times. For a symmetric coin this happens in 20.5% of all series. And if there are 40,000 heads in 100,000 tosses, can the coin be considered symmetric? The decision-making procedure is based on probability theory and mathematical statistics.
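These figures are easy to check numerically. The sketch below computes the probability of exactly 4 heads in 10 tosses of a fair coin, and the standardized deviation of 40,000 heads in 100,000 tosses:

```python
from math import comb, sqrt

# Probability of exactly 4 heads in a series of 10 tosses of a fair coin.
p4 = comb(10, 4) / 2**10
print(f"{p4:.4f}")  # 0.2051, i.e. about 20.5% of all series

# For 100,000 tosses of a fair coin, the heads count has mean 50,000 and
# standard deviation sqrt(100000 * 0.5 * 0.5), roughly 158.  An observed
# count of 40,000 lies dozens of standard deviations below the mean, so
# such a coin cannot be considered symmetric.
mean, sd = 100_000 * 0.5, sqrt(100_000 * 0.25)
z = (40_000 - mean) / sd
print(f"{z:.1f}")  # about -63.2
```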

This example may not seem serious enough, but it is. Drawing lots is widely used in organizing industrial feasibility experiments, for example, when processing the results of measuring a quality index (the friction moment) of bearings depending on various technological factors (the influence of a conservation environment, methods of preparing bearings before measurement, the effect of bearing load during measurement, etc.). Suppose it is necessary to compare the quality of bearings depending on the results of their storage in two different conservation oils. When planning such an experiment, the question arises which bearings should be placed in the first oil and which in the second, in such a way as to avoid subjectivity and ensure the objectivity of the decision.

The answer to this question can be obtained by drawing lots. A similar example can be given with the quality control of any product. Sampling is done to decide whether or not an inspected lot of products meets the specified requirements. Based on the results of the sample control, a conclusion is made about the entire batch. In this case, it is very important to avoid subjectivity in the formation of the sample, i.e. it is necessary that each unit of product in the controlled lot has the same probability of being selected in the sample. Under production conditions, the selection of units of production in the sample is usually carried out not by lot, but by special tables of random numbers or with the help of computer random number generators.

Similar problems of ensuring the objectivity of comparison arise when comparing different schemes of production organization or remuneration, during tenders and competitions, in the selection of candidates for vacant positions, and so on. Everywhere a lottery or a similar procedure is needed. Let us explain with the example of identifying the strongest and second strongest teams in a tournament organized by the Olympic system (the loser is eliminated). Suppose the stronger team always beats the weaker one. Clearly, the strongest team will certainly become the champion. The second strongest team will reach the final if and only if it plays no games against the future champion before the final. If such a game is scheduled, the second strongest team will not reach the final. Whoever plans the tournament can either "knock out" the second strongest team ahead of time, pairing it with the leader in the first round, or secure it second place by arranging meetings with weaker teams until the final. To avoid subjectivity, lots are drawn. For an 8-team tournament, the probability that the two strongest teams will meet in the final is 4/7. Accordingly, with probability 3/7, the second strongest team will leave the tournament early.
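The 4/7 figure can be checked by simulating random draws. In the sketch below, a random bracket of 8 teams is drawn; teams are identified by strength rank (0 is the strongest), and since the stronger team always wins, the two strongest teams meet in the final exactly when the draw puts them in different halves of the bracket:

```python
import random

def strongest_two_meet_in_final(n_teams=8):
    """Draw a random single-elimination bracket.  Assuming the stronger
    team always wins, teams 0 and 1 meet in the final iff the draw puts
    them in different halves of the bracket."""
    bracket = list(range(n_teams))
    random.shuffle(bracket)
    half = n_teams // 2
    return (bracket.index(0) < half) != (bracket.index(1) < half)

random.seed(1)
trials = 100_000
freq = sum(strongest_two_meet_in_final() for _ in range(trials)) / trials
print(f"{freq:.3f}")  # close to 4/7, i.e. about 0.571
```

The exact value follows directly: once the champion's half of the bracket is fixed, 4 of the remaining 7 slots lie in the other half, hence 4/7.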

In any measurement of product units (using a caliper, micrometer, ammeter, etc.), there are errors. To find out if there are systematic errors, it is necessary to make repeated measurements of a unit of production, the characteristics of which are known (for example, a standard sample). It should be remembered that in addition to the systematic error, there is also a random error.

Therefore, the question arises of how to find out from the measurement results whether there is a systematic error. If we note only whether the error obtained in each successive measurement is positive or negative, the problem reduces to the previous one. Indeed, compare a measurement with a coin toss: a positive error with heads, a negative one with tails (a zero error, given a sufficiently fine scale, almost never occurs). Then checking for the absence of a systematic error is equivalent to checking the symmetry of the coin.

The purpose of these considerations is to reduce the problem of checking for the absence of a systematic error to the problem of checking the symmetry of a coin. The above reasoning leads to the so-called sign test in mathematical statistics.
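The sign test can be sketched in a few lines: under the null hypothesis of no systematic error, positive and negative measurement errors are equally likely, so the count of positive signs follows a Binomial(n, 1/2) law. The measurement counts below are hypothetical:

```python
from math import comb

def sign_test_p_value(n_pos, n_neg):
    """Two-sided sign test: under the null hypothesis of no systematic
    error, the number of positive signs is Binomial(n, 1/2)."""
    n = n_pos + n_neg
    k = min(n_pos, n_neg)
    tail = sum(comb(n, i) for i in range(0, k + 1)) / 2**n
    return min(1.0, 2 * tail)

# Suppose 15 of 20 repeated measurements of a standard sample deviate upward:
p = sign_test_p_value(15, 5)
print(f"{p:.3f}")  # about 0.041: evidence of a systematic error at the 5% level
```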

In the statistical regulation of technological processes, the methods of mathematical statistics are used to develop rules and plans for statistical process control, aimed at timely detection of process disorder, taking measures to adjust the process, and preventing the release of products that do not meet the established requirements. These measures reduce production costs and losses from the supply of low-quality products. In statistical acceptance control, the methods of mathematical statistics are used to develop quality control plans based on the analysis of samples from product batches. The difficulty lies in correctly building the probabilistic-statistical decision-making models on the basis of which these questions can be answered. In mathematical statistics, probabilistic models and methods for testing hypotheses have been developed for this purpose, in particular hypotheses that the proportion of defective units of production equals a certain number, for example 23% (recall Strukov's words from A.N. Tolstoy's novel).

Estimation problems. In a number of managerial, industrial, economic, and national economic situations, problems of a different type arise: problems of estimating the characteristics and parameters of probability distributions.

Consider an example. Let a batch of N electric lamps arrive for inspection. A sample of n lamps is selected at random from this batch. A number of natural questions arise. How can the average service life of the lamps be determined from the test results for the sample, and with what accuracy can this characteristic be estimated? How does the accuracy change if a larger sample is taken? For what number of hours t can it be guaranteed that at least 90% of the lamps will last t hours or more?
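A sketch of how such estimates might be computed from sample data. The lifetimes are invented for illustration, and `statistics.quantiles` with its default method is only one of several reasonable percentile estimators:

```python
import statistics

# Hypothetical lifetimes (hours) of a sample of 10 lamps from the batch.
lifetimes = [980, 1120, 1050, 890, 1200, 1010, 950, 1100, 1300, 870]

mean = statistics.mean(lifetimes)
# Standard error of the mean shrinks like 1/sqrt(n): quadrupling the
# sample roughly halves it.
se = statistics.stdev(lifetimes) / len(lifetimes) ** 0.5

# Empirical estimate of the time t exceeded by at least 90% of lamps:
# the 10th percentile of the sample.
t_90 = statistics.quantiles(lifetimes, n=10)[0]

print(mean, round(se, 1), t_90)
```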

Suppose that when the sample was tested, some of the lamps turned out to be defective. The following questions then arise. What limits can be specified for the number of defective lamps in the whole batch, for the defect level, and so on?

Or suppose that in a statistical analysis of the accuracy and stability of technological processes it is necessary to evaluate such quality indicators as the average value of the controlled parameter and the degree of its spread in the process under consideration. According to probability theory, it is advisable to use the expected value as the average value of a random variable, and the variance, standard deviation, or coefficient of variation as statistical characteristics of the spread. This raises the question: how are these statistical characteristics to be estimated from sample data, and with what accuracy can this be done? There are many similar examples. Here it was important to show how probability theory and mathematical statistics can be used in production management when making decisions in the field of statistical product quality management.
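A minimal sketch of estimating these characteristics from sample data with Python's standard `statistics` module (the measurements are hypothetical):

```python
import statistics

# Measured values of a controlled parameter (invented process data).
x = [10.2, 9.8, 10.1, 10.4, 9.9, 10.0, 10.3, 9.7]

mean = statistics.mean(x)     # estimate of the expected value
var = statistics.variance(x)  # sample variance (unbiased, divisor n - 1)
sd = statistics.stdev(x)      # sample standard deviation
cv = sd / mean                # coefficient of variation

print(round(mean, 3), round(sd, 3), round(cv, 4))
```

The accuracy of these estimates improves with sample size: the standard error of the mean, for instance, decreases on the order of 1/sqrt(n).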

What is "mathematical statistics"? Mathematical statistics is understood as "a branch of mathematics devoted to the mathematical methods of collecting, systematizing, processing and interpreting statistical data, as well as using them for scientific or practical conclusions. The rules and procedures of mathematical statistics are based on the theory of probability, which makes it possible to evaluate the accuracy and reliability of the conclusions obtained in each task on the basis of the available statistical material" [2.2, p. 326]. Statistical data here means information about the number of objects in some more or less extensive collection that have certain characteristics.

According to the type of problems being solved, mathematical statistics is usually divided into three sections: data description, estimation, and hypothesis testing.

According to the type of statistical data being processed, mathematical statistics is divided into four areas:

  • one-dimensional statistics (statistics of random variables), in which the result of an observation is described by a real number;
  • multivariate statistical analysis, where the result of an observation of an object is described by several numbers (a vector);
  • statistics of random processes and time series, where the result of an observation is a function;
  • statistics of objects of a non-numerical nature, in which the result of an observation is non-numerical, for example a set (a geometric figure), an ordering, or the outcome of a measurement on a qualitative attribute.

Historically, certain areas of statistics of non-numerical objects (in particular, problems of estimating the defect rate and testing hypotheses about it) and one-dimensional statistics were the first to appear. The mathematical apparatus is simpler for them, so they are usually used to demonstrate the main ideas of mathematical statistics.

Only those data-processing methods are evidence-based, i.e. belong to mathematical statistics, that rely on probabilistic models of the relevant real phenomena and processes. These are models of consumer behavior, the occurrence of risks, the functioning of technological equipment, the obtaining of experimental results, the course of a disease, and so on. A probabilistic model of a real phenomenon should be considered constructed if the quantities under consideration and the relationships between them are expressed in terms of probability theory. The correspondence of the probabilistic model to reality, i.e. its adequacy, is substantiated, in particular, by means of statistical methods for testing hypotheses.

Non-probabilistic data-processing methods are exploratory: they can be used only in preliminary data analysis, since they do not make it possible to assess the accuracy and reliability of conclusions drawn from limited statistical material.

Probabilistic and statistical methods are applicable wherever it is possible to construct and substantiate a probabilistic model of a phenomenon or process. Their use is mandatory when conclusions drawn from sample data are transferred to the entire population (for example, from a sample to an entire batch of products).

Specific applications use both probabilistic-statistical methods of wide applicability and specific ones. For example, the section of production management devoted to statistical methods of product quality control uses applied mathematical statistics (including the design of experiments). Its methods are used for statistical analysis of the accuracy and stability of technological processes and for statistical quality evaluation. Specific methods include statistical acceptance control of product quality, statistical regulation of technological processes, reliability assessment and control, and others.

Such applied probabilistic-statistical disciplines as reliability theory and queuing theory are widely used. The content of the first is clear from its title; the second studies systems such as a telephone exchange, which receives calls at random times from subscribers dialing numbers on their telephones. The duration of serving these calls, i.e. the duration of conversations, is also modeled by random variables. A great contribution to the development of these disciplines was made by Corresponding Member of the USSR Academy of Sciences A.Ya. Khinchin (1894-1959), Academician of the Academy of Sciences of the Ukrainian SSR B.V. Gnedenko (1912-1995), and other domestic scientists.

Briefly about the history of mathematical statistics. Mathematical statistics as a science begins with the works of the famous German mathematician Carl Friedrich Gauss (1777-1855), who, on the basis of probability theory, investigated and substantiated the method of least squares, which he created in 1795 and used to process astronomical data (to refine the orbit of the minor planet Ceres). One of the most popular probability distributions, the normal distribution, is often named after him, and in the theory of random processes the main object of study is Gaussian processes.
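The method of least squares itself fits in a few lines. The sketch below fits a straight line y = a + b*x to invented data by solving the normal equations directly:

```python
# Ordinary least squares for a straight line y ~ a + b*x, solved from
# the normal equations.  The observations are hypothetical.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
sx, sy = sum(xs), sum(ys)
sxx = sum(x * x for x in xs)
sxy = sum(x * y for x, y in zip(xs, ys))

b = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # slope
a = (sy - b * sx) / n                          # intercept

print(round(a, 3), round(b, 3))
```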

In the late 19th and early 20th centuries, a major contribution to mathematical statistics was made by English researchers, primarily K. Pearson (1857-1936) and R.A. Fisher (1890-1962). In particular, Pearson developed the chi-squared test for testing statistical hypotheses, and Fisher developed the analysis of variance, the theory of experimental design, and the maximum likelihood method of parameter estimation.

In the 1930s, the Pole Jerzy Neyman (1894-1977) and the Englishman E. Pearson developed the general theory of testing statistical hypotheses, while the Soviet mathematicians Academician A.N. Kolmogorov (1903-1987) and Corresponding Member of the USSR Academy of Sciences N.V. Smirnov (1900-1966) laid the foundations of nonparametric statistics. In the 1940s, the Romanian-born A. Wald (1902-1950) built the theory of sequential statistical analysis.

Mathematical statistics is developing rapidly at the present time. Thus, over the past 40 years, four fundamentally new areas of research can be distinguished [2.16]:

  • development and implementation of mathematical methods for planning experiments;
  • development of statistics of objects of non-numerical nature as an independent direction in applied mathematical statistics;
  • development of statistical methods resistant to small deviations from the used probabilistic model;
  • wide deployment of work on the creation of computer software packages designed for statistical data analysis.

Probabilistic-statistical methods and optimization. The idea of optimization permeates modern applied mathematical statistics and other statistical methods: methods of experimental design, statistical acceptance control, statistical regulation of technological processes, and so on. On the other hand, optimization formulations in decision-making theory, for example in the applied theory of optimizing product quality and standards requirements, provide for the widespread use of probabilistic-statistical methods, primarily applied mathematical statistics.

In production management, in particular when optimizing product quality and standards requirements, it is especially important to apply statistical methods at the initial stage of the product life cycle, i.e. at the stage of research preparation for experimental design development (development of promising product requirements, preliminary design, terms of reference for experimental design development). This is due to the limited information available at the initial stage of the product life cycle and the need to predict the technical possibilities and the economic situation for the future. Statistical methods should be applied at all stages of solving an optimization problem: when scaling variables, developing mathematical models of the functioning of products and systems, conducting technical and economic experiments, and so on.

In optimization problems, including optimization of product quality and standards requirements, all areas of statistics are used: statistics of random variables, multivariate statistical analysis, statistics of random processes and time series, and statistics of objects of non-numerical nature. The choice of a statistical method for analyzing specific data should follow the relevant published recommendations.

3.5.1. Probabilistic-statistical method of research.

In many cases it is necessary to investigate not only deterministic processes, but also random (probabilistic, statistical) ones. These processes are considered on the basis of probability theory.

The primary mathematical material is a collection of values of a random variable x. A collection is understood as a set of homogeneous events. The set containing the most diverse variants of a mass phenomenon is called the general population, or a large sample N. Usually only a part of the general population is studied, called the sample population, or small sample.

The probability P(x) of an event x is the ratio of the number of cases N(x) that lead to the occurrence of the event x to the total number N of possible cases:

P(x) = N(x)/N.
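In practice this frequency definition is applied to observed or simulated data. A small sketch estimating a probability as the relative frequency N(x)/N, using a simulated fair die:

```python
import random

# Estimate P(six) = N(six)/N as the share of simulated rolls of a fair
# die that produce a six.
random.seed(0)
N = 60_000
N_six = sum(1 for _ in range(N) if random.randint(1, 6) == 6)
p_hat = N_six / N
print(round(p_hat, 3))  # close to 1/6, i.e. about 0.167
```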

Probability theory considers theoretical distributions of random variables and their characteristics.

Mathematical statistics deals with methods of processing and analyzing empirical data.

These two related sciences constitute a unified mathematical theory of mass random processes, widely used in the analysis of scientific research.

The methods of probability theory and mathematical statistics are very often used in the theory of reliability, survivability, and safety, which is applied in various branches of science and technology.

3.5.2. Method of statistical modeling or statistical tests (Monte Carlo method).

This method is a numerical method for solving complex problems; it is based on the use of random numbers that simulate probabilistic processes. The results obtained by this method make it possible to establish the dependences of the processes under study empirically.

Solving problems by the Monte Carlo method is effective only with the use of high-speed computers. It requires a statistical series and knowledge of its distribution law, its mean value (mathematical expectation) m(x), and its standard deviation.

Using this method, one can obtain any prescribed accuracy of the solution: by the law of large numbers, the sample mean over N trials converges to the expectation,

x̄_N -> m(x) as N -> ∞.
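The convergence of the sample mean to m(x) is easy to observe numerically. A sketch with a uniform random variable on [0, 1), whose true expectation is 0.5:

```python
import random

random.seed(42)

def sample_mean(N):
    """Mean of N uniform(0, 1) draws: a Monte Carlo estimate of m(x) = 0.5."""
    return sum(random.random() for _ in range(N)) / N

for N in (100, 10_000, 1_000_000):
    print(N, round(sample_mean(N), 4))
# The error shrinks on the order of 1/sqrt(N), so each hundredfold
# increase in N buys roughly one more decimal digit of accuracy.
```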

3.5.3. System analysis method.

System analysis is understood as a set of techniques and methods for studying complex systems, i.e. complexes of interacting elements. The interaction of the elements of a system is characterized by direct and feedback connections.

The essence of system analysis is to identify these relationships and establish their impact on the behavior of the entire system as a whole. The most complete and deep system analysis can be performed using the methods of cybernetics, which is the science of complex dynamic systems that can perceive, store and process information for the purposes of optimization and control.

System analysis consists of four stages.

The first stage consists in setting the task: the object, goals, and objectives of the study are determined, as well as the criteria for studying the object and managing it.

During the second stage, the boundaries of the system under study and its structure are determined. All objects and processes related to the goal are divided into two classes: the system under study and the external environment. Closed and open systems are distinguished. When studying closed systems, the influence of the external environment on their behavior is neglected. The individual components of the system, its elements, are then separated, and the interactions between them and with the external environment are established.

The third stage of system analysis is the construction of a mathematical model of the system under study. First, the system is parametrized: the main elements of the system and the elementary effects on it are described by means of certain parameters. Some parameters characterize continuous processes, others discrete ones, deterministic or probabilistic. Depending on the characteristics of the processes, one or another mathematical apparatus is used.

As a result of the third stage of system analysis, complete mathematical models of the system are formed, described in a formal, for example, algorithmic, language.

At the fourth stage, the resulting mathematical model is analyzed, its extreme conditions are found in order to optimize the processes and control the systems, and conclusions are formulated. Optimization is evaluated by an optimization criterion, which in this case takes extreme values (minimum, maximum, minimax).

Usually one criterion is chosen, and threshold (maximum permissible) values are set for the others. Sometimes mixed criteria are used, which are functions of the primary parameters.

Based on the selected optimization criterion, the dependence of the optimization criterion on the parameters of the model of the object (process) under study is compiled.

There are various mathematical methods for optimizing the models under study: linear, nonlinear, or dynamic programming; probabilistic-statistical methods based on queuing theory; and game theory, which treats the development of processes as random situations.

Questions for self-control of knowledge

Methodology of theoretical research.

The main sections of the stage of theoretical development of scientific research.

Types of models and types of modeling of the object of study.

Analytical methods of research.

Analytical research methods using experiment.

Probabilistic-analytical method of research.

Methods of statistical modeling (the Monte Carlo method).

Method of system analysis.

Statistical Methods

Statistical methods are methods of analysis of statistical data. There are methods of applied statistics, which can be applied in all areas of scientific research and all sectors of the national economy, and other statistical methods whose applicability is limited to a particular area. The latter include statistical acceptance control, statistical control of technological processes, reliability and testing, and design of experiments.

Classification of statistical methods

Statistical methods of data analysis are used in almost all areas of human activity. They are used whenever it is necessary to obtain and substantiate any judgments about a group (objects or subjects) with some internal heterogeneity.

It is advisable to distinguish three types of scientific and applied activities in the field of statistical methods of data analysis (according to the degree of specificity of methods associated with immersion in specific problems):

a) development and research of general purpose methods, without taking into account the specifics of the field of application;

b) development and research of statistical models of real phenomena and processes in accordance with the needs of a particular field of activity;

c) application of statistical methods and models for statistical analysis of specific data.

Applied Statistics

Describing the type of the data and the mechanism of their generation is the beginning of any statistical study. Both deterministic and probabilistic methods are used to describe data. Deterministic methods can analyze only the data that are at the researcher's disposal; for example, such methods were used to obtain the tables calculated by official state statistics bodies on the basis of statistical reports submitted by enterprises and organizations. The results obtained can be transferred to a wider population, and used for prediction and control, only on the basis of probabilistic-statistical modeling. Therefore, mathematical statistics is often taken to include only methods based on probability theory.

We do not consider it possible to oppose deterministic and probabilistic-statistical methods. We consider them as successive stages of statistical analysis. At the first stage, it is necessary to analyze the available data, present them in a form convenient for perception using tables and charts. Then it is advisable to analyze the statistical data on the basis of certain probabilistic-statistical models. Note that the possibility of a deeper insight into the essence of a real phenomenon or process is provided by the development of an adequate mathematical model.

In the simplest situation, statistical data are the values of some feature of the objects under study. The values can be quantitative, or they can indicate the category to which an object belongs. In the latter case we speak of a qualitative feature.

When an object is measured by several quantitative or qualitative characteristics, the statistical data about it form a vector. This can be considered a new kind of data; in this case, the sample consists of a set of vectors. If some coordinates are numbers and others are qualitative (categorized) data, we speak of a vector of heterogeneous data.

One element of the sample, that is, one observation, can be a function as a whole: for example, a record of the dynamics of an indicator, i.e. its change over time, such as a patient's electrocardiogram or the beat amplitude of a motor shaft, or a time series describing the dynamics of a firm's performance. Then the sample consists of a set of functions.

The elements of the sample can also be other mathematical objects, for example binary relations. Thus, when polling experts, orderings (rankings) of the objects of expertise are often used: product samples, investment projects, options for management decisions. Depending on the regulations of the expert study, the sample elements can be binary relations of various types (orderings, partitions, tolerances), sets, fuzzy sets, etc.

So, the mathematical nature of the sample elements in various problems of applied statistics can be very different. However, two classes of statistics can be distinguished - numeric and non-numeric. Accordingly, applied statistics is divided into two parts - numerical statistics and non-numerical statistics.

Numerical statistical data are numbers, vectors, and functions. They can be added and multiplied by coefficients, so various sums are of great importance in numerical statistics. The mathematical apparatus for analyzing sums of random sample elements comprises the (classical) laws of large numbers and central limit theorems.

Non-numerical statistical data are categorized data, vectors of heterogeneous features, binary relations, sets, fuzzy sets, etc. They cannot be added or multiplied by coefficients, so it does not make sense to speak of sums of non-numerical statistics. They are elements of non-numerical mathematical spaces (sets). The mathematical apparatus for the analysis of non-numerical statistical data is based on the use of distances between elements (as well as proximity measures and difference indicators) in such spaces. With the help of distances, empirical and theoretical averages are defined, laws of large numbers are proved, nonparametric estimates of the probability distribution density are constructed, problems of diagnostics and cluster analysis are solved, and so on.
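The idea of an average defined through distances can be illustrated on the simplest non-numerical space: binary vectors with the Hamming distance, where the sum of distances to the sample is minimized by the coordinate-wise majority vote. A hypothetical example:

```python
# Empirical mean in a non-numerical space: with the Hamming distance on
# binary vectors, the element minimizing the total distance to the
# sample is the coordinate-wise majority vote.  A minimal sketch.

sample = [
    (1, 0, 1, 1),
    (1, 1, 0, 1),
    (0, 0, 1, 1),
    (1, 0, 1, 0),
]

def hamming(u, v):
    """Number of coordinates in which u and v differ."""
    return sum(a != b for a, b in zip(u, v))

def empirical_mean(vectors):
    """Majority vote in each coordinate (ties resolved toward 1 here)."""
    n = len(vectors)
    return tuple(int(sum(col) * 2 >= n) for col in zip(*vectors))

m = empirical_mean(sample)
total = sum(hamming(m, v) for v in sample)
print(m, total)  # (1, 0, 1, 1) with total distance 4
```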

Applied research uses statistical data of various kinds. This is due, in particular, to the methods of obtaining them. For example, if testing of some technical devices continues only up to a certain point in time, we obtain so-called censored data: a set of numbers, the operating times of a number of devices before failure, plus the information that the remaining devices were still working when the test ended. Censored data are often used in assessing and controlling the reliability of technical devices.
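A standard tool for such data is the Kaplan-Meier survival estimator; the text does not name a specific method, so the sketch below is only one possible treatment, with invented failure and censoring times:

```python
# A minimal Kaplan-Meier sketch for censored lifetime data: failure
# times are observed exactly, while censored units are only known to
# have survived past the end of the test.  Hypothetical data.

failures = [120, 200, 250, 400]   # hours to failure
censored = [500, 500, 500]        # still working when the test stopped

def km_survival(failures, censored):
    """Survival probability S(t) just after each distinct failure time."""
    events = sorted(set(failures))
    all_times = failures + censored
    s, curve = 1.0, []
    for t in events:
        at_risk = sum(1 for u in all_times if u >= t)  # units still on test
        d = failures.count(t)                          # failures at time t
        s *= 1 - d / at_risk
        curve.append((t, round(s, 4)))
    return curve

curve = km_survival(failures, censored)
print(curve)
```

The censored units stay in the "at risk" count at every failure time, which is exactly the information that a naive mean of the observed failure times would throw away.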

Usually, statistical methods for analyzing data of the first three types (numbers, vectors, functions) are considered separately from methods for non-numerical data. This separation is caused by the circumstance, noted above, that the mathematical apparatus for analyzing data of a non-numerical nature differs essentially from that for data in the form of numbers, vectors, and functions.

Probabilistic-statistical modeling

When statistical methods are applied in specific areas of knowledge and sectors of the national economy, we obtain scientific and practical disciplines such as "statistical methods in industry", "statistical methods in medicine", etc. From this point of view, econometrics is "statistical methods in economics". These disciplines of group b) are usually based on probabilistic-statistical models built in accordance with the characteristics of the application area. It is very instructive to compare the probabilistic-statistical models used in different areas, discovering their closeness and at the same time noting certain differences. Thus, one can see the closeness of the problem statements, and of the statistical methods used to solve them, in such areas as scientific medical research, specific sociological research, and marketing research, or, in short, in medicine, sociology, and marketing. These areas are often grouped together under the name "sample studies".

The difference between sampling studies and expert studies shows up, first of all, in the number of objects or subjects examined: sampling studies usually deal with hundreds, expert studies with tens. The technology of expert research, however, is considerably more sophisticated. The specificity is even more pronounced in demographic or logistical models, in the processing of narrative (textual, chronicle) information, and in the study of the mutual influence of factors.

Issues of reliability and safety of technical devices and technologies, queuing theory are considered in detail in a large number of scientific papers.

Statistical analysis of specific data

The application of statistical methods and models for the statistical analysis of specific data is closely tied to the problems of the respective field. The results of the third of the identified types of scientific and applied activities are at the intersection of disciplines. They can be considered as examples of the practical application of statistical methods. But there is no less reason to attribute them to the corresponding field of human activity.

For example, the results of a survey of instant-coffee consumers naturally belong to marketing (and are presented as such in lectures on marketing research). The study of price dynamics using inflation indices calculated from independently collected information is of interest primarily from the point of view of economics and the management of the national economy (both at the macro level and at the level of individual organizations).

Development prospects

The theory of statistical methods is aimed at solving real problems. Therefore, new formulations of mathematical problems of statistical data analysis constantly appear in it, and new methods are developed and substantiated. Substantiation is often carried out by mathematical means, that is, by proving theorems. An important role is played by the methodological component: how exactly to pose problems and what assumptions to accept for further mathematical study. The role of modern information technologies, in particular computer experiments, is also significant.

An urgent task is to analyze the history of statistical methods in order to identify development trends and apply them for forecasting.




Introduction
1. Chi-square distribution
2. "Chi-square" in problems of statistical data analysis
Conclusion
Bibliography
Appendix

Introduction

How are the approaches, ideas, and results of probability theory used in our lives?

The basis is a probabilistic model of a real phenomenon or process, i.e. a mathematical model in which objective relations are expressed in terms of probability theory. Probabilities are used primarily to describe the uncertainties that must be taken into account when making decisions. This refers both to undesirable possibilities (risks) and to attractive ones ("lucky chances"). Sometimes randomness is deliberately introduced into a situation, for example when drawing lots, randomly selecting units for inspection, or conducting lotteries and consumer surveys.

Probability theory allows one to calculate other probabilities that are of interest to the researcher.

A probabilistic model of a phenomenon or process is the foundation of mathematical statistics. Two parallel series of concepts are used: those related to theory (the probabilistic model) and those related to practice (the sample of observational results). For example, the theoretical probability corresponds to the frequency found from the sample, and the mathematical expectation (theoretical series) corresponds to the sample arithmetic mean (practical series). As a rule, sample characteristics are estimates of theoretical ones. At the same time, the quantities of the theoretical series "exist in the minds of researchers" and refer to the world of ideas (in the sense of the ancient Greek philosopher Plato); they are not available for direct measurement. Researchers have only sample data, with the help of which they try to establish the properties of the theoretical probabilistic model that interest them.

Why do we need a probabilistic model? Only with its help can the properties established from the analysis of a particular sample be transferred to other samples and to the entire so-called general population. The term "general population" is used for a large but finite set of units under study, for example the set of all residents of Russia or the set of all instant-coffee consumers in Moscow. The purpose of marketing or sociological surveys is to transfer statements obtained from a sample of hundreds or thousands of people to general populations of several million. In quality control, the general population is a batch of products.

To transfer inferences from a sample to a larger population, some assumptions are needed about the relationship of sample characteristics with the characteristics of this larger population. These assumptions are based on an appropriate probabilistic model.

Of course, it is possible to process sample data without using one or another probabilistic model: you can calculate the sample arithmetic mean, the frequency with which certain conditions are met, etc. However, the results of such calculations apply only to the specific sample; transferring the conclusions obtained with their help to any other set is incorrect. This activity is sometimes called "data analysis". Compared with probabilistic-statistical methods, data analysis has limited cognitive value.

So, the use of probabilistic models based on estimation and testing of hypotheses with the help of sample characteristics is the essence of probabilistic-statistical decision-making methods.

1. Chi-square distribution

The normal distribution gives rise to three distributions that are now commonly used in statistical data processing: the Pearson ("chi-square"), Student, and Fisher distributions.

We will focus on the χ² ("chi-square") distribution. It was first studied in 1876 by the geodesist F. Helmert, who, in connection with the Gaussian theory of errors, investigated sums of squares of n independent standard normally distributed random variables. Karl Pearson later gave this distribution function the name "chi-square", and the distribution now bears his name.

Due to its close connection with the normal distribution, the χ² distribution plays an important role in probability theory and mathematical statistics. The χ² distribution, and the many other distributions defined through it (for example, the Student distribution), describe sample distributions of various functions of normally distributed observations and are used to construct confidence intervals and statistical tests.

The Pearson ("chi-square") distribution is the distribution of the random variable

χ² = X₁² + X₂² + … + Xₙ²,

where X1, X2, ..., Xn are independent normal random variables, each with mathematical expectation zero and standard deviation one.

This sum of squares is said to be distributed according to the "chi-square" law.

The number of terms, i.e. n, is called the "number of degrees of freedom" of the chi-square distribution. As the number of degrees of freedom increases, the distribution slowly approaches the normal one.

The density of this distribution is

f(x) = x^(n/2 − 1) e^(−x/2) / (2^(n/2) Γ(n/2)),  x ≥ 0,

where Γ(·) is the gamma function. So the χ² distribution depends on one parameter n, the number of degrees of freedom.

The distribution function of χ² has the form

F(x) = ∫₀ˣ f(t) dt,  x ≥ 0.

Figure 1 shows the probability density and distribution function of χ² for different numbers of degrees of freedom.

Figure 1. Probability density q(x) of the χ² ("chi-square") distribution for different numbers of degrees of freedom

Moments of the "chi-square" distribution: the mathematical expectation is E[χ²] = n and the variance is D[χ²] = 2n.

The chi-squared distribution is used in estimating variance (using a confidence interval), in testing hypotheses of agreement, homogeneity, independence, primarily for qualitative (categorized) variables that take on a finite number of values, and in many other tasks of statistical data analysis.
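The defining property and the moments stated above can be checked numerically. The following sketch (plain Python, no statistical libraries assumed) simulates a χ² variable with n degrees of freedom as a sum of n squared standard normal variables and compares the sample mean and variance with the theoretical values n and 2n:

```python
import random


def chi_square_sample(n_dof, n_samples=100_000, seed=42):
    """Draw samples of a chi-square variable as a sum of n_dof
    squared independent standard normal variables."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n_dof))
            for _ in range(n_samples)]


n = 5
samples = chi_square_sample(n)
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)

# Theoretical moments: E[chi2] = n, D[chi2] = 2n
print(f"sample mean     = {mean:.2f}  (theory: {n})")
print(f"sample variance = {var:.2f}  (theory: {2 * n})")
```

With 100,000 samples the estimates land close to the theoretical values; increasing `n_dof` also lets one watch the histogram of the samples become more symmetric, illustrating the slow approach to normality.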

2. "Chi-square" in problems of statistical data analysis

Statistical methods of data analysis are used in almost all areas of human activity. They are used whenever it is necessary to obtain and substantiate any judgments about a group (objects or subjects) with some internal heterogeneity.

The modern stage of development of statistical methods can be counted from 1900, when the Englishman K. Pearson founded the journal "Biometrika". The first third of the 20th century passed under the sign of parametric statistics. Methods based on the analysis of data from parametric families of distributions described by the Pearson family of curves were studied. The most popular was the normal distribution. The Pearson, Student, and Fisher criteria were used to test hypotheses. The maximum likelihood method and analysis of variance were proposed, and the main ideas of experimental design were formulated.

The chi-square distribution is one of the most widely used in statistics for testing statistical hypotheses. On the basis of the "chi-square" distribution, one of the most powerful goodness-of-fit tests, Pearson's "chi-square" test, is constructed.

A goodness-of-fit test is a criterion for testing a hypothesis about the proposed law of an unknown distribution.

The χ² ("chi-square") test is used to test hypotheses about a variety of distributions; this is its advantage.

The criterion is calculated by the formula

χ² = Σ (m − m′)² / m′,

where m and m′ are, respectively, the empirical and theoretical frequencies of the distribution under consideration, and n is the number of degrees of freedom.

For the test, we compare the empirical (observed) frequencies with the theoretical frequencies (calculated under the assumption of a normal distribution).

If the empirical frequencies coincide completely with the calculated or expected ones, then Σ(E − T) = 0 and the criterion χ² is also zero. If Σ(E − T) is not zero, there is a discrepancy between the calculated and empirical frequencies of the series. In such cases one must assess the significance of the criterion χ², which in theory can vary from zero to infinity. This is done by comparing the actually obtained value χ²f with its critical value χ²st for a chosen significance level (α) and number of degrees of freedom (n).

The distribution of probable values of the random variable χ² is continuous and asymmetric. It depends on the number of degrees of freedom (n) and approaches the normal distribution as the number of observations increases. Therefore, applying the χ² criterion to the assessment of discrete distributions involves some errors that affect its value, especially for small samples. To obtain more accurate estimates, the sample arranged in a variation series should contain at least 50 variants. Correct application of the χ² criterion also requires that the frequencies of variants in the extreme classes be no less than 5; if there are fewer than 5, they are combined with the frequencies of neighboring classes so that the total is at least 5. After frequencies are combined, the number of classes (N) decreases accordingly, and the number of degrees of freedom is set by the resulting number of classes, taking into account the number of restrictions on the freedom of variation.

Since the accuracy of the χ² criterion largely depends on the accuracy of the theoretical frequencies (T), unrounded theoretical frequencies should be used when forming the differences between empirical and calculated frequencies.

As an example, take a study published on a website dedicated to the application of statistical methods in the humanities.

The Chi-square test allows comparison of frequency distributions, whether they are normally distributed or not.

Frequency refers to the number of occurrences of an event. One usually deals with the frequency of an event when variables are measured on a nominal scale and their other characteristics are impossible or problematic to use, in other words, when the variable has qualitative characteristics. Also, many researchers prefer to translate test scores into levels (high, medium, low) and build tables of score distributions to find the number of people at each level. To prove that the number of people in one of the levels (one of the categories) is really larger (or smaller), the chi-square coefficient is also used.

Let's take a look at the simplest example.

A self-esteem test was conducted among younger adolescents. Test scores were translated into three levels: high, medium, low. The frequencies were distributed as follows:

High (H): 27 people
Medium (M): 12 people
Low (L): 11 people

It is obvious that children with high self-esteem are in the majority; however, this needs to be proved statistically. To do this, we use the chi-square test.

Our task is to check whether the obtained empirical data differ from the theoretically equiprobable ones. To do this, we need to find the theoretical frequencies. In our case, the theoretical frequencies are equiprobable frequencies, found by adding all the frequencies and dividing by the number of categories:

(H + M + L) / 3 = (27 + 12 + 11) / 3 = 16.67

The formula for calculating the chi-square test is:

χ² = Σ (E − T)² / T

We build the table:

Level    Empirical (E)   Theoretical (T)   (E − T)² / T
High         27              16.67             6.40
Medium       12              16.67             1.31
Low          11              16.67             1.93

The sum of the last column gives χ² = 9.64.

Now we need to find the critical value of the criterion from the table of critical values (Table 1 in the Appendix). For this we need the number of degrees of freedom (n):

n = (R − 1) * (C − 1),

where R is the number of rows in the table and C is the number of columns.

In our case there is only one column (the original empirical frequencies) and three rows (categories), so the formula changes: we exclude the columns, and

n = R − 1 = 3 − 1 = 2.

For the error probability p ≤ 0.05 and n = 2, the critical value is χ² = 5.99.

The empirical value obtained is greater than the critical value, so the frequency differences are significant (χ² = 9.64; p ≤ 0.05).
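The whole calculation above takes only a few lines of code. A minimal sketch in plain Python (the critical value 5.99 is taken from the table, as in the text):

```python
# Chi-square goodness-of-fit test for the self-esteem example:
# three observed level frequencies vs. equal theoretical frequencies.
observed = [27, 12, 11]                    # high, medium, low
total = sum(observed)                      # 50 people
theoretical = [total / len(observed)] * len(observed)  # 16.67 each

chi2 = sum((e - t) ** 2 / t for e, t in zip(observed, theoretical))
n_dof = len(observed) - 1                  # n = R - 1 = 2

critical = 5.99                            # table value for p = 0.05, n = 2
print(f"chi2 = {chi2:.2f}, n = {n_dof}")   # chi2 = 9.64, n = 2
print("differences are significant" if chi2 > critical
      else "differences are not significant")
```

In a library-based workflow one would normally look up the critical value (or a p-value) programmatically rather than hard-code it, but the hand calculation matches the table lookup used in the text.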

As you can see, the calculation of the criterion is very simple and does not take much time. The practical value of the chi-square test is enormous. This method is most valuable in the analysis of responses to questionnaires.

Let's take a more complex example.

For example, a psychologist wants to know whether it is true that teachers are more biased toward boys than toward girls, i.e. more likely to praise girls. To do this, the psychologist analyzed pupil characteristics written by teachers, counting the frequency of occurrence of three words: "active", "diligent", "disciplined" (synonyms of the words were also counted).

Data on the frequency of occurrence of words were entered in the table:

To process the obtained data, we use the chi-square test.

To do this, we construct a table of distribution of empirical frequencies, i.e. the frequencies that we observe:

Theoretically, we expect the frequencies to be distributed equally, i.e. each frequency should be distributed proportionally between boys and girls. Let us build a table of theoretical frequencies: multiply the row sum by the column sum and divide the result by the grand total.

The resulting table for the calculations has the following form:

Group / word           Empirical (E)   Theoretical (T)   (E − T)² / T
Boys: "active"              …               …                 …
Boys: "diligent"            …               …                 …
Boys: "disciplined"         …               …                 …
Girls: "active"             …               …                 …
Girls: "diligent"           …               …                 …
Girls: "disciplined"        …               …                 …

Total: 4.21

χ² = Σ (E − T)² / T = 4.21

n = (R − 1)(C − 1),

where R is the number of rows and C the number of columns in the table.

In our case, chi-square = 4.21 and n = (2 − 1)(3 − 1) = 2.

From the table of critical values of the criterion, for n = 2 and an error level of 0.05, the critical value is χ² = 5.99.

The resulting value is less than the critical value, which means that the null hypothesis is accepted.

Conclusion: teachers do not attach importance to a child's gender when writing his or her characteristics.
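Since the original word-frequency counts are not preserved above, the sketch below uses hypothetical frequencies purely to illustrate the procedure: theoretical frequencies from row and column sums, then χ² with (R − 1)(C − 1) degrees of freedom:

```python
# Hypothetical 2x3 word-frequency table (illustrative values, not the
# counts from the study): rows = boys, girls; columns = "active",
# "diligent", "disciplined".
observed = [
    [10, 5, 7],    # boys
    [5, 8, 9],     # girls
]

row_sums = [sum(row) for row in observed]
col_sums = [sum(col) for col in zip(*observed)]
grand = sum(row_sums)

# Theoretical frequency = row sum * column sum / grand total
theoretical = [[r * c / grand for c in col_sums] for r in row_sums]

chi2 = sum((o - t) ** 2 / t
           for o_row, t_row in zip(observed, theoretical)
           for o, t in zip(o_row, t_row))
dof = (len(observed) - 1) * (len(observed[0]) - 1)   # (R-1)(C-1) = 2

print(f"chi2 = {chi2:.2f}, dof = {dof}")   # chi2 = 2.61, dof = 2
print("reject H0" if chi2 > 5.99 else "accept H0")   # 5.99: p = 0.05, n = 2
```

With these made-up counts χ² is below the critical value, so, as in the article's example, the null hypothesis of no gender effect would be accepted.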

Conclusion

Students of almost all specialties study the section "probability theory and mathematical statistics" at the end of a course of higher mathematics, but in reality they become acquainted only with some basic concepts and results, which are clearly not enough for practical work. Students meet some mathematical methods of research in special courses (for economics students, for example, "Forecasting and technical-economic planning", "Technical-economic analysis", "Product quality control", "Marketing", "Controlling", "Mathematical methods of forecasting", "Statistics", etc.), but the presentation in most cases is very abridged and prescriptive in nature. As a result, the knowledge of applied statisticians is insufficient.

That is why the course "Applied Statistics" is needed in technical universities, and the course "Econometrics" in economics universities, since econometrics is, as is well known, the statistical analysis of specific economic data.

Probability theory and mathematical statistics provide fundamental knowledge for applied statistics and econometrics.

They are necessary for specialists for practical work.

I considered a continuous probabilistic model and tried to show its usefulness with examples.

At the end of my work, I came to the conclusion that competent implementation of the basic procedures of mathematical-statistical data analysis and statistical testing of hypotheses is impossible without knowledge of the chi-square model and the ability to use its tables.

Bibliography

1. Orlov A.I. Applied Statistics. M.: Ekzamen, 2004.

2. Gmurman V.E. Probability Theory and Mathematical Statistics. M.: Vysshaya Shkola, 1999. - 479 p.

3. Ayvozyan S.A. Probability Theory and Applied Statistics, vol. 1. M.: Unity, 2001. - 656 p.

4. Khamitov G.P., Vedernikova T.I. Probabilities and Statistics. Irkutsk: BSUEP, 2006. - 272 p.

5. Ezhova L.N. Econometrics. Irkutsk: BSUEP, 2002. - 314 p.

6. Mosteller F. Fifty Entertaining Probabilistic Problems with Solutions. M.: Nauka, 1975. - 111 p.

7. Mosteller F. Probability. M.: Mir, 1969. - 428 p.

8. Yaglom A.M. Probability and Information. M.: Nauka, 1973. - 511 p.

9. Chistyakov V.P. A Course in Probability. M.: Nauka, 1982. - 256 p.

10. Kremer N.Sh. Probability Theory and Mathematical Statistics. M.: UNITI, 2000. - 543 p.

11. Mathematical Encyclopedia, vol. 1. M.: Soviet Encyclopedia, 1976. - 655 p.

12. http://psystat.at.ua/ - Statistics in psychology and pedagogy. Article on the chi-square test.

Appendix

Table 1. Critical points of the χ² distribution


Of particular interest is the quantitative assessment of entrepreneurial risk using the methods of mathematical statistics. The main tools of this method of assessment are:

  • the probability of occurrence of a random variable;
  • the mathematical expectation, or average value, of the random variable under study;
  • the variance;
  • the standard (root-mean-square) deviation;
  • the coefficient of variation;
  • the probability distribution of the random variable under study.

To make a decision, you need to know the magnitude (degree) of risk, which is measured by two criteria:

1) the average expected value (mathematical expectation);

2) the fluctuation (variability) of the possible result.

The average expected value is the weighted average of the random variable associated with the uncertain situation:

x̄ = Σ xᵢ pᵢ,

where xᵢ is a value of the random variable and pᵢ is its probability.

The mean expected value measures the outcome we expect on average.

The mean value is a generalized quantitative characteristic and by itself does not allow a decision in favor of any particular value of the random variable.

To make a decision, it is necessary to measure the fluctuations of indicators, that is, to determine the measure of the variability of a possible result.

The fluctuation of the possible result is the degree of deviation of the expected value from the average value.

For this purpose, two closely related criteria are usually used in practice: the variance and the standard deviation.

The variance is the weighted average of the squared deviations of the actual results from the average expected value:

σ² = Σ (xᵢ − x̄)² pᵢ.

The standard deviation is the square root of the variance. It is a dimensional quantity, measured in the same units as the random variable under study:

σ = √σ².

The variance and the standard deviation serve as measures of absolute fluctuation. For analysis, the coefficient of variation is usually used.

The coefficient of variation is the ratio of the standard deviation to the average expected value, multiplied by 100%:

V = (σ / x̄) · 100%.

The coefficient of variation is not affected by the absolute values ​​of the studied indicator.

With the help of the coefficient of variation, even fluctuations of features expressed in different units of measurement can be compared. The coefficient of variation can vary from 0 to 100%. The larger the ratio, the greater the fluctuation.


In economic statistics, the following assessment of different values of the coefficient of variation has been established:

up to 10% - weak fluctuation; 10-25% - moderate; over 25% - high.

Accordingly, the higher the fluctuations, the greater the risk.
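These definitions translate directly into code. A small sketch (plain Python; the sample distribution is the profit distribution for one of the purchase options in the example that follows, and could be replaced by any value/probability pairs):

```python
import math


def risk_measures(values, probs):
    """Mean, standard deviation and coefficient of variation (%)
    of a discrete random variable given its distribution."""
    mean = sum(x * p for x, p in zip(values, probs))
    var = sum((x - mean) ** 2 * p for x, p in zip(values, probs))
    std = math.sqrt(var)
    return mean, std, 100.0 * std / mean


def classify(v_percent):
    # Thresholds from economic statistics:
    # up to 10% weak, 10-25% moderate, over 25% high
    if v_percent <= 10:
        return "weak"
    if v_percent <= 25:
        return "moderate"
    return "high"


mean, std, v = risk_measures([350, 500], [0.1, 0.9])
print(f"mean = {mean:.0f}, std = {std:.0f}, V = {v:.1f}% ({classify(v)})")
# mean = 485, std = 45, V = 9.3% (weak)
```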

Example. The owner of a small store at the beginning of each day buys some perishable product for sale. A unit of this product costs 200 UAH. Selling price - 300 UAH. for a unit. From observations it is known that the demand for this product during the day can be 4, 5, 6 or 7 units with the corresponding probabilities 0.1; 0.3; 0.5; 0.1. If the product is not sold during the day, then at the end of the day it will always be bought at a price of 150 UAH. for a unit. How many units of this product should the store owner purchase at the beginning of the day?

Solution. Let us build a profit matrix for the store owner. First calculate the profit the owner receives if, for example, he buys 7 units of the product, sells 6 during the day, and sells one unit at the end of the day. Each unit sold during the day gives a profit of 300 − 200 = 100 UAH, and each unit sold at the end of the day gives a loss of 200 − 150 = 50 UAH. Thus the profit in this case is

6 · 100 − 1 · 50 = 550 UAH.

Calculations are carried out similarly for other combinations of supply and demand.

The expected profit is calculated as the mathematical expectation of the possible profit values in each row of the constructed matrix, taking into account the corresponding probabilities. The largest of the expected profits is 525 UAH, which corresponds to purchasing 6 units of the product.

To substantiate the final recommendation on how many units of the product to purchase, we calculate the variance, the standard deviation, and the coefficient of variation for each purchase quantity (each row of the profit matrix). The working table shows, for each row, the profit x, its probability p, and the products xp and x²p:

Buy 4 units:
  x = 400, p = 0.1: xp = 40,  x²p = 16000
  x = 400, p = 0.3: xp = 120, x²p = 48000
  x = 400, p = 0.5: xp = 200, x²p = 80000
  x = 400, p = 0.1: xp = 40,  x²p = 16000
  Totals: p = 1.0, x̄ = 400, Σx²p = 160000

Buy 5 units:
  x = 350, p = 0.1: xp = 35,  x²p = 12250
  x = 500, p = 0.3: xp = 150, x²p = 75000
  x = 500, p = 0.5: xp = 250, x²p = 125000
  x = 500, p = 0.1: xp = 50,  x²p = 25000
  Totals: p = 1.0, x̄ = 485, Σx²p = 237250

Buy 6 units:
  x = 300, p = 0.1: xp = 30,  x²p = 9000
  x = 450, p = 0.3: xp = 135, x²p = 60750
  x = 600, p = 0.5: xp = 300, x²p = 180000
  x = 600, p = 0.1: xp = 60,  x²p = 36000
  Totals: p = 1.0, x̄ = 525, Σx²p = 285750

From these totals, the variance for 5 units is 237250 − 485² = 2025, so σ = 45 and V = 45 / 485 ≈ 9.3%; for 6 units, 285750 − 525² = 10125, so σ ≈ 100.6 and V ≈ 19.2%; for 4 units the profit is certain, so V = 0%.

The advantage of purchasing 6 units of the product over 5 or 4 units is not obvious, since the risk when purchasing 6 units (19.2%) is greater than when purchasing 5 units (9.3%), and all the more so than when purchasing 4 units (0%).

Thus, we have all the information about the expected profits and risks. It remains for the store owner to decide how many units of the product to buy each morning, taking into account his experience and appetite for risk.

In our opinion, the store owner should be advised to buy 5 units of the product every morning; his average expected profit will then be 485 UAH. Purchasing 6 units yields an average expected profit of 525 UAH, which is 40 UAH more, but the risk in that case is 2.06 times greater.
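The whole decision analysis can be reproduced programmatically. A sketch under the stated assumptions of the example (cost 200 UAH, price 300 UAH, salvage 150 UAH, demand 4-7 with probabilities 0.1, 0.3, 0.5, 0.1):

```python
import math

COST, PRICE, SALVAGE = 200, 300, 150      # UAH per unit
DEMAND = [4, 5, 6, 7]
PROB = [0.1, 0.3, 0.5, 0.1]


def profit(bought, demanded):
    """Profit for one day: units sold during the day earn PRICE - COST,
    leftovers are sold off in the evening at a loss of COST - SALVAGE."""
    sold = min(bought, demanded)
    return sold * (PRICE - COST) - (bought - sold) * (COST - SALVAGE)


for bought in DEMAND:
    profits = [profit(bought, d) for d in DEMAND]
    mean = sum(x * p for x, p in zip(profits, PROB))
    var = sum((x - mean) ** 2 * p for x, p in zip(profits, PROB))
    v = 100.0 * math.sqrt(var) / mean     # coefficient of variation, %
    print(f"buy {bought}: expected profit {mean:.0f} UAH, risk V = {v:.1f}%")
# buy 4: expected profit 400 UAH, risk V = 0.0%
# buy 5: expected profit 485 UAH, risk V = 9.3%
# buy 6: expected profit 525 UAH, risk V = 19.2%
# buy 7: expected profit 490 UAH, risk V = 24.5%
```

The output reproduces the figures in the text and makes the trade-off explicit: 6 units maximizes expected profit, while 5 units trades 40 UAH of expected profit for roughly half the risk.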