Unit 1: Population, Sample and Data Condensation

Meaning of Statistics:

Statistics is not a new discipline; it is as old as human society itself. It has been in use since the earliest days of human life on earth, although the sphere of its utility was very restricted.

In the olden days Statistics was regarded as the ‘Science of Statecraft’ and was a by-product of the administrative activity of the state. The word Statistics seems to have been derived from the Latin word ‘status’, the Italian word ‘statista’, the German word ‘statistik’ or the French word ‘statistique’, each of which means a political state.

In India, an efficient system of collecting official and administrative statistics existed even 2000 years ago, in particular during the reign of Chandragupta Maurya (324-300 B.C.). Historical evidence of a very good system of collecting vital statistics and registering births and deaths even before 300 B.C. is available in Kautilya’s ‘Arthashastra’.

Records of land, agriculture and wealth statistics were maintained by Todar Mal, the land and revenue minister in the reign of Akbar (1556-1605 A.D.). A detailed account of the administrative and statistical surveys conducted during Akbar’s reign is available in the book “Ain-i-Akbari”, written in 1596-97 by Abul Fazl, one of the nine gems of Akbar.

Definition of Statistics:

Originally the word ‘statistics’ was used for the collection of data concerning states both historical and descriptive. Now it has acquired a much wider meaning and is used for all types of data and methods for the analysis of the data.

Concepts in Statistics:

1. Data:

You might be reading a newspaper regularly. Almost every newspaper gives the minimum and the maximum temperature recorded in the city on the previous day. It also indicates the rainfall recorded, and the time of sunrise and sunset. In a school, the attendance of the students is recorded in a register regularly.

Here we are recording the data of minimum and maximum temperature of the city, data of rainfall, data for the time of sunrise and sunset, and the data pertaining to the attendance of children.

As an example, the class-wise attendance of students, in a school, is as recorded in Table 2.0:

Class-Wise Attendance of Students

Table 2.0 gives the data for class-wise attendance of students. Here the data comprise 7 observations in all. These observations are the attendance for class VI, class VII, and so on. So, data refers to the set of observations, values, elements or objects under consideration. The complete set of all possible elements or objects is called a population.

Each of the elements is called a piece of data. Data also refers to known facts or things used as a basis for inference or reckoning: facts, information, or material to be processed or stored.

2. Scores:

Scores or other numbers in continuous series are to be thought of as distances along a continuum, rather than as discrete points. An inch is the linear magnitude between two divisions on a foot rule; and, in like manner, a score in a mental test is a unit distance between two limits. A score of 120 upon an intelligence examination, for example, represents the interval 119.5 up to 120.5.

The exact midpoint of this score interval is 120 as shown below:


Other scores may be interpreted in the same way. A score of 15, for instance, includes all values from 14.5 to 15.5, i.e., any value from a point .5 unit below 15 to a point .5 unit above 15. This means that 14.7, 15.0 and 15.4 would all be scored 15. “The usual mathematical meaning of a score is an interval which extends along some dimension from .5 unit below to .5 unit above the face value of the score.” (Garrett 1979)
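The interval reading of a score can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the treatment of a value lying exactly on a limit (here, 14.5 is scored 15 and 15.5 is scored 16) is an assumption that the text above does not settle.

```python
import math

def score_interval(score, unit=1.0):
    """Return the (lower, upper) limits of the interval a reported score stands for."""
    half = unit / 2
    return (score - half, score + half)

def reported_score(value):
    """Map a measured value to the whole-number score whose interval contains it.

    Assumption: a value exactly on a limit goes to the higher score.
    """
    return math.floor(value + 0.5)

print(score_interval(120))                               # (119.5, 120.5)
print([reported_score(v) for v in (14.7, 15.0, 15.4)])   # [15, 15, 15]
```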

3. Variable:

A characteristic on which individuals differ among themselves is called a variable. Thus speed, shape, height, weight, age, sex and grades are all examples of variables. In educational and psychological studies we often deal with variables relating to intellectual abilities.

Now, it is the aim of every physical and behavioural science to study the nature of the variation in whatever variable it is dealing with, and therefore, it is necessary to measure the extent and type of variation in a variable. Statistics is a branch of science which is concerned with the study of variables that vary in unpredictable fashion and helps in providing an understanding of the phenomena and objects which show such variations.

4. Measurement Scales:

Measurement refers to the assignment of numbers to objects and events according to logically acceptable rules. Numbers have many properties, such as identity, order and additivity. If we can legitimately assign numbers in describing objects and events, then the properties of numbers should be applicable to those objects and events.

It is essential to know about the different kinds of measurement scales, as the number of properties applicable depends upon the measurement scale applied to the objects or events.

Importance and Scope of Statistics:

The fact that statistical methods are universally applicable in the modern world is in itself enough to show how important the science of statistics is. As a matter of fact, there are millions of people all over the world who have not heard a word about statistics and yet make profuse use of statistical methods in their day-to-day decisions. Statistical methods are common ways of thinking and hence are used by all types of persons.

Examples can be multiplied to show that human behaviour and statistical methods have much in common. In fact statistical methods are so closely connected with human actions and behaviour that practically all human activity can be explained by statistical methods. This shows how important and universal statistics is.

Let us now discuss briefly the importance of statistics in some different disciplines:

(i) Statistics in Planning:

Statistics is indispensable in planning, be it at the business, economic or government level. The modern age is termed the ‘age of planning’, and almost all organisations in government, business or management resort to planning for efficient working and for formulating policy decisions.

To achieve this end, the statistical data relating to production, consumption, birth, death, investment, income are of paramount importance. Today efficient planning is a must for almost all countries, particularly the developing economies for their economic development.

(ii) Statistics in Mathematics:

Statistics is intimately related to and essentially dependent upon mathematics. The modern theory of Statistics has its foundations in the theory of probability, which in turn is a particular branch of the more advanced mathematical theory of Measure and Integration. The ever-increasing role of mathematics in statistics has led to the development of a new branch of statistics called Mathematical Statistics.

Thus Statistics may be considered to be an important member of the mathematics family. In the words of Connor, “Statistics is a branch of applied mathematics which specialises in data.”

(iii) Statistics in Economics:

Statistics and Economics are so intermixed with each other that it would be foolish to separate them. The development of modern statistical methods has led to an extensive use of statistics in Economics.

All the important branches of Economics (consumption, production, exchange, distribution, public finance) use statistics for the purposes of comparison, presentation, interpretation, etc. Problems such as the spending of income on and by different sections of the people, the production of national wealth, the adjustment of demand and supply, and the effect of economic policies on the economy all indicate the importance of statistics in economics and its different branches.

Statistics of public finance help us to decide which taxes to impose, which subsidies to provide, how much to spend under various heads, and how much money to borrow or lend. So we cannot think of Statistics without Economics or Economics without Statistics.

(iv) Statistics in Social Sciences:

Every social phenomenon is affected to a marked extent by a multiplicity of factors which bring out the variation in observations from time to time, place to place and object to object. Statistical tools of Regression and Correlation Analysis can be used to study and isolate the effect of each of these factors on the given observation.

Sampling Techniques and Estimation Theory are very powerful and indispensable tools for conducting any social survey, pertaining to any stratum of society, and then analysing the results and drawing valid inferences. The most important application of statistics in sociology is in the field of Demography, for studying mortality (death rates), fertility (birth rates), marriages, population growth and so on.

(v) Statistics in Trade:

Statistics is a body of methods for making wise decisions in the face of uncertainty. Business is full of uncertainties and risks. We have to forecast at every step. Speculation is just gaining or losing by way of forecasting. Can we forecast without taking the past into view? Perhaps not. The future trend of the market can only be anticipated if we make use of statistics. Failure in anticipation will mean failure of business.

Changes in demand, supply, habits, fashion etc. can be anticipated with the help of statistics. Statistics is of utmost significance in determining prices of the various products, determining the phases of boom and depression etc. Use of statistics helps in smooth running of the business, in reducing the uncertainties and thus contributes towards the success of business.


Population

A population includes all the elements of the data set under study. A measurable characteristic of the population, such as its mean or standard deviation, is known as a parameter. For example, all the people living in India make up the population of India.

There are different types of population. They are:

  • Finite Population
  • Infinite Population
  • Existent Population
  • Hypothetical Population

Let us discuss all the types one by one.

Finite Population

A finite population is also known as a countable population: one whose units can be counted. In other words, it is a population consisting of a finite number of individuals or objects. For statistical analysis, a finite population is more advantageous than an infinite population. Examples of finite populations are the employees of a company and the potential consumers in a market.

Infinite Population

An infinite population is also known as an uncountable population: one in which counting the units is not possible. An example of an infinite population is the germs in a patient’s body, whose number cannot be counted.

Existent Population

An existent population is defined as a population of concrete individuals. In other words, a population whose units are available in concrete form is known as an existent population. Examples are books, students etc.

Hypothetical Population

A population whose units are not available in concrete form is known as a hypothetical population. A population consists of sets of observations, objects etc. that all have something in common. In some situations, the populations are only hypothetical. Examples are the outcomes of rolling a die and the outcomes of tossing a coin.


Sample

A sample includes one or more observations drawn from the population, and a measurable characteristic of a sample is called a statistic. Sampling is the process of selecting the sample from the population. For example, some of the people living in India form a sample of the population of India.
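The difference between a parameter (computed from the whole population) and a statistic (computed from a sample) can be illustrated with a short sketch. The population here is hypothetical: 1,000 heights generated at random purely for illustration, not data from these notes. The sample size of 50 is likewise an arbitrary choice.

```python
import random
import statistics

# Hypothetical population: heights (cm) of 1,000 individuals,
# generated only for illustration.
rng = random.Random(0)
population = [rng.gauss(160, 10) for _ in range(1000)]

# Parameters describe the whole population.
mu = statistics.mean(population)
sigma = statistics.pstdev(population)   # population standard deviation

# A sample is a subset drawn from the population;
# its summary measures are statistics.
sample = rng.sample(population, 50)
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)            # sample standard deviation (n - 1 divisor)

print(f"parameter: mean = {mu:.2f}, sd = {sigma:.2f}")
print(f"statistic: mean = {x_bar:.2f}, sd = {s:.2f}")
```

The sample values will usually be close to, but not equal to, the population values. A different sample would give different statistics, while the parameters stay fixed.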

Basically, there are two types of sampling. They are:

  • Probability sampling
  • Non-probability sampling

Probability Sampling

In probability sampling, the population units cannot be selected at the discretion of the researcher. Instead, selection follows definite procedures which ensure that every unit of the population has a fixed, known probability of being included in the sample. Such a method is also called random sampling. Some of the techniques used for probability sampling are:

  • Simple random sampling
  • Cluster sampling
  • Stratified Sampling
  • Disproportionate sampling
  • Proportionate sampling
  • Optimum allocation stratified sampling
  • Multi-stage sampling
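Two of the techniques listed above, simple random sampling and proportionate stratified sampling, can be sketched as follows. The function names, the strata and the unit labels are all hypothetical and chosen only for illustration. The sketch rounds each stratum's share to the nearest whole number, so in general the total drawn may differ slightly from the requested sample size.

```python
import random

def simple_random_sample(units, n, seed=None):
    """Draw n units without replacement; every unit is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(units, n)

def proportionate_stratified_sample(strata, n, seed=None):
    """Draw from each stratum in proportion to its size.

    strata: dict mapping a stratum name to the list of its units.
    """
    rng = random.Random(seed)
    total = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        k = round(n * len(units) / total)   # this stratum's share of the sample
        sample[name] = rng.sample(units, k)
    return sample

# Hypothetical strata: two classes of 60 and 40 students.
strata = {
    "VI": [f"VI-{i}" for i in range(60)],
    "VII": [f"VII-{i}" for i in range(40)],
}
picked = proportionate_stratified_sample(strata, 10, seed=1)
print({name: len(units) for name, units in picked.items()})  # {'VI': 6, 'VII': 4}
```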

Non Probability Sampling

In non-probability sampling, the population units can be selected at the discretion of the researcher. Such samples rely on human judgement for selecting units and have no theoretical basis for estimating the characteristics of the population. Some of the techniques used for non-probability sampling are:

  • Quota sampling
  • Judgement sampling
  • Purposive sampling

Population and Sample Examples

  • All the people who have ID proofs form the population, and the group of people who have only a voter ID is a sample.
  • All the students in the class form the population, whereas the top 10 students in the class are a sample.
  • All the members of the parliament form the population, and the female members present there are a sample.

Raw data

Raw data typically refers to tables of data where each row contains an observation and each column represents a variable that describes some property of each observation. Data in this format is sometimes referred to as tidy data, flat data, primary data, atomic data, and unit record data. Sometimes raw data refers to data that has not yet been processed.


Attributes

Generally statistics deals with quantitative data only. But in the behavioural sciences, one often deals with variables which are not quantitatively measurable. Literally, an attribute means a quality or characteristic which is not related to quantitative measurement.

Examples of attributes are health, honesty and blindness.

Frequency Distribution

Often it is not easy or feasible to read off the frequency of each value directly from a very large dataset. So, to make sense of the data, we make a frequency table and graphs. Let us take the example of the heights of ten students in cm.

Frequency Distribution Table

139, 145, 150, 145, 136, 150, 152, 144, 138, 138

Frequency distribution table

This frequency table will help us make better sense of the data given. Also when the data set is too big (say if we were dealing with 100 students) we use tally marks for counting. It makes the task more organised and easy. Below is an example of how we use tally marks.

Frequency distribution
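As an illustration, the frequency table for the ten heights given above can be built with a short Python sketch. The tally column here simply prints one stroke per observation; grouping strokes in fives, as is usual with hand tallies, is left out for brevity.

```python
from collections import Counter

# Heights (cm) of the ten students from the example above.
heights = [139, 145, 150, 145, 136, 150, 152, 144, 138, 138]

# Count how many times each distinct height occurs.
freq = Counter(heights)

print(f"{'Height (cm)':<12}{'Tally':<8}{'Frequency'}")
for value in sorted(freq):
    print(f"{value:<12}{'|' * freq[value]:<8}{freq[value]}")
```

Each of the heights 138, 145 and 150 occurs twice and every other height occurs once, so the frequencies add up to 10.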

Cumulative Frequency Table

The cumulative frequency of a set of data or class intervals of a frequency table is the sum of the frequencies of the data up to a required level. It can be used to determine the number of items that have values below a particular level.
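A minimal sketch of the computation, using the ten student heights from the frequency distribution example: the cumulative frequency at each height is the running total of the frequencies up to and including that height.

```python
from collections import Counter
from itertools import accumulate

# Heights (cm) of the ten students from the frequency distribution example.
heights = [139, 145, 150, 145, 136, 150, 152, 144, 138, 138]

freq = Counter(heights)
values = sorted(freq)
frequencies = [freq[v] for v in values]

# Running total of the frequencies, taken in increasing order of height.
cumulative = list(accumulate(frequencies))

print(f"{'Height (cm)':<12}{'Frequency':<11}{'Cumulative frequency'}")
for v, f, cf in zip(values, frequencies, cumulative):
    print(f"{v:<12}{f:<11}{cf}")
```

For instance, the cumulative frequency at 145 is 7, meaning that seven of the ten students are 145 cm tall or shorter. The last cumulative frequency always equals the total number of observations, here 10.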