Unit-3: Measure of dispersion

Dispersion is a statistical concept that refers to the extent to which data points in a set are spread out from each other. There are several measures of dispersion, including:

  1. Range: It’s the difference between the largest and the smallest value in a dataset.
  2. Interquartile Range (IQR): It’s the difference between the third quartile and the first quartile, which represents the range of the middle 50% of the data.
  3. Variance: It’s a measure of the spread of a set of data around its mean. Variance is the average of the squared differences between each data point and the mean.
  4. Standard Deviation: It’s the square root of the variance and measures the typical distance of the data points from the mean, in the same units as the data.
  5. Mean Absolute Deviation (MAD): It’s the average of the absolute differences between each data point and the mean.
  6. Coefficient of Variation (CV): It’s the ratio of the standard deviation to the mean, expressed as a percentage, and provides a measure of relative dispersion.

These measures of dispersion help us understand how much the data are spread out and how the data points are distributed around the central value.
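As a minimal sketch, the six measures listed above can be computed with Python's standard `statistics` module. The function name `dispersion_summary` is hypothetical. Variance and standard deviation are the population versions (dividing by n, as in the definitions above), and the quartiles come from `statistics.quantiles(..., method="inclusive")`. Other quartile conventions exist and can give a slightly different IQR.

```python
import statistics


def dispersion_summary(data: list[float]) -> dict[str, float]:
    """Return six measures of dispersion for a data set of at least two values.

    Hypothetical helper for illustration; uses population variance (divide by n).
    """
    mean = statistics.fmean(data)
    # Quartiles Q1, Q2, Q3 by linear interpolation ("inclusive" method).
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    variance = statistics.pvariance(data)  # average squared deviation from the mean
    std_dev = statistics.pstdev(data)      # square root of the variance
    mad = sum(abs(x - mean) for x in data) / len(data)  # mean absolute deviation
    return {
        "range": max(data) - min(data),
        "iqr": q3 - q1,
        "variance": variance,
        "std_dev": std_dev,
        "mad": mad,
        # CV is undefined when the mean is 0; this sketch assumes a positive mean.
        "cv_percent": std_dev / mean * 100,
    }


summary = dispersion_summary([1, 2, 3, 4, 5])
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

For the data set [1, 2, 3, 4, 5] this gives a range of 4, an IQR of 2, a variance of 2, a standard deviation of about 1.414, a MAD of 1.2 and a CV of about 47.14%.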

Concept of dispersion

Dispersion, also known as variability or scatter, is a statistical concept that measures how spread out the values in a set of data are. It provides information about the distribution of the data, such as how much the data points vary from the center of the distribution and how much they vary from each other.

In other words, dispersion reflects the degree of variation or spread in the data. A set of data with high dispersion means that the data points are widely spread out, while a set of data with low dispersion means that the data points are clustered closely together.

There are several measures of dispersion, including range, variance, standard deviation, interquartile range, mean absolute deviation, and coefficient of variation, which are used to describe the spread of a set of data. These measures summarise how widely the values vary around the centre and from each other, which is important when comparing data sets and when making decisions and predictions.
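To illustrate the difference between low and high dispersion, here is a small Python sketch with two hypothetical data sets that share the same mean (50) but differ in spread. It uses the population standard deviation from the standard `statistics` module.

```python
import statistics

# Two hypothetical data sets with the same mean (50) but different spread.
low_spread = [48, 49, 50, 51, 52]
high_spread = [30, 40, 50, 60, 70]

for label, data in (("low", low_spread), ("high", high_spread)):
    mean = statistics.fmean(data)
    data_range = max(data) - min(data)
    std_dev = statistics.pstdev(data)  # population standard deviation
    print(f"{label}: mean={mean:.1f}, range={data_range}, std_dev={std_dev:.2f}")
```

Both sets are centred on 50, but the second has a range of 40 and a standard deviation of about 14.14, against 4 and about 1.41 for the first: its values are far more widely scattered.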

Absolute and relative measures of dispersion

Dispersion measures can be classified into two categories: absolute and relative measures.

  1. Absolute Measures of Dispersion: These measures describe the spread of the data in the units of the original data (in squared units, in the case of variance), such as the difference between the largest and smallest values, or the average difference between each data point and the mean. Examples of absolute measures of dispersion are:
  • Range: The difference between the largest and smallest values in a set of data. For example, if the largest value is 8 and the smallest value is 2, then the range is 8 – 2 = 6.
  • Mean Absolute Deviation (MAD): The average of the absolute differences between each data point and the mean. For example, if the data set is [1, 2, 3, 4, 5] and the mean is 3, the MAD would be (|1-3| + |2-3| + |3-3| + |4-3| + |5-3|)/5 = (2 + 1 + 0 + 1 + 2)/5 = 1.2.
  • Variance: The average of the squared differences between each data point and the mean. For example, if the data set is [1, 2, 3, 4, 5] and the mean is 3, the variance would be ((1-3)^2 + (2-3)^2 + (3-3)^2 + (4-3)^2 + (5-3)^2)/5 = (4 + 1 + 0 + 1 + 4)/5 = 10/5 = 2.
  • Standard Deviation: The square root of the variance. For the same data set, the variance is 2, so the standard deviation is √2 ≈ 1.41.
  2. Relative Measures of Dispersion: These measures express an absolute measure of spread as a ratio to the mean or some other central value, so they are pure numbers (often given as percentages) without units. The most widely used example is:
  • Coefficient of Variation (CV): The ratio of the standard deviation to the mean, expressed as a percentage. For example, if the mean is 100 and the standard deviation is 10, the CV would be (10/100)*100 = 10%.
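The worked examples above follow the general formulas below, written for n observations x_1, …, x_n with mean \bar{x}. This notation is introduced here for illustration; the variance is the population form that divides by n, matching the examples.

```latex
\begin{aligned}
\bar{x} &= \frac{1}{n}\sum_{i=1}^{n} x_i, &
\text{Range} &= x_{\max} - x_{\min}, \\
\text{MAD} &= \frac{1}{n}\sum_{i=1}^{n} \lvert x_i - \bar{x} \rvert, &
\sigma^2 &= \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2, \\
\sigma &= \sqrt{\sigma^2}, &
\text{CV} &= \frac{\sigma}{\bar{x}} \times 100\%. \\[6pt]
\multicolumn{4}{l}{\text{For } x = (1, 2, 3, 4, 5):\quad \bar{x} = 3,\; \text{MAD} = \tfrac{6}{5} = 1.2,\; \sigma^2 = \tfrac{10}{5} = 2,\; \sigma = \sqrt{2} \approx 1.41.}
\end{aligned}
```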

Relative measures of dispersion are particularly useful when comparing data sets with different units or scales: because they are unit-free ratios, they do not depend on the units in which the data are measured.
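A quick way to see why the CV is unit-free is to express the same hypothetical measurements in two units. In the Python sketch below, the heights and the helper name `coefficient_of_variation` are illustrative assumptions. Converting centimetres to metres divides the standard deviation by 100, but the CV stays the same.

```python
import statistics


def coefficient_of_variation(data: list[float]) -> float:
    """CV in percent: population standard deviation divided by the mean, times 100."""
    return statistics.pstdev(data) / statistics.fmean(data) * 100


# Hypothetical heights of the same five people, recorded in two different units.
heights_cm = [150, 160, 170, 180, 190]
heights_m = [h / 100 for h in heights_cm]

print(statistics.pstdev(heights_cm))         # about 14.14 (centimetres)
print(statistics.pstdev(heights_m))          # about 0.1414 (metres)
print(coefficient_of_variation(heights_cm))  # about 8.32 (%)
print(coefficient_of_variation(heights_m))   # same value, up to floating-point rounding
```

The absolute measure (standard deviation) changes with the unit, whereas the relative measure (CV) does not.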

Range, variance, standard deviation, coefficient of variation

Here are examples to illustrate the concepts of range, variance, standard deviation, and coefficient of variation:

  1. Range: The range is the difference between the largest and smallest values in a set of data. For example, consider the following set of numbers: [1, 2, 3, 4, 5]. The largest value is 5 and the smallest value is 1, so the range is 5 – 1 = 4.
  2. Variance: Variance is a measure of the spread of a set of data around its mean. It is the average of the squared differences between each data point and the mean. For example, consider the data set [1, 2, 3, 4, 5] with a mean of 3. The variance would be calculated as follows:

     ((1-3)^2 + (2-3)^2 + (3-3)^2 + (4-3)^2 + (5-3)^2)/5 = (4 + 1 + 0 + 1 + 4)/5 = 10/5 = 2.

  3. Standard Deviation: Standard deviation is the square root of the variance and measures the typical distance of the data points from the mean. For the data set [1, 2, 3, 4, 5] with a variance of 2, the standard deviation would be √2 ≈ 1.41.
  4. Coefficient of Variation (CV): CV is the ratio of the standard deviation to the mean, expressed as a percentage. It provides a measure of relative dispersion, allowing for comparisons between data sets with different units or scales. For example, consider a data set with a mean of 100 and a standard deviation of 10. The CV would be (10/100)*100 = 10%.
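The worked examples above can be reproduced step by step in Python, directly from the definitions and without any statistics library. This is a minimal sketch: the variable names and the helper `cv_percent` are hypothetical. The `sample_variance` line shows the common n - 1 alternative, which this unit does not use.

```python
# Step-by-step computation of the worked examples, following the definitions in the text.
data = [1, 2, 3, 4, 5]
n = len(data)

data_range = max(data) - min(data)              # 5 - 1 = 4
mean = sum(data) / n                            # 15 / 5 = 3.0
squared_devs = [(x - mean) ** 2 for x in data]  # [4.0, 1.0, 0.0, 1.0, 4.0]
variance = sum(squared_devs) / n                # 10 / 5 = 2.0 (population variance)
std_dev = variance ** 0.5                       # sqrt(2), about 1.414

# The text divides by n; dividing by n - 1 instead gives the sample variance.
sample_variance = sum(squared_devs) / (n - 1)   # 10 / 4 = 2.5


def cv_percent(std: float, mean_value: float) -> float:
    """Coefficient of variation as a percentage (assumes a non-zero mean)."""
    return 100 * std / mean_value


print(data_range, variance, round(std_dev, 2), sample_variance)
print(cv_percent(10, 100))  # the CV example from the text: 10.0
```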