Methods to Measure Data Dispersion

For data processing to be successful, it is essential to have an overall picture of the data. Descriptive data summarization techniques can be used to identify the typical properties of your data and highlight which data values should be treated as noise or outliers. It is therefore important to understand the characteristics of the data and how to measure them. In this article, we will look at methods to measure data dispersion.

Let’s see how to measure the dispersion, or spread, of numeric data. Below are five different measures of data dispersion.

  1. Range: Let’s consider an example. Suppose ….. is a set of observations for some attribute ‘X’. The range of the set is the difference between the largest value, max(), and the smallest value, min().
  2. Quantiles: These are data points taken at regular intervals of a data distribution, dividing it into essentially equal-sized consecutive sets.
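
The range from item 1 can be sketched in a few lines of Python. The sample values below are hypothetical and chosen only for illustration; they do not come from the article.

```python
# Hypothetical observations for attribute X (illustration only).
observations = [3, 7, 8, 5, 12, 14, 21, 13, 18]

# Range = largest value minus smallest value.
data_range = max(observations) - min(observations)

print(data_range)  # 21 - 3 = 18
```

Because the range depends only on the two extreme values, a single outlier can change it considerably.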

Note: There may be no data values of ‘X’ that divide the data into exactly equal-sized subsets. For readability, we still refer to the subsets as equal-sized.

Let’s consider an example. Suppose the data for attribute ‘X’ is sorted in ascending numeric order. Now imagine you have to pick certain data points so that they split the data distribution into equal-sized consecutive sets. The kth q-quantile for a given data distribution is the value x such that at most k/q of the data values are less than x and at most (q - k)/q of the data values are more than x.

Here ‘k’ is an integer such that 0 < k < q. There are q - 1 q-quantiles.

Example: The 2-quantile is the data point dividing the lower and upper halves of the data distribution. It is also called the median.
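
To make the definition concrete, here is a minimal sketch that picks an actual data value satisfying the kth q-quantile condition and then checks the condition directly. The function names `kth_q_quantile` and `satisfies_definition` and the sample values are hypothetical and only for illustration. Statistical libraries often interpolate between neighbouring values instead, so their results may differ from what this sketch returns.

```python
def kth_q_quantile(values, k, q):
    """Return a data value that satisfies the kth q-quantile definition."""
    if not 0 < k < q:
        raise ValueError("k must satisfy 0 < k < q")
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    n = len(ordered)
    # Smallest 1-based position p with p >= k*n/q, using integer math only.
    position = (k * n + q - 1) // q
    return ordered[position - 1]


def satisfies_definition(values, x, k, q):
    """Check: at most k/q of values are < x and at most (q-k)/q are > x."""
    n = len(values)
    below = sum(v < x for v in values)
    above = sum(v > x for v in values)
    # Compare fractions by cross-multiplying to avoid floating-point error.
    return below * q <= k * n and above * q <= (q - k) * n


observations = [3, 7, 8, 5, 12, 14, 21, 13, 18]  # hypothetical sample

median = kth_q_quantile(observations, k=1, q=2)  # the single 2-quantile
quartile_cuts = [kth_q_quantile(observations, k, 4) for k in (1, 2, 3)]

print(median)         # 12
print(quartile_cuts)  # [7, 12, 14]
```

With q = 4 the sketch returns q - 1 = 3 cut points, which matches the count stated above.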

  3. Quartiles: As we saw, the 2-quantile is also called the median. Similarly, the 4-quantiles are three data points that split the data distribution into four equal parts, each representing one-fourth of the distribution. These are commonly referred to as quartiles.
  4. Percentiles: The 100-quantiles are called percentiles. They divide the data distribution into 100 equal-sized consecutive sets.
  5. Interquartile range (IQR): In the diagram below, the first quartile, denoted by Q1, is the 25th percentile. It cuts off the lowest 25% of the data. The third quartile, denoted by Q3, is the 75th percentile, which cuts off the highest 25% of the data. The second quartile is the 50th percentile, that is, the median, which gives the centre of the data distribution.
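
As a rough illustration of quartiles and percentiles from the list above, the sketch below uses `statistics.quantiles` from the Python standard library (Python 3.8 or later). The sample values are hypothetical. By default this function interpolates between neighbouring data points, so the cut points it returns need not be actual data values.

```python
import statistics

observations = [3, 7, 8, 5, 12, 14, 21, 13, 18]  # hypothetical sample

# n=4 yields the three quartile cut points Q1, Q2, Q3.
quartiles = statistics.quantiles(observations, n=4)

# n=100 yields the 99 percentile cut points.
percentiles = statistics.quantiles(observations, n=100)

print(quartiles)         # [6.0, 12.0, 16.0]
print(len(percentiles))  # 99

# Q1, Q2, Q3 coincide with the 25th, 50th and 75th percentiles.
print(percentiles[24], percentiles[49], percentiles[74])  # 6.0 12.0 16.0
```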

[Diagram: Methods to Measure Data Dispersion]

The distance between the first and third quartiles is a simple measure of spread that gives the range covered by the middle half of the data.

This distance is called the interquartile range (IQR), defined as:

IQR = Q3 - Q1
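
The formula can be tried out with a short sketch. It assumes the quartiles are computed with `statistics.quantiles` and its default interpolation method, and the sample values are hypothetical. Other quartile conventions may give a slightly different IQR for the same data.

```python
import statistics

observations = [3, 7, 8, 5, 12, 14, 21, 13, 18]  # hypothetical sample

# Unpack the three quartile cut points.
q1, q2, q3 = statistics.quantiles(observations, n=4)

# IQR = Q3 - Q1: the range covered by the middle half of the data.
iqr = q3 - q1

print(q1, q3, iqr)  # 6.0 16.0 10.0
```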
