There are many types of data mining clustering algorithms, but only a few are widely used. Fundamentally, all clustering algorithms rely on a distance measure: data points that lie closer together in the data space exhibit more similar characteristics than points lying further apart. Each algorithm follows a different approach to finding these ‘similar characteristics’ among the data points.
Let’s look at the different types of Data Mining Clustering Algorithms in detail:
Data Mining Connectivity Models
These models follow two approaches.
- In the first approach, every data point starts in its own cluster, and clusters are then aggregated as the distance between them decreases.
- In the second approach, all the data points start as a single cluster, which is then partitioned as the distance increases.
These models are easy to interpret, but they do not scale well to large data sets. Hierarchical clustering is the classic example.
Data Mining Hierarchical Clustering Method Steps
Below are the steps to solve the Hierarchical Clustering Method:
Given a set of ‘n’ items to be clustered and an ‘n×n’ distance matrix:
Step-1: Assign each item to its own cluster, so that with ‘n’ items you start with ‘n’ clusters, each containing just one item. Let the similarities between the clusters equal the similarities between the items they contain.
Step-2: Find the most similar pair of clusters and merge them into a single cluster.
Step-3: Compute the similarities between the new cluster and each of the old clusters.
Step-4: Repeat Step-2 and Step-3 until all ‘n’ items are merged into a single cluster.
Step-3 can be carried out in different ways: single-link, complete-link, or average-link clustering. In single-link clustering, the distance between two clusters is the shortest distance from any data point in one cluster to any data point in the other. In complete-link clustering (also called the diameter or maximum method), it is the longest distance from any data point in one cluster to any data point in the other. In average-link clustering, it is the average distance from any data point in one cluster to any data point in the other.
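As an illustration, here is a minimal sketch of these three linkage options using SciPy's agglomerative clustering routines; the synthetic data, the cut into three clusters, and all parameter values are made up for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))  # 20 two-dimensional points (synthetic)

# Each 'method' corresponds to one of the linkage criteria described above.
for method in ("single", "complete", "average"):
    Z = linkage(X, method=method)                    # bottom-up merge tree
    labels = fcluster(Z, t=3, criterion="maxclust")  # cut into 3 flat clusters
    print(method, labels)
```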
Data Mining Centroid Models
The data mining K-means algorithm is the best-known example in this category.
In this model, the number of clusters required at the end is known in advance, so it is important to have prior knowledge of the data set. These are iterative algorithms in which the data points closest to a centroid in the data space are aggregated into the same cluster. The number of centroids is always equal to the number of clusters.
Data Mining K-Means Method Steps
Below are the steps for K Means Clustering Method:
Step-1: Place K points into the data space represented by the data objects that are being clustered. These points represent the initial group centroids.
Step-2: Assign each data object to the group that has the closest centroid.
Step-3: When all the data objects have been assigned, recalculate the positions of the K centroids.
Step-4: Repeat Step-2 and Step-3 until the centroids no longer move. This produces a separation of the data objects into groups.
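To make the steps concrete, here is a minimal from-scratch sketch in Python, assuming NumPy is available; the function name, the convergence test, and all parameter values are illustrative rather than a reference implementation.

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Illustrative K-means following the steps above."""
    rng = np.random.default_rng(seed)
    # Step-1: choose K initial centroids from the data points themselves.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step-2: assign each data object to its closest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step-3: recalculate each centroid as the mean of its assigned points
        # (keeping the old centroid if a cluster ends up empty).
        new_centroids = np.array([
            X[labels == i].mean(axis=0) if np.any(labels == i) else centroids[i]
            for i in range(k)
        ])
        # Step-4: stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Usage on two synthetic blobs:
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
labels, centroids = kmeans(X, k=2)
```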
Data Mining Distribution Models
These models are based on estimating how probable it is that the data points in a cluster belong to the same distribution (for example, Gaussian). A popular example of this model is the Expectation-Maximization (EM) algorithm.
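As a sketch, fitting a Gaussian mixture with EM might look like the following, assuming scikit-learn is installed; the synthetic data and the choice of two components are made up for the example.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two synthetic Gaussian blobs with different means.
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(4, 1, (100, 2))])

# EM alternates an E-step (compute membership probabilities) with an
# M-step (re-estimate each Gaussian's mean and covariance).
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
labels = gmm.predict(X)       # hard assignment to the most probable component
probs = gmm.predict_proba(X)  # soft, probabilistic membership per component
```

Unlike K-means, the soft memberships make explicit how confidently each point belongs to its cluster.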
Data Mining Density Models
These models search for areas of varying density of data points in the data space. They isolate regions of different density and assign the data points within the same region to the same cluster. Popular examples of density models are DBSCAN and OPTICS.
Data Mining DBSCAN (Density-Based Spatial Clustering of Applications with Noise) Method
Below are the steps for DBSCAN Clustering Method:
Step-1: The method requires two parameters: epsilon (Eps) and minimum points (MinPts). It starts with a random point that has not yet been visited.
Step-2: Find all the neighboring data points within distance Eps of the starting point.
Step-3: If the number of neighbors is greater than or equal to MinPts, a cluster is formed and the starting point is marked as visited.
Step-4: If the number of neighbors is less than MinPts, the data point is marked as noise.
Step-5: The algorithm repeats the process recursively from each newly visited point until every point has been visited.
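A minimal sketch of running DBSCAN with scikit-learn, assuming it is installed; eps and min_samples correspond to the Eps and MinPts parameters above, and their values here are made up for the example.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
# Two dense synthetic blobs plus a few scattered outliers.
X = np.vstack([
    rng.normal(0, 0.3, (100, 2)),
    rng.normal(4, 0.3, (100, 2)),
    rng.uniform(-2, 6, (10, 2)),
])

labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)
# Points labelled -1 were marked as noise; all others belong to a dense cluster.
print(set(labels))
```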