DBSCAN
Clustering Algorithm
Clusters are dense regions of the data space, separated by regions of lower point density. The DBSCAN algorithm (Density-Based Spatial Clustering of Applications with Noise) is based on this intuitive notion of “clusters” and “noise”. The key idea is that, for each point of a cluster, the neighborhood of a given radius must contain at least a minimum number of points.
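eps: The radius that defines the neighborhood around a data point. Two points are neighbors if the distance between them is at most eps. One way to choose eps is with the k-distance graph: sort every point's distance to its k-th nearest neighbor, plot the sorted values, and pick eps near the sharpest bend (the "knee") of the curve.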
MinPts: The minimum number of neighbors (data points) within the eps radius. As a general rule, a lower bound for MinPts can be derived from the number of dimensions D in the dataset as MinPts >= D + 1. MinPts should also be at least 3.
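The k-distance values mentioned for eps can be computed directly. The following is a minimal sketch rather than a reference implementation: the helper name `k_distances`, the sample points, and the choice k = 2 are illustrative assumptions, and distances are Euclidean. It uses a brute-force O(n²) search, which is fine only for small datasets.

```python
import math

def k_distances(points, k):
    """Return every point's distance to its k-th nearest neighbor, sorted ascending.

    Hypothetical helper for illustration; the point itself is excluded
    from its own neighbor list.
    """
    result = []
    for i, p in enumerate(points):
        # Distances from p to every other point, nearest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result)

# Four points on a unit square plus one far-away point (made-up data).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
kd = k_distances(points, k=2)
print(kd)
```

Plotting `kd` against its index gives the k-distance graph. In this toy data the values stay flat for the four square corners and then jump for the isolated point, so an eps just above the flat part would separate the dense group from the outlier.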
In this algorithm, we have 3 types of data points.
Core Point: A point is a core point if it has at least MinPts points within eps.
Border Point: A point that has fewer than MinPts points within eps but lies in the neighborhood of a core point.
Noise or outlier: A point that is neither a core point nor a border point.
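The three point types can be illustrated with a short sketch. This is not a full DBSCAN implementation (it does not assign cluster labels); it only classifies points. The function name `classify_points` and the sample data are hypothetical. It also assumes that a point counts as a member of its own eps-neighborhood and that "within eps" means distance <= eps; other conventions shift the counts by one.

```python
import math

def classify_points(points, eps, min_pts):
    """Label each point as 'core', 'border', or 'noise' (illustrative sketch)."""
    # Indices of all points within eps of each point (the point itself included).
    neighborhoods = [
        [j for j, q in enumerate(points) if math.dist(p, q) <= eps]
        for p in points
    ]
    # A core point has at least min_pts points in its neighborhood.
    is_core = [len(n) >= min_pts for n in neighborhoods]

    labels = []
    for i, neighbors in enumerate(neighborhoods):
        if is_core[i]:
            labels.append("core")
        elif any(is_core[j] for j in neighbors):
            # Not dense enough itself, but within eps of a core point.
            labels.append("border")
        else:
            labels.append("noise")
    return labels

# Made-up data: a unit square, one point hanging off it, and one far away.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (2.0, 1.0), (10.0, 10.0)]
labels = classify_points(points, eps=1.0, min_pts=3)
print(labels)
# → ['core', 'core', 'core', 'core', 'border', 'noise']
```

Here (2.0, 1.0) has only two points in its neighborhood (itself and (1.0, 1.0)), so it is not a core point, but it is within eps of the core point (1.0, 1.0) and is therefore a border point. The point (10.0, 10.0) has no core point nearby and is labeled noise.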