HDBSCAN tweets clustering for residential area localization
For finding the tweets cluster, we are using the Hierarchical Density-Based Spatial Clustering- HDBSCAN (https://github.com/bobleegogogo/hdbscan). There are two major hyperparameters in HDBSCAN: min_cluster_size and min_samples.
The ideal case of tweets clusters should be able to catch big cities (high amount of total number) as well as small villages and other types of residential areas (low total number but high density)
Two stage clustering considering different min_cluster_size and min_samples.