APPLIED MATH SEMINAR
Speaker: Gil David, Applied Mathematics, Yale University
Title: Hierarchical Clustering via Localized Diffusion Folders
When/where: Tuesday, December 1st, 4:15 PM, AKW 200
Abstract:
Data clustering is a common technique for statistical data analysis. It is used
in many fields including machine learning, data mining, customer segmentation,
trend analysis, pattern recognition and image analysis. The proposed Localized
Diffusion Folders methodology performs hierarchical clustering and
classification of high-dimensional datasets. The diffusion folders are a
multi-level partitioning of the data into local neighborhoods (Voronoi
diagrams), generated by several random selections of data points and folders
in a diffusion graph and by defining local diffusion distances between them.
This multi-level partitioning defines an improved localized geometry of the
data and a new localized Markov transition matrix that is used for the next
time step in the diffusion process. The result is a bottom-up hierarchical
clustering of the data in which each level of the hierarchy contains localized
diffusion folders of folders from the lower levels. The methodology defines a
new geometry of the data at each level of the hierarchy while eliminating
noisy connections between distinct points and areas in the graph.
The performance of the algorithm is demonstrated on three applications:
1. Image processing - denoising and restoration
2. Recommendation systems - Netflix movie recommendation
3. Network protocols - clustering and classification of network packets
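
The partitioning step described in the abstract can be illustrated with a
minimal Python sketch. It is not the speaker's implementation. The Gaussian
kernel, the scale eps, the one-step diffusion distance (plain Euclidean
distance between rows of the Markov matrix), the single random selection of
centers, and the rule used to aggregate points into a folder-level matrix are
all simplifying assumptions, and every function name is hypothetical.

```python
import math
import random


def markov_matrix(points, eps):
    """Row-normalized Gaussian affinities: a diffusion-style Markov matrix.

    The kernel and the scale eps are assumptions made for this sketch.
    """
    n = len(points)
    P = []
    for i in range(n):
        row = [math.exp(-math.dist(points[i], points[j]) ** 2 / eps)
               for j in range(n)]
        total = sum(row)
        P.append([w / total for w in row])
    return P


def diffusion_distance(P, i, j):
    """Euclidean distance between rows i and j of P.

    This is a simplified, unweighted one-step variant, assumed for brevity.
    """
    return math.dist(P[i], P[j])


def build_folders(P, n_centers, rng):
    """Pick random centers and assign each point to its nearest center in
    diffusion distance, which yields a Voronoi-style partition (folders)."""
    n = len(P)
    centers = rng.sample(range(n), n_centers)
    labels = [
        min(range(n_centers),
            key=lambda k: diffusion_distance(P, i, centers[k]))
        for i in range(n)
    ]
    return centers, labels


def folder_markov_matrix(P, labels, n_folders):
    """Aggregate point-level transition probabilities into a folder-level
    Markov matrix for the next level (hypothetical aggregation rule)."""
    Q = [[0.0] * n_folders for _ in range(n_folders)]
    for i, row in enumerate(P):
        for j, p in enumerate(row):
            Q[labels[i]][labels[j]] += p
    # Guard against an empty folder, whose row would otherwise sum to zero.
    return [[q / (sum(row) or 1.0) for q in row] for row in Q]


if __name__ == "__main__":
    # Two small, well-separated groups of 2-D points (made-up data).
    points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2),
              (5.0, 5.0), (5.1, 5.0), (5.0, 5.3)]
    P = markov_matrix(points, eps=1.0)
    centers, labels = build_folders(P, n_centers=2, rng=random.Random(0))
    Q = folder_markov_matrix(P, labels, n_folders=2)
    print("centers:", centers)
    print("labels: ", labels)
    print("folder-level matrix:", Q)
```

With a single random draw the folders follow the two groups only if the
centers happen to land in different groups. That is one plausible reason the
abstract calls for several random selections rather than one.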