APPLIED MATH SEMINAR
Speaker: Mauro Maggioni, Department of Mathematics and Computer Science, Duke
University
Tuesday, February 16th, 4:15 PM, AKW 200
Title: "Intrinsic dimensionality estimation and multiscale geometry of data
sets"
Abstract: The analysis of large data sets, modeled as point clouds in
high-dimensional spaces, is needed in a wide variety of applications, such as
recommendation systems, search engines, molecular dynamics, machine learning,
and statistical modeling, to name a few. It is often claimed or assumed
that many data sets, while lying in high-dimensional spaces, in fact have
low-dimensional structure. It may come as a surprise that only very
few, and rather sample-inefficient, algorithms exist to estimate the intrinsic
dimensionality of these point clouds. We present a recent multiscale algorithm
for estimating the intrinsic dimensionality of data sets, under the assumption
that they are sampled from a rather tame low-dimensional object, such as a
manifold, and perturbed by high-dimensional noise. Under natural assumptions,
this algorithm can be proven to estimate the correct dimensionality with a
number of points that is merely linear in the intrinsic dimension. Experiments
on synthetic and real data will be discussed. Furthermore, this algorithm opens
the way to novel algorithms for exploring, visualizing, compressing and
manipulating certain classes of high-dimensional point clouds.
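
As a rough illustration of the multiscale idea, the sketch below computes
local singular values around one point at several radii and counts the
directions whose singular values grow with the radius. This is an assumption
made for illustration only, not necessarily the algorithm presented in the
talk. The function names, the choice of radii, and the largest-gap rule are
all hypothetical, and the test data is a flat 2-dimensional patch in 10
dimensions with small Gaussian noise, so curvature is ignored.

```python
import numpy as np

def local_singular_values(points, center, radius):
    """Singular values of the centered points within `radius` of `center`,
    scaled by 1/sqrt(n) so they are comparable across radii."""
    dists = np.linalg.norm(points - center, axis=1)
    ball = points[dists <= radius]
    out = np.zeros(points.shape[1])
    if len(ball) < 2:
        return out
    centered = ball - ball.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False) / np.sqrt(len(ball))
    out[:len(s)] = s  # pad with zeros if the ball holds few points
    return out

def estimate_dimension(points, center, radii):
    """Hypothetical rule: fit a line to each singular value as a function of
    the radius, then split directions at the largest drop in slope.
    Directions along the data grow with the radius; noise directions stay
    roughly constant."""
    sv = np.array([local_singular_values(points, center, r) for r in radii])
    slopes = np.polyfit(radii, sv, 1)[0]  # one slope per singular value
    gaps = slopes[:-1] - slopes[1:]
    return int(np.argmax(gaps)) + 1

# Synthetic example (assumed setup): a 2-dimensional square embedded in
# 10 dimensions, perturbed by isotropic Gaussian noise.
rng = np.random.default_rng(0)
plane = rng.uniform(-1.0, 1.0, size=(2000, 2))
points = np.hstack([plane, np.zeros((2000, 8))])
points = points + 0.01 * rng.normal(size=points.shape)

radii = np.linspace(0.3, 1.0, 8)
estimate = estimate_dimension(points, np.zeros(10), radii)
print(estimate)
```

Looking across several radii, rather than at a single one, is what lets the
sketch separate directions along the data from noise directions, whose
singular values stay near the noise level at every scale.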