IDEAL Workshop on High-Dimensional Geometry and Analysis

Link for remote participants: Zoom meeting

Speakers

Ainesh Bakshi (Carnegie Mellon University), Arnold Filtser (Bar Ilan University), Weiyun Ma (Stanford University), Assaf Naor (Princeton University), Erik Waingarten (Stanford University)

Logistics

  • Date: Friday, May 27, 2022
  • Location: Northwestern University
  • Room: Mudd 3514
  • Streaming: Zoom (see above)
  • Registration: Registration Link

Schedule

All times are in the Central Time Zone (CDT; Chicago time).

  • 9:30am: Breakfast
  • 9:55am: Opening Remarks
  • 10:00am: Arnold Filtser, Locality-Sensitive Orderings (video)
  • 11:00am: Ainesh Bakshi, Robustly Learning a Mixture of k Arbitrary Gaussians (video)
  • 12:00pm: Lunch
  • 1:00pm: Assaf Naor, Randomized clustering in high dimensions (video)
  • 2:00pm: Erik Waingarten, The Johnson-Lindenstrauss Lemma for Clustering and Subspace Approximation (video)
  • 3:00pm: Coffee Break
  • 3:20pm: Weiyun Ma, Almost 3-Approximate Correlation Clustering in Constant Rounds (video)


Titles and Abstracts

Speaker: Arnold Filtser (Bar Ilan University)
Title: Locality-Sensitive Orderings
Abstract: Chan, Har-Peled, and Jones [2020] recently developed the locality-sensitive ordering (LSO), a new tool that allows one to reduce problems in the Euclidean space R^d to the 1-dimensional line. They used LSOs to solve a host of problems. Later, Buchin, Har-Peled, and Oláh [2019,2020] used the LSOs of Chan et al. to construct very sparse reliable spanners for the Euclidean space. A highly desirable feature of a reliable spanner is its ability to withstand a massive failure: the network remains functional even if 90% of the nodes fail. In a follow-up work, Har-Peled, Mendel, and Oláh [2021] constructed reliable spanners for general and topologically structured metrics. Their construction takes a different approach and is based on sparse covers.

We develop the theory of LSOs in non-Euclidean metrics by introducing new types of LSOs suitable for general and topologically structured metrics. We then construct such LSOs, as well as considerably improved LSOs for doubling metrics. Afterwards, we use our new LSOs to construct reliable spanners with improved stretch and sparsity parameters. Along the way to constructing these LSOs and reliable spanners, we introduce and construct ultrametric covers, and we construct 2-hop reliable spanners for the line.

Based on joint work with Hung Le.
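
The abstract leaves the definition of an LSO informal. For intuition only, here is a minimal Python sketch of the classical Z-order (Morton) ordering, one simple way of placing points of the plane on a line so that nearby points often end up close in the ordering; it illustrates the general idea of reducing R^d to the line, not the construction from the talk.

    # Illustrative only: order 2D points with nonnegative integer coordinates
    # by the Z-order (Morton) curve, i.e., by interleaving coordinate bits.
    # This is a classical locality-friendly ordering, NOT the LSOs of the talk.

    def morton_key(x: int, y: int, bits: int = 16) -> int:
        """Interleave the low `bits` bits of x and y into a single key."""
        key = 0
        for i in range(bits):
            key |= ((x >> i) & 1) << (2 * i)      # even bit positions: x
            key |= ((y >> i) & 1) << (2 * i + 1)  # odd bit positions: y
        return key

    points = [(3, 5), (3, 4), (10, 2), (2, 4), (11, 3)]
    # Sorting by the Morton key groups points that share quadtree cells,
    # so geometrically close points tend to be adjacent on the line.
    print(sorted(points, key=lambda p: morton_key(*p)))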

Speaker: Ainesh Bakshi (Carnegie Mellon University)
Title: Robustly Learning a Mixture of k Arbitrary Gaussians
Abstract: In this talk, we will describe a polynomial-time algorithm for the problem of robustly estimating a mixture of k arbitrary Gaussians in R^d, for any fixed k, in the presence of a constant fraction of arbitrary corruptions. This resolves the main open problem identified in several previous works on algorithmic robust statistics, which addressed the special cases of robustly estimating (a) a single Gaussian, (b) a mixture of TV-distance separated Gaussians, and (c) a uniform mixture of two Gaussians. We will provide a high-level overview of the main tools we establish in this work: an efficient “partial-clustering” algorithm that relies on the sum-of-squares method, and a novel tensor decomposition algorithm that is robust to errors in both Frobenius norm and low-rank terms.

Based on joint work with Ilias Diakonikolas, He Jia, Daniel Kane, Pravesh Kothari, and Santosh Vempala.
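
To make the input model of the talk concrete, the following numpy sketch generates the kind of instance described above: samples from a mixture of k Gaussians in R^d, of which an eps-fraction is replaced by arbitrary outliers. All parameter values, and the choice of diagonal covariances, are illustrative assumptions, not taken from the paper.

    # Minimal sketch of the corruption model: n samples from a mixture of k
    # Gaussians in R^d, with an eps-fraction replaced arbitrarily.
    import numpy as np

    rng = np.random.default_rng(0)
    n, d, k, eps = 1000, 10, 3, 0.05   # illustrative values

    weights = rng.dirichlet(np.ones(k))          # mixing weights
    means = rng.normal(scale=5.0, size=(k, d))   # component means
    scales = rng.uniform(0.5, 2.0, size=(k, d))  # diagonal covariances, for simplicity

    comps = rng.choice(k, size=n, p=weights)
    samples = means[comps] + scales[comps] * rng.normal(size=(n, d))

    # An adversary replaces an eps-fraction of the samples arbitrarily.
    bad = rng.choice(n, size=int(eps * n), replace=False)
    samples[bad] = rng.uniform(-100.0, 100.0, size=(len(bad), d))
    # `samples` is the corrupted input handed to the robust learning algorithm.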

Speaker: Assaf Naor (Princeton University)
Title: Randomized clustering in high dimensions
Abstract: The separation modulus of a metric space M is the smallest S > 0 such that for every D > 0 there exists a random partition of M into clusters of diameter at most D in which any two points of M fall into different clusters with probability at most S/D times their distance. By relating it to well-studied volumetric quantities such as volume ratios and projection bodies, we will obtain asymptotic evaluations of the separation modulus for many high-dimensional normed spaces and their subsets. We will show how these bounds can be used to make progress on classical questions about the extension of Lipschitz functions, and how they relate to the question of reversing the classical isoperimetric inequality.
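
In symbols, the prose definition above reads as follows (the name SEP(M), the metric d_M, and the notation P(x) for the cluster of the random partition P containing x are introduced here for readability):

    % Transcription of the definition of the separation modulus given above.
    \[
      \mathrm{SEP}(M) \;=\; \inf\Bigl\{ S > 0 \;:\; \forall D > 0\ \exists\
      \text{random partition } \mathcal{P} \text{ of } M,\
      \operatorname{diam}(C) \le D\ \forall C \in \mathcal{P},\
      \Pr\bigl[\mathcal{P}(x) \neq \mathcal{P}(y)\bigr]
      \le \tfrac{S}{D}\, d_M(x,y)\ \forall x, y \in M \Bigr\}.
    \]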

Speaker: Erik Waingarten (Stanford University)
Title: The Johnson-Lindenstrauss Lemma for Clustering and Subspace Approximation
Abstract: The Johnson-Lindenstrauss lemma says that for any set of vectors in a high-dimensional space, there exists an embedding into a much lower-dimensional space which approximately preserves all pairwise distances. Here, we explore dimensionality reduction for other geometric optimization problems: given a geometric optimization problem (for example, k-means clustering or principal component analysis), is there always an embedding into a lower-dimensional space which approximately preserves the cost of the optimization? In this talk, I will give an overview of a few results in this area and present a new technique that uses coreset constructions to obtain improved dimensionality reduction.

Based on joint work with Moses Charikar.
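
For intuition, here is a minimal numpy sketch of the lemma in its classical form, a scaled Gaussian random projection. It is a standard illustration (with ad hoc constants), not the coreset-based technique from the talk.

    # Johnson-Lindenstrauss via Gaussian random projection: n points in R^d
    # are mapped to R^m with m = O(log(n) / eps^2), and with high probability
    # all pairwise distances are preserved up to a factor of 1 +/- eps.
    import numpy as np

    rng = np.random.default_rng(0)
    n, d, eps = 200, 1000, 0.5
    m = int(8 * np.log(n) / eps**2)  # target dimension; the constant 8 is ad hoc

    X = rng.normal(size=(n, d))               # n points in R^d
    G = rng.normal(size=(d, m)) / np.sqrt(m)  # scaled so squared norms are
                                              # preserved in expectation
    Y = X @ G                                 # embedded points in R^m

    def pairwise_dists(Z):
        # All pairwise Euclidean distances (upper triangle, flattened).
        g = Z @ Z.T
        sq = np.diag(g)[:, None] + np.diag(g)[None, :] - 2 * g
        iu = np.triu_indices(len(Z), k=1)
        return np.sqrt(np.maximum(sq[iu], 0.0))

    ratios = pairwise_dists(Y) / pairwise_dists(X)
    print(f"m = {m}; distance ratios lie in [{ratios.min():.2f}, {ratios.max():.2f}]")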

Speaker: Weiyun Ma (Stanford University)
Title: Almost 3-Approximate Correlation Clustering in Constant Rounds
Abstract: We study parallel algorithms for correlation clustering. In this problem, each pair among n objects is labeled as either “similar” or “dissimilar”. The goal is to partition the objects into arbitrarily many clusters while minimizing the number of disagreements with the labels.
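
For concreteness, the objective just described fits in a few lines of Python; the naive sequential sketch below only computes the cost of a candidate clustering and is unrelated to the parallel algorithm from the talk.

    # Correlation clustering objective: count "similar" pairs split across
    # clusters plus "dissimilar" pairs placed in the same cluster.
    from itertools import combinations

    def disagreements(n, similar, clustering):
        """Objects are 0..n-1; `similar` holds the pairs (i, j), i < j, that
        are labeled "similar" (all other pairs are "dissimilar");
        `clustering[i]` is the cluster id of object i."""
        cost = 0
        for i, j in combinations(range(n), 2):
            same_cluster = clustering[i] == clustering[j]
            if ((i, j) in similar) != same_cluster:
                cost += 1
        return cost

    # Toy instance: {0,1,2} pairwise similar, and 3 similar to 4.
    similar = {(0, 1), (0, 2), (1, 2), (3, 4)}
    print(disagreements(5, similar, [0, 0, 0, 1, 1]))  # perfect: 0 disagreements
    print(disagreements(5, similar, [0, 0, 1, 1, 1]))  # 4 disagreements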

I will describe an algorithm that obtains an (almost) 3-approximation in a constant number of rounds in models such as massively parallel computation (MPC), LOCAL, and semi-streaming. This is a culminating point for the rich literature on parallel correlation clustering, as the approximation factor (almost) matches a natural barrier of 3 for combinatorial algorithms and the round complexity is constant.

Based on joint work with Soheil Behnezhad, Moses Charikar, and Li-Yang Tan.

About the Series

The IDEAL workshop series brings in experts on topics related to the foundations of data science to present their perspective and research on a common theme. The series is aimed at Chicago-area researchers with an interest in the foundations of data science.

This workshop is part of the Special Quarter on High-Dimensional Data Analysis. It is organized by Konstantin Makarychev (NU) and Yury Makarychev (TTIC).