Research Director

Yahoo Research

New York, NY

me@edoliberty.com

עידו
ליברטי

homepage

I am now the head of Yahoo's Independent Research in New York, where we focus on scalable machine learning and data mining for Yahoo's critical applications.

I received my B.Sc. in Physics and Computer Science from Tel Aviv University and my Ph.D. in Computer Science from Yale University, under the supervision of Steven Zucker. After that, I was a postdoctoral fellow in the Program in Applied Mathematics at Yale.

My personal research interests include fast dimensionality reduction, clustering, streaming and online algorithms, text and pattern mining, machine learning, and large-scale numerical linear algebra. I am especially fond of randomized algorithms and high-dimensional geometry.

**Invited Talk:** I recently spoke at MLConf about online algorithms for data mining.

**Code Release:** For a long time now, I have been asked to make some matrix sketching code available.
So, Mina Ghashami and I made some of our Frequent Directions git repo public.
This code is distributed freely for academic use only. Please feel free to send pull requests.

**New Algorithm:** I'm excited about resolving one of the longest-standing open problems in the streaming model.
We designed an optimal algorithm for finding any approximate quantile of a stream of elements.
See the paper that Zohar Karnin, Kevin Lang, and I posted on arXiv.
Some code to experiment with is available here.

**Code Release:**
This is some experimental code implementing the first algorithm in our new paper about finding approximate quantiles.
The code is intended for academic use only.

**Workshop:** I was one of the organizers of NYCE 2016, which had
a great turnout and a fantastic set of talks. Links to talk videos are
on the main workshop website.

**Code Release:** I'm glad to announce that Yahoo open-sourced parts of our data
sketches library. This is an ongoing effort which I am proud to be a part of.
There are many exciting new algorithms coming in later releases.
See the announcement on VentureBeat and the link to the project itself.

0368-3248-01-Data Mining - Tel Aviv University

The course covered algorithmic tools for data mining massive data sets.

It was given as a theory/algorithms class with an emphasis on
randomization.

fall 2011

fall 2012

fall 2013

Optimal Quantile Approximation in Streams

Zohar Karnin, Kevin Lang, Edo Liberty

In progress

Efficient Frequent
Directions Algorithm for Sparse Matrices

Mina Ghashami, Edo
Liberty, Jeff M. Phillips

In progress

Greedy Minimization of
Weakly Supermodular Set Functions

Christos Boutsidis, Edo
Liberty, Maxim Sviridenko

In progress [bib]

Stratified Sampling meets Machine
Learning

Kevin Lang, Edo Liberty, Konstantin Shmakov

ICML 2016

Space Lower Bounds for
Itemset Frequency Sketches

Edo Liberty, Michael
Mitzenmacher, Justin Thaler, Jonathan Ullman

PODS 2016 [bib]

An Algorithm for Online
K-Means Clustering

Edo Liberty, Ram Sriharsha, Maxim
Sviridenko

ALENEX 2016 [bib]

Online PCA with Spectral Bounds

Zohar Karnin, Edo Liberty

COLT 2015 [bib]

(see also 5-minute video lecture)

Online Principal Component Analysis

Christos Boutsidis, Dan Garber, Zohar Karnin, Edo Liberty

SODA 2014 [bib]

Near-optimal Distributions for
Data Matrix Sampling

Dimitris Achlioptas, Zohar Karnin, Edo
Liberty

NIPS 2013 [bib]

Simple and
Deterministic Matrix Sketches

Edo Liberty (see slides and experimental results in JSON format)

Also, here is a talk I gave at the Simons Institute about this.

**Best paper**
at KDD 2013 [bib]

See also the Frequent Directions git repo by Mina Ghashami and me.

Threading
Machine Generated Email

Nir Ailon, Zohar Karnin, Edo
Liberty, Yoelle Maarek

**Best paper** at TechPulse 2012
and WSDM 2013 [bib]

Unsupervised SVMs: On the complexity
of the Furthest Hyperplane Problem

Zohar Karnin, Edo
Liberty, Shachar Lovett, Roy Schwartz, and Omri Weinstein

COLT 2012 [Slides] [bib]

Framework and Algorithms for Network
Bucket Testing

Liran Katzir, Edo Liberty, and Oren Somekh

WWW 2012 [bib]

An Almost Optimal
Unrestricted Fast Johnson-Lindenstrauss Transform

Nir Ailon,
Edo Liberty

**Best paper** at SODA 2011 [bib]

Improved
Approximation Algorithms for Bipartite Correlation Clustering

Nir Ailon, Noa Avigdor-Elgrabli, Edo Liberty, Anke van Zuylen

ESA 2011 [slides]
[bib]

Automatically Tagging
Email by Leveraging Other Users' Folders

Yehuda Koren, Edo Liberty, Yoelle Maarek, and Roman Sandler

KDD 2011 [bib]

Estimating Sizes of Social Networks via Biased Sampling

Liran Katzir, Edo Liberty, and Oren Somekh

WWW 2011 [bib]

Inverted
Index Compression via Online Document Routing

Gal Lavee,
Ronny Lempel, Edo Liberty, and Oren Somekh

WWW 2011 [bib]

Correlation Clustering Revisited:
The "True" Cost of Error Minimization Problems

Nir Ailon,
Edo Liberty

ICALP 2009 [bib]

Dense Fast Random
Projections and Lean Walsh Transforms

Edo Liberty, Nir
Ailon, Amit Singer

RANDOM 2008 [bib]

Fast Dimension Reduction
Using Rademacher Series on Dual BCH Codes

Nir Ailon, Edo
Liberty

SODA 2008 [bib]

Frequent
Directions: Simple and Deterministic Matrix Sketching

Mina
Ghashami, Edo Liberty, Jeff M. Phillips, David P. Woodruff

In
review [bib]

Estimating Sizes of Social Networks via
Biased Sampling

Liran Katzir, Edo Liberty, Oren Somekh,
Ioana A. Cosma

Journal of Internet Mathematics [bib]

An Almost Optimal
Unrestricted Fast Johnson-Lindenstrauss Transform

Nir Ailon,
Edo Liberty

Transactions on Algorithms [bib]

Improved
Approximation Algorithms for Bipartite Correlation Clustering

Nir Ailon, Noa Avigdor-Elgrabli, Edo Liberty, and Anke van Zuylen

SIAM Journal on Computing [bib]

Unsupervised SVMs: On the complexity
of the Furthest Hyperplane Problem

Zohar Karnin, Edo
Liberty, Shachar Lovett, Roy Schwartz and Omri Weinstein

JMLR
2012 (Journal of Machine Learning Research) [bib]

Dense
Fast Random Projections and Lean Walsh Transforms

Edo
Liberty, Nir Ailon, Amit Singer

DCG 2010 (Discrete and
Computational Geometry) [bib]

The Mailman algorithm: a
note on matrix vector multiplication

Edo Liberty, Steven
Zucker

IPL 2009 (Information Processing Letters) [bib]

Fast Dimension
Reduction Using Rademacher Series on Dual BCH Codes

Nir
Ailon, Edo Liberty

DCG 2008 (Discrete and Computational
Geometry) [bib]

A fast randomized
algorithm for the approximation of matrices

Edo Liberty,
Franco Woolfe, Vladimir Rokhlin, and Mark Tygert

ACHA 2008
(Applied and Computational Harmonic Analysis) [bib]

Randomized algorithms for the
low-rank approximation of matrices

Edo Liberty, Franco
Woolfe, Per-Gunnar Martinsson, Vladimir Rokhlin, and Mark Tygert

PNAS 2007 (Proceedings of the National Academy of Sciences) [bib]

Electrons and Phonons on the Square
Fibonacci Tiling

Roni Ilan, Edo Liberty, Shahar Even-Dar
Mandel, and Ron Lifshitz.

Ferroelectrics 2004.

Classifying man versus machine generated email

Zohar Karnin, Guy Halawi, David Wajc, Edo Liberty

A System for Email sequence identification

Edo Liberty, Zohar Karnin, Yoelle Maarek, Natalie Aizenberg

Sponsored Apps Marketplace in eMail

Ronny Lempel, Yoelle Maarek, Edward Bortnikov, Edo Liberty

Mining Global Email Folders For Identifying Auto-folders tags

Vishwanath Ramarao, Andrei Broder, Idan Szpektor, Edo Liberty, Yehuda Koren, Mark Risher, and Yoelle Maarek

Email sequence identification

Edo Liberty, Zohar Karnin, Yoelle Maarek

Mailing List Identification and Representation

Zohar Karnin, Michal Aharon, Edo Liberty, Yoelle Maarek

Identification of subject line templates

Zohar Karnin, Edo Liberty, David Wajc, Guy Halawi

Electronic Mail Personal Vault

Edo Liberty, Yoelle Maarek

Mail Lint: Write Better Emails

Joel Tetreault, Aasish Pappu, Edo Liberty, Liangliang Cao, Meizhu Liu, Ellie Tobochnik, Gilad Tzur, Yoelle Maarek

Methods for Displaying Contextually Targeted Content on a Connected Television

Zeev Neumeier, Edo Liberty

Methods for Identifying Video Segments and Displaying Contextually Targeted Content on Connected Televisions

Zeev Neumeier, Edo Liberty

Method And System For Clustering Data Points

Nir Ailon, Edo Liberty, Hari Khalsa

Methods for filtering data and filling in missing data using nonlinear inference

Edo Liberty, Steven Zucker, Yosi Keller, Mauro M. Maggioni, Ronald R. Coifman, and Frank Geshwind, in collaboration with Plain Sight Systems.

Generalized Stratified Sampling

Kevin Lang, Edo Liberty, Konstantin Shmakov

Contest Generation Methods for Daily Fantasy Sports

Justin Thaler, Maxim Sviridenko, Edo Liberty, Prerit Uppal, Ron Belmarch, Jerry Shen

Correlation Clustering:
from Theory to Practice

KDD 2014 Tutorial [slides] [bib]

Streaming Data Mining

KDD 2012 tutorial on practical algorithms for mining
streaming data, with Jelani Nelson.

Fast Random Projections:
survey and new results

SODA 2011 and IAS and Yale math
seminars 2011.

Video of the talk at IAS available here.

Accelerated Dense Random Projections

PhD thesis. See also the talk slides.

KDD, ICML, WSDM, WWW, SIGIR, AISTATS, COLT, SODA, ESA, FOCS

I'm also an enthusiastic kitesurfer and snowboarder. Here are some pictures of that.

This site was last updated March 2016