Workshop Schedule

Download the full MMDS 2014 program here.
Tue, June 17 Data Analysis and Statistical Data Analysis
08:00 - 09:45 Breakfast and registration *
09:45 - 10:00 Welcome and opening remarks Organizers
10:00 - 11:00 Large Scale Machine Learning at Verizon Ashok Srivastava
11:00 - 11:30 Communication Cost in Big Data Processing Dan Suciu
11:30 - 12:00 Content-based search in 50TB of consumer-produced videos Gerald Friedland
12:00 - 02:00 Lunch *
02:00 - 02:30 Myria: Scalable Analytics as a Service Bill Howe
02:30 - 03:00 Computing stationary distribution, locally Devavrat Shah
03:00 - 03:30 Spectral algorithms for graph mining and analysis Yiannis Koutis
03:30 - 04:00 Network community detection Jiashun Jin
04:00 - 04:30 Coffee break *
04:30 - 05:00 Optimal CUR Matrix Decompositions David Woodruff
05:00 - 05:30 Dimensionality reduction via sparse matrices Jelani Nelson
05:30 - 06:00 Influence sampling for generalized linear models Jinzhu Jia
06:00 - 09:00 Dinner Reception *
Wed, June 18 Industrial and Scientific Applications
09:00 - 10:00 Counterfactual reasoning and massive data sets Leon Bottou
10:00 - 10:30 Connected Components in MapReduce and Beyond Sergei Vassilvitskii
10:30 - 11:00 Coffee break *
11:00 - 11:30 Distributing Large-scale Recommendation Algorithms: from GPUs to the Cloud Xavier Amatriain
11:30 - 12:00 Disentangling sources of risk in massive financial portfolios Jeffrey Bohn
12:00 - 02:30 Lunch *
02:30 - 03:00 Localized Methods for Diffusions in Large Graphs David Gleich
03:00 - 03:30 FAST-PPR: Scaling Personalized PageRank Estimation for Large Graphs Ashish Goel
03:30 - 04:00 Locally-biased and semi-supervised eigenvectors Michael Mahoney
04:00 - 04:30 Coffee break *
04:30 - 05:00 Optimal Shrinkage of Fast Singular Values Matan Gavish
05:00 - 05:30 Dimension Independent Matrix Square using MapReduce Reza Zadeh
Thu, June 19 Novel Algorithmic Approaches
09:00 - 10:00 Analyzing Big Graphs via Sketching and Streaming Andrew McGregor
10:00 - 10:30 Large-Scale Inference in Time Domain Astrophysics Joshua Bloom
10:30 - 11:00 Coffee break *
11:00 - 11:30 Exploring "forgotten" one-shot learning Alek Kolcz
11:30 - 12:00 Modeling Dynamics of Opinion Formation in Social Networks Sreenivas Gollapudi
12:00 - 12:30 Multi-reference Alignment: Estimating Group Transformations using Semidefinite Programming Amit Singer
12:30 - 02:30 Lunch *
02:30 - 03:00 IPython: a language-independent framework for computation and data Fernando Perez
03:00 - 03:30 Reducing Communication in Parallel Graph Computations Aydin Buluc
03:30 - 04:00 Large Scale Graph-Parallel Computation for Machine Learning: Applications and Systems Joseph Gonzalez
04:00 - 04:30 Coffee break *
04:30 - 05:00 CUR Factorization via Discrete Empirical Interpolation Mark Embree
05:00 - 05:30 Leverage scores: Sensitivity and an App Ilse Ipsen
05:30 - 06:00 libSkylark: Sketching-based Accelerated Numerical Linear Algebra and Machine Learning for Distributed-memory Systems Vikas Sindhwani
06:00 - 09:00 Dinner Reception and Poster Session *
Fri, June 20 Novel Matrix and Graph Methods
09:00 - 10:00 Large-Scale Numerical Computation Using a Data Flow Engine Matei Zaharia
10:00 - 10:30 Automatic discovery of cell types and microcircuitry from neural connectomics Eric Jonas
10:30 - 11:00 Coffee break *
11:00 - 11:30 Beyond Locality Sensitive Hashing Alexandr Andoni
11:30 - 12:00 Combinatorial optimization and sparse computation for large scale data mining Dorit Hochbaum
12:00 - 12:30 Public Participation in International Security - Open Source Treaty Verification Christopher Stubbs
12:30 - 02:30 Lunch *
02:30 - 03:00 The Hearts and Minds of Data Science Cecilia Aragon
03:00 - 03:30 The fall and rise of geometric centralities Sebastiano Vigna
03:30 - 04:00 Mixed Regression Constantine Caramanis
04:00 - 04:30 No Free Lunch for Stress Testers: Toward a Normative Theory of Scenario-Based Risk Assessment Lisa Goldberg

Confirmed Speakers

Alek Kolcz Twitter
Alexandr Andoni Microsoft Research
Amit Singer Princeton University
Andrei Kirilenko MIT Sloan School of Management
Andrew McGregor University of Massachusetts
Anna Gilbert University of Michigan
Ashish Goel Stanford University
Ashok Srivastava Verizon
Aydin Buluc Berkeley Lab
Ben Recht UC Berkeley
Bill Howe University of Washington eScience Institute
Cecilia Aragon University of Washington
Christopher Stubbs Harvard University
Constantine Caramanis UT Austin
Dan Suciu University of Washington
David Gleich Purdue University
David Woodruff IBM Research Almaden
Devavrat Shah MIT
Dorit Hochbaum UC Berkeley
Eric Jonas UC Berkeley
Fernando Perez UC Berkeley
Gerald Friedland ICSI
Ilse Ipsen North Carolina State University
Jeffrey Bohn State Street
Jelani Nelson Harvard University
Jiashun Jin Carnegie Mellon University
Jinzhu Jia Peking University
Joseph Gonzalez UC Berkeley
Joshua Bloom UC Berkeley
Leon Bottou Microsoft Research
Lisa Goldberg University of California, Berkeley
Mark Embree Virginia Tech
Matan Gavish Stanford University
Matei Zaharia Databricks, MIT
Michael Mahoney UC Berkeley
Peter Richtarik Edinburgh
Reza Zadeh Stanford University
Sebastiano Vigna Università degli Studi di Milano
Sergei Vassilvitskii Google
Sreenivas Gollapudi Microsoft Research
Vikas Sindhwani IBM Research
Xavier Amatriain Netflix
Yiannis Koutis University of Puerto Rico - Rio Piedras

The Workshops on Algorithms for Modern Massive Data Sets (MMDS) address algorithmic and statistical challenges in modern large-scale data analysis. The program for MMDS 2014 will be structured around three related foci: theoretical foundations; novel implementations; and diverse applications.

Applications to be discussed include astrophysics, genetics, finance, telecommunications, earthquake monitoring, defense and international treaty verification, business analytics, internet advertising and analysis, and social network analysis.

Implementation topics will include MapReduce, Spark, and related frameworks, and extending these frameworks to do iterative matrix algorithms and large-scale machine learning and graph analytics; systems for reducing communication in parallel and distributed graph computations; systems for distributed randomized numerical linear algebra; IPython and scalable analytics as a service; and scaling novel theoretical methods up to tera-scale problems and beyond.

Theoretical topics will include sketching, streaming, and projection algorithms for matrix and graph problems; randomized numerical linear algebra methods; communication-aware matrix and graph algorithms; localized spectral and diffusion methods for large-scale graph computations; and novel developments in locality-sensitive hashing, large-scale optimization, etc.

Titles for this year's talks include:
Exploring "forgotten" one-shot learning (Alek Kolcz)
Beyond Locality Sensitive Hashing (Alexandr Andoni)
Multi-reference Alignment: Estimating Group Transformations using Semidefinite Programming (Amit Singer)
Do U.S. Regulators Listen to the Public? Testing the Regulatory Process with the RegRank Algorithm (Andrei Kirilenko)
Analyzing Big Graphs via Sketching and Streaming (Andrew McGregor)
Data-based inverse problems (Anna Gilbert)
FAST-PPR: Scaling Personalized PageRank Estimation for Large Graphs (Ashish Goel)
Large Scale Machine Learning at Verizon (Ashok Srivastava)
Reducing Communication in Parallel Graph Computations (Aydin Buluc)
Myria: Scalable Analytics as a Service (Bill Howe)
The Hearts and Minds of Data Science (Cecilia Aragon)
Public Participation in International Security - Open Source Treaty Verification (Christopher Stubbs)
Mixed Regression (Constantine Caramanis)
Communication Cost in Big Data Processing (Dan Suciu)
Localized Methods for Diffusions in Large Graphs (David Gleich)
Optimal CUR Matrix Decompositions (David Woodruff)
Computing stationary distribution, locally (Devavrat Shah)
Combinatorial optimization and sparse computation for large scale data mining (Dorit Hochbaum)
Automatic discovery of cell types and microcircuitry from neural connectomics (Eric Jonas)
IPython: a language-independent framework for computation and data (Fernando Perez)
Content-based search in 50TB of consumer-produced videos (Gerald Friedland)
Leverage scores: Sensitivity and an App (Ilse Ipsen)
Disentangling sources of risk in massive financial portfolios (Jeffrey Bohn)
Dimensionality reduction via sparse matrices (Jelani Nelson)
Network community detection (Jiashun Jin)
Influence sampling for generalized linear models (Jinzhu Jia)
Large Scale Graph-Parallel Computation for Machine Learning: Applications and Systems (Joseph Gonzalez)
Large-Scale Inference in Time Domain Astrophysics (Joshua Bloom)
Counterfactual reasoning and massive data sets (Leon Bottou)
No Free Lunch for Stress Testers: Toward a Normative Theory of Scenario-Based Risk Assessment (Lisa Goldberg)
CUR Factorization via Discrete Empirical Interpolation (Mark Embree)
Optimal Shrinkage of Fast Singular Values (Matan Gavish)
Large-Scale Numerical Computation Using a Data Flow Engine (Matei Zaharia)
Locally-biased and semi-supervised eigenvectors (Michael Mahoney)
APPROX: Accelerated, Parallel and PROXimal coordinate descent (Peter Richtarik)
Dimension Independent Matrix Square using MapReduce (Reza Zadeh)
The fall and rise of geometric centralities (Sebastiano Vigna)
Connected Components in MapReduce and Beyond (Sergei Vassilvitskii)
Modeling Dynamics of Opinion Formation in Social Networks (Sreenivas Gollapudi)
libSkylark: Sketching-based Accelerated Numerical Linear Algebra and Machine Learning for Distributed-memory Systems (Vikas Sindhwani)
Distributing Large-scale Recommendation Algorithms: from GPUs to the Cloud (Xavier Amatriain)
Spectral algorithms for graph mining and analysis (Yiannis Koutis)
Event Location
Talks will be held on the UC Berkeley campus in Stanley Hall, Room 105. For convenience, here are links to the UC Berkeley and Google maps of Stanley Hall and the campus.


Hotels
The following is a list of recommended hotel and bed & breakfast options. Most of the locations are within a 30-minute walk of Stanley Hall.



Directions & Parking
BART from San Francisco International Airport: Take the complimentary AirTrain (red line) from your arrival terminal to the SFO BART station. At the BART station, purchase a ticket to Downtown Berkeley. Board any Pittsburg/Bay Point or Concord train (yellow line). When the train arrives at the 19th St. Oakland station, transfer to the Richmond train, which should be waiting on the opposite side of the platform (timed transfer). Take the Richmond train to Downtown Berkeley. Travel time: 55 minutes.

BART from Oakland International Airport: Take the AirBART shuttle ($3) to the Oakland Coliseum BART station. At the BART station, purchase a ticket to Downtown Berkeley. Board a Richmond train (orange line), which will take you directly to the Downtown Berkeley station. Travel time: 45 minutes.

General information on public transportation and parking around UC Berkeley may be found here and here. Campus parking is available at the Underhill Structure, located on Channing Way. The closest off-campus parking is the City of Berkeley Telegraph Channing Garage, located at 2450 Durant Ave., between Dana St. and Telegraph Ave. More parking information may be found at this link.

Announcements

The program for MMDS 2014 is now available for download.
Evening receptions will be held on Tuesday, June 17th and Thursday, June 19th. Join Ayasdi on June 17th for an evening reception of food, drinks, and data science. The Thursday reception will be held jointly with the MMDS poster session.
Early registration for the MMDS 2014 workshop has been extended to May 7th.
Call for posters: In addition to the talks there will be a poster session during one of the receptions. You may apply to present a poster on the registration page.
Early registration for the MMDS 2014 workshop will remain open through May 1st.
MMDS 2014 will be held on the UC Berkeley campus from Tuesday, June 17 through Friday, June 20.




Latest news

MMDS 2014 talks will take place in Stanley Hall.
GraphLab is hosting its 3rd conference on Monday, July 21, 2014, at the Nikko Hotel in San Francisco. More information about this event may be found here.




Organizing committee

Michael Mahoney (Chair), ICSI and Department of Statistics, UC Berkeley.
Alexander Shkolnik, Institute for Computational Mathematics and Engineering, Stanford University.
Petros Drineas, Department of Computer Science, Rensselaer Polytechnic Institute.
Reza Zadeh, Institute for Computational Mathematics and Engineering, Stanford University.
Fernando Perez, Henry H. Wheeler Jr. Brain Imaging Center, UC Berkeley.