The 6th MMDS Workshop on Algorithms for Modern Massive Data Sets was held June 21–24, 2016, in Berkeley, CA. Video recordings of all the talks may be found on our YouTube channel. Download the full MMDS 2016 program here.

Tue, June 21 Data Analysis and Statistical Data Analysis *
08:00 09:45 Breakfast and registration *
09:45 10:00 Welcome and opening remarks Organizers
10:00 11:00 Meaningful Visual Exploration of Massive Data Peter Wang
11:00 11:30 Scalable Collective Inference from Richly Structured Data Lise Getoor
11:30 12:00 A Framework for Processing Large Graphs in Shared Memory Julian Shun
12:00 02:00 Lunch *
02:00 02:30 Minimax optimal subsampling for large sample linear regression Aarti Singh
02:30 03:00 Randomized Low-Rank Approximation and PCA: Beyond Sketching Cameron Musco
03:00 03:30 Restricted Strong Convexity Implies Weak Submodularity Alex Dimakis
03:30 04:00 Coffee break *
04:00 04:30 The Stability Principle for Information Extraction from Data Bin Yu
04:30 05:00 New Results in Non-Convex Optimization for Large Scale Machine Learning Constantine Caramanis
05:00 05:30 The Union of Intersections Method Kristofer Bouchard
05:30 06:00 Head, Torso and Tail - Performance for modeling real data Alex Smola
06:00 08:00 Dinner Reception
Wed, June 22 Industrial and Scientific Applications *
09:00 10:00 New Methods for Designing and Analyzing Large Scale Randomized Experiment Jasjeet Sekhon
10:00 10:30 Cooperative Computing for Autonomous Data Centers Storing Social Network Data Jonathan Berry
10:30 11:00 Coffee break
11:00 11:30 Is manifold learning for toy data only? Marina Meila
11:30 12:00 Exploring Galaxy Evolution through Manifold Learning Jake VanderPlas
12:00 02:00 Lunch
02:00 02:30 Fast, flexible, and interpretable regression modeling Daniela Witten
02:30 03:00 Randomized Composable Core-sets for Distributed Computation Vahab Mirrokni
03:00 03:30 Local graph clustering algorithms: an optimization perspective Kimon Fountoulakis
03:30 04:00 Coffee break
04:00 04:30 Using Principal Component Analysis to Estimate a High Dimensional Factor Model with High-Frequency Data Dacheng Xiu
04:30 05:00 Identifying Broad and Narrow Financial Risk Factors with Convex Optimization: Part 1 Lisa Goldberg
05:00 05:30 Identifying Broad and Narrow Financial Risk Factors with Convex Optimization: Part 2 Alex Shkolnik
05:30 06:00 Learning about business cycle conditions from four terabytes of data Serena Ng
Thu, June 23 Novel Algorithmic Methods *
09:00 10:00 Top 10 Data Analytics Problems in Science Prabhat
10:00 10:30 Low-rank matrix factorizations at scale: Spark for scientific data analytics Alex Gittens
10:30 11:00 Coffee break
11:00 11:30 Structure & Dynamics from Random Observations Abbas Ourmazd
11:30 12:00 Stochastic Integration via Error-Correcting Codes Dimitris Achlioptas
12:30 02:00 Lunch *
02:00 02:30 Why Deep Learning Works: Perspectives from Theoretical Chemistry Charles Martin
02:30 03:00 A theory of multineuronal dimensionality, dynamics and measurement Surya Ganguli
03:00 03:30 Sub-sampled Newton Methods: Uniform and Non-Uniform Sampling Fred Roosta
03:30 04:00 Coffee break *
04:00 04:30 In-core computation of geometric centralities with HyperBall: A hundred billion nodes and beyond Sebastiano Vigna
04:30 05:00 Higher-order clustering of networks David Gleich
05:00 05:30 Mining Tools for Large-Scale Networks Charalampos Tsourakakis
05:30 06:00 Building Scalable Predictive Modeling Platform for Healthcare Applications Jimeng Sun
06:00 08:00 Dinner reception and poster session
Fri, June 24 Novel Matrix and Graph Methods *
09:00 10:00 Scalable interaction with data: where artificial intelligence meets visualization Christopher White
10:00 10:30 Ameliorating the Annotation Bottleneck Christopher Re
10:30 11:00 Coffee break
11:00 11:30 Homophily and transitivity in dynamic network formation Bryan Graham
11:30 12:00 Systemwide Commonalities in Market Liquidity Mark Flood
12:30 02:00 Lunch *
02:00 02:30 Train faster, generalize better: Stability of stochastic gradient descent Moritz Hardt
02:30 03:00 Extracting governing equations from highly corrupted data Rachel Ward
03:00 03:30 Nonparametric Network Smoothing Cosma Shalizi
03:30 04:00 Coffee break *
04:00 04:30 PCA from noisy linearly reduced measurements Amit Singer and Joakim Anden
04:30 05:00 PCA with Model Misspecification Robert Anderson
05:00 05:30 Fast Graphlet Decomposition Ted Willke and Nesreen Ahmed
Dimitris Achlioptas UC Santa Cruz
Nesreen Ahmed Intel Labs
Joakim Anden Princeton University
Robert Anderson UC Berkeley
Jonathan Berry Sandia National Laboratories
Kristofer Bouchard Lawrence Berkeley National Laboratory
Constantine Caramanis UT Austin
Alex Dimakis UT Austin
Mark Flood Office of Financial Research
Kimon Fountoulakis University of California Berkeley
Surya Ganguli Stanford University
Lise Getoor UC Santa Cruz
Alex Gittens International Computer Science Institute
David Gleich Purdue University
Lisa Goldberg UC Berkeley
Bryan Graham UC Berkeley (Economics)
Moritz Hardt Google Research
Charles Martin Calculation Consulting
Marina Meila University of Washington
Vahab Mirrokni Google Research
Cameron Musco Massachusetts Institute of Technology
Serena Ng Columbia University
Abbas Ourmazd Univ. of Wisconsin Milwaukee
Prabhat Lawrence Berkeley National Laboratory
Christopher Re Stanford University
Fred Roosta ICSI and UC Berkeley
Jasjeet Sekhon UC Berkeley
Cosma Shalizi Carnegie Mellon University
Alex Shkolnik UC Berkeley
Julian Shun UC Berkeley
Amit Singer Princeton University
Aarti Singh Carnegie Mellon University
Alex Smola Carnegie Mellon University
Jimeng Sun Georgia Tech
Matt Taddy Chicago Booth and Microsoft Research
Charalampos Tsourakakis Harvard University
Jake VanderPlas University of Washington
Sebastiano Vigna Università degli Studi di Milano, Dipartimento di Informatica
Peter Wang Continuum Analytics
Rachel Ward University of Texas at Austin
Christopher White Microsoft
Ted Willke Intel Labs
Daniela Witten University of Washington
Dacheng Xiu Chicago Booth
Bin Yu Statistics and EECS, UC Berkeley

The 2016 Workshop on Algorithms for Modern Massive Data Sets (MMDS 2016) will address algorithmic and statistical challenges in modern large-scale data analysis. The program for MMDS 2016 will be structured around three related themes: theoretical foundations; novel algorithms and implementations; and diverse data applications.

Talk abstracts:
Stochastic Integration via Error-Correcting Codes (Dimitris Achlioptas)
Fast Graphlet Decomposition (Nesreen Ahmed)
PCA from noisy linearly reduced measurements (Joakim Anden)
PCA with Model Misspecification (Robert Anderson)
Cooperative Computing for Autonomous Data Centers Storing Social Network Data (Jonathan Berry)
The Union of Intersections Method (Kristofer Bouchard)
New Results in Non-Convex Optimization for Large Scale Machine Learning (Constantine Caramanis)
Restricted Strong Convexity Implies Weak Submodularity (Alex Dimakis)
Systemwide Commonalities in Market Liquidity (Mark Flood)
Local graph clustering algorithms: an optimization perspective (Kimon Fountoulakis)
A theory of multineuronal dimensionality, dynamics and measurement (Surya Ganguli)
Scalable Collective Inference from Richly Structured Data (Lise Getoor)
Low-rank matrix factorizations at scale: Spark for scientific data analytics (Alex Gittens)
Higher-order clustering of networks (David Gleich)
Identifying Broad and Narrow Financial Risk Factors with Convex Optimization: Part 1 (Lisa Goldberg)
Homophily and transitivity in dynamic network formation (Bryan Graham)
Train faster, generalize better: Stability of stochastic gradient descent (Moritz Hardt)
Why Deep Learning Works: Perspectives from Theoretical Chemistry (Charles Martin)
Is manifold learning for toy data only? (Marina Meila)
Randomized Composable Core-sets for Distributed Computation (Vahab Mirrokni)
Randomized Low-Rank Approximation and PCA: Beyond Sketching (Cameron Musco)
Learning about business cycle conditions from four terabytes of data (Serena Ng)
Structure & Dynamics from Random Observations (Abbas Ourmazd)
Top 10 Data Analytics Problems in Science (Prabhat)
Ameliorating the Annotation Bottleneck (Christopher Re)
Sub-sampled Newton Methods: Uniform and Non-Uniform Sampling (Fred Roosta)
New Methods for Designing and Analyzing Large Scale Randomized Experiment (Jasjeet Sekhon)
Nonparametric Network Smoothing (Cosma Shalizi)
Identifying Broad and Narrow Financial Risk Factors with Convex Optimization: Part 2 (Alex Shkolnik)
A Framework for Processing Large Graphs in Shared Memory (Julian Shun)
PCA from noisy linearly reduced measurements (Amit Singer)
Minimax optimal subsampling for large sample linear regression (Aarti Singh)
Head, Torso and Tail - Performance for modeling real data (Alex Smola)
Building Scalable Predictive Modeling Platform for Healthcare Applications (Jimeng Sun)
TBD (Matt Taddy)
Mining Tools for Large-Scale Networks (Charalampos Tsourakakis)
Exploring Galaxy Evolution through Manifold Learning (Jake VanderPlas)
In-core computation of geometric centralities with HyperBall: A hundred billion nodes and beyond (Sebastiano Vigna)
Meaningful Visual Exploration of Massive Data (Peter Wang)
Extracting governing equations from highly corrupted data (Rachel Ward)
Scalable interaction with data: where artificial intelligence meets visualization (Christopher White)
Fast Graphlet Decomposition (Ted Willke)
Fast, flexible, and interpretable regression modeling (Daniela Witten)
Using Principal Component Analysis to Estimate a High Dimensional Factor Model with High-Frequency Data (Dacheng Xiu)
The Stability Principle for Information Extraction from Data (Bin Yu)
Poster abstracts:
Fast Hierarchy Construction for Dense Subgraphs (A. Erdem Sariyuce)
Structure & Dynamics from Random Observations (Abbas Ourmazd)
node2vec: Scalable Feature Learning for Networks (Aditya Grover)
Variational Gram Functions: Convex Analysis and Optimization (Amin Jalali)
Algorithms for Computing Elements in a Free Distributive Lattice (Aubrey Laskowski)
Analytic Derivatives of High Dimensional Forward Models in Cosmology (Chirag Modi)
A statistical perspective on sketched regression (Daniel Ahfock)
Inferring missing data & accounting for patient variation to predict effective HIV treatments (Deborah Hanus)
Parallelization of Stable Principal Component Pursuit (Derek Driggs)
Core periphery structures to analyse a spatio-temporal dataset of crimes in San Francisco (Divya Sardana)
Pattern Discovery and Large-Scale Data mining on cosmological datasets (Doris Jung Lin Lee)
Streaming Pairwise Document Similarity by Shingling, Sketching and Hashing (Emaad Ahmed Manzoor)
Cosmo 4D: Towards the beginning of the Universe (Grigor Aslanyan)
Novel Machine Learning Techniques for Fast, Accurate Parameter Selection in Gaussian-kernel SVM (Guangliang Chen)
Capturing spatiotemporal variability in the influence of topography and vegetation on snow depth in the Tuolumne River Basin (Ian Bolliger)
Enabling Brain Functional Alignment for a Thousand Subjects (Javier Turek)
Sub-sampled Newton Methods with Non-uniform Sampling (Jiyan Yang)
A Data-Driven Approach to Multi-Asset Class Portfolio Simulations with Latent-Factor-Based Dimensionality Reduction (John Arabadjis)
Latent Behavior Analysis of Large Amounts of Network Security Data (Jovile Grebliauskaite)
Deep surveys of biological modules: K-biclustering gene expression and phenotype data (Marcin Joachimiak)
Fast Randomized Algorithms for Convex Optimization (Mert Pilanci)
Compressed Sensing without Sparsity Assumptions (Miles Lopes)
Compressed Dynamic Mode Decomposition (N. Benjamin Erichson)
Sub-sampled Newton Methods with Non-uniform Sampling (Peng Xu)
Rectools: A recommendation engine package (Pooja Rajkumar)
MyShake - Smartphone crowdsourcing for earthquakes (Qingkai KONG)
A Transfer Learning Approach for Autonomous Reconfiguration of Wearable Systems (Ramyar Saeedi)
Using Play-by-Play Data to Model, Simulate, and Predict NBA Games (Sebastian Rodriguez)
Web-Scale Distributed Community Detection using GraphX (Sebastien Dery)
SPLATT: Enabling Large-Scale Sparse Tensor Analysis (Shaden Smith)
Rapid, Robust, and Reliable Blind Deconvolution via Nonconvex Optimization (Shuyang Ling)
A New Similarity Score for Large-Scale, Sparse, and Discrete-Valued Data (Veronika Strnadova-Neeley)
Freshman or Fresher? Quantifying the Geographic Variation of Internet Language (Vivek Kulkarni)
Robust sketching for multiple square-root LASSO problems (Vu Pham)
A Subsampled Double Bootstrap for Massive Data (Xiaofeng Shao)
Point Integral Method for PDEs on Point Clouds (Zhen Li)
Event Location
Talks will be held on the UC Berkeley campus in Stanley Hall, Room 105. For convenience, the UC Berkeley and Google links show maps of Stanley Hall and the campus.


Hotels
The following is a list of recommended hotel and bed & breakfast options. Most of the locations are within a 30-minute walk of Stanley Hall.

Hotel Shattuck Plaza - Located in downtown Berkeley, the hotel is an easy stroll from a trendy mix of restaurants, theaters and attractions. With subway (BART) and bus stops literally around the corner, you can easily commute to San Francisco or Oakland within 30 minutes. The UC Berkeley campus is also just a few blocks away.
Faculty Club - Centrally located on the UC Berkeley campus, the Faculty Club is a short walk from bus stops and restaurants. It is within walking distance of every point on the UC Berkeley campus and is perfect for those who’d like to explore the campus.
Hotel Durant - Hotel Durant is located one block from the UC Berkeley campus. With bus stops right around the corner, restaurants and shops are easily accessible. Complete with four-star amenities, Hotel Durant provides both modern luxury and a historical Berkeley experience.
Rose Garden Inn - Located in South Berkeley, Rose Garden Inn is housed in five historic buildings surrounded by lush flower gardens and soothing fountains. The inn offers beautiful rooms and suites with unique furnishings and rich carpet. It is about a 5-minute drive from the UC Berkeley campus. Restaurants and shops are within walking distance.
Berkeley Lab Guest House - This Guest House is conveniently located on the Lawrence Berkeley National Laboratory campus. Many of the rooms offer spectacular views of the San Francisco Bay, skyline, and City of Berkeley. The Guest House is only a few minutes away from the University of California Berkeley campus and the dynamic Berkeley community.
Double Tree by Hilton - Located on the Berkeley Marina with waterfront views and panoramic views of the San Francisco skyline and the Berkeley Hills. All rooms feature a balcony or patio. It is a bit farther away, but the UC Berkeley campus is an easy 20-minute drive.
Claremont Hotel Club & Spa - Conveniently located in the heart of Berkeley, the hotel is only a 10-minute drive from the UC Berkeley campus and a 20-minute drive from downtown San Francisco. Private parking is available. It is a large and quiet hotel with many breathtaking views of San Francisco.
Mary's Bed and Breakfast - The large lovely homes and gardens, scenic walks, easy parking, access to shops, and top-rated restaurants make this an ideal location. It is close to pathways with stunning views of San Francisco, the Golden Gate Bridge, Alcatraz, and Marin County. The bed and breakfast is one mile south of the UC Berkeley campus.
The Brick Path Bed and Breakfast - Located in North Berkeley near popular Solano Avenue’s shops and restaurants, it is approximately two miles from the University of California and Downtown Berkeley. The cottage has its own garden entrance, a private modern bathroom, and a comfortable queen-size bed.


Directions & Parking
BART from San Francisco International Airport: Take the complimentary AirTrain (red line) from your arrival terminal to the SFO BART station. At the BART station, purchase a ticket to Downtown Berkeley. Board any Pittsburg/Bay Point or Concord train (yellow line). When the train arrives at the 19th St. Oakland station, transfer to the Richmond train, which should be waiting on the opposite side of the platform (timed transfer). Take the Richmond train to Downtown Berkeley. Travel time: 55 minutes.

BART from Oakland International Airport: Take the AirBART shuttle ($3) to the Oakland Coliseum BART station. At the BART station, purchase a ticket to Downtown Berkeley. Board a Richmond train (orange line), which will take you directly to the Downtown Berkeley station. Travel time: 45 minutes.

General information on public transportation and parking around UC Berkeley may be found here and here. Campus parking is available at the Underhill Structure, located on Channing Way between The closest off-campus parking is the City of Berkeley Telegraph Channing Garage, located at 2450 Durant Ave., between Dana St. and Telegraph Ave. More parking information may be found at this link.



Organizing committee


Michael Mahoney (Chair), ICSI and Department of Statistics, UC Berkeley.
Alexander Shkolnik, CDAR and Department of Economics, UC Berkeley.
Petros Drineas, Department of Computer Science, Rensselaer Polytechnic Institute.


Announcements

Talks will be held at Stanley Hall, Room 105. Here it is on a UC Berkeley map and a Google map.
The MMDS poster session will be held jointly with an evening reception on Thursday, June 23rd.
Interana is sponsoring MMDS 2016.
Join us on Tuesday, June 21st for an evening reception of food, drinks, and data science.
Early registration for the MMDS 2016 Workshop is extended till May 8th.
Call for posters: In addition to the talks there will be a poster session. You may apply to present a poster on the registration page.
Our new YouTube channel is now online with video recordings of talks from the 2012 & 2014 workshops.
Early registration for the MMDS 2016 Workshop is now open. Deadline: May 1st.
BIDS is sponsoring MMDS 2016.
Founded in 2013, the Berkeley Institute for Data Science (BIDS) is a central hub of research and education at UC Berkeley designed to facilitate and nurture data-intensive science.
The MMDS 2016 workshop will take place on the UC Berkeley Campus Tuesday, June 21 through Friday, June 24.
The 5th MMDS Workshop on Algorithms for Modern Massive Data Sets was held June 17–20, 2014, in Berkeley, CA. Video recordings of all the talks may be found on our YouTube channel. Download the full MMDS 2014 program here.