The 6th MMDS Workshop on Algorithms for Modern Massive Data Sets was held June 21–24, 2016, in Berkeley, CA. Video recordings of all the talks may be found on our YouTube channel. Download the full MMDS 2016 program here.

Tue, June 21 Data Analysis and Statistical Data Analysis *
08:00 09:45 Breakfast and registration *
09:45 10:00 Welcome and opening remarks Organizers
10:00 11:00 Meaningful Visual Exploration of Massive Data Peter Wang
11:00 11:30 Scalable Collective Inference from Richly Structured Data Lise Getoor
11:30 12:00 A Framework for Processing Large Graphs in Shared Memory Julian Shun
12:00 02:00 Lunch *
02:00 02:30 Minimax optimal subsampling for large sample linear regression Aarti Singh
02:30 03:00 Randomized Low-Rank Approximation and PCA: Beyond Sketching Cameron Musco
03:00 03:30 Restricted Strong Convexity Implies Weak Submodularity Alex Dimakis
03:30 04:00 Coffee break *
04:00 04:30 The Stability Principle for Information Extraction from Data Bin Yu
04:30 05:00 New Results in Non-Convex Optimization for Large Scale Machine Learning Constantine Caramanis
05:00 05:30 The Union of Intersections Method Kristofer Bouchard
05:30 06:00 Head, Torso and Tail - Performance for modeling real data Alex Smola
06:00 08:00 Dinner Reception
Wed, June 22 Industrial and Scientific Applications *
09:00 10:00 New Methods for Designing and Analyzing Large Scale Randomized Experiment Jasjeet Sekhon
10:00 10:30 Cooperative Computing for Autonomous Data Centers Storing Social Network Data Jonathan Berry
10:30 11:00 Coffee break
11:00 11:30 Is manifold learning for toy data only? Marina Meila
11:30 12:00 Exploring Galaxy Evolution through Manifold Learning Jake VanderPlas
12:00 02:00 Lunch
02:00 02:30 Fast, flexible, and interpretable regression modeling Daniela Witten
02:30 03:00 Randomized Composable Core-sets for Distributed Computation Vahab Mirrokni
03:00 03:30 Local graph clustering algorithms: an optimization perspective Kimon Fountoulakis
03:30 04:00 Coffee break
04:00 04:30 Using Principal Component Analysis to Estimate a High Dimensional Factor Model with High-Frequency Data Dacheng Xiu
04:30 05:00 Identifying Broad and Narrow Financial Risk Factors with Convex Optimization: Part 1 Lisa Goldberg
05:00 05:30 Identifying Broad and Narrow Financial Risk Factors with Convex Optimization: Part 2 Alex Shkolnik
05:30 06:00 Learning about business cycle conditions from four terabytes of data Serena Ng
Thu, June 23 Novel Algorithmic Methods *
09:00 10:00 Top 10 Data Analytics Problems in Science
10:00 10:30 Low-rank matrix factorizations at scale: Spark for scientific data analytics Alex Gittens
10:30 11:00 Coffee break
11:00 11:30 Structure & Dynamics from Random Observations Abbas Ourmazd
11:30 12:00 Stochastic Integration via Error-Correcting Codes Dimitris Achlioptas
12:30 02:00 Lunch *
02:00 02:30 Why Deep Learning Works: Perspectives from Theoretical Chemistry Charles Martin
02:30 03:00 A theory of multineuronal dimensionality, dynamics and measurement Surya Ganguli
03:00 03:30 Sub-sampled Newton Methods: Uniform and Non-Uniform Sampling Fred Roosta
03:30 04:00 Coffee break *
04:00 04:30 In-core computation of geometric centralities with HyperBall: A hundred billion nodes and beyond Sebastiano Vigna
04:30 05:00 Higher-order clustering of networks David Gleich
05:00 05:30 Mining Tools for Large-Scale Networks Charalampos Tsourakakis
05:30 06:00 Building Scalable Predictive Modeling Platform for Healthcare Applications Jimeng Sun
06:00 08:00 Dinner reception and poster session
Fri, June 24 Novel Matrix and Graph Methods *
09:00 10:00 Scalable interaction with data: where artificial intelligence meets visualization Christopher White
10:00 10:30 Ameliorating the Annotation Bottleneck Christopher Re
10:30 11:00 Coffee break
11:00 11:30 Homophily and transitivity in dynamic network formation Bryan Graham
11:30 12:00 Systemwide Commonalities in Market Liquidity Mark Flood
12:30 02:00 Lunch *
02:00 02:30 Train faster, generalize better: Stability of stochastic gradient descent Moritz Hardt
02:30 03:00 Extracting governing equations from highly corrupted data Rachel Ward
03:00 03:30 Nonparametric Network Smoothing Cosma Shalizi
03:30 04:00 Coffee break *
04:00 04:30 PCA from noisy linearly reduced measurements Amit Singer and Joakim Anden
04:30 05:00 PCA with Model Misspecification Robert Anderson
05:00 05:30 Fast Graphlet Decomposition Ted Willke and Nesreen Ahmed
Dimitris Achlioptas UC Santa Cruz
Nesreen Ahmed Intel Labs
Joakim Anden Princeton University
Robert Anderson UC Berkeley
Jonathan Berry Sandia National Laboratories
Kristofer Bouchard Lawrence Berkeley National Laboratory
Constantine Caramanis UT Austin
Alex Dimakis UT Austin
Mark Flood Office of Financial Research
Kimon Fountoulakis UC Berkeley
Surya Ganguli Stanford University
Lise Getoor UC Santa Cruz
Alex Gittens International Computer Science Institute
David Gleich Purdue University
Lisa Goldberg UC Berkeley
Bryan Graham UC Berkeley (Economics)