The 6th MMDS Workshop on Algorithms for Modern Massive Data Sets was held June 21–24, 2016, in Berkeley, CA. Video recordings of all the talks may be found on our YouTube channel. Download the full MMDS 2016 program here.

Tue, June 21 Data Analysis and Statistical Data Analysis
08:00 09:45 Breakfast and registration
09:45 10:00 Welcome and opening remarks Organizers
10:00 11:00 Meaningful Visual Exploration of Massive Data Peter Wang
11:00 11:30 Scalable Collective Inference from Richly Structured Data
Lise Getoor
11:30 12:00 A Framework for Processing Large Graphs in Shared Memory
Julian Shun
12:00 02:00 Lunch
02:00 02:30 Minimax optimal subsampling for large sample linear regression
Aarti Singh
02:30 03:00 Randomized Low-Rank Approximation and PCA: Beyond Sketching
Cameron Musco
03:00 03:30 Restricted Strong Convexity Implies Weak Submodularity
Alex Dimakis
03:30 04:00 Coffee break
04:00 04:30 The Stability Principle for Information Extraction from Data
Bin Yu
04:30 05:00 New Results in Non-Convex Optimization for Large Scale Machine Learning
Constantine Caramanis
05:00 05:30 The Union of Intersections Method
Kristofer Bouchard
05:30 06:00 Head, Torso and Tail - Performance for modeling real data
Alex Smola
06:00 08:00 Dinner Reception
Wed, June 22 Industrial and Scientific Applications
09:00 10:00 New Methods for Designing and Analyzing Large Scale Randomized Experiment
Jasjeet Sekhon
10:00 10:30 Cooperative Computing for Autonomous Data Centers Storing Social Network Data
Jonathan Berry
10:30 11:00 Coffee break
11:00 11:30 Is manifold learning for toy data only?
Marina Meila
11:30 12:00 Exploring Galaxy Evolution through Manifold Learning Jake VanderPlas
12:00 02:00 Lunch
02:00 02:30 Fast, flexible, and interpretable regression modeling
Daniela Witten
02:30 03:00 Randomized Composable Core-sets for Distributed Computation Vahab Mirrokni
03:00 03:30 Local graph clustering algorithms: an optimization perspective
Kimon Fountoulakis
03:30 04:00 Coffee break
04:00 04:30 Using Principal Component Analysis to Estimate a High Dimensional Factor Model with High-Frequency Data
Dacheng Xiu
04:30 05:00 Identifying Broad and Narrow Financial Risk Factors with Convex Optimization: Part 1
Lisa Goldberg
05:00 05:30 Identifying Broad and Narrow Financial Risk Factors with Convex Optimization: Part 2 Alex Shkolnik
05:30 06:00 Learning about business cycle conditions from four terabytes of data
Serena Ng
Thu, June 23 Novel Algorithmic Methods
09:00 10:00 Top 10 Data Analytics Problems in Science
10:00 10:30 Low-rank matrix factorizations at scale: Spark for scientific data analytics Alex Gittens
10:30 11:00 Coffee break
11:00 11:30 Structure & Dynamics from Random Observations
Abbas Ourmazd
11:30 12:00 Stochastic Integration via Error-Correcting Codes Dimitris Achlioptas
12:30 02:00 Lunch
02:00 02:30 Why Deep Learning Works: Perspectives from Theoretical Chemistry Charles Martin
02:30 03:00 A theory of multineuronal dimensionality, dynamics and measurement
Surya Ganguli
03:00 03:30 Sub-sampled Newton Methods: Uniform and Non-Uniform Sampling
Fred Roosta
03:30 04:00 Coffee break
04:00 04:30 In-core computation of geometric centralities with HyperBall: A hundred billion nodes and beyond
Sebastiano Vigna
04:30 05:00 Higher-order clustering of networks David Gleich
05:00 05:30 Mining Tools for Large-Scale Networks
Charalampos Tsourakakis
05:30 06:00 Building Scalable Predictive Modeling Platform for Healthcare Applications
Jimeng Sun
06:00 08:00 Dinner reception and poster session
Fri, June 24 Novel Matrix and Graph Methods
09:00 10:00 Scalable interaction with data: where artificial intelligence meets visualization Christopher White
10:00 10:30 Ameliorating the Annotation Bottleneck Christopher Re
10:30 11:00 Coffee break
11:00 11:30 Homophily and transitivity in dynamic network formation Bryan Graham
11:30 12:00 Systemwide Commonalities in Market Liquidity Mark Flood
12:30 02:00 Lunch
02:00 02:30 Train faster, generalize better: Stability of stochastic gradient descent Moritz Hardt
02:30 03:00 Extracting governing equations from highly corrupted data Rachel Ward
03:00 03:30 Nonparametric Network Smoothing Cosma Shalizi
03:30 04:00 Coffee break
04:00 04:30 PCA from noisy linearly reduced measurements
Amit Singer and Joakim Anden
04:30 05:00 PCA with Model Misspecification
Robert Anderson
05:00 05:30 Fast Graphlet Decomposition
Ted Willke and Nesreen Ahmed
Dimitris Achlioptas UC Santa Cruz
Nesreen Ahmed Intel Labs
Joakim Anden Princeton University
Robert Anderson UC Berkeley
Jonathan Berry Sandia National Laboratories
Kristofer Bouchard Lawrence Berkeley National Laboratory
Constantine Caramanis UT Austin
Alex Dimakis UT Austin
Mark Flood Office of Financial Research
Kimon Fountoulakis University of California Berkeley
Surya Ganguli Stanford University
Lise Getoor UC Santa Cruz
Alex Gittens International Computer Science Institute
David Gleich Purdue University
Lisa Goldberg UC Berkeley
Bryan Graham UC Berkeley (Economics)
Moritz Hardt Google Research
Charles Martin Calculation Consulting
Marina Meila University of Washington
Vahab Mirrokni Google Research
Cameron Musco Massachusetts Institute of Technology
Serena Ng Columbia University
Abbas Ourmazd Univ. of Wisconsin Milwaukee
Prabhat Lawrence Berkeley National Laboratory
Christopher Re Stanford University
Fred Roosta ICSI and UC Berkeley
Jasjeet Sekhon UC Berkeley
Cosma Shalizi Carnegie Mellon University
Alex Shkolnik UC Berkeley
Julian Shun UC Berkeley
Amit Singer Princeton University
Aarti Singh Carnegie Mellon University
Alex Smola Carnegie Mellon University
Jimeng Sun Georgia Tech
Matt Taddy Chicago Booth and Microsoft Research
Charalampos Tsourakakis Harvard University
Jake VanderPlas University of Washington
Sebastiano Vigna Università degli Studi di Milano, Dipartimento di Informatica
Peter Wang Continuum Analytics
Rachel Ward University of Texas at Austin
Christopher White Microsoft
Ted Willke Intel Labs
Daniela Witten University of Washington
Dacheng Xiu Chicago Booth
Bin Yu Statistics and EECS, UC Berkeley
The 5th MMDS Workshop on Algorithms for Modern Massive Data Sets was held June 17–20, 2014, in Berkeley, CA. Video recordings of all the talks may be found on our YouTube channel. Download the full MMDS 2014 program here.

Tue, June 17 Data Analysis and Statistical Data Analysis
08:00 09:45 Breakfast and registration
09:45 10:00 Welcome and opening remarks Organizers
10:00 11:00 Large Scale Machine Learning at Verizon
Ashok Srivastava
11:00 11:30 Communication Cost in Big Data Processing
Dan Suciu
11:30 12:00 Content-based search in 50TB of consumer-produced videos
Gerald Friedland
12:00 02:00 Lunch
02:30 03:00 Myria: Scalable Analytics as a Service
Bill Howe
03:00 03:30 Spectral algorithms for graph mining and analysis
Yiannis Koutis
03:30 04:00 Network community detection
Jiashun Jin
04:00 04:30 Coffee break
04:30 05:00 Optimal CUR Matrix Decompositions
David Woodruff
05:00 05:30 Dimensionality reduction via sparse matrices
Jelani Nelson
05:30 06:00 Influence sampling for generalized linear models
Jinzhu Jia
06:00 09:00 Dinner Reception
Wed, June 18 Industrial and Scientific Applications
09:00 10:00 Counterfactual reasoning and massive data sets
Leon Bottou
10:00 10:30 Connected Components in MapReduce and Beyond
Sergei Vassilvitskii
10:30 11:00 Coffee break
11:00 11:30 Distributing Large-scale Recommendation Algorithms: from GPUs to the Cloud
Xavier Amatriain
11:30 12:00 Disentangling sources of risk in massive financial portfolios
Jeffrey Bohn
12:00 02:30 Lunch
02:30 03:00 Localized Methods for Diffusions in Large Graphs
David Gleich
03:00 03:30 FAST-PPR: Scaling Personalized PageRank Estimation for Large Graphs
Peter Lofgren and Ashish Goel
03:30 04:00 Locally-biased and semi-supervised eigenvectors
Michael Mahoney
04:00 04:30 Coffee break
04:30 05:00 Optimal Shrinkage of Fast Singular Values
Matan Gavish
05:00 05:30 Dimension Independent Matrix Square using MapReduce
Reza Zadeh
Thu, June 19 Novel Algorithmic Approaches
09:00 10:00 Analyzing Big Graphs via Sketching and Streaming
Andrew McGregor
10:00 10:30 Large-Scale Inference in Time Domain Astrophysics
Joshua Bloom
10:30 11:00 Coffee break
11:00 11:30 Exploring "forgotten" one-shot learning
Alek Kolcz
11:30 12:00 Modeling Dynamics of Opinion Formation in Social Networks
Sreenivas Gollapudi
12:00 12:30 Multi-reference Alignment: Estimating Group Transformations using Semidefinite Programming
Amit Singer
12:30 02:30 Lunch
02:30 03:00 IPython: a language-independent framework for computation and data
Fernando Perez
03:00 03:30 Reducing Communication in Parallel Graph Computations
Aydin Buluc
03:30 04:00 Large Scale Graph-Parallel Computation for Machine Learning: Applications and Systems
Ankur Dave and Joseph Gonzalez
04:00 04:30 Coffee break
04:30 05:00 CUR Factorization via Discrete Empirical Interpolation
Mark Embree
05:00 05:30 Leverage scores: Sensitivity and an App
Ilse Ipsen
05:30 06:00 libSkylark: Sketching-based Accelerated Numerical Linear Algebra and Machine Learning for Distributed-memory Systems
Vikas Sindhwani
06:00 09:00 Dinner Reception and Poster Session
Fri, June 20 Novel Matrix and Graph Methods
09:00 10:00 Large-Scale Numerical Computation Using a Data Flow Engine
Matei Zaharia
10:00 10:30 Automatic discovery of cell types and microcircuitry from neural connectomics
Eric Jonas
10:35 11:05 Coffee break
11:00 11:30 Beyond Locality Sensitive Hashing
Alexandr Andoni
11:30 12:00 Combinatorial optimization and sparse computation for large scale data mining
Dorit Hochbaum
12:00 02:30 Lunch
12:00 12:30 Public Participation in International Security - Open Source Treaty Verification
Christopher Stubbs
02:30 03:00 The Hearts and Minds of Data Science Cecilia Aragon
03:00 03:30 The fall and rise of geometric centralities
Sebastiano Vigna
03:30 04:00 Mixed Regression Constantine Caramanis
04:00 04:30 No Free Lunch for Stress Testers: Toward a Normative Theory of Scenario-Based Risk Assessment
Lisa Goldberg
Alek Kolcz Twitter
Alexandr Andoni Microsoft Research
Amit Singer Princeton University
Andrei Kirilenko MIT Sloan School of Management
Andrew McGregor University of Massachusetts
Anna Gilbert University of Michigan
Ashish Goel Stanford University
Ashok Srivastava Verizon
Aydin Buluc Berkeley Lab
Ben Recht UC Berkeley
Bill Howe University of Washington eScience Institute
Cecilia Aragon University of Washington
Christopher Stubbs Harvard University
Constantine Caramanis UT Austin
Dan Suciu University of Washington
David Gleich Purdue University
David Woodruff IBM Research Almaden
Dorit Hochbaum UC Berkeley
Eric Jonas UC Berkeley
Fernando Perez UC Berkeley
Gerald Friedland ICSI
Ilse Ipsen North Carolina State University
Jeffrey Bohn State Street
Jelani Nelson Harvard University
Jiashun Jin Carnegie Mellon University
Jinzhu Jia Peking University
Joseph Gonzalez UC Berkeley
Joshua Bloom UC Berkeley
Leon Bottou Microsoft Research
Lisa Goldberg University of California, Berkeley
Mark Embree Virginia Tech
Matan Gavish Stanford University
Matei Zaharia Databricks, MIT
Michael Mahoney UC Berkeley
Peter Richtarik Edinburgh
Reza Zadeh Stanford University
Sebastiano Vigna Università degli Studi di Milano
Sergei Vassilvitskii Google
Sreenivas Gollapudi Microsoft Research
Vikas Sindhwani IBM Research
Xavier Amatriain Netflix
Yiannis Koutis University of Puerto Rico - Rio Piedras
The 4th MMDS Workshop on Algorithms for Modern Massive Data Sets was held July 10–13, 2012, in Stanford, CA. In addition to familiar MMDS topics, MMDS 2012 expanded further into scientific applications in biology and physics. Video recordings of all the talks, made possible by Cloudera, may be found here. Download the full MMDS 2012 program here.

Tuesday, July 10, 2012. Data Analysis and Statistical Data Analysis
8:00 - 10:00 Breakfast and Registration -- outside Cubberley Auditorium (at the Stanford School of Education, just off the Main Quad)
9:45 - 10:00 Welcome and Opening Remarks -- in Cubberley Auditorium
10:00 - 11:00 Tutorial: Jiawei Han
A Meta Path-Based Approach for Similarity Search and Mining of Heterogeneous Information Networks
11:00 - 11:30 Alexander Gray
Faster Learning for Massive Datasets
11:30 - 12:00 Christopher Re
Hazy: Making Data-driven Statistical Applications Easier to Build and Maintain
2:00 - 3:00 Tutorial: Peter Bartlett
Model Selection and Recent Results for Large Scale Problems
3:00 - 3:30 Noureddine El Karoui
On Robust Regression Estimators in High-dimension
3:30 - 4:00 Jure Leskovec
Affiliation Network Models for Densely Overlapping Communities in Networks
4:30 - 5:00 Haesun Park
Nonnegative Matrix Factorizations for Clustering
5:00 - 5:30 Fan Chung Graham
Vectorized Laplacians for Dealing with High-dimensional Data Sets
5:30 - 6:00 Joydeep Ghosh
Actionable Mining of Large, Multi-relational Data using Localized Predictive Models
Wednesday, July 11, 2012. Industrial and Scientific Applications
9:00 - 10:00 Tutorial: DJ Patil
When Algorithms Go Wrong: How Product Design Can Save Algorithmic Limitations
Book PDFs: Building Data Science Teams, Data Jujitsu
10:00 - 10:30 Sean Fahey
Big Data and Analytics for National Security
11:00 - 11:30 Petros Drineas
Leverage Scores, the Column Subset Selection Problem, and Least-squares Problems
11:30 - 12:00 David Woodruff
Low Rank Approximation and Regression in Input Sparsity Time
12:00 - 12:30 Michael W. Mahoney
Implementing Randomized Matrix Algorithms in Parallel and Distributed Environments
2:30 - 3:30 Tutorial: Rick Stevens
The Biological, Algorithmic and Computational Challenges of Systems Biology
3:30 - 4:00 Tiankai Tu
Fault-Tolerant Parallel Analysis of Millisecond-Scale Molecular Dynamics Trajectories
4:30 - 5:00 Alexander Szalay
Current Statistical Challenges in Large Astronomical Surveys
5:00 - 5:30 Joseph Richards
Astronomical Time Series Analysis for the Synoptic Survey Era
5:30 - 6:00 Tony Cass
Data Handling for LHC: Plans and Reality
Thursday, July 12, 2012. Novel Algorithmic Approaches
9:00 - 10:00 Tutorial: Michael Mitzenmacher
Peeling Arguments: Invertible Bloom Lookup Tables and Biff Codes
10:00 - 10:30 Frederic Chazal
Detection and Approximation of Linear Structures in Metric Spaces
11:00 - 11:30 Ping Li
Probabilistic Hashing for Efficient Search and Learning on Massive Data
11:30 - 12:00 Ashish Goel
Real Time Social Search and Related Problems
12:00 - 12:30 Andrew Goldberg
Hub Labels in Databases: Shortest Paths for the Masses
2:30 - 3:00 Theodore Johnson
Data Stream Warehousing
3:00 - 3:30 Josh Wills
Experimenting at Scale
3:30 - 4:00 Hang Li
Large Scale Machine Learning for Query Document Matching in Web Search
4:30 - 4:50 Blair Sullivan
Branching Out: Quantifying Tree-like Structure in Complex Networks
4:50 - 5:10 Mahdi Soltanolkotabi
A Geometric Analysis of Subspace Clustering with Outliers
5:10 - 5:30 Bahman Bahmani
Scalable K-Means++
5:30 - 6:00 Steve Bartel
Analytics at Dropbox
Friday, July 13, 2012. Novel Matrix and Graph Methods
9:00 - 10:00 Tutorial: Yi Ma
The Pursuit of Low-dimensional Structures in High-dimensional Data
10:00 - 10:30 Edoardo Airoldi
Graphlets Decomposition of a Weighted Network
11:00 - 11:30 Yiannis Koutis
SDD Solvers: Bridging the Gap Between Theory and Practice
11:30 - 12:00 Art Owen
Bootstrapping r-fold Tensor Data
12:00 - 12:30 Kamesh Madduri
Algorithms and Tools for Scalable Graph Analytics
2:30 - 3:00 Shaowei Lin
Studying Model Asymptotics with Singular Learning Theory
3:00 - 3:30 David Bindel
Communities, Spectral Clustering, and Random Walks
3:30 - 4:00 Ali Pinar
The Block Two-Level Erdos-Renyi (BTER) Graph Model
4:30 - 5:00 Xiao-Li Meng (presented by Alexander Blocker)
Preprocessing, Multiphase Inference, and Massive Data in Theory and Practice
5:00 - 5:30 Alfred Hero
Hub Discovery in Large Correlation Networks
5:30 - 6:00 Dan Feldman
Google Your Life: Learning Sensors Data
Edoardo Airoldi Harvard University
Bahman Bahmani Stanford University
Steve Bartel Dropbox
Peter Bartlett University of California, Berkeley, and QUT
David Bindel Cornell University
Tony Cass CERN
Frederic Chazal INRIA
Fan Chung Graham University of California, San Diego
Petros Drineas Rensselaer Polytechnic Institute
Noureddine El Karoui University of California, Berkeley
Sean Fahey Johns Hopkins Applied Physics Laboratory
Dan Feldman Massachusetts Institute of Technology
Joydeep Ghosh University of Texas, Austin
Ashish Goel Stanford University
Andrew Goldberg Microsoft Research, Silicon Valley
Alexander Gray Georgia Institute of Technology
Jiawei Han University of Illinois, Urbana-Champaign
Alfred Hero University of Michigan
Theodore Johnson AT&T Research Labs
Yiannis Koutis University of Puerto Rico, Rio Piedras
Jure Leskovec Stanford University
Hang Li Huawei Labs
Ping Li Cornell University
Shaowei Lin University of California, Berkeley
Yi Ma Microsoft Research, Asia
Kamesh Madduri Pennsylvania State University
Xiao-Li Meng Harvard University
Michael Mitzenmacher Harvard University
Art Owen Stanford University
Haesun Park Georgia Institute of Technology
DJ Patil Greylock Partners
Ali Pinar Sandia National Laboratories
Christopher Re University of Wisconsin, Madison
Joseph Richards University of California, Berkeley
Mahdi Soltanolkotabi Stanford University
Rick Stevens Argonne National Laboratory
Blair Sullivan Oak Ridge National Labs
Alexander Szalay Johns Hopkins University
Tiankai Tu DE Shaw Research
Josh Wills Cloudera, Inc
David Woodruff IBM Research, Almaden
The 3rd MMDS Workshop on Algorithms for Modern Massive Data Sets was held June 15–18, 2010, in Stanford, CA. MMDS 2010 addressed computation in large-scale scientific and internet data applications more generally. Click here for an article that appeared in various venues describing the meeting. Download the full MMDS 2010 program here.

Tuesday, June 15, 2010. Large-scale Data and Large-scale Computation
8:00 - 10:00 Breakfast and Registration -- outside Cubberley Auditorium (at the Stanford School of Education, just off the Main Quad)
9:45 - 10:00 Welcome and Opening Remarks -- in Cubberley Auditorium
10:00 - 11:00 Tutorial: Peter Norvig
Internet-Scale Data Analysis
11:00 - 11:30 Ashok Srivastava
Virtual Sensors and Large-Scale Gaussian Processes
11:30 - 12:00 John Langford
A Method for Parallel Online Learning
2:00 - 3:00 Tutorial: John Gilbert
Combinatorial Scientific Computing: Experience and Challenges
3:00 - 3:30 Deepak Agarwal
Recommender Problems for Content Optimization
3:30 - 4:00 James Demmel
Minimizing Communication in Linear Algebra
4:30 - 5:00 Dmitri Krioukov
Hyperbolic Mapping of Complex Networks
5:00 - 5:30 Mehryar Mohri
Matrix Approximation for Large-Scale Learning
5:30 - 6:00 David Bader
Massive-Scale Analytics of Streaming Social Networks
6:00 - 6:30 Ely Porat
Fast Pseudo-Random Fingerprints
Wednesday, June 16, 2010. Networked Data and Algorithmic Tools
9:00 - 10:00 Tutorial: Peter Bickel
Statistical Inference for Networks
10:00 - 10:30 Jure Leskovec
Inferring Networks of Diffusion and Influence
11:00 - 11:30 Michael W. Mahoney
Geometric Network Analysis Tools
11:30 - 12:00 Edward Chang
AdHEat - A New Influence-based Social Ads Model and its Tera-Scale Algorithms
12:00 - 12:30 Mauro Maggioni
Intrinsic Dimensionality Estimation and Multiscale Geometry of Data Sets
2:30 - 3:00 Guillermo Sapiro
Collaborative Hierarchical Sparse Models
3:00 - 3:30 Alekh Agarwal and Peter Bartlett
Information-theoretic Lower Bounds on the Oracle Complexity of Convex Optimization
3:30 - 4:00 John Duchi and Yoram Singer
Composite Objective Optimization and Learning for Massive Datasets
4:30 - 5:00 Steven Hillion
MAD Analytics in Practice
5:00 - 5:30 Matthew Harding
Outlier Detection in Financial Trading Networks
5:30 - 6:00 Neel Sundaresan
Large Dataset Problems at the Long Tail
Thursday, June 17, 2010. Spectral Methods and Sparse Matrix Methods
9:00 - 10:00 Tutorial: Sebastiano Vigna
Spectral Ranking
10:00 - 10:30 Robert Stine
Streaming Feature Selection
11:00 - 11:30 Konstantin Mischaikow
A Combinatorial Framework for Nonlinear Dynamics
11:30 - 12:00 Alfred Hero
Sparse Correlation Screening in High Dimension
12:00 - 12:30 Susan Holmes
Heterogeneous Data Challenge Combining Complex Data
2:30 - 3:30 Tutorial: Piotr Indyk
Sparse Recovery Using Sparse Matrices
3:30 - 4:00 Sayan Mukherjee
Efficient Dimension Reduction on Massive Data
4:30 - 5:00 Padhraic Smyth
Statistical Modeling of Large-Scale Sensor Count Data
5:00 - 5:30 Ping Li
Compressed Counting and Application in Estimating Entropy of Data Streams
5:30 - 6:00 Edo Liberty
Scaleable Correlation Clustering Algorithms
Friday, June 18, 2010. Randomized Algorithms for Data
9:00 - 10:00 Tutorial: Petros Drineas
Randomized Algorithms in Linear Algebra and Large Data Applications
10:00 - 10:30 Gunnar Martinsson
Randomized methods for Computing the SVD/PCA of Very Large Matrices
11:00 - 11:30 Ilse Ipsen
Numerical Reliability of Randomized Algorithms
11:30 - 12:00 Philippe Rigollet
Optimal Rates of Sparse Estimation and Universal Aggregation
12:00 - 12:30 Alexandre d'Aspremont
Subsampling, Spectral Methods & Semidefinite Programming
2:30 - 3:00 Gary Miller
Specialized System Solvers for very large Systems: Theory and Practice
3:00 - 3:30 John Wright and Emmanuel Candes
Robust Principal Component Analysis?
3:30 - 4:00 Alon Orlitsky
Estimation, Prediction, and Classification over Large Alphabets
4:30 - 5:00 Ken Clarkson
Numerical Linear Algebra in the Streaming Model
5:00 - 5:30 David Woodruff
Fast Lp Regression in Data Streams
Alekh Agarwal University of California, Berkeley
Deepak Agarwal Yahoo! Research
Alexandre d'Aspremont Princeton University
David Bader Georgia Tech College of Computing
Peter Bickel University of California, Berkeley
Emmanuel Candes Stanford University
Edward Chang Google Research
Ken Clarkson IBM Almaden Research Center
Jim Demmel University of California, Berkeley
John Duchi University of California, Berkeley
John Gilbert University of California, Santa Barbara
Matthew Harding Stanford University
Alfred Hero University of Michigan, Ann Arbor
Steven Hillion Greenplum
Susan Holmes Stanford University
Piotr Indyk Massachusetts Institute of Technology
Ilse Ipsen North Carolina State University
Dmitri Krioukov Cooperative Association for Internet Data Analysis
John Langford Yahoo! Research
Jure Leskovec Stanford University
Ping Li Cornell University
Edo Liberty Yahoo! Research
Mauro Maggioni Duke University
Gunnar Martinsson University of Colorado, Boulder
Gary Miller Carnegie Mellon University
Konstantin Mischaikow Rutgers University
Mehryar Mohri New York University
Sayan Mukherjee Duke University
Peter Norvig Google Research
Alon Orlitsky University of California, San Diego
Ely Porat Bar-Ilan University
Guillermo Sapiro University of Minnesota
Padhraic Smyth University of California, Irvine
Ashok Srivastava National Aeronautics and Space Administration
Neel Sundaresan eBay Research
Robert Stine University of Pennsylvania
Sebastiano Vigna Università Degli Studi Di Milano
David Woodruff IBM Almaden Research Center
John Wright Microsoft Research Asia
The European MMDS Workshop on Challenges Modern Massive Data Sets was held July 1–4, 2009, at the Technical University of Denmark, Lyngby, Denmark. The full conference website may be found here.

Wednesday, July 1, 2009. Statistical Learning and Machine Learning
09:10 - 10:10 Jerome Friedman
Predictive learning via rule ensembles
10:30 - 11:15 Yee Whye Teh
Bayesian nonparametrics in document and language modeling
11:15 - 12:00 Ole Winther
Hierarchical Bayesian modelling for collaborative filtering
13:30 - 14:30 Bernhard Schölkopf
Machine learning with positive definite kernels
14:30 - 15:05 Klaus Mosegaard
Metaheuristics in science and engineering
15:25 - 16:00 Nello Cristianini
Looking for memes in media content
16:00 - 16:35 Mikkel Schmidt
Bayesian matrix factorization approaches to blind source separation
Thursday, July 2, 2009. Multilinear Algebra for Data Analysis
09:00 - 10:00 Edward Chang
Parallel algorithms for collaborative filtering
10:20 - 10:55 Rasmus Bro
Applications of tensor methods in life sciences data
10:55 - 11:30 Pierre Comon
Tensor decompositions in statistical signal processing
11:30 - 12:05 Lieven De Lathauwer
Tucker compression, Parallel Factor Analysis and block term decompositions: New results
12:05 - 12:40 Lars Eldén
Krylov methods for tensors
14:00 - 15:00 Tomaso Poggio
From neuroscience to hierarchical learning architectures
15:00 - 17:45 Poster Session
Friday, July 3, 2009. Neuroscience and Clustering
09:00 - 10:00 Ricardo Baeza-Yates
The power of data
10:20 - 11:20 Scott Makeig
Multiscale brain/body imaging: Towards a single brain electrophysiology
11:25 - 12:00 John Ashburner
Brain morphometrics from MRI scans data
13:30 - 14:30 Joachim Buhmann
Structure validation in clustering by stability analysis
14:30 - 15:05 Charles Elkan
Accounting for burstiness in topic models
15:25 - 16:00 Neil Lawrence
Nonlinear matrix factorization with Gaussian processes
16:00 - 16:35 Michael Mahoney
Community structure in large social and information networks
16:35 - 17:10 Morten Mørup
Clustering on the simplex
Saturday, July 4, 2009. New Mathematical Tools for Data Analysis and Social Computing
09:00 - 10:00 Gunnar Carlsson
Topology and data
10:20 - 10:55 Risi Kondor
Non-commutative harmonic analysis in machine learning: the skew spectrum and the graphlet spectrum of graphs
10:55 - 11:30 Samuel Kaski
Probabilistic retrieval and visualization of relevant experiments
11:30 - 12:05 Lek-Heng Lim
Principal cumulant components analysis
13:30 - 14:30 Pedro Cano
Music recommendation systems: A complex networks perspective
14:30 - 15:05 Joaquin Quiñonero Candela
Probabilistic machine learning in computational advertising
15:25 - 16:00 Mark Herbster
Resistive geometry for graph-based transduction
16:00 - 16:35 Sune Lehmann
Connections matter. Communities of links in complex networks
16:35 - 17:10 Lars Kai Hansen
Machine learning in complex networks

MMDS 2009 Speakers

John Ashburner University College London
Ricardo Baeza-Yates Yahoo! Research, Barcelona
Rasmus Bro University of Copenhagen
Joachim Buhmann Swiss Federal Institute of Technology (ETH), Zürich
Joaquin Quiñonero Candela Microsoft Research, Cambridge
Pedro Cano Barcelona Music and Audio Technologies
Edward Chang Google Research, Beijing
Pierre Comon University of Nice, Sophia-Antipolis
Nello Cristianini University of Bristol
Lieven De Lathauwer Katholieke Universiteit Leuven
Lars Eldén Linköping University
Charles Elkan University of California, San Diego
Jerome Friedman Stanford University
Mark Herbster University College London
Samuel Kaski Helsinki University of Technology
Risi Kondor University College London / California Institute of Technology
Neil Lawrence University of Manchester
Sune Lehmann Northeastern University / Harvard University
Scott Makeig University of California, San Diego
Klaus Mosegaard University of Copenhagen
Tomaso Poggio Massachusetts Institute of Technology
Mikkel Schmidt University of Cambridge
Bernhard Schölkopf Max Planck Institute
Yee Whye Teh University College London
Ole Winther Technical University of Denmark / University of Copenhagen
The 2nd MMDS Workshop on Algorithms for Modern Massive Data Sets was held June 25–28, 2008, in Stanford, CA. MMDS 2008 grew out of our expectation for what the algorithmic and statistical foundations of large-scale data analysis should look like a generation from now. Click here for an article that appeared in SIGKDD Explorations and SIAM News about the meeting. Download the full MMDS 2008 program here.

Wednesday, June 25, 2008. Data Analysis and Data Applications
10:00 - 11:00 Tutorial: Christos Faloutsos
Graph mining: laws, generators and tools
11:00 - 11:30 Deepak Agarwal
Predictive discrete latent models for large incomplete dyadic data
11:30 - 12:00 Chandrika Kamath
Scientific data mining: why is it difficult?
2:00 - 3:00 Tutorial: Edward Chang
Challenges in mining large-scale social networks
3:00 - 3:30 Sharad Goel
Predictive indexing for fast search
3:30 - 4:00 James Demmel
Avoiding communication in linear algebra algorithms
4:30 - 5:00 Jun Liu
Bayesian inference of interactions and associations
5:00 - 5:30 Fan Chung
Four graph partitioning algorithms
5:30 - 6:00 Ronald Coifman
Diffusion geometries and harmonic analysis on data sets
Thursday, June 26, 2008. Networked Data and Algorithmic Tools
9:00 - 10:00 Tutorial: Milena Mihail
Models and algorithms for complex networks, with network elements maintaining characteristic profiles
10:00 - 10:30 Reid Andersen
An algorithm for improving graph partitions
11:00 - 11:30 Michael W. Mahoney
Community structure in large social and information networks
11:30 - 12:00 Nikhil Srivastava and Daniel Spielman
Graph sparsification by effective resistances
12:00 - 12:30 Amin Saberi
Sequential algorithms for generating random graphs
2:30 - 3:00 Pankaj K. Agarwal
Modeling and analyzing massive terrain data sets
3:00 - 3:30 Leonidas Guibas
Detection of symmetries and repeated patterns in 3D point cloud data
3:30 - 4:00 Yuan Yao
Topological methods for exploring pathway analysis in complex biomolecular folding
4:30 - 5:00 Piotr Indyk
Sparse recovery using sparse random matrices
5:00 - 5:30 Ping Li
Compressed counting and stable random projections
5:30 - 6:00 Joel Tropp
Algorithms for matrix column selection
Friday, June 27, 2008. Statistical, Geometric, and Topological Methods
9:00 - 10:00 Tutorial: Jerome H. Friedman
Fast sparse regression and classification
10:00 - 10:30 Tong Zhang
An adaptive forward/backward greedy algorithm for learning sparse representations
11:00 - 11:30 Jitendra Malik
Classification using intersection kernel SVMs is efficient
11:30 - 12:00 Elad Hazan
Efficient online routing with limited feedback and optimization in the dark
12:00 - 12:30 T.S. Jayram
Cascaded aggregates on data streams
2:30 - 3:30 Tutorial: Gunnar Carlsson
Topology and data
3:30 - 4:00 Partha Niyogi
Manifold regularization and semi-supervised learning
4:30 - 5:00 Sanjoy Dasgupta
Random projection trees and low dimensional manifolds
5:00 - 5:30 Kenneth Clarkson
Tighter bounds for random projections of manifolds
5:30 - 6:00 Yoram Singer
Efficient projection algorithms for learning sparse representations from high dimensional data
6:00 - 6:30 Arindam Banerjee
Bayesian co-clustering for dyadic data analysis
Saturday, June 28, 2008. Machine Learning and Dimensionality Reduction
9:00 - 10:00 Tutorial: Michael I. Jordan
Sufficient dimension reduction
10:00 - 10:30 Nathan Srebro
More data less work: SVM training in time decreasing with larger data sets
11:00 - 11:30 Inderjit S. Dhillon
Rank minimization via online learning
11:30 - 12:00 Nir Ailon
Efficient dimension reduction
2:30 - 3:00 Ravi Kannan
Spectral algorithms
3:00 - 3:30 Chris Wiggins
Inferring and encoding graph partitions
3:30 - 4:00 Anna Gilbert
Combinatorial group testing in signal recovery
4:30 - 5:00 Lars Kai Hansen
Generalization in high-dimensional matrix factorization
5:00 - 5:30 Holly Jin
Exploring sparse nonnegative matrix factorization
5:30 - 6:00 Elizabeth Purdom
Data analysis with graphs
6:00 - 6:30 Lek-Heng Lim
Ranking via Hodge decompositions of graphs and skew-symmetric matrices
Deepak Agarwal Yahoo! Research, Silicon Valley
Pankaj Agarwal Duke University
Nir Ailon Google Research, New York
Reid Andersen Microsoft Research, Redmond
Arindam Banerjee University of Minnesota, Twin Cities
Edward Chang Google Research, Mountain View
Fan Chung University of California, San Diego
Kenneth Clarkson IBM Almaden Research Center
Ronald Coifman Yale University
Sanjoy Dasgupta University of California, San Diego
James Demmel University of California, Berkeley
Inderjit Dhillon University of Texas, Austin
Christos Faloutsos Carnegie Mellon University
Jerome Friedman Stanford University
Anna Gilbert University of Michigan, Ann Arbor
Sharad Goel Yahoo! Research, New York
Leonidas Guibas Stanford University
Lars Kai Hansen Technical University of Denmark
Elad Hazan IBM Almaden Research Center
Piotr Indyk Massachusetts Institute of Technology
T.S. Jayram IBM Almaden Research Center
Holly Jin LinkedIn
Michael Jordan University of California, Berkeley
Satyen Kale Microsoft Research, Redmond
Chandrika Kamath Lawrence Livermore National Laboratory
Ravi Kannan Microsoft Research, India
Ping Li Cornell University
Jun Liu Harvard University
Jitendra Malik University of California, Berkeley
Milena Mihail Georgia Institute of Technology
Partha Niyogi University of Chicago
Elizabeth Purdom University of California, Berkeley
Amin Saberi Stanford University
Yoram Singer Google Research, Mountain View
Daniel Spielman Yale University
Nathan Srebro University of Chicago
Nikhil Srivastava Yale University
Joel Tropp California Institute of Technology
Chris Wiggins Columbia University
Yuan Yao Stanford University
Tong Zhang Rutgers University

MMDS 2008 Participants

Bart Adams Stanford University
Ahmed Afroz Tuple Networks
Pankaj Agarwal Duke University
Deepak Agarwal Yahoo! Research, Silicon Valley
John-Mark Agosta Intel Research, Santa Clara
Shipra Agrawal Stanford University
Nir Ailon Google Research, New York
Srinivas Akella Rensselaer Polytechnic Institute
Ramakrishna Akella University of California, Santa Cruz
Bin An University of California, Santa Cruz
Markus Anderle PricewaterhouseCoopers
Reid Andersen Microsoft Research, Redmond
Alexandr Andoni MIT
Christina Aperjis Stanford University
Corey Arnold University of California, Los Angeles
Arindam Banerjee University of Minnesota, Twin Cities
Narges Bani-Asadi Stanford University
Bhupesh Bansal LinkedIn
Joshua Batson Google
Henrik Bengtsson University of California, Berkeley
Anmol Bhasin LinkedIn
Sandeep Bhutani University of California, Berkeley
Arshavir Blackwell Fox Interactive Media
Joshua Bloom University of California, Berkeley
Robert Bonneau Air Force Office of Scientific Research
Christos Boutsidis Rensselaer Polytechnic Institute
Karla Caballero-Espinosa University of California, Santa Cruz
Gunnar Carlsson Stanford University
Lawrence Cayton University of California, San Diego
Su Chan Yahoo!
Edward Chang Google Research, China
Kevin Chang Max Planck Institute for Computer Science
Sheueling Chang Sun Microsystems
Vineet Chaoji Rensselaer Polytechnic Institute
Priyam Chatterjee University of California, Santa Cruz
Tiffany Chen Stanford University
Yun Chi NEC Labs America
Patrick Chiu Fuji Xerox Palo Alto Lab
Jason Chiu PricewaterhouseCoopers
Jon Chu MIT
Fan Chung University of California, San Diego
David Clark The Boeing Company
Kenneth Clarkson IBM Almaden Research Center
David Cohen Center for Computing Sciences
Ronald Coifman Yale University
Ioana Cosma Oxford University
Helen Cunningham Sun Microsystems Labs
Abhimanyu Das University of Southern California
Ali Dasdan Yahoo!
Sanjoy Dasgupta University of California, San Diego
Laurent Demanet Stanford University
James Demmel University of California, Berkeley
Vinay Deolalikar Hewlett-Packard Research Labs
Inderjit Dhillon University of Texas, Austin
Pavel Dmitriev Yahoo!
Debojyoti Dutta Cisco
Charles Elkan University of California, San Diego
Christos Faloutsos Carnegie Mellon University
Daniel Ford Google
Luca Foschini University of California, Santa Barbara
Majid Fozunbal Hewlett-Packard Labs
Jerome Friedman Stanford University
Yael Garten Stanford University
Matt Gedigian University of California, Berkeley
Andrew Gentles Stanford University
Anna Gilbert University of Michigan
David Gleich Stanford University
Sharad Goel Yahoo! Research, New York
Jonathon Goldman LinkedIn
Sreenivas Gollapudi Microsoft Research, Mountain View
Nina Gonzaludo Stanford University
Virgil Griffith Caltech
George Grigoryev Stanford University
Logan Grosenick Stanford University
Leonidas Guibas Stanford University
Sudhir Gupta Northern Illinois University
Zoltan Gyongyi Google
Peter Haas Stanford University
Jeff Hammerbacher Facebook
Lars Kai Hansen Technical University of Denmark
Anne Hardy SAP Labs
Eric Harley Johns Hopkins University
Mohammad Hasan Rensselaer Polytechnic Institute
Chris Haulk University of California, Berkeley
Jonathan Haynes Stanford University
Elad Hazan IBM Almaden Research Center
Thomas Hogan The Boeing Company
Chen Hu Stanford University
Ling Huang Intel Research, Berkeley
Rae Huang San Jose State University
Qi-xing Huang Stanford University
Xuhui Huang Stanford University
Piotr Indyk MIT
Paul Ivanov University of California, Berkeley
Prateek Jain University of Texas, Austin
Grahame Jastrebski PayPal
T.S. Jayram IBM Almaden Research Center
Jinzhu Jia University of California, Berkeley
Holly Jin LinkedIn
Ramesh Johari Stanford University
Clinton Jones University of Texas, Austin
Michael Jordan University of California, Berkeley
Brock Judkins Meebo
Karen Kafadar Indiana University
Satyen Kale Microsoft Research, Redmond
David Kale Stanford University
Chandrika Kamath Lawrence Livermore National Lab
Pentti Kanerva Stanford University
Ravi Kannan Microsoft Research, India
Siddharth Kar University of Michigan
Atsushi Kashitani NEC Japan
David Kellogg Wink
Krishnaram Kenthapadi Microsoft Research, Mountain View
Shinji Kim Sun Microsystems
Masayoshi Kobayashi NEC Labs Japan
Aleksandra Korolova Stanford University
Jay Kreps LinkedIn
Ramya Krishnamurthy Fox Interactive Media
Hanni Kruggel University of California, Santa Cruz
Ashish Kumar Stanford University
Vadim Kutsyy eBay
Simon Lacoste-Julien University of California, Berkeley
Thomas Lauritzen University of California, Berkeley
Matt Leduc Stanford University
Miranda Lee Stanford University
Cherung Lee University of California, Davis
Lei Li Carnegie Mellon University
Ping Li Cornell University
Qi Li Stanford University
Zeyu Li University of California, Berkeley
Lek-Heng Lim University of California, Berkeley
Colin Little eBay
Jun Liu Harvard University
Kun Liu IBM Almaden Research Center
Tianyun Liu Stanford University
Wai Wai Liu Stanford University
Martin Lo Caltech
Edgar Lobaton University of California, Berkeley
Markus Loecher Sense Networks
Huitao Luo LinkedIn
Michael Lustig Stanford University
Li Ma Stanford University
John Macpherson Stanford University
Anand Madhavan Stanford University
Vinit Mahedia San Jose State University
Michael Mahoney Yahoo! Research, Silicon Valley
Jitendra Malik University of California, Berkeley
Quentin Merigot INRIA, Sophia-Antipolis
Ming Mao SAP Research
Nathan Martin Stanford University
Rahul Mazumder Stanford University
James McEnerney Lawrence Livermore National Lab
Jim McGregor Adobe Systems
Pankaj Mehra Hewlett-Packard Labs
James Merino Yahoo!
Milena Mihail Georgia Tech
Peyman Milanfar University of California, Santa Cruz
Nikola Milosavljevic Stanford University
Taesup Moon Stanford University
Jason Morton Stanford University
Rajeev Motwani Stanford University
Ramesh Nallapati Stanford University
Hariharan Narayanan University of Chicago
Esmond Ng Lawrence Berkeley National Lab
Huy Nguyen MIT
Monica Nicolau Stanford University
Masoud Nikravesh University of California, Berkeley
Lloyd Nimetz Stanford School of Business
Ken Nitz SRI International
Partha Niyogi University of Chicago
Krzysztof Onak MIT
Lorenzo Orecchia University of California, Berkeley
George Ostrouchov Oak Ridge National Lab
Giuseppe Ottaviano Università di Pisa
George Papanicolaou Stanford University
Junfeng Pan Hong Kong University of Science and Technology
Manu Parmar Stanford University
Priyank Patel University of Texas, Austin
Fernando Perez University of California, Berkeley
Stefan Pickl Naval Postgraduate School and Bundeswehr University, Munich
Sarah Pierce Stanford University
Katerina Potika University of California, Santa Cruz
Corey Powell University of California, Santa Cruz
Prabhat Lawrence Berkeley National Lab
Winston Prakash Sun Microsystems
Elizabeth Purdom University of California, Berkeley
Peng Qiu Stanford University
Ram Rajagopal University of California, Berkeley
Anand Rajaraman Kosmix
Lyle Ramshaw Hewlett-Packard Labs
Zulfikar Ramzan Symantec
Suman Ravuri Stanford University
Christopher Riccomini PayPal
Monica Rogati LinkedIn
Karl Rohe University of California, Berkeley
Tim Roughgarden Stanford University
Amin Saberi Stanford University
Mehran Sahami Stanford University
Suchi Saria Stanford University
Tamas Sarlos Yahoo! Research, Silicon Valley
Michael Saunders Stanford University
Hae Jong Seo University of California, Santa Cruz
James Sethian University of California, Berkeley
Harlan Sexton Stanford University
Fei Sha Yahoo! Research, Silicon Valley
Cirrus Shakeri SAP Research
Srinivas Shakkottai Texas A&M University
Aneesh Sharma Stanford University
Lei Shi University of California, Berkeley
Jon Shlens University of California, Berkeley
Aleksandr Simma University of California, Berkeley
Horst Simon Lawrence Berkeley National Lab
Yoram Singer Google Research, Mountain View
Yannis Sismanis IBM Almaden Research Center
Peter Skomoroch Juice Analytics
Primoz Skraba Stanford University
Malcolm Slaney Yahoo! Research, Silicon Valley
Daniel Spielman Yale University
Nathan Srebro Toyota Technological Institute, Chicago
Nikhil Srivastava Yale University
John Strain University of California, Berkeley
Thomas Strohmer University of California, Davis
Jian Sun Stanford University
Neel Sundaresan eBay Research Labs
Ganesh Swami Simon Fraser University
Ram Swaminathan Hewlett-Packard Labs
Hsiu-Khuern Tang Hewlett-Packard Labs
Lei Tang Arizona State University
Vivek Tawde Fox Interactive
Evimaria Terzi IBM Almaden Research Center
Joel Tropp Caltech
Mitchell Trott Hewlett-Packard Labs
Panayiotis Tsaparas Microsoft Research, Silicon Valley
Charalampos Tsourakakis Carnegie Mellon University
Daniela Ushizima Lawrence Berkeley National Lab
Flavian Vasile Iowa State University
Krishna Venkatraman Hewlett-Packard Labs
Jeffrey Vitter Purdue University
Vincent Vu University of California, Berkeley
Jinjun Wang NEC Labs America
Jiong Wang LinkedIn
Ying Wang Stanford University
Nancy Wang University of California, Berkeley
Wei Wang University of California, Berkeley
Chunye Wang University of California, Santa Cruz
Chris Wiggins Columbia University
John Wu Lawrence Berkeley National Lab
Shirley Wu Stanford University
Cinna Wu University of California, Berkeley
Yao Xie Stanford University
Xing Xing University of California, Santa Cruz
Wei Xu NEC Labs America
Rong Xu Stanford University
Shirley Xu LinkedIn
Ying Xu University of California, Berkeley
TongKe Xue Stanford University
Donghui Yan University of California, Berkeley
Fan Yang University of California, Santa Cruz
Yuan Yao Stanford University
Lexiang Ye University of California, Riverside
Junming Yin University of California, Berkeley
Hongfeng Yin Yebol
Stephen Young Georgia Tech
Kai Yu NEC Labs America
Byron Yu Stanford University
Joel Zamora University of California, Santa Cruz
Bin Zhang Hewlett-Packard
Tony Zhang Rutgers University
Yi Zhang University of California, Santa Cruz
Shenghu Zhu NEC Labs America
Zhisu Zhu Stanford University
Yan Zhuang University of Victoria
Margit Zwemer University of California, Berkeley
The 1st MMDS Workshop on Algorithms for Modern Massive Data Sets was held June 21–24, 2006, in Stanford, CA. MMDS 2006 was originally motivated by the complementary perspectives brought about by numerical linear algebra and theoretical computer science to matrix algorithms in large-scale data applications. Click here for an article in SIAM News about this meeting. Download the full MMDS 2006 program here.

Wednesday, June 21, 2006. Linear Algebraic Basics
10:00 -11:00 Tutorial: Ravi Kannan
The changing face of web search
11:00 -11:30 Santosh Vempala
Related paper: Matrix approximation and projective clustering via volume sampling
11:30 -12:00 Petros Drineas
Subspace sampling and relative error matrix approximation
1:30 - 2:30 Tutorial: Dianne O'Leary
Matrix factorizations for information retrieval
2:30 - 3:00 Pete Stewart
Sparse reduced rank approximations to sparse matrices
3:00 - 3:30 Haesun Park
Adaptive discriminant analysis by regularized minimum squared errors
4:00 - 4:30 Michael Mahoney
CUR matrix decompositions for improved data analysis
4:30 - 5:00 Daniel Spielman
Fast algorithms for graph partitioning, sparsifications, and solving SDD systems
5:00 - 5:30 Anna Gilbert/Martin Strauss
List decoding of noisy Reed-Muller-like codes
5:30 - 6:00 Bob Plemmons
Low-rank nonnegative factorizations for spectral imaging applications
6:00 - 6:30 Art Owen
A hybrid of multivariate regression and factor analysis
Thursday, June 22, 2006. Industrial Applications and Sampling Methods
9:00 -10:00 Tutorial: Prabhakar Raghavan
The changing face of web search
10:00 -10:30 Tong Zhang
Statistical ranking problem
11:00 -11:30 Michael Berry
Text-mining approaches for email surveillance
11:30 -12:00 Hongyuan Zha
Incorporating query difference for learning retrieval functions
12:00 -12:30 Trevor Hastie/Ping Li
Efficient L2 and L1 dimension reduction in massive databases
2:00 - 3:00 Tutorial: Muthu Muthukrishnan
An algorithmer's view of sparse approximation problems
3:00 - 3:30 Inderjit Dhillon
Kernel learning with Bregman matrix divergences
3:30 - 4:00 Bruce Hendrickson
Latent semantic analysis and Fiedler retrieval
4:30 - 5:00 Piotr Indyk
Near optimal hashing algorithms for approximate near(est) neighbor problem
5:00 - 5:30 Moses Charikar
Compact data representations and their applications
5:30 - 6:00 Sudipto Guha
At the confluence of streams; order, information, and signals
6:00 - 6:30 Frank McSherry
Preserving privacy in large-scale data analysis
Friday, June 23, 2006. Kernel and Learning Applications
9:00 -10:00 Tutorial: Dimitris Achlioptas
Applications of random matrices in spectral computations and machine learning
10:00 -10:30 Tomaso Poggio
Learning: theory, engineering applications, and neuroscience
11:00 -11:30 Stephen Smale
Related paper: Finding the homology of submanifolds with high confidence from random samples
11:30 -12:00 Gunnar Carlsson
Algebraic topology and analysis of high dimensional data
12:00 -12:30 Vin de Silva
Point-cloud topology via harmonic forms
2:00 - 2:30 Dan Boley
Fast clustering leads to fast support vector machine training and more
2:30 - 3:00 Chris Ding
On the equivalence of (semi-)nonnegative matrix factorization and k-means
3:00 - 3:30 Al Inselberg
Parallel coordinates: visualization & data mining for high dimensional datasets
3:30 - 4:00 Joel Tropp
One sketch for all: a sublinear approximation scheme for heavy hitters
5:00 - 5:30 Rob Tibshirani
Prediction by supervised principal components
5:30 - 6:00 Tao Yang/Apostolos Gerasoulis
Page ranking for large-scale internet search:'s experiences
Saturday, June 24, 2006. Tensor-Based Data Applications
10:00 -11:00 Tutorial: Lek-Heng Lim
Tensors, symmetric tensors and nonnegative tensors in data analysis
11:00 -11:30 Eugene Tyrtyshnikov
Tensor compression of petabyte-size data
11:30 -12:00 Lieven De Lathauwer
The decomposition of a tensor as a sum of rank-(R1,R2,R3) terms
1:30 - 2:00 Orly Alter
Matrix and tensor computations for reconstructing the pathways of a cellular system from genome-scale signals
2:00 - 2:30 Shmuel Friedland
Tensors: Ranks and approximations
2:30 - 3:00 Tammy Kolda
Multilinear algebra for analyzing data with multiple linkages (for PowerPoint)
3:00 - 3:30 Lars Eldén
Computing the best rank-(R1,R2,R3) approximation of a tensor
4:00 - 4:30 Liqun Qi
Eigenvalues of tensors and their applications
4:30 - 5:00 Brett Bader
Analysis of Latent Relationships in Semantic Graphs using DEDICOM
5:00 - 5:30 Alex Vasilescu
Multilinear (tensor) algebraic framework for computer vision and graphics
5:30 - 6:00 Rasmus Bro
Multi-way analysis of bioinformatic data (with movies)
6:00 - 6:30 Pierre Comon
Independent component analysis viewed as a tensor decomposition
Orly Alter University of Texas at Austin
Dimitris Achlioptas Microsoft Research
Brett Bader Sandia National Laboratory
Michael W. Berry University of Tennessee at Knoxville
Daniel Boley University of Minnesota at Twin Cities
Rasmus Bro Royal Veterinary and Agricultural University Denmark
Gunnar E. Carlsson Stanford University
Moses Charikar Princeton University
Pierre Comon University of Nice Sophia-Antipolis
Inderjit S. Dhillon University of Texas at Austin
Chris Ding Lawrence Berkeley National Laboratory
David Donoho Stanford University
Lars Eldén Linköping University
Shmuel Friedland University of Illinois at Chicago
Apostolos Gerasoulis Rutgers University &
Anna C. Gilbert University of Michigan at Ann Arbor
Sudipto Guha University of Pennsylvania
Trevor Hastie Stanford University
Bruce Hendrickson Sandia National Laboratory
Piotr Indyk Massachusetts Institute of Technology
Alfred Inselberg Tel Aviv University
Ravi Kannan Yale University
Tamara G. Kolda Sandia National Laboratory
Lieven de Lathauwer Ecole Nationale Superieure d'Electronique et de ses Applications
Frank McSherry Microsoft Research
Bart de Moor Katholieke Universiteit Leuven
S. Muthu Muthukrishnan Google Inc.
Dianne O'Leary University of Maryland at College Park
Art Owen Stanford University
Haesun Park Georgia Institute of Technology
Bob Plemmons Wake Forest University
Tomaso A. Poggio Massachusetts Institute of Technology
Liqun Qi City University of Hong Kong
Prabhakar Raghavan Yahoo! Research
Vin de Silva Pomona College
Stephen Smale University of California at Berkeley
Daniel A. Spielman Yale University
G.W. Stewart University of Maryland at College Park
Martin Strauss University of Michigan at Ann Arbor
Robert Tibshirani Stanford University
Joel A. Tropp University of Michigan at Ann Arbor
Eugene E. Tyrtyshnikov Russian Academy of Sciences
M. Alex O. Vasilescu Massachusetts Institute of Technology
Santosh S. Vempala Massachusetts Institute of Technology
Tao Yang University of California at Santa Barbara &
Hongyuan Zha Pennsylvania State University
Tong Zhang Yahoo! Research

Organizing committee

Michael Mahoney (Chair), ICSI and Department of Statistics, UC Berkeley.
Alexander Shkolnik, CDAR and Department of Economics, UC Berkeley.
Petros Drineas, Department of Computer Science, Rensselaer Polytechnic Institute.

MMDS Workshop Series

The Workshops on Algorithms for Modern Massive Data Sets (MMDS) address algorithmic and statistical challenges in modern large-scale data analysis. The goals of this series of workshops are to explore novel techniques for modeling and analyzing massive, high-dimensional, and nonlinearly-structured scientific and internet data sets; and to bring together computer scientists, statisticians, mathematicians, and data analysis practitioners to promote the cross-fertilization of ideas.



Talks will be held at Stanley Hall, Room 105. Here it is on a UC Berkeley map and a Google map.
The MMDS poster session will be held jointly with an evening reception on Thursday, June 23rd.
Interana is sponsoring MMDS 2016.
Join us on Tuesday, June 21st for an evening reception of food, drinks, and data science.
Early registration for the MMDS 2016 Workshop is extended till May 8th.
Call for posters: In addition to the talks there will be a poster session. You may apply to present a poster on the registration page.
Our new YouTube channel is now online with video recordings of talks from the 2012 & 2014 workshops.
Early registration for the MMDS 2016 Workshop is now open. Deadline: May 1st.
BIDS is sponsoring MMDS 2016.
Founded in 2013, the Berkeley Institute for Data Science (BIDS) is a central hub of research and education at UC Berkeley designed to facilitate and nurture data-intensive science.
The MMDS 2016 workshop will take place on the UC Berkeley Campus Tuesday, June 21 through Friday, June 24.
The 5th MMDS Workshop on Algorithms for Modern Massive Data Sets was held June 17–20, 2014, in Berkeley, CA. Video recordings of all the talks may be found on our YouTube channel. Download the full MMDS 2014 program here.