MMDS 2014 Workshop Registration
MMDS 2014. Workshop on Algorithms for Modern Massive Data Sets. UC Berkeley, CA. Tuesday, June 17 to Friday, June 20. Talks will be held in Stanley Hall. Applications for poster presentations are still open. Submit an abstract soon.
Workshop on Algorithms for Modern Massive Data Sets (MMDS)
The Workshops on Algorithms for Modern Massive Data Sets (MMDS) address algorithmic and statistical challenges in modern large-scale data analysis. The goals of this series of workshops are to explore novel techniques for modeling and analyzing massive, high-dimensional, and nonlinearly-structured scientific and internet data sets; and to bring together computer scientists, statisticians, mathematicians, and data analysis practitioners to promote the cross-fertilization of ideas.
Ayasdi Sponsors First Night's Reception
Join Ayasdi on June 17th for an evening reception of food, drinks, and data science! Ayasdi will be giving live demos of their software and answering all of your questions about Topological Data Analysis.
MMDS 2014 Workshop Announcement
The 2014 Workshop on Algorithms for Modern Massive Data Sets (MMDS 2014) will have four days of presentations by experts from academia and industry, as well as discussion and poster presentations. Planned workshop themes include: large-scale statistical data analysis; scientific and industrial applications; algorithmic and statistical approaches to data; matrix and graph methods; and large-scale computing and machine learning. The event takes place on the UC Berkeley campus from Tuesday, June 17 through Friday, June 20.
Workshop Schedule
Download the full MMDS 2014 program here.
|Tue, June 17||Data Analysis and Statistical Data Analysis|
|08:00 -||09:45||Breakfast and registration||*|
|09:45 -||10:00||Welcome and opening remarks||Organizers|
|10:00 -||11:00||Large Scale Machine Learning at Verizon||Ashok Srivastava|
|11:00 -||11:30||Communication Cost in Big Data Processing||Dan Suciu|
|11:30 -||12:00||Content-based search in 50TB of consumer-produced videos||Gerald Friedland|
|02:00 -||02:30||Myria: Scalable Analytics as a Service||Bill Howe|
|02:30 -||03:00||Computing stationary distribution, locally||Devavrat Shah|
|03:00 -||03:30||Spectral algorithms for graph mining and analysis||Yiannis Koutis|
|03:30 -||04:00||Network community detection||Jiashun Jin|
|04:00 -||04:30||Coffee break||*|
|04:30 -||05:00||Optimal CUR Matrix Decompositions||David Woodruff|
|05:00 -||05:30||Dimensionality reduction via sparse matrices||Jelani Nelson|
|05:30 -||06:00||Influence sampling for generalized linear models||Jinzhu Jia|
|06:00 -||09:00||Dinner Reception||*|
|Wed, June 18||Industrial and Scientific Applications|
|09:00 -||10:00||Counterfactual reasoning and massive data sets||Leon Bottou|
|10:00 -||10:30||Connected Components in MapReduce and Beyond||Sergei Vassilvitskii|
|10:30 -||11:00||Coffee break||*|
|11:00 -||11:30||Distributing Large-scale Recommendation Algorithms: from GPUs to the Cloud||Xavier Amatriain|
|11:30 -||12:00||Disentangling sources of risk in massive financial portfolios||Jeffrey Bohn|
|02:30 -||03:00||Localized Methods for Diffusions in Large Graphs||David Gleich|
|03:00 -||03:30||FAST-PPR: Scaling Personalized PageRank Estimation for Large Graphs||Ashish Goel|
|03:30 -||04:00||Locally-biased and semi-supervised eigenvectors||Michael Mahoney|
|04:00 -||04:30||Coffee break||*|
|04:30 -||05:00||Optimal Shrinkage of Fast Singular Values||Matan Gavish|
|05:00 -||05:30||Dimension Independent Matrix Square using MapReduce||Reza Zadeh|
|Thu, June 19||Novel Algorithmic Approaches|
|09:00-||10:00||Analyzing Big Graphs via Sketching and Streaming||Andrew McGregor|
|10:00-||10:30||Large-Scale Inference in Time Domain Astrophysics||Joshua Bloom|
|11:00-||11:30||Exploring "forgotten" one-shot learning||Alek Kolcz|
|11:30-||12:00||Modeling Dynamics of Opinion Formation in Social Networks||Sreenivas Gollapudi|
|12:00-||12:30||Multi-reference Alignment: Estimating Group Transformations using Semidefinite Programming||Amit Singer|
|02:30-||03:00||IPython: a language-independent framework for computation and data||Fernando Perez|
|03:00-||03:30||Reducing Communication in Parallel Graph Computations||Aydin Buluc|
|03:30-||04:00||Large Scale Graph-Parallel Computation for Machine Learning: Applications and Systems||Joseph Gonzalez|
|04:30-||05:00||CUR Factorization via Discrete Empirical Interpolation||Mark Embree|
|05:00-||05:30||Leverage scores: Sensitivity and an App||Ilse Ipsen|
|05:30-||06:00||libSkylark: Sketching-based Accelerated Numerical Linear Algebra and Machine Learning for Distributed-memory Systems||Vikas Sindhwani|
|06:00-||09:00||Dinner Reception and Poster Session||*|
|Fri, June 20||Novel Matrix and Graph Methods|
|09:00-||10:00||Large-Scale Numerical Computation Using a Data Flow Engine||Matei Zaharia|
|10:00-||10:30||Automatic discovery of cell types and microcircuitry from neural connectomics||Eric Jonas|
|11:00-||11:30||Beyond Locality Sensitive Hashing||Alexandr Andoni|
|11:30-||12:00||Combinatorial optimization and sparse computation for large scale data mining||Dorit Hochbaum|
|12:00-||12:30||Public Participation in International Security - Open Source Treaty Verification||Christopher Stubbs|
|02:30-||03:00||The Hearts and Minds of Data Science||Cecilia Aragon|
|03:00-||03:30||The fall and rise of geometric centralities||Sebastiano Vigna|
|03:30-||04:00||Mixed Regression||Constantine Caramanis|
|04:00-||04:30||No Free Lunch for Stress Testers: Toward a Normative Theory of Scenario-Based Risk Assessment||Lisa Goldberg|
|Alexandr Andoni||Microsoft Research|
|Amit Singer||Princeton University|
|Andrei Kirilenko||MIT Sloan School of Management|
|Andrew McGregor||University of Massachusetts|
|Anna Gilbert||University of Michigan|
|Ashish Goel||Stanford University|
|Aydin Buluc||Berkeley Lab|
|Ben Recht||UC Berkeley|
|Bill Howe||University of Washington eScience Institute|
|Cecilia Aragon||University of Washington|
|Christopher Stubbs||Harvard University|
|Constantine Caramanis||UT Austin|
|Dan Suciu||University of Washington|
|David Gleich||Purdue University|
|David Woodruff||IBM Research Almaden|
|Dorit Hochbaum||UC Berkeley|
|Eric Jonas||UC Berkeley|
|Fernando Perez||UC Berkeley|
|Ilse Ipsen||North Carolina State University|
|Jeffrey Bohn||State Street|
|Jelani Nelson||Harvard University|
|Jiashun Jin||Carnegie Mellon University|
|Jinzhu Jia||Peking University|
|Joseph Gonzalez||UC Berkeley|
|Joshua Bloom||UC Berkeley|
|Leon Bottou||Microsoft Research|
|Lisa Goldberg||University of California, Berkeley|
|Mark Embree||Virginia Tech|
|Matan Gavish||Stanford University|
|Matei Zaharia||Databricks, MIT|
|Michael Mahoney||UC Berkeley|
|Reza Zadeh||Stanford University|
|Sebastiano Vigna||Università degli Studi di Milano|
|Sreenivas Gollapudi||Microsoft Research|
|Vikas Sindhwani||IBM Research|
|Yiannis Koutis||University of Puerto Rico - Rio Piedras|
The program for MMDS 2014 will be structured around three related foci: theoretical foundations; novel implementations; and diverse applications.
Applications to be discussed include astrophysics, genetics, finance, telecommunications, earthquake monitoring, defense and international treaty verification, business analytics, internet advertising and analysis, and social network analysis.
Implementation topics will include MapReduce, Spark, and related frameworks, and extending these frameworks to do iterative matrix algorithms and large-scale machine learning and graph analytics; systems for reducing communication in parallel and distributed graph computations; systems for distributed randomized numerical linear algebra; IPython and scalable analytics as a service; and scaling novel theoretical methods up to tera-scale problems and beyond.
Theoretical topics will include sketching, streaming, and projection algorithms for matrix and graph problems; randomized numerical linear algebra methods; communication-aware matrix and graph algorithms; localized spectral and diffusion methods for large-scale graph computations; and novel developments in locality-sensitive hashing, large-scale optimization, etc.
Titles for this year's talks include:
|Exploring "forgotten" one-shot learning (Alek Kolcz)|
|Beyond Locality Sensitive Hashing (Alexandr Andoni)|
|Multi-reference Alignment: Estimating Group Transformations using Semidefinite Programming (Amit Singer)|
|Do U.S. Regulators Listen to the Public? Testing the Regulatory Process with the RegRank Algorithm (Andrei Kirilenko)|
|Analyzing Big Graphs via Sketching and Streaming (Andrew McGregor)|
|Data-based inverse problems (Anna Gilbert)|
|FAST-PPR: Scaling Personalized PageRank Estimation for Large Graphs (Ashish Goel)|
|Large Scale Machine Learning at Verizon (Ashok Srivastava)|
|Reducing Communication in Parallel Graph Computations (Aydin Buluc)|
|Myria: Scalable Analytics as a Service (Bill Howe)|
|The Hearts and Minds of Data Science (Cecilia Aragon)|
|Public Participation in International Security - Open Source Treaty Verification (Christopher Stubbs)|
|Mixed Regression (Constantine Caramanis)|
|Communication Cost in Big Data Processing (Dan Suciu)|
|Localized Methods for Diffusions in Large Graphs (David Gleich)|
|Optimal CUR Matrix Decompositions (David Woodruff)|
|Computing stationary distribution, locally (Devavrat Shah)|
|Combinatorial optimization and sparse computation for large scale data mining (Dorit Hochbaum)|
|Automatic discovery of cell types and microcircuitry from neural connectomics (Eric Jonas)|
|IPython: a language-independent framework for computation and data (Fernando Perez)|
|Content-based search in 50TB of consumer-produced videos (Gerald Friedland)|
|Leverage scores: Sensitivity and an App (Ilse Ipsen)|
|Disentangling sources of risk in massive financial portfolios (Jeffrey Bohn)|
|Dimensionality reduction via sparse matrices (Jelani Nelson)|
|Network community detection (Jiashun Jin)|
|Influence sampling for generalized linear models (Jinzhu Jia)|
|Large Scale Graph-Parallel Computation for Machine Learning: Applications and Systems (Joseph Gonzalez)|
|Large-Scale Inference in Time Domain Astrophysics (Joshua Bloom)|
|Counterfactual reasoning and massive data sets (Leon Bottou)|
|No Free Lunch for Stress Testers: Toward a Normative Theory of Scenario-Based Risk Assessment (Lisa Goldberg)|
|CUR Factorization via Discrete Empirical Interpolation (Mark Embree)|
|Optimal Shrinkage of Fast Singular Values (Matan Gavish)|
|Large-Scale Numerical Computation Using a Data Flow Engine (Matei Zaharia)|
|Locally-biased and semi-supervised eigenvectors (Michael Mahoney)|
|APPROX: Accelerated, Parallel and PROXimal coordinate descent (Peter Richtarik)|
|Dimension Independent Matrix Square using MapReduce (Reza Zadeh)|
|The fall and rise of geometric centralities (Sebastiano Vigna)|
|Connected Components in MapReduce and Beyond (Sergei Vassilvitskii)|
|Modeling Dynamics of Opinion Formation in Social Networks (Sreenivas Gollapudi)|
|libSkylark: Sketching-based Accelerated Numerical Linear Algebra and Machine Learning for Distributed-memory Systems (Vikas Sindhwani)|
|Distributing Large-scale Recommendation Algorithms: from GPUs to the Cloud (Xavier Amatriain)|
|Spectral algorithms for graph mining and analysis (Yiannis Koutis)|
Hotels
The following is a list of recommended hotel and bed & breakfast options. Most of the locations are within a 30-minute walk of Stanley Hall.
Rose Garden Inn
Berkeley Lab Guest House
DoubleTree by Hilton
Claremont Hotel Club & Spa
Mary's Bed and Breakfast
The Brick Path Bed and Breakfast
Directions & Parking
BART from Oakland International Airport: Take the AirBART shuttle ($3) to the Oakland Coliseum BART station. At the BART station, purchase a ticket to Downtown Berkeley. Board a Richmond train (orange line), which will take you directly to the Downtown Berkeley station. Travel time: 45 minutes.
General information on public transportation and parking around UC Berkeley may be found here and here. Campus parking is available at the Underhill Structure, located on Channing Way. The closest off-campus parking is the City of Berkeley Telegraph Channing Garage, located at 2450 Durant Ave., between Dana St. and Telegraph Ave. More parking information may be found at this link.
Announcements
The program for MMDS 2014 is now available for download.
Evening receptions will be held on Tuesday, June 17th and Thursday, June 19th. Join Ayasdi on June 17th for an evening reception of food, drinks, and data science. The Thursday reception will be held jointly with the MMDS poster session.
Early registration for the MMDS 2014 workshop extended to May 7th.
Call for posters: In addition to the talks, there will be a poster session during one of the receptions. You may apply to present a poster on the registration page.
Early registration for the MMDS 2014 workshop will remain open through May 1st.
MMDS 2014 will be held on the UC Berkeley campus from Tuesday, June 17 through Friday, June 20.
Latest news
MMDS 2014 talks will take place in Stanley Hall.
GraphLab is hosting its 3rd conference on Monday, July 21, 2014 at the Nikko Hotel in San Francisco. More information about this event may be found here.
Organizing committee
Michael Mahoney (Chair), ICSI and Department of Statistics, UC Berkeley.
Alexander Shkolnik, Institute for Computational Mathematics and Engineering, Stanford University.
Petros Drineas, Department of Computer Science, Rensselaer Polytechnic Institute.
Reza Zadeh, Institute for Computational Mathematics and Engineering, Stanford University.
Fernando Perez, Henry H. Wheeler Jr. Brain Imaging Center, UC Berkeley.