Speakers and Abstracts
Long presentation (4-6 hours)
Gunnar Raetsch - Max Planck Tuebingen / Sloan-Kettering Cancer Center, NY Kernel methods, computational biology applications (abstract/bio/slides/video)
Short presentations (1-2 hours)
Chih-Jen Lin - eBay San Jose and National Taiwan University Machine learning software: design and practical use (abstract/bio/slides/video)
David Haussler - University of California at Santa Cruz Analysis of Cancer Genomics (abstract/bio/slides/video: Part1)
Asli Celikyilmaz - Microsoft Research Mountain View Language Understanding (abstract/bio/video: Part1)
Introduction to Supervised, Unsupervised and Partially-supervised Training algorithms ( Slides: lec0, lec1, lec2, lec3; video: Part1, Part2, Part3)
This course will provide a simple unified introduction to batch training algorithms for supervised, unsupervised and partially-supervised learning. The concepts introduced will provide a basis for the more advanced topics in other lectures.
The first part of the course will cover supervised training algorithms, establishing a general foundation through a series of extensions to linear prediction, including nonlinear input transformations (features), L2 regularization (kernels), prediction uncertainty (Gaussian processes), L1 regularization (sparsity), nonlinear output transformations (matching losses), surrogate losses (classification), multivariate prediction, and structured prediction. Relevant optimization and modeling concepts will be acquired along the way.
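The progression from linear prediction to L2 regularization can be made concrete with a one-dimensional toy example (a sketch for orientation, not taken from the lecture materials; the data and penalty value are made up):

```python
# Ridge regression in one dimension: the L2 penalty shrinks the
# least-squares solution toward zero.
def ridge_1d(xs, ys, lam):
    """Closed-form minimizer of sum (y - w*x)^2 + lam * w^2."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]          # generated by y = 2x
w0 = ridge_1d(xs, ys, 0.0)    # -> 2.0, the unregularized fit
w1 = ridge_1d(xs, ys, 14.0)   # -> 1.0, shrunk by the penalty
print(w0, w1)
```

Replacing the raw inputs x with nonlinear features, and rewriting the solution in terms of inner products between training points, is exactly the route to kernels sketched in the abstract.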
The second part of the course will cover unsupervised training algorithms, including dimensionality reduction, clustering, coding, and simple extensions. The simple connection between unsupervised and supervised learning will be emphasized, and basic convex relaxations of the training problems will be introduced.
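Clustering by k-means is one of the simplest unsupervised algorithms in this family; a sketch on scalar data (illustrative only, not from the lecture slides):

```python
def kmeans_1d(points, centers, iters=10):
    """Lloyd's algorithm on scalars: assign each point to its nearest
    center, then move each center to the mean of its cluster."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            i = min(range(len(centers)), key=lambda j: abs(p - centers[j]))
            clusters[i].append(p)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)

pts = [0.0, 0.2, 0.4, 9.6, 9.8, 10.0]      # two tight groups
centers = kmeans_1d(pts, [0.1, 5.0])
print(centers)                              # centers settle near 0.2 and 9.8
```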
Time permitting, the last part of the course covers partially-supervised learning---the problem of learning an input representation concurrently with a predictor. A brief overview of current research will be presented, including recent work on boosting and convex relaxations.
Bio: Dale Schuurmans is a Professor of Computing Science and Canada Research Chair in Machine Learning at the University of Alberta. He received his PhD in Computer Science from the University of Toronto, and has been employed at the National Research Council Canada, University of Pennsylvania, NEC Research Institute and the University of Waterloo. He is an Associate Editor of JAIR and AIJ, and currently serves on the IMLS and NIPS Foundation boards. He has previously served as a Program Co-chair for NIPS-2008 and ICML-2004, and as an Associate Editor for IEEE TPAMI, JMLR and MLJ. His research interests include machine learning, optimization, probability models, and search. He is author of more than 100 refereed publications in these areas and has received paper awards at IJCAI, AAAI, ICML, IEEE ICAL and IEEE ADPRL.
Machine learning for object recognition ( Slides: lec1(pdf/pptx) lec2(pdf/pptx) poselet paper; Video: Part 1, Part 2, Part 3)
I will begin my lectures with the canonical, well-studied problem of handwritten digit recognition, where greater than 99% accuracy has been achieved with a number of different techniques: neural networks, support vector machines, nearest neighbor, and randomized decision trees. We will move on to the problem of object detection, where the object is at an unknown scale and location in the image. The classic examples here are face detection and pedestrian detection, both of which have good, practically usable solutions. For general 3d objects, the challenges due to varying pose, articulation, and occlusion necessitate the use of part-based approaches. I will talk about two of the leading algorithms - the deformable part based model of Felzenszwalb et al. (PAMI 2010) and the poselet approach developed in my group (Bourdev et al., 2009, 2010, 2011).
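The nearest-neighbor technique mentioned above reduces to a few lines of code (a toy sketch with made-up 2-D points rather than digit images):

```python
def nearest_neighbor(train, query):
    """1-NN: label the query with the class of the closest
    training point (squared Euclidean distance)."""
    def d2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, label = min(train, key=lambda t: d2(t[0], query))
    return label

train = [((0.0, 0.0), "zero"), ((0.1, 0.2), "zero"),
         ((5.0, 5.0), "one"),  ((4.8, 5.1), "one")]
print(nearest_neighbor(train, (0.3, 0.1)))  # -> "zero"
print(nearest_neighbor(train, (4.9, 4.9)))  # -> "one"
```

For digits, each training point would be a vector of pixel intensities; the algorithm itself is unchanged.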
Bio: Jitendra Malik was born in Mathura, India in 1960. He received the B.Tech degree in Electrical Engineering from the Indian Institute of Technology, Kanpur in 1980 and the PhD degree in Computer Science from Stanford University in 1985. In January 1986, he joined the University of California at Berkeley, where he is currently the Arthur J. Chick Professor in the Computer Science Division, Department of Electrical Engineering and Computer Sciences. He is also on the faculty of the Department of Bioengineering, and the Cognitive Science and Vision Science groups. During 2002-2004 he served as the Chair of the Computer Science Division and during 2004-2006 as the Department Chair of EECS. He serves on the advisory board of Microsoft Research India, and on the Governing Body of IIIT Bangalore.
Prof. Malik's research group has worked on many different topics in computer vision, computational modeling of human vision, computer graphics and the analysis of biological images, resulting in more than 150 research papers and 30 PhD dissertations. Several well-known concepts and algorithms arose in this research, such as anisotropic diffusion, normalized cuts, high dynamic range imaging, and shape contexts. According to Google Scholar, seven of his papers have received more than a thousand citations each, and he is one of ISI's Highly Cited Researchers in Engineering.
He received the gold medal for the best graduating student in Electrical Engineering from IIT Kanpur in 1980 and a Presidential Young Investigator Award in 1989. At UC Berkeley, he was selected for the Diane S. McEntyre Award for Excellence in Teaching in 2000, a Miller Research Professorship in 2001, and appointed to be the Arthur J. Chick Professor in 2002. He received the Distinguished Alumnus Award from IIT Kanpur in 2008. He was awarded the Longuet-Higgins Prize for a contribution that has stood the test of time twice, in 2007 and in 2008. He is a fellow of the IEEE and the ACM, and a member of the National Academy of Engineering.
Online and distributed learning (Slides: lec1(pdf/pdf for print/pptx) lec2 (pdf/pptx) lec3(pdf/pptx); video)
Avrim Blum - Carnegie Mellon University
This tutorial will discuss simple online learning algorithms with surprisingly strong guarantees, as well as new models for learning over distributed data, and some new ways of looking at powerful classic machine learning tools. Online learning algorithms address questions like: how should I decide what route to drive to work each day if I have to choose before I know what traffic will be like? How should a seller adapt prices based on demand in real-time? In the context of online learning, I will talk about algorithms for "combining expert advice", "sleeping experts", and "bandit" problems, connections to game-theoretic notions of minimax optimality, and online algorithms for learning linear threshold functions in large feature spaces. Using these as a backdrop, I will then discuss interesting ways to analyze and think about learning from general similarity functions, building on the classic theory of kernel functions. Finally, I will discuss a model for distributed machine learning, where different parties hold different portions of a dataset and one would like to learn an overall accurate rule while minimizing the communication involved.
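The "combining expert advice" setting can be illustrated with the classic weighted-majority scheme (a toy sketch, not from the tutorial's slides; the penalty factor 1 - eta and the four rounds of data are made up):

```python
def multiplicative_weights(expert_preds, outcomes, eta=0.5):
    """Weighted majority: predict with the weighted vote each round,
    then scale the weight of every expert that erred by (1 - eta)."""
    n = len(expert_preds[0])
    w = [1.0] * n
    mistakes = 0
    for preds, y in zip(expert_preds, outcomes):
        vote = sum(wi for wi, p in zip(w, preds) if p == 1)
        guess = 1 if vote >= sum(w) / 2 else 0
        mistakes += guess != y
        w = [wi * (1 - eta) if p != y else wi for wi, p in zip(w, preds)]
    return mistakes, w

# Expert 0 is always right; experts 1 and 2 are noisy.
preds = [[1, 0, 1], [0, 0, 1], [1, 1, 0], [0, 1, 1]]
ys = [1, 0, 1, 0]
m, w = multiplicative_weights(preds, ys)
print(m, w)  # expert 0 keeps weight 1.0; the noisy experts are downweighted
```

The guarantee behind this scheme is that the algorithm's mistake count is within a constant factor (plus a log term) of the best single expert in hindsight.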
Bio: Avrim Blum is Professor of Computer Science at Carnegie Mellon University. His main research interests are in Machine Learning Theory, Approximation and Online Algorithms, and Algorithmic Game Theory, and he has also worked in AI Planning. He has served as Program Chair for the IEEE Symposium on Foundations of Computer Science (FOCS) and the Conference on Learning Theory (COLT), as well as on the organizing committee for the National Academy of Sciences U.S. Frontiers of Science Symposium. He was recipient of the Sloan Fellowship and NSF National Young Investigator Awards, the ICML/COLT 10-year best paper award, and is a Fellow of the ACM.
An Introduction to Kernel Methods for Classification, Regression and the Analysis of Structured Data ( Slides;)
Kernel methods have become very popular in machine learning research and many fields of application. This tutorial will introduce kernels, their basic properties, and methods that take advantage of them. We will use real-world problems from computational biology and beyond as examples to illustrate how to select and engineer an appropriate kernel function. The tutorial will begin with a presentation of kernel methods and their properties, followed by an introduction to the theory of support vector algorithms such as support vector machines, support vector regression, and kernel principal component analysis. We will also briefly discuss optimization techniques for obtaining solutions, and variations such as ν-SVMs and C-SVMs. We will also discuss how kernel methods can be used for structured output prediction and nonparametric statistical inference. In the last part, we will show how kernel methods can be applied to problems in computational biology.
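As a concrete illustration of a kernel function and the Gram matrix it induces (a generic sketch, not one of the tutorial's computational-biology examples; the points and gamma value are made up):

```python
import math

def rbf_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel: k(x, z) = exp(-gamma * ||x - z||^2).
    It equals 1 when x == z and decays with distance."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * d2)

X = [(0.0, 0.0), (1.0, 0.0), (3.0, 4.0)]
K = [[rbf_kernel(x, z) for z in X] for x in X]   # the Gram matrix
print(K[0][0])   # 1.0: every point is maximally similar to itself
print(K[0][1])   # exp(-1), about 0.368
```

A kernel method never needs the feature vectors themselves, only this symmetric positive semi-definite matrix of pairwise similarities, which is why the same machinery extends to strings, graphs, and biological sequences.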
Bio: Dr. Gunnar Rätsch studied computer science and physics and obtained his Ph.D. degree in computer science in 2001 with his work in Machine Learning at the Fraunhofer Institute FIRST in Berlin. He was a postdoctoral fellow at the Research School of Information Sciences and Engineering of the Australian National University in Canberra (Australia), at the Max Planck Institute for Biological Cybernetics in Tübingen (Germany), and at Fraunhofer FIRST in Berlin (Germany). In 2002, he received the Michelson award for his Ph.D. work and in 2007 he was awarded the Olympus prize from the German Association for Pattern Recognition. Between 2005 and 2011 he led a research group at the Friedrich Miescher Laboratory of the Max Planck Society in Tübingen (Germany). In January 2012 he and his group moved to the Memorial Sloan-Kettering Cancer Center in New York City (USA). In their research, they analyze and model transcription and RNA processing as well as the regulation thereof and contribute to the development of techniques ranging from machine learning, sequence analysis, and optimization, to genetics, statistical testing, and image analysis.
Machine learning poses data-driven optimization problems. Computing the function value and gradients for these problems is challenging because they often involve thousands of variables and millions of training data points. Such problems can often be cast as convex optimization problems, so a lot of recent research has focused on designing specialized optimization algorithms for them. In this talk, I will present a high-level overview of a few such algorithms that were recently developed. The talk will be broadly accessible and will have plenty of fun pictures and illustrations!
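As a minimal illustration of function-value and gradient computation in such problems (a toy sketch, orders of magnitude below the scale discussed in the talk; the data and learning rate are made up):

```python
import math

def grad_step(w, data, lr=0.1):
    """One gradient-descent step on the logistic loss
    L(w) = sum log(1 + exp(-y * w * x)) over (x, y) pairs, y in {-1, +1}."""
    g = sum(-y * x / (1.0 + math.exp(y * w * x)) for x, y in data)
    return w - lr * g

# a 1-D dataset separable by any w > 0
data = [(1.0, 1), (2.0, 1), (-1.0, -1), (-1.5, -1)]
w = 0.0
for _ in range(200):
    w = grad_step(w, data)
print(w)   # a positive weight that classifies every training point correctly
```

With millions of points, even this single full-gradient sum becomes the bottleneck, which is what motivates the stochastic and distributed variants the talk surveys.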
Bio: S V N Vishwanathan is an Associate Professor in the Departments of Statistics and Computer Science at Purdue University. Prior to coming to Purdue in fall 2008, Vishwanathan was a principal researcher in the Statistical Machine Learning program of NICTA, with an adjunct appointment at the College of Engineering and Computer Science, Australian National University. He received his ME and Ph.D. from the Indian Institute of Science in 2000 and 2003, respectively. His research interests are in the broad area of machine learning, with emphasis on optimization, kernel methods, and structured prediction.
Joint probability models are useful for unsupervised learning ("knowledge discovery") as well as supervised learning tasks where we want to predict multiple correlated outputs ("collective classification"). One approach to representing such models is to explicitly model the correlations using a graphical model, where we add edges between correlated variables (nodes). Another approach is to implicitly model the correlations via a set of latent common "causes" or factors; this induces dependence between the visible variables without the need to add explicit edges between them. Such models are also often represented in graphical terms, but the structure of the graph is fixed in advance.
In this tutorial, we will discuss both classes of models; we will discuss how to infer latent variables using both exact and approximate methods; and how to estimate or learn parameters from data. The tutorial should be accessible to anyone with a basic understanding of probability and machine learning (e.g., at the level of naive Bayes).
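Estimating parameters in the presence of latent variables can be illustrated by the classic two-coin mixture fit with EM (a toy sketch, not an example from the tutorial; the session data and starting guesses are made up):

```python
def em_two_coins(flips, pA, pB, iters=50):
    """EM for a two-coin mixture: each (heads, n) session was generated by
    one of two hidden coins; compute soft assignments (E-step), then
    re-estimate each coin's heads probability (M-step)."""
    for _ in range(iters):
        hA = tA = hB = tB = 0.0
        for heads, n in flips:
            la = pA ** heads * (1 - pA) ** (n - heads)
            lb = pB ** heads * (1 - pB) ** (n - heads)
            rA = la / (la + lb)            # P(coin A | session)
            hA += rA * heads; tA += rA * (n - heads)
            hB += (1 - rA) * heads; tB += (1 - rA) * (n - heads)
        pA = hA / (hA + tA)
        pB = hB / (hB + tB)
    return pA, pB

# (heads, flips) per session: two heads-heavy and two tails-heavy sessions
flips = [(9, 10), (8, 10), (2, 10), (1, 10)]
pA, pB = em_two_coins(flips, 0.6, 0.5)
print(round(pA, 2), round(pB, 2))  # one coin near 0.85, the other near 0.15
```

The hidden coin identity plays exactly the role of the latent factor in the abstract: it induces correlation among the observed flips without any explicit edges between them.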
Bio: Kevin Patrick Murphy was born in Ireland, grew up in England, went to graduate school in the USA (MEng from U. Penn, PhD from UC Berkeley, Postdoc at MIT), and then became a professor in the Computer Science and Statistics Departments at the University of British Columbia in Vancouver, Canada in 2004. After getting tenure, Kevin went to Google in Mountain View, California for his sabbatical. In 2011, he became a full-time research scientist at Google. Kevin has published over 50 papers in refereed conferences and journals related to machine learning and graphical models. He has recently published an 1100-page textbook called "Machine Learning: a Probabilistic Perspective" (MIT Press, 2012).
Prediction markets are financial markets designed to aggregate opinions across large populations of traders. A typical prediction market offers a set of securities with payoffs determined by the future state of the world. For example, a market might offer a security worth $1 if Barack Obama is re-elected in 2012 and $0 otherwise. Roughly speaking, a trader who believes the probability of Obama's re-election is p should be willing to buy this security at any price less than $p and (short) sell this security at any price greater than $p. For this reason, the going price of this security could be interpreted as traders' collective belief about the likelihood of Obama's re-election. Prediction markets have been used to generate accurate forecasts in a variety of domains including politics, disease surveillance, business, and entertainment, and are cited in the media increasingly often.
This tutorial will cover some of the basic mathematical ideas used in the design of prediction markets, and illustrate several fundamental connections between these ideas and techniques used in machine learning. We will begin with an overview of proper scoring rules, which can be used to measure the accuracy of a single entity's prediction, and are closely related to proper loss functions. We will then discuss market scoring rules, automated market makers based on proper scoring rules which can be used to aggregate the predictions of many forecasters, and describe how market scoring rules can be implemented as inventory-based markets in which securities are bought and sold. We will describe recent research exploring a duality-based interpretation of market scoring rules which can be exploited to design new markets that can be run efficiently over very large state spaces. Finally, time permitting, we will explore the fundamental mathematical connections between market scoring rules and two areas of machine learning: online "no-regret" learning and variational inference with exponential families.
This tutorial will be self-contained. No background on markets or specific areas of machine learning is required.
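The logarithmic market scoring rule (LMSR), the standard example of such an automated market maker, can be sketched directly from its cost function (a toy illustration, not from the tutorial's materials; the liquidity parameter b = 100 and the 40-share trade are arbitrary):

```python
import math

def lmsr_cost(q, b=100.0):
    """LMSR cost function: C(q) = b * log(sum_i exp(q_i / b)), where q_i is
    the number of outstanding shares on outcome i. A trade moving the
    outstanding shares from q to q' costs C(q') - C(q)."""
    return b * math.log(sum(math.exp(qi / b) for qi in q))

def lmsr_prices(q, b=100.0):
    """Instantaneous prices: the softmax of q. They are positive and sum
    to 1, so they can be read as the market's probability estimate."""
    z = sum(math.exp(qi / b) for qi in q)
    return [math.exp(qi / b) / z for qi in q]

q = [0.0, 0.0]                                  # two outcomes, no trades yet
p_start = lmsr_prices(q)                        # [0.5, 0.5]
cost = lmsr_cost([40.0, 0.0]) - lmsr_cost(q)    # buy 40 shares of outcome 0
p = lmsr_prices([40.0, 0.0])
print(p_start, round(cost, 2), round(p[0], 3))  # buying pushes outcome 0's price up
```

Note the trade costs more than 40 times the starting price of 0.5 and less than 40 times the final price: the trader pays the integral of the rising price along the way.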
Bio: Jenn Wortman Vaughan is an assistant professor in the Computer Science Department at UCLA and a member of the UCLA Center for Engineering Economics, Learning, and Networks. She completed her Ph.D. at the University of Pennsylvania in 2009, and subsequently spent a year as a Computing Innovation Fellow at Harvard. Her research interests are in machine learning, algorithmic economics, and social computing. She is the recipient of Penn's 2009 Rubinoff dissertation award for innovative applications of computer technology, a National Science Foundation CAREER award, and best paper or best student paper awards at COLT, ACM EC, and UAI. In her "spare" time, she is involved in a variety of efforts to provide support for women in computer science; most notably, she co-founded the Annual Workshop for Women in Machine Learning, which will be held for the seventh time in 2012.
Boosting is a general method for producing a very accurate classification rule by combining rough and moderately inaccurate "rules of thumb". While rooted in a theoretical framework of machine learning, boosting has been found to perform quite well empirically.
This tutorial will introduce the boosting algorithms AdaBoost, LogitBoost and BrownBoost, and explain the underlying theory of boosting, including explanations that have been given as to why boosting often does not suffer from overfitting. Several theoretical points of view of boosting will be presented. Some practical applications and extensions of boosting will be described. Finally, some open problems for future work will be presented.
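A minimal AdaBoost with decision stumps shows the reweighting mechanism at the heart of all these variants (a toy sketch, not code from the tutorial; the six-point dataset is made up):

```python
import math

def stump(t, s):
    """Decision stump on the line: predict s if x > t, else -s."""
    return lambda x: s if x > t else -s

def adaboost(xs, ys, rounds=3):
    """AdaBoost with exhaustive 1-D stumps: each round picks the stump with
    the smallest weighted error e, weights it by alpha = 0.5*ln((1-e)/e),
    and raises the weight of the examples that stump got wrong."""
    n = len(xs)
    w = [1.0 / n] * n
    candidates = [stump(x + 0.5, s) for x in xs for s in (1, -1)]
    ensemble = []
    for _ in range(rounds):
        def werr(h):
            return sum(wi for wi, x, y in zip(w, xs, ys) if h(x) != y)
        h = min(candidates, key=werr)
        e = werr(h)
        alpha = 0.5 * math.log((1 - e) / e)
        ensemble.append((alpha, h))
        w = [wi * math.exp(-alpha * y * h(x)) for wi, x, y in zip(w, xs, ys)]
        z = sum(w)
        w = [wi / z for wi in w]
    def predict(x):
        return 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1
    return predict, ensemble

xs = [0, 1, 2, 3, 4, 5]
ys = [1, 1, -1, -1, 1, 1]        # no single stump can label this correctly
predict, ensemble = adaboost(xs, ys)
err = sum(predict(x) != y for x, y in zip(xs, ys)) / len(xs)
print(err)   # 0.0: three weighted stumps fit what one stump cannot
```

LogitBoost and BrownBoost change the loss that drives the reweighting, but keep this same skeleton of iteratively reweighted weak learners.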
Bio: Yoav Freund is a professor of Computer Science and Engineering at UC San Diego. His work is in the areas of machine learning, computational statistics, information theory, and their applications. He is best known for his joint work with Dr. Robert Schapire on the AdaBoost algorithm. For this work they were awarded the 2003 Gödel Prize in Theoretical Computer Science, as well as the Kanellakis Prize in 2004.
NLP for Smart People who Know Nothing about NLP (Slides [mirror]; video: Part1, Part2, Part3, Part4 , Part5)
Three decades ago, natural language processing researchers were writing grammars by hand. Now we're inventing new machine learning models, algorithms and frameworks. I'll discuss NLP's transition to statistical methods, with a focus on why we've been building our own new techniques rather than assimilating off-the-shelf machine learning techniques. The focus of this tutorial will be on "weakly supervised" settings, in which we have a learning task and some relevant data for solving that task. We'll cover problems that are standard in the machine learning literature (sequence labeling), but focus more on problems that are specific to language (e.g., grammar-based models tailored toward syntax, semantics and translation).
Bio: Hal Daumé III is an assistant professor in Computer Science at the University of Maryland, College Park. He holds joint appointments in UMIACS and Linguistics. He was previously an assistant professor in the School of Computing at the University of Utah. His primary research interest is in developing new learning algorithms for prototypical problems that arise in the context of language processing and artificial intelligence. This includes topics like structured prediction, domain adaptation and unsupervised learning, as well as multilingual modeling and affect analysis. He associates himself most with conferences like ACL, ICML, NIPS and EMNLP. He earned his PhD at the University of Southern California with a thesis on structured prediction for language (his advisor was Daniel Marcu). He spent the summer of 2003 working with Eric Brill in the machine learning and applied statistics group at Microsoft Research. Prior to that, he studied math (mostly logic) at Carnegie Mellon University.
Information Retrieval (IR) studies the application of computers to the acquisition, organization, storage, retrieval and distribution of information. Machine learning techniques are widely used in various IR applications, such as text classification, text clustering, recommendation and ranking. In this talk, we will first provide an overview of how machine learning techniques are used in IR. Then we will present major frontiers and challenges in IR and the opportunities for machine learning. Finally, we will describe our research on one of these challenges, proactive information retrieval (i.e., recommendation), and how machine learning techniques help us answer three questions: what to recommend, when to recommend, and how to recommend.
Bio: Yi Zhang is an Associate Professor in the School of Engineering, University of California, Santa Cruz. Her research interests are information retrieval, machine learning, data mining and natural language processing. She has received various awards, including the ACM SIGIR Best Paper Award, a National Science Foundation Faculty Career Award, an Air Force Young Investigator Award, a Google Research Award, a Microsoft Research Award, and an IBM Research Fellowship. She has served as program chair, area chair and PC member for various conferences. She is an associate editor for ACM Transactions on Information Systems. She has been a consultant or technical adviser for several large companies and startups. Dr. Zhang received her Ph.D. from the School of Computer Science at Carnegie Mellon University.
Metric spaces are very simple geometric structures: they comprise a set and a distance measure on that set that obeys the triangle inequality. I'll describe some basic metric space constructions, such as how to build new metric spaces from old ones, how to find "well-distributed" subsets of the spaces, and techniques for finding nearest neighbors.
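The axioms in question can be spot-checked mechanically on a finite point set (a toy sketch, not from the lecture; the example points are made up):

```python
def is_metric_on(points, d):
    """Spot-check the metric axioms on a finite set: d(a,a) = 0, symmetry,
    positivity for distinct points, and the triangle inequality."""
    for a in points:
        if d(a, a) != 0:
            return False
        for b in points:
            if d(a, b) != d(b, a) or (a != b and d(a, b) <= 0):
                return False
            for c in points:
                if d(a, c) > d(a, b) + d(b, c):
                    return False
    return True

manhattan = lambda a, b: sum(abs(x - y) for x, y in zip(a, b))
chebyshev = lambda a, b: max(abs(x - y) for x, y in zip(a, b))
combined = lambda a, b: max(manhattan(a, b), chebyshev(a, b))     # max of two metrics
sq_euclid = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))  # NOT a metric

pts = [(0, 0), (1, 2), (3, 1), (-2, 4)]
print(is_metric_on(pts, manhattan))   # True
print(is_metric_on(pts, combined))    # True: taking the max of metrics preserves the axioms
print(is_metric_on(pts, sq_euclid))   # False: squaring breaks the triangle inequality
```

Taking the pointwise maximum of two metrics is one of the "new metric spaces from old ones" constructions the talk alludes to; the failing squared-distance example shows why the triangle inequality is the axiom that does the real work in nearest-neighbor search.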
Bio: Ken Clarkson is manager of the Computer Science Principles and Methodologies department at IBM Almaden Research in San Jose, CA, and for many years before was a staff member at Bell Laboratories. His work has mostly been on geometric algorithms, and in particular on the use of randomization, for such problems as linear programming, nearest neighbor search in metric spaces, simple polygon triangulation, building compressed quadtrees, and computing convex hulls.
Computational drug design (Slides: lec1; )
Drugs are still discovered using human intuition. Machine learning and cloud compute applied to large quantities of laboratory data can change this and revolutionize drug design, making drugs cheaper, safer and more effective – if we get it right.
In this talk I will present the problems inherent in applying machine learning to this domain. I will describe the problem and introduce the key challenges of noise, bias and representation. I will describe how we address these problems at Numerate by considering the particular idiosyncrasies of the available data.
A particular challenge in this domain is the design of appropriate evaluation experiments. I will discuss the challenges of evaluating predictive models on retrospective data and the importance of optimizing the right metrics.
These problems, of noise, bias, representation, and method evaluation are critical to successful applications of machine learning more generally. In this talk I hope to illustrate how they may often be best addressed through reasoning about the problem domain.
Bio: Nigel Duffy is the CTO and Co-Founder of Numerate Inc., where he leads the development of big data technologies for drug design. He has a master's degree in mathematics from University College Dublin, Ireland, and a Ph.D. in machine learning from the University of California at Santa Cruz. Nigel has published work in computational chemistry, computational biology, computational economics, computational linguistics and theoretical computer science, and has applied machine learning in industry to spam filtering, computer games, and drug design. Under Nigel's management Numerate has developed big data technology, tools, and processes to design new drugs. Nigel has received more than $7M in competitively funded research awards.
In 2006, Netflix announced the Netflix Prize, a machine learning and data mining competition for movie rating prediction that offered $1 million to whoever improved the accuracy of our existing system by 10%. The goal was to find new ways to improve our recommendations. However, we had to come up with a proxy question that was easier to evaluate and quantify: the root mean squared error (RMSE) of the predicted rating. Some of the findings from the Netflix Prize are still used in our systems today. In particular, Matrix Factorization and Restricted Boltzmann Machines are important pieces in our rating prediction component.
But, Netflix personalization nowadays has much more than ratings predictions. Our machine-learned models now cover functionalities such as ranking, similarity, or row selection. And, besides accuracy, they are evaluated on metrics such as novelty, or freshness. In this presentation, I will describe some of the applications of Machine Learning to Netflix Personalization. I will also describe our innovation process in this space that involves thorough offline evaluation of our algorithms as well as continuous feedback from our users in the form of A/B test results.
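The rating-prediction component mentioned above can be illustrated with a bare-bones matrix factorization trained by stochastic gradient descent (a toy sketch with synthetic ratings and made-up hyperparameters, not Netflix's production model):

```python
import random

def factorize(ratings, n_users, n_items, k=2, lr=0.05, reg=0.02, epochs=500):
    """Matrix factorization for rating prediction: learn a k-dimensional
    vector per user and per item so that their dot product approximates
    the observed ratings (plain SGD on regularized squared error)."""
    rnd = random.Random(0)
    U = [[rnd.uniform(0.5, 1.0) for _ in range(k)] for _ in range(n_users)]
    V = [[rnd.uniform(0.5, 1.0) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(a * b for a, b in zip(U[u], V[i]))
            for f in range(k):
                uf, vf = U[u][f], V[i][f]
                U[u][f] += lr * (err * vf - reg * uf)
                V[i][f] += lr * (err * uf - reg * vf)
    return U, V

# (user, item, rating) triples: users 0-1 and 2-3 form two taste groups
ratings = [(0, 0, 5), (0, 1, 4), (1, 0, 4), (1, 1, 5),
           (2, 2, 5), (2, 3, 4), (3, 2, 4), (3, 3, 5)]
U, V = factorize(ratings, 4, 4)
rmse = (sum((r - sum(a * b for a, b in zip(U[u], V[i]))) ** 2
            for u, i, r in ratings) / len(ratings)) ** 0.5
print(round(rmse, 2))   # small training RMSE: the factors fit the observed ratings
```

The training RMSE computed in the last lines is exactly the proxy metric the Prize optimized; the unobserved dot products U[u]·V[i] are the predicted ratings.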
Bio: Xavier Amatriain manages a team of researchers and engineers creating next generation personalized experiences at Netflix. He is working on the cross-roads of machine learning, data science, software engineering, and agile innovation. Previous to this, he was a researcher focused on Recommender Systems and neighboring areas such as Data Mining, User Modeling, Social Networks, and e-Commerce. He has authored more than 50 papers in books, journals and international conferences, and has lectured in several universities such as UPF in Barcelona, and UCSB in California.
Recommender Systems & The Social Web
Recommender systems present items from a collection to the users of the system in a contextual setting. The items may be chosen to maximize one or more objectives, for example click-through-rate, conversion, user engagement, or revenue. These systems have three classical dimensions: users, items, and interactions such as click, buy, and watch. The pervasiveness of social networks has magnified the utility of recommender systems and all three dimensions have exploded in scale: more users, more heterogeneous items, and more modes of interaction.
In this talk we present the challenges and opportunities of applying machine learning, data mining, and statistical modeling techniques, from the simple to the sophisticated, to recommender problems in social networks. First, using real-world example applications deployed at LinkedIn and building from the foundational literature on content-based recommendation, collaborative filtering, and behavioral targeting, we arrive at the formalism of Social Filtering.
We then cover critical aspects of developing web-scale Social Recommender Systems, including infrastructure, feature engineering & model fitting. We describe some of the most fascinating challenges faced in the real-world setting of operating Recommender Systems, including scalability, offline vs. online tradeoffs, A/B testing and Multiple Objective Optimization.
Finally, we conclude with some new and unique paradigms of virtual profiling, social referral and intent-interest modeling, in the context of the LinkedIn Recommender System.
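The collaborative-filtering foundation referred to above can be sketched with item-item cosine similarity on a tiny synthetic rating matrix (illustrative only, nowhere near LinkedIn scale):

```python
import math

def cosine(u, v):
    """Cosine similarity between two rating vectors (0 means 'unrated')."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# rows = users, columns = items
R = [[5, 4, 0, 1],
     [4, 5, 1, 0],
     [1, 0, 5, 4],
     [0, 1, 4, 5]]

cols = list(zip(*R))    # item rating columns
sim = [[cosine(cols[i], cols[j]) for j in range(4)] for i in range(4)]
print(round(sim[0][1], 2), round(sim[0][2], 2))  # item 0 resembles item 1 far more than item 2
```

An item-based recommender then scores an unrated item for a user by a similarity-weighted average of that user's existing ratings; the social-filtering formalism layers network signals on top of these interaction signals.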
Bio : Anmol Bhasin is a Senior Engineering Manager at LinkedIn, where he leads a team working on recommender systems, computational advertising and personalization. His team's contributions include LinkedIn's various personalized recommendation products (e.g., "Jobs You Might Be Interested In"), social news ("LinkedIn Today"), and systems for ad targeting and click through rate prediction. His team also built the content processing pipeline and online experimentation framework used for LinkedIn's suite of data products.
Prior to LinkedIn, Anmol worked at business search engine Business.com, where he developed the crawler, indexing systems, and retrieval algorithms. Anmol has also authored mobile gaming applications, including the award-winning Tecmo Bowl. Anmol received a Masters in Computer Science from the State University of New York at Buffalo, where he focused on text mining and applied machine learning for cross document learning.
Sibyl, a large scale machine learning system (Slides)
This talk will outline Sibyl, a large scale machine learning system developed at Google. The talk will describe the scale of problems that we encounter on the internet, some of the challenges that we face in tackling these problems, the machine learning algorithmic choices that we made, the overall architecture of Sibyl, and give some systems performance numbers.
Bio: Tushar Chandra is a Principal Software Engineer at Google, where he has worked on a number of large scale distributed systems. Prior to joining Google, he was a Research Staff Member at IBM Research and the lead development architect at Tivoli Software. He holds a Bachelor of Technology in Computer Science from IIT-Kanpur and a Ph.D. in Computer Science from Cornell University. He shared the 2010 Dijkstra award with his co-authors for their work on "Unreliable Failure Detectors for Reliable Distributed Systems" and "The Weakest Failure Detector for Solving Consensus".
Tushar's research has focused on building software systems for large scale distributed computing with an emphasis on distributed algorithms. He is currently working on Sibyl, a large scale machine learning system.
Machine Learning Research at eBay: An Industry Lab Perspective (Slides)
This talk will highlight and discuss a variety of issues and open research topics that are important but often under-appreciated in applying machine learning to large industry problems. The aim is to expose machine learning students to some issues and techniques that are typically not discussed much in machine learning literature or course work, or are only beginning to emerge, but which offer rich opportunities for future and impactful research. These topics include: systems and engineering issues (e.g., why speed is so important, yet often misunderstood, in machine learning implementations), methodological issues (e.g., the importance of decoupling the models used for training vs. execution), and selected promising recent advances in (re)emerging topics (such as sampling and randomized algorithms).
Bio: Dennis DeCoste is currently Research Director of Machine Learning at eBay Research Labs. His research interests focus on large scale machine learning, especially as applied to search ranking, collaborative filtering, CTR prediction, text/image classification, and time series prediction. He also works with fast streaming algorithms for massive data, which includes efficient application of Hadoop, multi-cores, and CUDA/GPU to machine learning. Before joining eBay Research Labs he conducted machine learning research at Facebook, Microsoft Live Labs, and NASA’s Jet Propulsion Laboratory. He was also
the founding head of the machine learning research group at Yahoo! Research. Dennis earned his Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign.
The development of machine learning software involves many issues beyond theory and algorithms. We need to consider numerical computation, code readability, system usability, user-interface design, maintenance, long-term support, and many others. In this talk, we take two popular machine learning packages, LIBSVM and LIBLINEAR, as examples. We have been actively developing them over the past decade. In the first part of this talk, we demonstrate the practical use of these two packages by running some real experiments. We give examples to see how users make mistakes or inappropriately apply machine learning techniques. This part of the course also serves as a useful practical guide to support vector machines (SVM) and related methods.
In the second part, we discuss design considerations in developing machine learning packages. We argue that many issues other than prediction accuracy are also very important. Finally, we give lessons learned and future perspectives for developing useful machine learning software.
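The talk demonstrates LIBSVM and LIBLINEAR directly; as one example of the kind of user mistake discussed, consider feature scaling, which LIBSVM's practical guide recommends doing before training. A minimal stand-in for what the bundled svm-scale tool does (a sketch, not LIBSVM's actual code; the data is made up):

```python
def minmax_scale(rows, lo=0.0, hi=1.0):
    """Rescale each feature column to [lo, hi]. Without this, a feature
    with a wide numeric range dominates the kernel's distance computations
    and the SVM effectively ignores the other features."""
    cols = list(zip(*rows))
    mins = [min(c) for c in cols]
    maxs = [max(c) for c in cols]
    return [[lo + (hi - lo) * (v - mn) / (mx - mn) if mx > mn else lo
             for v, mn, mx in zip(row, mins, maxs)]
            for row in rows]

# feature 0 spans [0, 2]; feature 1 spans [0, 20000] and would swamp it
X = [[0.0, 0.0], [1.0, 15000.0], [2.0, 20000.0]]
print(minmax_scale(X))
```

The same scaling parameters learned from the training set must also be applied to the test set, another pitfall the talk's "how users make mistakes" theme covers.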
Bio: Chih-Jen Lin is currently a distinguished professor at the Department of Computer Science, National Taiwan University and a visiting principal research scientist at eBay Research. He obtained his B.S. degree from National Taiwan University in 1993 and Ph.D. degree from University of Michigan in 1998. His major research areas include machine learning, data mining, and numerical optimization. He is best known for his work on support vector machines (SVM) for data classification. His software LIBSVM is one of the most widely used and cited SVM packages. Nearly all major companies apply his software for classification and regression applications. He has received many awards for his research work. A recent one is the ACM KDD 2010 best paper award. He is an IEEE fellow and an ACM distinguished scientist for his contribution to machine learning algorithms and software design. More information about him can be found at http://www.csie.ntu.edu.tw/~cjlin.
Locality sensitive hashing (LSH) is a fundamental computational tool. The idea in LSH is to hash objects in such a way that the similarity between two objects can be efficiently approximated using their respective hash values. We will review the basic LSH constructions and some recent results, along with some machine-learning and data mining applications.
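The random-hyperplane construction for cosine similarity is perhaps the simplest instance of such a scheme. The sketch below is a minimal pure-Python illustration (not a production index, and not necessarily one of the constructions covered in the talk): each bit records which side of a random hyperplane a vector falls on, and the Hamming distance between two codes estimates the angle between the original vectors.

```python
import math
import random

def random_hyperplanes(dim, n_bits, seed=0):
    """Sample n_bits random hyperplane normals in R^dim."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def simhash(vec, planes):
    """One bit per hyperplane: which side of the plane the vector lies on."""
    return tuple(1 if sum(p_i * v_i for p_i, v_i in zip(plane, vec)) >= 0 else 0
                 for plane in planes)

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def estimated_cosine(a_bits, b_bits):
    """Pr[bits agree] = 1 - angle/pi, so angle ~ pi * hamming / n_bits."""
    angle = math.pi * hamming(a_bits, b_bits) / len(a_bits)
    return math.cos(angle)
```

Nearby vectors agree on most bits, so bucketing items by (prefixes of) their hash codes turns approximate nearest-neighbor search into cheap hash-table lookups.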
Bio: Ravi Kumar has been a senior staff research scientist at Google since June 2012. Prior to this, he was a research staff member at the IBM Almaden Research Center and a principal research scientist at Yahoo! He obtained his Ph.D. in Computer Science from Cornell University in 1998. His primary interests are web and data mining, social networks, algorithms for large data sets, and the theory of computation. He serves on the editorial boards of JACM, TKDD, and TKDE.
We explore the hypothesis that reading can be acquired naturally if print is constantly available at an early age in the same manner as spoken language. If an appropriate form of written text is made available before formal schooling begins, reading should also be learned inductively and emerge naturally, with no significant negative consequences. This proposal challenges the commonly held belief that written language requires formal instruction and schooling. It is potentially transformative because its success would revolutionize current views of literacy and schooling.
Utilizing developments in behavioral science and technology, an interactive system (Technology Assisted Reading Acquisition, TARA) will enable young pre-literate children to accurately perceive and learn various properties of written language through simple exposure to the written form. To succeed, TARA will require robust methods for automatically recognizing the meaning inherent in the child’s experience, and for describing that experience in a written-language form that is appropriate to the perceptual, cognitive, and linguistic capabilities of the child. There are two methods to recognize and convert the child’s experience into written language. The first is to recognize the speech being spoken during the experience. The second is to recognize the scene, the objects, and the actions it contains. Of course, both methods can be used together.
Bio: Dom Massaro (http://mambo.ucsc.edu/people/dominic-massaro.html) is UCSC Emeritus Professor of Psychology and Computer Engineering, director of the Perceptual Science Laboratory, and founding Chair of the Digital Arts and New Media M.F.A. program. He has been a Guggenheim Fellow, a University of Wisconsin Romnes Fellow, a James McKeen Cattell Fellow, and an NIMH Fellow. He is currently the book review editor of the American Journal of Psychology and founding co-editor of the journal Interpreting. His research uses a formal experimental and theoretical approach to the study of speech perception, reading, psycholinguistics, memory, cognition, learning, and decision-making. The applied potential of his research led to the creation of Psyentific Mind (www.psyentificmind.com), whose goal is to apply behavioral science and technology to extend the range of the human mind. Its iPhone applications include iBaldi, a computer-animated face with realistic speech and emotion; Kid Klok, an educational analog clock for children learning to tell time; Read What I Say, a speech-to-text application for hearing-disadvantaged individuals; and Read With Me!, an application that embellishes shared picture-book reading with salient and easy-to-read text for very young children.
Efficient Learning Algorithms for Learning Sparse Models from Large Amounts of Data (Slides; video: Part1, Part2)
We will review the design, analysis and implementation of several sparsity promoting learning algorithms. We start with an efficient projected gradient algorithm onto the L1 ball. We then discuss a forward-backward splitting (Fobos) method that incorporates L1 and mixed-norms. We next present adaptive gradient versions of the above methods that generalize well-studied sub-gradient methods. Time permitting, we conclude with a description of a recent approach for "sparse counting" which facilitates compact and accurate Markov models.
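As a concrete illustration of the first of these building blocks, here is a minimal pure-Python sketch of the Euclidean projection onto the L1 ball, in the spirit of the sorting-based algorithm of Duchi, Shalev-Shwartz, Singer, and Chandra (2008). It is a simplified O(n log n) version for exposition, not the optimized implementation discussed in the lecture.

```python
import math

def project_l1_ball(v, z=1.0):
    """Euclidean projection of v onto the L1 ball {x : ||x||_1 <= z}.

    Find the soft-threshold theta such that sum_i max(|v_i| - theta, 0) == z,
    then shrink every coordinate toward zero by theta.
    """
    if sum(abs(x) for x in v) <= z:
        return list(v)                 # already feasible; nothing to do
    u = sorted((abs(x) for x in v), reverse=True)
    cumsum, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        if uj - (cumsum - z) / j > 0:  # coordinate j survives the shrinkage
            theta = (cumsum - z) / j   # keep the threshold for the last such j
    return [math.copysign(max(abs(x) - theta, 0.0), x) for x in v]
```

A projected (sub)gradient step for an L1-constrained problem is then simply `w = project_l1_ball([wi - eta * gi for wi, gi in zip(w, g)], z)`, which is what makes the projection the workhorse of these sparsity-promoting methods.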
Bio: Dr. Yoram Singer is a senior research scientist at Google. From 1999 through 2007 he was an associate professor at the Hebrew University of Jerusalem. From 1995 through 1999 he was a member of the technical staff at AT&T Research. He was the co-chair of the Conference on Computational Learning Theory in 2004 and of Neural Information Processing Systems in 2007. He serves as an editor of the Journal of Machine Learning Research, IEEE Signal Processing Magazine, and IEEE Transactions on Pattern Analysis and Machine Intelligence.
The last ten years have seen tremendous growth in Internet-based online services such as search, advertising, gaming and social networking. Today, it is important to analyze large collections of user interaction data as a first step in building predictive models for these services, and to learn these models in real time. One of the biggest challenges in this setting is scale: the sheer volume of data necessitates not only parallel processing but also distributed models; with over 800 million active users at Facebook, any user-specific set of features in a linear or non-linear model yields a model too large to store on a single system.
In this tutorial, I will give an introduction to distributed message passing, a theoretical framework that can deal with both distributed inference and the storage of models. After an overview of message passing, I will present recent advances in approximate message passing, which allows one to control the model size as the amount of training data grows. We will also review how distributed (approximate) message passing can be mapped to generalized distributed computing, and how modeling constraints map onto the system design. In the second part of the talk, I will give an overview of the application of these techniques to real-world learning systems, namely:
1. Gamer ranking and matchmaking in TrueSkill and Halo 3
2. AdPredictor click-through rate learning and prediction in sponsored search
3. User-action models in Facebook's information distribution and advertising pipeline
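As a toy illustration of the Gaussian belief updates that such systems perform, the sketch below implements a simplified, draw-free two-player skill update in the spirit of TrueSkill. The production system uses full factor-graph message passing over teams and draw margins, so treat this as a pedagogical approximation, not the deployed algorithm.

```python
import math

def _phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def _pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def skill_update(winner, loser, beta=25.0 / 6.0):
    """Update two (mu, sigma) Gaussian skill beliefs after `winner` beats `loser`.

    Draw-free two-player case: with c^2 = 2*beta^2 + sigma_w^2 + sigma_l^2
    and t = (mu_w - mu_l)/c, the factor v = N(t)/Phi(t) shifts the means
    and w = v*(v + t) shrinks the variances.
    """
    (mu_w, s_w), (mu_l, s_l) = winner, loser
    c = math.sqrt(2.0 * beta ** 2 + s_w ** 2 + s_l ** 2)
    t = (mu_w - mu_l) / c
    v = _pdf(t) / _phi(t)
    w = v * (v + t)
    new_winner = (mu_w + s_w ** 2 / c * v,
                  s_w * math.sqrt(max(1.0 - s_w ** 2 / c ** 2 * w, 1e-12)))
    new_loser = (mu_l - s_l ** 2 / c * v,
                 s_l * math.sqrt(max(1.0 - s_l ** 2 / c ** 2 * w, 1e-12)))
    return new_winner, new_loser
```

After an upset (a low-mean player beating a high-mean one), t is very negative, v is large, and the means move a lot; after an expected result they barely move. That sensitivity is exactly the signal the matchmaker exploits when pairing players of uncertain skill.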
Bio: Ralf Herbrich is Engineering Manager at Facebook where he is working on large-scale, distributed ranking systems & services for information distribution.
Before joining Facebook, he headed the Bing Personalization team, which focused on prototyping and enabling personalized experiences across Microsoft's Online Services Division. Prior to his work on Bing, Ralf was Director of Microsoft's Future Social Experiences (FUSE) Labs UK, working on new social experiences powered by computational intelligence technologies on large online data collections. Ralf joined Microsoft Research in 2000 as a postdoctoral researcher and Research Fellow of Darwin College, Cambridge <http://www.dar.cam.ac.uk/>. During his time at
Microsoft Research, Ralf worked in the areas of machine learning, information retrieval, game theory, artificial intelligence and social network analysis. Prior to joining Microsoft, Ralf worked at the Technical University of Berlin as a teaching assistant, where he obtained both a diploma degree in Computer Science and a Ph.D. degree in Statistics.
Ralf's research interests include Bayesian inference and decision making, computer games, kernel methods and statistical learning theory. He co-authored over 50 journal and conference papers in these areas. Ralf is one of the inventors of the Drivatars™ system used in the Forza Motorsport series as well as the TrueSkill™ ranking and matchmaking system in Xbox 360 Live. He also co-invented the click-prediction technology used in Bing's online advertising system.
New directions for Google Goggles (video: Part1)
This talk will focus on two frontiers being explored by Google's visual search team.
i) Approaches based on extracting interest points and matching local descriptors using approximate nearest neighbor search have been successfully scaled to indices comprising billions of images. These methods fail, however, for objects that present only a few interest points and whose local features, such as simple edges, are not very discriminative. Fine-grained classification of such weakly textured objects is a challenge, yet many objects fall into this category, such as cars or furniture. A promising approach we have been investigating integrates object detection with the ability to localize parts, which are subsequently described by grouping local features into sufficiently global, discriminative descriptors.
ii) A major challenge AI needs to overcome is to enable a system to learn from noisy datasets not curated by humans. It is a common experience that the most time-consuming task in constructing a new vision capability is the collection of quality training data, since this step typically involves human tagging. The better and more detailed the annotation of a training set, the easier the learning task, largely independent of the learning framework used. We will present the latest results in our effort to learn from training sets that can contain significant amounts of mislabeled examples, such as training a car detector using a set of images returned by Google Image Search for the query "car". We will describe a method that can learn a Bayes-optimal classifier from noisy data by mapping training to a quadratic program that can be solved by quantum hardware.
Throughout life, the cells in every individual accumulate many changes in the DNA inherited from his or her parents. Certain combinations of changes lead to cancer. During the last decade, the cost of DNA sequencing has been dropping by a factor of 10 every two years, making it now possible to read most of the three-billion-base genome from a patient’s cancer tumor, and to try to determine all of the thousands of DNA changes in it. Under the auspices of NCI’s Cancer Genome Atlas Project, 10,000 tumors will be sequenced in this manner in the next two years. My group at UCSC is building the data center and much of the analysis infrastructure for this project. Soon cancer genome sequencing will be a widespread clinical practice, and millions of tumors will be sequenced. We plan for our center to be a seed out of which grows a national infrastructure to handle these data.
Language Understanding (video: Part1)
Abstract: Language understanding has been well studied in the context of question answering, textual entailment, summarization, query understanding, etc. However, data sources in virtual personal assistant systems pose new challenges, such as variability and ambiguity in natural language and short utterances that rarely contain contextual information. Thus, Spoken Language Understanding (SLU) plays an important role in allowing any sophisticated spoken dialog system (e.g., DARPA CALO (Berry et al., 2011), Siri, etc.) to take the correct machine actions.
The goal of SLU is to extract the meaning of queries uttered to a voice search system. A query can be expressed in full natural language, as traditional SLU assumes, or it can consist of only keywords without apparent syntactic structure, as is usual in Web IR. The query can belong to a diverse set of domains with structured data sources at the backend, or it may need to be handled by a general-purpose search engine. Moreover, due to the sheer scale of the Web, it becomes important to go beyond supervised systems and to handle the uncertainty caused by ASR errors. This talk is intended to provide an overview of the machine learning methods associated with SLU research.
BIO: Asli Celikyilmaz is a Sr. Scientist in the Language & Intent group at Microsoft Silicon Valley. Asli's research interests are natural language processing, spoken language understanding, and machine learning (structured prediction, unsupervised learning, and Bayesian methods). Prior to Microsoft, Asli was a postdoctoral researcher in the Computer Science Department of the University of California, Berkeley from 2008 to 2010, working on machine learning methods for automatic text summarization and question answering.