Yale University.  
Computer Science.  

CS Colloquium
September 13, 2012
4:00 p.m., AKW 200

Refreshments will be available at 3:45 p.m.

Speaker: David Mimno
Title: Finding thousands of topics in millions of books

Abstract:
Statistical topic models have become popular in domains as distinct as biomedical research, political science, and literary scholarship. These methods represent text documents as combinations of themes, or topics, which are themselves probability distributions over a vocabulary. This low-dimensional topic representation is robust to variation in word choice and ambiguity in word sense, allowing users to analyze trends in large text collections. Existing methods for training topic models, however, have not kept pace with the size of today's document corpora. In this talk I will describe a new method that combines the best aspects of two inference methods, stochastic online inference and Markov chain Monte Carlo. I will demonstrate the scalability of this algorithm on a corpus of 1.2 million out-of-copyright books.
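The representation described in the abstract can be sketched in a few lines. Below is a minimal illustration, assuming an LDA-style model: each topic is a probability distribution over a vocabulary, a document has its own topic proportions (here called `theta`), and the document's overall word distribution is the proportion-weighted mixture of its topics. The vocabulary, the two topics, and the helper names `word_distribution` and `generate_document` are hypothetical and exist only for illustration. This shows the representation, not the speaker's inference method. Fitting topics to real text, which is what the hybrid of stochastic online inference and Markov chain Monte Carlo addresses, is not attempted here.

```python
import random

# Hypothetical toy vocabulary and topics, chosen only for illustration.
VOCAB = ["gene", "cell", "protein", "vote", "senate", "party"]

# Each topic is a probability distribution over VOCAB (each row sums to 1).
TOPICS = {
    "biology":  [0.4, 0.3, 0.3, 0.0, 0.0, 0.0],
    "politics": [0.0, 0.0, 0.0, 0.3, 0.3, 0.4],
}


def word_distribution(theta):
    """Mix the topic-word distributions, weighted by the document's topic proportions."""
    mixed = [0.0] * len(VOCAB)
    for topic, weight in theta.items():
        for i, p in enumerate(TOPICS[topic]):
            mixed[i] += weight * p
    return dict(zip(VOCAB, mixed))


def generate_document(theta, length, rng):
    """Sample words LDA-style: draw a topic from theta, then a word from that topic."""
    topics = list(theta)
    topic_weights = [theta[t] for t in topics]
    words = []
    for _ in range(length):
        z = rng.choices(topics, weights=topic_weights)[0]  # topic assignment
        w = rng.choices(VOCAB, weights=TOPICS[z])[0]        # word drawn from that topic
        words.append(w)
    return words


# A document that is mostly about biology and partly about politics.
theta = {"biology": 0.75, "politics": 0.25}
dist = word_distribution(theta)
print(dist)
print(generate_document(theta, 10, random.Random(0)))
```

The low-dimensional view is visible here: a document is summarized by its two topic proportions rather than by counts over every vocabulary word, and real corpora use many more topics and a much larger vocabulary.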

Bio: David Mimno is a postdoctoral researcher in the Computer Science department at Princeton University. He received his PhD from the University of Massachusetts, Amherst. Before graduate school, he served as Head Programmer at the Perseus Project, a digital library for cultural heritage materials, at Tufts University. He is supported by a CRA Computing Innovation fellowship.