Yale University.  
Computer Science.  
     
Alan J. Perlis Lecture Series
April 17, 2007
4:00 p.m., AKW 200

Speaker: David DeWitt, University of Wisconsin-Madison
Title: Column Stores: A Solution to TB Disk Drives?

Abstract: For the last 30 years, relational database systems have used the same storage layout, in which variable-length records are stored contiguously using a slotted page layout (generally termed NSM). Although a number of alternative storage strategies have been proposed, including transposed files, DSM, and PAX, none has displaced NSM as the standard representation.

However, two important technology trends are working against NSM. First, the NSM representation has terrible L2 data cache performance. Second, while disks have gotten faster in recent years, the rate of increase has not kept pace with the rate at which they have gotten bigger. Consequently, the effective bandwidth per byte of capacity has actually decreased. Column stores provide a potential solution for both of these technology barriers (which will not be going away anytime soon). Database systems based on a column-store architecture seem promising for a number of reasons. First, they exhibit excellent L2 data cache performance. Second, since only those columns needed by a query are actually read from disk, they minimize the amount of I/O performed. Third, column stores are very amenable to a variety of compression techniques, further reducing the I/O requirements of a query.
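
As a rough illustration of the I/O argument above, the following Python sketch compares how many bytes a single-column aggregate touches under a row-wise layout and a column-wise layout, and shows run-length encoding as one simple example of column compression. The three-column table, the fixed-width record format, and all function names are assumptions made for this example; they are not taken from the talk or from any of the systems it mentions, and real NSM pages hold variable-length records behind a slot directory.

```python
import struct
from itertools import groupby

# Hypothetical table: (id: int32, region: 8-byte string, price: float64).
# Fixed-width records keep the sketch short; this is a simplification of NSM.
ROW_FORMAT = "<i8sd"
ROW_SIZE = struct.calcsize(ROW_FORMAT)  # 4 + 8 + 8 = 20 bytes, no padding


def build_row_store(rows):
    """Store whole records contiguously, one after another (row-wise)."""
    return b"".join(struct.pack(ROW_FORMAT, i, r.encode(), p) for i, r, p in rows)


def build_column_store(rows):
    """Store each attribute in its own contiguous byte string (column-wise)."""
    return {
        "id": b"".join(struct.pack("<i", i) for i, _, _ in rows),
        "region": b"".join(struct.pack("<8s", r.encode()) for _, r, _ in rows),
        "price": b"".join(struct.pack("<d", p) for _, _, p in rows),
    }


def sum_price_row_store(buf):
    """SUM(price) over the row store: every byte of every record is scanned."""
    total = sum(price for _, _, price in struct.iter_unpack(ROW_FORMAT, buf))
    return total, len(buf)


def sum_price_column_store(cols):
    """SUM(price) over the column store: only the price column is scanned."""
    price_bytes = cols["price"]
    total = sum(value for (value,) in struct.iter_unpack("<d", price_bytes))
    return total, len(price_bytes)


def run_length_encode(values):
    """Collapse runs of equal adjacent values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


if __name__ == "__main__":
    rows = [(1, "east", 10.0), (2, "east", 20.0), (3, "west", 5.0)]
    print(sum_price_row_store(build_row_store(rows)))
    print(sum_price_column_store(build_column_store(rows)))
    print(run_length_encode([region for _, region, _ in rows]))
```

For the three sample rows, both scans return the same sum, but the row-wise scan touches 60 bytes while the column-wise scan touches 24. The gap grows with the number and width of the columns a query does not need, and run-length encoding shrinks columns with long runs of repeated values further still.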

This talk is aimed at anyone interested in database systems and their implementation. I will trace the history of alternative storage representations for relational systems from transposed files (circa 1971) to DSM (1985), to PAX (2000), to Fractured Mirrors (2003), to C-Store (2005), and finally to SuperColumns (2006).