Professor Gerstein does research in the new field of bioinformatics, which applies quantitative approaches to problems in molecular biology and genomics. His research draws on a range of computational techniques, including systematic data mining and machine learning, visualization of high-dimensional data, biological database design, and molecular simulation.
Broadly, Professor Gerstein is interested in analyses of genome sequences, macromolecular structures, molecular networks, and functional-genomics datasets. He is particularly focused on the human genome and personal genome sequences in relation to three areas.
(1) He is interested in annotating the human genome sequence, especially in characterizing the vast expanse of non-coding sequence. This work involves the creation of automatic pipelines for identifying patterns and homologies in the genome sequence and processing large-scale next-generation sequencing data efficiently. He is also interested in studying the genomic variations between individuals, particularly in identifying and assembling large blocks of variant sequence.
(2) He aims to determine the function of all the protein elements encoded by the genome. Here, the approach is to characterize function systematically through the use of molecular networks. This work involves extensive application of machine learning approaches such as Bayesian networks, decision trees, and clustering. Also important in this work is developing ontologies for biological functions and statistically reliable methods for predicting protein function based on sequence similarity, functional genomics data, and automated analysis of the literature.
(3) Finally, for the population of proteins that have known 3D structures, he is investigating how their function is carried out through motion and how motion can be predicted from packing geometry. This involves developing ways of aligning structures, clustering related ones into fold families, analyzing packing with Voronoi polyhedra, and simulating motions using molecular-mechanics potentials.
L.Y. Wang, A. Abyzov, J.O. Korbel, M. Snyder, M. Gerstein (2009). "MSB: a mean-shift-based approach for the analysis of structural variation in the genome," Genome Res 19: 106-17.
P.M. Kim, L.J. Lu, Y. Xia, M.B. Gerstein (2006). "Relating three-dimensional structures to protein networks provides evolutionary insights," Science 314: 1938-41.
H. Yu, M. Gerstein (2006). "Genomic analysis of the hierarchical structure of regulatory networks,"
M. Gerstein, D. Zheng (2006). "The real life of pseudogenes," Sci Am 295: 48-55.