I am not at Yale any more, nor am I at Yahoo, so this web page is somewhat out of date.

I am currently (as of summer 2008) in the math department at Stanford University. Click here for my new web page and for more information.

You can reach me via electronic mail: mmahoney is the username, and cs dot stanford dot edu follows the at symbol. (Of course, my gmail address is still valid.)


NOTE: Our PNAS paper on "CUR Matrix Decompositions for Improved Data Analysis" has appeared as Proc. Natl. Acad. Sci. USA, 106, 697-702 (2009). Please email me if you can't access a copy from the PNAS web site.


Michael Mahoney's old home page

J. W. Gibbs Assistant Professor, Department of Mathematics
Research Affiliate, Department of Computer Science
Yale University



By popular demand ... we are running the MMDS Workshop on "Algorithms for Modern Massive Data Sets" again! As with the 2006 meeting, MMDS 2008 will address algorithmic, statistical, and mathematical challenges in large-scale statistical data analysis.


Click here for the 2006 Stanford/Yahoo "Workshop on Algorithms for Modern Massive Data Sets (MMDS 2006)" that I organized (with G. Golub, P. Drineas, and L-H. Lim) and that took place on June 21-24, 2006, or click here for an article in SIAM News about the meeting.

Click here for a ppt of the SIAM-SDM06 2006 tutorial, or click here for the overheads for several other talks.


Research interests

  • Design and analysis of algorithms for very large matrix, graph, and regression problems.
  • Applications to the statistical data analysis of massive scientific and Internet data.
  • Monte Carlo methods in computer science, statistical physics, and scientific computation.
  • Molecular modeling of liquid water (e.g., TIP5P water) and protein-DNA/RNA-water interactions.
  • Large-scale machine learning and large-scale data analytics.

I work on developing and analyzing algorithms for large matrix, graph, and regression problems, as well as applications of these tools to the statistical analysis of extremely large scientific and Internet data sets. Current work includes applying these and related algorithmic techniques to problems arising in Internet applications, and past work includes the application of these techniques to DNA single nucleotide polymorphism data. I have also worked on developing and analyzing Monte Carlo algorithms for performing useful computations on extremely large matrices, e.g., the additive-error and relative-error CUR matrix decompositions. Past research has also included work in computational statistical mechanics on the development and analysis of the TIP5P model of liquid water, as well as work in both computational and experimental biophysics/bioinformatics on proteins and protein-nucleic acid interactions.


Publications

  • CUR Matrix Decompositions for Improved Data Analysis,
  • M. W. Mahoney and P. Drineas,
    Proc. Natl. Acad. Sci. USA, 106, 697-702 (2009).
  • An Improved Approximation Algorithm for the Column Subset Selection Problem,
  • C. Boutsidis, M. W. Mahoney, and P. Drineas,
    Technical Report, Preprint: arXiv:0812.4293 (2008) (arXiv).
    Proc. 20-th Annual SODA, 968-977 (2009) (ps, pdf).
  • Community Structure in Large Networks: Natural Cluster Sizes and the Absence of Large Well-Defined Clusters,
  • J. Leskovec, K. J. Lang, A. Dasgupta, and M. W. Mahoney,
    Technical Report, Preprint: arXiv:0810.1355 (2008) (arXiv).
  • Unsupervised Feature Selection for Principal Components Analysis,
  • C. Boutsidis, M. W. Mahoney, and P. Drineas,
    Proc. 14-th Annual SIGKDD, 61-69 (2008) (ps, pdf).
  • Statistical properties of community structure in large social and information networks,
  • J. Leskovec, K. J. Lang, A. Dasgupta, and M. W. Mahoney,
    Proc. 17-th International WWW, 695-704 (2008) (ps, pdf).
  • Faster Least Squares Approximation,
  • P. Drineas, M. W. Mahoney, S. Muthukrishnan, and T. Sarlos,
    Technical Report, Preprint: arXiv:0710.1435 (2007) (arXiv),
    Journal version submitted for publication.
  • PCA-Correlated SNPs for Structure Identification in Worldwide Human Populations,
  • P. Paschou, E. Ziv, E. G. Burchard, S. Choudhry, W. Rodriguez-Cintron, M. W. Mahoney, and P. Drineas,
    PLoS Genetics, 3, 1672-1686 (2007) (ps, pdf).
  • Relative-Error CUR Matrix Decompositions,
  • P. Drineas, M. W. Mahoney, and S. Muthukrishnan,
    Technical Report, Preprint: arXiv:0708.3696 (2007) (arXiv),
    SIAM J. Matrix Analysis and Applications, 30, 844-881 (2008) (ps, pdf).
  • Feature Selection Methods for Text Classification,
  • A. Dasgupta, P. Drineas, B. Harb, V. Josifovski, and M. W. Mahoney,
    Proc. 13-th Annual SIGKDD, 230-239 (2007) (ps, pdf).
  • Sampling Algorithms and Coresets for Lp Regression,
  • A. Dasgupta, P. Drineas, B. Harb, R. Kumar, and M. W. Mahoney,
    Technical Report, Preprint: arXiv:0707.1714 (2007) (arXiv),
    Proc. 19-th Annual SODA, 932-941 (2008) (ps, pdf),
    SIAM J. Computing, 38, 2060-2078 (2009) (ps, pdf)
  • Web Information Retrieval and Linear Algebra Algorithms,
  • A. Frommer, M. W. Mahoney, and D. B. Szyld (Eds.),
    Proc. of Dagstuhl Seminar 07071, (2007) (web).
  • Intra- and interpopulation genotype reconstruction from tagging SNPs,
  • P. Paschou, M. W. Mahoney, A. Javed, J. R. Kidd, A. J. Pakstis, S. Gu, K. K. Kidd, and P. Drineas,
    Genome Research, 17(1), 96-107 (2007) (ps, pdf).
  • Bridging the Gap Between Numerical Linear Algebra, Theoretical Computer Science, and Data Applications,
  • G. H. Golub, M. W. Mahoney, P. Drineas, and L.-H. Lim,
    SIAM News 39:8 October 2006 (ps, pdf).
  • Randomized Algorithms for Matrices and Massive Data Sets,
  • P. Drineas and M. W. Mahoney,
    Proc. 32-nd Annual VLDB, 1269 (2006) (ps, pdf).
  • Subspace Sampling and Relative-Error Matrix Approximation: Column-Row-Based Methods,
  • P. Drineas, M. W. Mahoney, and S. Muthukrishnan,
    Proc. 14-th Annual ESA, 304-314 (2006) (ps, pdf).
  • Subspace Sampling and Relative-Error Matrix Approximation: Column-Based Methods,
  • P. Drineas, M. W. Mahoney, and S. Muthukrishnan,
    Proc. 10-th Annual RANDOM, 316-326 (2006) (ps, pdf).
  • Tensor-CUR Decompositions For Tensor-Based Data,
  • M. W. Mahoney, M. Maggioni, and P. Drineas,
    Proc. 12-th Annual SIGKDD, 327-336 (2006) (ps, pdf),
    SIAM J. Matrix Analysis and Applications, 30, 957-987 (2008) (ps, pdf).
  • Polynomial Time Algorithm for Column-Row-Based Relative-Error Low-Rank Matrix Approximation,
  • P. Drineas, M. W. Mahoney, and S. Muthukrishnan,
    Technical Report, DIMACS TR 2006-04 March 2006 (ps, pdf).
  • Sampling Algorithms for L2 Regression and Applications,
  • P. Drineas, M. W. Mahoney, and S. Muthukrishnan,
    Proc. 17-th Annual SODA, 1127-1136 (2006) (ps, pdf).
  • A Randomized Algorithm for a Tensor-Based Generalization of the Singular Value Decomposition,
  • P. Drineas and M. W. Mahoney,
    Technical Report, YALEU/DCS/TR-1327, June 2005 (ps, pdf),
    Linear Algebra and its Applications, 420, 553-571 (2007) (ps, pdf).
  • On the Nystrom Method for Approximating a Gram Matrix for Improved Kernel-Based Learning,
  • P. Drineas and M. W. Mahoney,
    Technical Report, YALEU/DCS/TR-1319, April 2005 (ps, pdf),
    Proc. 18-th Annual COLT, 323-337 (2005) (ps, pdf),
    J. Machine Learning Research, 6, 2153-2175 (2005) (ps, pdf).
  • Sampling Sub-problems of Heterogeneous Max-Cut Problems and Approximation Algorithms,
  • P. Drineas, R. Kannan, and M. W. Mahoney,
    Technical Report, YALEU/DCS/TR-1283, April 2004 (ps, pdf),
    Proc. 22-nd Annual STACS, 57-68 (2005) (ps, pdf),
    Random Structures and Algorithms, 32:3, 307-333 (2008) (ps, pdf).
  • Fast Monte Carlo Algorithms for Matrices III: Computing an Efficient Approximate Decomposition of a Matrix,
  • P. Drineas, R. Kannan, and M. W. Mahoney,
    Technical Report, YALEU/DCS/TR-1271, February 2004 (ps, pdf),
    SIAM J. Computing, 36, 184-206 (2006) (ps, pdf).
  • Fast Monte Carlo Algorithms for Matrices II: Computing Low-Rank Approximations to a Matrix,
  • P. Drineas, R. Kannan, and M. W. Mahoney,
    Technical Report, YALEU/DCS/TR-1270, February 2004 (ps, pdf),
    SIAM J. Computing, 36, 158-183 (2006) (ps, pdf).
  • Fast Monte Carlo Algorithms for Matrices I: Approximating Matrix Multiplication,
  • P. Drineas, R. Kannan, and M. W. Mahoney,
    Technical Report, YALEU/DCS/TR-1269, February 2004 (ps, pdf),
    SIAM J. Computing, 36, 132-157 (2006) (ps, pdf).
  • Rapid Mixing of Several Markov Chains for a Hard-Core Model,
  • R. Kannan, M. W. Mahoney, and R. Montenegro,
    Proc. 14-th Annual ISAAC, 663-675 (2003) (ps, pdf).
  • Quantum, intramolecular flexibility, and polarizability effects on the reproduction of the density anomaly of liquid water by simple potential functions,
  • M. W. Mahoney and W. L. Jorgensen,
    J. Chem. Phys., 115, 10758-10768 (2001) (ps, pdf).
  • Rapid estimation of electronic degrees of freedom in Monte Carlo calculations for polarizable models of liquid water,
  • M. W. Mahoney and W. L. Jorgensen,
    J. Chem. Phys., 114, 9337-9349 (2001) (ps, pdf).
  • Diffusion constant of the TIP5P model of liquid water,
  • M. W. Mahoney and W. L. Jorgensen,
    J. Chem. Phys., 114, 363-366 (2001) (ps, pdf).
  • A five-site model for liquid water and the reproduction of the density anomaly by rigid, nonpolarizable potential functions,
  • M. W. Mahoney and W. L. Jorgensen,
    J. Chem. Phys., 112, 8910-8922 (2000) (ps, pdf).
  • Repression and activation of promoter-bound RNA polymerase activity by Gal repressor,
  • H. E. Choy, R. R. Hanger, T. Aki, M. Mahoney, K. Murakami, A. Ishihama, and S. Adhya,
    J. Mol. Biol. 272: 293-300, 1997 (ps, pdf).
  • Discrete representations of the protein C-alpha chain,
  • X. F. de la Cruz, M. W. Mahoney, and B. K. Lee,
    Fold. & Des. 2: 223-234, 1997 (ps, pdf).