Rex (Zhitao) Ying

I'm currently an assistant professor in the Department of Computer Science at Yale University.

I work on AI and machine learning algorithms that leverage the graph structure of data.

I'm excited to be a founding member of Kumo.ai!

Welcome to my personal website

Refer to my Yale profile for more information. Our lab is hiring motivated and capable PhD students interested in geometric deep learning, graph neural networks, relational reasoning with foundation models, and trustworthy AI.

About Me

Introduction

I'm an assistant professor in the Department of Computer Science at Yale University. My research focuses on algorithms for graph neural networks, geometric embeddings, explainable models and, more recently, multimodal foundation models involving relational reasoning. I am the author of many widely used GNN algorithms, such as GraphSAGE, PinSAGE and GNNExplainer. In addition, I have worked on a variety of applications of graph learning in physical simulations, social networks, knowledge graphs, neuroscience and biotechnology. I developed the first billion-scale graph embedding services at Pinterest and the graph-based anomaly detection algorithm at Amazon.


Education

I obtained my PhD in computer science from Stanford University, advised by Jure Leskovec. My thesis focuses on expressive, scalable and explainable GNNs (graph neural networks) and is available on GitHub. Prior to that, I graduated from Duke University in 2016 with highest distinction, majoring in computer science and mathematics.

Prospective Students

We sincerely welcome interested students! Please refer to the section for prospective students below for more details.

Dr. Rex (Zhitao) Ying

Assistant professor

News

Research Outline

My research focuses on deep learning for graphs and (implicit) relational data, as well as on building multimodal foundation models with strong relational reasoning capabilities. I'm also very interested in geometric deep learning, non-Euclidean representation learning and trustworthy deep learning, all of which have great synergy with graph learning or foundation models. My interests range from theoretical questions in graph learning, relational reasoning, generalization and manifold geometry to practical use cases in science and technology, which I pursue in collaboration with scientific institutes and IT companies.

I have collaborated with industrial research labs such as DeepMind and Amazon, as well as national research labs such as Sandia. I also actively collaborate with startups including Kumo.AI and Collov.

A full list of my publications can be found on Google Scholar.

Deep Learning for Graphs

Graph neural networks (GNNs) are powerful machine learning tools for making predictions based on ubiquitous graph and network structure.

Multimodal Relational Foundation Models

It is crucial for pretrained foundation models to understand and make use of relational information, leveraging graph representations to empower intelligent reasoning.

Geometric Representation Learning

The goal is to empower deep learning architectures with effective representation geometry, which is essential for modeling data manifolds with different characteristics.

Unlimited Applications

I work on real-world applications in chemistry, biology, neuroscience, physical simulations, knowledge graphs, natural language, recommendations and social networks.

Services

I'm excited to serve the research community in various ways. I co-lead the open-source project PyTorch Geometric, which aims to make developing graph neural networks easy and accessible for researchers, engineers and a general audience with a variety of backgrounds. I have served as a committee member for machine learning conferences including AAAI, ICML, NeurIPS, ICLR, KDD and WebConf for over 7 years, and I am serving as an area chair for LoG 2022. In addition, I have organized a variety of workshops on topics including graph learning, graph neural networks and deep learning for simulation.

Teaching

I teach two courses: "Deep Learning for Graph-Structured Data" and "Trustworthy Deep Learning" at Yale.

Course Websites

My Past Workshops

  • New Frontiers in Graph Learning (GLFrontiers) at NeurIPS 2022 and NeurIPS 2023
  • Deep Learning for Simulation (SimDL) at ICLR 2021
  • Stanford Graph Learning Workshop (SGL)
  • Graph Representation Learning and Beyond (GRL+) at ICML 2020
  • Co-organized the 2020 KDD Cup Competition on Graph AutoML.

Selected Publications

A few selected publications are listed for each research direction. See Google Scholar for a full list of publications.
Most of the algorithms I have developed are open-sourced as part of the PyTorch Geometric library.

Deep Learning on Graphs

I focus on advancing graph neural network (GNN) architectures and improving the expressiveness, scalability, interpretability and robustness of GNNs. More recently, I have focused on pre-trained, large-scale foundation models for graph-structured data.

Building on pre-trained GNN models, we explore a general strategy that uses meta-learning to automatically select auxiliary datasets and improve molecule prediction performance.

DeSCo is a state-of-the-art (pre-trained) GNN that makes reliable subgraph counting predictions on large real-world networks, based on symmetry breaking and an expressive GNN architecture.

KDD 2023

BatchSampler is a novel and general negative sampling scheme for self-supervised contrastive learning frameworks. It improves pre-trained model performance in the domains of images, natural language and graphs.

ICML 2022

LA-GNN is a general pre-training framework that improves GNN performance through augmentation.

ICLR 2021

GNNs can learn to execute graph algorithms.

AAAI 2021

ID-GNN improves the expressiveness of GNNs by taking node identities into account.

NeurIPS 2017

GraphSAGE is a general GNN framework for large-scale graph learning.

ICML 2018

GraphRNN is one of the first graph generative models for learning distribution of graphs.

NeurIPS 2019

The first framework to explain predictions made by GNNs!

Representation Learning

I develop representation learning techniques and embedding geometries for data with different characteristics (hierarchical, heterogeneous, etc.).

ICML 2023

HIE is a task-agnostic and model-agnostic method that improves existing hyperbolic embedding methods by incorporating cost-free hierarchical information deduced from each node's hyperbolic distance to the origin.
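To make "a node's hyperbolic distance to the origin" concrete, here is a minimal sketch assuming the Poincaré ball model with curvature -c, where that distance is (2/√c)·artanh(√c·‖x‖). The function names and toy embeddings are hypothetical, and this only illustrates the kind of distance-based hierarchy signal described above, not HIE's actual implementation: nodes near the origin are read as closer to the root of the hierarchy, and nodes near the boundary as deeper.

```python
import math
from typing import Dict, List, Sequence


def poincare_distance_to_origin(x: Sequence[float], c: float = 1.0) -> float:
    """Hyperbolic distance from the origin to point x in the Poincare ball
    with curvature -c (c > 0): d(0, x) = (2 / sqrt(c)) * artanh(sqrt(c) * ||x||)."""
    if c <= 0:
        raise ValueError("curvature parameter c must be positive")
    scaled_norm = math.sqrt(c) * math.sqrt(sum(v * v for v in x))
    if scaled_norm >= 1.0:
        raise ValueError("point must lie strictly inside the Poincare ball")
    return 2.0 / math.sqrt(c) * math.atanh(scaled_norm)


def rank_by_hierarchy(embeddings: Dict[str, Sequence[float]], c: float = 1.0) -> List[str]:
    """Hypothetical helper: order nodes from nearest the origin (treated as
    closer to the root) to nearest the boundary (treated as deeper)."""
    return sorted(embeddings, key=lambda n: poincare_distance_to_origin(embeddings[n], c))


if __name__ == "__main__":
    # Hypothetical 2-D embeddings, made up purely for illustration.
    toy = {"leaf": [0.0, 0.9], "root": [0.05, 0.0], "child": [0.5, 0.1]}
    print(rank_by_hierarchy(toy))  # ['root', 'child', 'leaf']
```

Because artanh grows without bound as its argument approaches 1, points near the boundary of the ball end up very far from the origin, which is why this distance can serve as a rough proxy for depth in a hierarchy.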

KDD 2023

We learn interpretable representations that capture document and text hierarchies by combining hyperbolic GNNs with topic modeling.


NeurIPS 2019

HGCN embeds the nodes of a graph in hyperbolic space to capture hierarchical structure. It is one of the first hyperbolic GNNs.

NeurIPS 2021

ConE uses hyperbolic cones to model the heterogeneous hierarchies in knowledge graphs.

NeurIPS 2021

We embed biological sequences in Euclidean and hyperbolic spaces for solving challenging problems such as multiple sequence alignment.

IEEE Data Engineering Bulletin 2017

A survey on graph representation learning, including distributed embedding approaches and GNNs.

NeurIPS 2018

GraphRNN is one of the first deep graph generative models for learning distribution of graphs.

ICML 2019

P-GNN improves the expressiveness of node embeddings by incorporating position information.

Applications

Natural phenomena and the world's knowledge can often be expressed in the language of graphs. In addition to popular applications of graphs such as social networks, recommender systems, knowledge graphs, biological networks and molecules, I'm also interested in novel ways of incorporating relational reasoning into other fields of science and technology, such as physical simulations, natural language and prediction over industrial relational databases.

ICML 2020

We enable graph neural networks to learn to produce realistic simulations of different materials.

KDD 2022

We collaborate with Saudi Aramco to use machine learning for simulating oil and water flows, underground pressure and oil production.

NAACL 2021

Heterogeneous graph attention on dependency parsing trees can improve robustness and performance of aspect-level sentiment analysis in NLP.

KDD 2018

The first GNN-based recommender system applied to billion-user-scale industrial platforms. It is deployed at Pinterest and also serves as an embedding service for many downstream use cases.

NeurIPS 2018

We combine GNNs and reinforcement learning to generate realistic molecules that meet desired chemical and biological goals.

KDD 2021

A dynamic GNN for anomaly detection at Amazon.com, using an efficient message-passing procedure and pre-training.

To Prospective Students

If you are a prospective student, I would appreciate it if you read the following before reaching out to me by email. To help me identify applications, please use "PhD (or Postdoc, Visiting Student) Application" as your email subject. Due to the large number of application emails, I may not always be able to respond. But if you believe that you possess the qualifications and qualities described below, feel free to send me a reminder if you have not received a response after a week.

Note (special focus):

  • If you are interested in generative AI, and specifically spatial AGI, we have industrial collaboration opportunities in this direction.
  • If you are interested in applying to the PhD program with a focus on AI for neuroscience, please mention this explicitly in your email. I encourage you to check out WTI (computational track), of which I am a member.
  • If you are interested in AI for computational biology, check out the Yale CBB program, of which I am also a member.

PhDs

When reaching out to me, it would be best to demonstrate the following in your email.
  • Reaching out early and asking about opportunities to collaborate with my lab can be a very effective way to stand out among candidates.
  • Highly competitive PhD applicants usually have abundant research experience prior to applying. Note that the number of publications is not the crucial factor; the quality, novelty and potential impact of the research are.
  • A student who has a single top-tier publication but demonstrated outstanding ability (usually as first author) in idea formulation, implementation, experiments, analysis and writing is considered more competitive than a student who participated in many research projects but did not own or lead one from beginning to end.
  • Publications in top-tier ML and data mining conferences such as NeurIPS, ICML, ICLR, KDD and WebConf are highly encouraged. High-impact journal publications in interdisciplinary fields are also highly appreciated.
  • Ideally, your prior research falls under the broad category of machine learning. However, your research topic does not need to be similar to mine, as long as you demonstrate interest in and understanding of our lab's research topics and have shown the qualities mentioned above. We welcome diversity at all levels, including skill sets!

Note:

I understand that while most applicants have prior research experience and papers in submission, some do not yet have a top-conference publication. I recommend that highly motivated students reach out to me well before the admission deadline and join existing projects as collaborators, with the goal of publishing a paper. I will be able to occasionally brainstorm, discuss and meet. Major progress, achievements and papers from the project can help me better advocate for your application.

Master's Students

I am part of the Yale Computer Science Master Advising Committee. Master's students are encouraged to apply only through the school's application portal. If you have already been admitted to a program at Yale and are interested in doing research with me, feel free to email me for further discussion.

Postdocs

Postdoc candidates are encouraged to reach out to me as well. Our lab hires, on average, 1 postdoc every 2 years.
  • Successful candidates usually have 3 or more solid, highly impactful publications in an area, and a coherent, unified thesis on a specific topic that encompasses a number of works.
  • As with PhD applicants, I value paper quality over quantity, but the standard is higher for postdoc candidates.
  • Prior experience leading a team of researchers on a large-scope project will be appreciated.
  • Candidates are required to have extensive research experience in foundation models, multimodal models, graph learning, trustworthy deep learning or relational reasoning.
After passing preliminary screening, candidates will be asked to give a research talk (remote or in person) to the group and talk with lab members before receiving a decision.

Visiting Students

I welcome visiting students and interns at all levels. The duration is somewhat flexible, although students are required to commit to finishing a research project for publication (as first author or co-author). Students are required to demonstrate strong interest, good background knowledge, strong coding skills and commitment to research in the areas mentioned above. Prior research experience is encouraged.

Acknowledgement

My work would not have been possible without the support of my family, friends, students and awesome collaborators! Check out some of my collaborators: Jure Leskovec, Christopher Ré, Pietro Liò, Jiaxuan You, Marinka Zitnik, William Hamilton, Xiang Ren, Bowen Liu, Peter Battaglia, Petar Veličković, Ines Chami, Hanjun Dai, Matthias Fey and many more...

Organizations

Aside from university collaborations, I have also collaborated with many industrial companies and non-profit organizations, including Pinterest, Facebook AI Research, Siemens, DeepMind, Amazon, SLAC National Accelerator Laboratory, Saudi Aramco and more.

Contact

Location

I'm currently located in New Haven, CT 06520.

Email

You can reach me via email.
I will try my best to respond as my schedule permits, though I may not always be able to when overwhelmed by emails.