• Rex (Zhitao) Ying

    I'm currently an assistant professor in the Department of Computer Science at Yale University.

    I work on AI and machine learning algorithms that leverage the graph structure of data.

    I'm excited to be a founding member of Kumo.ai!


Welcome to my personal website

Refer to my Yale profile for more information. Our lab is hiring motivated and capable Ph.D. students interested in geometric deep learning, graph neural networks, relational reasoning with foundation models, and trustworthy AI.

About Me


I'm an assistant professor in the Department of Computer Science at Yale University. My research focuses on algorithms for graph neural networks, geometric embeddings, explainable models and, more recently, multimodal foundation models involving relational reasoning. I am the author of several widely used GNN algorithms, including GraphSAGE, PinSAGE and GNNExplainer. In addition, I have worked on a variety of applications of graph learning in physical simulations, social networks, knowledge graphs, neuroscience and biotechnology. I developed the first billion-scale graph embedding services at Pinterest, and a graph-based anomaly detection algorithm at Amazon.


I obtained my Ph.D. in computer science at Stanford University, advised by Jure Leskovec. My thesis, which is available on GitHub, focuses on expressive, scalable and explainable graph neural networks (GNNs). Prior to that, I graduated from Duke University in 2016 with the highest distinction, majoring in computer science and mathematics.

Prospective Students

We sincerely welcome interested students! Please refer to this section for more detail.

Dr. Rex (Zhitao) Ying

Assistant professor


  • It's our second year organizing the New Frontiers of Graph Learning Workshop at NeurIPS 2023. Submissions and attendance are welcome!
  • I'm giving a talk at the Stanford Graph Learning Workshop on "Deep learning for time series forecasting problems with relational information". Date: Oct 24th 2023.
  • I'm giving a seminar talk at Duke University on "Graph Learning for intelligent and efficient reasoning". Date: Oct 2nd 2023.
  • As KDD PhD Consortium Chair, I organized the 1-day PhD Consortium event at KDD 2023.
  • We had 4 papers published at ICML 2023 and KDD 2023.
  • I gave a keynote at AAAI 2023 Deep Learning on Graphs: Methods and Applications Workshop on "Graph learning for non-graph data".
  • Past News
    • 2022
      • I'm excited to be a founding engineer of Kumo.ai, where we work on building a cloud AI platform based on state-of-the-art GNNs and the PyG library.
      • I'm the winner of the KDD 2022 Dissertation Award, and will present the thesis at the KDD 2022 Conference.
      • I'm one of the 10 winners of the 2019 Baidu Scholarship in Artificial Intelligence.

Research Outline

My research focuses on deep learning for graphs and (implicit) relational data, as well as on building multimodal foundation models with strong relational reasoning capability. I'm also very interested in geometric deep learning, non-Euclidean representation learning and trustworthy deep learning, all of which have great synergy with graph learning and foundation models. My interests range from theoretical questions in graph learning, relational reasoning, generalization and manifold geometry to practical use cases in science and technology, pursued in collaboration with science institutes and IT companies.

I have collaborated with industrial research labs including DeepMind, Amazon, and national research labs such as Sandia. I have active collaborations with startups including Kumo.AI and Collov.

My list of publications can be found on Google Scholar

Deep Learning for Graphs

Graph neural networks (GNNs) are powerful tools that play an important role in machine learning, to make predictions based on ubiquitous graph and network structure.

Multimodal Relational Foundation Models

It is crucial for pretrained foundation models to understand and make use of relational information to empower intelligent reasoning, leveraging graph representations.

Representation Learning

The goal is to empower deep learning architectures with effective representation geometry, which is essential in modeling data manifolds with different characteristics.

I work on real-world applications in chemistry, biology, neuroscience, physical simulations, knowledge graphs, natural languages, recommendations and social networks.


I'm excited to serve the research community in various ways. I co-lead the open-source project PyTorch Geometric, which aims to make developing graph neural networks easy and accessible for researchers, engineers and general audiences with a variety of backgrounds. I have served as a committee member for machine learning conferences including AAAI, ICML, NeurIPS, ICLR, KDD and WebConf for over 7 years, and I am serving as an area chair for LoG 2022. In addition, I have organized a variety of workshops on topics including graph learning, graph neural networks and deep learning for simulation.


I teach two courses: "Deep Learning for Graph-Structured Data" and "Trustworthy Deep Learning" at Yale.

Course Websites

My Past Workshops

  • New Frontiers in Graph Learning (GLFrontiers) at NeurIPS 2022 and NeurIPS 2023
  • Deep Learning for Simulation (SimDL) at ICLR 2021
  • Stanford Graph Learning Workshop (SGL)
  • Graph Representation Learning and Beyond (GRL+) at ICML 2020
  • Co-organized the 2020 KDD Cup Competition on Graph AutoML.

Selected Publications

A few selected publications are listed for each research direction. See Google Scholar for a full list of publications.
Most of the algorithms developed are open-sourced as part of the PyTorch Geometric Library.

Deep Learning on Graphs

I focus on advancing graph neural network (GNN) architectures and improving the expressiveness, scalability, interpretability and robustness of GNNs. More recently, I focus on pre-trained, large-scale foundation models for graph-structured data.

Building on pre-trained GNN models, we explore the general strategy of automatically selecting auxiliary datasets to improve molecule prediction performance using meta-learning.

DeSCo is the state-of-the-art (pre-trained) GNN that can perform reliable subgraph counting predictions on real-world large networks, based on symmetry breaking and expressive GNN architecture.

KDD 2023

BatchSampler is a novel and general negative sampling scheme for self-supervised contrastive learning framework. It improves pre-trained model performance in the domains of images, natural language and graphs.

ICML 2022

LA-GNN is a general pre-training framework that improves GNN performance through augmentation.

ICLR 2021

GNNs can learn to execute graph algorithms.

AAAI 2021

ID-GNN improves the expressiveness of GNN by considering node identities.

NeurIPS 2017

GraphSAGE is a general GNN framework for large-scale graph learning.

ICML 2018

GraphRNN is one of the first graph generative models for learning distributions over graphs.

NeurIPS 2019

GNNExplainer is the first framework to explain predictions made by GNNs.

Representation Learning

I develop representation learning techniques and embedding geometries for data with different characteristics (hierarchical, heterogeneous, etc.).

ICML 2023

HIE is a task-agnostic and model-agnostic method that improves existing hyperbolic embedding methods by incorporating cost-free hierarchical information deduced from each node's hyperbolic distance to the origin.

KDD 2023

We learn interpretable representations that capture document and text hierarchies by combining hyperbolic GNNs and topic modeling.

NeurIPS 2019

HGCN embeds nodes in a graph in hyperbolic space to capture hierarchical structure. It's one of the first hyperbolic GNNs.

NeurIPS 2021

ConE uses hyperbolic cones to model the heterogeneous hierarchies in knowledge graphs.

NeurIPS 2021

We embed biological sequences in Euclidean and hyperbolic spaces for solving challenging problems such as multiple sequence alignment.

IEEE Data Engineering Bulletin 2017

A survey on graph representation learning, including distributed embedding approaches and GNNs.

NeurIPS 2018

GraphRNN is one of the first deep graph generative models for learning distributions over graphs.

ICML 2019

P-GNN improves the expressiveness of position information for node embeddings.


Natural phenomena and the world's knowledge can often be expressed in the language of graphs. In addition to popular applications of graphs such as social networks, recommender systems, knowledge graphs, biological networks and molecules, I'm also interested in novel ways of bringing relational reasoning to other fields of science and technology, such as physical simulations, natural language and predictions over industrial relational databases.

ICML 2020

We enable graph neural networks to learn to produce realistic simulations of different materials.

KDD 2022

We collaborate with Saudi Aramco to use machine learning for simulating oil and water flows, underground pressure and oil production.

NAACL 2021

Heterogeneous graph attention on dependency parsing trees can improve robustness and performance of aspect-level sentiment analysis in NLP.

KDD 2018

The first GNN-based recommender system applied to billion-user-scale industrial platforms. It is deployed at Pinterest and also serves as an embedding service for many downstream use cases.

NeurIPS 2018

We combine GNNs and reinforcement learning to generate realistic molecules with desired chemical and biological properties.

KDD 2021

A dynamic GNN for anomaly detection at Amazon.com, based on an efficient message-passing procedure and pre-training.

To Prospective Students

Prospective students: I would appreciate it if you read the following before reaching out to me by email. To make applications easier for me to identify, use "PhD (or Postdoc, Visiting Student) Application" as the title of your email.


When reaching out to me, please keep the following in mind.
  • Reaching out early and collaborating with my lab can be a very effective way to stand out from other candidates.
  • Highly competitive PhD applicants usually have substantial research experience prior to applying. Note that the number of publications is not the crucial factor.
  • A student with a single publication who demonstrated outstanding ability (usually as first author) in idea formulation, implementation, experiments, analysis etc. is considered more competitive than a student who participated in many research projects but did not own or lead one from beginning to end.
  • Publications in top-tier ML and data mining conferences such as NeurIPS, ICML, ICLR, KDD, WebConf etc. are highly encouraged. High-impact journal publications in interdisciplinary fields are also highly appreciated.
  • Ideally, your prior research falls under the broad category of machine learning. However, the actual research topic does not need to be similar to mine, as long as you demonstrate interest in and understanding of the research topics of our lab. We welcome diversity at all levels, including skill sets!


I understand that some students who do not yet have a conference publication are interested in applying. I recommend that highly motivated students reach out to me well before the admission deadline and join an existing project as a collaborator, with the goal of a publication. I will be able to occasionally brainstorm, discuss and meet. Major progress, achievements and papers during the project will help me advocate for the application.

Note (special focus):

  • If you are interested in applying for a PhD program focusing on AI for neuroscience, explicitly mention it in your email when reaching out to me. I encourage you to check out WTI (computational track), which I'm a part of.
  • If you are interested in AI for computational biology, check out the Yale CBB program, which I'm also a part of.


Postdoc candidates are encouraged to reach out to me as well.
  • Successful candidates usually have 3 or more solid and impactful publications in an area, and a coherent and unified thesis on a specific topic that encompasses a number of works.
  • Similar to evaluating PhD applicants, I value paper quality over quantity.
  • Prior experience in leading a large-scope project will be appreciated.
  • Candidates are required to have extensive research experience in at least one of foundation models, graph learning, trustworthy deep learning or relational reasoning.
After passing preliminary screening, the candidate will be asked to give a research talk (remote or in person) to the group and to talk to lab members before a decision is made.

Visiting Students

I welcome visiting students and interns at all levels. Students are required to demonstrate a strong interest and good background knowledge in graph learning. Prior research experience is encouraged but not necessary.


My work would not have been possible without the support from my family, friends, students and awesome collaborators! Check out some of my collaborators: Jure Leskovec, Christopher Ré, Pietro Liò, Jiaxuan You, Marinka Zitnik, William Hamilton, Xiang Ren, Bowen Liu, Peter Battaglia, Petar Veličković, Ines Chami, Hanjun Dai, Matthias Fey and many more...


Aside from university collaborations, I have also collaborated with many companies and non-profit organizations including Pinterest, Facebook AI Research, Siemens, DeepMind, Amazon, SLAC National Accelerator Laboratory, Saudi Aramco and more.



I'm currently located in New Haven, CT 06520.


You can reach me via email.
I will try my best to respond when my schedule permits, though I may not be able to reply to every message.