Rex (Zhitao) Ying

I'm currently an assistant professor in the Department of Computer Science at Yale University.

I work on AI and machine learning algorithms that can reason and understand large-scale structured data, including time series, databases, tables and graphs.

I am an Amazon Scholar working on building multimodal foundation models for Last Mile.

From 2021 to 2023, I was a founding member of Kumo.ai!

Welcome to my personal website

To find my latest research, including multimodal foundation models, reasoning / post-training with structured data, graph learning, time series, RAG and geometric deep learning, please visit my Lab Website.

Refer to my Yale profile for more information. Our lab is hiring motivated and capable Ph.D. students interested in large-scale geometric deep learning, multimodal foundation models, structured-data reasoning and self-evolving agents.

About Me

Introduction

I'm an assistant professor in the Department of Computer Science at Yale University. My research focuses on algorithms for structured data (graphs, time series, tables...), geometric deep learning (hyperbolic foundation models), and multimodal foundation models with reasoning / post-training. I am the author of many widely used GNN algorithms such as GraphSAGE, PinSAGE and GNNExplainer. In addition, I have worked on a variety of applications in AI for science (physical simulations, molecules, single-cell, spatial transcriptomics, material discovery), recommender systems and social networks, knowledge graphs and neuroscience.

Education

I obtained my Ph.D. degree in computer science at Stanford University, advised by Jure Leskovec. My thesis, which is available on GitHub, focuses on expressive, scalable and explainable GNNs (graph neural networks). Prior to that, I graduated from Duke University in 2016 with the highest distinction. I majored in computer science and mathematics.

Prospective Students

We sincerely welcome interested students! Please refer to this section for more detail.

Dr. Rex (Zhitao) Ying

Assistant professor

News

I organized a workshop on Non-Euclidean Foundation Models at NeurIPS 2025
I gave a tutorial on Hyperbolic LLMs at KDD 2025
I gave a keynote on Hyperbolic Foundation Models at Applied Geometry for Data Science Workshop 2024
I gave a tutorial on Machine Learning in Network Science at NetSci 2024
I gave a tutorial on Text-Attributed Graph Representation Learning: Methods, Applications, and Challenges at WebConf 2024
I gave a keynote on Foundation Models and Geometry for Science via Relational Reasoning at WebConf 2024 Graph Foundation Models Workshop
I gave a seminar talk on Self-supervised learning and foundation models at University of Texas Rio Grande Valley. Date: April 25, 2024
I was awarded the Amazon Research Award 2024
I gave a keynote at the GNNs for the Sciences: from Theory to Practice Workshop, University of Chicago. Date: Jan 25, 2024
I gave a talk on multimodal graph models at AWS. Date: Jan 11, 2024
Past News

2023

It's our second year organizing the New Frontiers of Graph Learning Workshop at NeurIPS 2023. Submissions and attendance are welcome!
I'm giving a talk at Stanford Graph Learning Workshop on "Deep learning for time series forecasting problems with relational information". Date: Oct 24th 2023.
I'm giving a seminar talk at Duke University on "Graph Learning for intelligent and efficient reasoning". Date: Oct 2nd 2023.
As KDD PhD Consortium Chair, I organized the 1-day PhD Consortium event at KDD 2023.
We had 4 papers published at ICML 2023 and KDD 2023.
I gave a keynote at AAAI 2023 Deep Learning on Graphs: Methods and Applications Workshop on "Graph learning for non-graph data".

2022

I'm excited to be a founding engineer of Kumo.ai, where we work on building a cloud AI platform based on state-of-the-art GNNs and the PyG library.
I'm the winner of the KDD 2022 Dissertation Award, and will present the thesis at the KDD 2022 Conference.
I'm one of the 10 winners of the 2019 Baidu Scholarship in Artificial Intelligence.

To Prospective Students

Prospective students, I would appreciate it if you read the following before reaching out to me by email. To make it easier for me to identify applications, use "PhD (or Postdoc, Visiting Student) Application" as your email title. Due to the volume of application emails, I might not always be able to respond. But if you believe that you possess the credentials and qualities mentioned below, feel free to remind me if you have not received a response after a week.

Note (special focus):

We have industrial collaboration opportunities to work on time series foundation models, RAG, recommender systems, LLM personalization etc.
If you are interested in applying to a PhD program focusing on AI for neuroscience, explicitly mention it in the email when reaching out to me. I encourage you to check out WTI (computational track), which I'm a part of.
If you are interested in AI for computational biology, check out the Yale CBB program, which I'm also a part of.

PhDs

When reaching out to me, it would be best to demonstrate the following in your email.

Reaching out early and asking about opportunities for collaboration with my lab can be a very effective way to stand out from other candidates.
Highly competitive PhD applicants usually have substantial research experience prior to applying. Note that the number of publications is not the crucial factor; the quality, novelty and potential impact of the research are. Please emphasize your technical innovation and leadership as demonstrated in your past research experience.
A student who has a single top-tier publication, but has demonstrated outstanding ability (usually as a first author) in idea formulation, implementation, experiments, analysis and writing is considered more competitive than a student who participated in many research works but did not own / lead one from beginning to end.
Publications in top-tier ML and data mining conferences such as NeurIPS, ICML, ICLR, KDD, WebConf etc. are highly encouraged. High-impact journal publications in interdisciplinary fields are also highly appreciated.
It is recommended that your prior research falls under the broad category of machine learning. However, the actual research topic does not need to be similar to mine, as long as the candidate demonstrates interest in and understanding of the research topics of our lab, and has demonstrated the qualities mentioned above. We welcome diversity at all levels, including skill sets!

Note:

I understand that while most applicants have prior research experience and papers in submission, some students do not yet have a top conference publication. Please refer to the funded two-year master program (which can be a bridge to becoming a PhD candidate in my lab). I recommend that highly motivated students reach out to me well before the admission deadline and join existing projects as collaborators, with the goal of a publication. I will be able to occasionally brainstorm, discuss and meet. Major progress, achievements and papers during the project can better help me advocate for the application.

Master Students

I am part of the Yale Computer Science Master Advising Committee. Master students are encouraged to apply only through the school application portal. For students interested in the 2-year funded master program (which can be an effective way to become a PhD student in my lab if the research goes well), please refer to the funded 2-year program here. If you have already been admitted to a program at Yale and are interested in doing research with me, feel free to send me an email for further discussion.

Postdocs

Postdoc candidates are encouraged to reach out to me as well. Our lab hires, on average, 1 postdoc every 2 years.

Successful candidates usually have 3 or more solid and highly impactful publications in an area, and have a coherent and unified thesis on a specific topic, encompassing a number of works.
Similar to evaluating PhD applicants, I value paper quality over quantity. The standard will be higher for postdoc candidates.
Prior experience in leading a team of researchers on a large-scope project will be appreciated.
Candidates are required to have extensive research experience in at least one of the following: foundation models, multimodal models, graph learning, trustworthy deep learning or relational reasoning.

After passing preliminary screening, the candidate will be asked to give a research talk (remote or in-person) to the group and talk to lab members before receiving a decision.

Visiting Students

I welcome visiting students / interns at all levels. The duration can be somewhat flexible, although the student is required to commit to finishing a research project for publication (as first author or co-author). Students are required to demonstrate a strong interest, good background knowledge, strong coding skills and commitment to research in the research areas mentioned. Prior research experience is encouraged.

Research Outline

My research focuses on deep learning for graphs and (implicit) relational data, as well as building multimodal foundation models with strong relational reasoning capability. I'm also very interested in geometric deep learning, non-Euclidean representation learning and trustworthy deep learning, all of which have great synergy with graph learning or foundation models. My interests range from theoretical questions in graph learning, relational reasoning, generalization and manifold geometry to practical use cases in science and technology, in collaboration with science institutes and IT companies.

I have collaborated with industrial research labs including DeepMind, Amazon, Snap, and national research labs such as Sandia. I have active collaborations with startups including Kumo.AI and Collov.

My list of publications can be found on Google Scholar.

Deep Learning for Graphs

Graph neural networks (GNNs) are powerful tools that play an important role in machine learning, to make predictions based on ubiquitous graph and network structure.

Multimodal Relational Foundation Models

It is crucial for pretrained foundation models to understand and make use of relational information to empower intelligent reasoning, leveraging graph representations.

Geometric Representation Learning

The goal is to empower deep learning architectures with effective representation geometry, which is essential in modeling data manifolds with different characteristics.

Unlimited Applications

I work on real-world applications in chemistry, biology, neuroscience, physical simulations, knowledge graphs, natural languages, recommendations and social networks.

Services

I'm excited to serve the research community in various ways. I co-lead the open-source project PyTorch Geometric, which aims to make developing graph neural networks easy and accessible for researchers, engineers and a general audience with a variety of backgrounds. I have served as a committee member for machine learning conferences including AAAI, ICML, NeurIPS, ICLR, KDD and WebConf for over 7 years, and I am serving as area chair for LoG 2022. In addition, I have organized a variety of workshops on topics including graph learning, graph neural networks and deep learning for simulation.

Teaching

I teach two courses: "Deep Learning for Graph-Structured Data" and "Trustworthy Deep Learning" at Yale.

Course Websites

Deep Learning for Graph-Structured Data

My Past Workshops

New Frontiers in Graph Learning (GLFrontiers) at NeurIPS 2022 and NeurIPS 2023
Deep Learning for Simulation (SimDL) at ICLR 2021
Stanford Graph Learning Workshop (SGL)
Graph Representation Learning and Beyond (GRL+) at ICML 2020
Co-organized the 2020 KDD Cup Competition on Graph AutoML.

Selected Publications

Refer to my Lab Website for the latest and most exciting works on multimodal foundation models, time series, graph learning, reasoning, and applications.

A few selected publications are listed for each research direction.

See Google Scholar for a full list of publications.
I have also been a core developer of the PyTorch Geometric library and integrated many models within the library itself.

Deep Learning on Graphs

I focus on advancing graph neural network (GNN) architectures and improving the expressiveness, scalability, interpretability and robustness of GNNs. More recently, I focus on pre-trained, large-scale foundation models for graph-structured data.

Learning to Group Auxiliary Datasets for Molecule
Building on pre-trained GNN models, we explore the general strategy of automatically selecting auxiliary datasets to improve molecule prediction performance using meta-learning.

DeSCo: Towards Generalizable and Scalable Deep Subgraph Counting
DeSCo is the state-of-the-art (pre-trained) GNN that can perform reliable subgraph counting predictions on real-world large networks, based on symmetry breaking and an expressive GNN architecture.

BatchSampler: Sampling Mini-Batches for Contrastive Learning in Vision, Language, and Graphs
KDD 2023
BatchSampler is a novel and general negative sampling scheme for self-supervised contrastive learning frameworks. It improves pre-trained model performance in the domains of images, natural language and graphs.

Local Augmentation for Graph Neural Networks
ICML 2022
LA-GNN is a general pre-training framework that improves GNN performance through augmentation.

Neural Execution of Graph Algorithms
ICLR 2021
GNNs can learn to execute graph algorithms.

Identity-aware Graph Neural Networks
AAAI 2021
ID-GNN improves the expressiveness of GNNs by considering node identities.

Inductive Representation Learning on Large Graphs
NeurIPS 2017
GraphSAGE is a general GNN framework for large-scale graph learning.

GraphRNN: Generating Realistic Graphs with Deep Auto-regressive Models
ICML 2018
GraphRNN is one of the first graph generative models for learning the distribution of graphs.

GNNExplainer: Generating Explanations for Graph Neural Networks
NeurIPS 2019
The first framework to explain predictions made by GNNs!


Representation Learning

I develop representation learning techniques and embedding geometries for data with different characteristics (hierarchical, heterogeneous etc.).

Hyperbolic Representation Learning: Revisiting and Advancing
ICML 2023
HIE is a task-agnostic and model-agnostic method to advance existing hyperbolic embedding methods, by incorporating cost-free hierarchical information deduced from the hyperbolic distance of the node to the origin.

Hyperbolic Graph Topic Modeling Network with Continuously Updated Topic Tree
KDD 2023
We learn interpretable representations that capture the document and text hierarchies through combining hyperbolic GNNs and topic modeling.

Hyperbolic Graph Convolutional Neural Networks
NeurIPS 2019
HGCN embeds nodes in a graph in hyperbolic space to capture hierarchical structure. It's one of the first hyperbolic GNNs.

Modeling Heterogeneous Hierarchies with Relation-specific Hyperbolic Cones
NeurIPS 2021
ConE uses hyperbolic cones to model the heterogeneous hierarchies in knowledge graphs.

Neural Distance Embeddings for Biological Sequences
NeurIPS 2021
We embed biological sequences in Euclidean and hyperbolic spaces for solving challenging problems such as multiple sequence alignment.
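
Several of the works above build on hyperbolic geometry, and HIE in particular uses the hyperbolic distance of a node to the origin. As a minimal illustration only, and not the implementation from any of these papers, the sketch below computes that distance under the assumption of a Poincaré ball model with curvature -c. The function name `distance_to_origin` and the parameter `c` are hypothetical choices made for this example.

```python
import math


def distance_to_origin(x, c=1.0):
    """Hyperbolic distance from the origin to point x.

    Assumes the Poincare ball model with curvature -c (c > 0), where
    d(0, x) = (2 / sqrt(c)) * artanh(sqrt(c) * ||x||).
    This is an illustrative sketch, not code from the papers listed here.
    """
    if c <= 0:
        raise ValueError("c must be positive")
    norm = math.sqrt(sum(v * v for v in x))
    scaled = math.sqrt(c) * norm
    if scaled >= 1.0:
        # Points on or outside the boundary are not part of the ball.
        raise ValueError("point lies outside the Poincare ball")
    return 2.0 / math.sqrt(c) * math.atanh(scaled)


if __name__ == "__main__":
    # Embeddings closer to the boundary are farther from the origin.
    for point in ([0.0, 0.0], [0.5, 0.0], [0.9, 0.0]):
        print(point, distance_to_origin(point))
```

Points near the boundary of the ball are far from the origin, which is why this distance can serve as a rough proxy for depth in a hierarchy.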

Representation Learning on Graphs: Methods and Applications
IEEE Data Engineering Bulletin 2017
A survey on graph representation learning, including distributed embedding approaches and GNNs.

Hierarchical Graph Representation Learning with Differentiable Pooling
NeurIPS 2018
DiffPool is a differentiable graph pooling module that learns hierarchical graph representations.

Position-aware Graph Neural Networks
ICML 2019
P-GNN improves the expressiveness of position information for node embeddings.


Applications

Natural phenomena and the world's knowledge can often be expressed in the language of graphs. In addition to the popular applications of graphs such as social networks, recommender systems, knowledge graphs, biological networks and molecules, I'm also interested in novel ways of incorporating relational reasoning into other fields of science and technology, such as physical simulations, natural language and industrial relational database predictions.

Learning to Simulate Complex Physics with Graph Networks
ICML 2020
We enable graph neural networks to learn to produce realistic simulations of different materials.

Learning Large-scale Subsurface Simulations with a Hybrid Graph Network Simulator
KDD 2022
We collaborate with Saudi Aramco to use machine learning for simulating oil and water flows, underground pressure and oil production.

Graph Ensemble Learning over Multiple Dependency Trees for Aspect-level Sentiment Classification
NAACL 2021
Heterogeneous graph attention on dependency parsing trees can improve robustness and performance of aspect-level sentiment analysis in NLP.

Graph Convolutional Neural Networks for Web-Scale Recommender Systems
KDD 2018
The first GNN-based recommender system applied to billion-user-scale industrial platforms. It is deployed at Pinterest and also serves as an embedding service for many downstream use cases.

Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation
NeurIPS 2018
We combine GNNs and reinforcement learning to generate realistic molecules with desired chemical and biological goals.

Bipartite Dynamic Representations for Abuse Detection
KDD 2021
A dynamic GNN for anomaly detection at Amazon.com via an efficient message passing procedure and pre-training.


Acknowledgement

My work would not have been possible without the support from my family, friends, students and awesome collaborators! Check out some of my collaborators: Jure Leskovec, Christopher Ré, Pietro Liò, Jiaxuan You, Marinka Zitnik, William Hamilton, Xiang Ren, Bowen Liu, Peter Battaglia, Petar Veličković, Ines Chami, Hanjun Dai, Matthias Fey and many more...

Organizations

Aside from university collaborations, I have also collaborated with many industrial companies and non-profit organizations including Pinterest, Facebook AI Research, Siemens, DeepMind, Amazon, SLAC National Accelerator Laboratory, Saudi Aramco and more.

Contact

Location

I'm currently located in New Haven, CT 06520.

Email

You can reach me via email.
I will try my best to respond if the schedule permits, unless I'm overwhelmed by emails.
