Xinyi Zhang will develop machine learning models that integrate multimodal and spatiotemporal data to achieve a holistic understanding of cell states, tissue microenvironments, and perturbation effects.
Xinyi Zhang - Starting Principal Investigator
Xinyi Zhang = Curious + Adventurous + Collaborative + Nature-Loving + Pilot
Xinyi Zhang will develop machine learning models to both advance the understanding of biological mechanisms and facilitate the discovery of therapeutic targets in diseases, building on theoretical and empirical advances in machine learning and causal inference.
Xinyi studied Bioengineering and Computer Science at the University of California, Berkeley, and earned her PhD in Computer Science at the Massachusetts Institute of Technology, where she was advised by Prof. Caroline Uhler and supported by a graduate fellowship from the Eric and Wendy Schmidt Center at the Broad Institute.
During her PhD, she developed computational frameworks that integrate diverse modalities—including spatial transcriptomics, chromatin and protein staining—to achieve a comprehensive view of cell state and tissue organization in diseases. Her work identified region-specific progression patterns, chromatin biomarkers, and gene expression changes in Alzheimer’s disease, and revealed shared spatial re-organization across multiple neurodegenerative disorders. To scale to large clinical cohorts, she developed representation learning methods that extract rich morphological information from simple and cost-effective imaging assays. Her models also enable prediction of missing modalities, such as the subcellular localization of unmeasured proteins at single-cell resolution. Most recently, she has worked on disentangling the shared and modality-specific information across multiple modalities to better understand the underlying regulatory mechanisms and inform experimental design.
ORCID: https://orcid.org/0000-0003-4996-4698
LinkedIn: https://www.linkedin.com/in/zhang-xinyi/
GitHub: https://xinyiz98.github.io/
ML for Cell and Tissue Biology: From Multimodal Integration to Biomarker and Function
The group of Xinyi Zhang will develop machine learning models that integrate multimodal and spatiotemporal data to achieve a holistic understanding of cell states, tissue microenvironments, and perturbation effects. Her goal is to gain mechanistic insights into cellular and tissue regulation across scales, from protein localization and interaction in single cells to cell fate decisions in organoids and tissues. By modeling cellular dynamics and interactions in the tissue context, the team aims to enable virtual profiling of genetic and chemical perturbations to identify potential therapeutic targets for disease-associated changes in protein localization, cell states, and tissue architecture.
1. Tissue-specific protein localization and interaction. Protein-protein interactions and protein localization are essential to many biological processes and are tightly regulated by cell and tissue states. However, current experimental approaches are limited in their ability to measure these properties at single-cell resolution and within tissue. The Zhang group will develop computational models that predict protein localization and interactions with single-cell and tissue specificity. These models would enable predictions of how disease-associated genetic mutations or changes in cell state alter protein localization and interactions, ultimately supporting therapeutic discovery.
2. Modeling the dynamics and interactions in the tissue microenvironment to study cell fate. The Zhang group will develop computational frameworks to model the tissue microenvironment and study how genetic and chemical perturbations influence cell states in tissue over time. While pooled perturbation screens with spatial transcriptomic or multiplexed imaging readouts and the development of organoid systems offer exciting opportunities, new computational methods are needed to model tissue dynamics and learn how cellular neighborhoods and tissue architecture affect perturbation outcomes. By integrating perturbation modeling with temporal dynamics, feature learning, physical principles, and disentanglement of multimodal information, the goal is to understand the interplay between the molecular and mechanical signaling underlying cell fate decisions in tissue. This understanding could enable virtual profiling of gene expression, morphology, and molecular phenotypes under unseen conditions. The approaches are applicable across developmental and disease contexts and may ultimately guide the design of perturbations that correct pathological cell states and restore tissue organization.
3. Clinical applications in metabolic disease, cancer, and neurodegeneration. The methods are designed to be broadly applicable to large-scale patient and drug-screening datasets. The Zhang group aims to extend them to study the effect of patient-specific genetic variants on cell state using imaging, spatial omics, and histopathology data, which could enable functional interpretation of risk variants in metabolic disease, cancer, and neurodegeneration. By developing robust, interpretable, and generalizable models, the goal is to link mechanistic insights into cellular regulation to therapeutic target discovery.
Publication Highlights
Prediction of protein subcellular localization in single cells. Zhang X, Tseo Y, Bai Y, Chen F & Uhler C. Nat Methods. 2025 Jun;22(6):1265-1275. doi: 10.1038/s41592-025-02696-1. https://www.nature.com/articles/s41592-025-02696-1
Partially Shared Multi-Modal Embedding Learns Holistic Representation of Cell State. Zhang X, Shivashankar GV, Uhler C. bioRxiv. doi: 10.1101/2024.10.01.615977. https://www.biorxiv.org/content/10.1101/2024.10.01.615977v1
Graph-based autoencoder integrates spatial transcriptomics with chromatin images and identifies joint biomarkers for Alzheimer's disease. Zhang X, Wang X, Shivashankar GV, Uhler C. Nat Commun. 2022 Dec 3;13(1):7480. doi: 10.1038/s41467-022-35233-1. https://www.nature.com/articles/s41467-022-35233-1