Martin Jinye Zhang

This website is outdated. Please visit my new website.

My HSPH email jinyezhang at hsph.harvard.edu is outdated. You can reach me at martinzh at andrew.cmu.edu.

[GitHub] [Google Scholar] [Twitter]

Hello! I am Martin, an assistant professor in the Computational Biology Department at CMU. I recently finished my role as a research associate at the Harvard T.H. Chan School of Public Health, where I worked with Prof. Alkes Price on statistical genetics. Currently, I am interested in the following topics:

  • Integrating GWAS and single-cell RNA-seq data, and understanding how genetic variants affect gene expression levels and diseases in different cellular contexts.
  • Understanding the common- and rare-variant genetic architecture of complex diseases and traits; we are analyzing the UK Biobank WES data for this purpose.
  • Causal inference for multiomics data; distinguishing disease-causal genes from disease-responsive genes; and ultimately, understanding how genetic intervention would change the course of diseases.

I did my PhD at Stanford with Prof. David Tse and Prof. James Zou, working on statistics, machine learning, and computational biology. My PhD work spans a wide range of topics, from theory to algorithm design to applications. I have worked on covariate-adaptive multiple hypothesis testing, algorithm acceleration via multi-armed bandits, optimal design for single-cell RNA-seq experiments, and the analysis of single-cell RNA-seq data sets.

Last updated: 8/13/2023

News:
  • 4/2021 Our paper on identifying aging signatures using the Tabula Muris Senis data was accepted by eLife.

Positions & Education

2023 - Present, Computational Biology Department, Carnegie Mellon University,

Incoming Assistant Professor (expected to start in fall 2023)

2019 - 2023, T.H. Chan School of Public Health, Harvard University,

Research Associate (2022-23)

Postdoctoral Researcher (2019-22)

2014 - 2019, Department of Electrical Engineering, Stanford University,

Doctor of Philosophy (PhD, 2014-19) and Master of Science (MS, 2014-17)

2010 - 2014, Department of Electronic Engineering, Tsinghua University,

Bachelor of Engineering (B.Eng.)

Papers

(*equal contribution, #corresponding author)

Preprints

  • Age-dependent topic modelling of comorbidities in UK Biobank identifies disease subtypes with differential genetic risk. [paper]
    Xilin Jiang, Martin Jinye Zhang*, Yidong Zhang*, Michael Inouye, Chris Holmes, Alkes L. Price#, Gil McVean#.
    In revision at Nature Genetics (2022).

  • Faster Maximum Inner Product Search in High Dimensions. [paper]
    Mo Tiwari, Ryan Kang*, Je-Yong Lee*, Luke Lee, Chris Piech, Sebastian Thrun, Ilan Shomorony, Martin Jinye Zhang#.
    In submission (2022).


Published papers (as main author)

  • Polygenic enrichment distinguishes disease associations of individual cells in single-cell RNA-seq data. [paper] [code]
    Martin Jinye Zhang*, Kangcheng Hou*, Kushal K. Dey, Saori Sakaue, Karthik A. Jagadeesh, Kathryn Weinand, Aris Taychameekiatchai, Poorvi Rao, Angela Oliveira Pisco, James Zou, Bruce Wang, Michael Gandal, Soumya Raychaudhuri, Bogdan Pasaniuc#, Alkes Price#.
    Nature Genetics (2022).

  • Mouse Aging Cell Atlas Analysis Reveals Global and Cell Type Specific Aging Signatures. [paper] [code]
    Martin Jinye Zhang#, Angela Oliveira Pisco#, Spyros Darmanis, James Zou#.
    eLife (2021).

  • Determining sequencing depth in a single-cell RNA-seq experiment. [paper] [code]
    Martin J. Zhang*, Vasilis Ntranos*, David Tse.
    Nature Communications (2020). Selected as one of the 2020 Top 50 Life and Biological Sciences Articles.
    (Preliminary version: "One read per cell per gene is optimal for single-cell RNA-seq". [pdf])

  • Adaptive Monte Carlo Multiple Testing via Multi-Armed Bandits. [paper] [code]
    Martin J. Zhang, James Zou, David Tse.
    ICML (2019).

  • Fast and covariate-adaptive method amplifies detection power in large-scale multiple hypothesis testing. [paper] [software] [paper code]
    Martin J. Zhang, Fei Xia, James Zou.
    Nature Communications (2019). The preliminary version was accepted at RECOMB 2019 as the Cell Systems best paper and received the RECOMB Best Paper Award.
    (Preliminary version: "AdaFDR: a Fast, Powerful and Covariate-Adaptive Approach to Multiple Hypothesis Testing". [pdf])

  • Exploring Patterns Unique to a Dataset with Contrastive Principal Component Analysis. [paper] [code]
    Abubakar Abid*, Martin J. Zhang*, Vivek K. Bagaria, James Zou.
    Nature Communications (2018).

  • Medoids in Almost Linear Time via Multi-armed Bandits. [paper] [code]
    Vivek Bagaria*, Govinda Kamath*, Vasilis Ntranos*, Martin J. Zhang*, David Tse.
    AISTATS (2018).

  • NeuralFDR: Learning Discovery Thresholds from Hypothesis Features. [paper] [code]
    Fei Xia*, Martin J. Zhang*, James Zou, David Tse.
    NeurIPS (2017).

  • Block-wise MAP Inference for the Determinantal Point Processes with Application to Change Point Detection. [paper]
    Martin J. Zhang, Zhijian Ou.
    SSP (2016).

  • On the Theoretical Analysis of Cross Validation in Compressive Sensing. [paper]
    Jinye Zhang, Laming Chen, Petros T. Boufounos, and Yuantao Gu.
    ICASSP (2014).


Published papers (as advisor)

  • MABSplit: Faster Forest Training Using Multi-Armed Bandits.
    Mo Tiwari, Ryan Kang*, Jaeyong Lee*, Christopher J Piech#, Ilan Shomorony#, Sebastian Thrun#, Martin Jinye Zhang#.
    NeurIPS (2022).

  • MLDemon: Deployment Monitoring for Machine Learning Systems. [pdf]
    Antonio Ginart, Martin Jinye Zhang, James Zou.
    AISTATS (2022).

  • BanditPAM: Almost Linear Time k-Medoids Clustering via Multi-Armed Bandits. [code]
    Mo Tiwari, Martin Jinye Zhang, James Mayclin, Sebastian Thrun, Chris Piech, Ilan Shomorony.
    NeurIPS (2020).


Selected published papers (as contributing author)

  • A multi-dimensional integrative scoring framework for predicting functional variants in the human genome.
    Xihao Li, Godwin Yung, Hufeng Zhou, Ryan Sun, Zilin Li, Kangcheng Hou, Martin Jinye Zhang, Yaowu Liu, Theodore Arapoglou, Chen Wang, Iuliana Ionita-Laza, Xihong Lin.
    The American Journal of Human Genetics (2022).

  • Deep longitudinal multiomics profiling reveals two biological seasonal patterns in California.
    M Reza Sailani, Ahmed A Metwally, Wenyu Zhou, Sophia Miryam Schüssler-Fiorenza Rose, Sara Ahadi, Kevin Contrepois, Tejaswini Mishra, Martin Jinye Zhang, Łukasz Kidziński, Theodore J Chu, Michael P Snyder.
    Nature Communications (2020).

  • A single-cell transcriptomic atlas characterizes ageing tissues in the mouse. [code]
    The Tabula Muris Consortium.
    Nature (2020). Contributed to differential expression analysis (Fig. 2f-h) and cluster diversity score (Fig. 4c-f).

  • Polymicrobial periodontal disease triggers a wide radius of effect and unique virome.
    Li Gao, Misun Kang, Martin Jinye Zhang, M. Reza Sailani, Ryutaro Kuraji, April Martinez, Changchang Ye, Pachiyappan Kamarajan, Charles Le, Ling Zhan, Hélène Rangé, Sunita P. Ho, Yvonne L. Kapila.
    npj Biofilms and Microbiomes (2020).

  • Longitudinal multi-omics of host–microbe dynamics in prediabetes.
    Wenyu Zhou*, M. Reza Sailani*, Kévin Contrepois*, Yanjiao Zhou*, Sara Ahadi*, Shana Leopold, Martin J. Zhang, ..., George M. Weinstock, Michael Snyder.
    Nature (2019). Contributed 3 panels in 2 figures.


Software

  • scdrs: single-cell disease-relevance score.

  • adafdr: covariate-adaptive multiple testing.

  • sceb: sequencing-depth aware estimators for single-cell RNA-seq analysis via empirical Bayes.

  • Meddit: an almost-linear-time algorithm for computing the medoid of a set of n points via adaptive sampling.

  • contrastive: a Python library for unsupervised learning (e.g. PCA) in contrastive settings, where one is interested in patterns (e.g. clusters or clines) that exist in one dataset but not in the other.

Professional services

  • Reviewer for the journals Nature Genetics, Nature Communications, BMC Biology, Bioinformatics, Journal of Machine Learning Research, Journal of Applied Statistics, Biometrics, Scientific Reports, and Journal of Genetics and Genomics, and for the conferences IJCAI (2021-22), ICML (2020-22), NeurIPS (2016, 2019-22), and ICLR (2021).
  • Organizer of the Information Systems Laboratory Colloquium, 2015-2019, EE, Stanford.

Honors and Awards

  • 2022 Platform talk for the ASHG 2022 abstract "Cell-type transcriptome-wide association studies and fine-mapping via deconvolution using single-cell RNA-seq".
  • 2021 Postdoctoral Semifinalist for the 2021 Charles J. Epstein Trainee Awards for Excellence in Human Genetics Research at the 71st Annual Meeting of the American Society of Human Genetics (info)
  • 2021 Reviewers’ Choice Award for the ASHG 2021 abstract "Polygenic enrichment distinguishes disease associations of individual cells in single-cell RNA-seq data".
  • 2021 Reviewers’ Choice Award for the ASHG 2021 abstract "Transcriptome-wide association studies and fine-mapping at cell-type resolution".
  • 2020 Nature Communications 2020 Top 50 Life and Biological Sciences Articles for the paper "Determining sequencing depth in a single-cell RNA-seq experiment".
  • 2019 RECOMB 2019 Best Paper Award
  • 2019 RECOMB 2019 Travel Award
  • 2017 NeurIPS 2017 Travel Award
  • 2015 Stanford Graduate Fellowship (SGF, Inventec Fellow)
  • 2015 Numerical Technologies Award in Electrical Engineering (Numerical Technologies Founders Graduate Fellowship)
  • 2015 Ranked 2/79 in the EE PhD Qualifying Exam at Stanford University
  • 2014 Outstanding Undergraduate Thesis "Speech Diarization Based on the Determinantal Point Processes" at Tsinghua University
  • 2013 Comprehensive Excellence Scholarship in Electronic Engineering at Tsinghua University

Teaching Experience

  • TA, EE 278: Introduction to Statistical Signal Processing (Spring 2017)

Volunteering Experience

  • Small farmer's big gamble: investigation of vegetable supply chain from Dingzhou to Beijing (2011) [pdf]