ICML Self-Supervised Representation Learning for Microbiome Improves Downstream Prediction in Data-Limited Settings and Cross-Cohort Generalizability

Poster in Workshop: 2nd Generative AI for Biology Workshop

Self-Supervised Representation Learning for Microbiome Improves Downstream Prediction in Data-Limited Settings and Cross-Cohort Generalizability

Liron Zahavi · Zachary Levine · Eran Segal

Keywords: [ representation learning ] [ limited data ] [ cross-domain transfer ] [ masked autoencoders ] [ cross-cohort generalization ] [ metagenomic data ] [ microbiome ] [ biological data ] [ self-supervised learning ]

Abstract:

The gut microbiome plays a crucial role in human health, but machine learning applications face significant challenges due to limited data availability, high dimensionality, and batch effects across cohorts. We developed self-supervised representation learning methods for gut microbiome metagenomic data by implementing multiple approaches on 85,364 samples, including masked autoencoders and novel cross-domain adaptation of single-cell RNA sequencing models. Systematic benchmarking against the standard practice in microbiome machine learning demonstrated significant advantages of our learned representations in limited-data scenarios, improving prediction for age (r = 0.14 vs. 0.06), Body Mass Index (r = 0.16 vs. 0.11), and drug usage (PR-AUC = 0.81 vs. 0.73). Cross-cohort generalization was enhanced by up to 81%, addressing transferability challenges across different populations and technical protocols. Our approach provides a valuable framework for overcoming data limitations in microbiome research, with particular potential for the many clinical and intervention studies that operate with small cohorts.
