Researchers have developed TranscriptFormer, a generative cell atlas that integrates gene expression data from a wide range of organisms, covering 1.5 billion years of evolution. This model allows for the comparison of cell types across distant species, identifying similarities and divergences in their genetic programs. TranscriptFormer's ability to map cellular evolution on this unprecedented scale opens new avenues for understanding the fundamental principles governing cell diversity and function.
The study addresses the challenge of comparing homologous cell types across organisms whose genomes have diverged so far that sequence-based comparison becomes difficult. TranscriptFormer overcomes this by focusing on gene expression patterns, which enables the identification of conserved cell types and the reconstruction of evolutionary trajectories. This is crucial for understanding how biological complexity emerged over time, from single-celled to complex multicellular organisms.
The methodology is based on a generative model that learns features of cellular gene expression that are shared across species. Trained on a vast collection of single-cell transcriptomics data from diverse species, TranscriptFormer can infer relationships between cell types that are not apparent from direct sequence analysis. This computational approach represents a significant advance in comparative biology and evo-devo (evolutionary developmental biology).
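To make the idea of expression-based comparison concrete, the following is a minimal sketch of how cell types from two species might be matched once each cell has been mapped into a shared embedding space. It is not TranscriptFormer's code or API: the embeddings here are synthetic stand-ins for model output, and the names `PROGRAMS`, `toy_species`, `centroids`, and `best_matches` are hypothetical, introduced only for illustration.

```python
import numpy as np

# Hypothetical "expression programs" standing in for learned embeddings.
PROGRAMS = {
    "neuron": np.array([1.0, 0.0, 0.0, 0.0]),
    "muscle": np.array([0.0, 1.0, 0.0, 0.0]),
    "immune": np.array([0.0, 0.0, 1.0, 0.0]),
}

def toy_species(rng, cell_types, cells_per_type=20, noise=0.05):
    """Simulate per-cell embeddings: a program vector plus small Gaussian noise."""
    emb, labels = [], []
    for t in cell_types:
        emb.append(PROGRAMS[t] + noise * rng.normal(size=(cells_per_type, 4)))
        labels += [t] * cells_per_type
    return np.vstack(emb), labels

def centroids(embeddings, labels):
    """Average the embeddings of all cells that share a cell-type label."""
    labels = np.asarray(labels)
    names = sorted(set(labels.tolist()))
    return names, np.stack([embeddings[labels == n].mean(axis=0) for n in names])

def best_matches(names_a, cent_a, names_b, cent_b):
    """For each cell type in species A, find the most similar type in species B."""
    a = cent_a / np.linalg.norm(cent_a, axis=1, keepdims=True)
    b = cent_b / np.linalg.norm(cent_b, axis=1, keepdims=True)
    sim = a @ b.T  # cosine similarity between every pair of centroids
    return {
        names_a[i]: (names_b[j], float(sim[i, j]))
        for i, j in enumerate(sim.argmax(axis=1))
    }

rng = np.random.default_rng(0)
emb_a, lab_a = toy_species(rng, ["neuron", "muscle"])
emb_b, lab_b = toy_species(rng, ["neuron", "muscle", "immune"])

matches = best_matches(*centroids(emb_a, lab_a), *centroids(emb_b, lab_b))
for cell_type, (match, score) in matches.items():
    print(f"{cell_type} (species A) -> {match} (species B), cosine = {score:.3f}")
```

The comparison relies only on positions in the shared space, so no gene-by-gene sequence alignment between the two species is needed at this step. A real analysis would replace the synthetic vectors with embeddings produced by the trained model.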
The generative atlas not only provides a detailed view of cell type conservation and evolution but also offers a predictive tool for identifying new cell types or inferring their properties in understudied species. The implications range from a better understanding of human diseases, through the identification of cellular homologs in animal models, to tissue engineering and biotechnology, through the unraveling of the genetic programs that define cell identity.