The microscopic organisms that inhabit our bodies, soils, oceans, and atmosphere are pivotal to human health and to the planet’s ecosystems. Despite advances in DNA sequencing, identifying these microbes and understanding how they are related have remained complex challenges. Researchers at Arizona State University (ASU) have now introduced tools designed to make this process more efficient, accurate, and scalable.
In two recent studies, ASU scientists unveiled innovations that improve how microbial family trees are built and that provide a robust software foundation for analyzing biological data. These advances could strengthen microbiome research, disease tracking, and environmental monitoring, as well as emerging fields such as precision medicine.
“Our team builds open-source software tools because we believe that when everyone can access and extend scientific tools, the entire community benefits and discovery accelerates,” said Qiyun Zhu, lead researcher of the new studies.
Revolutionizing Microbial Family Trees
Detailed, accurate evolutionary trees are crucial for understanding how microbes evolve and how they influence the world around them. Better trees improve disease tracking, allowing scientists to monitor how harmful microbes change over time. They also sharpen environmental research, showing how microbial communities respond to pollution or climate change, and strengthen studies of the gut microbiome’s role in health.
Identifying microbial relationships begins with selecting the right marker genes: DNA signposts that trace evolutionary history. Traditionally, scientists have relied on a small, fixed set of marker genes. Metagenomics has changed the scale of the problem. This technique captures all the DNA in an environmental sample and sequences it at once, uncovering entire microbial communities that would otherwise stay hidden. As a result, researchers now handle millions of genomes, many of them reconstructed directly from environmental samples.
These genomes are valuable, but they are frequently incomplete or of uneven quality. A fixed set of marker genes therefore works poorly: a genome missing many of the chosen markers contributes little information to the tree. To address this, Zhu and his team developed TMarSel (Tree-based Marker Selection). TMarSel automatically searches thousands of candidate gene families and selects the combination that produces the most reliable evolutionary trees, weighing each gene’s prevalence, its informativeness, and its contribution to a stable, meaningful picture of microbial relationships.
The result is a flexible, data-driven way to build microbial trees that perform well even for large, diverse groups of organisms and for collections in which many genomes are incomplete.
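The article does not spell out how TMarSel scores or combines gene families. The sketch below is only a toy illustration of why choosing markers as a set can differ from ranking them one at a time. It uses a hypothetical greedy rule, written as an invented function `select_markers` over made-up genomes; it is not TMarSel's algorithm or interface.

```python
def select_markers(genomes, k):
    """Hypothetical greedy marker selection (not TMarSel's algorithm).

    genomes: mapping of genome ID -> set of gene-family IDs present in it.
    Each round picks the family that gives a new marker to the most genomes
    that currently have the fewest markers, breaking ties by prevalence and
    then by alphabetical order of family ID.
    """
    remaining = sorted(set().union(*genomes.values()))
    selected = []
    coverage = {g: 0 for g in genomes}  # markers selected so far, per genome

    while remaining and len(selected) < k:
        weakest = min(coverage.values())

        def score(family):
            # How many of the worst-covered genomes would gain a marker.
            helps_weakest = sum(
                1 for g, fams in genomes.items()
                if coverage[g] == weakest and family in fams
            )
            # How many genomes contain this family at all.
            prevalence = sum(1 for fams in genomes.values() if family in fams)
            return (helps_weakest, prevalence)

        best = max(remaining, key=score)  # max() keeps the first of tied items
        selected.append(best)
        remaining.remove(best)
        for g, fams in genomes.items():
            if best in fams:
                coverage[g] += 1

    return selected, coverage


# Toy data: genome C is incomplete and lacks g1 and g2.
genomes = {
    "A": {"g1", "g2", "g3", "g4"},
    "B": {"g1", "g2", "g3"},
    "C": {"g3", "g4", "g5"},
}
markers, coverage = select_markers(genomes, k=3)
print(markers)   # ['g3', 'g1', 'g4']
print(coverage)  # {'A': 3, 'B': 2, 'C': 2}
```

Ranking by prevalence alone would pick g3, g1, and g2, leaving the incomplete genome C with a single marker. The greedy rule swaps g2 for g4, so every genome keeps at least two markers. A real tool would also weigh how informative each gene is for the tree, which this sketch ignores.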
Scikit-bio: A Comprehensive Tool for Microbial Analysis
Zhu is also a leading developer of scikit-bio, a large open-source Python library. Scikit-bio gives scientists tools to analyze large biological datasets and is particularly useful for studying microbiomes, the microbial communities that live in specific environments such as the human gut.
Biological datasets pose distinct challenges: they are massive, sparse (most values are zero in most samples), and often contain thousands of interrelated features. General-purpose data-analysis software is not built for this complexity. Scikit-bio fills the gap with more than 500 functions for tasks including:
- Comparing microbial communities
- Calculating diversity
- Transforming compositional data
- Analyzing DNA, RNA, and protein sequences
- Building and modifying phylogenetic trees
- Preparing data for machine learning
The project is community-driven, has more than 80 contributors, and is maintained with rigorous testing and documentation. It has been cited in tens of thousands of scientific papers in fields such as medicine, ecology, climate science, and cancer biology, and has become an essential tool for researchers analyzing the microbiome and other data-rich areas of modern biology.
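To make two of the listed tasks concrete, the sketch below computes Shannon diversity for a single sample, Bray-Curtis dissimilarity between two samples, and a centered log-ratio (CLR) transform for compositional data. It is written in plain Python so it runs without dependencies and only illustrates the underlying formulas. Scikit-bio ships its own implementations (for example via `skbio.diversity.alpha_diversity`, `skbio.diversity.beta_diversity`, and `skbio.stats.composition.clr`), whose exact parameters and defaults should be checked in its documentation. The pseudocount used to handle zeros in `clr` and the sample counts are assumptions of this sketch.

```python
import math

def shannon(counts):
    """Shannon diversity H = -sum(p_i * ln p_i), using the natural log.
    Taxa with zero counts contribute nothing and are skipped."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: sum|u_i - v_i| / sum(u_i + v_i).
    0 means identical abundance profiles, 1 means no shared taxa."""
    diff = sum(abs(a - b) for a, b in zip(u, v))
    total = sum(a + b for a, b in zip(u, v))
    return diff / total

def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform: log(x_i) minus the mean of the logs.
    A pseudocount is added because log(0) is undefined (an assumed choice)."""
    logs = [math.log(c + pseudocount) for c in counts]
    mean = sum(logs) / len(logs)
    return [x - mean for x in logs]

# Toy count table: abundances of four taxa in two samples (made-up numbers).
sample_a = [10, 10, 0, 0]
sample_b = [0, 10, 10, 0]

print(round(shannon(sample_a), 3))      # 0.693, i.e. ln 2 for two equally abundant taxa
print(bray_curtis(sample_a, sample_b))  # 0.5
print([round(x, 2) for x in clr(sample_a)])
```

CLR values for each sample sum to zero and, ignoring the pseudocount, are unchanged if every count is multiplied by the same factor, which is why such transforms are used for relative-abundance data.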
A New Era in Microbial Research
As biological datasets grow, tools like scikit-bio and TMarSel help make large-scale research more reliable and reproducible. These studies highlight ASU’s growing role at the intersection of biology and computation. Zhu’s work shows how combining evolutionary insight with careful software engineering can produce tools used by scientists worldwide.
With DNA sequencing becoming faster and more affordable, scientists are poised to uncover even more of the microbial universe. Tools like TMarSel and scikit-bio help turn this flood of data into meaningful scientific insight, paving the way for new discoveries about the microbes that shape our health and our planet.