wQFMSpark: Species Tree Estimation Using wQFM in a Distributed System

Tags: Bioinformatics · Distributed Systems

Species tree estimation from gene trees is a central problem in phylogenetics. Quartet-based methods such as ASTRAL, QMC, and wQFM are widely used, but they struggle to scale to large datasets. This project redesigns wQFM for distributed execution on Apache Spark and analyzes the resulting scalability and performance gains on large-scale phylogenomic inputs.
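To make the quartet-based input concrete, here is a minimal sketch in plain Python of how weighted quartets could be tallied from gene trees in a map/reduce style. It is an illustration under stated assumptions, not the project's actual implementation: each gene tree is assumed to be given as a taxon set plus the bipartitions of its internal edges, a quartet's weight is taken to be the number of gene trees inducing it, and the names `quartets_of_tree` and `weighted_quartets` are hypothetical.

```python
from collections import Counter
from functools import reduce
from itertools import combinations


def quartets_of_tree(taxa, splits):
    """Map step: quartet topologies induced by ONE gene tree.

    taxa   -- iterable of taxon names in the tree
    splits -- iterable of frozensets, each holding one side of an
              internal-edge bipartition (assumed input format)
    Returns a Counter keyed by ((x, y), (z, w)), meaning xy|zw.
    """
    counts = Counter()
    for a, b, c, d in combinations(sorted(taxa), 4):
        # The three possible unrooted topologies on four taxa.
        for left, right in (((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))):
            for side in splits:
                left_in = [x in side for x in left]
                right_in = [x in side for x in right]
                # The split separates `left` from `right` if one pair lies
                # entirely inside `side` and the other entirely outside.
                if (all(left_in) and not any(right_in)) or (
                    all(right_in) and not any(left_in)
                ):
                    counts[(left, right)] += 1
                    break  # count each topology at most once per tree
    return counts


def weighted_quartets(gene_trees):
    """Reduce step: sum the per-tree counters into quartet weights."""
    per_tree = (quartets_of_tree(taxa, splits) for taxa, splits in gene_trees)
    return reduce(lambda x, y: x + y, per_tree, Counter())


# Toy input: two copies of ((A,B),(C,D),E) and one of ((A,C),(B,D),E).
taxa = {"A", "B", "C", "D", "E"}
tree1 = (taxa, [frozenset("AB"), frozenset("CD")])
tree2 = (taxa, [frozenset("AC"), frozenset("BD")])
weights = weighted_quartets([tree1, tree1, tree2])
```

Because the per-tree step is independent and the merge is an associative sum, this shape would map naturally onto a distributed engine: in Spark the two steps would roughly correspond to a `map` over gene trees followed by a keyed reduction. Whether wQFMSpark distributes this stage or a later one is not stated here.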

Report · Code on GitHub