Mutational Biases in Ascomycota (DSI-SRP)
This DSI-SRP fellowship funded Qianhui Zheng to work in the laboratory of Antonis Rokas in Biological Sciences during the summer of 2021. Qianhui is a senior with majors in Computer Science and Biochemistry.
Mutations are a major source of genetic variability and novelty, providing the fuel for natural selection. Mutations are random with respect to the needs of organisms, but their occurrence in genomes is known to be influenced by certain biases. Understanding these mutational biases is key to understanding how evolution works and to predicting the consequences of increases in mutation rates. Previous studies revealed a mutational bias towards A and T nucleotide bases in certain groups of organisms, such as bacteria. However, which mutational biases exist in fungal genomes remains poorly understood. Zheng’s project examined mutational biases in fungal genomes to: a) test whether fungal genomes experience mutational bias toward AT bases, and b) investigate whether other types of mutational bias exist. During her last DSI-SRP experience, Zheng developed a pipeline for collecting and analyzing complex genome data for 30 species, including over 500 strains, in Ascomycota. Her results (distribution of single nucleotide polymorphisms, transition-to-transversion ratio, measure of directional mutational bias, etc.) showed that while there is no substantial AT mutational bias in most Ascomycota species, Hanseniaspora uvarum and Candida auris do demonstrate strong AT mutational bias. Since H. uvarum is known to have lost a large number of cell-cycle checkpoint and DNA repair genes, and C. auris is an emerging multidrug-resistant yeast pathogen, these results may lead to new discoveries about the implications of specific mutational patterns.
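As an illustration of two of the statistics named above, the following minimal Python sketch counts transitions and transversions and tallies GC-to-AT versus AT-to-GC changes. It is not Zheng’s pipeline: the function name `summarize_snps` and the input format (a list of ancestral/derived base pairs, which assumes the SNPs have already been polarized) are hypothetical, and the raw count ratio shown is only one simple way to express directional bias.

```python
from collections import Counter

BASES = set("ACGT")
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(anc, der):
    """A transition swaps purine for purine or pyrimidine for pyrimidine."""
    pair = {anc, der}
    return pair <= PURINES or pair <= PYRIMIDINES


def summarize_snps(snps):
    """Count SNP classes from (ancestral, derived) base pairs.

    Hypothetical helper: assumes each SNP is already polarized, i.e. the
    ancestral base is known.
    """
    counts = Counter()
    for anc, der in snps:
        anc, der = anc.upper(), der.upper()
        # Skip non-SNPs and ambiguous bases such as N.
        if anc == der or anc not in BASES or der not in BASES:
            continue
        counts["ts" if is_transition(anc, der) else "tv"] += 1
        if anc in "GC" and der in "AT":
            counts["gc_to_at"] += 1
        elif anc in "AT" and der in "GC":
            counts["at_to_gc"] += 1
    return counts


# Made-up example data, for illustration only.
example = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("G", "T")]
counts = summarize_snps(example)
ts_tv_ratio = counts["ts"] / counts["tv"]
# Raw count ratio; a real analysis would also normalize by base composition.
gc_at_ratio = counts["gc_to_at"] / counts["at_to_gc"]
print(ts_tv_ratio, gc_at_ratio)
```

A ratio of GC-to-AT over AT-to-GC changes well above 1, after accounting for how many GC and AT sites the genome contains, is the kind of signal that would indicate an AT mutational bias.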
This summer Zheng i) calculated genome assembly statistics, including genome size, N50, and number of scaffolds, for each sample to ensure data integrity, ii) performed pairwise intra-species fastANI analysis for each species to confirm that strains within a species are correctly classified, and iii) calculated GC content and expected GC equilibrium (GCeq) values and their correlation to investigate causes of nucleotide content variability in Ascomycota. Her results show that most of the species have GCeq values around 0.5, meaning that mutational biases by themselves should lead to genomes with roughly equal AT and GC content for these species. This provides additional evidence that most Ascomycota species do not show a substantial AT mutational bias. Notably, all species belonging to the class Saccharomycetes have GCeq values below 0.45; in particular, the GCeq values of H. uvarum and C. auris are 0.24 and 0.36, respectively, in accordance with their strong AT mutational bias. The correlation analysis requires phylogenetically independent contrasts and is still in progress. Most of Zheng’s analyses were done with custom Python and bash scripts, so the DSI-SRP workshops on Python were particularly useful to her. Furthermore, Zheng learned from DSI-SRP workshops how to manage her project and how to collaborate with others on GitHub, which are essential skills for data science research. Her DSI-SRP experience will serve as the basis for her honors thesis research, which she will complete this fall.
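The assembly and composition statistics mentioned above can be sketched in a few lines of Python. This is a hedged illustration, not the project’s actual scripts: the function names are hypothetical, and the GCeq formula used here, GCeq = v / (u + v) with u the GC-to-AT mutation rate per GC site and v the AT-to-GC rate per AT site, is the common two-state equilibrium model and is assumed rather than taken from the project description.

```python
def n50(lengths):
    """Length of the scaffold at which the cumulative sum of lengths,
    taken from longest to shortest, first reaches half the assembly size."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no scaffold lengths given")


def gc_content(sequence):
    """Fraction of G and C among unambiguous bases (N and others ignored)."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence has no A, C, G, or T bases")
    return gc / (gc + at)


def gc_equilibrium(gc_to_at, at_to_gc, gc_sites, at_sites):
    """Expected equilibrium GC content under an assumed two-state model.

    u = GC->AT changes per GC site, v = AT->GC changes per AT site,
    and GCeq = v / (u + v).
    """
    u = gc_to_at / gc_sites
    v = at_to_gc / at_sites
    if u + v == 0:
        raise ValueError("no GC<->AT changes observed")
    return v / (u + v)


# Made-up numbers, for illustration only.
print(n50([10, 8, 6, 4, 2]))           # 8
print(gc_content("ACGTGGCC"))          # 0.75
print(gc_equilibrium(30, 15, 40, 60))  # 0.25
```

Under this model, a GCeq near 0.5 means the two mutation directions balance, while a value such as 0.25 means mutation alone would push the genome toward roughly 75% AT.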
In addition to receiving support through a DSI-SRP fellowship, this project was supported and facilitated by the DSI Data Science Team through their regular summer workshops and demo sessions.