Quantitative high-throughput genomics in RNA viruses
[Thesis]
Du, Yushen
Sun, Ren
2017
The high mutation rate and rapid genome replication of RNA viruses drive their adaptation to diverse selection pressures. The emergence of drug-resistant or immune-escape viral strains is a persistent concern for public health. A comprehensive understanding of the mutational tolerance of viral genomes is thus crucial for understanding the evolutionary potential of viruses and for guiding accurate risk assessments. Traditional genetics has proven to be a powerful tool in virology. Forward genetics (determining the genetic basis of a phenotype) and reverse genetics (determining the phenotype of a genetic change) have together revealed the functional roles of many important mutations. However, traditional genetics is usually restricted by limited and biased sampling, and it is time-consuming and costly. To overcome these limitations, we have developed a quantitative high-throughput genomic system that enables us to quantify the phenotypes of thousands to millions of mutations in a massively parallel manner. Using random mutagenesis or saturation mutagenesis, we can generate a diverse viral library containing the desired mutations. The library can be used to assess the function of every amino acid or nucleotide in a variety of protein functional assays as well as in viral growth assays, in which the frequency of each mutant changes according to its competitive strength. We quantified the relative frequency change of each variant before and after selection by high-throughput sequencing, which represents its "relative fitness" under the particular selection condition. Since the system's inception, we have optimized it and successfully applied it to human immunodeficiency virus (HIV), hepatitis C virus (HCV) and influenza A virus.
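The relative fitness described above can be sketched as a ratio of enrichment ratios, RF_i = (f_i,post / f_i,pre) / (f_WT,post / f_WT,pre), where f is a variant's read frequency before (pre) or after (post) selection. This is a minimal sketch under stated assumptions, not necessarily the exact pipeline used in this work: normalizing to wild type, adding a pseudocount so that variants that drop out do not cause division by zero, and the function name and variant labels are all illustrative choices.

```python
def relative_fitness(pre: dict[str, int], post: dict[str, int],
                     wt: str = "WT", pseudocount: float = 0.5) -> dict[str, float]:
    """Relative fitness of each variant from read counts before and after selection.

    RF_i = (f_i,post / f_i,pre) / (f_wt,post / f_wt,pre), where f are read
    frequencies. A pseudocount (an assumption of this sketch) is added to every
    count so that variants absent from one library do not produce division by zero.
    """
    variants = set(pre) | set(post)

    def freqs(counts: dict[str, int]) -> dict[str, float]:
        # Pad every variant with the pseudocount, then normalize to frequencies.
        padded = {v: counts.get(v, 0) + pseudocount for v in variants}
        total = sum(padded.values())
        return {v: c / total for v, c in padded.items()}

    f_pre, f_post = freqs(pre), freqs(post)
    # Enrichment ratio of each variant across the selection step.
    enrichment = {v: f_post[v] / f_pre[v] for v in variants}
    # Normalize to wild type so that RF(WT) = 1.
    return {v: enrichment[v] / enrichment[wt] for v in variants}


# Hypothetical read counts for wild type and two illustrative point mutants.
pre_counts = {"WT": 1000, "A12V": 100, "K45R": 100}
post_counts = {"WT": 2000, "A12V": 20, "K45R": 210}
rf = relative_fitness(pre_counts, post_counts)
for variant in sorted(rf):
    print(variant, round(rf[variant], 3))
```

Because a variant and wild type are drawn from the same sequencing library, the total read depth cancels in the ratio, so RF values remain comparable across libraries sequenced to different depths; RF > 1 marks a variant that outcompetes wild type under the selection condition, and RF < 1 one that is depleted.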
We also explored applications of the system to a variety of biological questions, with a special focus on the following four areas. Firstly, a direct application of the system is to better understand the distribution of fitness effects (DFE), which is fundamental to many evolutionary theories. We systematically quantified the DFE of single amino acid substitutions across the drug-targeted region (86 amino acids in total) of the HCV NS5A protein. We found that the majority of non-synonymous substitutions incur large fitness costs, suggesting that the NS5A protein is highly optimized under natural conditions. Furthermore, we characterized the evolutionary potential of HCV by subjecting the mutant viruses to varying concentrations of the NS5A inhibitor daclatasvir. As the selection pressure increases, the DFE of beneficial mutations shifts from an exponential distribution to a heavy-tailed distribution with a disproportionate number of exceptionally fit mutants. Both the number of available beneficial mutations and their selection coefficients increase at higher antiviral drug concentrations, as predicted by a pharmacodynamic model describing viral fitness as a function of drug concentration. Our large-scale fitness data on mutant viruses also provide insights into the biophysical basis of evolutionary constraints and the role of the genetic code in protein evolution. Secondly, we explored the use of fitness profiling to identify and annotate functional protein residues. Using the influenza A virus PB1 protein as an example, we developed a three-step approach. First, the effect of PB1 point mutations on viral replication was examined by saturation mutagenesis and high-throughput sequencing. Second, functional PB1 residues, which are essential for viral growth but whose mutation does not affect protein stability, were identified by protein stability prediction.
Finally, homologous structural alignment was used to further annotate specific biological functions (canonical versus non-canonical) for each functional residue. We achieved high sensitivity in identifying and annotating the canonical polymerase functional residues. Moreover, we identified non-canonical functional residues, exemplified by a cluster of residues located in the loop region of the PB1 β-ribbon. These previously uncharacterized residues were shown to be important for PB1 nuclear import through interaction with Ran-binding protein 5 (RanBP5). Thirdly, the system proved valuable for identifying drug resistance mutations and designing personalized therapy. Using influenza neuraminidase (NA) as an example, we characterized the fitness effects of single nucleotide mutations in NA and systematically identified resistance mutations for three neuraminidase inhibitors (NAIs): zanamivir, oseltamivir and AV5080. We observed that both the number and the effect sizes of resistance mutations to AV5080 are smaller than those to zanamivir and oseltamivir, but so are their fitness costs. We used population genetic models to estimate the rate of increase in fitness under drug selection as a function of drug dosage. Under AV5080, this rate was higher at low drug concentrations owing to the low fitness cost of its resistance mutations, but it dropped steeply at high drug concentrations because of the weaker resistance these mutations confer. Our approach also enabled systematic analysis of cross-resistance between drugs, which proved to be uncommon between AV5080 and zanamivir. Lastly, and importantly, the system can be used to explore new functions of viral proteins. To this end, we systematically identified type I interferon (IFN)-sensitive mutations across the entire influenza A virus genome.
We identified novel IFN-sensitive mutations in PB2, PA, PB1 and M1, in addition to NS1, which provides a foundation for determining the multiple anti-IFN mechanisms encoded in different viral segments. Moreover, this quantitative functional information on every amino acid in the genome enabled us to rationally design a vaccine with improved safety and immunogenicity. By combining 8 selected mutations into one viral genome, we generated a deficient-in-anti-interferon (DAI) influenza strain as a live attenuated vaccine candidate. DAI is replication-competent in IFN-deficient hosts, but it induces a transient IFN response and is highly attenuated in IFN-competent hosts. Notably, DAI induces a robust humoral response and a strong T cell response, which together confer broad protection. The superior properties of the DAI strain demonstrate the capacity of our approach to construct a safe, effective and broadly protective live attenuated influenza vaccine. We thus propose a novel and generally applicable approach to vaccine design: systematically identifying and eliminating immune evasion functions in the viral genome while maintaining replication fitness in vitro for vaccine production. In summary, we have developed a quantitative high-throughput genomic system and applied it to a variety of biological questions. It has proven to be a powerful system for investigating fundamental evolutionary problems, identifying functional residues and new functions of target proteins, and facilitating drug development. With the maturation of DNA synthesis technology and ever-increasing sequencing power, we foresee further improvements and broader applications of this system to address fundamental mechanistic questions and practical problems.
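The dose-dependent selection described for daclatasvir (HCV NS5A) and for the neuraminidase inhibitors can be illustrated with a small numerical sketch. It assumes a Hill-type dose response, w(D) = w0 / (1 + (D/IC50)^h), in which a resistance mutation lowers drug-free fitness w0 by a cost c and multiplies IC50 by a fold change r. As a stand-in for the population genetic models mentioned in the text, it uses Fisher's fundamental theorem: in a haploid, discrete-generation model the one-generation gain in mean fitness equals Var(w)/mean(w). The functional form, function names and parameter values are illustrative assumptions, not the models fitted in this work.

```python
def hill_fitness(dose: float, w0: float = 1.0, ic50: float = 1.0, hill: float = 1.0) -> float:
    """Assumed Hill-type dose response: w(D) = w0 / (1 + (D / IC50)^h)."""
    return w0 / (1.0 + (dose / ic50) ** hill)


def selection_coefficient(dose: float, cost: float, fold_resistance: float,
                          ic50_wt: float = 1.0, hill: float = 1.0) -> float:
    """s(D) = w_mut(D) / w_wt(D) - 1 for a mutant with fitness cost c and IC50 scaled by r."""
    w_wt = hill_fitness(dose, 1.0, ic50_wt, hill)
    w_mut = hill_fitness(dose, 1.0 - cost, fold_resistance * ic50_wt, hill)
    return w_mut / w_wt - 1.0


def fitness_increase_rate(dose: float, mutants: list[tuple[float, float]],
                          freq: float = 1e-3, ic50_wt: float = 1.0, hill: float = 1.0) -> float:
    """One-generation gain in mean fitness, Var(w) / mean(w) (Fisher's fundamental
    theorem, haploid discrete generations), for a mostly wild-type population that
    carries each (cost, fold_resistance) mutant at frequency `freq`."""
    fitness = [hill_fitness(dose, 1.0, ic50_wt, hill)]
    fitness += [hill_fitness(dose, 1.0 - c, r * ic50_wt, hill) for c, r in mutants]
    freqs = [1.0 - freq * len(mutants)] + [freq] * len(mutants)
    mean_w = sum(p * w for p, w in zip(freqs, fitness))
    var_w = sum(p * (w - mean_w) ** 2 for p, w in zip(freqs, fitness))
    return var_w / mean_w


if __name__ == "__main__":
    # Hypothetical mutant sets: low cost / weak resistance vs. high cost / strong resistance.
    weak_cheap = [(0.05, 3.0)] * 5
    strong_costly = [(0.4, 50.0)] * 5
    for dose in (0.0, 0.5, 1.0, 5.0, 20.0, 100.0):
        print(f"D={dose:6.1f}  s(weak, cheap)={selection_coefficient(dose, 0.05, 3.0):+.3f}  "
              f"rate(weak, cheap)={fitness_increase_rate(dose, weak_cheap):.2e}  "
              f"rate(strong, costly)={fitness_increase_rate(dose, strong_costly):.2e}")
```

Under this assumed model, the selection coefficient of a resistant mutant rises monotonically with dose, from -c at D = 0 toward (1 - c)·r^h - 1, while the absolute gain in mean fitness shrinks again once the dose suppresses every genotype. Comparing the two hypothetical mutant sets across the printed grid shows how fitness cost and resistance strength trade off against each other.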