Statistical analysis of whole genome sequences

The cost of sequencing DNA has fallen dramatically over the last decade (by ~99.998%), and this has led to the sequencing of a large number of complete genomes. Bacterial genomes are relatively small, and therefore cheap to sequence, and are of major interest to researchers in several fields (e.g. microbiology, population genetics, epidemiology). Thousands of whole bacterial genomes have now been sequenced, and together they contain a wealth of information that was previously unavailable. However, analysing thousands of whole genomes poses major computational challenges that remain unsolved. In particular, it is still difficult to infer the population structure of a large number of genomes, and to infer the relative importance of different evolutionary mechanisms (such as recombination) in different populations. This project aims to provide new methods that allow problems such as these to be addressed in a statistically rigorous manner, whilst ensuring that these methods remain applicable to very large data sets.

