Dear colleagues,
we would like to cordially invite you to the next lecture of the seminar, which takes place on Wednesday, 11 May 2016, at 17:20 in lecture room S5.
The speaker will be Erik Garrison of the Wellcome Trust Sanger Institute, UK,
on the topic: Genome variation graphs and genotyping by tera-scale learning
We look forward to seeing you there!
Best regards, Petr Daněček and Martin Loebl
Seminar website: http://bioinformatika.mff.cuni.cz/seminar/
--------------
In resequencing, we use an existing genome sequence as a scaffold to guide our analysis of data from a new individual. Although the genomes of many individuals are known, standard practice is to align sequence data from a new individual against a single reference genome. Similarly, when determining genotypes for this individual, we apply a Bayesian model derived from first principles that encodes our expectation of the distribution of reads across the variable site, with a single parameter defining the prior. In both cases, we fail to incorporate detailed prior information into the model, which limits performance and prevents us from applying the improved models of genomes that research has produced.
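As a rough illustration of the conventional genotyping model described above, the following minimal sketch computes genotype posteriors at a single site. It assumes a diploid, biallelic site, a uniform per-read error rate, and a prior controlled by one parameter; the function name `genotype_posteriors`, the parameter names `error_rate` and `theta`, and their default values are hypothetical and are not taken from the talk.

```python
import math


def genotype_posteriors(ref_count, alt_count, error_rate=0.01, theta=0.001):
    """Posterior over diploid genotypes from read counts at one biallelic site.

    Illustrative assumptions: reads are independent, every read has the same
    error rate, and the prior is set by the single parameter `theta`.
    """
    # Probability that one read shows the alternate allele under each genotype.
    p_alt = {"0/0": error_rate, "0/1": 0.5, "1/1": 1.0 - error_rate}
    # Simple one-parameter prior; the three values sum to 1.
    prior = {"0/0": 1.0 - 1.5 * theta, "0/1": theta, "1/1": theta / 2.0}

    log_post = {}
    for genotype, p in p_alt.items():
        # Binomial log-likelihood; the binomial coefficient is the same for
        # every genotype, so it cancels in the normalization and is omitted.
        log_lik = alt_count * math.log(p) + ref_count * math.log(1.0 - p)
        log_post[genotype] = log_lik + math.log(prior[genotype])

    # Normalize in log space to avoid underflow at high coverage.
    peak = max(log_post.values())
    log_norm = peak + math.log(sum(math.exp(v - peak) for v in log_post.values()))
    return {g: math.exp(v - log_norm) for g, v in log_post.items()}


if __name__ == "__main__":
    for counts in [(20, 0), (10, 10), (0, 20)]:
        post = genotype_posteriors(*counts)
        print(counts, max(post, key=post.get))
```

Nothing in this model depends on which alleles have been observed in other genomes, which is the missing prior information the abstract refers to.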
We describe two techniques that allow the incorporation of extensive information from known genomes into the resequencing process. In the first, we replace the linear reference genome with a sequence graph that encodes known genomes of interest. We use the results of the 1000 Genomes Project to construct a graph which encodes approximately 5000 human genome equivalents. We then map sequence data from a well-characterized sample (NA12878) to this graph, call variation, and evaluate the end-to-end performance of the system using the NIST Genome in a Bottle truth set. In the second, we transform the alignments and candidate alleles produced by standard variant callers into a format suitable for consumption by a linear learner. We train the learner on various sequencing runs from NA12878, and as a proof of principle demonstrate its generalization to another sequencing run as part of the PrecisionFDA challenge.
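To make the idea of replacing a linear reference with a sequence graph more concrete, here is a toy sketch that builds a small graph from a reference string and a list of known substitutions, then lists the haplotypes the graph can spell. It is not the speaker's implementation and does not use any real toolkit API. The function names `build_variation_graph` and `spelled_sequences` are hypothetical, and the sketch assumes sorted, non-overlapping variants with non-empty alleles.

```python
from collections import defaultdict


def build_variation_graph(reference, variants):
    """Build a toy sequence graph from a reference and known variants.

    `variants` is a list of (position, ref_allele, alt_allele) tuples with
    0-based positions. Assumed for this sketch: variants do not overlap and
    both alleles are non-empty.
    """
    nodes = {}                 # node id -> sequence
    edges = defaultdict(set)   # node id -> set of successor node ids

    def add_node(seq):
        node_id = len(nodes)
        nodes[node_id] = seq
        return node_id

    tails = []   # nodes that must be linked to whatever comes next
    cursor = 0   # first reference position not yet placed in the graph
    for pos, ref, alt in sorted(variants):
        assert reference[pos:pos + len(ref)] == ref, "ref allele mismatch"
        if pos > cursor:
            # Stretch of reference shared by all haplotypes.
            shared = add_node(reference[cursor:pos])
            for t in tails:
                edges[t].add(shared)
            tails = [shared]
        # The variable site becomes a bubble: one node per allele.
        branch = [add_node(ref), add_node(alt)]
        for t in tails:
            for b in branch:
                edges[t].add(b)
        tails = branch
        cursor = pos + len(ref)
    if cursor < len(reference):
        last = add_node(reference[cursor:])
        for t in tails:
            edges[t].add(last)
    return nodes, edges


def spelled_sequences(nodes, edges):
    """Enumerate every source-to-sink path as a string (tiny graphs only)."""
    targets = {v for succ in edges.values() for v in succ}
    sources = [n for n in nodes if n not in targets]

    def walk(node):
        successors = edges.get(node, ())
        if not successors:
            yield nodes[node]
        for nxt in sorted(successors):
            for rest in walk(nxt):
                yield nodes[node] + rest

    result = set()
    for s in sources:
        result.update(walk(s))
    return result


if __name__ == "__main__":
    nodes, edges = build_variation_graph("ACGTACGT", [(2, "G", "T"), (5, "C", "A")])
    print(sorted(spelled_sequences(nodes, edges)))
```

Enumerating all paths grows exponentially with the number of variable sites, so it only serves to show that one graph encodes many haplotypes; mapping reads to a graph of the size described above requires indexing rather than enumeration.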