May 2016 - Bioinf-l - kam.mff.cuni.cz

Seminar invitation
by Petr Danecek 16 May '16

Dear colleagues,

we would like to cordially invite you to the last lecture of the seminar this semester, which takes place on Wednesday, 18 May 2016, at 17:20 in lecture room S5. Matej Pribis from the Second Faculty of Medicine, Charles University, will speak on the topic

Copy Number Variation

We look forward to seeing you there!

Best regards,
Petr Daněček and Martin Loebl

Seminar website: http://bioinformatika.mff.cuni.cz/seminar/

Invitation to the lecture: Genome variation graphs and genotyping by tera-scale learning
by Petr Danecek 09 May '16

Dear colleagues,

we would like to cordially invite you to the next lecture of the seminar, which takes place on Wednesday, 11 May 2016, at 17:20 in lecture room S5. Erik Garrison from the Wellcome Trust Sanger Institute, UK, will speak on the topic

Genome variation graphs and genotyping by tera-scale learning

We look forward to seeing you there!

Best regards,
Petr Daněček and Martin Loebl

Seminar website: http://bioinformatika.mff.cuni.cz/seminar/

--------------

In resequencing, we use an existing genome sequence as a scaffold to guide our analysis of data from a new individual. Although the genomes of many individuals are known, in standard practice we typically align sequence data from a new individual against a single reference genome. Similarly, when determining genotypes for this individual, we apply a Bayesian model derived from first principles that encodes our expectation of the distribution of reads across the variable site, and in which a single parameter defines our prior. In both cases, we fail to incorporate detailed prior information into our model, which limits our performance and prevents us from applying improved models of genomes that research has obtained.

We describe two techniques that allow the incorporation of extensive information from known genomes into the resequencing process. In the first, we replace the linear reference genome with a sequence graph that encodes known genomes of interest. We use the results of the 1000 Genomes Project to construct a graph which encodes approximately 5000 human genome equivalents. We then map sequence data from a well-characterized sample (NA12878) to this graph, call variation, and evaluate the end-to-end performance of the system using the NIST Genome in a Bottle truth set. In the second, we transform the alignments and candidate alleles produced by standard variant callers into a format suitable for consumption by a linear learner. We train the learner on various sequencing runs from NA12878 and, as a proof of principle, demonstrate its generalization to another sequencing run as part of the PrecisionFDA challenge.
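
To make the "Bayesian model ... in which a single parameter defines our prior" concrete, here is a minimal sketch of a textbook diploid genotype model at one biallelic site. It is not the speaker's implementation: the binomial read model, the function name `genotype_posteriors`, and the default values of `error_rate` and `theta` are all assumptions chosen for illustration.

```python
from math import comb


def genotype_posteriors(ref_reads, alt_reads, error_rate=0.01, theta=0.001):
    """Posterior over diploid genotypes 0/0, 0/1, 1/1 at a biallelic site.

    Assumptions (illustrative only): reads are independent, each read shows
    the alternate allele with a genotype-dependent probability, and the prior
    is controlled by a single heterozygosity parameter theta.
    """
    n = ref_reads + alt_reads
    # Probability that one read shows the alternate allele, per genotype.
    p_alt = {"0/0": error_rate, "0/1": 0.5, "1/1": 1.0 - error_rate}
    # Single-parameter prior: theta is the assumed per-site heterozygosity.
    prior = {"0/0": 1.0 - 1.5 * theta, "0/1": theta, "1/1": 0.5 * theta}
    # Unnormalized posterior = prior * binomial likelihood of the read counts.
    joint = {
        g: prior[g] * comb(n, alt_reads) * p ** alt_reads * (1.0 - p) ** ref_reads
        for g, p in p_alt.items()
    }
    total = sum(joint.values())
    return {g: v / total for g, v in joint.items()}
```

With balanced read counts, for example `genotype_posteriors(10, 10)`, the heterozygous genotype receives the highest posterior despite its small prior. The abstract's point is that a prior this simple ignores what is already known about the site from other genomes.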

Seminar invitation
by Petr Danecek 03 May '16

Dear colleagues,

we would like to cordially invite you to the next lecture of the seminar, which takes place tomorrow, i.e. on Wednesday, 4 May 2016, at 17:20 in lecture room S5. Karel Jalovec from ČVUT (Czech Technical University in Prague) will speak on the topic

Discriminatory analysis of sequenced read-sets

We look forward to seeing you there!

Best regards,
Petr Daněček and Martin Loebl

Seminar website: http://bioinformatika.mff.cuni.cz/seminar/

--------------

Discriminatory analysis of sequenced read-sets

The increasing amount of data produced by NGS technologies increases the need for effective analysis of these data. This work presents a tool for binary classification of metagenomic samples. A metagenomic sample consists of a large number of short DNA strings (also called reads), which belong to the different organisms present in the environment from which the sample was taken. The behavior of an environment can be affected by contamination with organisms that do not originally belong to it.

The goal of this work is to develop a classification method based on DNA superstrings that can accurately classify metagenomic samples. Classifiers obtained by this method can be used to determine whether newly obtained metagenomic samples are contaminated (positive) or clean (negative) without the need to identify the particular organisms present in the sample. We want to achieve this goal by formulating a modified sequence assembly task for finding the most discriminatory DNA superstrings. We assume that a standard approach to this kind of analysis would be to assemble all the samples and then search for the most discriminatory motifs. Both tasks are computationally very demanding. Our method should solve both of these tasks simultaneously.
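
As a rough illustration of what "discriminatory" means here, the sketch below uses a much simpler stand-in than the method in the abstract: instead of assembling superstrings, it looks for fixed-length substrings (k-mers) that occur in every positive sample and in no negative sample. The function names and the choice of k-mers are assumptions made for this example, not part of the presented work.

```python
def kmers(read, k):
    """Return the set of all length-k substrings of one read."""
    return {read[i:i + k] for i in range(len(read) - k + 1)}


def sample_kmers(reads, k):
    """Return the set of k-mers occurring in any read of a sample."""
    found = set()
    for read in reads:
        found |= kmers(read, k)
    return found


def discriminatory_kmers(positive, negative, k):
    """k-mers present in every positive sample and absent from all negatives.

    `positive` and `negative` are lists of samples; a sample is a list of reads.
    """
    pos_sets = [sample_kmers(sample, k) for sample in positive]
    shared = set.intersection(*pos_sets) if pos_sets else set()
    neg_union = set().union(*(sample_kmers(sample, k) for sample in negative))
    return shared - neg_union


def classify(reads, markers, k):
    """Label a new sample positive if it contains any marker k-mer."""
    return bool(sample_kmers(reads, k) & markers)
```

This stand-in materializes every k-mer of every sample and needs no assembly at all, so it does not address the computational cost the abstract mentions; it only shows the classification criterion on toy data.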
