Dear colleagues,
we would like to cordially invite you to the last lecture of the seminar this
semester, which takes place on Wednesday, 18 May 2016, at 17:20 in lecture hall S5.
Matej Pribis from the Second Faculty of Medicine, Charles University, will speak on
Copy Number Variation
We look forward to seeing you there!
Best regards,
Petr Daněček and Martin Loebl
Seminar website
http://bioinformatika.mff.cuni.cz/seminar/
Dear colleagues,
we would like to cordially invite you to the next lecture of the seminar,
which takes place on Wednesday, 11 May 2016, at 17:20 in lecture hall S5.
The speaker will be
Erik Garrison from the Wellcome Trust Sanger Institute, UK
on the topic
Genome variation graphs and genotyping by tera-scale learning
We look forward to seeing you there!
Best regards,
Petr Daněček and Martin Loebl
Seminar website
http://bioinformatika.mff.cuni.cz/seminar/
--------------
In resequencing, we use an existing genome sequence as a scaffold to
guide our analysis of data from a new individual. Although the genomes
of many individuals are known, in standard practice we typically align
sequence data from a new individual against a single reference genome.
Similarly, when determining genotypes for this individual, we apply a
Bayesian model derived from first principles. The model encodes our
expectation of the distribution of reads across the variable site, and a
single parameter defines its prior. In both cases, we fail to incorporate
detailed prior information into our model, which limits our performance
and prevents us from applying improved models of genomes that research
has obtained.
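
To make the first-principles genotyping model concrete, here is a minimal sketch of a diploid genotype caller with a single-parameter prior. It is an illustration only, not the model used in the talk: the binomial read model, the function name `genotype_posteriors`, the `error_rate` default, and the way the prior is derived from a single heterozygosity parameter `theta` are all assumptions.

```python
from math import comb


def genotype_posteriors(ref_reads, alt_reads, error_rate=0.01, theta=0.001):
    """Posterior over diploid genotypes, keyed by the number of alt alleles.

    Assumptions (hypothetical, for illustration): reads are independent,
    each read is wrong with probability error_rate, and the prior depends
    on a single parameter theta (assumed per-site heterozygosity).
    """
    n = ref_reads + alt_reads
    # Probability that one read shows the alt allele under each genotype.
    p_alt = {0: error_rate, 1: 0.5, 2: 1.0 - error_rate}
    # Single-parameter prior: heterozygous with probability theta,
    # homozygous alternate with theta / 2, homozygous reference otherwise.
    prior = {0: 1.0 - 1.5 * theta, 1: theta, 2: 0.5 * theta}
    unnormalised = {
        g: prior[g]
        * comb(n, alt_reads)
        * p_alt[g] ** alt_reads
        * (1.0 - p_alt[g]) ** ref_reads
        for g in (0, 1, 2)
    }
    total = sum(unnormalised.values())
    return {g: value / total for g, value in unnormalised.items()}
```

With 10 reference and 10 alternate reads, the heterozygous genotype dominates the posterior; with 20 reference reads and none alternate, the homozygous reference genotype does. Nothing in this prior reflects what is already known about the site from other genomes, which is the limitation the abstract points out.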
We describe two techniques that allow the incorporation of extensive
information from known genomes into the resequencing process. In the
first, we replace the linear reference genome with a sequence graph that
encodes known genomes of interest. We use the results of the 1000
Genomes Project to construct a graph which encodes approximately 5000
human genome equivalents. We then map sequence data from a
well-characterized sample (NA12878) to this graph, call variation, and
evaluate the end-to-end performance of the system using the NIST Genome
in a Bottle truth set. In the second, we transform the alignments and
candidate alleles produced by standard variant callers into a format
suitable for consumption by a linear learner. We train the learner on
various sequencing runs from NA12878, and as a proof of principle
demonstrate its generalization to another sequencing run as part of the
PrecisionFDA challenge.
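
The idea of replacing a linear reference with a sequence graph can be sketched on a toy scale as follows. This is a simplified illustration and not the implementation described in the talk: it handles only SNPs, and the helper names `build_snp_graph` and `spell_paths` are hypothetical.

```python
def build_snp_graph(reference, snps):
    """Build a small sequence graph from a linear reference and SNPs.

    snps maps a 0-based position to an alternate base. Returns (nodes, edges),
    where nodes maps node id -> sequence and edges maps node id -> successors.
    """
    nodes, edges = {}, {}
    prev_ids = []  # nodes ending immediately before the current position
    next_id = 0
    start = 0

    def add_node(seq, preds):
        nonlocal next_id
        node_id = next_id
        next_id += 1
        nodes[node_id] = seq
        edges[node_id] = []
        for pred in preds:
            edges[pred].append(node_id)
        return node_id

    for pos in sorted(snps):
        if pos > start:
            # Shared stretch of reference between two variable sites.
            prev_ids = [add_node(reference[start:pos], prev_ids)]
        # The variable site becomes a "bubble": one node per allele.
        ref_node = add_node(reference[pos], prev_ids)
        alt_node = add_node(snps[pos], prev_ids)
        prev_ids = [ref_node, alt_node]
        start = pos + 1
    if start < len(reference):
        add_node(reference[start:], prev_ids)
    return nodes, edges


def spell_paths(nodes, edges):
    """Return the sequences spelled by all source-to-sink paths."""
    has_pred = {succ for succs in edges.values() for succ in succs}
    sources = [node for node in nodes if node not in has_pred]
    results = []

    def walk(node, prefix):
        seq = prefix + nodes[node]
        if not edges[node]:
            results.append(seq)
        for succ in edges[node]:
            walk(succ, seq)

    for source in sources:
        walk(source, "")
    return results


nodes, edges = build_snp_graph("ACGTACGT", {2: "T", 5: "A"})
haplotypes = spell_paths(nodes, edges)
```

In this toy graph, a read carrying the alternate allele (for example one containing `ACTTA`) matches a path exactly, whereas it would show a mismatch against the linear reference. Enumerating every path is feasible only for tiny examples; a graph built from thousands of genomes needs indexing instead.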
Dear colleagues,
we would like to cordially invite you to the next lecture of the seminar,
which takes place tomorrow, Wednesday, 4 May 2016, at 17:20 in lecture hall S5.
Karel Jalovec from ČVUT (Czech Technical University in Prague) will speak on
Discriminatory analysis of sequenced read-sets
We look forward to seeing you there!
Best regards,
Petr Daněček and Martin Loebl
Seminar website
http://bioinformatika.mff.cuni.cz/seminar/
--------------
Discriminatory analysis of sequenced read-sets
The growing amount of data produced by NGS technologies increases the
need for effective analysis of this data. This work presents a tool for
binary classification of metagenomic samples. Metagenomic samples
consist of a large amount of short DNA strings (also called reads),
which belong to different organisms present in an environment from which
the sample was taken. The behavior of an environment can be affected by
contamination with organisms that do not originally belong in this
environment. The goal of this work is to develop a classification method
based on DNA superstrings that can accurately classify metagenomic
samples. Classifiers obtained by this method can be used for determining
whether newly obtained metagenomic samples are contaminated (positive)
or clean (negative) without the need to identify the particular
organisms present in the sample. We want to achieve this goal by
establishing a modified sequence assembly task for finding the most
discriminatory DNA superstrings. We assume that a standard approach to
this kind of analysis would be to assemble all the samples and then search
for the most discriminatory motifs. Both tasks are computationally very
demanding. Our method should solve both tasks simultaneously.
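
As a rough illustration of what searching for discriminatory motifs means, the sketch below ranks fixed-length substrings (k-mers) by how much more often they occur in positive samples than in negative ones. It is a simple baseline under stated assumptions, not the superstring-based method of the talk: the scoring rule, the toy reads, and the function names are all hypothetical.

```python
from collections import Counter


def sample_kmers(reads, k):
    """Set of all k-mers occurring in any read of one sample."""
    found = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            found.add(read[i:i + k])
    return found


def discriminatory_kmers(positive_samples, negative_samples, k, top=3):
    """Rank k-mers by (fraction of positive samples containing them)
    minus (fraction of negative samples containing them)."""
    pos_counts, neg_counts = Counter(), Counter()
    for sample in positive_samples:
        pos_counts.update(sample_kmers(sample, k))
    for sample in negative_samples:
        neg_counts.update(sample_kmers(sample, k))
    scores = {
        kmer: pos_counts[kmer] / len(positive_samples)
        - neg_counts[kmer] / len(negative_samples)
        for kmer in set(pos_counts) | set(neg_counts)
    }
    # Highest score first; ties broken alphabetically for reproducibility.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top]


def classify(sample, markers, k):
    """Label a sample positive if it contains any marker k-mer."""
    present = sample_kmers(sample, k)
    return any(marker in present for marker in markers)


# Toy data (invented): each sample is a list of reads.
positive = [["ACGTTT", "GGGACG"], ["TTTACC", "ACGGGA"]]
negative = [["ACGACG", "CCCACG"], ["ACGCCC"]]
ranked = discriminatory_kmers(positive, negative, k=3)
markers = [kmer for kmer, _ in ranked]
```

Here the markers are found without assembling any sample and without identifying any organism, which mirrors the goal stated in the abstract. Short fixed-length k-mers are far less specific than longer superstrings, though, so this baseline only conveys the general idea.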