Dear all,
The next seminar will take place on Wednesday; please see the details
below. We look forward to seeing you all!
Title: Masked superstrings as a unified framework for textual k-mer set
representations
Speaker: Ondřej Sladký, Faculty of Mathematics and Physics, Charles
University
Date and time: Wednesday 27/3/2023 - 17:20
Location: MFF UK, Malostranské nám. 25, lecture hall S4
https://bioinformatika.mff.cuni.cz/seminar/
Best wishes,
Petr Danecek
-------------------------------
*Ondřej Sladký (Faculty of Mathematics and Physics, Charles University)*
*/Masked superstrings as a unified framework for textual k-mer set
representations/*
The popularity of k-mer-based methods has recently led to the
development of compact k-mer-set representations, such as
simplitigs/Spectrum-Preserving String Sets (SPSS), matchtigs, and
eulertigs. These aim to represent k-mer sets more efficiently than the
traditional unitigs, using strings that contain the individual k-mers as
substrings. Here, we demonstrate that all such representations can be
viewed as superstrings of the input k-mers, and as such can be generalized
into a unified framework that we call the masked superstring of k-mers.
We study the complexity of masked superstring computation and prove
NP-hardness for both k-mer superstrings and their masks. We then design
local and global greedy heuristics for efficient computation of masked
superstrings, implement them in a program called KmerCamel, and evaluate
their performance using selected genomes and pan-genomes. Overall,
masked superstrings unify the theory and practice of textual k-mer set
representations and provide a useful framework for optimizing
representations for specific bioinformatics applications.
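To make the idea of a masked superstring concrete, here is a minimal Python sketch. It is not KmerCamel's code. It assumes that a mask bit of 1 at position i means the k-mer starting at i belongs to the represented set, and it ignores reverse complements. As a simplified stand-in for the greedy heuristics mentioned above, it uses the classical greedy shortest-superstring heuristic, which repeatedly merges the pair of strings with the largest overlap. All function names are hypothetical.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest proper suffix of `a` that is also a prefix of `b`."""
    for length in range(min(len(a), len(b)) - 1, 0, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_superstring(kmer_set: set[str]) -> str:
    """Classical greedy heuristic: repeatedly merge the pair with the largest overlap.

    Simplified stand-in (hypothetical), not KmerCamel's implementation.
    """
    strings = sorted(kmer_set)  # sorted so that ties are broken deterministically
    while len(strings) > 1:
        best = (-1, 0, 1)  # (overlap length, index of left string, index of right string)
        for i, a in enumerate(strings):
            for j, b in enumerate(strings):
                if i != j:
                    ov = overlap(a, b)
                    if ov > best[0]:
                        best = (ov, i, j)
        ov, i, j = best
        merged = strings[i] + strings[j][ov:]
        strings = [s for idx, s in enumerate(strings) if idx not in (i, j)]
        strings.append(merged)
    return strings[0] if strings else ""


def compute_mask(superstring: str, kmer_set: set[str], k: int) -> list[int]:
    """Mask bit 1 at position i iff the k-mer starting at i is in the set.

    One simple choice of mask (every occurrence switched on); the last k-1
    positions cannot start a k-mer, so they stay 0.
    """
    mask = [0] * len(superstring)
    for i in range(len(superstring) - k + 1):
        if superstring[i:i + k] in kmer_set:
            mask[i] = 1
    return mask


def decode(superstring: str, mask: list[int], k: int) -> set[str]:
    """Recover the represented set: k-mers starting at positions whose mask bit is 1."""
    return {superstring[i:i + k] for i in range(len(superstring) - k + 1) if mask[i]}


def case_encode(superstring: str, mask: list[int]) -> str:
    """Print mask-1 positions in uppercase and mask-0 positions in lowercase."""
    return "".join(c.upper() if m else c.lower() for c, m in zip(superstring, mask))


if __name__ == "__main__":
    k = 3
    kmer_set = {"ACG", "CGT", "GTA", "TTT"}
    s = greedy_superstring(kmer_set)
    m = compute_mask(s, kmer_set, k)
    print(s)                  # TTTACGTA
    print(case_encode(s, m))  # TttACGta
    assert decode(s, m, k) == kmer_set
```

In this toy example the superstring also contains TTA and TAC, which are not in the input; the mask switches them off, which is the extra freedom masks add on top of plain superstrings. Writing masked-off positions in lowercase is one convenient textual encoding, assumed here for illustration.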
Dear all,
The next seminar will take place on Wednesday; please see the details
below. We look forward to seeing you all!
Title: Disease maps: building and analysing graphical models of
biomedical knowledge
Speaker: Marek Ostaszewski, Luxembourg Centre for Systems Biomedicine
Date and time: Wednesday 22/3/2023 - 17:20
Location: MFF UK, Malostranské nám. 25, lecture hall S3
https://bioinformatika.mff.cuni.cz/seminar/
Best wishes,
Petr Danecek
-------------------------------
*Marek Ostaszewski*
*/Disease maps: building and analysing graphical models of biomedical
knowledge/*
Disease maps encode knowledge about molecular pathophysiology in both
visual and computational formats, facilitating interdisciplinary exchange
between bench scientists, clinical researchers and bioinformaticians. In
this talk I’ll introduce the concept using the Parkinson’s disease map as
an example and demonstrate its use as a tool for visual exploration,
analytics and the investigation of complex data. Then, I’ll describe how
the approach evolved and how it was taken up by the broader research
community, leading to a large-scale effort to build the COVID-19 Disease
Map.
Dear all,
The next seminar will take place on Wednesday; please see the details
below. We look forward to seeing you all!
Title: Building complex bioinformatics pipelines using Snakemake
Speaker: Joern Gerchen
Date and time: Wednesday 8/3/2023 - 17:20
Location: MFF UK, Malostranské nám. 25, lecture hall S3
https://bioinformatika.mff.cuni.cz/seminar/
Best wishes,
Petr Danecek
----------------
*Joern Gerchen: /Building complex bioinformatics pipelines using
Snakemake/*
Polyploidy, the presence of multiple genome copies as a result of
whole-genome duplications, is often thought to cause immediate
reproductive isolation between polyploids and their diploid relatives,
due to the inviability and sterility of hybrid offspring. However, recent
research has shown evidence of introgression (gene flow via hybridization
and backcrossing) between diploid and polyploid lineages. In this seminar
I will introduce my postdoc project, which uses population genomic
analyses to quantify the degree of introgression between multiple
natural plant lineages with variable ploidy and to assess to what extent
it can contribute to adaptive evolution. To determine the degree
of inter-ploidy introgression, variant calling and subsequent population
genomic analyses have to be run for each non-model species in a
ploidy-aware manner. These analyses require complex custom
bioinformatics pipelines, which have to be run repeatedly for multiple
lineages on HPC clusters. Implementing and running such workflows
efficiently and reproducibly can be challenging. To overcome these
issues, I will introduce Snakemake, a workflow management system for
implementing automated, scalable and reproducible bioinformatics
workflows.