March 2023 - Bioinf-l - kam.mff.cuni.cz

Lecture invitation: Masked superstrings as a unified framework for textual k-mer set representations
by Petr Danecek 27 Mar '23


Dear all,

the next seminar will take place on Wednesday, please see the details below. We look forward to seeing you all!

Title: Masked superstrings as a unified framework for textual k-mer set representations
Speaker: Ondřej Sladký, Faculty of Mathematics and Physics, Charles University
Date and time: Wednesday 27/3/2023 - 17:20
Location: MFF UK, Malostranské nám. 25, lecture hall S4
https://bioinformatika.mff.cuni.cz/seminar/

Best wishes,
Petr Danecek

-------------------------------

*Ondřej Sladký (Faculty of Mathematics and Physics, Charles University)*
*/Masked superstrings as a unified framework for textual k-mer set representations/*

The popularity of k-mer-based methods has recently led to the development of compact k-mer-set representations, such as simplitigs/Spectrum-Preserving String Sets (SPSS), matchtigs, and eulertigs. These aim to represent k-mer sets via strings that contain individual k-mers as substrings more efficiently than the traditional unitigs. Here, we demonstrate that all such representations can be viewed as superstrings of input k-mers, and as such can be generalized into a unified framework that we call the masked superstring of k-mers. We study the complexity of masked superstring computation and prove NP-hardness for both k-mer superstrings and their masks. We then design local and global greedy heuristics for efficient computation of masked superstrings, implement them in a program called KmerCamel, and evaluate their performance using selected genomes and pan-genomes. Overall, masked superstrings unify the theory and practice of textual k-mer set representations and provide a useful framework for optimizing representations for specific bioinformatics applications.
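To make the idea in the abstract concrete, here is a minimal sketch of how a masked superstring could be decoded back into a k-mer set. The announcement does not define the mask, so the encoding used here is an assumption: the mask is a 0/1 string of the same length as the superstring, and a '1' at position i means that the k-mer starting at i belongs to the set. The function name is hypothetical, reverse complements are ignored, and this is not taken from KmerCamel.

```python
def kmers_from_masked_superstring(superstring: str, mask: str, k: int) -> set[str]:
    """Collect the k-mers whose start position is switched on in the mask.

    Assumed encoding (not stated in the announcement): mask[i] == "1"
    means the k-mer superstring[i:i+k] is part of the represented set.
    """
    if len(superstring) != len(mask):
        raise ValueError("superstring and mask must have the same length")
    return {
        superstring[i:i + k]
        for i in range(len(superstring) - k + 1)  # only full-length k-mers
        if mask[i] == "1"
    }


# The superstring ACGTTA contains four 3-mers: ACG, CGT, GTT, TTA.
# The mask switches off GTT, which appears only as a by-product of the overlap.
kmers = kmers_from_masked_superstring("ACGTTA", "110100", 3)
```

Under this assumed encoding, representations such as simplitigs correspond to the special case where every k-mer occurring in the strings is switched on.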


Lecture invitation: Disease maps: building and analysing graphical models of biomedical knowledge
by Petr Danecek 20 Mar '23


Dear all,

the next seminar will take place on Wednesday, please see the details below. We look forward to seeing you all!

Title: Disease maps: building and analysing graphical models of biomedical knowledge
Speaker: Marek Ostaszewski, Luxembourg Centre for Systems Biomedicine
Date and time: Wednesday 22/3/2023 - 17:20
Location: MFF UK, Malostranské nám. 25, lecture hall S3
https://bioinformatika.mff.cuni.cz/seminar/

Best wishes,
Petr Danecek

-------------------------------

*Marek Ostaszewski*
*/Disease maps: building and analysing graphical models of biomedical knowledge/*

Disease maps encode knowledge about molecular pathophysiology in both visual and computational formats, helping interdisciplinary exchange between bench scientists, clinical researchers and bioinformaticians. In this talk I’ll introduce the concept using the Parkinson’s disease map as an example and demonstrate it as a tool for visual exploration, analytics and the investigation of complex data. Then, I’ll describe the evolution of the approach and how it was picked up by a broader research community, leading to a large-scale effort to build the COVID-19 Disease Map.


Lecture invitation: Building complex bioinformatics pipelines using Snakemake
by Petr Danecek 06 Mar '23


Dear all,

the next seminar will take place on Wednesday, please see the details below. We look forward to seeing you all!

Title: Building complex bioinformatics pipelines using Snakemake
Speaker: Joern Gerchen
Date and time: Wednesday 8/3/2023 - 17:20
Location: MFF UK, Malostranské nám. 25, lecture hall S3
https://bioinformatika.mff.cuni.cz/seminar/

Best wishes,
Petr Danecek

----------------

*Joern Gerchen: /Building complex bioinformatics pipelines using Snakemake/*

Polyploidy, the presence of multiple genome copies as a result of whole-genome duplications, is often thought to cause immediate reproductive isolation between polyploids and their diploid relatives, due to inviability and sterility of hybrid offspring. However, recent research has shown evidence of introgression (gene flow via hybridization and backcrossing) between diploid and polyploid lineages. In this seminar I will introduce my postdoc project, which uses population genomic analyses to quantify the degree of introgression between multiple natural plant lineages with variable ploidy and to assess to what degree it can contribute to adaptive evolution.

In order to determine the degree of inter-ploidy introgression, variant calling and subsequent population genomic analyses have to be run for each non-model species in a ploidy-aware manner. These analyses require complex custom bioinformatics pipelines, which have to be run repeatedly for multiple lineages on HPC clusters. Implementing and running these types of workflows in an efficient and reproducible manner can be challenging. As an approach to overcoming these issues, I will introduce Snakemake, which makes it possible to implement automated, scalable and reproducible bioinformatics workflows.
