Radio and PodcastRadio and PodcastLive Radio & Podcasts
Context based bioinformatics artwork
Education

Context based bioinformatics

Fakultät für Mathematik, Informatik und Statistik - Digitale Hochschulschriften der LMU - Teil 01/02 by Ludwig-Maximilians-Universität München

May 10, 20130Education

The goal of bioinformatics is to develop innovative and practical methods and algorithms for bio- logical questions. In many cases, these questions are driven by new biotechnological techniques, especially by genome and...

About This Episode

Context based bioinformatics is an episode from Fakultät für Mathematik, Informatik und Statistik - Digitale Hochschulschriften der LMU - Teil 01/02 by Ludwig-Maximilians-Universität München. The goal of bioinformatics is to develop innovat...

Listen Online

Use the player on this page to stream the episode online.

Episode Details

Published May 10, 2013, 0 long, audio available.

Questions About This Episode

What is Context based bioinformatics about?

The goal of bioinformatics is to develop innovative and practical methods and algorithms for bio- logical questions. In many cases, these questions are driven by new biotechnological techniques, especially by genome and cell wide high throughput experiment studies. In principle there are two approaches: 1. Reduction and abstraction of the question to a clearly defined optimization problem, which can be solved with appropriate and efficient algorithms. 2. Development of context based methods, incorporating as much contextual knowledge as possible in the algorithms, and derivation of practical solutions for relevant biological ques- tions on the high-throughput data. These methods can be often supported by appropriate software tools and visualizations, allowing for interactive evaluation of the results by ex- perts. Context based methods are often much more complex and require more involved algorithmic techniques to get practical relevant and efficient solutions for real world problems, as in many cases already the simplified abstraction of problems result in NP-hard problem instances. In many cases, to solve these complex problems, one needs to employ efficient data structures and heuristic search methods to solve clearly defined sub-problems using efficient (polynomial) op- timization (such as dynamic programming, greedy, path- or tree-algorithms). In this thesis, we present new methods and analyses addressing open questions of bioinformatics from different contexts by incorporating the corresponding contextual knowledge. The two main contexts in this thesis are the protein structure similarity context (Part I) and net- work based interpretation of high-throughput data (Part II). For the protein structure similarity context Part I we analyze the consistency of gold standard structure classification systems and derive a consistent benchmark set usable for different ap- plications. We introduce two methods (Vorolign, PPM) for the protein structure similarity recog- nition problem, based on different features of the structures. Derived from the idea and results of Vorolign, we introduce the concept of contact neighbor- hood potential, aiming to improve the results of protein fold recognition and threading. For the re-scoring problem of predicted structure models we introduce the method Vorescore, clearly improving the fold-recognition performance, and enabling the evaluation of the contact neighborhood potential for structure prediction methods in general. We introduce a contact consistent Vorolign variant ccVorolign further improving the structure based fold recognition performance, and enabling direct optimization of the neighborhood po- tential in the future. Due to the enforcement of contact-consistence, the ccVorolign method has much higher computational complexity than the polynomial Vorolign method - the cost of com- puting interpretable and consistent alignments. Finally, we introduce a novel structural alignment method (PPM) enabling the explicit modeling and handling of phenotypic plasticity in protein structures. We employ PPM for the analysis of effects of alternative splicing on protein structures. With the help of PPM we test the hypothesis, whether splice isoforms of the same protein can lead to protein structures with different folds (fold transitions). In Part II of the thesis we present methods generating and using context information for the interpretation of high-throughput experiments. For the generation of context information of molecular regulations we introduce novel textmin- ing approaches extracting relations automatically from scientific publications. In addition to the fast NER (named entity recognition) method (syngrep) we also present a novel, fully ontology-based context-sensitive method (SynTree) allowing for the context-specific dis- ambiguation of ambiguous synonyms and resulting in much better identification performance. This context information is important for the interpretation of high-throughput data, but often missing in current databases. Despite all improvements, the results of automated text-mining methods are error prone. The RelAnn application presented in this thesis helps to curate the automatically extracted regula- tions enabling manual and ontology based curation and annotation. For the usage of high-throughput data one needs additional methods for data processing, for example methods to map the hundreds of millions short DNA/RNA fragments (so called reads) on a reference genome or transcriptome. Such data (RNA-seq reads) are the output of next generation sequencing methods measured by sequencing machines, which are becoming more and more efficient and affordable. Other than current state-of-the-art methods, our novel read-mapping method ContextMap re- solves the occurring ambiguities at the final step of the mapping process, employing thereby the knowledge of the complete set of possible ambiguous mappings. This approach allows for higher precision, even if more nucleotide errors are tolerated in the read mappings in the first step. The consistence between context information of molecular regulations stored in databases and extracted from textmining against measured data can be used to identify and score consistent reg- ulations (GGEA). This method substantially extends the commonly used gene-set based methods such over-representation (ORA) and gene set enrichment analysis (GSEA). Finally we introduce the novel method RelExplain, which uses the extracted contextual knowl- edge and generates network-based and testable hypotheses for the interpretation of high-throughput data.

Where can I listen to Context based bioinformatics?

You can listen to Context based bioinformatics online on Radio and Podcast. Open the player on this page to stream the available audio.

Which podcast is Context based bioinformatics from?

Context based bioinformatics is an episode from Fakultät für Mathematik, Informatik und Statistik - Digitale Hochschulschriften der LMU - Teil 01/02 by Ludwig-Maximilians-Universität München.

How long is this episode?

This episode is 0 long.

When was this episode published?

This episode was published on May 10, 2013.

Can I save Context based bioinformatics for later?

Yes. Use the heart button on the episode page to add it to your favorite episodes list.

Are there related episodes from Fakultät für Mathematik, Informatik und Statistik - Digitale Hochschulschriften der LMU - Teil 01/02?

Yes. This page shows related episodes from Fakultät für Mathematik, Informatik und Statistik - Digitale Hochschulschriften der LMU - Teil 01/02 when more episodes are available from the podcast feed.

Quick Answers About This Episode

Where can I listen to Context based bioinformatics?

You can listen to Context based bioinformatics on this page when the episode audio is available from the podcast feed.

Which podcast is this episode from?

Context based bioinformatics is from Fakultät für Mathematik, Informatik und Statistik - Digitale Hochschulschriften der LMU - Teil 01/02 by Ludwig-Maximilians-Universität München.

What are the episode details?

Published May 10, 2013 and 0 long