Computational Biology Research Seminar Program

Future Meetings

Give cancer archival tissues a new life: unlock with next-gen sequencing

Dr Lan Hu, Center for Cancer Computational Biology, Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA

Formalin-Fixed Paraffin-Embedded (FFPE) archival tissue blocks are a rich resource for retrospective discovery studies in cancer. A great advantage of FFPE blocks is that in many cases tissue samples from different disease stages of the same patient are available together with the clinical data, which allows us to build more precise models to identify the gene expression patterns associated with cancer progression. However, the fragmented and cross-linked nature of mRNA/DNA in FFPE blocks makes it challenging to measure transcript levels reliably. I will present how we leverage next-generation sequencing technology to quantify transcripts in FFPE samples. We have sequenced the transcriptome of FFPE samples from non-progressive and progressed stages of bladder cancer. The subsequent analysis revealed gene expression patterns correlated with progression time, a first step towards a clinically feasible diagnostic tool to predict whether, and ultimately when, a patient will progress to the invasive stage of bladder cancer.



Tuesday, 29-11-2011, 5:00pm, Bioinformatics Seminar Room

Past Program

Assessing Robustness Issues in Microarray Data Analysis

Alexandra Posekany, VSCB, BOKU University Vienna.

Alexandra Posekany, Klaus Felsenstein, Peter Sykacek

Commonly applied methods for analysing microarray data assume Gaussian noise, despite the normal distribution's sensitivity to the outliers that frequently occur in such data. The computational efficiency of Gaussian-derived methods makes them preferable to robust or non-parametric approaches, especially when dealing with high-dimensional microarray data. In contrast to the Gaussian model, a robust model can deal with heavy-tailed data. However, analysing the effects of choosing a more complex robust model requires a thorough investigation.

Therefore, we propose a hierarchical Bayesian model which allows us to select the most appropriate noise model from a given finite set. Including the Gaussian model in the set of considered distributions makes it possible to directly compare Gaussian to heavier-tailed noise models. Furthermore, we compared the inference results of the best fitting noise model to those of the Gaussian model in order to assess differences in the biologically relevant differential expression classification.

Our investigations yielded the interesting result that a heavy-tailed distribution provides the best fit for all investigated data sets. Comparing the heavy-tailed t model to the Gaussian noise model revealed that choosing the less robust model has a huge impact on the differential expression results. The choice of noise model consequently also affects the conclusions drawn at higher levels of analysis, as we show for Gene Ontology-based assessments.
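
To illustrate the kind of model comparison involved (a toy maximum-likelihood sketch on simulated data, not the speakers' hierarchical Bayesian model or their actual data), one can check how a Gaussian and a Student-t noise model compare on heavy-tailed observations:

```python
import numpy as np
from scipy import stats

# Draw heavy-tailed "expression noise" and compare the fit of a Gaussian
# versus a Student-t model by maximised log-likelihood.
rng = np.random.default_rng(0)
data = stats.t.rvs(df=3, size=2000, random_state=rng)

# Gaussian fit: the MLE is the sample mean and standard deviation.
mu, sigma = data.mean(), data.std()
ll_gauss = stats.norm.logpdf(data, mu, sigma).sum()

# Student-t fit: scipy estimates df, location and scale by numerical MLE.
df, loc, scale = stats.t.fit(data)
ll_t = stats.t.logpdf(data, df, loc, scale).sum()

print(f"Gaussian log-likelihood:  {ll_gauss:.1f}")
print(f"Student-t log-likelihood: {ll_t:.1f}")
```

On data like these the t model wins by a wide margin, mirroring the talk's finding that heavy-tailed noise models fit microarray data better than the Gaussian.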



Wednesday, 06-04-2011, 3:00pm, Bioinformatics Seminar Room

Feature architecture similarity complements sequence similarity in tracing the evolution of functional protein complexes.

Tina Koestler, Center for Integrative Bioinformatics Vienna, Max F. Perutz Laboratories, Dr. Bohr Gasse 9, 1030 Vienna, Austria

Tina Koestler, Alwin Koehler, Arndt von Haeseler, Ingo Ebersberger

Many proteins encoded in a species' genome interact to form functional modules such as metabolic pathways or multiprotein complexes. Initially, research in a few well-established model organisms sought to identify such functional modules. Meanwhile, genome sequences representing almost all major groups in the eukaryotic tree are available. This wealth of data now makes it possible to investigate in which eukaryotic species a functional module is present, when during evolution it emerged, and how it evolved. As a result, a more refined picture of organismal and functional evolution will emerge. The key problem, however, lies in the accurate prediction of functional equivalents to the proteins in the module of interest. Homology inference to already characterized proteins is the prevalent method to identify functional equivalents in new data. This brings along three major issues: first, homologs may have diverged in their function; second, homology inference becomes increasingly hard with increasing evolutionary distance; and third, functional equivalents are not necessarily homologous. This calls for comprehensive methods that can complement such sequence similarity-based approaches in the search for functional equivalents. We developed the Feature Architecture Comparison Tool (FACT), which uses the similarity in the arrangement of functional domains, secondary structure elements and compositional properties between two proteins as a proxy for their functional equivalence. A scoring function measures this feature architecture similarity as a weighted sum of three partial scores capturing the similarity of (i) the number of instances for all shared features, (ii) Pfam clan annotations, and (iii) the position of shared features between two proteins. Thereby, FACT facilitates a feature-based search for functional equivalents in entire proteomes.
An evaluation on EC-classified enzymes shows that FACT still identifies functional equivalents when sequence similarity no longer suffices to trigger a significant BLAST hit. Thus, FACT complements sequence similarity-based approaches in the search for functional equivalents. We will introduce FACT and exemplify its joint application with orthology prediction tools and BLAST in tracing two protein complexes described in yeast, TREX-2 and SAGA, across the eukaryotic tree of life. Both complexes interact in relocating transcribed chromatin to the nuclear pore complex. Targeting of activated genes to nuclear pores, also termed gene gating, was recently described as a novel mechanism for modulating gene expression. It is currently unclear how universal this phenomenon is and whether the molecular machinery for gene gating is conserved across species. We reconstruct the evolution of these modules by searching for functional equivalents in fungi, plants, animals and protists.
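
To make the weighted-sum idea concrete, here is a toy sketch of such an architecture score. The feature representation, weights and sub-score formulas below are our own illustrative assumptions, not FACT's published scoring function:

```python
from collections import Counter

# Each protein is represented as a list of (feature_name, relative_position)
# tuples, with positions scaled to [0, 1] along the sequence.

def count_score(a, b):
    """Similarity of instance counts for the shared features (sub-score i)."""
    ca, cb = Counter(f for f, _ in a), Counter(f for f, _ in b)
    shared = set(ca) & set(cb)
    if not shared:
        return 0.0
    return sum(min(ca[f], cb[f]) / max(ca[f], cb[f]) for f in shared) / len(shared)

def position_score(a, b):
    """Agreement in relative position of shared features (sub-score iii)."""
    pa, pb = dict(a), dict(b)
    shared = set(pa) & set(pb)
    if not shared:
        return 0.0
    return sum(1.0 - abs(pa[f] - pb[f]) for f in shared) / len(shared)

def architecture_score(a, b, clan_score, w=(0.4, 0.3, 0.3)):
    """Weighted sum of the three partial scores; clan_score stands in for the
    Pfam-clan comparison (sub-score ii) and is supplied by the caller."""
    return w[0] * count_score(a, b) + w[1] * clan_score + w[2] * position_score(a, b)

p1 = [("kinase", 0.2), ("SH2", 0.7)]
p2 = [("kinase", 0.25), ("SH2", 0.75)]
print(round(architecture_score(p1, p2, clan_score=1.0), 3))  # → 0.985
```

Two proteins with the same features in nearly the same arrangement score close to 1, which is the intuition behind using architecture similarity as a proxy for functional equivalence.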


Thursday, 02-12-2010, 3:30pm, Bioinformatics Seminar Room

Endocrine Regulation of Drosophila Aging

Dr. Thomas Flatt, Institut für Populationsgenetik Veterinärmedizinische Universität Wien, Austria

Trade-offs between reproduction and lifespan or somatic maintenance are ubiquitous, but little is known about their underlying mechanisms. Recent work suggests that reproduction and lifespan might be linked by molecular signals produced by reproductive tissues. In the nematode C. elegans, lifespan is extended if worms lack proliferating germ cells in the presence of an intact somatic gonad. This suggests that the gonad is the source of signals that physiologically modulate organismal aging. Our work shows that such gonadal signals are also present in the fruit fly D. melanogaster, suggesting that the regulation of lifespan by the reproductive system is evolutionarily conserved. Ablation of germline stem cells in the fly extends lifespan and modulates components of the insulin/insulin-like growth factor signaling (IIS) pathway in peripheral tissues, a conserved pathway important in regulating growth, metabolism, reproduction, and aging. Using a combination of experimental evolution and hormonal manipulation, we further find that juvenile hormone (JH), a hormone downstream of IIS, mediates the physiological, but not necessarily the evolutionary, trade-off between lifespan and reproduction in Drosophila. Moreover, we show that JH acts as a powerful immunosuppressor, suggesting that this hormone might regulate the commonly observed trade-off between reproduction and immune function. The rapid progress made by molecular biologists in identifying candidate mechanisms affecting life history trade-offs will enable evolutionary biologists to determine whether there is standing genetic variation for such pleiotropic mechanisms in natural populations and whether they are under selection.


Tuesday, 24-03-2009, 4:00pm, Bioinformatics Seminar Room

Mixing and matching data from different sources to understand disease

Prof. Cristin Print, Clinical Molecular Medicine & Pathology School of Medical Sciences, University of Auckland, NZ

To understand how thousands of individual molecules work together to determine the function of tissues, we need to combine information from multiple sources. This integration of different types of information is one of the key goals of systems biology, and is likely to play an important role in developing a better understanding of human disease at a molecular level. For example, it may be useful to combine information from in vitro and in vivo microarray experiments, or from exon arrays with 3' arrays, or to compare transcript abundance and pathway activation between epithelial cells and stromal cells in the same tumour. Alternatively, it may be useful to combine information about transcription factor specificity with microarray data, or to combine data about mRNA and miRNA abundance in the same tissue. This talk will discuss a set of experiments where my research group have tried to use bioinformatics to bring data of these different types together, and some of the challenges that these experiments have revealed.


Thursday, 12-03-2009, 4:00pm, Bioinformatics Seminar Room

Challenges of Managing e-science Workflows

Dr. Ivona Brandic, Distributed Systems Group, ISI, Vienna University of Technology

Large-scale scientific problems such as preoperative medical simulations and disaster recovery applications are usually expressed by means of a Grid workflow, in which various resources distributed over different administrative domains are combined to solve complex scientific problems. In the first part of this talk we present the principles for the specification, planning, and execution of e-science workflows. Moreover, we discuss workflow-enabling technologies such as Service-Oriented Computing, Grid, and Cloud Computing. In the case of long-running e-science workflows, end users should be informed about the expected time or price of the workflow execution by means of Quality of Service (QoS) concepts. Thus, recent solutions as well as new research directions for QoS-aware e-science workflows will be presented. In the second part of the talk, new research challenges will be discussed, considering e-science workflows as a key technology for the establishment of virtual in-silico laboratories. In particular, we will discuss generic QoS negotiation models and user-driven, dynamic e-science workflows.


Monday, 19-01-2009, 4:30pm, Bioinformatics Seminar Room

Approximate Conditional-mean Type Filtering for State-space Models

Dr. Bernhard Spangl, Institute of Applied Statistics and Computing, BOKU University

We consider the problem of recursive filtering in linear state-space models. The classically optimal Kalman filter (Kalman, 1960; Kalman and Bucy, 1961) is well known to be sensitive to outliers, so robustness is an issue.

For an implementation in R (R Development Core Team, 2005), we have been working on an R package, robKalman (Ruckdeschel and Spangl, 2007), which provides a general infrastructure for robust recursive filters. Within this framework the rLS (Ruckdeschel, 2001) and the ACM (Martin, 1979) filters have already been implemented, the latter as an equivalent realization of the filter implemented in S-PLUS.

While this ACM filter is bound to the univariate setting, based on Masreliez's result (Masreliez, 1975) we propose a generalized ACM type filter for multivariate observations (Spangl and Dutter, 2008). This new filter is implemented in R within the robKalman package and has been compared to the rLS filter by extensive simulations.
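
The effect of robustifying a recursive filter can be sketched in a few lines. The following is a simplified scalar Kalman filter with an optional Huber-type clipping of the standardised innovation, with assumed system parameters; it is our own toy analogue of the idea behind filters such as rLS, not the actual rLS or ACM implementation from robKalman:

```python
import numpy as np

def kalman_filter(y, F=1.0, H=1.0, Q=0.01, R=1.0, clip=None):
    """Scalar Kalman filter; if clip is set, bound the standardised
    innovation at +/- clip before applying the gain (Huber-type step)."""
    x, P = 0.0, 1.0
    out = []
    for obs in y:
        # predict
        x, P = F * x, F * P * F + Q
        # update
        S = H * P * H + R                 # innovation variance
        K = P * H / S                     # Kalman gain
        innov = obs - H * x
        if clip is not None:
            z = innov / np.sqrt(S)
            innov = np.sqrt(S) * np.clip(z, -clip, clip)
        x = x + K * innov
        P = (1 - K * H) * P
        out.append(x)
    return np.array(out)

# A random walk observed with noise, plus one gross outlier.
rng = np.random.default_rng(1)
truth = np.cumsum(rng.normal(0, 0.1, 200))
y = truth + rng.normal(0, 1.0, 200)
y[50] += 25.0
err_plain = abs(kalman_filter(y)[50] - truth[50])
err_robust = abs(kalman_filter(y, clip=2.0)[50] - truth[50])
print(err_plain, err_robust)
```

At the outlier, the clipped filter stays far closer to the true state than the classical filter, which is dragged away by the full gain step.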

Reading List

R.E. Kalman (1960). A new approach to linear filtering and prediction problems. Journal of Basic Engineering, Transactions of the ASME, 82, pp. 35-45.
R.E. Kalman and R. Bucy (1961). New results in filtering and prediction theory. Journal of Basic Engineering, Transactions of the ASME, 83, pp. 95-108.
R.D. Martin (1979). Approximate conditional-mean type smoothers and interpolators. In Smoothing Techniques for Curve Estimation, Lecture Notes in Mathematics 757, pp. 117-143, Springer, Berlin.
C.J. Masreliez (1975). Approximate non-Gaussian filtering with linear state and observation relations. IEEE Transactions on Automatic Control, 20, pp. 107-110.
R Development Core Team (2005). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna.
P. Ruckdeschel (2001). Ansätze zur Robustifizierung des Kalman-Filters. Bayreuther Mathematische Schriften, Vol. 64.
P. Ruckdeschel and B. Spangl (2007). robKalman: An R package for robust Kalman filtering. Web: http://r-forge.r-project.org/projects/robkalman/
B. Spangl and R. Dutter (2008). Approximate Conditional-mean Type Filtering for Vector-valued Observations. Technical Report TR-AS-08-1, Universität für Bodenkultur, Vienna.


Tuesday, 25-11-2008, 4pm, Bioinformatics Seminar Room

Sparse Algebraic Dynamic Programming

Christian Höner zu Siederdissen, Theoretical Biochemistry Group, University of Vienna

Dynamic Programming is one of the ubiquitous tools in Bioinformatics today. Yet despite decades of use, implementing these algorithms correctly is still hard. Algebraic Dynamic Programming (ADP) aims to reduce the effort required to produce correct recurrences for a dynamic program by providing a set of combinators with which the problems can be formulated.

Sparse algorithms exploit the fact that for many problems the structure of the problem itself allows us to reduce the set of potential optimal answers substantially. This can lead to remarkable reductions in space and time complexity.

Sparse Algebraic Dynamic Programming combines the ideas of Algebraic Dynamic Programming and Sparsity into a framework where such algorithms can be written with ease.

Folding a single RNA sequence into the optimal secondary structure under basepair maximisation will be the showcase algorithm.

I will begin with a short introduction to Haskell. This functional language lets the user develop at a very high level of abstraction and, with GHC, has a compiler that produces very efficient code. Both ADP and sparse ADP are written in Haskell.
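
The showcase recurrence, base-pair maximisation for a single RNA sequence, is the classic Nussinov dynamic program. A plain Python sketch shows what the ADP combinators abstract over (the talk's actual implementations are in Haskell; the minimum hairpin-loop size here is an assumed convention):

```python
def nussinov(seq, min_loop=3):
    """Maximum number of base pairs in a secondary structure of seq,
    requiring at least min_loop unpaired bases inside every hairpin."""
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    M = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = M[i][j - 1]                 # case: j is unpaired
            for k in range(i, j - min_loop):   # case: j pairs with k
                if (seq[k], seq[j]) in pairs:
                    left = M[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + M[k + 1][j - 1])
            M[i][j] = best
    return M[0][n - 1] if n else 0

print(nussinov("GGGAAAUCC"))  # → 3 (a hairpin of three base pairs)
```

Sparsification prunes the inner loop over k to the small set of candidate split points that can still contribute an optimum, which is exactly the saving that Sparse ADP packages up.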

Tuesday, 28-10-2008, 4pm, Bioinformatics Seminar Room

Robust Gaussian Process Regression and Applications

Oliver Stegle, Inference Group, Cavendish Laboratory, University of Cambridge, U.K.

The Gaussian process prior is very flexible, easy to use, and gives excellent results for many regression problems. As soon as we try to tackle real-world data, however, the non-Gaussianity of nature and of noise often makes it difficult to apply the GP scheme directly. I will discuss non-Gaussian likelihood models for the robust regression task, in the context of an application to physiological heart-rate data where prior knowledge can be extracted from additional data sources.
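
As a baseline for the robust variants the talk discusses, standard GP regression with a Gaussian likelihood fits in a few lines of numpy (an illustrative sketch with assumed RBF kernel hyperparameters; heavier-tailed likelihoods give up this closed form and need approximate inference):

```python
import numpy as np

def rbf(a, b, ell=1.0, sf=1.0):
    """Squared-exponential kernel with length scale ell and signal scale sf."""
    d = a[:, None] - b[None, :]
    return sf**2 * np.exp(-0.5 * (d / ell) ** 2)

def gp_predict(x, y, xs, noise=0.1):
    """Posterior mean and covariance at test inputs xs, Gaussian noise model."""
    K = rbf(x, x) + noise**2 * np.eye(len(x))
    Ks, Kss = rbf(x, xs), rbf(xs, xs)
    mean = Ks.T @ np.linalg.solve(K, y)
    cov = Kss - Ks.T @ np.linalg.solve(K, Ks)
    return mean, cov

x = np.linspace(0, 5, 25)
y = np.sin(x)                  # noiseless toy observations
mean, cov = gp_predict(x, y, np.array([2.5]))
print(mean[0], cov[0, 0])      # posterior mean close to sin(2.5)
```

With Gaussian noise everything stays a linear solve; a single gross outlier in y would pull the posterior mean badly, which is the motivation for the non-Gaussian likelihoods in the talk.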

Thursday, 08-05-2008, 4pm, Bioinformatics Seminar Room

Modeling Thermodynamics of Microarray Hybridization

Dr. Ulli Mückstein, Theoretical Biochemistry Group, University of Vienna

Microarrays provide a high-throughput tool for a wide variety of applications, including such diverse tasks as the analysis of single nucleotide polymorphisms (SNPs), the analysis of transcript expression profiles, and the identification of microorganisms. In DNA microarrays, nucleotide probes attached to a solid support are used to detect solution-phase complementary target sequences by base-pairing. Typically, target molecules carry fluorescent labels, and the hybridization signal is quantified through a laser scanner or other imaging device. At present, however, the signal levels thus measured are not easily related to absolute quantities of target transcripts.

In the proposed research we will apply modern thermodynamic models of microarray hybridization both to the interpretation of microarray signals and to the design of optimal oligonucleotide probes. In particular, we will use full thermodynamic models of competitive hybridization, applying the partition function over the ensemble of all possible duplex structures to model the interaction. In contrast to previous approaches, this combines exact modeling of probe-target interactions with a computation of binding-site accessibility, which has been shown to be directly correlated with the secondary structures of the interacting molecules. The predictive power of our models will be assessed by microarray experiments, allowing subsequent model refinement.
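
As a much-simplified illustration of why duplex stability controls the measured signal (our own two-state toy with assumed values, not the competitive partition-function treatment of the talk), probe occupancy follows a Langmuir isotherm with a binding constant set by the duplex free energy:

```python
import math

R = 1.987e-3   # gas constant in kcal/(mol*K)
T = 318.15     # 45 C, an assumed typical hybridisation temperature

def fraction_bound(dG, conc):
    """Fraction of probe sites occupied, two-state model.
    dG: duplex formation free energy in kcal/mol (negative = favourable);
    conc: free target concentration in mol/l."""
    K = math.exp(-dG / (R * T))
    return K * conc / (1.0 + K * conc)

for dG in (-10.0, -14.0, -18.0):
    print(dG, fraction_bound(dG, 1e-9))
```

A few kcal/mol difference in duplex stability moves the occupancy across its whole dynamic range at nanomolar target concentrations; the full model of the talk additionally folds in binding-site accessibility and competition between targets.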

Wednesday, 05-03-2008, 4pm, Bioinformatics Seminar Room

Bayesian Modelling of Shared Gene Function

Dr Peter Sykacek, Bioinformatics Research Group, Dept. of Biotechnology, BOKU University Vienna

Biological assays are often carried out on tissues that contain many cell lineages and active pathways. Microarray data produced using such material therefore reflect superimpositions of biological processes. Analysing such data for shared gene function by means of well matched assays may help to provide a better focus on specific cell types and biological processes. The identification of genes that behave similarly in different biological systems also has the potential to reveal new insights into preserved biological mechanisms.
This talk proposes a hierarchical Bayesian model allowing integrated analysis of several microarray data sets for shared gene function. Each transcript is associated with an indicator variable that selects whether binary class labels are predicted from expression values or by a classifier which is common to all transcripts. Each indicator selects the component models for all involved data sets simultaneously. A quantitative measure of shared gene function is obtained by inferring a probability measure over these indicators.
Through experiments on synthetic data we illustrate potential advantages of this Bayesian approach over a standard method. A shared analysis of matched microarray experiments covering a) a cycle of mouse mammary gland development and b) the process of in vitro endothelial cell apoptosis is proposed as a biological gold standard. Several useful sanity checks are introduced during data analysis and we confirm the prior biological belief that shared apoptosis events occur in both systems. We conclude that a Bayesian analysis for shared gene function has the potential to reveal new biological insights, unobtainable by other means.
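
As a crude single-gene illustration of the indicator idea (our own toy using BIC-penalised Gaussian likelihoods as a stand-in for proper marginal likelihoods, not the hierarchical model of the talk), one can compare a class-dependent expression model against a class-independent one and turn the approximate Bayes factor into a posterior probability for the indicator:

```python
import numpy as np
from scipy import stats

def indicator_posterior(x0, x1, prior=0.5):
    """Posterior probability that expression differs between the two classes,
    from a BIC-penalised comparison of two Gaussian models."""
    pooled = np.concatenate([x0, x1])
    ll_null = stats.norm.logpdf(pooled, pooled.mean(), pooled.std()).sum()
    ll_alt = (stats.norm.logpdf(x0, x0.mean(), x0.std()).sum()
              + stats.norm.logpdf(x1, x1.mean(), x1.std()).sum())
    # BIC-style penalty for the two extra parameters of the split model
    log_bf = ll_alt - ll_null - np.log(len(pooled))
    odds = prior / (1.0 - prior) * np.exp(log_bf)
    return odds / (1.0 + odds)

rng = np.random.default_rng(2)
diff = indicator_posterior(rng.normal(0, 1, 30), rng.normal(2, 1, 30))
same = indicator_posterior(rng.normal(0, 1, 30), rng.normal(0, 1, 30))
print(diff, same)  # class-dependent gene vs. a gene behaving the same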

Tuesday, 11-12-2007, 4pm, Bioinformatics Seminar Room

Evaluation of Biological Mechanisms using Gene Expression Data: An Application of the Bayesian Regression and the Statistical Meta-Analysis

Dr Pramod K. Gupta, Institute of Statistical Science, Academia Sinica, Taiwan

It is now widely accepted that biological phenomena are usually governed by sets of genes rather than by single genes. Accordingly, interest in the analysis of genomics data has shifted from the behaviour of single genes towards the identification of functionally related sets of genes. Statistical approaches that combine the available data with biological information, including both subjective knowledge and database annotation, can now provide more relevant inferences about systems biology. This talk will discuss two novel statistical approaches that make use of both forms of biological information. The first approach is based on Bayesian regression, where the prior is constructed from the subjective knowledge that the time-course expression level of a virus gene in a cell is typically zero initially, increases for a while, and then decreases. We consider a regression model in which the mean function satisfies this shape restriction, incorporating the geometric information through a Bernstein polynomial prior. Since the dimension of the parameter space is not fixed, computation is facilitated by a reversible-jump Metropolis-Hastings algorithm. Applied to real data, the proposed method uncovered several interesting facts about the temporal transcription program of Baculovirus by grouping genes with similar expression patterns.
The second approach proposes a testing procedure for the automatic ontological analysis of gene expression data using database information. The objective of ontological analysis is to retrieve functional annotations, e.g. Gene Ontology terms, relevant to the cellular mechanisms underlying the gene expression profiles. Most existing methods implement a similar approach that exploits rank statistics of the genes, ordered by the strength of statistical evidence, e.g. p-values computed by testing hypotheses at the individual-gene level. These rank-based approaches often cause serious false discoveries; such drawbacks are pointed out from a statistical point of view, and a new testing procedure is proposed to overcome them. The proposed method is founded on statistical meta-analysis, with the hypothesis to be tested stated suitably for the problem of ontological analysis. The disadvantages of the rank-based approach and the advantages of the proposed method are illustrated through Monte Carlo experiments as well as gene expression data from human diabetes.
Key words: Bernstein polynomial; Metropolis-Hastings-Green algorithm; Non-informative priors; Statistical Meta-Analysis; Fisher's Exact Test; Gene Ontology; Gene Expression Data; Monte-Carlo study.
Friday, 03-08-2007, 3pm, Bioinformatics Seminar Room

Learning via Dependence Estimation

Dr Le Song, NICTA Statistical Machine Learning Program, University of Sydney.
joint work with Karsten Borgwardt, Arthur Gretton, Alex Smola and Bernhard Schölkopf

Many learning problems can be cast as maximizing dependence or minimizing independence: for instance, feature selection maximizes the dependence between the chosen features and the given labels; clustering does so between the generated cluster labels and the data; and dimensionality reduction with side information, between the reduced representation of the data and the side information. In this talk, I will introduce a kernel measure of statistical dependence based on Hilbert space embeddings of distributions, the Hilbert-Schmidt Independence Criterion (HSIC). In contrast to traditional measures such as mutual information, HSIC has the advantage that no density estimation is required and good convergence of its empirical estimate is guaranteed. I will then formulate various learning problems using HSIC, i.e. transform the data and/or the labels such that they are strongly dependent (relevant) as measured by HSIC. This dependence view provides a unifying learning framework: by choosing an appropriate kernel, it recovers many existing algorithms as special cases, such as signal-to-noise ratio for feature selection, k-means clustering and maximum variance unfolding. Furthermore, it also suggests new and interesting algorithms, such as clustering with structured labels and colored maximum variance unfolding. Learning via dependence estimation also has wide applicability, which I will illustrate using various real-world datasets, ranging from microarray data to EEG recordings, and from images to text documents.
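
The biased empirical HSIC estimate is just a trace of centred Gram matrices; a minimal numpy sketch (our own illustration, with an assumed RBF bandwidth) makes the "no density estimation required" point concrete:

```python
import numpy as np

def rbf_gram(x, gamma=0.5):
    """RBF Gram matrix for a 1-D sample."""
    d2 = (x[:, None] - x[None, :]) ** 2
    return np.exp(-gamma * d2)

def hsic(x, y, gamma=0.5):
    """Biased empirical HSIC: trace(K H L H) / (n-1)^2,
    where H = I - 11'/n centres the Gram matrices."""
    n = len(x)
    H = np.eye(n) - np.ones((n, n)) / n
    K, L = rbf_gram(x, gamma), rbf_gram(y, gamma)
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

rng = np.random.default_rng(3)
x = rng.normal(size=200)
print(hsic(x, x**2), hsic(x, rng.normal(size=200)))  # dependent vs independent
```

The nonlinearly dependent pair scores far higher than the independent pair, even though x and x**2 are uncorrelated, which is exactly what a kernel dependence measure buys over covariance.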
Friday, 27-07-2007, 2pm, Bioinformatics Seminar Room

Gene expression evolution in the brain

Dr Gerton Lunter, Comparative Genomics Group, MRC Functional Genetics Unit, Dept of Physiology, Anatomy and Genetics, University of Oxford

No abstract available.
Friday, 20-07-2007, 4pm, Bioinformatics Seminar Room

Evolutionary rate analyses in clades - comparative genomics of multiple species

Dr Andreas Heger, Comparative Genomics Group, MRC Functional Genetics Unit, Dept of Physiology, Anatomy and Genetics, University of Oxford

Thanks to many large-scale sequencing efforts, we are lucky now to have several complete genomes available for study. As a consequence, comparative studies that previously included only pairs of genomes can now extend over whole clades. In my talk I will introduce our methods for analyzing the protein coding content of several species at once and present our results for the recently sequenced clade of 12 Drosophila genomes.
Friday, 20-07-2007, 3pm, Bioinformatics Seminar Room

Kernel Methods, Software Algorithms and Applications

Dr Alexandros Karatzoglou, Institut für Statistik und Wahrscheinlichkeitstheorie, Technische Universität Wien

This talk will focus on some important aspects of kernel methods. It provides an introduction to the R kernel-methods software package "kernlab", a description of online step-size-adapted learning algorithms, and an application of spectral clustering to text grouping. The presentation will also give a general introduction to some basic concepts in the field of kernel-based machine learning.
Wednesday, 13-06-2007, 4pm, Bioinformatics Seminar Room

Simple Sequence Repeats in Theory and Practice

Dr Daniel Dieringer, Institut für Angewandte Genetik und Zellbiologie, Universität für Bodenkultur, Wien

Simple sequence repeats are often used as a tool in population genetics. Their special mutational behaviour is that they change the length of their repeated sequence, and their high mutation rate makes it easy to detect genetic differences even between closely related individuals. Because the mutation rate is high and microsatellites are randomly distributed in large numbers across genomes, they are a good tool to detect recent selective sweeps, also under complex demographic scenarios. We tested this by coalescent and forward simulations and used the approach to screen the genome of the fruit fly Drosophila melanogaster. The original habitat of the fruit fly was in Central Africa, from where it spread all over the world ~10,000 years ago. Previous studies have suggested that this habitat expansion involved the spread of beneficial mutations in non-African populations. As a side project, we also address the question of what distinguishes a simple sequence repetition from a microsatellite with its typical mutational process, and how a 'microsatellite' is born.
Wednesday, 16-05-2007, 4pm, Bioinformatics Seminar Room

On gene identification, selection and association analysis

Dr Enrico Capobianco, Research Fellow, CRS4 Bioinformatics Lab, Pula/Cagliari, Sardinia

Microarrays measure the abundance of thousands of mRNA targets simultaneously, but deliver only small numbers of samples with noisy, high-dimensional gene expression values. Gene-gene relationships can be elucidated from the observed experimental measurements after suitable data mining and pre-processing steps; this involves several important statistical aspects such as model selection, inference regularization, and calibration. For reliable inference, dealing with extremely small data samples is a crucial issue, as it leads to important deviations from standard statistical routes. Of main interest is gene feature selection performed by projective methods; other related topics will also be presented.
Monday, 14-05-2007, 4:30pm, Bioinformatics Seminar Room

In-depth bioinformatic analysis of gene expression profiles reveals molecular processes and transcriptional control of rate-limiting enzymes

Dr Thomas Burkard, Bioinformatics Research Group, Institute for Molecular Pathology, Vienna

Background
Large-scale transcription profiling of cell models and model organisms can identify novel molecular components involved in fat cell development. The extent to which molecular processes can be revealed by expression profiling and functional annotation of genes was evaluated.
Results
Mouse microarrays with more than 27,000 elements were developed, and transcriptional profiles were monitored during adipose differentiation. In total, 780 differentially expressed expressed sequence tags (ESTs) were subjected to in-depth bioinformatics analyses. A molecular atlas of fat cell development was then constructed by de novo functional annotation on a sequence segment/domain-wise basis of 659 protein sequences, and subsequent mapping onto known pathways, possible cellular roles, and subcellular localizations. Key enzymes in 27 out of 36 investigated metabolic pathways were regulated at the transcriptional level, typically at the rate-limiting steps in these pathways. Also, coexpressed genes rarely shared consensus transcription-factor binding sites.
Conclusions
Large-scale transcription profiling in conjunction with sophisticated bioinformatics analyses can provide not only a list of novel players in a particular setting but also a global view on biological processes and molecular networks.
Wednesday, 09-05-2007, 4pm, Bioinformatics Seminar Room

Just-in-time assembly: the evolution of transcriptional and post-translational cell-cycle regulation of protein complexes.

Dr Lars Juhl Jensen, Bork group [Computational Biology], European Molecular Biology Laboratory (EMBL), Heidelberg, Germany

The regulation of the eukaryotic cell cycle has been a topic of intense research for decades, and it is of both fundamental scientific and medical importance. In recent years, microarrays have been utilized to simultaneously monitor the expression of all genes during the mitotic cell cycle of humans, budding yeast, fission yeast, and the plant Arabidopsis thaliana. Reanalysis of the available microarray data for each species, combined with sequence-based orthology detection, provided insight into the evolution of cell-cycle regulation. Surprisingly, the transcriptional regulation of cell-cycle genes is very poorly conserved between orthologous genes. By mapping the expression data onto protein interaction networks and well-described protein complexes, we discovered that the assembly, and hence the activity, of protein complexes is typically controlled through only a few subunits. We were able to show that although the identity of the periodically expressed subunits of a given complex varies greatly between species, the regulated subunits in each species are expressed shortly before the complex is known to act. Moreover, comparing the results from microarray expression studies to sets of substrates of cyclin-dependent kinases, which had either been determined experimentally or predicted based on sequence motifs, revealed that transcriptional and post-translational regulation have co-evolved independently in multiple lineages; the subunits that control the dynamic assembly of protein complexes during the cell cycle are thus tightly controlled at multiple levels. Our results indicate that many solutions have evolved for assembling the same molecular machines at the right time during the cell cycle, which raises the question of how fast regulation evolves and how closely related two organisms have to be for regulatory details to be transferable.
Thursday, 15-02-2007, 4pm, Bioinformatics Seminar Room

Recent advances in quantitative proteomics and applications to organelle characterization.

Dr Kathryn Lilley, Director, Cambridge Centre for Proteomics Cambridge Systems Biology Institute, University of Cambridge, UK

There are numerous approaches to studying the proteome in a quantitative manner currently used by the proteomics community to answer a wide range of biological questions. Many of these techniques are complementary, and their strengths and weaknesses will be discussed, particularly in terms of compatibility with different samples and reproducibility. One important biological question for which quantitative proteomics approaches can be used is the accurate assignment of proteins to subcellular locations. The study of proteins at the organelle level is key to our understanding of both the function of proteins and the role of organelles. Organelle proteomics is hampered by our inability to efficiently purify most organelles. However, partial separation of organelles can be achieved using density gradient centrifugation, and this results in distributions of proteins along the gradients that are unique to the organelle in which they reside. The subcellular localization of proteins with hitherto unknown locations can then be determined by comparing their distributions to those of previously localized proteins, as proteins belonging to the same organelle will co-fractionate. A quantitative approach and concomitant analysis of data are thus required for accurate visualization of the distribution patterns. One such approach, LOPIT (localisation of organelle proteins by isotope tagging), combines isotope tagging using iTRAQ with multivariate statistical analyses. This analytical combination will be discussed, along with its use to assign proteins within the Arabidopsis endomembrane system and to localise Drosophila signalling components.
Thursday, 01-02-2007, 4pm, Bioinformatics Seminar Room

Non-EST based prediction of alternatively spliced cassette exons with cell signaling function in Caenorhabditis elegans and human.

Dr German LeParc, Dept. of Genetics, Washington University, St. Louis, MO, USA.

Although alternative splicing is recognized as a pervasive cellular phenomenon, little is known about the function of most alternative transcripts. Recent evidence suggests that alternative splicing is intimately linked with cell signaling, and this may partially explain how cells can produce so many different responses to their environments. To find alternatively spliced protein isoforms involved in signaling, we developed PASE (Prediction of Alternative Signaling Exons), a computational tool that identifies alternative cassette exons that code for kinase phosphorylation or signaling protein binding sites. We applied PASE to the C. elegans and human genomes and tested our predictions via RT-PCR. We experimentally verified 59 previously unknown alternatively spliced cassette exons (33 in worm, 26 in human) and showed that these are likely to function in signaling pathways. One of our validated predictions, a novel isoform of Estrogen Receptor alpha (ERa) in human, may explain the molecular basis of a previously observed interaction between ERa and the signaling protein PI3 Kinase.
Monday, 29-01-2007, 4pm, Bioinformatics Seminar Room

Computer-aided epitope based vaccine design

Dr Sudipto Saha, Institute of Microbial Technology, Chandigarh, India.

Recently, a lot of attention has been directed towards computer and information science in order to solve immunological problems. Immunological data are increasing exponentially, which poses a challenge to informaticians. Analyses of the immune system using computational models such as artificial neural networks (ANN), support vector machines (SVM) and hidden Markov models (HMM) typically involve the development of computer-aided vaccine design (CAVD) and its application to the search for new vaccines. Key to this challenge is the prediction of immunogenicity, at the level of both the epitope and the subunit vaccine. The talk will present improved and novel prediction methods for the identification of potential epitope-based vaccine candidates. The work to be presented mainly emphasizes i) the identification of potential targets, including virulence factors, and ii) the development of prediction methods for linear B-cell epitopes.
Thursday, 25-01-2007, 4pm, Bioinformatics Seminar Room

New Methods for the Prediction of Transmembrane Helices

Dr Moti Zviling, The Alexander Silberman Institute of Life Sciences, Department of Biological Chemistry, The Hebrew University, Jerusalem, Israel

Membrane proteins are crucial for many biological functions and have become attractive targets for pharmacological agents. About 10%-30% of all proteins have been found to contain membrane-spanning helices, yet high-resolution structures of membrane proteins are still exceptional. The gap between known sequences and known structures forces us to look for solutions through bioinformatics methods. In this lecture, I present three projects that aim to address this problem. The first deals with a new method for the prediction of membrane helices. The genomic abundance and pharmacological importance of membrane proteins have fuelled efforts to identify them based solely on sequence information. Earlier methods based on hydropathy analysis have been replaced by approaches based on hidden Markov models or neural networks, which prevail due to their probabilistic orientation. We optimized the hydrophobicity tables used in hydropathy analysis by means of a genetic algorithm. The results show a significant improvement in the prediction accuracy of hydropathy analysis, which may be valuable in the analysis of new genomes; the values obtained for each amino acid in the new hydrophobicity tables are discussed. With a method in hand that optimizes the prediction of transmembrane (TM) domains, we then investigated the importance of such helices in bitopic proteins, which have been shown to be an important factor in protein-protein interactions and to influence protein function. Relative conservation analysis was used to investigate this issue. Interestingly, the transmembrane domains of bitopic proteins turned out to be, on average, significantly more conserved than the remainder of the protein. Fourier transform analysis of the conservation periodicity pointed to a pattern in which one side of the helix was conserved while the other was not.
However, analysis of highly conserved transmembrane domains did not reveal any unifying consensus, pointing to a great diversity in the conservation patterns. Taken together, this suggests that a significant proportion of the transmembrane helices of bitopic membrane proteins participate in oligomerization events utilizing a multitude of motifs; the plane of the lipid bilayer may thus represent a comparatively understudied region of protein-protein interactions. The third project addresses the question of whether conserved TM sequences of bitopic proteins that share a high degree of sequence identity also share a high structural similarity. In addition, information derived from evolutionary conservation data of such proteins could be useful, especially in cases where high-resolution methods for structure determination have proved difficult to implement. Using silent amino acid substitution simulations (CNS), we found new oligomerization motifs that may form new structures. These motifs are based only on widely available evolutionary conservation data.
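The classical hydropathy analysis that the first project builds on can be sketched very compactly: average a per-residue hydrophobicity table over a sliding window and flag windows above a cutoff as candidate TM helices. The Kyte-Doolittle values below are the standard published scale and the window/cutoff choices are conventional, but the sequence is an invented example; the talk's genetic algorithm would instead optimize the table entries themselves against known TM annotations.

```python
# Sliding-window hydropathy analysis for TM helix prediction.
# Standard Kyte-Doolittle per-residue hydrophobicity values.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(seq, window=19):
    """Mean hydropathy in each window of `window` residues."""
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [sum(scores[i:i + window]) / window
            for i in range(len(seq) - window + 1)]

def predicted_tm_windows(seq, window=19, cutoff=1.6):
    """Start indices of windows whose mean hydropathy exceeds the cutoff."""
    return [i for i, s in enumerate(hydropathy_profile(seq, window))
            if s > cutoff]

# Invented test sequence: a 19-residue hydrophobic stretch flanked by
# polar residues on both sides.
seq = "MKRDDERKN" + "LIVLLAVILLAVILLAVIL" + "DDERKNKRD"
hits = predicted_tm_windows(seq)  # windows centred on the stretch
```

A genetic algorithm would treat the 20 table values as the genome of an individual and score each individual by prediction accuracy over proteins with experimentally known TM segments.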
Thursday, 18-01-2007, 4pm, Bioinformatics Seminar Room

Quantitative Modeling in Systems Biology, Part II: Probabilistic Models

Dr Peter Sykacek, Bioinformatics, BOKU University, Vienna.

During the second part of the quantitative modeling in systems biology seminar, we will review applications of probabilistic approaches to systems biology. After a brief introduction to probabilistic modeling, we will motivate the use of stochastic state space models (SSMs). SSMs describe the dynamics of biological systems and thus have goals similar to those of ODE-based approaches. The most essential difference is the inherent random nature of SSMs. In a purely probabilistic understanding, the states and all parameters of an SSM are considered to be interacting random variables; measurement error is thus dealt with implicitly. Model inference is based on the rules of probability calculus and can be set up such that a "divide and conquer" decomposition is carried out in parallel with parameter inference. Unfortunately, practical experience with such approaches is that the versatility of the modeling is all too often unmet by data quality. The final word of caution is thus not to believe that a sophisticated analysis will turn bad data into good systems biology.
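The core idea, that latent states and noisy observations are jointly modeled as random variables, can be illustrated with the simplest possible SSM: a one-dimensional linear-Gaussian model filtered with the standard Kalman recursion. The parameters and simulated data below are invented for illustration; real SSM inference for biological systems is considerably richer than this sketch.

```python
# A 1-D linear-Gaussian state space model:
#   x_t = a * x_{t-1} + process noise,   y_t = x_t + measurement noise.
# The Kalman filter recovers the latent state from the noisy y_t,
# illustrating how measurement error is handled implicitly.
import random

def simulate(a=0.9, q=0.05, r=0.5, steps=50, seed=1):
    """Simulate latent states x and noisy observations y."""
    rng = random.Random(seed)
    x, xs, ys = 1.0, [], []
    for _ in range(steps):
        x = a * x + rng.gauss(0.0, q ** 0.5)
        xs.append(x)
        ys.append(x + rng.gauss(0.0, r ** 0.5))
    return xs, ys

def kalman_filter(ys, a=0.9, q=0.05, r=0.5):
    """Posterior mean of the latent state given observations so far."""
    mean, var, means = 0.0, 1.0, []
    for y in ys:
        # Predict one step ahead through the state dynamics.
        mean, var = a * mean, a * a * var + q
        # Update with the new observation, weighted by the Kalman gain.
        gain = var / (var + r)
        mean, var = mean + gain * (y - mean), (1 - gain) * var
        means.append(mean)
    return means

xs, ys = simulate()
est = kalman_filter(ys)
```

With the correct model parameters, the filtered state estimates track the latent trajectory far more closely than the raw observations do; in a fully Bayesian treatment the parameters a, q and r would themselves be random variables to be inferred.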
Wednesday, 6-12-2006, 4pm, Bioinformatics Seminar Room

Quantitative Modeling in Systems Biology, Part I: ODEs

Thomas Tüchler, Bioinformatics, BOKU University, Vienna.

Our quantitative modeling seminars will review two quite popular modeling approaches in systems biology: differential equations and probabilistic state space models. Recent trends in bioinformatic data analysis are driven by the quest to obtain a quantitative understanding of interactions in biological processes at the molecular level. Although analysis is still largely concerned with inferring differentially expressed genes, in-depth modeling clearly has to go beyond that. To avoid the curse of dimensionality, which is inherent to monolithic models of biological systems, all promising approaches are based on divide and conquer strategies. This is achieved by structuring the overall model as an interaction of several simple modules.

Thomas Tüchler will talk about his experiences with differential equations. The divide and conquer approach requires defining a modularized structure of gene regulatory networks a priori. The overall system is thus described as interacting modules, each representing a dynamical system. Applying the laws of biochemical kinetics, the dynamic behavior of the subsystems can be characterized by ordinary differential equations (ODEs). However, writing down these equations is not sufficient for modeling the system unless the model parameters are all known. Appropriate parameter inference is therefore imperative for successful ODE modeling and will be discussed in the ODE part of the seminar.
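A toy version of one such module makes the setting concrete: a protein X activating gene Y, modeled with Hill-type production and linear degradation and integrated with a simple Euler scheme. All kinetic parameters below are invented for illustration; in practice they are exactly the quantities that would have to be inferred from data.

```python
# One gene-network module: dy/dt = k_prod * x^n / (K^n + x^n) - k_deg * y,
# i.e. Hill-type activation of Y by a (here constant) activator X,
# balanced by first-order degradation of Y.

def simulate_module(k_prod=2.0, K=1.0, n=2, k_deg=0.5,
                    x=2.0, y0=0.0, dt=0.01, t_end=20.0):
    """Euler integration of the module; returns the trajectory of y."""
    hill = k_prod * x ** n / (K ** n + x ** n)  # production rate, x constant
    y, t, trace = y0, 0.0, []
    while t < t_end:
        y += dt * (hill - k_deg * y)  # explicit Euler step
        t += dt
        trace.append(y)
    return trace

trace = simulate_module()
# With a constant activator, y relaxes towards the steady state
# hill / k_deg (here 1.6 / 0.5 = 3.2).
```

Coupling several such modules, letting x itself be the output of another module, yields the interacting-module structure described above, and parameter inference then amounts to choosing the kinetic constants so that the simulated trajectories match measured time courses.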
Wednesday, 29-11-2006, 3.30pm, Bioinformatics Seminar Room

Architectural insights into the multitasking genome of lower eukaryotes

Dr Hubert Renauld, Pathogens Sequencing Unit, The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK

Despite the many genomic sequences and gene annotations available, how genetic information is structured in any genome remains poorly understood. The subtelomeres of eukaryotic parasites provide a unique system to tackle this question, as they are crucial effectors of adjacent 'contingency genes' (including virulence genes) in these organisms, and as subtelomeric-sequence assemblies are available for most of them. Yet their intrinsic features have so far undermined a comprehensive analysis. An intuitive approach will be outlined, based on the integration of dot-plot analyses, publicly available software and locally developed Perl scripts. Its application demonstrates how in-depth large-scale annotation of 'junk DNA' may help in understanding the mechanisms underlying plasticity and mosaicism at the chromosomal ends, as well as the mechanisms of allelic exclusion and antigenic variation observed among virulence-related genes, and may foster convergence between experimental, evolutionary and in silico studies. Lastly, to begin to uncover the implicit information of a genome, an empirical, linguistically informed approach was undertaken to describe and compare subtelomere architecture in lower eukaryotes. Although the design features of these domains are admittedly diverse, variation is restrained. This suggests the existence of 'universals of chromosome architecture' similar to the universals of language proposed by Chomsky, and points to the possibility of using organism-neutral syntax and vocabularies to uncover principles of genome architecture.
Thursday, 19-10-2006, 4pm, Bioinformatics Seminar Room

Longitudinal and survival analysis for gene expression profiles: a review

Dr Gabriela Koskova, Bioinformatics, BOKU University, Vienna.

Time course gene expression analysis is essential to understand biological phenomena that evolve over time. A typical time course microarray experiment provides measurements on a sample of interest (e.g. a specific tissue) which are ordered with respect to time. The branch of statistics that allows such ordering effects to be taken into account is called longitudinal data analysis. Longitudinal models have thus recently attracted considerable attention in the bioinformatics community. This seminar will explain the basic concepts of longitudinal models and discuss their application to inferring differentially expressed genes where a state transition occurs over time. If time permits, we will also discuss how longitudinal modeling can be combined with survival analysis, in particular with a Cox survival model. Used together, this allows modeling of relations between gene expression collected over time and a time-to-event interval, such as time to death or recovery.
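The simplest instance of the longitudinal idea is to fit a trend over the ordered time points of each gene and ask whether that trend differs from zero. The least-squares slope fit below, with invented expression series, is only a caricature of the mixed-effects and functional models covered in the seminar, but it shows why the time ordering matters: permuting the time points would destroy the signal.

```python
# Fit an ordinary least-squares time trend to each gene's expression
# series; a clearly non-zero slope indicates expression changing over
# time. Data are invented log-expression values at five time points.

def slope(times, values):
    """OLS slope of values regressed on times."""
    n = len(times)
    mt, mv = sum(times) / n, sum(values) / n
    num = sum((t - mt) * (v - mv) for t, v in zip(times, values))
    den = sum((t - mt) ** 2 for t in times)
    return num / den

times = [0, 2, 4, 6, 8]                  # hours after treatment
flat_gene = [5.1, 4.9, 5.0, 5.2, 4.8]    # no trend over time
rising_gene = [2.0, 2.9, 4.1, 5.0, 6.1]  # steady upward trend
```

Joint longitudinal-survival models go one step further and let such per-subject expression trajectories enter a Cox model as time-varying covariates for the time-to-event outcome.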

Reading List

Storey, J.D., Xiao, W., Leek, J.T., Tompkins, R.G. and Davis, R.W.: Significance analysis of time course microarray experiments. Proc. National Academy of Sciences, 102(36), 12837--12842, 2005.
Hong, Li: Functional empirical Bayes methods for identifying genes with different time-course expression profiles, Center for Bioinformatics & Molecular Biostatistics University of California, San Francisco, 2004.
Rajicic, N., Finkelstein, D.M., Schoenfeld, D.A.: Survival analysis of longitudinal microarrays, submitted to Bioinformatics, May 2006.
Ibrahim, J.G., Chen, M., Sinha, D.: Bayesian methods for joint modeling of longitudinal and survival data with applications to cancer vaccine trials, Statistica Sinica, 14, 863--883, 2004.

Thursday, 14-09-2006, 4pm, Bioinformatics Seminar Room