Formalin-Fixed Paraffin-Embedded (FFPE) archival tissue blocks are a rich resource for retrospective discovery studies in cancer. A great advantage of FFPE blocks is that tissue samples from different disease stages of the same patient are often available together with the clinical data, which allows us to build more precise models of the gene expression patterns associated with cancer progression. However, the fragmented and cross-linked nature of the mRNA/DNA in FFPE blocks makes it challenging to measure transcript levels reliably. I will present how we leverage next-generation sequencing to quantify transcripts in FFPE samples. We have sequenced the transcriptome of FFPE samples from non-progressive and progressed stages of bladder cancer. The subsequent analysis revealed gene expression patterns correlated with the time to progression, a first step towards building a clinically feasible diagnostic tool that predicts whether, and ultimately when, a patient will progress to the invasive stage of bladder cancer.
Commonly applied methods for analysing microarray data generally assume Gaussian noise, even though the normal distribution is sensitive to the outliers that frequently occur in such data. The computational efficiency of Gaussian-derived methods makes them preferable to robust or non-parametric approaches, especially when dealing with high-dimensional microarray data. In contrast to the Gaussian model, a robust model can accommodate heavy-tailed data. However, the effects of choosing a more complex robust model need to be investigated thoroughly.
Therefore, we propose a hierarchical Bayesian model that allows us to select the most appropriate noise model from a given finite set. Including the Gaussian model in the set of candidate distributions makes it possible to compare Gaussian and heavier-tailed noise models directly. Furthermore, we compared the inference results of the best-fitting noise model with those of the Gaussian model in order to assess differences in the biologically relevant classification of differential expression.
Our investigations showed that a heavy-tailed distribution provides the best fit for all investigated data sets. Comparing the heavy-tailed t model with the Gaussian noise model revealed that choosing the less robust model has a substantial impact on the differential expression results. Consequently, the choice of noise model also affects the conclusions drawn at higher levels of analysis that build on the differential expression assessment, such as the Gene Ontology analysis we considered.
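The idea of selecting a noise model from a finite set can be illustrated with a small, self-contained sketch. It is not the hierarchical model proposed above, only a toy stand-in built on several assumptions that are not from the original text: the candidate set contains just a Gaussian and a Student t model with a fixed 4 degrees of freedom, the location is fixed at the sample median, the scale gets a uniform prior over a small hypothetical grid, and the two models are equally probable a priori. The data are simulated, not microarray measurements.

```python
import math
import random
import statistics


def gaussian_logpdf(x, mu, sigma):
    """Log density of a Gaussian with location mu and scale sigma."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)


def student_t_logpdf(x, mu, sigma, nu):
    """Log density of a location-scale Student t with nu degrees of freedom."""
    z = (x - mu) / sigma
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(nu * math.pi) - math.log(sigma)
            - (nu + 1) / 2 * math.log1p(z * z / nu))


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_evidence(data, logpdf, scales):
    """Log marginal likelihood with a uniform prior over a grid of scales.

    Assumption: the location is fixed at the sample median to keep the
    sketch short; a full model would also integrate over the location.
    """
    mu = statistics.median(data)
    per_scale = [sum(logpdf(x, mu, s) for x in data) for s in scales]
    return log_sum_exp(per_scale) - math.log(len(scales))


def noise_model_posterior(data, models, scales):
    """Posterior probability of each noise model under equal prior weights."""
    log_ev = {name: log_evidence(data, f, scales) for name, f in models.items()}
    norm = log_sum_exp(list(log_ev.values()))
    return {name: math.exp(v - norm) for name, v in log_ev.items()}


# Hypothetical candidate set: Gaussian versus Student t with nu = 4.
models = {
    "gaussian": gaussian_logpdf,
    "student_t_nu4": lambda x, mu, s: student_t_logpdf(x, mu, s, 4.0),
}

# Simulated data: mostly standard-normal noise plus a few gross outliers.
random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(200)]
data += [8.0, -9.0, 10.0, -8.5, 9.5, -10.0]

scales = [0.25 * k for k in range(1, 17)]  # grid from 0.25 to 4.0
posterior = noise_model_posterior(data, models, scales)
print(posterior)
```

Because the Gaussian model is one of the candidates, its posterior probability can be read off next to that of the heavier-tailed alternative, which mirrors the direct comparison described above. On outlier-contaminated data such as this simulation, the t model receives the larger share of the posterior mass.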