Supplement for Chaix et al. (2008)

These pages contain supplementary material for the manuscript.

Modelling the Evolution of Primate Gene Expression

R. Chaix, M. Somel, D. P. Kreil, P. Khaitovich, G. A. Lunter

Supplementary Results from Phylogenetic Modelling and Figures referenced in the main text

We provide a short supplementary document with additional result tables from phylogenetic modelling and a number of figures referenced in the main manuscript text.

The remainder of this web page provides additional material on the low-level analysis and transformation of the microarray data. This material does not lend itself well to printing, and some of the files are quite large.

Supplementary Results of Microarray Data Analysis

Selection and Mapping of Probes

On the Affymetrix HG-U133 Plus 2.0 arrays employed, 25-mer oligonucleotide probes had been designed to hybridize to specific transcript targets. Typically, 11 probes were intended per target transcript to allow both a reduction of noise and a removal of probe-sequence-specific effects. A subset of probes was selected by masking those probes that would have been affected by sequence variation between the examined species (see manuscript). Typically 9 probes per target remained in the consensus set, although some targets were represented by many more probes, as shown in these plots of probe number distributions before and after the masking procedure. Considerable improvements in the human gene annotation, however, affect differential expression estimates of 30–40% of all targets and have made a reassignment of probes to target genes necessary (Dai et al., NAR 33, e175; 2005). Our analysis used Release 9 of probe assignments to EnsEMBL gene models (release 42, Dec 2006). With these updated assignments, the typical number of probes per target drops from 11 to 6 in the consensus set. Note that the distribution of probe numbers for the revised probe annotation reflects the required merge of alternate gene models, which gives rise to a multi-modal distribution in the original unmasked probe set.
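
The effect of masking on probe numbers can be illustrated with a minimal Python sketch. The probe-to-target mapping, the probe and gene names, and the function names below are hypothetical and are not part of the supplementary files; the sketch only counts how many probes remain per target once masked probes are excluded, and then tabulates the distribution of those counts.

```python
from collections import Counter


def probes_per_target(probe_to_target, masked_probes=frozenset()):
    """Count the probes assigned to each target, skipping masked probes.

    probe_to_target: dict mapping a probe ID to its target ID (hypothetical layout).
    masked_probes:   collection of probe IDs to exclude.
    """
    return Counter(
        target
        for probe, target in probe_to_target.items()
        if probe not in masked_probes
    )


def count_distribution(counts):
    """Tabulate how many targets are represented by each number of probes."""
    return Counter(counts.values())


if __name__ == "__main__":
    # Toy mapping with invented IDs, for illustration only.
    mapping = {"p1": "g1", "p2": "g1", "p3": "g1", "p4": "g2"}
    print(count_distribution(probes_per_target(mapping)))
    print(count_distribution(probes_per_target(mapping, masked_probes={"p2"})))
```

Targets that lose all their probes to masking simply drop out of the counts in this sketch; whether such targets should be reported separately is a design choice left open here.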

Diagnostic plots, Q/A, Normalization

We provide a comprehensive set of diagnostic plots for a thorough characterization of the data with respect to systematic trends, such as by species, sex, or age, and for an assessment of the appropriateness of different normalization methods.

In particular, in each directory, we provide the following diagram files:

File *QQnScatters shows pairwise Quantile-Quantile plots comparing signal distributions in the panels below the diagonal and simple scatter plots of signals in the panels above the diagonal, for all pairs of chips, with the diagonal panel displaying the chip IDs (also see data download section below). Note that this file can be large (9–24 MB) and may display very slowly in your viewer because of the large number of detailed features.
File *MAs shows traditional M(A) plots depicting differential signal variation as a function of average signal, together with some summary statistics, for all pairs of chips, with the diagonal panel displaying the chip IDs. Coloured curves track a Loess fit of M(A). Note that this file can be large (6–17 MB) and may display very slowly in your viewer because of the large number of detailed features.
The files *QQnScatters* show pairwise Quantile-Quantile plots in the panels below the diagonal and scatter plots for pairs of chips in the panels above the diagonal, with one file for each species, comparing data for multiple individuals, the species, age, and sex of which are shown in the diagonal panel (where available).
The files *QQnMA* show pairwise Quantile-Quantile plots in the panels below the diagonal and M(A) in the panels above the diagonal, with one file for each species, comparing data for multiple individuals, the species, age, and sex of which are shown in the diagonal panel (where available).
The *QQnMA-rmean* file compares robust (IWLS) averages by species, as indicated by the diagonal panel. This file most clearly shows the systematic deviation of the Orang Utan signals.
The files PLM.resids and PLM.weights show the residuals and weights of the robust (IWLS) multi-chip probe level model fits as a function of chip location. These are good indicators of chip and hybridization quality. These files can be large (11–22 MB).
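
For readers unfamiliar with the quantities plotted, the following Python sketch shows how the points of an M(A) plot and of a Quantile-Quantile plot can be derived for one pair of chips. It assumes that the signals are already on a log-like scale (as produced by vsn) and that both chips report the same probes in the same order; the function names are illustrative and do not refer to the code used to produce the plot files.

```python
def ma_values(chip_x, chip_y):
    """Return (A, M) for two equal-length lists of log-scale signals.

    A is the average signal of the two chips for each probe,
    M is the difference in signal between the two chips.
    """
    if len(chip_x) != len(chip_y):
        raise ValueError("both chips must report the same probes")
    a = [(x + y) / 2.0 for x, y in zip(chip_x, chip_y)]
    m = [x - y for x, y in zip(chip_x, chip_y)]
    return a, m


def qq_pairs(chip_x, chip_y):
    """Return the points of a Quantile-Quantile plot for two equal-length chips.

    Each chip is sorted independently, so the k-th point pairs the k-th
    smallest signal of one chip with the k-th smallest of the other.
    """
    if len(chip_x) != len(chip_y):
        raise ValueError("this sketch only handles equal-length chips")
    return list(zip(sorted(chip_x), sorted(chip_y)))


if __name__ == "__main__":
    # Invented signal values, for illustration only.
    a, m = ma_values([8.0, 10.0], [6.0, 10.0])
    print(a, m)
    print(qq_pairs([3.0, 1.0], [2.0, 5.0]))
```

If two chips have identical signal distributions, the Quantile-Quantile points fall on the diagonal even when individual probes differ, whereas M(A) plots expose probe-wise differences and their dependence on signal intensity.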

Age is shown in years from birth; sex is coded as 0=male, 1=female.

We make these files available for the two examined normalization methods and the four different species subsets studied:

Norm	HCRO	HCR	HCO	HC
nonorm2	plots	plots	plots	plots
QQ	plots	plots	plots	plots

Legend: H=Human, C=Chimpanzee, R=Rhesus, O=Orang Utan. For all normalization variants, data were first normalized by a chip-specific shift and scale (using the vsn algorithm) and were then subjected either to no further normalization (nonorm2) or to probe-level quantile-quantile normalization (QQ).
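
The idea behind probe-level quantile normalization can be sketched in a few lines of Python. This is the generic textbook procedure, shown under the assumption that all chips report the same number of probes; it is not claimed to reproduce the exact implementation or tie handling used for the QQ variant above, and the function name is illustrative.

```python
def quantile_normalize(chips):
    """Force all chips to share one signal distribution.

    chips: list of equal-length lists of signals, one list per chip.
    Returns new lists in which the k-th smallest value of every chip is
    replaced by the mean of the k-th smallest values across all chips.
    Ties within a chip are broken by probe order (an arbitrary choice).
    """
    n = len(chips[0])
    if any(len(chip) != n for chip in chips):
        raise ValueError("all chips must have the same number of probes")

    sorted_chips = [sorted(chip) for chip in chips]
    # Reference distribution: average of the sorted signals across chips.
    reference = [
        sum(s[k] for s in sorted_chips) / len(chips) for k in range(n)
    ]

    normalized = []
    for chip in chips:
        # Probe indices ordered from lowest to highest signal on this chip.
        order = sorted(range(n), key=lambda i: chip[i])
        out = [0.0] * n
        for rank, probe_index in enumerate(order):
            out[probe_index] = reference[rank]
        normalized.append(out)
    return normalized


if __name__ == "__main__":
    # Invented signal values, for illustration only.
    print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))
```

After this step, every chip has exactly the same set of signal values, and only the assignment of values to probes differs between chips.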

Normalized data

We provide both normalized probe level data for perfect match (PM) probes and robust (IWLS fitted) probe level model (PLM) gene expression summaries for the two examined normalization methods and the four different species subsets studied:

Norm	HCRO	HCR	HCO	HC
nonorm2	PLM | PM	PLM | PM	PLM | PM	PLM | PM
QQ	PLM | PM	PLM | PM	PLM | PM	PLM | PM

Legend: H=Human, C=Chimpanzee, R=Rhesus, O=Orang Utan; Normalization methods are described above.

These files have been compressed with bzip2. PLM files are about 2–3 MB in size, PM files about 7–12 MB.

The first column contains EnsEMBL gene IDs, and each further column corresponds to a target in the same order as shown in the Target Description File.
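
A compressed data file of this layout could be read along the following lines in Python. The sketch assumes a tab-delimited text file with one header row and numeric values in all columns after the first; neither the delimiter nor the presence of a header is stated above, so both should be checked against the actual files. The function name and the file name in the usage note are hypothetical.

```python
import bz2
import csv


def read_expression_table(path, delimiter="\t"):
    """Read a bzip2-compressed table whose first column holds gene IDs.

    Returns (column_names, values), where column_names are the header
    entries after the first column and values maps each gene ID to a
    list of floats, one per remaining column.
    Assumes a header row and a tab delimiter (both are assumptions).
    """
    with bz2.open(path, mode="rt", newline="") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        header = next(reader)
        values = {}
        for row in reader:
            if not row:
                continue  # skip blank lines
            values[row[0]] = [float(v) for v in row[1:]]
    return header[1:], values
```

For example, `columns, table = read_expression_table("PLM.txt.bz2")` would return the column labels and a dictionary keyed by EnsEMBL gene ID, where `PLM.txt.bz2` stands in for whichever file was downloaded.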

Raw data

The CEL files for the 22 hybridizations described and analysed in the paper are provided here for download [ 155.3MB zip ].
