These pages contain supplementary material for the manuscript.
R. Chaix,
M. Somel,
D. P. Kreil,
P. Khaitovich,
G. A. Lunter
We provide a short supplementary document with additional result tables from phylogenetic modelling and a number of figures referenced in the main manuscript text.
The remainder of this web page provides additional material on the low-level analysis and transformation of the microarray data. This material does not lend itself well to printing, and some of the files are quite large.
On the employed Affymetrix HG-U133 Plus 2.0 arrays, 25-mer oligonucleotide probes had been designed to hybridize to specific transcript targets. Typically, 11 probes were intended per target transcript to allow both a reduction of noise and a removal of probe-sequence-specific effects. A subset of probes was selected by masking those probes that would have been affected by sequence variation between the examined species (see manuscript). Typically, 9 probes per target remained in the consensus set, although some targets were represented by many more probes, as shown in these plots of probe number distributions before and after the masking procedure. Considerable improvements in the human gene annotation, however, affect differential expression estimates of 30–40% of all targets and have made a reassignment of probes to target genes necessary (Dai et al., NAR 33, e175; 2005). Our analysis used Release 9 of the probe assignments to EnsEMBL gene models (release 42, Dec 2006). With these updated assignments, the typical number of probes per target drops from 11 to 6 in the consensus set. Note that the distribution of probe numbers for the revised probe annotation reflects the required merge of alternate gene models, which gives rise to a multi-modal distribution in the original unmasked probe set.
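The probe number distributions mentioned above can be illustrated with a minimal sketch. It assumes a hypothetical list of (probe ID, target ID) pairs and a hypothetical set of masked probe IDs; the names `probes_per_target` and `count_distribution` and the toy data are invented for illustration and do not reflect the format of the files on this page.

```python
from collections import Counter


def probes_per_target(assignments, masked=frozenset()):
    """Count the probes assigned to each target, skipping masked probes.

    assignments: iterable of (probe_id, target_id) pairs (hypothetical layout).
    masked: set of probe IDs excluded by the masking procedure.
    Targets whose probes are all masked do not appear in the result.
    """
    counts = Counter()
    for probe_id, target_id in assignments:
        if probe_id not in masked:
            counts[target_id] += 1
    return counts


def count_distribution(counts):
    """Map 'number of probes' to 'number of targets with that many probes'."""
    return Counter(counts.values())


# Toy data: two targets with made-up probe IDs.
assignments = [
    ("p1", "geneA"), ("p2", "geneA"), ("p3", "geneA"),
    ("p4", "geneB"), ("p5", "geneB"),
]
masked = {"p2", "p5"}

before = probes_per_target(assignments)
after = probes_per_target(assignments, masked)
print("before masking:", dict(before), dict(count_distribution(before)))
print("after masking: ", dict(after), dict(count_distribution(after)))
```

Plotting `count_distribution` before and after masking would give histograms of the same kind as the linked probe number plots.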
We provide a comprehensive set of diagnostic plots to characterize the data thoroughly for systematic trends, such as by species, sex, or age, and to assess the appropriateness of different normalization methods.
In particular, in each directory, we provide the following diagram files:
Age is shown in years from birth; sex is coded as 0=male, 1=female.
We make these files available for the three examined normalization methods and the four different species subsets studied:
Norm | HCRO | HCR | HCO | HC |
---|---|---|---|---|
nonorm2 | plots | plots | plots | plots |
plots | plots | plots | plots |
Legend: H=Human, C=Chimpanzee, R=Rhesus, O=Orang Utan. For all normalization variants, data were first normalized by a chip-specific shift and scale (using the vsn algorithm) and were then subjected either to no further normalization (nonorm2) or to probe-level quantile-quantile normalization (QQ).
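As an illustration of what probe-level quantile normalization does, here is a minimal generic sketch. It is a textbook version of the technique, not necessarily the implementation used for the data on this page, and it does not reproduce the preceding vsn shift-and-scale step. The function name `quantile_normalize` and the toy intensities are assumptions made for the example.

```python
def quantile_normalize(columns):
    """Give every column (chip) the same empirical distribution.

    columns: list of equal-length lists, one list of probe intensities per chip.
    Ties are broken by position rather than averaged (a simplification).
    """
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all columns must have the same length")

    # Reference distribution: mean of the k-th smallest value across chips.
    sorted_cols = [sorted(col) for col in columns]
    reference = [
        sum(col[k] for col in sorted_cols) / len(columns) for k in range(n)
    ]

    normalized = []
    for col in columns:
        # Probe indices ordered by increasing intensity on this chip.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]  # replace value by the reference quantile
        normalized.append(out)
    return normalized


# Toy example: two chips with three probes each (made-up values).
chips = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
normalized = quantile_normalize(chips)
print(normalized)
```

After normalization every chip contains the same set of values; only the assignment of values to probes differs, following each chip's own ranking.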
We provide both normalized probe-level data for perfect match (PM) probes and robust (IWLS-fitted) probe-level model (PLM) gene expression summaries for the three examined normalization methods and the four species subsets studied:
Norm | HCRO | HCR | HCO | HC |
---|---|---|---|---|
nonorm2 | PLM / PM | PLM / PM | PLM / PM | PLM / PM |
PLM / PM | PLM / PM | PLM / PM | PLM / PM |
Legend: H=Human, C=Chimpanzee, R=Rhesus, O=Orang Utan; Normalization methods are described above.
These files have been compressed with bzip2. PLM files are about 2–3 MB in size; PM files are about 7–12 MB.
The first column contains EnsEMBL gene IDs, and each further column corresponds to a target in the same order as shown in the Target Description File.
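A minimal sketch of reading such a file, assuming it is tab-delimited text with numeric values after the gene ID column. The delimiter, the presence or absence of a header row, the function name `read_expression_table`, and the demo file contents are all assumptions; check them against the actual files before use.

```python
import bz2
import csv
import os
import tempfile


def read_expression_table(path, delimiter="\t", skip_header=False):
    """Read a bzip2-compressed table into {gene_id: [values, ...]}.

    Assumes delimited text with the gene ID in the first column and one
    numeric column per target; set skip_header=True if a header row exists.
    """
    table = {}
    with bz2.open(path, mode="rt", newline="") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        if skip_header:
            next(reader, None)
        for row in reader:
            if not row:
                continue  # ignore blank lines
            table[row[0]] = [float(value) for value in row[1:]]
    return table


# Self-contained demo on a tiny made-up file (IDs and values are placeholders).
demo_path = os.path.join(tempfile.mkdtemp(), "demo_plm.txt.bz2")
with bz2.open(demo_path, mode="wt") as handle:
    handle.write("ENSG_DEMO_1\t7.5\t8.25\n")
    handle.write("ENSG_DEMO_2\t6.0\t5.5\n")

table = read_expression_table(demo_path)
print(table["ENSG_DEMO_1"])
```

The position of each value in a row then corresponds to the order of targets in the Target Description File.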
The CEL files for the 22 hybridizations described and analysed in the paper are provided here for download [ 155.3MB zip ].
last updated 2008-01-18 by David Kreil