MapAl is a tool for RNA-Seq expression profiling that
builds on the established programs Bowtie
and Cufflinks.
Incorporating ‘gene models’ already at the alignment stage almost doubles the number of
transcripts that can be measured reliably.
MapAl usage:
Synopsis: perl MapAl.pl -i input_file -g GTF_file [-t temp_path] [-s mem_size] [-S strandedness] [-P pair_orientation] [-o output_file]
- -i input_file – file with reads aligned to the transcript sequences in SAM format – required
- -g GTF_file – ‘gene models’ file (in GTF format) – required
- -S strandedness – allowed
orientation of the aligned read (for single-end reads, or the first read
of the pair for paired-end and mate-pair reads) on the transcript
sequence; 0 - positive strand of the transcript, 1 - both strands
of the transcript (default), 2 - reverse strand of the
transcript – optional
- -P pair_orientation – if
paired-end or mate-pair reads are used, then this parameter
specifies the allowed orientation of the second read of the pair
relative to the first read of the pair; 0 - opposite strand, 1 -
both strands (default), 2 - the same strand – optional
- -o output_file – output file with
reads aligned to the genomic locations – optional
(default: result.sam)
- -t temp_path, -s mem_size –
MapAl internally uses the UNIX sort command, so it can sometimes
be useful to provide alternative values for the allowed RAM usage
and the temporary file storage: -t is the path
to a directory where large temporary files can be
stored, -s is the amount of RAM
that the tool may allocate – optional
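The options above can be assembled into a command line programmatically. The following is a minimal Python sketch, not part of MapAl itself: the helper name build_mapal_command and the file names in the usage note are hypothetical, and the value passed to -s is handed over unchanged because its expected format is not documented here.

```python
from typing import List, Optional


def build_mapal_command(
    input_sam: str,
    gtf_file: str,
    output_file: str = "result.sam",
    strandedness: int = 1,
    pair_orientation: int = 1,
    temp_path: Optional[str] = None,
    mem_size: Optional[str] = None,
    script: str = "MapAl.pl",
) -> List[str]:
    """Build an argument list for MapAl using only the flags in the synopsis.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    # -S and -P accept the values 0, 1 (default) and 2 according to the option list.
    if strandedness not in (0, 1, 2):
        raise ValueError("strandedness must be 0, 1 or 2")
    if pair_orientation not in (0, 1, 2):
        raise ValueError("pair_orientation must be 0, 1 or 2")

    cmd = [
        "perl", script,
        "-i", input_sam,
        "-g", gtf_file,
        "-S", str(strandedness),
        "-P", str(pair_orientation),
        "-o", output_file,
    ]
    # -t and -s are only added when explicitly requested.
    if temp_path is not None:
        cmd += ["-t", temp_path]
    if mem_size is not None:
        # Assumption: the value is passed through as given.
        cmd += ["-s", mem_size]
    return cmd
```

For example, build_mapal_command("reads.sam", "genes.gtf", temp_path="/tmp/mapal") returns a list that can be passed to subprocess.run(cmd, check=True) once MapAl.pl and the input files are in place.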
When the -S and -P parameters are specified, MapAl will filter out
all alignments that do not fulfill the specified strand orientation
requirements. The orientation of an alignment is assessed by examining
the FLAG field (SAM format) of the input file.
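To illustrate how strand orientation can be read from the SAM FLAG field, here is a minimal Python sketch. It is not MapAl's actual code: the function name passes_orientation is hypothetical, and the mapping of the -S and -P values onto FLAG bits is an assumption based on the option descriptions above. The bit values themselves (0x1 paired, 0x10 read on reverse strand, 0x20 mate on reverse strand, 0x80 second in pair) come from the SAM format. Unmapped reads or mates are not handled.

```python
# FLAG bits defined by the SAM format.
PAIRED = 0x1          # read is part of a pair
READ_REVERSE = 0x10   # this read aligns to the reverse strand
MATE_REVERSE = 0x20   # the mate aligns to the reverse strand
SECOND_IN_PAIR = 0x80 # this read is the second read of the pair


def passes_orientation(flag: int, strandedness: int = 1, pair_orientation: int = 1) -> bool:
    """Return True if an alignment satisfies the -S / -P style constraints.

    Assumed semantics (a sketch, not MapAl's implementation):
      strandedness     0 = first read on positive strand, 1 = any, 2 = reverse strand
      pair_orientation 0 = reads on opposite strands, 1 = any, 2 = same strand
    """
    paired = bool(flag & PAIRED)
    read_rev = bool(flag & READ_REVERSE)
    mate_rev = bool(flag & MATE_REVERSE)

    # -S refers to the single-end read or to the first read of the pair.
    # For a record describing the second read, the first read is its mate.
    if paired and (flag & SECOND_IN_PAIR):
        first_rev = mate_rev
    else:
        first_rev = read_rev

    if strandedness == 0 and first_rev:
        return False
    if strandedness == 2 and not first_rev:
        return False

    # -P only applies to paired reads.
    if paired:
        same_strand = read_rev == mate_rev
        if pair_orientation == 0 and same_strand:
            return False
        if pair_orientation == 2 and not same_strand:
            return False
    return True
```

With these assumptions, a single-end record with FLAG 16 (reverse strand) is rejected for strandedness 0 and accepted for strandedness 2, while with the default value 1 for both parameters nothing is filtered out.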
MapAl is available under the GPL.
You can download the stand-alone MapAl script.
All versions of MapAl are available here.
MapAl package:
The MapAl package contains the components of the MapAl
pipeline together with appropriate test data:
download the MapAl package (70MB).
The package contains:
- The MapAl Perl script
- A FASTA file with transcript sequences from H. sapiens Chromosome 1 (GRCh37_58)
- A GTF file with the ‘gene models’ of H. sapiens Chromosome 1 (GRCh37_58)
- CSFASTA and QUAL files with an artificially prepared set of about 1 million reads
- The MapAl pipeline shell script
TopHat package:
For comparison, we also provide a package in which the well-established
tool TopHat is used with
the corresponding test data:
download the TopHat package
(130MB).
The package contains:
- A FASTA file with the genomic sequence of H. sapiens Chromosome 1 (GRCh37_58)
- A GTF file with the ‘gene models’ of H. sapiens Chromosome 1 (GRCh37_58)
- CSFASTA and QUAL files with an artificially prepared set of about 1 million reads
- The TopHat pipeline shell script
The
TopHat pipeline produces two output sets:
- In the cuff_res_known_dir directory you will find the
expression estimates for the known transcripts.
- In the cuff_res_denovo_dir directory you will find the
expression estimates for transcripts from de novo
discovered genes.
The second of these sets can be combined with results from the
MapAl pipeline for comprehensive profiling with increased precision.
For testing purposes, Bowtie v0.12.7, TopHat v1.1.4, and
Cufflinks v0.9.1
were used. Newer versions should work as drop-in replacements as
long as they are backwards-compatible. Note that these tools may
have their own dependencies that need to be installed.
To adapt the pipeline scripts to other types of data sets, the
Bowtie, TopHat, and Cufflinks execution
options may need to be changed accordingly.
The
SEQanswers forum is a good place
to look for advice after having read the available documentation.
If you have any further questions for the MapAl authors, please
contact us.
If you would like to know more about our group please visit our homepage.