
Here, we show the user interface of SAFT, the alignment tool described in Section 2.3, beginning on page 47. Other preprocessing scripts are part of SAFT; their user interfaces are similar and are not shown here.

User Interface

Usage: find_optimal_param_set.py [options]

Options:
  -h, --help            show this help message and exit

  REQUIRED:
    -b FILE, --best_score=FILE
                        file to store the best scoring parameters
    -m FILE, --matrix=FILE
                        file to store the full performance matrix
    -f FILE, --features=FILE
                        alignment intron features
    -i FILE, --annotation_introns=FILE
                        annotation intron list

  OPTIONAL:
    -E STRINGLIST, --exclude_introns=STRINGLIST
                        list of comma separated intron files to exclude from
                        submitted features
    -I INT, --max_intron_len=INT
                        maximal intron length [10000000]
    -s, --ignore_strand
                        ignore strand information present in annotation
    -X INT, --max_feat_mismatches=INT
                        max number of mismatches for feat generation [80] (do
                        only change, if you are absolutely sure!)

A.4 MMR

In this appendix, we provide the user interface of the multi-mapper resolution tool (MMR) that is described in Section 2.4, beginning on page 52. The lower part of the options is relevant only when MMR is used in conjunction with MiTie, as described in Section 2.4.3.

MMR Output Screen

Usage: ./mmr -o OUTFILE [options] IN_BAM

Available Options:

Input handling and paralellization:
    -P --parse-complete              parse complete file into memory [off]
    -t --threads                     number of threads to use (must be > 2) [1]
    -S --strand-specific             alignments are strand specific [off]
    -C --init-secondary              choose initial alignment also from secondary lines (flag 256) [off]

Input file filtering:
    -f --pre-filter-off              switch off pre filter for alignments that have F more edit ops than the best [on]
    -F --filter-dist [INT]           filter distance F for pre-filter [1]
    -V --use-variants                use variant alignments for filtering (different edit op count,
                                     requires XG and XM Tag in alignment files) [off]
    -L --max-list-length [INT]       max length of alignment list per read (after filtering) [1000]

Paired alignment handling:
    -p --pair-usage                  pre use pair information in the reads [off]
    -i --max-fragment-size           upper limit of GENOMIC fragment length [1 000 000]
    -A --max-pair-list-length [INT]  max no of valid pairs before not using pair modus [10000]

Output handling:
    -b --best-only                   print only best alignment [off]

Options for using the variance optimization:
    -w --windowsize [INT]            size of coverage window around read [20]
    -I --iterations [INT]            number of iterations to smooth the coverage [5]

Options for using the MiTie objective for smoothing:
    -m --mitie-objective             use objective from MiTie instead of local variance [off]
    -s --segmentfile                 MiTie segment file required for MiTie optimization []
    -l --lossfile                    MiTie loss parameter file required for MiTie optimization []
    -r --read-len [INT]              average length of the reads [75]
    -M --mitie-variance              use variance smoothing for regions with no MiTie prediction [off]
    -z --zero-expect-unpred          initializes all covered but not predicted positions with expectation 0.0 [off]

General:
    -v --verbose                     switch on verbose output [off]
    -h --help                        print usage info

A.5 Alternative Splicing Event Detection and Quantification

In this appendix, we summarize additional information relevant for the description of SplAdder, which we discussed in Section 2.5, beginning on page 61.

SplAdder User Interface (Matlab/Octave version)

Usage: SplAdder [-OPTION VALUE]
Options (default values in [...]):

MANDATORY:
    -b FILE1,FILE2,...     alignment files in BAM format (comma separated list)
    -o DIR                 output directory
    -a FILE                annotation file name (annotation in *.mat format)

OPTIONAL:
    -l FILE                log file name [stdout]
    -u FILE                file with user settings [-]
    -F FILE                use existing SplAdder output file as input (advanced) [-]
    -c INT                 confidence level (0 lowest to 3 highest) [3]
    -I INT                 number of iterations to insert new introns into the graph [5]
    -M <STRAT>             merge strategy, where <STRAT> is one on:
                           merge_bams, merge_graphs, merge_all [merge_graphs]
    -n INT                 read length (used for automatic conf. level settings) [36]
    -R R1,R2,...           replicate structure of files (same number as
                           alignment files) [all R1 - no replicated]
    -L STRING              label for current experiment [-]
    -S STRING              reference strain [-]
    -C y|n                 truncation detection mode [n]
    -U y|n                 count intron coverage [n]
    -P y|n                 only use primary alignments from provided files [n]
    -d y|n                 use debug mode [n]
    -p y|n                 use rproc [n]
    -O y|n                 annotation is in half-open coordinates
    -V y|n                 validate splice graph [n]
    -v y|n                 use verbose output mode [n]
    -A y|n                 curate alt prime events [y]
    -x y|n                 input alignments share the same genome [y]
    -i y|n                 insert intron retentions [y]
    -e y|n                 insert cassette exons [y]
    -E y|n                 insert new intron edges [y]
    -r y|n                 remove short exons [n]
    -s y|n                 re-infer splice graph [n]
    -T y|n                 extract alternative splicing events [y]
    -X y|n                 alignment files are variation aware (XM and XG tags present) [n]
    -t STRING,STRING,...   list of alternative splicing events to extract

Confidence levels for graph augmentation

SplAdder has several confidence levels the user can choose from, ranging from 0 (lowest confidence) to 3 (highest confidence). The levels adjust filter parameters to 1) select high confidence alignments and 2) set the criteria for graph augmentation. The parameter r is the length of the reads in the RNA-Seq sample.

Settings for accepted introns

Criterion             Confidence level
                      0                   1                   2                   3
min segment length    ⌈0.1 · r⌉           ⌈0.15 · r⌉          ⌈0.2 · r⌉           ⌈0.25 · r⌉
max mismatches        max{2, ⌊0.03 · r⌋}  max{1, ⌊0.02 · r⌋}  max{1, ⌊0.01 · r⌋}  0
max intron length     20,000              20,000              20,000              20,000
min junction count    1                   2                   3                   6
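
To make the dependence on the read length r concrete, the settings for accepted introns can be written as a small lookup. The following Python sketch is only an illustration of the table above and is not taken from the SplAdder source; the names IntronFilter and intron_filter are hypothetical. It reads the min segment length as the ceiling and the max mismatches as the floor of the stated fraction of r, and uses integer arithmetic to avoid floating-point rounding.

```python
from typing import NamedTuple


class IntronFilter(NamedTuple):
    """Thresholds for accepted introns (hypothetical container)."""
    min_segment_length: int
    max_mismatches: int
    max_intron_length: int
    min_junction_count: int


# Table columns, indexed by confidence level 0..3.
_SEGMENT_PERCENT = (10, 15, 20, 25)     # min segment length as percent of r
_MISMATCH_PERCENT = (3, 2, 1, 0)        # max mismatches as percent of r
_MISMATCH_LOWER_BOUND = (2, 1, 1, 0)    # first argument of max{., .}
_MIN_JUNCTION_COUNT = (1, 2, 3, 6)
_MAX_INTRON_LENGTH = 20_000             # identical for all levels


def intron_filter(level: int, read_length: int) -> IntronFilter:
    """Return the thresholds for one confidence level and read length r."""
    if level not in (0, 1, 2, 3):
        raise ValueError("confidence level must be 0, 1, 2 or 3")
    # Ceiling of percent * r / 100 via negated floor division.
    min_segment = -(-_SEGMENT_PERCENT[level] * read_length // 100)
    # Floor of percent * r / 100, bounded from below; level 3 yields 0.
    max_mismatches = max(_MISMATCH_LOWER_BOUND[level],
                         _MISMATCH_PERCENT[level] * read_length // 100)
    return IntronFilter(min_segment, max_mismatches,
                        _MAX_INTRON_LENGTH, _MIN_JUNCTION_COUNT[level])
```

For the default read length of 36 listed in the user interface, intron_filter(0, 36) gives a minimum segment length of 4 and at most 2 mismatches, whereas intron_filter(3, 36) requires segments of length 9, no mismatches, and at least 6 junction-confirming alignments.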

Settings for accepted cassette exons

Criterion Value

min exon coverage 5

min fraction of covered positions in exon 0.9

min relative coverage difference to flanking exons 0.05

Settings for accepted intron retentions

Criterion                                         Confidence level
                                                  0      1      2      3
min intron coverage                               1      2      5      10
min fraction of covered positions in intron       0.75   0.75   0.9    0.9
min intron coverage relative to flanking exons    0.1    0.1    0.2    0.2
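
The three intron retention criteria combine into a single accept/reject decision. The Python sketch below shows one possible reading and is not the SplAdder implementation. It rests on assumptions that the table does not state: coverage is summarized as the mean over positions, a position counts as covered if its coverage is greater than zero, and the relative criterion compares the mean intron coverage to the average of the two flanking exon means. The function name accept_intron_retention is hypothetical.

```python
from statistics import mean
from typing import Sequence

# Per confidence level: (min intron coverage,
#                        min fraction of covered positions,
#                        min intron coverage relative to flanking exons)
RETENTION_THRESHOLDS = {
    0: (1, 0.75, 0.1),
    1: (2, 0.75, 0.1),
    2: (5, 0.9, 0.2),
    3: (10, 0.9, 0.2),
}


def accept_intron_retention(intron_cov: Sequence[int],
                            left_exon_cov: Sequence[int],
                            right_exon_cov: Sequence[int],
                            level: int = 3) -> bool:
    """Check the three criteria on per-position coverage values."""
    min_cov, min_fraction, min_relative = RETENTION_THRESHOLDS[level]
    if not intron_cov or not left_exon_cov or not right_exon_cov:
        return False  # nothing to evaluate
    # ASSUMPTION: "intron coverage" is the mean over all intron positions.
    intron_mean = mean(intron_cov)
    # ASSUMPTION: a position is covered if at least one read overlaps it.
    covered_fraction = sum(1 for c in intron_cov if c > 0) / len(intron_cov)
    # ASSUMPTION: flanking coverage is the average of the two exon means.
    flank_mean = (mean(left_exon_cov) + mean(right_exon_cov)) / 2
    return (intron_mean >= min_cov
            and covered_fraction >= min_fraction
            and intron_mean >= min_relative * flank_mean)
```

Under these assumptions, an intron with uniform coverage 6 between exons with coverage 20 is accepted at confidence level 2 but rejected at level 3, where a mean intron coverage of at least 10 is required.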

Event validation criteria

Each event type has its own validation criteria. The tables below list the criteria for each type of event.

Exon Skips

Criterion Value

min relative coverage difference to flanking exons 0.05

min intron count confirming the skip 3

min intron count confirming the inclusion 3

Intron Retentions

Criterion Value

min intron coverage 3

min intron coverage relative to flanking exons 0.05

min fraction of covered positions in the intron 0.75

min intron count confirming the intron 3

Alternative Splice Site Choice

Criterion Value

min intron count confirming the intron 3