1.4.3 Signal Processing Adjustable Filter Parameters

Some of the quality filters may be turned off or adjusted to control the stringency of the output.

This section provides instructions for customizing the filter parameters for signal processing.

Identify the Data Processing folder (‘D_’) of the sequencing Run whose reads are to be re-filtered or the Run folder (‘R_’) of a sequencing Run whose reads will be filtered using the customized filter parameters.

Generate a template file in the Run or data directory by typing the following command from within the fullProcessing folder:

gsRunProcessor --template=filterOnly > filterTemplate.xml

This will create a template file for shotgun library processing in the current directory. There are additional templates for Paired End and Amplicons experiments, which can be generated by using the template arguments “filterOnlyPairedEnd” and “filterOnlyAmplicons” in the command above. The XML output filename (to the right of the “>” symbol) can be any valid string ending with ‘.xml’.

Open the file with a text editor. An easy-to-use text editor called “nedit” is present on the Genome Sequencer FLX Instrument; on the GS Junior Attendant PC, a similar program called “gedit” is present. To use nedit (or gedit) to edit the file, type the following command:

nedit filterTemplate.xml

Within the template, scroll down to the <qualityFilter> and <baseCaller> sections (Figure 7 and Figure 8 below). All of the user-adjustable Quality Filter parameters that govern the filtering of high quality reads for the sequencing Run are adjusted in these sections. Both sections contain some adjustable parameters by default. Additional parameters can be added to the <qualityFilter> section. As changes are made, note that:

o Increasing a stringency setting will reduce the number of filtered high quality reads by eliminating the lowest quality reads from among the previously filtered high quality reads. This may increase overall accuracy of the filtered high quality reads.

o Decreasing a stringency setting will increase the number of passed filter high quality reads by allowing lower quality reads to pass through the quality filter. This may reduce overall accuracy of the filtered high quality reads.
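For sites that prefer to script the edit instead of using a text editor, the following is a minimal Python sketch of setting or adding parameters under <qualityFilter>. The sample XML, including the <pipeline> root element, is a hypothetical stand-in and not the real template layout; only the <qualityFilter> element name and the parameter names come from this manual.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily abbreviated stand-in for a generated template.
# The real file has more elements and may nest <qualityFilter> differently.
SAMPLE_TEMPLATE = """<pipeline>
  <qualityFilter>
    <vfTrimBackScaleFactor>1.6</vfTrimBackScaleFactor>
  </qualityFilter>
</pipeline>"""


def set_quality_filter_params(root, params):
    """Set (or add) child elements of the first <qualityFilter> found."""
    section = next(root.iter("qualityFilter"), None)
    if section is None:
        raise ValueError("no <qualityFilter> section in template")
    for name, value in params.items():
        child = section.find(name)
        if child is None:
            child = ET.SubElement(section, name)  # parameter not present yet
        child.text = str(value)
    return section


root = ET.fromstring(SAMPLE_TEMPLATE)
set_quality_filter_params(root, {"vfTrimBackScaleFactor": 2.0, "minLength": 50})
print(ET.tostring(root, encoding="unicode"))
```

Note that ElementTree discards XML comments when parsing with its default settings, so a template rewritten this way would lose the explanatory comments that the generated file contains.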

After the changes have been made, save and exit the text editor.

Change directory, to the parent Run folder directory:

cd ..

Launch the processing job specifying the modified filter parameter script:

To launch a filter only signal processing job use the following command structure:

runAnalysisFilter --pipe=/path_To_filterTemplate.xml /path_To_DATA_DIRECTORY

If the job is successfully launched, a new Data Processing directory with the name convention D_..filterTemplate will be created in the Run directory.

Software v. 2.5p1, August 2010
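The two commands used in this procedure can be assembled programmatically. The sketch below only builds the argument lists and does not run anything; the helper names `template_command` and `filter_job_command` are hypothetical, while the program names, the `--template` and `--pipe` options, and the three template names are taken from the text above.

```python
# Template names listed in this section of the manual.
TEMPLATE_NAMES = ("filterOnly", "filterOnlyPairedEnd", "filterOnlyAmplicons")


def template_command(template="filterOnly"):
    """Argument list that prints a filter template to standard output."""
    if template not in TEMPLATE_NAMES:
        raise ValueError(f"unknown template: {template}")
    return ["gsRunProcessor", f"--template={template}"]


def filter_job_command(template_path, data_directory):
    """Argument list that launches a filter-only signal processing job."""
    if not template_path.endswith(".xml"):
        raise ValueError("template file name should end with .xml")
    return ["runAnalysisFilter", f"--pipe={template_path}", data_directory]


print(" ".join(template_command("filterOnlyAmplicons")))
print(" ".join(filter_job_command("/path_To_filterTemplate.xml",
                                  "/path_To_DATA_DIRECTORY")))
```

The `>` redirection in the manual's command is a shell feature, so a caller passing the first list to `subprocess.run` would need to set `stdout` to an open file to capture the template.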

Figure 7: Signal processing filter adjustment XML code for Shotgun library processing (comments removed)

The quality filter parameters are modifiable. Their defaults and modification recommendation values are described below. For more information regarding Amplicon filter parameters, please refer to Application Note APN-10002: Amplicon Sequencing; Experimental Designs, Guidelines and Tips, available at http://454.com/my454.

doDotCheck
Checks for reads with too many negative flows.

Value                 Effect
True – Default All    Dots filtering enabled
False                 Dots filtering disabled

doMixedCheck
Checks for reads with too many positive flows.

Value                 Effect
True – Default All    Mixed filtering enabled
False                 Mixed filtering disabled

doClassifierCheck
Checks if reads start with a valid 4-base ‘key’ sequence (Key Pass).

Value                 Effect
True – Default All    Key Pass filtering enabled
False                 Key Pass filtering disabled

doShortSignalCheck
A signal intensity filter that trims reads that lose signal ‘crispness’ near the end.

Value                 Effect
True – Default All    Signal intensity filtering enabled
False                 Signal intensity filtering disabled

doPrimerTrimming
Trims the end of a read when it matches a 454 Sequencing System Adaptor sequence.

Value                 Effect
True – Default All    Primer trimming enabled
False                 Primer trimming disabled

doValleyFilterTrimBack
Enables or disables the TrimBack Valley Filter.

Value                                 Effect
True – Default Shotgun, Paired End    Read trimming enabled
False – Default Amplicon              Read trimming disabled


minConsensusSignal
The minimum average intensity of the positive key flows to be considered as a potential well. Acceptable values are any integer >= 1.

Value                                    Effect
20 - Default Shotgun, full processing
60 - Default otherwise
> Default                                Increasing this value can eliminate dim candidate wells with poor signal quality. This can yield more high-quality bases and reads from fewer candidates. However, if the value is too high, too many good candidates are discarded and throughput declines.

useBicubic
Images are upsampled during image processing, doubling their size in each dimension. This parameter impacts whether bicubic or bilinear upsampling is performed for this step.

Value                 Effect
True - Default All    Bicubic upsampling is performed
False                 Bilinear upsampling is performed

vfScanLimit
Controls how much of the read is scanned with the valley filter. Acceptable values are any integer > 0.

Value                      Effect
< Default                  Amplicon Runs will have more reads, but bases called from the un-filtered end of the flowgram may have higher error rates.
4,096 - Default Shotgun
700 - Default Amplicon
> Default                  Amplicon Runs will have fewer reads, but they will tend to have higher accuracy.

vfScanAllFlows
Modifies the meaning of the valley flow parameters.

Value                        Effect
True                         The ratio of vfBadFlowThreshold to vfLastFlowToTest is taken as a score threshold that is applied over the entire read.
False - Default Shotgun      For amplicon runs, a maximum of vfBadFlowThreshold valley flows can be seen within the first vfLastFlowToTest nucleotide flows.
tiOnly - Default Amplicon    Equivalent to true for both GS Junior and Genome Sequencer FLX Titanium runs.
flxOnly                      Equivalent to true for Genome Sequencer FLX Standard runs only.

vfLastFlowToTest
Represents the number of flows over which the reads are monitored for the presence of valley flows. The maximum value for this parameter is equal to the number of nucleotide flows in the Run (most stringent), and the minimum value is “0” (to turn off the Valley Filter). This parameter’s behavior has been largely superseded by the vfScanLimit parameter.

Value    Effect
400      Higher stringency
320      Default
168      Lower stringency

vfBadFlowThreshold
If vfScanAllFlows is false, then this is literally the number of flows that can fall within a valley in the valley filter before the read is rejected (for amplicons) or trimmed (for shotgun reads). In the more common case where vfScanAllFlows is true, this value forms the numerator of the valley filter score. In general this parameter should not need to be adjusted; the preferred way to adjust the valley filter threshold is the vfTrimBackScaleFactor parameter.

Value    Effect
2        Higher stringency
4        Default
6        Lower stringency
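The counting behavior described for vfScanAllFlows set to false can be illustrated with a short Python sketch. It assumes that each flow has already been classified as a valley flow or not (that classification is not described here), and the function name and the list-of-booleans input are hypothetical. The default arguments are the documented defaults for vfBadFlowThreshold and vfLastFlowToTest.

```python
def exceeds_valley_count(valley_flags, vf_bad_flow_threshold=4,
                         vf_last_flow_to_test=320):
    """Return True when more than vf_bad_flow_threshold valley flows occur
    within the first vf_last_flow_to_test flows of a read.

    valley_flags: one boolean per flow, True if the flow fell in a valley.
    """
    if vf_last_flow_to_test == 0:
        return False  # the manual documents 0 as turning the Valley Filter off
    window = valley_flags[:vf_last_flow_to_test]
    return sum(window) > vf_bad_flow_threshold


# Made-up read: five valley flows early on, the rest clean.
flags = [True] * 5 + [False] * 395
print(exceeds_valley_count(flags))                           # default threshold of 4
print(exceeds_valley_count(flags, vf_bad_flow_threshold=6))  # lower stringency
```

According to the text above, a read that exceeds the count would be rejected for amplicons or trimmed for shotgun reads; that follow-up step is left out of the sketch.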

vfTrimBackScaleFactor
If vfScanAllFlows is true, then flows are scored against the flow valleys using an exponential function. The sum of scores is multiplied by vfTrimBackScaleFactor before being compared to the maximum allowable score ratio formed from vfBadFlowThreshold and vfLastFlowToTest. If the scaled score exceeds the maximum, then the read is trimmed (shotgun) or discarded (amplicon).

Acceptable values are floating point numbers >= 0.

Value                     Effect
1.6 - Default Shotgun
3.0 - Default Amplicon
> Default                 Increases the stringency of the valley filter. This yields higher accuracy, but also shorter read lengths and fewer high-quality reads.
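The comparison described for vfTrimBackScaleFactor can be written out as a small Python sketch. It follows the wording above literally and rests on assumptions: the per-flow scores are taken as given inputs because the exponential scoring function is not specified here, any normalization the software applies to the sum is unknown, and the function name is hypothetical.

```python
def valley_score_exceeds(flow_scores, vf_trim_back_scale_factor=1.6,
                         vf_bad_flow_threshold=4, vf_last_flow_to_test=320):
    """Compare the scaled score sum with the maximum allowable score ratio.

    flow_scores: per-flow valley scores, assumed to be computed elsewhere.
    """
    if vf_last_flow_to_test == 0:
        return False  # the manual documents 0 as turning the Valley Filter off
    max_ratio = vf_bad_flow_threshold / vf_last_flow_to_test
    scaled_score = sum(flow_scores) * vf_trim_back_scale_factor
    return scaled_score > max_ratio


scores = [0.002, 0.003]  # made-up illustrative scores
print(valley_score_exceeds(scores))                                 # shotgun default factor
print(valley_score_exceeds(scores, vf_trim_back_scale_factor=3.0))  # amplicon default factor
```

With these made-up scores the read passes at the shotgun default factor and fails at the amplicon default, which matches the statement that a larger factor makes the valley filter more stringent.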


vfUseRollingWindows
This parameter estimates the valley thresholds between the 0, 1, 2, and 3-mers dynamically across the run using a window of some number of flows, rather than using one set of thresholds for the entire run.

Value             Effect
True - Default    Use rolling windows enabled. This generally makes the valley filter less stringent, yielding more high-quality reads and bases, but slightly shorter average read lengths.
False             Use rolling windows disabled

vfMaxFailedPercent
The percentage of “valley” flows in a read which cause the read to be rejected entirely. This parameter can be used to tune the failed percent. A range between “90” and “100” is recommended.

Value    Effect
100      Default
90       Higher stringency

vfTrimBackMinimumLength
When a read is trimmed such that it would be shorter than the value of this parameter, the read is rejected entirely. Users may set this parameter to a lower number if they are attempting to sequence very short reads.

Value    Effect
>84      Higher stringency
84       Default
<84      Lower stringency

minLength
Represents the minimum length of reads acceptable after all quality filtering steps.

Value    Effect
50       Default
84       Higher stringency (old default)
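The interplay of the two length limits, vfTrimBackMinimumLength and minLength, can be sketched in Python as follows. This sketch assumes that vfTrimBackMinimumLength applies only to reads that the valley filter actually trimmed, which the parameter name suggests but the text does not state outright; the function name and the `was_valley_trimmed` flag are hypothetical.

```python
def passes_length_filters(length, was_valley_trimmed,
                          vf_trim_back_minimum_length=84, min_length=50):
    """Return True if a read of the given final length is kept.

    length: read length after all trimming.
    was_valley_trimmed: True if the valley filter trimmed this read.
    """
    if was_valley_trimmed and length < vf_trim_back_minimum_length:
        return False  # trimmed below vfTrimBackMinimumLength: rejected entirely
    return length >= min_length  # minLength applies after all filtering steps


print(passes_length_filters(60, was_valley_trimmed=False))
print(passes_length_filters(60, was_valley_trimmed=True))
print(passes_length_filters(60, was_valley_trimmed=True,
                            vf_trim_back_minimum_length=50))
```

Under this assumption, lowering vfTrimBackMinimumLength is what lets short trimmed reads through, in line with the advice for users sequencing very short reads.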

useAmpliconsPrimers
Only search for the sequencing primers appropriate for Amplicon experiments.

Value             Effect
True - Default    Enabled
False             Disabled

useCorrectionGlobalLimit
Enables or disables a global limit on the SFF flowgram correction limit.

Value             Effect
True - Default    Enabled
False             Disabled
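Before launching a job, edited values can be checked against the acceptable ranges stated in the parameter descriptions above. The following Python sketch is one way to do that; the function name and the dictionary input are hypothetical, and the checks cover only the ranges this section states explicitly.

```python
def check_filter_params(params, nucleotide_flows=None):
    """Return a list of problems found; an empty list means none were found.

    params: parameter name -> numeric value.
    nucleotide_flows: number of nucleotide flows in the Run, if known.
    """
    problems = []

    def is_int(value):
        return isinstance(value, int) and not isinstance(value, bool)

    def is_number(value):
        return isinstance(value, (int, float)) and not isinstance(value, bool)

    if "minConsensusSignal" in params:
        v = params["minConsensusSignal"]
        if not (is_int(v) and v >= 1):
            problems.append("minConsensusSignal must be an integer >= 1")
    if "vfScanLimit" in params:
        v = params["vfScanLimit"]
        if not (is_int(v) and v > 0):
            problems.append("vfScanLimit must be an integer > 0")
    if "vfLastFlowToTest" in params:
        v = params["vfLastFlowToTest"]
        in_range = is_int(v) and v >= 0
        if in_range and nucleotide_flows is not None:
            in_range = v <= nucleotide_flows
        if not in_range:
            problems.append("vfLastFlowToTest must be between 0 and the "
                            "number of nucleotide flows in the Run")
    if "vfTrimBackScaleFactor" in params:
        v = params["vfTrimBackScaleFactor"]
        if not (is_number(v) and v >= 0):
            problems.append("vfTrimBackScaleFactor must be a number >= 0")
    if "vfMaxFailedPercent" in params:
        v = params["vfMaxFailedPercent"]
        if not (is_number(v) and 90 <= v <= 100):
            problems.append("vfMaxFailedPercent is outside the recommended "
                            "90-100 range")
    return problems


for problem in check_filter_params({"minConsensusSignal": 0,
                                    "vfMaxFailedPercent": 80}):
    print(problem)
```

The vfMaxFailedPercent check reflects a recommendation rather than a hard limit, so a caller may prefer to treat that message as a warning.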

While it is possible to turn off the read rejecting and signal intensity filters (Key Pass, Dot, Mixed and Short Signal), this will result in output of sub-standard quality data that is NOT recommended for use in subsequent data analysis. For example, to turn off Dots and Mixed filter, add the following lines under the <qualityFilter> section of the Shotgun Processing template (Figure 7).

<doDotCheck>false</doDotCheck>
<doMixedCheck>false</doMixedCheck>
<doClassifierCheck>false</doClassifierCheck>
<doShortSignalCheck>false</doShortSignalCheck>
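Because disabling these filters yields data that is not recommended for downstream analysis, it can be useful to check a template for them before launching a job. The Python sketch below reports which of the four filters are set to false; the function name is hypothetical, and the sample XML with its <pipeline> root is an illustrative stand-in rather than the real template layout.

```python
import xml.etree.ElementTree as ET

# The four read rejecting and signal intensity filters named in this section.
REJECTING_FILTERS = ("doClassifierCheck", "doDotCheck",
                     "doMixedCheck", "doShortSignalCheck")


def disabled_filters(xml_text):
    """Return the names of rejecting filters set to false in <qualityFilter>."""
    root = ET.fromstring(xml_text)
    section = next(root.iter("qualityFilter"), None)
    if section is None:
        return []
    found = []
    for name in REJECTING_FILTERS:
        value = section.findtext(name)
        if value is not None and value.strip().lower() == "false":
            found.append(name)
    return found


# Hypothetical, abbreviated template content for illustration only.
sample = """<pipeline>
  <qualityFilter>
    <doDotCheck>false</doDotCheck>
    <doMixedCheck>false</doMixedCheck>
    <doPrimerTrimming>true</doPrimerTrimming>
  </qualityFilter>
</pipeline>"""

for name in disabled_filters(sample):
    print(f"warning: {name} is disabled")
```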
