Assign-ATF is intended for Research Use Only (RUO).


ATF

Software for DNA Sequencing

Operators Manual

1 About ATF
1.1 Compatibility
1.1.1 Computer Operating Systems
1.1.2 DNA Sequencing Chemistry and DNA Sequencers
1.2 Overview
2 Getting Started
2.1 Installation
3 Quick Guide
3.1 Login
3.2 Create Reference Sequence
3.2.1 Create a Reference Sequence from a GenBank Entry
3.2.2 Creating a Reference Sequence from a fasta file
3.3 Creating Analysis Settings
3.3.1 Entering your sequence file-naming convention
3.3.2 Creating the Sequence Analysis Settings
3.4 Import your Sequences
3.5 Sequence Analysis and Editing
3.6 Producing a Report
3.7 Saving and Opening Layouts
4 Detailed Users Guide and Description of Functions
4.1 Logging On
4.2 Adding New Operators
4.3 Changing the password
4.4 Additional Functions of the Login Page
4.4.1 Default Settings
4.4.2 System Files
4.4.3 Operator Levels
5 Setting up ATF for Analysis
5.1 Setting the Analysis Parameters
5.1.1 Creating new settings files in Edit | Settings | General
5.1.2 Setting the Electropherogram Display: Colours, fonts and line widths
5.2 Setting the sequence-file naming convention in Edit | Settings | Naming
5.2.1 Set the Naming convention by defining the Sample Delimiters
5.2.2 Setting the Naming convention by defining the Sample Name, Library alias positions and word length
5.3 Setting the data analysis parameters in Settings | Engine
5.3.1 BCS Limits
5.3.2 Matching Mode
5.3.3 Basecaller
5.3.3.1 Apply Height Maps
5.3.3.2 Update Maps
5.3.3.3 Apply Auto Editing (Not recommended for variant detection applications)
5.3.3.4 Suggested Applications for Picket Fence Analysis (Check Apply Maps and / or Update Maps)
5.3.3.5 Suggested Applications for Non-Picket Fence Analysis (Do NOT Check Apply Maps and / or Update Maps)
5.3.3.6 Suggested Applications for Autoediting
5.3.3.7 Summary: Sequencing Application and Analysis Parameters
5.4 Creating Reference Sequences in Settings | References
5.4.1 Creating a new reference from a GenBank file
5.4.1.1 Saving GenBank Files
5.4.1.2 Creating the reference sequence file in Assign
5.4.2 Creating a new reference from text files in Fasta format
5.4.3 Annotating the Reference Sequence: Editing or Adding Reference sequence information
5.4.4 Creating Coding Groups
5.4.5 Adding the location of known variants to the reference sequence
5.4.6 Removing primer site sequence from analysis using Trim
5.4.7 Haplotype-specific sequencing of diploid templates using haplotype-specific sequencing primers
5.4.8 Additional Reference Sequence Functions
5.5 Settings Appendix
5.5.1 Picket Fences (See 5.3.3 Basecaller)
5.5.2 Sequence Electropherogram Quality: The Base Call Score (BCS)
6 Importing Sequences into Assign for Analysis
6.1 Importing sequences is performed by selecting File | Import
6.2 Importing Sequences by Directory
6.3 Importing Sequences Individually
6.4 Importing Sequences Using the Filter Function
7 Sequence Analysis and Editing
7.1 The Analysis Screen
7.1.1 The Sample Pane
7.1.2 The Electropherogram (EPG) Pane
7.1.3 Migrating through the sequence EPG is performed
7.1.4 The Navigator
7.1.4.1 Editing using the Navigator Keypad
7.1.4.2 Priority Editing
7.2 Additional Sample and Sequence Editing Functions
7.2.1 Editing in the Sample Pane
7.2.2 Editing in the EPG pane
7.3 Additional Functions
7.3.1 Zooming the EPG
7.3.2 Hiding the EPG
7.3.3 Expanding the EPG Window
7.4 Sequence View Options
7.4.1 Select Consensus to view the sample text sequences
7.4.2 Select Dots after selecting consensus to view only those sequences that differ from the reference sequence
7.4.3 Select Quality to view the sample text sequences and BCS
7.4.3.1 Suggested Applications for Consensus sequence and Quality view
7.4.4 Select Alignment to view the sequences of the best matched alleles
8 Reports
8.1 Variant Report
8.2 Genotype Report
8.3 HARPs Report
8.4 FASTA Report
8.5 Quality Report
9 Saving, Opening and Printing Layouts
10 FAQ
11 Contact Us

1 About ATF

ATF is a sophisticated yet simple-to-use software program for the analysis of DNA sequence electropherograms from automated DNA sequencers.

ATF can be used for an extensive range of sequencing applications, and also produces quality control information in a unique and informative manner.

1.1 Compatibility

1.1.1 Computer Operating Systems

ATF is compatible with Windows NT, Windows 2000, Windows XP and Windows Vista operating systems. Microsoft Excel 97 or above is required for the creation of enhanced reports.

1.1.2 DNA Sequencing Chemistry and DNA Sequencers

ATF requires .ab1 or .abd sequence files from automated DNA sequencers. The files should first be processed with Applied Biosystems’ Sequence Analysis software or similar. Automated DNA sequencers from Applied Biosystems™, Beckman and Amersham have been used successfully with ATF.

1.2 Overview

• Developed by laboratory scientists and expert computer programmers with extensive experience in DNA sequencing

• ATF is sequencing-enabling software designed to handle any DNA sequencing application

• Removes data analysis as a bottleneck for high-throughput sequencing-based genotyping, SNP discovery, mutation and variant detection, and contig assembly

• User-friendly, with minimal hands-on time

• Includes a patented approach to electropherogram analysis that normalises the data and enables the quantitative nature of DNA sequencing to be exploited. This approach, nicknamed Picket Fence analysis, improves heterozygous base calling and enables accurate detection of low-level mutants

• Performs a dynamic assessment of background noise and compensates for it, enabling accurate base calling on data with background

• Enables genotyping and variant analysis of heterozygous insertions and deletions

• Strong focus on data quality, with quality indicators for each sequenced position, each sequencing primer and each sample

• Enables analysis of critical quality parameters such as peak quality, signal intensities and base call accuracy (sequence edits)

• Unique approach to quality control that enables automatic generation of longitudinal quality control data. This enables the effect of reagent changes on sequence data quality to be assessed and performance criteria to be established.

• Different levels of access are possible to control release of results to appropriate personnel and provide an analysis audit trail

2 Getting Started

2.1 Installation

ATF is a single user application. It has been encrypted to prevent copying to other computers.

• You can obtain ATF by contacting Conexio Genomics at support@conexio.iinet.net.au

• You will be provided with a link to enable the installer to be downloaded

• The file you will receive is called Setup.msi.

• Save this onto the hard disk of your computer.

• Once saved, double click on the Setup.msi icon and follow the installation instructions.

• During installation you will be asked to send your computer’s hardware number before being granted access to the software. The installer provides you with this number. Copy and paste it into an email message and email it to conexio@iinet.net.au for licence registration.


3 Quick Guide

This section describes the basic functions of ATF and provides sufficient information for basic analysis:

• How to log in

• How to create a reference sequence to which unknown sequences are compared

• How to create analysis settings for automated analysis of data

• How to perform sequence data editing

• How to generate a report

• How to save a layout

• How to open a saved layout

3.1 Login

1. Open an ATF layout by clicking on the shortcut created during installation or on the ATF.exe icon.

2. Log in using the admin username and the default password cg01.

Additional users can be added with unique passwords and varying levels of access. See Section 4.

3.2 Create Reference Sequence

A reference sequence is required for Genotyping and Variant Detection. In Genotyping, the reference sequence represents a library of genotypes. Reference sequences are not required for de novo contig assembly or analysis of unknown sequences.

3.2.1 Create a Reference Sequence from a GenBank Entry

1. Go to the GenBank web page that contains the reference sequence.

2. Save the web page as a text file.

3. In ATF go to Edit | Settings | References.

4. Click on the Import GenBank button.

5. Browse to the GenBank text file.

6. Import.

7. Edit or enter the reference sequence information. The reference sequence name, the reference sequence filename and the version are required.

8. Click Save Reference.

See section 5.4

ATF will create an annotated reference sequence based on information in GenBank. Additional information, such as known variants, can be added manually.

Regions within the sequence that contain coding groups can also be added to enable reporting of putative amino acids.

3.2.2 Creating a Reference Sequence from a fasta file

1. Save the reference sequence as a text file in fasta format.

2. Change the .txt extension to .fasta.

3. In ATF go to Edit | Settings | References.

4. Click the Fasta button.

5. Browse to the fasta file.

6. Import.

7. Edit or enter the reference sequence information. This includes the reference sequence name, the reference sequence filename and the reference version.

8. Click on Save Reference.

FASTA files containing sequences of multiple alleles can be imported to create a reference sequence library for Genotyping.

The reference can be annotated to include the location of exons and other genetic information.

3.3 Creating Analysis Settings

3.3.1 Entering your sequence file-naming convention

1. Go to Edit | Settings | Naming

2. Define the location of the sample ID within the sequence filename by either:

• defining how the sample name relates to delimiters within the filename, or

• defining the start position of the sample name within the filename and the number of characters it uses

3. In Reference Aliases, select the reference sequence and enter the alias used in the sequence filename to identify that reference

4. Click Update to save changes

See section 5.2

Defining your sequence file naming convention will automatically group the different .ab1 files for a sample and analyse the consensus sequence against the appropriate reference sequence.

3.3.2 Creating the Sequence Analysis Settings

1. Go to Edit | Settings | Engine

2. Select the appropriate matching mode.

Select Genotyping when comparing a sequence against a reference library (e.g. HLA typing), or Variant Detection when comparing a sequence with a single reference sequence.

3. Ensure Auto-editing is not checked for variant detection.

4. Click Update to save changes

ATF operates in two modes: Variant Detection and Genotyping. Genotyping is used when analysing against a library of sequences. In Engine, the peak normalisation algorithm can be engaged (Apply Maps and Update Maps) and the cutoff for heterozygote detection can be set (Detection Limits).
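The difference between the two modes can be illustrated with a minimal Python sketch. It assumes the test and reference sequences are already aligned and of equal length, and it ignores heterozygous positions and combinations of alleles; it is not ATF’s matching engine, and the function names `variant_positions` and `best_matching_alleles` are hypothetical.

```python
def variant_positions(reference: str, test: str) -> list[tuple[int, str, str]]:
    """Variant Detection: (1-based position, reference base, test base) for each mismatch."""
    if len(reference) != len(test):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, ref, obs)
        for pos, (ref, obs) in enumerate(zip(reference, test), start=1)
        if ref != obs
    ]


def best_matching_alleles(library: dict[str, str], test: str) -> list[str]:
    """Genotyping: the allele(s) in the library with the fewest mismatches to the test."""
    mismatches = {name: len(variant_positions(seq, test)) for name, seq in library.items()}
    fewest = min(mismatches.values())
    return sorted(name for name, count in mismatches.items() if count == fewest)


print(variant_positions("ACGTACGT", "ACGTTCGT"))  # [(5, 'A', 'T')]
library = {"allele1": "ACGTACGT", "allele2": "ACGTTCGT", "allele3": "ACCTTCGA"}
print(best_matching_alleles(library, "ACGTTCGT"))  # ['allele2']
```

Variant Detection reports every difference from one reference, whereas Genotyping only asks which library entry is closest, which is why the two modes need different settings.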

3.4 Import your Sequences

1. Go to File | Import Electropherograms to import .ab1 data

2. Select either Browse to import the contents of a directory, or Select Files Manually to import files manually

See Section 6

Text files in fasta format can also be imported.

3.5 Sequence Analysis and Editing

1. Base call changes can be made using the keypad on the Navigator. Additional editing functions are available by right-clicking with the cursor on the electropherogram.

2. Electropherograms can be trimmed (Set Start Position/Set End Position in the right-click menu), de-activated and removed from the analysis, or removed from the layout by right-clicking on the electropherogram.

3. Base call errors can be located by dragging the blue scroll box or clicking either side of the scroll box.

4. Alternatively, base calls with a low quality score, positions mismatched with the reference, user-defined variant positions and edited positions can be located by checking any combination of the BCS, Edits and MM boxes on the Navigator and clicking the double green arrow (>>) button in the Navigator, or by clicking the red cross button.

See section 7

Base call errors are usually at positions of poor quality.

Each base call has a box above it, shaded according to peak quality, to enable easy identification of poor quality regions.

Base call editing changes the consensus sequence only.

Mismatches against the reference sequence are indicated in the results pane to the right of the electropherograms.

3.6 Producing a Report

1. Once base calls have been confirmed or edited, reports can be created in Reports | Report Generator

2. Select the appropriate report.

3. Tailor the report to your requirements by checking the appropriate report functions

See section 8

Ensure that variant reports are selected for variant detection applications. In addition to producing sequence reports, quality control reports and text sequences can also be reported.


3.7 Saving and Opening Layouts

1. Save by going to File | Save

2. Saved layouts must be opened by File | Open and browsing to the saved layout

See section 9

ATF saves the layout information, including edits and links to electropherograms, as an XML file. ATF layouts cannot be opened by clicking on the layout XML file.

4 Detailed Users Guide and Description of Functions

4.1 Logging On

Log on by double-clicking the ATF shortcut icon or the ATF.exe file located in C:\Program Files\Conexio Genomics\ATF. This opens the Login screen.

• The default Operator is admin and the default Current Password is cg01

• Click Submit or press Enter to log in


4.2 Adding New Operators

• Log in using the default operator and password

• Before pressing Enter or clicking Submit to log in, click

• The window will expand as shown above

• Type in the new operator’s name in Edit Operator

• Type in a password in New Password

• Re-type the new password in Retype Password

• Select the Operator Level (See 4.4.3)

• Click

4.3 Changing the password

• Log in using your username and password

• Enter the new password in New Password

• Retype to confirm in Retype Password

• Click

• Re-enter the new information to log in

4.4 Additional Functions of the Login Page

4.4.1 Default Settings

This dialogue enables the operator to select the settings files at the login stage (see section 5.1.1, Creating new settings files).

4.4.2 System Files

The system files are installed to C:\Program Files\Conexio Genomics\ATF. This location can be changed. If the file location is changed, the new location must be entered in the System Files dialogue.

4.4.3 Operator Levels

Different levels of access have been created to ensure that reports are not created without the appropriate level of authority.

First reviewer (edit only)

• Cannot change settings

• Can edit sequences that have not been signed off by a final reviewer

• CANNOT edit sequences that have been signed off by a final reviewer

• CANNOT sign the second check box on or off

First reviewer (with access to settings)

• Can change settings

• Can edit sequences that have not been signed off by a final reviewer

• CANNOT edit sequences that have been signed off by a final reviewer

• CANNOT sign the second check box on or off

Final reviewer (with full access)

• Can change settings

• Can edit sequences that have not been signed off by a final reviewer

• CANNOT edit sequences that have been signed off by a final reviewer

• Can sign the second check box on or off

Signing off means an editor is satisfied with a result. If a sample is signed off by a “Final Reviewer”, it can no longer be edited. If a sample is signed on again by a “Final Reviewer”, it may be edited further. All changes in status are recorded.
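The access rules above can be summarised as a small Python sketch. The class, enum and function names are hypothetical illustrations of the rules listed in this section, not part of ATF, and the audit-trail strings are an assumed format.

```python
from dataclasses import dataclass, field
from enum import Enum


class OperatorLevel(Enum):
    FIRST_REVIEWER_EDIT_ONLY = "First reviewer (edit only)"
    FIRST_REVIEWER_WITH_SETTINGS = "First reviewer (with access to settings)"
    FINAL_REVIEWER = "Final reviewer (with full access)"


def can_change_settings(level: OperatorLevel) -> bool:
    return level is not OperatorLevel.FIRST_REVIEWER_EDIT_ONLY


def can_sign_second_check_box(level: OperatorLevel) -> bool:
    return level is OperatorLevel.FINAL_REVIEWER


@dataclass
class Sample:
    name: str
    signed_off_by_final_reviewer: bool = False
    audit_trail: list[str] = field(default_factory=list)

    def can_edit(self) -> bool:
        # The same rule applies at every level: no edits once a Final Reviewer has signed off.
        return not self.signed_off_by_final_reviewer

    def set_sign_off(self, operator: str, level: OperatorLevel, signed_off: bool) -> None:
        if not can_sign_second_check_box(level):
            raise PermissionError(f"{operator} ({level.value}) cannot sign the second check box")
        self.signed_off_by_final_reviewer = signed_off
        # Every change in status is recorded.
        action = "signed off" if signed_off else "signed on again"
        self.audit_trail.append(f"{operator}: {action}")


sample = Sample("12345")
sample.set_sign_off("reviewer2", OperatorLevel.FINAL_REVIEWER, True)
print(sample.can_edit())   # False
print(sample.audit_trail)  # ['reviewer2: signed off']
```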


5 Setting up ATF for Analysis

Once logged in, an ATF layout is opened.

5.1 Setting the Analysis Parameters

Explanation

This section describes how to optimize sequence analysis using ATF by creating the sequence analysis parameters. This section also includes instructions on how to create reference sequences. Sequence analysis parameters are saved in “Settings” files. A number of settings files for different experiments and users can be stored.

The settings box contains functions that enable the setting of sequence analysis parameters, including:

1) Display options (General)

2) File Naming conventions (Naming)

3) Analysis parameters (Engine)

4) Setting up reference sequences (References)


5.1.1 Creating new settings files in Edit | Settings | General

The default Settings file contains parameters for analysis. These settings can be edited and new settings files can be created. Several Settings files can be created for different users or applications.

For example, a Variant Detection experiment requires that Variant Detection is selected in Engine (described in more detail below). The operator can create a Settings file called “Variant Detection”. Loading this settings file recovers all settings for this application.

• To open an existing settings file, select it from the drop-down menu and click Load

• To create a new settings file, type in a new settings file name and proceed to the other settings tabs to create the settings file for this file name (there is no need to click any of the other buttons in this window)

5.1.2 Setting the Electropherogram Display: Colours, fonts and line widths

Go to Settings | General

To change the individual nucleotide peak colours within the electropherogram, the display text font and the layout background colours, select Display.

To change the nucleotide peak colours:

• Select the Base, choose the colour, click on Set Colour

• Click on Done in the Display dialogue when the colours have been selected.

• This returns you to the Settings | General

• Click on Update

• Click Done

To change the font size and the line width select the appropriate parameters in the Text Size: and Line Width: boxes. Click Done when complete

Important: To log changes in Settings | General click Update and Done


5.2 Setting the sequence-file naming convention in Edit | Settings | Naming

Explanation

Using a standard sequence-file naming convention enables ATF to link all EPGs for a sample and to analyse the test sequence against the appropriate reference sequence. This section describes how to set the sequence-file naming convention. Delimiter symbols can be used to define the location of the sample name.

Example

This is an example of a sequence filename: A01[12345_C4_ex2F

Delimiters have been used to separate the components of the sequence-file name:

[ separates the PCR number (A01) and the sample name (12345)

_ separates the sample name and the locus (C4)

_ also separates the locus and the primer name (ex2F)

5.2.1 Set the Naming convention by defining the Sample Delimiters

In the example above:

• The sample name begins with [. Enter [ in the Start: String box.

• The sample name ends with _. Enter _ in the End: String box.

HLAB has been used as the code to indicate that all sequence filenames containing HLAB must be analysed against the HLA-B gene reference sequence.

• In Library Aliases, select B from the Library drop-down menu

• Enter HLAB in the Alias box

• Click Update

• Click OK
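The delimiter rule and the library alias can be illustrated with a short Python sketch. It assumes that the sample name is the text between the first Start string and the next End string, and that an alias may appear anywhere in the filename; the function names are hypothetical and ATF’s own parsing may differ. The filenames follow the A01[12345_HLAB_ex2F pattern used in 5.2.2.

```python
from collections import defaultdict
from typing import Optional


def between(filename: str, start: str, end: str) -> str:
    """Return the text after the first `start` string and before the next `end` string."""
    _, found, rest = filename.partition(start)
    if not found:
        raise ValueError(f"start string {start!r} not found in {filename!r}")
    value, found, _ = rest.partition(end)
    if not found:
        raise ValueError(f"end string {end!r} not found after {start!r} in {filename!r}")
    return value


def reference_for(filename: str, aliases: dict[str, str]) -> Optional[str]:
    """Return the reference whose alias appears anywhere in the filename, if any."""
    for alias, reference in aliases.items():
        if alias in filename:
            return reference
    return None


def group_by_sample(filenames: list[str], start: str = "[", end: str = "_") -> dict[str, list[str]]:
    """Link every sequence file that shares a sample name."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in filenames:
        groups[between(name, start, end)].append(name)
    return dict(groups)


files = ["A01[12345_HLAB_ex2F.ab1", "A02[12345_HLAB_ex2R.ab1", "A03[67890_HLAB_ex2F.ab1"]
print(between(files[0], "[", "_"))            # 12345
print(sorted(group_by_sample(files)))         # ['12345', '67890']
print(reference_for(files[0], {"HLAB": "B"}))  # B
```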

5.2.2 Setting the Naming convention by defining the Sample Name, Library alias positions and word length

Note: This method of defining the sequence-file naming convention should only be used if the sample name always starts at the same position within the sequence-file name and is the same length for all samples.

Using the same example as above, i.e. A01[12345_HLAB_ex2F, the sample name starts at position 5 of the sequence-file name and is 5 characters long.

• In Sample Delimiters, enter 5 in Start:Position and 5 in Length:Position

The Library name to be analysed is located at position 11 and is 4 characters long.

• In Library Delimiters, enter 11 in Start:Position and 4 in Length:Position

• Click Update followed by clicking Done
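A Python sketch of the position-based rule shows why the note above matters: positions are counted from 1, so Start 5 and Length 5 select characters 5 to 9. The helper `field_at` is hypothetical and only illustrates the arithmetic.

```python
def field_at(filename: str, start: int, length: int) -> str:
    """Return `length` characters beginning at 1-based position `start` of the filename."""
    if start < 1 or length < 1 or start - 1 + length > len(filename):
        raise ValueError("field lies outside the filename")
    return filename[start - 1 : start - 1 + length]


name = "A01[12345_HLAB_ex2F"
print(field_at(name, 5, 5))   # 12345 (Sample Delimiters: Start 5, Length 5)
print(field_at(name, 11, 4))  # HLAB  (Library Delimiters: Start 11, Length 4)

# The method breaks if sample names differ in length:
print(field_at("A01[1234_HLAB_ex2F", 5, 5))  # 1234_
```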

5.3 Setting the data analysis parameters in Settings | Engine

Explanation

Settings | Engine allows parameters to be set to optimise sequence analysis.

5.3.1 BCS Limits

• Enter the appropriate number into each box. Using the default of 0 will result in all data being included.

• Setting a Base limit value means that any base with a score lower than this value will not be called and will be assigned a *

• Setting an EPG limit value will result in the exclusion of an electropherogram if the mean BCS of all positions falls below the value used.

• Setting a Sample limit will result in the exclusion of a sample if the mean BCS of the sample falls below the value used.

• After entering the appropriate values click Update and Done
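The three limits can be sketched in Python as follows. This assumes a base whose BCS equals the Base limit is still called (the text says bases scoring lower than the limit are not called) and that each EPG supplies one BCS per position; the function names are hypothetical. How ATF combines forward and reverse scores into a sample mean is not described here, so `sample_passes` simply takes that mean as an input.

```python
from statistics import mean


def apply_base_limit(bases: str, scores: list[int], base_limit: int) -> str:
    """Replace every base whose BCS is lower than base_limit with '*'."""
    if len(bases) != len(scores):
        raise ValueError("one BCS is needed per base")
    return "".join(b if s >= base_limit else "*" for b, s in zip(bases, scores))


def epg_passes(scores: list[int], epg_limit: int) -> bool:
    """Keep an electropherogram unless the mean BCS of its positions falls below epg_limit."""
    return mean(scores) >= epg_limit


def sample_passes(sample_mean_bcs: float, sample_limit: int) -> bool:
    """Keep a sample unless its mean BCS falls below sample_limit."""
    return sample_mean_bcs >= sample_limit


scores = [40, 12, 30, 5]
print(apply_base_limit("ACGT", scores, 15))  # A*G*
print(apply_base_limit("ACGT", scores, 0))   # ACGT (the default of 0 keeps everything)
print(epg_passes(scores, 20))                # True (mean BCS is 21.75)
```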

Explanation: BCS is Assign-ATF’s quality scoring system. A sequence peak can have a BCS quality score between 0 and 49. The higher the number, the better the sequence quality and the greater the confidence that the base call is correct. The mean BCS of all positions within an EPG provides an EPG quality score. If a sample is sequenced in both directions, its mean BCS can range from 0 to 99. The BCS Limits box filters base calling: positions, EPGs and samples will not be analysed unless their score is above the value entered in the corresponding BCS Limits box.

5.3.2 Matching Mode

Explanation: Two of the main functions of Assign-400ATF are Variant Detection (e.g. BRCA testing) and Genotyping (e.g. HepC genotyping). Variant Detection identifies sequence differences between a reference sequence and a test sequence. Genotyping compares the test sequence against a library of allele sequences to determine which allele, or combination of alleles, the test sequence best matches. Specifying the application optimises the analysis. Selecting No Mixed Bases calls a single base (A, C, G or T) at each position. This is useful for base calling poor quality hemizygous data.

5.3.3 Basecaller

Explanation: Assign-400ATF has a unique base caller that includes a normalisation algorithm (Picket Fences) that improves base call accuracy for re-sequencing projects. The Basecaller box enables activation of the Picket Fences algorithm. See 5.5.1 for a more detailed description of Picket Fence analysis.
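As a generic illustration of what a mixed-base call and the No Mixed Bases option mean, the Python sketch below uses a simple secondary-to-primary peak-height ratio. This is a common textbook approach, not ATF’s Picket Fence algorithm; the 0.3 ratio stands in for a heterozygote detection cutoff and is an assumed value, and `call_base` is a hypothetical name.

```python
# Standard IUPAC codes for two-base mixtures.
IUPAC_PAIRS = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("GC"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def call_base(peaks: dict[str, float], mixed_ratio: float = 0.3, no_mixed_bases: bool = False) -> str:
    """Call one position from its A/C/G/T peak heights."""
    ranked = sorted(peaks, key=peaks.get, reverse=True)
    primary, secondary = ranked[0], ranked[1]
    # No Mixed Bases: always report the single tallest peak.
    if no_mixed_bases or peaks[primary] == 0:
        return primary
    # A secondary peak that is large enough relative to the primary gives a mixed call.
    if peaks[secondary] / peaks[primary] >= mixed_ratio:
        return IUPAC_PAIRS[frozenset((primary, secondary))]
    return primary


heterozygote = {"A": 1000, "C": 20, "G": 450, "T": 10}
print(call_base(heterozygote))                       # R
print(call_base(heterozygote, no_mixed_bases=True))  # A
```

A fixed ratio is sensitive to uneven peak heights along a trace, which is the kind of problem a normalisation step such as Picket Fences is described as addressing.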


5.3.3.1 Apply Height Maps

Checking this box instructs the software to use existing information, including information within the current layout, to normalise the EPG data.

5.3.3.2 Update Maps

Checking this box instructs the software to update the normalisation maps with the new data

5.3.3.3 Apply Auto Editing (Not recommended for variant detection applications)

Autoediting is an intuitive base call algorithm that is applied when the quality of a sequence peak is poor. The software uses prior base calling information at this position as a guide to the most likely base. This should not be used for variant detection or SNP discovery applications

5.3.3.4 Suggested Applications for Picket Fence Analysis (Check Apply Maps and / or Update Maps)

Picket Fence Analysis:

• High throughput genotyping on optimized data

• Comparing SNP frequencies on pooled DNA

• Accurate detection of low level mutations

• Quality control of reagents: ensuring equivalent amplification of alleles

• Genotyping of alleles defined by insertion/deletion polymorphisms

5.3.3.5 Suggested Applications for Non-Picket Fence Analysis (Do NOT Check Apply Maps and / or Update Maps)

Non Picket Fence analysis:

• High throughput SNP screening on non-optimized data, or data of variable quality.

• Non re-sequencing applications

• Contig assembly from cloned data

5.3.3.6 Suggested Applications for Autoediting

Sequence based genotyping when comparing an unknown sequence against a sequence library.

DO NOT USE AUTO-EDITING FOR VARIANT DETECTION OR FOR STUDIES WHERE THE TEST SEQUENCE IS COMPARED WITH A SINGLE REFERENCE SEQUENCE.

5.3.3.7 Summary: Sequencing Application and Analysis Parameters

ATF Analysis Parameters | Genotyping* | Variant Detection* | Clone* | Anonymous Sequencing*

Matching Mode: Genotyping | Yes | | |

Matching Mode: Variant Detection | | Yes | Yes | Yes

No Mixed Bases | | | Yes |

Base Caller: Apply Height Maps | Yes | Yes | |

Base Caller: Update Height Maps | Yes | Yes | |

Base Caller: Apply Autoediting | Yes | | |

*Application Definitions

Genotyping applications include the comparison of a test sequence with a library of sequences of variants (alleles) for the locus being sequenced.

Variant Detection applications include SNP discovery and the detection of variants in genes associated with genetic disorders, and can also be used to detect viral variants associated with drug resistance.

Clone applications include clone sequencing and contig assembly.

Anonymous sequencing can include sequencing clones or PCR products where a reference sequence does not exist.


5.4 Creating Reference Sequences in Settings | References

5.4.1 Creating a new reference from a GenBank file

5.4.1.1 Saving GenBank Files

• Retrieve the reference sequence entry from GenBank on the NCBI website (http://www.ncbi.nlm.nih.gov/)

• Save the GenBank reference sequence file as a text file (.txt) to your computer.

5.4.1.2 Creating the reference sequence file in Assign

• Go to Edit | Settings | References

• Click Import GenBank and browse to the saved GenBank text file

• Enter a sequence Reference Name (This describes the sequence. The default will be “Unknown”)

• Enter a file name in File (This will be the name of the reference sequence file. The default will be “Unknown”)

Explanation

Reference Sequence: The reference sequence is the sequence to which test sequences are aligned. Reference sequences in Assign can be made by importing GenBank information or a text sequence in Fasta format. A Fasta file containing sequences of variants or alleles can also be imported. The reference sequence contains the sequence annotation information, including the location of genetic structures such as exons and untranslated regions.


In this example we have imported the GenBank sequence AF033819 (Full genome HIV sequence)

The Reference Name and File name have been entered manually. The accession number (as a Comment) and the Version have been entered automatically.

The main window contains the sequence regions as defined by the GenBank entry. These regions will be shown in the main window of ATF

After importing the GenBank reference click Update Reference to save the information and Done to exit.

5.4.2 Creating a new reference from text files in Fasta format

A reference sequence can be created from a single sequence, or from multiple sequence variants of the same gene, in Fasta format. Fasta format is characterised by a “>” sign followed by the sequence name on the first line and the sequence on the next line.

For example:

>Sequence 1

ACGTCGATCAGTACAGCTTTCTGACGATCCAGTTAGGGATCACCCAGACCC…

>Sequence 2

ACGTCGATCCGTACAGCTTTCTGACGATCCAGTTAGGGATCACCCAGACCC… etc.

Important Notes:

The sequence file must have a .fsta extension

If you have sequences for multiple variants and you wish to compare test sequences against the sequences of the variants (genotyping), ensure all sequences are in a single file and that every sequence in this file is in FASTA format.
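Before importing, a multi-sequence file can be checked with a small Python parser such as the one below. It is a sketch under the assumption that records follow the format shown above (a “>” name line followed by one or more sequence lines); it is not how ATF reads FASTA files, and `parse_fasta` is a hypothetical name. Rejecting duplicate names is an added safeguard for allele libraries.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Return {sequence name: sequence} for every '>' record in FASTA text."""
    records: dict[str, str] = {}
    name = None
    chunks: list[str] = []

    def store() -> None:
        if name in records:
            raise ValueError(f"duplicate sequence name: {name}")
        records[name] = "".join(chunks).upper()

    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                store()
            name, chunks = line[1:].strip(), []
        elif name is None:
            raise ValueError("sequence data found before the first '>' line")
        else:
            chunks.append(line.replace(" ", ""))  # sequences may wrap over several lines
    if name is not None:
        store()
    return records


library = parse_fasta(">Sequence 1\nACGTCGATCA\nGTACAG\n>Sequence 2\nacgtcgatcc\n")
print(library["Sequence 1"])  # ACGTCGATCAGTACAG
print(len(library))           # 2
```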

• Go to Settings | References

• Enter the name of the reference sequence in the Reference Name box.

This is usually descriptive and can contain detailed information about the reference sequence.

• Enter the name of the file that you wish to save this reference sequence in File. This is usually a short name.

• Click on the FASTA file button. This will launch a file search dialogue.

Browse to the FASTA file that contains your reference sequence.

• Additional information regarding the reference sequence can be entered in Comments

• The Version box can be used to distinguish between multiple versions of a reference sequence or allele library

• Once imported, click Update References to save and Done to exit

Additional Functions of the Reference Window

5.4.3 Annotating the Reference Sequence: Editing or Adding Reference sequence information

Explanation: A reference sequence can include a gene or genes consisting of a number of exons and other important genetic regions. Each of these is a region. Annotating the sequence to include these regions is performed in Settings | References.


• Go to Edit | Settings | References

• Click Load, browse to C:\Program Files\Conexio Genomics\Assign400\Data\References and select the reference file

• The main window contains the sequence annotation details for the item selected in the annotation menu

• Reference sequence details can be edited or entered in the Annotation Editor

The Annotation Window

Selecting Regions enables the different regions within the reference sequence to be annotated. Regions can overlap.

Selecting Trim enables sequencing or PCR primer locations at the beginning of Regions to be excluded from the analysis

Selecting Coding Groups enables coding regions to be annotated. A Coding Group can be a single region or consist of several linked regions.

Selecting Primers enables sequencing primers that enable haplotype specific sequencing from a diploid PCR product to be defined

Selecting Variants enables known sequence variants to be added to the reference. This is helpful for checking base calling at important regions and allows the sequence at these positions to be reported


• To add regions, choose Regions from the Show drop down menu

• Enter the name of the region in the box above the Show drop-down menu, e.g. 5UTR

• Enter the Region Start position in Start box and the Region end position in the End box. (Number the regions so that base 1 is the first base of the reference sequence)

• Click on Add/Update. Perform this process for all regions.

• Importing GenBank entries may result in many redundant and unneeded regions. Several regions can be removed at once by typing the first few letters of the regions to be deleted in the left-hand box in the Annotation Editor and clicking all.

• Once all regions have been edited, click Update Reference to save the information and Done to exit the window

• To number the 5’ UTR with negative numbers counting back from the start codon, enter the appropriate Start Base and click Update Reference. To view the alternative numbering systems, select With Offset or No Offset in Numbering
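The position arithmetic behind Start/End and the With Offset numbering can be sketched in Python. The sketch assumes regions are 1-based and inclusive (base 1 is the first base of the reference sequence), and that offset numbering has no position 0, so the base immediately before the Start Base is numbered -1 (the convention used in HGVS coding numbering). ATF’s exact offset convention is not stated here, and the function names are hypothetical.

```python
def region_sequence(reference: str, start: int, end: int) -> str:
    """Return the bases from 1-based position start to end, inclusive."""
    if not 1 <= start <= end <= len(reference):
        raise ValueError("region lies outside the reference sequence")
    return reference[start - 1 : end]


def offset_number(position: int, start_base: int) -> int:
    """Renumber a 1-based reference position so that start_base becomes +1.

    Assumption: there is no position 0, so the base before start_base is -1.
    """
    if position >= start_base:
        return position - start_base + 1
    return position - start_base


reference = "GGCCATGAAA"                 # made-up sequence; the start codon ATG is at 5-7
print(region_sequence(reference, 5, 7))  # ATG
print(offset_number(5, 5), offset_number(4, 5), offset_number(1, 5))  # 1 -1 -4
```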

5.4.4 Creating Coding Groups

Explanation: Once the regions (eg Exons) have been annotated in the reference sequence, common regions can be grouped to create a continuous string of sequence.

For example exons can be grouped to form the coding sequence. This information is incorporated into variant reports to identify if variants result in amino acid changes


• Select Coding Groups from the Show drop down menu

• Enter the name of a new coding group. CDS is used in the example above.

• Enter the start base (1).

• Select the regions to be added from the Members drop down menu (exon1)

• If this is a coding region, select Yes from the Coding drop down menu

• If the coding region is in the 3’-5’ (reverse) orientation of the sequence select Yes from the Reverse drop down menu.

• Click Add/Update to register the changes

• To add more regions to the CDS Coding Group, select CDS from the drop down menu and then select exon2 from the Members drop down menu.

• Click Add/Update.

• Continue until all members of the coding group have been added.

• Click Update Reference to save the changes to disk.

• Click Done to exit.
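The effect of a Coding Group can be pictured as joining its member regions into one continuous sequence that is then read in codons. The sketch below is not Assign-ATF's code; it assumes member regions are joined in reference order, that codon 1 starts at base 1 of the joined sequence, and that Reverse = Yes means the joined sequence is reverse-complemented. The names `Region`, `coding_group_sequence` and `codons`, and the toy reference, are hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Complement table used to reverse-complement reverse-orientation coding groups.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


@dataclass
class Region:
    name: str
    start: int  # 1-based, inclusive (value of the Start box)
    end: int    # 1-based, inclusive (value of the End box)


def coding_group_sequence(reference: str, members: List[Region], reverse: bool = False) -> str:
    """Concatenate member regions in reference order; reverse-complement if Reverse = Yes."""
    ordered = sorted(members, key=lambda r: r.start)
    seq = "".join(reference[r.start - 1:r.end] for r in ordered)
    if reverse:
        seq = seq.translate(COMPLEMENT)[::-1]
    return seq


def codons(seq: str) -> List[str]:
    """Split a coding sequence into complete codons, counting from base 1."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


reference = "GGATGAAATTTTCCCTAGGG"  # toy reference: two 'exons' separated by an 'intron'
exon1 = Region("exon1", 3, 8)
exon2 = Region("exon2", 13, 18)
cds = coding_group_sequence(reference, [exon2, exon1])
print(cds, codons(cds))
```

Sorting by start position means members can be added in any order, which mirrors adding exon1 and then exon2 one at a time in the steps above.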

5.4.5 Adding the location of known variants to the reference sequence

• Select Variants from the Show drop down menu.

• Enter the Position of the variant in the reference sequence (28)

• Select the variant nucleotide from the drop down menu under Variant (G)

Explanation: The sequence of known variants can be included in the reference sequence. The positions are highlighted on the electropherograms and also the sequences at these positions are highlighted on the reports.

• Enter the Length of the variant; this is greater than 1 only for multi-base insertion or deletion variants (1)

• Enter the Class of the variant.

• Click Add/Update. Add additional variants if required.

• Click Update Reference to save.

5.4.6 Removing primer site sequence from analysis using Trim

In this example the 5’ PCR amplification primer is 23 bases long and is located at the beginning of the 5’UTR. This region is excluded from analysis by “trimming” the length of the PCR amplification primer region.

• Select Trim from the Show drop down menu

• Select the region to be trimmed from the Trim region drop down menu (5UTR)

• Enter the number of bases to be trimmed from the Start (23)

• Click Add/Update to register the changes

• Click Update Reference to save the changes to disk

Explanation: The operator can choose not to analyse sequence at amplification primer sites if these sequences are included in the reference sequence. Currently this function only allows sequence to be removed at the beginning or end of regions.
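Trimming can be thought of as shrinking a region's analysed interval from its start and/or end. The sketch below is a hypothetical illustration, not ATF's implementation; the 5UTR coordinates (1 to 120) are invented, and only the 23-base trim comes from the example above.

```python
from typing import Optional, Tuple


def trimmed_interval(start: int, end: int, trim_start: int = 0,
                     trim_end: int = 0) -> Optional[Tuple[int, int]]:
    """Return the 1-based inclusive interval left for analysis after trimming.

    start/end: the region's Start and End positions in the reference.
    trim_start/trim_end: bases removed from the beginning/end of the region,
    e.g. the length of a PCR amplification primer.
    Returns None if trimming removes the whole region.
    """
    if trim_start < 0 or trim_end < 0 or start > end:
        raise ValueError("invalid region or trim length")
    new_start = start + trim_start
    new_end = end - trim_end
    return (new_start, new_end) if new_start <= new_end else None


# Hypothetical 5UTR spanning reference bases 1-120, trimmed by a 23-base primer.
print(trimmed_interval(1, 120, trim_start=23))  # (24, 120)
```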


5.4.7 Haplotype specific sequencing of diploid template using haplotype specific sequencing primers

• Select the reference sequence to be edited by clicking Load and browsing to the appropriate reference .xml file (usually located in C:\Program Files\Conexio Genomics\Assign\Reference)

• Enter a name for the haploid sequencing primer in the Primer Name drop down menu (ABCD_hap)

• Enter, in the Start and End boxes, the location within the reference sequence of the polymorphism to which the sequencing primer is specific. A single nucleotide can be used, in which case the Start and End positions are the same. In the example, 3 nucleotides at the 3’ end of the sequencing primer are used (Start: 195, End: 197)

• Enter the sequence that is located between the allocated Start and End positions (CGC)

• Select Master to indicate that haploid sequencing is being performed on the primary diploid PCR product.

Explanation: Assign-ATF contains a unique feature that enables DNA sequencing to be used to identify the haplotypes on which two or more polymorphisms are located. This function works best for genotyping applications where a test sequence is compared with a library of sequences of known variants.

This process requires an additional haplotype specific sequencing primer (HSP) with specificity for one of the nucleotides at a heterozygous position. The HSP must then sequence through a neighbouring heterozygous position. Identification of the nucleotide at the neighbouring heterozygous position(s) enables the haplotype to be identified.

The operator is required to enter the haploid sequencing primer information in Primers in the Show drop down menu.
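To make the phasing logic in the explanation concrete, here is a minimal sketch (not Assign-ATF's code). It assumes two heterozygous positions, an HSP specific for one allele at the first position, and a clean single-base read from the HSP at the neighbouring position. The function name and the example genotypes are hypothetical.

```python
from typing import FrozenSet, Tuple


def phase_with_hsp(het1: Tuple[str, str], het2: Tuple[str, str],
                   hsp_base: str, hsp_read: str) -> FrozenSet[Tuple[str, str]]:
    """Infer the two haplotypes across two heterozygous positions.

    het1, het2: the two alleles at the primer-site position and at the
    neighbouring position, taken from the diploid sequence.
    hsp_base: the allele at position 1 that the HSP is specific for.
    hsp_read: the single base the HSP reads at position 2.
    """
    if het1[0] == het1[1] or het2[0] == het2[1]:
        raise ValueError("both positions must be heterozygous")
    if hsp_base not in het1 or hsp_read not in het2:
        raise ValueError("HSP data inconsistent with the diploid genotype")
    # The HSP extends from only one haplotype, so the base it reads is in cis with hsp_base.
    other1 = het1[1] if het1[0] == hsp_base else het1[0]
    other2 = het2[1] if het2[0] == hsp_read else het2[0]
    # The remaining alleles must form the other haplotype.
    return frozenset({(hsp_base, hsp_read), (other1, other2)})


# Diploid genotype A/G at position 1 and C/T at position 2;
# an HSP specific for G reads T at position 2.
print(sorted(phase_with_hsp(("A", "G"), ("C", "T"), hsp_base="G", hsp_read="T")))
```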


• Enter an Alias. This is a name included in the sequence filename that enables the software to recognise the sequence file as coming from a haploid sequencing primer (hap1)

• The Limit box sets the limit (extent) of the sequencing reaction. It is intended for predicting suitable haplotype-specific sequencing primers for prospective haplotype sequencing: the Limit acts as a filter so that no primer is predicted if the other polymorphism is too far away from the sequencing primer. THIS FUNCTION IS NOT YET AVAILABLE.

• Once complete click Add/Update

• Click Update Reference

• Click Done

5.4.8 Additional Reference Sequence Functions

• Max Deletion trims the EPG of a heterozygous insertion/deletion (indel) after the entered number of positions. Indel sequence is particularly hard to base call because the quality deteriorates, and miscalled bases in indel sequence make accurate analysis difficult.

• Subreference can be used to create a new reference sequence from a region within the reference sequence. For example a subreference that includes a single exon can be created from an existing reference if the exon is nominated as a region within the existing sequence.

• Trim by Regions only allows analysis of EPG in defined regions. This enables efficient analysis of sequence where the amplification and sequencing primers are located in introns but the reference sequence is a cDNA sequence with defined exons (regions).

5.5 Settings Appendix

5.5.1 Picket Fences (See

Picket Fence (PF) analysis is a novel approach to sequence analysis that improves heterozygous base calling and increases the number of applications of DNA sequencing. PF analysis can only be performed on re-sequencing data. Ideally, homozygous peaks are all the same height and heterozygous peaks are 50% of the homozygous peak height. In practice this is not the case, because dideoxynucleotide incorporation rates vary from position to position.

Despite this variability between positions within a sequence, the incorporation rate at any one position is highly reproducible between different samples. As a result, a homozygous base at any position within a sequence has a predictable peak height. PF analysis presents the peak heights of an electropherogram relative to the expected homozygous peak height, so that homozygous peaks are usually the same height and heterozygous peaks are about 50% of the homozygous peak height.

Base calling is then performed on these normalised data, increasing heterozygous base-calling accuracy.
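The normalisation idea can be sketched for a single position as follows. This is an illustration under stated assumptions, not Assign's algorithm: the expected homozygous heights per base are invented values (in practice they would come from previously analysed samples), and the 0.25 calling threshold and the function names are hypothetical.

```python
from typing import Dict

# IUPAC codes for one or two called bases.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AC"): "M", frozenset("AG"): "R", frozenset("AT"): "W",
    frozenset("CG"): "S", frozenset("CT"): "Y", frozenset("GT"): "K",
}


def normalise(observed: Dict[str, float], expected: Dict[str, float]) -> Dict[str, float]:
    """Express each base's peak height relative to its expected homozygous height."""
    return {base: observed.get(base, 0.0) / expected[base] for base in expected}


def pf_call(observed: Dict[str, float], expected: Dict[str, float],
            min_fraction: float = 0.25) -> str:
    """Call a position from normalised heights: ~1.0 is homozygous, ~0.5 each is heterozygous."""
    relative = normalise(observed, expected)
    called = frozenset(base for base, r in relative.items() if r >= min_fraction)
    return IUPAC.get(called, "N")  # N: nothing called, or more than two bases


# Hypothetical expected homozygous heights at this position (A incorporates strongly, C weakly).
expected = {"A": 800.0, "C": 400.0, "G": 600.0, "T": 500.0}
print(pf_call({"A": 400.0, "C": 200.0}, expected))  # M
print(pf_call({"A": 800.0, "C": 20.0}, expected))   # A
```

In the first call the raw A peak is twice the C peak, yet both are 50% of their expected homozygous height, so the position is called as an A/C heterozygote.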

[Figure: Conventional Electropherogram Analysis]

5.5.2 Sequence Electropherogram Quality: The Base Call Score (BCS)

The Base Call Score (BCS) is the basic unit of Assign’s quality assessment system. The BCS reflects the integrity of the peak shape, the background and the separation from neighbouring peaks. A perfect peak has a BCS of 50.

The BCS of a consensus sequence accumulates the BCS of the electropherogram base calls that make up the consensus. Because the BCS does not penalise heterozygous base calls, the mean BCS and the variability of the BCS between positions are markers of sequence quality for a sequence electropherogram or a sample.

Assign uses shades from white to red to indicate the BCS of a sequence position, an electropherogram or a sample. Boxes above the sequence base calls indicate the BCS of each base call. Shading of the sample ID indicates the mean BCS of the sample, and shading of the electropherogram map above the sequence indicates the mean BCS of the electropherogram.
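The quality markers described above (mean BCS, its variability, and white-to-red shading) can be sketched as below. Only the maximum BCS of 50 comes from the text; the use of the population standard deviation and the linear colour mapping are assumptions for illustration, and the function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import Sequence, Tuple

MAX_BCS = 50  # a perfect peak has a BCS of 50


def bcs_summary(scores: Sequence[float]) -> Tuple[float, float]:
    """Mean and population standard deviation of BCS values across positions."""
    return mean(scores), pstdev(scores)


def bcs_shade(score: float) -> Tuple[int, int, int]:
    """Hypothetical linear white-to-red RGB shading: BCS 50 -> white, BCS 0 -> red."""
    fraction = max(0.0, min(1.0, score / MAX_BCS))
    level = round(255 * fraction)
    return (255, level, level)


m, sd = bcs_summary([50, 50, 40, 40])
print(m, sd, bcs_shade(m))
```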

[Figure: Picket Fence Analysis]

6 Importing Sequences into Assign for Analysis

6.1 Importing sequences is performed by selecting File | Import

6.2 Importing Sequences by Directory

• Browse to the directory or directories

• Check the Import All Subdirectories box if all subdirectories are to be imported.

• Click on Go

Explanation.

Both text sequences and electropherograms can be imported into Assign-ATF, either as individual files or by directory, including subdirectories. Importing sequences by directory enables high-throughput analysis. Filters can be applied to import specific sequences; for example, all sequences with the same sample name can be imported if they exist within the selected folders. This is useful for comparing sequences from the same individual over time or for importing sequences from different loci for the same patient.

6.3 Importing Sequences Individually

This function also enables multiple sequences from a directory to be imported

• Click Select Files Manually

• Browse to the directory that contains the sequences

• Select the sequence you wish to import by double clicking. To import multiple sequences use the Shift or Alt keys and click to select the sequences, then double click to import.

6.4 Importing Sequences Using the Filter Function

• Proceed as described for importing sequences by directory (See above)

• To filter by name, enter the sample name in Filters:Name (the sample names must be identical in the part of the filename that the naming settings define as the sample identifier)

• To filter by locus, enter the locus code in Filters:Code

• To filter by primer, select the primer name

Only sequences that match the selected filters will be imported.
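Filtering relies on the naming convention set in Edit | Settings | Naming. The sketch below shows the idea with a hypothetical convention, SAMPLE_LOCUS_PRIMER separated by "_"; real filenames, delimiters and field positions will differ, and the function names and example files are invented.

```python
from pathlib import Path
from typing import Dict, Iterable, List, Optional


def parse_name(filename: str, delimiter: str = "_") -> Optional[Dict[str, str]]:
    """Split a filename stem into sample name, locus code and primer (hypothetical layout)."""
    fields = Path(filename).stem.split(delimiter)
    if len(fields) < 3:
        return None  # does not follow the assumed naming convention
    return {"name": fields[0], "code": fields[1], "primer": fields[2]}


def filter_files(filenames: Iterable[str], name: Optional[str] = None,
                 code: Optional[str] = None, primer: Optional[str] = None,
                 delimiter: str = "_") -> List[str]:
    """Keep only files whose fields match every filter that has been set."""
    wanted = {"name": name, "code": code, "primer": primer}
    selected = []
    for filename in filenames:
        fields = parse_name(filename, delimiter)
        if fields is None:
            continue
        if all(value is None or fields[key] == value for key, value in wanted.items()):
            selected.append(filename)
    return selected


files = ["S01_HLAA_ex2F.ab1", "S01_HLAA_ex2R.ab1", "S02_HLAA_ex2F.ab1", "S01_HLAB_ex3F.ab1"]
print(filter_files(files, name="S01", code="HLAA"))
```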

7 Sequence Analysis and Editing

7.1 The Analysis Screen

Explanation

The Analysis Screen comprises 3 main panes: the sample IDs, the electropherogram data and the mismatches with the reference sequence. Sequence electropherograms can be viewed and edited, and edits update the result screen in real time.

7.1.1 The Sample Pane

Each box is a different sample.

The sample highlighted in blue is the active sample. The active electropherogram and the genotype result relate to this sample.

Moving between samples is performed using the up and down arrow keys on the keyboard, the up and down arrows on the Navigator (see below), or by clicking the appropriate sample.

Moving between samples simultaneously updates the electropherogram and genotype panes.

The white-to-red shading indicates the quality of all sequences for the sample. The redder the box, the poorer the quality of the sequences for this sample.

The green boxes control which samples are included in the report. Checking a green box removes the sample from the report and turns the box orange.

The orange and yellow boxes indicate QC status: orange boxes indicate a QC warning, whereas yellow boxes indicate no QC warning.

Boxes headed 1, 2 and R refer to the operator-level review. Checking a box indicates sign-off by a reviewer of the corresponding level of authority: Box 1 is for the first reviewer and Box 2 is for the second reviewer.

7.1.2 The Electropherogram (EPG) Pane

The EPG pane displays the following elements:

• Position within the reference sequence

• Consensus sequence and quality indicator

• Electropherogram sequence and quality indicator

• Highlighted positions: positions that differ between the test sequence and the reference sequence, autoedited positions, manually edited positions, confirmed base calls and manually selected variant positions

• Electropherogram, including the sequence filename and signal intensity

• Active sequence position

• Sequenced regions

• Annotation

• Differences v reference

7.1.3 Moving Through the Sequence EPG

Moving through the sequence EPG is performed by:

• Clicking on the EPG pane and using the arrow keys on the keyboard

• Using the Navigator (see below)

• Clicking on the Bus and dragging it across the sequence

7.1.4 The Navigator

The Navigator enables sequence editing, moving between samples and moving between positions within a sequence.

7.1.4.1 Editing using the Navigator Keypad

Base calls are edited on the keypad. The bases present at the active sequence position are highlighted on the keypad.

Editing is performed by selecting non-highlighted bases or unselecting highlighted bases. In this example (a heterozygous A/C position, IUPAC code M), if A is the correct base, clicking C (unselecting the C base call) also unselects M. The + and – keys allow insertions (+) or deletions (–) to be included or removed.

7.1.4.2 Priority Editing

Priority Editing enables examination of base calls at positions:

• that have a low BCS by selecting the BCS box

• that have been edited by selecting the Edits box

• that are mismatched with any of the allele combinations in the results table by selecting the MM box


Moving to positions for Priority Editing is performed by selecting the corresponding Navigator buttons. Further pairs of Navigator buttons:

• move the Bus left or right by one position

• move to the sample above or below

• move the Bus to the beginning or to the end of the sequence

The Master drop down menu enables the operator to move between the diploid sequence and the haploid sequences when haplotype-specific sequencing is used.

The Exon 3 (example only) drop down menu enables the operator to move to different regions within the sequence. This menu includes all regions annotated for the reference sequence.

Moving to specific codons and positions within the sequence: typing 163.1 and clicking the arrow button moves the active position to codon 163, position 1.

Moving to positions that are mismatched with the library or reference sequence: selecting the drop down menu containing 487 lists the positions that are mismatched with the reference sequence or library. Selecting any of these positions moves the active position to that location so that the base call can be confirmed.

Clicking OK confirms the base call and moves to the next Priority Editing position.
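Navigating by codon (e.g. 163.1) implies a conversion from codon and position to a nucleotide in the coding group, and from there to a reference position. The sketch below is a hypothetical illustration that assumes codon 1 starts at base 1 of a forward-orientation coding group; the function names and the toy exon coordinates are invented.

```python
from typing import List, Tuple


def parse_codon_position(text: str) -> Tuple[int, int]:
    """Parse navigator-style input such as '163.1' into (codon, position in codon)."""
    codon_str, pos_str = text.split(".")
    codon, pos = int(codon_str), int(pos_str)
    if codon < 1 or pos not in (1, 2, 3):
        raise ValueError("codon must be >= 1 and position must be 1, 2 or 3")
    return codon, pos


def codon_to_cds_index(codon: int, pos: int) -> int:
    """1-based nucleotide index within the coding sequence."""
    return (codon - 1) * 3 + pos


def cds_index_to_reference(cds_index: int, members: List[Tuple[int, int]]) -> int:
    """Map a CDS index to a 1-based reference position via forward-orientation member regions."""
    remaining = cds_index
    for start, end in sorted(members):
        length = end - start + 1
        if remaining <= length:
            return start + remaining - 1
        remaining -= length
    raise ValueError("index lies beyond the coding group")


codon, pos = parse_codon_position("163.1")
print(codon_to_cds_index(codon, pos))  # 487
```

For example, with hypothetical exons at reference bases 3 to 8 and 13 to 18, CDS base 7 (codon 3, position 1) falls on reference base 13, the first base of the second exon.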


7.2 Additional Sample and Sequence Editing Functions

7.2.1 Editing in the Sample Pane

Show Comments – provides a quality report for a sample with poor consensus quality, in the format shown below

Edit Comments – enables comments regarding the sample to be entered

Reanalyse – removes all sequence edits for all samples and reanalyses the data. Use this if the analysis settings have been changed after the EPGs were imported

Add New Samples – enables new samples to be added

Remove Sample – enables samples to be removed

Remove All – enables all samples to be removed

Autoedit – runs the Autoedit function if it has not already been selected in Settings

Add Sequences – Enables the addition of the test sequence to the library of allele sequences

Add All Sequence – Enables the addition of all sequences in the layout to the sequence library

Update Reference – Enables the sequence of the active sample to be used as the reference sequence and all samples within the layout to be reanalysed against the new reference sequence.

Explanation. Right-clicking on a sample provides additional editing functions.

7.2.2 Editing in the EPG pane

Selecting:

Set Start Base trims sequence from the EPG to the left of the cursor

Set End Base trims sequence from the EPG to the right of the cursor

Show Warnings issues a quality warning

Autoedit autoedits the sequence

Less sensitivity reanalyses the EPG at reduced detection sensitivity. This function is very effective for improving the base-call accuracy of data with high background

Reanalyse re-imports and reanalyses the EPG. It can also be used to replace an incorrectly trimmed EPG

More sensitivity reanalyses the EPG at increased detection sensitivity. This function is useful for accurately detecting low-level mutations in data that are free of background

Remove EPG completely removes the EPG

Add Variant adds a variant to a report

Add All Variants adds all variants for an EPG to a report

Add Sequences adds the sequence to the library

Explanation. Right-clicking in the EPG field provides additional EPG editing functions.

7.3 Additional Functions

7.3.1 Zooming the EPG

The EPG can be enlarged by pressing the Shift key together with the up/down or left/right arrow keys on the keyboard.

7.3.2 Hiding the EPG

Pressing the Shift key together with the letter of one of the four bases removes the trace for that base from the EPG. Repeating this procedure restores the trace. This function is useful when heterozygous peaks are perfectly overlaid and the base call requires confirmation.

7.3.3 Expanding the EPG Window

Clicking and dragging a pane boundary expands or contracts the EPG view.

7.4 Sequence View Options


7.4.1 Select Consensus to view the sample text sequences

• Each text sequence corresponds to the sample in the Sample Pane

• Sequences that are different from the reference sequences are highlighted

Explanation. The View option allows switching between the EPG and the aligned text sequence of each sample. Viewing the sample text sequences enables high-throughput SNP screening.

7.4.2 Select Dots after selecting consensus to view only those sequences that differ from the reference sequence

7.4.3 Select Quality to view the sample text sequences and BCS

• Each text sequence corresponds to the sample in the Sample Pane

• Colour coding indicates quality and the BCS. White is good quality and high BCS, red is poor quality and low BCS.

7.4.3.1 Suggested Applications for Consensus Sequence and Quality View

Assign’s ability to import thousands of sequences, its accurate and novel approach to base calling, and the simple switch between EPG and sample text sequence simplify high-throughput SNP screening.

7.4.4 Select Alignment to view the sequences of the best matched alleles

• The sequence of the selected sample (blue in the sample pane) is always at the top of the list and contains highlighted sequence that differs from the reference sequence

• All sequences below the sample sequence are the sequences of the corresponding allele combinations in the results pane

• Highlighted positions in the allele combination sequences mark differences between that allele combination and the selected sample sequence

8 Reports

Explanation. Assign provides 4 report formats:

1) A variant report for applications where a test sequence is compared with a single reference sequence

2) A genotype report for genotyping applications, where a sample sequence is matched against a library of known sequences

3) A FASTA report that provides a FASTA file of sequences from all samples in the Assign layout

4) A Quality (BCS) report that enables a quality control analysis of samples within the Assign layout and of all layouts within a specific directory

Contact us regarding custom reports.

8.1 Variant Report

Selecting:

Output Filters and Numbering: enables selection of Samples, Locus, Layer (Haploid or diploid), sequence Group or sequence Region

Nuc or Codon: enables variants to be reported as nucleotides or codons.

Flanking Sequence 3’: lists the nominated number of nucleotides that lie 3’ to the variant sequence

Flanking Sequence 5’: lists the nominated number of nucleotides that lie 5’ to the variant sequence

Select Variants: User Defined reports sequence at positions defined by the operator

Select Variants: Observed reports any sequence differences between the sample sequence and the reference sequence

Select Variants: All alleles reports the variants between the test sequence and the sequences of all alleles in the database

Options: BCS includes the BCS quality control values

Options: Audit includes a detailed reviewer audit report

Output Type: enables vertical or horizontal listing of variants

Output Format: Excel produces a report in an Excel worksheet

Output Format: XML produces a report in XML format

Output Format: Text produces a text file report


8.2 Genotype Report

Selecting:

Filters: Sample enables specific samples to be selected

Filters: Locus enables samples from specific loci within a layout to be reported (Note: ATF can analyse sequences from more than one locus)

Sort by: Locus lists the reports by locus

Sort by: Name lists the reports by sample name

Full Report:

Sample: Match Summary lists the best-matched alleles

Sample: Auditing includes the edit audit in the report

Layers: Electropherogram List lists the EPG sequence files analysed for the sample

Layers: Sequences Produces a sequence report

Layers: Edits includes the manual and autoedits performed during analysis


Layers: Mismatch List shows the mismatch nucleotide information of the closest matched sequences with the libraries in a list

Summary Options: NMDP lists the NMDP code associated with the genotype

Summary Options: HARPS lists heterozygous ambiguity resolving primers (HARPS) that can be used to resolve the reported ambiguity

Summary Options: Full+Part lists the proportions of the full and partially complete library sequence

Summary Options: Differences indicates where the differences lie in the match summary

Audit Options: Save indicates the date and the user who saved the reported layout (Auditing must be selected from the Sample Summary drop down menu)

Audit Options: Confirm lists the edits confirmed by the user

Mismatch Limits enables alleles to be reported up to the nominated number of mismatches between the sample sequence and the library sequences.

Simple List: lists the alleles as a string of text without the summary information

Table: Alleles lists the alleles only, without summary information

Output Format: Excel produces a report in an Excel worksheet

Output Format: XML produces a report in XML format

Output Format: Text produces a text file report

Output Format: Japanese generates the report in Japanese

Output Formats: Page Breaks includes a page break between samples

Additional Information enables additional comments to be added to the report
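The Mismatch Limits option can be pictured as counting, for each pair of library alleles, the positions that the pair cannot explain, and keeping pairs at or under the limit. The sketch below is a simplified illustration, not Assign's matching engine: it assumes pre-aligned sequences of equal length, heterozygous calls written as IUPAC codes, and no-calls (N) ignored. The allele names and sequences are invented toy data.

```python
from itertools import combinations_with_replacement
from typing import Dict, List, Tuple

# Bases represented by each IUPAC code used for homozygous and heterozygous calls.
IUPAC_BASES = {"A": "A", "C": "C", "G": "G", "T": "T",
               "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT"}


def mismatches(sample: str, allele1: str, allele2: str) -> int:
    """Count positions where an allele combination does not explain the sample call."""
    if not len(sample) == len(allele1) == len(allele2):
        raise ValueError("sequences must be aligned to the same length")
    count = 0
    for call, a, b in zip(sample, allele1, allele2):
        if call == "N":
            continue  # no call at this position: not counted
        if set(IUPAC_BASES[call]) != {a, b}:
            count += 1
    return count


def matching_genotypes(sample: str, library: Dict[str, str],
                       limit: int) -> List[Tuple[int, str, str]]:
    """All allele pairs with at most `limit` mismatches, fewest mismatches first."""
    results = []
    for (name1, seq1), (name2, seq2) in combinations_with_replacement(sorted(library.items()), 2):
        count = mismatches(sample, seq1, seq2)
        if count <= limit:
            results.append((count, name1, name2))
    return sorted(results)


library = {"A*01": "ACGT", "A*02": "ACTT", "A*03": "GCGT"}  # invented toy alleles
print(matching_genotypes("ACKT", library, limit=1))
```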


8.3 HARPS report

Selecting:

Output Filters and Numbering: enables selection of Samples and Locus

Output Format: Text produces a text file report

Output Format: Excel produces a report in an Excel worksheet

Output Format: XML produces a report in XML format

8.4 FASTA Report

Selecting:

Output Filters and Numbering: enables selection of Samples, Locus, Layer (Haploid or diploid), sequence Group or sequence Region

Sort by: Locus lists the reports by locus

Sort by: Name lists the reports by sample name

Options: Pad Ends appends dashes (---) to the end of shorter sequences so that all sequences are the same length.

Options: Separate Files produces separate files by sample name or locus
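The Pad Ends option can be illustrated with a small FASTA writer that pads every sequence at its end with "-" up to the length of the longest one. This is a sketch only; the record names and the 60-column line wrapping are assumptions, not a description of ATF's output.

```python
from typing import List, Tuple


def to_fasta(records: List[Tuple[str, str]], pad_ends: bool = True, width: int = 60) -> str:
    """Write (name, sequence) records as FASTA text, optionally padding ends with '-'."""
    if not records:
        return ""
    target = max(len(seq) for _, seq in records)
    lines = []
    for name, seq in records:
        if pad_ends:
            seq = seq + "-" * (target - len(seq))  # pad at the end of the sequence only
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


print(to_fasta([("S01", "ACGT"), ("S02", "AC")]), end="")
```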


8.5 Quality Report

The Quality Report section enables the generation of longitudinal quality control plots. The mean BCS and standard deviation for an EPG form a quality score for that EPG; similarly, the mean BCS and standard deviation for a sample form a quality score for the sample. Plotting the BCS for all samples is an effective way of performing sequencing quality control. Assign can perform a quality analysis on all saved layouts within a Selected Folder, or on specific saved projects using Get Projects. This enables Assign to be used as stand-alone sequencing quality control software.

The quality report produces an Excel file containing worksheets with quality data.

The Data worksheet contains a spreadsheet with the quality information from which the quality graphs are produced.

The BCS Distribution worksheet contains a plot of the frequency of BCS values at all positions of the consensus sequences from all samples within the layout. This worksheet also contains the frequency of edits for each BCS value.

The BCS Means worksheet contains a plot of the mean BCS for each sample and also includes the number of edits performed for each sample.

The Signal Strength worksheet contains a plot of the signal strength for each sample within the layout, plotted with the mean BCS.

9 Saving, Opening and Printing Layouts

• Assign-ATF layouts are saved using File | Save

• To open layouts go to File | Open and browse to the layout. LAYOUTS CANNOT BE OPENED BY CLICKING ON THE SAVED LAYOUT

• Assign-ATF layouts, including the EPG, can be printed using File | Print

Explanation: Assign-ATF layouts are saved with links to the EPGs. EPGs are not saved as part of the layout, in order to keep the file size as small as possible; they are imported back into the layout when the layout is opened.

10 FAQ

Q. Why don’t all sequences from a sample appear together as part of the same active sample?

A. This is usually because the software has not been set up to uniquely identify the sample name within the sequence filename. Review section 5.2.

Q. Base calling has been cut off, but I can still see good-quality sequence. Why?

A. The software has trimmed the base-calling region based on quality. To force base calling at trimmed positions, go to View and check that View Unaligned is selected. Then move the mouse to the electropherogram field, right click at a position that includes the region to be analysed, and select Trim Right (if the mouse is to the right of the unanalysed sequence) or Trim Left (if the mouse is to the left of the unanalysed sequence).

Q. How do I undo a mistakenly trimmed sequence?

A. There is no “Undo” function as such. Right click on the electropherogram and select “Reanalyze”. Note that the electropherogram is re-imported and all edits will be lost. The “Reanalyze” function can also be performed on all electropherograms or all samples within a layout; this is usually done if the settings are changed after sequences have been imported.

Q. Why is there no information in the Results pane for a sample?

A. This is usually because the number of mismatches between the test sample and the reference sequence is too high, as a result of poor-quality sequence or data from an insertion/deletion (indel) heterozygote. Check the sequence quality of the electropherograms for the sample and remove or trim poor-quality sequence. If an indel is present, trim the electropherogram to about 30 bases past the position of the indel.

Q. The electropherogram indicates the presence of an indel, but it has not been reported as a variant in the “Reports” menu. Why?

A. ATF can calculate the deleted sequence within an indel. However, distinguishing an indel from poor-quality sequence is difficult for the software, particularly if the sequence quality is poor. If the software does not calculate the indel, go to the electropherogram and remove the sequence by right clicking between 20 and 30 bases past the indel and trimming the electropherogram.

Q. The electropherogram contains the correct base call at a variant site, but it has been changed in the consensus sequence. Why?

A. ATF has an “Autoedit” function that comes into play when the sequence quality is poor. In this case the software performs a base call with a significant bias towards what is “expected”. The autoedit function greatly assists base calling when ATF is used for genotyping highly polymorphic genes. It can be turned off in Settings | Engine; see section 5.3.

11 Contact Us

For questions, comments, suggestions or complaints, contact us at:

conexio@iinet.net.au Tel: +61-422863227

ATF is manufactured by

Conexio Genomics Pty Ltd 8/31 Pakenham St

Fremantle 6160.

Western Australia Australia
