Simplifying Data Interpretation with Nexus Copy Number

Academic year: 2021

Rapid technological advancements, such as high-density aCGH and SNP arrays and next-generation sequencing, have enabled fine-resolution scanning of the genome to identify copy number changes as well as allelic events (such as loss of heterozygosity, LOH). This flood of data, coupled with the ever-growing number of samples processed with these technologies, requires an effective software system that helps the user interpret the results in a reasonable amount of time and with good confidence in the accuracy of the interpretation. Here we describe the Nexus Copy Number version 5 software, which can drastically improve the interpretation process.

Collecting, Organizing, and Searching Samples and Associated Annotations

Measurements used to estimate the copy number status of a sample can be generated on a variety of platforms and can arrive in different file formats and at different levels of abstraction: raw, normalized log-ratio, or segmented/called. Nexus Copy Number provides direct import of data from all commercial array platforms and at various levels of processing (see Figure 1). In this paper we use data generated from four different array platforms: Illumina HumanCytoSNP-12, Affymetrix SNP 6.0, Agilent 44K (EmArray Design), and Roche NimbleGen 3x720K Whole Genome arrays.

Figure 1 – Analysis workflow for Nexus Copy Number and how it can accommodate all platforms at different stages in the analysis pipeline

Nexus Copy Number uses the concept of a Project to refer to a collection of samples. A Nexus project can contain samples from different arrays and different resolutions. The only thing that is common between the samples is the genome (organism and build number). The user can batch load essentially an unlimited number of samples into a project with a single mouse click. Sample annotations, such as Gender, Ethnicity, Age, Phenotype, etc. can also be loaded with a single click importing a tab-delimited list of samples along with their annotations. Nexus Copy Number refers to such sample annotations as Factors. An unlimited number of Factors can be defined by the user with arbitrary text field values.
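As a rough illustration of the tab-delimited annotation list described above, the following Python sketch parses such a file into per-sample Factors. The column names (`Sample`, `Gender`, `Age`, `Phenotype`) and the function name `load_factors` are assumptions made for this example; the exact layout Nexus Copy Number expects is not specified here.

```python
import csv
import io


def load_factors(tsv_text, sample_column="Sample"):
    """Parse tab-delimited annotations into {sample_name: {factor: value}}.

    The header row names the factors; `sample_column` (an assumed name)
    identifies the column holding the sample identifier.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    factors = {}
    for row in reader:
        name = row.pop(sample_column)
        # Every remaining column is treated as a free-text Factor.
        factors[name] = dict(row)
    return factors


# Hypothetical annotation file contents.
example = (
    "Sample\tGender\tAge\tPhenotype\n"
    "S001\tMale\t62\tCancer\n"
    "S002\tFemale\t47\tNormal\n"
)
factors = load_factors(example)
print(factors["S001"]["Gender"])
```

All values are kept as text, mirroring the statement that Factors take arbitrary text field values; numeric interpretation is left to whatever later step needs it.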

Samples in a project can be organized and selected by sorting on factors of interest. For example, it is very simple to select all male samples with age greater than 50 in just two mouse clicks when the project has the factors Age and Gender. A text-based search tool is also available to locate any text string of interest.
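The selection in the example above (male samples older than 50) amounts to a simple filter over the Factors. The sketch below shows the equivalent logic in Python on made-up sample records; it is only an illustration of the selection rule, not of how the software implements it.

```python
# Hypothetical sample records; factor values are stored as text.
samples = [
    {"name": "S001", "Gender": "Male", "Age": "62"},
    {"name": "S002", "Gender": "Female", "Age": "47"},
    {"name": "S003", "Gender": "Male", "Age": "35"},
    {"name": "S004", "Gender": "Male", "Age": "51"},
    {"name": "S005", "Gender": "Female", "Age": "70"},
]


def select_samples(samples, gender, min_age):
    """Return names of samples with the given gender and Age strictly above min_age."""
    selected = []
    for sample in samples:
        if sample["Gender"] == gender and int(sample["Age"]) > min_age:
            selected.append(sample["name"])
    return selected


selected = select_samples(samples, "Male", 50)
print(selected)
```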

Figure 2 – Searching all samples for a particular text string

Interpreting data from a Single Sample

Raw measurements or log-ratio data can be processed within Nexus Copy Number to identify regions of copy number change and, if SNP data is available, allelic events such as LOH. Nexus Copy Number offers the following algorithms for making the calls:

- Rank Segmentation: A robust variation of the well-known Circular Binary Segmentation (CBS) algorithm where the probe ranks are used to minimize the effect of outliers and drastically improve performance.
- SNPRank Segmentation: An extension of the Rank Segmentation algorithm where B-Allele Frequency values are also included in the segmentation process, generating both copy number and allelic event calls.
- FASST Segmentation: A novel Hidden Markov Model (HMM) based approach that, unlike other HMM methods, does not aim to estimate the copy number state at each probe but uses many states to cover possibilities, such as mosaic events, and then makes calls based on a second-level threshold.
- SNP-FASST Segmentation: An extension of the FASST algorithm that adds many more states to cover events related to the B-Allele Frequency values, making both copy number and allelic event calls.
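To give a feel for why working on probe ranks damps the effect of outliers, the following Python sketch compares a single change-point search on raw log-ratios with the same search on their ranks. This is a deliberately simplified illustration and not the Rank Segmentation algorithm itself: it looks for one change point rather than CBS's circular two-breakpoint segments, uses an unnormalized mean-difference score, and all function names and data values are invented for the example.

```python
def ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def best_split(values):
    """Index k maximizing the size-weighted gap between mean(values[:k]) and mean(values[k:])."""
    n = len(values)
    total = sum(values)
    best_k, best_score = None, -1.0
    left_sum = 0.0
    for k in range(1, n):
        left_sum += values[k - 1]
        right_sum = total - left_sum
        gap = abs(left_sum / k - right_sum / (n - k))
        score = gap * (k * (n - k) / n) ** 0.5
        if score > best_score:
            best_k, best_score = k, score
    return best_k


# Six probes near 0 (one wild outlier at index 3), then six probes near -0.5.
log_ratios = [0.02, -0.05, 0.01, 3.0, 0.04, -0.03,
              -0.52, -0.48, -0.55, -0.50, -0.47, -0.53]

raw_split = best_split(log_ratios)
rank_split = best_split(ranks(log_ratios))
print(raw_split, rank_split)
```

In this toy data the single outlier drags the raw-value split away from the true boundary at index 6, while the rank-based split lands on it, because the outlier's rank is only one step above its neighbors regardless of its magnitude.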

Since Nexus has been designed to offer maximum flexibility, it is easy to have Nexus execute a different segmentation algorithm developed in R or any other programming language. Additionally, Nexus allows the user to import copy number result files where the segmentation and/or calling of the regions has already been done by another software package.

Once the data is loaded and processed, the user can simply select one or more samples to review. The Sample Drill-Down window provides all the information needed to examine the selected sample. The Overview tab provides a quick graphical view of all the aberrations, where single red bars mark areas of heterozygous deletion and double red bars indicate homozygous deletion. Single green bars indicate one-copy gain and double green bars indicate multi-copy amplification (see Figure 3).

Figure 3 – Overview of single sample showing all the chromosomes. All aberrations are marked by green or red marks. Here, the loss on chr 15 is related to Angelman Syndrome

By clicking on an ideogram the user is directed to the detailed view of the selected chromosome. In this case, if we click on Chromosome 15, we can see the detailed annotation of this region as well as the probe-level data, as depicted in Figure 4 below. The annotation tracks selected here include the genes, exons, known CNVs from the Database of Genomic Variants (DGV) in Toronto, miRNA locations, DECIPHER database information, Gene Association Database, and known segmental duplications. Tracks can be added or removed with a single click of the mouse. Nexus Copy Number version 5 also allows direct import of UCSC-based BED files for even simpler annotation track loading.

Figure 4 – A view of Chromosome 15 of this sample clearly indicating several loss and gain regions. Additional tracks showing information from web-based databases are provided for quick reference.

All the annotation tracks are active and provide additional information or are hyperlinked to a web-based resource. For example, clicking on a magenta colored region of known CNVs provides a list of all reported events at that location in the DGV, as shown in figure 5 below. Hyperlinks are then provided to query the region in DGV or the publication from PubMed.

Figure 5 – A drill-down table showing all reported CNVs at a particular location on chromosome 15

As part of the interpretation process the user might look at an area of aberration and query various web resources for information about the genes in that area by right-clicking on a gene, as shown in figure 6. The user can also get immediate information about all the annotations on the screen with a single click of the drill-down tool.

Figure 6 – A close-up view of a region on chromosome 15. At this level gene names are clearly visible and are hyperlinked to various sources which can be customized by the user.

Nexus Copy Number also offers a powerful report generation tool for each sample. The report contains the following information:

- Chromosomal location
- Event (gain, loss, etc.)
- Region length
- Number of genes in region
- Number of probes in region
- % of overlap with known CNV

In addition to the above fields, the user can customize the report to include flanking probe IDs, known syndromes in regions, or any other genome-based annotation that is desired. The user can select which regions to exclude from the report using a simple check box. For even more flexibility, the user can right-click on any region in the report and go to that region in a web-based browser (e.g. Ensembl or UCSC) or review the region back in Nexus. Another unique feature is the ability to query the selected region in the Nexus project or in Nexus DB to find any samples with similar events, as shown in Figure 7.
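One plausible reading of the "% of overlap with known CNV" report field is the fraction of an aberrant region covered by the union of known CNV intervals. The Python sketch below computes that quantity; how Nexus Copy Number actually derives the field is not described here, so the definition, the half-open coordinate convention, and the function name `percent_overlap` are all assumptions.

```python
def percent_overlap(region, known_cnvs):
    """Percentage of `region` covered by the union of `known_cnvs`.

    `region` is (start, end) and `known_cnvs` is a list of (start, end)
    pairs on the same chromosome, all half-open: start inclusive, end exclusive.
    """
    start, end = region
    # Keep only CNVs that intersect the region, clipped to its bounds.
    clipped = sorted(
        (max(s, start), min(e, end))
        for s, e in known_cnvs
        if s < end and e > start
    )
    covered = 0
    cursor = start  # everything left of cursor has already been counted
    for s, e in clipped:
        if e > cursor:
            covered += e - max(s, cursor)
            cursor = e
    return 100.0 * covered / (end - start)


# Hypothetical coordinates: two overlapping CNVs, one partial, one elsewhere.
overlap = percent_overlap((100, 200), [(50, 120), (110, 150), (180, 260), (300, 400)])
print(overlap)
```

Counting the union rather than summing individual overlaps keeps the value at or below 100% when several reported CNVs cover the same bases.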

Figure 7 – The Report tab of a single sample showing regions of aberration. A single region is selected and is queried in all other samples in the project. The query result window is shown

Searching for Samples with Common Aberrations

The new high-density array platforms are allowing users to detect ever smaller aberrations. Although this is potentially useful, it also creates a challenge during data interpretation. Nexus provides a number of features to help in this process. First, the % CNV overlap allows the user to sort and easily locate the regions that have not been reported in public repositories as being polymorphic in the “normal” population. Second, Nexus provides a number of data filters to remove aberrations smaller than a specified size from the review process. Third, and probably most useful, Nexus provides various ways of searching for common events across multiple arrays. This search can be done at two levels. One is to search all samples in a given project, which can hold thousands of samples processed at a particular location. The other is to search the Nexus DB internet-based repository. We will describe both options in order below.

There are two simple ways to search a project for a genomic event. We can use an example to illustrate these methods. Here we use a project that has 57 samples from various aCGH and SNP array platforms. While examining a sample we notice a small deletion at the FHIT gene locus. Using the query tool shown in Figure 7 above, we can identify all samples in the project that have a loss in this region and see their sample annotations (e.g. phenotype, gender, etc.). We can also select only these samples for further analysis with a single click. Another very useful tool in Nexus Copy Number is the Sort tool. Using this feature we can point to an area on the genome, and all samples with the selected aberration are moved to the top of the screen with the smallest event on top. Figure 8 shows the samples that have a loss at the FHIT locus. We can choose to color code each sample based on a selected factor. Here we chose the phenotype and see that all three samples with the deletion are cancer samples of various types.
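The query and sort behavior described above can be sketched as an interval-overlap search followed by ordering on event length. The Python example below is an illustration only: the `Event` record, the function name, and the coordinates are hypothetical (the positions are placeholders on chromosome 3, not the actual FHIT locus).

```python
from typing import NamedTuple


class Event(NamedTuple):
    sample: str
    chrom: str
    start: int
    end: int
    kind: str  # "gain" or "loss"


def samples_with_event(events, chrom, start, end, kind):
    """Find events of `kind` overlapping chrom:start-end, smallest event first."""
    hits = [
        e for e in events
        if e.chrom == chrom and e.kind == kind and e.start < end and e.end > start
    ]
    hits.sort(key=lambda e: e.end - e.start)
    return [(e.sample, e.end - e.start) for e in hits]


# Hypothetical called events from five samples.
events = [
    Event("A", "chr3", 60_000_000, 60_400_000, "loss"),
    Event("B", "chr3", 60_100_000, 60_150_000, "loss"),
    Event("C", "chr3", 59_000_000, 61_500_000, "loss"),
    Event("D", "chr3", 60_100_000, 60_300_000, "gain"),
    Event("E", "chr5", 60_100_000, 60_300_000, "loss"),
]

hits = samples_with_event(events, "chr3", 60_120_000, 60_140_000, "loss")
print(hits)
```

Putting the smallest event first tends to surface the most focal aberration, which helps narrow down the minimal shared region across samples.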

Figure 8 – Sorting samples based on a loss around the FHIT locus. Samples are color coded based on phenotype

Sharing Results with Collaborators

Nexus Copy Number provides two simple but powerful methods to share profiles and findings with collaborators or the public. The first method lets the user set a frequency threshold and ask the software to identify all aberrations present at or above that frequency. The user can then create a bedGraph file with a single click. This file can be loaded into the UCSC browser to display the gain and loss frequency. For example, a user with 1000 “normal” European samples can use this function to create a frequency plot for all events present in at least 1% of samples (10 or more samples in the project) and either post it at UCSC for all to use or share it only with colleagues to limit its distribution.
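The frequency export can be pictured as follows: split each chromosome at every event boundary, count the samples carrying an event across each resulting segment, and write the segments that meet the threshold in UCSC bedGraph format. The Python sketch below follows that outline under stated assumptions: the threshold is treated as inclusive (matching the "10 or more samples" example), coordinates are assumed to be 0-based half-open as bedGraph expects, and the function name and input layout are invented for illustration.

```python
def frequency_bedgraph(events, n_samples, min_fraction, track_name):
    """Build bedGraph text of event frequency (in percent) per genomic segment.

    `events` is an iterable of (sample, chrom, start, end) tuples for one
    event type (e.g. all losses). Segments whose carrier fraction is below
    `min_fraction` are omitted.
    """
    by_chrom = {}
    for sample, chrom, start, end in events:
        by_chrom.setdefault(chrom, []).append((start, end, sample))

    lines = [f'track type=bedGraph name="{track_name}"']
    for chrom in sorted(by_chrom):
        intervals = by_chrom[chrom]
        # Every event boundary is a potential change in frequency.
        breakpoints = sorted({p for s, e, _ in intervals for p in (s, e)})
        for left, right in zip(breakpoints, breakpoints[1:]):
            carriers = {smp for s, e, smp in intervals if s <= left and e >= right}
            fraction = len(carriers) / n_samples
            if fraction >= min_fraction:
                lines.append(f"{chrom}\t{left}\t{right}\t{100 * fraction:.1f}")
    return "\n".join(lines) + "\n"


# Hypothetical project of 4 samples with a 50% threshold.
loss_events = [
    ("S1", "chr1", 100, 300),
    ("S2", "chr1", 200, 400),
    ("S3", "chr2", 10, 20),
]
text = frequency_bedgraph(loss_events, n_samples=4, min_fraction=0.5, track_name="loss_freq")
print(text)
```

Gains and losses would be exported as separate tracks by calling the function once per event type; adjacent segments with equal frequency are left unmerged to keep the sketch short.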

The second mechanism is through the use of Nexus DB. With the click of a button, the user can make a project accessible to anyone using Nexus DB or to a specific group of users. Any shared data is then visible in queries and can be downloaded by other users for detailed analysis.

Conclusion

We have outlined here how the Nexus Copy Number version 5 software can be a powerful resource for understanding and interpreting results from high-density array-based copy number measurement platforms. Nexus provides all the features necessary to make the process as efficient as possible.
