
Submitting Data to ISCA and NCBI

Created by Tim Hefferon; last updated August 28, 2012

Dear ISCA Submitter,

This brief guide is intended to make the submission of your copy number variation and clinical data to ISCA and NCBI easy and straightforward. If you have any questions after reviewing this guide, please send an email to [email protected].

The submission process involves the following steps:

1. Register your lab for an ISCA-dbGaP submission account (see below).

2. Review the attached ISCA submission template.

3. Transfer your data to the submission spreadsheet. There are three tabs to complete (indicated in yellow): SAMPLES, EXPERIMENTS, and VARIANT CALLS. Details and instructions for completing these can be found in this document, as well as in the blue INSTRUCTIONS tabs in the spreadsheet.

Figure 1: The submission spreadsheet. Please complete the yellow tabs; blue tabs contain instructions.

4. Log in to your dbGaP account (see step 1) and upload your completed submission file.

5. dbGaP will process your submission, store clinical information and submitted identifiers behind controlled access, and then sever links between samples and calls so aggregate data can be sent to dbVar.

6. dbVar will further process your data and assign permanent accession IDs to your variants. A list of IDs will be returned to dbGaP, which will pass them on to you; you can then link these back to your original data using the sample IDs you submitted.

7. dbVar releases ISCA data updates on a quarterly basis. Your data will be combined with variants reported by other ISCA labs during the same quarter. dbVar receives all ISCA data in aggregate and cannot determine which variants come from which labs. In addition, no personally identifying information is stored at dbVar – such information will always remain behind controlled access at dbGaP.

Notes on linking variants in dbGaP and dbVar

1. WARNING: Do not submit any data files containing individual-level data to dbGaP. NCBI takes protection of human subjects very seriously, and it is important that you assign anonymous patient IDs prior to submitting your data.

2. dbGaP retains all data you submit, including sensitive information like patient clinical phenotypes and connections between observed phenotypes and reported variants. This information is kept behind strict controlled access at dbGaP; in order to see it users must go through a secure NIH approval system (see https://dbgap.ncbi.nlm.nih.gov/aa/wga.cgi?page=login).

3. Before forwarding your data to dbVar, dbGaP will sever all informational links between patients and calls to preserve patient privacy and anonymity. There is an exception: When multiple variant calls are reported in the same patient with clinical assertion values of “Pathogenic,” “Uncertain significance: likely pathogenic,” or “Uncertain significance,” their co-occurrence may be an important aspect of their effect on clinical expression. Therefore their relationship must be retained and displayed as an integral part of the data. To achieve this, dbGaP creates “fake” sample IDs linking the variants; dbVar can then display these specific sample:variant relationships without risking patient confidentiality.
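The exception in note 3 can be pictured with a short Python sketch. It only illustrates the idea and is not dbGaP's actual procedure: the record layout (the keys `variant_call_id`, `patient_id`, and `clinical_significance`), the function name `link_cooccurring_calls`, and the `LINK-n` placeholder format are all assumptions made for this example.

```python
from collections import defaultdict

# Clinical assertion values whose co-occurrence is preserved (from note 3).
LINKED_ASSERTIONS = {
    "Pathogenic",
    "Uncertain significance: likely pathogenic",
    "Uncertain significance",
}


def link_cooccurring_calls(calls):
    """Map each variant_call_id to a shared placeholder ID, or to None.

    `calls` is a list of dicts with the hypothetical keys 'variant_call_id',
    'patient_id', and 'clinical_significance'. Calls receive a placeholder
    only when the same patient has two or more calls whose assertion is in
    LINKED_ASSERTIONS; every other call is left unlinked.
    """
    by_patient = defaultdict(list)
    for call in calls:
        if call["clinical_significance"] in LINKED_ASSERTIONS:
            by_patient[call["patient_id"]].append(call["variant_call_id"])

    links = {call["variant_call_id"]: None for call in calls}
    counter = 0
    for patient_id in sorted(by_patient):
        call_ids = by_patient[patient_id]
        if len(call_ids) < 2:
            continue  # a single call has no co-occurrence to preserve
        counter += 1
        for call_id in call_ids:
            # Hypothetical placeholder format; it carries no patient identifier.
            links[call_id] = f"LINK-{counter}"
    return links
```

The placeholder is generated from a counter rather than derived from the patient ID, so nothing about the original identifier travels with the linked variants.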

REGISTERING FOR A SUBMISSION ACCOUNT

You must register your study with dbGaP before you can submit any data. Once your name is in the system, you can acquire a secure submission account for the person who will be uploading the files to dbGaP (if different from you). Please follow these steps:

1. Register for an NIH eRA account (if you don’t already have one).

Usernames and passwords for the dbGaP submission systems are managed as part of the “eRA Commons” NIH system. Your lab or organization likely already has an eRA account as part of your dealings with NIH grants, in which case you should already be able to use your credentials to log in to dbGaP:

https://dbgap.ncbi.nlm.nih.gov/ss/dbgapss.cgi?login

If you cannot log in (or if you are sure you do not already have an eRA account), please register for one here:

https://public.era.nih.gov/commons/public/registration/registrationInstructions.jsp

If you have questions regarding the eRA account application process, please contact the eRA help desk: https://public.era.nih.gov/commons/public/contacts.jsp. If you still encounter problems getting an eRA account, contact us at [email protected].

2. Log in to dbGaP using your eRA credentials.

Use your eRA credentials from step 1 to log in: https://dbgap.ncbi.nlm.nih.gov/ss/dbgapss.cgi?login

3. Request an ISCA-dbGaP Submission Account.

After you have successfully logged in, send us an email ([email protected]) requesting an ISCA-dbGaP Submission Account. We will then create your account and ask you to confirm that you are able to log in. You will then be able to upload your ISCA submission file.

COMPLETING THE SUBMISSION SPREADSHEET

SAMPLES

Figure 2: The SAMPLES tab

If multiple samples come from the same subject, each sample should have a unique sample_id (column A) but the same subject_id (column G). Enter “no-call” samples just as you would any other sample; however, you must identify them by indicating “Yes” in the is_no_call field (column P).

REQUIRED FIELDS
sample_id
subject_id
subject_phenotype
is_no_call
consent
may_recontact

OPTIONAL FIELDS
sample_resource
sample_cell_type
sample_cancer
sample_attribute
sample_karyotype
subject_collection
subject_population
subject_karyotype
subject_sex
subject_age
subject_maternal_id
subject_paternal_id
family_history

Instructions for completing each field are included in the blue SAMPLES INSTRUCTIONS tab (to the right of the yellow tabs).
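Before uploading, it may help to check the SAMPLES rows yourself. The following Python sketch assumes the tab has already been read into a list of dictionaries keyed by column name (how you read the spreadsheet is up to you); the function name `check_samples` and the message wording are illustrative and not part of any ISCA or dbGaP tool.

```python
# Required SAMPLES columns, as listed in this guide.
SAMPLES_REQUIRED = (
    "sample_id",
    "subject_id",
    "subject_phenotype",
    "is_no_call",
    "consent",
    "may_recontact",
)


def _is_blank(value):
    """True for None or for text that is empty after trimming whitespace."""
    return value is None or str(value).strip() == ""


def check_samples(rows):
    """Return a list of problems found in SAMPLES rows (empty if none).

    Checks that every required field is filled in and that no sample_id
    is repeated. Several rows may share one subject_id; that is expected
    when more than one sample comes from the same subject.
    """
    problems = []
    seen_ids = set()
    for number, row in enumerate(rows, start=1):
        for field in SAMPLES_REQUIRED:
            if _is_blank(row.get(field)):
                problems.append(f"row {number}: missing {field}")
        if not _is_blank(row.get("sample_id")):
            sample_id = str(row["sample_id"]).strip()
            if sample_id in seen_ids:
                problems.append(f"row {number}: duplicate sample_id {sample_id}")
            seen_ids.add(sample_id)
    return problems
```

Row numbers in the messages count data rows starting at 1, so they will be offset from spreadsheet row numbers by however many header rows the tab has.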

Phenotypes

Phenotype information on subjects must be entered using established terminology from the Human Phenotype Ontology (HPO). These should be supplied as a comma-delimited list of vocabulary:term_id pairs, e.g., “HP:0007018, HP:0001249”. Please see Appendix A to this document for phenotype terms commonly used by ISCA. If you are unable to find suitable HPO terms, ISCA has access to a text-mining algorithm that can help determine the most suitable term IDs for your phenotypes.

The only exception to the rule requiring the use of HPO vocabulary terms is that you may instead use a very general text designation that was developed specifically for ISCA: “Developmental Delay and additional significant developmental and morphological phenotypes referred for genetic testing.” This phrase may be supplied instead of vocabulary:id pairs, if you do not wish to indicate more specific phenotypes.
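As a rough illustration, the phenotype rule can be expressed as a small Python check. This is a sketch under stated assumptions, not an official validator: the function name `parse_phenotype` is made up for the example, and the pattern assumes HPO IDs take the form `HP:` followed by seven digits, as in the examples in this guide.

```python
import re

# The generic designation developed for ISCA, quoted from this guide.
ISCA_GENERIC_PHENOTYPE = (
    "Developmental Delay and additional significant developmental and "
    "morphological phenotypes referred for genetic testing"
)

# Assumed ID shape: 'HP:' plus seven digits, with no space after the colon.
HPO_ID = re.compile(r"HP:\d{7}")


def parse_phenotype(value):
    """Return the list of phenotype entries in a phenotype field.

    Accepts either the ISCA generic designation (with or without a final
    period) or a comma-delimited list of HPO IDs. Raises ValueError if any
    entry is not a well-formed vocabulary:term_id pair.
    """
    text = value.strip()
    if text.rstrip(".") == ISCA_GENERIC_PHENOTYPE:
        return [ISCA_GENERIC_PHENOTYPE]
    terms = [term.strip() for term in text.split(",")]
    bad = [term for term in terms if not HPO_ID.fullmatch(term)]
    if bad:
        raise ValueError(f"not vocabulary:term_id pairs: {bad}")
    return terms
```

Note that an ID written with a space after the colon (for example `HP: 0001562`, as it appears on some printed forms) would be rejected by this pattern and should be entered as `HP:0001562`.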

EXPERIMENTS

Figure 3: The EXPERIMENTS tab

The EXPERIMENTS tab is used to record the specific methods, analyses, and platforms you used to generate and validate your data. Note that experiment_id 1 through 5 have already been completed with commonly used parameters matching ISCA-related studies. If the pre-filled information adequately describes your experiments, you need only complete reference_type and reference_value for experiment 1 (you can include additional details in other columns if desired). You do not have to use the pre-completed experiments; if you wish to enter your own experiments, replace the gray “sample” text with the desired information.

REQUIRED FIELDS
experiment_id
method_type
analysis_type
reference_type
reference_value

OPTIONAL FIELDS
experiment_resolution
method_platform
method_description
analysis_description
detection_method
detection_description
external_links
site

Instructions for completing each field are included in the blue EXPERIMENTS INSTRUCTIONS tab (to the right of the yellow tabs).

VARIANT CALLS

Figure 4: The VARIANT CALLS tab

The VARIANT CALLS tab is used to record the details of your variant calls, and represents the core of your data. It includes clinical assertions you have made, phenotypes included in those assertions, copy number data, and the genomic locations of your variant calls.

REQUIRED FIELDS
variant_call_id
variant_call_type
experiment_id
sample_id
clinical_significance
phenotype
copy_number
assembly
chr
inner_start
inner_stop

OPTIONAL FIELDS
validation
description
origin
is_parent_of_origin_affected
mode_of_inheritance
zygosity
external_links
outer_start
outer_stop
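Because each variant call carries a sample_id and an experiment_id, a quick cross-check against the other two tabs can catch typos before upload. The Python sketch below assumes these values are meant to match entries in the SAMPLES and EXPERIMENTS tabs (the guide does not spell this out), and the function name `find_dangling_references` is hypothetical.

```python
def find_dangling_references(variant_calls, sample_ids, experiment_ids):
    """Return (variant_call_id, field, value) for each unmatched reference.

    `variant_calls` is a list of dicts keyed by column name; `sample_ids`
    and `experiment_ids` are the IDs entered on the SAMPLES and EXPERIMENTS
    tabs. IDs are compared as trimmed text so that 1 and "1" are treated
    as the same experiment_id.
    """
    known = {
        "sample_id": {str(s).strip() for s in sample_ids},
        "experiment_id": {str(e).strip() for e in experiment_ids},
    }
    dangling = []
    for call in variant_calls:
        for field, valid in known.items():
            value = str(call.get(field, "")).strip()
            if value not in valid:
                dangling.append((call.get("variant_call_id"), field, value))
    return dangling
```

An empty result means every call points at a sample and an experiment that you have actually described.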

Clinical Assertions and Phenotypes

Clinical assertions are one of the most important aspects of your ISCA data. As with the subject_phenotype field in the SAMPLES tab, phenotype information must be provided here as vocabulary:term_id pairs. Again, the only exception is the generic text designation developed for ISCA, “Developmental Delay and additional significant developmental and morphological phenotypes referred for genetic testing.”

Genomic Location of Variants

Figure 5: Genomic coordinates section of VARIANT CALLS tab

Required fields are indicated in yellow. We strongly recommend you also include outer_start and outer_stop coordinates whenever possible.
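A simple sanity check on the coordinate columns might look like the following Python sketch. It assumes that the outer coordinates, when given, bracket the inner ones (outer_start ≤ inner_start ≤ inner_stop ≤ outer_stop); that ordering is an assumption of this example rather than a rule stated in this guide, so please confirm it against the INSTRUCTIONS tab. The function name `check_coordinates` is illustrative.

```python
def check_coordinates(inner_start, inner_stop, outer_start=None, outer_stop=None):
    """Return a list of ordering problems for one variant call (empty if none).

    inner_start and inner_stop are required; outer_start and outer_stop are
    optional and are checked only when supplied. All values are integers.
    """
    problems = []
    if inner_start > inner_stop:
        problems.append("inner_start is greater than inner_stop")
    # Assumed convention: the outer interval contains the inner interval.
    if outer_start is not None and outer_start > inner_start:
        problems.append("outer_start is greater than inner_start")
    if outer_stop is not None and outer_stop < inner_stop:
        problems.append("outer_stop is less than inner_stop")
    return problems
```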

Reporting Coordinates in the Pseudoautosomal Region (PAR)

If you are reporting variants that fall within the pseudoautosomal regions (PAR1 and PAR2) shared by chromosomes X and Y, report only the X chromosome coordinates. Please do not use ‘X/Y’.
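To find any calls that still use a combined value in the chr field, a check along these lines could be run over the VARIANT CALLS rows. This is a minimal Python sketch; the function name `find_combined_xy_calls` is hypothetical, and the tolerance for a `chr` prefix and for the reversed form `Y/X` is an assumption about how such values might have been typed.

```python
def find_combined_xy_calls(variant_calls):
    """Return the variant_call_id of every call whose chr is 'X/Y' or 'Y/X'.

    `variant_calls` is a list of dicts keyed by column name. Comparison
    ignores case, surrounding whitespace, and an optional 'chr' prefix.
    """
    flagged = []
    for call in variant_calls:
        chrom = str(call.get("chr", "")).strip().upper().replace("CHR", "")
        if chrom in {"X/Y", "Y/X"}:
            flagged.append(call.get("variant_call_id"))
    return flagged
```

Flagged calls should be re-entered with `X` in the chr field and X chromosome coordinates.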

FREQUENTLY ASKED QUESTIONS

1. How do I submit “no-call” samples (samples in which I am not reporting any variants)?

Enter sample information for no-call samples just as you would other samples, in the SAMPLES tab. Simply indicate “Yes” in the required field, “is_no_call” (column P).

2. How do I register my lab?

Please see the section “Registering for a Submission Account” earlier in this document. Additional registration with dbVar is not necessary.

3. How do I report variants in the pseudoautosomal region of chromosomes X and Y?

Please report only X chromosome coordinates for PAR variants. Do not indicate ‘X/Y’ in the chr field.

APPENDIX A: HPO PHENOTYPE TERMS AND IDs

Patient Identification

Patient Name: ____________________ (Last) ____________________ (First)
Gender: [ ] Male [ ] Female
Date of Birth: _____________________ (mm/dd/yyyy)

Clinical Information – Check all that apply. Use additional space at the bottom of the form if needed.

Perinatal History

[ ] Prematurity (HP:0001622)
[ ] Intrauterine growth restriction (HP:0001511)
[ ] Oligohydramnios (HP:0001562)
[ ] Polyhydramnios (HP:0001561)
[ ] Non-immune hydrops fetalis (HP:0001790)
[ ] Other: ______________________________

Growth

[ ] Failure to thrive (HP:0001508)
[ ] Overgrowth (HP:0001548)
[ ] Short stature (HP:0004322)
[ ] Other: ______________________________

Cognitive/Developmental

[ ] Developmental delay (HP:0001263)
[ ] Gross motor delay (HP:0002194)
[ ] Fine motor delay (HP:0010862)
[ ] Speech delay (HP:0000750)
[ ] Intellectual disability/MR (HP:0001249)
[ ] Other: ______________________________

Behavioral/Psychiatric

[ ] Autism (HP:0000717)
[ ] Autism spectrum disorder (HP:0000729) (includes pervasive developmental delay and Asperger syndrome)
[ ] Attention deficit hyperactivity disorder (HP:0007018)
[ ] Anxiety (HP:0000739)
[ ] Behavioral/psychiatric abnormality (HP:0000708) Specify: _____________________________
[ ] Other: ______________________________

Cutaneous

[ ] Hyperpigmentation (HP:0000953)
[ ] Hypopigmentation (HP:0001010)
[ ] Other: ______________________________

Neurological

[ ] Seizures (HP:0001250)
[ ] Hypotonia (HP:0001252)
[ ] Hypertonia (HP:0001276)
[ ] Cerebral palsy (HP:0100021)
[ ] Encephalopathy (HP:0001298)
[ ] Structural brain anomaly (HP:0002011) Specify: _____________________________
[ ] Other: ______________________________

Cardiac

[ ] Atrial septal defect (HP:0001631)
[ ] Ventricular septal defect (HP:0001629)
[ ] Coarctation of the aorta (HP:0001680)
[ ] Tetralogy of Fallot (HP:0001636)
[ ] Other structural heart defect (HP:0002564) Specify: _____________________________
[ ] Other cardiac abnormality (HP:0001627) Specify: _____________________________

Craniofacial

[ ] Dysmorphic facial features (HP:0002260) Specify: _____________________________
[ ] Ear malformation (HP:0000598) Specify: _____________________________
[ ] Cleft lip (HP:0000204)
[ ] Cleft palate (HP:0000175)
[ ] Macrocephaly (HP:0000256)
[ ] Microcephaly (HP:0000252)
[ ] Other: ______________________________

Hearing/Vision

[ ] Hearing loss (HP:0000365) Specify: _____________________________
[ ] Abnormality of vision (HP:0000504) Specify: ____________________________
[ ] Abnormality of eye movement (HP:0000496) Specify: _____________________________
[ ] Other: ______________________________

Musculoskeletal

[ ] Contractures (HP:0001371)
[ ] Club foot (HP:0001762)
[ ] Diaphragmatic hernia (HP:0000776)
[ ] Limb anomaly (HP:0002813) Specify: _____________________________
[ ] Polydactyly (HP:0010442) Specify: _____________________________
[ ] Syndactyly (HP:0001159) Specify: _____________________________
[ ] Vertebral anomaly (HP:0003468) Specify: ____________________________
[ ] Other: ______________________________

Gastrointestinal

[ ] Gastroschisis (HP:0001543)
[ ] Omphalocele (HP:0001539)
[ ] Anal atresia (HP:0002023)
[ ] Tracheoesophageal fistula (HP:0002575)
[ ] Pyloric stenosis (HP:0002021)
[ ] Other: ______________________________

Genitourinary

[ ] Ambiguous genitalia (HP:0000062)
[ ] Hydronephrosis (HP:0000126)
[ ] Kidney malformation (HP:0000792) Specify: ____________________________
[ ] Cryptorchidism (HP:0000028)
[ ] Hypospadias (HP:0000047)
[ ] Other: ______________________________

Family History

[ ] Parents with ≥ 2 miscarriages
[ ] Other relatives with similar clinical history
Explain: ____________________________ ___________________________________ ____________________________________

As a participant in the ISCA (International Standards for Cytogenomic Arrays) Consortium, this clinical cytogenetics laboratory contributes submitted clinical information and test results to a HIPAA compliant, de-identified public database as part of the NIH’s effort to improve understanding of the relationship between

Instructions: The accurate interpretation and reporting of genetic test results is contingent upon the reason for referral, clinical information provided, and family history. To help provide the best possible service, please check the applicable clinical information below.

APPENDIX B: ISCA SUBMISSION TEMPLATE

Please see the Excel spreadsheet (ISCA-NCBI Submission Template.xlsx) included in the zipped archive which contained this submission guide.
