3.2 Systems and methods
3.2.1 Website organisation and dataflow
The main objective in the design of the AFND was to provide users with a software infrastructure to store immune gene frequency datasets, incorporate new submissions, validate existing data using a controlled vocabulary and external libraries, and provide online tools for consulting and analysing the data. Figure 3.2 shows the typical workflow for the submission of a new population. Each submission (normally provided as spreadsheets, tab-separated text files or XML files) is sent to the AFND or entered online. The website then performs a validation process, which includes confirmation against the official nomenclatures of the IMGT/HLA and IPD-KIR databases. Next, the data are classified according to the type of frequency (allele, haplotype or genotype). Finally, end users are provided with several views based on the different cases of study described in Section 1.8.
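The validation and classification steps of this workflow can be illustrated with a minimal sketch. It is not the AFND implementation: the record layout, the function names and the small set of allele names standing in for the official IMGT/HLA and IPD-KIR nomenclatures are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the official allele names published by
# IMGT/HLA and IPD-KIR; the real lists are far larger.
OFFICIAL_ALLELES = {"A*01:01", "A*02:01", "B*07:02", "KIR2DL1*001"}

# Frequency types named in the text.
FREQUENCY_TYPES = ("allele", "haplotype", "genotype")


@dataclass
class Record:
    """One frequency entry of a submission (hypothetical layout)."""
    population: str
    frequency_type: str  # "allele", "haplotype" or "genotype"
    alleles: tuple       # one name for allele records, several otherwise
    frequency: float


def validate(record, official=OFFICIAL_ALLELES):
    """Return a list of problems found in one record (empty if valid)."""
    problems = []
    if record.frequency_type not in FREQUENCY_TYPES:
        problems.append(f"unknown frequency type: {record.frequency_type}")
    for name in record.alleles:
        if name not in official:
            problems.append(f"allele not in official nomenclature: {name}")
    return problems


def process_submission(records):
    """Validate each record, then group the valid ones by frequency type."""
    classified = {t: [] for t in FREQUENCY_TYPES}
    rejected = []
    for record in records:
        problems = validate(record)
        if problems:
            rejected.append((record, problems))
        else:
            classified[record.frequency_type].append(record)
    return classified, rejected
```

In this sketch a record whose allele name is absent from the reference set is returned in the rejected list together with the reason, rather than being stored, which mirrors the order of the steps described above (validation first, classification second).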
Figure 3.2: Workflow and system architecture of the AFND. Diagram labels: Submissions; IMGT/HLA; IPD-KIR; Allele Frequency Net Database; Frequency Datasets (alleles, haplotypes, genotypes, amino acids, NMDP); HLA: rare alleles; HLA: frequencies of amino acids; HLA, KIR, MIC, Cytokine.
Chapter 3 Development of the AFND website and online tools
Based on the available polymorphisms, the AFND website was divided into four main sections containing information on HLA, KIR, MIC and cytokine gene polymorphisms. Each section consisted of one or more searches, depending on the data available for each polymorphic region (e.g. allele, haplotype or genotype frequencies), and a breakdown section summarising the existing data (Figure 3.3). In the first release of the website, registration was mandatory for all users and consisted of submitting basic information about the user. Its aim was to identify the interests of the people using the website so that possible improvements could be considered. To meet the requirements of public databases, mandatory user registration was removed in August 2010 and was thereafter required only for submissions of new populations.
Figure 3.3: Screenshot of the AFND website.
Internally, the AFND website consists of two main layers: user searches and a component for data maintenance (Figure 3.4). The first layer encompasses the different search tools, which can be accessed by any user, and the second is a password-restricted section for administrators and curators of the database.
Figure 3.4: Organisation of the Allele Frequency Net Database website. The diagram groups the programs into Cytokine, HLA, KIR, MIC, Populations and Maintenance (Security) modules, each comprising catalogues, frequency entry programs, frequency searches and breakdowns. Programs shown, by module prefix:
Cytokine: cyt0001a.asp, cyt2001a.asp, cyt6001a.asp, cyt6002a.asp
HLA: hla0001a.asp, hla2001a.asp, hla2002a.asp, hla6001a.asp, hla6001h.asp, hla6002a.asp, hla6003a.asp, hla6004a.asp, hla6005a.asp, hla6006a.asp, hla9001a.asp
KIR: kir2001a.asp, kir2002a.asp, kir6001a.asp, kir6001b.asp, kir6001c.asp, kir6002a.asp, kir6004a.asp, kir6005a.asp, kir6007a.asp
MIC: mic2001a.asp, mic2002a.asp, mic6001a.asp, mic6002a.asp, mic6003a.asp
Populations: pop0001a.asp, pop0002a.asp, pop0004a.asp, pop0009a.asp, pop2001a.asp, pop6001a.asp, pop6001b.asp, pop6001c.asp, pop6002a.asp, pop6003.asp, pop6004a.asp
Maintenance: man0001a.asp, man0002a.asp, man0003a.asp, man0004a.asp, man0005a.asp, man0006a.asp, man0007a.asp, man0008a.asp, man0009a.asp, man6001a.asp, man6002a.asp, man9002a.asp
Nomenclature of programs
All search interfaces available on the AFND website were classified using a code of eight alphanumeric characters that uniquely identifies the application and its features (Figure 3.5). These program codes provided an efficient way to invoke interrelated programs, e.g. the allele and haplotype frequency searches for the HLA system.
Figure 3.5: Program nomenclature in the AFND website. The example code hla6001a_xls occupies character positions 1 to 12.
The first three characters describe the type of module, which normally corresponds to the polymorphic region (hla, kir, cyt or mic) or to an alternative module, namely population samples (pop) or management of the catalogues (man). The next four characters (positions 4-7) are digits that were used to identify the type of application: programs 0001-1999 were assigned to catalogues, 2000-4999 to data processing, 5000-5999 to electronic data exchange, 6000-8999 to searches and reports, and 9000-9999 to additional tools. The eighth character was used to identify the release version. Additionally, a suffix was used to define the format available for data reporting (xls, csv, txt or xml).
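The nomenclature can be made concrete with a short parsing sketch. It is an illustration rather than code from the AFND, which is built from .asp programs: the function and type names are hypothetical, and the underscore separating the suffix is assumed from the example hla6001a_xls in Figure 3.5.

```python
import re
from typing import NamedTuple, Optional

# Number ranges and the application type assigned to each, as listed in the text.
CATEGORIES = [
    (1, 1999, "catalogue"),
    (2000, 4999, "data processing"),
    (5000, 5999, "electronic data exchange"),
    (6000, 8999, "search/report"),
    (9000, 9999, "additional tool"),
]

MODULES = {"hla", "kir", "cyt", "mic", "pop", "man"}
FORMATS = {"xls", "csv", "txt", "xml"}

# 3-letter module, 4-digit number, 1-character release, optional "_" + format.
# The underscore separator is an assumption based on Figure 3.5.
CODE_PATTERN = re.compile(r"^([a-z]{3})(\d{4})([a-z0-9])(?:_([a-z]{3}))?$")


class ProgramCode(NamedTuple):
    module: str
    number: int
    category: str
    release: str
    report_format: Optional[str]


def parse_program_code(code: str) -> ProgramCode:
    """Split a program code such as 'hla6001a_xls' into its parts."""
    match = CODE_PATTERN.match(code.lower())
    if match is None:
        raise ValueError(f"not a valid program code: {code!r}")
    module, digits, release, report_format = match.groups()
    if module not in MODULES:
        raise ValueError(f"unknown module: {module!r}")
    if report_format is not None and report_format not in FORMATS:
        raise ValueError(f"unknown report format: {report_format!r}")
    number = int(digits)
    for low, high, category in CATEGORIES:
        if low <= number <= high:
            return ProgramCode(module, number, category, release, report_format)
    raise ValueError(f"number outside the documented ranges: {digits}")
```

For example, parse_program_code("hla6001a_xls") yields the module hla, the number 6001 in the search/report range, release a and the xls report format, while a code without a suffix, such as man0001a, is parsed with no report format.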