Suggested citation: European Food Safety Authority; Technical report on the use of Excel/XML files for submission of data
TECHNICAL REPORT OF EFSA
Technical report on the use of Excel/XML files for submission of data to the Zoonoses system¹
European Food Safety Authority², ³
European Food Safety Authority (EFSA), Parma, Italy
ABSTRACT
This report summarises the technical aspects and requirements to be considered in the future implementation of XML and Excel submission to EFSA of data on zoonoses, antimicrobial resistance, animal population and food-borne outbreaks in accordance with Directive 2003/99/EC. It applies to aggregated and sample-based data, and defines the practical support expected by the Member States in this exercise. In order to move to a ‘one-single’ reporting system maintained by EFSA, it suggests using EFSA’s Data Collection Framework tool and adopting the exchange protocols defined in “EFSA’s Guidance on Data Exchange”. It highlights the need to keep and maintain the Zoonoses Web Reporting Application until full automatic transmission is in place in all the Member States, and summarises the outcomes of the pilot on AMR isolate-based data submission run in 2011, including some suggestions on how to extend the ‘EFSA Standard Sample Description (SSD)’ model to support the collection of AMR isolate-based data.
© European Food Safety Authority, 2012
KEY WORDS
Zoonoses, Zoonotic Agents, Animal population, Antimicrobial Resistance, Food-borne Outbreaks, Data transmission, XML Submission, Excel Submission, Data Collection Framework, Controlled Terminology, Standard Sample Description
1 On request from EFSA, Question No EFSA-Q-2011-00226, approved on 26 January 2012.
2 Correspondence: [email protected]
3 Acknowledgement: EFSA wishes to thank the members of the ‘Working Group on the use of XML and Excel files in the Zoonoses system’: Cristian Apostu, Matthias Hartung, Elina Lahti, Eileen O'Dea, Peter Sewell, Verena Spiteller, Attila Tarpai and EFSA staff: Fabrizio Abbinante, Stefano Cappè, Marco Leoni, Jane Richardson and Kenneth Mulligan for the support provided to this technical output.
SUMMARY
The European Food Safety Authority (EFSA) is charged with coordinating the annual reporting of zoonoses, zoonotic agents, animal population, antimicrobial resistance and food-borne outbreaks in the European Union under the Directive 2003/99/EC as well as analysing and summarising the data collected. For data collection purposes, EFSA currently provides a web-based reporting application, where the data is manually entered, as well as the possibility, for food-borne outbreaks and antimicrobial resistance data, to submit data using XML (Extensible Markup Language) files.
EFSA is extending the use of XML in order to fully automate the annual submission of data and to improve the quality of the information received by minimising the errors due to manual data entry. In addition, a gradual move to collection of sample-based data by XML is foreseen. At Task Force of Zoonoses Data Collection (EFSA network) meetings, Member States have expressed the need to be supported by EFSA in the future creation and submission of XML files. For some Member States, Excel is, at the moment, the most widely used tool for collecting, storing and aggregating data. Those Member States expressed their need to be supported in translating data from Excel into XML files. To address these requirements and to better plan the next steps in using XML files for reporting on zoonoses, antimicrobial resistance and food-borne outbreaks, EFSA set up a Working Group to agree how EFSA can best support the Member States in the use of Excel and XML files.
The conclusions and recommendations of the ‘Working Group on the use of XML and Excel files in the Zoonoses system’ are included in this technical report and are for consideration by the Zoonoses Task Force and by EFSA. It is suggested to use EFSA’s Data Collection Framework tool to support the automatic submission to EFSA of zoonoses aggregated and sample-based data and to adopt the exchange protocols already defined in “EFSA’s Guidance on Data Exchange”. This will enable a smooth move to a ‘one-single’ reporting system maintained by EFSA, while keeping and maintaining the “Zoonoses Web Reporting Application” so that the Member States can report zoonoses data until full automatic transmission is in place in all countries. The use of “simple flat models” instead of “complex data models” is advisable, as is the support of “Comma Separated Values” files in addition to XML and Excel files. The documentation provided to the Member States should also be improved in comparison with that provided by EFSA in past years.
TABLE OF CONTENTS
Abstract
Summary
Table of contents
Background as provided by EFSA
Terms of reference as provided by EFSA
1. Introduction
2. AMR Isolate-based pilot
2.1. Scope of the AMR pilot
2.2. AMR isolate-based data model
2.3. Outcomes of the AMR pilot
2.3.1. The submission process
2.3.2. Data entry to Excel
2.3.3. Transformation from Excel into XML
2.3.4. Submission to Data Collection Framework (DCF)
2.3.5. Data aggregation
2.3.6. Overview of the AMR pilot
3. Submission of XML and Excel files in the Zoonoses system
3.1. Milestones of the XML submission in Zoonoses
3.2. Current submission of data in Zoonoses
3.3. Future submission of data in Zoonoses
3.4. Submission of Zoonoses data through DCF
3.4.1. Exchange Protocol
3.4.2. Validation of pick lists in DCF
3.4.3. Implementation of business rules in DCF
3.5. Supported XML schema for Zoonoses in DCF
3.5.1. XML schema: generic data model vs flat data model
3.6. Controlled Terminology maintenance
3.6.1. Publication of pick lists
3.6.2. Update of pick lists
3.6.3. Notification to Member States
3.7. Support of Excel and CSV files
3.7.1. Excel data models
3.7.2. Excel macros and Excel tools
3.7.3. Re-issuing Excel tools
4. Support expected by Member States for XML/CSV/Excel submission
4.1. Documentation to be provided to the Member States
4.2. Helpdesk
4.3. Training and meetings
4.4. Supporting IT Tools
Conclusions and recommendations
References
Appendices
A. AMR isolate-based data model
B. Proposed revision of the SSD model to accommodate AMR isolate-based reporting
BACKGROUND AS PROVIDED BY EFSA
EFSA’s Biological Monitoring Unit (BIOMO) is promoting the use of XML in the data transmission of zoonoses, antimicrobial resistance and food-borne outbreaks data to EFSA. This is to both automate the annual submission of data and to improve the quality of the information received by minimising errors due to manual data entry. In addition, XML allows the submission of sample-based data, whereas the manual data entry of sample-based data is not practical. In September 2010 a meeting with Member States' IT experts was held in Parma to address how to extend the use of XML files in annual reporting on zoonoses. It was decided not to extend the use of XML files in 2011 but to implement the XML submission for all table types in 2012. At that meeting, the question of whether to use Excel files for the submission of data in the Zoonoses system was also discussed, as Excel is, at the moment, the most widely used tool in the National Zoonoses Reporting systems.
At the Zoonoses Task Force meeting held in Parma on 9-10 October 2010, the possibility to use Excel files as an intermediate step to allow a smooth migration from manual data entry to submission of XML files was examined. In this context, some Member States expressed the need to set up a Working Group comprising Member States' experts. This Working Group should discuss the technical aspects and the possible solutions on how to use XML and Excel files in zoonoses, antimicrobial resistance and food-borne outbreaks for both aggregated and sample-based data and to define practical support needed by the Member States in this exercise.
In this context, it was also decided to run a pilot in 2011 on isolate-based antimicrobial resistance (AMR) data with some volunteer reporting countries using XML and the Data Collection Framework (DCF) tool and adapting the "Standard Sample Description for Food and Feed” to the specific AMR needs.
In order to concretely support the Member States in their migration to XML transfer and the submission of XML files to the Zoonoses database, it was also decided to launch a Grant call under Article 36 in order to co-finance the implementation of the XML data submission at Member States level.
TERMS OF REFERENCE AS PROVIDED BY EFSA
1. The Biological Monitoring Unit is asked to run a pilot in 2011 with volunteer Member States on the submission of AMR sample-based data in XML using the Data Collection Framework (DCF).
2. The Biological Monitoring Unit is asked to set up a Working Group of EFSA which shall:
• monitor the pilot on AMR isolate-based data submission run in 2011 and consider any potential technical difficulties regarding data extraction from the Member States’ databases and data transmission to EFSA using the Data Collection Framework tool;
• consider how to use XML and Excel files for the provision of aggregated and sample-based data on zoonoses, animal population, antimicrobial resistance and food-borne outbreaks;
• investigate the possibility of implementing and maintaining Excel Macros;
• define the IT and non-IT support that the Member States expect from EFSA related to use of XML files and/or Excel files in the reporting;
• report on the progress done to the Task Force on Zoonoses Data Collection; and
• prepare an EFSA technical report on the subjects.
Members of the Biological Monitoring Unit and members of the EFSA's ITOP Unit will also participate in the Working Group.
1. Introduction
The European Food Safety Authority (EFSA) is charged with coordinating the annual reporting of zoonoses, zoonotic agents, animal population, antimicrobial resistance and food-borne outbreaks in the European Union under the Directive 2003/99/EC⁴ as well as analysing and summarising the data collected. EFSA’s Biological Monitoring Unit (BIOMO) is promoting the use of XML in the data transmission of zoonoses, animal population, antimicrobial resistance and food-borne outbreaks data to EFSA. This is to both automate the annual submission of data and to improve the quality of the information received by minimising errors due to manual data entry. In this context, EFSA has set up a Working Group in order to agree how EFSA can best support the Member States in the use of Excel and XML files.
As defined in the terms of reference, the “Working Group on the use of XML and Excel files in the Zoonoses system” has discussed the following issues:
• outcomes of the pilot on Antimicrobial Resistance (AMR) isolate-based data submission;
• use of XML and Excel files for the provision of aggregated and sample-based data on zoonoses, animal population, antimicrobial resistance and food-borne outbreaks, including the possibility to implement and maintain Excel macros;
• IT and non-IT support that the Member States expect from EFSA related to use of XML files and/or Excel files in the reporting.
2. AMR Isolate-based pilot
EFSA analysed data on antimicrobial resistance (AMR) in depth during 2009 and 2010 by producing two Community Summary reports covering the years 2004-2008 and by finalising, in close collaboration with the European Centre for Disease Prevention and Control (ECDC), a third Summary Report for the year 2009. The AMR data analysed in those reports were reported by the Member States as aggregated data.
An AMR expert Working Group was set up in 2010 by EFSA to propose specifications on the most appropriate and scientifically sound way of reporting and analysing AMR data at EU level. Although work is still ongoing, the Working Group has already acknowledged the need for AMR data to be collected and analysed at bacterial isolate level. This proposal was recently discussed with the Task Force on Zoonoses Data Collection and was well received. A number of reporting countries put themselves forward to participate in a pilot AMR isolate-based data collection in 2011.
2.1. Scope of the AMR pilot
The AMR pilot had two main objectives. The first objective was to test the collection of AMR data at a deeper level of granularity and verify whether the collection of data at isolate level enables more in-depth scientific analysis. AMR isolate level data gives access to the multi-resistance patterns of isolates, and then allows identification of multi-resistance and description of clonal spreading of resistance. Furthermore, information on multi-resistance is of utmost importance to investigate the association between antimicrobial use and antimicrobial resistance.
The second objective, more technical and analysed in this report, was to test the submission of XML files and Excel files to the Zoonoses system using the Data Collection Framework (DCF). In addition, it was also possible to test the aggregation of isolate-based data and the migration of the aggregated data into the Zoonoses database.
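To illustrate the first objective, the sketch below shows why isolate-level rows expose multi-resistance patterns that aggregated counts cannot. It is a minimal Python example under stated assumptions: the field names (`isolate`, `substance`, `resistant`) and the `min_substances` threshold are hypothetical and are not taken from the AMR data model or from any EFSA definition of multi-resistance.

```python
from collections import defaultdict


def resistance_patterns(rows):
    """Map each isolate to the sorted tuple of substances it is resistant to."""
    patterns = defaultdict(set)
    for row in rows:
        pattern = patterns[row["isolate"]]  # also registers fully susceptible isolates
        if row["resistant"]:
            pattern.add(row["substance"])
    return {isolate: tuple(sorted(subs)) for isolate, subs in patterns.items()}


def multi_resistant(rows, min_substances):
    """List isolates resistant to at least min_substances substances.

    The threshold is a caller-supplied assumption, not an official definition.
    """
    patterns = resistance_patterns(rows)
    return sorted(i for i, subs in patterns.items() if len(subs) >= min_substances)
```

With aggregated data only the number of resistant isolates per substance is known, so the per-isolate combination returned by `resistance_patterns` cannot be reconstructed.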
4 Directive 2003/99/EC of the European Parliament and of the Council of 17 November 2003 on the monitoring of zoonoses and zoonotic agents, amending Council Decision 90/424/EEC and repealing Council Directive 92/117/EEC. OJ L 325, 12.12.2003, p. 31–40.
2.2. AMR isolate-based data model
The AMR data model used in the pilot exercise was developed by EFSA together with the “AMR expert Working Group” and based on EFSA’s “Standard Sample Description for food and feed (SSD)” model (EFSA, 2010a). After the analysis of the AMR data collection needs, it was concluded that some data fields needed for the collection of AMR isolate-based data were missing in the SSD model, and it was decided to use an “ad-hoc” data model based on the SSD model but integrating the missing parts.
The so-called “AMR data model” used in the pilot exercise is shown in Annex A. This model will probably need to be further revised and corrected by AMR experts in order to improve the collection and analysis of AMR isolate-based data. These modifications, or extensions, are outside the scope of this technical report.
This Working Group, as an outcome of the pilot exercise, performed an in-depth analysis of the differences between the SSD model and the model used in the AMR pilot and makes some proposals on how to extend the SSD model in order to accommodate the AMR isolate-based data collection needs. The results of this comparison are reported in “Annex B. - Proposed revision of the SSD model to accommodate AMR isolate-based reporting” and are for consideration by the future “SSD Review Committee” to be set up by EFSA.
2.3. Outcomes of the AMR pilot
2.3.1. The submission process
During the AMR pilot two alternative submission routes were made available by EFSA to the Reporting Countries as visualized in Figure 1: manual submission of Excel files or manual submission of XML files. The possibility to use web-services instead of manual submission, which is available in DCF, was not tested or used by any reporting country during the pilot exercise.
Figure 1: Submission process in the AMR pilot
As shown in the diagram, some countries manually entered the data in a specific “data entry” spreadsheet developed by EFSA and included in the provided “Excel pilot workbook”. The information was automatically codified in a “to be submitted” Excel spreadsheet. Other countries exported the data from their database directly into Excel, either in the format requested by EFSA (i.e. the “to be submitted” format) or in the “data entry” spreadsheet. At this point some countries created an XML file from the “to be submitted” file, whereas other countries decided to export the data from their database directly into XML files. At the end of this process, either an XML file or an Excel file was ready to be submitted to DCF.
The final stage was for EFSA to aggregate the submitted data and return a summary of the aggregation to the Member States for review. At the end of the pilot, with the consensus of the reporting country, some aggregated tables generated from the isolate-based data provided in the pilot were migrated into the Zoonoses database and used as official “AMR Quantitative Tables”, which are used for the preparation of the 2010 EU Summary Report.
The following text summarises the conclusions based on issues encountered along the process steps.
2.3.2. Data entry to Excel
The Excel template provided by EFSA proved its worth. Manual data entry into this Excel file was more convenient than into the "Zoonoses Web Reporting Application". However, a very large Excel file can cause memory issues and difficulties during transmission. A consideration for the future may be to distribute its content across more than one file.
In spite of the acknowledged support of the Excel file, this stage proved to require the highest workload because the source data must be mapped to set pick lists and field definitions.
The length of the pick lists, and especially the composition of non-hierarchical information within the “food categories or animal species”, “sampling stage” and “sampling context” pick lists, made this mapping a daunting task. A revision of these pick lists by EFSA is therefore suggested. Beyond this, the harmonisation of controlled terminologies between EFSA and ECDC is desirable to reduce mapping efforts.
The automatic coding of entered terms which was implemented in the Excel tool provided in the pilot functioned flawlessly (on condition that the entered term was valid). This was of significant benefit to Member States.
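The automatic coding step can be pictured as a lookup from a human-readable term to its code that rejects any term not on the pick list. The following is a minimal Python sketch of that idea, not the Excel tool used in the pilot; the pick-list terms and codes are invented placeholders and do not come from EFSA's controlled terminology.

```python
# Hypothetical pick list: terms and codes are placeholders for illustration only.
SAMPLING_STAGE_CODES = {
    "farm": "ST01",
    "slaughterhouse": "ST02",
    "retail": "ST03",
}


def code_term(term, pick_list):
    """Return the code for a term, or raise ValueError if the term is not valid."""
    key = term.strip().lower()
    if key not in pick_list:
        raise ValueError(f"'{term}' is not a valid pick-list term")
    return pick_list[key]


def code_column(terms, pick_list):
    """Code a whole column and collect invalid terms instead of stopping at the first."""
    coded, invalid = [], []
    for row_number, term in enumerate(terms, start=1):
        try:
            coded.append(code_term(term, pick_list))
        except ValueError:
            coded.append(None)
            invalid.append((row_number, term))
    return coded, invalid
```

Reporting all invalid terms with their row numbers, rather than failing on the first one, mirrors the condition noted above: coding only works when the entered term is valid, so the reporter needs to see which entries are not.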
In addition to the substantial support for the participants supplied by EFSA (i.e. web-conferences at the beginning of the project, data dictionary, XML schema documentation and diagram, assistance by email) even further documentation would be useful for getting started in such an exercise.
2.3.3. Transformation from Excel into XML
Due to the flat hierarchy of the schema used in the AMR pilot, the transformation of an Excel file to a well-structured XML file was straightforward. It is advisable to use exactly the same header as defined in EFSA's "Guidance on Data Exchange" for the Standard Sample Description message, as the one used for the pilot was, by mistake, slightly different and needed corrections.
If EFSA provides or suggests an IT tool for the transformation of data to XML, it is recommended to suggest a free tool and not a commercial product, or at least to complement any proprietary tool with one open source option (in the same spirit as the suggestion to use Comma Separated Values (CSV) files instead of Excel files).
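Because the schema is flat, each spreadsheet row maps to one XML record and each column to one child element. The Python sketch below illustrates this with the standard library only, starting from CSV text exported from a spreadsheet. It is an illustration under assumptions: the `result` element and the column names are hypothetical, the header block required by the "Guidance on Data Exchange" is not produced, and column names are assumed to be valid XML element names.

```python
import csv
import io
import xml.etree.ElementTree as ET


def rows_to_xml(csv_text, root_tag="dataset", row_tag="result"):
    """Turn each CSV row into one child element; each column becomes a sub-element."""
    root = ET.Element(root_tag)
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = ET.SubElement(root, row_tag)
        for column, value in row.items():
            if value:  # skip empty cells rather than emit empty elements
                ET.SubElement(record, column).text = value
    return ET.tostring(root, encoding="unicode")
```

A real transformation would additionally validate the output against the published XML schema; this sketch only shows why a flat model keeps the mapping simple.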
2.3.4. Submission to Data Collection Framework (DCF)
The file upload via DCF worked well. However, the need to limit the size of uploaded files was highlighted since large files can cause a time out during the submission phase. It is suggested to split the data into several smaller files and to alert the reporters to the possible use of .zip files.
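The suggestion to split large submissions and compress them can be sketched as follows in Python. This is a minimal illustration using only the standard library; the chunk size, the file names and the choice of one file per archive are assumptions, since the report does not specify what DCF expects.

```python
import io
import zipfile


def split_rows(header, rows, max_rows):
    """Yield CSV text chunks of at most max_rows data rows, each repeating the header."""
    for start in range(0, len(rows), max_rows):
        chunk = rows[start:start + max_rows]
        yield "\n".join([header] + chunk) + "\n"


def zip_text(member_name, text):
    """Compress one text file into an in-memory .zip archive and return its bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(member_name, text)
    return buffer.getvalue()
```

Repeating the header in every chunk keeps each file independently loadable, so a time-out on one upload does not invalidate the others.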
To accelerate the data testing and correction cycle, access to DCF should be granted to Member State data managers and not only to the zoonoses reporting officers.
Concerning the enhancements of DCF the following points were highlighted:
• Currently DCF supports only Excel files in ‘xls’ format. EFSA should investigate the possibility to support ‘xlsx’ format as well.
• The result of loading data “Partially inserted” is misleading as it indicates that there was an error but not clearly so. Use of the phrase “Error(s) in loaded data” would be clearer.
• The user interface could be more user friendly with regard to the design, alignment, labels and colours of buttons.
• The performance of DCF in general should be improved in terms of responsiveness of the application.
2.3.5. Data aggregation
The aggregation of the isolate-based data revealed errors within the provided data that were not detected by data validation (e.g. due to wrong “isolate codes” reported in the pilot or rows not submitted to EFSA). The pilot showed the need for human review of the submitted data to ensure completeness and correctness of the reported samples. It also showed the need for business rules at two levels: rules applied in DCF to the submitted data, and additional “business rules” applied by EFSA to the aggregated data.
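The kind of error described above can be made concrete with a small Python sketch: each row may be valid on its own, yet a repeated isolate code silently inflates the aggregated counts. The field names and the duplicate check are hypothetical illustrations, not the aggregation or the business rules actually used by EFSA.

```python
from collections import Counter, defaultdict


def find_duplicate_isolates(rows):
    """Return (isolate, substance) pairs that were reported more than once."""
    counts = Counter((r["isolate"], r["substance"]) for r in rows)
    return sorted(key for key, n in counts.items() if n > 1)


def aggregate(rows):
    """Count tested and resistant isolates per (species, substance)."""
    table = defaultdict(lambda: {"tested": 0, "resistant": 0})
    for r in rows:
        cell = table[(r["species"], r["substance"])]
        cell["tested"] += 1
        if r["resistant"]:
            cell["resistant"] += 1
    return dict(table)
```

Running a check such as `find_duplicate_isolates` before `aggregate` is one example of a rule on submitted data; comparing the aggregated totals with the number of isolates a country expects to have reported would be a rule on the aggregated data.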
2.3.6. Overview of the AMR pilot
It can be concluded that the pilot was a success with the contributing Member States all able to provide isolate-based AMR data within the agreed three month project window. Furthermore, from a technical point of view, no major problems were encountered during any phase of the pilot and where minor issues were raised these were resolved quickly and efficiently.
With regard to the tools made available by EFSA, the Excel template received praise for its ease of use and provided a common launch pad for those submitting Excel files to DCF. Similarly, the mechanism whereby the Excel files were transformed to XML was simple and straightforward. Finally, from a technical point of view, the submission of files through the DCF interface was trouble-free, even though the usability of DCF could be improved.
Some elements of the process were more time-consuming, one obvious candidate being the mapping of the original data to EFSA's pick lists. In fact, this data management step was identified as the main workload of the project. Any facilitation of this task would be most useful and will therefore be considered for future projects. Provision of more documentation on the process would also be useful. Although not as significant a draw on resources, it was noted that time was also spent validating the submitted data once it had been aggregated into summary reports. Development of business rules to validate Member States’ submissions and ensure data quality, as mentioned above, should reduce the need for this over time. Overall, it was felt that the pilot was a valuable step in the evolution of zoonoses data collection.
3. Submission of XML and Excel files in the Zoonoses system
3.1. Milestones of the XML submission in Zoonoses
The submission of XML files to the Zoonoses system started in 2008 with a pilot project for submission of food-borne outbreaks (FBO) data and continued in 2009 with an extended pilot for submission of AMR aggregated data. For this purpose, EFSA developed an XML schema supporting the submission of XML files for both FBO and AMR aggregated data. This FBO and AMR XML Schema, developed in 2008 and extended in 2009, was based on a so-called “generic model” with the intent to support the submission of all tables in Zoonoses by using the same generic XML Schema. The complex structure of the XML Schema supporting the generic model and the complexity of the validation of the pick lists discouraged the Member States from using this schema for reporting zoonoses data to EFSA; only two countries, from 2008 to 2011, were able to submit XML files using the generic model.
In 2010, EFSA ran a survey of the Member States in order to understand the status of the national data collection systems in place in the different Member States and their willingness to submit data to EFSA using XML files. The survey showed the evident need to develop a system able to support not only the submission of XML files but also the submission of Excel files, and to develop a plan for XML/Excel submission together with the Member States. For this reason, EFSA decided to hold the first Zoonoses IT Task Force meeting with the Member States’ IT experts, to discuss the outcome of the survey and to better plan the transmission of XML files to the Zoonoses database. At that meeting, it was decided to put on hold the development of XML submission and have one year of discussion, piloting and trial. To follow up these decisions, in 2011 EFSA set up the “Working Group on the use of XML and Excel files in the Zoonoses system” (the group responsible for this technical report). It was also decided to concretely support the Member States by launching a call for Grants under Article 36 in order to co-finance 20-month projects for the development of XML submission of Zoonoses data, and to run a pilot (using DCF) for the submission of AMR isolate-based data using XML and Excel files.
To conclude these initiatives and to discuss and plan the next steps, EFSA organised the 2nd Zoonoses IT Task Force meeting with the Member States’ IT experts on 21-22 September 2011.
The following paragraphs summarise the decisions taken and suggestions proposed to the Task Force on Zoonoses Data Collection by the experts of the “Working Group on the use of XML and Excel files in the Zoonoses system”, taking into consideration the conclusions of the 2nd Zoonoses IT Task Force meeting.
3.2. Current submission of data in Zoonoses
An overview of the Zoonoses Reporting System in place for the reporting season in 2011 is shown in Figure 2.
Figure 2: Zoonoses Reporting System 2011
In 2011, EFSA supported:
• The “Zoonoses Web Reporting Application” for the manual provision of aggregated data in the Zoonoses database.
• Two “loaders”: one for the ‘generic model’ to report FBO and AMR aggregated data using XML files only, and one for the ‘simple flat model’ to report AMR isolate-based data to DCF using XML or Excel files.
• Two “User Management modules”: one in Zoonoses and one in DCF.
• Two “Dictionary Management modules”: one in Zoonoses and one in DCF.
The migration of the AMR isolate-based data submitted to DCF during the pilot was performed by EFSA (after aggregation) and after validation (via email) by the Member States.
This reporting system is not efficient due to the high costs of maintenance at EFSA (duplication of systems), and for the Member States, as they have to interface two different systems.
3.3. Future submission of data in Zoonoses
In order to improve the current system, to minimise the maintenance costs and to allow the use of one single system, EFSA, together with the IT experts of the Member States at the last Zoonoses IT Task Force meeting, agreed to use DCF for the provision of XML and Excel files to EFSA, and to support also the submission of CSV files. Therefore, the scenario foreseen for the future Zoonoses reporting system is illustrated in Figure 3:
Figure 3: Zoonoses Reporting System in future
In this scenario, EFSA will support:
• The “Zoonoses Web Reporting Application” for the manual provision of aggregated data in the Zoonoses database. This web application is to be maintained until all the Member States are able to submit all the data via XML or Excel/CSV files.
• One single XML/Excel/CSV “loader” in DCF supporting ‘simple flat models’ for reporting aggregated data, narrative text forms, isolate-based or sample-based data using XML, Excel or CSV files.
• One single “Dictionary Management module”.
• One single “User Management module”.
In the new system, aggregated data submitted through DCF will be automatically imported in the Zoonoses database, and isolate-based or sample-based data submitted through DCF will be automatically aggregated and then migrated into the Zoonoses database, where the Member States will be able to access and validate the submitted data.
Data submitted via XML/Excel/CSV should be corrected or amended by submitting a new XML/Excel/CSV file instead of manually correcting through the “Zoonoses Web Reporting Application”.
3.4. Submission of Zoonoses data through DCF
In the scenario described in the previous paragraph, Member States will have several possibilities for submitting Zoonoses data to EFSA. The process of preparing and submitting Excel, CSV or XML files to the Zoonoses database via DCF is illustrated in Figure 4, where all the possibilities provided to the Member States are covered.
Figure 4: Submission process in future
3.4.1. Exchange Protocol
For the exchange protocol, the Zoonoses system will embrace the manual (manual post) and automatic (web-services or FTP) submissions of XML/Excel/CSV files already supported in the DCF system and defined in the EFSA’s "Guidance on data exchange” (EFSA, 2010b).
Any future modification of this guidance should also be discussed and agreed with all relevant experts, including Zoonoses experts, in order to take into consideration the needs and requirements of the provision of Zoonoses data to EFSA. This is anticipated to be the function of the Standard Sample Description (SSD) and Data Transmission review committee which EFSA intends to convene.
In this context, this Working Group suggests that DCF should accept, for “manual post”, XML messages having “dataset” as root element. It turned out that DCF ignores all header information contained in an XML file when it is uploaded manually. For automatic submissions, all the header elements of the “message” entity should be mandatory. The presence of the “message” element should be checked only in the latter case. The SSD XML Schema is already able to validate documents having either “message” or “dataset” as root element.
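The rule proposed above can be sketched as a small check on the root element that depends on the submission mode. This Python example is only an illustration of the logic, not DCF behaviour: the child element name `header` is an assumption, XML namespaces are ignored, and the individual mandatory header elements are not checked.

```python
import xml.etree.ElementTree as ET


def check_root(xml_text, mode):
    """Return a list of problems for a 'manual' or 'automatic' submission."""
    root = ET.fromstring(xml_text)
    problems = []
    if mode == "manual":
        # Manual post: header content is ignored, so either root is acceptable.
        if root.tag not in ("message", "dataset"):
            problems.append(f"unexpected root element <{root.tag}>")
    elif mode == "automatic":
        if root.tag != "message":
            problems.append("automatic submissions need <message> as root element")
        elif root.find("header") is None:  # 'header' is an assumed element name
            problems.append("<message> has no header")
    else:
        raise ValueError(f"unknown mode: {mode}")
    return problems
```

For example, a document with `dataset` as root passes in `manual` mode but is reported as a problem in `automatic` mode.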
3.4.2. Validation of pick lists in DCF
For the submission of XML, Excel and CSV files, validation of the pick list terms is undertaken by DCF during the upload process.
In order to minimise the maintenance of the XML schema and Excel mapping tools (see 3.7.2. Excel macros and Excel tools), it is suggested that the validation of pick list terms be kept inside the DCF loading system and not be included inside the XML Schema. In this way, the pick list validation will be the same whether data are submitted in XML, Excel or CSV format.
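The benefit of keeping pick-list validation in the loader rather than in the XML Schema can be sketched as follows: every input format is first parsed into the same list of rows, and a single validation function is applied afterwards. This Python example is a simplified illustration, not the DCF implementation; the pick-list content, field names and element names are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical pick list; the codes are placeholders for illustration only.
PICK_LISTS = {"substance": {"AMP", "CIP", "TET"}}


def rows_from_csv(text):
    """Parse CSV text into a list of {field: value} rows."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def rows_from_xml(text):
    """Parse a flat XML dataset into the same list-of-rows structure."""
    root = ET.fromstring(text)
    return [{child.tag: child.text for child in record} for record in root]


def validate_pick_lists(rows, pick_lists):
    """Return (row number, field, value) for every term not on its pick list."""
    errors = []
    for number, row in enumerate(rows, start=1):
        for field, allowed in pick_lists.items():
            value = row.get(field)
            if value not in allowed:
                errors.append((number, field, value))
    return errors
```

Updating a pick list then means changing one data structure, with no need to re-issue the XML Schema or the Excel tools.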
3.4.3. Implementation of business rules in DCF
Business rules applicable to the Zoonoses data collection should all be implemented inside DCF. As in the case of validation of pick list terms, this will minimise the maintenance costs and allow the use of the same tool for all input types, i.e. the same business rules will be used when submitting data in XML, in Excel or in CSV format.
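One way to picture centrally implemented business rules is a list of small rule functions applied to every row after parsing, whatever the input format was. The Python sketch below is purely illustrative: both rules and all field names (`mic`, `sampling_year`, `analysis_year`) are hypothetical examples and are not rules defined by EFSA.

```python
def rule_positive_mic(row):
    """Hypothetical rule: a reported MIC value must be a positive number."""
    value = row.get("mic")
    if value in (None, ""):
        return None  # nothing reported, nothing to check
    try:
        ok = float(value) > 0
    except ValueError:
        ok = False
    return None if ok else f"mic must be a positive number, got {value!r}"


def rule_sampling_before_analysis(row):
    """Hypothetical rule: the sampling year cannot be later than the analysis year."""
    if int(row["sampling_year"]) > int(row["analysis_year"]):
        return "sampling year is later than analysis year"
    return None


BUSINESS_RULES = [rule_positive_mic, rule_sampling_before_analysis]


def apply_rules(rows, rules=BUSINESS_RULES):
    """Return (row number, rule name, message) for every violated rule."""
    findings = []
    for number, row in enumerate(rows, start=1):
        for rule in rules:
            message = rule(row)
            if message:
                findings.append((number, rule.__name__, message))
    return findings
```

Adding a rule then amounts to appending one function to the list, which takes effect for XML, Excel and CSV submissions alike.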
3.5. Supported XML schema for Zoonoses in DCF
In 2011, the Zoonoses reporting system supported two different types of XML schema.
The ‘Zoonoses generic model’, used from 2008 to 2011, is based on the principle that any data collection can be mapped to a model that answers the following logical questions: “What”, “Who”, “Where”, “When” and “Why”. Such a model can accommodate different data collection needs, as different tables can be mapped against the same list of generic attributes and elements. In practical terms, this means that one generic data model (and therefore the same XML schema) can be used to collect several tables, which is the case of the Zoonoses data collection system.
The ‘AMR Isolate-based model’ is based, instead, on a simple structure where the table’s data elements are mapped against a simple flat file. This model, introduced in spring 2011, was shown to be easily understood and implemented; after its publication, twelve Member States involved in the AMR pilot project were able to submit data to EFSA within weeks.
A comparison between the two different schemas is provided in the following paragraphs.
3.5.1. XML schema: generic data model vs flat data model
The use of the original generic schema was advocated by the attendees at the 1st Zoonoses IT Task Force in September 2010. Whilst the schema was considered comprehensive and allowed one data model to be used for all zoonosis data types, it was also very complex.
The benefits of a single schema include low maintenance and support for the one-to-many relationships inherent in the zoonosis sample data between samples and analysis results. Use of a schema supporting such relationships would reduce the volume of data to be transmitted from Member States to EFSA due to non-repetition of sample information for each result associated with the sample.
The main disadvantage of the generic schema is its complexity. In addition, many Member States use Excel as a mid-step in applying EFSA codes to their national data prior to creation of XML data per the schema and this process is not available when using the complex generic schema.
The flat file schema was discussed in depth at the 2nd Zoonoses IT Task Force meeting in September 2011. The discussion groups were unanimous in advocating adoption of the simple flat file schema. Significantly, the only Member State to have used the generic schema for AMR data submission also advocated adoption of the simple flat file schema in preference to the generic one.
The main benefit of the flat file schema is its simplicity, which makes it readily understandable. In addition, the support EFSA provided for its use in the AMR XML Pilot project, including the Excel mid-step for EFSA code creation, made its use straightforward for Member States. Although the flat file schema should be kept as simple as possible, the use of ‘choice elements’ in the schema could sometimes be considered (e.g. in the AMR data model, two alternative mandatory sections could be implemented: one to be used for dilution results and one for diffusion results).
There are challenges associated with the simple schema. These include:
• Necessity for EFSA to maintain separate schemas for each of the Zoonoses data types
• Risk of incorrect de-normalisation even when starting from a correct dataset
• Data volumes are increased due to de-normalisation
Data volume implications are not significant when transmitting aggregated data for Zoonoses or AMR. Data volume is also not an issue for sample-based AMR, which is typically a relatively small dataset. If a move to sample-based data reporting for Zoonoses were to be piloted, data volumes for some larger Member States could become a challenge, and this would need to be explored with the Member States affected. However, experience in other EFSA data collection projects using DCF has shown that the benefits of a simple flat schema far outweigh any issues associated with data volumes.
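The effect of de-normalisation on data volume can be illustrated with a minimal sketch. It assumes a hypothetical nested input in which one sample carries several analysis results; the field name `sampleId` and all codes are placeholders chosen for illustration, not EFSA data elements or pick list terms.

```python
# Hypothetical normalised input: one sample with several analysis results.
# "sampleId" and the code values are illustrative placeholders only.
samples = [
    {
        "sampleId": "S1",
        "sampY": 2011,
        "sampType": "00042",
        "results": [
            {"substance": "00101", "MIC": "0.5"},
            {"substance": "00102", "MIC": "4"},
        ],
    },
]


def flatten(samples):
    """Return one flat row per result, repeating the sample fields on each row."""
    rows = []
    for sample in samples:
        # Everything except the nested results belongs to the sample itself.
        sample_fields = {k: v for k, v in sample.items() if k != "results"}
        for result in sample["results"]:
            rows.append({**sample_fields, **result})
    return rows


flat_rows = flatten(samples)
```

Each result row repeats the sample fields, which is where the extra volume of the flat model comes from; a sample with no results produces no rows in this sketch.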
In conclusion, Member States are strongly in favour of the use of the simple flat file schema for future Zoonoses data collection needs. It is also recommended (where the tables involved are similar) to minimise the number of simple flat file schemas used, as in the case of the “Serovar” and “Phagetypes” tables. This means that for future Zoonoses data collection needs, the following simple flat file schemas should be developed:
a. Prevalence tables
b. Serovar and Phagetypes tables
c. AMR Isolate-based tables
d. AMR Quantitative (Aggregated) tables
e. AMR Qualitative (Aggregated) tables
f. AMR Cut-off values tables
g. Animal population table
h. Disease Status tables
i. FBO tables
j. Text Forms
Concerning the use of simple flat file schemas, there are technical aspects to be considered in more detail, and reporting solutions to be suggested, that are outside the scope of this technical report. For example, when reporting aggregated data in Zoonoses for prevalence tables, the information regarding serovars and phagetypes should be an optional section in order to allow the reporting of no positive findings. In this way, the simple flat schema for Prevalence tables can support both the submission of no positive results and the submission of results where more than one serovar or phagetype has been identified from a sample. All these technical aspects are left to EFSA for analysis and are to be addressed in the documentation provided to the Member States (see 4.1 - Documentation).
3.6. Controlled Terminology maintenance
The adoption of the simple flat file schema has no implications for validation of correct values against the EFSA’s controlled terminology values (also known as ‘pick lists’). Since pick list validation will be done on the DCF platform, rather than in the published XML schema (see 3.4.2- Validation of pick lists in DCF), pick list validation will remain the same for Member States submitting data in XML, in Excel or in CSV format.
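A minimal sketch of what a format-independent pick list check might look like is given below. It is not the DCF implementation; the pick list contents are invented placeholders, and the function name `find_invalid_terms` is hypothetical. The point is only that the same check applies to a row regardless of whether it was read from XML, Excel or CSV.

```python
# Hypothetical pick lists keyed by data element short name.
# The codes are placeholders, not real EFSA controlled terminology.
PICK_LISTS = {
    "repCountry": {"AT", "IT", "SE"},
    "method": {"001", "002"},
}


def find_invalid_terms(row, pick_lists=PICK_LISTS):
    """Return (field, value) pairs whose value is not in the field's pick list.

    Fields without a pick list (free text, numbers) are not checked here.
    """
    errors = []
    for field, allowed in pick_lists.items():
        value = row.get(field)
        if value is not None and value not in allowed:
            errors.append((field, value))
    return errors


ok_row = {"repCountry": "AT", "method": "001"}
bad_row = {"repCountry": "XX", "method": "001", "labCode": "any free text"}
```

Here `find_invalid_terms(ok_row)` returns an empty list, while `bad_row` is reported for its `repCountry` value only.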
In this context, it is crucial, in order to support the reporting of data to EFSA, that an effective publication and maintenance system for the EFSA’s controlled terminology be created, together with a notification system to inform the Member States about any modification to the published pick lists. The following paragraphs cover these requirements in detail.
3.6.1. Publication of pick lists
Pick lists are to be made available to Member States on the Internet (i.e. the EFSA Website and through web-services).
Member States should be able to download the pick lists from the Internet or to subscribe to a dedicated web-service tool which will allow the synchronisation of the EFSA’s pick lists with Member States’ copies stored in the national systems.
Both a full format of all controlled terminology values and a list of changes since the previous version will be made available for download or synchronisation to facilitate Member States who implement mapping in their system from national values to EFSA values.
There are a number of options for the format of publication, including XML, Excel and CSV, all of which should be supported.
3.6.2. Update of pick lists
Pick lists for Member States’ use in the submission of data in XML, Excel or CSV format should be made available as early as possible. The Working Group suggests that EFSA should define a mechanism for ongoing identification and notification to EFSA of new values that will be needed during the reporting season. This mechanism should be available to Member States throughout the year, as analysis results are continuously recorded in the national systems.
The EFSA proposal for annual amendments should be published well in advance of the reporting period of each year. This will allow consideration of the suggestions by EFSA and the Zoonoses Task Force group in good time and should contribute to minimising the changes necessary during the reporting period.
Pick lists will include for each value the ‘Valid from (date)’, ‘Valid to (date)’, ‘Last update (date)’ information to facilitate identification of the changes, the dates of changes and the period for which values were valid.
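As a sketch of how these three dates could be used by a national system, the following assumes each pick list value is held as a record with `valid_from`, `valid_to` and `last_update` keys (hypothetical names), that `valid_to` is inclusive, and that a missing `valid_to` (`None`) means the value is still valid. These conventions are assumptions, not part of the EFSA specification.

```python
from datetime import date

# Hypothetical pick list records; codes and dates are placeholders.
terms = [
    {"code": "00123", "valid_from": date(2010, 1, 1),
     "valid_to": date(2011, 12, 31), "last_update": date(2011, 6, 1)},
    {"code": "00124", "valid_from": date(2011, 1, 1),
     "valid_to": None, "last_update": date(2011, 1, 1)},
]


def is_valid_on(term, day):
    """True if 'day' falls within the term's validity period.

    Assumes an inclusive 'valid_to' and treats None as open-ended.
    """
    if day < term["valid_from"]:
        return False
    return term["valid_to"] is None or day <= term["valid_to"]


def updated_since(terms, since):
    """Codes of the terms whose 'last_update' is later than 'since'."""
    return [t["code"] for t in terms if t["last_update"] > since]
```

A reporting officer could call `updated_since` with the date of the last synchronisation to see which values need attention.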
Notification to Member States of changes should include identification of insertions, amendments and deletions and should be sufficiently detailed to allow reporting officers to know whether the changes are relevant to their Member State’s data.
Where they are not applicable, ensuring synchronisation of the Member States’ national information system and the EFSA pick lists may not be urgent.
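One way a change list of this kind could be derived is by comparing two published versions of a pick list. The sketch below assumes each version is available as a simple mapping from code to label (an illustrative representation, not the EFSA publication format); the function name `diff_pick_lists` is hypothetical.

```python
def diff_pick_lists(old, new):
    """Compare two versions of a pick list given as {code: label} mappings.

    Returns the codes that were inserted, amended (same code, different
    label) or deleted between the old and the new version.
    """
    old_codes, new_codes = set(old), set(new)
    return {
        "inserted": sorted(new_codes - old_codes),
        "amended": sorted(c for c in old_codes & new_codes if old[c] != new[c]),
        "deleted": sorted(old_codes - new_codes),
    }


# Placeholder codes and labels for illustration only.
previous = {"001": "term A", "002": "term B", "003": "term C"}
current = {"001": "term A", "002": "term B (revised)", "004": "term D"}
changes = diff_pick_lists(previous, current)
```

A Member State can then check the three lists against the codes it actually uses in its national mapping to decide whether synchronisation is urgent.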
3.6.3. Notification to Member States
In order to allow data transmitters to register their interest and receive automatic notifications of changes in the EFSA’s pick lists, the Working Group proposes that interested users in Member States (reporting officers as well as interested national reporters or data managers) be able to subscribe to a notification system that provides full lists, change lists and the re-issued Excel Mapping Tool (see 3.7.2 - Excel macros and Excel tools) whenever these are updated by EFSA. Subscribers should be able to choose their preferred download format in the subscription setup.
3.7. Support of Excel and CSV files
The DCF allows the submission of Excel files and also CSV files without additional need of enhancing the DCF. The Working Group suggests supporting both formats for the submission of Zoonoses data to EFSA.
The CSV file format is not proprietary, has no limitation on the number of rows that can be provided and is not linked to a specific version of any proprietary tool; it can therefore be better supported over time. Member States can use Excel to prepare the data for submission and then convert it to CSV format to minimise possible errors due to Excel handling.
Excel can also be used as an intermediate step to transform the national data into XML files to be submitted to EFSA. If this is the case, EFSA should provide any needed support and guidelines for this transformation. It is nevertheless not recommended to develop a specific tool to be distributed to the Member States for converting Excel files into XML files.
3.7.1. Excel data models
The data models defined by EFSA for the provision of both aggregated and sample-based data should be available also in Excel format and not only in XSD format.
The data models in Excel format should indicate:
• The data elements to be provided for each type of table, with these details:
o Data element code (to be used as reference code in the data model)
o Data element short name (to be used in the corresponding XML schema)
o Data element long name (to be used to refer to the data element in the guidelines)
o Data element label (to be used in the "Zoonoses Web Reporting Application" or in PDF)
o Data element description
• The type of the data elements (i.e. numeric, text, date)
• The length of the data fields
• The pick lists to be used for the data fields, where a pick list is applicable
• Information about mandatory data fields and optional data fields
• Business rules to be applied to the data elements
• Some examples of valid data elements
3.7.2. Excel macros and Excel tools
For optimising and enhancing the use of Excel and CSV files, EFSA should support the Member States as much as possible especially in the mapping of the national pick lists onto the EFSA pick lists and in the translation of the data for submission into “codes” accepted by DCF.
In this context, the Working Group suggests EFSA should not develop and distribute ad-hoc Excel Macros but should provide Member States with precise guidelines and examples on how to map and codify the national pick lists against the EFSA pick lists and how to prepare an Excel or CSV file ready to be uploaded in DCF.
A simple mapping tool in Excel format, like the one provided during the AMR Isolate-based pilot, should be provided by EFSA to support the Member States. This can be achieved for example by providing three spreadsheets:
• Spreadsheet_1, with the data fields extracted from the national reporting system in the order defined by the data model
• Spreadsheet_2, with the EFSA pick lists, into which the national pick lists are imported for mapping
• Spreadsheet_3, where, by means of the VLOOKUP functionality and the mapping in Spreadsheet_2, the extracted data entered in Spreadsheet_1 are converted into “codes” accepted by DCF.
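The logic of the three spreadsheets can be sketched outside Excel as a simple lookup. The example below is only an illustration of the VLOOKUP idea, not the EFSA mapping tool: the national terms, the EFSA codes and the function name `map_to_efsa_codes` are all invented placeholders.

```python
# Role of Spreadsheet_1: rows extracted from the national system.
national_rows = [
    {"sampType": "Broiler", "method": "Broth dilution"},
    {"sampType": "Pig", "method": "Agar diffusion"},
]

# Role of Spreadsheet_2: mapping of national terms to EFSA codes, per field.
# All codes below are placeholders, not real EFSA pick list codes.
mappings = {
    "sampType": {"Broiler": "00011", "Pig": "00022"},
    "method": {"Broth dilution": "001"},
}


def map_to_efsa_codes(rows, mappings):
    """Role of Spreadsheet_3: replace national terms with mapped codes.

    Returns the coded rows plus a list of (row index, field, national term)
    for every value that has no mapping, so gaps can be fixed before upload.
    """
    coded, unmapped = [], []
    for index, row in enumerate(rows):
        out = dict(row)
        for field, table in mappings.items():
            if field not in row:
                continue
            if row[field] in table:
                out[field] = table[row[field]]
            else:
                unmapped.append((index, field, row[field]))
        coded.append(out)
    return coded, unmapped


coded_rows, unmapped_terms = map_to_efsa_codes(national_rows, mappings)
```

Reporting unmapped terms explicitly plays the role of the `#N/A` result that VLOOKUP shows in Excel when a national term is missing from the mapping sheet.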
This mapping tool should be available for download to the Member States.
3.7.3. Re-issuing Excel tools
When pick lists are updated, the EFSA simple mapping tool, which assists Member States in allocating EFSA codes to their national data, needs to be re-issued. The Working Group sees the maintenance of this Excel file by EFSA as a valuable support to Member States.
4. Support expected by Member States for XML/CSV/Excel submission
This Working Group discussed the support expected by the Member States in reporting XML/Excel/CSV files to Zoonoses.
Some recommendations for EFSA about the documentation, helpdesk, IT tools, training and meetings needs are reported in the next paragraphs.
4.1. Documentation to be provided to the Member States
The following documentation is suggested to support the collection of Zoonoses data using DCF via XML/CSV/Excel files:
• Publication of a “Read me” file to indicate where to find all the information about the documentation available
• Publication of data elements collected and their definitions in PDF format
• Publication of data models in Excel format (see 3.7.1 - Excel data models)
• Publication of XML schemas (both in XSD and as a diagram)
• Publication of EFSA’s pick lists and guidelines about the notification system (see 3.6.1 - Publication of pick lists and 3.6.3 - Notification to Member States)
• Publication of Excel mapping tools (see 3.7.2 - Excel macros and Excel tools)
• Publication of business rules
• Publication of meta data values applicable for the data collection (e.g. value for "dcCode" as the unique identifier of the data collection)
• Publication of sample files for each table (in CSV, Excel and XML format)
• Publication of the DCF User Manual (user interface and registration process)
4.2. Helpdesk
It is suggested that the helpdesk provided by EFSA to support the Member States in the use of DCF for Zoonoses monitoring purposes should continue. It is therefore recommended that the [email protected] mailbox be used for any technical or scientific requests for support.
4.3. Training and meetings
Force meetings, or, upon request, in the Member States (to allow wider participation of data managers and scientists from official reporting organisations and laboratories). These trainings or workshops should focus on the following topics:
• Techniques for mapping between reporting organisation’s controlled terminologies and EFSA pick lists
• Understanding XML and XSDs
• Creating XML files from Excel and the use of the XML map functionality
• Submitting data files via the DCF
• Interpretation of the DCF automated error messages
• Sharing best practice in data collection, preparation and reporting
To support these workshops, training material would be developed and, when possible, use will be made of existing training materials developed by the ECDC. The materials would include PowerPoint presentations, frequently asked questions, video clips and worked examples with Excel workbooks. The training materials would be available for download from the same location as the documentation described in point 4.1.
4.4. Supporting IT Tools
To support the creation and provision of XML/Excel/CSV files through DCF, it is suggested that EFSA support Member States with some basic tools to allow the mapping of the national pick lists against the EFSA pick lists as already described in paragraph 3.7.2 - Excel macros and Excel tools. In addition, EFSA should suggest some non-proprietary and free tools to allow the conversion of Excel and CSV files into XML files.
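As an illustration of how little is needed to convert a flat CSV file into a flat XML file with free tools, the sketch below uses only the Python standard library. The root and row element names (`dataset`, `result`) are assumptions made for the example and are not taken from any EFSA XML schema; a real conversion would have to follow the published XSD, including its message header.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_flat_xml(csv_text, root_tag="dataset", row_tag="result"):
    """Convert CSV text with a header row into a flat XML string.

    Each CSV row becomes one <row_tag> element and each non-empty cell
    becomes a child element named after its column header. Column headers
    must therefore be valid XML element names. Empty cells are omitted,
    which suits optional data elements.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    root = ET.Element(root_tag)
    for record in reader:
        row_element = ET.SubElement(root, row_tag)
        for name, value in record.items():
            if value:
                ET.SubElement(row_element, name).text = value
    return ET.tostring(root, encoding="unicode")


example_csv = "repYear,repCountry,sampD\n2011,AT,\n"
example_xml = csv_to_flat_xml(example_csv)
```

Because the flat data model maps one row to one record, the conversion needs no knowledge of relationships between rows, which is one reason the flat schema is easier to support than the generic one.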
CONCLUSIONS AND RECOMMENDATIONS
The conclusions and recommendations of this Working Group for the submission of XML and Excel files in the Zoonoses system are summarised below. For details, it is recommended to refer to the corresponding chapters of the technical report.
AMR Isolate based pilot
As an outcome of the AMR isolate-based reporting pilot it is suggested:
• to ask AMR experts to revise the list of data elements collected for AMR isolate-based analysis needs, if there is the need to improve the analysis of antimicrobial resistance tests;
• to continue the use of DCF for reporting AMR isolate-based data to Zoonoses;
• to align and harmonise the “message header” and the “transmission section” of all XML schemas used in DCF;
• to improve the DCF interface to provide better feedback and readability of data to data providers;
• pending the introduction of the Isolate concept in the SSD model (see Annex B), to consider the use of an extended SSD model for reporting AMR isolate-based data.
Automatic transfer of data to Zoonoses
With reference to automatic transfer of data to the Zoonoses database, it is recommended:
• to adopt and use the Data Collection Framework tool (DCF) and, in more detail:
o to support not only submission of Excel and XML files but also of CSV files;
o to adopt and use the validation of pick lists inside DCF;
o to adopt and use the validation of business rules inside DCF;
o to adopt and use the exchange protocols defined in the EFSA “Guidance on Data Exchange”;
• to implement one single “User Management” module;
• to implement one single “Dictionary Management” module;
• to drop the “generic data model” so far used in Zoonoses;
• to implement “flat data models” for the reporting of aggregated and sample-based data on Zoonoses;
• to review and harmonise the EFSA’s pick lists:
o between the different Data Collection projects in EFSA;
o between EFSA and ECDC;
• to implement an automatic mechanism of publishing EFSA’s pick lists and a notification system to inform Member States when there are some changes to the EFSA pick lists;
• not to develop Excel Macros but to provide some “Excel tools” to help data providers in the mapping of national pick lists against the EFSA pick lists;
• to grant access to the test environment in DCF to all data managers of the Member States.
IT and Non-IT Support
With reference to the IT and non-IT Support expected by Member States for automatic transfer of data to the Zoonoses database, it is recommended:
• to improve the documentation provided to the Member States;
• to extend the Zoonoses helpdesk in supporting XML/Excel/CSV submission;
• to organise training and ad-hoc meetings for the Member States.
The aforementioned recommendations are addressed to EFSA and to the Zoonoses Task Force to allow proper planning and implementation of XML/Excel/CSV transmission in Zoonoses.
In addition, the suggestions on how to extend the SSD data model to accommodate AMR isolate-based data collection needs are reported in “Annex B. - Proposed revision of the SSD model to accommodate AMR isolate-based reporting” and are for consideration by the future “SSD Review
REFERENCES
European Food Safety Authority 2010a; Standard sample description for food and feed. EFSA Journal 2010;8(1):1457 [54 pp.]. doi:10.2903/j.efsa.2010.1457
European Food Safety Authority 2010b; Guidance on Data Exchange. EFSA Journal 2010; 8(11):1895 [50pp.]. doi:10.2903/j.efsa.2010.1895
Langual 2010; The International Framework for Food Description 2010 http://www.langual.org/langual_Thesaurus.asp
APPENDICES
A. AMR ISOLATE-BASED DATA MODEL
| Element Code | Element Name | Short Element name for XML/Excel transfer | Type | Mandatory | Definition | Picklist | Business Rules in the XML Schema | Business Rules in SAS |
| AMR.00 | Result Code | resultCode | xs:String(60) | Mandatory | Unique identification number of the AST result (a row of the data table) in the transmitted file. The result code must be maintained at organisation level and it will be used in further update/deletion operations from the senders. See the note on resultCode below the table. | | | |
| INFO ABOUT COUNTRY AND LABORATORY | | | | | | | | |
| AMR.01 | Reporting Year | repYear | xs:decimal(4, 0) | Mandatory | Reporting Year | | >=1980 and <=2020 | |
| AMR.02 | Reporting Country | repCountry | xs:string(2) | Mandatory | Reporting Country | LK_Country | | |
| AMR.03 | Language | lang | xs:string(2) | Mandatory | Language used to fill in the free text fields | LK_Language | ="en" (for the pilot only "en" is allowed) | ="en" |
| AMR.04 | Laboratory Identification Code | labCode | xs:string(100) | Optional | Identification code of the laboratory in the country performing susceptibility test (AST) of the isolate | Free text | | |
| INFO ABOUT THE TYPE OF SAMPLE AND THE ISOLATE | | | | | | | | |
| AMR.05 | Zoonosis | zoonose | xs:string(5) | Mandatory | Zoonosis species, serovar or phagetype | LK_Zoonose | | |
| AMR.06 | Sample Type (food category or animal species) | sampType | xs:string(5) | Mandatory | Food categories or animal species of the sample | LK_Species | | |
| AMR.07 | Laboratory Isolate Code | labIsolCode | xs:string(20) | Mandatory | Alphanumeric code given to the isolate by the laboratory that performs the AST | Free text | | |
| AMR.08 | Total number of isolates available in the laboratory | labTotIsol | xs:decimal(4, 0) | Optional | Total number of isolates available in the laboratory | | >0 | |
| INFO ABOUT SAMPLING | | | | | | | | |
| AMR.09 | Program Code | progCode | xs:string(20) | Optional | Unique identification code of the programme or project for which the sample analysed was taken | Free Text | | |
| AMR.10 | Area of Sampling | sampArea | xs:string(5) | Optional | Area or Region or Province of the sampling (in accordance to the NUTS standard) | LK_NUTS | | to be specified when sampType refers to an animal species |
| AMR.11 | Sampling Context | sampContext | xs:string(5) | Optional | Sampling Context | LK_Sampling Context | | |
| AMR.12 | Sampling Stage | sampStage | xs:string(5) | Optional | Sampling Stage | LK_Sampling_Stage | | |
| AMR.13 | Sampling Details | sampDetails | xs:string(100) | Optional | Sampling Details | Free Text | | |
| AMR.14 | Sampling Year | sampY | xs:decimal(4, 0) | Mandatory | Year of sampling. | | | |
| AMR.15 | Sampling Month | sampM | xs:decimal(2, 0) | Mandatory | Month of sampling. | | | |
| AMR.16 | Sampling Day | sampD | xs:decimal(2, 0) | Optional | Day of sampling. | | | |
| AMR.17 | Isolation Year | IsolY | xs:decimal(4, 0) | Optional | Year when the isolation was completed | | | |
| AMR.18 | Isolation Month | IsolM | xs:decimal(2, 0) | Optional | Month when the isolation was completed | | | |
| AMR.19 | Isolation Date | IsolD | xs:decimal(2, 0) | Optional | Day when the isolation was completed | | | |
| AMR.20 | Susceptibility Test Year | analisysY | xs:decimal(4, 0) | Optional | Year when the AST was completed | | | |
| AMR.21 | Susceptibility Test Month | analisysM | xs:decimal(2, 0) | Optional | Month when the AST was completed | | | |
| AMR.22 | Susceptibility Test Day | analisysD | xs:decimal(2, 0) | Optional | Day when the AST was completed | | | |
| INFO ABOUT METHOD AND ANTIMICROBIAL SUBSTANCE | | | | | | | | |
| AMR.23 | Method | method | xs:string(3) | Mandatory | Method for the antimicrobial susceptibility testing | LK_Method | | |
| AMR.24 | Antimicrobial Substance | substance | xs:string(5) | Mandatory | Antimicrobial Substance | LK_Substances | | |
| AMR.25 | Cut-off value | cutoffValue | xs:double | Mandatory | Cut-off value | | | |
| INFO ABOUT MIC Values (FOR DILUTION) | | | | | | | | |
| AMR.26 | Lowest (limit) | lowest | xs:double | Optional | Lowest limit (for dilution) | | lowest>=0.008 and lowest<=4096 | Mandatory when Laboratory Method is = "dilution" and highest>lowest |
| AMR.27 | Highest (limit) | highest | xs:double | Optional | Highest limit (for dilution) | | highest>=0.008 and highest<=4096 | Mandatory when Laboratory Method is = "dilution" and highest>lowest |
| AMR.28 | MIC values (mg/L) | MIC | xs:string(5) | Optional | MIC value (mg/L) (Minimal Inhibitory Concentration) | LK_MIC_Values | | Mandatory when Laboratory Method is = "dilution" |
| INFO ABOUT Disk and IZD Values (FOR DIFFUSION) | | | | | | | | |
| AMR.29 | Disk concentration (µg) | diskConc | xs:double | Optional | Quantity of antimicrobial per the disk | | >0 | Mandatory when Laboratory Method is = "diffusion" |
| AMR.30 | Disk diameter (mm) | diskDiam | xs:double | Optional | Diameter of the used disk | | disk diameter>=6 and disk diameter<=35 | Mandatory when Laboratory Method is = "diffusion" and IZD>=disk diameter |
| AMR.31 | IZD values (mm) | IZD | xs:double | Optional | IZD value (mm) (Inhibition Zone Diameter) | | IZD>=6 and IZD<=35 | Mandatory when Laboratory Method is = "diffusion" and IZD>=disk diameter |
| AMR.32 | comment | ResComm | xs:string(250) | Optional | Additional comment for the AST result | Free Text | | |

Note on resultCode (AMR.00):
For the purpose of the 2011 AMR Pilot, EFSA and not the organisation sending the data will build the "Isolate's Result Code".
The resultCode will be built as follows:
resultCode = [repYear] + [repCountry] + ["Z" + zoonose] + ["S" + sampType] + ["M" + method] + ["A" + AMR_substance] + ["D" + sampY + sampM + sampD] + ["I" + labIsolCode]
- "repCountry" is the ISO-3166-1-alpha-2 code of the reporting Country
- "zoonose", "sampType", "AMR_substance" are the corresponding codes in the pick lists (with as many "0" as needed in front of the code to reach the length of 5); Example: "Code = 123" --> "00123"
- "method" is the corresponding code in the pick list (with as many "0" as needed in front to reach the length of 3); Example: "Code = 12" --> "012"
- "sampD" may be missing; in this case it is replaced by "00"
Examples of resultCode, where sampD ("day of sampling") is known or not known:
- 2009ATZ00121S159613M023A03561D20011203IISOLCX324
Examples of types:
xs:String(40): text field (= String) of maximum 40 characters
xs:decimal(4,0): numeric field made of 4 digits and no decimals, e.g. 2010
xs:double: numeric field with decimals, e.g. 124.56
The data type xs:double and the other numeric data types that allow a decimal separator require the decimal separator to be “.”; the decimal separator “,” is not allowed.
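The resultCode construction rule given for element AMR.00 can be sketched as a small function. This is an illustration of the rule as stated in the table, not EFSA's implementation; the function name `build_result_code` is hypothetical, and the zero-padding of the sampling month to two digits is an assumption inferred from the example code, since the rule itself only specifies the replacement of a missing day by "00".

```python
def build_result_code(rep_year, rep_country, zoonose, samp_type, method,
                      substance, samp_y, samp_m, samp_d, lab_isol_code):
    """Concatenate the parts of the AMR.00 resultCode.

    Pick list codes are left-padded with "0" to 5 characters (3 for the
    method); longer codes are left unchanged. A missing sampling day
    (None) is written as "00". Padding month and day to 2 digits is an
    assumption based on the example in the data model.
    """
    day = "00" if samp_d is None else str(samp_d).zfill(2)
    return (
        str(rep_year)
        + rep_country
        + "Z" + str(zoonose).zfill(5)
        + "S" + str(samp_type).zfill(5)
        + "M" + str(method).zfill(3)
        + "A" + str(substance).zfill(5)
        + "D" + str(samp_y) + str(samp_m).zfill(2) + day
        + "I" + lab_isol_code
    )


# Inputs chosen to reproduce the example code listed for AMR.00.
code_with_day = build_result_code(
    2009, "AT", "121", "159613", "23", "3561", 2001, 12, 3, "ISOLCX324")
code_without_day = build_result_code(
    2009, "AT", "121", "159613", "23", "3561", 2001, 12, None, "ISOLCX324")
```

Note that the sample type part of the example code in the data model is six characters long; the sketch reproduces it because padding never truncates a longer code.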
B. PROPOSED REVISION OF THE SSD MODEL TO ACCOMMODATE AMR ISOLATE-BASED REPORTING
The version of the SSD and the associated controlled terminologies were compared with the AMR isolate-based data model used in the AMR isolate-based pilot (version 2.1). This Annex contains a description of the main differences between the two data models and some suggestions for the future “SSD review committee” on how to possibly extend the SSD model in order to accommodate the AMR isolate-based data collection needs.
Isolate Entity
The major difference between the SSD and the AMR pilot model is the lack in the SSD model of the “Isolate” entity and concept.
The SSD is based upon five key entities: “Laboratory”, “Local Organisation”, “Laboratory Sample”, “Laboratory Sub-Sample”, and “Result”. This structure allows laboratory samples to be subdivided into sub-samples, if specified by legislation or analytical protocol, and the associated laboratory results to be reported for one or more sub-samples.
The AMR pilot data model is designed to allow reporting of the results of the antimicrobial susceptibility testing of bacterial isolates. It is very important to be able to link the susceptibility test results of a single isolate in order to investigate multiple antibiotic resistance patterns.
Figure 5 below shows a suggestion on how to extend the entity relationships model of SSD in order to accommodate the AMR isolate-based model.
Figure 5: Extension of SSD entity relationships to include Isolate entity.
An isolate is characterised by the following elements:
• the unique isolate code used as an identifier in the laboratory,
• the species or relevant subtypes of the isolate (including the phagetype or serovar where available)
• the date of isolation
Missing AMR data elements in the SSD model
The SSD has been developed with a minimum number of mandatory elements, on the understanding that specific topic areas may need to specify additional elements as mandatory.
This is already applied in pesticide monitoring where optional elements in the SSD related to the maximum residue limit (MRL) and compliance assessment are mandatory for this data collection, thereby ensuring MRL compliance at EU level can be assessed. Since many EU reporting organisations have already implemented the SSD in their data management systems, any new elements added to the SSD would be optional, however AMR reporting guidelines could specify the topic specific mandatory fields.
A number of data fields used in the AMR pilot data model do not have a direct correspondence in the SSD. Some AMR numeric data fields used to collect the cut-off values, the antimicrobial susceptibility test and the parameters under which the test was performed are missing in the SSD model.
In the case of the Minimum Inhibitory Concentration for dilution tests (MIC) this is a categorical value with an associated MICVALUES pick list.
There is no direct correspondence with the numerical values describing substance concentration and the analytical method performance parameters in the current SSD. As a consequence, a requirement to extend the SSD data model to allow the reporting of these new data elements would be needed.
Mandatory SSD data elements missing in the AMR model
Conversely, there are some mandatory elements in the SSD that are not included in the AMR data model. Some are:
• Laboratory sample code
• Country of origin (of the sample)
• Product treatment
• Sampling method
• Laboratory accreditation
• Type of parameter
• Type of result
If SSD is extended and adopted for AMR isolate based reporting, when no information is available the value “Unknown” can be used to report on “Country of origin”, “Product treatment” and “Sampling method”. Where missing, the other controlled terminologies could be extended to include the value “Unknown”. In the latter case, it will be necessary to make this ‘Unknown’ value valid only for AMR isolate-based data, since other data collections require mandatory completion of these fields based on existing controlled terminologies which exclude ‘Unknown’.
However, the information reported on “Laboratory sample code”, “Country of origin”, “Laboratory accreditation”, “Type of parameter” and “Type of result” can be informative when assessing data quality. So the addition of these data fields in the AMR data collection model can be also considered by the AMR experts that will discuss the revision of the AMR isolate-model used in the pilot exercise.
Revision of the SSD pick lists
In order to describe the “Isolate” entity added to the SSD model, the SSD’s PARAM controlled terminology needs to be revised in order to include all the species, subtypes, phagetypes and serovars available in the Zoonoses pick list.
In order to describe food items, feed items or animal species items used in the AMR data collection and included in the Zoonoses pick list, there is a need to review and adapt the existing FOODEX and MATRIX pick lists in the SSD. This review, especially for FOODEX, should take into consideration the future EFSA harmonised food classification system, yet to be published, and assess the possibility of using the “product code” element of the SSD.
Revision of the AMR Isolate-based pick lists
In the AMR pilot, two pick lists were identified for which there was not an exact correspondence to the SSD elements and the following proposals for revision are made:
• The pick list “Sampling Stage” at level 1 can be a subset of the SSD’s “Sampling point” data element, associated with the SSD’s SMPNT controlled terminology.
Some entries of the current “Sampling Stage” level 2 refer to “domestic” and “imported”. This information should be kept in a different column or derived from the field “Country of origin (of the sample)” which is currently not collected in Zoonoses for aggregated tables. The addition of the field “Country of origin (of the sample)” with its specific pick-list should be considered for future sample-based data collection needs in Zoonoses. For the moment, to support backward compatibility it is suggested to still collect “domestic” and “imported” in the Level 2 of the Sampling Stage pick lists, where applicable.
In order to support reporting “Sampling Stage” at level 2 and level 3 an additional “Sample Type” data element in SSD could be considered with a controlled terminology potentially based on the “Langual 2010 C: Part of plant or animal codes” (Langual 2010) or alternatively by using the new EFSA harmonised food classification system after its adoption.
Therefore, the splitting of the “Sampling Stage” pick list into two separate pick lists as follows may facilitate the mapping to local data structures and the SSD data model and therefore improve reporting of this information:
Sampling Stage
| Level 1 | Level 2 |
| at AI station | |
| at catering | domestic; imported |
| at cutting plant | domestic; imported |
| at farm | |
| at feed mill | domestic; imported |
| at game handling establishment | |
| at hatchery | |
| at hospital or care home | |
| at packing centre | |
| at processing plant | domestic; imported |
| at slaughterhouse | |
| at zoo | |
| from hunting | |
| in total | |
| unspecified | |
| at border control | |

Sample Type
| Level 1 | Level 2 and Level 3 entries |
| animal sample | blood; caecum; ear; eggs; faeces; fleece; foetus/stillbirth; lymph nodes; meat juice; milk; mucosal swab; rectum-anal swab; vaginal swab; placental swab; neck skin; organ/tissue; tonsil |
| food sample | blood; carcass swabs; meat; milk; neck skin; tonsil |
| feed sample | |
| environmental sample | boot swabs; dust |
| unknown | |
• The pick list “Sampling Context” at level 1 and level 2 is very similar to the SSD’s element “Programme type”, categorised by the SSD’s SRCTYP controlled terminology. However, “Sampling Context” level 1 and level 2 are partially but not fully covered by the “Programme type”. This could be overcome by reviewing the controlled terminology SRCTYP and creating in SSD new elements and controlled terminologies based on “Sampling Context” level 1 and level 2. For example, one new data element can be added in SSD to describe the design of the programme and the “Sampler” (i.e. “Sample taker”). The pick list “Sampling Context” at level 3 uses the EUROSTAT classification and is equivalent to the SSD element “Sampling strategy” and the associated controlled terminology SAMPSTR.
Therefore, the splitting of the “Sampling Context” pick list into three separate pick lists as follows may facilitate the mapping to local data structures and the SSD data model and therefore improve reporting of this information:
Sampling Context
| Level 1 | Level 2 |
| Clinical investigations | |
| Control and eradication programmes | |
| Monitoring | |
| Surveillance | |
| Survey | EU baseline survey; National survey |
| Unspecified | |

Sampler
| Level 1 |
| Industry sampling |
| Official sampling |
| Official and industry sampling |
| HACCP and owns check |
| Not applicable |

Sampling Strategy
| Level 1 |
| Objective sampling |
| Census |
| Convenience sampling |
| Selective sampling |
| Suspect sampling |
| Unspecified |

Conclusions
The comparison of the SSD and AMR isolate-based data models indicates that, with the introduction in SSD of the “Isolate” entity, with the addition in SSD of some new data elements and the revision of some pick lists in both SSD and AMR models, it would be possible to use the SSD data model to report on AMR isolate-based data.
The extension of the SSD model to accommodate future Zoonoses sample-based data collection needs for food, feed or animal samples is outside the scope of this Working Group and should be discussed by the “SSD review committee”, where new requirements should also be considered and discussed in detail (e.g. collection of animal samples, specification of storage of the sample until shelf life, etc.).
ABBREVIATIONS
AMR = Antimicrobial Resistance
CSV = Comma Separated Values
DCF = Data Collection Framework
ECDC = European Centre for Disease Prevention and Control
EFSA = European Food Safety Authority
FBO = Food-borne Outbreak
SSD = Standard Sample Description
XML = eXtensible Markup Language