Providing data

discouraging factor for information providers. Such valuable time is better spent on updating and acquiring new data than on merely reformatting existing data. With this in mind, the authors propose that the database be developed using minimal effort. The data exchange format is therefore very important for information providers as well as for information compilers and users. Data interchange between genebanks has to overcome various obstacles, such as differences in language, incompatible computer hardware and software systems, and differing data compilation standards (Cao et al. 1995).

The data interchange protocol (DIP) was introduced to facilitate the development of interfaces that link different documentation systems, to enable information and data exchange, and to facilitate the re-use of data. The DIP format is described in a separate DIP manual. If needed, IPGRI will assist participating organizations in developing an interface for export to the DIP format. The format has been used for data sharing from the Coconut Genetic Resources Database and for data exchange by the Taro Genetic Resources Network (TAROGEN) in the Pacific. Using the DIP format allows organizations to produce electronic catalogues without much effort by importing DIP formatted files into the DIPVIEW software or a customized interface. DIPVIEW and the DIP manual can be downloaded from the Internet at http://www.ipgri.cgiar.org/regions/apo/dip.html.

Providing data

It is important that data is provided in a data format and not in a report format, except where a format is predefined, e.g. the DIP format. Data in a report format has become information, and converting that information back into data requires considerable work and time. To avoid spending time on the initial compilation of the database, the participating genebanks and organizations should export the requested data from their documentation systems and bibliographic databases in DIP or in any of these formats:

• Access database tables

• Comma delimited text files (CSV)

• Tab delimited text files

• Fixed length ASCII files

• Spreadsheets and tables in word processing software

Below are examples of data listings in various formats:

A. DIP format in a single column:

ACCENUMB: HELP
CROPNAME: RICE
GENEBANK: CAAS GeneBank
DONONUMB: Donor name
FAMILY: Family name
SCIENAME: Scientific name
CULTNAME: Cultivar name
ACQUDATE: Acquisition date
ACCENUMB: WD-10001
DONONUMB: JING 0525
FAMILY: Gramineae
SCIENAME: Oryza sativa L.
CULTNAME: QIAN LI MA 1 HAO
ACQUDATE: 26 Nov 87
ACCENUMB: WD-10002
DONONUMB: JING 0526
FAMILY: Gramineae
SCIENAME: Oryza sativa L.
CULTNAME: QIAN LI MA 2 HAO
ACCENUMB: End

MEDICINAL PLANTS RESEARCH IN ASIA, VOLUME 1

B. The same data as presented in a table format:

ACCENUMB  CROPNAME  GENEBANK       DONONUMB   FAMILY     SCIENAME         CULTNAME          ACQUDATE
WD-10001  RICE      CAAS GENEBANK  JING 0525  GRAMINEAE  ORYZA SATIVA L.  QIAN LI MA 1 HAO  26-Nov-87
WD-10002  RICE      CAAS GENEBANK  JING 0526  GRAMINEAE  ORYZA SATIVA L.  QIAN LI MA 2 HAO

C. As a report format in a catalogue:

Accession number - WD-10001
Acquisition date - 26-Nov-87
Donor Code - JING 0525
Family and Scientific name - GRAMINEAE / ORYZA SATIVA L.
Cultivar - QIAN LI MA 1 HAO

Accession number - WD-10002
Acquisition date -
Donor Code - JING 0526
Family and Scientific name - GRAMINEAE / ORYZA SATIVA L.
Cultivar - QIAN LI MA 2 HAO

The catalogue is a report, and re-extracting data from such a report is time consuming. In the table format, the data remains data and can therefore be re-used with minimal effort. The DIP format looks like a report, but in essence it is a fixed length text file that holds both the data dictionary and the data in a single file. The format is well suited to electronic catalogue development and to data exchange, and also to migration to XML format in the future. In short, data that is correctly stored in databases, in tables or in formatted ASCII text files such as the DIP format can be re-used to generate reports, whereas data already stored in a report format, as information, is difficult to access electronically for re-use.
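The re-use argument can be illustrated with a short Python sketch that reads the DIP listing from example A and writes it back out as a comma delimited table like example B. It is only a sketch under stated assumptions, not the DIPVIEW implementation: it assumes one "TAG: value" pair per line, that the block starting with ACCENUMB: HELP carries the data dictionary, that ACCENUMB: End closes the file, and that CROPNAME and GENEBANK in the HELP block are values shared by every accession (as example B suggests). The function names parse_dip and to_csv are hypothetical; the DIP manual remains the authority on the actual format.

```python
import csv
import io

# The listing from example A, one "TAG: value" pair per line.
SAMPLE = """\
ACCENUMB: HELP
CROPNAME: RICE
GENEBANK: CAAS GeneBank
DONONUMB: Donor name
FAMILY: Family name
SCIENAME: Scientific name
CULTNAME: Cultivar name
ACQUDATE: Acquisition date
ACCENUMB: WD-10001
DONONUMB: JING 0525
FAMILY: Gramineae
SCIENAME: Oryza sativa L.
CULTNAME: QIAN LI MA 1 HAO
ACQUDATE: 26 Nov 87
ACCENUMB: WD-10002
DONONUMB: JING 0526
FAMILY: Gramineae
SCIENAME: Oryza sativa L.
CULTNAME: QIAN LI MA 2 HAO
ACCENUMB: End
"""


def parse_dip(text):
    """Return (dictionary, records) from a DIP-style listing (assumed layout)."""
    dictionary = {}  # tag -> description or shared value from the HELP block
    records = []     # one dict per accession
    current = None   # record being filled, if any
    in_help = False
    for line in text.splitlines():
        if not line.strip():
            continue
        tag, _, value = line.partition(":")
        tag, value = tag.strip(), value.strip()
        if tag == "ACCENUMB":
            # Every ACCENUMB line closes the block that came before it.
            if current is not None:
                records.append(current)
            current, in_help = None, False
            if value == "HELP":
                in_help = True
            elif value == "End":
                break
            else:
                current = {"ACCENUMB": value}
        elif in_help:
            dictionary[tag] = value
        elif current is not None:
            current[tag] = value
    if current is not None:  # file ended without an "End" marker
        records.append(current)
    return dictionary, records


def to_csv(dictionary, records, shared=("CROPNAME", "GENEBANK")):
    """Write the records as comma delimited text, one row per accession."""
    fields = ["ACCENUMB"] + list(dictionary)
    out = io.StringIO()
    # Tags missing from a record become empty cells; unknown tags are skipped.
    writer = csv.DictWriter(out, fieldnames=fields, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    for rec in records:
        # Assumption: the listed HELP values apply to every accession.
        row = {tag: dictionary[tag] for tag in shared if tag in dictionary}
        row.update(rec)
        writer.writerow(row)
    return out.getvalue()


dictionary, records = parse_dip(SAMPLE)
print(to_csv(dictionary, records))
```

Because the tags stay attached to the values, the conversion is mechanical. Doing the same from the catalogue in example C would require hand-written rules for each label and layout.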

Another issue in providing data is how descriptor states are represented for similar descriptors in databases hosted by different genebanks. For example, the descriptor states for flower colour can differ considerably between genebanks’ documentation systems, as shown in the table below.

Genebank  Descriptor states
A         1 – white, 2 – red, 3 – green, 9 – others
B         1 – red, 2 – light green, 3 – white, 4 – others

Information providers will face difficulties when they are required to comply with a standard. If the descriptor states from Genebank A are regarded as the standard, then Genebank B will have to convert its data, which requires time and effort and increases the chance of introducing errors into the database during conversion. The suggestion is to retain each genebank’s descriptor states as they are, so that data can be exported without change and with minimal effort. But if every genebank exports data in its own standard, how do we develop a central database?
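One possible way to retain each genebank's own descriptor states, sketched here in Python under assumptions, is to store the code exactly as exported together with the genebank it came from, and to look up the meaning only when it is needed. The state lists are taken from the flower colour table; the record layout, the accession identifiers and the function name decode_flower_colour are hypothetical.

```python
# Flower colour descriptor states, copied from the table for Genebanks A and B.
FLOWER_COLOUR_STATES = {
    "A": {1: "white", 2: "red", 3: "green", 9: "others"},
    "B": {1: "red", 2: "light green", 3: "white", 4: "others"},
}


def decode_flower_colour(genebank, code):
    """Translate a stored code into its label using that genebank's own states."""
    return FLOWER_COLOUR_STATES[genebank][code]


# Hypothetical characterization records, kept exactly as each genebank exports them.
records = [
    {"genebank": "A", "accession": "A-001", "flower_colour": 1},
    {"genebank": "B", "accession": "B-001", "flower_colour": 1},
    {"genebank": "B", "accession": "B-002", "flower_colour": 3},
]

labels = [decode_flower_colour(r["genebank"], r["flower_colour"]) for r in records]
print(labels)  # ['white', 'red', 'white']
```

The same code 1 decodes to different colours for A and B, so no genebank has to rewrite its data; the cost is that each genebank's state list must travel with its export.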

In the proposed database we will combine only what is common, for example the descriptors in the FAO and IPGRI list of multi-crop passport descriptors (MCPD). The MCPD, available at http://www.ipgri.cgiar.org/publications/pubfile+.asp?ID.PUB+124, provides international standards to facilitate germplasm passport information exchange. These descriptors aim to be compatible with IPGRI crop descriptor lists and with the descriptors used for the FAO World Information and Early Warning System (WIEWS) on plant genetic resources (PGR). In most instances, these basic passport data will already be available in genebank documentation systems.
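Combining only what is common could look like the following Python sketch. It is an illustration under assumptions rather than the proposed database design: the common field list reuses tags from the rice example in this paper instead of the actual MCPD descriptor names, the second genebank and its records are invented for the example, and combine_common is a hypothetical helper.

```python
# Hypothetical set of fields treated as common to all participants.
COMMON_FIELDS = ("ACCENUMB", "SCIENAME", "ACQUDATE")


def combine_common(exports, common_fields=COMMON_FIELDS):
    """Merge per-genebank exports, keeping only the common fields.

    exports maps a genebank code to the list of records it exported.
    Fields outside common_fields are left behind in the source system.
    """
    combined = []
    for genebank, records in exports.items():
        for rec in records:
            row = {"GENEBANK": genebank}
            for field in common_fields:
                row[field] = rec.get(field, "")  # missing values stay empty
            combined.append(row)
    return combined


exports = {
    "CAAS": [
        {"ACCENUMB": "WD-10001", "SCIENAME": "Oryza sativa L.",
         "ACQUDATE": "26 Nov 87", "CULTNAME": "QIAN LI MA 1 HAO"},
        {"ACCENUMB": "WD-10002", "SCIENAME": "Oryza sativa L.",
         "CULTNAME": "QIAN LI MA 2 HAO"},
    ],
    # Invented second provider with a local field the central list ignores.
    "XYZ": [
        {"ACCENUMB": "X-1", "SCIENAME": "Oryza sativa L.", "LOCALCODE": "7"},
    ],
}

combined = combine_common(exports)
for row in combined:
    print(row)
```

Each provider exports what it already has, and the central list simply ignores whatever is not shared.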

The first output we aim to achieve is a list of the medicinal plants conserved in each country and the organizations involved. This covers mainly passport data that are common. Once the list of medicinal plant accessions held by each genebank has been developed, we will have a good idea of each participant’s capacity and can further fine-tune the data exchange process. This is important when we look at characterization data in the second phase. Training and capacity building will be developed in collaboration with information officers from participating organizations.
