6. Description of classes
6.6. Class no 6 – Sector and Industry Datasets and Metadata
Description
This class covers records (mostly in the form of data) that are created and collected by the Authority, or provided to the Authority by service providers whose contracts require them to collect and supply industry and market data. The data collected by the regulator supports the management, facilitation and monitoring of the electricity markets. It enables forecasting, modelling, analysis and interpretation of market performance over time.
To support the appropriate disposal of the sector and industry datasets and their management records, six sub-classes have been created. They are independent of any particular dataset and focus instead on the actions typically taken with datasets. This follows the approach taken in DA 379 by Statistics New Zealand. The sub-classes are:
• Original dataset received from external source - The original version of a dataset received from an external source, either in the format in which it was received (for example the original .CSV file) or as data/content retained in a system which is regularly audited. Examples include data received by the service providers from participants, or data received by the Authority from service providers;
• Operational datasets with accompanying metadata - The first versions of datasets created from the collected data;
• Processing datasets with accompanying metadata - Working versions of datasets used during processing including extractions and manipulations and metadata that has been collected with and about the dataset;
• Definitive version of individual datasets with accompanying metadata - Market and industry datasets and data that have undergone all the quality checks and editing procedures deemed necessary to support the production of accurate, reliable and complete data. Also includes metadata accompanying the definitive versions of datasets that has been collected and validated. For example the data contained within the Authority data warehouse, or the audited dataset which will identify the original transactions held by providers;
• Data warehouse management - Records relating to the management of the Authority’s data warehouse. Includes:
- collection criteria and process documentation;
- data policies and timetables;
- metadata and documentation describing the datasets, and the systems from which data is gathered into the data warehouse;
• Software models - Open source software models developed to support analysis of data from the electricity industry such as pricing data or consumption data (includes source code).
The Authority regularly gathers data from service providers as part of its role. The Authority views itself as having long term responsibility for the collection of market and industry data, particularly given that market operations are contracted services for which the providers could change over time. The breadth and depth of data gathered regularly is likely to grow in the future, providing an increased opportunity to monitor the impacts of changes to the market and to analyse other changes over substantial periods of time.
Datasets that the Authority currently gathers on a regular basis include:
• metering data (i.e. metering of electricity going on and off the grid, plus the prices, offers and bids);
• pricing data (i.e. final pricing cases and all bids, offers, available generation data so as to enable replication of the pricing process if needed);
• registry data (i.e. which customers belong to which retailer – updated daily);
• reconciliation data (i.e. data showing in half hour blocks the reconciliation between usage meters and grid metered data and total consumption within distribution companies).
The Authority also holds some datasets inherited from disestablished electricity sector agencies, e.g. the Electricity Commission of New Zealand (ECNZ). These include hydrology data, which supports long term modelling and forecasting. The hydrology data held by the Authority dates back to the 1930s, and the older material is not held by other agencies. The hydrology data continues to be added to.
As well as gathering data, the Authority regularly supplies or makes data available to external audiences, in both raw and analysed form. This may occur in response to an ad hoc request or as part of a regular function of the Authority, such as collation of the Centralised Data Set.
The Centralised Data Set (CDS) was established under legislation with the purpose of enabling modelling to support robust investment decisions, analysis of decisions, and risk assessment. The CDS therefore has ongoing value for a wide audience of parties who invest in the electricity industry or who are interested in the outcomes.
The CDS is made freely available to the public every six months to enable transparency and open analysis to inform decision-making. Most recently it has been published on DVD. When published, the CDS contains raw data files from its inception and summarised data, plus any models used for analysis of the data. Some of the data series published in the CDS go back to the 1970s. The CDS includes data such as:
• metering data (i.e. metering of electricity going on and off the grid, plus the prices, offers and bids);
• technical diagrams of the transmission network;
• descriptions of assets that form part of the electricity system;
• hydrology data measuring water levels and flows in catchments.
The Authority has a full set of the DVDs produced, but considers them to be ‘publications’ (as they are made available to the public through a formal publishing process). The full dataset is held in the data warehouse (the primary data source).
As part of gathering and management of data, metadata is kept to provide context to the datasets. A full process is in place to register and manage the receipt of data from external organisations. Data is typically received on CDs as .csv files.
Data is stored in a number of different repositories, including the recently developed data warehouse. There are plans for the data warehouse to become the primary repository for all ‘final’ datasets so that it becomes the authoritative aggregated data source. Records created to support the management of the data warehouse include:
• collection criteria and process documentation;
• data policies and timetables;
• metadata and documentation describing the datasets and the systems from which the data was gathered, and metadata and documentation describing the data warehouse and its content.
As part of the use of the data there are often many iterations or sub-sets of data created, or models developed to enable interpretation of the data. Where the Authority develops software data models they (and the source code) are made available as open source models for anyone wanting to use them to analyse the data that the Authority provides. All original source code is retained and kept with the dataset the model was developed to analyse.
The source dataset in its original form and its accompanying metadata are always kept separately from any later iteration of the data. This allows analysis actions to be repeated if required.
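One way to support this kind of repeatability is to record a checksum for each original file at the point of receipt, so that any later copy can be checked against it. The following is a minimal sketch only, not a description of the Authority's actual process: the function names, the JSON manifest layout and the `source` label are all hypothetical and introduced purely for illustration.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def register_original(path: Path, manifest: Path, source: str) -> dict:
    """Append a receipt record for an original dataset to a JSON manifest.

    The manifest format (a list of dicts) is an assumption of this sketch.
    """
    entry = {"file": path.name, "source": source, "sha256": sha256_of(path)}
    records = json.loads(manifest.read_text()) if manifest.exists() else []
    records.append(entry)
    manifest.write_text(json.dumps(records, indent=2))
    return entry


def is_unchanged(path: Path, manifest: Path) -> bool:
    """Check that a file still matches the digest recorded at receipt."""
    records = json.loads(manifest.read_text())
    recorded = {record["file"]: record["sha256"] for record in records}
    return recorded.get(path.name) == sha256_of(path)


if __name__ == "__main__":
    # Hypothetical example: register a small .csv file, then verify it.
    with tempfile.TemporaryDirectory() as tmp:
        original = Path(tmp) / "metering.csv"
        original.write_bytes(b"period,kwh\n1,42\n")
        manifest = Path(tmp) / "manifest.json"
        register_original(original, manifest, source="hypothetical provider")
        print(is_unchanged(original, manifest))
```

Working copies would then be made from the registered original rather than edited in place, and `is_unchanged` could be run before an analysis is repeated to confirm the source data has not been altered.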
Value
The operational and processing versions of the datasets and their accompanying metadata are considered to have short term business value only as they are of a transitory nature. They are recommended for destruction.
The original dataset received by the Authority or its service providers from an external source in the format in which it is received only has value until it has been fully quality checked and audited. Similarly the operational datasets and processing datasets with accompanying metadata only have value until the quality checks and all processing is complete. These sub-classes are therefore recommended for destruction once quality checking activities have been completed.
However, the definitive version of the dataset created following quality checks is deemed to have long term archival value. Retention of such data and accompanying metadata is vital to allow authoritative modelling and forecasting about the electricity industry and market. The Authority considers that it has a primary role in providing for the storage and management of these datasets, both as part of its role as the regulator and to provide continuity of data within the sector. Although these records are recommended for retention as public archives, a deferral of transfer will be sought for the relevant sub-class relating to data collection and management by the Authority. The datasets are continually being added to and are likely to remain in current business use for decades, so transfer of the data to Archives New Zealand would negatively affect the Authority’s ability to carry out many of its functions. The Authority will undertake to manage these datasets as required by Archives New Zealand standards to ensure their long-term accessibility.
Records relating to the management of the data warehouse have long term archival value as they provide evidence of the integrity and management of the datasets within the repository, thereby validating the actual datasets retained. They are recommended for retention as public archives.
Software model development records are recommended for retention as public archives. The model used to analyse the data must be retained as well as the data itself to allow for full reproducibility of any analysis carried out.
Recommendations
Records recommended for retention as public archives:

Reference no.: 6/4
Record type: Definitive version of datasets with accompanying metadata
Description: Market and industry datasets and data that have undergone all the quality checks and editing procedures deemed necessary to support the production of accurate, reliable and complete data. Also includes metadata accompanying the definitive versions of datasets that has been collected and validated. For example the data contained within the Authority data warehouse, or the audited dataset which will identify the original transactions held by providers.
Disposal criteria: A7

Reference no.: 6/5
Record type: Data Warehouse Management
Description: Records relating to the management of the Authority’s data warehouse. Includes:
- collection criteria and process documentation
- data policies and timetables
- metadata and documentation describing the datasets, and the systems from which data is gathered into the data warehouse
Disposal criteria: A3, A4, A7

Reference no.: 6/6
Record type: Software Model Development
Description: Open source software models developed to support analysis of data from the electricity industry such as pricing data or consumption data. Includes source code.

Records recommended for destruction:

Reference no.: 6/1
Record type: Original dataset received from external source
Description: The original version of a dataset received from an external source either in the format in which it was received, e.g. the original .CSV file, or data/content retained in a system which is regularly audited. Examples include data received by the service providers from participants, or data received by the Authority from service providers.
Disposal criteria: D3

Reference no.: 6/2
Record type: Operational datasets with accompanying metadata
Description: The first version of datasets created from the collected data and metadata that has been collected with and about the dataset.
Disposal criteria: D3

Reference no.: 6/3
Record type: Processing datasets with accompanying metadata
Description: Working versions of datasets used during processing including extractions and manipulations and metadata that has been collected with and about the dataset.