• No results found

Alternative data collection methods -

N/A
N/A
Protected

Academic year: 2021

Share "Alternative data collection methods -"

Copied!
15
0
0

Loading.... (view fulltext now)

Full text

(1)

Presentation prepared by

Ragnhild Nygaard, Statistics Norway

for the UNECE/ILO Meeting on CPIs, Geneva, 2.-4. May 2016

Alternative data

collection methods -

focus on online data

(2)

Contents

UNECE/ILO, 2.-4. May 2016 2

Data sources and data collection methods in

the Norwegian CPI

Major changes over time

Online data project

Building a production system for automatic

extraction of online data

Web scraped data for home electronics

Big data classification and calculation

(3)

Data sources and methods

in the Norwegian CPI

UNECE/ILO, 2.-4. May 2016 3

Traditional data collection methods

• Web questionnaires, e-mails, telephone interviews, «copy paste» of prices

online

Data sources and methods gradually changed over the

last decades;

• Technological progress

• Today much easier access to large amount of electronic data in different formats • Globalisation

• Changed consumer pattern, establishments operate across borders • Structural changes

• Increased market concentration (fewer and bigger units) • Reduction of the response burden

(4)

Data sources

in the Norwegian CPI

UNECE/ILO, 2.-4. May 2016 4 28% 17% 16% 22% 10% 7%

Data from web questionnaires

Online data

Household data (Computer assisted telephone interviewing)

Scanner data

Other electronic registered data

(5)

Online data project

UNECE/ILO, 2.-4. May 2016 5

• Increased focus on scanner data and online data in many European

countries and in Eurostat

• Statistics Norway just finalized the first online data project (2 year

project)

• Online price data - a second best option to scanner data

• Large amount of data • Unstructured data

• Lack of quantity information

• Product-offers vs transaction prices

• Online data project has looked into;

• The general e-commerce • Price setting strategies

• Set up costs, regular maintenance costs and legal issues related to web scraping • Classification and calculation based on big data

(6)

Automated extraction of online price data

- web scraping

UNECE/ILO, 2.-4. May 2016 6

• Statistics Norway is building up experience within web scraping in the

Price Statistics unit

• Automated extraction is partly used for collection of airline fares and

for the prices of dental services

• For goods - an experimental production system for automated

extraction of data

• Not implemented into the CPI

• Important to make proper cost-benefit analyses before starting to

(7)

Automated extraction of online price data

- web scraping II

UNECE/ILO, 2.-4. May 2016 7

Can be quite effective for extracting price information

from different consumer portals;

• Many prices per web site

• In our case for instance; prices of dental services

Less effective if few prices per web site

• In many cases national prices for specific services easy available

online

(8)

Experimental production system

UNECE/ILO, 2.-4. May 2016 8

Gradually built a robust production system of

automated extraction of data

More than 1 year of comparable online data

System based on free software which can easily be

downloaded from the internet (import.io)

Make daily download of prices and price related

information from 4 online retailers registered in

Norway with highest expenditure figures

Most sold products

(9)

Experimental production system II

UNECE/ILO, 2.-4. May 2016 9

Made gradual improvements to reduce the

maintenance costs

Built a separate Java program outside the import.io

environment that communicates with the robots

Important to automatically control all data files

Data will be missing due to e.g. server failure or

changed URLs and this must be discovered at once –

too late afterwards

(10)

Experimental production system III

UNECE/ILO, 2.-4. May 2016 10

Build-in checks

• Automatic retries if import.io server fails • Sizes of the data files are compared

• Look for missing variables

Data stored into csv-files

• Information added (predefined codes, date etc.) • Data cleaned (e.g. remove unnecessary signs)

For a trained import.io user setting up or train a new

robot is not very time consuming (no programming skills

needed)

• But the maintenance costs of the system is of course related to the number of web

(11)

Can we use the daily collected data of

home electronics?

UNECE/ILO, 2.-4. May 2016 11

• Extensive information of characteristics available online

• ..but not structured

• One way of using the data is to imitate traditional data collection

methods

• In our case that would be to act as a respondent and price variants of certain

representative items based on traditional methodology

• Another way of using the data is to use all the data scraped • How to classify all the product-offers?

• We started by classifying the products into product groups identified through the

URLs (àla COICOP6)

(12)

Can we use daily collected data of home

electronics? II

UNECE/ILO, 2.-4. May 2016 12

Further stratification into more detailed homogenous

product groups or segments must be made

• Make use of the product codes - splitting product codes

• For instance: “KSV|29|NW|30” or “SONXPZ|2|BK”

• Make use of the product texts - unsupervised machine learning

• Try to find some hidden structure in the unstructured data

• Data is clustered into different detailed segments based on similarities in the product text

• Also tested further breakdown into price segments to represent

simple, medium and advanced models

• In order to come closer to “similar” products

• Based on the idea that models with different price level are available in the market at the

(13)

Can we use daily collected data of home

electronics? III

UNECE/ILO, 2.-4. May 2016 13

Monthly indices – arithmetic mean of the daily extracted

data

Made some test calculations

• Monthly chained matched model approach (match of identical product

codes) with no replacements

• Product groups showed, as expected, a clear downward bias

• In many cases we see an high introductory price and a low price on it way out of the market • Products with short product life span – on average the same product code is extracted for a

period of 6 months

• Stratification methods - geometric mean of the prices based on detailed

homogenous product groups

• Direct comparisons - imply that we assume minor quality differences within the same

(14)

Can we use daily collected data of home

electronics? IV

UNECE/ILO, 2.-4. May 2016 14

Use of hedonic method is challenging which would

require structured data of price-determining

characteristics

Structured scanner data of home electronics could open

up for other methods ahead

(15)

Can we use daily collected data of home

electronics? V

UNECE/ILO, 2.-4. May 2016 15

References

Related documents

In a period of rising prices, LIFO will yield the highest cost of goods sold because the most recent purchase prices (which are the higher prices in this case) are used to price cost

- Latest price divided by the high price from the past 52 weeks of daily closing prices.. - Latest price divided by the low price from the past 52 weeks of daily

accutane online best price minister accutane online best price vdi accutane discount prices sold accutane mdl litigation lawyers accutane mdl litigation friend accutane discount

Crude protein, crude lipid and ash of fish carcass were not significantly different between dietary treatments with the exception of crude lipid of fish fed with Diet

● Math 034 WEB Mathematics of Money (Online) Fall 2011, Spring 2012 This is an online course offered by the Pennsylvania State University World Campus.. This course is

Press SOURCE on the front panel or remote handset to select Internet radio as the input source.. Press MENU to enter

Effect of muscular activity and training on phospholipid content in. muscles, liver

–  Cavallo (2015) : simultaneous data collection for 50 large retailers in 10 countries shows they tend to have either identical online and offline price levels. Online vs