
Background | Missing data analysis in R | rggobi: Data Visualisation | Amelia: a program for missing data | Conclusion

Quantitative Methods Workshop

Graphical Methods for Investigating Missing Data

Graeme Hutcheson

School of Education, University of Manchester

Graeme D. Hutcheson Manchester University

missing data

- Data sets with missing values are very common in the social sciences.
- Missing data is commonly ‘dealt with’ by using:
  - list-wise deletion
  - simple data replacement (random values, mean values or values predicted directly from regression models)
  - removing variables with relatively large amounts of missing data from the analysis.

None of these techniques is adequate.
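As a rough illustration of why two of these shortcuts fall short, the sketch below applies list-wise deletion and mean replacement to a small made-up data set. The data values and the helper names `listwise_delete` and `mean_impute` are assumptions introduced here for illustration; they do not come from the workshop material.

```python
import statistics

# Hypothetical toy data: each row is (age, score); None marks a missing value.
rows = [
    (23, 61.0),
    (35, None),
    (41, 72.0),
    (None, 58.0),
    (52, 80.0),
    (29, None),
]


def listwise_delete(data):
    """Keep only the rows that have no missing values (list-wise deletion)."""
    return [r for r in data if all(v is not None for v in r)]


def mean_impute(values):
    """Replace each missing value with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = statistics.mean(observed)
    return [mean if v is None else v for v in values]


complete = listwise_delete(rows)
scores = [r[1] for r in rows]
observed_scores = [s for s in scores if s is not None]
imputed_scores = mean_impute(scores)

print(f"rows kept after list-wise deletion: {len(complete)} of {len(rows)}")
print(f"variance of observed scores: {statistics.variance(observed_scores):.2f}")
print(f"variance after mean imputation: {statistics.variance(imputed_scores):.2f}")
```

In this toy case list-wise deletion throws away observed values along with the missing ones, and mean replacement makes the variable look less variable than the observed data suggest, because every filled-in value sits exactly at the mean.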

King et al., 2001: American Political Science Review

‘...approximately 94% (of analyses) use listwise deletion to eliminate entire observations... List-wise deletion discards one-third of cases on average, which deletes both the few nonresponses and the many responses in those cases. The result is a loss of valuable information at best and severe selection bias at worst.’
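A back-of-envelope calculation shows how quickly list-wise deletion can discard cases even when each variable has little missing data. The sketch assumes, purely for illustration, that every variable is missing independently with the same probability; the 5% rate and the function name `complete_case_fraction` are assumptions and are not taken from King et al.

```python
def complete_case_fraction(p_missing, n_variables):
    """Expected share of fully observed cases, assuming each variable is
    missing independently with the same probability (an illustrative
    simplification, not a claim about real survey data)."""
    return (1.0 - p_missing) ** n_variables


# Share of cases kept and discarded as more variables enter the model.
for k in (1, 4, 8, 12):
    kept = complete_case_fraction(0.05, k)
    print(f"{k:2d} variables: {kept:.1%} kept, {1 - kept:.1%} discarded")
```

Under these assumptions, eight variables with 5% missing values each already leave only about two-thirds of the cases complete.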

Objectives

- Even though missing data is important, it is rarely dealt with or even acknowledged in educational research. Why is this?
- There is a general ignorance of the damaging effects that missing data can have on analyses.
- There is a lack of training in imputation techniques and in the available ‘useable’ software.
- There is a general reluctance among reviewers to accept data imputation (without detailed justification, imputed analyses often make ‘easy targets’ for criticism).
- Data imputation is not easy and should not be achieved by

data imputation

- Data imputation is now accepted (particularly multiple imputation), but has been very slow to be adopted by researchers.
- The reason for this is only in part a lack of information and training. A bigger issue is that ... in practice it can take many hours or days to run and cannot be fully automated ... no commercial software includes a correct implementation of multiple imputation (King et al., 2001).

Missing data analysis in R

- The software problem is now being addressed by R. Researchers who have worked on a number of techniques over the last decade now have a platform on which to publish their software. This has led, in the last year or so, to many techniques becoming accessible to researchers.
- A simple search for data imputation and missing data on CRAN shows the following (a selection of results is provided; note that these are only the packages that have the target words in the title):

- Amelia II: A Program for Missing Data
- arrayImpute: Missing imputation for microarray data
- cat: Analysis of categorical-variable datasets with missing values
- EMV: Estimation of Missing Values for a Data Matrix
- impute: Imputation for microarray data
- mi: Missing Data Imputation and Model Checking
- mice: Multivariate Imputation by Chained Equations

- mirf: Multiple imputation and random forests for unobservable phase, high-dimensional data
- mitools: Tools for multiple imputation of missing data
- mix: Estimation/multiple Imputation for Mixed Categorical and Continuous Data
- pan: Multiple imputation for multivariate panel or clustered data

- rggobi: Interface between R and GGobi (missing data tools)
- SeqKnn: Sequential KNN imputation method
- SimHap: A comprehensive modeling framework for epidemiological outcomes and a multiple-imputation approach to haplotypic analysis of population-based data
- VIM: Visualization and Imputation of Missing Values
- yaImpute: An R Package for k-NN Imputation

R add-on packages

- This is an exciting time in statistics and data analysis as these techniques are only now being made available to all researchers (most of these programs have been uploaded in the last year).
- Many of the packages listed above also have point-and-click interfaces, which make them simple to operate (see, for example, rggobi, VIM, Amelia), and all have comprehensive manuals available for download from CRAN.
- This seminar will briefly demonstrate two packages: rggobi, a data visualization package, and Amelia II, a data imputation package.

rggobi: A data visualisation program

Full details of ‘rggobi’ can be found at:

http://www.ggobi.org/rggobi

Information about R and installing packages can be found at:

http://www.r-project.org
http://www.rgsweb.net

The following analyses are taken directly from the GGobi website (http://www.ggobi.org/) and the book:

Cook, D. and Swayne, D. F. (2007). Interactive and Dynamic Graphics for Data Analysis: With Examples Using R and GGobi. Springer.

The data show environmental readings for two years: an El Niño year (1997) and a non-El Niño year (1993).

missing data shown in margin plots

rggobi: visualising missing data

rggobi: imputing data

rggobi: checking imputed data (simple imputation)

rggobi: checking imputed data (multiple imputation)

Multiple imputation

- Methodologists and statisticians agree that ‘multiple imputation’ is a superior approach to the problem of missing data scattered through one's explanatory and dependent variables than the methods currently used in applied data analysis (King et al., 2001: American Political Science Review).
- Amelia II is a package that implements a sophisticated multiple imputation of missing data and also provides diagnostics to assess the utility of the imputed data.
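Once several imputed data sets exist, the same analysis is run on each and the results are combined. A minimal sketch of the standard combining rules (often called Rubin's rules) follows; the estimates and standard errors are made-up numbers, `pool_rubin` is a name introduced here, and none of this is Amelia's own code.

```python
import math
import statistics


def pool_rubin(estimates, variances):
    """Combine one estimate and one squared standard error per imputed
    data set into a pooled estimate and pooled standard error."""
    m = len(estimates)
    q_bar = statistics.mean(estimates)        # pooled point estimate
    within = statistics.mean(variances)       # average within-imputation variance
    between = statistics.variance(estimates)  # variance between imputations
    total = within + (1 + 1 / m) * between    # total variance
    return q_bar, math.sqrt(total)


# Hypothetical regression coefficient estimated on m = 5 imputed data sets.
estimates = [0.52, 0.47, 0.55, 0.49, 0.51]
std_errors = [0.10, 0.11, 0.10, 0.12, 0.10]

pooled, pooled_se = pool_rubin(estimates, [se ** 2 for se in std_errors])
print(f"pooled estimate: {pooled:.3f}, pooled standard error: {pooled_se:.3f}")
```

The pooled standard error is larger than the average within-imputation standard error because the between-imputation term carries the extra uncertainty that comes from not knowing the missing values; a single imputed data set leaves that term out.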

Amelia: a simple GUI

Amelia: data input

Amelia: options - variables

Amelia: options - Time Series/Cross Sectional data

With Amelia, time-series and cross-sectional indices can be set.

Researchers often also have additional prior information about missing data values based on previous research, academic consensus, or personal experience. This information can be incorporated into the data imputation algorithm to produce vastly improved imputations.

Amelia: options - priors

Case priors and distributional priors can be easily coded using the GUI.

Amelia: output

The multiply imputed data files can be saved in a number of formats.

Amelia: diagnostics - comparing imputed and observed

Amelia: diagnostics - overimputation

Amelia and rggobi

The values imputed using Amelia can easily be saved and inspected using the graphical capabilities of rggobi. Data can be multiply imputed and then checked graphically for fit.

King et al., 2001: American Political Science Review

‘For political scientists, almost any disciplined statistical model of multiple imputation would serve better than current practices. The threats to the validity of inferences from listwise deletion are of roughly the same magnitude as those from the much better known problems of omitted variable bias.’

Conclusion

- These are just two of the many programs available for data imputation and missing data analysis.
- If you are interested in missing data analysis, investigate the available packages (see the manuals) and install those that might be of use.
- See www.RGSweb.net for data coding and general information about R.
