Using multiple data sources

Merging data files

Aggregating data

Weighting data

Changing file structure

Using output as input (For more information, see “Using Output as Input with OMS” in Chapter 9 on p. 143.)

Using Multiple Data Sources

Starting with release 14.0, you can have multiple data sources open at the same time.

When you use the dialog boxes and wizards in the graphical user interface to open data sources, the default behavior is to open each data source in a new Data Editor window, and any previously open data sources remain open and available for further use. You can change the active dataset simply by clicking anywhere in the Data Editor window of the data source that you want to use or by selecting the Data Editor window for that data source from the Window menu.

In command syntax, the default behavior remains the same as in previous releases: reading a new data source automatically replaces the active dataset. If you want to work with multiple datasets using command syntax, you need to use the DATASET commands.

The DATASET commands (DATASET NAME, DATASET ACTIVATE, DATASET DECLARE, DATASET COPY, DATASET CLOSE) provide the ability to have multiple data sources open at the same time and control which open data source is active at any point in the session. Using defined dataset names, you can then:

Merge data (for example, MATCH FILES, ADD FILES, UPDATE) from multiple different source types (for example, text data, database, spreadsheet) without saving each one as an external IBM® SPSS® Statistics data file first.

Create new datasets that are subsets of open data sources (for example, males in one subset, females in another, people under a certain age in another, or original data in one set and transformed/computed values in another subset).

Copy and paste variables, cases, and/or variable properties between two or more open data sources in the Data Editor.
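As a minimal sketch of the first point, the following syntax merges two open datasets by name without saving either one to disk. The dataset names (demo, finance, merged), the variable names, and the inline data values are assumptions chosen for illustration; in practice, each source could be read with GET DATA or a similar command instead of DATA LIST.

```
*Hypothetical inline data standing in for two different sources.
DATA LIST FREE /id age.
BEGIN DATA
1 34 2 45 3 29
END DATA.
DATASET NAME demo.
DATA LIST FREE /id income.
BEGIN DATA
1 52000 2 61000 3 48000
END DATA.
DATASET NAME finance.
*Both datasets are already sorted by id, as MATCH FILES with BY requires.
MATCH FILES /FILE=demo /FILE=finance /BY id.
DATASET NAME merged.
EXECUTE.
```

The merged result becomes the active dataset, and naming it keeps it available alongside demo and finance for the rest of the session.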

Operations

Commands operate on the active dataset. The active dataset is the data source most recently opened (for example, by commands such as GET DATA, GET SAS, GET STATA, GET TRANSLATE) or most recently activated by a DATASET ACTIVATE command.

Note: The active dataset can also be changed by clicking anywhere in the Data Editor window of an open data source or selecting a dataset from the list of available datasets in a syntax window toolbar.

Variables from one dataset are not available when another dataset is the active dataset.

Transformations to the active dataset—before or after defining a dataset name—are preserved with the named dataset during the session, and any pending transformations to the active dataset are automatically executed whenever a different data source becomes the active dataset.

Dataset names can be used in most commands that can contain references to SPSS Statistics data files.

Wherever a dataset name, file handle (defined by the FILE HANDLE command), or filename can be used to refer to SPSS Statistics data files, defined dataset names take precedence over file handles, which take precedence over filenames. For example, if file1 exists as both a dataset name and a file handle, FILE=file1 in the MATCH FILES command will be interpreted as referring to the dataset named file1, not the file handle.
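The precedence rule can be sketched as follows. The file path, the second dataset name (file2), and the variables and data values are hypothetical; only the name file1 comes from the text above.

```
*The path below is hypothetical; file1 is first defined as a file handle.
FILE HANDLE file1 /NAME='/examples/data/file1.sav'.
DATA LIST FREE /id x.
BEGIN DATA
1 10 2 20
END DATA.
*file1 is now also a dataset name.
DATASET NAME file1.
DATA LIST FREE /id y.
BEGIN DATA
1 100 2 200
END DATA.
DATASET NAME file2.
*FILE=file1 resolves to the open dataset, not to the file handle.
MATCH FILES /FILE=file1 /FILE=file2 /BY id.
EXECUTE.
```

Using distinct names for datasets and file handles avoids having to rely on this rule.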

Example
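The commands in this example assume that two datasets, file1 and file2, have already been defined. A minimal sketch of such a setup, consistent with the explanation that follows the example, might look like this; the data values and the COMPUTE expression are illustrative assumptions.

```
*Setup sketch; data values and the COMPUTE expression are assumptions.
DATA LIST FREE /file1Var.
BEGIN DATA
11 12 13
END DATA.
DATASET NAME file1.
COMPUTE file1Var=MOD(file1Var,10).
DATA LIST FREE /file2Var.
BEGIN DATA
21 22 23
END DATA.
DATASET NAME file2.
```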

*file2 is now the active dataset; so the following command will generate an error.

FREQUENCIES VARIABLES=file1Var.

*now activate dataset file1 and rerun Frequencies.

DATASET ACTIVATE file1.

FREQUENCIES VARIABLES=file1Var.

The first DATASET NAME command assigns a name to the active dataset (the data defined by the first DATA LIST command). This keeps the dataset open for subsequent use in the session after other data sources have been opened. Without this command, the dataset would automatically close when the next command that reads/opens a data source is run.

The COMPUTE command applies a transformation to a variable in the active dataset. This transformation will be preserved with the dataset named file1. The order of the DATASET NAME and COMPUTE commands is not important. Any transformations to the active dataset, before or after assigning a dataset name, are preserved with that dataset during the session.

The second DATA LIST command creates a new dataset, which automatically becomes the active dataset. The subsequent FREQUENCIES command that specifies a variable in the first dataset will generate an error, because file1 is no longer the active dataset, and there is no variable named file1Var in the active dataset.

DATASET ACTIVATE makes file1 the active dataset again, and now the FREQUENCIES command will work.

Example

*dataset_subsets.sps.

DATASET CLOSE ALL.

DATA LIST FREE /gender.

BEGIN DATA

0 0 1 1 0 1 1 1 0 0

END DATA.

DATASET NAME original.

DATASET COPY males.

DATASET ACTIVATE males.

SELECT IF gender=0.

DATASET ACTIVATE original.

DATASET COPY females.

DATASET ACTIVATE females.

SELECT IF gender=1.

EXECUTE.

The first DATASET COPY command creates a new dataset, males, that represents the state of the active dataset at the time it was copied.

The males dataset is activated and a subset of males is created.

The original dataset is activated, restoring the cases deleted from the males subset.

The second DATASET COPY command creates a second copy of the original dataset with the name females, which is then activated and a subset of females is created.

Three different versions of the initial data file are now available in the session: the original version, a version containing only data for males, and a version containing only data for females.
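One way to confirm this is to activate each dataset in turn and run the same procedure on it. The following is a sketch that assumes the three datasets created above are still open; closing the subsets at the end is optional and is shown only to illustrate DATASET CLOSE.

```
DATASET ACTIVATE males.
FREQUENCIES VARIABLES=gender.
DATASET ACTIVATE females.
FREQUENCIES VARIABLES=gender.
DATASET ACTIVATE original.
FREQUENCIES VARIABLES=gender.
*Close the subsets when they are no longer needed.
DATASET CLOSE males.
DATASET CLOSE females.
```

With the ten values in the example data, males and females each contain five cases, and original still contains all ten.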

Figure 4-1

Multiple subsets available in the same session
