Frequency distributions

A frequency distribution enables you to inspect the contents of a field of columns containing alphabetic or numeric data. For example, in a shopping survey, the price each respondent paid for a bottle of mineral water may be stored in columns 112 to 114. A frequency distribution will tell you how many respondents bought mineral water at each price. This is very useful for determining how the values in these fields should be grouped for tabulation, as well as for rough estimates of medians.
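To show how a frequency distribution supports a rough median estimate, here is a small Python model (not Quantum syntax). It assumes purely numeric values and takes the first value, in ascending order, at which the cumulative count reaches half of all items. The function name and the sample prices are hypothetical.

```python
from collections import Counter

def rough_median(values):
    """Return the first value, in ascending order, whose cumulative count
    reaches at least half of all items (a rough median)."""
    counts = Counter(values)
    total = sum(counts.values())
    cumulative = 0
    for value in sorted(counts):
        cumulative += counts[value]
        if cumulative * 2 >= total:
            return value
    raise ValueError("no values supplied")

# Hypothetical prices (in pence) paid by ten respondents.
prices = [111, 111, 114, 120, 120, 120, 150, 212, 212, 212]
print(rough_median(prices))  # 120
```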

By default, each distribution has two parts. In the first part, the values in the column field are sorted in alphabetic or numeric order; in the second, they are sorted in rank order, according to the number of times each one occurs in the data. Any multicodes in the field are decoded and the constituent codes are listed. Each distribution shows both absolute and cumulative figures as well as percentages for both. At the end of the alphabetic sort, Quantum prints:

• The number of categories (that is, different values) found.

• The number of numeric items found.

• The sum of factors — that is, the sum of all wholly numeric items (values which occur more than once are counted as many times as they occur).

• The mean for the numeric items listed (that is, the sum of factors divided by the number of numeric items).

• The standard deviation of the numeric values listed.

If the field is a numeric field and the run has missing values processing switched on, any record in which the field does not contain a number has the value missing_ for that field. This value is counted as zero in the sum of factors, mean and standard deviation lines of the report.
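The summary lines listed above can be sketched in Python as follows. This is a model of the arithmetic, not Quantum's implementation, and it rests on stated assumptions: an item counts as "wholly numeric" when every character is an ASCII digit, the standard deviation is the population form (dividing by n), and non-numeric items are simply left out, so the missing_ handling described above is not modelled. The function names are hypothetical.

```python
import math

def is_numeric(item):
    # Assumption: "wholly numeric" means every character is an ASCII digit.
    return item.isascii() and item.isdigit()

def alphabetic_summary(items):
    """Model of the summary printed at the end of the alphabetic sort."""
    numeric = [int(item) for item in items if is_numeric(item)]
    n = len(numeric)
    sum_of_factors = sum(numeric)            # repeated values counted each time
    mean = sum_of_factors / n if n else 0.0
    # Assumption: population standard deviation (divide by n, not n - 1).
    sd = math.sqrt(sum((x - mean) ** 2 for x in numeric) / n) if n else 0.0
    return {
        "categories": len(set(items)),       # number of different values found
        "numeric_items": n,
        "sum_of_factors": sum_of_factors,
        "mean": mean,
        "std_dev": sd,
    }

items = ["111", "111", "114", "212", "ABC"]  # hypothetical field contents
print(alphabetic_summary(items))
```

Here the four numeric items sum to 548, giving a mean of 137.0; "ABC" counts as a category but not as a numeric item.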

Separate statements are provided for requesting a frequency distribution sorted in alphabetic (or numeric) order only, or in rank order only.

Creating a frequency distribution

Quick Reference

To create a frequency distribution sorted in alphabetic and rank orders, type:

list c(start_col, end_col) [$text$]

where text is the heading to be printed.

To produce a frequency distribution sorted in alphabetic order only, type lista instead of list. For a distribution sorted in rank order only, type listr instead of list.

A frequency distribution, like the example later in this section, is created with the list statement, as follows:

list c(m,n) [$text$]

where c(m,n) is the column field whose contents are to be listed and text is the heading to be printed at the top of each page. If no heading text is given, the heading ‘Frequency Distribution’ is used instead.

The list statement, as shown above, produces both the alphabetic and the rank-ordered distributions. To request an alphabetic distribution only, type:

lista c(m,n) [$text$]

and for a ranked distribution only, type:

listr c(m,n) [$text$]
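The difference between the two orderings can be modelled in Python (not Quantum syntax). The sketch assumes the field values are equal-width strings, so alphabetic and numeric order agree, and it breaks ties in the ranked listing by value, which the guide does not specify. Both function names are hypothetical.

```python
from collections import Counter

def alphabetic_order(values):
    """(value, count) pairs sorted by value: the lista-style listing."""
    return sorted(Counter(values).items())

def rank_order(values):
    """(value, count) pairs, most frequent first: the listr-style listing.
    Assumption: equal counts are ordered by value (the guide does not say)."""
    return sorted(Counter(values).items(), key=lambda pair: (-pair[1], pair[0]))

values = ["212", "111", "212", "114", "111", "212"]  # hypothetical field contents
print(alphabetic_order(values))  # [('111', 2), ('114', 1), ('212', 3)]
print(rank_order(values))        # [('212', 3), ('111', 2), ('114', 1)]
```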

Here are some examples:

listr c(107,108) $Contents of cols 7 and 8$

lista c(t1,t1+4) $First Set of Car Brands$

The first example produces a frequency distribution of the contents of c(107,108) sorted in rank order, with the most frequent value first; the second example generates a list of car brands sorted in alphabetic order.

The second example also uses a subscript, t1, to specify the column numbers. If t1 has a value of 36, Quantum will list the values found in columns 36 to 40.
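The column arithmetic behind c(t1,t1+4) can be sketched in Python. The helper `field` and the sample record are hypothetical; the sketch assumes each record is a single line of text with columns numbered from 1, so columns m to n correspond to the slice `record[m-1:n]`.

```python
def field(record, start, end):
    """Return columns start..end (1-based, inclusive) of a record string.
    Hypothetical helper: assumes one line of text per record."""
    return record[start - 1:end]

t1 = 36
record = "x" * 35 + "FORD " + "rest"   # hypothetical record: brand in cols 36-40
print(field(record, t1, t1 + 4))       # columns 36 to 40 -> 'FORD '
```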

The rules for double quotes in the text are the same as for holecounts: you must precede each one with a backslash.

The list in the diagram below shows a frequency distribution for the column field c(123,125). It was created by the statement:

list c(123,125) $PRICE PAID$

Since it was run on a data file containing 200 respondents, the total is 200.

Let’s start with the first table — the alphabetical sort. The figures in the column headed ‘string’ are the values found in columns 123 to 125, in this case, the price paid for a bottle of mineral water.

The next column (item) tells us how many times each value occurred in those columns, that is, how many people paid each price. We can see the actual number of people and also what percentage of the total sample that is. For instance, 31 respondents paid 111p, which is 15.5% of the total (200).

The columns labeled cumulative show accumulated totals and percentages for each value found.

For example, the cumulative figures show that 86 respondents, 43.0% of the total, paid between 111p and 114p.

The second table shows exactly the same information presented in rank order, with the most frequently occurring value first. The example shows that this is 212: 41 respondents, or 20.5% of all respondents, paid 212p for a bottle of mineral water.
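The absolute, percentage and cumulative columns can be reproduced with a short Python model (not Quantum syntax). `distribution_rows` and `percent` are hypothetical names; percentages are taken of the total number of items and rounded to one decimal place, matching the figures quoted above for a total of 200.

```python
from collections import Counter

def percent(count, total):
    """Percentage of the total, rounded to one decimal place."""
    return round(100 * count / total, 1)

def distribution_rows(values):
    """Rows of (value, count, percent, cumulative count, cumulative percent)
    in ascending value order."""
    counts = Counter(values)
    total = sum(counts.values())
    rows, running = [], 0
    for value in sorted(counts):
        running += counts[value]
        rows.append((value, counts[value], percent(counts[value], total),
                     running, percent(running, total)))
    return rows

# Figures quoted in the text, for a total of 200 respondents.
print(percent(31, 200), percent(86, 200), percent(41, 200))  # 15.5 43.0 20.5
```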

Unlike count, if list is part of a loop, it will be executed once for each pass through the loop. All values found will be entered in the same list: Quantum does not create a separate listing for each pass through the loop.
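The effect of running list inside a loop can be modelled in Python: every pass adds to one shared tally instead of starting a new one. The record layout (three 5-column brand fields) and the loop over starting columns are hypothetical, chosen only to mimic stepping a subscript such as t1 through the fields.

```python
from collections import Counter

# One shared tally: every pass through the loop adds to the same listing.
listing = Counter()
record = "AUDI FORD AUDI "   # hypothetical: three 5-column brand fields
for start in (1, 6, 11):     # like a loop stepping t1 through 1, 6 and 11
    listing[record[start - 1:start + 4]] += 1

print(sorted(listing.items()))  # [('AUDI ', 2), ('FORD ', 1)]
```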

Figure: frequency distribution headed PRICE PAID (image not reproduced). Its summary line reads: Number of categories = 14; Number of numeric items = 200; Sum of factors = 32218.00; Mean Value = 161.09.

Multiplied frequency distributions

Quick Reference

To create a multiplied or weighted frequency distribution, type:

list c(start_col, end_col) $text$ c(m_start, m_end)

where text is the frequency distribution title and c(m_start,m_end) is the field in the C array containing the multiplier or weight for each record. If the multiplier contains a decimal point, reference it as cx(m_start,m_end).

For a distribution sorted in alphabetic or rank order only, type lista or listr as appropriate instead of list.

Multiplied frequency distributions are created in exactly the same way as multiplied holecounts:

list c(m,n) [$text$] c(x,y)

As with count, c(m,n) is the column field whose values are to be listed, text is the optional heading to be printed at the top of the page, and c(x,y) is the field containing the multiplier. If the multiplier contains a decimal point, reference it as cx(x,y), otherwise the decimal point will be ignored and, for example, 1.5 will be read as 15. Multipliers may either be part of the original data, or they may be created during the edit, in which case they must be placed in the C array with a wttran statement before the frequency distribution is requested.
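A Python model of a multiplied listing, and of the difference between reading a multiplier as c(x,y) and as cx(x,y), is sketched below. It is not Quantum code: `read_multiplier` and `multiplied_list` are hypothetical names, and the records are invented for illustration. Each record adds its multiplier, rather than 1, to the total for its value.

```python
from collections import defaultdict

def read_multiplier(text, has_decimal_point):
    """Model of reading a multiplier field.
    has_decimal_point=False mimics c(x,y): the point is ignored, so '1.5' -> 15.
    has_decimal_point=True mimics cx(x,y): '1.5' -> 1.5."""
    if has_decimal_point:
        return float(text)
    return int(text.replace(".", ""))

def multiplied_list(records):
    """records: (value, multiplier) pairs. Each record adds its multiplier,
    rather than 1, to the total for its value."""
    totals = defaultdict(float)
    for value, multiplier in records:
        totals[value] += multiplier
    return dict(sorted(totals.items()))

# Hypothetical records: price field plus a multiplier field read as cx.
data = [("111", "1.5"), ("111", "0.5"), ("212", "2.0")]
weighted = multiplied_list((v, read_multiplier(m, True)) for v, m in data)
print(weighted)                       # {'111': 2.0, '212': 2.0}
print(read_multiplier("1.5", False))  # 15
```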

Multiplied frequency distributions are generally required when you are producing weighted tables and you want to check that you have the correct number of people in each row of a table.

☞ For further information about weighting and wttran, see section 1.9, ‘Copying weights into the data’ in the Quantum User’s Guide Volume 3.

In earlier chapters, we discussed ways of examining the data for a set of records (with count) or for an individual record (with write). In general, however, we want to check the validity of the data for individual records by placing in the edit a set of testing sentences that tell us not only whether a record contains an error but also what that error is.

There are two types of checking sentence. The first involves checking whether a column contains the correct type of coding (single-coding or multicoding) and whether the codes in that column are valid. Take the question on a respondent’s sex, which may be Male, coded c106'1', or Female, coded c106'2'. c106 must be single-coded because a respondent can be coded as only one sex, and the only codes that may appear in that column are 1 and 2. Any record in which c106 is not single-coded with a 1 or a 2 will be flagged as incorrect.
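The single-coding and valid-code test for c106 can be modelled in Python (not Quantum syntax). The sketch assumes the contents of a column are represented as the set of codes punched in it, so a multicoded column is a set with more than one member and a blank column is an empty set. The function name is hypothetical.

```python
def valid_sex_code(codes):
    """codes: set of codes found in c106 (empty if blank, several if multicoded).
    Valid only when single-coded with '1' (Male) or '2' (Female)."""
    return len(codes) == 1 and codes <= {"1", "2"}

for codes in ({"1"}, {"2"}, {"1", "2"}, set(), {"3"}):
    print(sorted(codes), valid_sex_code(codes))
```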

The second type of checking involves making sure that columns whose contents depend on the contents of other columns contain the correct codes. For instance, suppose the questionnaire asks whether the respondent has ever used a particular brand of washing-up liquid. The answer is coded into c125 as '1' for Yes and '2' for No. If the answer is Yes, the next questions concerning price and quality are asked. If c125'2', indicating that the respondent has not used that brand of washing-up liquid, the following columns must be blank. Conversely, if c125'1', the following columns must be coded according to the codes on the questionnaire.
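The dependency check on c125 can be sketched as a Python model (not Quantum syntax). Column contents are represented as sets of codes, as before. The dependent column names (c126, c127) and their permitted codes are hypothetical stand-ins for the price and quality questions, whose actual columns the text does not give.

```python
def check_brand_usage(c125, following, allowed):
    """Return an error message, or None if the record passes.
    c125: set of codes in c125 ('1' = Yes, '2' = No).
    following: column name -> set of codes in each dependent column.
    allowed: column name -> codes permitted by the questionnaire."""
    if c125 == {"2"}:
        coded = [col for col, codes in following.items() if codes]
        if coded:
            return f"c125'2' but {coded} not blank"
    elif c125 == {"1"}:
        bad = [col for col, codes in following.items()
               if not codes or not codes <= allowed[col]]
        if bad:
            return f"c125'1' but {bad} blank or miscoded"
    else:
        return "c125 not single-coded with '1' or '2'"
    return None

# Hypothetical dependent columns and their permitted codes.
allowed = {"c126": {"1", "2", "3"}, "c127": {"1", "2"}}
print(check_brand_usage({"2"}, {"c126": {"1"}, "c127": set()}, allowed))
```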

In document Quantum Vol1 (Page 152-157)