Summarizing and Displaying Categorical Data

Categorical data can be summarized in a frequency distribution, which counts the number of cases, or frequency, that fall into each category, or in a relative frequency distribution, which measures the percentage of the data set, or proportion, within each category.
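As a minimal sketch of the two summaries just described, the following Python tallies a frequency distribution and a relative frequency distribution. The function names and the sample data are hypothetical, invented only for illustration.

```python
from collections import Counter

def frequency_distribution(values):
    """Count how many cases fall into each category."""
    return dict(Counter(values))

def relative_frequency_distribution(values):
    """Proportion of the data set that falls into each category."""
    n = len(values)
    if n == 0:
        raise ValueError("the data set is empty")
    return {category: count / n for category, count in Counter(values).items()}

# Hypothetical sample data (not from the notes).
colors = ["red", "blue", "red", "green", "blue", "red", "red", "blue"]

freq = frequency_distribution(colors)
rel = relative_frequency_distribution(colors)

# Print one row per category: name, frequency, relative frequency.
for category in freq:
    print(f"{category:<6} {freq[category]:>2}  {rel[category]:.3f}")
```

The relative frequencies always sum to 1, which is a quick check that no case was dropped.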

Categorical data can be visualized in a bar graph.

Bars, labelled by category, have heights determined by the frequency (or relative frequency) of data in that category. Bars should be separated by gaps across the display. [Excel: Data > PivotTable; Insert > Charts > Column > Clustered Column]

A pie chart represents categories with labeled sectors in a circle; the proportion of data in each category equals the proportion of the circle's area assigned to its sector. It's best not to use a pie chart when the number of categories is large. [Excel: Data > PivotTable; Insert > Charts > Pie > Pie]

Summarizing Categorical Data

The most common categorical data involves a variable with only two possible values: either the individual being measured possesses some characteristic of interest, or it doesn’t. The resulting categories can be referred to as either Success or Failure.

• proportion of successes (p)

statistic that summarizes the data set by recording the proportion of data values which are Successes:

$$p = \frac{\text{number of Successes}}{n},$$

where n represents the number of values in the data set

• proportion of failures (q)

statistic that summarizes the data set by recording the proportion of data values which are Failures:

$$q = \frac{\text{number of Failures}}{n};$$

since there are only two categories, we always have that q = 1 − p.
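The two proportions can be computed directly from their definitions. The sketch below is illustrative only: the function name, the choice of "yes" as the Success label, and the sample responses are all assumptions rather than part of the notes.

```python
def success_proportions(values, success):
    """Return (p, q): the proportions of Successes and Failures.

    `success` is the value treated as a Success; every other value
    counts as a Failure.
    """
    n = len(values)
    if n == 0:
        raise ValueError("the data set is empty")
    successes = sum(1 for v in values if v == success)
    failures = n - successes
    return successes / n, failures / n

# Hypothetical two-category data (not from the notes).
responses = ["yes", "no", "yes", "yes", "no"]

p, q = success_proportions(responses, "yes")
print(f"p = {p:.2f}, q = {q:.2f}")
```

Because every value is either a Success or a Failure, the two counts add to n, so q agrees with 1 − p up to floating-point rounding.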

Displaying Quantitative Data

Numerical data can be visualized with a histogram.

Data are separated into (usually equal) intervals along a numerical scale, called classes, then the frequency distribution of data in each class is tallied. Bars are built over each interval with heights, measured along a vertical scale, given by the frequency (or relative frequency) of data within each class.

[Excel: Data > PivotTable; PivotTableTools > Options > Group > Group Field; Insert > Charts > Column > Clustered Column; Format Data Series > Series Option > Gap Width > No Gap]
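Outside Excel, the tallying step behind a histogram can be sketched in a few lines of Python. This is an illustration under stated assumptions: classes are equal-width, each class includes its lower limit and excludes its upper limit (a convention the notes do not specify), and the function name and data are hypothetical.

```python
def tally_classes(data, start, width, num_classes):
    """Count the observations falling in each equal-width class.

    Class i covers start + i*width <= x < start + (i+1)*width
    (an assumed convention).
    """
    counts = [0] * num_classes
    for x in data:
        index = int((x - start) // width)
        if not 0 <= index < num_classes:
            raise ValueError(f"{x} lies outside the classes")
        counts[index] += 1
    return counts

# Hypothetical measurements (not from the notes).
scores = [52, 67, 71, 75, 78, 81, 83, 88, 90, 95]
counts = tally_classes(scores, start=50, width=10, num_classes=5)

# Text histogram: one row per class, one '#' per observation.
for i, count in enumerate(counts):
    lower = 50 + 10 * i
    print(f"{lower:>3}-{lower + 10:<3} {'#' * count}")
```

Dividing each count by the number of observations gives the relative frequency version of the same display.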

A polygon display is obtained by replacing the bars of a histogram with a broken line joining points which are plotted at the midpoints of tops of the bars for each class interval.

[Excel: build histogram, then ... Change Series Chart Type > Line > Line with Markers]
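The points joined by the polygon can be listed explicitly: one point per class, at the class midpoint and at the height of that class's bar. The sketch below assumes equal-width classes; the function name, class limits, and counts are hypothetical.

```python
def polygon_points(start, width, counts):
    """Return (midpoint, frequency) pairs, one per class interval."""
    return [(start + (i + 0.5) * width, count) for i, count in enumerate(counts)]

# Hypothetical class frequencies for classes 50-60, 60-70, ..., 90-100.
class_counts = [1, 1, 3, 3, 2]
points = polygon_points(start=50, width=10, counts=class_counts)

for midpoint, frequency in points:
    print(f"midpoint {midpoint:5.1f}  frequency {frequency}")
```

Joining consecutive points with line segments gives the broken line that replaces the bars.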

A cumulative frequency distribution records the number of observations that fall at or below the upper limits of each class; a cumulative relative frequency distribution records the proportion of observations that fall at or below the upper limits of the classes. The histogram-like display of the cumulative (relative) frequency distribution formed by erecting bars over each class is called an ogive.

[Excel: build polygon, then ... PivotTable Field List > Values > Value Field Settings > Show Values As > % Running Total In]
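A cumulative distribution is a running total of the class frequencies, which the following sketch computes with `itertools.accumulate`. The function names and the class counts are hypothetical, chosen only to illustrate the definitions above.

```python
from itertools import accumulate

def cumulative_frequencies(counts):
    """Running total of class frequencies, lowest class first."""
    return list(accumulate(counts))

def cumulative_relative_frequencies(counts):
    """Running total expressed as a proportion of all observations."""
    total = sum(counts)
    if total == 0:
        raise ValueError("there are no observations")
    return [running / total for running in accumulate(counts)]

# Hypothetical frequencies for five consecutive classes.
class_counts = [1, 1, 3, 3, 2]

cum = cumulative_frequencies(class_counts)
cum_rel = cumulative_relative_frequencies(class_counts)

for running, proportion in zip(cum, cum_rel):
    print(f"{running:>3}  {proportion:.2f}")
```

The last cumulative frequency equals the number of observations, and the last cumulative relative frequency equals 1.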

A quick way to display numerical data by hand is with a stem-and-leaf display. All but the rightmost digit (or digits) of the measurement become stems; stems head rows in which the remaining digit(s), the leaves, are listed, lined up vertically in columns. (List all intermediate stems, even if they contain no leaves!)
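The construction can also be sketched in code. The version below is a simplified illustration that assumes non-negative integers and uses only the final digit as the leaf; the function name and the data are hypothetical. It lists every intermediate stem, including those with no leaves.

```python
def stem_and_leaf(data):
    """Return the rows of a stem-and-leaf display as strings.

    Assumes non-negative integers; the last digit is the leaf and
    the remaining digits form the stem.
    """
    rows = {}
    for x in sorted(data):
        stem, leaf = divmod(x, 10)
        rows.setdefault(stem, []).append(leaf)
    lines = []
    # Walk every stem from smallest to largest so empty stems still appear.
    for stem in range(min(rows), max(rows) + 1):
        leaves = "".join(str(leaf) for leaf in rows.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}".rstrip())
    return lines

# Hypothetical measurements (not from the notes).
values = [52, 67, 71, 75, 78, 81, 83, 88, 90, 95, 112]
display = stem_and_leaf(values)
print("\n".join(display))
```

In this sample the stem 10 has no leaves, yet its row is still printed, so the gap between 95 and 112 stays visible.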

Describing Quantitative Data: Features of Interest

• The shape of a histogram or stem-and-leaf describes the distribution of the data – where data is concentrated and how it spreads out across the entire range of values.

• Where is the center of the distribution located?

• How much spread is there in the distribution? How tightly is the data clustered about the center?

• Is there more than one cluster, or mode? Is the data unimodal, bimodal, multimodal? Note: The location of modes can change with the scaling unit of a display (width of a bar).

• Is the distribution uniform (has a flat contour), indicating that every value is (roughly) equally represented?

Is it roughly symmetric, with equally frequent values on either side of the center (the distribution to the right of the center is the mirror image of what appears to the left)? Or is it skewed (heavier on one side of the center than the other) to the left or right, in the direction of the tail (region of most extreme values)?

Displaying Paired Numerical Data

Paired numerical data sets are quite common in statistical practice. This occurs when two Whats are measured for the same set of Whos. Often the goal is to determine whether values of one of the variables are affected by changes in values of the other variable.

• response (dependent) variable

measures a characteristic of interest in a study; the aim is to determine how this variable is affected by variation in some other quantity, namely, an . . .

• explanatory (independent or predictor) variable

a variable which may turn out to influence the outcome of the response variable

• scatterplot

display of paired data as points (x, y) in a coordinate plane; here, x represents the explanatory variable, y the response variable

[Excel: Insert > Charts > Scatter > Scatter with only Markers]

To investigate the possible relationship between the variables, look for overall patterns in the plot and be on the watch for outliers (points located far from the region where most data are clustered) or deviations from the overall patterns.

• association

tendency for change in one variable to be accompanied by change in the other

• direction

variables display a positive association if larger values of one tend to be paired with larger values of the other, and a negative association if larger values of one tend to be paired with smaller values of the other

• form

shape of the plot, including clusters of data points; linear relationships are most important

• strength

how closely the points conform to the overall shape of the plot
