TEACHING STATISTICS THROUGH DATA ANALYSIS Thomas Piazza
Survey Research Center and Department of Sociology University of California
Berkeley, California
T h e q u e s t i o n o f how b e s t t o teach s t a t i s t i c s i s sometimes p h r a s e d i n t e r m s o f l e a r n i n g by t h e o r y v e r s u s l e a r n i n g by d o i n g . U n f o r t u n a t e l y each a p - proach, by itself, has serious d r a w b a c k s . We a l l k n o w t h a t s t a t i s t i c a l t h e o r y a n d p r o o f s t e n d t o b e f o r g o t t e n soon unless p u t t o some use. O n t h e o t h e r hand, t o p l u n g e a s t u d e n t i n t o d a t a analysis does n o t g u a r a n t e e t h a t h e o r she w i l l ~ i n d e r s t a n d t h e r a t i o n a l e f o r , a n d t h e limits of, t h e p r o - c e d u r e s t h a t can now be so e a s i l y c a r r i e d o u t by u s i n g s t a t i s t i c a l s o f t w a r e packages.
T h e r e a l problem, as 1 see it, is how t o g u i d e t h e use o f data analysis as a l e a r n i n g t o o l so t h a t it leads s t u d e n t s t o an u n d e r s t a n d i n g o f t h e statistical p r i n c i p l e s o n w h i c h t h e v a r i o u s a n a l y t i c t e c h n i q u e s a r e based. I n w h a t f o l - lows I w i l l summarize t h e p r o s a n d cons o f t h r e e approaches t o t h e use o f d a t a analysis in t e a c h i n g s t a t i s t i c s . T h e n I w i l l t r y t o d r a w a f e w c o n c l u - sions a b o u t t h e m a t t e r . Note t h a t t h e s t u d e n t s we w i l l b e concerned w i t h h e r e a r e t h o s e who a r e a t a b e g i n n i n g level a n d who a r e i n t e r e s t e d p r i m a r - ily i n application. A t a more advanced level, especially f o r those w i s h i n g t o become professional statisticians, t h e balance can s h i f t more i n t h e d i r e c - t i o n o f t h e o r e t i c a l exposition.
1. Large Batch-Oriented Statistical Packages
T h e most w i d e l y d i f f u s e d data analysis tools a r e t h e l a r g e s t a t i s t i c a l p a c k - ages s u c h as SAS, BMDP, SPSS-X, a n d o t h e r sets o f p r o g r a m s t h a t o f f e r a w i d e v a r i e t y o f data management a n d analysis p r o c e d u r e s . These packages can b e u s e d t o c a r r y o u t analyses i n c o n j u n c t i o n w i t h a s t a t i s t i c s c o u r s e A data f i l e can b e set u p f o r s t u d e n t s t o use, a n d t h e y can c a r r y o u t a s s i g n - ments o r w o r k o n p r o j e c t s u s i n g an available s t a t i s t i c a l package. A p a r t i a l s t e p i n t h i s d i r e c t i o n is t h e t e x t b o o k by Mosteller, Fienberg, a n d R o u r k e (1988), w h i c h supplements t h e discussion o f m u l t i p l e r e g r e s s i o n by d i s - p l a y i n g a n d i n t e r p r e t i n g computer o u t p u t ; a f e w problems a r e even p r o - v i d e d s p e c i f i c a l l y f o r s t u d e n t s w i t h access t o a computer w i t h a m u l t i p l e r e g r e s s i o n p r o g r a m .
T h e use o f l a r g e packages f o r i n s t r u c t i o n has b o t h advantages a n d d i s - advantages. T h e p r i n c i p a l advantage o f u s i n g s t a t i s t i c a l packages i s t h e i r comprehensive n a t u r e . A l t h o u g h t h e m u l t i p l i c i t y o f available p r o c e d u r e s can cause confusion a t f i r s t , s t u d e n t s p r o f i t by b e i n g exposed t o t h e v a r i - e t y o f possible methods t h a t can b e a p p l i e d t o t h e analysis o f data. T h e y also l e a r n a g r e a t deal i n t h e process o f f i g u r i n g o u t t h e meaning o f t h e computer o u t p u t . Furthermore, since s t u d e n t s w i l l most l i k e l y b e u s i n g s t a t i s t i c a l packages i n t h e i r w o r k l a t e r on, t h e use o f these p r o g r a m s can make t h e courses seem more r e a l i s t i c a n d p r a c t i c a l .
It i s also w o r t h n o t i n g t h a t t h e s e packages a r e r e a d i l y available i n most u n i v e r s i t i e s . If adequate computer time can b e o b t a i n e d f o r i n s t r u c t i o n , a s e t o f exercises can b e developed a n d q u i c k l y i n c o r p o r a t e d i n t o a course. A d i s a d v a n t a g e o f u s i n g s t a t i s t i c a l packages f o r i n s t r u c t i o n i s t h e f a l s e sense o f p r e c i s i o n t h a t s t u d e n t s can a c q u i r e . T h e computer p r o g r a m s r a r e - ly complain when basic assumptions a r e b e i n g v i o l a t e d o r when t h e f o r m o f t h e data i s i n a p p r o p r i a t e f o r t h e p r o c e d u r e used. T h e computer p r i n t o u t can look good b u t s t i l l b e nonsense.
A n o t h e r p r o b l e m i s t h a t s t u d e n t s can g e t b o g g e d down w i t h t h e mechanics o f p r o d u c i n g t h e r u n s . T h e c o r r e c t specification o f c o n t r o l language a n d o p t i o n s can b e so d i s t r a c t i n g t h a t s t u d e n t s i m p l i c i t l y come t o t h i n k t h a t t h e successful p r o d u c t i o n o f computer p r i n t o u t i s t h e same as t h e s o l u t i o n t o a s t a t i s t i c a l problem. It i s i m p o r t a n t f o r students t o remain aware t h a t t h e p u r p o s e o f t h e c o u r s e i s t o teach s t a t i s t i c a l p r i n c i p l e s a n d s o u n d a n a l y t i c t e c h n i q u e
- n o t j u s t t o develop f a c i l i t y w i t h a p a r t i c u l a r s t a t i s t i c a l p a c k -
age.A r e l a t e d p r o b l e m i s t h a t t u r n - a r o u n d time can s t i l l b e slow a t m a n y i n s t i - t u t i o n s . I d e a l l y t h e computer s h o u l d g i v e r e s u l t s immediately f o r t h e p r o b - lems we submit. A d e l a y o f a f e w h o u r s may b e tolerable; however, if t h e r e s u l t s a r e n o t available for days, it is easy f o r students t o lose t r a c k o f w h a t t h e whole p o i n t o f t h e analysis was. Once again, it i s i m p o r t a n t t h a t s t u d e n t s n o t g e t b o g g e d down w i t h mechanical details.
O n balance, consequently, t h e use of l a r g e s t a t i s t i c a l packages f o r t e a c h - i n g s t a t i s t i c s can b e helpful, b u t t h e y a r e c e r t a i n l y n o t t h e ideal s o l u t i o n . It i s w o r t h c o n s i d e r i n g o t h e r approaches as well.
2 .
l
n t e r a c t i v e A n a l y s i s ProgramsComputer p r o g r a m s f o r minicomputers a n d microcomputers have b e g u n t o appear w h i c h a r e d e s i g n e d f o r i n t e r a c t i v e data analysis. These p r o g r a m s a r e somewhat more p r o m i s i n g f o r t e a c h i n g s t a t i s t i c s t h a n a r e t h e l a r g e b a t c h - o r i e n t e d packages. Examples a r e t h e S package f r o m Bell Labora- tories, a n d ISP f r o m Berkeley; t h e microcomputer v e r s i o n s o f SPSS a n d SAS can also b e c o n s i d e r e d semi-interactive.
T h e d i s t i n g u i s h i n g c h a r a c t e r i s t i c o f i n t e r a c t i v e data analysis p r o g r a m s i s t h a t t h e r e s u l t s o f one stage o f analysis can immediately b e u s e d t o s p e c i f y t h e n e x t stage. A t t h e simplest level t h i s means t h a t t h e r e s u l t s can b e examined o n t h e screen o f a computer terminal a n d used as t h e basis f o r an a d d i t i o n a l analysis specification. A t a more sophisticated level t h i s means t h a t i n t e r m e d i a t e r e s u l t s a r e available f o r examination a n d can g u i d e f u r t h e r a n a l y t i c steps w i t h o u t r e q u i r i n g t h e u s e r t o issue a whole new s e t o f commands.
T h e main advantage o f i n t e r a c t i v e p r o g r a m s i s t h i s immediate f e e d b a c k . T h i s a p p r o a c h encourages e x p l o r a t i o n o f t h e data, a n d it facilitates t h e development o f good p r o b l e m - s o l v i n g strategies. Some o f t h e disadvantages
of b a t c h - o r i e n t e d programs can t h u s b e overcome.. T h e t e x t b y McNeil (1 977) i l l u s t r a t e s some of t h e s e p o s s i b i l i t i e s .
O n t h e o t h e r hand, i n t e r a c t i v e s t a t i s t i c a l p r o g r a m s s t i l l have some. o f t h e same problems as b a t c h p r o g r a m s f o r p u r p o s e s o f t e a c h i n g s t a t i s t i c s . T h e r e a r e s t i l l t h e problems o f h a v i n g t o spend a l o t o f time l e a r n i n g t o use t h e p r o g r a m s a n d o f g e t t i n g d i s t r a c t e d by t h e mechanics o f p r o d u c i n g r e s u l t s . F u r t h e r m o r e , i n t e r a c t i v e p r o g r a m s p r o d u c e r e s u l t s i n t h e same u n c r i t i c a l manner as a n y o t h e r s t a t i s t i c a l computer program, a n d t h o s e r e - s u l t s need n o t make much sense.
A l l i n all, however, an i n t e r a c t i v e s t a t i s t i c a l package i s l i k e l y t o b e more a p p r o p r i a t e f o r t e a c h i n g s t a t i s t i c s t h a n a b a t c h - o r i e n t e d package. If s u c h a s e t o f p r o g r a m s i s available f o r s t u d e n t use, it i s well w o r t h i n v e s t i - g a t i n g .
3. Simulations Based o n Spreadsheets
T h e r e i s one o t h e r computer a p p l i c a t i o n t h a t I w i s h t o mention i n t h i s c o n - t e x t , a n d t h a t i s t h e use of spreadsheet programs t h a t a r e w i d e l y available f o r microcomputers. These p r o g r a m s s e t u p a l a r g e m a t r i x i n w h i c h some cells can b e d e f i n e d as a f u n c t i o n o f o t h e r cells. When t h e l a t t e r cells a r e modified, t h e n t h e cells d e f i n e d as f u n c t i o n s also change. These f u n c t i o n s can b e as simple as t h e sum o f t w o cells o r as complex as a c h i - s q u a r e s t a - t i s t i c based o n a c o n t i n g e n c y table.
T h e i n s t r u c t i o n a l value of these spreadsheet programs i s d u e t o t h e i r a b i l i - t y t o make clear and t r a n s p a r e n t how a set o f observations generates a p a r t i c u l a r s t a t i s t i c . A s t u d e n t can see immediately how a change i n o b - s e r v e d values leads t o a change i n t h e s t a t i s t i c (Kelman e t al., 1983). If s e t u p a p p r o p r i a t e l y , a spreadsheet application can allow t h e s t u d e n t t o simulate e a s i l y t h e e f f e c t s o f d i f f e r e n t d i s t r i b u t i o n s o r p a t t e r n s a n d t o examine t h e b e h a v i o r o f a s t a t i s t i c u n d e r many conditions.
L e t me g i v e a r e l a t i v e l y simple example o f a spreadsheet t h a t I h a v e used i n m y own courses. I n a course o n l o g - l i n e a r a n a l y t i c methods I wanted t h e s t u d e n t s t o develop a good i n t u i t i v e sense o f t h e b e h a v i o r o f t h e odds- r a t i o i n a 2 x 2 c o n t i n g e n c y table. So I s e t u p a spreadsheet w i t h a c o n t i n - g e n c y t a b l e a n d t h e computed o d d s - r a t i o ( p l u s Yule's Q and t h e l o g a r i t h m o f t h e o d d s - r a t i o ) . T h e s t u d e n t s c o u l d change t h e cell c o u n t s o f t h e c o n - t i n g e n c y t a b l e a n d see immediately how those changes a f f e c t e d t h e odds-
r a t i o ( a n d t h e o t h e r associated s t a t i s t i c s ) . I n a v e r y s h o r t p e r i o d o f time a s t u d e n t c o u l d e x p l o r e q u i t e t h o r o u g h l y t h e b e h a v i o r o f t h i s s t a t i s t i c u n d e r v a r i o u s c o n d i t i o n s .
A more i n t e r e s t i n g a n d complex example f r o m t h e same c o u r s e was a s p r e a d s h e e t s e t u p t o generate t h e model o f independence a n d t h e associ- a t e d c h i - s q u a r e s t a t i s t i c i n a b i v a r i a t e c o n t i n g e n c y table. A s t a r t i n g t a b l e was also i n c l u d e d so t h a t some o f t h e cells c o u l d be d e f i n e d as s t r u c t u r a l zeros. T h e s t u d e n t c o u l d manipulate b o t h t h e cell f r e q u e n c i e s a n d t h e s t a r t i n g t a b l e i n o r d e r t o see how well t h e models o f independence a n d
quasi-independence fit t h e c e l l f r e q u e n c i e s . 1 f o u n d t h i s spreadsheet a p p l i - c a t i o n t o b e p a r t i c u l a r l y h e l p f u l t o t h e s t u d e n t s .
T h e s e spreadsheet a p p l i c a t i o n s can f o c u s t h e s t u d e n t s ' a t t e n t i o n o n c e r t a i n i m p o r t a n t p r i n c i p l e s o r o n t h e b e h a v i o r o f a s p e c i f i c s t a t i s t i c . Once s e t up, t h e s e spreadsheets a r e v e r y e a s y f o r s t u d e n t s t o use, a n d t h e y can p r o - v i d e r e a l i n s i g h t s i n t o t h e s u b j e c t m a t t e r o f t h e course.
T h e limitations o f t h e s e spreadsheets, o f course, a r e v e r y clear. N o t e v e r y calculation can b e s e t u p c o r i v e n i e n t l y in such a format, a n d t h e f u n c t i o n s available (logs, s q u a r e roots, t r i g o n o m e t r i c functions, a n d so f o r t h ) v a r y f r o m one p r o g r a m t o t h e o t h e r . A n o t h e r problem i s t h a t t h e data s e t f o r calculations m u s t b e r e l a t i v e l y small, if s t u d e n t s a r e t o m o d i f y it easily a n d i n t e l l i g i b l y .
A l l in all, however, I b e l i e v e t h a t t h i s approach t o u s i n g computers holds g r e a t promise f o r t e a c h i n g s t a t i s t i c a l p r i n c i p l e s . A t t h e p r e s e n t time it i s a l i t t l e cumbersome f o r t h e p r o f e s s o r t o have t o s e t u p spreadsheets f o r a course, b u t I s u s p e c t t h a t in d u e t i m e some a p p r o p r i a t e l y d e s i g n e d s o f t - w a r e w i l l f a c i l i t a t e t h e t a s k o r p r o v i d e some e q u i v a l e n t methodology.
4. Conclusion
\
T h e b a s i c q u e s t i o n we need t o a d d r e s s i s how people l e a r n
-
more s p e c i f i c - ally, how t h e y l e a r n s t a t i s t i c a l concepts a n d techniques. Most o f us have p r o b a b l y come t o t h e conclusion t h a t it i s n o t enough t o teach statistical theorems a n d d e r i v a t i o n s , w i t h a f e w exercises t h r o w n in. We need more emphasis o n t h e analysis o f data, o n p u t t i n g s t a t i s t i c s t o w o r k .Nevertheless, it i s n o t always easy t o develop an approach t h a t i n t e g r a t e s t h e o r y , data analysis, a n d t h e r e q u i s i t e computer s k i l l s . A s discussed above, s t a t i s t i c a l packages a n d p r o g r a m s can b e helpful, b u t t h e y a r e n o t a s u b s t i t u t e f o r u n d e r s t a n d i n g s t a t i s t i c a l t h e o r y . A s we also noted, s t u - d e n t s can spend so much t i m e g e t t i n g t h e computer t o d o something t h a t t h e y lose s i g h t o f t h e s t a t i s t i c a l p r i n c i p l e s t h a t a r e supposedly b e i n g e x e r - c i sed
.
One use o f computers t h a t I c o n s i d e r especially p r o m i s i n g is t h e use o f spreadsheet programs, w h i c h a r e available f o r p r a c t i c a l l y a n y m i c r o - computer. Once s e t u p a p p r o p r i a t e l y , these programs allow s t u d e n t s t o change a set o f data a n d t o see immediately t h e impact o f those changes o n a s t a t i s t i c . T h i s method can h e l p s t u d e n t s t o u n d e r s t a n d t h e meaning o f a c e r t a i n s t a t i s t i c o r s t a t i s t i c a l p r i n c i p l e a n d t o a c q u i r e an i n t u i t i v e g r a s p o f how r o b u s t a s t a t i s t i c i s u n d e r d i f f e r e n t conditions. T h i s approach, a l - t h o u g h r e l a t i v e l y undeveloped, i s a good example o f how computers a n d data analysis can f u r t h e r a t h e o r e t i c a l u n d e r s t a n d i n g o f s t a t i s t i c s .
References
Kelman, P. e t al. (1983). Computers in teachinq mathematics. Reading, Mass. : Addison-Wesley.
McNeil, D. R. (1977). Interactive data analysis: A practical primer. New York: John Wiley.
Mosteller, F . , Fienberg, S.E., & Rourke, R.E. K. (1983). Beqinninq statis- tics with data analysis. Reading, Mass. : Addison-Wesley.