
TEACHING STATISTICS THROUGH DATA ANALYSIS


Thomas Piazza

Survey Research Center and Department of Sociology

University of California

Berkeley, California

The question of how best to teach statistics is sometimes phrased in terms of learning by theory versus learning by doing. Unfortunately each approach, by itself, has serious drawbacks. We all know that statistical theory and proofs tend to be forgotten soon unless put to some use. On the other hand, to plunge a student into data analysis does not guarantee that he or she will understand the rationale for, and the limits of, the procedures that can now be so easily carried out by using statistical software packages.

The real problem, as I see it, is how to guide the use of data analysis as a learning tool so that it leads students to an understanding of the statistical principles on which the various analytic techniques are based. In what follows I will summarize the pros and cons of three approaches to the use of data analysis in teaching statistics. Then I will try to draw a few conclusions about the matter. Note that the students we will be concerned with here are those who are at a beginning level and who are interested primarily in application. At a more advanced level, especially for those wishing to become professional statisticians, the balance can shift more in the direction of theoretical exposition.

1. Large Batch-Oriented Statistical Packages

The most widely diffused data analysis tools are the large statistical packages such as SAS, BMDP, SPSS-X, and other sets of programs that offer a wide variety of data management and analysis procedures. These packages can be used to carry out analyses in conjunction with a statistics course. A data file can be set up for students to use, and they can carry out assignments or work on projects using an available statistical package. A partial step in this direction is the textbook by Mosteller, Fienberg, and Rourke (1983), which supplements the discussion of multiple regression by displaying and interpreting computer output; a few problems are even provided specifically for students with access to a computer with a multiple regression program.

The use of large packages for instruction has both advantages and disadvantages. The principal advantage of using statistical packages is their comprehensive nature. Although the multiplicity of available procedures can cause confusion at first, students profit by being exposed to the variety of possible methods that can be applied to the analysis of data. They also learn a great deal in the process of figuring out the meaning of the computer output. Furthermore, since students will most likely be using statistical packages in their work later on, the use of these programs can make the courses seem more realistic and practical.


It is also worth noting that these packages are readily available in most universities. If adequate computer time can be obtained for instruction, a set of exercises can be developed and quickly incorporated into a course.

A disadvantage of using statistical packages for instruction is the false sense of precision that students can acquire. The computer programs rarely complain when basic assumptions are being violated or when the form of the data is inappropriate for the procedure used. The computer printout can look good but still be nonsense.
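A minimal illustration of output that looks precise but means nothing, written in Python rather than in any of the packages named above. The variable and its coding scheme are hypothetical: region of residence is stored as arbitrary numeric codes, and the program averages them without complaint.

```python
from statistics import mean

# Hypothetical nominal variable: region of residence, stored as arbitrary
# numeric codes (1 = North, 2 = South, 3 = East, 4 = West).
region_codes = [1, 4, 2, 2, 3, 4, 1, 4]

# The software computes an "average region" to three decimals without any
# warning, although the codes have no order or spacing that would make an
# average meaningful.
average_region = mean(region_codes)
print(f"mean region code = {average_region:.3f}")
```

The printed value is arithmetically correct and statistically meaningless, which is the kind of result a student needs to learn to question.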

Another problem is that students can get bogged down with the mechanics of producing the runs. The correct specification of control language and options can be so distracting that students implicitly come to think that the successful production of computer printout is the same as the solution to a statistical problem. It is important for students to remain aware that the purpose of the course is to teach statistical principles and sound analytic technique, not just to develop facility with a particular statistical package.

A related problem is that turn-around time can still be slow at many institutions. Ideally the computer should give results immediately for the problems we submit. A delay of a few hours may be tolerable; however, if the results are not available for days, it is easy for students to lose track of what the whole point of the analysis was. Once again, it is important that students not get bogged down with mechanical details.

On balance, consequently, the use of large statistical packages for teaching statistics can be helpful, but these packages are certainly not the ideal solution. It is worth considering other approaches as well.

2. Interactive Analysis Programs

Computer programs for minicomputers and microcomputers have begun to appear which are designed for interactive data analysis. These programs are somewhat more promising for teaching statistics than are the large batch-oriented packages. Examples are the S package from Bell Laboratories and ISP from Berkeley; the microcomputer versions of SPSS and SAS can also be considered semi-interactive.

The distinguishing characteristic of interactive data analysis programs is that the results of one stage of analysis can immediately be used to specify the next stage. At the simplest level this means that the results can be examined on the screen of a computer terminal and used as the basis for an additional analysis specification. At a more sophisticated level this means that intermediate results are available for examination and can guide further analytic steps without requiring the user to issue a whole new set of commands.

The main advantage of interactive programs is this immediate feedback. This approach encourages exploration of the data, and it facilitates the development of good problem-solving strategies. Some of the disadvantages of batch-oriented programs can thus be overcome. The text by McNeil (1977) illustrates some of these possibilities.

On the other hand, interactive statistical programs still have some of the same problems as batch programs for purposes of teaching statistics. There are still the problems of having to spend a lot of time learning to use the programs and of getting distracted by the mechanics of producing results. Furthermore, interactive programs produce results in the same uncritical manner as any other statistical computer program, and those results need not make much sense.

All in all, however, an interactive statistical package is likely to be more appropriate for teaching statistics than a batch-oriented package. If such a set of programs is available for student use, it is well worth investigating.

3. Simulations Based on Spreadsheets

There is one other computer application that I wish to mention in this context, and that is the use of spreadsheet programs that are widely available for microcomputers. These programs set up a large matrix in which some cells can be defined as a function of other cells. When the latter cells are modified, then the cells defined as functions also change. These functions can be as simple as the sum of two cells or as complex as a chi-square statistic based on a contingency table.

The instructional value of these spreadsheet programs is due to their ability to make clear and transparent how a set of observations generates a particular statistic. A student can see immediately how a change in observed values leads to a change in the statistic (Kelman et al., 1983). If set up appropriately, a spreadsheet application can allow the student to simulate easily the effects of different distributions or patterns and to examine the behavior of a statistic under many conditions.

Let me give a relatively simple example of a spreadsheet that I have used in my own courses. In a course on log-linear analytic methods I wanted the students to develop a good intuitive sense of the behavior of the odds-ratio in a 2 x 2 contingency table. So I set up a spreadsheet with a contingency table and the computed odds-ratio (plus Yule's Q and the logarithm of the odds-ratio). The students could change the cell counts of the contingency table and see immediately how those changes affected the odds-ratio (and the other associated statistics). In a very short period of time a student could explore quite thoroughly the behavior of this statistic under various conditions.
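The original spreadsheet is not reproduced here, so the following Python sketch only approximates the idea. It uses the standard definitions of the odds-ratio, Yule's Q, and the log odds-ratio for a table with cells a, b (first row) and c, d (second row); the function name and the example counts are hypothetical.

```python
import math


def two_by_two_stats(a, b, c, d):
    """Return (odds ratio, Yule's Q, log odds ratio) for the table [[a, b], [c, d]]."""
    if min(a, b, c, d) <= 0:
        # Zero cells make the odds ratio or its logarithm undefined.
        raise ValueError("this sketch requires four positive cell counts")
    odds_ratio = (a * d) / (b * c)
    # Yule's Q rescales the odds ratio to the interval (-1, 1):
    # Q = (ad - bc) / (ad + bc) = (OR - 1) / (OR + 1).
    yules_q = (a * d - b * c) / (a * d + b * c)
    return odds_ratio, yules_q, math.log(odds_ratio)


# Changing the cell counts and recomputing mimics editing the spreadsheet.
for cells in [(20, 20, 20, 20), (30, 10, 10, 30), (10, 30, 30, 10)]:
    odds_ratio, yules_q, log_odds = two_by_two_stats(*cells)
    print(cells, f"OR={odds_ratio:.3f}  Q={yules_q:.3f}  log OR={log_odds:.3f}")
```

With equal counts the odds-ratio is 1 and both Q and the log odds-ratio are 0. Reversing the direction of the association, as in the last two tables, inverts the odds-ratio (9 versus 1/9) but only changes the sign of Q and of the log odds-ratio, a symmetry that is easy to miss when looking at the odds-ratio alone.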

A more interesting and complex example from the same course was a spreadsheet set up to generate the model of independence and the associated chi-square statistic in a bivariate contingency table. A starting table was also included so that some of the cells could be defined as structural zeros. The student could manipulate both the cell frequencies and the starting table in order to see how well the models of independence and quasi-independence fit the cell frequencies. I found this spreadsheet application to be particularly helpful to the students.
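The text does not say how the spreadsheet computed the fitted values. The Python sketch below assumes iterative proportional fitting, a common way to fit these models from a starting table of ones and zeros, and compares the result to the data with the Pearson chi-square statistic. The function names and all cell counts are hypothetical.

```python
def fit_margins(observed, start, iterations=200):
    """Iterative proportional fitting (an assumed technique, see the lead-in).

    Rescales the rows, then the columns, of the starting table until its
    margins match those of the observed table. Cells where start == 0 are
    structural zeros: they stay at zero and their observed counts are ignored.
    """
    n_rows, n_cols = len(observed), len(observed[0])
    masked = [[observed[i][j] if start[i][j] else 0 for j in range(n_cols)]
              for i in range(n_rows)]
    fitted = [[float(value) for value in row] for row in start]
    for _ in range(iterations):
        for i in range(n_rows):                      # match row totals
            target, current = sum(masked[i]), sum(fitted[i])
            if current > 0:
                fitted[i] = [value * target / current for value in fitted[i]]
        for j in range(n_cols):                      # match column totals
            target = sum(masked[i][j] for i in range(n_rows))
            current = sum(fitted[i][j] for i in range(n_rows))
            if current > 0:
                for i in range(n_rows):
                    fitted[i][j] *= target / current
    return masked, fitted


def pearson_chi_square(observed, fitted):
    """Sum of (observed - fitted)^2 / fitted over cells with fitted > 0."""
    return sum((o - e) ** 2 / e
               for o_row, e_row in zip(observed, fitted)
               for o, e in zip(o_row, e_row) if e > 0)


# Independence: the starting table is all ones (hypothetical counts).
counts = [[10, 20], [30, 40]]
_, independence_fit = fit_margins(counts, [[1, 1], [1, 1]])
independence_chi2 = pearson_chi_square(counts, independence_fit)

# Quasi-independence: zeros on the diagonal of the starting table exclude
# those cells from the model (hypothetical counts).
mobility = [[50, 10, 5], [8, 60, 12], [4, 9, 40]]
off_diagonal = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
masked, quasi_fit = fit_margins(mobility, off_diagonal)
quasi_chi2 = pearson_chi_square(masked, quasi_fit)

print("independence fit:", independence_fit, "chi-square:", round(independence_chi2, 3))
print("quasi-independence chi-square:", round(quasi_chi2, 3))
```

For the 2 x 2 example the fitted counts are the familiar row total times column total divided by the grand total (12, 18, 28, 42), giving a chi-square of about 0.79. Editing `counts`, `mobility`, or the starting table and rerunning plays the role of changing cells in the spreadsheet.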

These spreadsheet applications can focus the students' attention on certain important principles or on the behavior of a specific statistic. Once set up, these spreadsheets are very easy for students to use, and they can provide real insights into the subject matter of the course.

The limitations of these spreadsheets, of course, are very clear. Not every calculation can be set up conveniently in such a format, and the functions available (logs, square roots, trigonometric functions, and so forth) vary from one program to the other. Another problem is that the data set for calculations must be relatively small, if students are to modify it easily and intelligibly.

All in all, however, I believe that this approach to using computers holds great promise for teaching statistical principles. At the present time it is a little cumbersome for the professor to have to set up spreadsheets for a course, but I suspect that in due time some appropriately designed software will facilitate the task or provide some equivalent methodology.

4. Conclusion


The basic question we need to address is how people learn: more specifically, how they learn statistical concepts and techniques. Most of us have probably come to the conclusion that it is not enough to teach statistical theorems and derivations, with a few exercises thrown in. We need more emphasis on the analysis of data, on putting statistics to work.

Nevertheless, it is not always easy to develop an approach that integrates theory, data analysis, and the requisite computer skills. As discussed above, statistical packages and programs can be helpful, but they are not a substitute for understanding statistical theory. As we also noted, students can spend so much time getting the computer to do something that they lose sight of the statistical principles that are supposedly being exercised.

One use of computers that I consider especially promising is the use of spreadsheet programs, which are available for practically any microcomputer. Once set up appropriately, these programs allow students to change a set of data and to see immediately the impact of those changes on a statistic. This method can help students to understand the meaning of a certain statistic or statistical principle and to acquire an intuitive grasp of how robust a statistic is under different conditions. This approach, although relatively undeveloped, is a good example of how computers and data analysis can further a theoretical understanding of statistics.

References

Kelman, P. et al. (1983). Computers in teaching mathematics. Reading, Mass.: Addison-Wesley.


McNeil, D. R. (1977). Interactive data analysis: A practical primer. New York: John Wiley.

Mosteller, F., Fienberg, S.E., & Rourke, R.E.K. (1983). Beginning statistics with data analysis. Reading, Mass.: Addison-Wesley.
