UCLA Department of Statistics

Statistical Consulting Center

R Graphics II: Graphics for Exploratory Data Analysis

Irina Kukuyeva

[email protected]

Outline

1. Summary Plots
2. Time Series Plots
3. Geographical Plots
4. 3D Plots
5. Simulation Plots
6. Useful Links for R

1. Summary Plots
   - Basic Plots
   - Looking at Distributions
   - Identify-ing Observations
   - Exercise I
2. Time Series Plots
3. Geographical Plots
4. 3D Plots
5. Simulation Plots

Basic Plot I

    # Step 1: Load the Data
    library(alr3)
    data(UN2)
    attach(UN2)
    # Step 2: Subset appropriately
    ind <- which(Purban > 50)
    # Step 3: Plot
    plot(logFertility ~ logPPgdp, xlab = "logPPgdp", ylab = "logFertility",
         main = "logGDP vs logFertility Plot")
    points(logFertility[ind] ~ logPPgdp[ind], col = "red", pch = 19)
    legend("topright", pch = c(1, 19), col = 1:2, c("
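The legend() call in Basic Plot I is cut off in the source. A possible completion is sketched below, assuming the alr3 package (which supplies UN2) is installed. The two legend labels are read off the slide's figure rather than the code, and the data= arguments stand in for attach() as a stylistic substitution, not the slide's method. Note that the code subsets Purban > 50 while the figure's legend reads Purban>=50.

```r
# Legend labels as they appear in the slide's figure (not in the truncated code)
legend_labels <- c("Purban<50", "Purban>=50")

# Assumption: alr3 is installed; the plotting step is skipped otherwise
if (requireNamespace("alr3", quietly = TRUE)) {
  data("UN2", package = "alr3")
  ind <- which(UN2$Purban > 50)   # same subset rule as the slide
  plot(logFertility ~ logPPgdp, data = UN2,
       xlab = "logPPgdp", ylab = "logFertility",
       main = "logGDP vs logFertility Plot")
  # Overplot the more urban countries as filled red points
  points(logFertility ~ logPPgdp, data = UN2[ind, ], col = "red", pch = 19)
  legend("topright", legend = legend_labels, pch = c(1, 19), col = 1:2)
}
```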

Basic Plot II

[Figure: scatterplot titled "logGDP vs logFertility Plot"; x-axis logPPgdp (8 to 14), y-axis logFertility (0.0 to 2.0); legend: Purban<50, Purban>=50]

Segmented bar charts I

Displays two categorical variables at a time: [1]

    survey <- read.table("http://www.stat.ucla.edu/~mine/students_survey_2008.txt",
                         header = TRUE, sep = "\t")
    attach(survey)
    barplot(table(gender, hand), col = c("skyblue", "blue"),
            main = "Segmented Bar Plot \n of Gender")
    legend("topleft", c("females", "males"), col = c("skyblue", "blue"),
           pch = 16, inset = 0.05)

Segmented bar charts II

[Figure: segmented bar plot titled "Segmented Bar Plot of Gender"; bars for ambidextrous, left, right; y-axis 0 to 800; legend: females, males]

            ambidextrous   left   right
    female             9     67     806
    male              11     45     387

[1] These two slides are modified from the SCC Mini-Course "Introductory
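The segmented bar plot can also be rebuilt from the counts in the slide's table alone, which is handy if the survey file is unavailable. The sketch below assumes only those six counts; the names counts and props are illustrative, and the proportions at the end are an extra step not shown on the slides.

```r
# Counts copied from the table on the slide (rows: gender, columns: hand)
counts <- matrix(c(9, 67, 806,
                   11, 45, 387),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(gender = c("female", "male"),
                                 hand = c("ambidextrous", "left", "right")))

# Same segmented bar plot as the slide, drawn from the counts
barplot(counts, col = c("skyblue", "blue"),
        main = "Segmented Bar Plot \n of Gender")
legend("topleft", c("females", "males"),
       col = c("skyblue", "blue"), pch = 16, inset = 0.05)

# Share of each gender within each handedness category
props <- prop.table(counts, margin = 2)
round(props, 2)
```

Passing props instead of counts to barplot() would put every bar on a common 0 to 1 scale, which makes the small ambidextrous and left groups easier to compare with the right group.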

Dot charts I

To compare values for variables in each category:

    # Step 1: Load the data
    data(iris)
    attach(iris)
    # Step 2: Calculate means for each species:
    aggregate(iris[, -5], list(Species = Species), mean) -> a
    # Step 3: Assign row names
    row.names(a) <- a[, 1]
    # Step 4: Plot
    dotchart(t(a[, -1]), xlim = c(0, 10),
             main = "Plots of Means for Iris Data Set", xlab = "Mean Value")
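The same dot chart can be produced without attach(), which avoids leaving iris columns on the search path. This is a minimal sketch of that variant, not the slide's code; the names means and m are illustrative.

```r
# Mean of every numeric column, by species (formula interface to aggregate)
means <- aggregate(. ~ Species, data = iris, FUN = mean)

# Variables in rows, species in columns, as dotchart() expects for grouping
m <- t(as.matrix(means[, -1]))
colnames(m) <- as.character(means$Species)

dotchart(m, xlim = c(0, 10),
         main = "Plots of Means for Iris Data Set", xlab = "Mean Value")
```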

Dot charts II

[Figure: dot chart titled "Plots of Means for Iris Data Set"; groups setosa, versicolor, virginica, each with rows Sepal.Length, Sepal.Width, Petal.Length, Petal.Width; x-axis 0 to 10]

Histograms
Adding Summary Statistics to Plots

Add the mean and median to a histogram:

    hist(ageinmonths, main = "Histogram of Age (Mo)")
    abline(v = mean(ageinmonths), col = "blue", lwd = 3)
    abline(v = median(ageinmonths), col = "red", lwd = 3)
    legend("topright", c("Mean", "Median"), pch = 16, col = c("blue", "red"))

[Figure: histogram titled "Histogram of Age (Mo)"; x-axis ageinmonths (200 to 350), y-axis Frequency (0 to 400); legend: Mean, Median]
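The extracted slides do not show where ageinmonths is defined, so the sketch below repeats the mean-and-median overlay on a stand-in variable, faithful$waiting, which ships with R. The stand-in data, the variable names, and the lty-based legend (so that the key shows line samples matching abline()) are assumptions for illustration, not the slide's code.

```r
# Stand-in data: Old Faithful waiting times (minutes), shipped with R
x <- faithful$waiting
x_mean <- mean(x, na.rm = TRUE)
x_median <- median(x, na.rm = TRUE)

hist(x, main = "Histogram with Mean and Median", xlab = "waiting")
abline(v = x_mean, col = "blue", lwd = 3)
abline(v = x_median, col = "red", lwd = 3)
# lty/lwd make the legend show lines rather than points
legend("topleft", c("Mean", "Median"), lty = 1, lwd = 3,
       col = c("blue", "red"))
```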

Histograms I
Checking Normality

One way to check a variable for normality is to look at its histogram (the sample density is in red, the theoretical normal density in blue):

    data(presidents)
    hist(presidents, prob = TRUE, ylim = c(0, 0.04), breaks = 20)
    lines(density(presidents, na.rm = TRUE), col = "red")
    mu <- mean(presidents, na.rm = TRUE)
    sigma <- sd(presidents, na.rm = TRUE)
    x <- seq(10, 100, length = 100)
    y <- dnorm(x, mu, sigma)
    lines(x, y, lwd = 2, col = "blue")
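The histogram is described as one of several ways to check normality. As a complementary check that is not part of the slides, a normal Q-Q plot can be sketched with base R's qqnorm() and qqline(); points close to the line suggest approximate normality. The name ratings is illustrative.

```r
# presidents is a time series with missing quarters; keep observed values only
ratings <- as.numeric(presidents)
ratings <- ratings[!is.na(ratings)]

# Sample quantiles against theoretical normal quantiles
qq <- qqnorm(ratings, main = "Normal Q-Q Plot of Approval Ratings")
qqline(ratings, col = "blue", lwd = 2)
```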

Histograms II
Checking Normality

[Figure: histogram titled "Histogram of Presidents' Approval Ratings" with density curves; x-axis presidents (20 to 90), y-axis Density (0.00 to 0.04)]

Box and Whisker Plot I

Another way to look at the distribution of the data is with a boxplot:

    data(quakes)
    # Subset the magnitude:
    ind <- ifelse(quakes[, 4] < 4.5, 0, 1)
    ind <- as.factor(ind)
    library(lattice)
    bwplot(quakes[, 4] ~ ind, xlab = c("Mag<4.5", "Mag
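The bwplot() call in Box and Whisker Plot I is cut off in the source. One way to get a comparable plot is sketched below. The title, axis label, and group labels are taken from the slide's figure; putting the labels on the factor levels (rather than in xlab) and the names grp, d, and p are assumptions of this sketch, not the slide's code.

```r
library(lattice)

# Two magnitude groups, labelled as in the slide's figure
grp <- factor(quakes$mag >= 4.5, levels = c(FALSE, TRUE),
              labels = c("Mag<4.5", "Mag>=4.5"))
d <- data.frame(mag = quakes$mag, grp = grp)

p <- bwplot(mag ~ grp, data = d,
            ylab = "Magnitude", main = "Fiji EQ since 1964")
print(p)  # lattice plots must be printed when run from a script
```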

Box and Whisker Plot II

[Figure: box-and-whisker plot titled "Fiji EQ since 1964"; x-axis groups 0 and 1, labelled Mag<4.5 and Mag>=4.5; y-axis Magnitude (4.0 to 6.5)]

Beanplot I

An alternative to the boxplot is the beanplot():

    library(beanplot)
    par(mfrow = c(1, 2))
    data(airquality)
    boxplot(airquality[, 2], main = "Boxplot", xlab = "Solar")
    beanplot(airquality[, 2], main = "Beanplot", xlab = "Solar")

Beanplot II

[Figure: two panels side by side, "Boxplot" (axis 0 to 300) and "Beanplot" (axis 0 to 400)]

Scatterplots I

One way to look at the distribution and correlation of the data is with scatterplot.matrix():

    data(quakes)
    library(car)
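The slide's code is cut off after library(car), so the scatterplot.matrix() call itself is not shown (recent car releases name this function scatterplotMatrix()). As a stand-in that needs no extra package, and not the slide's method, base R's pairs() draws a plain scatterplot matrix, and cor() gives the pairwise correlations the plot is meant to reveal. The name cors is illustrative.

```r
# Plain scatterplot matrix of all five quakes variables (base graphics)
pairs(quakes, main = "Scatterplot Matrix of quakes")

# Pairwise Pearson correlations to read alongside the plot
cors <- cor(quakes)
round(cors, 2)
```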
