Methods Training

7.3 Statistical Software

There are many software programs available for conducting statistical analysis, and they all have different strengths and weaknesses. SAS and SPSS are two of the oldest platforms still in use. SAS is a powerful and flexible platform capable of doing almost anything you can imagine. However, it can be a bit cumbersome to learn. SPSS also has a wide variety of built-in features. It is very easy to use because it is based on point-and-click menus. However, it can be extremely difficult to use if you want to do something that is not included in its menu system. Stata is widely used in political science. It also has a point-and-click menu system, but it is easier to modify or adapt as needed than SPSS. More recently, the statistical platform R has become quite popular. It is free, open-source software that runs on Windows, Mac, and Unix-like systems. It is a cross between a programming language and statistical software. In that regard, it is more similar to SAS. However, R has thousands of built-in commands, called functions, and a large user community that has generated thousands of add-on packages. Being free is a big plus, but the real value of R is that you can program it to do almost anything. It is also an excellent environment for conducting simulations. I have more to say about R below.

Some scholars go further, using general-purpose programming languages like Python and Julia, tools for large databases like SQL, or even lower-level programming languages like C++. These tools go beyond what is necessary for most social science research. Still, it is good to know they are there.

Finally, there are a number of specialized software tools designed for specific types of analyses. Virtually all of the analysis executed by these programs could also be done in R or some other programming environment. You will have to evaluate the usefulness of these specialized programs on a case-by-case basis. The frontier of statistical programming is always changing. That said, I want to discuss two tools in more detail.

7.3.1 Using R

I have a strong preference for teaching and learning in R. Point-and-click menus make it too easy for students to conduct an analysis without really understanding what they are doing. Programming in R slows a student down and forces them to understand what they are doing. I often require students to program the actual mathematical operations rather than letting them use the built-in functions. R is also great for simulations, which makes it ideal for illustrating what researchers call the Data Generating Process (discussed in more detail later in this chapter). In my view, if you learn how to use R, you can learn how to use software like SAS, SPSS, or Stata, and you can also learn to program in something like Python.
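To make the two ideas in the paragraph above concrete (simulating a Data Generating Process and programming the mathematical operations yourself), here is a minimal sketch. It is written in Python rather than R purely for illustration, and everything in it is an assumption rather than something taken from this book: the linear form of the process, the parameter values, the seed, and the hypothetical function names simulate_data and ols_by_hand. The slope and intercept are computed from the textbook sums of squares and cross-products instead of a built-in regression routine.

```python
import random


def simulate_data(n, intercept, slope, noise_sd, seed):
    """Draw n observations from an assumed linear Data Generating Process:
    y = intercept + slope * x + e, with e ~ Normal(0, noise_sd)."""
    rng = random.Random(seed)  # seeded so the simulation is reproducible
    x = [rng.uniform(0, 10) for _ in range(n)]
    y = [intercept + slope * xi + rng.gauss(0, noise_sd) for xi in x]
    return x, y


def ols_by_hand(x, y):
    """Estimate a bivariate regression line without any built-in fitting routine."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products and sum of squared deviations of x
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Illustrative "true" values chosen for this sketch only
x, y = simulate_data(n=1000, intercept=2.0, slope=0.5, noise_sd=1.0, seed=42)
b0, b1 = ols_by_hand(x, y)
print("estimated intercept:", round(b0, 3))
print("estimated slope:", round(b1, 3))
```

Because the true intercept and slope are set by hand, the estimates can be compared directly with the values that generated the data, which is the basic logic of using simulation to study an estimator.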

You can download R online at http://cran.r-project.org/ and you can learn more about R in general at the R-project homepage: http://www.r-project.org/. R is not a point-and-click program. There are some Graphical User Interfaces (GUIs) available for R, but I don’t recommend them. Instead, you should learn to write text files, called script files in R, that send R a series of commands to execute. It is best to use a text editor in conjunction with R. R has one built in, but there are others with many more useful features. I currently use RStudio (http://www.rstudio.com/), which runs on both Windows machines and Macs. Learning R can be a bit more challenging than learning a point-and-click program, but R is much more powerful and flexible, and it is increasingly the computing environment of choice for those doing statistical work across a wide range of disciplines, including Political Science. More importantly, your goal is to learn about statistics, NOT about software. Programming in R is a far superior way to learn about statistics than using a point-and-click program.

There is no substitute for reading the documentation for R. I STRONGLY recommend that you begin with the manual called “An Introduction to R.” This document provides the core basics to understanding R as a statistical computing environment. You can find the manual by clicking the “Manuals” link on the CRAN homepage. The direct link to the .pdf file is here: http://cran.r-project.org/doc/manuals/R-intro.pdf. This manual is also downloaded and stored on your computer when you install R.

Springer (http://www.springer.com) publishes numerous books in its Use R! series that are designed to be practical applications of R for users. Many of these can be accessed online for free through a university’s library if the university has a subscription. One in particular that is quite useful is Data Manipulation with R by Phil Spector. Another is A Beginner’s Guide to R by Zuur et al. Chapman and Hall’s CRC Press also publishes a series called The R Series that has many good offerings.

There are also some very helpful and short reference documents for R commands that you might want to print and keep handy, which are located at http://cran.r-project.org/doc/contrib/Short-refcard.pdf, http://cran.r-project.org/doc/contrib/Baggott-refcard-v2.pdf, and http://www.psych.upenn.edu/~baron/refcard.pdf. John Fox wrote a book on applied regression, the website for which is http://socserv.mcmaster.ca/jfox/Books/Companion/index.html. There is also a companion book focused specifically on using R for regression analysis. Finally, searching online generally, through RSeek (http://www.rseek.org/), or at Quick-R (http://www.statmethods.net/) will often turn up quick and easy answers to most questions. The Odum Institute (http://www.odum.unc.edu/) at the University of North Carolina has an online short course on using R available here: http://www.odum.unc.edu/odum/contentSubpage.jsp?nodeid=665. Another online resource for R training is here: https://www.datacamp.com/#/. There are surely many others.

Finally, if you want to see R in action, I would point you to a book I co-authored with Jeffrey Harden titled Monte Carlo Simulation and Resampling Methods for Social Science (Sage 2014). The book is meant to offer an introduction to statistical methods and simulations, and it provides lots of examples with sample code.

7.3.2 Using LaTeX

LaTeX is not statistical software; it is a platform for scientific writing. LaTeX is a document processing environment in the same way that R is a statistical computing environment. LaTeX is not really a point-and-click system, but it is a superior environment for producing publication-quality documents, especially if they include tables, figures, and/or equations. In addition, there are some packages in R that will format the output of R functions with all the codes necessary to make the output look nice in LaTeX. All you need to do is copy and paste the output into your LaTeX document. LaTeX is actually used through a text editor, of which there are many. TeXnicCenter is a popular one for Windows users. TeXstudio and Texmaker work on Windows, Mac, and Unix platforms. Most of these editors include spell check functions, and there are also grammar check tools that can be downloaded and installed. They are all free and come with online help manuals. The Odum Institute at UNC also has an online short course available on using LaTeX located here: http://www.odum.unc.edu/odum/contentSubpage.jsp?nodeid=665.

Using tools like R and LaTeX does not make someone smarter than a person who uses Stata and MS Word. They are just tools. Like any tool, they are well-suited for some tasks, but not others. These two tools are quite useful and flexible, which makes learning something about them a good idea. There are corners in the social sciences, especially among methods experts, where there seems to be a bias in favor of these tools over standard software. Some scholars view the use of programming-based tools as a way to separate and elevate themselves from others. Of course this is silly. I don’t think students should worry too much about this, but it is worth being aware of. I am a big fan of R and LaTeX (I wrote this book using LaTeX), but only because I find them useful.