Session 4: Descriptive statistics and exporting Stata results

Academic year: 2021

In this session we work with descriptive statistics in Stata. We first give a short introduction to the basic statistical concepts of the session and then explain how to obtain them in Stata.

1. Short introduction to descriptive statistics

Descriptive statistics are used to describe the contents and properties of a given variable. With a single number, or a small set of numbers, we can easily see how a variable is distributed in our sample or population of interest.

Average

It is the best-known descriptive statistic, equal to the sum of the values of all cases divided by the number of cases:

$$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n}$$

Weighted average

Every observation is weighted by a given value that represents the importance of its contribution to the final average. It is calculated like the average, but multiplying each observation by its weight and dividing by the overall sum of the weights:

$$\bar{X} = \frac{\sum_{i=1}^{k} w_i x_i}{\sum_{i=1}^{k} w_i}$$

Median

It is the central value of a variable: it has as many cases below it as above it. More formally, it is the value of the distribution such that half of the values are lower than or equal to it and the other half are higher than or equal to it. If the number of cases is even, the median equals the average of the two central values.
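The three measures of central tendency defined so far can be compared on a tiny example. The sketch below is only illustrative: the values of x and the weights w are made up, and it relies on summarize with analytic weights (aweight), which divides the weighted sum by the sum of the weights.

```stata
* Hypothetical data: four cases, the last one is an extreme value
clear
input x w
1 1
2 1
3 1
10 3
end

* Average: (1 + 2 + 3 + 10) / 4
quietly summarize x
display "average          = " r(mean)

* Weighted average: (1*1 + 2*1 + 3*1 + 10*3) / (1 + 1 + 1 + 3)
quietly summarize x [aweight = w]
display "weighted average = " r(mean)

* Median: even number of cases, so the average of the two central values (2 and 3)
quietly summarize x, detail
display "median           = " r(p50)
```

The extreme value pulls the average up to 4 and the weighted average up to 6, while the median stays at 2.5.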

Mode

It is the value that occurs most frequently in the variable.

Quartiles and percentiles

Quartiles are an extension of the median: they are the values that have 25%, 50% and 75% of the cases below them, respectively. Percentiles are, in turn, a generalization of the same idea: percentile p has p% of the values below it and (100-p)% above it.

Variance

The variance expresses how spread out a distribution is. It equals the mean of the squared deviations of the variable from its mean:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n}$$

Standard deviation

The standard deviation is the square root of the variance:

$$s = \sqrt{s^2}$$

The standard deviation is important because it has some useful properties, and it is the most widely used dispersion statistic. As a reference point we can use what we know about the normal distribution: about 95% of the cases lie within approximately +/- 2 standard deviations of the mean, and about 99.73% within +/- 3 standard deviations.
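One practical detail when checking these formulas in Stata: summarize reports the variance and standard deviation computed with n - 1 in the denominator, not n as in the definition above. The following sketch, on made-up data, shows how one might convert between the two; the scalar name var_n is only illustrative.

```stata
* Hypothetical data: mean is 5 and the squared deviations add up to 32
clear
input x
2
4
4
4
5
5
7
9
end

quietly summarize x

* Rescale Stata's n - 1 variance to the n-denominator definition used above
scalar var_n = r(Var) * (r(N) - 1) / r(N)

display "variance with n - 1 (as reported by Stata): " r(Var)
display "variance with n (definition above):         " var_n
display "standard deviation with n:                  " sqrt(var_n)
```

With these eight values the n-denominator variance is 32/8 = 4 and the standard deviation is 2, while Stata's own figure is 32/7. The difference shrinks quickly as the number of cases grows.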

Range

The range of a variable equals the difference between the largest and smallest values, and expresses its amplitude.

Interquartile range

The range can be affected by extreme values and therefore misrepresent the amplitude. Instead we can use the interquartile range, which equals the difference between the third and first quartiles. Half of the cases lie within the interquartile range.

$$R = Q_3 - Q_1$$
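As a rough illustration of the difference between the two measures, the sketch below computes both for the variable price in auto, an example dataset shipped with Stata (it is not part of this session's material). It uses the results that summarize, detail leaves in r().

```stata
* Example dataset shipped with Stata, used here only for illustration
sysuse auto, clear

quietly summarize price, detail

* Range: largest minus smallest value
display "range               = " r(max) - r(min)

* Interquartile range: third quartile minus first quartile
display "interquartile range = " r(p75) - r(p25)

* tabstat can report both statistics directly
tabstat price, statistics(range iqr)
```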

Skewness

It measures the asymmetry of the distribution. It takes the normal distribution as a reference point because it is perfectly symmetrical: a normally distributed variable has a skewness of 0. Otherwise the skewness can be:

1. Positive: a longer tail to the right, more observations on the left and therefore few high values. Also called right-skewed.

2. Negative: a longer tail to the left, more observations on the right and few low values. Also called left-skewed.
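A quick way to see positive skewness in practice is to compare the mean with the median: a few high values pull the mean above the median. The sketch below does this for price in auto, an example dataset shipped with Stata and used here only as an assumption-free stand-in for real data.

```stata
* Example dataset shipped with Stata, used here only for illustration
sysuse auto, clear

quietly summarize price, detail

* In a right-skewed variable the mean lies above the median
display "mean     = " r(mean)
display "median   = " r(p50)
display "skewness = " r(skewness)
```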

Descriptive statistics in Stata

Stata can present all this information with the command summarize:

• Summarize

The command summarize variable1 variable2 (etc.) reports the number of valid observations, the mean, the standard deviation, and the minimum and maximum values of the variables. If we want additional information, we can use the option detail:

o Detail: typing summarize variable1 variable2, detail makes Stata display the mean, standard deviation, minimum and maximum, percentiles, variance and skewness.
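A minimal sketch of both forms of the command, using the auto example dataset shipped with Stata in place of variable1 and variable2:

```stata
* Example dataset shipped with Stata, used here only for illustration
sysuse auto, clear

* Basic output: observations, mean, standard deviation, minimum, maximum
summarize price mpg

* Extended output: adds percentiles, variance and skewness, among others
summarize price, detail

* List the results that the last summarize left in r() for later use
return list
```

The stored results, such as r(mean) or r(p50), can be reused in later commands, which is often more reliable than retyping numbers from the results window.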

• Descriptive statistics tables

The summarize command is useful for summarizing the whole sample. Although we can combine it with the if qualifier and the by prefix to get descriptive statistics for sub-samples, it is not the most appropriate command for that. Stata has several useful commands for building tables of descriptive statistics by groups:

o Tabulate, summarize: tabulate groupvariable, summarize(variable1) shows a table of the groups defined by groupvariable, with the mean, standard deviation and frequency of variable1 for each group.

o Tabstat is a more powerful command, since the table can include a wider choice of descriptive statistics for more than one variable:

tabstat variable1 variable2, stats(mean med sd min max) by(groupvariable) format(%9.2f)
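The alternatives mentioned above can be compared side by side. This sketch uses the auto example dataset shipped with Stata, with foreign playing the role of groupvariable; the choice of variables is only illustrative.

```stata
* Example dataset shipped with Stata, used here only for illustration
sysuse auto, clear

* summarize on sub-samples: by prefix (data must be sorted, hence bysort)
bysort foreign: summarize price

* summarize on one sub-sample: if qualifier
summarize price if foreign == 1

* One compact table: mean, standard deviation and frequency per group
tabulate foreign, summarize(price)

* Several statistics for several variables, by group, with two decimals
tabstat price mpg, stats(mean median sd min max) by(foreign) format(%9.2f)
```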

Exporting Stata results

Stata shows results in the main window, but we often want to export them to a spreadsheet or a Word document. This requires some additional work.

• Log files

The Stata results window does not keep the whole session, only the most recent part. If we want to store the whole output we should use a log file. We can open and name it through an icon in the main window, but the same can be done with commands:

o Open a log file: log using file.log opens a log file with the specified name, which will store all our activity. We can choose the format: .log (plain text) or .smcl (formatted). If we want to work with an existing file, we can either overwrite it (option , replace) or add the new results at the end of the file (option , append).

o Close the log file: log close closes the log file.

o Suspend the log file: sometimes we might want to suspend the storing of results and then resume it. The commands log off and log on do the trick.

o Check the status of the log file: we might easily forget whether a log file is open or not. In that case we can just type log in the command line and Stata will tell us.
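Put together, a typical logging workflow might look like the sketch below. The file name session4.log and the commands run in between are only placeholders; capture log close is added as a common precaution so that the do-file does not stop if a log is already open.

```stata
* Close any log left open by an earlier run (no error if none is open)
capture log close

* Open a plain-text log, overwriting an existing file with the same name
log using session4.log, replace

sysuse auto, clear
summarize price mpg

* Suspend logging: the next table is shown on screen but not stored
log off
tabulate foreign

* Resume logging
log on
tabstat price, stats(mean sd) by(foreign)

* Report whether a log is open and where it is being written
log

* Close the log file
log close
```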

• Copy results

Whether or not we use a log file, to export our results to Word or Excel we will commonly use the copy and paste functions. In Stata we can copy the relevant results by highlighting them, right-clicking on them and choosing one of the following options:

o Copy: copies the selection as text. It can be pasted into a word processor, but if we want to preserve the alignment of the tables we have to use the Courier or Courier New fonts and choose a small font size (10, 9 or 8, depending on the table).

o Copy table: this is the most useful option; it copies the selection as a table. If the table fits in the document, it will appear aligned by tabs, so we can easily convert it into a Word table. However, this option works best with Excel as an intermediate step. We have to export one table at a time and, if possible, select the minimum number of elements.

o Copy table as HTML: can be useful in some contexts.

o Copy image: copies the table as an image to the clipboard. Only useful if, for whatever reason, we wish to keep exactly the same appearance as in Stata.

• Advanced commands

In this introductory course we are not going to deal with these commands in detail, but it is useful to know that several commands can produce publication-quality tables directly from Stata, ready to be used in our papers. These commands can save us a lot of time.

o Tabout is the most complete command, a full table-creation program. It takes some effort to learn, but it pays off. We can install it with the command ssc install tabout and find a tutorial at www.ianwatson.com.au/stata/tabout_tutorial.pdf.

o Esttab: for more advanced analysis, mainly regression models, the command esttab is useful because it easily creates .rtf documents with the tables we need.
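To give an idea of what this looks like, here is a minimal esttab sketch. It assumes that the estout package, which provides esttab and eststo, has already been installed with ssc install estout; the regression models and the file name regression_table.rtf are only illustrative.

```stata
* Assumes the user-written estout package is installed: ssc install estout
sysuse auto, clear

* Clear previously stored estimates, then store two regression models
eststo clear
eststo: regress price mpg weight
eststo: regress price mpg weight foreign

* Write both models side by side to an .rtf file, reporting standard errors
esttab using regression_table.rtf, replace se
```

The resulting file can be opened directly in a word processor, which avoids the copy and paste step altogether.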
