3.9 Advantages and Disadvantages of My Study Design
3.10.2 Notions of Reliability and Validity in Quantitative Research
In quantitative research, reliability means that “individual scores from an instrument should be nearly the same or stable on repeated administrations of the instrument and that they should be free from sources of measurement error and consistent” (Creswell, 2012, p. 627). Reliable surveys provide consistent measures of important characteristics (Fink, 2012). Closed-response survey questions are generally more reliable than open-ended questions, as the data they provide are uniform and easy to interpret (Fink, 2012). For this reason, I used mostly closed-ended, multiple-choice questions on the survey portion of my study.
Further, I followed these item-writing guidelines to help ensure that the survey was reliable: (1) each question should be singular and meaningful, (2) standard language rules (grammar, spelling) should be used at all times, and (3) biased words, phrases, and jargon should be avoided (Babbie, 2004; Creswell, 2012; Fink, 2012; Fowler, 2014). Practicing good survey design increases the reliability of answers, as there is less room for misinterpretation or mistakes (Fowler, 2014).
There are several forms of reliability evidence. For this study, I utilized internal consistency. Internal consistency examines “how well different items complement each other in their measurement of the same quality or dimension” (Fink, 2012, p. 66). It is measured with a coefficient called Cronbach’s alpha. An alpha of .7 or higher is needed in order to achieve adequate reliability to compare groups. I measured and evaluated internal consistency in my study because it was the form of reliability evidence most appropriate to my aims.
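As a rough illustration of how this coefficient is computed, the sketch below applies the standard formula, alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores), where k is the number of items on a scale. The function name and the sample responses are hypothetical and are not data from this study.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Return Cronbach's alpha for a scale.

    responses: list of rows, one row per respondent, each row holding
    that respondent's scores on the k items of the scale.
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")

    # Transpose rows (respondents) into columns (items).
    items = list(zip(*responses))

    # Sample variance of each item across respondents.
    item_variances = [variance(column) for column in items]

    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])

    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical answers from five respondents to three 4-point items.
sample = [
    [4, 3, 4],
    [2, 2, 1],
    [3, 3, 3],
    [1, 2, 1],
    [4, 4, 3],
]

alpha = cronbach_alpha(sample)
print(f"alpha = {alpha:.3f}, meets .7 threshold: {alpha >= 0.7}")
```

Under this formula, a scale whose items rise and fall together across respondents yields an alpha near 1, while unrelated items push the value toward 0.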
In quantitative research, validity refers to “the degree to which evidence and theory support the interpretations of test scores for proposed uses of tests” (AERA, 2014, p. 11). Validation requires that researchers obtain sufficient evidence to provide a basis for score interpretations. One of the easiest ways to approach validation evidence is to have a rival (or null) hypothesis to challenge the proposed interpretation. A null hypothesis is the opposite of a stated hypothesis (AERA, 2014). If a researcher can find evidence to support her or his null hypothesis, or if a researcher cannot find sufficient evidence to reject it, the researcher cannot validly consider the proposed interpretation (AERA, 2014). For example, one hypothesis in this study is that LGBTQ students do not feel safe on campus. The rival (or null) hypothesis would be that LGBTQ students feel safe on campus. Unfortunately, I was not able to find sufficient evidence to reject my null hypotheses.
Sources of Validity Evidence. The aspects of validity I examined in this study are content validity, internal structure, and convergent and discriminant validity evidence. I sought content-oriented evidence for validity, involving “careful review of the construct and test content domain by a diverse panel of experts” (AERA, 2014, p. 15). This process required an expert in survey methodology, in this case my dissertation committee co-chair, Dr. Robert Johnson, to assess my instrument before I sent it to potential participants. Dr. Johnson reviewed the format of the items, response scales, and the overall instrument. In terms of the content validity of the instrument, I consulted Dr. Emily Greytak from GLSEN and the director of the LGBT Resource Center at SRU. These individuals were asked to review the instrument in order to ensure that the questions aligned with existing literature and issues within the field of LGBT studies. The director of the LGBTSRO suggested that I change “sexuality” to “sexual orientation” throughout the instrument, as well as “perceived or known sexuality/gender identity” to “perceived or actual sexuality/gender identity.” Dr. Greytak suggested that I make the instrument’s language more straightforward, as it was not always clear that I was specifically looking for responses from LGBTQ students. I followed their advice and updated the instrument’s language.
Fowler (2014) suggested that instrument questions be as reliable as possible. Further, he suggested that multiple questions be asked which measure the same subjective state in order to create a scale that increases the validity of the data gained. Researchers create scales so that similar answers to similar questions can serve as reliability evidence. I followed all of the guidelines for writing good questions mentioned above to increase the reliability of my instrument. I used multiple questions to measure the same domain, creating domain scores as well as a total scale score. For example, multiple questions were asked concerning school safety. The following are three of these questions:
(1) How safe do you feel when walking alone on campus?
(2) How safe do you feel when inside of your residence hall?
(3) How safe do you feel when inside campus bathrooms?
These questions, along with others that measure safety, were a part of a “safety” scale. I used a Likert scale to capture responses that ranged from one to four, with one meaning “very unsafe” and four meaning “very safe.” I used the individual questions relating to safety to create a composite variable. After creating this composite variable, I had a scale that included all safety scores. I followed similar procedures for the other domains represented on the survey.
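The scoring procedure described above can be sketched as follows. This is only an illustration under stated assumptions: the text does not say whether item responses were averaged or summed, so averaging is assumed here, and the item keys, the second domain name, and the sample answers are hypothetical placeholders rather than actual survey content.

```python
from statistics import fmean

# Hypothetical item keys grouped by domain. Only the "safety" domain is
# named in the text; "other_domain" and all item keys are placeholders.
DOMAINS = {
    "safety": ["safe_walking_alone", "safe_residence_hall", "safe_bathrooms"],
    "other_domain": ["other_item_1", "other_item_2"],
}


def composite_score(scores, low=1, high=4):
    """Average a respondent's item scores (assumption: mean, not sum)."""
    for score in scores:
        if not low <= score <= high:
            raise ValueError(f"score {score} is outside the {low}-{high} scale")
    return fmean(scores)


def score_respondent(answers):
    """Return one composite score per domain plus a total scale score."""
    result = {
        domain: composite_score([answers[item] for item in items])
        for domain, items in DOMAINS.items()
    }
    all_scores = [answers[item] for items in DOMAINS.values() for item in items]
    result["total"] = composite_score(all_scores)
    return result


# Hypothetical respondent: 1 = "very unsafe", 4 = "very safe".
respondent = {
    "safe_walking_alone": 2,
    "safe_residence_hall": 4,
    "safe_bathrooms": 3,
    "other_item_1": 1,
    "other_item_2": 2,
}

scores = score_respondent(respondent)
print(scores)
```

Averaging keeps each composite on the original one-to-four scale, which makes domain scores built from different numbers of items easier to compare; summing would be an equally plausible choice.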
Finally, I am interested in pursuing consequential validity, though I will not be able to do so at this time. Consequential validity refers to the benefit or the detriment from the use of an instrument (Messick, 1988). In a future study, I would like to examine the positive and negative social consequences that arise from this study’s completion.
3.10.3 Combining and Mixing Methods. Employing multiple sources for data