sample selection - Merrill_unc_0153D

strategy. After the search was complete and studies were gathered, I refined the resulting dataset by removing duplicate studies. Last, to critically appraise the sample, I applied an adapted form of Gough and colleagues’ (2012) Weight of Evidence (WOE) framework. In applying the WOE framework, I omitted inappropriate studies through application of inclusion criteria and ensured relevance of remaining studies through an author-generated relevance rubric.

Figure 1. Process for narrative synthesis of researcher concepts of teacher working conditions. Phase One: Sample Selection (search strategy; critical appraisal: appropriateness, relevance). Phase Two: Element Extraction (concept elements coded: theory, definitions, concept decomposition). Phase Three: Data Synthesis (line-by-line coding; translation and thematic coding; construct definition building; definition synthesis; construct definition configuration).

Search strategy. I searched for articles that report on studies of TWCs in peer-reviewed journals and reports. I chose to search only peer-reviewed journals to lend legitimacy to the selection process. I used two sources for obtaining academic articles on the topic of TWCs: online databases and reference list searches of articles. I searched the following online databases: Education Resources Information Center (ERIC), PsycINFO, ProQuest, Education Full Text, and Academic Search Premier. All database article abstracts and titles were searched using the following search term: “teach* AND work* AND condition*.” Next, I used database thesauruses to match keyword search terms to databases. Keywords are words or phrases that indicate the topics of articles. Articles are tagged with keywords by a computer algorithm or human reviewers, depending on the search engine. Through keyword indexing, I was able to find articles on TWCs that I may not have found through the search term because the authors use different phrasing. Each search engine indexes articles under different keywords, so I conducted keyword searches specific to each search engine.
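To illustrate what the truncated search term is meant to capture, the following minimal sketch approximates it with a regular expression. It assumes that each asterisk matches any word ending and that AND requires all three stems to appear somewhere in the searched text; the databases' own query engines may treat truncation differently, and the function name and sample titles are hypothetical.

```python
import re

# Hypothetical approximation of the search term
# "teach* AND work* AND condition*": every stem must begin
# at least one word in the searched title or abstract.
STEMS = ("teach", "work", "condition")


def matches_search_term(text: str) -> bool:
    """Return True when each stem starts some word in the text (case-insensitive)."""
    return all(
        re.search(rf"\b{stem}\w*", text, re.IGNORECASE) is not None
        for stem in STEMS
    )


# Illustrative titles, not records from the actual sample.
print(matches_search_term("Teachers' working conditions in urban schools"))
print(matches_search_term("Principal workload and school climate"))
```

The first title satisfies all three stems (Teachers, working, conditions), while the second lacks any word beginning with "teach" and is not returned.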

The processes of compiling, screening, and deleting are documented in a work flow map (Gough et al., 2012) in Figure 2. The work flow map tracks the number of articles imported from the initial search, the number of duplicates deleted, the number of articles omitted due to each of the inclusion criteria, documentation of reasons for omitting any other articles, and the number of articles included in the final sample for the systematic review. In developing the work flow map, I referenced recent systematic reviews published in the last four issues of the Review of Educational Research (Adesope, Trevisan, & Sundararajan, 2017; Peltier & Vannest, 2017; Sabey, Charlton, Pyle, Lignugaris-Kraft, & Ross, 2017; Singer & Alexander, 2017; Surr et al., 2017). The results from term and keyword searches, totaling 7326 records, were imported into software specifically developed for systematic review, EPPI-Reviewer 4 (Gough et al., 2012). The software has the functionality to import references in a variety of formats; label the date the citation is returned and the search engine each citation is returned from; and detect duplicates.

EPPI-Reviewer 4 includes a sensitive duplicate detection algorithm that matches exact words, case (upper or lower), punctuation (e.g. use of hyphenation or comma), and spacing of the author, title, journal name, page number, and date of publication. The program generates groups

Figure 2. Workflow map of teacher working conditions systematic document sample selection

the level of similarity between the master article and each other article. Because these articles are grouped as likely duplicates, ratings in my review did not drop below 0.78. A rating of 1.00 indicates an exact match in every search field, with few exceptions, and a 100% chance that the article is a duplicate of the master (EPPI Reviewer 4 User Manual, 2015). Search fields include author(s), title, year of publication, journal name, and page numbers.

Though the user manual states what a score of 1.00 indicates, it does not explain how EPPI-Reviewer’s algorithm scores articles. Therefore, I hand-coded approximately 200 articles to gauge the sensitivity of the algorithm and found that a rating between 0.99 and 0.90 indicates small differences in punctuation, case, or formatting. Ratings between 0.90 and 0.85 indicate a major difference in one field but major agreement in all other fields. Often, the article title in one record was typed in all caps, while the title in the second record capitalized only the first letter of the first word. Articles with ratings below 0.84 had a major difference in two fields, the major differences again being those of capitalization or abbreviated versus full journal names.

When I ran the algorithm on the sample of articles imported into the software, the software detected 1598 likely duplicates and assigned duplicate likeliness ratings to these articles. I auto-assigned all articles with a rating of 1.00 as duplicates. During hand-coding, I noticed that one search engine’s results included email addresses in the author entry and all other search engine results did not. This difference in author entry caused the duplicate ratings of the articles from this search engine to drop to around 0.85. I searched for ratings between 0.85 and 0.87 and found that all ratings in this range were from the same search engine’s results. Given the sensitivity of the search algorithm and my discovery of the reason for lowered ratings on this group of articles, I auto-assigned articles with ratings of 0.85 and higher as duplicates. Then, I hand-coded the remaining 527 likely-duplicate records, with ratings ranging between 0.84 and 0.78. Of these low-rated records, only two non-duplicates were found. One was an errata report of the master record and the other was a review article of the master record. The precision of the algorithm, evidenced through the initial hand-coding and hand-coding of the lowest-rated records, supports the decision to mark records with ratings of 0.85 and higher as duplicate records.
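The triage rule described above can be sketched in a few lines. This is an illustration of the decision rule only, not of EPPI-Reviewer 4's internal scoring, which the user manual does not document; the function name, the category labels, and the sample ratings are hypothetical, while the 0.85 cutoff comes from the text.

```python
# Cutoff reported in the text: ratings of 0.85 and higher were
# auto-assigned as duplicates; lower ratings were hand-coded.
AUTO_ASSIGN_THRESHOLD = 0.85


def triage(rating: float) -> str:
    """Route a likely-duplicate record by its duplicate likeliness rating."""
    if rating >= AUTO_ASSIGN_THRESHOLD:
        return "auto_duplicate"
    return "hand_code"


# Illustrative ratings, not values taken from the actual sample.
for rating in (1.00, 0.93, 0.85, 0.84, 0.78):
    print(f"{rating:.2f} -> {triage(rating)}")
```

Keeping the cutoff in a single named constant makes the rule easy to audit against the hand-coding evidence that justified it.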

After duplicate articles were deleted, I hand-screened the titles, keywords, and abstracts of the remaining sample to detect articles that meet the inclusion criteria described below, ensuring the appropriateness of the remaining articles. During screening of record titles and abstracts, I identified 59 additional duplicates not recognized by the software’s algorithm. These records often contained differences in author name formatting, such as including the entire first name rather than author initials, or a difference in the ordering of authors. These differences made the records “too different” for the algorithm to detect as likely duplicates. The manual detection of additional duplicates not found by the software and the very low number of non-duplicates (two) found during hand-coding suggest that the software’s algorithm is conservative in its identification of duplicates, further supporting my decision to auto-assign articles with a duplicate likeliness rating of 0.85 and higher as duplicates. In total, 1650 duplicates were removed from the sample through auto-assignment, hand-coding likely duplicates, and manual detection during the screening of titles, keywords, and abstracts.

Critical appraisal. I followed an adapted form of Gough and colleagues’ (2012) WOE framework, which has three steps, to determine: 1) soundness of studies; 2) appropriateness of studies; and 3) relevance of the study focus to the review. The first step outlined by Gough and colleagues (2012), soundness of studies, is a review of the sampled studies’ research methods. The current synthesis does not rely on study findings; thus, I did not critically appraise study methods because they have no bearing on the outcome of the synthesis. I adapted the WOE framework by applying only Steps Two and Three: appropriateness and relevance criteria.

Appropriateness of studies (inclusion criteria). The WOE framework’s section on appropriateness of studies aligns with sections often labeled as “inclusion criteria.” My search had six inclusion criteria, which were applied through a review of the title and abstract. If ambiguity remained after reviewing these two sources, I left the record in the sample for further review. First, I only included studies of working or school conditions that focus on the working conditions of teachers, as opposed to principals or students. A total of 440 non-teacher-related articles were removed. I was guided in applying the TWCs criterion by referencing the language about TWCs used by the authors of the study. Specifically, I searched for articles that self-describe the purpose of the study as investigating “working conditions,” “workplace conditions,” or other variations of the topic. Through application of this criterion, I removed 3368 articles with purposes that do not include the investigation of TWCs.

Second, only articles published in English and studying K-12 public schools in the United States remained in the sample. Two, 662, 112, and 809 records were removed due to criteria related to language of publication (English), the level of schooling (K-12), the funding of schooling (public), and the country in which the study took place (United States), respectively.

Third, the final sample included only studies published in or after 2002. This is the year that the landmark education legislation No Child Left Behind (NCLB) was enacted, representing a marked change in schools, due to the introduction of high stakes testing and sanctions for low-performing schools. Research on TWCs before NCLB does not represent the same context teachers experienced after NCLB. However, literature on TWCs has developed since 2002, permitting a contemporary review within the timeframe since the enactment of NCLB. I employed search engine filters to omit pre-2002 articles, so I do not know the exact number omitted.

Fourth, following quality assurance criteria applied by researchers conducting reviews of the literature in current academic journals (see Gast, Schildkamp, & Van der Veen, 2017; Muenks & Miele, 2017; Singer & Alexander, 2017; Gierl, Bulut, Guo, & Zhang, 2017; Welsh, 2017), I kept in the sample only academic articles published in peer-reviewed journals. Though search engine filters for peer-reviewed journals were employed, greatly limiting the amount of non-scholarly material, 64 articles that are opinion and editorial pieces were removed, as were two historical articles. Fifth, 69 studies of virtual public schools were omitted from the sample, as the working conditions in these schools are materially different from those in physical schools (Muirhead, 2000).

Last, the research question of this synthesis inquires after the conceptions of researchers about TWCs. Inductively derived concepts of TWCs are more representative of participant conceptions of TWCs, which are interesting in their own right but are not within the scope of this synthesis. Therefore, only articles that contain a pre-conceived (a priori) articulation of TWCs were included in the sample; applying this criterion, I removed 52 articles from the sample. Resulting from the article search and exclusion process, a total of 87 full-text articles remained in the sample for relevance analysis.

Relevance of studies. At this point, I explain the nomenclature I use as I move to different areas of analysis, software analysis platforms, and levels of analysis. I apply these terms throughout this synthesis for continuity and clarity. While working with the Relevance Rubric, which I describe below, I refer to the major parts of the rubric as “elements” and smaller parts of each element as “sub-elements.” While using NVivo software, I “coded” material tagged from the Relevance Rubric. In this stage, the data were in larger “chunks” of the major elements of the coding schema: Theory, TWCs Definitions, and TWCs Concept Decomposition, meaning how researchers break down the TWCs concept to study it. After initial coding, the coded “chunks” of the elements and sub-elements were transferred to spreadsheets or datasheets (I use these terms interchangeably). Theory and TWCs definitions codes were relatively small in number when compared to the TWCs Concept Decomposition data. Therefore, these two elements’ terminology needs no further explanation.

The TWCs Concept Decomposition data consist of the labels, sub-labels, and measures of TWCs labels or sub-labels extracted from articles. For clarity concerning what is being analyzed and how I moved through the analysis, I use one set of words for how article authors organize their concepts of TWCs and another set for how I synthesized those concepts. I reserve the word “label” for words from sample articles describing the broadest elements of TWCs. Thus, articles include labels, sub-labels, and measures of TWCs. I iteratively organized these labels, sub-labels, and measures. In the typology resulting from the analysis and synthesis of concept decomposition data, categories are the broadest concepts, followed by components, and the narrowest concepts are sub-components. As an example of the usage of these terms, Burkhauser (2017) includes the label “Human capital,” which the author breaks into five sub-labels (Number of Additional Teacher Tasks, Total Teacher Hours on Non-Instructional Tasks, Formal Teacher Written Feedback, Formal Teacher Written Feedback, and Formal Teacher Written Feedback). Each sub-label is then measured by a different survey item. This nomenclature helps to prevent confusion concerning the origin of a label (sample article) or category (assigned during this synthesis).

After the application of inclusion criteria to ensure appropriateness of the studies in the sample, I analyzed the relevance of each study against the review’s research question. Relevance criteria are applied to sample articles to ascertain whether they are “fit for the purpose” of this narrative synthesis (Petticrew & Roberts, 2006, p. 131). This review’s question inquires after how researchers conceptualize TWCs, including the theory underpinning the study, as well as how TWCs are defined, broken into labels or sub-labels, and measured.

In an effort to systematically gauge the relevance of articles to this review’s question, I developed a rubric for rating sample articles. The following sections describe the rubric, including rationale for weights allocated to each element, provide a template of the rubric in Table 1, and explain how the rubric rating was applied to omit irrelevant articles and limit the number of poorly contributing articles to the current synthesis.

Relevance rubric description and weighting rationale. The criteria that make up the rubric mirror the coding schema of the three TWCs concept elements I code during the data extraction phase. These elements are 1) theory underpinning TWCs; 2) definitions of TWCs overall; and 3) TWCs concept decomposition, specifically the labels, sub-labels, and measures of TWCs used in each study. Each element that an article provides increased the relevance of that article to this review’s research question. The rubric criteria were differentially weighted relative to the contribution of each element to the development of the TWCs construct definition. The rubric allocates a maximum of 10 points for the inclusion of seven sub-elements of TWCs conceptualizations. Partial points for elements were not considered, since an article either does or does not have an element. For each element, I provide a definition and describe how that element is divided into sub-elements to aid in arriving at rubric ratings. Then, I explicate my rationale for the weight allocated to that element in the rubric.

Table 1

Relevance Rubric

Elements of TWCs Conceptualization   Sub-Elements                                 Value   Points   Supporting Text
Theory underpinning TWCs             Name/Description of theory                   0.5
                                     Application of theory to TWCs                0.5
Definition of TWCs                   Narrative definition of TWCs                 2.5
                                     Origin of TWCs definition                    0.5
TWCs Concept Decomposition           Labels of TWCs                               2
                                     Narrative definitions of TWCs labels         2
                                     Measures of TWCs labels or proxy variables   2
Total:                                                                            10      0
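As an illustration of how the weights in Table 1 combine into a rating, the sketch below tallies the points for an article given the sub-elements it contains. The weights are those listed in the rubric; the dictionary keys, the function name, and the example article are hypothetical and introduced only for this sketch.

```python
from typing import Iterable

# Weights from Table 1. The key names are hypothetical identifiers
# for the seven sub-elements; the point values come from the rubric.
RUBRIC_WEIGHTS = {
    "theory_name": 0.5,            # Name/description of theory
    "theory_application": 0.5,     # Application of theory to TWCs
    "definition_narrative": 2.5,   # Narrative definition of TWCs
    "definition_origin": 0.5,      # Origin of TWCs definition
    "labels": 2.0,                 # Labels of TWCs
    "label_definitions": 2.0,      # Narrative definitions of TWCs labels
    "measures": 2.0,               # Measures of TWCs labels or proxy variables
}


def relevance_score(present: Iterable[str]) -> float:
    """Sum the full weight of each sub-element present; no partial credit."""
    found = set(present)
    unknown = found - set(RUBRIC_WEIGHTS)
    if unknown:
        raise ValueError(f"Unknown sub-elements: {sorted(unknown)}")
    return sum(RUBRIC_WEIGHTS[name] for name in found)


# Hypothetical article that names and applies a theory, lists labels,
# and reports measures, but offers no overall definition of TWCs.
example = ["theory_name", "theory_application", "labels", "measures"]
print(relevance_score(example))
```

Because each sub-element is either present or absent, the score is a simple sum of fixed weights, and an article containing all seven sub-elements reaches the rubric maximum of 10 points.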

i. Theory underpinning TWCs. This element captures any theory that authors cite in sample studies that pertains to TWCs. The “theory” element is broken into two sub-elements in the rubric: 1) name and description of theory and 2) application of theory to TWCs. Articles that name or describe the theory that underpins their study were given the points for this sub-element. Concerning the second sub-element, articles that explain how the theory applies to TWCs were given the point value for the second sub-element. For example, Ingersoll (2002) cites organization theory and explains the role of TWCs in the school context through the lens of organization theory. Therefore, this article received credit for including both sub-elements in its conceptualization of TWCs.

The “theory and explanations of theory” element provides valuable background and context but does not contribute directly to construct definition development, as it is not expected to provide sub-components of TWCs. Thus, both sub-elements of the “theory” element are allocated 0.5 points, with the maximum possible value for this element totaling one point.

ii. Definition of TWCs. This element is sub-divided into two sub-elements: 1) narrative definition of TWCs and 2) origin of definition of TWCs. The first sub-element captures a sample author's narrative definition of TWCs as an overall concept, not of individual sub-components of TWCs. For example, Bascia and Rottmann (2011) received the points for this sub-element for including this passage: “By ‘teaching conditions,’ we refer to factors that repeatedly have been identified by teachers as critical to the quality of their work” (p. 789). An article was not credited with including this sub-element if only individual TWCs labels are defined without defining TWCs as a group.

The second sub-element captures a sample author’s explanation of the origin of that definition. For example, this sub-element would be present in an article if the author explains that the definition of TWCs produced was derived from the combined work of Johnson (1990) and Goodlad (1984). This element provides a sense of how researchers arrive at their concepts of TWCs.

An article author’s narrative definition of TWCs contributes directly to the development of the TWCs construct definition and provides detail through the narrative nature of the definition. Therefore, I allocated this sub-element the highest relative weight possible on the rubric, 2.5 points. The “origin of TWCs definition” sub-element contributes to the TWCs construct definition in a fashion similar to the “theory underpinning TWCs” element, in that it provides valuable background and context but does not contribute directly to construct definition development. Thus, it is allotted the same point value as the theory sub-elements in the rubric, 0.5 points.

iii. TWCs Concept Decomposition. I divided this element into three sub-elements: labels of TWCs; narrative definitions of TWCs labels; and measures of TWCs labels or proxy variables. Each of these sub-elements is critical to the development of the TWCs construct definition and is weighted with two points. For example, an article received credit for inclusion of the “labels of TWCs” sub-element if the article author lists “facilities, resources, professional development, time usage, and leadership” as their conceptualization of TWCs. Alone, this sub-element offers valuable information on how sample authors compartmentalize TWCs, which contributes directly to the development of the TWCs construct definition. Articles did not receive additional points for breaking down labels into sub-labels or measures.

The sub-element “narrative definition of TWCs sub-components” is the description of TWCs labels and sub-labels provided by sample authors. If the same example author above goes on to define “professional development” as, for example, “the available learning opportunities
