Long Island University
Digital Commons @ LIU
Selected Full Text Dissertations, 2011- LIU Post
2019
A Bibliometric Study on Learning Analytics
Bertha Adeniji
A Bibliometric Study on Learning Analytics
A Dissertation
Submitted to the Faculty
of
The Palmer School of Library and Information Science
at
Long Island University
by
Bertha (Abby) Adeniji
Advisor: Professor Qiping Zhang
Table of Contents
List of Figures
List of Tables
Dedication
Acknowledgment
Abstract
Chapter 1: Introduction
1.1 Theoretical Positioning of Learning Analytics (LA)
1.2 Motivation and Objective
1.3 Significance
Chapter 2: Literature Review
2.1 Bibliometric Analysis
2.1.1 Bibliometric Laws
2.1.2 Citation Analysis
2.1.3 Co-occurrence Analysis (co-word, co-citation)
2.2 Keyword Analysis
2.2.1 Multi-Dimensional Scaling (MDS)
2.2.2 Bibliometric Mapping
2.3 Domain Analysis
2.3.1 Definition of Domain Analysis
2.3.2 Techniques of Domain Analysis
2.3.3 Application of Domain Analysis to Different Disciplines
2.4 Conceptual Framework of Learning Analytics
2.4.1 Data Collection Techniques of Learning Analytics
2.4.2 Data Analysis of Learning Analytics
2.4.3 Data Reporting of Learning Analytics
2.5 Current Research on Learning Analytics
2.5.1 Three Periods of Research on Learning Analytics
2.5.2 Research Factors in Learning Analytics
2.5.3 Stakeholders of Research on Learning Analytics
Chapter 3: Research Questions
3.1 Statement of Research Problem
3.2 Research Questions
3.3 Concepts and Definitions
Chapter 4: Methods
4.1 Choice of Databases
4.2 Search Query
4.3 Dataset
4.4 Data Collection Procedures
4.5 Domain Analysis
Chapter 5: Results
5.1 Results for RQ 1: What are the bibliometric features of research on LA?
5.1.1 Publication Counts Over Time
5.1.2 Key Publication Types on Learning Analytics
5.1.3 Key Publication Sources on LA
5.1.4 Key Authors and Affiliations by Three Periods
5.1.5 Key Subject Area by Three Periods
5.2 Results for RQ 2: What are the evolutions of research themes on LA?
5.2.1 Results of Author Keyword Trend
5.2.2 Results of Keyword Trend-Period 1 (2004-2011)
5.2.3 Results of Keyword Trend-Period 2 (2012-2013)
5.2.4 Results of Keyword Trend-Period 3 (2014-2018)
5.3 Results for RQ 3: What are the domain features of research on LA?
5.3.1 Taxonomy of Learning Analytics Research
5.3.2 Results of Co-citation Analysis
5.3.3 Results of Co-author Analysis
5.3.4 Results of Coupling Analysis
Chapter 6: Discussion
6.1 Bibliometric Profile and Trends of Learning Analytics
6.2 Domain Analysis
6.3 Implications of This Study
6.4 Limitations
6.5 Recommendations for Future Research
6.6 Conclusions
Bibliography
Appendix A: Comparison of Five Academic Databases
Appendix B: Top 100 Author Keywords by Alphabetical Order
Appendix C: Top 100 Author Keywords by Frequency
Appendix D: Frequency of Author Keywords by Three Periods
List of Figures
Figure 1: Triadic - Epistemology - Pedagogy - Assessment (Knight, 2014)
Figure 2: Traditional Hierarchy of Evidence (Grant, 2016)
Figure 3: Arango and Prieto-Diaz Model of Domain Analysis
Figure 4: Mapping LA and AA in Big Data Context (Prinsloo, 2015)
Figure 5: Inclusion and Exclusion Process for LA Publications (2011-2018)
Figure 6: Learning Analytics Publications by Year and Period
Figure 7: Learning Analytics Affiliation by Country - ALL
Figure 8: Learning Analytics Affiliation by Institution - ALL
Figure 9: Learning Analytics Key Subject Areas – Period 1
Figure 10: Learning Analytics Key Subject Areas – Period 2
Figure 11: Learning Analytics Key Subject Areas – Period 3
Figure 12: Cluster Map of Author Keywords (VOSviewer)
Figure 13: Learning Analytics - 100 Keywords (2004-2018)
Figure 14: Treemap – Learning Analytics 100 Keywords (2004-2011) – Period 1
Figure 15: Treemap – Learning Analytics 100 Keywords (2012-2013) – Period 2
Figure 16: Treemap – Learning Analytics 100 Keywords (2014-2018) – Period 3
Figure 17: Learning Analytics Research Primary Knowledge Structures
Figure 18: Cluster Map of Co-Word Analysis (VOSviewer)
Figure 19: Cluster Map of Co-Author Citation Analysis (VOSviewer)
List of Tables
Table 1: Comparison of Learning Analytics and Academic Analytics (Siemens 2011)
Table 2: Three Periods of Research on Learning Analytics
Table 4: Three Phases of Bibliometric Data Collection
Table 5: Two Phases for Co-Citation Data Collection
Table 6: Four Phases for Keyword Extraction
Table 7: Four Phases for Domain Analysis
Table 8: Publication Counts of Learning Analytics Over Time
Table 9: LA Publications by Document Type – ALL
Table 10: LA Publications by Document Type – Period 1
Table 11: LA Publications by Document Type – Period 2
Table 12: LA Publications by Document Type – Period 3
Table 13: Leading Learning Analytics Journal (2011-18)
Table 14: Top Conferences for Learning Analytics
Table 15: Top Authors and Most Cited Documents
Table 16: Learning Analytics Leading Author Profile
Table 17: Top 10 Authors and Affiliations by Institution and Country – Period 1
Table 18: Key Authors and Affiliations by Institution and Country – Period 2
Table 19: Key Authors and Affiliations by Institution and Country – Period 3
Table 20: Author Keywords by Frequency – 16+ - ALL
Table 21: Comparison of Top 50 Author Keyword (by Frequency Ranking) for 3 Periods
Table 22: Results of Dendrogram Analysis Themes of Scholarly Communications LA
Table 23: Emergent/New Author Keywords
Table 24: Dimension MDS Stress Values - Scopus
Table 25: Keyword Ranked by Occurrence
Table 26: Co-Author Citation Count
Table 27: Top 20 Institutions Co-cited
Table 28: Citation Count of Co-Country
Table 29: Comparison of 5 Academic Databases by LA Journals
Table 31: Top 100 Author Keyword by Alphabetic Order (2004-2018)
Table 32: Top 50 Author Keyword by Alphabetic Order - Period 1
Table 33: Top 50 Author Keyword by Alphabetic Order - Period 2
Table 34: Top 50 Author Keyword by Alphabetic Order - Period 3
Table 35: Top 100 of Author Keywords by Frequency (2004-2018)
Table 36: Frequency of All Author Keyword – Period 1
Table 37: Frequency of Top 100 Author Keyword – Period 2
Table 38: Frequency of Top 100 Author Keyword – Period 3
Dedication
Acknowledgment
Thank you for the support and encouragement of friends, family, colleagues, and professors. I want to acknowledge a few individuals specifically. I am grateful to my committee, Dr. Qiping Zhang, Dr. Gregory Hunter, Dr. Tom Walker, Dr. Selenya Aytac, and Dr. Trina Yearwood, for your willingness to work with me on this journey and for your confidence and encouragement throughout the process. Dr. Qiping Zhang, a special thank you for your patience and encouragement as you shared your depth of knowledge and experience in the library and information science field. To my external advisor, Dr. Trina Yearwood, my appreciation for your encouragement, guidance, and insight on the value of this dissertation to the higher education society. To Dr. Baaden and Dr. Aytac, thank you for your encouragement and mentorship during my last semester of learning and the dissertation section. I am grateful for your role in this research.
Finally, to my family and especially my prayer warriors: our prayers are heard. Thank you for your encouraging words of support.
Abstract
Learning analytics tools and techniques are continually developed and published in scholarly discourse. This study examines the intellectual structure of the Learning Analytics domain by collecting and analyzing empirical articles on Learning Analytics published from 2004 to 2018. First, bibliometric and citation analyses of 2730 documents from Scopus identified the top authors, key research affiliations, leading publication sources (journals and conferences), and research themes of the learning analytics domain. Second, Domain Analysis (DA) techniques were used to investigate the intellectual structures of learning analytics research, publication, organization, and communication (Hjørland & Bourdieu 2014). VOSviewer software was used to analyze historical and institutional relationships among publications, relationships between authors and institutions, and the dissemination of Learning Analytics knowledge.
The results of this study show that Learning Analytics has captured the attention of the global community. The United States, Spain, and the United Kingdom are among the leading countries contributing to the dissemination of learning analytics knowledge. The leading publication sources are the ACM International Conference Proceeding Series and Lecture Notes in Computer Science. This study presents the intellectual structures of the learning analytics domain; the resulting LA research taxonomy can be re-used by teachers, administrators, and other stakeholders to support the teaching and learning environments in higher education institutions.
Chapter 1: Introduction
Learning Analytics has emerged as a fast-growing and multi-disciplinary area of Technology Enhanced Learning (TEL), much like academic analytics, action research, and educational data mining (Ferguson 2015).
Learning Analytics has the potential to impact educational practices and reshape education as we know it today (Ferguson 2013). Learning analytics concerns itself with how students learn, integrating many aspects of students’ transactions with others in learning management systems. Learning analytics is an integrated platform that provides an open infrastructure for researchers, educators, and learners to develop new technologies and methods to enhance the teaching and learning environments (Siemens et al. 2011).
1.1 Theoretical Positioning of Learning Analytics (LA)
The theories of learning, pedagogy, epistemology, and assessment serve as the “three-legged stool” that grounds Learning Analytics (Knight et al. 2014; Suthers & Verbert, 2013). Learning analytics is defined as the measurement, collection, analysis, and reporting of educational data (Siemens and Long 2011).
The Society for Learning Analytics Research Handbook (2017) states that when Epistemology-Pedagogy-Assessment (EPA) and Evidence-Based Practices (EBP) are applied to Learning Analytics, they take LA from theory to practice. As such, the EPA-Learning Analytics triad (Figure 1) (Knight 2014) and the traditional hierarchy of EBP (Figure 2) are the theories that ground the domain (Knight 2017).
The Epistemology-Pedagogy-Assessment (EPA) triad (Figure 1) illustrates relationships between epistemology (the nature of knowledge), pedagogy (nature of learning and teaching), and assessment (Knight 2014). The goal of learning analytics is to provide an assessment of performance that aligns with the pedagogical feedback and the epistemological view (of the nature of knowledge) (Knight 2014).
Figure 2: Traditional Hierarchy of Evidence (Grant, 2016)
Evidence-Based Practices (EBP) (Figure 2) are a means of summarizing, interpreting, and disseminating information to assist the adoption of research findings. EBP is an interdisciplinary concept defined using the “three-legged stool” that integrates the basic principles of research evidence, expert judgment, and stakeholder preferences and values (Spring 2007; Lilienfeld 2013).
Learning Analytics researchers are required to provide evidence to support or reject claims and discoveries drawn from or validated by educational data. Information professionals use EBP to support the use of research by their users, readers, or clients. Most important to this study, EBP allows for the identification and integration of new “evidence-based” resources. EBP begins with information retrieval, organization, and management practices to provide the information needed to support day-to-day decision making. EBP increases the awareness and understanding of the challenges and biases that slow down the adoption of research into practice (Romero 2010).
1.2 Motivation and Objective
Learning Analytics has emerged as a fast-growing and multi-disciplinary domain of TEL (Ferguson 2012). The field of learning analytics is expected to continue to grow because of the increasing diversity and volume of student data in learning management systems. Additionally, technological advances in data storage and data retrieval will increase the availability and improve the accessibility of learning analytics data.
The objective of this study is to explore LA publications to reveal the contours of the knowledge fronts in research and evolving theories. According to Smiraglia (2009), “The publication of a domain is the record of its productivity and communication. The publication reveals domain trends and identifies the social networks inside of the community.”
1.3 Significance
This dissertation study explored the intellectual structures found in the scholarly research in the publications on Learning Analytics.
First, the research study analyzed the research trends since the emergence of Learning Analytics. It provides information about learning analytics publications, tools, and techniques that are available to support stakeholders’ efforts to enhance the teaching and learning environments.
The findings of this study offer insight into scholarly research on learning analytics and provide evidence of its use by stakeholders in the discourse communities of learning analytics.
Second, the application of knowledge organization tools and techniques, specifically the taxonomy and ontology constructs, will provide scientific validation to define the learning analytics domain.
Finally, the domain analysis supports scholarly communication by identifying collaboration opportunities for authors and institutions.
Chapter 2: Literature Review
This chapter summarizes the empirical research on learning analytics, bibliometrics, and domain analysis for knowledge organizations. The literature review also describes the theoretical foundations and conceptual frameworks of the learning analytics domain and discusses the role of big data and other methods used to collect and process Learning Analytics data.
2.1 Bibliometric Analysis
Bibliometric tools and techniques are used by library and information science professionals to study communication processes and information flows and to understand the management and dissemination of knowledge. Today, bibliometric techniques are often used to evaluate scientific output, to select journals for libraries, and to forecast the potential of a topic or discipline. This study describes the bibliometric features and trends of scholarly communication in learning analytics. The study also summarizes the evolution of Learning Analytics from its inception in 2004 to 2018, over three periods, to show the emergence of the domain.
2.1.1 Bibliometric Laws
Three laws govern bibliometric research: Lotka's law of scientific productivity, Bradford's law of scattering, and Zipf's law of word occurrence.
Lotka’s law is the basis of the infrastructure for bibliographic databases (Smiraglia 2009). The law measures and predicts the productivity of scientific researchers. Under Lotka's law, the number of authors making n publications is about 1/n² of the number making one publication. The number of authors making 2 publications is therefore 1/2² = 1/4 = 0.25 of those making 1 publication, and the number making 3 publications is 1/3² = 1/9 ≈ 0.11 of those making 1 publication (Lotka’s Law of Productivity). Potter (1988), Papakhian, and more recently Huber, Leazer, and Smiraglia have applied Lotka's law to the bibliographic universe.
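The arithmetic behind Lotka's law can be checked with a short sketch. The function names and the figure of 100 single-publication authors are illustrative assumptions, not values from this study, and the exponent of 2 is the classic inverse-square form described above.

```python
def lotka_fraction(n: int, exponent: float = 2.0) -> float:
    """Authors with n publications, as a fraction of authors with one.

    Assumes the classic inverse-square form (exponent = 2).
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    return 1.0 / n ** exponent


def expected_authors(single_paper_authors: int, max_papers: int) -> dict:
    """Expected author counts for 1..max_papers publications (hypothetical helper)."""
    return {
        n: single_paper_authors * lotka_fraction(n)
        for n in range(1, max_papers + 1)
    }


if __name__ == "__main__":
    # Invented example: a field with 100 authors who published exactly once.
    for papers, authors in expected_authors(100, 4).items():
        print(f"{papers} publication(s): about {authors:.1f} authors")
```

Comparing such expected counts with observed author counts is one way to judge how closely a body of literature follows the law.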
According to Potter (1980), Lotka’s Law is generally accurate when applied to large bodies of literature over a long period. Potter examined the occurrence of author names in two catalogs and discovered that about two-thirds (63.5 percent and 69.33 percent, respectively) of all names occur only once (Potter 1980).
Papakhian (1985) replicated Potter's design using a sound recordings catalog and found that fewer than half (47.6 percent) of names occurred once, concluding that the presence of non-book material increases the number of multiple occurrences.
Huber (2002) suggests that cumulative advantage (CA) skews the inverse relationship between authors and their publications. Lotka’s law implies constant proportional growth; Huber's research finds that scientists have a constant production rate over their active careers.
However, the Huber model concludes that the distribution of scientific publications is due to "the skewed distribution of talent and tenacity" (Huber 2002).
Leazer and Smiraglia (1999) suggest that canonicity plays a role in the inverse relationship between authors and their publications. Today, many authors gain value from the canons of the academic community. This relationship supports the occurrence of author names in databases.
Bradford’s law of scattering describes how the literature on a topic or discipline is distributed across journals. Bradford’s law estimates the exponentially diminishing returns of searching for references in science journals. In many disciplines, this pattern is known as the Pareto distribution. The law states that if the journals in a defined field are sorted by number of articles and divided into three groups, each with about one-third of all articles, then the number of journals in each group will be proportional to 1:n:n² (Black 2004).
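The three-group partition can be sketched as follows. This is a minimal illustration, assuming a simple cumulative cut at each third of the article total; the function name and the journal counts are invented for the example and are not taken from the study's data.

```python
def bradford_zones(article_counts, zones=3):
    """Split journals into zones that each hold about 1/zones of all articles.

    Journals are ranked by article count (most productive first); the
    returned list gives the number of journals in each zone.
    """
    counts = sorted(article_counts, reverse=True)
    total = sum(counts)
    sizes = []
    running = 0            # cumulative articles seen so far
    journals_in_zone = 0
    zone = 1
    for count in counts:
        running += count
        journals_in_zone += 1
        # Close the current zone once its share of articles is reached.
        if zone < zones and running >= zone * total / zones:
            sizes.append(journals_in_zone)
            journals_in_zone = 0
            zone += 1
    sizes.append(journals_in_zone)  # the last zone takes the remaining journals
    return sizes


if __name__ == "__main__":
    # Invented data: 1 core journal, 3 mid-tier journals, 10 peripheral journals.
    counts = [30] + [10] * 3 + [3] * 10
    print(bradford_zones(counts))  # journals per zone
```

With this invented data, each zone holds 30 articles, and the journal counts per zone grow roughly geometrically, which is the pattern the law describes.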
Two early researchers, Zipf and Pareto, championed the study of power-law cumulative distributions. “Zipf’s law” and “Pareto distribution” are effectively synonymous with “power-law distribution” (Neukum 1994). However, Zipf’s law and the Pareto distribution differ from one another in the graphic representation of the cumulative distribution: Zipf made his graphs with x on the horizontal axis and P(x) on the vertical axis; Pareto did it the other way around (Neukum 1994). Zipf’s law or the Pareto distribution applies when the probability of measuring a particular value of a quantity varies inversely as a power of that value (Newman 2005). In this study, Zipf’s law is used to describe the rank/frequency of the appearance of words in a text. Zipf’s Law states that when the words occurring in a relatively lengthy text are listed in order of decreasing frequency, the rank of any word on the list multiplied by its frequency is approximately equal to a constant (given as 25,600). The equation for this relationship is r × f = k, where r is the rank of the word, f is its frequency, and k is the constant.
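The rank-frequency relationship r × f = k can be inspected with a small sketch. The tokenizer (lowercase runs of letters and apostrophes) and the function name are simplifying assumptions; a real corpus would need a long text before the product settles near a constant.

```python
import re
from collections import Counter


def rank_frequency(text):
    """Return (rank, word, frequency, rank * frequency) rows for a text.

    Words are ranked by decreasing frequency; ties keep first-seen order.
    The tokenizer is a deliberately simple assumption.
    """
    words = re.findall(r"[a-z']+", text.lower())
    ranked = Counter(words).most_common()
    return [
        (rank, word, freq, rank * freq)
        for rank, (word, freq) in enumerate(ranked, start=1)
    ]


if __name__ == "__main__":
    # Toy sentence for illustration only; too short to test the law itself.
    for row in rank_frequency("the the the the cat cat sat"):
        print(row)
```

Listing the last column for the top-ranked words of a lengthy text shows how close the products stay to a single value k.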
2.1.2 Citation Analysis
Citation analysis counts the citations that publications receive in order to identify peers, social change, and the dissemination of knowledge in the core journals of a domain. It measures the relative importance or impact of an author, an article, or a publication based on the number of citations in other works.
A critical component of the establishment of a domain is evidence of knowledge dissemination; this study uses the relatedness of the learning analytics journals and conferences in scholarly work. Academic databases provide access to collections of information commonly used for research and writing.
According to Lopez-Illescas et al. (2008), the leading academic databases are Scopus and Web of Science (WoS). Web of Science is a multidisciplinary database of abstract and citation data. Its six indices include the Science Citation Index Expanded, the Social Science Citation Index, the Arts and Humanities Citation Index, and the Conference Proceedings Citation Index. Essential to this study is the composition of the Emerging Source Citation Index (ESCI). The ESCI uses only the journals identified as critical to the leaders, funders, and evaluators of a discipline worldwide (Databases – Clarivate). Scopus is considered the world’s largest abstract and citation database of peer-reviewed literature; it includes scientific journals, books, conference proceedings, trade publications, and book series, covering research topics across all scientific and technical disciplines (Scopus Empowering Knowledge, 2018). Other academic databases, including ERIC, the ACM Digital Library, and EBSCO, provide access to specific categories of publication.
This study uses citation counts to identify the top journals and conferences on learning analytics and uses them as the basis for comparing availability in five academic databases (Web of Science, Scopus, ERIC, ACM Digital Library, EBSCO) (Appendix A).
Keyword analysis measures the relatedness of words and terms using co-occurrence relationships. When two words occur in the same document, the words are said to co-occur in the document (Matsuo 2004). Co-authorship relationships are also used to identify the relationships between authors, research institutions, or countries.
The Hirsch index (h-index) is the most widely accepted measure of author productivity and quality of scholarly publishing. Developed by physicist J.E. Hirsch (2005), the h-index is a measure of both quality (number of citations) and quantity (number of publications). An h-index of h means that the entity has h publications that were each cited at least h times.
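The h-index definition translates directly into a short computation. The sketch below is illustrative; the function name and the citation counts are invented and do not describe any author in this study.

```python
def h_index(citations):
    """Largest h such that h publications have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for position, cites in enumerate(ranked, start=1):
        if cites >= position:
            h = position   # the top `position` papers all have >= position citations
        else:
            break
    return h


if __name__ == "__main__":
    # Invented citation counts for five publications.
    print(h_index([10, 8, 5, 4, 3]))
```

In the invented example, four publications have at least four citations each, but there are not five with at least five, so the index is 4.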
2.1.3 Co-occurrence Analysis (co-word, co-citation)
The term co-occurrence is the frequent occurrence of two search terms within a query. This study measures the relatedness of documents, authors, journals, institutions, and countries regarding publication on Learning Analytics in higher education. Several research studies have investigated co-occurrence of terms within databases.
Nelson (1983) found that simulation models that incorporated binary dependence of index terms within databases modeled the distribution of term co-occurrences better than a model that assumed binary independence. Wolfram (1996) used descriptor term co-occurrence to develop a simulation model for representing inter-record linkage structure in a hypertext bibliographic retrieval system, where common occurrences of descriptor terms of records are the basis for inter-record linkages. The author found that a pattern existed for the observed system that could not be adequately represented by three different models.
Co-citation is used to measure the strength of relatedness between two documents. The measure is calculated as the number of times the two documents are cited together (Small 1973). According to Small (1973), when two documents appear together in the reference list of a later document, they are said to be co-cited. The more co-citations two documents receive, the higher their co-citation strength, and the more likely they are semantically related (Small 1973).
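Co-citation strength can be counted directly from reference lists. The following is a minimal sketch, assuming each citing document is represented by a list of the works it cites; the function name and the paper and reference identifiers are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists):
    """Count how often each pair of cited works appears in the same reference list.

    reference_lists maps a citing document to the works it cites.
    """
    counts = Counter()
    for refs in reference_lists.values():
        # Each unordered pair of distinct references is co-cited once per citing document.
        for pair in combinations(sorted(set(refs)), 2):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    # Invented citing papers (P1..P3) and cited works (A..C).
    refs = {"P1": ["A", "B", "C"], "P2": ["A", "B"], "P3": ["B", "C"]}
    for pair, strength in cocitation_counts(refs).most_common():
        print(pair, strength)
```

Pairs with higher counts are the ones the co-citation approach treats as more closely related.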
Co-authorship statistics are also used to identify the relationships among authors, research institutes, or countries. Citation techniques are used to trace intellectual influence from designated scholarly works (DeBellis 2007). Kessler (1963) defined bibliographic coupling as the relationship between two documents that share one or more references; documents coupled in this way are considered similar. Zhao and Strotmann (2008) mapped the scientific activities of authors to reveal the intellectual structures of a scientific domain and to enhance the understanding of author citation networks.
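Bibliographic coupling looks at the same reference lists from the opposite direction: two citing documents are coupled when their reference lists overlap. The sketch below assumes coupling strength is the number of shared references; the function names and identifiers are hypothetical.

```python
from itertools import combinations


def coupling_strength(refs_a, refs_b):
    """Number of references shared by two documents."""
    return len(set(refs_a) & set(refs_b))


def coupled_pairs(reference_lists):
    """Coupling strength for every pair of citing documents that share a reference."""
    pairs = {}
    for doc_a, doc_b in combinations(sorted(reference_lists), 2):
        strength = coupling_strength(reference_lists[doc_a], reference_lists[doc_b])
        if strength > 0:
            pairs[(doc_a, doc_b)] = strength
    return pairs


if __name__ == "__main__":
    # Invented citing papers (P1..P3) and cited works (A..C).
    refs = {"P1": ["A", "B", "C"], "P2": ["A", "B"], "P3": ["B", "C"]}
    print(coupled_pairs(refs))
```

Unlike co-citation counts, which can grow as new citing papers appear, the coupling strength of two documents is fixed once both are published, because it depends only on their own reference lists.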
Co-word analysis methods start from the co-word matrix, which is then examined with factor analysis, cluster analysis, multivariate analysis, and social network analysis. According to Small (1977), co-word analysis is a content analysis technique that uses the co-occurrence of word pairs in texts to identify relationships between ideas in a subject area.
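A co-word matrix can be built from the author keywords of each document. This is a minimal sketch under the assumption that two keywords co-occur when they are assigned to the same document; the function name and keyword lists are invented, and the sketch is not a description of how VOSviewer constructs its maps.

```python
from itertools import combinations


def coword_matrix(keyword_lists):
    """Build a symmetric keyword co-occurrence matrix.

    keyword_lists holds one list of author keywords per document.
    Returns the sorted vocabulary and a square matrix of pair counts.
    """
    vocab = sorted({kw for kws in keyword_lists for kw in kws})
    index = {kw: i for i, kw in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in vocab]
    for kws in keyword_lists:
        for a, b in combinations(sorted(set(kws)), 2):
            matrix[index[a]][index[b]] += 1
            matrix[index[b]][index[a]] += 1  # keep the matrix symmetric
    return vocab, matrix


if __name__ == "__main__":
    # Invented keyword lists for three documents.
    docs = [
        ["learning analytics", "moocs"],
        ["learning analytics", "moocs", "assessment"],
        ["assessment", "learning analytics"],
    ]
    vocab, matrix = coword_matrix(docs)
    for keyword, row in zip(vocab, matrix):
        print(f"{keyword:20s} {row}")
```

A matrix of this shape is the usual input for the clustering and scaling techniques named above.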
2.2 Keyword Analysis
Bibliometric and domain analyses use keywords to generalize the content of the full-text document. Keywords help readers quickly grasp the central idea, technique, or core method. Term frequency (TF) is a method used to quantify keywords. The keyword analysis in this study is used to identify the hotspots and the distribution of knowledge in the LA domain. Keywords allow for a holistic view of the domain, where high-frequency words give insight into the core of the domain. More importantly, low-frequency keywords offer insight into new and innovative concepts emerging in a field (Quoniam et al. 1998).
2.2.1 Multi-Dimensional Scaling (MDS)
Multidimensional scaling and hierarchical clustering provide insight into the structure of a domain (McCain 1990; Peters and Van Raan 1993; Small et al. 1990; White & Griffith, 1981). Visualizing bibliometric data with maps allows a better understanding of the relationship
2.2.2 Bibliometric Mapping
Bibliometric mapping is a tool used to study the structure and the dynamics of scientific fields (Van Eck 2010). Scholars utilize bibliometric maps to enhance the understanding of the domain. Visualizing bibliometric data allows a better understanding of the relationship between disciplines, invisible colleges, and research fronts.
According to Boyack & Klavans (2005), maps are two-dimensional representations of a set of elements and their relationships. Noyons & Calero-Medina (2009) state that science maps provide decision makers with easy-to-use tools that enhance the understanding of the complexity and heterogeneity of scientific systems. Many approaches to mapping (e.g., McCain 1990; White & Griffith 1981; Leydesdorff and Rafols 2009; Van Eck 2010) identify clusters and network nodes using color.
Van Eck et al. (2010) use a unified model for mapping and clustering in VOSviewer; the clustering module uses a function presented by Newman and Girvan (Van Eck and Waltman 2010).
2.3 Domain Analysis
2.3.1 Definition of Domain Analysis
Domain analysis (DA) is a technique in the science of knowledge organization (KO), used to identify the intellectual base of a domain (Neighbors 1980).
In information science, the term "domain analysis" was introduced by Hjørland and Albrechtsen (1995), who defined a domain as a thought or discourse community in society’s division of labor. Hjørland (2002) points out that complementary empirical approaches such as bibliometric analysis, combined with other approaches epistemological and historical
2.3.2 Techniques of Domain Analysis
Researchers use multiple techniques and methods to collect data to answer the question of whether learning analytics is an emerging domain. This study draws upon the concepts of Kuhn (1962), Hjørland (1995), and Smiraglia (2012) to identify key topics and themes, referred to as domains, and their relationships.
Kuhn (1962) applied domain analysis methods to reveal the scientific footprints and discourse in academic publications. The knowledge organization approach to scientific evidence was popularized in Library and Information Science by Hjørland and Albrechtsen in 1995. Their research introduced domain analysis as an effective method to gain insight into scientific research and the collaborative discourse among scholars and stakeholders.
Traditional Library and Information Science (LIS) techniques of Domain Analysis generally begin with Hjørland (1995), which presents 11 steps to the Domain Analysis approach. The steps include the use of bibliometric studies, empirical analysis, and terminology. These steps represent the core characteristics of domain analysis (Hjørland and Albrechtsen 1995; “Domain Analysis,” IEKO, International Society for Knowledge Organization).
• Production and evaluation of literature guides and subject gateways;
• Production and evaluation of classifications and thesauri;
• Research on competencies in indexing and retrieval of information;
• Knowledge of empirical user studies in subject areas;
• Historical studies of information structures and services in domains;
• Studies of documents and genres in knowledge domains;
• Epistemological and critical studies of paradigms, assumptions, and interests in domains;
• Knowledge of terminological studies, LSP (languages for special purposes), and discourse analysis in knowledge fields;
• Studies of structures and institutions in scientific and professional communication in a domain;
• Knowledge of methods and results from domain-analytic studies on professional cognition, knowledge representation in computer science, and artificial intelligence.
Smiraglia (2012) expanded the techniques in Hjørland and Albrechtsen (1995) to include bibliometric techniques and co-word or term analysis to facilitate the triangulation of evidence about the emergence of trends in scholarly domains (Smiraglia 2012). According to Smiraglia (2009), “The publication of a domain is the record of its productivity and communication. The publication reveals the trends of a domain and identifies internal social networks.”
White and McCain (1998) used author co-citation analysis to map the field of information science. Their analysis included the top 120 authors ranked by citation counts drawn from 12 journals in information science from 1972 through 1995. Their analysis clearly showed that the field of information science consists of two sub-fields, experimental retrieval and citation analysis, with little overlap between their memberships.
Tennis (2003) describes two axes for the operationalization of a domain: 1) areas of modulation, which define the domain by stating its extension; and 2) degrees of specialization, including “focus” and “intersection” (Tennis 2003). The related works of Abrahamson (2003) and Ørom (2003) provide details on the creation of descriptive domain analysis.
Visualization represents the conceptual relationships between domains that are important for revealing inter- and transdisciplinary areas (Bermejo and López-Huertas 2015). Critical changes in visual patterns lead to the discovery of scientific frontiers (Chen 2017; Raghavan et al. 2015) and expose the imperfections of knowledge organization systems (Osinska and Bala 2010; Haworth and Sedig 2011).
According to DeChampeaux et al. (1993), domain analysis is a method for realizing systematic software reuse. Domain analysis produces domain models using methodologies such as domain-specific languages, feature tables, facet tables, facet templates, and generic architectures, which describe the systems in a domain (DeChampeaux 1993).
The Arango and Prieto-Diaz model of domain analysis (Figure 3) summarizes the sources of domain knowledge (Arango and Prieto-Diaz 1989). As shown in Figure 3, domain analysis is an activity that receives multiple sources of input and produces multiple outputs. Raw domain knowledge from relevant sources is the input. The stakeholders include infrastructure analysts, infrastructure implementors, domain experts, and domain analysts. The outputs are semi-formalized concepts, domain processes, standards, and logical architectures.
Figure 3: Arango and Prieto-Diaz Model of Domain Analysis
2.3.3 Application of Domain Analysis to Different Disciplines
Learning Analytics knowledge reuse promises evidence-based tools, techniques, and concepts, and the opportunity to integrate proven enhancements into the teaching and learning environment (Frakes and Kang 2005).
Library and Information Science (LIS) is the central discipline of knowledge organization. According to Hjørland (2008), knowledge organization reflects different historical and theoretical approaches to knowledge, cognition, language, and social organization (“Domain Analysis,” IEKO, International Society for Knowledge Organization).
According to Smiraglia (2012), a domain is a unit of analysis for the construction of a knowledge organization system. Therefore, a domain group must have an ontological basis that reveals the teleology and consensus on the epistemology, methods, and social constructs in the evolution of knowledge (Smiraglia 2012; “Domain Analysis,” IEKO, International Society for Knowledge Organization). Domain analysis is being used in interdisciplinary environments to expand the uses and dissemination of domain knowledge in other research communities (López-Huertas 2015).
In software engineering, Jatain and Goel (2009) identified the steps followed to identify a domain. The study also summarized the steps needed to describe a domain (i.e., data collection, data analysis, and classification). Jatain and Goel recommend the use of domain analysis methods in family-oriented software engineering, object-oriented engineering, diverse organizations, and implementation technologies (Jatain and Goel 2009).
In pedagogy, domain analysis and bibliometric analysis were used to identify the highly cited social media sites in the discourse on social media technologies and to assess their influence on teachers’ decision making. The study applied the theory of axes of domain analysis (Tennis 2002). The domain analysis identified highly cited social networking sites in scholarly discourse. According to Galante, the benefits of social media use include knowledge sharing, enhanced collaboration, increased participation and motivation, familiarity, and accrete to learning. Moreover, the study identified factors that negatively influenced social media use in pedagogy (Galante 2015).
Domain analysis in the field of medicine is commonly used to evaluate the discourse in the field. In 2017, domain analysis and bibliometrics were applied to evaluate the discourse in the telemedicine community. The study was used to evaluate the quality of care available using telemedicine (Bynum and Irwin 2011; Patsis 2017).
2.4 Conceptual Framework of Learning Analytics
The conceptual framework for positioning learning analytics within higher education began with the emergence of “big data.” According to Mayer (2009), the global expansion of computer availability and educational media increased the opportunities to improve learning processes (Mayer 2009).
According to MacNeill et al. (2014), learning analytics applies techniques from information science, sociology, psychology, statistics, machine learning, and data mining to analyze data collected during education administration and services, teaching, and learning (MacNeill et al. 2014; Patel and Desai 2016).
Learning analytics incorporates data from formal and informal learning environments (MacNeill et al. 2014). Ferguson and Buckingham Shum (2012) introduced the concept of social learning analytics to identify patterns and behaviors at both the individual and group level (MacNeill et al. 2014). Buckingham Shum (2012) introduced the concepts of macro, meso, and micro levels to support institutions in the adoption of learning analytics. Macro-level analytics enable data sharing across institutions for a range of purposes, including benchmarking. Meso-level analytics work at the level of individual institutions and include analytics based on business intelligence approaches. Micro-level analytics support the tracking and interpretation of process-level data for individual learners (Buckingham Shum 2012).
As shown in Figure 4, big data plays an essential role in the collection of higher education data and learning analytics. Higher education data include students’ digital identities, personal learning environments (PLEs), and students’ transactions in the learning management system.
Figure 4: Mapping LA and AA in Big Data Context (Prinsloo, 2015)
Figure 4 also positions learning analytics as a discipline in the sphere of Big Data. “Mapping learning and academic analytics in the context of Big Data” helps visualize learning analytics. Learning Analytics occupies the “middle space” with academic analytics between the learning sciences/educational research and the use of computational techniques to capture and analyze data (Suthers 2013).
Table 1 compares learning analytics with academic analytics in terms of their level of analysis and stakeholders. According to Mayer-Schönberger and Cukier (2014), low-cost storage, data transfer over connected networks, and cloud-based servers create an unprecedented volume, velocity, and variety of “big data” (Mayer-Schönberger and Cukier 2014). Learning analytics is a way to provide stakeholders (learners, educators, administrators, and funders) with better information and insight into the factors within the learning process that contribute to learner success (Table 1) (Siemens 2011; Ferguson 2012). Academic analytics is concerned with the improvement of organizational processes, workflows, resource allocation, and institutional measurement using learner, academic, and institutional data (Siemens 2011).
Table 1: Comparison of Learning Analytics and Academic Analytics (Siemens 2011)
Type of analytics | Level or object of analysis | Stakeholders
Learning Analytics | Personal level: analytics on personal performance concerning learning goals, learning resources, and study habits of other classmates. Course-level: social networks, conceptual development, discourse analysis, “intelligent curriculum.” | Learners, educators and teaching staff
Learning Analytics | Departmental: predictive modeling, patterns of success/failure | Learners, educators
Academic Analytics | Institutional: learner profiles, academic performance, knowledge flow, resource allocation | Administrators, funders, marketing
Academic Analytics | Regional (state/provincial): comparisons between systems, quality, and standards | Funders, administrators
Academic Analytics | National & International | National governments, UNESCO, OECD, league tables
2.4.1 Data Collection Techniques of Learning Analytics
Educational Data Mining (EDM)
According to Long and Siemens (2011), Learning Analytics is related to the field of Educational Data Mining. The focus of learning analytics is to optimize learning by analyzing the data captured from learning systems (Long & Siemens 2011; Pelaz and Alves 2017). The goal of Educational Data Mining (EDM) is to predict students’ learning behavior, enhance domain models, assess the impact of educational support in the learning environment, and advance scientific knowledge about learning and learners (Baker and Yacef 2009).
Text Mining
Text analysis, or text data mining, converts unstructured data into meaningful data for analysis (Feldman 2007). The goal of text mining is to enable a user to discover the nature and relationships of concepts reflected in the data.
Multimodal
Multimodal systems provide quantifiable data from different modes such as speech, handwriting, hand gesture, and gaze (Cobb 2003). Multimodal techniques are conventional in traditional experimental educational research. However, multimodal is a relatively new concept in learning analytics.
Vision Techniques
Computer vision techniques are used to extract gaze direction information from video recordings (Wolf 2018; Ochoa 2018). Video recording is the medium of choice to capture gaze data (Raca 2013). A camera, or an array of cameras, is positioned to record the head and eye movements of the subject(s) (Ochoa 2018).
Posture Techniques
The posture of a learner provides information about his/her internal state (Ochoa 2017). Posture refers to the position that the body or a part of the body adopts at a given time. For example, if a student has his/her head resting on the desk, the instructor could infer that he/she is tired or uninterested in the lecture.
2.4.2 Data Analysis of Learning Analytics
Data analysis techniques commonly used to understand Learning Analytics data include classification, descriptive statistics, regression analysis, predictive analysis, and visual analytics.
Classification uses decision trees, logistic regression, and support vector machine
regression (Baker 2010).
Descriptive statistics use quantitative measures to compile information and summarize data into single numbers (mean, median, mode, probability distributions, covariance, and correlation) that describe the data set (Mann 1995).
Cluster Analysis - The Gaussian mixture model forms the basis for model-based cluster
analysis (Fraley 1998). According to Grandy and Bergner (2013), these latent models apply to performance trajectories of MOOC learners (Grandy 2013).
Predictive analytics - In the education sector, predictive modeling aligns with action-oriented educational policies and technology, while predictive analytics are used to identify at-risk students in academic programs. The predictive methodology assesses students’ achievement based on prior interactions with intelligent tutoring systems (Baker 2004). Taylor and Veermachaneni (2014) used predictive methods to identify student disengagement in massive open online courses (MOOCs) (Taylor and Veermachaneni 2014). Predictive models are also used to detect learners who are engaging in off-task behavior (e.g., cheating) in educational environments (Xing 2016).
Visual analytics focus on building models to understand data in its context. Information
clusters). Static visualizations (i.e., images and infographics) provide answers to a limited number of questions that a user might have about a dataset (Few 2009).
Social network analysis includes analyses of relationships between learners and their instructors to identify disconnected students or influencers. Social network analysis uses metadata to determine types of learner engagement within educational settings (Bienkowski et al. 2012).
2.4.3 Data Reporting of Learning Analytics
Learning analytics requires reporting on the data; the data must be processed so that they can be summarized in a format usable by the end user (Campbell & Oblinger, 2007; Pardo, 2014). Data reporting uses descriptive and prescriptive statistics software tools that can handle large quantities of data (Greller & Drachsler, 2012). Researchers commonly use tools like graphs, charts, dashboards, and key performance indicators to present data to stakeholders.
Data visualization supports the learning process and encourages reflection. Visualization models support the sense-making processes of educational data. Integrated technologies such as Learning Management Systems (LMS) and Massive Open Online Courses (MOOCs) have popularized learning dashboards and nudge reports. Dashboards are the most popular visualization method used in learning analytics; they provide an overview of relevant metrics in an actionable way. Other visualization models include social network analysis (Dawson, 2010; Dawson, Bakharia, & Heathcote, 2010).
Information visualization concepts and methods enable learners to gain insight into their learning actions. Teachers use visualization to monitor subtle interactions in their courses. Researchers use visualization to discover and communicate patterns in large data sets.
2.5 Current Research on Learning Analytics
2.5.1 Three Periods of Research on Learning Analytics
The literature of a domain is the record of its productivity and the method of communicating knowledge. Publications can be used to visualize the intellectual functioning of the domain and its social networks. Various techniques are used to reveal the contours of the LA paradigm, its emerging domains, research fronts, and evolving theories.
This study stratified publications into three distinct periods (Table 2) to understand the growth of the intellectual structures, based on the following significant events in the field of learning analytics.
Table 2: Three Periods of Research on Learning Analytics
Period | Year | Period Focus
1 | 2004 - 2011 | Infancy Stage
2 | 2012 - 2013 | Growing Stage
3 | 2014 - 2018 | Peak Stage
• 2004: The first article on learning analytics was collected in Scopus.
• 2011: The 1st International Conference on Learning Analytics and Knowledge (LAK11) took place at Banff, Canada.
• 2012-2013: The Society for Learning Analytics Research (SoLAR) launched multiple international initiatives to support collaboration and open research to advance learning analytics knowledge.
2.5.2 Research Factors in Learning Analytics
Assessing the state of Learning Analytics, Dawson, Gasevic, Siemens, and Joksimovic (2014) identified three factors influencing the adoption of Learning Analytics as the research gap (Dawson 2014). According to Siemens, the factors are pedagogical knowledge and the information design skills of the individual, adoption of learning analytics, and ease of use. The study evaluated the documents published in Learning Analytics and Knowledge conferences and special issue journals (Siemens 2011).
According to Ferguson and Clow (2017), there are four research factors: improved learning outcomes, support for the learning and teaching environment, adoption, and data ethics. Ferguson and Clow (2017) found no conclusive evidence to support the propositions (Ferguson and Clow 2017).
Viborg et al. (2018) revisited the four propositions presented by Ferguson and Clow (2017); their findings validate the prior study. However, the analysis of the evidence for learning analytics indicates a shift towards a deeper understanding of students’ learning experiences (Viborg 2018).
Jivet, Spech, and Drachsler (2018) proposed a quality framework for learning analytics, which standardizes the evaluation of learning analytics tools (Jivet 2018). The study used learning analytics experts to validate the knowledge in a group concept mapping study (Jivet and Drachsler 2018).
Nistor, Derntl, and Klamma (2015) conducted a literature review of mainstream empirical LA research, which investigated the innovation potential to predict learner success. The review found that single studies proved innovative because they address informal educational settings, video and audio records as data sources, automated assessment, and error/misconception analysis. The main concern with the empirical research studies is the absence of educational theory frameworks (Nistor, Derntl, and Klamma 2015).
Bodily et al. (2018) investigated learning dashboards aimed at learners to evaluate the integration of LA theories and models from the learning sciences. The findings report that: i) dashboards rarely consider concepts from the learning sciences; ii) few validation instruments are used to assess the learners' skills or the tools; and iii) the focus is on the dashboard's acceptance, usefulness, and ease of use as perceived by learners, rather than on the benefit to learners (Bodily et al. 2018).
Pena-Ayala (2018) conducted a review of LA work, research, and trends to inspire adoption and enhance teaching and learning practices. According to this study, only a few research studies acknowledged the progress of the field, which supports the dissemination of knowledge concepts and determines the role of the diverse stakeholders (Pena-Ayala 2018). Leitner, Khalil, and Ebner (2017) reviewed learning analytics research trends, limitations, methods, and key stakeholders. The study found that massive open online courses (MOOCs) enhanced learning performance, student behavior, and benchmarking of learning environments. The study also found that poor data preparation processes affect the size of the dataset (Leitner, Khalil, and Ebner 2017).
According to Greller and Drachsler (2012), a critical component of Learning Analytics is the systemic retrieval and analysis of information in the teaching and learning environment (Greller 2012).
Ali, Hatala, Gašević, and Jovanovic (2012) developed LOCO-Analyst, a learning analytics tool that provides educators with feedback on student learning activities and performance. The LA tool enhanced data visualization, user interface, and supported feedback (Ali 2012).
This study uses the bibliometric based approach to identify the emerging topics in the Learning Analytics domain. Using research trends to identify new topics is common in scientific studies. According to Stephan, Reinhilde, and Wang (2017), there are two classifications of trend detection models, namely, text and bibliometric based approaches (Stephan 2017). In text-based approaches, keywords and terms represent the core topics of a document (Stephan 2017).
Hall, Hans‐Georg, and Yao (2008) used the Latent Dirichlet Allocation (LDA) model to compare publications and top keywords generated in different years to identify the transition of topics (Hall 2008). Morchen et al. (2008) analyzed keywords and the time of their appearance, using the variation of term frequencies between two different timestamps to provide insights into term significance (Morchen 2008). Takao et al. (2014) constructed tf-idf vectors of conference session abstracts from publications to assess the relative importance of the sessions, and then applied the vectors to assess the evolution of a publication venue (Takao 2014).
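The text-based approach can be illustrated with a minimal Python sketch that computes tf-idf weights for a toy set of abstracts. The abstracts, the whitespace tokenizer, and the particular tf-idf variant (relative term frequency multiplied by log(N/df)) are assumptions made for illustration; they do not reproduce the procedure of any study cited above.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document, with weight = tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append(
            {t: (c / len(tokens)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return weights

# Hypothetical abstracts, invented for illustration only.
abstracts = [
    "learning analytics dashboard",
    "learning analytics prediction",
    "text mining prediction",
]
vectors = tf_idf(abstracts)
```

In this toy set, a term that appears in only one abstract ("dashboard") receives a higher weight than a term shared by several abstracts ("learning"), which is the property that makes tf-idf useful for surfacing distinctive topics.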
In bibliometrics, Hopcroft et al. (2004) used co-citation analysis to identify the relatedness of papers (Hopcroft 2004). Arai et al. (2008) identified tightly knit clusters with densely distributed edges in order to find related papers (Arai 2008). Elkiss et al. (2008) analyzed the cohesion of texts in documents based on the citing sentences and co-citation metrics, which provides insights into the content of the cited document (Elkiss 2008).
He et al. (2009) proposed a generative model that uses citations between documents to identify emergent topics in the evolution of the domain (He 2009).
2.5.3 Stakeholders of Research on Learning Analytics
Learning Analytics is about the learning process and the results that are beneficial to the stakeholders. Romero and Ventura (2013) identified the key stakeholders, using purpose, benefits, and perspectives, in the following four groups:
• Learners - support the learner with adaptive feedback, recommendations, and responses to his or her needs, for learning performance improvement.
• Educators - understand students’ learning processes, reflect on teaching methods and performance, and understand social, cognitive, and behavioral aspects.
• Researchers - use the data mining technique that fits the problem, and evaluate learning effectiveness for different settings.
• Administrators - evaluate institutional resources and their educational offer.
Greller and Drachsler (2012) explored the critical dimensions of Learning Analytics (LA), the problem zones, and the dangers to the benefits of educational data (Greller 2012). They propose a generic framework for LA services to support educational practice and learner guidance, and they introduce quality assurance and curriculum development to improve effectiveness and efficiency in the teaching and learning environment (Greller 2012).
According to Chatti et al. (2012), Learning Analytics is a multi-disciplinary field involving machine learning, artificial intelligence, information retrieval, statistics, and visualization (Chatti 2012). The Learning Analytics domain converges with academic analytics, action analytics, and educational data mining. The study provides a reference model based on the four dimensions of data and environments (what?), stakeholders (who?), objectives (why?), and methods (how?). The study reviews recent publications on LA and its related fields and maps them to the four dimensions of the reference model (Chatti 2012). The study also identifies challenges and research opportunities in the area of LA.
Ifenthaler and Schumacher (2016) examined student perceptions of privacy principles related to learning analytics. Privacy issues for learning analytics include how personal data are collected and stored, as well as how they are analyzed and presented to different stakeholders (ERIC Student Perception of Privacy Principles; Ifenthaler and Schumacher 2016).
The goal of this study is to identify knowledge structures in the LA domain for reuse by the stakeholders. Studies of learner activity and predictive models of Virtual Learning Environment (VLE) stakeholder behaviors will support enhancing the teaching and learning environment in higher education.
The stakeholder research in LA examines VLEs. Learner activity is introduced, which categorizes VLE stakeholders’ accesses (logs) to individual parts of the e-learning courses into more semantically meaningful categories. Consequently, the activity represents a sequence of semantically meaningful accesses to the parts of the e-learning courses, which relate to an activity or task that a VLE stakeholder executes.
Drachsler and Greller (2012) explored the educational data in learning analytics to identify the challenges and benefits of educational data. Drachsler and Greller proposed a framework for learning analytics services to support educational practice and learner guidance.
The previous section shared the highlights of the highly cited articles regarding stakeholder behavior. Highlights of LA studies on virtual learner activity and predictive models are presented below:
Ceddia, Sheard, and Tibbey (2007) and Ceddia and Sheard (2005) adapted the multinomial logit model (MLM) to model the probability of stakeholder accesses to the VLE.
Młynarska, Greene, and Cunningham (2016a) examined a Virtual Learning Environment using time series clustering analysis of student behavior. The clustering of learner activity revealed behavioral patterns and the relationship of VLE activity to assignments and students’ final grades.
Młynarska, Greene, and Cunningham (2016b) focused on early predictions of success in Moodle environments. The analysis identified three hypotheses: early submission, high level of activity, and evening activity are indicators of success.
Munk and Drlík (2014) studied student access in e-learning courses using the multinomial logit model (MLM). The study examined the access behaviors of teachers and students to e-learning course content (Munk 2014).
Chapter 3: Research Questions
This study examines the published research associated with learning analytics from 2004-2018. This chapter describes the details of proposed bibliometric techniques (i.e., research design, keyword analysis, database selection, dataset, preliminary issues of validity and
reliability) and data analysis plan for the research questions. Additionally, a citation and co-citation analysis will identify the most prolific authors and institutions, as well as their impact on the adoption of Learning Analytics.
3.1 Statement of Research Problem
There is much buzz about learning analytics and its potential to enhance teaching and learning environments (Siemens 2010). Over 6100 learning analytics documents have been published since 2004. This study examined the scholarly publications to aggregate and analyze the discourse and to identify the knowledge structures in the Learning Analytics community for reuse.
3.2 Research Questions
This dissertation research will address the following research questions:
RQ 1: What are the bibliometric features of research on LA?
1a: What are the key publication sources (journals and conferences) on LA?
1b: Who are the top authors who have contributed to research on LA?
RQ 2: What are the research trends and evolution for LA?
2a: What are the key publications stratified by three periods?
2c: What are the research topics (based on keyword & subject area) of LA by three periods?
2d: What are the research document types of LA by three periods?
RQ 3: Does the published LA scholarship support a domain model for LA?
3a: Who are the most influential authors (Co-author analysis)?
3b: What are the most influential documents (Co-citation analysis)?
3c: What are the most referenced documents (Coupling analysis)?
3.3 Concepts and Definitions
This section of the proposed study describes some basic definitions and concepts used in the review of research documents.
• Bibliometrics – Bibliometrics is the science that quantitatively studies bibliographic material (Pritchard 1969). Today, bibliometrics includes a variety of techniques and methods to analyze and visualize data (Small, 1999).
• Bibliographic Coupling – Occurs when two documents share references. The relationship creates a bond of “similarity” between both publications (Kessler, 1963).
• Affiliation Analysis is a study of the relationship between cited and citing documents (Diodato 1994).
• CiteScore – A citation metric calculated from a snapshot of Scopus, which represents the relative performance of serial titles at a point in time.
• Co-occurrence – The joint appearance of two terms in a document. The higher the frequency of the joint appearance of the words, the stronger the conceptual linkage (Miguel, Caprile & Jorquera-Vidal, 2008).
• Co-citation – Two articles that appear simultaneously in the references of a third one. Co-citation analysis enables the discovery of the most relevant authors or papers of a field of study through the empirical consensus established by the citations of those authors or papers (Olmeda-Gómez et al. 2007; Guerrero et al. 2014).
• Domain Analysis – This method enables the systemic exploration of the academic publication footprint (Kuhn, 1962). Domain analysis is focused on the visualization of the research structure to identify its evolution over time, providing views of the clusters of teaching, learning, and technology (Guerrero, Martinez Almela & La Rosa, 2012).
• Latent Dirichlet Allocation (LDA) is a generative statistical model that allows sets of observations to be explained by unobserved groups that explain why some parts of the data are similar. For example, if observations are words collected into documents, it posits that each document is a mixture of a small number of topics and that each word's presence is attributable to one of the document's topics (Latent Dirichlet Allocation, Wikipedia; Blei et al. 2003).
• Citation Analysis – Citation analysis is a bibliometric technique used to study the relationship between cited and citing documents (Diodato 1994).
• NMC Horizon Report-The report series is higher education’s longest-running
exploration of emerging technology trends in higher education. The series charts the five-year impact of innovative practices and technologies for higher education across the globe.
• Link strength is an attribute that indicates the number of links of an item with other items and the total strength of the links of an item with other items (VOSviewer Manual, CWTS).
• Factor Analysis is a statistical data reduction technique (Kim 1975). Given an array of correlation coefficients for a set of variables, factor analytic techniques enable the visualization of patterns of relationships such that the data may be 'rearranged' or 'reduced' to a smaller set of factors (Kim 1975).
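To make the bibliographic coupling definition above concrete, the following Python sketch counts the references shared by each pair of documents. The document identifiers and reference lists are hypothetical, and using the raw number of shared references as the coupling strength is an assumption for illustration; bibliometric tools may apply additional normalization.

```python
from itertools import combinations

def coupling_strength(references):
    """Map each pair of documents to the number of references they share."""
    strength = {}
    for a, b in combinations(sorted(references), 2):
        shared = len(references[a] & references[b])
        if shared:  # keep only pairs that are actually coupled
            strength[(a, b)] = shared
    return strength

# Hypothetical reference lists, invented for illustration only.
references = {
    "doc_A": {"r1", "r2", "r3"},
    "doc_B": {"r2", "r3", "r4"},
    "doc_C": {"r5"},
}
coupling = coupling_strength(references)
```

Here doc_A and doc_B share two references and are therefore coupled, while doc_C shares none and does not appear in the result.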
Chapter 4: Methods
This study uses quantitative methods to identify research trends in Learning Analytics. Traditional bibliographic analysis reviews publications related to the scope of the study to identify research trends, concepts, and keywords necessary to analyze and support teaching and learning. This bibliometric study explores the scholarly literature and analyzes the methodology and subject trends in Learning Analytics.
This bibliometric study will retrieve research articles on Learning Analytics from related academic databases to identify significant research trends in learning analytics. First, the choice of databases will be discussed. Then the choice of search terms will be introduced. After the procedure is described, the dataset for the proposed study will be reported.
4.1 Choice of Databases
Learning Analytics is an emerging domain of study; nearly 80% of the publications are conference proceedings and journals. A critical component of the Learning Analytics domain and of this study is the relatedness of the journals and conferences.
Academic databases provide access to a collection of information commonly used for research and writing. The focus of this study is on empirical research; the database selection criteria include the title, abstracts, and keywords. Both the Web of Science (WOS) and Scopus databases offer access to peer-reviewed literature, scientific journals, books, and conference proceedings. This analysis compared the availability of conferences and journals in five academic databases (Web of Science, Scopus, ERIC, ACM Digital Libraries, EBSCO) to select the best fit for the study. The findings show that Scopus provides the most complete coverage of published literature on learning analytics (Appendix A).
4.2 Search Query
The focus of this research is the domain analysis of research on Learning Analytics. A search with the query of “learning analytics” was performed in Scopus on a single day, November 26, 2018, to avoid daily updating bias since the database continues to collect data.
4.3 Dataset
The dataset for this study includes a total of 2730 articles that were retrieved from the Scopus database with the query “learning analytics.”
Table 3 shows the number of articles by three periods. The three periods were identified in the literature review (Section 2.5.1).
Period 1 (2004-2011): The first era corresponds to the earliest works, up to 2011. During this period, 32 separate papers were published in conferences, workshops, and journals indexed in the Scopus database.
Period 2 (2012-2013): This period followed the first International Conference on Learning Analytics & Knowledge (LAK11) in 2011. The 298 documents published during this period represent roughly nine times as many documents as Period 1.
Period 3 (2014 to 2018): This period follows the establishment of The Society for Learning Analytics Research (SoLAR) in 2013. The Scopus dataset includes 2449 documents, which represent roughly eight times as many documents as Period 2.
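The stratification and the growth factors quoted above can be checked with a short Python sketch. The function name `period_of` is hypothetical; the year boundaries and the per-period document counts are taken from the text.

```python
def period_of(year):
    """Assign a publication year to one of the three study periods."""
    if 2004 <= year <= 2011:
        return 1
    if 2012 <= year <= 2013:
        return 2
    if 2014 <= year <= 2018:
        return 3
    raise ValueError(f"year {year} is outside the 2004-2018 study window")

# Document counts per period as reported in the text.
counts = {1: 32, 2: 298, 3: 2449}

# Growth factor of each period relative to the preceding one.
growth_2_vs_1 = counts[2] / counts[1]  # 298 / 32
growth_3_vs_2 = counts[3] / counts[2]  # 2449 / 298
```

Rounded to one decimal place, the two ratios are 9.3 and 8.2, consistent with the approximate factors of nine and eight stated for Periods 2 and 3.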
4.4 Data Collection Procedures
Table 3: Three Phases of Bibliometric Data Collection
Phases | Tasks
Document Search | • Query = “learning analytics” in Scopus • Specify Record Fields for Data Exporting - Author, Affiliation, Document Title, Year, EID, Source Title, Volume/Issue, pages, citation counts, source, document type, access type, and DOI
Document Preparation | • Perform Data Cleansing • Rank Documents by Citation Frequency • Identify Highly Cited Documents
Frequency Analysis | • Author Name • Source Title • Year • Affiliation • Affiliation Country • Subject Area
Figure 5 shows the detailed steps to obtain the dataset of 2730 records from Scopus for this study.
Figure 5: Inclusion and Exclusion Process for LA Publications (2011-2018)
Table 4 details the two phases of the co-citation data collection procedure.
Table 4: Two Phases for Co-Citation Data Collection
Procedures | Task
Co-Citation Data Search | • Use Scopus to query the dataset for co-joined documents to create a frequency distribution of the co-citation pairs in the data set
Construct Co-Citation Matrix | • Build a co-citation table in VOSviewer • Provide a list of the top co-cited documents
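The frequency distribution of co-citation pairs described in Table 4 can be sketched in Python as follows. This is an illustrative computation under stated assumptions, not the internal implementation of Scopus or VOSviewer: the reference identifiers are hypothetical, and each citing document is assumed to be available as a plain list of the references it cites.

```python
from collections import Counter
from itertools import combinations

def cocitation_counts(reference_lists):
    """Count how often each pair of cited documents appears in the same reference list."""
    pairs = Counter()
    for refs in reference_lists:
        # Deduplicate and sort so each pair has one canonical (a, b) ordering.
        for a, b in combinations(sorted(set(refs)), 2):
            pairs[(a, b)] += 1
    return pairs

# Hypothetical reference lists of three citing documents.
citing_docs = [
    ["ref_A", "ref_B", "ref_C"],
    ["ref_B", "ref_C"],
    ["ref_A", "ref_C"],
]
pairs = cocitation_counts(citing_docs)
top_pairs = pairs.most_common(2)  # the most frequently co-cited pairs
```

Sorting `pairs` by count in descending order yields the list of top co-cited documents called for in the second phase.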
Table 5 details the four phases of the keyword extraction procedure.
Table 5: Four Phases for Keyword Extraction
Procedure Tasks
1. Keyword Extraction
• Use Scopus to extract keywords from titles, author keywords, and abstracts from the bibliometric data set
• Specify record fields for data exporting
2. Normalize Keywords
• Export keywords to VOSviewer for clean up
• Sort by frequency of occurrence
• Edit for duplication
• Rank by frequency in descending order
• Select the top 20 for table
3. Co-word Search
• Use Scopus to query the database, pairing each of the 100 most frequently occurring words and computing the frequency of each pair throughout the set
4. Build Keyword Theme Map
• Use VOSviewer to create the co-word analysis map
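The normalization and co-word steps of Table 5 can be sketched in Python as below. The normalization rule (lowercasing and collapsing whitespace) is an assumption made for illustration; the clean-up actually performed in VOSviewer is not specified here, and the sample keyword lists are hypothetical.

```python
from collections import Counter
from itertools import combinations


def normalize(keyword):
    """Lowercase a keyword and collapse runs of whitespace (assumed rule)."""
    return " ".join(keyword.lower().split())


def keyword_frequencies(docs):
    """Count the number of documents in which each normalized keyword occurs."""
    counts = Counter()
    for keywords in docs:
        counts.update({normalize(k) for k in keywords})
    return counts


def coword_counts(docs, top_n=100):
    """Count co-occurrences among the top_n most frequent keywords."""
    freq = keyword_frequencies(docs)
    top = {k for k, _ in freq.most_common(top_n)}
    pairs = Counter()
    for keywords in docs:
        kept = sorted({normalize(k) for k in keywords} & top)
        pairs.update(combinations(kept, 2))
    return pairs


# Hypothetical keyword lists, one per document.
docs = [
    ["Learning Analytics", "MOOC", "learning  analytics"],
    ["learning analytics", "Dashboard"],
    ["MOOC", "Learning analytics", "dashboard"],
]

freq = keyword_frequencies(docs)
cowords = coword_counts(docs, top_n=100)
```

Counting each keyword at most once per document keeps a repeated term in one record from inflating either the frequency ranking or the pair counts.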
4.5 Domain Analysis
The domain analysis methods for knowledge organizations were applied to the Learning Analytics documents. The approach used in this study is focused entirely on publication data records, as presented in Scopus (Table 6).
The analysis covered the 2730 learning analytics documents indexed in Scopus (Elsevier) over the period 2004-2018. The query was limited to published articles that included a title, an abstract, and author keywords, since scholarly research depends on the dissemination of scientific communication. The bibliographic records are stored in a tab-delimited file.
Table 6: Four Phases for Domain Analysis
Procedure Method Activity
Step 1: Identify Domain
1. Define the scope of the Domain
2. Refine Domain – specific requirements
3. Refine Domain with Specific Design
4. Develop the Domain Model
5. Gather Re-useable work products
Step 2: Construct a model
Descriptive or Prescriptive
Domain Taxonomy Model Domain Reusable components
Step 3: Specify the components
Re-Use Method Reusable components Reuse Library
Step 4: Relate the Domain to other Domains
Identify related domains for knowledge sharing
4.6 Validity and Reliability
This research uses co-citation analysis and co-word analysis to provide multiple perspectives on data relatedness. The most common validity concerns are the influence of author self-citation and of similar author names. This study excludes self-citations from the citation counts. It also uses the author identifier provided in the Scopus data to distinguish authors with similar surnames.
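As an illustration of how author identifiers can support both safeguards, the sketch below treats a citation as a self-citation when the citing and cited documents share at least one author identifier. That definition is an assumption made for this example, not a rule stated by the study, and the identifier values are hypothetical.

```python
def is_self_citation(citing_author_ids, cited_author_ids):
    """True if the citing and cited documents share at least one author ID."""
    return not set(citing_author_ids).isdisjoint(cited_author_ids)


def count_without_self_citations(cited_author_ids, citing_documents):
    """Count citing documents that share no author ID with the cited one."""
    return sum(
        1
        for ids in citing_documents
        if not is_self_citation(ids, cited_author_ids)
    )


# Hypothetical author identifiers for one cited document and its citers.
cited_authors = ["id-001", "id-002"]
citing = [
    ["id-002", "id-010"],  # shares id-002 with the cited document
    ["id-011"],
    ["id-012", "id-013"],
]

external_citations = count_without_self_citations(cited_authors, citing)
```

Matching on identifiers rather than on name strings is what keeps two different authors with the same surname from being merged.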
The quantitative analysis components of this study can be reproduced using the same methodology, which supports the reliability of the process for similar studies. The study analyzes research trends in learning analytics using bibliometrics, co-citation analysis, and co-word (keyword) analysis.
Chapter 5: Results
This chapter reports the research findings. First, it describes the bibliometric features of scholarly communication on learning analytics from 2004 to 2018. Then it presents the research trends and evolution for learning analytics. Finally, it reports the status of empirical research in the domain of learning analytics.
5.1 Results for RQ 1: What are the bibliometric features of research on LA?
The bibliometric metadata on learning analytics from Scopus (publication year, title, author, subject area, references, affiliation country, and keywords) were used to answer this research question.
5.1.1 Publication Counts Over Time
Table 7: Publication Counts of Learning Analytics Over Time
YEAR COUNT
2018 547
2017 618
2016 547
2015 384
2014 304
2013 195
2012 103
2011 30
2010 1
2009 0
2008 0
2007 0
2006 0
2005 0
2004 1
TOTAL 2730
Table 7 and Figure 6 show the number of publications over time. The evolution of publications on learning analytics from 2004 to 2018 reveals steady growth after the Learning Analytics and Knowledge conference of 2011, rising from 30 documents in 2011 to a peak of 618 in 2017 before dipping to 547 in 2018. This growth is associated with the increase in the volume of student data in learning management systems. Additionally, technological advances in data storage and data retrieval have increased the availability and accessibility of learning analytics data.
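The yearly counts in Table 7 can be checked with a short Python sketch. The counts are copied from the table; the variable names are illustrative only.

```python
# Publication counts per year, copied from Table 7.
publication_counts = {
    2004: 1, 2005: 0, 2006: 0, 2007: 0, 2008: 0, 2009: 0, 2010: 1,
    2011: 30, 2012: 103, 2013: 195, 2014: 304, 2015: 384,
    2016: 547, 2017: 618, 2018: 547,
}

# Total number of documents in the dataset.
total = sum(publication_counts.values())

# Year with the highest publication count.
peak_year = max(publication_counts, key=publication_counts.get)

# Change in count relative to the previous year.
yearly_change = {
    year: publication_counts[year] - publication_counts[year - 1]
    for year in sorted(publication_counts)
    if year - 1 in publication_counts
}
```

The yearly differences make the trend easy to inspect: every year from 2011 through 2017 shows an increase over the previous year, and 2018 is the only year after 2011 with a decrease.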
Figure 6: Learning Analytics Publications by Year and Period
5.1.2 Key Publication Types on Learning Analytics
As shown in the analysis of documents by type (Table 8), publication on learning analytics is dominated by conference papers (77%), followed by journal articles (22%). Because learning analytics is a relatively new field, its scholars tend to publish their work at conferences.
Table 8: LA Publications by Document Type – ALL
DOCUMENT TYPES COUNT %
ARTICLES/DOCUMENTS (AR-DOC) 588 22%
BOOK & BOOK CHAPTERS (BOOKS) 10 0%
CONFERENCE PAPER & REVIEW (CONF) 2089 77%