Volume 9, Issue 2, February 2020, ISSN: 2278 – 1323

A Comparative Study of Association Rule Mining Algorithms in Era of Educational Data

Rajeev Sharma, Research Scholar
AISECT University, Bhopal, M.P., India
[email protected]

Dr. Sitendrea Tamrakar, Associate Professor
Nalla Malla Reddy Engineering College, Hyderabad, India
[email protected]

Abstract: As a student begins an educational journey, his or her conduct and overall class performance are influenced by several factors. Hidden information can be extracted from the large amount of data held in educational records. Association rule learning is a technique used for mining rules from educational data. Data mining methods are useful for revealing the actual state of the educational system, and the resulting knowledge can be used to develop the system and raise the quality of education. Association rule mining finds the items that frequently occur together in a database, known as frequent items. We use four algorithms in this paper, applied to educational data mining (EDM): Apriori, Eclat, Declat, and FP Growth. Their output is compared on the basis of time efficiency and the parameters that determine the effectiveness of these algorithms in data mining (DM).

Keywords: Data Mining, Educational Data Mining, FP Growth, Apriori, Eclat, Declat.

I. INTRODUCTION

Data mining (DM) is the process of extracting interesting knowledge or patterns from huge databases. Several techniques have been used to discover this kind of knowledge, most of them derived from machine learning and statistics [1]. Association Rule Mining (ARM) is a popular and well-researched mining technique. Over the last decades, many researchers and practitioners from different fields have worked on data mining algorithms that explore relationships between the attributes of large databases in order to derive valid rules. Each rule relates one or more attributes of a dataset to other attributes, producing an if-then statement over attribute values. Applying these methods is typically not easy, because it involves carrying out the five steps of the knowledge discovery in databases (KDD) process. ARM has been applied successfully to educational content [2].

EDM addresses the development of methods for exploring data that originate in an educational context. To investigate educational questions, EDM applies computational methods to educational data. The results can be used not only for learning process models [3] and student modeling [4], but also for evaluating and enhancing e-learning [5] and for finding valuable data in learning portfolios [6].

II. AN INCEPTION OF EDUCATIONAL DATA MINING

EDM is emerging as a field of education research in which an understanding of how students learn is drawn from a range of computational and psychological methods and research approaches. EDM extracts knowledge about learning by applying a collection of data mining techniques, for example clustering, rule mining, web mining, testing, neural networks, Bayesian networks, and other distinct methods, which yield the final results [7].

A. Application Area Of Education Data Mining

This section gives a brief overview of several application areas [8].

1. User modeling: This usually covers a learner's understanding, experience, motivation, and nature, as well as how well online learning satisfies the learner.

2. User knowledge modeling: This concerns what a student knows, expressed in terms of specific skill sets, conceptual knowledge, and higher-order thinking ability. This knowledge can be extracted from collected data that record the student's interaction with a learning system.

3. User behavior modeling: This concerns mining student data to discover the student's learning habits. It is also one aspect of student engagement.

4. User profiling: This concerns using some common characteristics to gather similar users into groups. These groups are helpful for providing user experiences or user recommendations.

B. EDM Methods

Once student data have been collected, the next important task is to identify trends and features that will improve pedagogical decision-making and the overall performance of students. To achieve this goal, DM strategies have been applied effectively to education, an area known as EDM. Well-known EDM methods include prediction, classification, regression, latent knowledge estimation, pattern mining, association rule mining, structure discovery, and distillation [9].

Although the underlying principles remain unchanged, EDM methods differ from standard DM methods in that they explicitly seek to exploit both the multi-level hierarchy and the non-independence found in educational data. An overview of all mining techniques used in EDM is beyond the scope of this paper, which focuses on discovering associations between student attributes. The following section therefore introduces ARM and illustrates how it can be used in education [10].
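As a small illustration of how an association rule over student attributes is scored, the sketch below computes the support and confidence of one if-then rule. The student records, attribute names, and function names are hypothetical and are not taken from the data set used in this paper.

```python
# Hypothetical student records: each transaction is the set of attributes
# observed for one student (illustrative data, not from the study).
transactions = [
    {"attends_regularly", "submits_homework", "passes"},
    {"attends_regularly", "passes"},
    {"submits_homework", "passes"},
    {"attends_regularly", "submits_homework", "passes"},
    {"attends_regularly"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Rule: IF attends_regularly THEN passes
rule_support = support({"attends_regularly", "passes"}, transactions)
rule_confidence = confidence({"attends_regularly"}, {"passes"}, transactions)
print(rule_support, rule_confidence)
```

Support measures how often the whole rule occurs in the data, while confidence measures how often the consequent holds when the antecedent does. The algorithms compared in this paper differ mainly in how they find the frequent itemsets from which such rules are built.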

III. ASSOCIATION RULE MINING ALGORITHMS IN EDM

ARM is one of the most basic and interesting data mining techniques. It aims to extract interesting correlations, frequent patterns, associations, or causal structures among the items in transaction databases or other data repositories. The algorithms applied in the EDM setting are as follows.

A. FP Growth

FP Growth [11] is another basic frequent pattern mining (FPM) strategy; it produces frequent itemsets without candidate generation and uses a tree-based structure. Within this framework, a pattern-fragment growth technique was developed as an alternative to the Apriori algorithm by introducing a new, compact data structure known as the frequent pattern tree, or FP-tree [11].

FP Growth algorithm for educational data mining:

1. Start the process.

2. Open the dataset and take it as input.

3. Choose Nominal to Binary from preprocessing.

4. Apply FP Growth to the preprocessed data.

5. Scan the database to determine the support of each item, discard the infrequent items, and sort the frequent items in decreasing order of support.

6. Scan the dataset one transaction at a time to build the FP-tree. For every transaction:

a) If it is a unique transaction, form a new path and set the counter of each node to one.

b) If it shares a common prefix itemset, increment the counters of the common itemset nodes and create new nodes if needed.

7. Continue until every transaction has been mapped onto the tree.

8. Stop the process.
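The tree-building steps (5 to 7) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the class name `FPNode`, the function `build_fp_tree`, the alphabetical tie-breaking rule, and the sample transactions are all hypothetical. It covers only the construction of the FP-tree, not the subsequent mining of frequent patterns from it.

```python
from collections import Counter


class FPNode:
    """One node of the FP-tree: an item, its counter, and its children."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support_count):
    # First scan: count each item and discard the infrequent ones.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_support_count}

    root = FPNode(None)
    # Second scan: insert each transaction with its frequent items sorted
    # by decreasing support (ties broken alphabetically, an assumption).
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            # Shared prefix: reuse the node; otherwise start a new branch.
            if item not in node.children:
                node.children[item] = FPNode(item, parent=node)
            node = node.children[item]
            node.count += 1
    return root, frequent


# Hypothetical transactions of student attributes.
data = [
    ["pass", "attend", "homework"],
    ["pass", "attend"],
    ["pass", "homework"],
    ["attend", "late"],
]
tree, item_counts = build_fp_tree(data, min_support_count=2)
```

Transactions that share a prefix share a path in the tree, which is why the structure stays compact even when many transactions are inserted.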

B. ECLAT

Eclat explores the itemset search space bottom-up in a depth-first manner. Its procedure for finding frequent itemsets is very simple. The algorithm works on a vertical database layout and cannot operate on the horizontal layout directly; a horizontal database must first be converted to vertical form. The database does not need to be scanned repeatedly: Eclat scans it only once. The algorithm counts support; it does not compute confidence.
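A minimal sketch of this idea, assuming the horizontal data are first converted to a vertical layout (item to set of transaction IDs), is given below. The function names `to_vertical` and `eclat` and the sample data are hypothetical; the sketch returns support counts only, in line with the description above.

```python
def to_vertical(transactions):
    """Convert horizontal transactions into item -> set of transaction IDs."""
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    return tidsets


def eclat(transactions, min_support_count):
    """Return {frozenset(itemset): support count} for all frequent itemsets."""
    vertical = to_vertical(transactions)  # the single pass over the data
    frequent = {}

    def extend(prefix, candidates):
        # candidates: list of (item, tidset of prefix + item), all frequent
        for idx, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            frequent[frozenset(itemset)] = len(tids)
            # Depth-first: intersect with the tidsets of the later items.
            next_candidates = []
            for other, other_tids in candidates[idx + 1:]:
                shared = tids & other_tids
                if len(shared) >= min_support_count:
                    next_candidates.append((other, shared))
            if next_candidates:
                extend(itemset, next_candidates)

    start = sorted(((item, tids) for item, tids in vertical.items()
                    if len(tids) >= min_support_count),
                   key=lambda pair: pair[0])
    extend(set(), start)
    return frequent


# Hypothetical transactions of student attributes.
data = [
    ["pass", "attend", "homework"],
    ["pass", "attend"],
    ["pass", "homework"],
    ["attend", "late"],
]
result = eclat(data, min_support_count=2)
```

The support of an itemset is simply the size of the intersection of its tidsets, so no further scan of the original transactions is needed after the conversion.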

C. DECLAT

Instead of tidsets, Declat uses diffsets: a generated itemset is represented by the difference between the transaction ID sets of its two generating itemsets. Diffsets significantly reduce the cardinality of the stored sets, leading to faster intersection and lower memory usage [20]. Let t(P) be the tidset of P. The diffset d(PX) is the set of transaction IDs that are in t(P) but not in t(PX); formally, d(PX) = t(P) − t(PX) = t(P) − t(X). Suppose t(PX) and t(PY) are available, where PX and PY are in the same conditional database with prefix P, and the aim is to compute support(PXY). By definition, support(PXY) = support(PX) − |d(PXY)|. To calculate the support of PXY it is therefore only necessary to compute d(PXY). It has been shown that d(PXY) = d(PY) − d(PX) = t(PX) − t(PY), which can be used to generate the diffsets or tidsets of a generated itemset together with its support.
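The diffset relations can be checked on a small worked example. The tidsets below are hypothetical values chosen only to illustrate the identities; the variable names mirror the notation t(·) and d(·) used above.

```python
# Hypothetical tidsets: transaction IDs that contain each itemset.
t_P = {0, 1, 2, 3, 4, 5}   # t(P)
t_X = {0, 1, 2, 4, 7}      # t(X)
t_Y = {0, 2, 3, 4, 8}      # t(Y)

t_PX = t_P & t_X           # t(PX)
t_PY = t_P & t_Y           # t(PY)
t_PXY = t_PX & t_PY        # t(PXY), computed the tidset (Eclat) way

# Diffsets relative to the prefix P: d(PX) = t(P) - t(PX) = t(P) - t(X)
d_PX = t_P - t_PX
d_PY = t_P - t_PY

# Diffset step: d(PXY) = d(PY) - d(PX), which equals t(PX) - t(PY)
d_PXY = d_PY - d_PX

# Supports follow from diffset sizes alone.
support_PX = len(t_P) - len(d_PX)
support_PXY = support_PX - len(d_PXY)
print(d_PXY, support_PXY)
```

Here d(PXY) contains a single transaction ID, whereas t(PXY) contains three, which illustrates why diffsets tend to be smaller than tidsets on dense data.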

D. Apriori Algorithm

Apriori's main strength is that it prunes many unnecessary candidate itemsets, which reduces computation time. In recent decades, computers have been used in teaching and learning through various e-learning systems, yet there has always been a need for improvement. As students learn and study content, they leave a trail of log information. This raises an important research question: what can we do with this information? Such data can be easily recorded and analyzed in an e-learning system.

Procedure Apriori(T, minSup) {

    // T is the dataset and minSup is the minimum support value
    // Ck: candidate itemsets of size k
    // Lk: frequent itemsets of size k

    L1 = {frequent 1-itemsets};

    for (k = 1; Lk != φ; k++) do begin

        Ck+1 = candidates generated from Lk;

        for every transaction t in T do {

            increment the count of all candidates in Ck+1 that are contained in t

        }

        Lk+1 = candidates in Ck+1 with support >= minSup

    end

    return the union of all Lk

}
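A runnable sketch of the procedure above is given below, assuming that support is expressed as an absolute count and that candidates are generated by joining frequent k-itemsets and pruning those with an infrequent subset. The function name `apriori` and the sample transactions are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support_count):
    """Return {frozenset(itemset): support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # One pass over the dataset: count the candidates contained in each t.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_support_count}

    items = {frozenset([i]) for t in transactions for i in t}
    current = count(items)            # L1
    frequent = dict(current)
    k = 1
    while current:
        # Join step: unions of two frequent k-itemsets that have size k + 1.
        candidates = {a | b for a in current for b in current
                      if len(a | b) == k + 1}
        # Prune step: every k-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k))}
        current = count(candidates)   # L(k+1)
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical transactions of student attributes.
data = [
    ["pass", "attend", "homework"],
    ["pass", "attend"],
    ["pass", "homework"],
    ["attend", "late"],
]
frequent_itemsets = apriori(data, min_support_count=2)
```

Unlike the vertical approaches, this version rescans the transactions once per level k, which is the main cost that the pruning step is meant to limit.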

IV. LITERATURE SURVEY

This section reviews work related to learning from educational data with various ARM and DM algorithms.

Jalota and Agrawal presented a study discussing various DM techniques that are useful for predicting student performance levels. To evaluate the DM process, they used the Kalboard 360 dataset in Weka. Institutes of higher education are often greatly concerned about the success rates of their students. They must therefore apply several methods for predicting student performance, such as physical examination, statistical methods, or existing DM techniques. EDM is an emerging research area that uses DM techniques. It comprises machine learning algorithms and statistical techniques that help the user interpret learning habits and academic performance and, if needed, improve them further [12].

Ramaphosa, Zuva, and Kwuimi compared four classification algorithms and identified the best-performing one. Based on their results, the J48 algorithm outperformed the other algorithms with a prediction accuracy of 99.13%. A key objective of educational institutions is to further enhance the quality of education, and useful information obtained from EDM systems helps in predicting learner performance. EDM is a new DM technology used specifically to enhance education quality. Students' academic records were used with the Naive Bayes, BayesNet, JRip, and J48 classification algorithms in Weka [13].

Doko and Bexheti carried out a systematic mapping study to explore and analyze existing research on learning strategies and the implementation of algorithms. They selected 122 papers and identified the fields of interest and the technologies used in the flipped classroom (FC). A traditional flipped classroom is defined by students watching video instruction at home, which frees class time for important learning experiences with their teachers. With the introduction of the flipped classroom model and the rapid development of data science, increasing attention has been paid to using educational data mining and analytics to exploit learner data for optimizing the learning process. The large amounts of data generated by students' actions in the FC have made it possible to apply different learning and EDM algorithms based on data and autonomous learning [14].

Hidayat et al. conducted a study of student motivation in an e-learning environment. They used a list of learning program events from the Distance Learning System run by the second row of the APTIKOM Consortium. Association rule (AR) and classification strategies were applied in this analysis to identify information patterns and to reorganize virtual training on the basis of the patterns found. An expected contribution of this research is a model of the data preparation procedure, with its stages, for Moodle log data that other researchers can use. For future research, various data sets and techniques are suggested for understanding diverse expectations for outcomes. EDM is a branch of knowledge that is closely related to educational data. It mainly uses computational approaches to analyze educational data in order to gather information on learning contexts. With a learning management system, EDM can be applied to learning materials and the learning environment [15].

Angra and Ahuja focused on DM applications in education and implemented three widely used DM techniques in RapidMiner on data collected through a survey. Data mining in education, known as EDM, offers a new step in data analysis by using machine learning techniques in conjunction with traditional methods. EDM was found to be an interesting and useful research field for finding strategies to improve education quality and for discovering various patterns in educational settings. It is valuable for mining data about students, teachers, classes, and the managers of educational institutes such as schools, colleges, and universities, and it can benefit the various stakeholders interested in learning experiences [16].

| S. No. | Author | Title | Idea in Paper |
|---|---|---|---|
| 1 | J. Luis Cavalcanti Ramos et al. | A comparative analysis of educational data mining clustering approaches | Clustering students into specific groups by use of hierarchical and non-hierarchical methods in terms of interaction or performance [17]. |
| 2 | S. Roy & S. N. Singh | Current trends in large data applications in EDM or learning analysis | In order to mine these huge pieces of data, big data technology entered the educational field. |
| 3 | Q. Liu et al. | Online mining discussion data for teachers' understanding of reflection | A large or unexplored online text data set of trained classification models was provided or two kinds of results visualizations were given. |
| 4 | C. Jalota & R. Agrawal | Classification study in EDM | Machine learning algorithms or statistical techniques are included to help the user understand the learning patterns of students [12]. |

Table 1: Ideas from the literature and related overview

V. COMPARISON RESULTS

1. Confidence and Support Representation for FP Growth.

Fig. 1. Confidence vs execution time in FP Growth

Fig. 2. Support vs execution time in FP Growth

2. Confidence and Support Representation for Eclat.

Fig. 3. Confidence vs execution time in Eclat

Fig. 4. Support vs execution time in Eclat

3. Confidence and Support Representation for Declat.

Fig. 5. Confidence vs execution time in Declat

5. Comparison based on time efficiency

Comparing the time efficiency of the three algorithms, we found that FP Growth takes the least time to generate frequent itemsets from the educational data set.

Fig. 7. Comparison based on time efficiency

VI. CONCLUSION

The use of EDM offers educational institutions a defining opportunity to improve their curriculum, learning, and conduct as students learn. Extracting associations between the attributes of student success has tremendous potential to contribute to the understanding of student mindset. Different algorithms can also be used to reveal interesting relationships within educational data. The association rules generated by these four algorithms can be combined to create effective algorithms that give better results in real-life applications. The algorithms were studied systematically, and their output was evaluated on the basis of runtime and theory. The comparison shows that the algorithms differ in performance and time efficiency, with FP Growth requiring the least time to generate frequent itemsets.

References

[1] Jeetesh Kumar Jain, Nirupama Tiwari, Manoj Ramaiya “A Survey: On Association Rule Mining” International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 3, Issue 1, January -February 2013, pp.2065-2069.

[2] Ougiaroglou, S., & Paschalis, G. (2012). Association Rules Mining from the Educational Data of ESOG Web-Based Application. Artificial Intelligence Applications and Innovations, 105–114. DOI: 10.1007/978-3-642-33412-2_11

[3] Hamalainen, W., Suhonen, J., Sutinen, E., Toivonen, H.: Data mining in personalizing distance education courses. In: World Conference on Open Learning and Distance Education, Hong Kong (2004)

[4] M. Gönen and E. Alpaydın, “Multiple kernel learning algorithms,” Journal of Machine Learning Research, vol. 12, no. Jul, pp. 2211–2268, 2011.

[5] A. Kumar, A. Niculescu-Mizil, K. Kavukcoglu, and H. Daumé, “A binary classification framework for two-stage multiple kernel learning,” in Proceedings of the 29th International Conference on Machine Learning. Omnipress, 2012, pp. 1331–1338.

[6] W. Lin and G. Chen, "Large Memory Capacity in Chaotic Artificial Neural Networks: A View of the Anti-Integrable Limit," in IEEE Transactions on Neural Networks, vol. 20, no. 8, pp. 1340-1351, Aug. 2009. DOI: 10.1109/TNN.2009.2024148

[7] Kumar, J. (2015). "A Comprehensive Study of educational Data mining". IJEECSE.

[8] Rashi Bansal et al, “Mining of Educational Data For Analysing Students’ Overall Performance”, 7th International Conference on Cloud Computing, Data Science & Engineering – Confluence, 2017, pp. 495-497.

[9] A. F. Meghji and N. A. Mahoto, “Using big data to improve the educational infrastructure and learning paradigm,” in Effective Big Data Management and Opportunities for Implementation. IGI Global, 2016, pp. 158–181.

[10] R. S. Baker, “Modeling and understanding students’ off-task behavior in intelligent tutoring systems,” in Proceedings of the SIGCHI conference on Human factors in computing systems. ACM, 2007, pp. 1059–1068.

[11] Han, J., Pei, J., and Yin, Y., Mining Frequent Patterns without Candidate Generation, In Proc. Conf. on the Management of Data (2000).

[12] C. Jalota and R. Agrawal, "Analysis of Educational Data Mining using Classification," 2019 International Conference on Machine Learning, Big Data, Cloud and Parallel Computing (COMITCon), Faridabad, India, 2019, pp. 243-247. DOI: 10.1109/COMITCon.2019.8862214

[13] K. I. M. Ramaphosa, T. Zuva, and R. Kwuimi, "Educational Data Mining to Improve Learner Performance in Gauteng Primary Schools," 2018 International Conference on Advances in Big Data, Computing and Data Communication Systems (icABCD), Durban, 2018, pp. 1-6. DOI: 10.1109/ICABCD.2018.8465478

[14] E. Doko and L. A. Bexheti, "A systematic mapping study of educational technologies based on educational data mining and learning analytics," 2018 7th Mediterranean Conference on Embedded Computing (MECO), Budva, 2018, pp. 1-4. DOI: 10.1109/MECO.2018.8406052

[15] N. Hidayat, R. Wardoyo and S. Azhari, "Educational Data Mining (EDM) as a Model for Students' Evaluation in Learning Environment," 2018 Third International Conference on Informatics and Computing (ICIC), Palembang, Indonesia, 2018, pp. 1-4. DOI: 10.1109/IAC.2018.8780459

[16] S. Angra and S. Ahuja, "Implementation of data mining algorithms on student's data using rapid miner," 2017 International Conference on Big Data Analytics And Computational Intelligence (ICBDAC), Chirala, 2017, pp. 387-391. DOI: 10.1109/ICBDACI.2017.8070869

[17] J. Luis Cavalcanti Ramos, R. Euller Dantas e Silva, J. Carlos Sedraz Silva, R. Lins Rodrigues, and A. Sandro Gomes, "A Comparative Study between Clustering Methods in Educational Data Mining," in IEEE Latin America Transactions, vol. 14, pp. 3755-3761, Aug. 2016. DOI: 10.1109/TLA.2016.7786360
