Applying Classifier Algorithms to Organizational Memory to Build
an Attrition Predictor Model
K. M. SUCEENDRAN1, R. SARAVANAN2, DIVYA ANANTHRAM3, Dr.S.POONKUZHALI4, R.KISHORE KUMAR5, Dr.K.SARUKESI6
1Research Scholar, Hindustan University, & Senior Consultant, Tata Consultancy Services, Chennai. 2Consultant,3Business Analyst, TCS, Chennai.
4Professor,5Assistant Professor, Dept. of IT, Rajalakshmi Engineering College, Chennai. 6Dean, Kamaraj College of Engineering & Technology, Virudhunagar.
INDIA.
[email protected], [email protected], [email protected],
[email protected], [email protected], [email protected]
Abstract:Employee Engagement is a strong factor in driving the growth of a business or an organization. Employ-ees who are committed to their respective organizations and are constantly motivated to work, contribute effectively to the growth of the organization’s business. On the other hand, disengaged employees are capable of digressing from organizational goals and influencing the others around too, impacting efficiency in all quarters. In spite of monetary benefits, there are other psychological and situational factors that impact an employee’s level of engage-ment. This paper discusses how exit interview details of employees can be combined with organizational memory to classify causative factors triggering attrition and construct digital persona of attrition-likely employees. In this paper, eight classifier algorithms are used to create and classify specific attrition clusters. These algorithms are evaluated on the basis of precision, recall, F-measure and ROC measure. The digital persona thus formed from the clusters can be applied to the entire workforce to identify attrition-likely associates. Additionally, an appropriate engagement strategy can be formulated to suit the possible reasons based on the classifier algorithm.
Key–Words:Attrition, Classification, Digital Profile, Organizational Memory, Tacit Knowledge.
1
Introduction
Employee connect is a critical attribute of the re-lationship between an employee and the enterprise. The research on employer-employee relationship is not new and the term ’employee engagement’ was de-fined in the early 1990s. William Kahn provided the first formal definition of employee engagement as ”the harnessing of organization members’ selves to their work roles; in engagement, people employ and ex-press themselves physically, cognitively, and emotion-ally during role performances”[6]. Over the years, the emphasis and perspective on this relationship have un-dergone many changes: In the 70’s it started with ’sat-isfaction’ which shifted to ’commitment’ in the 80’s. In the 90s, the process of engaging employees be-came the focus of HR practitioners, particularly in the knowledge enterprises[15]. The imperative of demon-strating the value of HR gave impetus to many studies to correlate employee engagement with tangible busi-ness outcomes. Many organizations with large work-force comprising knowledge workers are confronted with the challenge of effectively leveraging employee engagement. Apart from monetary benefits and
in-centives, there are other psychological and situational factors that affect an employee’s level of engagement [12]. A recent study covering 142 countries indicates that only 13 percent of employees across the globe are engaged at work. This translates into only about one in eight employees is actually committed to his job[5]. This study from Gallup reveals that most employees, 63 percent, are not engaged and 24 percent are ac-tively disengaged. Furthermore, this study also speaks of the employee engagement levels with respect to the Indian scenario. The study says only 9% of employ-ees are engaged, while 31% are actively disengaged. However there is large variation in engagement lev-els when comparing Indian employees of different job types and education background. Approximately 17-18% of people holding supervisorial, managerial, pro-fessional or sales related roles are engaged in their work[5].
Effective employee engagement depends on the successful connect between employees and organi-zational representatives including supervisors, senior leaders, HR personnel[7]. Consistent and effective communication with the employees is imperative for
employee engagement and retention[14]. Challenges exist in the form of nature and scale of workforce, type of connects, degree of subjectivity inherent in the process and more importantly, not leveraging the organizational memory about the employee involved in the connect. Employee Connects refers to vari-ous situations where an employee has an opportunity to connect and discuss with an HR personnel or a leader/supervisor. In this paper we have considered the exit interview data as our connects. Exit inter-views and discussion with the HR as part of sepa-ration process are done so as to understand the rea-sons of separation of the associate from the company. These connects being the last in an associate’s tenure with the company will bring out opinions and feed-back from the associate and the reasons will be useful for analysts to study trends of attrition and arrest the same effectively[11]. In this paper, exit interview de-tails from a leading Information Technology organiza-tion in India are taken as input data. The data is used for analyzing the various classification techniques us-ing WEKA data minus-ing tool[16]. In this evaluation process, different features are considered for choos-ing best algorithm which classifies the employees into different categories of profiles, according to the rea-son stated for leaving the organization. Finally perfor-mance evaluation is done to analyze the various classi-fication algorithms to select the best classifier for em-ployee exit details.
Outline of the paper:
Section 2 presents related works on employee attri-tion analysis using data mining algorithms. Secattri-tion 3 represents the Need for embedding K Elements in HR processes. Section 4 represents the framework of the proposed system, Section 5 gives results and discus-sion and Section 6 is the concludiscus-sion.
2
Related Works
Alao D. & Adeyemo A. B. analyzed the role of De-cision tree algorithms in their research on employee attrition. Employees’ demographic data and work re-lated data were used to classify employee details into attrition classes [1]. Waikato Environment for Knowl-edge Analysis (WEKA) and C4.5 for Windows were used to generate decision tree models and rule-sets. Decision trees used in data mining are of two main types namely, Classification tree analysis, where pre-dicted outcome is the class to which the data belongs and Regression tree analysis, where predicted out-come can be considered a real number. The term Clas-sification and Regression Tree (CART) analysis refers
to the above procedures. Alao et al, used the classi-fiers C4.5(J48), REPTree and CART (SimpleCart) de-cision tree algorithms. Attribute importance analysis was used to determine the significance of attributes. F-measure and the AUC probability tree learning mea-sures were used as evaluation metrics for the classifier models generated[1].
Decision trees are more flexible data modeling techniques than neural networks. Missing data values are managed by decision tree algorithms and models can be built with missing values, whereas in neural network and regression models, missing values need to be inserted. Jayanthi et al (2008) presented the role of data mining in Human Resource Management Systems (HRMS). The paper explains about datamin-ing techniques and the ability of these algorithms to improve decision making processes in HRMS, so as to sustain competitive advantage of the organization among competitors[4]. Hamidah et al suggested that in future works, attribute reduction should be con-ducted to identify relevant attributes for each of the factor[2].
In the study Employee Turnover Analysis with Application of Data Mining Methods, the analysis was carried out by use of decision trees, logistic re-gression and neural networks. The data models were built using SEMMA methodology. SEMMA (Sample, Explore, Modify, Model, and Assess) was developed by the SAS Institute.[13]
Hsin Yun Chang, in the study on Employee Turnover, explains about feature selection, also known as feature subset selection. Feature selection is a com-mon data preprocessing method. Yun Chang uses a mixed feature subset selection method in the study combining Taguchi method and Nearest Neighbour Classification rules to select and analyze factors and find best predictor of employee attrition[3].
S.Poonkuzhali et al. anayzed various classifica-tion algorithm for predicting efficient classifier for T53 Mutants[9]. R.Kishore Kumar et al. performed comparative analysis for predicting best classifiers for emails spam[8].
3
Need for Embedding K-Elements
and Creating Digital Persona
With many of the non-core HR functions getting pro-gressively digitized or outsourced, the no. of HR personnel actively involved in employee engagement is few in comparison to the size of the workforce. Knowledge-intensive organizations such as IT com-panies have one HR associate for every 1000 employ-ees. With such a skewed ratio, HR associates meet
Figure 1: K- Elements framework in Employee Connects
only a portion of the total associates, the number in accordance with their performance targets. Associates chosen by HR associates is mostly arbitrary and by convenience. There is no definitive process which is derived from the business imperatives or an enabling system which presents HR persons with a complete digital profile. In our earlier publication “Embed-ding K-Elements in HR Processes”, it was highlighted that data obtained from HR processes is not consoli-dated or seldom used in real time[10]. Though HR increasingly plays the role of a strategic partner and contributes also to operational management, there is often no formal framework for capturing, consolidat-ing, and continually refining tacit data arising from HR touch points. Major HR touch points are men-toring, career counseling, staffing, grievance sessions, and exit interviews. At each of these interactions, pertinent tacit details about the employee are elicited and used in the context of the process. These tacit details are mostly ’soft data’ and are subjective and time-specific in nature. The paper proposes to select set of associates in line with a specific business area in order to identify sample profiles for the business area. Example of business areas include predicting as-sociates for attrition (with respect to this paper), High-potentials, transfer, counseling and associates likely to become disenchanted or potentials for right-sizing.
4
Framework for the Proposed
Sys-tem
The K-Elements framework in Employee Connects is shown in Figure 1.
Employees who initiate separation from the or-ganization, are reached to in exit interview sessions. Here the feedback of the employee regarding the or-ganization and the primary reason for leaving are
dis-cussed. These tacit details are recorded by the HR and consolidated along with the employees demographic and job related details. These exit details for a certain period are generated and preprocessed for removing inconsistencies and missing values. Classifier algo-rithms are applied onto the data and the employee pro-files are classified. Once the best classifier algorithms are identified, and applied to the data, the attributes of employees falling under different categories and groups can be studied. The employee exit details are analyzed on the basis of their reason furnished for leaving the organization and the classifier algo-rithm categorizes the profiles into relevant groups. A few sample profiles representing each particular group can be obtained. Thus, the sample profiles of em-ployees most likely to exit the company can be con-structed. These digital personae or sample profiles, can be used as a reference to select the next set of ployees to conduct employee connect sessions, em-ployees whose profiles are similar to the digital pro-files identified by the classifier algorithm.
5
Results and Discussion
Employee exit interview details are collected from a leading IT organization, during the year 2013 which consist of 2572 records with 13 attributes (12-input attributes,1-target attribute) and are listed in Table 1. The Employee exit dataset is loaded into the WEKA data mining tool after preprocessing. Then 8 clas-sification algorithms namely ID3, J48, Naïve Bayes, Bayesian Network, K Star, IBK, Random tree and Random Forest are applied. The results of these 8 classification algorithms are depicted in Table 2. The results portrayed exhibit accuracy, precision, recall, F-measure and ROC value. The performance of all these classifiers are analyzed based on the accuracy to predict the best classifier. Here, IBK an instance
based classifier and Random tree an ensemble clas-sifiers have been identified as a best classifier for this Employee exit interview dataset as they classified with an accuracy of 97.74%.
Table 1: Attribute List of Exit Interview Data Set S.No Attribute List
1 Gender 2 Grade 3 Age 4 Designation 5 Business Unit 6 Resignation Quarter 7 Account/Customer 8 Total Experience
9 Performance Rating on confirmation 10 Last Performance Rating
11 Qualification
12 Role
13 Primary reason for leaving the organization
Table 2: Results of various classifiers
Classifiers Accur Preci Reca F_M ROC
ID3 41.21 0.52 0.41 0.35 0.82 J48 52.09 0.59 0.52 0.47 0.84 Naive Bayes 40.80 0.40 0.42 0.34 0.80 Bayes Net 45.87 0.45 0.45 0.39 0.83 KStar 97.4339 0.97 0.97 0.97 1 IBK 97.7449 0.97 0.97 0.97 1 R_Tree 97.74 0.97 0.97 0.97 1 R_Forest 95.91 0.96 0.95 0.95 0.99 Legend Accur Accuracy Preci Precision Reca Recall F_M F-Measure
ROC Receiver Operating Characteristic
R_Tree Random Tree
R_Forest Random Forest
Once the best classifier algorithms are identified, and applied on to the data, the attributes of employees falling under different categories and groups can be studied. The employee exit details are analyzed on the basis of their reason furnished for leaving the organi-zation and the classifier algorithm categorizes the pro-files into relevant groups. Thus, the sample propro-files of
employees most likely to exit the company can be de-duced and formed. These digital personae or sample profiles, can be used as a reference to target the next set of employees to conduct employee connect ses-sions. Employees whose digital persona match with the attrition likely profile can be identified and con-nected with for proactive engagement. Thus random-ness and unstructuredrandom-ness of the Employee Connects session can be mitigated and potential attrition can be arrested considerably.
5.1 IBK(K-Nearest Neighbor):
IBK is a K-Nearest Neighbor classifier that uses that same distance metric for classification. It is an instance based classifier in which the class of the test instance is based on the class of those training instances similar to it as determined by the similarity function based on distance. This algorithm uses normalized distances for all attributes. An object is classified by a majority vote of its neighbor, with the object being assigned to the class most common among its K nearest neighbors.
Pseudo for K-Nearest Neighbor Classification Algorithm:
Input:HRMS Training dataset. Output: Decision tree.
Step 1: Determine the parameter K.
Step 2: Compute the distance between the query-instance and all the training attributes using Euclidean distance algorithm. D(q, p) = v u u t n X i=0 (qi−pi)2 (1)
Step 3: Sort the training records based on the distance. Step 4: Find the nearest neighbor based on the Kth minimum distance.
Step 5: Get all the Categories of the training data for the sorted value which fall under K.
Step 6: Predict the measured value using the majority of nearest neighbors.
5.2 Random Tree:
Random tree is an ensemble classifier that consist of many class of many decision trees and outputs the class that is the mode of class output by individual trees. Each tree is fully grown and not pruned. It runs efficiently on large data bases. For predicting a new record is pushed down the tree. It is assigned the label
of the training sample in the terminal node it ends up in. This procedure is iterated over all trees in the ensemble, and the average vote of all trees is reported as random forest prediction.
Pseudo for Random Tree Algorithm: Input:HRMS Training dataset.
Output:Stereotyping classification of Employees. Step 1: For 1 to N do (N -Number of employee records in HRMS dataset D)
Step 2: Select ‘m’ input attributes at random from the ‘n’ total number of attributes in dataset D.
Step 3: Find the best spilt point among the ‘m’ attributes according to a purity measure based on Gini index. G= 2Pn i=1 iyi nPn i=1 yi −n+ 1 n (2)
Step 4: Spilt the node into two different nodes on the basis of split point.
Step 5: Repeat the above steps for different set of records to construct possible decision trees.
Step 6: Ensemble all constructed trees into single forest for classifying stereotyping digital profiles of HRMS data.
5.3 Sample Rules from Random Tree Classi-fier
If Designation = Middle Level Manager1 and Gender = Male
and Resignation Quarter = QN YYYY and Rating on Confirmation = Average
and Role = Design Lead then class = Better career opportunities
A. Accuracy
Accuracy of a classifier was defined as the percent-age of the dataset correctly classified by the method. The accuracy of all the classifiers used for classifying employee exit interview dataset dataset is represented graphically in Figure 2.
Accuracy = N o. of correctly classif ied samples T otal N o. of samples in the class
(3)
Figure 2: Accuracy of various Classification Algo-rithms
B. Recall
Recall of the classifier was defined as the percentage of errors correctly predicted out of all the errors that actually occurred.
Recall= T rueP ositive
T rueP ositive+F alseN egative (4)
C. Precision
Precision of the classifier was defined as the percent-age of the actual errors among all the encounters that were classified as errors.
P recision= T rueP ositive
T rueP ositive+F alseP ositive
(5)
D. F-Measure
F-Measure is defined as the measure of a test’s accu-racy and can be interpreted as a weighted average of the precision and recall.
F−M easure= 2∗ P recision∗Recall P recision+Recall (6)
E. ROC
Receiver Operating Characteristic curve (or ROC curve) provides a graphical plot of the true positive rate against the false positive rate for the different pos-sible cutpoints of a test. Since the curve follows the left-hand border and then the top border of the ROC space, the classifier is more accurate. The terms pos-itive and negative refer to the classifier’s prediction,
and the terms true and false refer to classifier’s ex-pectation. The precision, recall, F-measure and ROC value for the IBK and Random tree classifiers is de-picted in Figure 3 & Figure 4.
Figure 3: Performance Analysis of IBK classifier
Figure 4: Performance Analysis of Random Forest classifier
6
Conclusions and Future Work
The employee exit interview details have been clas-sified through IBK and Random Tree classifier suc-cessfully in this work. These classifiers have catego-rized the employer profiles into relevant groups based on the reason furnished for leaving the organization. Thus, the sample profiles of attrition-likely employees can be deduced and formed from these groups. The following are the research outcomes:• Design of the K-Elements framework in Em-ployee Connects
• Evolving of the best classifier algorithm for em-ployee exit data.
• Performance analysis on various metrics such as Precision, Recall, F-Measure and ROC for the best classifier.
Further, these digital persona thus constructed attrition-likely employees can be applied to the entire workforce to predict attrition-likely associates. Ad-ditionally, an appropriate engagement strategy can be formulated to suit the possible reasons based on the classifier algorithm. In future, this approach can be iteratively improved and scaled up as and when better employee connects data is generated and recorded.
References:
[1] Alao D. and Adeyemo A. B., Analyzing Em-ployee Attrition Using Decision Tree Algo-rithms, Computing,Information Systems & De-velopment InformaticsVol. 4, No. 1,2013. [2] Hamidah J, AbdulRazak H, and Zulaiha A. O,
“Towards Applying Data Mining Techniques for Talent Managements”, International Con-ference on Computer Engineering and Applica-tions, IPCSIT Vol. 2, IACSIT Press, 2009. [3] Hsin-Yun Chang, “Employee Turnover: A
Novel Prediction Solution with Effective Feature Selection”,In WSEAS Transactions on Informa-tion Science and ApplicaInforma-tions., Vol. 6, No. 3, 2009, pp. 417- 426.
[4] Jayanthi, R., Goyal, D.P., Ahson, S.I., “Data Mining Techniques for Better Decisions in Hu-man Resource Management Systems”, Interna-tional Journal of Business Information Systems, Vol. 3, No. 5, 2008, pp. 464-481.
[5] Jim Clifton “State of the Global Work-place”, Gallup Inc, 2013, Available: http://ihrim.org/Pubonline/Wire/Dec13/Global WorkplaceReport_2013.pdf
[6] Kahn, William A , “Psychological Conditions of Personal Engagement and Disengagement at Work”,Academy of Management Journal, Vol. 33, No.4, 1990, pp. 692-724.
[7] Kim Harrison, “Good Communica-tion can hugely lift employee engage-ment”, Cutting Edge PR, 2013 Available: http://www.cuttingedgepr.com/articles/good- communication-can-hugely-lift-employee-engagement.asp
[8] Kishore Kumar. R, Poonkuzhali. G, Sudhakar. P "Comparative Study on Email Spam Classifier using Data Mining Techniques",Proceedings of International Multiconference on Engineers and Computer Scientist, Vol.1, 2012.
[9] Poonkuzhali. s, Geetha Ramani, Kishore Kumar. R, "Efficient Classifier for TP53 Mutants using Feature Relevance Analysis",Proceedings of In-ternational Multiconference on Engineers and Computer Scientist, Vol.1, 2012.
[10] Suceendran. K. M, Saravanan Ramachandran, Sarukesi K, “Embedding K-Elements in HR Pro-cesses” ,Advances in Engineering, Science and Management (ICAESM), 2012.
[11] Susan M Heathfield, “4 Tips for Effective Exit Interviews”, Human Resources, Available: http://humanresources.about.com/od/when- employment-ends/fl/4-Tips-for-Effective-Exit-Interviews.htm
[12] Sylvia Vorhauser-Smith, “How the Best Places to Work are Nailing Employee Engagement”, August 2013, Available: http://www.forbes.com/sites/sylviavorhausersmith/ 2013/08/14/how-the-best-places-to-work-are-nailing-employee-engagement/
[13] Tamizharasi. K, Dr. UmaRani, “Employee Turnover Analysis with Application of Data
Mining Methods”, International Journal of Computer Science and Information Technolo-gies, Vol. 5, No. 1, 2014, pp 562-566.
[14] Thomas R Cutler, “Employee Engage-ment, retention and communication, Busi-ness Excellence”, August 2014. Available: http://www.bus-ex.com/article/employee-engagement-retention-and-communication [15] Tom O’Byrne, “History of employee
engagement - from satisfaction to sus-tainability”, HR Zone,2013, Available: http://www.hrzone.com/feature/people/history-
employee-engagement-satisfaction-sustainability/141048