Study on Predictive Analytics for Predicting Human Productivity of an Organization Based on Employees’ Performance


Karthick J, AP, Dept of CSE, MVJCE, Bengaluru
Rubia Ahmed, Student, Dept of CSE, MVJCE, Bengaluru
Niranjani R, Student, Dept of CSE, MVJCE, Bengaluru
Pavan Kumar D, Student, Dept of CSE, MVJCE, Bengaluru

Abstract— Human capital is a major concern for company management, whose main interest is in hiring highly qualified personnel who are expected to perform well. Predictive analytics is an advanced branch of data engineering that analyses historical data to estimate the probability of future occurrences or events. The main objective of this work is to produce an employee's performance appraisal report using predictive analytics. Performance is determined by testing an employee's attributes against the rules generated by a decision tree classifier. This paper concentrates on collecting employee data through a user interface, generating a decision tree from historical data, and testing the decision tree with the attributes of an employee. Predictive analytics techniques are used to predict employee performance, with the aim of predicting it more efficiently than the existing system by using recent prediction algorithms. We define the performance of a frontline employee as his/her productivity compared with his/her past performance. The paper also examines the possibility of building two or three prediction algorithms for predicting employee performance and picking the one best suited to a specific organization.

Index Terms—Data Mining, Employee Performance, Regression, Prediction, Decision Tree.

I. INTRODUCTION

Business organizations are keen to establish plans for selecting the right employees. After hiring, management becomes concerned about the performance of these employees and builds evaluation systems in an attempt to retain good performers. Data mining techniques are analytical tools that can be used to extract meaningful knowledge from large data sets. Data mining is a powerful technology with great potential in information systems. It can be defined as the automated process of extracting previously unknown, useful knowledge and information, including patterns, associations, changes, trends, anomalies and significant structures, from large or complex data sets. Data mining has several tasks, such as association rule mining, classification and prediction, and clustering. Predictive analytics is an advanced branch of data engineering that predicts an occurrence or probability based on data. It encompasses a variety of statistical techniques from predictive modelling, machine learning and data mining, and analyses current and historical facts to make predictions about the future.

Classification techniques are supervised learning techniques that assign data items to predefined class labels. Classification is one of the most useful data mining techniques for building models from an input data set, and the resulting models are commonly used to predict future data trends.

II. DATA MINING AND ITS TECHNIQUES

Data mining tasks are divided into two groups: predictive and descriptive. Classification, regression and time series analysis fall under the predictive group, while clustering, summarization, association rules and sequence discovery fall under the descriptive group.

A. Classification


Classification builds a model based on the class label and aims to assign a class label to future unlabeled records. Since the class field is known in advance, this type of classification is known as supervised learning. There are several classification models, such as decision trees, genetic algorithms and statistical models.

B. Regression

Regression is a statistical process for estimating the relationships among variables. It includes many techniques for modelling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables (or 'predictors'). More specifically, regression analysis helps one understand how the typical value of the dependent variable (or 'criterion variable') changes when any one of the independent variables is varied, while the other independent variables are held fixed. Most commonly, regression analysis estimates the conditional expectation of the dependent variable given the independent variables – that is, the average value of the dependent variable when the independent variables are fixed. Less commonly, the focus is on a quantile, or other location parameter of the conditional distribution of the dependent variable given the independent variables. In all cases, the estimation target is a function of the independent variables called the regression function. In regression analysis, it is also of interest to characterize the variation of the dependent variable around the regression function which can be described by a probability distribution.

Regression is a data mining function that predicts a number. Age, weight, distance, temperature, income, or sales could all be predicted using regression techniques. For example, a regression model could be used to predict children's height, given their age, weight, and other factors.
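A minimal sketch of the simplest case, assuming a single predictor and ordinary least squares: the fitted line estimates the conditional mean of the dependent variable given the independent variable. The ages and heights below are made-up illustrative numbers, and the function name `fit_simple_linear_regression` is hypothetical; a model using age, weight and other factors would need multiple regression.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares with one predictor: returns (intercept, slope)
    minimising the sum of squared errors of y approx. intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the common 1/n factors cancel
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: ages (years) and heights (cm) of four children
ages = [2, 4, 6, 8]
heights = [85, 100, 115, 130]
intercept, slope = fit_simple_linear_regression(ages, heights)
predicted = intercept + slope * 10  # predicted height at age 10
print(intercept, slope, predicted)  # 70.0 7.5 145.0
```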

C. Clustering

Clustering is a method of grouping data so that the data in each cluster share similar trends and patterns. Clustering algorithms form a major class of data mining algorithms. Such an algorithm tries to automatically partition the data space into a set of regions or clusters, to which the instances in the table are assigned either deterministically or probabilistically. The objective is to identify all sets of similar instances in the data in the best possible manner.
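The paragraph above does not name a specific clustering algorithm; k-means is one common deterministic partitioning method and is used here purely as an illustration. This is a minimal sketch, assuming numeric points, Euclidean distance and the first k points as initial centroids (a simplification; practical implementations use random or k-means++ initialisation). The data points are hypothetical.

```python
from math import dist

def k_means(points, k, iterations=100):
    """Plain k-means: assign each point to its nearest centroid, then move each
    centroid to the mean of its assigned points, until assignments stop changing."""
    centroids = [tuple(p) for p in points[:k]]  # simple deterministic initialisation
    assignment = None
    for _ in range(iterations):
        new_assignment = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return assignment, centroids

# Hypothetical 2-D points forming two obvious groups
points = [(1, 1), (9, 9), (1, 2), (2, 1), (8, 9), (9, 8)]
assignment, centroids = k_means(points, 2)
print(assignment)  # [0, 1, 0, 0, 1, 1]
```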

D. Association rules

Association rule mining is a descriptive data mining model that establishes associations and relationships among large numbers of data items based on certain attributes and characteristics. The resulting rules can help prevent failures by suggesting appropriate measures.
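Association rules are usually judged by their support (how often the items occur together) and confidence (how often the consequent holds when the antecedent does). A minimal sketch follows, assuming each employee record is encoded as a set of hypothetical `attribute=value` items borrowed from the attribute names in Table 1; the records themselves are invented for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) = support(A and C) / support(A)."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # antecedent never occurs, so the rule is never triggered
    return support(transactions, set(antecedent) | set(consequent)) / base

# Hypothetical employee records encoded as item sets
records = [
    {"Rank=Senior", "Dissatsalrank=No", "Performance=Exceed"},
    {"Rank=Senior", "Dissatsalrank=No", "Performance=Exceed"},
    {"Rank=Junior", "Dissatsalrank=Yes", "Performance=Accomplish"},
    {"Rank=Senior", "Dissatsalrank=Yes", "Performance=Accomplish"},
]
print(support(records, {"Dissatsalrank=No", "Performance=Exceed"}))       # 0.5
print(confidence(records, {"Dissatsalrank=No"}, {"Performance=Exceed"}))  # 1.0
print(confidence(records, {"Rank=Senior"}, {"Performance=Exceed"}))       # about 0.667
```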

Fig 1

III. RELATED STUDY

Qasem A. Al-Radaideh and Eman Al Nagi developed a classification model to test which attributes may affect job performance. The model was built in the following steps: business understanding, data understanding, data preparation, modelling, evaluation and deployment.

Data Classification Preliminaries: In general, data classification is a two-step process. In the first step, called the learning step, a model describing a predetermined set of classes is built from a training data set. In the second step, the model is tested on a different data set, which is used to estimate the model's classification accuracy.
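The two steps can be sketched with the smallest possible decision tree, a one-level tree (decision stump) that maps each value of a single attribute to the majority class seen in training. This is an illustrative sketch only, not the model built in the reviewed study; the function names, the choice of the `Rank` attribute and all records are hypothetical.

```python
from collections import Counter, defaultdict

def learn_stump(rows, labels, attribute):
    """Learning step: one rule per attribute value, predicting the majority
    class among the training rows that have that value."""
    by_value = defaultdict(list)
    for row, label in zip(rows, labels):
        by_value[row[attribute]].append(label)
    rules = {value: Counter(ls).most_common(1)[0][0] for value, ls in by_value.items()}
    default = Counter(labels).most_common(1)[0][0]  # used for values unseen in training
    return lambda row: rules.get(row[attribute], default)

def accuracy(model, rows, labels):
    """Testing step: fraction of held-out rows whose predicted class is correct."""
    return sum(model(r) == t for r, t in zip(rows, labels)) / len(labels)

# Hypothetical training and test sets (separate records)
train_rows = [{"Rank": "Senior"}, {"Rank": "Senior"}, {"Rank": "Junior"},
              {"Rank": "Junior"}, {"Rank": "Manager"}]
train_labels = ["Exceed", "Exceed", "Accomplish", "Accomplish", "Far Exceed"]
test_rows = [{"Rank": "Senior"}, {"Rank": "Junior"}, {"Rank": "Manager"}, {"Rank": "Junior"}]
test_labels = ["Exceed", "Accomplish", "Exceed", "Accomplish"]

model = learn_stump(train_rows, train_labels, "Rank")  # step 1: learning
print(accuracy(model, test_rows, test_labels))         # step 2: testing -> 0.75
```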

Data Collection Process and Data Understanding: The classification model for predicting performance was built on a data set collected from a single IT company, so that other factors related to the working environment, conditions, management and colleagues would have a similar effect on all employees.

Data Preparation: After the data were obtained, they were prepared for modelling. First, the information was converted to Excel. These files were then prepared and converted to ARFF format to be compatible with the WEKA data mining toolkit (Witten et al., 2011), which was used to build the model. The variables used include age, gender, job title and performance, as shown in Table 1.

Qasem A. Al-Radaideh and Eman Al Nagi [2012] used three data sets for the experiments. Using the decision tree technique, a tree was built for each experiment. The gain ratio measure was used to indicate how strongly each attribute affects the tested class, and the ordering of the tree nodes was determined accordingly. The techniques used were the decision tree in two versions, ID3 and C4.5 (J48 in WEKA), and the Naïve Bayes classifier. For each experiment, accuracy was evaluated using 10-fold cross-validation.
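C4.5-style gain ratio divides an attribute's information gain (the reduction in class entropy after splitting on it) by its split information (the entropy of the attribute's own value distribution), which penalises attributes with many values. A minimal sketch follows, assuming categorical attributes; the attribute columns and class labels are hypothetical and are not taken from the study's data.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def gain_ratio(attribute_values, labels):
    """Information gain of splitting on the attribute, divided by its split information."""
    n = len(labels)
    groups = {}
    for v, c in zip(attribute_values, labels):
        groups.setdefault(v, []).append(c)  # class labels of each branch
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy(attribute_values)
    return info_gain / split_info if split_info > 0 else 0.0

# Hypothetical columns: Rank separates the classes perfectly, Gender not at all
rank = ["Senior", "Senior", "Junior", "Junior"]
gender = ["Male", "Female", "Male", "Female"]
performance = ["Exceed", "Exceed", "Accomplish", "Accomplish"]
print(gain_ratio(rank, performance))    # 1.0
print(gain_ratio(gender, performance))  # 0.0
```

Ranking attributes by this value is how the tree's node ordering can be decided: the attribute with the highest gain ratio is placed nearest the root.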

The same procedure was followed for all the data sets, and the study concluded that several factors may have a great effect on employee performance. One of the most effective factors is the job title. The trend of the job title's effect is not very clear in the results, since about 20 job titles were studied, but it may be related to the complexity of the job and the responsibilities associated with the title. High responsibilities sometimes affect the employee's motivation, and therefore performance, in a positive way.

Table 1

Attribute | Description | Possible Values
*Age | Employee's age | a, b, c, d, e
Gender | Employee's gender | Male, Female
MStatue | Employee's marital status | Single, Married with kids, Married without kids, Other
NKids | No. of kids | 0, 1, 2, 3, 4
UnivType | Type of university of graduation | Public, Private
GSpecial | General specialization | Business, IT, English Literature, Engineering, CS, Other
Degree | Employee's education degree | Diploma, Bachelor, High Diploma, Master, PhD
Grade | Employee's graduation grade | Excellent, Very good, Good, Acceptable, Other
Country | Country of university | Eliminated, since the majority of the values were Jordan
**Expyears | No. of years of working experience | a, b, c, d, e
PrevCo | No. of previous companies the employee worked for | 0, 1, 2, 3, 4, 5
JobTitle | Employee's job title in the current company | Developer, Officer, QA, Data Entry, System Administrator, Office Manager, Technical Writer, Technical Manager, Software Engineering, Accountant, Infrastructure Engineer, Department Manager, Software Architect, Analyst, Designer, Trainer, PM, Consultant, Customer Support, GM
Rank | Employee's rank in the current company | Junior, Senior, Team Leader, Manager, Architect
***ServPeriod | Service period in the current company (in years) | a, b, c, d
Working hours | No. of working hours | Full Time, Part Time, Other (eliminated, since most instances have the Full Time value)
****SalRange | Employee's salary range | a, b, c, d, e
UncomWorkcond | Working in uncomfortable conditions (from the employee's perspective) | Yes, No
Dissatsalrank | Existence of dissatisfaction with either salary or rank | Yes, No
Performance | Employee's performance, either as informed or predicted (the class attribute) | Accomplish, Exceed, Far Exceed

V. Kalaivani and M. Elamparithi [2014] used different classification algorithms to predict employee performance. The data set was obtained through questionnaires filled in by 217 employees. After the questionnaires were collected, the data were prepared: first, the questionnaire responses were transferred to Excel sheets; then the data types were reviewed and modified; finally, the files were converted to ARFF format to be compatible with the WEKA data mining toolkit.

This study applied decision tree classification algorithms to employee performance prediction. A decision tree is a classification scheme that generates a tree and a set of rules, representing the model of the different classes, from a given data set. Experiments were conducted with data collected from an institution, using the C4.5, Bagging and Rotation Forest algorithms.

The results are shown below.

Cross-validation (10 folds)
Algorithm used | % accuracy
C4.5 | 41.47%
Bagging | 45.62%

Training set test
Algorithm used | % accuracy
C4.5 | 84.79%
Bagging | 75.57%

Considering the above results, the Rotation Forest algorithm performs better than the other two algorithms, since it has the maximum accuracy: 51.46% for cross-validation and 100% for the training set test option. Therefore, it was concluded that the Rotation Forest algorithm is more efficient for employee performance prediction than the other two algorithms.

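The gap between training-set accuracy (up to 100%) and 10-fold cross-validation accuracy (around 41% to 51%) in the results above is expected, because a model scored on the rows it was trained on can simply recall them. The sketch below illustrates this under stated assumptions: it uses a hypothetical memorising classifier (`fit_lookup`) instead of C4.5, Bagging or Rotation Forest, invented data, and contiguous, unshuffled folds (WEKA's cross-validation randomises and stratifies the folds).

```python
from collections import Counter

def k_fold_indices(n, k):
    """Split row indices 0..n-1 into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validated_accuracy(X, y, fit, k=10):
    """Each row is predicted once, by a model trained on the other k-1 folds;
    returns the fraction of rows predicted correctly."""
    correct = 0
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct += sum(predict(X[i]) == y[i] for i in test_idx)
    return correct / len(X)

def training_set_accuracy(X, y, fit):
    """Train and test on the same rows (like WEKA's "Use training set" option)."""
    predict = fit(X, y)
    return sum(predict(x) == t for x, t in zip(X, y)) / len(X)

def fit_lookup(X_train, y_train):
    """Hypothetical memorising classifier: recalls the label of any row it was
    trained on and falls back to the training majority class otherwise."""
    table = {tuple(x): t for x, t in zip(X_train, y_train)}
    majority = Counter(y_train).most_common(1)[0][0]
    return lambda x: table.get(tuple(x), majority)

# Hypothetical data: 20 distinct rows, so held-out rows are never in the lookup table
X = [[i] for i in range(20)]
y = ["Exceed" if i % 3 == 0 else "Accomplish" for i in range(20)]
print(training_set_accuracy(X, y, fit_lookup))     # 1.0
print(cross_validated_accuracy(X, y, fit_lookup))  # 0.65
```

Cross-validated accuracy is therefore the fairer basis for comparing algorithms, since every prediction is made on a row the model has not seen.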

M.S. Mythili and A.R. Mohamed Shanavas [2014] conducted similar experiments to analyse students' performance using classification algorithms.

Students' academic performance is influenced by various factors, such as parents' education, locality, economic status, attendance and gender. Results of 260 students from different schools were taken as samples for the implementation. In WEKA, the Classify panel lets the user apply classification algorithms to the data set, estimate the accuracy of the resulting predictive model, and visualize the model. The decision tree classifier C4.5 (J48), Random Forest, a neural network (Multilayer Perceptron), a lazy classifier (IB1) and a rule-based classifier (Decision Table) were applied in WEKA, with 10-fold cross-validation chosen under "Test options". The attributes were ranked in order of importance using the information gain and gain ratio measures, and each attribute evaluator's ranking was produced with the Ranker search method.

From the resulting set of rules, the study concluded that attendance is strongly related to student performance. The rule set also showed that parents' education, locality, gender, economic status and other factors are high-potential variables that affect students' examination results.

Osman M. Karatepe [2015] showed that frontline employees who receive sufficient support in the family domain are highly engaged in their work. These employees, in turn, are less inclined to leave their current organization, and they display in-role and extra-role performance at elevated levels in the workplace. In short, work engagement functions as a full mediator of the impact of family support on turnover intentions, job performance and extra-role customer service.

Employees who feel energetic, are enthusiastic and are immersed in their work have desirable job outcomes such as reduced turnover intentions, quality performance in the workplace, and high levels of job and career satisfaction (Bakker & Demerouti, 2008; Karatepe, 2012, 2013a).

Data obtained from frontline employees, collected with a two-week time lag in international five-star chain hotels in Turkey, were used to test the above relationships.

A. Hypotheses

Social support refers to "an interpersonal transaction that involves emotional concern, instrumental aid, information, or appraisal". Family support is a form of non-work-related social support. As a role resource, family support can stimulate employees' work engagement. Specifically, family members such as spouses and siblings can show emotional support toward work-related issues, and employees who receive sufficient instrumental support from family members can spend more time and energy on their work. Earlier studies found family support to be significantly and positively related to work engagement. Accordingly, the following hypothesis is advanced:

H1: Family support will be positively related to frontline employees' work engagement.

Employees who are highly engaged in their work have heightened energy while working, are involved, and feel happy.

H2: Work engagement will be negatively related to frontline employees' (a) turnover intentions and will be positively related to frontline employees' (b) job performance and (c) extra-role customer service.

H3: Work engagement will fully mediate the impact of family support on (a) turnover intentions, (b) job performance, and (c) extra-role customer service.

Emin Kahya [2007] investigated the influence of workplace conditions on job performance. The study reports the effects of job characteristics and working conditions, in addition to experience and education level, on task performance and contextual performance. It also reports that employees working in qualified jobs had higher performance than the others.

Less educated employees tended to perform their jobs in poor working conditions. Employees exposed to highly unpleasant environmental conditions also performed jobs with an intense requirement for physical effort.

K. Mohammed Hussain and P. Sheik Abdul Kadher [2014] studied factors and data mining techniques for employee attrition and retention in industries. According to this study, an organization should identify the factors that cause employee disappointment, such as policies or norms, find the areas where the company is lagging, and identify the reasons for attrition in Indian industries. Individual factors such as health problems, family-related issues, children's education and social prestige contribute to attrition intentions.


Thushel Jayaweera [2015] tested the relationship between work environment factors and job performance, with work motivation as a mediator. The results show that both physical and psychosocial factors are needed to promote the job performance of staff.

In their study, Salleh et al. (2011) tested the influence of incentives on job performance for state government employees in Malaysia. The study showed a positive relationship between relationship (affiliation) incentives and job performance, as people with higher affiliation motivation and strong relationships with colleagues and managers tend to perform much better in their jobs.

Jantan et al. (2010) discussed a Human Resources (HR) system architecture that uses Data Mining (DM) techniques to forecast an applicant's talent based on the information filled in the HR application and on past experience.

IV. CONCLUSION

Various classification algorithms have been applied to employee data sets. Among the studies reviewed, the Rotation Forest algorithm has performed best so far.

However, despite recent advances in regression algorithms, LASSO and Elastic Net have not yet been tested on employee-based platforms.

Thus, the two algorithms are to be implemented and evaluated in terms of accuracy and error rate for the proposed system.

Furthermore, none of the reviewed papers addresses the human productivity of an organization based on employee performance; the proposed system is also intended to address this.
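Since LASSO and Elastic Net are the candidate algorithms, a minimal coordinate-descent sketch may help clarify how they differ from ordinary least squares. It assumes the objective (1/2n)·||y − Xw||² + alpha·l1_ratio·||w||₁ + 0.5·alpha·(1 − l1_ratio)·||w||², the same scaling used by scikit-learn's `ElasticNet`, with centred data and no intercept; setting `l1_ratio=1.0` gives LASSO. The function names and the toy data are hypothetical, and this is not the proposed system's implementation.

```python
def soft_threshold(value, threshold):
    """Shrink `value` towards zero by `threshold` (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def elastic_net(X, y, alpha, l1_ratio=0.5, n_iter=1000, tol=1e-10):
    """Cyclic coordinate descent for the elastic net objective described above.
    Assumes the columns of X and y are centred, so no intercept is fitted."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # Correlation of feature j with the residual that excludes feature j
            rho = sum(X[i][j] * (y[i] - sum(X[i][k] * w[k] for k in range(p) if k != j))
                      for i in range(n)) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            new_wj = soft_threshold(rho, alpha * l1_ratio) / (z + alpha * (1 - l1_ratio))
            max_change = max(max_change, abs(new_wj - w[j]))
            w[j] = new_wj
        if max_change < tol:
            break  # weights have stopped changing
    return w

# Hypothetical, already-centred data: one predictor with y = 2 * x exactly
X = [[-1.5], [-0.5], [0.5], [1.5]]
y = [-3.0, -1.0, 1.0, 3.0]
print(elastic_net(X, y, alpha=0.0))                 # [2.0]  (no penalty: least squares)
print(elastic_net(X, y, alpha=0.5, l1_ratio=1.0))   # [1.6]  (LASSO shrinks the weight)
print(elastic_net(X, y, alpha=10.0, l1_ratio=1.0))  # [0.0]  (strong L1 penalty zeroes it)
```

The L1 term can set weights exactly to zero, which removes weak predictors, while the L2 term in Elastic Net keeps the solution stable when predictors such as age and experience are correlated.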

Acknowledgment

REFERENCES

[1] Qasem A. Al-Radaideh and Eman Al Nagi [2012], “Using Data Mining Techniques to Build a Classification Model for Predicting Employees Performance,” International Journal of Advanced Computer Science and Applications, Vol. 3, No. 2, 2012, pp. 144–151.

[2] V. Kalaivani and M. Elamparithi [2014], “An Efficient Classification Algorithms for Employee Performance Prediction,” International Journal of Research in Advent Technology, Vol. 2, No. 9, September 2014.

[3] M.S. Mythili and A.R. Mohamed Shanavas [2014], “An Analysis of students’ performance using classification algorithms,” IOSR Journal of Computer Engineering (IOSR-JCE), e-ISSN: 2278-0661, p-ISSN: 2278-8727, Vol. 16, Issue 1, Ver. III, Jan. 2014, pp. 63–69.

[4] Osman M. Karatepe [2015], “The effects of family support and work engagement on organizationally valued job outcomes,” original scientific paper, Vol. 63, No. 4, 2015, pp. 447–464, UDC: 338.488.2:640.41-057.5 (560).

[5] Emin Kahya [2007], “The effects of job characteristics and working conditions on job performance,” International Journal of Industrial Ergonomics, 37 (2007), pp. 515–523.

[6] K. Mohammed Hussain and P. Sheik Abdul Kadher [2014], “A Review of Factors and Data Mining Techniques for Employee Attrition and Retention in Industries,” International Journal of Recent Advances in Engineering & Technology (IJRAETM), Vol. 2, Issue 6–7, 2014.

[7] Thushel Jayaweera [2015], “Impact of Work Environmental Factors on Job Performance, Mediating Role of Work Motivation: A Study of Hotel Sector in England,” International Journal of Business and Management, Vol. 10, No. 3, 2015.

AUTHORS BIOGRAPHY

Mr. Karthick Myilvahanan completed his B.Tech. (Information Technology) in 2007 from Anna University, Chennai, and his M.Tech. (Information Technology) in 2010 from Anna University, Coimbatore. He is pursuing a Ph.D. degree at Himalayan University. He is working as an assistant professor in the Department of Computer Science and Engineering at MVJ College of Engineering, Bangalore. He is a member of ISTE. His research areas include data analytics, big data, machine learning and artificial intelligence, and he has 10 years of teaching experience in engineering colleges and industry.

Ms. Rubia Ahmed is currently pursuing her final year in Computer Science Engineering at MVJCE, Bangalore.

Ms. Niranjani R is currently pursuing her final year in Computer Science Engineering at MVJCE, Bangalore.