Employee Churn Predection in 5 steps. Make USE of MACHINE learning to prevent your employees from leaving your organization.
Given that employees are one of the most valuable assets of any organization, losing an employee has a negative (detrimental) impact on several aspects of business activities. Loss of competence, worsened productivity and increased hiring costs are just a small fraction of the consequences associated with high employee churn. Making your best employees stay is in all aspects much better than having to hire new people to replace the ones that leaves.
At the same time, it is often not known why employees leave and how to prevent it. There are often many theories, but few based on actual facts. The decision to quit can stem from different sources and the reason to quit differs depending on the organization in question.
As a part of her master thesis at Random Forest, Anna Gentek has built an emplyee churn model for a customer in the healthcare industry and made some remarkable findings.
[There are several types of machine learning techniques that could be used to build an employee churn prediction model, including supervised, unsupervised, and reinforcement learning. This article focuses on supervised machine learning methods.]
Traditional churn prediction model consists of five main components. These are the following:
The first step in any classification problem is to collect relevant data. Since the goal is to model the behavior of former and current employees, the collected data should contain as many attributes, also called features, that describe an employee as possible. The dataset depends on the type of information that is available in company’s HR system, and therefore, it might differ for two different organizations and two churn prediction models.
30 features were extracted for the healthcare organization, including attributes such as occupation rate, salary, time at the company, job role or distance to work, to name a few.
The quality and the size of the acquired data set is essential when building an accurate churn prediction model. Insufficient or poor-quality data often results in a model that is not able to properly learn and detect hidden patterns and correlations in the data, leading to wrong predictions and lower accuracy. To assure the quality and desired format of the data, the second step is to apply required preprocessing techniques, including data anonymization, data cleansing, data encoding, scaling, and data transformation.
Higher number of attributes increases the dimensionality and complexity of the studied problem.
Therefore, the feature selection becomes crucial part when building churn prediction model. Removing the irrelevant, redundant, and noisy features reduces the dimensionality and complexity of the data set, leading to an increased learning performance, improved training time, and higher model interpretability. Depending on the nature of the studied problem, several feature selection methods, including filter, wrapper, embedded and hybrid methods could be used to identify and extract the most important features only.
Now that the acquired dataset is assembled and preprocessed, it is time to train the model. But which model should you choose?
Different models perform differently depending on the environment. The model you choose, should be adapted to the investigated organization and the size of dataset and the extracted features. Additionally, when trying to understand why employees leave, the models that provide more interpretability are of preference too. Therefore, a common approach is to apply several models and compare in order to find the optimal choice.
How do we evaluate the performance of the applied models? The applied models can be evaluated using several performance measures, depending on the studied problem. From the business perspective, recall, focusing on churning employees is said to be one significant measure for employee churn prediction model.