RFM model as a Simple Way to Evaluate Customer Value

The goal of this project was to rank the customers of an insurance company to improve customer care. The RFM model (Recency, Frequency, and Monetary value), which evaluates customers based on the recency, frequency, and monetary value of their purchases, was identified as a simple yet highly effective and easily interpretable technique.

Table of contents


Project Overview

Adapting the standard RFM model to suit our business invironment

Often, it is sufficient for a company’s marketing department to rank customers by their value to the company, rather than calculating the exact dollar amount the customers are likely to spend. This was the case with my client, Netrisk.hu, the largest Hungarian insurance brokerage company.

Since insurance industry typically operates in a subscription company-customer relationship, dimensions like recency and frequency are not the best modes of categorizing and ranking customers value from the company’s point of view. In a subscription business all time-related features become awkward or even meaningless. Recency has minor importance since most contracts are kept renewed automatically. Frequency is also problematic because frequent contract renewald do not necessary come together with higher spending. As a consequence of these facts I modified the standard RFM model to align with the operational context of my client. I kept monetary value in the model and replaced recency and frequncy with number and type of product ownership of the customer. With this step I think that the standard RFM framework was tailored to better fit the unique aspects of the company’s goals.


Results & Discussion


With the RFM scores in hand, the company will develop a segmented customer care strategy, prioritizing high-value customers with focused attention while allocating minimal effort to low-value customers.




Data Overview & Preparation


Base table creation:


Population to be analyzed:
  • I extracted the table that contains all contracts from the company’s relational database. (Customers identities are deduplicated in the system already.)


Filtering:
  • MTPL (Motor Third Party Liability Insurance), CASCO, Home inssurance contracts
  • Default on payment clients excluded
  • only private persons included


Created features:
  • sum_alltime_CASCO_MTPL_HOME_annual_fee_per_user: the length of relationship measured in number of years multiplied by the last annual fee. If the contract is still active then the end date of the contract is the current date;
  • sum_current_CASCO_MTPL_HOME_annual_fee_per_user: in case of an active contract the last annual fee;
  • current_num_of_insured_cars: number of vehicles that are covered at the moment of analysis;
  • length_of_relatonship_day: length of relatonship in days;
  • recency_cat: type of active insurance products the customer has: list of values = 1. only MTPL, 2. only CASCO, 3. only Home, 4. MTPL and CASCO, 5. MTPL and Home, 6. CASCO and Home, 7. all 3 types of insurance product, 8. has no any active insurance contract.


Detected outliers:
  • deleting customers having over 2.000.000 HUF cumulated annual fee amount;
  • deleting customers having over 10.000 days length of relatonship;


Aggregating the contract-level data into customer-level:
  • the following mysql snippet shows simple it is to get the customer-level data for the next steps:

DROP TABLE IF EXISTS rfm_data;
CREATE TABLE IF NOT EXISTS rfm_data
SELECT 	b.user_id_dedupl,
		coalesce(ROUND(SUM(b.annual_fee*DATEDIFF(coverage_end,coverage_start)/365),0),0) AS sum_alltime_CASCO_MTPL_HOME_annual_fee_per_user,
		coalesce(sum(CASE WHEN status = 'active' THEN b.annual_fee END),0) as sum_current_CASCO_MTPL_HOME_annual_fee_per_user,		
		count(DISTINCT b.reg_plate_number) AS num_of_insured_cars,
		count(DISTINCT CASE WHEN status = 'active' THEN b.reg_plate_number END) as current_num_of_insured_cars,
		case 
		when MIN(DATEDIFF(CURRENT_DATE,coverage_end)) = 0 then 
			case 
				when MTPL_active > 0 and CASCO_active = 0 and HOME_active = 0 then 'M'
				when MTPL_active = 0 and CASCO_active > 0 and HOME_active = 0 then 'C'
				when MTPL_active = 0 and CASCO_active = 0 and HOME_active > 0 then 'H'
				when MTPL_active > 0 and CASCO_active > 0 and HOME_active = 0 then 'MC'
				when MTPL_active > 0 and CASCO_active = 0 and HOME_active > 0 then 'MH'
				when MTPL_active = 0 and CASCO_active > 0 and HOME_active > 0 then 'CH'
				when MTPL_active > 0 and CASCO_active > 0 and HOME_active > 0 then 'MCH'
			end
		when MIN(DATEDIFF(CURRENT_DATE,coverage_end)) > 0 then 'canceled'
		end	as recency_cat,
		DATEDIFF(MAX(coverage_end),MIN(coverage_start)) AS length_of_relatonship_day,
		case 
		when last_channel = 'online' then 'online'
		else 'offline'
		end	as last_channel_cat
FROM base_table b
GROUP BY b.user_id_dedupl
;



RFM Model: Creating Recency, Product Ownershp and Monetary Value Dimensions


RM_Scores: Creating Terciles

RM_Score = recency_cat + m1_tercile, where m1_tercile is the a sum_alltime_CASCO_MTPL_HOME_annual_fee_per_user column assinged into terciles. For instance:

K1 = recency_cat[only MTPL] + m1_tercile[customer is in lowest spending tercile]


The Python Code that Creates RFM-dimensions

Data connection:

# Creating data connection
from sqlalchemy import create_engine
engine = create_engine('mysql+pymysql://dw_user:Abcd@10.00.00.00:5555/dw_user')
con = engine.connect()


Training data set:

# Copying the training table from mysql to python: 

import pandas as pd
import numpy as np
ds = con.execute("SELECT * FROM rfm_data")

rfm = pd.DataFrame(ds.fetchall())
rfm.columns = ds.keys()

# Closing data connection
con.close()


Converting some feauters into numeric in order to make them suitable for calculating terciles:

rfm['sum_alltime_CASCO_MTPL_HOME_annual_fee_per_user'] = rfm['sum_alltime_CASCO_MTPL_HOME_annual_fee_per_user'].astype('int64')
rfm['sum_current_CASCO_MTPL_HOME_annual_fee_per_user'] = rfm['sum_current_CASCO_MTPL_HOME_annual_fee_per_user'].astype('int64')
rfm['recency_cat'] = rfm['recency_cat'].astype('category')