
Flashback: Pricing Analytics & Machine Learning

  • Writer: Phoenix Pricing
  • Mar 9, 2021
  • 4 min read


Pricing Analytics


Why is there a need for pricing analytics? What is the benefit? The simplest questions with not-so-obvious answers: a customers-first approach, built on knowing as much as possible about your customers, is of utmost importance. Sadly, this is the part where many companies put in almost no effort. Although companies have historical data about customer behaviour, the data is often structured in a way that forbids easy insights, not to mention the high volumes of historical data where the classic Excel approach fails to deliver. Often there are too many variables, explicit or implicit, and so many possible outcomes that one can easily get lost in the analysis. Such high information entropy is hard for a pricing algorithm to handle effectively: the less predictable an event is, the more entropy it has. "Condensing" the information and lowering the entropy allows us to identify common patterns, i.e. clusters of related information about customers.

With a priori domain knowledge of customer behaviour, knowing which data features constitute segments, we can condense historical data by applying aggregate functions to obtain general characteristics of each segment (a minimal sketch follows below). That is a common way to treat customer transactions: going from specific per-transaction behaviour to more general behaviour based on segments. Addressing the issue increases the visibility of pricing in the organization and grants a transparent setup of price differentiation. Examples of customer segments:

1) Geographic segmentation - international companies usually segment on country or continent level

2) Demographic - based on household income, age, gender etc.

3) Behavioral - customer tendency to repeat a specific pattern over an extended period of time; loyalty to a brand, occasions such as Christmas or the Olympics

Make no mistake, segments are discoverable not only at the customer level; there are also segments in product data and competitor records, to name a few.
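
As a toy illustration of that condensation step, the snippet below groups hypothetical transactions by a segment label and computes per-segment aggregates. The Transaction fields and segment names are made up for the example; they are not a fixed schema.

// Hypothetical per-transaction records condensed into per-segment aggregates.
case class Transaction(customerId: String, segment: String, revenue: Double, units: Int)

val transactions = Seq(
  Transaction("c1", "EU",   120.0, 3),
  Transaction("c2", "EU",    80.0, 2),
  Transaction("c3", "APAC", 200.0, 5)
)

// Aggregate functions turn specific per-transaction behaviour
// into general characteristics of each segment.
val segmentStats = transactions
  .groupBy(_.segment)
  .map { case (segment, txs) =>
    val avgRevenue = txs.map(_.revenue).sum / txs.size
    val totalUnits = txs.map(_.units).sum
    segment -> (avgRevenue, totalUnits)
  }

segmentStats.foreach { case (seg, (avgRevenue, totalUnits)) =>
  println(f"$seg%-5s avg revenue = $avgRevenue%.2f, units = $totalUnits")
}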


Machine Learning


Having no a priori knowledge of segments, the task is more complicated. Machine learning, specifically clustering, can give us good insights into the problem. What is a cluster (segment) in the first place? A cluster is a grouping of objects whose properties (features) are similar given some metric (the metric is specific to the clustering algorithm). Usually clusters shouldn't overlap. Clustering is an unsupervised learning technique where, given the metric, the algorithm tries to find groups of related data in n-dimensional space. Let's explain clustering on a specific algorithm - K-means.

K-means clustering is a type of unsupervised learning where we classify data into an a priori fixed number of K clusters. Each cluster center is initialized with one point in n-dimensional space, where n is the number of features we are considering. The centers are usually selected at random with one condition: they should be as far away from each other as possible, due to the metric the algorithm is using. After the initial setup, the algorithm behaves iteratively: 1) it assigns every data point to the nearest cluster center, 2) it calculates new cluster centers from the data points assigned to each cluster, 3) it stops if the total number of iterations is exceeded or if no data points were reassigned to a different cluster center, otherwise it goes back to step 1. The algorithm uses the Euclidean distance metric, minimizing the within-cluster sum of squares (variance).
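
The steps above can be sketched in a few lines of plain Scala. This is a minimal, illustrative version: it uses simple random initialization instead of the far-apart center placement mentioned above, and the names (Point, sqDist, kMeans) are our own.

import scala.util.Random

// An (x, y) data point; the metric is squared Euclidean distance.
type Point = (Double, Double)

def sqDist(a: Point, b: Point): Double = {
  val dx = a._1 - b._1
  val dy = a._2 - b._2
  dx * dx + dy * dy
}

def mean(ps: Seq[Point]): Point =
  (ps.map(_._1).sum / ps.size, ps.map(_._2).sum / ps.size)

def kMeans(points: Seq[Point], k: Int, maxIter: Int): Seq[Point] = {
  // Simplified initialization: k random points from the data set.
  var centers: Seq[Point] = Random.shuffle(points).take(k)
  var assignment = Seq.empty[Int]
  var changed = true
  var iter = 0
  while (iter < maxIter && changed) {
    // 1) Assign every data point to its nearest cluster center.
    val newAssignment = points.map(p => centers.indices.minBy(i => sqDist(p, centers(i))))
    changed = newAssignment != assignment
    assignment = newAssignment
    // 2) Recompute each center as the mean of its assigned points.
    centers = centers.indices.map { i =>
      val members = points.zip(assignment).collect { case (p, a) if a == i => p }
      if (members.nonEmpty) mean(members) else centers(i)
    }
    // 3) Stop when the iteration cap is reached or no point changed cluster.
    iter += 1
  }
  centers
}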


Let's, as an example, construct an artificial two-dimensional dataset with 3 groups to feed to the K-means algorithm. It's worth mentioning that the problem is often multidimensional; we have chosen 2 dimensions for the purpose of a straightforward explanation.


Input Data


We are going to use two independent Gaussian (normal) distributions as input data. All code below is pseudo code in a Scala-like style.


1) Generate 300 pairs of uniformly distributed random values u1, u2 in the (0, 1) interval



// One pair of independent uniform values in (0, 1); the full data set repeats this 300 times.
// 1.0 - nextDouble() keeps u1, u2 strictly above zero, so that log(u1) below is defined.
val u1 = 1.0 - scala.util.Random.nextDouble()
val u2 = 1.0 - scala.util.Random.nextDouble()


Fig I: Uniformly distributed values [u1,u2]



2) Use the Box-Muller method to create two independent standard one-dimensional Gaussian samples



import scala.math.{sqrt, log, cos, sin, Pi}

// Box-Muller transform: (u1, u2) uniform in (0, 1) -> (x, y) independent standard Gaussians.
val x = sqrt(-2 * log(u1)) * cos(2 * Pi * u2)
val y = sqrt(-2 * log(u1)) * sin(2 * Pi * u2)



Fig II: Box-Muller transformation displayed as [u1,x]




Fig III: Box-Muller transformation displayed as [u2,y]




Fig IV: Box-Muller transformation displayed as [x,y]



3) Adjust the points with the following means and deviations to create 3 clusters (see the sketch after this list)


Adjust points with index in (0, 100> to (x(mean, deviation), y(mean, deviation)) = (x(500, 50), y(500, 50))

Adjust points with index in (100, 200> to (x(mean, deviation), y(mean, deviation)) = (x(400, 50), y(400, 50))

Adjust points with index in (200, 300> to (x(mean, deviation), y(mean, deviation)) = (x(200, 50), y(800, 50))
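
Putting steps 1-3 together, one possible sketch of the data generation looks like this. The variable names (standardPairs, params, clusteredPoints) are ours, and 1.0 - nextDouble() is used only to keep u1 strictly above zero so that log(u1) is defined.

import scala.util.Random
import scala.math.{sqrt, log, cos, sin, Pi}

// 300 standard Gaussian pairs via the Box-Muller transform.
val standardPairs: Seq[(Double, Double)] = Seq.fill(300) {
  val u1 = 1.0 - Random.nextDouble() // strictly in (0, 1], safe for log
  val u2 = Random.nextDouble()
  val x = sqrt(-2 * log(u1)) * cos(2 * Pi * u2)
  val y = sqrt(-2 * log(u1)) * sin(2 * Pi * u2)
  (x, y)
}

// (meanX, sdX, meanY, sdY) for each block of 100 points.
val params = Seq(
  (500.0, 50.0, 500.0, 50.0),
  (400.0, 50.0, 400.0, 50.0),
  (200.0, 50.0, 800.0, 50.0)
)

// Shift and scale each standard Gaussian pair into its target cluster.
val clusteredPoints: Seq[(Double, Double)] = standardPairs.zipWithIndex.map { case ((x, y), i) =>
  val (mx, sx, my, sy) = params(i / 100)
  (mx + sx * x, my + sy * y)
}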



Fig V: Gaussian clusters



At this point we have data where we can visually confirm there are 3 clusters, using our brain as the segmentation engine. Note that for the K-means algorithm this is only a set of (x, y) values.


K-Means


We are going to run the unsupervised learning algorithm twice: for k=3 and for k=2, both stopping after 10000 iterations. After the learning phase we switch to the prediction phase, feeding the same data in, with the algorithm producing the classification, i.e. the number of the cluster each (x, y) pair is attached to (a minimal sketch follows below).
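
Assuming the kMeans and clusteredPoints sketches above, the learning and prediction phases could look roughly as follows; predict simply returns the index of the nearest learned center for a given point.

// Prediction: index of the nearest learned cluster center.
def predict(centers: Seq[(Double, Double)], p: (Double, Double)): Int =
  centers.indices.minBy { i =>
    val dx = p._1 - centers(i)._1
    val dy = p._2 - centers(i)._2
    dx * dx + dy * dy
  }

// Learning phase: run K-means twice, for k = 3 and k = 2, capped at 10000 iterations.
val centersK3 = kMeans(clusteredPoints, k = 3, maxIter = 10000)
val centersK2 = kMeans(clusteredPoints, k = 2, maxIter = 10000)

// Prediction phase: feed the same data back in to get a cluster number per (x, y) pair.
val labelsK3 = clusteredPoints.map(p => predict(centersK3, p))
val labelsK2 = clusteredPoints.map(p => predict(centersK2, p))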




Fig VI: K-Means predictions for k=3, 10000 iterations





Fig VII: K-Means predictions for k=2, 10000 iterations



The results are quite impressive; the algorithm indeed identified the requested number of clusters. Of course, as this is an exploratory tool, the pricing data scientist has to look further into the results (features) and validate that the clusters map well to the real-world scenarios the data represents. Another point to consider is that the K-means algorithm has built-in limitations too. Because it minimizes the within-cluster sum of squares, it is prone to produce incorrect results when the data is non-spherical (e.g. one circle of points inside another circle of points) or when the clusters are unevenly sized. Even with those limitations it is a valid segmentation tool.
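
To make the non-spherical limitation concrete, here is a small sketch of a ring-inside-a-ring dataset. Running the kMeans sketch above with k = 2 on it will not separate the inner ring from the outer one, because the algorithm only minimizes the within-cluster sum of squares around the centers. The ring helper is purely illustrative.

import scala.util.Random
import scala.math.{cos, sin, Pi}

// A noisy ring of n points with the given radius, centered at the origin.
def ring(n: Int, radius: Double, noise: Double): Seq[(Double, Double)] =
  Seq.fill(n) {
    val angle = 2 * Pi * Random.nextDouble()
    val r = radius + noise * (Random.nextDouble() - 0.5)
    (r * cos(angle), r * sin(angle))
  }

// One ring of points inside another - a non-spherical case K-means handles poorly.
val rings = ring(150, radius = 1.0, noise = 0.2) ++ ring(150, radius = 5.0, noise = 0.2)
// val ringCenters = kMeans(rings, k = 2, maxIter = 10000) // reusing the sketch above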



 
 
 
