
Flashback: Pricing Analytics & Machine Learning

  • Writer: Phoenix Pricing
  • Mar 9, 2021
  • 4 min read


Pricing Analytics


Why is there a need for pricing analytics? What is the benefit? The simplest questions with not-so-obvious answers: a customers-first approach, built on knowing as much as possible about your customers, is of utmost importance. Sadly, this is the part where many companies put in almost no effort. Although companies have historical data about customer behaviour, the data is often structured in a way that forbids easy insights, not to mention the high volumes of historical data where the classic Excel approach fails to deliver. Often there are too many variables, explicit or implicit, and so many possible outcomes that one can easily get lost in the analysis. Such high information entropy is hard for a pricing algorithm to handle effectively: the less predictable an event is, the more entropy it has. "Condensing" the information and lowering the entropy allows us to identify common patterns, i.e. clusters of related information about customers.

With a priori domain knowledge of customer behaviour, knowing which data features constitute segments, we can condense historical data by applying aggregate functions to obtain general characteristics of each segment (a minimal sketch follows below). That is a common way to treat customer transactions: going from specific per-transaction behaviour to more general behaviour based on segments. Addressing the issue increases the visibility of pricing in the organization and grants a transparent setup of price differentiation. Examples of customer segments:

1) Geographic segmentation - international companies usually segment on country or continent level

2) Demographic - based on household income, age, gender etc.

3) Behavioral - customer tendency to repeat a specific pattern over an extended period of time; loyalty to a brand, occasions such as Christmas or the Olympics

Make no mistake, segments are discoverable not only at the customer level; there are also segments in product data and competitor records, to name a few.
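
As a toy illustration of that condensation step, the snippet below groups hypothetical transactions by a segment label and computes per-segment aggregates. The Transaction fields and segment names are made up for the example; they are not a fixed schema.

// Hypothetical per-transaction records condensed into per-segment aggregates.
case class Transaction(customerId: String, segment: String, revenue: Double, units: Int)

val transactions = Seq(
  Transaction("c1", "EU",   120.0, 3),
  Transaction("c2", "EU",    80.0, 2),
  Transaction("c3", "APAC", 200.0, 5)
)

// Aggregate functions turn specific per-transaction behaviour
// into general characteristics of each segment.
val segmentStats = transactions
  .groupBy(_.segment)
  .map { case (segment, txs) =>
    val avgRevenue = txs.map(_.revenue).sum / txs.size
    val totalUnits = txs.map(_.units).sum
    segment -> (avgRevenue, totalUnits)
  }

segmentStats.foreach { case (seg, (avgRevenue, totalUnits)) =>
  println(f"$seg%-5s avg revenue = $avgRevenue%.2f, units = $totalUnits")
}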


Machine Learning


Having no a priori knowledge of segments, the task is more complicated. Machine learning, specifically clustering, can give us good insights into the problem. What is a cluster (segment) in the first place? A cluster is a grouping of objects whose properties (features) are similar given some metric (the metric is specific to the clustering algorithm). Usually clusters shouldn't overlap. Clustering is an unsupervised learning technique where, given the metric, the algorithm tries to find groups of related data in n-dimensional space. Let's explain clustering on a specific algorithm - K-means.

K-means clustering is a type of unsupervised learning where we classify data into an a priori fixed number of K clusters. Each cluster center is initialized with one point in n-dimensional space, where n is the number of features we are considering. The centers are usually selected at random with one condition: they should be as far away from each other as possible, due to the metric the algorithm is using. After the initial setup, the algorithm behaves iteratively: 1) it assigns every data point to the nearest cluster center, 2) it calculates new cluster centers from the data points assigned to each cluster, 3) it stops if the total number of iterations is exceeded or if no data points were reassigned to a different cluster center, otherwise it goes back to step 1. The algorithm uses the Euclidean distance metric, minimizing the within-cluster sum of squares (variance).
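
The steps above can be sketched in a few lines of plain Scala. This is a minimal, illustrative version: it uses simple random initialization instead of the far-apart center placement mentioned above, and the names (Point, sqDist, kMeans) are our own.

import scala.util.Random

// An (x, y) data point; the metric is squared Euclidean distance.
type Point = (Double, Double)

def sqDist(a: Point, b: Point): Double = {
  val dx = a._1 - b._1
  val dy = a._2 - b._2
  dx * dx + dy * dy
}

def mean(ps: Seq[Point]): Point =
  (ps.map(_._1).sum / ps.size, ps.map(_._2).sum / ps.size)

def kMeans(points: Seq[Point], k: Int, maxIter: Int): Seq[Point] = {
  // Simplified initialization: k random points from the data set.
  var centers: Seq[Point] = Random.shuffle(points).take(k)
  var assignment = Seq.empty[Int]
  var changed = true
  var iter = 0
  while (iter < maxIter && changed) {
    // 1) Assign every data point to its nearest cluster center.
    val newAssignment = points.map(p => centers.indices.minBy(i => sqDist(p, centers(i))))
    changed = newAssignment != assignment
    assignment = newAssignment
    // 2) Recompute each center as the mean of its assigned points.
    centers = centers.indices.map { i =>
      val members = points.zip(assignment).collect { case (p, a) if a == i => p }
      if (members.nonEmpty) mean(members) else centers(i)
    }
    // 3) Stop when the iteration cap is reached or no point changed cluster.
    iter += 1
  }
  centers
}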


Let's, as an example, construct an artificial two-dimensional dataset with 3 groups to feed to the K-means algorithm. It's worth mentioning that the problem is often multidimensional; we have chosen 2 dimensions for the purpose of a straightforward explanation.


Input Data


We are going to use two independent Gaussian (normal) distributions as input data. All code below is pseudo code in a Scala-like style.


1) Generate 300 pairs of uniformly distributed random values u1, u2 in the (0, 1) interval



// One pair of independent uniform values in (0, 1); the full data set repeats this 300 times.
// 1.0 - nextDouble() keeps u1, u2 strictly above zero, so that log(u1) below is defined.
val u1 = 1.0 - scala.util.Random.nextDouble()
val u2 = 1.0 - scala.util.Random.nextDouble()


Fig I: Uniformly distributed values [u1,u2]



2) Use the Box-Muller method to create two independent standard one-dimensional Gaussian samples



import scala.math.{sqrt, log, cos, sin, Pi}

// Box-Muller transform: (u1, u2) uniform in (0, 1) -> (x, y) independent standard Gaussians.
val x = sqrt(-2 * log(u1)) * cos(2 * Pi * u2)
val y = sqrt(-2 * log(u1)) * sin(2 * Pi * u2)



Fig II: Box-Muller transformation displayed as [u1,x]




Fig III: Box-Muller transformation displayed as [u2,y]




Fig IV: Box-Muller transformation displayed as [x,y]



3) Adjust the points with the following means and deviations to create 3 clusters (see the sketch after this list)


Adjust points with index in (0, 100> to (x(mean, deviation), y(mean, deviation)) = (x(500, 50), y(500, 50))

Adjust points with index in (100, 200> to (x(mean, deviation), y(mean, deviation)) = (x(400, 50), y(400, 50))

Adjust points with index in (200, 300> to (x(mean, deviation), y(mean, deviation)) = (x(200, 50), y(800, 50))
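
Putting steps 1-3 together, one possible sketch of the data generation looks like this. The variable names (standardPairs, params, clusteredPoints) are ours, and 1.0 - nextDouble() is used only to keep u1 strictly above zero so that log(u1) is defined.

import scala.util.Random
import scala.math.{sqrt, log, cos, sin, Pi}

// 300 standard Gaussian pairs via the Box-Muller transform.
val standardPairs: Seq[(Double, Double)] = Seq.fill(300) {
  val u1 = 1.0 - Random.nextDouble() // strictly in (0, 1], safe for log
  val u2 = Random.nextDouble()
  val x = sqrt(-2 * log(u1)) * cos(2 * Pi * u2)
  val y = sqrt(-2 * log(u1)) * sin(2 * Pi * u2)
  (x, y)
}

// (meanX, sdX, meanY, sdY) for each block of 100 points.
val params = Seq(
  (500.0, 50.0, 500.0, 50.0),
  (400.0, 50.0, 400.0, 50.0),
  (200.0, 50.0, 800.0, 50.0)
)

// Shift and scale each standard Gaussian pair into its target cluster.
val clusteredPoints: Seq[(Double, Double)] = standardPairs.zipWithIndex.map { case ((x, y), i) =>
  val (mx, sx, my, sy) = params(i / 100)
  (mx + sx * x, my + sy * y)
}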



Fig V: Gaussian clusters



At this point we have data where we can visually confirm there are 3 clusters, using our brain as the segmentation engine. Note that for the K-means algorithm this is only a set of (x, y) values.


K-Means


We are going to run the unsupervised learning algorithm twice: for k=3 and for k=2, both stopping after 10000 iterations. After the learning phase we switch to the prediction phase, feeding the same data in, with the algorithm producing the classification, i.e. the number of the cluster each (x, y) pair is attached to (a minimal sketch follows below).
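
Assuming the kMeans and clusteredPoints sketches above, the learning and prediction phases could look roughly as follows; predict simply returns the index of the nearest learned center for a given point.

// Prediction: index of the nearest learned cluster center.
def predict(centers: Seq[(Double, Double)], p: (Double, Double)): Int =
  centers.indices.minBy { i =>
    val dx = p._1 - centers(i)._1
    val dy = p._2 - centers(i)._2
    dx * dx + dy * dy
  }

// Learning phase: run K-means twice, for k = 3 and k = 2, capped at 10000 iterations.
val centersK3 = kMeans(clusteredPoints, k = 3, maxIter = 10000)
val centersK2 = kMeans(clusteredPoints, k = 2, maxIter = 10000)

// Prediction phase: feed the same data back in to get a cluster number per (x, y) pair.
val labelsK3 = clusteredPoints.map(p => predict(centersK3, p))
val labelsK2 = clusteredPoints.map(p => predict(centersK2, p))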




Fig VI: K-Means predictions for k=3, 10000 iterations





Fig VII: K-Means predictions for k=2, 10000 iterations



The results are quite impressive; the algorithm indeed identified the requested number of clusters. Of course, as this is an exploratory tool, the pricing data scientist has to look further into the results (features) and validate that the clusters map well to the real-world scenarios the data represents. Another point to consider is that the K-means algorithm has built-in limitations too. Because it minimizes the within-cluster sum of squares, it is prone to produce incorrect results when the data is non-spherical (e.g. one circle of points inside another circle of points) or when the clusters are unevenly sized. Even with those limitations it is a valid segmentation tool.
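
To make the non-spherical limitation concrete, here is a small sketch of a ring-inside-a-ring dataset. Running the kMeans sketch above with k = 2 on it will not separate the inner ring from the outer one, because the algorithm only minimizes the within-cluster sum of squares around the centers. The ring helper is purely illustrative.

import scala.util.Random
import scala.math.{cos, sin, Pi}

// A noisy ring of n points with the given radius, centered at the origin.
def ring(n: Int, radius: Double, noise: Double): Seq[(Double, Double)] =
  Seq.fill(n) {
    val angle = 2 * Pi * Random.nextDouble()
    val r = radius + noise * (Random.nextDouble() - 0.5)
    (r * cos(angle), r * sin(angle))
  }

// One ring of points inside another - a non-spherical case K-means handles poorly.
val rings = ring(150, radius = 1.0, noise = 0.2) ++ ring(150, radius = 5.0, noise = 0.2)
// val ringCenters = kMeans(rings, k = 2, maxIter = 10000) // reusing the sketch above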



 
 
 
