Understanding Product Recommendations Using Market Basket Analysis
AI- and machine learning-based recommendation systems play an ever-greater role in modern marketing. Every day, Amazon, Walmart, Google, YouTube, and Netflix use recommendations to drive sales and user engagement, and these recommendations work well because they're based on the vast amount of data these companies track for each person.
Typically, recommendations are generated by a machine learning model that analyzes purchase and browsing history to predict what else each user is likely to be interested in. These are personalized recommendations. But what about situations where there's no data available about a customer's previous history?
Take the example of a quick service restaurant (QSR) with no loyalty system in place. Any customer who eats there will be a new or unidentified user to the system handling their order. Does that mean there's no way to generate worthwhile recommendations for them?
Not at all. Fortunately, we can still create quality unpersonalized recommendations using a machine learning tool called market basket analysis.
To use market basket analysis, all we need is impersonal transaction data, such as a transaction ID and the products purchased in that transaction. The algorithm analyzes the full set of past transactions to determine which products are frequently purchased together, allowing us to make real-time, unpersonalized recommendations.
Suppose the algorithm determines that products A and B are often purchased together, and a customer has just placed product A in their shopping basket. The recommendation engine can now suggest B to that customer, as well.
This approach may seem simple, but it can produce powerful results. In the QSR use case above, for example, the system can make real-time suggestions to the cashier on additional menu items to recommend to the customer as they're building their order. As more items are added to the order, the recommendations will get more precise, leading to more upsells, a better customer experience, and less training needed for employees.
Can you think of some ways such a tool could grow your business? Let's take a look under the hood and see exactly how market basket analysis works.
Fundamentals of Market Basket Analysis
The main output of basket analysis is usually a set of rules about when to recommend what. For example:
- "If product A was purchased, then recommend product B."
- "If both products C and D were purchased, then recommend product E."
- "If product F was purchased, then recommend both products G and H."
We can write those rules a lot more quickly with a standard shorthand, like this:
- A → B
- C, D → E
- F → G, H
The items to the left of the arrow are called antecedents. The items to the right are the consequents.
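To make the rule notation concrete, here is a minimal Python sketch of how a recommendation engine might apply such rules to an order in progress. The rule list mirrors the three example rules above with placeholder product names, and `recommend` is a hypothetical helper written for illustration, not part of any library: a rule fires only when all of its antecedents are already in the basket, and its consequents become suggestions.

```python
# Placeholder rules mirroring the examples above: (antecedents, consequents).
RULES = [
    (frozenset({"A"}), frozenset({"B"})),       # A → B
    (frozenset({"C", "D"}), frozenset({"E"})),  # C, D → E
    (frozenset({"F"}), frozenset({"G", "H"})),  # F → G, H
]


def recommend(basket, rules):
    """Suggest the consequents of every rule whose antecedents are all in the basket.

    Products that are already in the basket are never suggested again.
    """
    basket = set(basket)
    suggestions = set()
    for antecedents, consequents in rules:
        if antecedents <= basket:  # every antecedent has been ordered
            suggestions |= consequents - basket
    return sorted(suggestions)


print(recommend({"A"}, RULES))       # ['B']
print(recommend({"C"}, RULES))       # [] because C, D → E needs both C and D
print(recommend({"C", "D"}, RULES))  # ['E']
```

Adding D to a basket that already holds C unlocks the C, D → E rule, which is how suggestions can sharpen as an order grows.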
Let's write P(X) for the probability of finding a particular product X in a transaction. We can estimate that probability from the data as follows:
$$P(X) = \frac{\text{number of transactions with } X}{\text{total number of transactions}}$$
The same definition works for a group of products: X can also be a set of items, in which case we count the transactions that contain every item in the set.
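A minimal sketch of this definition in Python, assuming ten made-up quick-service orders (the data and the `probability` helper are illustrative, not from any library). Passing a set of several products gives the probability of finding that whole group in a single transaction.

```python
# Ten made-up orders; each order is the set of products in one transaction.
TRANSACTIONS = [
    {"burger", "fries", "cola"}, {"burger", "fries"}, {"burger", "fries", "cola"},
    {"burger", "cola"}, {"burger"}, {"fries", "cola"},
    {"salad", "water"}, {"salad", "cola"}, {"burger", "fries"}, {"water"},
]


def probability(itemset, transactions):
    """P(X): the share of transactions that contain every product in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


print(probability({"burger"}, TRANSACTIONS))           # 0.6 (6 of 10 orders)
print(probability({"burger", "fries"}, TRANSACTIONS))  # 0.4 (4 of 10 orders)
```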
With that definition in hand, we can turn to how to evaluate our rules for what recommendations to make. There are several metrics, but three key ones are support, confidence, and lift.
The support of the rule A → B is the probability of finding product A and product B in a single transaction (written P(A, B)). So:
$$\text{support}(A \to B) = P(A, B) = \frac{\text{number of transactions with } A \text{ and } B}{\text{total number of transactions}} \tag{1}$$
In other words, support shows us how often products A and B were bought together.
Support is very important, because it tells us whether a given pair of products might be worth considering as a product set. A low value for support corresponds to low interest in buying A and B together (or might point to low odds of people buying A or B at all). A high support value implies high interest in the pair.
Generally, we don't want to apply rules that have low support, because they're not conducive to effective marketing. A promotional campaign that offers a discount on a pair of items that are rarely bought together probably won't be very successful. A product set that offbeat simply won't attract a lot of buyers.
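The sketch below, again using made-up orders, computes support for every pair of products bought together and then drops the pairs that fall below a minimum support, which is the filtering described above. The helper names `pair_supports` and `frequent_pairs` and the 0.3 threshold are illustrative assumptions, and the sketch only looks at pairs of products.

```python
from collections import Counter
from itertools import combinations

# Ten made-up orders; each order is the set of products in one transaction.
TRANSACTIONS = [
    {"burger", "fries", "cola"}, {"burger", "fries"}, {"burger", "fries", "cola"},
    {"burger", "cola"}, {"burger"}, {"fries", "cola"},
    {"salad", "water"}, {"salad", "cola"}, {"burger", "fries"}, {"water"},
]


def pair_supports(transactions):
    """support(A → B) = P(A, B) for every pair of products seen together at least once."""
    counts = Counter()
    for t in transactions:
        # sorted() gives each pair one canonical order, e.g. ('burger', 'fries').
        for pair in combinations(sorted(t), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items()}


def frequent_pairs(transactions, min_support):
    """Keep only the pairs whose support is at least `min_support`."""
    return {p: s for p, s in pair_supports(transactions).items() if s >= min_support}


# Keeps burger/cola (0.3), burger/fries (0.4), and cola/fries (0.3);
# drops cola/salad and salad/water, which each appear in only 1 of 10 orders.
print(frequent_pairs(TRANSACTIONS, min_support=0.3))
```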
What else can other relationships in impersonal data tell us?
Quite a bit. The confidence of a rule A → B tells us how frequently A and B have been purchased together among all transactions where someone bought A.
$$\text{confidence}(A \to B) = \frac{\text{number of transactions with } A \text{ and } B}{\text{number of transactions with } A} = \frac{\text{support}(A \to B)}{\text{support}(A)} \tag{2}$$
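Here is a minimal Python sketch of the confidence definition above, with the same kind of made-up orders. It uses the count form of the definition, which equals the ratio of supports but avoids an extra floating-point division; `count` and `confidence` are illustrative helper names, not library functions.

```python
# Ten made-up orders; each order is the set of products in one transaction.
TRANSACTIONS = [
    {"burger", "fries", "cola"}, {"burger", "fries"}, {"burger", "fries", "cola"},
    {"burger", "cola"}, {"burger"}, {"fries", "cola"},
    {"salad", "water"}, {"salad", "cola"}, {"burger", "fries"}, {"water"},
]


def count(itemset, transactions):
    """Number of transactions that contain every product in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def confidence(antecedents, consequents, transactions):
    """confidence(A → B): share of transactions with A that also contain B."""
    count_a = count(antecedents, transactions)
    if count_a == 0:
        raise ValueError("confidence is undefined when the antecedents never occur")
    both = set(antecedents) | set(consequents)
    return count(both, transactions) / count_a


print(confidence({"burger"}, {"fries"}, TRANSACTIONS))  # 4 of 6 burger orders include fries
print(confidence({"fries"}, {"burger"}, TRANSACTIONS))  # 4 of 5 fries orders include a burger
```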
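Finally, lift builds on the concept of confidence to measure the association between two products A and B: it tells us how much more (or less) often they are bought together than we would expect if purchases of A and B were independent. We define the lift of a rule as follows: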
$$\text{lift}(A \to B) = \frac{\text{confidence}(A \to B)}{P(B)} = \frac{\text{confidence}(A \to B)}{\text{support}(B)} \tag{3}$$
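In other words, lift compares the probability of finding B in transactions that contain A with the probability of finding B in any transaction. If lift(A → B) = 1, A and B are bought together exactly as often as chance would predict, so they are independent. If lift(A → B) > 1, they are positively associated (buying A makes B more likely), and if lift(A → B) < 1, they are negatively associated (buying A makes B less likely).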
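A minimal sketch of lift and its interpretation, assuming the same made-up orders as before. Substituting the count-based definitions into the lift formula gives n · count(A and B) / (count(A) · count(B)), which the hypothetical `lift` helper computes with a single division; `describe` reads the result against 1.

```python
import math

# Ten made-up orders; each order is the set of products in one transaction.
TRANSACTIONS = [
    {"burger", "fries", "cola"}, {"burger", "fries"}, {"burger", "fries", "cola"},
    {"burger", "cola"}, {"burger"}, {"fries", "cola"},
    {"salad", "water"}, {"salad", "cola"}, {"burger", "fries"}, {"water"},
]


def count(itemset, transactions):
    """Number of transactions that contain every product in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def lift(antecedents, consequents, transactions):
    """lift(A → B) = confidence(A → B) / P(B), written with raw counts."""
    count_a = count(antecedents, transactions)
    count_b = count(consequents, transactions)
    if count_a == 0 or count_b == 0:
        raise ValueError("lift is undefined when A or B never occurs")
    both = set(antecedents) | set(consequents)
    return len(transactions) * count(both, transactions) / (count_a * count_b)


def describe(value):
    """Read lift against 1, the value expected if A and B were independent."""
    if math.isclose(value, 1.0):
        return "independent"
    return "positively associated" if value > 1 else "negatively associated"


for a, b in [("burger", "fries"), ("burger", "cola"), ("burger", "salad")]:
    value = lift({a}, {b}, TRANSACTIONS)
    print(f"lift({a} -> {b}) = {value:.2f}: {describe(value)}")
```

With these made-up orders, burger and fries get a lift of about 1.33, burger and cola exactly 1, and burger and salad 0, since they never appear together.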
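If you try putting some real sales data into these equations, you'll soon find that confidence(A → B) usually isn't equal to confidence(B → A), while lift(A → B) always equals lift(B → A). The reason is that confidence divides the joint support by the support of the antecedent, which changes when the rule is reversed, whereas lift can be rewritten as P(A, B) / (P(A) · P(B)), which treats A and B symmetrically.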
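A quick worked example with hypothetical counts (not real data) shows both effects. Suppose a store logs 100 transactions: 40 contain A, 20 contain B, and 10 contain both, so P(A) = 0.4, P(B) = 0.2, and P(A, B) = 0.1.
$$
\begin{aligned}
\text{confidence}(A \to B) &= \frac{10}{40} = 0.25, & \text{confidence}(B \to A) &= \frac{10}{20} = 0.5,\\
\text{lift}(A \to B) &= \frac{0.25}{0.2} = 1.25, & \text{lift}(B \to A) &= \frac{0.5}{0.4} = 1.25.
\end{aligned}
$$
The two confidences differ, but both lifts equal P(A, B) / (P(A) · P(B)) = 0.1 / (0.4 × 0.2) = 1.25.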
Implementing Machine Learning-Based Market Basket Analysis with Python
Python has ready-made libraries for basket analysis. One of the most popular of these is mlxtend.
In mlxtend, rule creation is divided into two stages.
1. Selection of the most frequent itemsets
An itemset is simply a set of products; in this first stage, we look for frequent itemsets, meaning the sets of products that are often purchased together. To find them, we use the apriori function provided in mlxtend.frequent_patterns.
This function takes several arguments (the exact set depends on the mlxtend version), but we'll just look at the two most important ones here: df (a DataFrame) and min_support.
df must be a pandas DataFrame in which each column is a unique product from the initial dataset and each row is a transaction. If a particular product was purchased in a given transaction, the corresponding cell holds 1 or True; otherwise, it holds 0 or False.
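As a minimal sketch of this layout, the code below turns a few made-up orders into a one-hot DataFrame with plain pandas: one Boolean column per product, one row per transaction. The product names are illustrative assumptions; mlxtend also ships a TransactionEncoder class in mlxtend.preprocessing that performs the same kind of conversion.

```python
import pandas as pd

# Made-up orders: one list of purchased products per transaction.
transactions = [
    ["burger", "fries", "cola"], ["burger", "fries"], ["burger", "fries", "cola"],
    ["burger", "cola"], ["burger"], ["fries", "cola"],
    ["salad", "water"], ["salad", "cola"], ["burger", "fries"], ["water"],
]

# One column per unique product (sorted for a stable order), one row per transaction.
products = sorted({item for t in transactions for item in t})
df = pd.DataFrame(
    [[item in t for item in products] for t in transactions],
    columns=products,
)

print(df.head(3))
```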
mlxtend's official documentation includes an example that shows what such a DataFrame looks like.
min_support is a float between 0 and 1 that sets the lower bound of support for the returned itemsets.
This value should be chosen carefully to suit your dataset. The default is 0.5, which means that any product combination the algorithm returns must appear in at least 50% of all transactions. That default is rarely appropriate for real-life data: in a typical store or menu, even popular combinations appear in only a small share of transactions, so a threshold of 0.5 may return few itemsets or none at all.
The apriori function returns a pandas DataFrame with columns called support and itemsets. The support for all itemsets (or single items) returned in this way will be greater than or equal to the selected value for min_support.
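To show what apriori computes, here is a from-scratch sketch of the level-wise Apriori idea in plain Python, using made-up orders. It is not mlxtend's implementation, it returns a plain dict instead of a DataFrame, and `apriori_sketch` is a hypothetical name. In mlxtend itself, the corresponding call on a one-hot DataFrame would be `apriori(df, min_support=0.3, use_colnames=True)`, where `use_colnames=True` makes the itemsets column show product names rather than column indices.

```python
from itertools import combinations

# Ten made-up orders; each order is the set of products in one transaction.
TRANSACTIONS = [
    {"burger", "fries", "cola"}, {"burger", "fries"}, {"burger", "fries", "cola"},
    {"burger", "cola"}, {"burger"}, {"fries", "cola"},
    {"salad", "water"}, {"salad", "cola"}, {"burger", "fries"}, {"water"},
]


def apriori_sketch(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support."""
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: single products that clear the threshold.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {s: support(s) for s in current}

    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({s: support(s) for s in current})
        k += 1
    return frequent


result = apriori_sketch(TRANSACTIONS, min_support=0.3)
for itemset, s in sorted(result.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(itemset), s)
```

The prune step relies on the fact that every subset of a frequent itemset must also be frequent. This sketch rescans every transaction for each candidate, which is fine for illustration but slow on real data.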
2. Rule creation
This step uses another function from mlxtend.frequent_patterns: association_rules. It takes several arguments (again, the exact set depends on the mlxtend version), but we'll limit this discussion to three of them: df, metric, and min_threshold.
- df is the DataFrame of frequent itemsets returned by apriori in the previous step.
- metric is the measure we'll use to select our rules. The most commonly used metric values are support, confidence, and lift, which we defined in formulas (1), (2), and (3) above.
- min_threshold sets the lower bound of the metric value.
The values selected for metric and min_threshold strongly influence how many rules we get and which ones they are, so they shape our final results. Accordingly, it's important to get these values right.
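The rule-generation step can be sketched the same way. Below, a hypothetical `rules_sketch` function takes frequent itemsets with their supports (hard-coded from made-up data at a minimum support of 0.3), splits each itemset into antecedents and consequents, scores every split, and keeps the rules whose chosen metric reaches `min_threshold`. It mirrors the roles of the df, metric, and min_threshold arguments but is not mlxtend's association_rules; for the library call itself, check the documentation for your installed mlxtend version.

```python
from itertools import combinations

# Made-up frequent itemsets and their supports: {frozenset(itemset): support}.
FREQUENT = {
    frozenset({"burger"}): 0.6,
    frozenset({"fries"}): 0.5,
    frozenset({"cola"}): 0.5,
    frozenset({"burger", "fries"}): 0.4,
    frozenset({"burger", "cola"}): 0.3,
    frozenset({"fries", "cola"}): 0.3,
}


def rules_sketch(frequent, metric="confidence", min_threshold=0.8):
    """Score every antecedents -> consequents split and keep rules with metric >= min_threshold."""
    rules = []
    for itemset, support_ab in frequent.items():
        for size in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), size):
                a = frozenset(combo)
                b = itemset - a
                # Subsets of a frequent itemset are frequent, so a and b are in the dict.
                confidence = support_ab / frequent[a]
                scores = {
                    "support": support_ab,
                    "confidence": confidence,
                    "lift": confidence / frequent[b],
                }
                if scores[metric] >= min_threshold:
                    rules.append((sorted(a), sorted(b), scores))
    return rules


for a, b, s in rules_sketch(FREQUENT, metric="lift", min_threshold=1.1):
    print(a, "->", b, {k: round(v, 2) for k, v in s.items()})
```

With metric="lift" and min_threshold=1.1, the burger/fries and cola/fries rules survive in both directions, while the burger/cola rules (lift 1.0) are dropped; switching to the default confidence threshold of 0.8 leaves only fries → burger.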
Conclusion
We hope you enjoyed this peek under the hood of an unpersonalized recommendation engine! As you can see, market basket analysis is a simple yet powerful analytics tool. It delivers high-value suggestions that enhance customer experiences and boost sales. At the same time, its simplicity means almost any business can start using unpersonalized recommendations and see ROI fast. You don't have to be a data giant like Google or Amazon to benefit from market basket analysis: millions of retailers of all sizes are using these analytics to take their brands to the next level.
Market basket analysis's straightforward methodology means that it's fast and easy to implement with the right support. Like any machine learning tool, the efficacy of these analytics depends on sound data selection and assessment, so make sure to get reputable data science specialists aboard from the beginning.
Integrio Systems is an industry leader in AI with over 20 years of experience. Our scientists and engineers have worked with companies ranging from startups to multinational enterprises on projects in every major industry. Throughout, the one constant has been our passion for turning machine learning theory into real-world impact.
Want to explore what market basket analytics can bring to your business? Please get in touch below to schedule a free consultation with one of our experts.