More and more companies collect enormous amounts of data hoping to bring new insights to the industry and increase the marginal value of the enterprise - yet many of them are doing it the wrong way.
In this article, we will provide a technical overview of recommender systems, explain how they work, and review the state-of-the-art research being done in the industry.
🤓 Hey, this article is technical. Click here if you are interested in a high-level business overview of recommender systems.
Recommender systems are among the most efficient ways to increase an enterprise's turnover. They are algorithms built on big data that seek to predict the rating or preference a user would give to an item. Recommendation systems have become increasingly popular in recent years and are used in a variety of areas including movies, music, news, books, research articles, search queries, social tags and products in general.
Big players know their value
Recommender systems are used by some of the biggest global brands. Spotify is well known for its interest in developing and fine-tuning a system for music recommendation (they use the newest deep learning technologies to analyze articles, blog posts and the music itself). In 2006, Netflix announced a $1 million contest (the Netflix Prize) to improve its suggestion system, adding another big name to the world of recommender systems. TripAdvisor and Amazon also use recommender systems, employing entire in-house teams that work to produce the best possible item matches for their customers. With these popular brands making recommendations a priority, consumers are getting used to being offered good recommendations, and so recommendation systems are becoming the new norm for high-end companies.
Figure: Growing size of customer base of Netflix and Spotify
Smaller and more traditional businesses are turning to recommendation systems as well. As one example, the University of Illinois recently conducted an experiment to implement a recommendation system for Chicago's Polisterol firm. After the system was implemented, the enterprise's sales increased by 18%.
How do recommender systems work?
The general idea behind almost all data-driven recommendation systems is that similar clients should like similar things. Most algorithms try to find good representations of both customers and products that preserve the closeness property: similar clients and similar products should have representations that are close to each other. Let's go through different ways of finding such meaningful representations.
Figure: a graph of a simplified representation for clothes.
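To make the closeness idea concrete, here is a minimal sketch in plain Python. The three feature dimensions, the item vectors, and the helper names (`customer_vector`, `closest_items`) are all invented for illustration; representing a customer as the average of the items they bought is just one simple assumption, not the only way to do it.

```python
import math

# Hypothetical 3-dimensional item representations:
# (formality, warmth, sportiness), each on a 0..1 scale.
CLOTHES = {
    "suit_jacket": [0.9, 0.5, 0.0],
    "dress_shirt": [0.8, 0.2, 0.1],
    "running_shorts": [0.0, 0.1, 0.9],
    "sneakers": [0.1, 0.2, 0.8],
}


def customer_vector(purchases, catalog):
    """Represent a customer as the mean of the vectors of purchased items."""
    vectors = [catalog[item] for item in purchases]
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


def closest_items(purchases, catalog, top_n=1):
    """Rank items the customer has not bought by distance to their vector."""
    profile = customer_vector(purchases, catalog)
    candidates = [item for item in catalog if item not in purchases]
    candidates.sort(key=lambda item: math.dist(profile, catalog[item]))
    return candidates[:top_n]


print(closest_items(["running_shorts"], CLOTHES))  # ['sneakers']
print(closest_items(["suit_jacket"], CLOTHES))     # ['dress_shirt']
```

Because customers and products live in the same space, "recommend something" reduces to "find what is nearby".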
Similar customers or similar products?
Usually, we divide recommender systems into three categories based on what kind of information matters most when making a decision. If you want to suggest to your customers an item that similar clients are likely to purchase, you should try collaborative filtering methods. Most collaborative filtering methods use data mining techniques to analyze customers' transaction and rating history, find similar users, and provide suggestions based on their preferences.
Figure: An example of a collaborative filtering recommendation.
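A user-based variant of collaborative filtering can be sketched as follows. This is a toy illustration under stated assumptions: the `RATINGS` data is made up, similarity is the cosine over items both users rated, and unseen items are scored by a similarity-weighted sum of other users' ratings. Production systems use more careful normalization and far larger data.

```python
import math

# Hypothetical rating history: user -> {item: rating on a 1..5 scale}.
RATINGS = {
    "alice": {"jazz_album": 5, "rock_album": 1, "blues_album": 4},
    "bob": {"jazz_album": 4, "rock_album": 1, "blues_album": 5, "soul_album": 5},
    "carol": {"jazz_album": 1, "rock_album": 5, "metal_album": 5},
}


def user_similarity(a, b):
    """Cosine similarity computed over the items both users have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)


def recommend(user, ratings, top_n=1):
    """Score unseen items by a similarity-weighted sum of others' ratings."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        similarity = user_similarity(ratings[user], other_ratings)
        for item, rating in other_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + similarity * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


print(recommend("alice", RATINGS))  # ['soul_album']
```

Alice's tastes match Bob's much more closely than Carol's, so Bob's favorite that Alice has not heard yet wins.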
Providing your clients with recommendations of items similar to the ones they bought or liked in the past is the foundation of content-based methods. For example, if you are a fan of a particular music genre, you are likely to listen to the newest trending artist playing the music you like. Products like clothing or furniture might be matched based on visual or style similarity or on shared tags.
Figure: An example of a content-based recommendation.
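For tag-based matching, a content-based recommender can be sketched in a few lines. The catalog and its tags below are hypothetical, and Jaccard similarity (shared tags divided by all tags) is one simple choice among many possible similarity measures.

```python
# Hypothetical furniture catalog: item -> set of descriptive tags.
CATALOG = {
    "oak_table": {"wood", "rustic", "dining"},
    "oak_chair": {"wood", "rustic", "seating"},
    "glass_desk": {"glass", "modern", "office"},
    "steel_lamp": {"metal", "modern", "lighting"},
}


def jaccard(a, b):
    """Share of tags two items have in common (0 = none, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_similar(liked, catalog, top_n=1):
    """Rank unseen items by their best tag overlap with any liked item."""
    scores = {}
    for item, tags in catalog.items():
        if item in liked:
            continue
        scores[item] = max(jaccard(tags, catalog[seen]) for seen in liked)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


print(recommend_similar(["oak_table"], CATALOG))  # ['oak_chair']
```

Note that no other customer's data is needed here: the recommendation depends only on item descriptions and this one client's history.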
Some recommendation systems mix both approaches by taking into account both product and customer similarity. This strategy is called a hybrid approach, and nowadays it is the most common way of building a recommender system because it utilizes all of the available information. If you are looking to implement a recommendation system, going for a hybrid approach will keep you up to date and will probably yield the best results.
Where does the data come from?
Traditionally, two types of data are collected to build a suggestion engine. The first type is referred to as explicit data: sources where you store the transaction and rating history of your customers and use it to find interesting and useful patterns. This kind of data is relatively easy to collect and analyze because it is so straightforward. However, the biggest problem with it is that you usually need to maintain an enormous amount of it and use a lot of parallel computation to collect the needed statistics. To cope with that, a variety of matrix decomposition methods and model-based data compression techniques (such as Bayesian networks or restricted Boltzmann machines) are used.
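One of the simplest ways to build a hybrid is to blend the scores of a collaborative model and a content-based model. The sketch below assumes both models already produce scores on a common 0..1 scale; the score values, the `hybrid_ranking` helper, and the `alpha` weight are illustrative assumptions rather than a standard recipe.

```python
def hybrid_ranking(collab_scores, content_scores, alpha=0.5):
    """Blend two score dictionaries; alpha is the collaborative weight."""
    items = set(collab_scores) | set(content_scores)
    blended = {
        item: alpha * collab_scores.get(item, 0.0)
        + (1 - alpha) * content_scores.get(item, 0.0)
        for item in items
    }
    return sorted(blended, key=blended.get, reverse=True)


# Hypothetical scores from two separately built models.
collab = {"soul_album": 0.9, "metal_album": 0.2}
content = {"metal_album": 0.9, "folk_album": 0.4}

print(hybrid_ranking(collab, content)[0])             # metal_album
print(hybrid_ranking(collab, content, alpha=1.0)[0])  # soul_album
```

An item missing from one model simply contributes a zero from that side, so the blend can still rank products that only one source knows about.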
Figure: A matrix with rating information - an example of an explicit data source.
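As an illustration of matrix decomposition on such a rating matrix, the sketch below factorizes the known ratings into short user and item vectors with stochastic gradient descent. The ratings, the hyperparameters (`k`, `lr`, `reg`, `epochs`), and the function names are assumptions chosen for a toy example; it is not tuned and not the specific method any of the companies above use.

```python
import math
import random

# Hypothetical known ratings as (user index, item index, rating 1..5).
# Users 0-1 prefer items 0-1; users 2-3 prefer items 2-3.
KNOWN = [
    (0, 0, 5), (0, 1, 4), (0, 2, 1),
    (1, 0, 4), (1, 1, 5), (1, 3, 2),
    (2, 0, 1), (2, 2, 5), (2, 3, 4),
    (3, 1, 2), (3, 2, 4), (3, 3, 5),
]


def factorize(ratings, n_users, n_items, k=2, epochs=500, lr=0.01, reg=0.02, seed=0):
    """Learn k-dimensional user (P) and item (Q) factors by SGD."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def predict(P, Q, u, i):
    """Predicted rating is the dot product of user and item factors."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))


def rmse(ratings, P, Q):
    """Root mean squared error over the given ratings."""
    total = sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings)
    return math.sqrt(total / len(ratings))


P, Q = factorize(KNOWN, n_users=4, n_items=4)
print("training RMSE:", round(rmse(KNOWN, P, Q), 3))
print("predicted rating of user 0 for unseen item 3:", round(predict(P, Q, 0, 3), 2))
```

The compression is the point: instead of storing every cell of a huge, mostly empty matrix, you keep a handful of numbers per user and per item and reconstruct any rating on demand.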
The second type of data is called implicit data. Here you collect a lot of meta information about both clients and products to improve the way your system works. The collection methods are usually quite sophisticated (e.g., gathering behavioral clues from browsing history, or checking whether two songs are similar in style), and finding interesting features in loads and loads of additional data takes this to a far more complex level. One of the best examples of implicit data collection is the Spotify recommender, which uses advanced deep learning techniques to match similar songs and state-of-the-art NLP algorithms to analyze blog posts and music articles to improve its music recommendations.
Figure: An embedding of music tracks obtained using deep learning. An example of how Spotify uses implicit sources of information.
To match or to predict?
When it comes to determining what to recommend, the machine's decisions are usually obtained either by looking for the most similar cases and making the decision based on their analysis, or by predicting the best outcome for a new example.
The first method, called similarity-based, usually makes use of some variation of a nearest-neighbors classifier. Different kinds of metrics are used to define closeness (e.g., in practice the cosine distance often provides better predictions than the classical Euclidean distance). The downside of this method is that it is not clear how to turn an object's neighborhood into a decision, so it is usually not obvious how to draw correct conclusions.
This issue is not a problem in the second approach, often referred to as prediction-based recommendation. Here your model provides you with either a decision or a score, which makes final recommendations much easier to obtain. The most common ML algorithms used for such tasks are decision trees and tree ensembles such as random forests or gradient-boosted trees (like XGBoost), as they are not only relatively efficient but also fairly easy to understand and analyze once a model is trained.
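The choice of metric can change who counts as a neighbor. In the sketch below the play counts are invented: `heavy_fan` has the same taste as the target listener but listens ten times as much, while `casual_other` has a similar volume but different taste. The function names are illustrative.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance; sensitive to overall activity volume."""
    return math.dist(a, b)


def cosine_distance(a, b):
    """1 - cosine similarity; compares direction (taste), not volume."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def nearest_neighbors(target, others, distance, k=1):
    """Return the names of the k users closest to the target vector."""
    ranked = sorted(others, key=lambda name: distance(target, others[name]))
    return ranked[:k]


# Hypothetical play counts per genre: (jazz, rock, blues).
LISTENERS = {"heavy_fan": [20, 0, 10], "casual_other": [1, 2, 1]}
target = [2, 0, 1]

print(nearest_neighbors(target, LISTENERS, euclidean_distance))  # ['casual_other']
print(nearest_neighbors(target, LISTENERS, cosine_distance))     # ['heavy_fan']
```

Euclidean distance is dominated by how much someone listens, whereas cosine distance looks only at the proportions, which is often closer to what "similar taste" means.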
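To show what "a decision or a score" looks like, here is a hand-rolled decision stump, i.e. a decision tree with a single split. It is a deliberately tiny stand-in for the tree-based libraries mentioned above, not their API; the features, the training data, and the function names are hypothetical.

```python
# Hypothetical training data. Features per (customer, item) pair:
# [past purchases in the item's category, 1 if the item is on sale else 0].
ROWS = [[0, 1], [1, 0], [3, 1], [4, 0], [5, 1]]
BOUGHT = [0, 0, 1, 1, 1]  # 1 = the customer bought the item


def fit_stump(rows, labels):
    """Find the single feature/threshold split with the fewest errors."""
    best = None
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
            if not left or not right:
                continue
            # Each leaf predicts its majority class; count the minority.
            errors = (min(sum(left), len(left) - sum(left))
                      + min(sum(right), len(right) - sum(right)))
            if best is None or errors < best["errors"]:
                best = {
                    "feature": feature,
                    "threshold": threshold,
                    "errors": errors,
                    "left_score": sum(left) / len(left),
                    "right_score": sum(right) / len(right),
                }
    return best


def predict_score(stump, row):
    """Score = share of buyers in the leaf the row falls into."""
    if row[stump["feature"]] <= stump["threshold"]:
        return stump["left_score"]
    return stump["right_score"]


stump = fit_stump(ROWS, BOUGHT)
print(stump["feature"], stump["threshold"])  # 0 1
print(predict_score(stump, [4, 0]))          # 1.0
print(predict_score(stump, [0, 1]))          # 0.0
```

The learned rule reads directly as "customers with more than one past purchase in the category tend to buy", which is the kind of interpretability that makes tree models attractive.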
What is the outcome of a recommender engine?
The most common tasks solved by recommender systems are:
- Suggestions - where your model aims to provide you with the best suggestions for a given product and customer,
- Predictions - if you only need to decide whether a given item would be interesting to your client,
- Ratings and reviews - when you want to apply your system to measure general quality or preference.
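The three tasks above can all be read off the same underlying model scores. The sketch below assumes a model that outputs a predicted preference on a 1..5 scale for each item; the `SCORES` values, the threshold, and the helper names are made up for illustration.

```python
# Hypothetical model output for one customer: item -> predicted preference.
SCORES = {"item_a": 4.6, "item_b": 2.1, "item_c": 3.9}


def suggest(scores, top_n=2):
    """Suggestions: the top-N items by predicted preference."""
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


def is_interesting(scores, item, threshold=3.5):
    """Predictions: a yes/no decision for a single item."""
    return scores[item] >= threshold


def star_rating(scores, item):
    """Ratings: the score rounded and clipped to a 1..5 star scale."""
    return min(5, max(1, round(scores[item])))


print(suggest(SCORES))                   # ['item_a', 'item_c']
print(is_interesting(SCORES, "item_b"))  # False
print(star_rating(SCORES, "item_a"))     # 5
```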
As you can see, there are many ways to build and use a recommender system for your company.
Recommendation systems are still a very new technology, and there is a limited number of people globally who correctly understand how they work. Mastering recommendation systems takes hard work and extensive experience in the area of AI. Additionally, as recommender systems become more and more complex, there is a constant need for specialists whose knowledge has grown alongside these advancements.
Curious about leveraging recommendation systems in your business? Drop us a line!