More and more companies collect enormous amounts of data hoping to bring new insights and increase the marginal value of the enterprise - yet many of them are doing it the wrong way.

In this article, we provide a technical overview of how recommender systems work and review state-of-the-art research on them.

Why are they important? Companies leverage big data in many ways: establishing a better pricing policy, determining how many items to keep in stock and developing an effective recommendation system, to name only a few.

Recommendation systems are one of the most efficient ways to increase an enterprise's turnover. They are algorithms built on big data that seek to predict the rating or preference a user would give to an item. Recommendation systems have become increasingly popular in recent years and are used in a variety of areas, including movies, music, news, books, research articles, search queries, social tags and products in general.

Big players know their value

Spotify is well known for its interest in developing and fine-tuning its music recommendation system (it uses the newest deep learning technologies to analyze articles, blog posts and the music itself). A couple of years ago, Netflix announced a $1 million contest to improve its recommendation system. TripAdvisor and Amazon also have in-house teams dedicated to producing the best possible item matches for their customers.

Figure: Growing size of customer base of Netflix and Spotify

But what about smaller and, in particular, more traditional businesses? Recently, the University of Illinois conducted an experiment with a polisterol firm from Chicago. After the firm implemented a recommendation system, its sales increased by an additional 18%.

In this article, we take a look at how recommendation engines are usually built. In our next post, we will turn to the business side: how exactly your firm could benefit from using one in practice. We will analyze a variety of business cases where recommender systems were crucial for increasing a company's sales.

How do recommender systems work?

The general idea behind almost all data-driven recommendation systems is that similar clients should like similar things. Most algorithms try to find good representations of both customers and products that preserve the closeness property: similar clients and products should have representations that are close to each other. Let's go through different ways of finding such meaningful representations.

Figure: A simplified representation of clothing items. Image from: VBPR: Visual Bayesian Personalized Ranking from Implicit Feedback
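The closeness idea can be illustrated with a minimal sketch. It assumes hand-picked two-dimensional vectors whose axes, formality and sportiness, are hypothetical, as are all item names and the customer vector. Real systems learn such vectors from data. Closeness is measured here with cosine similarity, which is one common choice among several.

```python
import numpy as np

# Hypothetical 2-D representations: [formality, sportiness].
# Real systems learn these vectors from data; here they are hand-picked.
items = {
    "suit": np.array([0.9, 0.1]),
    "blazer": np.array([0.8, 0.2]),
    "running_shoes": np.array([0.1, 0.9]),
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def most_similar(name, catalog):
    """Return the other item whose representation is closest to `name`."""
    query = catalog[name]
    others = [(other, cosine_similarity(query, vec))
              for other, vec in catalog.items() if other != name]
    return max(others, key=lambda pair: pair[1])[0]

# A hypothetical customer placed in the same space as the products.
customer = np.array([0.2, 0.8])  # mostly buys sportswear
best = max(items, key=lambda n: cosine_similarity(customer, items[n]))

print(most_similar("suit", items))  # blazer
print(best)                         # running_shoes
```

Because customers and products live in the same space, the same similarity function answers two questions: which product resembles another product, and which product suits a given customer.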

Similar customers or similar products?

We usually divide recommender systems into three categories based on which kind of information matters most when making a decision. If you want to suggest items to your customers that are likely to be bought by similar clients, you should try collaborative filtering methods. Most of these methods use data mining techniques to analyze customers' transaction and rating histories, find similar users and provide suggestions based on their preferences.

Figure: An example of a collaborative filtering recommendation.
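User-based collaborative filtering can be sketched in a few lines. The example below assumes a tiny hypothetical rating matrix in which 0 means "not rated". It predicts a missing rating as the similarity-weighted average of the ratings from other customers who rated that product. Plain cosine similarity is used for simplicity; mean-centered variants such as Pearson correlation are a common refinement.

```python
import numpy as np

# Hypothetical ratings: rows = customers, columns = products, 0 = not rated.
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 4, 1],
    [1, 1, 2, 5],
], dtype=float)

def user_similarities(R, user):
    """Cosine similarity between `user` and every customer.

    Unrated items (0) add nothing to the dot product.
    """
    norms = np.linalg.norm(R, axis=1)
    return R @ R[user] / (norms * norms[user])

def predict_rating(R, user, item):
    """Similarity-weighted average of other customers' ratings for `item`."""
    sims = user_similarities(R, user)
    raters = [u for u in range(R.shape[0]) if u != user and R[u, item] > 0]
    weights = sims[raters]
    return float(weights @ R[raters, item] / weights.sum())

pred = predict_rating(R, user=0, item=2)
print(round(pred, 2))  # 3.36
```

Customer 0 resembles customer 1 (who gave product 2 a 4) much more than customer 2 (who gave it a 2). The prediction therefore lands closer to 4.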

If you are a fan of, e.g., a particular music genre, you are likely to listen to the newest trending artist playing the music you like. Recommending items similar to the ones a customer bought or liked in the past is the basis of content-based methods. Products might be matched, e.g., by visual or style similarity (as in the case of clothing or furniture) or by similar tags.

Figure: An example of a content-based recommendation.
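A content-based recommender built on tags might look like the following minimal sketch. It assumes a hypothetical catalog where each track is described by a set of tags. Similarity is measured with the Jaccard index (shared tags divided by all tags), one simple option among many.

```python
# Hypothetical catalog: each product is described by a set of tags.
catalog = {
    "track_a": {"jazz", "piano", "instrumental"},
    "track_b": {"jazz", "saxophone", "instrumental"},
    "track_c": {"metal", "guitar", "vocals"},
    "track_d": {"jazz", "piano", "instrumental", "vocals"},
}

def jaccard(a, b):
    """Share of tags two items have in common (0 = none, 1 = identical)."""
    return len(a & b) / len(a | b)

def recommend_by_content(liked, catalog, k=2):
    """Rank unseen items by their best tag overlap with anything the user liked."""
    scores = {
        item: max(jaccard(tags, catalog[liked_item]) for liked_item in liked)
        for item, tags in catalog.items()
        if item not in liked
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend_by_content({"track_a"}, catalog))  # ['track_d', 'track_b']
```

No other customers are involved: the ranking depends only on what this user liked and how the products are described. This is what separates content-based methods from collaborative filtering.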

Of course, you could simply mix both approaches by taking into account both product and customer similarity. This strategy is called a hybrid approach, and nowadays it is the most common way of building recommender systems, since using only one source of information wastes data.
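One of the simplest hybrids is a weighted blend of two scores, sketched below. The example assumes both sources already produce scores scaled to [0, 1]. The weight `alpha` and all scores are hypothetical. Other hybridization strategies, such as switching between models or feeding one model's output into another, are also common.

```python
def hybrid_scores(cf_scores, content_scores, alpha=0.7):
    """Weighted blend: alpha * collaborative + (1 - alpha) * content-based.

    Both inputs map item -> score already scaled to [0, 1]. Items missing
    from one source (e.g. brand-new products with no ratings yet) get 0 there.
    """
    items = set(cf_scores) | set(content_scores)
    return {
        item: alpha * cf_scores.get(item, 0.0)
        + (1 - alpha) * content_scores.get(item, 0.0)
        for item in items
    }

cf = {"track_b": 0.9, "track_c": 0.4}        # hypothetical collaborative scores
content = {"track_b": 0.5, "track_d": 0.75}  # hypothetical content scores
blended = hybrid_scores(cf, content)
ranking = sorted(blended, key=blended.get, reverse=True)
print(ranking)  # ['track_b', 'track_c', 'track_d']
```

The content-based term still gives `track_d` a non-zero score even though no collaborative signal exists for it. This is one practical reason for combining both sources.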

Where does the data come from?

Classically, two types of data are collected to build a recommendation engine. From a conceptual point of view, the easiest to collect and analyze are so-called explicit sources. Here you store the transaction and rating history of your customers and use it to find interesting and useful patterns. The biggest problem with this kind of data is that you usually need to maintain an enormous dataset and use a lot of parallel computation to collect the necessary statistics. To cope with that, practitioners use a variety of matrix decomposition methods and intelligent model-based data compression techniques (e.g. Bayesian networks or Restricted Boltzmann machines).

Figure: A matrix with rating information - an example of an explicit data source.
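A rating matrix like the one in the figure can be compressed with matrix factorization, one of the decomposition methods mentioned above. The sketch below assumes a tiny hypothetical matrix (0 = not rated). It learns a short vector per customer and per product with plain stochastic gradient descent, so that their dot product approximates the observed ratings. The rank `k`, learning rate, regularization and epoch count are illustrative assumptions, not tuned values.

```python
import numpy as np

def factorize(R, k=2, lr=0.01, reg=0.02, epochs=2000, seed=0):
    """Approximate R ~ P @ Q.T using only the observed (non-zero) ratings.

    P holds one k-dimensional vector per customer, Q one per product.
    Trained with stochastic gradient descent and L2 regularization.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = rng.normal(scale=0.1, size=(n_users, k))
    Q = rng.normal(scale=0.1, size=(n_items, k))
    observed = np.argwhere(R > 0)
    for _ in range(epochs):
        for u, i in observed:
            err = R[u, i] - P[u] @ Q[i]
            p_u = P[u].copy()  # use the pre-update user vector for Q's step
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * p_u - reg * Q[i])
    return P, Q

# Hypothetical ratings: rows = customers, columns = products, 0 = not rated.
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 4, 1],
    [1, 1, 2, 5],
], dtype=float)

P, Q = factorize(R)
full = P @ Q.T                 # dense predictions, including unrated cells
mask = R > 0
rmse = float(np.sqrt(np.mean((full[mask] - R[mask]) ** 2)))
print(round(rmse, 2), round(float(full[0, 2]), 2))
```

Instead of the full matrix, only two small factor matrices need to be stored. The missing cell `full[0, 2]` comes for free as a prediction.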

Another type is so-called implicit data, where you collect a lot of meta-information about both clients and products to improve the way your system works. These methods are usually quite sophisticated. Examples include collecting behavioral clues from browsing history or comparing two songs to decide whether they are similar in style. Still, extracting interesting features from large amounts of additional data usually makes a difference. The best example is Spotify's recommender, which uses advanced deep learning techniques to match similar songs and state-of-the-art NLP algorithms to analyze blog posts and music articles to improve its recommendations.

Figure: An embedding of music tracks obtained using deep learning. An example of how Spotify uses implicit sources of information.
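Behavioral clues such as browsing history must be turned into numbers before a model can use them. The sketch below assumes a hypothetical event log and hypothetical per-event weights. It follows one well-known convention from implicit-feedback matrix factorization: a binary preference (did the user interact at all?) plus a confidence that grows with the interaction strength, c = 1 + alpha * r.

```python
from collections import defaultdict

# Hypothetical weights: how strongly each behavioral signal hints at interest.
EVENT_WEIGHTS = {"view": 1.0, "add_to_cart": 3.0, "purchase": 5.0}

def implicit_matrix(events, alpha=2.0):
    """Turn raw (user, item, event) logs into preference/confidence pairs.

    r = summed event weights; preference p = 1 if r > 0 else 0;
    confidence c = 1 + alpha * r.
    """
    strength = defaultdict(float)
    for user, item, event in events:
        strength[(user, item)] += EVENT_WEIGHTS.get(event, 0.0)
    return {
        key: {"preference": 1 if r > 0 else 0, "confidence": 1 + alpha * r}
        for key, r in strength.items()
    }

log = [
    ("ann", "sneakers", "view"),
    ("ann", "sneakers", "view"),
    ("ann", "sneakers", "purchase"),
    ("bob", "jacket", "view"),
]
signals = implicit_matrix(log)
print(signals[("ann", "sneakers")])  # {'preference': 1, 'confidence': 15.0}
```

No customer ever rated anything here. The system infers interest from behavior and trusts repeated or stronger signals more.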

To match or to predict?

The model's final decisions are usually obtained in one of two ways. Either the model looks for the most similar cases and bases the decision on them, or it predicts the best outcome for the new example directly.

The first method, called similarity-based, usually relies on some variant of the nearest-neighbors classifier. Different metrics are used to define closeness. For example, cosine distance often provides better predictions in practice than the classical Euclidean distance. The downside of these methods is that it is not always clear how to turn an object's neighborhood into a final decision, i.e. how to draw correct conclusions from the neighbors found.
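The sketch below shows why the choice of metric matters. It assumes hypothetical purchase counts per product category. A light buyer with sporty taste is compared against a heavy buyer with the same taste and a light buyer with different taste. Euclidean distance is dominated by how much people buy, while cosine distance only looks at what they buy.

```python
import numpy as np

# Hypothetical purchase counts per category: [sneakers, sportswear, books].
customers = np.array([
    [10.0, 10.0, 0.0],  # heavy buyer, sporty taste
    [0.0, 1.0, 1.0],    # light buyer, mixed taste
])
query = np.array([1.0, 1.0, 0.0])  # light buyer, sporty taste

def euclidean_distances(X, q):
    """Straight-line distance; sensitive to overall purchase volume."""
    return np.linalg.norm(X - q, axis=1)

def cosine_distances(X, q):
    """1 - cosine similarity: ignores how much a customer buys, keeps what they buy."""
    return 1 - (X @ q) / (np.linalg.norm(X, axis=1) * np.linalg.norm(q))

def nearest_neighbors(X, q, k, metric):
    """Indices of the k rows of X closest to q under the given metric."""
    return np.argsort(metric(X, q))[:k]

print(nearest_neighbors(customers, query, 1, euclidean_distances))  # [1]
print(nearest_neighbors(customers, query, 1, cosine_distances))     # [0]
```

The sketch stops at finding neighbors. Turning them into a recommendation, for example by majority vote or a similarity-weighted average, is exactly the extra decision rule that the paragraph above describes as the weak spot of this approach.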

This issue does not arise in the second approach, often referred to as prediction-based. Here, your model directly provides either a decision or a score, which makes the final recommendations much easier to obtain. The most common ML algorithms for such tasks are decision trees and tree ensembles such as random forests and gradient-boosted trees (e.g. XGBoost). They are relatively efficient and also easy to understand and analyze once a model is trained.
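The prediction-based workflow is: describe each (customer, product) pair with features, train a model on past outcomes, then rank candidates by the predicted score. The text names tree-based models, for which one would normally use a library such as scikit-learn or XGBoost. To keep this sketch dependency-light, it swaps in a plain logistic regression written with NumPy. The two features and all training data are hypothetical.

```python
import numpy as np

def add_bias(X):
    """Append a constant column so the model can learn an intercept."""
    return np.hstack([X, np.ones((len(X), 1))])

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit a logistic-regression scorer with batch gradient descent."""
    Xb = add_bias(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1 / (1 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def score(w, X):
    """Predicted probability that the customer buys the product."""
    return 1 / (1 + np.exp(-add_bias(X) @ w))

# Hypothetical (customer, product) pairs with two features:
# [product is in the customer's favorite category (0/1), discount fraction].
X_train = np.array([[1, 0.5], [1, 0.1], [0, 0.4], [0, 0.0], [1, 0.0], [0, 0.1]])
y_train = np.array([1, 1, 0, 0, 1, 0])  # 1 = bought, 0 = not bought
w = train_logistic(X_train, y_train)

candidates = np.array([[1, 0.2], [0, 0.2]])
scores = score(w, candidates)
ranking = np.argsort(-scores)  # recommend highest-scoring candidates first
print(scores, ranking)
```

Whatever model produces the score, the final step is the same: sort by score and take the top items. No neighborhood has to be interpreted.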

What is the outcome of a recommender engine?

The most common tasks solved by recommender systems are:

  • suggestions - where your model aims to provide the best matches between customers and products,
  • predictions - when you only need to decide whether a given item would interest your client,
  • rating and review - when you want to use your system to measure general quality or preferences.
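The three tasks above can often be served from the same model output. The sketch below assumes a hypothetical matrix of predicted ratings (1 to 5) and a hypothetical purchase history. It derives a top-n suggestion list, a yes/no prediction via a threshold (3.5 is an arbitrary assumption), and an average predicted rating per product as a rough quality signal.

```python
import numpy as np

# Hypothetical predicted ratings (1-5): rows = customers, columns = products.
predicted = np.array([
    [4.8, 2.1, 3.9, 1.5],
    [1.2, 4.4, 2.0, 4.9],
])
already_bought = {0: {0}, 1: {3}}  # hypothetical purchase history

def suggest(customer, n=2):
    """Suggestions: top-n products the customer has not bought yet."""
    order = [int(i) for i in np.argsort(-predicted[customer])]
    return [i for i in order if i not in already_bought[customer]][:n]

def is_interesting(customer, product, threshold=3.5):
    """Prediction: does the predicted rating clear a threshold?"""
    return bool(predicted[customer, product] >= threshold)

def product_quality():
    """Rating: average predicted rating per product as a rough quality signal."""
    return predicted.mean(axis=0)

print(suggest(0))            # [2, 1]
print(is_interesting(0, 2))  # True
print(product_quality())
```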

Conclusion

As you can see, there are many ways to build and use a recommender system for your purposes. But not everything is as easy as it looks.

Recommendation systems are a relatively new field, and only a limited number of people worldwide truly understand how they work. Mastering recommendation systems takes hard work and determination. Additionally, as recommender systems become more and more sophisticated, there is a constant need to keep knowledge up to date.

Curious about leveraging recommendation systems in your business? Do drop us a line!