A picture of an AI chatbot interacting with a user who is buying sushi

If your business has a large customer service operation and you want to make it more efficient, it's time to think about introducing chatbots. In this blog post, we'll cover some standard methods that any B2C company can use.

Introduction

Any customer-facing business needs customer service. Usually it is a phone line, a chat, or both. However, modern customers expect contact to be easy and problems to be resolved quickly. This fix-it-now attitude leads to a lot of frustration with helplines and with consultants who seem not to care about the customer.

More and more companies are introducing chat as a means of contact with their clients. It gives customers more freedom, since they can pick up a conversation and end it whenever they want. Ideally, a chat would be available 24/7 and would let customers resolve all of their issues.

These goals are evident at the telecom companies I have worked for. They all handle a large volume of daily chats between customers and consultants, and they all want to make the whole process more efficient. My goal here is to investigate how to do that.

Automation and data preparation

We want to automate the process. There are two options: either you try to replace all consultants with a chatbot, or you empower your consultants with answer suggestions provided by an AI. Here we concentrate on the first option, which is the more demanding one.

So we are set: our goal is to build a chatbot that can converse satisfactorily with the customers of your telecom company.

In any task involving machine learning, the first step is to prepare the data. In our case, we assume that the input consists of thousands of chats between clients and consultants. To build an effective chatbot, we need to teach the machine which words and phrases matter to our business; this is called creating an ontology.

Pre-processing consists of tokenizing, stemming and lemmatizing our chats, which can usually be done with the freely available NLTK toolkit. The result is a normalized form of each chat, on top of which parse trees can be built. Such pre-processing is needed to bring grammar into the machine's understanding, and it helps to identify misspellings within the texts.
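As a rough illustration of this step, here is a minimal sketch in plain Python. The regular-expression tokenizer and the tiny suffix-stripping `naive_stem` function are simplified stand-ins written for this post, not NLTK's API; in practice you would rely on NLTK's own tokenizers, stemmers and lemmatizers.

```python
import re

# Hypothetical, very short suffix list; a real stemmer has many more rules.
SUFFIXES = ("ing", "ed", "es", "s")

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def naive_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(chat_line):
    """Tokenize a chat line and reduce every token to a crude stem."""
    return [naive_stem(token) for token in tokenize(chat_line)]

print(preprocess("My invoices are not showing the roaming charges"))
# ['my', 'invoic', 'are', 'not', 'show', 'the', 'roam', 'charg']
```

Even this crude version maps "showing" and "show" to the same token, which is the point of stemming: the model sees fewer distinct word forms.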

Chatbots

There are two models for a chatbot:

Retrieval-based model – uses a repository of predefined responses. We choose an appropriate one based on the context, following a given heuristic, which can be either very simple or quite complex (as we discuss later).

Generative model – does not rely on predefined responses. It learns to produce answers from scratch using deep learning.

Needless to say, a generative model is harder to get working well, and given the current state of knowledge it is far from perfect (the Turing test has not been passed yet!). That is, you won't be able to build a perfect chatbot for every sort of task. However, in our case, a chatbot for customer service in telecoms, we can do much better because the task is narrower.

The advantage of a retrieval-based model is that it won't make grammatical mistakes. However, it is also rigid, and thus unlikely to appear "human". Generative models are not guaranteed to appear "human" either, but they can adapt better to surprising demands and questions from customers.

To see whether a retrieval-based model will be enough for you, you can perform a statistical analysis of typical requests. Simply filtering chats by length shows whether requests tend to be complicated or not. The idea is that short conversations can be fully automated. In longer conversations, a machine needs to keep track of what was said earlier in the conversation, which is often surprisingly difficult.

Let's get back to our case. Say you've performed a quick statistical analysis and it turned out that around 65% of your chats are short (say, 10 lines or fewer). You therefore opt for a retrieval-based model, so let's see what we can do.
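The length analysis itself can be very small. Below is a sketch that assumes each chat is stored as a list of lines; the function name `short_chat_share`, the toy data and the 10-line threshold are illustrative choices, not a fixed recipe.

```python
def short_chat_share(chats, max_lines=10):
    """Return the fraction of chats that have at most `max_lines` lines."""
    if not chats:
        return 0.0
    short = sum(1 for chat in chats if len(chat) <= max_lines)
    return short / len(chats)

# Toy data: each chat is a list of lines (messages).
chats = [
    ["Hi", "How can I help?", "What is my balance?", "It is 20 EUR."],
    ["line"] * 25,
    ["Hello", "Please reset my PIN", "Done."],
    ["line"] * 12,
]

share = short_chat_share(chats)
print(f"{share:.0%} of chats are short")  # 50% of chats are short
```

On real data you would probably look at the whole distribution of lengths rather than a single threshold, but one number is often enough for a first decision.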


Deep learning

Through data preparation we already have a sufficient corpus of text on which a machine can learn. Our model will take as input a context (the conversation with a client, including all prior sentences) and output a potential answer based on it. It is worth mentioning that Google's Smart Reply [1] also picks its suggestions from a predefined set of responses; that work describes how to suggest replies to emails automatically.
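Before moving on to neural networks, it may help to see the retrieval idea in its simplest form. The sketch below is a plain TF-IDF baseline in pure Python, not a deep learning model: it compares the context with stored questions and returns the answer attached to the closest one. The repository contents and all function names are made up for illustration.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def build_idf(documents):
    """Smoothed inverse document frequency for every word in the corpus."""
    n = len(documents)
    df = Counter(word for doc in documents for word in set(tokenize(doc)))
    return {word: math.log((1 + n) / (1 + count)) + 1.0 for word, count in df.items()}

def tfidf(text, idf):
    """Sparse TF-IDF vector; words unseen in the corpus are ignored."""
    counts = Counter(word for word in tokenize(text) if word in idf)
    return {word: count * idf[word] for word, count in counts.items()}

def cosine(u, v):
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical repository: past customer questions paired with approved answers.
REPOSITORY = [
    ("how do i check my balance", "You can check your balance in the app under 'Account'."),
    ("i want to change my tariff plan", "Sure, which plan would you like to switch to?"),
    ("my internet is not working", "Please restart your router and tell me if it helps."),
]
IDF = build_idf([question for question, _ in REPOSITORY])

def best_response(context):
    """Return the answer whose stored question is most similar to the context."""
    query = tfidf(context, IDF)
    scored = [(cosine(query, tfidf(question, IDF)), answer) for question, answer in REPOSITORY]
    return max(scored, key=lambda pair: pair[0])[1]

print(best_response("Hello, my internet stopped working"))
```

A learned model replaces the hand-written similarity with one trained on real conversations, but the overall shape (score every candidate, return the best one) stays the same.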

To process our data, which consists of words, it is better to transform it into numbers, since machines ultimately operate on numbers. There are a couple of ways to do that. We mentioned constructing a vocabulary earlier. This can be done, for example, by turning words into vectors. For example, word2vec [3] learns a vocabulary from training data (our chats) and then associates a vector with each word in the training data. Interestingly, those vectors capture many linguistic properties, for example:

vector('Paris') - vector('France') + vector('Italy') ≈ vector('Rome')

For more such examples, see word2vec [3]. Another way to create a vocabulary and turn our text corpus into vectors is to use TensorFlow and TensorFlow's Example format; a tutorial on how to use it can be found in Deep Learning for Chatbots [4]. Yet another tool for word representation is GloVe.
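The Paris/Rome relation can be reproduced on a toy scale. The vectors below are hand-made 3-dimensional stand-ins chosen so that the arithmetic works out; real word2vec or GloVe vectors are learned from data and have hundreds of dimensions. The `analogy` helper is our own name, not part of any library.

```python
import math

# Hypothetical hand-made embeddings, for illustration only.
VECTORS = {
    "Paris":  [0.9, 0.1, 0.8],
    "France": [0.9, 0.1, 0.1],
    "Rome":   [0.1, 0.9, 0.8],
    "Italy":  [0.1, 0.9, 0.1],
    "Berlin": [0.5, 0.5, 0.8],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def analogy(a, b, c):
    """Return the word closest to vector(a) - vector(b) + vector(c)."""
    target = [x - y + z for x, y, z in zip(VECTORS[a], VECTORS[b], VECTORS[c])]
    # The three input words are excluded from the candidates.
    candidates = [word for word in VECTORS if word not in (a, b, c)]
    return max(candidates, key=lambda word: cosine(target, VECTORS[word]))

print(analogy("Paris", "France", "Italy"))  # Rome
```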

The next step is to actually 'teach' our chatbot what to answer. Here comes the deep learning part. We want to train a machine to answer in a given context. In particular, it should be able to answer something even in previously unseen contexts. To do that, we need to build the right architecture. Here we have a plethora of choices, as this is an actively researched domain of AI. In a way, building a chatbot is similar to building a translator: answering a given question is similar to translating from one language to another. The methods used for both tasks are also similar.

First of all, TensorFlow comes with a seq2seq module (see the seq2seq tutorial [2]), which consists of two recurrent neural networks (RNNs): an encoder that processes the input and a decoder that generates the output. Here is a standard architecture (A, B, C are inputs to the encoder; 'go', W, X, Y, Z are inputs to the decoder):

A sketch of the seq2seq encoder-decoder architecture built from recurrent neural networks
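To make the encoder-decoder data flow concrete, here is a minimal sketch in plain Python. It is not TensorFlow's seq2seq API: it uses simple tanh RNN cells, a made-up six-token vocabulary and untrained random weights, so the tokens it produces are arbitrary. It only shows how the encoder's final state is handed to the decoder, which then generates one token at a time starting from the 'go' symbol.

```python
import math
import random

VOCAB = ["<go>", "<eos>", "hello", "balance", "thanks", "bye"]  # hypothetical vocabulary
HIDDEN = 4
random.seed(0)

def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

# Untrained, randomly initialised parameters (training is out of scope here).
EMBED = rand_matrix(len(VOCAB), HIDDEN)   # one vector per token
W_ENC = rand_matrix(HIDDEN, 2 * HIDDEN)   # encoder weights over [embedding; state]
W_DEC = rand_matrix(HIDDEN, 2 * HIDDEN)   # decoder weights over [embedding; state]
W_OUT = rand_matrix(len(VOCAB), HIDDEN)   # projects the decoder state to token scores

def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def rnn_step(weights, token, state):
    """One vanilla RNN step: new_state = tanh(W [embedding; state])."""
    x = EMBED[VOCAB.index(token)] + state  # list concatenation
    return [math.tanh(z) for z in matvec(weights, x)]

def encode(tokens):
    state = [0.0] * HIDDEN
    for token in tokens:
        state = rnn_step(W_ENC, token, state)
    return state  # the final state summarises the whole input

def decode(state, max_len=5):
    token, output = "<go>", []
    for _ in range(max_len):
        state = rnn_step(W_DEC, token, state)
        scores = matvec(W_OUT, state)
        token = VOCAB[scores.index(max(scores))]  # greedy choice of the next token
        if token == "<eos>":
            break
        output.append(token)
    return output

reply = decode(encode(["hello", "balance"]))
print(reply)  # arbitrary tokens, since the weights are untrained
```

In a trained model the same loop is used at inference time; only the weights differ, and the simple cells are usually replaced by LSTM cells.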

We have a choice of cell for each RNN. A standard one is the LSTM cell, where LSTM stands for long short-term memory. Citing The Ubuntu Dialogue Corpus [6]: "LSTMs were introduced in order to model longer-term dependencies. This is accomplished using a series of gates that determine whether a new input should be remembered, forgotten (and the old value retained), or used as output. The error signal can now be fed back indefinitely into the gates of the LSTM unit. This helps overcome the vanishing and exploding gradient problems in standard RNNs, where the error gradients would otherwise decrease or increase at an exponential rate." For a discussion of other possibilities (standard RNN or TF-IDF) see [6], and especially Table 4 for a comparison of the different choices.

LSTM composition
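The gates mentioned in the quotation can be sketched as a single LSTM step in plain Python. This follows the common textbook formulation of an LSTM cell; the layout of `params`, the tiny sizes and the random weights are assumptions made for illustration, and a real project would use a library implementation instead.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM step. `params` maps a gate name to (weight matrix, bias vector)."""
    z = x + h_prev  # concatenate the input with the previous hidden state

    def gate(name, activation):
        weights, bias = params[name]
        return [activation(s + b) for s, b in zip(matvec(weights, z), bias)]

    f = gate("forget", sigmoid)        # how much of the old cell state to keep
    i = gate("input", sigmoid)         # how much of the new candidate to store
    o = gate("output", sigmoid)        # how much of the cell state to expose
    g = gate("candidate", math.tanh)   # proposed new content
    c = [ft * cp + it * gt for ft, cp, it, gt in zip(f, c_prev, i, g)]
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
    return h, c

random.seed(1)
INPUT, HIDDEN = 3, 2
params = {
    name: (
        [[random.uniform(-1, 1) for _ in range(INPUT + HIDDEN)] for _ in range(HIDDEN)],
        [0.0] * HIDDEN,
    )
    for name in ("forget", "input", "output", "candidate")
}

h, c = [0.0] * HIDDEN, [0.0] * HIDDEN
for x in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]):
    h, c = lstm_step(x, h, c, params)
print(h, c)
```

When the forget gate is close to 1 and the input gate is close to 0, the cell state passes through a step almost unchanged, which is how the old value is retained over many time steps.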

Another possible architecture is the Dual Encoder described in Deep Learning for Chatbots [4].

LSTM composition of Dual Encoder architecture

Let us describe briefly how it works:

  1. Both the context and the answer are turned into vectors (sequences of word vectors). We did this during the data processing phase.
  2. Both sequences of "embedded words" are fed into the RNN word by word. This gives us two further vectors that capture the meaning of the context and of the answer (call them c for the context and a for the answer). We decide how large those vectors should be.
  3. We multiply c by a matrix M to produce a predicted answer a'. The matrix M is learned during training (it holds our weights).
  4. We measure how close the predicted answer a' is to the actual answer a, for example with a dot product. We then apply a squashing function (e.g. the sigmoid) to convert this measurement into a probability.
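Steps 3 and 4 can be sketched in a few lines of plain Python, assuming the similarity is measured with a dot product as in the Dual Encoder tutorial [4]. The 2-dimensional encodings and the identity matrix used for M are hypothetical values picked to keep the arithmetic easy to follow; in the real model the encodings come from the RNN and M is learned.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def match_probability(c, a, M):
    """Probability that answer encoding `a` is a good reply to context encoding `c`."""
    predicted = matvec(M, c)                           # step 3: predicted answer M c
    score = sum(p * q for p, q in zip(predicted, a))   # step 4: dot product with the actual answer
    return sigmoid(score)                              # squash the score into (0, 1)

M = [[1.0, 0.0], [0.0, 1.0]]   # identity for illustration; learned in practice
context = [2.0, -1.0]
good_answer = [1.5, -0.5]      # points in a similar direction to the context
bad_answer = [-1.5, 0.5]       # points in the opposite direction

p_good = match_probability(context, good_answer, M)
p_bad = match_probability(context, bad_answer, M)
print(round(p_good, 3), round(p_bad, 3))  # 0.971 0.029
```

During training, M and the RNN weights would be adjusted so that true context-answer pairs get probabilities near 1 and mismatched pairs get probabilities near 0.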

Of course, this is just the tip of the iceberg. Having an architecture is one thing, but tuning it to work well on our problem is a completely different matter. Tuning is also very time-consuming and needs to be handled with care.

Let us also mention that chatbots can be classified as vertical or horizontal. Vertical chatbots are closed-domain chatbots focused on a particular application; this is our case, with a chatbot for customer service in telecoms. Horizontal chatbots are general, open-domain ones like Siri, Alexa or Google Assistant. Again, a vertical chatbot is easier to build than a horizontal one.

One of the issues you will run into with vertical chatbots is the out-of-vocabulary problem: you should add all technical terms relevant to your business or application to your training set. In general, however, seq2seq provides a simple way to build unsupervised vertical chatbots (see Unsupervised Deep Learning for Vertical Conversational Chatbots [5]).
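One common way to handle out-of-vocabulary words is to keep frequent tokens plus an explicit list of domain terms, and to map everything else to a special unknown token. The sketch below assumes tokenized chats; the `<unk>` marker, the `min_count` threshold and the example terms are illustrative choices.

```python
from collections import Counter

UNK = "<unk>"

def build_vocabulary(tokenized_chats, domain_terms=(), min_count=2):
    """Keep frequent tokens plus all domain terms; everything else maps to UNK."""
    counts = Counter(token for chat in tokenized_chats for token in chat)
    vocab = {token for token, n in counts.items() if n >= min_count}
    vocab.update(domain_terms)
    vocab.add(UNK)
    return vocab

def replace_oov(tokens, vocab):
    """Replace tokens that are not in the vocabulary with the UNK marker."""
    return [token if token in vocab else UNK for token in tokens]

chats = [
    ["my", "roaming", "is", "off"],
    ["my", "sim", "is", "blocked"],
]
vocab = build_vocabulary(chats, domain_terms={"esim", "roaming"})

print(replace_oov(["my", "esim", "is", "broken"], vocab))
# ['my', 'esim', 'is', '<unk>']
```

Here "esim" survives even though it never appears in the training chats, because it was supplied as a domain term, while the rare word "broken" is replaced.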

Conclusion

In this blog post we have covered some possible chatbot architectures that a B2C company can use, taking telecom companies as an example. We think that any company that handles a large volume of customer service on a daily basis should consider implementing a chatbot to make the process more efficient. Not only does it cut costs in the long term, it also gives customers 24/7 availability and freedom, which results in satisfaction. And that's the most important thing of all.


References:

[1] Smart Reply (Google): https://arxiv.org/pdf/1606.04870.pdf

[2] seq2seq tutorial: https://www.tensorflow.org/tutorials/seq2seq

[3] word2vec: https://code.google.com/archive/p/word2vec/

[4] Deep Learning for Chatbots: http://www.wildml.com/2016/07/deep-learning-for-chatbots-2-retrieval-based-model-tensorflow/

[5] Unsupervised Deep Learning for Vertical Conversational Chatbots: https://chatbotsmagazine.com/unsupervised-deep-learning-for-vertical-conversational-chatbots-c66f21b1e0f

[6] The Ubuntu Dialogue Corpus: https://arxiv.org/pdf/1506.08909.pdf