Build your own Search Engine

Post by Dr. Rutu Mulkar-Mehta. In this post, I will take you through the steps for calculating the values for all the words in a given document. To implement this, we use a small dataset (or corpus, as NLPers like to call it) from the Project Gutenberg catalog. This is just a simple toy example […]

The Math behind Lucene

Lucene is an open-source search engine that you can run on top of custom data to create your own search engine, like your own personal Google! In this post, we will go over the basic math behind Lucene and how it ranks documents against the input search query. THE BASICS – TF*IDF The […]

Docker Deep Learning – GPU-accelerated Keras

Machine Learning consulting companies should also be adept at software engineering, right? In this post, I’ll show you how to prepare a Docker container that can run an already-trained Neural Network (NN). This can be helpful if you want to distribute your work across multiple machines or send it to a client, along with […]

Docker Swarm – how and when to use it?

In this post, I’m going to analyze the role of Docker in each stage of the application lifecycle and highlight the cases where you should consider moving to Swarm. It’s not an introductory tutorial, but I’ll create one in the future. Development with Docker: Docker really made my life much easier. Let’s just say, […]

Elasticsearch: 10 Advices to Get Started

We recently finished an innovative, data-driven project based on Elasticsearch. The aim was to find similarities between objects across sets. The sets were static but large (90+ million records), and search had to be fast (nearly instant), so Elasticsearch was the best choice. During the project, I […]

SSLForFree – Setting up SSL with NGINX and LetsEncrypt

We all know that sweet green padlock in our browsers, which means that our connection to a website is secure. Don’t underestimate an encrypted connection! Without it, all data is sent in plain text. That is very dangerous if your site handles secret data, like passwords or emails. Recently, my colleague from work logged into our office router. […]

Git Hooks – Automatic Code Quality Checks

We all strive to write high-quality code. Every language allows us to run some quality checks or automatic unit tests. But even the best tests won’t help if they aren’t run often. Remember! If something takes too much time or effort, people will avoid it! The solution? We can reverse that! Let’s make automatic tests effortless, […]

Tensorflow AWS setup – proper setup of version 1.0

After long development, Google released the first stable version of its Machine Learning library, TensorFlow. The release is an important milestone in the development of a common Machine Learning toolkit. TensorFlow provides a set of primitives from which Machine Learning engineers and researchers can construct trainable models — as well as a framework to run these computations […]

How to automatically fill PDF forms using Python and pdfrw

Recently at Sigmoidal we had a curious case of filling PDF forms for our users, who can print the forms out, pre-filled by us, and use them. We had plenty of those forms to set up, so we needed an efficient way of doing it. Solution 0: Putting Texts In Python. The simplest solution goes like this: Take […]