Sigmoidal Success Story
document processing rate for selected cases
human error reduction
step authorization process for selected documents
What was the business problem?
The rapid growth in the amount of newly generated data makes it harder and harder to manage. Document management is as crucial for private enterprises as it is for the public sector. The amount of data that each government (both general and local) needs to store about each citizen is growing every minute.
Those growing volumes strain traditional document management solutions, which cannot keep up with them. At the same time, the public sector needs data to be more widely available, so that citizens do not have to bring the same document to two different offices and duplicated effort is avoided.
Our client from Cordoba municipality in Spain asked us to create a document management system that would optimize the business processes for its staff and the citizens.
What savings and profits does the client achieve?
- Security enhancement – we implemented a four-step verification procedure that automatically detects and protects sensitive information and prevents unauthorized access.
- Compliance with legal requirements – automatic compliance with GDPR requirements.
- Cost reduction – automated document processing and automated data extraction reduced human labor costs.
- Improved data quality – data quality is higher because the system automatically detects any inconsistencies or shortages.
- Transparency – digitized documents can be made more broadly accessible.
- Streamlined document preparation – the system can automatically generate documents in the required format or ask employees for the specific information that is missing, while automatically filling in the fields that can already be found in the system.
- Optimization of processes – streamlined processes that previously took days can now take minutes. This saves an individual employee hours or days per task.
- Organization of documents – the system makes it easier to keep records organized and allows every employee to find the right form within seconds, no matter where they are.
- Complete traceability – the system records an activity log for every user and an audit log for each document.
How did we accomplish it?
We implemented the solution in three areas of document management:
- Classification and processing
- Data extraction
- Advanced Security
Classification & Processing
When it comes to machines understanding documents, the most complicated part is unstructured data. With text documents such as .docx files, the challenge for the machine is to understand what the text means. But with unstructured data such as PDFs, audio files, and printed or handwritten text, it first has to find where the text actually is and what it says.
That is where OCR (Optical Character Recognition) comes in. Its first stage is text detection, which locates the regions of the image that contain text. Localizing the text is crucial for the second stage, text recognition, which extracts the text from those regions. Used together, these two techniques make it possible to extract text from an image.
Our next step was therefore to teach the machine to recognize text inside images and convert it into an electronic form.
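The two OCR stages can be illustrated with a deliberately tiny toy example. This is only a sketch under strong assumptions, not the system built for the client: the "image" is a grid of characters where `#` is ink, the glyph templates are invented for illustration, and recognition is plain template matching rather than a trained model. The names `detect_text_regions`, `recognize_region`, and `ocr` are hypothetical.

```python
# Toy glyph bitmaps (5 rows x 3 columns); '#' is ink. Invented for illustration.
TEMPLATES = {
    "I": ("###", ".#.", ".#.", ".#.", "###"),
    "L": ("#..", "#..", "#..", "#..", "###"),
    "T": ("###", ".#.", ".#.", ".#.", ".#."),
}


def detect_text_regions(image):
    """Stage 1, text detection: return (start, end) column spans containing ink."""
    width = len(image[0])
    has_ink = [any(row[x] == "#" for row in image) for x in range(width)]
    spans, start = [], None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x  # a new inked region begins
        elif not ink and start is not None:
            spans.append((start, x))  # the region ended at the previous column
            start = None
    if start is not None:
        spans.append((start, width))
    return spans


def recognize_region(image, span):
    """Stage 2, text recognition: match the cropped region against the templates."""
    start, end = span
    crop = tuple(row[start:end] for row in image)
    for char, bitmap in TEMPLATES.items():
        if bitmap == crop:
            return char
    return "?"  # unknown glyph


def ocr(image):
    """Run detection first, then recognition on each detected region."""
    return "".join(recognize_region(image, s) for s in detect_text_regions(image))


IMAGE = [
    ".#...###.",
    ".#....#..",
    ".#....#..",
    ".#....#..",
    ".###..#..",
]

print(ocr(IMAGE))
```

The point of the sketch is the dependency between the stages: recognition only sees the crops that detection hands over, so a missed or badly cut region cannot be recovered later. A production pipeline would replace both functions with trained models or an OCR engine.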
A big problem with document organization is that information arrives through many channels: emails, phone calls, text messages, letters, and so on. It does not come as a formatted database. Organizing all this information and pulling out the knowledge one needs takes a lot of time, but AI can extract it within seconds. The data extraction technique we used is called named entity extraction. Training a model consisted of three main steps:
- Dataset preparation: first, we identified, integrated, and prepared the data for learning. We created a dataset of text documents, loaded it, and performed basic pre-processing. The dataset was then split into a training set (on which the machine learns) and a validation set (used to evaluate whether the learning process was successful).
- Feature engineering: the raw dataset was transformed into flat features that were later used in the machine learning model.
- Model training: the machine learning model was trained on the labeled dataset. The validation set was then used to check how accurately the model classified text.
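The three steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions: the documents and labels are invented, bag-of-words counts stand in for the "flat features", and a small multinomial naive Bayes classifier stands in for the model, since the case study does not say which features or algorithm were actually used.

```python
import math
import random
from collections import Counter, defaultdict

# Hypothetical labeled documents, invented for this sketch.
DOCS = [
    ("invoice total amount due payment", "invoice"),
    ("payment invoice number amount tax", "invoice"),
    ("invoice tax total payment date", "invoice"),
    ("amount due invoice payment terms", "invoice"),
    ("permit application building construction request", "permit"),
    ("building permit request application form", "permit"),
    ("construction permit application site plan", "permit"),
    ("application form permit building site", "permit"),
]


def split_dataset(docs, validation_ratio=0.25, seed=0):
    """Step 1: shuffle and split into training and validation sets."""
    shuffled = list(docs)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - validation_ratio))
    return shuffled[:cut], shuffled[cut:]


def extract_features(text):
    """Step 2: turn raw text into flat bag-of-words counts."""
    return Counter(text.lower().split())


def train(train_set):
    """Step 3: fit a multinomial naive Bayes model on the labeled data."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in train_set:
        word_counts[label].update(extract_features(text))
        label_counts[label] += 1
    vocabulary = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocabulary


def predict(model, text):
    """Return the label with the highest log-probability for the text."""
    word_counts, label_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label, doc_count in label_counts.items():
        # Log prior plus Laplace-smoothed log likelihood of each word.
        score = math.log(doc_count / total_docs)
        denominator = sum(word_counts[label].values()) + len(vocabulary)
        for word, n in extract_features(text).items():
            score += n * math.log((word_counts[label][word] + 1) / denominator)
        scores[label] = score
    return max(scores, key=scores.get)


def accuracy(model, validation_set):
    """Share of validation documents whose predicted label is correct."""
    correct = sum(predict(model, text) == label for text, label in validation_set)
    return correct / len(validation_set)


train_set, validation_set = split_dataset(DOCS)
model = train(train_set)
print("validation accuracy:", accuracy(model, validation_set))
```

Keeping the validation documents out of training is what makes the accuracy figure meaningful: the model is scored only on text it has never seen.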
Our AI-powered system can enhance security and protect citizens’ data.
It can easily detect personally identifiable information (PII). Automatic classification and processing ensure that documents are stored in secured locations.
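As a rough illustration of PII detection followed by routing to a secured location, the sketch below uses regular expressions. The case study does not describe the actual detection method, so everything here is an assumption: the pattern set (email, Spanish DNI number, Spanish phone number), the function names `find_pii`, `redact`, and `route`, and the two storage tiers are all hypothetical.

```python
import re

# Simplified, hypothetical patterns; real PII detection needs far broader coverage.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    # Spanish DNI: eight digits followed by a letter (checksum not verified here).
    "dni": re.compile(r"\b\d{8}[A-Za-z]\b"),
    # Spanish phone: optional +34 prefix, then nine digits starting with 6-9.
    "phone": re.compile(r"(?<!\d)(?:\+34[ ]?)?[6789]\d{8}(?!\d)"),
}


def find_pii(text):
    """Return a list of (kind, matched_text) pairs found in the text."""
    findings = []
    for kind, pattern in PII_PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append((kind, match.group()))
    return findings


def redact(text):
    """Replace every detected PII value with a placeholder such as [EMAIL]."""
    for kind, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{kind.upper()}]", text)
    return text


def route(text):
    """Pick a storage tier: documents containing PII go to the secured one."""
    return "secured" if find_pii(text) else "general"


sample = "Contact Ana at ana.lopez@example.org, DNI 12345678Z, phone +34 612345678."
print(find_pii(sample))
print(redact(sample))
print(route(sample))
```

Pattern matching of this kind only catches well-formatted identifiers. Names, addresses, and free-text references to a person usually require a trained entity recognition model, which is closer to what the text above describes.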
How did we boost the project?
Our data scientists applied best practices by building reusable components for machine learning algorithms, which reduced the development time of the project.
Sigmoidal put forward our Data Privacy and Security method. By automatically classifying documents and storing them in secured locations, we strengthened the protection of massive amounts of sensitive and confidential information while adhering to federal and state privacy and security regulations.
Sigmoidal organized extensive Technical Workshops for the client's engineers. These made it much easier for them to adjust the software and redefine the infrastructure.