Raw Data to Good Data using NLP and Cloud services

The Challenge

Our client’s team works with a set of files (PDF documents, audio recordings and images) and needs to extract the underlying text and information to turn them into valuable business assets.

The Outcome

Raw data obtained in different formats has been converted into structured, annotated text that is ready for further use in business applications.

How we did it

Running a processing step whenever a user uploads a PDF, audio or image file to a location suitable for processing, producing a text file with the transcription as a result.
Identifying the transcription files and generating annotations for natural language processing (NLP) analysis.
Storing the information in a system to facilitate further processing.
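
The first step above can be sketched as a small upload handler. This is a minimal illustration only, not the client's implementation: the extension mapping, the step names, the `-transcriptions` bucket suffix and the function names `choose_extractor` and `on_upload` are all assumptions made for the example. The event is assumed to be a dictionary carrying the `bucket` and `name` of the uploaded object.

```python
import os

# Hypothetical mapping from file extension to the extraction step to run.
EXTRACTORS = {
    ".pdf": "document_text_extraction",
    ".wav": "speech_to_text",
    ".flac": "speech_to_text",
    ".mp3": "speech_to_text",
    ".png": "image_text_extraction",
    ".jpg": "image_text_extraction",
    ".jpeg": "image_text_extraction",
}


def choose_extractor(object_name):
    """Return the extraction step for a file, or None if it is unsupported."""
    extension = os.path.splitext(object_name)[1].lower()
    return EXTRACTORS.get(extension)


def on_upload(event):
    """Plan the processing of one uploaded file.

    `event` is assumed to be a dict with the `bucket` and `name` of the
    uploaded object. Returns None for unsupported file types.
    """
    extractor = choose_extractor(event["name"])
    if extractor is None:
        return None
    stem = os.path.splitext(event["name"])[0]
    return {
        "source": f"gs://{event['bucket']}/{event['name']}",
        "extractor": extractor,
        # Hypothetical naming convention for the transcription output.
        "transcription": f"gs://{event['bucket']}-transcriptions/{stem}.txt",
    }
```

Keeping the routing decision in a plain function makes it easy to test without any cloud dependency; the actual extraction call would be made where the plan is consumed.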

We moved the information from outside the system into a storage system, processed it across several computing units, and wrote the partial results to an object storage service (Google Cloud Storage) and to an analytics database service (Google BigQuery).
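
As a rough sketch of what a partial result could look like before it is stored, the example below builds one annotation record per transcription and serializes the records as newline-delimited JSON, a format that both object storage and an analytics database can take in. The field names (`source_uri`, `processed_at`, `char_count`, `entities`) and the helper names are assumptions for illustration; the real schema is not described here.

```python
import json
from datetime import datetime, timezone


def annotation_row(source_uri, text, entities, processed_at=None):
    """Build one record describing a transcription and its annotations.

    `entities` is an iterable of (name, type) pairs. All field names are
    hypothetical.
    """
    processed_at = processed_at or datetime.now(timezone.utc)
    return {
        "source_uri": source_uri,
        "processed_at": processed_at.isoformat(),
        "char_count": len(text),
        "entities": [{"name": name, "type": kind} for name, kind in entities],
    }


def to_ndjson(rows):
    """Serialize records as newline-delimited JSON, one record per line."""
    return "\n".join(json.dumps(row, sort_keys=True) for row in rows) + "\n"
```

Writing the same serialized records to both destinations keeps the object store and the analytics tables consistent with each other.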

We set up automation for the client so that the environments can be deployed from code, which allows the system to be deployed to different Google Cloud Platform projects.
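
One way to picture deployment from code is a script that derives every environment-specific value from the target project. The sketch below only builds the deployment command and does not run it. It assumes the `gcloud functions deploy` CLI is used, which the project description does not confirm; the project IDs, the function name `extract_text`, the entry point `on_upload` and the `-raw-files` bucket suffix are hypothetical.

```python
import shlex


def deploy_command(project_id, function_name="extract_text", runtime="python311"):
    """Build (but do not run) a deploy command for one target project."""
    # Hypothetical convention: one input bucket per project.
    bucket = f"{project_id}-raw-files"
    return [
        "gcloud", "functions", "deploy", function_name,
        f"--project={project_id}",
        f"--runtime={runtime}",
        f"--trigger-bucket={bucket}",
        "--entry-point=on_upload",
        "--source=.",
    ]


if __name__ == "__main__":
    # Hypothetical project IDs; the same code targets every environment.
    for project in ("client-dev", "client-prod"):
        print(shlex.join(deploy_command(project)))
```

Because the project ID is the only input, adding a new environment means adding one more project to the list rather than repeating manual setup.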

Speech-to-Text
Cloud Functions
Google Cloud Storage
Google BigQuery

Shall We Talk?

If you need to know more specifics or are interested in having us assist your organization, please use the form below.