Extracting Information from Business Card with Google API

Introduction

Do you have business cards?
In Japanese business culture (I’m Japanese), business cards, called “meishi”, are essential. But once you have collected a lot of them, managing them becomes a chore, so I wanted an easier way.

I decided to extract the information from a business card by photographing it with a smartphone. The processing flow is as follows:

 

 

If you give the application a business card image, it extracts the person’s name, the company name, and the address. To extract this information, I use the Google Cloud Vision API and the Natural Language API because they are easy to use and perform well.

Let’s extract information from a business card by combining the Google Cloud Vision API and the Natural Language API. I call the APIs from Python.

Steps to create

Step 0: Preparation

Step 1: Detecting text with Google Vision API

Step 2: Extracting person name, company name and address with Natural Language API

Step 3: Integrating Vision API and Natural Language API

Step 0: Preparation

To create the application, install the required libraries, download the repository, and set the API key. I assume that you already have a Google API key. If you don’t have one, get it from here.

Installing Libraries

Execute the following commands to install the required libraries.

$ pip install requests
$ pip install pyyaml

Download repository

I have prepared a repository in advance; it works once you fill in the necessary places. Please download the repository from the link below.

Set API Key

Write your Google API key in the configuration file (plugins/config/google.yaml).
Open google.yaml in an editor and replace xxx with your API key.

token: xxx
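
The repository code that reads this file is not shown in this article, so the following is only a minimal sketch of how such a configuration could be loaded with PyYAML. The name load_token is a hypothetical helper, not something taken from the repository.

```python
import yaml


def load_token(config_text):
    """Return the value of the `token` key from YAML text (hypothetical helper)."""
    config = yaml.safe_load(config_text) or {}
    token = config.get('token')
    # 'xxx' is the placeholder shown above; treat it as "not set yet".
    if not token or token == 'xxx':
        raise ValueError('Set your API key in plugins/config/google.yaml')
    return token


# A string is parsed here so the example runs anywhere; with the real file
# you would pass open('plugins/config/google.yaml').read() instead.
print(load_token('token: my-api-key'))
```

Failing early on the placeholder value makes a forgotten key easier to spot than an error returned later by the API.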

Step 1: Detecting text with Google Vision API

First, I describe the Vision API. Then we write a script that uses the API and run it.

What is Vision API?

Backed by machine learning models, the Google Cloud Vision API lets you develop applications that can recognize and understand the contents of images. The Cloud Vision API has the following features:

  • Image Classification (e.g. “Yacht”, “Lion”, “Eiffel Tower”)
  • Face Detection
  • Text Detection
  • Logo Detection
  • Landmark Detection
  • Safe Search Detection
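
Each feature in the list is selected through the type field of a request. As a rough sketch, the helper below builds the JSON body for one image and several feature types. The name build_annotate_request and the short keys in FEATURE_TYPES are hypothetical; the type strings are the ones I understand the v1 images:annotate endpoint to use, so check them against the API reference before relying on them.

```python
import base64

# Feature type strings for the features listed above
# (assumed from the v1 API documentation).
FEATURE_TYPES = {
    'classification': 'LABEL_DETECTION',
    'face': 'FACE_DETECTION',
    'text': 'TEXT_DETECTION',
    'logo': 'LOGO_DETECTION',
    'landmark': 'LANDMARK_DETECTION',
    'safe_search': 'SAFE_SEARCH_DETECTION',
}


def build_annotate_request(image_bytes, features, max_results=1):
    """Build the JSON body for images:annotate (hypothetical helper)."""
    return {
        'requests': [{
            # The API expects the image as a base64-encoded string.
            'image': {'content': base64.b64encode(image_bytes).decode()},
            'features': [
                {'type': FEATURE_TYPES[name], 'maxResults': max_results}
                for name in features
            ],
        }]
    }


body = build_annotate_request(b'raw image bytes', ['text', 'logo'])
print([f['type'] for f in body['requests'][0]['features']])
```

For business cards, only the text feature is needed, which is what the rest of this article uses.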

You can use the Vision API for free for up to 1,000 requests per month.

Writing the script

To use the Vision API, write a script. Edit the script in plugins/apis/vision.py as follows and save it. Please use UTF-8 for the character encoding.

# -*- coding: utf-8 -*-
import base64
import requests


def detect_text(image_file, access_token=None):

    with open(image_file, 'rb') as image:
        base64_image = base64.b64encode(image.read()).decode()

    url = 'https://vision.googleapis.com/v1/images:annotate?key={}'.format(access_token)
    header = {'Content-Type': 'application/json'}
    body = {
        'requests': [{
            'image': {
                'content': base64_image,
            },
            'features': [{
                'type': 'TEXT_DETECTION',
                'maxResults': 1,
            }]

        }]
    }
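    response = requests.post(url, headers=header, json=body).json()
    # An image without text, or a failed request, has no 'textAnnotations'.
    results = response.get('responses', [{}])
    annotations = results[0].get('textAnnotations', []) if results else []
    # The first annotation holds the full detected text.
    text = annotations[0]['description'] if annotations else ''
    return text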

Given an image file path and an API key, the detect_text function extracts the text in the image file. Let’s run it to check.
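
The response handling is the part most likely to fail in practice, so here is a small sketch of it in isolation. The sample dictionaries are hand-written to mimic the shape of a Vision API response (they are not real API output), and first_text is a hypothetical helper name.

```python
def first_text(response):
    """Return the full detected text, or '' if nothing was detected."""
    results = response.get('responses', [])
    if not results:
        return ''
    annotations = results[0].get('textAnnotations', [])
    # The first annotation carries the whole text block.
    return annotations[0]['description'] if annotations else ''


# Hand-written samples, not real API output.
with_text = {'responses': [{'textAnnotations': [
    {'description': 'John Smith\nCapsule Corporation'},
    {'description': 'John'},
]}]}
no_text = {'responses': [{}]}
failed = {'error': {'code': 400, 'message': 'API key not valid.'}}

for sample in (with_text, no_text, failed):
    print(repr(first_text(sample)))
```

Returning an empty string in every failure case keeps the caller simple, at the cost of hiding why nothing came back; logging the error field would be a reasonable extension.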

Running the script

Let’s run the script we just wrote.
First, move to the plugins/tests folder. It contains test_vision.py, which calls the detect_text function, so if everything works it returns the text in the image.

Let’s try it with example_en.png in the data folder.

$ python test_vision.py data/example_en.png

You should get the following text as the result:

John Smith.
Capsule Corporation
217-767-8187
1332 Spring Street Elwin Illinois

Step 2: Extracting person name, company name and address with Natural Language API

Second, I describe the Natural Language API. Then we write a script that uses the API and run it.

What is the Natural Language API?

The Google Cloud Natural Language API exposes machine learning models through an easy-to-use REST API to recognize the structure and meaning of text. The Natural Language API has the following functions:

  • Entity Recognition (personal name, organization name, event information, etc.)
  • Sentiment Analysis (emotion of comments on products, opinions of consumers, etc.)
  • Syntax Analysis

You can use the Natural Language API for free for up to 5,000 requests per month.

Writing the script

Write a script to use the Natural Language API. Edit the script in plugins/apis/language.py as follows. Please save it after editing. Please use UTF-8 for the character encoding.

# -*- coding: utf-8 -*-
import requests


def extract_entities(text, access_token=None):

    url = 'https://language.googleapis.com/v1/documents:analyzeEntities?key={}'.format(access_token)
    header = {'Content-Type': 'application/json'}
    body = {
        "document": {
            "type": "PLAIN_TEXT",
            "language": "EN",
            "content": text
        },
        "encodingType": "UTF8"
    }
    response = requests.post(url, headers=header, json=body).json()
    return response


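def extract_required_entities(text, access_token=None):
    entities = extract_entities(text, access_token)
    required_entities = {'ORGANIZATION': '', 'PERSON': '', 'LOCATION': ''}
    # An error response has no 'entities' key, so default to an empty list.
    for entity in entities.get('entities', []):
        t = entity['type']
        if t in required_entities:
            # Separate multiple entities of the same type with a space.
            required_entities[t] = (required_entities[t] + ' ' + entity['name']).strip()

    return required_entities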

Given text and an API key, the extract_entities function extracts various named entities. This time, however, we only need the company name, the person’s name, and the location, so the extract_required_entities function keeps just those.
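
To see what the filtering does without calling the API, the sketch below applies the same idea to a hand-written response. The sample only mimics the entities list of an analyzeEntities response (real responses carry more fields), and pick_entities is a hypothetical stand-in for extract_required_entities. Joining several names of one type with a space is a choice made for this sketch.

```python
def pick_entities(response, wanted=('ORGANIZATION', 'PERSON', 'LOCATION')):
    """Collect entity names by type from an analyzeEntities-style response."""
    picked = {t: [] for t in wanted}
    for entity in response.get('entities', []):
        if entity['type'] in picked:
            picked[entity['type']].append(entity['name'])
    # Join multiple names of one type with a space.
    return {t: ' '.join(names) for t, names in picked.items()}


# Hand-written sample, not real API output.
sample = {'entities': [
    {'name': 'John Smith', 'type': 'PERSON'},
    {'name': 'Capsule Corporation', 'type': 'ORGANIZATION'},
    {'name': '217-767-8187', 'type': 'OTHER'},
    {'name': 'Spring Street Elwin Illinois', 'type': 'LOCATION'},
]}

print(pick_entities(sample))
```

The phone number is dropped because its type is not in the wanted list; adding more types to wanted would keep more fields.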

Let’s run it to check.

Running the script

Let’s run the script we just wrote.
First, move to the plugins/tests folder. It contains test_language.py, which calls the extract_required_entities function, so if everything works it returns the company name, the person’s name, and the address in the text.

Let’s try it with example_en.txt in the data folder. example_en.txt contains the text recognized from the image above.

$ python test_language.py data/example_en.txt

You should get the following result:

{'LOCATION': 'Spring Street Elwin Illinois', 'PERSON': 'John Smith', 'ORGANIZATION': 'Capsule Corporation'}

Step 3: Integrating Vision API and Natural Language API

Finally, we write a script that combines the Vision API and the Natural Language API, then run it to confirm that it works. Let’s start by writing the script.

Writing the script

Write a script to combine Vision API and Natural Language API. Edit the script in plugins/apis/integration.py as follows. Please save it after editing. Please use UTF-8 for the character encoding.

# -*- coding: utf-8 -*-
from .language import extract_required_entities
from .vision import detect_text


def extract_entities_from_img(img_path, access_token):

    text = detect_text(img_path, access_token)
    entities = extract_required_entities(text, access_token)

    return entities

Given an image file path and an API key, the extract_entities_from_img function recognizes and returns the company name, the person’s name, and the address in the image file. Let’s run it to check.
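
The flow is simply two calls in sequence, which can be tried without an API key. The sketch below is a variant that takes the two steps as parameters; fake_ocr and fake_ner are hypothetical stand-ins that return canned values in place of detect_text and extract_required_entities. Skipping the second call when no text is found is an addition of this sketch, not part of the script above.

```python
def extract_entities_from_img(img_path, access_token, ocr, ner):
    """Run OCR, then entity extraction, on one image (sketch with injected steps)."""
    text = ocr(img_path, access_token)
    # Skip the second API call when no text was found on the card.
    if not text:
        return {'ORGANIZATION': '', 'PERSON': '', 'LOCATION': ''}
    return ner(text, access_token)


# Stand-ins for detect_text and extract_required_entities; they return
# canned values so the example runs without an API key or network access.
def fake_ocr(img_path, access_token):
    return 'John Smith\nCapsule Corporation'


def fake_ner(text, access_token):
    person, organization = text.split('\n')
    return {'ORGANIZATION': organization, 'PERSON': person, 'LOCATION': ''}


print(extract_entities_from_img('card.png', 'dummy-key', fake_ocr, fake_ner))
```

Passing the steps in as arguments also makes it easy to test the pipeline without spending any of the free monthly requests.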

Running the script

Let’s run the script.
First, move to the plugins/tests folder. It contains test_integration.py, which calls the extract_entities_from_img function, so if everything works it returns the company name and the other fields found in the image.

Let’s try it with example_en.png in the data folder.

$ python test_integration.py data/example_en.png

You should get the following result:

{'LOCATION': 'Spring Street Elwin Illinois', 'PERSON': 'John Smith', 'ORGANIZATION': 'Capsule Corporation'}

Conclusion

We built a business card information extractor with the Google Vision API and the Natural Language API. However, it is hard to use as it stands. In the future, I would like to integrate the extractor into Slack.
