Introduction
Do you have business cards?
In Japanese business culture (I'm Japanese), business cards (called "meishi") are essential. But when you have a lot of business cards, managing them is a chore, so I wanted an easier way to handle them.
I decided to extract the information by photographing each business card with a smartphone. The processing flow is as follows:
If you give the application a business card image, it extracts the person's name, company name, and address. For the extraction I use the Google Vision API and the Natural Language API, because they are easy to use and perform well.
Let's extract information from business cards by combining the Google Cloud Vision API and the Natural Language API. I build the application in Python to call the APIs.
Steps to create
Step 0: Preparation
↓
Step 1: Detecting text with Google Vision API
↓
Step 2: Extracting person name, company name and address with Natural Language API
↓
Step 3: Integrating Vision API and Natural Language API
Step 0: Preparation
In order to create the application, install the required libraries, download the repository, and set the API key. I assume that you already have a Google API key. If you don't, get one from here.
Installing Libraries
Execute the following commands to install the required libraries.
$ pip install requests
$ pip install pyyaml
Downloading the Repository
I have prepared the repository in advance; if you fill in the necessary places, it will work. Please download the repository from the link below.
Setting the API Key
Write Google API key in the configuration file (plugins/config/google.yaml).
First, open google.yaml in an editor and replace xxx with your API token.
token: xxx
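For reference, here is a minimal sketch of how the token can be read from google.yaml with PyYAML. The load_token helper is hypothetical; the repository may load the config differently.

```python
import yaml


def load_token(config_path='plugins/config/google.yaml'):
    # Hypothetical helper: parse "token: xxx" from the YAML config
    # file and return the token string.
    with open(config_path) as f:
        config = yaml.safe_load(f)
    return config['token']
```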
Step 1: Detecting text with Google Vision API
First, we describe the Vision API. Then we write a script that uses the API and run it.
What is the Vision API?
Google Cloud Vision API leverages powerful machine learning models to let you develop applications that recognize and understand the contents of images. The Cloud Vision API has the following features:
- Image Classification (e.g. "Yacht", "Lion", "Eiffel Tower", etc.)
- Face Detection
- Text Detection
- Logo Detection
- Landmark Detection
- Safe Search Detection
You can use the Vision API for free for up to 1,000 requests per month.
Writing the script
In order to use the Vision API, write a script. Edit plugins/apis/vision.py as follows and save it after editing. Please use UTF-8 for the character encoding.
# -*- coding: utf-8 -*-
import base64

import requests


def detect_text(image_file, access_token=None):
    # Read the image and encode it as base64 for the JSON request body.
    with open(image_file, 'rb') as image:
        base64_image = base64.b64encode(image.read()).decode()

    url = 'https://vision.googleapis.com/v1/images:annotate?key={}'.format(access_token)
    header = {'Content-Type': 'application/json'}
    body = {
        'requests': [{
            'image': {
                'content': base64_image,
            },
            'features': [{
                'type': 'TEXT_DETECTION',
                'maxResults': 1,
            }]
        }]
    }
    response = requests.post(url, headers=header, json=body).json()
    # The first textAnnotation holds the full text detected in the image;
    # return an empty string when nothing was detected.
    annotations = response['responses'][0].get('textAnnotations')
    return annotations[0]['description'] if annotations else ''
By giving an image file path and API key to the detect_text function, you can extract the text in the image file. Let's run it to check.
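You can sanity-check the response parsing in detect_text without calling the API. The parse_text helper below is hypothetical and pulls the detected text out of a response the same way detect_text does; the dictionary is a hand-written stand-in shaped like a Vision API response, not real output:

```python
def parse_text(response):
    # The first textAnnotation carries the full detected text;
    # fall back to '' when nothing was detected.
    annotations = response['responses'][0].get('textAnnotations')
    return annotations[0]['description'] if annotations else ''


# Illustrative, hand-written response (not real API output).
mock_response = {
    'responses': [{
        'textAnnotations': [{'description': 'John Smith\nCapsule Corporation'}]
    }]
}
print(parse_text(mock_response))  # prints the two detected lines
```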
Running the script
Let’s run the script we just wrote.
First, move to the plugins/tests folder. You should see test_vision.py there; it calls the detect_text function, so if everything works correctly it will print the text in the image.
Let's try it: pass example_en.png from the data folder and run it.
$ python test_vision.py data/example_en.png
Did you get the following string as the result?
John Smith.
Capsule Corporation
217-767-8187
1332 Spring Street Elwin Illinois
Step 2: Extracting person name, company name and address with Natural Language API
Next, we describe the Natural Language API. Then we write a script that uses the API and run it.
What is the Natural Language API?
Google Cloud Natural Language API applies a powerful machine learning model with an easy-to-use REST API to recognize the structure and meaning of text. The Natural Language API has the following functions:
- Entity Recognition (personal name, organization name, event information, etc.)
- Sentiment Analysis (emotion of comments on products, opinions of consumers, etc.)
- Syntax Analysis
You can use the Natural Language API for free for up to 5,000 requests per month.
Writing the script
Write a script to use the Natural Language API. Edit plugins/apis/language.py as follows and save it after editing. Please use UTF-8 for the character encoding.
# -*- coding: utf-8 -*-
import requests


def extract_entities(text, access_token=None):
    url = 'https://language.googleapis.com/v1beta1/documents:analyzeEntities?key={}'.format(access_token)
    header = {'Content-Type': 'application/json'}
    body = {
        'document': {
            'type': 'PLAIN_TEXT',
            'language': 'EN',
            'content': text
        },
        'encodingType': 'UTF8'
    }
    response = requests.post(url, headers=header, json=body).json()
    return response


def extract_required_entities(text, access_token=None):
    entities = extract_entities(text, access_token)
    # Keep only the entity types we need for a business card.
    required_entities = {'ORGANIZATION': '', 'PERSON': '', 'LOCATION': ''}
    for entity in entities['entities']:
        t = entity['type']
        if t in required_entities:
            required_entities[t] += entity['name']
    return required_entities
By giving text and an API key to the extract_entities function, you can extract various named entities. However, we only need the company name, person's name, and location this time, so the extract_required_entities function filters the results down to those.
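The filtering logic can also be checked offline. The filter_entities helper below is hypothetical and applies the same type filter as extract_required_entities; the entity list is hand-written and merely shaped like an analyzeEntities response:

```python
def filter_entities(response):
    # Collect only ORGANIZATION, PERSON and LOCATION entities.
    required = {'ORGANIZATION': '', 'PERSON': '', 'LOCATION': ''}
    for entity in response['entities']:
        if entity['type'] in required:
            required[entity['type']] += entity['name']
    return required


# Illustrative, hand-written response (not real API output).
mock_response = {'entities': [
    {'type': 'PERSON', 'name': 'John Smith'},
    {'type': 'ORGANIZATION', 'name': 'Capsule Corporation'},
    {'type': 'OTHER', 'name': '217-767-8187'},  # filtered out
]}
print(filter_entities(mock_response))
```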
Let’s run it for checking.
Running the script
Let’s run the script we just wrote.
First, move to the plugins/tests folder. You should see test_language.py there; it calls the extract_required_entities function, so if everything works correctly it will return the company name, person's name, and address found in the text.
Let's try it: pass example_en.txt from the data folder and run it. example_en.txt contains the character recognition result from the image above.
$ python test_language.py data/example_en.txt
Did you get the following string as the result?
{'LOCATION': 'Spring Street Elwin Illinois', 'PERSON': 'John Smith', 'ORGANIZATION': 'Capsule Corporation'}
Step 3: Integrating Vision API and Natural Language API
Finally, we write a script that combines the Vision API and the Natural Language API, then run it to confirm that it works. Let's start with the script.
Writing the script
Write a script that combines the Vision API and the Natural Language API. Edit plugins/apis/integration.py as follows and save it after editing. Please use UTF-8 for the character encoding.
# -*- coding: utf-8 -*-
from .language import extract_required_entities
from .vision import detect_text


def extract_entities_from_img(img_path, access_token):
    # Stage 1: OCR the image, then Stage 2: extract entities from the text.
    text = detect_text(img_path, access_token)
    entities = extract_required_entities(text, access_token)
    return entities
By giving an image file path and API key to the extract_entities_from_img function, it recognizes and returns the company name, person's name, and address in the image file. Let's run it to check.
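Since the integration is plain function composition, its control flow can be verified without API keys. The stub functions below are hypothetical stand-ins for the two real API wrappers:

```python
def detect_text_stub(img_path, access_token=None):
    # Stand-in for the Vision API wrapper: pretend OCR succeeded.
    return 'John Smith\nCapsule Corporation'


def extract_required_entities_stub(text, access_token=None):
    # Stand-in for the Natural Language API wrapper.
    return {'ORGANIZATION': 'Capsule Corporation',
            'PERSON': 'John Smith',
            'LOCATION': ''}


def extract_entities_from_img(img_path, access_token=None):
    # Same two-stage pipeline as the real script: OCR first,
    # then entity extraction on the detected text.
    text = detect_text_stub(img_path, access_token)
    return extract_required_entities_stub(text, access_token)


print(extract_entities_from_img('data/example_en.png'))
```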
Running the script
Let’s run the script.
First, move to the plugins/tests folder. You should see test_integration.py there; it calls the extract_entities_from_img function, so if everything works correctly it will return the company name and other entities found in the image.
Let's try it: pass example_en.png from the data folder and run it.
$ python test_integration.py data/example_en.png
Did you get the following string as the result?
{'LOCATION': 'Spring Street Elwin Illinois', 'PERSON': 'John Smith', 'ORGANIZATION': 'Capsule Corporation'}
Conclusion
We built an extractor that pulls information from business cards with the Google Vision API and the Natural Language API. It is still hard to use as it stands, though; in the future I would like to integrate the extractor into Slack.