I remember an advertising slogan that went: “Learn mathematics, physics and chemistry well, and you can travel the world without fear.” I feel it needs one more line: “And bring your ID card.” In this article, we will look at how to use Python to extract the information on an ID card.
Implementation approaches
There are roughly two ways to implement this:
- Build your own wheel, for example by writing the recognition code yourself with OpenCV. Every feature has to be implemented by hand, which is time-consuming and labor-intensive. The advantage is greater flexibility.
- Use an off-the-shelf wheel such as Baidu Cloud. The platform has already implemented the core functionality and exposes it through an API that we can call directly. This saves time and effort, but may be less flexible.
Because the feature we want is relatively simple, we will demonstrate the second approach. Let’s take a brief look at the implementation process.
The Baidu Cloud SDK supports multiple languages. Here we install the Python version with the command pip install baidu-aip. The SDK directory structure is as follows:
    ├── README.md
    ├── aip                   // SDK directory
    │   ├── __init__.py       // export class
    │   ├── base.py           // aip base class
    │   ├── http.py           // http request
    │   └── ocr.py            // OCR
    └── setup.py              // setuptools installation
Create an app
After the SDK is installed, we need to create an application. This requires a Baidu account or a Baidu Cloud account. If you don’t have one, you can register at the login and registration address: https://login.bce.baidu.com/?redirect=http%3A%2F%2Fcloud.baidu.com%2Fcampaign%2Fcampus-2018%2Findex.html. The process is essentially the same as for license plate recognition. If anything is unclear, you can read the article on license plate recognition.
First, let’s find a picture of an ID card, as shown in the figure:
Now for the code. First, create an AipOcr instance. AipOcr is the OCR client in the Python SDK. The code is as follows:
    from aip import AipOcr

    # Your own APP ID, API key (AK) and secret key (SK)
    APP_ID = 'own App ID'
    API_KEY = 'own Api Key'
    SECRET_KEY = 'own Secret Key'

    # Create the OCR client
    client = AipOcr(APP_ID, API_KEY, SECRET_KEY)
For how to obtain these three values, see the introduction in the license plate recognition article.
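Hard-coding the three values is fine for a quick test, but it is easy to commit them by accident. As a minimal sketch, they could be read from environment variables instead. The variable names and the load_credentials helper below are hypothetical and chosen only for illustration; they are not part of the SDK.

```python
import os

# Hypothetical environment variable names; use whatever fits your setup.
ENV_NAMES = ("BAIDU_OCR_APP_ID", "BAIDU_OCR_API_KEY", "BAIDU_OCR_SECRET_KEY")


def load_credentials(env=os.environ):
    """Return (app_id, api_key, secret_key), or raise if any value is missing."""
    missing = [name for name in ENV_NAMES if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return tuple(env[name] for name in ENV_NAMES)
```

With the variables set, the client could then be created with client = AipOcr(*load_credentials()).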
There are two modes of information extraction: normal mode and high-precision mode. The normal mode code is as follows:
    # Open the image and read its content as bytes
    with open("card.jpg", "rb") as f:
        image = f.read()

    res = client.basicGeneral(image)  # normal mode

    # Iterate over the recognized lines and print each one
    for tex in res["words_result"]:
        row = tex["words"]
        print(row)
The output is as follows:
    Name Wei Xiaobao
    gender male ethnic
    Born December 20, 1654
    Address No. 4 Jingshan Front Street, Dongcheng District, Beijing Forbidden City Reverend Room
    Citizen ID number 112441654122 2438
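The loop above assumes the response always contains "words_result". As a hedged sketch, the helper below checks for a failure first. It assumes that a failed request comes back as a dictionary with "error_code" and "error_msg" keys rather than raising an exception; extract_lines is a hypothetical helper name, not an SDK function.

```python
def extract_lines(res):
    """Return the recognized text lines from an OCR response dictionary."""
    # Assumption: failures are reported through an "error_code" key.
    if "error_code" in res:
        message = res.get("error_msg", "unknown error")
        raise RuntimeError(f"OCR request failed ({res['error_code']}): {message}")
    # Each entry of "words_result" holds one recognized line under "words".
    return [item["words"] for item in res.get("words_result", [])]


# Hand-written sample response, used here instead of a real API call.
sample = {"words_result": [{"words": "first line"}, {"words": "second line"}]}
for row in extract_lines(sample):
    print(row)
```

In the real script, the call would be extract_lines(client.basicGeneral(image)).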
Now let’s try the high-precision mode. The code is as follows:
    # Open the image and read its content as bytes
    with open("card.jpg", "rb") as f:
        image = f.read()

    res = client.basicAccurate(image)  # high-precision mode

    # Iterate over the recognized lines and print each one
    for tex in res["words_result"]:
        row = tex["words"]
        print(row)
The output is as follows:
    Name Wei Xiaobao
    gender male ethnic
    Born December 20, 1654
    Address No. 4 Jingshan Front Street, Dongcheng District, Beijing Forbidden City Reverend Room
    Citizen ID number 11204416541220243X
From the output, we can see that the high-precision mode extracts the correct ID number, while the ID number extracted in normal mode contains some errors.
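A recognition error like the one in normal mode can often be caught automatically. The sketch below is an extra step that is not part of the SDK: it checks that a number has 18 characters (17 digits plus a digit or X) and verifies the check character using the ISO 7064 MOD 11-2 scheme defined for citizen ID numbers in GB 11643. All function names are hypothetical. The sample card in this article is fictional, so its number is only expected to pass the format check, not the checksum.

```python
import re

# Weights for the first 17 digits and the check characters indexed by (sum % 11).
WEIGHTS = (7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2)
CHECK_CODES = "10X98765432"


def normalize(raw):
    """Remove whitespace that OCR may insert and upper-case the check character."""
    return "".join(raw.split()).upper()


def has_valid_format(id_number):
    """True if the string is 17 digits followed by a digit or X."""
    return re.fullmatch(r"[0-9]{17}[0-9X]", id_number) is not None


def has_valid_checksum(id_number):
    """True if the format is valid and the last character matches the checksum."""
    if not has_valid_format(id_number):
        return False
    total = sum(int(digit) * weight for digit, weight in zip(id_number, WEIGHTS))
    return CHECK_CODES[total % 11] == id_number[17]


# The two numbers from the outputs above.
print(has_valid_format(normalize("112441654122 2438")))   # normal mode result
print(has_valid_format(normalize("11204416541220243X")))  # high-precision result
```

One possible use is to run the normal mode first and call the high-precision mode only when the format check fails.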
In this article, we used Python together with the Baidu Cloud API to extract ID card information in just a few lines of code. Besides ID cards, information from other cards, such as bank cards, can also be extracted. If you are interested, give it a try.