Language: Python
Runtime version: 2.7

Other requirements: This code uses the Google Vision API to detect handwritten text in documents, so a Google Cloud service account must be created for this purpose. This service account will be used in API calls to Google Cloud.

Step 1:

Follow the instructions at the link below to create a service account. Once the service account key file is created, copy the file. https://docs.bmc.com/docs/PATROL4GoogleCloudPlatform/10/creating-a-service-account-key-in-the-google-cloud-platform-project-799095477.html

The service account needs the following privileges:

1. Storage bucket admin: to create/modify buckets/items
2. Privileges to create/use machine learning models on Google Cloud

Once the service account is created, download the .json file containing the credentials used to access the Cloud API.
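Before pointing the code at the downloaded file, it can help to confirm that it looks like a service account key. The sketch below is not part of this repository; `check_service_account_file` and `REQUIRED_KEYS` are hypothetical names, and the check only covers a few fields that Google Cloud service account key files normally contain. It uses only the standard library and should run on Python 2.7 as well as Python 3.

```python
import json

# A few fields that a Google Cloud service account key file normally contains.
# This list is an assumption for illustration, not an exhaustive schema.
REQUIRED_KEYS = ("type", "project_id", "private_key", "client_email")


def check_service_account_file(path):
    """Return a list of problems found in the key file (empty if none)."""
    with open(path) as handle:
        data = json.load(handle)

    problems = []
    for key in REQUIRED_KEYS:
        if not data.get(key):
            problems.append("missing field: " + key)

    # A service account key declares its type explicitly.
    key_type = data.get("type")
    if key_type and key_type != "service_account":
        problems.append("type is %r, expected 'service_account'" % key_type)
    return problems
```

For example, `check_service_account_file("testkey.json")` returns an empty list when the basic fields are present.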

Step 2:

Update the values below in config.py (or keep the default values):

################# Config section ############ (config.py file)

BUCKET_NAME="mldata101" -- Bucket for storing data temporarily
INPUT_BUCKET_PATH="/input/" -- Input directory name in the above bucket
OUTPUT_BUCKET_DIR="/output/" -- Output directory name in the above bucket
GCP_SERVICE_AUTH_FILE="testkey.json" -- Service account file created in Google Cloud

####################################################
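As a rough sketch, the same settings written as valid Python might look like the block below. The four values are copied from the config section above. The `gcs_uri` helper is hypothetical, added only to illustrate how the bucket and directory names could combine into a `gs://` path, and may not match how the repository builds its paths.

```python
# config.py (sketch): values copied from the config section above.
BUCKET_NAME = "mldata101"               # bucket for storing data temporarily
INPUT_BUCKET_PATH = "/input/"           # input directory name in the bucket
OUTPUT_BUCKET_DIR = "/output/"          # output directory name in the bucket
GCP_SERVICE_AUTH_FILE = "testkey.json"  # service account key file


def gcs_uri(bucket, directory, filename=""):
    """Hypothetical helper: join the parts into a gs:// URI."""
    parts = [bucket.strip("/"), directory.strip("/"), filename.strip("/")]
    # Skip empty parts so that no double slashes appear in the result.
    return "gs://" + "/".join(p for p in parts if p)
```

With these defaults, `gcs_uri(BUCKET_NAME, INPUT_BUCKET_PATH, "input1.pdf")` gives `gs://mldata101/input/input1.pdf`.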

Step 3:

Environment setup:

Run the commands below to set up the environment:

sudo apt update
sudo apt install python
sudo apt install python-pip
pip install -r requirements.txt

Running the code:

I have created a default service account in GCP that can be used for testing the code.
Create a directory named "data", create a folder named "001" inside it, and place all input files for testing the model in that folder.

|---data
| ----001
| ------ input1.pdf
| ------ input2.pdf
| ------ input3.pdf
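To confirm the layout before running the code, a small standard-library check along the following lines can list the PDF files in each subfolder. `find_input_pdfs` is a hypothetical helper and not part of `input_ml.py`; it assumes input files end in `.pdf`, as in the tree above.

```python
import os


def find_input_pdfs(top_level_dir):
    """Map each subfolder name (e.g. "001") to its sorted list of PDF files."""
    found = {}
    for name in sorted(os.listdir(top_level_dir)):
        folder = os.path.join(top_level_dir, name)
        if not os.path.isdir(folder):
            continue  # ignore loose files at the top level
        found[name] = sorted(
            f for f in os.listdir(folder) if f.lower().endswith(".pdf")
        )
    return found
```

For the layout shown above, `find_input_pdfs("data")` would map `"001"` to the three input files.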

Now run the code

python input_ml.py

input: Please enter top level directory of the test data: "/home/lavaraja/test/data/"
Once the run completes, the output will be available in the output directory inside the data folder.

Thanks.
