Azure upload fails #11

xtrojak · 2023-10-19T07:51:08Z

For some reason, Azure often fails and returns 400 Bad Request with the message "The file submitted couldn't be parsed. This can be due to one of the following reasons: the file format is not supported (Supported formats include JPEG, PNG, BMP, PDF and TIFF), the file is corrupted or password protected." This has been reported in several libraries using the same API (e.g. here and here).

A workaround seems to be to upload the image somewhere reachable by URL and call ComputerVisionClient.read instead of ComputerVisionClient.read_in_stream. Here is a description of how to upload an image to Google Drive and get a shareable link to it (we can delete the file afterwards):

Google Cloud Project Setup:
- Go to the Google Cloud Console.
- Create a new project.
- Navigate to the Dashboard and then to the APIs & Services > Library.
- Search for the Drive API and enable it.
- Navigate to the Credentials tab and click on Create Credentials. Choose OAuth 2.0 Client ID.
- Choose Desktop App and create the credentials.
- Download the credentials file (a JSON file).

Install the necessary Python libraries:

pip install --upgrade google-api-python-client google-auth-httplib2 google-auth-oauthlib

Now, you can use the following script to perform the desired operations:

import os.path
import pickle

# Request is needed by creds.refresh(); MediaFileUpload by the upload step below
from google.auth.transport.requests import Request
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build
from googleapiclient.http import MediaFileUpload

# Setup the Drive v3 API
SCOPES = ['https://www.googleapis.com/auth/drive']

creds = None

# Check if token exists
if os.path.exists('token.pickle'):
    with open('token.pickle', 'rb') as token:
        creds = pickle.load(token)

# If there are no valid credentials available, prompt the user to log in
if not creds or not creds.valid:
    if creds and creds.expired and creds.refresh_token:
        creds.refresh(Request())
    else:
        flow = InstalledAppFlow.from_client_secrets_file('path_to_your_downloaded_credentials.json', SCOPES)
        creds = flow.run_local_server(port=0)
    
    # Save the credentials for the next run
    with open('token.pickle', 'wb') as token:
        pickle.dump(creds, token)

service = build('drive', 'v3', credentials=creds)

# Upload file
file_metadata = {
    'name': 'Your_PDF_Name.pdf'
}
media = MediaFileUpload('path_to_your_pdf_file.pdf', mimetype='application/pdf')
file = service.files().create(body=file_metadata, media_body=media, fields='id').execute()

file_id = file.get('id')

# Share the file and get shareable link
def get_shareable_link(file_id):
    permissions = {
        'role': 'reader',
        'type': 'anyone'
    }
    service.permissions().create(fileId=file_id, body=permissions).execute()

    return f"https://drive.google.com/uc?export=download&id={file_id}"

print("Shared Link:", get_shareable_link(file_id))

# Delete the file
input("Press Enter to delete the file...")
service.files().delete(fileId=file_id).execute()
print("File Deleted!")

Note: Replace 'path_to_your_downloaded_credentials.json' with the path to your downloaded JSON file from the Google Cloud Console, and 'path_to_your_pdf_file.pdf' with the path to your actual PDF file.

Also, be careful with sharing files using the 'type': 'anyone' setting as it makes the file publicly accessible.
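One way to limit how long the file stays public is to tie the sharing and the deletion together, so the file is removed even if the OCR call in between fails. The following is only a sketch: `temporary_public_link` is a hypothetical helper name, and it assumes `service` is the Drive v3 client built in the script above and that the file has already been uploaded.

```python
from contextlib import contextmanager


@contextmanager
def temporary_public_link(service, file_id):
    """Share a Drive file publicly, yield its download URL, then delete the file."""
    # Same permission body as in the script above: anyone with the link can read.
    service.permissions().create(
        fileId=file_id,
        body={'role': 'reader', 'type': 'anyone'},
    ).execute()
    try:
        yield f"https://drive.google.com/uc?export=download&id={file_id}"
    finally:
        # Runs even if the body of the `with` block raises an exception.
        service.files().delete(fileId=file_id).execute()
```

It could be used as `with temporary_public_link(service, file_id) as url: ...`, with the OCR request made inside the `with` block.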

Remember to place your credentials.json in the same directory as the script or adjust the path accordingly.
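With a shareable link in hand, the URL-based call mentioned at the top could look roughly like the sketch below. It assumes `client` is an already authenticated ComputerVisionClient from the azure-cognitiveservices-vision-computervision package; the helper names, the polling interval and the timeout are made up for illustration and are not part of this project.

```python
import time


def operation_id_from_location(operation_location):
    """Return the last path segment of the Operation-Location header."""
    return operation_location.rstrip('/').split('/')[-1]


def read_text_from_url(client, image_url, poll_interval=1.0, timeout=60.0):
    """Run the Read operation on a publicly reachable URL and return the text lines."""
    # raw=True exposes the response headers, which carry the operation URL.
    response = client.read(image_url, raw=True)
    operation_id = operation_id_from_location(response.headers['Operation-Location'])

    deadline = time.monotonic() + timeout
    while True:
        result = client.get_read_result(operation_id)
        # status is a string-valued enum in the SDK; normalise it to a lowercase string.
        status = str(getattr(result.status, 'value', result.status)).lower()
        if status not in ('notstarted', 'running'):
            break
        if time.monotonic() > deadline:
            raise TimeoutError("Read operation did not finish in time")
        time.sleep(poll_interval)

    if status != 'succeeded':
        raise RuntimeError(f"Read operation ended with status {status!r}")
    return [line.text
            for page in result.analyze_result.read_results
            for line in page.lines]
```

The Read operation is asynchronous, so the result has to be polled; the timeout is only there so that an unresponsive service cannot block the caller forever.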


xtrojak · 2023-10-19T08:53:55Z

The image still needs to be within Azure's size limits.

To get the size of the image in bytes, we can try this:

from io import BytesIO

# `image` is assumed to be an already opened PIL.Image.Image.
img_file = BytesIO()
# Save to an in-memory buffer in the format that will be uploaded, so the
# measured size matches what the service receives.
# Pillow's quality='keep' is only valid for JPEG images, so it is not passed for PNG.
image.save(img_file, 'png')
image_file_size = img_file.tell()  # size of the encoded image in bytes
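Once the encoded size is known, it can be compared with the service limits before anything is uploaded. The sketch below uses a hypothetical helper, `check_image_limits`; its default limits (4 MB, sides between 50 and 10000 pixels) are assumptions that are not stated in this thread and should be checked against the Azure documentation for the pricing tier in use.

```python
def check_image_limits(size_bytes, width, height,
                       max_bytes=4 * 1024 * 1024,
                       min_side=50, max_side=10000):
    """Return a list of problems; an empty list means the image is within limits."""
    # The default limits are assumed values, not taken from this project.
    problems = []
    if size_bytes > max_bytes:
        problems.append(f"file is {size_bytes} bytes, limit is {max_bytes}")
    if min(width, height) < min_side:
        problems.append(f"shortest side is {min(width, height)} px, minimum is {min_side}")
    if max(width, height) > max_side:
        problems.append(f"longest side is {max(width, height)} px, maximum is {max_side}")
    return problems
```

With a Pillow image this could be called as `check_image_limits(image_file_size, *image.size)`, since `image.size` is a `(width, height)` tuple.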

xtrojak mentioned this issue Oct 19, 2023

Deal with uncooperative service #12

Closed

xtrojak mentioned this issue Oct 19, 2023

Improve OCR services handling #13

Merged

xtrojak mentioned this issue Nov 17, 2023

Size limit of image for Azure #50

Closed

xtrojak mentioned this issue Jan 3, 2024

Use compressed images #69

Closed
