Azure upload fails #11

Open
xtrojak opened this issue Oct 19, 2023 · 1 comment

xtrojak commented Oct 19, 2023

For some reason, Azure often fails and returns 400 Bad Request with the message "The file submitted couldn't be parsed. This can be due to one of the following reasons: the file format is not supported (Supported formats include JPEG, PNG, BMP, PDF and TIFF), the file is corrupted or password protected." This has been reported in several libraries using the same API (e.g. here and here).

The workaround seems to be to upload the image somewhere and use ComputerVisionClient.read, which takes a URL, instead of ComputerVisionClient.read_in_stream. Here is a description of how to upload an image to Google Drive and get a shareable link to it (we can delete the file afterwards):

  1. Google Cloud Project Setup:

    • Go to the Google Cloud Console.
    • Create a new project.
    • Navigate to the Dashboard and then to the APIs & Services > Library.
    • Search for the Drive API and enable it.
    • Navigate to the Credentials tab and click on Create Credentials. Choose OAuth 2.0 Client ID.
    • Choose Desktop App and create the credentials.
    • Download the credentials file (a JSON file).
  2. Install the necessary Python libraries:

    pip install --upgrade google-api-python-client google-auth-httplib2 google-auth-oauthlib

Now, you can use the following script to perform the desired operations:

import pickle
import os.path
from google.auth.transport.requests import Request  # needed for creds.refresh()
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build
from googleapiclient.http import MediaFileUpload  # needed for the upload below

# Setup the Drive v3 API
SCOPES = ['https://www.googleapis.com/auth/drive']

creds = None

# Check if token exists
if os.path.exists('token.pickle'):
    with open('token.pickle', 'rb') as token:
        creds = pickle.load(token)

# If there are no valid credentials available, prompt the user to log in
if not creds or not creds.valid:
    if creds and creds.expired and creds.refresh_token:
        creds.refresh(Request())
    else:
        flow = InstalledAppFlow.from_client_secrets_file('path_to_your_downloaded_credentials.json', SCOPES)
        creds = flow.run_local_server(port=0)
    
    # Save the credentials for the next run
    with open('token.pickle', 'wb') as token:
        pickle.dump(creds, token)

service = build('drive', 'v3', credentials=creds)

# Upload file
file_metadata = {
    'name': 'Your_PDF_Name.pdf'
}
media = MediaFileUpload('path_to_your_pdf_file.pdf', mimetype='application/pdf')
file = service.files().create(body=file_metadata, media_body=media, fields='id').execute()

file_id = file.get('id')

# Share the file and get shareable link
def get_shareable_link(file_id):
    permissions = {
        'role': 'reader',
        'type': 'anyone'
    }
    service.permissions().create(fileId=file_id, body=permissions).execute()

    return f"https://drive.google.com/uc?export=download&id={file_id}"

print("Shared Link:", get_shareable_link(file_id))

# Delete the file
input("Press Enter to delete the file...")
service.files().delete(fileId=file_id).execute()
print("File Deleted!")

Note: Replace 'path_to_your_downloaded_credentials.json' with the path to the JSON file downloaded from the Google Cloud Console (or place that file in the same directory as the script), and 'path_to_your_pdf_file.pdf' with the path to your actual PDF file.

Also, be careful with the 'type': 'anyone' setting, as it makes the file accessible to anyone who has the link.
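
Once a shareable link exists, the switch from read_in_stream to read mentioned at the top could look roughly like the sketch below. It assumes `client` is a ComputerVisionClient from azure-cognitiveservices-vision-computervision (created elsewhere with an endpoint and key) and follows the usual polling pattern for the asynchronous Read operation. The helper name `read_text_from_url` and its `poll_interval` / `max_polls` parameters are made up for illustration.

```python
import time


def read_text_from_url(client, image_url, poll_interval=1.0, max_polls=60):
    """Run the Read operation on a publicly reachable URL and return text lines.

    `client` is assumed to behave like ComputerVisionClient: it must offer
    read(url, raw=True) and get_read_result(operation_id).
    """
    # raw=True exposes the response headers, which carry the operation URL.
    response = client.read(image_url, raw=True)
    operation_id = response.headers["Operation-Location"].split("/")[-1]

    # The Read operation is asynchronous, so poll until it is no longer pending.
    for _ in range(max_polls):
        result = client.get_read_result(operation_id)
        if result.status not in ("notStarted", "running"):
            break
        time.sleep(poll_interval)
    else:
        raise TimeoutError("Read operation did not finish in time")

    if result.status != "succeeded":
        raise RuntimeError(f"Read operation ended with status {result.status!r}")

    # Flatten the recognised lines of all pages into one list of strings.
    return [
        line.text
        for page in result.analyze_result.read_results
        for line in page.lines
    ]
```

It would be called with the link printed by the script above, e.g. `read_text_from_url(client, get_shareable_link(file_id))`, before the file is deleted from Drive.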


xtrojak commented Oct 19, 2023

The image still needs to be within the limits.

To get the size of the image, we can try this:

from io import BytesIO

# `image` is assumed to be an already loaded PIL Image.
img_file = BytesIO()
# quality='keep' is a Pillow setting that maintains the quantization of the image.
# It is only valid for JPEG; the PNG writer used here ignores it.
# Not having the same quantization can result in different sizes between the in-memory image and the file size on disk.
image.save(img_file, 'png', quality='keep')
image_file_size = img_file.tell()  # size of the encoded image in bytes
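
With the byte size in hand, a small check against the limits could be sketched as follows. The helper name `limit_violations` is hypothetical, and the default thresholds (4 MB, sides between 50 and 10000 pixels) are assumptions that should be verified against the Azure documentation for the pricing tier in use.

```python
def limit_violations(size_bytes, width, height,
                     max_bytes=4 * 1024 * 1024, min_side=50, max_side=10000):
    """Return a list of problems; an empty list means the image fits.

    The default thresholds are assumed values, not taken from this issue.
    """
    problems = []
    if size_bytes > max_bytes:
        problems.append(f"file is {size_bytes} bytes, limit is {max_bytes}")
    if min(width, height) < min_side:
        problems.append(f"shortest side is below {min_side} px")
    if max(width, height) > max_side:
        problems.append(f"longest side is above {max_side} px")
    return problems
```

With the snippet above, it could be called as `limit_violations(image_file_size, *image.size)`, and the upload skipped or the image downscaled when the returned list is not empty.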
