Excel file from link not being processed by Data Review Tool #101

Open
kd-ods opened this issue Jul 18, 2023 · 2 comments

@kd-ods
Collaborator

kd-ods commented Jul 18, 2023

It used to be possible to paste the URL in cell B9 of this workbook into the Data Review Tool and have it validate the Excel version of the workbook.

It no longer works.

Note that downloading the Excel version of the workbook and uploading it to the tool does work.

(This direct link for validation from the workbook also does not work:

https://datareview.openownership.org/?source_url=https://docs.google.com/spreadsheets/d/1XT5UvwaUcFS65UH5kyj7hDOAKYG-U30eSBoQxbR7A1A/export?format=xlsx

...but I assume that that is a related issue.)

The issue was found by @kathryn-ods.

@kd-ods kd-ods added the bug Something isn't working label Jul 18, 2023
@kathryn-ods

I have tried setting the online workbook to 'anyone with the link can view' and 'anyone with the link can edit', but I am still having this issue. The review tool also throws an error when I try to paste the link in.

ghost pushed a commit to OpenDataServices/lib-cove-web-2 that referenced this issue Jul 19, 2023
@kd-ods kd-ods assigned ghost Jul 19, 2023
@ghost

ghost commented Jul 19, 2023

It has nothing to do with the sharing permissions or the URL.

To fix this, here is a list of things that need to be done:

The PR OpenDataServices/lib-cove-web-2#5 is needed; the library then needs to be released and used in Cove. It needs to check the content type to work out that the file is a spreadsheet.
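
As a rough illustration of what "check content type" could mean here, the sketch below maps a Content-Type header value to a short format name. It is not the code from the PR: the function name, the mapping, and the set of types covered are all assumptions made for this example.

```python
from typing import Optional

# Hypothetical mapping for illustration; the real library may use other
# names and cover more (or fewer) media types.
SPREADSHEET_CONTENT_TYPES = {
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet": "xlsx",
    "application/vnd.oasis.opendocument.spreadsheet": "ods",
    "text/csv": "csv",
}


def guess_format_from_content_type(header_value: Optional[str]) -> Optional[str]:
    """Return a short format name for a Content-Type header, or None if unknown."""
    if not header_value:
        return None
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = header_value.split(";", 1)[0].strip().lower()
    if media_type == "application/json":
        return "json"
    return SPREADSHEET_CONTENT_TYPES.get(media_type)
```

Returning None for unknown types, rather than defaulting to JSON, leaves room for a second check based on the file itself.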

Cove was assuming that all URL uploads were JSON; remove that assumption. Apply:


--- a/cove_bods/views.py
+++ b/cove_bods/views.py
@@ -54,7 +54,7 @@ class NewInput(InputDataView):
             )
         elif form_name == "url_form":
             supplied_data.save_file_from_source_url(
-                form.cleaned_data["url"], content_type="application/json"
+                form.cleaned_data["url"]
             )
 
 

It was also assuming that JSON always arrives with the content type set correctly, but if you paste in a raw GitHub URL like https://raw.githubusercontent.com/openownership/lib-cove-bods/master/tests/fixtures/0.3/basic_1.json, it isn't. We need to check more carefully, so apply:

--- a/cove_bods/process.py
+++ b/cove_bods/process.py
@@ -84,13 +81,11 @@ class WasJSONUploaded(ProcessDataTask):
         if self.supplied_data.format != "json":
             return process_data
 
-        supplied_data_json_files = SuppliedDataFile.objects.filter(
-            supplied_data=self.supplied_data, content_type="application/json"
-        )
-        if supplied_data_json_files.count() == 1:
+        supplied_data_json_files = [i for i in self.supplied_data_files if get_file_type_for_flatten_tool(i) == "json"]
+        if len(supplied_data_json_files) == 1:
             process_data[
                 "json_data_filename"
-            ] = supplied_data_json_files.first().upload_dir_and_filename()
+            ] = supplied_data_json_files[0].upload_dir_and_filename()
         else:
             raise Exception("Can't find JSON original data!")
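
The diff above relies on get_file_type_for_flatten_tool to pick out JSON files instead of trusting the stored content type. Its implementation is not shown in this thread. Purely to illustrate the general idea of checking the file rather than the header, a content-based test could look something like the sketch below. The helper looks_like_json is hypothetical and is not part of Cove or lib-cove-web-2.

```python
import json


def looks_like_json(path: str) -> bool:
    """Return True if the file at `path` parses as JSON.

    Hypothetical helper for illustration only. It reads the whole file,
    so it may be too slow or memory-hungry for very large uploads.
    """
    try:
        with open(path, "rb") as fp:
            # json.load accepts a binary file and detects UTF-8/16/32 itself.
            json.load(fp)
    except ValueError:
        # Covers json.JSONDecodeError and UnicodeDecodeError,
        # both of which are subclasses of ValueError.
        return False
    return True
```

A check like this does not depend on what the server declared, so a JSON file served with a generic content type would still be recognised.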

And (hopefully) finally, flattentool still fails when you paste that URL in, with the error "openpyxl.utils.exceptions.InvalidFileException: openpyxl does not support file format, please check you can open it with Excel first. Supported formats are: .xlsx,.xlsm,.xltx,.xltm". The filename on disk is "export", with no extension. We want to rewrite all file operations in Cove to use the Django file interface anyway (so that we can switch to cloud hosting of files later). Unfortunately flattentool can't work with that, so the process here must be: copy the files to a temp folder, run flattentool locally, then copy them back. See Open-Telecoms-Data/cove-ofds@5b4fbdb. So it would be good to rewrite ConvertSpreadsheetIntoJSON to do that anyway, and in the process we can put a proper file extension on the temporary file that flattentool will see, and thus fix this issue.
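
A minimal sketch of that copy, convert, copy-back pattern is below. It is not the code from the cove-ofds commit. The function run_with_local_copy and the convert callback are hypothetical names; the callback stands in for whatever ends up calling flattentool, and the file-like objects stand in for files opened through a storage interface.

```python
import os
import shutil
import tempfile
from typing import BinaryIO, Callable


def run_with_local_copy(
    source: BinaryIO,
    extension: str,
    convert: Callable[[str, str], None],
    destination: BinaryIO,
) -> None:
    """Copy `source` to a temp file with a real extension, convert, copy back.

    `convert(input_path, output_path)` is a placeholder for the tool that
    only works with local paths. `extension` should include the dot,
    for example ".xlsx".
    """
    with tempfile.TemporaryDirectory() as work_dir:
        # The local copy gets a proper extension, whatever the upload was called.
        input_path = os.path.join(work_dir, "input" + extension)
        output_path = os.path.join(work_dir, "output.json")

        with open(input_path, "wb") as fp:
            shutil.copyfileobj(source, fp)

        convert(input_path, output_path)

        # Copy the result back to wherever the caller wants it stored.
        with open(output_path, "rb") as fp:
            shutil.copyfileobj(fp, destination)
```

Because the converter only ever sees the temporary path, a file stored as "export" would reach it as "input.xlsx", assuming the extension has been worked out beforehand.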

ghost pushed a commit to OpenDataServices/lib-cove-web-2 that referenced this issue Jul 19, 2023
@ghost ghost assigned radix0000 and unassigned ghost Jul 19, 2023