
BrowseEverything S3 setup #173

Open
4 tasks
Tracked by #85
crisr15 opened this issue Nov 15, 2022 · 18 comments
Comments

@crisr15

crisr15 commented Nov 15, 2022

This is partially blocked by #207

Summary

This is established in the gem.

The 'add cloud files' button in importers is not working. When you click it, it should prompt you with options. There should also be an 'add cloud files' button on the work page, under the 'add files' and 'add folder' buttons.

[Image: screenshot]

Resources for installing the browse-everything gem:

BL would like us to set up one S3 bucket that can be used for staging and production.

Acceptance Criteria

  • There will be one S3 bucket shared across environments (staging and production); that means all tenants use the same bucket.
  • S3 configuration added to the browse-everything initializer.
  • 'Add cloud files' button prompts the user with options when clicked.
  • 'Add cloud files' button added to the work page under the 'add files' and 'add folder' buttons.
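The initializer criterion above can be sketched as a browse_everything.yml fragment. This is a minimal sketch based on the browse-everything gem's documented S3 provider keys; the bucket name and ENV variable names here are placeholders, not this project's values:

```yaml
# Sketch of the s3 provider block for browse_everything.yml.
# Bucket name and ENV keys are placeholders.
s3:
  bucket: example-be-bucket                       # bare bucket name
  app_key: <%= ENV['AWS_ACCESS_KEY_ID'] %>        # IAM access key
  app_secret: <%= ENV['AWS_SECRET_ACCESS_KEY'] %> # IAM secret
  region: eu-west-1
```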

Notes

@jeremyf
Contributor

jeremyf commented Dec 7, 2022

Rory pointed me to https://github.com/research-technologies/browse-everything/tree/sharepoint_provider, a fork that includes the SharePoint work. We'll need to discuss whether this gets pushed back to Samvera or whether we proceed with the fork.

@jeremyf
Contributor

jeremyf commented Dec 7, 2022

On 2022-12-07 I reached out to Jenny via Slack regarding credentials. There are ongoing conversations via email with their IT department concerning BrowseEverything and its implementation.

@j-basford
Collaborator

From BL Tech:

No issues with Rory’s response to my question (where’s the Fedora instance hosted? On the same AWS instance where the repository runs. As far as I understand, this isn’t the BL’s but CoSector’s, i.e. we don’t/can’t log in to AWS to manage/configure the services, they do, but happy to be corrected. If it is our instance (registered to the BL, we pay AWS for it, etc.) then we need to take additional steps to secure it.)

I would suggest now raising this with Jon Fryer. My suggestion would be to use a SharePoint Online site to do this; I just want to make sure Jon is OK with the process/principle (what they want to use, Graph, is an automated Microsoft method and standard, and is how we grant other third parties access into our Microsoft tenant; it would just be the first time we’ve done this for a SharePoint site). If Jon is on board, then we’ll need to discuss how we set up such a site, making sure their access is done securely.

I’ve no issue with the gem they propose, or how you intend to use it, but essentially the data you grab using it is going to leave our control/visibility, albeit into a trusted 3rd party cloud and in bulk rather than manually as you’re doing now, so I don’t foresee any issues on the face of it.

TL;DR - Jenny will raise this with CIMU in Jan 23, sounds like BL will be OK with it.

@cziaarm cziaarm added the SL-RC Service Label: Request for change label Jan 10, 2023
@j-basford
Collaborator

BL is OK with the SharePoint access, but they are upgrading their entire SharePoint estate and are busy with that. No Technology resource until Q2 2023, so I suggest we continue with S3 only and come back to SharePoint at a later date (there may be a separate ticket for SharePoint BE).

@j-basford
Collaborator

To proceed with this using S3 only.

@ShanaLMoore
Contributor

@j-basford where can we get credentials for your S3?

@j-basford
Collaborator

@cziaarm should be able to provide these.

@ShanaLMoore
Contributor

ShanaLMoore commented Feb 28, 2023

cc @jillpe moving this out of the sprint: waiting on client decision.

The client needs to have policy and other prior discussions before implementation is done. We will mark this as blocked until further notice.

S3 requires all of their IT to get on board too, and security questions around ownership of the S3 bucket need to be decided as well.

@ShanaLMoore ShanaLMoore removed the status in britishlibrary Feb 28, 2023
@j-basford
Collaborator

We cannot use a BL S3 bucket. Can we use a CoSector one to confirm this works, and then revisit after development is verified? @jillpe @cziaarm

@cziaarm
Collaborator

cziaarm commented Mar 3, 2023

Have created a bucket (and a user with access keys). May need some guidance on what to add to browse_everything.yml.

@ShanaLMoore
Contributor

ShanaLMoore commented Mar 6, 2023

Have created bucket (and a user with access keys). May need some guidance on what to add to browse_everything.yml

Hi @cziaarm, this example may help. Hopefully it's a plug-and-play type of setup.

EDIT: Oh actually, this already exists in BL. So let's try uncommenting the s3 block with the values you generated.

I also found some docs for configuring it.

@cziaarm cziaarm moved this to In Development in britishlibrary Mar 10, 2023
@cziaarm
Collaborator

cziaarm commented Mar 10, 2023

Hi @ShanaLMoore

I have what may be a useful set of values in place in the BE yaml. I'm getting an odd error that makes me think BE has gone wrong constructing a URL along the way (perhaps because of misconfiguration). The config looks like this:

s3:
  bucket: temp-bl-bucket-for-browse-everything.s3.amazonaws.com
  app_key: [MY_APP_KEY]
  app_secret: [MY_APP_SECRET]
  region: eu-west-1   

I've left out the response_type and expires_in options, as there is no mention of them in the docs.

I've checked these values with a simple GET via Postman and they are all good.

In Hyku, I'm getting the S3 option and a modal that looks like this:

[Image: screenshot]

but when I click on "connect" I end up with an error... I don't think the error itself is directly relevant; I think the URL is at fault:

http://bl.bl.test/concern/articles/&state=s3

is the URL I end up at, and it unsurprisingly causes an error.

On closer inspection I can see the link for the "connect" button is:

<a class="btn btn-primary ev-auth" target="blank" id="provider_auth" href="&state=s3">Connect to S3</a>

So it feels like BE is missing something important here?
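One configuration detail worth checking, offered as an assumption drawn from browse-everything example configs rather than anything confirmed in this thread: the `bucket` key normally takes the bare bucket name, not a full `*.s3.amazonaws.com` hostname, so the value in the config above may itself be a misconfiguration:

```yaml
# Assumed correction: bucket as bare name, no hostname suffix.
s3:
  bucket: temp-bl-bucket-for-browse-everything
  region: eu-west-1
```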

@cziaarm
Collaborator

cziaarm commented Mar 10, 2023

So the link is generated by auth_link, which is not overridden for the s3 provider, hence the incomplete URL... but I suspect that would be a link to authenticate a user for the provider. In this case my bucket has key access and that key/secret is in the config, so there should be no need for an auth step? I'd love for someone to show me round a working S3 example, as they must exist; I'm obviously doing something wrong.

@jillpe

jillpe commented Mar 10, 2023

Hi @cziaarm! Shana is out today, but our team has some documentation they wrote that might be helpful:

Adding Browse Everything to a Hyrax Application

Server Side Storage Support Setup

I'm also trying to find someone on the team who has experience with this and could pair with you

@cziaarm cziaarm moved this from In Development to Client QA in britishlibrary Jul 17, 2023
@cziaarm
Collaborator

cziaarm commented Jul 18, 2023

@NoraRamsey @grahamjevon @j-basford

The S3 provider has been configured and is now available on the staging repository. You will need to configure a desktop client to be able to put things into the bucket. I have used WinSCP (a Windows SCP/FTP client that understands S3). Once you've found your preferred file-transfer desktop client that can use the S3 protocol, I will be able to share the access keys with you, and you'll be able to use the "Add Cloud Files" feature both in the normal upload workflow and in Bulkrax imports.
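As an alternative to a GUI client such as WinSCP, files can be pushed into the bucket with the AWS CLI, assuming it is installed and configured with the shared access keys. The bucket name and paths below are placeholders:

```shell
# Upload a file into the BrowseEverything bucket (placeholder names throughout)
aws s3 cp ./big-image.tif s3://example-be-bucket/uploads/ --region eu-west-1

# Confirm it arrived
aws s3 ls s3://example-be-bucket/uploads/ --region eu-west-1
```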

I'll be on slack

@cziaarm
Collaborator

cziaarm commented Jul 25, 2023

Hi Rory, regarding BE: I uploaded a 2.5GB file to S3 today. The work has appeared in the repo, but the file has yet to load. Is it possible to see whether it is still loading behind the scenes, or does this indicate an issue? I ran the upload a few hours ago and got the familiar Chrome error.
Incidentally, the work was duplicated (two copies of the work have appeared and two copies of the importer are showing). I'm not sure if this was human error (perhaps I double-clicked import) or a technical error. I thought I'd wait until we knew whether the import was still running behind the scenes before testing this again.

@cziaarm cziaarm moved this from Client QA to In Development in britishlibrary Jul 25, 2023
@cziaarm cziaarm moved this from In Development to SoftServ QA in britishlibrary Nov 14, 2023
@grahamjevon
Collaborator

grahamjevon commented Nov 16, 2023

Importer successfully imported work with 594MB file using BE. Everything happened as expected.

Importer with 2GB file resulted in a "504 Gateway Time-Out nginx". This message appeared about 1-2 minutes after starting the importer. When I went to the Importer history, there were two duplicate importers for this, which both said "complete":

https://bl.bl-staging.notch8.cloud/importers/122
https://bl.bl-staging.notch8.cloud/importers/123

This resulted in two works being created:

https://bl.bl-staging.notch8.cloud/concern/articles/3f2955a1-3c36-4784-a913-dc7e2790142c
https://bl.bl-staging.notch8.cloud/concern/articles/f30f3791-970a-4ddb-85ac-8931b7264159

While the filename appears in the items list, the item has no file size and it cannot be downloaded. This suggests that the upload of the file failed. This seems to replicate my experience when testing BE back in July.
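The "504 Gateway Time-Out nginx" is consistent with the proxy giving up on the Rails request while it streams the 2GB file. As a stop-gap sketch only (the directive values are illustrative, and the durable fix is moving the download into a background job rather than the web request), the nginx directives that govern this wait are:

```nginx
# Illustrative values; tune to the deployment.
proxy_read_timeout 600s;   # wait longer for the upstream Rails response
proxy_send_timeout 600s;   # wait longer when sending the request upstream
```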

@cziaarm
Collaborator

cziaarm commented Dec 11, 2023

The key here, I think, is that when we use BE via Bulkrax, the web process does the download. This is different from using BE in the upload context, where the worker asynchronously imports the S3 URL and the file is then attached to the file_set.
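The distinction can be illustrated with a toy Ruby sketch (names and behavior are hypothetical, not the Bulkrax/Hyrax API; the download is simulated with a sleep): the synchronous path holds the web request open for the whole download, which is what the nginx proxy times out on, while the asynchronous path returns immediately and leaves the fetch to a worker.

```ruby
# Toy illustration only; not real Bulkrax/Hyrax code.

# Stand-in for streaming a multi-GB file out of S3.
def download(url)
  sleep 0.2
  "contents of #{url}"
end

# Bulkrax-via-web style: the HTTP request blocks until the download
# finishes, so a slow file trips the proxy's read timeout.
def handle_request_sync(url)
  download(url)
  :ok
end

# Upload-workflow style: respond at once; a background worker (here a
# Thread) fetches the file and would then attach it to the file_set.
def handle_request_async(url, jobs)
  jobs << Thread.new { download(url) }
  :accepted
end

jobs = []
started = Time.now
status = handle_request_async("s3://example-bucket/big.tif", jobs)
elapsed = Time.now - started   # returns long before the download finishes
jobs.each(&:join)              # the worker completes in the background
```

Bulkrax's web-request download corresponds to `handle_request_sync` here; moving it behind a job queue, as the upload context already does, is what avoids the gateway timeout.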

@cziaarm cziaarm moved this from SoftServ QA to Deploy to Staging in britishlibrary Mar 20, 2024
@cziaarm cziaarm moved this from Deploy to Staging to Client QA in britishlibrary Mar 21, 2024
@cziaarm cziaarm moved this from Client QA to Deploy to Staging in britishlibrary Mar 26, 2024
@cziaarm cziaarm moved this from Deploy to Staging to Client QA in britishlibrary Mar 27, 2024
@cziaarm cziaarm moved this from Client QA to Done in britishlibrary Apr 17, 2024
Status: Done
9 participants