BrowseEverything S3 setup #173
Comments
Rory pointed me to https://github.com/research-technologies/browse-everything/tree/sharepoint_provider, a fork that includes the SharePoint work. We'll need to discuss if this is pushed back to Samvera or if we proceed with this fork. |
On 2022-12-07 I reached out to Jenny via Slack regarding credentials. There are ongoing conversations via email with their IT department concerning BrowseEverything and its implementation. |
From BL Tech: No issues with Rory’s response to my question (where is the Fedora instance hosted? On the same AWS instance where the repository runs. As far as I understand, this isn’t the BL’s but CoSector’s, i.e. we don’t/can’t log in to AWS to manage/configure the services, they do, but happy to be corrected. If it is our instance (registered to the BL, we pay AWS for it, etc.), then we need to take additional steps to secure it). I would suggest now raising this with Jon Fryer. My suggestion would be to use a SharePoint Online site to do this; I just want to make sure Jon is OK with the process/principle (what they want to use, Graph, is an automated Microsoft method and standard, and it is how we grant other 3rd parties access into our Microsoft tenant; it would just be the first time we’ve done this to a SharePoint site). If Jon is on board, then we’ll need to discuss how we set up such a site and make sure their access is done securely. I’ve no issue with the gem they propose, or how you intend to use it, but essentially the data you grab using it is going to leave our control/visibility, albeit into a trusted 3rd party cloud and in bulk rather than manually as you’re doing now, so I don’t foresee any issues on the face of it. TL;DR - Jenny will raise this with CIMU in Jan 23, sounds like BL will be OK with it. |
BL is OK with the SharePoint access, but they are upgrading their entire SharePoint estate and are busy with that. There is no Technology resource until Q2 2023, so we suggest continuing with S3 only and coming back to SharePoint at a later date (there may be a separate ticket for SharePoint BE). |
We will proceed with this using S3 only. |
@j-basford where can we get credentials to your s3? |
@cziaarm should be able to provide these. |
cc @jillpe moving this out of the sprint: waiting on client decision. The client needs to have policy and other prior discussions before implementation is done. We will mark this as a blocker until further notice. S3 requires all of their IT to get on board too, and the security questions around S3 bucket ownership need to be decided as well. |
I have created a bucket (and a user with access keys). I may need some guidance on what to add to browse_everything.yml |
Hi @cziaarm, this example may help. Hopefully it's a plug-and-play type of setup. EDIT: Oh actually, this already exists in BL. So let's try uncommenting the s3 block with the values you generated. I also found some docs for configuring it. |
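As a rough aid for filling in the s3 block, here is a minimal Python sketch that sanity-checks a parsed block before it goes into browse_everything.yml. The key names (`bucket`, `app_key`, `app_secret`, `region`, `response_type`) and the three response types are assumptions based on my reading of the BrowseEverything S3 provider documentation, and `check_s3_block` is a hypothetical helper, not part of the gem. Please verify against the gem version in use.

```python
# Assumed response types for the BrowseEverything S3 provider; verify against the gem.
VALID_RESPONSE_TYPES = {"signed_url", "public_url", "s3_uri"}


def check_s3_block(s3):
    """Return a list of problems found in a parsed `s3:` config block (a dict)."""
    problems = []
    # A bucket name is assumed to be mandatory.
    if not s3.get("bucket"):
        problems.append("bucket is required")
    # Access key and secret are assumed to be optional, but only as a pair.
    has_key = bool(s3.get("app_key"))
    has_secret = bool(s3.get("app_secret"))
    if has_key != has_secret:
        problems.append("app_key and app_secret must be given together")
    # YAML written for Ruby often stores symbols such as ":signed_url"; drop the colon.
    response_type = str(s3.get("response_type", "signed_url")).lstrip(":")
    if response_type not in VALID_RESPONSE_TYPES:
        problems.append(f"unknown response_type: {response_type}")
    return problems


if __name__ == "__main__":
    # Placeholder values only; none of these come from the real configuration.
    example = {
        "bucket": "example-bucket",
        "app_key": "EXAMPLE_KEY_ID",
        "app_secret": "EXAMPLE_SECRET",
        "region": "eu-west-2",
        "response_type": ":signed_url",
    }
    print(check_s3_block(example) or "no obvious problems")
```

An empty list only means the block passes these basic checks; it does not confirm that the credentials or bucket policy are correct.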
Hi @ShanaLMoore, I have what may be a useful set of values in place in the BE YAML. I'm getting an odd error that makes me feel like BE has gone wrong in constructing a URL along the way (perhaps because of misconfiguration). The config looks like this:
I've left out the I've checked these values with a simple GET via Postman and they are all good. In Hyku, I'm getting the S3 option and the modal that looks like this: but when I click on "connect" I end up with an error. I don't think the error is directly relevant, as I think it is the URL that is at fault:
is the URL that I'm ending up at, and unsurprisingly it causes an error. On closer inspection I can see the link for the "connect" button is:
So it feels like BE is missing something important here? |
So link is generated by |
Hi @cziaarm! Shana is out today, but our team has some documentation they wrote that might be helpful: Adding Browse Everything to a Hyrax Application. I'm also trying to find someone on the team who has experience with this and could pair with you. |
@NoraRamsey @grahamjevon @j-basford The S3 provider has been configured and is now available on the staging repository. You will need to configure a desktop client to be able to put things into the bucket. I have used WinSCP (a Windows SCP/FTP client that understands S3). Once you have found your preferred file transfer desktop client that can use the AWS S3 protocol, I will be able to share the access keys with you and you'll be able to use the "Add Cloud Files" feature both in the normal upload workflow and the Bulkrax import. I'll be on Slack. |
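For anyone who would rather script the transfer than use a desktop client, here is a minimal Python sketch of putting a file into the bucket. It assumes the third-party boto3 library and that the shared access keys are available through the usual AWS environment variables. The bucket name and file path are placeholders, and `upload_to_bucket` is a hypothetical helper.

```python
import os


def upload_to_bucket(client, bucket, local_path, key=None):
    """Upload one local file and return the s3:// URI it was stored under.

    `client` is expected to be a boto3 S3 client (or anything with the same
    `upload_file(Filename, Bucket, Key)` method).
    """
    # Default the object key to the file's own name.
    key = key or os.path.basename(local_path)
    client.upload_file(local_path, bucket, key)
    return f"s3://{bucket}/{key}"


# Usage sketch (not run here; needs boto3 and valid credentials):
#
#   import boto3
#   s3 = boto3.client("s3")  # reads AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY
#   print(upload_to_bucket(s3, "example-bucket", "/path/to/large-file.zip"))
```

Passing the client in keeps the helper easy to try out with a stand-in object before pointing it at the real bucket.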
Hi Rory, regarding BE, I uploaded a 2.5GB file today to S3. The work has appeared in the repo, but the file has yet to load in the repo. Is it possible to see if that is still loading behind the scenes or does this indicate an issue? I ran the upload a few hours ago and got the familiar Chrome error. |
Importer successfully imported work with 594MB file using BE. Everything happened as expected. Importer with 2GB file resulted in a "504 Gateway Time-Out nginx". This message appeared about 1-2 minutes after starting the importer. When I went to the Importer history, there were two duplicate importers for this, which both said "complete": https://bl.bl-staging.notch8.cloud/importers/122 This resulted in two works being created: https://bl.bl-staging.notch8.cloud/concern/articles/3f2955a1-3c36-4784-a913-dc7e2790142c While the filename appears in the items list, the item has no file size and it cannot be downloaded. This suggests that the upload of the file failed. This seems to replicate my experience when testing BE back in July. |
The key here, I think, is that when we use BE via Bulkrax, the web process does the download. This is different from when we use BE in the upload context. In that case the worker asynchronously imports the S3 URL and then it is attached to the file_set. |
This is partially blocked by #207
Summary
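To illustrate the difference described above, here is a small Python sketch of the two patterns. It is not Bulkrax or Hyrax code: the function names, the `fetch` callable, and the queue are all hypothetical stand-ins. In the inline version the HTTP response cannot be sent until the whole file has been fetched, so a large file can outlast a proxy timeout. In the enqueued version the request returns immediately and a worker does the fetch later.

```python
import queue
import threading


def import_inline(fetch, url):
    """Web-process style: the request blocks until the whole file is fetched."""
    data = fetch(url)
    return {"status": "complete", "bytes": len(data)}


def import_enqueued(jobs, url):
    """Worker style: the request only records the URL and returns at once."""
    jobs.put(url)
    return {"status": "queued"}


def run_worker(jobs, fetch, results):
    """Background worker: fetch each queued URL; a None entry stops the loop."""
    while True:
        url = jobs.get()
        try:
            if url is None:
                return
            results[url] = len(fetch(url))
        finally:
            jobs.task_done()


if __name__ == "__main__":
    def fake_fetch(url):
        # Stand-in for a real S3 download.
        return b"x" * 1024

    jobs, results = queue.Queue(), {}
    worker = threading.Thread(target=run_worker, args=(jobs, fake_fetch, results))
    worker.start()
    print(import_enqueued(jobs, "s3://example-bucket/large-file.zip"))
    jobs.put(None)
    worker.join()
    print(results)
```

If the Bulkrax path really does fetch inline, that would be consistent with the 594MB import succeeding and the 2GB import hitting a 504, but that remains a hypothesis to confirm.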
This is established in the gem.
The 'add cloud files' button in importers is not working. When you click on it, it should prompt you with options. There should also be an 'add cloud files' button on the work page under the 'add files' and 'add folder' buttons.
Resources for installing browse-everything gem:
S3 ticket and general browse everything ticket
BL would like us to set up one S3 bucket that can be used for staging and production.
Acceptance Criteria
Notes