Add a `REQUIRE_INDEX_EXISTS_FOR_FILE` configuration flag to sheepdog (or a better name, if you can think of one).
If this flag is present and set to True, creating or updating a file node should return an error if the file doesn't already exist in indexd.
If it exists, store the object_id in the node.
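A minimal sketch of the proposed flag's behavior (function and field names here are assumptions for illustration, not sheepdog's actual code): given the parsed config and the result of an indexd lookup for the file, either fail the node creation/update or return the object_id to store.

```python
# Sketch only: decide whether a file node may be created/updated, given
# the REQUIRE_INDEX_EXISTS_FOR_FILE flag and the indexd lookup result
# (a record dict, or None when no record matched the file).

def check_file_registered(config, indexd_record):
    """Return the object_id to store on the node, or raise when the flag
    requires the file to already be registered in indexd."""
    if config.get("REQUIRE_INDEX_EXISTS_FOR_FILE") and indexd_record is None:
        # Mirrors the proposal: the request fails (e.g. HTTP 400)
        # when no indexd record exists for the file.
        raise ValueError("400: no indexd record found for this file")
    return indexd_record["did"] if indexd_record else None

# Example: flag on, record present -> the record id is stored on the node.
record = {"did": "dg.1234/abcd-efgh", "urls": ["s3://bucket/key"]}
print(check_file_registered({"REQUIRE_INDEX_EXISTS_FOR_FILE": True}, record))
```

With the flag absent or False, the lookup result is optional and behavior is unchanged, which keeps the flag backwards-compatible for commons that don't index first.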
context:
current status described by users:
As far as I know, there is no way to get a registered URL without typing it into api/index/index/UUID, since you can't query "urls" on a node (IMO it should be queryable. If it were, you could write a script to query the URL, then use the aws cli to download it in your VM.)
I have searched for data files myself and found that many of them don't even have the same file_name in Windmill vs. S3 storage. So how are users supposed to find a large list of files? Right now, I think they have to do it by hand, since allowing this sort of mismatch means they can't do it programmatically.
Given that BloodPAC has tighter security (it only allows users to download to a machine in the VPC), will they be able to implement "Files/Exploration" in Windmill? And/or "Workspace"?
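The script the first user wishes for above (query the registered URL, then use the aws cli to download it) can be sketched as follows. The endpoint path follows the api/index/index/UUID form the user mentions, and the record shape is an assumption; the commons hostname is hypothetical.

```python
# Sketch of the user's wished-for workflow: fetch an indexd record by
# UUID, extract its registered URLs, and hand s3:// URLs to the AWS CLI.
import json
import subprocess
import urllib.request

def registered_urls(record):
    """Extract the registered storage URLs from an indexd record dict."""
    return record.get("urls", [])

def fetch_record(commons, uuid):
    # e.g. https://example-commons.org/index/index/<UUID> (hypothetical host;
    # path follows the api/index/index/UUID form mentioned above).
    with urllib.request.urlopen(f"{commons}/index/index/{uuid}") as resp:
        return json.load(resp)

def download(url, dest="."):
    # Requires AWS credentials valid for the bucket (e.g. run inside the VPC).
    subprocess.run(["aws", "s3", "cp", url, dest], check=True)

# Offline example of the parsing step:
rec = {"did": "dg.1234/abcd", "urls": ["s3://bucket/path/file.bam"]}
print(registered_urls(rec))  # ['s3://bucket/path/file.bam']
```

This only works per-UUID today; making "urls" queryable on nodes, as the user suggests, is what would let this scale to a large list of files.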
For BloodPAC's use case, we can support this data flow: Scenario 1: user uploads data first, to storage that users have upload access to
BPA uses this flow because their access to those buckets predates Gen3. DCP and EDC also use this flow: the data is not owned by us; we are given read access to the buckets in those commons, so we continuously index them into indexd and link them in our graph.
1. user uploads data to buckets that they have direct access to
2. a lambda hosted by Gen3 listens for bucket updates, checksums the new objects, and indexes them into indexd ( https://github.com/occ-data/goes16-indexer - work is needed to automate deployment and polish the prototype for integration into Gen3)
3. user uploads the metadata; if sheepdog can't find a record in indexd that matches the checksum, it returns 400. If it can, it creates the data node with file_id == indexd's record id
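Step 3 of the flow above can be sketched as follows. The in-memory list stands in for an indexd hash lookup, and the `md5sum`/`hashes` field names are assumptions for illustration; real sheepdog would query indexd's API.

```python
# Sketch of step 3: match uploaded metadata to an indexd record by
# checksum; 400 if no record matches, otherwise file_id == record id.

def create_data_node(metadata, indexd_records):
    """Return (200, node) with file_id set to the matching indexd record's
    id, or (400, message) when no record matches the checksum."""
    md5 = metadata["md5sum"]
    match = next(
        (r for r in indexd_records if r.get("hashes", {}).get("md5") == md5),
        None,
    )
    if match is None:
        return 400, f"no indexd record matches checksum {md5}"
    # file_id == indexd's record id, per the flow described above.
    node = dict(metadata, file_id=match["did"])
    return 200, node
```

Matching on checksum rather than file_name sidesteps the Windmill-vs-S3 name mismatch the user reports above.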
Oops, sorry, I rearranged the comment; it was referring to step 3 in the scenario at the bottom of the comment... none of the things in the scope are implemented
scope:
implement step 3 in the scenario described above