Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow data sources to be synced from the UI #379

Merged
merged 72 commits into from
Jul 25, 2024

Conversation

mihow
Copy link
Collaborator

@mihow mihow commented Apr 20, 2024

Bugs to fix:

  • BE Update data source test endpoint to accept "subdir" & "regex" params. Return full S3 URI, and some connection details ("Found at least 1 file with this configuration")
  • Most Costa Rica bucket paths have a space in them (likely unintentionally)
  • Add real Test button under deployment data sources
  • Data source Regex field gets filled in with string "null"
  • Complete tests for MinIO (s3 compatible storage backend for local / docker-compose stack)
  • Zip files are added as captures when syncing? Should filter by file type.
  • Ensure presigned urls are not used if the public base url is specified (rather than new boolean field)
  • Fix MinIO implementation, add default bucket.
  • Document how to use MinIO in local dev. (use default user/pass, add entry to hosts file for now. add notes about debug logging and running tests with -K, --pdb and --failfast as well, upload files with minio:9001)
  • Creating a new processing jobs adds stages for "data_storage_sync"
  • Multiple copies of "data_source_sync" stage are added to all job types
  • Confirm that images are working with public_base_url. The bucket name appears to be missing if the base url does not end in a trailing slash and the deployment uses a subdir.
  • Make subdir required? or not?

Discovered bugs, may no longer be relevant:

  • If a capture does not belong to an Event, the UI crashes (null does not have attribute 'id')
  • Sorting sessions by session name or "most recent" doesn't seem to work. Sorting by "date" column does.

Several issues raised during this PR have been moved to related issues:
#400
#455
#310
#456
#236

Configure S3 storage source for a project
image

Test the connection, see which deployments are using this storage
image

Configure the subdirectory for each deployment & test the config
image

Start the sync
image

View status of sync job
image

See the number of captures increase for the deployment
image

Copy link

netlify bot commented Apr 20, 2024

Deploy Preview for ami-web canceled.

Name Link
🔨 Latest commit 6a84f2a
🔍 Latest deploy log https://app.netlify.com/sites/ami-web/deploys/66a1fc0a921c7f0008919ed4

Copy link

netlify bot commented Apr 20, 2024

Deploy Preview for ami-storybook canceled.

Name Link
🔨 Latest commit 6a84f2a
🔍 Latest deploy log https://app.netlify.com/sites/ami-storybook/deploys/66a1fc0a9c2615000842aea5

@albags
Copy link

albags commented May 2, 2024

Created a new issue Sign up button #385 in GitHub

@mihow mihow force-pushed the AMI-278-Allow-data-sources-to-be-synced-from-the-UI branch from a84584b to 88bc9f1 Compare May 20, 2024 23:27
@annavik
Copy link
Member

annavik commented Jul 10, 2024

Some questions (sorry in advance for the long comment!):

  • I noticed my sub directory anna got prefix_exists: false, I don't understand why? Same for directory test. I could still sync the captures though! The vermont/RawImages/LUNA one works fine!
  • Should we disable syncing of captures if the connection has any problems?
  • Is the sub dir required for deployment data source, or is it ok to leave empty?
  • Any chance we can make the job details more user friendly for the sync job? Perhaps you are on it! :) That last stage(s) looks weird and also the labels. Also the list view job status and the job detail status are not always adding up.
  • When syncing captures, they show up for the deployment, but I noticed it takes a while for details about width and height to appear. The result will be a collapsed playback view. Why is that and is it and edge case I should to handle in UI? I could handle it, but since captures might take a while to load, it's helpful to know the size beforehand to avoid "jumps" in the interface.

Here is my subdir with some sample captures:
Skärmavbild 2024-07-10 kl  19 14 44

I could still successfully sync the 4 captures to the deployment though:
Skärmavbild 2024-07-10 kl  19 10 32

Collapsed playback view when captures do not have defined size:
Skärmavbild 2024-07-10 kl  19 28 41

Job status is 67% done in detail view:
Skärmavbild 2024-07-10 kl  19 33 43

Job status for same job is done in list view:
Skärmavbild 2024-07-10 kl  19 33 51

@annavik
Copy link
Member

annavik commented Jul 10, 2024

FE code wise I'm still using some hard coded strings, I feel we are still iterating a bit on the UI here. Do you think we are showing too much details about the connection status now (making the view a bit cluttered), or is ok?

@mihow
Copy link
Collaborator Author

mihow commented Jul 11, 2024

@annavik Thank you for the comments and observations. I agree the UI is looking a but cluttered, but I think it is necessary while we work out the kinks. I want to provide all the info we need to troubleshoot. Especially because syncing data can take a long time, and you may not notice an error right away. So it's great that we are making it possible to check the status of everything ahead of time that may break the sync.

  1. I agree we should disable the Sync button if the connection status is not okay, great call.
  2. Let's show the full URL of the first_image_found, maybe on tooltip if it's too long. Then you can see if the filter is grabbing an image from an incorrect sub folder.
  3. Height & width calculation - Let's make a ticket for that. We need to decide when to calculate. I am trying to avoid reading the source images until necessary. It may actually be a deployment-wide setting.
  4. I believe I fixed the issues with your anna & test directories. There were files in those subdirs with blank names, and the connection test was stopping on those. I increased the limit from 1 file to many even when there is no regex filter, and that seems to have solved the issue.
  5. I fixed the num files checked so it shows the actual number of files checked before finding a match.
  6. The sub-directory isn't technically required, but it probably should be. It's risky if you leave it blank you could end up syncing everything in the bucket. This is okay if you are in control of your storage and you really create a separate bucket for every deployment's images. But typically we are making a subdir for each deployment.
  7. Job details are funky yes, that's the last to-do I have for this PR! At least a slight improvement for now.
  8. I added additional error handling to the backend and realized I can send the error messages for subdir not found & prefix not found the same way we send other error messages. So I removed the message for that from the frontend. Here are some shots of some of the errors you may encounter!

Screenshot from 2024-07-10 19-29-54

Screenshot from 2024-07-10 19-29-35

Screenshot from 2024-07-10 19-31-22

Screenshot from 2024-07-10 19-32-30

@annavik
Copy link
Member

annavik commented Jul 12, 2024

Ok some more FE updates!

  • Update connection status UI using an accordion layout (to make it possible to show connection details, but still keep the UI clean in the default case)
  • Disable "Sync now" if connection was not successful
  • Add tooltip to ongoing job icon button (after a sync job was started)
  • Make sub directory required for the connection in the deployment case

Screenshots:

Collapsed accordion:
Skärmavbild 2024-07-12 kl  11 38 23

Expanded accordion:
Skärmavbild 2024-07-12 kl  11 38 31

Connection status also visible on icon hover:
Skärmavbild 2024-07-12 kl  11 41 15

First file URL visible on hover, click will open the capture in a new tab.
Skärmavbild 2024-07-12 kl  11 38 46

Same connection status component in the storage case, but with less details (and no requirement about sub directory):
Skärmavbild 2024-07-12 kl  11 39 19

@mihow
Copy link
Collaborator Author

mihow commented Jul 24, 2024

@annavik @joseepoirier

I just fixed the last large todo for this mega PR (related to data sync job status and interfering with the ML job status). I will return to the smaller remainders asap. So close to merging this one!

@mihow mihow merged commit cf7d8b8 into main Jul 25, 2024
6 checks passed
@mihow mihow deleted the AMI-278-Allow-data-sources-to-be-synced-from-the-UI branch July 25, 2024 07:19
@annavik
Copy link
Member

annavik commented Jul 29, 2024

Great job @mihow ! 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants