STORY: As an audio-only IMAGE user, I want an overall scene description, so that the information I get about specific items via spatialized audio makes sense in context. #933

Open
jeffbl opened this issue Dec 18, 2024 · 0 comments

jeffbl commented Dec 18, 2024

We created the graphic-caption preprocessor primarily for the Monarch experiences. However, participants have said they would also like more of an overview in the audio-only experiences. This work item is to integrate the output of the graphic-caption preprocessor at the beginning of the photo-audio-handler experiences, so that the user has clear context for the object-specific information that follows.
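
For illustration only, here is a minimal sketch of the ordering described above: the caption is spoken first, followed by the existing per-object segments. Everything in it is an assumption made for the example and is not taken from the IMAGE codebase: the function name `build_utterances`, its parameters, and the placeholder intro text `"Overall scene:"`. The actual wording and placement are still to be specified in this work item.

```python
from typing import List, Optional


def build_utterances(
    caption: Optional[str],
    object_segments: List[str],
    intro: str = "Overall scene:",  # placeholder wording, not a decided design
) -> List[str]:
    """Return spoken segments in playback order, caption first when present.

    Hypothetical helper: `caption` stands in for the graphic-caption
    preprocessor output, and `object_segments` for the per-object text
    the audio experience already produces.
    """
    segments: List[str] = []
    # Skip the overview entirely if the preprocessor produced nothing usable.
    if caption and caption.strip():
        segments.append(f"{intro} {caption.strip()}")
    # The existing object-specific segments follow unchanged.
    segments.extend(object_segments)
    return segments
```

When the caption is missing or empty, the sketch falls back to the current behaviour, so the experience is unchanged if the preprocessor did not run.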

Note that very early in the project we did have a captioner, but we deprecated it because it was giving biased results. Hopefully llama3.2-vision (the LLM currently in use) has better protections than what we were using in 2021.

@Cybernide Please provide instructions on where it should appear in the experience. I assume at the beginning, but I don't know if there are tweaks to the overview, or other introductory text that should introduce the description. Once you've put those instructions in this work item, please assign to @JRegimbal.

@JRegimbal We can revisit who does this when you receive it, if you don't have IMAGE hours to spend or if this turns out to be more involved than I'm assuming. My assumption is that it should be a straightforward addition, especially if any of the old code from our original captioner is still there.
