STORY: As an audio-only IMAGE user, I want an overall scene description, so that the information I get about specific items via spatialized audio makes sense in context.
#933 · Open · jeffbl opened this issue on Dec 18, 2024 · 0 comments
We created the graphic-caption preprocessor primarily for the Monarch experiences. However, participants have mentioned wanting more of an overview in the audio-only experiences as well. This work item is to integrate the output of the graphic-caption preprocessor at the beginning of the photo-audio-handler experiences, giving the user clear context for the specific object-level information they receive afterwards.
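A minimal sketch of the intended integration, assuming the graphic-caption preprocessor publishes its caption under a namespaced key in the request's preprocessor output map, and that the handler builds an ordered list of audio segments. The key name and output shape below are assumptions for illustration, not the actual IMAGE schema:

```python
def prepend_scene_caption(preprocessors: dict, segments: list) -> list:
    """Prepend an overall scene description to the handler's ordered
    audio segments, if the graphic-caption preprocessor produced one.

    NOTE: the preprocessor key and the {"caption": ...} output shape
    are hypothetical placeholders, not the real IMAGE schema.
    """
    CAPTION_KEY = "ca.mcgill.a11y.image.preprocessor.graphic-caption"
    caption = preprocessors.get(CAPTION_KEY, {}).get("caption")
    if not caption:
        # No overview available; leave the experience unchanged.
        return segments
    # Introduce the overview before the per-object spatialized segments.
    intro = {"type": "text", "value": f"Overview: {caption}"}
    return [intro] + segments
```

The key design point is graceful degradation: when the preprocessor fails or returns nothing, the handler should produce the same experience it does today rather than an error.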
Note that very early in the project we did have a captioner, but we deprecated it because it was giving biased results. Hopefully llama3.2-vision (the LLM currently in use) has better protections than what we were using in 2021.
@Cybernide Please provide instructions on where it should appear in the experience. I assume at the beginning, but I don't know if there are tweaks to the overview, or other introductory text that should introduce the description. Once you've put those instructions in this work item, please assign to @JRegimbal.
@JRegimbal We can revisit who does this when you receive it, if you don't have IMAGE hours to spend, or if this turns out to be more involved than I'm assuming (i.e., that it should be a straightforward addition, especially if any of the old code from our original captioner is still there).