[Feature Request] TQDM Integration #190

Open
iwr-redmond opened this issue Dec 7, 2024 · 10 comments
Labels
enhancement Improves on existing functionality - NOT a new feature

Comments

@iwr-redmond

iwr-redmond commented Dec 7, 2024

Description

Long running processes such as file downloads and generations are common in ML applications. Often they are logged to the console and not the UI, which is not helpful in circumstances where the end user is in app mode or accessing a web app remotely.

Rio provides a ProgressBar but not a connection between this UI component and potential workloads.

I suggest integrating Rio with TQDM.

Suggested Solution

There are two possible ways to integrate TQDM:

  1. A helper class like Gradio (see helpers.py)
  2. Link the Plot component with tqdm.gui

Alternatives

No response

Additional Context

  1. Leveraging the Plot component might be easier, but risks leaving ProgressBar warming up leftover snow.

  2. Graphical progress meters are under-represented in existing frameworks (although they are used), with CLI counters being much more typical - notably, Gradio's tqdm support is comparatively recent

Related Issues/Pull Requests

No response

iwr-redmond added the enhancement label (Improves on existing functionality - NOT a new feature) on Dec 7, 2024
@mad-moo
Contributor

mad-moo commented Dec 7, 2024

What would the API for this look like for the user? Can you post a minimal example for what you have in mind?

And yes, definitely integrate the progressbar rather than drawing one in a plot

@iwr-redmond
Author

iwr-redmond commented Dec 7, 2024

A few options, in pseudocode because I'm more hat than cattle today.

  1. Rely on the user to get most of it right:
# MyTqdmAsyncDownloader is a made-up helper, not an existing API
downloader, output = await MyTqdmAsyncDownloader("https://huggingface.co/repository/large_file.safetensors")
rio.ProgressBar(progress=output)
  2. Assume the user might need some guidance:
class MonitoredProgress(rio.ProgressBar):
    key = "preloader"
    async_function = MyDownloadFunction("https://huggingface.co/repository/large_file.safetensors")
    progress_type = "percentage"  # or "range", or "steps", etc. - predefined for tqdm use scenarios
  3. Provide some helper functions - might be going over the top a bit:
# rio.Queue and rio.Helpers are made up; rio.Queue = basic wrapper around tqdm.asyncio
async def my_function_steps(queue: rio.Queue):
    rio.Helpers.Download("https://huggingface.co/repository/large_file.zip", local_file)
    my_hash_check_function(local_file)
    unzip_to_someplace()

class MonitoredOverall(rio.ProgressBar):
    key = "provisioning_steps_counter"
    type = QueueMonitor(my_function_steps)

class MonitoredSteps(rio.ProgressBar):
    key = "provisioning_steps_progress"
    type = StepMonitor(my_function_steps)

@Aran-Fey
Contributor

Aran-Fey commented Dec 7, 2024

I don't really understand your examples, I'm afraid. They're a bit too pseudo for me, and they look quite different from how rio currently works.

This is how ProgressBars are currently used:

import asyncio

import rio


class ProgressDemo(rio.Component):
    progress: float = 0
    
    async def start_working(self):
        for p in range(10):
            await asyncio.sleep(1)
            self.progress = (p + 1) / 10
            await self.force_refresh()
    
    def build(self):
        return rio.Column(
            rio.Button('start working', on_press=self.start_working),
            rio.ProgressBar(self.progress),
        )

How would you write this code if tqdm integration existed? What do you want to see changed?

@iwr-redmond
Author

iwr-redmond commented Dec 7, 2024

Please add that example to the documentation page. It's helpful. Also, my code is at best good for government work (cf. warming leftover snow).

What I think is needed is two things:

  1. Easier ways to transfer progress information from the backend to the frontend. Looking at the landscape of ML apps and research demo web interfaces, this is mostly not done well and often just plain ignored. If Rio is to have a well-functioning desktop component, the CLI with its basic feedback mechanisms will not be present once the app enters into production.
  2. Basic protection against a non-async function taking over the application and causing it to freeze during a long process. This may exist already, but if not it would be good to take the opportunity to create a helper function that invites users to take advantage of useful features (easy progress bar) in order to reduce this risk and encourage best practices.
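On the second point, one way to keep a blocking loop from freezing the event loop is to hand it to a worker thread. The following is a minimal sketch using only the standard library; `blocking_work`, `report` and `run_without_freezing` are hypothetical names, and nothing here is existing Rio API.

```python
import asyncio
import time


def blocking_work(steps, report):
    """Synchronous loop standing in for model inference or a download."""
    for i in range(steps):
        time.sleep(0.01)  # placeholder for real blocking work
        report((i + 1) / steps)


async def run_without_freezing(steps):
    """Run blocking_work in a worker thread and collect its progress."""
    loop = asyncio.get_running_loop()
    seen = []

    def report(fraction):
        # Called from the worker thread: hand the value over to the event
        # loop thread instead of touching shared state directly.
        loop.call_soon_threadsafe(seen.append, fraction)

    # The event loop stays free to serve other tasks while this runs.
    await asyncio.to_thread(blocking_work, steps, report)
    await asyncio.sleep(0)  # give any still-queued callbacks a chance to run
    return seen


progress_values = asyncio.run(run_without_freezing(4))
```

In a Rio component, `seen.append` would presumably be replaced by something that updates the component's `progress` attribute.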

To answer your question then.

First the original function that would have existed anyway. I'll pick on OmniGen's scheduler class:

    def __call__(self, z, func, model_kwargs, use_kv_cache: bool=True, offload_kv_cache: bool=True):
        num_tokens_for_img = z.size(-1)*z.size(-2) // 4
        if isinstance(model_kwargs['input_ids'], list):
            cache = [OmniGenCache(num_tokens_for_img, offload_kv_cache) for _ in range(len(model_kwargs['input_ids']))] if use_kv_cache else None
        else:
            cache = OmniGenCache(num_tokens_for_img, offload_kv_cache) if use_kv_cache else None
        results = {}
        for i in tqdm(range(self.num_steps)):
            timesteps = torch.zeros(size=(len(z), )).to(z.device) + self.sigma[i]
            pred, cache = func(z, timesteps, past_key_values=cache, **model_kwargs)
            sigma_next = self.sigma[i+1]
            sigma = self.sigma[i]
            z = z + (sigma_next - sigma) * pred
            if i == 0 and use_kv_cache:
                num_tokens_for_img = z.size(-1)*z.size(-2) // 4
                if isinstance(cache, list):
                    model_kwargs['input_ids'] = [None] * len(cache)
                else:
                    model_kwargs['input_ids'] = None

                model_kwargs['position_ids'] = self.crop_position_ids_for_cache(model_kwargs['position_ids'], num_tokens_for_img)
                model_kwargs['attention_mask'] = self.crop_attention_mask_for_cache(model_kwargs['attention_mask'], num_tokens_for_img)

        del cache
        torch.cuda.empty_cache()  
        gc.collect()
        return z

This does not use tqdm.asyncio, and asyncio itself isn't used anywhere in the code (nor in Rypo's excellent quantization and speed PRs). The authors have rightly focused on making their core code work, and the UI is a secondary if not tertiary affair. Because tqdm is called exactly once as a convenience, all progress reporting is outsourced to that package, which brings both risks (what if future changes don't work well?) and opportunities (rio can take the process over instead).

To help with these matters, I suggest a rio wrapper function to make tqdm more accessible and reproducible. It would include a bunch of sane defaults, like using tqdm.asyncio, and probably some exploration of the logging helper function. Critically, it would also provide a few basic options as to what type of counting is going to occur:

  • Percent
  • Range
  • Steps

These would allow the correct tqdm formatting to be applied and then transferred to the frontend. Gradio's version of this sort of integration function is here.
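The three counter types are not defined further in this thread, so the sketch below is only one possible reading: "percent" as a percentage string, "range" as "n of total", and "steps" as "step n/total". The function name `format_progress` and its signature are assumptions for illustration, not tqdm or Rio API.

```python
def format_progress(n, total, counter="percent"):
    """Return (fraction, label) for a hypothetical 'counter' option.

    The fraction could drive a progress bar; the label is the text a UI
    might show next to it.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    # Clamp so that over-reporting never pushes the bar past 100%.
    fraction = min(max(n / total, 0.0), 1.0)
    if counter == "percent":
        label = f"{fraction:.0%}"
    elif counter == "range":
        label = f"{n} of {total}"
    elif counter == "steps":
        label = f"step {n}/{total}"
    else:
        raise ValueError(f"unknown counter type: {counter!r}")
    return fraction, label
```

Keeping the numeric fraction separate from the label means the frontend never has to parse text to position the bar.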

The end user taking advantage of this integration would then switch out L162 (the "for i in tqdm(range(self.num_steps)):" line):

for i in rio.ProgressMonitor(range(self.num_steps), key="omni_inference", counter="steps"):

Or (if a percentage is wanted in the UI):

for i in rio.ProgressMonitor(range(self.num_steps), key="omni_inference", counter="percent"):
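To make the idea of a drop-in replacement for `tqdm(...)` concrete, here is a minimal sketch of an iterable wrapper in plain Python. `progress_monitor` and `on_progress` are hypothetical names; the real `rio.ProgressMonitor` proposed above does not exist, and this sketch leaves out the `key` and `counter` arguments.

```python
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def progress_monitor(
    iterable: Iterable[T],
    total: int,
    on_progress: Callable[[float], None],
) -> Iterator[T]:
    """Yield items unchanged and report the completed fraction.

    The callback fires once the loop body for an item has finished, i.e.
    when the consumer asks for the next item.
    """
    for done, item in enumerate(iterable, start=1):
        yield item
        on_progress(done / total)


# Usage: collect the reported fractions in a list instead of a UI.
reported = []
for _ in progress_monitor(range(4), total=4, on_progress=reported.append):
    pass  # the real work for each step would happen here
```

If the consumer breaks out of the loop early, the fraction for the interrupted step is never reported, which seems like the right behaviour for a progress bar.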

Then finally on the frontend, something like:

from omnigen import scheduler

class ProgressDemo(rio.Component):

    async def start_working(self):
        # Pseudocode: run the modified scheduler; the UI options are not coded here
        OmniGenScheduler(my_ui_options_not_coded_here)

    def build(self):
        return rio.Column(
            rio.Button('start working', on_press=self.start_working),
            # monitor_key is a proposed parameter, not an existing one: it attaches
            # this progress bar to the rio.ProgressMonitor key established above
            rio.ProgressBar(monitor_key="omni_inference"),
        )

I presumed in my earlier comment that rio.ProgressBar would remain as-is, and therefore invented subclasses (the fictional "rio.MonitorXYZ") for that purpose. An alternate approach might be to leave the original code unmodified and initiate the wrapper within the ProgressDemo component along the lines of:

from omnigen import scheduler

class ProgressDemo(rio.Component):

    async def start_working(self):
        # Pseudocode: wrap the unmodified scheduler in the proposed monitor
        rio.ProgressMonitor(
            OmniGenScheduler(my_ui_options_not_coded_here),
            key="omni_inference",
            counter="percent",
        )

    def build(self):
        return rio.Column(
            rio.Button('start working', on_press=self.start_working),
            rio.ProgressBar(monitor_key="omni_inference"),
        )

On the 'monitor_key' property, bear in mind that an app could have multiple async processes needing to be monitored, such as an external download of a large file and local inference, or a series of inferences/downloads and a counter of the total progress.
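One way to picture keyed monitoring is a small registry that maps each key to its latest progress value and notifies whoever subscribed to that key. This is a sketch under assumptions: `ProgressRegistry` and its methods are hypothetical names and say nothing about how Rio would actually route updates.

```python
class ProgressRegistry:
    """Track progress per key so several tasks can be monitored at once."""

    def __init__(self):
        self._progress = {}   # key -> latest fraction in [0, 1]
        self._listeners = {}  # key -> callbacks to notify on updates

    def subscribe(self, key, listener):
        """Register a callback (e.g. a progress bar) for one key."""
        self._listeners.setdefault(key, []).append(listener)

    def report(self, key, fraction):
        """Store the newest value and notify only that key's listeners."""
        self._progress[key] = fraction
        for listener in self._listeners.get(key, []):
            listener(fraction)

    def get(self, key):
        """Return the latest value, or 0.0 if nothing was reported yet."""
        return self._progress.get(key, 0.0)


# Usage: a download and an inference run report independently.
registry = ProgressRegistry()
download_updates = []
registry.subscribe("download", download_updates.append)
registry.report("download", 0.5)
registry.report("omni_inference", 0.1)
```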

As an aside, I accidentally suggested a tqdm.asyncio wrapper for external file downloads as I was thinking through the pseudo example. This would also be a worthwhile helper if there is going to be a progress monitor function of some kind, as it lends itself nicely to common use cases (user downloads the software, which then downloads a bunch of large checkpoints).

@Aran-Fey
Contributor

Aran-Fey commented Dec 7, 2024

Thanks for the detailed response. I'm still struggling to wrap my head around this (particularly since I'm unfamiliar with both tqdm and gradio), but it's becoming clearer.

Considering that I forgot to guard against the user pressing the "start working" button twice (which would lead to two async functions fighting over the same progress bar), it's fair to say that we should try to make ProgressBars more intuitive to use.

Not sure how tqdm would be integrated with this, but here are some ideas I had:

  1.  class ExampleTask(rio.Task):
         async def run(self):
             # Simulate some work
             for p in range(10):
                 await asyncio.sleep(1)
                 self.progress = (p + 1) / 10
    
    
     class ProgressDemo(rio.Component):
         current_task: ExampleTask | None = None
         
         async def start_task(self):
             self.current_task = ExampleTask()
         
         def build(self):
             return rio.Column(
                 rio.Button(
                     'start task',
                     on_press=self.start_task,
                     is_sensitive=self.current_task is None,
                 ),
                 rio.ProgressBar(self.current_task),
             )
  2.  class ProgressDemo(rio.Component):
         @rio.task
         async def do_stuff(self):
             # Simulate some work
             for p in range(10):
                 await asyncio.sleep(1)
                 yield (p + 1) / 10
         
         def build(self):
             return rio.Column(
                 rio.Button(
                     'start task',
                     on_press=self.do_stuff.start,
                     is_sensitive=not self.do_stuff.is_running,
                 ),
                 rio.ProgressBar(self.do_stuff),
             )
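Both ideas depend on knowing whether a task is already running, which is what prevents the double-press problem mentioned above. Independently of Rio, that guard can be sketched with plain asyncio; `SingleRunner` is a hypothetical name and not part of either proposed API.

```python
import asyncio


class SingleRunner:
    """Ignore start requests while a previous run is still in flight."""

    def __init__(self, work):
        self._work = work  # zero-argument coroutine function
        self._task = None

    @property
    def is_running(self):
        return self._task is not None and not self._task.done()

    def start(self):
        """Start the work unless it is already running; report which."""
        if self.is_running:
            return False
        self._task = asyncio.create_task(self._work())
        return True

    async def wait(self):
        if self._task is not None:
            await self._task


async def demo():
    runs = []

    async def work():
        runs.append("started")
        await asyncio.sleep(0.01)

    runner = SingleRunner(work)
    first = runner.start()
    second = runner.start()  # simulated second button press, ignored
    await runner.wait()
    return first, second, len(runs)


result = asyncio.run(demo())
```

Once the task has finished, `start()` accepts a new run again, which matches the `is_sensitive` toggling in the second idea.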

@iwr-redmond
Author

I think (1) is closer to my layman's understanding of how Rio is structured. A rio.Task fundamental component to manage whatever comes up (sending progress to the UI, avoiding collisions, etc.) also seems highly valuable.

The Gradio reference is primarily to demonstrate that CLI -> UI transfer without matplotlib is plausible. The internals of tqdm always make me feel about as sharp as a mashed potato, so you're in good company there. If I may suggest, try experimenting with the wget sample because it shows two different ways of wrapping the wget function: the sample TqdmUpTo class for a progress bar, which actually is not needed for Rio, and wrapattr, which shows only some numbers during file transfer (too little, but a useful comparison). I suggested specific text deliverables (percent, range, steps) because tqdm.asyncio can be configured to provide these during function execution, and that significantly narrows down the amount of customization/integration work that needs to be done. More counting/progress reporting methods can always be added later.

@iwr-redmond
Author

There is a useful sample in issue 660, which converts the TQDM stats to strings that can be parsed.

That would provide something like:

stats = str(nested) # from the sample code

percent = stats.split("%")[0]  # assuming the standard tqdm display format
self.progress = float(percent) / 100  # send the new count to the frontend as a fraction, as ProgressBar is used above
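Splitting on "%" only works while the display string actually starts with a percentage, and tqdm omits the percentage when the total is unknown. A slightly more defensive parser, as a sketch that assumes the default tqdm display format (`parse_tqdm_fraction` is a hypothetical helper name):

```python
import re

# Leading percentage such as " 50%" or "12.5%" at the start of the string.
_PERCENT = re.compile(r"^\s*(\d+(?:\.\d+)?)%")


def parse_tqdm_fraction(stats):
    """Return progress as a fraction in [0, 1], or None if not present."""
    match = _PERCENT.match(stats)
    if match is None:
        return None  # e.g. a bar without a known total
    return min(float(match.group(1)) / 100, 1.0)
```

Returning `None` lets the caller decide what to show for an indeterminate task instead of crashing on a failed float conversion.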

@mad-moo
Contributor

mad-moo commented Dec 17, 2024

I don't think the implementation is the problem. After looking at Aran Fey's code, the integration just doesn't help a whole lot. It saves just a handful of lines of code, but in return makes everything more complex.

I still like the idea, but IMHO this only makes sense if we can come up with an easier way for users to actually access it

@iwr-redmond
Author

this only makes sense if we can come up with an easier way for users to actually access it

Hallelujah to that!

@iwr-redmond
Author

Wolfgang Fahl's NiceGUI wrapper adds TQDM to NiceGUI in a simpler manner than above. Example usage here.


3 participants