
Attribute Error on running bvqa.py describe (moondream?) #20

Open
njakeman opened this issue Jan 20, 2025 · 12 comments

@njakeman commented Jan 20, 2025

NB. Unable to paste stack from the TRE VM to this Issue, so will try to capture relevant details:

On running python ../bvqa.py describe from the test directory (no args, so presumably moondream by default), the first warning is as follows:

[screenshot]

A second warning appears as the first description is attempted:

[screenshot]

The final error suggests a syntax problem:

[screenshot]

@geoffroy-noel-ddh geoffroy-noel-ddh self-assigned this Jan 20, 2025
@geoffroy-noel-ddh geoffroy-noel-ddh added the bug Something isn't working label Jan 20, 2025
@geoffroy-noel-ddh geoffroy-noel-ddh added this to the Top priority milestone Jan 20, 2025
@mchesterkadwell (Contributor)

This is a versioning issue: Transformers has deprecated get_max_length() on the base Cache class as of v4.48.

Moondream 2024-07-23 has the old call to get_max_length(): https://huggingface.co/vikhyatk/moondream2/blob/2024-07-23/modeling_phi.py#L1104
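For reference, a version-tolerant call site can branch on whichever API the installed transformers exposes (get_max_length() was deprecated in favour of get_max_cache_shape()). This is only a minimal sketch; the stand-in cache classes below are hypothetical, used so the shim can be demonstrated without transformers installed:

```python
def cache_max_length(cache):
    """Return the cache's max length across transformers versions.

    Newer transformers (>= 4.47) expose get_max_cache_shape(); older
    versions only have the now-deprecated get_max_length().
    """
    if hasattr(cache, "get_max_cache_shape"):
        return cache.get_max_cache_shape()
    return cache.get_max_length()


# Hypothetical stand-ins for old/new Cache objects, for demonstration only.
class OldCache:
    def get_max_length(self):
        return 2048


class NewCache:
    def get_max_cache_shape(self):
        return 4096


print(cache_max_length(OldCache()))  # 2048 (falls back to deprecated method)
print(cache_max_length(NewCache()))  # 4096 (uses the newer API)
```

A shim like this is only a stopgap; pinning compatible versions of moondream and transformers together is the cleaner fix.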

2024-07-23 is the version we are using:
https://github.com/kingsdigitallab/kdl-vqa/blob/main/describer/moondream.py#L8C1-L8C29

Moondream 2024-08-26+ has a new implementation of prepare_inputs_for_generation: https://huggingface.co/vikhyatk/moondream2/blob/2025-01-09/modeling_phi.py#L1403

Unless @geoffroy-noel-ddh there's some other reason not to, we could try updating to 2025-01-09? (Or downgrade transformers to < 4.48, which is less preferable).
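Whichever snapshot we settle on, the revision= argument to from_pretrained is the mechanism for pinning it. A hedged sketch (the MODEL_ID/PINNED_REVISION names are mine, and 2024-08-26 is used purely as an example pin, not a decision):

```python
MODEL_ID = "vikhyatk/moondream2"
PINNED_REVISION = "2024-08-26"  # example pin; exact snapshot tag on the HF hub


def load_moondream():
    # Imported lazily so the pin can be inspected without torch installed.
    from transformers import AutoModelForCausalLM
    return AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        revision=PINNED_REVISION,  # fetch that exact snapshot, immune to upstream pushes
        trust_remote_code=True,    # moondream2 ships its own modeling code
    )
```

Pinning a revision insulates the describer from upstream pushes to the moondream2 repo, at the cost of having to bump the pin deliberately.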

@geoffroy-noel-ddh (Member)

Thanks, I think we can upgrade the Moondream version, as we'll keep updating the rest of the stack to enable other, newer model types. One minor issue is that the experiments I ran last year on the Sculpting Time proof of concept used the 2024-07-23 version. If I need to reproduce those results I'll create a new file under build for specific versions of moondream.

More generally, the BVQA requirements are quite loosely pinned so far (most entries in requirementsX.txt are not bracketed), and this will soon cause other, similar problems for any describer/model. I'll try to think about a better versioning approach.
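For illustration, bracketed entries in the requirements files might look something like this (the version ranges here are hypothetical, not a recommendation):

```
transformers>=4.48,<4.49
torch>=2.4,<2.6
```

Upper bounds like these trade occasional manual bumps for protection against surprise upstream API removals such as this one.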

@geoffroy-noel-ddh (Member) commented Jan 28, 2025

I actually cannot reproduce the bug with vikhyatk/moondream2:2024-07-23 and transformers 4.48.1. The description on Ubuntu completes without errors. I wonder why...

Ok, I see. If I re-install the bvqa dependencies from scratch with build/requirements.txt and build/requirements-moondream.txt, transformers 4.48.2 is installed, which still has the deprecated method.

But if one installs the dependencies for qwen (build/requirements-qwen), the latest unreleased version of transformers (4.49.0.dev0 as of today) is pulled in, and this doesn't have the method. I can reproduce the bug that way.

So I think we (read: I) should also remove the git+https://github.com/huggingface/transformers entry from requirements-*, as it will keep causing trouble and is no longer needed for qwen 2.5 anyway.

@geoffroy-noel-ddh (Member) commented Jan 28, 2025

While testing 2025-01-09 on my system (from a freshly cloned repo and a recreated env), describe seems stuck on the first image with no noticeable progress after 5 minutes: 100% on one CPU core only and 11 GB taken by the process.

CUDA_VISIBLE_DEVICES='' python bvqa.py describe -r -R test/data

In contrast, 2024-07-23 describes the three images in 2:44 using all CPUs and 14GB.

2024-08-26 also works well, and it works with the live version of transformers (4.49).

I can't see anyone else reporting similar issues about 2025-01-09 on the moondream repo. I'll test on another machine.

@geoffroy-noel-ddh (Member)

On our ML machine I get this error with 2025-01-09 after a fresh install.

(venv) gnoel@ml:~/src/prj/tmp/kdl-vqa$ python bvqa.py describe -r -R test/data
Traceback (most recent call last):
  File "/home/gnoel/src/prj/tmp/kdl-vqa/bvqa.py", line 470, in <module>
    fqas.process_command_line()
  File "/home/gnoel/src/prj/tmp/kdl-vqa/bvqa.py", line 120, in process_command_line
    action['method']()
  File "/home/gnoel/src/prj/tmp/kdl-vqa/bvqa.py", line 145, in action_describe
    self.timer.step(f'comp : {self.describer.get_compute_info()}')
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/gnoel/src/prj/tmp/kdl-vqa/describer/base.py", line 93, in get_compute_info
    model = self.get_model()
            ^^^^^^^^^^^^^^^^
  File "/home/gnoel/src/prj/tmp/kdl-vqa/describer/base.py", line 81, in get_model
    self._init_model()
  File "/home/gnoel/src/prj/tmp/kdl-vqa/describer/moondream.py", line 69, in _init_model
    import torch
  File "/home/gnoel/src/prj/tmp/kdl-vqa/venv/lib/python3.12/site-packages/torch/__init__.py", line 367, in <module>
    from torch._C import *  # noqa: F403
    ^^^^^^^^^^^^^^^^^^^^^^
ImportError: /home/gnoel/src/prj/tmp/kdl-vqa/venv/lib/python3.12/site-packages/torch/lib/../../nvidia/cusparse/lib/libcusparse.so.12: undefined symbol: __nvJitLinkComplete_12_4, version libnvJitLink.so.12

@mchesterkadwell (Contributor) commented Jan 28, 2025

I'm getting successful runs on the TRE with Moondream 2025-01-09 using Python 3.10.12.

python bvqa.py describe --describer moondream --redo --root test/data

[screenshot]

[screenshot]

@geoffroy-noel-ddh (Member) commented Jan 28, 2025

Thanks. The ML error is a known PyTorch & CUDA version incompatibility, it seems (I can bypass it with unset LD_LIBRARY_PATH). It is unrelated to moondream.

However, CUDA_VISIBLE_DEVICES='' python bvqa.py describe -r -R test/data (with moondream 2025-01-09) on ML (CPU only) also gets stuck, just like it does on my laptop (as described above). python bvqa.py describe -r -R test/data (GPU enabled) works well.

@mchesterkadwell (Contributor) commented Jan 28, 2025

Okay, I will check the CPU-only option on the TRE tomorrow. Are you thinking we upgrade to the intermediate version (2024-08-26) for now? If the bug is only triggered by the dev build of transformers 4.49, we don't need to upgrade as long as we stick with the 4.48 release; but it is a deprecated method, so it will break at some point soon.

@geoffroy-noel-ddh (Member) commented Jan 28, 2025

I noticed Moondream's README recommends using transformers for GPU and, presumably, its own package for CPU. That might explain the problem: the transformers version used to work on CPU (see above) but has become too slow with the latest moondream. If that's true, it means we'd have to split the code in our moondream describer (one branch for CPU, one for GPU) or drop support for CPU (as it's not essential). That's mildly annoying.

I'll report the CPU issue with 2025-01-09 transformers on the moondream repo when I have time, to get feedback.

@geoffroy-noel-ddh (Member) commented Jan 28, 2025

Yes, switching to 2024-08-26 as the default would be a good compromise, I think, until I can elucidate the CPU issue with the new version. Thanks for your help!

@mchesterkadwell (Contributor)

I have replicated the unusably slow performance on CPU on the TRE for 2025-01-09.

@geoffroy-noel-ddh (Member)

Thank you. I reported that last night after trying again on two machines from scratch. Let's wait and see...
