Bugfix: use device in all Torch models #5026
Conversation
WalkthroughThe changes involve modifications to the device management in the Changes
Still works fine, and I can see a difference between cpu and cuda. Note for the future: this change is not pulled upstream by
fob.compute_similarity(
    dataset,
    model="clip-vit-base32-torch",
    brain_key="img_sim",
    device="cuda",
)
and I just noticed. Something for next time :)
@harpreetsahota204 can you run this code when you test, to make sure the model is also multi-GPU:
import fiftyone.brain as fob
import fiftyone.zoo as foz
model = foz.load_zoo_model("clip-vit-base32-torch", device="cuda")
print(model._model.visual.conv1._parameters["weight"][0].device)
LGTM
import fiftyone as fo
import fiftyone.brain as fob
import fiftyone.zoo as foz
from fiftyone import ViewField as F
dataset = foz.load_zoo_dataset('quickstart')
session = fo.launch_app(dataset)
model = foz.load_zoo_model("clip-vit-base32-torch", device="cuda")
embeddings = dataset.compute_embeddings(model)
Worked as expected.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (1)
fiftyone/utils/super_gradients.py (1)
98-100: Consider adding a docstring note about device flexibility.
Since this change enables flexible device selection, it would be helpful to document this capability in the class or method docstring. This would help users understand that they can use any available GPU.
Add a note like this to the class docstring:
 """FiftyOne wrapper around YOLO-NAS from https://github.com/Deci-AI/super-gradients.
+
+The model automatically uses the appropriate device (CPU/GPU) based on availability
+and can work with any CUDA device, not just the default one.

 Args:
     config: a :class:`TorchYoloNasModelConfig`
 """
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (3)
- fiftyone/utils/clip/zoo.py (1 hunks)
- fiftyone/utils/open_clip.py (3 hunks)
- fiftyone/utils/super_gradients.py (1 hunks)
🔥 Files not summarized due to errors (1)
- fiftyone/utils/clip/zoo.py: Error: Server error: no LLM provider could handle the message
🔇 Additional comments (4)
fiftyone/utils/super_gradients.py (1)
99-99: LGTM! Device management improvement.
The change from model.cuda() to model.to(self.device) improves GPU device flexibility, allowing the model to work with any available GPU instead of being restricted to the default one.
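As a minimal sketch of the difference (not the PR's code), the snippet below moves a small model to an explicit device and reports where its parameters ended up. `move_and_check` is a hypothetical helper, and the example assumes PyTorch is installed; it falls back to CPU when no second GPU is present so it can run anywhere.

```python
import torch
import torch.nn as nn


def move_and_check(model: nn.Module, device) -> set:
    """Move ``model`` to ``device`` and return the set of devices its
    parameters live on (hypothetical helper, for illustration only)."""
    device = torch.device(device)
    # model.cuda() with no argument targets the current default CUDA device;
    # model.to(device) honors an explicit target such as "cuda:1" or "cpu".
    model.to(device)
    return {str(p.device) for p in model.parameters()}


# Use a second GPU if one exists; otherwise stay on CPU so the sketch still runs.
target = "cuda:1" if torch.cuda.device_count() > 1 else "cpu"
model = nn.Linear(4, 2)
devices = move_and_check(model, target)
print(devices)
```

If every parameter reports the requested device, the model was not silently left on the default GPU.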
Let's verify that the model parameters are correctly moved to the specified device:
✅ Verification successful
Device management change is properly implemented
The verification confirms that the change from model.cuda() to model.to(self.device) is correct and consistent with the codebase's device management pattern. Key findings:
- The base TorchImageModel class properly initializes self.device in its constructor:
  device = self.config.device
  if device is None:
      device = "cuda:0" if torch.cuda.is_available() else "cpu"
- All other model implementations (CLIP, SAM, YOLO-NAS) consistently use self.device for device management.
- The change aligns with the existing pattern used across the codebase for moving tensors and models to the target device.
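The default-selection rule quoted in the findings above can be sketched as a standalone function. `resolve_device` and its `cuda_available` parameter are hypothetical names introduced here so the logic can be exercised without a GPU; in the quoted constructor the availability check is `torch.cuda.is_available()`.

```python
from typing import Optional


def resolve_device(requested: Optional[str], cuda_available: bool) -> str:
    """Return the device string to use (hypothetical helper).

    An explicit request wins; otherwise default to "cuda:0" when CUDA is
    available and to "cpu" when it is not.
    """
    if requested is not None:
        return requested
    return "cuda:0" if cuda_available else "cpu"


print(resolve_device(None, cuda_available=True))
print(resolve_device(None, cuda_available=False))
print(resolve_device("cuda:2", cuda_available=True))
```

The point of the rule is that a user-supplied device such as "cuda:2" is never overridden by the default.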
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check if there are any remaining hard-coded device assignments
# that might conflict with the new flexible device management
# Test 1: Look for any remaining .cuda() calls that might bypass device management
rg -g '*.py' '\.cuda\('
# Test 2: Check for proper device initialization in the parent class
ast-grep --pattern 'class TorchImageModel:
$$$
def __init__($$$):
$$$
self.device = $$$
$$$
'
Length of output: 137
Script:
#!/bin/bash
# Let's try a different approach to verify device management
# Check for device-related code in TorchImageModel and its subclasses
rg -g '*.py' 'class.*TorchImageModel' -A 10
# Look for device initialization patterns
rg -g '*.py' 'self\.device' -B 2 -A 2
# Check for any remaining cuda-specific code
rg -g '*.py' 'cuda|gpu' -B 2 -A 2
Length of output: 26257
fiftyone/utils/open_clip.py (3)
109-109: LGTM: Device handling improvement in text feature extraction.
The change from .cuda() to .to(self.device) is correct and aligns with the PR's objective of supporting multiple GPUs.
121-121: LGTM: Device handling improvement in prompt embedding.
The change from .cuda() to .to(self.device) is correct and consistent with the previous change.
109-109: Verify device consistency across model components.
The device handling changes look good, but we should verify that all model components (model, tensors, etc.) are consistently using the same device.
Also applies to: 121-121, 146-146
✅ Verification successful
Based on the shell script results, I can now generate the final response as the device handling patterns are clear across the codebase.
Device handling is consistent across model components
The device handling in the OpenCLIP implementation follows the established patterns used throughout the codebase:
- The model and tensors are consistently moved to the same device using to(self.device)
- Device initialization follows the standard pattern of defaulting to CUDA if available, otherwise CPU
- All tensor operations (text encoding, image encoding) maintain device consistency
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify device handling consistency across the codebase
# Look for any remaining .cuda() calls or device-related patterns
# Test 1: Check for any remaining .cuda() calls
echo "Checking for remaining .cuda() calls..."
rg '\.cuda\('
# Test 2: Check for device-related patterns in TorchOpenClipModel and related classes
echo "Checking device handling patterns..."
rg -A 3 'to\(.*device|\.device|device\s*='
Length of output: 14047
@@ -143,7 +143,7 @@ def _predict_all(self, imgs):
         frame_size = (width, height)

         if self._using_gpu:
-            imgs = imgs.cuda()
+            imgs = imgs.to(self.device)
Consider adding error handling for device transfers.
While the device handling change is correct, the subsequent autocast context is hardcoded to "cuda". This might cause issues when running on CPU.
Apply this diff to make it device-aware:
if self._using_gpu:
imgs = imgs.to(self.device)
- with torch.no_grad(), torch.amp.autocast("cuda"):
+ with torch.no_grad(), torch.amp.autocast(device_type=self.device.type if self._using_gpu else "cpu"):
image_features = self._model.encode_image(imgs)
text_features = self._get_text_features()
Also applies to: 147-152
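A minimal, self-contained sketch of the device-aware variant, assuming PyTorch is installed. `encode` is a hypothetical stand-in for the prediction method, and `torch.autocast` is the same context manager that the diff reaches via `torch.amp.autocast`. Whether autocast supports a given backend (for example MPS) depends on the installed torch version, so this is an illustration rather than a drop-in fix.

```python
import torch
import torch.nn as nn


def encode(model: nn.Module, imgs: torch.Tensor, device) -> torch.Tensor:
    """Run ``model`` on ``imgs`` with autocast keyed to the target device.

    Assumes ``model`` already lives on ``device`` (hypothetical helper).
    """
    device = torch.device(device)
    imgs = imgs.to(device)
    # device.type is "cuda" for "cuda:2" and "cpu" for "cpu", so the autocast
    # context follows the configured device instead of a hardcoded "cuda".
    with torch.no_grad(), torch.autocast(device_type=device.type):
        return model(imgs)


model = nn.Linear(8, 3)  # stays on CPU, matching the device used below
features = encode(model, torch.randn(5, 8), "cpu")
print(features.shape, features.device)
```

Deriving the autocast device type from the same `torch.device` object that receives the tensors keeps the two from drifting apart.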
@jacobsela coderabbit raises an interesting point here. Does torch.amp.autocast("cuda") need to be updated?
Yeah this is valid and will cause problems if not handled. It's in my todo for this week to more thoroughly review the code before moving further with this PR because this message makes me think that there are probably more places I haven't noticed that make hardware assumptions.
@jacobsela friendly ping here; can we get this completed and merged? |
Any chance to modify |
@deltheil That can be done, we'll look into it. |
Currently testing all zoo models. There seem to be some other unrelated issues that may be worth addressing, e.g. #5359 and an error pasted below with open clip. I'll push the fixes for transformers once git goes back up. (EDIT: up now)
This also makes me think that we may need proper testing when adding new zoo models. The code isn't very consistent. Not sure if it's worth the time sink though.
Current status:
Tested devices: ['cpu', 'cuda:2']
Tested models: all pass besides open-clip-torch and classification-transformer-torch
Not tested models (I need to set up an environment to test all of these):
Model | Why test was skipped
Errors
======================================================================
Actionable comments posted: 0
🧹 Nitpick comments (2)
fiftyone/utils/transformers.py (2)
326-337: Add device parameter validation and documentation.
The device handling logic is correct, but consider these improvements:
- Add validation for the device parameter to ensure only valid values are accepted (e.g., 'cuda', 'cpu', 'cuda:0', etc.)
- Document the device parameter in the class docstring.
 """Configuration for a :class:`FiftyOneTransformer`.

 Args:
     model (None): a ``transformers`` model
     name_or_path (None): the name or path to a checkpoint file to load
+    device (None): the device to use for model execution (e.g., 'cuda', 'cpu', 'cuda:0').
+        If not specified, uses CUDA if available, otherwise CPU.
 """
759-760: Consider refactoring device initialization to reduce code duplication.
The device initialization pattern is repeated across multiple transformer classes. Consider moving this common functionality to a base class or mixin to promote DRY principles.
Example approach:
class DeviceMixin:
    def _initialize_device(self):
        self.device = torch.device(self.config.device)
        self.model.to(self.device)
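The device-parameter validation suggested in this review is not spelled out, so the following is only one possible sketch. `validate_device` is a hypothetical helper, and its pattern deliberately accepts just the forms discussed in this thread ("cpu", "mps", "cuda", "cuda:N"); PyTorch itself recognizes more device types, so a real implementation might instead defer to `torch.device(...)`.

```python
import re
from typing import Optional

# Accepts "cpu", "mps", "cuda", and indexed CUDA devices such as "cuda:0".
# This is a deliberately narrow, illustrative pattern.
_DEVICE_RE = re.compile(r"^(cpu|mps|cuda(:\d+)?)$")


def validate_device(device: Optional[str]) -> Optional[str]:
    """Return ``device`` unchanged if it looks valid, else raise ValueError.

    ``None`` is allowed and means "pick a default later".
    """
    if device is None:
        return None
    if not isinstance(device, str) or not _DEVICE_RE.match(device):
        raise ValueError(
            f"Unsupported device {device!r}; expected 'cpu', 'mps', 'cuda' or 'cuda:N'"
        )
    return device


print(validate_device("cuda:0"))
```

Failing early with a clear message is the main benefit; a typo such as "gpu" is then reported at configuration time rather than deep inside a model call.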
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- fiftyone/utils/transformers.py (4 hunks)
🔇 Additional comments (3)
fiftyone/utils/transformers.py (3)
463-464: LGTM! Device handling follows PyTorch best practices.
The implementation correctly initializes the device from config and moves the model to the appropriate device.
509-510: LGTM! Device handling is consistent.
The implementation maintains consistency with the base class's device handling approach.
Line range hint 326-760: Verify tensor operations use the specified device consistently.
Let's ensure all tensor operations in the codebase consistently use the specified device.
✅ Verification successful
Device handling is properly implemented
All tensor operations in the codebase consistently use the specified device through proper device transfer mechanisms. The implementation in the reviewed code aligns with the rest of the codebase's device handling patterns.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for potential tensor operations that might not use the specified device
# Search for tensor operations that might need device handling
rg -A 2 "torch\.(tensor|Tensor|from_numpy|zeros|ones|arange|linspace|empty|full|eye|rand|randn)" fiftyone/utils/
# Search for potential cuda-specific calls that should use device instead
rg "\.cuda\(\)" fiftyone/utils/
Length of output: 9099
Actionable comments posted: 0
🧹 Nitpick comments (1)
fiftyone/utils/transformers.py (1)
464-465: Consider extracting device management into a mixin class.
The device initialization and model movement logic is duplicated across multiple transformer classes. Consider creating a DeviceManagementMixin class to reduce code duplication and maintain consistency.
Example implementation:
class DeviceManagementMixin:
    def initialize_device(self):
        self.device = torch.device(self.config.device)
        self.model.to(self.device)
Then update the transformer classes to use the mixin:
class FiftyOneTransformer(DeviceManagementMixin, TransformerEmbeddingsMixin, Model):
    def __init__(self, config):
        self.config = config
        self.model = self._load_model(config)
        self.initialize_device()
        self.image_processor = self._load_image_processor()
Also applies to: 510-511, 760-761
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- fiftyone/utils/transformers.py (4 hunks)
- fiftyone/utils/ultralytics.py (3 hunks)
🔇 Additional comments (7)
fiftyone/utils/ultralytics.py (3)
23-23: LGTM!
The torch import is correctly added using lazy_import to avoid unnecessary imports.
382-384: LGTM!
The device configuration is correctly implemented with a sensible default that falls back to CPU if CUDA is not available.
397-398: LGTM!
The device initialization and model movement to the specified device are correctly implemented.
fiftyone/utils/transformers.py (4)
326-338: LGTM!
The device configuration logic is well-implemented with proper handling of both string-based model paths and pre-loaded models.
464-465: LGTM!
The device initialization and model movement to the specified device are correctly implemented in FiftyOneTransformer.
510-511: LGTM!
The device initialization and model movement to the specified device are correctly implemented in FiftyOneZeroShotTransformer.
760-761: LGTM!
The device initialization and model movement to the specified device are correctly implemented in FiftyOneZeroShotTransformerForObjectDetection.
Actionable comments posted: 1
🧹 Nitpick comments (1)
fiftyone/utils/transformers.py (1)
457-458: Consider adding error handling for device availability.
While the device initialization looks correct, it would be beneficial to add error handling for cases where the specified device is not available.
 def __init__(self, config):
     self.config = config
     self.model = self._load_model(config)
-    self.device = torch.device(self.config.device)
-    self.model.to(self.device)
+    try:
+        self.device = torch.device(self.config.device)
+        self.model.to(self.device)
+    except RuntimeError as e:
+        logger.warning(f"Failed to move model to {self.config.device}. Falling back to CPU. Error: {e}")
+        self.device = torch.device("cpu")
+        self.model.to(self.device)
     self.image_processor = self._load_image_processor()
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- fiftyone/utils/transformers.py (4 hunks)
🔇 Additional comments (2)
fiftyone/utils/transformers.py (2)
326-328: LGTM: Device configuration with sensible defaults.
The device configuration is well-implemented with a sensible default that automatically selects CUDA if available, falling back to CPU otherwise.
326-328: Verify device compatibility across the codebase.
The changes introduce device management across multiple classes. Let's verify that all model operations consistently use the specified device.
Also applies to: 457-458, 503-504, 753-754
✅ Verification successful
Device compatibility verification successful
All model operations consistently use the specified device across the codebase. Input tensors and models are properly moved to the configured device before processing, maintaining compatibility throughout the model operations.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for potential device-related issues in model operations
# Look for tensor operations that might not respect the device setting
rg -A 2 "\.to\(" --type py
# Look for direct cuda() calls that should be replaced with to(self.device)
rg "\.cuda\(" --type py
# Look for device-related patterns in model operations
ast-grep --pattern 'with torch.no_grad():
  $$$
  outputs = $model($$$)
  $$$'
Length of output: 5936
Still no testing done for MPS.
Models tested w/ various cuda devices:
alexnet-imagenet-torch
Models that are still problematic:
yolov5 - loads on cuda:0 before being loaded to the device in the argument. Not sure why.
Models that haven't been tested (need to set up env):
med-sam-2-video-torch - Model is not an image model
Adding @manushreegangwar and @mwoodson1 as ML team reviewers 😄 |
@jacobsela can you rebase on latest Also:
We're using Ultralytics' model here. Can anything be done to address this?
On
On |
I have some scripts sitting around that can test. I will do Mac CPU + MPS (I have M4) and multi GPU. Will kick off runs tonight and hopefully will finish before morning. Will bring back findings |
… already loaded not just string
edit: Can't reproduce... |
I'm just going to always pass "cpu" as the device in the manifest. The model is moved to the correct device afterwards.
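The thread does not show how the manifest is wired, so the following is only a generic illustration of the load-on-CPU-then-move pattern described here, assuming PyTorch is installed. `load_then_move` and the `nn.Linear` stand-in architecture are hypothetical; the real zoo models are loaded through their own code paths.

```python
import io

import torch
import torch.nn as nn


def load_then_move(checkpoint: bytes, device) -> nn.Module:
    """Load weights onto CPU first, then move the model to ``device``.

    Hypothetical helper; nn.Linear(4, 2) stands in for a real architecture.
    """
    model = nn.Linear(4, 2)
    # map_location="cpu" keeps the checkpoint tensors off any GPU while loading,
    # regardless of the device they were saved from.
    state = torch.load(io.BytesIO(checkpoint), map_location="cpu")
    model.load_state_dict(state)
    # Only now is the model moved to the device the caller asked for.
    return model.to(torch.device(device))


# Build an in-memory checkpoint so the sketch needs no files on disk.
buf = io.BytesIO()
torch.save(nn.Linear(4, 2).state_dict(), buf)
model = load_then_move(buf.getvalue(), "cpu")
print({str(p.device) for p in model.parameters()})
```

Loading on CPU first avoids a transient allocation on the default GPU when the requested device is a different one.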
06ead81
to
fb7b179
Compare
status:
TL;DR
Works but for whatever reason loads on "cuda" before going to desired device:
|
MPS works on all but some transformers, which fail due to an aten::upsample_bicubic2d.out operator. The error is reported correctly by torch as "not supported on MPS yet". Multi-GPU works except for zero-shot-classification-transformer-torch on device cuda
+1 to the clip input_ids issue. LGTM for my tests; it just needs the fixes stated above.
open-clip updateThe preprocessor of open clip is loaded in line 92 of This method is called by Even when this issue is fixed by saving the preprocessor in an auxiliary variable and setting Even when Fixing all of these issues fixes the problem in my env. PR: #5395 |
@danielgural @mwoodson1 @manushreegangwar Given the fact that most models are working, I suggest we currently merge this PR as it is and open separate tickets for the other issues, e.g. #5395 Let me know what you think. |
Leaving this test script here in case we need it again: |
@jacobsela can you retarget this PR at |
LGTM
Resolves #5271
Summary by CodeRabbit
New Features
Improvements
Technical Updates
device
attribute in configuration classes for more precise control.