models/yolo-world/ #8224
Replies: 36 comments 121 replies
-
The authors of YOLO-World released instructions for fine-tuning their models. Will that be supported by Ultralytics? If so, I'd like to contribute: https://github.com/AILab-CVC/YOLO-World/blob/master/docs/finetuning.md
-
I'm having trouble understanding what exactly this is.
-
I am getting different results from YOLOv8l-world than from the YOLO-World Hugging Face demo page. Any idea why this could be happening?
-
I tried this code to detect a car and its number plate, but I am not getting any results for the number plate. Car detection works fine. What should I do?
-
Nice job. I am very glad that Ultralytics already supports YOLO-World, and I found that CoreML export is supported as well. This works:
from ultralytics import YOLOWorld
model = YOLOWorld('yolov8s-world.pt')
model.export(format='coreml')
But when I set the classes before exporting, it does not work:
from ultralytics import YOLOWorld
model = YOLOWorld('yolov8s-world.pt')
model.set_classes(["colorchecker", "ball", "object", "painting", "flower", "vase", "lavander", "rabbit"])
model.export(format='coreml')
The error is:
Ultralytics YOLOv8.1.17 🚀 Python-3.10.12 torch-2.1.0+cu121 CPU (Intel Xeon 2.00GHz)
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-67-a013544c0706> in <cell line: 1>()
----> 1 model.export(format='coreml')
8 frames
/usr/local/lib/python3.10/dist-packages/torch/_tensor.py in __deepcopy__(self, memo)
84 return handle_torch_function(Tensor.__deepcopy__, (self,), self, memo)
85 if not self.is_leaf:
---> 86 raise RuntimeError(
87 "Only Tensors created explicitly by the user "
88 "(graph leaves) support the deepcopy protocol at the moment. "
RuntimeError: Only Tensors created explicitly by the user (graph leaves) support the deepcopy protocol at the moment. If you were attempting to deepcopy a module, this may be because of a torch.nn.utils.weight_norm usage, see https://github.com/pytorch/pytorch/pull/103001
-
Is there a way to fine-tune the model once you have set the classes and the bounding boxes are already fairly close to what you want, but the model needs to be more focused on your particular application?
-
How can we run inference on videos with custom classes defined?
-
When will YOLO-World support training?
-
Are there plans to add it to the export method?
-
Can we run inference on multiple images, or on a batch of images, to reduce the overall computation time?
-
The YOLO-World Hugging Face model (https://huggingface.co/spaces/stevengrove/YOLO-World) performs better for my purpose than the standard inference models, "YOLOv8x-worldv2" for example. Because I lack domain knowledge on this topic, I can't get the HF model to work on my Mac (not an M-series processor), so I would like to use the model weights through the abstraction of the ultralytics library. Is there any way to use the authors' provided weights together with the inference library? According to the authors' instructions on how to run the model on an image, one needs the configuration file and the weights: https://github.com/AILab-CVC/YOLO-World The conf file I want to use and the weights are available here: Thank you in advance :)
-
Hello there,
CODE:
class WebcamApp:
# Create a Tkinter window
root = tk.Tk()
# Create the WebcamApp object
app = WebcamApp(root, "Webcam Application")
ERRORS:
-
How can I combine YOLO-World with SAHI to process satellite images?
-
I fine-tuned a YOLO-World model on classes that were not detectable with the original weights, so I now have weights for detecting those classes. I also have another set of weights for classes that are present in the original weights.
from ultralytics import YOLOWorld
How can I combine both sets of weights so that I can detect all the required classes?
-
I am trying to get the tracking ID from an image using the code below:
img = cv2.imread('WA0004_444.jpg')
When I check the tracking results, they show false id: None
-
I want to predict three classes: litter, car license plates and faces.
My problem is how to map the bounding boxes predicted on the crops back to the scale of the original image, so that I can save them in the .txt file in the usual YOLO format. How can I do it? The code where I do the first prediction is this:
The second part, where I run the prediction on the crops, is here:
Now I need to map each bounding box back to the original image (i.e. save it in the .txt file of the first prediction), but I don't know which conversion to apply.
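One way to do this conversion, as a minimal sketch: assuming each crop is an axis-aligned rectangle with known pixel corners (x1, y1, x2, y2) in the original image, and that the second-stage boxes are normalized (xc, yc, w, h) relative to the crop, each box is scaled to pixels inside the crop, shifted by the crop origin, and then re-normalized by the original image size. The function names below are hypothetical helpers, not part of Ultralytics.

```python
def crop_box_to_original(box, crop_xyxy, image_size):
    """Map a normalized YOLO box predicted on a crop back to the full image.

    box        -- (xc, yc, w, h), normalized to the crop's width and height
    crop_xyxy  -- (x1, y1, x2, y2) of the crop in original-image pixels
    image_size -- (width, height) of the original image in pixels
    """
    xc, yc, w, h = box
    x1, y1, x2, y2 = crop_xyxy
    img_w, img_h = image_size
    crop_w, crop_h = x2 - x1, y2 - y1

    # Crop-normalized coordinates -> absolute pixels in the original image.
    abs_xc = x1 + xc * crop_w
    abs_yc = y1 + yc * crop_h
    abs_w = w * crop_w
    abs_h = h * crop_h

    # Absolute pixels -> coordinates normalized to the original image.
    return (abs_xc / img_w, abs_yc / img_h, abs_w / img_w, abs_h / img_h)


def to_label_line(class_id, box):
    """Format one detection as a line of a YOLO .txt label file."""
    return f"{class_id} " + " ".join(f"{v:.6f}" for v in box)


# Example: a 200x200 crop at (100, 200) inside a 1000x800 image.
mapped = crop_box_to_original((0.5, 0.5, 0.5, 0.5), (100, 200, 300, 400), (1000, 800))
print(to_label_line(1, mapped))
```

Because the boxes are normalized, resizing the crop before the second prediction does not change the result, as long as the normalized values refer to the crop itself and not to a padded version of it.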
-
Hello, I recently started exploring the YOLO-World project and greatly appreciate your work. However, I noticed some differences between the original YOLO-World model and YOLO-World v2. In the original YOLO-World paper, the vision features influence the text embeddings through the Image-Pooling Attention (I-Pooling Attention) module within the Re-parameterizable Vision-Language Path Aggregation Network (RepVL-PAN). This module enhances the text embeddings by integrating image-aware information from multi-scale image features. It appears that the Ultralytics YOLO-World v2 has removed the I-Pooling Attention module. My questions are: Would this change decrease the model's performance while improving inference speed? Does this mean that in v2 no module at all uses the vision features from the YOLOv8 backbone to influence the text embeddings? If anything in my understanding is wrong, please correct me. Again, thank you for your work.
-
Could you help me understand how to perform inference on images and videos using an ONNX or TFLite YOLO-World model?
-
Hello! Can the Ultralytics YOLO-World implementation convert the text features (text_feats) from CLIP into an nn.Parameter, so that the text embeddings can be updated by the optimizer? As far as I know, the generation of the text features (text_feats) in world/train.py does not explicitly appear in a model block (such as Conv or C2f). Is there a way to convert text_feats into an nn.Parameter? Thank you for your reply. Wishing you a happy life!
-
Hi, great work! I have a problem when training yolov8x-worldv2.pt. It automatically downloads yolo11n.pt and seems to load it. This seems unreasonable, because training yolov8x-worldv2.pt does not depend on yolo11n.pt.
Command executed:
yolo detect train data='custom_datasets.yaml' model=yolov8x-worldv2.pt epochs=100 imgsz=640 device=0,1,2,3 save_period=1 name=train_yolov8x-worldv2 patience=10
During training, it outputs:
YOLOv8x-worldv2 summary: 396 layers, 72,886,377 parameters, 72,886,361 gradients, 283.6 GFLOPs
AMP: running Automatic Mixed Precision (AMP) checks with YOLO11n...
-
When training yolov8x-worldv2.pt, an error message about insufficient memory appears. Does the training code have a memory leak? Please check.
Error: OSError: [Errno 28] No space left on device
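For context, errno 28 is ENOSPC, which reports a full filesystem rather than exhausted RAM or GPU memory. Below is a minimal sketch for checking free space with the Python standard library. The paths listed (the working directory, /tmp and /dev/shm) are only assumed candidates for where a training run might write, not locations confirmed by the traceback, and free_space_report is a hypothetical helper name.

```python
import errno
import os
import shutil


def free_space_report(paths):
    """Return {path: free space in GiB} for every path that exists."""
    report = {}
    for path in paths:
        if os.path.exists(path):
            usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
            report[path] = usage.free / 2**30
    return report


# Symbolic name of the error code from the traceback.
print(errno.errorcode[errno.ENOSPC])

# Assumed candidate locations; adjust to wherever the run writes its files.
for path, gib in free_space_report([".", "/tmp", "/dev/shm"]).items():
    print(f"{path}: {gib:.2f} GiB free")
```

If one of the reported locations is close to zero, freeing space there or pointing the run at a different directory is a more likely remedy than looking for a memory leak.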
-
How would an exported version of this model work, for example in TensorFlow.js? How do I call set_classes, etc.?
-
Can you share the training logs from your reproduction of YOLO-World?
-
Hey there, while training with YOLO-Worldv2 I've noticed this pattern: regardless of the image size I specify (640x640), the model always initializes with a size of 256x256. Additionally, after every epoch, validation is done at a size of 384x672. Could there be an issue with my code, or is this a deliberate strategy? If it is intentional, how can I adjust this behavior? Thanks for your response!
-
Hi, how can I use the pretrained model yolov8s-worldv2.pt to run inference on the 1203 classes of the LVIS dataset? When I run val.py, there is an error about the number of categories.
-
Hi Ultralytics Team, Thank you for your efforts in bringing YOLO-World into the Ultralytics framework. I've encountered a potential issue (or perhaps just a point of confusion) that I'm hoping to clarify.
The YAML file for yolov8-world, located in ultralytics/cfg/models/v8/yolov8-world.yaml, appears intended to create the original version of YOLO-World, which includes the ImagePoolingAttn module. However, I’ve noticed that none of the subsequent modules seem to take input from this ImagePoolingAttn module, neither the detection head module nor the module directly following it. It seems that at least the module immediately following the ImagePoolingAttn module should have -1 in the "from" section to indicate it’s taking input from the previous layer. Alternatively, another module should specify input from 16, which is the index location of the ImagePoolingAttn module.
I also checked ultralytics/nn/tasks.py, which is responsible for integrating modules into the model, and observed the following in the world detection model:
As observed, each module consistently accesses the same initial text features from the CLIP model without impacting these features. This suggests that the ImagePoolingAttn module may not be functioning as intended. There appears to be no mechanism in place for image features to influence the text embeddings, even though the original YOLO-World paper introduced this concept.
Given this, would it be correct to conclude that the safer approach for using YOLO-World in the Ultralytics framework is to use YOLO-World v2, which removed the ImagePoolingAttn module entirely? YOLO-World v2 relies instead on the C2fAttn module, which incorporates T-CSP layers as described in the original paper. This adjustment seems appropriate, as the ImagePoolingAttn module in YOLO-World does not appear to be fully implemented, potentially missing critical elements.
While I may have overlooked parts where image features influence text features, I haven't been able to identify them thus far. Please clarify if there is any confusion about this. Thank you for your time and assistance.
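The wiring check described in this post can be sketched in a few lines. The layer list below is a hypothetical toy example, not the contents of yolov8-world.yaml, and the resolution rule (a negative "from" value counts back from the current layer, a non-negative value is an absolute layer index) is the convention as described above. consumers_of is an illustrative helper name.

```python
def consumers_of(layers, target):
    """Return indices of layers whose 'from' field resolves to `target`.

    layers -- list of (from_spec, module_name); from_spec is an int or a list of ints
    target -- index of the layer whose consumers we want to find
    """
    found = []
    for idx, (src, _name) in enumerate(layers):
        sources = src if isinstance(src, list) else [src]
        for s in sources:
            # Negative values are relative to the current layer (-1 = previous layer).
            resolved = idx + s if s < 0 else s
            if resolved == target:
                found.append(idx)
                break
    return found


# Hypothetical toy graph: layer 3 produces an output that nothing reads.
toy_layers = [
    (-1, "Conv"),                     # 0
    (-1, "C2f"),                      # 1
    ([-1, 0], "Concat"),              # 2
    ([0, 1, 2], "ImagePoolingAttn"),  # 3
    (2, "C2fAttn"),                   # 4: reads layer 2, skipping layer 3
    ([2, 4], "WorldDetect"),          # 5
]

print(consumers_of(toy_layers, 3))  # layers that consume the output of layer 3
print(consumers_of(toy_layers, 2))  # layers that consume the output of layer 2
```

Running the same kind of check on the parsed model configuration, with the target set to the index of ImagePoolingAttn, would show whether any later layer actually reads its output.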
-
Hi, thank you for your wonderful work on YOLO-World detection. I have tried this part and it works great. I looked at the scripts in the official YOLO-World repository, and they cover two different tasks: one for object detection and one for instance segmentation. Will you update the code to add an instance segmentation head, perhaps named WorldSegment?
-
Hi, I want to set
-
models/yolo-world/
Discover YOLO-World, a YOLOv8-based framework for real-time open-vocabulary object detection in images. It enhances user interaction, boosts computational efficiency, and adapts across various vision tasks.
https://docs.ultralytics.com/models/yolo-world/