Unit 3 : Vision Transformers / Transfer Learning & Fine-Tuning Chapter Content #204
Changes from 37 commits
Maybe we can briefly explain what panoptic segmentation is (you can find it here https://huggingface.co/docs/transformers/tasks/semantic_segmentation) and explain what's going on below. Also you could use `pipeline`, it's shorter.
Suggested changes:
Object detection is a computer vision task that identifies and localizes objects within an image or a video. It involves two primary steps: first, recognizing the types of objects present (such as cars, people, or animals), and second, determining their precise locations by drawing bounding boxes around them. The output is simply a set of numbers which tells where the object is located (a regressive output containing the coordinates of the bounding box) and what that object is (classification).
There are many use cases for object detection. In the field of autonomous driving, for instance, object detection is used to detect different objects (like pedestrians, road signs, and traffic lights) around the car, which becomes one of the inputs for taking decisions. To learn more about object detection, check out the dedicated chapter about Object Detection 🤗
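The two outputs described above (a class label plus box coordinates) can be seen directly with the `transformers` `pipeline` API. A minimal sketch; the checkpoint name and image path here are illustrative, not taken from the notebook:

```python
from transformers import pipeline
from PIL import Image

# Illustrative checkpoint; any object-detection checkpoint on the Hub
# (e.g. a DETR model) works the same way with this pipeline.
detector = pipeline("object-detection", model="facebook/detr-resnet-50")

image = Image.open("street_scene.jpg")  # hypothetical input image
results = detector(image)

# Each detection carries a class label, a confidence score, and a box:
# the classification and regression outputs described above.
for det in results:
    print(det["label"], round(det["score"], 3), det["box"])
```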
Isn't this answered in the course chapter? I think we can skip this in the notebook
"Just execute the below cells to install the necessary packages." => "Execute the below cells to install the necessary packages."
If you mention transformers and PyTorch, then maybe it's worth mentioning the other libraries as well, and what you'll use them for.
Suggestions:
Let's consider a real-world example. Construction workers require the utmost safety when working in construction areas. Basic safety protocols require wearing helmets at all times. Since there are a lot of construction workers, it is hard to keep an eye on everyone all the time. To improve safety, wouldn't it be helpful to have a camera system that can detect whether a person is wearing a helmet in real time?
Let's fine-tune a lightweight object detection model for this. Let's dive in.
It would be nice to say a couple of words about the dataset.
Suggestions:
Now that we know what a sample data point contains, let's plot a sample with the corresponding bounding box. Here is what we are going to do:
1. Get the image and its corresponding height and width.
2. Make a draw object that can easily draw text and lines on the image.
3. Get the annotations dict from the sample and iterate over it.
4. For each annotation, get the bounding box coordinates: x (where the bounding box starts horizontally), y (where the bounding box starts vertically), w (width of the bounding box), and h (height of the bounding box).
5. If the bounding box measures are normalized, scale them; otherwise leave them as-is.
6. Finally, draw the rectangle and the class category text.
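The steps above could be sketched roughly as below with PIL; the annotation keys `bbox` and `category` are assumptions about how the sample stores its labels:

```python
from PIL import Image, ImageDraw

def draw_boxes(image, annotations):
    """Draw COCO-style boxes (x, y, width, height) on a PIL image.

    `annotations` is assumed to be a dict with parallel lists under the
    "bbox" and "category" keys (hypothetical field names).
    """
    draw = ImageDraw.Draw(image)
    img_w, img_h = image.size
    for (x, y, w, h), category in zip(annotations["bbox"], annotations["category"]):
        # If the coordinates look normalized (all in [0, 1]),
        # scale them back to pixel values; otherwise use them as-is.
        if max(x, y, w, h) <= 1.0:
            x, y, w, h = x * img_w, y * img_h, w * img_w, h * img_h
        # Draw the rectangle, then the class category text above it.
        draw.rectangle((x, y, x + w, y + h), outline="red", width=2)
        draw.text((x, max(0, y - 10)), str(category), fill="red")
    return image
```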
You can remove this code comment completely, or move it to markdown instead.
Suggestions:
Preprocessing the images
Before fine-tuning the model, we must preprocess the data in a way that matches exactly the approach used during pre-training. The Hugging Face AutoImageProcessor takes care of processing the image data to create pixel_values, pixel_mask, and labels that a DETR model can train with.
Let's instantiate the image processor from the same checkpoint as the model we want to fine-tune.
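Instantiating the processor might look like the sketch below; the checkpoint name is a placeholder for whichever DETR checkpoint the notebook fine-tunes:

```python
from transformers import AutoImageProcessor

# Example checkpoint; use the same one you load the model from, so that
# preprocessing matches what the model saw during pre-training.
checkpoint = "facebook/detr-resnet-50"
image_processor = AutoImageProcessor.from_pretrained(checkpoint)
```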
Suggestions:
In this section, we will apply different types of augmentations to the images, along with their corresponding bounding boxes. Augmentations are a set of random transformations like rotations, resizing, etc. We will use the albumentations library to achieve this. If you want to dig deeper into different types of augmentations, check out the corresponding unit to learn more.
Note: is there a link to the unit we could add here?
There's no need for this code comment. Please remove
Suggestions:
Once we initialize all the transformations, we need to make a function which formats the annotations and returns them in a specific format. This is because the `image_processor` expects the annotations to be in the following format: `{'image_id': int, 'annotations': List[Dict]}`, where each dictionary is a COCO object annotation.
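A minimal formatting helper along those lines, assuming the sample stores parallel lists of categories, areas, and COCO-format boxes (these field names are assumptions about the dataset schema):

```python
def format_annotations(image_id, categories, areas, bboxes):
    """Convert parallel lists of labels, areas, and COCO-format boxes into
    the {'image_id': int, 'annotations': List[Dict]} structure that the
    image processor expects."""
    annotations = []
    for category, area, bbox in zip(categories, areas, bboxes):
        annotations.append(
            {
                "image_id": image_id,
                "category_id": category,
                "isCrowd": 0,
                "area": area,
                "bbox": list(bbox),  # COCO format: (x, y, width, height)
            }
        )
    return {"image_id": image_id, "annotations": annotations}
```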
Suggestions:
Finally, we combine individual image and annotation transformations to work on a whole dataset batch. Here is the final code to do so:
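One way to sketch that batch-level step is below. The field names (`image_id`, `image`, `objects` with `bbox`/`category`/`area`) are assumptions about the dataset schema, and `transform`/`image_processor` stand for the albumentations pipeline and DETR image processor created earlier in the notebook:

```python
import numpy as np

def augment_and_encode_batch(examples, transform, image_processor):
    """Augment each image with its boxes, reformat the annotations, and run
    the image processor over the whole batch."""
    images, targets = [], []
    for image_id, image, objects in zip(
        examples["image_id"], examples["image"], examples["objects"]
    ):
        image = np.array(image.convert("RGB"))
        out = transform(
            image=image, bboxes=objects["bbox"], category=objects["category"]
        )
        images.append(out["image"])
        # Build COCO-style annotation dicts from the augmented boxes.
        # (Areas are reused as-is here; a flip keeps them valid.)
        annotations = [
            {"image_id": image_id, "category_id": c, "isCrowd": 0,
             "area": a, "bbox": list(b)}
            for c, a, b in zip(out["category"], objects["area"], out["bboxes"])
        ]
        targets.append({"image_id": image_id, "annotations": annotations})
    return image_processor(images=images, annotations=targets, return_tensors="pt")
```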
This seems redundant
This too is not needed
This is something you mention in the markdown, so there's no need to repeat it in the code comments.
You can remove the last sentence.
This can go before the previous markdown, meaning, before you explain what data collator does.
It's a good idea to explain at least what this parameter does.
I think you can skip this.
Suggestions:
Now we will run inference with our newly fine-tuned model. First we write some very simple code for doing object detection inference on new images. Then we will put everything together and make a function out of it.
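A bare-bones inference sketch along those lines; the fine-tuned checkpoint name and image path are placeholders, not real identifiers from the notebook:

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForObjectDetection

# Placeholder for the checkpoint you pushed after fine-tuning.
checkpoint = "your-username/detr-helmet-finetuned"
image_processor = AutoImageProcessor.from_pretrained(checkpoint)
model = AutoModelForObjectDetection.from_pretrained(checkpoint)

image = Image.open("test_image.jpg")  # hypothetical new image
inputs = image_processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Convert raw logits/boxes into labeled detections above a score threshold.
target_sizes = torch.tensor([image.size[::-1]])  # (height, width)
results = image_processor.post_process_object_detection(
    outputs, threshold=0.5, target_sizes=target_sizes
)[0]
for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    print(model.config.id2label[label.item()], round(score.item(), 3), box.tolist())
```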
Why not reuse the function that you have defined earlier?
It would be nice to only show the results with the highest scores; showing everything makes the image look cluttered. The pipeline earlier returns the top two scoring results, so let's only show those.
What does "Clubbing it altogather" mean? Also, it's "altogether"