Lesson 8: Part 2 Intro, Object Detection

(19-Mar-2018, live)


Staff

Notes

  • 600 international fellows around the world
  • Rachel & Jeremy will be in room 153, 10am to 6pm each day (not for mentoring, possible projects)

Object Detection

  • creating much richer convolutional structures
  • figuring out what a picture is of, and where the objects are in the picture

Learning

  • Jeremy trying to pick topics that will help us learn foundational topics (richer CNN)
  • can't possibly cover hundreds of interesting things done with deep learning

Part 1 Takeaways

  • we don't call this deep learning, but differentiable programming
  • Part 1 was about setting up a differentiable function and a loss function, then pressing Go
  • if you can configure a loss function that scores how well you're doing a task, you're kind of done
  • playground.tensorflow.org
    • play interactively where you can create and play with your functions manually

Transfer Learning - definition

Transfer learning or inductive transfer is a research problem in machine learning that focuses on storing knowledge gained while solving one problem and applying it to a different but related problem.[1] For example, knowledge gained while learning to recognize cars could apply when trying to recognize trucks. This area of research bears some relation to the long history of psychological literature on transfer of learning, although formal ties between the two fields are limited.

Transfer Learning

  • the most important thing to learn to do to use deep learning effectively
  • it makes nearly everything easier, faster and more accurate
  • fastai library is all focused on transfer learning
  • take a network that does thing A, remove the last layer or so, replace it with a few random layers at the end, and fine-tune those layers to do thing B, taking advantage of the features the original network learned
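That head-replacement step can be sketched in plain PyTorch. The model and layer sizes below are purely illustrative stand-ins, not the fastai API or a real pretrained network:

```python
import torch
import torch.nn as nn

# Pretend "thing A" network: a tiny pretrained-style backbone ending in a classifier.
backbone = nn.Sequential(
    nn.Linear(8, 16), nn.ReLU(),   # stands in for the pretrained feature layers
    nn.Linear(16, 10),             # original head: 10 classes for task A
)

# Transfer learning: drop the last layer, bolt on a fresh random head for task B.
body = nn.Sequential(*list(backbone.children())[:-1])
for p in body.parameters():
    p.requires_grad = False        # freeze the pretrained features
head = nn.Linear(16, 3)            # new random head: 3 classes for task B
model = nn.Sequential(body, head)

x = torch.randn(4, 8)
print(model(x).shape)              # torch.Size([4, 3])
```

Training then only updates the new head (and later, optionally, unfrozen body layers).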


Embeddings

embeddings allow us to use categorical data
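The idea can be sketched as a plain lookup table in NumPy (a hand-rolled stand-in for something like PyTorch's nn.Embedding; the category and dimensions are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

# Categorical variable: day of week, encoded as integers 0..6.
n_categories, emb_dim = 7, 4
emb = rng.normal(size=(n_categories, emb_dim))  # one learnable vector per category

days = np.array([0, 6, 2])       # e.g. Monday, Sunday, Wednesday
vectors = emb[days]              # an embedding "layer" is just an indexed lookup
print(vectors.shape)             # (3, 4)
```

During training the rows of `emb` get updated by gradient descent like any other weights, so each category learns a useful dense representation.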

Part 1 to Part 2

  • rather than treating fastai and PyTorch as obscure black boxes, we will learn enough to understand their source code
  • object oriented python important to study and understand
  • will introduce Python debugger, using editor to jump to code
  • details on coding technique
  • detailed walk-throughs of papers
  • if you come across something you don't know, it is not hard, it is something you need to learn
  • be careful of taking code from online resources; it may be just good enough to have run the author's experiments, but difficult to generalize, so be ready to do some debugging

Motivation

  • idea is to start with an empty notebook
  • don't copy and paste code from notebook; TYPE IT OUT
  • make sure you can repeat the process
  • practice, practice
  • if you don't understand a step, can ask on the forums, propose a hypothesis for why you think it doesn't work

Deep Learning Box

  • if you wish, and have financial resources, can build your own deep learning toolbox
  • if it is a good time, in your study cycle for it
  • budget: $1000 - $1500 for your own box
  • RAM: try to get 32GB
  • PCI Lanes: don't need to have 16 lanes to feed your GPU, you need 8 lanes
  • Build: you can buy the parts and put it together, or get someplace to do it for you

Reading Papers

  • each week we will be reading papers
  • in academic papers, people love using Greek letters
  • Adam is just momentum on the gradient combined with momentum on the square of the gradient
  • papers include theoretical reasoning for why things work, lot of conferences and journals don't like to accept papers without theoretical justification
  • Geoffrey Hinton: a decade or two ago, no conferences would accept neural network papers; then one abstract theoretical result came out, and journals started accepting neural network research
  • we need to learn to read papers
  • take a paper, put in effort to understand it, and then write a blog to explain it in code and normal English
  • lots of people who do that get a following and great job offers
  • understanding papers ---> useful skill
  • it's hard to read or understand something that you cannot vocalize, which means if you don't know the names of the Greek letters, it's hard to follow
  • spend some time to understand Greek letters
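As a concrete instance of the Adam point above, the update rule can be written in a few lines of NumPy. This is a bare-bones sketch of the published algorithm on a toy 1-D problem, not fastai's implementation:

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: momentum on the gradient and on its square."""
    m = beta1 * m + (1 - beta1) * grad          # moving average of the gradient
    v = beta2 * v + (1 - beta2) * grad ** 2     # moving average of the squared gradient
    m_hat = m / (1 - beta1 ** t)                # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Minimize f(w) = w^2, whose gradient is 2w.
w, m, v = 5.0, 0.0, 0.0
for t in range(1, 201):
    w, m, v = adam_step(w, 2 * w, m, v, t)
print(round(w, 3))
```

Seeing the Greek letters of the paper (beta_1, beta_2, epsilon, m_t, v_t) as variable names makes the paper much easier to follow.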

Opportunities in this Class

  • cutting edge research, almost no one else knows about
  • write blogs, incorporate research into a library
  • communicating what you are doing is very helpful
  • can get feedback on draft blogs on the forums

Part 2: What We Will Study

  • Generative Models
    • CNNs beyond classification
    • NLP beyond classification
  • Large datasets

Part 1 output

  • number
  • category

Part 2 output

  • top left, bottom right of image
  • what object is
  • complete picture
  • enhanced version of input image
  • entire original input paragraph, translated into French

Notes

  • requires different way of thinking about things
  • almost all data will be text or images (no audio yet; time series was mostly covered in the ML course)
  • we will be looking at some larger datasets
  • don't be put off if you have limited computing resources
    • can use smaller datasets
    • can cut down on batch size

Object Detection

  • there are multiple items in each image that we are classifying
  • as well as saying what we see, we also put bounding boxes around what we see
  • bounding box: a rectangle that has the object entirely in it, but is no bigger than it has to be
  • the bounding box around the horse may be slightly imperfect; that is to be expected
  • take data that is labeled this way and, on unlabeled data, generate the classes of the objects and their bounding boxes
  • labeling this kind of data is generally more expensive
  • ImageNet: here are the 1000 classes, tell me which it is
  • Object Detection: here is a list of classes, tell me everything that is in the image and where it is

Stage 1

  • classify and localize the largest object in each image
  1. What it is
  2. Where it is

Notebook: Pascal

  • pascal.ipynb
  • all notebooks are in dl2 folder
  • torch.cuda.set_device(3) picks which GPU to use (of course, the argument depends on how many GPUs you have)

Dataset: Pascal

The PASCAL VOC project:

  • Provides standardised image data sets for object class recognition
  • Provides a common set of tools for accessing the data sets and annotations
  • Enables evaluation and comparison of different methods
  • Ran challenges evaluating performance on object class recognition (from 2005-2012, now finished)

Notes

  • Pascal VOC (Visual Object Classes): http://host.robots.ox.ac.uk/pascal/VOC/
  • we're using 2007 version of data
  • you can use the 2012 version; it's bigger, will get better results
  • some people combine the two, need to be careful, there can be leakage between the validation datasets


PATH

  • this gives you object oriented access to the files
  • pathlib object has an open method
  • load the .json files which don't contain the images, but the bounding boxes and the classes of the object
  • json - the most standard way to pass around hierarchical structured data
PATH = Path('data/pascal')
list(PATH.iterdir())
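Loading the annotation JSON through pathlib can be sketched as below. The file layout and key names here are invented for illustration, not necessarily the exact Pascal JSON schema:

```python
import json
import tempfile
from pathlib import Path

# Stand-in for the Pascal annotation file, with a made-up structure.
tmp = Path(tempfile.mkdtemp())
(tmp / 'annotations.json').write_text(
    json.dumps({'images': [{'id': 12, 'file_name': '000012.jpg'}],
                'annotations': [{'image_id': 12, 'bbox': [96, 155, 269, 350]}]}))

# pathlib objects have an open() method, so json.load works on them directly.
with (tmp / 'annotations.json').open() as f:
    data = json.load(f)
print(data['annotations'][0]['bbox'])   # [96, 155, 269, 350]
```

Note the JSON holds only bounding boxes and class labels; the images themselves live in separate files.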

Coding

  • requires tenacity

Editor

  • Visual Studio Code is a great editor out there, it is FREE
    • best editor out there (unless you are willing to put time in to learn Vim or Emacs)
    • if you download a recent version of Anaconda, it will offer to install Visual Studio Code for you
    • good choice of editor if you are not sure

Steps

  • do git clone of fastai library
  • File / Open Folder / open fastai github library
  • For interpreter: can select fastai environment

You can use Visual Studio Code (vscode - open source editor that comes with recent versions of Anaconda, or can be installed separately), or most editors and IDEs, to find out all about the open_image function. vscode things to know:

  • Command palette (Ctrl-shift-p)
  • Select interpreter (for fastai env)
  • Select terminal shell
  • Go to symbol (Ctrl-t)
  • Find references (Shift-F12)
  • Go to definition (F12)
  • Go back (alt-left)
  • View documentation
  • Hide sidebar (Ctrl-b)
  • Zen mode (Ctrl-k,z)

OpenCV open image

open_image

  • cv2 is the open cv library
  • torch vision library uses PyTorch tensors for all of its data augmentation
  • a lot of people use Pillow (the maintained fork of PIL, the Python Imaging Library), which adds support for opening, manipulating, and saving many different image file formats
  • Jeremy did a lot of testing; found open cv is 5-10x faster than Torch Vision
  • Jeremy did satellite competition with another student, Torch Vision was very slow
  • PIL is faster than Torch Vision, but not as fast as open cv; PIL is not as thread-safe
  • Python has the GIL (global interpreter lock), which means two threads cannot execute Python code at the same time; this makes Python a less-than-great language for modern multi-threaded programming
  • open cv releases the GIL
  • one of the reasons the fastai library is so amazingly fast is that it doesn't use multiple processes for data augmentation; it uses multiple threads, and it can do that because it uses open cv
  • unfortunately, open cv has a crappy, poorly documented API
  • for these reasons, don't use PyTorch (torchvision) or Pillow for your data augmentation

Matplotlib

  • matplotlib so named because it was originally a clone of matlab's plotting library
  • unfortunately matlab's plotting library is awful, but it was what people used at the time
  • so matplotlib added a second, object-oriented API, but there are hardly any tutorials on it
  • Jeremy will show us how to use this API and some simple tricks
  • plt.subplots is a handy wrapper; it returns two things: a figure object and an axes object
  • instead of saying plt.something, you now say ax.something
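A minimal sketch of that object-oriented style (standard matplotlib, nothing fastai-specific; the Agg backend is used here so it runs without a display):

```python
import matplotlib
matplotlib.use('Agg')              # headless backend: render without a screen
import matplotlib.pyplot as plt

# plt.subplots returns a figure and an axes object; draw on the axes directly.
fig, ax = plt.subplots(figsize=(4, 4))
ax.plot([0, 1, 2], [0, 1, 4])
ax.set_title('OO-style matplotlib')
ax.set_xlabel('x')
fig.savefig('demo.png')
```

The payoff is that with several subplots you get one axes object per panel, so the same `ax.something` code works whether you are drawing one plot or a grid of them.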

Step 1: Largest Item Classifier

  • Jeremy didn't have much experience in object detection before preparing for this course
  • find the biggest object in each image and classify it
  • younger students often design the whole big solution they want up front, full of speculative ideas, spend 6 months on it, and the day before the presentation it doesn't work
  • Kaggle approach: half an hour each day, make it better than the day before
  • go through each of the bounding boxes in image and get the biggest one
  • lambda functions used everywhere, a one-off function
  • sorted python function
def get_lrg(b):
    if not b: raise Exception()
    # sort the (bbox, class) pairs by bounding-box area, largest first
    b = sorted(b, key=lambda x: np.product(x[0][-2:]-x[0][:2]), reverse=True)
    return b[0]
  • dictionary comprehension is like list comprehension but it goes inside curly brackets
trn_lrg_anno = {a: get_lrg(b) for a,b in trn_anno.items()}
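The two snippets above can be exercised on made-up annotations. The boxes below are invented; the format follows the notebook's top-left/bottom-right convention, and np.prod is used here in place of the notebook's np.product (which newer NumPy versions removed):

```python
import numpy as np

def get_lrg(b):
    """Return the (bbox, class) pair with the largest bounding-box area."""
    if not b: raise Exception()
    b = sorted(b, key=lambda x: np.prod(x[0][-2:] - x[0][:2]), reverse=True)
    return b[0]

# image id -> list of (bbox as [row0, col0, row1, col1], class id)
trn_anno = {
    12: [(np.array([96, 155, 269, 350]), 7),    # area 173 * 195
         (np.array([10, 10, 20, 20]), 3)],      # area 10 * 10
}

# dictionary comprehension: same shape as a list comprehension, in curly brackets
trn_lrg_anno = {a: get_lrg(b) for a, b in trn_anno.items()}
print(trn_lrg_anno[12][1])   # 7, the class of the biggest box
```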

Coding

  • lots of people write lines and lines of code without checking what it is doing, and at the very end they have an error and do not know where it is
  • handy method for creating a directory
(PATH/'tmp').mkdir(exist_ok=True)
CSV = PATH/'tmp/lrg.csv'
  • why create a csv file? makes it easy, create a csv, put in a temp folder and use what we already have
  • easiest way to create a csv file is to create a pandas dataframe
  • code below: dictionaries don't guarantee order, so we pass columns= to fix the column order
df = pd.DataFrame({'fn': [trn_fns[o] for o in trn_ids],
    'cat': [cats[trn_lrg_anno[o][1]] for o in trn_ids]}, columns=['fn','cat'])
df.to_csv(CSV, index=False)

Back to "Dogs and Cats"!

f_model = resnet34
sz=224
bs=64
tfms = tfms_from_model(f_model, sz, aug_tfms=transforms_side_on, crop_type=CropType.NO)
md = ImageClassifierData.from_csv(PATH, JPEGS, CSV, tfms=tfms, bs=bs)
  • crop_type=CropType.NO is different from before: recall the default strategy for a 224x224 image is to resize it so the smallest side is 224, then take a random square crop during training; during validation we take a center crop, unless we do data augmentation, in which case we take a few augmented crops
  • for bounding boxes we don't want to do that: unlike ImageNet, where the thing we care about is pretty much in the middle and pretty big, a lot of the objects in object detection are quite small and close to the edge, so cropping could remove them entirely, and that would be bad
  • crop_type=CropType.NO means don't crop; to make the image square it squishes it instead
  • generally speaking, a lot of computer vision models work a bit better if you crop rather than squish, but they still work if you squish
  • in this case we definitely don't want to crop, so squishing is perfect
  • if you had very long or very tall images, squishing might make things more difficult to model
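The squish-versus-crop difference can be sketched with nearest-neighbour resizing in NumPy. This is a toy illustration of the two strategies, not what fastai or open cv actually do internally:

```python
import numpy as np

def squish(img, sz):
    """Resize to sz x sz, ignoring aspect ratio (nearest-neighbour)."""
    h, w = img.shape[:2]
    rows = np.linspace(0, h - 1, sz).astype(int)
    cols = np.linspace(0, w - 1, sz).astype(int)
    return img[np.ix_(rows, cols)]

def center_crop(img, sz):
    """Resize so the short side is sz, then cut out the central sz x sz square."""
    h, w = img.shape[:2]
    scale = sz / min(h, w)
    rows = np.linspace(0, h - 1, round(h * scale)).astype(int)
    cols = np.linspace(0, w - 1, round(w * scale)).astype(int)
    resized = img[np.ix_(rows, cols)]
    top, left = (resized.shape[0] - sz) // 2, (resized.shape[1] - sz) // 2
    return resized[top:top + sz, left:left + sz]

img = np.arange(300 * 500).reshape(300, 500)   # a wide 300x500 "image"
print(squish(img, 224).shape, center_crop(img, 224).shape)
```

Both produce 224x224, but center_crop throws away the left and right edges of the wide image, which is exactly where a small object might sit; squish keeps everything at the cost of distorting shapes.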

Model Loader

  • the main thing to know about a data loader is that it is an iterator
  • each time you grab the next value from it, you get a mini batch
  • by default, the batch size is 64
  • in Python, the way to get the next item from an iterable is next(iter(...))
x,y=next(iter(md.val_dl))
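That next(iter(...)) pattern is plain Python, and works with any iterable standing in for md.val_dl (the batches below are made up):

```python
# A stand-in "data loader": any iterable of (x, y) mini batches behaves the same way.
fake_dl = [([0.1, 0.2], 'cat'), ([0.3, 0.4], 'dog')]

it = iter(fake_dl)             # iter() gives an iterator over the batches
x, y = next(it)                # next() pulls one mini batch
print(x, y)                    # [0.1, 0.2] cat

x2, y2 = next(iter(fake_dl))   # next(iter(...)) always gives the *first* batch
```

This is handy for grabbing a single batch to inspect shapes and labels before training.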