Lesson 8: Part 2 Intro, Object Detection

(19-Mar-2018, live)


Staff

Notes

  • 600 international fellows around the world
  • Rachel & Jeremy will be in room 153, 10am to 6pm each day (not for mentoring, possible projects)

Object Detection

  • creating much richer convolutional structures
  • figuring out what a picture is of, and where the objects are in the picture

Learning

  • Jeremy trying to pick topics that will help us learn foundational topics (richer CNN)
  • can't possibly cover hundreds of interesting things done with deep learning

Part 1 Takeaways

  • we don't call this deep learning, but differentiable programming
  • Part 1 was about setting up a differentiable function and a loss function, then pressing Go
  • if you can configure a loss function that scores how well you're doing a task, you're kind of done
  • playground.tensorflow.org
    • play interactively where you can create and play with your functions manually

Transfer Learning - definition

Transfer learning or inductive transfer is a research problem in machine learning that focuses on storing knowledge gained while solving one problem and applying it to a different but related problem.[1] For example, knowledge gained while learning to recognize cars could apply when trying to recognize trucks. This area of research bears some relation to the long history of psychological literature on transfer of learning, although formal ties between the two fields are limited.

Transfer Learning

  • the most important thing to learn to do to use deep learning effectively
  • it makes nearly everything easier, faster and more accurate
  • fastai library is all focused on transfer learning
  • take a network that does thing A, remove the last layer or so, replace it with a few random layers at the end, and fine-tune those layers to do thing B, taking advantage of the features the original network learned
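That head-replacement step can be sketched in plain PyTorch. The model and layer sizes below are purely illustrative stand-ins, not the fastai API or a real pretrained network:

```python
import torch
import torch.nn as nn

# Pretend "thing A" network: a tiny pretrained-style backbone ending in a classifier.
backbone = nn.Sequential(
    nn.Linear(8, 16), nn.ReLU(),   # stands in for the pretrained feature layers
    nn.Linear(16, 10),             # original head: 10 classes for task A
)

# Transfer learning: drop the last layer, bolt on a fresh random head for task B.
body = nn.Sequential(*list(backbone.children())[:-1])
for p in body.parameters():
    p.requires_grad = False        # freeze the pretrained features
head = nn.Linear(16, 3)            # new random head: 3 classes for task B
model = nn.Sequential(body, head)

x = torch.randn(4, 8)
print(model(x).shape)              # torch.Size([4, 3])
```

Training then only updates the new head (and later, optionally, unfrozen body layers).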


Embeddings

embeddings allow us to use categorical data
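The idea can be sketched as a plain lookup table in NumPy (a hand-rolled stand-in for something like PyTorch's nn.Embedding; the category and dimensions are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

# Categorical variable: day of week, encoded as integers 0..6.
n_categories, emb_dim = 7, 4
emb = rng.normal(size=(n_categories, emb_dim))  # one learnable vector per category

days = np.array([0, 6, 2])       # e.g. Monday, Sunday, Wednesday
vectors = emb[days]              # an embedding "layer" is just an indexed lookup
print(vectors.shape)             # (3, 4)
```

During training the rows of `emb` get updated by gradient descent like any other weights, so each category learns a useful dense representation.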

Part 1 to Part 2

  • rather than treating fastai and PyTorch as obscure black boxes, we will learn enough to understand their source code
  • object oriented python important to study and understand
  • will introduce Python debugger, using editor to jump to code
  • details on coding technique
  • detailed walk-throughs of papers
  • if you come across something you don't know, it is not hard, it is something you need to learn
  • be careful of taking code from online resources; it may be just good enough to have run the author's experiments, but difficult to generalize, so be ready to do some debugging

Motivation

  • idea is to start with an empty notebook
  • don't copy and paste code from notebook; TYPE IT OUT
  • make sure you can repeat the process
  • practice, practice
  • if you don't understand a step, can ask on the forums, propose a hypothesis for why you think it doesn't work

Deep Learning Box

  • if you wish, and have financial resources, can build your own deep learning toolbox
  • if it is a good time, in your study cycle for it
  • budget: $1000 - $1500 for your own box
  • RAM: try to get 32GB
  • PCI Lanes: don't need to have 16 lanes to feed your GPU, you need 8 lanes
  • Build: you can buy the parts and put it together, or get someplace to do it for you

Reading Papers

  • each week we will be reading papers
  • in academic papers, people love using Greek letters
  • Adam is just momentum on the gradient combined with momentum on the square of the gradient
  • papers include theoretical reasoning for why things work, lot of conferences and journals don't like to accept papers without theoretical justification
  • Geoffrey Hinton: a decade or two ago, no conferences would accept neural network papers; then one abstract theoretical result came out, and journals started accepting neural network research
  • we need to learn to read papers
  • take a paper, put in effort to understand it, and then write a blog to explain it in code and normal English
  • lots of people who do that get a following and great job offers
  • understanding papers ---> useful skill
  • it's hard to read or understand something that you cannot vocalize, which means if you don't know the names of the Greek letters, it's hard to follow
  • spend some time to understand Greek letters
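As a concrete instance of the Adam point above, the update rule can be written in a few lines of NumPy. This is a bare-bones sketch of the published algorithm on a toy 1-D problem, not fastai's implementation:

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: momentum on the gradient and on its square."""
    m = beta1 * m + (1 - beta1) * grad          # moving average of the gradient
    v = beta2 * v + (1 - beta2) * grad ** 2     # moving average of the squared gradient
    m_hat = m / (1 - beta1 ** t)                # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Minimize f(w) = w^2, whose gradient is 2w.
w, m, v = 5.0, 0.0, 0.0
for t in range(1, 201):
    w, m, v = adam_step(w, 2 * w, m, v, t)
print(round(w, 3))
```

Seeing the Greek letters of the paper (beta_1, beta_2, epsilon, m_t, v_t) as variable names makes the paper much easier to follow.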

Opportunities in this Class

  • cutting edge research, almost no one else knows about
  • write blogs, incorporate research into a library
  • communicating what you are doing is very helpful
  • can get feedback on draft blogs on the forums

Part 2: What We Will Study

  • Generative Models
    • CNNs beyond classification
    • NLP beyond classification
  • Large datasets

Part 1 output

  • number
  • category

Part 2 output

  • top left, bottom right of image
  • what object is
  • complete picture
  • enhanced version of input image
  • entire original input paragraph, translated into French

Notes

  • requires different way of thinking about things
  • almost all data will be text or images (no audio yet; time series was mostly covered in the ML course)
  • we will be looking at some larger datasets
  • don't be put off if you have limited computing resources
    • can use smaller datasets
    • can cut down on batch size

Object Detection

  • there are multiple items in each image that we are classifying
  • as well as saying what we see, we also put bounding boxes around what we see
  • bounding box: a rectangle that has the object entirely in it, but is no bigger than it has to be
  • the bounding box around the horse may be slightly imperfect; that is to be expected
  • take data that is labeled this way and, on unlabeled data, generate the classes of the objects and their bounding boxes
  • labeling this kind of data is generally more expensive
  • ImageNet: here are the 1000 classes, tell me which it is
  • Object Detection: here is a list of classes, tell me everything that is in the image and where it is

Stage 1

  • classify and localize the largest object in each image
  1. What it is
  2. Where it is

Notebook: Pascal

  • pascal.ipynb
  • all notebooks are in dl2 folder
  • torch.cuda.set_device(3) picks which GPU to use (of course, the argument depends on how many GPUs you have)

Dataset: Pascal

The PASCAL VOC project:

  • Provides standardised image data sets for object class recognition
  • Provides a common set of tools for accessing the data sets and annotations
  • Enables evaluation and comparison of different methods
  • Ran challenges evaluating performance on object class recognition (from 2005-2012, now finished)

Notes

  • Pascal VOC (Visual Object Classes): http://host.robots.ox.ac.uk/pascal/VOC/
  • we're using 2007 version of data
  • you can use the 2012 version; it's bigger, will get better results
  • some people combine the two, need to be careful, there can be leakage between the validation datasets


PATH

  • this gives you object oriented access to the files
  • pathlib object has an open method
  • load the .json files which don't contain the images, but the bounding boxes and the classes of the object
  • json - the most standard way to pass around hierarchical structured data
PATH = Path('data/pascal')
list(PATH.iterdir())
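Loading the annotation JSON through pathlib can be sketched as below. The file layout and key names here are invented for illustration, not necessarily the exact Pascal JSON schema:

```python
import json
import tempfile
from pathlib import Path

# Stand-in for the Pascal annotation file, with a made-up structure.
tmp = Path(tempfile.mkdtemp())
(tmp / 'annotations.json').write_text(
    json.dumps({'images': [{'id': 12, 'file_name': '000012.jpg'}],
                'annotations': [{'image_id': 12, 'bbox': [96, 155, 269, 350]}]}))

# pathlib objects have an open() method, so json.load works on them directly.
with (tmp / 'annotations.json').open() as f:
    data = json.load(f)
print(data['annotations'][0]['bbox'])   # [96, 155, 269, 350]
```

Note the JSON holds only bounding boxes and class labels; the images themselves live in separate files.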

Coding

  • requires tenacity

Editor

  • Visual Studio Code is a great editor out there, it is FREE
    • best editor out there (unless you are willing to put time in to learn Vim or Emacs)
    • if you download a recent version of Anaconda, it will offer to install Visual Studio Code for you
    • good choice of editor if you are not sure

Steps

  • do git clone of fastai library
  • File / Open Folder / open fastai github library
  • For interpreter: can select fastai environment

You can use Visual Studio Code (vscode - open source editor that comes with recent versions of Anaconda, or can be installed separately), or most editors and IDEs, to find out all about the open_image function. vscode things to know:

  • Command palette (Ctrl-shift-p)
  • Select interpreter (for fastai env)
  • Select terminal shell
  • Go to symbol (Ctrl-t)
  • Find references (Shift-F12)
  • Go to definition (F12)
  • Go back (alt-left)
  • View documentation
  • Hide sidebar (Ctrl-b)
  • Zen mode (Ctrl-k,z)

OpenCV open image

open_image

  • cv2 is the open cv library
  • torch vision library uses PyTorch tensors for all of its data augmentation
  • a lot of people use Pillow (the maintained fork of PIL, the Python Imaging Library), which adds support for opening, manipulating, and saving many different image file formats
  • Jeremy did a lot of testing; found open cv is 5-10x faster than Torch Vision
  • Jeremy did satellite competition with another student, Torch Vision was very slow
  • PIL is faster than Torch Vision, but not as fast as open cv; PIL is not as thread-safe
  • Python has the GIL (global interpreter lock), which means two threads cannot execute Python code at the same time; this makes Python a less-than-great language for modern multi-threaded programming
  • open cv releases the GIL
  • one of the reasons the fastai library is so amazingly fast is that it doesn't use multiple processes for data augmentation; it uses multiple threads, and it can do that because it uses open cv
  • unfortunately, open cv has a crappy, poorly documented API
  • for these reasons, don't use PyTorch (torchvision) or Pillow for your data augmentation

Matplotlib

  • matplotlib so named because it was originally a clone of matlab's plotting library
  • unfortunately matlab's plotting library is awful, but it was what people used at the time
  • so matplotlib added a second, object-oriented API, but there are hardly any tutorials on it
  • Jeremy will show us how to use this API and some simple tricks
  • plt.subplots is a handy wrapper; it returns two things: a figure object and an axes object
  • instead of saying plt.something, you now say ax.something
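A minimal sketch of that object-oriented style (standard matplotlib, nothing fastai-specific; the Agg backend is used here so it runs without a display):

```python
import matplotlib
matplotlib.use('Agg')              # headless backend: render without a screen
import matplotlib.pyplot as plt

# plt.subplots returns a figure and an axes object; draw on the axes directly.
fig, ax = plt.subplots(figsize=(4, 4))
ax.plot([0, 1, 2], [0, 1, 4])
ax.set_title('OO-style matplotlib')
ax.set_xlabel('x')
fig.savefig('demo.png')
```

The payoff is that with several subplots you get one axes object per panel, so the same `ax.something` code works whether you are drawing one plot or a grid of them.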

Step 1: Largest Item Classifier

  • Jeremy didn't have much experience in object detection before preparing for this course
  • find the biggest object in each image and classify it
  • younger students often design the whole big solution they want up front, full of speculative ideas, spend 6 months on it, and the day before the presentation it doesn't work
  • Kaggle approach: half an hour each day, make it better than the day before
  • go through each of the bounding boxes in image and get the biggest one
  • lambda functions used everywhere, a one-off function
  • sorted python function
def get_lrg(b):
    if not b: raise Exception()
    # sort the (bbox, class) pairs by bounding-box area, largest first
    b = sorted(b, key=lambda x: np.product(x[0][-2:]-x[0][:2]), reverse=True)
    return b[0]
  • dictionary comprehension is like list comprehension but it goes inside curly brackets
trn_lrg_anno = {a: get_lrg(b) for a,b in trn_anno.items()}
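The two snippets above can be exercised on made-up annotations. The boxes below are invented; the format follows the notebook's top-left/bottom-right convention, and np.prod is used here in place of the notebook's np.product (which newer NumPy versions removed):

```python
import numpy as np

def get_lrg(b):
    """Return the (bbox, class) pair with the largest bounding-box area."""
    if not b: raise Exception()
    b = sorted(b, key=lambda x: np.prod(x[0][-2:] - x[0][:2]), reverse=True)
    return b[0]

# image id -> list of (bbox as [row0, col0, row1, col1], class id)
trn_anno = {
    12: [(np.array([96, 155, 269, 350]), 7),    # area 173 * 195
         (np.array([10, 10, 20, 20]), 3)],      # area 10 * 10
}

# dictionary comprehension: same shape as a list comprehension, in curly brackets
trn_lrg_anno = {a: get_lrg(b) for a, b in trn_anno.items()}
print(trn_lrg_anno[12][1])   # 7, the class of the biggest box
```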

Coding

  • lots of people write lines and lines of code without checking what it is doing, and at the very end they have an error and do not know where it is
  • handy method for creating a directory
(PATH/'tmp').mkdir(exist_ok=True)
CSV = PATH/'tmp/lrg.csv'
  • why create a csv file? makes it easy, create a csv, put in a temp folder and use what we already have
  • easiest way to create a csv file is to create a pandas dataframe
  • code below: dictionaries don't guarantee order, so we pass columns= to fix the column order
df = pd.DataFrame({'fn': [trn_fns[o] for o in trn_ids],
    'cat': [cats[trn_lrg_anno[o][1]] for o in trn_ids]}, columns=['fn','cat'])
df.to_csv(CSV, index=False)

Back to "Dogs and Cats"!

f_model = resnet34
sz=224
bs=64
tfms = tfms_from_model(f_model, sz, aug_tfms=transforms_side_on, crop_type=CropType.NO)
md = ImageClassifierData.from_csv(PATH, JPEGS, CSV, tfms=tfms, bs=bs)
  • crop_type=CropType.NO is different from before: recall the default strategy for a 224x224 image is to resize it so the smallest side is 224, then take a random square crop during training; during validation we take a center crop, unless we do data augmentation, in which case we take a few augmented crops
  • for bounding boxes we don't want to do that: unlike ImageNet, where the thing we care about is pretty much in the middle and pretty big, a lot of the objects in object detection are quite small and close to the edge, so cropping could remove them entirely, and that would be bad
  • crop_type=CropType.NO means don't crop; to make the image square it squishes it instead
  • generally speaking, a lot of computer vision models work a bit better if you crop rather than squish, but they still work if you squish
  • in this case we definitely don't want to crop, so squishing is perfect
  • if you had very long or very tall images, squishing might make things more difficult to model
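The squish-versus-crop difference can be sketched with nearest-neighbour resizing in NumPy. This is a toy illustration of the two strategies, not what fastai or open cv actually do internally:

```python
import numpy as np

def squish(img, sz):
    """Resize to sz x sz, ignoring aspect ratio (nearest-neighbour)."""
    h, w = img.shape[:2]
    rows = np.linspace(0, h - 1, sz).astype(int)
    cols = np.linspace(0, w - 1, sz).astype(int)
    return img[np.ix_(rows, cols)]

def center_crop(img, sz):
    """Resize so the short side is sz, then cut out the central sz x sz square."""
    h, w = img.shape[:2]
    scale = sz / min(h, w)
    rows = np.linspace(0, h - 1, round(h * scale)).astype(int)
    cols = np.linspace(0, w - 1, round(w * scale)).astype(int)
    resized = img[np.ix_(rows, cols)]
    top, left = (resized.shape[0] - sz) // 2, (resized.shape[1] - sz) // 2
    return resized[top:top + sz, left:left + sz]

img = np.arange(300 * 500).reshape(300, 500)   # a wide 300x500 "image"
print(squish(img, 224).shape, center_crop(img, 224).shape)
```

Both produce 224x224, but center_crop throws away the left and right edges of the wide image, which is exactly where a small object might sit; squish keeps everything at the cost of distorting shapes.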

Model Loader

  • the main thing to know about a data loader is that it is an iterator
  • each time you grab the next value from it, you get a mini batch
  • by default, the batch size is 64
  • in Python, the way to get the next item from an iterable is next(iter(...))
x,y=next(iter(md.val_dl))
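That next(iter(...)) pattern is plain Python, and works with any iterable standing in for md.val_dl (the batches below are made up):

```python
# A stand-in "data loader": any iterable of (x, y) mini batches behaves the same way.
fake_dl = [([0.1, 0.2], 'cat'), ([0.3, 0.4], 'dog')]

it = iter(fake_dl)             # iter() gives an iterator over the batches
x, y = next(it)                # next() pulls one mini batch
print(x, y)                    # [0.1, 0.2] cat

x2, y2 = next(iter(fake_dl))   # next(iter(...)) always gives the *first* batch
```

This is handy for grabbing a single batch to inspect shapes and labels before training.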