Preparation for Interviews for an ML Position

I will be looking for a full-time position in ML soon, so here I list things that I think may be asked during interviews.

Some items come from questions that were asked during my own interviews, or that I heard were asked during others' interviews. The rest come from what I have read or learned.

The purpose of this list is to give me a quick review in case I overlook or forget something.

Essential Knowledge

ML

  • Bias vs. variance
  • No free lunch
  • Probably approximately correct learning
  • Occam's razor
  • cross validation
  • overfitting vs. underfitting
  • unbalanced classes
  • error bound: sample complexity, model complexity
  • regularizer
  • 0-1 loss, hinge loss, various loss functions ...
  • derivative of sigmoid, softmax, relu, tanh (see the sketch after this list)
  • information theory (information gain, mutual information)
  • covariate shift
  • kernel
  • recursive partitioning
  • Bayesian inference
  • KL divergence
  • convex optimization (Lagrange multipliers)
  • multi-armed bandits
  • principle of maximum entropy
  • Bayesian optimization
  • domain adaptation
  • classifiers in adversarial environments
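
The activation-function derivatives in the list above are small enough to write from memory; here is a minimal numpy sketch (the function names are my own, not from any linked source):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def d_sigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)            # sigma'(x) = sigma(x) * (1 - sigma(x))

def d_tanh(x):
    return 1.0 - np.tanh(x) ** 2    # tanh'(x) = 1 - tanh(x)^2

def d_relu(x):
    return (x > 0).astype(float)    # subgradient: 1 for x > 0, else 0

def softmax(z):
    e = np.exp(z - np.max(z))       # shift for numerical stability
    return e / e.sum()

def softmax_jacobian(z):
    # d softmax_i / d z_j = s_i * (delta_ij - s_j)
    s = softmax(z)
    return np.diag(s) - np.outer(s, s)
```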

Deep Learning

  • gradient descent
  • vanishing and exploding gradients (see the scalar caricature after this list)
  • hidden state of RNN (or LSTM)
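
A scalar caricature of why gradients vanish or explode in an RNN: during backpropagation through time the gradient is multiplied by (roughly) the recurrent weight at every step, so over T steps it scales like w**T. The numbers below are just an illustration:

```python
# Gradient factor reaching a state T steps back behaves like w**T,
# where w stands in for the recurrent weight (a scalar caricature).
T = 50
for w in (0.9, 1.0, 1.1):
    print(f"w = {w}: factor after {T} steps = {w ** T:.4g}")
# w = 0.9 -> ~0.005 (vanishing), w = 1.1 -> ~117 (exploding)
```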

Tutorials and Links

Deep Learning

How to implement a recurrent neural network, part 1

A reference for how to implement backpropagation through time.

Minimal Neural Network case study

Content from Stanford CS231n, which contains a numpy implementation of the forward and backward passes of a neural network. P.S. I was told that Google asked interviewees to implement gradient descent by hand; a minimal sketch in that spirit follows.
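
This is not the CS231n code itself, just a minimal sketch in the same spirit: a two-layer network (ReLU hidden layer, softmax output) with the forward pass, the backward pass, and a plain gradient descent update, on a toy dataset I made up for illustration:

```python
import numpy as np

# Toy data: 100 points in 2-D with XOR-like labels (my own illustrative choice).
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)

D, H, C = 2, 16, 2                                # input dim, hidden units, classes
W1 = 0.1 * rng.standard_normal((D, H)); b1 = np.zeros(H)
W2 = 0.1 * rng.standard_normal((H, C)); b2 = np.zeros(C)
lr = 0.5

for step in range(300):
    # forward pass
    h = np.maximum(0, X @ W1 + b1)                # ReLU hidden layer
    scores = h @ W2 + b2
    exp = np.exp(scores - scores.max(axis=1, keepdims=True))
    probs = exp / exp.sum(axis=1, keepdims=True)  # softmax
    loss = -np.log(probs[np.arange(len(y)), y]).mean()

    # backward pass (gradient of the mean cross-entropy loss)
    dscores = probs.copy()
    dscores[np.arange(len(y)), y] -= 1
    dscores /= len(y)
    dW2 = h.T @ dscores;  db2 = dscores.sum(axis=0)
    dh = dscores @ W2.T
    dh[h <= 0] = 0                                # ReLU gradient
    dW1 = X.T @ dh;       db1 = dh.sum(axis=0)

    # plain gradient descent update
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(f"final training loss: {loss:.3f}")
```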

Yes you should understand backprop

Backprop, intuitions

Content from Stanford CS231n

Exploding and Vanishing Gradients

ML

SVM - dual problem

I read somewhere that the dual problem was asked about during an interview for an ML position.

Kernel Methods and the Representer Theorem

The representer theorem is the reason why the SVM optimization can be converted to the dual problem; the standard primal and dual forms are sketched below.
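
For reference, my own summary of the soft-margin SVM primal and its dual (the dual is where the kernel enters, since the inputs appear only through inner products):

```latex
% Soft-margin SVM, primal form
\min_{w,\,b,\,\xi}\; \tfrac{1}{2}\lVert w\rVert^2 + C\sum_{i=1}^{n}\xi_i
\quad \text{s.t.}\quad y_i\,(w^\top x_i + b) \ge 1 - \xi_i,\;\; \xi_i \ge 0

% Dual form, obtained by introducing Lagrange multipliers \alpha_i.
% Inputs appear only as x_i^\top x_j, which can be replaced by a kernel k(x_i, x_j).
\max_{\alpha}\; \sum_{i=1}^{n}\alpha_i
  - \tfrac{1}{2}\sum_{i,j}\alpha_i\alpha_j\,y_i y_j\,x_i^\top x_j
\quad \text{s.t.}\quad 0 \le \alpha_i \le C,\;\; \sum_{i=1}^{n}\alpha_i y_i = 0
```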

Decision Tree Flavors: Gini Index and Information Gain

Using Lagrange multipliers in optimization

This post shows a simple way to calculate the partial derivatives; a short worked example follows below.
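
As a quick refresher on the mechanics, a standard textbook-style worked example (maximize x + y on the unit circle); this is my own illustration, not taken from the post:

```latex
% Maximize f(x, y) = x + y subject to g(x, y) = x^2 + y^2 - 1 = 0.
\mathcal{L}(x, y, \lambda) = x + y - \lambda\,(x^2 + y^2 - 1)

% Set the partial derivatives to zero:
\frac{\partial \mathcal{L}}{\partial x} = 1 - 2\lambda x = 0,\qquad
\frac{\partial \mathcal{L}}{\partial y} = 1 - 2\lambda y = 0,\qquad
\frac{\partial \mathcal{L}}{\partial \lambda} = -(x^2 + y^2 - 1) = 0

% So x = y = \frac{1}{2\lambda}; the constraint then gives \lambda = \pm\frac{1}{\sqrt{2}},
% and the maximum is attained at x = y = \frac{1}{\sqrt{2}}, with value \sqrt{2}.
```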

Principle of Maximum Entropy: Simple Form

A lecture note on the principle of maximum entropy from an MIT course.

Conditional Entropy (wiki)

Mutual Information (wiki)

Lecture notes on entropy and mutual information

Information gain

Slides on information gain. They give an intuitive explanation of entropy and information gain; a small numpy sketch follows.
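
A minimal numpy sketch of the two quantities the slides discuss, the entropy of a label set and the information gain of a split (the helper names are my own):

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(labels, mask):
    """Entropy reduction from splitting `labels` by the boolean `mask`.
    Assumes both sides of the split are non-empty."""
    n = len(labels)
    left, right = labels[mask], labels[~mask]
    children = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - children

# Example: a perfectly informative split has gain equal to the parent entropy.
y = np.array([0, 0, 1, 1])
print(entropy(y))                   # 1.0 bit
print(information_gain(y, y == 0))  # 1.0 bit
```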

Covariate Shift

A good introduction to covariate shift, with an intuitive explanation of why covariate shift hurts model performance.

[Learning from unbalanced data](https://www.svds.com/learning-imbalanced-classes/)