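f : X -> Y (a more powerful f, whose input and output can both be structured objects). X and Y can be sequences, trees, discrete labels, etc.
Find a function F : X × Y -> R
F(x, y) evaluates how compatible the objects x and y are
f(x) = argmax_{y in Y} F(x, y) (enumerate every candidate y and keep the one with the highest score)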
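A minimal sketch of inference by enumeration, assuming a toy tagging task. The function `compatibility` and its capitalization rule are hypothetical stand-ins for F(x, y); they are only there to make the argmax concrete.

```python
from itertools import product

def compatibility(x, y):
    """Hypothetical F(x, y): number of positions where the tag fits a toy rule."""
    # Assumed rule for illustration: capitalized words prefer tag "N", others "O".
    return sum(
        1.0 if word[0].isupper() == (tag == "N") else 0.0
        for word, tag in zip(x, y)
    )

def predict(x, labels=("N", "O")):
    """f(x) = argmax over y of F(x, y), found by listing every candidate y."""
    candidates = product(labels, repeat=len(x))  # len(labels) ** len(x) sequences
    return max(candidates, key=lambda y: compatibility(x, y))

x = ("Alice", "met", "Bob")
print(predict(x))  # ('N', 'O', 'N')
```

The number of candidates grows exponentially with the length of x, which is why listing all y is only feasible for tiny problems.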
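Training: estimate the joint probability P(x, y), P : X × Y -> [0, 1]
Inference: argmax_y P(y | x) = argmax_y P(x, y) / P(x) = argmax_y P(x, y), since P(x) does not depend on y
Drawback of using probability: the constraint that scores lie in [0, 1] is not needed to pick the argmax
Strength of using probability: the scores are meaningful (they can be read as probabilities)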
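A small numeric check of the inference identity, assuming a made-up joint table for one fixed x. The values in `joint` are illustrative, not estimated from data.

```python
# Hypothetical values of P(x, y) for one fixed x and three candidate labels y.
joint = {"a": 0.10, "b": 0.25, "c": 0.05}

p_x = sum(joint.values())                              # P(x) = sum over y of P(x, y)
conditional = {y: p / p_x for y, p in joint.items()}   # P(y | x) = P(x, y) / P(x)

best_joint = max(joint, key=joint.get)
best_conditional = max(conditional, key=conditional.get)
print(best_joint, best_conditional)  # b b
```

Dividing every entry by the same positive constant P(x) rescales the scores but cannot change which y is largest.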
- What does F(x, y) look like?
- How to solve the argmax over y (branch and bound, Viterbi, etc.)
- How to find F(x, y) from training data
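For the second question, a sketch of Viterbi under one specific assumption: x and y are sequences of the same length and F(x, y) splits into per-position scores plus scores for adjacent label pairs. The tables `emit` and `trans` are hypothetical inputs holding those scores.

```python
def viterbi(emit, trans):
    """Maximize sum_t emit[t][y_t] + sum_{t>=1} trans[y_{t-1}][y_t].

    emit:  list of length T; emit[t][k] scores label k at position t.
    trans: K x K list; trans[j][k] scores label j followed by label k.
    Returns (best_score, best_path).
    """
    T, K = len(emit), len(trans)
    score = list(emit[0])   # score[k]: best score of any prefix ending in label k
    back = []               # back[t-1][k]: best previous label for label k at t
    for t in range(1, T):
        new_score, pointers = [], []
        for k in range(K):
            j_best = max(range(K), key=lambda j: score[j] + trans[j][k])
            new_score.append(score[j_best] + trans[j_best][k] + emit[t][k])
            pointers.append(j_best)
        score = new_score
        back.append(pointers)
    # Trace back from the best final label.
    k = max(range(K), key=lambda k: score[k])
    best_score = score[k]
    path = [k]
    for pointers in reversed(back):
        k = pointers[k]
        path.append(k)
    return best_score, path[::-1]

emit = [[1.0, 0.0], [0.0, 0.5], [2.0, 0.0]]   # illustrative scores, T = 3, K = 2
trans = [[0.0, 1.0], [1.5, 0.0]]
print(viterbi(emit, trans))  # (6.0, [0, 1, 0])
```

This costs on the order of T * K^2 operations instead of the K^T needed to list every y.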
F(x, y) is a linear combination of features (characteristics) of the pair (x, y)
F(x, y) = w · phi(x, y), where phi(x, y) is the feature vector and w is the weight vector
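A sketch of one possible phi, assuming y is a class index and x is a plain feature vector. This block layout (copy x into the slot of class y) is a common choice for illustration, not the only one, and the weights below are made up rather than learned.

```python
def phi(x, y, num_classes):
    """Hypothetical joint feature map: copy x into the block that belongs to class y."""
    d = len(x)
    features = [0.0] * (d * num_classes)
    features[y * d:(y + 1) * d] = x
    return features

def F(w, x, y, num_classes):
    """F(x, y) = w . phi(x, y)."""
    return sum(wi * fi for wi, fi in zip(w, phi(x, y, num_classes)))

w = [1.0, -1.0,   0.0, 2.5,   -0.5, 0.5]   # illustrative weights, one block per class
x = [3.0, 1.0]
scores = [F(w, x, y, num_classes=3) for y in range(3)]
print(scores)                                  # [2.0, 2.5, -1.0]
print(max(range(3), key=lambda y: scores[y]))  # 1
```

With this phi, w · phi(x, y) reduces to the usual per-class linear score, which is one way to see ordinary classification as an instance of the same formula.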
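w is learned from training data, e.g. with the (structured) perceptron algorithm
(Binary classification is a special case of structured learning)
However, the perceptron assumes the training data are separable, i.e. some w scores the correct y above every other y for every training x. (When they are not, we may rely on Structured SVM)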
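A minimal sketch of the perceptron update for F(x, y) = w · phi(x, y), assuming Y is small enough to enumerate and the data are separable. The block-style `phi`, the helper names, and the three training points are all hypothetical choices for illustration.

```python
def phi(x, y, num_classes):
    """Hypothetical joint feature map: copy x into the block of class y."""
    d = len(x)
    out = [0.0] * (d * num_classes)
    out[y * d:(y + 1) * d] = x
    return out

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def predict(w, x, num_classes):
    """argmax over y of w . phi(x, y), by enumeration."""
    return max(range(num_classes), key=lambda y: dot(w, phi(x, y, num_classes)))

def train_perceptron(data, num_classes, max_epochs=100):
    w = [0.0] * (len(data[0][0]) * num_classes)
    for _ in range(max_epochs):
        mistakes = 0
        for x, y_true in data:
            y_hat = predict(w, x, num_classes)
            if y_hat != y_true:
                # Update: w <- w + phi(x, y_true) - phi(x, y_hat)
                good = phi(x, y_true, num_classes)
                bad = phi(x, y_hat, num_classes)
                w = [wi + g - b for wi, g, b in zip(w, good, bad)]
                mistakes += 1
        if mistakes == 0:   # every training example is predicted correctly
            break
    return w

# Toy separable data: (x, correct y).
data = [([1.0, 0.0], 0), ([0.0, 1.0], 1), ([-1.0, -1.0], 2)]
w = train_perceptron(data, num_classes=3)
print(all(predict(w, x, 3) == y for x, y in data))  # True
```

The `max_epochs` cap is a safeguard: on non-separable data the loop would otherwise never reach a pass with zero mistakes.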