- Machine learning sub-problems
- classification : mapping data to a discrete class label (predicting a class label)
- regression : predicting a numerical value
- similarity : finding similar/dissimilar data
- clustering : discovering structure in data
- embedding : data to a vector
- reinforcement learning : training by feedback
- Machine Learning Model Evaluation Metrics | PyData LA 2019 | video
- classification error metrics
- accuracy : can be misleading on unbalanced data
- mean Average precision (mAP)
- confusion matrix
- F1 score
- AUC
- and ...
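A quick scikit-learn sketch of the classification metrics above on a toy imbalanced label set (the labels, probabilities, and 0.5 threshold are illustrative assumptions; `average_precision_score` gives the per-class AP that mAP averages over classes/queries):

```python
import numpy as np
from sklearn.metrics import (accuracy_score, f1_score, confusion_matrix,
                             roc_auc_score, average_precision_score)

y_true = np.array([0, 0, 0, 0, 0, 1, 1, 1])                   # imbalanced toy labels
y_prob = np.array([0.1, 0.2, 0.3, 0.4, 0.7, 0.9, 0.6, 0.4])   # predicted P(y = 1)
y_pred = (y_prob >= 0.5).astype(int)                           # hard predictions at a 0.5 threshold

print("accuracy         :", accuracy_score(y_true, y_pred))   # can look good on unbalanced data
print("F1 score         :", f1_score(y_true, y_pred))         # balances precision and recall
print("confusion matrix :\n", confusion_matrix(y_true, y_pred))
print("ROC AUC          :", roc_auc_score(y_true, y_prob))    # threshold-independent
print("average precision:", average_precision_score(y_true, y_prob))  # building block of mAP
```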
- regression error metrics
- R^2
- mean square error
- absolute error
- root mean squared logarithmic error
- mean absolute percentage error
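The regression metrics above, written out as a small NumPy sketch (toy values; these are the standard formulas):

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

mse   = np.mean((y_true - y_pred) ** 2)                                  # mean squared error
mae   = np.mean(np.abs(y_true - y_pred))                                 # mean absolute error
rmsle = np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2))     # root mean squared log error
mape  = np.mean(np.abs((y_true - y_pred) / y_true)) * 100                # mean absolute percentage error
r2    = 1 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)  # R^2

print(mse, mae, rmsle, mape, r2)
```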
- permutation invariant : a model that produces the same output regardless of the order of elements in the input vector
  - e.g. permutation invariant model : MLP
  - e.g. permutation invariant operation : sum, mean, median, max, min
  - e.g. permutation variant model : CNN, RNN --> position information
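A quick NumPy check of the distinction above: permutation-invariant operations give the same result after shuffling the input, while a position-dependent computation (a stand-in for CNN/RNN-style position information) does not:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=8)
x_shuffled = rng.permutation(x)

# permutation-invariant operations: same output for any ordering of the elements
print(np.isclose(x.sum(), x_shuffled.sum()))              # True
print(np.isclose(np.median(x), np.median(x_shuffled)))    # True

# position-dependent computation: weights tied to positions, so shuffling changes the output
w = np.linspace(0.1, 1.0, x.size)
print(np.isclose(w @ x, w @ x_shuffled))                  # almost surely False
```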
- techniques to improve generalization / reduce overfitting
- data augmentation
- ensemble Ref: https://machinelearningmastery.com/ensemble-methods-for-deep-learning-neural-networks/
- dropout
- skip connection : ensemble-like structure | Ref: https://lswook.tistory.com/105
- paper : Taking the Human Out of the Loop: A Review of Bayesian Optimization
- Intro to graph neural networks (ML Tech Talks) | video
- application of GNN : prediction of drug properties from structure
- A Deep Learning Approach to Antibiotic Discovery | Cell, 2020 | paper
- application of GNN : estimate ETA
- application of GNN : social network
- spectral approaches : redefine the convolution operation in the Fourier domain, using spectral filters based on the graph Laplacian.
- non-spectral approaches : define the convolution operation directly on the graph.
- GraphSAGE : node embedding through sampling and aggregation
- Graph Attention Network (GAT)
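A minimal NumPy sketch of the GraphSAGE-style "sample and aggregate" step mentioned above, using mean aggregation (the toy graph, feature sizes, and weight matrices are illustrative assumptions):

```python
import numpy as np

def graphsage_mean_layer(H, neighbors, W_self, W_neigh):
    """One GraphSAGE-style layer: mean-aggregate neighbor features, concat with self, transform."""
    H_new = []
    for v in range(H.shape[0]):
        neigh = neighbors.get(v, [])
        agg = H[neigh].mean(axis=0) if neigh else np.zeros(H.shape[1])    # aggregation step
        h = np.concatenate([H[v], agg]) @ np.vstack([W_self, W_neigh])    # transform [self ; agg]
        H_new.append(np.maximum(h, 0))                                    # ReLU
    H_new = np.stack(H_new)
    return H_new / (np.linalg.norm(H_new, axis=1, keepdims=True) + 1e-8)  # l2-normalized embeddings

# toy graph: 4 nodes, 8-dim input features, 16-dim output node embeddings (all sizes assumed)
H = np.random.randn(4, 8)
neighbors = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
W_self, W_neigh = np.random.randn(8, 16), np.random.randn(8, 16)
print(graphsage_mean_layer(H, neighbors, W_self, W_neigh).shape)           # (4, 16)
```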
- topology-based pooling :
- graph coarsening algorithms
- global pooling architecture : node feature representation; effective on graphs with a smaller number of nodes.
- hierarchical pooling architecture : effective on graphs with a larger number of nodes.
- DiffPool
- graph pooling (gPool)
- Graph U-Net
- Self-Attention Graph Pooling (SAGPool)
- Modeling Polypharmacy Side Effects with Graph Convolutional Networks
- [Korean blog] RNN basic
- What is the output in a RNN? | stack overflow
- return_sequences
  - return_sequences=True : output shape [batch_size, time_steps, units] (containing the output for all time steps)
  - return_sequences=False : output shape [batch_size, units] (containing the output of the last time step)
- TimeDistributed(Dense) vs Dense in Keras - Same number of parameters | stack overflow
- TimeDistributedDense applies the same Dense layer to every time step during GRU/LSTM cell unrolling, so the error function is computed between the predicted label sequence and the actual label sequence (the usual requirement for sequence-to-sequence labeling problems).
- with return_sequences=False, the Dense layer is applied only once, at the last cell; this is the usual case when RNNs are used for a classification problem.
- if return_sequences=True, the Dense layer is applied to every time step, just like TimeDistributedDense.
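A minimal Keras sketch (TensorFlow 2.x assumed; all layer sizes are toy choices) contrasting the two output shapes and the TimeDistributed(Dense) usage described above:

```python
import tensorflow as tf
from tensorflow.keras import layers

inputs = layers.Input(shape=(4, 8))            # 4 time steps, 8 input features (assumed sizes)

# return_sequences=False -> (batch, 16): only the last time step, typical for classification
last_state = layers.LSTM(16, return_sequences=False)(inputs)
clf_out = layers.Dense(3, activation="softmax")(last_state)

# return_sequences=True -> (batch, 4, 16): one vector per time step, typical for seq2seq labeling
all_states = layers.LSTM(16, return_sequences=True)(inputs)
seq_out = layers.TimeDistributed(layers.Dense(3, activation="softmax"))(all_states)

model = tf.keras.Model(inputs, [clf_out, seq_out])
model.summary()                                # clf_out: (None, 3), seq_out: (None, 4, 3)
```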
- learn what to recognize as an important input (input gate), store it in the long-term state, preserve it for as long as it is needed (forget gate), and extract it whenever it is needed (output gate).
- two hidden states: h_t (short-term state; 'h' stands for 'hidden'), c_t (long-term state; 'c' stands for 'cell')
- gate controller : a logistic (sigmoid) activation function with output between 0 and 1; output 0 closes the gate, output 1 opens it.
- forget gate
- input gate
- output gate
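The three gates and the two states above, written as the standard LSTM cell update (σ is the logistic gate controller, ⊙ is element-wise multiplication; W, U, b are learned parameters):

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) &&\text{forget gate} \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) &&\text{input gate} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) &&\text{output gate} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) &&\text{candidate state} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t &&\text{long-term state} \\
h_t &= o_t \odot \tanh(c_t) &&\text{short-term state}
\end{aligned}
```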
- layer normalization
- residual connection
- ResNet
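A minimal Keras sketch (TensorFlow 2.x assumed; sizes are arbitrary) of a residual (skip) connection followed by layer normalization, the "add & norm" pattern used in Transformer-style blocks; ResNet applies the same skip idea around convolutional layers:

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, units):
    """y = LayerNorm(x + F(x)): the skip path lets gradients bypass the transformation F."""
    h = layers.Dense(units, activation="relu")(x)
    h = layers.Dense(units)(h)                  # F(x); output width must match x for the addition
    out = layers.Add()([x, h])                  # residual / skip connection
    return layers.LayerNormalization()(out)     # layer normalization ("add & norm")

inputs = layers.Input(shape=(32,))
x = layers.Dense(32, activation="relu")(inputs)
x = residual_block(x, 32)
model = tf.keras.Model(inputs, x)
model.summary()
```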
- Neural Machine Translation by Jointly Learning to Align and Translate | paper1
- the concept of attention mechanism | video
- transformer | video
- query, key, value
- attention pooling : Given a query, attention pooling biases selection over values.
- attention scoring function : maps a query and a key to a score; a softmax over these scores yields the attention weights used in the weighted sum of the values
- masked softmax operation
- additive attention
- scaled dot-product attention
- Bahdanau Attention : encoder-decoder
  - paper : Neural Machine Translation by Jointly Learning to Align and Translate | paper1
- self-attention (intra-attention) and positional encoding
  - self-attention : query, key, and values come from the same place
  - positional encoding : to use the sequence order information, we can inject absolute or relative positional information by adding a positional encoding to the input representation. X + P (X : input representation, P : positional embedding matrix)
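A NumPy sketch pulling together the bullets above: the scaled dot-product scoring function, a masked softmax, attention pooling as a weighted sum of values, and self-attention where Q, K, V come from the same input (all shapes and values are toy assumptions):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Q: (n_q, d), K: (n_k, d), V: (n_k, d_v) -> (n_q, d_v)."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                      # scaled dot-product scoring function
    if mask is not None:
        scores = np.where(mask, scores, -1e9)          # masked softmax: hide padded/forbidden keys
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax -> attention weights
    return weights @ V                                 # attention pooling: weighted sum of values

# self-attention: queries, keys, and values all come from the same representation X
X = np.random.randn(5, 16)                             # 5 tokens, 16-dim features (toy sizes)
print(scaled_dot_product_attention(X, X, X).shape)     # (5, 16)
```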
- Attention Is All You Need
- What exactly are keys, queries, and values in attention mechanisms? | stack overflow
- Pay Attention to MLPs
- two stage detection : slow, accurate
- R-CNN
- fast R-CNN
- faster R-CNN : regional proposal network
- one stage detection : fast, less accurate, not easy to train
  - Frames per Second (FPS) > 30 : criterion for real-time visualization
  - YOLO : grid --> conditional class probability and bounding boxes + confidence --> final detection
- Single Shot Detection (SSD)
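A toy NumPy sketch of the YOLO bullet above: per-cell conditional class probabilities are multiplied by per-box confidence to get class-specific scores for the final detection step (grid size, box count, and class count are illustrative assumptions; thresholding and NMS would follow):

```python
import numpy as np

S, B, C = 7, 2, 20                              # 7x7 grid, 2 boxes per cell, 20 classes (assumed)
cond_class_prob = np.random.rand(S, S, C)       # Pr(class | object) per grid cell
box_confidence = np.random.rand(S, S, B)        # Pr(object) * IoU per predicted box

# class-specific confidence per box: Pr(class | object) * Pr(object) * IoU -> (S, S, B, C)
class_scores = cond_class_prob[..., None, :] * box_confidence[..., :, None]

best_class = class_scores.argmax(axis=-1)       # best class per box
best_score = class_scores.max(axis=-1)          # its score; low scores are filtered before NMS
print(best_class.shape, best_score.shape)       # (7, 7, 2) (7, 7, 2)
```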
- application
- image colorization
- GauGAN system
- cycleGAN : cycle-consistent adversarial network
- idea : A -> B -> A'
- cycle-consistency loss : abs(A - A')
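The A -> B -> A' idea above as a tiny NumPy sketch; G_ab and G_ba are hypothetical stand-ins for the two generators (in the real model they are neural networks), and the batch shape is assumed:

```python
import numpy as np

def G_ab(a):                                    # placeholder generator A -> B
    return a * 0.9 + 0.1

def G_ba(b):                                    # placeholder generator B -> A
    return (b - 0.1) / 0.9

A = np.random.rand(4, 64, 64, 3)                # toy batch of images from domain A
A_reconstructed = G_ba(G_ab(A))                 # A -> B -> A'

cycle_loss = np.abs(A - A_reconstructed).mean() # cycle-consistency loss: |A - A'|
print(cycle_loss)                               # ~0 here because the placeholders are exact inverses
```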
Perspective : Everything in ML/DL involves human effort (e.g. training data selection, loss function, model architecture)
Yejin Choi
Q. how to reduce stereotypes or biases such as racism or sexism?
- dataset
- "garbage in, garbage out"
- data augmentation
- objective function
- a traditional objective function only minimizes/maximizes the error
- to handle the bias, add a 'gender' variable and a constraint that makes the model treat the 'gender' variable equally
- review : A Primer on Neural Network Models for Natural Language Processing
- word embeddings
- Word2vec
- CBOW
- skip-gram
- Character Embedding : out-of-vocabulary (OOV) words
- Contextualized word embeddings
- Embedding from Language Model (ELMo) Ref : https://wikidocs.net/33930
- Generative adversarial networks (GANs)
- autoregressive models
- flows
- variational autoencoders (VAEs)
  - paper : Auto-Encoding Variational Bayes | link
- Diffusion probabilistic model
  - paper : Denoising Diffusion Probabilistic Models
  - paper : Denoising Diffusion Implicit Models
- Stable diffusion
  - paper : High-Resolution Image Synthesis with Latent Diffusion Models