
Implement MAML meta-opt #23

Merged 18 commits on May 9, 2022
3 changes: 2 additions & 1 deletion .gitignore
@@ -115,11 +115,12 @@ dmypy.json
.pyre/

# pycharm
*/.DS_Store
*.DS_Store
**/__pycache__/
.idea/
FETCH_HEAD

# vscode
.vscode
*.DS_Store
PaddleFSL/raw_data/
48 changes: 48 additions & 0 deletions PaddleFSL/examples/optim/README.md
@@ -0,0 +1,48 @@
# Image Classification Tasks

Here we provide examples of applying PaddleFSL to few-shot image classification tasks, similar to the [model_zoo](../image_classification/README.md) examples.


## Datasets

We evaluate the performance on six benchmark datasets (Omniglot, *mini*ImageNet, CIFAR-FS, FC100, CUB and Tiered-ImageNet), which can be accessed as described in [raw_data/README.md](../../raw_data/README.md).


## Results

We provide results of using MAML [1] and ANIL [2] below. The exact model configurations and pretrained models, which reproduce these results, can be downloaded from [here](https://drive.google.com/file/d/1pmCI-8cwLsadG6JOcubufrQ2d4zpK9B-/view?usp=sharing).

### [MAML](http://proceedings.mlr.press/v70/finn17a/finn17a.pdf?source=post_page---------------------------)


| Dataset | Backbone | Way | Shot | Original paper | Other reports | Model zoo (first order) | Optim (first order) |
| :-------------: | :------: | :--: | :--: | :------------: | :----------------------------------------------------------: | :--------------------: | :----------------: |
| Omniglot | MLP | 5 | 1 | 89.7 ± 1.1 | 88.9<br>([learn2learn](http://learn2learn.net/)) | 88.88 ± 2.99 | -- |
| Omniglot | MLP | 5 | 5 | 97.5 ± 0.6 | -- | 97.50 ± 0.47 | -- |
| Omniglot | CNN | 5 | 1 | 98.7 ± 0.4 | 99.1<br/>([learn2learn](http://learn2learn.net/)) | 97.13 ± 1.25 | 92.7 |
| Omniglot | CNN | 5 | 5 | 99.9 ± 0.1 | 99.9 ± 0.1<br/>([R2D2](https://arxiv.org/pdf/1805.08136.pdf)) | 99.23 ± 0.40 | ***93.1*** |
| *mini*ImageNet | CNN | 5 | 1 | 48.70 ± 1.84 | 48.3<br/>([learn2learn](http://learn2learn.net/)) | 49.81 ± 1.78 | |
| *mini*ImageNet | CNN | 5 | 5 | 63.11 ± 0.92 | 65.4<br/>([learn2learn](http://learn2learn.net/)) | 64.21 ± 1.33 | -- |
| CIFAR-FS | CNN | 5 | 1 | -- | 58.9 ± 1.9<br/>([R2D2](https://arxiv.org/pdf/1805.08136.pdf)) | 57.06 ± 3.83 | 49.1 |
| CIFAR-FS | CNN | 5 | 5 | -- | 76.6<br/>([learn2learn](http://learn2learn.net/)) | 72.24 ± 1.71 | -- |
| FC100 | CNN | 5 | 1 | -- | -- | 37.63 ± 2.23 | 30.2 |
| FC100 | CNN | 5 | 5 | -- | 49.0<br/>([learn2learn](http://learn2learn.net/)) | 49.14 ± 1.58 | -- |
| CUB | CNN | 5 | 1 | -- | 54.73 ± 0.97<br/>([CloseLookFS](https://arxiv.org/pdf/1904.04232.pdf)) | 53.31 ± 1.77 | 20.7 |
| CUB | CNN | 5 | 5 | -- | 75.75 ± 0.76<br/>([CloseLookFS](https://arxiv.org/pdf/1904.04232.pdf)) | 69.88 ± 1.47 | -- |
| Tiered-ImageNet | CNN | 5 | 5 | -- | -- | 67.56 ± 1.80 | -- |
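
As a reading aid, the inner/outer-loop structure that produces these numbers can be sketched in a few lines of plain Python. This is a toy first-order MAML (FOMAML) on 1-D quadratic tasks, not PaddleFSL's implementation; every name in it is illustrative.

```python
# Toy first-order MAML (FOMAML): each task t wants to minimize
# L_t(w) = (w - t)^2, and we meta-learn the initialization w.
# Illustrative sketch only, not the PaddleFSL implementation.

def grad(w, t):
    """Gradient of the per-task loss L_t(w) = (w - t)^2."""
    return 2.0 * (w - t)

def fomaml(tasks, meta_lr=0.05, inner_lr=0.1, inner_steps=3, epochs=200):
    w = 2.0  # meta-parameters: the shared initialization being learned
    for _ in range(epochs):
        meta_grad = 0.0
        for t in tasks:
            # Inner loop: adapt a copy of w to the task.
            w_task = w
            for _ in range(inner_steps):
                w_task -= inner_lr * grad(w_task, t)
            # First-order approximation: treat the post-adaptation
            # gradient as the meta-gradient (no second derivatives).
            meta_grad += grad(w_task, t)
        # Outer loop: update the shared initialization.
        w -= meta_lr * meta_grad / len(tasks)
    return w

# For the symmetric task set {-1, 0, 1}, the learned initialization
# converges to the mean of the task optima, i.e. w close to 0.
w_star = fomaml(tasks=[-1.0, 0.0, 1.0])
```

The "first order" column headers above refer to exactly this approximation: backpropagating through the adapted parameters' values without differentiating through the inner-loop updates themselves.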

### [ANIL](https://openreview.net/pdf?id=rkgMkCEtPB)

| Dataset | Backbone | Way | Shot | Author report | Other reports | Model zoo (first order) | Optim (first order) |
| :------------: | :------: | :--: | :--: | :-----------: | :-----------------------------------------------: | :--------------------: | :----------------: |
| Omniglot | CNN | 5 | 1 | -- | -- | 96.06 ± 1.00 | 96.34 ± 1.98 |
| Omniglot | CNN | 5 | 5 | -- | -- | 98.74 ± 0.48 | |
| *mini*ImageNet | CNN | 5 | 1 | 46.7 ± 0.4 | -- | 48.31 ± 2.83 | 45.31 ± 1.43 |
| *mini*ImageNet | CNN | 5 | 5 | 61.5 ± 0.5 | -- | 62.38 ± 1.96 | 61.81 ± 1.2 |
| CIFAR-FS | CNN | 5 | 1 | -- | -- | 56.19 ± 3.39 | ***30.8 ± 2.5*** |
| CIFAR-FS | CNN | 5 | 5 | -- | 68.3<br/>([learn2learn](http://learn2learn.net/)) | 68.60 ± 1.25 | 48.6 |
| FC100 | CNN | 5 | 1 | -- | -- | 40.69 ± 3.32 | 38.4 ± 1.3 |
| FC100 | CNN | 5 | 5 | -- | 47.6<br/>([learn2learn](http://learn2learn.net/)) | 48.01 ± 1.22 | 35.0 |
| CUB | CNN | 5 | 1 | -- | -- | 53.25 ± 2.18 | -- |
| CUB | CNN | 5 | 5 | -- | -- | 69.09 ± 1.12 | -- |
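
ANIL's defining difference from MAML in the tables above is that the inner loop adapts only the classifier head while the feature body stays frozen. A two-parameter toy model makes that concrete; this is a hypothetical sketch, not PaddleFSL code.

```python
# Toy ANIL inner loop: the model is head * (body * x); only the head
# receives inner-loop gradient steps, the shared body stays frozen.
# Illustrative sketch only, not the PaddleFSL implementation.

def predict(body, head, x):
    return head * (body * x)  # "feature extractor" body, linear head

def head_grad(body, head, x, y):
    # d/dhead of the squared error (head*body*x - y)^2
    return 2.0 * (predict(body, head, x) - y) * (body * x)

def inner_adapt(body, head, task, inner_lr=0.1, steps=5):
    """ANIL inner loop: gradient steps on the head only."""
    for x, y in task:
        for _ in range(steps):
            head -= inner_lr * head_grad(body, head, x, y)
    return body, head  # body is returned untouched

body0, head0 = 1.0, 0.0
task = [(1.0, 3.0)]  # one support example: y = 3 at x = 1
body1, head1 = inner_adapt(body0, head0, task)
# body1 equals body0, while head1 has moved most of the way toward 3.
```

Because only the small head is adapted per task, ANIL's inner loop is cheaper than MAML's while the outer loop still meta-trains the whole network.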

143 changes: 143 additions & 0 deletions PaddleFSL/examples/optim/anil_example.py
@@ -0,0 +1,143 @@
"""ANIL example for optimization"""
from __future__ import annotations
import os
import paddle
from paddle import nn
from paddle.optimizer import Adam
import paddlefsl
from paddlefsl.metaopt.anil import ANILLearner
from examples.optim.meta_trainer import Config, Trainer, load_datasets


def init_models(config: Config):
    """Initialize the feature model and head layer for the configured dataset."""
    if config.dataset == 'cub':
        config.meta_lr = 0.002
        config.inner_lr = 0.01
        config.test_epoch = 10
        config.meta_batch_size = 32
        config.train_inner_adapt_steps = 5
        config.test_inner_adapt_steps = 10
        config.epochs = 10000

        if config.k_shot == 5:
            config.meta_lr = 0.003
            config.inner_lr = 0.05
            config.epochs = 10000

        feature_model = paddlefsl.backbones.Conv(input_size=(3, 84, 84), output_size=config.n_way, conv_channels=[32, 32, 32, 32])
        feature_model.output = paddle.nn.Flatten()
        head_layer = paddle.nn.Linear(in_features=feature_model.feature_size, out_features=config.n_way,
                                      weight_attr=feature_model.init_weight_attr, bias_attr=feature_model.init_bias_attr)

    if config.dataset == 'cifarfs':
        config.meta_lr = 0.001
        config.inner_lr = 0.02
        config.test_epoch = 10
        config.meta_batch_size = 32
        config.train_inner_adapt_steps = 5
        config.test_inner_adapt_steps = 10
        config.epochs = 20000
        if config.k_shot == 5:
            config.meta_lr = 0.001
            config.inner_lr = 0.08

        feature_model = paddlefsl.backbones.Conv(input_size=(3, 32, 32), output_size=config.n_way, conv_channels=[32, 32, 32, 32])
        feature_model.output = paddle.nn.Flatten()
        head_layer = paddle.nn.Linear(in_features=32, out_features=config.n_way,
                                      weight_attr=feature_model.init_weight_attr, bias_attr=feature_model.init_bias_attr)

    if config.dataset == 'miniimagenet':
        config.meta_lr = 0.002
        config.inner_lr = 0.05
        config.test_epoch = 10
        config.meta_batch_size = 32
        config.train_inner_adapt_steps = 5
        config.test_inner_adapt_steps = 10
        config.epochs = 30000

        feature_model = paddlefsl.backbones.Conv(input_size=(3, 84, 84), output_size=config.n_way, conv_channels=[32, 32, 32, 32])
        feature_model.output = paddle.nn.Flatten()
        head_layer = paddle.nn.Linear(in_features=feature_model.feature_size, out_features=config.n_way,
                                      weight_attr=feature_model.init_weight_attr, bias_attr=feature_model.init_bias_attr)

    if config.dataset == 'omniglot':
        config.meta_lr = 0.005
        config.inner_lr = 0.5
        config.test_epoch = 10
        config.meta_batch_size = 32
        config.train_inner_adapt_steps = 1
        config.test_inner_adapt_steps = 3
        config.epochs = 30000

        if config.k_shot == 5:
            config.meta_lr = 0.06
            config.inner_lr = 0.12
            config.train_inner_adapt_steps = 3
            config.test_inner_adapt_steps = 5

        feature_model = paddlefsl.backbones.Conv(input_size=(1, 28, 28), output_size=config.n_way, pooling=False)
        feature_model.output = paddle.nn.Flatten()
        head_layer = paddle.nn.Linear(in_features=feature_model.feature_size, out_features=config.n_way,
                                      weight_attr=feature_model.init_weight_attr, bias_attr=feature_model.init_bias_attr)

    if config.dataset == 'fc100':
        config.meta_lr = 0.005
        config.inner_lr = 0.1
        config.test_epoch = 10
        config.meta_batch_size = 32
        config.train_inner_adapt_steps = 5
        config.test_inner_adapt_steps = 10
        config.epochs = 5000
        if config.k_shot == 5:
            config.meta_lr = 0.002
            config.epochs = 2000

        feature_model = paddlefsl.backbones.Conv(input_size=(3, 32, 32), output_size=config.n_way)
        feature_model.output = paddle.nn.Flatten()
        head_layer = paddle.nn.Linear(in_features=feature_model.feature_size, out_features=config.n_way,
                                      weight_attr=feature_model.init_weight_attr, bias_attr=feature_model.init_bias_attr)

    return feature_model, head_layer


if __name__ == '__main__':

    config = Config().parse_args(known_only=True)
    config.device = 'gpu'
    config.k_shot = 1

    # config.dataset = 'omniglot'
    config.dataset = 'miniimagenet'
    # config.dataset = 'cifarfs'
    # config.dataset = 'fc100'
    # config.dataset = 'cub'

    config.tracking_uri = os.environ.get('TRACKING_URI', None)
    config.experiment_id = os.environ.get('EXPERIMENT_ID', None)

    # Config: ANIL, Conv backbone, 5 ways, 1 shot on the selected dataset
    train_dataset, valid_dataset, test_dataset = load_datasets(config.dataset)
    feature_model, head_layer = init_models(config)

    criterion = nn.CrossEntropyLoss()
    learner = ANILLearner(
        feature_model=feature_model,
        head_layer=head_layer,
        learning_rate=config.inner_lr,
    )
    scheduler = paddle.optimizer.lr.CosineAnnealingDecay(learning_rate=config.meta_lr, T_max=config.epochs)
    optimizer = Adam(parameters=learner.parameters(), learning_rate=scheduler)
    trainer = Trainer(
        config=config,
        train_dataset=train_dataset,
        dev_dataset=valid_dataset,
        test_dataset=test_dataset,
        learner=learner,
        optimizer=optimizer,
        scheduler=scheduler,
        criterion=criterion
    )
    trainer.train()
62 changes: 62 additions & 0 deletions PaddleFSL/examples/optim/anil_text_classification.py
@@ -0,0 +1,62 @@
"""ANIL text classification example for optimization"""
from __future__ import annotations
import os
import paddle
from paddle import nn
from paddle.optimizer import Adam
import paddlefsl
from paddlefsl.metaopt.anil import ANILLearner
from paddlenlp.transformers.ernie.modeling import ErnieModel
from paddlenlp.transformers.ernie.tokenizer import ErnieTokenizer

from examples.optim.meta_trainer import Config, Trainer


class SequenceClassifier(nn.Layer):
    """Sequence classifier head: dropout followed by a linear projection."""

    def __init__(self, hidden_size: int, output_size: int, dropout: float = 0.1):
        super().__init__()
        self.dropout = nn.Dropout(dropout)
        self.classifier = nn.Linear(hidden_size, output_size)

    def forward(self, embedding):
        """Project the sequence embedding to class logits."""
        embedding = self.dropout(embedding)
        logits = self.classifier(embedding)
        return logits


if __name__ == '__main__':

    config = Config().parse_args(known_only=True)
    config.device = 'gpu'

    train_dataset = paddlefsl.datasets.few_rel.FewRel('train')
    valid_dataset = paddlefsl.datasets.few_rel.FewRel('valid')
    test_dataset = paddlefsl.datasets.few_rel.FewRel('valid')

    config.tracking_uri = os.environ.get('TRACKING_URI', None)
    config.experiment_id = os.environ.get('EXPERIMENT_ID', None)

    tokenizer = ErnieTokenizer.from_pretrained('ernie-1.0')
    feature_model, head_layer = ErnieModel.from_pretrained('ernie-1.0'), SequenceClassifier(hidden_size=768, output_size=config.n_way)

    criterion = nn.CrossEntropyLoss()
    learner = ANILLearner(
        feature_model=feature_model,
        head_layer=head_layer,
        learning_rate=config.inner_lr,
    )
    scheduler = paddle.optimizer.lr.CosineAnnealingDecay(learning_rate=config.meta_lr, T_max=config.epochs)
    optimizer = Adam(parameters=learner.parameters(), learning_rate=scheduler)
    trainer = Trainer(
        config=config,
        train_dataset=train_dataset,
        dev_dataset=valid_dataset,
        test_dataset=test_dataset,
        learner=learner,
        optimizer=optimizer,
        scheduler=scheduler,
        criterion=criterion,
        tokenizer=tokenizer
    )
    trainer.train()
48 changes: 48 additions & 0 deletions PaddleFSL/examples/optim/data_utils.py
@@ -0,0 +1,48 @@
"""Data utilities for meta-optimization algorithms"""
from __future__ import annotations
from typing import Dict, Tuple
import paddlefsl
from paddlefsl.datasets.cv_dataset import CVDataset


def load_datasets(name: str) -> Tuple[CVDataset, CVDataset, CVDataset]:
    """Load a CV dataset triple by name. Only 'omniglot' is currently enabled;
    the other datasets below are commented out.

    Args:
        name (str): the name of the dataset

    Returns:
        Tuple[CVDataset, CVDataset, CVDataset]: train, dev and test datasets
    """
    datasets_map: Dict[str, Tuple[CVDataset, CVDataset, CVDataset]] = {
        "omniglot": (
            paddlefsl.datasets.Omniglot(mode='train', image_size=(28, 28)),
            paddlefsl.datasets.Omniglot(mode='valid', image_size=(28, 28)),
            paddlefsl.datasets.Omniglot(mode='test', image_size=(28, 28))
        ),
        # "miniimagenet": (
        #     paddlefsl.datasets.MiniImageNet(mode='train'),
        #     paddlefsl.datasets.MiniImageNet(mode='valid'),
        #     paddlefsl.datasets.MiniImageNet(mode='test')
        # ),
        # "cifarfs": (
        #     paddlefsl.datasets.CifarFS(mode='train', image_size=(28, 28)),
        #     paddlefsl.datasets.CifarFS(mode='valid', image_size=(28, 28)),
        #     paddlefsl.datasets.CifarFS(mode='test', image_size=(28, 28))
        # ),
        # "fc100": (
        #     paddlefsl.datasets.FC100(mode='train'),
        #     paddlefsl.datasets.FC100(mode='valid'),
        #     paddlefsl.datasets.FC100(mode='test')
        # ),
        # "cub": (
        #     paddlefsl.datasets.CubFS(mode='train'),
        #     paddlefsl.datasets.CubFS(mode='valid'),
        #     paddlefsl.datasets.CubFS(mode='test')
        # )
    }
    if name not in datasets_map:
        names = ", ".join(datasets_map.keys())
        raise ValueError(f"{name!r} is not a valid dataset name; expected one of: {names}")

    return datasets_map[name]
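
One design note on the registry pattern used by `load_datasets`: building the dict eagerly constructs every enabled dataset triple on each call, even though only one is returned. A lazy variant maps names to zero-argument factories instead. The sketch below is hypothetical; `make_splits` is a stand-in for the real dataset constructors, not PaddleFSL API.

```python
# Lazy dataset registry: names map to factories, so only the requested
# triple is actually constructed. make_splits is a stand-in for the
# real dataset constructors (hypothetical, not PaddleFSL API).
from typing import Callable, Dict, Tuple

def make_splits(name: str) -> Tuple[str, str, str]:
    # Stand-in for building (train, valid, test) dataset objects.
    return (f"{name}-train", f"{name}-valid", f"{name}-test")

DATASETS: Dict[str, Callable[[], Tuple[str, str, str]]] = {
    "omniglot": lambda: make_splits("omniglot"),
    "miniimagenet": lambda: make_splits("miniimagenet"),
}

def load_datasets_lazy(name: str) -> Tuple[str, str, str]:
    if name not in DATASETS:
        names = ", ".join(DATASETS)
        raise ValueError(f"{name!r} is not a valid dataset name; expected one of: {names}")
    return DATASETS[name]()  # constructed only on demand

train, valid, test = load_datasets_lazy("omniglot")
```

With heavyweight dataset objects, this keeps lookup and validation cheap and defers I/O to the single triple the caller asked for.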