Commit

pipeline
delete _internal_config_dict_converted

convert init_subclass to wrapper

trainer_configmixin

sha256 and pretrained. fix wrapper

download from obs

remove test code

image_classification_trainer

append sys

add dict

mindspore dataset

fix bug findFromPreConfig

fix path bug

fix path bug

pathlib

class check

add document

checkfiles

autopipeline

documentation

doc
kong13661 committed Oct 31, 2023
1 parent e9fb1ca commit f55218b
Showing 7 changed files with 1,726 additions and 0 deletions.
196 changes: 196 additions & 0 deletions docs/en/source/pipeline/overview.md
@@ -0,0 +1,196 @@
# Pipeline

## Introduction

To enable fast downloading of pre-trained models and to perform model training and inference with minimal code, tinyms provides a pipeline API. With a pipeline, a few lines of code are enough to download a model from the cloud and run inference locally. A few more lines are enough to train or fine-tune the model.

## Examples

### Load a pretrained model

``` python
>>> from tinyms.pipeline import AutoModelPipeline
>>> model = AutoModelPipeline.from_pretrained("model_cache_path", "model_repo")
>>> pred = model(your_input)
```

### Load a trainer

```python
>>> from tinyms.pipeline import AutoTrainerPipeline
>>> trainer = AutoTrainerPipeline.from_pretrained("trainer_cache_path", "trainer_repo")
>>> trainer.init_model(model)
>>> trainer.train()
```

You can also pass arguments such as `epoch` to the `train` method. Arguments passed this way override the default values.

After training, you can save your model by invoking the `save_pretrained` method.

```python
>>> model.save_pretrained("path_to_save")
```

This saves the config and the checkpoint to `path_to_save`.




## For developers

## Details of the config

### Folder structure

The folder structure that `from_pretrained` loads is shown below.


    model_cache_path
    └── demo_model
        ├── model
        │   ├── config.json
        │   ├── filelist.txt
        │   └── weight.ckpt
        ├── model_code
        │   ├── filelist.txt
        │   └── networks_test.py
        └── trainer
            ├── config.json
            ├── filelist.txt
            └── train_config
                └── config.json

`model_cache_path` is the path where the downloaded repo is saved.

`demo_model` is the name of the downloaded repo.

`model` is the folder that holds the model config and checkpoint.

`model_code` is the folder that holds the extra code. If there is no extra code, this folder does not exist.

`trainer` is the folder that holds the config for training the model.

The config folders can be generated by invoking `save_pretrained`.

### Structure of config.json

The block below is an example of the `config.json` in the `trainer` folder.

``` json
{
    "__module__": "tinyms.pipeline.image_classification_trainer.Trainer",
    "__version__": "0.3.2",
    "build_config": null,
    "eval_config": null,
    "fit_config": null,
    "loss": {
        "loss": "SoftmaxCrossEntropyWithLogits",
        "params": {
            "sparse": true
        }
    },
    "metrics": [
        "accuracy"
    ],
    "optim": {
        "optimizer": "Momentum",
        "params": {
            "learning_rate": 0.1,
            "momentum": 0.9
        }
    },
    "predict_config": null,
    "train_config": {
        "__module__": "tinyms.pipeline.image_classification_trainer.TrainConfig",
        "__subfolder__": null
    }
}
```

`save_pretrained` records the `__module__` that is needed and the tinyms version. The other keys, such as `loss`, are used to instantiate `tinyms.pipeline.image_classification_trainer.Trainer`. If a dictionary contains the key `__subfolder__`, the details of that argument are saved into a subfolder named after the dictionary's key.
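
To make the role of `__module__` concrete, here is a minimal sketch of how a dotted path could be resolved to a class and the remaining keys passed as constructor arguments. It is not the tinyms implementation: `resolve_dotted_path` and `instantiate_from_config` are hypothetical helpers, and a standard-library class stands in for the trainer so that the example runs without tinyms.

```python
import importlib


def resolve_dotted_path(dotted):
    """Import 'package.module.ClassName' and return the named attribute."""
    module_name, _, attr = dotted.rpartition(".")
    return getattr(importlib.import_module(module_name), attr)


def instantiate_from_config(config):
    """Build an object from a config dict shaped like the JSON example.

    Assumption (not confirmed by tinyms): keys wrapped in double
    underscores are metadata, and every other key is a keyword argument
    of the target class.
    """
    cls = resolve_dotted_path(config["__module__"])
    kwargs = {
        key: value
        for key, value in config.items()
        if not (key.startswith("__") and key.endswith("__"))
    }
    return cls(**kwargs)


# fractions.Fraction is only a stand-in for a real trainer class.
demo_config = {
    "__module__": "fractions.Fraction",
    "__version__": "0.3.2",
    "numerator": 1,
    "denominator": 2,
}
half = instantiate_from_config(demo_config)
```

Storing the class as a dotted string keeps the config plain JSON, at the cost of requiring that the module be importable when the config is loaded.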


## Define a new model

To define a new model class, define a class that inherits from `ConfigMixin`.

```python
>>> from typing import Union
>>> from tinyms.pipeline import ConfigMixin, save_config, Ignore, SubFolder
>>> class Model(ConfigMixin):
...     @save_config
...     def __init__(self, a: Ignore = 1, b: SubFolder = 2,
...                  c: Union[Ignore, int] = 3, d: Union[SubFolder, int] = 4):
...         ...
...
>>> model = Model()
>>> model.save_pretrained("config_path")
```

`ConfigMixin` is the base class for pipeline mixins. It provides methods for saving and loading the model config. A class that inherits from it can apply `@save_config` to its `__init__` method to record the config of the class.

If you wrap `__init__` with `@save_config`, arguments annotated with `Ignore` are not saved into the config, and arguments annotated with `SubFolder` are saved into a subfolder.
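
As an illustration of the idea only, the sketch below shows one way a decorator could record constructor arguments while skipping those annotated with a marker type. The names `Ignore` (redefined locally), `record_init_args`, and `recorded_config` are hypothetical and do not reflect how tinyms implements `@save_config`.

```python
import functools
import inspect
from typing import Union, get_args


class Ignore:
    """Hypothetical marker type: do not record this argument."""


def _is_marked(annotation, marker):
    # Matches both `x: Ignore` and `x: Union[Ignore, int]`.
    return annotation is marker or marker in get_args(annotation)


def record_init_args(init):
    """Store the bound constructor arguments on the instance."""
    signature = inspect.signature(init)

    @functools.wraps(init)
    def wrapper(self, *args, **kwargs):
        bound = signature.bind(self, *args, **kwargs)
        bound.apply_defaults()
        self.recorded_config = {
            name: value
            for name, value in bound.arguments.items()
            if name != "self"
            and not _is_marked(signature.parameters[name].annotation, Ignore)
        }
        init(self, *args, **kwargs)

    return wrapper


class Model:
    @record_init_args
    def __init__(self, a: Ignore = 1, b: int = 2, c: Union[Ignore, int] = 3):
        self.a, self.b, self.c = a, b, c


model = Model(b=5)
print(model.recorded_config)
```

Only `b` ends up in `recorded_config`, because `a` and `c` both carry the marker. A `SubFolder`-style marker could be detected with the same check and routed to a separate file.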

Set `__prefix__` to change the name of the folder in which the config and checkpoint are saved.
Set `__weight__` to change the name of the checkpoint file.


tinyms also provides a function, `wrap_config_mixin`, for defining a new model.

This function adds `ConfigMixin` to the base classes of the given class and wraps its `__init__` method with `@save_config`.

```python
>>> class Model:
...     ...
>>>
>>> Model = wrap_config_mixin(Model)
>>> model = Model()
>>> model.save_pretrained("config_path")
```
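
The mixin half of that behaviour can be sketched in plain Python by building a new class whose bases include the mixin. This is a simplified illustration under stated assumptions, not the tinyms source: `ConfigMixinSketch` and `wrap_with_mixin` are hypothetical names, and the sketch does not wrap `__init__`.

```python
class ConfigMixinSketch:
    """Hypothetical stand-in for ConfigMixin."""

    def describe(self):
        return f"{type(self).__name__} with {vars(self)}"


def wrap_with_mixin(cls, mixin=ConfigMixinSketch):
    """Return a subclass of `cls` that also inherits from `mixin`."""
    if issubclass(cls, mixin):
        return cls  # already wrapped, nothing to do
    # type(name, bases, namespace) creates the new class dynamically.
    return type(cls.__name__, (mixin, cls), {})


class Model:
    def __init__(self, width=8):
        self.width = width


Model = wrap_with_mixin(Model)
model = Model(width=16)
print(model.describe())
```

Creating a subclass rather than mutating `cls.__bases__` leaves the original class untouched, which matters when it comes from third-party code.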

### Extra-code

If the model is not defined in tinyms, you need to save the code containing the definition of the model class. The code should be saved to the `model_code` folder in the repo folder. When `from_pretrained` is called, the path of `model_code` is appended to `sys.path`.
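
The effect of appending a folder to `sys.path` can be seen with the standard library alone. The sketch below builds a throwaway `model_code` folder and imports a class from it; the module name `demo_networks_sketch` and the class `DemoNet` are made up for the example and are not part of tinyms.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Create a temporary repo folder containing a model_code directory.
repo_dir = Path(tempfile.mkdtemp()) / "demo_model"
code_dir = repo_dir / "model_code"
code_dir.mkdir(parents=True)
(code_dir / "demo_networks_sketch.py").write_text(
    "class DemoNet:\n"
    "    name = 'demo'\n"
)

# Make the extra code importable by appending its folder to sys.path.
if str(code_dir) not in sys.path:
    sys.path.append(str(code_dir))
importlib.invalidate_caches()

# The module can now be imported by its bare name.
DemoNet = importlib.import_module("demo_networks_sketch").DemoNet
```

Because the folder is appended rather than inserted at the front, a module that is already importable under the same name takes precedence, so extra-code files are safest with distinctive names.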


## Define a new trainer

To define a new trainer class, define a class that inherits from `TrainerConfigMixin`.

`TrainerConfigMixin` is the base class for trainer pipeline mixins. It provides methods for saving and loading the config. A class that inherits from it can apply `@save_config` to its `__init__` method to record the config of the class.

If you wrap `__init__` with `@save_config`, arguments annotated with `Ignore` are not saved into the config, and arguments annotated with `SubFolder` are saved into a subfolder.

For a trainer, you may want to implement methods such as `train`, `eval`, and `predict`. You can use `@set_from_config` to fill their arguments from the config. Arguments annotated with `FromConfig` have the following properties:

1. Arguments that are not in the config are set to their default values.
2. Arguments passed at run time override the values in the config.
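
The two rules amount to a precedence order: run-time value, then config value, then signature default. The sketch below shows one way to express that with `inspect`. It is an assumption-laden illustration rather than the tinyms implementation: `fill_from_config` is a hypothetical decorator, and the config is a plain dict instead of an arguments class.

```python
import functools
import inspect


def fill_from_config(config_attr):
    """Hypothetical decorator: fill missing arguments from self.<config_attr>."""

    def decorator(method):
        signature = inspect.signature(method)

        @functools.wraps(method)
        def wrapper(self, *args, **kwargs):
            bound = signature.bind_partial(self, *args, **kwargs)
            stored = getattr(self, config_attr, None) or {}
            for name, value in stored.items():
                # Run-time values win over stored config values.
                if name in signature.parameters and name not in bound.arguments:
                    bound.arguments[name] = value
            # Anything still missing falls back to the signature default.
            bound.apply_defaults()
            return method(*bound.args, **bound.kwargs)

        return wrapper

    return decorator


class Trainer:
    def __init__(self, train_config=None):
        self.train_config = train_config

    @fill_from_config("train_config")
    def train(self, epoch=1, batch_size=32):
        return {"epoch": epoch, "batch_size": batch_size}


trainer = Trainer(train_config={"epoch": 2})
from_config = trainer.train()   # epoch comes from the config
overridden = trainer.train(4)   # run-time epoch overrides the config
```

In both calls `batch_size` falls back to its default, because neither the config nor the caller supplies it.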

Once you wrap a method with `@set_from_config`, you can use `BaseArgsFromConfig` to generate the arguments class. To do so, subclass `BaseArgsFromConfig` and wrap its `__init__` method with `@copy_signature(Trainer.method)`. The signature of `__init__` should be `__init__(self, *args, **kwargs)`.

The arguments object should be set in the trainer's `__init__` method. Its default name is `{method_name}_config`. You can change the name by passing it to `@set_from_config(name)`.

```python
>>> class Trainer(TrainerConfigMixin):
...     @save_config
...     def __init__(self, train_config=None):
...         self.train_config = train_config
...
...     @set_from_config
...     def train(self, epoch: FromConfig):
...         ...
...
>>> class TrainConfig(BaseArgsFromConfig):
...     @copy_signature(Trainer.train)
...     def __init__(self, *args, **kwargs):
...         super().__init__(*args, **kwargs)
...
>>> # Store epoch=2 as the configured value.
>>> train_config = TrainConfig(2)
>>> trainer = Trainer(train_config=train_config)
>>>
>>> trainer.save_pretrained('model_config')
>>> new_trainer = trainer.from_pretrained('model_config')
>>>
>>> # No argument given: epoch is taken from the saved config.
>>> new_trainer.train()
>>>
>>> # The run-time argument overrides the config value.
>>> new_trainer.train(4)
```

Empty file added tinyms/pipeline/__init__.py
Empty file.
5 changes: 5 additions & 0 deletions tinyms/pipeline/auto_pipeline.py
@@ -0,0 +1,5 @@
from .configmixin import ConfigMixin
from .trainer_configmixin import TrainerConfigMixin

AutoModelPipeline = ConfigMixin
AutoTrainerPipeline = TrainerConfigMixin