AutoML Text Classification
Automated Machine Learning, or AutoML, is a process that automates the repetitive and time-consuming tasks involved in developing machine learning models. This helps data scientists, analysts, and developers to create models more efficiently and with higher quality, resulting in increased productivity and scalability.
AutoML Text Classification enables you to classify or categorize texts into predefined groups. Your dataset should be a labeled set of texts with their relevant tags that categorize each piece of text into a predefined group.
With this functionality, you can:
- Directly use datasets coming from Azure Machine Learning data labeling.
- Utilize labeled data to create NLP models without any training code.
- Enhance model performance by selecting the appropriate algorithm from a large selection of models and fine-tuning its hyperparameters, or let AutoML find the best model for you.
- Download the resulting model or deploy it as an endpoint in Azure Machine Learning.
- Scale the operationalization process with the help of Azure Machine Learning's MLOps and ML Pipelines capabilities.
See How to train NLP models for more information.
To create NLP models, it is necessary to provide labeled text data as input for model training. For text classification, the dataset can contain several text columns and exactly one label column.
Please see documentation for data preparation requirements.
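As a rough illustration of the dataset shape described above (one or more text columns plus exactly one label column), the following sketch checks a CSV before it is submitted for training. The function name `validate_columns`, the column names, and the sample rows are hypothetical and are not part of AutoML; the actual data preparation requirements are in the linked documentation.

```python
import csv
import io


def validate_columns(csv_text, label_column):
    """Check that a CSV has exactly one label column and at least one text column.

    Returns the names of the text (non-label) columns.
    This is an illustrative helper, not an AutoML API.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    if header.count(label_column) != 1:
        raise ValueError(f"expected exactly one {label_column!r} column, got {header}")
    text_columns = [name for name in header if name != label_column]
    if not text_columns:
        raise ValueError("expected at least one text column")
    # Every data row needs a non-empty label to be usable for training.
    for row_number, row in enumerate(reader, start=1):
        if not (row[label_column] or "").strip():
            raise ValueError(f"data row {row_number} has no label")
    return text_columns


# Hypothetical sample data with two text columns and one label column.
sample = (
    "title,body,label\n"
    "Great food,Loved the service,positive\n"
    "Slow delivery,Arrived cold,negative\n"
)
print(validate_columns(sample, "label"))  # ['title', 'body']
```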
Language selection currently defaults to English, but Automated ML supports 104 languages by leveraging language-specific and multilingual pre-trained text DNN models. See Language setting for documentation.
You can initiate individual trials or perform manual sweeps, which explore multiple hyperparameter values near the more promising models and hyperparameter configurations.
For more information, see Model sweeping and hyperparameter tuning.
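To make the idea of a manual sweep concrete, here is a minimal sketch that enumerates every combination in a small, user-defined search space. It is plain grid enumeration in standard-library Python, not AutoML's own search procedure; the model names and learning rates are placeholder assumptions chosen only for illustration.

```python
import itertools

# Hypothetical search space; the names and values are placeholders.
search_space = {
    "model_name": ["bert-base-cased", "roberta-base"],
    "learning_rate": [1e-5, 5e-5],
}


def enumerate_trials(space):
    """Yield one configuration dict per combination of hyperparameter values."""
    keys = sorted(space)
    for values in itertools.product(*(space[key] for key in keys)):
        yield dict(zip(keys, values))


trials = list(enumerate_trials(search_space))
print(len(trials))  # 4
```

Each yielded dictionary corresponds to one trial configuration that could be submitted individually.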
License: apache-2.0
Task | Dataset | Python sample (Notebook) | CLI with YAML |
---|---|---|---|
Multiclass Text Classification | Yelp review | automl-nlp-multiclass-sentiment-mlflow.ipynb | cli-automl-text-classification-newsgroup.yml |
Multilabel Text Classification | arXiv paper abstract | automl-nlp-multilabel-paper-cat.ipynb | cli-automl-text-classification-multilabel-paper-cat.yml |
Sample input
{
"input_data": {
"input_string": ["Today was an amazing day!", "It was an unfortunate series of events."]
}
}
Sample output
[
{
"0": "Fake"
},
{
"0": "Fake"
}
]
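The request and response shapes shown above can be produced and consumed with a few lines of standard-library Python. This is a sketch under the assumption that a deployed endpoint accepts and returns exactly the JSON shown here; the helper names `build_payload` and `extract_labels` are hypothetical, and the `"0"` key is taken from the sample response rather than from a documented schema.

```python
import json


def build_payload(texts):
    """Wrap a list of strings in the request shape shown in the sample input."""
    return json.dumps({"input_data": {"input_string": list(texts)}})


def extract_labels(response_text):
    """Flatten a response like [{"0": "Fake"}, ...] into a list of labels.

    Assumes each item carries its prediction under the "0" key, as in the sample.
    """
    return [item["0"] for item in json.loads(response_text)]


payload = build_payload(
    ["Today was an amazing day!", "It was an unfortunate series of events."]
)
sample_response = '[{"0": "Fake"}, {"0": "Fake"}]'
print(extract_labels(sample_response))  # ['Fake', 'Fake']
```

Sending the payload to an endpoint is left out on purpose, since the scoring URL and authentication details depend on the deployment.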
Version: 2
SharedComputeCapacityEnabled
license : apache-2.0
task : text-classification
finetune_compute_allow_list : ['Standard_NC4as_T4_v3', 'Standard_NC6s_v3', 'Standard_NC8as_T4_v3', 'Standard_NC12s_v3', 'Standard_NC16as_T4_v3', 'Standard_NC24s_v3', 'Standard_NC64as_T4_v3', 'Standard_NC96ads_A100_v4', 'Standard_ND96asr_v4', 'Standard_ND96amsr_A100_v4', 'Standard_ND40rs_v2']
inference_compute_allow_list : ['Standard_D4a_v4', 'Standard_D4as_v4', 'Standard_DS4_v2', 'Standard_D8a_v4', 'Standard_D8as_v4', 'Standard_DS5_v2', 'Standard_D16a_v4', 'Standard_D16as_v4', 'Standard_D32a_v4', 'Standard_D32as_v4', 'Standard_D48a_v4', 'Standard_D48as_v4', 'Standard_D64a_v4', 'Standard_D64as_v4', 'Standard_D96a_v4', 'Standard_D96as_v4', 'Standard_FX4mds', 'Standard_F8s_v2', 'Standard_FX12mds', 'Standard_F16s_v2', 'Standard_F32s_v2', 'Standard_F48s_v2', 'Standard_F64s_v2', 'Standard_F72s_v2', 'Standard_FX24mds', 'Standard_FX36mds', 'Standard_FX48mds', 'Standard_E4s_v3', 'Standard_E8s_v3', 'Standard_E16s_v3', 'Standard_E32s_v3', 'Standard_E48s_v3', 'Standard_E64s_v3', 'Standard_NC4as_T4_v3', 'Standard_NC6s_v3', 'Standard_NC8as_T4_v3', 'Standard_NC12s_v3', 'Standard_NC16as_T4_v3', 'Standard_NC24s_v3', 'Standard_NC64as_T4_v3', 'Standard_NC24ads_A100_v4', 'Standard_NC48ads_A100_v4', 'Standard_NC96ads_A100_v4', 'Standard_ND96asr_v4', 'Standard_ND96amsr_A100_v4', 'Standard_ND40rs_v2']
View in Studio: https://ml.azure.com/registries/azureml/models/AutoML-Text-Classification/version/2
License: apache-2.0
SharedComputeCapacityEnabled: True
finetuning-tasks: token-classification
finetune-min-sku-spec: 4|1|28|176
finetune-recommended-sku: Standard_NC4as_T4_v3, Standard_NC6s_v3, Standard_NC8as_T4_v3, Standard_NC12s_v3, Standard_NC16as_T4_v3, Standard_NC24s_v3, Standard_NC64as_T4_v3, Standard_NC96ads_A100_v4, Standard_ND96asr_v4, Standard_ND96amsr_A100_v4, Standard_ND40rs_v2
inference-min-sku-spec: 4|0|16|32
inference-recommended-sku: Standard_D4a_v4, Standard_D4as_v4, Standard_DS4_v2, Standard_D8a_v4, Standard_D8as_v4, Standard_DS5_v2, Standard_D16a_v4, Standard_D16as_v4, Standard_D32a_v4, Standard_D32as_v4, Standard_D48a_v4, Standard_D48as_v4, Standard_D64a_v4, Standard_D64as_v4, Standard_D96a_v4, Standard_D96as_v4, Standard_FX4mds, Standard_F8s_v2, Standard_FX12mds, Standard_F16s_v2, Standard_F32s_v2, Standard_F48s_v2, Standard_F64s_v2, Standard_F72s_v2, Standard_FX24mds, Standard_FX36mds, Standard_FX48mds, Standard_E4s_v3, Standard_E8s_v3, Standard_E16s_v3, Standard_E32s_v3, Standard_E48s_v3, Standard_E64s_v3, Standard_NC4as_T4_v3, Standard_NC6s_v3, Standard_NC8as_T4_v3, Standard_NC12s_v3, Standard_NC16as_T4_v3, Standard_NC24s_v3, Standard_NC64as_T4_v3, Standard_NC24ads_A100_v4, Standard_NC48ads_A100_v4, Standard_NC96ads_A100_v4, Standard_ND96asr_v4, Standard_ND96amsr_A100_v4, Standard_ND40rs_v2
languages: en