github-actions[bot] edited this page Aug 2, 2024 · 10 revisions

OSS Distillation Generate Data

oss_distillation_generate_data

Overview

Component to generate data from a teacher model endpoint.

Version: 0.0.2

View in Studio: https://ml.azure.com/registries/azureml/components/oss_distillation_generate_data/version/0.0.2

Inputs

| Name | Description | Type | Default | Optional | Enum |
| --- | --- | --- | --- | --- | --- |
| train_file_path | Path to the registered training data asset. The supported data formats are jsonl, json, csv, tsv and parquet. | uri_file | | | |
| validation_file_path | Path to the registered validation data asset. The supported data formats are jsonl, json, csv, tsv and parquet. | uri_file | | True | |
| teacher_model_endpoint_name | Teacher model endpoint name | string | | True | |
| teacher_model_endpoint_url | Teacher model endpoint URL | string | | True | |
| teacher_model_endpoint_key | Teacher model endpoint key | string | | True | |
| teacher_model_max_new_tokens | Teacher model max_new_tokens inference parameter | integer | 128 | | |
| teacher_model_temperature | Teacher model temperature inference parameter | number | 0.2 | | |
| teacher_model_top_p | Teacher model top_p inference parameter | number | 0.1 | | |
| teacher_model_frequency_penalty | Teacher model frequency penalty inference parameter | number | 0.0 | | |
| teacher_model_presence_penalty | Teacher model presence penalty inference parameter | number | 0.0 | | |
| teacher_model_stop | Teacher model stop inference parameter | string | | True | |
| request_batch_size | Number of data records sent to the teacher model endpoint in one batch | integer | 10 | | |
| min_endpoint_success_ratio | Minimum value of (successful_requests / total_requests) required for inference to be classified as successful. If (successful_requests / total_requests) < min_endpoint_success_ratio, the experiment is marked as failed. Defaults to 0.7 (0 means all requests are allowed to fail; 1 means no request may fail). | number | 0.7 | | |
| enable_chain_of_thought | Enable chain of thought for data generation | string | true | | |
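
The interaction between request_batch_size and min_endpoint_success_ratio can be illustrated with a minimal sketch. This is not the component's actual implementation: the function names `batched` and `meets_success_ratio` are hypothetical, and treating an empty request set as successful is an assumption. Only the comparison rule (fail when successful_requests / total_requests < min_endpoint_success_ratio) and the default values come from the table above.

```python
from typing import Iterator, List, Sequence


def batched(records: Sequence[dict], batch_size: int = 10) -> Iterator[List[dict]]:
    """Yield consecutive groups of at most `batch_size` records.

    Hypothetical helper mirroring the request_batch_size input (default 10).
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(records), batch_size):
        yield list(records[start:start + batch_size])


def meets_success_ratio(successful: int, total: int, min_ratio: float = 0.7) -> bool:
    """Return True when successful / total is not below `min_ratio`.

    Hypothetical helper mirroring min_endpoint_success_ratio (default 0.7).
    """
    if total == 0:
        # Assumption: with no requests sent, there is nothing to fail.
        return True
    return successful / total >= min_ratio


if __name__ == "__main__":
    records = [{"id": i} for i in range(25)]
    print([len(batch) for batch in batched(records, batch_size=10)])
    print(meets_success_ratio(successful=18, total=25))
```

With min_ratio set to 0 every run passes regardless of failures, and with min_ratio set to 1 a single failed request is enough to fail the run, which matches the description of the two extremes in the table.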

Outputs

| Name | Description | Type |
| --- | --- | --- |
| generated_train_file_path | Generated train data | uri_file |
| generated_validation_file_path | Generated validation data | uri_file |

Environment

azureml://registries/azureml/environments/acft-hf-nlp-gpu/versions/63
