Commit

Made the entry name of a preprocessed CNN dataset include the name of the tokenizer, so that preprocessed datasets for GPT-J and Mistral can exist at the same time.
mosalov committed Dec 28, 2023
1 parent a2a9f41 commit 1c77476
Showing 1 changed file with 2 additions and 1 deletion.
3 changes: 2 additions & 1 deletion cnndm_preprocessor/code_axs.py
@@ -69,8 +69,9 @@ def preprocess_files(source_dir,
 def preprocess(source_dir, input_data_type, new_file_extension, file_name, model_name_or_path, dataset_name, tags=None, entry_name=None, __record_entry__=None):
 
     __record_entry__["tags"] = tags or [ "preprocessed" ]
-    entry_name_list = [ dataset_name, "preprocessed" ]
+    entry_name_list = [ dataset_name, model_name_or_path, "preprocessed" ]
     entry_name = "_".join(entry_name_list)
+    entry_name = entry_name.replace("/", "-")
 
     __record_entry__.save( entry_name )
     output_directory = __record_entry__.get_path(file_name)
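
To illustrate the effect of this change, here is a minimal sketch of the new naming logic in isolation. The helper name `build_entry_name`, the dataset name `cnndm`, and the two model identifiers are assumptions chosen for illustration; they do not appear in the commit, and the real `preprocess` function also saves the entry, which is omitted here.

```python
def build_entry_name(dataset_name, model_name_or_path):
    """Hypothetical helper mirroring the two changed lines of the diff."""
    # The tokenizer/model name now sits between the dataset name and the suffix.
    entry_name_list = [dataset_name, model_name_or_path, "preprocessed"]
    entry_name = "_".join(entry_name_list)
    # Model identifiers often contain "/" (e.g. "org/model"), which would be
    # read as a path separator, so it is replaced with "-".
    return entry_name.replace("/", "-")


# Illustrative inputs (assumed, not taken from the repository).
gptj_entry = build_entry_name("cnndm", "EleutherAI/gpt-j-6b")
mistral_entry = build_entry_name("cnndm", "mistralai/Mistral-7B-v0.1")

print(gptj_entry)     # cnndm_EleutherAI-gpt-j-6b_preprocessed
print(mistral_entry)  # cnndm_mistralai-Mistral-7B-v0.1_preprocessed
```

Because the two entry names now differ, preprocessed data for each tokenizer can be stored side by side instead of under one shared name.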