how to get embedding with finetuned model[encoder-only] #1344

Open
liulizuel opened this issue Jan 20, 2025 · 1 comment

liulizuel commented Jan 20, 2025

I fine-tuned 'BAAI/bge-m3' with the following script:

nohup torchrun --nproc_per_node 8 \
    --master_port 29505 \
    -m FlagEmbedding.finetune.embedder.encoder_only.m3 \
    --model_name_or_path ../BAAI/bge-m3 \
    --cache_dir ../cache/model \
    --train_data ../general_train_data/mini-nq-like-general-train \
    --cache_path ../cache/data \
    --train_group_size 8 \
    --query_max_len 512 \
    --passage_max_len 512 \
    --pad_to_multiple_of 8 \
    --knowledge_distillation False \
    --same_dataset_within_batch True \
    --small_threshold 0 \
    --drop_threshold 0 \
    --output_dir ../test_encoder_only_m3_bge-m3_sd \
    --overwrite_output_dir \
    --learning_rate 1e-5 \
    --fp16 \
    --num_train_epochs 2 \
    --per_device_train_batch_size 2 \
    --dataloader_drop_last True \
    --warmup_ratio 0.1 \
    --gradient_checkpointing \
    --deepspeed ds_stage0.json \
    --logging_steps 1 \
    --save_steps 5000 \
    --negatives_cross_device \
    --temperature 0.02 \
    --sentence_pooling_method cls \
    --normalize_embeddings True \
    --kd_loss_type m3_kd_loss \
    --unified_finetuning True \
    --use_self_distill True \
    --fix_encoder False \
    --self_distill_start_step 0 > finetune.log 2>&1 &

Then I got the saved model files in checkpoint-20000:

ls -lrt
total 1.1G
-rw-r--r-- 1 root root  701 Jan 17 19:03 config.json
-rw-r--r-- 1 root root 1.1G Jan 17 19:04 model.safetensors
-rw-r--r-- 1 root root 1.2K Jan 17 19:04 tokenizer_config.json
-rw-r--r-- 1 root root  964 Jan 17 19:04 special_tokens_map.json
-rw-r--r-- 1 root root 3.0K Jan 17 19:04 sparse_linear.pt
-rw-r--r-- 1 root root 4.9M Jan 17 19:04 sentencepiece.bpe.model
-rw-r--r-- 1 root root 2.1M Jan 17 19:04 colbert_linear.pt
-rw-r--r-- 1 root root 7.0K Jan 17 19:04 training_args.bin
-rw-r--r-- 1 root root  17M Jan 17 19:04 tokenizer.json
drwxrwxrwx 3 root root 4.0K Jan 17 19:04 global_step20000/
-rw-r--r-- 1 root root  22K Jan 17 19:04 rng_state_5.pth
-rw-r--r-- 1 root root  22K Jan 17 19:04 rng_state_0.pth
-rw-r--r-- 1 root root   16 Jan 17 19:04 latest
-rw-r--r-- 1 root root 3.4M Jan 17 19:04 trainer_state.json
-rw-r--r-- 1 root root  22K Jan 17 19:04 rng_state_7.pth
-rw-r--r-- 1 root root  22K Jan 17 19:04 rng_state_6.pth
-rw-r--r-- 1 root root  22K Jan 17 19:04 rng_state_4.pth
-rw-r--r-- 1 root root  22K Jan 17 19:04 rng_state_3.pth
-rw-r--r-- 1 root root  22K Jan 17 19:04 rng_state_2.pth
-rw-r--r-- 1 root root  22K Jan 17 19:04 rng_state_1.pth

The saved files look quite different from the original 'BAAI/bge-m3', and I got many errors when loading them.
I also tried the save_ckpt_for_sentence_transformers method, but got the same errors.

Traceback (most recent call last):
  File "/root/paddlejob/workspace/env_run/liuli/FlagEmbedding/to_sentence_transformer_model.py", line 19, in <module>
    save_ckpt_for_sentence_transformers(ckpt_dir, pooling_mode='cls', normlized=True)
  File "/root/paddlejob/workspace/env_run/liuli/FlagEmbedding/to_sentence_transformer_model.py", line 6, in save_ckpt_for_sentence_transformers
    word_embedding_model = models.Transformer(ckpt_dir)
  File "/root/.local/virtualenvs/xxx/lib/python3.9/site-packages/sentence_transformers/models/Transformer.py", line 78, in __init__
    self._load_model(model_name_or_path, config, cache_dir, backend, **model_args)
  File "/root/.local/virtualenvs/xxx/lib/python3.9/site-packages/sentence_transformers/models/Transformer.py", line 138, in _load_model
    self.auto_model = AutoModel.from_pretrained(
  File "/root/.local/virtualenvs/xxx/lib/python3.9/site-packages/transformers/models/auto/auto_factory.py", line 564, in from_pretrained
    return model_class.from_pretrained(
  File "/root/.local/virtualenvs/xxx/lib/python3.9/site-packages/transformers/modeling_utils.py", line 3735, in from_pretrained
    with safe_open(resolved_archive_file, framework="pt") as f:
OSError: No such device (os error 19)
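
For context, the conversion script is essentially this minimal sketch (only the parts matching the traceback above; it uses the standard sentence_transformers modules API, and ckpt_dir is a placeholder for the checkpoint path):

from sentence_transformers import SentenceTransformer, models

def save_ckpt_for_sentence_transformers(ckpt_dir, pooling_mode='cls', normlized=True):
    # Wrap the Hugging Face checkpoint as a sentence-transformers model:
    # transformer backbone + CLS pooling (+ optional L2 normalization).
    word_embedding_model = models.Transformer(ckpt_dir)
    pooling_model = models.Pooling(
        word_embedding_model.get_word_embedding_dimension(),
        pooling_mode=pooling_mode,
    )
    modules = [word_embedding_model, pooling_model]
    if normlized:
        modules.append(models.Normalize())
    # Write the sentence-transformers config files alongside the checkpoint.
    SentenceTransformer(modules=modules, device='cpu').save(ckpt_dir)

save_ckpt_for_sentence_transformers('./checkpoint-20000', pooling_mode='cls', normlized=True)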

I have no idea how to run inference with the fine-tuned model. Can you help me?


liulizuel commented Jan 20, 2025

I solved this later. During training the machine ran out of local disk space, so the model was saved to a mounted AFS directory; reading the model directly from AFS is not supported. Copying it from the AFS directory onto local disk was enough.

For anyone who runs into this error: copy the model onto your local disk and the problem will be fixed.
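
With the checkpoint copied to local disk, embeddings can be obtained from the fine-tuned encoder-only model via FlagEmbedding's BGEM3FlagModel. A minimal sketch (the path is a placeholder for your local copy of checkpoint-20000):

from FlagEmbedding import BGEM3FlagModel

# Load the fine-tuned checkpoint from local disk (not directly from the AFS mount).
model = BGEM3FlagModel("./checkpoint-20000", use_fp16=True)

sentences = [
    "What is BGE-M3?",
    "BGE-M3 is a multi-functional, multi-lingual embedding model.",
]

# Dense, sparse, and ColBERT outputs are all available because the model was
# trained with --unified_finetuning True; request only what you need.
output = model.encode(
    sentences,
    return_dense=True,
    return_sparse=True,
    return_colbert_vecs=False,
)

print(output["dense_vecs"].shape)    # dense embeddings, (2, 1024) for bge-m3
print(output["lexical_weights"][0])  # token-level sparse weights for the first sentence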
