How to use PretrainedTransformerTokenizer and PretrainedTransformerIndexer without the PretrainedTransformerEmbedder #5353
gabeorlanski asked this question in Q&A (Unanswered)
I am using the PretrainedTransformerTokenizer to tokenize the target sequences in a composed seq2seq model. I use the PretrainedTransformerEmbedder as the encoder and the AutoRegressiveDecoder for decoding. In the decoder, I would like to keep the target embeddings as basic text embeddings rather than use the PretrainedTransformerEmbedder, but I also want to use the PretrainedTransformerIndexer so that the targets get access to the full vocabulary of the pretrained model.

From my attempts so far, it seems that the PretrainedTransformerIndexer does not populate the vocabulary namespace until after the model has been created. For the plain embeddings, this means one of their dimensions is always 2, because the namespace is still empty at construction time. Is there an officially supported way to address this?
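Below is a minimal sketch (not from the original post) of the symptom described above, assuming AllenNLP 2.x, bert-base-uncased, and an illustrative target_tokens namespace: because PretrainedTransformerIndexer contributes nothing when the Vocabulary is built from instances, the namespace holds only the default padding/OOV entries, so an Embedding sized from it ends up with num_embeddings == 2.

```python
# Illustrative sketch of the issue above; the model name and the
# "target_tokens" namespace are assumptions, not the original configuration.
from allennlp.data import Instance, Vocabulary
from allennlp.data.fields import TextField
from allennlp.data.token_indexers import PretrainedTransformerIndexer
from allennlp.data.tokenizers import PretrainedTransformerTokenizer
from allennlp.modules.token_embedders import Embedding

model_name = "bert-base-uncased"
tokenizer = PretrainedTransformerTokenizer(model_name)
indexer = PretrainedTransformerIndexer(model_name, namespace="target_tokens")

tokens = tokenizer.tokenize("a target sequence")
instance = Instance({"target_tokens": TextField(tokens, {"tokens": indexer})})

# PretrainedTransformerIndexer.count_vocab_items is a no-op, so building the
# vocabulary from instances leaves this namespace with only padding/OOV.
vocab = Vocabulary.from_instances([instance])
print(vocab.get_vocab_size("target_tokens"))  # 2

# A plain Embedding sized from that namespace therefore has num_embeddings == 2,
# which is the "dimension is always 2" problem described in the question.
embedding = Embedding(embedding_dim=64, vocab_namespace="target_tokens", vocab=vocab)
print(embedding.weight.shape)  # torch.Size([2, 64])
```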
Replies: 1 comment · 4 replies

Hey @gabeorlanski, you could call …
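Whatever call the reply above goes on to suggest, one mechanism that exists in AllenNLP 2.x for pre-populating a namespace with a transformer's vocabulary before the model is constructed is Vocabulary.from_pretrained_transformer. The sketch below is a hedged illustration only, not the replier's answer, and the namespace name is again an assumption; the config-file equivalent is the vocabulary type registered as "from_pretrained_transformer".

```python
# Hedged illustration, not necessarily what the reply suggests: build a
# Vocabulary whose (assumed) "target_tokens" namespace already contains the
# pretrained model's wordpiece vocabulary, before any model is constructed.
from allennlp.data import Vocabulary

vocab = Vocabulary.from_pretrained_transformer(
    "bert-base-uncased", namespace="target_tokens"
)
print(vocab.get_vocab_size("target_tokens"))  # 30522 for bert-base-uncased
```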