
Please pin transformers to <3.0.0 as the new installs of emBERT are broken #15

Open
dlazesz opened this issue Oct 7, 2020 · 7 comments


@dlazesz

dlazesz commented Oct 7, 2020

Due to API breakage in the transformers package, you should either pin the package version or update emBERT to support the newer transformers API.

Personally, I recommend the first option.
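A minimal sketch of the first option, pinning in setup.py (the lower bound and the other entries are only an illustration; the `<3.0.0` upper bound is the point):

```python
# setup.py sketch -- illustrative only: the exact lower bound and the rest of
# the dependency list are assumptions, the '<3.0.0' upper bound is what matters.
from setuptools import setup, find_packages

setup(
    name='embert',
    packages=find_packages(),
    install_requires=[
        'transformers>=2.9.0,<3.0.0',
        'torch',
    ],
)
```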

Thank You!

@DavidNemeskey
Owner

I am in the process of updating it to the latest version anyway (AutoModel and friends look very useful), so once that's done, everything should work fine. From that point onward I will also make use of releases, so everything should be way more stable in a few weeks.
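For reference, this is roughly the API I mean (the checkpoint name and label count below are placeholders, not necessarily what emBERT will end up using):

```python
# Sketch of the Auto* classes in recent transformers versions; the checkpoint
# name and num_labels here are placeholders.
from transformers import AutoConfig, AutoTokenizer, AutoModelForTokenClassification

model_name = 'bert-base-multilingual-cased'
config = AutoConfig.from_pretrained(model_name, num_labels=8)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name, config=config)
```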

@dlazesz
Author

dlazesz commented Nov 13, 2020

I saw your recent commit on the dependencies.

Meanwhile, I have created a workaround in emtsv based on my findings.

Could you review and pin all the necessary packages (including the transitive dependencies) to a working state, in order to resolve this issue and allow us to remove the workaround?

We are about to create a new release of emtsv next week, and we would be glad to have this issue resolved in that release.

@DavidNemeskey
Owner

@dlazesz What breaks for you? I have transformers 3.4 and it works for me. I still pinned it to below 3.5, but I see no reason why I should go below 3.0.

@dlazesz
Author

dlazesz commented Nov 19, 2020

@DavidNemeskey

I am trying to use emBERT with emtsv on Ubuntu 18.04 in the following configuration:

  1. clone emtsv and install system package requirements (hfst, libhunspell-dev, etc.)
  2. edit requirements.txt: delete or comment out lines 3-11 to disable the workaround
  3. cd embert && git pull origin master
  4. python3 -m venv venv
  5. ./venv/bin/pip install wheel cython numpy
  6. ./venv/bin/pip install -e embert/ (Note: these packages are pinned to different versions than in emBERT's requirements.txt)

Now `echo "Az alma piros volt." | ./venv/bin/python main.py tok,bert-np` should work, but it yields:

Szegmentálási hiba (core készült)

(That is, "Segmentation fault (core dumped)".)

For installed packages see requirements_setuppy.txt attached.

After this I tried uninstalling the packages installed by setup.py and installing the ones in requirements.txt:

Effectively meaning: `./venv/bin/pip install tokenizers==0.9.3 transformers==3.5.1`

This yields:

Loading BERT szeged_maxnp_bioes model...Traceback (most recent call last):
  File ".../emtsv/embert/embert/embert.py", line 64, in _load_model
    tokenizer, self.model = self._load_model_from_disk(str(model_dir))
  File ".../emtsv/embert/embert/embert.py", line 89, in _load_model_from_disk
    model = TokenClassifier.from_pretrained(model_dir)
  File ".../emtsv/venv/lib/python3.6/site-packages/transformers/modeling_utils.py", line 1072, in from_pretrained
    model.__class__.__name__, "\n\t".join(error_msgs)
RuntimeError: Error(s) in loading state_dict for TokenClassifier:
	size mismatch for classifier.weight: copying a param with shape torch.Size([8, 768]) from checkpoint, the shape in current model is torch.Size([2, 768]).
	size mismatch for classifier.bias: copying a param with shape torch.Size([8]) from checkpoint, the shape in current model is torch.Size([2]).

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main.py", line 25, in <module>
    output_iterator.writelines(build_pipeline(input_data, used_tools, tools, presets, conll_comments))
  File ".../emtsv/venv/lib/python3.6/site-packages/xtsv/pipeline.py", line 27, in build_pipeline
    current_initialised_tools = lazy_init_tools(used_tools, available_tools, presets, singleton_store)
  File ".../emtsv/venv/lib/python3.6/site-packages/xtsv/pipeline.py", line 144, in lazy_init_tools
    inited_prog = prog_imp(*prog_args, **prog_kwargs)  # Inint programs...
  File ".../emtsv/embert/embert/embert.py", line 35, in __init__
    self._load_model()
  File ".../emtsv/embert/embert/embert.py", line 77, in _load_model
    raise ValueError(f'Could not load model {self.config["model"]}: {e}')
ValueError: Could not load model szeged_maxnp_bioes: Error(s) in loading state_dict for TokenClassifier:
	size mismatch for classifier.weight: copying a param with shape torch.Size([8, 768]) from checkpoint, the shape in current model is torch.Size([2, 768]).
	size mismatch for classifier.bias: copying a param with shape torch.Size([8]) from checkpoint, the shape in current model is torch.Size([2]).

For installed packages see requirements_req.txt attached.

Then I tried the emtsv workaround versions of the packages, uncommenting the previously commented lines in emtsv's requirements.txt.

This yields:

Traceback (most recent call last):
  File "main.py", line 25, in <module>
    output_iterator.writelines(build_pipeline(input_data, used_tools, tools, presets, conll_comments))
  File ".../emtsv/venv/lib/python3.6/site-packages/xtsv/pipeline.py", line 27, in build_pipeline
    current_initialised_tools = lazy_init_tools(used_tools, available_tools, presets, singleton_store)
  File ".../emtsv/venv/lib/python3.6/site-packages/xtsv/pipeline.py", line 110, in lazy_init_tools
    importlib.import_module(module), prog   # Silently import everything for the JAVA CLASSPATH...
  File "/usr/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 941, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File ".../emtsv/embert/embert/__init__.py", line 4, in <module>
    from .embert import EmBERT
  File ".../emtsv/embert/embert/embert.py", line 12, in <module>
    from transformers import BertTokenizer
  File ".../emtsv/venv/lib/python3.6/site-packages/transformers/__init__.py", line 351, in <module>
    from .trainer import Trainer, set_seed, torch_distributed_zero_first, EvalPrediction
  File ".../emtsv/venv/lib/python3.6/site-packages/transformers/trainer.py", line 65, in <module>
    wandb.ensure_configured()
AttributeError: module 'wandb' has no attribute 'ensure_configured'

For installed packages see requirements_emtsv_req.txt attached.

Finally, I installed wandb by issuing `./venv/bin/pip install wandb`.

The following output is produced:

wandb: WARNING W&B installed but not logged in.  Run `wandb login` or set the WANDB_API_KEY env variable.
Loading BERT szeged_maxnp_bioes model....../emtsv/venv/lib/python3.6/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx (Triggered internally at  /pytorch/c10/cuda/CUDAFunctions.cpp:100.)
  return torch._C._cuda_getDeviceCount() > 0
done
.../emtsv/embert/embert/viterbi.py:29: RuntimeWarning: divide by zero encountered in log
  self.init = fn(init, dtype=float)
.../emtsv/embert/embert/viterbi.py:31: RuntimeWarning: divide by zero encountered in log
  self.trans = fn(trans, dtype=float).T
form	wsafter	NP-BIO
returning [['Az', '" "', 'B-NP'], ['alma', '" "', 'E-NP'], ['piros', '" "', '1-NP'], ['volt', '""', 'O'], ['.', '"\\n"', 'O']]
Az	" "	B-NP
alma	" "	E-NP
piros	" "	1-NP
volt	""	O
.	"\n"	O
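(The divide-by-zero warnings seem to come from taking log(0) for transitions that are impossible in the tag set; numpy turns those into -inf, which should be harmless for Viterbi decoding. If they count as noise, something like the sketch below could silence them locally; I have not checked viterbi.py in detail, so the variable name is made up.)

```python
import numpy as np

init_probs = np.array([0.6, 0.4, 0.0])  # example distribution with a zero entry

# np.log(0) emits "divide by zero encountered in log"; the resulting -inf is
# fine for Viterbi decoding, so the warning can be suppressed for this call only.
with np.errstate(divide='ignore'):
    log_init = np.log(init_probs)
```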

From this state, if I try to increase the version of transformers, it crashes in various ways, from a Java SIGSEGV to dimension errors.

Could you also test your working setup with emtsv? Do you use any features from transformers introduced after 2.10.0?

Thank you!

@dlazesz
Author

dlazesz commented Nov 20, 2020

@DavidNemeskey

I have tested the whole thing on Ubuntu 20.04 with the same results.

Meanwhile, I have found a workaround to lift the requirement for wandb. Setting os.environ['WANDB_DISABLED'] = "true" allows emBERT to run without wandb installed. I think this should be the default behaviour when used with emtsv.

Could this be implemented inside emBERT, or should it be set externally, before the imports?
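Roughly what I have in mind, as a minimal sketch (assuming it would go near the top of embert.py, before anything imports transformers):

```python
import os

# The workaround described above: disable the wandb integration. It has to be
# set before transformers is imported anywhere in the process.
os.environ.setdefault('WANDB_DISABLED', 'true')

from transformers import BertTokenizer  # noqa: E402  (deliberately imported after the env var)
```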

Do you have any clue about the error below?

  File ".../emtsv/embert/embert/embert.py", line 35, in __init__
    self._load_model()
  File ".../emtsv/embert/embert/embert.py", line 77, in _load_model
    raise ValueError(f'Could not load model {self.config["model"]}: {e}')
ValueError: Could not load model szeged_maxnp_bioes: Error(s) in loading state_dict for TokenClassifier:
	size mismatch for classifier.weight: copying a param with shape torch.Size([8, 768]) from checkpoint, the shape in current model is torch.Size([2, 768]).
	size mismatch for classifier.bias: copying a param with shape torch.Size([8]) from checkpoint, the shape in current model is torch.Size([2]).

The most current versions of the transformers and torch packages yield these errors, so fixing them would also solve the pinning issue.
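My guess, completely unverified, is that TokenClassifier is re-created with the default num_labels=2 instead of the 8 labels stored with the checkpoint, so something along these lines might avoid the size mismatch (assuming TokenClassifier inherits from_pretrained from PreTrainedModel; the path is a placeholder):

```python
from transformers import AutoConfig

# TokenClassifier and model_dir are the names used in embert/embert.py's
# _load_model_from_disk(); the idea is to build the classifier head with the
# checkpoint's label count instead of the default num_labels=2.
model_dir = 'path/to/szeged_maxnp_bioes'        # placeholder path
config = AutoConfig.from_pretrained(model_dir)  # read the saved config
config.num_labels = 8                           # assumption: 8 BIOES NP labels
model = TokenClassifier.from_pretrained(model_dir, config=config)
```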

@DavidNemeskey
Owner

I will look into it after the MSZNY deadline. I am not sure I need transformers > 3.0. Also, these errors seem a bit strange. For one, I don't even have wandb installed, why do you? Do you need it for anything?

I pinned transformers in requirements.txt as well.

@dlazesz
Author

dlazesz commented Nov 20, 2020

> I will look into it after the MSZNY deadline.

OK. MSZNY first.

> I am not sure I need transformers > 3.0. Also, these errors seem a bit strange. For one, I don't even have wandb installed, why do you? Do you need it for anything?

I do not use wandb. I actually do not even know how this stuff works, but somehow it was installed in the system Python environment and was causing problems in the virtualenv. Very strange. The funny thing is that emBERT worked in the Docker version of emtsv without even asking for wandb, so you are right: it should actually work without it, even when it is installed.

> I pinned transformers in requirements.txt as well.

Thank you! And thank you in advance for the further investigation!
