
NLP | GINZA v5: error from spacy.load('ja_ginza_electra')

This post describes an error I ran into while building a new environment with Docker, and how I dealt with it.

Error details

The following error occurred when starting an application that includes GINZA.

Cannot find the requested files in the cached path and outgoing traffic has been disabled. To enable model look-ups and downloads online, set 'local_files_only' to False.
trying to download model from huggingface hub: /tmp/tmp6pxqkj3y/config.json ...
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/transformers/configuration_utils.py", line 568, in get_config_dict
    resolved_config_file = cached_path(
  File "/usr/local/lib/python3.9/site-packages/transformers/file_utils.py", line 1776, in cached_path
    output_path = get_from_cache(
  File "/usr/local/lib/python3.9/site-packages/transformers/file_utils.py", line 1994, in get_from_cache
    raise FileNotFoundError(
FileNotFoundError: Cannot find the requested files in the cached path and outgoing traffic has been disabled. To enable model look-ups and downloads online, set 'local_files_only' to False.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/ginza_transformers/layers/hf_shim_custom.py", line 127, in from_bytes
    transformer = AutoModel.from_pretrained(config._name_or_path, local_files_only=True)
  File "/usr/local/lib/python3.9/site-packages/transformers/models/auto/auto_factory.py", line 418, in from_pretrained
    config, kwargs = AutoConfig.from_pretrained(
  File "/usr/local/lib/python3.9/site-packages/transformers/models/auto/configuration_auto.py", line 582, in from_pretrained
    config_dict, _ = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/transformers/configuration_utils.py", line 593, in get_config_dict
    raise EnvironmentError(msg)
OSError: Can't load config for '/tmp/tmp6pxqkj3y/config.json'. Make sure that:

- '/tmp/tmp6pxqkj3y/config.json' is a correct model identifier listed on 'https://huggingface.co/models'
  (make sure '/tmp/tmp6pxqkj3y/config.json' is not a path to a local directory with something else, in that case)

- or '/tmp/tmp6pxqkj3y/config.json' is the correct path to a directory containing a config.json file



During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.9/site-packages/spacy/__init__.py", line 51, in load
    return util.load_model(
  File "/usr/local/lib/python3.9/site-packages/spacy/util.py", line 420, in load_model
    return load_model_from_package(name, **kwargs)  # type: ignore[arg-type]
  File "/usr/local/lib/python3.9/site-packages/spacy/util.py", line 453, in load_model_from_package
    return cls.load(vocab=vocab, disable=disable, exclude=exclude, config=config)  # type: ignore[attr-defined]
  File "/usr/local/lib/python3.9/site-packages/ja_ginza_electra/__init__.py", line 10, in load
    return load_model_from_init_py(__file__, **overrides)
  File "/usr/local/lib/python3.9/site-packages/spacy/util.py", line 615, in load_model_from_init_py
    return load_model_from_path(
  File "/usr/local/lib/python3.9/site-packages/spacy/util.py", line 489, in load_model_from_path
    return nlp.from_disk(model_path, exclude=exclude, overrides=overrides)
  File "/usr/local/lib/python3.9/site-packages/spacy/language.py", line 2039, in from_disk
    util.from_disk(path, deserializers, exclude)  # type: ignore[arg-type]
  File "/usr/local/lib/python3.9/site-packages/spacy/util.py", line 1300, in from_disk
    reader(path / key)
  File "/usr/local/lib/python3.9/site-packages/spacy/language.py", line 2033, in <lambda>
    deserializers[name] = lambda p, proc=proc: proc.from_disk(  # type: ignore[misc]
  File "/usr/local/lib/python3.9/site-packages/ginza_transformers/pipeline_component.py", line 76, in from_disk
    super().from_disk(path, exclude=exclude)
  File "/usr/local/lib/python3.9/site-packages/spacy_transformers/pipeline_component.py", line 420, in from_disk
    util.from_disk(path, deserialize, exclude)
  File "/usr/local/lib/python3.9/site-packages/spacy/util.py", line 1300, in from_disk
    reader(path / key)
  File "/usr/local/lib/python3.9/site-packages/spacy_transformers/pipeline_component.py", line 394, in load_model
    self.model.from_bytes(mfile.read())
  File "/usr/local/lib/python3.9/site-packages/thinc/model.py", line 593, in from_bytes
    return self.from_dict(msg)
  File "/usr/local/lib/python3.9/site-packages/thinc/model.py", line 631, in from_dict
    node.shims[i].from_bytes(shim_bytes)
  File "/usr/local/lib/python3.9/site-packages/ginza_transformers/layers/hf_shim_custom.py", line 130, in from_bytes
    transformer = AutoModel.from_pretrained(config._name_or_path)
  File "/usr/local/lib/python3.9/site-packages/transformers/models/auto/auto_factory.py", line 418, in from_pretrained
    config, kwargs = AutoConfig.from_pretrained(
  File "/usr/local/lib/python3.9/site-packages/transformers/models/auto/configuration_auto.py", line 582, in from_pretrained
    config_dict, _ = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/transformers/configuration_utils.py", line 552, in get_config_dict
    configuration_file = get_configuration_file(
  File "/usr/local/lib/python3.9/site-packages/transformers/configuration_utils.py", line 843, in get_configuration_file
    all_files = get_list_of_files(
  File "/usr/local/lib/python3.9/site-packages/transformers/file_utils.py", line 2103, in get_list_of_files
    return list_repo_files(path_or_repo, revision=revision, token=token)
  File "/usr/local/lib/python3.9/site-packages/huggingface_hub/hf_api.py", line 884, in list_repo_files
    info = self.model_info(
  File "/usr/local/lib/python3.9/site-packages/huggingface_hub/hf_api.py", line 868, in model_info
    r.raise_for_status()
  File "/usr/local/lib/python3.9/site-packages/requests/models.py", line 960, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://huggingface.co/api/models//tmp/tmp6pxqkj3y/config.json
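Reading the chained tracebacks, the failure happens in two stages: hf_shim_custom.from_bytes in ginza_transformers first tries to load the model from the local cache with local_files_only=True, and when that raises FileNotFoundError it retries without the flag, which goes out to the Hugging Face Hub. Since config._name_or_path at that point is a temp path (/tmp/tmp6pxqkj3y/config.json) rather than a Hub model id, the remote lookup 404s as well. A simplified, hypothetical sketch of that fallback shape (this is not the actual library source, just an illustration of the failure mode):

```python
# Hypothetical sketch of the try-local-then-remote fallback visible in
# the traceback. The fetch_* functions stand in for
# AutoModel.from_pretrained with and without local_files_only=True.

def load_model(name_or_path, fetch_local, fetch_remote):
    try:
        # First attempt: local cache only (local_files_only=True)
        return fetch_local(name_or_path)
    except FileNotFoundError:
        # Fallback: retry against the Hugging Face Hub
        return fetch_remote(name_or_path)

def fake_local(name):
    raise FileNotFoundError("not in cache and outgoing traffic is disabled")

def fake_remote(name):
    # A temp-file path is not a valid Hub model id, so the API returns 404
    raise RuntimeError(f"404 Client Error: https://huggingface.co/api/models/{name}")

try:
    load_model("/tmp/tmp6pxqkj3y/config.json", fake_local, fake_remote)
except RuntimeError as err:
    print(err)
```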

The fix

I'll start with the fix that ultimately worked: pinning spacy-transformers to a specific version in requirements.txt.

ginza
ja-ginza-electra
spacy-transformers==1.1.3
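Since the environment is built with Docker, the pin can go straight into the image build. A minimal Dockerfile sketch, assuming a python:3.9 base image and a requirements.txt like the one above (the base image and paths are illustrative, not taken from the original setup):

```dockerfile
FROM python:3.9

WORKDIR /app
COPY requirements.txt .
# Installing with spacy-transformers pinned to 1.1.3 keeps the
# known-good dependency set reproducible across rebuilds.
RUN pip install --no-cache-dir -r requirements.txt

COPY . .
```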

Investigation

First, I used the stack trace to narrow down where the error occurred and isolated the following call; running it from a Python console reproduced the same error.

import spacy
nlp = spacy.load('ja_ginza_electra')

From there I tried all sorts of things, including checking the user setup inside the Docker container, using venv, and searching the web, but couldn't find the cause. When I happened to compare the environment against one from a few days earlier that I still had around, the spacy-transformers versions differed; pinning it back to the earlier version resolved the error above.
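The comparison against the older environment boils down to diffing two `pip list` outputs. A small sketch of that check (the helper names and sample data are mine, not from the original debugging session):

```python
def parse_pip_list(text):
    """Parse `pip list` output into a {package: version} mapping."""
    versions = {}
    for line in text.strip().splitlines():
        parts = line.split()
        # Skip the "Package Version" header and the "---- ----" separator
        if len(parts) == 2 and parts[0] != "Package" and not parts[0].startswith("-"):
            versions[parts[0]] = parts[1]
    return versions

def diff_versions(a, b):
    """Packages present in both environments whose versions differ."""
    return {p: (a[p], b[p]) for p in a if p in b and a[p] != b[p]}

ok_env = """\
ginza              5.1.0
spacy              3.2.1
spacy-transformers 1.1.3"""

ng_env = """\
ginza              5.1.0
spacy              3.2.1
spacy-transformers 1.1.4"""

print(diff_versions(parse_pip_list(ok_env), parse_pip_list(ng_env)))
# -> {'spacy-transformers': ('1.1.3', '1.1.4')}
```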

OK versions

% pip list|grep ginza
ginza              5.1.0
ginza-transformers 0.4.0
ja-ginza-electra   5.1.0
% pip list|grep spacy
spacy              3.2.1
spacy-alignments   0.8.4
spacy-legacy       3.0.8
spacy-loggers      1.0.1
spacy-transformers 1.1.3

NG (failing) versions

% pip list|grep ginza
ginza              5.1.0
ginza-transformers 0.4.0
ja-ginza-electra   5.1.0
% pip list|grep spacy
spacy              3.2.1
spacy-alignments   0.8.4
spacy-legacy       3.0.8
spacy-loggers      1.0.1
spacy-transformers 1.1.4

Checking the spacy-transformers diff

I looked at the diff (v1.1.3 -> v1.1.4), but aside from a change to the supported transformers versions, nothing looks like it should have an impact.

I also looked at the transformers diff (v4.13.0 -> v4.15.0), but that one was far too large to go through...

Closing thoughts

The error is resolved for now, but I haven't verified the behavior beyond that point, so I plan to check step by step whether simply reinstalling with the pinned version is really enough.

