Using sentence transformers with limited access to internet

I have access to the latest packages, but I cannot access the internet from my Python environment.

The package versions I have are listed below:

huggingface-hub-0.4.0 sacremoses-0.0.47 tokenizers-0.10.3 transformers-4.15.0
sentence-transformers-2.1.0 sentencepiece-0.1.96 torchvision-0.11.2

print (torch.__version__)
1.10.1+cu102

I went to the location and copied all the files into a folder:

os.listdir('multi-qa-mpnet-base-dot-v1_Jan2022/')

['config_sentence_transformers.json',
 'config.json',
 'gitattributes',
 'modules.json',
 'data_config.json',
 'sentence_bert_config.json',
 'README.md',
 'special_tokens_map.json',
 'tokenizer_config.json',
 'train_script.py',
 'vocab.txt',
 'tokenizer.json',
 '1_Pooling',
 '.ipynb_checkpoints',
 '9e1e76b7a067f72e49c7f571cd8e811f7a1567bec49f17e5eaaea899e7bc2c9e']

Then I went to the URL and tried to execute the code listed there, but I get the error below:

model = SentenceTransformer('multi-qa-mpnet-base-dot-v1_Jan2022/')

OSError: Error no file named ['pytorch_model.bin', 'tf_model.h5', 'model.ckpt.index', 'flax_model.msgpack'] found in directory multi-qa-mpnet-base-dot-v1_Jan2022/ or `from_tf` and `from_flax` set to False.

Where can I get those four files ('pytorch_model.bin', 'tf_model.h5', 'model.ckpt.index', 'flax_model.msgpack'), or what else do I need to change? These files are not available at the first URL mentioned above.

Based on what you described, I checked the source code of sentence-transformers on Google Colab. After loading the model there so that its files were downloaded, I checked the download directory and saw pytorch_model.bin in it.

download directory

According to the sentence-transformers source code, flax_model.msgpack, rust_model.ot, and tf_model.h5 are ignored when the library downloads a model.

These are the files that it downloads:

['1_Pooling', 'config_sentence_transformers.json', 'tokenizer.json', 'tokenizer_config.json', 'modules.json', 'sentence_bert_config.json', 'pytorch_model.bin', 'special_tokens_map.json', 'config.json', 'train_script.py', 'data_config.json', 'README.md', '.gitattributes', 'vocab.txt']

Of the four weight files named in the error, the only one you need to load the model is pytorch_model.bin. I tested this by copying the downloaded files to another directory, and it worked. Judging from the directory listing in your question, you haven't downloaded this file, and that is the problem.
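A quick way to confirm this before calling SentenceTransformer is to look for the weight files yourself. This is only a minimal sketch using the standard library; the helper name find_weight_files is made up for illustration, and the file names are copied from the OSError message in the question.

```python
import os

# Weight file names copied from the OSError message in the question.
WEIGHT_FILES = (
    "pytorch_model.bin",
    "tf_model.h5",
    "model.ckpt.index",
    "flax_model.msgpack",
)


def find_weight_files(model_dir):
    """Return the names from WEIGHT_FILES that exist directly in model_dir.

    Hypothetical helper for illustration; it is not part of
    sentence-transformers or transformers.
    """
    present = set(os.listdir(model_dir))
    return [name for name in WEIGHT_FILES if name in present]


# Example call (the path is the folder from the question):
#   find_weight_files('multi-qa-mpnet-base-dot-v1_Jan2022/')
# An empty list means no weight file was copied into the folder.
```

If the result does not contain 'pytorch_model.bin', the folder holds only configuration and tokenizer files, which matches the error above.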

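In short, download the model on a machine that has internet access by loading it with SentenceTransformer, move the downloaded files (including pytorch_model.bin) to a directory your offline environment can read, and initialize the SentenceTransformer class with that directory.

I hope this helps.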
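As a rough sketch of that workflow: the two function names below are hypothetical, and I am assuming the model is published on the hub under the name 'multi-qa-mpnet-base-dot-v1' and that saving with model.save() is an acceptable substitute for copying the cache folder by hand. The first function has to run on a machine with internet access, the second in the offline environment.

```python
def export_model(model_name, target_dir):
    """Run on a machine WITH internet access.

    Downloads the model into the local cache, then writes a
    self-contained copy (including the weights) to target_dir.
    """
    # Imported inside the function so this file can be read without
    # sentence-transformers being installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)  # triggers the download
    model.save(target_dir)                   # writes the model files to disk
    return target_dir


def load_offline(model_dir):
    """Run in the environment WITHOUT internet access.

    model_dir must be the folder produced by export_model and
    copied over to this machine.
    """
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(model_dir)


# Usage (not executed here; names and paths are assumptions):
#   export_model('multi-qa-mpnet-base-dot-v1', 'multi-qa-mpnet-base-dot-v1_Jan2022')
#   ... copy the folder to the offline machine ...
#   model = load_offline('multi-qa-mpnet-base-dot-v1_Jan2022/')
#   embeddings = model.encode(['How do I load a model offline?'])
```

After copying, the target folder should contain pytorch_model.bin alongside the JSON and tokenizer files you already have.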
