Using sentence-transformers with limited internet access
I have the latest packages installed, but I cannot access the internet from my Python environment.
The package versions I have are:
huggingface-hub-0.4.0 sacremoses-0.0.47 tokenizers-0.10.3 transformers-4.15.0
sentence-transformers-2.1.0 sentencepiece-0.1.96 torchvision-0.11.2
print(torch.__version__)
1.10.1+cu102
I went to the model's download location and copied all the files into a folder:
os.listdir('multi-qa-mpnet-base-dot-v1_Jan2022/')
['config_sentence_transformers.json',
'config.json',
'gitattributes',
'modules.json',
'data_config.json',
'sentence_bert_config.json',
'README.md',
'special_tokens_map.json',
'tokenizer_config.json',
'train_script.py',
'vocab.txt',
'tokenizer.json',
'1_Pooling',
'.ipynb_checkpoints',
'9e1e76b7a067f72e49c7f571cd8e811f7a1567bec49f17e5eaaea899e7bc2c9e']
Then I went to the URL and tried to execute the code listed there, but I got the error below:
model = SentenceTransformer('multi-qa-mpnet-base-dot-v1_Jan2022/')
OSError: Error no file named ['pytorch_model.bin', 'tf_model.h5', 'model.ckpt.index', 'flax_model.msgpack'] found in directory multi-qa-mpnet-base-dot-v1_Jan2022/ or `from_tf` and `from_flax` set to False.
Where can I get those four files ('pytorch_model.bin', 'tf_model.h5', 'model.ckpt.index', 'flax_model.msgpack'), or what else do I need to change? These files are not available at the first URL mentioned above.
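To see which of the weight files named in the error are actually present, a quick standard-library check like the one below can help. It is a minimal sketch: the helper name `find_weight_files` is hypothetical, and the candidate list is copied from the error message. When loading with PyTorch (the default, with `from_tf` and `from_flax` both False), `pytorch_model.bin` is the file that has to be there.

```python
import os
from typing import List

# Weight file names listed in the OSError above
# (PyTorch, TensorFlow 2, TensorFlow 1 checkpoint index, Flax).
CANDIDATE_WEIGHT_FILES = [
    "pytorch_model.bin",
    "tf_model.h5",
    "model.ckpt.index",
    "flax_model.msgpack",
]


def find_weight_files(model_dir: str) -> List[str]:
    """Return the candidate weight files that exist directly inside model_dir."""
    return [
        name
        for name in CANDIDATE_WEIGHT_FILES
        if os.path.isfile(os.path.join(model_dir, name))
    ]


if __name__ == "__main__":
    found = find_weight_files("multi-qa-mpnet-base-dot-v1_Jan2022/")
    print(found or "no weight file found: pytorch_model.bin is missing")
```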
Based on what you mentioned, I checked the source code of sentence-transformers on Google Colab. After running the model and letting it download its files, I checked the directory and saw pytorch_model.bin there.
According to the sentence-transformers code, flax_model.msgpack, rust_model.ot and tf_model.h5 are ignored when it downloads the model.
These are the files it downloads:
['1_Pooling', 'config_sentence_transformers.json', 'tokenizer.json', 'tokenizer_config.json', 'modules.json', 'sentence_bert_config.json', 'pytorch_model.bin', 'special_tokens_map.json', 'config.json', 'train_script.py', 'data_config.json', 'README.md', '.gitattributes', 'vocab.txt']
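To compare an offline folder against that list, a short standard-library sketch like this one can be used. `missing_files` is a hypothetical helper and `EXPECTED` is simply the list above; not every entry is strictly required to load the model (for example `README.md` or `.gitattributes`), so treat the result as a checklist rather than a hard requirement.

```python
import os
from typing import List

# Files and folders that sentence-transformers downloads for this model (list above).
EXPECTED = [
    "1_Pooling", "config_sentence_transformers.json", "tokenizer.json",
    "tokenizer_config.json", "modules.json", "sentence_bert_config.json",
    "pytorch_model.bin", "special_tokens_map.json", "config.json",
    "train_script.py", "data_config.json", "README.md", ".gitattributes",
    "vocab.txt",
]


def missing_files(model_dir: str) -> List[str]:
    """Return the expected entries (in list order) that are not present in model_dir."""
    present = set(os.listdir(model_dir))
    return [name for name in EXPECTED if name not in present]
```

Run against the folder listed in the question, this reports `pytorch_model.bin` and `.gitattributes` (the folder has the latter under the name `gitattributes`, which does not matter for loading).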
The one file you need that is missing from your folder is pytorch_model.bin. I tested copying the downloaded files to another directory and loading the model from there, and it worked. According to your question, you haven't downloaded this file, so that is the problem.
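Copying the downloaded folder to another directory can be done with `shutil.copytree`. The sketch below is an illustration, not sentence-transformers' own code: `copy_model_dir` is a hypothetical helper that uses `shutil.ignore_patterns` to skip the same three files the download skips (flax_model.msgpack, rust_model.ot, tf_model.h5), so only the PyTorch weights, configs, tokenizer files and the `1_Pooling` folder are copied.

```python
import os
import shutil

# File names that sentence-transformers skips when downloading.
IGNORED_FILES = ("flax_model.msgpack", "rust_model.ot", "tf_model.h5")


def copy_model_dir(src: str, dst: str) -> None:
    """Copy a downloaded model folder (including subfolders) to dst.

    Skips the non-PyTorch weight files at every level. dst must not
    exist yet, because shutil.copytree creates it.
    """
    shutil.copytree(src, dst, ignore=shutil.ignore_patterns(*IGNORED_FILES))
```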
In short: download the model with its usual command on a machine that has internet access, move the files to another directory on your offline machine, and initialize the SentenceTransformer class with that directory.
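A minimal sketch of that workflow, assuming sentence-transformers 2.1.0 is installed on both machines and that the model name is `multi-qa-mpnet-base-dot-v1` (the folder name in the question suggests it, but that is an assumption). Instead of digging through the download cache, it calls `SentenceTransformer.save` on the online machine (for example Colab). The function names `save_for_offline` and `load_offline` are hypothetical; the library import happens inside the functions so the file check in `load_offline` runs before sentence-transformers is touched.

```python
import os


def save_for_offline(model_name: str, out_dir: str) -> None:
    """Run on a machine WITH internet access (e.g. Google Colab)."""
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)  # downloads into the local cache
    # Writes modules.json, the transformer weights (pytorch_model.bin with
    # transformers 4.15), the tokenizer files and the 1_Pooling/ folder.
    model.save(out_dir)


def load_offline(model_dir: str):
    """Run on the offline machine after copying out_dir over."""
    weights = os.path.join(model_dir, "pytorch_model.bin")
    if not os.path.isfile(weights):
        # Fail early with a clearer message than the transformers OSError.
        raise FileNotFoundError(f"{weights} is missing; copy it from the online machine")
    from sentence_transformers import SentenceTransformer

    # An existing local directory path is loaded directly from disk.
    return SentenceTransformer(model_dir)
```

For example, `save_for_offline("multi-qa-mpnet-base-dot-v1", "multi-qa-mpnet-base-dot-v1_local")` on Colab, then copy (or zip) that folder to the offline machine and call `load_offline("multi-qa-mpnet-base-dot-v1_local").encode([...])` there.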
I hope this helps.