How to reduce the execution time for translation using mBART-50 and Hugging Face?
I found two ways to speed this up, summarized below, plus a note on why multiprocessing does not help.
- 1. Run the model on a GPU, if one is available:

```python
import torch
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

device = "cuda" if torch.cuda.is_available() else "cpu"
model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50-many-to-many-mmt").to(device)
tokenizer = MBart50TokenizerFast.from_pretrained("facebook/mbart-large-50-many-to-many-mmt", src_lang="hi_IN")

text = "मैं ठीक हूँ।"  # Hindi for "I am fine."
model_inputs = tokenizer(text, return_tensors="pt").to(device)  # inputs must be on the same device as the model
generated_tokens = model.generate(**model_inputs, forced_bos_token_id=tokenizer.lang_code_to_id["en_XX"])
translation = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
print(translation)
```

Note that `model_inputs` is a dict, so it has to be unpacked with `**` when calling `generate`, and it is the inputs, not the generated output, that need to be moved to the GPU.
- 2. Limit the number of tokens to translate by truncating the input (a quick check of the effect follows the snippet):

```python
model_inputs = tokenizer(text, return_tensors="pt", max_length=500, truncation=True)
```
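To verify that truncation actually caps the input length, here is a minimal sketch; the repeated sentence and the cap of 500 tokens are only illustrative, and it reuses the `tokenizer` from the first snippet:

```python
long_text = " ".join(["मैं ठीक हूँ।"] * 400)  # artificially long input
capped = tokenizer(long_text, return_tensors="pt", max_length=500, truncation=True)
full = tokenizer(long_text, return_tensors="pt")
print(capped["input_ids"].shape[1], full["input_ids"].shape[1])  # 500 vs. the untruncated length
```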
- 3. Multiprocessing fails in this case. The pretrained model is a neural network that already processes its inputs in batches, so splitting the work across processes (each loading its own copy of the model) only adds overhead instead of speeding up inference; batching the inputs is the way to parallelize, as sketched below.
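Since the model runs in batches natively, the alternative to multiprocessing is to pass several sentences to a single `generate` call. This sketch reuses `model`, `tokenizer`, and `device` from the first snippet; the second Hindi sentence is just an example:

```python
batch = ["मैं ठीक हूँ।", "आप कैसे हैं?"]  # "I am fine." / "How are you?"
model_inputs = tokenizer(batch, return_tensors="pt", padding=True, truncation=True, max_length=500).to(device)
generated_tokens = model.generate(**model_inputs, forced_bos_token_id=tokenizer.lang_code_to_id["en_XX"])
print(tokenizer.batch_decode(generated_tokens, skip_special_tokens=True))
```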