How to split a text on whitespace and retain the token spans?
You can use NLTK's WhitespaceTokenizer instead of str.split: its span_tokenize method yields the (start, end) character offsets of each token rather than just the token strings.
from nltk.tokenize import WhitespaceTokenizer

text = 'The answer is here .'
# span_tokenize returns a generator of (start, end) offsets
spans = list(WhitespaceTokenizer().span_tokenize(text))
print(spans)
# [(0, 3), (4, 10), (11, 13), (14, 18), (19, 20)]
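If you would rather avoid the NLTK dependency, the same spans can be recovered with the standard library's re module, since whitespace splitting is just matching runs of non-whitespace characters. A minimal sketch:

```python
import re

text = 'The answer is here .'

# Each match of one-or-more non-whitespace characters is a token;
# match.span() gives its (start, end) character offsets.
spans = [m.span() for m in re.finditer(r'\S+', text)]
print(spans)   # [(0, 3), (4, 10), (11, 13), (14, 18), (19, 20)]

# Slicing the original text with a span recovers the token itself.
tokens = [text[start:end] for start, end in spans]
print(tokens)  # ['The', 'answer', 'is', 'here', '.']
```

Because the spans index into the original string, you can always map a token back to its exact position, which str.split discards.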