How to split a text on whitespace and retain the token spans?
You can use NLTK's WhitespaceTokenizer instead of str.split: its span_tokenize method yields the (start, end) character offsets of each token rather than just the token strings.
from nltk.tokenize import WhitespaceTokenizer

text = 'The answer is here .'
# span_tokenize returns a generator of (start, end) offsets
spans = list(WhitespaceTokenizer().span_tokenize(text))
print(spans)
# [(0, 3), (4, 10), (11, 13), (14, 18), (19, 20)]
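If you would rather avoid the NLTK dependency, the same spans can be recovered with the standard library's re module, since whitespace splitting is just matching runs of non-whitespace characters. A minimal sketch:

```python
import re

text = 'The answer is here .'

# Each match of one-or-more non-whitespace characters is a token;
# match.span() gives its (start, end) character offsets.
spans = [m.span() for m in re.finditer(r'\S+', text)]
print(spans)   # [(0, 3), (4, 10), (11, 13), (14, 18), (19, 20)]

# Slicing the original text with a span recovers the token itself.
tokens = [text[start:end] for start, end in spans]
print(tokens)  # ['The', 'answer', 'is', 'here', '.']
```

Because the spans index into the original string, you can always map a token back to its exact position, which str.split discards.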