What are the most common English words to start a sentence?
Solution 1:
You can use the Punkt sentence tokenizer from Python's NLTK to split text into sentences. It ships with a good list of abbreviations, so it handles sentences like "arrived c. 1778, at which time ..." without breaking at the abbreviation, but you will have to add some abbreviations of your own as well.
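import nltk
nltk.download('punkt')  # Punkt sentence models; newer NLTK releases may ask for 'punkt_tab' instead
# Extra abbreviations, written in lowercase and without the trailing period
abbreviations = ['approx', 'ausf']
sentence_tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
# _params is a private attribute, so this line may break between NLTK versions
sentence_tokenizer._params.abbrev_types.update(abbreviations)
text = "He arrived c. 1778, at which time he left. He was born approx. July 1941, when he died. The Panzer II Ausf. A to C had 14 mm of slightly sloped."
# tokenize() returns a list of sentence strings
print(sentence_tokenizer.tokenize(text))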
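Once you have the list of sentences, counting the first word of each one gets you to the actual question. Here is a minimal sketch, not a definitive implementation: the function name count_sentence_starters and the "first run of letters" regex are my own assumptions, and the sample list is typed in by hand to stand in for what tokenize() would return.

```python
import re
from collections import Counter


def count_sentence_starters(sentences):
    """Count the first word of each sentence, ignoring case."""
    counts = Counter()
    for sentence in sentences:
        # First run of letters, optionally with an internal apostrophe ("Don't").
        # Assumption: leading quotes, digits, and punctuation are skipped.
        match = re.search(r"[A-Za-z]+(?:'[A-Za-z]+)?", sentence)
        if match:
            counts[match.group(0).lower()] += 1
    return counts


# Hypothetical input standing in for sentence_tokenizer.tokenize(text)
sentences = [
    "He arrived c. 1778, at which time he left.",
    "He was born approx. July 1941, when he died.",
    "The Panzer II Ausf. A to C had 14 mm of slightly sloped.",
]

starters = count_sentence_starters(sentences)
print(starters.most_common(2))  # prints [('he', 2), ('the', 1)]
```

On a real corpus you would pass the tokenizer output straight in and look at most_common(20) or so. Lowercasing merges "He" and "he"; drop the .lower() call if you want to keep the original capitalization.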