How to capitalize the first letter of every sentence?

I'm trying to write a program that capitalizes the first letter of each sentence. This is what I have so far, but I cannot figure out how to add back the period in between sentences. For example, if I input:

hello. goodbye

the output is

Hello Goodbye

and the period has disappeared.

string=input('Enter a sentence/sentences please:')
sentence=string.split('.')
for i in sentence:
    print(i.capitalize(),end='')

You could use nltk for sentence segmentation:

#!/usr/bin/env python3
import textwrap
from pprint import pprint
import nltk.data # $ pip install http://www.nltk.org/nltk3-alpha/nltk-3.0a3.tar.gz
# python -c "import nltk; nltk.download('punkt')"

sent_tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
text = input('Enter a sentence/sentences please:')
print("\n" + textwrap.fill(text))
sentences = sent_tokenizer.tokenize(text)
sentences = [sent.capitalize() for sent in sentences]
pprint(sentences)

Output

Enter a sentence/sentences please:
a period might occur inside a sentence e.g., see! and the sentence may
end without the dot!
['A period might occur inside a sentence e.g., see!',
 'And the sentence may end without the dot!']

You could use regular expressions. Define a regex that matches the first word of a sentence:

import re
p = re.compile(r'(?<=[\.\?!]\s)(\w+))

This regex contains a positive lookbehind assertion (?<=...) which matches either a ., ? or !, followed by a whitespace character \s. This is followed by a group that matches one or more alphanumeric characters \w+. In effect, matching the next word after the end of a sentence.

You can define a function that will capitalise regex match objects, and feed this function to sub():

def cap(match):
    return(match.group().capitalize())

p.sub(cap, 'Your text here. this is fun! yay.')

You might want to do the same for another regex that matches the word at the beginning of a string:

p2 = re.compile(r'^\w+')

Or make the original regex even harder to read, by combining them:

p = re.compile(r'((?<=[\.\?!]\s)(\w+)|(^\w+))')

You can use,

In [25]: st = "this is first sentence. this is second sentence. and this is third. this is fourth. and so on"

In [26]: '. '.join(list(map(lambda x: x.strip().capitalize(), st.split('.'))))
Out[26]: 'This is first sentence. This is second sentence. And this is third. This is fourth. And so on'

In [27]:

How to capitalize the first letter of every sentence?

Output

Related

Recent Posts