How to extract the maximum number of results from pagination with BeautifulSoup?
Select your tag more specifically. One option is to use CSS selectors to chain conditions: select the first <span> that is a direct child of the <div> with class pagination, split its text on spaces, and grab the last element of the list:
soup.select_one('div.pagination > span').text.split(' ')[-1]
Example
html = '''<div class="pagination"><span>Showing 1-30 of 2143</span><ul><li><div class="prev"></div></li><li><span class="disabled">1</span></li><li><a data-analytics='{"click_id":132,"module":1,"listing_page":2}' data-page="2" data-remote="true" href="/san-francisco-ca/dentists?page=2">2</a></li><li><a data-analytics='{"click_id":132,"module":1,"listing_page":3}' data-page="3" data-remote="true" href="/san-francisco-ca/dentists?page=3">3</a></li><li><a data-analytics='{"click_id":132,"module":1,"listing_page":4}' data-page="4" data-remote="true" href="/san-francisco-ca/dentists?page=4">4</a></li><li><a data-analytics='{"click_id":132,"module":1,"listing_page":5}' data-page="5" data-remote="true" href="/san-francisco-ca/dentists?page=5">5</a></li><li><a class="next ajax-page" data-analytics='{"click_id":132}' data-page="2" data-remote="true" href="/san-francisco-ca/dentists?page=2">Next</a></li></ul></div>'''
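from bs4 import BeautifulSoup
# 'lxml' needs the lxml package installed; 'html.parser' works without it
soup = BeautifulSoup(html, 'lxml')
print(soup.select_one('div.pagination > span').text.split(' ')[-1])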
Output
2143
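If the pagination block may be missing, or the count may contain thousands separators, a slightly more defensive variant could look like the sketch below. The function name extract_total and the shortened sample markup are made up for illustration, and the regex assumes the text ends with "of <number>" as in the example above.

```python
import re
from bs4 import BeautifulSoup


def extract_total(html):
    """Return the total result count as an int, or None if it cannot be found.

    extract_total is a hypothetical helper, not part of BeautifulSoup.
    """
    soup = BeautifulSoup(html, "html.parser")
    span = soup.select_one("div.pagination > span")
    if span is None:
        # No pagination block on this page
        return None
    # Capture the number after "of"; commas are allowed as thousands separators
    match = re.search(r"of\s+(\d[\d,]*)\s*$", span.get_text(strip=True))
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))


# Shortened sample markup, assumed to mirror the structure shown above
sample = '<div class="pagination"><span>Showing 1-30 of 2143</span></div>'
print(extract_total(sample))  # 2143
```

Returning None instead of raising keeps a scraping loop running when a page has no pagination; whether that is the right behaviour depends on your use case.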
Instead of calling numbers.get_text, find the "span" inside numbers, take its text, rsplit it with a maximum of 1 split, and take the second element:
out = numbers.find('span').text.rsplit(' ', 1)[1]
Output:
'2143'
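Both approaches return a string. If you need the total as a number, for example to work out how many pages to request, something like the following sketch may help. It assumes the text always has the form "Showing <first>-<last> of <total>" as in the sample; page_count is a hypothetical helper name.

```python
import math
import re


def page_count(text):
    """Return (total, per_page, pages) parsed from the pagination text.

    Hypothetical helper; assumes the format 'Showing 1-30 of 2143'.
    Returns None if the text does not match that format.
    """
    match = re.fullmatch(r"Showing (\d+)-(\d+) of (\d+)", text.strip())
    if match is None:
        return None
    first, last, total = (int(g) for g in match.groups())
    per_page = last - first + 1           # 1-30 means 30 results per page
    pages = math.ceil(total / per_page)   # round up for a partial last page
    return total, per_page, pages


print(page_count("Showing 1-30 of 2143"))  # (2143, 30, 72)
```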