Why does searching for a pound sign (£) with Python's BeautifulSoup return None when the page contains that symbol?
Code:
from bs4 import BeautifulSoup
import requests
url = 'http://books.toscrape.com/'
result = requests.get(url)
doc = BeautifulSoup(result.text,"html.parser")
prices = doc.find(text = "£")
print(prices)
Output:
None
Solution 1:
What happens?
Searching by text with a plain string only finds exact matches, and no element on the page has text that consists of £ alone. Each price is a longer string such as £51.77, so find returns None.
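To make the exact-match behaviour visible, here is a minimal offline sketch. The HTML snippet is hypothetical markup that I assume resembles one product entry on the page, and it uses string=, the newer name for the text= argument (available since Beautiful Soup 4.4.0).

```python
from bs4 import BeautifulSoup

# Hypothetical markup, assumed to resemble one product entry on the page.
html = '<article><h3>Some Book Title</h3><p class="price_color">£51.77</p></article>'
doc = BeautifulSoup(html, "html.parser")

# Every text node in the document; a plain-string search compares against these whole strings.
strings = list(doc.strings)
print(strings)

# No text node is exactly "£", so the exact-match search finds nothing.
exact = doc.find(string="£")
print(exact)
```

The list of strings contains the full price text, not a bare £, which is why the plain-string search comes back empty.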
How to fix?
Use a regular expression to find a single element whose text contains £:
prices = doc.find(text=re.compile("£"))
or multiple elements:
prices = doc.find_all(text=re.compile("£"))
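The difference between the two calls can be checked without a network request. This is a small sketch on hypothetical inline HTML (the two price elements are assumed, not fetched from the site), again written with string= as the current spelling of text=.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup with two price elements, assumed for illustration only.
html = '<p class="price_color">£51.77</p><p class="price_color">£53.74</p>'
doc = BeautifulSoup(html, "html.parser")

pattern = re.compile("£")  # matches any string that contains a pound sign

first = doc.find(string=pattern)      # first matching string only
every = doc.find_all(string=pattern)  # list of all matching strings

print(first)
print(every)
```

find returns a single string (or None when nothing matches), while find_all always returns a list, which may be empty.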
Example
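import re
import requests
from bs4 import BeautifulSoup
url = 'http://books.toscrape.com/'
result = requests.get(url)
# Pass the raw bytes so BeautifulSoup can detect the encoding itself,
# and name the parser explicitly to avoid the "no parser specified" warning.
doc = BeautifulSoup(result.content, "html.parser")
# Collect every string that contains a pound sign.
prices = doc.find_all(text=re.compile("£"))
print(prices)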
Output
['£51.77', '£53.74', '£50.10', '£47.82', '£54.23', '£22.65', '£33.34', '£17.93', '£22.60', '£52.15', '£13.99', '£20.66', '£17.46', '£52.29', '£35.02', '£57.25', '£23.88', '£37.59', '£51.33', '£45.17']
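If the goal is to work with the prices as numbers, the matched strings can be converted afterwards. A short sketch, assuming every string has the form shown in the output above (a leading £ followed by a decimal number); the sample list is just the first three values from that output.

```python
# First three values from the output above, used as sample input.
prices = ['£51.77', '£53.74', '£50.10']

# Strip the leading pound sign and parse the rest as a float.
amounts = [float(p.lstrip("£")) for p in prices]
print(amounts)
```

For exact currency arithmetic, decimal.Decimal from the standard library may be a better fit than float.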