How to obtain the title attribute using Python and BeautifulSoup?

Solution 1:

To get an attribute of an element, you can treat an element as a dictionary (reference):

soup.find('tag_name')['attribute_name']

And, in your case:

for tr in soup.find_all('tr'):
    for td in tr.find_all('td'):
        print(td.get('title', 'No title attribute'))

Note that I've used the .get() method so that the loop doesn't raise a KeyError on td elements that have no title attribute.
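As a minimal, self-contained sketch of the difference between the two access styles (the sample HTML below is made up for illustration and is not the asker's page): dictionary-style indexing raises KeyError when the attribute is missing, while .get() returns the default you pass in.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: the second cell has no title attribute.
html = """
<table>
  <tr>
    <td title="title 1">A</td>
    <td>B</td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
cells = soup.find_all("td")

# Dictionary-style access works when the attribute exists...
first_title = cells[0]["title"]

# ...but raises KeyError when it does not.
try:
    cells[1]["title"]
    missing_raises = False
except KeyError:
    missing_raises = True

# .get() returns the supplied default instead of raising.
titles = [td.get("title", "No title attribute") for td in cells]
print(first_title, missing_raises, titles)
```

If you would rather skip cells without a title than print a placeholder, omit the default and filter out the None values that .get() returns.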

Solution 2:

The lxml library is often useful too, because it lets you identify HTML structures using XPath expressions, which can make for more compact code.

In this case, the XPath expression //td[@title] selects all td elements but requires that the title attribute be present. As you can see in the for loop, there is no need to check for the attribute there, because the expression has already done so.

>>> from io import StringIO
>>> from lxml import etree
>>> HTML = StringIO('''\
... <td title="title 1" role="gridcell"><a onclick="open" href="#">TEXT</a></td>
... <td role="gridcell"><a onclick="open" href="#">TEXT</a></td>
... <td title="title 2" role="gridcell"><a onclick="open" href="#">TEXT</a></td>
... <td title="title 3" role="gridcell"><a onclick="open" href="#">TEXT</a></td>''')
>>> parser = etree.HTMLParser()
>>> tree = etree.parse(HTML, parser)
>>> tds = tree.findall('//td[@title]')
>>> tds
[<Element td at 0x7a0888>, <Element td at 0x7a0d08>, <Element td at 0x7ae588>]
>>> for item in tree.findall('//td[@title]'):
...     item.attrib['title']
...     
'title 1'
'title 2'
'title 3'
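The same idea can be written as a standalone script rather than an interactive session. This is only a sketch under a few assumptions: the sample markup is invented (the cells are wrapped in a table and row so the fragment is well-formed), and it calls lxml's .xpath() method with the expression //td/@title, which returns the attribute values directly instead of the elements.

```python
from lxml import etree

# Hypothetical sample markup; the second cell has no title attribute.
html = """
<table><tr>
  <td title="title 1" role="gridcell">TEXT</td>
  <td role="gridcell">TEXT</td>
  <td title="title 2" role="gridcell">TEXT</td>
</tr></table>
"""

# etree.HTML parses a string with lxml's lenient HTML parser
# and returns the root <html> element.
root = etree.HTML(html)

# Selecting the attribute itself yields its values for every td that has one;
# cells without a title simply contribute nothing to the result.
titles = [str(value) for value in root.xpath("//td/@title")]
print(titles)
```

Selecting //td[@title] instead would give you the td elements themselves, which is the better choice when you also need their text or other attributes.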