Regex to capture HTML elements with their class name

Solution 1:

Regex is a poor choice for HTML parsing, but luckily this is trivial with BeautifulSoup:

from bs4 import BeautifulSoup

html = """<div class="header_container container_12">
        <div class="grid_5">
              <h1><a href="#">Logo Text Here</a></h1>
        </div>
        <div class="grid_7">
            <div class="menu_items"> 
                <a href="#" class="home active">Home</a><a href="#" class="portfolio">Portfolio</a> 
               <a href="#" 
                class="about">About Me
                </a><a href="#" class="contact">Contact Me</a> 
            </div>
        </div>
</div>"""
    
for elem in BeautifulSoup(html, "lxml").find_all(attrs={"class": True}):
    print(elem.attrs["class"], elem.name)

Output:

['header_container', 'container_12'] div
['grid_5'] div
['grid_7'] div
['menu_items'] div
['home', 'active'] a
['portfolio'] a
['about'] a
['contact'] a
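
To illustrate why regex is brittle here, the sketch below runs a naive pattern over a fragment shaped like the markup above. The pattern and the `snippet` variable are assumptions made up for this illustration, not something from the question. Because the pattern expects the tag name and its `class` attribute on the same line, it silently skips the `about` link, whose attributes wrap onto the next line.

```python
import re

# Fragment modelled on the sample above: the "about" link has its class
# attribute on a different line from the opening "<a".
snippet = '''<div class="menu_items">
<a href="#"
 class="about">About Me
</a><a href="#" class="contact">Contact Me</a>
</div>'''

# Naive, hypothetical pattern: tag name, then any attributes on the same
# line, then class="...". It cannot cross a newline inside a tag.
naive = re.compile(r'<(\w+)[^>\n]*class="([^"]*)"')

matches = naive.findall(snippet)
found_classes = [cls for _, cls in matches]

print(matches)
print("about" in found_classes)  # the multi-line tag is not captured
```

You could loosen the pattern to allow newlines, but single quotes, unquoted values, comments and attribute order each need another special case, which is the work a parser already does.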

You can collect these results into a dict if you like, but be careful: several elements can share the same class list, so a plain dict keeps only the last element seen for each key. Each entry then tells you only that some element with that tag name exists for that exact tuple of class names, in that order.

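elems = {}

# Key: tuple of the element's class names, in source order. Value: tag name.
# A later element with the same class tuple overwrites the earlier entry.
for elem in BeautifulSoup(html, "lxml").find_all(attrs={"class": True}):
    elems[tuple(elem.attrs["class"])] = elem.name

for k, v in elems.items():
    print(k, v)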
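
If you need every element rather than the last one per key, one option is to bucket tag names under each individual class name. This is a minimal sketch, not the only way to do it: `group_by_class` and `sample_html` are hypothetical names introduced here, and it uses the built-in `"html.parser"` backend instead of `"lxml"` only to avoid the extra dependency.

```python
from collections import defaultdict

from bs4 import BeautifulSoup

# Hypothetical sample markup: three elements share the class "item".
sample_html = """
<div class="menu">
  <a href="#" class="item active">Home</a>
  <a href="#" class="item">About</a>
  <span class="item">Extra</span>
</div>
"""


def group_by_class(markup):
    """Map each class name to the tag names of all elements that carry it."""
    soup = BeautifulSoup(markup, "html.parser")
    groups = defaultdict(list)
    for elem in soup.find_all(attrs={"class": True}):
        # BeautifulSoup exposes "class" as a list of names, one per token.
        for name in elem.get("class"):
            groups[name].append(elem.name)
    return dict(groups)


groups = group_by_class(sample_html)
for name, tags in groups.items():
    print(name, tags)
```

Here `item` maps to three tag names instead of one, so nothing is overwritten. Store `elem` itself instead of `elem.name` if you need the elements rather than just their tag names.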
