Extracting data from HTML and generating a CSV

Solution 1:

Using Python with XPath (the same can be done in JavaScript or PHP), you can try the following:

import lxml.html as lh  # third-party: pip install lxml
reviews = """your html above"""

doc = lh.fromstring(reviews)
# Each review lives in its own <section> element
sections = doc.xpath('//section')
for section in sections:
    # The leading "." keeps each query relative to the current section
    reviewer = section.xpath('.//div[@class="reviewer"]/span/text()')[0]
    date = section.xpath('.//div[@class="review-date"]/meta/@content')[0]
    review = section.xpath('.//div[@class="type-full"]/span/text()')[0]
    # [1] picks the second <meta> content value inside review-rating
    rating = section.xpath('.//div[@class="review-rating"]//meta/@content')[1]

    print(f"{date}, {reviewer}, {review}, {rating}")

The output should be:

2022-01-05, Joe K., Review goes here Review Goes Here Review Goes Here, 5.0
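
The loop above only prints comma-joined text, which breaks as soon as a review itself contains a comma or a quote. To get an actual CSV, one option is Python's built-in csv module, which handles the quoting for you. The following is a minimal sketch: the helper name reviews_to_csv and the header names are my own choices, and the sample row is simply the output line shown above. In the real loop you would append (date, reviewer, review, rating) to a list instead of printing.

```python
import csv
import io


def reviews_to_csv(rows, out):
    """Write (date, reviewer, review, rating) tuples to the file-like `out`."""
    writer = csv.writer(out)  # quotes fields containing commas or quotes
    writer.writerow(["date", "reviewer", "review", "rating"])  # header row
    writer.writerows(rows)


# Sample row copied from the printed output; replace with the list
# collected inside the `for section in sections:` loop.
rows = [
    ("2022-01-05", "Joe K.",
     "Review goes here Review Goes Here Review Goes Here", "5.0"),
]

# An in-memory buffer is used here so the result can be inspected directly.
buffer = io.StringIO()
reviews_to_csv(rows, buffer)
print(buffer.getvalue())
```

To write to disk instead, pass a file opened with open("reviews.csv", "w", newline="", encoding="utf-8") as the second argument; newline="" is what the csv module documentation recommends so that line endings are not doubled on Windows.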