Loop over li and nested ul in each li tag and create a list for a DataFrame
I have an HTML page with about 24 div elements. Each div contains an h2 tag and a ul tag, and each ul has a varying number of li items. Each li contains an h3 tag and another ul, whose li items each contain an h4 tag wrapping an anchor tag, e.g.:
<div class="medicine-category col-xs-6">
<h2 class="tree-toggler nav-header">Anaesthesia</h2>
<ul class="ulh nav nav-list tree">
<li class="cat2-li">
<h3 class="tree-toggler nav-header">Anaesthetics - General</h3>
<ul class="ulh nav nav-list tree">
<li class="cat3-li">
<h4 class="tree-toggler nav-header"><a
href="//www.medicineindia.org/generic-medicine-by-category/561/inhalational-agents">Inhalational
Agents</a></h4>
</li>
<li class="cat3-li">
<h4 class="tree-toggler nav-header"><a
href="//www.medicineindia.org/generic-medicine-by-category/562/intravenous-inducing-agents">Intravenous
Inducing Agents</a></h4>
</li>
<li class="cat3-li">
<h4 class="tree-toggler nav-header"><a
href="//www.medicineindia.org/generic-medicine-by-category/563/intravenous-diassociative-anaesthetics">Intravenous-
Diassociative Anaesthetics</a></h4>
</li>
</ul>
</li>
<li class="cat2-li">
<h3 class="tree-toggler nav-header">Anaesthetics - Local</h3>
<ul class="ulh nav nav-list tree">
<li class="cat3-li">
<h4 class="tree-toggler nav-header"><a
href="//www.medicineindia.org/generic-medicine-by-category/575/adjuncts">Adjuncts</a></h4>
</li>
<li class="cat3-li">
<h4 class="tree-toggler nav-header"><a
href="//www.medicineindia.org/generic-medicine-by-category/573/amide-type">Amide type</a></h4>
</li>
<li class="cat3-li">
<h4 class="tree-toggler nav-header"><a
href="//www.medicineindia.org/generic-medicine-by-category/574/ester-type">Ester type</a></h4>
</li>
<li class="cat3-li">
<h4 class="tree-toggler nav-header"><a
href="//www.medicineindia.org/generic-medicine-by-category/622/phenyl-methanol">Phenyl Methanol</a></h4>
</li><span id="ezoic-pub-ad-placeholder-112" class="ezoic-adpicker-ad"></span><span
class="ezoic-ad medrectangle-3 medrectangle-3112 adtester-container adtester-container-112"
data-ez-name="medicineindia_org-medrectangle-3"><span id="div-gpt-ad-medicineindia_org-medrectangle-3-0"
ezaw="200" ezah="200"
style="position:relative;z-index:0;display:inline-block;padding:0;min-height:200px;min-width:200px;"
class="ezoic-ad">
<script data-ezscrex="false" data-cfasync="false" type="text/javascript"
style="display:none;">if (typeof __ez_fad_position != 'undefined') { __ez_fad_position('div-gpt-ad-medicineindia_org-medrectangle-3-0') };</script>
</span></span>
</ul>
</li>
<li class="cat2-li">
<h3 class="tree-toggler nav-header">General Anaesthetics-Adjuncts</h3>
<ul class="ulh nav nav-list tree">
<li class="cat3-li">
<h4 class="tree-toggler nav-header"><a
href="//www.medicineindia.org/generic-medicine-by-category/9/analgesics">Analgesics</a></h4>
</li>
<li class="cat3-li">
<h4 class="tree-toggler nav-header"><a
href="//www.medicineindia.org/generic-medicine-by-category/569/anticholinesterases-inhibitors">Anticholinesterases
Inhibitors</a></h4>
</li>
<li class="cat3-li">
<h4 class="tree-toggler nav-header"><a
href="//www.medicineindia.org/generic-medicine-by-category/32/antiemetics">Antiemetics</a></h4>
</li>
<li class="cat3-li">
<h4 class="tree-toggler nav-header"><a
href="//www.medicineindia.org/generic-medicine-by-category/572/bronchodialator">Bronchodialator</a></h4>
</li>
<li class="cat3-li">
<h4 class="tree-toggler nav-header"><a
href="//www.medicineindia.org/generic-medicine-by-category/568/depolarising-neuromuscular-blockers">Depolarising
Neuromuscular Blockers</a></h4>
</li>
<li class="cat3-li">
<h4 class="tree-toggler nav-header"><a
href="//www.medicineindia.org/generic-medicine-by-category/571/h2-blockers">H2 Blockers</a></h4>
</li>
<li class="cat3-li">
<h4 class="tree-toggler nav-header"><a
href="//www.medicineindia.org/generic-medicine-by-category/570/muscarinic-receptor-antagonists">Muscarinic
Receptor Antagonists</a></h4>
</li>
<li class="cat3-li">
<h4 class="tree-toggler nav-header"><a
href="//www.medicineindia.org/generic-medicine-by-category/566/neuroleptics">Neuroleptics</a></h4>
</li>
<li class="cat3-li">
<h4 class="tree-toggler nav-header"><a
href="//www.medicineindia.org/generic-medicine-by-category/567/non-depolarising-muscle-relaxants">Non
Depolarising Muscle Relaxants</a></h4>
</li>
<li class="cat3-li">
<h4 class="tree-toggler nav-header"><a
href="//www.medicineindia.org/generic-medicine-by-category/18/respiratory-stimulants">Respiratory
Stimulants</a></h4>
</li>
<li class="cat3-li">
<h4 class="tree-toggler nav-header"><a
href="//www.medicineindia.org/generic-medicine-by-category/565/sedative-antianxiety-drugs">Sedative-Antianxiety
Drugs</a></h4>
</li>
</ul>
</li>
</ul>
</div>
<div>
<h2>h2 title</h2>
<ul>
<li>
<h3>h3 title</h3>
<ul>
<li>
<h4>
<a href="https:test1.com">h4 title</a>
</h4>
</li>
</ul>
</li>
</ul>
</div>
I'm trying to create a list and subsequent dataframe:
content_list = []
for ele in content:
    row = []
    h2 = ele.find('h2').text
    row.append(h2)
    content_list.append(row)
    ul = ele.find_next('ul')
    for li in ul.find_all('li'):
        if li.find('h3'):
            h3 = li.find('h3').text
            row.append(h3)
            content_list.append(row)
        if li.find('ul'):
            li_ul = li.find('ul')
            for lii in li_ul.find_all('li'):
                if lii.find('h4'):
                    h4 = lii.find('h4').text
                    a = lii.find('a').get("href")
                    row.extend((h4, a))
                    content_list.append(row)
The resulting list I'm getting is:
[['Anaesthesia',
'Anaesthetics - General',
'Inhalational Agents',
'//www.medicineindia.org/generic-medicine-by-category/561/inhalational-agents',
'Intravenous Inducing Agents',
'//www.medicineindia.org/generic-medicine-by-category/562/intravenous-inducing-agents',
'Intravenous- Diassociative Anaesthetics',
'//www.medicineindia.org/generic-medicine-by-category/563/intravenous-diassociative-anaesthetics',
'Anaesthetics - Local',
'Adjuncts',
'//www.medicineindia.org/generic-medicine-by-category/575/adjuncts',
'Amide type',
'//www.medicineindia.org/generic-medicine-by-category/573/amide-type',
'Ester type',
'//www.medicineindia.org/generic-medicine-by-category/574/ester-type',
'Phenyl Methanol',
'//www.medicineindia.org/generic-medicine-by-category/622/phenyl-methanol',
'General Anaesthetics-Adjuncts',
'Analgesics',
'//www.medicineindia.org/generic-medicine-by-category/9/analgesics',
'Anticholinesterases Inhibitors',
'//www.medicineindia.org/generic-medicine-by-category/569/anticholinesterases-inhibitors',
'Antiemetics',
'//www.medicineindia.org/generic-medicine-by-category/32/antiemetics',
'Bronchodialator',...
How can I get a list like this:
[
['Anaesthesia', 'Anaesthetics - General', 'Inhalational Agents', '//www.medicineindia.org/generic-medicine-by-category/561/inhalational-agents'],
['Anaesthesia', 'Anaesthetics - General', 'Intravenous Inducing Agents', '//www.medicineindia.org/generic-medicine-by-category/562/intravenous-inducing-agents'],
['Anaesthesia', 'Anaesthetics - General', 'Intravenous- Diassociative Anaesthetics', '//www.medicineindia.org/generic-medicine-by-category/563/intravenous-diassociative-anaesthetics'],
['Anaesthesia', 'Anaesthetics - Local', 'Adjuncts', '//www.medicineindia.org/generic-medicine-by-category/575/adjuncts'],
['Anaesthesia', 'Anaesthetics - Local', 'Amide type', '//www.medicineindia.org/generic-medicine-by-category/573/amide-type']
]
You keep appending to your content_list multiple times within your loop. You should append only once, at the last step, after you have completed a "row". Something also seems off in the logic, but without the full HTML it's hard to debug at the moment.
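To illustrate the point about appending, here is a minimal sketch in plain Python, with no scraping involved. The variable names (shared, separate, new_row) are made up for this illustration. Appending the same list object repeatedly stores several references to one growing list, while building a fresh list per item gives independent rows:

```python
# Buggy pattern: one list object is appended repeatedly and keeps growing.
row = ['Anaesthesia']
shared = []
for name in ['Inhalational Agents', 'Adjuncts']:
    row.append(name)
    shared.append(row)  # stores another reference to the same list

# Fixed pattern: build a fresh list for every leaf item and append it once.
separate = []
for name in ['Inhalational Agents', 'Adjuncts']:
    new_row = ['Anaesthesia', name]
    separate.append(new_row)

print(shared)    # both entries are the same, fully grown list
print(separate)  # one independent row per item
```

In the first loop every entry of shared is the very same object, which is why the question's result looks like one long row.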
Try:
from bs4 import BeautifulSoup
import requests

url = 'https://www.medicineindia.org/medicine-categories'
html = requests.get(url).text
soup = BeautifulSoup(html, 'html.parser')

# Each second-level <li> holds one h3 sub-title and its nested <ul>.
content = soup.find_all('li', {'class':'cat2-li'})
content_list = []
for ele in content:
    # The nearest preceding h2 is the top-level category.
    title = ele.find_previous('h2').text
    sub_title = ele.find('h3').text
    ul = ele.find('ul')
    for li in ul.find_all('li'):
        # Start a fresh row for every third-level item.
        row = [title, sub_title]
        h4 = li.find('h4').text
        row.append(h4)
        href = li.find('h4').find('a', href=True)['href']
        row.append(href)
        # Append exactly once, when the row is complete.
        content_list.append(row)
Output:
[
['Anaesthesia', 'Anaesthetics - General', 'Inhalational Agents', '//www.medicineindia.org/generic-medicine-by-category/561/inhalational-agents'],
['Anaesthesia', 'Anaesthetics - General', 'Intravenous Inducing Agents', '//www.medicineindia.org/generic-medicine-by-category/562/intravenous-inducing-agents'],
['Anaesthesia', 'Anaesthetics - General', 'Intravenous- Diassociative Anaesthetics', '//www.medicineindia.org/generic-medicine-by-category/563/intravenous-diassociative-anaesthetics'],
['Anaesthesia', 'Anaesthetics - Local', 'Adjuncts', '//www.medicineindia.org/generic-medicine-by-category/575/adjuncts'],
['Anaesthesia', 'Anaesthetics - Local', 'Amide type', '//www.medicineindia.org/generic-medicine-by-category/573/amide-type'],
['Anaesthesia', 'Anaesthetics - Local', 'Ester type', '//www.medicineindia.org/generic-medicine-by-category/574/ester-type'],
['Anaesthesia', 'Anaesthetics - Local', 'Phenyl Methanol', '//www.medicineindia.org/generic-medicine-by-category/622/phenyl-methanol'],
['Anaesthesia', 'General Anaesthetics-Adjuncts', 'Analgesics', '//www.medicineindia.org/generic-medicine-by-category/9/analgesics'],
['Anaesthesia', 'General Anaesthetics-Adjuncts', 'Anticholinesterases Inhibitors', '//www.medicineindia.org/generic-medicine-by-category/569/anticholinesterases-inhibitors'],
['Anaesthesia', 'General Anaesthetics-Adjuncts', 'Antiemetics', '//www.medicineindia.org/generic-medicine-by-category/32/antiemetics'],
['Anaesthesia', 'General Anaesthetics-Adjuncts', 'Bronchodialator', '//www.medicineindia.org/generic-medicine-by-category/572/bronchodialator'],
...
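Since the goal is a DataFrame, here is a minimal sketch of that last step, assuming pandas is installed. The column names are hypothetical labels chosen for this example, and the two rows are copied from the output above so that the snippet runs without fetching the page. The hrefs are protocol-relative (they start with //), so the sketch also resolves them against the page URL with urljoin from the standard library:

```python
from urllib.parse import urljoin

import pandas as pd

# Two rows copied from the output above; in practice pass the full content_list.
content_list = [
    ['Anaesthesia', 'Anaesthetics - General', 'Inhalational Agents',
     '//www.medicineindia.org/generic-medicine-by-category/561/inhalational-agents'],
    ['Anaesthesia', 'Anaesthetics - Local', 'Adjuncts',
     '//www.medicineindia.org/generic-medicine-by-category/575/adjuncts'],
]

# Column names are hypothetical; pick whatever suits your data.
df = pd.DataFrame(content_list, columns=['category', 'subcategory', 'name', 'url'])

# Resolve the protocol-relative hrefs against the page they were scraped from.
base_url = 'https://www.medicineindia.org/medicine-categories'
df['url'] = df['url'].apply(lambda href: urljoin(base_url, href))

print(df)
```

Because every row already has the same four fields, the list of lists maps directly onto four columns.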