AWK to replace HTML tag with another and keep text
Solution 1:
I would use GNU sed for this task in the following way. Let the content of file.txt be
<span class="desc e-font-family-cond">fork</span>
then
sed -e 's/<span[^>]*>/<strong>/g' -e 's/<\/span>/<\/strong>/g' file.txt
output
<strong>fork</strong>
Explanation: the first expression replaces every opening span tag, including any attributes, with <strong>; the second replaces every closing </span> with </strong>.
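To experiment with the same two substitutions outside the shell, here is a minimal sketch using Python's standard re module. The helper name span_to_strong is made up for illustration; the patterns are taken from the sed command. The second call shows a limitation of the pattern: [^>]* stops at the first >, so an attribute value that contains > cuts the match short.

```python
import re


def span_to_strong(html: str) -> str:
    """Apply the same two substitutions as the sed command (hypothetical helper)."""
    # Opening tag: "<span", any run of characters that are not ">", then ">".
    html = re.sub(r"<span[^>]*>", "<strong>", html)
    # Closing tag: a literal "</span>".
    return re.sub(r"</span>", "</strong>", html)


print(span_to_strong('<span class="desc e-font-family-cond">fork</span>'))
# prints: <strong>fork</strong>

# Caveat: a ">" inside an attribute value ends the match too early.
print(span_to_strong('<span title="a>b">fork</span>'))
# prints: <strong>b">fork</strong>
```

For input as regular as the sample line this is fine; for arbitrary HTML the caveat is a reason to prefer a real parser.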
Solution 2:
Consider using Python and a tool like BeautifulSoup to handle HTML. Trying to parse HTML with line-oriented tools like sed or awk can lead to terrible places, for example when a tag spans several lines or an attribute value contains a > character.
As an example:
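from bs4 import BeautifulSoup

# Name the parser explicitly; without it bs4 guesses one and emits a warning.
soup = BeautifulSoup('<li><span class="desc e-font-family-cond">fork</span>', 'html.parser')
# Rename every span element; its attributes and text are kept.
for spanele in soup.find_all('span'):
    spanele.name = 'p'
html_string = str(soup)
print(html_string)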
That's lightweight and pretty simple, and the HTML is handled properly by a library built specifically to parse it.
Solution 3:
Don't use AWK for processing HTML files. If you can turn your HTML file into an XHTML file, you can use xsltproc for an XML transformation as follows:
trans.xsl file:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet
    version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" encoding="utf-8"/>
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="span[@class='desc e-font-family-cond']">
    <strong><xsl:apply-templates/></strong>
  </xsl:template>
</xsl:stylesheet>
Command line for invoking xsltproc (which has to be installed):
xsltproc trans.xsl file.html
The command writes the transformed document to standard output, with each matching span replaced by strong; redirect it to a file to keep the result.
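If xsltproc is not available, the same transformation can be sketched with Python's standard xml.etree.ElementTree module. This is an illustration under assumptions, not a drop-in replacement: the sample document and the helper name span_to_strong are made up, and the input is assumed to be well-formed XML without a default namespace. A real XHTML file that declares the XHTML namespace would need the qualified tag name {http://www.w3.org/1999/xhtml}span instead.

```python
import xml.etree.ElementTree as ET

# Sample input (assumed): well-formed XML with no default namespace.
XHTML = '<ul><li><span class="desc e-font-family-cond">fork</span></li></ul>'


def span_to_strong(xml_text: str) -> str:
    """Rename the matching span elements to strong and drop their attributes."""
    root = ET.fromstring(xml_text)
    # Collect the matches first so the tree is not modified while it is walked.
    for elem in list(root.iter("span")):
        if elem.get("class") == "desc e-font-family-cond":
            elem.tag = "strong"
            elem.attrib.clear()
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(root, encoding="unicode")


print(span_to_strong(XHTML))
# prints: <ul><li><strong>fork</strong></li></ul>
```

Like the stylesheet, this keeps the text and child nodes of each matching span and discards its attributes; spans with a different class are left alone.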