AWK to replace HTML tag with another and keep text

Solution 1:

I would use GNU sed for this task as follows. Let file.txt contain

<span class="desc e-font-family-cond">fork</span>

then

sed -e 's/<span[^>]*>/<strong>/g' -e 's/<\/span>/<\/strong>/g' file.txt

output

<strong>fork</strong>

Explanation: the first expression replaces every opening span tag, including its attributes, with <strong>; the second replaces every closing </span> with </strong>.
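For comparison, here is a minimal Python sketch of the same two substitutions using the standard re module. The function name span_to_strong is made up for illustration. It mirrors the sed patterns exactly, so it shares their limits: the match is purely textual, and an attribute value that contains > ends the opening-tag match too early.

```python
import re


def span_to_strong(text):
    """Apply the same two substitutions as the sed command (hypothetical helper)."""
    # Opening tag: "<span", then anything up to the first ">".
    text = re.sub(r"<span[^>]*>", "<strong>", text)
    # Closing tag: a literal "</span>".
    return re.sub(r"</span>", "</strong>", text)


print(span_to_strong('<span class="desc e-font-family-cond">fork</span>'))
# Limitation: a ">" inside an attribute value cuts the match short.
print(span_to_strong('<span title="a>b">x</span>'))
```

The first call prints <strong>fork</strong>. The second leaves the stray b"> in the output, which is the kind of breakage the other solutions warn about.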

Solution 2:

Consider using Python and a tool like BeautifulSoup to handle HTML. Parsing HTML with text tools like sed or awk breaks easily, for example on tags that span several lines or on attribute values that contain a > character.

As an example:

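from bs4 import BeautifulSoup

# Name the parser explicitly; otherwise bs4 guesses one and emits a warning.
soup = BeautifulSoup('<li><span class="desc e-font-family-cond">fork</span>', 'html.parser')
# Renaming an element keeps its text and attributes; any target tag name works.
for spanele in soup.find_all('span'):
    spanele.name = 'p'
html_string = str(soup)
print(html_string)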

That's lightweight and pretty simple, and the HTML is handled by a library that is specifically built to parse it.
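If installing a third-party package is not an option, roughly the same rename can be sketched with html.parser from the Python standard library. This is a different technique from BeautifulSoup: it does not build a tree, it re-emits the token stream and swaps the tag name on the way through. The class SpanRenamer and the function rename_spans are hypothetical names, and this sketch drops the span's attributes rather than keeping them.

```python
from html.parser import HTMLParser


class SpanRenamer(HTMLParser):
    """Re-emit HTML, renaming <span> elements to a target tag (illustrative sketch)."""

    def __init__(self, target="strong"):
        # convert_charrefs=False so entities are passed through unchanged.
        super().__init__(convert_charrefs=False)
        self.target = target
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self.out.append(f"<{self.target}>")  # attributes are dropped
        else:
            self.out.append(self.get_starttag_text())  # original tag text

    def handle_endtag(self, tag):
        self.out.append(f"</{self.target if tag == 'span' else tag}>")

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def rename_spans(html, target="strong"):
    parser = SpanRenamer(target)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(rename_spans('<li><span class="desc e-font-family-cond">fork</span>'))
```

Unlike BeautifulSoup, this does not repair the markup: the unclosed <li> in the input stays unclosed in the output.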

Solution 3:

Don't use AWK for processing HTML files. If you can turn your HTML file into an XHTML file, you can use xsltproc for an XML transformation as follows:

trans.xsl file:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet
  version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes" encoding="utf-8"/>

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="span[@class='desc e-font-family-cond']">
    <strong><xsl:apply-templates/></strong>
  </xsl:template>

</xsl:stylesheet>

Command for invoking xsltproc (which has to be installed):

xsltproc trans.xsl file.html

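The standard output of this command is the transformed file, with every matching span element replaced by a strong element. Note that the match pattern assumes the span elements are in no namespace; if the document declares the XHTML namespace, the stylesheet has to bind a prefix to that namespace and use it in the pattern.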
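The same idea, treating the input as well-formed XML and rewriting the element instead of the text, can also be sketched in Python with xml.etree.ElementTree. The function name spans_to_strong is hypothetical, and the sketch makes the same assumptions as the stylesheet: the input parses as XML and the elements are in no namespace.

```python
import xml.etree.ElementTree as ET


def spans_to_strong(xhtml, cls="desc e-font-family-cond"):
    """Rename matching <span> elements to <strong> (illustrative sketch)."""
    root = ET.fromstring(xhtml)  # raises ParseError if the input is not well-formed
    # Materialize the matches first, since tags are changed inside the loop.
    for el in list(root.iter("span")):
        if el.get("class") == cls:
            el.tag = "strong"
            el.attrib.clear()  # like the stylesheet, emit <strong> without attributes
    return ET.tostring(root, encoding="unicode")


doc = '<ul><li><span class="desc e-font-family-cond">fork</span></li></ul>'
print(spans_to_strong(doc))
```

Spans with a different class attribute are left as they are, which matches the exact string comparison in the XSLT predicate.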