How do I convert an HTML table into JSON in Logic Apps

Solution 1:

I hope I'm on the right track but this code below (although quite specific to your use case) will read the HTML table and return a JSON representation of the data.

Just create a new Azure Function in .NET called ConvertHtmlTableToJson and paste it in.

#r "Newtonsoft.Json"

using System.Net;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Extensions.Primitives;
using System.Collections.Generic;
using Newtonsoft.Json;
using System.Xml;

public static async Task<IActionResult> Run(HttpRequest req, ILogger log)
{
    var outputTable = new List<List<String>>();

    string requestBody = String.Empty;

    using (StreamReader streamReader = new StreamReader(req.Body))
    {
        requestBody = await streamReader.ReadToEndAsync();
    }

    dynamic data = JsonConvert.DeserializeObject(requestBody);
    string xmlString = System.Text.Encoding.UTF8.GetString(Convert.FromBase64String((string)data?.Content));;

    var xmlDocument = new XmlDocument();
    xmlDocument.LoadXml(xmlString);

    // Get the rows
    var xmlRows = xmlDocument.DocumentElement.SelectNodes("//tr");

    foreach (XmlNode xmlRow in xmlRows)
    {
        // Now get the columns.
        var xmlColumns = xmlRow.SelectNodes(".//td");
        var row = new List<string>();

        foreach (XmlNode xmlColumn in xmlColumns)
        {
            var value = xmlColumn.SelectSingleNode(".//span[@class='value']");

            if (value != null)
                row.Add(value.InnerText);
        }

        if (row.Count > 0)
            outputTable.Add(row);
    }
   
    return new OkObjectResult(outputTable);
}

It accepts a Base64 string that is the HTML data you provided in your example.

A few things to note ...

  • It's hardcoded to look for span elements with an attribute of class="value", that's in line with the email you receive and the HTML you provided.
  • It doesn't check for imbalanced columns, i.e. if a row is missing a value, you may get two columns for one row and three columns for another.
  • Headers are ignored because it only searches within the td elements where there's a span element with the attribute criteria as specified in the first point.

As long as your email stays the same and you pass in the same structure that you provided as an example, this will extract the data for you. Beyond that, it would need to be enhanced.

From there, you should be able to use the 2D array it returns to load your data to your Excel table.

This is how I represented it in LogicApps ...

HTML Variable

HTML Variable

This is the body in the request ...

{
  "Content": "@{base64(variables('HTML'))}"
}

Result

Result

[
  [
    "G-TEL FOR ENBRIDGE GAS (LEGACY UNION GAS) (ENOW01)",
    "ENOW01",
    "Notification sent"
  ],
  [
    "CITY OF STRATFORD (STRATWS01)",
    "STRATWS01",
    "Notification sent"
  ],
  [
    "FESTIVAL HYDRO (LOCAL HYDRO) (FESTH01)",
    "FESTH01",
    "Notification sent"
  ],
  [
    "WIGHTMAN TELECOM - FIBRE - LIMITED (WT01)",
    "WT01",
    "Notification sent"
  ],
  [
    "CLI FOR ROGERS (ROGWAT01)",
    "ROGWAT01",
    "Cleared"
  ],
  [
    "G-TEL FOR BELL CANADA (BCOW01)",
    "BCOW01",
    "Notification sent"
  ]
]