How to parse an HTML table using jsoup?

Yes, this is possible with jsoup. First, select the table, then select its <tr> elements to get the rows. Start at index 1, because the first row contains only the column names. For each row, select the <td> cells and read the ones you need by index. In your case, indexes 7 and 5 matter (index 7: Status, index 5: Host Name). If the status equals "down", add the host name to a list.

ArrayList<String> downServers = new ArrayList<>();
Element table = doc.select("table").get(0); //select the first table.
Elements rows = table.select("tr");

for (int i = 1; i < rows.size(); i++) { //first row is the col names so skip it.
    Element row = rows.get(i);
    Elements cols = row.select("td");

    if (cols.get(7).text().equals("down")) {
        downServers.add(cols.get(5).text());
    }
}
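
For a self-contained version of the loop above, here is a minimal sketch that parses an inline HTML string instead of a live page. The sample table, the placeholder column headers, and the names DownServerDemo and findDownHosts are made up for illustration; only the column positions (5 for Host Name, 7 for Status) come from the question. It assumes jsoup is on the classpath, and it skips rows with fewer than 8 <td> cells rather than relying on the header being the first row.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class DownServerDemo {

    // Hypothetical helper: host names of all rows whose status cell is "down".
    static List<String> findDownHosts(String html) {
        List<String> downServers = new ArrayList<>();
        Document doc = Jsoup.parse(html);
        Element table = doc.select("table").first(); // null if there is no table
        if (table == null) {
            return downServers;
        }
        for (Element row : table.select("tr")) {
            Elements cols = row.select("td");
            if (cols.size() < 8) {
                continue; // header row made of <th> cells, or an incomplete row
            }
            if (cols.get(7).text().equalsIgnoreCase("down")) {
                downServers.add(cols.get(5).text());
            }
        }
        return downServers;
    }

    public static void main(String[] args) {
        // Made-up sample data laid out like the table in the question.
        String html = "<table>"
            + "<tr><th>c0</th><th>c1</th><th>c2</th><th>Cluster</th><th>IP</th>"
            + "<th>Host Name</th><th>c6</th><th>Status</th></tr>"
            + "<tr><td>1</td><td>x</td><td>x</td><td>Titan</td><td>10.0.0.1</td>"
            + "<td>host1</td><td>x</td><td>up</td></tr>"
            + "<tr><td>2</td><td>x</td><td>x</td><td></td><td>10.0.0.2</td>"
            + "<td>host2</td><td>x</td><td>down</td></tr>"
            + "</table>";
        System.out.println(findDownHosts(html)); // prints [host2]
    }
}
```

Comparing with equalsIgnoreCase is a choice made for this sketch, so that "Down" and "down" are treated the same; use equals if the page is known to be consistent.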

Update: When you find the cluster name Titan, you can run another loop over the rows that follow and check whether their cluster name is empty; those rows belong to the same cluster.

Edit: I changed the while loop to a do-while loop.

    ArrayList<String> downServers = new ArrayList<>();
    Element table = doc.select("table").get(0); //select the first table.
    Elements rows = table.select("tr");

    for (int i = 1; i < rows.size(); i++) { //first row is the col names so skip it.
        Element row = rows.get(i);
        Elements cols = row.select("td");

        if (cols.get(3).text().equals("Titan")) {
            if (cols.get(7).text().equals("down"))
                downServers.add(cols.get(5).text());

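            // Scan the rows that follow: an empty cluster cell means the row
            // still belongs to the Titan cluster.
            if (i < rows.size() - 1) { // nothing to scan if Titan is the last row
                do {
                    i++;
                    row = rows.get(i);
                    cols = row.select("td");
                    if (cols.get(7).text().equals("down") && cols.get(3).text().isEmpty()) {
                        downServers.add(cols.get(5).text());
                    }
                } while (cols.get(3).text().isEmpty() && i < rows.size() - 1);

                if (!cols.get(3).text().isEmpty())
                    i--; // stopped on the next cluster's row (possibly another Titan); let the for loop revisit it
            }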
        }
    }

The downServers list will then contain the host names of the servers that are down.
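
As an alternative to the nested do-while loop, the same result can probably be reached in a single pass by remembering the last non-empty cluster name and treating rows with an empty cluster cell as part of that cluster. This is a different technique from the code above, offered as a sketch. It assumes that continuation rows really have an empty cell at index 3 and that jsoup is on the classpath; the names ClusterDownServers and downHostsInCluster are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ClusterDownServers {

    // Hypothetical helper: host names of "down" rows that belong to clusterName.
    static List<String> downHostsInCluster(String html, String clusterName) {
        List<String> downServers = new ArrayList<>();
        Element table = Jsoup.parse(html).select("table").first();
        if (table == null) {
            return downServers; // no table in the document
        }

        String currentCluster = ""; // last non-empty cluster name seen so far
        for (Element row : table.select("tr")) {
            Elements cols = row.select("td");
            if (cols.size() < 8) {
                continue; // header row made of <th> cells, or an incomplete row
            }
            String cluster = cols.get(3).text();
            if (!cluster.isEmpty()) {
                currentCluster = cluster; // a new cluster block starts on this row
            }
            if (currentCluster.equals(clusterName)
                    && cols.get(7).text().equals("down")) {
                downServers.add(cols.get(5).text());
            }
        }
        return downServers;
    }
}
```

Calling downHostsInCluster(html, "Titan") then plays the role of the loop above. Because the row index is never adjusted by hand, there is no special case for the last row or for two Titan blocks in a row.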


What I would do in your case is first create a class that represents a machine, with all the appropriate attributes. Then I would use jsoup to extract the data into an ArrayList of those objects, and apply whatever logic is needed to that list.

I am skipping the class definition (since it is not the issue here) and will call the class Machine.
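
For completeness, a minimal sketch of what such a class might look like. The fields are inferred from the accessors used in the snippets in this answer (setClusterName, setIp, setStatus, getStatus, getHostName); setHostName, the other getters, and the isDown helper are assumptions added for illustration, and a real table would likely need more fields.

```java
public class Machine {
    private String clusterName;
    private String ip;
    private String hostName;
    private String status;

    public String getClusterName() { return clusterName; }
    public void setClusterName(String clusterName) { this.clusterName = clusterName; }

    public String getIp() { return ip; }
    public void setIp(String ip) { this.ip = ip; }

    public String getHostName() { return hostName; }
    public void setHostName(String hostName) { this.hostName = hostName; }

    public String getStatus() { return status; }
    public void setStatus(String status) { this.status = status; }

    // Hypothetical convenience method: true when the status is "down", ignoring case.
    // Safe to call even if the status was never set.
    public boolean isDown() {
        return "down".equalsIgnoreCase(status);
    }
}
```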

Then using Jsoup I would get the row data like this:

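ArrayList<Machine> list = new ArrayList<>();
Document doc = Jsoup.parse(url, 3000); // url is a java.net.URL; 3000 is the timeout in milliseconds
for (Element table : doc.select("table")) { // visits every table; fine if your doc contains only one table element
  for (Element row : table.select("tr")) {
    Elements tds = row.select("td");
    if (tds.size() < 8) {
      continue; // skip rows without enough <td> cells, such as a <th> header row
    }
    Machine tmp = new Machine();
    tmp.setClusterName(tds.get(3).text());
    tmp.setIp(tds.get(4).text());
    tmp.setHostName(tds.get(5).text()); // needed for getHostName() below
    tmp.setStatus(tds.get(7).text());
    //.... and so on for the rest of attributes
    list.add(tmp);
  }
}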

Then use a loop to get the values you need from the list:

for (Machine x : list) {
  if (x.getStatus().equalsIgnoreCase("up")) {
    // machine with UP status found
    System.out.println("The Machine with up status is: " + x.getHostName());
  }
}

That's all. Please note that this code is untested and may contain some syntax errors, since it was written directly in this editor and not in an IDE.