How to parse HTML table using jsoup?
Yes, it is possible with jsoup. First, select the table, then select its <tr> elements to get the rows. Start from index 1, since the first row (index 0) contains only the column names. For each row, select the <td> cells and read the ones you need by index. In your case, indexes 7 and 5 matter (index 7: Status, index 5: Host Name). If the status equals down, add the host name to a list. That's all.
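    import java.util.ArrayList;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    // doc is an already parsed org.jsoup.nodes.Document
    ArrayList<String> downServers = new ArrayList<>();
    Element table = doc.select("table").get(0); // select the first table
    Elements rows = table.select("tr");
    for (int i = 1; i < rows.size(); i++) { // the first row holds the column names, so skip it
        Elements cols = rows.get(i).select("td");
        // skip rows that are too short to have a Status cell (index 7)
        if (cols.size() > 7 && cols.get(7).text().equals("down")) {
            downServers.add(cols.get(5).text());
        }
    }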
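Hard-coding indexes 7 and 5 breaks as soon as a column is added or moved. As a minimal sketch, assuming the text of every cell has already been collected into lists of strings (for example by calling text() on each cell that jsoup returns), the columns could instead be looked up by their header text. The class name HeaderLookup and the method downHosts are hypothetical and not part of jsoup:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of jsoup: it works on plain cell texts.
final class HeaderLookup {

    // table.get(0) is assumed to be the header row; every later entry is a data row.
    static List<String> downHosts(List<List<String>> table) {
        List<String> hosts = new ArrayList<>();
        if (table.isEmpty()) {
            return hosts;
        }
        List<String> header = table.get(0);
        int statusCol = header.indexOf("Status");
        int hostCol = header.indexOf("Host Name");
        if (statusCol < 0 || hostCol < 0) {
            throw new IllegalArgumentException("Status or Host Name column not found");
        }
        int needed = Math.max(statusCol, hostCol) + 1;
        for (List<String> row : table.subList(1, table.size())) {
            // Skip rows that are too short to contain both cells.
            if (row.size() >= needed && "down".equals(row.get(statusCol))) {
                hosts.add(row.get(hostCol));
            }
        }
        return hosts;
    }
}
```

The lookup fails loudly when a header is missing, which is usually easier to debug than silently reading the wrong column.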
Update:
When you find the cluster name Titan, you can run another loop over the rows that follow and keep going while the cluster name is empty, since those rows belong to the same cluster.
Edit: I changed the while loop to a do-while loop.
    ArrayList<String> downServers = new ArrayList<>();
    Element table = doc.select("table").get(0); // select the first table
    Elements rows = table.select("tr");
    for (int i = 1; i < rows.size(); i++) { // the first row holds the column names, so skip it
        Elements cols = rows.get(i).select("td");
        if (cols.size() < 8 || !cols.get(3).text().equals("Titan")) {
            continue;
        }
        // Check the Titan row itself, then each following row whose cluster name is empty.
        do {
            if (cols.get(7).text().equals("down")) {
                downServers.add(cols.get(5).text());
            }
            i++;
            if (i == rows.size()) {
                break; // ran past the last row
            }
            cols = rows.get(i).select("td");
        } while (cols.size() >= 8 && cols.get(3).text().isEmpty());
        i--; // step back so the for loop re-checks the row that ended the run (it may be another Titan)
    }
The downServers ArrayList will then contain the host names of the servers that are down.
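The nested loop can also be avoided by remembering the last non-empty cluster name while walking the rows once. This is a sketch of that alternative, not the code above: it assumes the cell texts of each data row are already available as lists of strings, and the names ClusterFilter and downHostsIn are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: a single pass that carries the cluster name forward.
final class ClusterFilter {
    // Column indexes taken from the answer above.
    private static final int CLUSTER = 3;
    private static final int HOST = 5;
    private static final int STATUS = 7;

    // rows is assumed to hold the cell texts of the data rows only (no header row).
    static List<String> downHostsIn(String wantedCluster, List<List<String>> rows) {
        List<String> hosts = new ArrayList<>();
        String current = ""; // last non-empty cluster name seen so far
        for (List<String> row : rows) {
            if (row.size() <= STATUS) {
                continue; // too short to contain a Status cell
            }
            String name = row.get(CLUSTER);
            if (!name.isEmpty()) {
                current = name; // an empty cell continues the previous cluster
            }
            if (current.equals(wantedCluster) && "down".equals(row.get(STATUS))) {
                hosts.add(row.get(HOST));
            }
        }
        return hosts;
    }
}
```

Because the loop index is never adjusted by hand, two Titan rows in a row or a Titan row at the very end of the table need no special handling.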
What I would do in your case is first create a class for your machine with all the appropriate attributes. Then, using jsoup, I would extract the data into an ArrayList of those objects and apply the filtering logic to that list.
I am skipping the class definition (since it is not the issue here) and will call the class Machine.
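For completeness, here is a minimal sketch of what such a Machine class might look like. The field set is an assumption based only on the getters and setters used in this answer (cluster name, IP, host name, status); the real table has more columns.

```java
// Hypothetical data holder; only the attributes used in this answer are shown.
class Machine {
    private String clusterName;
    private String ip;
    private String hostName;
    private String status;

    public String getClusterName() { return clusterName; }
    public void setClusterName(String clusterName) { this.clusterName = clusterName; }

    public String getIp() { return ip; }
    public void setIp(String ip) { this.ip = ip; }

    public String getHostName() { return hostName; }
    public void setHostName(String hostName) { this.hostName = hostName; }

    public String getStatus() { return status; }
    public void setStatus(String status) { this.status = status; }
}
```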
Then using Jsoup I would get the row data like this:
    ArrayList<Machine> list = new ArrayList<>();
    Document doc = Jsoup.parse(url, 3000); // url is a java.net.URL; 3000 is the timeout in milliseconds
    for (Element table : doc.select("table")) { // visits every table, so this assumes the doc contains only one
        for (Element row : table.select("tr")) {
            Elements tds = row.select("td");
            if (tds.size() < 8) {
                continue; // header row (no td cells) or incomplete row
            }
            Machine tmp = new Machine();
            tmp.setClusterName(tds.get(3).text());
            tmp.setIp(tds.get(4).text());
            tmp.setHostName(tds.get(5).text());
            tmp.setStatus(tds.get(7).text());
            //.... and so on for the rest of attributes
            list.add(tmp);
        }
    }
Then use a loop to get the values you need from the list:
    for (Machine x : list) {
        if (x.getStatus().equalsIgnoreCase("up")) {
            // machine with UP status found
            System.out.println("The Machine with up status is:" + x.getHostName());
        }
    }
That's all. Please also note that this code is not tested and may contain some syntax errors, as it was written directly in this editor and not in an IDE.