How to remove row which contains blank cell from csv file in Java
I'm trying to do data cleaning on dataset. by data cleaning i meant removing the row which containes NaN
or duplicates values or empty cell. here is my code
dataset look like this:
Sno Country noofDeaths
1 32432
2 Pakistan NaN
3 USA 3332
3 USA 3332
excel file image:
public class data_reader {
String filePath="src\\abc.csv";
public void readData() {
BufferedReader br = null;
String line = "";
HashSet<String> lines = new HashSet<>();
try {
br = new BufferedReader(new FileReader(filePath));
while ((line = br.readLine()) != null) {
if(!line.contains("NaN") || !line.contains("")) {
if (lines.add(line)) {
System.out.println(line);
}
}
}
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
} finally {
if (br != null) {
try {
br.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
}
}
it is working fine for NaN values and duplicates rows but not for empty cell, please help how to do this.
!line.contains("")
this is not working.
Condition !line.contains("") - doesn't make sence because every string contains empty string.
General suggestions:
- don't hard code file-path, code must be reusable;
- use try with resources;
- camel-case names.
public class DataReader {
public static void main(String[] args) {
new DataReader().readData("src\\abc.csv");
}
public void readData(String filePath) {
try(BufferedReader br = new BufferedReader(new FileReader(filePath))) {
HashSet<String> lines = new HashSet<>();
String line = null;
while ((line = br.readLine()) != null) {
if(!line.contains("NaN")) {
for (String cell: line.split(",")) {
if (!cell.isBlank()&&lines.add(cell)) {
System.out.print(cell + " ");
}
}
}
System.out.println();
}
} catch (IOException e) {
e.printStackTrace();
}
}
}