Getting "embedded nul(s) found in input" when reading a CSV with read.csv()

Solution 1:

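Your CSV might be encoded in UTF-16, which is not uncommon for files exported by some Windows-based tools. UTF-16 stores each ASCII character as two bytes, one of which is zero, so R sees those zero bytes as embedded nuls when it reads the file as a single-byte encoding.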

You can try loading a UTF-16 CSV like this:

read.csv("mycsv.csv", ..., fileEncoding="UTF-16LE")
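If you are not sure whether the file really is UTF-16, you can peek at its raw bytes first. This is only a rough sketch: looks_like_utf16 is a made-up helper name, and the 0.3 threshold for the share of zero bytes is an arbitrary assumption, not something from the R documentation. The temporary file just stands in for your own CSV.

```r
# Hypothetical helper: guess whether a file looks like UTF-16 from its first bytes.
looks_like_utf16 <- function(path, n = 1000L) {
  bytes <- readBin(path, what = "raw", n = n)
  if (length(bytes) < 2L) return(FALSE)
  # Byte-order marks: FF FE (little-endian) or FE FF (big-endian)
  has_bom <- identical(bytes[1:2], as.raw(c(0xff, 0xfe))) ||
    identical(bytes[1:2], as.raw(c(0xfe, 0xff)))
  # In UTF-16 encoded ASCII text, about every other byte is 00
  # (0.3 is an arbitrary cut-off chosen for this sketch)
  nul_share <- mean(bytes == as.raw(0))
  has_bom || nul_share > 0.3
}

# Stand-in for a real file: a tiny CSV written as UTF-16LE bytes (no BOM)
tmp16 <- tempfile(fileext = ".csv")
txt <- "id,name\n1,alpha\n2,beta\n"
writeBin(iconv(txt, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]], tmp16)

is_utf16 <- looks_like_utf16(tmp16)

# If the check says UTF-16, read with the matching fileEncoding
df16 <- read.csv(tmp16, fileEncoding = "UTF-16LE")
```

If the helper returns FALSE but you still get the warning, the file probably contains stray nul bytes rather than being UTF-16, and the skipNul approach is the one to try.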

Solution 2:

You can try using the skipNul = TRUE option.

mydata = read.csv("mycsv.csv", quote = "\"", skipNul = TRUE)

From ?scan, which read.csv uses internally:

Embedded nuls in the input stream will terminate the field currently being read, with a warning once per call to scan. Setting skipNul = TRUE causes them to be ignored.

It worked for me.
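To try this without touching your real data, here is a small reproduction. It is a sketch under assumptions: the file contents and the name tmp_nul are invented for illustration, with a single nul byte planted in the middle of one field.

```r
# Build a small CSV with one embedded nul byte inside the word "alpha"
tmp_nul <- tempfile(fileext = ".csv")
bytes <- c(charToRaw("id,name\n1,al"), as.raw(0), charToRaw("pha\n2,beta\n"))
writeBin(bytes, tmp_nul)

# skipNul = TRUE tells read.csv to drop the nul bytes while reading
clean <- read.csv(tmp_nul, skipNul = TRUE)
print(clean)
```

Note that this only makes sense for stray nuls. On a genuinely UTF-16 file, dropping the zero bytes may appear to work for plain ASCII content, but non-ASCII characters would be mangled, so fileEncoding is the safer choice there.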

Solution 3:

This has nothing to do with the encoding. The problem is that the file contains nul bytes. To handle them, pass the skipNul = TRUE parameter.

For example:

neg = scan('F:/Natural_Language_Processing/negative-words.txt', what = 'character', comment.char = '', encoding = "UTF-8", skipNul = TRUE)
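The same call can be checked on a throwaway file. This is a sketch with made-up contents (three words, one of them with a nul byte in the middle); tmp_words is a hypothetical name. The first call leaves skipNul at its default and captures the warning message that scan raises, the second one skips the nuls.

```r
# A tiny word list with a nul byte inside "awful"
tmp_words <- tempfile(fileext = ".txt")
writeBin(c(charToRaw("bad\nawf"), as.raw(0), charToRaw("ul\nworse\n")), tmp_words)

# Without skipNul: capture the warning message instead of the result
msg <- tryCatch(
  scan(tmp_words, what = "character", comment.char = "", quiet = TRUE),
  warning = function(w) conditionMessage(w)
)

# With skipNul = TRUE the nul is ignored and the word is read whole
words <- scan(tmp_words, what = "character", comment.char = "",
              encoding = "UTF-8", skipNul = TRUE, quiet = TRUE)
print(words)
```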