R's read.csv prepending 1st column name with junk text
I have exported data from a result grid in SQL Server Management Studio to a csv file. The csv file looks correct.
But when I read the data into an R dataframe using read.csv, the first column name is prepended with "ï..". How do I get rid of this junk text?
Example:
str(trainData)
'data.frame': 64169 obs. of 20 variables:
$ ï..Column1 : int 3232...
$ Column2 : int 4242...
The data looks something like this (nothing special) :
Column1,Column2
100116577,100116577
100116698,100116702
Solution 1:
You've got a Unicode UTF-8 BOM at the start of the file:
http://en.wikipedia.org/wiki/Byte_order_mark
A text editor or web browser interpreting the text as ISO-8859-1 or CP1252 will display the characters ï»¿ for this.
R is giving you the ï and then converting the other two into dots: read.csv makes column names syntactically valid by default (check.names = TRUE), and those two characters are not allowed in a name.
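To confirm that a BOM really is the cause, you can look at the first three bytes of the file. This is a minimal sketch: the temporary file stands in for the exported CSV, and the names tmp, first_bytes and has_bom are made up for the example.

```r
# Build a small stand-in file that starts with the UTF-8 BOM (EF BB BF).
# In practice, point readBin() at your own exported CSV instead.
bom <- as.raw(c(0xEF, 0xBB, 0xBF))
tmp <- tempfile(fileext = ".csv")
writeBin(c(bom, charToRaw("Column1,Column2\n100116577,100116577\n")), tmp)

# Read the first three bytes and compare them with the BOM signature.
first_bytes <- readBin(tmp, what = "raw", n = 3)
has_bom <- identical(first_bytes, bom)
print(has_bom)
```

If has_bom is TRUE for your file, the junk prefix comes from the BOM and not from the data itself.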
Here:
http://r.789695.n4.nabble.com/Writing-Unicode-Text-into-Text-File-from-R-in-Windows-td4684693.html
Duncan Murdoch suggests:
You can declare a file to be in encoding "UTF-8-BOM" if you want to ignore a BOM on input
So try your read.csv with fileEncoding="UTF-8-BOM", or persuade your SQL export to not output a BOM.
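A minimal sketch of the fileEncoding approach, assuming a reasonably recent R. The temporary file is a stand-in for the real export, built with a BOM so the example is self-contained; trainData is the name used in the question.

```r
# Stand-in for the exported file: UTF-8 BOM followed by the sample rows.
bom <- as.raw(c(0xEF, 0xBB, 0xBF))
csv_text <- "Column1,Column2\n100116577,100116577\n100116698,100116702\n"
tmp <- tempfile(fileext = ".csv")
writeBin(c(bom, charToRaw(csv_text)), tmp)

# Declaring the encoding as UTF-8-BOM tells R to skip the BOM on input.
trainData <- read.csv(tmp, fileEncoding = "UTF-8-BOM")
print(names(trainData))
```

With your own file, replace tmp with the path to the CSV; the rest of the read.csv call stays the same.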
Otherwise you may as well test if the first name starts with ï.. and strip it with substr (as long as you know you'll never have a column that does start like that genuinely...).
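The substr fallback could look something like this. It is only a sketch: strip_bom_prefix is a hypothetical helper, not a base R function, and it assumes the junk prefix is exactly "ï.." as in the question.

```r
# Hypothetical helper: drop a leading "ï.." from the first column name only.
# "\u00ef" is ï, written as an escape so the script does not depend on
# the encoding of the source file.
strip_bom_prefix <- function(col_names, prefix = "\u00ef..") {
  first <- col_names[1]
  n <- nchar(prefix)
  # Only touch the name if it really starts with the junk prefix.
  if (!is.na(first) && substr(first, 1, n) == prefix) {
    col_names[1] <- substr(first, n + 1, nchar(first))
  }
  col_names
}

# Example with the column names from the question.
cleaned <- strip_bom_prefix(c("\u00ef..Column1", "Column2"))
print(cleaned)
```

On a real data frame you would apply it as names(trainData) <- strip_bom_prefix(names(trainData)). Names without the prefix are returned unchanged.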