read.csv, header on first line, skip second line

Solution 1:

This should do the trick:

all_content = readLines("file.csv")
skip_second = all_content[-2]
dat = read.csv(textConnection(skip_second), header = TRUE, stringsAsFactors = FALSE)

The first step, readLines, reads the entire file into a character vector in which each element is one line of the file. The second step discards the second line, relying on the fact that a negative index in R selects everything except that position. Finally, textConnection lets read.csv treat the remaining lines as if they were a file and parse them into a data.frame.
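As a self-contained illustration of the same idea, here is a minimal sketch that writes a small example file and then drops its second line. The file contents and the temporary path are assumptions made up for the demo; it also passes the lines through read.csv's text argument instead of an explicit textConnection, which should behave the same way.

```r
# Hypothetical example file: header, a units line, then two data rows.
path <- tempfile(fileext = ".csv")
writeLines(c("var1,var2", "units1,units2", "2.3,6.8", "4.5,6.7"), path)

all_content <- readLines(path)   # character vector, one element per line
skip_second <- all_content[-2]   # drop the units line, keep the header

# `text =` takes the lines directly, so no explicit connection is needed.
dat <- read.csv(text = skip_second, header = TRUE, stringsAsFactors = FALSE)
str(dat)
```

Because the units line never reaches the parser, both columns should come back as numeric.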

Solution 2:

You can also read the whole file and then drop the first row(s) after the header from the resulting data frame, which lets you do it in one line:

df <- read.csv("test.txt", header = TRUE)[-1, ]

If my data file "test.txt" is the following:

var1, var2
units1, units2
2.3,6.8
4.5,6.7

this gives me

> read.csv("test.txt", header = TRUE)[-1, ]
  var1 var2
2  2.3  6.8
3  4.5  6.7

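This answers your question exactly, but to generalize, you can drop rows N through M in the same way:

df <- read.csv("test.txt", header = TRUE)[-(N:M), ]

where N and M are integers with N <= M. Note that they index rows of the data frame, not lines of the file: the header is not counted, so row 1 is line 2 of the file.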

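Note: Because the units row is read as data, this method reads every column as text rather than as numbers. In R versions before 4.0.0, where stringsAsFactors defaults to TRUE, the columns become factors, as in the output below; from 4.0.0 onward they become character vectors. Either way they have to be converted before you can do arithmetic on them.

str(read.csv("test.txt", header = TRUE)[-1, ])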
# 'data.frame': 2 obs. of  2 variables:
#   $ var1: Factor w/ 3 levels "2.3","4.5","units1": 1 2
#   $ var2: Factor w/ 3 levels " units2","6.7",..: 3 2
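One way to recover numeric columns after dropping the units row is sketched below. The example file is made up for the demo, and converting every column to numeric is an assumption that only holds when all remaining values are numbers.

```r
# Hypothetical example file with a units line under the header.
path <- tempfile(fileext = ".txt")
writeLines(c("var1,var2", "units1,units2", "2.3,6.8", "4.5,6.7"), path)

df <- read.csv(path, header = TRUE)[-1, ]

# Go through as.character() first: calling as.numeric() directly on a
# factor returns the internal level codes, not the printed values.
df[] <- lapply(df, function(col) as.numeric(as.character(col)))
rownames(df) <- NULL  # renumber the rows starting from 1
str(df)
```

The as.character() step is harmless when the columns are already character vectors, so the same code should work on both older and newer R versions.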

Solution 3:

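On Linux or macOS, data.table::fread can run a shell command and read its output, so you can delete the line with sed before the file is parsed:

data.table::fread("sed -e '2d' myfile.txt", data.table = FALSE)

This removes line 2 of the file before fread sees it, and data.table = FALSE returns a plain data.frame. Recent data.table versions also accept the command through the cmd argument, as in fread(cmd = ...).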
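Where sed is not available (for example on Windows), a portable base R sketch is to read the header and the body in two passes. This is a different technique from the fread call, not an equivalent of it, and the example file is an assumption made up for the demo.

```r
# Hypothetical example file with a units line under the header.
path <- tempfile(fileext = ".txt")
writeLines(c("var1,var2", "units1,units2", "2.3,6.8", "4.5,6.7"), path)

# Pass 1: read just enough of the file to get the column names.
header <- names(read.csv(path, nrows = 1, header = TRUE))

# Pass 2: skip the header and the units line, then apply the saved names.
dat <- read.csv(path, skip = 2, header = FALSE, col.names = header)
str(dat)
```

Since the units line is skipped before parsing, the columns should be detected as numeric without any later conversion.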