Reading two-line headers in R
Solution 1:
I would do two steps, assuming we know that the first row contains the labels, and there are always two headers.
header <- scan("file.txt", nlines = 1, what = character())
data <- read.table("file.txt", skip = 2, header = FALSE)
Then add the character vector header
on as the names
component:
names(data) <- header
For your data this would be
header <- scan("data.txt", nlines = 1, what = character())
data <- read.table("data.txt", skip = 2, header = FALSE)
names(data) <- header
head(data)
> head(data)
trt biomass yield
1 C2 17.76 205.92
2 C2 17.96 207.86
3 CC 17.72 197.22
4 CC 18.42 205.20
5 CCW 18.15 200.51
6 CCW 17.45 190.59
If you want the units, as per @DWin's answer, then do a second scan()
on line 2
header2 <- scan("data.txt", skip = 1, nlines = 1, what = character())
names(data) <- paste0(header, header2)
> head(data)
trtcrop biomassMg/ha yieldbu/ac
1 C2 17.76 205.92
2 C2 17.96 207.86
3 CC 17.72 197.22
4 CC 18.42 205.20
5 CCW 18.15 200.51
6 CCW 17.45 190.59
Solution 2:
Use readLines
with 2 for the limit, parse it, paste0
them together, then read in with read.table
with skip =2
and header=FALSE
(the default). Finish the process off with assignment of the column names:
dat <- "trt biomass yield
crop Mg/ha bu/ac
C2 17.76 205.92
C2 17.96 207.86
CC 17.72 197.22
CC 18.42 205.20
CCW 18.15 200.51
CCW 17.45 190.59
P 3.09 0.00
P 3.34 0.00
S2 5.13 49.68
S2 5.36 49.72
"
You would probably use a file argument but using the text
argument to the read-functions makes this more self-contained:
readLines(textConnection(dat),n=2)
#[1] "trt\tbiomass\tyield" "crop\tMg/ha\tbu/ac"
head2 <- read.table(text=readLines(textConnection(dat),n=2), sep="\t", stringsAsFactors=FALSE)
with(head2, paste0(head2[1,],head2[2,]) )
# [1] "trtcrop" "biomassMg/ha" "yieldbu/ac"
joinheadrs <- with(head2, paste0(head2[1,],head2[2,]) )
newdat <- read.table(text=dat, sep="\t",skip=2)
colnames(newdat)<- joinheadrs
#-------------------
> newdat
trtcrop biomassMg/ha yieldbu/ac
1 C2 17.76 205.92
2 C2 17.96 207.86
3 CC 17.72 197.22
4 CC 18.42 205.20
5 CCW 18.15 200.51
6 CCW 17.45 190.59
7 P 3.09 0.00
8 P 3.34 0.00
9 S2 5.13 49.68
10 S2 5.36 49.72
Might be better to use paste with an underscore-sep:
joinheadrs <- with(head2, paste(head2[1,],head2[2,] ,sep="_") )
joinheadrs
#[1] "trt_crop" "biomass_Mg/ha" "yield_bu/ac"
Solution 3:
Almost the same method to the other answers, just shortening to 2 statements:
dat <- "trt biomass yield
crop Mg/ha bu/ac
C2 17.76 205.92
C2 17.96 207.86
CC 17.72 197.22
CC 18.42 205.20
CCW 18.15 200.51
CCW 17.45 190.59
P 3.09 0.00
P 3.34 0.00
S2 5.13 49.68
S2 5.36 49.72"
header <- sapply(read.table(text=dat, nrow=2), paste, collapse="_")
result <- read.table(text=dat, skip=2, col.names=header)
Result:
> head(result,2)
trt_crop biomass_Mg/ha yield_bu/ac
1 C2 17.76 205.92
2 C2 17.96 207.86
...
Solution 4:
A slightly different explained step by step approach:
-
Read only the first two lines of the files as data (without headers):
headers <- read.table("data.txt", nrows=2, header=FALSE)
-
Create the headers names with the two (or more) first rows,
sappy
allows to make operations over the columns (in this case paste) - read more about sapply here :headers_names <- sapply(headers,paste,collapse="_")
-
Read the data of the files (skipping the first 2 rows):
data <- read.csv(file="data.txt", skip = 2, header=FALSE)
-
And assign the headers of step two to the data:
names(data) <- headers_names
The advantage is that you would have clear control of the the parameters of read.table (such as sep
for commas, and stringAsFactors
- for both the headers and the data)