Add an index (numeric ID) column to large data frame [duplicate]
I have read a large CSV file into a data frame. The data in the CSV file come from multiple websites and represent user information. For example, here is the structure of the data frame:
user_id, number_of_logins, number_of_images, web
001, 34, 3, aa.com
002, 4, 4, aa.com
034, 3, 3, aa.com
001, 12, 4, bb.com
002, 1, 3, bb.com
034, 2, 2, cc.com
As you can see, once I bring the data into the data frame, user_id is no longer a unique ID, and this causes problems for all the analysis. I am trying to add another column before user_id, something like "generated_uid", and fill it with the index of the data.frame. What's the best way to accomplish this?
Solution 1:
You can add a sequence of numbers very easily with
data$ID <- seq.int(nrow(data))
If you are already using library(tidyverse), you can use
data <- tibble::rowid_to_column(data, "ID")
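As a small runnable sketch of the base R approach, the snippet below builds a toy data frame from the question's sample rows (the names `data` and `empty` and the values are for illustration only) and adds the ID column. It swaps in seq_len() for seq.int() as a defensive variant: for a data frame with zero rows, seq.int(0) yields c(1, 0), whereas seq_len(0) yields an empty vector.

```r
# Toy data frame mirroring the question's sample rows (illustrative only)
data <- data.frame(
  user_id = c("001", "002", "034", "001", "002", "034"),
  number_of_logins = c(34, 4, 3, 12, 1, 2),
  number_of_images = c(3, 4, 3, 4, 3, 2),
  web = c("aa.com", "aa.com", "aa.com", "bb.com", "bb.com", "cc.com"),
  stringsAsFactors = FALSE
)

# Zero-row copy, taken before the ID column exists, to show the edge case
empty <- data[0, ]

# One sequential integer per row
data$ID <- seq_len(nrow(data))

# The same assignment on the empty data frame adds an empty column without error
empty$ID <- seq_len(nrow(empty))
```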
Solution 2:
Alternatively, using the dplyr package:
library("dplyr") # or library("tidyverse")
df <- df %>% mutate(id = row_number())
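If the generated ID should come before user_id, as the question asks, a hedged variant of the same call is sketched below. It assumes dplyr 1.0.0 or later, where mutate() accepts a .before argument; the toy `df` is for illustration only.

```r
library(dplyr)

# Toy data frame with a few of the question's sample rows (illustrative only)
df <- data.frame(
  user_id = c("001", "002", "001"),
  web = c("aa.com", "aa.com", "bb.com")
)

# row_number() numbers the rows 1..n; .before = 1 places the new column first
df <- df %>% mutate(id = row_number(), .before = 1)
```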
Solution 3:
If your data.frame is a data.table, you can use the special symbol .I:
data[, ID := .I]
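A minimal sketch of this, assuming the data.table package is installed; the toy table is illustrative. Because := appends the new column at the end, setcolorder() is used here as one way to move ID ahead of user_id.

```r
library(data.table)

# Toy data.table with a few of the question's sample rows (illustrative only)
data <- data.table(
  user_id = c("001", "002", "001"),
  web = c("aa.com", "aa.com", "bb.com")
)

# .I holds the row numbers; := adds the column by reference
data[, ID := .I]

# Move ID to the front; the remaining columns keep their relative order
setcolorder(data, "ID")
```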
Solution 4:
If I understand you correctly, you can do something like the following. To demonstrate, I first create a data.frame from your example:
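# Read the sample rows as character fields; strip.white = TRUE drops the
# spaces that follow each comma
df <-
scan(what = character(), sep = ",", strip.white = TRUE, text =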
"001, 34, 3, aa.com
002, 4, 4, aa.com
034, 3, 3, aa.com
001, 12, 4, bb.com
002, 1, 3, bb.com
034, 2, 2, cc.com")
df <- as.data.frame(matrix(df, 6, 4, byrow = TRUE))
colnames(df) <- c("user_id", "number_of_logins", "number_of_images", "web")
You can then run one of the following lines to add a column (at the end of the data.frame) with the row number as the generated user ID. The second line simply adds leading zeros.
df$generated_uid <- 1:nrow(df)
df$generated_uid2 <- sprintf("%03d", 1:nrow(df))
If you absolutely want the generated user id to be the first column, you can add the column like so:
df <- cbind("generated_uid3" = sprintf("%03d", 1:nrow(df)), df)
or simply rearrange the columns.
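Rearranging can be done in base R by indexing with the desired column order. The sketch below uses a small stand-in for df (illustrative values only): it puts the new column's name first and appends the remaining names with setdiff().

```r
# Small stand-in for the df built above (illustrative values only)
df <- data.frame(
  user_id = c("001", "002", "034"),
  web = c("aa.com", "aa.com", "aa.com")
)
df$generated_uid <- sprintf("%03d", seq_len(nrow(df)))

# New column first, then every other column in its existing order
df <- df[, c("generated_uid", setdiff(names(df), "generated_uid"))]
```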