How to create an MD5 hash of a column in R?

The digest package is well suited to this task, so first we load it:

library(digest)

Then create (or load) a test data.frame df:

txt <-
"ID,VID
1,xyz-0001
2,abc-0987"

df <- read.table(header=TRUE, text=txt, sep=",", stringsAsFactors=FALSE)
df

The initial data looks like:

  ID      VID
1  1 xyz-0001
2  2 abc-0987

Then we can apply the digest function to each element of the column, specifying the algorithm:

df$VID <- sapply(df$VID, digest, algo="md5")
df

Now the VID column in df is hashed:

  ID                              VID
1  1 44e3a9cf85f802ef50f18e64e01c5e32
2  2 c576ff180b2046c1a3ae939766588fd3

As an addition to redmode's answer:

library(digest)
txt <- "hello world"
hash <- digest(txt, algo="md5", serialize=FALSE)
hash

[1] "5eb63bbbe01eeed093cb22bb8f5acdc3"

Setting the serialize option to FALSE makes digest hash the raw string rather than its serialized R representation, so the result is consistent with what you would get from online hash generators.
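To apply the same serialize setting to a whole column, something like the following sketch should work. The data.frame mirrors the test data from the first answer; the VID_md5 column name is just an illustrative choice, and vapply is used instead of sapply so the result is a plain, unnamed character vector.

```r
library(digest)

# Example data mirroring the data.frame from the first answer
df <- data.frame(ID = 1:2,
                 VID = c("xyz-0001", "abc-0987"),
                 stringsAsFactors = FALSE)

# Hash each raw string (not its serialized R representation).
# character(1) tells vapply that every call returns a single string;
# USE.NAMES = FALSE stops the input strings from becoming element names.
# VID_md5 is a hypothetical column name used only for this illustration.
df$VID_md5 <- vapply(df$VID, digest, character(1),
                     algo = "md5", serialize = FALSE,
                     USE.NAMES = FALSE)
df
```

Writing to a new column keeps the original values around for comparison; assign back to df$VID instead if you want to overwrite them.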


Another option is to install the openssl package and use its MD5 hashing function. It is vectorised, so unlike with digest you won't have to use sapply on it.

library(openssl)

df$VID <- md5(df$VID)

This will replace the strings in the VID column with their MD5 hashes.

Note: This function requires its input to be of character type, so if you want to use it on a column of integers you will need to convert them with as.character first.
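As a minimal sketch of that conversion, assuming the same test data as in the first answer (the ID_md5 column name is only illustrative):

```r
library(openssl)

# Example data mirroring the data.frame from the first answer
df <- data.frame(ID = 1:2,
                 VID = c("xyz-0001", "abc-0987"),
                 stringsAsFactors = FALSE)

# ID is an integer column, so convert it to character before hashing.
# ID_md5 is a hypothetical column name used only for this illustration.
df$ID_md5 <- md5(as.character(df$ID))
df
```

Note that this hashes the text form of each number ("1", "2", ...), so the same integer formatted differently (for example "01") would give a different hash.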