How to create an MD5 hash of a column in R?
The digest package is well suited to this task. First, load it:
library(digest)
Then create or load a test data.frame df:
txt <-
"ID,VID
1,xyz-0001
2,abc-0987"
df <- read.table(header=TRUE, text=txt, sep=",", stringsAsFactors=FALSE)
df
The initial data looks like:
  ID      VID
1  1 xyz-0001
2  2 abc-0987
Then apply the digest function to each value in the column, specifying the algorithm:
df$VID <- sapply(df$VID, digest, algo="md5")
df
Now the VID column in df is hashed:
  ID                              VID
1  1 44e3a9cf85f802ef50f18e64e01c5e32
2  2 c576ff180b2046c1a3ae939766588fd3
As an addition to redmode's answer:
library(digest)
txt <- "hello world"
hash <- digest(txt, algo="md5", serialize=FALSE)
hash
[1] "5eb63bbbe01eeed093cb22bb8f5acdc3"
Setting the serialize option to FALSE makes digest hash the raw string rather than its serialized R representation, so the result is consistent with what you would get from online hash generators.
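To apply the same setting to a whole column, something like the following sketch should work. It rebuilds the example data frame from redmode's answer; hash_md5 is a hypothetical helper name introduced here, not part of digest.

```r
library(digest)

# Example data, mirroring the data frame from the earlier answer
df <- data.frame(ID = 1:2, VID = c("xyz-0001", "abc-0987"),
                 stringsAsFactors = FALSE)

# hash_md5 is a hypothetical helper: it hashes each string as raw text
# (serialize = FALSE) and returns a plain, unnamed character vector
hash_md5 <- function(x) {
  vapply(x, digest, character(1), algo = "md5", serialize = FALSE,
         USE.NAMES = FALSE)
}

df$VID_md5 <- hash_md5(df$VID)
df
```

The hashes produced this way will differ from those in the earlier output, because those were computed with the default serialize=TRUE.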
Another option is to install the openssl package and use its md5 function. Unlike digest, it is vectorised, so you won't have to use sapply on it.
library(openssl)
df$VID <- md5(df$VID)
This will replace the values in the VID column with their MD5 hashes.
Note: This function requires character input, so if you want to use it on a column of integers you will need to convert them first with the as.character function.
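As a rough illustration of the note above, the integer ID column could be hashed like this. The data frame is the same example as before, and the ID_md5 column name is just an assumption for the sketch.

```r
library(openssl)

# Example data: ID is an integer column, VID is character
df <- data.frame(ID = 1:2, VID = c("xyz-0001", "abc-0987"),
                 stringsAsFactors = FALSE)

# Convert the integers to character before hashing; the outer
# as.character() stores the result as a plain character column
df$ID_md5 <- as.character(md5(as.character(df$ID)))
df
```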