State name to abbreviation

R has two built-in constants that might help: state.abb with the abbreviations, and state.name with the full names. Here is a simple usage example:

> x <- c("New York", "Virginia")
> state.abb[match(x,state.name)]
[1] "NY" "VA"
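The same match() lookup can be wrapped in a small helper. This is a minimal sketch: the function name state_to_abb is hypothetical, and the case-insensitive comparison via tolower() is an addition not in the answer above. Names that are not found come back as NA.

```r
# Hypothetical helper: convert full state names to postal abbreviations
# using the built-in state.name / state.abb vectors (datasets package).
state_to_abb <- function(x) {
  # match() gives the position of each element of x in state.name,
  # or NA when there is no exact match; comparing in lower case
  # makes the lookup case-insensitive.
  state.abb[match(tolower(x), tolower(state.name))]
}

state_to_abb(c("New York", "virginia", "Narnia"))
# returns c("NY", "VA", NA)
```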

1) grep the full name from state.name and use that to index into state.abb:

state.abb[grep("New York", state.name)]
## [1] "NY"
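One caveat with (1): grep() treats its pattern as a regular expression and matches substrings, so a name contained in another name returns both. A short illustration with the built-in vectors, plus an anchored pattern that only accepts whole-name matches:

```r
# "Virginia" is a substring of "West Virginia", so grep() finds both.
state.abb[grep("Virginia", state.name)]
# returns c("VA", "WV")

# Anchoring the pattern (^ ... $) restricts it to the whole name;
# ignore.case = TRUE also makes the match case-insensitive.
state.abb[grep("^virginia$", state.name, ignore.case = TRUE)]
# returns "VA"
```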

1a) or using which:

state.abb[which(state.name == "New York")]
## [1] "NY"
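The == comparison in (1a) is meant for a single name. With a vector, R recycles it against all 50 names element by element and can silently drop states; %in% avoids that but returns results in state.name order. A short sketch comparing these with match(), which keeps the order of the input:

```r
x <- c("Virginia", "Idaho")

# == recycles x along state.name, so odd positions are compared with
# "Virginia" and even positions with "Idaho": Virginia (position 46)
# is compared with "Idaho" and silently missed.
state.abb[which(state.name == x)]
# returns "ID"

# %in% finds both, but in the order of state.name, not of x.
state.abb[which(state.name %in% x)]
# returns c("ID", "VA")

# match() finds both and keeps the order of x.
state.abb[match(x, state.name)]
# returns c("VA", "ID")
```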

2) or create a vector of state abbreviations whose names are the full names and index into it using the full name:

setNames(state.abb, state.name)["New York"]
## New York 
##     "NY"

Unlike (1) and (1a), this works even when "New York" is replaced by a vector of full state names, e.g. setNames(state.abb, state.name)[c("New York", "Idaho")]
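Building the named vector once and reusing it keeps repeated lookups short. A minimal sketch; the variable name abb_lookup is just for illustration. unname() drops the names attribute, and a name missing from the table gives NA:

```r
# Lookup table: abbreviations named by their full state names.
abb_lookup <- setNames(state.abb, state.name)

# Index with a vector of full names; unname() returns a plain
# character vector instead of a named one.
unname(abb_lookup[c("New York", "Idaho")])
# returns c("NY", "ID")

# Names not in the table ("Puerto Rico" is not one of the 50 states) give NA.
unname(abb_lookup[c("Idaho", "Puerto Rico")])
# returns c("ID", NA)
```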

The built-in state.name and state.abb cover only the 50 states. I needed a bigger table (including DC and so on), so I copied one from online (e.g., this link: http://www.infoplease.com/ipa/A0110468.html) into a .csv file named States.csv and loaded the state names and abbreviations from that file instead of using the built-in vectors. The rest is quite similar to @Aniko's answer:

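library(stringdist)  # provides amatch(); dplyr and stringr are not needed here

# setwd(...)  # point this at the folder that contains States.csv
# load data
data <- c("NY", "New York", "NewYork")
data <- toupper(data)

# load state names and abbreviations
# (States.csv is assumed to have columns named State and Abb)
State.data <- read.csv("States.csv", stringsAsFactors = FALSE)
State      <- toupper(State.data$State)
Stateabb   <- State.data$Abb

# approximate match of data against state names: at most one edit
# (insertion, deletion, substitution or transposition, amatch's default
# "osa" distance) is allowed
idx <- amatch(data, State, maxDist = 1)
data[!is.na(idx)] <- Stateabb[na.omit(idx)]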

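Note the difference between match and amatch: match only accepts exact matches, whereas amatch also accepts approximate ones, i.e. strings within a maximum distance (maxDist) of each other. See pp. 25-26 here for how these distances are calculated: http://cran.r-project.org/doc/contrib/de_Jonge+van_der_Loo-Introduction_to_data_cleaning_with_R.pdf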
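For comparison, a similar approximate lookup can be sketched in base R without stringdist or a CSV file, using utils::adist(), which computes a generalized Levenshtein distance. This is a swapped-in technique, not the amatch() code above: adist() counts a transposition ("Texsa" for "Texas") as two edits, while amatch()'s default "osa" distance counts it as one. The built-in vectors plus a District of Columbia entry stand in for States.csv, and fuzzy_state_abb is a hypothetical name.

```r
# Lookup table: the 50 built-in states plus the District of Columbia
# (an example extra entry; others could be appended the same way).
all_names <- toupper(c(state.name, "District of Columbia"))
all_abb   <- c(state.abb, "DC")

# Hypothetical helper: replace names within max_dist edits of a state
# name by its abbreviation and leave everything else (upper-cased) as is.
fuzzy_state_abb <- function(data, max_dist = 1) {
  data <- toupper(data)                 # assumes data has no NA values
  d <- adist(data, all_names)           # d[i, j]: distance data[i] -> all_names[j]
  best      <- apply(d, 1, which.min)   # closest name for each input
  best_dist <- d[cbind(seq_along(data), best)]
  ok <- best_dist <= max_dist
  data[ok] <- all_abb[best[ok]]
  data
}

# Exact matching only finds the correctly spelled name.
match(c("NEW YORK", "NEWYORK"), all_names)
# returns c(32, NA)

fuzzy_state_abb(c("NY", "New York", "NewYork", "District of Colombia"))
# returns c("NY", "NY", "NY", "DC")

# One insertion is accepted; a transposition costs 2 here, so "TEXSA" is kept.
fuzzy_state_abb(c("Viriginia", "Texsa"))
# returns c("VA", "TEXSA")
```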
