Is there a way to determine how offensive a word is? [closed]

Is there a way to determine how offensive a word is?

Yes, there are several ways, depending on your needs: Do you need to know how offensive a word is, or is it enough to know whether it is offensive? How specific does the data need to be? How many words should be analyzed? Do you have the resources to conduct your own experiment? Or maybe you want to try data mining?

The question is quite broad, so I've tried to cover a variety of techniques and resources. Hopefully you can find something that works for you.


The dictionary method

Sometimes, you can just look in the dictionary and find your answer. In any case, it's a good place to start.

For example, you might wonder about the word Eskimo. You can look it up in Oxford Dictionaries and you'll see:

In recent years the word Eskimo has come to be regarded as offensive (partly through the associations of the now discredited etymology ‘one who eats raw flesh’). The peoples inhabiting the regions from the central Canadian Arctic to western Greenland prefer to call themselves Inuit. The term Eskimo, however, continues to be the only term which can be properly understood as applying to the people as a whole and is still widely used in anthropological and archaeological contexts

In many cases, it won't be that obvious, but you can still infer the connotation from the definition. See this entry in Oxford Dictionaries for cult:

a relatively small group of people having religious beliefs or practices regarded by others as strange or sinister

The word "cult" may be offensive because it implies that the group in question is strange or sinister.

You may also want to look in slang dictionaries, such as Urban Dictionary or The New Partridge Dictionary of Slang and Unconventional English.


Data mining

Data mining is probably your best bet if you want to examine a lot of words, especially words in context. There are a lot of resources out there if you know how to find (and use) them.

  • SentiWordNet:

    SentiWordNet is a lexical resource for opinion mining. SentiWordNet assigns to each synset of WordNet three sentiment scores: positivity, negativity, objectivity.

  • See also my answer on Meta Stack Overflow (MSO) about detecting offensive comments. There I link to several papers and to a dataset of text labeled as offensive or not offensive.
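
As a rough illustration of how SentiWordNet-style scores could be used, here is a minimal sketch. It assumes each sense of a word carries a positivity and a negativity score, with objectivity being whatever remains (1 minus the other two). The `TOY_SCORES` table and both helper functions are hypothetical and the numbers are made up; real scores would come from the SentiWordNet data itself (for example through NLTK's `sentiwordnet` corpus reader, if you have it installed).

```python
from statistics import mean
from typing import Optional

# Hypothetical (positivity, negativity) pairs, one per sense of each word.
# These numbers are invented for illustration; they are NOT real SentiWordNet data.
TOY_SCORES = {
    "cult": [(0.0, 0.5), (0.125, 0.25)],
    "table": [(0.0, 0.0)],
}


def objectivity(pos: float, neg: float) -> float:
    """SentiWordNet-style objectivity: what is left after positivity and negativity."""
    return 1.0 - (pos + neg)


def mean_negativity(word: str, scores=TOY_SCORES) -> Optional[float]:
    """Average negativity over all listed senses, or None if the word is unknown."""
    senses = scores.get(word.lower())
    if not senses:
        return None
    return mean(neg for _pos, neg in senses)


if __name__ == "__main__":
    for w in ("cult", "table", "unknownword"):
        print(w, mean_negativity(w))
```

Keep in mind that a negativity score measures sentiment, not offensiveness: a word can be strongly negative without being offensive, so a score like this is at best a first filter.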


The research approach

There are also dozens of studies that look into exactly how offensive certain words are (the words studied vary from paper to paper). All of the studies I've found use surveys of some sort (almost always on college students).

Here are some papers I found:

  • A sociolinguistic analysis of swear word offensiveness
    • Offensiveness of 12 swear words based on gender and race
  • Thirty shades of offensiveness: L1 and LX English users' understanding, perception and self-reported use of negative emotion-laden words
    • Offensiveness of 30 words to native vs non-native speakers
  • Taboo, emotionally valenced, and emotionally neutral word norms
    • Used "92 taboo words, 184 emotionally valenced words, and 184 emotionally neutral words"
    • Word data in ZIP file
  • Affective norms for 210 British English and Finnish nouns
    • Word data also in ZIP file
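
If you do run your own survey, or download the word data from one of these papers, the analysis usually starts by averaging ratings per word and per respondent group. The sketch below assumes a simple (word, group, rating) layout on a 1 to 5 scale; the placeholder words, group names, ratings, and the `mean_ratings` helper are all hypothetical and do not reflect the format of any of the datasets above.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical survey responses: (word, respondent group, rating on a 1-5 scale).
# Words, groups, and ratings are invented placeholders.
RESPONSES = [
    ("word_a", "group_1", 4),
    ("word_a", "group_1", 5),
    ("word_a", "group_2", 2),
    ("word_b", "group_1", 1),
    ("word_b", "group_2", 2),
    ("word_b", "group_2", 3),
]


def mean_ratings(responses):
    """Return {word: {group: mean rating}} from (word, group, rating) rows."""
    buckets = defaultdict(lambda: defaultdict(list))
    for word, group, rating in responses:
        buckets[word][group].append(rating)
    return {
        word: {group: mean(ratings) for group, ratings in groups.items()}
        for word, groups in buckets.items()
    }


if __name__ == "__main__":
    for word, groups in mean_ratings(RESPONSES).items():
        print(word, groups)
```

Comparing the per-group means for the same word is the kind of breakdown the gender, race, and native vs non-native comparisons above rely on, although the papers themselves use more careful statistics than a plain mean.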

Other resources

  • Offensive words for people according to nationality or ethnicity (Macmillan Dictionary)
  • Attitudes to potentially offensive language and gestures on TV and radio (Ofcom)
  • Swear Word List & Curse Filter (No Swearing)
    • User-submitted, but entries have to be approved by the site
    • Also an API
  • You Swear
    • Offers a number of English dialects (and some joke languages)
    • Potentially unreliable, since the content is user-generated and user-voted