How to replace special characters with gsub in R?

I have a text that is written with the old versions of the Romanian letters.

Old                                  New
ş (s with a cedilla), UTF-8: c59f    ș (s with a comma), UTF-8: c899
ţ (t with a cedilla), UTF-8: c5a3    ț (t with a comma), UTF-8: c89b

When I export the text from R into a text file, this causes problems (these special letters are exported as s and t). I've manually changed some of the letters, and they were exported correctly.

How can I replace the old versions of these letters with the new ones in R?

So far I have tried:

x <- "ş__s"
gsub("ş", "ș", x) # this replaces the letter s also (output: s__s)
gsub("\xc5\x9f", "\xc8\x99", x) # this does nothing
gsub("c59f", "c899", x) # this does nothing

I hope this is explained clearly enough. Thank you in advance for your responses.


Solution 1:

If writing the characters as-is does not work, you can try Unicode escape sequences instead. Here are the Unicode code points of the relevant letters, from Wikipedia.

ş  U+015F (351)  https://en.wikipedia.org/wiki/%C5%9E
ţ  U+0163 (355)  https://en.wikipedia.org/wiki/%C5%A2

ș  U+0219 (537)  https://en.wikipedia.org/wiki/S-comma
ț  U+021B (539)  https://en.wikipedia.org/wiki/T-comma
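
To see how these code points relate to the UTF-8 bytes listed in the question, you can inspect the raw bytes of each escaped character. This is only a small sketch; to_hex is a hypothetical helper name, not a base R function.

```r
# Hypothetical helper: show the UTF-8 bytes of a string as one hex string.
to_hex <- function(ch) paste(as.character(charToRaw(ch)), collapse = "")

to_hex("\u015F")  # "c59f"  (s with cedilla)
to_hex("\u0163")  # "c5a3"  (t with cedilla)
to_hex("\u0219")  # "c899"  (s with comma)
to_hex("\u021B")  # "c89b"  (t with comma)
```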

You can do the conversion in R as shown below. utf8ToInt is a convenient way to verify that the letters are converted as intended.

x <- "ş__ţ"
utf8ToInt(x)
# 351  95  95 355

x2 <- gsub("\u015F", "\u0219", x)
utf8ToInt(x2)
# 537  95  95 355

x3 <- gsub("\u0163", "\u021B", x)
utf8ToInt(x3)
# 351  95  95 539

By the way, since this is a letter-to-letter conversion, the chartr function is more efficient than gsub because it can convert multiple pairs of letters in a single call, as shown below.

x4 <- chartr("\u015F\u0163", "\u0219\u021B", x)
utf8ToInt(x4)
# 537  95  95 539
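
Building on the chartr call above, here is a minimal sketch that also covers the uppercase letters (Ş U+015E, Ţ U+0162 to Ș U+0218, Ț U+021A), applies the mapping to a whole character vector, and writes the result to a file. The function name fix_romanian and the sample words are made up for illustration, and writing with useBytes = TRUE is an assumption about what avoids re-encoding on export; it may need adjusting for your setup.

```r
# Hypothetical helper: map cedilla forms to comma-below forms, both cases.
fix_romanian <- function(x) {
  old <- "\u015E\u015F\u0162\u0163"  # S/s and T/t with cedilla
  new <- "\u0218\u0219\u021A\u021B"  # S/s and T/t with comma below
  chartr(old, new, x)
}

# Sample input (made up): two words with old letters, one plain ASCII word.
words <- c("\u015Fi", "\u0162ar\u0103", "abc")
fixed <- fix_romanian(words)
utf8ToInt(fixed[1])
# 537 105

# Write the strings as UTF-8 bytes without re-encoding, then read them back.
path <- tempfile(fileext = ".txt")
writeLines(enc2utf8(fixed), path, useBytes = TRUE)
back <- readLines(path, encoding = "UTF-8")
identical(utf8ToInt(back[1]), utf8ToInt(fixed[1]))
```

Plain ASCII text such as "abc" passes through unchanged, so the helper can be applied to a whole column or file without first filtering for affected strings.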