What is the difference between varchar and nvarchar?

Is it just that nvarchar supports multibyte characters? If that is the case, is there really any point, other than storage concerns, to using varchars?


Solution 1:

An nvarchar column can store any Unicode data. A varchar column is restricted to an 8-bit codepage. Some people think that varchar should be used because it takes up less space. I believe this is not the correct answer. Codepage incompatabilities are a pain, and Unicode is the cure for codepage problems. With cheap disk and memory nowadays, there is really no reason to waste time mucking around with code pages anymore.

All modern operating systems and development platforms use Unicode internally. By using nvarchar rather than varchar, you can avoid doing encoding conversions every time you read from or write to the database. Conversions take time, and are prone to errors. And recovery from conversion errors is a non-trivial problem.

If you are interfacing with an application that uses only ASCII, I would still recommend using Unicode in the database. The OS and database collation algorithms will work better with Unicode. Unicode avoids conversion problems when interfacing with other systems. And you will be preparing for the future. And you can always validate that your data is restricted to 7-bit ASCII for whatever legacy system you're having to maintain, even while enjoying some of the benefits of full Unicode storage.

Solution 2:

varchar: Variable-length, non-Unicode character data. The database collation determines which code page the data is stored using.

nvarchar: Variable-length Unicode character data. Dependent on the database collation for comparisons.

Armed with this knowledge, use whichever one matches your input data (ASCII v. Unicode).

Solution 3:

I always use nvarchar as it allows whatever I'm building to withstand pretty much any data I throw at it. My CMS system does Chinese by accident, because I used nvarchar. These days, any new applications shouldn't really be concerned with the amount of space required.

Solution 4:

It depends on how Oracle was installed. During the installation process, the NLS_CHARACTERSET option is set. You may be able to find it with the query SELECT value$ FROM sys.props$ WHERE name = 'NLS_CHARACTERSET'.

If your NLS_CHARACTERSET is a Unicode encoding like UTF8, great. Using VARCHAR and NVARCHAR are pretty much identical. Stop reading now, just go for it. Otherwise, or if you have no control over the Oracle character set, read on.

VARCHAR — Data is stored in the NLS_CHARACTERSET encoding. If there are other database instances on the same server, you may be restricted by them; and vice versa, since you have to share the setting. Such a field can store any data that can be encoded using that character set, and nothing else. So for example if the character set is MS-1252, you can only store characters like English letters, a handful of accented letters, and a few others (like € and —). Your application would be useful only to a few locales, unable to operate anywhere else in the world. For this reason, it is considered A Bad Idea.

NVARCHAR — Data is stored in a Unicode encoding. Every language is supported. A Good Idea.

What about storage space? VARCHAR is generally efficient, since the character set / encoding was custom-designed for a specific locale. NVARCHAR fields store either in UTF-8 or UTF-16 encoding, base on the NLS setting ironically enough. UTF-8 is very efficient for "Western" languages, while still supporting Asian languages. UTF-16 is very efficient for Asian languages, while still supporting "Western" languages. If concerned about storage space, pick an NLS setting to cause Oracle to use UTF-8 or UTF-16 as appropriate.

What about processing speed? Most new coding platforms use Unicode natively (Java, .NET, even C++ std::wstring from years ago!) so if the database field is VARCHAR it forces Oracle to convert between character sets on every read or write, not so good. Using NVARCHAR avoids the conversion.

Bottom line: Use NVARCHAR! It avoids limitations and dependencies, is fine for storage space, and usually best for performance too.

Solution 5:

nvarchar stores data as Unicode, so, if you're going to store multilingual data (more than one language) in a data column you need the N variant.