Difference between text and varchar (character varying)
Solution 1:
There is no difference; under the hood it's all varlena (variable-length array).
Check this article from Depesz: http://www.depesz.com/index.php/2010/03/02/charx-vs-varcharx-vs-varchar-vs-text/
A couple of highlights:
To sum it all up:
- char(n) – takes too much space when dealing with values shorter than n (pads them to n), and can lead to subtle errors because of adding trailing spaces, plus it is problematic to change the limit
- varchar(n) – it's problematic to change the limit in live environment (requires exclusive lock while altering table)
- varchar – just like text
- text – for me a winner – over (n) data types because it lacks their problems, and over varchar – because it has distinct name
The article does detailed testing to show that the performance of inserts and selects is similar for all 4 data types. It also takes a detailed look at alternative ways of constraining the length when needed. Function-based constraints or domains have the advantage that the length limit can be increased instantly, and since decreasing a string length constraint is rare, depesz concludes that one of them is usually the best choice for a length limit.
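To make the domain approach concrete, here is a minimal sketch. The names short_name and people and the limits 50 and 100 are made up for illustration and are not taken from the article; NOT VALID assumes PostgreSQL 9.2 or later.

```sql
-- Hypothetical domain: text with a length limit enforced by a CHECK.
CREATE DOMAIN short_name AS text
    CONSTRAINT short_name_len_50 CHECK (char_length(VALUE) <= 50);

-- Hypothetical table that uses the domain.
CREATE TABLE people (name short_name);

-- Raising the limit later: add the looser constraint without re-checking
-- stored values (every value that passed <= 50 also passes <= 100),
-- then drop the old, stricter constraint.
BEGIN;
ALTER DOMAIN short_name
    ADD CONSTRAINT short_name_len_100 CHECK (char_length(VALUE) <= 100) NOT VALID;
ALTER DOMAIN short_name DROP CONSTRAINT short_name_len_50;
COMMIT;
```

If you want the new constraint marked as validated afterwards, ALTER DOMAIN short_name VALIDATE CONSTRAINT short_name_len_100 does that, at the cost of scanning the stored data.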
Solution 2:
As "Character Types" in the documentation points out, varchar(n), char(n), and text are all stored the same way. The only difference is that extra cycles are needed to check the length, if one is given, and extra space and time are required if padding is needed for char(n).
However, when you only need to store a single character, there is a slight performance advantage to using the special type "char" (keep the double-quotes; they're part of the type name). You get faster access to the field, and there is no overhead to store the length.
I just made a table of 1,000,000 random "char" values chosen from the lower-case alphabet. A query to get a frequency distribution (select count(*), field ... group by field) takes about 650 milliseconds, vs about 760 on the same data using a text field.
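A rough way to repeat that comparison yourself is sketched below. The table names t_char and t_text are my own, and the way the random letters are generated is an assumption, since the answer does not show its setup. Timings will differ by machine.

```sql
-- Hypothetical setup: 1,000,000 random lower-case letters stored as "char".
CREATE TABLE t_char AS
SELECT chr(97 + floor(random() * 26)::int)::"char" AS field
FROM generate_series(1, 1000000);

-- The same data stored as text, so both queries see identical values.
CREATE TABLE t_text AS
SELECT field::text AS field FROM t_char;

-- Frequency distribution on each table; compare the reported execution times.
EXPLAIN ANALYZE SELECT count(*), field FROM t_char GROUP BY field;
EXPLAIN ANALYZE SELECT count(*), field FROM t_text GROUP BY field;
```

Run each query a few times and compare the later runs, so that caching effects do not dominate the difference.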
Solution 3:
(this answer is a Wiki, you can edit - please correct and improve!)
UPDATING BENCHMARKS FOR 2016 (pg9.5+)
And using "pure SQL" benchmarks (without any external script)
1. use any string_generator with UTF8
2. main benchmarks:
   2.1. INSERT
   2.2. SELECT comparing and counting
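CREATE FUNCTION string_generator(int DEFAULT 20,int DEFAULT 10) RETURNS text AS $f$
  -- $1 = length of each md5-derived chunk (at most 32), $2 = number of chunks
  SELECT array_to_string( array_agg(
    substring(md5(random()::text),1,$1)||chr( 9824 + (random()*10)::int )  -- append one non-ASCII (UTF8) symbol
  ), ' ' ) as s
  FROM generate_series(1, $2) i(x);
$f$ LANGUAGE SQL VOLATILE;  -- VOLATILE, because the body calls random()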
Prepare specific test (examples)
DROP TABLE IF EXISTS test;
-- CREATE TABLE test ( f varchar(500));
-- CREATE TABLE test ( f text);
CREATE TABLE test ( f text CHECK(char_length(f)<=500) );
Perform a basic test:
INSERT INTO test
SELECT string_generator(20+(random()*(i%11))::int)
FROM generate_series(1, 99000) t(i);
And other tests:
CREATE INDEX q on test (f);
SELECT count(*) FROM (
SELECT substring(f,1,1) || f FROM test WHERE f<'a0' ORDER BY 1 LIMIT 80000
) t;
... and use EXPLAIN ANALYZE.
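For example, the two tests above could be timed roughly like this. It is only a sketch: the BUFFERS option and the BEGIN/ROLLBACK wrapper are my additions, not part of the original benchmark. Keep in mind that EXPLAIN ANALYZE really executes the statement, so the INSERT is wrapped in a transaction that is rolled back.

```sql
-- Time the SELECT test; BUFFERS also reports how many pages were touched.
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM (
  SELECT substring(f,1,1) || f FROM test WHERE f<'a0' ORDER BY 1 LIMIT 80000
) t;

-- Time the INSERT test without keeping the inserted rows.
BEGIN;
EXPLAIN ANALYZE
INSERT INTO test
SELECT string_generator(20+(random()*(i%11))::int)
FROM generate_series(1, 99000) t(i);
ROLLBACK;
```

Repeat the same statements against each variant of the test table (varchar(500), text, text with CHECK) and compare the execution times reported at the bottom of each plan.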
UPDATED AGAIN 2018 (pg10)
A small edit to add 2018's results and reinforce the recommendations.
Results in 2016 and 2018
My results, after averaging over many machines and many tests: all the same (the differences are statistically smaller than the standard deviation).
Recommendation
- Use the text datatype and avoid the old varchar(x), because sometimes it is not a standard, e.g. in CREATE FUNCTION clauses varchar(x) ≠ varchar(y).
- Express limits (with the same varchar performance!) with a CHECK clause in the CREATE TABLE, e.g. CHECK(char_length(x)<=10).
  With a negligible loss of performance in INSERT/UPDATE you can also control ranges and string structure, e.g. CHECK(char_length(x)>5 AND char_length(x)<=20 AND x LIKE 'Hello%')
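A small sketch of how such a constraint behaves in practice. The table name greetings and the sample values are invented for illustration.

```sql
-- Hypothetical table using the combined length and structure check.
CREATE TABLE greetings (
    x text CHECK (char_length(x) > 5 AND char_length(x) <= 20 AND x LIKE 'Hello%')
);

-- Accepted: 11 characters and starts with 'Hello'.
INSERT INTO greetings VALUES ('Hello world');

-- Rejected with a check-constraint violation: only 5 characters.
INSERT INTO greetings VALUES ('Hello');

-- Rejected with a check-constraint violation: does not start with 'Hello'.
INSERT INTO greetings VALUES ('Goodbye world');
```

Changing the rule later means dropping and re-adding the CHECK constraint, rather than changing the column's type as with varchar(n).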