Getting Hibernate and SQL Server to play nice with VARCHAR and NVARCHAR
I'm currently in the process of enabling UTF-8 characters in some tables of a large database. These tables already use the SQL Server NVARCHAR type, but I have several other fields that use VARCHAR.
There is a well-known issue with Hibernate's interaction with the JDBC driver (see e.g., Mapping to varchar and nvarchar in hibernate). In short, Hibernate/JDBC generates SQL that passes all strings as Unicode, regardless of the underlying SQL type. When a non-Unicode (varchar) column in the database is compared to a Unicode input string, the index on that column does not match the encoding, so a full table scan is performed. Both JDBC drivers (jTDS and Microsoft's) have a parameter to pass strings as non-Unicode instead, but this is an all-or-nothing proposition that prevents international characters from being stored in the database.
Most posts I've seen on this issue come up with one of two solutions: 1) change everything in the database to NVARCHAR, or 2) set sendStringParametersAsUnicode=false. My question, then, is this: is there any known solution for having VARCHAR and NVARCHAR play nicely together? Changing everything to NVARCHAR is a huge issue in my environment because of downstream dependencies and other external constraints.
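For reference, the driver flag mentioned above is set on the JDBC connection URL. A minimal sketch, assuming Microsoft's driver; the host and database names are placeholders:

```java
public class ConnectionUrlExample {
    public static void main(String[] args) {
        // Hypothetical host/database names; sendStringParametersAsUnicode is the
        // flag discussed above. With false, the driver sends every string
        // parameter as non-Unicode, which is exactly the all-or-nothing tradeoff.
        String url = "jdbc:sqlserver://localhost:1433;"
                + "databaseName=MyDb;"
                + "sendStringParametersAsUnicode=false";
        System.out.println(url);
    }
}
```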
Solution 1:
import java.sql.Types;

import org.hibernate.type.StandardBasicTypes;

public class SQLServerUnicodeDialect extends org.hibernate.dialect.SQLServerDialect {

    public SQLServerUnicodeDialect() {
        super();
        // Map every character type to its national (Unicode) counterpart
        registerColumnType(Types.CHAR, "nchar(1)");
        registerColumnType(Types.LONGVARCHAR, "nvarchar(max)");
        registerColumnType(Types.VARCHAR, 4000, "nvarchar($l)");
        registerColumnType(Types.VARCHAR, "nvarchar(max)");
        registerColumnType(Types.CLOB, "nvarchar(max)");

        registerColumnType(Types.NCHAR, "nchar(1)");
        registerColumnType(Types.LONGNVARCHAR, "nvarchar(max)");
        registerColumnType(Types.NVARCHAR, 4000, "nvarchar($l)");
        registerColumnType(Types.NVARCHAR, "nvarchar(max)");
        registerColumnType(Types.NCLOB, "nvarchar(max)");

        registerHibernateType(Types.NCHAR, StandardBasicTypes.CHARACTER.getName());
        registerHibernateType(Types.LONGNVARCHAR, StandardBasicTypes.TEXT.getName());
        registerHibernateType(Types.NVARCHAR, StandardBasicTypes.STRING.getName());
        registerHibernateType(Types.NCLOB, StandardBasicTypes.CLOB.getName());
    }
}
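To put a dialect like this into effect, point Hibernate at it through the hibernate.dialect property; the package name com.example below is a placeholder for wherever the class actually lives:

```xml
<property name="hibernate.dialect">com.example.SQLServerUnicodeDialect</property>
```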
Solution 2:
I decided to try this as a hack that might work without touching the database. To do this I created a custom type for NVARCHAR fields. This requires JDBC 4 drivers (I'm using Microsoft's) and Hibernate 3.6.0, with sendStringParametersAsUnicode set to false.
Here's the approach. I'm still verifying its correctness; any comments from folks with more experience are welcome.
Add a new Dialect to support the new datatype
public class SQLAddNVarCharDialect extends SQLServerDialect {

    public SQLAddNVarCharDialect() {
        super();
        // $l (lowercase L) is Hibernate's length placeholder; a mistyped $1
        // would end up in the generated DDL unreplaced. nvarchar(n) tops out
        // at n = 4000 in SQL Server, so longer columns fall through to nvarchar(max).
        registerColumnType(Types.NVARCHAR, 4000, "nvarchar($l)");
        registerColumnType(Types.NVARCHAR, "nvarchar(max)");
    }
}
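One thing worth calling out: in registerColumnType patterns, $l (lowercase L) is Hibernate's placeholder for the mapped column's length, so a mistyped $1 survives into the generated DDL unreplaced. A simplified sketch of the substitution, not Hibernate's actual code (the real logic lives in org.hibernate.dialect.TypeNames):

```java
public class LengthPlaceholderDemo {
    // Simplified stand-in for how Hibernate expands a registered
    // column-type pattern with the mapped column's length.
    static String expand(String pattern, int length) {
        return pattern.replace("$l", Integer.toString(length));
    }

    public static void main(String[] args) {
        System.out.println(expand("nvarchar($l)", 40)); // prints nvarchar(40)
        System.out.println(expand("nvarchar($1)", 40)); // typo: prints nvarchar($1) unchanged
    }
}
```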
Add the new Type. Notice the setNString call in nullSafeSet:
import java.io.Serializable;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Types;

import org.hibernate.HibernateException;
import org.hibernate.usertype.UserType;

public class NStringUserType implements UserType {

    @Override
    public Object assemble(Serializable cached, Object owner) throws HibernateException {
        return deepCopy(cached);
    }

    @Override
    public Object deepCopy(Object value) throws HibernateException {
        if (value == null) return null;
        return value.toString();
    }

    @Override
    public Serializable disassemble(Object value) throws HibernateException {
        return (Serializable) deepCopy(value);
    }

    @Override
    public boolean equals(Object x, Object y) throws HibernateException {
        if (x == null)
            return y == null;
        return x.equals(y);
    }

    @Override
    public int hashCode(Object x) throws HibernateException {
        return x.hashCode();
    }

    @Override
    public boolean isMutable() {
        return false;
    }

    @Override
    public void nullSafeSet(PreparedStatement st, Object value, int index)
            throws HibernateException, SQLException {
        if (value == null)
            st.setNull(index, Types.NVARCHAR);
        else
            st.setNString(index, value.toString()); // JDBC 4: sends the value as nvarchar
    }

    @Override
    public Object replace(Object original, Object target, Object owner)
            throws HibernateException {
        return deepCopy(original);
    }

    @Override
    public Class returnedClass() {
        return String.class;
    }

    @Override
    public int[] sqlTypes() {
        return new int[] { Types.NVARCHAR };
    }

    @Override
    public Object nullSafeGet(ResultSet resultSet, String[] names, Object owner)
            throws HibernateException, SQLException {
        String result = resultSet.getString(names[0]);
        // Note: this also maps empty and whitespace-only strings to null
        return result == null || result.trim().length() == 0
                ? null : result;
    }
}
Update mappings for all NVARCHAR fields
<property name="firstName" type="NStringUserType">
<column name="firstName" length="40" not-null="false" />
</property>
Raw SQL before (with sendStringParametersAsUnicode=true):
exec sp_prepexec @p1 output,N'@P0 nvarchar(4000),@P1 datetime,@P2 varchar(8000),@P3 nvarchar(4000),@P4 nvarchar(4000),@P5 nvarchar(4000),@P6 nvarchar(4000)... ,N'update Account set ... where AccountId=@P35
And after:
exec sp_prepexec @p1 output,N'@P0 varchar(8000),@P1 .... @P6 nvarchar(4000),@P7 ... ,N'update Account set ... Validated=@P4, prefix=@P5, firstName=@P6 ... where AccountId=@P35
It seems to work similarly for SELECT statements.
Solution 3:
One thought..
Hide your varchar columns behind indexed views that cast to nvarchar. This allows you to maintain two interfaces over the same data.
The same applies the other way around: use views for your downstream consumers, but have those cast to varchar (with all your tables now nvarchar). In that case there is no need to index the views; a WHERE clause with a varchar value compared against an nvarchar column will be widened to nvarchar, and the index will still be used.
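As a sketch of the first variant (table, column, and view names here are hypothetical): the view exposes an nvarchar cast over an underlying varchar column, and indexing a view requires SCHEMABINDING:

```sql
-- Hypothetical schema: dbo.Person.last_name is varchar(40) underneath.
CREATE VIEW dbo.PersonUnicode
WITH SCHEMABINDING
AS
SELECT person_id,
       CAST(last_name AS nvarchar(40)) AS last_name
FROM dbo.Person;
```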
Solution 4:
This is less a Hibernate issue than a matter of how the JDBC drivers work. In practice, I think the only problem that will arise (aside from the obvious data corruption if you write Unicode data to a varchar column) is when a query tries to match on a string.
SQL Server will happily convert between nvarchar and varchar implicitly within a SQL statement, but when you run a query with a string in the WHERE clause it will not use existing indexes if the types don't match exactly.
So, for example
SELECT * FROM Person WHERE last_name = N'Smith'
will result in a table scan if the last_name column is defined as varchar, even when there's an index on it.
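By contrast, a literal that matches the column's declared type lets the optimizer use the index (still assuming last_name is varchar with an index on it):

```sql
SELECT * FROM Person WHERE last_name = 'Smith' -- varchar literal: an index seek is possible
```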
One other workaround for this performance issue is to use stored procedures to do the type conversion before executing the query.
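A sketch of that stored-procedure workaround (procedure, table, and parameter names are hypothetical): the procedure accepts the nvarchar parameter the driver sends, narrows it to varchar first, and only then compares against the varchar column, so its index stays usable:

```sql
CREATE PROCEDURE dbo.FindPersonByLastName
    @last_name nvarchar(40)
AS
BEGIN
    -- Convert once, before the comparison, so the varchar index can be used.
    DECLARE @narrowed varchar(40);
    SET @narrowed = CAST(@last_name AS varchar(40));
    SELECT * FROM dbo.Person WHERE last_name = @narrowed;
END
```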