SQL Server - Remove all non-printable ASCII characters
We recently migrated from SQL Server 2012 to SQL Server 2014 and all our FOR XML
code started throwing errors about non-printable ASCII characters.
I wrote this horrible function to remove non-printable ASCII characters as a quick fix. I want to replace it with something cleaner. Is there a way to do this?
ALTER FUNCTION [dbo].[remove_non_printable_chars]
(@input_string nvarchar(max))
RETURNS nvarchar(max)
BEGIN
RETURN
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
REPLACE(REPLACE(@input_string,
CHAR(1), ''),CHAR(2), ''),CHAR(3), ''),CHAR(4), ''),CHAR(5), ''),CHAR(6), ''),CHAR(7), ''),CHAR(8), ''),CHAR(9), ''),CHAR(10), ''),
CHAR(11), ''),CHAR(12), ''),CHAR(13), ''),CHAR(14), ''),CHAR(15), ''),CHAR(16), ''),CHAR(17), ''),CHAR(18), ''),CHAR(19), ''),CHAR(20), ''),
CHAR(21), ''),CHAR(22), ''),CHAR(23), ''),CHAR(24), ''),CHAR(25), ''),CHAR(26), ''),CHAR(27), ''),CHAR(28), ''),CHAR(29), ''),CHAR(30), ''),
CHAR(31), ''), NCHAR(0) COLLATE Latin1_General_100_BIN2, '')
END
Here's the FOR XML
code that broke. (I did not write this. It was already in the code base).
SELECT @HTMLTableData =
(
SELECT HTMLRows
FROM (
SELECT N'<tr>' + HTMLRow + N'</tr>' AS HTMLRows
FROM @HTMLRowData
) mi
FOR XML PATH(''), TYPE
).value('/', 'NVARCHAR(MAX)')
Solution 1:
Another Option.
This function will replace control characters and correct any residual repeating spaces. For example Jane Smith{13}was here
will not be returned as Jane Smithwas here
, but rather Jane Smith was here
CREATE FUNCTION [dbo].[udf-Str-Strip-Control](@S varchar(max))
Returns varchar(max)
Begin
;with cte1(N) As (Select 1 From (Values(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) N(N)),
cte2(C) As (Select Top (32) Char(Row_Number() over (Order By (Select NULL))-1) From cte1 a,cte1 b)
Select @S = Replace(@S,C,' ')
From cte2
Return ltrim(rtrim(replace(replace(replace(@S,' ','†‡'),'‡†',''),'†‡',' ')))
End
--Select [dbo].[udf-Str-Strip-Control]('Michael '+char(13)+char(10)+'LastName') --Returns: Michael LastName
Solution 2:
In-line version:
create function [dbo].[remove_non_printable_chars] (@input_string nvarchar(max))
returns table with schemabinding as return (
select
replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(
replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(
replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(
replace(replace(@input_string collate latin1_general_100_bin2,
char(1), ''),char(2), ''),char(3), ''),char(4), ''),char(5), ''),char(6), ''),char(7), ''),char(8), ''),char(9), ''),char(10), ''),
char(11), ''),char(12), ''),char(13), ''),char(14), ''),char(15), ''),char(16), ''),char(17), ''),char(18), ''),char(19), ''),char(20), ''),
char(21), ''),char(22), ''),char(23), ''),char(24), ''),char(25), ''),char(26), ''),char(27), ''),char(28), ''),char(29), ''),char(30), ''),
char(31), ''), char(0) , '')
as clean_string
);
go
And use it like so:
select c.clean_string
from dbo.remove_non_printable_chars(@dirtystring) c
or
select ...
, c.clean_string
from t
cross apply dbo.remove_non_printable_chars(t.dirty_string) c
Reference:
- When is a SQL function not a function? "If it’s not inline, it’s rubbish." - Rob Farley
- Inline Scalar Functions - Itzik Ben-Gan
- Scalar functions, inlining, and performance: An entertaining title for a boring post - Adam Machanic
- TSQL User-Defined Functions: Ten Questions You Were Too Shy To Ask - Robert Sheldon