Write an S3 Select query to exclude rows containing a carriage return (\r)
I have a CSV column whose data contains the \r character. How can I write a query to eliminate such data?
SELECT rv FROM s3object s
This gives me:
I don't want such rows; I want to eliminate them all.
This query still returns the same results:
SELECT rv FROM s3object s where rv!='\r'
Your file has 0x0d 0x0a (CR LF) at the end of each line. This is often generated by Windows software.
It appears that S3 Select doesn't know how to handle the combination, so the \r is treated as part of the last field.
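To see why the \r ends up in the data, here is a minimal Python sketch of a parser that splits records on LF only. The sample data and the helper name split_lf_only are made up for illustration; this is not how S3 Select is implemented, only a model of the behaviour described above.

```python
# Hypothetical sample: a tiny CSV with Windows (CR LF) line endings.
raw = "id,rv\r\n1,abc\r\n2,def\r\n"


def split_lf_only(text):
    """Split records on LF only, leaving any CR attached to the last field."""
    return [line.split(",") for line in text.split("\n") if line]


rows = split_lf_only(raw)
last_fields = [row[-1] for row in rows]

# repr() makes the trailing \r visible in the printed output.
print([repr(value) for value in last_fields])
```

Every value in the last column carries a trailing \r, which is also why comparing the whole field against a single \r does not match rows that contain other characters.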
You can 'fix' this by ignoring the last character of the last field:
SELECT
SUBSTRING(rv FROM 1 FOR CHAR_LENGTH(rv) - 1) AS rv
FROM s3object s
WHERE CHAR_LENGTH(rv) > 1 -- Optional: skips rows where rv is at most one character (e.g. only the \r)
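As a rough illustration of what that query does to each value, the same logic can be sketched in Python. The function name drop_trailing_cr and the sample values are hypothetical; the sketch assumes every value ends with exactly one \r, as in the file described above.

```python
def drop_trailing_cr(values):
    """Mimic the query: skip very short values, then cut the last character."""
    cleaned = []
    for value in values:
        # Mirrors WHERE char_length(rv) > 1: a value of length 1 is just "\r".
        if len(value) > 1:
            # Mirrors SUBSTRING(rv FROM 1 FOR CHAR_LENGTH(rv) - 1).
            cleaned.append(value[: len(value) - 1])
    return cleaned


sample = ["abc\r", "\r", "def\r"]
result = drop_trailing_cr(sample)
print(result)
```

Note that the cut is unconditional: if a value did not end with \r (for example, a final line with no line terminator), its last real character would be removed as well.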