How to remove substring from the end of string using spark sql?

Solution 1:

You can use regexp_replace function with this regex:

(\s*BRAND\s*)*$ # removes all brand names that comes at the end

For cases where perfume name is the same as brand name then the output of regexp_replace will be empty string, using when expression you can check if it's empty then use brand name:

SELECT  CASE WHEN trim(regexp_replace(perfume, format_string('(\\s*%s\\s*)*$', brand), '')) <> ''
             THEN regexp_replace(perfume, format_string('(\\s*%s\\s*)*$', brand), '')
             ELSE brand
        END AS perfume,
        brand 
FROM global_temp.gv_web