How to check if at least one element of a list is included in a text column?
Solution 1:
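You can do this with the built-in rlike column method. It returns true when a regular expression (Java regex syntax) matches anywhere in the value. Join the keywords into an alternation and require whitespace or the start/end of the string on both sides, so that only whole words match: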
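import re
from pyspark.sql import functions

# test_df and test_keywords are the DataFrame and keyword list from the question.
# Escape each keyword so characters such as '.' or '+' match literally, then
# require whitespace (or the start/end of the string) around the matched word.
pattern = r'(^|\s)(' + '|'.join(map(re.escape, test_keywords)) + r')(\s|$)'
test_df = test_df.withColumn("text_contains_word", functions.col('text').rlike(pattern))
test_df.show()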
+---+--------------------+------------------+
| id|                text|text_contains_word|
+---+--------------------+------------------+
|  1|i like stackoverflow|             false|
|  2|tomorrow the sun ...|              true|
+---+--------------------+------------------+
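You can check the pattern's behaviour without a Spark session. The sketch below builds the same regular expression and applies it with Python's re.search, which, like rlike, looks for a match anywhere in the string. Spark evaluates the pattern with Java regex, but for a simple pattern like this the two engines should agree. The function name contains_any_word and the keyword list are hypothetical, because the question's test_keywords is not shown here.

```python
import re


def contains_any_word(text, keywords):
    """Hypothetical helper mirroring the rlike pattern: True if any keyword
    appears as a word delimited by whitespace or the start/end of the text."""
    # An empty list would produce '(^|\s)()(\s|$)', which matches almost
    # anything, so treat "no keywords" explicitly as "no match".
    if not keywords:
        return False
    # re.escape makes keywords such as 'a+b' match literally.
    pattern = r'(^|\s)(' + '|'.join(map(re.escape, keywords)) + r')(\s|$)'
    # rlike searches for a match anywhere in the value, so use re.search.
    return re.search(pattern, text) is not None


# Hypothetical keywords and sample texts, for illustration only.
keywords = ["sun", "rain"]
for text in ["i like stackoverflow", "the sun is out", "sunny day", "look at the sun."]:
    print(text, contains_any_word(text, keywords))
```

The pattern does not count "sunny" as a match, which is the point of the whitespace boundaries. It also rejects "sun." because the character after the word is punctuation rather than whitespace. If punctuation should count as a boundary, a \b word boundary is a possible alternative. Test it against your own data before switching.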