How to detect word boundary in regex for Arabic words - Python

I am trying to remove any word that contains non-Arabic characters. So words like ذهb or "word" should be removed.

I have managed to remove the non-Arabic characters using the regex below:

re.sub(r'([^،-٩]+)',' ', 'ذهb')
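A quick check of that call (Python 3 assumed) shows why it only strips characters. The class [^،-٩] matches anything outside the range U+060C (Arabic comma) to U+0669 (Arabic-Indic digit nine), so only the Latin b is replaced and the Arabic part of the word survives:

```python
import re

# The question's pattern: one or more characters outside U+060C..U+0669.
pattern = r'([^،-٩]+)'

result = re.sub(pattern, ' ', 'ذهb')

# Only the non-Arabic run "b" is replaced; the Arabic letters stay.
print(repr(result))  # prints 'ذه '
```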

But how would I remove the whole word? Preceding the regex with \b doesn't seem to work.
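A likely reason the \b attempt fails, sketched under the assumption of Python 3's default Unicode matching for str patterns: \w covers both Arabic and Latin letters, so inside ذهb there is no word boundary before the b, and a pattern that still only matches the non-Arabic run finds nothing to replace:

```python
import re

text = 'ذهb'

# \b is a zero-width position between a \w and a \W character (or a string edge).
# Arabic letters and Latin letters are both \w here, so there is no boundary
# between 'ه' and 'b', and the only boundaries (start and end of the string)
# are not followed by a non-Arabic character.
with_boundary = re.sub(r'\b([^،-٩]+)', ' ', text)

print(with_boundary == text)  # True: nothing was replaced
```

In other words, \b only helps if the pattern between the boundaries spans the whole word, Arabic letters included, rather than just the non-Arabic run.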

Solution 1:

You can use

re.sub(r'\s*\b[\u0621-\u064A]*[^\W\d_\u0621-\u064A][^\W\d_]*\b', '', text)
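A minimal runnable sketch of this pattern, assuming Python 3 (where \w and \b are Unicode-aware for str patterns and \uXXXX escapes are accepted inside the regex). The helper name remove_non_arabic_words and the sample sentence are hypothetical, chosen only for illustration:

```python
import re

# The pattern from this answer, compiled once:
#   \s*                     whitespace before the word, removed together with it
#   \b ... \b               the whole word
#   [\u0621-\u064A]*        zero or more Arabic letters
#   [^\W\d_\u0621-\u064A]   one word character that is not a digit, "_" or an Arabic letter
#   [^\W\d_]*               the rest of the word (word characters minus digits and "_")
NON_ARABIC_WORD = re.compile(r'\s*\b[\u0621-\u064A]*[^\W\d_\u0621-\u064A][^\W\d_]*\b')


def remove_non_arabic_words(text: str) -> str:
    """Hypothetical helper: drop every word that contains a non-Arabic letter."""
    return NON_ARABIC_WORD.sub('', text)


print(remove_non_arabic_words('ذهب ذهb word كتاب'))  # ذهب كتاب
```

Because \s* eats the whitespace *before* a removed word, a non-Arabic word at the very start leaves a leading space: remove_non_arabic_words('word ذهب') returns ' ذهب', so call .strip() on the result if that matters.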

The \s*\b[\u0621-\u064A]*[^\W\d_\u0621-\u064A][^\W\d_]*\b matches