My last post is based on a blacklist only.
You check the whole string for matches using the last expression I posted.
It will match everything that contains "shit", with optional whitespace between the letters, unless it is directly preceded by "do".
Perhaps removing the first whitespace check would be good. Otherwise you might end up missing, for example, "You don't do shit".
The following will match "shit" and the "shit" part of "shite", "dipshit" etc., but not "doshit" (or "doshite").
Code:
(?<!do)s\s*h\s*i\s*t
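To see what that expression actually picks up, here is a minimal sketch in Python (my choice of language for illustration; the expression itself is unchanged). The helper name find_matches is made up for this example, and no flags are set, so matching is case-sensitive.

```python
import re

# Expression copied from above: "shit" with optional whitespace between
# the letters, rejected when directly preceded by "do".
PATTERN = re.compile(r"(?<!do)s\s*h\s*i\s*t")

def find_matches(text):
    """Return every substring of text matched by PATTERN (hypothetical helper)."""
    return [m.group(0) for m in PATTERN.finditer(text)]

print(find_matches("shit"))               # ['shit']
print(find_matches("s h i t"))            # ['s h i t']
print(find_matches("shite"))              # ['shit'], only part of the word
print(find_matches("dipshit"))            # ['shit'], only part of the word
print(find_matches("doshit"))             # []
print(find_matches("You don't do shit"))  # ['shit']
```

Note that for "shite" and "dipshit" only the "shit" part is returned, which is why the whole-word variants below exist.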
If you want the whole word removed when it matches a word that contains "shit", you'll have to expand the expression some more.
I'll show it without the whitespace checks so it will be easier to read.
The following will match the whole word for "shit", "shite" and "dipshit", but not "doshit".
Code:
(?<!do)(?:dip)?shit(?:e)?
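As a rough illustration of removing the whole word, the expression above can be dropped into a replace call. This is only a sketch in Python; the function name censor and the "****" mask are assumptions of mine, not part of the original expression.

```python
import re

# Expression copied from above (no whitespace checks, for readability).
WHOLE_WORD = re.compile(r"(?<!do)(?:dip)?shit(?:e)?")

def censor(text, mask="****"):
    """Replace every match with mask (hypothetical helper and mask)."""
    return WHOLE_WORD.sub(mask, text)

print(censor("what a dipshit"))  # what a ****
print(censor("shite happens"))   # **** happens
print(censor("doshit"))          # doshit (left alone)
```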
And with the full whitespace check...
Code:
(?<!do)(?:d\s*i\s*p\s*)?s\s*h\s*i\s*t(?:\s*e)?
Making it more dynamic would probably only cause harm. In this case, not many words contain "shit". If you use the following expression to match everything starting with "shit", like "shitface", "shithead" etc., you would end up matching words like "shitake".
Code:
(?<!do)s\s*h\s*i\s*t[\w]*
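A quick check of that false positive, again sketched in Python with a made-up helper name (matched_words) and a made-up sample sentence:

```python
import re

# Expression copied from above: "shit" plus any trailing word characters.
GREEDY = re.compile(r"(?<!do)s\s*h\s*i\s*t[\w]*")

def matched_words(text):
    """Return every substring of text matched by GREEDY (hypothetical helper)."""
    return [m.group(0) for m in GREEDY.finditer(text)]

# "shithead" is the intended hit, "shitake" is collateral damage.
print(matched_words("shithead likes shitake soup"))  # ['shithead', 'shitake']
```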
If you're using a whitelist, the simplest solution would be to loop through the whole whitelist, replacing the accepted words with placeholders, run the blacklist, and then change the whitelisted words back. Hopefully the whitelist doesn't have to be very large, so this should still be pretty efficient.
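The placeholder idea could look something like this in Python. It is only a sketch under a few assumptions: the whitelist contents, the "****" mask and the function name are made up, and the placeholders are built from NUL characters on the assumption that the input text never contains them.

```python
import re

# Blacklist expression from above; the whitelist is a hypothetical example.
BLACKLIST = re.compile(r"(?<!do)s\s*h\s*i\s*t[\w]*")
WHITELIST = ["shitake"]

def censor_with_whitelist(text, mask="****"):
    """Protect whitelisted words, apply the blacklist, then restore them."""
    # Step 1: swap each accepted word for a placeholder the blacklist
    # cannot match (assumes the input never contains NUL characters).
    placeholders = {}
    for i, word in enumerate(WHITELIST):
        token = f"\x00{i}\x00"
        placeholders[token] = word
        text = text.replace(word, token)
    # Step 2: run the blacklist on what is left.
    text = BLACKLIST.sub(mask, text)
    # Step 3: change the whitelisted words back.
    for token, word in placeholders.items():
        text = text.replace(token, word)
    return text

print(censor_with_whitelist("shithead ate shitake"))  # **** ate shitake
```

The cost is one plain string replace per whitelisted word in each direction, so it stays cheap as long as the whitelist is short.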