Replace blacklisted word unless it's a part of whitelisted word - string


Backstabnoob
BANNED
Join Date: Feb 2009
Location: Iwotadai Dorm
03-08-2014, 15:53   Replace blacklisted word unless it's a part of whitelisted word - string   #1

Alright, so there's the unapproved Advanced Swear Filter, which apparently should have this feature. It doesn't work properly, so copying its filtering algorithm is out of the question.

What I'm trying to do is filter out curse words from a string, as long as they're not whitelisted.

An example:
I blacklisted 'shit' and whitelisted the word 'doshite' (sorry, couldn't think of anything better).

If the user says "If you want to say why in Japanese, you say doshite.", nothing happens.
If they say "Fuck off dipshit", it turns into "Fuck off dip****".

How would you go about this? I have the blacklisted words stored in Array: blacklist and whitelisted words in Array: whitelist.

I'd also like to avoid splitting the string into individual words entirely, since I'll remove the whitespace before filtering the string anyway.

Last edited by Backstabnoob; 03-08-2014 at 15:55.
fysiks
Veteran Member
Join Date: Sep 2007
Location: Flatland, USA
03-08-2014, 16:00   Re: Replace blacklisted word unless it's a part of whitelisted word - string   #2

Removing whitespace actually makes it harder. The best way I can think of is to use a regex that finds the word containing "shit", with wildcards on either side so it captures the whole word. Then you can check that word against your whitelist to decide whether to block it.

regex: \b\w*shit\w*\b

This should return "doshite", which you then check against your whitelist. If it's not in the whitelist, assume it's a bad word.
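A minimal sketch of this find-the-whole-word-then-check-the-whitelist approach. It is written in Python because its `re` module treats `\b\w*shit\w*\b` the same way PCRE (which the AMXX regex module wraps) does for plain ASCII text. The hard-coded lists and the `censor_words` name are hypothetical stand-ins for the plugin's two arrays, and starring out only the blacklisted part of the word is an assumption chosen to reproduce the "dip****" example.

```python
import re

# Hypothetical example lists; in the plugin these would come from the two Arrays.
BLACKLIST = ["shit"]
WHITELIST = {"doshite"}


def censor_words(text, blacklist=BLACKLIST, whitelist=WHITELIST):
    """Censor blacklisted terms inside whole words, skipping whitelisted words."""
    for bad in blacklist:
        # \b\w*<bad>\w*\b captures the entire word around the blacklisted term.
        word_re = re.compile(r"\b\w*" + re.escape(bad) + r"\w*\b", re.IGNORECASE)
        bad_re = re.compile(re.escape(bad), re.IGNORECASE)

        def replace(match):
            word = match.group(0)
            if word.lower() in whitelist:
                return word  # whitelisted: keep the word as-is
            # Star out only the blacklisted part ("dipshit" -> "dip****").
            return bad_re.sub("*" * len(bad), word)

        text = word_re.sub(replace, text)
    return text


print(censor_words("Fuck off dipshit"))  # prints: Fuck off dip****
```

Note that this relies on word boundaries, so it only works if the spaces are kept.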

(Do NOT remove spaces)

Last edited by fysiks; 03-08-2014 at 16:00.
Backstabnoob
BANNED
Join Date: Feb 2009
Location: Iwotadai Dorm
03-08-2014, 16:27   Re: Replace blacklisted word unless it's a part of whitelisted word - string   #3

I have to remove spaces, because otherwise people just write "s h i t" instead, which draws even more attention: exactly the opposite of what I'm trying to achieve.

Last edited by Backstabnoob; 03-08-2014 at 16:27.
Black Rose
Veteran Member
Join Date: Feb 2011
Location: Stockholm, Sweden
03-08-2014, 19:12   Re: Replace blacklisted word unless it's a part of whitelisted word - string   #4

Negative lookbehind would be well suited in this case.
Code:
(?<!do)shit
It would be better with
Code:
(?<!do)shit(?!e)
but the two lookarounds are combined with AND, not OR: "shit" must be both not preceded by "do" and not followed by "e", so it would not filter "shite" either.
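To illustrate the AND behaviour, here is a small Python check (Python's `re` handles these fixed-width lookarounds the same way PCRE does). The second pattern is an assumption, not something from the thread: it uses alternation to get OR semantics, so a match is skipped only when "shit" is both preceded by "do" and followed by "e".

```python
import re

# One "shit" with two lookarounds: BOTH conditions must hold (AND).
and_re = re.compile(r"(?<!do)shit(?!e)")

# Alternation gives OR: "shit" matches if it is not preceded by "do"
# OR not followed by "e", so only the full "doshite" is skipped.
or_re = re.compile(r"(?<!do)shit|shit(?!e)")

WORDS = ["shit", "shite", "dipshit", "doshit", "doshite"]


def matched(pattern):
    """Return the words from WORDS in which the pattern finds a match."""
    return [w for w in WORDS if pattern.search(w)]


print("AND:", matched(and_re))
print("OR: ", matched(or_re))
```

With the OR version, "shite" is caught, and so is "doshit", since only "doshite" is exempted.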

Removing whitespace is really not a good idea, because joining words can create new blacklisted words (for example, "this hit" becomes "thishit").
Since whitespace is easy to account for, it would be better to remove all special characters (.,-_*! and so on) and just use:
Code:
(?<!do)\s*s\s*h\s*i\s*t
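A minimal Python sketch of "strip special characters, keep whitespace, then match with `\s*` between letters". The exact set of characters to strip is an assumption here (anything that is not a letter, digit or whitespace, plus `_`, which `\w` would otherwise keep), as are the helper names.

```python
import re

# Assumed definition of "special characters": not a letter, digit or whitespace,
# plus "_" (which [^\w\s] alone would leave in place).
SPECIALS = re.compile(r"[^\w\s]|_")
SHIT_RE = re.compile(r"(?<!do)\s*s\s*h\s*i\s*t", re.IGNORECASE)


def normalize(text):
    """Strip special characters but keep whitespace."""
    return SPECIALS.sub("", text)


def contains_blacklisted(text):
    return SHIT_RE.search(normalize(text)) is not None


for sample in ["s.h.i.t", "s h i t", "s_h-i*t!", "doshite"]:
    print(sample, contains_blacklisted(sample))
```

The trade-off of tolerating whitespace is that a match can span two innocent words: "this hit" is flagged by this pattern as well.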

Last edited by Black Rose; 03-08-2014 at 19:25.
Backstabnoob
BANNED
Join Date: Feb 2009
Location: Iwotadai Dorm
03-08-2014, 20:27   Re: Replace blacklisted word unless it's a part of whitelisted word - string   #5

I guess I could keep the whitespace. Anyway, I have very little experience with regex, especially in AMXX. How do I go about this if I have exactly two arrays, one with blacklisted words and one with whitelisted words?

Also, "shite" really needs to be filtered. I take it there's no option other than breaking the whole string into words and then comparing them one by one?
Black Rose
Veteran Member
Join Date: Feb 2011
Location: Stockholm, Sweden
03-08-2014, 20:36   Re: Replace blacklisted word unless it's a part of whitelisted word - string   #6

My last post covers the blacklist only.
You check the whole string for matches using the last expression I posted.
It will match any "shit", with optional whitespace between the letters, unless it is preceded by "do".
Dropping the leading whitespace check would probably be good. Otherwise a match can start at the space before the word and swallow it, so "dip shit" turns into "dip***" instead of "dip ***".
The following will match "shit" and the "shit" part of "shite", "dipshit", etc., but not "doshit" (or "doshite").
Code:
(?<!do)s\s*h\s*i\s*t
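A quick Python check of what the leading `\s*` changes (Python's `re` should treat both patterns the same way PCRE does). Both variants still catch "You don't do shit", because the match can simply start at the "s"; the difference shows up in how much text gets replaced.

```python
import re

with_leading = re.compile(r"(?<!do)\s*s\s*h\s*i\s*t")
without_leading = re.compile(r"(?<!do)s\s*h\s*i\s*t")

for text in ["dip shit", "You don't do shit", "doshite"]:
    # Show both replacements side by side.
    print(repr(with_leading.sub("***", text)), repr(without_leading.sub("***", text)))
```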
If you want the whole word removed when it contains "shit", you'll have to expand the expression some more.
I'll show it without the whitespace checks so it's easier to read.
The following will match the whole of "shit", "shite" and "dipshit", but not "doshit".
Code:
(?<!do)(?:dip)?shit(?:e)?
And with the full whitespace check:
Code:
(?<!do)(?:d\s*i\s*p\s*)?s\s*h\s*i\s*t(?:\s*e)?
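A small Python check of the whole-word patterns against the thread's test string. The whitespace-tolerant pattern below, with the optional "dip" prefix written as `(?:d\s*i\s*p\s*)?`, is an assumption about what the full version should look like, not a tested AMXX pattern.

```python
import re

# Compact whole-word pattern (no whitespace checks).
compact = re.compile(r"(?<!do)(?:dip)?shit(?:e)?")
# Assumed whitespace-tolerant version that keeps the optional "dip" prefix.
spaced = re.compile(r"(?<!do)(?:d\s*i\s*p\s*)?s\s*h\s*i\s*t(?:\s*e)?")

print(compact.sub("***", "shit doshite dipshit shite"))
print(spaced.sub("***", "dip shit s h i t e doshite"))
```

One side effect worth knowing: the optional `\s*e` in the spaced version can eat the first letter of the next word, so "shit else" becomes "***lse".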
Making it any more general would probably do more harm than good. In this case, not many words contain "shit". If you use the following expression to match everything starting with "shit", like "shitface", "shithead", etc., you would end up matching words like "shitake".
Code:
(?<!do)s\s*h\s*i\s*t[\w]*
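A quick Python check of that trade-off (Python's `re` should treat this pattern the same way PCRE does). The intended targets are caught, but so is the "shitake" spelling. The standard spelling "shiitake" escapes only because of its double "i".

```python
import re

greedy = re.compile(r"(?<!do)s\s*h\s*i\s*t[\w]*")

for word in ["shitface", "shithead", "shitake", "shiitake", "doshite"]:
    print(word, "->", greedy.sub("***", word))
```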
If you need a whitelist, the simplest solution is to loop through the whole whitelist replacing the accepted words with placeholders, apply the blacklist, and then change the placeholders back into the whitelisted words. As long as the whitelist isn't too large, this should still be reasonably efficient.
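The placeholder idea as a short Python sketch, so it can be run and checked without an AMXX server. The function name, the use of Unicode private-use characters (U+E000 onward) as placeholders, and sorting the blacklist longest-first are all assumptions of this sketch. Giving each whitelist entry its own single-character placeholder avoids one placeholder being a prefix of another.

```python
def filter_text(text, whitelist, blacklist, mask="***"):
    """Protect whitelisted words with placeholders, censor, then restore them."""
    # Assumption: private-use characters (U+E000 onward) never occur in chat,
    # so each one can safely stand in for one whitelist entry.
    protected = []
    for i, word in enumerate(whitelist):
        token = chr(0xE000 + i)
        if word in text:
            text = text.replace(word, token)
            protected.append((token, word))

    # Longest blacklist entries first, so "dipshit" is handled before "shit".
    for word in sorted(blacklist, key=len, reverse=True):
        text = text.replace(word, mask)

    # Put the whitelisted words back.
    for token, word in protected:
        text = text.replace(token, word)
    return text


print(filter_text("shit doshite dipshit shite", ["doshite"], ["dipshit", "shite", "shit"]))
# prints: *** doshite *** ***
```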

Last edited by Black Rose; 03-08-2014 at 21:09.
Backstabnoob
BANNED
Join Date: Feb 2009
Location: Iwotadai Dorm
03-08-2014, 20:38   Re: Replace blacklisted word unless it's a part of whitelisted word - string   #7

I really need it with dynamic lists though, otherwise it's pointless.
Black Rose
Veteran Member
Join Date: Feb 2011
Location: Stockholm, Sweden
03-08-2014, 21:11   Re: Replace blacklisted word unless it's a part of whitelisted word - string   #8

Dynamic lists? What do you mean?
Backstabnoob
BANNED
Join Date: Feb 2009
Location: Iwotadai Dorm
03-08-2014, 21:22   Re: Replace blacklisted word unless it's a part of whitelisted word - string   #9

I mean that the user can write their own whitelist or blacklist entries into a config file, which then gets parsed and its contents saved to the arrays.
Black Rose
Veteran Member
Join Date: Feb 2011
Location: Stockholm, Sweden
03-08-2014, 21:53   Re: Replace blacklisted word unless it's a part of whitelisted word - string   #10

Well, anything can be read from a file, so any of these can be turned into a dynamic list.
I gave you some options to pick from.
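As a sketch of how dynamic lists could feed a single regex, here is a Python version. It is written in Python rather than AMXX Pawn, and the one-word-per-line config format with `;` comments and all helper names are hypothetical. It swaps in a different technique from the placeholder loop: one alternation in which, at any given position, whitelisted words are tried before blacklisted ones. Blacklisted words get `\s*` between their letters.

```python
import re


def load_words(lines):
    """Parse one entry per line; skip blanks and ';' comment lines (assumed format)."""
    words = []
    for line in lines:
        line = line.strip()
        if line and not line.startswith(";"):
            words.append(line.lower())
    return words


def build_filter(blacklist, whitelist):
    """Compile one regex from the lists, or return None if the blacklist is empty."""
    if not blacklist:
        return None
    # Longest entries first so "dipshit" is preferred over "shit".
    bad = "|".join(
        r"\s*".join(re.escape(ch) for ch in word)  # tolerate "s h i t"
        for word in sorted(blacklist, key=len, reverse=True)
    )
    alternatives = []
    if whitelist:
        keep = "|".join(re.escape(w) for w in sorted(whitelist, key=len, reverse=True))
        alternatives.append(f"(?P<keep>{keep})")  # tried first at each position
    alternatives.append(f"(?P<bad>{bad})")
    return re.compile("|".join(alternatives), re.IGNORECASE)


def apply_filter(pattern, text, mask="***"):
    if pattern is None:
        return text
    # Whitelisted matches are returned unchanged; blacklisted ones are masked.
    return pattern.sub(lambda m: mask if m.group("bad") is not None else m.group(0), text)


blacklist = load_words(["; blacklist (hypothetical file contents)", "shit", "dipshit", ""])
whitelist = load_words(["doshite"])
word_filter = build_filter(blacklist, whitelist)
print(apply_filter(word_filter, "Fuck off dipshit"))
```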

This is the best I can come up with at the moment:
Code:
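#include <amxmodx>

public plugin_init() {
    register_plugin("Test Plugin 1", "", "[ --{-@ ]");

    // Each entry holds a string of up to 31 characters.
    new Array:hWhitelist = ArrayCreate(32);
    ArrayPushString(hWhitelist, "doshite");

    // Longer entries first, so "dipshit" is replaced before "shit".
    new Array:hBlacklist = ArrayCreate(32);
    ArrayPushString(hBlacklist, "dipshit");
    ArrayPushString(hBlacklist, "shite");
    ArrayPushString(hBlacklist, "shit");

    // Indexes of the whitelist entries that were actually found in the string.
    new Array:hMemory = ArrayCreate();

    // Buffer larger than the text, so replacements have room to grow.
    new string[192] = "shit doshite dipshit shite";

    server_print("PRE: %s", string);

    // Placeholder = control character 5 followed by (index + 1).
    // The +1 stops index 0 from writing a terminating 0 into the placeholder.
    new placeholder[3] = { 5, 0, 0 };
    new tempstring[32];

    // 1) Swap every whitelisted word for its placeholder.
    new size = ArraySize(hWhitelist);
    for ( new i ; i < size ; i++ ) {
        ArrayGetString(hWhitelist, i, tempstring, charsmax(tempstring));
        placeholder[1] = i + 1;
        if ( replace_all(string, charsmax(string), tempstring, placeholder) )
            ArrayPushCell(hMemory, i);
    }

    // 2) Censor the blacklisted words.
    size = ArraySize(hBlacklist);
    for ( new i ; i < size ; i++ ) {
        ArrayGetString(hBlacklist, i, tempstring, charsmax(tempstring));
        replace_all(string, charsmax(string), tempstring, "***");
    }

    // 3) Put the whitelisted words back.
    size = ArraySize(hMemory);
    for ( new i ; i < size ; i++ ) {
        new index = ArrayGetCell(hMemory, i);
        placeholder[1] = index + 1;
        ArrayGetString(hWhitelist, index, tempstring, charsmax(tempstring));
        replace_all(string, charsmax(string), placeholder, tempstring);
    }

    // Free the arrays instead of leaking them.
    ArrayDestroy(hMemory);
    ArrayDestroy(hWhitelist);
    ArrayDestroy(hBlacklist);

    server_print("POST: %s", string);
}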

Code:
PRE: shit doshite dipshit shite
POST: *** doshite *** ***

Last edited by Black Rose; 03-08-2014 at 22:02.

All times are GMT -4.