AlliedModders


Backstabnoob 03-08-2014 15:53

Replace blacklisted word unless it's a part of whitelisted word - string
 
Alright, so there's the unapproved Advanced Swear Filter, which apparently should have this feature. It doesn't work properly, so copying its filtering algorithm is out of the question.

What I'm trying to do is filter out curse words from a string, as long as they're not whitelisted.

An example:
I blacklisted shit and whitelisted the word 'doshite' (sorry, couldn't think of anything better)

If the user says "If you want to say why in Japanese, you say doshite.", nothing happens.
If they say "Fuck off dipshit", it turns into "Fuck off dip****".

How would you go about this? I have the blacklisted words stored in Array: blacklist and whitelisted words in Array: whitelist.

I'd also like to completely avoid the approach of breaking the string into single words, as I'll remove the whitespace before filtering the string anyway.

fysiks 03-08-2014 16:00

Removing whitespace actually makes it harder. The best way I can think of is to use a regex that finds "shit" along with wildcards on either side, so it captures the whole word. Then you can run that word through your whitelist to see whether you should block it or not.

regex: \b\w*shit\w*\b

This should return "doshite", which you then run through your whitelist. If it's not in the whitelist, assume it's a bad word.

(Do NOT remove spaces)
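
A minimal sketch of that idea, written in Python only to illustrate the regex (it is not AMXX code). The list contents and the function name censor_words are assumptions made for the example. Note that it masks the whole matched word, not just the blacklisted part.

```python
import re

# Hypothetical lists for illustration; the thread stores them in AMXX dynamic arrays.
BLACKLIST = ["shit"]
WHITELIST = {"doshite"}

def censor_words(text):
    """Mask each whole word containing a blacklisted term, unless the word is whitelisted."""
    for bad in BLACKLIST:
        # Same shape as \b\w*shit\w*\b from the post, built per blacklist entry.
        pattern = re.compile(r"\b\w*" + re.escape(bad) + r"\w*\b", re.IGNORECASE)

        def mask(match):
            word = match.group(0)
            if word.lower() in WHITELIST:
                return word  # whitelisted word: leave it untouched
            return "*" * len(word)

        text = pattern.sub(mask, text)
    return text

print(censor_words("If you want to say why in Japanese, you say doshite."))
print(censor_words("Fuck off dipshit"))
```

Because the whitelist check happens on the captured word, this only works while word boundaries (spaces) are still present in the string.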

Backstabnoob 03-08-2014 16:27

I have to remove spaces, as people just write "s h i t" instead, which draws even more attention, ultimately achieving exactly the opposite of what I'm trying to do.

Black Rose 03-08-2014 19:12

Negative lookbehind would be well suited in this case.
Code:

(?<!do)shit
It would be better with
Code:

(?<!do)shit(?!e)
but the two assertions are combined with AND, not OR, meaning it would not filter "shite".

Whitespace is really not good to remove, because you could end up with new words that are blacklisted.
Since whitespace is easy to account for, it would be better to remove all special characters (.,-_*! and so on) and just do
Code:

(?<!do)\s*s\s*h\s*i\s*t
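
The behaviour of these expressions can be checked with a short sketch. It uses Python's re module purely for illustration (not the AMXX regex module); strip_specials and has_match are hypothetical helper names, and the set of characters removed is an assumption.

```python
import re

# Patterns quoted from the post.
SPACED = re.compile(r"(?<!do)\s*s\s*h\s*i\s*t")
NOT_FOLLOWED_BY_E = re.compile(r"(?<!do)shit(?!e)")

def strip_specials(text):
    """Drop punctuation such as . , - _ * ! but keep letters, digits and whitespace."""
    return re.sub(r"[^\w\s]|_", "", text)

def has_match(text):
    """True when the whitespace-tolerant pattern matches the cleaned text."""
    return SPACED.search(strip_specials(text)) is not None

for sample in ["s h i t", "s.h-i_t!", "dipshit", "doshite"]:
    print(sample, has_match(sample))

# Both lookarounds must hold at once, so "shite" slips through this pattern.
print(NOT_FOLLOWED_BY_E.search("shite"))
```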

Backstabnoob 03-08-2014 20:27

I guess I could keep the whitespace. Anyway, I have very little experience in regex, especially AMXX wise. How do I go about this if I have exactly two arrays, one with blacklisted words and one with whitelisted words?

Also, "shite" really needs to be filtered. I take it as there's no other option than breaking the whole string into words and then comparing them one by one?

Black Rose 03-08-2014 20:36

My last post is based on blacklist only.
You check the whole string for results using the last expression I posted.
It will match everything that contains "shit" with whitespace in between unless it has a prefix of "do".
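Perhaps removing the first whitespace check would be good. It only pulls leading whitespace into the match: a phrase like "You don't do shit" is matched either way, because the lookbehind is tested again at the "s", where the two preceding characters are "o " and not "do".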
The following will match "shit" and part of "shite", "dipshit", etc., but not "doshit" (or "doshite").
Code:

(?<!do)s\s*h\s*i\s*t
If you want the whole word removed when it matches a word that contains "shit", you'll have to expand the expression some more.
I'll show it without the whitespace checks so it will be easier to read.
The following will match the whole word of "shit", "shite", "dipshit" but not "doshit".
Code:

(?<!do)(?:dip)?shit(?:e)?
And with the full whitespace check...
Code:

(?<!do)s\s*h\s*i\s*t(?:\s*e)?
Making it more dynamic would probably only cause harm. In this case, not many words contain "shit". If you use the following expression to match everything starting with shit, like "shitface", "shithead", etc., you would end up matching words like shitake.
Code:

(?<!do)s\s*h\s*i\s*t[\w]*
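
To see the difference between the optional-suffix form and the greedy tail, here is a small Python sketch (for illustration only, not AMXX). The helper matched_text is a hypothetical name; the patterns are the ones quoted above.

```python
import re

# Patterns quoted from the post (whitespace-tolerant forms).
OPTIONAL_E = re.compile(r"(?<!do)s\s*h\s*i\s*t(?:\s*e)?")
GREEDY_TAIL = re.compile(r"(?<!do)s\s*h\s*i\s*t[\w]*")

def matched_text(pattern, text):
    """Return the first matched substring, or None when nothing matches."""
    m = pattern.search(text)
    return m.group(0) if m else None

for word in ["shite", "s h i t e", "doshite", "shithead", "shitake"]:
    print(word, matched_text(OPTIONAL_E, word), matched_text(GREEDY_TAIL, word))
```

The greedy tail swallows the rest of any word that starts with the blacklisted term, which is why harmless words get caught whole.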
If you're using a whitelist, the simplest solution would be to loop over the whole whitelist, replacing the accepted words with placeholders, apply the blacklist, and then change the whitelisted words back. Hopefully the whitelist doesn't have to be very large, so this stays pretty efficient.
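
The three steps can be sketched as follows, in Python for illustration only. The list contents, the function name filter_with_placeholders, and the choice of private-use code points as placeholders are all assumptions of this sketch.

```python
# Hypothetical lists for illustration.
WHITELIST = ["doshite"]
BLACKLIST = ["dipshit", "shite", "shit"]

def filter_with_placeholders(text):
    # Step 1: hide whitelisted words behind placeholders.
    # chr(0xE000 + i) is a private-use code point, assumed never to occur in chat.
    used = []
    for i, word in enumerate(WHITELIST):
        token = chr(0xE000 + i)
        if word in text:
            text = text.replace(word, token)
            used.append((token, word))

    # Step 2: censor blacklisted words, longest first so that "dipshit"
    # is replaced as a whole before the shorter "shit" is tried.
    for bad in sorted(BLACKLIST, key=len, reverse=True):
        text = text.replace(bad, "*" * len(bad))

    # Step 3: put the whitelisted words back.
    for token, word in used:
        text = text.replace(token, word)
    return text

print(filter_with_placeholders("shit doshite dipshit shite"))
```

The cost is one pass over the string per list entry, so it stays cheap as long as both lists are short.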

Backstabnoob 03-08-2014 20:38

I really need it with dynamic lists though, otherwise it's pointless.

Black Rose 03-08-2014 21:11

Dynamic lists? What do you mean?

Backstabnoob 03-08-2014 21:22

I mean that the user has the ability to write his own whitelist or blacklist entries into a config file which then gets parsed and its content saved to the arrays.

Black Rose 03-08-2014 21:53

Well, anything can be read from a file, meaning everything can be made into a dynamic list.
I gave you some options to pick from.

This is the best I can come up with at the moment:
Code:
#include <amxmodx>

public plugin_init() {
    register_plugin("Test Plugin 1", "", "[ --{-@ ]");

    // Whitelisted words are protected from filtering.
    new Array:hWhitelist = ArrayCreate(32);
    ArrayPushString(hWhitelist, "doshite");

    // Longer entries come first so "dipshit" is replaced before "shit".
    new Array:hBlacklist = ArrayCreate(32);
    ArrayPushString(hBlacklist, "dipshit");
    ArrayPushString(hBlacklist, "shite");
    ArrayPushString(hBlacklist, "shit");

    // Remembers which whitelist entries were actually found in the string.
    new Array:hMemory = ArrayCreate();

    new string[] = "shit doshite dipshit shite";

    server_print("PRE: %s", string);

    // Placeholder: control character 5 followed by (whitelist index + 1).
    // The + 1 keeps index 0 from acting as the string terminator, which would
    // make the first placeholder a prefix of all the others.
    new placeholder[3] = { 5, 0, 0 };
    new tempstring[32];
    new size = ArraySize(hWhitelist);

    // Step 1: swap whitelisted words for placeholders.
    for ( new i ; i < size ; i++ ) {
        ArrayGetString(hWhitelist, i, tempstring, charsmax(tempstring));
        placeholder[1] = i + 1;
        if ( replace_all(string, charsmax(string), tempstring, placeholder) )
            ArrayPushCell(hMemory, i);
    }

    // Step 2: censor blacklisted words.
    size = ArraySize(hBlacklist);

    for ( new i ; i < size ; i++ ) {
        ArrayGetString(hBlacklist, i, tempstring, charsmax(tempstring));
        replace_all(string, charsmax(string), tempstring, "***");
    }

    // Step 3: restore the whitelisted words.
    size = ArraySize(hMemory);

    for ( new i ; i < size ; i++ ) {
        new index = ArrayGetCell(hMemory, i);
        placeholder[1] = index + 1;
        ArrayGetString(hWhitelist, index, tempstring, charsmax(tempstring));
        replace_all(string, charsmax(string), placeholder, tempstring);
    }

    ArrayClear(hMemory);

    server_print("POST: %s", string);
}

Code:

PRE: shit doshite dipshit shite
POST: *** doshite *** ***
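
For the behaviour asked for in the first post (only the blacklisted part is starred, as in "dip****"), a different technique than placeholders is to compare match positions: record where whitelisted words occur, then star any blacklisted occurrence that does not lie inside one of them. A minimal Python sketch for illustration only; find_spans and censor are hypothetical names, and the sketch assumes lowercasing does not change string length (true for ASCII chat text).

```python
def find_spans(text, words):
    """Return (start, end) for every occurrence of every word, ignoring case."""
    spans = []
    lowered = text.lower()  # assumes lowercasing keeps the length unchanged
    for word in words:
        word = word.lower()
        if not word:
            continue
        start = lowered.find(word)
        while start != -1:
            spans.append((start, start + len(word)))
            start = lowered.find(word, start + 1)
    return spans

def censor(text, blacklist, whitelist):
    protected = find_spans(text, whitelist)
    chars = list(text)
    for start, end in find_spans(text, blacklist):
        # Skip a hit that lies entirely inside a whitelisted occurrence.
        if any(p_start <= start and end <= p_end for p_start, p_end in protected):
            continue
        for i in range(start, end):
            chars[i] = "*"
    return "".join(chars)

print(censor("Fuck off dipshit", ["shit"], ["doshite"]))
print(censor("you say doshite", ["shit"], ["doshite"]))
```

With only "shit" blacklisted, "shite" comes out as "****e"; adding "shite" to the blacklist stars the whole word.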


