
Exolent[jNr] 06-11-2012 15:33

[TUT] Regular Expressions
 
Regular Expressions

Introduction
For information on what regular expressions are (or how to create them), consult the wiki.

RegEx Patterns
Here is a quick reference for patterns:
Code:

+-------------+--------------------------------------------------------------+
| Expression  | Description                                                  |
+-------------+--------------------------------------------------------------+
| .           | Any character except newline.                                |
+-------------+--------------------------------------------------------------+
| \.          | A period (and so on for \*, \(, \\, etc.)                    |
+-------------+--------------------------------------------------------------+
| ^           | The start of the string.                                     |
+-------------+--------------------------------------------------------------+
| $           | The end of the string.                                       |
+-------------+--------------------------------------------------------------+
| \d,\w,\s    | A digit, word character [A-Za-z0-9_], or whitespace.         |
+-------------+--------------------------------------------------------------+
| \D,\W,\S    | Anything except a digit, word character, or whitespace.      |
+-------------+--------------------------------------------------------------+
| [abc]       | Character a, b, or c.                                        |
+-------------+--------------------------------------------------------------+
| [a-z]       | a through z.                                                 |
+-------------+--------------------------------------------------------------+
| [^abc]      | Any character except a, b, or c.                             |
+-------------+--------------------------------------------------------------+
| aa|bb       | Either aa or bb.                                             |
+-------------+--------------------------------------------------------------+
| ?           | Zero or one of the preceding element.                        |
+-------------+--------------------------------------------------------------+
| *           | Zero or more of the preceding element.                       |
+-------------+--------------------------------------------------------------+
| +           | One or more of the preceding element.                        |
+-------------+--------------------------------------------------------------+
| {n}         | Exactly n of the preceding element.                          |
+-------------+--------------------------------------------------------------+
| {n,}        | n or more of the preceding element.                          |
+-------------+--------------------------------------------------------------+
| {m,n}       | Between m and n of the preceding element.                    |
+-------------+--------------------------------------------------------------+
| ??,*?,+?,   | Same as above, but as few as possible.                       |
| {n}?, etc.  |                                                              |
+-------------+--------------------------------------------------------------+
| (expr)      | Capture expr for use with \1, etc.                           |
+-------------+--------------------------------------------------------------+
| (?:expr)    | Non-capturing group.                                         |
+-------------+--------------------------------------------------------------+
| (?=expr)    | Followed by expr.                                            |
+-------------+--------------------------------------------------------------+
| (?!expr)    | Not followed by expr.                                        |
+-------------+--------------------------------------------------------------+

For a near-complete reference, look here:
https://developer.mozilla.org/en/Jav...ar_expressions
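
A few rows of the quick reference are easier to follow when you see them run. The sketch below uses Python's re module purely as a stand-in for the regex engine behind AMXX (an assumption, made so the snippet can be run anywhere); the sample strings are made up for illustration.

```python
import re

text = "<b>one</b><b>two</b>"

# Greedy: .* grabs as much as possible, so one match spans both tags.
greedy = re.findall(r"<b>.*</b>", text)

# Lazy: .*? grabs as little as possible, so each tag pair matches separately.
lazy = re.findall(r"<b>.*?</b>", text)

# Capture group plus backreference: \1 must repeat what (\w) captured.
doubled = re.search(r"(\w)\1", "hello").group(0)

# Lookahead: match "foo" only when "bar" follows; "bar" is not consumed.
lookahead = [m.start() for m in re.finditer(r"foo(?=bar)", "foobar foobaz")]

print(greedy)     # ['<b>one</b><b>two</b>']
print(lazy)       # ['<b>one</b>', '<b>two</b>']
print(doubled)    # ll
print(lookahead)  # [0]
```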

Example:
Let's try to catch a swear word, like "ass".
We could simply make our pattern "ass" and it would match it.
But it would also match "glass" or "brass", so how do we fix that?

We add a non-word-character requirement in front of "ass".
So we would do this: "\Wass", which requires a character other than a letter, digit, or underscore before "ass".
Now "glass" and "brass" won't get picked up, but neither will a plain "ass" at the start of the string, because the pattern requires some character in front of it.

Making that requirement optional with "\W*ass" does not work: \W* can match zero characters, so "glass" would be caught again.
Instead, we allow either the start of the string or a non-word character in front, like so: "(?:^|\W)ass"

We're still not done yet. What about the word "assassin"?
That still gets picked up, because the pattern matches the first "ass" in "assassin".
So now we require either a non-word character or the end of the string after it: "(?:^|\W)ass(?:\W|$)"

Looks good, right? Now we need to add checking for people who type variations, like asssss or aaasssss.
So let's say one or more A's and at least 2 S's: "(?:^|\W)a+s{2,}(?:\W|$)"

One last thing is checking for non-alphabetic characters that look like those letters, like a$$.
Well, let's see what looks alike:
a = @, 4, /\, /-\
s = 5, z, $

Now, let's transform them into patterns:
a = (?:[a@4]|\/\\|\/-\\)
s = [s5z\$]

We had to put the A's in a group of alternatives since /\ and /-\ are more than 1 character and won't work inside a character class.
I used a non-capturing group so every A isn't put into the matches when checked.
Also, the backslashes in the A's needed to be escaped. The forward slashes are escaped so the pattern stays valid in /pattern/flags form, and the dollar sign is escaped for safety; neither escape does any harm.

Add those to the pattern now:
"(?:^|\W)(?:[a@4]|\/\\|\/-\\)+[s5z\$]{2,}(?:\W|$)"

Finally, let's add the ignore-case flag so capital letters are checked too:
/(?:^|\W)(?:[a@4]|\/\\|\/-\\)+[s5z\$]{2,}(?:\W|$)/i

That's how you create RegEx patterns, and they normally do look crazy when you're done with them so be sure to comment what they are for in case you need to go back to them later.
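
If you want to see how the lookalike pattern behaves on real input, here is a minimal sketch in Python (used only as a stand-in for the AMXX regex engine, which is an assumption). It runs the pattern in two forms: the bare lookalike classes, and a variant that explicitly requires a string boundary or non-word character on both sides. The sample strings and the names `loose` and `strict` are made up for this example.

```python
import re

# Lookalike pieces from the walkthrough.
A = r"(?:[a@4]|\/\\|\/-\\)"
S = r"[s5z\$]"

# Bare version: the word may sit anywhere, even inside another word.
loose = re.compile(A + "+" + S + "{2,}", re.IGNORECASE)

# Anchored version (an assumption of this sketch): the word must be
# preceded by the start of the string or a non-word character, and
# followed by a non-word character or the end of the string.
strict = re.compile(r"(?:^|\W)" + A + "+" + S + r"{2,}(?:\W|$)", re.IGNORECASE)

samples = ["ass", "you A$$!", "/-\\ss", "aaasssss", "glass", "assassin"]

for s in samples:
    print(f"{s!r:12} loose={bool(loose.search(s))} strict={bool(strict.search(s))}")
```

The first four samples are caught by both versions; "glass" and "assassin" are caught only by the bare version, which is why the boundaries on each side matter.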

RegEx Flags
Flags were introduced to AMXX in version 1.8.

These are the flags you can use with your patterns:
  • i - Ignore case
    - Ignores case sensitivity in your pattern, so /a/i matches both "a" and "A"
  • m - Multiline
    - Affects ^ and $ so that they match the start/end of each line rather than only the start/end of the string
  • s - Single line
    - Affects . so it matches any character, including newline characters
  • x - Extended
    - Ignores whitespace and # comments inside the pattern
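
To make the four flags concrete, here is a small sketch in Python, whose re.I, re.M, re.S and re.X flags correspond to i, m, s and x. Python is only a stand-in here for the engine AMXX uses (an assumption), and the test text is invented.

```python
import re

text = "first line\nSecond line"

# i: ignore case.
case_sensitive = bool(re.search(r"second", text))
ignore_case = bool(re.search(r"second", text, re.I))

# m: ^ and $ also match at the start/end of each line.
string_start = re.findall(r"^\w+", text)
line_starts = re.findall(r"^\w+", text, re.M)

# s: . also matches newline characters.
dot_default = bool(re.search(r"first.*Second", text))
dot_all = bool(re.search(r"first.*Second", text, re.S))

# x: whitespace and # comments inside the pattern are ignored.
steam_id = re.compile(r"""
    ^STEAM_0:   # fixed prefix
    [01]:       # 0 or 1
    \d+$        # account number
""", re.X)
verbose_ok = bool(steam_id.search("STEAM_0:1:23456"))

print(case_sensitive, ignore_case)  # False True
print(string_start, line_starts)    # ['first'] ['first', 'Second']
print(dot_default, dot_all)         # False True
print(verbose_ok)                   # True
```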

RegEx in AMXX
In most languages, you will see this format for the expression: /pattern/flags
In AMXX, the pattern and flags are separated, and those slashes are taken away.

So if you had this pattern: /[ch]+at/i
It would be split into:
- pattern: [ch]+at
- flags: i
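
The split itself is mechanical. As a rough sketch (written in Python for illustration; `split_delimited` is a hypothetical helper, not part of any AMXX include), it could look like this:

```python
def split_delimited(expr):
    """Split '/pattern/flags' into (pattern, flags).

    A string that is not wrapped in slashes is returned unchanged,
    with empty flags.
    """
    if len(expr) >= 2 and expr.startswith("/"):
        # The closing delimiter is the LAST slash, so escaped
        # slashes inside the pattern are left alone.
        end = expr.rfind("/")
        if end > 0:
            return expr[1:end], expr[end + 1:]
    return expr, ""


print(split_delimited("/[ch]+at/i"))  # ('[ch]+at', 'i')
print(split_delimited("[ch]+at"))     # ('[ch]+at', '')
```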

Do not write newlines (or other escapable characters) with Pawn's escape character, as in ^n.
Instead, use the RegEx escape, \n.

Using RegEx
First, you'll need to include regex.inc:
Code:
#include <regex>

These are the error codes from matching:
Code:
enum Regex
{
    REGEX_MATCH_FAIL = -2,
    REGEX_PATTERN_FAIL,
    REGEX_NO_MATCH,
    REGEX_OK
};

Checking if a string matches a pattern is simple.
Spoiler


Example:
Spoiler


You can get the individual matches (if you supplied groupings).
Spoiler


Example, grabbing the numbers from a SteamID:
Spoiler
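
To illustrate the idea of groupings on the SteamID case, here is a sketch in Python (a stand-in only; this is not the AMXX code from the post, and `steam_id_parts` is a made-up name). It takes the SteamID pattern from the RegEx Tester example and wraps the two numbers in capture groups.

```python
import re

# Parentheses mark the two parts we want to pull out.
STEAM_ID = re.compile(r"^STEAM_0:([01]):(\d+)$")


def steam_id_parts(auth_id):
    """Return the two numeric fields of a SteamID as ints, or None."""
    match = STEAM_ID.search(auth_id)
    if match is None:
        return None
    # group(0) is the whole match; group(1) and group(2) are the captures.
    return int(match.group(1)), int(match.group(2))


print(steam_id_parts("STEAM_0:1:23456"))  # (1, 23456)
print(steam_id_parts("BOT"))              # None
```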

Compiling Patterns
When you have a pattern that will be used more than one time, it is more efficient to compile it.

Spoiler


Example:
Spoiler


When matching strings against a compiled pattern, you have to use a different match function:
Spoiler


Example:
Spoiler


Another example for compiling:
Spoiler
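
The compile-once idea is the same in any language. Here is a minimal sketch in Python (a stand-in, not the AMXX natives; `valid_ids` and the sample list are invented): the pattern is compiled a single time and then reused for every string.

```python
import re

# Compiled once at load time; the pattern text is parsed a single time.
VALID_STEAM_ID = re.compile(r"^STEAM_0:[01]:\d+$")


def valid_ids(auth_ids):
    """Keep only the strings that look like SteamIDs."""
    return [a for a in auth_ids if VALID_STEAM_ID.search(a)]


print(valid_ids(["STEAM_0:1:23456", "BOT", "STEAM_0:0:42", "STEAM_ID_LAN"]))
# ['STEAM_0:1:23456', 'STEAM_0:0:42']
```

Python caches recently used patterns on its own, so the gain is smaller there; the snippet only shows the structure of compiling once and matching many times.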

RegEx Tester
RegEx Tester is a plugin I wrote to be able to check strings against a pattern via the server console.

Commands:
- regex_pattern <pattern>
- regex_test <test data>

The pattern can be in /pattern/flags format or just the pattern itself.

Example:
Code:

] regex_pattern "^STEAM_0:[01]:\d+$"
Pattern set to: ^STEAM_0:[01]:\d+$
Flags set to:
] regex_test "STEAM_0:1:23456"
1 matches
1. "STEAM_0:1:23456"
] regex_pattern "/^STEAM_0:[01]:\d+$/i"
Pattern set to: ^STEAM_0:[01]:\d+$
Flags set to: i
] regex_test "steam_0:1:23456"
1 matches
1. "steam_0:1:23456"

You can use this if you are unsure whether or not your pattern will work.

Notes
If you have more information, don't understand something, or need help with a pattern, feel free to post.

meTaLiCroSS 06-11-2012 17:03

Re: [TUT] Regular Expressions
 
Excellent :)

mabaclu 06-12-2012 10:22

Re: [TUT] Regular Expressions
 
Quote:

Originally Posted by meTaLiCroSS (Post 1726949)
Excellent :)

*Exolent

meTaLiCroSS 06-12-2012 14:52

Re: [TUT] Regular Expressions
 
Quote:

Originally Posted by mabaclu (Post 1727294)
*Exolent

No, excellent job :)

<VeCo> 06-12-2012 15:26

Re: [TUT] Regular Expressions
 
Exolent job. :3
The RegEx Tester is very useful, the community really needed a tutorial about regular expressions! :)

fysiks 06-12-2012 19:03

Re: [TUT] Regular Expressions
 
3 years too late for me :). Nice addition Exolent.

Backstabnoob 06-13-2012 09:23

Re: [TUT] Regular Expressions
 
Thanks, now I have to go and read some regex tutorial, because finding the patterns in online libraries sucks.

Exolent[jNr] 06-13-2012 17:07

Re: [TUT] Regular Expressions
 
Added quick information on patterns and how to make them. Hope that helps.

Backstabnoob 06-17-2012 19:21

Re: [TUT] Regular Expressions
 
Can you show me how a pattern for the cron format would look? This is how it is in PHP:
Code:

/^((\*(\/[0-9]+)?)|[0-9\-\,\/]+)\s+((\*(\/[0-9]+)?)|[0-9\-\,\/]+)\s+((\*(\/[0-9]+)?)|[0-9\-\,\/]+)\s+((\*(\/[0-9]+)?)|[0-9\-\,\/]+)\s+((\*(\/[0-9]+)?)|[0-9\-\,\/]+)$/i
It's just too complex for me.

Also, can you split a string using a regex pattern in Pawn?

Exolent[jNr] 06-17-2012 20:50

Re: [TUT] Regular Expressions
 
I'm not sure what exactly the cron format is, so I can't really tell you.
You can just use that pattern (separate pattern and flags when using it) in your code as is.
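
For anyone who finds the quoted pattern hard to read: it is the same field repeated five times with \s+ in between. The sketch below rebuilds it that way in Python (a stand-in for illustration only; `looks_like_cron` is a hypothetical name) and tries it on a few made-up lines.

```python
import re

# One cron field, exactly as written in the quoted pattern.
FIELD = r"((\*(\/[0-9]+)?)|[0-9\-\,\/]+)"

# Five fields separated by whitespace, anchored at both ends.
CRON = re.compile("^" + r"\s+".join([FIELD] * 5) + "$", re.IGNORECASE)


def looks_like_cron(line):
    """True if the line has the shape of a five-field cron schedule."""
    return CRON.search(line) is not None


print(looks_like_cron("*/5 * * * *"))   # True
print(looks_like_cron("0 12 * * 1-5"))  # True
print(looks_like_cron("every day"))     # False
```

Note that this only checks the shape of the line; it does not check that the numbers are valid minutes, hours, and so on.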

