
[TUT] Regular Expressions


  
 
 
Exolent[jNr]
Veteran Member
Join Date: Feb 2007
Location: Tennessee
Posted: 06-11-2012, 15:33

Regular Expressions

Introduction
For information on what regular expressions are (or how to create them), consult the wiki.

RegEx Patterns
Here is a quick reference for patterns:
Code:
+-------------+--------------------------------------------------------------+
| Expression  | Description                                                  |
+-------------+--------------------------------------------------------------+
| .           | Any character except newline.                                |
+-------------+--------------------------------------------------------------+
| \.          | A period (and so on for \*, \(, \\, etc.)                    |
+-------------+--------------------------------------------------------------+
| ^           | The start of the string.                                     |
+-------------+--------------------------------------------------------------+
| $           | The end of the string.                                       |
+-------------+--------------------------------------------------------------+
| \d,\w,\s    | A digit, word character [A-Za-z0-9_], or whitespace.         |
+-------------+--------------------------------------------------------------+
| \D,\W,\S    | Anything except a digit, word character, or whitespace.      |
+-------------+--------------------------------------------------------------+
| [abc]       | Character a, b, or c.                                        |
+-------------+--------------------------------------------------------------+
| [a-z]       | a through z.                                                 |
+-------------+--------------------------------------------------------------+
| [^abc]      | Any character except a, b, or c.                             |
+-------------+--------------------------------------------------------------+
| aa|bb       | Either aa or bb.                                             |
+-------------+--------------------------------------------------------------+
| ?           | Zero or one of the preceding element.                        |
+-------------+--------------------------------------------------------------+
| *           | Zero or more of the preceding element.                       |
+-------------+--------------------------------------------------------------+
| +           | One or more of the preceding element.                        |
+-------------+--------------------------------------------------------------+
| {n}         | Exactly n of the preceding element.                          |
+-------------+--------------------------------------------------------------+
| {n,}        | n or more of the preceding element.                          |
+-------------+--------------------------------------------------------------+
| {m,n}       | Between m and n of the preceding element.                    |
+-------------+--------------------------------------------------------------+
| ??,*?,+?,   | Same as above, but as few as possible.                       |
| {n}?, etc.  |                                                              |
+-------------+--------------------------------------------------------------+
| (expr)      | Capture expr for use with \1, etc.                           |
+-------------+--------------------------------------------------------------+
| (?:expr)    | Non-capturing group.                                         |
+-------------+--------------------------------------------------------------+
| (?=expr)    | Followed by expr.                                            |
+-------------+--------------------------------------------------------------+
| (?!expr)    | Not followed by expr.                                        |
+-------------+--------------------------------------------------------------+
For a near-complete reference, look here:
https://developer.mozilla.org/en/Jav...ar_expressions
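To make a few rows of the table concrete, here is a small sketch. It uses Python's re module as a stand-in, on the assumption that these constructs behave the same way there as in the engine you are targeting; the sample strings and variable names are made up for illustration.

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# + is greedy (as much as possible); +? is lazy (as little as possible).
greedy = re.findall(r"<.+>", text)
lazy = re.findall(r"<.+?>", text)

# (expr) captures, and \1 refers back to whatever the first group matched.
doubled = re.search(r"(\w)\1", "balloon").group()

# (?:expr) groups without capturing, so only the (\d+) group is returned.
numbers = re.findall(r"(?:id|num)=(\d+)", "id=7 num=42")

# (?=expr) and (?!expr) look at what follows without consuming it.
followed_by_px = re.findall(r"\d+(?=px)", "10px 20em 30px")
not_followed_by_px = re.findall(r"\d+(?!px|\d)", "10px 20em 30px")

print(greedy, lazy, doubled, numbers, followed_by_px, not_followed_by_px)
```

The greedy pattern swallows the whole sample string in one match, while the lazy one returns each tag separately.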

Example:
Let's try to catch a swear word, like "ass".
We could simply make our pattern "ass" and it would match it.
But, it would also match "glass" or "brass", so how do we fix that?

We add a non-word-character requirement in front of ass.
So we would do this: "\Wass", which requires a character other than a letter, digit, or underscore before "ass".
Now, "glass" or "brass" won't get picked up, but neither will just "ass", because the pattern requires some character in front.

To fix this, we also allow the start of the string in front, like so: "(?:^|\W)ass"
Simply making the \W optional with "\W*ass" would not work: \W* can match nothing at all, so "glass" would get picked up again.

We're still not done yet. What about the word "assassin"?
That will still get picked up because it matches the first "ass" in "assassin".
So now we have to require a non-word character, or the end of the string, after it: "(?:^|\W)ass(?:\W|$)"

Looks good, right? Now, we need to add checking for people who would type variations, like asssss or aaasssss.
So, let's just say one or more A's and at least 2 S's: "(?:^|\W)a+s{2,}(?:\W|$)"

One last thing is checking for non-alphabetic characters made to look like those letters, like a$$.
Well, let's see what looks alike:
a = @, 4, /\, /-\
s = 5, z, $

Now, let's transform them into patterns:
a = (?:[a@4]|\/\\|\/-\\)
s = [s5z\$]

We had to put the A's in a group since /\ and /-\ are more than 1 character and won't work inside a character class.
I used a non-capturing group so every A isn't put into the matches when checked.
Also, the backslashes in the A's needed to be escaped. Escaping the forward slashes and the dollar sign is optional here, but harmless.

Add those to the pattern now:
"(?:^|\W)(?:[a@4]|\/\\|\/-\\)+[s5z\$]{2,}(?:\W|$)"

Finally, let's add the ignore-case flag so capital letters are matched too:
/(?:^|\W)(?:[a@4]|\/\\|\/-\\)+[s5z\$]{2,}(?:\W|$)/i

That's how you create RegEx patterns. They usually look crazy by the time you're done with them, so be sure to comment what each one is for in case you need to come back to it later.
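A quick way to sanity-check patterns like these is to run them against sample words. The sketch below uses Python's re module as a stand-in (an assumption; it is not the AMXX API). It compares the bare "ass" pattern, the optional \W* form, and a variant that requires a string edge or a non-word character on each side. The names PLAIN, LOOSE, ANCHORED, FULL and hits are hypothetical.

```python
import re

PLAIN = r"ass"
LOOSE = r"\W*ass\W*"                  # \W* may match nothing at all
ANCHORED = r"(?:^|\W)ass(?:\W|$)"     # string edge or non-word char on both sides
FULL = r"(?:^|\W)(?:[a@4]|\/\\|\/-\\)+[s5z\$]{2,}(?:\W|$)"

def hits(pattern, text, flags=0):
    """Return True if the pattern matches anywhere in text."""
    return re.search(pattern, text, flags) is not None

for word in ["ass", "glass", "assassin", "you ass!"]:
    print(word, hits(PLAIN, word), hits(LOOSE, word), hits(ANCHORED, word))

# The look-alike version, with the ignore-case flag applied.
for word in ["A$$", "/-\\55", "gl@ss", "class"]:
    print(word, hits(FULL, word, re.IGNORECASE))
```

With these inputs, LOOSE still reports a hit for "glass" and "assassin" because \W* is allowed to match zero characters, while ANCHORED only hits "ass" and "you ass!".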

RegEx Flags
Flags were introduced to AMXX in version 1.8.

These are the flags you can use with your patterns:
  • i - Ignore case
    - Ignores case sensitivity in your pattern, so /a/i would match both "a" and "A"
  • m - Multiline
    - Affects ^ and $ so that they match the start/end of each line rather than only the start/end of the string
  • s - Single line
    - Affects . so that it matches any character, including newline characters
  • x - Extended
    - Ignores whitespace and # comments inside the pattern
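The effect of each flag is easiest to see side by side. This sketch again uses Python's re module as a stand-in, assuming its re.I, re.M, re.S and re.X flags correspond to i, m, s and x above; the sample strings and variable names are made up.

```python
import re

# i: the same pattern matches both cases.
ignore_case = re.findall(r"a", "aA", re.I)

# m: without it, ^ and $ only anchor to the whole string.
no_multiline = re.findall(r"^\w+$", "one\ntwo")
multiline = re.findall(r"^\w+$", "one\ntwo", re.M)

# s: without it, . refuses to match the newline.
no_dotall = re.search(r"a.b", "a\nb")
dotall = re.search(r"a.b", "a\nb", re.S)

# x: whitespace and # comments inside the pattern are ignored.
commented = r"""
    ^STEAM_0:   # fixed prefix
    [01]:       # a single 0 or 1
    \d+$        # one or more digits
"""
extended = re.search(commented, "STEAM_0:1:23456", re.X)

print(ignore_case, no_multiline, multiline)
print(no_dotall is None, dotall is not None, extended is not None)
```

The x flag is what makes it practical to comment a long pattern piece by piece.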

RegEx in AMXX
In most languages, you will see this format for the expression: /pattern/flags
In AMXX, the pattern and flags are separated, and those slashes are taken away.

So if you had this pattern: /[ch]+at/i
It would be split into:
- pattern: [ch]+at
- flags: i
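If you have patterns written in /pattern/flags form, the split can be done mechanically. Here is a minimal sketch in Python; split_pattern is a hypothetical helper, and it assumes the only flags are the four letters i, m, s and x.

```python
import re

def split_pattern(expr):
    """Split '/pattern/flags' into (pattern, flags).

    Anything that is not in that form is treated as a bare pattern
    with no flags.
    """
    m = re.fullmatch(r"/(.*)/([imsx]*)", expr, re.S)
    if m:
        return m.group(1), m.group(2)
    return expr, ""

print(split_pattern("/[ch]+at/i"))
print(split_pattern(r"^STEAM_0:[01]:\d+$"))
```

Because .* is greedy, the split happens at the last slash, so escaped slashes inside the pattern stay part of the pattern.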

Do not escape newline characters (or other escapable characters) with Pawn's escape character, as in ^n.
Instead, use the RegEx escape: \n.

Using RegEx
First, you'll need to include regex.inc:
Code:
#include <regex>

These are the error codes from matching:
Code:
enum Regex
{
    REGEX_MATCH_FAIL = -2,
    REGEX_PATTERN_FAIL,
    REGEX_NO_MATCH,
    REGEX_OK
};

Checking if a string matches a pattern is simple.
Spoiler


Example:
Spoiler


You can get the individual matches (if you supplied groupings).
Spoiler


Example, grabbing the numbers from a SteamID:
Spoiler
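As a rough illustration of the idea (not the AMXX code itself), the sketch below uses Python's re module to show how groupings pick the numbers out of a SteamID. The capture groups in the pattern, the STEAMID name and the steamid_numbers helper are assumptions added for this example.

```python
import re

# Two groups: the single 0/1 digit and the trailing account number.
STEAMID = r"^STEAM_0:([01]):(\d+)$"

def steamid_numbers(steamid):
    """Return the two numeric parts as ints, or None if it is not a SteamID."""
    m = re.search(STEAMID, steamid)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))

print(steamid_numbers("STEAM_0:1:23456"))
print(steamid_numbers("BOT"))
```

Group 0 is always the whole match, so the captured numbers start at group 1.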

Compiling Patterns
When you have a pattern that will be used more than once, it is more efficient to compile it.

Spoiler


Example:
Spoiler


When matching strings against a compiled pattern, you have to use a different match function:
Spoiler


Example:
Spoiler


Another example for compiling:
Spoiler
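The compile-once, match-many idea is the same in most regex libraries. This sketch shows it with Python's re module as a stand-in (an assumption; the AMXX natives are different); STEAMID_RE and count_valid are hypothetical names.

```python
import re

# Compiled a single time, then reused for every string that is checked.
STEAMID_RE = re.compile(r"^STEAM_0:[01]:\d+$", re.I)

def count_valid(ids):
    """Count how many strings look like a SteamID."""
    return sum(1 for s in ids if STEAMID_RE.search(s))

print(count_valid(["STEAM_0:1:23456", "steam_0:0:1", "BOT", "STEAM_0:2:5"]))
```

The pattern is parsed once up front instead of on every call, which is where the saving comes from when the check runs often.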

RegEx Tester
RegEx Tester is a plugin I wrote that lets you check strings against a pattern from the server console.

Commands:
- regex_pattern <pattern>
- regex_test <test data>

The pattern can be in /pattern/flags format or just the pattern itself.

Example:
Code:
] regex_pattern "^STEAM_0:[01]:\d+$"
Pattern set to: ^STEAM_0:[01]:\d+$
Flags set to: 
] regex_test "STEAM_0:1:23456"
1 matches
1. "STEAM_0:1:23456"
] regex_pattern "/^STEAM_0:[01]:\d+$/i"
Pattern set to: ^STEAM_0:[01]:\d+$
Flags set to: i
] regex_test "steam_0:1:23456"
1 matches
1. "steam_0:1:23456"
You can use this if you are unsure whether or not your pattern will work.

Notes
If you have more information, don't understand something, or need help with a pattern, feel free to post.
Attached Files
regex_tester.sma (2.3 KB)
Last edited by Exolent[jNr]; 06-13-2012 at 17:07.