[TUT] Regular Expressions


Exolent[jNr]
Veteran Member
Join Date: Feb 2007
Location: Tennessee
Old 06-11-2012 , 15:33   [TUT] Regular Expressions
Reply With Quote #1

Regular Expressions

Introduction
For information on what regular expressions are (or how to create them), consult the wiki.

RegEx Patterns
Here is a quick reference for patterns:
Code:
+-------------+--------------------------------------------------------------+
| Expression  | Description                                                  |
+-------------+--------------------------------------------------------------+
| .           | Any character except newline.                                |
+-------------+--------------------------------------------------------------+
| \.          | A period (and so on for \*, \(, \\, etc.)                    |
+-------------+--------------------------------------------------------------+
| ^           | The start of the string.                                     |
+-------------+--------------------------------------------------------------+
| $           | The end of the string.                                       |
+-------------+--------------------------------------------------------------+
| \d,\w,\s    | A digit, word character [A-Za-z0-9_], or whitespace.         |
+-------------+--------------------------------------------------------------+
| \D,\W,\S    | Anything except a digit, word character, or whitespace.      |
+-------------+--------------------------------------------------------------+
| [abc]       | Character a, b, or c.                                        |
+-------------+--------------------------------------------------------------+
| [a-z]       | a through z.                                                 |
+-------------+--------------------------------------------------------------+
| [^abc]      | Any character except a, b, or c.                             |
+-------------+--------------------------------------------------------------+
| aa|bb       | Either aa or bb.                                             |
+-------------+--------------------------------------------------------------+
| ?           | Zero or one of the preceding element.                        |
+-------------+--------------------------------------------------------------+
| *           | Zero or more of the preceding element.                       |
+-------------+--------------------------------------------------------------+
| +           | One or more of the preceding element.                        |
+-------------+--------------------------------------------------------------+
| {n}         | Exactly n of the preceding element.                          |
+-------------+--------------------------------------------------------------+
| {n,}        | n or more of the preceding element.                          |
+-------------+--------------------------------------------------------------+
| {m,n}       | Between m and n of the preceding element.                    |
+-------------+--------------------------------------------------------------+
| ??,*?,+?,   | Same as above, but as few as possible.                       |
| {n}?, etc.  |                                                              |
+-------------+--------------------------------------------------------------+
| (expr)      | Capture expr for use with \1, etc.                           |
+-------------+--------------------------------------------------------------+
| (?:expr)    | Non-capturing group.                                         |
+-------------+--------------------------------------------------------------+
| (?=expr)    | Followed by expr.                                            |
+-------------+--------------------------------------------------------------+
| (?!expr)    | Not followed by expr.                                        |
+-------------+--------------------------------------------------------------+
For a near-complete reference, look here:
https://developer.mozilla.org/en/Jav...ar_expressions
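
A few rows of the table are easier to see in action. The sketch below uses Python's re module purely as a stand-in (an assumption: AMXX is not involved here, but these constructs behave the same way), and the sample strings are made up for illustration.

```python
import re

text = "<b>one</b> and <b>two</b>"

# Greedy: .* takes as much as it can, so the match runs to the last </b>.
greedy = re.search(r"<b>.*</b>", text).group(0)

# Lazy: .*? takes as little as it can, so the match stops at the first </b>.
lazy = re.search(r"<b>.*?</b>", text).group(0)

# Capture group with a backreference: \1 must repeat what (\w) captured,
# so this finds doubled characters.
doubled = re.findall(r"(\w)\1", "glass assassin")

# Lookahead: match "STEAM" only when "_0" follows; "_0" itself is not consumed.
lookahead = re.search(r"STEAM(?=_0)", "STEAM_0:1:23456").group(0)

print(greedy)     # <b>one</b> and <b>two</b>
print(lazy)       # <b>one</b>
print(doubled)    # ['s', 's', 's']
print(lookahead)  # STEAM
```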

Example:
Let's try to catch a swear word, like "ass".
We could simply make our pattern "ass" and it would match it.
But, it would also match "glass" or "brass", so how do we fix that?

We add a requirement for what comes in front of "ass".
So we would do this: "\Wass", which requires a non-word character (anything except a letter, digit, or underscore) right before "ass".
Now, "glass" or "brass" won't get picked up, but a message that is just "ass" won't either, because the pattern requires some character in front.

To fix this, we also allow the start of the string in front, like so: "(?:^|\W)ass"
Note that "\W*ass" would not work here: * allows zero characters, so "glass" would match again.

We're still not done yet. What about the word "assassin"?
That will still get picked up because it matches the first "ass" in "assassin".
So now we have to require the end of the string, or a non-word character, after it: "(?:^|\W)ass(?:\W|$)"

Looks good, right? Now, we need to add checking for people who would type variations, like asssss or aaasssss.
So, let's just say one or more A's and at least 2 S's: "(?:^|\W)a+s{2,}(?:\W|$)"

One last thing is checking for non-alphabetic characters made to look like those letters, like a$$.
Well, let's see what looks alike:
a = @, 4, /\, /-\
s = 5, z, $

Now, let's transform them into patterns:
a = (?:[a@4]|\/\\|\/-\\)
s = [s5z\$]

We had to put the A's in a group since /\ and /-\ are more than 1 character and won't work in a character class.
I used a non-capturing group so every A isn't put into the matches when checked.
Also, the backslashes in the A's needed to be escaped. The forward slashes only need escaping when the pattern is written between / delimiters, and the dollar sign does not need it inside a character class, but the extra escapes are harmless.

Add those to the pattern now:
"(?:^|\W)(?:[a@4]|\/\\|\/-\\)+[s5z\$]{2,}(?:\W|$)"

Finally, let's add the ignore case-sensitivity flag so capital letters are matched too:
/(?:^|\W)(?:[a@4]|\/\\|\/-\\)+[s5z\$]{2,}(?:\W|$)/i

That's how you create RegEx patterns. They normally do look crazy when you're done with them, so be sure to comment what they are for in case you need to go back to them later.
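
To check the walkthrough against the example words, here is a rough sketch in Python's re module (a stand-in only; this is not AMXX code). It compares two boundary styles: \W*, which may match zero characters and so tolerates a letter right next to the match, and (?:^|\W) ... (?:\W|$), which demands the edge of the string or one non-word character. The names A, S, loose, and strict are made up for this example.

```python
import re

words = ["ass", "glass", "brass", "assassin", "aaasssss", "a$$", "ASS"]

A = r"(?:[a@4]|\/\\|\/-\\)"  # a, @, 4, /\ or /-\
S = r"[s5z\$]"               # s, 5, z or $

# Optional boundary: \W* is satisfied by zero characters.
loose = re.compile(r"\W*" + A + "+" + S + r"{2,}\W*", re.IGNORECASE)

# Required boundary: start/end of the string, or one non-word character.
strict = re.compile(r"(?:^|\W)" + A + "+" + S + r"{2,}(?:\W|$)", re.IGNORECASE)

loose_hits = [w for w in words if loose.search(w)]
strict_hits = [w for w in words if strict.search(w)]

print(loose_hits)   # every word, including "glass", "brass" and "assassin"
print(strict_hits)  # ['ass', 'aaasssss', 'a$$', 'ASS']
```

Note that the lookalike classes also accept plain digits such as "455", so a filter like this trades some false positives for coverage.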

RegEx Flags
Flags were introduced to AMXX in version 1.8.

These are the flags you can use with your patterns:
  • i - Ignore case
    - Makes the pattern case-insensitive, so /a/i matches both "a" and "A"
  • m - Multiline
    - Affects ^ and $ so that they match the start/end of each line rather than only the start/end of the string
  • s - Single line
    - Affects . so it matches any character, even new line characters
  • x - Extended
    - Ignores whitespace and # comments inside the pattern
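
The effect of each flag can be tried out quickly. This is a sketch in Python, where re.IGNORECASE, re.MULTILINE, re.DOTALL, and re.VERBOSE play the roles of i, m, s, and x; treating them as equivalents of the AMXX flags is an assumption, and the test strings are invented.

```python
import re

# i: "a" also matches "A".
case_hit = re.search(r"a", "A", re.IGNORECASE) is not None

# m: ^ and $ match at every line, not only at the ends of the string.
text = "one\ntwo\nthree"
without_m = re.findall(r"^\w+$", text)
with_m = re.findall(r"^\w+$", text, re.MULTILINE)

# s: . normally stops at a newline; with the flag it matches it too.
dot_default = re.search(r"a.b", "a\nb") is not None
dot_all = re.search(r"a.b", "a\nb", re.DOTALL) is not None

# x: whitespace and # comments inside the pattern are ignored.
steam = re.compile(r"""
    ^STEAM_0:   # fixed prefix
    [01]:       # a 0 or a 1, then a colon
    \d+$        # one or more digits up to the end
""", re.VERBOSE)

print(case_hit)                # True
print(without_m, with_m)       # [] ['one', 'two', 'three']
print(dot_default, dot_all)    # False True
print(steam.search("STEAM_0:1:23456") is not None)  # True
```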

RegEx in AMXX
In most languages, you will see this format for the expression: /pattern/flags
In AMXX, the pattern and flags are separated, and those slashes are taken away.

So if you had this pattern: /[ch]+at/i
It would be split into:
- pattern: [ch]+at
- flags: i
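
The split above can be sketched as a small helper. This is Python, not Pawn, and split_delimited, compile_delimited, and FLAGS are names made up for the example; it is one possible way to do the split, not how the AMXX natives or any plugin actually do it.

```python
import re

# Flag letters mapped to their Python counterparts (an assumed equivalence).
FLAGS = {"i": re.IGNORECASE, "m": re.MULTILINE, "s": re.DOTALL, "x": re.VERBOSE}


def split_delimited(expr):
    """Split '/pattern/flags' into (pattern, flags).

    Input without a leading slash is returned unchanged, with no flags.
    """
    if not expr.startswith("/"):
        return expr, ""
    end = expr.rfind("/")
    if end == 0:
        # Only the opening slash exists; treat the whole input as a pattern.
        return expr, ""
    return expr[1:end], expr[end + 1:]


def compile_delimited(expr):
    """Compile '/pattern/flags' or a bare pattern into a regex object."""
    pattern, letters = split_delimited(expr)
    flags = 0
    for letter in letters:
        flags |= FLAGS[letter]  # raises KeyError on an unknown flag letter
    return re.compile(pattern, flags)


print(split_delimited("/[ch]+at/i"))                            # ('[ch]+at', 'i')
print(compile_delimited("/[ch]+at/i").search("CHAT") is not None)  # True
```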

Do not use Pawn's escape character (as in ^n) to write new lines or other escape sequences inside a pattern.
Instead, use the RegEx escape: \n.

Using RegEx
First, you'll need to include regex.inc:
Code:
#include <regex>

These are the error codes from matching:
Code:
enum Regex
{
    REGEX_MATCH_FAIL = -2,
    REGEX_PATTERN_FAIL,
    REGEX_NO_MATCH,
    REGEX_OK
};

Checking if a string matches a pattern is simple.
Spoiler


Example:
Spoiler


You can get the individual matches (if you supplied groupings).
Spoiler


Example, grabbing the numbers from a SteamID:
Spoiler
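
As a language-neutral illustration of groupings, here is a sketch in Python (not the AMXX natives) that pulls the three numbers out of a SteamID. The pattern and the steam_numbers helper are made up for this example.

```python
import re

# Each pair of parentheses is one grouping, numbered from 1 left to right.
STEAM_PATTERN = r"^STEAM_(\d):([01]):(\d+)$"


def steam_numbers(steam_id):
    """Return the three numbers of a SteamID as ints, or None if it does not match."""
    match = re.search(STEAM_PATTERN, steam_id)
    if match is None:
        return None
    # group(0) would be the whole match; groups() returns groupings 1..3.
    return tuple(int(part) for part in match.groups())


print(steam_numbers("STEAM_0:1:23456"))  # (0, 1, 23456)
print(steam_numbers("BOT"))              # None
```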

Compiling Patterns
When you have a pattern that will be used more than once, it is more efficient to compile it once and reuse the compiled pattern.

Spoiler


Example:
Spoiler


When matching strings against a compiled pattern, you have to use a different match function:
Spoiler


Example:
Spoiler


Another example for compiling:
Spoiler

RegEx Tester
RegEx Tester is a plugin I wrote for checking strings against a pattern from the server console.

Commands:
- regex_pattern <pattern>
- regex_test <test data>

The pattern can be in /pattern/flags format or just the pattern itself.

Example:
Code:
] regex_pattern "^STEAM_0:[01]:\d+$"
Pattern set to: ^STEAM_0:[01]:\d+$
Flags set to: 
] regex_test "STEAM_0:1:23456"
1 matches
1. "STEAM_0:1:23456"
] regex_pattern "/^STEAM_0:[01]:\d+$/i"
Pattern set to: ^STEAM_0:[01]:\d+$
Flags set to: i
] regex_test "steam_0:1:23456"
1 matches
1. "steam_0:1:23456"
You can use this if you are unsure whether or not your pattern will work.

Notes
If you have more information, don't understand something, or need help with a pattern, feel free to post.
Attached Files
File Type: sma regex_tester.sma (2.3 KB)
__________________
No private work or selling mods.
Quote:
Originally Posted by xPaw View Post
I love you exolent!

Last edited by Exolent[jNr]; 06-13-2012 at 17:07.
meTaLiCroSS
Gaze Upon My Hat
Join Date: Feb 2009
Location: Viña del Mar, Chile
Old 06-11-2012 , 17:03   Re: [TUT] Regular Expressions
Reply With Quote #2

Excellent
__________________
Quote:
Originally Posted by joropito View Post
You're right Metalicross
mabaclu
Senior Member
Join Date: Jun 2010
Location: Portugal
Old 06-12-2012 , 10:22   Re: [TUT] Regular Expressions
Reply With Quote #3

Quote:
Originally Posted by meTaLiCroSS View Post
Excellent
*Exolent
meTaLiCroSS
Gaze Upon My Hat
Join Date: Feb 2009
Location: Viña del Mar, Chile
Old 06-12-2012 , 14:52   Re: [TUT] Regular Expressions
Reply With Quote #4

Quote:
Originally Posted by mabaclu View Post
*Exolent
No, excellent job
<VeCo>
Veteran Member
Join Date: Jul 2009
Location: Bulgaria
Old 06-12-2012 , 15:26   Re: [TUT] Regular Expressions
Reply With Quote #5

Exolent job.
The RegEx Tester is very useful, the community really needed a tutorial about regular expressions!
fysiks
Veteran Member
Join Date: Sep 2007
Location: Flatland, USA
Old 06-12-2012 , 19:03   Re: [TUT] Regular Expressions
Reply With Quote #6

3 years too late for me. Nice addition, Exolent.
Backstabnoob
Veteran Member
Join Date: Feb 2009
Location: Iwotadai Dorm
Old 06-13-2012 , 09:23   Re: [TUT] Regular Expressions
Reply With Quote #7

Thanks, now I have to go and read some regex tutorial, because finding the patterns in online libraries sucks.
__________________
Currently busy working on a very large scale anime database project.

Last edited by Backstabnoob; 06-13-2012 at 09:23.
Exolent[jNr]
Veteran Member
Join Date: Feb 2007
Location: Tennessee
Old 06-13-2012 , 17:07   Re: [TUT] Regular Expressions
Reply With Quote #8

Added quick information on patterns and how to make them. Hope that helps.
Backstabnoob
Veteran Member
Join Date: Feb 2009
Location: Iwotadai Dorm
Old 06-17-2012 , 19:21   Re: [TUT] Regular Expressions
Reply With Quote #9

Can you show me how a pattern for the cron format would look? This is how it is in PHP:
Code:
/^((\*(\/[0-9]+)?)|[0-9\-\,\/]+)\s+((\*(\/[0-9]+)?)|[0-9\-\,\/]+)\s+((\*(\/[0-9]+)?)|[0-9\-\,\/]+)\s+((\*(\/[0-9]+)?)|[0-9\-\,\/]+)\s+((\*(\/[0-9]+)?)|[0-9\-\,\/]+)$/i
It's just too complex for me.

Also, can you split a string using a regex pattern in pawn?

Last edited by Backstabnoob; 06-17-2012 at 19:31.
Exolent[jNr]
Veteran Member
Join Date: Feb 2007
Location: Tennessee
Old 06-17-2012 , 20:50   Re: [TUT] Regular Expressions
Reply With Quote #10

I'm not sure what exactly the cron format is, so I can't really tell you.
You can just use that pattern (separate pattern and flags when using it) in your code as is.
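
For what it's worth, the five fields in that pattern are identical, so it can be written once and repeated. A sketch in Python (a stand-in, not Pawn), assuming only what the PHP pattern itself says about a field; FIELD, CRON, and looks_like_cron are names made up here. The i flag is left out because the pattern contains no letters.

```python
import re

# One field: "*" with an optional "/step", or digits mixed with - , / separators.
FIELD = r"(?:\*(?:/[0-9]+)?|[0-9\-,/]+)"

# Five fields separated by whitespace; {4} repeats "whitespace + field".
CRON = re.compile(r"^" + FIELD + r"(?:\s+" + FIELD + r"){4}$")


def looks_like_cron(line):
    """True if the line has five fields of the shape the pattern accepts."""
    return CRON.search(line) is not None


print(looks_like_cron("*/5 * * * *"))   # True
print(looks_like_cron("0 12 * * 1-5"))  # True
print(looks_like_cron("* * * *"))       # False, only four fields
```

Like the original, this only checks the shape of each field, not whether the numbers are in range.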
All times are GMT -4.