AlliedModders

Can't compile regex expression. (https://forums.alliedmods.net/showthread.php?t=309240)

T1MOXA 07-18-2018 15:34

Can't compile regex expression.
 
Code:

#include <regex>
public void OnPluginStart() {
    PrintToServer("%i", CompileRegex("[\x{D800}-\x{DBFF}]", PCRE_CASELESS));
}

Return: 0

What am I doing wrong?

Update: the same problem occurs with the "[\\x{D800}-\\x{DBFF}]" expression.

Dr!fter 07-18-2018 16:33

Re: Can't compile regex expression.
 
How about you, idk, use the error params...

Code:

Regex CompileRegex(const char[] pattern, int flags, char[] error, int maxLen, RegexError &errcode)

I'll just add that you should use SM 1.9 for those (since they were broken previously).

T1MOXA 07-18-2018 16:47

Re: Can't compile regex expression.
 
Quote:

Originally Posted by Dr!fter (Post 2604638)
How about you, idk, use the error params...

Code:

Regex CompileRegex(const char[] pattern, int flags, char[] error, int maxLen, RegexError &errcode)

I'll just add that you should use SM 1.9 for those (since they were broken previously).

Thanks...

Code:

#include <regex>

public void OnPluginStart() {
    char sError[128];
    RegexError rError;
    Handle hExpression = CompileRegex("[\\x{D800}-\\x{DBFF}]", PCRE_UCP, sError, sizeof(sError), rError);
    PrintToServer("Regex: %i | Error: %s | RegexError: %i", hExpression, sError, rError);
}


Regex: 0 | Error: this version of PCRE is not compiled with Unicode property support | RegexError: 16

sm exts list
Regex (1.9.0.6241): Provides regex natives for plugins

With PCRE_CASELESS flag:
Regex: 0 | Error: character value in \x{...} sequence is too large | RegexError: 3

asherkin 07-18-2018 18:22

Re: Can't compile regex expression.
 
The root of your problem seems to be:

Quote:

Constraints on character values

Characters that are specified using octal or hexadecimal numbers are limited to certain values, as follows:

8-bit non-UTF mode     less than 0x100
8-bit UTF-8 mode       less than 0x10ffff and a valid codepoint
16-bit non-UTF mode    less than 0x10000
16-bit UTF-16 mode     less than 0x10ffff and a valid codepoint
32-bit non-UTF mode    less than 0x100000000
32-bit UTF-32 mode     less than 0x10ffff and a valid codepoint

Invalid Unicode codepoints are the range 0xd800 to 0xdfff (the so-called "surrogate" codepoints), and 0xffef.
https://www.pcre.org/original/doc/html/pcrepattern.html
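One way to read the quoted constraint: 0xD800 to 0xDBFF are the "high surrogate" values, which only occur as code units in UTF-16. In UTF-8 an emoji arrives as a single 4-byte sequence, so there is no surrogate for the character class to match. The sketch below is illustration only and uses Python rather than SourcePawn; the helper names are hypothetical, and U+1F608 is the character behind the bytes F0 9F 98 88.

```python
from typing import List

# Illustration only: Python stands in for the thread's SourcePawn here.
EMOJI = "\U0001F608"  # encodes to the bytes F0 9F 98 88 in UTF-8


def utf16_code_units(ch: str) -> List[int]:
    """Return the UTF-16 code units used to encode a single character."""
    data = ch.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def is_encodable_utf8(code_point: int) -> bool:
    """True if the code point can be encoded as UTF-8 on its own."""
    try:
        chr(code_point).encode("utf-8")
        return True
    except UnicodeEncodeError:
        # Surrogates (0xD800-0xDFFF) are not valid characters in UTF-8.
        return False


units = utf16_code_units(EMOJI)      # a surrogate pair in UTF-16
utf8_bytes = EMOJI.encode("utf-8")   # one 4-byte sequence in UTF-8

print([hex(u) for u in units])
print(utf8_bytes)
print(is_encodable_utf8(0xD800), is_encodable_utf8(0x1F608))
```

A pattern built on the surrogate range can therefore only make sense in an engine that works on UTF-16 code units; in UTF-8 mode the whole code point has to be matched instead.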

T1MOXA 07-18-2018 20:44

Re: Can't compile regex expression.
 
Thanks, but I'm not sure what that tells me...
By the way, this expression works as it should here.
From your answer, I take it that what I want is most likely not possible with regex?
Is there perhaps an alternative way to find emoticons in the text and remove them?

Dr!fter 07-19-2018 11:04

Re: Can't compile regex expression.
 
Regex re = new Regex("[\\pSo]", PCRE_UTF8, error, sizeof(error), iError);

Seems to work. Here is a list of what it captures:
https://www.fileformat.info/info/uni...ry/So/list.htm

T1MOXA 07-20-2018 20:36

Re: Can't compile regex expression.
 
Unfortunately, this rule removes too many characters.
I am writing this regular expression to strip from nicknames the characters that cannot be sent to the MySQL server properly.

The goal is to avoid errors like this:
Incorrect string value: '\xF0\x9F\x98\x88 6...' for column 'name' at row 1
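The bytes in that error (F0 9F 98 88) are a 4-byte UTF-8 sequence, that is, a code point above U+FFFF. A narrower rule than a symbol category is to remove only code points outside the BMP. The sketch below shows the idea in Python, for illustration only; `strip_non_bmp` and the sample nickname are hypothetical. The PCRE equivalent would presumably be the class [\x{10000}-\x{10FFFF}] compiled with PCRE_UTF8, but that is an assumption and has not been tested in SourceMod.

```python
import re

# Matches any code point above U+FFFF, i.e. anything outside the BMP.
# These are exactly the characters that need 4 bytes in UTF-8.
NON_BMP = re.compile("[\U00010000-\U0010FFFF]")


def strip_non_bmp(name: str) -> str:
    """Remove every character that lies outside the Basic Multilingual Plane."""
    return NON_BMP.sub("", name)


# Hypothetical nickname that starts with the bytes from the error message.
raw = b"\xF0\x9F\x98\x88 6ix".decode("utf-8")
cleaned = strip_non_bmp(raw)

print(repr(cleaned))
```

Characters inside the BMP, including accented letters and most currency signs, pass through unchanged, so this removes far less than a rule based on symbol categories.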

asherkin 07-20-2018 20:40

Re: Can't compile regex expression.
 
You should really consider switching to the utf8mb4 charset.

T1MOXA 07-20-2018 22:11

Re: Can't compile regex expression.
 
Then I'll have to rewrite all plugins and web interfaces for a different encoding...
And in general, I'm curious why this rule does not work in SourceMod but works on regex101.

asherkin 07-21-2018 05:38

Re: Can't compile regex expression.
 
No you won’t, the input/output is still UTF-8, it just allows characters outside of the BMP to be stored.
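To make the distinction concrete: MySQL's utf8 charset stores at most 3 bytes per character, which covers the BMP, while utf8mb4 also accepts the 4-byte sequences. The bytes themselves are ordinary UTF-8 either way. A minimal Python sketch, for illustration only, with hypothetical helper names:

```python
def utf8_len(ch: str) -> int:
    """Number of bytes a single character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def needs_utf8mb4(text: str) -> bool:
    """True if the text contains a character outside the BMP (4 bytes in UTF-8)."""
    return any(ord(ch) > 0xFFFF for ch in text)


# One sample per UTF-8 length: ASCII, Latin-1 letter, euro sign, emoji.
lengths = [utf8_len(ch) for ch in ("A", "\u00e9", "\u20ac", "\U0001F48B")]

print(lengths)
print(needs_utf8mb4("T1MOXA"), needs_utf8mb4("\U0001F48B kiss"))
```

Only names for which `needs_utf8mb4` returns true are affected by the charset change; everything else is stored byte for byte as before.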

T1MOXA 07-23-2018 06:31

Re: Can't compile regex expression.
 
Then I get errors like this:
Illegal mix of collations (utf8mb4_general_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '='

...


Although it seems that this problem is due to incorrect database encoding...

T1MOXA 08-22-2018 18:23

Re: Can't compile regex expression.
 
Today I encountered the error "Incorrect string value: '\xF0\x9F\x92\x8B\xF0\x9F...' for column 'name' at row 1".
The database uses the utf8mb4_general_ci collation, and so does the column.
What else could be the problem?

Fyren 08-22-2018 19:19

Re: Can't compile regex expression.
 
That error appears to mean you have bytes that can't be a valid UTF8 character.

T1MOXA 08-23-2018 05:06

Re: Can't compile regex expression.
 
Quote:

Originally Posted by asherkin (Post 2605175)
You should really consider switching to the utf8mb4 charset.

Quote:

Originally Posted by asherkin (Post 2605226)
No you won’t, the input/output is still UTF-8, it just allows characters outside of the BMP to be stored.

I did just that.
Just changed the encoding in the database.

Fyren 08-23-2018 18:44

Re: Can't compile regex expression.
 
If you're sure you've set the table and column charsets, you can set the connection's charset from your plugin: try (SQL_)SetCharset if you're using 1.10, or otherwise maybe run SET NAMES 'utf8mb4'.

T1MOXA 08-29-2018 15:56

Re: Can't compile regex expression.
 
Quote:

Originally Posted by Fyren (Post 2611894)
If you're sure you've set the table and column charsets, you can set the connection's charset from your plugin: try (SQL_)SetCharset if you're using 1.10, or otherwise maybe run SET NAMES 'utf8mb4'.

Yes, this would probably solve the problem, but this option is not suitable: too many plugins use the database, and I would have to rewrite all of them.
It is easier to remove the symbols with a regular expression, but for some reason that does not work in SM.
I'd be interested to hear from @asherkin.

T1MOXA 09-16-2018 20:05

Re: Can't compile regex expression.
 
Thank you, I didn't know about this rule.
Okay, how do I solve my problem?
@asherkin is not responding.

T1MOXA 09-16-2018 20:07

Re: Can't compile regex expression.
 
I have two problems at the moment:
1. The regular expression to remove emoji does not work.
2. Switching to utf8mb4_general_ci didn't help.


database charset - utf8mb4
database collation - utf8mb4_general_ci
table collation - utf8mb4_general_ci
row collation - utf8mb4_general_ci

What's wrong?

