Regex to match Cyrillic letters
Pelipoika
03-16-2016, 09:41
How can I match Cyrillic letters (https://en.wikipedia.org/wiki/List_of_Cyrillic_letters) with a regex?
A single Cyrillic character could probably be matched with
[\u0400-\u04FF]
according to this page: https://en.wikipedia.org/wiki/Cyrillic_script_in_Unicode
Pelipoika
03-16-2016, 10:20
Regex rgx = new Regex("[\0400-\04FF]");
gives
error 043: character constant exceeds range for packed string
Try
[\\u0400-\\u04FF]
I don't know if the regex engine is going to parse that properly, though. I haven't done much with regex.
Pelipoika
03-16-2016, 10:29
Didn't like that either:
error 010: invalid function or declaration
error 008: must be a constant expression; assumed zero
That doesn't seem right. Did you even put that in quotation marks? Post that line (and maybe a few lines around it).
Pelipoika
03-16-2016, 10:41
public void OnPluginStart()
{
    HookUserMessage(GetUserMessageId("SayText2"), UsrMsg_SayText2, true);
}
Regex rgx = new Regex("[\\u0400-\\u04FF]");
public Action UsrMsg_SayText2(UserMsg msg_id, Handle msg, const int[] players, int playersNum, bool reliable, bool init)
{
    char params[4][64];
    int client = BfReadByte(msg);
    for (int i = 0; i < 4; i++)
        BfReadString(msg, params[i], sizeof(params[]));
    if(rgx.Match(params[1]))
    {
    }
}
The initialization has to be done inside a function body, you can't just leave it outside like that.
Yeah, and if you insist on making the regex a global, you can declare it outside a function and then do 'regex = new Regex();' inside one.
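For reference, the "declare globally, construct inside a function" advice could look roughly like the sketch below. The name g_Cyrillic is made up for illustration, and the pattern is the byte-escaped one that ends up working at the end of this thread; treat it as an assumption-laden sketch, not the only way to do it.

```sourcepawn
#include <sourcemod>
#include <regex>

// Hypothetical global name. Only the declaration lives at file scope,
// so the handle starts out as null.
Regex g_Cyrillic;

public void OnPluginStart()
{
    // Construction is a native call, so it has to happen inside a function body.
    g_Cyrillic = new Regex("[\xd0\x80-\xd3\xbf]", PCRE_UTF8);

    if (g_Cyrillic == null)
    {
        // The pattern failed to compile; stop the plugin instead of calling Match on null.
        SetFailState("Could not compile the Cyrillic regex");
    }
}
```

Constructing the handle once in OnPluginStart also avoids recompiling the pattern on every chat message.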
Pelipoika
03-16-2016, 13:21
That compiled, but
Regex rgx = new Regex("[\\u0400-\\u04FF]");
if(rgx != INVALID_HANDLE)
{
    if(rgx.Match(params[1]))
        PrintToServer("%N", client);
}
else
    PrintToServer("Invalid handle");
always prints "Invalid handle".
Maybe it needs the PCRE_UTF8 flag for Unicode?
Regex rgx = new Regex("[\\u0400-\\u04FF]", PCRE_UTF8);
Pelipoika
03-17-2016, 05:36
Still an invalid handle
I think that's because the Regex pattern couldn't be compiled. You can check that by retrieving the error while constructing the handle, as you can see here:
public native Regex(const char[] pattern, int flags = 0, char[] error="", int maxLen = 0, RegexError &errcode = REGEX_ERROR_NONE);
Just pass a few more arguments to the constructor.
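A minimal sketch of what passing those extra arguments might look like, based only on the constructor signature quoted above. The buffer size and variable names are arbitrary choices. As far as I know, PCRE does not accept \u escapes by default (its own code point syntax is \x{...}), so the error text here would likely point at that, but that part is an assumption until you actually run it.

```sourcepawn
#include <sourcemod>
#include <regex>

public void OnPluginStart()
{
    char error[256];      // receives PCRE's human-readable compile error
    RegexError errcode;   // receives the numeric error code

    // Same pattern that kept returning an invalid handle in the posts above.
    Regex rgx = new Regex("[\\u0400-\\u04FF]", PCRE_UTF8, error, sizeof(error), errcode);

    if (rgx == null)
    {
        // Print why the pattern could not be compiled.
        PrintToServer("Regex failed to compile: %s (code %d)", error, errcode);
        return;
    }

    // Free the handle if this was only a test.
    delete rgx;
}
```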
That being said, I don't know if there is a way to achieve that when strings are made of byte-long characters. Maybe by splitting each Unicode character into 2 one-byte characters? I don't know how that would work, but try the following:
Regex rgx = new Regex("[\xd0\x80-\xd3\xbf]", PCRE_UTF8);
Character hex codes are according to this page: http://www.utf8-chartable.de/unicode-utf8-table.pl?start=1024.
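For anyone wondering where \xd0\x80 and \xd3\xbf come from: they are the two-byte UTF-8 encodings of U+0400 and U+04FF, the ends of the Cyrillic block. The sketch below does the bit arithmetic; PrintUtf8Pair is a made-up helper name, and it only handles code points in the two-byte range U+0080 to U+07FF.

```sourcepawn
#include <sourcemod>

// Hypothetical helper: prints the two-byte UTF-8 form of a code point.
// Only valid for U+0080..U+07FF, which includes the U+0400..U+04FF Cyrillic block.
void PrintUtf8Pair(int codepoint)
{
    int lead  = 0xC0 | (codepoint >> 6);   // 110xxxxx: the upper 5 bits
    int trail = 0x80 | (codepoint & 0x3F); // 10xxxxxx: the lower 6 bits
    PrintToServer("U+%04X -> \\x%02x\\x%02x", codepoint, lead, trail);
}

public void OnPluginStart()
{
    PrintUtf8Pair(0x0400); // lead byte 0xD0, trail byte 0x80
    PrintUtf8Pair(0x04FF); // lead byte 0xD3, trail byte 0xBF
}
```

With PCRE_UTF8 set, the engine should read each byte pair in the pattern as a single code point, so the class covers U+0400 through U+04FF. Without the flag the same bytes would be treated as separate one-byte characters, which is not the range you want.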
Pelipoika
03-17-2016, 06:41
That works, thank you