AlliedModders


javalia 01-11-2013 14:53

[stock]utf8 safe string cutter
 
I made this function because GetClientName doesn't cut strings in a UTF-8-safe way.

Code:


new const UTF8MULTIBYTECHAR = (1 << 7);

//this function assumes the string is correctly encoded utf8 that may have been cut in the middle of a multi byte character.
stock terminateUTF8String(String:buffer[], const maxlength = -1){

        if(maxlength > 0){

                buffer[maxlength - 1] = '\0';

        }

        new length = strlen(buffer);
        new bytescounted = 0;//continuation bytes seen so far, scanning from the end

        if(length <= 0){

                return 0;

        }

        for(new i = length - 1; i >= 0; i--){

                if((UTF8MULTIBYTECHAR & buffer[i]) == 0){

                        return 0;//it's a single byte character, we have nothing to do.

                }else{

                        //j walks down from the bit below the top one; the first zero bit tells what kind of byte this is
                        for(new j = 1; j <= 7; j++){

                                if(((UTF8MULTIBYTECHAR >> j) & buffer[i]) == 0){

                                        if(j == 1){

                                                //10xxxxxx: it's a continuation byte of a multi byte character
                                                bytescounted++;
                                                break;

                                        }else{

                                                //it's the starting byte of a j byte character, so it needs j - 1 continuation bytes after it. if we counted a different number, the character was sliced, so cut it off here.
                                                if(bytescounted != (j - 1)){

                                                        buffer[i] = '\0';

                                                }

                                                return 0;

                                        }

                                }

                        }

                }

        }

        return 0;

}

Changed the code a bit; tested and working.
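
A minimal usage sketch, assuming the terminateUTF8String stock above is in the same file. The wrapper name GetClientNameUTF8Safe is hypothetical and not part of SourceMod; it only fetches the name and then trims a sliced trailing character, if there is one.

```sourcepawn
#include <sourcemod>

// Hypothetical wrapper (not a SourceMod API): get the client's name, then
// drop a trailing multi-byte character if it was cut in half.
stock bool:GetClientNameUTF8Safe(client, String:name[], maxlength)
{
        if(!GetClientName(client, name, maxlength)){

                // No name available: hand back an empty string.
                name[0] = '\0';
                return false;

        }

        // Passing maxlength also forces a terminator on the last cell.
        terminateUTF8String(name, maxlength);
        return true;
}
```

It would be called like GetClientName, e.g. with a buffer declared as String:name[MAX_NAME_LENGTH] and sizeof(name) as the length.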

Powerlord 01-11-2013 15:57

Re: [stock]utf8 safe string cutter
 
I'm not sure exactly what all this does, but...

Code:

if(UTF8MULTIBYTECHAR & buffer[i] == '\0'){

Why not just use IsCharMB here?

Having said that, if you're dealing with GetClientName, the String you put it in should be MAX_NAME_LENGTH in size.

javalia 01-12-2013 02:52

Re: [stock]utf8 safe string cutter
 
Ah, I forgot about that function. However, I still need that constant for the other bit operations, so I will just leave it.

Also, even if you pass a buffer of MAX_NAME_LENGTH, the game's client name string itself is cut in a non-UTF-8-safe way, so it can return a string with a sliced multi-byte character at the end.
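
To illustrate that case, here is a small sketch that builds a name ending in a sliced character by hand and runs it through terminateUTF8String from the first post (assumed to be in the same file). The command name sm_utf8cut_test is made up for this example.

```sourcepawn
#include <sourcemod>

public OnPluginStart()
{
        // Hypothetical test command, only for this illustration.
        RegServerCmd("sm_utf8cut_test", Command_Utf8CutTest);
}

public Action:Command_Utf8CutTest(args)
{
        // "é" is the two bytes 0xC3 0xA9 in UTF-8. Keep only the lead byte
        // to imitate a name that was cut in the middle of the character.
        new String:name[8];
        name[0] = 'a';
        name[1] = 'b';
        name[2] = 0xC3;
        name[3] = '\0';

        PrintToServer("length before: %d", strlen(name));

        // The lone lead byte has no continuation byte after it, so the
        // function is expected to overwrite it with a terminator.
        terminateUTF8String(name, sizeof(name));

        PrintToServer("length after: %d", strlen(name));

        return Plugin_Handled;
}
```

A complete name such as "ab" followed by both bytes of "é" should pass through unchanged, since the lead byte then has the one continuation byte it asks for.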

javalia 01-12-2013 07:53

Re: [stock]utf8 safe string cutter
 
Eh... and IsCharMB doesn't look well suited to handling bytes in a String[].

P.S.
Or is it OK? Well, I have no idea, but I will leave my code as it is because it's not bad.

P.S.
Well, the function seems OK.

Root_ 01-18-2013 10:11

Re: [stock]utf8 safe string cutter
 
When I try to rename a player using UTF-8 characters, it prints a corrupted result (symbols). Can your snippet fix that?

RedSword 01-18-2013 11:00

Re: [stock]utf8 safe string cutter
 
Quote:

Originally Posted by Root_ (Post 1875596)
When I try to rename a player using UTF-8 characters, it prints a corrupted result (symbols). Can your snippet fix that?

Could you give an example of how to reproduce the problem? Could it be a file encoding problem on your side?

Root_ 01-18-2013 11:09

Re: [stock]utf8 safe string cutter
 
I used the default playercommands plugin, i.e. sm_rename #userid|name, with a new name containing UTF-8 symbols (see the Russian alphabet).
It showed ********* instead of the correct symbols.

GoD-Tony 05-11-2013 14:57

Re: [stock]utf8 safe string cutter
 
Quote:

Originally Posted by javalia (Post 1871699)
I made this function because GetClientName doesn't cut strings in a UTF-8-safe way.

I ran into this same problem today, thanks. Here is a modified version I use which calls IsCharMB:
Code:

stock TerminateNameUTF8(String:name[])
{
    new len = strlen(name);
    
    for (new i = 0; i < len; i++)
    {
        new bytes = IsCharMB(name[i]);
        
        if (bytes > 1)
        {
            if (len - i < bytes)
            {
                name[i] = '\0';
                return;
            }
            
            i += bytes - 1;
        }
    }
}

Apparently Steam allows more bytes in a name than the game does, so you can end up with a half-UTF8 character at the end of a GetClientName call. MySQL errors out when attempting to Insert these strings. Very useful fix in this case.
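
A rough sketch of how the fix might sit in front of a MySQL insert, assuming TerminateNameUTF8 from this post is in the same file. SaveClientName, the callback, and the player_names table with its name column are all hypothetical, and db is assumed to be an already connected database Handle.

```sourcepawn
#include <sourcemod>

// Hypothetical helper: stores the client's name after trimming a half
// character from the end. Table and column names are made up.
stock SaveClientName(Handle:db, client)
{
        decl String:name[MAX_NAME_LENGTH];

        if (!GetClientName(client, name, sizeof(name)))
        {
                return;
        }

        // Drop a sliced multi-byte character before it reaches MySQL.
        TerminateNameUTF8(name);

        // SQL_EscapeString can need up to 2 * length + 1 bytes.
        decl String:escaped[MAX_NAME_LENGTH * 2 + 1];
        SQL_EscapeString(db, name, escaped, sizeof(escaped));

        decl String:query[512];
        Format(query, sizeof(query), "INSERT INTO player_names (name) VALUES ('%s')", escaped);

        SQL_TQuery(db, SaveClientName_Callback, query);
}

// Threaded query callback: only logs a failure.
public SaveClientName_Callback(Handle:owner, Handle:hndl, const String:error[], any:data)
{
        if (hndl == INVALID_HANDLE)
        {
                LogError("Name insert failed: %s", error);
        }
}
```

Trimming before escaping keeps the escaped copy free of the incomplete trailing bytes as well.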

