
[stock]utf8 safe string cutter


javalia
Senior Member
Join Date: May 2009
Location: korea, republic of
Old 01-11-2013 , 15:53   [stock]utf8 safe string cutter
Reply With Quote #1

I made this function because GetClientName doesn't cut strings in a UTF-8-safe way.

Code:
new const UTF8MULTIBYTECHAR = (1 << 7);

//This function assumes the string is correctly encoded UTF-8 that was cut in a non-UTF-8-safe way.
stock terminateUTF8String(String:buffer[], const maxlength = -1){
	
	if(maxlength > 0){
	
		buffer[maxlength - 1] = '\0';
		
	}
	
	new length = strlen(buffer);
	new bytescounted = 0;
	
	if(length <= 0){
	
		return 0;
	
	}
	
	for(new i = length - 1; i >= 0; i--){
		
		if((UTF8MULTIBYTECHAR & buffer[i]) == 0){

			return 0;//It's a single-byte character, so there is nothing to do.

		}else{
		
			//j walks down from the top bit: the first cleared bit tells us what kind of byte this is.
			for(new j = 1; j <= 7; j++){
			
				if(((UTF8MULTIBYTECHAR >> j) & buffer[i]) == 0){
				
					if(j == 1){
					
						//It's a continuation byte of a multibyte character.
						bytescounted++;
						break;
					
					}else{
					
						//It's the starting byte of a multibyte character, so check whether enough continuation bytes were read after it, and cut the string here if not.
						if(bytescounted != (j - 1)){
						
							buffer[i] = '\0';
							
						}
						
						return 0;
					
					}
				
				}
			
			}
		
		}

	}
	
	return 0;

}
Edit: changed the code a bit; tested and working.
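
For anyone wondering how this would be called: a minimal sketch, assuming the terminateUTF8String stock above is in the same file. The wrapper name GetClientNameUTF8Safe is hypothetical and not part of the original post.

```sourcepawn
#include <sourcemod>

// Hypothetical wrapper (not from the original post): reads a client's name,
// then trims a half-cut UTF-8 character from the end, if there is one.
// Assumes terminateUTF8String() from the post above is in scope.
stock bool:GetClientNameUTF8Safe(client, String:name[], maxlength)
{
	if (!GetClientName(client, name, maxlength))
	{
		return false; // the name could not be retrieved
	}

	// GetClientName already null-terminates the buffer, so the optional
	// maxlength argument of terminateUTF8String is left at its default.
	terminateUTF8String(name);
	return true;
}
```

It would typically be used with a buffer declared as String:name[MAX_NAME_LENGTH] and sizeof(name) as the length.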
__________________

Last edited by javalia; 01-12-2013 at 07:11.
Powerlord
AlliedModders Donor
Join Date: Jun 2008
Location: Seduce Me!
Old 01-11-2013 , 16:57   Re: [stock]utf8 safe string cutter
Reply With Quote #2

I'm not sure exactly what all this does, but...

Code:
if(UTF8MULTIBYTECHAR & buffer[i] == '\0'){
Why not just use IsCharMB here?

Having said that, if you're dealing with GetClientName, the String you put it in should be MAX_NAME_LENGTH in size.

Last edited by Powerlord; 01-11-2013 at 16:59.
javalia
Senior Member
Join Date: May 2009
Location: korea, republic of
Old 01-12-2013 , 03:52   Re: [stock]utf8 safe string cutter
Reply With Quote #3

Ah, I forgot about that function. However, because of the other bit operations, I still need that constant,
so I will leave it as is.


Also, even if you pass a string buffer of MAX_NAME_LENGTH size, the game's client name string is itself cut in a non-UTF-8-safe way, so it can return strings that have a sliced multibyte character at the end.
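
To illustrate what a "sliced" character means at the byte level: in UTF-8, the lead byte of a character announces how many bytes the character occupies, and every following byte has the form 10xxxxxx. The helper below is only a sketch of that rule; its name, GetUTF8SequenceLength, is hypothetical and it is not a SourceMod native.

```sourcepawn
// Hypothetical helper: returns how many bytes a UTF-8 character occupies,
// judged from its first byte, or 0 if the byte cannot start a character.
stock GetUTF8SequenceLength(value)
{
	// Keep only the low 8 bits, in case the cell carries sign-extension bits.
	value &= 0xFF;

	if ((value & 0x80) == 0x00)
	{
		return 1; // 0xxxxxxx: plain ASCII
	}
	if ((value & 0xE0) == 0xC0)
	{
		return 2; // 110xxxxx: lead byte of a 2-byte character
	}
	if ((value & 0xF0) == 0xE0)
	{
		return 3; // 1110xxxx: lead byte of a 3-byte character
	}
	if ((value & 0xF8) == 0xF0)
	{
		return 4; // 11110xxx: lead byte of a 4-byte character
	}

	return 0; // 10xxxxxx continuation byte, or an invalid byte
}
```

For example, the Korean character 한 (U+D55C) is encoded as the three bytes ED 95 9C. If a name is cut after ED 95, the lead byte still promises three bytes but only two are present, and that leftover is what has to be removed.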
javalia
Senior Member
Join Date: May 2009
Location: korea, republic of
Old 01-12-2013 , 08:53   Re: [stock]utf8 safe string cutter
Reply With Quote #4

Eh... and IsCharMB doesn't look well suited to handling the bytes in a String[].

P.S.
Or is it OK? Well, I have no idea, but I will leave my code as it is because it's not bad.

P.S.
Well, the function seems to be OK.
__________________

Last edited by javalia; 01-12-2013 at 09:07.
Root_
Veteran Member
Join Date: Jan 2012
Location: ryssland
Old 01-18-2013 , 11:11   Re: [stock]utf8 safe string cutter
Reply With Quote #5

When I try to rename a player to a name with UTF-8 character(s), it prints a corrupted result (symbols). Can your snippet fix that?
RedSword
SourceMod Plugin Approver
Join Date: Mar 2006
Location: Quebec, Canada
Old 01-18-2013 , 12:00   Re: [stock]utf8 safe string cutter
Reply With Quote #6

Quote:
Originally Posted by Root_
When I try to rename a player to a name with UTF-8 character(s), it prints a corrupted result (symbols). Can your snippet fix that?
Could you give an example of how to reproduce the problem? Could that be a file encoding problem on your side?
Root_
Veteran Member
Join Date: Jan 2012
Location: ryssland
Old 01-18-2013 , 12:09   Re: [stock]utf8 safe string cutter
Reply With Quote #7

I used the default playercommands plugin, i.e. sm_rename #userid|name, with a new name containing UTF-8 symbols (e.g. the Russian alphabet).
It showed ********* instead of the correct symbols.
GoD-Tony
Veteran Member
Join Date: Jul 2005
Old 05-11-2013 , 15:57   Re: [stock]utf8 safe string cutter
Reply With Quote #8

Quote:
Originally Posted by javalia
I made this function because GetClientName doesn't cut strings in a UTF-8-safe way.
I ran into this same problem today, thanks. Here is a modified version I use which calls IsCharMB:
PHP Code:
stock TerminateNameUTF8(String:name[])
{
    new len = strlen(name);
    
    for (new i = 0; i < len; i++)
    {
        new bytes = IsCharMB(name[i]);
        
        if (bytes > 1)
        {
            if (len - i < bytes)
            {
                name[i] = '\0';
                return;
            }
            
            i += bytes - 1;
        }
    }
}
Apparently Steam allows more bytes in a name than the game does, so you can end up with a half-UTF8 character at the end of a GetClientName call. MySQL errors out when attempting to Insert these strings. Very useful fix in this case.
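
A rough sketch of how the fix might sit in front of a database insert, assuming TerminateNameUTF8 from this post is in the same file and db is an already-connected database Handle. The helper name GetEscapedClientName is hypothetical.

```sourcepawn
#include <sourcemod>

// Hypothetical helper: fetches a client's name, trims a trailing partial
// UTF-8 character, and escapes the result for use in an SQL query.
// Assumes TerminateNameUTF8() from the post above is in scope.
stock bool:GetEscapedClientName(Handle:db, client, String:escaped[], maxlength)
{
	decl String:name[MAX_NAME_LENGTH];

	if (!GetClientName(client, name, sizeof(name)))
	{
		return false; // the name could not be retrieved
	}

	// Drop a half-cut character before the name reaches the database.
	TerminateNameUTF8(name);

	// The escaped buffer should hold at least 2 * MAX_NAME_LENGTH + 1 bytes.
	return SQL_EscapeString(db, name, escaped, maxlength);
}
```

Trimming before escaping matters here: escaping does not repair an incomplete multibyte sequence, so the invalid bytes would otherwise still reach the query.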