
[Solved] Convert UTF-16 LE to UTF-8


Dragokas
Veteran Member
Join Date: Nov 2017
Location: Ukraine on fire
05-13-2021, 12:32 (#1): Convert UTF-16 LE to UTF-8

Hi,

Is there a ready-made code snippet, or something built-in, for converting UTF-16 LE text to UTF-8?

I'd like to read localization files such as resource\l4d360ui_english.txt.
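For reference, a minimal Python sketch (not SourceMod code; the function name `read_utf16le_units` is hypothetical) of the first step: skip the FF FE byte-order mark that UTF-16 LE files usually start with, then read 16-bit code units with the low byte first.

```python
import codecs
from typing import List


def read_utf16le_units(data: bytes) -> List[int]:
    """Split raw UTF-16 LE bytes into 16-bit code units, skipping a leading BOM."""
    # A UTF-16 LE file usually starts with the byte-order mark FF FE.
    if data.startswith(codecs.BOM_UTF16_LE):
        data = data[2:]
    if len(data) % 2:
        raise ValueError("UTF-16 data must have an even number of bytes")
    # Little-endian: the low byte comes first, the high byte second.
    return [data[i] | (data[i + 1] << 8) for i in range(0, len(data), 2)]
```

Usage, assuming the file is opened in binary mode: `with open(r"resource\l4d360ui_english.txt", "rb") as f: units = read_utf16le_units(f.read())`.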

Thanks.

--
Related topics:
- Bug 5011 - Add native to lookup phrases in valve's translation files
- Getting built-in (valve_*.txt) Translations
- Reverse UTF-8 String! Arabic Chat!
__________________
Expert of CMD/VBS/VB6. Malware analyst. L4D fun (Bloody Witch & FreeZone)
[My plugins] [My tools] [GitHub] [Articles] [HiJackThis+] [Donate]

Last edited by Dragokas; 05-13-2021 at 21:21.
nosoop
Veteran Member
Join Date: Aug 2014
05-13-2021, 18:20 (#2): Re: Convert UTF-16 LE to UTF-8

There's the UCS-2 parsing logic in Doctor McKay's Enhanced Items plugin.
__________________
I do TF2, TF2 servers, and TF2 plugins.
I don't do DMs over Discord -- PM me on the forums regarding inquiries.
AlliedModders Releases / Github / TF2 Server / Donate (BTC / BCH / coffee)
Dragokas
05-13-2021, 19:02 (#3): Re: Convert UTF-16 LE to UTF-8

Thanks, nosoop.
That works almost perfectly; it just handles surrogate pairs incorrectly. I've finished studying the algorithm and am going to fix it.
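One common way UCS-2-only logic goes wrong: it treats every 16-bit unit as a whole character, so a character outside the Basic Multilingual Plane, e.g. U+1F600 stored as the pair D83D DE00, comes out as two separate 3-byte sequences instead of one 4-byte sequence. A minimal Python sketch of combining a pair into one code point (`combine_surrogates` is a hypothetical name):

```python
def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 surrogate pair into a single Unicode code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    # The high surrogate carries the top 10 bits and the low surrogate the
    # bottom 10 bits of (code point - 0x10000).
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
```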
Dragokas
05-13-2021, 21:17 (#4): Re: Convert UTF-16 LE to UTF-8

OK, after more research and testing, I fixed the following:

- fixed incorrect handling of 2- and 3-byte character codes
- added support for surrogate pairs
- fixed a missing CloseHandle on the file handle
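A minimal Python sketch of a converter covering the encoding points in the list above (1- to 4-byte UTF-8 output plus surrogate pairs). It is an illustration in Python rather than the plugin's own language, not the code from the pull request; the function name is hypothetical, and replacing unpaired surrogates with U+FFFD is an assumed policy.

```python
import codecs


def utf16le_to_utf8(data: bytes) -> bytes:
    """Convert UTF-16 LE bytes (optionally BOM-prefixed) to UTF-8 bytes."""
    if data.startswith(codecs.BOM_UTF16_LE):
        data = data[2:]
    # Read 16-bit little-endian code units; a trailing odd byte is ignored.
    units = [data[i] | (data[i + 1] << 8) for i in range(0, len(data) - 1, 2)]
    out = bytearray()
    i = 0
    while i < len(units):
        cp = units[i]
        i += 1
        if 0xD800 <= cp <= 0xDBFF and i < len(units) and 0xDC00 <= units[i] <= 0xDFFF:
            # High surrogate followed by low surrogate: one supplementary code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (units[i] - 0xDC00)
            i += 1
        elif 0xD800 <= cp <= 0xDFFF:
            # Unpaired surrogate: substitute U+FFFD (assumed policy).
            cp = 0xFFFD
        if cp < 0x80:
            out.append(cp)                          # 1 byte:  0xxxxxxx
        elif cp < 0x800:
            out += bytes((0xC0 | (cp >> 6),         # 2 bytes: 110xxxxx 10xxxxxx
                          0x80 | (cp & 0x3F)))
        elif cp < 0x10000:
            out += bytes((0xE0 | (cp >> 12),        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
                          0x80 | ((cp >> 6) & 0x3F),
                          0x80 | (cp & 0x3F)))
        else:
            out += bytes((0xF0 | (cp >> 18),        # 4 bytes: 11110xxx + 3 x 10xxxxxx
                          0x80 | ((cp >> 12) & 0x3F),
                          0x80 | ((cp >> 6) & 0x3F),
                          0x80 | (cp & 0x3F)))
    return bytes(out)
```

In Python itself, for well-formed BOM-prefixed input, `data.decode("utf-16").encode("utf-8")` does the same job; the manual version only shows the bit manipulation that a hand-written converter needs.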

I'm not sure how relevant it still is, since Doctor McKay doesn't seem to be active, but I created a pull request anyway.

New code is here.

Further improvements are welcome.

The topic seems to be solved.

--
I also made a diagram to help understand the encodings (attached).

References:
https://en.wikipedia.org/wiki/UTF-8
https://en.wikipedia.org/wiki/UTF-16
https://docs.microsoft.com/en-us/dot...g-introduction
https://habr.com/ru/post/544084/
https://unicode.org/faq/utf_bom.html
Attached images: utf-8.png (25.1 KB), utf-16.png (22.6 KB)
Attached file: Unicode-scheme-xls.zip (4.5 KB)

Last edited by Dragokas; 05-13-2021 at 21:21.
All times are GMT -4.