
encode method for UTF16 and UTF32 in Python [duplicate]

I want to investigate how UTF-16 and UTF-32 work by looking at specific characters in binary.

I know we can use "string".encode("(encoding name)") to check a string's hex value in a specific encoding, and it works fine with UTF-8.

But when it comes to UTF-16 or UTF-32, I found the result is different from the encoded value it is supposed to be.

For example, take the first letter in Japanese, "あ". According to https://www.compart.com/en/unicode/U+3042, its hex values in UTF-8, UTF-16, and UTF-32 are E38182, 3042, and 00003042.

So if I execute the following code:

print("あ".encode('utf-8'))
print("あ".encode('utf-16BE'))
print("あ".encode('utf-32BE'))

I get:

b'\xe3\x81\x82'
b'0B'
b'\x00\x000B'

As you can see, the UTF-8 result is identical to the code table, but the UTF-16 and UTF-32 results look weird. I have no idea how b'0B' (or b'\x00\x000B') is supposed to correspond to 3042. Am I misunderstanding something about the encode method?
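One way to compare the results against the code table is to print the encoded bytes as hex instead of relying on the default bytes repr, which displays any byte in the printable ASCII range as its character (0x30 is "0" and 0x42 is "B"). The following is only a minimal sketch; the helper name hex_of is made up for illustration, while bytes.hex() is the standard method.

```python
# Sketch: show the encoded bytes as hex so they can be compared with the code table.
# hex_of is a hypothetical helper name, not part of any library.

def hex_of(text, encoding):
    """Encode text and return the resulting bytes as an uppercase hex string."""
    return text.encode(encoding).hex().upper()

ch = "あ"  # U+3042

for enc in ("utf-8", "utf-16-be", "utf-32-be"):
    print(enc, hex_of(ch, enc))
# utf-8 E38182
# utf-16-be 3042
# utf-32-be 00003042

# The default repr shows printable ASCII bytes as characters:
print(bytes([0x30, 0x42]))  # b'0B'
```

Read this way, b'0B' is the two bytes 0x30 0x42, and b'\x00\x000B' is 0x00 0x00 0x30 0x42, which matches 3042 and 00003042 from the table.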

