I want to investigate how UTF-16 and UTF-32 work by looking at specific characters in binary.
I know we can use "string".encode("(encoding name)")
to check a string's hex value in a specific encoding, and it works fine with UTF-8.
But when it comes to UTF-16 or UTF-32, I found the result is different from the encoded value it is supposed to be.
For example, take the first Japanese letter, "あ". According to https://www.compart.com/en/unicode/U+3042 its hex values in UTF-8, UTF-16, and UTF-32 are E38182, 3042, and 00003042.
So if I execute the following code

    print("あ".encode('utf-8'))
    print("あ".encode('utf-16BE'))
    print("あ".encode('utf-32BE'))

I will get

    b'\xe3\x81\x82'
    b'0B'
    b'\x00\x000B'

As you can see, the UTF-8 result is identical to the code table, but the UTF-16 and UTF-32 results are weird. I have no idea how 000B can convert to 3042. Am I misunderstanding something about the encode method?
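
For comparison, here is a minimal sketch of another way to look at the same bytes. It assumes the mismatch may come from how Python displays a bytes object, not from the encoding itself: the default display shows bytes in the printable ASCII range as characters (0x30 is "0" and 0x42 is "B"), while bytes.hex() prints every byte as two hex digits. The names `ch` and `results` are only illustrative.

```python
# Print each encoding of "あ" (U+3042) as hex digits next to the default bytes display.
ch = "あ"
results = {}

for name in ("utf-8", "utf-16BE", "utf-32BE"):
    encoded = ch.encode(name)
    # bytes.hex() renders every byte as two hex digits, printable or not.
    results[name] = encoded.hex().upper()
    print(name, results[name], encoded)

# The two-byte literal b'0B' and b'\x30\x42' are the same bytes object.
print(b'0B' == b'\x30\x42')
```

If this assumption holds, the hex column should be directly comparable with the values in the code table.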