I'm running a script that walks through a large library of .flac music, making a mirror library with the same structure but converted to .opus. I'm doing this on Windows 11, so I believe the source filenames are all in UTF-16. The script calls FFMPEG to do the converting.
For some reason, uncommon characters keep getting converted to different but similar characters when the script runs, for example:
    06 xXXi_wud_nvrstøp_ÜXXx.flac

gets converted to:

    06 xXXi_wud_nvrstøp_ÜXXx.opus

They look almost identical, but the Ü and, I believe, also the ø are technically slightly different characters before and after the conversion.
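A quick way to confirm that two names which look the same really differ is to dump their code points. This is only a minimal sketch: `describe` is a hypothetical helper, and the two sample strings are assumptions chosen to match the values reported in the edit at the end of this post (a plain U followed by a combining diaeresis versus the single precomposed Ü).

```python
import unicodedata


def describe(text):
    """Return one 'U+XXXX NAME' string per code point in text."""
    return [
        "U+{:04X} {}".format(ord(ch), unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


# Assumed sample data: the same visible letter in two encodings.
decomposed = "U\u0308"   # U followed by COMBINING DIAERESIS
precomposed = "\u00DC"   # single code point

print(describe(decomposed))
print(describe(precomposed))
# The raw strings compare unequal, but they match after NFC normalization.
print(decomposed == precomposed)
print(unicodedata.normalize("NFC", decomposed) == precomposed)
```

Running `describe` on the real source and destination filenames (for example, the names returned by `os.listdir`) should show exactly which characters change.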
The function which calls FFMPEG for the conversion looks like this:
    def convert_file(pool, top, file):
        fullPath = os.path.join(top, file)
        # Pass count=1 to str.replace() just in case .flac is in the song
        # title or something.
        newPath = fullPath.replace(src_dir, dest_dir, 1)
        newPath = newPath.replace(".flac", ".opus", 1)
        if os.path.isfile(newPath):
            return None
        else:
            print("{} does not exist".format(newPath))
        cvt = ["Ffmpeg", "-v", "debug", "-i", fullPath, "-c:a", "libopus", "-b:a", "96k", newPath]
        print(cvt)
        return (
            fullPath,
            pool.apply_async(subprocess.run, kwds={"args": cvt, "check": True, "stdin": subprocess.DEVNULL}))

The arguments are being supplied by os.walk with no special parameters.
Given that the script is comparing filenames to check if a conversion needs to happen, and the filenames keep getting changed, it keeps destroying and recreating the same files every time the script runs.
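One possible way to make the existence check tolerant of this, sketched under the assumption that the only difference between the expected and actual names is Unicode normalization: compare NFC-normalized names instead of relying on `os.path.isfile`. The names `nfc` and `exists_normalized` are hypothetical helpers, not part of the script above.

```python
import os
import tempfile
import unicodedata


def nfc(name):
    """Normalize a filename to NFC so equivalent spellings compare equal."""
    return unicodedata.normalize("NFC", name)


def exists_normalized(path):
    """Return True if path's directory holds a file whose NFC name matches."""
    directory, name = os.path.split(path)
    directory = directory or "."
    if not os.path.isdir(directory):
        return False
    wanted = nfc(name)
    with os.scandir(directory) as entries:
        return any(e.is_file() and nfc(e.name) == wanted for e in entries)


# Small demonstration in a temporary directory: the file on disk uses the
# precomposed character, while the lookup uses the decomposed spelling.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "track_\u00DC.opus"), "w").close()
    print(exists_normalized(os.path.join(tmp, "track_U\u0308.opus")))
```

This lists the destination directory on every check, which is slower than a single `os.path.isfile` call. If the output names always turn out to be NFC, normalizing `newPath` with `unicodedata.normalize("NFC", ...)` before the check might be enough, but that is an assumption I have not verified.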
Why might this be happening?
edit: I have confirmed that if I manually execute the FFMPEG command in CMD, bypassing Python completely, it converts the original octal 0125 + 01410 (U+0055 LATIN CAPITAL LETTER U followed by U+0308 COMBINING DIAERESIS) to octal 0334 (U+00DC, the single precomposed U with umlaut), so this doesn't seem to be a Python problem.