I found out that some of my '.wav' files are badly written. Below I compare a broken file, 'corrupted_file.wav', with a valid one, 'ok_file.wav'. This is what I get when I try to read 'corrupted_file.wav' using standard libraries such as soundfile, wave, librosa, etc.:
RuntimeError: Error opening 'corrupted_file.wav': File contains data in an unknown format.
So I tried to understand what the issue was by printing the first 40 bytes of each file:
with open('corrupted_file.wav', 'rb') as audiofile:
    corrupted_f = audiofile.read()
with open('ok_file.wav', 'rb') as audiofile:
    ok_f = audiofile.read()
print(corrupted_f[:40])
print(ok_f[:40])
That's what I get:
b'\xff\xfb\x90\xc4\x00\x03\x12\xa9\xa3\x16g\xb0\xc9B\xf4\xb4e\xcd\x94\x9a8\x00\x12\x93\x95\xc5F~\x1e\xa71\xd2q\x18\xa58\xeb\x01\x82\x19'
b'RIFF$`\x08\x00WAVEfmt \x10\x00\x00\x00\x01\x00\x01\x00\x80\xbb\x00\x00\x00\xee\x02\x00\x04\x00\x10\x00data'
As you can see, 'corrupted_file.wav' does not conform to the WAVE format: it lacks the expected 'RIFF', 'WAVE', 'fmt ' and 'data' markers.
Even so, Windows 10 is able to play it with its built-in application.
If I use a standard audio converter to export 'corrupted_file.wav' as a WAVE file, I get 'converted_corrupted_file.wav', whose first bytes become:
b'RIFFF\x16\x11\x00WAVEfmt \x10\x00\x00\x00\x01\x00\x02\x00D\xac\x00\x00\x10\xb1\x02\x00\x04\x00\x10\x00LIST\x1a\x00\x00\x00INFOISFT\x0e\x00\x00\x00Lavf59.27.100\x00data\x00\x16\x11\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\...
I cannot easily relate this to the corrupted file, so I cannot recover it with a homemade function.
How can I automatically recover 'corrupted_file.wav' with a function written in Python? I need this because there are thousands of corrupted files.
UPDATE: I checked the file's format using several online tools and the result is always the same: "does not match any of the known formats". But Windows 10 is still able to decode and play it, so I guess there should be a trick.
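One observation that may help narrow this down, offered as a guess rather than a confirmed diagnosis: the first two bytes of 'corrupted_file.wav', 0xFF 0xFB, match the MPEG audio frame sync pattern (eleven set bits followed by version and layer fields). That would be consistent with an MPEG audio file, such as MP3, saved with a '.wav' extension, and with Windows still being able to play it. The sketch below only inspects the first bytes of a file. The function name sniff_audio_format and its return labels are hypothetical, and the check is a heuristic, not a full format validator.

```python
def sniff_audio_format(header: bytes) -> str:
    """Guess an audio format from the first bytes of a file (heuristic only)."""
    # RIFF/WAVE: b'RIFF', a 4-byte little-endian size, then b'WAVE'
    if len(header) >= 12 and header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    # An ID3v2 tag is often found at the start of MP3 files
    if header[:3] == b"ID3":
        return "mp3"
    # MPEG audio frame sync: 0xFF followed by a byte whose top 3 bits are set
    if len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0:
        layer_bits = (header[1] >> 1) & 0x03
        if layer_bits != 0:  # 0 is a reserved layer value
            return "mpeg-audio"
    return "unknown"


# First bytes copied from the two dumps shown in the question
corrupted_head = b"\xff\xfb\x90\xc4\x00\x03\x12\xa9\xa3\x16g\xb0"
ok_head = b"RIFF$`\x08\x00WAVEfmt \x10\x00\x00\x00"

print(sniff_audio_format(corrupted_head))  # mpeg-audio
print(sniff_audio_format(ok_head))         # wav
```

If the corrupted files do turn out to be MPEG audio, they would not need byte-level repair: a decoder that ignores the file extension could convert them in bulk. The 'Lavf59.27.100' string in the converted file's header suggests the converter was built on FFmpeg's libavformat, so the same tool might be usable from a script.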