I have a binary file made of data packets serialized like this:
[Length][Payload][Length][Payload][Length][Payload]
The length field is always 4 bytes, and its value varies from packet to packet with no specific pattern. The value includes the 4 bytes of the [Length] field itself. I need to extract the byte position in the file of the first byte of every [Length]. For example:
00 00 02 00 FF FF FF FF FF ...
00 01 3C E5 FF FF FF FF FF ...
00 00 A5 90 FF FF FF FF FF ...
^
Need to save all these indexes
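To make the layout concrete, here is a minimal sketch that builds a tiny stream in this format and walks it. The helper names `build_stream` and `packet_offsets` are hypothetical, and big-endian unsigned lengths are an assumption based on the hex example above (`00 00 02 00` read as 512).

```python
import struct

def build_stream(payloads):
    """Serialize payloads as [Length][Payload]...; Length counts its own 4 bytes."""
    out = bytearray()
    for payload in payloads:
        # Assumption: big-endian unsigned 32-bit length field.
        out += struct.pack(">I", len(payload) + 4)
        out += payload
    return bytes(out)

def packet_offsets(buf):
    """Return the byte offset of the first byte of every [Length] field."""
    offsets = []
    pos = 0
    while pos < len(buf):
        offsets.append(pos)
        # Jump over the whole packet: the length already includes its own 4 bytes.
        pos += int.from_bytes(buf[pos:pos + 4], "big")
    return offsets

sample = build_stream([b"\xff" * 5, b"\xff" * 2, b"\xff" * 9])
print(packet_offsets(sample))  # [0, 9, 15]
```

The three packets are 9, 6, and 13 bytes long, so the [Length] fields start at offsets 0, 9, and 15.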
This works but it's slow, since I have more than 100k data packets per file:
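import mmap
import os
import time

# `filename` is an already-open file object; `bytes2int` is my helper
# that converts 4 bytes to an int (not shown here).
data = mmap.mmap(filename.fileno(), 0, access=mmap.ACCESS_READ)
fileSize = os.path.getsize(filename.name)
address = 0
addresses = []
start = time.time()
while address < fileSize:
    pkt_length = bytes2int(data[address:(address + 4)])
    addresses.append(address)
    address += pkt_length
end = time.time()
print(len(addresses))
print(end-start)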
What can I use to do this faster?
EDIT 1
Here is a performance example with a 4.2GB file:
45559456
21.271047115325928
The RAM consumption goes pretty high too.
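One direction that might help with both points, sketched here without any benchmark: read each length in place with `struct.unpack_from` instead of slicing, and store the offsets in an `array` of 8-byte integers rather than a list of Python int objects. The function name `scan_offsets` is hypothetical, and big-endian unsigned lengths are again an assumption.

```python
import struct
from array import array

# Assumption: big-endian unsigned 32-bit length field.
LENGTH_FIELD = struct.Struct(">I")

def scan_offsets(buf):
    """Collect the offset of every [Length] field in a bytes-like buffer."""
    offsets = array("Q")  # 8 bytes per entry instead of one int object each
    unpack_from = LENGTH_FIELD.unpack_from  # local alias for the hot loop
    pos = 0
    end = len(buf)
    while pos + 4 <= end:
        (pkt_length,) = unpack_from(buf, pos)  # reads in place, no slice copy
        if pkt_length < 4:
            # A length below 4 would never advance past its own field.
            raise ValueError(f"invalid packet length {pkt_length} at offset {pos}")
        offsets.append(pos)
        pos += pkt_length
    return offsets

demo = (struct.pack(">I", 6) + b"ab"
        + struct.pack(">I", 4)
        + struct.pack(">I", 7) + b"xyz")
print(list(scan_offsets(demo)))  # [0, 6, 10]
```

The same function should accept the `mmap` object from the code above, since `unpack_from` works on any object that supports the buffer protocol. Whether this is actually faster on a 4.2GB file would need to be measured.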