I am working with different quantized implementations of the same model, the main difference being the precision of the weights, biases, and activations. So I'd like to know how I can find the difference between the size of a model in MBs that's in say 32-bit floating point, and one that's in int8. I have the models saved in .PTH format.
↧