I have the following code, in which I multiply a tensor `X` by a matrix `C`. Depending on the size of `X` and on whether `C` is attached to the computation graph, I get different results when comparing batched multiplication against looping over each slice of `X`.
```python
import torch
from torch import nn

for X, C in [(torch.rand(8, 50, 32), nn.Parameter(torch.randn(32, 32))),
             (torch.rand(16, 50, 32), nn.Parameter(torch.randn(32, 32))),
             (torch.rand(8, 50, 32), nn.Parameter(torch.randn(32, 32)).detach())]:
    # multiply each entry
    A = torch.empty_like(X)
    for t in range(X.shape[1]):
        A[:, t, :] = (C @ X[:, t, :].unsqueeze(-1)).squeeze(-1)
    # multiply in batch
    A1 = (C @ X.unsqueeze(-1)).squeeze(-1)
    print('equal:', (A1 == A).all().item(), ', close:', torch.allclose(A1, A))
```

Returns
```
equal: False , close: False
equal: True , close: True
equal: True , close: True
```

What's going on? I expect them to be equal in all three cases.
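To quantify the mismatch rather than just test for equality, here is a minimal diagnostic I ran on the first (failing) configuration, printing the largest element-wise discrepancy between the two products (the seed is my own addition, for reproducibility):

```python
import torch
from torch import nn

torch.manual_seed(0)
X = torch.rand(8, 50, 32)
C = nn.Parameter(torch.randn(32, 32))

# multiply each entry
A = torch.empty_like(X)
for t in range(X.shape[1]):
    A[:, t, :] = (C @ X[:, t, :].unsqueeze(-1)).squeeze(-1)

# multiply in batch
A1 = (C @ X.unsqueeze(-1)).squeeze(-1)

# largest absolute difference between the looped and batched results
print('max abs diff:', (A1 - A).abs().max().item())
```

The difference, when it occurs, is tiny (on the order of float32 rounding error), but it is enough to make both `==` and the default-tolerance `torch.allclose` report a mismatch in my first case.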
For reference,
```python
import sys, platform
print('OS:', platform.platform())
print('Python:', sys.version)
print('Pytorch:', torch.__version__)
```

gives:
```
OS: macOS-14.4.1-arm64-arm-64bit
Python: 3.12.1 | packaged by conda-forge | (main, Dec 23 2023, 08:01:35) [Clang 16.0.6 ]
Pytorch: 2.2.0
```