My task is to use a VAE model for binary classification. The encoder will be an LSTM, while the decoder will be an MLP. My data is time-series data, which can be viewed as 20 input features and one binary output target (0 or 1).
- First, I used a standalone LSTM model for classification, and the loss converged.
- Then, I used an AE model, where the encoder is an LSTM and the decoder is an MLP, and the model converged well.
- However, when I used the VAE model, the reconstruction loss stayed around 0.69 (essentially ln 2), which means the model learned nothing beyond a coin flip, while at the same time the KL divergence decreased to a relatively small value. So I suspect there may be a problem in how mu and logvar are calculated.
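To probe this suspicion, a quick check (with made-up shapes; batch size 64 and latent size 16 are placeholders, not my actual settings) is to look at the per-dimension KL, which drops to roughly zero for latent dimensions the decoder ignores:

```python
import torch

# Hypothetical posterior-collapse check: per-latent-dimension KL,
# averaged over the batch. Shapes are illustrative only.
mu = torch.zeros(64, 16)      # stand-in for encoder output
logvar = torch.zeros(64, 16)  # zeros -> unit Gaussian -> KL exactly 0
kl_per_dim = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean(dim=0)
print(kl_per_dim)
```

If most entries stay near zero during training, the latent code carries no information (posterior collapse), which would be consistent with the reconstruction loss sitting at chance level.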
```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LSTMEncoder(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, latent_size):
        super(LSTMEncoder, self).__init__()
        self.num_layers = num_layers
        self.hidden_size = hidden_size
        self.latent_size = latent_size
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers,
                            batch_first=True, dropout=0.2)
        self.fc_mu = nn.Linear(hidden_size, latent_size)
        self.fc_logvar = nn.Linear(hidden_size, latent_size)

    def forward(self, x):
        batch_size = x.size(0)
        h0 = torch.zeros(self.num_layers, batch_size, self.hidden_size).to(x.device)
        c0 = torch.zeros(self.num_layers, batch_size, self.hidden_size).to(x.device)
        out, _ = self.lstm(x, (h0, c0))
        out = out[:, -1, :]  # hidden state of the last time step
        mu = self.fc_mu(out)
        logvar = self.fc_logvar(out)
        return mu, logvar


class MLPDecoder(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(MLPDecoder, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.act1 = nn.LeakyReLU()
        self.fc2 = nn.Linear(hidden_size, hidden_size)
        self.act2 = nn.LeakyReLU()
        self.fc3 = nn.Linear(hidden_size, output_size)
        self.act3 = nn.Sigmoid()

    def forward(self, x):
        x = self.fc1(x)
        x = self.act1(x)
        x = self.fc2(x)
        x = self.act2(x)
        x = self.fc3(x)
        x = self.act3(x)
        return x


class VAE(nn.Module):
    def __init__(self, input_size, hidden_size_encoder, latent_size,
                 hidden_size_decoder, output_size, num_layers):
        super(VAE, self).__init__()
        self.encoder = LSTMEncoder(input_size, hidden_size_encoder,
                                   num_layers, latent_size)
        self.decoder = MLPDecoder(latent_size, hidden_size_decoder, output_size)

    def reparameterize(self, mu, logvar):
        if self.training:
            std = torch.exp(0.5 * logvar)
            eps = torch.randn_like(std)
            return eps.mul(std).add_(mu)
        else:
            return mu

    def forward(self, x):
        mu, logvar = self.encoder(x)
        z = self.reparameterize(mu, logvar)
        decoded = self.decoder(z)
        return decoded, mu, logvar


def vae_loss(recon_x, x, mu, logvar):
    reconstruction_loss = F.binary_cross_entropy(recon_x, x, reduction='mean')
    kl_divergence_loss = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return (reconstruction_loss + kl_divergence_loss,
            reconstruction_loss, kl_divergence_loss)
```

I've tried changing the hyperparameters, including the hidden and latent dimensions, adjusting the learning rate, adding more layers to the MLP and LSTM, and incorporating regularization techniques such as L1 and L2. None of these adjustments has led to an improvement.
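For reference, here is a minimal, self-contained sketch (random stand-in tensors; batch size 64 and latent size 16 are assumptions, not my real settings) that compares the magnitudes of the two terms as `vae_loss` computes them. Note the BCE term is averaged (`reduction='mean'`) while the KL term is summed over the whole batch and all latent dimensions:

```python
import torch
import torch.nn.functional as F

# Hypothetical diagnostic (not the training code): compare the two loss
# terms at initialization. Shapes are illustrative only.
torch.manual_seed(0)
batch_size, latent_size = 64, 16
recon = torch.sigmoid(torch.randn(batch_size, 1))      # stand-in decoder output
target = torch.randint(0, 2, (batch_size, 1)).float()  # stand-in labels
mu = 0.1 * torch.randn(batch_size, latent_size)        # stand-in encoder outputs
logvar = 0.1 * torch.randn(batch_size, latent_size)

bce = F.binary_cross_entropy(recon, target, reduction='mean')  # per-element mean
kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())   # batch-wide sum
print(f"BCE (mean):           {bce.item():.4f}")
print(f"KL (sum):             {kl.item():.4f}")
print(f"KL per sample/dim:    {(kl / (batch_size * latent_size)).item():.6f}")
```

Even with near-standard-normal mu and logvar, the summed KL is much larger than the averaged BCE, so its relative weight depends on the batch and latent sizes rather than being fixed by the model.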
I want to understand why this issue occurs. My goal is to get this model to converge so that I can use it for my binary classification task.