While learning PyTorch, I've tried to replicate an existing TensorFlow architecture, but I've run into a significant performance gap: the TensorFlow model learns rapidly and generalizes well within 25 epochs, whereas the PyTorch model needs at least 250 epochs to reach comparable generalization. I've scrutinized the code carefully and aligned the architectures of both networks, yet the disparity persists. Can anyone shed light on what else might be amiss here?
Below I present the full Python code for both implementations, along with the CLI output and plots.
PyTorch code:
# Standard library imports
import pandas as pd
import matplotlib.pyplot as plt

# External library imports
import torch
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, MinMaxScaler, StandardScaler
from sklearn.metrics import max_error, mean_absolute_error, mean_squared_error

# Loading dataset
df_data = pd.read_csv("./data_inverter.csv", names=["pvt", "edge", "slew", "load", "delay"])

# Selecting subset of data based on specific conditions
df_select = df_data[(df_data["pvt"] == "PtypV1500T027") & (df_data["edge"] == "rise")]

# Splitting features and target variable
X = df_select.drop(["pvt", "edge", "delay"], axis='columns')
y = df_select["delay"]

# Scaling input features using Min-Max scaling
slew_scaler = MinMaxScaler()
load_scaler = MinMaxScaler()
X_scaled = X.copy()
X_scaled["slew"] = slew_scaler.fit_transform(X_scaled.slew.values.reshape(-1, 1))
X_scaled["load"] = load_scaler.fit_transform(X_scaled.load.values.reshape(-1, 1))

# Splitting data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.1, random_state=42)

# Converting data to PyTorch tensors
X_train_tensor = torch.FloatTensor(X_train.values)
y_train_tensor = torch.FloatTensor(y_train.values).view(-1, 1)
X_test_tensor = torch.FloatTensor(X_test.values)
y_test_tensor = torch.FloatTensor(y_test.values).view(-1, 1)

# Setting random seed for reproducibility
torch.manual_seed(42)

# Defining neural network architecture
model = torch.nn.Sequential(
    torch.nn.Linear(X_train_tensor.shape[1], 128),
    torch.nn.ReLU(),
    torch.nn.Linear(128, 128),
    torch.nn.ReLU(),
    torch.nn.Linear(128, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 32),
    torch.nn.ReLU(),
    torch.nn.Linear(32, 16),
    torch.nn.ReLU(),
    torch.nn.Linear(16, 1),
    torch.nn.ELU()
)

# Loss function and optimizer
criterion = torch.nn.MSELoss()
criterion_val = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters())

# Training the model
num_epochs = 25
progress = {'loss': [], 'mae': [], 'mse': [],
            'val_loss': [], 'val_mae': [], 'val_mse': []}
for epoch in range(num_epochs):
    # Forward pass
    y_predict = model(X_train_tensor)
    loss = criterion(y_predict, y_train_tensor)

    # Backward and optimize
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

    # Validation
    with torch.no_grad():
        model.eval()
        y_test_predict = model(X_test_tensor)
        loss_val = criterion_val(y_test_predict, y_test_tensor)
        model.train()

    # Record progress
    progress['loss'].append(loss.item())
    progress['mae'].append(mean_absolute_error(y_train_tensor, y_predict.detach().numpy()))
    progress['mse'].append(mean_squared_error(y_train_tensor, y_predict.detach().numpy()))
    progress['val_loss'].append(loss_val.item())
    progress['val_mae'].append(mean_absolute_error(y_test_tensor, y_test_predict.detach().numpy()))
    progress['val_mse'].append(mean_squared_error(y_test_tensor, y_test_predict.detach().numpy()))
    print("Epoch %i/%i - loss: %0.5f" % (epoch, num_epochs, loss.item()))

# Displaying model summary
print(model)

# Plotting training progress
df_progress = pd.DataFrame(progress)
df_progress.plot()
plt.title("Model training progress: DNN PyTorch")
plt.tight_layout()
plt.show()

# Making predictions on the testing set
with torch.no_grad():
    model.eval()
    y_predict_tensor = model(X_test_tensor)
    y_predict = y_predict_tensor.numpy()

# Displaying model performance metrics
print("Model performance metrics: DNN PyTorch")
print("MAX error:", max_error(y_test_tensor, y_predict))
print("MAE error:", mean_absolute_error(y_test_tensor, y_predict))
print("MSE error:", mean_squared_error(y_test_tensor, y_predict, squared=False))
plt.scatter(y_test, y_predict)
plt.scatter(y_test, y_test, marker='.')
plt.title("Model predictions: DNN PyTorch")
plt.tight_layout()
plt.show()

TensorFlow code:
# Standard library imports
import pandas as pd
import matplotlib.pyplot as plt

# External library imports
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, MinMaxScaler, StandardScaler
from sklearn.metrics import max_error, mean_absolute_error, mean_squared_error

# Loading dataset
df_data = pd.read_csv("./data_inverter.csv", names=["pvt", "edge", "slew", "load", "delay"])

# Selecting subset of data based on specific conditions
df_select = df_data[(df_data["pvt"] == "PtypV1500T027") & (df_data["edge"] == "rise")]

# Splitting features and target variable
X = df_select.drop(["pvt", "edge", "delay"], axis='columns')
y = df_select["delay"]

# Scaling input features using Min-Max scaling
slew_scaler = MinMaxScaler()
load_scaler = MinMaxScaler()
X_scaled = X.copy()
X_scaled["slew"] = slew_scaler.fit_transform(X_scaled.slew.values.reshape(-1, 1))
X_scaled["load"] = load_scaler.fit_transform(X_scaled.load.values.reshape(-1, 1))

# Splitting data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.1, random_state=42)

# Converting data to TensorFlow tensors
X_train_tensor = tf.constant(X_train.values, dtype=tf.float32)
y_train_tensor = tf.constant(y_train.values, dtype=tf.float32)
X_test_tensor = tf.constant(X_test.values, dtype=tf.float32)
y_test_tensor = tf.constant(y_test.values, dtype=tf.float32)

# Setting random seed for reproducibility
tf.keras.utils.set_random_seed(42)

# Defining neural network architecture
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(128, activation='relu', input_dim=X_train_tensor.shape[1]),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(1, activation='elu')
])

# Compiling the model
model.compile(
    loss=tf.keras.losses.MeanSquaredError(),  # Using Mean Squared Error loss function
    optimizer=tf.keras.optimizers.Adam(),     # Using Adam optimizer
    metrics=['mae', 'mse']                    # Using Mean Absolute Error and Mean Squared Error as metrics
)

# Training the model
progress = model.fit(X_train_tensor, y_train_tensor, validation_data=(X_test_tensor, y_test_tensor), epochs=25)

# Evaluating model performance on the testing set
model.evaluate(X_test_tensor, y_test_tensor, verbose=2)

# Displaying model summary
print(model.summary())

# Plotting training progress
pd.DataFrame(progress.history).plot()
plt.title("Model training progress: DNN TensorFlow")
plt.tight_layout()
plt.show()

# Making predictions on the testing set
y_predict = model.predict(X_test_tensor)

# Displaying model performance metrics
print("Model performance metrics: DNN TensorFlow")
print("MAX error:", max_error(y_test_tensor, y_predict))
print("MAE error:", mean_absolute_error(y_test_tensor, y_predict))
print("MSE error:", mean_squared_error(y_test_tensor, y_predict, squared=False))
plt.scatter(y_test, y_predict)
plt.scatter(y_test, y_test, marker='.')
plt.title("Model predictions: DNN TensorFlow")
plt.tight_layout()
plt.show()

CLI output of PyTorch model performance metrics after 25 epochs:
Sequential(
  (0): Linear(in_features=2, out_features=128, bias=True)
  (1): ReLU()
  (2): Linear(in_features=128, out_features=128, bias=True)
  (3): ReLU()
  (4): Linear(in_features=128, out_features=64, bias=True)
  (5): ReLU()
  (6): Linear(in_features=64, out_features=32, bias=True)
  (7): ReLU()
  (8): Linear(in_features=32, out_features=16, bias=True)
  (9): ReLU()
  (10): Linear(in_features=16, out_features=1, bias=True)
  (11): ELU(alpha=1.0)
)
Model performance metrics: DNN PyTorch
MAX error: 1.2864852
MAE error: 0.3353702
MSE error: 0.42874745

CLI output of TensorFlow model performance metrics after 25 epochs:
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 dense (Dense)               (None, 128)               384
 dense_1 (Dense)             (None, 128)               16512
 dense_2 (Dense)             (None, 64)                8256
 dense_3 (Dense)             (None, 32)                2080
 dense_4 (Dense)             (None, 16)                528
 dense_5 (Dense)             (None, 1)                 17
=================================================================
Total params: 27777 (108.50 KB)
Trainable params: 27777 (108.50 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
None
6/6 [==============================] - 0s 750us/step
Model performance metrics: DNN TensorFlow
MAX error: 0.013849139
MAE error: 0.0029576812
MSE error: 0.0036013061

PyTorch scatter plot (orange = target against itself, blue = target against prediction):
TensorFlow scatter plot (orange = target against itself, blue = target against prediction):
..............................................................................................................................................
Appending additional info (reacting to the questions and comments):
torch.optim.Adam - the default learning rate is set to 0.001.
tf.keras.optimizers.Adam - the default learning rate is set to 0.001.
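To confirm that both optimizers really start from the same step size, the defaults can be inspected programmatically (a quick sanity check, not part of the training scripts above):

```python
import torch

# Inspect the default hyperparameters of torch.optim.Adam
opt = torch.optim.Adam(torch.nn.Linear(2, 1).parameters())
print(opt.defaults["lr"])  # 0.001

# The Keras counterpart can be checked the same way (requires TensorFlow):
# import tensorflow as tf
# print(float(tf.keras.optimizers.Adam().learning_rate))  # 0.001
```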
..............................................................................................................................................
CLI output of PyTorch model performance metrics after 250 epochs:
Sequential(
  (0): Linear(in_features=2, out_features=128, bias=True)
  (1): ReLU()
  (2): Linear(in_features=128, out_features=128, bias=True)
  (3): ReLU()
  (4): Linear(in_features=128, out_features=64, bias=True)
  (5): ReLU()
  (6): Linear(in_features=64, out_features=32, bias=True)
  (7): ReLU()
  (8): Linear(in_features=32, out_features=16, bias=True)
  (9): ReLU()
  (10): Linear(in_features=16, out_features=1, bias=True)
  (11): ELU(alpha=1.0)
)
Model performance metrics: DNN PyTorch
MAX error: 0.025619686
MAE error: 0.006687804
MSE error: 0.008531998



