Channel: Active questions tagged python - Stack Overflow

Investigating discrepancies in TensorFlow and PyTorch performance


In my pursuit of mastering PyTorch, I've attempted to replicate an existing TensorFlow architecture, but I've run into a significant performance gap: TensorFlow learns rapidly within 25 epochs, while PyTorch needs at least 250 epochs to reach comparable generalization. Despite careful code scrutiny and aligning the two architectures as closely as I can, the disparity persists. Can anyone shed light on what else might be amiss here?

Below I present the full Python code for both implementations, along with the CLI output and plots.

PyTorch code:

# Standard library imports
import pandas as pd
import matplotlib.pyplot as plt

# External library imports
import torch
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, MinMaxScaler, StandardScaler
from sklearn.metrics import max_error, mean_absolute_error, mean_squared_error

# Loading dataset
df_data = pd.read_csv("./data_inverter.csv", names=["pvt", "edge", "slew", "load", "delay"])

# Selecting subset of data based on specific conditions
df_select = df_data[(df_data["pvt"] == "PtypV1500T027") & (df_data["edge"] == "rise")]

# Splitting features and target variable
X = df_select.drop(["pvt", "edge", "delay"], axis='columns')
y = df_select["delay"]

# Scaling input features using Min-Max scaling
slew_scaler = MinMaxScaler()
load_scaler = MinMaxScaler()
X_scaled = X.copy()
X_scaled["slew"] = slew_scaler.fit_transform(X_scaled.slew.values.reshape(-1, 1))
X_scaled["load"] = load_scaler.fit_transform(X_scaled.load.values.reshape(-1, 1))

# Splitting data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.1, random_state=42)

# Converting data to PyTorch tensors
X_train_tensor = torch.FloatTensor(X_train.values)
y_train_tensor = torch.FloatTensor(y_train.values).view(-1, 1)
X_test_tensor = torch.FloatTensor(X_test.values)
y_test_tensor = torch.FloatTensor(y_test.values).view(-1, 1)

# Setting random seed for reproducibility
torch.manual_seed(42)

# Defining neural network architecture
model = torch.nn.Sequential(
    torch.nn.Linear(X_train_tensor.shape[1], 128),
    torch.nn.ReLU(),
    torch.nn.Linear(128, 128),
    torch.nn.ReLU(),
    torch.nn.Linear(128, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 32),
    torch.nn.ReLU(),
    torch.nn.Linear(32, 16),
    torch.nn.ReLU(),
    torch.nn.Linear(16, 1),
    torch.nn.ELU()
)

# Loss function and optimizer
criterion = torch.nn.MSELoss()
criterion_val = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters())

# Training the model
num_epochs = 25
progress = {'loss': [], 'mae': [], 'mse': [], 'val_loss': [], 'val_mae': [], 'val_mse': []}
for epoch in range(num_epochs):
    # Forward pass
    y_predict = model(X_train_tensor)
    loss = criterion(y_predict, y_train_tensor)

    # Backward and optimize
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

    # Validation
    with torch.no_grad():
        model.eval()
        y_test_predict = model(X_test_tensor)
        loss_val = criterion_val(y_test_predict, y_test_tensor)
    model.train()

    # Record progress
    progress['loss'].append(loss.item())
    progress['mae'].append(mean_absolute_error(y_train_tensor, y_predict.detach().numpy()))
    progress['mse'].append(mean_squared_error(y_train_tensor, y_predict.detach().numpy()))
    progress['val_loss'].append(loss_val.item())
    progress['val_mae'].append(mean_absolute_error(y_test_tensor, y_test_predict.detach().numpy()))
    progress['val_mse'].append(mean_squared_error(y_test_tensor, y_test_predict.detach().numpy()))
    print("Epoch %i/%i   -   loss: %0.5F" % (epoch, num_epochs, loss.item()))

# Displaying model summary
print(model)

# Plotting training progress
df_progress = pd.DataFrame(progress)
df_progress.plot()
plt.title("Model training progress: DNN PyTorch")
plt.tight_layout()
plt.show()

# Making predictions on the testing set
with torch.no_grad():
    model.eval()
    y_predict_tensor = model(X_test_tensor)
    y_predict = y_predict_tensor.numpy()

# Displaying model performance metrics
print("Model performance metrics: DNN PyTorch")
print("MAX error:", max_error(y_test_tensor, y_predict))
print("MAE error:", mean_absolute_error(y_test_tensor, y_predict))
print("MSE error:", mean_squared_error(y_test_tensor, y_predict, squared=False))
plt.scatter(y_test, y_predict)
plt.scatter(y_test, y_test, marker='.')
plt.title("Model predictions: DNN PyTorch")
plt.tight_layout()
plt.show()
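One structural difference worth noting: Keras's `model.fit` trains with a default `batch_size` of 32, whereas the loop above computes a single full-batch gradient step per epoch, so the two runs perform very different numbers of weight updates per "epoch". A minimal sketch of a mini-batch variant of the loop, using `torch.utils.data.DataLoader` (synthetic data stands in here for `data_inverter.csv`, and the small network is just for illustration):

```python
# Sketch: mini-batch training with DataLoader; batch_size=32 mirrors
# the Keras fit() default. Synthetic data replaces data_inverter.csv.
import torch
from torch.utils.data import TensorDataset, DataLoader

torch.manual_seed(42)

# Synthetic stand-in for the scaled (slew, load) -> delay data
X = torch.rand(512, 2)
y = (0.5 * X[:, 0] + 0.3 * X[:, 1]).view(-1, 1)

loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)

model = torch.nn.Sequential(
    torch.nn.Linear(2, 16),
    torch.nn.ReLU(),
    torch.nn.Linear(16, 1),
)
criterion = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters())

epoch_losses = []
for epoch in range(5):
    total = 0.0
    for X_batch, y_batch in loader:   # 512 / 32 = 16 updates per epoch
        optimizer.zero_grad()
        loss = criterion(model(X_batch), y_batch)
        loss.backward()
        optimizer.step()
        total += loss.item() * len(X_batch)
    epoch_losses.append(total / len(loader.dataset))
    print("Epoch %i   -   loss: %0.5f" % (epoch, epoch_losses[-1]))
```

With this structure, 25 epochs of the PyTorch loop would perform roughly the same number of optimizer steps as 25 Keras epochs on a same-sized training set.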

TensorFlow code:

# Standard library imports
import pandas as pd
import matplotlib.pyplot as plt

# External library imports
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, MinMaxScaler, StandardScaler
from sklearn.metrics import max_error, mean_absolute_error, mean_squared_error

# Loading dataset
df_data = pd.read_csv("./data_inverter.csv", names=["pvt", "edge", "slew", "load", "delay"])

# Selecting subset of data based on specific conditions
df_select = df_data[(df_data["pvt"] == "PtypV1500T027") & (df_data["edge"] == "rise")]

# Splitting features and target variable
X = df_select.drop(["pvt", "edge", "delay"], axis='columns')
y = df_select["delay"]

# Scaling input features using Min-Max scaling
slew_scaler = MinMaxScaler()
load_scaler = MinMaxScaler()
X_scaled = X.copy()
X_scaled["slew"] = slew_scaler.fit_transform(X_scaled.slew.values.reshape(-1, 1))
X_scaled["load"] = load_scaler.fit_transform(X_scaled.load.values.reshape(-1, 1))

# Splitting data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.1, random_state=42)

# Converting data to TensorFlow tensors
X_train_tensor = tf.constant(X_train.values, dtype=tf.float32)
y_train_tensor = tf.constant(y_train.values, dtype=tf.float32)
X_test_tensor = tf.constant(X_test.values, dtype=tf.float32)
y_test_tensor = tf.constant(y_test.values, dtype=tf.float32)

# Setting random seed for reproducibility
tf.keras.utils.set_random_seed(42)

# Defining neural network architecture
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(128, activation='relu', input_dim=X_train_tensor.shape[1]),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(1, activation='elu')
])

# Compiling the model
model.compile(
    loss=tf.keras.losses.MeanSquaredError(),  # Using Mean Squared Error loss function
    optimizer=tf.keras.optimizers.Adam(),  # Using Adam optimizer
    metrics=['mae', 'mse']  # Using Mean Absolute Error and Mean Squared Error as metrics
)

# Training the model
progress = model.fit(X_train_tensor, y_train_tensor, validation_data=(X_test_tensor, y_test_tensor), epochs=25)

# Evaluating model performance on the testing set
model.evaluate(X_test_tensor, y_test_tensor, verbose=2)

# Displaying model summary
print(model.summary())

# Plotting training progress
pd.DataFrame(progress.history).plot()
plt.title("Model training progress: DNN TensorFlow")
plt.tight_layout()
plt.show()

# Making predictions on the testing set
y_predict = model.predict(X_test_tensor)

# Displaying model performance metrics
print("Model performance metrics: DNN TensorFlow")
print("MAX error:", max_error(y_test_tensor, y_predict))
print("MAE error:", mean_absolute_error(y_test_tensor, y_predict))
print("MSE error:", mean_squared_error(y_test_tensor, y_predict, squared=False))
plt.scatter(y_test, y_predict)
plt.scatter(y_test, y_test, marker='.')
plt.title("Model predictions: DNN TensorFlow")
plt.tight_layout()
plt.show()
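Since `model.fit` defaults to `batch_size=32`, each Keras epoch performs many optimizer steps, while a loop that calls `loss.backward()` once per epoch performs exactly one. A quick sanity check of the arithmetic (the training-set size below is hypothetical, since the row count of `data_inverter.csv` isn't shown):

```python
import math

def updates_per_epoch(n_samples, batch_size):
    """Number of optimizer steps per epoch for a given batch size."""
    return math.ceil(n_samples / batch_size)

n_train = 1000  # hypothetical training-set size

keras_default = updates_per_epoch(n_train, 32)    # Keras fit() default batch size
full_batch = updates_per_epoch(n_train, n_train)  # one full-batch step per epoch

print(keras_default)  # 32 optimizer steps per epoch
print(full_batch)     # 1 optimizer step per epoch
```

On that hypothetical size, 25 Keras epochs would do 800 updates while 25 full-batch epochs would do 25, which is consistent with the roughly 10x epoch gap described above.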

CLI output of PyTorch model performance metrics after 25 epochs:

Sequential(
  (0): Linear(in_features=2, out_features=128, bias=True)
  (1): ReLU()
  (2): Linear(in_features=128, out_features=128, bias=True)
  (3): ReLU()
  (4): Linear(in_features=128, out_features=64, bias=True)
  (5): ReLU()
  (6): Linear(in_features=64, out_features=32, bias=True)
  (7): ReLU()
  (8): Linear(in_features=32, out_features=16, bias=True)
  (9): ReLU()
  (10): Linear(in_features=16, out_features=1, bias=True)
  (11): ELU(alpha=1.0)
)
Model performance metrics: DNN PyTorch
MAX error: 1.2864852
MAE error: 0.3353702
MSE error: 0.42874745

CLI output of TensorFlow model performance metrics after 25 epochs:

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 dense (Dense)               (None, 128)               384
 dense_1 (Dense)             (None, 128)               16512
 dense_2 (Dense)             (None, 64)                8256
 dense_3 (Dense)             (None, 32)                2080
 dense_4 (Dense)             (None, 16)                528
 dense_5 (Dense)             (None, 1)                 17
=================================================================
Total params: 27777 (108.50 KB)
Trainable params: 27777 (108.50 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
None
6/6 [==============================] - 0s 750us/step
Model performance metrics: DNN TensorFlow
MAX error: 0.013849139
MAE error: 0.0029576812
MSE error: 0.0036013061
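A side note on the printed metrics: `mean_squared_error(..., squared=False)` returns the square root of the MSE, so the lines labeled "MSE error" in both outputs are actually RMSE values. A minimal illustration with toy numbers (not the real delay data):

```python
import numpy as np

# Toy values; not the real delay data
y_true = np.array([0.10, 0.20, 0.40])
y_pred = np.array([0.12, 0.18, 0.35])

mse = np.mean((y_true - y_pred) ** 2)
rmse = np.sqrt(mse)  # what mean_squared_error(..., squared=False) returns

print("MSE: ", mse)
print("RMSE:", rmse)
```

This doesn't change the comparison between the two frameworks, since both scripts print the same quantity, but the label is worth keeping in mind when reading the numbers.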

PyTorch training progress: [figure]

TensorFlow training progress: [figure]

PyTorch scatter plot (orange = target against itself, blue = target against prediction): [figure]

TensorFlow scatter plot (orange = target against itself, blue = target against prediction): [figure]

..............................................................................................................................................

Additional info (in response to the questions and comments):

torch.optim.Adam - the default learning rate is 0.001.

tf.keras.optimizers.Adam - the default learning rate is 0.001.

..............................................................................................................................................

PyTorch model performance after 250 epochs:

Sequential(
  (0): Linear(in_features=2, out_features=128, bias=True)
  (1): ReLU()
  (2): Linear(in_features=128, out_features=128, bias=True)
  (3): ReLU()
  (4): Linear(in_features=128, out_features=64, bias=True)
  (5): ReLU()
  (6): Linear(in_features=64, out_features=32, bias=True)
  (7): ReLU()
  (8): Linear(in_features=32, out_features=16, bias=True)
  (9): ReLU()
  (10): Linear(in_features=16, out_features=1, bias=True)
  (11): ELU(alpha=1.0)
)
Model performance metrics: DNN PyTorch
MAX error: 0.025619686
MAE error: 0.006687804
MSE error: 0.008531998

PyTorch training progress, 250 epochs: [figure]
PyTorch scatter plot, 250 epochs: [figure]

