In theory, an MLP with a single hidden layer of just 3 neurons is enough to classify XOR correctly. Training may occasionally fail to converge, but 4 neurons are a safe bet.
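To make the four-neuron claim concrete, here is a minimal sketch of a hand-built network (no training involved) with four ReLU hidden units that separates quadrant-style XOR data. The weights are picked by hand for illustration and are an assumption of this sketch, not what a trained model would necessarily learn. It relies on the fact that |a - b| - |a + b| is positive exactly when a and b have opposite signs.

```python
import numpy as np

# Hand-picked weights (illustrative, not learned).
# Hidden units compute relu(x1 - x2), relu(x2 - x1), relu(x1 + x2), relu(-x1 - x2).
W_HIDDEN = np.array([[1.0, -1.0, 1.0, -1.0],
                     [-1.0, 1.0, 1.0, -1.0]])
# Output = |x1 - x2| - |x1 + x2|, which is positive iff the signs differ.
W_OUT = np.array([1.0, 1.0, -1.0, -1.0])


def xor_net(x):
    """Return True where the two coordinates of x have opposite signs."""
    hidden = np.maximum(x @ W_HIDDEN, 0.0)  # ReLU hidden layer with 4 units
    return hidden @ W_OUT > 0.0


rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, (1000, 2))
positive = x > 0
y = positive[:, 0] ^ positive[:, 1]  # XOR of the two sign indicators
accuracy = np.mean(xor_net(x) == y)
print(f"Accuracy of the hand-built network: {accuracy}")
```

This only shows that a solution with four hidden units exists; whether gradient-based training finds it from a random start is a separate matter.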
Here's an example from the TensorFlow Playground.
I've tried to reproduce this using sklearn.neural_network.MLPClassifier:
    from sklearn import neural_network
    from sklearn.metrics import accuracy_score, precision_score, recall_score
    import numpy as np

    x_train = np.random.uniform(-1, 1, (10000, 2))
    tmp = x_train > 0
    y_train = 2 * (tmp[:, 0] ^ tmp[:, 1]) - 1

    model = neural_network.MLPClassifier(
        hidden_layer_sizes=(3,),
        n_iter_no_change=100,
        learning_rate_init=0.01,
        max_iter=1000
    ).fit(x_train, y_train)

    x_test = np.random.uniform(-1, 1, (1000, 2))
    tmp = x_test > 0
    y_test = 2 * (tmp[:, 0] ^ tmp[:, 1]) - 1

    prediction = model.predict(x_test)
    print(f'Accuracy: {accuracy_score(y_pred=prediction, y_true=y_test)}')
    print(f'recall: {recall_score(y_pred=prediction, y_true=y_test)}')
    print(f'precision: {precision_score(y_pred=prediction, y_true=y_test)}')
I only get around 0.75 accuracy, while the TensorFlow Playground model is perfect. Any idea what makes the difference?
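Since convergence can depend on the random initialization, one way to check whether a single score is representative is to repeat the fit with several seeds. The sketch below is only an illustration: `make_xor` is a hypothetical helper, and the smaller dataset, the lower `max_iter`, and the three seeds are assumptions chosen to keep it quick, not the settings used above.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier


def make_xor(n, rng):
    """Sample n points in [-1, 1]^2 with XOR-of-signs labels in {-1, 1}."""
    x = rng.uniform(-1, 1, (n, 2))
    positive = x > 0
    y = 2 * (positive[:, 0] ^ positive[:, 1]) - 1
    return x, y


rng = np.random.default_rng(0)
x_train, y_train = make_xor(2000, rng)
x_test, y_test = make_xor(500, rng)

accuracies = []
for seed in range(3):
    model = MLPClassifier(
        hidden_layer_sizes=(3,),
        learning_rate_init=0.01,
        max_iter=200,
        random_state=seed,  # controls weight initialization and shuffling
    ).fit(x_train, y_train)
    accuracies.append(model.score(x_test, y_test))  # mean accuracy on the test set

print(accuracies)
```

A wide spread across seeds would point to optimization luck rather than a fixed limitation of the architecture.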
I also tried using TensorFlow:
    import numpy as np
    import tensorflow as tf
    from sklearn.metrics import accuracy_score, precision_score, recall_score

    model = tf.keras.Sequential(layers=[
        tf.keras.layers.Input(shape=(2,)),
        tf.keras.layers.Dense(4, activation='relu'),
        tf.keras.layers.Dense(1)  # linear output, no activation
    ])
    model.compile(loss=tf.keras.losses.binary_crossentropy)

    x_train = np.random.uniform(-1, 1, (10000, 2))
    tmp = x_train > 0
    y_train = (tmp[:, 0] ^ tmp[:, 1])
    model.fit(x=x_train, y=y_train)

    x_test = np.random.uniform(-1, 1, (1000, 2))
    tmp = x_test > 0
    y_test = (tmp[:, 0] ^ tmp[:, 1])

    prediction = model.predict(x_test) > 0.5
    print(f'Accuracy: {accuracy_score(y_pred=prediction, y_true=y_test)}')
    print(f'recall: {recall_score(y_pred=prediction, y_true=y_test)}')
    print(f'precision: {precision_score(y_pred=prediction, y_true=y_test)}')
With this model I get results similar to the scikit-learn model, so it's not just a scikit-learn issue. Am I missing some important hyperparameter?
Edit
OK, I changed the loss from cross-entropy to mean squared error, and the TensorFlow example now reaches 0.92 accuracy. I guess that's the problem with the MLPClassifier?
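One thing worth checking, offered as a guess rather than a confirmed diagnosis: the final `Dense(1)` layer in the TensorFlow snippet has no activation, so its outputs are unbounded, while `binary_crossentropy` with its default `from_logits=False` interprets them as probabilities. The NumPy sketch below shows how much the two readings can differ on the same raw outputs. The helper names and sample values are hypothetical, and it is not meant to reproduce Keras internals exactly.

```python
import numpy as np


def bce_from_probs(y_true, p, eps=1e-7):
    """Binary cross-entropy when p is interpreted as a probability."""
    p = np.clip(p, eps, 1.0 - eps)  # values outside (0, 1) get clipped
    return -np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))


def bce_from_logits(y_true, z):
    """Binary cross-entropy when z is a raw score fed through a sigmoid."""
    # Numerically stable form of -[y*log(s(z)) + (1-y)*log(1 - s(z))].
    return np.mean(np.maximum(z, 0.0) - z * y_true + np.log1p(np.exp(-np.abs(z))))


y_true = np.array([1.0, 0.0, 1.0, 0.0])
raw = np.array([2.0, -3.0, 0.2, 1.5])  # made-up unbounded outputs of a linear layer

loss_as_probs = bce_from_probs(y_true, raw)    # raw scores misread as probabilities
loss_as_logits = bce_from_logits(y_true, raw)  # raw scores read as logits
sigmoid = 1.0 / (1.0 + np.exp(-raw))
loss_after_sigmoid = bce_from_probs(y_true, sigmoid)

print(loss_as_probs, loss_as_logits, loss_after_sigmoid)
```

If this is what happens here, the usual remedies are `activation='sigmoid'` on the output layer or `tf.keras.losses.BinaryCrossentropy(from_logits=True)` with the prediction threshold moved to 0. It would also be consistent with mean squared error working better on a linear output. Note that `fit` trains for a single epoch by default, which may matter as well. None of this would by itself explain the MLPClassifier result.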