I am interested in applying classification algorithms such as KNN, GP, MLP, etc., to the RCV1 dataset for topic classification. However, this dataset is quite large: the data has shape (804414, 47236) and the target has shape (804414, 103). In addition, the data is very sparse (most entries are zero). Each time I try to train a model, I either get a MemoryError or run into problems with outliers and irrelevant data. I am using Python in Google Colab. To make these algorithms easier to run, I am considering methods such as sampling or feature selection. I would appreciate guidance on how to do this and on which techniques are efficient.
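To make the question concrete, here is a rough sketch of the kind of pipeline I have in mind: random row subsampling, then chi-squared feature selection, then TruncatedSVD, all of which keep the input sparse. The helper names `subsample_rows` and `reduce_features`, the sizes, and the choice to score features against a single topic column are my own assumptions for illustration, and the sketch runs on a small synthetic sparse matrix rather than on RCV1 itself (which I would load with `sklearn.datasets.fetch_rcv1`). I am not sure this is the right approach.

```python
import numpy as np
from scipy import sparse
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_selection import SelectKBest, chi2


def subsample_rows(X, Y, n_samples, seed=0):
    """Pick a random subset of rows from sparse X and Y without densifying."""
    rng = np.random.default_rng(seed)
    idx = np.sort(rng.choice(X.shape[0], size=n_samples, replace=False))
    return X[idx], Y[idx]


def reduce_features(X, y, k_best, n_components, seed=0):
    """Keep the k_best features by chi2 score, then project with TruncatedSVD.

    Both steps accept scipy sparse input; chi2 assumes non-negative features.
    """
    X_sel = SelectKBest(chi2, k=k_best).fit_transform(X, y)
    svd = TruncatedSVD(n_components=n_components, random_state=seed)
    return svd.fit_transform(X_sel)


# Synthetic stand-in for RCV1 (hypothetical sizes, chosen to run quickly).
rng = np.random.default_rng(0)
X_demo = sparse.random(2000, 500, density=0.01, format="csr", random_state=0)
Y_demo = sparse.csr_matrix(rng.integers(0, 2, size=(2000, 5)))

X_small, Y_small = subsample_rows(X_demo, Y_demo, n_samples=500)

# chi2 needs a 1-D label vector, so this sketch scores against one topic only.
y_one_topic = Y_small[:, 0].toarray().ravel()
X_reduced = reduce_features(X_small, y_one_topic, k_best=100, n_components=20)

print(X_small.shape, X_reduced.shape)
```

The output of `reduce_features` is a dense array with only `n_components` columns, which is what I would then pass to KNN, GP, or MLP. For the real multi-label target I assume the selection step would have to be repeated or combined across the 103 topics.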
Thank you!
Best techniques to reduce the dimensionality of RCV1