I have just started work on a classification project that detects phishing websites. I am using the UCI dataset https://archive.ics.uci.edu/ml/machine-learning-databases/00327/Training%20Dataset.arff. I am trying several models on it, such as ANN, SVM, and logistic regression, and I have trained and tested each of them.
My logistic regression code looks like this:

    # Importing libraries
    import numpy as np
    import matplotlib.pyplot as plt
    import pandas as pd

    # Importing the dataset: all columns but the last are features, the last is the label
    dataset = pd.read_csv("phishcoop.csv")
    x = dataset.iloc[:, :-1].values
    y = dataset.iloc[:, -1]

    # Split the dataset into training and test sets
    # (sklearn.cross_validation was removed from scikit-learn; train_test_split now lives in model_selection)
    from sklearn.model_selection import train_test_split
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, train_size=0.75, random_state=0)

    # Fitting logistic regression to the training set
    from sklearn.linear_model import LogisticRegression
    classifier = LogisticRegression(random_state=0)
    classifier.fit(x_train, y_train)

    # Predicting values for the test data
    y_pred = classifier.predict(x_test)

    # Checking accuracy using the confusion matrix
    from sklearn.metrics import confusion_matrix
    cm = confusion_matrix(y_test, y_pred)

How do I extract the 30 features in my dataset from the URL that the user gives as input?
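
To make the question concrete, here is a minimal sketch of what such an extraction step might look like for three features that can be computed from the URL string alone. The function name `extract_url_features`, the choice of features, the 1 / 0 / -1 encoding, and the length thresholds (54 and 75) are assumptions based on my reading of the dataset's feature description, not something confirmed here. Features that depend on page content, WHOIS, DNS, or traffic data would need external lookups and are not covered.

```python
import re
from urllib.parse import urlparse


def extract_url_features(url):
    """Return three URL-only features, each encoded as
    1 (legitimate), 0 (suspicious) or -1 (phishing).
    Encoding and thresholds are assumptions, see the text above."""
    host = urlparse(url).hostname or ""

    # Feature 1: hostname is a dotted IPv4 address instead of a domain name
    is_ip = re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host) is not None
    having_ip = -1 if is_ip else 1

    # Feature 2: overall URL length (assumed thresholds: <54, 54..75, >75)
    n = len(url)
    if n < 54:
        url_length = 1
    elif n <= 75:
        url_length = 0
    else:
        url_length = -1

    # Feature 3: an "@" anywhere in the URL
    having_at = -1 if "@" in url else 1

    return [having_ip, url_length, having_at]


print(extract_url_features("http://192.168.0.1/login"))
```

A vector built this way could only be passed to `classifier.predict` once all 30 values are present and in the same column order as the training data.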