Channel: Active questions tagged python - Stack Overflow

Seeing an error that says: 'numpy.ndarray' object has no attribute 'map'


I am selecting a subset of data from a larger dataframe.

    dataset = df.select('RatingScore', 'CategoryScore', 'CouponBin', 'TTM', 'Price', 'Spread', 'Coupon', 'WAM', 'DV')
    dataset = dataset.fillna(0)
    dataset.show(5, True)
    dataset.printSchema()

Now, I feed that into my KMeans model:

    from numpy import array
    from math import sqrt
    from pyspark.mllib.clustering import KMeans, KMeansModel
    import numpy as np

    data_array = np.array(dataset)
    #data_array = np.array(dataset.select('RatingScore', 'CategoryScore', 'CouponBin', 'TTM', 'Price', 'Spread', 'Coupon', 'WAM',
    #'DV').collect())

    # Build the model (cluster the data)
    clusters = KMeans.train(data_array, 2, maxIterations=10, initializationMode="random")

    # Evaluate clustering by computing Within Set Sum of Squared Errors
    def error(point):
        center = clusters.centers[clusters.predict(point)]
        return sqrt(sum([x**2 for x in (point - center)]))

    WSSSE = data_array.map(lambda point: error(point)).reduce(lambda x, y: x + y)
    print("Within Set Sum of Squared Error = " + str(WSSSE))

This line: clusters = KMeans.train(data_array, 2, maxIterations=10, initializationMode="random")

Throws this error: AttributeError: 'numpy.ndarray' object has no attribute 'map'
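As far as I can tell from the PySpark source, `pyspark.mllib.clustering.KMeans.train` expects an RDD and calls `.map(...)` on its first argument, so any object without an RDD-style `map` method fails there; a Spark 2.x DataFrame also has no `map` of its own (it is reached through `DataFrame.rdd`). The sketch below reproduces the message with plain NumPy on hypothetical data and shows the local equivalent of `map(...).reduce(...)` using Python's builtin `map` and `functools.reduce`.

```python
from functools import reduce

import numpy as np

# Hypothetical stand-in for the collected feature rows.
data_array = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# NumPy arrays have no RDD-style .map method, so code that calls
# data.map(...) on them raises AttributeError.
print(hasattr(data_array, "map"))  # False

try:
    data_array.map(lambda row: row.sum())
except AttributeError as exc:
    message = str(exc)
    print(message)  # 'numpy.ndarray' object has no attribute 'map'

# Local equivalent of rdd.map(f).reduce(g): builtin map plus functools.reduce.
row_sums = list(map(lambda row: float(row.sum()), data_array))
total = reduce(lambda x, y: x + y, row_sums)
print(row_sums, total)  # [3.0, 7.0, 11.0] 21.0
```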

From the code, you can see that I tried to create the array in two different ways. Neither worked. If I try to feed in the rows straight from the subset DataFrame, I get this error:

AttributeError: 'DataFrame' object has no attribute 'map'

What am I missing here?
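For comparison, here is a minimal sketch of the same evaluation step done locally with NumPy. It assumes the rows have already been collected to the driver (as the commented-out `np.array(... .collect())` line does) and that the cluster centers are available as an array; both the data and the center values below are hypothetical. Vectorized NumPy operations take the place of the RDD-style `.map(...).reduce(...)` chain, and, like `error()` in the question, it sums each point's Euclidean distance to its nearest center.

```python
import numpy as np

# Hypothetical collected feature rows and two hypothetical cluster centers;
# in the question these would come from collect() and clusters.centers.
data_array = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 10.0], [11.0, 10.0]])
centers = np.array([[0.5, 0.0], [10.5, 10.0]])

# Distance from every point to every center: shape (n_points, n_centers).
distances = np.linalg.norm(data_array[:, None, :] - centers[None, :, :], axis=2)

# Mirrors clusters.predict(point): index of the nearest center per point.
labels = distances.argmin(axis=1)

# Mirrors error(point): distance from each point to its assigned center.
errors = distances[np.arange(len(data_array)), labels]

# Mirrors .map(error).reduce(lambda x, y: x + y): sum the per-point errors.
wssse = float(errors.sum())
print("Within Set Sum of Squared Error = " + str(wssse))  # ... = 2.0
```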

