Apply Kmeans in partitions

dimitris plakas
Hello everyone,

I have a DataFrame with 5040 rows that are split into 5 groups. A column called "Group_Id" marks every row with a value from 0 to 4, depending on which group the row belongs to. I am trying to split my DataFrame into 5 partitions and apply KMeans to every partition. I have tried:

rdd=mydataframe.rdd.mapPartitions(function, True)
test = Kmeans.train(rdd, num_of_centers, "random")

but I get an error.

How can I apply KMeans to every partition?

Thank you in advance,
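
A possible direction for the question above, offered only as a sketch: MLlib's KMeans.train fits one model over the whole RDD it receives, so calling it on the result of mapPartitions does not produce one model per partition. One option is to filter the DataFrame by "Group_Id" and train five times. Another is to run a small local k-means inside the function passed to mapPartitions. The plain-Python example below illustrates the second idea. It swaps in a hand-written Lloyd's algorithm instead of MLlib, simulates the five partitions with lists so it runs without Spark, and uses hypothetical names (lloyd_kmeans, kmeans_per_partition) and made-up toy data.

```python
import random
from collections import defaultdict


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd_kmeans(points, k, iterations=20, seed=0):
    """Plain-Python Lloyd's algorithm (not MLlib); returns a list of k centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # random initialisation from the data
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its members; keep it if the cluster is empty.
        centers = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members else centers[i]
            for i, members in enumerate(clusters)
        ]
    return centers


def kmeans_per_partition(rows, k=2):
    """Hypothetical mapPartitions-style function.

    rows: iterator of (group_id, feature_tuple) pairs.
    Yields one (group_id, centers) pair per group seen in the partition.
    """
    by_group = defaultdict(list)
    for group_id, features in rows:
        by_group[group_id].append(tuple(features))
    for group_id, points in by_group.items():
        yield group_id, lloyd_kmeans(points, k)


# Toy stand-in for an RDD with 5 partitions, one per Group_Id (made-up data).
partitions = [
    [(g, (10 * g + i * 0.01, (i % 7) * 0.1)) for i in range(50)]
    for g in range(5)
]

# Apply the function to each partition, as mapPartitions would.
models = dict(
    result
    for part in partitions
    for result in kmeans_per_partition(iter(part), k=2)
)

for group_id in sorted(models):
    print(group_id, models[group_id])
```

In Spark, the same function could presumably be applied after keying the rows by "Group_Id" and partitioning on that key, so that each partition holds exactly one group. This only makes sense when each group fits in a single executor's memory, which seems plausible for about 1000 rows per group.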

Re: Apply Kmeans in partitions

Apostolos N. Papadopoulos
Hi Dimitri,

What error are you getting? Please specify.

Apostolos


On 30/1/19 16:30, dimitris plakas wrote:

> Hello everyone,
>
> I have a DataFrame with 5040 rows that are split into 5 groups. A
> column called "Group_Id" marks every row with a value from 0 to 4,
> depending on which group the row belongs to. I am trying to split my
> DataFrame into 5 partitions and apply KMeans to every partition. I
> have tried:
>
> rdd=mydataframe.rdd.mapPartitions(function, True)
> test = Kmeans.train(rdd, num_of_centers, "random")
>
> but I get an error.
>
> How can I apply KMeans to every partition?
>
> Thank you in advance,

--
Apostolos N. Papadopoulos, Associate Professor
Department of Informatics
Aristotle University of Thessaloniki
Thessaloniki, GREECE
tel: ++0030312310991918
email: [hidden email]
twitter: @papadopoulos_ap
web: http://datalab.csd.auth.gr/~apostol

