
How to display the count during reduceByKey

@vidhkarthigeyan wrote:

Hi, I have written a function in PySpark to count the occurrences of each word in a text file. I mostly followed one of the articles on your site, but I am unable to display the count during reduceByKey. When I print the RDD, it displays
"[(<pyspark.resultiterable.ResultIterable object at 0x00000000065A7A58>, u'AIADMK'), (<pyspark.resultiterable.ResultIterable object at 0x00000000065A7AC8>, u'Leader') " whereas I am expecting the count of AIADMK.
This is my code to display the count:
def groupByKey(rdd):
    rddForWords = filterMap(rdd)
    #print(filterMap(rdd).collect())
    mapWordsTo1 = rddForWords.map(lambda x: (x, 1))
    #print(mapWordsTo1.collect())
    groupByKeyRDD = mapWordsTo1.groupByKey()
    #print(list((j[0], list(j[1])) for j in groupByKeyRDD.take(100)))
    return groupByKeyRDD
    #print(groupByKeyRDD.collect())

def reduceByKey(rdd):
    reduceByKeyRDD = groupByKey(rdd).reduceByKey(lambda x, y: x + y).map(lambda x: (x[1], x[0])).sortByKey(False)
    print(reduceByKeyRDD.take(100))
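
A likely explanation for the ResultIterable output is that reduceByKey is applied after groupByKey. Once the pairs are grouped, every word occurs exactly once as a key, so the lambda never has two values to merge and each grouped iterable passes through unchanged. The sketch below models this in plain Python so it can be run without Spark. The helpers reduce_by_key and group_by_key are hypothetical stand-ins written for this illustration, not the PySpark API, and the three sample words are made up.

```python
def reduce_by_key(pairs, func):
    """Plain-Python stand-in for RDD.reduceByKey (illustration only)."""
    merged = {}
    for key, value in pairs:
        # func is only called when the key already holds a value to merge with
        merged[key] = func(merged[key], value) if key in merged else value
    return list(merged.items())


def group_by_key(pairs):
    """Plain-Python stand-in for RDD.groupByKey (illustration only)."""
    grouped = {}
    for key, value in pairs:
        grouped.setdefault(key, []).append(value)
    return list(grouped.items())


# Made-up sample input, mapped to (word, 1) pairs as in the post
words = ["AIADMK", "Leader", "AIADMK"]
pairs = [(w, 1) for w in words]

# What the posted code does: group first, then reduce.
# Each key now occurs once, so the lambda is never invoked and the
# grouped values come out unchanged instead of being summed.
grouped_then_reduced = reduce_by_key(group_by_key(pairs), lambda x, y: x + y)

# Reducing the (word, 1) pairs directly produces the counts.
counts = reduce_by_key(pairs, lambda x, y: x + y)

# Swap to (count, word) and sort descending, mirroring the map/sortByKey step.
by_count = sorted(((c, w) for w, c in counts), reverse=True)

print(grouped_then_reduced)
print(by_count)
```

In the PySpark code itself, the equivalent change would presumably be to skip the grouping step and call mapWordsTo1.reduceByKey(lambda x, y: x + y) directly, or to keep groupByKey() and follow it with mapValues(sum) instead of reduceByKey.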

Posts: 1

Participants: 1
