Channel: Data Science, Analytics and Big Data discussions - Latest topics

Simple beer recommendation system using cosine similarity


@anurag551 wrote:

I have been trying to build a beer recommendation engine. After looking at Stack Overflow, I decided to keep it simple and use TF-IDF and cosine similarity.

So far my code looks like this:

```python
import re

import numpy as np
import pandas as pd
from bs4 import BeautifulSoup
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from sklearn.feature_extraction.text import TfidfVectorizer

wnlzer = WordNetLemmatizer()
# Build the stop-word set once instead of once per token.
stop_words = set(stopwords.words('english'))

train = pd.read_csv("labeledTrainData.tsv", header=0, delimiter='\t', quoting=3)


def raw_string_to_list_clean_string(raw_train_review):
    # Strip HTML, keep only letters and spaces, lowercase,
    # drop stop words, lemmatize, and join back into one string.
    remove_html = BeautifulSoup(raw_train_review, "html.parser").text
    remove_punch = re.sub('[^A-Za-z ]', "", remove_html)
    token = remove_punch.lower().split()
    srm_token = [wnlzer.lemmatize(i) for i in token if i not in stop_words]
    clean_text = " ".join(srm_token)
    return clean_text


ready_train_list = []
length = len(train['review'])
for i in range(0, length):
    if i % 100 == 0:
        print("doing %d of %d of training data set" % (i + 1, length))
    a = raw_string_to_list_clean_string(train['review'][i])
    ready_train_list.append(a)

vectorizer = TfidfVectorizer(analyzer="word", tokenizer=None, preprocessor=None,
                             stop_words=None, max_features=20000)
training_our_vectorizer = vectorizer.fit_transform(ready_train_list)
```
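
For reference, here is a minimal sketch of what `fit_transform` returns, since the similarity step works on that output. The three descriptions in `docs` are made up for illustration and stand in for `ready_train_list`. With scikit-learn's default settings, the result should be a sparse matrix with one row per document and one column per vocabulary term, and each row scaled to unit length.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-in for ready_train_list: three cleaned descriptions.
docs = [
    "dark roasted malt coffee stout",
    "citrus hop bitter pale ale",
    "roasted malt chocolate porter",
]

vectorizer = TfidfVectorizer(analyzer="word", max_features=20000)
tfidf = vectorizer.fit_transform(docs)

# One row per document, one column per distinct term in the vocabulary.
n_docs, n_terms = tfidf.shape
print("documents:", n_docs, "terms:", n_terms)

# The default norm="l2" scales every row to unit length, so the dot
# product of two rows is already their cosine similarity.
row_norms = np.linalg.norm(tfidf.toarray(), axis=1)
print("row norms:", row_norms)

# vocabulary_ maps each term to its column index in the matrix.
malt_column = vectorizer.vocabulary_["malt"]
print("column for 'malt':", malt_column)
```

Calling `toarray()` is fine for a toy example, but for 20000 features it is probably better to keep the matrix sparse.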

Now I know how to compute cosine similarity, but I can't figure out:
1. how to use the matrix that cosine similarity generates
2. how to restrict the recommendations to a maximum of 5 beers
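
One possible way to approach both questions, as a sketch rather than a definitive answer: `cosine_similarity` from scikit-learn returns a square matrix in which entry `[i, j]` is the similarity between item `i` and item `j`. Row `i` therefore holds the scores of every item against item `i`, and sorting that row and keeping the first five entries caps the list at 5. The beer names, descriptions, and the `recommend` helper below are hypothetical and exist only for illustration.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical catalogue: names and already-cleaned descriptions.
beer_names = ["Midnight Stout", "Sunny Pale Ale", "Cocoa Porter", "Hazy IPA",
              "Wheat Beer", "Amber Lager", "Sour Cherry"]
beer_texts = [
    "dark roasted malt coffee stout",
    "citrus hop bitter pale ale",
    "roasted malt chocolate porter",
    "juicy citrus hop hazy ipa",
    "banana clove wheat yeast",
    "caramel malt crisp lager",
    "tart cherry sour fruit",
]

tfidf = TfidfVectorizer().fit_transform(beer_texts)

# Square matrix: sim_matrix[i, j] is the similarity of beer i and beer j.
sim_matrix = cosine_similarity(tfidf)


def recommend(index, sim_matrix, names, top_n=5):
    """Return up to top_n (name, score) pairs most similar to names[index]."""
    scores = sim_matrix[index].copy()
    scores[index] = -1.0  # exclude the query beer itself
    top = np.argsort(scores)[::-1][:top_n]  # highest scores first
    return [(names[i], float(scores[i])) for i in top]


recs = recommend(0, sim_matrix, beer_names, top_n=5)
for name, score in recs:
    print("%-15s %.3f" % (name, score))
```

For a large catalogue the full square matrix may not fit in memory. In that case, calling `cosine_similarity(tfidf[index], tfidf)` computes only the single row that is needed.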


