@vliendo wrote:
Have a good day, everybody. Some time ago I read an interesting post on solving multi-label classification problems by someone from the Analytics Vidhya community. You can find it at https://www.analyticsvidhya.com/blog/2017/08/introduction-to-multi-label-classification/. I have tried the different approaches and techniques described in the post, but on two datasets from the MULAN repository (emotions.arff and yeast.arff) rather than the artificial dataset. Here is a screenshot of yeast.arff.
So far I have gotten very poor accuracy scores. Any hint about why? Are these reasonable scores given that the dataset has a lot of features (100+)? I also tried with emotions.arff, but the results are not much different.
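One thing worth noting when judging these numbers: for multi-label targets, sklearn's `accuracy_score` computes subset accuracy, i.e. a sample only counts as correct when all of its labels match exactly, which is far stricter than per-label metrics such as `hamming_loss`. A small sketch with made-up label matrices (not the yeast data) illustrates the gap:

```python
import numpy as np
from sklearn.metrics import accuracy_score, hamming_loss

# made-up true and predicted label matrices (4 samples, 3 labels)
y_true = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 1, 0],
                   [0, 0, 1]])
y_pred = np.array([[1, 0, 1],   # exact match
                   [0, 1, 1],   # one label wrong
                   [1, 0, 0],   # one label wrong
                   [0, 0, 1]])  # exact match

# subset accuracy: only rows that match exactly count (2 of 4)
print(accuracy_score(y_true, y_pred))  # 0.5
# hamming loss: fraction of individual label errors (2 of 12)
print(hamming_loss(y_true, y_pred))    # ~0.167
```

With 14 labels per sample, as in yeast.arff, an exact 14-label match is rare, so low subset-accuracy numbers are not necessarily a sign that the classifier is useless.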
Here is an example of what I'm doing, for yeast.arff:
```python
import pandas as pd
import numpy as np
from scipy.io import arff
from scipy import sparse
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# load the ARFF file into a (data, meta) pair
data, meta = arff.loadarff('/home/victorl/Documents/DATAScience/MachineLearning-AND-DATA-Mining/MULTICLASS-MULTILABEL/DATASETS/yeast.arff')
df = pd.DataFrame(data)

# the 14 class columns are stored as object dtype, so convert them to numeric
for i in range(1, 15):
    df['Class%d' % i] = pd.to_numeric(df['Class%d' % i])

# separate the data into features and labels
features = list(df.columns[0:103])
labels = list(df.columns[103:])
X = df[features]
Y = df[labels]

# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, Y, random_state=0)

# a sparse representation of the matrices is preferred
X_train_sp = sparse.csr_matrix(X_train.values)
y_train_sp = sparse.csr_matrix(y_train.values)
```
First approach, MLkNN:

```python
from skmultilearn.adapt import MLkNN

classifier1 = MLkNN(k=5)
# train
classifier1.fit(X_train_sp, y_train_sp)
# predict
predictions = classifier1.predict(X_test.values)
print(accuracy_score(y_test, predictions.todense()))
# 0.186776859504
```
Second approach, MLARAM:

```python
from skmultilearn.neurofuzzy import MLARAM

classifier2 = MLARAM(vigilance=0.9, threshold=0.02)
classifier2.fit(X_train.values, y_train.values)
predictions = classifier2.predict(X_test.values)
print(accuracy_score(y_test, predictions))
# 0.181818181818
```
Third try, Label Powerset:

```python
from skmultilearn.problem_transform import LabelPowerset
from sklearn.naive_bayes import GaussianNB

# initialize the Label Powerset multi-label classifier
# with a Gaussian naive Bayes base classifier
classifier = LabelPowerset(GaussianNB())
# train
classifier.fit(X_train.values, y_train.values)
# predict
predictions = classifier.predict(X_test.values)
print(accuracy_score(y_test, predictions))
# 0.17520661157
```
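As a side note, the Analytics Vidhya post also describes the Binary Relevance transformation (one independent binary classifier per label), which is not shown above. A minimal self-contained sketch of the same idea, using sklearn's `OneVsRestClassifier` (which acts as binary relevance on multi-label indicator targets) and synthetic data from `make_multilabel_classification` rather than the yeast set:

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, hamming_loss

# synthetic multi-label data standing in for yeast.arff
X, Y = make_multilabel_classification(n_samples=500, n_features=20,
                                      n_classes=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, Y, random_state=0)

# one independent GaussianNB per label (binary relevance)
clf = OneVsRestClassifier(GaussianNB())
clf.fit(X_train, y_train)
pred = clf.predict(X_test)

print(accuracy_score(y_test, pred))  # subset accuracy (strict)
print(hamming_loss(y_test, pred))    # per-label error rate (gentler)
```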
And finally, a multi-output random forest:

```python
from sklearn.multioutput import MultiOutputClassifier
from sklearn.ensemble import RandomForestClassifier

forest = RandomForestClassifier(n_estimators=50, max_depth=8, random_state=1)
multi_target_forest = MultiOutputClassifier(forest, n_jobs=-1)
predictions = multi_target_forest.fit(X_train.values, y_train.values).predict(X_test.values)
print(accuracy_score(y_test, predictions))
# 0.104132231405
```

Thanks in advance …
