@Niranjanp wrote:
Hello,
I have been working on a text classification problem with three outcome variables, each of which is multi-class.
The dataset is described below.

The dataset covers accidents that have happened in industry over the years,
classified according to their Degree, Nature and Occupation.

My requirement is to build a classifier that classifies documents and assigns
them a score, so that any new document coming from news, Google search or
Twitter is classified according to its degree, nature and occupation. I then
want to categorize the documents into high, medium and low risk using the
score value (that is, rank them).

The main goal is to build an intelligent classifier that helps prevent and
mitigate accidents in industry beforehand. It is a kind of risk prediction
model.

Dataset summary:
Independent feature (input variable):
Description => Information about the cause of the particular accident
(this is my text document).

Dependent/outcome features:
Degree => Hospitalized, Non Hospitalized, Fatality (3 classes)
Nature => has many types/classes
Occupation => Occupation of the employees

I have read many papers that mention how to approach this problem.
- Combine the outcome variables into one (I don't know how that is going to
work).
- Use a two-level model: first build a classifier for the first outcome
variable, and then one for the second.

How should I go about this? I am really struggling with how to approach the
problem.

I have built a classifier for a single outcome variable, but how do I do it
for two or more outcome variables that are themselves multi-class?

If I am not wrong, is this a multi-class, multi-label classification problem?
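
To make the options above concrete, here is a minimal sketch, not a definitive implementation. scikit-learn's documentation calls this setup "multiclass-multioutput" classification, and its `MultiOutputClassifier` fits one independent classifier per outcome column. The second half shows the "combine the outcome variables into one" idea by joining the three labels into a single class string. The example rows, the Nature and Occupation label values, and the ` | ` separator are all invented for illustration and do not come from the real dataset.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy rows invented for illustration only; not taken from the real dataset.
descriptions = [
    "worker fell from scaffold and was taken to hospital",
    "electrician received fatal shock from live wire",
    "operator cut finger on saw, treated on site",
    "roofer fell through skylight and died",
    "welder burned hand, first aid given on site",
    "driver crushed by forklift and hospitalized",
]
# One row per document, one column per outcome: Degree, Nature, Occupation.
# The Nature and Occupation values are hypothetical placeholders.
labels = np.array([
    ["Hospitalized", "Fall", "Construction"],
    ["Fatality", "Electric shock", "Electrician"],
    ["Non Hospitalized", "Cut", "Machine operator"],
    ["Fatality", "Fall", "Construction"],
    ["Non Hospitalized", "Burn", "Welder"],
    ["Hospitalized", "Struck by", "Driver"],
])

new_docs = ["painter fell from ladder and died"]

# Option A: one independent classifier per outcome column.
per_outcome = make_pipeline(
    TfidfVectorizer(),
    MultiOutputClassifier(LinearSVC(random_state=0)),
)
per_outcome.fit(descriptions, labels)
pred = per_outcome.predict(new_docs)  # shape: (n_documents, 3)

# Option B: combine the three outcomes into one class string per document.
# This can only ever predict combinations that occur in the training data.
SEP = " | "
combined = [SEP.join(row) for row in labels]
single = make_pipeline(TfidfVectorizer(), LinearSVC(random_state=0))
single.fit(descriptions, combined)
pred_combined = single.predict(new_docs)[0].split(SEP)

print(pred[0])
print(pred_combined)
```

Option A treats the three outcomes as unrelated, while Option B lets the model learn them jointly at the cost of a much larger and sparser set of classes. The sketch does not cover the two-level variant, where the prediction for the first outcome is fed into the classifier for the second.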
I have done all the feature extraction, stemming and stop-word removal.
I am using the tf-idf approach (and am also thinking of using the word2vec
approach).
I am using Python.

Machine learning algorithms:
scikit-learn's Naive Bayes and SVM algorithms.

Any similar example or reference would be helpful.
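
For the scoring step described earlier (ranking documents and splitting them into high, medium and low risk), the following is a rough sketch under stated assumptions. It assumes each document already has a score between 0 and 1, for example the predicted probability of the "Fatality" class from a Naive Bayes model's `predict_proba`. The function names, the document IDs and the 0.33 / 0.66 cut-offs are hypothetical placeholders; real thresholds would have to be chosen from the data.

```python
def risk_band(score, medium_from=0.33, high_from=0.66):
    """Map a score in [0, 1] to 'low', 'medium' or 'high'.

    The default cut-offs are arbitrary placeholders, not tuned values.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score >= high_from:
        return "high"
    if score >= medium_from:
        return "medium"
    return "low"


def rank_documents(scored_docs):
    """Sort (doc_id, score) pairs by descending score and attach a band."""
    ranked = sorted(scored_docs, key=lambda pair: pair[1], reverse=True)
    return [(doc_id, score, risk_band(score)) for doc_id, score in ranked]


# Hypothetical scores, e.g. a classifier's predicted probability of "Fatality".
scored = [("doc_a", 0.12), ("doc_b", 0.81), ("doc_c", 0.45)]
ranking = rank_documents(scored)
for doc_id, score, band in ranking:
    print(doc_id, score, band)
```

Keeping the banding separate from the classifier means the thresholds can be adjusted later without retraining anything.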
Thanks,