@bgarcial wrote:
I have the following dataset, in which the Direccion del viento (Pos) column has categorical values. In total, Direccion del viento (Pos) has 8 categories:
- SO: Sur oeste (southwest)
- SE: Sur este (southeast)
- S: Sur (south)
- N: Norte (north)
- NO: Nor oeste (northwest)
- NE: Nor este (northeast)
- O: Oeste (west)
- E: Este (east)

Then I convert this dataframe to a NumPy array and I get:
direccion_viento_pos dtype: bool [['S'] ['S'] ['S'] ... ['SO'] ['NO'] ['SO']]
Since these are character string values and I want numeric values, I need to encode the categorical variable, that is, convert the text into numerical values.
Then I perform two activities:
- I use LabelEncoder() to encode the values as numbers, one number per category. Label encoding simply converts each value in a column to a number:

    from sklearn.preprocessing import LabelEncoder, OneHotEncoder

    labelencoder_direccion_viento_pos = LabelEncoder()
    direccion_viento_pos[:, 0] = labelencoder_direccion_viento_pos.fit_transform(direccion_viento_pos[:, 0])

- I use one-hot encoding to convert each category value into a new column and assign a 1 or 0 (True/False) value to that column:

    onehotencoder = OneHotEncoder(categorical_features=[0])
    direccion_viento_pos = onehotencoder.fit_transform(direccion_viento_pos).toarray()

This way, I get these new values:
    direccion_viento_pos
    array([[0., 0., 0., ..., 1., 0., 0.],
           [0., 0., 0., ..., 1., 0., 0.],
           [0., 0., 0., ..., 1., 0., 0.],
           ...,
           [0., 0., 0., ..., 0., 0., 1.],
           [0., 0., 0., ..., 0., 0., 0.],
           [0., 0., 0., ..., 0., 0., 1.]])
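
To make the label-encoding step concrete, here is a minimal sketch of which integer each category receives. The sample values are made up for illustration (an assumption, not the real dataset); `LabelEncoder` and its `classes_` attribute are standard scikit-learn.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample standing in for the real column (assumption).
direccion_viento_pos = np.array(["S", "S", "SO", "NO", "E", "N", "NE", "O", "SE"])

labelencoder = LabelEncoder()
codes = labelencoder.fit_transform(direccion_viento_pos)

# classes_ holds the sorted unique labels; a label's position is its integer code.
mapping = {str(label): code for code, label in enumerate(labelencoder.classes_)}

print(mapping)
print(codes)
```

The codes follow the alphabetical order of the labels, which is presumably why the one-hot columns later come out in the order E, N, NE, NO, O, S, SE, SO.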
Then I convert this direccion_viento_pos array to a dataframe so it is easier to inspect:

    # Turn the array into a dataframe with named columns
    cols = ['E', 'N', 'NE', 'NO', 'O', 'S', 'SE', 'SO']
    df_direccion_viento = pd.DataFrame(direccion_viento_pos, columns=cols)
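
As a side note, the column names do not have to be typed by hand. The sketch below assumes a recent scikit-learn release, where `OneHotEncoder` accepts string columns directly and the `categorical_features` argument no longer exists (it was removed in version 0.22), and reads the names from the fitted encoder's `categories_` attribute. The sample data is hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical sample with one row per category (assumption, not the real data).
direccion_viento_pos = np.array([["S"], ["SO"], ["NO"], ["E"], ["N"], ["NE"], ["O"], ["SE"]])

# Recent scikit-learn versions encode string categories directly,
# so no LabelEncoder pass is needed first.
onehotencoder = OneHotEncoder()
encoded = onehotencoder.fit_transform(direccion_viento_pos).toarray()

# categories_ holds one array of sorted categories per input column.
cols = onehotencoder.categories_[0].tolist()
df_direccion_viento = pd.DataFrame(encoded, columns=cols)

print(df_direccion_viento)
```

Taking the names from `categories_` keeps the labels and the encoded columns in sync if a category is missing from the data or a new one appears.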
This gives me one new column per category value, with a 1 or 0 (True/False) in each column.
If I use the pandas.get_dummies() function, I get the same result.
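
The equivalence with `pandas.get_dummies()` can be checked with a small sketch like the following. The dataframe contents are invented for illustration (an assumption); only the column name comes from the post.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical dataframe reusing the column name from the post (values are made up).
df = pd.DataFrame(
    {"Direccion del viento (Pos)": ["S", "S", "SO", "NO", "E", "N", "NE", "O", "SE"]}
)

# get_dummies on a Series yields one column per category, sorted alphabetically.
dummies = pd.get_dummies(df["Direccion del viento (Pos)"], dtype=float)

# OneHotEncoder expects 2D input, hence the double brackets.
encoded = OneHotEncoder().fit_transform(df[["Direccion del viento (Pos)"]]).toarray()

# True when both approaches produce identical 0/1 matrices.
same = bool((dummies.to_numpy() == encoded).all())
print(list(dummies.columns), same)
```

One practical difference is that a fitted `OneHotEncoder` can be reused to transform new data with the same column layout, while `get_dummies()` derives its columns from whatever values are present each time it is called.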
My question is:
Is this the best way to deal with these categorical variables?
Doesn't having a column for each category, with zeros in most of them, introduce bias or noise into the data when machine learning algorithms are applied?
I've recently started reading about it in this article, but I would appreciate any guidance on this.