Channel: Data Science, Analytics and Big Data discussions - Latest topics

Regression analysis


@kthouz wrote:

Hi everyone!

I have a dataset of 10k rows and 350 columns. 97% of the independent variables follow a standard normal distribution (mean 0 and unit standard deviation). The remaining 3% are unordered categorical variables whose levels are uniformly distributed. The dependent variable follows a symmetric Gaussian distribution with mean 1 and standard deviation 5.

The goal is to build a predictive model.

My approach is to use regression instead of classification.

My question: could anybody give me some tips on how to proceed? I don't want to rush into gradient boosting algorithms before a proper analysis. I am looking for a sound statistical approach (with appropriate statistical tests).

  • I have done the data exploration and computed correlations. The most strongly correlated variables have a correlation of 0.5, so it is not obvious which variables to drop.
  • Of all the variables, I found that three have a standard deviation of 0.7 instead of 1 like the rest. Interestingly, these three are among the five variables most strongly correlated with the target variable.
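
The two checks above (unusual scale and correlation with the target) could be made repeatable with something like the following. This is only a minimal sketch: the function name `screen_features`, the tolerance `std_tol`, and the synthetic data are assumptions for illustration and are not the real dataset. It covers the numeric columns only (the categorical ones would need separate handling), and it uses a Bonferroni-corrected test of zero correlation with a normal approximation that is reasonable at roughly 10k rows.

```python
from statistics import NormalDist

import numpy as np


def screen_features(X, y, alpha=0.05, std_tol=0.1):
    """Screen numeric columns of X against the target y.

    Returns per-column standard deviations, Pearson correlations with y,
    a mask of columns whose std differs from 1 by more than std_tol, and
    a mask of columns whose correlation with y is significant after a
    Bonferroni correction over all columns.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = X.shape

    stds = X.std(axis=0, ddof=1)

    # Pearson correlation of every column with the target.
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    r = (Xc.T @ yc) / (np.sqrt((Xc ** 2).sum(axis=0)) * np.sqrt((yc ** 2).sum()))

    # t-statistic for H0: rho = 0. With n around 10k it is close to
    # standard normal, so a normal quantile is used as the threshold.
    t = r * np.sqrt((n - 2) / (1.0 - r ** 2))

    # Two-sided Bonferroni threshold across the m tests.
    z_crit = NormalDist().inv_cdf(1.0 - alpha / (2 * m))

    return {
        "std": stds,
        "corr": r,
        "odd_scale": np.abs(stds - 1.0) > std_tol,
        "significant": np.abs(t) > z_crit,
    }


# Synthetic stand-in for the numeric part of the data (assumed shapes and
# coefficients, chosen only to mimic the description in the post).
rng = np.random.default_rng(0)
n_rows, n_cols = 10_000, 340
X = rng.standard_normal((n_rows, n_cols))
X[:, :3] *= 0.7  # three columns with a standard deviation of about 0.7
y = 1.0 + 3.0 * X[:, :3].sum(axis=1) + rng.normal(0.0, 3.0, n_rows)

result = screen_features(X, y)
top5 = np.argsort(-np.abs(result["corr"]))[:5]
print("columns with unusual scale:", np.flatnonzero(result["odd_scale"]))
print("top 5 by |correlation|:", top5)
print("significant after Bonferroni:", np.flatnonzero(result["significant"]))
```

With 350 candidate predictors, some correction for multiple testing matters. Bonferroni is used here only because it is the simplest to write down, and a less conservative procedure may suit the problem better.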

Posts: 4

Participants: 3


