Channel: Data Science, Analytics and Big Data discussions - Latest topics
Viewing all articles
Browse latest Browse all 4448

Data Normality Question


@mohitlearns wrote:

Hi,

Let’s say I have a small dataset describing the demographics of a place, with fields/attributes like:

  • Place_Longitude,
  • Place_Latitude,
  • population (per district in the state),
  • median_income,
  • median_house_age,
  • median_house_value,
  • district_location (near sea or inland),
  • total_households (per state district)

I plotted histograms for every attribute in the dataset and saw long-tailed distributions. The data is NOT normal (Gaussian).

Now my questions are:

  • When checking for normality (a Gaussian curve), should we check all the attributes or only a few particular ones? Which would those be in my case above?

  • When transforming data to make it normal, do we transform all the available attributes or only a few important ones?

  • How should I handle such a transformation: through the scipy.stats module using Box-Cox, or by some other technique? Kindly explain.
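For reference, the Box-Cox route mentioned in the last question can be sketched as below. This is only an illustration under stated assumptions: the two columns are synthetic stand-ins (not the real dataset), and the skewness cutoff of 1.0 is an arbitrary choice made for the sketch, not an established rule. It checks each numeric column separately and transforms only the ones that look strongly skewed.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic stand-ins for two numeric columns (assumption: not the real data).
columns = {
    "median_income": rng.lognormal(mean=1.0, sigma=0.6, size=2000),  # long right tail
    "median_house_age": rng.uniform(1.0, 52.0, size=2000),           # roughly flat
}

SKEW_THRESHOLD = 1.0  # arbitrary cutoff chosen for this sketch

results = {}
for name, values in columns.items():
    skew_before = stats.skew(values)
    if abs(skew_before) > SKEW_THRESHOLD:
        # Box-Cox requires strictly positive input; with no lambda given,
        # scipy fits lambda by maximum likelihood and returns it.
        transformed, lam = stats.boxcox(values)
        results[name] = (skew_before, stats.skew(transformed), lam)
    else:
        # Leave columns that are not strongly skewed untouched.
        results[name] = (skew_before, skew_before, None)

for name, (before, after, lam) in results.items():
    print(f"{name}: skew before={before:.2f}, after={after:.2f}, lambda={lam}")
```

Because Box-Cox only accepts strictly positive values, columns such as Place_Longitude (which can be negative) or the categorical district_location would not be candidates for it as they stand.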

PS: Can data scaling/standardization make data normal?
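One way to probe the PS question is to compare skewness before and after standardizing. The sketch below uses a synthetic long-tailed sample (an assumption, not the real data). Standardization only shifts and rescales the values, so the shape of the distribution, as measured here by skewness, should come out the same.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Synthetic long-tailed sample (assumption: stands in for a skewed column).
x = rng.lognormal(mean=0.0, sigma=0.8, size=5000)

# Standardization: subtract the mean, divide by the standard deviation.
z = (x - x.mean()) / x.std()

skew_raw = stats.skew(x)
skew_scaled = stats.skew(z)

print(f"mean after scaling: {z.mean():.3f}, std after scaling: {z.std():.3f}")
print(f"skew before: {skew_raw:.4f}, skew after: {skew_scaled:.4f}")
```

The scaled values have mean 0 and standard deviation 1, but the long tail is still there; a nonlinear transform (log, Box-Cox, and similar) is what changes the shape.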

Thanks & Regards,

Mohit

Posts: 1

Participants: 1
