McKinsey Analytics Online Hackathon
@him4318 wrote: Where can I get a solution to the McKinsey Analytics Online Hackathon problem statement?
Collinearity among categorical variables
@kpksr wrote: Hi, how do I check for multicollinearity among categorical variables? My target variable is also categorical, with 2 possible values. Can I use VIF as in linear regression model...
Test file loan status missing
@atybzz wrote: Hey guys, I am new to data science and very excited to start learning! I took on this loan prediction challenge. So I cleaned the data, applied a feature extraction technique (PCA) and...
Outer Join in Pandas
@kailash_negi wrote: Hi, I have two pandas dataframes and have to perform an outer join on them, but I am not getting the desired results. What I want is that, when I perform the join, missing data...
How do I use the parameters after I have found them using multiple linear...
@Tan_Moy wrote: I wrote an algorithm for multiple linear regression from scratch using Python, which works as expected. My question is, how do I use the parameters that I got to predict values for the...
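A minimal sketch of one way to apply fitted parameters to new rows, assuming the parameter vector is laid out as `[intercept, w1, ..., wn]`. The layout, the `predict` helper and the sample numbers are hypothetical and not taken from the post. If the features were scaled during training, the same scaling would have to be applied to new rows before predicting.

```python
def predict(rows, theta):
    """Predict y for each row, assuming theta = [intercept, w1, ..., wn]."""
    predictions = []
    for row in rows:
        if len(row) != len(theta) - 1:
            raise ValueError("each row needs one value per fitted weight")
        # Linear model: intercept plus the weighted sum of the features.
        y_hat = theta[0] + sum(w * x for w, x in zip(theta[1:], row))
        predictions.append(y_hat)
    return predictions


# Hypothetical fitted parameters and two new observations.
theta = [1.0, 2.0, 3.0]
new_rows = [[1.0, 1.0], [0.0, 2.0]]
predictions = predict(new_rows, theta)
print(predictions)
```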
Analysis of road accidents in New York
@raj6218a wrote: Identify a dataset for New York. Apply a data model algorithm using the RapidMiner tool. Apply the same data model (the one applied in RapidMiner) using Python-related tools. Do a comparison of...
RStudio - Read a file in Unix from R
@Hari3289 wrote: Is it possible to connect to a remote SSH server (Unix) with a username and password and read a file? I’m using Windows Server and I need to do data profiling for that file. Need your...
How to avoid self-fulfilling prediction in recommendation systems?
@vjk wrote: I have two questions on recommendation ML systems. 1. If an ML system predicts that a user is likely to buy another item, wouldn’t it mean that he is going to buy the recommended item anyway...
Why Does Cleaning and Collecting Data Take So Long?
@peters64s wrote: It is said a vast majority of a data scientist’s time is spent gathering and cleaning data. Why don’t data scientists use data cleaning and wrangling software to save time? What are...
RStudio importing problem
@premsheth wrote: I am Premal Sheth. I am using RStudio on a Windows 7 machine. I was trying to import text files into RStudio using the VCorpus command for text analysis. But in some files there is the word " 'll...
How to deal with Input(text written by human) -> LOCODE
@mhadjis wrote: What algorithm (machine learning) do you use for solving this problem: input(text written by human) -> United Nations Code for Trade and Transport Locations (UN/LOCODE). For...
How to connect TM1 REST API to R
@mukund840 wrote: Hi all, I have recently been working on Cognos TM1, which is a planning, budgeting and forecasting tool, but it is limited in statistical calculations. To add to its capability I want to...
Dealing with zeros in log transformation
@santoshb7 wrote: How can we deal with the 0s in data while transforming it to log scale? Also, we don’t want to discard the 0s.
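One common option, sketched here under the assumption that the data is non-negative, is to use log(1 + x) instead of log(x): zeros map to zero and nothing is discarded. The helper names and sample values are hypothetical. Whether a shifted log suits a given analysis depends on the data, so treat this as a starting point, not a recommendation.

```python
import math


def log1p_transform(values):
    """Apply log(1 + x) to each value; requires non-negative input."""
    if any(v < 0 for v in values):
        raise ValueError("log1p_transform expects non-negative values")
    return [math.log1p(v) for v in values]


def inverse_log1p_transform(values):
    """Undo log1p_transform with exp(y) - 1."""
    return [math.expm1(v) for v in values]


# Hypothetical sample that contains a zero.
data = [0, 1, 10, 100]
transformed = log1p_transform(data)
restored = inverse_log1p_transform(transformed)
print(transformed)
```

The zero stays at zero after the transform, and `inverse_log1p_transform` maps results back to the original scale.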
How to implement the DBSCAN algorithm in Python without using any packages?
@owenhe wrote: I want to use Python to perform a data analysis task that is going to use the DBSCAN algorithm, because I prefer Python to other languages or tools. However, I am not allowed to use any...
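A rough sketch of DBSCAN in plain Python with no imports, assuming points are tuples of numbers and Euclidean distance is wanted. The function names, `eps`, `min_pts` and the sample points are illustrative. The neighbor search is a brute-force scan, so this is only practical for small data sets.

```python
def euclidean(p, q):
    """Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if euclidean(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    NOISE = -1
    labels = [None] * len(points)  # None means "not visited yet"
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point, keep expanding
    return labels


# Hypothetical data: two tight groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```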
t-SNE and subsequent clustering
@bici_sancta wrote: Hello, I have a question about the validity of, or problems associated with, using clustering methods (e.g., k-means, or spectral, or dbscan, …) on a data set that has been dimensionally...
Are PCA eigenvalues equivalent to semipartial correlation coefficients in...
@Mah1510 wrote: Hi! I am a PhD student in my third month with a background in cognitive neuroscience. I loved statistics and thus developed a rather deep understanding of regression analyses. Now, I...
Regular intervals - Time series analysis
@karanam6uday wrote: Hi, I have two columns with dates in first column and values in second column. I would like to check if first column has dates in regular intervals. data Jan-2017 234 Mar-2017 567...
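A small sketch of one way to test for regular spacing, assuming the dates are `Mon-YYYY` labels already sorted in time order. Only the first two labels come from the post; the remaining sample labels and the helper names are hypothetical.

```python
MONTHS = {
    name: number
    for number, name in enumerate(
        ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
         "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"],
        start=1,
    )
}


def month_index(label):
    """Convert a label such as 'Jan-2017' to a running month count."""
    month_name, year = label.split("-")
    return int(year) * 12 + MONTHS[month_name]


def has_regular_intervals(labels):
    """True if consecutive labels are always the same number of months apart."""
    indices = [month_index(label) for label in labels]
    gaps = {later - earlier for earlier, later in zip(indices, indices[1:])}
    return len(gaps) <= 1


# 'May-2017' and 'Apr-2017' are made-up continuations of the posted data.
regular = has_regular_intervals(["Jan-2017", "Mar-2017", "May-2017"])
irregular = has_regular_intervals(["Jan-2017", "Mar-2017", "Apr-2017"])
print(regular, irregular)
```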
Interactive visuals
@Fahim_Ahmad wrote: Hi dears, I have a data set with 4 variables, below is how my data set looks. m8 m7 q1 percent 2007 Kabul right 43.60567 2007 Kapisa right 37 2007 Parwan right 62.68935 2007 Wardak...
Logistic Regression in R
@amrita4friends wrote: In SAS to model 1s rather than 0s, we use the descending option. We do this because by default, proc logistic models 0s rather than 1s. What happens in case of R’s glm function?...
Using SMOTE function to handle imbalanced data set
@milind275 wrote: I am working on a problem of loan default prediction for a financial risk assessment. I would like to know a good approach to using the SMOTE function for handling the imbalanced dataset...