Channel: Data Science, Analytics and Big Data discussions - Latest topics

Which regression algorithm could be applied for correcting sensor values?


@AbhishekHP wrote:

Please consider the sample dataset below.

In simple terms, the sensors are defective and have recorded incorrect values since 2000. We have 10 years of data with both the measured and the actual values.

P.S. We don't have data for every combination of application and sensor type in every month.

Now we want an algorithm that estimates the actual values from the measured ones.

We tried XGBoost and CatBoost, creating an additional column diff = measured - actual and feeding it to the algorithm to identify the pattern. We are not sure which algorithm is appropriate. We suspect a neural network or a time-series model (ARIMA) could work, but we are unsure because we have only 10 years of monthly data.

library(tidyverse)

train_data <- data.frame(
  time = c(rep("01.2000",10),rep("02.2000",10),rep(".",3),rep("11.2010",10),rep("12.2010",10)),
  application = c(rep("factory",4),rep("residential",3),rep("research",3),
                  rep("factory",2),rep("residential",5),rep("research",3),
                  rep(".",3),
                  rep("factory",2),rep("residential",2),rep("research",6),
                  rep("factory",7),rep("residential",1),rep("research",2)),
  sensor = c(LETTERS[1:10],LETTERS[10:1],rep(".",3),LETTERS[c(5:1,10:6)],LETTERS[c(3:9,2,1,10)]), 
  # NA_real_ rows stand for the omitted months; "." here would coerce the column to character
  measured = c(26.4,2000,1001,23.9,100000,0,1234,12098,34567,0,
               123,676,12,0,100,0,0,98,1,190,
               rep(NA_real_,3),
               3454,0,101,9,1,0,14,1298,677,0,
               264,20220,1851,3.9,1044,0,1764,0,34,0),
  actual =  c(26.4,2010,1001,23.9,100100,237,1234,12098,34567,19583,
              123,706,1112,156,100,650,109,98,10,190,
              rep(NA_real_,3),
              3454,10,101,19,10,40,44,1298,760,50,
              264,20220,1851,39,1048,870,1765,40,35,1110)
)

# to forecast actual 
test_data <- data.frame(
  time = rep("01.2011",10),
  application = c(rep("factory",7),rep("residential",1),rep("research",2)),
  sensor = LETTERS[c(1,4,5,9,3,2,8,6,7,10)], 
  measured = c(26.4,100000,0,0,
               123,12,
               3454,0,20220,1851)
)

How can we predict/forecast the actual values for the 01.2011 data (test_data)?
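
For reference, here is a minimal sketch of the diff = measured - actual setup described in the post. It is not the XGBoost or CatBoost run: as a stand-in it uses a simple per-sensor median of diff, which is an assumption made only for illustration. The names train, test, sensor_diff, diff_hat and actual_hat are hypothetical, and only the 01.2000 and 02.2000 rows of the sample data are used so that the block is self-contained.

```r
# Rows copied from the 01.2000 and 02.2000 blocks of the sample data.
train <- data.frame(
  time     = rep(c("01.2000", "02.2000"), each = 10),
  sensor   = c(LETTERS[1:10], LETTERS[10:1]),
  measured = c(26.4, 2000, 1001, 23.9, 100000, 0, 1234, 12098, 34567, 0,
               123, 676, 12, 0, 100, 0, 0, 98, 1, 190),
  actual   = c(26.4, 2010, 1001, 23.9, 100100, 237, 1234, 12098, 34567, 19583,
               123, 706, 1112, 156, 100, 650, 109, 98, 10, 190)
)

# Target column as described in the post.
train$diff <- train$measured - train$actual

# Illustrative baseline (an assumption, not the method from the post):
# the median diff observed for each sensor.
sensor_diff <- aggregate(diff ~ sensor, data = train, FUN = median)
names(sensor_diff)[2] <- "diff_hat"

# Sensors and measured values of the 01.2011 rows in the post.
test <- data.frame(
  sensor   = LETTERS[c(1, 4, 5, 9, 3, 2, 8, 6, 7, 10)],
  measured = c(26.4, 100000, 0, 0, 123, 12, 3454, 0, 20220, 1851)
)

# Attach the per-sensor correction; sensors never seen in train get NA.
pred <- merge(test, sensor_diff, by = "sensor", all.x = TRUE)

# Invert the diff definition: actual = measured - diff.
pred$actual_hat <- pred$measured - pred$diff_hat
print(pred)
```

Any model that predicts diff (boosted trees included) can take the place of the median step; the last line, which turns a predicted diff back into an estimate of actual, stays the same.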
