Channel: Data Science, Analytics and Big Data discussions - Latest topics

Extracting data belonging to a day from a given range of dates on a dataset


@bgarcial wrote:

I have a dataset covering the date range January 12th to August 3rd, 2018, with some values:

values

The shape of the my_df DataFrame is:

my_df.shape 
(9752, 2)

The rows have a frequency of half an hour.

The first row begins at 2018-01-12:

my_df.iloc[0]
Date:       2018-01-12 00:17:28
Value                      1
Name: 0, dtype: object

And the last row ends at 2018-08-03:

my_df.tail(1)
                    Date:  Value
9751  2018-08-03 23:44:59      1

My objective is to select the rows corresponding to each day and export them to a comma-separated values file.

To get only the January 12th data and save it to a readable file, I run:

# Select the rows for a single day (January 12th)
my_df_Jan12 = my_df[
    (my_df['Fecha:'] >= '2018-01-12 00:00:00')
    & (my_df['Fecha:'] <= '2018-01-12 23:59:59')
]
my_df_Jan12.to_csv('Data_Jan_12.csv', sep=',', header=True, index=False)

From January 12 to August 3 there are 203 days (29 weeks).
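
The day count can be double-checked with the standard library. This is a minimal sketch, assuming the range runs from 2018-01-12 to 2018-08-03 as stated above; the variable names are only illustrative:

```python
from datetime import date

start = date(2018, 1, 12)
end = date(2018, 8, 3)

# Difference between the two dates
elapsed_days = (end - start).days
# Number of distinct calendar dates when both endpoints are counted
calendar_days = elapsed_days + 1

print(elapsed_days, calendar_days)  # 203 204
```

Since the first row falls on January 12 and the last on August 3, a one-file-per-day export may end up with 204 files rather than 203, depending on whether every date actually has rows.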

I don’t want to run this query manually for each day, so I am working from the following basic analysis:

  • I need to generate 203 files (one file per day)
  • The days in January start at 12 (January 12)
  • January is the first month (01) and August is the eighth month (08)

Then:

  • I need to iterate over all 203 days
    • and for each row it is necessary to check the month and day
      of its date in order to detect when either of them changes
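
The iteration described above could also be left to pandas by grouping the rows on the calendar date, so that month and day changes never have to be tracked by hand. The following is only a sketch under some assumptions: the date column is named 'Fecha:' as in the filter code, it can be parsed by pd.to_datetime, and the helper name export_by_day and the date-based file names are hypothetical choices, not part of the original code.

```python
import pandas as pd

def export_by_day(df, date_col="Fecha:", prefix="Values_day_"):
    """Write one CSV per calendar day and return the list of file names."""
    # Make sure the column holds real datetimes rather than strings
    dates = pd.to_datetime(df[date_col])
    written = []
    # Group the rows by calendar date; each group holds one day's rows
    for day, day_rows in df.groupby(dates.dt.date):
        filename = "{}{}.csv".format(prefix, day.isoformat())
        day_rows.to_csv(filename, sep=",", header=True, index=False)
        written.append(filename)
    return written
```

Calling export_by_day(my_df) would then write files such as Values_day_2018-01-12.csv, one for each date that has at least one row. Naming the files after the date instead of a counter is a design choice that keeps each file self-describing.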

Based on the above, I am trying this approach:

# Selecting data value of each day (203 days)
for i in range(203):
    for j in range(1,9): # month
        for k in range(12,32): # days of the month
            values = my_df[(my_df['Fecha:']>='2018-0{}-{} 00:00:00'.format(j,k)) 
            &  
            (my_df['Fecha:']<='2018-0{}-{} 23:59:59'.format(j,k))]
            values.to_csv('Values_day_{}.csv'.format(i), sep=',', header=True, index=False)

My problem, I think, is that range(12,32) for the days is only appropriate for the first month, January, yet the loop applies it to every month …

In the end I get 203 empty comma-separated value files, so I am doing something wrong …
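
One likely explanation for the empty files, assuming the loop runs exactly as posted: the file name depends only on i, so every (month, day) combination overwrites the same file, and the last combination written is month 8, day 31, a date with no rows. The snippet below replays the loop with the filter and to_csv replaced by a dictionary (the name last_written is hypothetical) to make this visible:

```python
last_written = {}  # file name -> date string of the last query written to it

for i in range(203):
    for j in range(1, 9):          # months 1..8
        for k in range(12, 32):    # days 12..31, repeated for every month
            day = '2018-0{}-{}'.format(j, k)
            # Each pass overwrites whatever was stored for file i before
            last_written['Values_day_{}.csv'.format(i)] = day

print(len(last_written))           # 203
print(set(last_written.values()))  # {'2018-08-31'}
```

The same replay also shows that days 1 to 11 of every month are never visited and that impossible dates such as 2018-02-30 are generated.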

How can I address this small challenge in a suitable way?
Any guidance is highly appreciated.

Posts: 1

Participants: 1
