pandas: check whether timestamp values fall within the resampled 30-minute bins of a DatetimeIndex

Question

I have created a resampled data frame (DF1) in pandas with a DatetimeIndex. I have a separate data frame (DF2) with a DatetimeIndex and a time column. If an instance of time from DF2 falls within one of the 30-minute bins of the DatetimeIndex in DF1, I want to mark that instance of time in DF2 with the appropriate speed from the 30-minute bin in DF1.

DF1

                           boat_id     speed
    time
    2015-01-13 09:00:00  28.000000  0.000000
    2015-01-13 09:30:00  28.000000  0.723503
    2015-01-13 10:00:00  28.000000  2.239399

DF2

                             id  boat_id                time  state
    time
    2015-01-18 16:09:03  319437       28 2015-01-18 16:09:03      2
    2015-01-18 16:18:43  319451       28 2015-01-18 16:18:43      0
    2015-03-01 09:39:51  507108       31 2015-03-01 09:39:51      1
    2015-03-01 09:40:58  507109       31 2015-03-01 09:40:58      0

Desired result

                             id  boat_id                time  state     speed
    time
    2015-01-18 16:09:03  319437       28 2015-01-18 16:09:03      2       nan
    2015-01-18 16:18:43  319451       28 2015-01-18 16:18:43      0       nan
    2015-03-01 09:39:51  507108       31 2015-03-01 09:39:51      1  2.239399
    2015-03-01 09:40:58  507109       31 2015-03-01 09:40:58      0  2.239399

I created this script to try to do this, but I think it is failing because the DatetimeIndex of DF1 is immutable, so my timedelta request doesn't create a start point for the chunk. One thought I had was whether it would be possible to copy the DatetimeIndex of DF1 into a new column where the objects are mutable, but I haven't managed it yet, so I am not 100% sure of the logic. I'm happy to tinker, but I've been stalled for a while, so I was hoping someone else might have a few ideas.
Thanks in advance.

    for row in DF1.iterrows():
        for dfrow in DF2.iterrows():
            if dfrow[0] > row[0] - dt.timedelta(minutes=30) and dfrow[0] < row[0]:
                df['test'] = row[1]

Solution

Iterating over rows is very slow. A vectorized solution is better. I use the function merge twice (see the docs).

Input:

    print(df1)
                         boat_id     speed
    time
    2015-03-01 09:00:00       28  0.000000
    2015-03-01 09:30:00       28  0.723503
    2015-03-01 10:00:00       28  2.239399

    print(df2)
                             id  boat_id                time  state
    time
    2015-01-18 16:09:03  319437       28 2015-01-18 16:09:03      2
    2015-01-18 16:18:43  319451       28 2015-01-18 16:18:43      0
    2015-03-01 09:39:51  507108       31 2015-03-01 09:39:51      1
    2015-03-01 09:40:58  507109       31 2015-03-01 09:40:58      0

I reset the index of both data frames and create a helper column i filled with 1.

    # df1 keeps its index as a new 'time' column; df2 already has one, so its index is dropped
    df1 = df1.reset_index()
    df2 = df2.reset_index(drop=True)
    df1['i'] = df2['i'] = 1

    print(df1)
                     time  boat_id     speed  i
    0 2015-03-01 09:00:00       28  0.000000  1
    1 2015-03-01 09:30:00       28  0.723503  1
    2 2015-03-01 10:00:00       28  2.239399  1

    print(df2)
           id  boat_id                time  state  i
    0  319437       28 2015-01-18 16:09:03      2  1
    1  319451       28 2015-01-18 16:18:43      0  1
    2  507108       31 2015-03-01 09:39:51      1  1
    3  507109       31 2015-03-01 09:40:58      0  1

Then I merge both data frames on the helper column i. Because every row has i equal to 1, this pairs each row of df2 with each row of df1.

    df = df2.merge(df1, on='i', how='left')
    df = df.rename(columns={'time_y': 'Bin_time', 'time_x': 'time'})

    print(df)
            id  boat_id_x                time  state  i            Bin_time  \
    0   319437         28 2015-01-18 16:09:03      2  1 2015-03-01 09:00:00
    1   319437         28 2015-01-18 16:09:03      2  1 2015-03-01 09:30:00
    2   319437         28 2015-01-18 16:09:03      2  1 2015-03-01 10:00:00
    3   319451         28 2015-01-18 16:18:43      0  1 2015-03-01 09:00:00
    4   319451         28 2015-01-18 16:18:43      0  1 2015-03-01 09:30:00
    5   319451         28 2015-01-18 16:18:43      0  1 2015-03-01 10:00:00
    6   507108         31 2015-03-01 09:39:51      1  1 2015-03-01 09:00:00
    7   507108         31 2015-03-01 09:39:51      1  1 2015-03-01 09:30:00
    8   507108         31 2015-03-01 09:39:51      1  1 2015-03-01 10:00:00
    9   507109         31 2015-03-01 09:40:58      0  1 2015-03-01 09:00:00
    10  507109         31 2015-03-01 09:40:58      0  1 2015-03-01 09:30:00
    11  507109         31 2015-03-01 09:40:58      0  1 2015-03-01 10:00:00

        boat_id_y     speed
    0          28  0.000000
    1          28  0.723503
    2          28  2.239399
    3          28  0.000000
    4          28  0.723503
    5          28  2.239399
    6          28  0.000000
    7          28  0.723503
    8          28  2.239399
    9          28  0.000000
    10         28  0.723503
    11         28  2.239399

The output is filtered by bin time, keeping only the rows whose time lies in the 30 minutes up to and including Bin_time:

    import datetime as dt  # dt is assumed to be the standard datetime module

    df = df[((df.time >= (df.Bin_time - dt.timedelta(minutes=30))) & (df.time <= df.Bin_time))]
    df = df.drop(['Bin_time', 'id', 'boat_id_x', 'boat_id_y', 'state', 'i'], axis=1)

    print(df)
                      time     speed
    8  2015-03-01 09:39:51  2.239399
    11 2015-03-01 09:40:58  2.239399

Finally, df is merged with df2 on the column time.

    df = df2.merge(df, on='time', how='left').reset_index(drop=True)
    df = df.drop(['i'], axis=1)

    print(df)
           id  boat_id                time  state     speed
    0  319437       28 2015-01-18 16:09:03      2       NaN
    1  319451       28 2015-01-18 16:18:43      0       NaN
    2  507108       31 2015-03-01 09:39:51      1  2.239399
    3  507109       31 2015-03-01 09:40:58      0  2.239399

A comparison of the vectorized and indexing approaches is given in a similar answer.
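
The merge on a constant helper column builds every (timestamp, bin) pair before filtering, so the intermediate frame grows with the product of the two row counts. As a possible alternative that is not part of the original answer, pandas.merge_asof can look up the bin for each timestamp directly. The sketch below assumes, as the answer's filter does, that each bin label is the end of its 30-minute window; the names bins, bin_time and result are made up for illustration, and the data is rebuilt from the tables in the answer.

```python
import pandas as pd

# Sample data copied from the answer; df1 is indexed by bin label.
df1 = pd.DataFrame(
    {"boat_id": [28, 28, 28], "speed": [0.000000, 0.723503, 2.239399]},
    index=pd.to_datetime(
        ["2015-03-01 09:00:00", "2015-03-01 09:30:00", "2015-03-01 10:00:00"]
    ),
)
df1.index.name = "time"

df2 = pd.DataFrame(
    {
        "id": [319437, 319451, 507108, 507109],
        "boat_id": [28, 28, 31, 31],
        "time": pd.to_datetime(
            [
                "2015-01-18 16:09:03",
                "2015-01-18 16:18:43",
                "2015-03-01 09:39:51",
                "2015-03-01 09:40:58",
            ]
        ),
        "state": [2, 0, 1, 0],
    }
)

# Keep only the bin label and the speed; 'bin_time' is a hypothetical name.
bins = df1.reset_index()[["time", "speed"]].rename(columns={"time": "bin_time"})

# merge_asof needs both frames sorted by their keys.
# direction="forward" picks the first bin label at or after each timestamp;
# tolerance leaves speed as NaN when that label is more than 30 minutes away.
result = pd.merge_asof(
    df2.sort_values("time"),
    bins.sort_values("bin_time"),
    left_on="time",
    right_on="bin_time",
    direction="forward",
    tolerance=pd.Timedelta(minutes=30),
)

print(result)
```

Like the original answer, this sketch ignores boat_id when matching. merge_asof also accepts a by argument for matching within groups, but using it here would change the result, because the sample bins all belong to boat 28 while the matching timestamps belong to boat 31.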