With some help from the community I have managed to get to the below function (previous question on building the function). I am trying to work out how to get the resampled dates to run up to the latest date that appears anywhere in either of the input data sets for any code. Below I have included the current output I am getting and my desired output.

Input data:

Input 1, df1 (In):

        date        code  qty
    0   2019-01-10  A      20
    1   2019-01-10  B      12
    2   2019-01-10  C      10
    3   2019-01-11  A       2
    4   2019-01-11  B      30
    5   2019-01-11  C       2
    7   2019-01-12  A       4
    8   2019-01-12  B       6
    11  2019-01-13  A      10
    12  2019-01-13  B      12
    13  2019-01-13  C       1

Input 2, df2 (Outbound):

        date        code  qty
    0   2019-01-11  A       5
    1   2019-01-11  B       1
    2   2019-01-11  C       3
    3   2019-01-12  A     100
    6   2019-01-13  B       1
    7   2019-01-13  C       1
    8   2019-01-15  A       1
    9   2019-01-16  B       1

Existing code:

    import numpy as np
    from numba import njit

    @njit
    def poscumsum(x):
        # Running sum that is reset to zero whenever it would go negative.
        total = 0
        result = np.empty(x.shape)
        for i, y in enumerate(x):
            total += y
            if total < 0:
                total = 0
            result[i] = total
        return result

    a = df1.set_index(['code', 'date'])
    b = df2.set_index(['code', 'date'])
    idx = a.index.union(b.index).sort_values()
    df3 = a.reindex(idx, fill_value=0) - b.reindex(idx, fill_value=0)
    df3 = df3.groupby('code').resample('D', level='date').sum()
    df3['qty'] = df3.groupby('code')['qty'].transform(
        lambda g: poscumsum(g.values))
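To make the snippet above runnable end to end, here is a minimal sketch that rebuilds df1 and df2 from the tables shown earlier; the literal constructor calls and the use of pd.to_datetime are assumptions, not part of the original question.

    import pandas as pd

    # Hypothetical reconstruction of the sample inputs listed above.
    df1 = pd.DataFrame({
        'date': pd.to_datetime(['2019-01-10', '2019-01-10', '2019-01-10',
                                '2019-01-11', '2019-01-11', '2019-01-11',
                                '2019-01-12', '2019-01-12',
                                '2019-01-13', '2019-01-13', '2019-01-13']),
        'code': ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'A', 'B', 'C'],
        'qty':  [20, 12, 10, 2, 30, 2, 4, 6, 10, 12, 1],
    })
    df2 = pd.DataFrame({
        'date': pd.to_datetime(['2019-01-11', '2019-01-11', '2019-01-11',
                                '2019-01-12', '2019-01-13', '2019-01-13',
                                '2019-01-15', '2019-01-16']),
        'code': ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B'],
        'qty':  [5, 1, 3, 100, 1, 1, 1, 1],
    })

The dates need to be real datetimes for the daily resample to work, which is why pd.to_datetime is used here.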
Current output (each code is only represented for dates on which it appears in the In or Out dfs):

        code  date        qty
    0   A     2019-01-10   20
    1   A     2019-01-11   17
    2   A     2019-01-12    0
    3   A     2019-01-13   10
    4   A     2019-01-14   10
    5   A     2019-01-15    9
    6   B     2019-01-10   12
    7   B     2019-01-11   41
    8   B     2019-01-12   47
    9   B     2019-01-13   58
    10  B     2019-01-14   58
    11  B     2019-01-15   58
    12  B     2019-01-16   57
    13  C     2019-01-10   10
    14  C     2019-01-11    9
    15  C     2019-01-12    9
    16  C     2019-01-13    9

Desired output (each code is represented for every date between 2019-01-10 and 2019-01-16):

        code  date        qty
    0   A     2019-01-10   20
    1   A     2019-01-11   17
    2   A     2019-01-12    0
    3   A     2019-01-13   10
    4   A     2019-01-14   10
    5   A     2019-01-15    9
    6   A     2019-01-16    9
    7   B     2019-01-10   12
    8   B     2019-01-11   41
    9   B     2019-01-12   47
    10  B     2019-01-13   58
    11  B     2019-01-14   58
    12  B     2019-01-15   58
    13  B     2019-01-16   57
    14  C     2019-01-10   10
    15  C     2019-01-11    9
    16  C     2019-01-12    9
    17  C     2019-01-13    9
    18  C     2019-01-14    9
    19  C     2019-01-15    9
    20  C     2019-01-16    9

Solution

Ok, here is a 2D version of poscumsum, generalized to cap the running sum at a minimum and/or maximum:

    @njit
    def cumsum_capped_2d(x, xmin=None, xmax=None):
        # Column-wise running sum of a 2D array, optionally clamped to [xmin, xmax].
        n, m = x.shape
        result = np.empty_like(x)
        if n == 0:
            return result
        total = np.zeros_like(x[0])
        for i in range(n):
            total += x[i]
            if xmin is not None:
                total[total < xmin] = xmin
            if xmax is not None:
                total[total > xmax] = xmax
            result[i] = total
        return result

And here is how to use it, now that you want all dates spanning the same period. The good news is that there is no more groupby, so it is faster than ever:

    a = df1.pivot(index='date', columns='code', values='qty')
    b = df2.pivot(index='date', columns='code', values='qty')
    idx = a.index.union(b.index).sort_values()
    df3 = (a.reindex(idx, fill_value=0) - b.reindex(idx, fill_value=0)).resample('D').sum()
    df3.values[:, :] = cumsum_capped_2d(df3.values, xmin=0)

Or, in two (convoluted) lines:

    df3 = (df1.set_index(['date', 'code']).subtract(df2.set_index(['date', 'code']), fill_value=0)
           .unstack('code', fill_value=0).resample('D').sum())
    df3.values[:, :] = cumsum_capped_2d(df3.values, xmin=0)

On your data:

    >>> df3
    code           A     B     C
    date
    2019-01-10  20.0  12.0  10.0
    2019-01-11  17.0  41.0   9.0
    2019-01-12   0.0  47.0   9.0
    2019-01-13  10.0  58.0   9.0
    2019-01-14  10.0  58.0   9.0
    2019-01-15   9.0  58.0   9.0
    2019-01-16   9.0  57.0   9.0

Of course, you are free to stack the result back into a skinny df, re-order it, drop the index, etc. For example, to match your desired output:

    >>> df3.stack().swaplevel(0, 1).sort_index().reset_index()
        code  date         qty
    0   A     2019-01-10  20.0
    1   A     2019-01-11  17.0
    2   A     2019-01-12   0.0
    3   A     2019-01-13  10.0
    4   A     2019-01-14  10.0
    5   A     2019-01-15   9.0
    6   A     2019-01-16   9.0
    7   B     2019-01-10  12.0
    8   B     2019-01-11  41.0
    9   B     2019-01-12  47.0
    10  B     2019-01-13  58.0
    11  B     2019-01-14  58.0
    12  B     2019-01-15  58.0
    13  B     2019-01-16  57.0
    14  C     2019-01-10  10.0
    15  C     2019-01-11   9.0
    16  C     2019-01-12   9.0
    17  C     2019-01-13   9.0
    18  C     2019-01-14   9.0
    19  C     2019-01-15   9.0
    20  C     2019-01-16   9.0
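If numba is not available, the same capped running sum can be written with plain NumPy row iteration. This fallback is not part of the original answer; the function name cumsum_capped_2d_py is hypothetical, and it will be noticeably slower on large frames.

    import numpy as np

    def cumsum_capped_2d_py(x, xmin=None, xmax=None):
        # Pure-NumPy stand-in for cumsum_capped_2d: column-wise running sum,
        # optionally clamped below at xmin and above at xmax.
        result = np.empty_like(x)
        total = np.zeros(x.shape[1], dtype=x.dtype)
        for i in range(x.shape[0]):
            total = total + x[i]
            if xmin is not None:
                total = np.maximum(total, xmin)
            if xmax is not None:
                total = np.minimum(total, xmax)
            result[i] = total
        return result

    # Usage mirrors the numba version:
    # df3.values[:, :] = cumsum_capped_2d_py(df3.values, xmin=0)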