Question

Here's an array of datetime values:

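import numpy as np
import pandas as pd

# ISO 8601 strings with a one-digit fractional second and a +10:00 offset
array = np.array(['2016-05-01T00:00:59.3+10:00', '2016-05-01T00:02:59.4+10:00',
                  '2016-05-01T00:03:59.4+10:00', '2016-05-01T00:13:00.1+10:00',
                  '2016-05-01T00:22:00.5+10:00', '2016-05-01T00:31:01.1+10:00'],
        dtype=object)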

pd.to_datetime is very good at inferring datetime formats.

array = pd.to_datetime(array)

print(array)
DatetimeIndex(['2016-04-30 14:00:59.300000', '2016-04-30 14:02:59.400000',
               '2016-04-30 14:03:59.400000', '2016-04-30 14:13:00.100000',
               '2016-04-30 14:22:00.500000', '2016-04-30 14:31:01.100000'],
              dtype='datetime64[ns]', freq=None)

How can I dynamically figure out what datetime format pd.to_datetime inferred? Something like: %Y-%m-%dT... (sorry, my datetime foo is really bad).

Recommended answer

I don't think it's possible to do this in full generality in pandas.

As mentioned in other comments and answers, the internal function _guess_datetime_format is close to being what you ask for, but it has strict criteria for what constitutes a guessable format and so it will only work for a restricted class of datetime strings.

These criteria are set out in the source of the _guess_datetime_format function, and you can also see some examples of good and bad formats in the test_parsing script.

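Some of the main points are:

  • The year, month and day must each be present and identifiable
  • The year must have four digits
  • If microseconds are used, exactly six digits must be given
  • You can't specify a timezone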

This means that it will fail to guess the format for datetime strings in the question despite them being a valid ISO 8601 format:

>>> from pandas.core.tools.datetimes import _guess_datetime_format_for_array
>>> array = np.array(['2016-05-01T00:00:59.3+10:00'])
>>> _guess_datetime_format_for_array(array)
# returns None

In this case, dropping the timezone and padding the microseconds to six digits is enough to make pandas recognise the format:

>>> array = np.array(['2016-05-01T00:00:59.300000']) # six digits, no tz
>>> _guess_datetime_format_for_array(array)
'%Y-%m-%dT%H:%M:%S.%f'

This is probably as good as it gets.
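The two adjustments described above (drop the timezone, pad the fractional seconds to six digits) can be sketched in plain Python. This is only an illustration: normalise_for_guessing and its regular expression are hypothetical helpers, not part of pandas, and discarding the UTC offset loses information, so the result is only suitable for format guessing, not for the actual conversion.

```python
import re
from datetime import datetime

# Hypothetical pattern: date, 'T', time, optional fraction, optional UTC offset.
_ISO_LIKE = re.compile(
    r"^(?P<base>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})"
    r"(?:\.(?P<frac>\d{1,6}))?"
    r"(?P<tz>Z|[+-]\d{2}(?::?\d{2})?)?$"
)


def normalise_for_guessing(value):
    """Drop the UTC offset and right-pad the fraction to six digits."""
    match = _ISO_LIKE.match(value)
    if match is None:
        raise ValueError(f"not an ISO-like datetime string: {value!r}")
    # A missing fraction becomes '000000' so the '%f' directive still applies.
    frac = (match.group("frac") or "").ljust(6, "0")
    return f"{match.group('base')}.{frac}"


cleaned = normalise_for_guessing("2016-05-01T00:00:59.3+10:00")
print(cleaned)  # 2016-05-01T00:00:59.300000

# The cleaned string matches the format shown above.
datetime.strptime(cleaned, "%Y-%m-%dT%H:%M:%S.%f")
```

The cleaned strings are in the shape that the guessing function accepted in the example above.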

If pd.to_datetime is not asked to infer the format of the array, or given a format string to try, it will just try and parse each string separately and hope that it is successful. Crucially, it does not need to infer a format in advance to do this.

First, pandas parses the string assuming it is (approximately) in an ISO 8601 format. This begins with a call to _string_to_dts and ultimately hits the low-level parse_iso_8601_datetime function, which does the hard work.

You can check if your string is able to be parsed in this way using the _test_parse_iso8601 function. For example:

from pandas._libs.tslib import _test_parse_iso8601

def is_iso8601(string):
    try:
        _test_parse_iso8601(string)
        return True
    except ValueError:
        return False

The dates in the array you give are recognised as this format:

>>> is_iso8601('2016-05-01T00:00:59.3+10:00')
True

But this doesn't deliver what the question asks for and I don't see any realistic way to recover the exact format that is recognised by the parse_iso_8601_datetime function.
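If an explicit format string is what you need, one workaround that sidesteps pandas' internals is to test a short list of candidate formats yourself and keep the first one that parses every string. The sketch below is not how pandas works and is not part of the original answer; find_common_format and CANDIDATE_FORMATS are hypothetical names, and the candidate list is an assumption you would adapt to your data. It relies on datetime.strptime accepting one to six digits for %f and, on Python 3.7+, offsets such as +10:00 for %z.

```python
from datetime import datetime

# Assumed candidate list, ordered from most to least specific.
CANDIDATE_FORMATS = (
    "%Y-%m-%dT%H:%M:%S.%f%z",
    "%Y-%m-%dT%H:%M:%S%z",
    "%Y-%m-%dT%H:%M:%S.%f",
    "%Y-%m-%dT%H:%M:%S",
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%d",
)


def find_common_format(values, candidates=CANDIDATE_FORMATS):
    """Return the first candidate format that parses every value, else None."""
    values = list(values)
    if not values:
        return None
    for fmt in candidates:
        try:
            for value in values:
                datetime.strptime(value, fmt)
        except ValueError:
            continue  # at least one value does not match this format
        return fmt
    return None


strings = ["2016-05-01T00:00:59.3+10:00", "2016-05-01T00:02:59.4+10:00"]
print(find_common_format(strings))  # %Y-%m-%dT%H:%M:%S.%f%z
```

The returned string can then be passed as the format argument of pd.to_datetime. Nothing outside the candidate list will ever be found, which is the price of not relying on private functions.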

If parsing the string as an ISO 8601 format fails, pandas falls back to using the parse() function from the third-party dateutil library (called by parse_datetime_string). This allows a fantastic level of parsing flexibility but, again, I don't know of any good way to extract the recognised datetime format from this function.

If both of these parsers fail, pandas either raises an error, ignores the string or defaults to NaT (depending on what the user specifies). No further attempt is made to parse the string or guess its format.
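The control flow described above (a strict parser first, a lenient parser second, then an error policy) can be sketched as follows. This mirrors only the shape of the logic, not pandas' implementation: parse_strict, parse_lenient and parse_one are hypothetical stand-ins built on the standard library, None stands in for NaT, and the errors values simply echo the three outcomes mentioned in the text.

```python
from datetime import datetime

NAT = None  # hypothetical stand-in for pandas' NaT


def parse_strict(value):
    """Stand-in for the ISO 8601 fast path."""
    return datetime.fromisoformat(value)


def parse_lenient(value):
    """Stand-in for the flexible fallback parser; tries a few assumed formats."""
    for fmt in ("%d/%m/%Y", "%d.%m.%Y"):
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"could not parse {value!r}")


def parse_one(value, errors="raise"):
    """Try each parser in turn; no format is inferred at any point."""
    for parser in (parse_strict, parse_lenient):
        try:
            return parser(value)
        except ValueError:
            continue
    if errors == "coerce":
        return NAT      # give up and mark the value as missing
    if errors == "ignore":
        return value    # hand the input back unchanged
    raise ValueError(f"unable to parse {value!r}")


print(parse_one("2016-05-01T00:00:59"))          # handled by the strict parser
print(parse_one("01/05/2016"))                   # handled by the fallback
print(parse_one("not a date", errors="coerce"))  # None
```

Each string is handled on its own, which is why a single format for the whole array never needs to exist.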
