How to pass a binary file as stdin to a Docker-containerized Python script using argparse?

Problem description

Update based on Anthony Sottile's Answer

I re-implemented his solution to simplify the problem. Let's take Docker and Django out of the equation. The goal is to use pandas to read an Excel file by both of the following methods:


  1. python example.py - < /path/to/file.xlsx
  2. cat /path/to/file.xlsx | python example.py -

where example.py is reproduced below:

import argparse
import contextlib
from typing import IO
import sys
import pandas as pd


@contextlib.contextmanager
def file_ctx(filename: str) -> IO[bytes]:
    if filename == '-':
        yield sys.stdin.buffer
    else:
        with open(filename, 'rb') as f:
            yield f


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('FILE')
    args = parser.parse_args()

    with file_ctx(args.FILE) as input_file:
        print(input_file.read())
        df = pd.read_excel(input_file)
        print(df)


if __name__ == "__main__":
    main()

The problem is that pandas does not accept method 2 (see the traceback below), although it works fine with method 1.

Simply printing the text representation of the Excel file, on the other hand, works with both 1 and 2.

If you want to easily reproduce the Docker environment:

First, build a Docker image named pandas:

docker build --pull -t pandas - <<EOF
FROM python:latest
RUN pip install pandas xlrd
EOF

Then use the pandas Docker image to run:

docker run --rm -i -v /path/to/example.py:/example.py pandas python example.py - < /path/to/file.xlsx

Note that it is able to print out a plaintext representation of the Excel file correctly, but pandas is unable to read it.

This produces a more concise traceback, similar to the one below:

Traceback (most recent call last):
  File "example.py", line 29, in <module>
    main()
  File "example.py", line 24, in main
    df = pd.read_excel(input_file)
  File "/usr/local/lib/python3.8/site-packages/pandas/util/_decorators.py", line 208, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/pandas/io/excel/_base.py", line 310, in read_excel
    io = ExcelFile(io, engine=engine)
  File "/usr/local/lib/python3.8/site-packages/pandas/io/excel/_base.py", line 819, in __init__
    self._reader = self._engines[engine](self._io)
  File "/usr/local/lib/python3.8/site-packages/pandas/io/excel/_xlrd.py", line 21, in __init__
    super().__init__(filepath_or_buffer)
  File "/usr/local/lib/python3.8/site-packages/pandas/io/excel/_base.py", line 356, in __init__
    filepath_or_buffer.seek(0)
io.UnsupportedOperation: File or stream is not seekable.
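
The traceback ends in filepath_or_buffer.seek(0), which suggests the difference between the two invocations is whether stdin can seek. As a minimal sketch (assuming a Linux-like system; the byte string is made-up sample data, not a real workbook), a regular file stands in for stdin redirected with `<` and an OS pipe stands in for stdin fed by `cat ... |`. Inside `docker run -i`, stdin is presumably forwarded through a stream rather than handed over as the original file, which would explain why the redirect also fails there.

```python
import io
import os
import tempfile

SAMPLE = b'PK\x03\x04 fake xlsx bytes'  # made-up stand-in for workbook bytes

# Case 1: stdin redirected from a file (`script - < file`) is a regular file.
with tempfile.TemporaryFile() as regular_file:
    regular_file.write(SAMPLE)
    regular_file_seekable = regular_file.seekable()

# Case 2: stdin fed by a pipe (`cat file | script -`) is the read end of a pipe.
read_fd, write_fd = os.pipe()
os.write(write_fd, SAMPLE)
os.close(write_fd)
with os.fdopen(read_fd, 'rb') as pipe_reader:
    pipe_seekable = pipe_reader.seekable()
    # Copying the bytes into memory gives a stream that supports seek().
    buffered = io.BytesIO(pipe_reader.read())

print(regular_file_seekable)
print(pipe_seekable)
print(buffered.seekable())
```

On Linux the regular file reports that it is seekable and the pipe reports that it is not, while the in-memory copy can seek again.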

To show the code working when the Excel file is mounted into the container (i.e. not passed via stdin):

docker run --rm -i -v /path/to/example.py:/example.py -v /path/to/file.xlsx:/file.xlsx pandas python example.py file.xlsx

Original problem description (for additional context)

Take the scenario where, on the host system, you have a file at /tmp/test.txt and you want to use head on it, but within a Docker container (run echo 'Hello World!' > /tmp/test.txt to reproduce my example data):

You can run the following to print the first line to the screen:

docker run -i busybox head -1 - < /tmp/test.txt

OR

cat /tmp/test.txt | docker run -i busybox head -1 -

The output is:

Hello World!

Even with a binary format like .xlsx instead of plaintext, the above can be done, and you would get some weird output similar to:

�Oxl/_rels/workbook.xml.rels���j�0
                                  ��}

The point above is that head works with both binary and text formats, even through the abstraction of Docker.

But in my own argparse-based CLI (actually a custom Django management command, which I believe makes use of argparse), I get the following error when attempting to use pandas' read_excel within a Docker context:

Traceback (most recent call last):
  File "./manage.py", line 15, in <module>
    execute_from_command_line(sys.argv)
  File "/opt/conda/lib/python3.7/site-packages/django/core/management/__init__.py", line 381, in execute_from_command_line
    utility.execute()
  File "/opt/conda/lib/python3.7/site-packages/django/core/management/__init__.py", line 375, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/opt/conda/lib/python3.7/site-packages/django/core/management/base.py", line 323, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/opt/conda/lib/python3.7/site-packages/django/core/management/base.py", line 364, in execute
    output = self.handle(*args, **options)
  File "/home/jovyan/sequence_databaseApp/management/commands/seq_db.py", line 54, in handle
    df_snapshot = pd.read_excel(options['FILE'].buffer, sheet_name='Snapshot', header=0, dtype=dtype)
  File "/opt/conda/lib/python3.7/site-packages/pandas/util/_decorators.py", line 208, in wrapper
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/pandas/io/excel/_base.py", line 310, in read_excel
    io = ExcelFile(io, engine=engine)
  File "/opt/conda/lib/python3.7/site-packages/pandas/io/excel/_base.py", line 819, in __init__
    self._reader = self._engines[engine](self._io)
  File "/opt/conda/lib/python3.7/site-packages/pandas/io/excel/_xlrd.py", line 21, in __init__
    super().__init__(filepath_or_buffer)
  File "/opt/conda/lib/python3.7/site-packages/pandas/io/excel/_base.py", line 356, in __init__
    filepath_or_buffer.seek(0)
io.UnsupportedOperation: File or stream is not seekable.

Specifically:

docker run -i <IMAGE> ./manage.py my_cli import - < /path/to/file.xlsx does not work

./manage.py my_cli import - < /path/to/file.xlsx does work!

Somehow there is a difference within the Docker context.

However, I also note that, even taking Docker out of the equation:

cat /path/to/file.xlsx | ./manage.py my_cli import - does not work

whereas:

./manage.py my_cli import - < /path/to/file.xlsx does work (as mentioned before)

Finally, the code I am using (you should be able to save it as my_cli.py under management/commands to get it working within a Django project):

import argparse
import sys

from django.core.management.base import BaseCommand


class Command(BaseCommand):
    help = 'my_cli help'

    def add_arguments(self, parser):
        subparsers = parser.add_subparsers(
            title='commands', dest='command', help='command help')
        subparsers.required = True
        parser_import = subparsers.add_parser('import', help='import help')
        parser_import.add_argument('FILE', type=argparse.FileType('r'), default=sys.stdin)

    def handle(self, *args, **options):
        import pandas as pd
        df = pd.read_excel(options['FILE'].buffer, header=0)
        print(df)


Recommended answer

It looks as though you're reading the file in text mode (FileType('r') / sys.stdin).

According to this bpo issue, argparse does not support opening binary files directly.

I'd suggest handling the file type yourself with code similar to this (I'm not familiar with the Django / pandas way, so I've simplified it down to just plain Python):

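import argparse
import contextlib
import io
import sys
from typing import IO, Iterator


@contextlib.contextmanager
def file_ctx(filename: str) -> Iterator[IO[bytes]]:
    if filename == '-':
        # Piped stdin is not seekable, so copy it into an in-memory,
        # seekable buffer first.
        yield io.BytesIO(sys.stdin.buffer.read())
    else:
        with open(filename, 'rb') as f:
            yield f


def main() -> int:
    parser = argparse.ArgumentParser()
    parser.add_argument('FILE')
    args = parser.parse_args()

    with file_ctx(args.FILE) as input_file:
        # do whatever you need with that input file; reading it here is
        # only a placeholder
        data = input_file.read()
    print(f'read {len(data)} bytes')
    return 0


if __name__ == '__main__':
    raise SystemExit(main())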
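
The approach above always copies stdin into memory. As a hedged variation (the helper name ensure_seekable is hypothetical, not part of argparse, pandas, or the answer), the copy could be made only when the stream cannot seek, so that a file redirected with `<` is used directly. The sketch below simulates piped stdin with an OS pipe instead of reading the real sys.stdin.buffer.

```python
import io
import os
import tempfile
from typing import BinaryIO


def ensure_seekable(stream: BinaryIO) -> BinaryIO:
    """Return `stream` itself if it can seek, else an in-memory copy of its bytes."""
    if stream.seekable():
        return stream
    return io.BytesIO(stream.read())


# Simulate piped stdin with an OS pipe (stand-in for sys.stdin.buffer).
read_fd, write_fd = os.pipe()
os.write(write_fd, b'example bytes')
os.close(write_fd)
with os.fdopen(read_fd, 'rb') as piped:
    from_pipe = ensure_seekable(piped)

from_pipe.seek(0)  # the raw pipe would raise io.UnsupportedOperation here
pipe_data = from_pipe.read()

# A regular file is already seekable, so it is returned unchanged.
with tempfile.TemporaryFile() as regular:
    same_object = ensure_seekable(regular) is regular
```

In file_ctx, the '-' branch could then yield ensure_seekable(sys.stdin.buffer). The trade-off is the same as in the answer: a non-seekable input is held entirely in memory, which is fine for typical spreadsheets but may not suit very large files.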

