本文介绍了在FTP服务器上扩展Python的os.walk函数的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

如何使 os.walk 遍历FTP数据库(位于远程服务器上)的目录树?现在代码的结构方式是(提供注释):

How can I make os.walk traverse the directory tree of an FTP database (located on a remote server)? The way the code is structured now is (comments provided):

import fnmatch, os, ftplib

def find(pattern, startdir=os.curdir): #find function taking variables for both desired file and the starting directory
    for (thisDir, subsHere, filesHere) in os.walk(startdir): #each of the variables change as the directory tree is walked
        for name in subsHere + filesHere: #going through all of the files and subdirectories
            if fnmatch.fnmatch(name, pattern): #if the name of one of the files or subs is the same as the inputted name
                fullpath = os.path.join(thisDir, name) #fullpath equals the concatenation of the directory and the name
                yield fullpath #return fullpath but anew each time

def findlist(pattern, startdir = os.curdir, dosort=False):
    matches = list(find(pattern, startdir)) #find with arguments pattern and startdir put into a list data structure
    if dosort: matches.sort() #isn't dosort automatically False? Is this statement any different from the same thing but with a line in between
    return matches

#def ftp(
#specifying where to search.

if __name__ == '__main__':
    import sys
    namepattern, startdir = sys.argv[1], sys.argv[2]
    for name in find(namepattern, startdir): print (name)

我认为我需要定义一个新函数(即 def ftp())将此功能添加到上面的代码中,但是,恐怕 os.walk 函数会,默认情况下,仅遍历运行代码的计算机的目录树。

I am thinking that I need to define a new function (i.e., def ftp()) to add this functionality to the code above. However, I am afraid that the os.walk function will, by default, only walk the directory trees of the computer that the code is run from.

有没有一种方法可以扩展的功能os.walk 能够遍历远程目录树(通过FTP)?

Is there a way that I can extend the functionality of os.walk to be able to traverse a remote directory tree (via FTP)?

推荐答案

全部您需要利用python的 ftplib 模块,因为 os.walk()基于广度优先搜索算法你需要 在每次迭代时找到目录和文件名,然后从第一个目录递归继续遍历。大约2年前,我实现了 ,这是用于通过FTP遍历超大型目录树的最佳软件包。

All you need is utilizing the python's ftplib module. Since os.walk() is based on a Breadth-first search algorithm you need to find the directories and file names at each iteration, then continue the traversing recursively from the first directory. I implemented this algorithm about 2 years ago for using as the heart of FTPwalker, which is an optimum package for traversing extremely large directory trees Through FTP.

from os import path as ospath


class FTPWalk:
    """
    This class is contain corresponding functions for traversing the FTP
    servers using BFS algorithm.
    """
    def __init__(self, connection):
        self.connection = connection

    def listdir(self, _path):
        """
        return files and directory names within a path (directory)
        """

        file_list, dirs, nondirs = [], [], []
        try:
            self.connection.cwd(_path)
        except Exception as exp:
            print ("the current path is : ", self.connection.pwd(), exp.__str__(),_path)
            return [], []
        else:
            self.connection.retrlines('LIST', lambda x: file_list.append(x.split()))
            for info in file_list:
                ls_type, name = info[0], info[-1]
                if ls_type.startswith('d'):
                    dirs.append(name)
                else:
                    nondirs.append(name)
            return dirs, nondirs

    def walk(self, path='/'):
        """
        Walk through FTP server's directory tree, based on a BFS algorithm.
        """
        dirs, nondirs = self.listdir(path)
        yield path, dirs, nondirs
        for name in dirs:
            path = ospath.join(path, name)
            yield from self.walk(path)
            # In python2 use:
            # for path, dirs, nondirs in self.walk(path):
            #     yield path, dirs, nondirs
            self.connection.cwd('..')
            path = ospath.dirname(path)

现在使用此类,您可以使用 ftplib 模块简单地创建一个连接对象,并将该对象传递给 FTPWalk 对象,然后循环遍历 walk()函数:

Now for using this class, you can simply create a connection object using ftplib module and pass the the object to FTPWalk object and just loop over the walk() function:

In [2]: from test import FTPWalk

In [3]: import ftplib

In [4]: connection = ftplib.FTP("ftp.uniprot.org")

In [5]: connection.login()
Out[5]: '230 Login successful.'

In [6]: ftpwalk = FTPWalk(connection)

In [7]: for i in ftpwalk.walk():
            print(i)
   ...:     
('/', ['pub'], [])
('/pub', ['databases'], ['robots.txt'])
('/pub/databases', ['uniprot'], [])
('/pub/databases/uniprot', ['current_release', 'previous_releases'], ['LICENSE', 'current_release/README', 'current_release/knowledgebase/complete', 'previous_releases/', 'current_release/relnotes.txt', 'current_release/uniref'])
('/pub/databases/uniprot/current_release', ['decoy', 'knowledgebase', 'rdf', 'uniparc', 'uniref'], ['README', 'RELEASE.metalink', 'changes.html', 'news.html', 'relnotes.txt'])
...
...
...

这篇关于在FTP服务器上扩展Python的os.walk函数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

10-30 20:58