java - How to unzip a large folder with multiple threads in Java (preferably Java 8)?

With reference to:
http://www.pixeldonor.com/2013/oct/12/concurrent-zip-compression-java-nio/

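I am trying to unzip a 5 GB zip file. It takes about 30 minutes on average, which is far too long for our application, so I am trying to reduce that time.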

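I have tried many combinations: changing the buffer size (my write chunk is 4096 bytes by default), switching NIO methods, and switching libraries. The results are all nearly the same.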

The one thing I have not tried yet is splitting the zipped file into chunks and reading those chunks in multiple threads.

The code snippet is:

private static ExecutorService e = Executors.newFixedThreadPool(20);

public static void main(String[] argv) {
    try {
        String selectedZipFile = "/Users/xx/Documents/test123/large.zip";
        String selectedDirectory = "/Users/xx/Documents/test2";
        long st = System.currentTimeMillis();

        unzip(selectedDirectory, selectedZipFile);

        System.out.println(System.currentTimeMillis() - st);
    } catch (Exception e) {
        e.printStackTrace();
    }
}

public static void unzip(String targetDir, String zipFilename) {
    ZipInputStream archive;
    try {
        List<ZipEntry> list = new ArrayList<>();
        archive = new ZipInputStream(new BufferedInputStream(new FileInputStream(zipFilename)));
        ZipEntry entry;
        while ((entry = archive.getNextEntry()) != null) {
            list.add(entry);
        }

        for (List<ZipEntry> partition : Lists.partition(list, 1000)) {
            e.submit(new Multi(targetDir, partition, archive));
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}

The Runnable is:

  static class Multi implements Runnable {

    private List<ZipEntry> partition;
    private ZipInputStream zipInputStream;
    private String targetDir;

    public Multi(String targetDir, List<ZipEntry> partition, ZipInputStream zipInputStream) {
        this.partition = partition;
        this.zipInputStream = zipInputStream;
        this.targetDir = targetDir;
    }

    @Override
    public void run() {
        for (ZipEntry entry : partition) {
            File entryDestination = new File(targetDir, entry.getName());
            if (entry.isDirectory()) {
                entryDestination.mkdirs();
            } else {
                entryDestination.getParentFile().mkdirs();

                BufferedOutputStream output = null;
                try {
                    int n;
                    byte buf[] = new byte[BUFSIZE];
                    output = new BufferedOutputStream(new FileOutputStream(entryDestination), BUFSIZE);
                    while ((n = zipInputStream.read(buf, 0, BUFSIZE)) != -1) {
                        output.write(buf, 0, n);
                    }
                    output.flush();


                } catch (FileNotFoundException e1) {
                    e1.printStackTrace();
                } catch (IOException e1) {
                    e1.printStackTrace();
                } finally {

                    try {
                        output.close();
                    } catch (IOException e1) {
                        e1.printStackTrace();
                    }

                }
            }
        }
    }
}

But for some reason it only creates the directories, without any file contents...

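My question is: following the approach of the "compression" article linked above, what is the right way to split a large zip file into chunks and process them with multiple threads?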

Best answer

ZipInputStream is a single stream of data; it cannot be split.
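This is also the likely reason the question's code ends up with no file contents: listing every entry with getNextEntry() consumes the whole stream, so nothing is left to read by the time the worker threads run. A minimal sketch of that effect follows; the class name, the entry name hello.txt, and the in-memory zip are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ExhaustedStreamDemo {

    // Builds a tiny in-memory zip with a single entry (hypothetical name and content).
    static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("hello.txt"));
            zip.write("hello".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    // Lists all entries first, as the question's unzip() does, then tries to read data.
    static int readAfterListing(byte[] zipBytes) throws IOException {
        try (ZipInputStream archive = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            while (archive.getNextEntry() != null) {
                // each call skips past the previous entry's data
            }
            // No current entry remains, so read() reports end of stream.
            return archive.read(new byte[4096], 0, 4096);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readAfterListing(sampleZip())); // prints -1
    }
}
```

The ZipEntry objects collected in the list only carry metadata; the entry data has to be read while that entry is the current one in the stream.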

If you want multi-threaded unzipping, you need to use ZipFile. With Java 8, you even get the multi-threading for free.

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public static void unzip(String targetDir, String zipFilename) {
    Path targetDirPath = Paths.get(targetDir);
    try (ZipFile zipFile = new ZipFile(zipFilename)) {
        zipFile.stream()
               .parallel() // enable multi-threading
               .forEach(e -> unzipEntry(zipFile, e, targetDirPath));
    } catch (IOException e) {
        throw new RuntimeException("Error opening zip file '" + zipFilename + "': " + e, e);
    }
}

private static void unzipEntry(ZipFile zipFile, ZipEntry entry, Path targetDir) {
    try {
        Path targetPath = targetDir.resolve(Paths.get(entry.getName()));
        // Ask the entry itself: the target may not exist on disk yet, so checking
        // the file system would misclassify directory entries as files.
        if (entry.isDirectory()) {
            Files.createDirectories(targetPath);
        } else {
            Files.createDirectories(targetPath.getParent());
            // Each call returns an independent stream for this one entry.
            try (InputStream in = zipFile.getInputStream(entry)) {
                Files.copy(in, targetPath, StandardCopyOption.REPLACE_EXISTING);
            }
        }
    } catch (IOException e) {
        throw new RuntimeException("Error processing zip entry '" + entry.getName() + "': " + e, e);
    }
}
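If you would rather keep an explicit thread pool, as the question does, the same ZipFile idea can be sketched with an ExecutorService. This is only an illustration under assumptions: the class name PooledUnzip, the threads parameter, and the path check against entries that point outside the target directory are additions of this sketch and do not come from the question.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

class PooledUnzip {

    // Extracts every entry of zipPath into targetDir using a fixed-size pool.
    static void unzip(Path zipPath, Path targetDir, int threads)
            throws IOException, InterruptedException, ExecutionException {
        Path root = targetDir.toAbsolutePath().normalize();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try (ZipFile zipFile = new ZipFile(zipPath.toFile())) {
            List<Future<?>> pending = new ArrayList<>();
            for (ZipEntry entry : Collections.list(zipFile.entries())) {
                // One task per entry; a Callable so IOException can propagate.
                pending.add(pool.submit(() -> {
                    extract(zipFile, entry, root);
                    return null;
                }));
            }
            // Wait for all tasks before the ZipFile is closed; a failed task
            // surfaces here as an ExecutionException.
            for (Future<?> task : pending) {
                task.get();
            }
        } finally {
            pool.shutdown();
        }
    }

    private static void extract(ZipFile zipFile, ZipEntry entry, Path root) throws IOException {
        Path target = root.resolve(entry.getName()).normalize();
        // Assumption of this sketch: reject entries such as "../x" that escape the target.
        if (!target.startsWith(root)) {
            throw new IOException("Entry escapes target directory: " + entry.getName());
        }
        if (entry.isDirectory()) {
            Files.createDirectories(target);
            return;
        }
        Files.createDirectories(target.getParent());
        try (InputStream in = zipFile.getInputStream(entry)) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

A call might look like PooledUnzip.unzip(Paths.get(selectedZipFile), Paths.get(selectedDirectory), 8). Whether more threads actually help depends on whether the job is limited by decompression or by disk throughput, so the pool size is worth measuring rather than guessing.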

You may also want to check out this answer, which uses FileSystem to access the zip file contents, for a true Java 8 experience.
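As a rough idea of what the FileSystem approach can look like, here is a minimal sketch. It is not the code from the linked answer: the class name ZipFsUnzip is made up, the walk is deliberately sequential, and the check against paths escaping the target directory is an addition of this sketch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

class ZipFsUnzip {

    // Mounts the zip as a file system and copies its tree into targetDir.
    static void unzip(Path zipPath, Path targetDir) throws IOException {
        Path root = targetDir.toAbsolutePath().normalize();
        // The ClassLoader cast keeps the call unambiguous on newer JDKs as well.
        try (FileSystem zipFs = FileSystems.newFileSystem(zipPath, (ClassLoader) null)) {
            Path zipRoot = zipFs.getPath("/");
            try (Stream<Path> entries = Files.walk(zipRoot)) {
                entries.forEach(source -> copy(zipRoot, source, root));
            }
        }
    }

    private static void copy(Path zipRoot, Path source, Path root) {
        try {
            // Go through a String: a Path from the zip file system cannot be
            // resolved directly against a Path from the default file system.
            Path target = root.resolve(zipRoot.relativize(source).toString()).normalize();
            if (!target.startsWith(root)) {
                throw new IOException("Entry escapes target directory: " + source);
            }
            if (Files.isDirectory(source)) {
                Files.createDirectories(target);
            } else {
                Files.createDirectories(target.getParent());
                Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this approach the archive is handled through the same Files and Path calls as any other directory tree, which is the main appeal.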