Java multi-threaded file download performance

Question

Having recently worked on a project which required some more IO interaction than I'm used to, I felt like I wanted to look past the regular libraries (Commons IO, in particular) and tackle some more in depth IO issues.

As an academic test, I decided to implement a basic, multi-threaded HTTP downloader. The idea is simple: provide a URL to download, and the code will download the file. To increase download speeds, the file is chunked and each chunk is downloaded concurrently (using the HTTP Range: bytes=x-x header) to use as much bandwidth as possible.
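To make the chunking idea concrete, here is a minimal sketch of how a file of known length might be split into inclusive byte ranges, one per Range header. The class and method names (ChunkRanges, split, headerValue) are hypothetical and not from the original code; it also assumes the content length is already known, for example from a prior HEAD request, and that the server honours range requests.

```java
import java.util.ArrayList;
import java.util.List;

final class ChunkRanges {

    /** One inclusive byte range, matching the semantics of the HTTP Range header. */
    static final class Range {
        final long start;
        final long end; // inclusive

        Range(long start, long end) {
            this.start = start;
            this.end = end;
        }

        /** Value to send in the Range request header, e.g. "bytes=0-2". */
        String headerValue() {
            return "bytes=" + start + "-" + end;
        }
    }

    /** Splits [0, contentLength) into contiguous ranges; the last one absorbs the remainder. */
    static List<Range> split(long contentLength, int chunks) {
        if (chunks <= 0 || contentLength < chunks) {
            throw new IllegalArgumentException("need at least one byte per chunk");
        }
        List<Range> ranges = new ArrayList<>();
        long base = contentLength / chunks;
        long start = 0;
        for (int i = 0; i < chunks; i++) {
            long end = (i == chunks - 1) ? contentLength - 1 : start + base - 1;
            ranges.add(new Range(start, end));
            start = end + 1;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // A 10-byte file in 3 chunks prints bytes=0-2, bytes=3-5, bytes=6-9.
        for (Range r : split(10, 3)) {
            System.out.println(r.headerValue());
        }
    }
}
```

Because the ranges are inclusive on both ends, each chunk's start is the previous chunk's end plus one, so adjacent chunks neither overlap nor leave a gap.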

I have a working prototype, but as you may have guessed, it's not exactly ideal. At the moment I manually start 3 "downloader" threads which each download 1/3 of the file. These threads use a common, synchronized "file writer" instance to actually write the files to disk. When all threads are done, the "file writer" is completed and any open streams are closed. Some snippets of code to give you an idea:

Thread startup:

ExecutorService downloadExecutor = Executors.newFixedThreadPool(3);
...
downloadExecutor.execute(new Downloader(fileWriter, download, start1, end1));
downloadExecutor.execute(new Downloader(fileWriter, download, start2, end2));
downloadExecutor.execute(new Downloader(fileWriter, download, start3, end3));

Each "downloader" thread downloads a chunk (buffered) and uses the "file writer" to write to disk:

int bytesRead = 0;
byte[] buffer = new byte[1024*1024];
InputStream inStream = entity.getContent();
long seekOffset = chunkStart;
while ((bytesRead = inStream.read(buffer)) != -1)
{
    fileWriter.write(buffer, bytesRead, seekOffset);
    seekOffset += bytesRead;
}

The "file writer" writes to disk using a RandomAccessFile to seek() and write() the chunks to disk:

public synchronized void write(byte[] bytes, int len, long start) throws IOException
{
      output.seek(start);
      output.write(bytes, 0, len);
}

All things considered, this approach seems to work. However, it doesn't work very well. I'd appreciate some advice/help/opinions on the following points. Much appreciated.


  1. The CPU usage of this code is through the roof. It's using half my CPU (50% of each of the 2 cores) to do this, which is exponentially more than comparable downloading tools which barely stress the CPU at all. I'm a bit mystified as to where this CPU usage comes from, as I wasn't expecting this.
  2. Usually, there seems to be 1 of the 3 threads that is lagging behind significantly. The other 2 threads will finish, after which it takes the third thread (which seems to be mostly the first thread with the first chunk) 30 or more seconds to complete. I can see from the task manager that the javaw process is still doing small IO writes, but I don't really know why this happens (I'm guessing race conditions?).
  3. Despite the fact that I've chosen quite a big buffer (1MB), I get the feeling that the InputStream almost never actually fills the buffer, which causes more IO writes than I would like. I'm under the impression that in this scenario, it would be best to keep the IO access to a minimum, but I don't know for sure whether this is the best approach.
  4. I realise Java may not be the ideal language to do something like this, but I'm convinced there's much more performance to be had than I get in my current implementation. Is NIO worth exploring in this case?
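On point 3: InputStream.read(byte[]) is allowed to return as soon as some bytes are available, so a 1MB buffer does not by itself produce 1MB writes. A minimal sketch of one way to reduce the number of writes is to keep reading into the same buffer until it is full or the stream ends. The names BufferFiller, ChunkSink and copyInFullBuffers are hypothetical; ChunkSink merely mirrors the signature of the write method shown above. Whether fewer, larger writes actually help in this case is an assumption that would need measuring.

```java
import java.io.IOException;
import java.io.InputStream;

final class BufferFiller {

    /** Hypothetical stand-in for the "file writer": same parameters as its write method. */
    interface ChunkSink {
        void write(byte[] bytes, int len, long offset) throws IOException;
    }

    /**
     * Copies the stream to the sink, handing over the buffer only when it is
     * full or the stream has ended. Returns the total number of bytes copied.
     */
    static long copyInFullBuffers(InputStream in, byte[] buffer, long startOffset, ChunkSink sink)
            throws IOException {
        long offset = startOffset;
        while (true) {
            int filled = 0;
            // Inner loop: accumulate short reads until the buffer is full or EOF.
            while (filled < buffer.length) {
                int n = in.read(buffer, filled, buffer.length - filled);
                if (n == -1) {
                    break;
                }
                filled += n;
            }
            if (filled == 0) {
                break; // nothing left to write
            }
            sink.write(buffer, filled, offset);
            offset += filled;
            if (filled < buffer.length) {
                break; // a partially filled buffer means the stream ended
            }
        }
        return offset - startOffset;
    }
}
```

On Java 9 and later, InputStream.readNBytes(byte[], int, int) performs the same fill-until-full-or-EOF step as the inner loop. The trade-off is that data sits in memory slightly longer before it reaches the disk.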

Note: I use Apache HTTPClient to do the HTTP interaction, which is where the entity.getContent() comes from (in case anyone is wondering).
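On the NIO question in point 4, one option worth knowing about is FileChannel, whose write(ByteBuffer, long) method writes at an explicit file position without touching the channel's own position. The sketch below is a hypothetical alternative to the synchronized seek-and-write "file writer", not the approach the author ended up with; the class name PositionalFileWriter is made up for illustration, and no performance gain is claimed.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class PositionalFileWriter implements AutoCloseable {

    private final FileChannel channel;

    PositionalFileWriter(Path target) throws IOException {
        channel = FileChannel.open(target, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
    }

    /**
     * Writes len bytes at the given file offset. There is no shared seek
     * position to protect, so the method is not synchronized.
     */
    void write(byte[] bytes, int len, long start) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(bytes, 0, len);
        long pos = start;
        // A single channel write may be partial, so loop until the buffer is drained.
        while (buf.hasRemaining()) {
            pos += channel.write(buf, pos);
        }
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

The write method takes the same parameters as the one in the question, so downloader threads could call it in the same way. Each thread would still need its own byte[] buffer, as in the question's code.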

Recommended answer

Answering my own question:


  1. The increased CPU usage was due to a while() {} loop that was waiting for the threads to finish. As it turns out, awaitTermination is a much better alternative to wait for an Executor to finish :)
  2. (And 3 and 4) This seems to be the nature of the beast; in the end I achieved what I wanted to do by using careful synchronization of the different threads that each download a chunk of data (well, in particular the writes of these chunks back to disk).
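The first point can be sketched as follows. The original busy-wait loop is not shown in the question, so this only illustrates the replacement: call shutdown() so the executor stops accepting new work, then block in awaitTermination() instead of spinning. The class name AwaitDownloads, the runAndWait helper and the timeout value are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class AwaitDownloads {

    /**
     * Runs the tasks on a fixed pool (one thread per task, so the list must
     * not be empty) and blocks until they finish or the timeout elapses.
     * Returns true if every task completed in time.
     */
    static boolean runAndWait(List<Runnable> tasks, long timeout, TimeUnit unit)
            throws InterruptedException {
        ExecutorService executor = Executors.newFixedThreadPool(tasks.size());
        for (Runnable task : tasks) {
            executor.execute(task);
        }
        executor.shutdown(); // no new tasks; already submitted ones keep running
        boolean finished = executor.awaitTermination(timeout, unit); // waits without spinning
        if (!finished) {
            executor.shutdownNow(); // interrupt whatever is still running
        }
        return finished;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger done = new AtomicInteger();
        // Stand-ins for the three Downloader instances from the question.
        Runnable fakeDownload = done::incrementAndGet;
        boolean ok = runAndWait(Arrays.asList(fakeDownload, fakeDownload, fakeDownload),
                10, TimeUnit.SECONDS);
        System.out.println(ok + " " + done.get());
    }
}
```

While the calling thread is parked inside awaitTermination it is not scheduled on a core, which is consistent with the drop in CPU usage the answer describes.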
