Problem description
Short version: Can we read from dozens or hundreds of table partitions in a multi-threaded manner to increase performance by orders of magnitude?
Long version: We're working on a system that stores millions of rows in Azure table storage. We partition the data into small partitions, each containing about 500 records, which represents a day's worth of data for a unit.
Since Azure doesn't have a "sum" feature, to pull a year's worth of data we either have to use some pre-caching or sum the data ourselves in an Azure web or worker role.
Assuming the following:
- Reading a partition doesn't affect the performance of reading another
- Reading a partition has a bottleneck based on network speed and server retrieval
We can then take a guess that if we wanted to quickly sum a lot of data on the fly (1 year, 365 partitions), we could use a massively parallel algorithm and it would scale almost perfectly with the number of threads. For example, we could use the .NET parallel extensions with 50+ threads and get a HUGE performance boost.
We're working on setting up some experiments, but I wanted to see if this has been done before. Since the .NET side is basically idle waiting on high-latency operations, this seems perfect for multi-threading.
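An experiment along these lines could be sketched as follows. This is a minimal Python sketch rather than .NET code, and it does not call Azure at all: `fetch_partition_sum` is a hypothetical stand-in that sleeps for a few milliseconds to simulate network and server latency and then returns a made-up per-partition total. The partition key format and the worker counts are likewise assumptions for illustration. It only shows the shape of the idea: one blocking read per partition, fanned out over a bounded thread pool, with the per-partition results summed on the client.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_partition_sum(partition_key: str) -> int:
    """Hypothetical stand-in for querying one partition and summing its rows.

    A real version would issue a table query for `partition_key`; here we
    only simulate the latency and pretend each of ~500 rows has value 1.
    """
    time.sleep(0.005)  # simulated network + server retrieval time
    return 500


def sum_partitions(partition_keys, max_workers=50, fetch=fetch_partition_sum):
    """Read every partition on a bounded thread pool and add up the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map runs up to max_workers fetches concurrently and yields
        # the results in input order; sum() consumes them as they arrive.
        return sum(pool.map(fetch, partition_keys))


if __name__ == "__main__":
    # One partition per day for a year (key format is an assumption).
    keys = [f"unit42-day{d:03d}" for d in range(365)]
    for workers in (1, 10, 50):
        start = time.perf_counter()
        total = sum_partitions(keys, max_workers=workers)
        elapsed = time.perf_counter() - start
        print(f"{workers:>2} workers: total={total} in {elapsed:.2f}s")
```

Because the simulated reads are independent and spend their time waiting, the wall-clock time in this toy setup shrinks roughly in proportion to the worker count. Whether real table storage behaves the same way is exactly what the experiment would need to measure.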
Recommended answer
There are limits imposed on the number of transactions that can be performed against a storage account and a particular partition or storage server in a given time period (somewhere around 500 req/s). So in that sense, there is a reasonable limit to the number of requests you could execute in parallel (before it begins to look like a DoS attack).
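One way to respect such a ceiling is to cap the rate at which requests start, independently of how many threads are available. The following is a minimal Python sketch under stated assumptions: `RateLimiter` and `throttled_sum` are hypothetical helpers, not part of any Azure client library, and the default of 400 requests per second is just an arbitrary margin below the rough 500 req/s figure mentioned above. The `fetch` argument is whatever function reads and sums one partition.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class RateLimiter:
    """Hands out evenly spaced start slots, `rate_per_sec` of them per second."""

    def __init__(self, rate_per_sec: float):
        self._interval = 1.0 / rate_per_sec
        self._lock = threading.Lock()
        self._next_slot = time.monotonic()

    def wait(self) -> None:
        # Reserve the next free slot under the lock, then sleep outside it
        # so other threads can reserve their own slots in the meantime.
        with self._lock:
            now = time.monotonic()
            slot = max(self._next_slot, now)
            self._next_slot = slot + self._interval
        delay = slot - now
        if delay > 0:
            time.sleep(delay)


def throttled_sum(partition_keys, fetch, rate_per_sec=400, max_workers=50):
    """Sum fetch(key) over all keys, starting requests no faster than the cap."""
    limiter = RateLimiter(rate_per_sec)

    def guarded(key):
        limiter.wait()  # block until this request's start slot arrives
        return fetch(key)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return sum(pool.map(guarded, partition_keys))
```

With this shape the thread count controls how many requests are in flight at once, while the limiter controls how quickly new ones begin, so the two can be tuned separately during experiments.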
Also, in implementation, I would be wary of concurrent connection limits imposed on the client, such as by System.Net.ServicePointManager. I am not sure if the Azure storage client is subject to those limits; they might require adjustment.