Azure Table Storage performance when reading from massively parallel threads

Problem description

Short version: Can we read from dozens or hundreds of table partitions in a multi-threaded manner to increase performance by orders of magnitude?

Long version: We're working on a system that stores millions of rows in Azure table storage. We partition the data into small partitions, each containing about 500 records, which represents a day's worth of data for a unit.

Since Azure doesn't have a "sum" feature, to pull a year's worth of data we either have to use some pre-caching or sum the data ourselves in an Azure web or worker role.

Assuming the following:

- Reading a partition doesn't affect the performance of reading another
- Reading a partition has a bottleneck based on network speed and server retrieval
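
To make the consequence of these two assumptions concrete, here is a rough back-of-the-envelope model. The symbols are introduced only for illustration and are not from the question: $N$ is the number of partitions, $k$ the number of concurrent readers, and $t$ the latency of reading one partition, assumed constant. It deliberately ignores throttling, connection limits, and bandwidth saturation, any of which would break the first assumption.

$$
T(k) \approx \left\lceil \frac{N}{k} \right\rceil \cdot t,
\qquad
\text{speedup} = \frac{T(1)}{T(k)} = \frac{N}{\lceil N/k \rceil}
$$

For $N = 365$ and $k = 50$ this gives $\lceil 365/50 \rceil = 8$ rounds of reads, so the idealized speedup is $365/8 \approx 45.6$, close to but not exactly the thread count.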

We can then take a guess that if we wanted to quickly sum a lot of data on the fly (1 year, 365 partitions), we could use a massively parallel algorithm and it would scale almost perfectly to the number of threads. For example, we could use the .NET parallel extensions with 50+ threads and get a HUGE performance boost.
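
The idea above can be tried out before touching real storage. The following is a minimal sketch, written in Python rather than .NET purely to keep it self-contained, and it does not call Azure at all: `read_partition` is a hypothetical stand-in that sleeps for an assumed latency and returns fake rows. The partition key format, the latency value, and all function names are assumptions made for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Assumed per-partition round-trip time; NOT a measured Azure figure.
SIMULATED_LATENCY_S = 0.005
# From the question: each partition holds about 500 records.
ROWS_PER_PARTITION = 500


def read_partition(partition_key):
    """Hypothetical stand-in for querying one table partition.

    Sleeps to mimic network and server latency, then returns fake row values.
    """
    time.sleep(SIMULATED_LATENCY_S)
    return [1] * ROWS_PER_PARTITION


def sum_partition(partition_key):
    """Fetch one partition and reduce it to a single number."""
    return sum(read_partition(partition_key))


def sum_partitions(partition_keys, max_workers):
    """Sum all partitions using at most max_workers concurrent readers."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each worker spends its time waiting on I/O, so threads overlap well.
        return sum(pool.map(sum_partition, partition_keys))


def timed(fn, *args):
    """Return (result, elapsed_seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # One key per day of the year; the key format is made up for this sketch.
    keys = [f"unit-1_day-{day:03d}" for day in range(365)]
    total_seq, t_seq = timed(sum_partitions, keys, 1)
    total_par, t_par = timed(sum_partitions, keys, 50)
    print(f"1 worker:   total={total_seq} in {t_seq:.2f}s")
    print(f"50 workers: total={total_par} in {t_par:.2f}s")
```

Because the simulated reads are fully independent, this sketch will always show near-linear scaling. It only illustrates the shape of the experiment; whether real table storage behaves this way depends on the two assumptions listed earlier holding in practice.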

We're working on setting up some experiments, but I wanted to see if this has been done before. Since the .NET side is basically idle waiting on high-latency operations, this seems perfect for multi-threading.

Recommended answer