This post looks at how SQL Server performs when many concurrent, long-running queries are executed at the same time. Hopefully the answer below is a useful reference if you are facing the same problem.

Problem Description

I'm wondering how executing many long-running queries simultaneously will impact SQL Server's ability to service each query in a timely fashion.

It wasn't my intention to be vague; it's more of a hypothetical. Let's just assume the queries are SELECT statements with some kind of predicate on tables with millions of rows.

Recommended Answer

CPU



Each request coming to the server (i.e. each 'batch') will be associated with a 'task', see sys.dm_os_tasks. Tasks are queued up on a 'scheduler', which is roughly speaking a CPU core, see sys.dm_os_schedulers. Each scheduler has several 'workers' (i.e. threads or fibers, see sys.dm_os_workers), and a free worker will pick up the next task from the scheduler's queue and 'run away' with it, executing it until the task is finished (i.e. the request has completed). This scheduling mechanism applies to everything inside SQL Server, including system tasks, CLR code and so on.
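As a rough sketch (not part of the original answer), the DMVs mentioned above can be queried together to see how tasks map onto schedulers and workers:

    -- One row per active task, with the worker (if any) currently executing it
    SELECT t.session_id,
           t.scheduler_id,
           t.task_state,            -- PENDING = still waiting for a free worker
           w.state AS worker_state
    FROM sys.dm_os_tasks AS t
    LEFT JOIN sys.dm_os_workers AS w
           ON w.task_address = t.task_address
    WHERE t.session_id IS NOT NULL;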

The number of tasks that can be created is limited by available memory. Requests ('batches') do not map one-to-one to tasks, since some requests, once started, schedule more tasks to be executed, parallel queries being the typical example. The number of workers in the system is dynamic, but capped by the 'max worker threads' configuration setting. If the worker cap is reached, new scheduled tasks are queued up on the schedulers but not picked up until a worker frees up (finishes its task) and becomes available. This condition is called 'worker starvation' and results in an unresponsive server, since new client login handshakes require login tasks to be executed (the server appears to reject connections) and new requests from existing clients queue up behind the waiting tasks (the server takes a long time to respond to trivial requests).
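To see how close you are to that cap, a minimal check (my own sketch) is to compare the configured and effective limits with the number of workers that currently exist:

    -- Configured value (0 means SQL Server derives the cap from the CPU count)
    SELECT value_in_use AS configured_max_worker_threads
    FROM sys.configurations
    WHERE name = 'max worker threads';

    -- Effective cap and current worker count
    SELECT max_workers_count FROM sys.dm_os_sys_info;
    SELECT COUNT(*) AS current_workers FROM sys.dm_os_workers;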

So if you have a large number of parallel, long-running queries, you will consume a large number of workers doing many long-running tasks. This reduces the size of the free worker pool, leaving fewer workers available to service the other, short tasks coming into the server (like OLTP requests, login handshakes etc.). The server appears unresponsive because tasks pile up in the scheduler queues (this can be seen in the work_queue_count column of the sys.dm_os_schedulers DMV). In extreme cases you can effectively starve the system of workers, making the server completely unresponsive until some workers free up.
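For example, a quick look at the scheduler queues using the DMV and column named above:

    -- Tasks waiting in each scheduler's queue for a free worker
    SELECT scheduler_id,
           current_workers_count,
           active_workers_count,
           runnable_tasks_count,
           work_queue_count          -- persistently > 0 suggests worker starvation
    FROM sys.dm_os_schedulers
    WHERE status = 'VISIBLE ONLINE';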

Memory

Query plans that involve parallel operations are typically associated with full scans of large indexes (large tables). Scanning an index is done by traversing its leaf pages, and reading all the leaf pages of a large table means that all of those pages must be present in memory at one time or another during the execution of the query. This in turn creates a demand for free pages from the buffer pool to accommodate the pages being scanned. The demand for free pages creates memory pressure, which causes notifications to be sent to the caches to start evicting old entries, and causes older data pages to be evicted from the buffer pool. The cache notifications can be seen in the memory cache DMVs, and the data page evictions can easily be checked in the performance counters.
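One common way to watch this kind of buffer pool pressure (my own sketch, not something spelled out above) is the Buffer Manager counters exposed through sys.dm_os_performance_counters, e.g. Page life expectancy, which drops when data pages are being evicted quickly:

    -- How long a data page is expected to stay in the buffer pool (seconds)
    SELECT object_name, counter_name, cntr_value AS seconds
    FROM sys.dm_os_performance_counters
    WHERE counter_name LIKE 'Page life expectancy%'
      AND object_name LIKE '%Buffer Manager%';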

IO

If the IO generated by the scans exceeds the bandwidth of your system, the IO operations start queuing up on the disk controller(s). This can easily be checked in the Physical Disk/Avg. Disk Queue Length performance counters.
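Besides that perfmon counter, a SQL-side complement (my own suggestion, not part of the original answer) is to look at the accumulated IO stalls per database file:

    -- Cumulative time requests have spent waiting on IO, per database file
    SELECT DB_NAME(vfs.database_id) AS database_name,
           vfs.file_id,
           vfs.num_of_reads,
           vfs.io_stall_read_ms,
           vfs.num_of_writes,
           vfs.io_stall_write_ms
    FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
    ORDER BY vfs.io_stall_read_ms DESC;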

And finally, the biggest problem: lock contention. As explained, parallel queries almost always imply table scans, and table scans take a shared lock on each row they visit. It's true that they release the lock as soon as the record is read in normal operation modes, but you are still guaranteed to request an S lock on every row in the table. This pretty much guarantees that these scans will hit a row that is locked X by an update. When this happens the scan has to stop and wait for the X lock to be released, which happens when the updating transaction finally commits. The result is that even moderate OLTP activity on the table blocks the long-running queries. Ideally that is all that happens, and the result is just poor performance. But things can get ugly quickly if the long-running query does anything fancy, like acquiring page locks instead of row locks. Since these scans traverse the indexes end-to-end and are guaranteed to come into conflict with updates, the higher-granularity locks acquired by these queries no longer just conflict with the update locks, they can actually lead to deadlocks. Explaining how this can happen is beyond the scope of this reply.
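When this kind of blocking is happening it shows up as lock waits on the running requests; a minimal sketch of how to spot it:

    -- Requests currently blocked by another session (lock waits appear as LCK_M_% wait types)
    SELECT r.session_id,
           r.blocking_session_id,
           r.wait_type,
           r.wait_time AS wait_time_ms,
           r.command
    FROM sys.dm_exec_requests AS r
    WHERE r.blocking_session_id <> 0;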

To eliminate the contention when the queries are legitimately doing full scans, the best alternative is the magic of snapshots: either database snapshots created for reporting, or the snapshot isolation levels. Note that some may recommend using dirty reads; I have yet to find a case where that was actually acceptable.
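For illustration, here is roughly what each option looks like (the database name, logical file name and snapshot file path are placeholders):

    -- Option 1: row-versioning based snapshot isolation on the source database
    ALTER DATABASE MyDatabase SET ALLOW_SNAPSHOT_ISOLATION ON;
    -- ...then, in the reporting session:
    SET TRANSACTION ISOLATION LEVEL SNAPSHOT;

    -- Option 2: a point-in-time database snapshot dedicated to reporting
    CREATE DATABASE MyDatabase_Report
    ON (NAME = MyDatabase_Data,                      -- logical name of the source data file
        FILENAME = 'C:\Snapshots\MyDatabase_Report.ss')
    AS SNAPSHOT OF MyDatabase;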

That wraps up this look at SQL Server performance with many concurrent, long-running queries. Hopefully the answer above is helpful.
