This article looks at why PHP multi cURL can perform worse than sequential file_get_contents and how to deal with it; it should be a useful reference if you are hitting the same problem.

Problem description

I am writing an interface in which I must launch 4 HTTP requests to get some information.

I implemented the interface in 2 ways:

  1. using sequential file_get_contents.
  2. using multi curl.

I benchmarked the two versions with JMeter. The results show that multi cURL is much better than sequential file_get_contents when only 1 JMeter thread is making requests, but much worse with 100 threads.

The question is: what could cause the poor performance of multi cURL?
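For reference, here is a minimal sketch of what the sequential version might look like (the original question does not show it; the stream-context approach is an assumption, with variable names mirroring the multi cURL code below):

$result_str_arr = array();
foreach ($call_url_arr as $key => $url)
{
    // One blocking POST per URL; each request waits for the previous one.
    $context = stream_context_create([
        'http' => [
            'method'  => 'POST',
            'header'  => 'Content-Type: application/x-www-form-urlencoded',
            'content' => http_build_query($params_arr[$key]),
        ],
    ]);
    $result_str_arr[$key] = file_get_contents($url, false, $context);
}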

My multi curl code is as below:

$curl_handle_arr = array();
$master = curl_multi_init();
foreach ($call_url_arr as $key => $url)
{
    // Create one easy handle per URL and register it with the multi handle.
    $curl_handle = curl_init($url);
    $curl_handle_arr[$key] = $curl_handle;
    curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($curl_handle, CURLOPT_POST, true);
    curl_setopt($curl_handle, CURLOPT_POSTFIELDS, http_build_query($params_arr[$key]));
    curl_multi_add_handle($master, $curl_handle);
}
$running = null;
$mrc = null;
// Kick off the transfers.
do
{
    $mrc = curl_multi_exec($master, $running);
}
while ($mrc == CURLM_CALL_MULTI_PERFORM);
// Wait for activity and keep driving the transfers until all are done.
while ($running && $mrc == CURLM_OK)
{
    if (curl_multi_select($master) != -1)
    {
        do
        {
            $mrc = curl_multi_exec($master, $running);
        }
        while ($mrc == CURLM_CALL_MULTI_PERFORM);
    }
}
// Collect the responses and clean up.
foreach ($call_url_arr as $key => $url)
{
    $curl_handle = $curl_handle_arr[$key];
    if (curl_error($curl_handle) == '')
    {
        $result_str_arr[$key] = curl_multi_getcontent($curl_handle);
    }
    curl_multi_remove_handle($master, $curl_handle);
}
curl_multi_close($master);
Solution

1. Simple optimization

  • You should sleep for about 2500 microseconds if curl_multi_select fails.
    In practice it definitely does fail sometimes during an execution;
    without sleeping, your CPU resources get eaten by what amounts to a while (true) { } busy loop.
  • If you do nothing after some (not all) of the requests have finished,
    you should allow a larger maximum timeout (in seconds).
  • Your code is written for old libcurls. As of libcurl 7.20.0,
    the state CURLM_CALL_MULTI_PERFORM no longer appears.

So, the following code

$running = null;
$mrc = null;
do
{
    $mrc = curl_multi_exec($master, $running);
}
while ($mrc == CURLM_CALL_MULTI_PERFORM);
while ($running && $mrc == CURLM_OK)
{
    if (curl_multi_select($master) != -1)
    {
        do
        {
            $mrc = curl_multi_exec($master, $running);
        }
        while ($mrc == CURLM_CALL_MULTI_PERFORM);
    }
}

should be

// Kick off the transfers once, then loop until nothing is running.
curl_multi_exec($master, $running);
do
{
    if (curl_multi_select($master, 99) === -1)
    {
        // select() failed; back off briefly instead of busy-looping.
        usleep(2500);
        continue;
    }
    curl_multi_exec($master, $running);
} while ($running);

Note

The timeout value of curl_multi_select should be tuned only if you want to do something like...

curl_multi_exec($master, $running);
do
{
    if (curl_multi_select($master, $TIMEOUT) === -1)
    {
        usleep(2500);
        continue;
    }
    curl_multi_exec($master, $running);
    // Handle each completed transfer as soon as it finishes.
    while ($info = curl_multi_info_read($master))
    {
        /* Do something with $info; e.g. $info['handle'] is the finished
           easy handle and $info['result'] is its CURLE_* result code. */
    }
} while ($running);

Otherwise, the value should be extremely large.
(However, PHP_INT_MAX is too large; libcurl treats it as an invalid value.)

2. Easy experiment in one PHP process

I tested using my parallel cURL executor library: mpyw/co


<?php

require 'vendor/autoload.php';

use mpyw\Co\Co;

function four_sequential_requests_by_one_hundred_people()
{
    // 100 concurrent "people", each making 4 sequential requests.
    $tasks = [];
    for ($i = 0; $i < 100; ++$i) {
        $tasks[] = function () {
            $ch = curl_init();
            curl_setopt_array($ch, [
                CURLOPT_URL => 'example.com',
                CURLOPT_FORBID_REUSE => true,
                CURLOPT_RETURNTRANSFER => true,
            ]);
            for ($j = 0; $j < 4; ++$j) {
                yield $ch;
            }
        };
    }
    $start = microtime(true);
    yield $tasks;
    $end = microtime(true);
    printf("Time of %s: %.2f sec\n", __FUNCTION__, $end - $start);
}

function requests_by_four_hundred_people()
{
    // 400 concurrent "people", each making a single request.
    $tasks = [];
    for ($i = 0; $i < 400; ++$i) {
        $tasks[] = function () {
            $ch = curl_init();
            curl_setopt_array($ch, [
                CURLOPT_URL => 'example.com',
                CURLOPT_FORBID_REUSE => true,
                CURLOPT_RETURNTRANSFER => true,
            ]);
            yield $ch;
        };
    }
    $start = microtime(true);
    yield $tasks;
    $end = microtime(true);
    printf("Time of %s: %.2f sec\n", __FUNCTION__, $end - $start);
}

Co::wait(four_sequential_requests_by_one_hundred_people(), [
    'concurrency' => 0, // Zero means unlimited
]);

Co::wait(requests_by_four_hundred_people(), [
    'concurrency' => 0, // Zero means unlimited
]);

I tried each case five times (the timing screenshots from the original post are not reproduced here): "four sequential requests by one hundred people" consistently finished faster than four hundred fully concurrent requests.

I also tried the runs in reverse order, with the same outcome (the 3rd request was kicked xD).

These results show that too many concurrent TCP connections actually decrease throughput, likely because every extra connection adds its own handshake and competes for bandwidth and server resources.

3. Advanced optimization

3-A. For different destinations

If you want to optimize for both few and many concurrent requests, the following dirty solution may help you.

  1. Share the number of requesters using apcu_add / apcu_fetch / apcu_delete.
  2. Switch methods (sequential or parallel) based on the current value, as sketched below.
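
A minimal sketch of this switch, assuming a hypothetical key name and threshold (and using apcu_inc / apcu_dec, which are atomic, to maintain the counter):

$key = 'concurrent_requesters'; // hypothetical APCu key
$limit = 10;                    // hypothetical tipping point; tune it from your own benchmarks
apcu_add($key, 0);              // create the counter once; a no-op if it already exists
$current = apcu_inc($key);      // register this requester
try {
    if ($current > $limit) {
        // Busy: fall back to sequential file_get_contents requests.
    } else {
        // Quiet: run the requests in parallel with curl_multi.
    }
} finally {
    apcu_dec($key);             // deregister this requester
}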

3-B. For the same destination

CURLMOPT_PIPELINING will help you. This option bundles all HTTP/1.1 connections for the same destination into one TCP connection.

curl_multi_setopt($master, CURLMOPT_PIPELINING, 1);
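
Note that HTTP/1.1 pipelining was removed from libcurl in 7.62.0; on newer versions the equivalent trick is HTTP/2 multiplexing (assuming an HTTP/2-capable server and a libcurl built with HTTP/2 support):

curl_multi_setopt($master, CURLMOPT_PIPELINING, CURLPIPE_MULTIPLEX);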

That wraps up this look at PHP multi cURL performing worse than sequential file_get_contents. We hope the answer above helps, and thanks for your continued support!
