Problem description
A friend and I are working on a fun project in which we have to execute hundreds of HTTP requests, each through a different proxy. Imagine something like the following:
for (int i = 0; i < 20; i++)
{
    HttpClientHandler handler = new HttpClientHandler { Proxy = new WebProxy(randomProxy, true) };
    using (var client = new HttpClient(handler))
    {
        using (var request = new HttpRequestMessage(HttpMethod.Get, "http://x.com"))
        {
            var response = await client.SendAsync(request);
            if (response.IsSuccessStatusCode)
            {
                string content = await response.Content.ReadAsStringAsync();
            }
        }
        using (var request2 = new HttpRequestMessage(HttpMethod.Get, "http://x.com/news"))
        {
            var response = await client.SendAsync(request2);
            if (response.IsSuccessStatusCode)
            {
                string content = await response.Content.ReadAsStringAsync();
            }
        }
    }
}
By the way, we are using .NET Core (a console application for now). I know there are many threads about socket exhaustion and handling DNS recycling, but this one is different because of the multiple proxies.
If we use a singleton instance of HttpClient, just like everyone suggests:
- We can't set more than one proxy, because the proxy is set when the HttpClient is instantiated and cannot be changed afterwards.
- It doesn't respect DNS changes. Re-using an HttpClient instance means it holds on to the socket until the socket is closed, so if a DNS record is updated on the server, the client won't notice until that socket is closed. One workaround is to set the keep-alive header to false, so the socket is closed after each request, but that leads to sub-optimal performance. The second way is to use ServicePoint:
// FindServicePoint takes a Uri, not a string.
ServicePointManager.FindServicePoint(new Uri("http://x.com"))
    .ConnectionLeaseTimeout = Convert.ToInt32(TimeSpan.FromSeconds(15).TotalMilliseconds);
ServicePointManager.DnsRefreshTimeout = Convert.ToInt32(TimeSpan.FromSeconds(5).TotalMilliseconds);
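For comparison, the first workaround (closing the socket after each request) might look like the sketch below. It assumes that setting `DefaultRequestHeaders.ConnectionClose` is what "keep-alive set to false" means here; `NoKeepAliveExample` and `CreateClient` are hypothetical names used only for illustration.

```csharp
using System;
using System.Net.Http;

// Hypothetical helper, not part of the original question.
public static class NoKeepAliveExample
{
    // Builds a client that sends "Connection: close" with every request,
    // so each connection is closed instead of being reused.
    public static HttpClient CreateClient()
    {
        var client = new HttpClient();
        client.DefaultRequestHeaders.ConnectionClose = true;
        return client;
    }
}
```

Every request then pays for a new TCP (and possibly TLS) handshake, which is the performance cost mentioned above.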
On the other hand, disposing HttpClient (as in my example above), in other words creating multiple instances of HttpClient, leaves multiple sockets in the TIME_WAIT state. TIME_WAIT indicates that the local endpoint (this side) has closed the connection.
I'm aware of SocketsHttpHandler and IHttpClientFactory, but they don't solve the problem of using different proxies.
var socketsHandler = new SocketsHttpHandler
{
    PooledConnectionLifetime = TimeSpan.FromMinutes(10),
    PooledConnectionIdleTimeout = TimeSpan.FromMinutes(5),
    MaxConnectionsPerServer = 10
};
// Cannot set a different proxy for each request
var client = new HttpClient(socketsHandler);
What is the most sensible decision?
Recommended answer
First of all, I want to mention that @Stephen Cleary's example works fine if the proxies are known at compile time, but in my case they are only known at runtime. I forgot to mention that in the question, so that's my fault.
Thanks to @aepot for pointing those things out.
This is the solution I came up with (credit to @mcont):
using System;
using System.Collections.Concurrent;
using System.Net;
using System.Net.Http;
using Flurl.Http;

/// <summary>
/// A wrapper class for <see cref="FlurlClient"/>, which solves socket exhaustion and DNS recycling.
/// </summary>
public class FlurlClientManager
{
    /// <summary>
    /// Static collection, which stores the clients that are going to be reused.
    /// </summary>
    private static readonly ConcurrentDictionary<string, IFlurlClient> _clients = new ConcurrentDictionary<string, IFlurlClient>();

    /// <summary>
    /// Gets the available clients.
    /// </summary>
    public ConcurrentDictionary<string, IFlurlClient> GetClients()
        => _clients;

    /// <summary>
    /// Creates a new client or gets an existing one.
    /// </summary>
    /// <param name="clientName">The client name.</param>
    /// <param name="proxy">The proxy URL.</param>
    /// <returns>The <see cref="FlurlClient"/>.</returns>
    public IFlurlClient CreateOrGetClient(string clientName, string proxy = null)
    {
        // Pass a factory rather than a ready-made client, so a new client
        // (and its handler) is only built when one is actually needed.
        return _clients.AddOrUpdate(
            clientName,
            _ => CreateClient(proxy),
            (_, client) => client.IsDisposed ? CreateClient(proxy) : client);
    }

    /// <summary>
    /// Disposes a client. This leaves a socket in TIME_WAIT state for 240 seconds but it's necessary in case a client has to be removed from the list.
    /// </summary>
    /// <param name="clientName">The client name.</param>
    /// <returns>Returns true if the operation is successful.</returns>
    public bool DeleteClient(string clientName)
    {
        // Remove first: this avoids a KeyNotFoundException for unknown names
        // and stops other callers from obtaining a client that is being disposed.
        if (!_clients.TryRemove(clientName, out var client))
        {
            return false;
        }

        client.Dispose();
        return true;
    }

    private IFlurlClient CreateClient(string proxy = null)
    {
        var handler = new SocketsHttpHandler()
        {
            Proxy = proxy != null ? new WebProxy(proxy, true) : null,
            // Recycle pooled connections so that DNS changes are picked up.
            PooledConnectionLifetime = TimeSpan.FromMinutes(10)
        };

        var client = new HttpClient(handler);
        return new FlurlClient(client);
    }
}
A proxy per request means an additional socket for each request (another HttpClient instance).
In the solution above, ConcurrentDictionary is used to store the HttpClients so that I can reuse them, which is the whole point of HttpClient. I can use the same proxy for 5 requests before it gets blocked by API limitations. I forgot to mention that in the question as well.
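The same caching idea can be sketched without Flurl, keyed directly by the proxy URL. This is only an illustration under stated assumptions: `ProxyClientCache` and `GetClient` are hypothetical names, and the 10-minute lifetime is simply copied from the solution above.

```csharp
using System;
using System.Collections.Concurrent;
using System.Net;
using System.Net.Http;

// Hypothetical, Flurl-free sketch of "one reusable HttpClient per proxy".
public static class ProxyClientCache
{
    // Only the Lazy<T> stored in the dictionary is ever evaluated, so at most
    // one HttpClient is built per proxy even under concurrent access.
    private static readonly ConcurrentDictionary<string, Lazy<HttpClient>> Clients =
        new ConcurrentDictionary<string, Lazy<HttpClient>>();

    // Returns the cached client for this proxy, creating it on first use.
    public static HttpClient GetClient(string proxyUrl)
    {
        return Clients
            .GetOrAdd(proxyUrl, url => new Lazy<HttpClient>(() => Create(url)))
            .Value;
    }

    private static HttpClient Create(string proxyUrl)
    {
        var handler = new SocketsHttpHandler
        {
            Proxy = new WebProxy(proxyUrl, true),
            UseProxy = true,
            // Recycle pooled connections periodically so DNS changes are picked up.
            PooledConnectionLifetime = TimeSpan.FromMinutes(10)
        };
        return new HttpClient(handler);
    }
}
```

With this, five requests through the same proxy would call `ProxyClientCache.GetClient(proxyUrl)` five times and share one client and one connection pool, while a different proxy URL gets its own client.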
As you've seen, there are two ways to address socket exhaustion and DNS recycling: IHttpClientFactory and SocketsHttpHandler. The first one doesn't suit my case, because the proxies I'm using are known at runtime, not at compile time. The solution above uses the second.
For those who have the same issue, you can read the following issue on GitHub. It explains everything.
I'm open to improvements, so poke me.