Querying more than 1,000,000 records with the Salesforce Java API: what is the best approach?

Question

I am developing a Java application that will query tables which may hold over 1,000,000 records. I have tried everything I could to make it as efficient as possible, but I can only achieve an average of about 5,000 records a minute, with a peak of 10,000. I have tried reverse engineering the Data Loader, and my code seems very similar, but still no luck.

Is threading a viable solution here? I have tried it, but with very little improvement.

I have read up on this and applied everything I could find (compressing requests/responses, threading, etc.), but I cannot reach Data Loader-like speeds.

Note that the queryMore method seems to be the bottleneck.

Does anyone have any code samples or experiences they can share to steer me in the right direction?

Thanks

Answer

An approach I've used in the past is to query only for the IDs you want (which makes the queries significantly faster). You can then parallelize the retrieve() calls across several threads.
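
A minimal sketch of the ID-only query step, assuming a hypothetical IdQueryClient interface whose query()/queryMore() are loosely modelled on the SOAP calls (a page of IDs, a "done" flag, and a locator for the next page). The Page record, FakeClient and fetchAllIds names are illustrative only; a real application would delegate to its Salesforce connection instead of the in-memory fake.

```java
import java.util.ArrayList;
import java.util.List;

class IdPager {
    // Hypothetical page of results: the IDs on this page, whether it is the
    // last page, and a locator for fetching the next one.
    record Page(List<String> ids, boolean done, String locator) {}

    // Hypothetical client; a real app would wrap the API's query()/queryMore().
    interface IdQueryClient {
        Page query(String soql);
        Page queryMore(String locator);
    }

    // Selects only Id and keeps calling queryMore() until the last page.
    static List<String> fetchAllIds(IdQueryClient client, String sObjectType) {
        List<String> ids = new ArrayList<>();
        Page page = client.query("SELECT Id FROM " + sObjectType);
        ids.addAll(page.ids());
        while (!page.done()) {
            page = client.queryMore(page.locator());
            ids.addAll(page.ids());
        }
        return ids;
    }

    // In-memory fake serving a fixed ID list in pages of pageSize;
    // the locator is just the index of the next page's first ID.
    static class FakeClient implements IdQueryClient {
        private final List<String> all;
        private final int pageSize;
        String lastSoql;

        FakeClient(List<String> all, int pageSize) {
            this.all = all;
            this.pageSize = pageSize;
        }

        private Page pageAt(int start) {
            int end = Math.min(start + pageSize, all.size());
            boolean done = end == all.size();
            return new Page(all.subList(start, end), done, done ? null : String.valueOf(end));
        }

        public Page query(String soql) { lastSoql = soql; return pageAt(0); }
        public Page queryMore(String locator) { return pageAt(Integer.parseInt(locator)); }
    }

    public static void main(String[] args) {
        FakeClient fake = new FakeClient(List.of("001A", "001B", "001C", "001D", "001E"), 2);
        System.out.println(fetchAllIds(fake, "Account")); // [001A, 001B, 001C, 001D, 001E]
        System.out.println(fake.lastSoql);                 // SELECT Id FROM Account
    }
}
```

Selecting only Id keeps each page small, so this serial loop spends less time per queryMore() round trip; fetching the remaining fields is left to the parallel retrieve() stage.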

It looks like this:

[query thread] -> BlockingQueue -> [thread pool doing retrieve()] -> BlockingQueue

The first thread calls query() and queryMore() as fast as it can, writing every ID it receives into the BlockingQueue. As far as I know, queryMore() isn't something you should call concurrently, so there's no way to parallelize this step. If lock contention on the queue becomes an issue, you may wish to package the IDs into bundles of a few hundred. A thread pool can then make concurrent retrieve() calls on those IDs to fetch all the fields of the SObjects and put them in a second queue for the rest of your app to process.
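
A minimal sketch of that pipeline in plain Java. It assumes a hypothetical Retriever interface in place of the real retrieve() call, and lets the calling thread stand in for the query thread (the IDs would come from the query()/queryMore() loop). IDs are bundled into batches on one BlockingQueue, a fixed thread pool drains them concurrently, and records land in a second queue. For brevity the output queue is copied into a list at the end; a real app would consume it while the workers run. The batch size and worker count are tuning assumptions, not recommendations, and batchSize should stay within whatever per-call ID limit the API imposes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;

class RetrievePipeline {
    // Hypothetical stand-in for retrieve(): turns a batch of IDs into records.
    interface Retriever {
        List<String> retrieve(List<String> ids) throws Exception;
    }

    // Sentinel batch telling a worker to stop; compared by reference.
    private static final List<String> POISON = new ArrayList<>();

    static List<String> run(Iterable<String> ids, Retriever retriever, int batchSize, int workers)
            throws InterruptedException, ExecutionException {
        BlockingQueue<List<String>> idBatches = new LinkedBlockingQueue<>();
        BlockingQueue<String> records = new LinkedBlockingQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            // Worker pool: take ID batches and retrieve() them concurrently.
            List<Future<?>> futures = new ArrayList<>();
            for (int i = 0; i < workers; i++) {
                futures.add(pool.submit(() -> {
                    while (true) {
                        List<String> batch = idBatches.take();
                        if (batch == POISON) return null;
                        records.addAll(retriever.retrieve(batch));
                    }
                }));
            }
            // Query side: bundle IDs into batches so the queue is locked once per batch.
            List<String> current = new ArrayList<>(batchSize);
            for (String id : ids) {
                current.add(id);
                if (current.size() == batchSize) {
                    idBatches.put(current);
                    current = new ArrayList<>(batchSize);
                }
            }
            if (!current.isEmpty()) idBatches.put(current);
            for (int i = 0; i < workers; i++) idBatches.put(POISON); // one stop signal per worker

            for (Future<?> f : futures) f.get(); // rethrows any worker failure
        } finally {
            pool.shutdown();
        }
        return new ArrayList<>(records);
    }

    public static void main(String[] args) throws Exception {
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < 1000; i++) ids.add("ID" + i);
        // Fake retriever that turns each ID into a "record" string.
        Retriever fake = batch -> {
            List<String> out = new ArrayList<>();
            for (String id : batch) out.add("record:" + id);
            return out;
        };
        List<String> records = run(ids, fake, 200, 4);
        System.out.println(records.size()); // 1000
    }
}
```

The queues are unbounded so the query side can never block forever if a worker dies; a bounded ID queue would add back-pressure when retrieve() falls behind, at the cost of needing more careful failure handling. Output order is not preserved across workers.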

I wrote a Java library for the Salesforce API that may be useful: http://blog.teamlazerbeez.com/2011/03/03/a-new-java-salesforce-api-library/

这篇关于使用 salesforce Java API 查询超过 1,000,000 条记录并寻找最佳方法的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持！