
Problem description

When using Stormcrawler, documents are being indexed into Elasticsearch, but their content is not.

Stormcrawler is up to date with 'origin/master' of https://github.com/DigitalPebble/storm-crawler.git

Using elasticsearch-5.6.4

crawler-conf.yaml has:

indexer.url.fieldname: "url"
indexer.text.fieldname: "content"
indexer.canonical.name: "canonical"

The url and title fields are indexed, but content is not.

I have been trying to get this working by following Julien's tutorial at: https://www.youtube.com/watch?v=xMCuWpPh-4A

Everything is working, except that the content is not being indexed into Elasticsearch. I feel like this is some small config error, but I have tried many variations with no luck. So now I am asking for help.

Thanks.

Recommended answer

Are you sure that the content is not indexed? The content field is not stored (see ES_IndexInit.sh), but it should be indexed. To store it, you can modify the init script and re-run the crawl; you would then get it back in the results like the other fields. To test whether it is indexed, try querying on it and see how that affects the results.
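The check suggested above (query on the content field and see whether anything matches) could be sketched roughly as follows. This is only an illustration under assumptions that are not in the original post: Elasticsearch is reachable at localhost:9200, the crawl index is called "index", and the helper names are made up for this example. The last helper shows what a text field mapping with storage enabled might look like if you decide to edit the init script.

```python
import json
import urllib.request

# Assumptions (not from the original post): Elasticsearch listens on
# localhost:9200 and the crawl index is named "index". Adjust both to
# match whatever your init script actually creates.
ES_URL = "http://localhost:9200"
INDEX_NAME = "index"


def build_content_query(term, size=5):
    """Build a search body that matches `term` against the `content` field."""
    return {
        "query": {"match": {"content": term}},
        # Request stored fields explicitly. A field that is indexed but not
        # stored is still searchable; it is just absent from the returned hits.
        "stored_fields": ["url", "title", "content"],
        "size": size,
    }


def stored_text_field():
    """Hypothetical mapping fragment: a text field that is indexed and stored."""
    return {"type": "text", "store": True}


def search_content(term, es_url=ES_URL, index=INDEX_NAME):
    """Send the query to Elasticsearch and return the reported hit total.

    In Elasticsearch 5.x `hits.total` is a plain integer; later major
    versions return an object instead.
    """
    body = json.dumps(build_content_query(term)).encode("utf-8")
    request = urllib.request.Request(
        f"{es_url}/{index}/_search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    return result["hits"]["total"]
```

Calling search_content("some word you know appears on a crawled page") against a running cluster and getting a non-zero total would suggest that content is indexed even though it is not returned with the hits.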

