Escaping search queries for Google's full-text search service

Problem description

This is a cross-post of https://groups.google.com/d/topic/google-appengine/97LY3Yfd_14/discussion

I'm working with the new full text search service in GAE 1.6.6 and I'm having trouble figuring out how to correctly escape my query strings before I pass them off to the search index. The docs mention that certain characters need to be escaped (namely the numeric operators); however, they don't specify how the query parser expects the string to be escaped.

The issue I'm having is two-fold:

  1. Failing to escape the crap out of many characters (more than those that are hinted at in the docs) will cause the parser to raise a QueryException.
  2. When I've escaped the query to the point it won't raise, the numeric operators (>, <, >=, <=) no longer parse correctly (not factored into the search).

I set up a test where I feed string.printable into my_index.search() and found that it would raise QueryException on each of the "printable" control characters, which I'm now stripping out, as well as on things that would seem innocent, like asterisks, commas, parentheses, braces, and tildes. None of these are mentioned in the docs as needing to be escaped.
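
The test described above can be sketched roughly as follows. This is a minimal illustration, not the asker's actual code: find_rejected_chars, fake_search and FakeQueryError are hypothetical names, and the set of rejected characters in the stand-in is simply copied from the list in the question, not taken from the real parser.

```python
import string


def find_rejected_chars(search, chars, error_type=Exception):
    # Call search() once per character and collect the ones that raise.
    rejected = []
    for ch in chars:
        try:
            search(ch)
        except error_type:
            rejected.append(ch)
    return rejected


class FakeQueryError(Exception):
    # Hypothetical stand-in for the exception the question calls QueryException.
    pass


# Assumption for illustration only: characters the question reports as failing.
FAKE_REJECTED = set("*,(){}~")


def fake_search(query):
    # Hypothetical stand-in for my_index.search(); it never contacts a service.
    if query in FAKE_REJECTED:
        raise FakeQueryError(query)
    return []


rejected = find_rejected_chars(fake_search, string.printable, FakeQueryError)
print(sorted(rejected))
```

With the real service, the same helper could presumably be given my_index.search and whichever exception class the service raises.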

So far I've tried:

  • cgi.escape()
  • saxutils.escape() with a mapping of ascii to urlencoded equivalents (eg , -> %2C)
  • saxutils.escape() with a mapping of ascii to html entity encoded ascii codes (eg &#123;)
  • urllib.quote_plus()

I've gotten the best results so far using URL-style (%NN) replacements, but >, <, >=, and <= continue to fail to yield the expected results from the index. Also, although this doesn't seem to be related to the escaping issue, using NOT in front of a field = value type query doesn't seem to work as advertised either.
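
One possible explanation for the operators being ignored, offered as an assumption and not as confirmed parser behavior: URL-style escaping rewrites the operator characters themselves, so the parser may never see a comparison. The sketch below only shows what quote_plus produces; "price" is a hypothetical field name.

```python
try:
    from urllib import quote_plus  # Python 2, as used in the question
except ImportError:
    from urllib.parse import quote_plus  # Python 3

# "price" is a hypothetical numeric field, used only for illustration.
raw_query = "price > 5"
encoded = quote_plus(raw_query)

# The ">" operator is replaced by "%3E" and the spaces by "+".
print(encoded)  # price+%3E+5
```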

tl;dr

How should I be escaping my queries before sending them to the search service so that the parser doesn't raise QueryException and my query yields expected results?

Solution

As briefly explained in the documentation (https://developers.google.com/appengine/docs/python/search/overview#Query_Language_Overview), the query parameter is a string that should conform to our query language, which we should document better.

For now, I recommend wrapping your queries (or at least some of the words/terms) in double quotes. That way you can pass all printable characters except " and \. The following example shows the result.

import string
from google.appengine.api.search import Query
Query('"%s"' % string.printable.replace('"', '').replace('\\', ''))

You can even pass non-printable characters:

Query('"%s"' % ''.join(chr(i) for i in xrange(128)).replace('"','').replace('\\', ''))

EDIT: Note that anything enclosed in double quotes is an exact match; that is, "foo bar" would match ...foo bar... but not ...bar foo...
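
The quoting approach from this answer could be wrapped in a small helper. This is a minimal sketch: quote_for_search is a hypothetical name, not part of the search API, and it simply drops the two characters the answer says cannot be passed.

```python
def quote_for_search(text):
    # Drop the two characters that cannot appear inside a quoted term:
    # the double quote and the backslash.
    cleaned = text.replace('"', '').replace('\\', '')
    # Wrap the remainder in double quotes so it is treated as one exact phrase.
    return '"%s"' % cleaned


print(quote_for_search('foo, bar (baz)*'))  # "foo, bar (baz)*"
```

The result would then be passed on as in the examples above, for instance Query(quote_for_search(user_input)). Because the whole string is quoted, the exact-match behavior noted in the EDIT applies to it.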

