<div id="cnblogs_post_body" class="blogpost-body cnblogs-markdown"> <h2 id="全文检索基本概念">Full-text search basics</h2> <ul> <li>Search<br> A search is an interaction between a user and a search engine: the user needs some data and gives the engine a set of constraints, and the engine returns the results that satisfy them.</li> <li>Search engine<br> A search engine exists to store, find, and retrieve data. The search engine Neo4j uses is <code>Lucene</code>.</li> <li>Document<br> In search software the document is a first-class citizen: storing, searching, and displaying all revolve around it. A document can be thought of as a row in a database, except that the row also carries its field names.</li> <li>Inverted index<br> The inverted index is the core data structure of a search engine. In short, it turns the whole document collection into something like the index at the back of a book, so that the documents containing a given word can be found quickly.</li> <li>Lucene query syntax</li> </ul> <table> <thead> <tr class="header"> <th style="text-align: left;">Query implementation</th> <th style="text-align: right;">Purpose</th> <th style="text-align: center;">Example</th> </tr> </thead> <tbody> <tr class="odd"> <td style="text-align: left;">TermQuery</td> <td style="text-align: right;">Single-term match</td> <td style="text-align: center;">neo4j</td> </tr> <tr class="even"> <td style="text-align: left;">PhraseQuery</td> <td style="text-align: right;">Phrase match</td> <td style="text-align: center;">"graph database"</td> </tr> <tr class="odd"> <td style="text-align: left;">RangeQuery</td> <td style="text-align: right;">Range match (<code>[]</code> inclusive, <code>{}</code> exclusive)</td> <td style="text-align: center;">[A TO Z] {A TO Z}</td> </tr> <tr class="even"> <td style="text-align: left;">WildcardQuery</td> <td style="text-align: right;">Wildcard match (<code>*</code> any run of characters, <code>?</code> one character)</td> <td style="text-align: center;">g*p?, d??abase</td> </tr> <tr class="odd"> <td style="text-align: left;">PrefixQuery</td> <td style="text-align: right;">Prefix match</td> <td style="text-align: center;">algo*</td> </tr> <tr class="even"> <td style="text-align: left;">FuzzyQuery</td> <td style="text-align: right;">Fuzzy match (similar spelling)</td> <td style="text-align: center;">cipher~</td> </tr> <tr class="odd"> <td style="text-align: left;">BooleanQuery</td> <td style="text-align: right;">Combining query clauses</td> <td style="text-align: center;">graph AND "shortest path"</td> </tr> </tbody> </table> <h2 id="环境准备">Environment setup</h2> <ul> <li>Start Neo4j in a container<br> <code>docker run -p 17687:7687 -p 17474:7474 --name=neo4j-test neo4j:3.5.3</code></li> <li>Create data, using the sample dataset.<br> <code>:play 
northwind-graph</code></li> </ul> <h2 id="neo4j全文检索">Neo4j full-text search</h2> <p>Neo4j full-text search has the features listed below. After using it, though, what strikes me as most important is that <strong>the index-creation statement in effect only creates a namespace</strong>. I believe Neo4j has enabled <code>node_auto_indexing=true</code> by default since the 2.2.x days, so the inverted index already exists by the time the data is inserted. As a result, <strong>creating and dropping an index costs very little</strong>.</p> <ul> <li>Supports indexing both nodes and relationships</li> <li>Supports the commonly used <code>analyzers</code> as extensions</li> <li>Accepts <code>lucene query</code> syntax</li> <li>Can return a relevance score for each result</li> <li>Indexes are updated automatically</li> <li>No limit on the number of documents in a single index</li> </ul> <h3 id="索引创建与删除">Creating and dropping indexes</h3> <p>Create two indexes: <code>all</code>, a full-text index across the whole database (the <code>Product</code>, <code>Category</code>, and <code>Supplier</code> labels), and <code>product</code>, which covers only the <code>Product</code> label.</p> <pre><code class="hljs less"><span class="hljs-selector-tag">call</span> <span class="hljs-selector-tag">db</span><span class="hljs-selector-class">.index</span><span class="hljs-selector-class">.fulltext</span><span class="hljs-selector-class">.createNodeIndex</span>(<span class="hljs-string">"all"</span>,[<span class="hljs-string">'Product'</span>, <span class="hljs-string">'Category'</span>, <span class="hljs-string">'Supplier'</span>],[<span class="hljs-string">'reorderLevel'</span>, <span class="hljs-string">'unitsInStock'</span>, <span class="hljs-string">'unitPrice'</span>, <span class="hljs-string">'supplierID'</span>, <span class="hljs-string">'productID'</span>, <span class="hljs-string">'discontinued'</span>, <span class="hljs-string">'quantityPerUnit'</span>, <span class="hljs-string">'categoryID'</span>, <span class="hljs-string">'unitsOnOrder'</span>, <span class="hljs-string">'productName'</span>, <span class="hljs-string">'description'</span>, <span class="hljs-string">'categoryName'</span>, <span class="hljs-string">'picture'</span>, <span class="hljs-string">'country'</span>, <span class="hljs-string">'address'</span>, <span class="hljs-string">'contactTitle'</span>, <span class="hljs-string">'city'</span>, <span class="hljs-string">'phone'</span>, <span class="hljs-string">'contactName'</span>, <span class="hljs-string">'postalCode'</span>, <span class="hljs-string">'companyName'</span>, <span class="hljs-string">'fax'</span>, <span class="hljs-string">'region'</span>, <span class="hljs-string">'homePage'</span>])

<span class="hljs-selector-tag">call</span> <span class="hljs-selector-tag">db</span><span class="hljs-selector-class">.index</span><span class="hljs-selector-class">.fulltext</span><span class="hljs-selector-class">.createNodeIndex</span>(<span class="hljs-string">"product"</span>,[<span class="hljs-string">'Product'</span>],[<span class="hljs-string">'reorderLevel'</span>, <span class="hljs-string">'unitsInStock'</span>, <span class="hljs-string">'unitPrice'</span>, <span class="hljs-string">'supplierID'</span>, <span class="hljs-string">'productID'</span>, <span class="hljs-string">'quantityPerUnit'</span>, <span class="hljs-string">'discontinued'</span>, <span class="hljs-string">'productName'</span>, <span class="hljs-string">'unitsOnOrder'</span>, <span class="hljs-string">'categoryID'</span>])</code></pre>
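<p>The feature list mentions relationship indexes and analyzers, which the two statements above do not exercise. The following is only a minimal sketch. It assumes the optional <code>config</code> map argument available in Neo4j 3.5, that <code>english</code> is among the analyzers your instance reports, and that the Northwind data contains <code>ORDERS</code> relationships with a string <code>productID</code> property. The index names <code>product_en</code> and <code>orders</code> are made up for illustration.</p>
<pre><code class="hljs">// Run each statement on its own.

// List the analyzers this Neo4j instance offers
call db.index.fulltext.listAvailableAnalyzers()

// Node index with an explicit analyzer (assumption: 'english' appears in that list)
call db.index.fulltext.createNodeIndex("product_en", ['Product'], ['productName', 'quantityPerUnit'], {analyzer: 'english'})

// Relationship index (assumption: ORDERS relationships carry a string productID property)
call db.index.fulltext.createRelationshipIndex("orders", ['ORDERS'], ['productID'])</code></pre>
<p>Relationship indexes are queried with <code>db.index.fulltext.queryRelationships</code>, which yields <code>relationship</code> instead of <code>node</code>.</p>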

<p>Drop an index</p> <pre><code class="hljs css"><span class="hljs-selector-tag">call</span> <span class="hljs-selector-tag">db</span><span class="hljs-selector-class">.index</span><span class="hljs-selector-class">.fulltext</span><span class="hljs-selector-class">.drop</span>("<span class="hljs-selector-tag">all</span>")</code></pre> <p>All property keys and labels can be listed with built-in procedures (run each statement on its own)</p> <pre><code class="hljs dos"><span class="hljs-keyword">call</span> db.propertyKeys
<span class="hljs-keyword">call</span> db.labels</code></pre> <h3 id="查询">Querying</h3> <p>Querying is very simple: remembering a single statement is enough to cover most scenarios.</p> <pre><code class="hljs sql"><span class="hljs-keyword">call</span> db.index.fulltext.queryNodes(
    <span class="hljs-string">'all'</span>, // index name
    <span class="hljs-string">'Av'</span>   // Lucene query string
) yield node
<span class="hljs-keyword">where</span> node.address contains <span class="hljs-string">"12"</span> // where clause
<span class="hljs-keyword">return</span> node
<span class="hljs-keyword">order</span> <span class="hljs-keyword">by</span> node.address // order by, skip, and limit
<span class="hljs-keyword">skip</span> <span class="hljs-number">0</span>
<span class="hljs-keyword">limit</span> <span class="hljs-number">1</span></code></pre>
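<p>The statement above yields only <code>node</code>. Since full-text queries can also return a relevance score, a variant that yields <code>score</code> and uses field-qualified Lucene terms might look like the sketch below. It assumes the <code>all</code> index created earlier still exists, and the search terms are arbitrary examples that may or may not match anything in your data.</p>
<pre><code class="hljs">call db.index.fulltext.queryNodes(
    'all',                                            // index name
    'productName:chocolate OR description:candies'    // Lucene query restricted to specific fields
) yield node, score
return labels(node) as labels,
       coalesce(node.productName, node.categoryName) as name,
       score
order by score desc   // best matches first
limit 5</code></pre>
<p>Sorting by <code>score</code> is most useful for term and phrase queries. For wildcard or prefix queries, expect Lucene to give every hit the same score, so sort on a property instead.</p>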

</div>

04-03 16:38