Advanced Usage of the Distributed Search Engine ElasticSearch (Part 5)

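1. IK Analyzer

Installing the IK analysis plugin

Download link

Run the installation

Install from a local file. From the plugins directory of the ES installation, run the plugin install command: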

[elsearch@localhost plugins]$ ../bin/elasticsearch-plugin install file:///usr/local/elasticsearch-7.10.2/elasticsearch-analysis-ik-7.10.2.zip

When the installation succeeds, the installer prints output like this:

-> Installing file:///usr/local/elasticsearch-7.10.2/elasticsearch-analysis-ik-7.10.2.zip
-> Downloading file:///usr/local/elasticsearch-7.10.2/elasticsearch-analysis-ik-7.10.2.zip
[=================================================] 100%  
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@     WARNING: plugin requires additional permissions     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
* java.net.SocketPermission * connect,resolve
See http://docs.oracle.com/javase/8/docs/technotes/guides/security/permissions.html
for descriptions of what these permissions allow and the associated risks.

Continue with installation? [y/N]y
-> Installed analysis-ik

Restart the ElasticSearch service

Testing the IK analyzer

Standard analyzer:

GET _analyze?pretty
{
  "analyzer": "standard",
  "text": "我爱我的祖国"
}

IK smart (coarse-grained) analyzer:

GET _analyze?pretty
{
  "analyzer": "ik_smart",
  "text": "我爱我的祖国"
}

IK max-word (fine-grained) analyzer:

GET _analyze?pretty
{
  "analyzer": "ik_max_word",
  "text": "我爱我的祖国与家乡"
}

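Best practice for the IK analyzer

analyzer sets the analyzer used when building the index; search_analyzer sets the analyzer applied to the search keywords.

In practice, index with ik_max_word so the text is split into as many terms as possible, and search with ik_smart so the query is split coarsely. This combination matches the widest range of results.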
```
PUT /orders_test
{
	"mappings": {
		"properties": {
			"goodsName": {
				"type": "text",
				"analyzer": "ik_max_word",
				"search_analyzer": "ik_smart"
			}
		}
	}
}
```
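
To see why the two analyzers are paired this way, consider a toy inverted index. The Python sketch below is only an illustration: the token lists are hand-written assumptions about what fine-grained and coarse-grained analysis might produce, not real output from the IK plugin, and `build_index` and `search` are hypothetical helpers.

```python
from collections import defaultdict


def build_index(docs):
    """Map each index-time token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, tokens in docs.items():
        for token in tokens:
            index[token].add(doc_id)
    return index


def search(index, query_tokens):
    """Return ids of documents that contain at least one query token."""
    hits = set()
    for token in query_tokens:
        hits |= index.get(token, set())
    return hits


# Assumed tokens for one goods name, NOT real IK output.
fine_docs = {1: ["中华人民共和国", "中华人民", "中华", "人民共和国", "人民", "共和国"]}
coarse_docs = {1: ["中华人民共和国"]}

query = ["人民"]  # assumed coarse tokens of the search keyword

fine_hits = search(build_index(fine_docs), query)
coarse_hits = search(build_index(coarse_docs), query)
print(fine_hits)    # the fine-grained index contains the term "人民"
print(coarse_hits)  # the coarse-grained index only holds the full phrase
```

Because the fine-grained index stores the shorter terms as well, a short query term still finds the document, while the coarse query side keeps the keyword from being split into noisy fragments.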

2. Full Index Build

Download logstash

Download link

Install the logstash-input-jdbc plugin

Go to the logstash directory and run:
```
bin/logstash-plugin install logstash-input-jdbc
```

Set up the MySQL driver JAR

[root@localhost bin]# mkdir mysql
[root@localhost bin]# cp mysql-connector-java-5.1.34.jar /usr/local/logstash-7.10.2/bin/mysql/

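Configure the JDBC connection

The index data is read from MySQL with a select statement and then imported into elasticsearch through a logstash-input-jdbc configuration file.

In the /usr/local/logstash-7.10.2/bin/mysql/ directory, create the files jdbc.conf and jdbc.sql.

jdbc.conf file: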

input {
    stdin {
    }
    jdbc {
        # MySQL connection string; "users" is the database name
        jdbc_connection_string => "jdbc:mysql://127.0.0.1:3306/users"
        # Username and password
        jdbc_user => "root"
        jdbc_password => "654321"
        # Path to the driver JAR
        jdbc_driver_library => "/usr/local/logstash-7.10.2/bin/mysql/mysql-connector-java-5.1.34.jar"
        # Driver class name
        jdbc_driver_class => "com.mysql.jdbc.Driver"
        jdbc_paging_enabled => "true"
        jdbc_page_size => "50000"
        # Path and name of the SQL file to execute
        statement_filepath => "/usr/local/logstash-7.10.2/bin/mysql/jdbc.sql"
        # Polling schedule in cron syntax; fields from left to right are minute, hour, day of month, month, day of week. All * means the query runs every minute
        schedule => "* * * * *"
    }
}

output {
	elasticsearch {
		# ES connection info
		hosts => ["10.10.20.28:9200"]
		# Index name
		index => "users"
		document_type => "_doc"
		# Use the database id column as the document ID in the index
		document_id => "%{id}"
	}
	stdout {
		# Print each event as a JSON line
		codec => json_lines
	}
}
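
Since the schedule reruns the query every minute, the same rows reach the output again and again. Setting document_id to the database id means each run overwrites the existing document instead of adding a new one. The Python sketch below models the index as a plain dictionary; the `sync` helper and the sample rows are hypothetical and only illustrate the overwrite-by-id behaviour.

```python
def sync(index, rows):
    """Write each row under its id, overwriting any earlier copy."""
    for row in rows:
        index[str(row["id"])] = dict(row)
    return index


# Hypothetical rows standing in for the result of jdbc.sql.
rows = [
    {"id": 1, "menuName": "user management"},
    {"id": 2, "menuName": "role management"},
]

index = {}
sync(index, rows)  # first scheduled run
sync(index, rows)  # second scheduled run with the same rows
print(len(index))  # 2
```

Without a stable document ID, every scheduled run would add another copy of each row.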

jdbc.sql file:

select `id`, `institutionTypeId`, `menuCode`, `menuName`, `menuUri`, `menuLevel`, `componentSrc` from t_authority_menu
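
With jdbc_paging_enabled set, the statement is run page by page rather than in one pass, with jdbc_page_size rows per page. The Python sketch below shows the general idea. The exact SQL the plugin generates is not shown in this article, so the wrapping subquery, the alias `t1` and the helper `paged_statements` are assumptions.

```python
def paged_statements(statement, total_rows, page_size=50000):
    """Yield one LIMIT/OFFSET query per page of the original statement."""
    for offset in range(0, total_rows, page_size):
        yield (
            f"SELECT * FROM ({statement}) AS t1 "
            f"LIMIT {page_size} OFFSET {offset}"
        )


# A hypothetical table of 120000 rows needs three pages of 50000.
pages = list(paged_statements("select id from t_authority_menu", 120000))
for sql in pages:
    print(sql)
```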

Create the ES index

PUT /users?include_type_name=false
{
	"settings": {
		"number_of_shards": 1,
		"number_of_replicas": 0
	},
	"mappings": {
		"properties": {
			"id": {
				"type": "integer"
			},
			"institutionTypeId": {
				"type": "text",
				"analyzer": "whitespace",
				"fielddata": true
			},
			"menuCode": {
				"type": "text"
			},
			"menuName": {
				"type": "text",
				"analyzer": "ik_max_word",
				"search_analyzer": "ik_smart"
			},
			"menuUri": {
				"type": "text"
			},
			"menuLevel": {
				"type": "integer"
			},
			"componentSrc": {
				"type": "text"
			}
		}
	}
}

Run the full sync

Run the command:

./logstash -f mysql/jdbc.conf

Check the result:

GET /users/_search
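
To check a specific field instead of listing every document, a match query on menuName can be sent to the same endpoint. The Python sketch below only builds the request body; the helper `menu_name_query` and the sample keyword are hypothetical.

```python
import json


def menu_name_query(keyword, size=10):
    """Build a request body for /users/_search that matches on menuName."""
    return {"query": {"match": {"menuName": keyword}}, "size": size}


# Sample keyword chosen only for illustration.
body = menu_name_query("用户管理")
print(json.dumps(body, ensure_ascii=False, indent=2))
```

Because menuName is mapped with search_analyzer ik_smart, the keyword is analyzed with ik_smart at query time.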

3. Incremental Index Sync

Modify the jdbc.conf configuration file:

Add the following settings:

input {
	jdbc{
		# Set the time zone
		jdbc_default_timezone => "Asia/Shanghai"
		...
		# File that stores the checkpoint for incremental sync
		last_run_metadata_path => "/usr/local/logstash-7.10.2/bin/mysql/last_value"
	}
}

Modify the jdbc.sql file

Incremental sync is driven by the last update time:

select `id`, `institutionTypeId`, `menuCode`, `menuName`, `menuUri`, `menuLevel`, `componentSrc` from t_authority_menu where last_update_time > :sql_last_value
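
The checkpoint logic behind :sql_last_value can be simulated with an in-memory SQLite table. This Python sketch assumes the plugin's default behaviour, where the stored value is the time of the previous run. The simplified table schema, the sample rows and the helper `run_once` are hypothetical stand-ins, not the real t_authority_menu data.

```python
import sqlite3


def run_once(conn, last_value, now):
    """Fetch rows updated after last_value; return them with the new checkpoint."""
    rows = conn.execute(
        "SELECT id, menuName FROM t_authority_menu "
        "WHERE last_update_time > ? ORDER BY id",
        (last_value,),
    ).fetchall()
    return rows, now  # assumed: the run time becomes the next checkpoint


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t_authority_menu "
    "(id INTEGER PRIMARY KEY, menuName TEXT, last_update_time TEXT)"
)
conn.executemany(
    "INSERT INTO t_authority_menu VALUES (?, ?, ?)",
    [(1, "menu-a", "2019-12-31 08:00:00"), (2, "menu-b", "2020-03-01 09:30:00")],
)

# First run starts from the initial checkpoint.
first, checkpoint = run_once(conn, "2020-01-01 00:00:00", "2020-05-01 00:00:00")

# A row is modified later, so only that row is picked up next time.
conn.execute(
    "UPDATE t_authority_menu SET menuName = 'menu-a2', "
    "last_update_time = '2020-06-01 10:00:00' WHERE id = 1"
)
second, checkpoint = run_once(conn, checkpoint, "2020-07-01 00:00:00")

print(first)   # [(2, 'menu-b')]
print(second)  # [(1, 'menu-a2')]
```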

Create the file that records the last sync time

vi /usr/local/logstash-7.10.2/bin/mysql/last_value

Set an initial time:

2020-01-01 00:00:00

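Verification

1) Start logstash; it loads the rows updated after the initial time.

2) When a row's update time changes, logstash detects it on the next scheduled run and syncs the changed rows to ES.

This article was written and shared by mirson. For further discussion, join QQ group 19310171 or visit www.softart.cn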
