Elasticsearch is a search engine based on the Lucene library. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents. Elasticsearch is developed in Java. This blog is record what I learn from this geek Course
The recommendation for learning Elasticsearch.
Getting started with elasticsearch
This course github https://github.com/geektime-geekbang/geektime-ELK
Elasticsearch https://www.elastic.co/
- Elasticsearch Certification https://www.elastic.co/cn/training/certification
- ElasticSearch Engineering I training https://www.elastic.co/cn/training/elasticsearch-engineer-1
- ElasticSearch Engineering II training https://www.elastic.co/cn/training/elasticsearch-engineer-2
Install elasticsearch
Install can refer offical docs
From version 7.0, don’t need install java by yourself.
directory | Vaconfig filelue | description |
---|---|---|
bin | execution file | |
config | elasticsearch.yml | es cluster config |
JDK | Java env | |
data | path.data | es data file |
lib | Java lib | |
logs | path.log | log file |
modules | including all es modules | |
plugins | including all installed plugins |
1 | # adjust JVM - config/jvm.options |
Config recommendation
- Xms equal to Xmx
- Xmx don’t larger than 50% of memory
- Xmx no larger than 30GB can refer https://www.elastic.co/blog/a-heap-of-trouble
plugins
1 |
|
Run multiple nodes in the same machine1
2
3
4bin/elasticsearch -E node.name=node0 -E cluster.name=geektime -E path.dat= node0_data -d
bin/elasticsearch -E node.name=node1 -E cluster.name=geektime -E path.dat= node1_data -d
bin/elasticsearch -E node.name=node2 -E cluster.name=geektime -E path.dat= node2_data -d
bin/elasticsearch -E node.name=node3 -E cluster.name=geektime -E path.dat= node3_data -d
docker
Use docker-compose start cerebro kibana and 2 elasticsearch1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71version: '2.2'
services:
cerebro:
image: lmenezes/cerebro:0.8.5
container_name: cerebro
ports:
- "9000:9000"
command:
- -Dhosts.0.host=http://elasticsearch:9200
networks:
- es7net
kibana:
image: docker.elastic.co/kibana/kibana:7.1.0
container_name: kibana7
environment:
- I18N_LOCALE=zh-CN
- XPACK_GRAPH_ENABLED=true
- TIMELION_ENABLED=true
- XPACK_MONITORING_COLLECTION_ENABLED="true"
ports:
- "5601:5601"
networks:
- es7net
elasticsearch:
image: docker.elastic.co/elasticsearch/elasticsearch:7.1.0
container_name: es7_01
environment:
- cluster.name=geektime
- node.name=es7_01
- bootstrap.memory_lock=true
- "ES_JAVA_OPTS=-Xms512m -Xmx512m"
- discovery.seed_hosts=es7_01,es7_02
- cluster.initial_master_nodes=es7_01,es7_02
ulimits:
memlock:
soft: -1
hard: -1
volumes:
- es7data1:/usr/share/elasticsearch/data
ports:
- 9200:9200
networks:
- es7net
elasticsearch2:
image: docker.elastic.co/elasticsearch/elasticsearch:7.1.0
container_name: es7_02
environment:
- cluster.name=geektime
- node.name=es7_02
- bootstrap.memory_lock=true
- "ES_JAVA_OPTS=-Xms512m -Xmx512m"
- discovery.seed_hosts=es7_01,es7_02
- cluster.initial_master_nodes=es7_01,es7_02
ulimits:
memlock:
soft: -1
hard: -1
volumes:
- es7data2:/usr/share/elasticsearch/data
networks:
- es7net
volumes:
es7data1:
driver: local
es7data2:
driver: local
networks:
es7net:
driver: bridge
logstash
Make sure all the version is the same for elasticsearch logstash and kibana
Now the latest version is 7.6, because this course use 7.1, https://www.elastic.co/downloads/past-releases/logstash-7-1-0
1 | # download logstash,(7.1.0)in mac you need also install JAVA |
- Download MovieLens data set:https://grouplens.org/datasets/movielens/
- Logstash download:https://www.elastic.co/cn/downloads/logstash
- Logstash docs:https://www.elastic.co/guide/en/logstash/current/index.html
Elasticsearch basic
Document is the smallest unit of elasticsearch. Document will transform to json and store in es. Every document has unique ID.
- Mutiple Tyeps https://www.elastic.co/cn/blog/moving-from-types-to-typeless-apis-in-elasticsearch-7-0
- CAT Index API https://www.elastic.co/guide/en/elasticsearch/reference/current/cat-indices.html
Index 相关 API1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26#查看索引相关信息 check index info
GET kibana_sample_data_ecommerce
#查看索引的文档总数 check index count
GET kibana_sample_data_ecommerce/_count
#查看前10条文档,了解文档格式 check first 10 index
POST kibana_sample_data_ecommerce/_search
{
}
#_cat indices API
#查看indices
GET /_cat/indices/kibana*?v&s=index
#查看状态为绿的索引
GET /_cat/indices?v&health=green
#按照文档个数排序
GET /_cat/indices?v&s=docs.count:desc
#查看具体的字段
GET /_cat/indices/kibana*?pri&v&h=health,index,pri,rep,docs.count,mt
#How much memory is used per index?
GET /_cat/indices?v&h=i,tm&s=tm:desc
Node & Shard
https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-node.html
- Master-eligible node: A node that has node.master set to true (default), which makes it eligible to be elected as the master node, which controls the cluster.
- Data node: A node that has node.data set to true (default). Data nodes hold data and perform data related operations such as CRUD, search, and aggregations.
- Ingest node: A node that has node.ingest set to true (default). Ingest nodes are able to apply an ingest pipeline to a document in order to transform and enrich the document before indexing. With a heavy ingest load, it makes sense to use dedicated ingest nodes and to mark the master and data nodes as node.ingest: false.
- Machine learning node: A node that has xpack.ml.enabled and node.ml set to true, which is the default behavior in the Elasticsearch default distribution. If you want to use machine learning features, there must be at least one machine learning node in your cluster. For more information about machine learning features, see Machine learning in the Elastic Stack.
https://www.elastic.co/guide/en/elasticsearch/reference/current/glossary.html
- primary shard: Each document is stored in a single primary shard. When you index a document, it is indexed first on the primary shard, then on all replicas of the primary shard. By default, an index has one primary shard. You can specify more primary shards to scale the number of documents that your index can handle. You cannot change the number of primary shards in an index, once the index is created. However, an index can be split into a new index using the split API. See also routing
- replica shard: Each primary shard can have zero or more replicas. A replica is a copy of the primary shard, and has two purposes:
- increase failover: a replica shard can be promoted to a primary shard if the primary fails
- increase performance: get and search requests can be handled by primary or replica shards.
By default, each primary shard has one replica, but the number of replicas can be changed dynamically on an existing index. A replica shard will never be started on the same node as its primary shard.
cluster-health
https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-health.html1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20get _cat/nodes?v
GET /_nodes/es7_01,es7_02
GET /_cat/nodes?v
GET /_cat/nodes?v&h=id,ip,port,v,m
GET _cluster/health
GET _cluster/health?level=shards
GET /_cluster/health/kibana_sample_data_ecommerce,kibana_sample_data_flights
GET /_cluster/health/kibana_sample_data_flights?level=shards
#### cluster state
The cluster state API allows access to metadata representing the state of the whole cluster. This includes information such as
GET /_cluster/state
#cluster get settings
GET /_cluster/settings
GET /_cluster/settings?include_defaults=true
GET _cat/shards
GET _cat/shards?h=index,shard,prirep,state,unassigned.reason
- CAT Nodes API https://www.elastic.co/guide/en/elasticsearch/reference/current/cat-nodes.html
- Cluster API https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster.html
- CAT Shards API https://www.elastic.co/guide/en/elasticsearch/reference/current/cat-shards.html
CRUD
- Document API https://www.elastic.co/guide/en/elasticsearch/reference/current/docs.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147############Create Document############
#create document. 自动生成 _id
POST users/_doc
{
"user" : "Mike",
"post_date" : "2019-04-15T14:12:12",
"message" : "trying out Kibana"
}
#create document. 指定Id。如果id已经存在,报错
PUT users/_doc/1?op_type=create
{
"user" : "Jack",
"post_date" : "2019-05-15T14:12:12",
"message" : "trying out Elasticsearch"
}
#create document. 指定 ID 如果已经存在,就报错
PUT users/_create/1
{
"user" : "Jack",
"post_date" : "2019-05-15T14:12:12",
"message" : "trying out Elasticsearch"
}
### Get Document by ID
#Get the document by ID
GET users/_doc/1
### Index & Update
#Update 指定 ID (先删除,在写入)
GET users/_doc/1
PUT users/_doc/1
{
"user" : "Mike"
}
#GET users/_doc/1
#在原文档上增加字段
POST users/_update/1/
{
"doc":{
"post_date" : "2019-05-15T14:12:12",
"message" : "trying out Elasticsearch"
}
}
### Delete by Id
# 删除文档
DELETE users/_doc/1
### Bulk 操作
#执行两次,查看每次的结果
#执行第1次
POST _bulk
{ "index" : { "_index" : "test", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_id" : "2" } }
{ "create" : { "_index" : "test2", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_index" : "test"} }
{ "doc" : {"field2" : "value2"} }
#执行第2次
POST _bulk
{ "index" : { "_index" : "test", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_id" : "2" } }
{ "create" : { "_index" : "test2", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_index" : "test"} }
{ "doc" : {"field2" : "value2"} }
### mget 操作
GET /_mget
{
"docs" : [
{
"_index" : "test",
"_id" : "1"
},
{
"_index" : "test",
"_id" : "2"
}
]
}
#URI中指定index
GET /test/_mget
{
"docs" : [
{
"_id" : "1"
},
{
"_id" : "2"
}
]
}
GET /_mget
{
"docs" : [
{
"_index" : "test",
"_id" : "1",
"_source" : false
},
{
"_index" : "test",
"_id" : "2",
"_source" : ["field3", "field4"]
},
{
"_index" : "test",
"_id" : "3",
"_source" : {
"include": ["user"],
"exclude": ["user.location"]
}
}
]
}
### msearch 操作
POST kibana_sample_data_ecommerce/_msearch
{}
{"query" : {"match_all" : {}},"size":1}
{"index" : "kibana_sample_data_flights"}
{"query" : {"match_all" : {}},"size":2}
### 清除测试数据
#清除数据 delete data
DELETE users
DELETE test
DELETE test2
inverted-index
- https://zh.wikipedia.org/wiki/%E5%80%92%E6%8E%92%E7%B4%A2%E5%BC%95
- https://www.elastic.co/guide/cn/elasticsearch/guide/current/inverted-index.html
Demo1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17POST _analyze
{
"analyzer": "standard",
"text": "Mastering Elasticsearch"
}
POST _analyze
{
"analyzer": "standard",
"text": "Elasticsearch Server"
}
POST _analyze
{
"analyzer": "standard",
"text": "Elasticsearch Essentials"
}
analyzer
Demo for practice1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
#Simple Analyzer – 按照非字母切分(符号被过滤),小写处理
#Stop Analyzer – 小写处理,停用词过滤(the,a,is)
#Whitespace Analyzer – 按照空格切分,不转小写
#Keyword Analyzer – 不分词,直接将输入当作输出
#Patter Analyzer – 正则表达式,默认 \W+ (非字符分隔)
#Language – 提供了30多种常见语言的分词器
#2 running Quick brown-foxes leap over lazy dogs in the summer evening
#查看不同的analyzer的效果
#standard
GET _analyze
{
"analyzer": "standard",
"text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}
#simpe
GET _analyze
{
"analyzer": "simple",
"text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}
GET _analyze
{
"analyzer": "stop",
"text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}
#stop
GET _analyze
{
"analyzer": "whitespace",
"text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}
#keyword
GET _analyze
{
"analyzer": "keyword",
"text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}
GET _analyze
{
"analyzer": "pattern",
"text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}
#english
GET _analyze
{
"analyzer": "english",
"text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}
POST _analyze
{
"analyzer": "icu_analyzer",
"text": "他说的确实在理”"
}
POST _analyze
{
"analyzer": "standard",
"text": "他说的确实在理”"
}
POST _analyze
{
"analyzer": "icu_analyzer",
"text": "这个苹果不大好吃"
}
- https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-analyze.html
- https://www.elastic.co/guide/en/elasticsearch/reference/current/analyzer-anatomy.html
URI Search
- https://www.elastic.co/guide/en/elasticsearch/reference/current/search-search.html
- https://searchenginewatch.com/sew/news/2065080/search-engines-101
- https://www.huffpost.com/entry/search-engines-101-part-i_b_1104525
- https://www.entrepreneur.com/article/176398
- https://www.searchtechnologies.com/meaning-of-relevancy
Grammer | search range |
---|---|
/_search | all index in cluster |
/index1/_search | only index1 |
/index,index2/_search | index1 and index2 |
/index*/_search | regex index* |
1 | #URI Query |
- https://www.elastic.co/guide/en/elasticsearch/reference/current/search-uri-request.html
- https://www.elastic.co/guide/en/elasticsearch/reference/current/search-search.html
fields | function |
---|---|
q | query string syntax |
df | default field |
sort | sort by xx |
from size | for page |
profile | check the query process |
A B | A OR B |
“A B” | A AND B |
title:(A AND B) | title=”A AND B” |
1 | #基本查询 |
Request Body
- https://www.elastic.co/guide/en/elasticsearch/reference/current/search-uri-request.html
- https://www.elastic.co/guide/en/elasticsearch/reference/current/search-search.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103#ignore_unavailable=true,可以忽略尝试访问不存在的索引“404_idx”导致的报错
#查询movies分页
POST /movies,404_idx/_search?ignore_unavailable=true
{
"profile": true,
"query": {
"match_all": {}
}
}
POST /kibana_sample_data_ecommerce/_search
{
"from":10,
"size":20,
"query":{
"match_all": {}
}
}
#对日期排序
POST kibana_sample_data_ecommerce/_search
{
"sort":[{"order_date":"desc"}],
"query":{
"match_all": {}
}
}
#source filtering
POST kibana_sample_data_ecommerce/_search
{
"_source":["order_date"],
"query":{
"match_all": {}
}
}
#脚本字段 painless script
GET kibana_sample_data_ecommerce/_search
{
"script_fields": {
"new_field": {
"script": {
"lang": "painless",
"source": "doc['order_date'].value+'hello'"
}
}
},
"query": {
"match_all": {}
}
}
POST movies/_search
{
"query": {
"match": {
"title": "last christmas"
}
}
}
POST movies/_search
{
"query": {
"match": {
"title": {
"query": "last christmas",
"operator": "and"
}
}
}
}
POST movies/_search
{
"query": {
"match_phrase": {
"title":{
"query": "one love"
}
}
}
}
# slop can increase search area
POST movies/_search
{
"query": {
"match_phrase": {
"title":{
"query": "one love",
"slop": 1
}
}
}
}
Simple Query String
- Not support AND OR NOT, will transform thses to string
- Support + for AND, | for OR, - for NOT
1 | PUT /users/_doc/1 |
Dynamic Mapping
- https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-mapping.html
- https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-field-mapping.html
true | false | strict | |
---|---|---|---|
document can be index | YES | YES | NO |
fields can be index | YES | NO | NO |
mapping can be update | YES | NO | NO |
1 | #写入文档,查看 Mapping |
Mapping Setting
- Mapping Parameters https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-params.html
Index Option | Record Type |
---|---|
docs | doc id |
freqs | doc id, term frequencies |
positions | doc id, term frequencies, term position |
offsets | doc id, term frequencies, term position, character offects |
1 | #设置 index 为 false |
Analyzer
- character filters: HTML strip, Mapping, Pattern replace
- tokenizer: whitespace / standard / uax_url_email / pattern / keyword / path hierarchy
- token filter: Lowercase / stop / synonym
1 | PUT logs/_doc/1 |
Dynamic Template
- Index Templates https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-templates.html
- Dynamic Template https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-mapping.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137#数字字符串被映射成text,日期字符串被映射成日期
PUT ttemplate/_doc/1
{
"someNumber":"1",
"someDate":"2019/01/01"
}
GET ttemplate/_mapping
#Create a default template
PUT _template/template_default
{
"index_patterns": ["*"],
"order" : 0,
"version": 1,
"settings": {
"number_of_shards": 1,
"number_of_replicas":1
}
}
PUT /_template/template_test
{
"index_patterns" : ["test*"],
"order" : 1,
"settings" : {
"number_of_shards": 1,
"number_of_replicas" : 2
},
"mappings" : {
"date_detection": false,
"numeric_detection": true
}
}
#查看template信息
GET /_template/template_default
GET /_template/temp*
#写入新的数据,index以test开头
PUT testtemplate/_doc/1
{
"someNumber":"1",
"someDate":"2019/01/01"
}
GET testtemplate/_mapping
get testtemplate/_settings
PUT testmy
{
"settings":{
"number_of_replicas":5
}
}
put testmy/_doc/1
{
"key":"value"
}
get testmy/_settings
DELETE testmy
DELETE /_template/template_default
DELETE /_template/template_test
#Dynaminc Mapping 根据类型和字段名
DELETE my_index
PUT my_index/_doc/1
{
"firstName":"Ruan",
"isVIP":"true"
}
GET my_index/_mapping
DELETE my_index
PUT my_index
{
"mappings": {
"dynamic_templates": [
{
"strings_as_boolean": {
"match_mapping_type": "string",
"match":"is*",
"mapping": {
"type": "boolean"
}
}
},
{
"strings_as_keywords": {
"match_mapping_type": "string",
"mapping": {
"type": "keyword"
}
}
}
]
}
}
DELETE my_index
#结合路径
PUT my_index
{
"mappings": {
"dynamic_templates": [
{
"full_name": {
"path_match": "name.*",
"path_unmatch": "*.middle",
"mapping": {
"type": "text",
"copy_to": "full_name"
}
}
}
]
}
}
PUT my_index/_doc/1
{
"name": {
"first": "John",
"middle": "Winston",
"last": "Lennon"
}
}
GET my_index/_search?q=full_name:John
aggregations
- Bucket Aggregation: A family of aggregations that build buckets, where each bucket is associated with a key and a document criterion. When the aggregation is executed, all the buckets criteria are evaluated on every document in the context and when a criterion matches, the document is considered to “fall in” the relevant bucket. By the end of the aggregation process, we’ll end up with a list of buckets - each one with a set of documents that “belong” to it.
- Metric Aggregation: Aggregations that keep track and compute metrics over a set of documents.
- Pipeline Aggregation: Aggregations that aggregate the output of other aggregations and their associated metrics
- Matrix Aggregation: A family of aggregations that operate on multiple fields and produce a matrix result based on the values extracted from the requested document fields. Unlike metric and bucket aggregations, this aggregation family does not yet support scripting.
1 | #按照目的地进行分桶统计 |
3 chapter
term query & full query
- https://www.elastic.co/guide/en/elasticsearch/reference/current/term-level-queries.html
- https://www.elastic.co/guide/en/elasticsearch/reference/current/full-text-queries.html
1 | DELETE products |
structured search
- https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-exists-query.html
- https://www.elastic.co/guide/en/elasticsearch/reference/current/term-level-queries.html
1 | #结构化搜索,精确匹配 |
relevance
- TP = term frequency
- IDF = inverse document frequency
1 | PUT testscore |
Query&Filtering
- https://www.elastic.co/guide/en/elasticsearch/reference/current/query-filter-context.html
- https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-boosting-query.html
Demo1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296POST /products/_bulk
{ "index": { "_id": 1 }}
{ "price" : 10,"avaliable":true,"date":"2018-01-01", "productID" : "XHDK-A-1293-#fJ3" }
{ "index": { "_id": 2 }}
{ "price" : 20,"avaliable":true,"date":"2019-01-01", "productID" : "KDKE-B-9947-#kL5" }
{ "index": { "_id": 3 }}
{ "price" : 30,"avaliable":true, "productID" : "JODL-X-1937-#pV7" }
{ "index": { "_id": 4 }}
{ "price" : 30,"avaliable":false, "productID" : "QQPX-R-3956-#aD8" }
#基本语法
POST /products/_search
{
"query": {
"bool" : {
"must" : {
"term" : { "price" : "30" }
},
"filter": {
"term" : { "avaliable" : "true" }
},
"must_not" : {
"range" : {
"price" : { "lte" : 10 }
}
},
"should" : [
{ "term" : { "productID.keyword" : "JODL-X-1937-#pV7" } },
{ "term" : { "productID.keyword" : "XHDK-A-1293-#fJ3" } }
],
"minimum_should_match" :1
}
}
}
#改变数据模型,增加字段。解决数组包含而不是精确匹配的问题
POST /newmovies/_bulk
{ "index": { "_id": 1 }}
{ "title" : "Father of the Bridge Part II","year":1995, "genre":"Comedy","genre_count":1 }
{ "index": { "_id": 2 }}
{ "title" : "Dave","year":1993,"genre":["Comedy","Romance"],"genre_count":2 }
#must,有算分
POST /newmovies/_search
{
"query": {
"bool": {
"must": [
{"term": {"genre.keyword": {"value": "Comedy"}}},
{"term": {"genre_count": {"value": 1}}}
]
}
}
}
#Filter。不参与算分,结果的score是0
POST /newmovies/_search
{
"query": {
"bool": {
"filter": [
{"term": {"genre.keyword": {"value": "Comedy"}}},
{"term": {"genre_count": {"value": 1}}}
]
}
}
}
#Filtering Context
POST _search
{
"query": {
"bool" : {
"filter": {
"term" : { "avaliable" : "true" }
},
"must_not" : {
"range" : {
"price" : { "lte" : 10 }
}
}
}
}
}
#Query Context
POST /products/_bulk
{ "index": { "_id": 1 }}
{ "price" : 10,"avaliable":true,"date":"2018-01-01", "productID" : "XHDK-A-1293-#fJ3" }
{ "index": { "_id": 2 }}
{ "price" : 20,"avaliable":true,"date":"2019-01-01", "productID" : "KDKE-B-9947-#kL5" }
{ "index": { "_id": 3 }}
{ "price" : 30,"avaliable":true, "productID" : "JODL-X-1937-#pV7" }
{ "index": { "_id": 4 }}
{ "price" : 30,"avaliable":false, "productID" : "QQPX-R-3956-#aD8" }
POST /products/_search
{
"query": {
"bool": {
"should": [
{
"term": {
"productID.keyword": {
"value": "JODL-X-1937-#pV7"}}
},
{"term": {"avaliable": {"value": true}}
}
]
}
}
}
#嵌套,实现了 should not 逻辑
POST /products/_search
{
"query": {
"bool": {
"must": {
"term": {
"price": "30"
}
},
"should": [
{
"bool": {
"must_not": {
"term": {
"avaliable": "false"
}
}
}
}
],
"minimum_should_match": 1
}
}
}
#Controll the Precision
POST _search
{
"query": {
"bool" : {
"must" : {
"term" : { "price" : "30" }
},
"filter": {
"term" : { "avaliable" : "true" }
},
"must_not" : {
"range" : {
"price" : { "lte" : 10 }
}
},
"should" : [
{ "term" : { "productID.keyword" : "JODL-X-1937-#pV7" } },
{ "term" : { "productID.keyword" : "XHDK-A-1293-#fJ3" } }
],
"minimum_should_match" :2
}
}
}
POST /animals/_search
{
"query": {
"bool": {
"should": [
{ "term": { "text": "brown" }},
{ "term": { "text": "red" }},
{ "term": { "text": "quick" }},
{ "term": { "text": "dog" }}
]
}
}
}
POST /animals/_search
{
"query": {
"bool": {
"should": [
{ "term": { "text": "quick" }},
{ "term": { "text": "dog" }},
{
"bool":{
"should":[
{ "term": { "text": "brown" }},
{ "term": { "text": "brown" }},
]
}
}
]
}
}
}
DELETE blogs
POST /blogs/_bulk
{ "index": { "_id": 1 }}
{"title":"Apple iPad", "content":"Apple iPad,Apple iPad" }
{ "index": { "_id": 2 }}
{"title":"Apple iPad,Apple iPad", "content":"Apple iPad" }
POST blogs/_search
{
"query": {
"bool": {
"should": [
{"match": {
"title": {
"query": "apple,ipad",
"boost": 1.1
}
}},
{"match": {
"content": {
"query": "apple,ipad",
"boost":
}
}}
]
}
}
}
DELETE news
POST /news/_bulk
{ "index": { "_id": 1 }}
{ "content":"Apple Mac" }
{ "index": { "_id": 2 }}
{ "content":"Apple iPad" }
{ "index": { "_id": 3 }}
{ "content":"Apple employee like Apple Pie and Apple Juice" }
POST news/_search
{
"query": {
"bool": {
"must": {
"match":{"content":"apple"}
}
}
}
}
POST news/_search
{
"query": {
"bool": {
"must": {
"match":{"content":"apple"}
},
"must_not": {
"match":{"content":"pie"}
}
}
}
}
POST news/_search
{
"query": {
"boosting": {
"positive": {
"match": {
"content": "apple"
}
},
"negative": {
"match": {
"content": "pie"
}
},
"negative_boost": 0.5
}
}
}
Disjunction max query
Demo1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49PUT /blogs/_doc/1
{
"title": "Quick brown rabbits",
"body": "Brown rabbits are commonly seen."
}
PUT /blogs/_doc/2
{
"title": "Keeping pets healthy",
"body": "My quick brown fox eats rabbits on a regular basis."
}
POST /blogs/_search
{
"query": {
"bool": {
"should": [
{ "match": { "title": "Brown fox" }},
{ "match": { "body": "Brown fox" }}
]
}
}
}
POST blogs/_search
{
"query": {
"dis_max": {
"queries": [
{ "match": { "title": "Quick pets" }},
{ "match": { "body": "Quick pets" }}
]
}
}
}
POST blogs/_search
{
"query": {
"dis_max": {
"queries": [
{ "match": { "title": "Quick pets" }},
{ "match": { "body": "Quick pets" }}
],
"tie_breaker": 0.2
}
}
}
##
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-dis-max-query.html
Best Fields
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138POST blogs/_search
{
"query": {
"dis_max": {
"queries": [
{ "match": { "title": "Quick pets" }},
{ "match": { "body": "Quick pets" }}
],
"tie_breaker": 0.2
}
}
}
POST blogs/_search
{
"query": {
"multi_match": {
"type": "best_fields",
"query": "Quick pets",
"fields": ["title","body"],
"tie_breaker": 0.2,
"minimum_should_match": "20%"
}
}
}
POST books/_search
{
"multi_match": {
"query": "Quick brown fox",
"fields": "*_title"
}
}
POST books/_search
{
"multi_match": {
"query": "Quick brown fox",
"fields": [ "*_title", "chapter_title^2" ]
}
}
DELETE /titles
PUT /titles
{
"settings": { "number_of_shards": 1 },
"mappings": {
"my_type": {
"properties": {
"title": {
"type": "string",
"analyzer": "english",
"fields": {
"std": {
"type": "string",
"analyzer": "standard"
}
}
}
}
}
}
}
PUT /titles
{
"mappings": {
"properties": {
"title": {
"type": "text",
"analyzer": "english"
}
}
}
}
POST titles/_bulk
{ "index": { "_id": 1 }}
{ "title": "My dog barks" }
{ "index": { "_id": 2 }}
{ "title": "I see a lot of barking dogs on the road " }
GET titles/_search
{
"query": {
"match": {
"title": "barking dogs"
}
}
}
DELETE /titles
PUT /titles
{
"mappings": {
"properties": {
"title": {
"type": "text",
"analyzer": "english",
"fields": {"std": {"type": "text","analyzer": "standard"}}
}
}
}
}
POST titles/_bulk
{ "index": { "_id": 1 }}
{ "title": "My dog barks" }
{ "index": { "_id": 2 }}
{ "title": "I see a lot of barking dogs on the road " }
GET /titles/_search
{
"query": {
"multi_match": {
"query": "barking dogs",
"type": "most_fields",
"fields": [ "title", "title.std" ]
}
}
}
GET /titles/_search
{
"query": {
"multi_match": {
"query": "barking dogs",
"type": "most_fields",
"fields": [ "title^10", "title.std" ]
}
}
}multi-language
Elasticsearch IK分词插件 https://github.com/medcl/elasticsearch-analysis-ik/releases
- Elasticsearch hanlp 分词插件 https://github.com/KennFalcon/elasticsearch-analysis-hanlp
- 分词算法综述 https://zhuanlan.zhihu.com/p/50444885
- 中科院计算所NLPIR http://ictclas.nlpir.org/nlpir/
- ansj分词器 https://github.com/NLPchina/ansj_seg
- 哈工大的LTP https://github.com/HIT-SCIR/ltp
- 清华大学THULAC https://github.com/thunlp/THULAC
- 斯坦福分词器 https://nlp.stanford.edu/software/segmenter.shtml
- Hanlp分词器 https://github.com/hankcs/HanLP
- 结巴分词 https://github.com/yanyiwu/cppjieba
- KCWS分词器(字嵌入+Bi-LSTM+CRF) https://github.com/koth/kcws
- ZPar https://github.com/frcchang/zpar/releases
- IKAnalyzer https://github.com/wks/ik-analyzer
1 | #stop word |
tmdb practice
Prequest
- Python 2.7.15
- request
1 | cd tmdb-search |
Demo1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25POST tmdb/_search
{
"_source": ["title","overview"],
"query": {
"match_all": {}
}
}
POST tmdb/_search
{
"_source": ["title","overview"],
"query": {
"multi_match": {
"query": "basketball with cartoon aliens",
"fields": ["title","overview"]
}
},
"highlight" : {
"fields" : {
"overview" : { "pre_tags" : ["\\033[0;32;40m"], "post_tags" : ["\\033[0m"] },
"title" : { "pre_tags" : ["\\033[0;32;40m"], "post_tags" : ["\\033[0m"] }
}
}
}
Search Template & Index Alias
1 | POST _scripts/tmdb |
Function Score Query
Demo1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110DELETE blogs
PUT /blogs/_doc/1
{
"title": "About popularity",
"content": "In this post we will talk about...",
"votes": 0
}
PUT /blogs/_doc/2
{
"title": "About popularity",
"content": "In this post we will talk about...",
"votes": 100
}
PUT /blogs/_doc/3
{
"title": "About popularity",
"content": "In this post we will talk about...",
"votes": 1000000
}
POST /blogs/_search
{
"query": {
"function_score": {
"query": {
"multi_match": {
"query": "popularity",
"fields": [ "title", "content" ]
}
},
"field_value_factor": {
"field": "votes"
}
}
}
}
POST /blogs/_search
{
"query": {
"function_score": {
"query": {
"multi_match": {
"query": "popularity",
"fields": [ "title", "content" ]
}
},
"field_value_factor": {
"field": "votes",
"modifier": "log1p"
}
}
}
}
POST /blogs/_search
{
"query": {
"function_score": {
"query": {
"multi_match": {
"query": "popularity",
"fields": [ "title", "content" ]
}
},
"field_value_factor": {
"field": "votes",
"modifier": "log1p" ,
"factor": 0.1
}
}
}
}
POST /blogs/_search
{
"query": {
"function_score": {
"query": {
"multi_match": {
"query": "popularity",
"fields": [ "title", "content" ]
}
},
"field_value_factor": {
"field": "votes",
"modifier": "log1p" ,
"factor": 0.1
},
"boost_mode": "sum",
"max_boost": 3
}
}
}
POST /blogs/_search
{
"query": {
"function_score": {
"random_score": {
"seed": 911119
}
}
}
}
Term & Phrase Suggester
1 | DELETE articles |
Auto Complete
Pratice in Dev Tools1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89DELETE articles
PUT articles
{
"mappings": {
"properties": {
"title_completion":{
"type": "completion"
}
}
}
}
POST articles/_bulk
{ "index" : { } }
{ "title_completion": "lucene is very cool"}
{ "index" : { } }
{ "title_completion": "Elasticsearch builds on top of lucene"}
{ "index" : { } }
{ "title_completion": "Elasticsearch rocks"}
{ "index" : { } }
{ "title_completion": "elastic is the company behind ELK stack"}
{ "index" : { } }
{ "title_completion": "Elk stack rocks"}
{ "index" : {} }
POST articles/_search?pretty
{
"size": 0,
"suggest": {
"article-suggester": {
"prefix": "elk ",
"completion": {
"field": "title_completion"
}
}
}
}
DELETE comments
PUT comments
PUT comments/_mapping
{
"properties": {
"comment_autocomplete":{
"type": "completion",
"contexts":[{
"type":"category",
"name":"comment_category"
}]
}
}
}
POST comments/_doc
{
"comment":"I love the star war movies",
"comment_autocomplete":{
"input":["star wars"],
"contexts":{
"comment_category":"movies"
}
}
}
POST comments/_doc
{
"comment":"Where can I find a Starbucks",
"comment_autocomplete":{
"input":["starbucks"],
"contexts":{
"comment_category":"coffee"
}
}
}
POST comments/_search
{
"suggest": {
"MY_SUGGESTION": {
"prefix": "sta",
"completion":{
"field":"comment_autocomplete",
"contexts":{
"comment_category":"coffee"
}
}
}
}
}
Cross Cluster Search
- 在Kibana中使用Cross data search https://kelonsoftware.com/cross-cluster-search-kibana/
1 | # 启动3个集群 start 3 cluster |
split-brain
1 | # start 3 nodes in one host |
node type | config | default |
---|---|---|
master eligible | node.master | true |
data | node.data | true |
ingest | node.ingest | true |
coordinating only | NA | above 3 are false |
machine learning | node.ml | true(need x-pack) |
query fetch
1 | DELETE message |
Doc Values & Fielddata
Demo1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119#单字段排序
POST /kibana_sample_data_ecommerce/_search
{
"size": 5,
"query": {
"match_all": {
}
},
"sort": [
{"order_date": {"order": "desc"}}
]
}
#多字段排序
POST /kibana_sample_data_ecommerce/_search
{
"size": 5,
"query": {
"match_all": {
}
},
"sort": [
{"order_date": {"order": "desc"}},
{"_doc":{"order": "asc"}},
{"_score":{ "order": "desc"}}
]
}
GET kibana_sample_data_ecommerce/_mapping
#对 text 字段进行排序。默认会报错,需打开fielddata
POST /kibana_sample_data_ecommerce/_search
{
"size": 5,
"query": {
"match_all": {
}
},
"sort": [
{"customer_full_name": {"order": "desc"}}
]
}
#打开 text的 fielddata
PUT kibana_sample_data_ecommerce/_mapping
{
"properties": {
"customer_full_name" : {
"type" : "text",
"fielddata": true,
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
#关闭 keyword的 doc values
PUT test_keyword
PUT test_keyword/_mapping
{
"properties": {
"user_name":{
"type": "keyword",
"doc_values":false
}
}
}
DELETE test_keyword
PUT test_text
PUT test_text/_mapping
{
"properties": {
"intro":{
"type": "text",
"doc_values":true
}
}
}
DELETE test_text
DELETE temp_users
PUT temp_users
PUT temp_users/_mapping
{
"properties": {
"name":{"type": "text","fielddata": true},
"desc":{"type": "text","fielddata": true}
}
}
Post temp_users/_doc
{"name":"Jack","desc":"Jack is a good boy!","age":10}
#打开fielddata 后,查看 docvalue_fields数据
POST temp_users/_search
{
"docvalue_fields": [
"name","desc"
]
}
#查看整型字段的docvalues
POST temp_users/_search
{
"docvalue_fields": [
"age"
]
}
From, Size, Search_after & Scroll API
- from: where to begin search
- size: number of docs you need query
type | function |
---|---|
Regular | real-time query top docs |
Scroll | Need all docs |
Pagination | from and size, if need deep page, use Search After |
1 | # result windows is too large 10000 |
Concurrent Control
- Optimistic lock
1 | DELETE products |