
ES Knowledge

2021-07-28 21:02:28






A consolidated overview of Elasticsearch (ES) fundamentals

ElasticSearch


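Key concepts (Elasticsearch term → rough relational analogue)

index → database

type → table (mapping types are deprecated in 7.x and removed in 8.0)

document → row

field → column

mapping → schema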

Core operations

Index operations

Create/delete an index, open/close an index, add/view a mapping, set/view settings.

# Create an index
PUT /songs_v3

# Delete an index
DELETE /songs_v3

# Create an index with explicit settings
PUT /songs_v4
{
  "settings": {
    "number_of_shards": 6,
    "number_of_replicas": 1
  }
}

# Get the settings of an index
GET /songs_v4/_settings

# Update index settings
# Index settings fall into two groups:
#   static  (e.g. number_of_shards, index.shard.check_on_startup): cannot be changed while the index is open
#   dynamic (e.g. number_of_replicas): can be changed while the index is open
PUT /songs_v4/_settings
{
  "number_of_replicas": 2
}

# Rejected while the index is open, because this is a static setting
PUT /songs_v4/_settings
{
  "index.shard.check_on_startup": true
}

# Close the index
POST /songs_v4/_close

# Open the index
POST /songs_v4/_open

# Get the mapping of the index
GET /songs_v4/_mapping

# Deleting a mapping is not supported; this request is rejected
DELETE /songs_v4/_mapping

Document operations

Index/get/update/delete a document, search documents, run scripts.

# Index a document
# With an explicitly specified document ID
PUT /songs_v4/_doc/5
{
  "songName": "could this be love",
  "singer": "Jennifer Lopez",
  "lyrics": "Could This Be love, woke up This Morning Just..."
}

# With an auto-generated document ID
POST /songs_v4/_doc
{
  "songName": "could this be love",
  "singer": "Jennifer Lopez",
  "lyrics": "Could This Be love, woke up This Morning Just..."
}

# Update a document (a PUT to an existing ID replaces the whole document)
PUT /songs_v4/_doc/5
{
  "songName": "could this be love",
  "singer": "zp",
  "lyrics": "Could This Be love, woke up This Morning Just..."
}

# Get a document by ID
GET /songs_v4/_doc/5

# Delete a document by ID
DELETE /songs_v4/_doc/5

# Search documents with a URI query string
GET /songs_v4/_search?q=singer:Jennifer

# Inspect the mapping that was generated dynamically for the indexed fields
GET /songs_v4/_mapping

Mapping operations

# Create the index first, then add a mapping
PUT /books
PUT /books/_mapping
{
  "properties": {
    "bookName": {"type": "text"},
    "content": {"type": "text"}
  }
}
GET /books/_mapping
DELETE /books

# Create the index and specify the mapping in one request
PUT /books
{
  "mappings": {
    "properties": {
      "bookName": {"type": "text"},
      "content": {"type": "text"}
    }
  }
}
GET /books/_mapping

# Add a field to the existing mapping (the index must still exist at this point)
PUT /books/_mapping
{
  "properties": {
    "author": {"type": "text"}
  }
}
GET /books/_mapping
DELETE /books

Multi-fields

PUT my_index
{
  "mappings": {
    "properties": {
      "city": {
        "type": "text",
        "fields": {
          "raw": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

PUT my_index/_doc/1
{
  "city": "New York"
}

PUT my_index/_doc/2
{
  "city": "York"
}

GET my_index/_search
{
  "query": {
    "match": {
      "city": "york"
    }
  },
  "sort": [
    {
      "city.raw": {
        "order": "asc"
      }
    }
  ],
  "aggs": {
    "citys": {
      "terms": {
        "field": "city.raw"
      }
    }
  }
}
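
The multi-field request above can be mimicked in a few lines of Python to show why the text field is used for matching and the keyword sub-field for sorting and aggregating. This is only a toy model: the analyze function is a rough stand-in for the standard analyzer (lowercase, split on non-alphanumerics), and the in-memory dictionaries are assumptions made for illustration, not how Elasticsearch stores data.

```python
import re
from collections import Counter


def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]


docs = {1: {"city": "New York"}, 2: {"city": "York"}}

# "city" (text): inverted index built from the analyzed tokens.
text_index = {}
for doc_id, doc in docs.items():
    for token in analyze(doc["city"]):
        text_index.setdefault(token, set()).add(doc_id)

# "city.raw" (keyword): the original string, kept verbatim per document.
raw_values = {doc_id: doc["city"] for doc_id, doc in docs.items()}


def match_city(query):
    """Return the ids of documents whose analyzed city shares a token with the query."""
    hits = set()
    for token in analyze(query):
        hits |= text_index.get(token, set())
    return hits


hits = match_city("york")
# Sort and aggregate on the exact keyword value, not on the tokens.
sorted_hits = sorted(hits, key=lambda doc_id: raw_values[doc_id])
buckets = Counter(raw_values[doc_id] for doc_id in hits)
print(sorted_hits)
print(dict(buckets))
```

Both documents match "york" because the text field is tokenized, while the sort and the terms buckets see the whole strings "New York" and "York".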

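Specific optimizations

Indexing options

doc_values

Most fields are indexed into an inverted index (term → documents), which is what makes them searchable. Sorting, aggregations, and scripts need the opposite lookup (document → field value); doc_values provides it as a column-oriented structure built at index time.

fielddata

Most fields can rely on doc_values for sorting, aggregations, and scripts, but text fields do not support doc_values. They use the fielddata mechanism instead, which is held in memory on the JVM heap and is therefore very expensive; it is disabled by default on text fields.

index

doc_values controls whether a field gets the forward (document → value) structure; index controls whether the field is added to the inverted index and is therefore searchable.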
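
As a rough illustration of the two directions described above, the sketch below builds a toy inverted index and a toy doc_values-style column in plain Python. The plays field and the sample values are hypothetical, and real Lucene structures are far more compact; the point is only which lookup each operation needs.

```python
docs = {
    1: {"singer": "jennifer lopez", "plays": 120},
    2: {"singer": "zp", "plays": 45},
    3: {"singer": "jennifer hudson", "plays": 300},
}

# Inverted index: term -> ids of the documents containing it (used for search).
inverted = {}
for doc_id, doc in docs.items():
    for term in doc["singer"].split():
        inverted.setdefault(term, []).append(doc_id)

# doc_values-like column: document id -> field value (used for sort/aggregation).
plays_column = {doc_id: doc["plays"] for doc_id, doc in docs.items()}


def search(term):
    """Look up matching document ids through the inverted index."""
    return inverted.get(term, [])


def sort_by_plays(doc_ids, descending=True):
    """Order hits by reading each document's value from the column."""
    return sorted(doc_ids, key=lambda d: plays_column[d], reverse=descending)


def total_plays(doc_ids):
    """A simple sum aggregation over the column."""
    return sum(plays_column[d] for d in doc_ids)


hits = search("jennifer")
print(hits, sort_by_plays(hits), total_plays(hits))
```

Searching never touches plays_column, and sorting or summing never touches the inverted index, which is why a field can enable one structure without the other.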

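store

By default, _source holds every field of the original document. When a field's store attribute is set to true, ES additionally stores a separate copy of that field.

Typical use case: in a book index, the content field may hold millions of characters. Reading name or author would then mean loading and parsing a huge _source, so content is excluded from _source and kept as a separately stored field that is fetched only when needed.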

PUT books
{
  "mappings": {
    "properties": {
      "name": {"type": "text"},
      "author": {"type": "text"},
      "content": {"type": "text", "store": true}
    },
    "_source": {
      "excludes": [
        "content"
      ]
    }
  }
}
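
With content excluded from _source, one way to read it back is the stored_fields option of the search API. The sketch below only builds and prints such a request body; the helper name stored_fields_search and the query text "elasticsearch" are made up for illustration, and nothing is sent to a cluster.

```python
import json


def stored_fields_search(match_field, text, stored_fields):
    """Build a _search body that asks only for the listed stored fields."""
    return {
        "query": {"match": {match_field: text}},
        "stored_fields": list(stored_fields),
    }


# Hypothetical query: find books by name and return only the stored content field.
body = stored_fields_search("name", "elasticsearch", ["content"])
print("GET /books/_search")
print(json.dumps(body, indent=2))
```

When stored_fields is given, hits carry the requested values under fields rather than _source, so name and author can still be read cheaply from _source in ordinary searches.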

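Meta-fields

Field name      Description
_index          The index the document belongs to
_id             The document's ID
_type           The type the document belongs to
_uid            The combination _type#_id
_source         The original JSON of the document
_all            Automatically combines the values of all fields; deprecated
_field_names    Indexes the name of every field in the document
_parent         Defines a parent-child relationship between documents; deprecated
_routing        Routes a document to a specific shard based on a routing value
_meta           Holds user-defined metadata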
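
The idea behind _routing can be sketched as "hash the routing value, then take it modulo the number of primary shards". The snippet below is a simplified model under stated assumptions: zlib.crc32 stands in for the Murmur3 hash Elasticsearch actually uses, the function name pick_shard and the routing values are hypothetical, and the real formula includes additional factors.

```python
import zlib


def pick_shard(routing, number_of_primary_shards):
    """Map a routing value (by default the document _id) to a shard number."""
    if number_of_primary_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    # Assumption: crc32 is used here only as a deterministic stand-in hash.
    digest = zlib.crc32(routing.encode("utf-8"))
    return digest % number_of_primary_shards


# Documents sharing a routing value always land on the same shard.
for routing in ("user-42", "user-42", "user-7"):
    print(routing, "->", pick_shard(routing, 6))
```

Because the result depends on the shard count, this also illustrates why number_of_shards is a static setting: changing it would send existing routing values to different shards.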

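Basic operations

Cluster management

# Cluster health (?v adds column headers to _cat output)
curl 'http://localhost:9200/_cat/health?v'

# Nodes in the cluster
curl 'http://localhost:9200/_cat/nodes?v'

# Shard allocation
curl 'http://localhost:9200/_cat/shards?v'

# List the indices in the cluster
curl 'http://localhost:9200/_cat/indices?v'

# List all available _cat endpoints
curl 'http://localhost:9200/_cat'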

Create, read, update, delete

The requests below use the typed URL form /index_name/type_name/id from Elasticsearch 6.x and earlier. From 7.x on, custom types are deprecated: use /shop_index/_doc/1 for create/read/delete and POST /shop_index/_update/1 for partial updates.

# Create: PUT /index_name/type_name/id
PUT /shop_index/productInfo/1
{
  "name": "HuaWei Mate8",
  "desc": "Cheap and easy to use",
  "price": 2500,
  "producer": "HuaWei Producer",
  "tags": [
    "Cheap",
    "Fast"
  ]
}

# Read: GET /index_name/type_name/id
GET /shop_index/productInfo/1

# Response:
{
  "_index": "shop_index",
  "_type": "productInfo",
  "_id": "1",
  "_version": 1,
  "found": true,
  "_source": {
    "name": "HuaWei Mate8",
    "desc": "Cheap and easy to use",
    "price": 2500,
    "producer": "HuaWei Producer",
    "tags": [
      "Cheap",
      "Fast"
    ]
  }
}

# Full update (replaces the whole document): PUT /index_name/type_name/id
PUT /shop_index/productInfo/1
{
  "name": "HuaWei Mate8",
  "desc": "Cheap and easy to use",
  "price": 2400,
  "producer": "HuaWei Producer",
  "tags": [
    "Cheap",
    "Fast"
  ]
}

# Partial update (changes only the listed fields): POST /index_name/type_name/id/_update
POST /shop_index/productInfo/1/_update
{
  "doc": {
    "price": 2200
  }
}

# Delete: DELETE /index_name/type_name/id
DELETE /shop_index/productInfo/1
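
For reference, the same four operations can be prepared from Python's standard library. This is a minimal sketch assuming Elasticsearch 7.x or later (typeless _doc and _update endpoints) on the local node used by the curl commands; build_request and send are hypothetical helper names, and the example only builds the requests without contacting a server.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: a local single-node cluster


def build_request(method, path, body=None):
    """Create an HTTP request for the given REST path, with an optional JSON body."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if body is not None else {}
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


def send(request):
    """Send a prepared request and decode the JSON response (needs a running node)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


create = build_request("PUT", "/shop_index/_doc/1", {"name": "HuaWei Mate8", "price": 2500})
read = build_request("GET", "/shop_index/_doc/1")
update = build_request("POST", "/shop_index/_update/1", {"doc": {"price": 2200}})
delete = build_request("DELETE", "/shop_index/_doc/1")

for request in (create, read, update, delete):
    print(request.get_method(), request.full_url)
```

Passing any of these objects to send would execute it against the cluster; keeping construction and sending separate makes the request shapes easy to inspect first.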

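An all-round data product

Because it supports data structures such as inverted indexes and columnar storage, ES offers very flexible search and analytics capabilities.

It supports interactive analysis: even over trillions of log entries, ES search response times stay within seconds.

ES comes with a complete logging solution (ELK) that takes data from collection to visualization within seconds.

Elasticsearch is a distributed, highly scalable, near-real-time search and data analytics engine. It makes it easy to search, analyze, and explore large volumes of data, and its horizontal scalability helps make data more valuable in production.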

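Advantages

High availability and high scalability.

Scaling out is simple: the distributed architecture makes it easy to scale resources horizontally or vertically, up or down, to match the hardware needs of different data volumes and query workloads. A cluster can range from hundreds up to ten thousand machines serving fast search over petabytes of data, or run on a single machine for a small company.

Fast queries and good performance.

ES uses Lucene as its underlying search engine and adds several layers of optimization on top to meet users' data query needs. It can "stand in for" a traditional relational database, and it can also be used for complex data analysis and near-real-time processing of massive data sets.

Powerful search that closely matches user intent.

High relevance: ES has a mature built-in scoring mechanism that ranks documents by relevance using signals such as how often the query terms occur, so that more relevant documents appear earlier. It also offers many query types, including fuzzy, prefix, and wildcard queries, to help users search quickly and efficiently.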
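
To make "ranking by term frequency" concrete, here is a small BM25-style scorer. BM25 is the default similarity in recent Elasticsearch versions, but this is a simplified sketch rather than Lucene's implementation: the three sample documents are made up, tokenization is a plain whitespace split, and k1=1.2 and b=0.75 are the commonly cited defaults.

```python
import math


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score every document against the query terms with a basic BM25 formula."""
    n_docs = len(docs)
    avg_len = sum(len(tokens) for tokens in docs.values()) / n_docs
    scores = {}
    for doc_id, tokens in docs.items():
        score = 0.0
        for term in query_terms:
            freq = tokens.count(term)
            if freq == 0:
                continue
            # Rarer terms (found in fewer documents) get a higher idf weight.
            containing = sum(1 for other in docs.values() if term in other)
            idf = math.log(1 + (n_docs - containing + 0.5) / (containing + 0.5))
            # Term frequency saturates and is normalized by document length.
            norm = freq + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * freq * (k1 + 1) / norm
        scores[doc_id] = score
    return scores


# Hypothetical, already tokenized documents.
docs = {
    "a": "love love song".split(),
    "b": "love story song".split(),
    "c": "rock song".split(),
}
scores = bm25_scores(["love"], docs)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Document "a" mentions the query term twice in the same length as "b", so it ranks first, and "c" scores zero because it never contains the term.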

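Feature-rich yet fairly easy to use: it works out of the box, and performance tuning is relatively simple.

A rich ecosystem and an active community, with integrations for many tools.

To process logs and ship them to Elasticsearch, you can use a log-processing tool such as Logstash (www.elastic.co/products/logstash); to search, visualize, and analyze those logs, you can use Kibana (www.elastic.co/products/kibana). Together these form the well-known ELK stack. Almost all current mainstream big-data frameworks also support ES; Flink, for example, pairs very well with it.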

Use cases

Real-time log analysis

The area where ES is most widely used, with support for full-stack log analysis.


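Search services

Full-text indexing

Product indexing

Time-series analysis

Time-series data is characterized by very high write throughput. ES handles this write load and also provides a rich set of multi-dimensional statistical analysis operators.

A typical scenario is the analysis of monitoring data.

IoT scenarios also produce large amounts of time-series data.

Data analysis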