Learning the ELK data analysis tools


Reference materials


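Elasticsearch reference guide, for learning the query DSL, including queries (query), filters (filter), and aggregations (aggs).
http://elasticsearch.cn/book/elasticsearch_definitive_guide_2.x/index.html

Logstash reference guide, for learning data import, including the input, filter, and output stages. The main topic is how to use filter to split complex text and convert field types.
http://udn.yyuap.com/doc/logstash-best-practice-cn/index.html

Kibana reference guide, for quickly visualizing data with the front-end interface that Kibana provides, mainly through the Visualize module.
https://kibana.logstash.es/content/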

Installation

Download

git clone https://github.com/elastic/stack-docker

Install docker-compose

pip install docker-compose

Edit the configuration

In docker-compose.yml, change the IP 127.0.0.1 to 0.0.0.0 so that the services are reachable from other hosts.

Start

docker-compose up

Access Kibana

http://IP:5601

Log in

user=elastic password=changeme

References

https://github.com/elastic/stack-docker
http://hao.jobbole.com/kibana/


Basic operations

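Count all documents in the cluster (Elasticsearch 6.0 and later require the Content-Type header on requests that carry a body):

curl -XGET 'http://localhost:9200/_count?pretty' -H 'Content-Type: application/json' -d '
{
    "query": {
        "match_all": {}
    }
}
'

Response:

{
  "count": 155300,
  "_shards": {
    "total": 79,
    "successful": 44,
    "skipped": 0,
    "failed": 0
  }
}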

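Use PUT to index a document under an ID that you specify.

Use POST to have Elasticsearch generate the ID automatically.

PUT /website/blog/123
{
  "title": "My first blog entry",
  "text":  "Just trying this out...",
  "date":  "2014/01/01"
}

The response below carries an auto-generated _id, so it corresponds to the POST form (POST /website/blog/ with the same body). With the PUT request above, _id would be "123".

{
  "_index": "website",
  "_type": "blog",
  "_id": "R0DpVWEB9_9q-Fol0opS",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 0,
  "_primary_term": 1
}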
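The choice between PUT (caller supplies the ID) and POST (ID is generated) can be sketched in Python. This is a minimal sketch using only the standard library. The helper name build_index_request and the base URL http://localhost:9200 are assumptions made for illustration, and the request is only constructed here, not sent.

```python
import json
from urllib.request import Request


def build_index_request(base_url, index, doc_type, document, doc_id=None):
    """Build (but do not send) an indexing request.

    Hypothetical helper: PUT when the caller supplies an ID,
    POST when Elasticsearch should generate one.
    """
    body = json.dumps(document).encode("utf-8")
    if doc_id is None:
        url = f"{base_url}/{index}/{doc_type}/"
        method = "POST"
    else:
        url = f"{base_url}/{index}/{doc_type}/{doc_id}"
        method = "PUT"
    return Request(
        url,
        data=body,
        method=method,
        headers={"Content-Type": "application/json"},
    )


doc = {"title": "My first blog entry", "text": "Just trying this out..."}
with_id = build_index_request("http://localhost:9200", "website", "blog", doc, doc_id="123")
auto_id = build_index_request("http://localhost:9200", "website", "blog", doc)
print(with_id.get_method(), with_id.full_url)
print(auto_id.get_method(), auto_id.full_url)
```

Passing either request to urllib.request.urlopen would send it to a running cluster.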

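Auto-generated IDs are URL-safe, Base64-encoded GUID strings that are 20 characters long. These GUIDs are produced by a modified FlakeID scheme, which lets multiple nodes generate unique IDs in parallel with an essentially zero chance of collision.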
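To see where the 20 characters come from: 15 bytes are 120 bits, and Base64 encodes 6 bits per character, which gives exactly 20 characters with no padding. The sketch below only illustrates that encoding and length. It is not Elasticsearch's actual generator (this one uses random bytes purely as an assumption for the example), and the function name is hypothetical.

```python
import base64
import os


def url_safe_id(num_bytes: int = 15) -> str:
    """Return a URL-safe Base64 string built from num_bytes random bytes.

    Illustration only: with 15 bytes the result is 20 characters long.
    """
    raw = os.urandom(num_bytes)
    # urlsafe_b64encode uses '-' and '_' instead of '+' and '/'.
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")


example_id = url_safe_id()
print(example_id, len(example_id))
```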

Retrieving a document by _index, _type, and _id

GET /website/blog/123
GET /website/blog/123?_source=title,text
GET /website/blog/123/_source

Response to the first request:

{
  "_index": "website",
  "_type": "blog",
  "_id": "123",
  "_version": 1,
  "found": true,
  "_source": {
    "title": "My first blog entry",
    "text": "Just trying this out...",
    "date": "2014/01/01"
  }
}

If the document does not exist, the HTTP status is 404 and the response has found set to false, with no _source:

{
  "_index": "website",
  "_type": "blog",
  "_id": "123",
  "found": false
}

Checking whether a document exists

HEAD /website/blog/123?_source

200 - OK if the document exists; 404 - Not Found if it does not.

Creating a new document

POST /website/blog/
{ ... }
# the ID is generated automatically

PUT /website/blog/123?op_type=create
{ ... }

PUT /website/blog/123/_create
{ ... }

If the document can be created, the response status is 201 Created.

If a document with that ID already exists, the response is a 409 conflict:

{
   "error": {
      "root_cause": [
         {
            "type": "document_already_exists_exception",
            "reason": "[blog][123]: document already exists",
            "shard": "0",
            "index": "website"
         }
      ],
      "type": "document_already_exists_exception",
      "reason": "[blog][123]: document already exists",
      "shard": "0",
      "index": "website"
   },
   "status": 409
}

Updating a document

PUT /website/blog/123
{
  "title": "My first blog entry",
  "text":  "I am starting to get the hang of this...",
  "date":  "2014/01/02"
}

In the response, _version is incremented to 2 and result changes to "updated":

{
  "_index": "website",
  "_type": "blog",
  "_id": "123",
  "_version": 2,
  "result": "updated",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 1,
  "_primary_term": 1
}

Deleting a document

DELETE /website/blog/123

If the document exists:

{
  "_index": "website",
  "_type": "blog",
  "_id": "123",
  "_version": 3,
  "result": "deleted",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 2,
  "_primary_term": 1
}

If the document does not exist:

{
  "found" :    false,
  "_index" :   "website",
  "_type" :    "blog",
  "_id" :      "123",
  "_version" : 4
}

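Problems with concurrency

Pessimistic concurrency control

This approach is widely used by relational databases. It assumes that conflicting changes are likely to happen, so it blocks access to a resource in order to prevent conflicts. A typical example is locking a row before reading it, which ensures that only the thread holding the lock can modify that row.

Optimistic concurrency control

This is the approach Elasticsearch uses. It assumes that conflicts are unlikely to happen and does not block the operations being attempted. However, if the underlying data is modified between the read and the write, the update fails. The application then decides how to resolve the conflict: for example, it can retry the update, use the fresh data, or report the situation to the user.

Optimistic concurrency control in Elasticsearch

The _version number is used to ensure that conflicting changes made by the application do not cause data loss. This is done by specifying the version number of the document we intend to change.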
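The version check and the "retry on conflict" strategy described above can be sketched with a small in-memory model. This is a minimal sketch for illustration only: TinyStore, VersionConflict, and update_with_retry are hypothetical names, not part of any Elasticsearch client, and the store holds documents in a plain dictionary.

```python
class VersionConflict(Exception):
    """Raised when the version supplied by the caller is stale."""


class TinyStore:
    """Hypothetical in-memory stand-in for an index."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, source)

    def get(self, doc_id):
        version, source = self._docs[doc_id]
        return version, dict(source)

    def index(self, doc_id, source, version=None):
        current = self._docs.get(doc_id, (0, None))[0]
        # Reject the write if the caller based it on an outdated version.
        if version is not None and version != current:
            raise VersionConflict(
                f"current version [{current}] is different than the one provided [{version}]"
            )
        self._docs[doc_id] = (current + 1, dict(source))
        return current + 1


def update_with_retry(store, doc_id, change, retries=3):
    """Read, modify, and write back; re-read and retry on conflict."""
    for _ in range(retries):
        version, source = store.get(doc_id)
        try:
            return store.index(doc_id, change(source), version=version)
        except VersionConflict:
            continue  # another writer got in first
    raise VersionConflict(f"gave up after {retries} attempts")


store = TinyStore()
store.index("1", {"title": "My first blog entry", "text": "Just trying this out..."})
seen_version, _ = store.get("1")  # the version this writer has read
store.index("1", {"title": "My first blog entry", "text": "Starting to get the hang of this..."}, version=seen_version)
try:
    # A second write that still refers to the old version is rejected.
    store.index("1", {"text": "stale write"}, version=seen_version)
except VersionConflict as err:
    print("conflict:", err)
```

A caller that prefers retrying over failing can use update_with_retry(store, "1", lambda source: {**source, "views": 1}), which re-reads the current version before each attempt.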

Example workflow

1. Create a document

PUT /website/blog/1/_create
{
  "title": "My first blog entry",
  "text":  "Just trying this out..."
}

The document's _version is now 1.

2. Reindex the document to save a change, specifying the version the change is based on

PUT /website/blog/1?version=1
{
  "title": "My first blog entry",
  "text":  "Starting to get the hang of this..."
}

The request succeeds and _version becomes 2.
If the document is already at version 2 and a request still specifies version=1, the request fails with a 409 conflict:

{
 "error": {
   "root_cause": [
     {
       "type": "version_conflict_engine_exception",
       "reason": "[blog][1]: version conflict, current version [2] is different than the one provided [1]",
       "index_uuid": "_1drhEbXTB-rHBomeP7cJw",
       "shard": "3",
       "index": "website"
     }
   ],
   "type": "version_conflict_engine_exception",
   "reason": "[blog][1]: version conflict, current version [2] is different than the one provided [1]",
   "index_uuid": "_1drhEbXTB-rHBomeP7cJw",
   "shard": "3",
   "index": "website"
 },
 "status": 409
}

Using an external version number

With version_type=external, the write succeeds only if the document's current version is lower than the version supplied in the request. The supplied version then becomes the document's new _version.

PUT /website/blog/2?version=10&version_type=external
{
  "title": "My first external blog entry",
  "text":  "This is a piece of cake..."
}

Response:

{
  "_index": "website",
  "_type": "blog",
  "_id": "2",
  "_version": 10,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 0,
  "_primary_term": 1
}

If the current version is not lower than the supplied one (for example, when the same request is sent again):

{
  "error": {
    "root_cause": [
      {
        "type": "version_conflict_engine_exception",
        "reason": "[blog][2]: version conflict, current version [10] is higher or equal to the one provided [10]",
        "index_uuid": "_1drhEbXTB-rHBomeP7cJw",
        "shard": "2",
        "index": "website"
      }
    ],
    "type": "version_conflict_engine_exception",
    "reason": "[blog][2]: version conflict, current version [10] is higher or equal to the one provided [10]",
    "index_uuid": "_1drhEbXTB-rHBomeP7cJw",
    "shard": "2",
    "index": "website"
  },
  "status": 409
}
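The external-version rule differs from the internal check: the supplied version must be strictly greater than the stored one, and it replaces the stored version rather than incrementing it. A minimal in-memory sketch of that rule follows. The function name index_with_external_version and the dictionary-based store are assumptions for illustration, not an Elasticsearch API.

```python
class VersionConflict(Exception):
    """Raised when the supplied external version is not newer."""


def index_with_external_version(docs, doc_id, source, version):
    """Store source under doc_id if version is greater than the current one.

    docs maps doc_id -> (version, source). Hypothetical helper.
    """
    current = docs[doc_id][0] if doc_id in docs else None
    if current is not None and version <= current:
        raise VersionConflict(
            f"current version [{current}] is higher or equal to the one provided [{version}]"
        )
    # The external version is stored as-is instead of being incremented.
    docs[doc_id] = (version, dict(source))
    return version


docs = {}
index_with_external_version(docs, "2", {"title": "My first external blog entry"}, version=10)
try:
    # Repeating the same version is rejected.
    index_with_external_version(docs, "2", {"title": "again"}, version=10)
except VersionConflict as err:
    print("conflict:", err)
```

This is why an external system, such as a database with its own revision counter, can act as the source of truth for versions.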

Partially updating a document

The fields under doc are merged into the existing document.

POST /website/blog/1/_update
{
   "doc" : {
      "tags" : [ "testing" ],
      "views": 0
   }
}
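The effect of a partial update on the stored document can be approximated in a few lines of Python. This is a sketch under stated assumptions: nested objects are merged recursively, while scalars and arrays in the partial document overwrite the existing values. The function name merge_partial is hypothetical.

```python
def merge_partial(source: dict, partial: dict) -> dict:
    """Return a new dict with partial merged into source."""
    merged = dict(source)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested objects are merged field by field.
            merged[key] = merge_partial(merged[key], value)
        else:
            # Scalars and arrays replace whatever was there; new fields are added.
            merged[key] = value
    return merged


existing = {
    "title": "My first blog entry",
    "text": "Starting to get the hang of this...",
}
updated = merge_partial(existing, {"tags": ["testing"], "views": 0})
print(updated)
```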