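Elasticsearch Aggregation Search: Metric Calculations
What is a metric calculation? It takes the documents a search has matched and computes a value over them: a sum, an average, a maximum, a minimum, and so on.
There are plenty of other metrics, but this post only covers the most common ones. If you are interested, see 添加度量指标 | Elasticsearch: 权威指南 | Elastic (the chapter on adding metrics in Elasticsearch: The Definitive Guide).
Example: statistics by class + gender + score
Suppose our raw data is structured as follows: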
class | gender | score |
---|---|---|
A | Girl | 81 |
C | Boy | 54 |
B | Girl | 63 |
B | Boy | 71 |
C | Boy | 24 |
C | Girl | 93 |
C | Boy | 85 |
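Our goal is to compute:
- The average score of each class
- The average score of the boys and of the girls in each class
- The highest-scoring student in each class, and that student's gender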
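Creating the data
First download the raw data here: students.json.
Note that the file must end with a blank line (that is, a trailing newline).
Then upload it:
curl -H "Content-Type: application/json" -XPOST "192.168.40.41:9200/students/doc/_bulk" --data-binary @students.json
Remember to replace the Elasticsearch IP with your own.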
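For reference, here is a minimal sketch of the layout the _bulk API expects. The contents of students.json are not shown in this post, so I am assuming it follows the standard bulk format: one action line, then one document line, with a newline at the very end. The helper name make_bulk_sample and the output path are made up for this sketch; the three documents are the first three rows of the sample table.

```shell
# Hypothetical helper: print three of the sample rows in _bulk (NDJSON) layout.
make_bulk_sample() {
  # Every document takes two lines: an action line, then the document itself.
  # The index and type come from the URL (/students/doc/_bulk), so the action
  # line can stay empty.
  printf '%s\n' \
    '{"index":{}}' \
    '{"class":"A","gender":"Girl","score":81}' \
    '{"index":{}}' \
    '{"class":"C","gender":"Boy","score":54}' \
    '{"index":{}}' \
    '{"class":"B","gender":"Girl","score":63}'
}

# Made-up output path for this sketch.
bulk_file="${TMPDIR:-/tmp}/sample_students.json"
make_bulk_sample > "$bulk_file"

# printf '%s\n' ends every line, including the last one, with a newline,
# which is what the "blank last line" requirement is about.
wc -l < "$bulk_file"   # counts 6 lines: 3 action lines + 3 documents
```

A file built this way can be passed to the curl command above with --data-binary, which (unlike -d) keeps the newlines intact.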
Then check whether every student made it in. Good, all 50 students are there (docs.count is 50).
❯ curl http://192.168.40.41:9200/_cat/indices\?v
health status index    uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   students 0SjW3-EeQ5-uI7KiUNQ8uQ   3   1         50            0     15.7kb           690b
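Aggregation approach
The average score of each class cannot be obtained in a single step, so let's walk through it one step at a time.
- First, group by class
This calls for a grouping (bucket) aggregation. If you are not sure how to use one yet, go back and review the previous post first.
The first step is to split the documents by class. Following the approach from the previous post, the structure should look like this: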
{
  aggregations: {
    group by class: {
      terms grouping: {
        field: class
      }
    }
  }
}
1-1. In practice
curl -s -H "Content-Type: application/json" -XPOST "http://192.168.40.41:9200/students/_search" -d '
{
  "size": 0,
  "query": {
    "bool": {}
  },
  "aggs": {
    "group_by_class": {
      "terms": {
        "field": "class.keyword"
      }
    }
  }
}
' | jq .
{
  "took": 297,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 50,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "group_by_class": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "C",
          "doc_count": 20
        },
        {
          "key": "A",
          "doc_count": 13
        },
        {
          "key": "D",
          "doc_count": 9
        },
        {
          "key": "B",
          "doc_count": 8
        }
      ]
    }
  }
}
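- Compute the score
Grouping by class alone clearly does not meet our goal. All it tells us is how many students each class has (and leaves us wondering why class B is so small XD).
No problem, let's keep aggregating.
A quick review of the aggregation structure:
{
  aggregations: {
    <aggregation name, your choice>: {
      <aggregation type>: {
        field: <field to aggregate on>
      }
    }
  }
}
For this step's goal, it should become:
{
  aggregations: {
    average score: {
      average: {
        field: score
      }
    }
  }
}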
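2-1. Combining the two aggregations
So we first group by class and then compute the score inside each group. Put together, it looks like this:
{
  aggregations: {
    group by class: {
      terms grouping: {
        field: class
      },
      aggregations: {
        average score: {
          average: {
            field: score
          }
        }
      }
    }
  }
}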
2-2. In practice
curl -s -H "Content-Type: application/json" -XPOST "http://192.168.40.41:9200/students/_search" -d '
{
  "size": 0,
  "query": {
    "bool": {}
  },
  "aggs": {
    "group_by_class": {
      "terms": {
        "field": "class.keyword"
      },
      "aggs": {
        "avg_score": {
          "avg": {
            "field": "score"
          }
        }
      }
    }
  }
}
' | jq .
{
  "took": 10365,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 50,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "group_by_class": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "C",
          "doc_count": 20,
          "avg_score": {
            "value": 62.75
          }
        },
        {
          "key": "A",
          "doc_count": 13,
          "avg_score": {
            "value": 60.76923076923077
          }
        },
        {
          "key": "D",
          "doc_count": 9,
          "avg_score": {
            "value": 71.77777777777777
          }
        },
        {
          "key": "B",
          "doc_count": 8,
          "avg_score": {
            "value": 73.625
          }
        }
      ]
    }
  }
}
At this point we already have the average score of every class.
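If you want to convince yourself what terms + avg is doing, the same grouping can be reproduced locally. The sketch below runs awk over the seven sample rows from the table at the top of this post, not over the 50-student index, so its numbers differ from the response above. The function names sample_rows and avg_by_class are made up for this illustration.

```shell
# The seven sample rows from the table at the top: class gender score
sample_rows() {
  cat <<'EOF'
A Girl 81
C Boy 54
B Girl 63
B Boy 71
C Boy 24
C Girl 93
C Boy 85
EOF
}

# Bucket by class (the job of "terms"), then average the score in each
# bucket (the job of "avg"). Output columns: class, doc_count, average.
avg_by_class() {
  sample_rows | awk '
    { sum[$1] += $3; n[$1]++ }
    END { for (c in sum) printf "%s %d %.2f\n", c, n[c], sum[c] / n[c] }
  ' | LC_ALL=C sort
}

avg_by_class
# A 1 81.00
# B 2 67.00
# C 4 64.00
```

The count column plays the role of doc_count in each bucket, and the last column is what avg_score.value would hold.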
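- Then group by gender as well
If you have kept up so far, you have probably already written the aggregation in your head.
Just stack one more grouping by gender underneath, right?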
{
  "aggs": {
    "group_by_class": {
      "terms": {
        "field": "class.keyword"
      },
      "aggs": {
        "avg_score": {
          "avg": {
            "field": "score"
          },
          "aggs": {
            "group_by_gender": {
              "terms": {
                "field": "gender.keyword"
              }
            }
          }
        }
      }
    }
  }
}
Oops.. what comes back is an error saying that avg does not accept sub-aggregations:
{
  "error": {
    "root_cause": [
      {
        "type": "aggregation_initialization_exception",
        "reason": "Aggregator [avg_score] of type [avg] cannot accept sub-aggregations"
      }
    ],
    "type": "aggregation_initialization_exception",
    "reason": "Aggregator [avg_score] of type [avg] cannot accept sub-aggregations"
  },
  "status": 500
}
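No problem. avg is a metric aggregation: it reduces a bucket to a single number, so there are no documents left to hand down to a child. Only grouping aggregations such as terms can hold sub-aggregations, so we put the gender grouping above avg instead. The structure becomes: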
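{
  aggregations: {
    group by class: {
      terms grouping: {
        field: class
      },
      aggregations: {
        group by gender: {
          terms grouping: {
            field: gender
          },
          aggregations: {
            average score: {
              average: {
                field: score
              }
            }
          }
        }
      }
    }
  }
}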
3-1. In practice
curl -s -H "Content-Type: application/json" -XPOST "http://192.168.40.41:9200/students/_search" -d '
{
  "size": 0,
  "query": {
    "bool": {}
  },
  "aggs": {
    "group_by_class": {
      "terms": {
        "field": "class.keyword"
      },
      "aggs": {
        "group_by_gender": {
          "terms": {
            "field": "gender.keyword"
          },
          "aggs": {
            "avg_score": {
              "avg": {
                "field": "score"
              }
            }
          }
        }
      }
    }
  }
}
' | jq .
{
  "took": 561,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 50,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "group_by_class": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "C",
          "doc_count": 20,
          "group_by_gender": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "Boy",
                "doc_count": 15,
                "avg_score": {
                  "value": 60.06666666666667
                }
              },
              {
                "key": "Girl",
                "doc_count": 5,
                "avg_score": {
                  "value": 70.8
                }
              }
            ]
          }
        },
        {
          "key": "A",
          "doc_count": 13,
          "group_by_gender": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "Girl",
                "doc_count": 7,
                "avg_score": {
                  "value": 71.42857142857143
                }
              },
              {
                "key": "Boy",
                "doc_count": 6,
                "avg_score": {
                  "value": 48.333333333333336
                }
              }
            ]
          }
        },
        {
          "key": "D",
          "doc_count": 9,
          "group_by_gender": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "Boy",
                "doc_count": 5,
                "avg_score": {
                  "value": 65.6
                }
              },
              {
                "key": "Girl",
                "doc_count": 4,
                "avg_score": {
                  "value": 79.5
                }
              }
            ]
          }
        },
        {
          "key": "B",
          "doc_count": 8,
          "group_by_gender": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "Girl",
                "doc_count": 6,
                "avg_score": {
                  "value": 73.33333333333333
                }
              },
              {
                "key": "Boy",
                "doc_count": 2,
                "avg_score": {
                  "value": 74.5
                }
              }
            ]
          }
        }
      ]
    }
  }
}
At this point we have successfully computed the average score of the boys and of the girls in each class, grouped by class.
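Nesting one terms grouping inside another amounts to grouping by the (class, gender) pair. As a rough local illustration, here is the same idea in awk over the seven sample rows from the table at the top of this post (not the 50-student index, so the numbers differ from the response above). The function names are made up for this sketch.

```shell
# The seven sample rows from the table at the top: class gender score
sample_rows() {
  cat <<'EOF'
A Girl 81
C Boy 54
B Girl 63
B Boy 71
C Boy 24
C Girl 93
C Boy 85
EOF
}

# Use "class gender" as a composite key, which mirrors a gender bucket
# nested inside a class bucket, then average the score per key.
avg_by_class_and_gender() {
  sample_rows | awk '
    { key = $1 " " $2; sum[key] += $3; n[key]++ }
    END { for (k in sum) printf "%s %.2f\n", k, sum[k] / n[k] }
  ' | LC_ALL=C sort
}

avg_by_class_and_gender
# A Girl 81.00
# B Boy 71.00
# B Girl 63.00
# C Boy 54.33
# C Girl 93.00
```

Class A has no boys in the sample, so no "A Boy" line appears; a terms aggregation likewise only returns buckets that contain documents by default.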
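Last one: the top scorer
Ha.. just take the query from 3-1 and change "average", that is avg, to max! It then returns the highest score among the boys and among the girls of each class; whichever of the two is larger is the class's top score, and the key of that bucket is the top scorer's gender.
I won't take up more space here, so please try it yourselves.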
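If you would rather get the top student's document back directly, one option (my own suggestion, not part of the walkthrough above) is to pair max with a top_hits aggregation sorted by score. The sketch below only prints the request body; run_max_score sends it and is left for you to call once the address is right. ES_HOST and both function names are assumptions made for this example; the field names are the ones used throughout this post.

```shell
# Assumed address; replace it with your own Elasticsearch host.
ES_HOST="${ES_HOST:-192.168.40.41:9200}"

# Print the request body: per class, the highest score ("max") and the
# single best-scoring document ("top_hits", sorted by score descending).
max_score_body() {
  cat <<'EOF'
{
  "size": 0,
  "aggs": {
    "group_by_class": {
      "terms": { "field": "class.keyword" },
      "aggs": {
        "max_score": { "max": { "field": "score" } },
        "top_student": {
          "top_hits": {
            "size": 1,
            "sort": [ { "score": { "order": "desc" } } ],
            "_source": { "includes": [ "gender", "score" ] }
          }
        }
      }
    }
  }
}
EOF
}

# Send the query; not called here because it needs a reachable cluster.
run_max_score() {
  max_score_body | curl -s -H "Content-Type: application/json" \
    -XPOST "http://${ES_HOST}/students/_search" --data-binary @- | jq .
}

max_score_body
```

Both max and top_hits sit side by side under the class bucket, which is fine because neither is nested inside the other. With size set to 1, a tie for first place returns only one of the tied students; raise size if that matters.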
Thanks, everyone. Class dismissed!