Elasticsearch Aggregation Basics (2): Metric Aggregations

Elasticsearch Aggregation Search: Metric Calculations

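What is a metric calculation? It takes the documents a search has matched and computes a value over them: a sum, an average, a maximum, a minimum, and so on.
There are many more metrics than these, but this post only covers the most commonly used ones. If you are interested in the rest, see "添加度量指标" (Adding a metric) in Elasticsearch: 权威指南 (Elasticsearch: The Definitive Guide) on the Elastic site.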

Example: Statistics by Class + Gender + Score

Assume our raw data looks like this:

class	gender	score
A	Girl	81
C	Boy	54
B	Girl	63
B	Boy	71
C	Boy	24
C	Girl	93
C	Boy	85

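Our goal is to compute:

The average score of each class
The average score of boys and of girls in each class
The top scorer in each class, and that student's gender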
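
Before touching Elasticsearch, it can help to see what these three goals mean on the seven sample rows above. The following is only a local sketch in plain Python; `average_by` and `top_scorer_by_class` are hypothetical helper names, and the numbers it produces describe the seven sample rows, not the full 50-student data set.

```python
from collections import defaultdict

# The seven sample rows from the table above: (class, gender, score).
ROWS = [
    ("A", "Girl", 81),
    ("C", "Boy", 54),
    ("B", "Girl", 63),
    ("B", "Boy", 71),
    ("C", "Boy", 24),
    ("C", "Girl", 93),
    ("C", "Boy", 85),
]


def average_by(rows, key):
    """Group rows with `key` and return {group: mean score}."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row[2])
    return {group: sum(scores) / len(scores) for group, scores in groups.items()}


def top_scorer_by_class(rows):
    """Return {class: (gender, score)} for the highest score in each class."""
    best = {}
    for cls, gender, score in rows:
        if cls not in best or score > best[cls][1]:
            best[cls] = (gender, score)
    return best


# Goal 1: average score per class.
avg_by_class = average_by(ROWS, key=lambda r: r[0])
# Goal 2: average score per (class, gender) pair.
avg_by_class_gender = average_by(ROWS, key=lambda r: (r[0], r[1]))
# Goal 3: top scorer per class, with gender.
top_by_class = top_scorer_by_class(ROWS)

print(avg_by_class)
print(avg_by_class_gender)
print(top_by_class)
```

Grouping first and computing a value per group second is exactly the shape the aggregations in the rest of this post take.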

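Creating the Data

First download the raw data: students.json.
Note that the file must end with a newline (the last line is left empty): the bulk API requires every line, including the last one, to be newline-terminated.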
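
To make the trailing-newline requirement concrete, here is a minimal sketch of how a bulk file could be generated. The exact layout of students.json is not shown in this post, so this assumes the standard bulk layout (an action line followed by a source line for each document) with field names taken from the sample table; `to_bulk_ndjson` is a hypothetical helper.

```python
import json


def to_bulk_ndjson(docs):
    """Serialize docs as bulk NDJSON: one action line, then one source line, per doc."""
    lines = []
    for doc in docs:
        # Empty action metadata: the index (and type) come from the request URL.
        lines.append(json.dumps({"index": {}}))
        lines.append(json.dumps(doc))
    # Every line, including the last one, must be terminated by "\n".
    return "\n".join(lines) + "\n"


# Two rows from the sample table, used only for illustration.
sample = [
    {"class": "A", "gender": "Girl", "score": 81},
    {"class": "C", "gender": "Boy", "score": 54},
]
payload = to_bulk_ndjson(sample)
print(payload, end="")
```

Writing `payload` to a file gives something that can be sent with `--data-binary`, which, unlike `-d`, keeps the newlines intact.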

Then upload it: curl -H "Content-Type: Application/json" -XPOST "192.168.40.41:9200/students/doc/_bulk" --data-binary @students.json

Remember to replace the Elasticsearch IP with your own.

Then check that every student made it in. Good: all 50 students are indexed (docs.count is 50).

❯ curl http://192.168.40.41:9200/_cat/indices\?v
health status index               uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   students            0SjW3-EeQ5-uI7KiUNQ8uQ   3   1         50            0     15.7kb           690b

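Aggregation Approach

The average score of each class cannot be obtained in a single step, so let's go through it one step at a time.

First, Group by Class
This calls for a bucket (grouping) aggregation. If you are not yet familiar with it, review the previous post in this series first.

The first step is to split the documents up by class. Following the approach from the previous post, the structure should look like this: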

{
  aggregations: {
    group by class: {
      terms grouping: {
        field: class
      }
    }
  }
}

1-1. Hands-on

curl -s -H "Content-Type: Application/json" -XPOST "http://192.168.40.41:9200/students/_search" -d '
{
  "size": 0,
  "query": {
    "bool": {}
  },
  "aggs": {
    "group_by_class": {
      "terms": {
        "field": "class.keyword"
      }
    }
  }
}
' | jq .

{
  "took": 297,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 50,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "group_by_class": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "C",
          "doc_count": 20
        },
        {
          "key": "A",
          "doc_count": 13
        },
        {
          "key": "D",
          "doc_count": 9
        },
        {
          "key": "B",
          "doc_count": 8
        }
      ]
    }
  }
}
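
A small sketch of how this response could be read from a script. The dictionary below is a trimmed copy of the response above, and `bucket_counts` is a hypothetical helper, not part of any Elasticsearch client.

```python
# Trimmed copy of the "aggregations" part of the response above.
response = {
    "aggregations": {
        "group_by_class": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": "C", "doc_count": 20},
                {"key": "A", "doc_count": 13},
                {"key": "D", "doc_count": 9},
                {"key": "B", "doc_count": 8},
            ],
        }
    }
}


def bucket_counts(resp, agg_name):
    """Map each bucket key of a terms aggregation to its document count."""
    buckets = resp["aggregations"][agg_name]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


counts = bucket_counts(response, "group_by_class")
total = sum(counts.values())
print(counts, total)
```

Because `sum_other_doc_count` is 0, no class was left out of the bucket list, so the bucket counts add up to the 50 documents reported in `hits.total`.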

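Computing the Score
Grouping by class alone clearly does not meet our goal. All it tells us is that class B has noticeably few students XD

No problem, let's keep aggregating.

A quick recap of the aggregation structure: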

{
  aggregations: {
    <aggregation name, your choice>: {
      <aggregation type>: {
        <aggregation field>: ""
      }
    }
  }
}

For this step's goal, it becomes:

{
  aggregations: {
    average score: {
      average: {
        field: score
      }
    }
  }
}

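2-1. Combining the Two Aggregations

Here we first group by class and then compute the score within each group. Put together, it looks like this: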

{
  aggregations: {
    group by class: {
      terms grouping: {
        field: class
      },
      aggregations: {
        average score: {
          average: {
            field: score
          }
        }
      }
    }
  }
}
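
The same composition can be sketched as ordinary dictionaries, which makes the nesting rule visible: the sub-aggregation goes under an "aggs" key that sits next to "terms", not inside it. `with_sub_aggs` is a hypothetical helper written for this illustration.

```python
import json


def with_sub_aggs(bucket_agg, sub_aggs):
    """Return a copy of a bucket aggregation with `sub_aggs` attached under "aggs"."""
    nested = dict(bucket_agg)  # shallow copy, so the original definition stays reusable
    nested["aggs"] = sub_aggs
    return nested


# The two building blocks described above.
by_class = {"terms": {"field": "class.keyword"}}
avg_score = {"avg": {"field": "score"}}

body = {
    "size": 0,
    "aggs": {
        "group_by_class": with_sub_aggs(by_class, {"avg_score": avg_score}),
    },
}
print(json.dumps(body, indent=2))
```

The printed JSON can be passed as the request body of a `_search` call.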

2-2. Hands-on

curl -s -H "Content-Type: Application/json" -XPOST "http://192.168.40.41:9200/students/_search" -d '
{
  "size": 0,
  "query": {
    "bool": {}
  },
  "aggs": {
    "group_by_class": {
      "terms": {
        "field": "class.keyword"
      },
      "aggs": {
        "avg_score": {
          "avg": {
            "field": "score"
          }
        }
      }
    }
  }
}
' | jq .

{
  "took": 10365,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 50,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "group_by_class": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "C",
          "doc_count": 20,
          "avg_score": {
            "value": 62.75
          }
        },
        {
          "key": "A",
          "doc_count": 13,
          "avg_score": {
            "value": 60.76923076923077
          }
        },
        {
          "key": "D",
          "doc_count": 9,
          "avg_score": {
            "value": 71.77777777777777
          }
        },
        {
          "key": "B",
          "doc_count": 8,
          "avg_score": {
            "value": 73.625
          }
        }
      ]
    }
  }
}
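
One thing worth noticing when reading this response: the buckets are ordered by doc_count, not by average. The sketch below copies the buckets from the response above, picks the class with the highest average, and recovers the overall average by weighting each class average with its doc_count. It assumes every document carries a score value.

```python
# Buckets copied from the response above.
buckets = [
    {"key": "C", "doc_count": 20, "avg_score": {"value": 62.75}},
    {"key": "A", "doc_count": 13, "avg_score": {"value": 60.76923076923077}},
    {"key": "D", "doc_count": 9, "avg_score": {"value": 71.77777777777777}},
    {"key": "B", "doc_count": 8, "avg_score": {"value": 73.625}},
]

# Class -> average score, as returned by the avg sub-aggregation.
avg_by_class = {b["key"]: b["avg_score"]["value"] for b in buckets}

# The class with the highest average is not the first bucket.
best_class = max(avg_by_class, key=avg_by_class.get)

# Weighted mean: each class average counts once per student in that class.
total_docs = sum(b["doc_count"] for b in buckets)
weighted_sum = sum(b["doc_count"] * b["avg_score"]["value"] for b in buckets)
overall_avg = weighted_sum / total_docs

print(best_class, total_docs, round(overall_avg, 2))
```

A plain mean of the four class averages would give a different number, because the classes differ in size.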

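At this point we have the average score of each class.

Next, Group by Gender
If you have followed along so far, you have probably already written the aggregation in your head:
just stack one more grouping, by gender, underneath.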

{
  "aggs": {
    "group_by_class": {
      "terms": {
        "field": "class.keyword"
      },
      "aggs": {
        "avg_score": {
          "avg": {
            "field": "score"
          },
          "aggs": {
            "group_by_gender": {
              "terms": {
                "field": "gender.keyword"
              }
            }
          }
        }
      }
    }
  }
}

Oops.. the result is an error saying that avg does not accept sub-aggregations:

{
  "error": {
    "root_cause": [
      {
        "type": "aggregation_initialization_exception",
        "reason": "Aggregator [avg_score] of type [avg] cannot accept sub-aggregations"
      }
    ],
    "type": "aggregation_initialization_exception",
    "reason": "Aggregator [avg_score] of type [avg] cannot accept sub-aggregations"
  },
  "status": 500
}
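
The error follows from the two kinds of aggregation: a bucket aggregation such as terms produces groups of documents that can be aggregated further, while a metric aggregation such as avg produces a single number and has nothing to nest under. The following is only an illustrative local check, not something Elasticsearch provides; `find_invalid_nesting` is a hypothetical helper and `METRIC_TYPES` is a deliberately incomplete list.

```python
# Assumption: a small, non-exhaustive list of metric aggregation types.
METRIC_TYPES = {"avg", "sum", "min", "max", "value_count", "stats"}


def find_invalid_nesting(aggs, path=()):
    """Yield the path of every metric aggregation that carries sub-aggregations."""
    for name, definition in aggs.items():
        here = path + (name,)
        sub_aggs = definition.get("aggs", definition.get("aggregations", {}))
        agg_types = [k for k in definition if k not in ("aggs", "aggregations", "meta")]
        if sub_aggs and any(t in METRIC_TYPES for t in agg_types):
            yield here
        # Keep walking so deeper mistakes are reported as well.
        yield from find_invalid_nesting(sub_aggs, here)


# The rejected request: a terms grouping nested under avg.
bad = {
    "group_by_class": {
        "terms": {"field": "class.keyword"},
        "aggs": {
            "avg_score": {
                "avg": {"field": "score"},
                "aggs": {"group_by_gender": {"terms": {"field": "gender.keyword"}}},
            }
        },
    }
}

# The accepted shape: the metric sits at the innermost level.
good = {
    "group_by_class": {
        "terms": {"field": "class.keyword"},
        "aggs": {
            "group_by_gender": {
                "terms": {"field": "gender.keyword"},
                "aggs": {"avg_score": {"avg": {"field": "score"}}},
            }
        },
    }
}

print(list(find_invalid_nesting(bad)))
print(list(find_invalid_nesting(good)))
```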

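No problem. A metric aggregation such as avg yields a single number rather than buckets, so it has to sit at the innermost level. We move the gender grouping above it, and the structure becomes: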

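{
  aggregations: {
    group by class: {
      terms grouping: {
        field: class
      },
      aggregations: {
        group by gender: {
          terms grouping: {
            field: gender
          },
          aggregations: {
            average score: {
              average: {
                field: score
              }
            }
          }
        }
      }
    }
  }
}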

3-1. Hands-on

curl -s -H "Content-Type: Application/json" -XPOST "http://192.168.40.41:9200/students/_search" -d '
{
  "size": 0,
  "query": {
    "bool": {}
  },
  "aggs": {
    "group_by_class": {
      "terms": {
        "field": "class.keyword"
      },
      "aggs": {
        "group_by_gender": {
          "terms": {
            "field": "gender.keyword"
          },
          "aggs": {
            "avg_score": {
              "avg": {
                "field": "score"
              }
            }
          }
        }
      }
    }
  }
}
' | jq .


{
  "took": 561,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 50,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "group_by_class": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "C",
          "doc_count": 20,
          "group_by_gender": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "Boy",
                "doc_count": 15,
                "avg_score": {
                  "value": 60.06666666666667
                }
              },
              {
                "key": "Girl",
                "doc_count": 5,
                "avg_score": {
                  "value": 70.8
                }
              }
            ]
          }
        },
        {
          "key": "A",
          "doc_count": 13,
          "group_by_gender": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "Girl",
                "doc_count": 7,
                "avg_score": {
                  "value": 71.42857142857143
                }
              },
              {
                "key": "Boy",
                "doc_count": 6,
                "avg_score": {
                  "value": 48.333333333333336
                }
              }
            ]
          }
        },
        {
          "key": "D",
          "doc_count": 9,
          "group_by_gender": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "Boy",
                "doc_count": 5,
                "avg_score": {
                  "value": 65.6
                }
              },
              {
                "key": "Girl",
                "doc_count": 4,
                "avg_score": {
                  "value": 79.5
                }
              }
            ]
          }
        },
        {
          "key": "B",
          "doc_count": 8,
          "group_by_gender": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "Girl",
                "doc_count": 6,
                "avg_score": {
                  "value": 73.33333333333333
                }
              },
              {
                "key": "Boy",
                "doc_count": 2,
                "avg_score": {
                  "value": 74.5
                }
              }
            ]
          }
        }
      ]
    }
  }
}
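
A two-level response like this is easier to work with once it is flattened into rows. The sketch below rebuilds the buckets from the response above and walks the nesting; `gender_bucket`, `class_bucket` and `flatten` are hypothetical helpers used only to keep the example short.

```python
def gender_bucket(key, count, avg):
    """Build one inner bucket in the same shape as the response above."""
    return {"key": key, "doc_count": count, "avg_score": {"value": avg}}


def class_bucket(key, count, genders):
    """Build one outer bucket holding its group_by_gender buckets."""
    return {"key": key, "doc_count": count, "group_by_gender": {"buckets": genders}}


# Values copied from the response above.
response = {
    "aggregations": {
        "group_by_class": {
            "buckets": [
                class_bucket("C", 20, [gender_bucket("Boy", 15, 60.06666666666667),
                                       gender_bucket("Girl", 5, 70.8)]),
                class_bucket("A", 13, [gender_bucket("Girl", 7, 71.42857142857143),
                                       gender_bucket("Boy", 6, 48.333333333333336)]),
                class_bucket("D", 9, [gender_bucket("Boy", 5, 65.6),
                                      gender_bucket("Girl", 4, 79.5)]),
                class_bucket("B", 8, [gender_bucket("Girl", 6, 73.33333333333333),
                                      gender_bucket("Boy", 2, 74.5)]),
            ]
        }
    }
}


def flatten(resp):
    """Turn the class -> gender nesting into (class, gender, count, avg) rows."""
    rows = []
    for cls in resp["aggregations"]["group_by_class"]["buckets"]:
        for gender in cls["group_by_gender"]["buckets"]:
            rows.append((cls["key"], gender["key"], gender["doc_count"],
                         gender["avg_score"]["value"]))
    return rows


rows = flatten(response)
for row in rows:
    print(row)
```

Each outer bucket contributes one row per inner bucket, so four classes with two genders each give eight rows.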

At this point we have achieved "grouped by class, the average score of boys and girls in each class".

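The Last One: The Top Scorer

Ha.. just change the "average", that is avg, to max!
The query is not repeated here to save space, so please try it yourself. Note that max returns only the score itself: comparing the two gender buckets within a class tells you the gender of the top scorer, but not which student it is.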
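
For readers who want something to compare their attempt against, here is a sketch of the request body with avg swapped for max, built as a Python dictionary. The second body is an addition that is not part of the walkthrough above: it assumes a top_hits sub-aggregation sorted by score, which returns the highest-scoring document of each class itself, gender included. The names `class_gender_body`, `max_score` and `top_student` are made up for this example.

```python
import json


def class_gender_body(metric_name, metric):
    """Build the class -> gender nesting used above, with `metric` innermost."""
    return {
        "size": 0,
        "aggs": {
            "group_by_class": {
                "terms": {"field": "class.keyword"},
                "aggs": {
                    "group_by_gender": {
                        "terms": {"field": "gender.keyword"},
                        "aggs": {metric_name: metric},
                    }
                },
            }
        },
    }


# Same structure as the avg query, with max as the metric.
max_body = class_gender_body("max_score", {"max": {"field": "score"}})

# Assumption, not from the walkthrough: top_hits returns the best document per class.
top_hits_body = {
    "size": 0,
    "aggs": {
        "group_by_class": {
            "terms": {"field": "class.keyword"},
            "aggs": {
                "top_student": {
                    "top_hits": {
                        "size": 1,
                        "sort": [{"score": {"order": "desc"}}],
                    }
                }
            },
        }
    },
}

print(json.dumps(max_body, indent=2))
print(json.dumps(top_hits_body, indent=2))
```

Either body can be sent to the same `_search` endpoint used in the earlier steps.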

Thanks, everyone. Class dismissed!
