Ops Notes

Sharing My Vim Configuration

Tommy Lin — Fri, 13 Sep 2019 20:22:00 GMT

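Preface

Vim has three great strengths:

Keyboard shortcuts for everything. Once they become muscle memory, you and the editor work as one.
A huge range of powerful plugins, and you can write your own once you are skilled enough. The only limit is your imagination.
It runs in the terminal. That sounds obvious, but it matters a lot to me because I practically live in the terminal.

I originally wanted to write a full "getting hooked on Vim" guide, but that would turn into a lot of material.
So for now, let me just share my Vim configuration file.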

Preview

Here is a screenshot first.

Configuring Vim

Download the plugin manager: Vundle

Vundle is one of several plugin managers for Vim. Install it with the following command:
git clone https://github.com/VundleVim/Vundle.vim.git ~/.vim/bundle/Vundle.vim

The configuration file: .vimrc

Paste the following configuration into ~/.vimrc. On Windows the file is %HOMEPATH%\_vimrc.

""""""""""
" Basic settings
""""""""""

syntax on " enable syntax highlighting
set backspace=2 " let Backspace delete over indentation, line breaks and the insert start (needed on macOS)
set laststatus=2 " always show the status line
set encoding=utf-8 " use UTF-8
set termencoding=utf-8
set fillchars+=stl:\ ,stlnc:\
set term=xterm-256color " 256-color support
set t_Co=256
set noautoindent " disable automatic indentation
set number " show line numbers
set hlsearch " highlight search matches
set incsearch " incremental search: show matches while typing
set ignorecase " case-insensitive search
set cursorline " highlight the current line
set expandtab " insert spaces instead of tabs
set tabstop=2 " a tab is two columns wide
set mouse=a " enable mouse support
set wildmenu " command-line completion menu
au BufRead,BufNewFile *.vue set filetype=html " treat .vue files as HTML

"-- Folding --
set foldcolumn=1
set foldlevelstart=99
setlocal foldmethod=marker
setlocal foldmarker={,}
" Toggle a fold with the space bar
" (a trailing comment on a mapping line would become part of the mapping)
nnoremap <Space> za


""""""""""
" Vundle settings
""""""""""

set nocompatible
filetype off
set rtp+=~/.vim/bundle/Vundle.vim
call vundle#begin()

" Plugins
Plugin 'VundleVim/Vundle.vim' " the plugin manager itself
Plugin 'mattn/emmet-vim' " fast HTML expansion
Plugin 'tpope/vim-surround' " quickly wrap text in quotes, tags, brackets
Plugin 'chrisbra/Colorizer' " color preview
Plugin 'scrooloose/nerdtree' " file tree
Plugin 'jistr/vim-nerdtree-tabs' " file tree enhancements for tabs
Plugin 'jiangmiao/auto-pairs' " auto-close paired characters
Plugin 'mkitt/tabline.vim' " tab line
Plugin 'itchyny/lightline.vim' " status line at the bottom
Plugin 'scrooloose/nerdcommenter' " quick commenting
Plugin 'joshdick/onedark.vim' " color scheme
Plugin 'Glench/Vim-Jinja2-Syntax' " syntax highlighting for Python Jinja templates

Plugin 'ryanoasis/vim-devicons' " file tree icons
" The system font must include the special glyphs; Nerd Fonts is recommended

Plugin 'prettier/vim-prettier' " one-key formatting
" Requires prettier: npm install -g prettier

Plugin 'w0rp/ale' " inline lint errors
" Requires a linter; jshint is used here: npm install -g jshint
" jshint configuration reference: https://github.com/victorporof/Sublime-JSHint#using-your-own-jshintrc-options

Plugin 'Valloric/YouCompleteMe' " completion suggestions
" Vim must be built with Python support
" Install with: cd ~/.vim/bundle/YouCompleteMe &&  python3 install.py --ts-completer

call vundle#end()
filetype plugin indent on

""""""""""
" Plugin settings
""""""""""

colorscheme onedark " color scheme

" -- lightline theme --
let g:lightline = {
      \ 'colorscheme': 'wombat',
      \ }

" -- ALE settings --
let g:ale_sign_error = '✗'
let g:ale_sign_warning = '⚡'
let g:ale_open_list = 1

" [ jumps to the next error
" ] jumps to the previous error
nmap [ :ALENext<CR>
nmap ] :ALEPrevious<CR>
" nmap  :ALELast
" nmap  :ALEFirst


" -- NERDTree settings --
map  :NERDTreeToggle
let NERDTreeAutoCenter=1
let NERDTreeShowHidden=1
let NERDTreeWinSize=31
let g:nerdtree_tabs_open_on_console_startup=1
let NERDTreeShowBookmarks=1
" let NERDTreeIgnore=['\.pyc','\~$','\.swp']
let NERDTreeQuitOnOpen=0
let g:NERDTreeChDirMode=2
let NERDTreeMapActivateNode=''

" -- NERDCommenter settings --
let g:NERDSpaceDelims = 1
let g:NERDDefaultAlign = 'left'
let g:NERDCompactSexyComs = 1

" Close the location list (opened by ALE) when quitting a normal buffer
autocmd QuitPre * if empty(&bt) | lclose | endif

" -- Tab page shortcuts --
" Ctrl + t opens a tab
" Ctrl + x closes a tab
" Ctrl + hjkl switches tabs
map   :tabnew 
map   :tabclose
map   :tabn
map   :tabp
map   :tabfirst
map   :tablast

" -- Other custom mappings --
" Replace line-break characters across the whole file; Mac users may need this
map  :%s//\r/g

" Split windows with \ and -
" Ctrl w + \ splits vertically
" Ctrl w + - splits horizontally
nnoremap <C-w>\ <C-w>v
nnoremap <C-w>- <C-w>s

" Quickly re-sync syntax highlighting, very useful for vue files
nnoremap s :syntax sync fromstart

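Open Vim and install the plugins

The configuration file is in place, but the plugins still have to be installed before it takes effect.
Vim may print some errors when it starts. Don't worry: that only happens because the configuration refers to plugins that are not installed yet.

You can pick which of the plugins above to install; just comment out the ones you don't need.
Then run :PluginInstall to install them.

Closing

If I find the time I will introduce some of the plugins one by one. I hope everyone becomes a great Vimmer!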

Logstash fails on nginx logs with Unrecognized character escape 'x' (code 120)

Tommy Lin — Wed, 08 May 2019 05:59:48 GMT

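I have recently been using Logstash to aggregate nginx access logs, and found that whenever a log line contains escaped Chinese characters, Logstash cannot parse it (JSON parse failure).

How can that be? Isn't the log already written as JSON? Looking at the nginx log itself, the offending part appears to be the user agent: \xE4\xBD\xA0

You can run the log line below through a JSON validator and see that it is not valid JSON. JSON escapes non-ASCII characters in the \u0000 form (\u followed by four hex digits), not as \x00.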

{"status": "200","remote_addr": "192.168.21.152","remote_user": "-","timestamp": "2019-05-08T12:48:26+08:00","request": "GET / HTTP/1.1","bytes_sent": "944","http_referer": "-","http_user_agent": "\xE4\xBD\xA0","server_name": "192.168.150.110","uri": "/","args": "-","proxy_host": "mixed.iloveloli.net","upstream_addr": "192.168.150.113:80","upstream_status": "200","upstream_cache_status": "-","request_time": "0.001","upstream_response_time": "0.001","X-Forwarded-For": "-"}
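The difference can be reproduced with any strict JSON parser. The following is a minimal sketch using Python's standard json module; the two one-field lines are shortened stand-ins for the log entry above, not real log output.

```python
import json

# Shortened stand-ins for the log line above: only the user-agent field is kept.
bad_line = r'{"http_user_agent": "\xE4\xBD\xA0"}'   # what nginx wrote by default
good_line = r'{"http_user_agent": "\u4f60"}'        # the same character as a JSON escape


def try_parse(line):
    """Return the parsed object, or None when the line is not valid JSON."""
    try:
        return json.loads(line)
    except json.JSONDecodeError:
        return None


bad = try_parse(bad_line)    # \x is not a legal JSON escape, so parsing fails
good = try_parse(good_line)  # \u4f60 decodes to the character 你
print(bad, good)
```

Logstash's JSON parser rejects the \x sequence for the same reason, which is where the "Unrecognized character escape 'x'" message comes from.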

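So what do we do? It turns out nginx is aware of this problem: since version 1.11.8, log_format accepts an escape=json setting (strictly speaking, an escape parameter). The full description is in reference 3 at the end of this post.
Let's add it to the configuration file; the part marked with an arrow is what I added.

After the change, run nginx -t && nginx -s reload
Below is the result after the change: the non-ASCII characters are now written correctly. Case closed!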
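To make the two escaping modes concrete, here is a rough Python emulation based on how the nginx log_format documentation describes them. It is only an illustrative sketch, not nginx's own code, and the function names escape_default and escape_json are made up for this example.

```python
import json


def escape_default(value):
    # Default mode: '"', '\' and bytes below 32 or above 126 become \xXX.
    out = []
    for byte in value.encode("utf-8"):
        if byte in (0x22, 0x5C) or byte < 32 or byte > 126:
            out.append("\\x%02X" % byte)
        else:
            out.append(chr(byte))
    return "".join(out)


SHORT_ESCAPES = {0x08: "\\b", 0x09: "\\t", 0x0A: "\\n", 0x0C: "\\f", 0x0D: "\\r"}


def escape_json(value):
    # escape=json mode: '"' and '\' get a backslash, control characters become
    # \n, \r, \t, \b, \f or \u00XX, and everything else passes through as-is.
    out = []
    for ch in value:
        code = ord(ch)
        if ch in '"\\':
            out.append("\\" + ch)
        elif code < 32:
            out.append(SHORT_ESCAPES.get(code, "\\u%04X" % code))
        else:
            out.append(ch)
    return "".join(out)


agent = "你"
default_log = '{"http_user_agent": "%s"}' % escape_default(agent)
json_log = '{"http_user_agent": "%s"}' % escape_json(agent)

print(default_log)           # contains \xE4\xBD\xA0, which is not valid JSON
print(json.loads(json_log))  # parses cleanly
```

With escape=json the UTF-8 bytes are left untouched, so the line stays valid JSON and Logstash can parse it.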

Reference 1: GitHub issue
Reference 2: official ELK tutorial on nginx logs
Reference 3: nginx log_format documentation

Adding the IK Analyzer to Elasticsearch for Better Chinese Search (Part 2)

Tommy Lin — Wed, 23 Jan 2019 10:09:28 GMT

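Preface

Installing the IK analyzer makes Elasticsearch much friendlier to Chinese text, but testing showed that some words are still not segmented correctly. Fortunately there is a fix, and along the way I will also show how to hot-update the dictionary with trending words.

Before the change

Let's see what we get for this sentence: 首先呢, 既然要使用那個模塊, 就必須先確保你的 Nginx 有編譯該模塊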

root@ghost-elastic01:~# curl 'http://localhost:9200/ikhell/_analyze?pretty=true' -H 'Content-Type: application/json' -d '{ "field": "content", "text":"首先呢, 既然要使用那個模塊, 就必須先確保你的 Nginx 有編譯該模塊"}'
{
  "tokens" : [
    {
      "token" : "首先",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "呢",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "CN_CHAR",
      "position" : 1
    },
    {
      "token" : "既然",
      "start_offset" : 5,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "要使",
      "start_offset" : 7,
      "end_offset" : 9,
      "type" : "CN_WORD",
      "position" : 3
    },
    {
      "token" : "使用",
      "start_offset" : 8,
      "end_offset" : 10,
      "type" : "CN_WORD",
      "position" : 4
    },
    {
      "token" : "那",
      "start_offset" : 10,
      "end_offset" : 11,
      "type" : "CN_CHAR",
      "position" : 5
    },
    {
      "token" : "個",
      "start_offset" : 11,
      "end_offset" : 12,
      "type" : "CN_CHAR",
      "position" : 6
    },
    {
      "token" : "模",
      "start_offset" : 12,
      "end_offset" : 13,
      "type" : "CN_CHAR",
      "position" : 7
    },
    {
      "token" : "塊",
      "start_offset" : 13,
      "end_offset" : 14,
      "type" : "CN_CHAR",
      "position" : 8
    },
    {
      "token" : "就",
      "start_offset" : 16,
      "end_offset" : 17,
      "type" : "CN_CHAR",
      "position" : 9
    },
    {
      "token" : "必",
      "start_offset" : 17,
      "end_offset" : 18,
      "type" : "CN_CHAR",
      "position" : 10
    },
    {
      "token" : "須",
      "start_offset" : 18,
      "end_offset" : 19,
      "type" : "CN_CHAR",
      "position" : 11
    },
    {
      "token" : "先",
      "start_offset" : 19,
      "end_offset" : 20,
      "type" : "CN_CHAR",
      "position" : 12
    },
    {
      "token" : "確",
      "start_offset" : 20,
      "end_offset" : 21,
      "type" : "CN_CHAR",
      "position" : 13
    },
    {
      "token" : "保",
      "start_offset" : 21,
      "end_offset" : 22,
      "type" : "CN_CHAR",
      "position" : 14
    },
    {
      "token" : "你",
      "start_offset" : 22,
      "end_offset" : 23,
      "type" : "CN_CHAR",
      "position" : 15
    },
    {
      "token" : "的",
      "start_offset" : 23,
      "end_offset" : 24,
      "type" : "CN_CHAR",
      "position" : 16
    },
    {
      "token" : "nginx",
      "start_offset" : 25,
      "end_offset" : 30,
      "type" : "ENGLISH",
      "position" : 17
    },
    {
      "token" : "有",
      "start_offset" : 31,
      "end_offset" : 32,
      "type" : "CN_CHAR",
      "position" : 18
    },
    {
      "token" : "編",
      "start_offset" : 32,
      "end_offset" : 33,
      "type" : "CN_CHAR",
      "position" : 19
    },
    {
      "token" : "譯",
      "start_offset" : 33,
      "end_offset" : 34,
      "type" : "CN_CHAR",
      "position" : 20
    },
    {
      "token" : "該",
      "start_offset" : 34,
      "end_offset" : 35,
      "type" : "CN_CHAR",
      "position" : 21
    },
    {
      "token" : "模",
      "start_offset" : 35,
      "end_offset" : 36,
      "type" : "CN_CHAR",
      "position" : 22
    },
    {
      "token" : "塊",
      "start_offset" : 36,
      "end_offset" : 37,
      "type" : "CN_CHAR",
      "position" : 23
    }
  ]
}

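The sentence 首先呢, 既然要使用那個模塊, 就必須先確保你的 Nginx 有編譯該模塊 is split into
首先, 呢, 既然, 要使, 使用, 那, 個, 模, 塊, 就, 必, 須, 先, 確, 保, 你, 的, nginx, 有, 編, 譯, 該, 模, 塊

Hmm, far from ideal: most of the Traditional Chinese words are broken into single characters.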

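Adding a custom dictionary

Edit the configuration file

According to the IK analyzer author's documentation, custom dictionary files can be added in the configuration file /etc/elasticsearch/analysis-ik/IKAnalyzer.cfg.xml.

Put the path of our dictionary in the ext_dict entry.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
        <comment>IK Analyzer 扩展配置</comment>
        <entry key="ext_dict">custom/custom.dic</entry>
</properties>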

Add the dictionary file

cd /etc/elasticsearch/analysis-ik
mkdir custom
wget https://raw.githubusercontent.com/samejack/sc-dictionary/master/main.txt -O custom/custom.dic

This uses a very comprehensive community-made dictionary with about a million entries.

Restart Elasticsearch

systemctl restart elasticsearch

Verify the result

With the custom dictionary in place, let's see how the segmentation changes.

root@ghost-elastic01:/etc/elasticsearch/analysis-ik# curl 'http://localhost:9200/ikhell/_analyze?pretty=true' -H 'Content-Type: application/json' -d '{ "field": "content", "text":"首先呢, 既然要使用那個模塊, 就必須先確保你的 Nginx 有編譯該模塊"}'
{
  "tokens" : [
    {
      "token" : "首先",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "呢",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "CN_CHAR",
      "position" : 1
    },
    {
      "token" : "既然",
      "start_offset" : 5,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "要使",
      "start_offset" : 7,
      "end_offset" : 9,
      "type" : "CN_WORD",
      "position" : 3
    },
    {
      "token" : "使用",
      "start_offset" : 8,
      "end_offset" : 10,
      "type" : "CN_WORD",
      "position" : 4
    },
    {
      "token" : "那個",
      "start_offset" : 10,
      "end_offset" : 12,
      "type" : "CN_WORD",
      "position" : 5
    },
    {
      "token" : "模塊",
      "start_offset" : 12,
      "end_offset" : 14,
      "type" : "CN_WORD",
      "position" : 6
    },
    {
      "token" : "就必須",
      "start_offset" : 16,
      "end_offset" : 19,
      "type" : "CN_WORD",
      "position" : 7
    },
    {
      "token" : "必須先",
      "start_offset" : 17,
      "end_offset" : 20,
      "type" : "CN_WORD",
      "position" : 8
    },
    {
      "token" : "必須",
      "start_offset" : 17,
      "end_offset" : 19,
      "type" : "CN_WORD",
      "position" : 9
    },
    {
      "token" : "先",
      "start_offset" : 19,
      "end_offset" : 20,
      "type" : "CN_CHAR",
      "position" : 10
    },
    {
      "token" : "確保",
      "start_offset" : 20,
      "end_offset" : 22,
      "type" : "CN_WORD",
      "position" : 11
    },
    {
      "token" : "你的",
      "start_offset" : 22,
      "end_offset" : 24,
      "type" : "CN_WORD",
      "position" : 12
    },
    {
      "token" : "nginx",
      "start_offset" : 25,
      "end_offset" : 30,
      "type" : "ENGLISH",
      "position" : 13
    },
    {
      "token" : "有",
      "start_offset" : 31,
      "end_offset" : 32,
      "type" : "CN_CHAR",
      "position" : 14
    },
    {
      "token" : "編譯",
      "start_offset" : 32,
      "end_offset" : 34,
      "type" : "CN_WORD",
      "position" : 15
    },
    {
      "token" : "該",
      "start_offset" : 34,
      "end_offset" : 35,
      "type" : "CN_CHAR",
      "position" : 16
    },
    {
      "token" : "模塊",
      "start_offset" : 35,
      "end_offset" : 37,
      "type" : "CN_WORD",
      "position" : 17
    }
  ]
}

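The sentence 首先呢, 既然要使用那個模塊, 就必須先確保你的 Nginx 有編譯該模塊 is now split into
首先, 呢, 既然, 要使, 使用, 那個, 模塊, 就必須, 必須先, 必須, 先, 確保, 你的, nginx, 有, 編譯, 該, 模塊

Much better, isn't it?

Hot updates

New words appear on the internet every day, so we need to be able to update the dictionary without restarting. Here is a walkthrough based on the documentation.
According to the documentation, the configuration file can point to an external endpoint that serves the dictionary, and the plugin uses the Last-Modified and ETag response headers to decide whether to fetch the dictionary again.

When the file is edited, Last-Modified and ETag change, which signals that the file was modified, and the dictionary is fetched again.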
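The decision rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the IK plugin's actual code; the function name check_remote_dict and the header values are hypothetical.

```python
from typing import Dict, Optional, Tuple

# The pair of validators remembered from the previous poll: (Last-Modified, ETag).
Validators = Tuple[Optional[str], Optional[str]]


def check_remote_dict(previous: Validators, headers: Dict[str, str]) -> Tuple[bool, Validators]:
    """Return (should_reload, new_validators) for one poll of the dictionary URL."""
    current = (headers.get("Last-Modified"), headers.get("ETag"))
    # Reload whenever either header differs from what was seen last time.
    return current != previous, current


# Three simulated polls: first sight, unchanged file, edited file.
state: Validators = (None, None)
polls = [
    {"Last-Modified": "Wed, 23 Jan 2019 09:00:00 GMT", "ETag": '"v1"'},
    {"Last-Modified": "Wed, 23 Jan 2019 09:00:00 GMT", "ETag": '"v1"'},
    {"Last-Modified": "Wed, 23 Jan 2019 09:30:00 GMT", "ETag": '"v2"'},
]
decisions = []
for headers in polls:
    changed, state = check_remote_dict(state, headers)
    decisions.append(changed)
print(decisions)
```

A static file served by nginx gets both headers automatically, which is why a plain file behind a location block is enough for the setup that follows.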

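Edit the configuration file

Again, start by editing IKAnalyzer.cfg.xml, and find the remote_ext_dict entry.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
        <comment>IK Analyzer 扩展配置</comment>
        <entry key="ext_dict">custom/custom.dic</entry>
        <entry key="remote_ext_dict">http://127.0.0.1/es/dic</entry>
</properties>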

Because the plugin's configuration has changed, restart Elasticsearch once more: systemctl restart elasticsearch

Configure nginx

Add the following location to the server block

    location /es {
      alias /etc/nginx/elasticsearch;
    }

Reload the configuration: nginx -s reload

Set up the dictionary file

mkdir /etc/nginx/elasticsearch
touch /etc/nginx/elasticsearch/dic

At this point, first test whether the dictionary can be fetched

root@ubuntu-87:~# curl http://127.0.0.1/es/dic

The dictionary is still empty, so an empty response means it works.

Verify the hot update

Before updating, test the segmentation of: 傻眼貓咪氣pupu

root@ubuntu-87:~# curl 'http://localhost:9200/ikhell/_analyze?pretty=true' -H 'Content-Type: application/json' -d '{ "field": "content", "text":"傻眼貓咪氣pupu"}'
{
  "tokens" : [
    {
      "token" : "傻眼",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "貓咪",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "氣",
      "start_offset" : 4,
      "end_offset" : 5,
      "type" : "CN_CHAR",
      "position" : 2
    },
    {
      "token" : "pupu",
      "start_offset" : 5,
      "end_offset" : 9,
      "type" : "ENGLISH",
      "position" : 3
    }
  ]
}

It is split into: 傻眼, 貓咪, 氣, pupu.

Now write the new words into the dictionary
echo -e "傻眼貓咪\n氣pupu" > /etc/nginx/elasticsearch/dic

Then test again

root@ubuntu-87:~# curl 'http://localhost:9200/ikhell/_analyze?pretty=true' -H 'Content-Type: application/json' -d '{ "field": "content", "text":"傻眼貓咪氣pupu"}'
{
  "tokens" : [
    {
      "token" : "傻眼貓咪",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "傻眼",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "貓咪",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "氣pupu",
      "start_offset" : 4,
      "end_offset" : 9,
      "type" : "CN_WORD",
      "position" : 3
    },
    {
      "token" : "pupu",
      "start_offset" : 5,
      "end_offset" : 9,
      "type" : "ENGLISH",
      "position" : 4
    }
  ]
}

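Our trending words have made it in! The result is: 傻眼貓咪, 傻眼, 貓咪, 氣pupu, pupu
In my tests the update did not take effect immediately and could take a few minutes, so give it a little time.

Closing

That wraps up the basic tuning for Chinese search. I will continue along this line with posts on combining the Ghost blog with Elasticsearch. See you next time!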

Adding the IK Analyzer to Elasticsearch for Better Chinese Search (Part 1)

Tommy Lin — Tue, 22 Jan 2019 08:49:59 GMT

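Preface

Elasticsearch is indexing software whose native language is English, and its default segmentation of Chinese is painful to look at. Luckily the IK analyzer exists and gets us out of this awkward situation.

A painful example

Load the data

First, let's index four documents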

curl -XPOST http://localhost:9200/ik/fulltext/1 -H 'Content-Type:application/json' -d '{"content":"美国留给伊拉克的是个烂摊子吗"}'
curl -XPOST http://localhost:9200/ik/fulltext/2 -H 'Content-Type:application/json' -d '{"content":"公安部：各地校车将享最高路权"}'
curl -XPOST http://localhost:9200/ik/fulltext/3 -H 'Content-Type:application/json' -d '{"content":"中韩渔警冲突调查：韩警平均每天扣1艘中国渔船"}'
curl -XPOST http://localhost:9200/ik/fulltext/4 -H 'Content-Type:application/json' -d '{"content":"中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"}'

Try a search

We now have four documents. Let's try searching for 中国

root@ubuntu-87:~# curl -XPOST "http://localhost:9200/ik/fulltext/_search?pretty"  -H 'Content-Type:application/json' -d'
{
    "query" : { "match" : { "content" : "中国" }},
    "highlight" : {
        "fields" : {
            "content" : {}
        }
    }
}
'
{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 1.264571,
    "hits" : [
      {
        "_index" : "ik",
        "_type" : "fulltext",
        "_id" : "4",
        "_score" : 1.264571,
        "_source" : {
          "content" : "中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"
        },
        "highlight" : {
          "content" : [
            "中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"
          ]
        }
      },
      {
        "_index" : "ik",
        "_type" : "fulltext",
        "_id" : "3",
        "_score" : 0.68324494,
        "_source" : {
          "content" : "中韩渔警冲突调查：韩警平均每天扣1艘中国渔船"
        },
        "highlight" : {
          "content" : [
            "中韩渔警冲突调查：韩警平均每天扣1艘中国渔船"
          ]
        }
      },
      {
        "_index" : "ik",
        "_type" : "fulltext",
        "_id" : "1",
        "_score" : 0.2876821,
        "_source" : {
          "content" : "美国留给伊拉克的是个烂摊子吗"
        },
        "highlight" : {
          "content" : [
            "美国留给伊拉克的是个烂摊子吗"
          ]
        }
      }
    ]
  }
}

The search returns three documents. The highlight section shows the terms Elasticsearch considers to be matches

<em>中</em><em>国</em>驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首
<em>中</em>韩渔警冲突调查：韩警平均每天扣1艘<em>中</em><em>国</em>渔船
美<em>国</em>留给伊拉克的是个烂摊子吗

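This is clearly not the result we want!

The cause

The problem is the tokenization. The default analyzer knows nothing about Chinese words, so it splits the text character by character. A query for 中国 therefore becomes a search for 中 or 国, and any document containing either character matches. If tokenization is new to you, see the post Analyzers in Elasticsearch.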
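The effect is easy to reproduce outside Elasticsearch. The sketch below is a toy model in Python, not the real standard analyzer: char_tokens simply emits one token per Chinese character, and match returns every document that shares at least one token with the query, which is how a match query behaves by default.

```python
docs = {
    1: "美国留给伊拉克的是个烂摊子吗",
    2: "公安部：各地校车将享最高路权",
    3: "中韩渔警冲突调查：韩警平均每天扣1艘中国渔船",
    4: "中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首",
}


def char_tokens(text):
    # One token per CJK ideograph: a crude stand-in for per-character splitting.
    return [ch for ch in text if "\u4e00" <= ch <= "\u9fff"]


def match(query, tokenizer):
    # A document matches when it shares at least one token with the query.
    wanted = set(tokenizer(query))
    return sorted(doc_id for doc_id, text in docs.items() if wanted & set(tokenizer(text)))


hits = match("中国", char_tokens)
print(hits)
```

Document 1 is returned only because 美国 contains the character 国, which is exactly the false positive seen in the search result above.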

Let's see how our sentences were actually tokenized.
Take the first document as an example: 美国留给伊拉克的是个烂摊子吗

root@ubuntu-87:~# curl -H "Content-Type:application/json"  "http://localhost:9200/ik/fulltext/1/_termvectors?pretty" -d '{ "fields" : ["content"] }'
{
  "_index" : "ik",
  "_type" : "fulltext",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "took" : 0,
  "term_vectors" : {
    "content" : {
      "field_statistics" : {
        "sum_doc_freq" : 14,
        "doc_count" : 1,
        "sum_ttf" : 14
      },
      "terms" : {
        "个" : {
          "term_freq" : 1,
          "tokens" : [
            {
              "position" : 9,
              "start_offset" : 9,
              "end_offset" : 10
            }
          ]
        },
        "伊" : {
          "term_freq" : 1,
          "tokens" : [
            {
              "position" : 4,
              "start_offset" : 4,
              "end_offset" : 5
            }
          ]
        },
        "克" : {
          "term_freq" : 1,
          "tokens" : [
            {
              "position" : 6,
              "start_offset" : 6,
              "end_offset" : 7
            }
          ]
        },
        "吗" : {
          "term_freq" : 1,
          "tokens" : [
            {
              "position" : 13,
              "start_offset" : 13,
              "end_offset" : 14
            }
          ]
        },
        "国" : {
          "term_freq" : 1,
          "tokens" : [
            {
              "position" : 1,
              "start_offset" : 1,
              "end_offset" : 2
            }
          ]
        },
        "子" : {
          "term_freq" : 1,
          "tokens" : [
            {
              "position" : 12,
              "start_offset" : 12,
              "end_offset" : 13
            }
          ]
        },
        "拉" : {
          "term_freq" : 1,
          "tokens" : [
            {
              "position" : 5,
              "start_offset" : 5,
              "end_offset" : 6
            }
          ]
        },
        "摊" : {
          "term_freq" : 1,
          "tokens" : [
            {
              "position" : 11,
              "start_offset" : 11,
              "end_offset" : 12
            }
          ]
        },
        "是" : {
          "term_freq" : 1,
          "tokens" : [
            {
              "position" : 8,
              "start_offset" : 8,
              "end_offset" : 9
            }
          ]
        },
        "烂" : {
          "term_freq" : 1,
          "tokens" : [
            {
              "position" : 10,
              "start_offset" : 10,
              "end_offset" : 11
            }
          ]
        },
        "留" : {
          "term_freq" : 1,
          "tokens" : [
            {
              "position" : 2,
              "start_offset" : 2,
              "end_offset" : 3
            }
          ]
        },
        "的" : {
          "term_freq" : 1,
          "tokens" : [
            {
              "position" : 7,
              "start_offset" : 7,
              "end_offset" : 8
            }
          ]
        },
        "给" : {
          "term_freq" : 1,
          "tokens" : [
            {
              "position" : 3,
              "start_offset" : 3,
              "end_offset" : 4
            }
          ]
        },
        "美" : {
          "term_freq" : 1,
          "tokens" : [
            {
              "position" : 0,
              "start_offset" : 0,
              "end_offset" : 1
            }
          ]
        }
      }
    }
  }
}

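Every character of the sentence became its own term. Chopped to pieces!

The IK analyzer

To fix this dreadful Chinese segmentation we have to switch analyzers. The IK analyzer is a third-party plugin and the star of this post: the cure for an Elasticsearch that splits Chinese at random.

GitHub link: elasticsearch-analysis-ik

Installation

We install it directly with the built-in plugin tool. One thing to watch out for: the Elasticsearch version and the IK analyzer version must match.

# go to the Elasticsearch directory
cd /usr/share/elasticsearch

# install
./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.4.2/elasticsearch-analysis-ik-6.4.2.zip

# takes effect after restarting Elasticsearch
systemctl restart elasticsearch

Testing

Time to test. First, create an index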

curl -XPUT "http://localhost:9200/ik1?pretty"
{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "ik1"
}

Then configure the mapping to tell Elasticsearch to use the IK analyzer

curl -XPOST "http://localhost:9200/ik1/fulltext/_mapping?pretty" -H 'Content-Type:application/json' -d'
{
        "properties": {
            "content": {
                "type": "text",
                "analyzer": "ik_max_word",
                "search_analyzer": "ik_max_word"
            }
        }

}'

{
  "acknowledged" : true
}

Next, index the same four documents as before

curl -XPOST http://localhost:9200/ik1/fulltext/1 -H 'Content-Type:application/json' -d '{"content":"美国留给伊拉克的是个烂摊子吗"}'
curl -XPOST http://localhost:9200/ik1/fulltext/2 -H 'Content-Type:application/json' -d '{"content":"公安部：各地校车将享最高路权"}'
curl -XPOST http://localhost:9200/ik1/fulltext/3 -H 'Content-Type:application/json' -d '{"content":"中韩渔警冲突调查：韩警平均每天扣1艘中国渔船"}'
curl -XPOST http://localhost:9200/ik1/fulltext/4 -H 'Content-Type:application/json' -d '{"content":"中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"}'

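Finally, run the same search again. The results are much more accurate now

curl -XPOST "http://localhost:9200/ik1/fulltext/_search?pretty" -H 'Content-Type:application/json' -d'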
{
    "query" : { "match" : { "content" : "中国" }},
    "highlight" : {
        "fields" : {
            "content" : {}
        }
    }
}
' 

{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.6489038,
    "hits": [
      {
        "_index": "ik1",
        "_type": "fulltext",
        "_id": "4",
        "_score": 0.6489038,
        "_source": {
          "content": "中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"
        },
        "highlight": {
          "content": [
            "中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"
          ]
        }
      },
      {
        "_index": "ik1",
        "_type": "fulltext",
        "_id": "3",
        "_score": 0.2876821,
        "_source": {
          "content": "中韩渔警冲突调查：韩警平均每天扣1艘中国渔船"
        },
        "highlight": {
          "content": [
            "中韩渔警冲突调查：韩警平均每天扣1艘中国渔船"
          ]
        }
      }
    ]
  }
}

The search returns two documents

中国驻洛杉矶领事馆遭亚裔男子枪击嫌犯已自首
中韩渔警冲突调查：韩警平均每天扣1艘中国渔船

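That's more like it!

Verify the tokenization

Let's see how Elasticsearch tokenizes the text once the IK analyzer is in use.
Again, take the first document as an example: 美国留给伊拉克的是个烂摊子吗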

root@ubuntu-87:~# curl -H "Content-Type:application/json"  "http://localhost:9200/ik1/fulltext/1/_termvectors?pretty" -d '{ "fields": ["content"] }'
{
  "_index" : "ik1",
  "_type" : "fulltext",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "took" : 0,
  "term_vectors" : {
    "content" : {
      "field_statistics" : {
        "sum_doc_freq" : 9,
        "doc_count" : 1,
        "sum_ttf" : 9
      },
      "terms" : {
        "个" : {
          "term_freq" : 1,
          "tokens" : [
            {
              "position" : 5,
              "start_offset" : 9,
              "end_offset" : 10
            }
          ]
        },
        "伊拉克" : {
          "term_freq" : 1,
          "tokens" : [
            {
              "position" : 2,
              "start_offset" : 4,
              "end_offset" : 7
            }
          ]
        },
        "吗" : {
          "term_freq" : 1,
          "tokens" : [
            {
              "position" : 8,
              "start_offset" : 13,
              "end_offset" : 14
            }
          ]
        },
        "摊子" : {
          "term_freq" : 1,
          "tokens" : [
            {
              "position" : 7,
              "start_offset" : 11,
              "end_offset" : 13
            }
          ]
        },
        "是" : {
          "term_freq" : 1,
          "tokens" : [
            {
              "position" : 4,
              "start_offset" : 8,
              "end_offset" : 9
            }
          ]
        },
        "烂摊子" : {
          "term_freq" : 1,
          "tokens" : [
            {
              "position" : 6,
              "start_offset" : 10,
              "end_offset" : 13
            }
          ]
        },
        "留给" : {
          "term_freq" : 1,
          "tokens" : [
            {
              "position" : 1,
              "start_offset" : 2,
              "end_offset" : 4
            }
          ]
        },
        "的" : {
          "term_freq" : 1,
          "tokens" : [
            {
              "position" : 3,
              "start_offset" : 7,
              "end_offset" : 8
            }
          ]
        },
        "美国" : {
          "term_freq" : 1,
          "tokens" : [
            {
              "position" : 0,
              "start_offset" : 0,
              "end_offset" : 2
            }
          ]
        }
      }
    }
  }
}

Closing

With the IK analyzer, Chinese sentences that used to be chopped into a mess are finally segmented into real words.

Analyzers in Elasticsearch

Tommy Lin — Tue, 22 Jan 2019 06:18:45 GMT

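Preface

Many people who are new to Elasticsearch run into the same problem: the content is plainly there in the document, so why can't I find it? Very often the reason is that the analyzer is not configured correctly.

An example

Searching for an English -ing word

First, index a document whose content is "Set the shape to semi-transparent by calling set_trans(5)"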

root@ubuntu-87:~# curl -XPOST http://localhost:9200/demo/demotype/1 -H 'Content-Type:application/json' -d '
{
  "content": "Set the shape to semi-transparent by calling set_trans(5)"
}' | jq .

{
  "_index": "demo",
  "_type": "demotype",
  "_id": "1",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 0,
  "_primary_term": 1
}

Then we search for call. The document is not found.

root@ubuntu-87:~# curl -s -XPOST http://localhost:9200/demo/demotype/_search  -H 'Content-Type:application/json' -d '
{
  "query" : {
    "match" : {
      "content" : "call"
    }
  }
}' | jq .

{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}

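The cause

Before Elasticsearch can search a document, it has to split the text into terms (tokenization) and process them, then build an inverted index from those terms. Only after that can users find documents by keyword.

How the text is tokenized and processed is determined by the analyzer configuration.

In other words, the term Elasticsearch indexed is calling, not call, so the search finds nothing.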
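A toy inverted index makes the miss obvious. The Python sketch below is an assumption-laden simplification: standard_like only lowercases and splits on anything other than letters, digits and underscores, which happens to reproduce the default analyzer's output for this one sentence but is not the real standard analyzer.

```python
import re
from collections import defaultdict


def standard_like(text):
    # Lowercase, then keep runs of letters, digits and underscores as terms.
    return re.findall(r"[a-z0-9_]+", text.lower())


docs = {1: "Set the shape to semi-transparent by calling set_trans(5)"}

# term -> set of ids of the documents that contain it
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in standard_like(text):
        index[term].add(doc_id)

print(sorted(index))
print(index.get("calling", set()))  # the stored term, so it matches
print(index.get("call", set()))     # never stored, so nothing matches
```

A lookup is an exact match on stored terms, so "call" only hits if the analyzer reduced "calling" to "call" at index time.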

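Introducing analyzers

Documentation first
English: Elasticsearch: The Definitive Guide - Analysis and Analyzers
Chinese: Elasticsearch: 权威指南 - 分析与分析器

An analyzer is made up of character filters, a tokenizer, and token filters. They work in the following order

Character filters: preprocessing
First, the string passes through each character filter in order. Their job is to tidy up the string before tokenization. A character filter can strip out HTML, or convert & into and.
Tokenizer: the main tokenization work
Next, the tokenizer splits the string into individual terms. A simple tokenizer might split the text into terms wherever it meets whitespace or punctuation.
Token filters: post-processing
Finally, the terms pass through each token filter in order. This step can change terms (for example, lowercasing Quick), remove terms (for example, stop words such as a, and, the), or add terms (for example, synonyms such as jump and leap).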
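The three stages can be sketched as three small Python functions chained together. This is a toy pipeline for illustration only: the function names, the regular expressions and the stop-word list are assumptions, not Elasticsearch internals.

```python
import re

STOPWORDS = {"a", "and", "the"}


def char_filter(text):
    # Stage 1: tidy the raw string (strip HTML tags, turn & into "and").
    text = re.sub(r"<[^>]+>", "", text)
    return text.replace("&", " and ")


def tokenizer(text):
    # Stage 2: split into terms on whitespace and punctuation.
    return re.findall(r"\w+", text)


def token_filters(tokens):
    # Stage 3: lowercase every term, then drop stop words.
    lowered = [token.lower() for token in tokens]
    return [token for token in lowered if token not in STOPWORDS]


def analyze(text):
    return token_filters(tokenizer(char_filter(text)))


terms = analyze("<b>The Quick</b> & brown fox")
print(terms)
```

Each stage only sees the output of the previous one, which is why swapping a single stage (for example the tokenizer) changes what ends up in the index.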

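Built-in analyzers at a glance

Elasticsearch ships with several analyzers. Applying different analyzers to the sample sentence produces different index terms

Standard analyzer
The standard analyzer is the default analyzer in Elasticsearch and the usual choice for text in most languages. It splits text on word boundaries as defined by the Unicode Consortium, removes most punctuation, and finally lowercases the terms. It produces
set, the, shape, to, semi, transparent, by, calling, set_trans, 5
Simple analyzer
The simple analyzer splits the text at anything that is not a letter and lowercases the terms. It produces
set, the, shape, to, semi, transparent, by, calling, set, trans
Whitespace analyzer
The whitespace analyzer splits the text on whitespace. It produces
Set, the, shape, to, semi-transparent, by, calling, set_trans(5)
Language analyzers
Language-specific analyzers are available for many languages and take the characteristics of each language into account. For example, the english analyzer comes with a set of English stop words (common words such as and or the that contribute little to relevance), which it removes. Because it understands the rules of English grammar, it can also reduce English words to their stems.

The english analyzer produces the following terms:

set, shape, semi, transpar, call, set_tran, 5
Note that transparent, calling and set_trans have been reduced to their stems.

By now it should be clear that to match on English word stems you have to use a language analyzer.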
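The whitespace and simple analyzers are simple enough to imitate in a couple of lines, which makes the difference between the token lists easy to check. The Python functions below are rough imitations for this ASCII-only sentence, not the Elasticsearch implementations.

```python
import re

TEXT = "Set the shape to semi-transparent by calling set_trans(5)"


def whitespace_analyzer(text):
    # Split on whitespace only; case and punctuation are kept.
    return text.split()


def simple_analyzer(text):
    # Lowercase, then split at anything that is not a letter.
    return re.findall(r"[a-z]+", text.lower())


whitespace_terms = whitespace_analyzer(TEXT)
simple_terms = simple_analyzer(TEXT)
print(whitespace_terms)
print(simple_terms)
```

Neither imitation stems anything, so a search for call would still miss with either of them; only the language analyzer stores call.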

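Configuring an analyzer

The analyzer is configured in the index mapping, and a changed mapping does not affect documents that are already indexed, so we create a separate index, demo1, for this demonstration

Create the index: curl -XPUT http://localhost:9200/demo1
Configure the mapping: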

curl -XPOST http://localhost:9200/demo1/demotype/_mapping -H 'Content-Type:application/json' -d'
{
  "properties": {
    "content": {
      "type": "text",
      "analyzer": "english"
    }
  }
}'

Index the document: curl -XPOST http://localhost:9200/demo1/demotype/1 -H 'Content-Type:application/json' -d '{"content":"Set the shape to semi-transparent by calling set_trans(5)"}'
Try the search:

root@ubuntu-87:~# curl -s -XPOST http://localhost:9200/demo1/demotype/_search  -H 'Content-Type:application/json' -d '
{
  "query" : {
    "match" : {
      "content" : "call"
    }
  }
}' | jq .

{
  "took": 19,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "demo1",
        "_type": "demotype",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "content": "Set the shape to semi-transparent by calling set_trans(5)"
        }
      }
    ]
  }
}

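Inspecting the tokenization

Even with this configuration in place, the only way to check it so far is to run a search and see whether it hits, which is still rather vague.
So Elasticsearch provides two APIs for inspecting the tokenization directly: _termvectors and _analyze. Both return the following information

the terms the text was split into
start_offset, end_offset: where the term appears in the text, used for search highlighting
position: the order in which the term appears
type: the category of the term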
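As an illustration of why the offsets matter, the Python sketch below wraps a matched term in <em> tags using nothing but start_offset and end_offset. The highlight function is a hypothetical helper written for this example, and the two token entries reuse the offsets that the default analyzer reports for calling and shape in the sample sentence.

```python
def highlight(text, tokens, term):
    """Wrap every occurrence of `term` in <em> tags, using token offsets only."""
    spans = sorted(
        (token["start_offset"], token["end_offset"])
        for token in tokens
        if token["token"] == term
    )
    pieces, cursor = [], 0
    for start, end in spans:
        pieces.append(text[cursor:start])                  # text before the match
        pieces.append("<em>" + text[start:end] + "</em>")  # the matched slice
        cursor = end
    pieces.append(text[cursor:])                           # text after the last match
    return "".join(pieces)


text = "Set the shape to semi-transparent by calling set_trans(5)"
tokens = [
    {"token": "shape", "start_offset": 8, "end_offset": 13, "position": 2},
    {"token": "calling", "start_offset": 37, "end_offset": 44, "position": 7},
]
result = highlight(text, tokens, "calling")
print(result)
```

Because the offsets point into the original text, the highlighted slice keeps its original spelling even when the indexed term was lowercased or stemmed.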

Tokenization with the default settings

Using _analyze

root@ubuntu-87:~# curl 'http://localhost:9200/demo/_analyze?pretty=true' -H 'Content-Type: application/json' -d '{ "field": "content", "text":"Set the shape to semi-transparent by calling set_trans(5)"}'

{
  "tokens" : [
    {
      "token" : "set",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "the",
      "start_offset" : 4,
      "end_offset" : 7,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "shape",
      "start_offset" : 8,
      "end_offset" : 13,
      "type" : "<ALPHANUM>",
      "position" : 2
    },
    {
      "token" : "to",
      "start_offset" : 14,
      "end_offset" : 16,
      "type" : "<ALPHANUM>",
      "position" : 3
    },
    {
      "token" : "semi",
      "start_offset" : 17,
      "end_offset" : 21,
      "type" : "<ALPHANUM>",
      "position" : 4
    },
    {
      "token" : "transparent",
      "start_offset" : 22,
      "end_offset" : 33,
      "type" : "<ALPHANUM>",
      "position" : 5
    },
    {
      "token" : "by",
      "start_offset" : 34,
      "end_offset" : 36,
      "type" : "<ALPHANUM>",
      "position" : 6
    },
    {
      "token" : "calling",
      "start_offset" : 37,
      "end_offset" : 44,
      "type" : "<ALPHANUM>",
      "position" : 7
    },
    {
      "token" : "set_trans",
      "start_offset" : 45,
      "end_offset" : 54,
      "type" : "<ALPHANUM>",
      "position" : 8
    },
    {
      "token" : "5",
      "start_offset" : 55,
      "end_offset" : 56,
      "type" : "<NUM>",
      "position" : 9
    }
  ]
}

Using _termvectors

root@ubuntu-87:~# curl -H "Content-Type:application/json"  'http://localhost:9200/demo/demotype/1/_termvectors?pretty=true' -d '{"fields" : ["content"]}'

{
  "_index" : "demo",
  "_type" : "demotype",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "took" : 0,
  "term_vectors" : {
    "content" : {
      "field_statistics" : {
        "sum_doc_freq" : 10,
        "doc_count" : 1,
        "sum_ttf" : 10
      },
      "terms" : {
        "5" : {
          "term_freq" : 1,
          "tokens" : [
            {
              "position" : 9,
              "start_offset" : 55,
              "end_offset" : 56
            }
          ]
        },
        "by" : {
          "term_freq" : 1,
          "tokens" : [
            {
              "position" : 6,
              "start_offset" : 34,
              "end_offset" : 36
            }
          ]
        },
        "calling" : {
          "term_freq" : 1,
          "tokens" : [
            {
              "position" : 7,
              "start_offset" : 37,
              "end_offset" : 44
            }
          ]
        },
        "semi" : {
          "term_freq" : 1,
          "tokens" : [
            {
              "position" : 4,
              "start_offset" : 17,
              "end_offset" : 21
            }
          ]
        },
        "set" : {
          "term_freq" : 1,
          "tokens" : [
            {
              "position" : 0,
              "start_offset" : 0,
              "end_offset" : 3
            }
          ]
        },
        "set_trans" : {
          "term_freq" : 1,
          "tokens" : [
            {
              "position" : 8,
              "start_offset" : 45,
              "end_offset" : 54
            }
          ]
        },
        "shape" : {
          "term_freq" : 1,
          "tokens" : [
            {
              "position" : 2,
              "start_offset" : 8,
              "end_offset" : 13
            }
          ]
        },
        "the" : {
          "term_freq" : 1,
          "tokens" : [
            {
              "position" : 1,
              "start_offset" : 4,
              "end_offset" : 7
            }
          ]
        },
        "to" : {
          "term_freq" : 1,
          "tokens" : [
            {
              "position" : 3,
              "start_offset" : 14,
              "end_offset" : 16
            }
          ]
        },
        "transparent" : {
          "term_freq" : 1,
          "tokens" : [
            {
              "position" : 5,
              "start_offset" : 22,
              "end_offset" : 33
            }
          ]
        }
      }
    }
  }
}

Tokenization result: after configuration

Checking with _analyze

root@ubuntu-87:~# curl 'http://localhost:9200/demo1/_analyze?pretty=true' -H 'Content-Type: application/json' -d '{ "field": "content", "text":"Set the shape to semi-transparent by calling set_trans(5)"}'
{
  "tokens" : [
    {
      "token" : "set",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "",
      "position" : 0
    },
    {
      "token" : "shape",
      "start_offset" : 8,
      "end_offset" : 13,
      "type" : "",
      "position" : 2
    },
    {
      "token" : "semi",
      "start_offset" : 17,
      "end_offset" : 21,
      "type" : "",
      "position" : 4
    },
    {
      "token" : "transpar",
      "start_offset" : 22,
      "end_offset" : 33,
      "type" : "",
      "position" : 5
    },
    {
      "token" : "call",
      "start_offset" : 37,
      "end_offset" : 44,
      "type" : "",
      "position" : 7
    },
    {
      "token" : "set_tran",
      "start_offset" : 45,
      "end_offset" : 54,
      "type" : "",
      "position" : 8
    },
    {
      "token" : "5",
      "start_offset" : 55,
      "end_offset" : 56,
      "type" : "",
      "position" : 9
    }
  ]
}

Checking with _termvectors

root@ubuntu-87:~# curl -H "Content-Type:application/json"  'http://localhost:9200/demo1/demotype/1/_termvectors?pretty=true' -d '{"fields" : ["content"]}'
{
  "_index" : "demo1",
  "_type" : "demotype",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "took" : 1,
  "term_vectors" : {
    "content" : {
      "field_statistics" : {
        "sum_doc_freq" : 7,
        "doc_count" : 1,
        "sum_ttf" : 7
      },
      "terms" : {
        "5" : {
          "term_freq" : 1,
          "tokens" : [
            {
              "position" : 9,
              "start_offset" : 55,
              "end_offset" : 56
            }
          ]
        },
        "call" : {
          "term_freq" : 1,
          "tokens" : [
            {
              "position" : 7,
              "start_offset" : 37,
              "end_offset" : 44
            }
          ]
        },
        "semi" : {
          "term_freq" : 1,
          "tokens" : [
            {
              "position" : 4,
              "start_offset" : 17,
              "end_offset" : 21
            }
          ]
        },
        "set" : {
          "term_freq" : 1,
          "tokens" : [
            {
              "position" : 0,
              "start_offset" : 0,
              "end_offset" : 3
            }
          ]
        },
        "set_tran" : {
          "term_freq" : 1,
          "tokens" : [
            {
              "position" : 8,
              "start_offset" : 45,
              "end_offset" : 54
            }
          ]
        },
        "shape" : {
          "term_freq" : 1,
          "tokens" : [
            {
              "position" : 2,
              "start_offset" : 8,
              "end_offset" : 13
            }
          ]
        },
        "transpar" : {
          "term_freq" : 1,
          "tokens" : [
            {
              "position" : 5,
              "start_offset" : 22,
              "end_offset" : 33
            }
          ]
        }
      }
    }
  }
}
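To see at a glance what changed between the two indices, the term lists from the two _termvectors outputs can be compared as sets. This is only a small Python sketch: the term sets are copied by hand from the outputs above, and compare_terms is a helper name made up for illustration.

```python
# Terms reported by _termvectors for the same sentence,
# copied from the outputs for index demo and index demo1.
default_terms = {
    "5", "by", "calling", "semi", "set", "set_trans",
    "shape", "the", "to", "transparent",
}
configured_terms = {"5", "call", "semi", "set", "set_tran", "shape", "transpar"}


def compare_terms(before, after):
    """Return (only in before, only in after, shared), each as a sorted list."""
    return sorted(before - after), sorted(after - before), sorted(before & after)


only_default, only_configured, shared = compare_terms(default_terms, configured_terms)
print("only in demo: ", only_default)
print("only in demo1:", only_configured)
print("shared:       ", shared)
```

The words the, to and by have no counterpart in demo1, while calling, set_trans and transparent show up in shortened form, which matches the drop of sum_doc_freq from 10 to 7.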

Wrapping up

That covers the basic concepts of analyzers. A later post will add the ik analyzer, a handy tool for analyzing Chinese text.

Using Nginx as a Cache Server

Tommy Lin — Mon, 21 Jan 2019 08:21:14 GMT

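Cache Server

A cache server reduces the load and traffic on the origin server. It stores the results of users' requests locally, so the origin no longer has to answer the same request over and over: the cache server replies on its behalf.

How it works

First request: the cache server has no data yet, so it goes back to the origin to fetch it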
User  -->  Cache Server  -->  Origin Server
User  <--  Cache Server  <--  Origin Server

Second request: the cache server already holds a cached copy and answers directly, without contacting the origin
User  -->  Cache Server    Origin Server
User  <--  Cache Server    Origin Server
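The two flows above boil down to "look in the cache first, go to the origin only on a miss". Here is a toy Python sketch of that idea. It is not how nginx is implemented; CacheServer, origin and origin_calls are names invented for this illustration.

```python
class CacheServer:
    """Toy cache server: answers from memory when it can, otherwise asks the origin."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin  # callable that plays the origin server
        self._store = {}                 # cache key -> cached response body

    def get(self, key):
        if key in self._store:
            return self._store[key], "HIT"   # answered without touching the origin
        body = self._fetch(key)              # first request: go back to the origin
        self._store[key] = body
        return body, "MISS"


origin_calls = []  # records every request that actually reached the origin


def origin(key):
    origin_calls.append(key)
    return f"content of {key}"


cache = CacheServer(origin)
print(cache.get("/index.html"))  # first request
print(cache.get("/index.html"))  # second request
print("requests that reached the origin:", len(origin_calls))
```

The second call never reaches origin, which is exactly the load reduction a cache server is meant to provide.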

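Configuration

Environment

This lab uses two Ubuntu machines, both with nginx installed
proxy.demosite.com: domain name of the cache server
origin.demosite.com: domain name of the origin server

Cache server configuration

First edit /etc/nginx/conf.d/cache.conf, which holds the global cache settings, with the following content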

proxy_cache_path /tmp/nginx/cache levels=1:2 keys_zone=myzone:10m inactive=1d max_size=10g;
proxy_cache_key '$scheme$host$request_uri';

Next edit /etc/nginx/sites-enabled/proxy.conf to create a new site, with the following content

server {
    listen 80;

    server_name proxy.demosite.com;

    location ~ .*\.(html|png)$ {
        proxy_cache myzone;
        proxy_cache_valid  any 100m;
        proxy_pass http://origin.demosite.com;
    }

    location /info {
        proxy_cache myzone;
        expires -1;

        proxy_pass http://origin.demosite.com;
    }

    add_header X-Cache-Status $upstream_cache_status;
}

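The configured cache directory must exist: mkdir -p /tmp/nginx/cache
Once the configuration is in place, reload it: nginx -s reload

Origin server configuration

The origin server needs no additional configuration.

Configuration explained

This section explains the basic directives used here. For the full reference, see Module ngx_http_proxy_module

Global cache settings

Start with the contents of /etc/nginx/conf.d/cache.conf shown above
proxy_cache_path: the cache path, i.e. where cached content is stored
levels=1:2: the directory layout of the cache
keys_zone=myzone:10m: the name of the zone and the amount of shared memory it may use; a 1 MB zone can hold roughly 8,000 keys
inactive=1d: entries that nobody accesses for one day are removed from the cache
max_size=10g: the maximum disk space the cache may occupy
proxy_cache_key '$scheme$host$request_uri';: how the key of each cache entry is built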

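Virtual site settings

This is /etc/nginx/sites-enabled/proxy.conf
proxy_cache myzone;: use the zone defined above
proxy_cache_valid any 100m;: cache any response, whatever its status code, for 100 minutes
expires -1;: sets an already-expired Expires header (and Cache-Control: no-cache) on the response, so clients do not cache it
add_header X-Cache-Status $upstream_cache_status;: adds a response header that shows clearly whether the request hit the cache

Verification

After reloading nginx, we can test how the cache behaves.

Inspecting the cache path

First look at the cache path, which is currently empty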

root@ubuntu-87:/etc/nginx/conf.d# tree /tmp/nginx/cache/
/tmp/nginx/cache/

0 directories, 0 files

First request to the cache server

Now send one request to the cache server

❯ curl -v proxy.demosite.com
* Rebuilt URL to: proxy.demosite.com/
*   Trying 192.168.41.87...
* TCP_NODELAY set
* Connected to proxy.demosite.com (192.168.41.87) port 80 (#0)
> GET / HTTP/1.1
> Host: proxy.demosite.com
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Server: nginx/1.10.3 (Ubuntu)
< Date: Mon, 21 Jan 2019 08:57:37 GMT
< Content-Type: text/html
< Content-Length: 334
< Connection: keep-alive
< Last-Modified: Tue, 15 Jan 2019 04:20:55 GMT
< ETag: "5c3d5fa7-14e"
< X-Cache-Status: MISS
< Accept-Ranges: bytes
<

  
  this is 41.166
  gun.png
  
  


  about/about.png
  
  


  info/info.png
  
  


  about
  info
  

* Connection #0 to host proxy.demosite.com left intact

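Notice that the response now carries an extra header, X-Cache-Status: MISS, which tells us two things:

Our cache configuration has taken effect
MISS means the cache server had no cached copy and went back to the origin for the data; a cached response would show HIT instead

Inspecting the cache path again

After the request, the cache path holds one entry.
Its directory layout follows the levels=1:2 setting configured above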

root@ubuntu-87:/etc/nginx/conf.d# tree /tmp/nginx/cache/
/tmp/nginx/cache/
└── 3
    └── 51
        └── 600a83b12cb107e844ddd14077759513

Second request to the cache server

On the second request the cache server already holds the data, so we get a HIT.

❯ curl -v "proxy.demosite.com" 2>&1 | grep X-Cache-Status
< X-Cache-Status: HIT

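Cache keys: proxy_cache_key

This part matters for both cache correctness and later cache purging, so it deserves a separate explanation.

What the key means

For each request, how does the cache server know whether it already holds a cached copy? It looks at proxy_cache_key.
Recall that we configured proxy_cache_key '$scheme$host$request_uri';

For the request we just made, proxy.demosite.com , the resulting key is httpproxy.demosite.com/

(http)  proxy.demosite.com (/)
$scheme $host              $request_uri

Now look at the cache file that was just created. It contains the complete HTML response, along with the key it was stored under
KEY: httpproxy.demosite.com/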

root@ubuntu-87:/etc/nginx/conf.d# cat /tmp/nginx/cache/3/51/600a83b12cb107e844ddd14077759513
0E\_=\E\G"5c3d5fa7-14e"
KEY: httpproxy.demosite.com/
HTTP/1.1 200 OK
Server: nginx/1.10.3 (Ubuntu)
Date: Mon, 21 Jan 2019 09:07:12 GMT
Content-Type: text/html
Content-Length: 334
Last-Modified: Tue, 15 Jan 2019 04:20:55 GMT
Connection: close
ETag: "5c3d5fa7-14e"
Accept-Ranges: bytes


  
  this is 41.166
  gun.png
  
  


  about/about.png
  
  


  info/info.png
  
  


  about
  info

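So as long as the keys of two requests differ, nginx creates separate cache files for them.
In other words, if these two requests should return different results on your site,

proxy.demosite.com?name=foo
proxy.demosite.com?name=bar
your proxy_cache_key must be specific down to the query arguments. $request_uri already includes the query string, so it covers this case; a key built from $uri alone would ignore the arguments, and nginx would treat both requests as the same document.

Conversely, if http and https do not need separate cache entries on your site, you can drop $scheme from the key.

Computing the hash

Where does the MD5 file name come from? It is the hash of the key, which we can verify (md5 is the macOS command; on Linux use md5sum)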

❯ echo -n 'httpproxy.demosite.com/' | md5
600a83b12cb107e844ddd14077759513
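Putting the key, the MD5 hash and levels=1:2 together, the location of a cache file can be predicted from the request alone. The Python sketch below follows the layout seen in the tree output above (last hex character, then the two characters before it). The function names are made up for this example, and /tmp/nginx/cache is simply the path used in this post.

```python
import hashlib


def cache_key(scheme, host, request_uri):
    """Concatenate the parts the same way '$scheme$host$request_uri' does."""
    return f"{scheme}{host}{request_uri}"


def path_for_digest(digest, cache_root="/tmp/nginx/cache"):
    """Lay out a hex digest under cache_root using levels=1:2."""
    level1 = digest[-1]      # last character -> first directory level
    level2 = digest[-3:-1]   # the two characters before it -> second level
    return f"{cache_root}/{level1}/{level2}/{digest}"


def cache_file_path(key, cache_root="/tmp/nginx/cache"):
    """MD5 the key, then map the digest to its path on disk."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return path_for_digest(digest, cache_root)


key = cache_key("http", "proxy.demosite.com", "/")
print(key)
print(cache_file_path(key))
# The digest from the session above, mapped to its directory:
print(path_for_digest("600a83b12cb107e844ddd14077759513"))
```

Two requests that differ only in their query string produce different keys here, and therefore different files, which is the behaviour described in the previous section.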

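Purging the cache

Once caching is configured, we also need a way to purge it; otherwise users keep seeing stale data after the site is updated.

Method 1: delete the whole directory

This is the most direct approach: wipe the entire cache without further thought
rm -rf /tmp/nginx/cache/*
nginx -s reload

Pros: quick and simple
Cons: no control over which entries are deleted

This method suits simple sites, where there is no need to worry that a cache server suddenly losing all its files will trigger a burst of requests to the origin

Method 2: delete a specific directory or domain

What does deleting a specific directory mean? Say we want to remove every cached entry under /info,
or the cache server fronts two sites and we only want to purge one of them. This method covers both cases.

Recall that every cache file stores the key we defined
Find the files whose key matches
Delete them in bulk

First, a look at the current state: my cache holds four entries, and their keys are extracted below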

root@ubuntu-87:/etc/nginx/conf.d# tree /tmp/nginx/cache/
/tmp/nginx/cache/
├── 3
│   ├── 51
│   │   └── 600a83b12cb107e844ddd14077759513
│   └── 9b
│       └── 3ace4f08b744df7310abeb90109e19b3
├── 5
│   └── d8
│       └── 80af67f39abf128943025ab5e1498d85
└── c
    └── 19
        └── 53e313c4a7c020ab414089c28c4c419c


root@ubuntu-87:/etc/nginx/conf.d# grep -ar "KEY: httpproxy.demosite.com/" /tmp/nginx/cache
/tmp/nginx/cache/3/51/600a83b12cb107e844ddd14077759513:KEY: httpproxy.demosite.com/
/tmp/nginx/cache/3/9b/3ace4f08b744df7310abeb90109e19b3:KEY: httpproxy.demosite.com/about/about.png
/tmp/nginx/cache/5/d8/80af67f39abf128943025ab5e1498d85:KEY: httpproxy.demosite.com/info/info.png
/tmp/nginx/cache/c/19/53e313c4a7c020ab414089c28c4c419c:KEY: httpproxy.demosite.com/info/info.html

The four keys are

httpproxy.demosite.com/
httpproxy.demosite.com/about/about.png
httpproxy.demosite.com/info/info.png
httpproxy.demosite.com/info/info.html

In practice: purge every cached entry under /info
grep -alr "KEY: httpproxy.demosite.com/info" /tmp/nginx/cache | xargs rm
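The same "find the files whose key matches, then delete them" idea can be written as a small script, which is handy when the purge needs logging or extra checks. This is a Python sketch under the assumption that each cache file carries a line starting with KEY: near its beginning, as in the cat output earlier. find_cached_files and purge are hypothetical helper names. Unlike the grep above, it only matches at the start of a line.

```python
import os


def find_cached_files(cache_root, key_prefix):
    """Yield paths of cache files whose 'KEY: ' line starts with key_prefix."""
    needle = b"KEY: " + key_prefix.encode("utf-8")
    for dirpath, _dirnames, filenames in os.walk(cache_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                head = fh.read(4096)  # the key sits near the start of the file
            if any(line.startswith(needle) for line in head.split(b"\n")):
                yield path


def purge(cache_root, key_prefix):
    """Delete every matching cache file and return the removed paths, sorted."""
    removed = []
    for path in list(find_cached_files(cache_root, key_prefix)):
        os.remove(path)
        removed.append(path)
    return sorted(removed)
```

Calling purge("/tmp/nginx/cache", "httpproxy.demosite.com/info") would remove the two /info entries and leave the other two in place.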

Looking at the cache directory again, every entry under that path has been purged.

root@ubuntu-87:/etc/nginx/conf.d# tree /tmp/nginx/cache/
/tmp/nginx/cache/
├── 3
│   ├── 51
│   │   └── 600a83b12cb107e844ddd14077759513
│   └── 9b
│       └── 3ace4f08b744df7310abeb90109e19b3
├── 5
│   └── d8
└── c
    └── 19

Conclusion

See you next time!

Elasticsearch Aggregation Basics (2): Metric Aggregations

Tommy Lin — Mon, 03 Dec 2018 09:21:39 GMT

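Elasticsearch aggregation search: metrics

What is a metric calculation? It takes the documents that were retrieved and computes a sum, an average, a maximum, a minimum, and so on.
There are many more, but this post covers only the most common ones. If you are interested, see 添加度量指标| Elasticsearch: 权威指南| Elastic (the "adding a metric" chapter of the Definitive Guide)

Example: statistics by class, gender, and score

Suppose our raw data looks like this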

class	gender	score
A	Girl	81
C	Boy	54
B	Girl	63
B	Boy	71
C	Boy	24
C	Girl	93
C	Boy	85

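Our goals are to compute:

The average score of each class
The average score of boys and girls in each class
The highest scorer in each class, and their gender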
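Before touching Elasticsearch, it may help to see the three goals computed by hand on the seven sample rows above. This is a plain-Python sketch with no Elasticsearch involved; average_by and top_scorer_by_class are names chosen just for this illustration.

```python
from collections import defaultdict

# (class, gender, score) rows copied from the sample table above
rows = [
    ("A", "Girl", 81), ("C", "Boy", 54), ("B", "Girl", 63), ("B", "Boy", 71),
    ("C", "Boy", 24), ("C", "Girl", 93), ("C", "Boy", 85),
]


def average_by(rows, key):
    """Group rows by key(row) and return the average score of each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row[2])
    return {k: sum(v) / len(v) for k, v in groups.items()}


def top_scorer_by_class(rows):
    """Return {class: (gender, score)} for the highest score in each class."""
    best = {}
    for cls, gender, score in rows:
        if cls not in best or score > best[cls][1]:
            best[cls] = (gender, score)
    return best


avg_per_class = average_by(rows, key=lambda r: r[0])
avg_per_class_gender = average_by(rows, key=lambda r: (r[0], r[1]))
top = top_scorer_by_class(rows)
print(avg_per_class)
print(avg_per_class_gender)
print(top)
```

Grouping corresponds to a bucket aggregation and the average to a metric aggregation; the rest of this post expresses the same computation as an Elasticsearch query.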

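Loading the data

First download the raw data here: students.json,
and make sure the file ends with a blank line

Then upload it: curl -H "Content-Type: Application/json" -XPOST "192.168.40.41:9200/students/doc/_bulk" --data-binary @students.json

Remember to replace the elasticsearch IP with your own.

Then check that every student made it in. Good: all 50 students are indexed.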

❯ curl http://192.168.40.41:9200/_cat/indices\?v
health status index               uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   students            0SjW3-EeQ5-uI7KiUNQ8uQ   3   1         50            0     15.7kb           690b

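Approach

The per-class average cannot be obtained in a single step, so we build it up one step at a time

1. Group by class
This uses a bucket (grouping) aggregation; if you are not yet familiar with it, review the previous post in this series first.

The first step is to split the documents by class. Following the approach from the previous post, the structure looks like this

{
  aggregations: {
    group by class: {
      terms grouping: {
        field: class
      }
    }
  }
}

1-1. Hands-on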

curl -s -H "Content-Type: Application/json" -XPOST "http://192.168.40.41:9200/students/_search" -d '
{
  "size": 0,
  "query": {
    "bool": {}
  },
  "aggs": {
    "group_by_class": {
      "terms": {
        "field": "class.keyword"
      }
    }
  }
}
' | jq .

{
  "took": 297,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 50,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "group_by_class": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "C",
          "doc_count": 20
        },
        {
          "key": "A",
          "doc_count": 13
        },
        {
          "key": "D",
          "doc_count": 9
        },
        {
          "key": "B",
          "doc_count": 8
        }
      ]
    }
  }
}

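2. Computing the score
Grouping by class alone clearly does not meet our goals. All it tells us is that class B has unusually few students XD

No problem, let's keep aggregating.

Recall the structure of an aggregation

{
  aggregations: {
    <aggregation name, your choice>: {
      <aggregation type>: {
        <aggregation field>: ""
      }
    }
  }
}

For this goal, it becomes

{
  aggregations: {
    average score: {
      average: {
        field: score
      }
    }
  }
}

2-1. Combining the two aggregations

We first group by class and then compute the score, so the assembled structure looks like this

{
  aggregations: {
    group by class: {
      terms grouping: {
        field: class
      },
      aggregations: {
        average score: {
          average: {
            field: score
          }
        }
      }
    }
  }
}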
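When the nesting gets deeper, it can be easier to assemble the request body in code than to type the braces by hand. This Python sketch only builds the JSON body; terms_agg and metric_agg are helper names invented here (not part of any Elasticsearch client), and the field names class.keyword and score are the ones used in this post.

```python
import json


def terms_agg(field, sub_aggs=None):
    """Build a terms (bucket) aggregation, optionally with nested aggregations."""
    agg = {"terms": {"field": field}}
    if sub_aggs:
        agg["aggs"] = sub_aggs
    return agg


def metric_agg(kind, field):
    """Build a single-value metric aggregation such as avg, max, min or sum."""
    return {kind: {"field": field}}


body = {
    "size": 0,  # only the aggregation result is wanted, not the documents
    "aggs": {
        "group_by_class": terms_agg(
            "class.keyword",
            sub_aggs={"avg_score": metric_agg("avg", "score")},
        )
    },
}
print(json.dumps(body, indent=2))
```

The printed JSON can be passed to curl with -d, and swapping "avg" for another metric only takes a one-word change.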

2-2. Hands-on

curl -s -H "Content-Type: Application/json" -XPOST "http://192.168.40.41:9200/students/_search" -d '
{
  "size": 0,
  "query": {
    "bool": {}
  },
  "aggs": {
    "group_by_class": {
      "terms": {
        "field": "class.keyword"
      },
      "aggs": {
        "avg_score": {
          "avg": {
            "field": "score"
          }
        }
      }
    }
  }
}
' | jq .

{
  "took": 10365,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 50,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "group_by_class": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "C",
          "doc_count": 20,
          "avg_score": {
            "value": 62.75
          }
        },
        {
          "key": "A",
          "doc_count": 13,
          "avg_score": {
            "value": 60.76923076923077
          }
        },
        {
          "key": "D",
          "doc_count": 9,
          "avg_score": {
            "value": 71.77777777777777
          }
        },
        {
          "key": "B",
          "doc_count": 8,
          "avg_score": {
            "value": 73.625
          }
        }
      ]
    }
  }
}

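At this point we have the average score of each class.

3. Group by gender as well
If you have followed along so far, you have probably already written the aggregation in your head:
just stack another grouping by gender on top.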

{
  "aggs": {
    "group_by_class": {
      "terms": {
        "field": "class.keyword"
      },
      "aggs": {
        "avg_score": {
          "avg": {
            "field": "score"
          },
          "aggs": {
            "group_by_gender": {
              "terms": {
                "field": "gender.keyword"
              }
            }
          }
        }
      }
    }
  }
}

Oops. The result is an error saying that avg does not accept sub-aggregations

{
  "error": {
    "root_cause": [
      {
        "type": "aggregation_initialization_exception",
        "reason": "Aggregator [avg_score] of type [avg] cannot accept sub-aggregations"
      }
    ],
    "type": "aggregation_initialization_exception",
    "reason": "Aggregator [avg_score] of type [avg] cannot accept sub-aggregations"
  },
  "status": 500
}

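No problem: put the gender grouping above the average instead. The structure becomes

{
  aggregations: {
    group by class: {
      terms grouping: {
        field: class
      },
      aggregations: {
        group by gender: {
          terms grouping: {
            field: gender
          },
          aggregations: {
            average score: {
              average: {
                field: score
              }
            }
          }
        }
      }
    }
  }
}

3-1. Hands-on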

curl -s -H "Content-Type: Application/json" -XPOST "http://192.168.40.41:9200/students/_search" -d '
{
  "size": 0,
  "query": {
    "bool": {}
  },
  "aggs": {
    "group_by_class": {
      "terms": {
        "field": "class.keyword"
      },
      "aggs": {
        "group_by_gender": {
          "terms": {
            "field": "gender.keyword"
          },
          "aggs": {
            "avg_score": {
              "avg": {
                "field": "score"
              }
            }
          }
        }
      }
    }
  }
}
' | jq .


{
  "took": 561,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 50,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "group_by_class": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "C",
          "doc_count": 20,
          "group_by_gender": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "Boy",
                "doc_count": 15,
                "avg_score": {
                  "value": 60.06666666666667
                }
              },
              {
                "key": "Girl",
                "doc_count": 5,
                "avg_score": {
                  "value": 70.8
                }
              }
            ]
          }
        },
        {
          "key": "A",
          "doc_count": 13,
          "group_by_gender": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "Girl",
                "doc_count": 7,
                "avg_score": {
                  "value": 71.42857142857143
                }
              },
              {
                "key": "Boy",
                "doc_count": 6,
                "avg_score": {
                  "value": 48.333333333333336
                }
              }
            ]
          }
        },
        {
          "key": "D",
          "doc_count": 9,
          "group_by_gender": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "Boy",
                "doc_count": 5,
                "avg_score": {
                  "value": 65.6
                }
              },
              {
                "key": "Girl",
                "doc_count": 4,
                "avg_score": {
                  "value": 79.5
                }
              }
            ]
          }
        },
        {
          "key": "B",
          "doc_count": 8,
          "group_by_gender": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "Girl",
                "doc_count": 6,
                "avg_score": {
                  "value": 73.33333333333333
                }
              },
              {
                "key": "Boy",
                "doc_count": 2,
                "avg_score": {
                  "value": 74.5
                }
              }
            ]
          }
        }
      ]
    }
  }
}

At this point we have grouped by class and obtained the average score of boys and girls in each class.
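A nested response like the one above is easier to work with once it is flattened into rows. The Python sketch below walks the two bucket levels; flatten and class_average are names made up for this example, and sample is just the class B part of the response above.

```python
def flatten(response):
    """Turn the nested class/gender buckets into (class, gender, count, avg) rows."""
    rows = []
    for class_bucket in response["aggregations"]["group_by_class"]["buckets"]:
        for gender_bucket in class_bucket["group_by_gender"]["buckets"]:
            rows.append((
                class_bucket["key"],
                gender_bucket["key"],
                gender_bucket["doc_count"],
                gender_bucket["avg_score"]["value"],
            ))
    return rows


def class_average(rows, cls):
    """Recover a class average as the count-weighted mean of its gender averages."""
    selected = [(count, avg) for c, _gender, count, avg in rows if c == cls]
    total = sum(count for count, _avg in selected)
    return sum(count * avg for count, avg in selected) / total


# Class B, copied from the response above
sample = {"aggregations": {"group_by_class": {"buckets": [
    {"key": "B", "doc_count": 8, "group_by_gender": {"buckets": [
        {"key": "Girl", "doc_count": 6, "avg_score": {"value": 73.33333333333333}},
        {"key": "Boy", "doc_count": 2, "avg_score": {"value": 74.5}},
    ]}},
]}}}

rows = flatten(sample)
print(rows)
print(class_average(rows, "B"))
```

The weighted mean for class B comes out at about 73.625, which agrees with the per-class average returned in step 2-2.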

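Last one: the highest score

Just replace the "average", i.e. avg, with max. Note that max returns only the highest score in each bucket; keeping it under the gender grouping shows which gender that score belongs to.
To save space it is not shown here, so please try it yourself.

Thanks, everyone. Class dismissed!

Elasticsearch Aggregation Basics (1): Bucket Aggregations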

Tommy Lin — Mon, 03 Dec 2018 06:16:28 GMT

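Elasticsearch aggregation search: bucketing

Aggregations are arguably the most frequently used feature of Elasticsearch. What is an aggregation? It is a computation performed on the search results, such as a maximum, average, minimum, sum, 95th percentile, grouping, or cumulative sum.

This post starts from basic grouping and walks through how to run an aggregation and the concepts behind it.

Example: sorting balls

We start with a pile of balls. Each ball has its own attributes, namely a shape and a color. Sorting means putting balls with the same attribute, such as the same shape or the same color, into the same group.

By sorting, we can answer

Which shapes appear in this pile of balls?
How many balls does each shape have?
Within each shape, how many balls of each color are there?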
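The three questions can be answered on a tiny in-memory list first, which makes the later Elasticsearch output easier to read. This Python sketch uses a hypothetical handful of balls, not the contents of balls.json.

```python
from collections import Counter

# Hypothetical sample, (shape, color) per ball; NOT the real balls.json data
balls = [
    ("circle", "red"), ("circle", "blue"), ("triangle", "yellow"),
    ("rectangle", "blue"), ("circle", "red"),
]

shapes = sorted({shape for shape, _color in balls})   # which shapes exist?
per_shape = Counter(shape for shape, _color in balls) # balls per shape
per_shape_color = Counter(balls)                      # balls per (shape, color)

print(shapes)
print(per_shape)
print(per_shape_color)
```

Each distinct key of a Counter plays the role of one bucket, and its count corresponds to doc_count in the responses shown later.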

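Loading the data

Let's load the data first. If you have not installed Elasticsearch yet, see the post Hello World 系列 - Elasticsearch.

We upload the data in bulk; the raw data, balls.json, can be downloaded here
Make sure the file ends with a blank line!

Then upload it: curl -H "Content-Type: Application/json" -XPOST 192.168.40.41:9200/demo/doc/_bulk --data-binary @balls.json

Remember to replace the elasticsearch IP with your own.

Then take a look: all 25 balls are indexed.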

❯ curl http://192.168.40.41:9200/_cat/indices\?v
health status index               uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   demo                bBTVU1ssRjKvoIamaO7Irg   3   1         25            0       34kb         20.7kb

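Let's sort: group by shape

As a first step, we group by shape. We expect the following result:

Note that each group in the picture is labeled Buckets; in Elasticsearch, every group is held in a bucket.

So how do we do this with elasticsearch? Let's continue.

Aggregating with a terms grouping

First, the general structure of an aggregation looks like this:

{
  aggregations: {
    <aggregation name, your choice>: {
      <aggregation type>: {
        <aggregation field>: ""
      }
    }
  }
}

For this goal, it becomes

{
  aggregations: {
    group by shape: {
      terms grouping: {
        field: shape
      }
    }
  }
}

Hands-on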

❯ curl -s -H "Content-Type: Application/json" -XPOST "http://192.168.40.41:9200/demo/_search" -d '
{
  "size": 0,
  "query": {
    "bool": {}
  },
  "aggs": {
    "group_by_shape": {
      "terms": {
        "field": "shape.keyword"
      }
    }
  }
}
' | jq .


{
  "took": 177,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 25,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "group_by_shape": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "rectangle",
          "doc_count": 9
        },
        {
          "key": "circle",
          "doc_count": 8
        },
        {
          "key": "triangle",
          "doc_count": 8
        }
      ]
    }
  }
}

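The last part of the result shows the grouping we wanted.

A few things to note

size: 0: we care about the aggregated result rather than the raw documents, so size can be 0. In fact, if you use grafana or kibana, you will find that they do the same.
took: 177: the search took 177 milliseconds.
hits.total: 25: there are 25 balls in total.
aggregations: all data obtained through aggregation appears in this object.
buckets: each group is held in a bucket XD

Then group by color

The grouping above only tells us how many balls each shape has, but we need a finer breakdown:
within each shape, how many balls of each color are there?

This is the result we want:

With the grouping concept above, we only need to nest one more level.
So the structure looks like this:

{
  aggregations: {
    group by shape: {
      terms grouping: {
        field: shape
      },
      aggregations: {
        group by color: {
          terms grouping: {
            field: color
          }
        }
      }
    }
  }
}

Hands-on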

❯ curl -s -H "Content-Type: Application/json" -XPOST "http://192.168.40.41:9200/demo/_search" -d '
{
  "size": 0,
  "query": {
    "bool": {}
  },
  "aggs": {
    "group_by_shape": {
      "terms": {
        "field": "shape.keyword"
      },
      "aggs": {
        "group_by_color": {
          "terms": {
            "field": "color.keyword"
          }
        }
      }
    }
  }
}
' | jq .

{
  "took": 65,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 25,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "group_by_shape": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "rectangle",
          "doc_count": 9,
          "group_by_color": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "blue",
                "doc_count": 5
              },
              {
                "key": "red",
                "doc_count": 3
              },
              {
                "key": "yellow",
                "doc_count": 1
              }
            ]
          }
        },
        {
          "key": "circle",
          "doc_count": 8,
          "group_by_color": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "blue",
                "doc_count": 4
              },
              {
                "key": "red",
                "doc_count": 3
              },
              {
                "key": "yellow",
                "doc_count": 1
              }
            ]
          }
        },
        {
          "key": "triangle",
          "doc_count": 8,
          "group_by_color": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "yellow",
                "doc_count": 6
              },
              {
                "key": "blue",
                "doc_count": 2
              }
            ]
          }
        }
      ]
    }
  }
}

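Ta-da! That gives us the result we wanted. If each ball also carried a score, we could keep nesting and compute on it too.

So the answers to the questions above are, in order

Which shapes appear in this pile of balls? rectangle, triangle, circle
How many balls does each shape have? rectangle: 9, triangle: 8, circle: 8
Within each shape, how many balls of each color are there?

rectangle-blue: 5, rectangle-red: 3, rectangle-yellow: 1
triangle-blue: 2, triangle-red: 0, triangle-yellow: 6
circle-blue: 4, circle-red: 3, circle-yellow: 1

That wraps up bucket aggregations. Thanks, everyone.

Linux: Limiting Bandwidth with wondershaper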

Tommy Lin — Mon, 26 Nov 2018 06:35:04 GMT

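Background

Why limit bandwidth? A server I used to manage got hacked and was turned into a zombie host that took part in a DDoS attack somewhere in the world. See the figure below

The traffic gets eaten and the CPU may be mining coins as well, which is bad enough. Then the provider simply cuts your network and sends you a notice. The original text follows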

Hi there,

We've detected an outgoing Denial of Service attack (http://do.co/21Y1Gc1) originating from your Droplet. Due to the traffic’s harmful nature, your Droplet was taken offline; this means it is not connected to the internet and all hosted sites and services are unreachable. We know that this action is disruptive, but it’s necessary to protect you, our network, and the target of your Droplet’s attack.

You can access your droplet using this console link: https://cloud.digitalocean.com/droplets/119932431/console

Because this means your Droplet has been compromised, you’ll need to back up your data and transfer it to a new Droplet. We have a recovery tool to assist you, but any databases on your Droplet will need to be backed up before we boot your Droplet into the recovery tool because you won’t be able to make the backups afterwards.

Specific backup steps vary depending on the database software in use, which is most commonly MySQL. If you’re not sure how, http://do.co/1h0uWgm will show you how to back up your databases from MySQL.

Once you have finished backing up your data, the next step is downloading and transferring your data to your new Droplet. Please update this ticket when you’re ready and we’ll configure this Droplet so you can proceed.

If you’ve enabled our backup service or have a snapshot of the Droplet, you can restore directly from that image instead of going through the recovery process. Be aware that this will destroy any changes or additions made to the Droplet since the creation date of the image you use to restore from. If you do this, please update the ticket as we will need to reconfigure networking to get your Droplet back online.

If you don’t need the data from this Droplet, you can destroy this Droplet at your convenience. If you’d like to keep the current IP address, you will need to use our rebuild function. This acts like a clean install of your OS and is currently the only way to ensure you retain your IP. As with restoring from an image, please let us know once you’ve done this.

If you have any further questions, or if we can further assist, please let us know.

Regards,

Trust & Safety
DigitalOcean Support

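The gist is

We found a large amount of malicious traffic coming from your VPS, so to protect you we have taken it offline. You can still log in through the web console, although you will have no network once inside. Let us know when your backup is done and we will help you move the data to a new machine. As for the current one, just let it go. Bye!

Whoa... if you detect malicious traffic, couldn't you just rate-limit it? Was all this really necessary?
Well... let's put it down to not learning the rules of the game beforehand...

Main text

So... let's behave and rate-limit ourselves...

First, the official description: this package is essentially a front end for tc that makes tc easier to invoke, and tc is the Linux kernel's mechanism for traffic control.
There are already plenty of articles about tc online, and that material is honestly rather hard. My own understanding falls short, so I will not cover it here; please look it up yourselves.

One more point before getting to usage. tc can control upload fairly precisely, but not download. Outgoing traffic is easy to control because we decide how much to send, whereas incoming traffic has to go through the ifb intermediate device to be shaped. That is beyond the scope of this post; look into it if you are interested.

Download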

cd /tmp/
git clone  https://github.com/magnific0/wondershaper.git
cp wondershaper/wondershaper /usr/bin/

wondershaper -h
USAGE: /usr/bin/wondershaper [-hcs] [-a <adapter>] [-d <rate>] [-u <rate>]

Limit the bandwidth of an adapter

OPTIONS:
   -h           Show this message
   -a <adapter> Set the adpter
   -d <rate>    Set maximum download rate (in Kbps) and/or
   -u <rate>    Set maximum upload rate (in Kbps)
   -p           Use presets in /etc/conf.d/wondershaper.conf
   -c           Clear the limits from adapter
   -s           Show the current status of adapter
   -v           Show the current version

MODES:
   wondershaper -a <adapter> -d <rate> -u <rate>
   wondershaper -c -a <adapter>
   wondershaper -s -a <adapter>

EXAMPLES:
   wondershaper -a eth0 -d 1024 -u 512
   wondershaper -a eth0 -u 512
   wondershaper -c -a eth0

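Usage

First, the upload speed without any limit: 100Mbits

Then use the tool to show the adapter's default configuration: wondershaper -s -a ens160

Next, apply a limit. Here only the upload speed is limited: wondershaper -a ens160 -u 20480, which caps it at 20Mbits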
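Since the -u and -d options take Kbps, a small helper avoids mental arithmetic when thinking in Mbits. This Python sketch only builds the argument list and does not run anything. It uses the same 1 Mbit = 1024 Kbit convention as the 20480 figure above, and both function names are made up for this example.

```python
def mbit_to_kbps(mbit):
    """Convert Mbit/s to the Kbps value wondershaper expects (1 Mbit = 1024 Kbit here)."""
    return int(mbit * 1024)


def wondershaper_args(adapter, upload_mbit=None, download_mbit=None):
    """Build the wondershaper command line as a list of arguments."""
    args = ["wondershaper", "-a", adapter]
    if download_mbit is not None:
        args += ["-d", str(mbit_to_kbps(download_mbit))]
    if upload_mbit is not None:
        args += ["-u", str(mbit_to_kbps(upload_mbit))]
    return args


print(" ".join(wondershaper_args("ens160", upload_mbit=20)))
```

The resulting list could be handed to subprocess.run on a machine where wondershaper is installed and the script runs with sufficient privileges.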

Finally, the result: 16Mbits

To remove the limit: wondershaper -c -a ens160

Conclusion

May everyone's cloud hosts stay safe and sound.

Nginx: Blocking Countries and Configuring a Whitelist

Tommy Lin — Wed, 20 Jun 2018 15:21:56 GMT

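Preface

When blocking countries with ngx_http_geoip_module, take special care: it conflicts with the commonly used ngx_http_access_module. What does that mean? If you block a country, then even an explicit allow A.B.C.D for a specific IP in that country has no effect, and the whole country stays blocked. Below we look at how to configure country blocking together with a specific whitelist.

Preparation

Test environments
Ubuntu 14.04 LTS
Ubuntu 16.04 LTS

To use this module, first make sure your Nginx was compiled with it. Check with this command: nginx -V 2>&1 | grep -o with-http_geoip_module If the output is empty, your Nginx does not have the module, and you should continue with the next step of this guide. Mine was also freshly compiled in. If yours already shows the module, skip ahead to the configuration section.

Adding the module means starting from compilation. Before compiling, run apt-get install libgeoip-dev, the library that geoip depends on. Without it you get the following error:
configure: error: the GeoIP module requires the GeoIP library. You can either do not enable the module or install the library.

With that installed, start compiling. Add --with-http_geoip_module at the configure step. If you are already losing track, catch up with the post Ubuntu16.04 從源安裝nginx (installing nginx from source).

After configure comes the build, make. When it finishes, remember to stop the service first with service nginx stop, then install with make install, and finally start the service with service nginx start.

Running nginx -V 2>&1 | grep -o with-http_geoip_module again now shows that the module has been added.

Before configuring, one last and most important task: download the geoip database. Once that is done, configuration can begin.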

cd /etc/nginx
wget http://geolite.maxmind.com/download/geoip/database/GeoLiteCountry/GeoIP.dat.gz 
gunzip GeoIP.dat.gz

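Configuration

Add the following inside the http block of /etc/nginx/nginx.conf. Note that I use the $remote_addr variable. If your Nginx sits behind a proxy, first follow the post Nginx 獲取 CDN X-Forwarded-For to replace remote_addr with the X-Forwarded-For header. (If that is too much trouble, using a different variable instead also works.)

    # Path to the database
    geoip_country /etc/nginx/GeoIP.dat;

    # Define the $ip_whitelist variable from the $remote_addr variable
    geo $remote_addr $ip_whitelist {
      default 0;
      1.200.216.57 1;
    }

Next create a configuration file, /etc/nginx/conf.d/geoblock.conf, with the following content. It blocks Japan, Taiwan, and Singapore. Other country codes can be found on the maxmind website. Because if is only valid inside server or location, make sure this file is not also pulled in at the http level by a conf.d/*.conf wildcard include; if it is, store it somewhere else.

# Whitelisted addresses need no further checks
if ($ip_whitelist = 1) {
    break;
}

# These countries get a 403
if ($geoip_country_code ~ (JP|TW|SG)) {
    return 403;
}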
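The order of the two checks is what makes the whitelist work: whitelisted addresses leave early, and only the rest are tested against the country list. The Python sketch below mirrors that decision logic only. The country code is passed in as an argument because the GeoIP lookup itself is not reproduced, and decide, WHITELIST and BLOCKED_COUNTRIES are names invented for this illustration.

```python
import ipaddress

# Same sample values as the nginx configuration above
WHITELIST = [ipaddress.ip_network("1.200.216.57/32")]
BLOCKED_COUNTRIES = {"JP", "TW", "SG"}


def decide(client_ip, country_code):
    """Return the HTTP status the configuration above would lead to."""
    addr = ipaddress.ip_address(client_ip)
    if any(addr in net for net in WHITELIST):
        return 200  # whitelisted: skip the country check entirely
    if country_code in BLOCKED_COUNTRIES:
        return 403  # blocked country
    return 200


print(decide("1.200.216.57", "TW"))  # whitelisted address in a blocked country
print(decide("1.200.216.58", "TW"))  # neighbouring address, not whitelisted
print(decide("8.8.8.8", "US"))       # country that is not blocked
```

Swapping the two checks would block the whitelisted address as well, which is the conflict described in the preface.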

The last step is to include it in your server block

server {
 listen 80 default_server;

 root /usr/share/nginx/html;
 index index.html index.htm;

 server_name localhost;
 
 # Include our geoblock configuration
 include /etc/nginx/conf.d/geoblock.conf;

 location / {
  try_files $uri $uri/ =404;
 }
}

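Verification

With the configuration above, Japan, Taiwan, and Singapore are blocked, and the whitelist holds one Japanese IP (118.11.213.239). Note that the geo block above lists 1.200.216.57 as its sample entry, so put the address you actually want to allow there.

Tester 1, Japan: blocked!

Tester 2, Singapore: blocked!

Tester 3, Taiwan: blocked!

Tester 4, whitelisted Japanese IP: pass!

(Well, with your skills, I trust that this little bit of pixelation cannot hurt your eyes)
That achieves today's goal. See you next time!

Wrapping up

One moment is all it takes to block an entire country!

Configuring an Internal DNS Server with dnsmasq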

Tommy Lin — Fri, 08 Jun 2018 11:35:55 GMT

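Preface

Why set up an internal dns server? Because all kinds of applications now talk to each other by FQDN. Putting private IPs on a public nameserver also works, but there are a few considerations:

Sometimes hosts are not allowed to reach the internet; when they still need to resolve a domain, an internal dns is required
When the destination is right next door, there is no need to go out to the internet for another lookup
If someone notices that your domain resolves to a private IP, then ... well, there is not much they can do with it; it is just an amusing find XD.

What about bind9?

That's right, I started out installing bind9 too, but gave up halfway through the configuration. It is probably better suited to larger setups. Ours is just a small internal dns server, and what we want is something simple, fast, easy, and stable.

Down to business!

Environment: Ubuntu 16.04.4 LTS
The first step is installation: apt-get install dnsmasq
After installation there is a configuration file, /etc/dnsmasq.conf. It is very long but entirely commented out, so we rename it and keep it as a reference for later: mv /etc/dnsmasq.conf /etc/dnsmasq.conf.bak

Basic configuration

The basic configuration below is enough. If you need other options, such as ttl, ptr, which interface to use, or which port to listen on, look up the parameters in /etc/dnsmasq.conf.bak. Let's go through the current configuration first.

There are three configuration files

/etc/dnsmasq.conf: the main configuration, where options usually go
/etc/hosts_myns.conf: same format as /etc/hosts; these are the local dns records
/etc/resolv_myns.conf: the upstream servers, i.e. where to look for an answer when there is no local record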

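root@ubuntu:~# cat /etc/dnsmasq.conf
# Our own ns records, in the same format as /etc/hosts
no-hosts
addn-hosts=/etc/hosts_myns.conf

# Ask upstream when we have no record ourselves
resolv-file=/etc/resolv_myns.conf

# To define a CNAME, configure it like this
# The target record must exist in hosts (/etc/hosts_myns.conf)
cname=fish01.tux.com,fish.tux.com

root@ubuntu:~# cat /etc/hosts_myns.conf
2.2.2.2 www.bar.com
3.3.3.3 fish.tux.com

root@ubuntu:~# cat /etc/resolv_myns.conf
nameserver 8.8.8.8
nameserver 168.95.1.1

After editing, remember to run systemctl reload dnsmasq or systemctl restart dnsmasq

The difference between the two: if the domain you configure can already be resolved on the internet and you want to override it, reload does not take effect right away, because the cached answer has not expired yet

With just these few lines we have A records, a CNAME record, and upstream dns configured. The results are shown below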
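The lookup order these three files set up is: follow a configured CNAME, answer from the local hosts records, and ask upstream only when nothing matches. The Python sketch below models that order in a simplified way. It is not dnsmasq's actual implementation, and parse_hosts, resolve and the upstream callable are hypothetical names for this illustration.

```python
def parse_hosts(text):
    """Parse /etc/hosts-style text into a {name: ip} dictionary."""
    records = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            records[name] = ip
    return records


def resolve(name, hosts, cnames, upstream):
    """Follow a CNAME if one is configured, answer locally, else ask upstream."""
    target = cnames.get(name, name)
    if target in hosts:
        return hosts[target]
    return upstream(name)


# Records copied from the configuration above
hosts = parse_hosts("2.2.2.2 www.bar.com\n3.3.3.3 fish.tux.com\n")
cnames = {"fish01.tux.com": "fish.tux.com"}

print(resolve("www.bar.com", hosts, cnames, upstream=lambda n: "ask upstream"))
print(resolve("fish01.tux.com", hosts, cnames, upstream=lambda n: "ask upstream"))
print(resolve("example.org", hosts, cnames, upstream=lambda n: "ask upstream"))
```

This also shows why the CNAME target has to exist in the hosts file: without it, the lookup falls through to the upstream servers.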

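Acceptance test

The configured A record

Note that the default ttl is 0

The configured CNAME record

A record that is not configured, which is forwarded upstream

Wrapping up

Well, when there is time, bind9 is still worth learning properly. It is the long-established name, after all.

Using Collectd's SNMP Plugin with Logstash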

Tommy Lin — Sat, 26 May 2018 20:50:48 GMT

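Preface

I have recently been using ELK/G to consolidate data. Logstash already handles netflow, logs, APIs, and similar sources for us, but one thing that operations cannot do without is still missing: snmp!

In the spirit of hello world, this post first shows how to use the collectd snmp plugin to collect data from the local machine. Once that is familiar, the data can be forwarded to logstash

Environment: Ubuntu 16.04 LTS

Installing and configuring snmp and collectd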

apt-get install snmp snmpd collectd
mv /etc/snmp/snmpd.conf /etc/snmp/snmpd.conf.bak
echo "rwcommunity mysnmp" >> /etc/snmp/snmpd.conf

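That completes the install and configuration of snmpd. Note that the default config file was backed up, and the real config is just the single line rwcommunity mysnmp. This is a personal habit: it keeps the settings I chose clearly visible, and it makes copy-paste quick on the next deployment.

After changing the config, remember to restart the service: systemctl restart snmpd

Configuring Collectd

Basic Collectd configuration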

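cat /etc/collectd/collectd.conf
Hostname    "localhost"
LoadPlugin logfile
LoadPlugin write_log
LoadPlugin snmp
TypesDB    "/usr/share/collectd/types.db"
TypesDB    "/usr/share/collectd/types.db.custom"

<Plugin logfile>
        LogLevel info
        File STDOUT
        Timestamp true
        PrintSeverity false
</Plugin>

<Plugin snmp>
  <Data "std_traffic">
      Type "if_octets"
      Table true
      Instance "IF-MIB::ifDescr"
      Values "IF-MIB::ifInOctets" "IF-MIB::ifOutOctets"
  </Data>
  # Host label: your choice; my_archlinux matches the prefix in the sample output below
  <Host "my_archlinux">
      Address "127.0.0.1"
      Version 1
      # Must match the community set in /etc/snmp/snmpd.conf
      Community "mysnmp"
      Collect "std_traffic"
      Interval 10
  </Host>
</Plugin>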

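Glossary:
Hostname: tells others who is collecting the data
logfile: the plugin that writes log output to a file; it is used together with write_log here, and STDOUT means print to the screen
write_log: writes the collected metrics to the log; once testing is done you can comment this out
snmp: the plugin that collects over SNMP
<Data "std_traffic">: the name of the data definition, your choice; it is referenced later in Host
Instance "IF-MIB::ifDescr": the label for each row, here the interface description
Values "IF-MIB::ifInOctets" "IF-MIB::ifOutOctets": the values actually collected
<Host "my_archlinux">: the host name, your choice, used for identification
Collect "std_traffic": the data definition name chosen above

Now run sudo collectd -f and you will see the data printed.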

[2017-08-18 00:56:52] plugin_load: plugin "logfile" successfully loaded.       
[2017-08-18 00:56:52] plugin_load: plugin "snmp" successfully loaded.          
[2017-08-18 00:56:52] plugin_load: plugin "write_log" successfully loaded.     
[2017-08-18 00:56:52] Initialization complete, entering read-loop.             
[2017-08-18 00:56:52] write_log values:                                        
my_archlinux.snmp.if_octets-lo.rx 6065342 1502989012                           
my_archlinux.snmp.if_octets-lo.tx 6065342 1502989012                           

[2017-08-18 00:56:52] write_log values:                                        
my_archlinux.snmp.if_octets-tun0.rx 548103 1502989012                          
my_archlinux.snmp.if_octets-tun0.tx 471309 1502989012                          

[2017-08-18 00:56:52] write_log values:                                        
my_archlinux.snmp.if_octets-Intel_Corporation_Wireless_3160.rx 327940 1502989012                                                                               
my_archlinux.snmp.if_octets-Intel_Corporation_Wireless_3160.tx 248255 1502989012                                                                               

[2017-08-18 00:56:52] write_log values:                                        
my_archlinux.snmp.if_octets-Qualcomm_Atheros_Killer_E220x_Gigabit_Ethernet_Controller.rx 1208338895 1502989012                                                 
my_archlinux.snmp.if_octets-Qualcomm_Atheros_Killer_E220x_Gigabit_Ethernet_Controller.tx 64125573 1502989012                                                   

^C[2017-08-18 00:56:53] Exiting normally.                                      
[2017-08-18 00:56:53] collectd: Stopping 5 read threads.                       
[2017-08-18 00:56:53] collectd: Stopping 5 write threads.

Each line Collectd printed above has this layout:
Host . Plugin . Type-TypeInstance . DataSource, followed by the value and the timestamp
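
To make the layout concrete, here is a small sketch that splits one write_log line into its fields. It assumes the dotted layout seen in the sample output above and that the type instance contains no dots; parse_write_log_line is a made-up helper name.

```shell
# parse_write_log_line LINE
# Hypothetical helper: splits "host.plugin.type-instance.ds value timestamp"
# into labelled fields. The last three dotted pieces are taken as plugin,
# type-instance and data source; everything before them is the host.
parse_write_log_line() {
  printf '%s\n' "$1" | awk '{
    n = split($1, part, ".")
    ds = part[n]; ti = part[n-1]; plugin = part[n-2]
    host = part[1]
    for (i = 2; i <= n - 3; i++) host = host "." part[i]
    dash = index(ti, "-")                  # type and instance are joined by "-"
    if (dash > 0) { type = substr(ti, 1, dash - 1); inst = substr(ti, dash + 1) }
    else          { type = ti; inst = "" }
    printf "host=%s plugin=%s type=%s instance=%s ds=%s value=%s time=%s\n",
           host, plugin, type, inst, ds, $2, $3
  }'
}

# Usage:
#   parse_write_log_line "my_archlinux.snmp.if_octets-lo.rx 6065342 1502989012"
```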

OK, no problems! On to the next step.

Playing catch with Logstash

Collectd configuration

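cat /etc/collectd/collectd.conf
LoadPlugin logfile
<Plugin logfile>
	LogLevel info
	File STDOUT
	Timestamp true
	PrintSeverity false
</Plugin>
LoadPlugin network
LoadPlugin snmp
LoadPlugin write_log

<Plugin network>
    # LOGSTASH_IP is a placeholder; the original address is not shown. Use your Logstash host.
    Server "LOGSTASH_IP" "25826"
</Plugin>

<Plugin snmp>
  <Data "std_traffic">
      Type "if_octets"
      Table true
      Instance "IF-MIB::ifDescr"
      Values "IF-MIB::ifInOctets" "IF-MIB::ifOutOctets"
  </Data>
  # Host label: your choice; my_archlinux matches the prefix in the sample output above
  <Host "my_archlinux">
      Address "127.0.0.1"
      Version 1
      # Must match the community set in /etc/snmp/snmpd.conf
      Community "mysnmp"
      Collect "std_traffic"
      Interval 10
  </Host>
</Plugin>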

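The only addition here is the network plugin, which needs the IP of the Logstash host. 25826 is the port I set Logstash to listen on (to catch the ball).

Logstash configuration

If you are new to Logstash, read this post first.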

input {
  udp {
    port => 25826
    buffer_size => 1452
    codec => collectd { }
    type => "collectd"
  }
}

filter {}

output {
  if [type] == "collectd" {
    stdout {
      codec => rubydebug
    }
    elasticsearch {
      hosts => ["192.168.40.44:9200"]
      index => "collectd-%{+YYYY.MM.dd}"
    }
  }
}

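Both Collectd and Logstash have stdout (standard output) enabled here so we can confirm the configuration works; once they run as daemons it is no longer needed.

Once the data looks right, start the service with systemctl start collectd.

FAQ

The plugin says: I only accept x values, but you gave me y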

[2018-03-18 05:16:25] snmp plugin: DataSet `memory' requires 1 values, but config talks about 2

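What is this telling us?
It is caused by the Type. What does that mean? Remember the types.db in the config file above? The Type in the config refers to an entry in it. Let's look at the memory entries in types.db.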

tommy@ubuntu:~/collectd$ grep memory /usr/share/collectd/types.db
memory                  value:GAUGE:0:281474976710656
memory_lua              value:GAUGE:0:281474976710656
vs_memory              value:GAUGE:0:9223372036854775807

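The memory type defines a single value, so you cannot return several values with it. In that case, define your own type, and remember to add this types.db.custom to the config file as well.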

tommy@ubuntu:~/collectd$ grep memory /usr/share/collectd/types.db.custom
linux_memory                    mem_total:GAUGE:0:281474976710656 mem_avail:GAUGE:0:281474976710656

Then set Type to linux_memory and you are done.
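
The number of entries in Values has to equal the number of data sources the Type defines. A small sketch for checking that count, assuming the types.db layout shown above (type name followed by whitespace-separated data source specs); ds_count is a made-up helper name.

```shell
# ds_count TYPE FILE
# Hypothetical helper: prints how many data sources TYPE defines in a
# types.db-style file, and returns non-zero if the type is not found.
ds_count() {
  awk -v t="$1" '
    $1 == t { print NF - 1; found = 1 }   # fields after the name are data sources
    END { exit !found }
  ' "$2"
}

# Usage:
#   ds_count memory /usr/share/collectd/types.db
#   ds_count linux_memory /usr/share/collectd/types.db.custom
```

With the entries shown in this post, memory has one data source and linux_memory has two, which is why the two-value config only works with the custom type.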

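I want to import other MIBs. How?

To import another vendor's MIB, see this reference:
Importing MIBs

My syslog is full of junk messages like `UDP connection from xxx`

Use the configuration below to set the log level.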

# RHEL OS
cat /etc/sysconfig/snmpd
# OPTIONS="-LS0-6d"

# Ubuntu OS
cat /etc/default/snmpd
SNMPDOPTS='-LS4d -Lf /dev/null -u snmp -g snmp -I -smux,mteTrigger,mteTriggerConf -p /run/snmpd.pid'

See this reference.

My syslog has messages like `snmpd[1234] Cannot statfs /run/user/1000/gvfs: Permission denied`

Upstream says this cannot be prevented, so filter it out on the rsyslog side.

cat /etc/rsyslog.d/040-snmp-statfs.conf

if $programname == 'snmpd' and $msg contains 'statfs' then {
   stop
}

What other OIDs can I use?

Hey, that doesn't really count as a problem, does it?
Common Linux OIDs

Using fail2ban to stop SSH harassment

Tommy Lin — Mon, 21 May 2018 17:37:12 GMT

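Foreword

fail2ban is a program that guards against all kinds of sneaky, furtive behavior; it is an effective way to stop all sorts of unwanted harassment!

Translation: you configure rules, it reads logs through predefined regex filters to detect brute-force attempts, and then it runs actions such as adding iptables rules or sending mail.

Environment: Ubuntu 16.04.2 LTS

First, install it directly: apt-get install fail2ban

After the install, go to /etc/fail2ban, where you will see the following files.

The important ones:
jail: holds your rules
filter: holds your filter conditions
action: holds the actions to run

So the full decision flow is jail => filter => action

Next, look at jail.conf, which contains some default rules. Here are the commonly used parameters.

[DEFAULT]: the section name; everything defined under it is the default behavior.
ignoreip: the IP whitelist.
ignorecommand: an external script that helps decide whether an address should be ignored. (Not commonly used, but it was in the screenshot so it gets a mention.)
bantime: how long to keep the offender in jail, in seconds; -1 means a life sentence.
findtime: the observation window.
maxretry: how many failures are allowed.
enabled: rules are disabled by default; to enable one, set enabled = true in its section.
filter: the name of the filter config; %(__name__)s means the same name as the rule.
banaction: what to do once found guilty.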
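
To make the parameters concrete, here is a minimal sketch that writes an override file for the sshd jail. Overrides conventionally go in jail.local rather than in jail.conf itself. The numbers and the whitelist address are placeholder assumptions, not recommendations from this post, and write_sshd_jail is a made-up helper name.

```shell
# write_sshd_jail FILE
# Hypothetical helper: writes a small jail override to FILE (overwriting it).
write_sshd_jail() {
  cat > "$1" <<'EOF'
[DEFAULT]
# Never ban these addresses (placeholder whitelist)
ignoreip = 127.0.0.1/8

[sshd]
enabled  = true
# bantime and findtime are in seconds: ban for one hour
# after 5 failures within 10 minutes (example values only)
bantime  = 3600
findtime = 600
maxretry = 5
EOF
}

# Usage:
#   write_sshd_jail /etc/fail2ban/jail.local && systemctl restart fail2ban
```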

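Since the sshd rule is the only one fail2ban enables by default, this post will not cover adding new rules; that is left for the next post on configuring nginx ٩(｡・ω・｡)

Still, let's go over the common commands and what they show!

All jail records are logged in /var/log/fail2ban.log, as shown in the figure below (plenty of sneaky visitors have been booked).

fail2ban-client status shows which jails currently exist.

If the sshd jail interests you, run fail2ban-client status sshd; you can see three people are already locked up!

Finally, check iptables.

Wrap-up

Lock up all those annoying people (`･ω･´)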

Monitoring files with snmp

Tommy Lin — Sat, 19 May 2018 12:42:16 GMT

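Using snmp to monitor file size and how many times a given string appears in a file

snmpd provides two config directives for this. With them you can monitor, for example, the number of 502s in the nginx access log, or the size of the error log.

Monitor file size, in kB: file FILE [MAXSIZE]

Monitor occurrences of a given string: logmatch NAME FILE CYCLETIME REGEX

Add the following two lines to /etc/snmp/snmpd.conf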

logmatch mylogcheck /var/log/dummy/dummy.log 150 SYSTEM STATS
file /var/log/dummy/dummy.log 2

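This means:

Count how many times SYSTEM STATS appears in /var/log/dummy/dummy.log
The size of /var/log/dummy/dummy.log must not exceed 2 kB
mylogcheck is just a name; pick your own

First, the current state: SYSTEM STAT appears three times and the file size is 8.3K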

root@ubuntu:/var/log/dummy# grep "SYSTEM STAT" dummy.log | wc -l
3
root@ubuntu:/var/log/dummy# du -h dummy.log
8.3K dummy.log

Check the occurrence count

❯ snmpwalk -v2c -c labsnmp 192.168.41.123 .1.3.6.1.4.1.2021.16
UCD-SNMP-MIB::logMatchMaxEntries.0 = INTEGER: 250
UCD-SNMP-MIB::logMatchIndex.1 = INTEGER: 1
UCD-SNMP-MIB::logMatchName.1 = STRING: mylogcheck
UCD-SNMP-MIB::logMatchFilename.1 = STRING: /var/log/dummy/dummy.log
UCD-SNMP-MIB::logMatchRegEx.1 = STRING: SYSTEM STATS
UCD-SNMP-MIB::logMatchGlobalCounter.1 = Counter32: 8
UCD-SNMP-MIB::logMatchGlobalCount.1 = INTEGER: 8
UCD-SNMP-MIB::logMatchCurrentCounter.1 = Counter32: 3
UCD-SNMP-MIB::logMatchCurrentCount.1 = INTEGER: 3
UCD-SNMP-MIB::logMatchCounter.1 = Counter32: 3
UCD-SNMP-MIB::logMatchCount.1 = INTEGER: 0
UCD-SNMP-MIB::logMatchCycle.1 = INTEGER: 150
UCD-SNMP-MIB::logMatchErrorFlag.1 = INTEGER: noError(0)
UCD-SNMP-MIB::logMatchRegExCompilation.1 = STRING: Success

The string was matched 3 times

Check the file size

❯ snmpwalk -v2c -c labsnmp 192.168.41.123 .1.3.6.1.4.1.2021.15.1
UCD-SNMP-MIB::fileIndex.1 = INTEGER: 1
UCD-SNMP-MIB::fileName.1 = STRING: /var/log/dummy/dummy.log
UCD-SNMP-MIB::fileSize.1 = INTEGER: 8 kB
UCD-SNMP-MIB::fileMax.1 = INTEGER: 2 kB
UCD-SNMP-MIB::fileErrorFlag.1 = INTEGER: error(1)
UCD-SNMP-MIB::fileErrorMsg.1 = STRING: /var/log/dummy/dummy.log: size exceeds 2kb (= 8kb)

The file size is reported as 8 kB, which exceeds the 2 kB we configured
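
For alerting, it can help to boil that walk down to just the offending files. A minimal sketch, assuming snmpwalk output in the textual form shown above (MIB names resolved, " = " between name and value); oversized_files is a made-up helper name.

```shell
# oversized_files
# Hypothetical helper: reads snmpwalk output of the UCD-SNMP-MIB file table
# on stdin and prints the name of every file whose fileErrorFlag is error(1).
oversized_files() {
  awk -F' = ' '
    /::fileName\./ {
      n = split($1, part, ".")          # last piece is the table index
      name[part[n]] = substr($2, 9)     # drop the leading "STRING: "
    }
    /::fileErrorFlag\./ {
      n = split($1, part, ".")
      if ($2 ~ /error\(1\)/) bad[part[n]] = 1
    }
    END { for (i in bad) print name[i] }
  '
}

# Usage (community and address as used in this post):
#   snmpwalk -v2c -c labsnmp 192.168.41.123 .1.3.6.1.4.1.2021.15.1 | oversized_files
```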

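Where did that OID come from?

Truth is, I never found the OID documented.. Orz
So how did I know which OID to use? With the shortcut below!

snmpd has already registered the name we configured, so just search a wide range!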

❯ snmpwalk -v2c -c labsnmp 192.168.41.123 .1.3.6.1.4.1.2021 | grep mylogcheck
UCD-SNMP-MIB::logMatchName.1 = STRING: mylogcheck

Once found, translate it back to a numeric OID

❯ snmptranslate -On UCD-SNMP-MIB::logMatchName.1
.1.3.6.1.4.1.2021.16.2.1.2.1

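You can actually use the name without translating it, provided your snmp client has the MIBs installed; I find the numeric OID more portable.

Bonus tip: translating OIDs in both directions
OID to name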

% snmptranslate .1.3.6.1.2.1.1.3.0
SNMPv2-MIB::sysUpTime.0

Name to OID

% snmptranslate -On SNMPv2-MIB::sysUpTime.0
.1.3.6.1.2.1.1.3.0

References:
SNMPD manpage (Log File Monitoring)
snmptranslate

Ansible Facts

Tommy Lin — Thu, 17 May 2018 15:08:28 GMT

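What are Ansible facts?

Think of them as auto-discovery: Ansible collects basic information about a machine, such as disks, IPs, and the host name, and registers it as playbook variables for later use. Some articles point out that if you already have the information for every machine in your environment and do not need gather_facts, you can turn it off. Still, the feature is quite handy, so let's take a look.

First, write a playbook that does nothing but a basic ping.

The result looks like this:

Notice that before running the actual tasks it performs a step called Gathering Facts; that is the information gathering. This is the default behavior, but a task this simple does not need the information, so we can turn it off. There are two places to do that: in ansible.cfg, or per play in the playbook. Let's look at ansible.cfg first.

The gathering setting has three options; the default is implicit
smart: gather by default, but do not gather again for hosts whose facts are already known
implicit: always gather by default; set gather_facts: False to skip
explicit: never gather by default; set gather_facts: True to gather

Here we change it to explicit so it never gathers by default!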
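
The screenshot of the config is not reproduced here. As a minimal sketch, the setting lives under [defaults] in ansible.cfg as the gathering key; write_ansible_cfg is a made-up helper name, and writing into the project directory is an assumption about where your ansible.cfg lives.

```shell
# write_ansible_cfg DIR
# Hypothetical helper: creates DIR/ansible.cfg with fact gathering turned off
# by default. It overwrites an existing file, so merge by hand if you have one.
write_ansible_cfg() {
  cat > "$1/ansible.cfg" <<'EOF'
[defaults]
# smart | implicit | explicit
gathering = explicit
EOF
}

# Usage:
#   write_ansible_cfg . && ansible-playbook -i inventory ping.yml
```

The inventory and ping.yml names in the usage line are placeholders for your own files.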

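Run the same playbook again and you can see it no longer gathers. For simple tasks this saves quite a bit of time.

Hey.. didn't you just say you would introduce it? Why turn it off.. No rush, let's turn it back on.

How do we use it?

Below is how to enable it per play, and what facts can do for us!
For example, if a task needs the host name or IP of the machine it runs on, you can use it like this.

This example fetches the host IP

Result:

Wait!!

Where does ansible_default_ipv4['address'] come from?
That is one of the facts it gathered.
Run ansible <host> -m setup to see all of them

This way, much of a machine's information no longer has to be defined a second time. But... getting your own information may not be that exciting. Can a task on LAB_PROXY10 automatically fetch the IP of LAB_WEB1?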

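Sharing facts

Joy shared is better than joy kept to yourself. Can I borrow the information gathered on your side? Of course; see the example below.
Note that only LAB_WEB1 gathers facts here (gather_facts); LAB_PROXY10 does not

The result:

inventory_hostname refers to the current host itself. Since LAB_PROXY10 did not gather its own facts, its first output is VARIABLE IS NOT DEFINED

LAB_WEB1 did gather facts, so its IP can be printed; we have successfully made LAB_PROXY10 print the IP of LAB_WEB1.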
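
The original playbook was shown as an image and is not reproduced here. The sketch below is a reconstruction under that caveat, not the author's exact file: it assumes both hosts are in the inventory under the names LAB_WEB1 and LAB_PROXY10, and write_shared_facts_playbook is a made-up helper name.

```shell
# write_shared_facts_playbook FILE
# Hypothetical helper: writes a two-play playbook to FILE. Only the first
# play gathers facts; the second reads them through hostvars.
write_shared_facts_playbook() {
  cat > "$1" <<'EOF'
---
# Play 1: gather facts on LAB_WEB1 and run no tasks
- hosts: LAB_WEB1
  gather_facts: true
  tasks: []

# Play 2: no fact gathering on LAB_PROXY10
- hosts: LAB_PROXY10
  gather_facts: false
  tasks:
    - name: print own address
      debug:
        var: hostvars[inventory_hostname]['ansible_default_ipv4']['address']
    - name: print LAB_WEB1 address
      debug:
        var: hostvars['LAB_WEB1']['ansible_default_ipv4']['address']
EOF
}

# Usage:
#   write_shared_facts_playbook share_facts.yml
#   ansible-playbook -i inventory share_facts.yml
```

The first debug task is expected to report an undefined variable, since LAB_PROXY10 gathered nothing about itself; the second reads what the first play stored for LAB_WEB1.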

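Something still seems unclear..

Wait x2

The first time, without sharing facts, we configured var=ansible_default_ipv4['address']
The second time, when sharing facts, it became var=hostvars[inventory_hostname]['ansible_default_ipv4']['address']

Remember that we can define variables in the inventory? Facts are treated as variables too, and the two get mixed together ~~and made into pissing beef balls~~ into hostvars; so you can think of hostvars as the global variables. See the example below

LAB_WEB1 does nothing except gather facts
LAB_PROXY10 prints the global variables

Then paste the printed result here to inspect it

You can see
LAB_PROXY10 has only 24 entries available, while LAB_WEB1 has 110; the difference is what gather facts brings

Wrap-up

That's it for facts.
Key points to review

To use a host's own variables, refer to what the Ansible setup module provides
To use another host's variables, write the full path starting from the global hostvars
Facts only need to be gathered once and can then be reused throughout the playbook

See you next time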