Logstash fails to parse nginx logs: Unrecognized character escape 'x' (code 120)

I have been using Logstash to collect statistics from the nginx access log, and found that when a log entry contains escaped Chinese characters, Logstash cannot parse it (the JSON parse fails).

[Screenshot: DeepinScreenshot_select-area_20190508125002]

How can that be? Isn't the log already written as JSON? Let's look at the nginx log itself. The offending part appears to be the user_agent value, \xE4\xBD\xA0.

[Screenshot: DeepinScreenshot_select-area_20190508125251]

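You can run the log line below through a JSON validator and see that it is not valid JSON. JSON only accepts Unicode escapes in the \uXXXX form (four hex digits, e.g. \u0000); \x00-style byte escapes are not part of the JSON syntax.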

{"status": "200","remote_addr": "192.168.21.152","remote_user": "-","timestamp": "2019-05-08T12:48:26+08:00","request": "GET / HTTP/1.1","bytes_sent": "944","http_referer": "-","http_user_agent": "\xE4\xBD\xA0","server_name": "192.168.150.110","uri": "/","args": "-","proxy_host": "mixed.iloveloli.net","upstream_addr": "192.168.150.113:80","upstream_status": "200","upstream_cache_status": "-","request_time": "0.001","upstream_response_time": "0.001","X-Forwarded-For": "-"}
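To reproduce the validation step without an online validator, here is a minimal Python sketch. It uses a shortened version of the log line above (only the http_user_agent field); the helper name parses_as_json is made up for illustration. The bytes E4 BD A0 are the UTF-8 encoding of 你 (U+4F60), so the second string shows the same character with a JSON-valid escape.

```python
import json

# Shortened log line as nginx writes it without escape=json.
# The raw string keeps the backslashes exactly as they appear in the log file.
bad_line = r'{"http_user_agent": "\xE4\xBD\xA0"}'

# The same character (U+4F60) written with a JSON-valid \u escape.
good_line = r'{"http_user_agent": "\u4F60"}'


def parses_as_json(line: str) -> bool:
    """Return True if the line is valid JSON, False otherwise (hypothetical helper)."""
    try:
        json.loads(line)
        return True
    except json.JSONDecodeError:
        return False


print(parses_as_json(bad_line))   # \x is not a JSON escape, so parsing fails
print(parses_as_json(good_line))  # \uXXXX is accepted
```

Any strict JSON parser should reject the first line for the same reason Logstash does: the backslash is followed by a character that JSON does not define as an escape.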

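So what can we do? nginx is aware of this problem: since 1.11.8 it provides the escape=json setting (strictly speaking, the escape parameter of log_format). The full description is in reference three at the end of this post.
Let's add it to the config file. The part marked with the arrow is what I added.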

[Screenshot: DeepinScreenshot_select-area_20190508130601]
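For readers who cannot see the screenshot, the change looks roughly like the fragment below. This is only a sketch: the format name json_log, the log path, and the shortened field list are assumptions, not the exact config from this post. The relevant part is the escape=json parameter placed right after the format name.

```nginx
# Inside the http { } block.
# "json_log" and the access_log path are placeholder names for illustration.
log_format json_log escape=json
    '{"status": "$status",'
    '"remote_addr": "$remote_addr",'
    '"request": "$request",'
    '"http_user_agent": "$http_user_agent"}';

access_log /var/log/nginx/access.log json_log;
```

With escape=json, nginx escapes characters that are not allowed inside JSON strings (such as double quotes, backslashes, and control characters) in a JSON-compatible way instead of writing \xXX sequences.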

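After the change, run nginx -t && nginx -s reload to check the config and reload nginx.
Below is the result after the change. The Unicode characters now come through correctly. Case closed!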
[Screenshot: DeepinScreenshot_select-area_20190508131051]

Reference 1: GitHub issue
Reference 2: official ELK tutorial for nginx logs
Reference 3: nginx log_format documentation