Hive分析AWS ELB訪問日志

保存 AWS ELB 訪問日志

AWS ELB - AWS elastic load balancing,為了方便做 auto scaling,可以選用 AWS ELB + AutoScaling。

在 ELB 設(shè)置頁面可以開啟保存訪問日志到 S3 bucket中。

Hive 分析ELB日志

Hive可以從S3 bucket中讀取日志文件,并使用SQL查詢。

注:可以配置 AWS EMR + Hue,方便做后面測試。當然也可以將S3 bucket中的文件同步到本地做測試。

創(chuàng)建Hive table

CREATE EXTERNAL TABLE IF NOT EXISTS elb_raw_access_logs (
  request_timestamp string, 
  elb_name string, 
  request_ip string, 
  request_port int, 
  backend_ip string, 
  backend_port int, 
  request_processing_time double, 
  backend_processing_time double, 
  client_response_time double, 
  elb_response_code string, 
  backend_response_code string, 
  received_bytes bigint, 
  sent_bytes bigint, 
  request_verb string, 
  url string, 
  protocol string, 
  user_agent string, 
  ssl_cipher string, 
  ssl_protocol string ) 
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
         'serialization.format' = '1','input.regex' = '([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*)[:\-]([0-9]*) ([-.0-9]*) ([-.0-9]*) ([-.0-9]*) (|[-0-9]*) (-|[-0-9]*) ([-0-9]*) ([-0-9]*) \\\"([^ ]*) ([^ ]*) (- |[^ ]*)\\\" (\"[^\"]*\") ([A-Z0-9-]+) ([A-Za-z0-9.-]*)$' ) 
LOCATION 's3://onetouch-test-elb/';

分析

查看數(shù)據(jù):

SELECT * FROM elb_raw_access_logs WHERE elb_response_code = '200' LIMIT 10;

日期:2017-04-17T10:11:32.623734Z

各模塊正常請求的平均響應時間:

SELECT elb_name, avg(backend_processing_time)
    FROM elb_raw_access_logs
    WHERE elb_response_code == '200'
    GROUP BY elb_name;

Hive partition

CREATE EXTERNAL TABLE IF NOT EXISTS elb_raw_access_logs_part (
  request_timestamp string, 
  elb_name string, 
  request_ip string, 
  request_port int, 
  backend_ip string, 
  backend_port int, 
  request_processing_time double, 
  backend_processing_time double, 
  client_response_time double, 
  elb_response_code string, 
  backend_response_code string, 
  received_bytes bigint, 
  sent_bytes bigint, 
  request_verb string, 
  url string, 
  protocol string, 
  user_agent string, 
  ssl_cipher string, 
  ssl_protocol string ) 
PARTITIONED BY(year string, month string, day string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
         'serialization.format' = '1','input.regex' = '([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*)[:\-]([0-9]*) ([-.0-9]*) ([-.0-9]*) ([-.0-9]*) (|[-0-9]*) (-|[-0-9]*) ([-0-9]*) ([-0-9]*) \\\"([^ ]*) ([^ ]*) (- |[^ ]*)\\\" (\"[^\"]*\") ([A-Z0-9-]+) ([A-Za-z0-9.-]*)$' )
LOCATION 's3://onetouch-test-elb/';

alert table

ALTER TABLE elb_raw_access_logs_part ADD PARTITION (year='2017',month='05',day='30') 
    location 's3://onetouch-test-elb/proxy/AWSLogs/677234397898/elasticloadbalancing/us-east-1/2017/05/30/';

show partitions elb_raw_access_logs_part;

分析

2017-05-30 的數(shù)據(jù)按 elb_response_code 分組:

SELECT elb_response_code, count(url) FROM elb_raw_access_logs_part
    WHERE year = '2017' AND month = '05' AND day = '30'
    GROUP BY elb_response_code;

以天為單位統(tǒng)計模塊吞吐量:

select year, month, day, count(*) as total_request_count 
from auth_elb_access_logs_part
group by year, month, day;

以天為單位統(tǒng)計錯誤率:

select year, month, day, SUM( IF( substr(elb_response_code, 1, 1) != '2', 1 , 0 ) )/ COUNT(*) * 100 as error_rate_pct 
from auth_elb_access_logs_part
group by year, month, day;

以天為單位計算平均響應時間:

select year, month, day, avg(backend_processing_time) as avg_backend_processing_time
from auth_elb_access_logs_part
group by year, month, day;
最后編輯于
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請聯(lián)系作者
【社區(qū)內(nèi)容提示】社區(qū)部分內(nèi)容疑似由AI輔助生成,瀏覽時請結(jié)合常識與多方信息審慎甄別。
平臺聲明:文章內(nèi)容(如有圖片或視頻亦包括在內(nèi))由作者上傳并發(fā)布,文章內(nèi)容僅代表作者本人觀點,簡書系信息發(fā)布平臺,僅提供信息存儲服務。

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容