Zabbix: from getting started to giving up

Installing Zabbix with Docker, adding hosts, configuring alerting, and tuning performance.

Docker setup

# install docker-ce
sudo yum install -y yum-utils device-mapper-persistent-data lvm2
sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
sudo yum install -y docker-ce
sudo systemctl start docker

# create the directories used for the volume mounts below
mkdir -p /data/zabbix_data
mkdir -p /data/zabbix/alertscripts
mkdir -p /data/zabbix/externalscripts
mkdir -p /data/zabbix/conf

Next, install the Zabbix database, server (backend) and web frontend.

# database (MySQL)
docker run --name mysql-server -t \
-e MYSQL_DATABASE="zabbix" \
-e MYSQL_USER="zabbix" \
-e MYSQL_PASSWORD="feiyang@2019+" \
-e MYSQL_ROOT_PASSWORD="feiyang@2019+" \
-v /data/zabbix_data:/var/lib/mysql \
-d mysql:5.7 \
--character-set-server=utf8 --collation-server=utf8_bin

# Zabbix server (backend), parameters already tuned
docker run --name zabbix-server-mysql \
-e DB_SERVER_HOST="mysql-server" \
-e MYSQL_DATABASE="zabbix" \
-e MYSQL_USER="zabbix" \
-e MYSQL_PASSWORD="feiyang@2019+" \
-e MYSQL_ROOT_PASSWORD="feiyang@2019+" \
-e ZBX_TIMEOUT=30 \
-e ZBX_CACHESIZE=8G \
-e ZBX_TRENDCACHESIZE=2G \
-e ZBX_STARTPOLLERS=500 \
-e ZBX_STARTPOLLERSUNREACHABLE=100 \
-e ZBX_HOUSEKEEPINGFREQUENCY=0 \
-v /data/zabbix/alertscripts:/usr/lib/zabbix/alertscripts \
-v /data/zabbix/externalscripts:/usr/lib/zabbix/externalscripts \
-v /data/zabbix/conf:/etc/zabbix \
--link mysql-server:mysql \
-p 10051:10051 \
-d zabbix/zabbix-server-mysql:centos-4.2-latest

# web frontend
docker run --name zabbix-web-nginx-mysql \
-e DB_SERVER_HOST="mysql-server" \
-e MYSQL_DATABASE="zabbix" \
-e MYSQL_USER="zabbix" \
-e MYSQL_PASSWORD="feiyang@2019+" \
-e MYSQL_ROOT_PASSWORD="feiyang@2019+" \
-e PHP_TZ="Asia/Singapore" \
--link mysql-server:mysql \
--link zabbix-server-mysql:zabbix-server \
-p 8080:80 \
-d zabbix/zabbix-web-nginx-mysql:centos-4.2-latest

Once the containers are up, open http://localhost:8080 in a browser. The default account is Admin and the password is zabbix.
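
Before moving on, it is worth a quick smoke test that the frontend and API actually answer. A minimal sketch using the unauthenticated apiinfo.version method (the URL assumes the -p 8080:80 mapping above; adjust host and port to your setup):

# check_api.py - quick smoke test of the Zabbix frontend/API
import json
import requests

url = 'http://localhost:8080/api_jsonrpc.php'   # assumes the port mapping above

payload = {
    "jsonrpc": "2.0",
    "method": "apiinfo.version",   # needs no authentication
    "params": [],
    "id": 1,
}

ret = requests.post(url, data=json.dumps(payload),
                    headers={'Content-Type': 'application/json'}, timeout=10)
print(ret.json())   # expect something like {'jsonrpc': '2.0', 'result': '4.2.x', 'id': 1}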

Batch-adding hosts with Ansible

---
- name: add zabbix hosts
  local_action:
    module: zabbix_host
    server_url: "{{ var_server_url }}"
    login_user: "{{ var_login_user }}"
    login_password: "{{ var_login_password }}"
    host_name: "{{ inventory_hostname }}"
    visible_name: "{{ inventory_hostname }}-{{ function }}"
    host_groups:
      - "{{ var_host_group }}"
    link_templates:
      - Template Sea Ops OS Linux
      - Template Sea Ops Disk IO Linux
    #status: disabled
    status: enabled
    state: present
    interfaces:
      - type: 1
        main: 1
        useip: 1
        ip: "{{ var_lanip | default(inventory_hostname) }}"
        dns: ""
        port: 10050
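
Run the task with ansible-playbook against an inventory that defines var_server_url, var_login_user, var_login_password, var_host_group and function. The zabbix_host module drives the same host.create API call that you can also send by hand, which is handy for one-off hosts or for debugging what the module does. A rough sketch, where the group ID, template ID and auth token are placeholders you would first look up with hostgroup.get, template.get and user.login (see the auth_zabbix.py example later in this post):

# add_host.py - register one host with a single agent interface (hedged sketch)
import json
import requests

url = 'http://IP:port/api_jsonrpc.php'
auth = 'token returned by auth_zabbix.py later in this post'

payload = {
    "jsonrpc": "2.0",
    "method": "host.create",
    "params": {
        "host": "web-01",                          # hypothetical host name
        "interfaces": [{
            "type": 1, "main": 1, "useip": 1,      # type 1 = Zabbix agent
            "ip": "10.10.1.1", "dns": "", "port": "10050"
        }],
        "groups": [{"groupid": "2"}],              # placeholder, see hostgroup.get
        "templates": [{"templateid": "10001"}]     # placeholder, see template.get
    },
    "auth": auth,
    "id": 1,
}

print(requests.post(url, data=json.dumps(payload),
                    headers={'Content-Type': 'application/json'}).json())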

Action

When configuring the alert Action, if the operation's Steps are set from 1 to 0, Zabbix keeps sending notifications until the event status changes to OK; if the Steps are set from 1 to 1, only a single notification is sent and nothing follows.
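
For illustration, the same escalation settings expressed through the API would look roughly like the action.create sketch below; the user group ID, media type ID and auth token are placeholders, and the field names should be double-checked against the API reference for your exact version:

# create_action.py - trigger action that keeps notifying until the problem resolves (hedged sketch)
import json
import requests

url = 'http://IP:port/api_jsonrpc.php'
auth = 'token returned by auth_zabbix.py later in this post'

payload = {
    "jsonrpc": "2.0",
    "method": "action.create",
    "params": {
        "name": "Notify ops until resolved",
        "eventsource": 0,                # 0 = trigger events
        "esc_period": "30m",             # interval between escalation steps
        "operations": [{
            "operationtype": 0,          # 0 = send message
            "esc_step_from": 1,
            "esc_step_to": 0,            # 1 -> 0: repeat every esc_period until OK
                                         # set esc_step_to to 1 to send only once
            "opmessage_grp": [{"usrgrpid": "7"}],                 # placeholder user group
            "opmessage": {"default_msg": 1, "mediatypeid": "1"}   # placeholder media type
        }]
    },
    "auth": auth,
    "id": 1,
}

print(requests.post(url, data=json.dumps(payload),
                    headers={'Content-Type': 'application/json'}).json())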

Zabbix monitoring

Monitoring web page status

Zabbix's built-in Web monitoring is enough for simple web page checks. The current official Zabbix release is 4.2 (as of 2019-07-09).
First pick a host: go to Configuration → Hosts, click on Web in the row of that host, then click on Create web scenario. See the official documentation for the details; the docs on adding alerts for web monitoring are equally thorough.
The resulting graphs and check details are shown under Monitoring → Web in the Zabbix frontend.
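
When there are many URLs to watch, the same scenario can also be created through the API with httptest.create instead of clicking through the UI. A hedged sketch, with hostid and auth as placeholders and the step fields to be verified against the web scenario object reference for your version:

# create_web_scenario.py - basic availability check of one URL (hedged sketch)
import json
import requests

url = 'http://IP:port/api_jsonrpc.php'
auth = 'token returned by auth_zabbix.py later in this post'

payload = {
    "jsonrpc": "2.0",
    "method": "httptest.create",
    "params": {
        "name": "homepage availability",
        "hostid": "10084",               # placeholder host ID, see host.get
        "delay": "60s",                  # run the scenario every minute
        "steps": [{
            "no": 1,                     # step order
            "name": "homepage",
            "url": "https://example.com/",
            "status_codes": "200"
        }]
    },
    "auth": auth,
    "id": 1,
}

print(requests.post(url, data=json.dumps(payload),
                    headers={'Content-Type': 'application/json'}).json())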

Monitoring DNS

See the official documentation, version 4.2.
Out of the box, Zabbix can check both whether a name resolves at all and the actual records returned, using the built-in keys:

net.dns[<ip>,zone,<type>,<timeout>,<count>]
net.dns.record[<ip>,zone,<type>,<timeout>,<count>]
ip      - address of the DNS server to query
zone    - domain name to resolve
type    - record type (A, MX, TXT, ...)
timeout - request timeout in seconds (default 1)
count   - number of tries if resolution fails (default 2)

net.dns returns 1 when resolution succeeds and 0 when it fails, so a trigger that fires when the last three checks all failed can be written as:

{host:net.dns[dns_server,domain,A,1,2].count(#3,0)}=3
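
The key can be attached to a host as a normal agent item, either in the UI or via the API. A hedged item.create sketch for the 0/1 availability check (hostid, interfaceid and auth are placeholders):

# create_dns_item.py - agent item returning 1 when resolution works, 0 otherwise (hedged sketch)
import json
import requests

url = 'http://IP:port/api_jsonrpc.php'
auth = 'token returned by auth_zabbix.py later in this post'

payload = {
    "jsonrpc": "2.0",
    "method": "item.create",
    "params": {
        "name": "DNS resolution of example.com",
        "key_": "net.dns[8.8.8.8,example.com,A,1,2]",
        "hostid": "10084",            # placeholder host ID
        "interfaceid": "1",           # placeholder agent interface ID
        "type": 0,                    # 0 = Zabbix agent
        "value_type": 3,              # 3 = numeric unsigned (0/1)
        "delay": "60s"
    },
    "auth": auth,
    "id": 1,
}

print(requests.post(url, data=json.dumps(payload),
                    headers={'Content-Type': 'application/json'}).json())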

Database table optimization (partitioning)

DELIMITER $$
CREATE PROCEDURE `partition_create`(SCHEMANAME varchar(64), TABLENAME varchar(64), PARTITIONNAME varchar(64), CLOCK int)
BEGIN
/*
SCHEMANAME = The DB schema in which to make changes
TABLENAME = The table with partitions to potentially delete
PARTITIONNAME = The name of the partition to create
*/
/*
Verify that the partition does not already exist
*/

DECLARE RETROWS INT;
SELECT COUNT(1) INTO RETROWS
FROM information_schema.partitions
WHERE table_schema = SCHEMANAME AND table_name = TABLENAME AND partition_description >= CLOCK;

IF RETROWS = 0 THEN
/*
1. Print a message indicating that a partition was created.
2. Create the SQL to create the partition.
3. Execute the SQL from #2.
*/
SELECT CONCAT( "partition_create(", SCHEMANAME, ",", TABLENAME, ",", PARTITIONNAME, ",", CLOCK, ")" ) AS msg;
SET @sql = CONCAT( 'ALTER TABLE ', SCHEMANAME, '.', TABLENAME, ' ADD PARTITION (PARTITION ', PARTITIONNAME, ' VALUES LESS THAN (', CLOCK, '));' );
PREPARE STMT FROM @sql;
EXECUTE STMT;
DEALLOCATE PREPARE STMT;
END IF;
END$$
DELIMITER ;
DELIMITER $$
CREATE PROCEDURE `partition_drop`(SCHEMANAME VARCHAR(64), TABLENAME VARCHAR(64), DELETE_BELOW_PARTITION_DATE BIGINT)
BEGIN
/*
SCHEMANAME = The DB schema in which to make changes
TABLENAME = The table with partitions to potentially delete
DELETE_BELOW_PARTITION_DATE = Delete any partitions with names that are dates older than this one (yyyy-mm-dd)
*/
DECLARE done INT DEFAULT FALSE;
DECLARE drop_part_name VARCHAR(16);

/*
Get a list of all the partitions that are older than the date
in DELETE_BELOW_PARTITION_DATE. All partitions are prefixed with
a "p", so use SUBSTRING TO get rid of that character.
*/
DECLARE myCursor CURSOR FOR
SELECT partition_name
FROM information_schema.partitions
WHERE table_schema = SCHEMANAME AND table_name = TABLENAME AND CAST(SUBSTRING(partition_name FROM 2) AS UNSIGNED) < DELETE_BELOW_PARTITION_DATE;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;

/*
Create the basics for when we need to drop the partition. Also, create
@drop_partitions to hold a comma-delimited list of all partitions that
should be deleted.
*/
SET @alter_header = CONCAT("ALTER TABLE ", SCHEMANAME, ".", TABLENAME, " DROP PARTITION ");
SET @drop_partitions = "";

/*
Start looping through all the partitions that are too old.
*/
OPEN myCursor;
read_loop: LOOP
FETCH myCursor INTO drop_part_name;
IF done THEN
LEAVE read_loop;
END IF;
SET @drop_partitions = IF(@drop_partitions = "", drop_part_name, CONCAT(@drop_partitions, ",", drop_part_name));
END LOOP;
IF @drop_partitions != "" THEN
/*
1. Build the SQL to drop all the necessary partitions.
2. Run the SQL to drop the partitions.
3. Print out the table partitions that were deleted.
*/
SET @full_sql = CONCAT(@alter_header, @drop_partitions, ";");
PREPARE STMT FROM @full_sql;
EXECUTE STMT;
DEALLOCATE PREPARE STMT;

SELECT CONCAT(SCHEMANAME, ".", TABLENAME) AS `table`, @drop_partitions AS `partitions_deleted`;
ELSE
/*
No partitions are being deleted, so print out "N/A" (Not applicable) to indicate
that no changes were made.
*/
SELECT CONCAT(SCHEMANAME, ".", TABLENAME) AS `table`, "N/A" AS `partitions_deleted`;
END IF;
END$$
DELIMITER ;
DELIMITER $$
CREATE PROCEDURE `partition_maintenance`(SCHEMA_NAME VARCHAR(32), TABLE_NAME VARCHAR(32), KEEP_DATA_DAYS INT, HOURLY_INTERVAL INT, CREATE_NEXT_INTERVALS INT)
BEGIN
DECLARE OLDER_THAN_PARTITION_DATE VARCHAR(16);
DECLARE PARTITION_NAME VARCHAR(16);
DECLARE OLD_PARTITION_NAME VARCHAR(16);
DECLARE LESS_THAN_TIMESTAMP INT;
DECLARE CUR_TIME INT;

CALL partition_verify(SCHEMA_NAME, TABLE_NAME, HOURLY_INTERVAL);
SET CUR_TIME = UNIX_TIMESTAMP(DATE_FORMAT(NOW(), '%Y-%m-%d 00:00:00'));

SET @__interval = 1;
create_loop: LOOP
IF @__interval > CREATE_NEXT_INTERVALS THEN
LEAVE create_loop;
END IF;

SET LESS_THAN_TIMESTAMP = CUR_TIME + (HOURLY_INTERVAL * @__interval * 3600);
SET PARTITION_NAME = FROM_UNIXTIME(CUR_TIME + HOURLY_INTERVAL * (@__interval - 1) * 3600, 'p%Y%m%d%H00');
IF(PARTITION_NAME != OLD_PARTITION_NAME) THEN
CALL partition_create(SCHEMA_NAME, TABLE_NAME, PARTITION_NAME, LESS_THAN_TIMESTAMP);
END IF;
SET @__interval=@__interval+1;
SET OLD_PARTITION_NAME = PARTITION_NAME;
END LOOP;

SET OLDER_THAN_PARTITION_DATE=DATE_FORMAT(DATE_SUB(NOW(), INTERVAL KEEP_DATA_DAYS DAY), '%Y%m%d0000');
CALL partition_drop(SCHEMA_NAME, TABLE_NAME, OLDER_THAN_PARTITION_DATE);

END$$
DELIMITER ;
DELIMITER $$
CREATE PROCEDURE `partition_verify`(SCHEMANAME VARCHAR(64), TABLENAME VARCHAR(64), HOURLYINTERVAL INT(11))
BEGIN
DECLARE PARTITION_NAME VARCHAR(16);
DECLARE RETROWS INT(11);
DECLARE FUTURE_TIMESTAMP TIMESTAMP;

/*
* Check if any partitions exist for the given SCHEMANAME.TABLENAME.
*/
SELECT COUNT(1) INTO RETROWS
FROM information_schema.partitions
WHERE table_schema = SCHEMANAME AND table_name = TABLENAME AND partition_name IS NULL;

/*
* If partitions do not exist, go ahead and partition the table
*/
IF RETROWS = 1 THEN
/*
* Take the current date at 00:00:00 and add HOURLYINTERVAL to it. This is the timestamp below which we will store values.
* We begin partitioning based on the beginning of a day. This is because we don't want to generate a random partition
* that won't necessarily fall in line with the desired partition naming (ie: if the hour interval is 24 hours, we could
* end up creating a partition now named "p201403270600" when all other partitions will be like "p201403280000").
*/
SET FUTURE_TIMESTAMP = TIMESTAMPADD(HOUR, HOURLYINTERVAL, CONCAT(CURDATE(), " ", '00:00:00'));
SET PARTITION_NAME = DATE_FORMAT(CURDATE(), 'p%Y%m%d%H00');

-- Create the partitioning query
SET @__PARTITION_SQL = CONCAT("ALTER TABLE ", SCHEMANAME, ".", TABLENAME, " PARTITION BY RANGE(`clock`)");
SET @__PARTITION_SQL = CONCAT(@__PARTITION_SQL, "(PARTITION ", PARTITION_NAME, " VALUES LESS THAN (", UNIX_TIMESTAMP(FUTURE_TIMESTAMP), "));");

-- Run the partitioning query
PREPARE STMT FROM @__PARTITION_SQL;
EXECUTE STMT;
DEALLOCATE PREPARE STMT;
END IF;
END$$
DELIMITER ;

DELIMITER $$
CREATE PROCEDURE `partition_maintenance_all`(SCHEMA_NAME VARCHAR(32))
BEGIN
CALL partition_maintenance(SCHEMA_NAME, 'history', 30, 24, 14);
CALL partition_maintenance(SCHEMA_NAME, 'history_log', 30, 24, 14);
CALL partition_maintenance(SCHEMA_NAME, 'history_str', 30, 24, 14);
CALL partition_maintenance(SCHEMA_NAME, 'history_text', 30, 24, 14);
CALL partition_maintenance(SCHEMA_NAME, 'history_uint', 30, 24, 14);
CALL partition_maintenance(SCHEMA_NAME, 'trends', 120, 24, 14);
CALL partition_maintenance(SCHEMA_NAME, 'trends_uint', 120, 24, 14);
END$$
DELIMITER ;

In partition_maintenance_all, a call such as ('history', 30, 24, 14) means: keep at most 30 days of data, create one partition per 24 hours, and pre-create the next 14 partitions on each run; the trends tables are kept for 120 days.

First enter the MySQL container and import the partition.sql above into the zabbix database:

mysql -uzabbix -pfeiyang@2019+ zabbix < partition.sql

# inside the mysql container: vim /opt/mysql.sh

#!/bin/bash
mysql -uzabbix -pfeiyang@2019+ zabbix -e "CALL partition_maintenance_all('zabbix')"

chmod 755 /opt/mysql.sh

Exit the container and set up a cron job on the Docker host:

# vim /etc/crontab

23 03 * * * root /bin/docker exec [mysql container ID] bash -c "cd /opt && bash mysql.sh"

systemctl restart crond
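
To confirm the job really creates and rotates partitions, list them from information_schema. A small sketch that assumes the pymysql package is installed and that MySQL is reachable from wherever you run it (for example from a container on the same Docker network, or after publishing port 3306):

# check_partitions.py - list the partitions of the history table (hedged sketch)
import pymysql   # assumption: pip install pymysql

conn = pymysql.connect(host="mysql-server", user="zabbix",
                       password="feiyang@2019+", database="zabbix")
try:
    with conn.cursor() as cur:
        cur.execute(
            "SELECT partition_name, table_rows "
            "FROM information_schema.partitions "
            "WHERE table_schema = %s AND table_name = %s "
            "ORDER BY partition_name",
            ("zabbix", "history"),
        )
        for name, rows in cur.fetchall():
            print(name, rows)    # expect one partition per day, e.g. p201907090000
finally:
    conn.close()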

Zabbix API

Official documentation

# auth_zabbix.py

import requests
import json

url = 'http://IP:port/api_jsonrpc.php'   # Docker deployment
# for a non-Docker install use "http://IP:port/zabbix/api_jsonrpc.php"

post_data = {
    "jsonrpc": "2.0",
    "method": "user.login",
    "params": {
        "user": "xxx",
        "password": "xxx"
    },
    "id": 1,
}
post_header = {'Content-Type': 'application/json'}

ret = requests.post(url, data=json.dumps(post_data), headers=post_header)

zabbix_ret = json.loads(ret.text)

if 'result' not in zabbix_ret:
    print('login error')
else:
    print(zabbix_ret.get('result'))   # the auth token used by the examples below

# get_hostid.py

import requests
import json

url = 'http://IP:port/api_jsonrpc.php'

server_list = ["1.1.1.1", "233.233.233.233"]

post_data = {
    "jsonrpc": "2.0",
    "method": "host.get",
    "params": {
        "filter": {
            "host": server_list
        },
        "sortfield": "host",
    },
    "id": 1,
    "auth": "token returned by auth_zabbix.py above"
}

post_header = {'Content-Type': 'application/json'}

ret = requests.post(url, data=json.dumps(post_data), headers=post_header)

zabbix_ret = json.loads(ret.text)

print(zabbix_ret)

if 'result' not in zabbix_ret:
    print('api error:', zabbix_ret.get('error'))
else:
    print(zabbix_ret.get('result'))

hostid_list = []
for i in zabbix_ret.get('result'):
    hostid_list.append(str(i['hostid']))

print(hostid_list)

# get_hist_data.py
import requests
import json
import time
import datetime

today = datetime.date.today()
today_unix = int(time.mktime(today.timetuple()))
tomorrow = today + datetime.timedelta(days=1)
tomorrow_unix = int(time.mktime(tomorrow.timetuple()))

print(today_unix)
print(tomorrow_unix)

url = 'http://IP:port/api_jsonrpc.php'

post_data = {
    "jsonrpc": "2.0",
    "method": "history.get",
    "params": {
        "output": "extend",
        "history": 3,   # history object type: 0 float, 1 character, 2 log, 3 unsigned, 4 text
        "itemids": "31023",
        "sortfield": "clock",
        "sortorder": "DESC",
        "time_from": today_unix,
        "time_till": tomorrow_unix
    },
    "auth": "token returned by auth_zabbix.py above",
    "id": 1
}

post_header = {'Content-Type': 'application/json'}

ret = requests.post(url, data=json.dumps(post_data), headers=post_header)

zabbix_ret = json.loads(ret.text)

print(zabbix_ret.get('result'))

# get_trend_data.py
import requests
import json
import time
import datetime

today = datetime.date.today()
today_unix = int(time.mktime(today.timetuple()))
tomorrow = today + datetime.timedelta(days=1)
tomorrow_unix = int(time.mktime(tomorrow.timetuple()))


url = 'http://IP:port/api_jsonrpc.php'

post_data = {
    "jsonrpc": "2.0",
    "method": "trend.get",
    "params": {
        "output": [          # fields to return
            "itemid",
            "clock",         # timestamp of the hour
            "num",           # number of values collected in that hour
            "value_min",
            "value_avg",
            "value_max"
        ],
        "itemids": [
            "28959",
            "28972"
        ],
        "time_from": today_unix,
        "time_till": tomorrow_unix
    },
    "auth": "token returned by auth_zabbix.py above",
    "id": 1
}

post_header = {'Content-Type': 'application/json'}

ret = requests.post(url, data=json.dumps(post_data), headers=post_header)

zabbix_ret = json.loads(ret.text)

print(zabbix_ret.get('result'))

zabbix_get

zabbix_get checks from the server side whether the agent on a client is reachable; when it fails, the usual culprits are iptables on the client or the Server whitelist in the agent configuration.

zabbix_get -s 10.10.1.1 -k system.uname
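
If zabbix_get is not available on the machine you are testing from, the same passive check can be reproduced with a few lines of Python: the agent accepts a plain-text key terminated by a newline and answers with a ZBXD-framed response. A sketch, assuming the agent listens on 10050 and the probing host is allowed by the agent's Server= whitelist:

# agent_probe.py - minimal passive check without zabbix_get (hedged sketch)
import socket
import struct

def zabbix_get(host, key, port=10050, timeout=5):
    with socket.create_connection((host, port), timeout=timeout) as s:
        s.sendall(key.encode() + b"\n")     # old-style plain-text request
        header = s.recv(13)                 # b"ZBXD\x01" + 8-byte little-endian length
        if not header.startswith(b"ZBXD"):
            return header.decode(errors="replace")
        length = struct.unpack("<Q", header[5:13])[0]
        data = b""
        while len(data) < length:
            chunk = s.recv(length - len(data))
            if not chunk:
                break
            data += chunk
        return data.decode(errors="replace")

print(zabbix_get("10.10.1.1", "system.uname"))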