Zabbix discovery template for monitoring disk RAID

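It is usually hard to know the current state of the hard disks in a server, and the check is typically left to data-center staff on their inspection rounds. Can it be done in software instead? MegaCli can do it: the "Media Error Count" and "Other Error Count" values it reports are commonly used to judge whether a disk in the array has a problem.

Media Error Count indicates possible errors on the disk itself, for example bad sectors. Any non-zero value deserves attention, and the larger the value, the higher the risk.

Other Error Count indicates that the disk may be loose and may need to be reseated.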
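
As a rough illustration of how these two counters can be read per disk, the sketch below summarizes `-PDList` style output. The function name `pd_error_summary` is made up for this example. It assumes each disk's block contains a `Slot Number:` line followed by `Media Error Count:` and `Other Error Count:` lines, in that order, which is the same assumption the grep patterns in this post rely on.

```shell
#!/bin/bash
# Hypothetical helper (not part of MegaCli or Zabbix): reads PDList-style text
# on stdin and prints one "slot media_errors other_errors" line per disk.
pd_error_summary() {
    awk -F': *' '
        # Strip trailing blanks or carriage returns from a value
        function trim(s) { sub(/[ \t\r]+$/, "", s); return s }
        /^Slot Number/       { slot = trim($2) }
        /^Media Error Count/ { mec = trim($2) }
        /^Other Error Count/ { print slot, mec, trim($2) }
    '
}

# Example (requires MegaCli and sudo rights, paths as used in this post):
#   sudo /usr/local/MegaCli/MegaCli64 -PDList -aAll -NoLog | pd_error_summary
```

Any line whose second or third column is non-zero points at a disk worth a closer look.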

Discovery script:

#!/bin/bash
# raid_id_discover.sh
# Author: wuhf
# Prints Zabbix low-level discovery JSON with one {#RAID_ID} per disk slot.

RAID_stats() {
    # Slot numbers of all physical disks on all adapters
    DISK=($(sudo /usr/local/MegaCli/MegaCli64 -pdlist -aALL | grep "Slot Number" | awk -F":" '{print $2}'))
    last=$(( ${#DISK[@]} - 1 ))
    num=0
    printf '{\n\t"data":[\n'
    for key in "${DISK[@]}"; do
        if [[ "$num" -lt "$last" ]]; then
            # Every entry except the last is followed by a comma
            printf '\t\t{"{#RAID_ID}":"%s"},\n' "$key"
        else
            printf '\t\t{"{#RAID_ID}":"%s"}\n' "$key"
        fi
        num=$((num + 1))
    done
    printf '\t]\n}\n'
}

RAID_stats
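
For reference, the JSON the script is meant to produce has a top-level `data` array with one `{#RAID_ID}` macro per slot. A compact way to sketch the same comma handling, without tracking an index, is to print the separator before every entry except the first. The `lld_json` function below is illustrative only, not the author's script, and it takes slot numbers on stdin so that it can be tried on a machine without a RAID controller.

```shell
#!/bin/bash
# Illustrative sketch: build Zabbix low-level discovery JSON from slot numbers
# read on stdin (one per line). Same structure as the script in this post,
# but without the tabs and newlines.
lld_json() {
    local sep="" slot
    printf '{"data":['
    while read -r slot; do
        [ -n "$slot" ] || continue          # skip blank lines
        printf '%s{"{#RAID_ID}":"%s"}' "$sep" "$slot"
        sep=","                             # comma before all later entries
    done
    printf ']}\n'
}

# Example: printf '0\n1\n' | lld_json
```

Feeding it the slot list from MegaCli should give output equivalent to the discovery script, which makes it easy to compare the two when debugging a discovery rule.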

Key settings:

#raid.conf
UserParameter=raid_discover,bash /usr/local/zabbix/libexec/raid_id_discover.sh
UserParameter=raid_degraded,sudo /usr/local/MegaCli/MegaCli64 -AdpAllInfo -aALL -NoLog | grep "Degraded" | awk '{print $NF}'
UserParameter=raid_failed_disks,sudo /usr/local/MegaCli/MegaCli64 -AdpAllInfo -aALL -NoLog | grep "Failed Disks" | awk '{print $NF}'
# grep -w keeps slot 1 from also matching slots 10-19
UserParameter=raid_MEC[*],sudo /usr/local/MegaCli/MegaCli64 -PDList -aAll -NoLog | grep -w -A 8 "Slot Number: $1" | grep "Media Error Count" | awk '{print $NF}'
UserParameter=raid_OEC[*],sudo /usr/local/MegaCli/MegaCli64 -PDList -aAll -NoLog | grep -w -A 8 "Slot Number: $1" | grep "Other Error Count" | awk '{print $NF}'
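
One detail worth checking with the `raid_MEC[*]` and `raid_OEC[*]` keys: a plain `grep "Slot Number: 1"` also matches `Slot Number: 10` through `19`, so on enclosures with ten or more slots a key may return several values. The sketch below shows one way to select exactly one slot. It swaps the fixed `-A 8` window for an awk pass that tracks the current slot; `slot_counter` is a hypothetical helper name, not something MegaCli or Zabbix provides.

```shell
#!/bin/bash
# Hypothetical helper: print one counter for exactly one slot from PDList-style
# text on stdin. $1 = slot number, $2 = counter label (e.g. "Media Error Count").
slot_counter() {
    awk -v slot="$1" -v label="$2" -F': *' '
        # Remember which slot the following lines belong to
        /^Slot Number/ { cur = $2 }
        # Print the value only when the line starts with the label
        # and we are inside the requested slot
        index($0, label) == 1 && cur == slot { print $2 }
    '
}

# Example (requires MegaCli and sudo rights, paths as used in this post):
#   sudo /usr/local/MegaCli/MegaCli64 -PDList -aAll -NoLog | slot_counter 1 "Media Error Count"
```

If the same slot number exists on more than one adapter, this still prints one value per adapter, so the adapter would need to be part of the key in that setup.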

Permission settings:

chmod 755 /usr/local/zabbix/libexec/raid_id_discover.sh
chown zabbix:zabbix /usr/local/zabbix/libexec/raid_id_discover.sh
chown zabbix:zabbix /usr/local/zabbix/etc/zabbix_agentd.conf.d/raid.conf
echo "zabbix ALL=(root) NOPASSWD:ALL" >> /etc/sudoers
sed -i 's/^Defaults.*.requiretty/#Defaults requiretty/' /etc/sudoers
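
Granting `NOPASSWD:ALL` gives the zabbix user unrestricted root access. If that is broader than needed, a rule limited to the MegaCli binary may be enough for the keys above. The sketch below writes such a rule to a file. The function name `write_raid_sudoers` and the drop-in file name in the usage comment are assumptions for illustration, and the approach assumes the system's sudoers configuration includes `/etc/sudoers.d`.

```shell
#!/bin/bash
# Hypothetical helper: write a sudoers rule that lets the zabbix user run only
# MegaCli64 as root without a password. $1 = output file.
write_raid_sudoers() {
    local out="$1"
    {
        # Allow sudo without a tty for this user only
        echo 'Defaults:zabbix !requiretty'
        echo 'zabbix ALL=(root) NOPASSWD: /usr/local/MegaCli/MegaCli64'
    } > "$out"
    chmod 440 "$out"
}

# Example (as root; check the syntax before relying on it):
#   write_raid_sudoers /etc/sudoers.d/zabbix_raid
#   visudo -cf /etc/sudoers.d/zabbix_raid
```

Scoping `requiretty` to the zabbix user also avoids editing the global `Defaults` line in `/etc/sudoers`.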

Template import:

Zabbix discovery template for monitoring disk RAID


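Notes:

To understand the template you first need to be familiar with the MegaCLI commands; many tutorials can be found via Baidu.

The template I provide was run on Zabbix 3.0, and older versions may not be compatible. Once you understand what the keys mean, you can build your own template.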

