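The current state of a hard disk is usually hard to determine, and is typically checked by data-center staff on inspection rounds. Can it be checked in software instead? MegaCli can do this: the "Media Error Count" and "Other Error Count" values it reports are commonly used to judge whether a disk in the array has a problem.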
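Media Error Count indicates possible errors on the disk itself, such as bad sectors. A non-zero value deserves attention, and the larger the value, the higher the risk;
Other Error Count indicates that the disk may be loose and may need to be reseated;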
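As a rough illustration of how these two counters can be read together, here is a minimal sketch that flags any slot whose counters are non-zero. The function name flag_disks and the sample text are made up for illustration (the sample is not real MegaCli output, it only reuses the three field names mentioned above), and the sketch assumes a single adapter so that slot numbers are unique.

```shell
#!/bin/bash
# Sketch only: flag disks whose error counters are non-zero.
# Reads "-PDList"-style text on stdin. Assumes slot numbers are unique.
flag_disks() {
    awk -F':' '
        /Slot Number/       { slot = $2 + 0; order[++n] = slot }
        /Media Error Count/ { mec[slot] = $2 + 0 }
        /Other Error Count/ { oec[slot] = $2 + 0 }
        END {
            for (i = 1; i <= n; i++) {
                s = order[i]
                status = (mec[s] > 0 || oec[s] > 0) ? "CHECK" : "OK"
                printf "slot=%d media=%d other=%d %s\n", s, mec[s], oec[s], status
            }
        }'
}

# Hypothetical sample input, abbreviated to the three relevant fields.
sample='Slot Number: 0
Media Error Count: 0
Other Error Count: 0
Slot Number: 1
Media Error Count: 12
Other Error Count: 0'

printf '%s\n' "$sample" | flag_disks
# slot=0 media=0 other=0 OK
# slot=1 media=12 other=0 CHECK
```

On a real host the same function could be fed from the command used later in this article, for example: sudo /usr/local/MegaCli/MegaCli64 -PDList -aALL -NoLog | flag_disks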
Discovery script:
#!/bin/bash
# raid_id_discover.sh (author: wuhf)
# Prints the slot number of every physical disk as Zabbix low-level discovery JSON.
num=0
RAID_stats() {
    # Collect the slot numbers reported by MegaCli
    DISK=($(sudo /usr/local/MegaCli/MegaCli64 -pdlist -aALL | grep "Slot Number" | awk -F":" '{print $2}'))
    printf '{\n\t"data":[\n'
    for key in "${DISK[@]}"; do
        if [[ "$num" -ne "$((${#DISK[@]}-1))" ]]; then
            # Not the last entry: end the line with a comma
            printf '\t\t{"{#RAID_ID}":"%s"},\n' "$key"
            num=$((num+1))
        else
            # Last entry: no trailing comma
            printf '\t\t{"{#RAID_ID}":"%s"}\n' "$key"
        fi
    done
    printf '\t]\n}\n'
}
RAID_stats
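To make the expected discovery output easier to see without a RAID controller at hand, the following sketch builds the same JSON shape from slot numbers passed as arguments. The function name lld_json is hypothetical, and printing the separator before every entry except the first is just one possible way to handle the commas, not the method the script above uses.

```shell
#!/bin/bash
# Sketch only: build Zabbix low-level discovery JSON from slot numbers
# given as arguments. The comma is printed before every entry but the first.
lld_json() {
    first=1
    printf '{\n\t"data":[\n'
    for slot in "$@"; do
        [ "$first" -eq 1 ] || printf ',\n'
        printf '\t\t{"{#RAID_ID}":"%s"}' "$slot"
        first=0
    done
    printf '\n\t]\n}\n'
}

lld_json 0 1 2
```

Each {#RAID_ID} value in the output becomes one discovered disk, which the item prototypes in the template can then pass to keys such as raid_MEC[{#RAID_ID}].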
Key configuration:
#raid.conf
# Low-level discovery of disk slot numbers
UserParameter=raid_discover,bash /usr/local/zabbix/libexec/raid_id_discover.sh
# Adapter-wide counters: degraded virtual drives and failed disks
UserParameter=raid_degraded,sudo /usr/local/MegaCli/MegaCli64 -AdpAllInfo -aALL -NoLog | grep "Degraded" |awk '{print $NF}'
UserParameter=raid_failed_disks,sudo /usr/local/MegaCli/MegaCli64 -AdpAllInfo -aALL -NoLog | grep "Failed Disks" |awk '{print $NF}'
# Per-disk counters; $1 is the slot number. grep -w keeps slot 1 from also matching slots 10, 11, ...
UserParameter=raid_MEC[*],sudo /usr/local/MegaCli/MegaCli64 -PDList -aAll -NoLog | grep -w -A 8 "Slot Number: $1" | grep "Media Error Count" | awk '{print $NF}'
UserParameter=raid_OEC[*],sudo /usr/local/MegaCli/MegaCli64 -PDList -aAll -NoLog | grep -w -A 8 "Slot Number: $1" | grep "Other Error Count" | awk '{print $NF}'
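One detail worth checking in the per-slot keys is how the slot number is matched, since a plain substring match on "Slot Number: 1" also hits "Slot Number: 10". The sketch below shows the effect on made-up sample text (not real MegaCli output) and an alternative that compares the slot numerically in awk. The function name pd_counter is hypothetical, and this awk approach is a swapped-in technique for illustration, not the grep pipeline used in the keys above.

```shell
#!/bin/bash
# Sketch only: print one counter for exactly one slot.
# Usage: pd_counter SLOT "FIELD NAME" < pdlist-style-text
pd_counter() {
    awk -F':' -v slot="$1" -v field="$2" '
        BEGIN { cur = -1 }
        /Slot Number/ { cur = $2 + 0 }                      # remember the current slot
        index($0, field) && cur == slot { print $2 + 0 }    # numeric slot comparison
    '
}

# Hypothetical sample with slots 1 and 10.
sample='Slot Number: 1
Media Error Count: 0
Other Error Count: 2
Slot Number: 10
Media Error Count: 5
Other Error Count: 0'

# A plain substring match finds both slot 1 and slot 10:
printf '%s\n' "$sample" | grep -c "Slot Number: 1"          # prints 2

# The numeric comparison returns only the requested slot:
printf '%s\n' "$sample" | pd_counter 1 "Media Error Count"  # prints 0
```

An item that unexpectedly returns two lines for a low slot number is a typical symptom of this kind of prefix match.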
Permissions:
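chmod 755 /usr/local/zabbix/libexec/raid_id_discover.sh
chown zabbix:zabbix /usr/local/zabbix/libexec/raid_id_discover.sh
chown zabbix:zabbix /usr/local/zabbix/etc/zabbix_agentd.conf.d/raid.conf
# Let the zabbix user run MegaCli64 (the only command the keys call via sudo) without a password
echo "zabbix ALL=(root) NOPASSWD: /usr/local/MegaCli/MegaCli64" >> /etc/sudoers
# Comment out requiretty so sudo works without a terminal
sed -i 's/^Defaults.*.requiretty/#Defaults requiretty/' /etc/sudoers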
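Template import:
Notes:
To understand the template you first need to know the MegaCLI commands in some detail; plenty of tutorials can be found online (for example via Baidu);
The template I provide runs on zabbix-3.0 and may not be compatible with lower versions. Once you understand what each key means, you can build your own template;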