HOWTO Setup lm sensors

Revision as of 18:32, 11 August 2016 by Gordp (talk | contribs)

What We Want to Accomplish

We will use the lm_sensors package to monitor and log our CPU and motherboard temperatures. This package also lets us see various voltages, and even some fan speeds. lm_sensors often serves as the hardware interface for user-space utilities, but in our case (a server) there won't be any user-space utilities. Instead, we'll run sensord, the lm_sensors daemon, and have it write a log entry every 5 minutes.

Preparing the Kernel

First, we have to compile our kernel with support for sensors:

Device Drivers  --->
    <M> I2C support  --->
        <M> I2C device interface
            I2C Hardware Bus support --->  
            # Activate everything
    <M> Hardware Monitoring Support --->
        # Activate everything
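
Before rebuilding, it can help to confirm which of these options are already enabled. The following is a minimal sketch, not part of the original procedure: the helper name check_kernel_opts is hypothetical, and the CONFIG_ symbols in the example are assumed to be the ones behind the menu entries above (check them against your kernel version).

```shell
#!/bin/sh
# Report whether each listed kernel option is built in (=y) or modular (=m)
# in a kernel .config file. Returns non-zero if any option is missing.
# check_kernel_opts is a hypothetical helper name used for illustration.
check_kernel_opts() {
    config_file=$1
    shift
    missing=0
    for opt in "$@"; do
        # An enabled option appears as CONFIG_FOO=y or CONFIG_FOO=m.
        if grep -Eq "^${opt}=(y|m)\$" "$config_file"; then
            echo "ok      $opt"
        else
            echo "MISSING $opt"
            missing=1
        fi
    done
    return "$missing"
}

# Example (the path and the symbol list are assumptions; adjust for your setup):
# check_kernel_opts /usr/src/linux/.config \
#     CONFIG_I2C CONFIG_I2C_CHARDEV CONFIG_HWMON \
#     CONFIG_SENSORS_CORETEMP CONFIG_SENSORS_ATK0110
```

The same check can be reused later, once you know which modules you need, to confirm that trimming the kernel did not remove them.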

Adding the lm_sensors Package

The sensord USE flag must be enabled. An unfortunate side effect is that the package x11-libs/cairo is pulled in; for cairo, verify that the X USE flag is disabled (-X) and add the svg USE flag:

hostname # emacs -nw /etc/portage/package.use
sys-apps/lm_sensors	    sensord
x11-libs/cairo		    svg

Configuring lm_sensors

Try running sensors-detect, although many of our PaX/grsecurity-enabled kernels will prohibit the necessary probes. For many of our recent servers, however, you can manually load the (Intel) CPU temperature-sensor module and the (ASUS) motherboard sensor module; here are a couple of typical examples:

hostname # modprobe coretemp
hostname # modprobe asus_atk0110
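
To confirm that the modules actually loaded, you can look them up in /proc/modules. This is a small sketch under the assumption that the first field of each line in /proc/modules is the module name; the function name module_loaded is hypothetical, and its optional second argument exists only so the check can be tried against a saved listing.

```shell
#!/bin/sh
# Succeed if the named module appears in a /proc/modules-style listing.
# $1 = module name, $2 = listing to read (defaults to /proc/modules).
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' \
        "${2:-/proc/modules}"
}

# Example:
# for mod in coretemp asus_atk0110; do
#     if module_loaded "$mod"; then
#         echo "$mod: loaded"
#     else
#         echo "$mod: NOT loaded"
#     fi
# done
```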

You should be able to run the sensors utility, and observe useful output:

spitfire # sensors
coretemp-isa-0000
Adapter: ISA adapter
Core 0:      +43.0°C  (high = +82.0°C, crit = +100.0°C)

coretemp-isa-0001
Adapter: ISA adapter
Core 1:      +39.0°C  (high = +82.0°C, crit = +100.0°C)

coretemp-isa-0002
Adapter: ISA adapter
Core 2:      +42.0°C  (high = +82.0°C, crit = +100.0°C)

coretemp-isa-0003
Adapter: ISA adapter
Core 3:      +36.0°C  (high = +82.0°C, crit = +100.0°C)

atk0110-acpi-0
Adapter: ACPI interface
Vcore Voltage:      +1.31 V  (min =  +0.85 V, max =  +1.60 V)
+12V Voltage:      +11.78 V  (min = +10.20 V, max = +13.80 V)
+5V Voltage:        +4.89 V  (min =  +4.50 V, max =  +5.50 V)
+3.3V Voltage:      +3.26 V  (min =  +2.97 V, max =  +3.63 V)
CPU_FAN FAN Speed: 1934 RPM  (min =  800 RPM)
CHA_FAN1 FAN Speed:   0 RPM  (min =  800 RPM)
PWR FAN FAN Speed:    0 RPM  (min =  800 RPM)
CHA_FAN2 FAN Speed:   0 RPM  (min =  800 RPM)
CPU Temperature:    +24.0°C  (high = +60.0°C, crit = +95.0°C)
MB Temperature:     +29.0°C  (high = +45.0°C, crit = +95.0°C)
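
If you only want to know which sensors are running warm, the sensors output can be filtered. The sketch below assumes the output layout shown above (a label, a colon, then a signed value followed by the degree sign and C); the function name temps_above and the threshold of 40 in the example are illustrative choices, not something the package provides.

```shell
#!/bin/sh
# Read `sensors` output on stdin and print "label value" for every
# temperature reading above the given limit (in degrees C).
# Voltage and fan lines are skipped because they do not end their value in "C".
temps_above() {
    awk -F: -v limit="$1" '
        # Temperature lines look like "Core 0:   +43.0°C  (high = ...)".
        $2 ~ /^ *[+-][0-9.]+[^ ]*C/ {
            value = $2 + 0      # numeric conversion reads the leading "+43.0"
            if (value > limit)
                printf "%s %.1f\n", $1, value
        }'
}

# Example:
# sensors | temps_above 40
```

Run against the output above with a limit of 40, this would list Core 0 and Core 2 only.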

Once you've determined the appropriate modules, you can compile the unneeded ones out of your kernel. You'll also have to ensure that the needed modules get loaded at boot time:

hostname # emacs -nw /etc/modules.autoload.d/kernel-2.6
# for sensord
coretemp
asus_atk0110

Tell the daemon to make log entries every 5 minutes (300 seconds) by setting SENSORD_OPTIONS as shown:

hostname # emacs -nw /etc/conf.d/sensord
# Extra options to pass to the sensord daemon,
# see sensord(8) for more information
SENSORD_OPTIONS="--log-interval 300"

OPTIONAL
If you want to use user-space utilities to monitor your sensors, you'll have to enable and use the lm_sensors service. Add the needed modules to the /etc/conf.d/lm_sensors file:

hostname # emacs -nw /etc/conf.d/lm_sensors
# The format of this file is a shell script that simply defines variables:
# HWMON_MODULES for hardware monitoring driver modules, and optionally
# BUS_MODULES for any required bus driver module (for example for I2C or SPI).

# Load modules at startup
LOADMODULES=yes

# Initialize sensors at startup
INITSENSORS=yes

HWMON_MODULES="coretemp asus_atk0110"

# For compatibility reasons, modules are also listed individually as variables
#    MODULE_0, MODULE_1, MODULE_2, etc.
# Please note that the numbers in MODULE_X must start at 0 and increase in
# steps of 1. Any number that is missing will make the init script skip the
# rest of the modules. Use MODULE_X_ARGS for arguments.
#
# You should use BUS_MODULES and HWMON_MODULES instead if possible.

MODULE_0=coretemp
MODULE_1=asus_atk0110

Drive Temperatures

The following example is suited for use with our LSI 9280-16i4e MegaRAID controller, but the ideas can easily be extended to other controllers (such as 3Ware).
On a server, if you have sensord running and are running logwatch, by this point you'll normally see the CPU and CPU-core temperatures in the logwatch report, but also the following:

--------------------- lm_sensors output Begin ------------------------ 

nc: unable to connect to address 127.0.0.1, service 7634

This indicates that logwatch is trying to query the drive temperatures with hddtemp. In daemon mode, hddtemp listens on port 7634, but it can only query drives directly; it cannot query drives behind our RAID controller. However, either smartctl or megacli can query drives behind a controller.
Edit /usr/share/logwatch/scripts/services/zz-lm_sensors and adjust $query_hddtemp (around line 30):

hostname # emacs -nw /usr/share/logwatch/scripts/services/zz-lm_sensors
#my $query_hddtemp  = $ENV{'query_hddtemp'}  || '/usr/bin/nc 127.0.0.1 7634';
my $query_hddtemp  = $ENV{'query_hddtemp'}  || 'megacli -PDList -aALL | grep Temperature';
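
The grep above yields bare temperature lines without saying which drive each belongs to. If you want to check the readings by hand, a rough sketch like the following pairs each temperature with its slot. It assumes that megacli -PDList prints a "Slot Number: N" line before each "Drive Temperature :NNC (...)" line; verify that against your own controller's output. The function name drive_temps is hypothetical.

```shell
#!/bin/sh
# Read `megacli -PDList -aALL` output on stdin and print one
# "slot N: T C" line per drive.
# Assumed input lines: "Slot Number: 3" and "Drive Temperature :30C (86.00 F)".
# A drive that reports no numeric temperature will show up as 0 C.
drive_temps() {
    awk -F: '
        /^Slot Number/       { slot = $2 + 0 }
        /^Drive Temperature/ { printf "slot %d: %d C\n", slot, $2 + 0 }'
}

# Example:
# megacli -PDList -aALL | drive_temps
```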

Logging

For logging, we want to keep our regular /var/log/messages free of clutter; by default, our sensor data would go to /var/log/messages :-( So, let's change this behaviour and store our sensor data in its own /var/log/sensord file (note: you can break the three statements apart and place them into the related destination / filter / log portions of the file, if you wish):

hostname # emacs -nw /etc/syslog-ng/syslog-ng.conf
# let's put our sensor data into its own log-file
destination sensord { file("/var/log/sensord"); };
filter f_sensord { program(sensord); };
log { source(src); filter(f_sensord); destination(sensord); flags(final); };

Make it take effect:

hostname # /etc/init.d/syslog-ng restart

We'll need to rotate our sensord log-file, so let's create a logrotate file for sensord:

hostname # emacs -nw /etc/logrotate.d/sensord
/var/log/sensord {
    weekly
    rotate 4
    missingok
    compress
    postrotate
      /etc/init.d/syslog-ng reload > /dev/null 2>&1 || true
    endscript
}

Go!

hostname # /etc/init.d/sensord start

Verify that the log-file has just been created, and holds sensible information:

hostname # cat /var/log/sensord
Aug 23 10:07:08 hurricane sensord: sensord started
Aug 23 10:07:08 hurricane sensord: Chip: coretemp-isa-0000
Aug 23 10:07:08 hurricane sensord: Adapter: ISA adapter
Aug 23 10:07:08 hurricane sensord:   Core 0: 36.0 C
Aug 23 10:07:08 hurricane sensord: Chip: coretemp-isa-0001
Aug 23 10:07:08 hurricane sensord: Adapter: ISA adapter
Aug 23 10:07:08 hurricane sensord:   Core 1: 37.0 C
Aug 23 10:07:08 hurricane sensord: Chip: coretemp-isa-0002
Aug 23 10:07:08 hurricane sensord: Adapter: ISA adapter
Aug 23 10:07:08 hurricane sensord:   Core 2: 37.0 C
Aug 23 10:07:08 hurricane sensord: Chip: coretemp-isa-0003
Aug 23 10:07:08 hurricane sensord: Adapter: ISA adapter
Aug 23 10:07:08 hurricane sensord:   Core 3: 35.0 C
Aug 23 10:07:08 hurricane sensord: Chip: atk0110-acpi-0
Aug 23 10:07:08 hurricane sensord: Adapter: ACPI interface
Aug 23 10:07:08 hurricane sensord:   Vcore Voltage: +1.31 V (min = +0.85 V, max = +1.60 V)
Aug 23 10:07:08 hurricane sensord:   +12V Voltage: +11.84 V (min = +10.20 V, max = +13.80 V)
Aug 23 10:07:08 hurricane sensord:   +5V Voltage: +4.87 V (min = +4.50 V, max = +5.50 V)
Aug 23 10:07:08 hurricane sensord:   +3.3V Voltage: +3.26 V (min = +2.97 V, max = +3.63 V)
Aug 23 10:07:08 hurricane sensord:   CPU_FAN FAN Speed: 1985 RPM (min = 800 RPM)
Aug 23 10:07:08 hurricane sensord:   CHA_FAN1 FAN Speed: 0 RPM (min = 800 RPM)
Aug 23 10:07:08 hurricane sensord:   PWR FAN FAN Speed: 0 RPM (min = 800 RPM)
Aug 23 10:07:08 hurricane sensord:   CHA_FAN2 FAN Speed: 0 RPM (min = 800 RPM)
Aug 23 10:07:08 hurricane sensord:   CPU Temperature: 24.0 C
Aug 23 10:07:08 hurricane sensord:   MB Temperature: 27.0 C
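
Once a few intervals have been logged, a quick summary is often more useful than reading the raw file. The following is a minimal sketch that assumes the log format shown above, where temperature readings end in a number followed by " C"; the function name sensord_max_temps is hypothetical.

```shell
#!/bin/sh
# Print the highest logged temperature per sensor label from a sensord log.
# Reads the files given as arguments, or stdin if there are none.
sensord_max_temps() {
    awk '
        # Temperature readings end in "<number> C", e.g. "Core 0: 36.0 C".
        / sensord: +.*: [0-9.+-]+ C$/ {
            line = $0
            sub(/^.* sensord: +/, "", line)   # drop timestamp, host and tag
            split(line, parts, ": ")          # parts[1] = label, parts[2] = "36.0 C"
            value = parts[2] + 0
            if (!(parts[1] in max) || value > max[parts[1]])
                max[parts[1]] = value
        }
        END {
            for (label in max)
                printf "%s %.1f\n", label, max[label]
        }' "$@" | sort
}

# Example:
# sensord_max_temps /var/log/sensord
```

Voltage and fan lines are ignored because they do not end in " C"; the output is sorted by label so that successive runs are easy to compare.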

Make sensord start by default at boot:

hostname # rc-update add sensord default

OPTIONAL
Again, this is only something you'll do if you're planning to use some user-space monitoring tools (which we typically DO NOT on our servers). After you configure your kernel as above, you'll have to reboot before lm_sensors will run (note: sensord does not need a reboot; only lm_sensors is fussy). Before a reboot, it will give the error:

* Loading lm_sensors modules...
*   Loading i2c-core ...
*     Could not load i2c-core! 

So, reboot :-)

hostname # /etc/init.d/lm_sensors start
hostname # rc-update add lm_sensors default