The purpose of the health check is to ensure that the support systems of the device are suitable for running LAVA tests. To do this, the health check is run periodically; if it fails for any device, that device is automatically taken offline. Reports are available that show these failures and track the general health of the lab:
http://validation.linaro.org/scheduler/reports
For any day on which at least one health check failed, there is also a table providing information on the failed checks:
http://validation.linaro.org/scheduler/reports/failures?start=-1&end=0&health_check=1
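The failures link above can also be built in a script, for example to bookmark or poll the report. This is a minimal sketch only: the helper name `failures_report_url` is hypothetical, and it assumes that `start` and `end` are day offsets relative to today, which is what the `start=-1&end=0` values in the link suggest but which is not confirmed here.

```python
from urllib.parse import urlencode

# Base URL taken from the report link above.
REPORTS_BASE = "http://validation.linaro.org/scheduler/reports"


def failures_report_url(start=-1, end=0):
    """Build the URL of the failed health check table (hypothetical helper).

    Assumption: start and end are day offsets, as the example link suggests.
    """
    query = urlencode({"start": start, "end": end, "health_check": 1})
    return f"{REPORTS_BASE}/failures?{query}"


print(failures_report_url())
```

With the default arguments the function reproduces the link shown above.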
Health checks are defined in the admin interface for each device type and run as the lava-health user.
The required entry is a JSON test file with the following change:
"health_check": true
In addition, it is recommended to use:
"logging_level": "DEBUG"
For example:
{
    "timeout": 900,
    "job_name": "lab-health-beaglebone-black",
    "logging_level": "DEBUG",
    "health_check": true,
    "actions": [
        {
            "command": "deploy_linaro_image",
            "parameters": {
                "image": "http://linaro-gateway/beaglebone/beaglebone_20130625-379.img.gz"
            },
            "metadata": {
                "ubuntu.distribution": "quantal",
                "ubuntu.build": "299",
                "rootfs.type": "nano",
                "ubuntu.name": "beaglebone-black"
            }
        },
        {
            "command": "lava_test_shell",
            "parameters": {
                "testdef_repos": [
                    {
                        "git-repo": "git://git.linaro.org/qa/test-definitions.git",
                        "testdef": "ubuntu/smoke-tests-basic.yaml"
                    }
                ],
                "timeout": 900
            }
        },
        {
            "command": "submit_results",
            "parameters": {
                "server": "http://localhost/RPC2/",
                "stream": "/anonymous/lab-health/"
            }
        }
    ]
}
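Before pasting a job like the one above into the admin interface, it can help to check it locally. The following is a rough sketch, not part of LAVA: the helper `check_health_job` is hypothetical, and it assumes that the required change is the `"health_check": true` entry, that the recommended setting is `"logging_level": "DEBUG"`, and that deploy actions have a command name starting with `deploy`, as in the example job.

```python
import json


def check_health_job(job_text):
    """Return a list of problems found in a health check job (hypothetical helper)."""
    job = json.loads(job_text)
    problems = []
    # Assumed to be the required change: the job is flagged as a health check.
    if job.get("health_check") is not True:
        problems.append('missing "health_check": true')
    # Assumed to be the recommended setting: verbose logging.
    if job.get("logging_level") != "DEBUG":
        problems.append('"logging_level": "DEBUG" is not set')
    # Assumption: deploy commands are named "deploy...", as in the example job.
    commands = [action.get("command", "") for action in job.get("actions", [])]
    if not any(command.startswith("deploy") for command in commands):
        problems.append("no deploy action found")
    return problems


example = """
{
    "health_check": true,
    "logging_level": "DEBUG",
    "actions": [{"command": "deploy_linaro_image"}]
}
"""
print(check_health_job(example))
```

An empty list means that none of the three checks found a problem; it does not guarantee that the job will run on a real device.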
At a minimum, the health check needs to check that the device will boot and deploy a test image. Multiple deploy tasks can be set if required, although each additional task makes the health check take longer.
Wherever a particular device type has common issues, a specific test for that behaviour should be added to the health check for that device type.
It is a mistake to think that lava_test_shell should not be run in health checks. The consequence of a health check failing is that devices of the specified type will be automatically taken offline, but this applies only to a job failure, not to a fail result from a single lava-test-case.
It is advisable to use a minimal set of sanity check test cases in all health checks, without making the health check unnecessarily long:
{
    "command": "lava_test_shell",
    "parameters": {
        "testdef_repos": [
            {
                "git-repo": "git://git.linaro.org/qa/test-definitions.git",
                "testdef": "ubuntu/smoke-tests-basic.yaml"
            }
        ],
        "timeout": 900
    }
}
These tests run simple Ubuntu test commands covering networking and basic functionality. It is common for linux-linaro-ubuntu-lsusb and/or linux-linaro-ubuntu-lsb_release to fail as individual test cases, but these failed test cases will not cause the health check to fail or cause devices to go offline.
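The distinction between a failed job and a failed test case can be illustrated with a small sketch. This is not LAVA code: the function `summarise_health_check`, the boolean `job_completed` and the `"pass"`/`"fail"` result strings are all hypothetical names chosen for the illustration.

```python
def summarise_health_check(job_completed, test_case_results):
    """Illustrate which outcome affects the device (hypothetical helper).

    Only the job outcome decides whether the health check passed;
    individual test case results are reported but otherwise ignored.
    """
    failed = sorted(
        name for name, result in test_case_results.items() if result == "fail"
    )
    return {"health_check_passed": job_completed, "failed_test_cases": failed}


results = {
    "linux-linaro-ubuntu-lsusb": "fail",
    "linux-linaro-ubuntu-lsb_release": "fail",
    "linux-linaro-ubuntu-ifconfig": "pass",  # hypothetical test case name
}
summary = summarise_health_check(job_completed=True, test_case_results=results)
print(summary["health_check_passed"])
print(summary["failed_test_cases"])
```

In this sketch the health check still passes even though two test cases failed, because the job itself completed.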
Using lava_test_shell in all health checks has several benefits:
See also Writing a LAVA test definition.