
Resource Agents

A resource agent is a standardized interface for a cluster resource. It translates a standard set of operations into steps specific to the resource or application, and interprets their results as success or failure.

Resource Agents have been managed as a separate Linux-HA sub-project since their 1.0 release, which coincided with the Heartbeat 2.99 release. Previously, they were part of the then-monolithic Heartbeat project and had no collective name. The Linux-HA Resource Agents and the RHCS Resource Agents sub-projects were later merged. The joint upstream repository is now https://github.com/ClusterLabs/resource-agents

Pacemaker supports three types of Resource Agents: LSB Resource Agents, OCF Resource Agents, and legacy Heartbeat Resource Agents.

This page is about the OCF Resource Agents bundled in the resource-agents package (called cluster-agents on Debian-based distros), which you should install together with Heartbeat (or Corosync) and Pacemaker.

Supported Operations

Operations which a resource agent may perform on a resource instance include the following (a minimal skeleton showing how an agent dispatches on them appears after the list):

  • start: enable or start the given resource
  • stop: disable or stop the given resource
  • monitor: check whether the given resource is running (and/or doing useful work), return status as running or not running
  • validate-all: validate the resource's configuration
  • meta-data: return information about the resource agent itself (used by GUIs and other management utilities, and documentation tools)
  • some more, see OCF Resource Agents and the Pacemaker documentation for details.
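
To illustrate how these operations fit together, here is a minimal, hedged sketch of an OCF resource agent in shell. The agent name and the example parameter are purely illustrative; a real agent would also source the shared OCF shell functions shipped with the resource-agents package and emit complete meta-data.

#!/bin/sh
# Illustrative skeleton only - not a real agent.
# Standard OCF exit codes (values are part of the OCF convention):
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_ERR_UNIMPLEMENTED=3
OCF_NOT_RUNNING=7

# Resource parameters arrive as OCF_RESKEY_<name> environment variables,
# e.g. a hypothetical "binary" parameter would be $OCF_RESKEY_binary.

case "$1" in
meta-data)
    cat <<'EOF'
<?xml version="1.0"?>
<resource-agent name="Example">
  <!-- parameters and supported actions are described here -->
</resource-agent>
EOF
    exit $OCF_SUCCESS
    ;;
start)
    # Start the service, then wait until monitor would report success.
    exit $OCF_SUCCESS
    ;;
stop)
    # Stop the service; stopping an already stopped resource must succeed.
    exit $OCF_SUCCESS
    ;;
monitor)
    # Report OCF_SUCCESS if running, OCF_NOT_RUNNING if cleanly stopped.
    exit $OCF_NOT_RUNNING
    ;;
validate-all)
    # Check that the configured parameters are usable.
    exit $OCF_SUCCESS
    ;;
*)
    exit $OCF_ERR_UNIMPLEMENTED
    ;;
esac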

Implementation

Most resource agents are coded as shell scripts. This, however, is by no means a necessity – the defined interface is language agnostic.

They are synchronous in nature. That is, you start them, they complete some time later, and you are expected to wait for them to complete. Certain operations (notably start, stop and monitor) may take considerable time to complete; considerable time here means seconds to many minutes in some cases.
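
Because the interface is simply an executable that takes an operation name, an agent can also be exercised by hand. A hedged example, assuming the usual installation path and the IPaddr2 agent's "ip" parameter (adjust both for your system); resource parameters are passed as OCF_RESKEY_* environment variables:

export OCF_ROOT=/usr/lib/ocf
export OCF_RESKEY_ip=192.168.122.10      # example value for the "ip" parameter
/usr/lib/ocf/resource.d/heartbeat/IPaddr2 monitor
echo $?    # 0 = running, 7 = not running (per the OCF exit-code convention)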

Source Code Repository

Source code for the Resource Agents is maintained in the Git repository at https://github.com/ClusterLabs/resource-agents

Available Resource Agents (release 3.9.2, current as of 2011-10-27)

anything Manages an arbitrary service
This is a generic OCF RA to manage almost anything.
AoEtarget Manages ATA-over-Ethernet (AoE) target exports
This resource agent manages an ATA-over-Ethernet (AoE) target using vblade.
It exports any block device, or file, as an AoE target using the 
specified Ethernet device, shelf, and slot number.
apache Manages an Apache web server instance
This is the resource agent for the Apache web server.
This resource agent operates both version 1.x and version 2.x Apache
servers.

The start operation ends with a loop in which monitor is
repeatedly called to make sure that the server started and that
it is operational. Hence, if the monitor operation does not
succeed within the start operation timeout, the apache resource
will end with an error status.

The monitor operation by default loads the server status page
which depends on the mod_status module and the corresponding
configuration file (usually /etc/apache2/mod_status.conf).
Make sure that the server status page works and that the access
is allowed *only* from localhost (address 127.0.0.1).
See the statusurl and testregex attributes for more details.

See also http://httpd.apache.org/
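
A hedged crm shell sketch of an apache primitive (the config file path, URL and timeouts are examples only; statusurl is the attribute mentioned above):

crm configure primitive p_apache ocf:heartbeat:apache \
    params configfile="/etc/apache2/apache2.conf" \
           statusurl="http://127.0.0.1/server-status" \
    op start timeout="60s" \
    op monitor interval="20s" timeout="40s"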
AudibleAlarm Emits audible beeps at a configurable interval
Resource script for AudibleAlarm. It sets an audible alarm running by beeping 
at a set interval. 
ClusterMon Runs crm_mon in the background, recording the cluster status to an HTML file
This is a ClusterMon Resource Agent.
It outputs the current cluster status to an HTML file.
conntrackd This resource agent manages conntrackd
Master/Slave OCF Resource Agent for conntrackd
CTDB CTDB Resource Agent
This resource agent manages CTDB, allowing one to use Clustered Samba in a
Linux-HA/Pacemaker cluster.  You need a shared filesystem (e.g. OCFS2) on
which the CTDB lock will be stored.  Create /etc/ctdb/nodes containing a list
of private IP addresses of each node in the cluster, then configure this RA
as a clone.  To have CTDB manage Samba, set ctdb_manages_samba="yes".
Note that this option will be deprecated in future, in favour of configuring
a separate Samba resource.

For more information see http://linux-ha.org/wiki/CTDB_(resource_agent)
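
A hedged sketch of the setup described above (the node addresses are examples; the lock-file parameter name and the shared-filesystem path are assumptions to adapt to your setup):

# /etc/ctdb/nodes: one private address per cluster node
cat > /etc/ctdb/nodes <<'EOF'
10.0.0.1
10.0.0.2
EOF
crm configure primitive p_ctdb ocf:heartbeat:CTDB \
    params ctdb_recovery_lock="/shared/ctdb/ctdb.lock" \
           ctdb_manages_samba="yes" \
    op monitor interval="10s"
crm configure clone cl_ctdb p_ctdb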
db2 Resource Agent that manages an IBM DB2 LUW database in Standard role as a primitive or in HADR roles in a master/slave configuration. Multiple partitions are supported.
Resource Agent that manages an IBM DB2 LUW database in Standard role as a primitive or in HADR roles in a master/slave configuration. Multiple partitions are supported.

Standard mode:

An instance including all or selected databases is made highly available.
Configure each partition as a separate primitive resource.

HADR mode:

A single database in HADR configuration is made highly available by automating takeover operations.
Configure a master / slave resource with notifications enabled and an
additional monitoring operation with role "Master".

In case of HADR, be very deliberate in specifying intervals/timeouts. The detection of a failure, including the promote, must complete within HADR_PEER_WINDOW.

In addition to honoring requirements for crash recovery etc. for your specific database, use the following relations as guidance:

"monitor interval" < HADR_PEER_WINDOW - (approx. 30 sec)

"promote timeout" < HADR_PEER_WINDOW + (approx. 20 sec)

For further information and examples consult http://www.linux-ha.org/wiki/db2_(resource_agent)
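
As a worked example of the timing relations above: with HADR_PEER_WINDOW set to 300 seconds, the monitor interval should stay below roughly 270 seconds and the promote timeout below roughly 320 seconds. A hedged crm shell sketch under that assumption (the "instance" parameter name and all values are illustrative, not the agent's authoritative settings):

# HADR_PEER_WINDOW = 300 s  =>  monitor interval < 270 s, promote timeout < 320 s
crm configure primitive p_db2 ocf:heartbeat:db2 \
    params instance="db2inst1" \
    op monitor interval="60s" timeout="60s" \
    op monitor interval="45s" role="Master" timeout="60s" \
    op promote timeout="300s"
crm configure ms ms_db2 p_db2 meta notify="true"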
Delay Waits for a defined timespan
This script is a test resource for introducing delay.
drbd Manages a DRBD resource (deprecated)
Deprecation warning: This agent is deprecated and may be removed from
a future release. See the ocf:linbit:drbd resource agent for a
supported alternative. --
This resource agent manages a Distributed
Replicated Block Device (DRBD) object as a master/slave
resource. DRBD is a mechanism for replicating storage; please see the
documentation for setup details.
Dummy Example stateless resource agent
This is a Dummy Resource Agent. It does absolutely nothing except 
keep track of whether it's running or not.
Its purpose in life is for testing and to serve as a template for RA writers.

NB: Please pay attention to the timeouts specified in the actions
section below. They should be meaningful for the kind of resource
the agent manages. They should be the minimum advised timeouts,
but they shouldn't/cannot cover _all_ possible resource
instances. So, try to be neither overly generous nor too stingy,
but moderate. The minimum timeouts should never be below 10 seconds.
eDir88 Manages a Novell eDirectory directory server
Resource script for managing an eDirectory instance. Manages a single instance
of eDirectory as an HA resource. The "multiple instances" feature of
eDirectory was added in version 8.8. This script will not work for any
version of eDirectory prior to 8.8. This RA can be used to load multiple
eDirectory instances on the same host.

It is very strongly recommended to put eDir configuration files (as per the
eDir_config_file parameter) on local storage on each node. This is necessary for
this RA to be able to handle situations where the shared storage has become
unavailable. If the eDir configuration file is not available, this RA will fail,
and heartbeat will be unable to manage the resource. Side effects include
STONITH actions, unmanageable resources, etc...

Setting a high action timeout value is _very_ _strongly_ recommended. eDir
with IDM can take in excess of 10 minutes to start. If heartbeat times out
before eDir has had a chance to start properly, mayhem _WILL ENSUE_.

The LDAP module seems to be one of the very last to start. So this script will
take even longer to start on installations with IDM and LDAP if the monitoring
of IDM and/or LDAP is enabled, as the start command will wait for IDM and LDAP
to be available.
ethmonitor Monitors network interfaces
Monitor the vitality of a local network interface.

You may set up this RA as a clone resource to monitor the network interfaces on different nodes, with the same interface name.
This is not related to the IP address or the network on which an interface is configured.
You may use this RA to move resources away from a node which has a faulty interface, or to prevent moving resources to such a node.
This gives you independent control of the resources, without involving cluster intercommunication. But it requires your nodes to have more than one network interface.

The resource configuration requires a monitor operation, because the monitor does the main part of the work.
In addition to the resource configuration, you need to configure some location constraints, based on a CIB attribute value.
The name of the attribute value is configured in the 'name' option of this RA.

Example constraint configuration:
location loc_connected_node my_resource_grp \
rule $id="rule_loc_connected_node" -INF: ethmonitor eq 0

The ethmonitor works in 3 different modes to test the interface vitality:
1. call ip to see if the link status is up (if the link is down -> error)
2. call ip and watch the RX counter (if packets arrive within a certain time -> success)
3. call arping to check whether any of the IPs found in the local ARP cache answers an ARP REQUEST (one answer -> success)
4. if none of the above succeeds, return an error
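
A hedged sketch of the cloned resource that goes with the constraint example above ("interface" and "name" are the options referred to in the description; "name" is set here so the attribute matches the one used in the constraint):

crm configure primitive p_ethmon ocf:heartbeat:ethmonitor \
    params interface="eth0" name="ethmonitor" \
    op monitor interval="10s"
crm configure clone cl_ethmon p_ethmon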
Evmsd Controls clustered EVMS volume management (deprecated)
Deprecation warning: EVMS is no longer actively maintained and should not be used. This agent is deprecated and may be removed from a future release. --
This is an Evmsd Resource Agent.
EvmsSCC Manages EVMS Shared Cluster Containers (SCCs) (deprecated)
Deprecation warning: EVMS is no longer actively maintained and should not be used. This agent is deprecated and may be removed from a future release. --
Resource script for EVMS shared cluster container. It runs evms_activate on one node in the cluster.
exportfs Manages NFS exports
Exportfs uses the exportfs command to add/remove nfs exports.
It does NOT manage the nfs server daemon.
It depends on Linux specific NFS implementation details,
so is considered not portable to other platforms yet.
Filesystem Manages filesystem mounts
Resource script for Filesystem. It manages a Filesystem on a
shared storage medium. 

The standard monitor operation of depth 0 (also known as probe)
checks if the filesystem is mounted. If you want deeper tests,
set OCF_CHECK_LEVEL to one of the following values:

10: read first 16 blocks of the device (raw read)

This doesn't exercise the filesystem at all, but the device on
which the filesystem lives. This is a noop for non-block devices
such as NFS, SMBFS, or bind mounts.

20: test if a status file can be written and read

The status file must be writable by root. This is not always the
case with an NFS mount, as NFS exports usually have the
"root_squash" option set. In such a setup, you must either use
read-only monitoring (depth=10), export with "no_root_squash" on
your NFS server, or grant world write permissions on the
directory where the status file is to be placed.
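
A hedged crm shell sketch combining the default probe-style monitor with a deeper level-20 check (the device, mount point and filesystem type are examples only):

crm configure primitive p_fs ocf:heartbeat:Filesystem \
    params device="/dev/drbd0" directory="/srv/data" fstype="ext4" \
    op monitor interval="20s" timeout="40s" \
    op monitor interval="60s" timeout="60s" OCF_CHECK_LEVEL="20"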
fio fio IO load generator
fio is a generic I/O load generator. This RA allows start/stop of fio
instances to simulate load on a cluster without configuring complex
services.
ICP Manages an ICP Vortex clustered host drive
Resource script for ICP. It manages an ICP Vortex clustered host drive as an 
HA resource. 
ids Manages an Informix Dynamic Server (IDS) instance
OCF resource agent to manage an IBM Informix Dynamic Server (IDS) instance as a high-availability resource.
IPaddr Manages virtual IPv4 addresses (portable version)
This script manages IP alias addresses.
It can add an IP alias, or remove one.
IPaddr2 Manages virtual IPv4 addresses (Linux specific version)
This Linux-specific resource manages IP alias addresses.
It can add an IP alias, or remove one.
In addition, it can implement Cluster Alias IP functionality
if invoked as a clone resource.
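
A hedged sketch of a floating address managed by IPaddr2 (the address, netmask and interface are examples only):

crm configure primitive p_vip ocf:heartbeat:IPaddr2 \
    params ip="192.168.1.100" cidr_netmask="24" nic="eth0" \
    op monitor interval="10s"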
IPsrcaddr Manages the preferred source address for outgoing IP packets
Resource script for IPsrcaddr. It manages the preferred source address
modification. 
iscsi Manages a local iSCSI initiator and its connections to iSCSI targets
OCF Resource Agent for iSCSI. Add (start) or remove (stop) iSCSI
targets.
iSCSILogicalUnit Manages iSCSI Logical Units (LUs)
Manages iSCSI Logical Unit. An iSCSI Logical unit is a subdivision of 
an SCSI Target, exported via a daemon that speaks the iSCSI protocol.
iSCSITarget iSCSI target export agent
Manages iSCSI targets. An iSCSI target is a collection of SCSI Logical
Units (LUs) exported via a daemon that speaks the iSCSI protocol.
jboss Manages a JBoss application server instance
Resource script for Jboss. It manages a Jboss instance as an HA resource.
LinuxSCSI Enables and disables SCSI devices through the kernel SCSI hot-plug subsystem (deprecated)

Deprecation warning: This agent makes use of Linux SCSI hot-plug
functionality which has been superseded by SCSI reservations. It is
deprecated and may be removed from a future release. See the
scsi2reservation and sfex agents for alternatives. --
This is a resource agent for LinuxSCSI. It manages the availability of a
SCSI device from the point of view of the Linux kernel. It makes Linux
believe the device has gone away, and it can make it come back again.
LVM Controls the availability of an LVM Volume Group
Resource script for LVM. It manages a Linux Volume Manager (LVM) volume group
as an HA resource. 
lxc Manages LXC containers
Allows LXC containers to be managed by the cluster.
If the container is running "init" it will also perform an orderly shutdown.
It is 'assumed' that the 'init' system will do an orderly shutdown if presented with a 'kill -PWR' signal.
On a 'sysvinit' system this would require the container to have an inittab file containing "p0::powerfail:/sbin/init 0"
I have absolutely no idea how this is done with 'upstart' or 'systemd', YMMV if your container is using one of them.
MailTo Notifies recipients by email in the event of resource takeover
This is a resource agent for MailTo. It sends email to a sysadmin whenever 
a takeover occurs.
ManageRAID Manages RAID devices
Manages starting, stopping and monitoring of RAID devices which
are preconfigured in /etc/conf.d/HB-ManageRAID.
ManageVE Manages an OpenVZ Virtual Environment (VE)
This OCF compliant resource agent manages OpenVZ VEs and thus requires
a proper OpenVZ installation including a recent vzctl util.
mysql Manages a MySQL database instance
Resource script for MySQL. 
May manage a standalone MySQL database, a clone set with externally
managed replication, or a complete master/slave replication setup.
mysql-proxy Manages a MySQL Proxy daemon
This script manages MySQL Proxy as an OCF resource in a high-availability setup.
Tested with MySQL Proxy 0.7.0 on Debian 5.0.
named Manages a named server - (not yet available in 3.9.2)
Resource script for named (Bind) server. It manages named as an HA resource.
nfsserver Manages an NFS server
Nfsserver helps to manage the Linux nfs server as a failover-able resource in Linux-HA.
It depends on Linux specific NFS implementation details, so is considered not portable to other platforms yet.
nginx Manages an Nginx web/proxy server instance
This is the resource agent for the Nginx web/proxy server.
This resource agent does not monitor POP or IMAP servers, as
we don't know how to determine meaningful status for them.

The start operation ends with a loop in which monitor is
repeatedly called to make sure that the server started and that
it is operational. Hence, if the monitor operation does not
succeed within the start operation timeout, the nginx resource
will end with an error status.

The default monitor operation will verify that nginx is running.

The level 10 monitor operation by default will try and fetch the /nginx_status
page - which is commented out in sample nginx configurations.
Make sure that the /nginx_status page works and that the access
is restricted to localhost (address 127.0.0.1) plus whatever
places _outside the cluster_ you want to monitor the server from.
See the status10url and status10regex attributes for more details.

The level 20 monitor operation will perform a more complex set of tests
from a configuration file.

The level 30 monitor operation will run an external command to perform
an arbitrary monitoring operation.

oracle Manages an Oracle Database instance
Resource script for oracle. Manages an Oracle Database instance
as an HA resource.
oralsnr Manages an Oracle TNS listener
Resource script for Oracle Listener. It manages an
Oracle Listener instance as an HA resource.
pgsql Manages a PostgreSQL database instance
Resource script for PostgreSQL. It manages a PostgreSQL instance as an HA resource.
pingd Monitors connectivity to specific hosts or

IP addresses ("ping nodes") (deprecated)

Deprecation warning: This agent is deprecated and may be removed from
a future release. See the ocf:pacemaker:pingd resource agent for a
supported alternative. --
This is a pingd Resource Agent.
It records (in the CIB) the current number of ping nodes a node can connect to.
portblock Blocks and unblocks access to TCP and UDP ports
Resource script for portblock. It is used to temporarily block ports 
using iptables. In addition, it may allow for faster TCP reconnects
for clients on failover. Use that if there are long lived TCP
connections to an HA service. This feature is enabled by setting the
tickle_dir parameter and only in concert with action set to unblock.
Note that the tickle ACK function is new as of version 3.0.2 and
hasn't yet seen widespread use.
postfix Manages a highly available Postfix mail server instance
This script manages Postfix as an OCF resource in a high-availability setup.
proftpd OCF Resource Agent compliant FTP script.
This script manages Proftpd in an Active-Passive setup.
Pure-FTPd Manages a Pure-FTPd FTP server instance
This script manages Pure-FTPd in an Active-Passive setup.
Raid1 Manages a software RAID1 device on shared storage
Resource script for RAID1. It manages a software Raid1 device on a shared 
storage medium. 
Route Manages network routes
Enables and disables network routes.

Supports host and net routes, routes via a gateway address,
and routes using specific source addresses.

This resource agent is useful if a node's routing table
needs to be manipulated based on node role assignment.

Consider the following example use case:

-  One cluster node serves as an IPsec tunnel endpoint.

-  All other nodes use the IPsec tunnel to reach hosts
in a specific remote network.

Then, here is how you would implement this scheme making use
of the Route resource agent (a hedged crm shell sketch follows these steps):

-  Configure an ipsec LSB resource.

-  Configure a cloned Route OCF resource.

-  Create an order constraint to ensure 
that ipsec is started before Route.

-  Create a colocation constraint between the
ipsec and Route resources, to make sure no instance
of your cloned Route resource is started on the
tunnel endpoint itself.
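
A hedged crm shell sketch of this scheme (resource names, the route parameters and the scores are examples; the ipsec init script is assumed to be LSB-compliant):

crm configure primitive p_ipsec lsb:ipsec
crm configure primitive p_route ocf:heartbeat:Route \
    params destination="10.1.0.0/16" gateway="10.0.0.254"
crm configure clone cl_route p_route
crm configure order o_ipsec_before_route inf: p_ipsec cl_route
crm configure colocation col_route_avoid_endpoint -inf: cl_route p_ipsec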
rsyncd Manages an rsync daemon
This script manages an rsync daemon.
rsyslog rsyslog resource agent - (not yet available in 3.9.2)
This script manages an rsyslog instance as an HA resource.
SAPDatabase Manages any SAP database (based on Oracle, MaxDB, or DB2)
Resource script for SAP databases. It manages a SAP database of any type as an HA resource.
SAPInstance Manages a SAP instance as an HA resource.
Usually a SAP system consists of one database and one or more SAP instances (sometimes called application servers). One SAP instance is defined by having exactly one instance profile. The instance profiles can usually be found in the directory /sapmnt/SID/profile. Each instance must be configured as its own resource in the cluster configuration.
The resource agent supports the following SAP versions:
- SAP WebAS ABAP Release 6.20 - 7.30
- SAP WebAS Java Release 6.40 - 7.30
- SAP WebAS ABAP + Java Add-In Release 6.20 - 7.30 (Java is not monitored by the cluster in that case)
When using a SAP Kernel 6.40 please check and implement the actions from the section "Manual postprocessing" from SAP note 995116 (http://sdn.sap.com).

All operations of the SAPInstance resource agent are done by using the startup framework called SAP Management Console, or sapstartsrv, that was introduced with SAP kernel release 6.40. Find more information about the SAP Management Console in SAP note 1014480. Using this framework defines a clear interface for how the Heartbeat cluster sees the SAP system. The options for monitoring the SAP system are also much better than other methods such as just watching the ps command for running processes or pinging the application. sapstartsrv uses SOAP messages to request the status of running SAP processes. Therefore it can actually ask a process itself what its status is, independently of other problems that might exist at the same time.

sapstartsrv knows 4 status colours:
- GREEN   = everything is fine
- YELLOW  = something is wrong, but the service is still working
- RED     = the service does not work
- GRAY    = the service has not been started

The SAPInstance resource agent will interpret GREEN and YELLOW as OK. That means that minor problems will not be reported to the Heartbeat cluster. This prevents the cluster from doing an unwanted failover.
The statuses RED and GRAY are reported as NOT_RUNNING to the cluster. Depending on the status the cluster expects from the resource, it will do a restart, failover or just nothing.
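
A hedged crm shell sketch of a single SAP instance (the InstanceName value is purely illustrative and must match the corresponding instance profile; timeouts depend heavily on the installation):

crm configure primitive p_sap_ci ocf:heartbeat:SAPInstance \
    params InstanceName="PRD_DVEBMGS00_sapprd" \
    op monitor interval="120s" timeout="60s" \
    op start timeout="180s" \
    op stop timeout="240s"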
scsi2reservation scsi-2 reservation
The scsi-2-reserve resource agent is a placeholder for SCSI-2 reservation.
A healthy instance of the scsi-2-reserve resource indicates ownership of the specified SCSI device.
This resource agent depends on the scsi_reserve command from the scsires package, which is Linux specific.
SendArp Broadcasts unsolicited ARP announcements
This RA can be used _instead_ of the IPaddr2 or IPaddr RA to
send gratuitous ARP for an IP address on a given interface, 
without adding the address to that interface.  For example,
if for some reason you wanted to send gratuitous ARP for
addresses managed by IPaddr2 or IPaddr on an additional
interface.
ServeRAID Enables and disables shared ServeRAID merge groups
Resource script for ServeRAID. It enables/disables shared ServeRAID merge groups.
sfex Manages exclusive access to shared storage using Shared Disk File EXclusiveness (SF-EX)
Resource script for SF-EX. It manages a shared storage medium exclusively.
slapd Manages a Stand-alone LDAP Daemon (slapd) instance - (not yet available in 3.9.2)
Resource script for Stand-alone LDAP Daemon (slapd). It manages a slapd instance as an OCF resource.
SphinxSearchDaemon Manages the Sphinx search daemon.
This is a searchd Resource Agent. It manages the Sphinx Search Daemon.
Squid Manages a Squid proxy server instance
This is the resource agent for Squid.
It manages a Squid instance as an HA resource.
Stateful Example stateful resource agent
This is an example resource agent that implements two states.
symlink Manages a symbolic link
This is a resource agent that manages a symbolic link (symlink).

It is primarily intended to manage configuration files which should be
enabled or disabled based on where the resource is running, such as
cron job definitions and the like.
SysInfo Records various node attributes in the CIB
This is a SysInfo Resource Agent.
It records (in the CIB) various attributes of a node.
Sample Linux output:
arch:   i686
os:     Linux-2.4.26-gentoo-r14
free_swap:      1999
cpu_info:       Intel(R) Celeron(R) CPU 2.40GHz
cpu_speed:      4771.02
cpu_cores:      1
cpu_load:       0.00
ram_total:      513
ram_free:       117
root_free:      2.4

Sample Darwin output:
arch:   i386
os:     Darwin-8.6.2
cpu_info:       Intel Core Duo
cpu_speed:      2.16
cpu_cores:      2
cpu_load:       0.18
ram_total:      2016
ram_free:       787
root_free:      13

Units:
free_swap: Mb
ram_*:     Mb
root_free: Gb
cpu_speed (Linux): bogomips
cpu_speed (Darwin): Ghz

syslog-ng Syslog-ng resource agent
This script manages a syslog-ng instance as an HA resource.
tomcat Manages a Tomcat servlet environment instance
Resource script for Tomcat. It manages a Tomcat instance as a cluster resource.
VIPArip Manages a virtual IP address through RIP2
Virtual IP Address by RIP2 protocol.
This script manages an IP alias in a different subnet with quagga/ripd.
It can add an IP alias, or remove one.
VirtualDomain Manages virtual domains through the libvirt virtualization framework
Resource agent for a virtual domain (a.k.a. domU, virtual machine,
virtual environment etc., depending on context) managed by libvirtd.
vmware Manages VMWare Server 2.0 virtual machines
OCF compliant script to control vmware server 2.0 virtual machines.
WAS Manages a WebSphere Application Server instance
Resource script for WAS. It manages a Websphere Application Server (WAS) as 
an HA resource.
WAS6 Manages a WebSphere Application Server 6 instance
Resource script for WAS6. It manages a Websphere Application Server (WAS6) as
an HA resource.
WinPopup Sends an SMB notification message to selected hosts
Resource script for WinPopup. It sends a WinPopup message to a 
sysadmin's workstation whenever a takeover occurs.
Xen Manages Xen unprivileged domains (DomUs)
Resource Agent for the Xen Hypervisor.
Manages Xen virtual machine instances by mapping cluster resource
start and stop to Xen create and shutdown, respectively.

A note on names

We will try to extract the name from the config file (the xmfile
attribute). If you use a simple assignment statement, then you
should be fine. Otherwise, if there is some Python acrobatics
involved, such as dynamically assigning names depending on other
variables (we will try to detect this), then please set the
name attribute. You should also do that if there is any chance of
a pathological situation where a config file might be missing,
for example if it resides on shared storage. If all else fails, we
fall back to the instance id to preserve backward
compatibility.

Para-virtualized guests can also be migrated by enabling the
meta_attribute allow-migrate.
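
A hedged crm shell sketch of a live-migratable Xen guest (the file path, names and timeouts are examples; xmfile and allow-migrate are the attributes mentioned above):

crm configure primitive p_vm1 ocf:heartbeat:Xen \
    params xmfile="/etc/xen/vm1.cfg" \
    meta allow-migrate="true" \
    op monitor interval="30s" timeout="30s" \
    op start timeout="90s" \
    op stop timeout="120s"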

Xinetd Manages an Xinetd service
Resource script for Xinetd. It starts/stops services managed
by xinetd.

Note that the xinetd daemon itself must be running: we are not
going to start it or stop it ourselves.

Important: in case the services managed by the cluster are the
only ones enabled, you should specify the -stayalive option for
xinetd or it will exit on Heartbeat stop. Alternatively, you may
enable some internal service such as echo.