Setting up HA for arbitrary services
####################################

.. note:: This example presumes an Apache HTTP server running on the same
   machines as the Keepalive Daemon, but that is not a requirement.

* What the user experiences

.. graphviz::

   digraph web_service {
       rankdir="LR"
       node [shape="rectangle"]
       Request [shape="circle", color="red"]
       Landlord [color="blue", label="Landlord IP"]
       WEB [ color=blue, label="MP" ]
       Request -> Landlord -> WEB
   }

* Packet's-eye view

.. graphviz::

   digraph ha_packet {
       rankdir="LR"
       node [shape="rectangle"]
       Request [shape="circle", color="red"]
       subgraph cluster0 {
           label="landlord on 192.168.56.200"
           #KA [ label="192.168.56.200" ]
           subgraph cluster_ethyl {
               label="ethyl on 192.168.56.201"
               color=blue
               HAproxy_ethyl
               MPethyl
           }
           subgraph cluster_fred {
               label="fred on 192.168.56.202"
               color=blue
               HAproxy_fred
               MPfred
           }
       }
       #Request -> KA
       #KA -> HAproxy_ethyl
       #KA -> HAproxy_fred
       Request -> HAproxy_ethyl [ label="192.168.56.200 master" ]
       Request -> HAproxy_fred [ label="192.168.56.200 slave", style=dotted ]
       HAproxy_ethyl -> MPethyl
       HAproxy_ethyl -> MPfred
       HAproxy_fred -> MPethyl
       HAproxy_fred -> MPfred
   }

* Under the hood...

  * This configuration envisions only two machines: **fred** and **ethyl**.
  * The dashed lines are functions, not machines.

.. graphviz::

   digraph ha_demo {
       rankdir="LR"
       node [shape="rectangle"]
       Request [shape="circle", color="red"]
       Landlord [color="blue", label="Landlord IP"]
       HA [ style=dashed, color=blue, label="LB" ]
       WEB [ style=dashed, color=blue, label="MP" ]
       subgraph cluster0 {
           label="fred"
           HAethyl [ label="HA ethyl" ]
           WEBethyl [ label="MP" ]
           KAmaster [ label="VRRP\nmaster" ]
       }
       KA [ style=dashed, color=blue, label="VRRP" ]
       subgraph cluster1 {
           label="ethyl"
           HAfred [ label="HA fred" ]
           KAslave [ label="VRRP\nslave" ]
           WEBfred [ label="MP" ]
       }
       Request -> Landlord [ fontcolor=red, color=red ]
       Landlord -> HA
       HA -> HAethyl [ color=blue, style=dotted, dir=back ]
       HA -> HAfred [ color=blue, style=dotted, dir=back ]
       HA -> WEB [ color=red ]
       WEB -> WEBethyl [ color=red, label="LB or failover", dir=back ]
       WEB -> WEBfred [ color=red, label="LB or failover", dir=back ]
       KAmaster -> HAethyl [ color=orange, label="monitoring" ]
       KAslave -> HAfred [ color=orange, label="monitoring" ]
       KA -> KAmaster [ style=dashed, color=orange, label="VRRP", dir=back ]
       KA -> KAslave [ style=dashed, color=orange, label="VRRP", dir=back ]
   }

* When would we add an appliance?

  * ...When we use bulk encryption for a mixed-domain stack.

    * This may someday apply to us.

  * ...As part of a denial-of-service protection scheme.

    * This will not apply to us anytime soon.

  * ...If most of our application logic already lives on an appliance with
    only a database behind it.

    * This will likely never apply to us.

  * ...If CI/CD is no longer an objective for the organization.

    * This better not apply to us.

* Here is how it looks with an appliance.

.. list-table::
   :header-rows: 1

   * - Event
     - Resources involved with agile solution
     - Resources involved with rigid appliance
   * - Networking incident
     - *Network engineer* assesses network issues.
       Component fault isolation is obvious and immediate.
     - *Network engineer* assesses network issues.
       Component fault isolation is obvious and immediate.
   * - Application incident
     - *System administrator* validates available services in a central location.
       Component fault isolation is obvious and immediate.
     - *Network engineer* and *system administrator* simultaneously test possibly conflicting configuration.
       **Fault isolation is a separate task, delaying actual break-fix.**
   * - Configuration change
     - A central configuration pushes both the application and its high availability.
     - **Error-prone human coordination** must simultaneously push the same change to both service nodes and network devices.
   * - Code push
     - *Software developers* define services.
       *Sysadmins* deploy configuration files to a central location.
       *Network engineers* only assign IP addresses.
     - *Software developers* define services.
       *Sysadmins* and *network engineers* **figure out which part of the service rests on which hardware**. (See configuration change.)
       *Network engineers* assign IP addresses and **attempt to guess** whether services, network availability, or temporary resource constraints will define failover conditions.
       **AKA Hope-as-Strategy**

Preparing the system
====================

* Edit */etc/sysctl.conf*

.. code-block:: cfg
   :linenos:
   :caption: /etc/sysctl.conf

   net.ipv4.ip_nonlocal_bind=1
   net.ipv4.ip_forward=1

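* *ip_nonlocal_bind* lets HAProxy bind to the shared address even on the node that does not currently hold it; *ip_forward* lets the kernel forward packets. Once the settings have been loaded (the next step runs ``sysctl -p``), a quick sanity check might look like this:

.. code-block:: bash

   # Both settings should report "= 1" once /etc/sysctl.conf has been loaded
   sysctl net.ipv4.ip_nonlocal_bind
   sysctl net.ipv4.ip_forward
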
* Apply the sysctl changes and install the RPMs

.. code-block:: bash
   :linenos:

   # Apply the changes from the previous step
   sysctl -p

   # Install the packages
   yum install -y httpd keepalived haproxy

   # Enable the services we will use
   chkconfig keepalived on
   chkconfig haproxy on
   chkconfig httpd on

HA Proxy configuration
======================

* The HA Proxy configuration should be the same on each node.

.. code-block:: apache
   :linenos:
   :caption: /etc/haproxy/haproxy.cfg

   global
       log 127.0.0.1 local0
       log 127.0.0.1 local1 notice
       maxconn 4096
       user haproxy
       group haproxy
       daemon
       #debug
       #quiet

   defaults
       log global
       mode http
       option httplog
       option dontlognull
       retries 3
       option redispatch
       maxconn 2000
       timeout connect 5000
       timeout client 50000
       timeout server 50000

   listen stats 192.168.56.200:8989
       mode http
       stats enable
       stats uri /stats
       stats realm HAProxy\ Statistics
       stats auth admin:admin

   listen cluster37 0.0.0.0:80
       mode http
       balance roundrobin
       option httpclose
       option forwardfor
       cookie SERVERNAME insert indirect nocache
       server fred 192.168.56.201:8080 check
       server ethyl 192.168.56.202:8080 check

Keepalive Daemon configuration
==============================

* On each machine in the "cluster," we configure *keepalived*.
* Assume the shared IP is **192.168.56.200** and our two nodes are
  **192.168.56.201** and **192.168.56.202**.

.. code-block:: perl
   :linenos:
   :caption: Master /etc/keepalived/keepalived.conf

   global_defs {
       # Keepalived process identifier
       lvs_id landlord_fred
   }

   # Script used to check if HAProxy is running
   vrrp_script check_haproxy {
       script "killall -0 haproxy"
       interval 2
       weight 2
   }

   # Virtual interface
   # The priority determines which node takes over the virtual IP in a failover
   vrrp_instance router37 {
       state MASTER
       interface eth1
       virtual_router_id 37
       priority 101

       # The virtual IP address shared between the two load balancers
       virtual_ipaddress {
           192.168.56.200
       }

       track_script {
           check_haproxy
       }
   }

.. code-block:: perl
   :linenos:
   :caption: Slave /etc/keepalived/keepalived.conf

   global_defs {
       # Keepalived process identifier
       lvs_id landlord_ethyl
   }

   # Script used to check if HAProxy is running
   vrrp_script check_haproxy {
       script "killall -0 haproxy"
       interval 2
       weight 2
   }

   # Virtual interface
   # The priority determines which node takes over the virtual IP in a failover
   vrrp_instance router37 {
       state BACKUP
       interface eth1
       virtual_router_id 37
       priority 100

       # The virtual IP address shared between the two load balancers
       virtual_ipaddress {
           192.168.56.200
       }

       track_script {
           check_haproxy
       }
   }

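* Once both nodes are configured, the sketch below is one way to confirm the pieces fit together. It assumes the addresses used throughout this example, an Apache instance on each node answering on port 8080 (HAProxy owns port 80, so e.g. ``Listen 8080`` in the local *httpd* configuration), and the *eth1* interface named in *keepalived.conf*.

.. code-block:: bash

   # Check the HAProxy configuration syntax before (re)starting it
   haproxy -c -f /etc/haproxy/haproxy.cfg

   # Start the services on both nodes
   service haproxy start
   service keepalived start

   # The node currently holding the shared address lists it on eth1
   ip addr show eth1 | grep 192.168.56.200

   # Requests to the shared address should round-robin across the back ends
   curl -s http://192.168.56.200/
   curl -s http://192.168.56.200/

   # Simulate a failure: stop HAProxy on the master; check_haproxy stops
   # adding its weight to the master's priority and the slave takes over
   service haproxy stop
   ip addr show eth1 | grep 192.168.56.200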