Setting up HA for arbitrary services
####################################

.. note:: This example presumes an Apache HTTP server running on the same
   machines as the Keepalive Daemon, but that is not a requirement.

* What the user experiences

.. graphviz::

   digraph web_service {
       rankdir="LR"
       node [shape="rectangle"]
       Request [shape="circle", color="red"]
       Landlord [color="blue", label="Landlord IP"]
       WEB [ color=blue, label="MP" ]
       Request -> Landlord -> WEB
   }

* Packet's-eye view

.. graphviz::

   digraph ha_packet {
       rankdir="LR"
       node [shape="rectangle"]
       Request [shape="circle", color="red"]
       subgraph cluster0 {
           label="landlord on 192.168.56.200"
           #KA [ label="192.168.56.200" ]
           subgraph cluster_ethyl {
               label="ethyl on 192.168.56.201"
               color=blue
               HAproxy_ethyl
               MPethyl
           }
           subgraph cluster_fred {
               label="fred on 192.168.56.202"
               color=blue
               HAproxy_fred
               MPfred
           }
       }
       #Request -> KA
       #KA -> HAproxy_ethyl
       #KA -> HAproxy_fred
       Request -> HAproxy_ethyl [ label="192.168.56.200 master" ]
       Request -> HAproxy_fred [ label="192.168.56.200 slave", style=dotted ]
       HAproxy_ethyl -> MPethyl
       HAproxy_ethyl -> MPfred
       HAproxy_fred -> MPethyl
       HAproxy_fred -> MPfred
   }

* Under the hood...

  * This configuration envisions only two machines: **fred** and **ethyl**.
  * The dashed lines are functions, not machines.

.. graphviz::

   digraph ha_demo {
       rankdir="LR"
       node [shape="rectangle"]
       Request [shape="circle", color="red"]
       Landlord [color="blue", label="Landlord IP"]
       HA [ style=dashed, color=blue, label="LB" ]
       WEB [ style=dashed, color=blue, label="MP" ]
       subgraph cluster0 {
           label="fred"
           HAethyl [ label="HA ethyl" ]
           WEBethyl [ label="MP" ]
           KAmaster [ label="VRRP\nmaster" ]
       }
       KA [ style=dashed, color=blue, label="VRRP" ]
       subgraph cluster1 {
           label="ethyl"
           HAfred [ label="HA fred" ]
           KAslave [ label="VRRP\nslave" ]
           WEBfred [ label="MP" ]
       }
       Request -> Landlord [ fontcolor=red, color=red ]
       Landlord -> HA
       HA -> HAethyl [ color=blue, style=dotted, dir=back ]
       HA -> HAfred [ color=blue, style=dotted, dir=back ]
       HA -> WEB [ color=red ]
       WEB -> WEBethyl [ color=red, label="LB or failover", dir=back ]
       WEB -> WEBfred [ color=red, label="LB or failover", dir=back ]
       KAmaster -> HAethyl [ color=orange, label="monitoring" ]
       KAslave -> HAfred [ color=orange, label="monitoring" ]
       KA -> KAmaster [ style=dashed, color=orange, label="VRRP", dir=back ]
       KA -> KAslave [ style=dashed, color=orange, label="VRRP", dir=back ]
   }

* When would we add an appliance?

  * ...When we use bulk encryption for a mixed-domain stack.

    * This may someday apply to us.

  * ...As part of a denial-of-service protection scheme.

    * This will not apply to us anytime soon.

  * ...If most of our application logic already lives on an appliance with
    only a database behind it.

    * This will likely never apply to us.

  * ...If CI/CD is no longer an objective for the organization.

    * This better not apply to us.

* Here is how it looks with an appliance.

.. list-table::
   :header-rows: 1

   * - Event
     - Resources involved with agile solution
     - Resources involved with rigid appliance
   * - Networking incident
     - *Network engineer* assesses network issues.
       Component fault isolation is obvious and immediate.
     - *Network engineer* assesses network issues.
       Component fault isolation is obvious and immediate.
   * - Application incident
     - *System administrator* validates available services in a central location.
       Component fault isolation is obvious and immediate.
     - *Network engineer* and *system administrator* simultaneously test possibly conflicting configuration.
       **Fault isolation is a separate task, delaying actual break-fix.**
   * - Configuration change
     - A central configuration pushes both the application and its high availability.
     - **Error-prone human coordination** must simultaneously push the same change to both service nodes and network devices.
   * - Code push
     - *Software developers* define services.
       *Sysadmins* deploy configuration files to a central location.
       *Network engineers* only assign IP addresses.
     - *Software developers* define services.
       *Sysadmins* and *network engineers* **figure out which part of the service rests on which hardware**. (See configuration change.)
       *Network engineers* assign IP addresses and **attempt to guess** whether services, network availability, or temporary resource constraints will define failover conditions.
       **AKA Hope-as-Strategy**

Preparing the system
====================

* Edit */etc/sysctl.conf*

.. code-block:: cfg
   :linenos:
   :caption: /etc/sysctl.conf

   net.ipv4.ip_nonlocal_bind=1
   net.ipv4.ip_forward=1

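* *ip_nonlocal_bind* lets HAProxy bind to the shared address even on the node that does not currently hold it; *ip_forward* lets the kernel forward packets. Once the settings have been loaded (the next step runs ``sysctl -p``), a quick sanity check might look like this:

.. code-block:: bash

   # Both settings should report "= 1" once /etc/sysctl.conf has been loaded
   sysctl net.ipv4.ip_nonlocal_bind
   sysctl net.ipv4.ip_forward
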
* Apply the sysctl changes and install the RPMs

.. code-block:: bash
   :linenos:

   # Apply the changes from the previous step
   sysctl -p

   # Install the packages
   yum install -y httpd keepalived haproxy

   # Enable the services we will use
   chkconfig keepalived on
   chkconfig haproxy on
   chkconfig httpd on

HA Proxy configuration
======================

* The HA Proxy configuration should be the same on each node.

.. code-block:: apache
   :linenos:
   :caption: /etc/haproxy/haproxy.cfg

   global
       log 127.0.0.1 local0
       log 127.0.0.1 local1 notice
       maxconn 4096
       user haproxy
       group haproxy
       daemon
       #debug
       #quiet

   defaults
       log global
       mode http
       option httplog
       option dontlognull
       retries 3
       option redispatch
       maxconn 2000
       timeout connect 5000
       timeout client 50000
       timeout server 50000

   listen stats 192.168.56.200:8989
       mode http
       stats enable
       stats uri /stats
       stats realm HAProxy\ Statistics
       stats auth admin:admin

   listen cluster37 0.0.0.0:80
       mode http
       balance roundrobin
       option httpclose
       option forwardfor
       cookie SERVERNAME insert indirect nocache
       server fred 192.168.56.201:8080 check
       server ethyl 192.168.56.202:8080 check

Keepalive Daemon configuration
==============================

* On each machine in the "cluster," we configure *keepalived*.
* Assume the shared IP is **192.168.56.200** and our two nodes are
  **192.168.56.201** and **192.168.56.202**.

.. code-block:: perl
   :linenos:
   :caption: Master /etc/keepalived/keepalived.conf

   global_defs {
       # Keepalived process identifier
       lvs_id landlord_fred
   }

   # Script used to check if HAProxy is running
   vrrp_script check_haproxy {
       script "killall -0 haproxy"
       interval 2
       weight 2
   }

   # Virtual interface
   # The priority determines which node takes over the virtual IP in a failover
   vrrp_instance router37 {
       state MASTER
       interface eth1
       virtual_router_id 37
       priority 101

       # The virtual IP address shared between the two load balancers
       virtual_ipaddress {
           192.168.56.200
       }

       track_script {
           check_haproxy
       }
   }

.. code-block:: perl
   :linenos:
   :caption: Slave /etc/keepalived/keepalived.conf

   global_defs {
       # Keepalived process identifier
       lvs_id landlord_ethyl
   }

   # Script used to check if HAProxy is running
   vrrp_script check_haproxy {
       script "killall -0 haproxy"
       interval 2
       weight 2
   }

   # Virtual interface
   # The priority determines which node takes over the virtual IP in a failover
   vrrp_instance router37 {
       state BACKUP
       interface eth1
       virtual_router_id 37
       priority 100

       # The virtual IP address shared between the two load balancers
       virtual_ipaddress {
           192.168.56.200
       }

       track_script {
           check_haproxy
       }
   }

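* Once both nodes are configured, the sketch below is one way to confirm the pieces fit together. It assumes the addresses used throughout this example, an Apache instance on each node answering on port 8080 (HAProxy owns port 80, so e.g. ``Listen 8080`` in the local *httpd* configuration), and the *eth1* interface named in *keepalived.conf*.

.. code-block:: bash

   # Check the HAProxy configuration syntax before (re)starting it
   haproxy -c -f /etc/haproxy/haproxy.cfg

   # Start the services on both nodes
   service haproxy start
   service keepalived start

   # The node currently holding the shared address lists it on eth1
   ip addr show eth1 | grep 192.168.56.200

   # Requests to the shared address should round-robin across the back ends
   curl -s http://192.168.56.200/
   curl -s http://192.168.56.200/

   # Simulate a failure: stop HAProxy on the master; check_haproxy stops
   # adding its weight to the master's priority and the slave takes over
   service haproxy stop
   ip addr show eth1 | grep 192.168.56.200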