pubfactory_cookbook_prime, version 2.0.117

Recipes and roles for PubFactory

Berkshelf:
cookbook 'pubfactory_cookbook_prime', '= 2.0.117'

Policyfile:
cookbook 'pubfactory_cookbook_prime', '= 2.0.117', :supermarket

Knife:
knife supermarket install pubfactory_cookbook_prime
knife supermarket download pubfactory_cookbook_prime

DO NOT COMMIT DIRECTLY TO GITHUB. USE DELIVERY VIA CHEF AUTOMATE

pubfactory_cookbook_prime

The Chef recipes and roles used to set up PubFactory servers.

Debugging Live Nodes

Start by looking at /var/log/chef/client.log to see if there are any errors. If you want to force an update, run sudo chef-client and watch the logs.
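
For example, a quick manual check on a node might look like this (same paths as above):

    sudo chef-client                          # force a converge in the foreground
    sudo tail -f /var/log/chef/client.log     # follow the scheduled client's log for errors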


Recipes

  • pubfactory_cookbook_prime::web - Recipe for web app nodes
  • pubfactory_cookbook_prime::loader - Loader and CMS recipe
  • pubfactory_cookbook_prime::amx - AMX recipe
  • pubfactory_cookbook_prime::base - base cookbook run by all

TODO RYAN there are more recipes to list.

There is also a pubfactory_cookbook_prime::default recipe, which is empty.

The base recipe runs for all three of the primary recipes and sets up Subversion and our /proj directories.


Data Bags

The pubfactory_cookbook_prime cookbook relies on a single data bag being set up per environment (prod, sandbox, dev, etc.).

All of the data bags must be in the /pubfactory directory in the data bags repo.

See example_data_bag_.json for an example of how the data bag should be configured, and/or read on for an explanation of what should be in the data bag.

Data Bag Configuration

For each new group of applications, a new data bag is needed. The data bag is the base configuration that controls the following:

  1. SVN repo URL
    • This represents the location that houses the configuration files for this environment, see the Config File section below
  2. NFS Mount Points
    • The following must be added to your data bag, specifying the client mount points for /proj/env, /proj/staticfiles, and /proj/source (a sketch of the resulting mounts follows this list):

      "nfs": {
          "env": "dg-db-02:/nfs/env/prod",
          "staticfiles": "dg-db-02:/nfs/staticfiles",
          "source": "dg-db-02:/nfs/source"
      }

    • Please note that these must also be exported on the NFS server side in /etc/exports (currently this is not Chefized).
  3. FTP Users
  4. Host file editing - Node host files can be maintained via the databags allowing for easy maintenance across all nodes.
    • example:

      "hosts_file": {
          "custom_entries": [
              { "ip": "10.33.33.100", "host": "chefzero", "aliases": [""] },
              { "ip": "10.33.33.101", "host": "web", "aliases": [""] },
              { "ip": "10.33.33.102", "host": "amx", "aliases": [""] },
              { "ip": "10.33.33.103", "host": "loader", "aliases": [""] }
          ]
      }
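
Returning to the NFS mount points in item 2 above, those data bag values translate into mounts roughly like the following (illustrative only; the recipe manages these mounts itself, so there is normally no need to run them by hand):

    # illustrative equivalent of the "nfs" data bag entries above
    mount -t nfs dg-db-02:/nfs/env/prod    /proj/env
    mount -t nfs dg-db-02:/nfs/staticfiles /proj/staticfiles
    mount -t nfs dg-db-02:/nfs/source      /proj/source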

FTP

To configure FTP on a node, you must add users to your data bag; we no longer give all users on the machine access:

{
...
    "ftp": {
        "users": [
              {
                  "name": "bob",
                  "password_hash": "z32e/rE43l8w.k",
                  "local_root": "/proj/source"
              },
              {
                  "name": "ifactory",
                  // To get the password hash, type `htpasswd -cd /etc/vsftpd/passwd username; more /etc/vsftpd/passwd`
                  // NOTE THAT THE -d option is vital see http://superuser.com/questions/386531/why-wont-vsftpd-let-me-log-in-with-a-virtual-user-account
                  "password_hash": "xZZ2e/rENDl8w",
                  "local_root": "/proj/source"
              }
         ]
    }
...
}

How/Why Our FTP Works

For each user listed in the data bag, we create a new directory at /home/vsftpd/username. vsftpd requires that a user not have write access to their root directory, so the permissions of this folder are set to 555. We then mount the local_root onto a source subfolder within that root directory (/home/vsftpd/username/source). This means that when a user logs in, they start in /home/vsftpd/username and must cd into the source directory before being able to add, edit, or delete any files.
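
As a rough illustration of that layout (the recipe does all of this itself; the bind mount and exact modes below are assumptions based on the description above, not copied from the recipe):

    # per-user vsftpd chroot layout for the "bob" user from the data bag example
    mkdir -p /home/vsftpd/bob/source                     # user's FTP root plus the source subfolder
    chmod 555 /home/vsftpd/bob                           # vsftpd refuses a writable root
    mount --bind /proj/source /home/vsftpd/bob/source    # local_root from the data bag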

Additional resources as to why this is so complicated:

  • http://www.sigerr.org/linux/setup-vsftpd-custom-multiple-directories-users-accounts-ubuntu-step-by-step/
  • http://radu.cotescu.com/vsftpd-and-symbolic-links/
  • http://superuser.com/questions/386531/why-wont-vsftpd-let-me-log-in-with-a-virtual-user-account


Node Configuration

Heap Memory / Garbage Collection

This cookbook attempts to set most JVM options programmatically, with the exception of heap Xmx settings. Each Tomcat instance generally needs its Xmx heap set manually.

There is some legacy logic that computes heap space automatically, left over from when we ran one Tomcat instance per VM. This logic is not recommended for new multi-tenant nodes.

DNS Servers

This is configurable via the sbo_network_interfaces recipe. Here is a sample:

"normal": {
    "sbo_network_interfaces": {
        "interfaces": [
            {
                "name": "eth0",
                "address": "10.16.32.22",
                "netmask": "255.255.252.0",
                "gateway": "10.16.32.1",
                "dns-nameservers": [
                    "10.16.24.58",
                    "10.16.24.59"
                ]
            },
            ...

Application Configuration

Use the pf-config git project for config file management.

Logtoaster Configuration

It is intended that there will be a "jobs" node for tasks such as logtoaster, but it can be configured for any node; a single node will process the logs of the configured instances. The script looks in the /proj/env/logs/<node_name>/<instance_name> directory and, if a requests.<date>.txt.gz exists, it will copy, ungzip, and toast the logs. See [logtoaster-execute-<instance_name>.sh](/templates/default/logtoaster-execute.sh). A rough sketch of what the per-instance script does follows the example below.

"pubfactory_cookbook_prime": {
      "instances": {
          "eep-web-qa": {
            "logtoaster": {
              "cron": "true",
              "log_dirs": [
                "/proj/env/logs/ams-psy-01/tomcat7",
                "/proj/env/logs/ams-psy-02/tomcat7",
                "/proj/env/logs/web-10/dg-web10-1",
                "/proj/env/logs/web-10/dg-web10-2",
                "/proj/env/logs/web-11/tomcat7",
                "/proj/env/logs/web-13/tomcat7",
                "/proj/env/logs/web-15/tomcat7",
                "/proj/env/logs/pdf-01/tomcat7"
              ]
            }
          },
          "loeb-web-qa": {
            "logtoaster": {
              "cron": "false",
              "config_file_override": "/fully/qualified/path/to/configuration.xml"
            }
          },
          "no-toast": {
          }
      }
}
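
For orientation, the per-instance script behaves roughly like the sketch below. The real template is templates/default/logtoaster-execute.sh; the scratch directory and the omitted toaster invocation here are assumptions, not copied from the template:

    # illustrative sketch of logtoaster-execute-<instance_name>.sh, not the real template
    LOG_DIR="/proj/env/logs/web-11/tomcat7"     # one of the configured log_dirs
    WORK_DIR="/tmp/logtoaster/tomcat7"          # scratch location is an assumption
    mkdir -p "$WORK_DIR"

    for gz in "$LOG_DIR"/requests.*.txt.gz; do
        [ -e "$gz" ] || continue                # nothing new to toast
        cp "$gz" "$WORK_DIR/" && gunzip -f "$WORK_DIR/$(basename "$gz")"
    done
    # ...the toaster is then run against the unzipped files using the instance's
    # logtoaster configuration XML (invocation omitted here).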

Server Monitoring

To add the default server monitoring using the Safari API key, simply add role[newrelic] to your run_list.
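
For example, one way to append that role from a workstation, if you manage run_lists with knife (node name is a placeholder):

    knife node run_list add <node_name> 'role[newrelic]'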

Application Monitoring

To get application monitoring in place, you have to enable server monitoring and then do the following:

If Overriding Default Java Options

It's not recommended to override all JVM options, so you probably shouldn't need the config hints below.

If you are overriding the default PF java_options, you need to do the following in addition to the above in order to fully enable New Relic:

  1. Tell Java where the agent is by editing the ['tomcat']['java_options'] configuration:

     "tomcat": {
         "java_options": "... -javaagent:/var/lib/tomcat7/newrelic/newrelic.jar ..."
     }

  2. If configuring a non-production environment, also add -Dnewrelic.environment=<environment> to your java_options, where <environment> is 'development', 'staging', or 'test'.

Alert Triggering

To trigger baseapp alerts, we currently use a cron job that hits a page of the app. To enable alert triggering, pick the node you would like to run the cron on and change the following node configuration:

...
"normal": {
...
    "pubfactory_cookbook_prime": {
       ...
        "alerting": {
            "url": "http://www.degruyter.com/dg/dgtriggeralerts?sendEmails=true", // the URL to hit
            "cron": {
                "enabled": true, // enables the job
                "mailto": "nfolts@safaribooksonline.com,blough@safaribooksonline.com" // emails if there are error
            }
        }
    },
   ...

In some cases, you may want to make sure that the alerting hits the same node that the cron is on. You can specify a loopback in the hosts file of that node by editing the following (though using the ?server= functionality in your alerting URL is preferred):

...
"normal": {
...
    "hosts_file": {
        "fqdn": "pdf-01.dg.safaribooks.com www.degruyter.com"
    },
...

Log Backup

On all nodes, a cron job is created for the pubfactory user that moves all logs in the /var/log/tomcat7 directory to /proj/env/logs/<hostname>/<instance_name>; /proj/env is intended to be a shared storage solution, so the logs can be backed up this way.

This backup process happens at 12:01 AM (server time). It is a prerequisite for logtoasting, as the logtoasting job needs to find all files from all servers.
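
The net effect is a crontab entry for the pubfactory user along these lines (illustrative; the cookbook generates the real entry, and the script it invokes is described under Backups in the Log Management section below):

    # crontab -l as the pubfactory user (illustrative)
    1 0 * * * /proj/bin/log_backup_cron.sh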

Sandboxing

We are attempting to standardize the sandbox environments (where clients get access to edit xsl/css)...

Open Athens Configuration

TODO: explain this

Nagios Configuration

TODO: Please add the Nagios Recipe to all nodes..

"recipe[sbo_nagios]",

The following stanza also needs to be placed within "normal". Here is a DG example:

.......
},
"nagios": {
    "allowed_hosts": ["50.0.113.48"],
    "nrpe": {
        "dont_blame_nrpe": "1"
    },
    "pagerduty": {
        "key": null
    }
},
"pubfactory_cookbook_prime": {
    "databag": "dg_live",
    "config_file": "config-dg-web-nodes.properties",
    "newrelic": "production"
},
.......

URL Rewrites

For URL rewriting, we rely on a third-party library called Tuckey. To see how to configure URL rewrites, refer to the PubFactory upgrade documentation.

HTTP Cache Headers

For cache headers, we now use HttpCacheControlService and contribute HttpCacheControlProviders.


Log Management

Every node backs up its logs to /proj/env/logs/<node_name>/<instance_name> (configurable with node['pubfactory_cookbook_prime']['log_backup_dir'], with the instance name appended to the end). Since the logs are backed up there, and that location is intended (by default) to be an NFS mount, we also remove the logs from the individual nodes after 30 days (configurable with node['pubfactory_cookbook_prime']['log_retention_days']).

Backups

The file managing the logs is /proj/bin/log_backup_cron.sh; see it in the templates dir of this cookbook for its full capabilities. The individual per-instance log backup scripts are located at /proj/bin/log_backup_<instance_name>.sh.

Logtoaster

Logtoaster (in the Chef world) relies on the backup script to put the log files on the NFS mount. Logtoaster is configured per instance, and cron-enabled logtoasters are managed by /proj/bin/logtoaster_cron.sh. The individual per-instance logtoaster scripts are located at /proj/bin/logtoaster-<instance_name>.

Logtoasters are configured within the Pubfactory Cookbook (pubfactory_cookbook_prime), under the "instances" attribute.

The instance scripts will look for the logtoaster configuration in the following default location: /proj/config/logtoaster/<instance_name>/logtoaster_configuration_<instance_name>.xml

You can override the configuration location via the "config_file_override" logtoaster attribute. You can determine whether the instance's logtoaster will be included in the server's cron job by setting the "cron" logtoaster attribute to "true" or "false". If a Pubfactory Cookbook instance does not have a "logtoaster" attribute, none of the logtoaster support will be generated for that instance.

"pubfactory_cookbook_prime": {
      "instances": {
          "eep-web-qa": {
            "logtoaster": {
              "cron": "true"
            }
          },
          "loeb-web-qa": {
            "logtoaster": {
              "cron": "false",
              "config_file": "/fully/qualified/path/to/configuration.xml"
            }
          },
          "no-toast": {
          }
      }
}

Load Balancing

Every node now runs haproxy to allow for load balancing of amx, xml, and mysql. Configuration of these values is done in the data bag for the environment.

Stats Page

To see the stats of an embedded HAproxy, follow these instructions.

From your machine, you can use this command to tunnel (replace rpollock with your SSH user name):

ssh -L8090:localhost:8090 rpollock@test-web-01.dg.safaribooks.com

Then, in your browser, go to http://localhost:8090/haproxy

AMX Load Balancing

Any number of AMXes can be used for load balancing of REST calls from the app. HAProxy listens on port 8090 and directs traffic to one of the nodes.

AMX LB Data Bag Configuration

    "haproxy_dblb": {
        ...
        "amx_endpoints": [
            {
                "id": "ams-auth-01",
                "address": "10.248.254.111:80"
            },
            {
                "id": "ams-auth-02",
                "address": "10.248.254.112:80"
            }
        ]
    }

AMX LB App Configuration

Every node has a loopback address of "amx-lb", allowing EVERY web node to be configured with the following AMX connection:

# AMX Configuration
amx.rest.port = 8090
amx.rest.host = user:pass@amx-lb
amx.rest.base-path = rest/
amx.rest.protocol = http
amx.rest.base-address = http://user:pass@amx-lb:8090/rest

MySQL Load Balancing

We support a master/slave LB configuration: if the master goes down, the LB will point to the slave. The slave should be configured as read-only, so some site functionality will fail. HAProxy listens on 3306 (the standard MySQL port).

MySQL LB Data Bag Configuration

    "haproxy_dblb": {
        "sql_db_1": "10.248.254.14:3306", // the master
        "sql_db_2": "10.248.254.15:3306" // the slave
        ...
    }

MySQL LB App Configuration

Every node has a loopback address of "mysql-lb", allowing EVERY web node to be configured with the following MySQL connection:

hibernate.connection.url=jdbc:mysql://mysql-lb/database?autoReconnect=true&characterEncoding=UTF-8

XML (Solr/MarkLogic) Load Balancing

Any number of servers can be specified as XML connections. It is currently configured to round-robin across all nodes, as we expect no writes to happen from a web app (though we currently do have bootstrapping for ML databases, but that is going to be reworked).

XML LB Data Bag Configuration

    "haproxy_dblb": {
        "xml_dbs": [
            {
                "id": "dg-mlo-01",
                "address": "10.248.254.11:8013"
            },
            {
                "id": "dg-mlo-02",
                "address": "10.248.254.12:8013"
            }
        ],
        ...
    }

XML LB App Configuration

Every node has a loopback address of "xml-lb", allowing EVERY web node to be configured with the following XML connection:

xml.connection.uri = xcc://ifactory:368cong4@xml-lb:8015

JMS Health Check and Restart

For JMS processors that use a REST interface (as opposed to OpenWire): because Klopotek cannot yet figure out why, there are occasions where a mystery consumer attaches itself and blocks the JMS processor, filling the JMS log with errors instead of actually processing messages. There is a script called checkHealthJms.sh that was originally placed in /pf-config/services/brill/prod/brill-jms-prod/. Copy this file into the desired config directory (sketched just below) and then configure the node according to the example that follows:
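
The copy step might look like this; the destination path is only an illustration, so use whatever config directory is appropriate for your client and environment:

    cp /pf-config/services/brill/prod/brill-jms-prod/checkHealthJms.sh \
       /pf-config/services/<client>/<environment>/<client>-jms-<environment>/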

    "jms_check": [
        {
          "client": "brill",    
          "environment": "prod",
          "mailto": "melanie.brooks@sheridan.com"
        }
      ]
    }
        ...
    }

Authors

  • Ryan Pollock (rpollock@oreilly.com)
  • Nikk Folts (nicholas.folts@sheridan.com)
  • Justin Lapierre (justin.lapierre@sheridan.com)
  • Melanie Brooks (melanie.brooks@sheridan.com)

Dependent cookbooks

pf_chef_app_pubfactory_tomcat ~> 0.1.17
logrotate >= 0.0.0
seven_zip = 3.0.0
pf_chef_app_dart_sass >= 0.0.0
chef-client >= 0.0.0
pleaserun ~> 0.1.1
consul ~> 2.3.0
ntp ~> 3.3.1
sudo ~> 2.9.0
build-essential >= 0.0.0
users ~> 2.0.3
ssh_known_hosts ~> 3.0.0
iptables ~> 1.1.0
safari-tomcat ~> 1.2.6
haproxy = 6.2.6
hosts_file ~> 0.2.0
cron ~> 1.7.6
wkhtmltopdf ~> 0.2.0
curl ~> 2.0.0
subversion ~> 1.3.1
imagemagick ~> 0.2.3
openssl ~> 2.0.0
java ~> 1.31.0
apt ~> 2.9.2
apache2 = 5.0.1
ark >= 0.0.0

Contingent cookbooks

pf_chef_role_monitor
pf_chef_role_nagios_internal