Using Metrics to Guide Container Adoption, Part I

Earlier this year, I wrote about a new approach my team is pursuing to inform our Container Adoption Program: using software delivery metrics to keep organizations aligned and focused, even when they are engaged in multiple workstreams spanning infrastructure, release management, and application onboarding. I talked about starting with the core four metrics identified in Accelerate (Forsgren, Humble, and Kim), which act as drivers of both commercial and noncommercial organizational performance.

Let’s start to highlight how those metrics can inform an adoption program at the implementation team level. The four metrics are Lead Time for Change, Deployment Frequency, Mean Time to Recovery, and Change Failure Rate. Starting with Lead Time and Deployment Frequency, here are some suggested activities each metric can guide in a container adoption initiative, with special thanks to Eric Sauer, Prakriti Verma, Simon Bailleux, and the rest of the Metrics Driven Transformation working group at Red Hat.

Lead Time for Change

Providing automated, immutable infrastructure to development teams. Use OCI container images to define a baseline set of development environments for development teams and allow self-service provisioning for new projects.

Building automated deployment pipelines. Create deployment pipelines using build servers, source code management, image registries, and Kubernetes to automate previously manual deployment and promotion processes.

Building unit tests. Unit tests are often talked about but still too often left out of development activities, and they are as relevant and important in cloud or Kubernetes environments as they are in traditional deployments. Every piece of code with faulty logic sent back for rework by a manual test team represents unnecessary delays in a deployment pipeline. Building unit tests into an automated build process keeps those errors close to their developer source, where they are quickly fixable.
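
As an illustration, a unit-level check in an automated build can be as small as a script the CI server runs on every commit. Here is a minimal sketch in Node.js using the built-in assert module; the rateChange function is a hypothetical example, not code from any project discussed here:

var assert = require("assert");

// Hypothetical function under test: month-over-month rate change,
// rounded to one decimal place.
function rateChange(previous, current) {
    return Math.round((current - previous) * 10) / 10;
}

// Failed assertions throw and break the build, so faulty logic never
// reaches a manual test team.
assert.equal(rateChange(8.2, 7.8), -0.4);
assert.equal(rateChange(7.8, 7.8), 0);
console.log("All unit tests passed");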

Automating and streamlining functional testing. Just as unit tests avoid time-consuming manual testing, so does the automation of functional acceptance tests. These tests evaluate correctness against business use cases and are more complex than code-level unit tests. That makes them all the more important to automate in order to drive down deployment lead times. The contribution of Kubernetes here is the ability to easily spin up and destroy sophisticated container-based test architectures to improve overall throughput.
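
As a sketch of what one automated acceptance check might look like (the endpoint URL is a hypothetical stand-in, and a real suite would assert on the response body against a business use case, not just the status code):

var http = require("http");
var assert = require("assert");

// Hit a service deployed to a container-based test environment and
// verify that it responds successfully.
http.get("http://myapp-test.example.com/unemployment", function(res) {
    assert.equal(res.statusCode, 200, "service should respond with 200");
    console.log("Acceptance check passed");
}).on("error", function(err) {
    console.error("Acceptance check failed: " + err.message);
    process.exit(1);
});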

Container native application architectures. As the size and number of dependencies in an application deployment increase, the chances of deployment delays due to errors or other issues in the readiness of those dependencies likewise increase. Decomposing monoliths into smaller containerized, API-centric microservices can shorten deployment time by decoupling each service from the deployment lifecycle of the rest of the application.

Deployment Frequency

For implementation teams, deployment frequency is as much about development processes as it is about technical architecture.

Iterative planning. Deployment frequency is in part a function of the way project management approaches the planning process. Each release represents an aggregation of functionality that is considered significant or meaningful to some stakeholder. Rethinking the concept of a release at an organizational level can help improve deployment frequency. As project managers (and stakeholders) start to reimagine delivery as an ongoing stream of functionality rather than the culmination of an extended planning process, iterative planning takes hold. Teams plan enough to get to the next sprint demo and use the learning from that sprint as input for the next increment.

User story mapping. User story mapping is a release planning pattern identified by Jeff Patton to get around the shortcomings of the traditional Agile backlog. If Agile sprint planning and backlog grooming are causing teams to deliberately throttle back on the number of software releases, it may be time to revisit the Agile development process itself, replacing by-the-book techniques with other approaches that may feel more natural to the team.

Container native microservices architecture. Larger and more complex deployments are hard to deploy cleanly. It is difficult to automate the configuration of deployments with a large number of library and infrastructure dependencies, and without that automation, manual configuration mistakes are bound to happen. Knowing those deployments are painful and error-prone, teams inevitably commit to fewer, less frequent deployments to reduce the number of outages and late night phone calls. Breaking a large monolithic deployment into smaller, simpler, more independent processes and artifacts makes deployments potentially less painful, which should give teams the assurance to increase deployment frequency to keep pace with customer demands.

These are just a few team-level techniques organizations can pursue to improve Lead Time for Change and Deployment Frequency, the software delivery metrics associated with market agility. In the next posts, I’ll outline some techniques teams can pursue to improve upon the measures of reliability: Mean Time to Recovery and Change Failure Rate.

Exploring a Metrics-Driven Approach to Transformation

My team has been working with organizations adopting containers, Kubernetes, and OpenShift for more than three years now. When we started providing professional services around enterprise Kubernetes, it became clear we needed a program-level framework for adopting containers that spelled out the activities of multiple project teams. Some participants would be focused on container platform management and operations, some on continuous delivery, some on the application lifecycle, and others on cross-cutting process transformation.

We’ve had success using this framework to help customers rethink container adoption as less a matter of new technology implementation and more an “organizational journey” where people and process concerns are at least as important as the successful deployment of OpenShift clusters.

Over time, we’ve realized the program framework is missing a guiding force that gets executive stakeholders engaged and keeps all participants focused on a consistent, meaningful set of objectives. Too often, we’ve seen IT and development managers concentrate on narrow, tactical objectives that don’t address the bigger-picture, transformational needs of most enterprises today. What we felt was lacking was a set of trackable and meaningful measures that could demonstrate progress to all stakeholders in a highly visible way.

We were very excited by the release of Accelerate by Nicole Forsgren, Jez Humble, and Gene Kim last year as the culmination of “a four-year research journey to investigate what capabilities and practices are important to accelerate the development and delivery of software, and, in turn, value to companies.” The authors, already well-known for their work on Puppet Labs’ State of DevOps reports and books like Continuous Delivery and The Phoenix Project, were able to use extensive survey data and statistical analysis to show relationships between specific capabilities/practices and organizational performance.

One of these capabilities, software delivery performance, is of particular interest to organizations undergoing cloud adoption and/or digital transformation. Forsgren and her co-authors showed a statistical link between software delivery performance and organizational performance, including financial indicators like profitability, productivity, market share, and number of customers. Interestingly, the authors showed a link between software delivery performance and non-commercial performance as well: things like product quality, customer satisfaction, and achieving mission goals.

Equally important, the authors defined software delivery performance in a very concrete, measurable way that can be used as an indicator for a wide range of transformative practices and capabilities. They defined software delivery performance using four metrics: Lead Time (how long it takes a code change to go from commit to running in production), Deployment Frequency (how often code is deployed to production), Mean Time to Recovery (how quickly service is restored after an incident), and Change Failure Rate (the percentage of changes that degrade service or require remediation).

Finally, the authors enumerated the various practices and capabilities that drive software delivery performance as measured this way: test automation, deployment automation, loosely coupled architecture, and monitoring, among others.

What this means is that we now have specific measures that adopters of container platforms (among other emerging technologies) can use to guide how the technology is adopted in ways that lead to better organizational performance. And we have a set of statistically validated practices that can be applied against this technology backdrop, using containers and container platforms as accelerators of those practices when possible.

The focus for the authors is on global performance, not local optimization, and on “outcomes not output,” so the organization rewards activities that matter, rather than sheer volume of activity. This last point is crucial. In an earlier post, I wrote about app onboarding to OpenShift. Taken to the extreme, a myopic focus on the percentage of the portfolio or number of apps migrated to X (containers, Kubernetes, OpenShift, AWS, “The Cloud”) is a focus on outputs not outcomes. It’s a measure that seems to indicate progress but does not directly determine the success of the cloud adoption program as a whole, success that must involve some wider notion of commercial or noncommercial performance.

Put another way, cloud platforms do not automatically confer continuous delivery capabilities upon their adopters. They enable them. They accelerate them. But without changing the way we deliver software as an organization—the way we work—cloud technology (or any other newly introduced technology) will probably fail to match its promise.

I will be writing more about how we put a metrics-based approach into practice with our customers in upcoming posts, starting with an update on how we’ve begun to capture these metrics in an easily viewable dashboard to keep stakeholders and project participants aligned to meaningful goals.

Assessing App Portfolios for Onboarding to OpenShift

I’ve decided to start writing to this blog again, but reflecting a change in roles and professional focus, the topics are going to be more about organizational practices and less about pure technical implementation. This post is all about the transition to PaaS platforms from existing environments.

Most professionals who’ve spent enough time in the IT industry have seen organizational silos in action. The classic silos are the ones created by Development and Operations organizations, silos we aim to break down through DevOps-style collaboration. But how many organizations pursuing digital transformation are continuing that siloed thinking when it comes to evaluating the application portfolio for cloud migration and modernization?

Application Development, Database Operations, Infrastructure, and the various lines of business have portions of the application portfolio for which they take responsibility. When organizations think about modernization, they need to deemphasize the silos and develop a comprehensive approach that evaluates the entire portfolio and the teams that support those applications. Otherwise, they’re leaving money on the table in the form of missed opportunities for cost savings and application improvements that generate revenue and increase customer engagement.

A comprehensive approach takes into account the full range of workloads supported by the IT organization and starts making tough decisions about which workloads can or should be modernized, which should be rehosted to take advantage of more efficient cloud platforms, and which should be left as is or even retired because they’ve outlived their usefulness.

My team works with many organizations that treat Kubernetes/OpenShift container platform adoption as an infrastructure modernization project. We recommend using the current wave of Kubernetes adoption as an opportunity to broaden the discussion, build bridges between Ops and Dev, and develop an approach that evaluates all application migration pathways, including ones that may not necessarily result in containerization.

So how does an organization work through an application portfolio assessment efficiently and holistically?

One way to approach this project is through a three-step process that looks like the following:

  1. Filter the Portfolio/Teams
  2. Identify and Select Archetypes and Teams
  3. Analyze and Prioritize Individual Applications and Teams

Step 1: Filter the Portfolio/Teams

Start with a configuration management database or application index and assemble your candidate application population. Ideally, this index also has some information about the team responsible for operating, maintaining, and developing the applications. This might include the responsible group, project manager, primary technical team lead, and number of operators and developers.

At this point, it’s important to apply a filter to the application/team inventory, setting aside workload/team types that are not good initial candidates for onboarding to container platforms.

Kubernetes has to date largely focused on orchestrating containers on clusters of Linux hosts. Workloads that target other operating systems, especially mainframe and enterprise desktop platforms, don’t make good candidates for initial onboarding activity today. This story may change as Windows containers and Windows hosts become more integrated into the Kubernetes ecosystem and its operational practices.

Because container platform workflows accelerate software development and deployment practices, the largest ROI opportunities are with workloads whose source code you maintain and control. These are the workloads for which accelerated deployment frequency can improve software quality and foster innovation and value creation. This should also include net-new development, including modernization efforts that involve rewriting legacy mainframe workloads as container-native applications. Infrequently re-deployed commercial off-the-shelf databases and other enterprise products may not be the right workloads to start with, especially if they weren’t designed to take advantage of distributed, elastic cloud environments.

Put all of these considerations together and you wind up with a preliminary filtering guide: favor Linux-hosted workloads whose source code you own and actively develop, and set aside mainframe and Windows desktop workloads and infrequently deployed off-the-shelf products for later phases.
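
To make that concrete, here is an illustrative sketch (with hypothetical inventory fields, not a real CMDB schema) of the kind of first-pass filter those criteria imply:

// Illustrative only: apply the coarse filtering criteria above to a
// hypothetical application inventory exported from a CMDB.
var inventory = [
    { name: "billing-api",  os: "linux",     sourceOwned: true,  cots: false },
    { name: "hr-suite",     os: "windows",   sourceOwned: false, cots: true  },
    { name: "batch-ledger", os: "mainframe", sourceOwned: true,  cots: false }
];

var candidates = inventory.filter(function(app) {
    // Keep Linux-hosted workloads whose source the organization
    // maintains, excluding commercial off-the-shelf products.
    return app.os === "linux" && app.sourceOwned && !app.cots;
});

console.log(candidates.map(function(app) { return app.name; })); // [ 'billing-api' ]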

Step 2: Identify and Select Archetypes and Teams

Step 1 should have resulted in a much smaller list of workloads and teams for consideration for onboarding to the container platform. The next step is to understand where to focus within the remaining set to capture the best ROI for modernization. That means considering application patterns, application value to the business, and team personality.

Every application portfolio has a mix of application types. These types are defined by application function (user interface layer, API layer, web services, batch processing, etc.), system dependencies, and technology choices, among other variables. Identify applications that seem to match particular patterns (e.g. Java web services) that tend to repeat themselves across the portfolio. The idea is to perform an onboarding and capture lessons learned in a way that can be reapplied among the remainder of applications of the same type. You want to address as much of the portfolio with that repeatable pattern of onboarding as possible.

Also in this stage, consider that each application generates a different level of business value for the organization. Some applications may be infrequently used or outmoded completely; these are candidates for retirement. Some applications may have more visibility to the organization and/or support a revenue-generating function. Actively developed applications are more likely to benefit from the rapid delivery processes aligned to container technology.

Beyond that, the onboarding process represents an opportunity to create a new platform of API-driven, reusable services opened to a wider stakeholder group. Try to select applications and workloads that contribute to this kind of transformed vision of service delivery and add productive value to the organization, with a measurable impact that can be highlighted to build enthusiasm for the program.

Finally, recognize that not all teams have equal enthusiasm to be early adopters of enterprise cloud technology. Consider teams that have demonstrated an ability to learn and embrace new technology, and, importantly, are willing to provide feedback to the platform team on how the platform and onboarding processes can be adjusted to create a pleasant platform user experience for other onboarded development teams in the future.

Step 3: Analyze and Prioritize Individual Applications/Teams

Now that you have your application portfolio and team analysis focused on a smaller, more manageable number of applications and teams, you’re ready to do a deep-dive analysis to prioritize those workloads and roughly gauge the complexity of migrating each.

The high-level technical requirements for application suitability for container platforms are not particularly stringent, and most candidate workloads will satisfy the short list of basic criteria for Red Hat OpenShift.

You may want to expand beyond these basic criteria to uncover hidden layers of complexity that could cause an onboarding project to get bogged down. Are there dependencies on external services that will need to be onboarded to the container platform as well or is egress routing sufficient? How well does the application support clustering for resiliency or performance? Does the application have a robust test suite to validate proper system behavior after onboarding to the new platform? Does it have a performance metrics baseline to compare against?

What you would like to arrive at is a decision on the best-fit team or two to launch your container adoption program with, plus a list of 10-15 additional teams and workloads queued for expanding the program in the next phase of application onboarding. Use initial app onboarding efforts as test cases, documenting what is working and what isn’t and capturing patterns of app onboarding techniques that can be reapplied across the portfolio.

Container adoption requires cross-functional collaboration between operations and application teams to develop a platform that works for everyone. The most important thing is getting started and getting feedback. With a portfolio assessment completed, you have enough planning in place to get application onboarding off on the right foot.


Deploying Applications to OpenShift v2 through Jenkins

There are at least three different ways to pursue Jenkins integration with OpenShift Enterprise version 2.

1. Use OpenShift’s built-in Jenkins capability. You can push source code to the OpenShift gear directly and allow an embedded Jenkins client to handle compilation and deployment of the code. For more details on this, see the Continuous Integration section on the OpenShift Developer Portal.

2. Binary deployment via OpenShift API and Git push. Create a script job in Jenkins that uses the OpenShift REST API or CLI to create the gear, then use the Git URL of the created gear returned by the REST call to push a binary deployment (see the sketch after this list).

3. Jenkins OpenShift Deployer Plugin. The community open source Jenkins OpenShift Deployer plugin allows a Jenkins job to create OpenShift containers and deploy applications to them. The Jenkins job is configured through the Jenkins UI rather than through a custom script.
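
For option 2, here is a rough sketch of what such a script might do for a JBoss gear; the app and cartridge names are hypothetical, and the exact layout varies by cartridge:

# Create the gear via the CLI (the REST API returns the same Git URL)
rhc app create -a myapp -t jbossas-7 -l user@example.com

# Switch the gear from source builds to binary deployment: drop the
# generated source tree and place a prebuilt WAR in deployments/
cd myapp
rm -rf src pom.xml
cp /path/to/build/ROOT.war deployments/

# Push to the gear's Git repository; OpenShift deploys the binary
git add -A
git commit -m "Deploy prebuilt binary from Jenkins"
git push origin master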

Most enterprises will want to run Jenkins in a centralized server/cluster external to the PaaS containers that host the applications. OpenShift itself could be the hosting environment for that Jenkins cluster, but it would likely run in a separate OpenShift Enterprise environment and feed releases to the DEV, TEST, and PROD OpenShift Enterprise environments that host and autoscale the applications.

That means either handling your own integration between Jenkins and OpenShift Enterprise using the OpenShift REST API and Git, or using a third-party tool like the open source community Jenkins OpenShift Deployer.

The OpenShift Deployer worked well for the use case I’ve been demoing. It’s simply a matter of configuring the connection and SSH keys for the OpenShift environments that Jenkins will communicate with and then tying that OpenShift broker into a build’s deployment job. Here is a screenshot of the configuration of OpenShift servers in the Manage Jenkins > Configure System screen:

OpenShift Server Configuration in Jenkins

As you can see, the plugin allows you to test your credentials against the broker for the OpenShift Enterprise environment you’re configuring and even upload the Jenkins server’s public SSH key to the broker. The plugin will log in with your credentials and upload the SSH key so that Jenkins can automatically drive deployments to your OpenShift environment with no human intervention.

With that in place, you can configure a build job to deploy to OpenShift. Here is the snippet from the build job (in this case the one that corresponds to the Acceptance stage of our deployment pipeline) that deploys the application on OpenShift:

Jenkins Job Configuration with the OpenShift Deployer Plugin

The Deployment Type is GIT, which signifies to the plugin that this is not a deployable packaged as a .tar.gz archive. The actual deployment package is in the current directory, where an earlier build step placed it. In a real enterprise environment, this field would contain a URL to the location of the package in an artifact repository like Nexus.

With Jenkins/OpenShift integration, you have two powerful tools of automation in a deployment pipeline working hand-in-hand.

Jenkins and Deployment Pipelines

In the previous post, I described deployment pipelines and mentioned how you need a controller to drive software releases through each stage of the pipeline. This component is typically a Jenkins CI/CD server, although Jenkins’ built-in capabilities to capture pipelines of deployment activity are rather limited. There is an open source Build Pipeline plugin that gets the job done, for the most part. Other commercial tools in this area include: Atlassian Bamboo, ThoughtWorks Go, and XebiaLabs XLRelease.

This is a screenshot from the Jenkins Build Pipeline plugin that shows various pipelines in progress.


It helpfully color codes the status of each stage of each pipeline, and, combined with Jenkins itself, allows you to use role-based access control to govern how and when jobs are launched and restarted. Most importantly, it allows you to capture a flow of build jobs, from stage to stage. Stages can be configured to begin automatically on successful completion of the preceding stage, or they can require manual push-button approval to proceed.

The Build Pipeline tool enables us to capture each of the deployment stages shown below:

Commit Stage


The Commit Stage is a continuous integration step, used to validate a particular code commit from a compilation, unit test, and code quality standpoint. It does not involve any deployment to an environment, but does include export of a compiled binary (in the case of a Java project) to an enterprise artifact repository such as Nexus.

Acceptance Stage


The Acceptance stage is where automated functional acceptance and integration tests are performed. To do this, Jenkins will deploy the code to an environment. PaaS makes this trivial, as I’ll show in a later post.

UAT Stage


In this deployment pipeline example, a separate UAT stage is available for manual user testing, important for confirming application usability. Advancement to the UAT Stage must be approved by the QA team, who log in to Jenkins to perform a push button deployment.

Production Stage


Release to production is the final stage, controlled by the Operations team. Like the two stages before it, it also involves CI/CD server deployment to the PaaS environment.

So that captures the deployment pipeline’s integration with the Jenkins CI/CD server. The next post will profile how the CI/CD server leverages PaaS to make environment creation, configuration, and application deployment an easy process.

Continuous Delivery Deployment Pipelines and PaaS

This has been a year of DevOps for me professionally. I spent some quality time with a couple of the must-reads, Continuous Delivery by Humble and Farley and The Phoenix Project by Kim, Behr, and Spafford, and worked with some colleagues at Red Hat to demonstrate how a DevOps-oriented continuous delivery workflow could operate in the context of Platform-as-a-Service (PaaS).

PaaS is a key enabler of continuous delivery and should be evaluated as part of any serious effort at DevOps transformation in a large IT organization. PaaS is maturing very rapidly. Offerings like OpenShift by Red Hat now satisfy, out of the box, many automation needs that teams used to address with ad hoc solutions built on tools like Puppet and Chef. For many use cases, there should be no need to roll your own platform installation and configuration automation, at least on any of the major platforms like Java, PHP, Python, or Ruby.

Push button deployment of application code to a newly created, containerized server is a solved scenario, available on public cloud infrastructure or internal data centers. My experience is mostly with OpenShift, and it works very well for typical web application deployment.

Humble and Farley’s Continuous Delivery positions the development of “deployment pipelines” as perhaps the central practice in efficient software delivery. A deployment pipeline is a structured, automated, optimized process for moving software from idea to production. Deployment pipelines organize delivery into a series of stages, each of which validates a code release in a different, more restrictive way to make sure it’s ready for production.

Deployment pipeline stages

For Continuous Delivery to work efficiently, the pipeline must be examined for automation opportunities. Wherever you see “environment configuration” or “deployment” in the pipeline, that’s an opportunity for PaaS to take care of automation for you. What’s missing is a central component that can encapsulate the deployment pipeline process and call through to things like the PaaS to drive the software release forward. This is the role typically played by a Continuous Integration/Continuous Delivery server, the most well known of which is the open source Jenkins project. I’ll talk about Jenkins integration with OpenShift in upcoming posts.

Red Hat JBoss BRMS 5 Release Management Strategies

Red Hat JBoss BRMS 5 is an enterprise-supported business rules management, business process management, and complex event processing system based on the JBoss community Drools project. The repository for business rules is a JCR implementation, and BRMS 5 exposes REST and WebDAV APIs for moving rule packages between environments. There are several ways to migrate a business rules repository from one code promotion environment to another. This blog post lays out three different approaches, based on the level of formal process control at the organization.

Release Management Strategies Overview

Shared Server Model


Configuration

  • One BRMS server shared by all environments: dev, test, preprod, production
  • Deployment environments technically not 100% isolated from each other
  • BRMS management functions can prevent unauthorized changes to production

Release Process

  • Rules can be authored in various environments depending on programming interest/expertise of rule authors.
  • Rules are snapshotted in the BRMS server to identify them as release-ready.
  • Once a snapshot is made, the rules are available at a new URL hosted by the BRMS server.
  • Rule snapshots are referenced by a change-set.xml or properties file embedded in the application (see the sketch after this list).
  • The change-set.xml or properties file is updated to point to the new snapshot URL.
  • No export or deployment of rules is necessary for promotion.
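
For reference, here is a minimal sketch of what such a change-set.xml might look like in Drools/BRMS 5; the host, package name, and snapshot name are hypothetical:

<change-set xmlns='http://drools.org/drools-5.0/change-set'>
  <add>
    <!-- Points the knowledge agent at a release-ready package snapshot in
         BRMS; promotion means updating this URL to the new snapshot. -->
    <resource type='PKG'
        source='http://brms-host:8080/jboss-brms/org.drools.guvnor.Guvnor/package/mortgages/RELEASE-1' />
  </add>
</change-set>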

Runtime Environment

  • Clustered, load balanced and highly available BRMS using Apache is recommended.
  • Knowledge agents are responsible for caching rules in application tier and checking BRMS for updates.
  • Application can continue to function even if BRMS server fails.
  • Alternative to knowledge agents: references to package snapshots in application properties files, but you need to set up your own polling/caching (for example, using JBoss Data Grid).
  • Rules can be updated in production using Guvnor. The knowledge agent will see the update, and application logic is updated without redeployment or restart.

Repository Promotion Model


Configuration

  • One BRMS for each environment
  • Applications deployed in each environment only communicate with that environment’s BRMS server
  • Deployment environments can be 100% isolated from each other

Release Process

  • Rules can be authored in various environments depending on programming interest/expertise of rule authors.
  • Two approaches to rule promotion, either of which should be scripted/automated through a build server:
    • Small rule sets/models: Export repository from one deployment environment to the next when promotion happens. Be aware large rule sets/fact models can create long-running export jobs.
    • Larger rule sets/models: WebDAV. Must maintain a list of rule packages/snapshots to be transferred and script the transfer operations.
  • Only the DEV environment will have the complete version history of all the rules. Repository export only captures the latest rule version and overwrites the repository of the target server.

Runtime Environment

  • Clustered, load balanced and highly available production environment BRMS using Apache is recommended.
  • Rules can be updated in production at runtime without an application restart, but should only be done in emergency circumstances.
  • Same knowledge agent or custom caching runtime approach of Shared Server Model applies here.
  • Note: It is possible for the production instance to fall out of sync with other environments unless you are careful to reproduce emergency production rule changes in the upstream environments.

Dev-Only Model


Configuration

  • One BRMS server, but only used for development of rules
  • Deployment environments can be 100% isolated from each other
  • Applications in deployment environments refer to rules embedded in application itself

Release Process

  • Rules can be authored in various environments depending on programming interest/expertise of rule authors.
  • Rules are snapshotted in Guvnor but exported as PKG (binary) files as part of an automated build process.
  • PKG files are bundled with the application and checked into version control systems with application code.
  • Upon release, source code and PKG files are tagged, built, and deployed together to the next environment (dev to test, test to preprod, etc.).
  • Releases might involve complete redeployment or patch process could target specific rule PKG updates. Releases will likely require an application restart.

Runtime Environment

  • Application refers to local PKG file for rule execution.
  • No connection to a BRMS server needed at all at runtime (no checking for rule updates, caching, or knowledge agent configuration).
  • BRMS server does not require clustering or high-availability, because it is deployed in the development environment only.
  • Rules cannot be updated at runtime in production.
  • Code always refers to compiled PKG binaries.
  • Only way to update application’s PKG files is to tag, build, and release using the standard release management practice.

New GDP chart added

Just added a new GDP chart, with statistics read in from the Bureau of Economic Analysis. This one involved more brute force than the unemployment data. I wound up manually reading lines from a downloaded CSV response using while loops and substr. There must be an easier way. The code’s up on GitHub.
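
For the record, here is a sketch of the simpler line handling I was wishing for, splitting each line on commas instead of walking it with substr; it assumes a clean layout with no quoted fields, and the file name is hypothetical:

var fs = require("fs");

// Read the downloaded BEA response and split it into lines
var lines = fs.readFileSync("gdp.csv", "utf8").split("\n");

var rows = [];
for (var i = 1; i < lines.length; i++) { // skip the header row
    var line = lines[i].trim();
    if (!line) continue;                 // skip blank lines
    rows.push(line.split(","));          // one array of fields per row
}
console.log("Parsed %d data rows", rows.length);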

Creating an Economic Dashboard Using OpenShift PaaS, Node.js, and Google Charts (Part II)

Picking up where I left off in Part I, I have a JSON data service created. Now I want to create a dashboard to display that data.

Creating a Jade Template

To create the page that holds the dashboard data, I decided to try out the Jade template engine for Node.js. I really dig the slimmed-down HTML syntax in Jade. All the normal HTML markup built around opening and closing tags, like this:

<ul>
  <li class="first">
    <a href="#">foo</a>
  </li>
  <li>
    <a href="#">bar</a>
  </li>
  <li class="last">
    <a href="#">baz</a>
  </li>
</ul>

is shortened to this:

ul
  li.first
    a(href='#') foo
  li
    a(href='#') bar
  li.last
    a(href='#') baz

A div with an id of “whatever” is shortened to simply #whatever. Not bad. Plus it’s got all the usual template engine features like variables, conditionals, iteration, and template includes.

Here is the source code for the Jade template for my dashboard page:

html
    head
        script(src='http://documentcloud.github.com/underscore/underscore-min.js')
        script(src='https://www.google.com/jsapi')
        script(type='text/javascript')
            var blsData = !{blsData}
        script(src='js/googleCharts.js')
        link(rel='stylesheet', href='css/main.css')
        title U.S. Economic Dashboard
    body
        h1 U.S. Economic Dashboard
        #unempChart.chart
        #chart2.chart
            div.questionMark ?
        #chart3.chart
            div.questionMark ?
        #chart4.chart
            div.questionMark ?

Pretty simple. There are a few script imports at the top, including the JavaScript utility library Underscore.js and of course the Google API. The one script block for var blsData shows where I am populating a JavaScript variable with the content of an escaped Jade variable of the same name. I’ll show where this variable is passed to the template in a sec.

The page layout is basic. Four divs, three of which have question marks to represent placeholders for future content. The only div with chart content is unempChart, which will be used by the Google Charts script.

There are a couple changes that need to be made to handle the dashboard page in server.js. First, I need to set up Express to serve my static JavaScript and CSS content. This is done by adding the following two non-route URLs:

// Expose the static resources on two non-route URLs
app.use("/js", express.static(__dirname + '/js'));
app.use("/css", express.static(__dirname + '/css'));

Next I create the route to the dashboard itself, passing in the unemployment data scraped from the BLS site (as described in Part I):

// Route: GET /dashboard -> Jade template
app.get("/dashboard", function(req, res) {
    retrieveUnemployment(function(unemploymentData) {
        res.render("index.jade", {
            blsData : JSON.stringify(unemploymentData)
        });
    });
});

Google Charts

Lastly, there is the JavaScript that creates the Google Chart out of the JSON data and drops it into the div container.

google.load("visualization", "1", {
    packages : [ "corechart" ]
});
google.setOnLoadCallback(drawChart);
function drawChart() {
    var data = new google.visualization.DataTable();
    data.addColumn('date', 'Month');
    data.addColumn('number', 'Unemployment');

    // Parsed blsData
    var parsedBLSData = [];
    _.each(blsData, function(blsDataItem) {
        var parsedBLSDataItem = [
                new Date(blsDataItem.year, blsDataItem.month - 1, 1), // months in blsData are 1-12; JS Date months are 0-based
                blsDataItem.rate 
        ];
        parsedBLSData.push(parsedBLSDataItem);
    }, this);

    data.addRows(parsedBLSData);

    var options = {
        title : 'U.S. Unemployment Rate',
        chartArea : {
            width : '90%',
            height : '75%'
        },
        vAxis : {
            maxValue : 11.0,
            minValue : 0.0
        },
        legend : {
            position : "none"
        }
    };

    var chart = new google.visualization.LineChart(document
            .getElementById('unempChart'));
    chart.draw(data, options);
}

I create a google.visualization.DataTable object and populate it with a two-dimensional array, converting the JSON month and year properties into JavaScript date objects as I go. Chart visualization options (of which there are many) are supplied in the options var and a LineChart object is created with the target div id. Calling chart.draw with the converted data and options object displays the chart on the screen.

Partial dashboard running on OpenShift

Next steps

Well, obviously the dashboard needs more charts. One thing it could use is a metric that better controls for the number of people who have stopped looking for employment or otherwise dropped out of the labor market. Something like total nonfarm payroll. Then there are other common measures like quarterly GDP changes.

From a technical standpoint, it’s not a great idea to pull the source data with every request for the dashboard. There needs to be a caching mechanism, so the data’s pulled at a more sensible interval (no more than once a day). That might be a good excuse to explore the MongoDB cartridge.
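
A minimal sketch of what that might look like in server.js, wrapping the existing retrieveUnemployment function in a time-based in-memory cache (the one-day interval is just a placeholder):

var ONE_DAY = 24 * 60 * 60 * 1000;
var cachedData = null;
var lastFetch = 0;

// Serve cached data unless it is older than a day; otherwise re-scrape
// the BLS site and refresh the cache.
function getUnemployment(callback) {
    if (cachedData && (Date.now() - lastFetch) < ONE_DAY) {
        return callback(cachedData);
    }
    retrieveUnemployment(function(unemploymentData) {
        cachedData = unemploymentData;
        lastFetch = Date.now();
        callback(unemploymentData);
    });
}

The routes would then call getUnemployment instead of calling retrieveUnemployment directly.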

All of the source code is up on GitHub. Have a look: https://github.com/trevorquinn/econstats

Creating an Economic Dashboard Using OpenShift PaaS, Node.js, and Google Charts (Part I)

Hi and welcome to yet another programming blog. This blog is going to be my repository for technical how-tos, kept mostly for my own record; if it helps someone else along the way, then great.

This post discusses creating a dashboard for the U.S. economy using OpenShift and Node.js. Business executives have dashboards on the financial performance of their companies. Why not create a simple dashboard for the U.S. public on the overall financial health of the country?

I figure the dashboard could display four key economic metrics for the country. The choice of metrics is a matter of debate, but I started with the overall unemployment rate from the Bureau of Labor Statistics. I’m still thinking about which other indicators to display. If anyone has ideas on that, let me know.

In terms of technology, I wanted to demo Red Hat’s OpenShift platform-as-a-service (PaaS). I’m a big believer in the PaaS concept and hope it’s how most of us will deploy applications in the coming years. I love the idea of spinning up an entire application stack and deployment environment in one command (which I’ll show below).

Also, Node.js was on my target technology list, but it was more out of curiosity than anything else. I doubt Node.js is the best fit for this kind of application right now, but I was pleasantly surprised by the number and capabilities of Node.js libraries out there already, especially Express, Zombie, and the Jade template engine.

Design

The application is designed to have two kinds of HTTP routes defined. One is for the dashboard itself. The other is a set of data services that supply the JSON data that feeds the dashboard. Not all sources of economic data have that data exposed in a convenient format or API, so creating services for those sources is a potentially useful byproduct of this project. If I ever need to create a different kind of client (mobile, for example), I can reuse those services.

Creating a New OpenShift Application

Creating a new application on OpenShift is easy. If you haven’t already, sign up for an account on OpenShift and get your SSH keys set up. OpenShift offers various platform types (JBoss AS 7, PHP, Ruby, Python, Perl, Node) and includes a way to build your own cartridges to support other platforms. To create this application on a Node.js stack, either use the web management console or install the OpenShift command line tools and type:

rhc app create -a econstats -t nodejs-0.6 -l [OpenShift username]

OpenShift will create and configure a ready-to-go application stack, named econstats, for Node.js 0.6. It will also create a remote Git repository and local clone for your application code. Jump into the just-created local econstats folder and edit server.js as follows…

Create the JSON Data Service

The first step is to create the JSON data service that feeds the dashboard. I use a headless browser library called Zombie.js to scrape the data from the BLS site:

 
var express = require("express");
var zombie = require("zombie");

var retrieveUnemployment = function(callback) {
    // Screen scrape BLS web page for latest unemployment information
    zombie.visit("http://data.bls.gov/timeseries/LNS14000000", 
        function(err, browser, status) {
        var unemploymentData = [];

        // Grab the unemployment table
        var ths = browser.querySelectorAll("table.regular-data tbody th");
        for ( var i = 0; i < ths.length; i++) {
            var unemploymentEntry = {};

            // Grab each row header and use it to set the year
            var th = ths.item(i);
            var year = th.innerHTML.trim();

            // Grab each cell in the row and use it to set the month and
            // unemployment rate
            var tds = th.parentNode.getElementsByTagName("td");
            for ( var j = 0; j < tds.length && j < 12; j++) {
                var monthData = tds.item(j).innerHTML.trim();
                if (monthData && monthData !== " ") {
                    unemploymentEntry = {
                        month : j + 1,
                        year : parseFloat(year),
                        rate : parseFloat(monthData)
                    };
                    unemploymentData.push(unemploymentEntry);
                }
            }
        }
        console.log("Retrieved unemployment data from BLS.");
        callback(unemploymentData);
    });
}

var app = express.createServer();

// Route: GET /unemployment -> Unemployment JSON data
app.get("/unemployment", function(req, res) {
    retrieveUnemployment(function(unemploymentData) {
        res.json(unemploymentData);
    });
});

// Get the environment variables we need.
var ipaddr = process.env.OPENSHIFT_INTERNAL_IP || "127.0.0.1";
var port = process.env.OPENSHIFT_INTERNAL_PORT || "3000";

// And start the app on that interface (and port).
app.listen(port, ipaddr, function() {
    console.log('%s: Node server started on %s:%d ...', Date(Date.now()),
            ipaddr, port);
});

The call to zombie.visit loads the page and hands back the browser context in the callback. browser.querySelectorAll retrieves the table header cells to grab the year. th.parentNode.getElementsByTagName retrieves the cells for the data, which are pushed onto an array called unemploymentData.

Routing is handled by Express.js. Express is a lightweight web application library for Node inspired by the excellent Sinatra library for Ruby. It is a non-MVC way to publish simple web applications and APIs. Define a route (URL pattern and HTTP method) and a response, create and start the server, and you’re up and running.

I simply define an Express server and use app.get to create a URL route for the unemployment data (/unemployment). A callback sets the content type and the content on the response. Then I bind the server to the port using a call to app.listen.

A couple of OpenShift environment caveats: The IP address and port need to use OpenShift environment variables to work properly in that environment. Also, to make this all work, we need Zombie.js installed in the OpenShift environment (Express is already installed). To add Zombie to OpenShift, we edit the project folder’s dependency list file, deplist.txt, adding the following line:

zombie@0.12.13

Commit and push the changes to OpenShift to run it in the cloud:

git commit -a -m "Adding route to provide unemployment data scraped from BLS site"
git push origin master

Visit http://econstats-[your OpenShift domain name].rhcloud.com/unemployment to view the raw JSON unemployment data that will serve as the foundation for the dashboard chart.