Tuesday, July 31, 2012

Got RUM (Real User Monitoring)?

This year I attended the O'Reilly Velocity Conference and had a very good experience. Before I continue, let me give a little of my background. Most of the conferences I have attended have been for developers or for a specific language. The developer conferences all degrade into one large, endless commercial. The programming language conferences turn into juvenile language bashing or a religious revival. It takes a lot of patience to glean valuable insights from those conferences, much like wringing water from a dry sponge. Velocity, on the other hand, was like drinking from a fire hose. It is going to take months to get through all the valuable information that was available. One topic stood out, almost ominously: real user monitoring, the subject of this blog entry, or RUM as I'll refer to it from here on.

What is RUM?

RUM is a passive technology used for performance metrics and monitoring. A simple definition is that it records all of a user's interactions with a website. The user creating those interactions could be another website, a robot, or a human. RUM is passive in that the collecting device gathers web traffic without having any effect on the site. This hasn't always been true, but the technology has improved to the point where there is no excuse not to have it. Passive monitoring differs from other approaches, such as synthetic tests and automated web browsers, in that it relies on actual inbound and outbound web traffic to take its measurements.

Why use RUM?

The performance community has been preaching for years that site owners need to use real end-user monitoring tools, like Webpagetest.org, to get a real-world picture of performance. Just because a test was successful doesn't mean users aren't experiencing problems; for example:

  • The user could be on a different browser than the test system.
  • The user may be accessing a portion of the site that is not being tested.
  • The user may be following a navigation path that was not anticipated.
  • An outage could have been so brief that it occurred between two tests.
  • A user's input data could cause the site to behave erratically.
  • In a load-balanced situation, a user could hit a failed component while the synthetic test hits a working one.

There are countless ways a site can be broken yet still working, or at least hobbling along. As I have experienced in my career, all the monitors can be green while the user experience is horrible. RUM is a collection of technologies that capture, analyze, and report a site's performance and availability from an actual visitor's perspective. RUM may involve sniffing a network connection, adding JavaScript to web pages, installing agents on boxes, or any combination thereof.
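
To make "adding JavaScript to web pages" concrete, here is a minimal sketch of the kind of measurement a RUM script takes. It uses the W3C Navigation Timing API (window.performance.timing, available in most current browsers) and reports the number back with a simple image request. The /rum-beacon path is a made-up name; point it at whatever collector you run.

    // A bare-bones RUM beacon: measure the page load as the real user's
    // browser experienced it, then report it back to the server.
    window.addEventListener('load', function () {
      // Wait one tick so loadEventEnd has been populated.
      setTimeout(function () {
        var t = window.performance && window.performance.timing;
        if (!t) { return; }  // older browser: no Navigation Timing support
        var pageLoadMs = t.loadEventEnd - t.navigationStart;
        var img = new Image();  // fire-and-forget GET, no effect on the page
        img.src = '/rum-beacon?load=' + pageLoadMs +
                  '&page=' + encodeURIComponent(location.pathname);
      }, 0);
    }, false);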


Simple RUM!

If you are already using Google Analytics, you are already instrumented for RUM! Take a look at the Real-Time pages for the RUM reports; they will open up a whole new vista for you.
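
If your pages carry the standard asynchronous tracking snippet (the classic ga.js version, current as of this writing), you are already collecting timings. The only pieces to change below are the UA-XXXXX-X account-ID placeholder and the optional _setSiteSpeedSampleRate call, which raises the fraction of visits that report timing data above the 1% default.

    var _gaq = _gaq || [];
    _gaq.push(['_setAccount', 'UA-XXXXX-X']);    // your account ID goes here
    _gaq.push(['_setSiteSpeedSampleRate', 10]);  // optional: time 10% of visits
    _gaq.push(['_trackPageview']);

    (function() {
      // Load ga.js asynchronously so measurement never blocks rendering.
      var ga = document.createElement('script');
      ga.type = 'text/javascript';
      ga.async = true;
      ga.src = ('https:' == document.location.protocol ?
                'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
      var s = document.getElementsByTagName('script')[0];
      s.parentNode.insertBefore(ga, s);
    })();

Another simple implementation is boomerang.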

boomerang always comes back, except when it hits something.

Boomerang (https://github.com/lognormal/boomerang) is a piece of JavaScript that you insert into your pages; it captures measurements for a whole range of performance characteristics from an actual user's interactions. I found Google Analytics' Real-Time reporting to be the easiest way to get RUM up and running, with boomerang.js second.
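
Getting boomerang running is nearly as small. Here is a minimal sketch, assuming you serve boomerang.js yourself and have a collector listening at /beacon (both paths are placeholders for your own setup):

    <script src="/js/boomerang.js"></script>
    <script>
      // Tell boomerang where to send its measurements. Each page view
      // fires a beacon carrying timings gathered in the user's browser.
      BOOMR.init({
        beacon_url: "/beacon"
      });
    </script>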


Note: Here is how I got my nickname/alias 'oldstinger'. In high school I played the strong safety/outside linebacker position and was known for my hard hits. One day my coach accidentally slurred my last name into 'stingham' instead of 'stringham', and a friend said, "Yeah, the hits sting." Thus 'stinger' came into being. Now that I'm getting grey hairs it has become 'oldstinger'. :)

Friday, July 6, 2012

My Agile IT Experience Report At AgileRoots2012

Late in June, I presented "Agile IT/Ops: A One Year Checkup" at AgileRoots. IT and DevOps seemed to be popular topics; I heard a lot about them in other presentations and in the hallway track.

Specific talks that stood out (and about which I'll post more later) included Agile to the Rescue (a CIO's view of doing IT agilely) and Agile 2.0 (which touched on a lot of DevOps and IT themes). Also scheduled up against my experience report was Outgrowing the Cloud by my friend Mike Moore; I think that one will be available on the web soon, so look for it.

So, what did I talk about? Good question. Here's the overview:

A little background

Our team of nine is part of a bigger group of about 30 (which includes data center folks; DBAs; infrastructure, network, and storage engineers; change management; and our sys-admin/Ops team). That group is, in turn, part of a much larger development organization.

Not only do we have a big group with diverse charters, we're also geographically spread out. We have people in two different office spaces about an hour apart. We also have three data centers spread across two states.

Several years ago, everyone else in the department went to a series of Scrum training events and became 'Agile'. At that point, the powers that be decided that IT couldn't be done within an Agile process, so we kept on doing things 'the old fashioned way'. Eventually, the dissonance became too much and we started exploring a move to Agile. That's where our story picks up.

Jun-Aug

The first quarter of our agile conversion was marked by Painful Planning and Guerilla Agile. Our group's management team met to figure out how to make the move. Since I'm an Agile methodology junkie (well-read on the topic, but only lightly seasoned in practice), I was pulled in as an advisor. We ran several planning exercises to see how the move would look (it wasn't pretty) and eventually decided not to make it. Honestly, I think I spent too much time on the mechanics and not enough time getting into the philosophy; that probably led to some of the problems we ran into downstream.

My team manager and I decided there was still a chance though, so we went underground and started running an Agile Ops Team. We focused a lot on a training-trying-repeat cycle. At this point, I started to slip more philosophy into the mix.

I was also careful to be really explicit about what we were doing and how we were doing it. For example, in our retrospectives, I would start out with a review of the stages of a retrospective, then announce which stage we were moving into and what we were trying to do in it. This helped build some solid institutional knowledge among the members of the team.

Sep-Nov

Our second quarter felt like we were Getting Into A Groove (we were hitting the "Norming" stage of Tuckman's stages of group development). This was also when we hit our Agile Mandate. You've probably heard of the Agile Manifesto; we got a top-down directive that said "Everyone in the department will now be Agile." We didn't all have to make the move at once, but the writing was on the wall.

Unfortunately, this quarter ended on the low-water mark of our year: a management re-org that really shook up our group. Our manager moved to another group, a PM took over responsibility for Agile in the group, and a variety of external changes reverberated through the team as well.

Dec-Feb

Soon after the re-org, we went through some additional changes that sent us back to Forming and Storming. We made it through December operating about where we had been, and started January with a full-day, off-site six-month retrospective. From my perspective, this was the highlight of the year. Our team was really humming at this point.

Then we came in for our six-month planning session, which was pre-empted by the news that we were going to start "doing agile" as a group. This was rough: we had people in the group with very different skill (and interest) levels, we were now a much larger group trying to meet together, and we had to deal with meetings that crossed time zones and pulled people in by phone.

We had a variety of mis-steps in this quarter: some people ducked meetings to "work on the important stuff"; we often lost traction on improvement ideas that came out of retrospectives; and morale suffered as we learned just how bad our estimates and capacity planning were.

Mar-May

In the final quarter of our year, we got back to Norming. There were still some obstacles, but there were also some lessons learned and some wins for us.

We moved from the product we were using to track requests and problems (a bug tracker: not optimal, but entrenched) to an agile tool. This created more heat than light at first while we went through some growing pains, but it also helped pull the group together: we identified the pain points in a retrospective and came up with some ways to work through the rough patches.

The group also decided to cut back on the time spent in retrospectives and hold them every other iteration. We're coming up on the end of this experiment, so we'll see if we stick to it or not.

We also had one team split back out and move to more of a KanBan or "continuous flow" model. This provoked a lot of discussion on our team, as we feel it might be a good direction for us as well.

Our team decided to start applying retrospectives to our operational work as well as our iterations. We met each week for "The Week That Was" (imagine it read in a booming, deep radio-announcer voice), where we would discuss what had happened over the last week, what we could learn from it, and what we were going to do about it.

Today & Tomorrow

Since my timeline ended in May, I also talked a bit about where things are now and where they're headed. Three things really stand out:

  1. We're breaking back out to the team level and reporting up to the group, to make our meetings more manageable and effective.
  2. We're scheduling an annual retrospective and planning meeting as a team.
  3. We're going to experiment with KanBan ourselves.

Wrapping Up

Just before my presentation, my son went on a two week canoeing trip. So this next bit is an homage to him. The tradition in the program he attended was to hold a nightly reflection focusing on Wet Socks (things that didn't go well), Dry Socks (things that went well), and Gold Bond (things that could be done to make things better).

Wet Socks

  • our group was too big and too dispersed to be effective
  • we had too many disparate charters
  • there was no real product owner, so everyone tried to be one (to paraphrase Syndrome: "when everyone is a product owner, no one is a product owner")
  • the reorg
  • cutting retrospectives to every other iteration (in my opinion)

Dry Socks

  • we created a lot of transparency internally and externally
  • we held ourselves and each other accountable
  • we built a lot of team unity
  • just deciding to do it was a big win
  • the 6 month retrospective
  • starting "The Week That Was"

Gold Bond

  • KanBan
  • integrating Ops and Iteration retrospectives more completely
  • going back to team level meetings

Recommendations

If you're thinking about trying to run your IT shop using Agile principles, do it! It might be hard, but it can work.

Look at continuous flow from the get-go. We haven't gotten there yet, but we all think it will be a good move for us.

Train all the time. Make every meeting and communication a chance to do a little mini training. Why are we doing this? What does this mean? How can we improve?

Use your retrospectives wisely. Savor the wins, examine the pain points, and keep improving.

Be prepared for hard times. They will come. If you're careful and thoughtful, they'll make you better. If you just grit your teeth and endure them, they'll probably circle back and hit you again.

Keep records and use your metrics. This will give you a better sense of perspective, and ammunition to fight off the occasional attempt to shut things down.