I am currently attending GOTO Copenhagen for the first time.
I decided to spend most of the first day in the ‘DevOps and Continuous Delivery’ track as this is the area that’s piquing my interest most at the moment.
I’ve always had large operations components in my development roles, and I’m really enjoying learning the best practices here in terms of closing the gap between development and operations, agile infrastructure, and continuous delivery.
I’ll begin by summarising the talks that I heard today:
Patrick Debois – Using Monitoring and Metrics To Learn In Development
Patrick’s talk began by discussing DevOps at a high level – shortening feedback cycles between development and operations, speeding up releases, and infrastructure as code.
However, the meat of this talk was on a less discussed aspect of DevOps – capturing and using metrics to proactively monitor how our application is performing. These are traditionally operations tools, but we can potentially open them up to development and the wider business so that a wider group shares ownership of the production application.
There is clearly a vast number of tools in this space – Nagios, New Relic, StatsD, Munin, CollectD, Ganglia etc.
Without being disparaging to the above tools, the problem is that monitoring tools are not currently up to scratch for the needs of DevOps. There’s a #monitoringsucks hashtag floating around on Twitter which is dedicated to describing the shortcomings of the current monitoring space – missing APIs, poor user interfaces, static configuration, a requirement for restarts etc. All of these add up to the fact that innovation is required if we are to use metrics and monitoring as a better communication tool between development and ops.
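To make the idea of developers emitting their own application metrics a bit more concrete, here is a minimal sketch of my own rather than anything shown in the talk. It assumes a StatsD-style daemon listening on UDP port 8125 on localhost, and the metric names are hypothetical. It builds the plain-text lines that StatsD accepts for counters and timers and fires them off over UDP.

```python
import socket


def format_statsd_counter(name: str, value: int = 1) -> bytes:
    """Build a counter datagram in the StatsD text format: <name>:<value>|c"""
    return f"{name}:{value}|c".encode("ascii")


def format_statsd_timer(name: str, millis: float) -> bytes:
    """Build a timer datagram in the StatsD text format: <name>:<millis>|ms"""
    return f"{name}:{millis:g}|ms".encode("ascii")


def send_metric(payload: bytes, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send one metric datagram over UDP.

    The host and port are assumptions (8125 is the usual StatsD default).
    UDP is fire-and-forget, so the application does not wait for a reply.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


if __name__ == "__main__":
    # Hypothetical metric names, purely for illustration.
    send_metric(format_statsd_counter("checkout.orders_placed"))
    send_metric(format_statsd_timer("checkout.response_time", 12.5))
```

The appeal of this style is that instrumenting a feature is a one-line call in application code, so metrics can be owned by the developers who own the feature rather than bolted on by operations afterwards.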
Update – slides for the talk are here.
Yoav Abrahami – Going CI/CD, the Wix.com experience
Wix.com is a platform that allows people to design and host their own high quality websites. With 2 million live websites running against an old Flash designer, Wix.com recently had to take steps to rewrite the platform in HTML5, and also to move towards a continuous integration and continuous deployment way of working in order to deliver faster and more reliably.
The guys at Wix try to deliver in really small steps – the minimum viable features that can go out. They empower developers to own features from start to finish, and will generally push features into production multiple times per day.
The system is very highly decomposed and loosely coupled, and they allow versions of components within the system to get out of step with each other. The implication of this is that they like to maintain forward and backward compatibility for 2 or 3 versions. This sounds like a combinatorial nightmare to me from a testing and reliability perspective, but they do put steps in place to reduce this.
Yoav raised the interesting point that continuous delivery is slightly at odds with Scrum. Iterations are potentially too slow, and standups become ‘What did we release yesterday? What did we release today?’ This is an interesting question for the future – how process is impacted by this new style of continuous delivery.
He also made the point that doing continuous delivery requires engagement from the whole company in order to understand its implications, pros and cons. Going to see a company that had successfully applied it brought a skeptical management around to the idea.
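I don’t know how Wix actually implements this, but one common way to keep components compatible across a few versions is the ‘tolerant reader’ approach: ignore fields you don’t recognise (so newer senders don’t break you) and default fields that are missing (so older senders don’t break you). Below is a minimal sketch under that assumption; the message shape and field names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class SiteRequest:
    site_id: str
    # Hypothetical field introduced in a later version of the message.
    template: str = "default"


def parse_site_request(message: dict) -> SiteRequest:
    """Read a message tolerantly.

    Backward compatible: an older sender that omits 'template' gets a default.
    Forward compatible: fields added by a newer sender are simply ignored.
    """
    return SiteRequest(
        site_id=message["site_id"],
        template=message.get("template", "default"),
    )


if __name__ == "__main__":
    old_style = {"site_id": "abc"}
    new_style = {"site_id": "abc", "template": "shop", "colour_scheme": "dark"}
    print(parse_site_request(old_style))
    print(parse_site_request(new_style))
```

Each component still needs tests against the neighbouring versions it claims to support, which is where the combinatorial worry comes from, but reading tolerantly keeps the number of breaking combinations down.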
Itai Hochman – Continuous Deployment at Outbrain
I only caught the tail end of this talk, but picked up some interesting comments and ideas from the discussion.
For instance, Itai’s organisation still had a QA team, but they had taken steps to ensure that the QA team were not viewed as gatekeepers and that developers were the ones with final responsibility for delivering working features. QA as a quality gatekeeper with sole responsibility for reliability is an anti-pattern that we need to avoid in all software delivery teams.
Itai also discussed his preference for emergent architecture rather than having separate architects dictating architecture as of old. I’ve heard this opinion many times in the agile world, but again it’s interesting how the role of the architect intersects with continuous delivery. The architect perhaps had a much greater role to play with big design up front than when delivering multiple times per day.
Jeffrey Fredrick – A Leap From Agile To DevOps
This was a very open and honest conversation about some of the challenges Jeffrey had found in his current organisation with regard to reliability and bringing operations in house for the first time.
One thing Jeffrey observed was how the development organisation had evolved and improved over time, but when they put the operations organisation into place, they didn’t apply any of this learning, and started making the same mistakes all over again. Simple things such as source code control, iterative delivery, pairing and testing are becoming quite mature in the development world but are only sporadically applied in the operations world.
The talk also discussed how they were using root cause analysis on any production incidents to try to understand the root of the problems that had caused the incident to occur. Over time, patterns in these root cause analyses would come to light, offering real insight into underlying organisational problems and what was stopping them from addressing these problems.
It sounds as though they had a lot of success in bringing the business around to the idea of DevOps. A lot of development teams and product owners are stuck in the mindset of delivering change, but we need to make the case for reliability and robustness and work with product management to get this kind of maintenance and monitoring factored in and given priority.
Michael Nygard – Disband The Deployment Army
The talk started with a humorous picture of around 40 people sweating in an office who had been involved in a deployment at 1 am on a weekend morning.
Luckily this is an extreme example of a deployment army, but deployments and releases are often complex and slightly scary events that can involve far too many people. There is a bit of a vicious feedback loop at work here in that the harder releases are, the longer we put them off. Then they end up growing even bigger and scarier.
The meat of this talk was explaining how we can move away from this by implementing continuous or at least much more frequent deployments and releases.
If something is painful, do it more often!
Michael accepted that more frequent delivery and deployment could result in slightly more production incidents, but the idea is to reduce the mean time to identify and then resolve those issues, such that the cost is very small relative to the business benefits gained by frequent releases.
Most people would obviously say that zero production incidents should be the goal, but it’s an interesting point that if we can put a price on production incidents, and then reduce their probability to negligible levels, perhaps they are a fair price to pay in order to remove the overhead of heavyweight releases.
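To see how putting a price on incidents might work, here is a back-of-the-envelope sketch of my own. All of the figures are made up for illustration and none come from the talk. It treats the expected cost over a period as the number of releases, times the chance that a release causes an incident, times the hours taken to spot and fix it, times the cost per hour of the incident.

```python
def expected_incident_cost(
    releases: int,
    incident_probability: float,
    hours_to_recover: float,
    cost_per_hour: float,
) -> float:
    """Expected cost of production incidents over a period.

    expected incidents = releases * incident_probability
    expected cost      = expected incidents * hours_to_recover * cost_per_hour
    """
    expected_incidents = releases * incident_probability
    return expected_incidents * hours_to_recover * cost_per_hour


# Illustrative numbers only: one large, risky release that is slow to debug...
big_release = expected_incident_cost(
    releases=1, incident_probability=0.5, hours_to_recover=8, cost_per_hour=1000
)
# ...versus many small releases that each carry less risk and are quick to fix.
frequent_releases = expected_incident_cost(
    releases=16, incident_probability=0.125, hours_to_recover=0.5, cost_per_hour=1000
)

if __name__ == "__main__":
    print(big_release, frequent_releases)
```

With these invented numbers the frequent releases produce more incidents in expectation (2 rather than 0.5), yet the expected cost comes out lower (1000 against 4000) because each incident is found and fixed so much faster. The conclusion depends entirely on the inputs, which is rather the point: the trade-off can be measured instead of argued about.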
This was a really interesting first day and I continue to be convinced of the value of DevOps and continuous delivery as I hear more real world experience.
I’ve heard the continuous delivery pitch a number of times now. Feature flags, deploy from main, extensive acceptance tests etc. This is a slightly predictable talk now. What’s more interesting though is how the organisations have fared in their transition to doing this – cultural changes and the end result in terms of observed quality and ability to deliver. It’s a self-selecting group talking about this, but most of them seem to have had success with it and, dare I say, found it easier than you might imagine.
Monitoring and this idea of an immune system for the application is a slightly fresher idea with a lot of potential. I like the idea of using automated tests to build confidence pre-release, and then using suites of metrics post-release to build confidence in the change. I agree with the point that there is huge scope for innovation in the tools space, but as that comes, high quality application metrics could really be a line of defence almost as useful as tests and QA.
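As a rough illustration of what using metrics to build confidence post-release could look like, here is a small sketch of my own that compares the error rate after a release against a baseline taken before it. The function name, the counts and the tolerance are all hypothetical, and a real check would need to think about sample sizes and noise.

```python
def release_looks_healthy(
    baseline_errors: int,
    baseline_requests: int,
    current_errors: int,
    current_requests: int,
    tolerance: float = 0.01,
) -> bool:
    """Return True if the post-release error rate is within `tolerance`
    (an absolute difference in rate) of the pre-release baseline."""
    if baseline_requests <= 0 or current_requests <= 0:
        raise ValueError("need at least one request in each window")
    baseline_rate = baseline_errors / baseline_requests
    current_rate = current_errors / current_requests
    return current_rate <= baseline_rate + tolerance


if __name__ == "__main__":
    # Hypothetical counts from the windows before and after a release.
    print(release_looks_healthy(10, 1000, 12, 1000))  # small wobble
    print(release_looks_healthy(10, 1000, 50, 1000))  # clear regression
```

A check like this could run automatically a few minutes after each deployment and raise an alert, or trigger a rollback, when it fails.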