Wednesday, March 7, 2012

Configuration Management, more important than you think

Let me start this by saying that Configuration Management, in my terminology, means the practice of maintaining and managing source code, builds, unit tests and releases – essentially the management of all development and release artefacts and not something else. 

Personally I don’t consider myself a Configuration Management person. I have done the role several times, even for teams of more than 50 developers, but it is not something that I aspire to do. What I do have for CM, however, is a deep appreciation of just how vital it is to the success of delivering an IT project, in particular one that involves offsite development, multiple releases and on-going change management. For these reasons most people who work with me know that I evangelise the importance of CM a lot. You can chalk that up to a lot of experience on some very complex IT projects, where I realised pretty quickly how essential it was to have a functioning CM practice that ensured everything that got developed also got built, tested and deployed in a methodical, efficient and systematic manner. But this post is not going to be a lecture on the finer points of branching, merging, build automation, continuous integration and so on; there’s more than enough about that on the Internet to keep anyone amused for months. Instead I’m going to have a rant around the question of why it is sometimes still done so poorly in the IT industry.

First up let me say that not every client I have worked at, or company I have worked for, does it badly or undervalues it, but a decent number still do. Over the years I’ve seen an alarming number of projects where a source code repository, a couple of branching diagrams and maybe, just maybe, a few install scripts is as far as the thought process goes with CM. Worse still, the implementation and maintenance of it is usually done by developers, typically a few “heroes” who know how it should all hang together, with little consideration given to anything except getting the release out the door. In short, it usually fails once the project gets big. It’s the same old story: the business gets more funding or wants more features, and so more releases are needed. All of a sudden the development team realises that the solution now needs to be deployed across multiple servers to cope with the higher demands. The architect gets consulted and updates the architecture, which in turn increases the deployment complexity, and all of that now needs to be incorporated into the builds, deployment scripts and installers. In a panic, code branches are created left, right and centre to cope with multiple releases and development teams. Merges then get made, often in haste, overwriting code already produced and causing more delays, and down the slippery slope the project goes. An even worse scenario is when you have multi-site development teams, especially if they are located in another country or time zone, because the chaos spreads to these areas as well, compounding all the problems. By the end your source code repository resembles a weeping willow tree.

The trunk is in there somewhere!
Not good. 

So why do these problems always happen? Could it be the architects not catering for it in the estimates or the finalising of the project plan? Possibly, but it’s not like architects weren’t developers once, so they would know how tight the leash needs to be to keep them in check (if an architect was never a developer, by the way, then run a mile – I’ve had the pleasure of working with these unusual breeds of people before and they are dangerous). Any decent architect would know that the quality of the code that will support the solution is inextricably tied to the Development Leads overseeing the work and the CM resource (or team) managing the process of its production, compilation and deployment, not to mention quality assurance, which is a shared responsibility. As such, architects should always be seeking to understand how the Development Leads are managing, testing and deploying the code that builds the solution they spent weeks or months of long hours finessing to get right. If you’re not doing this then you need to question how seriously you’re taking the work you are doing. But if the fault cannot be traced back to the architect then where does it lie? Experience suggests that it is usually a result of budgeting and estimation not catering for it. Typically this occurs because the Project Managers or Business Owners (who often aren’t technical and hence don’t understand or appreciate the critical value of it) push back on getting permanent CM resources. Other times it is the Dev Leads who, under pressure to cut down their estimates, downgrade it to a small task that a developer can manage: “just create the branches, get a build running and that’s all we need” has been a common saying, which is really another way to say “we’ll worry about it later, let’s win the work first”.

To put it bluntly, if you don’t do it on any project where more than 5 developers are working (and extrapolate that out to the number of testers, analysts etc.), then you are going to pay a heavy price for it when the project grows in size.

And now for a real-world example to illustrate the points made so far:

A few years ago a company I was working for was tasked to build a very large IT system. It had numerous function points and integration, workflow and batch processing requirements. The user interfaces were highly complex and the design work required a large amount of modelling. In summary it was not your average 6 month project: it was estimated to take several years to complete, and the code was to be developed offshore by a team numbering nearly 70 developers at its peak. The offshore team were told to get a CM resource on board and, not finding one readily available, hired a senior technical lead to do the job. The Tech Lead did not understand CM, so he had not sat down and planned the CM work effectively and had not mandated any standards around branching, merging, versioning, builds and testing. After a few months he was buried in a huge spaghetti mess of build and deployment scripts, numerous branches and an incoherent folder structure. Lacking adequate and enforced standards, the developers began running rampant: adding compiled assemblies into the source code system, creating numerous folders and sub-branches, and diluting the quality of the code (as there were no QA checking tools installed). No-one was sure how it was all meant to work, so builds and environment installs took days to complete. End result: builds were failing everywhere and the onshore team were raising concerns. Offshore decided that the problem was resourcing and so another resource was put onto the team. This time it was a junior developer. The junior developer, not knowing what she was doing, just made the mess worse, so another junior dev was brought in and the problem grew worse again. At this point you could assume that it’s easy to point the finger at the offshore team but that would not be fair.
First, the onshore team themselves were so overloaded with work that they did not have the time to monitor the CM process; they just saw the builds failing. Second, the resources tasked to do the CM work were not experienced; they were just told to get it done. In other words: management failure. Eventually onshore resources were sent offshore to get the project back into shape, and the penny dropped when they arrived and saw the development process in action. I got brought onto the project, initially to do an architectural assessment, but then I started noticing the CM issues as well, got called into a few meetings, and it was determined I should try and help sort it out. My first step was to immediately draft a set of standards that would become the bible for managing the CM work. This was distributed across all the development teams, and it was made clear that these standards now applied: anyone caught breaking the rules would have their work removed, and repeat offenders could be removed from the project themselves. I then halted all code check-ins, cleaned the source code repository (freeing up GBs of disk space) and set about a wholesale restructuring of the builds and scripts. Eight months and a trip overseas to help the team institute the standards later, we had a functioning CM practice. It would be fair to say that CM was estimated for a lot better in future projects after this.

So resourcing for CM is harder than some people realise

Configuration Management is not easy. Finding good resources can therefore be very hard, and companies that have them, particularly large organisations that do a lot of development work, hang onto them tightly. An effective CM person needs to be a strong team leader, possess good analytical skills, have an appreciation for process and be able to document and articulate, with crystal clarity, the expectations and standards that need to be met so that the software is stored, built and tested in a consistent and efficient manner. And yes, they also need to be feared by developers. I’m not a believer in using fear as an educator, but in the world of IT it does tend to be rather effective, at least in this discipline. CM people need to know architecture. They don’t need to recite TOGAF or Zachman to the nth degree, grow beards and wear bow-ties to work, but they should be able to understand how a solution must hang together from a technical viewpoint. This ensures that the scripts to configure the OS, databases, web servers and so on align with the architecture that has been designed, so the solution can be deployed properly. They also need to know technology, particularly the languages used by the development teams, so that the builds, unit tests, reports, code management tools and all other manner of CM artefacts are put together in a way that functions and integrates correctly.

Finally, CM people must evangelise process automation. This does not just extend to continuously integrated builds and integration of reports either. Source code repositories and build and deployment scripts should be constantly checking that code being checked in does not violate quality rules and allow bad code to pollute the body of work. There is no excuse not to do this nowadays. Source Code Management systems are now so sophisticated they can manage an entire project from design through development, unit testing, system testing, system integration testing, user acceptance testing and deployment. There is also a plethora of third party and free tools available online (big thumbs up here for CruiseControl and NDepend) that bolt onto these products and can perform all manner of code quality checking, both pre- and post-compilation. The aim should always be to use these as much as possible to create a watertight Source Code Management system, with detailed reporting, so that quality is constantly being enforced and monitored with minimal human intervention.
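
To make the idea of an automated check-in gate a little more concrete, here is a minimal sketch in Python. It is not taken from any particular Source Code Management product: the rule set (`BLOCKED_EXTENSIONS`, `BLOCKED_FOLDERS`) and the function names are assumptions invented for illustration, and a real setup would wire something like this into whatever pre-check-in hook your repository supports. It simply rejects the kind of compiled assemblies and build output folders that polluted the repository in the example earlier.

```python
from pathlib import PurePosixPath

# Hypothetical rule set: file extensions treated as build output, not source.
BLOCKED_EXTENSIONS = {".dll", ".exe", ".pdb", ".obj", ".class", ".jar"}
# Hypothetical folder names that usually hold build output.
BLOCKED_FOLDERS = {"bin", "obj"}


def find_violations(paths):
    """Return (path, reason) pairs for files that should not be checked in."""
    violations = []
    for raw in paths:
        # Normalise Windows separators so one rule set covers both styles.
        path = PurePosixPath(raw.replace("\\", "/"))
        folders = {part.lower() for part in path.parts[:-1]}
        if path.suffix.lower() in BLOCKED_EXTENSIONS:
            violations.append((raw, "compiled artefact"))
        elif BLOCKED_FOLDERS & folders:
            violations.append((raw, "build output folder"))
    return violations


def check_in_allowed(paths):
    """A check-in passes the gate only if no path breaks a rule."""
    return not find_violations(paths)


if __name__ == "__main__":
    # Example change set; the paths are made up for illustration.
    change_set = ["src/Billing/Invoice.cs", "src/Billing/bin/Debug/Billing.dll"]
    for path, reason in find_violations(change_set):
        print(f"REJECTED {path}: {reason}")
```

The point of a gate like this is that the standard is enforced by the system rather than by a person remembering to look, which is exactly the "minimal human intervention" the tooling should be aiming for.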