Saturday, May 3, 2014

The offshoring cost delusion

At face value it is always going to be more cost-effective to perform labour, irrespective of the industry, in countries where the labour cost is cheaper. It’s a fact that is hard to argue against, almost impossible really, and it has been happening for years across a large number of industries such as manufacturing and banking. If you can charge out a resource at a third of the price of an onshore resource then it is a compelling proposition, not just for you but for your customers as well. With respect to IT, that proposition usually holds for offshoring managed service work, packaged software implementations, helpdesk and call centre operations – the kind of work that is governed by prescriptive processes and accompanied by numerous repeatable, measurable steps, i.e. no need to “think” so much as to just “do”. That’s the proposal being sold by companies that can offshore and you really can’t argue with it. But when that same line is used for bespoke software development the situation is different. Very different.

Bespoke application development requires a different set of skills and qualities – ones that aren’t just confined to an organisation’s or team’s competency with the set of technologies required to deliver a solution. It requires you to think laterally, logically, and about the solution being built in its entirety. You need to be able to think outside the box to get things done, whether as a developer, tester or architect – there are no distinctions. There are always grey areas and hurdles to overcome, and that means it can be unpredictable. A lot of what governs and underpins your ability to deliver successfully, therefore, is the set of processes that prescribe how you identify, resolve and future-proof these areas of uncertainty.

In the bad old days of waterfall this uncertainty would always freak people out. Because you released in a “big bang”, the architecture and design tended to be gold-plated inside and out from the start (whether it was necessary or not), and it would be accompanied by endless reams of documentation that would get reviewed once and then dropped somewhere in a CMS, never to see the light of day again. The really big waterfall projects would hardly ever make a delivery date because of the big bang approach to deployments: getting everything installed correctly in the testing and production environments could waste days or weeks of configuration and integration effort. Then Agile came along and we started to see the light that little bit better by releasing smaller and building integrity into the product.

When application development work started to get moved offshore in Australia around the early 2000s, thanks to a big push from the banks and the big telcos, every problem we faced doing traditional waterfall got exacerbated. Despite the best efforts of the onshore teams, the offshore teams would struggle to understand the requirements due to the highly complex nature of the systems that had been produced. Architecture gold-plating (a response to big bang unpredictability), and the need to wade through all the documentation to understand what the hell was going on, confused people a great deal. It would take months to understand it all, yet the offshore teams were (stupidly) expected to come up to speed in a matter of weeks – as promised by the glittering PowerPoint presentations of the high-end consultants. They never did of course – no-one could – and a big part of the “sell” of these big offshoring companies, their sophisticated training centres that claimed they could train up resources to meet demand anywhere in the world, sounds as ridiculous now as it did then.



I worked on a project such as this once; here is my story

The architecture of the system I worked on was highly complex. Ridiculously so. It was probably the worst candidate for offshoring that I have seen in all my years working in IT. When the system was first transitioned offshore it collapsed in spectacular fashion because the offshore teams had neither the domain knowledge nor the technology experience to understand the system and change, enhance and support it correctly. This caused all the delivery dates for new initiatives to be missed, everyone worked late nights, and after a few months the onshore resources no longer even tried to hide their hatred of the company, management and the poor offshore resources themselves (who were doing their best). The client was furious at the missed dates and the costs blew out badly. In this early stage numerous people had to travel to India to rescue the project, at great cost to the company. In short, every promise of improved quality, productivity and cost was broken.
The onshore teams got so badly burnt by the late nights making up for offshore’s mistakes that workplace morale plummeted. To make matters worse, the company started cutting costs to compensate for the project blowouts as schedules were missed and the money stopped coming in. Even beers on Fridays were cancelled (in an Australian workforce that is like inviting mutiny upon yourself), and morale got worse and worse.

Down the rabbit hole of fail we go....

Because the offshore teams were failing so badly, causing much finger pointing and blaming on both sides of the fence, the architects and designers (under a considerable amount of pressure) started to gold-plate their systems and designs even further to remove any traces of ambiguity – and to cover themselves in the face of any management wrath. The designs became so complex that one of the architects remarked years later, “we made it so generic it actually became unusable”! As a result, estimates to deliver the software doubled, even tripled, to compensate for the delivery problems, as no-one was crazy enough to make promises based on pre-offshoring estimates.

The client got even more upset at the cost blowouts (because they had been promised things would get cheaper), and managers, who had already worn out any goodwill with their architects and tech leads, could not get anyone to budge on estimates. So under pressure they started over-promising to the clients, hoping and praying things would improve.

And of course they didn’t.

Now in a total panic, senior company management implemented more knee-jerk responses to stem the cost bleeding because they were losing control. To appease investors alarmed at the spiralling costs they made even more cutbacks and produced lots of spin, both internally and externally, to paint a better picture of the situation, “sell” the new offshore delivery model and pretend things were going well. This was in total contrast to the experience on the ground and made the whole situation even worse, as it bred a massive amount of contempt amongst those doing the work. Anonymous blogs even appeared that mocked the company and told many home truths, and the company scrambled to get them shut down and blocked by ISPs and network providers! The environment became incredibly toxic and hostile and people left the company in droves, fed up with how badly everything was being managed. Those that were left jumped back on the plane for extended stays in India, but then had to fight tooth and nail to claim back the expenses they incurred as the company cut more and more corners to compensate for the cost blowouts eating into its capital.

It was an absolute nightmare, mismanaged from top to bottom. The initial cost-cutting set the scene for failure, and as we went further down the spiral more cost-cutting just exacerbated the issues.



So how do you get it to work?

There is a lot of literature and some great publications now around the DevOps movement – a movement borne out of Agile to wholly integrate development and operations into a single flow-through work process that enables fast integration of code, automation of virtually everything and continuous one-click deployments, to name but a few desirable capabilities. If you aren’t DevOps capable, don’t even think about doing bespoke application delivery out of another country. For a start you can’t be as Agile as you can be onshore, so you’re already at a disadvantage when it comes to team collaboration and working closely with the business. And if offshore start spruiking cheaper labour costs as a reason not to invest up-front in DevOps practices such as test automation, code quality checking, continuously deployable pipelines and coded infrastructure – because they’ll just “assign cheaper resources to review and validate manually” – then start scanning the job classifieds, because you are going to go down.

Move it all to the cloud and automate the shit out of it

You have to invest in your offshore delivery centres, more so than onshore, and you have to do it well. For a start, avoid physical infrastructure and put all your development and testing in the cloud. Before you do that, ensure all your servers and environments are coded up, so that you can restore and rebuild entire environments at the click of a button. It is not impossible; yes, it will take time, but the long-term benefits will insure you against failure. This must be completed and in place weeks before development even starts, so the environments are ready, properly monitored, tested and cost-forecasted.
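To make “rebuild at the click of a button” a bit more concrete, here is a minimal sketch of the kind of script I mean, using the AWS Tools for PowerShell to tear down and recreate an environment from a CloudFormation template. The stack name, template path and region are hypothetical placeholders for illustration, not details from a real project.

    # Rebuild a test environment from its CloudFormation template.
    # Stack name and template path below are illustrative placeholders.
    Import-Module AWSPowerShell
    Set-DefaultAWSRegion -Region ap-southeast-2   # Sydney

    $stackName = "functional-test-env"
    $template  = Get-Content "C:\envs\functional-test.template" -Raw

    # Delete the existing stack if present, then wait for it to disappear
    if (Get-CFNStack | Where-Object { $_.StackName -eq $stackName }) {
        Remove-CFNStack -StackName $stackName -Force
        while (Get-CFNStack | Where-Object { $_.StackName -eq $stackName }) {
            Start-Sleep -Seconds 30
        }
    }

    # Recreate the environment from the source-controlled template
    New-CFNStack -StackName $stackName -TemplateBody $template

    # Poll until the stack reaches CREATE_COMPLETE (or fails)
    do {
        Start-Sleep -Seconds 60
        $status = [string](Get-CFNStack -StackName $stackName).StackStatus
        Write-Host "Stack status: $status"
    } while ($status -eq "CREATE_IN_PROGRESS")

    if ($status -ne "CREATE_COMPLETE") { throw "Environment rebuild failed: $status" }

Wire a script like this into a nightly scheduled job and “restore the whole environment” stops being a heroic exercise and becomes routine.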

Hire properly offshore

No graduates.

If the people on your delivery project do not have a minimum of 5 years technical delivery experience then do not hire them. In some countries there is a culture of “it’s okay, they got good marks, they’ll learn as they go” and in IT this theory simply does not work. You need delivery experience, you need to be technically proficient and you need to have experience working with other cultures in other countries. Good people cost money, offshore is no different, but offshore it is even more critical because you really need smart and motivated people that can think on their feet.

Automate quality checking to the nth degree

And that means everything: unit test code coverage above 90%, code metrics checking for cyclomatic complexity, lines of code, class coupling, maintainability and anything else you can throw at it. This all has to be automated on check-in. Furthermore, database script changes should be tested against database rebuilds, and selective automation test execution for critical end-to-end functions should be thrown in as well.
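As a sketch of what an automated check-in gate can look like, here is a small PowerShell example that fails a build when unit test coverage drops below the 90% bar. The report path and the XML attribute name are assumptions (they follow a common Cobertura-style “line-rate” layout), so adjust them to whatever your coverage tool actually emits.

    # Fail the check-in build if unit test coverage is below the agreed threshold.
    # The report path and 'line-rate' attribute are assumptions based on a
    # Cobertura-style coverage XML; substitute your own tool's output format.
    $threshold  = 90.0
    $reportPath = "C:\build\artifacts\coverage.xml"   # hypothetical path

    if (-not (Test-Path $reportPath)) {
        Write-Error "No coverage report found at $reportPath - failing the build."
        exit 1
    }

    [xml]$report = Get-Content $reportPath
    $lineRate    = [double]$report.coverage.'line-rate' * 100

    Write-Host ("Unit test line coverage: {0:N1}% (threshold {1}%)" -f $lineRate, $threshold)

    if ($lineRate -lt $threshold) {
        Write-Error "Coverage below threshold - rejecting check-in."
        exit 1
    }
    exit 0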

Use code reporting tools in conjunction with build and code management software as much as you can. Produce reports and get them delivered to delivery managers daily so they can see how things are tracking.

Test the hell out of it often

The full suite of automation tests should run every night. As a minimum. Automation testers should start on day one, as should DevOps people – no project should be without this. The people building this capability must work hand-in-glove with development and functional testing to amplify the feedback loops and keep everything rigorously tested.

Get onshore and offshore passports and visas ready

People need to travel and they need to travel often. If you want any hope of doing offshore delivery successfully you have to do this. There should be back-and-forth travel every two weeks for the duration of the project. Extended stays should not happen; two-week stints are enough. This ensures the teams are always working together and stops the “us and them” situation from developing. The more people work together, the better they work together, and travel should be happening from the topmost level of PMs right down to the most junior team members.

Do not avoid doing this. Do not think that travel at the start of the project to cement the understanding, plus a trip at the end to “bring it all home”, is all that is needed. That does not work, because it is in all that time in the middle that the problems occur. I have seen this happen numerous times.

Summary

Cost-cutting up-front without investing properly in transitioning, training, infrastructure etc… is incredibly stupid when offshoring application development and it is even more disastrous when performed on projects in-flight when things start to go bad. Offshoring is not going to be cheap in the short term, and it is not going to be cheap in the long-term if you try and cut corners. You have to invest and invest heavily to get it to work from day zero in every area.

In my experience the developers, testers, DevOps people and architects (mostly) get the need for all this extra overhead, both onshore and offshore. And it is not a one-off thing: it is a cycle of investment, from project to project, that must be repeated constantly to refine and enhance so that your delivery standards continuously improve. The first project will be the worst – it will cost you way more than you planned – but if you stick with it and maintain the investment and support, the costs will start to come down. Retain that focus: the more you keep improving quality, the faster you will get. Doing bespoke application delivery between countries is not easy and it is not, and never will be, fundamentally cheaper. As soon as you realise this, and budget your projects accordingly, you will have a hope of succeeding.

Friday, April 18, 2014

Offshoring - lessons and observations from the coal-face

IT is all about adaptation and change, if you can’t handle it, you’re in the wrong industry.

October 2013. My fifth trip to India, and like my previous trips it was as eventful, colourful and fun as ever. I really love visiting this country: the people are really cool, the food hot and the culture vibrant. It’s a great place to visit for work in IT and I have had many wonderful experiences there. This series of blog posts deals with some of my major observations from working on IT delivery projects in India, and the challenges I have seen over the last 10 years being involved with them – initially as a .NET/COM developer and later in solution architecture and DevOps.

First up, let me say that there are a lot of great things about offshoring that work, and work very well. Managed services work, call centres and help-desks are great candidates for offshoring because, by and large, the processes they follow are (usually) well-defined and the steps to resolve and assist with issues are highly prescriptive. Application delivery, however, is a very different beast and this is what I am going to concentrate on.

In a typical IT application delivery project you are always going to require some base, non-negotiable fundamentals for success such as:
  • Strong management support to streamline and plan work
  • Close collaboration and good working relationships between teams – in Agile that goes without saying
  • Good architecture and well-defined requirements – complemented by governance processes to manage gaps and change.
  • People who can think laterally to solve issues and solve them early, quickly and efficiently
  • Good investment in development and environment infrastructure to ensure there are no excessive downtimes waiting for software to build and deploy.
  • An embedded culture of continuous improvement and automation to future-proof against risk and maintain a high quality of work
  • Plenty more but that will do for now.

Running a delivery project offshore is no different. What is amazing, though, is how poorly the items I listed above can be done when their importance is scaled down or corners get cut in an effort to rein in costs.

Cutting corners on an IT delivery project is always a very stupid thing to do, but the damage is exacerbated so much more when you throw offshore delivery into the mix. Simply put, if you believe that delivering software offshore for your customers will be fundamentally cheaper for you as a business than doing it onshore, or that you can cut corners – not investing heavily in supporting infrastructure and staff, reducing travel budgets, or not hiring experienced architects, testers, developers and DevOps staff because people will just “work harder” or “work smarter” – then you are delusional.

One of the most important things for success is effective communication between teams and the right infrastructure to support it, such as video conferencing, tools like Lync and frequent travel back and forth. Failure to do this creates an “us and them” culture that spreads like a virus. If it is not addressed, the environment can turn very toxic as the bad blood spreads from the teams at the lower end of the spectrum, who are usually the most fearful of losing their jobs, bubbling up to the top and causing reactionary and defensive tactics on both sides to avoid accountability.



Typically the end result is that onshore teams will try their hardest to put down, discredit and (in extreme cases) deliberately sabotage work done by offshore. Conversely, offshore teams will become hell-bent on proving how much better they are than the onshore teams by constantly raising issues, escalating even the most trivial problems to management and being deliberately vague and obtuse, even lying, to avoid doing work or taking responsibility.

In short nobody wins, and if these toxic attitudes are left unchecked to fester then all hell breaks loose, projects fail, and people leave the company. I’ve seen numerous cases where onshore staff turn highly aggressive in telephone conferences as the pressure builds, finger pointing emails begin flowing back and forth, and everything descends into an ugly mess. And all this within the same company – thankfully clients were never witness to things like this!

 And it is really stupid because it is so easy to rectify by doing a couple of simple things such as:
  1. Have a lot of onshore/offshore travel so the teams mix and integrate. Once every few months is not enough, people should be flying back and forth every two weeks so relationships are formed and people get comfortable with each other.
  2. Don’t place employees who have difficulty dealing with offshore teams on these projects. For some people the cultural barriers are just too much to overcome and they will feel affronted at being forced to adapt and change “their ways” to suit others. These people need to be weeded out immediately.

But considering this stuff has been documented a thousand times over, and a lot better than I can do it, I’ll instead delve into some other areas.

Coming up next time: The Cost Delusion

Wednesday, April 2, 2014

AWS, PowerShell and Jenkins – your complete cloud automation management solution

I had the opportunity to set up a complete DevOps architecture for a big onshore/offshore (Australia/India) project recently, and amongst the many tasks I was set was that the entire development environment (source control, builds etc.) and all the test environments (automation test, functional test, performance test and showcase) had to be hosted in AWS in Sydney, within a VPC, and secured.

First up great! This was music to my ears, no more stuffing around with physical machines and fighting death cage matches with support people to get hardware upgraded. I could control the environment, the domain, basically everything.

So over the next 9 months I toiled away and came up with what I think was a really good solution, a fair bit of which I detail below. To go into the total ins and outs of it would be akin to rivalling War and Peace, so I’ll concentrate on the important parts, namely how I got the most out of the AWS SDKs.

The setup

Setting all this up initially took a lot of trial and error. You really cannot do this kind of thing without properly planning how your VPC will be set up: security groups, subnets, routing tables, ACLs etc. There is a bit to get your head around, but having said that, this excellent blog post sums it all up nicely. Get your head around that and you’re well on your way to nailing this stuff.
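For a flavour of what the basic plumbing looks like in script form, here is a minimal sketch using the AWS Tools for PowerShell to stand up a VPC, a private subnet and a security group that only allows RDP from inside the VPC. The CIDR ranges and names are illustrative assumptions, not the values we actually used.

    # Minimal VPC plumbing sketch - CIDR ranges and names are illustrative only.
    Import-Module AWSPowerShell
    Set-DefaultAWSRegion -Region ap-southeast-2

    # 1. The VPC itself
    $vpc = New-EC2Vpc -CidrBlock "10.0.0.0/16"

    # 2. A private subnet for the dev/test instances
    $subnet = New-EC2Subnet -VpcId $vpc.VpcId -CidrBlock "10.0.1.0/24"

    # 3. A security group that only allows RDP from within the VPC
    $sgId = New-EC2SecurityGroup -VpcId $vpc.VpcId `
                                 -GroupName "devtest-internal" `
                                 -GroupDescription "RDP from inside the VPC only"

    Grant-EC2SecurityGroupIngress -GroupId $sgId -IpPermission @(
        @{ IpProtocol = "tcp"; FromPort = 3389; ToPort = 3389; IpRanges = "10.0.0.0/16" }
    )

    Write-Host ("VPC {0}, subnet {1}, security group {2} created" -f `
        $vpc.VpcId, $subnet.SubnetId, $sgId)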

After a week and a few long nights we had Active Directory set up and groups and user accounts provisioned, and we had come to grips with the Remote Desktop gateway server and the NAT server. At this point we also started campaigning long and hard to get our support team to set up a VPN between the corporate network and our AWS VPC and trust the two domains. Eventually, after two months of emails, phone calls, risk escalation and intensive nagging, we got 4 hours of the support guys’ time to set it up. You don’t have to say anything at this point, I know what you are thinking, and yes it is true: we started saving time immediately.

So cool: we now have the AWS VPC set up, I can RDP to the AD machine from my local desktop, and I have created a Windows Server 2012 Core image to build all my machines upon.

Next hurdle: how do I manage the infrastructure and categorise it?

Experience tells me that if I had just started creating images everywhere at the whims of developers, testers and architects I would have had a hideous mess on my hands by nightfall. Plus I still needed to set up TFS for source control, builds and project work tracking, so of course that means SQL Server too.
So in short I needed a way to categorise my instances in order to control them – enter the AWS metadata tags. This very simple feature allows you to “tag” an instance with whatever key/value pairs you like. Create 1, create 100, it doesn’t matter (well, creating 100 is probably going to be a pain, but you get the idea). A couple of hours of putting thoughts to paper, a meeting and a quick chat later, and we came up with a set of tags that would categorise our instances (a rough sketch of tagging and querying instances this way follows the list below).

  • Core – always on, candidate for reserved instances
  • DevInfra – Development Infrastructure – almost always on, 20 hours a day minimum.
  • TestInfra – Testing Infrastructure, on for about 16 hours a day
  • DemandOnly – Demand instances only, manual startup, always shut down every day if running
We added a couple more over the journey, but these four are certainly good enough to get most stuff off the ground.
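As promised above, here is a rough sketch of how instance tags can drive everything else, using the AWS Tools for PowerShell. The tag key “Schedule” and the instance ID are hypothetical examples, not the actual keys we agreed on.

    # Tag an instance into a category, then find every instance in that category.
    # The tag key "Schedule" and the instance ID below are hypothetical examples.
    Import-Module AWSPowerShell
    Set-DefaultAWSRegion -Region ap-southeast-2

    # Put an instance into the DevInfra category
    New-EC2Tag -Resource "i-0a1b2c3d" -Tag @{ Key = "Schedule"; Value = "DevInfra" }

    # Later, pull back every DevInfra instance so a job can act on the whole group
    $devInfra = (Get-EC2Instance -Filter @{ Name = "tag:Schedule"; Values = "DevInfra" }).Instances

    foreach ($instance in $devInfra) {
        Write-Host ("{0} is {1}" -f $instance.InstanceId, $instance.State.Name)
    }

Every later automation job keys off these tags rather than hard-coded instance lists, which is why adding a new instance later was just a matter of tagging it correctly.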

So now we have TFS installed, developers are developing, builds are building, delivery managers are setting up work items and…. you get the idea.

Next hurdle, how to automate the shit out of everything so that I keep costs down?

Firstly, I did not want to have to worry about checking startup, shutdown, backups, snapshots and so on all day. I needed a machine set up with the right software that would let me schedule automation jobs, keep a history, and work with Windows operating systems and the AWS .NET SDK – oh yeah, and I didn’t want to pay for it either.

There are a number of ways to skin this pussy cat, but I combined a bunch of modularised PowerShell scripts and ran it all through Jenkins.

Why PowerShell?

Because it’s all built on Windows. If you’re not using PowerShell to build up and configure your Windows machines you’re doing it wrong.
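To illustrate the kind of “build up and configure” I mean, here is a small sketch of a machine-preparation script. The feature names, folder paths and domain name are assumptions used for illustration; the real build scripts did a great deal more than this.

    # Sketch of a Windows Server 2012 machine-preparation script.
    # Feature names, paths and the domain name are illustrative assumptions.

    # Install the roles this box needs (here: IIS with ASP.NET 4.5)
    Install-WindowsFeature -Name Web-Server, Web-Asp-Net45 -IncludeManagementTools

    # Standard folder layout so every machine looks the same
    New-Item -ItemType Directory -Force -Path "D:\Apps", "D:\Logs", "D:\Deploy" | Out-Null

    # Join the machine to the (hypothetical) project domain and reboot
    $cred = Get-Credential -Message "Domain join account"
    Add-Computer -DomainName "corp.example.local" -Credential $cred -Restart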

Why Jenkins?

I know the product well (always good to stick with known knowns) and it really is a great tool with good online support. It enables scheduling of jobs that can run virtually anything, it can build software, chain jobs together in pipelines, and it has a ton of plugins too. Sure, it’s a Java tool, but only an idiot would assume you can only look for answers in the Microsoft world.

The end result

After a month of solid scripting and testing I had created enough PowerShell scripts and functions to do the following with the instances in my VPC, all controlled through Jenkins using the metadata tags (a simplified sketch of the start/stop and snapshot logic follows the list):

  • Startup and Shutdown of instances
    • Core on all the time, DevInfra on 20 hours per day, TestInfra on 16 hours a day
  • Snapshots – Core and DevInfra snapshots are created every day
  • S3 Database Backups – All my database full backups that ran every night were copied to S3
  • Redundancy – New snapshots created were also copied over to the US West Region every night
  • Environment rebuilds – Cloud Formation scripts ran every night to rebuild the test environments so we had a totally clean machine to deploy to daily
  • AWS Cleanup - I created jobs to clean up S3 and instance snapshots once they got older than a couple of weeks
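Here is a cut-down sketch of what those tag-driven jobs look like in practice: stop every instance in a given category and snapshot its volumes first. The tag key, schedule and description text are illustrative assumptions; the real scripts were modularised into functions and handled errors, regions and reporting.

    # Jenkins-scheduled job (sketch): snapshot and stop everything tagged DevInfra.
    # Tag key/value and descriptions are illustrative assumptions.
    Import-Module AWSPowerShell
    Set-DefaultAWSRegion -Region ap-southeast-2

    $instances = (Get-EC2Instance -Filter @{ Name = "tag:Schedule"; Values = "DevInfra" }).Instances

    foreach ($instance in $instances) {

        # Snapshot every EBS volume attached to the instance before shutting it down
        $volumes = Get-EC2Volume -Filter @{ Name = "attachment.instance-id"; Values = $instance.InstanceId }
        foreach ($volume in $volumes) {
            New-EC2Snapshot -VolumeId $volume.VolumeId `
                            -Description ("Nightly {0} {1}" -f $instance.InstanceId, (Get-Date -Format "yyyy-MM-dd")) | Out-Null
        }

        # Stop the instance; a matching morning job calls Start-EC2Instance instead
        if ($instance.State.Name -eq "running") {
            Stop-EC2Instance -InstanceId $instance.InstanceId | Out-Null
            Write-Host ("Stopped {0}" -f $instance.InstanceId)
        }
    }

The morning startup job is simply the mirror image (filter on the same tag, call Start-EC2Instance), and the cleanup job filters snapshots by age and removes the old ones.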

The best part about this solution was that if we added new instances we just tagged them appropriately and then all the maintenance of the startup, shutdown, snapshotting took care of itself. Matter of fact I stopped looking at it after a couple of months as it all ran like clockwork.

We even got really clever with it, such as shutting down TFS build agents when the number of queued builds was low (by polling the TFS API services) and starting them back up again when builds queued up, and so on. We also extended Jenkins to do software builds for the purpose of running Sonar over the top of them, and to create deployment pipelines so that the testers could self-serve their own environments.

Couple of things I learned along the way

CloudFormation can be a pain in the butt for Windows: when it works for you it’s beers all round, when it doesn’t you’ll be swearing long and hard into the night getting that goddamn, effing server to join the friggin domain! And if you’re using SharePoint, don’t use CloudFormation with it at all. Matter of fact, find the guy that recommended SharePoint as part of the technical solution and slap them; it’s a horrendously painful and complicated product to set up and it does not play nice with CloudFormation, or with re-attaching volumes from snapshots either. SharePoint – I hate it. There, I said it.

Conclusion

Using AWS for your dev and testing is an absolute win over the more traditional methods (physical servers – arrgh!; VMware hosts forever running out of capacity).

Sure it will take time and investment in your DevOps staff to plan for and use it appropriately (show me a new infrastructure technology that doesn't need it), but the payoff in being able to completely automate your environments, increase/decrease resources as you need, scale up instances (such as your build servers when they are starting to run hard) and the lower TCO is impossible to ignore. Best of all once you have done it once, and done it properly, you can reuse a lot of what you created for other projects and clients. 

Wednesday, March 7, 2012

Configuration Management, more important than you think


Let me start this by saying that Configuration Management, in my terminology, means the practice of maintaining and managing source code, builds, unit tests and releases – essentially the management of all development and release artefacts and not something else. 

Personally I don’t consider myself a Configuration Management person. I have done the role several times, even for teams of more than 50 developers, but it is not something that I aspire to do. What I do have for CM, however, is a deep appreciation of just how vital it is to the success of delivering an IT project, particularly one that involves offsite development, multiple releases and ongoing change management. For these reasons most people who work with me know that I evangelise the importance of CM a lot. You can chalk that up to having worked on some very complex IT projects where I realised pretty quickly how essential it was to have a functioning CM practice that ensured everything which got developed also got built, tested and deployed in a methodical, efficient and systematic manner. But this post is not going to be a lecture on the finer points of branching, merging, build automation, continuous integration and so on – there’s more than enough about that on the Internet to keep anyone amused for months. Instead I’m going to have a rant about why it is sometimes still done so poorly in the IT industry.

First up, let me say that not every client I have worked at, or company I have worked for, does it badly or under-values it, but a decent number still do. I’ve seen an alarming number of projects over the years where a source code repository, a couple of branching diagrams and maybe, just maybe, a few install scripts is as far as the thought process goes with CM. Worse still, the implementation and maintenance of it is usually done by developers – usually a few “heroes” who know how it should all hang together – with little consideration given to anything except getting the release out the door. In short, it usually fails once the project gets big. It’s the same old story: the business gets more funding or wants more features and, as such, more releases are needed. All of a sudden the development team realises that the solution now needs to be deployed across multiple servers to cope with the higher demands. The architect gets consulted, updates the architecture and in turn increases the deployment complexity to cope with the new features, all of which now need to be incorporated into the builds, deployment scripts and installers. In a panic, code branches are created left, right and centre to cope with multiple releases and development teams. Merges then get made, often in haste, over-writing code already produced, causing more delays, and down the slippery slope the project goes. An even worse scenario is when you have multi-site development teams, especially if they are located in another country or time-zone, because the chaos spreads to these areas as well, compounding all the problems. By the end your source code repository resembles a weeping willow tree.


The trunk is in there somewhere!
Not good. 

So why do these problems always happen? Could it be the architects not catering for it in the estimates or the finalising of the project plan? Possibly, but it’s not like architects weren’t developers once, so they should know how tight the leash needs to be to keep things in check (if an architect was never a developer, by the way, then run a mile – I’ve had the pleasure of working with these unusual breeds of people before and they are dangerous). Any decent architect would know that the quality of the code that will support the solution is inexorably tied to the development leads overseeing the work and the CM resource (or team) managing the process of its production, compilation and deployment – not to mention quality assurance, which is a shared responsibility. As such, architects should always be seeking to understand how the development team leads are managing, testing and deploying the code for the solution they spent weeks or months of long hours finessing to get right. If you’re not doing this then you need to question how seriously you’re taking the work you are doing. But if the fault cannot be traced back to the architect then where does it lie? Experience suggests that it is usually a result of budgeting and estimation not catering for it, and typically this occurs because Project Managers or Business Owners (who often aren’t technical and hence don’t understand or appreciate its critical value) will push back on getting permanent CM resources. Other times it is the Dev Leads who, under pressure to cut down their estimates, will downgrade it to a small task that a developer can manage: “just create the branches; get a build running and that’s all we need” has been a common saying, which is really another way of saying “we’ll worry about it later, let’s win the work first”.

To put it bluntly if you don’t do it, on any project where more than 5 developers are working (and thus extrapolate that out to the number of testers, analysts etc..), then you are going to pay a heavy price for it when the project grows in size. 

And now for a real-world example to illustrate the points made so far:

A few years ago a company I was working for was tasked to build a very large IT system. It had numerous function points, integration, workflow and batch processing requirements. The user interfaces were highly complex and the design work required a large amount of modelling. In summary it was not your average 6-month project: it was estimated to take several years to complete, and the code was to be developed offshore by a team numbering nearly 70 developers at its peak. The offshore team were told to get a CM resource on board and, not finding one readily available, hired a senior technical lead to do the job. After a few months the tech lead, who had not planned the CM work effectively and thus had not mandated the standards around branching, merging, versioning, builds and testing (he did not understand CM), became buried in a huge spaghetti mess of build and deployment scripts, numerous branches and an incoherent folder structure. Lacking adequate and enforced standards, the developers began running rampant: adding compiled assemblies into the source code system, creating numerous folders and sub-branches, and diluting the quality of the code (as there were no QA checking tools installed). No-one was sure how it was all meant to work, so builds and environment installs took days to complete. End result: builds were failing everywhere and the onshore team were raising concerns.

Offshore decided that the problem was resourcing, so another resource was put onto the team. This time it was a junior developer. The junior developer, not knowing what she was doing, just made the mess worse, so another junior dev was brought in and the problem grew worse again. At this point you could assume it’s easy to point the finger at the offshore team, but that would not be fair. The onshore team themselves, who were so overloaded with work, were not constantly monitoring the CM process as they did not have the time; they just saw the builds failing. And secondly, the resources tasked to do the CM work were not experienced, they were just told to get it done. In other words: management failure. Eventually onshore resources were sent offshore to get the project back into shape, and the penny dropped when they arrived and saw the development process in action.

I got brought onto the project, initially to do an architectural assessment, but then I started noticing the CM issues as well, got called into a few meetings, and it was determined I should try and help sort it out. My first step was to immediately draft a set of standards that would become the bible for managing the CM work. This was distributed across all the development teams and it was made clear that adherence to these standards would now apply, and anyone caught breaking the rules would have their work removed – and, for repeat offenders, potentially themselves removed from the project. I then halted all code check-in work, cleaned the source code repository (freeing up GBs of disk space) and set about wholesale restructuring of the builds and scripts. Eight months later, and after a trip overseas to help the team institute the standards, we had a functioning CM practice. It would be fair to say that CM was estimated a lot better in future projects after this.

So resourcing for CM is harder than some people realise

Configuration Management is not easy; finding good resources can therefore be very hard, and companies that have them, particularly large organisations that do a lot of development work, hang onto them tightly. An effective CM person needs to be a strong team leader, possess good analytical skills, have an appreciation for process and be able to document and articulate (with crystal clear clarity) the expectations and standards that need to be met to ensure that the software is stored, built and tested in a consistent and efficient manner. And yes, they also need to be feared by developers. I’m not a believer in using fear as an educator, but in the world of IT it does tend to be rather effective – at least in this discipline. CM people need to know architecture – not to be able to recite TOGAF or Zachman to the nth degree, grow beards and wear bow-ties to work, but they should be able to understand how a solution must hang together from a technical viewpoint. This will ensure that the scripts to configure the OS, databases, web servers and so on align with the architecture that has been designed, so the solution can be deployed properly. They also need to know technology, particularly the languages used by the development teams, to ensure that the builds, unit tests, reports, code management tools and all other manner of CM artefacts are put together in the best possible manner and function and integrate correctly.

Finally, CM people must evangelise process automation. This does not just extend to continuously integrated builds and the integration of reports either: source code repositories and build and deployment scripts should be constantly checking that code being checked in is not violating quality rules, preventing bad code from polluting the body of work. There is no excuse not to do this nowadays; source code management systems are now so sophisticated they can manage an entire project from design through to development, unit testing, system testing, system integration testing, user acceptance testing and deployment. There is also a plethora of third-party and free tools available online (big thumbs up here for CruiseControl and NDepend) that can perform all manner of code quality checking, both pre- and post-compilation, and that bolt onto these products. The aim should always be to use these as much as possible to create a watertight source code management system, with detailed reporting, to ensure that quality is constantly being enforced and monitored with minimal human intervention.

Tuesday, December 20, 2011

The architectural success of any solution is directly related to the capabilities of the people who deliver upon it

One thing that an architect must never forget is the capabilities of those around them. Over-complicate your solution to adhere to ridiculously high standards or lengthy, time-consuming review and approval processes and it will more than likely fail. Your vision must be able to be translated into modular, workable components that can be designed to integrate seamlessly into a larger picture as highly robust, reusable and deliverable software components. In other words, you must ensure your designers and developers are able to design, develop, test and deploy their work in as simple and efficient a manner as possible. Too much fluff leads to too much uncertainty and guesswork; too much complexity and the project timeline will be half over before you’ve got any code running on a build server.

Being a good architect requires (amongst many other skills) the overall ability to choreograph a high-wire balancing act that revolves around producing a solution that solves the business problem, meets the business functional requirements, meets the quality standards of their IT departments (and yours), delivers on-time but most importantly of all: can be delivered with the people you have at your disposal. To blindly assume every developer is as highly competent or better than you are is to fail before you’ve even started.

A good test for assessing how easy or complex your delivery environment has become is by noticing the ramp-up time for a new starter. If a new person needs to spend a considerable amount of time building their workstation, asking numerous questions of co-workers, has to perform a number of operating system tweaks or custom software installs to just get to the starting line then something is already going wrong.

Anyway rather than waffle any further let me present a story from the trenches to illustrate the point.

Early in my career I cut my teeth working on a highly complex, bespoke CRM application that went through a number of versions and iterations over its lifetime. This application was massive (and I do mean massive): it had over 100 developers alone at its peak, over 1,000 CRs, and featured hundreds of use cases. As it grew bigger with every release, more architects got involved to assist with the delivery of the enhancements. This injection of architects (all with their own opinions and persuasions) caused complexity to go through the roof as they deemed that new architecture frameworks were necessary to meet the increasing functionality demands of the business. Wild ideas ran rampant, and various splinter groups went off on tangents creating their own processes and standards to deliver upon them as they all strived to create the perfect architecture frameworks. A fall-out of this was that highly complex co-existence measures were required to ensure the new frameworks still worked with the previously established ones – primarily because re-platforming was not an option. Knowing that delivery complexity was increasing exponentially as a result of adopting these new frameworks (and fearing that the technical designers, developers and testers were struggling to understand their vision), the architects started micro-managing every minute detail of the solutions being delivered to ensure they conformed to the numerous standards and processes they had created. They established numerous review and QA stage-gates to ensure project delivery remained compliant with their architectural vision, and lengthy review and approval processes to govern the production of key deliverables.

Starting to see the tidal wave building now? The guiding mantra of simplicity and efficiency effectively went out the window at this point.

Approval processes for small changes took weeks to finalise, and developers who could not grasp the complex processes started to under-perform and produce poor code. This in turn caused numerous defects, blew out testing times, and projects started running well over budget, causing late nights for everyone concerned (the record was set by a deployment architect – 40 hours straight; I only managed 24, as it was around that time that the code started to dance on the screen and I developed muscle spasms around my eyes). In short, the desire to gold-plate and genericise every aspect of the delivery put such high demands on developers that they began to crumble under the strain. The delivery of new business initiatives on the new architecture frameworks became impossible within the time-frames established in the past for similar-sized business initiatives. The large numbers of developers just could not grasp the complex coding structures, frameworks and processes set by the architects.

Delivery costs are now increasing. Try explaining to the business why you need 30% more time to deliver a CR now than you did in the past, for an equivalent-sized piece of functional change. Architectural purity? Good luck!

Compensating for the increased delivery times and exponentially increasing complexity meant the estimates produced for new projects blew out by ever-increasing margins, because no-one really knew how it all hung together. Tech leads built in large amounts of contingency as they were afraid of not making the deadlines with the teams they had at their disposal. Delivery resources that had been on the project for a number of years were fought over at the resourcing table by PMs, as no-one wanted to take on a “new guy” because they knew they would not cope. And by and large they didn’t cope, so those that were competent were kept in their roles because they could be relied upon – to the detriment of their own promotional aspirations.

Now a “hero” culture has been created.

So as the estimates kept growing, fights broke out between the PMs, technical designers and the architects, as those at the coal-face could see the process was not working. The architects struggled to translate their vision and ideas, a lot did not really know the intricacies of the frameworks, and others did not see how it had become so difficult. This malaise in turn caused the other groups involved in the delivery of the application (PMs, testers, DBAs etc.) to grow frustrated at the late deliveries, which were also causing them late nights to test and deploy changes to get the projects over the line. The finger pointing began in earnest as everyone grew disenchanted with the entire delivery process and lost faith in the architecture team, because they could not see the value in what they were doing.

Once people lose faith in the architecture you’re on a very slippery slope

The architects, however, despite the numerous issues, could see the value in what they were doing. Their frameworks were creating a much more modular, generic, extendible and reusable enterprise architecture platform. It was highly configurable, well-documented, and the code (at least at face value) looked very well-written, as they had condensed so much of the common functionality down that developers only needed to (in theory) fill in the code logic gaps. Unfortunately, to everyone else all the new architecture frameworks did was make everything take longer to deliver and become more complex to test and deploy than developing on the older architecture frameworks had been. In other words, it’s not that it was fundamentally flawed and would never work; it’s just that it was too clever for its own good and for the capabilities of the people tasked to deliver on it.

So what did I learn from being involved in all of this?

Always consider the simplicity and efficiency of the architectural decisions you make as they are interpreted and communicated down the chain to the delivery, testing and deployment teams. Always keep in mind the capabilities of those who will implement it and the SDLC process they must follow to deliver it. Compromises must always be made in order to deliver IT projects successfully; knowing where and how best to make them unfortunately only comes with experience – you can’t teach it.

And finally, always remember when choreographing your high-wire act not to set the rope so high that it endangers your team, or so low that it provides little value to your audience. Better make sure you have some safety nets in place to catch those that are likely to fall, too.

Thursday, October 20, 2011

How I fell in love with the stored procedure all over again

There was once a time when getting data in and out of a database involved embedding SQL script directly in code. It was a horrible, ugly way to extract data, prone to error and more often than not a drain on performance – not just through inefficient connection pooling, but also in the parsing of the raw SQL code itself. Not to mention a serious security flaw: SQL injection attacks, which burst onto the scene with the uptake of the web as the 1990s drew to a close and are still highly prevalent today.

So why the stored proc?

To address these issues, many software systems started being architected with stored procedures as the standard API into the database. Besides the obvious performance benefits there were security benefits too, as the use of parameters when calling them meant that SQL injection attacks were effectively neutralised. It was a win-win, and as their use grew more widespread, support for them in coding frameworks grew along with them – from the early days of COM-based ADO with Microsoft Visual Basic programming, through to ADO.NET with .NET, and of course the various incarnations of JDBC amongst other languages. In short, how data was extracted and processed was now controlled within the database.
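To show what that parameterisation looks like from the calling side, here is a minimal sketch using ADO.NET (driven from PowerShell, to stay consistent with the rest of this blog). The connection string, procedure name and parameter are hypothetical; the point is that the user-supplied value travels as a typed parameter, never as concatenated SQL text.

    # Calling a stored procedure with a typed parameter instead of concatenating SQL.
    # Connection string, procedure name and parameter are hypothetical examples.
    Add-Type -AssemblyName "System.Data"

    $connection = New-Object System.Data.SqlClient.SqlConnection(
        "Server=DBSERVER01;Database=CustomerDb;Integrated Security=SSPI")

    $command             = $connection.CreateCommand()
    $command.CommandType = [System.Data.CommandType]::StoredProcedure
    $command.CommandText = "dbo.GetCustomerOrders"

    # The value is bound as data - even "'; DROP TABLE Orders;--" is just a string here
    $param = $command.Parameters.Add("@CustomerName", [System.Data.SqlDbType]::NVarChar, 100)
    $param.Value = "O'Brien"

    $connection.Open()
    try {
        $reader = $command.ExecuteReader()
        while ($reader.Read()) {
            Write-Host ("{0}  {1}" -f $reader["OrderId"], $reader["OrderDate"])
        }
        $reader.Close()
    }
    finally {
        $connection.Close()
    }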

For all intents and purposes, leaving this domain of control in the database was a very good idea. Stored procedures, after all, get pre-compiled and optimised for performance within the database (well, at least in the bigger enterprise-styled ones they do) and thus made for a very fast and efficient way to pull data in and out. But like any service-based layer it is only as good as the code and architecture behind it, and as over-zealous or budget-squeezed projects relegated the programming of stored procedures to application developers rather than database specialists, the results were often disastrous. Stored procedures would get bloated and use inefficient data-processing SQL statements, and this situation only got exacerbated if the RDBMS itself was poorly designed. Projects that had good DBAs and database developers did not suffer the same fate, of course, and it was soon realised that on large development projects having a dedicated database programmer for stored procedures was essential.

But ultimately, as development processes matured, the goal of maintaining an architecturally pure, service-based façade around a data layer became possible. This of course is in line with modern design best practices for software systems. To complement these advancements, frameworks started arriving that helped to auto-generate the code around stored procedures to make their usage even more efficient. Writing code to call stored procedures was, after all, a laborious process, so any tool that could auto-generate the code wrappers to call them would save a great deal of time. As these frameworks got more and more sophisticated, they matured into what are now known as Object Relational Mapping tools. These were heralded with great fanfare, as they could bypass the stored procedures entirely and instead enable the creation, in code, of a strongly-typed object model based on the tables and relationships in the database. Architects and developers flocked to these tools: no longer would there need to be a dedicated SQL programmer on a development team, architects could design the RDBMS (mostly) themselves, and a tool would write all the code so developers could use the data structures in code with all the relationships maintained and protected. Hibernate, NHibernate, LLBLGen, CodeSmith and many other tools flourished, and their up-take became so popular that almost overnight it became inconceivable to run a development project without them. Development teams would claim upwards of 30% in reduced development times, some even higher, and justifications for why DBAs were no longer needed were even thrown around by architects – ridiculous I know, but I did hear it!

But then things got ugly.

Things got ugly because a framework generic enough to manage complex relationships in code, based on a well-normalised relational database, gets very, very difficult to build the more joins you create. Underneath this lovely, generated, strongly-typed object model you still need to get the data out of the database – and that means you need to write SQL. So the framework has to generate that SQL for all the possible relations and permutations, and those statements get really big, really quickly, the more joins you make. To date, some of the biggest SELECT query syntax I have ever seen has been produced by Microsoft’s LINQ to SQL and the Entity Framework; the same goes for NHibernate and LLBLGen, although the latter are usually a bit better. All these statements have to be compiled by the database every time they run, and that means the database starts taking a lot longer to get data in and out. Steps have been taken by these object models to perform lazy lookups and delayed execution, and this has helped to address immediate performance issues, but can you imagine, if you will, being the architect having to explain to the DBA at your client why a SELECT query with three joins needs to be printed out on an A3-sized piece of paper for the system you have just crafted?

So where does this leave us?

 Well put simply,

  • Don’t stop using stored procedures.
  • Do stop trying to build complex object models of database relationships, tables and structures in code.
  • Only use ORMs to manage calling stored procedures.

But admittedly this is not doing ORMs justice. They have their uses and their place, but they are not the total solution for pulling data from a database, as many in this industry would have you believe. They work well for simple table inserts and updates, and when complex joining operations are not being performed. For small applications, or ones where calls to the database are not frequent, they are also very suitable. They cut down on the amount of code that needs to be manually written and provide safeguards around how data is managed through a strongly-typed object model.

Data caching is a big plus with ORMs. With stored procs, and indeed any calls to a database, the same static information is served time and time again. ORMs have become good at caching data, so for static, referential data this is a big bonus, especially when volumes are large and being served to web-based applications.

Development-wise, ORMs also facilitate a much faster turnaround in getting the modelling of the data from the database done correctly, and of course the type-safety of the generated objects eliminates the plethora of bugs that always used to appear back in the days when you had to roll your own database access layer.

So what do we use: ORMs or stored procs?

Simple answer here: use the right tool for the job and follow some basic rules
  • Big bulk inserts, lots of complex joins and database logic required – use a stored proc.
  • Simple CRUD apps, lots of reads of static referential data – use an ORM.

And yes the two can coexist if you architect the design well enough because no single solution can ever guarantee a blanket answer for an IT system when it comes to managing transactional and referential data.

Thursday, July 14, 2011

Gold-plating or the Curse of the Architect

No solution should ever be engineered to be so technically complex, or genericised to the nth degree, that it becomes virtually impossible to redevelop, extend and maintain. While your years of technical experience have made things that once seemed complex now seem easy, the same is not true of those in your team, who are likely to have much less experience than you. The same applies to the process and method that you must implement, which extends across gathering and documenting requirements, designing the software, developing it, building it, testing it, deploying it, maintaining it and so on, and making sure it all integrates in a seamless fashion to deliver what is required on time and on budget. If only a select few, or no-one at all, “gets it”, then you’ll fall behind the moment you start. Communication is the one key attribute that you have to master; being able to communicate what must be done in clear, simple language that is easily understood by all is a fundamental skill for an architect.


Complexity will bind you

Create too many staging gates or too many cumbersome and lengthy review and QA cycles; fail to clearly specify the deliverables, who owns them and how they align to the methodology and project plan; or enforce a tightly coupled and rigid development environment with no automation of quality and far too much room for creative thought, and things will fall apart. Nowhere is the need to get this right more pressing than in offshoring of software development. The method, the process, the standards – they must be so well-defined and translatable, from the architecture and the requirements right down to the lines of code, that the concept of the “code factory” can actually be realised. But more on that in another article.

Consistency will save you

You must make sure that the solution is designed and broken down into components that can be easily understood by designers and developers, so that they ultimately become reusable, testable and maintainable. Make sure that every single artefact is produced in a consistent fashion. There is no shame in creating more components within a solution if it improves the overall simplicity and consistency of the design and development process. In fact it may end up being quicker to produce than alternative approaches, because a simple and efficient process, once engrained and embedded in the minds of those following it, becomes innate, repeatable, measurable and also predictable. Make aspects of the solution, or the process used to produce it, do too many things and it will grow out of control quickly, because you will lose track of where and how things are being done. If consistency is inherent in everything you do, changing things is simple. A highly modularised design is easier to modify and extend than one which is tightly coupled, cumbersome and inconsistent from one software layer to the next. We’ve all heard of the importance of architectural patterns, and no doubt we’ve all read the Erich Gamma and co work; one of the principles that underpins this form of thinking is consistency.

A saying I picked up early in my career as a junior developer, from a highly skilled if somewhat socially inept architect, is one I have never forgotten: “I don’t care if you make mistakes; all I care about is that if you do make them, you make them consistently. Consistent mistakes we can fix, inconsistent ones we cannot.”

Minimalism will break you

There is a perception amongst many architects and developers that being as minimalist as possible – packing as much complexity as you can into as few artefacts as possible – is somehow conducive to creating a highly elegant and functioning application. It isn’t. Unless you are blessed with a team of people as smart as yourself it will not work, because fundamentally all IT projects are produced by humans, and humans all think differently. Know your team’s capabilities, know the expectations of the client, and create processes, standards and a solution that meet these requirements in a simple and consistent fashion, and you will be successful. Your worst enemy is always yourself: over-think, over-engineer or over-complicate it for your own ego’s sake and it will fail. You can sometimes get away with it on a small project (under $500,000 AUD) but you won’t on anything at or beyond $1M AUD.

Owning the failures and sharing the success = respect

Back yourself and your judgement. Be confident in your decisions and people will buy in to what you are selling; be cagey, un-cooperative and aloof and those below you will lose faith in the directions you set. There is no shame in being wrong or not knowing all the answers. Just be accountable for your mistakes and learn to accept you are not always right, and you will be amazed at how well things turn out. Don’t be afraid to stick your neck out and take responsibility when you fail. Because you will fail. The most important thing is the way in which you handle and respond to it. Start pointing fingers, shouting and blaming others and you will lose respect. Own the response to fix the problem, commit yourself and always tell the truth, even if it hurts to do so, and you’ll be respected.

How to sum it up? Why quote a luminary of course

I am both a victim and a perpetrator of this quote from Frederick Brooks, bookmark it and remember it, to keep yourself grounded:
An architect’s first work is apt to be spare and clean. He knows he doesn’t know what he’s doing, so he does it carefully and with great restraint.
As he designs the first work, frill after frill and embellishment after embellishment occur to him. These get stored away to be used “next time.” Sooner or later the first system is finished, and the architect, with firm confidence and a demonstrated mastery of that class of systems, is ready to build a second system.
This second is the most dangerous system a man ever designs. When he does his third and later ones, his prior experiences will confirm each other as to the general characteristics of such systems, and their differences will identify those parts of his experience that are particular and not generalizable.
The general tendency is to over-design the second system, using all the ideas and frills that were cautiously sidetracked on the first one. The result, as Ovid says, is a “big pile.”