Friday, April 18, 2014

Offshoring - lessons and observations from the coal-face

IT is all about adaptation and change; if you can’t handle it, you’re in the wrong industry.

October 2013. My fifth trip to India, and like my previous trips it was as eventful, colourful and fun as ever. I really love visiting this country: the people are really cool, the food hot and the culture vibrant. It’s a great place to visit for work in IT and I have had many wonderful experiences there. This series of blog posts deals with some of my major observations from working on IT delivery projects in India, and the challenges I have seen over the last 10 years of involvement with them, initially as a .NET/COM developer and later in solution architecture and DevOps.

First up, let me say that there are a lot of great things about offshoring that work, and work very well. Managed services work, call centres and help-desks are great candidates for offshoring because, by and large, the processes they follow are well defined and the steps to resolve and assist with issues are highly prescriptive. Application delivery, however, is a very different beast, and it is what I am going to concentrate on.

In a typical IT application delivery project you are always going to require some base, non-negotiable fundamentals for success such as:
  • Strong management support to streamline and plan work
  • Close collaboration and good working relationships between teams – in Agile that goes without saying
  • Good architecture and well-defined requirements – complemented by governance processes to manage gaps and change.
  • People who can think laterally to solve issues and solve them early, quickly and efficiently
  • Good investment in development and environment infrastructure to ensure there are no excessive downtimes waiting for software to build and deploy.
  • An embedded culture of continuous improvement and automation to future-proof against risk and maintain a high quality of work
  • Plenty more but that will do for now.

Running a delivery project offshore is no different; however, it is amazing how poorly the items I listed above can be done when their importance is scaled down or corners end up getting cut in an effort to rein in costs.

Cutting corners on an IT delivery project is always a very stupid thing to do, and this is exacerbated so much more when you throw offshore delivery into the mix. Simply put, if you believe that delivering software offshore for your customers will be fundamentally cheaper for you as a business than doing it onshore, or that you can cut corners such as not investing heavily in supporting infrastructure and staff, reducing travel budgets, or not hiring experienced architects, testers, developers and DevOps staff because people will just “work harder” or “work smarter”, then you are delusional.

One of the most important things to ensure success is effective communication between teams and the right infrastructure to support it, such as video conferencing, tools like Lync, and frequent travel back and forth. Failure to do this creates an “us and them” culture that spreads like a virus. If it is not addressed, the environment can turn very toxic as the bad blood spreads from the teams at the lower end of the spectrum, who are usually the most fearful of losing their jobs, and bubbles up to the top, causing reactionary and defensive tactics on both sides to avoid accountability.



Typically the end result of this is that onshore teams will try their hardest to put down, discredit and (in extreme cases) deliberately sabotage work done by offshore. Conversely, offshore teams will become hell-bent on proving how much better they are than the onshore teams by constantly raising issues, escalating even the most trivial problems to management and being deliberately vague and obtuse, even lying, to avoid doing work or taking responsibility.

In short, nobody wins, and if these toxic attitudes are left unchecked to fester then all hell breaks loose, projects fail, and people leave the company. I’ve seen numerous cases where onshore staff turn highly aggressive in telephone conferences as the pressure builds, finger-pointing emails begin flowing back and forth, and everything descends into an ugly mess. And all this within the same company – thankfully clients were never witness to things like this!

And it is really stupid, because it is so easy to rectify by doing a couple of simple things such as:
  1. Have a lot of onshore/offshore travel so the teams mix and integrate. Once every few months is not enough; people should be flying back and forth every two weeks so relationships are formed and people get comfortable with each other.
  2. Don’t place employees who have difficulty dealing with offshore teams on these projects. For some people cultural barriers are just too much to overcome, and they will feel affronted at being forced to adapt and change “their ways” to suit others. These people need to be weeded out immediately.

But considering this stuff has been documented a thousand times over, and a lot better than I could do it, I’ll instead delve into some other areas.

Coming up next time: The Cost Delusion

Wednesday, April 2, 2014

AWS, PowerShell and Jenkins – your complete cloud automation management solution

I had the opportunity to set up a complete DevOps architecture for a big onshore/offshore (Australia/India) project recently, and amongst the many tasks I was set was that the entire development environment (source control, builds, etc.) and the test environments (automation test, functional test, performance test and showcase) had to be hosted in AWS in Sydney, within a VPC, and secured.

First up: great! This was music to my ears. No more stuffing around with physical machines and fighting death cage matches with support people to get hardware upgraded. I could control the environment, the domain, basically everything.

So over the next 9 months I toiled away and came up with what I think was a really good solution, a fair bit of which I detail below. To go into the total ins and outs of it would be akin to rivalling War and Peace, so I’ll concentrate on the important parts, namely how I got the most out of the AWS SDKs.

The setup

Setting all this up initially took a lot of trial and error. You really cannot do this kind of thing without properly planning how your VPC will be set up. Security groups, subnets, routing tables, ACLs and so on: there is a bit to get your head around, but having said that, this excellent blog post sums it all up nice and quick:  Do that and you’re well on your way to nailing this stuff.
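For what it’s worth, here is a minimal sketch of the kind of VPC scaffolding involved, using the AWS Tools for PowerShell. The CIDR ranges, names and the public/private split are illustrative only, not our actual layout.

    # Minimal VPC scaffolding sketch (AWS Tools for PowerShell).
    # CIDR ranges and names are illustrative only.
    Import-Module AWSPowerShell
    Set-DefaultAWSRegion -Region ap-southeast-2          # Sydney

    # The VPC plus a public subnet (NAT / RD gateway) and a private subnet (everything else)
    $vpc           = New-EC2Vpc -CidrBlock "10.0.0.0/16"
    $publicSubnet  = New-EC2Subnet -VpcId $vpc.VpcId -CidrBlock "10.0.0.0/24"
    $privateSubnet = New-EC2Subnet -VpcId $vpc.VpcId -CidrBlock "10.0.1.0/24"

    # Internet gateway and a route table so the public subnet can reach the outside world
    $igw = New-EC2InternetGateway
    Add-EC2InternetGateway -InternetGatewayId $igw.InternetGatewayId -VpcId $vpc.VpcId
    $rt = New-EC2RouteTable -VpcId $vpc.VpcId
    New-EC2Route -RouteTableId $rt.RouteTableId -DestinationCidrBlock "0.0.0.0/0" -GatewayId $igw.InternetGatewayId
    Register-EC2RouteTable -RouteTableId $rt.RouteTableId -SubnetId $publicSubnet.SubnetId | Out-Null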

After a week and a few long nights we had Active Directory set up and groups and user accounts provisioned, and we had come to grips with the Remote Desktop gateway server and the NAT server. At this point we started campaigning long and hard to get our support team to set up a VPN between the corporate network and our AWS VPC and trust the two domains. Eventually, after two months of emails, phone calls, risk escalation and intensive nagging, we got 4 hours of the support guys’ time to set it up. You don’t have to say anything at this point, I know what you are thinking, and yes, it is true: we started saving time immediately.

So, cool: we now have the AWS VPC set up, I can RDP to the AD machine straight from my local desktop, and I have created a Windows Server 2012 Core image to build all my machines upon.

Next hurdle: how do I manage the infrastructure and categorise it?

Experience tells me that if I had just started creating images everywhere at the whims of developers, testers and architects I would have had a hideous mess on my hands by nightfall. Plus I still needed to set up TFS for source control, builds and project work tracking, so of course that meant SQL Server too.
So in short I needed a way to categorise my instances in order to control them – enter the AWS metadata tags. This very simple feature allows you to simply “tag” an instance with whatever key/value pair you like. Create 1, create 100, it doesn’t matter. Well, creating 100 is probably going to be a pain, but you get the idea. A couple of hours of putting thoughts to paper, a meeting and a quick chat later, we came up with a set of tags to categorise our instances (there’s a quick sketch of how the tagging works after the list below).

  • Core – always on, candidate for reserved instances
  • DevInfra – Development Infrastructure – almost always on, 20 hours a day minimum.
  • TestInfra – Testing Infrastructure, on for about 16 hours a day
  • DemandOnly – Demand instances only, manual startup, always shut down every day if running
We added a couple more over the journey, but these four are certainly good enough to get most stuff off the ground.
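Here is that quick sketch: roughly how an instance gets tagged and queried back with the AWS Tools for PowerShell. The “Schedule” tag key and the instance ID are example values only, not necessarily what we used.

    # Tag an instance so the automation jobs know how to treat it
    # ("Schedule" key and instance ID are example values).
    $tag = New-Object Amazon.EC2.Model.Tag
    $tag.Key   = "Schedule"
    $tag.Value = "DevInfra"
    New-EC2Tag -Resource "i-1a2b3c4d" -Tag $tag

    # Later, pull back every instance carrying a given tag value
    # (on older versions of the module the property is RunningInstance rather than Instances).
    (Get-EC2Instance -Filter @{ Name = "tag:Schedule"; Values = "DevInfra" }).Instances |
        Select-Object InstanceId, InstanceType, State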

So now we have TFS installed, developers are developing, builds are building, delivery managers are setting up work items and… you get the idea.

Next hurdle: how do I automate the shit out of everything so that I keep costs down?

Firstly, I did not want to have to worry about checking startups, shutdowns, backups, snapshots and so on all day. I needed a way to set up a machine with the right software to let me schedule automation jobs, keep a history, and work with Windows operating systems and the AWS .NET SDK. Oh yeah, and I didn’t want to pay for it either.

There are a number of ways to skin this pussy cat, but I combined a bunch of modularised PowerShell scripts and ran it all through Jenkins.

Why PowerShell?

Because it’s all built on Windows. If you’re not using PowerShell to build up and configure your Windows machines, you’re doing it wrong.
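For what it’s worth, pointing PowerShell at AWS is only a couple of lines once the AWS Tools for PowerShell module is installed; the profile name, keys and region below are placeholders.

    # One-off setup on the automation box (profile name and keys are placeholders).
    Import-Module AWSPowerShell
    Set-AWSCredentials -AccessKey "AKIAEXAMPLE" -SecretKey "secret-example" -StoreAs "automation"

    # Then every script starts with something like this:
    Import-Module AWSPowerShell
    Set-AWSCredentials -ProfileName "automation"
    Set-DefaultAWSRegion -Region ap-southeast-2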

Why Jenkins?

I know the product well (always good to stick with known knowns) and it really is a great tool with good online support. It enables scheduling of jobs that can run virtually anything; it can build software and chain jobs together in pipelines, and it has a ton of plugins too. Sure, it’s a Java tool, but only an idiot would assume you can only look for answers in the Microsoft world.

The end result

After a month of solid scripting and testing I had created enough PowerShell scripts and functions to do the following with the instances in my VPC, all controlled through Jenkins using the metadata tags (there’s a sketch of the startup/shutdown logic after the list):

  • Startup and Shutdown of instances
    • Core on all the time, DevInfra on 20 hours per day, TestInfra on 16 hours a day
  • Snapshots – Core and DevInfra snapshots are created every day
  • S3 Database Backups – All my database full backups that ran every night were copied to S3
  • Redundancy – New snapshots created were also copied over to the US West Region every night
  • Environment rebuilds – Cloud Formation scripts ran every night to rebuild the test environments so we had a totally clean machine to deploy to daily
  • AWS Cleanup - I created jobs to clean up S3 and instance snapshots once they got older than a couple of weeks
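Here is that cut-down sketch of the tag-driven startup and shutdown. The “Schedule” tag key and the function name are mine for illustration, and the real scripts were a fair bit more involved.

    # Cut-down sketch of tag-driven start/stop (the "Schedule" key and
    # function name are illustrative; error handling etc. omitted).
    function Set-TaggedInstanceState {
        param(
            [Parameter(Mandatory = $true)][string] $TagValue,                            # e.g. DevInfra, TestInfra
            [Parameter(Mandatory = $true)][ValidateSet("Start", "Stop")][string] $Action
        )

        $reservations = Get-EC2Instance -Filter @{ Name = "tag:Schedule"; Values = $TagValue }
        $instanceIds  = $reservations.Instances | ForEach-Object { $_.InstanceId }

        if (-not $instanceIds) {
            Write-Host "No instances found with Schedule=$TagValue"
            return
        }

        if ($Action -eq "Start") { Start-EC2Instance -InstanceId $instanceIds }
        else                     { Stop-EC2Instance  -InstanceId $instanceIds }
    }

    # Jenkins just calls this on a schedule, e.g.
    #   Set-TaggedInstanceState -TagValue DevInfra  -Action Start    (early morning, weekdays)
    #   Set-TaggedInstanceState -TagValue TestInfra -Action Stop     (late evening, daily)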

The best part about this solution was that if we added new instances we just tagged them appropriately, and then all the maintenance of the startup, shutdown and snapshotting took care of itself. Matter of fact, I stopped looking at it after a couple of months as it all ran like clockwork.
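The nightly snapshot, S3 backup and cleanup jobs followed the same pattern. Here is a rough sketch of the sort of thing they did; the bucket name, backup path, destination region and 14-day retention are example values, and on older versions of the module -OwnerId may be -Owner.

    # Rough sketch of the nightly snapshot / backup / cleanup jobs
    # (bucket name, backup path, destination region and retention window are example values).

    # 1. Snapshot every volume attached to Core and DevInfra instances
    $reservations = Get-EC2Instance -Filter @{ Name = "tag:Schedule"; Values = @("Core", "DevInfra") }
    foreach ($instance in $reservations.Instances) {
        foreach ($mapping in $instance.BlockDeviceMappings) {
            New-EC2Snapshot -VolumeId $mapping.Ebs.VolumeId `
                            -Description "Nightly $($instance.InstanceId) $(Get-Date -Format yyyy-MM-dd)"
        }
    }

    # 2. Push last night's SQL full backups up to S3
    Get-ChildItem "D:\Backups\*.bak" | ForEach-Object {
        Write-S3Object -BucketName "example-project-backups" -File $_.FullName -Key "sql/$($_.Name)"
    }

    # 3. Copy the new snapshots to another region for redundancy
    Get-EC2Snapshot -OwnerId self |
        Where-Object { $_.StartTime -gt (Get-Date).AddDays(-1) } |
        ForEach-Object { Copy-EC2Snapshot -SourceSnapshotId $_.SnapshotId -SourceRegion ap-southeast-2 -Region us-west-2 }

    # 4. Clean up snapshots older than a couple of weeks
    Get-EC2Snapshot -OwnerId self |
        Where-Object { $_.StartTime -lt (Get-Date).AddDays(-14) } |
        ForEach-Object { Remove-EC2Snapshot -SnapshotId $_.SnapshotId -Force }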

We even got really clever with it, doing things like shutting down TFS build agents when the number of queued builds was low (by polling the TFS API services) and starting them back up again when builds queued up, and so on. We also extended Jenkins to do software builds for the purpose of running Sonar over the top of them, and then to create deployment pipelines so that the testers could self-serve their own environments.
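As a rough illustration of the build agent trick: the TFS query is stubbed out here as a hypothetical Get-QueuedBuildCount helper (the real thing polled the TFS API), and the “Role” tag key and thresholds are made up for the example.

    # Hypothetical stub: the real version polled the TFS build API for the queue length.
    function Get-QueuedBuildCount { return 0 }

    # Scale build agents with the build queue ("Role" tag key and thresholds are examples).
    $queued = Get-QueuedBuildCount
    $agents = (Get-EC2Instance -Filter @{ Name = "tag:Role"; Values = "BuildAgent" }).Instances

    if ($queued -gt 5) {
        $stopped = $agents | Where-Object { $_.State.Name -eq "stopped" }
        if ($stopped) { Start-EC2Instance -InstanceId ($stopped | ForEach-Object { $_.InstanceId }) }
    }
    elseif ($queued -eq 0) {
        $running = $agents | Where-Object { $_.State.Name -eq "running" }
        if ($running) { Stop-EC2Instance -InstanceId ($running | ForEach-Object { $_.InstanceId }) }
    }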

A couple of things I learned along the way

CloudFormation can be a pain in the butt for Windows. When it works for you it’s beers all round; when it doesn’t, you’ll be swearing long and hard into the night getting that goddamn, effing server to join the friggin domain! And yeah, if you’re using SharePoint, don’t use it. Matter of fact, find the person who recommended it as part of the technical solution and slap them; it’s a horrendously painful and complicated product to set up, and it does not play nice with CloudFormation or with re-attaching volumes from snapshots either. SharePoint – I hate it. There, I said it.

Conclusion

Using AWS for your dev and testing is an absolute win over the more traditional methods (physical servers – arrgh! VMware hosts forever running out of capacity).

Sure, it will take time and investment in your DevOps staff to plan for and use it appropriately (show me a new infrastructure technology that doesn't need that), but the payoff in being able to completely automate your environments, increase or decrease resources as you need, and scale up instances (such as your build servers when they are starting to run hard), together with the lower TCO, is impossible to ignore. Best of all, once you have done it once, and done it properly, you can reuse a lot of what you created for other projects and clients.