Thursday, October 20, 2011

How I fell in love with the stored procedure all over again

There was once a time when getting data in and out of a database meant embedding SQL script directly in application code. It was a horrible, ugly way to extract data: prone to error and, more often than not, a drain on performance, not just through inefficient connection pooling but also through the parsing of the raw SQL itself. Not to mention a serious security flaw in the form of SQL injection attacks, which burst onto the scene with the uptake of the web as the 1990s drew to a close and are still highly prevalent today.
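To make that concrete, here is a minimal sketch of the kind of data-access code that was everywhere at the time (the table, column and method names here are hypothetical, not taken from any real system):

    using System;
    using System.Data.SqlClient;

    class UnsafeQueryDemo
    {
        // DANGER: user input concatenated straight into the SQL text.
        // Input such as  ' OR '1'='1  turns the WHERE clause into a
        // tautology and returns every row; a payload like
        // '; DROP TABLE Customers; --  can be far worse.
        static void FindCustomers(string connectionString, string userInput)
        {
            string sql = "SELECT Id, Name FROM Customers WHERE Name = '"
                         + userInput + "'";

            using (var conn = new SqlConnection(connectionString))
            using (var cmd = new SqlCommand(sql, conn))
            {
                conn.Open();
                using (var reader = cmd.ExecuteReader())
                {
                    while (reader.Read())
                        Console.WriteLine(reader["Name"]);
                }
            }
        }
    }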

So why the stored proc?

To address these issues, many software systems started being architected with stored procedures as the standard API into the database. Beyond the obvious performance benefits there were security benefits too: because stored procedures are called with typed parameters, SQL injection attacks were effectively neutralised. In short, it was a win-win, and as their use grew more widespread, support for them in coding frameworks grew along with it: from the early days of COM-based ADO in Microsoft Visual Basic programming, through to ADO.NET with .NET, and of course the various incarnations of JDBC, amongst other languages. How data was extracted and processed was now controlled where it belonged: within the database.
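The parameterised equivalent of the earlier sketch, shown here with ADO.NET (dbo.GetCustomerByName is a hypothetical procedure), illustrates why injection gets no foothold: the user's input travels as a typed value, never as executable SQL text.

    using System;
    using System.Data;
    using System.Data.SqlClient;

    class StoredProcDemo
    {
        // Hypothetical procedure on the database side:
        //   CREATE PROCEDURE dbo.GetCustomerByName @Name NVARCHAR(100) AS
        //     SELECT Id, Name FROM Customers WHERE Name = @Name;
        static void FindCustomers(string connectionString, string userInput)
        {
            using (var conn = new SqlConnection(connectionString))
            using (var cmd = new SqlCommand("dbo.GetCustomerByName", conn))
            {
                cmd.CommandType = CommandType.StoredProcedure;

                // The input is bound as a parameter value, so it can never
                // change the shape of the statement being executed.
                cmd.Parameters.Add("@Name", SqlDbType.NVarChar, 100).Value = userInput;

                conn.Open();
                using (var reader = cmd.ExecuteReader())
                {
                    while (reader.Read())
                        Console.WriteLine(reader["Name"]);
                }
            }
        }
    }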

For all intents and purposes, leaving this domain of control in the database was a very good idea. Stored procedures, after all, get pre-compiled and optimised for performance within the database (at least in the bigger enterprise-grade products they do), which made for a very fast and efficient way to pull data in and out. But like any service-based layer, a stored procedure API is only as good as the code and architecture behind it, and as over-zealous teams and budget-squeezed projects relegated stored procedure programming to application developers, the results were often disastrous. Stored procedures would become bloated and rely on inefficient SQL, and the situation was only exacerbated when the RDBMS itself was poorly designed. Projects with good DBAs and database developers did not suffer the same fate, and it was soon realised that on large development projects a dedicated database programmer for stored procedures was essential.

But ultimately, as development processes matured, the goal of maintaining an architecturally pure, service-based façade around the data layer became achievable, in line with modern design best practices for software systems. To complement these advancements, frameworks started arriving that auto-generated the code around stored procedures to make their usage even more efficient. Writing code to call stored procedures was, after all, a laborious process, so any tool that could auto-generate the code wrappers to call them saved a great deal of time. As these frameworks grew more sophisticated, they matured into what are now known as Object Relational Mapping (ORM) tools.

These were heralded with great fanfare because they could bypass stored procedures altogether and instead generate, in code, a strongly-typed object model based on the tables and relationships in the database. Architects and developers flocked to them: no longer would a development team need a dedicated SQL programmer; architects could design the RDBMS (mostly) themselves, and a tool would write all the code that let developers use the data structures with every relationship maintained and protected. Hibernate, NHibernate, LLBLGEN, CodeSmith and many other tools flourished, and their uptake became so popular that almost overnight it seemed inconceivable to run a development project without them. Development teams claimed upwards of 30% reductions in development time, some even higher, and some architects even tried to justify why DBAs were no longer needed at all – ridiculous, I know, but I did hear it!
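As a rough illustration of what those tools give you, here is a hedged sketch in the style of Entity Framework Code First (the Customer/Order model and ShopContext are hypothetical, invented purely for this example):

    using System.Collections.Generic;
    using System.Data.Entity; // Entity Framework Code First, circa 2011
    using System.Linq;

    // Hypothetical model: the ORM maps these classes onto Customers and
    // Orders tables and maintains the foreign-key relationship between them.
    public class Customer
    {
        public int Id { get; set; }
        public string Name { get; set; }
        public virtual ICollection<Order> Orders { get; set; }
    }

    public class Order
    {
        public int Id { get; set; }
        public decimal Total { get; set; }
        public virtual Customer Customer { get; set; }
    }

    public class ShopContext : DbContext
    {
        public DbSet<Customer> Customers { get; set; }
        public DbSet<Order> Orders { get; set; }
    }

    class OrmQueryDemo
    {
        static List<Customer> BigSpenders()
        {
            using (var db = new ShopContext())
            {
                // No hand-written SQL anywhere: the ORM generates the
                // SELECT and the JOIN from this strongly-typed expression.
                return db.Customers
                         .Where(c => c.Orders.Any(o => o.Total > 1000m))
                         .ToList();
            }
        }
    }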

But then things got ugly.

Because creating a framework generic enough to manage complex relationships in code, based on a well-normalised relational database, gets very, very difficult the more joins you introduce. Underneath that lovely, generated, strongly-typed object model you still need to get the data out of the database – and that means SQL code still has to be written. So the framework has to generate that code, for every possible relation and permutation, and those statements get really big, really quickly, the more joins you make. Some of the biggest SELECT statements I have ever seen were produced by Microsoft's LINQ to SQL and the Entity Framework; the same goes for NHibernate and LLBLGEN, although the latter two are usually a bit better. All of these statements have to be parsed and compiled by the database, and that means the database starts taking a lot longer to get data in and out. Lazy loading and deferred execution have been added to these object models, which helps to address the immediate performance issues, but can you imagine being the architect who has to explain to the client's DBA why a SELECT query with three joins needs to be printed on an A3 sheet of paper for the system you have just crafted?
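To illustrate, consider a hypothetical strongly-typed query (reusing the ShopContext sketch above) and the shape of SQL an ORM has to emit for it. The commented SQL is illustrative only; real generated statements are usually far more verbose, with nested, aliased subqueries and every column spelled out, and the verbosity compounds with each extra join.

    using System.Linq;

    class JoinBlowupDemo
    {
        static void Run()
        {
            using (var db = new ShopContext()) // hypothetical context from above
            {
                // An innocent-looking strongly-typed query with a single join...
                var query = from c in db.Customers
                            join o in db.Orders on c.Id equals o.Customer.Id
                            where o.Total > 1000m
                            select new { c.Name, o.Total };

                // ...for which the provider must generate SQL text along
                // these lines (a simplified, made-up rendering):
                //
                //   SELECT [t0].[Name], [t1].[Total]
                //   FROM [Customers] AS [t0]
                //   INNER JOIN [Orders] AS [t1] ON [t0].[Id] = [t1].[Customer_Id]
                //   WHERE [t1].[Total] > @p0

                foreach (var row in query)
                    System.Console.WriteLine(row.Name + ": " + row.Total);
            }
        }
    }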

So where does this leave us?

Well, put simply:

  • Don't stop using stored procedures.
  • Do stop trying to build complex object models of database relationships, tables and structures in code.
  • Only use ORMs to manage calling stored procedures (see the sketch after this list).
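As a sketch of that last rule: with Entity Framework 4.1, Database.SqlQuery can materialise the results of a stored procedure into the strongly-typed model without the ORM generating any joins of its own (dbo.GetBigSpenders is a hypothetical procedure, and ShopContext is the sketch from earlier):

    using System.Collections.Generic;
    using System.Data.SqlClient;
    using System.Linq;

    class ProcThroughOrmDemo
    {
        static List<Customer> BigSpenders(decimal minTotal)
        {
            using (var db = new ShopContext())
            {
                // The ORM only maps rows onto strongly-typed objects;
                // the SQL itself lives in a stored procedure in the database.
                return db.Database
                         .SqlQuery<Customer>("EXEC dbo.GetBigSpenders @MinTotal",
                                             new SqlParameter("@MinTotal", minTotal))
                         .ToList();
            }
        }
    }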

But admittedly this is not doing ORMs justice. They do have their uses and their place, but they are not the total solution for pulling data from a database that many in this industry would have you believe. They work well for simple table inserts and updates, and when complex joining operations are not being performed. For small applications, or ones where calls to the database are infrequent, they are also very suitable. They cut down on the amount of code that has to be written by hand, and the strongly-typed object model provides safety around how data is managed.

Data caching is a big plus with ORMs. With stored procs, and indeed any calls to a database, the same static information gets served time and time again. ORMs have become good at caching data, so for static, referential data this is a big bonus, especially when volumes are large and the data is being served to web-based applications.
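The idea is simple enough to sketch without an ORM at all. This hypothetical helper uses .NET 4's MemoryCache to serve static reference data from memory after a single database round-trip, which is roughly what an ORM's second-level cache does for you transparently (loadCountries stands in for whatever actually hits the database):

    using System;
    using System.Collections.Generic;
    using System.Runtime.Caching; // .NET 4

    class ReferenceDataCache
    {
        public static IList<string> GetCountries(Func<IList<string>> loadCountries)
        {
            var cache = MemoryCache.Default;
            var countries = cache.Get("countries") as IList<string>;
            if (countries == null)
            {
                countries = loadCountries(); // the one database round-trip
                cache.Set("countries", countries, new CacheItemPolicy
                {
                    // Assumed policy: refresh the list once an hour.
                    AbsoluteExpiration = DateTimeOffset.Now.AddHours(1)
                });
            }
            return countries;
        }
    }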

Development-wise, ORMs also facilitate a much faster turnaround in getting the modelling of the data from the database done correctly, and of course the type-safety of the generated objects eliminates the plethora of bugs that hand-rolled database access layers always used to generate.

So what do we use: ORMs or stored procs?

Simple answer here: use the right tool for the job and follow some basic rules.
  • Big bulk inserts, lots of complex joins and database logic required – use a stored proc.
  • Simple CRUD apps, lots of reads of static referential data – use an ORM.

And yes, the two can coexist if you architect the design well enough (a sketch follows below), because no single solution can ever provide a blanket answer for an IT system when it comes to managing transactional and referential data.
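By way of a closing sketch, here is one hypothetical way the two can live side by side behind a single repository, reusing the ShopContext and EF pieces assumed earlier (dbo.ArchiveOrdersBefore is another invented procedure):

    using System;
    using System.Collections.Generic;
    using System.Data.SqlClient;
    using System.Linq;

    // Heavy, set-based logic stays in the database as stored procedures,
    // while the ORM handles the simple, strongly-typed reads.
    public class OrderRepository : IDisposable
    {
        private readonly ShopContext db = new ShopContext();

        // Complex, transactional work: delegate to a stored procedure.
        public int ArchiveOrdersBefore(DateTime cutoff)
        {
            return db.Database.ExecuteSqlCommand(
                "EXEC dbo.ArchiveOrdersBefore @Cutoff",
                new SqlParameter("@Cutoff", cutoff));
        }

        // Simple read of referential data: let the ORM do the work.
        public List<Customer> AllCustomers()
        {
            return db.Customers.ToList();
        }

        public void Dispose()
        {
            db.Dispose();
        }
    }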