Monday, July 4, 2011

Common Information Model, Canonical Schema, whatever you call it, just do it. Always.

Whatever name you apply to it, for any software being developed, be it a custom ground-up build or a piece of integration middleware, one of the first and foremost tasks of any designer is to model the data the system is going to use and the structures and relationships that compose it.

Before you start creating your sequence and activity diagrams, survey the domain of the problem you are trying to solve. Look at all the unique pieces and groupings of data that will be used throughout the software layers and interfaces, how the host systems categorise and organise relationships, and how the business requirements reference and refer to them, then use that information to create the model. You'll get a lot of this from the use cases being constructed (if they are thorough enough) and also from system interface specifications, database structures and screen layouts. If you're lucky the client may have already done this task on a previous project, in which case you may be able to leverage the work already done; sometimes you can find evidence of it within the enterprise architecture, although more often than not it will be very high-level and difficult to leverage without a lot of decomposition.

Generating the model is pretty straightforward: you can use any modelling tool that is available, but try to use one that can generate code from a class diagram, so that updates remain easy to maintain. My preferred tool of choice is Enterprise Architect by Sparx Systems, not just because it is Australian, but because it is simple to use, cheap and very, very powerful. Another option is to define the model in XML Schema; middleware tools such as BizTalk Server adopt this approach when defining their data schemas.
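To make that concrete, here is a minimal sketch (in Java, using hypothetical Customer and Order entities that are not from any real generated output) of the kind of classes such a tool might generate from a class diagram:

```java
// A minimal sketch of canonical model classes as a modelling tool might
// generate them from a class diagram. The Customer/Order entities and
// their fields are hypothetical examples for illustration only.
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

class Customer {
    private String customerId;
    private String name;
    // A Customer owns zero or more Orders (a 1-to-many association in the diagram)
    private final List<Order> orders = new ArrayList<>();

    public String getCustomerId() { return customerId; }
    public void setCustomerId(String customerId) { this.customerId = customerId; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public List<Order> getOrders() { return orders; }
}

class Order {
    private String orderNumber;
    private BigDecimal total;

    public String getOrderNumber() { return orderNumber; }
    public void setOrderNumber(String orderNumber) { this.orderNumber = orderNumber; }
    public BigDecimal getTotal() { return total; }
    public void setTotal(BigDecimal total) { this.total = total; }
}
```

Keeping the generated classes plain like this makes it cheap to regenerate them whenever the diagram changes.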

The level of modularity you build into the model is important and should take into consideration how the model can be extended and reused, both within the project you are working on and in potential future ones. One of my preferred methods to break a model down is to use a common database design theory known as normalisation. Once you have created your first drafts of the data model, start the process of normalisation and break it down so that it becomes more modularised and hence more extensible and reusable. The extent to which it gets broken down depends on what is appropriate for the system being built, so at least get it to second or third normal form and leave it there.
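To illustrate (with made-up Order and Address entities), here is what one normalisation step looks like when applied to a class model rather than a set of tables:

```java
// Before: a first-draft Order repeats the delivery address fields inline,
// so the same address structure would be duplicated anywhere else it appears.
class OrderDraft {
    String orderNumber;
    String deliveryStreet;
    String deliveryCity;
    String deliveryPostcode;
}

// After: the address grouping is factored out into its own entity
// (analogous to moving a repeating group into its own table during
// normalisation), so Address can now be reused by Customer, Supplier, etc.
class Address {
    String street;
    String city;
    String postcode;
}

class Order {
    String orderNumber;
    Address deliveryAddress;
}
```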

Once the model is defined, its usage should be permitted only within the layer it has been created for. A Business Logic Layer model should not be used in the Presentation Layer, nor in the Data Layer (those layers should model their own data accordingly). If it is an integration solution, the concept of internal and external schemas should be adhered to; the principle is the same. Exposing any of the model's entities within service interfaces should be forbidden, as the flow-on impact of a change to a model object will not be contained within the layer itself but will instead impact the services that expose it as well. For these reasons, all requests should be translated to and from the data model within the services that expose the interfaces. The following diagram illustrates this concept in more detail.


[Diagram: Encapsulated Data Model]
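To show what that translation looks like in code, here is a minimal sketch in Java; the CustomerDto, Customer and CustomerService names are hypothetical, but the shape is the point: the canonical entity never crosses the service boundary.

```java
// A sketch of translating to and from the canonical model at the service
// boundary. CustomerDto, Customer and CustomerService are invented names.

// External contract exposed by the service interface; it can be versioned
// independently of the internal model.
class CustomerDto {
    public String id;
    public String displayName;
}

// Internal canonical model entity; never exposed outside its layer.
class Customer {
    private final String customerId;
    private final String name;
    Customer(String customerId, String name) {
        this.customerId = customerId;
        this.name = name;
    }
    String getCustomerId() { return customerId; }
    String getName() { return name; }
}

class CustomerService {
    // Requests are translated into the model on the way in...
    public CustomerDto getCustomer(String id) {
        Customer customer = loadFromModel(id);
        // ...and translated back out on the way out, so a change to
        // Customer stays contained within this layer.
        CustomerDto dto = new CustomerDto();
        dto.id = customer.getCustomerId();
        dto.displayName = customer.getName();
        return dto;
    }

    private Customer loadFromModel(String id) {
        return new Customer(id, "Example Customer"); // stand-in for a real lookup
    }
}
```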

Now I bet some of you read that last paragraph and thought it was a load of crap. If you didn't, good. If you did, then consider this question: why did the major database vendors start incorporating stored procedures into their platforms to control access to the data held within tables, offering an alternative to making direct table access calls from code? Not sure? Because direct access was a bad idea 20 years ago and it still is now. Keeping that in mind, let's ponder another point: it is both accepted fact and considered best practice within the IT industry that every logical layer of a software system should have a boundary of controlled entry points; that these entry points must not be bound to the data structures and logic within, to avoid exposing data and logic (which can sometimes be a security issue); and that the entry points should be able to be versioned and extended without impacting the functionality underneath. Sound familiar? This is one of the principles that govern the implementation of service-based systems, also known as being part of a broader SOA implementation. See how what I have described above is just the same pattern? Yes, you could avoid it on small applications where the code base is small, but if you don't do it on enterprise-scale applications with large development and design teams you'll be screwed, so why not follow the same pattern and just make it a habit? At times it may be a bit more work, but I believe the trade-offs are worth it.
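And to tie the two ideas together, here is one last hedged sketch (the OrderServiceV1/V2 and OrderProcessor names are invented) of entry points that can be versioned and extended without touching the logic underneath:

```java
// A sketch of versioned entry points over unchanged internal logic.
// All names here are hypothetical illustrations.

// Internal logic and model stay private to the layer.
class OrderProcessor {
    String submit(String orderNumber, int quantity) {
        // ...real processing against the internal model would happen here
        return "accepted:" + orderNumber;
    }
}

// Version 1 of the public entry point.
interface OrderServiceV1 {
    String submitOrder(String orderNumber);
}

// Version 2 extends the contract without touching OrderProcessor.
interface OrderServiceV2 {
    String submitOrder(String orderNumber, int quantity);
}

class OrderServiceFacade implements OrderServiceV1, OrderServiceV2 {
    private final OrderProcessor processor = new OrderProcessor();

    @Override
    public String submitOrder(String orderNumber) {
        return processor.submit(orderNumber, 1); // v1 callers get a default quantity
    }

    @Override
    public String submitOrder(String orderNumber, int quantity) {
        return processor.submit(orderNumber, quantity);
    }
}
```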
