I have been exploring several web development platforms for quite a few months now. It is not that there is a shortage of great frameworks out there (I am not averse to learning a new language in order to use a good framework), and I did play around with a few really good ones, but I have a few stringent conditions that I want the framework to satisfy:
Multi-tenancy
Since my goal is to build an application that provides hosted services to multiple organizations, the framework must treat multi-tenancy as a first-class feature. There’s always scope for some heated debate on whether multi-tenancy is the best approach, especially when it comes to isolating data between multiple clients. One of these discussions, which I found quite informative, is this.
Multi-tenancy is definitely not the best solution for all usage scenarios; one could argue that multiple single-tenant databases are easier to scale out horizontally. However, scaling out is not impossible with the multi-tenant model, and it does save me certain overheads, like maintaining multiple copies of common configuration data.
My reasons for opting for multi-tenancy are the lowered upfront and recurring infrastructure costs compared to running single-tenant-per-db solutions, and an easier maintenance/upgrade path. However, since the job of isolating data between clients is managed exclusively at the application level, that implementation has to be absolutely water-tight.
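To make "isolation at the application level" concrete, here is a minimal sketch in Python of a data access object that is bound to one tenant and adds the tenant filter to every query itself. The `notes` table, the `TenantScopedNotes` class and the tenant ids are all hypothetical names invented for illustration; a real framework would do this inside its persistence layer.

```python
import sqlite3


class TenantScopedNotes:
    """Data access object bound to a single tenant (hypothetical schema)."""

    def __init__(self, conn: sqlite3.Connection, tenant_id: int) -> None:
        self._conn = conn
        self._tenant_id = tenant_id

    def add(self, body: str) -> None:
        # The tenant id comes from the bound context, never from the caller.
        self._conn.execute(
            "INSERT INTO notes (tenant_id, body) VALUES (?, ?)",
            (self._tenant_id, body),
        )

    def all(self) -> list:
        # Every read carries the tenant filter, so rows of other tenants
        # in the same shared table are never returned.
        rows = self._conn.execute(
            "SELECT body FROM notes WHERE tenant_id = ? ORDER BY id",
            (self._tenant_id,),
        )
        return [body for (body,) in rows]


# One shared table for all tenants, as in a multi-tenant database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE notes ("
    "id INTEGER PRIMARY KEY, tenant_id INTEGER NOT NULL, body TEXT NOT NULL)"
)

acme = TenantScopedNotes(conn, tenant_id=1)
globex = TenantScopedNotes(conn, tenant_id=2)
acme.add("acme roadmap")
globex.add("globex payroll")
```

The point of the design is that application code never writes the `WHERE tenant_id = ?` clause by hand, so there is only one place where the isolation can go wrong.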
Native Application State
‘Shared-nothing’ platforms, like PHP, do not have a built-in application state. Once again, it is ultimately a matter of opinion as to whether or not this is a good thing, but I personally prefer systems where the bootstrap process ideally takes place just once in the lifetime of the application.
Devoid of a native provision for long-lived objects, a stateless platform has to bootstrap the entire framework for every single request that it processes. This is because all objects, class definitions, and compiled bytecode in general are confined to the scope of the request itself. While this does make thread-safe programming a no-brainer, it incurs a severe overhead, since the object graph and other in-memory data structures have to be rebuilt for each request (even those whose data are request-independent). No wonder it performs poorly when compared to a platform like Java, in which an object, once loaded into memory (even when triggered by a request), can legitimately outlive the request itself. Future requests can then reuse the loaded object, which saves processing time.
The lack of an application state can be offset by using opcode caches like APC (for PHP), which can cache compiled bytecode, and even objects, across multiple requests. (Note that doing this essentially violates the shared-nothing principle, one of the fundamental tenets of PHP.) Memcache-based solutions can also be used as an alternative to, or in conjunction with, APC. However, these solutions are not built into PHP, and thus require additional modules and/or coding in order to use (which also means there is additional code to execute). Expiring externally cached objects is also a non-trivial issue, since a separate garbage collector must be designed for that. At the end of the day, nothing can beat the speed of directly accessible, in-process-memory caching (with no protocol overheads) that native application state offers. Here’s an interesting Q&A with David Strauss, creator of Pressflow (essentially Drupal on steroids). Just the following excerpt from one of his answers should drive home the point:
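The difference between the two models can be shown with a toy sketch in Python. It is not a benchmark and does not model PHP or Java specifically; `bootstrap`, `get_app` and the route table are hypothetical stand-ins, and the counter only records how often the expensive setup step runs.

```python
import functools

bootstrap_calls = 0  # counts how often the expensive setup runs


def bootstrap() -> dict:
    """Stand-in for building the framework's object graph (hypothetical)."""
    global bootstrap_calls
    bootstrap_calls += 1
    return {"routes": {"/hello": "Hello, {name}!"}}


def handle_shared_nothing(path: str, name: str) -> str:
    # Shared-nothing style: everything is rebuilt for every request.
    app = bootstrap()
    return app["routes"][path].format(name=name)


@functools.lru_cache(maxsize=None)
def get_app() -> dict:
    # Application-state style: built on first use, then kept in process memory.
    return bootstrap()


def handle_with_app_state(path: str, name: str) -> str:
    return get_app()["routes"][path].format(name=name)


for _ in range(3):
    handle_shared_nothing("/hello", "Ann")
calls_shared_nothing = bootstrap_calls  # one bootstrap per request

bootstrap_calls = 0
for _ in range(3):
    handle_with_app_state("/hello", "Ann")
calls_with_state = bootstrap_calls  # one bootstrap for all requests
```

The flip side, as noted above, is that the long-lived object is now shared between requests, so anything mutable in it has to be handled with thread safety in mind.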
Because the overhead is largely on the PHP side, Pressflow is exploring ways to accelerate common functions by offloading select parts of core to Java (which blows away PHP + APC on a modern Java VM) and performing expensive page assembly and caching operations with systems like Varnish’s ESI and nginx’s SSI.
No prizes for guessing what gives Java this performance edge ;-). Even a simple PHP application needs help from external caching, and other auxiliary mechanisms in order to satisfactorily serve anything more than a handful of requests per second.
Independently accessible service layer
Say we have an application up and running, and it needs to be accessible through multiple devices, and over multiple channels like HTML, REST/SOAP, RSS and what not. Most platforms come pre-packaged with a scaffolding system for building the presentation layer on HTML by default. This is not a bad thing in general, except when I’m accessing the app from something other than a web browser, such as a mobile app (with screens built in), or from another webapp. In cases like this, I would like not even the slightest overhead to be incurred in loading and building any part of any presentation layer that is not required for serving these requests.
This is possible only when the framework has been designed from the ground up with ‘headless’ operation in mind. Basically this translates to a totally detachable service layer that can be invoked independently of the default presentation system.
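As a rough illustration of what a detachable service layer means, the Python sketch below keeps one service function free of any presentation concerns and puts each channel in its own thin adapter. The function names and the order data are hypothetical, made up only for this example.

```python
import html
import json


def list_open_orders(customer: str) -> list:
    """Service method: plain data in, plain data out (hypothetical data)."""
    orders = [
        {"id": 1, "customer": "ann", "open": True},
        {"id": 2, "customer": "ann", "open": False},
        {"id": 3, "customer": "bob", "open": True},
    ]
    return [o for o in orders if o["customer"] == customer and o["open"]]


def render_html(customer: str) -> str:
    # Browser channel: presentation code lives only in this adapter.
    items = "".join(f"<li>Order {o['id']}</li>" for o in list_open_orders(customer))
    return f"<h1>{html.escape(customer)}</h1><ul>{items}</ul>"


def render_json(customer: str) -> str:
    # API channel: calls the same service and touches no HTML code at all.
    return json.dumps({"orders": list_open_orders(customer)})
```

A mobile client or another webapp would go through `render_json` (or call the service directly), so nothing from the HTML path is loaded or executed on its behalf.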
Lightweight domain objects
I’ve come across a few excellent frameworks that do something I find really strange, and the reason for which escapes me. Their domain modelling paradigm dictates that all business logic pertaining to a specific domain model be contained in the model itself. Really??! In cases where one is building large lists of objects of a particular ‘heavyweight’ domain class, this embedded business logic is simply bloating the memory usage. Now I get the part about static methods and variables (before the slingshots come out ;-) ) which are instantiated just once per class and not per object, but static members were designed with different architectural goals in mind, and not specifically as a memory saving construct. Hence they do help reduce the overhead somewhat, but not by much (every object still needs to maintain pointers to reference the static members).
Another problem is: where do you put logic whose concern spans multiple domain classes? Or logic that has nothing to do with domain classes at all? Never-ending source of confusion, that.
I would rather go for a design which treats domain objects as simple data beans, with no more than the simplest validation rules built in. The heavy lifting of business logic should be borne by a dedicated service layer. This approach also simplifies the implementation of independently accessible service methods that I outlined in the previous section.
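Here is a small sketch of that split in Python, assuming a hypothetical ordering domain. `LineItem`, `Customer` and `PricingService` are illustrative names only: the domain classes carry data and one trivial validation rule, while the pricing rule, which spans both classes, sits in the service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineItem:
    """Plain data bean: fields plus the simplest validation only."""
    sku: str
    quantity: int
    unit_price_cents: int

    def __post_init__(self) -> None:
        if self.quantity <= 0:
            raise ValueError("quantity must be positive")


@dataclass(frozen=True)
class Customer:
    """Plain data bean with no behaviour of its own."""
    name: str
    discount_percent: int = 0


class PricingService:
    """Business logic that spans LineItem and Customer lives here."""

    def order_total_cents(self, customer: Customer, items: list) -> int:
        subtotal = sum(i.quantity * i.unit_price_cents for i in items)
        # Integer arithmetic in cents avoids floating-point rounding issues.
        return subtotal * (100 - customer.discount_percent) // 100


items = [LineItem("pen", 2, 150), LineItem("pad", 1, 400)]
total = PricingService().order_total_cents(Customer("Ann", discount_percent=10), items)
```

This also answers the question of where cross-class logic goes: the discount calculation belongs to neither `LineItem` nor `Customer`, so it has an obvious home in the service.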
Pluggable storage backend
This is a short one. Most frameworks support interchanging one RDBMS with another fairly smoothly. I want to throw NoSQL stores into the mix. I want to be able to plug in MongoDB or Couchbase, for instance, to supplement the RDBMS with certain functions that NoSQL DBs excel at, but I don’t want to change the way I use the persistence layer. Whatever the technology I use for abstracting the storage functionality in the application, its API must let me work seamlessly with non-relational data stores as well.
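What I have in mind looks roughly like the Python sketch below: the application codes against one small interface, and the backend behind it can be relational or not. `KeyValueStore`, `SqliteStore` and `InMemoryDocumentStore` are hypothetical names; the in-memory class is only a stand-in for a document database such as MongoDB or Couchbase, not a use of their actual client APIs.

```python
import json
import sqlite3
from typing import Optional, Protocol


class KeyValueStore(Protocol):
    """Hypothetical persistence interface the application codes against."""

    def put(self, key: str, doc: dict) -> None: ...

    def get(self, key: str) -> Optional[dict]: ...


class SqliteStore:
    """Relational backend: each document is serialized into a table row."""

    def __init__(self) -> None:
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute(
            "CREATE TABLE docs (key TEXT PRIMARY KEY, body TEXT NOT NULL)"
        )

    def put(self, key: str, doc: dict) -> None:
        self._conn.execute(
            "INSERT OR REPLACE INTO docs (key, body) VALUES (?, ?)",
            (key, json.dumps(doc)),
        )

    def get(self, key: str) -> Optional[dict]:
        row = self._conn.execute(
            "SELECT body FROM docs WHERE key = ?", (key,)
        ).fetchone()
        return json.loads(row[0]) if row else None


class InMemoryDocumentStore:
    """Stand-in for a document database; a dict keeps the sketch runnable."""

    def __init__(self) -> None:
        self._docs = {}

    def put(self, key: str, doc: dict) -> None:
        self._docs[key] = dict(doc)  # store a copy, not the caller's object

    def get(self, key: str) -> Optional[dict]:
        doc = self._docs.get(key)
        return dict(doc) if doc is not None else None


def save_and_reload(store: KeyValueStore, key: str, doc: dict) -> Optional[dict]:
    # Application code sees only the interface, never the concrete backend.
    store.put(key, doc)
    return store.get(key)
```

Calling `save_and_reload` with either store gives the same result for the same document, which is the property I want from the persistence API. A real abstraction would also need querying, transactions and so on, which this sketch leaves out.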
That more or less covers my wishlist of things I’m looking for in a web framework. In order to keep the title short, I didn’t mention that it has to be open source as well (yes it does :-) ). I think I might have found one that manages to check all the boxes on the list, but I’m open to suggestions.