….to its brand new home at
Using the site building platform DocPad, it becomes much easier to embed code snippets and such. This also gives me full control over the site.
Oh, and no pesky ads… up yours, WP!
I will still keep this site up, and respond to the stray comment that may find its way here. Do try to use the new site hereon though :).
One of my initial excursions into Sails.js territory:
I had posted a while back about a web-based front-end to aria2c that I started building, primarily as an exercise in learning Node.js. While it has been functional and available on GitHub for some time, I now have the pleasure to announce that it is now available directly through the NPM Registry.
Needless to say, my JS knowledge base required a few upgrades before I could put together anything smarter than a ‘Hello World’ responder. Said upgrades, among other excellent sources, I have found here, here, here along with the JS object graph learning trail (part 1, part 2 and part 3). In fact, most of How to Node is a must-read if you’re planning on serious Node programming.
But this post is not just about my experience learning server-side JS. In the process of upgrading my overall web development repertoire, I’ve had to undergo quite a steep ramp-up on client-side JS technologies as well…
The biggest problem still, was the lack of flexibility, in spite of all the automation and scaffolding (or perhaps because of it), in building a custom UI to one’s exact liking (esp. with dynamic scaffolding, where you lose significant control over the finally generated DOM). Including client-side styling libraries like Bootstrap would require jumping through increasingly tight hoops in order to make the auto-generated templates adhere to the specific DOM and CSS requirements dictated by such libraries.
As of today, I’m still delving deeper into the fascinating world of server-side JS runtimes, the accompanying middleware (Connect and Express being among the biggest names here), while also being repeatedly amazed by the power of modern client-side frameworks. The JS landscape is already incredibly expansive, and continues to grow at a frightful rate, as each day heralds the launch of several new libraries, frameworks and tools that make JS programming all the more exciting and enlightening.
My journey of exploration has already started bearing fruit, and has empowered me to give back to the wonderful open source community (JS or otherwise) that I have received so much from. My current efforts are focused on building a JS-based e-commerce platform, and I am confident I have selected the right platform for the necessary pace of development and superior performance that I demand of the product-to-be.
When I was looking for a suitable OSS license under which to release my download manager, I came across this very useful post by Ed Burnette. I converted his post, describing the decision process, into a (somewhat simplified) flowchart.
As already mentioned in the original post, this information is not a substitute for professional legal advice.
I will take a detour from my usual trend of writing purely about technology, and instead reflect on how and why so many software projects get into a terribly messy state of affairs, often bad enough to be terminated.
I’m trying to understand why, in spite of there being such a wealth of information, training, tools and techniques available on how to run a smooth ship, so little is actually implemented or consistently exercised in most places. I will limit my focus to a subset of problems for which I may be able to hint at a solution through a process of selective elimination of complexities and implementation overheads. Technology can obviously never replace a competent human for addressing personnel-related matters like conflicts, assessment, morale, etc. (although it can still be used to gather crucial data points that help measure individual and team performance), and I am not trying to shoehorn it into such a role.
Every project, small or big, starts out with a lot of energy, zeal and a headstrong optimism among the stakeholders regarding the successful and timely delivery of end products. Assuming there is no deliberate malice or vested interest in failure, no one in a decision-making role would set the ball rolling if they predicted near-certain collapse. How then, do so many projects spiral into an uncontrollable vortex of botched deadlines, leading to increased pressure and reduced morale, leading to more botched deadlines and ultimately, attrition, blame games, broken deliverables and even total shutdown? A bit of googling would provide a flood of information pointing to issues with personnel, techniques, tools and circumstance. I will focus on just a few, outlining how non-invasive, data-driven solutions can help in addressing some of them.
An often observed chain of events in many projects I have worked on, goes something like this:
- Project Manager asks for overall status from team.
- Team reports completion status and ETA from seat-of-the-pants guesstimates, often intentionally hiding flare spots and backlogs.
- Project Manager gets an overall picture that’s rosier than what the hard ground realities would depict, and based on this information, assigns further tasks and deadlines to team, often too aggressive to be realistically achievable.
- Team accepts tasks and deadlines without question, mostly for the same reason they concealed the bad news in the first place.
- Deadlines fly by, most tasks are in a pitiful state of disarray, and no one can satisfactorily answer why, or what caused this mess in the first place.
Sound familiar? A point to note here is that this can happen even to the most competent and dedicated teams, since effort estimation and progress tracking over long periods of time are continuous, and non-trivial tasks, and not always diligently performed. The delay/failure in pushing out today’s deliverables may have been caused by an insidious scope creep in the recent past from another project, eating into the time and resources of the project at hand. However, people can’t always connect the dots at a moment’s notice. The data is often just too thinly spread out or too deeply nested in several layers of loosely related or unrelated information, or just too old and foggy, and hence not easily visible/traceable.
Another well-known tendency in any chain of command, is for news to get more and more inaccurate and imprecise, the further it percolates up the org pyramid. This is especially true when the most granular data is hard to get at or comprehend, and people at higher levels rely instead on summary information prepared by those immediately below them. Typically, since no one wants to share bad news too willingly, it is only the good bits that are polished, often exaggerated and passed on for consumption up the corporate ladder, while the bad ones are quietly swept under the rug in the hope that no one will go sniffing about.
I have personally observed this phenomenon even in very small teams with greatly flattened hierarchies. Human psychology is obviously playing a very prominent role here, and one needs to allow for the fact that the mere presence of distorted data does not necessarily imply deliberate falsification.
So what can one do to improve the overall visibility and predictability in attaining project milestones? At the very least, we need to make atomic data accessible to all layers in the management stack. Raw data, when coupled with advanced analytical tools, can reduce or even totally eliminate the chances of error buildup. Reporting and analysis tools abound, but it all has to begin with collecting the raw data from deep down in the trenches.
However, when it comes to collecting data at the grassroots level, one cannot reasonably expect all developers, designers and testers to continually push out micro-updates on the progress of their work items. It is disruptive, inefficient and basically extremely annoying to have to do so on a regular basis. Processes and tools that work in the background to automate the collection of atomic progress data, therefore, are key to enabling optimum visibility across the organization for any software project.
Real and Perceived Overheads in Supervision / Management
I am strongly opposed to micro-management – exercising too much control over how a developer works has been proven to be counter-productive in many studies, and I can corroborate the same from my own experience. However, I don’t lobby for a total absence of management either. Rather, during the life of a running project, I think management needs to perform, with minimal intrusion and high precision, two jobs in particular:
- Smartly allocate tasks to the resource pool based on availability and effective pairing with core competencies.
- Identify tasks that are lagging behind schedule, and selectively focus on them to facilitate speedy completion.
Both these duties can be most effectively carried out when there is adequate, up-to-date data available in order to compute resource availability and progress of individual items. A manager’s task can be further assisted by automated planning software, using constraint programming algorithms to suggest time and resource allocation strategies. Driving this automation, again, is the raw data that needs to captured at every stage of the project life-cycle.
Real and Perceived Overheads in Using Project Tracking Tools
I’ve often noticed, especially in small companies with small teams and few, if any rigidly enforced processes, a distinct tendency for issue trackers and sophisticated project tracking tools to be under-utilized. Very often, at the time of project kickoff, a brand new ‘tracker’ is created on a spreadsheet, and a quick RTFM is disbursed to the team before it is let out into the wild.
This ‘tracker’ is diligently used and updated for perhaps the first 2-3 weeks. Following this initial phase of rigorous adherence is a marked decline in its usage. Limited usability, constant maintenance of fickle formulas, lack of scalability and lack of integration with external tools like issue trackers, version control systems and continuous integration systems are just a few reasons why most developers quickly learn to loathe a spreadsheet-based project tracker. For them it is no more than a gigantic, metric butt-load of (barely justifiable) double data entry. Data is already being constantly captured in various centralized as well as local development tools (VCS, CI, Bug Trackers, IDE, etc.). This data can and should be automatically fed into the project tracker so that it remains up-to-date without much manual input. Developers would much prefer to work on a project that tracks itself.
Another common ailment that many non-trivial projects suffer from is the reluctance of non-technical stakeholders, often business/end users, to formally enter their requirements, issues and feedback into an issue tracker, and the confusion that arises from the disconnect. Developers are bad at remembering the minute details of feedback and change requests. Email can work only for the simplest projects, with no more than 5-6 people involved in all. Input often comes in through phone calls, instant messages, SMS and direct conversation. As the project grows, it becomes cumbersome and error-prone to organize and track this barrage of information in an overly simplistic tracking tool.
One can always argue that the end users should not have to learn a new and complicated system. However, the core issue here is that most project and issue tracking systems lack a simple, intuitive user interface, geared towards simplifying the chore of submitting feedback. A system with such an interface in place would be easier to pitch to the layman, and thus enjoy much better adoption. Whenever possible, this input mechanism should be very tightly integrated with the feedback channels offered by the end product. However, an easy interface can only partially lower the barrier to adoption of tracker usage. The rest of the data must be captured, as far as possible, directly from their sources. Integrating a tracker with these channels can be a challenging task though.
Lack of Continous Integration and Automated Build Systems
Continuous integration systems and automated building and health monitoring systems are usually setup only for large projects with a sizable team, in medium to large enterprise scale operations. These tools ensure that mission-critical software goes through a good deal of testing and quality checks before it is shipped out. Smaller projects do not have such elaborate setups in place, mostly due to time and budget constraints, as well a (misguided) belief that they can manage without them.
Truth is, no project lasting longer than a few months and involving more than 2-3 people can be effectively monitored in the long run for new bugs or build breakage in borderline test cases without some process facilitated by a continuous build system. Small projects, in spite of their relative simplicity, are still just as mission-critical to a small company as a big project is to a big company. Hence, it is a mistake to overlook the importance of putting these systems in place.
Overall, I have not proposed any radically new idea or magic cure to heal an ailing project. I am merely driving at the point that although the answers are out there for solving most problems of project management, they often exist in mutual isolation. The need of the hour is to bring them all under one banner and provide an integrated solution that, while being comprehensive and powerful enough to suit the needs of large and complex projects, is also affordable and easy to setup and use, so that smaller projects can reap the benefits of adopting them as well.
- Your dev setup is on Linux, or other Bash-compatible. (Build script written for Bash.)
- You use Git for SCM. (The build script will use git for some operations. Feel free to alter the script, and get rid of this dependency.)
- The app is structured as recommended by the angular-seed project. (Build script expects certain folders to be present at specific locations. You can adapt it to your project structure.)
Go grab the following:
- YUI Compressor
- Gzip (Installed by default on most *NIX systems. Pull from distro repos otherwise.)
- Node.js (For testing stuff on your localhost. Not required if you have/prefer some other server for delivering static content.)
- Stomach for shell scripts
I wanted to split my web application into two distinct components:
- A client-side, JS-driven presentation layer.
- A lightweight, REST-based backend.
I’ve had to sort out a lot of issues to get both to cooperate while running on different servers on different domains, use digest-based authentication instead of cookies (REST is stateless), and so on, but that’s another post. This one focuses on efficiently delivering the UI portion – HTML + CSS + JS + Media – which from a server POV is static content.
Preparing AngularJS Scripts for Minification
The AngularJS docs provide some information on how to prepare controllers for minification here. Quoting from the page:
PhoneListCtrlcontroller, all of its function arguments would be minified as well, and the dependency injector would not be able to identify services correctly.
PhoneListCtrl is part of the angular-phonecat application, used for driving the on-site tutorial.
Basically, every controller defined by your application needs to be explicitly injected with whatever dependencies it has. For the example above, it looks something like:
PhoneListCtrl.$inject = ['$scope', '$http'];
There is one more way defined on the site, but I prefer the method above.
However, this is not enough to get minified scripts working right. YUI Compressor changes closure parameter names, and this doesn’t go down well with Angular. You need to use inline annotations in defining custom services. You can find a usage example here.
Additionally, you can collate all content from controllers.js, directives.js, services.js and filters.js into app.js to reduce the number of calls made to the server.
Don’t forget to modify your index.html / index-async.html to reflect this change.
The Build Script
If you’re sticking to the folder structure provided by angular-seed, you’ll have an app folder in your project root. Adjacent to this, create a build folder to contain the minified and compressed output files generated by the build script. You can tell git to ignore this folder by adding the following line to .gitignore:
You can put your build script anywhere you like, and run it from anywhere in the project folder. I have put it inside the conveniently provided scripts folder.
#!/bin/bash ccred=$(echo -e "\033[0;31m") ccyellow=$(echo -e "\033[0;33m") ccgreen=$(echo -e "\033[0;32m") ccend=$(echo -e "\033[0m") exit_code=0 cd "$(git rev-parse --show-toplevel)" #Minify echo -e "$ccyellow========Minify========$ccend" for ext in 'css' 'js' do for infile in `find ./app -name *.$ext |grep -v min` do outfile="$(echo $infile |sed 's/\(.*\)\..*/\1/').min.$ext" echo -n -e "\nMinifying $infile to $outfile: " if [ ! -f "$outfile" ] || [ "$infile" -nt "$outfile" ] then yuicompressor "$infile" > "$outfile" if [ `echo $?` != 0 ] then exit_code=1 echo -e "\n$ccred========Failed minification of $infile to $outfile . Reverting========$ccend\n" >&2 git checkout -- "$outfile" || rm -f "$outfile" else echo $ccgreen Success.$ccend fi else echo $ccgreen Not modified.$ccend fi done done #Compress / Copy echo -e "\n\n$ccyellow========Compress / Copy========$ccend\n" for infile in `find ./app -type f -not -empty` do filetype="$(grep -r -m 1 "^" "$infile" |grep '^Binary file')" outfile="./build/$(echo $infile |cut -c7-)" mkdir -p $(dirname "$outfile") if [ ! -f "$outfile" ] || [ "$infile" -nt "$outfile" ] then if [ "$filetype" = "" ] #Compress text files then echo -n -e "\nCompressing $infile to $outfile: " gzip -c "$infile" > "$outfile" else #Copy binary files as is echo -e -n "\nCopying $infile to $outfile: " cp "$infile" "$outfile" fi if [ `echo $?` != 0 ] then exit_code=2 echo -e "\n$ccred========Failed compress / copy of $infile to $outfile . Reverting========$ccend\n" >&2 else echo $ccgreen Success.$ccend fi else echo -e "$infile -> $outfile: $ccgreen Not modified.$ccend\n" fi done echo -e "\n$ccyellow========Finished========$ccend" exit $exit_code
Once you run this script, every app/file.[css | js] would have a working copy at build/file.min.[css | js]. Every other file in the app folder will be either:
- compressed and copied (name unchanged) into the build folder if it is a text file, or
- simply copied into the build folder if it is a binary file (like an image).
Your CSS and JS references need to be updated to their corresponding min versions in index.html / index-async.html.
Now that you’ve got a compressed, minified version of your app in the build folder, you can deploy it to any static server. But you do need to set your HTTP response headers properly, or the browser WILL show garbage. Most importantly, any compressed content must be served with the HTTP response header:
Additionally, for every file that is static content, it makes sense to set a far future date using an Expires header similar to the following:
Expires: Thu, 31 Dec 2037 20:00:00 GMT
The NodeJS web-server.js Script
The contents of the build folder are technically ready to be uploaded to any web server, but you will probably want to run the app from your localhost to first check if everything works fine. The built-in web-server.js is very useful to quickly launch and test your app, but it needs a few mods in order to serve the compressed content from the build folder correctly. The Content-Encoding header is sufficient to render the page correctly, but if you’re a stickler for good YSlow grades even on your localhost, you will want to add the Expires headers as well. Search for the following response codes in your web-server.js and add the lines listed below:
- 200 (writeDirectoryIndex), 500, 404, 403, 301:
'Expires': 'Thu, 31 Dec 2037 20:00:00 GMT',
- 200 (sendFile) (After var file = fs.createReadStream(path);):
That’s it! Now when you run the web-server.js, all content from the build folder will be correctly served with the ‘gzip’ header (unless it is a binary).
I have been exploring several web development platforms for quite a few months now. It is not that there is a shortage of great frameworks out there (I am not averse to learning a new language in order to use a good framework), and I did play around with a few really good ones, but I have a few stringent conditions that I want the framework to satisfy:
Since my goal is to build an application that provides hosted services to multiple organizations, the framework must be one that treats multi-tenancy as a first class member in their feature list. There’s always scope for some heated debate on whether multi-tenancy is the best approach, especially when it comes to isolating data between multiple clients. One of these discussions, which I found quite informative, is this.
Multi-tenancy is definitely not the best solution for all usage scenarios; one could argue that multiple single tenant databases are easier to scale out horizontally. However, scaling out is not entirely impossible with the multi-tenant model, and it does save me certain overheads like multiple-maintenance of common configuration data.
My reasons for opting for multi-tenancy are the lowered upfront and recurring infrastructure costs compared to running single-tenant-per-db solutions, and an easier maintenance/upgrade path. However, since the job of isolating data between clients is managed exclusively at the application level, that implementation has to be absolutely water-tight.
Native Application State
‘Shared-nothing’ platforms, like PHP, do not have a built-in application state. Once again, it is ultimately a matter of opinion as to whether or not this is a good thing, but I personally prefer systems where the bootstrap process ideally takes place just once in the lifetime of the application.
Devoid of a native provision for long-lived objects, a stateless platform has to bootstrap the entire framework for every single request that it processes. This is because all objects, class definitions, and all compiled bytecode in general, are restrained to scope of the request itself. While this does make thread-safe programming a no-brainer, it incurs a severe overhead for having to rebuild the object graph and other data structures in memory for each request (even those whose data are request-independent). No wonder it performs very poorly when compared to a platform like Java, in which an object once loaded into memory (even when triggered by a request) can legitimately outlive the request itself, thus saving time while processing future requests, since they can reuse this loaded object.
The lack of an application state can be offset by using op-code caches like APC (for PHP) which can cache compiled bytecode, and even objects across multiple requests (Note that doing this essentially violates the shared-nothing principle, one of the fundamental tenets of PHP). Memcache-based solutions can also be used as an alternative to, or in conjunction with APC. However, these solutions are not built into PHP, and thus require additional modules and/or coding in order to use (this also means there is additional code to execute). Expiring externally cached objects is also a non-trivial issue, since a separate garbage collector must be designed for that. At the end of the day, nothing can beat the speed of directly accessible, in-process-memory caching (with no protocol overheads) that native application state offers. Here’s an interesting Q&A with David Strauss, creator of Pressflow (essentially Drupal on steroids). Just the following excerpt from one his answers should drive home the point:
Because the overhead is largely on the PHP side, Pressflow is exploring ways to accelerate common functions by offloading select parts of core to Java (which blows away PHP + APC on a modern Java VM) and performing expensive page assembly and caching operations with systems like Varnish’s ESI and nginx’s SSI.
No prizes for guessing what gives Java this performance edge ;-). Even a simple PHP application needs help from external caching, and other auxiliary mechanisms in order to satisfactorily serve anything more than a handful of requests per second.
Independently accessible service layer
Say we have an application up and running, and it needs to be accessible though multiple devices, and over multiple channels like HTML, REST/SOAP, RSS and what not. Most platforms come pre-packaged with a scaffolding system for building the presentation layer on HTML by default. This is not a bad thing in general, except when I’m accessing the app from something other than a web browser, such as a mobile app with (screens built in), or from another webapp. In cases like this, I would like not even the slightest overhead to be incurred in loading and building any part of any presentation layer that is not required for serving these requests.
This is possible only when the framework has been designed from ground up with ‘headless’ operation in mind. Basically this translates to a totally detachable service layer that can be invoked independently of the default presentation system.
Lightweight domain objects
I’ve come across a few excellent frameworks that do something I find really strange, and the reason for which escapes me. Their domain modelling paradigm dictates that all business logic pertaining to a specific domain model be contained in the model itself. Really??! In cases where one is building large lists of objects of a particular ‘heavyweight’ domain class, this embedded business logic is simply bloating the memory usage. Now I get the part about static methods and variables (before the slingshots come out 😉 ) which are instantiated just once per class and not per object, but static members were designed with different architectural goals in mind, and not specifically as a memory saving construct. Hence they do help reduce the overhead somewhat, but not by much (every object still needs to maintain pointers to reference the static members).
Another problem is: where do you put logic whose concern spans multiple domain classes? Or has nothing to do with domain classes? Neverending source of confusion, that.
I would rather go for a design which treats domain objects as simple data beans, with no more than the simplest validation rules built in. The heavy lifting of business logic should be borne by a dedicated service layer. This approach also simplifies the implementation of independently accessible service methods that I outlined in the previous section.
Pluggable storage backend
This is a short one. Most frameworks support interchanging one RDBMS with another fairly smoothly. I want to throw NoSQL stores into the mix. I want to be able to plug in MongoDB or Couchbase, for instance, to supplement the RDBMS with certain functions that NoSQL DBs excel at, but I don’t want to change the way I use the persistence layer. Whatever the technology I use for abstracting the storage functionality in the application, its API must let me work seamlessly with non-relational data stores as well.
That more or less covers my wishlist of things I’m looking for, in a web framework. In order to keep the title short, I didn’t mention it has to be open source as well (yes it does 🙂 ). I think I might have found one that manages to check all boxes on the list, but I’m open to suggestions.
On some networks, I need to connect to a (firewalled) intranet over wired ethernet, while general unrestricted network access is available over WiFi. Typically I need to stay connected to both networks so as to access machines on the LAN as well as the WWW. Trouble is (at least on my F17 machines) the system is configured to use the ethernet interface (if live) by default for all outbound requests, regardless of whether the WiFi is enabled or not.
This is not a convenient situation as the LAN is often configured to block access to requests going outside the local subnet. This means every time I have to go online, I need to disable my ethernet Iface first! The source of this endless bother can be traced down to the way the system has setup its routing. Just fire up a terminal and issue the following command to get your current routes. In one such run I get the following output:
$ route -n Kernel IP routing table Destination Gateway Genmask Flags Metric Ref Use Iface 0.0.0.0 192.168.11.2 0.0.0.0 UG 0 0 0 p1p1 192.168.0.0 0.0.0.0 255.255.255.0 U 0 0 0 wlan0 192.168.8.0 0.0.0.0 255.255.252.0 U 0 0 0 p1p1
This tells me that the default route for all outbound requests (those that do not specifically match any other rule) is through Iface p1p1 (ethernet or wired LAN). I need this to be set to wlan0 (WiFi) instead.
That is done (as root) by first deleting the existing default route, followed by adding a new rule to route default requests through WiFi:
# route del default # route add -net 0.0.0.0 dev wlan0 gw 192.168.0.1 # route -n Kernel IP routing table Destination Gateway Genmask Flags Metric Ref Use Iface 0.0.0.0 192.168.0.1 0.0.0.0 UG 0 0 0 wlan0 192.168.0.0 0.0.0.0 255.255.255.0 U 0 0 0 wlan0 192.168.8.0 0.0.0.0 255.255.252.0 U 0 0 0 p1p1
The gateway IP for the default route should be the default gateway for your WiFi.
Post these steps, the system will route requests within the LAN through p1p1 (note that this route was already configured for p1p1 in my case and is a stricter rule than all the others, hence is the first to match) and outbound traffic to non-local addresses through wlan0.