Building an almighty data retrieval system for all HTML5 webapps


Download Example

As discussed in my previous article, data retrieval is at the heart of every information-driven HTML5 web application. For this reason, it is important to carefully design a scalable data retrieval solution that handles caching and network retrieval together. This is mandatory if you want your app to work both online and offline, retrieving data from the network when possible and from the stored cache otherwise. A good design also matters because whatever solution you come up with, good or bad, will be used everywhere in the app and will therefore be difficult to change later.


Our main goal is to make the app work interchangeably with data coming from the cache or from the network, without two different code paths. Whenever we need such data, a few lines of code should suffice, and those lines should not care whether there is connectivity, whether the data is cached, or whether the servers are even up. But this is just the surface of things. Internally, we will automatically store data in the cache once it is retrieved from the network. This will not only allow our app to work offline by serving the stored data, but will also make it more responsive, save bandwidth, and minimize server load, all by eliminating recurring requests even when the user is online. As you will soon see, because our design can monitor all data transmissions in one place, it could also be set up to handle many advanced features, such as expiry requirements, queuing, batching, and storage limits. I leave you to discover these on your own, according to your needs, while I focus on the core of this solution.

How should we then construct our data retrieval mechanism? First, let’s consider its façade. The façade is the face of this mechanism, which will be used everywhere in our application, whenever data is required. We should prefer narrowly defined methods that support our specific business logic. Such methods require specific parameters, return specific outputs, and are named distinctively. If we consider an application that lists upcoming flights, it makes more sense to use a method called getUpcomingFlightsBetweenTwoAirports, which expects two airport code strings and returns an array of Flight objects, rather than an all-encompassing function such as GetFlights, which supports many different options but in our case would require an object with fromAirport and toAirport parameters, plus another parameter explaining that we want upcoming flights between these two airports. The latter method is too ambiguous and will require rereading a lot of code or documentation for every single use of it. In the process of building our façade, we break our complex system requirements into smaller, comprehensible, well-defined chunks, which are much easier to interact with. This is especially valuable in an interpreted language like JavaScript, where you have to run the web app in a browser to verify that a code change didn’t break anything.

We will put all of our façade methods in a class called Data, for example Data.getUpcomingFlightsBetweenTwoAirports. This is demonstrated in the code segment below; in it, the first method is the app UI, which uses the second method, our façade method. Two gotchas about this example and the others below. First, the example makes minimal use of jQuery to populate the UI; if you are not familiar with it, you will probably still understand it all. Second, to keep things easy to follow, the code uses return statements to pass results back; this is not how it should be coded, but it keeps things simple for now. An explanation and an accompanying example later in this article will correct that.

/**
 * Our UI method for listing and allowing interaction with flights between two airports
 */
function showFlights() {
  var container = $('<ul class="flights"/>');
  var flights = Data.getUpcomingFlightsBetweenTwoAirports({fromAirport: 'JFK', toAirport: 'ORD'});
  // forEach gives every click handler its own flight; with a plain for loop
  // and var, all handlers would end up sharing the last flight
  flights.forEach(function (flight) {
    $('<li class="flight"/>')
      .text(flight.flightNumber)
      .click(function () { showFlightPopover(flight.flightNumber); })
      .appendTo(container);
  });
  container.appendTo('body');
}

/**
 * Our facade
 */
var Data = {};

/**
 * Expects params.fromAirport to be an airport code string
 * Expects params.toAirport to be an airport code string
 *
 * Returns an array of Flight objects listing all upcoming flights from <fromAirport> to <toAirport>
 */
Data.getUpcomingFlightsBetweenTwoAirports = function (params) {
  var fromAirport = params.fromAirport;
  var toAirport = params.toAirport;
  var result = null; // TBD: will hold the retrieved data
  // if there are no results, returns an empty array
  // if there are, keeps only the scheduled ones
  return (result === null ? [] : result.flights.filter(
    function (element) {
      return (element.status === 'scheduled');
    }
  ));
};

On the other side of things, we want our individual data pieces to be retrieved via AJAX using JSON, JSONP, or XML. Each data piece has to be retrieved using its own API URL, with its own parameters, and with custom parsing code for the response, preferably all coded in a single place. Since all these AJAX requests pertain to network retrieval only, they should be defined in a new class, say NetworkController, and not in our façade. This also allows us to use a single network call for more than one façade method. For example, Data.getFlightTime and Data.getFlightDuration are two façade methods that use the same NetworkController.getFlightInfo API call. This saves us extra coding and redundant cached data on the inside, while still remaining clear on the outside.

Breaking our app into façade methods and retrieval methods helps us separate our business logic from our backend implementation. For each data unit the app uses, we should create a façade method, and for each web service call, a NetworkController one. The two don’t necessarily have a one-to-one relation. This should encourage you to break your business logic into smaller, more manageable components using the façade.

Now, how do we combine the façade and NetworkController? To achieve the advantages of this design mentioned earlier, such as automatic caching and conditional caching, we need one place where we can code it all, regardless of the specific façade or network call; this is an equal opportunity design, after all. Hence, we direct all façade methods into a new method; let’s name this method ‘get’, in another class called DataRetrieval. Any façade method will call DataRetrieval.get, which will then call a NetworkController method along with any parameters that method might require. The façade method chooses the retrieval method by passing DataRetrieval.get another parameter named methodName, which identifies the NetworkController method to invoke in order to get the data it needs. The façade method can then “parse” the response by picking the items it needs from what is returned, and reply with its specific output. To simplify the contract between the façade and NetworkController, use well-defined objects as parameters and responses if there is a lot of information to pass. A demo of this three-tiered solution is shown in the code below; the first method is the façade one, the second is our DataRetrieval.get connector method, and the last is the NetworkController one.

/**
 * Expects params.flightCode to be a flight code string
 *
 * Returns a Date object for the flight's time
 */
Data.getFlightTime = function (params) {
  var flightCode = params.flightCode;
  var result = DataRetrieval.get({
    methodName: 'FlightInfo',
    flightCode: flightCode
  });
  if (result === null) {
    return (null);
  }
  return (new Date(result.flight.time));
};

var DataRetrieval = {};

/**
 * get is in charge of common data operations (caching, queuing, batching, etc.) and distributing operations to either NetworkController or CacheController
 *
 * Expects params.methodName to name the NetworkController method to invoke, without its 'get' prefix
 * Expects any further parameters that this method requires
 *
 * Returns whatever the NetworkController method returns
 */
DataRetrieval.get = function (params) {
  var methodName = params.methodName;
  // ……….
  // network retrieval: looks the method up by name and runs it (notice the parentheses at the end)
  var networkData = NetworkController['get' + methodName](params);
  return (networkData);
};

var NetworkController = {};

/**
 * getFlightInfo retrieves flight information using TSA's public api
 *
 * Expects params.flightCode to be a flight code
 *
 * Returns an object holding a MyData.Flight under the key flight, or null on failure
 */
NetworkController.getFlightInfo = function (params) {
  var flightCode = params.flightCode;
  var result = null;
  $.ajax('http://api.tsa.gov/getExtendedFlight', {
    data: {
      flightCode: flightCode
    },
    dataType: 'json',
    // blocks until the response arrives, only so that these simplified
    // examples can keep using return statements for now
    async: false,
    success: function (data) {
      // data retrieved, parsing the result, validating, and reformatting
      if (!data || !data.content || !data.content.flight) {
        return; // result stays null
      }
      var flight = data.content.flight;
      // instantiates my own Flight object, and uses it to store the data to my liking
      var myFlight = new MyData.Flight();
      myFlight.flightNumber = flight['airline'] + flight['number'];
      myFlight.fromCity = flight['city0'];
      myFlight.toCity = flight['city1'];
      myFlight.duration = flight['duration'];
      myFlight.airplane = flight['plane'];
      myFlight.speed = flight['expectedSpeed'];
      // != null also covers a missing (undefined) updatedTime
      myFlight.time = (flight['updatedTime'] != null ? flight['updatedTime'] : flight['scheduledTime']);
      switch (flight['status']) {
        case 'N':
        case 'M':
        case 'Q':
          myFlight.status = 'scheduled';
          break;
        case 'R':
        case 'P':
          myFlight.status = 'intransit';
          break;
        default:
          console.error('unknown flight status:' + flight['status']);
      }
      result = {flight: myFlight};
    },
    error: function () {
      // network failed; result stays null
    }
  });
  return (result);
};
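As mentioned earlier, two façade methods can share a single network call. Here is a minimal sketch of that idea, assuming a hypothetical Data.getFlightDuration that reuses the 'FlightInfo' method name and reads the duration field that getFlightInfo stores. The DataRetrieval stub and its canned response are stand-ins I made up so that the snippet runs by itself.

```javascript
// Stand-in for the real connector so the snippet runs in isolation (assumption).
var DataRetrieval = {
  get: function (params) {
    // pretends that NetworkController.getFlightInfo answered with this object
    return { flight: { time: 1700000000000, duration: 135 } };
  }
};

var Data = {};

// Both façade methods request the same 'FlightInfo' data and pick different fields.
Data.getFlightTime = function (params) {
  var result = DataRetrieval.get({ methodName: 'FlightInfo', flightCode: params.flightCode });
  return result === null ? null : new Date(result.flight.time);
};

// Hypothetical second façade method, not part of the code above.
Data.getFlightDuration = function (params) {
  var result = DataRetrieval.get({ methodName: 'FlightInfo', flightCode: params.flightCode });
  return result === null ? null : result.flight.duration;
};
```

Because both methods send the same methodName and parameters, they would also map to the same cache entry once caching is added.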

And now we can finally get down to business. Based on the network call and its parameters, our new DataRetrieval.get method knows how to request the data over the wire. But it could also cache previous calls and their results if every request had an identifying key under which its result could be stored in the cache. For example, for a NetworkController.getAirportInformation method and the parameter ‘ORD’, we could define the key ‘arpt_ORD’; then, after a successful network call, we would store the response under this key. The next time the same method is requested with the ‘ORD’ parameter, we reconstruct the key ‘arpt_ORD’ from the method and its parameters, query the cache for it, and use the cached data without issuing a single network request.

 

Even though one can build an automatic conversion from the list of parameters to a key, similar to Java’s hashCode, I recommend coding these keys explicitly. This can be done in a DataRetrieval.getKey method, which DataRetrieval.get will call; the latter passes all the parameters it receives to getKey, which then constructs a unique key for each call from those parameters. See the sample below.

/**
 * getKey generates and returns a unique key for every possible call
 */
DataRetrieval.getKey = function (params) {
  var methodName = params.methodName;
  switch (methodName) {
    // method names come without the 'get' prefix, which DataRetrieval.get adds
    case 'AirportInformation':
      return ('arpt_' + params.airport);
    case 'FlightsBetweenAirports':
      return ('flights_' + params.fromAirport + '_' + params.toAirport);
    case 'FlightInfo':
      // flightCode is the parameter name that Data.getFlightTime passes along
      return ('flt_' + params.flightCode);
    default:
      throw new Error('getKey called for an unknown method: ' + methodName);
  }
};

You can probably code DataRetrieval.get yourself by now. But just to be clear on its precise operation, let’s examine it together. This method follows a simple decision flow. First, it checks whether the data is already in the cache; it needs to call getKey for that. If the data was cached already, the job is done, and we simply return the data from the cache. However, if the cache comes up empty, network retrieval should occur. Before that, using the network detection techniques available in HTML5, you might decide to forgo network retrieval entirely and simply fail if you know that there is no connectivity at all. If the network is reachable, however, our method calls NetworkController[‘get’ + methodName] (e.g. NetworkController.getAirportInformation). This network request might take some time, but in return it provides a fresh piece of data. Now, instead of simply returning the data received, we also cache the response for next time, using the same key. That’s it. You can further modify DataRetrieval.get to meet your app’s needs; for example, if you wish the app to always resort to the network when possible, and to use the cache only when offline, simply change the state machine. The images below explain the scenarios when there is cached data and when there isn’t, respectively. The code after them demonstrates our full DataRetrieval.get. The attachment below combines all of the methods in a working example.

 

[Diagram 1: Retrieval with previously cached data.]

[Diagram 2: Retrieval without cached data.]

/**
 * get is in charge of common data operations (caching, queuing, batching, etc.) and distributing operations to either NetworkController or CacheController
 *
 * Expects params.methodName to name the NetworkController method to invoke, without its 'get' prefix
 * Expects any further parameters that this method requires
 *
 * Returns the requested data, from cache or from network, or null if it is not available
 */
DataRetrieval.get = function (params) {
  var methodName = params.methodName;
  var cacheKey = DataRetrieval.getKey(params);
  // queries the cache for a previously stored result
  var cachedData = localStorage.getItem(cacheKey);
  if (cachedData !== null) {
    // there's stored data for this request, use it
    return (JSON.parse(cachedData));
  }
  // can only retrieve from network if the device is connected
  if (Utilities.isNetworkConnected()) {
    // there's nothing in cache, proceed with network retrieval: runs the method (notice the parentheses at the end)
    var networkData = NetworkController['get' + methodName](params);
    if (networkData !== null) {
      // got it
      // saves the newly retrieved content into cache
      localStorage.setItem(cacheKey, JSON.stringify(networkData));
      // and sends the content back
      return (networkData);
    }
  }
  // data is not available either way
  return (null);
};
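The alternative flow mentioned earlier, where the app always prefers the network and falls back to the cache only when offline or when the request fails, could be sketched as follows. This is a sketch under assumptions: getNetworkFirst is a hypothetical name, and the storage and network objects are passed in explicitly so the snippet can run outside a browser. isNetworkConnected is just one possible implementation based on navigator.onLine, which only reports whether the browser believes it has a connection.

```javascript
var Utilities = {
  // One possible implementation (assumption): trust navigator.onLine when it exists.
  isNetworkConnected: function () {
    return typeof navigator === 'undefined' || navigator.onLine !== false;
  }
};

// Hypothetical network-first variant of DataRetrieval.get.
// storage: object with getItem/setItem (e.g. localStorage)
// network: object with get<MethodName> functions (e.g. NetworkController)
function getNetworkFirst(params, cacheKey, storage, network) {
  if (Utilities.isNetworkConnected()) {
    var networkData = network['get' + params.methodName](params);
    if (networkData !== null) {
      // fresh data: refresh the cache entry and return it
      storage.setItem(cacheKey, JSON.stringify(networkData));
      return networkData;
    }
  }
  // offline, or the request failed: fall back to whatever was stored earlier
  var cachedData = storage.getItem(cacheKey);
  return cachedData !== null ? JSON.parse(cachedData) : null;
}
```

Only the order of the two checks changed compared to the cache-first version; the key and the storage format stay the same, so both flows can share one cache.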

 

We will not go into the implementation of the cache itself, which is to be coded in DataRetrieval.get. However, a suggestion to consider: use a key-value pair implementation for your cache. If you choose HTML5’s localStorage, this is the only available option anyway. However, for more advanced technologies such as WebSQL, one might opt for a more structured design. I wouldn’t, for a few reasons. First, if you ever want to switch storage types, support more than one, or even let the user choose the cache type, key-value pairs are the lowest common denominator and are easy to implement on any storage type. Second, the structureless usage of key-value pairs goes hand in hand with JavaScript’s lax coding; this means that changes in your UI that propagate to façade or network retrieval changes need not alter your database structure. This point is even more important for iterative and incremental development, because relational databases have to be upgraded from one version to another, which consumes a lot of development and testing resources. So, if you are using a more structured technology for storage, I would recreate key-value pairs in a single two-column table and mimic the getItem and setItem methods to dumb down the storage.
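The getItem/setItem idea can be sketched as a thin adapter. This is only an illustration: createTableBackend and createCacheController are hypothetical names, and the in-memory object stands in for a real two-column (key, value) table in whatever structured storage you use.

```javascript
// Backend that mimics a two-column (key, value) table in memory (assumption:
// a real backend would read and write an actual table here).
function createTableBackend() {
  var rows = {};
  return {
    getItem: function (key) {
      return Object.prototype.hasOwnProperty.call(rows, key) ? rows[key] : null;
    },
    setItem: function (key, value) {
      rows[key] = String(value);
    }
  };
}

// The controller accepts and returns plain objects; serialization stays inside it.
function createCacheController(backend) {
  return {
    get: function (key) {
      var raw = backend.getItem(key);
      return raw === null ? null : JSON.parse(raw);
    },
    set: function (key, value) {
      backend.setItem(key, JSON.stringify(value));
    }
  };
}

var cache = createCacheController(createTableBackend());
cache.set('arpt_ORD', { code: 'ORD' });
```

In a browser, createCacheController(window.localStorage) should work unchanged, since localStorage already exposes getItem and setItem; that is the benefit of settling on the lowest common denominator.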

We didn’t talk about this until now, but if you are familiar with advanced JavaScript, you know that network requests are asynchronous. All of our methods should therefore also receive a callback parameter that they call when they are done. Every time I said “the method returns some data”, it meant “the method calls back using the callback function with some data”. I also like to add another callback parameter, named error, for when things go bad, such as when we do not have the data to return. I want to stress this: ALL methods we discussed until now should have these two extra callback parameters; let’s call them success and error. A demonstration of the asynchronous technique is available through the following attachment; it is too long to paste here.

Download Asynchronous Example
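To give a feel for the callback style without reproducing the whole attachment, here is a condensed sketch of a callback-based DataRetrieval.get and one façade method on top of it. It is not the attachment’s code: the names getAsync and deps, and the assumption that each NetworkController method accepts (params, success, error), are made up for this sketch.

```javascript
// Hypothetical asynchronous variant of DataRetrieval.get.
// deps bundles getKey, storage (getItem/setItem) and network (get<MethodName>).
function getAsync(params, deps, success, error) {
  var cacheKey = deps.getKey(params);
  var cachedData = deps.storage.getItem(cacheKey);
  if (cachedData !== null) {
    // cache hit: call back right away
    success(JSON.parse(cachedData));
    return;
  }
  deps.network['get' + params.methodName](params, function (networkData) {
    // network answered: store the result, then hand it to the caller
    deps.storage.setItem(cacheKey, JSON.stringify(networkData));
    success(networkData);
  }, function () {
    // neither the cache nor the network could provide the data
    error();
  });
}

// Façade method in callback style: picks what it needs, then calls back.
function getFlightTime(params, deps, success, error) {
  getAsync({ methodName: 'FlightInfo', flightCode: params.flightCode }, deps, function (result) {
    success(new Date(result.flight.time));
  }, error);
}
```

Note that in this sketch a cache hit calls back synchronously while a network answer arrives later; wrapping the cache hit in setTimeout(..., 0) would make the timing consistent for callers.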
 
 

Now, how about some even more advanced stuff? Since all data requests are now made through DataRetrieval.get, it’s easy enough to design an optimization mechanism. For example, we can cache some frequently used calls in memory, without even having to resort to our slower storage. Another one is setting limits on the size of storage; you could either query your database size or count bytes whenever you are about to save data into the cache in DataRetrieval.get, and consider purging some old data pieces when you approach your set limit.
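Both optimizations can be sketched in a few lines. The names (createLimitedCache, MAX_BYTES) and the oldest-first purge policy are assumptions for illustration. String length is used as a rough stand-in for bytes, and the bookkeeping lives in memory only, so it would have to be rebuilt after a page reload.

```javascript
var MAX_BYTES = 200; // hypothetical limit, tiny on purpose

// storage: object with getItem/setItem/removeItem (e.g. localStorage)
function createLimitedCache(storage) {
  var memoryCache = {}; // fast in-memory layer in front of the slower storage
  var order = [];       // keys from oldest to newest
  var used = 0;         // approximate size of all stored values

  return {
    get: function (key) {
      if (Object.prototype.hasOwnProperty.call(memoryCache, key)) {
        return memoryCache[key]; // served without touching the storage
      }
      var raw = storage.getItem(key);
      if (raw === null) { return null; }
      memoryCache[key] = JSON.parse(raw);
      return memoryCache[key];
    },
    set: function (key, value) {
      var raw = JSON.stringify(value);
      // overwriting a key: forget its old size and position first
      var existing = order.indexOf(key);
      if (existing !== -1) {
        used -= storage.getItem(key).length;
        order.splice(existing, 1);
      }
      // purge the oldest entries until the new one fits
      while (order.length > 0 && used + raw.length > MAX_BYTES) {
        var oldest = order.shift();
        used -= storage.getItem(oldest).length;
        storage.removeItem(oldest);
        delete memoryCache[oldest];
      }
      storage.setItem(key, raw);
      memoryCache[key] = value;
      order.push(key);
      used += raw.length;
    }
  };
}
```

A real app would probably purge by last access or by expiry rather than by insertion order; the point is only that DataRetrieval.get is the single place where such a policy needs to live.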

Perhaps even more important, how do we ever update our data if we always resort to the cache? That’s a question each developer needs to answer for their own app. If you want to refresh content at most once every 5 minutes, you can store a timestamp of the last retrieval along with each cached item, which will help DataRetrieval.get decide whether the data is stale and shouldn’t be used. Or, if your app uses the exact same data pieces over and over again, for example getLatestFlights(), you might even be better off creating an Updater class that refreshes stale data at specific intervals, without the user requesting it, and saves the user from waiting for an update the next time this data is needed. Such an Updater class could send an extra parameter to flag to the façade that a refresh of the data is needed; other callers will not have this flag and thus will continue to hit the cache. Or, if some data must always be fetched from the network, we could forgo caching entirely for such calls. This, similarly, should be taken into account in DataRetrieval.get.
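The timestamp approach could look like this. It is a sketch under assumptions: the wrapper fields storedAt and data, the MAX_AGE_MS constant, and the forceRefresh flag are hypothetical names, and the current time is passed in as a parameter to keep the functions easy to test.

```javascript
var MAX_AGE_MS = 5 * 60 * 1000; // refresh at most once every 5 minutes

// Stores the data together with the time at which it was retrieved.
function storeWithTimestamp(storage, key, data, now) {
  storage.setItem(key, JSON.stringify({ storedAt: now, data: data }));
}

// Returns the cached data, or null when it is missing, stale, or a refresh is forced.
function readIfFresh(storage, key, now, forceRefresh) {
  var raw = storage.getItem(key);
  if (raw === null || forceRefresh) {
    return null;
  }
  var entry = JSON.parse(raw);
  return (now - entry.storedAt <= MAX_AGE_MS) ? entry.data : null;
}
```

DataRetrieval.get could call readIfFresh(localStorage, cacheKey, Date.now(), params.forceRefresh) in place of its plain getItem check, and an Updater could pass forceRefresh: true to bypass the cache for its scheduled refreshes.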

There are even more things to consider and to enhance. But this article is already almost too much to swallow for now. I hope that you are not too overwhelmed and that you feel better prepared to embark on your next HTML5 adventure.
