Monday, December 20, 2010

JavaScript and generic server side services

For a while now I've been trying to figure out a nice and elegant way to combine server side code and JavaScript/Ajax code without having to write the tedious service layer (handler page, web service, etc.) and the parallel JavaScript functions.

This came up after watching various teams do the same thing over and over again; over time they break their own code as the system keeps growing, sometimes can't find what lives where and on which side, and grow frustrated at not having a solid interface that all developers, from the same or different teams, write to as one.
* for some reason the song "imagine" by John Lennon is playing in my head...

The idea is to write server side code which can be "registered" to the client side in such a way that the client side developers get all the intended JavaScript code generated for them, and only need to write the invocation or binding code against the "registered" services. The server side code documentation can be used to guide the client side developer.

I am assuming the code is written in an object oriented language, used in an OOP fashion, with reflection capabilities.

The solution will require the following components:
* One or more classes which perform service operations (getting stuff, saving it, etc.)

* An interface which all service classes are required to implement

* Registration utility code

* Service Proxy

The following describes a close-to-pseudo-code implementation.
This description is a skeleton on which much more complex and smarter systems can be built.

Any class that needs to be exposed to the client side will have to implement an interface describing all the functionality required for the registration part to work.

for example:

Interface IClientScript {
function ignoreService()
// the implementation of this method will need to return a list of methods
// which should be excluded from client side exposure
}

Service Class
The service will need to implement the client side registration interface.

for example:

Class Stuff implements IClientScript {
// just a happy empty constructor

function getStuff(arg1){
// this method will fetch some stuff from the db and return it
}

function saveStuff(arg1, arg2){
// this method will save some stuff to the db and return a boolean result
}

function ignoreService(){
// this is the implementation of the IClientScript interface method. It will
// return a list of methods not to be registered to the client
}
}

Registration Utility Code
The registration utility code should take classes which implement the needed interface and expose the service methods using reflection (in the simplest case, only the public methods).
The registration code will generate the client script code, while taking into consideration namespaces, client side libraries, etc.

for example:

function register(name, args){
// this function takes the name of a class, makes sure it implements the
// needed interface, reads all the methods which need to be exposed and
// generates the JavaScript code
}
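A minimal sketch of such a register function, written here in Node.js for concreteness (the post assumes any OOP language with reflection); the Stuff class and NS1 namespace follow the examples in this post, everything else is illustrative:

```javascript
// Minimal sketch of the registration utility (Node.js). Walking the class
// prototype stands in for "real" reflection in a typed language.
class Stuff {
  getStuff(arg1) { /* fetch some stuff from the db */ }
  saveStuff(arg1, arg2) { /* save some stuff, return a boolean */ }
  ignoreService() { return ["ignoreService"]; } // methods to hide from clients
}

function register(serviceClass, namespace) {
  const instance = new serviceClass();
  if (typeof instance.ignoreService !== "function") {
    throw new Error(serviceClass.name + " does not implement IClientScript");
  }
  const ignored = instance.ignoreService();
  // "Reflection": enumerate the methods declared on the class prototype.
  const methods = Object.getOwnPropertyNames(serviceClass.prototype)
    .filter(name => name !== "constructor" && !ignored.includes(name));
  // Generate the client-side stubs as a string of JavaScript.
  let js = `window.${namespace}.${serviceClass.name} = ` +
           `window.${namespace}.${serviceClass.name} || {};\n`;
  for (const m of methods) {
    js += `window.${namespace}.${serviceClass.name}.${m} = ` +
          `function(args, successfn, failurefn){\n` +
          `  window.${namespace}.ServiceProxy.invoke({\n` +
          `    "service": "${serviceClass.name}", "method": "${m}",\n` +
          `    "data": args, "success": successfn, "failure": failurefn});\n};\n`;
  }
  return js;
}
```

Calling `register(Stuff, "NS1")` returns the script text, which can then be emitted into the page.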

The generated JavaScript code could look like this:

window.NS1 = window.NS1 || {};
window.NS1.ServiceProxy = window.NS1.ServiceProxy || {};
window.NS1.ServiceProxy.url = "";
window.NS1.ServiceProxy.invoke = function(options){
// get details from the options argument and use your favorite way to send
// the request to the service proxy
};
window.NS1.Stuff = window.NS1.Stuff || {};
window.NS1.Stuff.getStuff = function(arg1, successfn, failurefn){
  // the service and method names tell the proxy what to invoke
  window.NS1.ServiceProxy.invoke({
    "service": "Stuff",
    "method": "getStuff",
    "data": arg1,
    "success": successfn,
    "failure": failurefn});
};
// client code then only needs the invocation, e.g.:
// window.NS1.Stuff.getStuff("42", onSuccess, onFailure);

Service Proxy
The service proxy is an HTTP request handler. This component analyzes the request, uses reflection to load the necessary service (class), invokes the service method and sends back a result message (text, JSON, etc.) which is then handled by the JavaScript callback function.
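A minimal sketch of the proxy's dispatch step, again in Node.js, with the HTTP layer left out; the registry object stands in for reflection-based class loading, and all names are illustrative:

```javascript
// Minimal sketch of the service proxy dispatch (Node.js). "request" is the
// already-parsed message sent by the generated client stubs.
class Stuff {
  getStuff(arg1) { return { id: arg1, name: "some stuff" }; } // pretend db fetch
}

const registry = { "Stuff": Stuff }; // services known to the proxy

function dispatch(request) {
  const ServiceClass = registry[request.service];
  if (!ServiceClass) return { error: "unknown service: " + request.service };
  const instance = new ServiceClass();
  const method = instance[request.method];
  if (typeof method !== "function") {
    return { error: "unknown method: " + request.method };
  }
  // Invoke and serialize the result; a real proxy would also handle
  // content negotiation (text, JSON, etc.) and failures inside the service.
  return { result: JSON.stringify(method.apply(instance, request.args)) };
}
```

The response body would be written back to the client, where the generated stub's success or failure callback handles it.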

Tuesday, December 14, 2010

Web bug server & log analyzer

I got a task to build a web bug system and then analyze the saved information.

What is a web bug? See what Wikipedia has to say about it.

While this might seem like an easy task, there were a few interesting challenges:

  • Enormous amount of data to handle (tens of millions of requests every day)

  • Make the system fully scalable

  • Make the system redundant

  • Implementation of the saving mechanism

  • Track and analyze the saving mechanism

  • Archive the data for a long period of time

  • Analyze where requests are coming from (countries)

  • Count unique users

  • Phew, is there some time for coffee???

So, after getting the requirements, scratching my head, making coffee (it appears there's always time for coffee), scratching my head some more, drawing some weird boxes on the whiteboard while mumbling some buzzwords, scratching my head again and then sitting down to think, comes the time for some decisions.

Starting with the seemingly easy decisions:
* How to save the data? - just throw it into the database
* Archive? - take the saved data and put it in some data store
* Where are requests coming from? - find a GEO IP service and use it while analyzing the data
* Unique users? - use cookies
* Coffee? - kitchen --> coffee machine

Design #1 - Simple
1 web server
1 database server

Get all the traffic to the web server with a server side code (PHP, Python, .Net, etc.) which will read the request and save it in the database.

Have a daily job which will get all the saved data and analyze it.

Pros:
* very easy to implement
* very short time to develop

Cons:
* Ineffective at handling an enormous amount of requests -
analyzing tens of millions of raw records on an RDBMS takes eons!
* Not scalable and not redundant

While it might be a very nice start for a small system, this solution fails the requirement of analyzing an enormous amount of data.
When testing this design, it took the web server less than 30 seconds to crash.

Back to the drawing board: draw more boxes, mumble a bit more and find out how to improve the first idea. New decisions to make:
* Make it scalable - use more than one web server with load balancing
* Redundancy? - log the web requests. If data doesn't make it into the database, we can recover it from the log files.

Design #2 - Simple++
3 web servers
1 database server

Use DNS load balancing to split the traffic between the web servers. This solution makes it very easy to scale up the system; we only need to put up another server and register it with the DNS system.
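DNS load balancing here simply means publishing several A records for the same name and letting resolvers rotate through them. A hypothetical BIND-style zone fragment (names and addresses are made up):

```
; round-robin: resolvers rotate through these answers
bug.example.com.  60  IN  A  192.0.2.10
bug.example.com.  60  IN  A  192.0.2.11
bug.example.com.  60  IN  A  192.0.2.12
```

Adding a web server is then just one more A record; the short TTL keeps a dead server out of rotation quickly once its record is removed.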

Turn on the web servers' logging; the rest is like the first design:
Get all the traffic to the web server with a server side code (PHP, Python, .Net, etc.) which will read the request and save it in the database.

Have a daily job which will get all the saved data and analyze it.

Pros:
* very easy to implement
* very short time to develop
* Scalable
* Redundant

Cons:
* needs some basic knowledge of how to set up DNS load balancing
* still doesn't address the analysis time, which still takes eons on an RDBMS

We got one step closer to getting our system to work. The servers were able to withstand the amount of requests and even had good response times. But we are still stuck with the analysis time.

OK, so we really need to address this analysis time. New decisions to make:
* Improve analysis time - solution = reduce, reduce, reduce... yes, reduce wherever we can.

Design #3 - Smart Reduce
2 Web servers
1 Analysis server
1 database server

This design will try to reduce the end result while maintaining scalability and redundancy.
* Use the web server's log files instead of saving the requests directly to the database - we reduced the processing time for handling each request

* Add a module to the web server to track the users (with cookies) - we reduced the need for writing any code

* Add a module to the web server to analyze the request and get the GEO location - we reduced the need to do it when analyzing the data

* Limit every log file to a short window of time (minutes)

* When a log file is done (the web server has moved on to a newer log file), ship it to the analysis queue on the analysis server - we reduced the amount of data which needs to be analyzed at a time, and if a failure occurs, it will only affect a small portion of the data. If we track this right, we can easily find the problem later and reanalyze the data

* Save the analyzed data to the database

* Have a daily job which will get all the saved data and summarize it if necessary

Pros:
* Scalable
* Redundant
* More cost effective; fewer web servers can handle more requests
* Analysis time improvement; we can even have partially analyzed data during the day

Cons:
* Harder to implement
* Takes more time to develop

This design works and meets the requirements. Fewer web servers are needed for handling the requests (first tests, without accelerators, showed each server answering 10 times more requests in this design than with server side code).
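The per-log-file analysis step can be sketched as a reduce function; the log format below is an assumption (one space-separated record per line, with the country already resolved by the web server module), and the whole thing is illustrative:

```javascript
// Sketch of the per-log-file "reduce" step (Node.js). Each shipped log file
// collapses into per-country counters plus a unique-visitor count before
// anything touches the database.
// Assumed record format: <timestamp> <country> <uid> <page>
function reduceLogFile(lines) {
  const byCountry = {};
  const uniqueUsers = new Set();
  for (const line of lines) {
    if (!line.trim()) continue;
    const [, country, uid] = line.split(" ");
    byCountry[country] = (byCountry[country] || 0) + 1;
    uniqueUsers.add(uid);
  }
  // only this small summary is saved to the database
  return { byCountry: byCountry, uniqueUsers: uniqueUsers.size };
}
```

A failed file only loses its own few minutes of data and can simply be re-queued, which is exactly the redundancy property described above.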

We can always improve this design by replacing the DNS load balancing with a more robust load balancing solution, so that even if one server is down, all the traffic will be transferred to the other web servers. And we can probably find smarter ways to reduce the processing of data in each layer.

More about archiving, tracking and making coffee later on.....