Saturday, March 2, 2013

Code Storytelling

In this post I will describe a thought I had about the different stories developers tell when writing code.
Part of my job is to read other people's code, give feedback on it, and then compile lectures that help the development teams improve their coding. I often try to write in different styles on different projects in order to improve my understanding of various coding styles and of the people who tend to use them.

I found that most people have their own style, and sometimes a multitude of styles within the same code. As part of their training, I try to show people how they can keep a consistent style within their code. On some occasions I've also seen the opposite: a person who sticks to one style no matter what, and teams that are made to write everything in the same style.

These situations made me wonder: how much of a personal coding style should one be able to keep while working with others?
Should a line be drawn between personality and uniformity? And if so, where should that line be drawn?
Are we losing ingenuity and progress while trying to keep a consistent coding style?

What do you think?

Thursday, December 8, 2011

Caching Google Translate

We've been using the free Google Translate for a while with the jquery-translate plugin (Copyright (c) 2009 Balazs Endresz (

This month Google decided to close the free version of their API and open version 2 for paying customers, in a model where they charge by character count and cap both the daily usage and the requests per second.

For this reason, we needed to purchase an account and use the new paid API.
A couple of major problems popped up:
* The translating application is based on client side code, so the API key would be visible to everyone.

* The amount of daily translations (in characters) runs into the millions.

We needed a quick solution for this problem.
The first problem (visible API key) was solved by creating a proxy server which holds the API key and relays the requests between the client side application and the Google API.

To solve the other problem and reduce the number of calls to the Google API, we implemented caching. We took every chunk of data that needed to be translated and cached its translation. The remaining problem was finding a unique cache key, which we resolved by taking the md5 hash of the chunk of data sent to the proxy server and concatenating the language to it, i.e. [SOME_HASH]_[LANGUAGE]
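The cache key scheme can be sketched in a few lines of Python. This is only an illustration of the [SOME_HASH]_[LANGUAGE] idea, not our production code: the function names, the plain dict used as the cache, and the `translate` callable standing in for the real call to the paid API are all assumptions.

```python
import hashlib


def cache_key(text, language):
    """Build a [SOME_HASH]_[LANGUAGE] key from the chunk and the target language."""
    digest = hashlib.md5(text.encode("utf-8")).hexdigest()
    return f"{digest}_{language}"


def translate_cached(text, language, cache, translate):
    """Return a cached translation, calling translate() only on a cache miss.

    `cache` is any dict-like store and `translate` is a stand-in (assumption)
    for the function that actually calls the paid translation API.
    """
    key = cache_key(text, language)
    if key not in cache:
        cache[key] = translate(text, language)
    return cache[key]
```

With this in the proxy, a chunk that was already translated into a given language costs nothing the second time, while the same chunk in another language gets its own key.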

Monday, December 20, 2010

JavaScript and generic server side services

For a while now I've been trying to figure out a nice and elegant way to combine server side code and JavaScript-Ajax code without the need to write the tedious service layer (handler page, web service, etc.) and the parallel JavaScript functions.

This came up after watching various teams do the same thing over and over again. Over time they break their own code as the system keeps growing, sometimes can't find what lives where and on which side, and get frustrated by not having a solid interface that all developers, from the same or different teams, write to as one.
* for some reason the song "Imagine" by John Lennon is playing in my head...

The idea is to write server side code which can be "registered" to the client side code in such a way that the client side developer will have all the intended JavaScript code written for her or him, and will only need to write invocation or binding code against the "registered" code. The server side code documentation can be used to guide the client side developer.

I am assuming the code is written in an object oriented language and it is used in an OOP fashion with a reflection ability.

The solution will require the following components:
* One or more classes which perform service operations(getting stuff, saving it, etc.)

* An interface which all service classes are required to implement

* Registration utility code

* Service Proxy

The following describes an implementation that is close to pseudo-code.
This description is a skeleton on which much more complex and smart systems can be built.

Any class that needs to be exposed to the client side will need to implement an interface which describes all the functionality required for the registration part to work.

for example:

Interface IClientScript
function ignoreService()
// the implementation of this method will need to return a list of methods
// which should be excluded from client side exposure

Service Class
The service will need to implement the client side registration interface.

for example:

Class Stuff implements IClientScript {
    // just a happy empty constructor

    function getStuff(arg1){
        // this method will fetch some stuff from the db and will return it
    }

    function saveStuff(arg1, arg2){
        // this method will save some stuff to the db and will return a Boolean result
    }

    function ignoreService(){
        // this is an implementation of the IClientScript interface method. it will
        // return a list of methods not to be registered to the client
    }
}

Registration Utility Code
The registration utility code should take classes which implement the needed interface and expose their service methods using reflection (in a very simple example, only the public methods).
The registration code will generate the client script code, while taking into consideration namespaces, client side libraries, etc.

for example:

function register(name, args){
    // this function will take the name of a class, make sure it implements the
    // needed interface, read all the methods which need to be exposed and
    // generate the JavaScript code
}
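As a rough illustration of the registration step, here is a minimal Python sketch that uses reflection (the standard `inspect` module) to list a service's public methods and emit JavaScript stubs. The Python `Stuff` class, the `exposed_methods` helper, the "service" and "method" keys, passing the arguments as a list in "data", and the extra `successfn`/`failurefn` callback parameters are my assumptions for the example, not part of the skeleton above.

```python
import inspect


class Stuff:
    """Hypothetical service class mirroring the Stuff example."""

    def getStuff(self, arg1):
        return {"id": arg1}

    def saveStuff(self, arg1, arg2):
        return True

    def ignoreService(self):
        # Methods that must not be exposed to the client side.
        return ["ignoreService"]


def exposed_methods(cls):
    """List public method names of cls, minus those named by ignoreService()."""
    if not callable(getattr(cls, "ignoreService", None)):
        raise TypeError(f"{cls.__name__} does not implement ignoreService()")
    ignored = set(cls().ignoreService())
    return [
        name
        for name, _ in inspect.getmembers(cls, predicate=inspect.isfunction)
        if not name.startswith("_") and name not in ignored
    ]


def register(cls, namespace="NS1"):
    """Generate JavaScript stubs that forward each exposed method to the proxy."""
    target = f"window.{namespace}.{cls.__name__}"
    lines = [
        f"window.{namespace} = window.{namespace} || {{}};",
        f"{target} = {target} || {{}};",
    ]
    for name in exposed_methods(cls):
        # Read the parameter names by reflection, dropping `self`.
        params = list(inspect.signature(getattr(cls, name)).parameters)[1:]
        js_args = ", ".join(params + ["successfn", "failurefn"])
        js_data = ", ".join(params)
        lines.append(f"{target}.{name} = function({js_args}){{")
        lines.append(
            f"  window.{namespace}.ServiceProxy.invoke({{"
            f'"service": "{cls.__name__}", "method": "{name}", '
            f'"data": [{js_data}], '
            f'"success": successfn, "failure": failurefn}});'
        )
        lines.append("};")
    return "\n".join(lines)
```

Calling `print(register(Stuff))` writes one stub per exposed method, so the client side developer only has to call `NS1.Stuff.getStuff(...)` with callbacks.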

The generated JavaScript code could look like:

window.NS1 = window.NS1 || {};
window.NS1.ServiceProxy = window.NS1.ServiceProxy || {};
window.NS1.ServiceProxy.url = "";
window.NS1.ServiceProxy.invoke = function(options){
  // get details from the options argument and use your favorite way to send
  // the request to the service proxy
};
window.NS1.Stuff = window.NS1.Stuff || {};
window.NS1.Stuff.getStuff = function(arg1, successfn, failurefn){
  // successfn and failurefn are the caller's callback functions
  window.NS1.ServiceProxy.invoke({
    "data": arg1,
    "success": successfn,
    "failure": failurefn});
};

Service Proxy
The service proxy is an HTTP request handler. This component analyzes the request, uses reflection to load the necessary service (class), invokes the service method and sends back a result message (text, JSON, etc.) which will be handled by the JavaScript callback function.
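The dispatch part of the service proxy can be sketched like this, leaving the HTTP layer out. It is a minimal sketch under my own assumptions: the request arrives as a JSON body with hypothetical "service", "method" and "data" keys, the services are looked up in a small `SERVICES` whitelist, and the reply is a JSON envelope with an "ok" flag.

```python
import json


class Stuff:
    """Hypothetical service class mirroring the Stuff example."""

    def getStuff(self, arg1):
        return {"id": arg1, "name": f"stuff-{arg1}"}

    def ignoreService(self):
        return ["ignoreService"]


# Whitelist of registered services (assumption): only these can be invoked.
SERVICES = {"Stuff": Stuff}


def handle_request(body):
    """Decode a JSON request, invoke the service method, return a JSON reply."""
    try:
        request = json.loads(body)
        service = SERVICES[request["service"]]()
        method_name = request["method"]
        # Refuse private methods and anything the service asked to hide.
        if method_name.startswith("_") or method_name in service.ignoreService():
            raise KeyError(method_name)
        method = getattr(service, method_name)  # reflection lookup by name
        result = method(*request.get("data", []))
        return json.dumps({"ok": True, "result": result})
    except (KeyError, AttributeError, TypeError, ValueError) as error:
        return json.dumps({"ok": False, "error": str(error)})
```

Keeping an explicit whitelist instead of loading any class by name is a design choice for the sketch: the proxy is reachable from the browser, so it should only invoke what was registered.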

Tuesday, December 14, 2010

Web bug server & log analyzer

I got a task to build a web bug system and then analyze the saved information.

What is a web bug? See what Wikipedia has to say about it

Although this might seem like an easy task, there were a few interesting challenges:

  • Enormous amount of data to handle (tens of millions of requests every day)

  • Make the system fully scalable

  • Make the system redundant

  • Implementation of the saving mechanism

  • Track and analyze the saving mechanism

  • Archive the data for a long period of time

  • Analyze where requests are coming from (countries)

  • Count unique users

  • Phew, is there some time for coffee???

So, after getting the requirements, scratching the head, making coffee (it appears there's always time for coffee), scratching the head some more, drawing some weird boxes on the erasable board while mumbling some buzz words, scratching the head again and then sitting down to think; comes the time for some decisions.

Starting with the seemingly easy decisions:
* How to save the data? - just throw it into the database
* Archive? - take the saved data and put it in some data store
* Where requests are coming from? - find a GEO IP service and use it while analyzing the data
* Unique users? - use cookies
* Coffee? - kitchen --> coffee machine

Design #1 - Simple
1 web server
1 database server

Get all the traffic to the web server with a server side code (PHP, Python, .Net, etc.) which will read the request and save it in the database.

Have a daily job which will get all the saved data and analyze it.

Pros:
* Very easy to implement
* Very short time to develop

Cons:
* An ineffective way to handle an enormous amount of requests -
analyzing tens of millions of raw records on an RDBMS takes eons!
* Not scalable and not redundant

While it might be a very nice start for a small system, this solution fails the requirement of analyzing an enormous amount of data.
While testing this design, it took the web server less than 30 seconds to crash.

Back to the drawing board to draw more boxes, mumble a bit more and find out how to improve the first idea. New decisions to make:
* Make it scalable - use more than one web server with load balancing
* Redundancy? - log the web requests. If data doesn't get saved in the database, we can recover it from the log files.

Design #2 - Simple++
3 web servers
1 database server

Use DNS load balancing to split the traffic between the web servers. This solution makes it very easy to scale up the system: we only need to put up another server and register it with the DNS system.

Turn on the web server's logging and the rest is like the first design:
Get all the traffic to the web server with a server side code (PHP, Python, .Net, etc.) which will read the request and save it in the database.

Have a daily job which will get all the saved data and analyze it.

Pros:
* Very easy to implement
* Very short time to develop
* Scalable
* Redundant

Cons:
* Needs some basic knowledge of how to set up DNS load balancing
* Still doesn't address the analysis time, which still takes eons on an RDBMS

We got one step closer to getting our system to work. The servers were able to withstand the amount of requests and even had good response times. But we are still stuck with the analysis time.

OK, so we really need to address this analysis time. New decisions to make:
* Improve analysis time - solution = reduce, reduce, reduce... yes, reduce wherever we can.

Design #3 - Smart Reduce
2 Web servers
1 Analysis server
1 database server

This design tries to reduce the work at every step while maintaining scalability and redundancy.
* Use web server's log files instead of saving the requests directly to the database - we reduced the process time for handling the request

* Add a module to the web server to track the users (with cookies) - we reduced the need for writing any code

* Add a module to the web server to analyze the request and get the GEO location - we reduced the need to do it when analyzing the data

* Limit every log file to a short amount of time (minutes)

* When a log file is done (the web server is working on a newer log file), ship it to the analysis queue on the analysis server - we reduced the amount of data which needs to be analyzed at a time, and if a failure occurs, it will only affect a small portion of the data. If we track this right, we can easily find the problem later and reanalyze the data

* Save the analyzed data to the database

* Have a daily job which will get all the saved data and summarize it if necessary
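The per-chunk analysis can be sketched in Python as follows. This is a minimal sketch, not the real analyzer: it assumes (my assumption) that each log line is tab-separated as timestamp, user cookie ID, country code and request path, with the cookie and country already added by the web server modules, and the function names are made up for the example.

```python
from collections import Counter


def empty_summary():
    """Start a fresh running summary (e.g. for one day)."""
    return {"requests_by_country": Counter(), "users": set(), "malformed": 0}


def analyze_chunk(lines):
    """Summarize one short log file: requests per country and the users seen.

    Assumed line format: timestamp<TAB>user_id<TAB>country<TAB>path
    """
    summary = empty_summary()
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 4:
            # Count bad lines so a broken chunk can be found and reanalyzed.
            summary["malformed"] += 1
            continue
        _timestamp, user_id, country, _path = parts
        summary["requests_by_country"][country] += 1
        summary["users"].add(user_id)
    return summary


def merge(total, chunk):
    """Fold one chunk summary into the running summary and return it."""
    total["requests_by_country"].update(chunk["requests_by_country"])
    total["users"] |= chunk["users"]
    total["malformed"] += chunk["malformed"]
    return total
```

Because every chunk is summarized on its own and then merged, partial results are available during the day, and a failed chunk can be reanalyzed without touching the rest.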

Pros:
* Scalable
* Redundant
* More cost effective, fewer web servers can handle more requests
* Analysis time improvement, we can even have partially analyzed data during the day

Cons:
* Harder to implement
* Takes more time to develop

This design works and meets the requirements. Fewer web servers are needed for handling the requests (first tests without accelerators showed that each server could answer 10 times more requests in this design than when using server side code).

We can always improve this design by replacing the DNS load balancing with a more robust load balancing solution, so that even if one server is down, its traffic will be transferred to the other web servers. And we can probably find smarter ways to reduce the processing of data on each layer.

More about archiving, tracking and making coffee later on.....