Arsalan Zaidi's Blog: December 2005

Friday, December 23, 2005

"My Golden Rule" -- Business 2.0 Magazine

My Golden Rule

49 business visionaries, collectively worth over $70 billion, state which single philosophy they swear by more than any other.

It's interesting, but as always take this stuff with a whole load of salt. Things are never as clear-cut as the blurb suggests. Who knows if it's these "rules" that made them successful or whether they just wish they had.

:-)

Are there no truly original ideas left?

Look at what I found. Openomy.

Let's see now...

Online File System? Check.
Tagging? Check.
REST based? Check.
1 GB space free? Check.
Transparent Integration coming? Check.

See this blog entry for an introduction to Openomy.

Sounds real similar to this and this, doesn't it? :-)

Thursday, December 22, 2005

4 Rules for the Practical Entrepreneur -- Fragmented Markets

Ian Landsman has some excellent advice for the practical entrepreneur in this article. Definitely worth the time spent reading it.

Thursday, December 15, 2005

SLoPIS (sloppies): Slow Loading Pages of Infinite Size

I've sometimes wondered how I'd instantiate and maintain a call-back link between a client and a server on the Internet. That is, I want the server to update the client as and when various events occur without the client having to specifically poll for them. This is very easy to do over an intranet, but seems almost impossible over the Internet because most clients will be behind restrictive firewalls and proxies. Using an arbitrary protocol over a randomly chosen port is not going to work. So is using polling the only way to go?

One way out of this conundrum might be to use something I've christened SLoPIS (pronounced: sloppies) - for Slow Loading Pages of Infinite Size - in a moment of whimsy.

What your client will do is issue a GET request to a particular URL, which will be generating the actual call-backs. The server starts to send back a 'page' containing any queued events, or a ping every 5 seconds when there is nothing to report, to keep the connection alive. To any entity in the middle, the page simply appears to be very large and very slow loading, but not otherwise unusual in any way.

So let's say I have a REST based file system or something and I want to be informed of any file changes on the server. I can open up a connection to http://some.filesystem.com/SLoPIS/filechanges which is, let's say, a Java Servlet on the server. As part of the request I send my authentication data and the server responds with any queued events. e.g.

<html>
<body>
DELETION: /mnt/some/file
...

If there's nothing to report for a while and the connection is in danger of timing out, it'll send a ping, like so:

...
PING
...

to continue the downloading of the 'page'.
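
Here's a rough sketch of the server side of a sloppy, in Python rather than as a Servlet. It only shows the idea: block on a queue of events and fall back to a PING when the interval runs out. The function name `slopis_page`, the in-memory queue and the `max_chunks` cut-off (there purely so the example can terminate) are my own assumptions, not part of any real framework.

```python
import queue
from typing import Iterator, Optional

PING_INTERVAL_SECONDS = 5.0  # the post suggests a ping every 5 seconds


def slopis_page(events: "queue.Queue[str]",
                ping_interval: float = PING_INTERVAL_SECONDS,
                max_chunks: Optional[int] = None) -> Iterator[str]:
    """Yield the chunks of one slow loading 'page'.

    `events` is a hypothetical per-client queue that the rest of the
    server fills with lines such as 'DELETION: /mnt/some/file'.
    `max_chunks` exists only for demos and tests; a real sloppy would
    leave it as None and never finish.
    """
    yield "<html>\n<body>\n"
    sent = 0
    while max_chunks is None or sent < max_chunks:
        try:
            # Wait for a queued event, but no longer than the ping interval.
            line = events.get(timeout=ping_interval)
        except queue.Empty:
            # Nothing to report: keep the connection alive.
            line = "PING"
        yield line + "\n"
        sent += 1
```

Whatever serves the request would write each chunk to the response as it is yielded and flush it immediately, otherwise buffering defeats the whole point.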

Now this isn't a true call-back, because the connection has to be initiated by the client, but for most applications I can think of, this is not really such a problem. To receive, the client just keeps an ear to the sloppy and reacts to any events sent over it.
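
Keeping an ear to the sloppy could look something like the sketch below. It assumes the 'page' arrives line by line and that every event looks like `KIND: detail`, as in the DELETION example above. The function name `slopis_events` and the CREATION event used in the usage note are made up for illustration.

```python
from typing import Iterable, Iterator, Tuple


def slopis_events(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Turn the lines of a sloppy into (kind, detail) events.

    Pings, blank lines and the HTML wrapper are dropped; anything of
    the form 'KIND: detail' is passed on to the caller.
    """
    for raw in lines:
        line = raw.strip()
        if not line or line == "PING" or line.startswith("<"):
            continue  # keep-alive or markup, nothing to react to
        kind, sep, detail = line.partition(":")
        if sep:
            yield kind.strip(), detail.strip()
```

Feeding it the lines `<html>`, `<body>`, `DELETION: /mnt/some/file`, `PING` and `CREATION: /mnt/other` yields just the two events. In a real client the lines would come from a streaming HTTP response that is read as it arrives instead of being downloaded in full first.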

So no more issues with firewalls or proxies and no need to create a special hole in them to get your application to work! :-) I can see this being very valuable for AJAX based web-apps.

There can be issues with load and running out of ports from a client source IP if there are a lot of NAT'ed users coming in from the same IP, so it's not a perfect solution. However, it may be more appropriate than polling in many situations.

On Ennui (on-we)

"This is the basic state of the creative soul when work has no meaning and brings suffering. In this state, employees feel they are doing the same things day after day. They repeat the same tasks, fill out the same forms and talk to the same people. They work in an environment they cannot change. They are merely the executors of others’ projects. They believe their bodies are the extension of other people’s minds.

In this stage, employees are not supposed to be creative. Don’t think, just do it! They are told. They don’t feel their work is important because they feel replaceable. Finally, they are not satisfied with what they do. When someone spends half their life doing something they don’t enjoy, it impacts their soul. Indeed, it has a deep impact.

This is the stage of Sisyphus. Sisyphus was a character of ancient Greek mythology. He represents two states of the soul at work: suffering and lack of meaning. Albert Camus explained the fate of Sisyphus with these words:

“The gods had condemned Sisyphus to ceaselessly rolling a rock to the top of a mountain, whence the stone would fall back of its own weight. They had thought with some reason that there is no more dreadful punishment than futile and hopeless labor”.

But the soul tends to move. It is like a river that always finds a channel."

-- The Life Cycle of the Creative Soul

Wednesday, December 14, 2005

A tag based file system (with Bayesian Auto-tagging)

So having been foiled by OmniDrive in my desire to create an Internet based virtual drive, I've moved on to other, perhaps greener and definitely less populated pastures.

If you'll remember, I spoke about adding tagging support to the virtual drive I was talking about. I said that we could "... add tagging support to the virtual drive. You can tag files and folders and view virtual 'tag' folders with links to those files. Mainstream OS's don't have a tagging mechanism for files, so we'll have to add meta-data through file names. e.g. end file names with a special character and the tags (i.e. myfile.txt#work,proposal,text) which will be stripped off before being saved to the virtual drive."

I won't be working on the 'Internet' part of the virtual drive, but I can certainly implement this idea. Why not create a FUSE based file system with tag based virtual folders? Use either folder names (/mnt/tagfs/here:are:some tags/myfile.txt) or file names (/home/user/myfile.txt:here:are:some tags) to add tags and then use virtual tag directories to navigate through those tags? You can use mv to change the tags associated with a file etc.
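
To make the two naming schemes concrete, here's a small sketch of the parsing such a file system would need. This is not FUSE code, just the path handling. I'm assuming the colon is the only separator (so "some tags" is a single tag with a space in it) and that the file system is mounted at /mnt/tagfs; both function names are hypothetical.

```python
import posixpath
from typing import List, Tuple


def split_file_tags(path: str) -> Tuple[str, List[str]]:
    """File-name scheme: '/home/user/myfile.txt:here:are:some tags'."""
    directory, name = posixpath.split(path)
    base, *tags = name.split(":")
    return posixpath.join(directory, base), [t for t in tags if t]


def split_folder_tags(path: str, mount: str = "/mnt/tagfs") -> Tuple[str, List[str]]:
    """Folder-name scheme: '/mnt/tagfs/here:are:some tags/myfile.txt'."""
    relative = posixpath.relpath(path, mount)
    tag_part, name = posixpath.split(relative)
    return name, [t for t in tag_part.split(":") if t]
```

Both example paths from the paragraph above come out with the tags `here`, `are` and `some tags`. A mv between two such paths then boils down to comparing the old and new tag lists.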

In addition, we can have a Bayesian Auto-tagger which learns which tags you've previously used for files of a certain type and then automatically tags them appropriately if no tags are supplied. The more you tag, the better the auto-tagger.
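
A toy version of the auto-tagger might look like this. It's a plain naive Bayes classifier with add-one smoothing, which is only one way to read 'Bayesian' here. The features (the words in the file name, extension included), the class name `AutoTagger` and the choice to treat every tag as its own class are all my assumptions.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, List, Set


def features(filename: str) -> List[str]:
    """Lower-cased words in the file name, including the extension."""
    return [w for w in re.split(r"[^a-z0-9]+", filename.lower()) if w]


class AutoTagger:
    def __init__(self) -> None:
        self.tag_counts: Counter = Counter()  # files seen per tag
        self.feature_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocabulary: Set[str] = set()

    def learn(self, filename: str, tags: List[str]) -> None:
        """Record the tags the user chose for this file."""
        feats = features(filename)
        self.vocabulary.update(feats)
        for tag in tags:
            self.tag_counts[tag] += 1
            self.feature_counts[tag].update(feats)

    def suggest(self, filename: str, top: int = 1) -> List[str]:
        """Return the most probable tags for an untagged file."""
        feats = features(filename)
        total = sum(self.tag_counts.values())
        vocab = len(self.vocabulary) or 1
        scores: Dict[str, float] = {}
        for tag, count in self.tag_counts.items():
            seen = sum(self.feature_counts[tag].values())
            score = math.log(count / total)  # prior: how often the tag is used
            for f in feats:
                # Add-one smoothing so an unseen word does not zero the score.
                score += math.log((self.feature_counts[tag][f] + 1) / (seen + vocab))
            scores[tag] = score
        return sorted(scores, key=lambda t: scores[t], reverse=True)[:top]
```

After learning that `q3_proposal.doc` and `client_proposal.doc` are tagged work and `beach_holiday.jpg` is tagged photos, it suggests work for `new_proposal.doc` and photos for `holiday_2005.jpg`.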

Who knows, I might decide to run with this one! :-)

The saga continues...

Relevant links:

Flickr FS - FUSE based. Supports Flickr tags
Del.icio.us Style File Tagging - Some nice ideas and comments
Microsoft's WinFS - Much more than just tagging... but no actual tags either

Monday, December 12, 2005

Another one bites the dust

No sooner do I start thinking about a globally accessible, scalable, encrypted virtual drive than someone announces their intention of releasing just such a product! :-(

Check out OmniDrive. Pretty much what I was aiming for. They've even got their eyes set on Google and have a very interesting blog entry on The Economics of Online Storage.

Well, back to the drawing board! :-)

Friday, December 09, 2005

The Fine Art of Programming

A fine and growing collection of online programming guides and such. A link worth saving and revisiting.

Thursday, December 08, 2005

A virtual drive in every pot

I've been mulling over the idea of a virtual drive on the Internet for some years now. Witness my ham-handed efforts at http://zfs.sourceforge.net for an early example. Well, I recently resurrected the idea of writing something like it now and it's interesting to see how my thoughts have evolved.

ZFS as I initially envisioned it was to be a network of automatically replicating file servers and the use case in my mind was a university file server. There would be a mapping of many users to a single (virtual) server, with the system (internally a cluster) having to scale to handle as close to an infinite number of users as possible.

Lately, I've been thinking more along the lines of writing a 'Net Drive' type application. A virtual disk I can mount from any machine connected to the Internet and treat as a local drive. Companies like xDrive and iDrive and MangoSoft already offer something of the sort. However, their offerings are targeted more towards the business user. I personally feel there's a massive untapped market of casual users who might be interested.

Imagine having 1GB of space available to you online and directly accessible via a virtual drive. Directly save documents, media files etc. to the virtual drive and access it from anywhere through another computer with the same drive mounted or through a web interface. Share your password with several people and have them save in the same drive if you wish, give them a URL to the data on your drive or just share certain folders. Boom, you've just eliminated the need for a hard disk on your PC. Internet appliances, here we come!

This type of application would be ideal for someone like Google to create and I can see them stepping into this field sometime soon. It's a classic Google app. You need to scale almost infinitely, but that's easy because you can create slices of the virtual resource and limit the number of users accessing each slice. Want to support more users? Add more slices.

Take Gmail as an example. It's probably got hundreds of millions of users, but unlike the University use case, the users have a many to many (or from another perspective, a one to one) relationship with the system. That is, unlike university students, Gmail users are not interested in checking other people's mail or accessing a common email account, or even sharing their email account. This makes it much easier to slice up the virtual space, assigning a limited number of users to each slice and scaling the slices. So Gmail is probably made up of thousands of individual computers, each supporting let's say 1000 users, fronted by an authentication cluster. When a user wants to log in, he goes to gmail.com, is authenticated and then redirected to the individual machine he shares with 999 other people. If you want to add another 1000 users, plug in another machine. You can keep scaling horizontally till infinity for all practical purposes. The authentication datastore will eventually become a bottle-neck, but you can support an enormous number of users before you hit that wall. *

If Google were making a virtual drive, they'd do something similar. As new users signed up, they'd be assigned to different machines, up to a certain max cap. Just like Gmail, users have a one to one relationship with their account. That is, they're only interested in the contents of their accounts and have no need to access anyone else's account or a common store. This makes it trivial to scale horizontally.
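
The slicing idea fits in a few lines. This sketch is my guess at the bookkeeping, not how Google actually does it: new users fill the newest machine until it hits its cap, then another machine is 'plugged in'. The class name `SliceDirectory` and the in-memory dictionary are hypothetical.

```python
from typing import Dict, List


class SliceDirectory:
    """Assigns each new user to a machine (slice) with a fixed user cap."""

    def __init__(self, users_per_slice: int = 1000) -> None:
        self.users_per_slice = users_per_slice
        self.slices: List[List[str]] = []  # one list of users per machine
        self.home: Dict[str, int] = {}     # user -> index of their machine

    def sign_up(self, user: str) -> int:
        if user in self.home:
            return self.home[user]
        if not self.slices or len(self.slices[-1]) >= self.users_per_slice:
            self.slices.append([])  # the newest machine is full: add another
        self.slices[-1].append(user)
        self.home[user] = len(self.slices) - 1
        return self.home[user]

    def locate(self, user: str) -> int:
        """Which machine should this user be redirected to after login?"""
        return self.home[user]
```

Note that the `home` lookup is the one piece every login has to go through, which is why that central store is the part that eventually gets crowded.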

Now this is a great product to make and market, except for one small problem; there are already a whole bunch of people out there doing the same thing. So we need to differentiate ourselves from the pack.

One way to do that is to add tagging support to the virtual drive. You can tag files and folders and view virtual 'tag' folders with links to those files. Mainstream OS's don't have a tagging mechanism for files, so we'll have to add meta-data through file names. e.g. end file names with a special character and the tags (i.e. myfile.txt#work,proposal,text) which will be stripped off before being saved to the virtual drive. Users can also publicly 'share' tags.
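
Stripping the tags off before saving could be as simple as the sketch below. It assumes `#` is the special character and that the first `#` in the name starts the tag list, so a file whose real name contains a `#` would need some escaping rule I haven't thought about. The function name `strip_tags` is made up.

```python
from typing import List, Tuple


def strip_tags(name: str, marker: str = "#") -> Tuple[str, List[str]]:
    """Split 'myfile.txt#work,proposal,text' into the stored name and its tags."""
    stored, sep, tag_part = name.partition(marker)
    tags = [t.strip() for t in tag_part.split(",") if t.strip()] if sep else []
    return stored, tags
```

So `myfile.txt#work,proposal,text` is saved as `myfile.txt` with the tags work, proposal and text, and a name without the marker is saved untouched.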

Other features we can offer are:

Fast file indexing and searching and maybe even mapping/linking files to each other based on content etc.
Clients for hand-helds with disconnected operational ability
Single-click integration with Flickr, Del.icio.us etc.
Rsync based transfers

How are you going to pay for all this? Advertising. Have the virtual drive folder show text/banner ads and the website as well. Have premium accounts and dedicated machines for business users.

Who knows, I might work on this idea... or maybe not.

* Correction: It's possible to avoid turning the authentication store into a bottle-neck. One way to do this would be to have the store for a particular set of users reside on the machine assigned to them. So when you want to access the virtual drive abc.virtualdrive.com, you go to that URL and send in your username and password. If the authentication process running on that machine can't find the user, the login attempt fails.

This just leaves the DNS server as the bottleneck now :-)
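
A sketch of what 'the store resides on the machine' could mean in practice: each machine keeps credentials only for its own users, so a login for anyone else simply fails. The class name `SliceAuthStore` and the salted PBKDF2 hashing are my additions; the correction above doesn't say how passwords would be stored.

```python
import hashlib
import hmac
import os
from typing import Dict, Tuple


class SliceAuthStore:
    """Credentials for the users assigned to one machine, and nobody else."""

    def __init__(self) -> None:
        self._users: Dict[str, Tuple[bytes, bytes]] = {}  # user -> (salt, hash)

    @staticmethod
    def _digest(password: str, salt: bytes) -> bytes:
        return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)

    def add_user(self, username: str, password: str) -> None:
        salt = os.urandom(16)
        self._users[username] = (salt, self._digest(password, salt))

    def login(self, username: str, password: str) -> bool:
        record = self._users.get(username)
        if record is None:
            return False  # this user does not live on this machine
        salt, expected = record
        # Constant-time comparison of the stored and the computed hash.
        return hmac.compare_digest(expected, self._digest(password, salt))
```

With one store per machine, a user added to the store behind abc.virtualdrive.com can log in there, while the same username and password are rejected by every other machine.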

Wednesday, December 07, 2005

PETA - People Eating Tasty Animals

If God didn't want us eating animals, why did He make them out of meat? :-D

A couple of friends and I were discussing the various methods used to slaughter animals (over lunch, when else) and we got to talking about the Halaal method (where a stroke through the neck severs every vein, artery and the wind-pipe, but leaves the spinal column intact) versus the Jhatka method (lit. jerk - where the animal is decapitated in one stroke). The debate revolved around which method caused the least pain to the animal, with most people automatically assuming that the Jhatka method was less painful.

I disagree.

In both methods, the animal eventually becomes unconscious due to a lack of blood going to the brain and the resultant drop in blood pressure. In the Jhatka method, since the animal is decapitated and the head is no longer attached to the body, we don't see the animal kick about and grunt as animals being slaughtered are wont to do. However, the sensation of 'pain' is interpreted by the brain and as long as that is active, the animal will suffer. The head being separated from the body doesn't make any difference.

Since in both methods, the blood flow is disrupted and in one we have the additional pain of the spine being cut through, the Jhatka method should logically be the more painful of the two, with the added disadvantage of not keeping the heart going for as long as possible to clear out as much of the pathogen carrying blood as possible.

It's just an illusion that Jhatka is more merciful, brought on by the stillness of the decapitated animal corpse. A final verdict awaits the time when it will be possible for us to measure and quantify 'pain' as a value...

Hungry kya? :-P

Tuesday, December 06, 2005

SQL Injection and XSS Attacks

Some topics everyone involved in web development must read at least once:

SQL Injection Attacks by Example - It's a lot easier than you think. And yes, your customers will try out some of the standard approaches, out of idle curiosity if nothing else.
Real World XSS - You'll be surprised at the sites which are vulnerable to attacks of this nature.
More XSS
And still more XSS
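
To show just how easy the SQL injection part is, here's a tiny self-contained demonstration using Python's built-in sqlite3 module. The table, the user and the plain-text password are invented purely for the demo, and the attack string is the classic textbook one rather than anything taken from the articles above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")


def login_unsafe(name: str, password: str) -> bool:
    # The input is pasted straight into the SQL text: this is the bug.
    query = ("SELECT COUNT(*) FROM users "
             f"WHERE name = '{name}' AND password = '{password}'")
    return conn.execute(query).fetchone()[0] > 0


def login_safe(name: str, password: str) -> bool:
    # Placeholders keep the input as data, never as SQL.
    query = "SELECT COUNT(*) FROM users WHERE name = ? AND password = ?"
    return conn.execute(query, (name, password)).fetchone()[0] > 0


attack = "' OR '1'='1"
print(login_unsafe("alice", attack))  # True: logged in without the password
print(login_safe("alice", attack))    # False: the same input is just a wrong password
```

The only difference between the two functions is whether the user's input ends up inside the SQL text or travels separately as a parameter.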

Mark Cuban - Success & Motivation

Blog Maverick - The Mark Cuban Blog

In Essence:

1. Be driven.
2. Know the industry you're in, inside out.
3. Use 1. and 2. to ensure you're ready when Lady Luck strikes.
4. You only need to be lucky once...

Cuban comes across as an intense, driven workaholic, kind of like my current (successfully entrepreneurial) employer :-D. It's a bit depressing to think that the only way to free yourself from the chains of a 9-5 life is to handcuff yourself to a 00:00-23:59 one!
