Sandeep Deb's Home Page
Pluto
  Pluto - Technical details

The core of Pluto's design rests on two fundamental principles: loose coupling and usability. The usual design concerns are all addressed as well, but these two are given the highest importance. Loose coupling matters because Pluto's future is unknown (Pluto evolves, and so does my understanding of the market), and the only way to mitigate that is to design it on the Lego principle: I should be able to strip off blocks and rebuild them with equal ease. The focus on usability is primarily on the user interface. Since Pluto is centered on data visualization and analysis, a usable interface is the only way to assimilate the data that Pluto gathers.

The diagram on the right shows a very high-level structure of Pluto's internals. It is oversimplified, but it makes the point that Pluto is built on a well-proven architecture of layers and tiers, with an unwavering focus on reuse.

The design of Pluto has also evolved with the realization that, within a subnet, Pluto is most economically operated in a client-server architecture, where the server comprises the persistence layer and the data-fetching logic. Each deployment of Pluto's server requires gigabytes of disk space to store the end-of-day values of stocks for the last 20 years (or more), not to mention the network bandwidth. On average, Pluto is estimated to download around 70 MB of data. With this in mind, the communication between the client and the server has been kept extremely loosely coupled, which makes it possible to deploy the client alone as a separate application, an applet, or a WebStart application. Spring is used heavily to achieve this decoupling through interface-based dependency injection.
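
To illustrate what interface-based dependency injection means here, below is a minimal sketch in plain Java. It does not use Spring (the wiring is done by hand in main, which is the role a Spring context would play), and the names QuoteService, InMemoryQuoteService and ChartModel are hypothetical, not Pluto's actual classes.

```java
import java.util.List;
import java.util.Map;

// Entry point: wires the pieces by hand, the role a Spring context would play.
class WiringSketch {
    public static void main(String[] args) {
        QuoteService service = new InMemoryQuoteService(
                Map.of("ACME", List.of(10.0, 10.5, 11.25)));
        ChartModel model = new ChartModel(service);
        System.out.println(model.latestClose("ACME"));
    }
}

// Hypothetical contract between client and server. The client sees only this.
interface QuoteService {
    List<Double> closingPrices(String symbol);
}

// Stand-in for the server side; a real one would sit on the persistence layer.
final class InMemoryQuoteService implements QuoteService {
    private final Map<String, List<Double>> store;

    InMemoryQuoteService(Map<String, List<Double>> store) {
        this.store = store;
    }

    @Override
    public List<Double> closingPrices(String symbol) {
        return store.getOrDefault(symbol, List.of());
    }
}

// Client-side component. Its dependency is injected through the constructor,
// so it never names a concrete implementation.
final class ChartModel {
    private final QuoteService quotes;

    ChartModel(QuoteService quotes) {
        this.quotes = quotes;
    }

    double latestClose(String symbol) {
        List<Double> prices = quotes.closingPrices(symbol);
        if (prices.isEmpty()) {
            throw new IllegalArgumentException("no data for " + symbol);
        }
        return prices.get(prices.size() - 1);
    }
}
```

Because ChartModel depends only on the interface, the in-memory implementation could be swapped for a remote proxy without touching the client code, which is what makes a client-only deployment feasible.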

Pluto's data capture algorithms rely heavily on screen scraping. At present the scraping is built in, using custom string manipulation code that is not very resilient to HTML layout changes in the page. However, the logic that extracts data from the scraped content is well isolated. The intention is to replace the custom logic with WebHarvest and Solvent at a later date. WebHarvest and Solvent query the HTML DOM with languages such as XQuery and XPath rather than matching raw strings, which makes data easier to extract and more resilient to string changes in the HTML.
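
The sketch below shows one way such isolation could look: the extraction logic sits behind a small interface, so a string-scanning implementation can later be replaced by a DOM-query one. This is an assumption-laden illustration, not Pluto's code. The interface CloseExtractor, the class StringScanExtractor and the HTML snippet are all hypothetical.

```java
import java.util.Optional;

class ScrapeSketch {
    public static void main(String[] args) {
        CloseExtractor extractor = new StringScanExtractor();
        // Hypothetical page fragment; real pages will differ.
        String html = "<tr><td class=\"close\"> 101.25 </td></tr>";
        System.out.println(extractor.extractClose(html));
    }
}

// Hypothetical seam between fetching a page and pulling a value out of it.
interface CloseExtractor {
    Optional<Double> extractClose(String html);
}

// Custom string manipulation: fast to write, but tied to the exact markup.
final class StringScanExtractor implements CloseExtractor {
    private static final String START = "<td class=\"close\">";
    private static final String END = "</td>";

    @Override
    public Optional<Double> extractClose(String html) {
        int from = html.indexOf(START);
        if (from < 0) {
            return Optional.empty(); // marker missing, e.g. the layout changed
        }
        int valueStart = from + START.length();
        int to = html.indexOf(END, valueStart);
        if (to < 0) {
            return Optional.empty();
        }
        try {
            return Optional.of(Double.parseDouble(html.substring(valueStart, to).trim()));
        } catch (NumberFormatException e) {
            return Optional.empty(); // cell did not contain a number
        }
    }
}
```

Even a cosmetic change such as single quotes around the class attribute defeats the string scan, which is the fragility described above. Callers that hold only a CloseExtractor would not need to change when a more robust implementation is dropped in.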

An integral part of Pluto's internals is a job subsystem. As I have mentioned earlier, Pluto is designed to be autonomous and to function with minimal user intervention. For this reason Pluto runs as a background daemon (on Windows, as an operating system service). To function on its own, Pluto relies on a barrage of cron-triggered jobs that fire at predefined intervals. The jobs include fetching end-of-day data, fetching intraday data, archiving old records, checking the network status, and so on. The cron triggering is implemented with Quartz.
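
As a rough picture of what a job subsystem does, here is a small sketch that runs a job at a fixed interval. It deliberately uses the JDK's ScheduledExecutorService as a stand-in and not Quartz, so it supports only fixed periods, not cron expressions. JobRunner and JobSketch are hypothetical names, and the counting job is a placeholder for real work such as fetching end-of-day data.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class JobSketch {
    // Schedules a placeholder job every periodMillis and reports whether it
    // fired at least `times` times within two seconds.
    static boolean firedAtLeast(int times, long periodMillis) {
        CountDownLatch latch = new CountDownLatch(times);
        JobRunner runner = new JobRunner();
        runner.every(periodMillis, latch::countDown);
        try {
            return latch.await(2, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            runner.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("job fired three times: " + firedAtLeast(3, 10));
    }
}

// Hypothetical minimal job runner built on the JDK scheduler.
final class JobRunner {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    void every(long periodMillis, Runnable job) {
        scheduler.scheduleAtFixedRate(() -> {
            try {
                job.run();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all later runs of this
                // task, so a failing job is logged and the schedule continues.
                System.err.println("job failed: " + e);
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    void shutdown() {
        scheduler.shutdownNow();
    }
}
```

A cron-style scheduler such as Quartz adds calendar-based triggers on top of this idea, which suits jobs like an end-of-day fetch that should run at a particular time and not merely every N minutes.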

I will add more details to this page over time. In the interim, if you are genuinely interested in learning more about Pluto or contributing to it, please get in touch. I would also love to hear any technical views you might have on making Pluto better. You can always reach me through my guest book.