Monday, October 18, 2004

Moving towards Scalable Distributed and Highly Available Data Structures

Multicomputers
--------------

Commodity computers interconnected through high-speed networks are becoming the basic hardware. Such configurations, called multicomputers, clusters, or superservers, include dozens, often hundreds or even thousands, of clients and servers. Their cumulative resources are impressive: dozens of GBytes of distributed RAM and TBytes of disks, accessible for GMips-scale, highly parallel and distributed processing. They offer potentially unbeatable performance and price/performance ratios, opening up new perspectives for applications. Major hardware and software makers present multicomputers as the next step of their business strategy. This was, among other things, the subject of the Microsoft Scalability Day organized in May 1997 for US professionals. The technology is also seen as the next step for the Internet, which should evolve from the data-providing utility it is today into a computing utility. Very large multicomputers are also foreseen as new tools for academia, e.g., a 10,000-node multicomputer for Stanford University in 5-10 years. Finally, the domain was recently declared strategic to US computer-science supremacy in the 21st century by the US Government, and became the object of the nationwide research program PACI.

Multicomputers need new system software that takes full advantage of the distributed RAM and of parallel processing on multiple CPUs. One especially needs new data structures that allow files to span multiple servers and to scale to as many sites as needed. Such files should reside in distributed RAM for processing, providing access performance unattainable with disk files. They should be accessible to multiple autonomous clients, including mobile ones. Finally, they should not require any centralized data-access computation or directories, to avoid hot spots.

One solution is a new class of data structures called Scalable Distributed Data Structures (SDDSs). First proposed in 1992-1993, SDDSs gave rise to an important research effort, materialized in several algorithms, papers, and implementations. It was shown that SDDS files can scale to thousands of sites and to terabytes in distributed RAM, with constant access performance and search times under a millisecond. Multi-key searches requiring an hour or so in a traditional file, e.g., a k-d file, may succeed in less than a second in an SDDS file. All these properties should be of prime importance for applications, especially in the DBMS design arena. They open new perspectives for VLDB design, for multimedia databases, for real-time and high-availability databases, for decision-support systems, and for high-performance computing in general.
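To make the "no centralized directory" idea concrete, here is a minimal Python sketch of client-side addressing in the spirit of LH*, the best-known SDDS. The function and the client-image comments are my own simplified illustration, not the full published protocol.

# LH*-style linear-hashing address computation (simplified sketch).
# i = file level, n = split pointer; together they describe the file's
# current extent of 2**i + n buckets.
def lh_address(key: int, i: int, n: int) -> int:
    a = key % (2 ** i)            # h_i(key)
    if a < n:                     # this bucket has already split...
        a = key % (2 ** (i + 1))  # ...so use h_{i+1}(key) instead
    return a

# A client keeps a possibly outdated image (i', n') of the file state and
# computes an address locally. A server receiving a misdirected request
# forwards it and sends the client an image-adjustment message, so no
# centralized component ever sits on the access path.
print(lh_address(12345, i=3, n=2))   # -> 9, the bucket for key 12345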

Wednesday, October 06, 2004

The Need for Incremental Data Structures... from Transistors to Transactions

Yep, that sounds really great. As a developer, have you ever felt the need for these things? Currently I am into this interesting area. I talked about making your data structures persistent (look at my previous blog); persistent data structures can replace the need for a database, by providing the concurrency and consistency of the data held in these structures. So now comes the challenge: incremental data structures. These are most essential when the data in the structures is really behemoth. Take the following examples from two totally different domains.

Geometric Synthesis: Physical Design
------------------------------------

As we know, many physical design tools need to represent all the geometric information of a circuit; the fabricator needs just the geometric information about the polygons in each layer (p-type, n-type, metal, etc.). So for the software tools (physical design layout editors) to process this information, the complete information must be kept in some data structure. I'll give an example of my favourite data structure, the CORNER STITCH, created by Ousterhout. Unfortunately, the corner stitch has the following drawbacks:

1. It is a purely in-memory data structure.
2. It is not an incremental data structure.

So to handle incremental changes (custom IC design), we need an incremental corner stitch. The question is: can we make an incremental and external-memory corner stitch? (A rough sketch of the basic tile structure appears at the end of this post.)

Business Intelligence
---------------------

Let's move to a totally different domain that deals with massive amounts of data; let's move from transistors to transactions. The massive data I am talking about here is the data you dump into your data warehouse. Most analytical queries are based on MVs (materialized views). These MVs hold massive amounts of data, really massive, and they are generally refreshed every 1 to 1.5 months to analyze the transactions a particular enterprise is processing. Yes, I am talking about OLAP, not OLTP. Generally the refresh time of an OLAP system is very large for an enterprise; typically several days pass before the managers can analyze the reports on these MVs. Now the challenge is: can we build a real-time OLAP system that does not take this much time to refresh? Yes, yes, you got it right, I am talking about incremental refresh, and an incremental refresh needs incremental data structures. (A toy sketch of incremental view maintenance also appears at the end of this post.) Recently I have been working with Oracle, on a business intelligence product. Oracle is the first company to recognise the need for incremental algorithms in the area of business intelligence. Daily Business Intelligence (DBI) is their implementation of such an incremental OLAP system, with a lot of incremental data structures in it. You can study more about DBI.

So, do you think your data structures need an incremental tinge?
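For the curious, here is the promised rough Python sketch of a corner-stitched tile and the classic point-location walk, in the spirit of Ousterhout's paper. The class and field names are my own simplification for illustration, not Magic's actual structures; note it is purely in-memory, which is exactly drawback 1 above.

class Tile:
    # Each tile stores its bottom-left and top-right corners plus four
    # corner stitches: rt/tr hang off the top-right corner, bl/lb off
    # the bottom-left corner.
    def __init__(self, x_lo, y_lo, x_hi, y_hi, layer=None):
        self.x_lo, self.y_lo = x_lo, y_lo
        self.x_hi, self.y_hi = x_hi, y_hi
        self.layer = layer  # None means empty space
        self.rt = None      # rightmost neighbor along the top edge
        self.tr = None      # topmost neighbor along the right edge
        self.bl = None      # bottommost neighbor along the left edge
        self.lb = None      # leftmost neighbor along the bottom edge

    def contains(self, x, y):
        return self.x_lo <= x < self.x_hi and self.y_lo <= y < self.y_hi

def point_find(start, x, y):
    # Walk from any tile to the tile containing (x, y), alternating
    # vertical and horizontal moves along the stitches.
    t = start
    while not t.contains(x, y):
        while y >= t.y_hi:
            t = t.rt  # move up
        while y < t.y_lo:
            t = t.lb  # move down
        while x >= t.x_hi:
            t = t.tr  # move right
        while x < t.x_lo:
            t = t.bl  # move left
    return t

Making this incremental and external-memory would mean persisting tiles (e.g., in pages keyed by region) and updating only the stitches touched by a local edit, rather than rebuilding the whole plane.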
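And here is the toy sketch of the incremental-refresh idea for materialized views. It is my own illustration of incremental view maintenance in general, not how Oracle DBI is actually implemented: a SUM/COUNT aggregate is folded forward from a delta of new transactions, so the refresh cost is proportional to the delta, not to the whole fact table.

from collections import defaultdict

# Materialized view: per-region SUM and COUNT of sales amounts.
view = defaultdict(lambda: {"sum": 0.0, "count": 0})

def full_refresh(all_transactions):
    # Conventional refresh: rebuild from the entire fact table.
    view.clear()
    apply_delta(all_transactions)

def apply_delta(new_transactions):
    # Incremental refresh: cost proportional to the delta only.
    for region, amount in new_transactions:
        view[region]["sum"] += amount
        view[region]["count"] += 1

# Fold in just today's transactions instead of recomputing everything.
apply_delta([("west", 120.0), ("east", 75.5), ("west", 30.0)])
print(view["west"])   # {'sum': 150.0, 'count': 2}

Vamsi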