Monday, October 18, 2004

Moving towards Scalable Distributed and Highly Available Data Structures

Multicomputers
---------------

Commodity computers, interconnected through high-speed networks, are becoming the basic hardware. Such configurations, called multicomputers, clusters, or superservers, include dozens, often hundreds, or even thousands of clients and servers. Their cumulative resources are impressive: dozens of GBytes of distributed RAM and TBytes of disks, accessible for GMips of highly parallel and distributed processing. They offer a potentially unbeatable performance and price/performance ratio, opening up new perspectives for applications. Major hardware and software makers present multicomputers as the next step of their business strategy. This was, among others, the subject of the Microsoft Scalability Day organized in May 1997 for US professionals. The technology is also seen as the next step for the Internet, which should evolve from the data-providing utility it is today into a computing utility. Some also foresee very large multicomputers as new tools for academia, e.g., a 10,000-node multicomputer for Stanford University in 5-10 years. Finally, the domain was recently declared strategic to US computer-science supremacy in the 21st century by the US Government, and became the object of the nationwide research program PACI.

Multicomputers need new system software that takes full advantage of the distributed RAM and of parallel processing on multiple CPUs. One especially needs new data structures allowing files to span multiple servers and to scale to as many sites as needed. Such files should reside for processing in the distributed RAM, providing access performance unattainable for disk files. They should be accessible to multiple autonomous clients, including mobile ones. Finally, they should not require any centralized data-access computation or directory, to avoid hot spots.
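To make the last requirement concrete, here is a minimal sketch of fully decentralized, client-side addressing in the style of LH*, the linear-hashing SDDS from the 1992-1993 proposals. The function names, the hash family h_i(key) = key mod 2^i, and the "image" variables are illustrative assumptions, not details given in this post; the point is that each client keeps only a private, possibly outdated image of the file state, so no central directory is ever consulted.

def lh_hash(key: int, level: int) -> int:
    """Linear-hashing family: h_i(key) = key mod 2^i."""
    return key % (2 ** level)

def client_address(key: int, i_image: int, n_image: int) -> int:
    """Compute a bucket address using only the client's local image.

    i_image -- the file level the client believes is current
    n_image -- the split pointer the client believes is current
    No directory lookup and no coordinator: hot spots are avoided
    because every client computes addresses independently.
    """
    a = lh_hash(key, i_image)
    if a < n_image:                      # this bucket has already split,
        a = lh_hash(key, i_image + 1)    # so rehash at the next level
    return a

An outdated image may send a request to the wrong server; in the SDDS designs the servers themselves correct such requests, as sketched after the next paragraph.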

One solution is a new class of data structures called Scalable Distributed Data Structures (SDDSs). First proposed in 1992-1993, SDDSs gave rise to an important research effort, materialized in several algorithms, papers, and implementations. It was shown that SDDS files are able to scale to thousands of sites and terabytes in distributed RAM, with constant access performance and search times under a millisecond. Multi-key searches requiring an hour or so in a traditional file, e.g., a k-d file, may succeed in less than a second in an SDDS file. All these properties should be of prime importance for applications, especially in the DBMS design arena. They open new perspectives for VLDB design, for multimedia databases, for real-time and high-availability databases, for decision-support systems, and for high-performance computing in general.
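The constant access performance rests on the property that an outdated client image costs at most a bounded number of extra message hops (two, in the LH* design), after which the client's image is adjusted. Below is a hedged sketch of the server-side check in the LH* style; the names and the coding of the forwarding rule are an illustration under the same assumptions as the client-side sketch above, not the post's own specification.

def lh_hash(key: int, level: int) -> int:
    """Same assumed hash family as in the client-side sketch."""
    return key % (2 ** level)

def server_resolve(key: int, a: int, j: int) -> int:
    """Address check performed by bucket a, whose own level is j.

    Returns the address the request should go to next; returning a
    itself means the request arrived at the right bucket. The LH*
    rule bounds the number of forwardings at two, hence constant
    access cost even as the file scales to thousands of servers.
    """
    a_next = lh_hash(key, j)          # where the key belongs per this bucket's level
    if a_next != a:
        a_lower = lh_hash(key, j - 1)
        if a < a_lower < a_next:      # lower-level guess is a closer correct hop
            a_next = a_lower
    return a_next

In the LH* design, the server that finally holds the key also sends the client an image-adjustment message, so the same addressing error is not repeated; this is what keeps the per-request cost constant as the file grows.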
