## Saturday, December 18, 2004

### L-TREE: a new datastructure for efficient hierarchy representation in a database.....

Hey, don't say that you guys have never come across an LDAP server.... hmm... OK, it's your directory within your organization. Every organization is hierarchical, in the sense that a 'senior manager' has several 'managers' reporting to him, a 'manager' has 'developers', etc., so all this data is hierarchical. Now if I gave you this problem and asked you to build a system which answers queries like 'give me all the employees under employee X', how would you approach it?....
One quick solution is to create a table in the database:

```sql
CREATE TABLE ORG_DIR (
    EMP_ID   INT NOT NULL,
    EMP_NAME VARCHAR(10),
    SUP_ID   INT
);
```

To answer the query you might then do a self-join:

```sql
SELECT T1.EMP_NAME
FROM ORG_DIR T1, ORG_DIR T2
WHERE T1.SUP_ID = T2.EMP_ID
  AND T2.EMP_ID = &INPUT;
```

(Standard SQL uses `=` and `AND`, not the C-style `==` and `&&`. Also note this returns only the *direct* reports of the given employee; to fetch the whole subtree you have to repeat the join once per level of the hierarchy.)

Finally you have got a workaround... Unfortunately this kind of workaround is very inefficient for storing and retrieving structured hierarchical data. Hmm... I know you are thinking about something, right??? XML :)) Yes, it's XML. How can I store XML in my database and answer my queries on the hierarchical data?

My idea is to make HIERARCHY a datatype in itself, rather than something bolted on. That is, just as we have primitive datatype support within a database (every column is of some primitive datatype supported by the database), the idea is: can we make HIERARCHY a datatype?.... Hey, wait a sec, I just figured out I'm not the only one who thought about it... yes, Oleg has already added a great contrib module for PostgreSQL. Oleg made it generic, so that you can persist any label-based hierarchy... Now I have a good idea of extending L-TREE to support XML... It's really great to have such an optimized hierarchy representation for structured data. People have been talking about indexing and datastructures for XML databases, but this seems to me a really good contribution from Oleg.....

Vamsi
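For the curious, here is roughly what the employee query looks like with the `ltree` contrib type (a sketch, assuming the extension is installed; the table and label names are made up for illustration). The `<@` "is a descendant of" operator fetches the whole subtree in one indexed query, with no self-joins and no per-level recursion:

```sql
-- hypothetical org chart stored as materialized label paths
CREATE TABLE org_chart (emp_name VARCHAR(10), path ltree);
CREATE INDEX org_chart_path_idx ON org_chart USING gist (path);

INSERT INTO org_chart VALUES ('Alice', 'Alice');
INSERT INTO org_chart VALUES ('Bob',   'Alice.Bob');
INSERT INTO org_chart VALUES ('Carol', 'Alice.Bob.Carol');

-- all employees under Bob, however deep in the hierarchy
SELECT emp_name FROM org_chart WHERE path <@ 'Alice.Bob';
```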

## Tuesday, December 14, 2004

### Useful unix command for recursively finding all the files which contain a string

```shell
peemt4:vamsik:(vamsi_aix_ns_port):/vobs/ETG_Repository/src> find . -name "*.c" -exec grep -i "Your Search String" \{\} \; -print
```

Sounds simple, but I thought it would be useful....
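On systems with GNU grep, the same search can be done without `find` at all (a hedged alternative: `--include` is a GNU extension, so it may not exist on a stock AIX grep):

```shell
# -r recurse, -i case-insensitive, -l list matching file names only;
# --include restricts the search to *.c files (GNU grep extension)
grep -ril --include="*.c" "Your Search String" .
```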

## Saturday, December 04, 2004

### Moving from STL to GTL

GTL
Most developers building systems eventually get stuck with the problem of implementing graph algorithms. Graph algorithms are really fundamental and are applied extensively... take the following example:

1. Circuit partitioning: the stage where the designer has come up with a netlist and technology mapping has been done. Now we want to partition the circuit so that highly connected blocks stay together..... So when I talk about a netlist... there it goes, the HYPERGRAPH, the most fundamental netlist representation. To build systems which process these hypergraphs (netlists) you need to write code in a very generic manner, because the hypergraph partitioned by the partitioning algorithms will be used by the placement algorithms, and it goes on and on....... Hmmm, now I recollect the network stack, where a pointer to the packet is passed across several layers; to avoid the redundancy involved in copying over the stack (process stack, not network stack) they pass a pointer to some heap space....... Similarly in EDA tools, where the netlist is taken through several stages (verification included) and finally realized into physical geometry, the developers, especially EDA tool developers, need generic implementations like STL. It's a really great TEMPLATE LIBRARY that lets you quickly do your tasks and modify only the appropriate parts of the algorithms..... but STL has no support for hypergraphs, I mean not even graphs.... you need to create your own graph container on the existing framework..... So to avoid this we have got GTL, a great library..... I came across it while helping a friend implement the KLM circuit partitioning algorithm... I feel it would be really helpful for many other computer science domains apart from CAD....

Cheers
Vamsi

## Monday, November 22, 2004

### Trying to start a new life (Genetic Algorithms)..... but those old memories (Deterministic Algorithms) still make me feel emotional.........

Joined JGAP to contribute to the opensource community. Genetic algorithms (GAs) are search algorithms that work via the process of natural selection. They begin with a sample set of potential solutions which then evolves toward a set of more optimal solutions. Within the sample set, solutions that are poor tend to die out while better solutions mate and propagate their advantageous traits, thus introducing more solutions into the set that boast greater potential (the total set size remains constant; for each new solution added, an old one is removed). A little random mutation helps guarantee that a set won't stagnate and simply fill up with numerous copies of the same solution. In general, genetic algorithms tend to work better than traditional optimization algorithms because they're less likely to be led astray by local optima. This is because they don't make use of single-point transition rules to move from one single instance in the solution space to another. Instead, GAs take advantage of an entire set of solutions spread throughout the solution space, all of which are experimenting upon many potential optima. However, in order for genetic algorithms to work effectively, a few criteria must be met:

* It must be relatively easy to evaluate how "good" a potential solution is relative to other potential solutions.
* It must be possible to break a potential solution into discrete parts that can vary independently. These parts become the "genes" in the genetic algorithm.
* Finally, genetic algorithms are best suited for situations where a "good" answer will suffice, even if it's not the absolute best answer.

## Tuesday, November 16, 2004

### Never count on anyone except yourself.......... (Don't trust the runtime; it will never behave ideally for your programs)

Never think that the code you have written is platform independent, even if you wrote it according to POSIX standards. This is what programs teach us in life: never trust anything which was not written by you. YOU CANNOT COUNT ON ANYONE EXCEPT YOU....

---------------------------------------

An echo fades into the night, an eerie mournful sound.
A shooting star disappears from sight, and I crumble to the ground.
There is no life within this garden; my sobs are the only sound.
I have poisoned the honeyed fountain where your love could be found.
Dazed, I stare at the stars above, my grieving howls fill the night!
Unintended betrayal of love has hidden you from my sight.
I remember how it used to be when we shared our fears and delights.
You are a treasured friend to me. How can I make things right?
Feeling afraid, cold and lonely, I long to tell you how I feel,
but you don't want to hear me. The pain for you is much too real.
Should I back away and build a wall and block away how I feel?
Or, should I give you a call? We both need some time to heal.
An echo fades into the night as our friendship disappears.
How do I know what is right? How can I ease my fears?
If I do call you again, would the old wounds reappear?
I can't stand to cause you pain. Hurting you again is my worst fear!

## Friday, November 12, 2004

### What you feel when the __L_I_F_E__ daemon receives signal 11...

Pain... Tension... Fatigue... Depression...
Anger, Aggression, Frustration.
All these unwanted sensations -
Burning, hurting, tearing.
My heart alone, cold and fearing.
Why won't you let me sleep, let me rest,
Let me forget
To eradicate, eliminate, destroy all my regrets?
These memories inside, swirling, twirling, unwilling to reside
in the corner of my mind.
Repeating, resisting, insisting -
Refusing to be denied its recognition
Of its position in my Frustration, Confusion, Delusion.
Ah, to close my eyes and let time fly by,
Because there's so much to gain
By forgetting these dreams driving me insane.
Unfocused, unclear, out of control,
My world spinning, spinning, spinning,
My sanity flying through the door.
My reason, my logic, oh, it's tragic,
Like fine sands running through my hands,
I'm losing my mind.

## Tuesday, November 09, 2004

1. Research needs more brains than hands (unfortunately we were given two hands but only one brain); before counting how many tools you can use, ponder how effectively you can think.
2. Writing occupies a critical role, since "writing is nature's way of letting you know how sloppy your thinking is".
3. Mathematics is the key word, since it is "the science of effective reasoning"; or, "mathematics is nature's way of letting you know how sloppy your writing is".

## Saturday, November 06, 2004

### External Memory Geometric Datastructures

Many massive-dataset applications involve geometric data, or data that can be interpreted geometrically: points, lines, polygons. The data needs to be stored in data structures on external storage media such that on-line queries can be answered I/O-efficiently, and it often needs to be maintained during dynamic updates. Data sets in large applications are often too massive to fit completely inside the computer's internal memory. The resulting input/output communication (or I/O) between fast internal memory and slower external memory (such as disks) can be a major performance bottleneck. During the last decade a major body of research has been devoted to the development of efficient external memory algorithms, where the goal is to exploit locality in order to reduce the I/O costs. Examples:

* Phone: wireless tracking
* Consumer: buying patterns (supermarket checkout)
* Geography: NASA satellites generate 1.2 TB per day

## Tuesday, November 02, 2004

### How to represent INFINITY in your programs

No doubt everyone has come across this in their programming experience: how do you represent the theoretical infinity in a program?

Example:
-----------

The most candid use of INFINITY is in graphs, to represent two nodes which are not connected, i.e., if we don't have any edge between two nodes N1 and N2 then we represent the cost of the edge as INFINITY. In the implementation of Dijkstra's algorithm on a graph we generally come across a condition like 'if DIST(N2) > DIST(N1) + WEIGHT(N1,N2), then update the path'. Theoretically INFINITY + 1 = INFINITY. So if someone just does a #define INFINITY __SOME_LARGE_NUMBER, then INFINITY + 1 will not be INFINITY (it may even overflow), and conditions just like the one illustrated above become buggy. The following representation of infinity (from a tile-based layout editor's source) is shrewder than the ordinary #define:
```c
/*
 * The following coordinate, INFINITY, is used to represent a
 * tile location outside of the tile plane.
 *
 * It must be possible to represent INFINITY+1 as well as
 * INFINITY.
 *
 * Also, because locations involving INFINITY may be transformed,
 * it is desirable that additions and subtractions of small integers
 * from either INFINITY or MINFINITY not cause overflow.
 *
 * Consequently, we define INFINITY to be the largest integer
 * representable in wordsize - 5 bits.
 */

#undef INFINITY
#define	INFINITY	((1 << (8*sizeof (int) - 6)) - 4)
#define	MINFINITY	(-INFINITY)
```


## Monday, October 18, 2004

### Moving towards Scalable, Distributed and Highly Available Data Structures

MultiComputers: ---------------

Commodity computers, interconnected through high-speed networks, are becoming the basic hardware. Such configurations, called multicomputers, clusters, superservers..., include dozens, often hundreds or even thousands of clients and servers. Their cumulative resources are impressive: dozens of GBytes of distributed RAM, and TBytes of disks accessible for GMips of highly parallel and distributed processing. They offer potentially unbeatable performance and price/performance ratios, opening up new perspectives for applications. Major hardware and software makers present multicomputers as the next step of their business strategy. This was, among others, the subject of the Microsoft Scalability Day organized in May 1997 for US professionals. The technology is also felt to be the next step for the Internet, which should evolve from the data-providing utility of today into a computing utility. One also foresees very large multicomputers coming as new tools for academia, e.g., a 10,000-node multicomputer for Stanford University in 5-10 years. Finally, the domain was recently declared strategic to US computer science supremacy in the 21st century by the US Government, and became the object of the nationwide research program PACI.

Multicomputers need new system software that takes full advantage of the distributed RAM and of parallel processing on multiple CPUs. One especially needs new data structures, allowing files to span multiple servers and to scale to as many sites as needed. Such files should reside for processing in the distributed RAM, providing access performance inaccessible to disk files. They should be accessible to multiple autonomous clients, including mobile ones. Finally, they should not require any centralized data access computation or directories, to avoid hot spots.

One solution is a new class of data structures called Scalable Distributed Data Structures (SDDSs). First proposed in 1992-1993, SDDSs gave rise to an important research effort, materialized in several algorithms, papers and implementations. It was shown that SDDS files are able to scale to thousands of sites, and to terabytes in distributed RAM, with constant access performance and search times under a millisecond. Multi-key searches requiring an hour or so in a traditional file, e.g., a k-d file, may succeed in less than a second in an SDDS file. All these properties should be of prime importance for applications, especially in the DBMS design arena. They open new perspectives for VLDB design, for multimedia databases, for real-time and high-availability databases, for decision support systems, and for high performance computing in general.

## Wednesday, October 06, 2004

### The need for Incremental Datastructures......... from transistors to transactions

Yep, that sounds really great. As a developer, did you ever feel the need for these things? Currently I am into this interesting area. I talked about making your datastructures persistent (look at my previous blog); persistent datastructures can replace the need for a database while keeping the concurrency and consistency of the data in these structures. So now comes the challenge: Incremental Datastructures. They are most essential when the data in these structures is really behemoth. Take the following examples from totally different domains.

Geometric synthesis: Physical Design ------------------------------------

As we know, in many physical design tools we need to represent all the geometric information of the circuit; the fabricator just needs the geometric information about the polygons in each layer (p-type, n-type, metal etc.). So the software tools (physical design layout editors) have to process this information, and the complete information should be kept in some datastructure. I'll give an example of my favourite datastructure, the CORNER STITCH, created by Ousterhout. Unfortunately the corner stitch has the following drawbacks: 1. It is a totally in-memory datastructure. 2. It is not an incremental datastructure. So to handle incremental changes (custom IC design) we need an incremental corner stitch. The question is: can we make an incremental, external-memory corner stitch?

Business Intelligence -----------------------

Let's move into a totally different domain which deals with massive amounts of data; let's move from transistors to transactions. The massive data I am talking about here is the data you dump into your data warehouse. Most of the analytical queries there are based on MVs (Materialized Views). These MVs hold really massive amounts of data, and they are generally refreshed every 1 to 1.5 months to analyze the transactions a particular enterprise is processing. Yes, I am talking about OLAP, not OLTP. Generally the refresh time of an OLAP system is very large for an enterprise; it typically takes several days before the managers can analyze the reports on these MVs. Now the challenge is: can we build a real-time OLAP system which does not take this much time to refresh? Yes, you got it right, I am talking about incremental refresh... and incremental refresh needs incremental datastructures. Recently I have been working at Oracle on a Business Intelligence product. Oracle is the first company to recognize the need for incremental algorithms in the area of business intelligence; their Daily Business Intelligence (DBI) is an implementation of such an incremental OLAP system, with a lot of incremental datastructures in it.

So, do you think your datastructures need an incremental tinge???? Vamsi

## Friday, September 17, 2004

### It's never too late to get it back.......... (This is what I feel today....)

Hey guys and bi%$#@!, no one is a master, and especially if you are aspiring to become one, never ever feel that you are. I know it really takes a lot of nerve to become one: 1. have that same yearning passion for creating complex things, forever; 2. always want to be THE ONE. Yes, I am an amateur hacker... I always wanted to be like http://catb.org/~esr. If you are also one with the same attitude, just read on. Here I just wanted to tell about the things an aspiring hacker needs to take care of during his career: never feel that you have lost the things I mentioned above because of a failure. A failure may not always be technical sloppiness (at least for me); you can feel that you have failed in any number of situations. Just remember that it's never too late to get it back (ITS NEVER TOO LATE TO GET IT BACK).

## Sunday, July 18, 2004

### Have you ever wondered how to create persistent datastructures.......?

Hey, recently I came across the problem of how to implement external memory datastructures... External memory datastructures are just like normal datastructures, but the operations you make on them are backed by secondary storage.

PROBLEM: My server application (assume a dashboard application running on a server) has a binary tree to keep track of the posts posted by the users, as long as my server is up and running; when a user creates new posts, I add them to this tree. Everything was fine until recently, when my dashboard became a hotspot on the internet. Suddenly I found that as the posts increased, the tree grew bigger and bigger and ate up all the RAM on my server. OK, I still have 4GB of RAM, but it's hitting the performance. What should I do, as the person who wrote the program, to handle this? I cannot keep all one million posts in RAM. OK, the OS's virtual memory is taking care of this (virtual memory is doing the swapping), but it's leading to thrashing, as the virtual memory code is not application specific. So what's the solution?

SOLUTION: All applications which need to handle huge amounts of data should have an in-built PERSISTENT RUNTIME in the application code which does application-specific paging. I guess the code below will make things clearer than words. I had this runtime code in all the persistent code which I had written; I guess it's very easy to tweak for your application. Planning to make an opensource project which does better caching etc.... Get in touch with me, I'll sell it if you want to buy. The following is the interface:
```c
#ifndef __Persistence_H__
#define __Persistence_H__

#include <stdio.h>
#include <stdbool.h>

typedef int   PAGENUMBER;
typedef void *PAGE;

typedef enum PersistentError {
    PSUCCESS = 1   /* ... further error codes elided ... */
} PERROR;

/**
 * The caching structure.
 * This helps in doing the in-memory lookup.
 **/
typedef struct PersistCache {
    /* TODO: someone can extend this interface */
    bool (*pageExists)(PAGENUMBER);
} PCache;

/**** The core persistence engine ****/
typedef struct Persistence {
    int     PageSize;
    FILE   *fptr;
    PCache *PageCache;
    /* the core routines */
    PERROR     (*startPersistence)(const char *);
    PAGENUMBER (*getNewPageNumber)(void);
    PAGE       (*getPage)(PAGENUMBER);
    PERROR     (*writePage)(PAGENUMBER, PAGE);
    PERROR     (*defragmentPersistence)(void);
} PersistEngine;

#endif /* __Persistence_H__ */
```