Saturday, December 18, 2004

L-TREE: a new datastructure for efficient hierarchy representation in a database...

Hey, don't say you guys have never come across an LDAP server... hmm... ok, it's the directory within your organization. Every organization is hierarchical, in the sense that a senior manager has several managers reporting to him, a manager has developers under him, and so on, so all this data is hierarchical. So if I gave you this problem and asked you to build a system that answers queries like 'give me all the employees under employee X', how would you approach it?
One quick solution is to create a table in the database:
CREATE TABLE ORG_DIR (EMP_ID INT NOT NULL, EMP_NAME VARCHAR(10), SUP_ID INT);
To answer that query you might hit the database with something like:
SELECT T1.EMP_NAME
FROM ORG_DIR T1, ORG_DIR T2
WHERE T1.SUP_ID = T2.EMP_ID AND T2.EMP_ID = &INPUT
Finally you have a workaround... unfortunately it only gives you the direct reports; to get everyone in the subtree you have to repeat the self-join once per level, and this kind of workaround turns out to be very inefficient for storing and retrieving structured hierarchical data. Hmm... I know you are already thinking about something, right??? XML :)) Yes, it's XML. So how can I store XML in my database and still answer my queries on the hierarchical data? My idea is to make HIERARCHY a datatype in itself, rather than something bolted on. That is, just as we have primitive datatype support within a database (every column is of some primitive datatype supported by the database), the idea is: can we make HIERARCHY a datatype? Hey, wait a sec, I just figured out I'm not the only one who thought about it... yes, Oleg has already added a great contrib module for PostgreSQL. Oleg made it generic, so you can persist any label-based HIERARCHY. Now I have a good idea for extending L-TREE to support XML. It's really great to have such an optimized HIERARCHY representation for structured data; people have been talking about indexing and datastructures for XML databases, but this seems to me a really good contribution from Oleg.
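By the way, here is a tiny sketch of why the plain parent-pointer table is painful (my own illustration, nothing to do with L-TREE's internals): 'all employees under X' is inherently recursive, one self-join per level if you stay in plain SQL, or a walk like the one below once the rows are pulled into memory. A hierarchy-aware type like L-TREE can answer the same question with a single indexed query.

#include <iostream>
#include <map>
#include <vector>

typedef std::map<int, std::vector<int> > Reports;   // SUP_ID -> direct report EMP_IDs

void allUnder(const Reports &r, int boss, std::vector<int> &out) {
    Reports::const_iterator it = r.find(boss);
    if (it == r.end()) return;
    for (size_t i = 0; i < it->second.size(); ++i) {
        out.push_back(it->second[i]);
        allUnder(r, it->second[i], out);          // recurse one level deeper
    }
}

int main() {
    Reports r;
    r[1].push_back(2); r[1].push_back(3);         // employees 2 and 3 report to 1
    r[2].push_back(4);                            // employee 4 reports to 2
    std::vector<int> everyone;
    allUnder(r, 1, everyone);                     // "all employees under employee 1"
    for (size_t i = 0; i < everyone.size(); ++i)
        std::cout << everyone[i] << " ";          // prints: 2 4 3
    std::cout << std::endl;
    return 0;
}

Vamsi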

Tuesday, December 14, 2004

A useful Unix command for recursively finding all files that contain a string

peemt4:vamsik:(vamsi_aix_ns_port):/vobs/ETG_Repository/src>find . -name "*.c" -exec grep -i "Your Search String" \{\} \; -print
Sounds simple, but I thought it would be useful...

Saturday, December 04, 2004

Moving from STL to GTL


[Image: GTL, originally uploaded by vamsi.]
Most developers building systems sooner or later get stuck with the problem of implementing graph algorithms. Graph algorithms are really fundamental and are applied extensively... take the following examples:

1. Circuit Partitioning: This is the stage where the designer has come up with a netlist and technology mapping has been done. Now we want to partition the circuit so that highly connected blocks stay together. And when I talk about a netlist... there it goes, the HYPERGRAPH, the most fundamental netlist representation. To build systems which process these hypergraphs (netlists) you need to write code in a very generic manner, because the hypergraph produced by the partitioning algorithms will then be used by the placement algorithms, and it goes on and on... Hmmm, now I recall the network stack, where a pointer to the packet is passed across several layers; it is done that way to avoid the redundancy of copying across the stack (the process stack, not the network stack), so a pointer to some heap space is passed around. Similarly in EDA tools, where the netlist is taken through several stages (verification included) and finally realized into physical geometry, the developers, especially EDA tool developers, need generic implementations like STL. It's a really great TEMPLATE LIBRARY that lets you do your tasks quickly and modify only the appropriate parts of the algorithms... but STL has no support for hypergraphs, I mean not even graphs; you need to create your own graph container on top of the existing framework. So, to avoid this, we now have GTL, a great library. I came across it while helping a friend implement the KLM circuit partitioning algorithm. I feel it would be really helpful for many other computer science domains apart from CAD. A hand-rolled STL sketch follows below to show what GTL saves you from.
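A small sketch of the point above (my own illustration, not GTL code): with plain STL a netlist/hypergraph has to be hand-rolled, e.g. as a vector of nets, each net listing the cells it connects, plus a reverse map from cell to nets. GTL packages this kind of graph container for you; here the cut-size routine only illustrates the quantity a partitioning algorithm tries to minimize.

#include <iostream>
#include <map>
#include <vector>

struct Hypergraph {
    std::vector<std::vector<int> > nets;              // net -> the cells it connects
    std::map<int, std::vector<size_t> > netsOfCell;   // cell -> nets touching it

    void addNet(const std::vector<int> &cells) {
        nets.push_back(cells);
        for (size_t i = 0; i < cells.size(); ++i)
            netsOfCell[cells[i]].push_back(nets.size() - 1);
    }
    // A net is "cut" if it spans both sides of a two-way partition; the number
    // of cut nets is what a circuit-partitioning algorithm tries to minimize.
    int cutSize(const std::map<int, int> &side) const {
        int cut = 0;
        for (size_t n = 0; n < nets.size(); ++n) {
            bool left = false, right = false;
            for (size_t i = 0; i < nets[n].size(); ++i)
                (side.find(nets[n][i])->second == 0 ? left : right) = true;
            if (left && right) ++cut;
        }
        return cut;
    }
};

int main() {
    Hypergraph h;
    std::vector<int> n1, n2;
    n1.push_back(0); n1.push_back(1); n1.push_back(2);   // one net on cells 0,1,2
    n2.push_back(2); n2.push_back(3);                    // another on cells 2,3
    h.addNet(n1);
    h.addNet(n2);
    std::map<int, int> side;                             // a trial 2-way partition
    side[0] = 0; side[1] = 0; side[2] = 1; side[3] = 1;
    std::cout << "cut size: " << h.cutSize(side) << std::endl;   // prints 1
    return 0;
}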

Cheers
Vamsi

Monday, November 22, 2004

Trying to start a new life (Genetic Algorithms)... but those old memories (Deterministic Algorithms) still make me feel emotional.........

I joined JGAP to contribute to the open-source community. Genetic algorithms (GAs) are search algorithms that work via the process of natural selection. They begin with a sample set of potential solutions which then evolves toward a set of more optimal solutions. Within the sample set, solutions that are poor tend to die out while better solutions mate and propagate their advantageous traits, thus introducing more solutions into the set that boast greater potential (the total set size remains constant; for each new solution added, an old one is removed). A little random mutation helps guarantee that a set won't stagnate and simply fill up with numerous copies of the same solution. In general, genetic algorithms tend to work better than traditional optimization algorithms because they're less likely to be led astray by local optima. This is because they don't make use of single-point transition rules to move from one single instance in the solution space to another. Instead, GAs take advantage of an entire set of solutions spread throughout the solution space, all of which are experimenting upon many potential optima. However, in order for genetic algorithms to work effectively, a few criteria must be met:
* It must be relatively easy to evaluate how "good" a potential solution is relative to other potential solutions.
* It must be possible to break a potential solution into discrete parts that can vary independently. These parts become the "genes" in the genetic algorithm.
* Finally, genetic algorithms are best suited for situations where a "good" answer will suffice, even if it's not the absolute best answer.
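To make the loop concrete, here is a toy sketch in C++ (my own illustration, not JGAP code, which is Java): it evolves 8-bit strings toward all ones, fitness being the number of one bits. The population size stays constant, tournament selection and crossover propagate good genes, and a little mutation keeps the set from stagnating.

#include <cstdlib>
#include <ctime>
#include <iostream>
#include <vector>

const int GENES = 8, POP = 20, GENERATIONS = 50;

int fitness(unsigned g) {                        // how "good" a chromosome is: count of 1-bits
    int ones = 0;
    for (int i = 0; i < GENES; ++i) ones += (g >> i) & 1u;
    return ones;
}

unsigned pick(const std::vector<unsigned> &pop) {    // 2-way tournament selection
    unsigned a = pop[std::rand() % POP], b = pop[std::rand() % POP];
    return fitness(a) >= fitness(b) ? a : b;
}

int main() {
    std::srand((unsigned)std::time(0));
    std::vector<unsigned> pop(POP);
    for (int i = 0; i < POP; ++i)
        pop[i] = std::rand() % (1 << GENES);         // random initial sample set

    for (int gen = 0; gen < GENERATIONS; ++gen) {
        std::vector<unsigned> next;
        while ((int)next.size() < POP) {             // set size stays constant
            unsigned mom = pick(pop), dad = pick(pop);
            int cut = std::rand() % GENES;           // one-point crossover
            unsigned mask = (1u << cut) - 1;
            unsigned child = (mom & mask) | (dad & ~mask);
            if (std::rand() % 10 == 0)               // a little random mutation
                child ^= 1u << (std::rand() % GENES);
            next.push_back(child);                   // poor solutions simply die out
        }
        pop.swap(next);
    }

    int best = 0;
    for (int i = 0; i < POP; ++i)
        if (fitness(pop[i]) > best) best = fitness(pop[i]);
    std::cout << "best fitness: " << best << " / " << GENES << std::endl;
    return 0;
}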

Tuesday, November 16, 2004

Never count on anyone except yourself... (Don't trust the runtime; it will never behave ideally for your programs)

Never think that the code you have written is platform independent, even though you write it according to POSIX standards. So this is what programs teach us in life: never trust anything which is not written by you. YOU CANNOT COUNT ON ANYONE EXCEPT YOURSELF....
---------------------------------------
An echo fades into the night, an eerie mournful sound. A shooting star disappears from sight, and I crumble to the ground. There is no life within this garden; my sobs are the only sound. I have poisoned the honeyed fountain where your love could be found. Dazed, I stare at the stars above, my grieving howls fill the night! Unintended betrayal of love has hidden you from my sight. I remember how it used to be when we shared our fears and delights. You are a treasured friend to me. How can I make things right? Feeling afraid, cold and lonely, I long to tell you how I feel, but you don't want to hear me. The pain for you is much too real. Should I back away and build a wall and block away how I feel? Or, should I give you a call? We both need some time to heal. An echo fades into the night as our friendship disappears. How do I know what is right? How can I ease my fears? If I do call you again, would the old wounds reappear? I can't stand to cause you pain. Hurting you again is my worst fear!

Friday, November 12, 2004

What you feel when the __L_I_F_E__ daemon receives signal 11...

Pain... Tension... Fatigue... Depression... Anger, Aggression, Frustration. All these unwanted sensations - Burning, hurting, tearing. My heart alone, cold and fearing. Why won't you let me sleep, let me rest, Let me forget To eradicate, eliminate, destroy all my regrets? These memories inside, swirling, twirling, unwilling to reside in the corner of my mind. Repeating, resisting, insisting - Refusing to be denied its recognition Of its position in my Frustration, Confusion, Delusion. Ah, to close my eyes and let time fly by, Because there's so much to gain By forgetting these dreams driving me insane. Unfocused, unclear, out of control, My world spinning, spinning, spinning, My sanity flying through the door. My reason, my logic, oh, it's tragic, Like fine sands running through my hands, I'm losing my mind.

Tuesday, November 09, 2004

Something about research.

1. Research needs more brains than hands (unfortunately we were given two hands but only one brain); before counting how many tools you can use, ponder how effectively you can think.
2. Writing occupies a critical role, since "writing is nature's way of letting you know how sloppy your thinking is".
3. Mathematics is the key word, since it is "the science of effective reasoning"; or, "mathematics is nature's way of letting you know how sloppy your writing is".

Saturday, November 06, 2004

External Memory Geometric Datastructures

Many massive-dataset applications involve geometric data, or data that can be interpreted geometrically: points, lines, polygons. The data need to be stored in data structures on external storage media such that on-line queries can be answered I/O-efficiently, and the data often need to be maintained during dynamic updates. Data sets in large applications are often too massive to fit completely inside the computer's internal memory. The resulting input/output communication (or I/O) between fast internal memory and slower external memory (such as disks) can be a major performance bottleneck. During the last decade a major body of research has been devoted to the development of efficient external memory algorithms, where the goal is to exploit locality in order to reduce the I/O costs. Examples:
* Phone: wireless tracking
* Consumer: buying patterns (supermarket checkout)
* Geography: NASA satellites generate 1.2 TB per day
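To see why counting I/Os (rather than CPU steps) is the right measure here, a small back-of-the-envelope sketch (my own numbers, not from any particular paper): a pointer-per-node binary tree over a billion keys touches roughly 30 disk blocks per search, while a B-tree that packs around 256 keys per block touches about 4, because every block it reads is fully used.

#include <cmath>
#include <cstdio>

int main() {
    double N = 1e9;                                    // one billion keys on disk
    double fanout = 256.0;                             // keys per disk block (assumed)
    double binaryTreeIOs = std::log(N) / std::log(2.0);       // ~30 block reads
    double bTreeIOs      = std::log(N) / std::log(fanout);    // ~3.7 block reads
    std::printf("binary tree: %.1f I/Os per search, B-tree: %.1f I/Os per search\n",
                binaryTreeIOs, bTreeIOs);
    return 0;
}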

Tuesday, November 02, 2004

How to represent INFINITY in your programs

No doubt everyone has come across this in their programming experience: how to represent the theoretical infinity in a program.
Example:
-----------
The most candid use of INFINITY is in graphs, to represent two nodes which are not connected, i.e. if we don't have any edge between two nodes N1 and N2 then we represent the cost of that edge as INFINITY. In an implementation of Dijkstra's algorithm on a graph we generally come across a relaxation condition of the form 'CURRENT_COST_OF_NODE > COST_OF_PREDECESSOR + WEIGHT_OF_EDGE', and if it holds we update the path. Theoretically INFINITY + 1 = INFINITY. So if someone just does a #define INFINITY __SOME_LARGE_NUMBER, then INFINITY + 1 will not be INFINITY (in fact, if INFINITY is the largest representable integer, INFINITY + WEIGHT_OF_EDGE simply overflows), and conditions like the one illustrated above become buggy. The following representation of infinity is shrewder than the ordinary #defines:
/*
* The following coordinate, INFINITY, is used to represent a
* tile location outside of the tile plane.
*
* It must be possible to represent INFINITY+1 as well as
* INFINITY.
*
* Also, because locations involving INFINITY may be transformed,
* it is desirable that additions and subtractions of small integers
* from either INFINITY or MINFINITY not cause overflow.
*
* Consequently, we define INFINITY to be the largest integer
* representable in wordsize - 5 bits.
*/

#undef INFINITY
#define	INFINITY	((1 << (8*sizeof (int) - 6)) - 4)
#define	MINFINITY	(-INFINITY)
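As a quick sanity check (my own snippet, not from the original source of the macro), here is how that definition keeps a Dijkstra-style relaxation safe: adding a small edge weight to INFINITY stays representable instead of overflowing, so an unreachable node is never wrongly "improved".

#include <cstdio>

#undef INFINITY
#define INFINITY   ((1 << (8*sizeof (int) - 6)) - 4)   /* same definition as above */

int main() {
    int dist_u = INFINITY;          /* node u is (still) unreachable              */
    int weight_uv = 1000;           /* a small edge weight                         */
    int dist_v = INFINITY;          /* node v is unreachable too                   */

    /* With INFINITY defined as the largest int, this sum would overflow, wrap
     * negative, and wrongly "improve" dist_v; here it simply stays huge.         */
    if (dist_v > dist_u + weight_uv)
        dist_v = dist_u + weight_uv;

    std::printf("dist_v = %d (still effectively infinite)\n", dist_v);
    return 0;
}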

Monday, October 18, 2004

Moving towards Scalable, Distributed and Highly Available Data Structures

Multicomputers:
---------------

Commodity computers, interconnected through high-speed networks, are becoming the basic hardware. Such configurations, called multicomputers, clusters, superservers..., include dozens, often hundreds or even thousands of clients and servers. Their cumulative resources are impressive: dozens of GBytes of distributed RAM and TBytes of disks, accessible for GMips of highly parallel and distributed processing. They offer a potentially unbeatable performance and price/performance ratio, opening up new perspectives for applications. Major hardware and software makers present multicomputers as the next step of their business strategy. This was, among others, the subject of the Microsoft Scalability Day organized in May 1997 for US professionals. The technology is also felt to be the next step for the Internet, which should evolve from the data-providing utility of today into a computing utility. One also foresees very large multicomputers coming as new tools for academia, e.g., a 10,000-node multicomputer for Stanford University in 5-10 years. Finally, the domain was recently declared strategic to US computer science supremacy in the 21st century by the US Government, and became the object of the nationwide research program PACI.

Multicomputers need new system software, fully taking advantage of the distributed RAM and of parallel processing on multiple CPUs. One especially needs new data structures, allowing files to span multiple servers and to scale to as many sites as needed. Such files should reside for processing in the distributed RAM, providing access performance inaccessible to disk files. They should be accessible to multiple autonomous clients, including mobile ones. Finally, they should not require any centralized data access computation or directories, to avoid hot-spots.

One solution is a new class of data structures called Scalable Distributed Data Structures (SDDSs). First proposed in 1992-1993, SDDSs gave rise to an important research effort, materialized by several algorithms, papers and implementations. It was shown that SDDS files are able to scale to thousands of sites, and to terabytes in distributed RAM, with constant access performance and search times under a millisecond. Multi-key searches requiring an hour or so in a traditional file, e.g., a k-d file, may succeed in less than a second in an SDDS file. All these properties should be of prime importance for applications, especially in the DBMS design arena. They open new perspectives for VLDB design, for multimedia databases, for real-time and high-availability databases, for decision support systems, and for high performance computing in general.

Wednesday, October 06, 2004

The need for Incremental Datastructures... from transistors to transactions

Yep, that sounds really great. As a developer, did you ever feel the need for these things? Currently I am into this interesting area. I talked earlier about making your datastructures persistent (look at my previous blog); persistent datastructures can replace the need for a database while still giving you concurrency and consistency of the data in these structures. So now comes the challenge: Incremental Datastructures. These are most essential when the data in the structures is really a behemoth. Take the following examples from totally different domains.

Geometric synthesis: Physical Design
------------------------------------
As we know, in many of the physical design tools (software) we need to represent all the geometric information of the circuit; I mean, the fabricator just needs the geometric information about the polygons in each layer (p-type, n-type, metal etc.). So in the software tools (physical design layout editors) that process this information, the complete information has to be kept in some datastructure. I'll give an example of my favourite datastructure, the CORNER STITCH, created by Ousterhout. Unfortunately the corner stitch has the following drawbacks: 1. It's a totally in-memory datastructure. 2. It's not an incremental datastructure. So to handle incremental changes (custom IC design) we need an incremental corner stitch. The question is: can we make an incremental and external-memory corner stitch?

Business Intelligence
-----------------------
Let's move into a totally different domain which deals with massive amounts of data; let's move from transistors to transactions. The massive data I am talking about here is the data you dump into your data warehouse. Most of the analytical queries are based on MVs (Materialized Views). These MVs hold massive amounts of data, really massive, and they are generally refreshed every 1 to 1.5 months to analyze the transactions a particular enterprise is processing. Yes, I am talking about OLAP and not OLTP. Generally the refresh time of an OLAP system is very large for an enterprise; it typically runs to several days before the managers can analyze the reports on these MVs. Now the challenge is: can we build a Real-Time OLAP system which does not take this much time to refresh? Yes, yes, you got it right, I am talking about incremental refresh... and incremental refresh needs an Incremental Datastructure (a toy sketch of the idea follows below). Recently I have been working with Oracle, on a Business Intelligence product. Oracle is the first company to recognise the need for incremental algorithms in the area of business intelligence. They call it Daily Business Intelligence (DBI); it is the implementation of such an incremental OLAP system, with a lot of incremental datastructures in it. Read more about DBI. So, do you think your datastructures need an incremental tinge????
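Here is the toy sketch of the incremental-refresh idea promised above (my own illustration; this is not how Oracle DBI is actually implemented): the "materialized view" keeps running totals per key and folds in each day's delta of transactions, instead of recomputing over the full history.

#include <iostream>
#include <map>
#include <string>
#include <vector>

struct Txn { std::string product; double amount; };      // one fact-table row

class SalesByProduct {                                    // a tiny "materialized view"
    std::map<std::string, double> total;
public:
    // Incremental refresh: fold in only the new batch (the delta),
    // never recompute over the whole transaction history.
    void applyDelta(const std::vector<Txn> &delta) {
        for (size_t i = 0; i < delta.size(); ++i)
            total[delta[i].product] += delta[i].amount;
    }
    void report() const {
        std::map<std::string, double>::const_iterator it;
        for (it = total.begin(); it != total.end(); ++it)
            std::cout << it->first << ": " << it->second << std::endl;
    }
};

int main() {
    SalesByProduct mv;

    std::vector<Txn> monday(2);
    monday[0].product = "cpu"; monday[0].amount = 100.0;
    monday[1].product = "ram"; monday[1].amount = 40.0;
    mv.applyDelta(monday);                                // refresh with Monday's delta only

    std::vector<Txn> tuesday(1);
    tuesday[0].product = "cpu"; tuesday[0].amount = 60.0;
    mv.applyDelta(tuesday);                               // Tuesday's delta

    mv.report();                                          // cpu: 160, ram: 40
    return 0;
}

Vamsi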

Friday, September 17, 2004

It's never too late to get it back.......... (This is what I feel today....)

Hey guys and bi%$#@!, no one is a master, and especially if you are aspiring to become one, never ever feel that you already are. I know it really takes a lot of nerve to become one: 1. have that same yearning passion for creating complex things, forever; 2. always want to be THE ONE. Yes, I'm an amateur hacker... I always wanted to be like http://catb.org/~esr. If you are also one with the same attitude, just read on. In this post I just want to tell you about the things an aspiring hacker needs to take care of during his career: never feel that a failure has taken away the things I mentioned above. A failure may not always be technical sloppiness (at least for me); you can feel that you failed in an entirely different situation too. Just remember that it's never too late to get it back (ITS NEVER TOO LATE TO GET IT BACK).

Sunday, July 18, 2004

Have you ever wondered how to create persistent datastructures...?

Hey, recently I came across a problem: how to implement external memory datastructures? External memory datastructures are just like normal datastructures, except that the data you put into them is kept in secondary memory rather than entirely in RAM.

PROBLEM: My server application (assume a dashboard application running on a server) has a binary tree to keep track of the posts posted by the users for as long as my server is up and running; when a user creates new posts, I add them to this tree. Everything was fine until recently, when my dashboard became a hotspot on the internet. Suddenly I found that as the posts increased, the tree grew bigger and bigger and ate up all the RAM on my server. Ok, I still have 4GB of RAM, but it's hitting the performance. What should I, as the person who wrote the program, do to handle this? I cannot keep all the 1 million posts in RAM. Ok, the OS's virtual memory is taking care of this (virtual memory does the swapping), but it's leading to thrashing, since the virtual memory code is not application specific. So what's the solution?

SOLUTION: All applications which need to handle huge amounts of data should have a built-in PERSISTENT RUNTIME in the application code which does the application-specific paging. I guess the code below will make things clearer than words. I had this RUNTIME code in all the persistent code which I have written; I guess it's very easy to tweak for your application. Planning to make an opensource project which does better caching etc... Get in touch with me, I'll sell it if you want to buy. The following is the interface:
#ifndef __Persistence_H__
#define __Persistence_H__
#include <stdio.h>      /* FILE */
#include <stdbool.h>    /* bool */

typedef int   PAGENUMBER;   /* logical page id inside the backing file */
typedef void* PAGE;         /* one page worth of raw bytes             */
typedef enum PersistentError {
        PSUCCESS          =  1,
        PREADFAILED       = -2,
        PWRITEFAILED      = -3,
        PPAGESIZEEXCEEDED = -4
} PERROR;
/**
 * The caching structure.
 * This helps in doing the in-memory lookup.
 */
typedef struct PersistCache{
        /* TODO: someone can extend this interface */
        bool (*pageExists)(PAGENUMBER);
}PCache;
/**** The core Persistence Engine: the function pointers are filled in by
 **** the application with its own file-backed implementations.          **/
typedef struct Persistence{
        int     PageSize;
        FILE   *fptr;
        PCache *PageCache;
        /* The core routines */
        PERROR     (*startPersistence)(const char*);
        PAGENUMBER (*getNewPageNumber)(void);
        PAGE       (*getPage)(PAGENUMBER);
        PERROR     (*writePage)(PAGENUMBER, PAGE);
        PERROR     (*defragmentPersistence)(void);
}PersistEngine;
#endif
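And a minimal usage sketch (mine, not from the interface above). It assumes the header is saved as "Persistence.h"; everything named file_* below is a hypothetical stub showing how the function-pointer slots might be wired up, and a node of some persistent tree then travels through writePage/getPage instead of living permanently in RAM.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "Persistence.h"

#define DEMO_PAGE_SIZE 4096

static FILE      *dbfile    = NULL;
static PAGENUMBER next_page = 0;

static PERROR file_start(const char *path) {
    dbfile = fopen(path, "w+b");
    return dbfile ? PSUCCESS : PWRITEFAILED;
}
static PAGENUMBER file_newPageNumber(void) { return next_page++; }
static PERROR file_writePage(PAGENUMBER n, PAGE p) {
    if (fseek(dbfile, (long)n * DEMO_PAGE_SIZE, SEEK_SET) != 0) return PWRITEFAILED;
    return fwrite(p, DEMO_PAGE_SIZE, 1, dbfile) == 1 ? PSUCCESS : PWRITEFAILED;
}
static PAGE file_getPage(PAGENUMBER n) {
    void *buf = malloc(DEMO_PAGE_SIZE);
    if (buf == NULL) return NULL;
    if (fseek(dbfile, (long)n * DEMO_PAGE_SIZE, SEEK_SET) != 0 ||
        fread(buf, DEMO_PAGE_SIZE, 1, dbfile) != 1) { free(buf); return NULL; }
    return buf;
}

int main(void) {
    PersistEngine engine;
    memset(&engine, 0, sizeof engine);
    engine.PageSize         = DEMO_PAGE_SIZE;
    engine.startPersistence = file_start;
    engine.getNewPageNumber = file_newPageNumber;
    engine.getPage          = file_getPage;
    engine.writePage        = file_writePage;

    if (engine.startPersistence("posts.db") != PSUCCESS) return 1;
    PAGENUMBER root = engine.getNewPageNumber();
    char page[DEMO_PAGE_SIZE] = "root node payload";
    engine.writePage(root, page);                 /* node flushed to disk       */
    char *back = (char *)engine.getPage(root);    /* and paged back in on demand */
    if (back) { printf("read back: %s\n", back); free(back); }
    return 0;
}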

Thursday, May 06, 2004

The need for abstraction, and the most commonly used practical design patterns in building key software systems

1. Introduction

In this small article I will try to explain briefly what design patterns are, and the most practically used design patterns (patterns I have used) in building systems with the Java programming language. The article is divided into two parts: the first part introduces the subtle concepts of abstraction in contemporary object-oriented programming, and also how abstraction was achieved in procedural programming in earlier days; the second part tries to illustrate the most commonly used design patterns in building systems, and shows how these design patterns address the issues raised in the first part.

2. Why do we need abstraction within our programs?

Is abstraction within programs a new buzzword introduced by the contemporary object-oriented programming methodology? The answer is NO. Abstraction has been used, implicitly, in programming from the earliest days. We can illustrate this with a quote from Tanenbaum: "Separate the policy from mechanism", which Andrew S. Tanenbaum uses in his book on operating system design concepts when he explains how to implement the security and scheduling components within an operating system. Tanenbaum says that key software systems like operating systems, which should be adaptable to various customizations (security/scheduling) at run time, should be programmed in such a way that the implementation is not aware of the runtime customization; this allows the users to have multiple security policies which all use the same underlying mechanism implemented within the operating system. Tanenbaum is implicitly arguing for abstraction in building key software systems.

So let me ask the question: why do you think these people want to use abstraction? The answer is "REUSE, DON'T RE-INVENT THE WHEEL" (yep, I am shouting). In the above scenario you are reusing the operating system's security mechanism whenever you add a new security policy to the system. Let me further make you realize the need for abstraction in building systems. Suppose as a programmer you have been asked to build an 'HTTP Server', and later, after some time, you are asked to build an 'SMTP Server'. So what do you do? Start building the 'SMTP Server' from scratch like a dumb programmer, or, as an averagely intelligent programmer, reuse some of the stuff from your earlier code? I think you would want to stick with the latter. Now how do you proceed? Would you copy the code from the earlier program and try to change the protocol-handling stuff to realize the 'SMTP Server'? Yes, this may seem a workaround for small programs, but to an intelligent programmer it is a very odd way of reusing things, so you start looking for a way to achieve this in an elegant manner. Ok, you are not the first programmer to ask this question; great programmers have been asking these kinds of questions of themselves for many years, and those questions revolutionized how we write programs to build complex systems. If you wanted to reuse the existing 'HTTP Server' logic, then you should have written the 'HTTP Server' in such a way that it can be reused, i.e. you should change the way you write code and analyze the system. You should try to separate the generic logic within the program from the specific logic that accomplishes a particular task.
In this case of the 'HTTP Server' and the 'SMTP Server', the logic which handles the clients' requests ( while(Wait_For_Client){ fork_a_thread_to_handle_the_request; OR Use_thread_from_thread_pool; dispatch_request_to_worker_thread; } ) is common to both servers. In order to reuse that logic, we must make sure it is not aware of which server it is part of (i.e. the logic should be unaware of whether it is processing HTTP or SMTP). So while writing the server code we should separate the application-specific stuff from the generic stuff; in contemporary object-oriented methodology we call this separating the interface from the implementation (see the small sketch below). In the next post I will expand on this and on design patterns. Let me know if this is motivating enough to learn about the intricacies of abstraction.
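Since the Java examples are promised for the next part, here is just a small sketch of the idea in C++ (my own illustration, with real sockets and threads replaced by a toy request string to keep it self-contained): the generic dispatch loop is written against an abstract ProtocolHandler, so the same mechanism serves either protocol.

#include <iostream>
#include <string>
#include <vector>

// The interface: the only thing the reusable server loop depends on.
class ProtocolHandler {
public:
    virtual ~ProtocolHandler() {}
    virtual std::string handleRequest(const std::string &request) = 0;
};

// Protocol-specific implementations: the only part that differs between servers.
class HttpHandler : public ProtocolHandler {
public:
    std::string handleRequest(const std::string &request) {
        return "HTTP/1.0 200 OK  (handled: " + request + ")";
    }
};
class SmtpHandler : public ProtocolHandler {
public:
    std::string handleRequest(const std::string &request) {
        return "250 OK  (handled: " + request + ")";
    }
};

// The reusable "mechanism": completely unaware of which protocol it is serving.
// (Stands in for the while(Wait_For_Client){ dispatch_to_worker; } loop above.)
void serve(ProtocolHandler &handler, const std::vector<std::string> &requests) {
    for (size_t i = 0; i < requests.size(); ++i)
        std::cout << handler.handleRequest(requests[i]) << std::endl;
}

int main() {
    std::vector<std::string> httpReqs(1, std::string("GET /index.html"));
    std::vector<std::string> smtpReqs(1, std::string("MAIL FROM:<vamsi@example.com>"));

    HttpHandler http;
    SmtpHandler smtp;
    serve(http, httpReqs);       // the same serve() ...
    serve(smtp, smtpReqs);       // ... with a different policy plugged in
    return 0;
}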