Tuesday, April 15, 2008

[TECH] Reverse Engineering and Creating Crawler BOTS.

I have had my share of pleasure reverse engineering the underlying details of SCOPUS. SCOPUS, as you might know, is one of the most popular scholarly databases used for citation searching. The problem I was trying to solve: "Given a research paper X, produce the set of research papers in the same connected component of the CITATION GRAPH". Let me define the CITATION GRAPH CITE(V,E): V = {set of all research papers} and E = {(i,j)}, the set of all directed edges from research paper 'i' to 'j' such that paper 'j' lists paper 'i' in its references. This edge information comes from SCOPUS. However, SCOPUS gives only one level (depth 1) of the connected component, that is, the papers directly linked to X by a citation. My goal is to get all the related papers, where related means they fall in the same connected component of the CITE graph.
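The idea can be sketched as a breadth-first search that keeps asking for depth-1 neighbours until no new papers turn up. This is only a minimal Python sketch under assumptions, not the actual crawler: `neighbors` is a hypothetical stand-in for whatever depth-1 lookup SCOPUS provides, the `EDGES` list is made-up toy data, and edges are followed in both directions so that the result is the connected component in the undirected sense.

```python
from collections import deque

def connected_component(start, neighbors):
    """Collect every paper reachable from `start` by repeated depth-1 lookups.

    `neighbors` is a hypothetical callable: given a paper, it returns the
    papers directly linked to it (those citing it and those it cites).
    """
    seen = {start}
    queue = deque([start])
    while queue:
        paper = queue.popleft()
        for other in neighbors(paper):
            if other not in seen:  # visit each paper only once
                seen.add(other)
                queue.append(other)
    return seen

# Toy data (invented for illustration): edge (i, j) means paper j cites paper i.
EDGES = [("A", "B"), ("B", "C"), ("D", "E")]

def toy_neighbors(paper):
    # Depth-1 lookup over the toy edge list, ignoring edge direction.
    cited_by = [j for i, j in EDGES if i == paper]
    cites = [i for i, j in EDGES if j == paper]
    return cited_by + cites

print(sorted(connected_component("A", toy_neighbors)))  # ['A', 'B', 'C']
```

In a real crawler the `neighbors` function would be the slow part, since each call means at least one request to the database, so the `seen` set matters: it guarantees each paper is looked up only once.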

Reverse engineering these underlying details comes in handy when we want to automate the searches we would otherwise run by hand in the browser.

I'm too tired to explain the details of the program, which I wrote using perl+LWP to create a CRAWLER BOT that gets all the related papers. If you need something similar, the code may help: click here

Unfortunately I don't get enough time to write blog posts, but in the past few weeks I have come across some very interesting technical stuff that I want to write about.
