Q: text mining ( No Answer,   1 Comment )
Question  
Subject: text mining
Category: Computers > Internet
Asked by: saarm-ga
List Price: $10.00
Posted: 29 Jul 2004 14:40 PDT
Expires: 28 Aug 2004 14:40 PDT
Question ID: 380967
Dear Madam/Sir

I am an MSc student in Computer Science at Imperial College London,
currently writing my thesis on content-based PageRank algorithms.

I am trying to set up a TF-IDF classification for web pages, as I need it for
a content-based PageRank algorithm which I am developing.

I am looking for an index of terms (and their frequencies), gathered
from a large collection of web pages (a web graph).
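
To make the request concrete, here is a minimal Python sketch of the kind of
index meant here: per-page term counts plus the number of pages each term
appears in, from which TF-IDF weights follow. The function names, the simple
tokenizer, and the weighting variant (relative term frequency times
log(N / document frequency)) are assumptions chosen for illustration; they are
not taken from the Term Vector Database paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build a term index from a dict mapping page id -> page text.

    Returns (term_counts, doc_freq):
      term_counts: page id -> Counter of raw term counts on that page
      doc_freq:    term -> number of pages that contain the term
    """
    term_counts = {pid: Counter(tokenize(text)) for pid, text in pages.items()}
    doc_freq = Counter()
    for counts in term_counts.values():
        # Each page contributes at most 1 to a term's document frequency.
        doc_freq.update(counts.keys())
    return term_counts, doc_freq


def tfidf(term_counts, doc_freq):
    """Compute TF-IDF weights as (count / page length) * log(N / df)."""
    n_docs = len(term_counts)
    weights = {}
    for pid, counts in term_counts.items():
        total = sum(counts.values())
        weights[pid] = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return weights


# Hypothetical toy collection standing in for a crawl of web pages.
pages = {
    "page1": "web graph ranking of web pages",
    "page2": "text mining of web pages",
    "page3": "term vector database",
}
term_counts, doc_freq = build_index(pages)
weights = tfidf(term_counts, doc_freq)
```

With this variant, a term that occurs on every page gets weight zero, which
is why the document frequencies need to come from a large, representative
collection rather than from the pages being ranked alone.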

Such an index is described in the paper: "The Term Vector Database:
fast access to indexing terms for Web pages", written by Krishna
Bharat, Raymie Stata and Farzin Maghoul.

I would be grateful for any advice that would help me find and
download such an index.

Thanks in advance,

Saar Miron
Answer  
There is no answer at this time.

Comments  
Subject: Re: text mining
From: nandigam-ga on 29 Jul 2004 21:55 PDT
 
Hi,

I was working on this kind of TF-IDF weighting last semester. For our
collection of documents we scraped http://dlib.org/back.html, which is
an excellent source of text and served us well.

Another good source is Google Groups, which has excellent datasets.
The groups can be accessed through the Google API. The API is limited
to 1000 queries per day, but I personally find it a good collection.
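
If you collect data under a daily cap like that, it helps to track the budget
yourself. Below is a rough Python sketch of a per-day query counter. The class
name and the reset-at-local-midnight behaviour are assumptions for
illustration; it does not call any real API, and the actual service may reset
its quota at a different time.

```python
import datetime


class DailyQueryBudget:
    """Counts queries per calendar day and refuses once the limit is hit."""

    def __init__(self, limit=1000):
        self.limit = limit
        self.day = None   # the date the current count belongs to
        self.used = 0     # queries spent on that date

    def try_spend(self, today=None):
        """Return True and count one query if today's budget allows it."""
        today = today or datetime.date.today()
        if today != self.day:
            # A new day has started (assumed reset point), so start over.
            self.day = today
            self.used = 0
        if self.used >= self.limit:
            return False
        self.used += 1
        return True
```

Call try_spend() before each query and stop, or wait until the next day, when
it returns False.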

I hope that helps and gives you a workable solution.

Thanks
Giridhar Nandigam
