Subject: text mining
Category: Computers > Internet
Asked by: saarm-ga
List Price: $10.00
Posted: 29 Jul 2004 14:40 PDT
Expires: 28 Aug 2004 14:40 PDT
Question ID: 380967
Dear Madam/Sir,

I am an MSc student in Computer Science at Imperial College London, currently writing my thesis on content-based PageRank algorithms. I am trying to set up a TF-IDF classification for web pages, as I need it for a content-based PageRank algorithm which I am developing. I am looking for an index of terms (and their frequencies), gathered from a large collection of web pages (webgraph). Such an index is described in the paper "The Term Vector Database: fast access to indexing terms for Web pages", written by Krishna Bharat, Raymie Stata and Farzin Maghoul. I would be grateful for any advice that will help me find and download such an index.

Thanks in advance,
Saar Miron
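To make the request concrete, here is a minimal sketch of the kind of term index being asked for: a count of how often each term occurs in the collection and in how many pages it appears. It is not the data structure from the Term Vector Database paper; the function names, the tokenizer, and the three sample pages are assumptions made up for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_term_index(pages):
    """Build two counters from a dict mapping page id -> page text.

    collection_freq: total occurrences of each term across all pages.
    doc_freq: number of pages in which each term occurs at least once.
    """
    collection_freq = Counter()
    doc_freq = Counter()
    for text in pages.values():
        tokens = tokenize(text)
        collection_freq.update(tokens)
        doc_freq.update(set(tokens))  # count each term once per page
    return collection_freq, doc_freq


# Hypothetical sample pages, standing in for a crawled collection.
pages = {
    "page1": "PageRank ranks web pages by links",
    "page2": "Content based ranking of web pages uses terms",
    "page3": "Terms and term frequency describe page content",
}
collection_freq, doc_freq = build_term_index(pages)
```

The document frequencies in `doc_freq` are the part a TF-IDF classifier needs from a large collection; the per-page term counts can be computed locally for each page being classified.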
There is no answer at this time.
Subject: Re: text mining
From: nandigam-ga on 29 Jul 2004 21:55 PDT
Hi,

I was working on this kind of TF-IDF weighting last semester. For our collection of documents we scraped http://dlib.org/back.html, an excellent source of text that served us well. Another good source is Google Groups, which have excellent datasets. The groups can be accessed through the Google API; the API is limited to 1000 queries per day, but I personally find it a good collection. I hope this helps and gives you a solution.

Thanks,
Giridhar Nandigam
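For reference, the TF-IDF weighting mentioned above can be sketched as follows once a collection has been gathered. This uses one common variant (relative term frequency multiplied by log(N / df)); other variants exist, and nothing here is taken from the commenter's actual code. The function names and the sample documents are assumptions for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    weight = (term count / document length) * log(N / document frequency)
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical sample documents, standing in for scraped text.
docs = [
    "digital library research",
    "digital archive of magazines",
    "library of text",
]
vectors = tf_idf(docs)
```

With this variant, a term that occurs in every document gets weight zero, and terms unique to one document get the highest weights, which is why a large and varied collection matters for meaningful document frequencies.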