Sample code in Java for parsing table tags and then searching for a keyword in <td> |
Clarification of Question by naveen82-ga on 14 Nov 2005 20:04 PST
I need to search the tables in a text file (the stored source of a
web page) for the table that is larger than the others, and also for
the table that contains a specific keyword. I also need to count the
keyword's occurrences. Your help would be greatly appreciated.
|
Request for Question Clarification by leapinglizard-ga on 14 Nov 2005 23:49 PST
I don't think it's necessary to fully parse the HTML in order to find
tables containing a keyword. It suffices to look for occurrences of
the keyword between pairs of <table> and </table> tags. I can give you
more detailed assistance if you specify more precisely the
requirements of the task. Furthermore, what do you mean by the "size"
of a table? And what does it mean to "check for the count"?
leapinglizard
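
The tag-pair approach described above could be sketched roughly as follows. This is only an illustration, not code from the thread: the class name TableKeywordScan and the method tablesContaining are made up for the example, and it assumes tables are not nested and that every <table> has a closing </table>.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: finds <table>...</table> spans without building a DOM.
public class TableKeywordScan {
    // One <table ...> ... </table> span, case-insensitive, spanning newlines.
    // Assumption: no nested tables; a nested table would end the match early.
    private static final Pattern TABLE = Pattern.compile(
            "<table\\b[^>]*>.*?</table\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the raw source of every table that contains the keyword. */
    public static List<String> tablesContaining(String html, String keyword) {
        List<String> hits = new ArrayList<>();
        String needle = keyword.toLowerCase();
        Matcher m = TABLE.matcher(html);
        while (m.find()) {
            String table = m.group();
            // Note: this also matches the keyword inside tag attributes.
            if (table.toLowerCase().contains(needle)) {
                hits.add(table);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String html = "<html><body>"
                + "<table><tr><td>apples</td></tr></table>"
                + "<table><tr><td>java parser</td></tr></table>"
                + "</body></html>";
        for (String table : tablesContaining(html, "java")) {
            System.out.println(table);
        }
    }
}
```

Because the match runs over the raw table source, a keyword that happens to appear in an href or other attribute is counted as a hit too; whether that is wanted depends on the requirements still being clarified here.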
|
Clarification of Question by naveen82-ga on 15 Nov 2005 03:18 PST
Since my interest is only in the tables, I need not look at the other
tags of the page. I need to search the text in the <td> cells of the
tables. A typical file of mine contains <a href> tags and also text
(like any general web page). I need to find the table that has the
greatest number of <td>s in it, and also count the repetitions of a
particular word (a search string) in the text. Based on these
observations I need to return only one table. Third-party software
such as SourceForge HTML, JDOM, or any kind of DOM can be used for this.
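
As a rough illustration of the two measurements described here (number of <td> cells, and repetitions of the search string in the cell text), a sketch could look like the following. The class name TableStats and its methods are invented for the example. It assumes each cell has an explicit </td> and that tables are not nested, and it uses plain regular expressions instead of any of the libraries named above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: measures a single table's source text.
public class TableStats {
    // Captures the content of one <td ...> ... </td> cell.
    // Assumption: every cell is closed with an explicit </td>.
    private static final Pattern CELL = Pattern.compile(
            "<td\\b[^>]*>(.*?)</td\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    // Any tag, used to strip markup such as <a href=...> from cell text.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    /** Counts the <td> cells in the given table source. */
    public static int countCells(String tableHtml) {
        Matcher m = CELL.matcher(tableHtml);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    /** Counts case-insensitive keyword occurrences in the visible cell text. */
    public static int countKeyword(String tableHtml, String keyword) {
        Pattern word = Pattern.compile(Pattern.quote(keyword), Pattern.CASE_INSENSITIVE);
        Matcher cells = CELL.matcher(tableHtml);
        int total = 0;
        while (cells.find()) {
            // Replace tags with spaces so attribute values are not counted.
            String text = TAG.matcher(cells.group(1)).replaceAll(" ");
            Matcher w = word.matcher(text);
            while (w.find()) {
                total++;
            }
        }
        return total;
    }
}
```

Stripping the tags before counting means a link target such as java.html does not inflate the count; only the text a reader would see in the cell is searched.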
|
Request for Question Clarification by leapinglizard-ga on 15 Nov 2005 06:36 PST
I can give you sample code for this, but I'll need to know a few more details.
How do you want to identify the table with the greatest number of <td>
pairs? Should the program output the full text of the table, or just
the beginning and ending indices in the text file?
How do you want to handle ties? Would it do to pick the first table
that has the maximum number of <td> pairs?
Finally, I want to make sure I understand the keyword task. Do you
want to count the occurrences of a given keyword only in the table
that has the maximum number of <td> pairs?
I'll be able to help you in short order once we get these matters resolved.
leapinglizard
|
Clarification of Question by naveen82-ga on 15 Nov 2005 07:34 PST
What I need is to mine a web page (a search engine result). I have
already handled storing the web page. A web page usually contains
many tables. My result is stored in the table that has the maximum
size (a general observation): as we see on a search engine results
page, it has the greatest number of <tr>s and <td>s in it. The search
query also appears in the <td>s of that table (IT ALSO HAS HREF TAGS
IN IT). A search string is generally repeated most often in the result
table (another general observation). Hence I need to extract the table
that has the most <tr>s, also taking the search string into account.
My output should present the whole table as an HTML page again. So I
need that table (if two tables match exactly, both are to be
presented) in an HTML file. As far as I know, a DOM would be
sufficient for this, but I am very new to using a DOM. Any help
regarding this would be greatly appreciated.
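
One way the selection described here might be sketched is shown below. The ranking rule is an assumption, since the thread does not pin it down: tables are compared by <tr> count first, then by how often the search string occurs in the table source (href attributes included), and all tables tied on both numbers are kept. The class name BestTablePicker and the command-line arguments are made up for the example, and nested tables are not handled.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: picks the "result" table(s) from a stored page.
public class BestTablePicker {
    // Assumption: tables are not nested.
    private static final Pattern TABLE = Pattern.compile(
            "<table\\b[^>]*>.*?</table\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    private static final Pattern ROW = Pattern.compile("<tr\\b", Pattern.CASE_INSENSITIVE);

    private static int count(Pattern p, String s) {
        Matcher m = p.matcher(s);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    /** Returns the table(s) with the most rows; ties broken by query hits. */
    public static List<String> pick(String html, String query) {
        Pattern word = Pattern.compile(Pattern.quote(query), Pattern.CASE_INSENSITIVE);
        List<String> best = new ArrayList<>();
        int bestRows = -1;
        int bestHits = -1;
        Matcher m = TABLE.matcher(html);
        while (m.find()) {
            String table = m.group();
            int rows = count(ROW, table);
            int hits = count(word, table); // raw source, so hrefs count too
            boolean better = rows > bestRows || (rows == bestRows && hits > bestHits);
            boolean tie = rows == bestRows && hits == bestHits;
            if (better) {
                best.clear();
                bestRows = rows;
                bestHits = hits;
            }
            if (better || tie) {
                best.add(table);
            }
        }
        return best;
    }

    /** Wraps the chosen tables in a minimal HTML page. */
    public static String toHtmlPage(List<String> tables) {
        StringBuilder sb = new StringBuilder("<html><body>\n");
        for (String t : tables) {
            sb.append(t).append('\n');
        }
        return sb.append("</body></html>\n").toString();
    }

    // Usage (hypothetical arguments): java BestTablePicker in.html query out.html
    public static void main(String[] args) throws IOException {
        String html = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        String page = toHtmlPage(pick(html, args[1]));
        Files.write(Paths.get(args[2]), page.getBytes(StandardCharsets.UTF_8));
    }
}
```

If the keyword count should outrank the row count instead, only the `better` comparison needs to change.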
|
Clarification of Question by naveen82-ga on 15 Nov 2005 07:41 PST
One important thing I didn't emphasize is that I need the code in
Java. Any parser built in Java can be used, I guess.
|
Clarification of Question by naveen82-ga on 24 Nov 2005 19:06 PST
Hi,
I was just curious about my question. Can anyone please respond to
it? The comment received from larkas was great! I went through it, but
what I understood was that it is for XML. I don't want to do something
like converting from HTML to XML and then back to HTML again! Does
anyone know how HTML parsing is done to search for a keyword in an
HTML page? Thanks in advance! Any help would be really appreciated!
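
For what it is worth, HTML can be parsed directly, with no XML round trip, using the callback parser that ships with the JDK in javax.swing.text.html. The sketch below is only an illustration and was not posted in this thread: the class name TdKeywordCounter is made up, and it counts keyword occurrences in text found inside any <td>, without distinguishing one table from another.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical callback: counts a keyword in text that sits inside <td> cells.
public class TdKeywordCounter extends HTMLEditorKit.ParserCallback {
    private final String needle;
    private int tdDepth = 0; // > 0 while the parser is inside a <td>
    private int hits = 0;

    public TdKeywordCounter(String keyword) {
        if (keyword.isEmpty()) {
            throw new IllegalArgumentException("keyword must not be empty");
        }
        this.needle = keyword.toLowerCase();
    }

    @Override
    public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        if (t == HTML.Tag.TD) {
            tdDepth++;
        }
    }

    @Override
    public void handleEndTag(HTML.Tag t, int pos) {
        if (t == HTML.Tag.TD && tdDepth > 0) {
            tdDepth--;
        }
    }

    @Override
    public void handleText(char[] data, int pos) {
        if (tdDepth == 0) {
            return; // text outside any table cell is ignored
        }
        String text = new String(data).toLowerCase();
        int i = text.indexOf(needle);
        while (i >= 0) {
            hits++;
            i = text.indexOf(needle, i + needle.length());
        }
    }

    /** Parses the HTML and returns the number of keyword hits in <td> text. */
    public static int count(String html, String keyword) {
        TdKeywordCounter callback = new TdKeywordCounter(keyword);
        try {
            // true = ignore charset declarations inside the document
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return callback.hits;
    }
}
```

Since the parser reports tags and text as separate events, link targets in <a href> attributes are never seen as text, so only the visible cell content is searched.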
|