My manager at work has assigned me the task of analyzing how our
company's products rank on specific shopping search engines, such as
Froogle, Y! Shopping, and BizRate, by running searches on a list of
terms that should (hopefully) bring up results that include our
products.
Part of my task is to print out the results pages for each query
(usually 3-4 pages) to create a hard-copy record of our analyses (we
do them on a regular basis). This process is fairly cumbersome: I
currently take each term from our list, run the query, and then print
out the results pages for that query. Since I have to do 100-200
queries every week or two, it is a huge time sink, and I'm looking for
a way to automate the process.
So, my question is as follows:
Is it possible to write a Perl script that will automate these print
jobs? Mind you, I'm working with Internet Explorer 6.0. If the
answer to this is yes...
A. Could you write a Perl script (and post the code here on Google
Answers) that will take a list of terms that I could cut and paste
into a web form, run the relevant queries, and print the results for
each one? You only need to write code that does this for one of the
search engines. Since I like Google and I'm using Google Answers,
just make the code run for Froogle (http://www.froogle.com). ($20
tip)
A sample list of queries is as follows:
Men's gloves
Leather jacket
Men's leather shoes
Worsted wool suit
men's cashmere sweater
B. Could you include code in the Perl script to print the pages in
this format: double-sided, 2 pages per side? ($5 tip)
If the answer is no, explain why not and, if possible, suggest another
way to accomplish this task.
Thanks,
Froogler
Request for Question Clarification by dogbite-ga on 07 May 2003 20:46 PDT
Hi froogler,
I am interested in answering your question but want to clarify it
first.
Is it essential that you print the result pages from IE? How many
HTML pages do you print for each search? When you say you print 3-4
pages, does that mean you print the first 3-4 pages of results, or
that one webpage takes up 3-4 paper pages?
It would be easiest to put the search terms in a text file and then
have a Perl script issue the queries to Froogle and retrieve the HTML
pages. That script could then simply print out the text of the search
results. Alternatively, the script could use a simple HTML renderer to
create an IE-like printout.
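A minimal sketch of the plain-text option, written in Python for illustration (Perl's HTML::Parser module fills the same role). It assumes a results page has already been downloaded as an HTML string; the class and function names are hypothetical, and skipping `<script>` and `<style>` contents is a design choice rather than anything Froogle requires.

```python
from html.parser import HTMLParser


class ResultTextExtractor(HTMLParser):
    """Collects the visible text of an HTML page, skipping script/style contents."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def result_text(page_html):
    """Return the visible text of one results page, one text run per line."""
    parser = ResultTextExtractor()
    parser.feed(page_html)
    parser.close()
    return "\n".join(parser.chunks)


if __name__ == "__main__":
    sample = ("<html><head><style>b {color: red}</style></head>"
              "<body><b>Leather jacket</b> <i>$99.00</i></body></html>")
    print(result_text(sample))
```

Printing the returned string gives a text-only record. It loses the layout and images, which is the trade-off against the renderer-based option.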
A more complex solution would be to use a Perl module such as
Win32::OLE to control IE.
Can you help me better understand your question?
dogbite-ga
Clarification of Question by froogler-ga on 08 May 2003 00:04 PDT
Hi dogbite,
Thank you for taking a look at my question.
To answer your first question: I want (up to) the first 4 pages of
search results for each query. Most queries will have 4 pages; some
will have fewer. When printing 2 pages per side, double-sided, a
query's results should fit on one physical sheet of paper on our
printers at work. Also, each page should be in portrait orientation
relative to itself, while the physical sheet of paper is landscape
(think of what the pages of a booklet look like).
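The paper count for this layout is simple arithmetic: 2 pages per side times 2 sides gives 4 pages per sheet. The Python sketch below assumes each query's results start on a fresh sheet, so that the sets stay separate. The function name is hypothetical.

```python
import math

PAGES_PER_SIDE = 2   # 2-up: two portrait pages on each landscape side
SIDES_PER_SHEET = 2  # double-sided printing
PAGES_PER_SHEET = PAGES_PER_SIDE * SIDES_PER_SHEET  # 4


def sheets_needed(page_counts):
    """Sheets of paper for a batch of queries, given each query's page count.

    Assumes every query starts on a fresh sheet, so a 3-page query
    still uses one whole sheet.
    """
    return sum(math.ceil(pages / PAGES_PER_SHEET) for pages in page_counts)


if __name__ == "__main__":
    # e.g. three queries that returned 4, 3 and 2 result pages
    print(sheets_needed([4, 3, 2]))
```

At the top of the weekly range (200 queries of 4 pages each), that comes to 200 sheets.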
To answer your second item, you can issue the queries to Froogle any
way you please; a text list is fine. However, I want the actual HTML
rendering of the results pages printed, with the formatting, relative
placement, and images. The script should run and yield printouts that
look just like what the browser displays. It does not have to print
from IE per se; I mentioned IE in case it was relevant.
Whether you use a simple HTML renderer or the more complex module
that you mentioned is of no consequence to me.
Another alternative would be to take the browser results pages for
each query and append the HTML for each set of results to the previous
query's results, thus making (and saving) one large file. So you
would get the results for query 1 on pages 1-4, the results for query
2 on pages 5-8, the results for query 3 on pages 9-12, and so on, in
that order in one large HTML file. The only requirement is that there
be some code to create a page break after each set of results, so
that the results for query 2 start on a fresh page rather than partway
down the last page of query 1. The printed results for each query
should remain separate from the results for the other queries.
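Here is a minimal Python sketch of the single-file idea. It assumes each query's result pages have already been saved as HTML strings. It keeps only each page's `<body>` contents, found with a simple regular expression (an approximation rather than a real HTML parse). Every query after the first starts with the CSS `page-break-before: always` property, which browsers apply when printing. The per-query `<h2>` heading and the function names are hypothetical additions.

```python
import html
import re

# Matches the contents of <body ...>...</body>; a rough approximation, not a full parser.
BODY_RE = re.compile(r"<body[^>]*>(.*?)</body>", re.IGNORECASE | re.DOTALL)


def body_of(page_html):
    """Return the contents of <body>, or the whole string if no <body> is found."""
    match = BODY_RE.search(page_html)
    return match.group(1) if match else page_html


def combine_results(results):
    """Build one HTML document from {query: [page_html, ...]}, in insertion order.

    Every query after the first starts with a CSS page break, so its
    printout begins on a fresh page instead of mid-page.
    """
    sections = []
    for i, (query, pages) in enumerate(results.items()):
        style = "" if i == 0 else ' style="page-break-before: always"'
        heading = "<h2>%s</h2>" % html.escape(query)
        body = "".join(body_of(page) for page in pages)
        sections.append("<div%s>%s%s</div>" % (style, heading, body))
    return ("<html><head><title>Froogle results</title></head><body>\n"
            + "\n".join(sections)
            + "\n</body></html>")


if __name__ == "__main__":
    saved = {
        "Men's gloves": ["<html><body><p>gloves page 1</p></body></html>"],
        "Leather jacket": ["<html><body><p>jacket page 1</p></body></html>"],
    }
    print(combine_results(saved))
```

The result could be saved to a file (for example `froogle_results.html`, a hypothetical name) and printed from the browser. If the saved pages use relative image links, a `<base href>` tag pointing at the original site may be needed for the images to appear.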
The browser would honor the page breaks in this large file, and I
could then format the printouts any way I wanted. That would also
solve my problem.
Thanks,
Froogler
Request for Question Clarification by dogbite-ga on 08 May 2003 08:56 PDT
Hi froogler-ga,
I propose a solution that has the html2ps program at its core. The
program's homepage is here:
http://www.tdb.uu.se/~jan/html2ps.html
That program should be able to download all the result pages and
render them, images and all, into PostScript. You can then print the
PostScript files however you want.
I could write a script that would convert all of your search terms
into Froogle URLs and then feed those URLs into html2ps.
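Turning the search terms into Froogle URLs can be sketched in Python as follows. The URL pattern (`q=` plus `btnG=Froogle+Search`) is copied from the Froogle example URL posted later in this thread. The one-term-per-line input format and the function names are assumptions for illustration.

```python
from urllib.parse import urlencode

# Host and path as they appear in the Froogle URL quoted in this thread.
FROOGLE_SEARCH = "http://froogle.google.com/froogle"


def parse_terms(pasted_text):
    """Split pasted text into search terms, one per line.

    Surrounding whitespace is trimmed, blank lines are skipped, and
    repeated terms are dropped; the original order is kept.
    """
    terms = []
    for line in pasted_text.splitlines():
        term = line.strip()
        if term and term not in terms:
            terms.append(term)
    return terms


def froogle_url(term):
    """Build the Froogle search URL for one term (spaces become '+', ' becomes %27)."""
    return FROOGLE_SEARCH + "?" + urlencode({"q": term, "btnG": "Froogle Search"})


if __name__ == "__main__":
    pasted = "Men's gloves\nLeather jacket\n\nmen's cashmere sweater\n"
    for term in parse_terms(pasted):
        print(froogle_url(term))
```

Each printed URL could then be handed to whatever fetches or renders the page, whether html2ps, curl, or a browser.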
There are a few caveats, though. First, you will have to install all
of the programs that html2ps requires, which include Perl, ImageMagick,
and Ghostscript. Second, I cannot guarantee that html2ps will render
the pages correctly.
What do you think?
dogbite-ga
Clarification of Question by froogler-ga on 08 May 2003 12:27 PDT
Hi dogbite,
Thanks for your help so far. I found out that I don't have
authorization to install ImageMagick or Ghostscript, so if your
PostScript solution requires them, it's a no-go, even though it
sounds like it would work in practice.
How would you tackle the problem with the Win32::OLE module?
Froogler
Request for Question Clarification by dogbite-ga on 08 May 2003 23:09 PDT
Hi froogler-ga,
My experience is that interacting with a program like IE from an
outside script always requires a lot of fiddling. It also often
requires a C or C++ compiler for Windows, such as Visual C++.
Upon further thought, I think your suggestion of putting everything
into one file is best. Here are two pages that I put into one .html
file:
http://froogle.google.com/froogle?q=men%27s+gloves&btnG=Froogle+Search
Are you able to install curl on your Windows computer? There is an
installation file here:
http://www.cag.lcs.mit.edu/curl/download.html
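If curl can be installed, the download step could be scripted. This Python sketch builds one curl command per URL using curl's standard `-s` (silent) and `-o` (output file) options, then runs each command with `subprocess`. It assumes the Froogle URLs have already been built. The `query01.html`, `query02.html`, ... file naming is a hypothetical choice.

```python
import subprocess


def curl_commands(urls, prefix="query"):
    """Return one curl argument list per URL, saving each page to its own file."""
    commands = []
    for n, url in enumerate(urls, start=1):
        out_file = "%s%02d.html" % (prefix, n)
        commands.append(["curl", "-s", "-o", out_file, url])
    return commands


def download_all(urls):
    """Run curl for each URL; raises CalledProcessError if curl exits with an error."""
    for cmd in curl_commands(urls):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    example = ["http://froogle.google.com/froogle?q=men%27s+gloves&btnG=Froogle+Search"]
    for cmd in curl_commands(example):
        print(" ".join(cmd))
```

Passing an argument list instead of a shell string keeps the `&` in each URL from being interpreted by the Windows command shell.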
Also, do you have Perl installed?
dogbite-ga
Clarification of Question by froogler-ga on 09 May 2003 13:50 PDT
Dogbite,
Is the URL you mentioned in your last clarification request a
cut-and-paste error? Curl sounds like it would work fairly well.
However...
I'm going to close the question (a coworker came up with a potential
solution), but I think you should be paid for your time and help. If
I end up needing help on this again, I'll ask for you to answer it
first.
Thanks,
Froogler
Clarification of Question by froogler-ga on 09 May 2003 13:51 PDT
Dogbite,
Please cut and paste your clarification requests into the Answer, and
I will pay you.
Thanks,
Froogler