Q: for joseleon-ga only please ( Answered 5 out of 5 stars,   0 Comments )
Question  
Subject: for joseleon-ga only please
Category: Computers > Programming
Asked by: bmcompany-ga
List Price: $200.00
Posted: 20 Jan 2004 08:09 PST
Expires: 19 Feb 2004 08:09 PST
Question ID: 298383
Are you around? We could do with a web page front end to our link parser.

Many thanks

Request for Question Clarification by joseleon-ga on 20 Jan 2004 08:14 PST
Hello, bmcompany:
  Nice to see you again, sure I can do it, please, post the details so
we can discuss them.

Regards.

Clarification of Question by bmcompany-ga on 20 Jan 2004 08:57 PST
That was quick!

Last time we spoke, you created a parser that strips 50 or so HTML
files to create a list of URLs to manually submit to search engines.
The program also identified the one file from each site with a layer
containing a list of links to the other pages. The list was then
exported to an Access database.

The system is working well, so good job! However, we now have 3 people
working from this database, submitting URLs to SEs from home. We have 3
copies of the database floating around, which is obviously posing a problem.

We want a front end to the DB, viewable in a browser, with the facility to:

a- Add URLs through a "console" back-end web page, in the same way as the
Delphi parser does now (browsing to a dir, creating the list and
identifying the page with the layer), and then inserting them into the DB

b- Allow a submitter to 'log in' and submit URLs, ticking them off the
list as they go along and inserting the submitter's login ID into the
record.

Since we will be reaching many, many thousands of URLs soon, we were
thinking of a PHP + MySQL solution?

If you think this is something you could do for us, then let me know
and I'll post a more exact spec (field names, hosting for the DB, etc.)

Oh, and we'll make this a 2-part answer: $200 per question ($400) + a
tip if I'm feeling nice :-) Is that OK?

Request for Question Clarification by joseleon-ga on 20 Jan 2004 09:50 PST
Hello, bmcompany:
  Delphi and PHP+MySQL are the languages I develop most of my
software in, so there is no problem there.

The only possible problem is browsing for a directory from a web page;
there is simply no way to do that. So maybe the best solution is to
extend the existing Delphi client to send parsed results to a remote
MySQL server.

Please post all the info you have about the project so we can settle
on the best way to implement it, but there should be no problem.

Regards.

Clarification of Question by bmcompany-ga on 21 Jan 2004 02:17 PST
OK, good stuff.

Firstly, can you check whether this hosting will have all the
functionality you need?

http://www.hostway-uk.com/hosting/info.asp?id=6

We need all of the functionality of the existing parser (let me know
if you need a copy) and some new features.

We manually submit to 3 search engines. Google, Alltheweb and Altavista.

Our current DB looks like this

URL (string) | Layer (y/n) | Google (date) | Alltheweb (date) | Altavista (date)

A submitter manually submits 5 pages for each client (because it's not
good to submit too many pages each day), including the layer page, to
Google. Once all of the submissions have been done to Google for that
day, they start on ATW and AV. But Google gets the priority, for
obvious reasons.

The php page needs to be able to display results from the database
with several filters.

All arranged in date order (the date the URL was added to the list); we
need to be able to choose ascending or descending.

Front End filters (submitter can see)

Display all URLs
Display all URLs waiting any submission
Display all URLs waiting Google submission
Display all URLs waiting Altavista submission
Display all URLs waiting Alltheweb submission

We need an option to set how many results to display per page. (20,
50, 100, all etc)

With all of the above, we need to be able to say how many URLs to
display per client (assuming 1 client has 1 domain name). For now, we
only want to submit 5 pages from each client (including the layer page).
Because each client has 50 or so pages, we cannot submit all of them,
so at the moment we are submitting only a few URLs and hoping the
search engine spiders the rest. But the URLs must stay in the list in
case we find time to catch up.

However many URLs are displayed per client, the layered page needs to
be displayed first, with some marking to show it is the layered page.
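The per-client limiting described above (at most 5 URLs per domain, layered page always first, oldest first otherwise) can be sketched as follows. This is only an illustration of the selection logic in Python; the actual system discussed here is PHP+MySQL, and the record layout is assumed from the fields mentioned in this thread.

```python
from collections import defaultdict
from urllib.parse import urlparse

def pick_per_client(records, per_client=5):
    """Group URL records by domain (one client = one domain) and keep at
    most `per_client` per domain, always listing the layered page first,
    then oldest-added first."""
    by_domain = defaultdict(list)
    for rec in records:
        by_domain[urlparse(rec["url"]).netloc].append(rec)
    picked = []
    for recs in by_domain.values():
        # False sorts before True, so layered pages come first.
        recs.sort(key=lambda r: (not r["layered"], r["date_added"]))
        picked.extend(recs[:per_client])
    return picked
```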

Admin Filters (submitter cannot see)

All of the above filters plus:

Display submission logs by submitter and date (so we can see how much
work people are/are not doing!)

As you know, we must be able to add URLs via the existing parser. The
parser must log the date that the URL was added to the list.

As we have more than one person submitting, there needs to be a log-in
facility creating some sort of session variable (I know a little ASP
but no PHP, so bear with me) with their username, which is added to any
submission they do. Once a submitter has submitted a URL, they then
tick the box for the appropriate engine and it inserts the date and
time into the DB.
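Ticking a box for an engine boils down to one UPDATE that stamps the server-side date/time and the logged-in submitter's ID onto the row. A minimal sketch, using Python and an in-memory SQLite table as a stand-in for the PHP+MySQL implementation (table and column names follow the fields discussed in this thread):

```python
import sqlite3
from datetime import datetime

ENGINES = ("google", "alltheweb", "altavista")

def mark_submitted(conn, url_id, engine, user_id):
    """Record that `user_id` submitted URL `url_id` to `engine`.
    The date/time is taken on the server, never supplied by the client."""
    if engine not in ENGINES:  # whitelist: never interpolate raw user input
        raise ValueError(f"unknown engine: {engine}")
    conn.execute(
        f"UPDATE urls SET {engine}_date = ?, {engine}_submitter = ? "
        "WHERE url_id = ?",
        (datetime.now().isoformat(" ", "seconds"), user_id, url_id),
    )
    conn.commit()

# Minimal table just to exercise the sketch.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE urls (
    url_id INTEGER PRIMARY KEY, url_url TEXT UNIQUE,
    google_date TEXT, google_submitter INTEGER,
    alltheweb_date TEXT, alltheweb_submitter INTEGER,
    altavista_date TEXT, altavista_submitter INTEGER)""")
conn.execute("INSERT INTO urls (url_url) VALUES ('http://example.com/index.html')")
mark_submitted(conn, 1, "google", user_id=7)
```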

The database is now looking more like this (you might wanna add some
normalisation, but you're the expert in that field):

url
dateadded
google_submitted_date
google_submitter
alltheweb_submitted_date
alltheweb_submitter
altavista_submitted_date
altavista_submitter

Please let me know if this makes sense. Please also let me know if you
are ok with the prices for all of the above.

We look forward to working with you again.

Request for Question Clarification by joseleon-ga on 21 Jan 2004 03:28 PST
Hello, bmcompany:

  The hosting you mention has all the features I need, but just as a
suggestion, please take a look at:
  
PHPWebhosting  
http://www.phpwebhosting.com

I host all my websites there and the performance is very good. Just a
suggestion, I don't get paid to say this ;-)

Regarding the source code of the parser, I have the latest version, don't worry.

This is what I understand from what you say and how I plan to do it:

There will be two tables:

User table
----------
user_id       (autoinc)
user_name     (unique)
user_password (md5)
 
URLs table
---------
url_id                   (autoinc)
url_url                  (unique)
url_layered              (boolean)
url_date_added           (datetime)
google_date              (datetime)
google_submitter         (user_id)
alltheweb_date           (datetime)
alltheweb_submitter      (user_id)
altavista_date           (datetime)
altavista_submitter      (user_id)
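The two tables above, with their uniqueness and foreign-key constraints, can be written out as DDL. A sketch using Python's sqlite3 as a stand-in for MySQL (the real deployment target), with the md5 password storage mentioned in the schema:

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    user_id       INTEGER PRIMARY KEY AUTOINCREMENT,
    user_name     TEXT NOT NULL UNIQUE,
    user_password TEXT NOT NULL              -- md5 hex digest
);
CREATE TABLE urls (
    url_id              INTEGER PRIMARY KEY AUTOINCREMENT,
    url_url             TEXT NOT NULL UNIQUE,
    url_layered         INTEGER NOT NULL DEFAULT 0,  -- boolean
    url_date_added      TEXT NOT NULL,
    google_date         TEXT,
    google_submitter    INTEGER REFERENCES users(user_id),
    alltheweb_date      TEXT,
    alltheweb_submitter INTEGER REFERENCES users(user_id),
    altavista_date      TEXT,
    altavista_submitter INTEGER REFERENCES users(user_id)
);
""")
# The special administrator account mentioned below.
conn.execute(
    "INSERT INTO users (user_name, user_password) VALUES (?, ?)",
    ("admin", hashlib.md5(b"secret").hexdigest()),
)
conn.commit()
```

The UNIQUE constraint on `url_url` is what makes the later "how many are new and how many are duplicated" report cheap: a duplicate insert simply fails.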

-When you access the utility, there will be a login screen where your
submitters can log in, and you will have a special administrator
user/password.

-If you log in as administrator you will be able to:
 -Create/delete/modify users
 -Get an activity report with submission logs, that is, select a user
and see how much they have submitted and when
 -Access all the database fields and sort by any field
 -Set up how many results will be displayed per page
 -Set up how many URLs from the same domain can be submitted (if a
submitter tries to submit more, they will get an error)
 -Submit URLs so the submitters have more work to do
 
-If you log in as submitter you will be able to:
 -Display all URLs
 -Display all URLs waiting any submission       (that is, select all
URLs from the database where any submitter field is empty)
 -Display all URLs waiting Google submission    (that is, select all
URLs from the database where the Google submitter is empty)
 -Display all URLs waiting AllTheWeb submission (that is, select all
URLs from the database where the AllTheWeb submitter is empty)
 -Display all URLs waiting AltaVista submission (that is, select all
URLs from the database where the AltaVista submitter is empty)
 (all the results will be ordered to show the layered pages first, so
they are the first to be submitted)
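Each "waiting" filter is a SELECT on an empty (NULL) submitter column, ordered layered-first. A sketch of the Google case, again using Python/sqlite3 to illustrate the query the PHP front end would run against MySQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE urls (
    url_url TEXT UNIQUE, url_layered INTEGER,
    url_date_added TEXT, google_submitter INTEGER)""")
conn.executemany("INSERT INTO urls VALUES (?, ?, ?, ?)", [
    ("http://a.com/p1.html",  0, "2004-01-20", None),
    ("http://a.com/map.html", 1, "2004-01-20", None),
    ("http://b.com/p1.html",  0, "2004-01-19", 3),   # already submitted
])

# URLs still waiting for Google submission, layered pages first,
# then by the date the URL was added to the list.
waiting = conn.execute("""
    SELECT url_url FROM urls
    WHERE google_submitter IS NULL
    ORDER BY url_layered DESC, url_date_added ASC
""").fetchall()
```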
 
So the workflow would be:
-You pack all the pages that need to be parsed into a .zip file
-You log into the utility as admin
-You upload that zip file containing the pages to be parsed
-You will get a report showing the results of the parsing
-New pages will be added to the database waiting for submission

-Then, submitters will log into the web page and will be able to see
the URLs to be submitted (some kind of to-do list)
-When they submit one, they will mark it as submitted to a specific
search engine; the submission date/time and submitter will be taken
from the server
-That way it is as if they say "I have submitted this URL to this SEARCH ENGINE"

-At any time you could enter the utility as admin and get an activity
report of any submitter

If this is what you want, I can start right now, I think this is a two
step process:

-Develop the parser engine in PHP
 -Accept pages to parse in a .zip file
 -Decompress it
 -Parse the pages
 -Enter results in the database
 (if the zip files are going to be really big, then we can instead
allow you to upload using FTP and choose the .zip file from a
combo box)
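The parser-engine steps (accept a .zip, decompress it, parse the pages) can be sketched like this. The real engine is to be written in PHP; this Python version just illustrates unpacking an uploaded archive that may contain per-customer subfolders, and the file/path names in the demo archive are made up:

```python
import io
import zipfile

def extract_pages(zip_bytes):
    """Unpack an uploaded .zip (which may contain per-customer subfolders)
    and yield (path_inside_zip, html_text) for every .htm/.html file."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            if name.lower().endswith((".htm", ".html")):
                yield name, zf.read(name).decode("utf-8", errors="replace")

# Build a small in-memory archive to exercise the sketch.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("customer1/page1.html", '<a href="http://a.com/p1.html">x</a>')
    zf.writestr("customer2/page1.html", '<a href="http://b.com/p1.html">x</a>')
pages = dict(extract_pages(buf.getvalue()))
```

Because each entry keeps its path inside the archive, two customers can both have a `page1.html` without clashing, which is exactly the concern raised below.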
 
-Develop the front end to work with submitters
 -Login
 -Reports
 -Submission interface 
 
Regarding price, yes, I agree with your offer, and thanks for counting
on me again ;-)

Regards.

Clarification of Question by bmcompany-ga on 21 Jan 2004 03:50 PST
Perfect....Well, almost.

The only thing I'm not sure of is the packing of the files to be parsed
into the zip file, because 2 clients could have the same file name. I
assume we can put in files for more than one client at a time?

Is this a problem? Can we zip including subfolders to separate clients? 

The only other thing you didn't mention is that the layer page needs to
be highlighted and appear at the top of the list.

Regards

Bm

Clarification of Question by bmcompany-ga on 21 Jan 2004 04:00 PST
Hello again. Please download this file and post back, and I will remove it.



http://www.sesuk.net/ga/phpwebhosting.txt

Request for Question Clarification by joseleon-ga on 21 Jan 2004 04:04 PST
Hello, bmcompany:
  You can safely remove the text file. Regarding the zip file, the
purpose is that you pack a complete directory structure, so you can
pack all pages from different customers; once it is uploaded, I will
unpack and parse it.

Regarding layered pages, I said this:

(all the results will be ordered to show the layered pages first, to
be the first to be submitted)

So I'll also highlight them... ;-)

Shall I start now?

Regards.

Clarification of Question by bmcompany-ga on 21 Jan 2004 04:17 PST
Ah, so you did say that! I just cannot read.

:)

I hope the hosting is set up as you need it. Please let me know if you
have any issues. You'll have my email address in the control panel so
you can contact urgently if necessary.

Please go ahead.

Request for Question Clarification by joseleon-ga on 21 Jan 2004 04:26 PST
Hello, bmcompany:
  Now I'm off to lunch (13:25 here), but I will be back soon. In any case,
we cannot be in contact outside Google Answers; it is prohibited by the
researcher guidelines, so all communication must be done here.

I will contact you when I have something to show.

Regards.

Clarification of Question by bmcompany-ga on 21 Jan 2004 05:42 PST
Great, I look forward to hearing from you.

Request for Question Clarification by joseleon-ga on 21 Jan 2004 11:45 PST
Hello, bmcompany:
  Just a progress report: I have translated the parsing engine to PHP;
it's working, and it updates the database with the parse results. The
front end to upload the files is almost finished. All of this is working
on my computer, and I'm just dealing with file permissions and trying to
automate things as much as possible. Once it's finished I will upload it
to your new site, so you can test it.

Regards.

Clarification of Question by bmcompany-ga on 21 Jan 2004 13:00 PST
tis sounding good

Clarification of Question by bmcompany-ga on 22 Jan 2004 11:14 PST
For me, GA has been down all day!

I hope I haven't missed anything.

Request for Question Clarification by joseleon-ga on 22 Jan 2004 13:10 PST
Hello, bmcompany:
  That's why I haven't been able to post an update all day, so here it is:

  I have finished the first part, where you can add URLs and list them.
I have changed the way you enter URLs, because the other way you would
need to upload all the files to the server, which was no good: you
would then need to delete them, prevent name clashes, and so on.
So I have made it work this way:
  
-I have modified the Link Parser so you can now browse for a dir and
parse as many files as you want looking for links; you can distribute
this app very easily, as it doesn't require installation

-Also, the results produced have an asterisk (*) as the first
character, indicating that the URL has the layer

-So you only have to copy the results to the clipboard and access this page:

http://www.qadram.com/zip/parse.php

-Paste the URLs in the textbox and click the button

-The system will report how many are new and how many are duplicated

-If you want to see them, just access here:

http://www.qadram.com/zip/parse.php?action=show_urls

-The urls with layer are highlighted

You can download the updated Link Parser here:

http://www.qadram.com/zip/Link_parser_improved.zip
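The paste-and-submit flow above (lines prefixed with `*` are layered pages; the server reports new vs. duplicated URLs) can be sketched like this. The actual page is PHP; this Python version only illustrates the parsing and dedup logic, and the helper names are made up:

```python
def parse_pasted(text):
    """Turn the pasted Link Parser output into (url, layered) pairs.
    A leading '*' marks the page that carries the layer of links."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        layered = line.startswith("*")
        records.append((line.lstrip("*").strip(), layered))
    return records

def dedupe_report(records, existing_urls):
    """Report how many pasted URLs are new vs. already in the database."""
    new = [r for r in records if r[0] not in existing_urls]
    return len(new), len(records) - len(new)
```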

Tell me if all is OK, so I can continue with the second part. As soon
as it is finished I will install it on your server.

Regards.

Clarification of Question by bmcompany-ga on 23 Jan 2004 00:03 PST
'Tis looking good.

The only thing I can think of here is that we would need to parse each
client individually with the local parser.

What I liked about the zipped idea is that we could do more than 1 client at once.

It's not essential, but just a thought.

Post your next comment as an answer and we'll move on to the second question.

You're doing a great job (as usual)!
Answer  
Subject: Re: for joseleon-ga only please
Answered By: joseleon-ga on 23 Jan 2004 00:18 PST
Rated:5 out of 5 stars
 
Hello, bmcompany:
  I modified the Link Parser to recursively parse a directory
structure, for example:

customers\customer1\page1.html
customers\customer1\.....
customers\customer2\page1.html
customers\customer2\.....
customers\customer3\page1.html
customers\customer3\.....
customers\customer4\page1.html
customers\customer4\.....
customers\customer5\page1.html
customers\customer5\.....

If you have all your customers stored in a single directory, each
separated into its own folder, you can just point the Link Parser at
the customers root directory and it will parse all the directories
inside recursively, dumping out a list of all links found.
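The recursive walk described above can be sketched as follows. The real Link Parser is a Delphi application; this is just a Python illustration of the same behaviour, using the standard library's HTML parser to pull the links out of each page:

```python
import os
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect every href from <a> tags in one HTML file."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def parse_tree(root):
    """Walk `root` recursively (customers/customer1/..., customers/customer2/...)
    and return all links found in every .htm/.html file."""
    links = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for fname in filenames:
            if fname.lower().endswith((".htm", ".html")):
                collector = LinkCollector()
                path = os.path.join(dirpath, fname)
                with open(path, encoding="utf-8", errors="replace") as fh:
                    collector.feed(fh.read())
                links.extend(collector.links)
    return links
```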

I added this in the last version; you can download it, test it, and
tell me if you agree with that behaviour. If not, just tell me and I
will make it work the way you need.

In the next step, I will add users, logins, submissions, etc., etc.;
please don't hesitate to request any additional feature you need.

Regards.

Request for Answer Clarification by bmcompany-ga on 23 Jan 2004 01:26 PST
hi there,

That sounds perfect, but I'm not sure I have the correct version,
because I cannot get it to parse through directories.

Anyway, yes, move on to the next step. I didn't want to put this as a
comment on an accepted answer, so you'll have to post another answer.

The other question is 299229

Many thanks

Clarification of Answer by joseleon-ga on 23 Jan 2004 02:21 PST
Hello, bmcompany:
  Yes, you are right: I uploaded the "improved" version, while the new
one is the "distributed" version, sorry ;-)

Here is the latest:

http://www.qadram.com/zip/Link_parser_distributed.zip

Please test it and tell me if it works as you need; let's continue on
the other question...

Regards.
bmcompany-ga rated this answer:5 out of 5 stars
Excellent work!

Comments  
There are no comments at this time.
