We need a search program for a website we are creating. It should be
priced below $2,000, or a reasonable annual fee, and have at least the
following features:
PDF fully indexable
We need the ability to add a PDF file to the search index, and to
indicate that the PDF belongs to a particular category
Ranking - we need to put our own priorities there
We need to mark some articles with special numbers, and then, when
results are received, to sort first on those numbers and then on
the internal search engine ranking
Spelling / misspelling, configurable
We need the search engine to match both singular and plural words
We need to be able to configure the search engine for synonyms
Reports - what people search: we should have the ability to know what
people search for and to receive a file with their searches
Most popular searches
Searches that didn't provide any results
We should be able to format the presentation of search results as we want
We should be able to make parallel searches in several categories or
types of information and then present the results on one page
The ranking of the search engine has to be visible and, if possible, understandable
Search presentation should be able to be separated by the categories
where the information was found (e.g. bookstore, forum postings,
files, etc.)
Search should be possible by date
Boolean search AND, OR, AND NOT
We should be able to specify which fields to search: body, title,
description, keywords, author (possibly)
We should be able to rebuild the search database
It should have a clear interface for adding new articles
Program must work on Linux/Apache/PHP |
Request for Question Clarification by answerguru-ga on 17 Jul 2006 10:04 PDT
Hello solars-ga,
I have spent some time going through your criteria and I have found
something that meets *almost* all of your requirements. For those that
are not met precisely, I have given a description below explaining the
reasoning:
"Ranking - we need to put our own priorities there"
This requirement is met partially using what is called "boosting".
When you add an item, you can increase its relevance this way so it
appears higher than it normally would. However, no search engine I've
seen will guarantee that document X will appear above document Y
regardless of the search terms. It seems to go against the whole
purpose of a relevance-based search engine.
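To illustrate why boosting raises a document without guaranteeing it first place, here is a minimal sketch in Python. It assumes a multiplicative boost applied to the engine's relevance score, which is a common model but not necessarily the formula this product uses; the `Hit` class, the document names, and the scores are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    doc_id: str
    base_score: float   # relevance the engine computed for this query (made-up values)
    boost: float = 1.0  # editorial boost set when the item was added (assumed field)


def rank(hits):
    """Order hits by boosted score, highest first."""
    return sorted(hits, key=lambda h: h.base_score * h.boost, reverse=True)


hits = [
    Hit("landing-page", base_score=0.20, boost=2.0),  # boosted, but a weak match
    Hit("forum-post", base_score=0.90),               # strong match, no boost
    Hit("old-article", base_score=0.30),
]

ranked_ids = [h.doc_id for h in rank(hits)]
print(ranked_ids)
```

In this made-up data the boosted page moves from last to second place, but the strongly matching forum post still comes first.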
"We need to be able to configure the search engine for synonyms"
There is no built-in support for this; however, the tool is an API. That means
that if you have a collection of synonyms in a database, you can route
searches through that so that similar words are also used in the
search. Of course this requires some software customization (I know of
companies that have done things like this). The reason that it's too
much to ask of a search engine is that different subject matter areas
have their own interpretations of what terms are similar or
equivalent.
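As a rough idea of what that customization could look like, the sketch below expands a query with synonyms before it is sent to the search engine. It is written in Python for brevity, and everything in it is an assumption: the `SYNONYMS` table stands in for your database, and the AND/OR output syntax follows the Boolean operators in your requirements list rather than any documented syntax of the tool.

```python
# Hypothetical synonym table; in practice this would be loaded from a database.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "doctor": ["physician"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Rewrite each term as an OR-group of the term and its synonyms."""
    parts = []
    for term in query.lower().split():
        alternatives = [term] + synonyms.get(term, [])
        if len(alternatives) == 1:
            parts.append(term)
        else:
            parts.append("(" + " OR ".join(alternatives) + ")")
    # All original terms remain required, so the groups are joined with AND.
    return " AND ".join(parts)


expanded = expand_query("car repair")
print(expanded)
```

Because the table belongs to your application, each subject area can keep its own list of equivalent terms.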
"The ranking of the search engine has to be visible and, if possible, understandable"
A percentage ranking is available for display, but again, if you want
to override the order based on your predetermined ranking then it's
not going to make very much sense! That percentage is a fairly common
and understandable way of conveying how closely the document
matches the search terms.
"We should be able to specify which fields to search: body, title,
description, keywords, author (possibly)"
What the search tool does here is take all of the embedded PDF
meta-data and place it in the description field when that document is
indexed. Although your search interface is customizable, all of those
fields would be hitting against a single database field containing all
of the meta-data.
OK - those are all my comments. All of the other requirements have
been met exactly. Please do let me know if you are OK with these
slight variations and I will post the product information as an
official answer.
Thank you for using Google Answers!
answerguru-ga
|
Clarification of Question by solars-ga on 17 Jul 2006 18:13 PDT
Am checking with programmers, and will advise as soon as I have an answer.
I assume the tool you found meets the software language too (ISYS for
example meets most of the criteria but not the language).
|
Request for Question Clarification by answerguru-ga on 17 Jul 2006 18:46 PDT
Sure - no problem.
What programming language is being used? I can confirm whether or not
this is compatible.
Thanks,
answerguru-ga
|
Clarification of Question by solars-ga on 19 Jul 2006 08:19 PDT
The programming language is mentioned at the end of the question.
Here is what the programmer responded:
-------------
This requirement is met partially using what is called "boosting".
When you add an item, you can increase its relevance this way so it
appears higher than it normally would. However, no search engine I've
seen will guarantee that document X will appear above document Y
regardless of the search terms. It seems to go against the whole
purpose of a relevance-based search engine.
---------------------------
Response:
It is better than nothing, but if it will not show our marked
articles as the first items, we will not actually use that feature;
moving from the 3rd page to the 2nd is not a valuable option.
You invest time and energy in a landing page for some word. Then it
shows up on the 3rd page, and boosting moves it to the 2nd.
============
Synonym list - OK
============
Overriding -
If it is presented as a star rating rather than a percentage, that would suffice.
"We should be able to specify which fields to search: body, title,
description, keywords, author (possibly)"
=============================
What the search tool does here is take all of the embedded PDF
meta-data and place it in the description field when that document is
indexed. Although your search interface is customizable, all of those
fields would be hitting against a single database field containing all
of the meta-data.
So, we cannot search by author or any other field. That tool keeps
them all together. Is this an issue for PDFs only or for all documents?
================================
|
Request for Question Clarification by answerguru-ga on 19 Jul 2006 10:30 PDT
Hello again,
So I have some follow-up questions:
Boosting: is this still an issue then? If it is, I've thought of a
workaround that resolves the problem. You can place all of your
'marked' documents (which you want to show above everything else) in a
separate category. When you do a search, search that category separately
and place its results above the context-based results.
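A minimal sketch of that merge step, in Python for illustration only. It assumes your application can run the two searches and receive each result as a list of (document id, score) pairs; the lists below are hard-coded, made-up stand-ins for those two result sets, and `merge_pinned_first` is a hypothetical helper, not part of the tool.

```python
def merge_pinned_first(pinned_hits, regular_hits):
    """Return pinned hits first, then regular hits that were not already shown."""
    seen = set()
    merged = []
    for doc_id, score in pinned_hits + regular_hits:
        if doc_id not in seen:  # skip a marked document that also matched normally
            seen.add(doc_id)
            merged.append((doc_id, score))
    return merged


# Stand-ins for the results of the two separate searches.
pinned = [("landing-page", 0.20)]
regular = [("forum-post", 0.90), ("old-article", 0.30), ("landing-page", 0.20)]

page = merge_pinned_first(pinned, regular)
print(page)
```

The marked document is listed once, at the top, no matter how low its own relevance score is.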
Result rating using stars can be implemented, but it would require
some customization. You will still receive a percentage value in the
raw search result, but you can transform the "look" of the search
result using XSL stylesheets. Your stylesheet can output stars
depending on the value you get back.
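The mapping itself is simple. The sketch below shows the logic in Python rather than XSL, purely to illustrate the idea; the five-star scale, the equal-width buckets, and the function names are my assumptions, and the same arithmetic would have to be rewritten in your stylesheet.

```python
def stars_for(percent, max_stars=5):
    """Map a 0-100 relevance percentage to a star count from 1 to max_stars."""
    p = max(0, min(100, int(percent)))  # clamp to the valid range
    # Equal-width buckets: with 5 stars, 0-19 gives 1 star, 20-39 gives 2, and so on.
    return min(max_stars, p * max_stars // 100 + 1)


def render(percent, max_stars=5):
    """Draw the rating as filled stars followed by empty slots."""
    n = stars_for(percent, max_stars)
    return "*" * n + "-" * (max_stars - n)


for value in (87, 40, 5):
    print(value, render(value))
```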
Searching document meta-data: this combination of everything into one
field applies to document types containing this type of information
(PDF, Word, etc.). You could still provide these fields in your
search, but they would all be looking at the same field in the
backend.
Programming language (PHP) is fine - the search tool is a Java
component, which can be integrated with PHP.
So it looks like we've pretty much taken care of all of the initial
concerns. Please confirm and I will go ahead and post the information
on the product as an official answer.
Thanks,
answerguru-ga
|
Clarification of Question by solars-ga on 21 Jul 2006 05:12 PDT
Hi, this is what I have from the programmer in response to your suggestions:-
======================================
Boosting: is this still an issue then? If it is, I've thought of a
workaround that resolves the problem. You can place all of your
'marked' documents (which you want to show above everything else) in a
separate category. When you do a search, search that category separately
and place its results above the context-based results.
======================================
That is on the assumption that you can 1) do two searches at once and
2) make a joint presentation of the two searches.
It is highly unlikely that both things are enabled.
=========================
Searching document meta-data: this combination of everything into one
field applies to document types containing this type of information
(PDF, Word, etc.). You could still provide these fields in your
search, but they would all be looking at the same field in the
backend.
=========================
That is just ultimately bad, because we cannot look specifically at
the meta that way. Let's say we marked the 1st article with the word
"cancer", then the title of another article is "That cures from
everything except cancer" and we marked it with "smile".
Both of them will be the same to the system because the title is joined with the meta.
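To make the example concrete, here is a small sketch in Python. It only models the behaviour described above and is not the tool's real code: the field names are assumed, and the title of the first article is invented because only its marking was given.

```python
docs = [
    # Title of article 1 is a made-up placeholder; its keyword comes from the example.
    {"id": 1, "title": "Treatment overview", "keywords": "cancer"},
    {"id": 2, "title": "That cures from everything except cancer", "keywords": "smile"},
]


def search_merged(docs, term):
    """Lookup when title and meta are joined into one field."""
    hits = []
    for d in docs:
        blob = (d["title"] + " " + d["keywords"]).lower()
        if term.lower() in blob.split():
            hits.append(d["id"])
    return hits


def search_field(docs, field, term):
    """Lookup restricted to a single named field."""
    return [d["id"] for d in docs if term.lower() in d[field].lower().split()]


merged_hits = search_merged(docs, "cancer")
keyword_hits = search_field(docs, "keywords", "cancer")
print(merged_hits, keyword_hits)
```

With one joined field both articles match "cancer"; with separate fields only the article actually marked with that word does.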
My addition:
There are tools which do everything we need. One of those for example
is Mondosearch - see this link for the feature list
http://www.mondosoft.com/ms-features.asp
We just need something like that but for less money.
|