Hi teruca,
The idea of a single global store of knowledge has captivated people's
imagination for decades.
Ted Nelson founded Project Xanadu in 1960. He had a vision of a
world-wide public hypertext publishing and document storage system.
For the past 42 years Project Xanadu has worked to implement a network
of deep electronic documents with side-by-side inter-comparison.
Xanadu provides for frictionless re-use of copyrighted material, with
rights management. The Xanadu model envisages a massive global
repository of information with automatic version management, quoting
and annotation. Xanadu documents are deeply-interconnected and
systematically stored:
http://xanadu.com/
I will mention Ted Nelson's dream again later in this answer.
It is not currently possible to implement all of the things that you
mention in your question, although the technology does exist to do
some of those things - and some of them are being done quite well
already.
It is possible to index and store much of the information available on
the internet, and also some privately-held information. It is possible
to perform a rough automated translation of that information into
other languages. It is possible to provide access to that information
across networks such as the internet.
But it is not possible to store all of the information on a single
compact device. It is not possible to provide automatic translations
that are anywhere near as good as translations performed by a human.
And most privately-held information is likely to remain private.
I'll break your question down into three parts. If I've missed
anything, or not addressed a part of your question in the way that you
intended, please post a request for clarification.
PART ONE: Is it conceivable to imagine a system that can become the
nucleus of all the information that is available on the internet, and
all information in private databases?
According to the Online Computer Library Center, there are 8.4 million
unique websites, a growth of 18% since last year. Of these, three
million are public, two million are private, and a further three
million are "provisional" or "under construction":
http://wcp.oclc.org/ (click on "Size and Growth")
It is quite practical to access (or "crawl") and index these public
websites, and web search engines do just that.
According to Inktomi, their index comprises 500 million web pages:
The Inktomi Difference
http://www.inktomi.com/products/web_search/difference.html
In addition to HTML web pages, Google indexes a range of other
document types, including PDF files and Microsoft Office and Corel
document formats. According to the Google home page,
Google indexes and searches "2,073,418,204 web pages":
http://www.google.com/
The Google index also includes over 700 million Usenet newsgroup
postings and 330 million images.
"Google Offers Immediate Access to 3 Billion Web Documents"
http://www.google.com/press/pressrel/3billion.html
To access and index such a large number of documents, and to allow the
index to be queried, obviously requires massive communications
bandwidth and computing power. For example, Google uses over 6,000
Red Hat Linux servers:
Interview with Google's Sergey Brin
http://www.linuxgazette.com/issue59/correa.html
It's not practical to access and re-index every document every day. So
search engines crawl the web periodically - typically every month or
so. Sometimes it is possible to determine which documents are likely
to change frequently, in which case those documents can be indexed
more often. According to the link referenced above, Google refreshes
"millions of web pages every day".
In your question, you mention information held in private databases.
It is hard to envisage a legislative or technological change that
would result in private databases being made centrally available, so
it seems inevitable that much of the world's private data will remain
private.
Some private data is marketable, and this could be made centrally
available if a suitable charging structure is in place. This already
happens with commercial data (such as the stock exchange prices that
can be obtained from Google), and with subscription-based access to
private data sold by content aggregators.
In addition to indexing the web, search engines can also store copies
of it. For example, Google caches copies of the documents that it
indexes, and can serve those copies to anyone who requests them. For
documents in formats other than HTML, an HTML version of the document
can be served if the user requests it.
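Again as a sketch only (not Google's actual mechanism), the caching idea
amounts to keeping the raw bytes of every page the crawler fetches,
keyed by URL, and handing that stored copy back on request:

    import urllib.request

    cache = {}  # URL -> raw bytes of the page as last fetched

    def fetch_and_cache(url):
        """Fetch a page and keep a copy of it."""
        with urllib.request.urlopen(url, timeout=10) as response:
            cache[url] = response.read()
        return cache[url]

    def serve_cached(url):
        """Serve the stored copy, even if the live page has since vanished."""
        return cache.get(url)  # None if the URL was never crawled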
So, the answer to part one of your question is that state-of-the-art
search engines can already be thought of as a nucleus for much of the
information that is available on the internet, and even for some
information from private databases.
PART TWO: Is it conceivable to imagine translating this information
from one language to another?
Certainly, the answer is "yes", although translation technology is
fairly rudimentary at present. Machine translation can give a rough
idea of the general content of a document, but it is rarely good
enough to make the translated document fully usable.
I'm guessing from your question that you may be fluent in Spanish, and
I have translated your question into Spanish to demonstrate the
limitations of automatic language translation. There are many online
translation services, and I used Free Translation for this example:
http://www.freetranslation.com/
Here's how this service translated your question. You will probably
agree that something has been lost in the translation:
¿Dada toda la información que está disponible en
el internet más todo que es contiene en los bancos
privados de la memoria que apreciaría saber si es
concebible a la imagen un sistema, el método o el
artefacto que pueden llegan a ser el núcleo de toda
esa información, lo traducen en un idioma y lo
hace disponible a algún usuario dado que tiene
tal artefacto?
Automated machine translation is already provided from within the
search results of search engines such as AltaVista and Google.
So, the answer to part two of your question is that it is quite
feasible to translate the information from one language to another -
but current technology doesn't do the job very well.
PART THREE: Is it conceivable to imagine a method or device that can
make all the information available to any given user having such
device?
It would be uneconomical for every person to have a device that holds
all of the documents. If we assume that each of the three billion
documents mentioned above averages 10 kilobytes (kB), we would need
around 30 terabytes (TB) of storage on each device. At current prices
of about $1 per gigabyte (GB), the storage would cost $30,000 per
device. Also, the device would fill a small room!
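For what it's worth, here is that back-of-envelope estimate as a few
lines of Python; the three billion documents, the 10 kB average size and
the $1 per GB price are the assumptions stated above, not measured
figures:

    documents = 3_000_000_000     # roughly three billion indexed documents
    avg_size_kb = 10              # assumed average document size, in kilobytes
    price_per_gb = 1.0            # assumed storage price, in dollars per gigabyte

    total_gb = documents * avg_size_kb / 1_000_000   # kilobytes -> gigabytes
    total_tb = total_gb / 1_000                      # gigabytes -> terabytes
    cost = total_gb * price_per_gb

    print(f"about {total_tb:.0f} TB of storage, costing about ${cost:,.0f}")
    # prints: about 30 TB of storage, costing about $30,000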
Luckily, we don't need to replicate all of the documents on every
device; we just need to make all of the documents accessible from
every device. We already have the technology to do this, by using an
internet-connected computer or even an internet-enabled mobile phone.
So, the answer to part three of the question is that once we have the
information, we are certainly able to make it available to any user.
To sum up: state-of-the-art search engines can already:
- index much of the information on the internet
- store copies of that information
- translate that information into other languages
- make that information available to anyone who
has a computer and internet access
Search engines already provide much of the functionality that you ask
about in your question. The remaining problems are how to increase the
amount of information that can be gathered, and how to improve the
quality of translation.
Earlier in this answer, I mentioned Ted Nelson's Project Xanadu. The
story of Ted's long struggle to implement his dream is told here:
Gary Wolf, "The Curse of Xanadu". Wired Magazine (June 1995)
http://www.wired.com/wired/archive/3.06/xanadu.html
Gary Wolf's article is fairly harsh, almost chastising Nelson for
having a dream so ambitious and all-encompassing that it will probably
never be implemented. In a posting to the C2 wiki site, Peter Merel
wrote:
"Xanadu was a good idea, but if you can't adapt your ideas to
circumstances, you can't get them to go any place no matter how good
they are."
Peter Merel, "The Curse of Xanadu" (comment). Online posting to:
http://c2.com/cgi/wiki?TheCurseOfXanadu
Peter Merel's comment seems to sum up the current situation. For the
foreseeable future, most of the world's online information will be
held on the World Wide Web, and any successful global repository of
knowledge must be built on top of what the web already provides, and
take account of the way the web already works.
Additional links:
Search Engine Watch
http://searchenginewatch.com/
C2 Wiki site (discussions by programmers about software)
http://c2.com/cgi/wiki?FindPage
Google search strategy:
"how big is the www"
http://www.google.com/search?q=%22how+big+is+the+www%22
+www "million pages"
http://www.google.com/search?q=%2Bwww+%22million+pages%22
"translation service"
http://www.google.com/search?q=%22translation+service%22
"project xanadu"
http://www.google.com/search?q=%22project+xanadu%22
I hope you find this information useful. If I have missed any
information that you were seeking, please ask for clarification.
Regards,
eiffel-ga |