I am not a Google Researcher, but I would like to comment on your questions.
Let me introduce myself first. I have been working in the software
development industry for the last 8 years and have been part of many
commercial projects on different platforms. I have led projects and
served as an architect on several of them. Hopefully you will find
my answer practical and interesting.
Question:
=========
The software stack will use Apache, Tomcat, and MySQL on Linux. I
have software development experience, but not regarding clustering and
the resulting design considerations. I would like some advice on how
to design my software. I'm also not as familiar with Linux, so please
provide simple terminology.
Comment:
========
Architecture and design are the solution to the technical
and functional requirements of a project or problem.
For example, one of your client's requirements would be "cost
effectiveness of the solution", and in answer to that you chose this
reliable, free software for development and deployment.
What I want to say is that I can only advise a useful and
practical solution (in terms of architecture and design) if I have
more details of the project requirements. Discussing only the basic
idea is not enough, because there are always important
client concerns and constraints.
I could suggest many sites on architecture and
design (I am sure you have already searched for them), but in my opinion
they would not help you much. In my experience, each
custom project should be treated according to
its specific client requirements.
Question:
=========
Specifically, how can I store references to files (documents) so that
adding new servers (or drives) will be seamless? If I were storing
all documents on one server, every time a user adds a file, I would
upload the file to the user's directory, renaming the file to
something unique: /home/john-doe/uniquename 001.doc. In the
software, I'm planning for every user to have their own directory,
but if you have a better idea, I am open to ideas.
My problem: if I add a new server, how do I store which server the
user's files are on? Should I instead have one file server and add
drives (if so, how do I store which drive/partition a user is located
on - this may be a dumb Linux question, but again, I'm used to c:\ and
d:\ on Windows)?
Answer:
=======
Consider these scenarios:
1) One server machine with one drive
2) One server machine with multiple drives
3) Multiple server machines with one drive each
4) Multiple server machines with multiple drives
In my experience, to address all four scenarios above, the following should be done:
1) The software to be developed should be able to register a server machine
with the following parameters:
(1) Server machine name/IP (so that it can be accessed)
(2) Admin ID to access all the drives on the server machine
(3) Admin password to access all the drives on the server machine
(4) Drives, each with a symbol (e.g. C: or D:) and a name (e.g. my local
drive)
2) The software should be able to register all the server machines with
the information for their available drives
3) All of this information should go into a database for reference
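To make the registration idea concrete, here is a minimal Java sketch of the data it describes. All class and field names are my own assumptions, and the in-memory map only stands in for the database table; a real system would persist these rows (for example in MySQL) and would not keep the admin password in plain text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// One drive on a registered server: symbol (e.g. "E:") and display name.
final class DriveInfo {
    final String symbol;
    final String name;

    DriveInfo(String symbol, String name) {
        this.symbol = symbol;
        this.name = name;
    }
}

// One registered server machine with the four parameters listed above.
final class ServerRegistration {
    final String host;           // machine name or IP
    final String adminId;
    final String adminPassword;  // plain field for illustration only
    final List<DriveInfo> drives;

    ServerRegistration(String host, String adminId, String adminPassword,
                       List<DriveInfo> drives) {
        this.host = host;
        this.adminId = adminId;
        this.adminPassword = adminPassword;
        this.drives = Collections.unmodifiableList(new ArrayList<>(drives));
    }
}

// In-memory stand-in for the database table of registered servers (assumption).
final class ServerRegistry {
    private final Map<String, ServerRegistration> byHost = new LinkedHashMap<>();

    void register(ServerRegistration server) {
        byHost.put(server.host, server);
    }

    ServerRegistration find(String host) {
        return byHost.get(host);  // null when the host was never registered
    }

    List<ServerRegistration> all() {
        return new ArrayList<>(byHost.values());
    }

    public static void main(String[] args) {
        ServerRegistry registry = new ServerRegistry();
        registry.register(new ServerRegistration("Server4", "admin", "secret",
                Arrays.asList(new DriveInfo("E:", "data drive"))));
        System.out.println(registry.find("Server4").drives.get(0).symbol);
    }
}
```

Adding a new server or drive then becomes a matter of adding one more registration row, without touching the rest of the software.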
Now, when a user submits a file for storage, the software should check
the available space on each server, one by one and drive by drive, and
store the file on a server and drive with enough free space.
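The selection step could look roughly like the following Java sketch. It is a simple "first fit" walk over the registered drives; the class names and the idea of passing the free space in as a number are my assumptions. On a real system the free space might be read with java.io.File.getUsableSpace() on each mounted drive.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// A drive on a registered server plus its currently free space (hypothetical type).
final class DriveCandidate {
    final String server;
    final String drive;
    final long freeBytes;

    DriveCandidate(String server, String drive, long freeBytes) {
        this.server = server;
        this.drive = drive;
        this.freeBytes = freeBytes;
    }
}

final class DrivePicker {
    // Walk the drives in registration order and return the first one
    // that has room for the file; empty when no drive has enough space.
    static Optional<DriveCandidate> firstFit(List<DriveCandidate> candidates,
                                             long fileSizeBytes) {
        for (DriveCandidate candidate : candidates) {
            if (candidate.freeBytes >= fileSizeBytes) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<DriveCandidate> drives = Arrays.asList(
                new DriveCandidate("Server1", "C:", 10L),
                new DriveCandidate("Server4", "E:", 5_000L));
        Optional<DriveCandidate> pick = firstFit(drives, 1_000L);
        System.out.println(pick.isPresent()
                ? pick.get().server + " " + pick.get().drive
                : "no space left, register another server or drive");
    }
}
```

When the result is empty, that is the signal to add and register a new server or drive.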
If a file "FirstFile.Doc" is saved to server "Server4" on the "E:" drive
for application "DocumentsDataStore" by user "Josh", then the record
stored in the database can be as follows:
Server4 Machine Address
File Path
File unique id (generated by system)
File saved by
File name
File description
Etc
Path could be:
"\\Server4\E$\DocumentsDataStore\Josh\FirstFile.Doc"
Whenever the user needs to open a stored file, the software accesses that
file via the path above. Since a database table
records each user's file paths along with access rights information,
users can reach their files through the stored
information.
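As an illustration, the stored record and the path built from it might be modelled like this in Java. The class name, the field names, and the use of a random UUID for the system-generated id are my assumptions. The path follows the Windows-style share form used in the example above; on Linux the server and drive parts would more likely map to a mount point.

```java
import java.util.UUID;

// One row of the (hypothetical) file table described above.
final class StoredFileRecord {
    final String fileId;       // unique id generated by the system
    final String server;       // e.g. "Server4"
    final String drive;        // e.g. "E:"
    final String application;  // e.g. "DocumentsDataStore"
    final String savedBy;      // e.g. "Josh"
    final String fileName;     // e.g. "FirstFile.Doc"
    final String description;

    StoredFileRecord(String server, String drive, String application,
                     String savedBy, String fileName, String description) {
        this.fileId = UUID.randomUUID().toString();
        this.server = server;
        this.drive = drive;
        this.application = application;
        this.savedBy = savedBy;
        this.fileName = fileName;
        this.description = description;
    }

    // Build the share path: \\server\E$\application\user\file
    String uncPath() {
        String share = drive.replace(":", "$");  // "E:" becomes "E$"
        return "\\\\" + server + "\\" + share + "\\" + application
                + "\\" + savedBy + "\\" + fileName;
    }

    public static void main(String[] args) {
        StoredFileRecord record = new StoredFileRecord("Server4", "E:",
                "DocumentsDataStore", "Josh", "FirstFile.Doc", "first upload");
        System.out.println(record.uncPath());
    }
}
```

Because the record keeps server and drive as separate fields, files can later be moved to another machine by updating the row instead of changing the software.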
In Java terms, servlet functions can
retrieve and store files on the registered servers.
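A rough sketch of the store and retrieve step, using only the standard java.nio.file API so that it runs on its own. In a servlet the content would arrive as the upload's input stream; the FileStore class, its method names, and the idea of one root directory per registered drive are my assumptions, and user and file names are not validated here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Saves and loads files under one root directory, which stands in for
// a single registered drive or share (hypothetical helper).
final class FileStore {
    private final Path root;

    FileStore(Path root) {
        this.root = root;
    }

    // Copy the uploaded stream to root/user/fileName and return the stored path.
    Path save(String user, String fileName, InputStream content) throws IOException {
        Path userDir = root.resolve(user);
        Files.createDirectories(userDir);  // no error if it already exists
        Path target = userDir.resolve(fileName);
        Files.copy(content, target, StandardCopyOption.REPLACE_EXISTING);
        return target;
    }

    // Read a stored file back using the path recorded in the database.
    byte[] load(Path storedPath) throws IOException {
        return Files.readAllBytes(storedPath);
    }

    public static void main(String[] args) throws IOException {
        FileStore store = new FileStore(Files.createTempDirectory("store"));
        byte[] data = "hello".getBytes(StandardCharsets.UTF_8);
        Path saved = store.save("Josh", "FirstFile.Doc", new ByteArrayInputStream(data));
        System.out.println(saved + " holds " + store.load(saved).length + " bytes");
    }
}
```

The path returned by save is what would be written to the file table, so that later requests can find the file again.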
Question:
=========
In summary, I'm looking for a solid answer regarding a scalable
architecture and software design that would accommodate a growing user
and document base.
Also, I would like the names and contact information for people/groups
that I may hire to help me implement these suggestions (e.g.
consulting groups specializing in building Java-based scalable
software). Focus should be on true scalability and cost effectiveness
- starting out with one server, building into multiple. I don't want
to spend thousands of dollars per month (I'm hosting with Rack space)
starting out, but would be willing to if the client base required it.
Answer:
=======
You can find a solid answer (technical solution) only if you provide
details of the requirements and go through the proper software development
phases. For this project I suggest you stick to the following software
development life cycle.
Development Approach:
In my opinion, every project takes its due time to become stable no matter
what process or technique you adopt, and wrong project
management techniques often fail the project and frustrate the
stakeholders. In my experience, one should follow the rules of the
Software Development Life Cycle strictly; this makes life easier for
each party and keeps the quality of the deliverables high at each
stage. I suggest you follow the software life cycle below
strictly.
Requirement Analysis:
Analysis experts should perform thorough research to develop your
business idea further and prepare detailed requirement specifications.
After reading this document you should know the exact details of the
application features. You should approve the document only if it satisfies
your requirements. It should be updated against your comments
until you are satisfied.
Architecture Definition:
Based on the requirement analysis, an architecture should be adopted,
including details of the software and hardware used for the
project, along with other details.
Prototyping:
A graphic designer should prepare a comprehensive prototype for the
requirements, and that prototype should also be sent to you for
functional and graphics approval. You should approve the prototype only
if it satisfies your requirements. It should be updated against
your comments until you are satisfied.
Design Phase:
Have the whole application designed before any code is
written. Design your application and have a third party perform multiple
technical reviews to enhance the quality of the
application. Design is the most important step, and it is often skipped
on small projects, which causes many problems later.
Coding Phase:
Code your application and have multiple technical reviews performed to
enhance the quality of the application.
Testing Phase:
Perform thorough functional and technical testing of the application
to ensure stability in the end product.
If I can help you further, I would be happy to do so.
Regards,
Tobascus