This project has two parts to it. The first part must work
independently of the second part. I need the first part of the
project to be done as soon as possible, preferably by Sunday 11/02/03.
The first part is not too hard, and should not take much time for an
experienced programmer. The second part incorporates the first part
and can be done within the next two weeks. Part I is fairly simple,
but part II might take a little longer. There might also be a part III
to this project. I will increase the compensation accordingly. I have
a similar project completed, with the only difference being the
database used. If it helps, I can post the specifications and code
for that project. I would do this myself, but between two jobs and
kids I have no time!! Below are the specifications of part I of the
project. Thank you for taking the time to read my question. The code
must work on a Sun Solaris workstation.
Project to be completed:
This lab looks at the way information is stored in fields and records.
Your program will read a source file from a real life application,
extract the fields and print the result.
2. Specification of the source data
We will extract data from the file
http://astrogeology.usgs.gov/Projects/VenusImpactCraters/venuscraters.csv,
which is found on the web page
http://astrogeology.usgs.gov/Projects/VenusImpactCraters/venus_download.html.
The data file we will use is the CSV (comma delimited) format file.
However, we won't use the whole file, just a subset of the records in
the file. It is important that you follow directions!
3. Specification of the Problem
a. Your code should read in the input data, which as you will see has
variable length fields.
b. Your code should extract each field individually and write it to
the screen, and to disk.
c. The fields written should be fixed length fields. Thus it may be
necessary to truncate some fields.
d. The primary key is the crater name. We will assume there are no
duplicates. Some craters have no name. These records should be
omitted. Note that some crater names include a blank or hyphen.
e. At the same time you should construct a data record (structure)
consisting of fixed length fields.
f. I think that there are a maximum of 12 fields, but I may be wrong.
I am basing my conclusion on the Excel version of the file. The Excel
version also gives the meaning of each field, which we will need for
the next part of the project.
g. Most records do not have all the fields. Null values should be
provided for omitted fields.
h. Some fields are enclosed in double quotation marks. I suppose these
to be text fields, and the commas inside the quotation marks should
not be considered delimiters.
i. The fields Lat., Long., Dia. and Crater Elev. should be saved as
type double.
j. The record should be written to a file called datafile.txt.
k. The records should be written one record at a time (not field by
field).
l. The required format for output to the screen is
i. Sequence number (RRN)
ii. Primary key
iii. All other fields in the record in the same order as in the data
file.
m. The output for one record should fit on one line. Fields should be
separated by '|' character.
n. The format for the output file should be similar, but omit the RRN
and '|'.
o. Your executable file should be named showdata.
p. There should be no restriction on the number of records processed.
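Items c, g and h above can be sketched in C as follows. This is only a rough illustration, not the required solution: the function name split_csv and the constant FIELD_WIDTH are made up for the example, every field is given the same width for simplicity, and MAX_FIELDS is taken from the guess of 12 fields in item f. Doubled quotation marks inside a quoted field are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_FIELDS 12   /* assumption: item f guesses at most 12 fields */
#define FIELD_WIDTH 16  /* hypothetical fixed width used for every field */

/*
 * split_csv: split one CSV line into fixed-length fields.
 * Input:  line   - NUL-terminated text with the end-of-line bytes removed
 *         fields - array that receives up to MAX_FIELDS truncated fields
 * Output: number of fields found in the line (capped at MAX_FIELDS).
 *         Fields that are missing from the line are left as empty
 *         strings, which serve as the null values.
 */
int split_csv(const char *line, char fields[MAX_FIELDS][FIELD_WIDTH + 1])
{
    int count = 0;      /* index of the field being filled */
    size_t len = 0;     /* characters stored in that field so far */
    int in_quotes = 0;  /* nonzero while inside "..." */
    const char *p;

    assert(line != NULL);
    memset(fields, 0, MAX_FIELDS * (FIELD_WIDTH + 1));

    for (p = line; *p != '\0' && count < MAX_FIELDS; p++) {
        if (*p == '"') {
            in_quotes = !in_quotes;      /* the quote marks are dropped */
        } else if (*p == ',' && !in_quotes) {
            count++;                     /* real delimiter: next field */
            len = 0;
        } else if (len < FIELD_WIDTH) {
            fields[count][len++] = *p;   /* excess characters are truncated */
        }
    }
    return count < MAX_FIELDS ? count + 1 : MAX_FIELDS;
}
```

A record with an empty first field (no crater name) can then be skipped by the caller before anything is printed or written.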
4. Specification of the user interface
a. The file name should be taken from the command line.
b. The command to produce the correct result should be showdata
rawdata.txt, if the input file is named rawdata.txt.
5. Implementation constraints
a. Construct a subset of about 50 lines from the source file to test
your code. You must follow this specification, because that is how the
file used to test your program will be generated. You may not change
the input format. I will make my own selection of lines to run my own
test. I will probably choose some of the more "interesting" lines.
b. Your input should be done in binary ("raw") mode.
c. Design for re-use.
d. The fields should be output as fixed length fields and the
beginning and end of the field should be clearly shown as described
above.
e. If you develop your code in Windows, you should allow for either
Windows or UNIX type text files. The difference is in how the end of a
line is represented. Windows uses the two bytes 0x0D 0x0A (CR LF), UNIX
just 0x0A (LF).
f. You should submit source code and a makefile.
g. The code should be neatly formatted. There should be at least two
lines between major blocks of code or functions. A function should fit
on a single page. Some logical scheme of indentation should be used.
h. Comments should appear at the beginning of each function explaining
what the inputs and outputs of the function are and what operation is
performed by the function. Each paragraph of code should also have
comments describing its operation. Comments should also be used to
elaborate on the description of the data where it would be useful to
the reader.
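For constraint e, one possible way to accept both kinds of text file when reading in binary mode is to strip the end-of-line bytes after each line is read. A Windows line ends in CR LF (0x0D 0x0A) and a UNIX line in LF (0x0A) alone. The function name chomp_eol is made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * chomp_eol: remove a trailing UNIX (LF) or Windows (CR LF) line ending.
 * Input:  buf - NUL-terminated line exactly as read in binary mode
 * Output: the new length of buf; buf is modified in place.
 */
size_t chomp_eol(char *buf)
{
    size_t len;

    assert(buf != NULL);
    len = strlen(buf);

    /* drop the LF first, then the CR that precedes it on Windows files */
    if (len > 0 && buf[len - 1] == '\n')
        buf[--len] = '\0';
    if (len > 0 && buf[len - 1] == '\r')
        buf[--len] = '\0';

    return len;
}
```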
Here are the specifications of part 2 for this project.
PART 2:
I need to build an indexed data storage and retrieval system for
the data used in part 1. This will include a batch load function (from
a text file), an index display function, an interactive search feature
and delete. You need to use an Avail list for deleted records.
The easiest way to implement these functions is to use one program for
each function. The programs should know the name of the file
structure. The only thing that needs to be passed to the program on
the shell command line is the argument for the command:
$ load data.txt /* create the file structure */
$ list /* outputs a list of all records */
$ search Serena /* program displays remaining args */
$ add Serena /* program prompts for remaining args */
$ delete Serena /* program displays, waits for yes */
Give some thought to sharing code among these programs. Other
commands may also occur to you during implementation and verification.
Part 3 (if I need it to be done) will repeat the same problem using a
B-tree index. Structure your code so that the index functions are
separate from the user interface and can easily be replaced!
Specification of the problem
a. A program load should translate a text file containing the crater
records to a file structure of your design. You should have two files:
a database file and a primary index. You have already solved the problem
of creating the data file in part 1. In this part, we add the index
function. (For part 3, your index file will change but you should be
able to use the same database file structure.) The source file format
is the same as for part 1. Records with missing primary key should be
omitted.
b. A program search should allow the user to find a record with a
given primary key and display the record's content.
c. A program list should display all records in order by the primary
key.
The display should show the entire primary key, followed by the other
fields, which may be truncated. Each record should fit on a single line,
and there should be no blank lines in between. This format must be
followed!
A program add should allow the user to add a new record to the file
structure. This program should prompt for the other fields. Since we
are going to add records, your index should have more space allocated
to it than just what is necessary for the source text records. The
function should check for duplicate keys and output an error message
if encountered.
d. A program delete should allow the user to delete a record with a
user supplied primary key.
e. The delete function should place the deleted data record on an
Avail list. The Avail list should be kept in the deleted records in
the data file. The head pointer for the Avail list should be kept in
the header of the index file. Subsequent adds should re-use deleted
data records for new data.
Specification of the source data
a. You should use the same source data file as in part 1. The index
needs to be large enough to hold 50+ entries. That is, the initial file
should have 50 data records, plus some additions will be made after
that.
Specification of the user interface
a. Each command that may be performed on the file structure should be
implemented in a separate program.
b. Each program should know the name of the file containing the
records in the file structure.
c. The key field should be taken from the command line.
d. Some of the keys contain blanks. This must be supported without
requiring any unusual actions from the user.
e. Each program should open the file structure, perform a single
operation, and close the file structure.
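For item d, one way to support keys with blanks without asking the user to type quotation marks is to join all remaining command line arguments with single blanks. The sketch below assumes a made-up helper name build_key and a hypothetical key width of 20. Note that runs of several blanks typed by the user collapse to one, because the shell has already split the arguments.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define KEY_WIDTH 20  /* hypothetical fixed width of the primary key */

/*
 * build_key: join argv[first] .. argv[argc-1] into one key.
 * Input:  argc, argv - as passed to main
 *         first      - index of the first argument that belongs to the key
 *         key        - buffer of KEY_WIDTH + 1 bytes for the result
 * Output: length of the key; text beyond KEY_WIDTH is truncated.
 */
size_t build_key(int argc, char *argv[], int first, char key[KEY_WIDTH + 1])
{
    size_t len = 0;
    int i;

    assert(first >= 1);

    for (i = first; i < argc; i++) {
        const char *p = argv[i];

        /* one blank between words, none before the first word */
        if (i > first && len < KEY_WIDTH)
            key[len++] = ' ';
        while (*p != '\0' && len < KEY_WIDTH)
            key[len++] = *p++;
    }
    key[len] = '\0';
    return len;
}
```

Called from main as build_key(argc, argv, 1, key), a command with a two-word key on the command line would produce a single key containing one blank.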
Implementation constraints
a. Each program should perform the action required and report the
results to the standard output.
b. Transfers to or from the database file should be implemented only
with open(...), read(...), write(...) and close(...) functions. This
is the same way that data records should have been written in part 1.
There are no constraints on how the index file is read or written.
c. The file structure transfer block size should be the size of four
records, for transfers to and from the database file. The file size
must be a multiple of this value. Only blocks corresponding to RRNs
which are a multiple of 4 should be accessed.
d. The functions that read and write the database should print out the
block accessed and also the record accessed within the block ONLY for
functions that access only a single record. (i.e., not for the list
function.)
e. The search for an item should be performed using a binary search in
a sorted index.
f. The binary search function should print out the locations examined
by the search.
g. The index file should be created by the load function and should be
stored in order by primary key.
h. You may select your own format for storing the index file.
i. Additions and deletions should be implemented by moving index
entries appropriately and storing a sorted index.
j. The avail list should be kept in the deleted data records. The key
should be set to a particular value, and the next field in the data
record should contain the RRN of the next list item. The remainder of
the data field should be set to blanks. The head of the avail list
should be kept in the header record of the index file.
k. Code may be written in either C or C++.
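The avail list described in item j could look roughly like the sketch below. Everything specific in it is an assumption made for illustration: the marker value "*DELETED*", the record size of 64 bytes, the key width of 20, the 10-character text field that holds the link, and the value -1 for "end of list". The functions work on a record buffer only; reading and writing that buffer to the database file is left to the block functions.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define RECORD_SIZE 64            /* hypothetical fixed record length */
#define KEY_WIDTH 20              /* hypothetical width of the key field */
#define LINK_WIDTH 10             /* hypothetical width of the link field */
#define DELETED_KEY "*DELETED*"   /* hypothetical marker for deleted records */
#define NO_RRN (-1L)              /* hypothetical end-of-list value */

/*
 * avail_push: turn a record into an avail list node and make it the head.
 * Input:  record - buffer of RECORD_SIZE bytes holding the deleted record
 *         rrn    - RRN of that record in the database file
 *         head   - current head of the avail list (kept in the index header)
 * Output: record is overwritten (marker key, link, blanks), *head = rrn.
 */
void avail_push(char *record, long rrn, long *head)
{
    char link[LINK_WIDTH + 1];

    assert(record != NULL && head != NULL);
    memset(record, ' ', RECORD_SIZE);                  /* blank the record */
    memcpy(record, DELETED_KEY, strlen(DELETED_KEY));  /* marker key */
    snprintf(link, sizeof link, "%*ld", LINK_WIDTH, *head);
    memcpy(record + KEY_WIDTH, link, LINK_WIDTH);      /* RRN of next node */
    *head = rrn;
}

/*
 * avail_next: read the link stored in a deleted record.
 * Input:  record - buffer holding an avail list node
 * Output: RRN of the next node, or NO_RRN at the end of the list.
 */
long avail_next(const char *record)
{
    char link[LINK_WIDTH + 1];

    assert(record != NULL);
    memcpy(link, record + KEY_WIDTH, LINK_WIDTH);
    link[LINK_WIDTH] = '\0';
    return strtol(link, NULL, 10);
}
```

An add would then take the head RRN if it is not NO_RRN, read that record, set the head to avail_next of it, and write the new data into the freed slot.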
Suggested structure and required output
a. Reading and writing data records.
The block size is sufficient for 4 records. Only blocks corresponding
to RRNs which are a multiple of 4 should be accessed, and the
appropriate record should then be extracted. For read, your blockread
function is given an RRN and returns a pointer to the appropriate
record in the buffer. For blockwrite, the function is given an RRN and
a pointer to a record which is to be copied into the buffer. Every
write also involves a read, since three of the four records in the
block remain unchanged. The functions should print a brief descriptive
message of the operation, including block numbers (RRN of the start of
the block and RRN of the accessed record) whenever a block is read or
written, ONLY for functions that access only a single record. (i.e.,
not for the list function.)
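A minimal sketch of the two block functions is given below, assuming a hypothetical record size of 64 bytes, a single shared buffer, and a verbose flag that the caller clears for the list program. It uses lseek for positioning, which the constraints do not list next to open, read, write and close, so treat that as an assumption. Growing the file by a new block is not handled here.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define RECORD_SIZE 64        /* hypothetical fixed record length in bytes */
#define RECORDS_PER_BLOCK 4   /* from the spec: a block holds four records */
#define BLOCK_SIZE (RECORD_SIZE * RECORDS_PER_BLOCK)

static char block_buf[BLOCK_SIZE];  /* single shared transfer buffer */

/*
 * blockread: read the block that contains record rrn.
 * Input:  fd - open descriptor of the database file
 *         rrn - record wanted; verbose - nonzero to print the trace
 * Output: pointer to the record inside the buffer, or NULL on error.
 */
char *blockread(int fd, long rrn, int verbose)
{
    long start;

    assert(rrn >= 0);
    start = rrn - rrn % RECORDS_PER_BLOCK;   /* RRN at start of block */
    if (lseek(fd, (off_t)start * RECORD_SIZE, SEEK_SET) == (off_t)-1)
        return NULL;
    if (read(fd, block_buf, BLOCK_SIZE) != BLOCK_SIZE)
        return NULL;
    if (verbose)
        printf("read block at RRN %ld, record %ld\n", start, rrn);
    return block_buf + (rrn - start) * RECORD_SIZE;
}

/*
 * blockwrite: replace record rrn, leaving its three neighbours unchanged.
 * Input:  fd, rrn, verbose - as for blockread
 *         record - RECORD_SIZE bytes to store
 * Output: 0 on success, -1 on error.
 */
int blockwrite(int fd, long rrn, const char *record, int verbose)
{
    long start = rrn - rrn % RECORDS_PER_BLOCK;
    char *slot = blockread(fd, rrn, 0);      /* every write starts with a read */

    if (slot == NULL)
        return -1;
    memcpy(slot, record, RECORD_SIZE);
    if (lseek(fd, (off_t)start * RECORD_SIZE, SEEK_SET) == (off_t)-1)
        return -1;
    if (write(fd, block_buf, BLOCK_SIZE) != BLOCK_SIZE)
        return -1;
    if (verbose)
        printf("wrote block at RRN %ld, record %ld\n", start, rrn);
    return 0;
}
```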
b. Binary search in index
Given a key, the function returns two values, a location in the index
and a Boolean value indicating whether the key was found in the index.
The binary search should print each index location accessed (on one
line, please!).
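The binary search in item b might be sketched like this. The struct layout, the key width of 20 and the name index_search are assumptions for the example, since the index format is left open. The Boolean is the return value and the location comes back through a pointer; when the key is not found, the location is where the key would have to be inserted.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define KEY_WIDTH 20  /* hypothetical fixed width of the primary key */

/* one entry of the in-memory index (hypothetical layout) */
struct index_entry {
    char key[KEY_WIDTH + 1];  /* crater name, NUL-terminated */
    long rrn;                 /* record number in the database file */
};

/*
 * index_search: binary search for key in an index sorted by key.
 * Input:  idx, count - the index and its number of entries
 *         key        - primary key to look for
 * Output: returns 1 if found, 0 if not; *loc is the position of the
 *         key, or the position where it should be inserted.
 *         The locations examined are printed on one line.
 */
int index_search(const struct index_entry *idx, int count,
                 const char *key, int *loc)
{
    int lo = 0;
    int hi = count - 1;

    assert(key != NULL && loc != NULL);
    printf("search:");
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        int cmp = strcmp(key, idx[mid].key);

        printf(" %d", mid);          /* trace each location examined */
        if (cmp == 0) {
            printf("\n");
            *loc = mid;
            return 1;
        }
        if (cmp < 0)
            hi = mid - 1;
        else
            lo = mid + 1;
    }
    printf("\n");
    *loc = lo;                       /* insertion point for a new key */
    return 0;
}
```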
c. Insert into index
Since the binary search has already found the correct location, this
value can be passed to the indexinsert function along with a key and
RRN. Following entries in the index are moved down to make room for
the new entry. The function should print the range of index entries
which are being moved (first and last only).
d. Delete from index
Since the binary search has already found the correct location, this
value can be passed to the indexdelete function. Following entries in
the index are moved back to remove the index entry. The function
should print the range of index entries which are being moved (first
and last only).
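Items c and d can be sketched with memmove on an in-memory array of index entries. The struct layout, KEY_WIDTH and INDEX_CAPACITY are assumptions made for the example, and the wording of the printed messages is only a suggestion. The loc argument is the location returned by the binary search.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define KEY_WIDTH 20        /* hypothetical fixed width of the primary key */
#define INDEX_CAPACITY 64   /* hypothetical: room for the 50+ entries */

/* one entry of the in-memory index (hypothetical layout) */
struct index_entry {
    char key[KEY_WIDTH + 1];
    long rrn;
};

/*
 * indexinsert: insert (key, rrn) at position loc, keeping the index sorted.
 * Input:  idx, count - the index and its entry count (updated)
 *         loc        - insertion point found by the binary search
 * Output: 0 on success, -1 if the index is full.
 */
int indexinsert(struct index_entry *idx, int *count, int loc,
                const char *key, long rrn)
{
    assert(loc >= 0 && loc <= *count);
    if (*count >= INDEX_CAPACITY)
        return -1;
    if (loc < *count) {
        /* move the following entries down by one to make room */
        printf("insert: moving entries %d to %d\n", loc, *count - 1);
        memmove(&idx[loc + 1], &idx[loc],
                (size_t)(*count - loc) * sizeof idx[0]);
    }
    strncpy(idx[loc].key, key, KEY_WIDTH);
    idx[loc].key[KEY_WIDTH] = '\0';
    idx[loc].rrn = rrn;
    (*count)++;
    return 0;
}

/*
 * indexdelete: remove the entry at position loc.
 * Input:  idx, count - the index and its entry count (updated)
 *         loc        - position found by the binary search
 * Output: none; the following entries are moved back by one.
 */
void indexdelete(struct index_entry *idx, int *count, int loc)
{
    assert(loc >= 0 && loc < *count);
    if (loc < *count - 1) {
        printf("delete: moving entries %d to %d\n", loc + 1, *count - 1);
        memmove(&idx[loc], &idx[loc + 1],
                (size_t)(*count - loc - 1) * sizeof idx[0]);
    }
    (*count)--;
}
```

Keeping these two functions free of any file or screen handling other than the trace line should make it easier to swap in a different index for part 3.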
I suggest that common functions should be placed in an include file.
a. I need the source code and a single makefile that will create all
executables.
b. A neatly formatted code listing should be submitted, using the
following guidelines. Unreadable code will be penalized.
o At least two lines between major blocks of code.
o The entire body of a function should take up no more than one page.
o Comments should appear at the beginning of each function explaining
what is to be done by the function.