Q: Legacy data migration QA plan (Answered, 3 Comments)
Question  
Subject: Legacy data migration QA plan
Category: Computers > Programming
Asked by: siam-ga
List Price: $100.00
Posted: 18 Mar 2003 17:00 PST
Expires: 17 Apr 2003 18:00 PDT
Question ID: 177921
Looking for sources on how to QA output of a parser that converts
monolithic legacy data files to individual XML snippets. The specs are
too involved to include here. Books or white papers that address this
problem in terms of QA planning and execution, how to sample the
output and assess satisfaction of requirements (aside from XML
validation) given a staggering variation of input data and loose
specs, would be appreciated. (We've got the development side covered).

Request for Question Clarification by mathtalk-ga on 21 Mar 2003 10:53 PST
Hi, siam-ga:

It might help to stimulate my brain-storming if I had a picture of
what the project team looks like, in a broad sense.  If you can't
thumbnail them for me, at least run through these questions in your
own mind.

The input data is described both as "monolithic legacy data files"
with "a staggering variation of input data" and as "ancient
typesetting markup".  Does this format have any more specific name? 
Is it a "homegrown" standard?  Can the conversion to XML be decomposed
into two parts, a "naive" conversion and one which does the "throwing
out a lot" part?

Who are the users/sponsors/domain experts that drive the
specifications and (ultimately) the signoff decisions?  How available
will they be to participate in the project (and how committed are they
to its "success")?

You mention that you have the development side covered, but it would
be helpful to know how large the development team is, what parsing
tools and version control is "standard" for the project, depth of
experience with the legacy application, and also how committed they
are to the project's success.

Finally I'm picturing you as the manager of the project, even though
this may be a simplification of the actual roles and responsibilities.
 I'd be trying to make suggestions about how to "version control" the
specs (and the deliverables) and to maintain a "project schedule" so
that you, the developers, the users, and management all have
"visibility" into the project's progress.

In that respect it would be good to know something about your style of
project management and what tools (e.g. MS Project) you are
comfortable with.

Here's my estimate of the relative efforts involved, assuming you are
starting with project specifications that reflect some serious
thinking about where you are and where you want to go:

- maintaining the project schedule and updating specifications (20%)

- actual software development (10%)

- quality assurance (code walkthroughs, output evaluation) (70%)

Assuming the project is "big" (big enough that lessons learned at the
first phases of implementation need to be remembered/assimilated into
later phases), a key driver of quality will be construction of an
automated test bed.  This is intended to supplement, not replace, the
70% effort mentioned above (human eyes on code and output).  An
automated test bed runs the current version of migration path on sets
of known inputs and compares the XML outputs to previously validated
XML outputs, and reports on any changes.
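
To make that concrete, here is a rough sketch of such a test bed in
Python.  It is purely illustrative: the directory layout and the
"parse_legacy" command line are placeholders for whatever tools your
shop actually uses.

import difflib
import subprocess
from pathlib import Path

INPUT_DIR = Path("testbed/inputs")     # known legacy input files
GOLDEN_DIR = Path("testbed/golden")    # previously validated XML output
CURRENT_DIR = Path("testbed/current")  # output of today's parser build

def run_parser(src, dest_dir):
    """Invoke the conversion tool on one input file (hypothetical CLI)."""
    out = dest_dir / (src.stem + ".xml")
    subprocess.run(["./parse_legacy", str(src), "-o", str(out)], check=True)
    return out

def compare(golden, current):
    """Return a unified diff; an empty list means no regression."""
    return list(difflib.unified_diff(
        golden.read_text().splitlines(),
        current.read_text().splitlines(),
        fromfile=str(golden), tofile=str(current), lineterm=""))

if __name__ == "__main__":
    CURRENT_DIR.mkdir(parents=True, exist_ok=True)
    failures = 0
    for src in sorted(INPUT_DIR.iterdir()):
        out = run_parser(src, CURRENT_DIR)
        diff = compare(GOLDEN_DIR / out.name, out)
        if diff:
            failures += 1
            print("\n".join(diff))
    print(failures, "file(s) changed since the last approved run")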

I'm very interested in the subject of your question, but unclear on
what level of detail is expected in relationship to the price offered.
 Perhaps if you review the pricing guidelines here, you'll be able to
clarify this expectation for me as well:

http://answers.google.com/answers/pricing.html  

regards, mathtalk-ga

Clarification of Question by siam-ga on 21 Mar 2003 12:59 PST
This is a proprietary format developed by a contractor who can no
longer be contacted. The format is documented, but the problem lies in
its misuse. Since the data was fed to a typesetter, emphasis was on
achieving visual correctness, that is, groupings of data were visually
understood, but not necessarily tagged as such. Apart from the obvious
need to infer and code this path from tagging to output, the reliance
on visual grouping is sometimes intuitive and not documented. For
example: a text paragraph may have different meanings (that is, type
of data) depending on what it follows. In many cases, the data can be
identified only by reading sequentially and in context. We are
converting this data to XML first, by identifying and naming the data
sections which were previously only partly tagged and partly
understood by placement. Hence the difficulty in automating the
validation.

In cases of tagged data, we follow the documentation and are prepared
for error checking in parsing and by validation of output. In cases of
inference, things get hairy and require client review to spot
problems, and yet, it seems that we have managed to cover a lot
correctly. Our budget does not allow for review of every single output
file and subsequent revisions. WRT naive conversion: this is something
I've considered, as proprietary tag to XML element mapping, but there
is so much depending on sequencing and context that I can't see a
two-way relationship. (performance and efficiency are not a concern at
this point)

The team: a producer, a part-time project manager, and a full-time
developer (myself). Additionally, we are geographically dispersed
and rely on a web-based version control and bug tracking system (I have
to refrain from mentioning specifics, but can say it's a Unix
environment and scripting is possible), and communicate via email and
phone conferences with the client and among us. For reasons I can't
elaborate on, much of the planning unexpectedly ended up on my plate.

Code walkthroughs: we have documented our conversion process as pseudo
code for our client to review and have discussed various points in it
via email and phone conferencing. These have been pretty effective and
valuable. The client and the rest of my team are not technical enough
to understand actual code and the question is, how to get more
specific with pseudo code that appears more confusing than the code
itself (heavy use of regex), and how to deal with the fact that only
the client is aware of many exceptions to the specs that were allowed
over the years and never put down in writing. These exceptions keep
popping up in conversations and feedback and were not documented
before.

Datasets and unit tests: I selected a large dataset, roughly 30% of
the output, to be reviewed. Each output file is defined as a test case.
In addition, I have requested that the client provide specially
authored data to use for unit testing specific modules in the parser.

QA: I have decided on daily "builds" that post daily parser runs of
the same dataset to our version control system, and a weekly milestone
build. I should say that we are not yet done with the development as
client requests and changes are still being implemented. The version
control system allows for visual diffs which I would like the client
to go through every week with respect to bug fixes. The client will be
doing QA and entering bugs through that interface. I'm not sure we have
the time for a scripted automated test bed, but it might be possible.
Again, there's only one developer juggling all of this. In addition:
the client wants to review final HTML output which will be generated
from the XML, but I feel it's much more important to look at the XML
output itself and leave transformation to smaller scale testing. How
should we divide attention between the two?

Domain experts: the client has two experts on the format and its
application, I am the domain expert on parsing and XML. Client's
domain experts also sign off results.

Availability: the level of commitment is unclear. The client says they
will be bringing in testers, but I've yet to see concrete plans.

Please let me know if the pricing is acceptable.

Request for Question Clarification by mathtalk-ga on 21 Mar 2003 14:10 PST
Hi, siam-ga:

Pricing is fine and makes sense to me.  I appreciate the detailed
information provided above.

One more question.  You mentioned roughly 3,000 documents to convert,
I think, although if I read carefully, it seems to be a count of
output documents.  Is there a definite count (either of input or
output) documents involved?  I can imagine two sorts of reasons why
there might not be.  First would be that the number of output
documents is not easily determined from the given input; the number of
outputs will only be known once all the input is processed
(hypothetical).  Second would be that even as you begin processing the
legacy data, more such "input" data continues to be created.  This is
a red flag for the project, if so.  The conversion project should be
undertaken with a clear destination, in this case a replacement format
(XML) which has already been validated for ongoing use.  Is that the
case here?

regards, mathtalk

Clarification of Question by siam-ga on 21 Mar 2003 16:43 PST
There are 63 source files, expected to spawn 2777 output files, and in
any case not many more than that.

One more thing: I'm considering adding interactive parser
functionality that will accept manual input if it can't recognize a
pattern despite exception handling. I am also considering having the
XML output documents edited manually if a problem is too specific to
warrant a parser change. In that case there would be some need to track
such changes. Is this something to be allowed, given that ongoing
revisions will require repeating these manual changes?
Answer  
Subject: Re: Legacy data migration QA plan
Answered By: mathtalk-ga on 31 Mar 2003 21:51 PST
 
Hi, siam-ga:

Your question asks about QA aspects of a legacy data migration
project, but nearly all aspects of a project's planning can directly
impact the QA tasks.  So I think it best to widen our discussion!!
:-)

CAVEAT
======

My understanding of your project is very incomplete, but from what you
have described, it seems to involve data cleanup at least as much as
pure data conversion.  You seem to emphasize the semantic content of
the data more than its presentation, perhaps more strongly than your
client would.  Many legacy "markup" conversion projects focus mainly
on presentation of results, and hence target PDF or HTML
output, as for example:

[Minnesota Local Research Board]
http://www.lrrb.gen.mn.us/Guidelines/Appendices/appendixB.asp

Caveat: Your project is targeting XML output, but I know little about
which application(s) will use this migrated data.  Normally the
target application would dictate a lot of things concerning the QA
process.  In the absence of knowing more about that, however, I'm
thinking of the results of the project as being targeted for various
potential (unwritten) future uses, making it in a narrow sense
something of a "data warehousing" or "data mining" project instead of
simply Web publishing.  [Something of that duality is inherent with
MathML, which allows either content orientation or presentation
orientation in representing mathematical formulas.]

CARDINAL PRINCIPLES
===================

I have two cardinal principles for data mapping projects, and I want
to throw these out there in advance of telling you exactly how to
apply them in your project:

1) Speak in complete sentences.

The idea is that the units of conversion should resemble standalone
statements, capable of being true or false (correct or incorrect) on
their own.  Of course this is not entirely the case, even in
mathematics.  There is always a context to what is being asserted. 
Nonetheless, in constructing your "XML snippets", be careful to avoid
fragmenting the source data beyond the point where it can no longer be
understood as "complete sentences"; such fragmentation is a red flag
that the converted data has lost its coherence.

2) Invent needed vocabulary.

Your description of the specifications process echoes experiences I've
had.  Apparently there exist many basic patterns for the conversion
"template" and probably even more "exceptions" to these patterns.  In
order to discuss the patterns and exceptions, and most importantly to
be able to write them into the project specifications, I'm guessing
that you will need to invent some new vocabulary.  The discussion of
critical issues can break down in the specification phase because the
same imprecise words get used to describe a variety of truly distinct
phenomena.  Sometimes this is fortuitous and leads to deep insights
into the similarity of tasks for the software to perform, but more
often than not it results in a false sense of confidence by the client
that difficulties have been ironed out.

A FEW WEB CITATIONS
===================

Okay, now that I've thrown out my two cents on the generalities, let
me present a few papers I found in searching around the Web.  While
none describes a situation exactly like yours, each struck me as
having some good ideas to contribute with respect to quality assurance
in data conversion projects.


First up is a white paper by Colin J. White of Database Associates:

[An Analysis-Led Approach to Data Warehouse Design and Development]
http://www3.newmediasales.com/dl/1755/Colin_White_Evoke_in_DWH_V2.pdf

This paper has absolutely nothing to do with XML but champions the
notion of using data quality to dictate the design of a "data warehouse".
 It presents some terminology that may be useful in selling your
project planning to the rest of the project team, such as the
importance of "staging areas" for data to minimize data quality and
data integration problems.  Note that this is "version 2" of his
paper, so apparently it made a good enough impression on the first
client he used it with to make it into a second version!


The second paper is by Robert Aydelotte:

[From EDI to XML]
http://www.posc.org/ebiz/xml_edi/edi2xml.html

The author describes project planning for converting "legacy" EDI data
formats into XML/edi (XML-based EDI) but doesn't go into detail about
test cases and QA.  However he gives a link to the ISIS European
XML/EDI Pilot Project, where many seminar presentations and other
project-specific documents are available.  This was sufficiently far
in the past that validation of XML was discussed solely in terms of
DTD's, but what I found most interesting in this material was the
discussion of "best pracices" for creating those DTD's.


Third is a paper by Shazia Akhtar, Ronan G. Reilly, and John Dunnion:

[Automating XML Mark-up]
http://www.nyu.edu/its/humanities/ach_allc2001/papers/akhtar/

which may provide some cogent ideas toward selection of test cases. 
They describe using the "self-organizing map" (SOM) learning algorithm
proposed by Kohonen to arrange documents in a two-dimensional map, so
that similar documents are located close to one another.  I was
thinking that this idea might be applied in your project to the
selection of test cases.  Supposing that the 2777 XML snippets were
mapped into a 2D diagram, selection of test cases from among these
could then be done so that a greater number of idiosyncratic documents
are chosen for critical examination (at the expense of using only a
relatively smaller number where very similar documents are densely
clustered).
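
If you wanted to experiment with that idea, a toy SOM takes only a few
dozen lines.  The sketch below (Python/NumPy, purely illustrative)
assumes each snippet has first been reduced to a numeric feature vector,
e.g. tag or pattern frequencies; random vectors stand in for those
features here.

import numpy as np

def train_som(vectors, grid=(10, 10), iterations=2000, lr0=0.5, sigma0=3.0):
    """Train a small self-organizing map; returns the grid of weight vectors."""
    rng = np.random.default_rng(0)
    weights = rng.random((grid[0], grid[1], vectors.shape[1]))
    gx, gy = np.meshgrid(np.arange(grid[0]), np.arange(grid[1]), indexing="ij")
    for t in range(iterations):
        lr = lr0 * np.exp(-t / iterations)        # decaying learning rate
        sigma = sigma0 * np.exp(-t / iterations)  # shrinking neighbourhood
        v = vectors[rng.integers(len(vectors))]
        # Best-matching unit: the grid cell whose weights are closest to v.
        dists = np.linalg.norm(weights - v, axis=2)
        bx, by = np.unravel_index(np.argmin(dists), grid)
        # Nudge every cell toward v, weighted by distance from the winner.
        influence = np.exp(-((gx - bx) ** 2 + (gy - by) ** 2) / (2 * sigma ** 2))
        weights += lr * influence[..., None] * (v - weights)
    return weights

def map_documents(vectors, weights):
    """Assign each document to its best-matching grid cell."""
    return [tuple(np.unravel_index(
                np.argmin(np.linalg.norm(weights - v, axis=2)),
                weights.shape[:2]))
            for v in vectors]

docs = np.random.rand(2777, 40)   # placeholder feature vectors
cells = map_documents(docs, train_som(docs))
# Sparsely populated cells point to idiosyncratic documents worth reviewing.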


STARTING OVER AGAIN
===================

Having hit all these fragmentary insights at the outset, let me back up
and divide the data migration process into three "quasi-sequential"
phases:

Data Cleanup (rectification of original data) 
Data Translation (data mapping and conversion)
Data Installation (provisioning of revised data to applications)

It would be nice if the three phases were truly sequential.  In
practice one allows a greater or smaller measure of parallel activity
across these phases for the sake of speedy deployment.  Understanding
the interactions is a key to minimizing cost and risk.

Data Cleanup
============

In a classic waterfall process for data migration, the data cleanup is
done on the frontend of the project.  Temptation to defer this upfront
"intellectual effort" to a later point in the project calls into
question the "integrity" of the conversion phase.  If the data is not
correct to begin with, how can a properly defined conversion process
produce correct output?  GIGO stands for "garbage in, garbage out", but
it could also mean "good-data in, good-data out".

In this particular project you've said that specifications exist for
the original "tagged" format of the data files.  That this data is
organized into 63 input files seems somewhat incidental to the
structure of the entities represented by that data.  As a conceptual
aid I'm thinking of those files as being somewhat like 63 tables in a
relational database (please feel free to speak up and give a better
description), guessing that each of the 2777 output files (XML
snippets?) would generically depend on the aggregate of all the input
files.

You've also indicated that these specifications were abused, that to
an extent old "markup" practices blurred the lines between content and
presentation.  For example, you suggest that semantic relationships
are sometimes "coded" merely as a pattern of contiguous presentation
(ordering) in the layout.

If it is meaningful to correct these "bad practices" in situ, then it
would be advantageous to do it before trying to convert them into
"proper" XML output.  For one thing it sounds as if the client has
more "resources" who understand the legacy format than who understand
the target XML format.

Of course advantage should be sought in using "tools" to assist in
this data cleanup, and it may be that the legacy format is simply too
"fragile" to support an aggressive cleanup effort.

Data Translation
================

I suggested decoupling the "conversion" into a "naive translation"
phase and a "forgetful" (throw stuff out) phase.  This avoids
confusing an intentional discarding of information, obsolete for
future purposes, with the quite opposite objective, to "add value" by
reconstructing explicit semantic relations from "implied patterns".

A naive translation phase would put the legacy data into a more robust
XML format, in which you can hope to leverage lots of existing tools
(version control, XSLT, schemas, etc.) that may have no useful
counterparts in the legacy format.  The "mission statement" for this
naive translation phase would be to provide XML tagging that
duplicates the existing data in a literal fashion, so that at least in
principle the legacy data could be fully reconstructed from this
intermediate form.

Note that XML/XPath does provide syntax for the ordering of sibling
nodes.  In this sense I'd hope that the "patterns" of implied
relationships could be as manifest in the naive XML translation as
they are in the legacy format.
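
As a purely illustrative sketch of what I mean by a naive translation
pass, the Python below wraps each legacy command and its text in an XML
element, in document order, so the original file could in principle be
regenerated.  The ".TAG value" line syntax is invented; substitute your
real proprietary markup.

import re
from xml.sax.saxutils import escape

LEGACY_LINE = re.compile(r"^\.(?P<tag>[A-Z]+)\s?(?P<rest>.*)$")

def naive_translate(lines):
    out = ["<legacy-doc>"]
    for line in lines:
        m = LEGACY_LINE.match(line)
        if m:
            tag, rest = m.group("tag").lower(), escape(m.group("rest"))
            out.append('  <cmd name="%s">%s</cmd>' % (tag, rest))
        else:
            # Untagged text is kept verbatim so nothing is lost in this pass.
            out.append("  <text>%s</text>" % escape(line))
    out.append("</legacy-doc>")
    return "\n".join(out)

sample = [".HD Chapter One", "Some body text that follows.", ".IT a note"]
print(naive_translate(sample))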

I'd anticipate that a number of issues with the original data would
not be fully recognized until the conversion phase was well along.
While it is currently hoped that many of the "exceptions" that are
recognized late in the game will somehow fit neatly into the
preconceived architecture of rules, it is prudent to plan for some
that will force changes to those rules.

Data Installation
=================

As previously mentioned, without knowing something about the target
applications, it's hard to discuss their relative importance in the QA
process.  You did mention in one clarification that the XML is to be
used to generate HTML, and that "the client wants to review final HTML
output" whereas you "feel it's much more important to look at the XML
output itself."  Given the greater insight you have into the HTML
output process than I have, I'm certainly willing to adopt your point
of view and consider the XML and its correctness as the focus of this
question.  It sounds as if the XML to HTML translation might be simply
a stylesheet transformation, although the designation of the XML
output files as "snippets" makes me suspect that a lot of "includes"
surround this process.


SPECIFIC QUESTIONS & ANSWERS
============================

Given this outline of the project, imperfect as only my imagination
can make it, we can at least recap the questions you raised and
discuss solutions:

1. What are some books and papers that address project planning for an
exercise like this?

This is the all-encompassing question asked in the original post. 
Project plans are a means to an end, not the end in themselves. 
Planning makes it more likely that you will reach the desired goal. 
As Gen. Dwight Eisenhower famously observed, while the plans for
battle are useless as soon as war begins, planning is indispensable.

You obviously have a good grip on the tools of Unix and XML, so I
won't try to drive the discussion of project planning down to a
technical level.  However here's a book on generic project planning
that I like:

Project Management: How to Plan and Manage Successful Projects
  by Joan Knutson and Ira Blitz
  
It's not extremely thick, about 200 pages, and I took a short course
out of it a few years back, sponsored by the American Management
Association.  One of the key points that I took away from that course
is that a project manager's role is that of facilitating, not doing
the project work.  I can't say that I ever took that lesson to heart,
because I'm the quintessential player-coach on a project team, but I
really do appreciate the contributions made by project managers who
take care of the issues log, updating the schedule, drawing feedback
from the clients, etc. without involving themselves in technical
accomplishments on a day-to-day basis.

For advice on software projects I can recommend the very readable
Peopleware by Tom DeMarco and Timothy Lister (2nd ed.).  I also find
food for thought in the eXtreme Programming (XP) series.  As a
starting point I'd read:

Extreme Programming Explained: Embrace Change
  by Kent Beck

2.  How should we sample the output and assess satisfaction of
requirements (aside from XML validation), given a staggering variation
of input data and loose specs?

I mentioned an idea above for using Kohonen's self-organizing map
(SOM) to assist in selection of the test cases.  You have obviously
had some discussions with the client about preparing artificial data
for use in unit testing, so clearly as you develop the conversion code
you are planning on stubbing out certain sections to allow for this
unit testing.

I might try using some "debug" code to profile which patterns are being
identified/applied and how often as your development code runs against
the entire input.  I'm unclear about whether the conversion will have
to take all 63 files simultaneously as input, or whether it's more a
matter of processing each one individually.  But in any case if you
can identify "natural" test cases for each code pathway, these will
naturally serve as good test cases for unit testing.  Asking the
client to make up data for the sake of unit testing seems to me to
carry some risk of wasted effort and even introduction of conversion
issues that never existed in the original data!  Just a thought
(probably a paranoid one!).
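
A sketch of what I mean by "debug" profiling, with hypothetical rule
names and handlers standing in for your real regex-based rules:

from collections import Counter

pattern_hits = Counter()

def apply_rules(fragment, rules):
    """Try each (name, regex, handler) rule in order; record which one fired."""
    for name, regex, handler in rules:
        m = regex.search(fragment)
        if m:
            pattern_hits[name] += 1
            return handler(m)
    pattern_hits["<no-match>"] += 1
    return None

def report():
    # Run this after processing all 63 input files.
    for name, count in pattern_hits.most_common():
        print("%8d  %s" % (count, name))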

Once the conversion software is completed enough to run in
"integration" mode, you will want to consult that "debug" log to see
what the main code pathways are, and what "test cases" are good
benchmarks (illustrate expected functionality) and what are open-issue
related.  I really feel that the automated testing suite is going to
provide value on this project, despite the additional effort required
of you, the lone developer.  A major headache with late changes to
specs or even with bug fixes is that the changes needed to add A or
resolve B wind up unexpectedly breaking C.  In my experience a test
harness always provides value because it's better to discover that
Murphy's Law has struck while the code changes are still fresh in your
mind.

So as a proxy for doing something clever with the SOM map, I'd suggest
using the "profiling" counts from the test harness to decide how to
sample test cases. As the client's experts report conversion issues,
meet with the project manager to decide how the issues need to be
logged, i.e. spec change vs. bug in the code.  Invent vocabulary as
needed to update the specs with clarity for all concerned.
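
Here is one way the sampling step might look (a Python sketch;
doc_pattern is a hypothetical map from each output file to the dominant
pattern that produced it):

import random
from collections import Counter

def sample_for_review(doc_pattern, budget):
    """Pick 'budget' files for client review, favoring rare patterns."""
    freq = Counter(doc_pattern.values())
    # Weighted sampling without replacement (Efraimidis-Spirakis keys):
    # key = u ** (1/weight); with weight = 1/frequency the exponent is just
    # the frequency, so common patterns tend to sort toward the bottom.
    keyed = sorted(doc_pattern,
                   key=lambda d: random.random() ** freq[doc_pattern[d]],
                   reverse=True)
    return keyed[:budget]

# e.g. picked = sample_for_review(doc_pattern, budget=833)  # ~30% of 2777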

3.  None of the project team except you is technical enough to
understand actual code.  How can the specs be made more specific
without pseudo code that appears more confusing than the code itself
(heavy use of regular expressions)?  How can exceptions to specs that
only the client is aware of be effectively documented (they keep
popping up in conversations)?

This is picking up where the last topic left off.  Pattern matching is
a key element of much declarative programming, but it can be tough
sledding to give it "literal translation" in the specs.  This is where
an astute use of jargon, specially invented for this project, can pay
off.  Give the patterns that need to be discussed in the specs
colorful, even semi-humorous names.  It makes them memorable and gives
the rest of the project team a feeling of belonging, of being "in on
the secret".  Give a fully blown definition of the pattern _once_ in
the appropriate section of the specs, but thereafter simply refer to
it by name.

Suppose (merely for sake of illustration) that in the documents
there's a typical pattern in which you have a section of BOLDED text,
followed by exactly three sections of fine print, followed by a
section in Italics.  Regardless of what the actual purpose of this
pattern is for the client's typesetting needs, you might aptly and
humorously refer to it as the Three Blind Mice template.  The lead
paragraph might be called the Farmer and the closing one, the Farmer's
Wife (since she "cuts off" the tail of the pattern).

Or, if someone on the project team fancies him- or herself a chess
aficionado, let them propose names like Queen's Gambit, etc.  It's a
chance for the non-technical but creative members of the project to
make an expressive connection to the nitty gritty details, and usually
enhances the commitment of the team as a whole to doing stuff the
right way, rather than just producing something "of the form".
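
To tie the naming idea back to the code, a pattern like Three Blind Mice
can carry the same name in the implementation, so spec discussions and
bug reports line up with the source.  The ".B/.FP/.I" markup below is
invented purely for illustration:

import re

THREE_BLIND_MICE = re.compile(
    r"\.B\s+(?P<farmer>.+)\n"         # the bolded lead ("Farmer")
    r"(?:\.FP\s+.+\n){3}"             # exactly three fine-print sections
    r"\.I\s+(?P<farmers_wife>.+)\n")  # the closing italics ("Farmer's Wife")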

For each section of the specs that defines a "pattern" you can have a
standard subsection that describes "known" or suspected exceptions. 
As the exceptions are more clearly identified and distinguished, some
of them are likely to evolve into subvarieties of "patterns", with
their own exceptions.  Listing the known exceptions can help the
project team to prioritize the evolution of new patterns based on the
depth and complexity of the existing patterns and exceptions.

I don't know what language you plan to implement with.  You mention
regular expressions (and a focus on correctness rather than speed),
which leads me to think of interpretive languages like perl or Awk.  I
prefer Prolog as a declarative language with strong pattern matching
features, but in working with XML source documents of course XSLT is a
natural choice.  But regardless of how pattern matching will be coded,
there needs to be an internally consistent vocabulary for all the
variations that the project team can buy into.

4.  The client wants to review final HTML output which will be
generated from the XML, but I feel it's much more important to look at
the XML output itself and leave transformation to smaller scale
testing. How should we divide attention between the two?

You have a clear instinct about this, which I would trust.  But I
think I'd try to adapt to the client's point of view in a way that
makes it seem as though they are winning the argument.  Specifically
I'm thinking of serving up the XML pages with a very thin stylesheet
transformation, which in the limiting case might be the default
stylesheet used by Internet Explorer to render generic XML.  If I knew
more about the target application, I might see more clearly what
incremental transforms might bridge the gap between the "raw" XML and
the ultimately desired HTML.  If you are the only developer, then I
guess you'd be in the best position to judge how to finesse the
differences.
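
One low-effort way to get that "thin" presentation is to point every
output file at a shared, nearly trivial stylesheet.  A sketch, where
"thin.xsl" is a hypothetical stylesheet that does little more than
pretty-print the tree:

from pathlib import Path

PI = '<?xml-stylesheet type="text/xsl" href="thin.xsl"?>\n'

def attach_stylesheet(xml_file):
    text = xml_file.read_text()
    if "xml-stylesheet" in text:
        return
    if text.startswith("<?xml"):
        # Keep the XML declaration first, insert the PI right after it.
        decl, rest = text.split("?>", 1)
        text = decl + "?>\n" + PI + rest.lstrip("\n")
    else:
        text = PI + text
    xml_file.write_text(text)

for f in Path("output").glob("*.xml"):
    attach_stylesheet(f)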

The presentation for testing will need to account for the size of the
output documents.  While "snippet" suggests a single page or so of
XML, this may be wishful thinking on my part.  If the documents are
really big, one might use an "outlining" stylesheet that allows for
"collapsible" sections of textual display to assist navigation within
the document.  This is something I should know more about than I do;
if it's of interest, then make a Request for Clarification (RFC) with
the button at top of my Answer, and I'll put a demo together for you.

5.  One more thing:  How about adding interactive parser functionality
that will accept manual input if it can't recognize a pattern despite
exception handling?   Or having the XML output documents edited
manually, if a problem is too specific to warrant a parser change? 
Should this be allowed, given that ongoing revisions will require
repeating this manual change?

Obviously allowing for an XML output document to be edited manually
wouldn't require much programming effort on your part, whereas the first
option sounds to my uneducated ear as if it would require a lot of
effort.  You can accomplish the revision tracking for output documents
more or less easily by logging them into a version control system. 
There are some issues with this.  You'll need to come up with a naming
convention for the output documents which reflects their "identity"
across changes in the parser, and I have no clue how this might be
done.  Also you'll need to come up with an extra "database" that
identifies which output documents are being treated as "manual"
exceptions, with the intention of "checking out" only those documents
prior to a run which are supposed to get automated treatment.  I don't
think those are insuperable obstacles, and in fact I think the
identification of the "exceptional" output documents ties in well with
what I suggested above about having "exception" subsections in the
specs.
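
A sketch of how that extra "database" might be kept as a simple manifest
file (the file names and paths are hypothetical):

import json
from pathlib import Path

MANIFEST = Path("manual_exceptions.json")   # {"file.xml": "reason / issue id"}

def load_manifest():
    return json.loads(MANIFEST.read_text()) if MANIFEST.exists() else {}

def mark_manual(filename, reason):
    manifest = load_manifest()
    manifest[filename] = reason
    MANIFEST.write_text(json.dumps(manifest, indent=2))

def outputs_to_regenerate(all_outputs):
    """Everything except the hand-edited exceptions gets re-parsed."""
    manual = set(load_manifest())
    return [f for f in all_outputs if Path(f).name not in manual]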

My only real objection to this sort of approach, which may be
pragmatically best, is that in principle one would prefer to do the
cleanup on the source data, rather than in ad hoc fashion as a
post-processing phase.

Perhaps for you the concept of interactively directing the parser has
a fairly immediate and easily implemented meaning, one that is more
restrictive than simply allowing the user to do whatever they please. 
One aspect of it that I'd drill down on is how the parser is to be
"interrupted" to allow manual interaction.  The exceptions are likely
to include not only documents that fail to match patterns, but also
documents that match patterns that were unintended.  In the latter
case it seems that it might be prohibitively slow to "set breakpoints"
in the software that asks a user to decide in each circumstance
whether to allow automated parsing to continue or to "interrupt" for
manual interaction.
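
One way to avoid breakpoint-style interruptions is to queue unmatched
fragments during a batch run and resolve them interactively (or by hand)
afterwards.  A rough sketch, with hypothetical rule tuples matching the
profiling example above:

review_queue = []

def convert_fragment(fragment, rules):
    for name, regex, handler in rules:
        m = regex.search(fragment)
        if m:
            return handler(m)
    # No rule matched: defer to a human instead of stopping the whole run.
    review_queue.append(fragment)
    return None

def resolve_interactively():
    for fragment in review_queue:
        print("Unrecognized input:\n" + fragment)
        replacement = input("Enter replacement XML (or blank to skip): ")
        # Record the replacement, ideally in the manual-exceptions manifest.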

CONCLUSIONS
===========

I've been off thinking and talking to myself about these ideas for too
long, but every time I went back to look over your notes in relation to
my ideas, I got the feeling that my ideas had at least partial
relevance to paths you'd already gone down.  I need a reality check,
so I'm putting what I've got together as well as I can tonight for you
to take a look, and I'm standing by for any further clarification!

regards, mathtalk-ga
Comments  
Subject: Re: Legacy data migration QA plan
From: mathtalk-ga on 18 Mar 2003 20:09 PST
 
Does it make sense to test the conversion by "round tripping" the
data, or a sample thereof?  That is, develop both the legacy-to-XML
and XML-to-legacy paths and compare the reconstructed data to the
original?

How much quality is required?

Does your "shop" do code walkthroughs?

regards, mathtalk
Subject: Re: Legacy data migration QA plan
From: siam-ga on 19 Mar 2003 00:27 PST
 
Impossible. We are moving from ancient typesetting markup that was
entered manually according to the best judgement of the editor at the
time, to XML and throwing out a lot in the process.

How much quality is a tricky question. The client seems to expect 100%
data integrity, and we are not sure how to define an error margin. We
have a data mapping spec that might be similar to a walk through, but
the exceptions and special cases are so numerous, that documenting all
of it seems more complicated than providing the code itself.

In a sense, we are going to treat our output documents as individual
test cases and possibly thoroughly validate 30% (out of about 3000
total). We don't know how to gauge if this is representative and on
what criteria to decide. The output documents are stored under version
control and are updated as "builds" that coincide with further
revisions of the parser, and data integrity problems found in them are
entered in a bug tracking system. This is a somewhat unusual approach
with regard to these systems. In addition to sources you could
recommend, an opinion on this process would be appreciated.
Subject: Re: Legacy data migration QA plan
From: mathtalk-ga on 21 Mar 2003 11:17 PST
 
Hi, siam-ga:

A code walkthrough is a structured presentation at which a developer
meets with a group of other project members and explains how the
developer's code under review meets the applicable specifications. 
The reviewing group may be structured in a manner that accords with
the project "culture" at your shop, e.g. including or excluding the
end users depending on their commitment and facility at understanding
the specifications' breadth and nuance.

The theory is to catch bugs in the code and/or gaps in the
specifications earlier in the software lifecycle.  The developer can
do the best job of trading off design decisions when presented with a
complete and accurate definition of requirements, yet in practice
those most responsible for articulating the requirements cannot fully
anticipate the wealth of special cases and exceptions without a body
of experience to judge by.

Hence a code walkthrough aims for a constructive meeting of the minds
instead of finger pointing.  You or a proxy will want to participate
in order to capture any nascent spec changes that emerge from the
discussion.

[Code Walkthrough Wiki]
http://c2.com/cgi/wiki?CodeWalkthrough

[Code Walkthrough Sample Checklist]
http://www.iseran.com/Win32/Articles/walkthrough.html

regards, mathtalk
