Google Answers: Generating xml from relational data

View Question

Q: Generating xml from relational data ( Answered 4 out of 5 stars

, 0 Comments )

Question

Subject: Generating xml from relational data
Category: Computers
Asked by: mssmb-ga
List Price: $50.00

Posted: 19 Dec 2002 01:10 PST
Expires: 18 Jan 2003 01:10 PST
Question ID: 126816

I want algorithm (steps) to generate xml from relational data?
and why we need this transforming(from relational database to XML)?
(XML with DTD)

Can you help me?

Request for Question Clarification by joseleon-ga on 19 Dec 2002 06:39 PST

Hello, mssmb:
  I can help you, but I will need more information:

What kind data do you want to convert? (i.e User information)
How many tables do you want to convert? (i.e 300)
Which kind of database server? (i.e Oracle) (Maybe the server you are
using can export to XML automatically)
How many people need access to that data? (i.e 500)
In which environment will you use this data? (i.e Intranet, Extranet,
Internet)

By answering these questions I will be able to tell you whether you
need to convert your data or not. Also you say something about DTD,
that DTD will be dependant from the datasource and table structure you
want to convert.

Please, post here any additional information you have.

Regards.

Clarification of Question by mssmb-ga on 19 Dec 2002 11:46 PST

Hi joseleon-ga
I'm talking in general. I mean any kind of data and from any database server.
I want the algorithm that work in different system (the steps).
And about DTD, I mean the generation XML use DTD not XML Schema.
I hope what I say is clear.

Answer

Subject: Re: Generating xml from relational data
Answered By: mathtalk-ga on 20 Dec 2002 23:12 PST
Rated: 4 out of 5 stars

Hi, mssmb-ga: Your excellent question has two parts, which I will take the liberty of rephrasing: 1. How does one convert data from a relational database into XML? 2. Why would it be important or useful to do this? Your wording, "algorithm (steps) to generate xml from relational data," tells me that you want a concrete answer to the first part of this question, which I will endeavor to provide. But the second part of your question reminds us that to do it well, "how" we convert the data depends on "why" we need to convert it. Therefore my approach to answering will be a little different than usual. I want to begin with a brief summary of relational databases and XML representations of data, and then give a simple but explicit example of converting from the former to the latter. In describing this example I will point out the "manual" steps involved in conversion. What follows will depend to an extent on your feedback. I do not know how familiar you are with either relational databases or XML. Assuming you have some significant experience with each, I will proceed to give a more complex example which illustrates one way in which client-server applications commonly use such a conversion. Then I will enumerate a number of toolsets or "frameworks" which are available for automating most of the conversion, commenting on which of these either ignore, require, or supply a DTD corresponding to the XML output validation. My hope is that by putting before you an assortment of mechanisms for conversion, I can give a more satisfactory answer to the second part of your question. The different frameworks for generating XML from relational data have many similarities, but they expose also an assortment of purposes for these conversions, and thus how the "algorithms" of conversion may be attuned to these various purposes. [SUMMARY OF RELATIONAL DATABASES AND XML REPRESENTATIONS] For a comparatively long time, relational databases have dominated managing "ownership" of data above a certain threshold of complexity. Relational databases exploit the "table" as a primitive data structure. Along with a few other concepts like keys and indexes, tables represent "business rules" as constraints to the valid data for an application. Essentially from the outset of the RDBMS "theory", Codd and Date advocated "normalization" as a set of guidelines/prescriptions for good design of relational databases. XML is a comparatively recent text-based formatting standard for data documents, aimed at simplifying the transmission or exchange of data between disparate architectures (data feeds). As such XML is both a flexible and regularly structured format. The XML standard has given rise to numerous derivative standards, such as XSLT, XPath, and Schema, and has insinuated itself into almost every aspect of modern application development, from Web site management to program documentation. This "surprising" success is due in part to an abstract "tree" model underlying XML (see Don Box's book, Essential XML), which is counterpoint to the flattened "table" designs of RDBMS. However XML lacks any "philosophy" of normalization comparable to what has been prescribed for relational databases, and designing XML schemes remains a much more unstructured art than designing relational database schemes. Thus converting from relational (table-based) data to an XML (tree-based) format occasions a certain amount of choices, potentially informed by our grasp of the ends to be served by this conversion. There are always so many possible ways to accomplish this, that one might wish for a "magic wand" (framework) that takes some or all of the decision making out of our hands. Let's begin our progress towards a discussion of these "magic wands" with a simple example in which the choices are easily made. [EXAMPLE 1] Suppose we have a single relational database table EMPLOYEE, defined as follows: CREATE table EMPLOYEE ( empid char(6) NOT NULL, firstname varchar(15) NOT NULL, middleinitial varchar(2) NULL, lastname varchar(20) NOT NULL, suffix varchar(3) NULL ) and a pair of records stored within as follows: INSERT into EMPLOYEE ( empid, firstname, middleinitial, lastname, suffix ) VALUES ('100000','Thomas','A.','Edison','Sr.') INSERT into EMPLOYEE ( empid, firstname, middleinitial, lastname, suffix ) VALUES ('100001','Bill',null,'Gates',null) [XML DOCUMENT REQUIREMENTS] To express these records' data in an XML format, we must produce a well-formed XML document that conserves the "complete sentence" semantics entailed by the business rules underlying the table. Let us assume, for the sake of illustration here, that field empid is the primary key on this table. Every XML document _should_ begin with an XML declaration, minimally like this: <?xml version="1.0"?> Fortunately the folks at W3C did such a good job with the basic XML recommendation, there hasn't been any newer version of XML after 1.0 (although much has been built atop this foundation). Other than the version parameter (which is required in any XML declaration), the XML 1.0 recommendation allows two optional parameters (encoding and standalone). But the above is fine for our immediate purposes. The rest of our XML document, its "body", will consist of a suitably nested arrangement of "elements". The W3C recommendation _allows_ for certain additional kinds of contents (comments, processor instructions, and CDATA sections), but for the sake of simplicity we will not use any of those constructs in this example. An XML element consists of an opening and a closing "tag", the contents of the element being everything in between. Tags are bracketed by an opening "<" and a closing ">", and the name of an element (a text identifier) is the first thing one finds after the opening "<". In addition to the element's name and its contents (which may be comprised of character data and/or a further arrangement of nested "child" elements), an element may possess attributes. The attributes are specified within the element's opening tag, separated by whitespace from the name and from each other. An empty element (one without contents) may be abbreviated into a single tag that is both opening and closing, as we shall illustrate below. [CHOOSE A NAME FOR THE DOCUMENT ELEMENT] It is required that an XML document have a single element at the root of its hierarchy of elements. So the first choice we'll make here is what to name this "document element". Under the circumstances (describing a collection of EMPLOYEE records), I would choose to name it "employees" (plural, hinting that the contents will consists of a repeating group of "employee" elements). So let's add a minimal opening and closing tag for our document element, giving the result: <?xml version="1.0"?> <employees> </employees> In itself that would be a well-formed XML document. However we must still provide further contents to reflect the records held in the relational database table. As foreshadowed above, I plan to layout one "employee" element for each of the two records in the EMPLOYEE table. That brings us to the next choices to be made. [DELINEATE THE ATTRIBUTES AND CONTENTS OF THE DOCUMENT ELEMENT] Since the contents of distinct fields on the record are separate, we should preserve that separation in the way we layout these <employee> elements. We could keep these values separate by either representing them as child elements or as attributes of <employee>. In the first case we might do something like this with subelements: <employee> <empid>100000</empid> <firstname>Thomas</firstname> <middleinitial>A.</middleinitial> <lastname>Edison</lastname> <suffix>Sr.</suffix> </employee> while an alternative approach using only attributes might look this way: <employee empid="100000" firstname="Thomas" middleinitial="A." lastname="Edison" suffix="Sr."> </employee> Notice that in this last approach the <employee> element is empty. In such a case one can "abbreviate" the opening and closing tags into a single tag as follows: <employee empid="100000" firstname="Thomas" middleinitial="A." lastname="Edison" suffix="Sr." /> Applications which parse the XML should treat these last two alternatives equivalently. Deciding whether values are to be represented as elements or as attributes is not always an obvious choice. Both extremes (all child elements with no attributes, versus empty elements characterized solely by attributes at some depth in the hierachy) have proponents. However I would choose in this case a combination approach: <employee empid="100000"> <firstname>Thomas</firstname> <middleinitial>A.</middleinitial> <lastname>Edison</lastname> <suffix>Sr.</suffix> </employee> The reasoning behind my choice is that the field empid has a distinguished role, as the primary key, in uniquely identifying the individual employee. Therefore it is attractive to link it closely with the <employee> element. Also it is important to note that the key field empid appears to be a "synthetic" sort of value, without useful substructure. This is important when deciding if a value ought to be an attribute, because while elements can be nested in a a manner that reflects substructure (or grouped in a manner that reflects a common semantic purpose, as we might do here by combining all the other values as child elements under an <empname>...</empname> construct), there is no nesting or grouping of attributes. Here then is my completed conversion to an XML document of this relational data: <?xml version="1.0"?> <employees> <employee empid="100000"> <firstname>Thomas</firstname> <middleinitial>A.</middleinitial> <lastname>Edison</lastname> <suffix>Sr.</suffix> </employee> <employee empid="100001"> <firstname>Bill</firstname> <middleinitial /> <lastname>Gates</lastname> <suffix /> </employee> </employees> There are several ways in which this example was especially simple. Not only were we treating a single table or view (or more generally a single recordset; stored procedures may return multiple recordsets), the records were few and the values were all naturally mapped into character form. In particular we did not bump into issues about representing floating point numbers or dates in character form. Some closely related issues would surface if we delved into what aspects of this converted XML document might be "validated" under a DTD. [PROCEEDING RECURSIVELY] But the example does point the way toward an important aspect of the general conversion process, and that is recursion. Here we began by naming the document element and delineating its contents. Actually we slid very subtly into performing these same activities on the next level deeper of child elements, progressing from <employees> layout to <employee> layout. A similar pattern is applicable to any XML conversion scheme: define the mapping at the level of the document element, and from there repeat the process with each child of those elements already mapped. It is a classic instance of definition by "induction", with the basis step formed naturally by the mapping of the document element. [FEEDBACK] As proposed at the outset, I would like to "pause" here to collect feedback from you. Have I told you already more than you ever wanted to know about XML and relational databases? Or have I told you only things which were already well known to you? Have I started us along a worthwhile track to a deeper discussion (as I proposed, a more real world example of conversion together with a survey of "automated" conversion frameworks)? Or have I begun the discussion in a way that either misses your concerns altogether or "passes overhead at a high rate of speed"? In short, should I proceed apace, backtrack to fill in gaps, or quit and give another researcher a crack at your question? regards, mathtalk-ga
Request for Answer Clarification by mssmb-ga on 21 Dec 2002 22:43 PST Hi mathtalk-ga, Thanks and gratitude for your assistant and this wonderful response. I hope that you continues the answer � because I believe that the other example is what I search for it. I have a good experience in relational databases and XML to understand the examples. In your answer there are some information that I have known it. I hope the next example will be validated under a DTD. Thank you again.
Clarification of Answer by mathtalk-ga on 23 Dec 2002 21:34 PST Hi, mssmb-ga: Thanks for the kind words and encouragement. You have a good background in XML and relational databases, and your focus seems to be on how DTDs can be used to validate XML documents that represent relational data. I think it best to revisit DTD concepts first in the context of the earlier example. [Example 1 DTD] Before proceeding to our "real world" example, let's elaborate the first XML document with a DTD (Document Type Definition). The DTD syntax, while also "inherited" from SGML, is quite separate from what we have used so far. See Secs. 3 and 4 of the W3C XML 1.0 Recommendation here: http://www.w3.org/TR/REC-xml The purpose of the DTD, as you know, is to define a particular class or "type" of XML document. The DTD can then be used by an XML processor to "validate" whether a XML document instance conforms to these rules. Any DTD must be referenced in the XML document after the XML declaration and before the document element. The DTD may have a part "internal" to the XML document and another part "external" to it. The usual syntax matches a DOCTYPE identifier to the document element name in this fashion: <?xml version="1.0"> <!DOCTYPE myDocElementName SYSTEM "http://www.myWebName.com/myDtdName.dtd"> <myDocElementName> . . . </myDocElementName> This external form of the DOCTYPE declaration uses a URL "location" to identify where the DTD file can be found. The internal portion of the DTD follows the external SYSTEM "..." reference. DTD declarations in the internal portion can then override any similar DTD declarations in the external portion. The external portion may be omitted if only the internal portion is needed. We illustrate this below for the sake of having a self-contained presentation. Recall our first example: <?xml version="1.0"?> <employees> <employee empid="100000"> <firstname>Thomas</firstname> <middleinitial>A.</middleinitial> <lastname>Edison</lastname> <suffix>Sr.</suffix> </employee> <employee empid="100001"> <firstname>Bill</firstname> <middleinitial /> <lastname>Gates</lastname> <suffix /> </employee> </employees> Since our document element is called "employees", our DOCTYPE declaration will wrap various DTD declarations (more precisely, "markup" declarations if you read the W3C specs) inside a pair of square brackets: <!DOCTYPE employees [ . . . ] > By convention the various DTD declarations are listed on separate lines (as are the opening and closing square brackets) for legibility. There are four essentially kinds of DTD declarations, not counting comments: ELEMENT, ATTLIST, ENTITY, and NOTATION. We will only be concerned here with the first two of these. The last two are useful in "parameterizing" an XML document by tying in macros and (externally defined) XML or non-XML content. Coordinating IGNORE and INCLUDE directives with ENTITY declarations allows one to create "conditional" declarations, but this is outside the present scope of our discussion. An ELEMENT declaration which looks like this: <!ELEMENT myElementName ... > constrains the content of the element <myElementName> by either a content category or a content model. The syntax options for the first case are quite limited; one has two choices: <!ELEMENT myElementName ANY > <!ELEMENT myElementName EMPTY > It's really all or nothing with these content categories. The ANY option allows all combinations whatsoever of text content and child elements (including no text content and/or no child elements). The EMPTY option requires no text content and no child elements. A content model allows more flexibility, at least in terms of specifying the varieties of child elements permitted. In our example we have <employee> elements. These always contain just these four child elements exactly once in the given order: firstname, middleinitial, lastname, suffix. The ELEMENT syntax that expresses this is: <!ELEMENT employee (firstname,middleinitial,lastname,suffix) > This is an example of a "sequence list" of child elements. Let's back up from the <employee> element to its parent in this example, the document element <employees>. The particular instance of the document type might be validated with this declaration: <!ELEMENT employees (employee,employee) > But this is inadequate. It asserts exactly two occurrences of <employee>, no more and no less, and we know from the intended application to retrieving records from a relational database that one should contemplate a variable number of records, including the possibility of no records returned. The ELEMENT syntax that expresses this is: <!ELEMENT employees (employee)* > The * is called a cardinality operator, and its sense is that zero or more instances are allowed. Therefore * has the meaning of "optional element(s)". For the sake of completeness we mention that ? has the sense of zero or one instance, hence meaning "an optional (singular) element", and that + has the sense of one or more instances, meaning "required element(s)". Finally we would simply like to say that the child elements of <employee> each have text-only content: <!ELEMENT firstname (#PCDATA) > <!ELEMENT middleinitial (#PCDATA) > <!ELEMENT lastname (#PCDATA) > <!ELEMENT suffix (#PCDATA) > A couple of words are in order about how the nullable fields EMPLOYEE.middleinitial and EMPLOYEE.suffix have been mapped into our XML document. Authors (such as C.J. Date) have observed that "null" as a value is open to a variety of interpretations, e.g. a nonexistent value, an accidentally missing value, a truly unknown value, a circumstantially inapplicable value, a "none of the above" optional value, etc. The handling of nulls is therefore dependent on "business rules" that exists outside of database design and contents per se. Here I have chosen to treat the null values for middle initial and suffixes in a manner similar to how they would be treated if instead a zero length text string were supplied. I have included <middleinitial> and <suffix> elements which happen to be empty. This is allowed, incidentally, by the #PCDATA characterization; one doesn't need to add a cardinality operator to indicate that the (necessarily singular) text contents are optional (no text amounts to zero length text). An alternative approach, one that would distinguish a "null" value from one which is an "empty" string, would be to omit the corresonding elements from the XML document when a field is populated with a null. In our case this would mean changing the ELEMENT declaration for <employee> to: <!ELEMENT employee (firstname,middleinitial?,lastname,suffix?) > so that the child elements <middleinitial> and <suffix> of <employee> can appear zero or one times. But that is not how I chose to map the elements here. Again this is something of a special case because we are mapping character strings as text content. Other sorts of values, like numbers or dates, would naturally be considered "nulls" if the corresponding element were empty of text content. This completes the descriptions of element contents, but one important item remains: the attributes! In our example only the <employee> element was assigned an attribute, but it was a critical one, the primary key EMPLOYEE.empid. The syntax used to describe the attributes of an element is like this: <!ATTLIST myElementName . . . > in which the individual lines (as formatted here) will describe attribute names, their types, and how to treat their default values (if any). Without going into complete detail about all possibilities, the "natural" way to describe a primary key associated "uniquely" with a record from a relational database is with an attribute of type ID, and which (from a "default" behavior standpoint) is REQUIRED. Hence we would like to say: <!ATTLIST employee empid ID #REQUIRED > because the ID attribute type means specifically that this attribute's value is to be unique within instances of this element (in the particular XML document). Unfortunately there is a problem with this approach. The problem is that the values of ID attributes must be legal XML identifiers. While XML identifiers can contain both alphabetic and numeric characters, there is a restriction on the first characters of legal XML identifiers. It cannot be a number. However an underscore is a valid first character of XML identifiers, so "mangling" of the empid values would provide one workaround. There are other attribute types for which the EMPLOYEE.empid values would qualify without mangling, such as CDATA or NMTOKEN, but these would not carry any sense of uniqueness. Further, as we shall see with the more complex example to come, the ID attributes are useful in linking data from one element type to another as IDREF attributes come into play. This is analogous to primary key/foreign key relationships in a relational database design. For now I will take the "underscore mangling" approach in order to capture the semantics of uniqueness. The completed example is then: <?xml version="1.0"?> <!DOCTYPE employees [ <!ELEMENT firstname (#PCDATA) > <!ELEMENT middleinitial (#PCDATA) > <!ELEMENT lastname (#PCDATA) > <!ELEMENT suffix (#PCDATA) > <!ELEMENT employee (firstname,middleinitial,lastname,suffix) > <!ATTLIST employee empid ID #REQUIRED > <!ELEMENT employees (employee)* > ] > <employees> <employee empid="_100000"> <firstname>Thomas</firstname> <middleinitial>A.</middleinitial> <lastname>Edison</lastname> <suffix>Sr.</suffix> </employee> <employee empid="_100001"> <firstname>Bill</firstname> <middleinitial /> <lastname>Gates</lastname> <suffix /> </employee> </employees> Additional Links: [W3C Recommendation: XML Information Set] (or XML InfoSet, abstraction of XML tree structure) http://www.w3.org/TR/xml-infoset/ [W3C Candidate Recommendation: XML 1.1] (formerly XML Blueberry, mainly character set & name issues) http://www.w3.org/TR/xml11/
Clarification of Answer by mathtalk-ga on 23 Dec 2002 22:06 PST Just a quick note about an "online" XML validating parser, here: http://www.stg.brown.edu/cgi-bin/xmlvalid/xmlvalid.pl The Web page has a text box where you can paste a small XML document, then click the "Validate" button. I did this with the example 1 with DTD text above, and although it gave warnings about the EMPTY content for <middleinitial> and <suffix>, the bottom line is the document validates OK. regards, mathtalk-ga
Clarification of Answer by mathtalk-ga on 26 Dec 2002 19:17 PST Even though you have already rated my answer, I'd like to add a few more words for the sake of completeness about _why_ one might convert relational data to XML. First, even though the table structure of a relational database is hard to beat for managing _all_ of a dataset, when the application calls for information from a variety of tables about a single entity (e.g. an employee's service, benefit options, elections, and beneficiary relationships), it may be much more useful to present this "star cluster" of data as a hierarchy (tree) rooted in the employee, rather than as a superficially disjointed collection of records (as would appear from simply parsing a sequence of SQL queries). Second, it can be helpful to use XML as a transmission format, esp. over a transport such as http:, because this is generally compatible with firewall restrictions. Third, XML can serve a common language when data is to be extracted from multiple relational databases. Oracle, SQL Server, and MySQL all have "native" capability to present query results as XML. Even a relatively primitive DBMS such as Visual FoxPro can be "frontended" with an XML formatting agent. The adoption of XML Schema as a W3C Recommendation has (despite your focus on DTDs) answered some critical concerns about standardization of datatypes. If you are interested in further details about available software frameworks to automate general relational datasets to XML, please post a new question. regards, mathtalk-ga

mssmb-ga rated this answer: 4 out of 5 stars

A good answer

Comments

There are no comments at this time.

Important Disclaimer: Answers and comments provided on Google Answers are general information, and are not intended to substitute for informed professional medical, psychiatric, psychological, tax, legal, investment, accounting, or other professional advice. Google does not endorse, and expressly disclaims liability for any product, manufacturer, distributor, service or service provider mentioned or any opinion expressed in answers or comments. Please read carefully the Google Answers Terms of Service.

If you feel that you have found inappropriate content, please let us know by emailing us at answers-support@google.com with the question ID listed above. Thank you.

Search Google Answers for

Google Home - Answers FAQ - Terms of Service - Privacy Policy