Chapter 1 Goals and motivations
1.1 Why query XML Documents ?
When we speak about queries and query languages, we often refer
to database technology. The well known SQL, for Structured Query
Language, has become a standard for relational databases
management systems. For the more modern object database
technology, a query language, namely OQL has been defined.
The purpose of a query language is to offer a declarative
means to obtain data. The query language is a tool to describe
the data you want to retrieve and manipulate, and it is the
role of the underlying system to find the best (e.g. efficient)
way to compute the answer.
Traditional database systems are appropriate tools for
data that can fit into a rigid schema. But there exist a lot of
information that can hardly benefit from being stored into
structured repositories. We are speaking here about documents,
like for instance TeX, SGML or HTML files, and also less
structured data typically exchanged via data exchange formats
(e.g ASN.1). Anyway, it is often very useful to
be able to query large sets of such data to fulfill specific
needs. In the Web context, search engines
provide such facilities. Roughly speaking, we can say that
in particular, documents were until recently mostly queried using
full-text indexes, which do not take into account the internal structure of
documents. Although quite sophisticated query engines for
structured documents have existed for some time, they have not
been a mainstream application.
Being optimistic, we can say that XML brings together the best
of these two worlds. Indeed, XML documents are structured
documents. They blur the distinction between data and documents,
allowing documents to be treated as data sources, and traditional
data sources to be treated as documents. Some XML documents are
nothing more than an ASCII representation of data that might
traditionally have been stored in a database. Others are
documents that contain very little structure. Still others are
somewhere in between, like for instance dictionaries or
technical manuals. As more and
more information is either stored in XML, exchanged in XML, or
presented as XML through various interfaces, the ability to
query XML data sources becomes increasingly
important.
1.2 Why X-OQL ?
There is an active debate about what a query language for
XML should be. Several proposals were presented at a W3C workshop on query
languages for XML [Mar98]. The Query Language Working Group at the W3C is
still working on the standard query language for XML.
Those proposals can be divided into two groups.
On one hand, we
have languages coming from the semistructured data field
(see [Abi97, ABS99] ). Since XML can be seen as a semistructured
data model, languages created for querying semistructured data
can be seen as query languages for XML (see for instance
[GMW99]). Lorel[AQM+97], XML-QL[DFF+98] and
YATL[CDSS98] belong to this family. In these languages, a
query consists of three parts:
- a pattern clause, which matches nested elements in
the input document;
- a filter clause, which tests the bound variables;
- a constructor clause, which specifies the result in
terms of the bound variables.
The information passed between these
clauses can be modeled as a relation. The separation
between these three clauses is clear in a language like Lorel,
which uses a familiar select-from-where statement. The select statement corresponds to the constructor clause, the
from statement is the pattern clause and the where
statement represents the filter clause. X-OQL belongs to this category.
On the other hand, proposals like XQL [RLS98], XQuery
[DeR98] or XPath [CD99]
provide extensions to the XSL pattern syntax. Such language can
be used to filter the nodes of interest into a document. They
bring XSL-T, the transformation language of XSL [Dea99], to a full
fledged pattern matching language.
We will not discuss here the respective advantages and drawbacks
of these two approachs, but the question remains. Why X-OQL ?
In the context of the
Active Views project [AAC+99, AAA+99], a year ago, we needed a working
implementation of a query language for XML. Since there was at
this time no consensus about the incoming standard (which is
still the case), we choosed to
create our own query language. In order to ensure a good
modularity of the Active Views prototype, we also decided to
implement it on top the Document Object Model, the standard
application interface for XML documents. Concerning the design of
the language, we were rather conservative and kept,
as in Lorel, the spirit of OQL, the standard query language for
objects databases [CBB+97].
I was in charge of the so called X-OQL part of Active Views.
At this time, I thought I would have fun doing it. I happened to
be wrong. I had a lot of fun. So you should be aware that the work around
X-OQL is certainly not to be compared to official proposals to
the W3C! It is more a research prototype than a strong
proposal. It certainly lacks essential features, as compared to
more polished works. Anyway, it often works as it should do.
We will in the sequel give a quick overview of XML
[BPSM98] and the
Document Object Model (DOM) [WG98]. Those two sections
try to provide
guidelines to readers new to XML technologies. It is strongly
recommended to refer to official specifications Web pages (see
[BPSM98, WG98]) for more information. We then expose
briefly the design of the language and its implementation on top
of DOM.
1.3 XML
XML means eXtensible Markup Language. It is a new standard
adopted by the World
Wide Web Consortium (W3C) to ease data exchange on the World Wild Web. Like HTML, XML makes use of tags (words
bracketed by < and >) and attributes (of the
form name="value"), but while HTML specifies what each tag
and attribute means (and often how the text between them will
look in a browser), XML does not impose a fixed a-priori set of
tags, and, generally speaking, uses the tags only to delimit pieces of
data, leaving the interpretation of the data completely to the
application that reads it.
1.3.1 Basic syntax
Elements
Recall that XML is a textual representation of data. The basic component in
XML is the element. An element is a piece of text bounded
by matching tags. Consider for instance the following XML
element:
<book>
<title> Heart of Drakness </title>
<author> Joseph Conrad </author>
</book>
Inside an element we may have raw tex, other elements, a
mixture of the two, or nothing. An expression such as <book> is called
a start-tag and </book> an end-tag. Such tags must be
balanced, ie they should be closed in inverse order they are
opened. This means that overlapping tags, like
<p> <b> some text </p> some text </b>
are not allowed in an XML document.
Tags in XML are defined by users. There are no predefined tags as
in HTML.
Finally, XML has a useful abbreviation for empty elements. The
following <book></book> can be abbreviated to
<book/>.
Attributes
XML allows to associate attributes with
elements. Attributes are name = "value" pairs that can appear
inside start-tags. In the example below an attribute is used to
specify the currency of a price element:
<book>
<title> Heart of Drakness </title>
<author> Joseph Conrad </author>
<price currency = "euro"> 20.9 </price>
</book>
As with tags, users may define arbitrary attributes. The value of
an attribute is always a string, and must be enclosed between
quotation marks. Whereas sub-elements with the same tag may be
repeated within an element, a given attribute may occur at most once
within a tag.
For instance,
<price> <currency = "euro" /> <currency = "US dollar /> </price>
is correct, but
<price currency="euro" currency="US dollar">
is not.
Comments
Comments can be added anywhere into an XML document for
readability. Their syntax is as follows:
<!--this is a comment-->
1.3.2 Well-Formed and Valid XML Documents
<books>
<novel>
<title> Heart of Darkness </title>
<author> Joseph Conrad </author>
<price currency = "euro"> 20.9 </price>
</novel>
<manual>
<title>
LateX: a Document Preparation System
</title>
<author> Leslie Lamport </author>
<publisher> Addison-Wesley </publisher>
</manual>
<book year="1999" month="October" >
<title>
Data on the Web: From Relations
to Semistructured Data and XML
</title>
<authors>
<author> Serge Abiteboul </author>
<author> Peter Buneman </author>
<author> Dan Suciu </author>
</authors>
<price currency = "US dollar"> 39.95 </price>
<publisher> Morgan Kauffman </publisher>
</book>
</books>
Figure 1.1: A simple XML document.
So far we have presented the basic XML syntax. There are very few
constraints that have to be matched: tags have to nest properly
and attributes within an element have to be unique.
A document is well-formed if it
satisfies those two constraints and if it contains one and only
one element at the upper-most level. The example document in Figure is well-formed.
Actually according to the official
specification, every well formed XML document must contain a
prolog, but the prolog can consist of nothing.
When the prolog is not empty, it must come before the first
element in the document. It can contain a version declaration,
a document type declaration and comments.
Version declaration
All XML documents should contain a simple version declaration,
which tells the processor what version of XML the document
conforms to. Here is the form that it takes.
<?xml version="1.0"?>
<greeting> Hello XML!</greeting>
The version declaration can also contain other information such
as an encoding declaration, that informs the processor what kind
of code the document uses. For instance UTF-8 means that the
encoding of this document uses the same character set as ASCII.
It may also contain a standalone
declaration that tells
the XML processor whether the document can be read as a standalone
document, or whether it needs to look outside the document for
other rules. This is a rather advanced parameter and you are directed
to the specification for more information. Here is an example
however:
<?xml version="1.0"? encoding="UTF-8" standalone="yes" ?>
<greeting> Hello XML!</greeting>
Document Type Declaration and DTD
Alternatively, a document can be accompanied by a Document
Type Definition (DTD). It is essentially a grammar for restricting
the tags and structure of a document. An XML document satisfying a DTD
is considered valid.
A DTD is generally stored in a standalone file.
An XML processor can locate a document's DTD by looking at the
document type declaration in the prolog. For instance:
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE greeting SYSTEM "hello.dtd">
<greeting> Hello XML!</greeting>
means that this document must have a root element with tag name
"greeting" and must satisfy the validity constraints
defined in the file "hello.dtd" in order to be valid.
"hello.dtd" would contain something like:
<!ELEMENT greeting (#PCDATA)>
This declaration means that a greeting element can contain only
character data.
Without getting into the details of DTD syntax, here follows a
simple DTD for the example document.
<!-- the root element -->
<!ELEMENT books (novel | manual | book)*>
<!-- it can contains 0 or more subelements -->
<!-- each subelement must be a novel, a manual or a book -->
<!-- novel type -->
<!ELEMENT novel (title, author, price?) >
<!-- a novel must contain, in this order -->
<!-- a title and an author -->
<!-- a price may follow -->
<!ELEMENT manual (title, author, publisher?)>
<!ELEMENT book (title,
(author|authors),
price?,
publisher?)>
<!ELEMENT authors (author+)>
<!-- authors is a non empty set of author elements -->
<!ELEMENT author (#PCDATA)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT price (#PCDATA)>
<!ELEMENT publisher (#PCDATA)>
<!-- the currency attribute for a price element -->
<!ATTLIST price currency CDATA #IMPLIED>
Assume that this DTD is stored in "booksDTD.dtd". If we want to
validate the example document against this DTD, it becomes:
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE books SYSTEM "booksDTD.dtd">
<books>
...
</books>
1.4 Other XML constructs
Entities and CDATA
Suppose for instance that one wants to insert this sentence:
A start <tag> must nest properly with an end </tag>.
into an XML document, as pure textual data. In this case,
we do not want an XML
processor to interpret the text between <tag> and </tag as XML.
One way is to use entities. An entity is a named piece of
text. Whenever an XML processor encounters a reference to an
entity, it replaces this reference by the corresponding text.
There exists some predefined entities, such that lt for the
character < or gt for >.
Using entities, the sentence above become:
A start <tag> must nest properly with an end </tag>.
User defined entities can be declared into a DTD.
Another way is to use CDATA sections. This is a more appropriate
method to treat markup as text in documents, so that there is no
necessity to escape every special character.
The syntax is:
"<![CDATA [A start <tag> must nest properly
with an end </tag>
]]>"
Processing Instructions
Processing instructions take the following form:
<?target processing instruction?>
They are not part of the document's character data, but must be
passed through to the application. The PI begins with a target
(PITarget) used to identify the application to which the
instruction is directed. The target names "XML", "xml", "xMl"
and so on are reserved.
Processing instructions can occur anywhere and contain information
that the processor must pass on to the user agent. The
version declaration
<?xml version="1.0" encoding="UTF-8"?>
is an example of a processing
instruction.
1.5 DOM
The W3 Document Object Model (DOM) is an application programming
interface for XML documents. It defines a logical view of the
structure of documents and the way a document is accessed and
manipulated.
The current W3 specification [WG98] defines the Document Object Model Level 1. It is separated into two parts:
Core and HTML. The Core DOM Level 1 section provides
a low-level set of fundamental interfaces that can represent any
structured document. It defines as well extended interfaces for
representing an XML document.
We will in the sequel deal with Core DOM Level 1, and, for
simplicity, refer to it as DOM.
The goal of DOM is to define a programmatic platform- and
language-neutral interface that allows programs and scripts to
dynamically access and update the content and the structure of
documents. In DOM, documents have a logical structure which is a
tree. Consider for instance the XML document in
Figure 1.1.
0.65 DOMtree.eps
Figure 1.2: A DOM representation of an XML document
It can be represented as a tree, as in Figure 1.2.
The name "Document Object Model" was chosen because it is an
"object model" in the traditional object-oriented design sense.
Documents and components of documents are modeled using
objects, and the model encompasses
not only the structure of a document, but also the behavior of a
document and the objects of which it is composed. In other words,
the nodes in the example diagram (Figure 1.2)
represent objects with identities and methods.
Let us examine more precisely the example document to discover
what kind of nodes a DOM tree contains. Starting from
the root node, we find the Document node. This object
represents a Document. It has one child node, books.
This is an Element object. In DOM, a document must have one
and only one element child.
For instance, the simplest XML document, <doc></doc> would
be represented as a document object having one element child with
tag name doc.
This unique child element of a document is
called the document element.
Getting back to Figure 1.2, we can see that
the document element has three child nodes,
novel, manual and book,
which happen to be element nodes, and each of these have children
of different kind. A node that contains only text data is called a Text node. In DOM, Text nodes does not have any children.
Some elements, like book have attributes, represented as
Attr
objects. An attribute is a
name/value pair belonging to an element.
We have looked at the objects that the example well-formed XML
document can be divided into. We will in the sequel details
more precisely the properties DOM defines for each kind of
object. Please be aware that this presentation is an abstract of the W3 DOM
Specification. For shortness and clarity a lot of details have
been omitted. The reader should refer to the official document
whenever exact information is needed.
Before going any further it is probably better to say a little
about the syntax, namely the Object Management Group's
Interface Definition Language (OMG IDL)
used to define the properties and methods of the
DOM in the W3C DOM specification.
1.5.1 The OMG IDL
The OMG IDL is a language for defining object interfaces, and
ensure interoperability between object oriented programming
languages. If one wants a Java program to communicate
over a network with a C++ program, one can do so using IDL and a
Broker which will translate between the two languages. This
broker is called the Common Object Request Broker Architecture or
CORBA for short. All CORBA objects support an IDL interface; the
IDL interface defines an object type. An interface can inherit
from one or more other interfaces. IDL syntax is very similar to
that of Java or C++, and an IDL file is functionally the CORBA
language-independent analog to a C++ header file. IDL is mapped
into each programming language to provide access to object
interfaces from that language.
An IDL interface declares a set of client accessible operations,
exceptions, and typed attributes (values). Each operation has a
signature that defines its name, parameters, result, and
exceptions. A simple IDL interface that describes a person follows.
interface Person {
attribute string firstname;
attribute string lastname;
attribute short age;
boolean isOfAge();
};
The Interfaces attribute definitions are shorthands for
get/set methods.
It is particularly unfortunate that the term that IDL uses as a
get/set method is called 'attribute', which of course has nothing
at all to do with element XML attributes. For this
reason the DOM refers to XML element attribute as Attr.
An operation may raise an exception when an error condition
arises. The type of the exception indicates the kind of error
that was encountered.
1.5.2 The Node interface
The Node interface is the primary datatype for the entire
Document Object Model: each object in a DOM tree is a Node.
The IDL definition of the Node interface is :
interface Node {
readonly attribute DOMString nodeName;
attribute DOMString nodeValue;
readonly attribute unsigned short nodeType;
readonly attribute Node parentNode;
readonly attribute NodeList childNodes;
readonly attribute Node firstChild;
readonly attribute Node lastChild;
readonly attribute Node previousSibling;
readonly attribute Node nextSibling;
readonly attribute NamedNodeMap attributes;
readonly attribute Document ownerDocument;
Node insertBefore(in Node newChild,
in Node refChild)
raises(DOMException);
Node replaceChild(in Node newChild,
in Node oldChild)
raises(DOMException);
Node removeChild(in Node oldChild)
raises(DOMException);
Node appendChild(in Node newChild)
raises(DOMException);
boolean hasChildNodes();
Node cloneNode(in boolean deep);
};
For convenience, the Node interface expose methods for dealing with
children, despite the fact that not all objects
implementing the Node interface may have children. For example,
Text nodes may not have children, and adding children to such
nodes results in a DOMException being raised.
In the same spirit, the attributes nodeName, nodeValue and
attributes are included as
a mechanism to get at node information without casting down to
the specific derived interface. In cases where there is no
obvious mapping of these attributes for a specific nodeType
(e.g., nodeValue for an Element or attributes for a Text),
this returns null. Note that the specialized interfaces may
contain additional and more convenient mechanisms to get and set
the relevant information.
The DOM also specifies a NodeList interface to handle ordered
lists of Nodes, such as the children of a Node, and a
NamedNodeMap interface to handle unordered sets of nodes
referenced by their name attribute, such as the attributes of an
Element.
1.5.3 DOM Structure Model
DOM presents documents as a hierarchy of Node objects that
also implement other, more specialized interfaces. Some types of
nodes may have child nodes of various types, and others are leaf
nodes that cannot have anything below them in the document
structure. The node types, and which node types they may have as
children, are as follows:
| Document | Element (maximum of one), ProcessingInstruction,
Comment, DocumentType |
| DocumentFragment | Element,
ProcessingInstruction, Comment, Text, CDATASection,
EntityReference |
| DocumentType | no children |
| EntityReference | Element, ProcessingInstruction, Comment, Text, CDATASection,
EntityReference |
| Element | Element, Text, Comment,
ProcessingInstruction, CDATASection, EntityReference |
| Attr | Text, EntityReference |
| ProcessingInstruction | no children |
| Comment | no children |
| Text | no children |
| CDATASection | no children |
| Entity | Element, ProcessingInstruction, Comment, Text,
CDATASection, EntityReference |
| Notation | no children |
|
1.5.4 The Document interface
The Document interface represents the entire XML
document. Conceptually, it is the root of the document tree, and
provides the primary access to the document's data. Elements,
text nodes, comments, processing instructions, etc. cannot exist
outside the context of a Document. So the Document interface also
contains the factory methods needed to create these objects. The
Node objects created have a ownerDocument attribute which
associates them with the Document within whose context they were
created.
IDL Definition
interface Document : Node {
readonly attribute DocumentType doctype;
readonly attribute DOMImplementation implementation;
readonly attribute Element documentElement;
Element createElement(in DOMString tagName);
DocumentFragment createDocumentFragment();
Text createTextNode(in DOMString data);
Comment createComment(in DOMString data);
CDATASection createCDATASection(in DOMString data);
ProcessingInstruction
createProcessingInstruction(in DOMString target,
in DOMString data);
Attr createAttribute(in DOMString name);
EntityReference createEntityReference(in DOMString name);
NodeList getElementsByTagName(in DOMString tagname);
};
Attributes
- doctype The Document Type Declaration
associated with this document. For XML documents without a document type
declaration this returns null. The DOM Level 1 does
not support editing the Document Type Declaration,
therefore docType cannot be altered in any way.
- implementation The DOMImplementation object that
handles this document. A DOM application may use
objects from multiple implementations.
- documentElement This is a convenience attribute that
allows direct access to the child node that is the
root element of the document.
1.5.5 The Element interface
By far the vast majority of objects (apart from text) that authors encounter when traversing a document are Element nodes. Assume the
following XML document:
<elementExample id="demo">
<subelement1/>
<subelement2><subsubelement/></subelement2>
</elementExample>
When represented using DOM, the top node is an Element node for
"elementExample", which contains two child Element nodes, one
for "subelement1" and one for "subelement2". "subelement1"
contains no child nodes.
Elements may have attributes associated with them; since the
Element interface inherits from Node, the generic Node
interface method getAttributes may be used to retrieve the
set of all attributes for an element. There are methods on
the Element interface to retrieve either an Attr object by
name or an attribute value by name. In XML, where an
attribute value may contain entity references, an Attr object
should be retrieved to examine the possibly fairly complex
sub-tree representing the attribute value.
IDL definition
interface Element : Node {
readonly attribute DOMString tagName;
DOMString getAttribute(in DOMString name);
void setAttribute(in DOMString name,
in DOMString value);
void removeAttribute(in DOMString name);
Attr getAttributeNode(in DOMString name);
Attr setAttributeNode(in Attr newAttr);
Attr removeAttributeNode(in Attr oldAttr);
NodeList getElementsByTagName(in DOMString name);
void normalize();
};
Attributes
- tagName
The name of the element. For example, in:
<foo id="demo">
...
</foo>
tagName has the value "foo". Note that this
is case-preserving in XML, as are all of the
operations of the DOM.
Methods
- getAttribute Retrieves an attribute value by name.
- setAttribute Adds a new attribute. If an attribute with
that name is already present in the element, its value is changed
to be that of the value parameter. This value is a simple string,
it is not parsed as it is being set. So any markup (such as
syntax to be recognized as an entity reference) is treated as
literal text, and needs to be appropriately escaped by the
implementation when it is written out. In order to assign an
attribute value that contains entity references, the user must
create an Attr node plus any Text and EntityReference nodes,
build the appropriate subtree, and use setAttributeNode to assign
it as the value of an attribute.
- removeAttribute Removes an attribute by name. If the
removed attribute has a default value it is immediately replaced.
- getAttributeNode Retrieves an Attr node by name.
- setAttributeNode Adds a new attribute. If an attribute
with that name is already present in the element, it is replaced
by the new one.
- getElementsByTagName Returns a NodeList of all descendant
elements with a given tag name, in the order in which they would
be encountered in a preorder traversal of the Element tree.
- normalize Puts all Text nodes in the full depth of the
sub-tree underneath this Element into a "normal" form where only
markup (e.g., tags, comments, processing instructions, CDATA
sections, and entity references) separates Text nodes, i.e.,
there are no adjacent Text nodes. This can be used to ensure that
the DOM view of a document is the same as if it were saved and
re-loaded, and is useful when operations (such as XPointer
lookups) that depend on a particular document tree structure are
to be used.
1.5.6 The Attr Interface
The Attr interface represents an attribute in an Element
object. Typically the allowable values for the attribute are
defined in a document type definition. Attr objects inherit the
Node interface, but since they are not actually child nodes of
the element they describe, DOM does not consider them part of the
document tree. Thus, the Node attributes parentNode,
previousSibling, and nextSibling have a null value for Attr
objects. DOM takes the view that attributes are properties of
elements rather than having a separate identity from the elements
they are associated with. Furthermore, Attr nodes may not be
immediate children of a DocumentFragment. However, they can be
associated with Element nodes contained within a
DocumentFragment. In XML, where the value of an attribute can
contain entity references, the child nodes of the Attr node
provide a representation in which entity references are not
expanded. These child nodes may be either Text or
EntityReference nodes. Because the attribute type may be
unknown, there are no tokenized attribute values.
IDL definition
interface Attr : Node {
readonly attribute DOMString name;
attribute DOMString value;
};
Attributes
- name Returns the name of this attribute.
- value
- On retrieval: the value of
the attribute is returned as a string. Character
and general entity references are replaced with
their values.
- On setting: this creates a Text node with the unparsed
contents of the string.
1.5.7 The Text Interface
The Text interface represents the textual content (termed
character data in XML) of an Element or Attr. If there is no
markup inside an element's content, the text is contained in
a single object implementing the Text interface that is the
only child of the element. If there is markup, it is parsed
into a list of elements and Text nodes that form the list of
children of the element.
IDL definition
interface Text : CharacterData {
};
1.6 Querying XML documents using DOM
Let us go back to the document figure . It
contains three similar records containing informations about some
books. Note that each record has a different structure.
Imagine a bookseller managing his catalog using such a
document. He will certainly be interested in
retrieving all the books written by a given author, or the novels
with price less than a given value.
When the document is small, this can be achieved by manually
browsing the XML document. But one can easily imagine that manual
browsing becomes more and more cumbersome as the size of the
document grows up.
We can automate manual browsing in order to ease specific
tasks. For instance, if it often happens that the bookseller
looks for the titles of novels with price less than 10, he could
write a program like the following :
Starting from the root, find all the direct child nodes which
name is novel.
Call N this collection of nodes.
For each node n in N, if n has a child node
p with price for name, coerce the text children of
p into a numeric value, and compare it to 10.
If the comparison succeeds, append the title subelement of
n to the result.
Translated into a Java program using DOM Java Binding, it becomes:
// the input document
XMLParser parser = DomImplementation.createParser();
Document in = parser.parse("books.xml");
// the ouput document
Document out = domImplementation.createDocument();
// create the root element for output document
// this is the DocumentElement for out
out.appendChild( out.createElement("result");
NodeList rootChildNodes = in.getDocumentElement().getChildNodes();
// iterate through the novel subelements of root
for(int i=0; i<rootChildNodes.getLength(); i++){
Node novel = rootChildNodes.item(i);
if( novel.getNodeName().compareTo("novel")==0 ){
NodeList novelChildNodes = novel.getChildNodes();
// iterate through the price subelements of each novel
for(int j=0; j<novelChildNodes.getLength(); j++){
Node price = novelChildNodes.item(j);
if( price.getNodeName().compareTo("price")==0 ){
// retrieve the first child node of price
Node priceText = price.getFirstChild();
if( priceText != null){
if( priceText.getNodeType() == Node.TEXT_NODE){
// if it is a text node, compute its numeric value
float priceValue = new Float( priceText.getNodeValue() ).floatValue();
if( priceValue < 10){
// if this value is less than 10
// retrieve a title subelements of this novel
Node title = null;
for(int k=0;k<novelChildNodes.getLength() && (title==null);k++){
if(novelChildNodes.item(i).getNodeName().compareTo("title")){
title = novelChildNodes.item(k);
}
}
// If it exists, append it to the output document.
// Note that DOM does not allow cloning nodes from
// one document to an another.
if( title != null){
Element e = out.createElement( "title");
e.appendChild(
out.createTextNode(
title.getFirstChild().getNodeValue()));
out.getDocumentElement().appendChild(e);
}
}
}
}
}
}
}
}
One must admit that this program is quite complex compared to
the simplicity of the original query.
An equivalent X-OQL query would be :
select novel/title
from books in load("books.xml")
,novel in books/novel
where novel/price < 10;
1.7 DOM as an intermediate layer
The X-OQL implementation only uses DOM operations to access XML data. Thus,
the X-OQL module is able to query DOM trees, whatever the nature
of the underlying storage and/or implementation.
- X-OQL is not computationally complete. It is an easy to use query
language which provides easy access to XML documents through the
DOM interface.
- X-OQL can be invoked directly from within a Java program.
- X-OQL can easily be extended.