How to Read W3C Specs

Date: 2005-05-01 Source: freelamp

 <p>Here is Einstein’s famous equation:
 <ml:math>
   <ml:mrow>
     <ml:mi>e</ml:mi> <ml:mo>=</ml:mo> <ml:mi>m</ml:mi>
     <ml:msup> <ml:mi>c</ml:mi> <ml:mn>2</ml:mn> </ml:msup>
   </ml:mrow>
 </ml:math>
 with which we <i>all</i> are familiar.</p>

Your best bet is to follow any namespace prefixes that you see in sample documents. In most cases, if you encounter a long discussion of how a certain XML technology is “namespace-aware,” you may safely skip it.
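If you are curious what a prefix actually does, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It assumes the ml prefix is bound to the MathML namespace URI; the sample above does not show that declaration, so the xmlns:ml attribute and the function name element_names are added here purely for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: "ml" is bound to the MathML namespace. The article's sample
# does not show the declaration, so it is added here for illustration.
MATHML = "http://www.w3.org/1998/Math/MathML"

SNIPPET = f"""
<p xmlns:ml="{MATHML}">Here is Einstein's famous equation:
  <ml:math><ml:mrow>
    <ml:mi>e</ml:mi><ml:mo>=</ml:mo><ml:mi>m</ml:mi>
    <ml:msup><ml:mi>c</ml:mi><ml:mn>2</ml:mn></ml:msup>
  </ml:mrow></ml:math>
  with which we <i>all</i> are familiar.</p>
""".strip()

def element_names(markup):
    """Return every element name as the parser sees it, in document order."""
    root = ET.fromstring(markup)
    return [element.tag for element in root.iter()]

for name in element_names(SNIPPET):
    print(name)
```

The parser reports the prefixed elements as {namespace URI}name and the unprefixed ones (p and i) as plain names. In other words, the prefix is shorthand that keeps two vocabularies apart, which is all most authors need to know about it.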

Learn to Read BNF

BNF stands for Backus-Naur Form (sometimes called Backus Normal Form). It’s a compact way to represent the grammar of computer languages, and it’s been around since about 1960. Different specifications use different flavors of BNF, but they all translate long English descriptions into symbolic form. Take this example of what constitutes a sandwich:

A sandwich consists of a lower slice of bread; mustard or mayonnaise; optional lettuce; an optional slice of tomato; two to four slices of bologna, salami, or ham (in any combination); one or more slices of cheese; and a top slice of bread.

This translates to:

 sandwich ::= lower_slice [ mustard | mayonnaise ] lettuce? tomato? [ bologna | salami | ham ] {2,4} cheese+ top_slice

The constituent parts of a definition are listed in order, separated by blanks. Items are grouped with square brackets, and choices within a group are separated by a vertical bar.

If an item is followed by a question mark, that means “one or none”; if followed by a plus sign, “one or more”; and if followed by an asterisk, “zero or more.” If an item is followed by numbers inside braces, the numbers give the lower and upper limits for how many times the item can occur.

Parentheses, or more square brackets, are used to group items in more complex definitions. Sometimes a generic item (like a “color”) is enclosed in < and >, or fixed items will be enclosed in quote marks.
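If you already know regular expressions, most of this notation will look familiar. As a rough sketch, here is the sandwich rule rewritten as a Python regular expression. The names SANDWICH and is_sandwich are made up for this example, and representing a sandwich as a list of ingredient names is an assumption, not something a specification would do.

```python
import re

# Each ingredient is matched together with a trailing space, so that the
# repetition operators (?, +, {2,4}) apply to whole words.
SANDWICH = re.compile(
    r"lower_slice "
    r"(?:mustard |mayonnaise )"        # [ mustard | mayonnaise ]
    r"(?:lettuce )?"                   # lettuce?
    r"(?:tomato )?"                    # tomato?
    r"(?:bologna |salami |ham ){2,4}"  # [ bologna | salami | ham ] {2,4}
    r"(?:cheese )+"                    # cheese+
    r"top_slice "
)

def is_sandwich(items):
    """Return True if the ingredient names, in order, match the rule."""
    text = "".join(item + " " for item in items)
    return SANDWICH.fullmatch(text) is not None

print(is_sandwich(["lower_slice", "mustard", "lettuce",
                   "ham", "salami", "cheese", "top_slice"]))
print(is_sandwich(["lower_slice", "mayonnaise", "ham",
                   "cheese", "top_slice"]))
```

The first call prints True. The second prints False, because one slice of ham falls short of the two-slice minimum.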

Learn to Read a Document Type Definition

The Grolier Encyclopedia® is the source authority for all answers and questions asked on Jeopardy®. —Credit on TV game show

You know those <!DOCTYPE ...> declarations that you put in your documents to tell the browser which version of HTML or XHTML you're using? Those declarations refer to a Document Type Definition, or DTD, which defines which combinations of elements are legal in a document.

While learning to read a DTD is difficult, it's not an impossible task. And it's worth learning, because the DTD is the ultimate authority for what is and is not syntactically correct for a particular markup language.

A full explanation of how to read a DTD is well beyond the scope of this article, but it can be found in Elizabeth Castro’s XML for the World Wide Web Visual Quickstart Guide, or in Erik Ray’s Learning XML. Here's a brief example of something you might see in a DTD:

 <!ENTITY % fontstyle "(tt | i | b)">
 <!ENTITY % inline "(#PCDATA | %fontstyle;)">
 <!ELEMENT div (p | %inline;)+>
 <!ATTLIST div align (left | right | center) #IMPLIED>

And here’s what it means in English:

The font style elements are <tt>, <i>, and <b>. Inline elements consist of text or font style elements. A <div> can contain one or more <p> or inline elements in any order. A <div> has an optional align attribute with values of left, right, or center.
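To make those four rules concrete, here is a hand-written Python sketch that checks a single <div> against them. It is not a DTD validator: it looks only at the div's immediate children, it ignores whitespace-only text, and every name in it (FONTSTYLE, INLINE, div_is_valid, and so on) is invented for this illustration.

```python
import xml.etree.ElementTree as ET

FONTSTYLE = {"tt", "i", "b"}             # %fontstyle;
INLINE = {"#PCDATA"} | FONTSTYLE         # %inline;
DIV_CONTENT = {"p"} | INLINE             # (p | %inline;)+
DIV_ALIGN = {"left", "right", "center"}  # legal values of align

def div_children(div):
    """List a div's children in order, using '#PCDATA' for non-blank text."""
    kinds = []
    if div.text and div.text.strip():
        kinds.append("#PCDATA")
    for child in div:
        kinds.append(child.tag)
        if child.tail and child.tail.strip():
            kinds.append("#PCDATA")
    return kinds

def div_is_valid(markup):
    """Check one <div> (not its descendants) against the rules above."""
    div = ET.fromstring(markup)
    if div.tag != "div":
        return False
    if set(div.attrib) - {"align"}:
        return False                     # align is the only declared attribute
    align = div.get("align")             # #IMPLIED: the attribute may be absent
    if align is not None and align not in DIV_ALIGN:
        return False
    kinds = div_children(div)
    # The trailing + demands at least one item; each must be an allowed choice.
    return len(kinds) >= 1 and all(kind in DIV_CONTENT for kind in kinds)

print(div_is_valid('<div align="left"><p>Hi</p> and <b>bold</b></div>'))
print(div_is_valid('<div align="justify"><p>Hi</p></div>'))
```

The first div passes and the second fails, because justify is not among the values listed for align.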

Idle Past IDL, Be Bound by Bindings

Some XML technologies, such as SVG and SMIL, allow a user to write programs to control a document dynamically, much as JavaScript lets you control an HTML document. Their specifications will have sections that describe how scripts work with the Document Object Model. These sections show the interfaces in IDL, the Interface Definition Language.

IDL is a generic notation for describing the kinds of information that a user agent should make accessible to a programming environment. IDL is not a programming language; it’s a notation for describing these interfaces in a compact way. While informative, the IDL interface definitions are almost certainly not what you are looking for.

What you probably want, depending upon your programming language of choice, are the Java bindings or the ECMAScript bindings.

Bindings is a fancy term for the list of objects, properties, and methods that are available to your scripts. ECMAScript is the version of JavaScript standardized by the European Computer Manufacturers Association (ECMA).

If you're using some other language like Perl or Python, you'll have to look for a library from some place like the Comprehensive Perl Archive Network or the Python XML Special Interest Group.
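To see what a binding feels like in practice, here is a small sketch using xml.dom.minidom, the DOM implementation that ships with Python's standard library. The SVG fragment and the set_radius helper are made up for this example. The method names, though, are the same ones the DOM's IDL declares for the Document and Element interfaces.

```python
from xml.dom.minidom import parseString

def set_radius(svg_markup, new_radius):
    """Parse the markup, set r on the first <circle>, and return the document."""
    doc = parseString(svg_markup)
    circle = doc.getElementsByTagName("circle")[0]  # a Document method
    circle.setAttribute("r", str(new_radius))       # an Element method
    return doc

doc = set_radius('<svg><circle id="dot" r="5"/></svg>', 10)
circle = doc.getElementsByTagName("circle")[0]
print(circle.getAttribute("r"))     # prints 10
print(doc.documentElement.tagName)  # prints svg
```

If you have used getElementsByTagName or setAttribute from JavaScript, nothing here is new. That is the point of a binding: the interface stays the same and only the host language changes.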

Summary

Realize that W3C specifications are written for implementers, not end users.
Many specifications contain a section that tells how they are organized and how you should read them.
Know the vocabulary that specifications use.
Remember that you don’t have to read every word. Skim for the parts that make sense.
Avoid discussions of namespaces.
Learn to read BNF — it's used in lots of places.
Learn to read a DTD for answers to syntax questions.
If a technology is scriptable, the information is in the bindings.

Be patient and persistent, and you’ll be amazed at the amount of information you too can extract from a W3C specification.

J. David Eisenberg is a programmer and teacher living in San Jose, CA with his cat, Marco Polo. Most of his current work is in XML, Java, JavaScript, and Perl. He has written a book about Scalable Vector Graphics. The details are at catcode.com/narrative.html.