Talk:XML/Archive 4

From Wikipedia, the free encyclopedia

Some XML terms

I believe the main text needs a bit more beef, but I hesitate to add such content since it's a bit too much for most people... you can do a keyword search from inside Visual Studio 2005 if you have it. --Raylopez99 20:57, 1 October 2007 (UTC)


XML's set of tools helps developers in creating web pages but its usefulness goes well beyond that. XML, in combination with other standards, makes it possible to define the content of a document separately from its formatting, making it easy to reuse that content in other applications or for other presentation environments. Most importantly, XML provides a basic syntax that can be used to share information between different kinds of computers, different applications, and different organizations without needing to pass through many layers of conversion.[4] —Preceding unsigned comment added by 220.226.191.107 (talk) 05:06, 28 February 2009 (UTC)[reply]

SGML

This article states "Its predecessor, SGML, has been in use since 1986, so there is extensive experience and software available." But the article on SGML states "few SGML-aware programs existed when XML was created". Which is it? Few programs, or extensive software? Can't be both? 10:48, 12 December 2008 (UTC) —Preceding unsigned comment added by 195.72.173.51 (talk)

I assume that sentence was supposed to mean "(extensive experience) + (software)", as opposed to "extensive (experience + software)". Software included WordPerfect Office (see for example WordPerfect Office 2000 review in PCPro July 1999, WordPerfect Office 2002 review at DTPStudio and Corel WordPerfect Office X3), Adobe FrameMaker + SGML (see Adobe FrameMaker; with versions for Windows, Macintosh and Unix), the OmniMark programming language, various SGML parsers (including James Clark's famous sgmls), tools for viewing and editing DTDs, et cetera. In fact, Peter Flynn published a complete book on the subject: Understanding SGML and XML Tools: Practical programs for handling structured text Boston/Dordrecht/London: Kluwer Academic Publishers, 1998. ISBN 0-7923-8169-6. (XML was brand new at the time, but development of XML software had already started while the spec was under development.)
With regard to experience, this is something that may be inferred from the body of literature on the subject. (For example, search Amazon.) --ChristopheS (talk) 12:28, 12 February 2009 (UTC)[reply]
It is a contradiction. There were many SGML-specific applications that operated directly on the SGML, but not many applications that happened to use SGML as a side issue. With XML, by contrast, everything uses it, but relatively few applications manipulate it directly. This is largely because SGML matched publishing requirements, and because it was so complex that implementing it was too much of an effort for it to be used as an afterthought, the way XML is. Rick Jelliffe (talk) 14:52, 10 August 2009 (UTC)

Disadvantages of XML

XML vs. binary is a disadvantage (the first item in the list)? Why not the disadvantage of XML vs. a ham sandwich? XML is not a binary format. It should be compared to other markup systems, not to a format that it is clearly not in the family of. —Preceding unsigned comment added by 205.229.50.10 (talk) 23:36, 16 December 2008 (UTC)[reply]

If you have a credible cite that indicates XML and Ham Sandwiches as reasonably capable of mutual substitution, or otherwise capable of meeting the same or similar requirements as alternative technologies, then even that comparison might merit some mention. As long as there is a meaningful basis for comparison and it is within the scope of the article, then it is potentially relevant to the article. dr.ef.tymac (talk) 18:09, 19 January 2009 (UTC)[reply]
Ehhm, with regard to edibility, ham sandwiches rulezz, but XML might also be fine if it generates money that generates ham sandwiches... Prawn would also be fine, and maybe a pizza and a couple of beers on Friday.
Besides that, you're partially right. Foremost, XML vs. "binary" is a weird comparison, since gzipping an XML document will make it "binary". XML could be compared to other things according to usage, where the most obvious competitor is SGML. XML bundled with its "obvious partners" such as XSLT, XSL:foo, XPath, XLink and such could be used for documentation with a unique DTD, and then it could be compared to e.g. DocBook, but such specific comparisons should be highlighted as per-topic. ... said: Rursus (bork²) 13:41, 11 February 2009 (UTC)

Confused

I came here to learn about XML; I know almost nothing about it. The section on Well-formedness confuses me, as it doesn't appear to make any sense:

The only indispensable syntactical requirement is that the document has exactly one root element (also known as the document element), i.e. the text must be enclosed between a root start-tag and a corresponding end-tag, known as a 'well-formed' XML document: <book>This is a book... </book>

First of all, it's confusing that "root element" and "document element" appear to be fundamental terms, yet they're merely in bold. Seems like if it's an "indispensable syntactical requirement," it might warrant an entry--or at least a complete definition somewhere on the page.

Second, it seems like a non sequitur to say that an XML document must have "exactly one element, i.e. the text must be enclosed between a root start-tag and a corresponding end-tag..." Where's the one element? I read a statement--another requirement, in fact.

To top off this strange sentence, this is followed by "known as a 'well-formed' XML document." What is known as a well-formed document, the element? The text?

In the example, is "This is a book..." the element? The whole well-formed document?

Someone should explain this better. I'd like to understand.

Thanks. —Preceding unsigned comment added by 152.3.112.145 (talk) 21:03, 9 March 2009 (UTC)[reply]
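For readers with the same question, the quoted sentence can be tried out directly. The following is a minimal sketch using Python's standard xml.etree.ElementTree parser; the helper name root_tag and the sample strings are made up for illustration. In the example from the article, <book>...</book> as a whole is the root element, and "This is a book... " is merely its text content.

```python
import xml.etree.ElementTree as ET


def root_tag(text):
    """Return the root element's name, or None if the text is not well-formed."""
    try:
        return ET.fromstring(text).tag
    except ET.ParseError:
        return None


# Hypothetical sample inputs for illustration.
one_root = "<book>This is a book... </book>"      # exactly one root element
two_roots = "<book>One</book><book>Two</book>"    # a second top-level element
no_root = "This is a book..."                     # bare text, no element at all

for sample in (one_root, two_roots, no_root):
    print(repr(sample), "->", root_tag(sample))
```

Only the first sample parses; the parser rejects the other two because they do not consist of a single enclosing element.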

"This article may be too long to comfortably read and navigate."[edit]

I suggest that this article is actually of an appropriate length, given the complexity of the topic. I would prefer NOT to see it broken up into subtopics. (Of course, links to subtopics would be welcomed, but I don't think there's too much text here/too many subtopics for this particular topic.)

I'm guessing that this is an automatically-generated warning from a bot. If others agree with me on this point (of the article NOT being too long), perhaps someone knows how to get rid of the warning AND flag it somehow so the bot won't re-add the warning. "???"

Aloha, philiptdotcom (talk) 02:29, 2 July 2009 (UTC)

I agree. Does anyone have an objection to us removing this notice? --Nigelj (talk) 12:11, 2 July 2009 (UTC)
It was me who tagged it I think. The article currently reads like it was copied out of a textbook; non-technical readers would have little hope of finishing it before falling asleep at their keyboards. That something is a "complicated topic" only excuses it from our guidelines on length if it is very well-written, such as our various Presidential biographies. This article could easily be cut down to a more manageable length without it being less valuable; for instance, the minutiae on validation belong on XML validation (itself a very poor article presently) and not here. Feel free to work on this if you want; I'll try to do so myself at some point, which is what the tag is there to remind me of. Chris Cunningham (not at work) - talk 12:26, 2 July 2009 (UTC)
I took the liberty of removing the tag (before reading this discussion). I did that on the basis that the amount of information presented is probably about right for most readers. That of course is a personal judgement. I'm not sure whether the tag was added before or after Tim Bray's rewrite. Mhkay (talk) 09:57, 6 August 2009 (UTC)[reply]
Before. --Cybercobra (talk) 22:29, 7 August 2009 (UTC)

This article is a mess

There are maybe two or three smaller-sized coherent WP entries struggling to get out in here. I'm going to invest a few hours in the next few days in trying to make it smaller and cleaner. There are quite a few statements currently here that need some supporting references. I think the WP policies requiring such support are sane and will try to follow them. Tim Bray (talk) 03:28, 29 July 2009 (UTC)[reply]

I'm a rather competent XML practitioner and would be willing to provide a sanity check on the structure and content. Artcolman (talk) 20:50, 30 July 2009 (UTC)[reply]

Entity references and Escaping

Are &amp; , &lt; , &gt; , &quot; and &apos; the only characters that NEED to be escaped ("need" meaning "required by the XML standard")? Or are there other characters that need to be escaped? Or even fewer than those 5 ( &gt; doesn't seem to be necessary to correctly parse XML, in contrast to the other 4)? This is relevant (at least to me) since XML can be used with many different encodings.
And related to this: Is the character encoding used for the entire document or only for its "content" (leaving the tags, i.e. the structure around this content, in ASCII or whatever standard encoding XML is supposed to use (seems to be UTF-8))? E.g., if for some strange reason I were to use EBCDIC as the encoding, would the & and < used in XML be the UTF-8 & and < or the EBCDIC & and < ?
And what about "<?xml version="1.1" encoding="EBCDIC"?>"? Would that line be itself encoded in EBCDIC or in UTF-8? Catskineater (talk) 20:55, 2 August 2009 (UTC)[reply]

Sometimes ', ", and > need to be escaped, ' and " in attribute values, and > after the string "]]". The character encoding always applies to the whole document, markup and content. Tim Bray (talk) 00:37, 5 August 2009 (UTC)[reply]
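A small sketch of the escaping rules just described, using the escape and quoteattr helpers from Python's standard xml.sax.saxutils module. The sample strings and the <note> element are invented for illustration. escape handles &, < and > in character data, while quoteattr additionally picks a quote style for an attribute value and escapes whichever quote character would otherwise clash.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Hypothetical content containing every character discussed above.
body = 'I <3 "AT&T" ]]> done'
attr_value = 'Fred "the fish" O\'Hara'

escaped_body = escape(body)          # replaces &, < and > only
quoted_attr = quoteattr(attr_value)  # adds the surrounding quotes itself

xml_text = "<note who=%s>%s</note>" % (quoted_attr, escaped_body)
print(xml_text)

# Parsing the result gives back the original strings unchanged.
note = ET.fromstring(xml_text)
print(note.text == body, note.get("who") == attr_value)
```

The quotes inside the element content are left alone, which matches the point that ' and " only need escaping inside attribute values.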
Wow, wouldn't that make XML lacking external encoding information impossible to parse? To even parse the first line '<?xml version="1.1" encoding="some_strange_encoding"?>' describing the used encoding you have to know that same encoding. You could guess based on statistical information, but that doesn't sound robust. With encodings that are supersets of ASCII you won't notice the problem, but there are examples like EBCDIC which aren't supersets of ASCII. Catskineater (talk) 17:16, 5 August 2009 (UTC)[reply]
In principle, yes. XML parsers aren't actually required to understand any encodings except UTF-8 and UTF-16 with a BOM. Some parsers understand EBCDIC of various flavors, others don't: there is an explanation of how to detect EBCDIC in the non-normative Appendix F. Conceivably, some encodings might not be detectable at all: the fortunately fictional encoding US-BSCII, for example, which is the same as US-ASCII except it interchanges 'a' and 'b', meaning that "us-bscii" in US-BSCII is encoded with exactly the same bytes as "us-ascii" in US-ASCII. Fortunately encodings that are neither UTF-16, UTF-32, ASCII supersets (including UTF-8 and the ISO 8859 group), nor EBCDIC variants are very rare. --John Cowan (talk) 02:05, 9 August 2009 (UTC)[reply]
There are of course many ASCII variants too, though they are becoming rare: but in most cases they don't vary the basic subset of characters [a-zA-Z0-9] which are typically found in the encoding name. Some do, however, substitute the lower-case Latin letters with upper-case letters from another alphabet such as Greek or Cyrillic. Mhkay (talk) 02:52, 9 August 2009 (UTC)
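To make the bootstrapping argument concrete, here is a rough sketch of the kind of first-bytes sniffing that the non-normative Appendix F describes: because every document with an encoding declaration starts with "<?xml", the first four bytes narrow the encoding down to a family, after which the declaration itself can be read. The function name and the returned labels are invented for this sketch, and a real parser does considerably more (for example, reading the declared name and handling unusual byte orders).

```python
def sniff_encoding_family(data: bytes) -> str:
    """Guess the encoding family from the first bytes of an XML entity."""
    # Longer signatures come first so FF FE 00 00 is not mistaken for FF FE.
    signatures = [
        (b"\x00\x00\xfe\xff", "UCS-4 big-endian (BOM)"),
        (b"\xff\xfe\x00\x00", "UCS-4 little-endian (BOM)"),
        (b"\xef\xbb\xbf", "UTF-8 (BOM)"),
        (b"\xfe\xff", "UTF-16 big-endian (BOM)"),
        (b"\xff\xfe", "UTF-16 little-endian (BOM)"),
        (b"\x00\x00\x00\x3c", "UCS-4 big-endian (no BOM)"),
        (b"\x3c\x00\x00\x00", "UCS-4 little-endian (no BOM)"),
        (b"\x00\x3c\x00\x3f", "UTF-16 big-endian (no BOM)"),
        (b"\x3c\x00\x3f\x00", "UTF-16 little-endian (no BOM)"),
        # "<?xm" in ASCII and in EBCDIC respectively.
        (b"\x3c\x3f\x78\x6d", "ASCII-compatible (read the declaration as ASCII)"),
        (b"\x4c\x6f\xa7\x94", "EBCDIC (read the declaration as EBCDIC)"),
    ]
    for prefix, family in signatures:
        if data.startswith(prefix):
            return family
    return "UTF-8 assumed (no BOM or declaration found)"


declaration = '<?xml version="1.0" encoding="x"?><a/>'
for codec in ("ascii", "cp037", "utf-16-le"):
    print(codec, "->", sniff_encoding_family(declaration.encode(codec)))
```

This only works because the candidate families disagree on how "<?xm" is encoded; the fictional US-BSCII would land in the ASCII-compatible bucket and then be misread, exactly as described above.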

Rewriting

I'm proceeding through from start to end, trying to leave things behind in a sane state as I pass through. So far, I've whacked a bunch of text that seemed superfluous or in the wrong place but left it behind in comments in case the right place should appear.

I spent some time going back and forth through the XML spec, and it's just too big, there's no way all the syntactic variations and corner cases can sanely be described in this entry. So it seems to me that it's important to make sure that the important stuff: elements, attributes, encoding, escaping, and so on, be well-described. Should there be an ancillary article on "XML Syntax" or some such where there could be a deep-dive on stuff nobody cares about like NOTATION and unparsed entities and so on?

So, what I'm trying to do is leave this in a condition where what's left is an opinion-free well-referenced tour through the important pieces of XML. Some things are sorely lacking: an introduction to the "XML stack" - XML & the Infoset & XSD & XSLT & XPath & RelaxNG and so on and so on.

I'd really welcome opinions about how to handle the stupid verbose unhelpful "pro & con" section. I'm really unconvinced that an encyclopaedic article on XML is actually the place for an argument about whether it's good or not. Here's what it is, here's where it's been used.

What *is* needed, and I'm not sure whether it's in this article or not, is some discussion of XML & other formats that are in common use for data interchange: ASN.1, YAML, and JSON leap to mind. Someone keeps talking up s-expressions but I've never actually seen them used for industrial data interchange. Tim Bray (talk) 00:46, 5 August 2009 (UTC)[reply]

A Comparison of data serialization formats would certainly be interesting. Regarding syntax, there is certainly precedent from several programming language articles to have a separate sub-article on syntax. --Cybercobra (talk) 01:40, 5 August 2009 (UTC)[reply]
The "pro & con" section is unhelpful because it is only a list of disconnected topics IMHO. However I think it is impossible to remove it because a lot of editors would want to have their way and list whatever criticisms they could find in it. I propose to have only a short chapter here, mainly linking to another article with this list. Hervegirod (talk) 09:41, 5 August 2009 (UTC)
"a lot of editors would want to have their way" doesn't seem like a good argument to me. Wikipedia experts: is there a tag you can put on a section saying you think it should be deleted, sort of a last-call? Tim Bray (talk) 19:44, 5 August 2009 (UTC)[reply]
I agree with you (and you are an XML expert!), but I don't know if it's possible to put such a tag without "securing" some place for the criticisms, at least those which are properly sourced. I made this suggestion because I remember a similar situation with the Java (programming language) article. There was a "Criticism" paragraph which in the end became a long list of disconnected criticisms. Creating a specific article (which has its own problems, I admit) was a way to avoid this, and now the main article just links to the specific "Criticism" article. However I know it's really an imperfect solution. Hervegirod (talk) 19:59, 5 August 2009 (UTC)

Let's emphasize how XML helped with the adoption of UTF-8. 99.56.139.29 (talk) 02:39, 5 August 2009 (UTC)

It would help novices if the opening paragraphs made clear that the term "XML" can mean 3 different things, depending on context: (1) the syntax and rules defined by the W3C XML 1.* specs; (2) any vocabulary based on the XML spec (whether or not there is an associated schema); and (3) the set of XML languages applied to a particular problem set, as in "we are using XML technology for information exchanges". Ken Sall (talk) 22:28, 5 August 2009 (UTC)[reply]

Ken - good idea. I'd recast (3) slightly as "xml technology in general", in that (nice) example sentence, when they say XML they mean one or more XML languages and parsers and APIs and transformers and so on. Why don't you go ahead and add that? Tim Bray (talk) 23:16, 5 August 2009 (UTC)[reply]
Actually, "XML" is used very widely to mean anything except XML (the markup language). In particular, it is used to mean typed data objects that form a series of trees by XQuery people and merchants, and typed+validated trees (PSVI) by XSD people and merchants. And it is used to stand for the full WS-* stack sometimes: how often do we read comments on "how complicated XML is" only to find they are not discussing XML (the markup language) at all! It is like how Java EE people habitually use "Java" instead of "Java EE". Rick Jelliffe (talk) 06:48, 6 August 2009 (UTC)[reply]
Tim and Rick, I've added an "About the term XML" section. Couldn't figure out how to make the link to the Category:XML page which I think would be a good jumping off point for those interested in the "XML technologies" aspect. Ken Sall (talk) 01:51, 7 August 2009 (UTC)

Misc

How can XML be "fee free" when Microsoft is not allowed to use it, and has patented some of it? http://blogs.zdnet.com/BTL/?p=22595&tag=nl.e550 http://i.zdnet.com/blogs/msfti4icomplaint.pdf http://i.zdnet.com/blogs/msfti4ijudgment.pdf —Preceding unsigned comment added by A6zzz (talk • contribs) 22:30, 12 August 2009 (UTC)

No-one has patented XML and everyone is free to use it. Some people claim to have patented some processes that happen to use XML, and therefore restrict your ability to use such processes. But to claim you can't use XML because of patents is like claiming you can't use a spring because someone has patented a mousetrap that uses springs. Mhkay (talk) 13:59, 13 August 2009 (UTC)[reply]


The section introducing elements would be improved by noting that XML names cannot contain spaces. We take it for granted, but that <person name="fred"> means there is a person with an attribute name is not obvious: I have had people (COBOL programmers and database people) who want to read it as a property "person name" which has a value "fred". Rick Jelliffe (talk) 07:31, 6 August 2009 (UTC)[reply]
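A short sketch of the reading described above, using Python's standard xml.etree.ElementTree; the variable names are arbitrary. The parser reports an element called "person" carrying an attribute called "name", and a tag that tries to use "person name" as a single name is rejected.

```python
import xml.etree.ElementTree as ET

# The empty-element form of the start-tag quoted above.
person = ET.fromstring('<person name="fred"/>')
print("element name:", person.tag)
print("attributes:  ", person.attrib)

# Treating "person name" as one name is not well-formed XML.
try:
    ET.fromstring("<person name>fred</person name>")
    space_in_name_ok = True
except ET.ParseError:
    space_in_name_ok = False
print("name with a space accepted:", space_in_name_ok)
```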

The section on Sources does not mention Charles Goldfarb. This is a major omission, bordering on insult, since many of the ideas in XML come directly from him and his efforts. I suggest the second sentence of the first paragraph should be "Many of the ideas in SGML in turn are attributed to Charles Goldfarb." Rick Jelliffe (talk) 07:31, 6 August 2009 (UTC)[reply]

The section on Sources mentions Steve deRose. I also was involved in implementing a WF-class parser in 1989 as part of the RISP LISP SGML text processor at Unicode, Tokyo. And even when people used a full SGML parser such as OmniMark, normalizing the SGML was a common step. So XML reflected many years of common practice. I don't think it is important enough to warrant a change, just noting it. Rick Jelliffe (talk) 07:31, 6 August 2009 (UTC)

not to diss Steve, but I also built a WF-class parser at the New OED project in 87-89. Rick, why don't you go ahead and make some of the changes you're suggesting. Tim Bray (talk) 08:30, 6 August 2009 (UTC)[reply]

The section on Sources is a little squiffy on ERCS, but I don't expect there is any value in teasing it out. ERCS came first: some of its recommendations (extended naming rules) made it into SGML (ENR) [1], which I co-drafted before the XML effort started; XML itself then adopted/adapted other parts of it (hex NCR, the particular naming rules); and then SGML was again fixed to support other leftover parts of XML [2]. I don't think it is important enough to warrant a change, just noting it. Rick Jelliffe (talk) 07:31, 6 August 2009 (UTC)

XML Base should be mentioned in the Related Specifications. I don't know how much xml:base is used, but the XML Infoset spec depends on it as do XML Schema, RELAX NG, XPath 2.0, XSLT 2.0, XQuery. I think the ordering of the listed specifications should as much as possible reflect layering, so I would put it after XML Namespaces and before XML Infoset. I would move xml:id to after XML Infoset (since XPath 2.0 depends on it, but XML Infoset doesn't). James Clark (talk) 22:16, 7 August 2009 (UTC)[reply]

The section on xml:id was seriously in error. The language I changed was "allows an author to confer ID-ness (in the sense used in a DTD) on an attribute, by naming it in an xml:id attribute". That sounds as if you give an attribute name as the value of an xml:id attribute, and the named attribute becomes an ID. We could have done that, but we did something simpler: any attribute named xml:id is an ID, even in a document whose DTD does not define it so. So my wording is "allows an author to confer ID-ness (in the sense used in a DTD) on an attribute, by naming it xml:id." —Preceding unsigned comment added by Johnwcowan (talk • contribs) 02:14, 9 August 2009 (UTC)
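To illustrate the corrected wording, here is a minimal sketch with Python's standard xml.etree.ElementTree. Note the assumptions: ElementTree does not itself implement xml:id processing (it assigns no ID type and does not check uniqueness), so the lookup helper by_xml_id below is hand-written and purely illustrative, as are the sample document and its element names. What the sketch does show is that the attribute is identified by its name alone, with no DTD involved; ElementTree exposes the reserved xml: prefix under the namespace URI used in the constant.

```python
import xml.etree.ElementTree as ET

# The reserved "xml" prefix is always bound to this namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical document with no DTD at all.
doc = ET.fromstring(
    '<doc>'
    '<sec xml:id="intro">Hello</sec>'
    '<sec xml:id="end">Bye</sec>'
    '</doc>'
)


def by_xml_id(root, wanted):
    """Return the first element whose xml:id equals `wanted`, else None."""
    for element in root.iter():
        if element.get(XML_ID) == wanted:
            return element
    return None


print(by_xml_id(doc, "intro").text)
print(by_xml_id(doc, "missing"))
```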

As an aside, I have just edited the entry on SGML in various ways which I hope will make it a more useful companion to the XML entry, mainly at the top. It still has a lot of insane minutiae about syntax, all icing and no cake, so anyone else from that era is welcome to improve it further. Rick Jelliffe (talk) 17:31, 10 August 2009 (UTC)

The section on related standards is entirely disposable. As standards they are hardly the most important or typical standards that use XML. Their only connection is that they are all ISO standards: so what? Rick Jelliffe (talk) 09:39, 18 August 2009 (UTC)

Zhang/XimpleWare/VTD-XML

There's a minor edit war going on here, where J. Zhang, founder of XimpleWare, is trying to promote his company's product by putting plugs for his VTD-XML into this article. BTW I dropped by XimpleWare via a google search and got a warning about it being a known host for browser-attack viruses. Tim Bray (talk) 21:05, 7 August 2009 (UTC)[reply]

Yes, I've been one of those deleting Zhang's contributions. He (or she?) has also been conducting a dialogue on my talk page and in private email. To be fair to Zhang, I don't think this is commercially-motivated advertising, it stems from a deep conviction that their technology is highly significant and will change the world. But for this article, the rule has to be that they first need to convince the world of its significance through means other than Wikipedia. Mhkay (talk) 09:23, 8 August 2009 (UTC)[reply]

DTD

The sentence "DTD is still used in many applications because of its ubiquity." has a strong tautological feel. I'd go ahead and edit it (to, e.g., "Despite these perceived limitations, DTD is still in widespread use.") but perhaps there's a more subtle point that I'm missing - something to do with the specific mention of applications. I.e., DTDs are still used by applications because there's so much XML out there that uses DTDs. Was that the intention motivating the mention of applications? If so, the sentence could still be changed to feel less tautological. Terry Jones (talk) 21:54, 8 August 2009 (UTC)[reply]

I've changed this to "DTD technology ...". I think what's intended here is to convey that many models are still defined with DTDs because there's a pretty good guarantee that most XML software will then support it (of course, that's not the whole story ... but a fuller analysis of why DTDs haven't gone away would perhaps not be right for this article?) Alexbrn (talk) 07:08, 9 August 2009 (UTC)[reply]

There is a new paragraph that says that DTDs are hard to read because of parameter entities. But I think it is bogus. Who thinks that XSD is easier to *read* for example? People really like RELAX NG compact syntax, and *every* statement in RELAX NG is a parameter-entity equivalent. I would remove this sentence or phrase it in some neutral way, and (because I don't think it is universally accepted) get some citation. Rick Jelliffe (talk) 08:29, 25 August 2009 (UTC)

Referencing the spec

This article needs a policy decision as to which version of the XML spec should be referenced by default, and then go through and look at every reference to the spec; check out the References section to get a feel for the current disorder. My feeling would be to stick with www.w3.org/TR/REC-xml/ everywhere that a specific edition isn't being called out, but I haven't thought it over deeply. Tim Bray (talk) 04:47, 10 August 2009 (UTC)[reply]

I think that's the way to go. Hervegirod (talk) 17:38, 10 August 2009 (UTC)

Something for Dummies?

I know this article is not intended to be "XML for Dummies" but it doesn't at all help the kind of people that I try to explain XML to when they ask me what I do for a living. I generally do this by pointing out the syntactical similarities to HTML and the fact that you can define your own tags. Probably this is not the best way to describe XML but something near this level would be a good thing, preferably before the word "lexical" is ever used, and with an example. 173.32.243.222 (talk) 22:43, 10 August 2009 (UTC)[reply]

I like the XML-for-dummies text that's now serving as the article's lede, but I don't think it comes close to what Wikipedia:Lead wants, so I'm going to have a whack at a better lede. For the moment, I'm going to stash the for-dummies text here and see if we can find a home for it somewhere else in the article. Tim Bray (talk) 21:39, 13 August 2009 (UTC)[reply]

XML (Extensible Markup Language) is a way of marking up structured documents, that is, documents in which the markup primarily indicates the content's purpose rather than its formatting. Like HTML, it uses tags (for example <para>) and attributes (for example <image file="face.jpg"/>) but there are some very important differences. Among them are:

  • Tags must be balanced: a start-tag such as <name> must be paired with an end-tag </name> (the tags, plus their content, comprise an element), or else the tag must use the empty-tag syntax (<name/>) in which case it cannot have content, but may have attributes.
  • Elements must be properly nested: if element B's start-tag occurs after element A's start-tag, then B's end-tag must occur before A's end-tag (for example, <firstname>Fred<lastname>Fish</firstname></lastname> is not allowed).
  • The values of attributes must be enclosed in matching single- or double-quotes.
  • The document must contain exactly one top-level element (known as the root element) that encloses the rest of the elements.

Documents that meet the above four requirements, and certain others, are said to be well-formed. All XML documents must be well-formed.
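The four requirements above can be tried against a parser. This is a minimal sketch using Python's standard xml.etree.ElementTree, which is a non-validating parser and so checks well-formedness only; the helper name is_well_formed and the sample snippets are made up for illustration, reusing the tag names from the text.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# (description, document, whether it should be well-formed)
samples = [
    ("balanced tags", "<name>Fred</name>", True),
    ("empty-tag with attribute", '<image file="face.jpg"/>', True),
    ("missing end-tag", "<name>Fred", False),
    ("improper nesting",
     "<p><firstname>Fred<lastname>Fish</firstname></lastname></p>", False),
    ("unquoted attribute value", "<image file=face.jpg/>", False),
    ("two top-level elements", "<a/><b/>", False),
]

for description, document, expected in samples:
    print(f"{description:26} {is_well_formed(document)} (expected {expected})")
```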

  • Instead of using a more or less fixed set of elements and attributes, as in HTML, designers may create their own.
  • Optionally, XML documents may be tested against a set of formally-declared rules (known as a DTD or Schema) that specify how the elements can be arranged and what attributes they have. This is called validation.

HTML elements have built-in rendering support in web browsers, but XML documents must be accompanied by a cascading style sheet in order to be displayed graphically in a browser. XML documents may also be rendered online or printed using specialized applications.

Backwards and in heels

The lede still showed its debt to what must have been tutorials, guidelines for use, or marketing materials about why XML mattered. I compressed and reworded it, and moved some material into a ref. See if it still scans.

The different uses of 'XML language' v. 'XML dialect'; and 'specification' v. 'standard' should be cleared up, in all documents about XML. It's a big task, so let's start here :-) Someone who's done more work in the field should add a better definition of an XML dialect to the list of key terms. +sj+ 00:50, 12 August 2009 (UTC)[reply]

The new lede is not terrible, but I don't think there's consensus in the XML community that "dialect" is the correct term. My impression is that when people talk about XML-based markup vocabularies (the only term that is somewhat blessed in normative text, cf the namespaces spec), they use the term "XML Languages". So I disagree with your global move to "dialects". Any other opinions? Tim Bray (talk) 14:57, 12 August 2009 (UTC)[reply]
I hadn't come across this term "lede" before, but I see it's American slang, so that's not surprising. I think the introductory text before the table of contents should be much more concise, and it should describe XML for what it is, not for how it differs from HTML - I don't think that one should assume that the people who want to know what XML is will already be familiar with HTML, and describing it in terms of the differences is very unhelpful to those who aren't. Mhkay (talk) 02:46, 14 August 2009 (UTC)[reply]

Problems and errors

I see that in the course of a few days recently, this article has been almost entirely rewritten. That means we now have a preponderance of the opinions and viewpoints of just a few editors where previously we had the combined consensus of hundreds of contributors.

Ho hum.

Here are the first few problems I see already:

  1. The lead should summarise the article, not the technical spec, or try to be a mini-howto
  2. The bold characters in the lead should emphasise the name of the article, not 'file="face.jpg"'
  3. The fundamental concept within an XML document is the element, not the 'tag'
  4. XML text documents today are increasingly considered to be mere serialisations of in-memory XML DOMs (See X/HTML5)
  5. 'Elements must not overlap' is definitely not the second most important thing for the layman to know about XML! Nor is it the second part of the article the lede is summarising
  6. The theory that 'XML' has three distinct colloquial meanings is pure unreferenced personal original research and should be cited or removed immediately
  7. Our new article seems not to have grasped character encoding at all: Having said 'Almost every legal Unicode character may appear in an XML document', it goes on to give an example which restricts the character encoding to "encoding='ISO-8859-1'", which is almost unheard-of in practical XML usage
  8. It even goes so far as to say that 中 should be referred to as &#20013; or &#x4e2d; and teaches us how to avoid the correct use of Unicode altogether with "I &lt;3 J&#xF6;rg" (and all this in a Unicode XHTML document downloaded from Wikipedia!) And we used to have the valid example '<俄語>Китайська мова</俄語>' to make the whole point succinctly.

I can't get up the energy to carry on - I'm only about a quarter of the way through. Only to note that we've also gone down from about 50 cited sources to 25 as well.

I find it hard that some people's vanity was so huge that they felt that they could do better in a few hours than all that collaborative effort over so many years. Of course there were issues with the old text but there are babies and there are bathwaters too.

I don't know where to start.

--Nigelj (talk) 10:25, 13 August 2009 (UTC)[reply]
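For anyone following items 7 and 8, a short sketch with Python's standard xml.etree.ElementTree may help show what is at stake; the <w> wrapper element is invented for the example. The literal character, the decimal reference and the hexadecimal reference all parse to the same text, non-ASCII element names are accepted, and numeric references appear only when a document is serialised into an encoding that cannot represent the character directly.

```python
import xml.etree.ElementTree as ET

# Three spellings of the same character, U+4E2D.
literal = ET.fromstring("<w>中</w>").text
decimal = ET.fromstring("<w>&#20013;</w>").text
hexadecimal = ET.fromstring("<w>&#x4e2d;</w>").text
print(literal == decimal == hexadecimal, hex(ord(literal)))

# The example quoted in item 8: one predefined entity, one character reference.
heart = ET.fromstring("<w>I &lt;3 J&#xF6;rg</w>").text
print(heart)

# Element names may use non-ASCII characters too.
named = ET.fromstring("<俄語>Китайська мова</俄語>")
print(named.tag, named.text)

# Serialising to a restricted encoding falls back to numeric references.
ascii_bytes = ET.tostring(ET.fromstring("<w>中</w>"), encoding="us-ascii")
print(ascii_bytes)
```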

Unfortunately, with a subject like XML, there are a lot of people who know a little about it, and this tends to result over time in an article that contains a lot of things that are only half-true, and many things that are irrelevant. The new version is a great improvement, a much better place to start. Of course, further improvements are possible. Please don't accuse an expert who knows his stuff better than anyone and who volunteers his time to do this work of "vanity" - that's plain silly, and devalues the rest of your comments. Mhkay (talk) 13:37, 13 August 2009 (UTC)[reply]
I am not about to bandy real-world professional project experience, University or professional qualifications with you or anyone else with regard to this subject, but I assure you that I know more than "a little about it", thank you very much. Since I started volunteering my own time and expertise to this article in August 2005 (and to WP in general in 2004), I have seen several major re-writes of other important articles. The normal approach is to draft the rewrite somewhere in user- or talk-space and to invite others to view, comment and contribute for a few weeks or months. This ensures that the best of the old article's content and goodwill are maintained along with any new material, while the irrelevancies can be cropped. Deciding off-line amongst yourselves that a small group of you are "a bunch of highly-expert new contributors [now brought] on board" and going ahead in article-space with little general discussion is what I was referring to as a kind of vanity. Now please stop puffing yourselves up to be some superior kind of contributors, and then we can all cut out these personal attacks. And we can all get on with fixing the article, which, as is gracefully acknowledged below, is currently a little 'buggy'. --Nigelj (talk) 20:33, 13 August 2009 (UTC)[reply]
The old article was rambling, way too long, full of irrelevancies, and neglected to mention all sorts of important stuff. We've brought a bunch of highly-expert new contributors on board in the last couple of weeks and have a shorter, tighter article. I think that it's, while imperfect, quite a bit better. You're free to disagree. Tim Bray (talk) 16:56, 13 August 2009 (UTC)[reply]

Now, with respect to Nigelj's detail points:

  1. Agreed that the lede, while helpful to those seeking an introduction, is inappropriate. Will try again. Also agree about inappropriate boldfacing and tag/element.
  2. I'd have to disagree with the claim that "XML text documents today are increasingly considered to be mere serialisations of in-memory XML DOMs", while agreeing that that is the HTML5 viewpoint. For the primacy of syntax over object models or APIs on the Web, see Architecture of the World Wide Web
  3. "The theory that 'XML' has three distinct colloquial meanings is pure unreferenced personal original research and should be cited or removed immediately". This is an important truth in the market, and I think important to know for someone who comes here wondering "what is this XML I'm supposed to use to solve the problem?" I understand the no-original research principle and will look for supporting citations.
  4. "Our new article seems not to have grasped character encoding at all: Having said 'Almost every legal Unicode character may appear in an XML document', it goes on to give an example which restricts the character encoding to "encoding='ISO-8859-1'"... If that's what the current text made you think it said, that's a bug & should be fixed.
  5. "It even goes so far as to say that 中 should be referred to as" - The point was that if you can't type in 中 on your keyboard, or you're stuck with using ISO-latin, you can still have it in your XML document. If there's any suggestion of "should", that's a bug. Tim Bray (talk) 16:56, 13 August 2009 (UTC)[reply]
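Point 5 can be illustrated with a minimal sketch in Python's standard library. It assumes a document whose bytes are restricted to ISO-8859-1 and shows that a numeric character reference still puts 中 into the parsed content; the helper name `parsed_text` and both sample documents are made up for this illustration and are not taken from the article.

```python
import xml.etree.ElementTree as ET


def parsed_text(doc: bytes) -> str:
    """Parse an XML document given as bytes and return the root element's text."""
    return ET.fromstring(doc).text


# Bytes restricted to ISO-8859-1: 0xE9 is "é", and 中 cannot be written
# directly, so it appears as the numeric character reference &#20013;.
latin1_doc = (
    b'<?xml version="1.0" encoding="ISO-8859-1"?>'
    b"<word>caf\xe9 &#20013;</word>"
)

# The same content encoded as UTF-8, with 中 typed in literally.
utf8_doc = (
    '<?xml version="1.0" encoding="UTF-8"?><word>café 中</word>'
).encode("utf-8")

# Both byte sequences yield the same Unicode string after parsing.
print(parsed_text(latin1_doc) == parsed_text(utf8_doc))
```

The encoding declaration only constrains how the bytes are read; the set of characters the document can contain is unaffected, which is the "can", not "should", reading described above.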
Hi Tim. With regard to your point 2, Architecture of the World Wide Web says, "If the URI owner has provided more than one representation (in different formats such as HTML, PNG, or RDF; in different languages such as English and Spanish; or transformed dynamically according to the hardware or software capabilities of the recipient), the resulting representation may depend on negotiation between the user agent and server." This Recommendation (although published in 2004, and so much older than what I meant by 'today') seems set on the idea that what is returned over the wire is a representation of the resource: the resource is something else that is held on the server, and among the list of representation example formats, XML could easily have been added to HTML, RDF etc, I think. What is made more clear in current W3C documents like HTML5 is that the resource itself may be considered to be the in-memory DOM. The XML 1.0 spec actually begins, "Extensible Markup Language, abbreviated XML, describes a class of data objects called XML documents" - data objects called documents, not text documents. I only mentioned this as I felt it might be a good place to start, not explicitly but in the mind's eye as it were: like the BNF in the XML spec, an XML document is... a prologue, an element and some optional things like comments, processing instructions and whitespace. Then an element is... two tags and some optional content possibly including other elements, or an empty tag. A start-tag or an empty tag may contain attributes, an attribute is... etc. At some point we get down to numeric and named entities and that's where we get to &amp;, &#20013; and why they exist etc. I'm just suggesting one possible plan or structure that means we don't have to gloss over important points with partial explanations at the beginning, as can happen when we introduce them too soon to want, or be able, to explain them properly. 
If we start with the big picture and keep it simple, we can drill into detail later on, still building a true representation of what XML actually is and isn't, what it can do and what it can't, what it can represent, what can be done to it etc. Just a suggestion. --Nigelj (talk) 21:23, 13 August 2009 (UTC)[reply]
This turns out to be a really important argument that keeps coming back. I'm pretty convinced that one of the reasons the Web has worked so well is that it doesn't try to do APIs and object models across the net, but interoperates on the basis of syntax. If you read the Webarch document carefully I'm pretty sure it's clear that representations are just what you say they are conceptually, but they manifest in the real world as sequences of bytes. I'm also 100% sure that the XML 1.0 spec is very clear that what it's defining is a syntax that applies to sequences of Unicode characters. I agree that HTML5 is trying a bold experiment in a new direction; tie the DOM and the syntax together at the hip. While some of the HTML5 work looks likely to be very popular, e.g. <video> & <canvas>, this other model/syntax stuff remains at this point a very interesting science experiment. One of the reasons XML caught on is that it didn't initially stray outside the realm of syntax. The fact that there are multiple API flavors (stream/DOM/whatever) is a beneficial side-effect of this. Tim Bray (talk) 21:36, 13 August 2009 (UTC)[reply]
There are certainly many people who like to think of the data structure as primary, and the lexical form as a "mere" serialization. Often that's the way I think, and it's certainly the way I encourage people to think when they are programming against XML. However, it's not what the specs say, and I think the specs carry some authority. Apart from anything else, the lack of a definitive and universal data model for XML is one of its most notable features. There are many data models for XML, not one, and they are all derived from the textual syntax, not the other way around. Mhkay (talk) 03:11, 14 August 2009 (UTC)[reply]
XML manifestly is a syntax/language not an API or object model. There is no other way to read the XML specification. The element may be more important than the tag, but in XML there are no elements without tags. It certainly might be good to start with the XML information set, but that is something only implicit in XML and variable. Rick Jelliffe (talk) 10:39, 20 August 2009 (UTC)[reply]
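The point that several API flavours are derived from one textual syntax can be sketched as follows. This is only an illustration, assuming Python's standard `xml.etree.ElementTree` (tree-style) and `xml.sax` (stream-style) modules; the names `SAMPLE`, `names_via_tree` and `names_via_stream` are hypothetical.

```python
import xml.sax
import xml.etree.ElementTree as ET

# One piece of XML text; everything below is derived from these characters.
SAMPLE = "<greeting><to>World</to></greeting>"


def names_via_tree(text: str) -> list:
    """Build an in-memory tree, then walk it in document order."""
    return [element.tag for element in ET.fromstring(text).iter()]


class _NameCollector(xml.sax.ContentHandler):
    """Record element names as start-tag events arrive from the parser."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


def names_via_stream(text: str) -> list:
    """Process the same text as a stream of events, without building a tree."""
    handler = _NameCollector()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.names


print(names_via_tree(SAMPLE))
print(names_via_stream(SAMPLE))
```

Neither API is privileged here: both are views computed from the same sequence of characters, which is the direction of derivation argued for in the comments above.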

Draft alternate lede[edit]

Here's a draft of an alternate text for the lede (would also replace the "Meanings" section). I've tried to follow Wikipedia:LEAD; I think this draft aligns nicely with the TOC. Input? Tim Bray (talk) 23:03, 13 August 2009 (UTC)[reply]

XML (Extensible Markup Language) is a set of rules for encoding documents electronically. It is defined in the XML 1.0 Specification produced by the W3C and several other related specifications; all are fee-free open standards.[1] As of 2009, hundreds of XML-based languages have been developed,[2] including RSS, Atom, SOAP, and XHTML. XML has become the default file format for most office-productivity tools, including Microsoft Office, OpenOffice.org, AbiWord, and Apple's iWork.

XML’s design goals emphasize simplicity, generality, and usability over the Internet.[3] It is a textual data format, with strong support via Unicode for the languages of the world. Although XML’s design focuses on documents, it is widely used for the representation of arbitrary data structures, for example in web services.

There are a variety of programming interfaces which software developers may use to access XML data, and several schema systems designed to aid in the definition of XML-based languages.

For further coverage of the many XML languages and technologies, see the XML Category.

Looks good to me. It's accurate and not over-technical. I'm not 100% comfortable with "encoding", because of the potential confusion with character encoding, but I can't think of anything better. I would leave out the quotes around "schema", which suggest that the word is somehow being used improperly. I do wonder what "usability over the Internet" is actually supposed to mean: usability is a measure of the efficiency and effectiveness of a user performing a task, and I don't know what the task is - presumably doing *something* over the internet, but I'm not sure what. Mhkay (talk) 02:37, 14 August 2009 (UTC)[reply]
Agreed, mostly. I'd welcome a better word than "encoding", sigh. I took the quotes off schema. "Usability" is taken more or less directly from the spec's "goals" section which says "straightforwardly usable" Tim Bray (talk) 03:40, 14 August 2009 (UTC)[reply]
I don't like the first verb being "provides". The lead sentence should identify up-front what the thing is. i.e., the first verb used should be "is". --Cybercobra (talk) 04:24, 14 August 2009 (UTC)[reply]
Works for me. Tim Bray (talk) 05:28, 14 August 2009 (UTC)[reply]
I would strike the last sentence; it's an improper self-reference. --Cybercobra (talk) 06:35, 14 August 2009 (UTC) Also, XML's SGML origins should be mentioned. Its error handling might also merit a mention/sentence; this I'm less sure about. --Cybercobra (talk) 06:39, 14 August 2009 (UTC)[reply]
Thanks for all the detail edits. Revised the last sentence, I do think it's important to squeeze in a category pointer in the lede somehow or other. Got a better idea? (Surely there must be a common Wikipedia idiom for doing this.) As for SGML, I think its importance is now historical and thus the coverage in the Origins section is fine. As for error-handling, the draft lede is already kind of long. Tim Bray (talk) 07:25, 14 August 2009 (UTC)[reply]
Actually, the lede can be up to 4 paragraphs. --Cybercobra (talk) 06:19, 15 August 2009 (UTC)[reply]

Typographical conventions[edit]

The current article has inconsistencies. The following constructs appear in the article:

  • individual characters, for example A and 中
  • syntax characters, e.g. < and &
  • small chunks of XML, e.g. <line-break/>
  • multi-line examples, e.g. at the bottom of the "Key Concepts" section

The potential typographical tools are: double-quotes, bold, italic, and monospace via the "code" and "source" elements. Does Wikipedia have a set of rules that we can follow? I'm cleaning up some of the more obvious inconsistencies but it would be nice to standardize carefully end to end. Tim Bray (talk) 05:19, 15 August 2009 (UTC)[reply]

I've now made them consistent. Code in <code></code>, characters in quotes. Bold should only be used when introducing terminology and italic is only used for emphasis or to format the title of a work, not for quoted text. That leaves quotes and monospace, which is what I went with. --Cybercobra (talk) 06:12, 15 August 2009 (UTC)[reply]

Disposable sections[edit]

Copying Rick's remark from above:

The section on related standards is entirely disposable. As standards they are hardly the most important or typical standards that use XML. Their only connection is that they are all ISO standards: so what? Rick Jelliffe (talk) 09:39, 18 August 2009 (UTC)

+1 on removing that section. Also I note the four trailing sections:

  • 10 See also
  • 11 References
  • 12 Further reading
  • 13 External links

They look redundant and I can't figure out which rule sorted which references into which section. Could someone with more Wikipedia experience check it out and see if there's scope for reorg/simplification? Most of the contents look perfectly reasonable. Tim Bray (talk) 15:03, 18 August 2009 (UTC)[reply]

There was 1 redundant link, everything seems to be in the right section. The External links may need pruning. --Cybercobra (talk) 17:28, 18 August 2009 (UTC)[reply]

NeoOffice, AbiWord[edit]

Someone seems to think these applications are important enough to justify a mention in the lead paragraph of the XML article. I don't think they are. I have no views on the merits of the products, but they get 0.5m and 2.3m Google hits respectively, and there are thousands of XML applications with more than that: we can't mention them all. (We don't mention SVG, for example, and that gets 13m.) It's a list of examples that's there to demonstrate the truth of an assertion, and three examples is enough. Any more than that is advertising. Mhkay (talk) 21:22, 22 September 2009 (UTC)[reply]

The lede should summarise the main points of the article[edit]

At the moment it seems to cover its own, different ground.

The trouble with this is that, if there is anything controversial in the lede, there is no room to explain the finer points without the lede getting unduly long. I'm not happy with the implication of saying "As of 2009 [...] XML-based formats have become the default for [...] Microsoft Office (Office Open XML) [and] OpenOffice.org (OpenDocument)". While MS Office may be just changing over in 2009, XML-based formats have been the default on OOo for a decade (since it was StarOffice). Lumping them together in this way is unfair on the reader, who gets a false impression unless they follow links and do their own research. It comes from trying to make too many points in one sentence, which in turn is brought about by trying to cover new ground in the lede that is not properly discussed in the body of the article, which itself is in contravention of the guidelines.

The main sections of the article are

  1. Key terminology
  2. Characters and escaping
  3. Well-formedness and error-handling
  4. Schemas and validation
  5. Related specifications
  6. Use on the Internet
  7. Programming interfaces
  8. History

We either need to discuss the appropriateness of having these sections in the first place, or be content to summarise them in the lede. --Nigelj (talk) 08:49, 4 October 2009 (UTC)[reply]

The lede can contain more than just mere summary of subsections; in the case of the sentence in question, in the context of its surrounding paragraph, it is demonstrating the widespreadness of XML. The statement is not inaccurate; it does not say that no XML office document formats were the default prior to that date, but rather that as of that date all the major programs in the area have standardized on XML (vs. just one of them). --Cybercobra (talk) 09:19, 4 October 2009 (UTC)[reply]

XML vs SGML[edit]

Perhaps there should be a section on what exactly the differences are or what form the special case takes. (Or maybe expand on "Sources".) 118.90.15.97 (talk) 09:42, 28 November 2009 (UTC)[reply]

Reasons for Popularity[edit]

Why is XML so popular? Why do so many protocols use XML, despite its extreme inefficiency? Could we cover that in the article? --Matthew Bauer (talk) 04:12, 5 December 2009 (UTC)[reply]

I think you would have difficulty drafting text for the article that answers that question in an objective way. You would certainly have to find published material that discusses the question, and ensure that any opinions expressed are attributed to reputable sources.
My own view (and you might find published articles or talks that express this) is that the primary reasons were (a) the fact that XML handled both documents and data at a time when the web badly needed a technology that could do both; (b) the fact that it was good enough to meet all the requirements; (c) the fact that it was very cheap to implement; (d) the fact that there were no major competitors at the time it was launched, and that all the influential players (W3C, Microsoft, Sun, IBM, Oracle etc) endorsed it. There had been alternative technologies with better performance and better functionality for years (such as ASN.1) but they were horrendously expensive to implement. Mhkay (talk) 23:23, 6 December 2009 (UTC)[reply]

Apparent Contradiction[edit]

Under the heading "Comments," the article states

The string " -- " (double-hyphen) is not allowed, and entities must not be recognized within comments.

An example of a valid comment: "<!-- no need to escape <code> & such in comments -->" 

So is the double-hyphen allowed in comments, or not? Moioci (talk) 23:47, 7 December 2009 (UTC)[reply]

The double-hyphen is used to start and terminate the comment; other double-hyphens aren't allowed in the text of the comment apparently.

<!-- This is valid -->

<!-- This -- is -- not -->

Will try and edit to clarify. --Cybercobra (talk) 01:43, 8 December 2009 (UTC)[reply]
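For anyone who wants to check the rule themselves, here is a minimal sketch using the parser in Python's standard library. The helper name `is_well_formed` and the wrapper element `<r>` are made up for the test; the comment texts are the ones quoted in this thread.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text, False if it reports an error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# The delimiters <!-- and --> are fine; "<" and "&" need no escaping inside.
print(is_well_formed("<r><!-- This is valid --></r>"))
print(is_well_formed("<r><!-- no need to escape <code> & such in comments --></r>"))

# A double-hyphen inside the comment text is rejected by the parser.
print(is_well_formed("<r><!-- This -- is -- not --></r>"))
```

This only shows how one conforming parser behaves; the XML 1.0 specification's comment production remains the authority on what is allowed.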

Don't make the mistake of thinking that every nuance of the spec has to be in this article. It's supposed to be an encyclopaedic overview, not a detailed reference. OK, I know there are people who will try to write XML using Wikipedia as their only source of information, but they are not our target audience. Mhkay (talk) 13:14, 8 December 2009 (UTC)[reply]

Criticisms of XML missing[edit]

I notice that on the 6th of August, any discussion of the issues of XML became limited to in-place comments in the corresponding sections. Whilst this is appropriate for someone reading the complete article, anyone skimming the resource (as is typically done when someone is researching) will be left with an artificial perspective. Whilst I can understand that the previous list could well have suffered from endless expansion, it is nonetheless appropriate for an article on any subject to detail the views of its detractors, and the previous list did appear to be referenced reasonably. LinaMishima (talk) 17:10, 22 December 2009 (UTC)[reply]

Personally, I think Wikipedia articles (especially on a technical subject) are best when they stick to facts. Opinions about strengths and weaknesses will always be opinions, even if they were first aired in a place other than Wikipedia. Note that it wasn't just the negative opinions that went in that rewrite, it was also the positive ones. Mhkay (talk) 13:59, 23 December 2009 (UTC)[reply]
That someone has performed analysis, praise or criticism is a fact - we don't get to discount this, especially when those involved are notable, their comments hold weight, and may even be backed up with yet more verifiable details. If anything, objective details on a subject are already easy to find without having to visit wikipedia. Without collecting such information, researchers cannot use wikipedia as a summary source - they are ultimately forced elsewhere. HTML talks about features missing, DisplayPort is compared with other formats, SOAP details such information, JSON also has comparisons. More notably, Opera (web browser) (FA) features discussion of its reception, as does Twitter (GA) and YouTube (GA). Perhaps the most appropriate thing to do would be to merge the separate sections from articles on similar technologies, and write an XML-focused summary for the section. LinaMishima (talk) 05:04, 24 December 2009 (UTC)[reply]
Yeah, around that time, in August 2009, we were told that we were "a lot of people who know a little about it", and that someone had "brought a bunch of highly-expert new contributors on board in the last couple of weeks". We were told not to "accuse an expert" of not being better than all of us put together. So, I left them to it. I can't be bothered with that attitude on WP or anywhere else. --Nigelj (talk) 12:27, 24 December 2009 (UTC)[reply]

Music Markup Language[edit]

Please consider adding this external link: Music Markup Language. -- Wavelength (talk) 18:28, 14 April 2010 (UTC)[reply]

(1) There's already an article on it (Music Markup Language) and internal links are preferred to external links (2) It is already mentioned in the categories and list articles linked in the 'See also' section (3) Why would this particular language merit mentioning? There are tons of XML-based languages, it's not feasible to list every single one on the page; only a very few examples are given, solely in the last paragraph of the introduction, and those ones are wildly popular; there is no indication Music Markup Language is anywhere near as widely used. --Cybercobra (talk) 18:44, 14 April 2010 (UTC)[reply]
Thank you for your reply. I did not notice the article on it. I accept your reasoning for not listing it as an external link.
-- Wavelength (talk) 16:47, 15 April 2010 (UTC)[reply]

I would like to see more resource to the tutorial resource[edit]

Hello, members of this community. I think this article is very informative, professional and so on. Well-known tutorials such as w3schools were used in writing this article. But I would also like to see tutorials here which are not so well known but are very useful for beginners too, such as http://phpforms.net/tutorial/tutorial.html What do you think about it? Thank you in advance. —Preceding unsigned comment added by Malinari (talk • contribs) 16:35, 21 April 2010 (UTC)[reply]

Wikipedia is not a tutorial: see What Wikipedia is not. That policy statement doesn't directly address the level at which an article should be written, for example whether an article on XML should be written for the general public or for professional programmers. But in my view, this article is pitched at about the right level. There are plenty of other sources if you want a more gentle introduction or (conversely) something addressing the formal computer science audience. Mhkay (talk) 13:26, 22 April 2010 (UTC)[reply]

The above commentary notwithstanding, this entry is incredibly obtuse, only slightly more readable than a reference book. I come here every few months looking for some useful information as to what XML is, forgetting my last abortive attempts to understand it here, the details of what it is made up of, etc, something with which I can fill in my own lacking knowledge of it. Instead I read abstruse cryptic commentary, assumptive descriptions, and ambiguous terminology that presumes the reader knows enough about the topic that it would appear unnecessary to read the entry. I am a programmer from the old school, and an EE so I have had my soirees with technical manuals, but this prose is so dense and undefinitive as to the terms used, I find my mind wandering away from it, not mulling over the information in it. It doesn't have to be a tutorial to be understandable.

XML (Extensible Markup Language) is a set of rules for encoding documents in machine-readable form.[edit]

Would it not be more appropriate to say that xml is for encoding in human-readable form?

What is machine-readable form supposed to mean?

UndercoverAgents (talk) 18:52, 7 July 2010 (UTC)[reply]

Its sole purpose is to interpret human-readable content/context and to turn it into machine-readable form through a medium/interface. XML can be read/interpreted and parsed from within compiled and ascii, which suggests both application and text would be valid (personal opinion). Daemondevel (talk) 00:57, 4 August 2010 (UTC)[reply]

Spelled-out title[edit]

The W3C defines XML as follows:

Extensible Markup Language, abbreviated XML, describes a class of data objects called XML documents ... and so on

The important part here is that the article should first use the fully spelled out name and then the abbreviation. This is not only in accordance with the standard, but also follows general rules of good writing in English. Kbrose (talk) 03:15, 4 August 2010 (UTC)[reply]

This might be true if there were consensus that "XML" is an abbreviation of the three-word form, but many of us just don't believe that. Tim Bray (talk) 05:39, 4 August 2010 (UTC)[reply]

"Extensible Markup Language" first?[edit]

We're going to need to sort out what it should say at the top of the article:

Current candidates:

Extensible Markup Language (XML) and XML (Extensible Markup Language)

The first is supported by the English convention that a full name is listed first, and wording from the W3C spec: "The Extensible Markup Language (XML) is..." The second by the fact that the title of the article is (appropriately) XML and since the three-letter version is used rather than the three-word version in approximately 100% of spoken and written discourse.

Also note that XML is *not* an abbreviation or an acronym, it is just another name for the same thing.

My vote would be that the primary name should be the same as the title of the article and should reflect common usage. But it's not a matter of life or death. What do others think? Tim Bray (talk) 03:19, 4 August 2010 (UTC)[reply]

Of course it's an abbreviation, even the standards documents specifically say so, as quoted above. The title of the article should also be the full name, ideally. The reason that it isn't, is that too many writers here are suffering from Acronymitis. Almost all articles of computer networking protocols use the full protocol name as title, even for the most common of protocols, such as IP. Kbrose (talk) 03:27, 4 August 2010 (UTC)[reply]
There's no need for the title and the first name mentioned in the lede to match, particularly for acronyms where the full name is less common: NATO, Laser; see WP:SINGULAR on acronyms. --Cybercobra (talk) 03:56, 4 August 2010 (UTC)[reply]
The form in the NATO entry looks better than either alternative to me. "Extensible Markup Language or XML" correctly reflects that one is not an abbreviation of the other. I'd put the more common form first, but that's hard to get too excited about. Tim Bray (talk) 05:42, 4 August 2010 (UTC)[reply]
Empirically, the first letter of "Extensible" is not "X". It is generally agreed upon that writing "eXtensible Markup Language" is an error, which is another symptom of the fact that "XML" and "Extensible Markup Language" are two names for the same thing, one immensely more popular and widely used than the other. When I drafted the first sentence of the XML specification, I was insufficiently percipient to have predicted which would catch on. Tim Bray (talk) 05:38, 4 August 2010 (UTC)[reply]
XML is definitely an acronym, as evidenced by the fact that "ML" is "Markup Language" and "X" is generally considered an acceptable character abbreviation to represent extensible, at least in part because extensible and xtensible share the exact same phonetics. For other examples see XP (eXtreme Programming, eXperience Point), XSL (eXtensible Stylesheet Language), XBML (eXtended Business Modeling Language, eXtensible Battle Management Language), XMP (eXtensible Metadata Platform), and so on. The oXygen editor's product name is a play on the "X" acronym use, so with many counter examples, I would argue that it's not generally agreed that eXtensible is incorrect. It may not be a well-formed acronym, but it does have the most important semantic mapping characteristic of an acronym and in the one case that it doesn't take the first letter mapping, it uses an acceptable replacement. That's the first point, so if I replace XML with DNA, does the second point hold up?
"The second by the fact that the title of the article is (appropriately) DNA and since the three-letter version is used rather than the three-word version in approximately 100% of spoken and written discourse."
In this case, it's obvious that the typical rules of English apply, even though most people probably don't even know what DNA stands for any more. I can definitely see (and agree with) the logic of mapping from the commonly seen and heard acronym back to the expanded form when the acronym serves as a mental key, but that's inconsistent with currently correct English usage. It's essentially guaranteed that acronyms are always going to be more popular and more widely used than their expansions because that is their very purpose, so your argument would apply to all acronyms. MaxxD (talk) 07:10, 4 August 2010 (UTC)[reply]
(Okay, I'll argue with myself...) XML is technically an initialism, not an acronym or abbreviation because it is not a pronounceable word, but the point that it does represent the initials of the expanded form (notwithstanding the ex/x issue) is reasonable. However, The Chicago Manual of Style (CMS) states, "Occasionally, too, it makes sense to use the acronym first and put the full name in parentheses, if the acronym in question is so familiar to your expected audience that it almost goes without explication." [3] and XML has certainly achieved this distinction, so writing it as "XML (Extensible Markup Language)" is not only perfectly okay per the CMS, but almost certainly preferred. MaxxD (talk) 09:36, 4 August 2010 (UTC)[reply]

Details of valid characters[edit]

The section "details of valid characters" is getting absurdly detailed, especially as it appears so close to the start of the article. It's simply not interesting to the average reader who comes here wanting an overview of what XML is - the kind of people who want this level of detail are much more likely to go to the specs than to come here. I think the usual Wikipedia solution is to move the material out to a separate article, and I propose doing that. Mhkay (talk) 11:06, 13 August 2010 (UTC) (Now done.)[reply]

By definition?[edit]

Under "key terminology" it is stated: "By definition, an XML document is a string of characters.". By what definition, pray? That's not what the definition of "document" in the XML 1.0 rec says. It might be nice if it did, but it doesn't. Instead it mumbles about "textual objects", thus leaving (deliberately?) ambiguous the question of whether a document is a sequence of characters or a sequence of octets. Mhkay (talk) 23:23, 25 August 2010 (UTC)[reply]

This description is only useful to people who already know what XML is useful for[edit]

Hi!

it would be helpful if someone re-wrote this to explain why XML exists, as this would justify the entry. —Preceding unsigned comment added by 86.9.13.234 (talk) 14:49, 8 September 2010 (UTC)[reply]

"&" and "<" in XML entity values[edit]

  • The article itself states they "may never appear in content."
  • The matching reference's summary states they are allowed (just not recommended).
  • The actual reference (i.e. the specs) states they "MUST NOT appear in their literal form, except when..." (certain cases like when inside CDATA).

So should the first two be fixed to reflect the latter? Can someone offer correct fixes then? -109.66.203.215 (talk) 08:52, 16 November 2010 (UTC)[reply]
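The behaviour the spec describes can be checked with a short sketch against the parser in Python's standard library. The helper name `content_of` and the sample strings are hypothetical, and this shows element content only, not entity declarations.

```python
import xml.etree.ElementTree as ET


def content_of(text: str):
    """Return the root element's text, or None if the parser rejects the input."""
    try:
        return ET.fromstring(text).text
    except ET.ParseError:
        return None


# Literal "&" or "<" in ordinary character data is rejected.
print(content_of("<a>x & y</a>"))
print(content_of("<a>x < y</a>"))

# Escaped with the predefined entities, the same characters are accepted.
print(content_of("<a>x &amp; y &lt; z</a>"))

# Inside a CDATA section they may appear in their literal form.
print(content_of("<a><![CDATA[x & y < z]]></a>"))
```

So "may never appear in content" is too strong as a blanket statement: the literal forms are excluded from ordinary character data but permitted in places such as CDATA sections, which matches the wording quoted from the spec in the third bullet.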

History needs a bit of cleanup[edit]

Fixed a couple of things, but the section could (and should) be much better written. A very short to-do list, in decreasing order of importance:

  • the link to Kimber's blog is totally out of place;
  • more supporting citations are needed;
  • more historical sources should be found and linked to.

Andy Monakov (talk) 11:32, 15 September 2011 (UTC)[reply]

On the number-of-weeks issue, I can't get Jon's count to work in my head. I seem to remember that we were working in at least part of August, and when I pop up a 1996 calendar I have trouble getting the week count down to his number. However, it is absolutely the case that the first wave of work was in the August-November timeframe, so I thought it best just to say that rather than arguing over the number of weeks. On the section in general, I agree it's rambling and messy, that may have been partially a consequence of too many of the people who were involved wanting their opinion/contribution included. I think it would be a good idea for someone else to be bold and clean it up. Tim Bray (talk) 17:35, 20 September 2011 (UTC)[reply]

Jsonix[edit]

Is it worth including a link to Jsonix? --Gak (talk) 12:22, 21 September 2011 (UTC)[reply]

A bit of Web searching reveals no uptake, and also confluence.highsource.org is offline. So, no. Tim Bray (talk) 23:04, 30 September 2011 (UTC)[reply]

XML Abuse[edit]

XML, which was developed for text markup, is being used as a general serialization container for any data structure.

Should we add a section about XML Abuse?

Or maybe just a reference at the header should be added?

What do you think?

I would like to add a reference to the header since this is an important problem. —Preceding unsigned comment added by 87.217.111.16 (talk) 07:03, 20 June 2010 (UTC)[reply]

I think you would find it very hard to get consensus on any statement (let alone one short and pithy enough to go in the article lead) about when XML is and is not appropriate. Certainly, the opinions on the page you cite are far too debatable to go here. Let's keep this article factual and concise. It should tell people what XML is and does, not try to precis all the debates about its whys and wherefores. This is an encyclopedia. Mhkay (talk) 21:44, 21 June 2010 (UTC)[reply]
I would like to see mention to the XML abuse as this is an extended practice: What is the worst abuse of XML that you have seen? —Preceding unsigned comment added by 79.144.221.71 (talk) 18:41, 12 December 2010 (UTC)[reply]
XML abuse is a serious, real-world problem and as such it should be addressed by the Wikipedia article. Things have cooled off now that the current buzz is about anything with the word "cloud" written over it, but it was quite terrible not many years ago. XML probably was, and still is, the most widely misunderstood and heavily buzzworded technology of this century so far, and acts as a selling point of anything that uses it, regardless of purpose and schema. People (especially pointy-haired bosses) think anything which has "XML" in the box will automagically talk to anything else with the same label. 08:55, 13 January 2011 (UTC)

A very interesting insight on two potential examples of XML abuse can be quoted from Håkon Wium Lie, Opera's CTO (e.g. here): he describes OOXML and ODF as essentially "memory dumps with angle brackets". 08:55, 13 January 2011 (UTC)

08:55, 13 January 2011 (UTC)  —Preceding unsigned comment added by 217.125.117.197 (talk)  

Thank you very much for keeping the Criticism section and making clear that XML should not be used to represent structured data, but narrative documents. — Preceding unsigned comment added by 193.127.207.152 (talk) 08:09, 13 October 2011 (UTC)[reply]

Large commented out section under Well-formedness and error-handling[edit]

The section in question can be found at the end of this section: http://en.wikipedia.org/w/index.php?title=XML&action=edit&section=8

Are there plans to use that? If not it should be removed, although it does seem to contain some valid information. — Preceding unsigned comment added by Nick Garvey (talk • contribs) 04:03, 2 November 2011 (UTC)[reply]

Hidden commented out sections like this are a menace. I've moved it here from the article:
Extended content

Tree representation of an XML Document

The nesting of elements leads directly to a tree representation for an XML document. The root element becomes the root of a tree. Because every element is composed of a sequence of other elements and character data, it is easy to determine the children of each element. Just take each item in the sequence and create a new child node. Here is an example of a structured XML document:

 <recipe name="bread" prep_time="5 mins" cook_time="3 hours">
   <title>Basic bread</title>
   <ingredient amount="8" unit="dL">Flour</ingredient>
   <ingredient amount="10" unit="grams">Yeast</ingredient>
   <ingredient amount="4" unit="dL" state="warm">Water</ingredient>
   <ingredient amount="1" unit="teaspoon">Salt</ingredient>
   <instructions>
     <step>Mix all ingredients together.</step>
     <step>Knead thoroughly.</step>
     <step>Cover with a cloth, and leave for one hour in warm room.</step>
     <step>Knead again.</step>
     <step>Place in a bread baking tin.</step>
     <step>Cover with a cloth, and leave for one hour in warm room.</step>
     <step>Bake in the oven at 180(degrees)C for 30 minutes.</step>
   </instructions>
 </recipe>
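
As a rough illustration of the tree described above, the following Python sketch parses a shortened version of the recipe and lists one line per element, indented by depth. The `outline` helper and the `RECIPE` constant are names invented for this example; only the standard-library `xml.etree.ElementTree` module is assumed.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the recipe document (assumption: trimmed for brevity).
RECIPE = """<recipe name="bread" prep_time="5 mins" cook_time="3 hours">
  <title>Basic bread</title>
  <ingredient amount="8" unit="dL">Flour</ingredient>
  <ingredient amount="10" unit="grams">Yeast</ingredient>
  <instructions>
    <step>Mix all ingredients together.</step>
    <step>Knead thoroughly.</step>
  </instructions>
</recipe>"""


def outline(element, depth=0):
    """Return one indented line per element, visiting the tree depth-first."""
    lines = ["  " * depth + element.tag]
    for child in element:  # iterating an Element yields its child elements
        lines.extend(outline(child, depth + 1))
    return lines


tree_lines = outline(ET.fromstring(RECIPE))
print("\n".join(tree_lines))
```

The root element `recipe` appears at depth 0, and each nested element becomes a child node one level deeper.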
<!-- Not well-formed fragment -->
<title>Book on Logic<author>Aristotle</title></author>

One way of writing the same information in a way which could be incorporated into a well-formed XML document is as follows:

<!-- Well-formed XML fragment -->
<title>Book on Logic</title> <author>Aristotle</author>
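
The difference between the two fragments can be checked mechanically. This is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` parser; the `is_well_formed` helper and the wrapping `<root>` element are hypothetical and exist only so that each fragment has a single root.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Wrap the fragment in a hypothetical <root> element and try to parse it."""
    try:
        ET.fromstring("<root>" + fragment + "</root>")
        return True
    except ET.ParseError:
        return False


overlapping = "<title>Book on Logic<author>Aristotle</title></author>"
nested = "<title>Book on Logic</title> <author>Aristotle</author>"

print(is_well_formed(overlapping))  # the overlapping tags are rejected
print(is_well_formed(nested))       # the properly nested version parses
```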

In XML, element content may freely mix child elements with character data, provided the elements are properly nested.

Example:

   <paragraph>
	Hello, my name is<first-name>John</first-name>
	<last-name> Doe</last-name>from the
	<country>United States</country>
   </paragraph>

This shows that the “paragraph” element consists of a sequence of five items (ignoring whitespace-only runs between tags): the “first-name”, “last-name”, and “country” elements, each of which contains character data, and two runs of plain character data (“Hello, my name is” and “from the”).
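
The sequence of items can be listed with a DOM parser. The sketch below assumes Python's standard `xml.dom.minidom`; `content_items` is a name made up for this example, and whitespace-only text between tags is deliberately skipped so that the count matches the description above.

```python
from xml.dom import minidom

SOURCE = """<paragraph>
    Hello, my name is<first-name>John</first-name>
    <last-name> Doe</last-name>from the
    <country>United States</country>
</paragraph>"""


def content_items(xml_text):
    """List the child items of the root element, skipping whitespace-only text."""
    root = minidom.parseString(xml_text).documentElement
    items = []
    for node in root.childNodes:
        if node.nodeType == node.ELEMENT_NODE:
            items.append(("element", node.tagName))
        elif node.nodeType == node.TEXT_NODE and node.data.strip():
            items.append(("text", node.data.strip()))
    return items


items = content_items(SOURCE)
for kind, value in items:
    print(kind, value)
```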

Entity references[edit]

An entity in XML is a named body of data, usually text. Entities are often used to represent single characters that cannot easily be entered on the keyboard; they are also used to represent pieces of standard ("boilerplate") text that occur in many documents, especially if there is a need to allow such text to be changed in one place only.

Special characters can be represented either using entity references, or by means of numeric character references. An example of a numeric character reference is "&#x20AC;", which refers to the Euro symbol by means of its Unicode codepoint in hexadecimal.

An entity reference is a placeholder that represents that entity. It consists of the entity's name preceded by an ampersand ("&") and followed by a semicolon (";"). XML has five predeclared entities:

  • &amp; (& or "ampersand")
  • &lt; (< or "less than")
  • &gt; (> or "greater than")
  • &apos; (' or "apostrophe")
  • &quot; (" or "quotation mark")

Here is an example using a predeclared XML entity to represent the ampersand in the name "AT&T":

<company_name>AT&amp;T</company_name>
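
Software that generates XML normally applies this escaping automatically. As a small sketch, assuming Python's standard `xml.sax.saxutils.escape` (which by default escapes only `&`, `<` and `>`), the element above can be produced and read back like this; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

name = "AT&T"

# escape() replaces "&" with the predeclared entity reference "&amp;".
markup = "<company_name>" + escape(name) + "</company_name>"
print(markup)

# Parsing resolves the entity reference back to the original character.
round_trip = ET.fromstring(markup).text
print(round_trip)
```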

Additional entities (beyond the predefined ones) can be declared in the document's Document Type Definition (DTD). A basic example of doing so in a minimal internal DTD follows. Declared entities can describe single characters or pieces of text, and can reference each other.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE example [
    <!ENTITY copy "&#xA9;">
    <!ENTITY copyright-notice "Copyright &copy; 2009, XYZ Enterprises">
]>
<example>
    &copyright-notice;
</example>

When viewed in a suitable browser, the XML document above appears as:

Copyright © 2009, XYZ Enterprises
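
The same expansion can be observed outside a browser. This sketch assumes Python's standard `xml.etree.ElementTree`, whose underlying expat parser expands entities declared in the internal DTD subset; the XML declaration is omitted here because the document is passed as a Python string rather than as encoded bytes.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<!DOCTYPE example [
    <!ENTITY copy "&#xA9;">
    <!ENTITY copyright-notice "Copyright &copy; 2009, XYZ Enterprises">
]>
<example>
    &copyright-notice;
</example>"""

# The parser replaces &copyright-notice; and, inside it, &copy;.
notice = ET.fromstring(DOCUMENT).text.strip()
print(notice)
```

Parsers that expand declared entities should only be given trusted input, since entity declarations can be abused to make a small document expand to a very large one.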

Numeric character references[edit]

Numeric character references look like entity references, but instead of a name, they contain the "#" character followed by a number. The number (in decimal or "x"-prefixed hexadecimal) represents a Unicode code point. Unlike entity references, they are neither predeclared nor do they need to be declared in the document's DTD. They have typically been used to represent characters that are not easily encodable, such as an Arabic character in a document produced on a European computer. The ampersand in the "AT&T" example could also be escaped like this (decimal 38 and hexadecimal 26 both represent the Unicode code point for the "&" character):

<company_name>AT&#38;T</company_name>
<company_name>AT&#x26;T</company_name>

Similarly, in the previous example, notice that "&#xA9;" is used to generate the “©” symbol.

See also numeric character references.
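
A quick way to confirm that the decimal and hexadecimal forms denote the same character is to parse both. This is a minimal sketch using Python's standard `xml.etree.ElementTree`; the `<price>` element is invented for the example.

```python
import xml.etree.ElementTree as ET

decimal_form = ET.fromstring("<company_name>AT&#38;T</company_name>").text
hex_form = ET.fromstring("<company_name>AT&#x26;T</company_name>").text

# The Euro sign from the earlier example, given by its hexadecimal code point.
euro = ET.fromstring("<price>&#x20AC;</price>").text

print(decimal_form, hex_form, euro)
```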

Well-formed documents[edit]

In XML, a well-formed document must conform to the following rules, among others:

  • Non-empty elements are delimited by both a start-tag and an end-tag.
  • Empty elements may be marked with an empty-element (self-closing) tag, such as <IAmEmpty />. This is equivalent to <IAmEmpty></IAmEmpty>.
  • All attribute values are quoted with either single (') or double (") quotes. A value opened with a single quote must be closed with a single quote, and a value opened with a double quote must be closed with a double quote.[4][5]
  • To include a double quote inside an attribute value that is double quoted, or a single quote inside an attribute value that is single quoted, escape the inner quote mark using entity references.
  • Tags may be nested but must not overlap. Each non-root element must be completely contained in another element.
  • The document complies with its declared character encoding. The encoding may be declared or implied externally, such as in "Content-Type" headers when a document is transported via HTTP, or internally, using explicit markup at the very beginning of the document. When no such declaration exists, a Unicode encoding is assumed, as defined by a Unicode Byte Order Mark before the document's first character. If the mark does not exist, UTF-8 encoding is assumed.
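
Two of the rules above, the equivalence of the two empty-element forms and the escaping of quotes inside attribute values, can be sketched in Python as follows. The example assumes the standard `xml.etree.ElementTree` and `xml.sax.saxutils.quoteattr`; the `<note>` element and its `text` attribute are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

# Both spellings of an empty element parse to the same thing.
short_form = ET.fromstring("<IAmEmpty />")
long_form = ET.fromstring("<IAmEmpty></IAmEmpty>")
same = ET.tostring(short_form) == ET.tostring(long_form)
print(same)

# A value containing both kinds of quote: quoteattr() picks a delimiter
# and escapes the conflicting quote with an entity reference.
value = 'She said "it\'s fine"'
markup = "<note text=%s />" % quoteattr(value)
print(markup)

# Parsing restores the original attribute value.
restored = ET.fromstring(markup).attrib["text"]
print(restored == value)
```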

Element names are case-sensitive. For example, the following is a well-formed matching pair:

<Step> ... </Step>

whereas these are not

<Step> ... </step>
<STEP> ... </step>

By carefully choosing the names of the XML elements one may convey the meaning of the data in the markup. This increases human readability while retaining the rigor needed for software parsing.

Meaningful names convey the semantics of elements and attributes to a human reader without reference to external documentation. However, they can lead to verbosity, which complicates authoring and increases file size.

Automatic verification[edit]

It is relatively simple to verify that a document is well-formed or validated XML, because the rules of well-formedness and validation of XML are designed for portability of tools. The idea is that any tool designed to work with XML files will be able to work with XML files written in any XML language (or XML application). Here are some examples of ways to verify XML documents:

  • load it into an XML-capable browser, such as Firefox or Internet Explorer
  • use a tool like xmlwf (usually bundled with expat)
  • parse the document, for instance in Ruby:
 irb> require "rexml/document"
 irb> include REXML
 irb> doc = Document.new(File.new("test.xml")).root
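
A comparable check in Python, offered as a sketch rather than a recommended tool: `check_well_formed` and `check_text` are hypothetical helper names, and the standard `xml.etree.ElementTree` parser is assumed. On failure the parser reports where the error was found.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def check_well_formed(path):
    """Return None if the file parses, otherwise the (line, column) of the error."""
    try:
        ET.parse(path)
        return None
    except ET.ParseError as err:
        return err.position


def check_text(xml_text):
    """Demo helper: write the text to a temporary file and check that file."""
    with tempfile.NamedTemporaryFile(
        "w", suffix=".xml", delete=False, encoding="utf-8"
    ) as handle:
        handle.write(xml_text)
    try:
        return check_well_formed(handle.name)
    finally:
        os.remove(handle.name)


print(check_text("<Step>ok</Step>"))      # matching tags
print(check_text("<Step>\noops</step>"))  # tag names differ in case
```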
--Cybercobra (talk) 05:12, 2 November 2011 (UTC)[reply]

Example shown in Icon, not technically invalid but a poor example[edit]

Looking at the example, it shows questions and answers being thrown straight into the <quiz> bracket. Surely each pair of Q&A would need to be wrapped in a tag <round> or <question_set>? Otherwise the program using this would have to read through the whole thing serially for any of it to make sense.

This is more a practical issue and not a technical one.--92.14.116.17 (talk) 14:12, 3 March 2011 (UTC)[reply]

Using XML for question-and-answer quizzes seems to be a common student exercise set by unimaginative teachers, and as the problem never occurs in real life I guess you'd better find out what those teachers consider the right answer to be. Or at any rate, find out what requirements they are assessing the solution against. Mhkay (talk) 15:15, 3 March 2011 (UTC)[reply]

Still a poor example, and barely an example as it is shown as a small piece of graphics. There should definitely be a real example in the text. And in that example it should be explained which one is the root element. My issue is that I believe I heard that the root element is in fact an implicit element above the topmost element. This article does not even explain what a root element is, just that there is only one. 193.140.194.148 (talk) 12:15, 13 January 2012 (UTC)[reply]

Character entity references for escaping[edit]

I attempted to link the #Escaping section with the Character entity reference article. I thought the link was relevant because it seems that article also lists the same five objects, and could potentially expand on the topic. If the problem with my change was just an issue with terminology or semantics, perhaps I can avoid this by directly naming the article, for example “There are five predefined entities (see Character entity reference)”. Otherwise, I’d love to know why the two topics shouldn’t be related when they seem so similar. Vadmium (talk, contribs) 12:34, 5 February 2012 (UTC).[reply]

Inline links are preferable - so I linked the "predefined entities" to the list article. I would be in favour of a link to Character entity reference in there as well (as a see also or piped link), as I'm not sure how Tim's more strict syntactic stance fits with article topics. I've linked to them in the (already linked) Valid characters in XML as a compromise for now. I'm more than happy to step away and leave for Tim. Is there a better title for List of XML and HTML character entity references ? (discuss here) Widefox (talk) 10:09, 13 June 2012 (UTC)[reply]

Linking “predefined entities” is fine by me. I would like to see other opinions, especially Tim’s, because so far I don’t understand the reasons behind his edit summaries. Vadmium (talk, contribs) 13:00, 13 June 2012 (UTC).[reply]

Merger proposal: valid XML document[edit]

The concept of a valid document is better described in the XML article itself. The phrases "valid XML" and "valid XML document" should redirect to that section in the XML article. Paul Foxworthy (talk) 04:30, 3 July 2009 (UTC)[reply]

After three years of no discussion, with a very short article, the little content involved has been merged to this article. WTF? (talk) 03:18, 12 July 2012 (UTC)[reply]

Not sure what is, found anyway[edit]

Not from INRIA anyhoo. — Preceding unsigned comment added by 177.4.61.19 (talk) 01:22, 27 September 2012 (UTC)[reply]

Unclear?[edit]

"Extensible Markup Language (XML) is a markup language created to.... As of 2009, hundreds of XML-based languages have been developed." So, is XML ONE language that inspired others that are based on it, or is it a family of languages? The text says the former, but it comes close to saying the latter. I think a clarification is needed. Kdammers (talk) 11:59, 15 November 2012 (UTC)[reply]

The airplane was created by the Wright brothers to... As of 2009 hundreds of airplanes have been developed.
The PC (Personal Computer) was created by John Blankenbaker in 1971. He called it the Kenbak-1. As of 2009 thousands of PCs have been developed. --Guy Macon (talk) 14:59, 15 November 2012 (UTC)[reply]

AJAX, Syntax Examples[edit]

I searched the article for any mention of AJAX and didn't see any. Maybe I'm mistaken? I believe it's important to shed some light on the fact that XML has been...featured...as one of the key ingredients of the highly-relevant and evermore popular AJAX framework. While there is a whole article dedicated to AJAX itself, it would be a disservice to XML to not give it some limelight with AJAX's popularity. I would also recommend adding AJAX to the list of recommendations at the bottom of the page. Secondly, the syntax examples given are great, but I feel it would be beneficial to show some syntactic examples, perhaps with a wider context. For example, the article could display a decent-sized chunk of XML code in its full glory, and I would say that even showing an example XSD alongside it would be very relevant to XML as well. Great article though overall! Very good work. Cheers! Danielbullis (talk) 04:13, 2 March 2013 (UTC)[reply]

Here's a good example of what I meant by providing XML examples. Scroll down to approximately half-way down the page or so, or use the table of contents to select the "XML" link. XML example in JSON article. Cheers! Danielbullis (talk) 05:46, 2 March 2013 (UTC)[reply]

Tables[edit]

This article lacks a description of XML's ability to define tables (e.g. with definable cell walls as used in Microsoft Word etc and predefined table types as used in Z-notation). It would be nice to see examples of XML used to define tables. FreeFlow99 (talk) 10:40, 22 January 2014 (UTC)[reply]

XML (as scoped by this article) doesn't have the ability to define tables. XML, together with an application schema, might be able to – but that belongs in an article about that schema, not about the syntax and data model of XML overall. At most, this article could use detail about such an application level as an example of what XML can be used for and so why it's worth bothering with it at all. Such examples should be broad and lightweight though. Also note that the importance of XML to table-schema doesn't necessarily indicate that table-schema has an equal commutative relationship of importance to XML. Andy Dingley (talk) 11:46, 22 January 2014 (UTC)[reply]
Thanks. In that case I would suggest the creation of a small section entitled "Tables" (so that it is easily found by people interested in that functionality). That section could state that XML itself does not support tables, but that some markup languages based on XML do, and list some examples with links to their respective pages. FreeFlow99 (talk) 12:47, 22 January 2014 (UTC)[reply]
Totally inappropriate suggestion. What's so special about tables? They are just one example of the many things that can be modelled using XML. It would be like mentioning Timbuktu in an article about helicopters, just because helicopters can be used to travel to Timbuktu. Mhkay (talk) 00:05, 25 January 2014 (UTC)[reply]

Confusion[edit]

I came across this page because I wanted to know what an XML document was, and what it's used for, after I came across one on my computer. The article confused me. Although it appears to contain a considerable amount of information for those with an advanced level of understanding about these things, it started off expecting the reader to already have that knowledge, so I gave up and will probably go elsewhere.

If the information beginners might need is already in the article, I would suggest moving it around so it appears early on. If the information is in another article, it might be best to make this known at the top of the page.

It might just be me having this problem because I can't imagine many people want to know what an XML document is, but it's worth bearing in mind.

Brainshower (talk) 16:19, 25 October 2012 (UTC)[reply]

Please respond to this by looking at the article and improving it. This is just one of too many Wik articles (especially ones dealing with computers) that leave outsiders frustrated.

Kdammers (talk) 11:56, 15 November 2012 (UTC)[reply]

    • Added XML log section as this is what is found on most systems. Thank you for the feedback. Telecine Guy (talk) 20:22, 14 December 2016 (UTC)[reply]

Example[edit]

Strangely enough, I did not find any examples on this page.

Where should xml-model go?[edit]

Is ISO/IEC 19757 xml-model worthy of its own article? Or should it be folded in here? If so, where? Ross Lamont (talk) 07:48, 15 September 2017 (UTC)[reply]

Unsourced claims of extensibility and semantics.[edit]

These [4] [5] are a problem. Not only are they unsourced, but XML can't do these things. Yes, they can be done, and they can be done in XML - but they can be done in ASCII too, and I don't see such claims being added to that article. XML does not support these features. It is wrong for an encyclopedic article to add unsourced claims like this, that imply that it does. Andy Dingley (talk) 15:47, 24 August 2017 (UTC)[reply]

  1. [6] " through use of tags that can be created and defined by users. Much like natural language is extensible (that is, can grow) when speakers create new words and agree on what they mean, XML is a markup language that can grow when users create new elements and agree on what they mean. "
  2. [7] "XML allows markup with tags that can be created and defined by users. Much like natural language is extensible (that is, can grow) when speakers create new words and agree on what they mean, XML is a markup language that can grow when users create new elements and agree on what they mean. This makes XML able to capture intent in a way much broader than a nonextensible markup language such as HTML. For example, XML can mark up machine-readably that apples and bananas are types of fruit, which is semantically deeper than the purpose of HTML. However, HTML is useful for display of content; often HTML is used to display XML content after transformation with XSL."
Andy Dingley (talk) 15:48, 24 August 2017 (UTC)[reply]


Problems with this:
  • "use of tags that can be created and defined by users."
How are "users" (and who are "users"?) able to do this? This is an ancient misconception of XML, one that I thought was dead and buried by 2005 or so - 2000 if you'd been paying attention. XML differs from SGML primarily in that it is parseable without a DTD or schema, but the misconception is that trivial parsing then magically allowed the resultant infoset to be processed further, in ignorance of the schema. This bold claim did not work.
  • "Much like natural language is extensible (that is, can grow)" adjoining "XML is a markup language that can grow when users create new elements and agree on what they mean. "
That's to conflate a folksonomy and the use of XML, presumably by creating new elements similarly on the fly. Ain't gonna happen.
  • "This makes XML able to capture intent in a way much broader than a nonextensible markup language such as HTML. "
There is no indication that XML can do any such thing. In both cases, a schema needs to be agreed beforehand, so that programmers can begin work on writing processors for this infoset. If either is extensible, they're both extended in the same way: by using a (pre-defined and pre-agreed) metadata schema (there are several for HTML, such as Microformats), pushing the problem off into a metaformat.
  • "XML can mark up machine-readably that apples and bananas are types of fruit,"
Wow, class-based taxonomic inferencing just appeared out of nowhere. Not in XML it doesn't. Maybe in RDF, but even then you start to need OWL to get anywhere.
These are serious problems in this added text, and these misunderstandings were demolished (at vast cost) about 15 years ago. WP should not be reintroducing them today. Andy Dingley (talk) 16:01, 24 August 2017 (UTC)[reply]

Hi @Andy Dingley:. I too am happy to discuss. You're (good-faith-ly) quite off base about "XML does not support these features". It's the very point of its extensibility and it's why companies or anyone else writes their own DTDs and XML schemas instead of pulling a ready-made one off the rack. Besides the voluminous discussion of technicalities and syntax (which is already abundant and is quite useful as far as it goes), the article also needs to simply explain to readers why extensible markup is a thing: why it is needed in addition to nonextensible markup such as HTML. There is no reason not to begin the Applications section with this explanation. Think about why companies write their own DTDs and XML schemas. Why didn't W3C just create all the XML elements that can exist in a predefined dictionary, like they did for HTML? It is to capture meaning through markup. For example, a company needs to define a new child element for an existing parent element. This is why XML databases are possible to make. Quercus solaris (talk) 15:57, 24 August 2017 (UTC)[reply]


  • "Think about why companies write their own DTDs and XML schemas."
Because they don't know any better. And if they do it today, it's because they've not been paying attention for years.
If you want your project to work, don't invent a schema. A new schema isn't some magic way to communicate, it's a babel to prevent anyone else understanding what you're saying. Successful use of XML isn't based on inventing new schemas (unless you have some industry-dominant position in your narrow industry), it's about using existing schemas. If I had a dollar for every project that crashed and burned from that mistake, I'd have most of my consultancy income for 2000-2005. If you really need to do this, don't expect XML to do it for you, you need something better.
  • "This is why XML databases are possible to make"
What's an "XML database"? A marketing buzzword? (most popular) An opaque bucket to store opaque XML in, rather than opaque text? (how WP defines it) Or a database that allows querying on the basis of the XML infoset? - doesn't work. That's why SPARQL is based on RDF rather than XML.
All of this extensibility is good stuff, but XML doesn't do it. That's why we had to invent other things to do it with. Andy Dingley (talk) 16:14, 24 August 2017 (UTC)[reply]


No, no—you're way overthinking the technicality of what I'm saying, and being way too superficially dismissive ("not paying attention since 2000" etc). I'm not talking about folksonomy at all here. Not about "on the fly" at all—rather, about "at all" versus "on the fly". "Users" in the XML context are the people writing the content or the people marking up the content on behalf of them (such as when a nurse writes an article about nursing and then a journal staff marks it up with XML). The journal staff create elements in an XML schema. Then, using that schema, they all agree on what the elements are, how they interact with other elements, and what they represent semantically. For example, a book author decides that his book will have two types of appendix. The book publisher then creates elements such as appendix1 and appendix2, defines them in a DTD or schema, and uses them in XLS. The point is that HTML does not offer a way to define appendix1 and appendix2; it is not for that purpose. XML is. Quercus solaris (talk) 16:09, 24 August 2017 (UTC)[reply]
"you're way overthinking the technicality of what I'm saying"
It's what I do. It's what I'm paid to do. It's why I don't edit computing articles on WP. I spend my working day being told "you're overthinking the technicalities" by idiot managers who then have their projects fail, because they've under-thought their XML-based wunderkind. And if you pay me enough, I'll even smile at the bankruptcy party. I ain't doing it for free.
"being way too superficially dismissive"
Because I'm racing against an Undo button. If you want chapter and verse, read my conference papers and patents. I'm not one of the Dans, I don't work for Google, but I do have enough lumps in this field to know what I'm talking about.
" "Users" in the XML context are the people writing the content "
XML doesn't define a "user" layer, but let's go with them at that level. They need to create an incremental authoritative taxonomy, or even a folksonomy (which is a harder problem). "The journal staff create elements in an XML schema." Not in processable XML they don't. Because if they do, the software that needs to process this has never seen that element before and it has no idea how to process it. If you want to build a system that does, you need something smarter than XML. If you want to do it in a way that can communicate outside a single project (the babel schema problem) then you have to do it with RDF + RDF Schema (needs more than RDF alone, there's not much else in the same space, it doesn't yet need OWL, Schema is sufficient). Andy Dingley (talk) 16:29, 24 August 2017 (UTC)[reply]

What you said at one point is exactly what I am talking about ("In both cases, a schema needs to be agreed beforehand, so that programmers can begin work on writing processors for this infoset. If either is extensible, they're both extended in the same way: by using a (pre-defined and pre-agreed) metadata schema"). You think I'm flying past that and trying to tack on other things. I'm not. The point is that extensible markup languages were invented to make that possible. That's what I'm talking about. This article in its current form does not explain or illustrate that for a layperson. A layperson has no conception of that until it is explained somehow or somewhere. Maybe I should be looking to lead them to it somewhere else like the articles Markup language or Standard Generalized Markup Language. It seems that you have been an IT expert for so many years that you have lost sight of how one would explain to laypersons why SGML or XML exist, and why element extensibility exists. Quercus solaris (talk) 16:23, 24 August 2017 (UTC)[reply]

"were invented to make that possible" - yes they were. Sadly inventing them wasn't enough to make that possible. Optimism doesn't make the code work. XML doesn't work for doing this. Andy Dingley (talk) 16:32, 24 August 2017 (UTC)[reply]
@Andy Dingley: To show you that I am not making this up out of thin air, please take a look at W3schools XML 101 info at https://www.w3schools.com/xml/default.asp. Look at the "food" and "calories" elements. This is what I am talking about. That website explains and shows to beginners what the purpose is. This Wikipedia article does not. Quercus solaris (talk) 16:30, 24 August 2017 (UTC)[reply]
  • W3Schools?
Seriously?
WTF is the point in even having this discussion? Andy Dingley (talk) 16:32, 24 August 2017 (UTC)[reply]
The point is this: Wikipedia explains to people what things are and why they exist in addition to how they work. I aim to improve this instance of the lack of that somehow, if not at this article then linking to discussions elsewhere at Wikipedia. That much is not a subjective matter of your being a greater IT expert (no doubt you are) who is overruling my edits. It objectively needs to be done somehow, and I will figure out how to do it. This aspect does not require my getting anyone's permission to edit or add. Quercus solaris (talk) 16:37, 24 August 2017 (UTC)[reply]
I think you should also take a deep breath long enough to see things through others' eyes. You seem incredibly touchy at the level of technical expertise and rapid dismissal, but it leaves your approach devoid of any pedagogical value. Rather than simply looking at the example that I provided, only to see what I am talking about, you apparently dismissed it without a glance. I am not saying that I am teaching you anything. I am just talking about explaining a subject to lay readers. Consider what Wikipedia is and why it exists: there is pedagogy involved. Don't be quite so quick to assume that I'm an idiot. I guess what I'm getting at is that in your rapid thrust to display your expertise and dismiss idiots, you're actually showing yourself to others to be too wounded and angry to see or admit the valid kernel that someone else is trying to get at, and work collaboratively from there. The net result is that you're not proving yourself immeasurably superior like you think you are. For any others reading this page, see https://www.w3schools.com/xml/xml_whatis.asp as a clear example of what I am talking about, proving that Andy has mostly missed my point in his race to prove me an idiot. Quercus solaris (talk) 16:45, 24 August 2017 (UTC)[reply]
Whether justifiably or not, w3schools has a very poor reputation among many in the XML community and by citing it you reduce rather than enhance your credibility. Sorry: just saying so that you know. Equally, the fact that you take Andy's criticism so personally does not help your case. More substantively, there has been nearly 20 years of discussion on the xml-dev mailing list about the extent to which one can attribute semantics to XML, and the general consensus has always been that XML is no more than a syntactic framework for transmitting messages, which can only convey information from sender to recipient if the recipient has some knowledge of the meaning of the tags acquired through a separate channel. If you receive a message saying <book price="2.65">The Grapes of Wrath</book> you can make guesses about its possible meaning but you cannot make any reliable inferences. For example, if you inferred that the sender of the message was willing to pay you EUR 2.65 for your copy of the book, I suspect you would be wrong, but there is nothing in the message to say you are wrong. Mhkay (talk) 19:46, 19 September 2017 (UTC)[reply]
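
To make the point concrete, here is a small sketch (assuming Python's standard `xml.etree.ElementTree`; the `facts` dictionary is just an illustrative name) of everything a parser can extract from that message without outside knowledge.

```python
import xml.etree.ElementTree as ET

message = ET.fromstring('<book price="2.65">The Grapes of Wrath</book>')

# Everything the parser can report: a tag name, attribute strings, and text.
facts = {
    "tag": message.tag,
    "attributes": dict(message.attrib),
    "text": message.text,
}
print(facts)

# The price is only the string "2.65"; the message carries no currency,
# and nothing says whether it is an offer, a bid, or a catalogue entry.
print(type(facts["attributes"]["price"]).__name__)
```

Any interpretation beyond these strings has to come from an agreement between sender and recipient that lives outside the message itself.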
"Mhkay"? We're not worthy! Andy Dingley (talk) 20:37, 19 September 2017 (UTC)[reply]

text/xml deprecated?[edit]

Not sure why it says text/xml is deprecated.

Just skimming over the RFC can't see that explicitly [8]

Jjjjjjjjjj (talk) 19:20, 10 June 2010 (UTC)[reply]

I have updated the citation to the IETF memo that deprecates text/xml and explains why. Mhkay (talk) 22:57, 11 June 2010 (UTC)[reply]

The RFC says "If an XML document -- that is, the unprocessed, source XML document -- is readable by casual users, text/xml is preferable to application/xml." I think characterising this as deprecation is inaccurate. Perhaps the description should be "application/xml (preferred for most technical use), text/xml (preferred when readable by casual users)" with a link to the RFC. What do people think? Paul Foxworthy (talk) 03:52, 13 June 2010 (UTC)[reply]

The cited Murata/Kohn/Lilley memo clearly labels text/xml as deprecated. This memo is much more recent than RFC 3023. The problems with text/xml largely emerged after 3023 was published. Mhkay (talk) 22:50, 14 June 2010 (UTC)[reply]
Thanks. But the citation is to RFC 3023. If the MKL memo is the source that confirms that text/xml is deprecated, then the citation is misleading. Well, it misled me at least :-). The memo I can find [9] is a draft and supposedly expired in March. If it's now an RFC, where is it? I propose a second citation be added to the deprecation referring to the draft. If and when the draft becomes an RFC to replace 3023, there should be just one citation that refers to that replacement. Does that make sense to everyone?Paul Foxworthy (talk) 15:47, 21 June 2010 (UTC)[reply]
I can't see where you have problems. (You say "But the citation is to RFC 3023". But there are multiple citations.) The article says that RFC 3023 standardizes text/xml and application/xml, which is true, and it also says that text/xml is in the process of being deprecated, which is also true, and both statements are linked to relevant citations. I've no idea what the current state of that process is, but the fact that the memo has timed out doesn't mean the process has been abandoned, unless you can find evidence to the contrary. Mhkay (talk) 21:39, 21 June 2010 (UTC)[reply]
I was talking about the citation in the infobox; sorry I didn't make that clear. I am not too fussed about the status of the memo. All I want is the best citation for the fact that text/xml has been deprecated. Paul Foxworthy (talk) 06:02, 1 July 2010 (UTC)[reply]
I've added a citation in the infobox. Paul Foxworthy (talk) 04:52, 6 July 2010 (UTC)[reply]
Now it still looks as if it were deprecated, but in reality it isn't. It was, as you say yourself, deprecated in a draft which expired. Is this really notable? In any case, the article should make clear that it isn't deprecated and may never be, although there may be reasons against its use. —h.e.r—79.236.22.145 (talk) 08:54, 2 August 2010 (UTC)[reply]
Claiming deprecation of text/xml in this article is misleading. The Murata/Kohn/Lilley memo merely refers to potential issues with charset encoding values, not with the media type per se (for example, the encodings clash when the charset parameter indicates ISO-8859-1 but the XML document itself uses UTF-8). The fact is that text/xml is still widely in use as a media type and has not been deprecated. Robert van Engelen (talk) 06:12, 2 August 2018 (UTC)[reply]

Another unsourced claim[edit]

'Some other specifications conceived as part of the "XML Core" have failed to find wide adoption, including XInclude, XLink, and XPointer.' This sentence may be true (although DocBook documents use XInclude all the time, e.g. chapters in a book). But if it is true (and I want to emphasize that I don't know whether it's true), it needs to be sourced.
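For readers unfamiliar with the DocBook usage mentioned above, here is a minimal sketch of XInclude processing using Python's standard xml.etree.ElementInclude. The CHAPTERS mapping, the file name ch1.xml, and the in-memory loader are assumptions made so the example runs without files on disk; a real DocBook toolchain would resolve the href against the file system.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical chapter store standing in for separate files on disk.
CHAPTERS = {"ch1.xml": "<chapter><title>Chapter One</title></chapter>"}

def memory_loader(href, parse, encoding=None):
    """Resolve an xi:include href from the in-memory CHAPTERS mapping."""
    if parse == "xml":
        return ET.fromstring(CHAPTERS[href])
    return CHAPTERS[href]  # parse="text": return the raw string

book = ET.fromstring(
    '<book xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<title>Example Book</title>'
    '<xi:include href="ch1.xml"/>'
    '</book>'
)

# Replace each xi:include element in place with the content it points to.
ElementInclude.include(book, loader=memory_loader)

print([child.tag for child in book])  # ['title', 'chapter']
print(book.find("chapter/title").text)  # Chapter One
```

This shows only the mechanism; it says nothing either way about how widely XInclude is adopted, which is the claim that needs a source.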

The trouble with this mechanistic approach to "sources" and "citations" is that a good encyclopedia article actually summarises knowledge aggregated from a very wide range of reading; a claim like this is not based on one well-researched statistical study that scientifically analyses the level of adoption of different technologies, it is based on the experience of someone who reads a lot, goes to a lot of conferences, and is well connected in the industry. I know there are those in the Wikipedia community who would deprive us of the benefit of such acquired wisdom, but personally, I find it invaluable. Mhkay (talk) 19:59, 6 November 2018 (UTC)[reply]

Likewise the statement "exchanging highly structured data between applications, which was not its primary design goal". I'm not sure where this claim comes from. It's true that "highly structured" isn't mentioned in the W3C's "origin and goals" section in the spec (https://www.w3.org/TR/REC-xml/), but the spec does talk about "data objects"; besides, one could just as well argue from this omission that XML wasn't intended for "non-highly structured" data, either. Anyway, if exchanging highly structured data between apps was not on the founding fathers' minds, a citation would be in order.

This slide just about sums it up: https://www.w3.org/Talks/9803xml-seattle/slide9-0.htm ("Q: Why the W3C XML Activity? A: Structured Document Interchange"). But I think citing one slide to justify this claim would be facile. You have to read the entire conference proceedings of conferences held around 1997/98 to get a view of the general climate of opinion at the time, not just one slide from one speaker. Mhkay (talk) 19:48, 6 November 2018 (UTC)[reply]

(BTW, I'd put this under the previous heading about unsourced claims, but I'm afraid it would just get lost there.) Mcswell (talk) 17:20, 5 November 2018 (UTC)[reply]

A small doubt related[edit]

In this article, in the infobox named file format, for the attribute mime, please verify whether the first </code> needs to be removed or not. Adithyak1997 (talk) 08:00, 25 January 2019 (UTC)[reply]

Thanks for noticing, fixed --hulmem (talk) 18:44, 25 January 2019 (UTC)[reply]

Broadcast Markup Language and BeerXML[edit]

The recently created article on BeerXML is up for deletion by one user who is very motivated to get rid of it after their speedy delete marker was removed. As they are now losing the argument at its delete discussion page, they have adopted the tactic of trawling Wikipedia for other specialist XML-based definitions and placing delete markers on them as well. The long-standing article Broadcast Markup Language has now been tagged by them in an effort to shore up their attempts to get BeerXML deleted.

If this behaviour is not checked several articles in the various XML categories could come under threat. Please help defend these articles with good arguments. Help will be much appreciated by all the contributors who now find their good work under threat. Devils In Skirts! (talk) 12:34, 15 February 2014 (UTC)[reply]

Help Oh12345well4321 (talk) 08:28, 23 March 2019 (UTC)[reply]

Please use the correct inception year in the description box[edit]

See http://wikidata.org/entity/Q2115#P571 --Krauss (talk) 11:23, 21 September 2019 (UTC)[reply]

XML is history[edit]

There are three main technologies driving the WWW. These are HTML5, ECMAScript, and CSS. Although XML still plays a role, the effort to supplant HTML 4.01 with flavors of XHTML was misguided. It is an error to indicate that XML is part of any current development effort related to HTML. — Preceding unsigned comment added by Squeeky Longhair (talk • contribs) 20:15, 3 February 2022 (UTC)[reply]

If the last part of your message is referring to any specific portion of the article, please mention that portion.
As for the rest of the message, it sounds like a rant. Thousands of technologies are driving today's Web, e.g. JPEG, HEVC, TCP/IP, HTTPS, PHP, Python, RPC. To ignore them is hubris. Waysidesc (talk) 21:30, 3 February 2022 (UTC)[reply]
  1. ^ "W3C® DOCUMENT LICENSE".
  2. ^ "XML Applications and Initiatives".
  3. ^ "XML 1.0 Origin and Goals". Retrieved July 2009.
  4. ^ "XML Attributes". W3Schools.
  5. ^ "Attributes (XML Standards)". Microsoft.