Talk:XHTML/Archive 1

This page is an archive of past discussions. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.

POV in Note

I think this statement should be removed:

While the arguments behind this XHTML scare are not entirely clear, they seem to be based on the mistaken assumption that the SGML on which HTML (along with XML) is based could never allow a slash ‘/’ within a tag.

XHTML is criticized because it was designed on false assumptions and brings more problems than actual benefits to authoring.

XHTML was half-heartedly adopted by authors because they expected it to minimise the cost of moving to future W3C-recommended markup. With XHTML 2.0 not being backwards compatible, even this turns out not to hold.

The slash problem is just one tiny piece. I find it more dangerous that XHTML calls for not displaying the document content at all, even when the structure is broken only by some advertising or measuring code injected by a third party. This behaviour is practical during authoring but definitely harmful once the document is published.

The W3C should recall that the main purpose of the web is to share information, not to share error messages. (unsigned comment from IP 85.160.16.82)

The author of the above comment would do well to remember that the best way to share information is through an agreed standard which everyone conforms to. I'm sure if you put a video cassette into a DVD player then you'd get a very obvious error message. It is not XHTML that calls for error messages; that is part of the XML spec, and XHTML merely inherited it.

XHTML 2.0 recommends using <element/>

...Instead of <element />, which was allowed in older versions of the spec. Some people might say <element /> offers fallback compatibility, but that's wrong; it just leads to tag soup. Using <element/> is the right way, and I'll be making a minor change to the main article to reflect that. --Saoshyant 22:30, 7 August 2006 (UTC)

"Some people" should read "the W3C", which recommends <element /> in its HTML compatibility guidelines. XHTML 2 is not a recommendation yet. You shouldn't presume the HTML 4 compatibility guidelines no longer apply just because they are not in the XHTML 2 draft.

You might want to mention that "some people" don't like the self-closing tags or the extra space character in their self-closing tags, but you should provide a cite or a citation needed template when you do. --Cplot 00:34, 8 August 2006 (UTC)

Also I should add that adding the space in XHTML 1 and later is not tag soup, since whitespace is ignored within the tag. In HTML 4.01 and earlier, having the / in the element is also not tag soup, since it's simply ignored: the slash has no meaning inside that portion of the tag. --Cplot 00:56, 8 August 2006 (UTC)

Sitegod 20:00, 9 May 2007 (UTC) I assume then that the slash is being treated as a minimized attribute in HTML 4? A conforming SGML parser would take the trailing space and slash as prematurely closing the element and add >s everywhere.

Until you guys stop claiming that HTML is an application of SGML, a solidus does have meaning. Read http://hixie.ch/advocacy/xhtml for some information. Anne van Kesteren 14:14, 17 December 2006 (UTC)

What Cplot said. ¦ Reisio 13:27, 8 August 2006 (UTC)

I am fairly sure both are acceptable XML tags, therefore it doesn't matter which you use, provided you are serving the content as "application/xhtml+xml" or something to that effect. I believe Saoshyant had the right idea, but didn't word it right. As seen from Anne van Kesteren's article, XHTML SHOULD NOT BE SERVED AS HTML! Any site doing this, including wikipedia.org, is sending tag soup to browsers. If they send their XHTML as <element />, the browsers can better read the tag soup. In this way, it DOES offer fallback capability (though you are merely relying on the browser's error handling capabilities). However, XHTML 2.0 explicitly says it should only be served as "application/xhtml+xml" or something to that effect. Whilst <element /> is still allowed, <element/> is possibly very slightly better markup. BOTH METHODS ARE CORRECT. I use server-side scripting to add end-tags for browsers that accept XHTML, so I can serve XHTML properly to browsers that take it, and HTML to everyone else. Cplot, if you are serving your XHTML as "text/html", which judging by your replies you most likely are, then your document is in fact a massive pile of tag soup. Well done, and have a medal for your idiotically aggressive wording when it seems Saoshyant is actually right and you are wrong. If you are serving XHTML as intended, there is no need to follow the HTML 4 compatibility guidelines.
Old sweat 20:27, 2 January 2007 (UTC)
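
The server-side switching described above can be sketched roughly as follows. This is a minimal illustration in Python, not the script Old sweat actually uses: the function name choose_media_type is hypothetical, and it assumes the only signal consulted is whether the browser's Accept header lists application/xhtml+xml with a non-zero q value.

```python
def choose_media_type(accept_header: str) -> str:
    """Pick the media type to serve, based on an HTTP Accept header value.

    Hypothetical helper: returns "application/xhtml+xml" only when the
    client explicitly lists that type with q > 0, otherwise "text/html".
    """
    for item in accept_header.split(","):
        # Each item looks like "type/subtype;q=0.9" (parameters optional).
        parts = [p.strip() for p in item.split(";")]
        if parts[0].lower() != "application/xhtml+xml":
            continue
        q = 1.0  # q defaults to 1 when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q value as "not acceptable"
        if q > 0:
            return "application/xhtml+xml"
    # Wildcards such as */* are deliberately not treated as XHTML support.
    return "text/html"
```

A browser that sends only wildcards (as IE6 does) falls through to text/html, which is the fallback case the thread is arguing about.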

Most of the recent versions of popular web browsers render XHTML properly

Err, Internet Explorer does not support XHTML (application/xhtml+xml). Any valid XHTML file will fail to render on it. Instead an Unknown File Type error will occur. Btw, Google does not index correct XHTML either.

And btw if you cheat and identify XHTML incorrectly as HTML then you void all the possible benefits of XHTML.

Some criticism of XHTML should be in this article. --Hendry 06:06, 8 December 2005 (UTC)

You can't support XHTML on the Web without manipulating the mimetypes. --61.9.136.168

The W3C XHTML FAQ page has a line of code that it says performs a trick to allow it to work on Internet Explorer.--80.4.252.114 18:31, 15 April 2006 (UTC)

...and yet the W3C XHTML FAQ itself is served as "text/html".—70.184.72.38

'Most' means 10%?

A paragraph in the article needs to be corrected, but I'm not sure which statement is correct. First it says "Most of the recent versions of popular web browsers render XHTML properly", but then a few sentences later goes on with "During October 2005 approximately 10% of web surfers were using browsers capable of rendering XHTML properly." Which is it?

--Jorvis

afaict most browsers do, most users use the one browser that still doesn't ;) Plugwash 04:36, 21 January 2006 (UTC)

XHTML != separation

XHTML is just the XML form of HTML; CSS and the separation of content and form are the same as they were in HTML. <table/> is still usable for layout. But with XHTML, semantics and the separation of content and form became a hot item.

I think that it's necessary to add something like this to the part about CSS and XHTML.

Try this page, this is real XHTML (contenttype application/xhtml+xml) http://devedge.netscape.com/viewsource/2003/xhtml-style-script/examples/example-6.xhtml

It doesn't work in IE6 (it wants to download it). Mozilla gets it. These days you can't make webpages that aren't supported by IE6. It also shows that a real XHTML page is something different than XMLling your HTML, putting in an XHTML doctype and then sending it as text/html anyway.

XTML?

Is XTML really a valid abbreviation for XHTML? I've never seen it used on the net before, especially not at the W3C and in standards discussions.

I was just about to ask the same thing. I've never heard of "XTML" and this stuff is my job. I'm gonna remove it. Feel free to revert the change, but at least explain where and why, with references, if you're gonna... ;o) — OwenBlacker 01:59, Jun 20, 2004 (UTC)

DTD != XML

I just removed: "The XHTML DTD is defined using XML, to enforce that language's strict rules" which is plain wrong. DTDs are not specified using XML, schemas are.

Would it not be fair to say that the DTD is where the schema is defined for a particular document type? And the primary difference between the HTML 4.01 and the XHTML 1.0 DTDs is that the XHTML 1.0 DTD requires the closing of all tags due to the XML rules, whereas the SGML rules allow optional closing of tags in cases where the closure is unambiguous (at least to a parser, and assuming the content model is followed precisely by the author(s)).

DTDs are part of the XML spec, and are part of XML, even if their syntax isn't that of XML. Geoffrey Sneddon (talk) 17:19, 23 December 2007 (UTC)

was just reading the comments here...

I make a few websites here and there... a good comment above that tables are perfectly valid XHTML and validate through w3c.org... many HTML junkies think XHTML = no tables or table-less layouts. I'd think perhaps the most common error from HTML to XHTML is using uppercase tags... this seems to be rampant all over the internet.

I look at this as a W3 v. Microsoft issue. The W3 would like to separate presentation from content for all the often incredible advantages that brings about. When user agents (browsers and such) adequately support visual layouts without tables, we won't be forced to use them anymore for layout. In other words, not that we won't get to, but that we won't have to.

Microsoft (for some bizarre reason) has decided it wants to permanently keep the advantages of separating content from presentation from everyone (perhaps they see this as locking everyone into their products somehow). My understanding is that even IE7 betas continue to eschew the specifications of the W3.

What would be nice is if someone created a server-side tool (like an Apache plugin) that took CSS2 or CSS3 visual layout rules and transformed the semantic content of an XHTML document into one presented as a table (or infinitely embedded tables in the Microsoft way) based on the CSS visual layout declarations. Then the rest of the world could take advantage of CSS and semantic XHTML despite Microsoft and IE. --Cplot 03:38, 14 July 2006 (UTC)

Tabular DATA!!!

That's what tables are for: our tabular data. You can still use tables for layout, but the W3C does not recommend it.

mandatory quotes

The Overview section originally stated that quotes are mandatory in HTML but are "often ignored". However, in HTML it's perfectly legal to not quote an attribute value as long as it consists only of alphanumeric characters and/or certain allowed special characters (such as hyphens). For example, <html lang=en-us> is perfectly allowable. The value must be quoted only if the attribute value includes non-allowed characters (e.g. <style type=text/css> is invalid) or spaces. Demonstration -- Vystrix Nexoth 13:48, Nov 21, 2004 (UTC)

To quote the specs:

By default, SGML requires that all attribute values be delimited using either double quotation marks (ASCII decimal 34) or single quotation marks (ASCII decimal 39).

They are mandatory in HTML ~ mlk ^✉^♬ 01:27, 31 Dec 2004 (UTC) ~

ml, you neglected to read the next paragraph:

In certain cases, authors may specify the value of an attribute without any quotation marks. The attribute value may only contain letters (a-z and A-Z), digits (0-9), hyphens (ASCII decimal 45), periods (ASCII decimal 46), underscores (ASCII decimal 95), and colons (ASCII decimal 58). We recommend using quotation marks even when it is possible to eliminate them.

They are NOT mandatory in HTML, in some circumstances. Granted, it is safer to always use them, but they aren't required. porges 22:21, Apr 4, 2005 (UTC)

(The allowable character set without quotes is then ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._:)
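
The rule quoted above is easy to check mechanically. The following is a small sketch in Python; the function name may_omit_quotes is made up for illustration, and it encodes only the character set listed in the quoted HTML 4 paragraph (letters, digits, hyphens, periods, underscores, colons), nothing else from the spec.

```python
import re

# Letters, digits, hyphen, period, underscore and colon, as listed in the
# HTML 4 paragraph quoted above. At least one character is required.
UNQUOTED_VALUE = re.compile(r"[A-Za-z0-9._:-]+")


def may_omit_quotes(value: str) -> bool:
    """Return True if an HTML 4 attribute value may be written unquoted."""
    return UNQUOTED_VALUE.fullmatch(value) is not None
```

With this check, en-us passes (so lang=en-us is fine), while text/css fails because of the slash, which matches the two examples given earlier in this thread.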

Tables vs. CSS designs

I cannot agree with calling no-table design users junkies. It makes life easier. Try building a portal, which uses a lot of element positioning from tables, it will take you two weeks to get a prototype of a page finished. Do it with xHTML+CSS, you'll do it in 2 hours. (But, please, do it correctly.)

CSS design requires a lot of experience and is therefore not recommended for beginners. Otherwise, check out the page:

http://www.csszengarden.com/

ProClub

The main reason for tableless layout is that HTML tables are part of that (HTML) semantic language. Using a table for visual purposes is a misuse of the language. On the other hand, layouts based on tables could mean a serious problem for people with disabilities. Imagine that you're blind and a robot (a spoken browser) is reading the content of a website as a table, which should have structured data in rows and columns related in some way.

Diegosolo 04:17, 25 August 2006 (UTC)

Although CSS design requires a lot of experience, I believe it's no more intensive than learning table layout hacks. Save the beginners some time in the long run and teach them the right way to build web pages.

-- I concur -- Boxxertrumps

My recent edit, commented

I feel this page still has many errors; however, I could not figure out how to correct them without totally ruining the text, so I have left them. I did, however, make many adjustments:
1. Most important is to note that XHTML is in no way stricter than HTML; it's just simpler. Thus the parsers are easier to write, and they run better on low-end equipment. However, XML and SGML are both very strict when the parsers written for them follow the specifications. For instance, the rule that in XML says "when you start with tag <li> you must also close it with tag </li>" says in SGML "when you start with tag <li> and a tag that cannot be inside li is reached, assume li to be closed". Both are rules; neither is less strict than the other. The XML one is, however, less ambiguous to us humans.
2. As I tried to point out in my correction, the fact that HTML is allowed to be full of errors is due to crappy implementation from browser vendors. After all, if they all suddenly started using a full-fledged SGML parser now, most pages on the WWW would not work in them. XHTML, however, is a new thing, and can be parsed with full strictness. Of course, this helps little when nobody sends out XHTML as XHTML (application/xhtml+xml) and everybody just spits it out as HTML (text/html) instead, since that causes browsers to use their non-strict HTML parser mode. More about that can be found here: http://www.hixie.ch/advocacy/xhtml.
3. I corrected tag to element where appropriate. If you don't know the difference, you're on thin ice, jargon-wise.
4. The statement that XHTML is semantically rich is just plain false. Whatever semantics you can put into an XHTML document (non-modularized) can be put into HTML as well. There is nothing, anywhere, that specifies that XHTML has to be semantic at all.
5. "XHTML 1.0 Strict requires that all tags be well-formed, and deprecates many elements and attributes found in HTML 4.01." No. There is no difference between HTML 4.01 Strict and XHTML 1.0 Strict whatsoever, except that they are specified in SGML and XML respectively.
6. "XHTML 1.0 Transitional is designed for an easier transition from HTML, and allows..." Wrong again. This one is equal to the Transitional specification of HTML 4.01, and has nothing to do with easier transition from HTML to XHTML. It has to do with easier transition from lenient documents to stricter ones, whether in HTML or XHTML. It's there to be used while you are moving to separation of content and style (CSS).
7. An XML declaration is not the same as an XML prolog. Please read up on this document for more information about this subject: W3C's XML specification.
8. My final correction was about backwards compatibility with HTML. This ended with XHTML 1.0. XHTML 1.1 is not backwards compatible with HTML, as it SHOULD NOT be sent as text/html and can use other XML namespaces as well as the new ruby elements.

DarkPhoenix 21:45, Jun 18, 2005 (UTC)

Re: 2; "The reason that HTML documents often have messier code than XHTML documents is due to the way the browsers parse them." I'm sorry, but this did not make sense - some parsers may render code messily, but they do not actually alter the code. What you added in that section also seemed rather unnecessary...like it was talking to editors, which is what the Talk page is for. ¦ Reisio 22:36, 2005 Jun 18 (UTC)

Re: 7; The text was not implying that an XML declaration is the same as an XML prolog. It was somewhat inaccurate, however, and has been fixed. ¦ Reisio 22:40, 2005 Jun 18 (UTC)

Re: Re: 2; In any case, saying that HTML is generous is a mistake, because this is not due to its specification, but because of browsers' implementations. I would like to make that a point, because currently, this article is teaching people the wrong thing, and we don't want an encyclopedia to have wrong information, do we? I was not talking to editors, I was talking to the readers, because there are so many misconceptions going around concerning the difference between HTML and XHTML, and this document was (and still is) full of these misunderstandings. Let me give you an example. This document here is perfectly valid HTML:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<title>Test</>
<h1>This is a test</>
<p>Para 1
<p>Para 2

The reason that won't show up correctly in browsers is because they don't parse it with an SGML parser, but with some home-made tag-soup-ish crap. If browsers got that right, they would, due to the rules of SGML, show a page with a title, a header, and two normal paragraphs. Try to validate that, it's valid.

DarkPhoenix 08:36, Jun 19, 2005 (UTC)

I believe whoever initially put in "generosities" was not exactly incorrect. HTML is generous in that it takes a more complex (or at least more bloated) parser to render it properly, because of the less-strict markup, etc. I agree, however, that it's not the best explanation and could be improved. ¦ Reisio 10:04, 2005 Jun 19 (UTC)

"when you start with tag <li> and a tag that cannot be inside li is reached, assume li to be closed".

This complicates parsing a LOT. With XML you can parse a document into a tree without any prior knowledge of the tags. With SGML you need to know all the rules about the tags before you can even turn the document into a tree. Plugwash 22:52, 22 January 2006 (UTC)
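
To illustrate the point about XML, here is a short sketch using Python's standard xml.etree.ElementTree module. The helper names outline and is_well_formed are invented for this example. The parser is given no DTD and no list of element names, yet it can build the tree for made-up tags, and it rejects the unclosed <li> markup that an SGML parser with the HTML DTD would be able to complete.

```python
import xml.etree.ElementTree as ET


def outline(xml_text: str) -> list:
    """Parse XML and return (depth, tag) pairs in document order.

    No knowledge of the element names is needed to build the tree.
    """
    def walk(element, depth):
        yield (depth, element.tag)
        for child in element:
            yield from walk(child, depth + 1)

    return list(walk(ET.fromstring(xml_text), 0))


def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as XML, False on a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False
```

For example, outline("<a><b><c/></b><d/></a>") yields the nesting of four elements the parser has never heard of, while is_well_formed("<ul><li>one<li>two</ul>") is False because XML has no rule for implying the missing </li>.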

also if xhtml 1.1 isn't meant to be backwards compatible why are the ruby elements designed with a fallback system built in? Plugwash 22:55, 22 January 2006 (UTC)

Ruby markup was a backwards-compatible, IE-specific extension of regular HTML way before XHTML 1.1 was in the works. The W3C spec is just a rubber stamp on what IE was doing, albeit a little more fleshed out. I believe they associated the spec with XHTML 1.1 because the DTDs for HTML 4 and XHTML 1.0 were already set in stone, but XHTML 1.1's DTDs were still in development and thus could formally accommodate the new elements.

The 'fallback' mechanism was (and still is) just a side effect of unrecognized elements like rp being treated, in lenient, non-DTD-enforcing HTML processors, the same as elements like span: where the tags are ignored but the contents are still processed. Browsers that support ruby know to completely ignore the rp tag and its contents.

From this I would not conclude that "XHTML 1.1 is meant to be backwards compatible"; rather we can only say that "by virtue of incorporating elements that were devised to extend HTML, XHTML 1.1 incorporates some elements that, if used in HTML or XHTML 1.0 documents and processed by lenient user agents, can be said to be backwards compatible." -- mjb 23:24, 22 January 2006 (UTC)

i seem to remember ruby had a tag (similar to the noframes tag in framesets) specifically designed for fallback purposes. Plugwash 23:25, 22 January 2006 (UTC)

"No to XHTML"

http://www.spartanicus.utvinternet.ie/no-xhtml.htm (unsigned comment by 132.162.244.206)

Notice how they don't list any real disadvantages of XHTML; they just attack it based on the fact that it may not deliver all its promised improvements.

Also, some stuff they say is totally wrong. For example, "omit the xml declaration and the document can only use the default character encodings UTF-8 or UTF-16." is wrong; it's quite OK to specify the charset in an HTTP header. Plugwash 19:47, 6 September 2005 (UTC)

Your argument is only valid for documents served over HTTP. Anne van Kesteren 18:40, 29 December 2006 (UTC)

True, but if you are writing in eXtensible Hyper Text Markup Language, it's safe to assume it's intended to be served over Hyper Text Transfer Protocol.
a) There are other document formats that can be used if this is not the case.
b) The very fact that it's XHTML means the data can be transformed with XSL into another format when it is not served over HTTP.
Old sweat 23:27, 4 January 2007 (UTC)
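
The two places an encoding can be declared, as discussed in this thread, can be sketched like this in Python. The function name declared_encoding is hypothetical, and the sketch assumes the common convention that a charset parameter on the HTTP Content-Type header takes precedence over the XML declaration; when neither is present it returns None, meaning an XML parser would fall back to detecting UTF-8 or UTF-16.

```python
import re
from typing import Optional


def declared_encoding(content_type: Optional[str], body_start: bytes) -> Optional[str]:
    """Return the explicitly declared encoding of an XHTML document, if any.

    content_type: value of the HTTP Content-Type header, or None when the
                  document is not served over HTTP (e.g. read from disk).
    body_start:   the first bytes of the document.
    """
    # 1. A charset parameter in the HTTP header (assumed to win if present).
    if content_type:
        match = re.search(r'charset\s*=\s*"?([^";\s]+)"?', content_type, re.IGNORECASE)
        if match:
            return match.group(1)
    # 2. The encoding pseudo-attribute of the XML declaration.
    match = re.match(
        rb'<\?xml[^>]*\bencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']',
        body_start,
    )
    if match:
        return match.group(1).decode("ascii")
    # 3. Nothing declared: the parser must assume UTF-8 or UTF-16.
    return None
```

Passing None for content_type models Anne van Kesteren's objection: without HTTP there is no header to carry the charset, so only the XML declaration (or the default) remains.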

HTML not being developed?

I am removing this claim because the WHATWG intend to submit their further work with HTML to the W3C.

eXtensible

Can anyone give an authoritative citation for the unconventionally capitalized "eXtensible"? The W3C uses "Extensible" in its recommendations: see XHTML™ 1.0 The Extensible HyperText Markup Language (Second Edition) for example. Indefatigable 00:12, 29 November 2005 (UTC)

Several of the older XHTML drafts mention "XML is an acronym for the eXtensible Markup Language" [1], but that progresses to "XML™ is the shorthand for Extensible Markup Language, and is an acronym of eXtensible Markup Language [XML]." [2], which illustrates that the "eX" was their way of explaining the acronym clearly, not part of the actual name. The older drafts give the "eX" usage minor legitimacy, but its usage is clearly deprecated. "Ex" should be used - bolding the lower case "x" would be fine if people want to make the acronym's origin obvious. ¦ Reisio 00:40, 29 November 2005 (UTC)

If it was "Extensible" instead of "eXtensible" then we would have "EML" instead of "XML". ¦

empty elements in html

from the article

Authors who follow the compatibility guidelines essentially create HTML that, while technically invalid due to the use of malformed empty-element tag

Umm, I don't think these are malformed according to SGML. I could be wrong though; is there any authoritative source backing this up or going against it? Plugwash 22:47, 22 January 2006 (UTC)

Empty elements using XML syntax (<element/>) are improper syntax in HTML. You can prove this by running a document that uses them through a validator. If you have a <link ... /> or <meta ... /> element in the "head", the validator will throw an error. However, the validator will not throw an error on <br />, but this is only because it's not parsing as you would expect. A strictly compliant HTML parser parses <foo/> the same as <foo>>. If there were a browser that used a strictly compliant HTML parser, it would render a ">" after the line break every time you used <br />. Indefatigable 18:01, 23 January 2006 (UTC)

do you have any quotes from standards that back this statement up? Plugwash 18:12, 23 January 2006 (UTC)

The well-formedness rules for HTML are different from XML. XHTML 1.1 strict is strict XML, but I think XHTML 1.0 transitional is intended to be parsed either way. Not sure if that helps. —Michael Z. 2006-01-23 18:26 Z

I don't have a succinct quotation from a standard, but the SGML standard coupled with the SGML declaration for HTML from the W3C implies the behaviour I've described. When XML came out, a feature was added to SGML to enable XML-style empty elements (so that XML could be a subset of SGML), but the feature has to be explicitly enabled in the SGML declaration. The W3C has never updated the SGML declaration for HTML to turn this feature on. (I confess that I'm not expert enough with SGML to say exactly what changes would be necessary, but I think it goes in the FEATURES MINIMIZE section.) Here are a few more links (but they are not authoritative): Empty elements in SGML, HTML, XML, and XHTML, Sending XHTML as text/html Considered Harmful, Comparison of SGML and XML (this last one's tough reading for those who, like me, are not hard-core SGMLers). Indefatigable 21:14, 23 January 2006 (UTC)

XHTML media types for dummies

"XHTML Documents which follow the guidelines set forth in Appendix C, "HTML Compatibility Guidelines" may be labeled with the Internet Media Type "text/html" [RFC2854], as they are compatible with most HTML browsers. Those documents, and any other document conforming to this specification, may also be labeled with the Internet Media Type "application/xhtml+xml" as defined in [RFC3236]. For further information on using media types with XHTML, see the informative note [XHTMLMIME]." -- http://www.w3.org/TR/xhtml1/#media

There is no "media-type problem…". There may be a literacy problem, however - I'm not sure how we're supposed to deal with that through a medium dependent on literacy, but I am sure the solution isn't linking to a couple short, redundant writeups on commercial sites. ¦ Reisio 02:28, 9 February 2006 (UTC)

I am the primary author of Serving up XHTML with the correct MIME type. The article was written in an attempt to provide an easy way for developers to serve XHTML as application/xhtml+xml. The tutorial describes the technique in detail, and has had technical and peer review. The media-type problem relates to the fact that certain well-known browsers cannot cope with the application/xhtml+xml MIME type, and must therefore be served text/html after the document has undergone some sort of transformation. I would argue that it makes sense to link to the tutorial if you are going to link to Ian Hickson's Sending XHTML as text/html Considered Harmful; however, as the author I don't feel qualified to be the one to decide - I'm sure other (and perhaps more learned) Wikipedians will make the correct decision. To address part of the comment by Reisio, I don't understand why you think the article is redundant. Nor do I think that it is a problem that the link is on a commercial site - I would imagine that a great many external links on the Wikipedia are to commercial sites; however, I would be perfectly willing to create a Wikipedia entry of the technique if you think that is more appropriate. Thoughts, anyone? -- Scjessey 15:33, 9 February 2006 (UTC)

Wikibooks would be the place to do it (you can link from here to there). Suffice it to say that I would have no issue with that. ¦ Reisio 22:52, 9 February 2006 (UTC)

xhtml and css

What about that thing where 'head' needs the same css attributes as 'body'? 0_o --81.84.216.52 19:59, 10 February 2006 (UTC)

...eh? ¦ Reisio 01:09, 11 February 2006 (UTC)

He's talking about the misconception that CSS styles don't work on the body tag unless they're duplicated on the html tag. How this got started, I don't know, but it's apparently based on the fact that, in XHTML, the body element isn't "magic" like it apparently is in HTML 4; it's just another element, and the very bottom-most element to be rendered is actually html. This is a good thing for a few reasons, the least of which is that it lets you get rid of that div element so many designers seem to put around all their content, and just style the body tag like you would that div, thus reducing the amount of markup you need to worry about. I use this all the time.

KyleCardoza 08:31, 13 August 2006 (UTC)

I hadn't heard that misconception before. I do believe there's a case where, to specify the height of the body as 100% (for example), the HTML element also has to have its height specified as 100%. If not, then the 100% applies just like the default: to 100% of the height of the contents the body encloses. So if the body contains two paragraphs of 300 px each, a property of 100% on the body will make the body 600 px high. However, if the HTML element is also included in the 100% rule, the body will be the height of the viewport instead (which is typically what a designer is after when they set "body {height: 100%;}"). I found the article that exposed me to this issue here. --Cplot 19:33, 13 August 2006 (UTC)

Considering the external links on this page

I recently removed a validator link which the author added again. Looking closer, I feel that even more of the external links on this page should be removed. Per What Wikipedia is not, External links and When should I link externally there is no need to link to (currently) six validators. I'd say the W3C's Markup Validator and WDG HTML Validator links are sufficient. The others are either too specific (e.g. the plug-in for the non-free Windows-only program NoteTab and the module for ASP.NET 2.0) or too general (UiTest.com Analysis is even redundant with the W3C validator, and is hardly relevant to XHTML at all, per se.)

The three articles about “harmful” text/html XHTML (I'd say it's rather POV) and correct serving should probably be copyedited and included in this article, or be removed, per the policies/guidelines linked to above. You could also consider making an article dedicated to the cause, like serving XHTML correctly or whatever, although I don't really think that's proper... DarkPhoenix 10:33, 4 April 2006 (UTC)

I agree that there are too many validators listed here, for this context, and that this could become an unmanageable spam-link farm for every half-finished idea on the web. However, I've just added one - the Firefox plugin/extension module - which I now use all day every day as a web site developer. My personal POV is that everyone (developers, users, managers et al) should be made as aware as possible of valid vs. invalid (X)HTML. Rather than lose the link information that's accreted here, I suggest a new article, HTML validation or something like that, with a 'Main article' link from here, and several other links from all the other relevant places. Trouble is, I'm a bit busy at the moment (a whole site needs to go into UAT by mid-month), so I can't get down to it at this stage. --Nigelj 21:00, 4 April 2006 (UTC)

On the subject of the links regarding XHTML's content type, again I agree that this is an important and topical issue and that another new article, perhaps called Serving XHTML, is called for. Incidentally, I just checked and W3C's homepage is served with an HTTP header that says it's 'application/xhtml+xml', but the XHTML itself has a <meta/> tag that says it's 'text/html'! I assume that they must have given this some thought, and I feel that information like this should be recorded and discussed somewhere in WP. Microsoft have also been spreading some FUD about this issue, due mainly to IE6's shortcomings, and that needs counter-balancing. --Nigelj 21:21, 4 April 2006 (UTC)

Considering the W3C's serving of application/xhtml+xml with a <meta> tag of text/html, I can only assume it may be done to fool IE into cooperating, because per the rules of HTTP, the HTTP headers are the most significant setting... DarkPhoenix 22:05, 4 April 2006 (UTC)

Speaking of which, my IE crashes upon trying to access the W3C homepage. DarkPhoenix 22:11, 4 April 2006 (UTC)

Also, considering there's an article on Divitis, I'm sure there won't be many objections to an article like Serving XHTML. DarkPhoenix 08:00, 6 April 2006 (UTC)

First timer

Am learning HTML & CSS to set up a web page. Should I switch to XHTML?--shtove 18:37, 15 April 2006 (UTC)

Unless you need to do some advanced stuff like MathML or included SVG, I don't see why you would need to. Of course, experience has shown me that others will likely disagree with me for some reason or other. --DarkPhoenix 19:09, 15 April 2006 (UTC)

If one follows the XHTML syntax, the only things that need to be changed to transform documents from HTML 4.01 to XHTML 1.0 (of the same sub-version: strict, transitional, frameset) are to:

change the DTD declaration above the head
make sure the xmlns is included (for the XHTML version)
change the lang attributes to xml:lang attributes

The only other issue is that XML offers the self-closing tag shortcut that is technically foreign to SGML-based HTML. The W3 recommends adding a space before the '/' in a self-closing element: for example <br /> instead of <br/>. The W3 also recommends simply including both lang and xml:lang attributes to deal with the move from HTML to XHTML. And you can go ahead and declare the XML namespace in an HTML document without any hangups.

So if one is hand-coding their documents, it's probably best to just stick to an XHTML syntax with these minor changes. Then moving from one to the other is just a matter of changing the DocTypeDefinition declaration.
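
The changes listed above can be put together in a small sketch. This Python example is only an illustration: the function names xhtml_skeleton and parse_root are hypothetical, and it assumes the XHTML 1.0 Strict doctype. It emits a document with the XHTML doctype, the xmlns declaration, both lang and xml:lang, and a <br /> written with the recommended space, then confirms the result parses as XML.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Assumed for this sketch: the XHTML 1.0 Strict document type declaration.
XHTML_STRICT_DOCTYPE = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
)


def xhtml_skeleton(lang: str, title: str) -> str:
    """Build a tiny XHTML document applying the changes listed above."""
    lang_attr = quoteattr(lang)  # adds the quotes and escapes the value
    return (
        f"{XHTML_STRICT_DOCTYPE}\n"
        # xmlns is declared, and lang is given in both HTML and XML forms.
        f'<html xmlns="{XHTML_NS}" lang={lang_attr} xml:lang={lang_attr}>\n'
        f"<head><title>{escape(title)}</title></head>\n"
        # Empty element written with a space before the slash.
        "<body><p>Line one<br />line two</p></body>\n"
        "</html>"
    )


def parse_root(document: str) -> ET.Element:
    """Parse the document with an XML parser and return its root element."""
    return ET.fromstring(document)
```

Calling parse_root(xhtml_skeleton("en-us", "Test")) succeeds and gives a root element in the XHTML namespace, which shows the output is well-formed XML; whether a given browser then treats it as XHTML still depends on the media type it is served with.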

XHTML 1.1 strict?

This paragraph from the subsection "XHTML 1.1" argues there's a concept of "strict" conformance that excludes the frameset (and presumably the legacy) modules from the DTD.

Although Modularization of XHTML allows small chunks of XHTML to be re-used by other XML applications in a well-defined manner, and for XHTML to be extended for specialized purposes, XHTML 1.1 adds the concept of a "strictly conforming" document: such a document cannot employ such features—it must be a complete document containing only elements defined in the modules required by XHTML 1.1. For example, if a document is extended by using elements from the XHTML Frames (frameset) module, it may still be described as XHTML 1.1, but not strictly conforming XHTML 1.1. Instead, it might be described as an XHTML Host Language Conforming Document, if the relevant criteria are satisfied.

I can find no verification of this. It wouldn't surprise me if it were true, but can someone point to somewhere in the specification where this is presented? --Cplot 20:13, 13 July 2006 (UTC)

XHTML 1.1 - Conformance Definition —mjb 00:40, 14 July 2006 (UTC)

Yes, that does it. Sorry I missed that. I had seen that page, but missed that it had neither the frames nor the target modules. I also missed that the legacy module has the word "ignore" next to it rather than "include" like almost everything else. At first, after seeing that link, I thought that they included the legacy module but not the frames-related modules. Now I gather that to be a conforming (of any kind of conformance) XHTML 1.1 document, the document has to ignore the legacy presentation module. Anyway, thanks for the link. --Cplot 03:13, 14 July 2006 (UTC)

Possible rewrite

This article is very inaccurate in many ways. A long time ago I set out to consider how to rewrite it. I recorded my thoughts at User:Tildebeeplus/XHTML. I just remembered this now. Interested editors should take a look. I know the article's improved since that time, and I don't even agree with everything I wrote at the time. But it's there, so I should at least make it known. —tilde 21:33, 19 July 2006 (UTC)

If I were going to rewrite it, I'd get rid of the whole validation section and possibly some other not-so-necessary information (people tend to want to make these articles into comprehensive tutorials) and I think my TOC would look pretty much like yours …but I don't see where the current article is inaccurate. What errors do you see? Anyway, if you do rewrite things, take it on one section at a time, if you can. More drastic reorgs should be discussed here first. —mjb 23:53, 19 July 2006 (UTC)

I no longer have any interest in rewriting it (which I meant to imply, but for some reason didn't). I stopped watching this article long ago. So what's there is there. I'm just linking to what I had done, since it seemed like a waste to keep it hidden. Again, I think this article's improved its focus, and I can't defend everything I did. But I completely agree that this article should not be a tutorial. The "common errors" section is particularly inappropriate for this encyclopedia. —tilde 00:08, 20 July 2006 (UTC)

Addition of lang and xml:lang attributes in namespace discussion

As a correction, I removed the reference to these attributes in the namespace discussion, since those are not relevant to the schema or namespace designation. I left the attributes set in the example, but without further comment. --Cplot 17:10, 12 August 2006 (UTC)

proposed link

The issue over XHTML and CSS had me turning to Quirksmode, which is an excellent site for dealing with all sorts of boundary issues in XHTML and CSS. It's not especially CSS oriented, but it might be worth adding to the external links list. --Cplot 20:14, 13 August 2006 (UTC)

Wrong reasons?

"The need for a more strict version of HTML was felt primarily because World Wide Web content now needs to be delivered to many devices (like mobile devices) apart from traditional computers, where extra resources cannot be devoted to support the additional complexity of HTML syntax."

Mobile devices aren't strictly the reason; complexity in general is. Today it's almost impossible to write an HTML parser that "properly" reads tag-soup pages. Implementers need to reverse-engineer Internet Explorer and implement exactly the same error recovery. This is a nightmare, and browser vendors have to put enormous effort into getting it right (see Bugzilla for examples). If you eliminate error recovery from the equation, parser implementation becomes a trivial task, and that's the goal of XHTML. —Preceding unsigned comment added by 84.92.248.233 (talk • contribs) 23:10, 21 August 2006

Sounds good. You should add something like this to the paragraph, though in slightly more encyclopedic language. Another issue of complexity relates to being able to extend the specification, modularize it and evolve it more quickly with XHTML than with HTML. Just another thought. --Cplot 01:43, 22 August 2006 (UTC)

Typo?

"....to show the contrast between mark-up and context easier to the human editor."

Shouldn't that be "content" rather than "context"?

Mike Freeman 18:35, 30 September 2006 (UTC)

.xht extension?

Is this a real extension used on some operating systems, or is this just made up? It isn't recognized on Mac OS X as mapped to either text/html or application/xhtml+xml. Does Windows use this? I'd be surprised if DOS has any applications to process XML documents, so what would be the point of a three-letter extension? I raise this question because just recently someone edited this diagram, saying there are no incorrect extensions. While that may be true in some sense, in another, more accurate sense it is not true. --Cplot 16:51, 2 October 2006 (UTC)

I guess I should add to the OS question whether any HTTP server (out of the box) maps this filename extension .xht to an XHTML-related MIME type. Obviously the server administrator can add filename extension mappings, but I'm wondering whether this is one that exists somewhere out of the box. I'd hate for Wikipedia to be the originator of such a strange filename extension. --Cplot 17:01, 2 October 2006 (UTC)

Yes, it's a real extension; see RFC 3236. The default mime.types for Apache, the most widely used web server, maps both xhtml and xht to application/xhtml+xml. --cesarb 17:31, 2 October 2006 (UTC)

OK. Good enough for me. --Cplot 06:31, 3 October 2006 (UTC)
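
The extension mapping discussed in this thread can be sketched as a lookup table. This is only an illustration: the table below is hand-written for the example and is not a copy of Apache's mime.types file, and the function name media_type_for and its default value are hypothetical.

```python
import os.path

# Hand-written table for illustration; RFC 3236 registers both .xhtml and
# .xht for application/xhtml+xml, as noted in the reply above.
EXTENSION_TO_MEDIA_TYPE = {
    ".html": "text/html",
    ".htm": "text/html",
    ".xhtml": "application/xhtml+xml",
    ".xht": "application/xhtml+xml",
}


def media_type_for(filename, default="application/octet-stream"):
    """Return the media type a server using this table would send."""
    _, extension = os.path.splitext(filename)
    return EXTENSION_TO_MEDIA_TYPE.get(extension.lower(), default)


print(media_type_for("index.xht"))
```

A real server reads such mappings from its own configuration, so an administrator can always add or override entries, as the question above points out.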

(XHTML 2.0): <sup> and <sub> are not presentational elements

The section on XHTML 2.0 states that the only somewhat-presentational elements remaining are sup and sub. Actually, those aren't presentational at all; they're a standard part of text and are required for proper typography in many written forms and languages. Examples: French titles like M^lle Dupont, chemical formulas like H₂O, mathematical formulas like e = mc², and footnotes.¹ They are no more presentational than the shape of the letter X. - Richcon 02:41, 17 October 2006 (UTC)

Thanks for highlighting this. There's nothing wrong with ‘sub’ and ‘sup’ elements being presentational. Sometimes the goal of eliminating all presentational elements can just make the language cumbersome.

The examples you give, though drawn from the recommendation, actually make the opposite case. I’m starting to think the word “somewhat” shouldn’t be there in that passage. Typography is both presentational and semantic. It doesn’t try to make the distinction that the W3C makes. And the more examples we can list for various meanings presented as either subscript or superscript the more it demonstrates that these are not semantic elements. Consider an analogous example (paraphrasing):

Italics are part of text and are required for proper typography in many written forms and languages. Examples: book titles like The Adventures of Huckleberry Finn; sarcastic expressions like, I love watching Oprah; and emphasis such as several different meanings can be presented with italics.

All of those phrases have meaning and that meaning is conveyed through an italic presentation. However, the italics are used in the absence of other semantic elements to take their place. Clearly the superscript ‘2’ in e = mc² does not share the same meaning (though it does share the same presentation) with the ‘lle’ in M^lle. I can imagine elements that might serve as semantic elements for the latter two examples, for example an <exponent>2</exponent> element. However, for the French example I’m not familiar enough with French to even describe the meaning conveyed in the superscript. It seems somewhat similar to ordinal counts that are sometimes presented with superscript in English (for example: 1^st, 2^nd, 3^rd, 4^th). Again this is another place where superscript may be an important presentational idiom, but it doesn’t make ‘sup’ into a semantic element.

The shape of the letter ‘X’ is another good example. The shape of the letter ‘X’ — or its glyph — is the presentational form of the letter. Its semantic form is its “character”. This is another area where several decades ago computer science began to differentiate between semantics (characters) and presentation (glyphs) and is quite analogous to that separation in HTML. --Cplot 04:49, 17 October 2006 (UTC)

I was thinking about this issue that Richcon brought up and it occurred to me that many of those examples actually have semantic structures in existing W3C recommendations. For example the French abbreviation example could be covered by the ‘abbr’ element and the ‘:first-letter’ pseudo selector (combined with the ‘lang’ selector for ‘fr’ if someone wanted to make it specific to French). Similarly the e=mc² has a MathML semantic syntax. The footnote, though not covered by current recommendations, is included in the CSS3 paged-media generated content draft. That leaves only the chemical formulae semantics that do not have a W3C semantic markup that I’m aware of. Of course with XML namespaces another group could develop and publicize a separate chemistry XML in its own namespace. --Cplot 06:18, 20 October 2006 (UTC)

The technique of using abbr and :first-letter can't cope with all French abbreviations that use superscripts, e.g. 22^e (vingt-deuxième) and Établ^ts (Établissements), but these superscripts are not truly semantic, because if superscript formatting is not used, the text is still readily understandable, and its meaning does not change. Indefatigable 21:43, 21 October 2006 (UTC)

In a mathematical formula the superscript is semantic: without it, the exponent is incorrectly expressed. But in abbreviations like 3rd, 22e or Mlle, it is solely presentational (I don't know if Établts is an acceptable typographic expression). Hence, semi-presentational. —Michael Z. 2006-10-22 16:34 Z

This is a common misunderstanding between presentation and semantics. If something is presentational, it may convey a meaning. However, it is not the same as the meaning. A superscript is one possible way to convey the meaning of an exponent. It also can convey lots of other meanings depending on context. A semantic element conveys the same meaning regardless of context (see the example of italics above). --Cplot 16:57, 22 October 2006 (UTC)

--Boxxertrumps says: there is no point to the superscript and subscript elements, because the same effect can be achieved with CSS. Example: 7<sup>3</sup> is the same as: 7<span style="vertical-align: super">3</span>

"Future Product" tag on The XHTML 2.0 draft specification section

Someone marked this article with a {{future product}} tag (now moved to the relevant section) because XHTML 2.0 is still a draft recommendation. Since the section is referring to the draft itself, rather than the future specification, I don't believe the tag is necessary. I recommend that the tag is removed. Anyone else agree? -- Scjessey 18:02, 18 October 2006 (UTC)

I’m the one who moved it to the XHTML 2.0 draft section, though I tend to think it isn’t all that necessary. XHTML 2.0 is not really a ‘product’ so the template seems a bit forced. Also I agree that since the section makes it clear it’s talking about a draft specification that it’s also clear things can change in the future. --Cplot 18:11, 18 October 2006 (UTC)

XHTML HTML compatibility example

Someone took away the ‘id’ and ‘name’ attributes from this example. I added them back in, but this time to a paragraph element rather than the body element. I added ‘lang’ and ‘xml:lang’ attributes to the body. I added an empty script element to show how that should be, and I removed the XML declaration since appendix C also recommends leaving that off (for the IE 6 and earlier doctype sniffing bug). --Cplot 04:17, 24 October 2006 (UTC)

I have removed the "name" attribute since it is invalid in XHTML 1.0 Strict. I have also removed "xml:lang" and "lang" in <body> as there has been "xml:lang" and "lang" in <html>. XML declaration has been added as it should be used in an XML document.

In fact, the part ", following the HTML Compatibility Guidelines, in Appendix C of the XHTML 1.0 Specification." was added by me (using an IP address of 61.238.43.155). At first, I thought that the example should have something related to the HTML Compatibility Guidelines. However, it is really difficult to balance the use of XHTML 1.0 Strict and the HTML Compatibility Guidelines. Should the example still reflect the HTML Compatibility Guidelines? I am not sure about this now. I would like to ask for comments. If it is not necessary, I suggest using XHTML 1.1. If it should, then changing from XHTML 1.0 S to XHTML 1.0 Transitional may be better.

--Franklin Tse 09:46, 24 October 2006 (UTC)

More about the XML declaration, quoted from the HTML Compatibility Guidelines -

"In order to portably present documents with specific character encodings, the best approach is to ensure that the web server provides the correct headers. If this is not possible, a document that wants to set its character encoding explicitly must include both the XML declaration an encoding declaration and a meta http-equiv statement."

--Franklin Tse 09:55, 24 October 2006 (UTC)

The XML declaration is unnecessary for any document that follows XML 1.0 and uses either UTF-8 or UTF-16 as its character encoding. As appendix C advises, it should not be used for HTML compatibility. This is because it sends IE 6 and earlier into quirks mode for the CSS box model.

I think it’s a good idea to have an appendix C example in the article. And appendix C recommends those lang/xml:lang and id/name duplications among other things. As for validity, technically, those attribute duplications are no more ‘valid’ under XHTML 1.0 Transitional than they are under XHTML 1.0 Strict. For appendix C, the point is to include them regardless of invalidity for compatibility reasons. --Cplot 12:11, 24 October 2006 (UTC)

I do think that the example has to be successfully validated by the W3C Markup Validation Service. I have changed the example from XHTML 1.0 Strict to XHTML 1.0 Transitional and added an <img> for showing id and name. --Franklin Tse 16:19, 24 October 2006 (UTC)

That does make sense that it should validate if possible. I changed the fragment identifier to an anchor element and now it validates in XHTML strict. I wanted it to be strict so that it undermines the misconceptions that the HTML compatibility guidelines have something necessarily to do with the “Transitional” variant of XHTML 1.0. --Cplot 01:11, 25 October 2006 (UTC)

I have rewritten the example for showing the difference between clean XML and codes following the HTML Compatibility Guidelines. --Franklin Tse 11:58, 25 October 2006 (UTC)

Overview rework

I made some changes to the overview section. One of my main concerns is that the article is drawn largely from the recommendations without reflecting what’s actually happening with those recommendations: particularly with adoption of XHTML. For example the issue of XHTML being served as HTML keeps getting added ad hoc to the article in one place after another: typically without due explanation. So I decided we should try to address that as soon as possible in the overview so that it can be referred to where relevant in later sections. In doing this, I moved material around, corrected some material and added some new material. I’m still gathering my references for some of those claims, but if anyone wants to add some before me, feel free. --Cplot 04:03, 25 October 2006 (UTC)

Will the paragraph "So while many authors were anxious to embrace the new XHTML recommendations, the obstacles from browser vendors — persisting seven years after the introduction of XHTML — have slowed the effective rate of the adoption. Recently, some have begun to question why authors ever made the leap into authoring in XHTML. Campaigns now exist discouraging authors from following the W3C’s appendix C HTML compatibility guidelines: suggesting it’s a mistake. Due to a vacuum of information and without forthcoming browser support, XHTML adoption among authors is actually beginning to reverse." be too subjective?

--Franklin Tse 13:51, 25 October 2006 (UTC)

I don’t think it’s too subjective, but it does need some citations. I think the earlier claim about the authoring in XHTML should also be supported. I imagine someone must have some statistics on how many XHTML documents are out there, but I’m having trouble finding it. I do know whenever I check a web page, there’s a very good chance it’s authored as XHTML. I’m trying to track something down on that. I do have citations for the claims made in this paragraph and I’ll be adding them. --Cplot 15:17, 25 October 2006 (UTC)

Can this link "http://annevankesteren.nl/2004/06/invalid-html" be a citation? --Franklin Tse 16:06, 25 October 2006 (UTC)

Yes, that’s a good example of the confusion I was talking about. There are many others. Perhaps we could place a footnote and list several of the more notable ones. Here’s some other notable (notorious?) ones.

As I said, perhaps a footnote with all three of these where the “Adoption” subsection describes the campaign against XHTML. --Cplot 17:05, 25 October 2006 (UTC)

I think they are the criticisms of XHTML. Should a new section titled "Criticisms" be added? --Franklin Tse 17:31, 25 October 2006 (UTC)

I added a footnote to the “Adoption” subsection. They’re not really straightforward and clear enough to be called criticisms of XHTML. They are criticisms of XHTML’s HTML compatibility from appendix C, so in that minor sense they are criticisms of XHTML. I have a feeling we’ll find it difficult to locate scholarly criticisms of XHTML. I think those citations are notable more for the effects they’re having on XHTML adoption (swaying authors away, convincing browser vendors there’s no need to support XHTML, etc.) than as criticisms per se.

However, I think for the purposes of NPOV, criticism sections are a good idea when they’re clear, concise and scholarly. XML itself probably has more criticism material, but that would be for the XML article. --Cplot 17:49, 25 October 2006 (UTC)

The article is supposed to be about XHTML itself, is it not? POV stuff about adoption is misplaced, especially when laced with weasel words (which certainly isn't scholarly). There is no "campaign against XHTML", but rather a desire not to see it used improperly, or inappropriately. -- Scjessey 18:06, 25 October 2006 (UTC)

weasel words banner

There’s only a single “some say” in that section, and the “some” refers to the campaign cited in the next sentence. Did you mean to place it on a different section???

"many authors" and "some have begun" are examples of weasel words. "Due to a vacuum of information and without forthcoming browser support, XHTML adoption among authors is actually beginning to reverse" is an example of POV. -- Scjessey 18:15, 25 October 2006 (UTC)

Please do not remove the weasel words tag until the words have been excised and a consensus exists. -- Scjessey 18:17, 25 October 2006 (UTC)

You’re misreading those as weasel words. The ‘many authors’ is not meant to cite the authors; it’s like saying ‘many browsers’. Here the authors are objects of the claim (which could admittedly use a citation, please help), not subjects who are cited to support the claim. I removed the tag because I had no idea how you were misreading the statements. Again, on the ‘some have begun’, the citation also refers to the cited campaign. This is a notable event in XHTML and there are citations of it in the footnote (including the claimed POV ‘beginning to reverse’). I can provide more citations if you think that’s necessary. What exactly is the objection, however? That might be a better place to start. Whenever I see someone start with one Wiki policy and then quickly move on to another one, there’s usually something else at stake that’s much simpler to resolve. --Cplot 18:34, 25 October 2006 (UTC)

This may be useful: XHTML as an Emerging Innovation for the World Wide Web. —Michael Z. 2006-10-25 18:37 Z

Also, as I already said above (on the many authors claim), I’m trying to track down some stats on that (and would appreciate help). Clearly adoption of XHTML by authors has been quite impressive (considering the browser obstinacy). I can point to many cases where sites/authors have reversed their decision and made a public statement about it, but there must be some statistics on this. It’s something more easily compiled by bots than browser stats that get quoted all the time (thanks Michael for the lead). --Cplot 18:44, 25 October 2006 (UTC)

"many web authors" is just as bad as "many authors". You cannot plonk stuff like that into the article without backing it up with suitable references. Furthermore, you should not be removing the weasel words tag just because you don't understand why it is there. There is still a lot of POV stuff in there too. For example, "XHTML adoption among authors is actually beginning to reverse." You have to back that stuff up. I think the entire last paragraph of the "adoption" section should either be removed, or completely rewritten. -- Scjessey 18:59, 25 October 2006 (UTC)

Well, I don’t think you should be putting banners on pages without reading carefully what’s written and visiting the talk page. But maybe that’s just me. Why is ‘many authors’ a weasel word while ‘many browsers’ is not? As I said before, I’m not using ‘authors’ in the way one would use it in weasel words, but just like the rest of the article uses ‘many browsers’: as an object. As I said on the talk page, I have some references for that which I’m compiling. The citation I’m having trouble tracking down is for the claim you have no problem with: that adoption of XHTML happened (from the first paragraph). The events of the last paragraph, I've already referenced. And I can keep providing references for this ad infinitum if you’d like. Try visiting the talk page before throwing misplaced banners on a page. --Cplot 19:20, 25 October 2006 (UTC)

I've been a contributor to this article for a long time, and I absolutely monitor this talk page. Perhaps you don't grasp what it is you are doing wrong, despite my repeated explanations. Let me try one more time. In order to disguise a lack of evidence, or simply a lack of any credible reference, editors sometimes use terms like "some people say" - these are weasel words. When you use generic terms like "many authors", you are doing the same thing. As I read your words, I am not getting the impression that you are referring to "authors" as an object. The sentence structure and grammar you are using indicates a generic term. As I have said before, you must alter the language of the paragraph in question to make it more clear. Furthermore, since you wrote this section, it is not for you to decide to remove any tags. You should have left the {{weasel section}} tag in place until a consensus of opinion was reached on this discussion page. Once again, it is not for you to decide - it is a collective decision.-- Scjessey 21:39, 25 October 2006 (UTC)

I understand what a weasel word is. The problem is there aren’t any weasel words in the section I added. There isn’t an NPOV problem either. There are some citation problems and I’ve been forthcoming about that from the moment I added this content. The problem is you’re being too quick to just insult the prose with unsupported weasel word banners without explaining what facts you’re disputing. What you want to put up there is a ‘fact’ template. That way it indicates a dispute of some fact in that subsection. Without that I can’t tell what particular piece you’re looking for citations on.

Do you really think I’m using “some authors” in the sense that I’m citing some authors (in the weasel word sense)? I think a general readership would find it clear that this is referring to authors of web pages and not authorities being cited. Would it be better if I said something like “many web pages authored with XHTML” instead? I think that would be unnecessary, but that change would be fine with me.

Finally, for the record I regularly remove unsupported templates on wikipedia (unless I can make out myself why that template might have been added). Whenever you place a banner or template on an article or section you should be sure to explain why. --Cplot 22:54, 25 October 2006 (UTC)

RDF and XHTML?

I'm wondering, does any of you know if it's possible to mix RDF syntax on XHTML documents through the use of the <rdf:RDF>...</rdf:RDF>? If it's valid that would be pretty interesting.--Saoshyant ^{talk / contribs} (I don't like Wikipedophiles) 13:39, 30 October 2006 (UTC)

You can mix namespaces when XHTML is served as XML (application/xhtml+xml, application/xml, text/xml). That makes serving such pages to a wide audience difficult, since IE does not support application/xhtml+xml, and for the other two MIME types it only recognizes the semantics of XHTML if you trick it using an XSLT.

Also, I don’t know enough about RDF to know how specific user-agents would need to treat it in that context. I imagine the browser would not need to do anything special, and indexing bots and the like would probably still be able to make use of it. If that’s the case it may also work if you serve it as text/html, but with the RDF elements set to ‘display: none’ with CSS. That way the bots could still get to it, but the RDF would not get in the way of the rendering of the page. You’d need to test that on all the browsers you’re targeting, however, since it goes against the standards.

This really isn’t the place for such discussions, but to make it relevant I’ll say that RDF would make a good example for the namespace section. --Cplot 16:06, 30 October 2006 (UTC)

I tested it using XHTML 1.1 served as app/xhtml. The W3C validator reports the RDF as invalid, though. I also thought that in theory, it should work. It would also probably extend the use of XHTML if it did work; after all, people don't have a reason yet to use XHTML instead of HTML.

To extend XHTML with new elements, you need to:

send it with an XML media type. You must do this because namespaces are not supported in HTML.
omit references to the formal W3C document type definitions. If there’s no internal subset, then you can simply remove the document type declaration. You must do this since, otherwise, the document will be considered invalid according to the DTD.

JustSomeGuy 19:08, 15 June 2007 (UTC)
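
The namespace mixing described in this thread can be tried with any namespace-aware XML parser. The following is a minimal sketch, assuming Python's xml.etree.ElementTree; the sample document, the Dublin Core dc:creator property and the helper name extract_creators are made up for illustration. In line with the two points above, the sample omits the document type declaration and is meant to be handled as XML, not as text/html.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical XHTML document with an rdf:RDF block inside <head>.
DOCUMENT = """<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <head>
    <title>Mixed namespaces</title>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/elements/1.1/">
      <rdf:Description rdf:about="">
        <dc:creator>Example Author</dc:creator>
      </rdf:Description>
    </rdf:RDF>
  </head>
  <body><p>Page content</p></body>
</html>"""


def extract_creators(document):
    """Collect the text of every dc:creator found inside an rdf:RDF block."""
    root = ET.fromstring(document)
    path = ".//{%s}RDF/{%s}Description/{%s}creator" % (RDF_NS, RDF_NS, DC_NS)
    return [element.text for element in root.findall(path)]


print(extract_creators(DOCUMENT))
```

This only shows that an XML tool can separate the RDF from the XHTML by namespace. It says nothing about how a given browser renders such a page, which is what the test reports in this thread are about.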

I also tested the file under two "XHTML-browsers": Fx and Konqueror. I believe it worked perfectly well in Fx, but in Konqueror it rendered the RDF meta stuff as page content, even though the code was located at the <head>. If someone would like to use my test-case and see if it works on other browsers (Opera, etc.), please e-mail me, so I can send the file attached (I don't have a server up right now).--Saoshyant ^{talk / contribs} (I don't like Wikipedophiles) 11:25, 31 October 2006 (UTC)

changes to Common Errors

The changes I made were to correct some inaccuracies and provide some clarification I thought was necessary.

Script elements in HTML require a closing tag, so it is not correct to include a self-closing tag when served as text/html
It doesn’t apply only to IE 6, but serving it to any browser as text/html requires an explicit closing tag
the convoluted CDATA sections, escapes and escaped escapes are only relevant for a page that (during its lifetime) will be served as both text/html and xml

I had deleted the explanation of CDATA sections in hybrid HTML compatible documents because I thought it was too detailed for the article and yet still not explained in enough detail to give a reader a clear enough understanding. --Cplot 05:36, 31 October 2006 (UTC)
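
The first point above, that a script element needs an explicit closing tag when the page is served as text/html, can be illustrated with a serializer that never emits the self-closing form. This is a minimal sketch, assuming Python's xml.etree.ElementTree; the function name expand_empty_elements and the file name app.js are hypothetical.

```python
import xml.etree.ElementTree as ET


def expand_empty_elements(xml_fragment):
    """Re-serialize a fragment so empty elements get explicit end tags."""
    root = ET.fromstring(xml_fragment)
    # short_empty_elements=False writes <script ...></script> instead of
    # <script ... />; to an XML parser the two forms are equivalent.
    return ET.tostring(root, encoding="unicode", short_empty_elements=False)


print(expand_empty_elements('<head><script src="app.js"/></head>'))
```

Note that this switch applies to every empty element, so it would also turn <br /> into <br></br>, which appendix C advises against. A real tool would need to treat elements that are declared empty separately from elements such as script.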

Regarding your three points:

An XHTML document is an XML document, whatever mime-type it is served with. It is relevant to note an isolated case of a normal XML idiom not being understood by a major browser.
A self-closing script element in an XHTML 1.0 document served as text/html is perfectly well understood by Firefox 1.5.x and all the other recent Gecko-based browsers, so it is especially relevant to note that it isn't by IE6. It probably is understood by IE7 (I haven't personally tested it yet, or seen any reports on the point), so even the version number is relevant now that that is out.
Your phraseology certainly read like it could be served with both mime-types at once - which, if any authority recommended that, would make a whole new section in the article! The fact is that served with any mime type, the document is still XML and may be processed by any number of user agents way outside of the page designer's control. These may include search-engine robots, screen-scraping systems, XSLT-based re-processing etc (I do know - I've designed and written some of these things). These 'technical' agents, as well as Firefox et al, may benefit hugely from including the CDATA declaration - even if older browsers (including IE6) don't know what to make of it, so need it hiding. I didn't make up the convoluted form, it comes directly from Sending XHTML as text/html Considered Harmful by Ian Hickson, as cited. While not an official document, this is an oft-quoted and now almost seminal work, even translated into other languages. It has received a lot of attention and updates since it was first written in September 2002, and is unlikely still to contain a 'bad' code example.

XML is XML, and there's a lot more to the web than just a few browser-types, but there's only one browser that is used by a huge majority of users and which is seriously broken and/or out-of-date in many respects. Roll on IE7, and let's hope it's truly better. --Nigelj 22:29, 31 October 2006 (UTC)

Nigelj, first I want you to understand that I’m in complete agreement with you about IE: it’s a piece of crap that has really done significant damage to the progression of the web. And unfortunately IE 7 does virtually nothing to improve things.

However, there are some misconceptions about XHTML that I want to make sure the article does not promote or contribute to. For example, it’s not totally clear what you mean when you say an XHTML document is XML no matter what type it is served as. The W3C appendix C that provides a recommended way to make XHTML into HTML compatible markup may be XML in some sense. However, all browsers do (and all user-agents should) process it as HTML. It’s considered very bad practice not to follow the MIME type. If the document is served as text/plain then it should not be treated as XML or HTML, for example. So in order to serve XHTML as text/html it should follow those appendix C guidelines, which effectively make it into another version of HTML. You’ll find developers of WebKit and Mozilla suggesting the leniency that currently overlooks the unclosed element could change in the future (making those rendering engines more like IE when processing text/html). The fact that it’s understood now in Firefox 1.5 does not mean it will work in future versions of Firefox.

Finally, I wasn’t disputing the escaping of CDATA sections. Just that it is only necessary when the same document will be processed without changes as both HTML and XML. Meaning if it is processed as text/html and application/xhtml+xml mime types without changing the script and style elements. Mozilla recommends against this practice and it’s hard to imagine what such an approach would accomplish. --Cplot 23:20, 31 October 2006 (UTC)
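A minimal sketch of the CDATA point, using Python's standard-library XML parser as a stand-in for any XML user agent. The script content, the `page` helper and the variable names are invented for illustration, and the comment-hidden form shown is a simplified variant, not the exact convoluted form from the cited document. It only checks well-formedness: an inline script containing `<` and `&&` is rejected by an XML parser unless it is wrapped in a CDATA section.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def page(script_body: str) -> str:
    """Build a tiny XHTML document around an inline script (hypothetical helper)."""
    return (
        f'<html xmlns="{XHTML_NS}"><head><title>t</title>'
        f'<script type="text/javascript">{script_body}</script>'
        "</head><body/></html>"
    )


def is_well_formed(markup: str) -> bool:
    """Return True if an XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Script text with characters that are markup-significant in XML.
RAW = "if (a < b && c) { run(); }"

# CDATA section only: fine for XML, but an HTML parser sees the markers as script text.
PLAIN_CDATA = "<![CDATA[\n" + RAW + "\n]]>"

# CDATA markers hidden behind JavaScript line comments, so a script engine
# that receives the markers as text ignores them.
HIDDEN_CDATA = "//<![CDATA[\n" + RAW + "\n//]]>"

if __name__ == "__main__":
    for label, body in [("raw", RAW), ("cdata", PLAIN_CDATA), ("hidden", HIDDEN_CDATA)]:
        print(label, is_well_formed(page(body)))
```

The hidden form only matters when the same bytes are handed to both an HTML parser and an XML parser; a document that is only ever processed as XML can use the plain CDATA form.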

It sounds like we're in complete agreement except that you placed more store on the mime-type than I did: I guess I was thinking more along the lines that, not only could any document be served with a different mime-type in the future, but any user agent could try assuming it had been (it's their business what it and its operator do with a document once obtained, after all). All I'm saying is, the more general-purpose we make our XHTML documents, and staying as close as possible to overall standards, the better - especially at the moment. --Nigelj 19:41, 1 November 2006 (UTC)

I’d like to make sure others see this document from the W3C on treating MIME type headers as authoritative. Problems can be compounded if user-agents try to second-guess the MIME type headers in an attempt to ‘fix’ things. But yes I agree about sticking to standards and composing documents with XHTML following the HTML compatibility guidelines (unless you can go pure XHTML). This would include simply using the src and href attributes to reference external scripts and style sheets rather than attempting to embed them (again, unless one goes with pure XHTML). --Cplot 22:34, 1 November 2006 (UTC)
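A rough sketch of what "treating the MIME type as authoritative" could look like in a user agent, assuming a hypothetical `choose_processing` function and an invented set of return labels. It picks the processing mode from the Content-Type header alone and never inspects the document body or its DOCTYPE.

```python
from email.message import Message


def media_type(content_type_header: str) -> str:
    """Extract the lowercase type/subtype from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = content_type_header
    return msg.get_content_type()


def choose_processing(content_type_header: str) -> str:
    """Decide how to process a response from its declared media type only.

    The labels returned here are illustrative, not taken from any specification.
    """
    kind = media_type(content_type_header)
    if kind == "text/html":
        return "html"        # tag-soup tolerant HTML processing
    if kind in ("application/xhtml+xml", "application/xml", "text/xml"):
        return "xml"         # strict XML processing
    if kind == "text/plain":
        return "plain-text"  # shown as text, even if it looks like markup
    return "other"


if __name__ == "__main__":
    for header in (
        "text/html; charset=utf-8",
        "application/xhtml+xml",
        "text/plain",
    ):
        print(header, "->", choose_processing(header))
```

Under this approach the same XHTML file gets HTML processing when labelled text/html and XML processing when labelled application/xhtml+xml, which is the distinction being argued over in this thread.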

Browser differences

There seems to have been a recent increase in the amount of information relating to differences in the way certain browsers handle XHTML. I'm not sure any of these browser-specific references have a place in this article. Surely this article is about the various XHTML specifications/recommendations, and not about any specific browsers. Mention of Internet Explorer 6, for example, is largely irrelevant - that browser (and IE7, for that matter) is not designed to specifically handle XHTML (it treats an XHTML as HTML). The article is being bloated by browser issues and related examples. -- Scjessey 23:11, 31 October 2006 (UTC)

But the main problems with XHTML at the moment are the browsers - especially IE6. To try to pretend there is nothing to say on that subject at the present time would be to skirt around all the issues, and end up with a very short article (look how short the XHTML 1.0 spec was!). Maybe at some point in the future, when the whole web is being browsed on IE8 and Firefox 3, we can cull all this stuff from the article. Don't worry we'll still be here then, and someone will ;-) --Nigelj 19:48, 1 November 2006 (UTC)

"But the main problems with XHTML at the moment are the browsers" - The fact that browser manufacturers are incapable of producing products that can handle and properly render XHTML as defined in the specification is their problem. It is not a problem with the specification itself. General criticisms of the specification itself are valid here, but specific browser issues should not be discussed in this article. As I have said before, Internet Explorer was never designed to handle XHTML, so why should so much text be devoted to discussing IE problems with XHTML? It makes no sense at all. It is like talking about how all the different types of square pegs are not supported by round holes. -- Scjessey 20:33, 1 November 2006 (UTC)

I don't know the current figure, but if say 80% of web users still use IE6, then it is fundamental to any XHTML usage that that majority of users should still be able to access the XHTML pages. The XHTML spec is not a theoretical, abstract or academic issue - it has to be discussed and analysed with regard to the real world. Consider an article about a war that discussed only the technical specifications of the weaponry and refused to mention any other aspect? Well, the www isn't a war-zone, so maybe that's not the best analogy. What about an article on education that talked only about chalk or blackboard specifications? Or even only about the syllabus? --Nigelj 21:01, 1 November 2006 (UTC)

Perhaps XHTML adoption issues needs a separate article, because it opens such an enormous can of worms? In the "real world" that you speak of, there is hardly any actual XHTML out there, because it is being served as text/html, and thus treated as invalid HTML. -- Scjessey 21:31, 1 November 2006 (UTC)

Yes, but that was my point in the discussion above: it still is XHTML - whatever mime-type it is served with. The mime-type is just a processing or rendering hint to the first piece of software to receive the download. For a simple example, save it to disk and double-click it - the serving mime-type is then irrelevant.

There's an awful lot of XHTML out there - the whole of Wikipedia for a start, probably all the other wikis that use the MediaWiki framework too. Microsoft's Visual Studio 2005 creates XHTML web sites by default whenever you start a new ASP.NET 2.0 project (and most users will just use the default). I doubt if students are taught anything else, in introductory classes, these days. Whatever the mime-type, if a page has an XHTML DOCTYPE it is XHTML: maybe invalid, maybe HTML-compatible, whatever. Adoption isn't the issue, that's now a given - it's information and understanding that are needed (e.g. from an on-line encyclopedia like WP ;-) --Nigelj 22:31, 1 November 2006 (UTC)

First, I would agree that there are a few places where we get into too much detail about IE and the workarounds required for IE. We could probably devote an entire article to how one makes W3C recommended XHTML/HTML/CSS/XSL IE compatible. However, IE’s disabling of XHTML is a significant and documented fact about XHTML so it definitely needs mention in this article. Also, it’s a common misconception to say that IE was not designed for XHTML. It certainly was designed for it; Microsoft simply disabled it to be obstructionist. That’s certainly information worthy of an encyclopedia article. Another common misconception is that serving XHTML as MIME Type text/html is invalid. That’s just plain wrong and you can look it up with the W3C if you’d like (I take them to be authoritative on this, not some guy writing on his blog). And since most XHTML out there on the web is served in this HTML compatible way (as text/html) then that too is an important thing for this article to include and explain.

Aside from the adoption issue, I only see IE mentioned a few other times. Those other instances could probably be reworded, since it really applies more generally to serving XHTML as HTML and not to IE per se. Or in the case of the object element in the note on the sample HTML, that is not particularly relevant to this article. --Cplot 22:51, 1 November 2006 (UTC)

One other place IE is mentioned is surrounding the inclusion of an XML declaration when serving to IE as text/html. This too seems like a brief and pertinent fact that should probably remain in the subsection on the XML declaration. --Cplot 23:05, 1 November 2006 (UTC)

backward compatibility example

I like having the example that shows XHTML’s HTML compatible approach. However, I think the two examples are too similar and so it is too difficult to make out the differences. Perhaps using some color or otherwise highlighting the differences would be more useful to readers. And I know we’re trying to get an example that involves the ‘name’ attribute, but adding the part about the IE workaround seems like a distraction from the main thrust of the example. Perhaps moving to a <form> element example with ‘name’ and ‘id’ attributes would be better. Any thoughts? --Cplot 01:54, 2 November 2006 (UTC)

The IE workaround is just for creating a JavaScript and making an empty <object> tag. As I mentioned in the note, it can be replaced by adding <param name="src" value="http://www.w3.org/TR/xhtml1/xhtml1.pdf" /> within <object>. There is a problem that IE does not load the object from the data attribute. Using colors to show the differences seems to be a good idea, but will it make the example difficult to read? -- Franklin Tse 03:47, 02 November 2006 (UTC)

According to the DTD of XHTML 1.0 S, <form> element does not get a name attribute. Details: http://www.w3.org/TR/xhtml1/dtds.html#a_dtd_XHTML-1.0-Strict -- Franklin Tse 04:01, 02 November 2006 (UTC)

If Example 2 changes from XHTML 1.0 S to XHTML 1.1, will it be better? -- Franklin Tse 04:09, 02 November 2006 (UTC)

I understand why the IE workaround is included. And it’s a useful bit of information. However, it seems to draw attention away from what I understood as the point of those examples: to show how one can serve XHTML as text/html.

As I’ve said before, the inclusion of attributes in the DTD is irrelevant when it comes to authoring for HTML compatibility. As an example, the ‘lang’ attribute does not appear in any of the XHTML DTD’s, but we can include it in documents that are authored according to the Appendix C guidelines. The ‘name’ attribute is just like that. Perhaps the W3C could have avoided a lot of confusion if they had created an Appendix C DTD and made it normative.

I suppose Example 2 could be changed to XHTML 1.1: especially since it indicates it should not be served as text/html, which is only true if it’s version 1.1 or higher. --Cplot 14:48, 2 November 2006 (UTC)

Just to clarify: "lang" attribute is defined in the DTDs of XHTML 1.0, but it is removed in XHTML 1.1.

An XHTML document has to follow the elements and attributes defined by the DTD. Otherwise it will be invalid. I hope the examples on the page are valid.

Regarding the workaround for IE, it is actually optional. It can be removed if it draws attention away.

The reason why Example 2 should not be served as text/html is due to the CDATA section and the self-closing <object>. It does not have a META element defining the charset as well. Also, according to the XHTML Media Types (the W3C note), "The use of 'text/html' for XHTML SHOULD be limited for the purpose of rendering on existing HTML user agents, and SHOULD be limited to XHTML1 documents which follow the HTML Compatibility Guidelines." Example 2 does not follow the HTML Compatibility Guidelines. The code is, however, cleaner than that in Example 1. Example 2 will be valid XHTML 1.1 after the DTD is changed.

When I wrote the examples, I wanted to show the differences between following and not following the HTML Compatibility Guidelines, which is why I used the same DTD for both examples. Now, I have no comment on whether the JavaScript should be removed or not and whether Example 2 should be changed from XHTML 1.0 S to XHTML 1.1 or not, either. If anybody has ideas, feel free to discuss here.

-- Franklin Tse 16:03, 02 November 2006 (UTC)

I do like the idea of showing the differences between pure XHTML and XHTML when following the HTML compatibility guidelines: especially since HTML compatible XHTML is the norm these days.

Though I noticed the CDATA section and the ‘meta’ element, I hadn’t noticed the self-closing ‘object’ element. That’s the kind of subtlety I was saying is not clear with those two examples. It takes enormous undue scrutiny to tell what differences the examples are trying to exhibit (without color highlighting or the like). As far as the ‘object’ element goes, it is poor practice to have an empty ‘object’ element since there should be some fallback content described within the ‘object’ element. And for the ‘meta’ element (as well as the XML declaration) it is not required for either MIME Type.

Again as I’ve said before, regardless of the DTD one has to include invalid attributes to follow the Appendix C guidelines. Due to the philosophical approach to XHTML 1.0 (to simply port HTML 4.01 to XML), my guess is those attributes were inadvertently left out of the 1.0 DTD. Like the ‘lang’ attribute, they should have remained until XHTML 1.1. With the separation of well-formedness from validity, invalidity is not such an important issue anymore. I’m not saying to place meaningless elements and attributes into a document, but if someone is including attributes and elements for good reason (like following the appendix C guidelines) then not following the DTD is irrelevant. --Cplot 16:13, 2 November 2006 (UTC)

The HTML Compatibility Guidelines are only guidelines and do not give users permission to violate the DTD. XHTML 1.0 Strict is not designed for backward compatibility. If users need to ensure the highest extent of backward compatibility, they should use XHTML 1.0 Transitional. In HTML 4.01, the specification states that "'name' attribute has been included for backwards compatibility. Applications should use the id attribute to identify elements." In XHTML 1.0 Strict, <form> and <img> (and probably more elements) do not have the "name" attribute that they have in XHTML 1.0 Transitional.

I agree that having an empty <object> element is not a good practice. If there is any better element that can show the self-closing tag as well as the "name" attribute, please suggest.

-- Franklin Tse 16:33, 02 November 2006 (UTC)
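A small sketch of the ‘name’ attribute point, assuming only what is stated in this thread: that <form> and <img> carry no "name" attribute in XHTML 1.0 Strict. The `NO_NAME_IN_STRICT` set, the function name and the sample markup are illustrative, and this is a simple attribute scan with an XML parser, not DTD validation.

```python
import xml.etree.ElementTree as ET

# Elements named in the discussion as lacking "name" in XHTML 1.0 Strict.
# This set is an assumption taken from the thread, not a complete list.
NO_NAME_IN_STRICT = {"form", "img"}


def name_attribute_uses(markup: str) -> list:
    """List local names of listed elements that still carry a 'name' attribute."""
    root = ET.fromstring(markup)
    hits = []
    for element in root.iter():
        # ElementTree writes namespaced tags as "{namespace}local".
        local_name = element.tag.split("}")[-1]
        if local_name in NO_NAME_IN_STRICT and "name" in element.attrib:
            hits.append(local_name)
    return hits


SAMPLE = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><head><title>t</title></head>'
    '<body><form name="f" id="f" action="/go"><p>'
    '<img name="logo" id="logo" src="a.png" alt=""/>'
    "</p></form></body></html>"
)

if __name__ == "__main__":
    print(name_attribute_uses(SAMPLE))
```

A document flagged this way is still well-formed XML; whether the extra attributes matter depends on which side of the validity argument above one takes.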

I have a thought suddenly. In fact, XHTML is a new markup language, so it is definitely impossible and unnecessary to make an XHTML webpage compatible with all old browsers. Instead, backward compatibility of XHTML should be limited to HTML 4.01-compatible browsers. Browsers that can understand HTML 4.01 should be able to understand XHTML 1.0 correctly as well. In HTML 4.01, the "id" attribute has been defined. There is actually no good reason to use the very old "name" attribute to identify an element.

-- Franklin Tse 12:39, 03 November 2006 (UTC)

XHTML_Modularization

A doc about XHTML Modularization has been found, should it relate with the XHTML page? -- Franklin Tse 16:15, 09 November 2006 (UTC)

If you mean it should be merged with this article, then it's possible, yes. Anyone wants to merge it, or maybe leave it as it is?--Saoshyant ^{talk / contribs} (I don't like Wikipedophiles) 16:10, 9 November 2006 (UTC)

I think that they should be merged since future versions of XHTML are modules -- Franklin Tse 04:55, 18 November 2006 (UTC)

Should a new section called XHTML Modularization be written, or just mention XHTML Modularization in the section of XHTML 1.1? -- Franklin Tse 13:45, 03 December 2006 (UTC)

Removed footnote

It said:

Some sentiments against XHTML/HTML compatibility have begun to perculate through web authoring communities. For example, see Sending XHTML as text/html Considered Harmful, XHTML is invalid HTML and Understanding HTML, XML and XHTML. While the arguments behind this XHTML scare are not entirely clear, they seem to be based on the mistaken assumption that the SGML on which HTML (along with XML) is based could never allow a slash ‘/’ within a tag.

The links are duplicated in the previous footnote, and the commentary doesn't seem to be useful (or neutral, and maybe not correct) - I haven't investigated the details myself, but the people in the links are the people who are writing browsers and specifications, so I would be inclined to believe that they are not promoting "mistaken assumption[s]" on technical issues. 131.111.8.98 21:21, 30 November 2006 (UTC)

Differences table very unclear

It is hard to get any meaning from the table of differences between XHTML and HTML. The only difference I can see there is that XHTML has "elements" in its transitional presentation where HTML has "element properties". Encapsulating this tiny amount of information in a table makes it very hard to see the relevance of the table. It would be much easier to express this difference with a sentence instead of a table.

Further, the reader can't be assumed to know what those terms mean or even that they are different things. So it's hard to see that the table has any information at all (other than that HTML has SGML-like syntax and XHTML has XML syntax - but that point has already been explained in the article). I can't fix this because I can't see the point of the table - can someone more knowledgeable look into this? Thanks. Gronky 12:28, 28 February 2007 (UTC)

XHTML use in AJAX

The AJAX programming techniques rely heavily on XHTML, because it allows HTML fragments to be transferred to client browsers through "XML HTTP Connections" (XMLHTTPConnection object in JavaScript). This is a very important element in the so-called "Web 2.0" shift. I think this is worth mentioning. Hugo Dufort 06:34, 1 March 2007 (UTC)

That is not accurate. Perhaps our Ajax (programming) article can enlighten you. ¦ Reisio 19:20, 1 March 2007 (UTC)

I embed XHTML data sections into my XMLHttpRequest's document structure (which is then parsed by Javascript). Everybody does that. So what's "inaccurate" here? Hugo Dufort 00:46, 2 March 2007 (UTC)

Nope, "everybody" doesn't do that. You can send/receive any kind of data using the XMLHttpRequest object. Not just XML. One popular method, for example, is to send JSON-encoded JavaScript. Jerazol 07:01, 2 March 2007 (UTC)
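To illustrate the point that the transport does not care about the payload format, here is a hedged sketch in Python. XMLHttpRequest itself is a browser API, so this only models the decoding step on a response body; `decode_payload` and the sample bodies are hypothetical.

```python
import json
import xml.etree.ElementTree as ET


def decode_payload(content_type: str, body: str):
    """Decode a response body according to its declared media type."""
    if content_type == "application/json":
        return json.loads(body)      # plain data structures
    if content_type in ("application/xml", "application/xhtml+xml"):
        return ET.fromstring(body)   # an XML element tree
    return body                      # anything else stays as text


JSON_BODY = '{"title": "XHTML"}'
XHTML_BODY = '<p xmlns="http://www.w3.org/1999/xhtml">XHTML</p>'

if __name__ == "__main__":
    print(decode_payload("application/json", JSON_BODY)["title"])
    print(decode_payload("application/xhtml+xml", XHTML_BODY).text)
    print(decode_payload("text/plain", "XHTML"))
```

All three responses carry the same information; only the XHTML fragment requires the receiver to run an XML parser.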

IE parser

Will IE ever parse XHTML as XHTML? This nonsense, not parsing it right is annoying. --Stefán Örvarr Sigmundsson 23:26, 26 July 2007 (UTC)

HyperText or Hypertext

While I know that you can find the capitalisation HyperText also on W3C pages I wonder whether it isn't just to point out where the "T" in the acronym comes from (like User:Reisio already pointed out for the "eXtensible")?--Speck-Made 19:59, 8 September 2007 (UTC)

For consistency, if eXtensible is not used, neither should HyperText. Both are inaccurate spellings that deliberately highlight the acronym's origin. As it is now spelled "extensible", hypertext should follow the same rule, not being a name. I'm changing it. --- Arancaytar - avá artanhé (reply) 14:20, 19 March 2008 (UTC)

Rework in places as new HTML5 development means XHTML is no longer "replacement" for HTML

The article has some places where XHTML is described as the replacement for HTML -- but this is no longer true with the move to creating an HTML 5.

Also the fact that XHTML will be skipping from v2.0 (I believe?) to XHTML5 should be explained. —Preceding unsigned comment added by 75.36.151.116 (talk) 08:08, 7 November 2007 (UTC)

I don’t think that’s an accepted fact. For instance, XHTML2 Working Group Home Page does not say anything about skipping. It is just that some people think XHTML is predestined to fail, and thus HTML5 is the only way forward (XHTML5 wouldn’t really matter then, as it is not HTML5‐compatible, anyway). --AVRS 15:23, 7 November 2007 (UTC)

Archived:Tcardone05 (talk) 04:38, 28 July 2008 (UTC)
