Sunday, December 24, 2006

XHTML Explained

by Ross Shannon
Argh! Just when we were all getting comfortable with the HTML 4 stuff they go and change the standards again. Now we have to learn all-new tags and stress even more about browser compatibility... Except not really. This isn't a big shake-up like HTML 4.0 was. See what it's all about below.

Why the Change?


Right, before we get into all this, you should have a good grasp of the ancient
and more recent past of HTML. You can get the full low-down in The History of
HTML, but I'll summarise:


HTML began as a simple way to transfer data between any computers across the
Internet, designed for scientists and researchers with no publishing
experience. Over time the web became mainstream entertainment, and the browser
companies brought in new tags that didn't go along with this original aim.
Presentation became hugely important, and structure and compatibility started
to take a back seat. This meant that some pages were not accessible to people
with the 'wrong' browser or computer setup.


Thankfully, the use of many of the extraneous presentational tags has receded
in recent times, mainly due to the innovation of CSS code. Ideal HTML would be
purely structural, with every element concerning how a page is displayed being
controlled by a stylesheet. The W3C (HTML's overseers, whom you should know
something about by now) have spearheaded this desire with XHTML.


Further to all that, in recent times the Internet has begun to be accessed
through new devices other than the classic computer and web browser
arrangement. Things like PDAs, phones and, er, fridges with Internet access are
going to become common in the near future. There's an estimate going around
that before long 75% of Internet viewing will be carried out on one of these
many new platforms. The custom-made browsers used in these systems need to be
small for cost-effectiveness. For every markup error that a browser has to deal
with, more code has to be added to the program. XHTML is a very, very strict
way of coding, which means the system makers don't have to accommodate bad
markup.


What is XHTML?


Before I describe XHTML, it is probably best to understand where it has come
from. All web markup languages are based on SGML, a horrendously complicated
language that is not designed for humans to write. SGML is what is called a
metalanguage; that is, a language that is used to define other languages. To
make its power available to web developers, SGML was used to create XML
(eXtensible Markup Language), a simplified version, and also a metalanguage.


XML is a powerful format — you create your own tags and
attributes to suit the type of document you're writing. By using a set group of
tags and attributes and following the rules of XML, you've created a new Markup
language.


This is what has been done to create XHTML (eXtensible HyperText Markup
Language), which is why you'll see XHTML being called a subset or application
of XML. The pre-existing HTML 4.01 tags and attributes were used as the
vocabulary of this new markup language, with XML providing the rules of how
they are put together.


So, using XHTML, you are really writing XML code, but restricting yourself to a
predetermined set of elements. This gives you all the benefits of XML (see
below), while avoiding the complications of true XML, bridging the gap for
developers who might not fancy taking on something as tricky as full-on XML. As
you're coding under the guise of XHTML, all of the tags available to you should
be familiar. Writing XHTML requires that you follow the rules of conformant
XML, such as correct syntax and structure. As XHTML looks so much like classic
HTML, it faces no compatibility problems as long as some simple coding
guidelines are followed.


If all of this sounds a bit heavy, don't worry. Transitioning to
XHTML is quite a simple process, with only a few
rules to remember.


Benefits of XHTML


The benefits of adopting XHTML now or migrating your existing site to the new
standards are many. First of all, they ensure excellent forward-compatibility
for your creations. XHTML is the new set of standards that the web will be
built on in the years to come, so future-proofing your work early will save you
much trouble later on. Future browser versions might stop supporting deprecated
elements from old HTML drafts, and so many old basic-HTML sites may start
displaying incorrectly and unpredictably.


Once you have used XHTML for a short time, it is no more difficult to use than
HTML ever was, and in some ways is easier since it is built on a simpler set of
standards. Writing code is a more streamlined experience, as gone are the days
of browser hacks and display tricks. Editing your existing code is also a nicer
experience as it is far cleaner and more self-explanatory. Browsers can also
interpret and display a clean XHTML page quicker than one with errors that the
browser may have to handle.


A well-written XHTML page is more accessible than an old style HTML page, and
is guaranteed to work in any standards-compliant browser (which the latest
round have finally become) due to the insistence on rules and sticking to
accepted W3C specifications. As mentioned above, XHTML allows greater access to
configurations other than a computer and browser. This interoperability is
another aspect of XHTML's greater accessibility.


XHTML Coding


The first thing you need to know about changing over to XHTML as the new
standard is that there really isn't much new to learn. No new tags or
attributes have been added to your repertoire, as happened with HTML 4
(although a few have been deprecated); this is just a move towards good, valid
and efficient coding. XHTML documents stress logical structure and simplicity,
and use CSS for nearly all presentational concerns. It just means you have to
change the way you write code. Even if you always wrote great code before,
there are a few new practices you need to add in.


What's even better, though, is that a page written entirely in XHTML will still
work fine in the current generation of browsers, so you shouldn't have any
problems migrating your site across.


XML Declaration


An XML declaration at the very top of your document defines both the version of
XML you're using and the character encoding. It is recommended but not
required, and a few old browsers will choke on a page that begins this way. For
this reason, I advise against using the correct line:


<?xml version="1.0" encoding="UTF-8"?>


and instead using a meta tag in the head of your document. If you're using
Unicode, use


<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />


And if you're using the more common ISO-8859-1 encoding, use


<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />


XHTML DTDs


Whether you use the XML declaration or not, every XHTML document must be
defined as such by a line of code at the start of the page, and some attributes
in the main <html> tag, which tell the browser what language the text is in.
The opening line is the DOCTYPE (document type declaration), which names the
DTD (Document Type Definition) your page follows. This tells your browser and
validators the nature of your page.


A DTD is a file that defines the names and attributes of all of the possible
tags that you can use in your markup. Browsers generally don't download it,
since newer browsers have the latest specs built in, but validators check your
page against it. The official XHTML Strict DTD is available for you to attempt
to read. Declare it by putting this at the very top of your code:


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">


That DTD is the one you use if you're committed to writing entirely correct
XHTML code. Strict XHTML dispenses with a whole lot of presentational tags and
attributes, and is indeed very strict.

If you choose to use it, you're going to have to become close friends with the
W3C validator. You won't be permitted to use the font tag at all, nor will
attributes like width and height be allowed on your table cells. You won't be
able to use the border attribute on images, and will have to use the alt
attribute on all images if you want to validate. You get the idea: almost all
presentational attributes are restricted in favour of wider CSS utilisation, so
unless you know your stuff in this regard, it'd be best to use XHTML
Transitional below.
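
To give a flavour of what that means in practice, here is a rough sketch of moving presentation out of the markup and into CSS. The filename photo.jpg, the class name intro, and the colours and sizes are all made up for illustration, and in a real page the style block would sit inside the head:

```html
<!-- Transitional-style markup that Strict would reject:
     <font color="red" size="4">Welcome</font>
     <img src="photo.jpg" border="0" alt="A photo" /> -->

<!-- A Strict-friendly equivalent: structure in the markup, looks in CSS -->
<style type="text/css">
  p.intro { color: red; font-size: 1.2em; }
  img     { border: 0; }
</style>

<p class="intro">Welcome</p>
<p><img src="photo.jpg" alt="A photo" /></p>
```

The markup ends up shorter, and a change of colour later on means editing one stylesheet rule rather than every font tag.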


If you're going to hover between HTML and XHTML, use the next DTD, which is a
bit looser, and if you're putting together a frameset page, use the last one.


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">


Most people will opt for XHTML Transitional,
as changing straight to Strict can be a daunting prospect. If you feel you're
able to work within Strict's constraints, by all means go for it.

A correct DTD declaration allows the browser to go into standards mode, which
will render your page correctly, and similarly across browsers. Without a full
DTD declaration your browser enters ‘compatibility’, or ‘quirks’ mode, behaving
like a version 4 browser, including all of its associated quirks and
inconsistencies. Also, these declarations are all case-sensitive, so don't
change them in any way.


Finally, you need to define the XML Namespace your document uses. Don't stress
about this: it's simply a definition of which set of tags you're going to be
using, and concerns the modular properties of XHTML. It's set by adding an
attribute into the <html> tag. While we're at it, we specify the language of
our pages too. Modify your tags to this:


<html xmlns="http://www.w3.org/1999/xhtml"
xml:lang="en" lang="en">



</html>
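
Putting the pieces from this section together, a minimal page might look something like the sketch below. The title and paragraph text are placeholders, and I've assumed UTF-8 and the Strict DTD; swap in whichever encoding and DTD you chose above.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
  <!-- Character encoding declared with a meta tag rather than an XML declaration -->
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
  <title>Placeholder title</title>
</head>
<body>
  <p>Placeholder content.</p>
</body>
</html>
```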


XHTML Coding Practices


And now the moment you've been waiting for: the different styles of coding used
by an XHTML author compared to the old HTML methods. You shouldn't have many
problems adopting these new techniques, so long as you work carefully. It
should be noted, even if it is an obvious point, that you really must hand code
to be able to write valid XHTML. No current visual editor comes close to the
compliance required.


sourcetip: Even though your code is
changing, your filenames won't have to — you end your files with .html as
always.


1. Tags and attributes have to be lowercase

Whereas before it used to come down to preference whether you used <B>
or <b>, now all of your tags and attributes have to be in
lowercase. This is because XML is case-sensitive — i.e. a tag in capitals is a
different tag to one in small letters.
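
As a quick before-and-after sketch (the link target index.html is just a placeholder), the old style is shown in a comment and the XHTML version underneath:

```html
<!-- Old habit: upper or mixed case, fine in HTML 4 but not in XHTML -->
<!-- <A HREF="index.html" TITLE="Home"><B>Home</B></A> -->

<!-- XHTML: element and attribute names in lowercase -->
<a href="index.html" title="Home"><b>Home</b></a>
```

Only the tag and attribute names are affected here; values such as filenames keep whatever case they need.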


2. All tags must be closed

Now all of those once-optional </p> and </li> tags are essential for your XHTML
documents to validate. Even empty elements like img, hr, and br must now be
closed. You can use a standard forward-slashed end tag, or just add in a
forward slash to the end of the tag.


<br /> or <br></br>


It's recommended that you use the former method here, and leave a space
before the slash so older browsers aren't confused. They'll just ignore the
trailing slash as an unrecognised attribute.


3. Documents must be well-formed

'Well-formedness' is a dream that you were meant to try and make real from the
start, but many coders write badly-syntaxed code. You have to open and close
tags correctly in XHTML, and nest them properly.


Bad: <p>My coding is <b>bad</p></b>

Good: <p>But my coding is <b>good</b></p>


Remember the simple rule you should have been taught at the very start: The
first tag you open is the last tag you close.


4. Attribute values must be quoted

Back in HTML you could leave out the quotes on a number value, like HEIGHT=3,
but now all values have to have quotation marks around them, so that would
become height="3".
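
For instance, a small table might be converted along these lines. This is only a sketch, and the attribute values are arbitrary:

```html
<!-- HTML 4 allowed: <table border=1 cellpadding=5> -->
<table border="1" cellpadding="5">
  <tr>
    <td colspan="2">Every value is quoted, numbers included</td>
  </tr>
</table>
```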


5. Attribute Minimisation

Some HTML tags had one-word attributes, like HR's NOSHADE.
You can't use these anymore, and must add the attribute in as its own value,
like so:


<hr noshade="noshade" />


Any browser compatible with HTML 4 shouldn't have a problem with markup like
this.
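
The same pattern applies to the other one-word attributes you're likely to meet in forms. In this sketch the field names (subscribe, colour, locked) are invented for the example:

```html
<!-- HTML 4 shorthand: <input type="checkbox" checked> -->
<input type="checkbox" name="subscribe" checked="checked" />

<!-- HTML 4 shorthand: <option selected> -->
<select name="colour">
  <option value="blue" selected="selected">Blue</option>
  <option value="red">Red</option>
</select>

<!-- HTML 4 shorthand: <input type="text" disabled> -->
<input type="text" name="locked" disabled="disabled" />
```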


6. Internal Links

Internal links in HTML were made using a combination of the <a> tag and the
name attribute. In XHTML, to go along with XML, you use the id attribute to
make these links instead of the name attribute. For a while you should probably
include both so that your links still work on older browsers, but this will be
the method used in future. The name attribute has been deprecated.


<a href="#section">link</a>

<a id="section" name="section"></a>


Since all tags can take the id attribute, you can now make links
to any element on your page. Most helpful if you add the link to a heading or
specific paragraph.


7. Alternative text in images

While it has always been good practice to add the alt="..." attribute to your
images, now you must add some alternative text to every image on your page. If
your image is purely decorative you can give it an empty alt attribute:


<img src="header.gif" alt="" />


You could also try adding the title="..." attribute to as many elements as
possible. It's a good accessibility aid, especially on links.


8. Ampersands in URLs

Ampersand characters are frequently used in page addresses to carry variables,
like in PHP. When coding these addresses into your XHTML, you must escape them
using the entity value &amp;. They'll be displayed as ampersand characters (&)
on screen, of course.


<a href="reviews.php?page=27&style=blue">link</a>

becomes

<a href="reviews.php?page=27&amp;style=blue">link</a>


9. Content must be wrapped in a block-level element

In XHTML Strict, when you add text to your page, you can’t add it directly into
the body element. All text needs to be within a suitable containing block-level
element, such as a p, a ul or a div.
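
A small sketch of the difference, with made-up text and a placeholder link target:

```html
<!-- Not valid in Strict: text sitting directly in the body -->
<!-- <body>Welcome to my site. <a href="news.html">News</a></body> -->

<!-- Valid: the same content inside a block-level element -->
<body>
  <p>Welcome to my site. <a href="news.html">News</a></p>
</body>
```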




As you should always have done before, be sure to validate your document to
certify that there are no errors. There is absolutely no point in writing XHTML
if you don't make sure it is free of mistakes. The online W3C validator will
check your code for mistakes and give you a full report back. Once you can
understand its occasionally unhelpful error messages, it is an excellent
utility. Make use of it.
