An HTML document is a "pure ASCII text" file. ASCII text files are usually created in a text editor, although it is possible to use your favorite word processor provided you are familiar with the "Save as Type" options and save your file as a text file. A file extension of .txt does not guarantee the file is in fact a "text" file, and a "pure ASCII text" file can have various extensions like .cpp (C++ source code file), .bat (DOS batch file), etc. Basic HTML documents that we will be working with have the extension .html or .htm.
Numerous HTML "Integrated Development Environments" (IDEs) such as Dreamweaver, FrontPage, etc. implement "WYSIWYG" editing similar to word processors. However, all HTML documents for this class will be created in a text editor (or in a word processor, with the file saved as text). All major operating systems ship with at least one text editor: Windows has Notepad and edit; Mac OS has SimpleText; UNIX/Linux has vi and emacs.
Windows Notepad defaults to a file type of "Text Document" and automatically appends a ".txt" extension to most filenames. Enclose the file name in quotes to ensure you always get the exact file name and extension that you intend:

Force Notepad to save file without .txt extension
XHTML (Extensible Hypertext Markup Language) is the latest specification for HTML and will be used throughout this discussion. The following XHTML code is a complete, minimal shell or template for a Web page. The sample document begins with a Document Type Declaration (the first 2 lines), followed by the html start tag, which declares an XML namespace (the third line). The Document Type Declaration (DOCTYPE) tells the browser to display the page in standards mode, as defined by the World Wide Web Consortium (W3C). A missing, incomplete, or incorrect DOCTYPE will cause the browser to revert to quirks mode.
The head element contains various items of information about the document, but only the title element is required. The sample also uses a meta element to declare the character encoding. Character encoding can also be declared in an XML Declaration (recommended by the W3C); however, an XML Declaration (which appears before the Document Type Declaration) causes Internet Explorer 6 on Windows to use quirks mode. Note that, although the title element content is shown on the browser window's Title Bar, nothing in the head element is displayed as part of the page content.
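As a rough sketch of the alternative mentioned above, an XML Declaration that declares UTF-8 encoding would sit on the very first line of the file, ahead of the Document Type Declaration. The sample in this discussion deliberately leaves it out, for the Internet Explorer 6 reason already given.

```html
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
```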
The page content is contained in the body element, which becomes a container for other HTML elements, and is what will be displayed in the browser. The sample uses HTML heading elements of 2 different sizes and paragraph elements to organize the structure of the page.
Tags delimit elements. Starting tags are composed of a left-angle bracket, a keyword that identifies the element, and a right-angle bracket. Ending tags are identical to starting tags with the addition of a forward slash between the left-angle bracket and the element keyword:
<h1>Basic HTML Document</h1>
Start tag: <h1>
Element content: Basic HTML Document
End tag: </h1>
Together, the start tag, the element content, and the end tag make up a level 1 heading element.
Some elements are empty elements; they do not have any content of their own, but direct that some other action be taken or a special character be inserted at that position. The meta element in the sample is one example; other common examples are the line-break element, the image element, and the horizontal rule element. To satisfy the XHTML rule that all elements have ending tags, empty elements merge the ending tag into the starting tag: a single space and a forward slash immediately precede the closing right-angle bracket. See the meta element in the sample.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Basic HTML Sample 1</title>
<!--
sample1.html
-->
</head>
<body>
<h1>Basic HTML Document</h1>
<h2>HTML Elements</h2>
<p>HTML documents are composed of elements that define the structure of a
page. Elements are delimited by starting and ending tags.</p>
<h2>ASCII Text Files</h2>
<p>HTML files are ASCII text files that can be created and edited in a text
editor such as Windows Notepad. If a word processor is used care must be taken
to save the file as text.</p>
</body>
</html>
sample1.html
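The other empty elements mentioned earlier (line break, horizontal rule, and image) follow the same pattern as the meta element in the sample. A short sketch, in which the file name logo.png, its dimensions, and the text are made up for illustration:

```html
<!-- Line break: ends the current line inside a paragraph -->
<p>First line of an address<br />
Second line of the same address</p>

<!-- Horizontal rule: draws a dividing line across the page -->
<hr />

<!-- Image: logo.png is a hypothetical file name -->
<p><img src="logo.png" width="100" height="50" alt="Hypothetical logo image" /></p>
```

Each of these tags ends with a space, a forward slash, and the closing bracket, so no separate ending tag is written.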
XHTML is very similar to the older HTML specification (XHTML is HTML with stricter standards). If you have some experience with HTML, note these differences:
Older HTML used element attributes to control many presentational features such as text formatting, text and background colors, text alignment, etc. HTML was designed to define page structure and content, not appearance. Therefore, modern HTML and XHTML favor the use of Cascading Style Sheets (CSS) to separate structure from presentation. In some cases, however, browser support for CSS has not completely caught up, so occasionally some attributes are needed. Also, some element attributes have no CSS equivalent, such as the meta element attributes in the sample, and those in the following image element:
<img src="notepad_save.png" width="470" height="110"
alt="Enclosing Notepad file name in quotes" />
All XHTML attribute values must be enclosed in quotes (double-quotes by
convention) and all attributes must have a value
(<tag attribute="value">blah blah blah</tag>). In older HTML some
attributes (such as several used by form elements) were written without a
value. In those cases simply set the attribute equal to its own name and
put that value in quotes
(<tag attribute="attribute">...).
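As a sketch of both rules, the fragment below quotes every attribute value and expands the attributes that older HTML wrote as a bare keyword. The form's action URL and the field names are hypothetical.

```html
<!-- Older HTML allowed: <input type=checkbox name=news checked> -->
<!-- subscribe.cgi, news, and topic are hypothetical names -->
<form action="subscribe.cgi" method="post">
<p>
<input type="checkbox" name="news" value="yes" checked="checked" />
<select name="topic">
<option value="html" selected="selected">HTML</option>
<option value="css">CSS</option>
</select>
</p>
</form>
```

Here checked and selected are the attributes that had no value in older HTML, so each is set equal to its own name.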
Comments are part of your HTML document but are not displayed by the browser. The syntax for comments in an HTML document is:
<!-- Single Line: Comments are important! -->
<!--
Multiple Lines:
Comments may span multiple lines.
Text within comments is not displayed by the browser.
Comments may NOT be nested (no comments within comments).
-->
Technically comments are not elements. The character strings
"<!--" and "-->" are not tags. To be precise,
"<!" is the markup declaration open delimiter,
">" is the markup declaration close delimiter, and "--" is the
comment open and comment close delimiter. For this reason you should
not use a string of 2 or more hyphens ("---") in comments. If you
want to achieve some sort of graphical delineation, use strings of equal signs
("==="), or anything else that suits your taste.
Comments are helpful for other authors to study and/or maintain your
code. In all too short a period of time you become the "other
author" when you come back to your code and forget what you did or why
you did it a certain way. Comments are also useful to temporarily hide
or disable a part of your HTML code, perhaps to retain a previous
version of a section of HTML code in case you like the first version
better.
Note: Comments cannot be nested!
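For example, an earlier version of a paragraph can be kept but hidden by wrapping it in a comment. This is only a sketch; the wording of the hidden paragraph is invented for illustration, and the divider lines use equal signs rather than hyphens, as advised above.

```html
<!-- ==== Previous version, kept for comparison ====
<p>HTML pages are built from elements.</p>
==== end of previous version ==== -->
<p>HTML documents are composed of elements that define the structure of a
page.</p>
```

Only the second paragraph is displayed. Because comments cannot be nested, the hidden section must not contain a comment of its own.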
Some common text formatting tasks are emphasizing words or phrases, giving strong emphasis to words or expressions, quoting text, and citing references. For these tasks HTML provides the em, strong, blockquote, and cite logical elements. Most browsers render the em element as italic, the strong element as bold, the blockquote element as normal text with indented left and right margins, and the cite element as italic.
You could, of course, simply use the <i>...</i>
italic element and <b>...</b> bold element for some
of these. However, about 10% of people accessing the Web do so other than
visually. For users of specialized browsers (for example, visually
impaired users have software that reads Web pages aloud) the logical elements
give special meaning and voice inflection to the content, whereas the italic
and bold elements simply change the appearance.
| HTML Source Code | Browser Display |
|---|---|
| <em>emphasis</em> | emphasis |
| <strong>strong emphasis</strong> | strong emphasis |
| <cite>Harry Potter and the Chamber of Secrets</cite> | Harry Potter and the Chamber of Secrets |
| In the essay <cite>Hackers and Painters</cite> from May 2003 author Paul Graham commented: <blockquote> <p>Programming languages should be designed to express algorithms, and only incidentally to tell computers how to execute them.</p> </blockquote> | In the essay Hackers and Painters from May 2003 author Paul Graham commented: Programming languages should be designed to express algorithms, and only incidentally to tell computers how to execute them. (quotation shown as an indented block) |
HTML was originally designed with content as the main priority; it was never intended to give the author the document formatting capabilities of a word processor. You use the paragraph tag to tell the browser "I want the following text to be treated as a paragraph." The browser decides exactly how much space to insert between it and the previous paragraph, and whether or not to indent the first line.
Multiple white space characters (space, tab, new line) are rendered as a single space by the browser. The following 2 lines are displayed exactly the same:
| HTML Source Code | Browser Display |
|---|---|
| <p>The quick brown fox</p> | The quick brown fox |
| <p>The    quick    brown    fox</p> | The quick brown fox |
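Because white space collapses, the source can be wrapped and indented freely without changing the display. A minimal sketch: the first paragraph below displays as "The quick brown fox" on a single line, while the second uses the line-break element to force "brown fox" onto a new line.

```html
<!-- Extra spaces, tabs, and new lines all collapse to single spaces -->
<p>
    The quick
    brown       fox
</p>

<!-- A line break must be requested explicitly with the br element -->
<p>The quick<br />
brown fox</p>
```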
HTML should define the structure of the page and present the content. Cascading Style Sheets are designed to handle the presentation details and keep them separate from the structure. The following Internal Style Sheet, placed in the head element, changes the font to Arial (or to the system's default sans-serif font if Arial is not installed), sets a text color and background color, gives the page left and right margins, and centers the level 1 heading. Note that all this is accomplished without touching the original HTML content!
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Basic HTML Sample 2</title>
<style type="text/css">
body {
color: #339999;
background-color: #FFFFCC;
font-family: Arial, sans-serif;
margin-left: 2em;
margin-right: 2em;
}
h1 {
text-align: center;
}
</style>
</head>
sample2.html
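The listing above shows only the head element. As a sketch of the whole page, assuming the Document Type Declaration and body content are carried over unchanged from sample1.html, the complete file might look like this:

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Basic HTML Sample 2</title>
<style type="text/css">
/* Style rules copied from the head listing above */
body {
color: #339999;
background-color: #FFFFCC;
font-family: Arial, sans-serif;
margin-left: 2em;
margin-right: 2em;
}
h1 {
text-align: center;
}
</style>
</head>
<body>
<!-- Body assumed to be the same as in sample1.html -->
<h1>Basic HTML Document</h1>
<h2>HTML Elements</h2>
<p>HTML documents are composed of elements that define the structure of a
page. Elements are delimited by starting and ending tags.</p>
<h2>ASCII Text Files</h2>
<p>HTML files are ASCII text files that can be created and edited in a text
editor such as Windows Notepad. If a word processor is used care must be taken
to save the file as text.</p>
</body>
</html>
```

Every rule lives inside the style element, so removing that element returns the page to the browser's default appearance without any change to the body.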