HackCraft

Art begins with craft, and there is no art until craft has been mastered. You can’t create until you’re willing to subordinate the creative impulses to the constriction of a form.
Anthony Burgess

Date & Time Formats on the Web

There are several different formats used in different places in the technologies of the World Wide Web to represent dates, times and date/time combinations (hereafter collectively referred to as “datetimes” unless a distinction has to be made). This document presents a survey of the most significant, details which formats are mandated by the key technologies of the web, and offers advice for deciding what formats you should use in your own web applications.

The Issue

The representation of datetimes is at first glance a relatively simple problem, indeed one which we have solved in natural language ever since we first learned how to read a clock. However, it is often a source of confusion and error, because several formats, often equally valid, will rarely work well together.

The issue of datetime representation can be considered as a collection of 5 separate, albeit related, issues:

  1. Internal representations used by a given language, runtime or library.
  2. Human-readable strings intended primarily for parsing by computer, where the format is decided by an external specification, recommendation or standard.
  3. Human-readable strings intended primarily for parsing by computer, where the format is decided as part of the design of the system.
  4. Representations intended primarily for human readers (i.e. end-users).
  5. Input of datetimes from end-users.

In some of these situations (notably the second, often also the first) there is little or no freedom for the developer to decide on the format; it’s merely a matter of learning what the requirements are and implementing them. For the most part this is a blessing, as it is one less thing for the developer to worry about. However, it can be a fiddly programming task to get two such standards to coöperate.

In the others the developer has more scope to implement things as he or she sees fit, but with that comes more opportunity for a poor design decision that harms interoperability or fails to cope with unconsidered edge cases.

This document details the datetime formats that most often are found in the second category, and offers advice for dealing with the 3rd, 4th and 5th. The first category depends on implementation languages and libraries and won’t be dealt with here.

I will focus entirely on web applications — which loosely means any application that uses URIs, HTTP, XML and/or HTML though not necessarily all of those. Some of it may, however, be of use to developers and designers working in other areas.

Calendars, Time Zones and Issues they may Raise

The following section gives some information about dates and times themselves; in particular time zones and differences between calendars in use in different places at different times.

In the West we use the Gregorian calendar. Much of the rest of the world either uses this calendar, or else at least understands it. In some countries it is used for technical purposes even if it is not commonly used for everyday or legal purposes.

For the most part you can assume that applications and/or users will use dates expressed in the Gregorian calendar, occurring after 1582, in the UTC time zone, and without having to deal with leap seconds.

However, in a minority of applications these issues can be crucial; worse still is the minority of applications where these issues could produce unforeseen edge cases. Hence a basic familiarity with them is worth acquiring.

Issues with Historical Dates

The Gregorian calendar was introduced in 1582 and not fully implemented in many countries until the 20th century. Historical dates before that time are often expressed according to the Julian calendar that was in place before the Gregorian.

The proleptic Gregorian calendar does not begin with 1582, but allows for dates before that time by using the date the Gregorian calendar would have had, were it in effect at that point. Usage differs in how dates prior to the year 0001 are expressed. In some uses the year before 0001 is -0001. Such dates are also sometimes labelled with CE (Common Era) or BCE (Before Common Era) or by some Christians as AD (Anno Domini) and BC (Before Christ), as in the Julian calendar. In other uses the year before 0001 is 0000, with obvious mathematical advantages.

This rarely affects code applicable to the web; however, it may be necessary to encode historical data, and in such a case you must be clear about how dates prior to 1582 are interpreted.

Calendar Limits in Applications

It is not uncommon for code to use a very early date as a minimum date restriction in order to switch a filter off. For example, in a system storing reports an application might allow the user to restrict a search so that no reports filed before, say, the 1st of January 2002 were returned. If the user didn’t want to exclude any reports then, rather than using different logic, the code might ensure that no reports filed before the year 1900 were returned, knowing that no records were added to the system anytime even nearly as early.

This is often a very efficient way of dealing with such logic, but care should be taken with the year you choose as being “before everything”. Obviously the year chosen must be earlier than the earliest possible piece of real data; however, some systems have difficulties with dates before a particular time (common values for “year zero” on different systems include 1970, 1900, 1899, 1753, 1582, 0001 and 0000). In a system where a date will be processed by several layers, the highest “year zero” of any layer is generally the year below which you should avoid going.

Similarly there may be an upper limit; relatively common “end of all time” years for various systems include 2038, 9999, 32767 and 65536.
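The rule of taking the highest “year zero” and the lowest “end of all time” across layers can be sketched in JavaScript as follows. The layer names and limits are hypothetical, chosen only for illustration; substitute the documented limits of the systems you actually use.

```javascript
// Hypothetical limits for three layers a date might pass through.
const layers = [
  { name: "database", minYear: 1753, maxYear: 9999 },
  { name: "runtime", minYear: 1970, maxYear: 2038 },
  { name: "serialiser", minYear: 1, maxYear: 9999 },
];

// The usable range is bounded by the highest minimum and the lowest maximum.
function safeYearRange(layerLimits) {
  return {
    minYear: Math.max(...layerLimits.map((layer) => layer.minYear)),
    maxYear: Math.min(...layerLimits.map((layer) => layer.maxYear)),
  };
}

const range = safeYearRange(layers);
console.log(range.minYear, range.maxYear); // 1970 2038
```

A “before everything” date should then be no earlier than the start of `range.minYear`, and an “after everything” date no later than `range.maxYear`.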

Issues with Leap Seconds

The matter of leap seconds rarely impacts upon any code that is not used for specialist scientific purposes. However if validation code assumes that the seconds portion of a date is in the range [0,60) instead of the more accurate [0,61) a very hard to trace bug can occur. Strictly this can only occur at the last second of --03-31Z, --06-30Z, --09-30Z or --12-31Z.
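A minimal sketch of leap-second-safe validation, assuming the time is expressed in UTC and that month and day have already been checked. The function name `isValidUtcTime` is an illustrative choice, not part of any standard.

```javascript
// Days (as "MM-DD", in UTC) on which a leap second may be inserted.
const LEAP_SECOND_DAYS = new Set(["03-31", "06-30", "09-30", "12-31"]);

function isValidUtcTime(month, day, hour, minute, second) {
  if (hour < 0 || hour > 23 || minute < 0 || minute > 59) return false;
  if (second < 0 || second >= 61) return false; // seconds must lie in [0,61)
  if (second < 60) return true;
  // A value in [60,61) is only valid as the final second of a leap-second day.
  const key = String(month).padStart(2, "0") + "-" + String(day).padStart(2, "0");
  return hour === 23 && minute === 59 && LEAP_SECOND_DAYS.has(key);
}

console.log(isValidUtcTime(12, 31, 23, 59, 60)); // true
console.log(isValidUtcTime(7, 15, 12, 0, 60)); // false
```

This only checks that a leap second is possible at the given moment; whether one was actually inserted on a particular date requires a table of announced leap seconds.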

Generally you won’t have to worry about this unless it is very obviously entailed by an application’s purpose. However library code intended for reuse should be leap second safe, since you cannot know that such a library will never be used in an application that requires leap-second information.

Interestingly there are two definitions of the term “GMT” which differ from each other in terms of how leap seconds are dealt with. This is one of the reasons that the term “UTC” should be used instead.

Issues with Time Zones

Different systems will have different assumptions and requirements about the time zone of a datetime. This can be a source of errors.

Many web technologies mandate that the date and time must be expressed in UTC, in particular HTTP.

If possible applications should deal with dates exclusively in UTC to avoid ambiguity and translate to and from local time zones on-the-fly as necessary.

Particular care must be taken when testing code if you live somewhere where the local time zone coïncides with UTC, such as Ireland, the United Kingdom or Portugal during the winter months (from the last Sunday in October until the last Sunday in March); as it will not be possible to detect if local time is being used when UTC should, or vice versa, without careful examination.
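As a sketch of the “UTC internally, local time only at the edges” approach, and of why the testing pitfall arises, consider the following JavaScript. The meeting date is an arbitrary example value.

```javascript
// Build the instant from UTC components and store it as a UTC string.
const meeting = new Date(Date.UTC(2003, 6, 24, 18, 30, 0)); // months are zero-based
const stored = meeting.toISOString();
console.log(stored); // 2003-07-24T18:30:00.000Z

// Translate to the local zone only when presenting the value.
const offsetMinutes = meeting.getTimezoneOffset(); // 0 when local time coincides with UTC
const localAndUtcAgree = meeting.getHours() === meeting.getUTCHours();
console.log(offsetMinutes, localAndUtcAgree);
```

On a machine whose local zone coincides with UTC on that date, `offsetMinutes` is 0 and `localAndUtcAgree` is true, so code that wrongly calls `getHours()` instead of `getUTCHours()` would pass its tests unnoticed.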

Calendar Systems Other than Gregorian

In some parts of the world use is made of calendars other than the Gregorian calendar. This document does not deal with such calendars. In general if dates must be expressed or parsed from systems other than the Gregorian it is advisable to deal with them internally as Gregorian dates and convert to and from other calendars as needed.

Important Datetime Formats

ISO 8601

ISO 8601 is easily the most important standard for the representation of datetimes in existence. It is the international datetime format, and the source document for the national standards of many countries such as ANSI X3.30 in the US and European standard EN 28601 which in turn is the source of the Irish datetime standard IS/EN 28601, the British datetime standard BS EN 28601 and so on.

In jurisdictions where it is the national standard its use is often restricted to “behind the scenes” and representations aimed primarily at end-users will often remain in formats based on national conventions (e.g. the formats dd/mm/yyyy or even d/m/yy are more commonly seen at end-user level in Ireland and Britain, and the formats mm/dd/yyyy or m/d/yy are more common in the US, despite all of those countries having national standards based on ISO 8601).

ISO 8601 is a very large standard that covers a wide range of potential needs, and allows great variation in implementation. There are even some implementations allowed in restricted circumstances that are ambiguous — though obviously these implementations are not intended to be used in interoperable situations such as most web applications.

Because of this degree of flexibility it is easy to write code that will output a string in ISO 8601 format, or read a string in a particular subset of the formats allowed by ISO 8601, but code that will parse a variety of compliant strings is very difficult (impossible if some variations are allowed). Because of this ISO 8601 is often used as the basis of another standard that defines a subset of allowed representations. Of these profiles the one defined by the W3C (described below) is the most important for the web.

Unfortunately many people think of a particular profile they are familiar with when they hear “ISO 8601”; other people know that using 8601 is a Good Thing but are not familiar with the details of implementation. Hence a spec or requirements document might mention 8601 but not be more explicit than that. In such cases it’s important to seek clarification rather than assume that the format you think of as “ISO 8601” is the correct one to use.

This document shall not detail ISO 8601 itself, but rather certain profiles and subsets that are of particular importance to the Web. The standard itself should be consulted for further details.

W3C DTF

The W3C note Date and Time Formats details a profile of ISO 8601 that is heavily used on the web. It is variously referred to as “W3C Datetime Format”, “W3C DTF”, “W3C Profile of ISO 8601”, “Web Datetime Format” or even simply as “Datetime” in some web-centric contexts.

It is the best starting point for most attempts to decide on a datetime format where the details are left in the hands of a developer. While the document itself is a “Note” in W3C terminology (and hence not a normative standard) it is used normatively by other standards, and is a valid implementation of ISO 8601, so it can be used with confidence.

It actually defines a series of formats with varying degrees of precision. Sometimes a specification, recommendation or standard would allow any of these (perhaps recommending to be as precise as possible), and sometimes it may mandate a particular degree of precision.

Given:

Placeholder string    Meaning
YYYY                  four-digit year
MM                    two-digit month (01=January, etc.)
DD                    two-digit day of month (01 through 31)
hh                    two digits of hour (00 through 23)
mm                    two digits of minute (00 through 59)
ss                    two digits of second (00 through 59)
s                     one or more digits representing a decimal fraction of a second
TZD                   time zone designator (“Z” or +hh:mm or -hh:mm)

Then the following formats are allowed:

Year:
YYYY (e.g. 1997)
Year and month:
YYYY-MM (e.g. 1997-07)
Complete date:
YYYY-MM-DD (e.g. 1997-07-16)
Complete date plus hours and minutes:
YYYY-MM-DDThh:mmTZD (e.g. 1997-07-16T19:20+01:00)
Complete date plus hours, minutes and seconds:
YYYY-MM-DDThh:mm:ssTZD (e.g. 1997-07-16T19:20:30+01:00)
Complete date plus hours, minutes, seconds and a decimal fraction of a second:
YYYY-MM-DDThh:mm:ss.sTZD (e.g. 1997-07-16T19:20:30.45+01:00)

This is a great simplification of the range allowed by ISO 8601. It does not cater for many cases that ISO 8601 deals with, nor does it define mechanisms for dealing with leap seconds or years prior to 1582, or even give a way to represent years outside of the range 0000-9999. However, it is more than adequate for the majority of purposes for which web applications use datetimes.

Note that the TZD can be either the capital letter “Z” to indicate UTC time, a string in the format +hh:mm to indicate a local time expressed with a time zone hh hours and mm minutes ahead of UTC or -hh:mm to indicate a local time expressed with a time zone hh hours and mm minutes behind UTC.

1994-11-05T08:15:30-05:00 corresponds to 5th November, 1994, 8:15:30 am, US Eastern Standard Time. 1994-11-05T13:15:30Z corresponds to the same instant.
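The six formats can be checked with a single regular expression. The following JavaScript is a minimal sketch: the name `W3C_DTF` is illustrative, the pattern checks only the lexical shape (it would accept the 31st of February), and it assumes a time zone designator is required whenever a time is present, as in the list above.

```javascript
const W3C_DTF = new RegExp(
  "^\\d{4}" +                                // YYYY
  "(?:-(?:0[1-9]|1[0-2])" +                  // -MM
  "(?:-(?:0[1-9]|[12]\\d|3[01])" +           // -DD
  "(?:T(?:[01]\\d|2[0-3]):[0-5]\\d" +        // Thh:mm
  "(?::[0-5]\\d(?:\\.\\d+)?)?" +             // optional :ss and optional .s
  "(?:Z|[+-](?:[01]\\d|2[0-3]):[0-5]\\d)" +  // TZD, required with a time
  ")?)?)?$"
);

function isW3cDtf(text) {
  return W3C_DTF.test(text);
}

console.log(isW3cDtf("1997-07-16T19:20:30.45+01:00")); // true
console.log(isW3cDtf("1997-07-16T19:20")); // false, no time zone designator

// The two strings from the example above denote the same instant.
const sameInstant =
  Date.parse("1994-11-05T08:15:30-05:00") === Date.parse("1994-11-05T13:15:30Z");
console.log(sameInstant); // true
```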

RFC 822 (As Updated by RFC 1123)

RFC 822 (Standard for the format of ARPA Internet text messages) defined a date time format that is much used on the Internet and has some use on the web (particularly in HTTP). The Augmented BNF of the format is:

    date-time   =  [ day "," ] date time        ; dd mm yy
                                                ;  hh:mm:ss zzz

    day         =  "Mon"  / "Tue" /  "Wed"  / "Thu"
                /  "Fri"  / "Sat" /  "Sun"

    date        =  1*2DIGIT month 2DIGIT        ; day month year
                                                ;  e.g. 20 Jun 82

    month       =  "Jan"  /  "Feb" /  "Mar"  /  "Apr"
                /  "May"  /  "Jun" /  "Jul"  /  "Aug"
                /  "Sep"  /  "Oct" /  "Nov"  /  "Dec"

    time        =  hour zone                    ; ANSI and Military

    hour        =  2DIGIT ":" 2DIGIT [":" 2DIGIT]
                                                ; 00:00:00 - 23:59:59

    zone        =  "UT"  / "GMT"                ; Universal Time
                                                ; North American : UT
                /  "EST" / "EDT"                ;  Eastern:  - 5/ - 4
                /  "CST" / "CDT"                ;  Central:  - 6/ - 5
                /  "MST" / "MDT"                ;  Mountain: - 7/ - 6
                /  "PST" / "PDT"                ;  Pacific:  - 8/ - 7
                /  1ALPHA                       ; Military: Z = UT;
                                                ;  A:-1; (J not used)
                                                ;  M:-12; N:+1; Y:+12
                / ( ("+" / "-") 4DIGIT )        ; Local differential
                                                ;  hours+min. (HHMM)

RFC 1123 changed this to include:

date = 1*2DIGIT month 2*4DIGIT

That is, it changed from a 2-digit year to allow either a 2-digit or 4-digit year, to avoid ambiguity between dates before and after 2000. It further stated that the 4-digit version should be used.

It also deprecated the use of the military codes. If received, such codes should be interpreted as meaning a time difference from UTC of zero, unless other information enables you to determine the intended meaning.
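The zone rules above can be sketched as a lookup that returns the offset from UTC in minutes. This is an illustrative JavaScript helper, not taken from either RFC; the name `zoneOffsetMinutes` is an assumption, and military codes are mapped to zero as advised above.

```javascript
// Named zones from the grammar, as minutes relative to UTC.
const NAMED_ZONES = new Map([
  ["UT", 0], ["GMT", 0],
  ["EST", -300], ["EDT", -240],
  ["CST", -360], ["CDT", -300],
  ["MST", -420], ["MDT", -360],
  ["PST", -480], ["PDT", -420],
]);

function zoneOffsetMinutes(zone) {
  const name = zone.toUpperCase();
  if (NAMED_ZONES.has(name)) return NAMED_ZONES.get(name);

  // Local differential: sign followed by HHMM.
  const numeric = /^([+-])(\d{2})(\d{2})$/.exec(zone);
  if (numeric) {
    const minutes = Number(numeric[2]) * 60 + Number(numeric[3]);
    return numeric[1] === "-" ? -minutes : minutes;
  }

  // Deprecated single-letter military codes (J is not used): treat as UTC.
  if (/^[A-IK-Z]$/.test(name)) return 0;

  return null; // not a zone this grammar allows
}

console.log(zoneOffsetMinutes("EST")); // -300
console.log(zoneOffsetMinutes("+0100")); // 60
```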

XML Schema Datetime Datatypes

Part 2 of the XML Schema recommendation includes quite a rich collection of datatypes for dealing with dates, times and periods of time. Strictly speaking this is more of a case of a technology that uses datetime formats than a source of datetime formats in itself, however for many of them the recommendation is the normative definition of the format. Several XML applications are defined in terms of XML Schemata and/or in terms of a Post-Schema Validation Infoset, and hence inherit the use of some or all of these formats. All of these formats are based on ISO 8601, with many using a subset of what 8601 allows.

It’s worth noting that all formats allowed by W3C DTF can be expressed as a restriction on one of the formats defined by these datatypes, hence it is possible for the use of W3C DTF to be expressed in XML Schemata.

duration

The duration datatype represents a period of time.

It is expressed in the format [-]PnYnMnDTnHnMnS, where [-] indicates an optional minus sign to indicate a negative period of time, nY is the number of years, the first nM is the number of months, nD is the number of days, nH is the number of hours, the second nM is the number of minutes, nS is the number of seconds and all other characters are literals.

The values for the year, month, day, hour, and minute components can be any non-negative integer. The value of the seconds component can be any non-negative integer or decimal number.

Components can be omitted according to the following rules:

  1. If any component has a value of 0 then the number and designator may be omitted.
  2. The seconds part may have a decimal fraction.
  3. The T designator must be present if any time item is present (hours, minutes and seconds) and must be absent if all of the time items are absent.

Examples:

P1Y2M3DT10H30M: 1 year, 2 months, 3 days, 10 hours and 30 minutes.

-P120D: minus 120 days.

P1347Y: 1347 years.

P1347M: 1347 months.

P0Y1347M: 1347 months.

P0Y1347M0D: 1347 months.

-P1347M: minus 1347 months.

P1Y2MT3H: 1 year, 2 months and 3 hours.
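The rules above translate fairly directly into a regular expression. The following JavaScript is a minimal sketch; `parseDuration` and the shape of the object it returns are illustrative choices rather than anything defined by XML Schema.

```javascript
// Groups: 1 sign, 2 years, 3 months, 4 days, 5 hours, 6 minutes, 7 seconds.
const DURATION =
  /^(-)?P(?:(\d+)Y)?(?:(\d+)M)?(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?)?$/;

function parseDuration(text) {
  const m = DURATION.exec(text);
  // Reject non-matches, a bare "P", and a "T" with no time items after it.
  if (!m || text.endsWith("P") || text.endsWith("T")) return null;

  const [, sign, years, months, days, hours, minutes, seconds] = m;
  const num = (value) => (value === undefined ? 0 : Number(value));
  return {
    negative: sign === "-",
    years: num(years),
    months: num(months),
    days: num(days),
    hours: num(hours),
    minutes: num(minutes),
    seconds: num(seconds),
  };
}

console.log(parseDuration("P1Y2M3DT10H30M"));
console.log(parseDuration("P1YT")); // null, because T has no time items
```

Omitted components are reported as 0, which matches the first omission rule above.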

dateTime

The dateTime datatype represents a particular instant in time.

It is expressed in the format [-][Y*]YYYY-MM-DDThh:mm:ss[.s[s*]][TZD].

Years must be at least 4 digits long, with extra digits being allowed only if a value greater than 9999 is expressed. - may be used to indicate a negative year, a positive year being assumed if it is absent.

Rather annoyingly the year 0000 is not allowed, and hence it treats 0001… as indicating the first year in the Common Era (CE sometimes called AD) and -0001… as indicating the preceding year, that is the first year Before Common Era (BCE, sometimes called BC). This is at odds with ISO 8601, which uses 0000 to indicate 1BCE, with obvious mathematical advantages. The relevant working group have expressed an intention to allow 0000 in the next version of XML Schema.
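Converting between the two year-numbering conventions is a one-line adjustment in each direction. The JavaScript below is a sketch assuming the rule just described (no year 0000, with -0001 meaning 1 BCE) on one side and the ISO 8601 convention (0000 meaning 1 BCE) on the other; both function names are illustrative.

```javascript
// Year as written under the "no 0000" rule -> ISO 8601 numbering.
function schemaYearToIso(year) {
  if (!Number.isInteger(year) || year === 0) {
    throw new RangeError("year 0000 is not allowed under this rule");
  }
  return year < 0 ? year + 1 : year;
}

// ISO 8601 numbering -> year as written under the "no 0000" rule.
function isoYearToSchema(year) {
  return year <= 0 ? year - 1 : year;
}

console.log(schemaYearToIso(-1)); // 0
console.log(isoYearToSchema(0)); // -1
```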

The seconds component may have a decimal point followed by one or more decimal digits.

The time zone designator may be omitted or given as described in the description of the W3C DTF. Note that it is not possible to reliably compare a datetime that has a time zone designator with one that lacks it if the two would be less than 14 hours apart were the latter in UTC (current laws mean that time zones vary from +12:00 to -13:00, but these laws could change, so a possible range of +14:00 to -14:00 is assumed).

A canonical representation is obtained by restricting the time zone so that either it is absent, or it is “Z” (hence 1994-11-05T08:15:30-05:00 is represented canonically as 1994-11-05T13:15:30Z).
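A minimal JavaScript sketch of this canonicalisation, built on the language's own `Date` object. It is limited by what `Date` supports (millisecond precision and the lexical forms `Date.parse` accepts), and `canonicalDateTime` is an illustrative name.

```javascript
function canonicalDateTime(text) {
  // A value with no time zone designator is left exactly as it is.
  if (!/(?:Z|[+-]\d{2}:\d{2})$/.test(text)) return text;

  const ms = Date.parse(text);
  if (Number.isNaN(ms)) throw new RangeError("cannot parse: " + text);

  // toISOString() always yields UTC with three fractional digits;
  // strip trailing zeros (and the point itself if nothing remains).
  return new Date(ms).toISOString().replace(/\.?0+Z$/, "Z");
}

console.log(canonicalDateTime("1994-11-05T08:15:30-05:00")); // 1994-11-05T13:15:30Z
```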

time

The time datatype represents an instant of time that recurs each day. It is expressed in the format hh:mm:ss[.s[s*]][TZD] (a left truncation of the representation of datetime).

The time zone designator may be omitted or given as described in the description of the W3C DTF. Note that it is not possible to reliably compare a time that has a time zone designator with one that lacks it if the two would be less than 14 hours apart were the latter in UTC (current laws mean that time zones vary from +12:00 to -13:00, but these laws could change, so a possible range of +14:00 to -14:00 is assumed).

A canonical representation is obtained by restricting the time zone so that either it is absent, or it is “Z” (hence 08:15:30-05:00 is represented canonically as 13:15:30Z).

date

The date datatype represents a calendar date. It is expressed in the format [-][Y*]YYYY-MM-DD[TZD] (a right truncation of the representation of datetime, with the addition of an optional time zone designator).

gYearMonth

The gYearMonth datatype represents a particular Gregorian month in a particular Gregorian year. It is expressed as [-][Y*]YYYY-MM[TZD]. E.g. 2003-09 is the month of September in the year 2003CE.

gYear

The gYear datatype represents a particular Gregorian year. It is expressed as [-][Y*]YYYY[TZD]. E.g. 2003 is the year 2003 CE.

gMonthDay

The gMonthDay datatype represents a date that recurs every year. It is expressed as --MM-DD[TZD] (note there are two hyphens beginning the string). E.g. --10-31 is the 31st of October.

gDay

The gDay datatype represents a date that recurs every Gregorian month. It is expressed as ---DD[TZD] (note there are three hyphens beginning the string). E.g. ---03 is the 3rd day of the month.

gMonth

The gMonth datatype represents a Gregorian month that recurs every Gregorian year. It is expressed as --MM[TZD] (note that there are two hyphens on the left side of the digits). E.g. --06 is the month of June.

At the time of writing the current version of the XML Schema Recommendation gives the format as --MM--. This is a bug and has been corrected in the Errata document.

Other Datetime Formats

ISO 2014, 2015, 2711, 3307 and 4031

ISO 2014, 2015, 2711, 3307 and 4031 defined formats for the encoding of different portions of date and time information. ISO 8601 is largely based on these standards, and now supersedes them.

RFC 850 (as Updated by RFC 1036)

RFC 850 gave a vague description of a datetime format that is used on Usenet. Its primary (probably sole) relevance to the web, and indeed to much computing apart from on Usenet, is as a deprecated HTTP datetime format which HTTP servers and clients must be prepared to accept for backwards compatibility, but must not send. The section on HTTP below should give enough information on the format for implementations to use.

asctime() Format

The ANSI C Standard Library (later the international standard ISO/IEC 9899) includes a function asctime() that uses a fixed format to fill a 26-char buffer with a representation of a date and time. It is still much used by C and C++ programmers as a “quick-and-dirty” way to obtain a datetime representation, but is neither particularly suitable for interoperable use (in which ISO 8601 or a profile thereof should be used) nor for presenting to an end-user.

Its primary relevance to the web is as a deprecated HTTP datetime format that HTTP servers and clients must be prepared to accept for backwards compatibility, but must not send. The section on HTTP below should give enough information on the format for implementations to use.

Datetime Formats Used with Specific Web Technologies

HTTP

HTTP accepts dates in one of three formats, though two of these are for backwards compatibility only. The correct datetime format for use with HTTP is described in RFC 2616 as “RFC 822, updated by RFC 1123”. However unlike RFC 822 and RFC 1123 it states that the day field is two digits in length.

HTTP dates are always given in UTC. This is enforced by insisting that the time zone string used with RFC 822 date times must be “GMT”.

Example: Sun, 06 Nov 1994 08:49:37 GMT

Because the definition is at odds with RFC 822 regarding the number of digits in the day field, it is advisable to always use a two-digit day field, but to be prepared to accept a single-digit field as well. (E.g. you would only ever send Sun, 06 Nov 1994 08:49:37 GMT but you should be prepared to accept Sun, 6 Nov 1994 08:49:37 GMT as well).

Two other formats that you should be prepared to accept, but never transmit, are RFC 850, as updated by RFC 1036, and the format generated by the asctime() function of ANSI C.

RFC 850 and RFC 1036 are rather vague about their date time format. RFC 2616 gives an example of the format as Sunday, 06-Nov-94 08:49:37 GMT. Notably this format is ambiguous with regard to centuries.

asctime() returns a string in the format Sun Nov  6 08:49:37 1994. Note that there are two spaces before a single-digit day field, though some C implementations may use a leading 0 instead.

The augmented BNF for all of these given in RFC 2616 is:

    HTTP-date    = rfc1123-date | rfc850-date | asctime-date
    rfc1123-date = wkday "," SP date1 SP time SP "GMT"
    rfc850-date  = weekday "," SP date2 SP time SP "GMT"
    asctime-date = wkday SP date3 SP time SP 4DIGIT
    date1        = 2DIGIT SP month SP 4DIGIT
                   ; day month year (e.g., 02 Jun 1982)
    date2        = 2DIGIT "-" month "-" 2DIGIT
                   ; day-month-year (e.g., 02-Jun-82)
    date3        = month SP ( 2DIGIT | ( SP 1DIGIT ))
                   ; month day (e.g., Jun  2)
    time         = 2DIGIT ":" 2DIGIT ":" 2DIGIT
                   ; 00:00:00 - 23:59:59
    wkday        = "Mon" | "Tue" | "Wed"
                 | "Thu" | "Fri" | "Sat" | "Sun"
    weekday      = "Monday" | "Tuesday" | "Wednesday"
                 | "Thursday" | "Friday" | "Saturday" | "Sunday"
    month        = "Jan" | "Feb" | "Mar" | "Apr"
                 | "May" | "Jun" | "Jul" | "Aug"
                 | "Sep" | "Oct" | "Nov" | "Dec"

Remember that rfc1123-date is the only version you should actually transmit!
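The “accept three, send one” rule can be sketched in JavaScript as follows. This is an illustration under stated assumptions rather than a complete implementation: the function names are hypothetical, range checks on the numeric fields are omitted, the weekday is not checked against the date, and two-digit RFC 850 years are resolved with a fixed pivot (below 70 means 20xx), which is an assumption made here for simplicity.

```javascript
const MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"];

// The day field accepts one or two digits on input, as advised above.
const RFC1123 = /^(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun), (\d{1,2}) ([A-Z][a-z]{2}) (\d{4}) (\d{2}):(\d{2}):(\d{2}) GMT$/;
const RFC850 = /^(?:Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day, (\d{2})-([A-Z][a-z]{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2}) GMT$/;
const ASCTIME = /^(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun) ([A-Z][a-z]{2}) (?: (\d)|(\d{2})) (\d{2}):(\d{2}):(\d{2}) (\d{4})$/;

// Returns milliseconds since the epoch (UTC), or null if no format matches.
function parseHttpDate(text) {
  let day, mon, year, h, mi, s;
  let m = RFC1123.exec(text);
  if (m) {
    [, day, mon, year, h, mi, s] = m;
  } else if ((m = RFC850.exec(text))) {
    [, day, mon, year, h, mi, s] = m;
    year = Number(year) + (Number(year) < 70 ? 2000 : 1900); // assumed pivot
  } else if ((m = ASCTIME.exec(text))) {
    mon = m[1];
    day = m[2] || m[3];
    h = m[4];
    mi = m[5];
    s = m[6];
    year = m[7];
  } else {
    return null;
  }
  const monthIndex = MONTHS.indexOf(mon);
  if (monthIndex < 0) return null;
  return Date.UTC(Number(year), monthIndex, Number(day), Number(h), Number(mi), Number(s));
}

// Only the rfc1123-date form is ever produced.
function formatHttpDate(ms) {
  return new Date(ms).toUTCString();
}

const parsed = parseHttpDate("Sunday, 06-Nov-94 08:49:37 GMT");
console.log(formatHttpDate(parsed)); // Sun, 06 Nov 1994 08:49:37 GMT
```

The formatter relies on `Date.prototype.toUTCString()`, whose output matches the rfc1123-date form in engines that follow recent editions of the ECMAScript specification; older engines may differ, so check the one you target.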

XML

XML itself does not define, or need, a datetime format. However a great many XML applications do. The majority either mandate the use of W3C DTF, or of one of the formats allowed by W3C DTF, or else use one of the XML Schema datatypes. Some may even go so far as to mandate one of the formats allowed by W3C DTF and then define it as a restriction of an XML Schema datatype.

Familiarity with the W3C DTF and the datatypes allowed by XML Schemata will stand you in good stead for learning any XML application.

HTML/XHTML

HTML has one attribute that takes a datetime value, called datetime, which is an optional attribute of the <ins> and <del> elements. Its value must be the fifth format allowed by the W3C DTF (that is, it gives a full date and time to the nearest second, with a time zone designator). Values for seconds, minutes and seconds or even hours, minutes and seconds can be given as “00” if they are unknown; generally, however, it is preferable to be as precise as possible.

Example:

<p>
Everything is going
<ins datetime="2002-03-21T19:47:35Z">absolutely</ins>
<del datetime="2002-03-14T12:34:58Z">perfectly</del><ins datetime="2002-03-14T12:34:58Z">dreadfully wrong</ins>.
</p>

URIs

Because URIs are interpreted either as opaque strings or as a sequence of opaque strings with an application-defined hierarchical structure there is no need for a datetime format per se. However it is often convenient or essential to encode datetime information in a URI. The two most common cases are where the resource the URI identifies is related to a date (creation date, approval date, etc.) or to provide version control (in particular it is useful with the URIs used as XML Namespace names). ISO 8601 provides a useful basis for such dates:

  • It is familiar to most developers
  • It is unambiguous
  • When sorted as a string of characters or as a string of octets they will be sorted as if they had been sorted as datetimes.

There is sometimes an advantage in using / as a separator between date parts — that is in defining each date part as a separate step in the path hierarchy. In such cases the order year, month, day should still be used and the month and day should be expressed as two digits, with a leading zero if necessary. Generally this makes sense in cases, such as a large archive, where each year and month, perhaps even each day, will have lots of items. In cases where new material is added less frequently it makes more sense to use a full date in a single path step.

Note that the colon character “:” is a reserved character in URIs; it may therefore be necessary to express times as a single number without separators between the parts.
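Both approaches can be sketched in JavaScript. The function names are illustrative, and dates are assumed to be handled in UTC with years in the range 0000 to 9999.

```javascript
const pad2 = (n) => String(n).padStart(2, "0");

// One path step per date part: year/month/day, zero-padded.
function datePath(date) {
  return [
    date.getUTCFullYear(),
    pad2(date.getUTCMonth() + 1),
    pad2(date.getUTCDate()),
  ].join("/");
}

// A single path step with no colons or hyphens (ISO 8601 basic format).
function compactStamp(date) {
  return date.toISOString().replace(/\.\d+Z$/, "Z").replace(/[-:]/g, "");
}

const filed = new Date(Date.UTC(2003, 6, 4, 9, 5, 0));
console.log(datePath(filed)); // 2003/07/04
console.log(compactStamp(filed)); // 20030704T090500Z

// Sorting as plain strings gives chronological order.
console.log(["2003/10/01", "2003/07/04", "2002/12/31"].sort());
```

The leading zeros are what make the string sort agree with the date sort; without them “2003/10/01” would sort before “2003/7/04”.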

Designing Your Own Datetime Formats

It may be necessary to design a datetime format to fulfil some rôle in an application that does not cleanly fit into one of the above. In particular even following the guidelines above the case of XML applications still leaves a lot in the hands of the application designer.

I advise working to the following guidelines:

  1. If one single format allowed by W3C DTF will fulfil the rôle perfectly, use it.
  2. Otherwise, if offering a choice of 2 or more formats allowed by the W3C DTF will fulfil the rôle perfectly, use them.
  3. Otherwise, if one of the XML Schema datetime datatypes, or a restriction on it, will fulfil the rôle perfectly, use it.
  4. Otherwise, if a choice of 2 or more of the XML Schema datetime datatypes, or restrictions thereof, will fulfil the rôle perfectly, use them.
  5. Otherwise, if a format based on ISO 8601 will fulfil the rôle perfectly, use it.
  6. Otherwise your purposes are going to be quite specialised; research formats used by experts in the application domain before designing your own format from scratch.

In the last two cases it may be necessary to consider what effect pre-Gregorian dates and leap seconds might have, and whether the year before 0001 is 0000 or -0001. In general either ban dates before 1582 or use the proleptic Gregorian calendar and allow 0000.

Make sure that all of the design decisions are documented completely. One of the advantages with the first few options listed above is that it is possible to do so with little more than a reference to the documents in question, and perhaps an XML Schema Document.

Presenting Dates and Times to End-Users

With dates that are human-targeted, rather than merely human-readable but targeted primarily at a machine, there is less certainty of “correctness”. UI is one of the most difficult areas of computing to be precise about. Hence while I offer the following advice it may be wise to test how users actually find the implementation. Be advised though that testing with users from just one culture may result in a decision to go with a format that serves them well but which is disastrous in an internationally-accessible application.

Output

Local conventions vary as to how dates and times are encoded. These differences lead to notorious ambiguity, for instance 01/02/1999 would be interpreted by most American readers as representing the 2nd of January but by most Irish readers as representing the 1st of February.

It is difficult to reliably determine the locale of an end user on the web through non-invasive means (it can be done through some ActiveX techniques, amongst other ways, but this introduces security and reliability issues). Some people have implemented techniques based on hints such as the IP of the client or the Accept-Language HTTP header, but these may give false results (while some development environments such as .NET are good at determining user locale, there can still be false results here). As such it is important that date and time information is expressed in an unambiguous manner.

Long dates which use either the full or abbreviated name of the months and four digits for the years will avoid this ambiguity even if the order in which the different parts of the date are combined is not what the end-user would be used to. Educated guesses as to whether one should use “2nd of February” or “February 2nd” will be harmless and you may wish to use them for extra “polish”.

Client-side Javascript can be used to output a date according to a user's locale settings, but a fall-back for users without Javascript enabled would still be required:

<script type="text/javascript">
<!--
// Months are zero-based in JavaScript, so 11 is December.
var dt = new Date();
dt.setFullYear(2003, 11, 8);
dt.setHours(19, 30, 0, 0);
document.write('<p>The meeting will be held at ' + dt.toLocaleString() + '.<\/p>');
//-->
</script>
<noscript><p>The meeting will be held at 7:30 pm on the 8<sup>th</sup> of December.</p></noscript>

Note that with toLocaleString() called as above the degree of precision used cannot be controlled, so time portions or year portions will be printed even when not desirable in the context.
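In engines that implement the ECMAScript Internationalization API, which postdates the technique above, an options argument gives some control over which parts are printed. The following is a sketch only; the locale is fixed to `en-GB` and the time zone to UTC purely to make the example reproducible, whereas a real page would usually omit the locale so that the user's own settings apply.

```javascript
const meeting = new Date(Date.UTC(2003, 11, 8, 19, 30)); // 8 December 2003, 19:30 UTC

// Ask for a long date only: no time portion is printed.
const dateOnly = new Intl.DateTimeFormat("en-GB", {
  day: "numeric",
  month: "long",
  year: "numeric",
  timeZone: "UTC",
}).format(meeting);

console.log(dateOnly);
```

The exact wording and ordering of the output depend on the locale data shipped with the engine, so it should not be parsed or compared as a string.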

(Thanks to David Balazic for arguing with me about whether Javascript can reasonably be used here, resulting in the idea of using it in combination with <noscript>).

Abbreviated month names mean that a developer can rely on an unambiguous English language date of a fixed size e.g. “16 Nov 1973”. However this is not a feature of abbreviated month names in other languages, for example in Irish a month name abbreviation may be 3, 4 or 5 characters long.

If a short format is necessary use an extended format from ISO 8601 (e.g. 2003-07-24). While it may seem quite “foreign” to some users it will not be ambiguous. Using a space instead of the “T” between the date and time portion would be an acceptable concession towards a more “natural” format. In all though, I would strongly avoid presenting short formats to end-users, the only exception being in language-neutral cases (i.e. where there is no text) since long formats require you to be literate in a given language.

Contra this, in documents aimed at a technical audience it may be reasonable to assume a familiarity with ISO 8601, and hence it becomes the ideal.

A difficult judgement to make can be the extent to which different locale-specific mechanisms affect one another — e.g. does the use of American English entail the use of the American mm/dd/yyyy convention? For the most part it’s wise to avoid such assumptions, beyond the fact that the names of months should obviously be in the language in question.

Input

Similarly, input must be constrained so that users cannot enter ambiguous dates. A user must not be presented with a free-form text box for a date, since depending on their culture they might enter either 01/02/1999 or 02/01/1999 for the same date.

The ideal mechanism is to use drop-down lists (from the <select> element) for the days and months, with full or abbreviated month names used to help avoid ambiguity. For some applications years can be selected from a list of possibly valid years, or else a text box could be used.

Javascript can be used to ensure that impossible combinations, such as --04-31 or 2001-02-29, do not occur. As always, however, server-side code must perform its own validation and not rely on client-side validation, to catch both cases where the client is not allowing Javascript to execute, and cases where crackers deliberately input invalid data.
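One way to keep the drop-downs from offering an impossible combination is to work out how many days the chosen month has and trim the day list to match. The sketch below shows just that calculation; daysInMonth and clampDay are hypothetical names, months are numbered 1 to 12 as the user sees them, and wiring the result into the <select> elements is left out.

```javascript
// Hypothetical helper: number of days in the given month (1-12) of the
// given year. Day 0 of the following month is the last day of this one.
function daysInMonth(year, month) {
  const dt = new Date(0);
  // setUTCFullYear takes a zero-based month, so passing the 1-based
  // month here already means "the month after the one we want".
  dt.setUTCFullYear(year, month, 0);
  return dt.getUTCDate();
}

// Hypothetical helper: pull an out-of-range day back to the last valid
// day, e.g. when the user switches the month from March to February.
function clampDay(year, month, day) {
  return Math.min(day, daysInMonth(year, month));
}

console.log(daysInMonth(2001, 2)); // → 28
console.log(daysInMonth(2000, 2)); // → 29
console.log(clampDay(2003, 4, 31)); // → 30
```

When no year has been chosen (as with --04-31), passing a known leap year such as 2000 lets the 29th of February remain selectable.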

More sophisticated calendar-view systems should be avoided. These systems generally rely upon client-side code, ActiveX controls or Java that should be avoided if at all possible. They are generally based upon UI systems that are familiar to users of some operating systems but not others. They tend to be difficult for non-sighted users to use as they contain a large series of dates that doesn’t translate well to Braille or text versions, and some even open a new window to operate. The links that trigger the use of such input mechanisms also tend to be relatively non-intuitive (there’s no reason why they should be, but they always seem to be). If someone (i.e. a client who refuses to allow themselves to be educated) insists on such a mechanism being provided it should supplement a drop-down based system, rather than replace it — on returning from the calendar-view mechanism the drop-downs would be filled with the selected values.

It’s worth noting that XForms implementations (which would include any complete XHTML 2.0 implementation, judging by the current Working Draft) would enable a calendar-style system for inputting dates that was both natural for a given operating system, and configurable to work with a user’s abilities. Hopefully XForms will enable us to have the best of both worlds in this regard. However, widespread implementation is probably still some years away.

Validation of Datetime Information

The following rules can be used to validate dates and times:

  1. The valid values for a year will often be application defined. If this is left open then whether year 0000 is allowed must be decided upon and documented (if in doubt, allow it and consider it to be the year before 0001 and after -0001).
  2. The valid range for months is [01-12].
  3. The valid values for days are determined by the month and year:
    1. A year is a leap year if it is divisible by 4 and is not divisible by 100, or if it is divisible by 400, otherwise it is a common year. If you do not allow the year 0000 then you should add 1 to any negative years before applying this algorithm.
    2. For a common year the maximum day for a given month is found by using the month as an index into an array of the values {0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31}.
    3. For a leap year the maximum day for a given month is found by using the month as an index into an array of the values {0, 31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31}.
    4. The minimum value for any day is 01.
  4. The valid range for hours is [00-23]. (Some ISO 8601 formats allow a time of 24:00:00 to be used - treating it as equivalent to 00:00:00 the next day).
  5. The valid range for minutes is [00-59]
  6. The valid range of seconds depends on whether leap seconds are allowed, and on whether fractions of a second are allowed as follows:
                              Fractions allowed   Fractions not allowed
    Leap seconds allowed      [00-61)             [00-60]
    Leap seconds not allowed  [00-60)             [00-59]
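The rules above might be put into code roughly as follows. This is a sketch rather than a definitive implementation: the function names and the leapSeconds and fractions option flags are my own, year 0000 is assumed to be allowed, any application-defined limits on the year are left out, and the values are assumed to have been parsed into numbers already.

```javascript
// Days per month, indexed by month number 1-12 (index 0 is unused).
const COMMON_YEAR_DAYS = [0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31];
const LEAP_YEAR_DAYS = [0, 31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31];

// Rule 3.1, assuming year 0000 is allowed.
function isLeapYear(year) {
  return year % 4 === 0 && (year % 100 !== 0 || year % 400 === 0);
}

// Rules 2 and 3: month in [01-12], day between 01 and the month's maximum.
function isValidDate(year, month, day) {
  if (![year, month, day].every(Number.isInteger)) return false;
  if (month < 1 || month > 12) return false;
  const days = isLeapYear(year) ? LEAP_YEAR_DAYS : COMMON_YEAR_DAYS;
  return day >= 1 && day <= days[month];
}

// Rules 4 to 6. The option flags are hypothetical and default to the
// strictest case: no leap seconds, no fractions of a second.
function isValidTime(hours, minutes, seconds, options = {}) {
  const { leapSeconds = false, fractions = false } = options;
  if (!Number.isInteger(hours) || hours < 0 || hours > 23) return false;
  if (!Number.isInteger(minutes) || minutes < 0 || minutes > 59) return false;
  if (typeof seconds !== "number" || Number.isNaN(seconds) || seconds < 0) return false;
  // Upper bound is exclusive: 61 with leap seconds, 60 without.
  const limit = leapSeconds ? 61 : 60;
  if (!fractions && !Number.isInteger(seconds)) return false;
  return seconds < limit;
}

console.log(isValidDate(2001, 2, 29)); // → false
console.log(isValidDate(2000, 2, 29)); // → true
console.log(isValidTime(23, 59, 60, { leapSeconds: true })); // → true
```

The 24:00:00 form mentioned in rule 4 is rejected by this sketch; an application that wants to accept it would need to special-case it before calling isValidTime.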

If an application is allowing leap seconds it should strictly only allow them on the last minute of --03-31Z, --06-30Z, --09-30Z or --12-31Z.

If you want to allow leap seconds, so that valid data doesn’t cause a validation error, but don’t want to record information with such precision, you can produce a reasonable approximation by subtracting 60 seconds and adding one minute (hence 23:59:60 becomes 00:00:00 the next day).
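Both points about leap seconds can be sketched in a few lines. The names allowsLeapSecond and foldLeapSecond are hypothetical, all values are taken to be UTC, and whole seconds are assumed.

```javascript
// Month-day pairs (UTC) on which a leap second is permitted.
const LEAP_SECOND_DAYS = ["03-31", "06-30", "09-30", "12-31"];

function pad2(n) {
  return String(n).padStart(2, "0");
}

// True when a seconds value of 60 is permitted in this UTC minute,
// i.e. the last minute of one of the four days listed above.
function allowsLeapSecond(month, day, hours, minutes) {
  return LEAP_SECOND_DAYS.includes(pad2(month) + "-" + pad2(day)) &&
         hours === 23 && minutes === 59;
}

// Subtract 60 seconds and add one minute when a leap second is given,
// letting Date carry the overflow into the hour, day, month and year.
function foldLeapSecond(year, month, day, hours, minutes, seconds) {
  const dt = new Date(0);
  dt.setUTCFullYear(year, month - 1, day);
  if (seconds >= 60) {
    dt.setUTCHours(hours, minutes + 1, seconds - 60, 0);
  } else {
    dt.setUTCHours(hours, minutes, seconds, 0);
  }
  return dt;
}

console.log(foldLeapSecond(1998, 12, 31, 23, 59, 60).toISOString());
// → 1999-01-01T00:00:00.000Z
```

Checking allowsLeapSecond before folding keeps a stray 12:30:60 from being quietly accepted.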

Notes

*
The proleptic Gregorian calendar is sometimes also referred to as the “prolaptic Gregorian calendar”. As far as I can determine this is due to a spelling mistake in a version of ISO 8601 that somehow caught on!
Single-letter codes for those time zones that are a whole number of hours behind or ahead of UTC had been used by the US Military for some time. At the time RFC 822 was written the development of the Internet was still heavily influenced by ARPA, so it is natural that this military terminology would come into use. Unfortunately the examples in the RFC got the order the wrong way around, hence some people were using “A” to mean an hour ahead of UTC and some to mean an hour behind. In the end any use of the single-letter codes was deprecated, to prevent further confusion. It’s worth noting that “Z” remains an unambiguous indicator of UTC whatever way it is interpreted.
This can be expressed in C, C++ or Javascript code as: isLeap = !(year % 4) && (year % 100 || !(year % 400)) (in Javascript this yields a truthy or falsy value rather than a strict boolean), or in C, C++, Javascript, Java and also C# as: isLeap = year % 4 == 0 && (year % 100 != 0 || year % 400 == 0)

References

ISO 8601:2000. Data elements and interchange formats — Information interchange — Representation of dates and times, International Organisation for Standardization, Geneva, 2000.

Date and Time Formats, Misha Wolf, Charles Wicksteed, World Wide Web Consortium, 1997.

RFC 822: Standard for the format of ARPA Internet text messages. D. Crocker, Internet Engineering Task Force, 1982.

RFC 1123: Requirements for Internet Hosts — Application and Support. R. Braden, Ed. Internet Engineering Task Force, 1989.

ISO/IEC 9899:1999. Programming languages — C, International Organisation for Standardization, Geneva, 1999.

(A relatively cheap version of ISO/IEC 9899:1990 is available in the form of The Annotated ANSI C Standard. However the annotations are flawed to the point of being downright harmful. While it’s a cheap way of getting hold of the standard only buy it if you think you can be disciplined in not reading the annotations.)

XML Schema Part 2: Datatypes, Paul V. Biron, Ashok Malhotra, World Wide Web Consortium, 2001.

XML Schema 1.0 Specification Errata, World Wide Web Consortium, 2001.

RFC 2616: Hypertext Transfer Protocol — HTTP/1.1, R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, T. Berners-Lee, Internet Engineering Task Force, 1999.

HTML 4.01 Specification, Dave Raggett, Arnaud Le Hors, Ian Jacobs, World Wide Web Consortium, 1999.

RFC 2396: Uniform Resource Identifiers (URI): Generic Syntax. T. Berners-Lee, R. Fielding, L. Masinter, Internet Engineering Task Force, 1998.

RFC 850: Standard for interchange of USENET messages. M.R. Horton, Internet Engineering Task Force, 1983.

RFC 1036: Standard for interchange of USENET messages. M.R. Horton, R. Adams, Internet Engineering Task Force, 1987.

Extensible Markup Language (XML) 1.0, Tim Bray, Jean Paoli, C. M. Sperberg-McQueen, Eve Maler, World Wide Web Consortium, 1998 & 2000.

XHTML™ 1.0 The Extensible HyperText Markup Language, World Wide Web Consortium, 2000 & 2002.