Alexander Sakharov

Data Languages

This page is reference material on data languages. Most artificial languages, as opposed to natural ones, are programming languages. Grammars are the dominant means of language specification. Whereas grammars are written for virtually all programming languages, it is fairly rare to use a grammar to specify a data format as opposed to a computer program. Normally, data formats are specified by giving interpretations to fixed-size chunks of the data.

Those rare cases in which grammars are used to specify data languages are collected on this page. It is the grammar that makes a data format a data language. Data formats described by means other than grammars are not covered here.
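The difference between the two styles of specification can be illustrated with a minimal Python sketch. Both formats here are hypothetical and invented for illustration only: an 8-byte binary record described by fixed-size fields, and a tiny nested-list notation described by a two-rule grammar. Neither corresponds to any format listed on this page.

```python
import struct

# Style 1: a format specified by interpreting fixed-size chunks.
# Hypothetical 8-byte record: 2-byte id, 2-byte flags, 4-byte length,
# all unsigned and big-endian.
RECORD = struct.Struct(">HHI")

def parse_record(data: bytes) -> dict:
    """Interpret the first 8 bytes of data as one record."""
    rec_id, flags, length = RECORD.unpack(data[:RECORD.size])
    return {"id": rec_id, "flags": flags, "length": length}

# Style 2: a format specified by a grammar (hypothetical):
#   item ::= NUMBER | list
#   list ::= '(' item* ')'
def parse_list(text: str):
    """Recursive-descent parser for the two-rule grammar above."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def item():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            # Keep reading items until the matching ')' is reached.
            while pos < len(tokens) and tokens[pos] != ")":
                items.append(item())
            if pos >= len(tokens):
                raise ValueError("missing ')'")
            pos += 1  # consume ')'
            return items
        if tok == ")":
            raise ValueError("unexpected ')'")
        return int(tok)  # raises ValueError if tok is not a number

    value = item()
    if pos != len(tokens):
        raise ValueError("trailing input")
    return value
```

The record parser needs no notion of nesting, because every field sits at a fixed offset. The list parser mirrors the grammar rule by rule, and it accepts arbitrarily deep nesting that no fixed layout could describe.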

While compiling this list of data languages, I discovered that there are basically two groups of them. The first group consists of XML-related languages. SGML belongs to this group, but it also stands out as a language that existed long before the conception of XML. In the view of XML advocates, XML-related languages will eventually replace all other data languages, and all other data formats for that matter. The second group combines all other data languages. Surprisingly, the second group also turned out to be fairly uniform: it consists exclusively of languages for printing and visualizing documents. One of the biggest groups of data formats, graphical formats, is normally not specified by grammars. There are notable exceptions such as SVG and X3D, though.

XML Family
This is the biggest family of languages. It includes XML itself, HTML, and numerous XML sub-languages. Fortunately, all relevant specifications are available on the W3C site. Direct links to the major language specifications, and thus to their grammars, are collected here. Those members of the XML family whose primary purpose is programming (such as XSLT) are left out since they are not quite 'data' languages. Auxiliary languages such as XPath are not listed here either.

Extensible Markup Language (XML) 1.1

HTML 4.01 Specification

XHTML™ 1.0 The Extensible HyperText Markup Language


Resource Description Framework (RDF) - Syntax Specification

MathML 2.0

Cascading Style Sheets (CSS) 2.1

Scalable Vector Graphics (SVG) 1.1
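Because XML is defined by a grammar, a generic parser can accept or reject a document without knowing anything about its meaning. The sketch below uses `xml.etree.ElementTree` from the Python standard library; the helper name `is_well_formed` is my own, not part of any specification. It checks well-formedness only, not validity against a DTD or schema, and it applies only to XML-based members of the family, not to HTML 4.01 or CSS.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if text parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # The parser rejected the input: it does not match the XML grammar.
        return False

print(is_well_formed("<a><b/></a>"))   # properly nested elements
print(is_well_formed("<a><b></a>"))    # <b> is never closed
```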

SGML is now known mostly as a precursor of XML. Overall, it is more expressive than XML and more complex to implement.

Standard Generalized Markup Language (SGML)
Overview of SGML Resources
These two are the best sources of SGML materials I have found so far. ISO also sells its SGML standard:
Information processing -- Text and office systems -- Standard Generalized Markup Language (SGML)

X3D is an XML-enabled 3D file format that enables real-time communication of 3D data across all applications and network applications. VRML is its predecessor.

X3D and Related Specifications

TeX is a language used in typesetting. In particular, it was designed for typesetting math and other technical materials. BibTeX is a TeX-based bibliography format.

TeX Users Group
AMS TeX Resources
(La)TeX Navigator

PostScript is a printer-independent page description language. It has become a standard for the production of the printed page.

PostScript Language Reference
Encapsulated PostScript

Portable Document Format (PDF) is a language for electronic exchange of documents. It is based on the imaging model of PostScript. It enables documents to be reliably viewed and printed anywhere.

PDF Reference

Rich Text Format (RTF) is intended for text and graphics interchange between different operating environments and operating systems, both on the screen and in print.

Rich Text Format (RTF) Specification, version 1.6
Rich Text Format (RTF) Version 1.5 Specification
