EAD 2002: Finding Aid Delivery using Native XML Technologies

From SGML to XML

Changes to the SGML-encoded finding aids were driven both by the move to XML and by the use of XML validation tools, which surfaced additional problems.

Close empty elements for XML, e.g.:

<lb/> (not <lb>)
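A minimal sketch of this fix with sed, assuming `<lb>` is written without attributes or internal whitespace. The sample file names are hypothetical, and other empty elements, especially ones that carry attributes, would need their own patterns.

```shell
# Hypothetical sample mixing SGML-style <lb> with an already-closed <lb/>.
printf 'line one<lb>line two<lb/>line three\n' > sample-lb.sgm

# Rewrite <lb> as the XML empty-element tag <lb/>; the pattern does not
# match <lb/>, so tags that are already closed are left alone.
sed -e 's|<lb>|<lb/>|g' sample-lb.sgm > sample-lb.xml
cat sample-lb.xml
```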

Convert Microsoft code page 1252 (Windows-1252) characters used for typographic niceties,
such as smart quotes, the Œ/œ ligature (not in ISO 8859-1), etc.,
either into numeric character references (e.g., &#8220; for a left double quote) with sed, e.g.,

    sed -f [file of patterns] wpaepcke.sgm > wpaepcke.xml

or into UTF-8 directly with GNU recode, e.g.,

    recode 1252/..UTF-8 < wpaepcke.sgm > wpaepcke.xml
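The sed route depends on a pattern file whose contents are not given. Below is a minimal sketch of one, assuming GNU sed (the `\xHH` escapes are a GNU extension) and a hypothetical file name `cp1252.sed`. It covers only a few characters in the 0x80-0x9F range; any remaining non-ASCII bytes, such as Latin-1 accented letters, would still need patterns of their own or a matching encoding declaration.

```shell
# cp1252.sed: hypothetical pattern file mapping some Windows-1252 bytes in the
# 0x80-0x9F range to numeric character references. Requires GNU sed (\xHH).
# "\&" in the replacement produces a literal ampersand.
cat > cp1252.sed <<'EOF'
s/\x85/\&#8230;/g
s/\x8c/\&#338;/g
s/\x91/\&#8216;/g
s/\x92/\&#8217;/g
s/\x93/\&#8220;/g
s/\x94/\&#8221;/g
s/\x96/\&#8211;/g
s/\x97/\&#8212;/g
s/\x9c/\&#339;/g
EOF

# Tiny hypothetical input: curly quotes (0x93/0x94), an em dash (0x97), and
# the oe ligature (0x9C), written as octal escapes for portable printf.
printf 'He said \223hi\224 \227 c\234ur\n' > sample.sgm

# LC_ALL=C makes sed treat the input as raw bytes rather than UTF-8.
LC_ALL=C sed -f cp1252.sed sample.sgm > sample.xml
cat sample.xml
```

Because every replacement is plain ASCII, later patterns never match text produced by earlier ones, so the order of the lines in the pattern file does not matter.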