Changes to the SGML-encoded finding aids were driven both by the move to XML and by XML validation tools, which exposed problems that had previously gone unnoticed.
Close empty elements for XML, e.g.:
<lb/> (not <lb>)
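For an attribute-less empty element like <lb>, this change can be made mechanically. Below is a minimal sketch, assuming the tag is always written in lowercase and never carries attributes. SGML element names are typically case-insensitive, so a real conversion may also need to catch <LB>. The function name close_lb is hypothetical.

```shell
# Hypothetical helper: rewrite SGML-style <lb> tags as XML empty-element tags.
# Assumes lowercase <lb> with no attributes; tags already written as <lb/>
# do not match the pattern and pass through unchanged.
close_lb() {
  sed 's|<lb>|<lb/>|g'
}

printf 'Stanford<lb>California\n' | close_lb
# prints: Stanford<lb/>California
```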
Convert Microsoft code page 1252 (Windows-1252) characters used for typographic niceties,
such as “smart quotes,” the Œ/œ ligature (not in ISO 8859-1), etc.,
either into Unicode numeric character references with sed, e.g.,
sed -f [file of patterns] wpaepcke.sgm > wpaepcke.xml
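The pattern file itself is not given here. The following is a minimal sketch of what one might contain, covering only the characters named above. It assumes GNU sed, whose \xHH byte escape is a GNU extension, and uses a hypothetical file name, cp1252-ncr.sed. Each rule maps a Windows-1252 byte to a reference to its Unicode code point. The & is escaped because a bare & in a sed replacement stands for the matched text.

```shell
# Write a hypothetical pattern file mapping Windows-1252 bytes to
# numeric character references (single/double smart quotes, OE/oe ligature).
cat > cp1252-ncr.sed <<'EOF'
s/\x91/\&#x2018;/g
s/\x92/\&#x2019;/g
s/\x93/\&#x201C;/g
s/\x94/\&#x201D;/g
s/\x8C/\&#x0152;/g
s/\x9C/\&#x0153;/g
EOF

# LC_ALL=C makes sed treat the input as raw bytes rather than locale characters.
# Octal 223 and 224 are bytes 0x93 and 0x94, the CP1252 curly double quotes.
printf '\223Great Ideas\224\n' | LC_ALL=C sed -f cp1252-ncr.sed
# prints: &#x201C;Great Ideas&#x201D;
```

Because the references are plain ASCII, the output file no longer depends on how those bytes would be decoded.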
or into UTF-8 directly with GNU recode, e.g.,
cat wpaepcke.sgm | recode 1252/..UTF-8 > wpaepcke.xml
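Where GNU recode is not installed, iconv (from glibc or libiconv) can perform the same direct conversion. It is a swapped-in tool, not the one used above. The sketch below shows the byte-level effect on a single character, with od dumping the resulting bytes.

```shell
# Convert CP1252 byte 0x93 (octal 223, a left double quote) to UTF-8
# and dump the resulting bytes in hex.
printf '\223' | iconv -f WINDOWS-1252 -t UTF-8 | od -An -tx1
# prints the bytes e2 80 9c, the UTF-8 encoding of U+201C

# Whole-file form, using the file names from the example above:
# iconv -f WINDOWS-1252 -t UTF-8 wpaepcke.sgm > wpaepcke.xml
```

Unlike the numeric-reference route, this leaves the characters themselves in the file, so the XML must be read as UTF-8. UTF-8 is the default when no encoding declaration is present.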