A Newer LSJ

Last week we published a Markdown version of the LSJ lexicon. On Monday, Professor Helma Dik of the various digital Classics initiatives at the University of Chicago sent me XML editions of the LSJ that her team has been lovingly curating for the past decade.

This version fixes errors in many lemmata and in the structure of entries, and improves the Unicode presentation of the Greek.

As of today (Tuesday, November 6, 2018), the CITE Architecture LSJ Browsing application defaults to this new collection of LSJ entries.

Versions of Collections

The CITE Architecture is based on machine-actionable canonical citation of objects that scholars study. Canonical citations can’t change; a CITE2 URN must always identify the same thing.

For this lexicon project, we identified a notional collection of “LSJ Entries”, which we can cite with this URN:

urn:cite2:hmt:lsj:

To see actual data, however, we need a “version-level” URN. Before November 6, 2018, the LSJ application was querying a service for objects in the versioned collection identified by: urn:cite2:hmt:lsj.markdown:.

With the creation of a new version of a collection of LSJ entries, based on the Chicago data, we are querying for objects in the collection urn:cite2:hmt:lsj.chicago_md:.

The structure of the CITE2 URN preserves the “scholarly identity” of these two collections. They are both collections of LSJ entries, and two URNs with the same object-identifier should point to the “same” (notional) entity in both versions, albeit with different data:

urn:cite2:hmt:lsj.markdown:n29873

ἐαροτρεφής
ἐᾰρο-τρεφής, ές, `A` **flourishing in spring**, λειμῶνες Mosch. 2.67.

urn:cite2:hmt:lsj.chicago_md:n29873

ἐᾰρο-τρεφής
**ἐαροτρεφής**, ές, `A` **flourishing in spring**, λειμῶνες Mosch. 2.67.
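
The relationship between the two URNs above can be sketched in a few lines. This is a minimal illustration using plain Python string handling, not the xcite library. The `Cite2Urn` class and the `parse_cite2` and `same_notional_object` helpers are hypothetical names, and the parser assumes only the simple namespace, collection.version, object shape used in the examples in this post.

```python
from typing import NamedTuple, Optional


class Cite2Urn(NamedTuple):
    """Hypothetical, simplified model of a CITE2 URN (not the xcite API)."""
    namespace: str
    collection: str
    version: Optional[str]
    object_id: Optional[str]


def parse_cite2(urn: str) -> Cite2Urn:
    """Split a URN such as urn:cite2:hmt:lsj.markdown:n29873 into its parts."""
    parts = urn.split(":")
    if len(parts) != 5 or parts[0] != "urn" or parts[1] != "cite2":
        raise ValueError(f"not a CITE2 URN: {urn!r}")
    namespace, collection_part, object_part = parts[2], parts[3], parts[4]
    # Assumption: the collection component is "collection" or
    # "collection.version"; anything after the first dot is the version.
    collection, _, version = collection_part.partition(".")
    return Cite2Urn(namespace, collection, version or None, object_part or None)


def same_notional_object(a: str, b: str) -> bool:
    """True when two URNs agree on everything except the version."""
    ua, ub = parse_cite2(a), parse_cite2(b)
    return ua._replace(version=None) == ub._replace(version=None)
```

With these helpers, `same_notional_object("urn:cite2:hmt:lsj.markdown:n29873", "urn:cite2:hmt:lsj.chicago_md:n29873")` is true: the version differs, but the namespace, collection, and object identifier are the same.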

Canonically Citable Data in a Changing World

On November 4, 2018, Dr. Monica Berti announced that she had incorporated the LSJ browser into the Digital Athenaeus. Clearly, any URNs that Dr. Berti has included in her data identify a different version of the LSJ collection from the one currently offered by the app. I did not want to break Dr. Berti’s work (especially since I was super flattered that she liked this web application). I did not want to make the LSJ Browser support multiple versions of the LSJ (I have other things I need to do in life). I definitely wanted to default to the Chicago version. As tempting as it was simply to replace the LSJ collection with the new data while keeping the markdown-versioned URNs, that wouldn’t be correct: a CITE2 URN must always identify the same thing. Here’s what I’m doing:

  • Both versions of the LSJ data are in the GitHub repository as CEX files. The Chicago data has been added to lexica.cex, a single CEX file that includes the LSJ and the Lewis & Short Latin dictionary.
  • The Scala CITE Services service will include both versions of the LSJ data, so anyone can query them both, by URN.
  • The LSJ Browser application will exert its privileges as an application (to do whatever it feels like) and accept any version of the notional urn:cite2:hmt:lsj: collection, but will deliver data only from whichever version is configured as the currently supported one (for now, chicago_md). So if you query urn:cite2:hmt:lsj.markdown:n29873 you will see results from urn:cite2:hmt:lsj.chicago_md:n29873 (and you will see from the URN that you got a different version).
  • This is super easy thanks to the xcite Scala library, which offers many well-tested methods for manipulating CITE2 and CTS URNs.
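
The redirect described in the list above might look something like the following. It is a sketch in plain Python under stated assumptions, not the LSJ Browser's actual code (which is written in Scala with xcite). `NOTIONAL_COLLECTION`, `CURRENT_VERSION`, and `resolve_to_current` are hypothetical names, and the currently supported version is assumed to come from configuration.

```python
NOTIONAL_COLLECTION = "urn:cite2:hmt:lsj:"  # the notional "LSJ Entries" collection
CURRENT_VERSION = "chicago_md"              # assumption: read from app configuration


def resolve_to_current(urn: str, current: str = CURRENT_VERSION) -> str:
    """Rewrite any version of an LSJ entry URN to the currently supported version."""
    parts = urn.split(":")
    if len(parts) != 5 or parts[:2] != ["urn", "cite2"]:
        raise ValueError(f"not a CITE2 URN: {urn!r}")
    namespace, collection_part, object_id = parts[2:]
    # Drop whatever version (if any) the caller supplied.
    collection = collection_part.split(".")[0]
    if f"urn:cite2:{namespace}:{collection}:" != NOTIONAL_COLLECTION:
        raise ValueError(f"not an LSJ entry URN: {urn!r}")
    # Re-attach the configured version; the object identifier is untouched.
    return f"urn:cite2:{namespace}:{collection}.{current}:{object_id}"


if __name__ == "__main__":
    requested = "urn:cite2:hmt:lsj.markdown:n29873"
    print(requested, "->", resolve_to_current(requested))
```

Because the function returns the rewritten URN rather than hiding it, a caller can show the reader which version actually supplied the data.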