TEI (and XML more generally) contains rich data about individual portions of a text.
The current field_descriptions.json format could be supplemented or avoided by having some way to define keys through RDF triples. So instead of "date," one could use http://bookworm.culturomics.org/date, or http://purl.org/dc/terms/date, or whatever the proper syntax for this is.
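One way this could look, as a rough sketch only: keep the short field names but attach a predicate URI to each entry where one is known. The entry layout ("field", "datatype"), the FIELD_URIS table, and the expand_field_keys helper are all assumptions made for illustration, not the actual field_descriptions.json schema or any existing Bookworm code. The URIs shown are Dublin Core terms.

```python
import json

# Hypothetical lookup from short field names to RDF predicate URIs.
# The Dublin Core terms namespace is http://purl.org/dc/terms/.
FIELD_URIS = {
    "date": "http://purl.org/dc/terms/date",
    "author": "http://purl.org/dc/terms/creator",
}


def expand_field_keys(descriptions, uris=FIELD_URIS):
    """Return copies of the entries, adding a "uri" key where a mapping is known."""
    expanded = []
    for entry in descriptions:
        entry = dict(entry)  # copy so the caller's data is not modified
        uri = uris.get(entry.get("field"))
        if uri is not None:
            entry["uri"] = uri
        expanded.append(entry)
    return expanded


# Assumed, simplified stand-in for the contents of field_descriptions.json.
example = json.loads(
    '[{"field": "date", "datatype": "time"},'
    ' {"field": "genre", "datatype": "categorical"}]'
)
expanded = expand_field_keys(example)
print(json.dumps(expanded, indent=2))
```

Fields without a known URI (here "genre") pass through unchanged, so the existing format would keep working while URIs are added gradually.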
Unfortunately,
Integration with Solr makes an enormous amount of sense: Solr handles text queries very well, and is getting increasingly good at things like faceted queries. It would make things like proximity search possible for the first time. Both MySQL and Solr take a very long time to build indexes, but Solr may be better at adding new indexes.
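To make the proximity and faceting points concrete, here is a minimal sketch that only builds a request URL for Solr's select handler; it does not contact a server. The core name "bookworm", the field names "text" and "year", and the proximity_query_url helper are hypothetical. The "phrase"~N proximity syntax and the facet and facet.field parameters are standard Solr query features.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def proximity_query_url(base_url, field, words, slop, facet_field=None):
    """Build a Solr select URL matching `words` within `slop` positions of each other.

    Words are inserted as-is; quoting or escaping of special characters
    is left out of this sketch.
    """
    phrase = " ".join(words)
    params = [("q", f'{field}:"{phrase}"~{slop}'), ("wt", "json")]
    if facet_field:
        # Ask Solr to also return match counts grouped by this field.
        params += [("facet", "true"), ("facet.field", facet_field)]
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical core and field names, for illustration only.
url = proximity_query_url(
    "http://localhost:8983/solr/bookworm", "text", ["whale", "ship"], 5,
    facet_field="year",
)
# Decode the query string again to show the raw Solr query.
print(parse_qs(urlsplit(url).query)["q"][0])
```

Faceting on a metadata field alongside the text query is roughly the kind of grouped count that would otherwise come from a MySQL GROUP BY.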
There are a number of things possible in MySQL that I (Ben) want to make sure we preserve before moving over to a Solr solution altogether, because I think they are enormously important for a variety of research.