field_descriptions.json
The field_descriptions.json
file is metadata about the metadata. This defines how the keys in jsoncatalog.txt
will be interpreted.
Not all data you put in will have the same purpose. Some fields (like policalParty
or subject
) will be categorical elements suitable for choosing from a dropdown menu;
others (like publish_year
or authorBirthYear
) will be time variables you'll want to display on the X axis;
yet others (like authorName
) might be fields you want to allow in searches, but which web users would never see as a full list.
This file is where you give the parser information.
If you don't enter in a field_descriptions.json, Bookworm can will simply guess based on the name and layout of the fields what kind of data is stored in them. There's a really good chance this will fail, so don't rely on it very heavily. But it can be helpful to run it manually before building the bookworm (by running bookworm prep guessAtFieldDescriptions) and then use the output at
field_descriptions.json` as a template.
See the chapter of "Future Plans" below for one option to use field names from a defined ontology (like Dublin Core) to automatically handle data.
Bookworm creation may fail and throw an error if put in a key name that does any of the following things:
([A-Za-z_])+
Bookworm will throw a warning and drop any fields matching this description.
Here is an example of the field_descriptions.json
file from the Open Library Bookworm:
[
{"field":"title","datatype":"etc","type":"text","unique":true},
{"field":"lc0","datatype":"categorical","type":"character","unique":true},
{"field":"lc1","datatype":"categorical","type":"character","unique":true},
{"field":"lc2","datatype":"etc","type":"integer","unique":true},
{"field":"publishers","datatype":"etc","type":"character","unique":false},
{"field":"subjects","datatype":"categorical","type":"character","unique":false},
{"field":"publish_country","datatype":"categorical","type":"character","unique":true},
{"field":"publish_places","datatype":"categorical","type":"character","unique":false},
{"field":"publish","datatype":"time","type":"character","unique":true,"derived":[
{"resolution":"year"}
]},
{"field":"searchstring","datatype":"searchstring","type":"text","unique":true}
]
field_descriptions.json
also supports a syntax for derived fields; a single input field can yield several fields in the ultimate metadata.
Currently this is only enabled for time fields. Here's how it works: given the following snippet.
{"field":"publish","datatype":"time","type":"character","unique":true,"derived":[
{"resolution":"year"},
{"resolution":"month"},
{"resolution":"month","aggregate":"year"}
]}
Your time should be input in an ISO style (eg, "1921-03-05", although leading zeroes are not necessary). For each of the key-values in derived, a different field will be created: so in this example, we'll get publish_year
, publish_month
, and publish_month_year
. The year will be an integer 1921: the month will be an integer rounded to first day of March in 1921; and month_year will take the "month of the year", so that you can do things like look at calendar cycles. (This is particularly useful for visualization).
Aside from year, all values are stored in days since year 0. (There is no year zero, of course, but MySQL and Javascript act as though there is.)
Years before 1 AD will not behave properly.