To build a bookworm, you need three files:
input.txt: a single file containing your texts (see the section for alternates if that's impractical);
jsoncatalog.txt: metadata about each text;
field_descriptions.json: metadata about your metadata, such as which fields are dates and which are categorical.
The rest of this chapter describes the format for these three input files.
These can be stored in one of two locations:
If you want to clone a bookworm as part of a larger installation, you can leave the files in a root directory. This is frequently cleaner, and is the recommended format. In this case, your file structure will look like this:
your_directory/
 -- jsoncatalog.txt
 -- input.txt
 -- field_descriptions.json
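Before building, it can help to confirm that all three files are in place. The following is a minimal sketch, not part of Bookworm itself: the helper name missing_input_files is hypothetical, and the only thing it assumes is the three file names listed above sitting directly in the root directory.

```python
from pathlib import Path

# File names taken from the list of required inputs above.
REQUIRED_FILES = ("input.txt", "jsoncatalog.txt", "field_descriptions.json")


def missing_input_files(root):
    """Return the required file names that are not present directly under root.

    This is a hypothetical helper for illustration; Bookworm does not ship it.
    """
    root = Path(root)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_input_files("your_directory")
    if missing:
        print("Missing input files:", ", ".join(missing))
    else:
        print("All three input files are present.")
```

An empty list means the directory matches the recommended layout.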
To build the Bookworm, follow the instructions.
The second location is a legacy format. There is no reason to use it. Relative to the root directory (which will be called
BookwormDB if you clone this repo directly), the files are all in a subdirectory called
files. When you've built them all, it will look like this:
BookwormDB/
 -- files/
  | -- texts/
  | | raw <--- (alternate method: contains text files or hierarchical folders of text files)
  | | input.txt <----- (preferred method: a single file with all texts, preceded by their id.)
  | -- metadata/
  | | -- jsoncatalog.txt
  | | -- field_descriptions.json
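If you have an existing installation in this legacy layout and would rather use the root-directory layout, the files can simply be copied up. This is a rough sketch under that assumption: the function name copy_legacy_files_to_root is hypothetical, and the paths are the ones shown in the tree above. It does not touch the raw folder used by the alternate method.

```python
import shutil
from pathlib import Path

# Legacy locations relative to the root directory, as shown in the tree above.
LEGACY_PATHS = {
    "input.txt": Path("files/texts/input.txt"),
    "jsoncatalog.txt": Path("files/metadata/jsoncatalog.txt"),
    "field_descriptions.json": Path("files/metadata/field_descriptions.json"),
}


def copy_legacy_files_to_root(root):
    """Copy legacy-layout files found under root into root itself.

    Hypothetical helper for illustration. Files that already exist at the
    root level are left alone. Returns the names that were copied.
    """
    root = Path(root)
    copied = []
    for name, legacy_path in LEGACY_PATHS.items():
        source = root / legacy_path
        target = root / name
        if source.is_file() and not target.exists():
            shutil.copy2(source, target)
            copied.append(name)
    return copied
```

Calling copy_legacy_files_to_root("BookwormDB") leaves the originals where they are, so nothing is lost if you decide to go back.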
There is also a highly deprecated format, occasionally suggested by
some old documentation and installations, that recommends a folder
texts inside files. Really, really, do not use this format.