Welcome back to Nooks & Crannies! After a month off for my wedding, I've been digging around for some interesting bits for upcoming columns. This month, I'll take a look at some open source code libraries that developers can use to handle MARC-formatted records.
A little background for the MARC novice
MARC stands for MAchine-Readable Cataloging. It's a format first developed in the 1960s for the U.S. Library of Congress in order to facilitate the exchange of bibliographic records among libraries. By the mid-1970s, it was an international standard, used around the world.
There are several variants of the MARC format. MARC21 came out of a merger in the 1990s of USMARC and CANMARC, the US and Canadian variants then in use, and other countries have their own formats. In much of Europe, UNIMARC is the variant most often seen. All of these records are structured the same way: a set of tags that hold the information, plus a directory that tells you which tags are in the record and where they are located.
Each tag, in each format, means something specific. For instance, in the MARC21 bibliographic format, the 245 tag holds information about the title of the work. Additional information, including the publisher, author, size of the physical book, publication date, and subjects, is contained in other tags.
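For the curious, here is a rough Python sketch of how the tags-plus-directory layout hangs together. It is an illustration, not production code and not how any particular library does it. The byte layout it assumes (a 24-byte leader, 12-byte directory entries, and the 0x1E, 0x1D, and 0x1F control characters) is the standard ISO 2709 structure that MARC21 uses. The function names and the sample record are made up for this example.

```python
FIELD_END = b"\x1e"   # ends the directory and every field
RECORD_END = b"\x1d"  # ends the whole record
SUBFIELD = b"\x1f"    # introduces a subfield code inside a field


def build_record(fields):
    """Assemble a minimal record from (tag, data) pairs (hypothetical helper)."""
    directory = b""
    body = b""
    for tag, data in fields:
        data += FIELD_END
        # Directory entry: 3-char tag, 4-digit length, 5-digit start offset.
        directory += f"{tag}{len(data):04d}{len(body):05d}".encode("ascii")
        body += data
    directory += FIELD_END
    base = 24 + len(directory)        # where the field data begins
    total = base + len(body) + 1      # +1 for the record terminator
    leader = f"{total:05d}nam a22{base:05d} a 4500".encode("ascii")
    return leader + directory + body + RECORD_END


def read_directory(record):
    """Return (record_length, base_address, [(tag, length, start), ...])."""
    length = int(record[0:5])
    base = int(record[12:17])
    entries = []
    pos = 24                          # the directory follows the leader
    while record[pos:pos + 1] != FIELD_END:
        tag = record[pos:pos + 3].decode("ascii")
        field_len = int(record[pos + 3:pos + 7])
        start = int(record[pos + 7:pos + 12])
        entries.append((tag, field_len, start))
        pos += 12
    return length, base, entries


def get_field(record, wanted):
    """Return the raw bytes of the first field with the given tag, or None."""
    _, base, entries = read_directory(record)
    for tag, field_len, start in entries:
        if tag == wanted:
            # Slice out the field and drop its trailing terminator.
            return record[base + start:base + start + field_len - 1]
    return None


# Made-up sample data: a control number and a title field (tag 245).
record = build_record([
    ("001", b"demo0001"),
    ("245", b"10" + SUBFIELD + b"aA sample title"),
])
print(read_directory(record))
print(get_field(record, "245"))
```

The point of the directory is that a program can jump straight to the 245 field without scanning the whole record, which mattered a great deal when records arrived one after another on tape.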
The format of the record, if you were to just print it out, is kind of hard to read. It was originally designed for serial interchange, via 9-track tape, and that medium was still in use in the early days of my career, in the 1990s. The first five bytes of the record are digits and tell you how long the record is, in bytes—including those five bytes. The clever modern nerd