Locating and Resolving Unauthorized Terms Used in MARC Records
Note
This workflow works best if the tsv file has first been cleaned up using the galatea clean-tsv command.
This workflow is used to locate and resolve unauthorized terms in MARC record data found in a tsv file. The workflow consists of the following steps:
- Check for unauthorized terms in the tsv file.
Use the authorized-terms check command to find unauthorized terms in the tsv file.
galatea authorized-terms check <source_tsv>
- Create a new transformation file with the unauthorized terms.
If you don’t already have a transformation file with the unauthorized terms, use the authorized-terms new-transformation-file command to create a new one.
galatea authorized-terms new-transformation-file
Fill in the transformation file with how you want specific unauthorized terms to resolve.
Note
You don’t have to start from scratch every time. If you have a transformation file from earlier work that already has all of its transformations defined, you can reuse it. In fact, you should keep improving a single file over time.
The transformation file is the tsv file generated by the authorized-terms new-transformation-file command, and it contains two columns. By default, this command names the file “authorized_terms_transformation.tsv”, but you can name it whatever you want.
Open this file in whatever table editor you prefer, such as Microsoft Excel. As long as the application can open and save tsv files, it can be used for this step.
The first column holds the unauthorized terms, and the second column holds the matching authorized term that should replace each one.
For example:
If you want the unauthorized term “Chicago, Ill.” to change to “Chicago (Ill.)”, and the unauthorized term “Washington, D.C.” to change to “Washington (D.C.)”, the tsv file should look like this:
unauthorized term     resolving authorized term
Chicago, Ill.         Chicago (Ill.)
Washington, D.C.      Washington (D.C.)
Warning
The first line of the tsv file is a header line, and should not be changed.
After the header line, you can add as many lines as you want, but the first column should always be an unauthorized term, and the second column should be an authorized term.
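The layout rules above (one header line, then two tab-separated columns per row) can be checked with a short script before the file is used. The following is only a minimal sketch and is not part of galatea: the function name load_transformations is hypothetical, and the header text is assumed to match the example table shown earlier.

```python
import csv
import io

# Assumed header names, copied from the example table in this document.
HEADER = ["unauthorized term", "resolving authorized term"]


def load_transformations(tsv_text: str) -> dict[str, str]:
    """Parse two-column TSV text into {unauthorized term: authorized term}."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    if header != HEADER:
        raise ValueError(f"unexpected header line: {header!r}")
    mapping = {}
    for line_number, row in enumerate(reader, start=2):
        if not any(cell.strip() for cell in row):
            continue  # skip blank lines
        if len(row) != 2:
            raise ValueError(
                f"line {line_number}: expected 2 columns, got {len(row)}"
            )
        unauthorized, authorized = (cell.strip() for cell in row)
        mapping[unauthorized] = authorized
    return mapping


sample = (
    "unauthorized term\tresolving authorized term\n"
    "Chicago, Ill.\tChicago (Ill.)\n"
    "Washington, D.C.\tWashington (D.C.)\n"
)
transformations = load_transformations(sample)
print(transformations)
```

A check like this catches a changed header or a row with a missing column before the file is passed to any other step.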
- Resolve the unauthorized terms to authorized terms in the tsv file.
Using the transformation file you created in the previous step, use the authorized-terms resolve command to resolve the unauthorized terms in the tsv file.
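Conceptually, resolving means replacing each unauthorized term in the source tsv with its authorized counterpart from the transformation file. The sketch below illustrates that idea only. It is not galatea's implementation, and it assumes exact whole-cell matching, which may differ from how the resolve command actually matches terms. The function name resolve_terms and the column names title and place are hypothetical.

```python
import csv
import io


def resolve_terms(source_tsv: str, transformations: dict[str, str]) -> str:
    """Return a copy of the TSV text with exactly matching cells replaced."""
    reader = csv.reader(io.StringIO(source_tsv), delimiter="\t")
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in reader:
        # Cells without an entry in the mapping are written back unchanged.
        writer.writerow([transformations.get(cell, cell) for cell in row])
    return out.getvalue()


# Mapping of unauthorized term -> authorized term, as in the example table.
transformations = {
    "Chicago, Ill.": "Chicago (Ill.)",
    "Washington, D.C.": "Washington (D.C.)",
}

# Hypothetical source data; real MARC-derived tsv files have other columns.
source = (
    "title\tplace\n"
    "Annual report\tChicago, Ill.\n"
    "City map\tWashington (D.C.)\n"
)
resolved = resolve_terms(source, transformations)
print(resolved)
```

In this sketch the first data row changes to “Chicago (Ill.)”, while the second row is left alone because its term is already authorized.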