
Enhanced Networked Monographs

Topic Map

Topic map editorial policies

May 18, 2018 by Alex


Icon of a computer from the Noun Project created by Arthur Shlain
Icon of an eye with eyelashes by Adrien Coquet

The workflow for managing the topic map is a mix of machine processing and human curation. The Topic Curation Toolkit (TCT) handles tasks such as merging and linking topics, and the human editor reviews the results and does quality control. Core editorial activities happen in the TCT. Most of the fields can be directly edited, and it’s also possible to add new data, “re-run relations,” and mark whether a topic is reviewed (simply checked over) or edited (in other words, changed in some way). The TCT provides lists of topics that can be sorted (alphabetically, by number of occurrences, or by number of relations) and filtered by review status (unreviewed or reviewed).

When I began my work as Digital Production Editor, I created a workflow for proceeding through the topics in alphabetical order. Between January and June of 2017, I was able to check about 15,800 topics, or a little over one third of the topic map. 12,647 of these topics were marked as reviewed, while 3,151 were edited. While working through part of the topic map, I discovered issues and observed patterns that helped me develop editorial values, principles, and practices to guide my work going forward.

The topic map values identified were

  • Automation over human curation
  • Serendipity over rigidity
  • Transparency and mitigation of bias
  • Autonomy of the user (potential conflict with Serendipity over rigidity)
  • Accuracy of content (potential conflict with Automation over human curation)

The principles (more specific than values) included

  • Pay attention to the impact of editorial intervention on the end-user
  • Minimize involvement of the editor
    • Keep editorial intervention to the minimum possible, so that the workflow can scale up in the future. This implies a level of trust in the automated processes producing topics and relations.
  • Privilege serendipity through loose semantics
    • The dataset resulting from this process should facilitate the serendipitous discovery of concepts by users. Therefore, broad or loose understanding, especially in the relations between topics, is valued over restrictive and narrow interpretation of a topic’s meaning and relations. This is especially true for broader or more general topics.
    • In the current iteration of the Topic Curation Toolkit software, and because of the nature of EPUB index markup today, some subentries may have been parsed incorrectly, and “see” or “see also” cross-references may have linked to subentries instead of main entries. Because not all of these are easy to find, this kind of messiness or looseness is acceptable in the context of this project.
  • Minimize misleading or irrelevant relations between topics
    • Do this to the extent possible while respecting the second principle (Minimize involvement of the editor).
  • Work to get through all topics
    • Strive to address all topics efficiently, but acknowledge that the Digital Production Editor may not be able to review all of them.
  • Acknowledge non-neutrality
    • A workflow is a series of decisions and actions. A critical analysis of a workflow acknowledges the points at which bias and judgement enter the process and recognizes the impact of the editor.
    • Relatedly, a critical workflow recognizes that the software and automated processes are designed to make choices that are not always the ones humans would make (e.g., connecting topics based on matching strings of words, which may have distinct meanings).
  • Proceed as if we are going to continue adding terms and indexes to the topic map dataset.
  • Conduct more involved editing in special cases

Finally, the values and principles are informed by, and in turn inform, specific day-to-day practices, or policies. The practices provide guidelines for removing relations, adding relations, splitting topics, merging topics, deleting topics, name scopes, alternate names and changing names, main/sub entry curation, problem records, and URIs. For example, one policy under the category of splitting topics has to do with splitting proper nouns and general concepts:

SPLITTING TOPICS

Policy
Split proper nouns and general concepts into separate topics and move relations to the appropriate topic.

Key examples
Nation, The
Semantic ambiguity can be a problem for the user when they look at occurrences. For example, there was a topic record with two names listed: Nation and Nation, The. The occurrences referred to either the concept of a nation or to the specific publication called The Nation. In other words, one of the names in this topic was a proper noun. A user trying to read about The Nation would have to sift through irrelevant occurrences to find what they needed.

The example here is the topic nation. It turned out that some occurrences referred to the concept of a nation, while others referred to the publication The Nation. This happened because the TCT ignores stopwords like “the” and “a” when matching names. Taking into account the values of autonomy and accuracy, we thought that a user trying to read about The Nation would become frustrated with having to sift through occurrences about the general concept, so I created a policy to split those kinds of homographs into separate topics.
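To make the Nation example concrete, here is a minimal Python sketch of how ignoring stopwords can merge two headings, and how an editor's judgement can split them again. It is an illustration only, not the TCT's actual code: the function names (match_key, group_by_key, split_topic), the stopword list, and the "about" field are all assumptions made up for this example.

```python
import re

# Assumed stopword list; the post only mentions "the" and "a".
STOPWORDS = {"the", "a"}

def match_key(name: str) -> str:
    """Lowercase a heading, drop punctuation and stopwords, return a matching key."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(w for w in words if w not in STOPWORDS)

def group_by_key(names):
    """Group headings whose matching keys are identical (a stand-in for merging)."""
    groups = {}
    for name in names:
        groups.setdefault(match_key(name), []).append(name)
    return groups

def split_topic(occurrences, is_proper_noun):
    """Split one topic's occurrences into (proper noun, general concept) lists.

    is_proper_noun stands in for the editor's judgement about each occurrence.
    """
    proper = [o for o in occurrences if is_proper_noun(o)]
    general = [o for o in occurrences if not is_proper_noun(o)]
    return proper, general

headings = ["Nation", "Nation, The", "Nationalism"]
merged = group_by_key(headings)
# "Nation" and "Nation, The" share the key "nation", so they end up in one topic.

# Hypothetical occurrences, tagged by hand with what each one is about.
occurrences = [
    {"id": 1, "about": "publication"},
    {"id": 2, "about": "concept"},
]
the_nation, nation = split_topic(occurrences, lambda o: o["about"] == "publication")
```

The split step depends on a human decision about each occurrence, which is why the policy counts as one of the special cases that justify more involved editing.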

Even though I was the only editor of the ENM topic map, creating a policy document was important for both maintaining consistency in my own work and ensuring that I kept a record of why certain decisions were made, which (I hope) provides some transparency to those who will eventually use the topic map. Read the full policies document for more details (link forthcoming).

Filed Under: Topic Map

What is the ENM topic map?

May 18, 2018 by Alex

One of the main deliverables for ENM is a topic map of names and concepts derived from index entries. ENM’s topic map is a meta-index made by combining many individual back-of-book indexes into one dataset.

Background on Topic Maps

Originally developed in the 1990s to address the need for dynamic, aggregated indexes for UNIX manuals, Topic Maps is an ISO standard (ISO 13250) for representing data about concepts [1]. For the ENM project, we are not using the XML syntax for topic maps (XTM), but rather are inspired by the underlying data model. A topic map can be thought of as a dataset that sits on top of a source (such as a book) or group of sources. As an interlinked graph, the topic map facilitates navigation between parts of the source.

Diagram of higher-level topic map structure, showing topics, relations, and occurrences
Image by Michel Biezunski (Infoloom), available at https://www.infoloom.com/media/presentations/mb-2018-04-28.pdf

A topic map consists of topics and relations. These relations can be associations between topics or links between topics and sources (called occurrences). Topics are representations of abstract concepts; they consist of names and can have types.

Detail of an index from Postmodern Legal Movements (1995, New York University Press) showing a qualifier of the index entry Cleavers that reads TV family.

In book index terms, a topic is a heading, an association is a “see” or “see also” cross-reference, and occurrences are the page number locators listed for each heading. Unlike an index heading, however, topic map elements such as names can be scoped in order to disambiguate them and to designate the conditions under which a given statement is valid. We are using scope as a qualifier, which occasionally appears as a parenthetical in index entries.
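The mapping from index terms to topic map elements can be sketched as a small data model. The Python below is a rough illustration under stated assumptions, not the TCT's database schema or the ISO 13250 data model: the class and field names are invented, the "Television" topic is a hypothetical cross-reference target, and the page locator is a placeholder.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Name:
    text: str
    scope: Optional[str] = None  # qualifier, e.g. a parenthetical in an index entry

@dataclass
class Occurrence:
    book: str
    page: str  # page locator listed under the index heading

@dataclass
class Topic:
    names: List[Name]
    occurrences: List[Occurrence] = field(default_factory=list)
    associations: List["Topic"] = field(default_factory=list)  # "see" / "see also" targets

    def label(self) -> str:
        """Render the first name, with its scope in parentheses if it has one."""
        first = self.names[0]
        return f"{first.text} ({first.scope})" if first.scope else first.text

# Hypothetical related topic, invented for this example.
television = Topic(names=[Name("Television")])

cleavers = Topic(
    names=[Name("Cleavers", scope="TV family")],
    occurrences=[Occurrence(book="Postmodern Legal Movements", page="placeholder")],
    associations=[television],
)
```

Here cleavers.label() gives "Cleavers (TV family)", while a topic whose name has no scope is rendered as the bare name.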

ENM topic map

The ENM topic map was generated using a custom-built piece of software called the Topic Curation Toolkit (TCT). Developed by Infoloom, the TCT is a series of back-end scripts and a front-end editing interface. The TCT parses EPUB files and populates a relational database by extracting, merging, and linking index entries and page text.

Screenshot of the topic page for Amazon in the Topic Curation Toolkit, showing topic record elements.

Above is a screenshot of the record in the Topic Curation Toolkit for the topic Amazon. In the center there are names for this topic, including one that is scoped; on the right there are relations to other topics; below there are occurrences, or pages in books; and on the very bottom there are links that connect to controlled vocabularies.

Screenshot of the occurrence view for a page in Making News at the New York Times (2014, The University of Michigan Press), showing page text and topics indexed on that page.

Above is a screenshot of the occurrence or book page view in the Topic Curation Toolkit. The text of the page is in the middle, and on the right is a “reverse-index” view of all of the topics indexed on this page.

ENM topic map stats

The ENM topic map was built from 89 books. The TCT needs EPUBs that have both an index and page numbers, and not all EPUBs have both, so of the 113 books we considered, 89 were usable for this aspect of our project. Published between 1987 and 2016 (with the largest number published in 1998), the books in the topic map are open access titles from disciplines including literary analysis, philosophy, law, media studies, race studies, and gender studies. These 89 EPUBs generated 45,000+ topics! Only 2,652 topics appeared in two or more books, so one lesson learned is that it might make more sense to generate a topic map for a specific discipline, since those books might be more likely to share terminology.
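For a sense of how a figure like "topics that appear in two or more books" can be computed, here is a small Python sketch. It assumes the data is available as (book, topic) pairs, which is a simplification and not how the TCT stores it. The function name and the sample pairs are invented for illustration: the book titles are made up, and the topic names are borrowed from these posts only as labels.

```python
from collections import defaultdict

def topics_in_multiple_books(entries, min_books=2):
    """Return {topic: set of books} for topics indexed in at least min_books books."""
    books_by_topic = defaultdict(set)
    for book, topic in entries:
        books_by_topic[topic].add(book)
    return {t: books for t, books in books_by_topic.items() if len(books) >= min_books}

# Invented sample data, not taken from the ENM dataset.
entries = [
    ("Book A", "Amazon"),
    ("Book B", "Amazon"),
    ("Book A", "Nation"),
    ("Book B", "Cleavers"),
    ("Book B", "Amazon"),  # repeated entries within one book count once
]
shared = topics_in_multiple_books(entries)

# Figures from the post: 2,652 shared topics out of 45,000+ topics.
# Because 45,000 is a lower bound on the total, this ratio is an upper bound.
max_share = 2652 / 45000
```

With the numbers reported above, the share of topics found in more than one book comes out to just under 6% at most.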

By the end of the project, we plan to release a more detailed Topic Curation Report, so stay tuned for more about the content and creation of the ENM topic map. To view the code, released under the open-source Apache 2.0 license, head over to our GitHub repositories for the frontend, backend, and Vagrant box components.

References

  1. Newcomb, S. R. (2003). A Perspective on the Quest for Global Knowledge Interchange. In J. Park & S. Hunting (Eds.), XML topic maps: creating and using topic maps for the Web. Boston: Addison-Wesley.

Filed Under: Topic Map

Industry Insights: Alex’s interview with the ACLS Humanities E-book team

May 18, 2018 by Alex

Back in November, I had the chance to sit down with colleagues from ACLS Humanities E-Book. HEB is a collection of about 5,000 scholarly ebooks from humanities fields. As part of their Industry Insights blog series, I chatted with HEB’s Marketing & Production Manager Chris Plattsmier about the ENM topic map and some of the research we did into user expectations. View the video or read the transcript of the interview at http://www.humanitiesebook.org/industry-insights-humanities-e-book/.

Filed Under: Topic Map, User Research
