Technology should reflect the ethos of the library
At the VALA2020 conference on Libraries and Technology last month I stated, as I have in numerous other presentations, reports, and recommendations, that implementations of technology (and I am usually speaking about AI) in libraries should reflect the ethos of the library. I say this not because the ethos of the library is correct, just, or even well-defined, but because it is something to which those of us who work in libraries can be held accountable. Cecily Walker made it clear in her talk at VALA2020, "Be the Goose", that we have a lot of work to do to make libraries the safe and nurturing places for everyone that we claim and want them to be. The notion that technology should reflect our ethos is not prescriptive. It is a design principle intended to align implementations of technology with the spirit and intentions of those who own it.
At the same conference, Adam Moriarty shared the vision driving the digitization and access effort at Auckland War Memorial Museum in New Zealand: "Open as a Rule, Closed by Exception". The motto reflects a reconsideration and realignment of the museum's use of technology to better match its mission: to share its collections. The careful guarding of collections and the monitoring of use were the result of the values built into the technology the museum originally employed, not the values of the museum itself. The realignment effort required the museum to establish a clear rubric for determining which objects can, and which cannot, be shared widely. But the terms are their own, and to those terms they are accountable.
At Stanford Libraries, the SUL AI Studio, an experimental cross-institution effort to surface projects that could benefit from AI, was organized around the vision that implementations of AI in the library be need-driven, not technology-driven. The projects that metadata librarians, subject specialists, engineers, and archivists brought to the studio were not chatbots, robot greeters, or recommendation engines, but applications of technology that library professionals could employ themselves to do their work better. They were projects like automating audio transcription to make recordings in the archives more discoverable and navigable; using improved handwriting recognition to transcribe field notebooks; and using content-based image search to help with deduplication and the development of descriptive metadata.
What makes each of those projects challenging is also what would make them enduring: they are need-driven, not technology-driven. They are challenging because they require close involvement of the people who do the work in all of the decision-making that happens between deciding what to build and deciding how to implement the new technology. It is much easier to take an existing product off the shelf, or a technique from a Google research paper, incorporate it into a system, and declare improved efficiency. That approach skips the design steps that ask the question the Auckland Museum asked: "Does this technology reflect our mission and our values?" Those critical design steps force us to confront our accountability for developing systems that, as Cecily Walker pointed out, do harm, whether intentionally or inadvertently.
Fellow VALA speaker Phillippa Sheail spoke to the ways that technology, particularly surveillance technology, is encroaching on the spaces of libraries, the practices of librarians, and the expectation of privacy. Not only is human judgement being replaced by data analytics, but library expertise is devalued along with it. (The Twitter captures below are from Phillippa's slide presentation. See the full presentation here.)
So why not add a recommendation engine to a library's online catalog? After years of digitizing and digitally transcribing (through OCR) books, journals, reports, and other textual objects in library catalogs, we now have the tools to do very interesting things with them that go far beyond allowing users to search within them. The powerful combination of natural language processing and machine learning makes it possible to identify patterns within books and across collections. Making it easy for researchers to find similar books would seem like an obviously beneficial decision. But, as my friend Paolo Ciuccarelli would say, it's all in the implementation.
What does similarity mean in an academic library? How does our understanding of similarity compare to the assumptions about similarity that are built into a machine learning model? There are measures of lexical and syntactic similarity that can tell us whether similar words and phrases are being used across a corpus. But semantic similarity is much more difficult to define because it depends on context. A first question from a subject specialist might be: "Which context?" or "Whose context?"
Machine learning approaches to "find similar books" are based on representing a textual digital object as a set of features, which are then encoded numerically in a vector space. Similarity is then computed as a measure of distance between those vectors. Who decides which features are important for determining the semantics of an object? What is the contextual relationship to a collection or to a domain? What constitutes a collection? Those decisions are not automated; they require human expertise and, importantly, they are matters of choice. Consider the following three sentences:
record the play
play the record
play the game
Which two are more similar? Some may argue that the last two sentences are more similar because play is used in both as a verb. And yet, the first and last sentences are both likely to relate to a game you are watching or playing.
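The point can be made concrete with a bag-of-words comparison, one of the simplest feature representations. Because it counts words and ignores order and part of speech, it cannot distinguish "record the play" from "play the record" at all. This is a deliberately crude illustrative sketch, not any particular system's implementation:

```python
import math
from collections import Counter

def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words vectors of two sentences."""
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[w] * vb[w] for w in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b)

s1, s2, s3 = "record the play", "play the record", "play the game"

# Identical word counts, so the model scores them as identical
# (similarity 1.0) despite their opposite meanings.
print(bow_cosine(s1, s2))

# These two share only two of three words, so they score lower,
# even though semantically they may be the closer pair.
print(bow_cosine(s1, s3))
```

A model that weighted word order or part of speech would rank the pairs differently, which is exactly the point: the ranking is a consequence of choices about which features matter, not an objective fact about the texts.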
This is not an argument for inaction. On the contrary, I would argue that the greatest peril for libraries is failing to embrace this technology at all. But we have to bend it to our needs, not grab the first shiny object offered to us and implement it just because we can. An experiment based on text similarity measures was, in fact, central to one of the SUL AI Studio projects: Using Topic Modeling to Describe 19th Century Novels. The need driving the project was that the curator responsible for a collection of over 1,600 Edwardian novels did not have adequate descriptive metadata to understand the nature of the collection. Offering the tools to run similarity, clustering, and topic modeling on collections would be extremely beneficial for both classification tasks and curatorial work. But that is an example of supporting the experts in our library rather than circumventing their work with a button in the interface that purports to do that work automatically.
See all of the recorded keynote addresses at VALA2020 here.