ePADD 7.2 now available
The ePADD project team is excited to announce the release of ePADD version 7.2! This latest version of the software includes bug fixes that improve how email attachments are indexed in the Appraisal module.
In fall 2019, the ePADD project was awarded a Weber grant by Stanford University Libraries to undertake several key development activities aimed at keeping ePADD fully functional and anticipating the future needs of our users. One of our primary goals was to research errors encountered when we tried to import Don Knuth’s email collection into ePADD. This email collection is part of the Don Knuth papers, held in the Stanford University Archives, which document the life and work of this influential computer scientist and Stanford professor emeritus.
When we first tried to import Don Knuth’s email into ePADD’s Appraisal module, we were met with an error that seemed to be related to ePADD’s indexing of the files attached to emails. ePADD uses a content analysis and detection technology called Apache Tika to recognize file types so that the files can be indexed, which makes them searchable in ePADD. Instead of smoothly recognizing all of the files attached to emails in the collection, ePADD would appear to get stuck on a single email attachment and never complete the indexing process.
After several attempts, we called on the expertise of our Technical Advisor, Sudheendra Hangal, and Software Developer, Chinmay Narayan, to research the issue. They discovered that the culprit was indeed email attachments, and specifically large .zip files. Apache Tika can recognize many common file types but not all, and Hangal and Narayan determined that it was unable to identify the .zip files in this case. Not only was it failing to identify these files, but because of their large size, each attempt was taking an extremely long time. When ePADD appeared to stall while indexing the email collection, it was actually hard at work cycling through each and every byte of the first .zip file it encountered, trying to identify it. The solution they arrived at was to skip these files altogether during the indexing process. This change allowed ePADD to successfully index the Knuth email collection.
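For readers curious what "skipping .zip files" might look like in practice, here is a minimal sketch in Python. It is not ePADD's actual code and it does not use Apache Tika; the function names `is_zip` and `partition_attachments`, and the idea of passing in the first few bytes of each attachment, are assumptions made purely for illustration. The sketch decides up front which attachments to leave out, so that no time is spent reading a large archive byte by byte.

```python
from pathlib import PurePath

# Standard ZIP signatures: local file header, empty archive, spanned archive.
ZIP_SIGNATURES = (b"PK\x03\x04", b"PK\x05\x06", b"PK\x07\x08")


def is_zip(filename: str, head: bytes) -> bool:
    """Hypothetical helper: decide whether an attachment should be treated as a .zip file.

    The file name is checked first. The signature is consulted only when the
    name has no extension, because many other formats (for example .docx)
    are ZIP containers and share the same leading bytes.
    """
    suffix = PurePath(filename).suffix.lower()
    if suffix == ".zip":
        return True
    return suffix == "" and head[:4] in ZIP_SIGNATURES


def partition_attachments(attachments):
    """Split (filename, first_bytes) pairs into (to_index, skipped) lists of names."""
    to_index, skipped = [], []
    for filename, head in attachments:
        if is_zip(filename, head):
            skipped.append(filename)   # left out of indexing, but still recorded
        else:
            to_index.append(filename)  # passed on to type detection and indexing
    return to_index, skipped
```

Keeping a separate list of skipped names is a design choice in this sketch: it lets an archivist see which attachments were not indexed rather than having them silently disappear from search results.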
In this case, we decided to simply direct ePADD to skip .zip files in the indexing process, but the problem also made it clear that we need a broader and more systematic review of how ePADD handles file attachments. We plan to carry out this review as part of our current Andrew W. Mellon Foundation-funded grant project.
The ePADD project team would like to thank Stanford Libraries and Phil Schreur for their support in funding this project. With their generous support we were able not only to achieve the goals set for this grant period, but also to identify additional priorities that we will be integrating into our current grant project.