The Beats within: comparing AI & human adaptations of “Howl”
This blog post was co-authored by Kathleen Smith.
How have translators grappled with adapting Allen Ginsberg’s iconic poem “Howl”? What happens if we feed its first line to a machine-learning algorithm for text generation, and then give those results to an image-generating algorithm? Kathleen Smith (Curator, Germanic Collections and Medieval Studies) and Quinn Dombrowski (Academic Technology Specialist, CIDR & Division of Literatures, Cultures, and Languages) explored these different ways of adapting “Howl” in their talk “The Beats Within: Comparing AI & Human Adaptations of ‘Howl’” as part of HOWLoween: Celebrating Wolf Awareness Week at Stanford Libraries (October 18-22).
As home to the Allen Ginsberg papers, the Stanford Libraries’ collections include original drafts of “Howl”, which was first published in the 1956 collection Howl and Other Poems. “Howl” is a fascinating work for the study of translation because it is such a complex text. For human translators, the challenges begin with the title and its connotations. For example, is “howl” a noun or a verb? Translators have tended to use an equivalent noun, with a few noteworthy exceptions: the Italian translation Urlo ‘I scream’, the Romanian Urlet de mînie ‘howl of rage’, and Hebrew יללה which refers to a Biblical lament or wail, adding another layer of meaning to the poem. Other translators have sidestepped the issue altogether, retaining the original title untranslated.
The first translator of the poem into Spanish was Fernando Alegria (1918-2005), a Chilean poet in exile, literary critic, Stanford professor, and friend of Allen Ginsberg. He later expressed reservations about his choice to render “hipster” in a literal and physical sense as “buenas caderas”, but he was far from the only translator to struggle with the phrase “angelheaded hipster”. A French translation by Robert Cordier and Jean-Jacques Lebel uses “initiés à tête d’angel”; similarly, the Italian translation by Pierfrancesco la Mura opts for “alternativi dalle teste d'angelo”, while another Italian translation by Fernanda Pivano simply retains “hipster”, as do German and Swedish translations.
This question of what is being translated -- the words into rough equivalents, the cultural context, the underlying concepts, or something larger -- shaped translations beyond Alegria’s. Fernanda Pivano (1917-2002) translated Ginsberg, Hemingway, Kerouac, and Burroughs into Italian. She saw the work of translating the Beats as a form of freedom, and translation itself as anti-fascist resistance. In a 1996 essay, she described a “top 10 list” of Ginsberg’s ideals: “spiritual liberation, sexual revolution, freedom from censorship; demystification of any laws against marijuana; spreading of ecological awareness; opposition to the military-industrial complex; respect for the Earth and native peoples; less consumerism; Eastern thought; universal anti-fascism”. Translators in the USSR saw Ginsberg’s writings as a model for resistance and dissidence, yet one that also appealed to Soviet censors, who evaluated the work on a literal level and saw Ginsberg as an anti-American activist. Translations also reflect the context of the time in which they were created, as can be seen in a 1979 Polish translation of two other works by Ginsberg, which rendered “queer” as “dziwaczne” or “odd,” rather than “gay”.
“Howl” has also undergone numerous “translations” to other media, ranging from a 1974 musical setting by Polish composer Boguslaw Schäffer (using the Polish translation by Leszek Elektorowicz) to a 2006 musical setting by Hyla Lee, to an automatic “convert to MIDI” transformation using the Ableton Live software, to the 2018 film. New translations of “Howl” will continue to be necessary to reflect changing contexts and time periods. But what happens if we give it to computers to adapt, in the spirit of creative making and breaking? What can this new approach tell us about the poem and how we relate to it?
“Howl” is considered one of the most important poems of American literature and stands as an iconic work of the 1950s. Can its famous first line be translated into a different context entirely by using fictional texts from another period? What does the GPT-2 model (a large language model originally developed by OpenAI) mark as the identifying features of the first line of “Howl” and what does that tell us about the GPT-2 model’s knowledge of literary texts?
For this occasion, we retrained the small-size GPT-2 model to generate alternate ways to complete the text “I saw the best minds of my generation…” in the style of different authors or characters, with the result as a text or an image.
GPT-2 was trained on 40 GB of internet text, about 8 million webpages, mostly in English. While that training data provided the model with many racist and sexist word associations, it also provided lots of information about the basics of English morphology and syntax, allowing it to generate text that often resembles coherent modern English, at least on the sentence level. For image generation, we used VQGAN + CLIP (Vector Quantized Generative Adversarial Network + Contrastive Language–Image Pre-training). VQGAN is a widely used model for doing image generation (e.g. creating new images based on text prompts), while CLIP was trained on a vast (and publicly unknown) corpus of both text and image data. In essence, CLIP takes in a text prompt and uses it to “steer” VQGAN towards outputs that better reflect the text prompts, even when they’re complex. We used Google Colab Python notebooks for running both text and image models.
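The “steering” idea can be sketched in a few lines of Python. This is a toy under loud assumptions, not the notebook we ran: `toy_generator` and `toy_image_embedding` are hypothetical stand-ins for VQGAN and CLIP’s image encoder, the “text embedding” is an invented list of numbers, and a simple random hill climb takes the place of the gradient-based optimization that real VQGAN + CLIP notebooks typically use. What it keeps is the shape of the loop: propose a change to the generator’s input, score the output against the text, and keep the change only if the match improves.

```python
import math
import random


def cosine_similarity(a, b):
    """Score how closely two vectors point in the same direction (-1 to 1)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def toy_generator(latent):
    """Hypothetical stand-in for VQGAN: turns a latent vector into an 'image'."""
    return [math.tanh(value) for value in latent]


def toy_image_embedding(image):
    """Hypothetical stand-in for CLIP's image encoder (here, the identity)."""
    return list(image)


def steer(text_embedding, steps, step_size, rng):
    """Nudge the latent so the generated 'image' matches the text better."""
    latent = [rng.uniform(-1.0, 1.0) for _ in text_embedding]
    best = cosine_similarity(
        toy_image_embedding(toy_generator(latent)), text_embedding
    )
    history = [best]
    for _ in range(steps):
        # Propose a small random change to the latent vector.
        candidate = [value + rng.gauss(0.0, step_size) for value in latent]
        score = cosine_similarity(
            toy_image_embedding(toy_generator(candidate)), text_embedding
        )
        # Keep the change only if the text-image match improved.
        if score > best:
            latent, best = candidate, score
        history.append(best)
    return latent, history


# Invented numbers standing in for CLIP's embedding of a text prompt.
TOY_TEXT_EMBEDDING = [0.9, -0.4, 0.1, 0.7]

final_latent, score_history = steer(
    TOY_TEXT_EMBEDDING, steps=500, step_size=0.1, rng=random.Random(0)
)
print(f"match before steering: {score_history[0]:.3f}")
print(f"match after steering:  {score_history[-1]:.3f}")
```

In the real pairing, both stand-ins are large neural networks and the “image” is a grid of pixels, but the division of labor is the same: one model generates, the other judges how well the result fits the words.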
What if, instead, we take Ginsberg in the direction of 1980s and ’90s America and the context of a wealthy high school soap opera with GPT-2 retrained using Sweet Valley High?
With Halloween shortly following this talk, we could not resist trying a famous American mystery series like the “Hardy Boys”:
What if the opening line of “Howl” were part of an English Regency romance? Here’s what GPT-2 retrained using text from Georgette Heyer came up with:
What if Jabba the Hutt were a Beat poet?
Retraining the small GPT-2 model on Star Wars text isn’t enough to overcome what it’s “learned” about English syntax, even when given a prompt that might push it in another direction, such as the idiosyncratic grammar used by the character of Yoda.
Can an algorithm retrained to write like Allen Ginsberg regenerate “Howl” if given the starting prompt? In theory, it’s possible: when generating text with GPT-2, you can turn the “temperature” hyperparameter up or down to control how “creative” it gets. A higher temperature leads to more word combinations that never appear in Allen Ginsberg’s writing; a lower temperature makes the model more likely to get stuck in repeating text loops. There’s always a balance to be struck here: if you’re using GPT-2 to generate something new, you want it to be creative, not just regurgitate things it’s already seen, but if you go too far it stops making even a modicum of sense. You might think that turning the temperature far down would provoke it to simply spit out the actual text from “Howl” (which the model has seen before in the retraining process); the problem, though, is that the model has also learned lots of other directions that the phrase “I saw…” can go -- both from Allen Ginsberg and as part of the base GPT-2 training. Because the output is sampled at random, we never got the exact same output twice when asking GPT-2 to generate text, and certainly none of its attempts at any temperature led to anything that resembled “Howl”.
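To make the temperature knob concrete, here is a minimal sketch of temperature-scaled sampling in plain Python. It is an illustration of the general technique, not the code from our notebook: the scores in `TOY_LOGITS` are invented for the example (GPT-2 computes scores like these over its entire vocabulary), and the function names are our own.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [score / temperature for score in logits.values()]
    peak = max(scaled)  # subtract the largest value for numerical stability
    weights = [math.exp(value - peak) for value in scaled]
    total = sum(weights)
    return {token: weight / total for token, weight in zip(logits, weights)}


def sample_next_token(logits, temperature, rng):
    """Pick one next word at random, weighted by the temperature-scaled odds."""
    probabilities = softmax_with_temperature(logits, temperature)
    tokens = list(probabilities)
    weights = [probabilities[token] for token in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Hypothetical scores for the word that follows "I saw"; invented for illustration.
TOY_LOGITS = {"the": 3.0, "a": 2.0, "her": 1.0, "angelheaded": 0.2}

rng = random.Random(0)
for temperature in (0.1, 1.0, 5.0):
    probabilities = softmax_with_temperature(TOY_LOGITS, temperature)
    sample = [sample_next_token(TOY_LOGITS, temperature, rng) for _ in range(10)]
    print(temperature, {t: round(p, 3) for t, p in probabilities.items()}, sample)
```

At a low temperature nearly all of the probability lands on the single highest-scoring word, which is one way to see why low settings drift into loops; at a high temperature the four options approach even odds, so unlikely words turn up far more often.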
These examples present a completely different view of Ginsberg’s “Howl” and the multiple layers on which it engages readers. While humorous, our computer-generated adaptations also serve an important purpose in reminding us not only of the (often-invisible) amount of human labor involved in translation and interpretation, but also of those qualities which make “Howl” so timeless.
Fundamentally, these computer translation experiments are valuable in forcing us to confront the gap between the high expectations we place on tools and algorithms that claim to be powered by “artificial intelligence” and the reality of their limitations. Large language models (of which GPT-2, particularly the small model, now counts as one of the smallest) are, as Bender, Gebru, et al. put it, “stochastic parrots”, not anything that “knows” anything in a manner comparable to the linguistic and cultural knowledge that goes into the thoughtful human translation of a difficult poem. Yet they are tasked with solving complex human challenges such as providing advice through web-based chat bots. What is really being modeled when VQGAN+CLIP creates an image of “Howl” using the Star Wars corpus of texts? Are large language models a viable tool for responding to research questions in a useful and factual way? What are the ethical implications, potential for biases, flat-out “wrong” answers, and harm? Next time you see a press release that rhapsodizes about “AI”, hold your awe in check and start asking questions.
Special thanks to Evan Muzzall for his thoughtful feedback on the first draft of this post.