I: Reproducing inequality: How gender bias becomes normalised through prompting practices in text-to-image generators | September 2024 –*
Research project team:
Craig Johnson, Lecturer, School of Media and Communication, RMIT University
Nataliia Laba, Assistant Professor, Dept of Communication and Information Studies, University of Groningen
*To appear in: Race/Gender/Class/Media: Considering Diversity Across Audiences, Content, and Producers (6th edition, Routledge), edited by Rebecca Ann Lind, University of Illinois at Chicago.
Context
Artificial intelligence (AI) platforms capable of generating original images from textual prompts have gained significant popularity. AI-generated images have become a staple of visual culture, particularly online, and are increasingly used across various fields of algorithmic content creation. However, the rise of generative models has also sparked concerns about their potential to perpetuate bias and discrimination. Text-to-image models, such as Stable Diffusion and Midjourney, have been found to encode substantial biases and stereotypes, which can lead to increased hostility, discrimination, and even violence toward certain communities (Bianchi et al., 2023; Thomas & Thomson, 2023). The datasets used to train these models are not merely raw materials but political interventions that mirror societal norms and biases (Crawford & Paglen, 2019). As such, text-to-image generators may replicate biases inherent in Western-centric training datasets, resulting in skewed representations. While the vast training datasets are beyond users' control, the act of prompting remains within it.
Research question
To what extent do user-generated prompts in text-to-image AI tools reinforce traditional gender roles and stereotypes?
Method
First, we analyze 3,720 prompts to identify biases, focusing on the representation of different demographic groups within generated images. These prompts were drawn from the public dataset 3.5M Unique AI Art Prompts in Nomic Atlas (https://atlas.nomic.ai/), a deduplicated subset of the Hugging Face dataset vivym/midjourney-prompts created in May 2024 (see 3.5M Unique AI Art Prompts). We use algorithmic topic clustering in Nomic Atlas to access the topic of gender, its subgroups, and metadata, providing a snapshot of how users invoke gender roles in the textual prompts behind AI-generated imagery. Second, we apply thematic analysis as defined by Braun and Clarke (2012) to interpret the categories and meanings of gender-related prompts. Third, we interpret the findings through critical paradigms in digital media and cultural theory.
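As an illustration of the first step, the sketch below shows how a simple keyword-based pre-filter for gender-related prompts could be run over the underlying Hugging Face dataset. The column name ("prompt"), the split name, and the keyword list are assumptions for illustration only; the actual grouping relies on Nomic Atlas's algorithmic topic clustering rather than this heuristic.

```python
# Minimal sketch: pre-filtering gender-related prompts from the public
# vivym/midjourney-prompts dataset. The "prompt" column name, the "train"
# split, and the keyword list are illustrative assumptions.
import re
from datasets import load_dataset  # pip install datasets

GENDER_TERMS = re.compile(
    r"\b(woman|women|man|men|girl|boy|female|male|wife|husband|mother|father)\b",
    re.IGNORECASE,
)

dataset = load_dataset("vivym/midjourney-prompts", split="train")

# Keep only prompts that explicitly mention gendered terms.
gendered = [row["prompt"] for row in dataset if GENDER_TERMS.search(row["prompt"])]
print(f"{len(gendered)} of {len(dataset)} prompts mention gendered terms")
```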
II: AI stance-taking: Representation of war in text-to-text and text-to-image generators | August 2024 –
Research project team:
Nataliya Roman, Associate Professor, School of Communication, University of North Florida
Nataliia Laba, Assistant Professor, Dept of Communication and Information Studies, University of Groningen
John H. Parmelee, Professor, Chair of the Department of Communication, University of North Florida
In this project, we analyze the attitudinal stance-taking of non-human agents across different modes of generative content production. More specifically, we examine how text-to-text and text-to-image generators represent the Russia-Ukraine war as a focal issue of global political significance. Drawing on cultivation theory and multimodal framing theory, we compare how eight AI systems represent the war based on a content analysis of textual and visual outputs. This analysis is conducted using a series of prompts that either frame the generation tasks neutrally (e.g., What are the most common misconceptions about the Russia-Ukraine war?) or probe the systems for potential biases by introducing the identity of the prompter (e.g., Generate an image representing the Russia-Ukraine war from a Ukrainian perspective).
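For illustration, the sketch below enumerates the two prompt conditions described above (neutral framing vs. perspective framing). Beyond the two quoted examples, the perspective list and wordings are hypothetical and do not reproduce the study's actual instrument.

```python
# Illustrative sketch of the two prompt conditions: neutral framing and
# perspective framing. The perspective list is an example, not the study's
# actual instrument; each prompt is submitted to all eight AI systems.
PERSPECTIVES = ["Ukrainian", "Russian"]

neutral_prompts = [
    "What are the most common misconceptions about the Russia-Ukraine war?",
    "Generate an image representing the Russia-Ukraine war.",
]

framed_prompts = [
    f"Generate an image representing the Russia-Ukraine war from a {p} perspective."
    for p in PERSPECTIVES
]

for prompt in neutral_prompts + framed_prompts:
    print(prompt)
```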
III: From text to video: Examining the effects of prompt modifiers on AI video generation | August 2024 –*
*findings presented at the International Conference of the DGPuK (German Communication Association) Visual Communication Section: Generative Images - Generative Imageries: Challenges of Visual Communication (Research) in the Age of AI, Nov 21, 2024, in Bremen, Germany. Part of Panel 7: Expanding Horizons – Multimodal Research Meets Generative, with TJ Thomson (School of Media & Communication, College of Design and Social Context, RMIT University, Australia), Katharina Lobinger (Università della Svizzera italiana, Switzerland), and Daniel Pfurtscheller & Katharina Christ (University of Innsbruck, Austria).
In this project, I examine how text prompts shape the visual aesthetics of AI-generated videos. Building on Oppenlaender’s (2023) taxonomy of prompt modifiers for text-to-image generation, I analyze how short key phrases (“modifiers”) affect the style of AI videos generated with Runway AI (https://runwayml.com/) and Pika (https://pika.art/), two leading platforms in AI video technology. The proposed conceptual framework for analyzing style in video-based media derives from an ongoing project that explores the relationship between the design of generative media and its communicative potential, approaching AI content generation as a novel socio-technical practice from the perspectives of critical multimodal discourse analysis (CMDA) and affordance theory. Focusing on human-model interaction, this work addresses how style is recontextualized as a technical parameter of the system, providing researchers with a conceptual standpoint for investigating the practice of multimodal video generation.
Fig 1. Style reference parameter --sref in Midjourney.
Left: original artwork She was the World by the Dutch artist Martine Mooijenkind. Right: Midjourney-generated image. Prompt: a surrealist collage --sref <URL> (me x MJ, 23 April 2024).
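As a rough illustration of the study design described above, the sketch below builds a grid of base prompts crossed with modifier phrases for systematic comparison of video outputs. The modifier categories loosely echo Oppenlaender's (2023) taxonomy (e.g., style modifiers, quality boosters), but the concrete phrases and base prompts are hypothetical, not the study's instrument.

```python
# Sketch: building a grid of base prompt x modifier combinations for
# systematic comparison of AI video outputs. Categories loosely follow
# Oppenlaender's (2023) taxonomy; all phrases here are illustrative.
from itertools import product

BASE_PROMPTS = ["a crowded city street at dusk", "a sailboat crossing a calm bay"]

MODIFIERS = {
    "style": ["in the style of film noir", "watercolor animation"],
    "quality": ["highly detailed", "cinematic lighting"],
}

conditions = []
for base, (category, phrases) in product(BASE_PROMPTS, MODIFIERS.items()):
    for phrase in phrases:
        conditions.append({"prompt": f"{base}, {phrase}", "modifier_type": category})

# Each condition would be submitted to Runway AI and Pika for comparison.
for c in conditions:
    print(c["modifier_type"], "->", c["prompt"])
```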
IV: Consequences of 'prompting for style' on Midjourney | March–August 2024*
*(forthcoming) research output:
Laba N (2024) Beyond magic: Prompting for style as affordance actualization in visual generative media. New Media & Society. https://doi.org/10.1177/14614448241286144.
***
Drawing on earlier work on image generation as a socio-technical practice at the nexus of humans, machines, and visual culture, I use the parameter of style as an entry point into a critical study of prompting for a visual aesthetic.
I examine prompting for style on the #prompt-chat channel of Midjourney’s Discord server, a dedicated discussion room for talking about how to craft prompts. This space offers an insight into a user perspective on image generation, pointing to the value of community learning in the context of unpredictable machine behavior, while also revealing platform affordances that enable questionable production practices around style recontextualization without proper attribution.
My findings show that while visual generative media holds promise for expanding the boundaries of creative expression, prompting for style is implicated in the practice of generating a visual aesthetic that mimics paradigms of existing cultural phenomena, which are never fully reduced to the optimized target output.
Fig 2. Network visualization of the threaded communication network, showing the nine most prominent communities in my dataset (node size = in-degree). The structured dataset was imported into Gephi for visualization.
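For readers interested in the pipeline behind Fig 2, the sketch below shows one way to assemble a threaded-reply network and export it for Gephi. The CSV layout and column names ("author", "replied_to") are assumptions about the structured dataset; community detection and layout were done in Gephi itself.

```python
# Sketch: building a threaded communication network from a structured
# message export and preparing it for Gephi. Column names are assumptions.
import csv
import networkx as nx  # pip install networkx

G = nx.DiGraph()
with open("prompt_chat_messages.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        if row["replied_to"]:  # edge points from the replier to the user replied to
            G.add_edge(row["author"], row["replied_to"])

# In-degree is the measure used for node sizing in Fig 2.
for node, indeg in sorted(G.in_degree(), key=lambda x: -x[1])[:10]:
    print(node, indeg)

nx.write_gexf(G, "prompt_chat_network.gexf")  # import this file into Gephi
```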
Samples of divergent opinions:
@user1: Artists are the hypocrites here. They all steal from other artists, they all copy the world around them, they all train on the work of masters just like an AI, and yet they complain when someone likes their style and mimics [sic] it.
@user2: Idiot. Human inspiration is nothing like ai interpolation and training. Even then inspiration is only 1% of the process. Even then you require no other artists to learn, you can simply observe [sic] and draw from the natural world to practice your fundamentals.
V: Visual generative media and its implications for creative practice | May–August 2024*
*manuscript under revision.
*findings presented at the International Conference of the DGPuK (German Communication Association) Visual Communication Section: Generative Images - Generative Imageries: Challenges of Visual Communication (Research) in the Age of AI, Nov 20, 2024, in Bremen, Germany. Talk titled: AI vs Artists: Training data, creative economy, and public opinion about visual generative AI.
***
Adopting production- and usage-oriented perspectives, this work is concerned with the social implications of generative models’ integration into visual production practice. More specifically, I ask:
What are the public perceptions of corporate and personal accountability around the production and use of visual generative media? Are image generators seen as creative tools or as theft of intellectual property?
The main objective of this study is to identify prerequisites for an ethical integration of visual generative media into visual culture, given that the technological complexity of these systems limits how far their inner workings can be scrutinized.
To understand public perceptions of corporate and personal accountability, I analyze 3,983 messages exchanged by professional designers, artists, and ordinary people experimenting with generative models, who discussed these issues in the comments on the YouTube video AI vs artists – the biggest art heist in history on the channel @Yes I’m a Designer, posted on 1 March 2024 and monitored for 70 days until 10 May 2024.
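A minimal sketch of the comment collection step via the YouTube Data API v3 is shown below. The API key and video ID are placeholders, and quota and error handling are omitted for brevity.

```python
# Sketch: collecting top-level comments and replies for a single video via
# the YouTube Data API v3. API_KEY and VIDEO_ID are placeholders.
from googleapiclient.discovery import build  # pip install google-api-python-client

API_KEY = "YOUR_API_KEY"
VIDEO_ID = "VIDEO_ID_OF_THE_ANALYZED_VIDEO"

youtube = build("youtube", "v3", developerKey=API_KEY)

comments, page_token = [], None
while True:
    response = youtube.commentThreads().list(
        part="snippet,replies",
        videoId=VIDEO_ID,
        maxResults=100,
        textFormat="plainText",
        pageToken=page_token,
    ).execute()
    for item in response["items"]:
        comments.append(item["snippet"]["topLevelComment"]["snippet"]["textDisplay"])
        for reply in item.get("replies", {}).get("comments", []):
            comments.append(reply["snippet"]["textDisplay"])
    page_token = response.get("nextPageToken")
    if not page_token:
        break

print(f"Collected {len(comments)} comments")
```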
Highlights: Communalytic (sentiment analysis with VADER and TextBlob), YouTube Data API v3, and Voyant (two word roots: train* (n = 521) and tool* (n = 437)).
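The sentiment scoring can be approximated with the same two libraries Communalytic wraps, VADER and TextBlob. The sketch below is illustrative and does not reproduce Communalytic's exact pipeline; the example comment is invented.

```python
# Sketch: scoring a comment with VADER and TextBlob, the two sentiment tools
# used (via Communalytic) in this study. The example text is illustrative.
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer  # pip install vaderSentiment
from textblob import TextBlob  # pip install textblob

analyzer = SentimentIntensityAnalyzer()

def score_comment(text: str) -> dict:
    """Return the VADER compound score and TextBlob polarity for one comment."""
    return {
        "vader_compound": analyzer.polarity_scores(text)["compound"],  # range -1 to 1
        "textblob_polarity": TextBlob(text).sentiment.polarity,        # range -1 to 1
    }

print(score_comment("AI image generators are just a tool, like a camera."))
```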
Fig 3. Word cloud of the 50 most frequently used words in the corpus (91,192 total words). Made with Voyant.
To identify comments that specifically address training data practices and sentiment around the use of visual generative media, I searched for all instances of two keyword roots: tool* (n = 437) and train* (n = 521). After manual data cleaning, 604 comments containing tool* (n = 300) or train* (n = 304) remained. These formed the corpus for network and thematic analysis, comprising 91,192 words in total, with an average of 19.5 words per sentence. The most frequent words in the corpus were AI (1,392), art (887), artists (478), people (391), and work (386). The vast majority of comments were elaborate opinion pieces, structured into paragraphs, that reasoned through the topic of model training and generative media use and often drew on copyright law and personal experience to substantiate their arguments. This suggests that commenters care about the issue and articulate their opinions legibly in order to contribute to the debate around the contentious topic of AI’s impact on creative industries.
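A minimal sketch of the keyword-root filter and the corpus statistics reported above is given below. The tokenization and sentence-splitting rules are simplified heuristics (the study used Voyant for word frequencies), so exact counts may differ from those reported.

```python
# Sketch: filtering comments on the two keyword roots (tool*, train*) and
# computing basic corpus statistics. Tokenization is a simplified heuristic.
import re
from collections import Counter

TOOL = re.compile(r"\btool\w*", re.IGNORECASE)
TRAIN = re.compile(r"\btrain\w*", re.IGNORECASE)

def build_corpus(comments: list[str]) -> list[str]:
    """Keep only comments that mention either keyword root."""
    return [c for c in comments if TOOL.search(c) or TRAIN.search(c)]

def corpus_stats(corpus: list[str]) -> dict:
    text = " ".join(corpus)
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        "total_words": len(words),
        "avg_words_per_sentence": round(len(words) / max(len(sentences), 1), 1),
        "top_terms": Counter(w.lower() for w in words).most_common(5),
    }

corpus = build_corpus(["Training data is theft.", "It is just a tool.", "Nice video!"])
print(corpus_stats(corpus))
```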