Academic researchers of ChatGPT user habits stumble upon “extreme outlier” who generated thousands of fanfics about Doki Doki Literature Club! characters giving birth  

The “AI Fiction in the Wild” study analyzed over 500,000 anonymous English-language ChatGPT conversation logs.

As spotted by ITmedia, a team of researchers from the University of Washington and the University of Colorado Boulder recently published a paper titled “AI Fiction in the Wild,” which analyzes over 500,000 anonymous English-language ChatGPT conversation logs. Among its more amusing findings, the study spotlights one particularly prolific user who spent months generating huge volumes of Doki Doki Literature Club! pregnancy fanfiction. More broadly, the study found that a certain subset of users tended to request fanfiction about specific IPs over and over again, and that requests for sexual content were common. The research arrives with impeccable timing, given the recent discourse surrounding generative AI content in fandom and fanfiction.

The “AI Fiction in the Wild” study examines how people use ChatGPT to generate fiction, in terms of both volume and content. It draws on the WildChat dataset, which consists of conversations collected through a free chatbot hosted on Hugging Face. The chatbot gave users access to GPT-3.5 Turbo and GPT-4 without an OpenAI account, provided they consented to their conversations potentially being used and anonymously shared for research purposes.

The researchers analyzed roughly 573,000 English conversations collected between April 2023 and May 2024. Since the study focused specifically on fiction, they first identified which conversations qualified by filtering for content involving imaginary or hypothetical scenarios. AI was also used for this step, though humans manually checked the filtering's accuracy on a sample of 300 conversations.

Of the approximately 573,000 conversations, around 195,000 were classified as fiction, and of those, roughly 52,000 contained “sexually explicit material.” Another 67,000 conversations were labeled “toxic” in the dataset. In other words, roughly 27% of the fiction conversations, more than a quarter, involved sexually explicit content.

The researchers also found that fiction generation was heavily concentrated among a very small group of “heavy users.” According to the data, the top 2% of fiction-generating users accounted for more than 80% of all fiction-related conversation logs. With the total number of users generating fiction estimated at around 10,000, this suggests that roughly 200 people were responsible for over 150,000 fiction conversations.

Among these heavy users, the researchers observed a couple of distinct patterns of behavior. One type, dubbed the “story cyclers,” repeatedly generated iterations of the same story for a certain period before moving on to another topic. Others, labeled “infinite story demanders,” spent long stretches of time repeatedly requesting nearly identical stories with only minor variations. 

The previously mentioned Doki Doki Literature Club! fanfic generator is cited as a prime example of an infinite story demander. Over the course of several months, this individual prompted ChatGPT thousands of times to create fanfiction based on the game, always with the very specific premise of heroine Natsuki suddenly going into labor and ChatGPT being tasked with continuing the story from there.

Image Credit: “AI Fiction in the Wild” on arXiv

In response, ChatGPT produced a wide range of endings, such as one in which emergency responders arrive in time and both mother and baby survive. While the paper calls this user an extreme outlier, the researchers note that many heavy fiction users displayed similar tendencies. Among prompts submitted by the top 2% of users, 69% were repetitive, with users attempting to refine or re-run nearly identical requests.

The study also ranked the franchises most frequently mentioned in these fiction-related conversations. Doki Doki Literature Club! topped the list with 22,381 mentions, followed by Freedom Planet (5,204), League of Legends (4,514), and Naruto (4,342). All of these findings come with an important caveat, however: the WildChat dataset is not representative of all ChatGPT users. Since it comes from a free chatbot hosted on Hugging Face, its users were likely more technically inclined and more immersed in online culture than the average AI chatbot user. Even so, the researchers argue that WildChat offers a rare glimpse into real-world interactions with ChatGPT.


Hideaki Fujiwara

Automaton Japan Deputy Editor-in-Chief. Voracious gamer who plays everything. Loved Titanfall 2 and has been playing Apex Legends since its launch.
