AI training dataset full of Japanese video game voice lines sparks controversy

Since the emergence of image-generating AI such as ‘Midjourney’ and ‘Stable Diffusion,’ AI-generated content has attracted widespread global attention, along with concerns about unauthorized machine learning. Recently, MoeSpeech, a dataset hosted on the machine-learning platform Hugging Face, has sparked controversy in Japan because it compiles the voices of numerous Japanese voice actors.

MoeSpeech contains approximately 363,000 voice files from 449 video game characters, totaling about 581 hours and 343GB (source: Hugging Face). Because it consists predominantly of lines recorded by Japanese voice actors, it has become a focal point of discussion in both technological and legal circles in Japan.

The dataset has received mixed reactions from the public. Some question the legality of extracting game data for AI analysis in the first place.  

Post translation: I can’t believe it’s okay to extract game data just by saying that it’s for AI data analysis. If you install SDs, you can extract data to your heart’s content, say, “Actually, I won’t analyze it!” and then keep it all on your PC without anyone checking if it’s deleted. The age of major piracy is upon us. 
Post translation: I expect this would lose in a lawsuit. One reason is that it could fall under the Article 30-4 exception for uses that “unjustly harm the interests of the copyright holder.” Another is that when game companies contract with voice-acting agencies, those contracts may prohibit distributing the in-game voice lines.

Others argue that if such datasets are legal, that points to flaws in current copyright law.

Post translation: Shouldn’t we quickly spread the word and raise concerns about this? Anyone with common sense would realize that a law that makes this legal is flawed.
Post translation: The feeling that “this can’t be legal; the law just hasn’t caught up” is essential for living in a law-abiding country.

The heart of the controversy lies in how the dataset was created and how it can be used. Experts such as Taichi Kakinuma, a lawyer specializing in AI and a board member of the Database Society of Japan, have examined the legal issues in more detail.

In this case, it seems that there are three types of rights involved: 

  1. Copyright in the script, held by the scriptwriter (or similar). 
  2. Neighboring rights in the voice actor’s performance. 
  3. Publicity rights in the voice actor’s voice. 
(Translated into English; original version via Taichi Kakinuma.)

The key question is how Article 30-4 of Japan’s Copyright Act applies. The article covers the use of works for the purpose of “information analysis” and permits copying copyrighted works for machine learning, provided the use is not for the purpose of enjoying the work. In principle, then, creating and publicly transmitting such a dataset falls under this article.

The question, however, is whether creating and transmitting the dataset also serves an “enjoyment purpose,” in which case the article would not apply.

In this regard, the dataset’s page on Hugging Face states:

“To prevent usage for enjoyment purposes, the following measures have been taken: 

  1. Hiding game names and character names, not categorizing by game, and using random alphanumeric names as character identifiers. 
  2. Randomizing the order of voice files in each character folder so that the sequence of lines cannot be identified.” 

(Translated into English; original version via Hugging Face.)
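As a rough illustration of what the two measures involve, here is a minimal Python sketch. It assumes the voice files are already grouped in memory by (game, character); the function names, the eight-character ID format, and the input layout are hypothetical and are not taken from MoeSpeech itself.

```python
from __future__ import annotations

import random
import string

# Hypothetical ID scheme: lowercase letters and digits.
ID_ALPHABET = string.ascii_lowercase + string.digits


def random_id(rng: random.Random, length: int = 8) -> str:
    """Return a random alphanumeric identifier (illustrative format only)."""
    return "".join(rng.choices(ID_ALPHABET, k=length))


def anonymize_dataset(
    clips: dict[tuple[str, str], list[str]],
    seed: int | None = None,
) -> dict[str, list[str]]:
    """Replace (game, character) keys with random IDs and shuffle each clip list.

    `clips` maps (game_name, character_name) to that character's voice files
    in their original in-game order. The result is one flat mapping, so game
    and character names appear nowhere in it and nothing is grouped by game.
    random.Random is used for reproducibility; it is not a secure RNG.
    """
    rng = random.Random(seed)
    characters = list(clips.items())
    # Shuffle the characters as well, so insertion order does not reveal
    # which characters came from the same game.
    rng.shuffle(characters)

    anonymized: dict[str, list[str]] = {}
    for _key, files in characters:
        char_id = random_id(rng)
        while char_id in anonymized:  # regenerate on the rare collision
            char_id = random_id(rng)
        shuffled = list(files)  # copy so the caller's list is untouched
        rng.shuffle(shuffled)  # hide the original order of the lines
        anonymized[char_id] = shuffled
    return anonymized


if __name__ == "__main__":
    # Hypothetical input; file names are assumed to be opaque already.
    example = {
        ("Game A", "Hero"): ["k3f9.wav", "p0d2.wav", "z8q1.wav"],
        ("Game A", "Rival"): ["m4t7.wav", "c1x5.wav"],
        ("Game B", "Mentor"): ["r6b2.wav"],
    }
    for char_id, files in anonymize_dataset(example, seed=0).items():
        print(char_id, files)
```

Shuffling only hides the sequence if the file names themselves carry no ordering or identifying information; if the originals were named sequentially, they would need renaming as well.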

The biggest area of debate, however, concerns publicity rights. Publicity rights, recognized by the Supreme Court of Japan in the Pink Lady decision of February 2, 2012, are the exclusive right to use the customer-attracting power of one’s name, portrait, and the like. This can extend to the voice, especially for voice actors or celebrities. Not every use of such a ‘portrait’ violates publicity rights, but unauthorized use of a voice actor’s identifiable voice could constitute an infringement.

Setting legality aside, others have raised ethical concerns about such technology, such as the harm it could do to voice actors’ livelihoods.

Post translation: Even if this puts voice actors’ positions at risk, they’ll probably say, “Don’t oppose technological innovation!” It was just by chance that this one came to the surface. In reality, various voices are already circulating.
Post translation: Whether it hurts someone, whether your family falls victim to voice phishing, or whether people die, for these people ‘this might be dangerous’ always comes second to ‘this seems interesting’. That’s the kind of people they are. Their sense of good and evil is weak, just like children’s.
If people outside their circle don’t monitor and regulate them to ensure proper use, they will keep expanding the harm without limit.
Katarina Woodman

I was born in the United States and currently reside in Kyoto, Japan. As an undergrad, I spent a year studying abroad in Japan, living in Nara City. Despite the language barrier, I was able to make many friends, which further fueled my desire to learn Japanese.
After completing my undergraduate degree, I moved to Kyoto, where I am currently enrolled in a graduate program at Kyoto University. I work as a freelance writer and translator on the side. I have an academic background in psychology and philosophy and a special interest in Japanese culture, Eastern philosophy, and linguistics.

