AI learning dataset full of Japanese video game voice lines sparks controversy

Since the emergence of image-generating AI such as ‘Midjourney’ and ‘Stable Diffusion,’ AI-generated content has drawn widespread global attention, along with concerns about unauthorized machine learning. Recently, a dataset called MoeSpeech, hosted on the machine learning platform Hugging Face, has stirred controversy in the Japanese community because it compiles the voices of numerous Japanese voice actors.

MoeSpeech is a dataset that contains approximately 363,000 voice files from 449 video game characters, totaling about 581 hours and 343GB (source: Hugging Face). Because it consists predominantly of lines performed by Japanese voice actors, it has become a focal point of discussion in both technological and legal circles within Japan.

The dataset has received mixed reactions from the public. Some question the legality of extracting game data for AI analysis in the first place.

https://twitter.com/rumiakane/status/1749871422980411529?s=20

Post translation: I can’t believe it’s okay to extract game data just by saying that it’s for AI data analysis. If you install SDs, you can extract data to your heart’s content, say, “Actually, I won’t analyze it!” and then keep it all on your PC without anyone checking if it’s deleted. The age of major piracy is upon us.

これって，訴訟されたら負けると予想しているんだよね．1つは著作権30条4の「著作権者の利益を不当に害する」場合に該当する可能性がある点．もう1つは，ゲーム会社が声優事務所と契約する際に「ゲーム中の音声の配布」そのものを禁止している可能性． https://t.co/JcPGJ8nu8k
— M. Morise (忍者系研究者) (@m_morise) January 24, 2024

Post translation: I expect this would lose if it went to court. One reason is that it may fall under the case of “unjustly harming the interests of the copyright holder” in Article 30-4 of the Copyright Law. The other is that game companies may prohibit the very distribution of in-game voice lines when they contract with voice-acting agencies.

Others suggest that the legality of such datasets reflects problematic aspects of current copyright laws.

もうこういうのはさっさと拡散して問題視して貰う方が良くない？って気持ちになってきた。
まともな感覚持ってる人ならこれが合法って解釈できうる法律がおかしいってなるだろ https://t.co/ZWpAYJdQuV
— 🌵 (@ktlugwi) January 23, 2024

Post translation: Shouldn’t we quickly spread the word and raise concerns about this? Anyone with common sense would realize that the law allowing this to be legal is flawed.

合法じゃなくて法律追いついてないだけじゃないとこんなの絶対おかしいよなって感覚は、法治国家に生きる人間として必要不可欠な感覚なんだなと思う https://t.co/WJpehr3BMO
— ヒツジもどき@絵と文字どっちもかいてる人 (@blackseepMODOKI) January 24, 2024

Post translation: The feeling that this can’t be legal, that the law just hasn’t caught up, and that otherwise something is seriously wrong, is essential for anyone living in a country governed by the rule of law.

The heart of the controversy lies in how the dataset was created and how it may be used. Experts such as Taichi Kakinuma, a lawyer specializing in AI and a board member of the Database Society of Japan, have examined the legal questions in more detail.

In this case, it seems that there are three types of rights involved:

Neighboring rights related to the performance of the voice actor.

Publicity rights related to the voice of the voice actor.

(translated into English, original version via Taichi Kakinuma)

The key question is the application of Article 30-4 of the Copyright Law, which applies when data is used for the purpose of “information analysis.” This article allows copyrighted data to be reproduced for machine learning purposes, provided the use is not for enjoyment. Thus, the creation and public transmission of such datasets are, in principle, covered by this article.

However, the issue is whether the act of creating and transmitting the dataset coexists with an “enjoyment purpose,” which would make the article inapplicable.

In this regard, the dataset’s Hugging Face page states:

“To prevent usage for enjoyment purposes, the following measures have been taken:

Hiding game names and character names, not categorizing by games, and using random alphanumeric names for character identifiers.

Randomizing the order of voice files in each character folder to prevent the identification of the sequence of lines.”

(translated into English, original version via Hugging Face)
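The two measures quoted above (random alphanumeric character identifiers and randomized file order) can be illustrated with a short sketch. This is not the dataset creator's actual tooling, which the source does not describe; the function name `anonymize_layout`, the 8-character identifier length, the `.wav` naming scheme, and the input mapping are all hypothetical choices made for illustration.

```python
import random
import string


def anonymize_layout(files_by_character, seed=None):
    """Map each character to a random ID and shuffle its voice files.

    files_by_character is a hypothetical mapping such as
        {"Game A/Hero": ["line_001.wav", "line_002.wav"]}
    Returns {random_id: [(old_name, new_name), ...]}.
    """
    rng = random.Random(seed)
    alphabet = string.ascii_lowercase + string.digits
    layout = {}
    for _original_name, files in files_by_character.items():
        # Draw a random 8-character identifier; retry on a rare collision.
        char_id = "".join(rng.choices(alphabet, k=8))
        while char_id in layout:
            char_id = "".join(rng.choices(alphabet, k=8))
        # Shuffle a copy so the original order of lines cannot be read off.
        shuffled = list(files)
        rng.shuffle(shuffled)
        # Renumber after shuffling so file names no longer encode sequence.
        layout[char_id] = [
            (old, f"{char_id}_{i:04d}.wav") for i, old in enumerate(shuffled)
        ]
    return layout
```

Note that a scheme like this only hides names and ordering. The voices themselves remain recognizable, which is the concern behind the publicity-rights debate.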

However, the most contested issue is publicity rights. Publicity rights, recognized in the Pink Lady Supreme Court decision of February 2, 2012, are the exclusive right to use one’s name, portrait, and so on to attract customers. This includes the voice, especially for voice actors or celebrities. Not all uses of such ‘portraits’ violate publicity rights, but unauthorized use of a voice actor’s identifiable voice could constitute an infringement.

Regardless of the debate around legality, other people have commented on the ethical concerns of such technology, such as the harm it can cause to voice actors’ livelihoods.

でこれで声優の立場が危機にあっても
「技術革新()に反対すんな！」って言うんだろうな

たまたまこれが見える化されてるだけで
本質的には現状でもう色んな声が
出回ってるしな https://t.co/fA8PFZUJrS
— あい (@mxMRO3zHEIiUX8q) January 24, 2024

Post translation: Even if this puts the position of voice actors at risk, they’ll probably say, “Don’t oppose technological innovation!” This one just happens to have been made visible. In reality, all sorts of voices are already circulating.

誰かを傷つけようがこれ見てるあなたの家族が詐欺の被害に遭おうが人が死のうが、｢やばいかも｣より｢面白そう｣を優先する

技術者とはそういう人種
善悪の観念が薄い
子供と同じ

界隈外の人が監視して正しく運用されるように規制しないと際限なく被害を広げます https://t.co/7WF4DoTDeI
— なー (@untiAIeshi) January 24, 2024

Post translation: Whether it hurts someone, whether your family falls victim to voice-phishing, or people die, for these people, the notion of ‘this might be dangerous’ is always secondary to ‘this seems interesting’. That’s the kind of people they are. Their sense of good and evil is weak, just like children.
If people outside of their circle don’t monitor and regulate to ensure proper use, they will endlessly expand the scope of harm.

Katarina Woodman
