Futures

The Battle Over Books and AI (2023-10-10)

Summary

This article discusses “Books3,” a data set of more than 191,000 pirated ebooks that companies such as Meta and Bloomberg have used to train generative-AI systems. The data set is at the center of several lawsuits brought against Meta by authors who claim copyright infringement. The article’s author analyzed the data set and found that many authors’ works were included without their permission. The article highlights the secretive nature of AI-training practices and the potential harm to authors, and it lists caveats for using a search tool that identifies authors and titles within the data set.

Signals

Signal | Change | 10y horizon | Driving force
Pirated ebooks used to train generative-AI | Copyright infringement to potential legal regulations | More stringent regulations on the use of pirated content | Protecting author rights and preventing copyright infringement
Authors’ books unknowingly used to train machines | Lack of awareness to increased transparency | Authors will have more control over the use of their works | Protecting author rights and preventing unauthorized use
Generative AI threatens authors’ livelihood | Potential harm to authors to potential solutions | Authors will have new ways to protect their work | Protecting author rights and ensuring fair compensation
AI-training practices are secretive and nonconsensual | Lack of transparency to increased transparency | AI training practices will become more transparent and accountable | Ensuring ethical and responsible use of AI technology