The Model Train Set: AI Training Models and Their Impact on Copyright Liability

Machine learning systems can be trained with a variety of techniques, each of which uses preexisting works differently and therefore has different implications for copyright. Assessing the potential liability associated with these techniques requires an understanding of how they work. For example, the techniques used to train large language models (such as ChatGPT) are materially different from those used for diffusion or image classification models, and they can change again at the fine-tuning stage. Across these processes, reproduction, distribution, and display may or may not be implicated; indeed, traditional understandings of what these terms mean may be strained and challenged in the world of AI. In this session, we'll explain how content becomes data for AI purposes and identify where reproduction, distribution, or display of that content may occur.

  • Moderator: Aleksander Goranin, Partner, Duane Morris
  • Matthew Sag, Professor of Law, Artificial Intelligence, and Data Science, Emory University School of Law
  • Yacine Jernite, Machine Learning and Society Lead, Hugging Face
  • Rebecca Blake, Advocacy Liaison, Graphic Artists Guild