Express Computer

The challenge of training AI

By Arjun Sinha, Partner, AP & Partners 


The rapid advancement of artificial intelligence (AI) is reshaping industries, much like the Industrial Revolution. However, with innovation come legal and ethical challenges, none more pressing than the issue of copyright in AI training and output generation. As large language models (LLMs) become increasingly sophisticated, their reliance on copyrighted material has sparked significant legal debate worldwide, including in India.

The development of AI models, especially those designed for sector-specific applications, frequently involves training on extensive datasets. These datasets may include textbooks, research papers, and other written works, many of which are protected by copyright laws. For instance, an AI model designed to assist teachers may require training on educational materials, such as school textbooks and lecture notes.

There are three primary legal concerns when it comes to copyright and AI:

  •   Does the use of copyrighted content to train LLMs constitute infringement?

  •   Who owns AI-generated content?

  •   Does LLM-generated content that uses extracts of copyrighted materials result in infringement?

Third-party materials used for training LLMs will typically have copyright protection. However, whether training LLMs on such materials itself constitutes infringement is an evolving legal question. Some jurisdictions recognise exceptions under fair use or fair dealing, which allow limited use of copyrighted materials under specific conditions.

In India, the Copyright Act permits certain ‘fair’ uses of copyrighted works, such as reproduction for instruction or examination purposes. In the context of AI training, the fair dealing exemption for private use, including research, could be relevant.

Since AI training typically occurs in a private setting without direct public sharing of the training data, one could argue that fair-dealing protections apply. However, a key limitation is that such use should not be exploitative or harm the market value of the original works. By analogy, individuals are free to read books and compile their own analysis for educational purposes. If AI functions similarly, using digital means instead of human cognition, should it not be permissible?

Ownership of AI-generated content depends on who is legally considered the author. Under Indian copyright law, the author of a computer-generated work is the person who causes the work to be generated. If a teacher uses an AI tool to generate content, they may be considered the author, unless a contract gives ownership to the institution. Alternatively, if the AI software company designs the tool to generate standard responses based on answers produced during the AI training process by individuals engaged by the company, then the company could be deemed the first owner. This approach carries its own risks. If an AI software company assumes authorship, it may lose intermediary protections, be held liable for copyright violations by its users, and expose itself to third-party claims.

The question of whether AI-generated content constitutes copyright infringement is fluid and contested. AI-generated content generally falls into two categories: facts, and extracts of copyrighted third-party materials.

Outputs that consist of factual information, such as solutions to mathematical problems or scientific data, are not subject to copyright since they pertain to laws of nature. LLM outputs that rely on third-party copyrighted materials, such as excerpts from books, critiques of academic texts, or restructured exam questions, raise legal concerns. That said, courts have recognised and permitted the use of limited sections of literary works. For example, limited use of third-party materials has been recognised in the preparation of instructional material, guidebooks that support popular educational material, and answer keys to textbooks. Courts have, however, objected to significant copying of third-party materials, as it may reduce the value of the original copyrighted work.

Given the evolving legal framework, businesses, AI developers, and educational institutions must proactively manage copyright risks. Obtaining explicit licenses for copyrighted materials remains a key strategy for avoiding potential litigation, but the use of materials for training as fair use should, in our view, gain wider acceptance. Developers will also need to take steps to ensure that AI training and output generation adhere to fair use or fair dealing principles, and should define ownership of AI-generated content through contractual agreements. Even when legal exceptions apply, the ethical implications of using copyrighted works for AI training should be carefully evaluated, especially if AI-generated content competes with original creators.
