Fair Dealing or Free Riding? The Legal Contours of AI Training Under Indian Copyright Law



The explosive growth of generative Artificial Intelligence (AI) has brought two vital public policy goals into a direct, high-stakes collision: the mandate to foster technological innovation and the statutory duty to protect creative expression. At the very heart of this global debate is the practice of data scraping. To train Large Language Models (LLMs) capable of generating human-like text, code, or art, technology companies routinely ingest massive datasets comprising millions of copyright-protected books, articles, and websites.

While AI developers view this process as mere statistical pattern-seeking, content creators see it as industrial-scale copyright infringement. In the Indian context, whether this unauthorized ingestion is legally permissible hinges almost entirely on the interpretation of Section 52 of the Copyright Act, 1957. As Indian courts begin to grapple with these issues, a critical question emerges: Can the doctrine of "fair dealing" expand to shield commercial AI training, or must the legislature step in?

The Fundamental Structural Divide: Fair Use vs. Fair Dealing
To understand the vulnerability of AI developers under Indian law, one must contrast it with the legal framework of the United States. US courts rely on the doctrine of "Fair Use" (codified at 17 U.S.C. § 107), which functions as a flexible, four-factor balancing test. This open-ended standard allows judges to evaluate new technologies dynamically, frequently protecting unauthorized copying if the use is deemed "transformative," meaning that it adds something new, with a further purpose or different character. AI companies in the US rely heavily on this defense, arguing that converting text into mathematical weights is inherently transformative.

In stark contrast, India adheres to the rigid doctrine of "Fair Dealing." Section 52(1) of the Indian Copyright Act does not provide an open-ended balancing test. Instead, it offers an exhaustive, statutory list of specific, compartmentalized exceptions. For an act of copying to be declared lawful, it must explicitly fit into one of the designated statutory pigeonholes, such as:

  • Private or personal use, including research;
  • Criticism or review of that work or any other work;
  • Reporting of current events and current affairs.

If an activity falls outside these precise boundaries, no matter how innovative or socially beneficial it might be, Indian courts lack the statutory authority to declare it "fair."

Can AI Ingestion Pass the "Research" Test?
AI developers operating in India face an uphill battle attempting to shoehorn industrial LLM training into Section 52(1)(a)(i), which permits fair dealing for the purpose of "private or personal use, including research."

Indian jurisprudence has historically interpreted this exception conservatively. In the landmark case of Civic Chandran v. Ammini Amma, the Kerala High Court emphasized that the court must scrutinize the purpose, character, and commercial impact of the dealing. Commercial AI training fails the "private or personal" threshold on multiple fronts:

  1. Scale and Mechanical Reproduction: Unlike a human researcher who reads a text to glean knowledge or cite a passage, AI training involves the wholesale, automated creation of digital copies of entire works to feed into a proprietary database.
  2. The Commercial Nature of the Enterprise: Most leading LLMs are not non-profit academic experiments; they are commercial products designed to power subscription services and enterprise software.

While the Delhi High Court’s ruling in University of Oxford v. Rameshwari Photocopy Services demonstrated that Indian courts are willing to interpret Section 52 purposively to advance broader societal goals (such as education), that case was explicitly confined to non-commercial, instructional photocopying. Extending the protective umbrella of "research" to multi-billion-dollar technology corporations that aggregate the works of millions of creators for commercial gain would severely distort the statutory intent of the Act.

The Economic Market Effect
Another core tenet of fair dealing analysis is the market effect of the competing work. If the unauthorized copy acts as a market substitute for the original, it rarely qualifies as fair. Generative AI presents a unique, existential market threat. LLMs do not merely compete with authors; they are trained on an author’s corpus to generate text that can actively displace the market demand for the author’s future writings.

When a model ingests news articles, legal commentaries, or creative fiction to generate instantaneous summaries or synthetic alternatives, it directly deprives creators of licensing revenue and website traffic. Under traditional Indian copyright principles, any unauthorized commercial use that damages the economic exploitation of the original work tilts heavily toward a finding of infringement.

The Regulatory Path Forward: A Call for Legislative Intervention
Because Section 52 was drafted decades before the advent of machine learning, neural networks, and Text and Data Mining (TDM), it leaves a profound legislative vacuum. Forcing the judiciary to squeeze generative AI into an outdated statutory framework risks two equally problematic outcomes: either judges will overreach by inventing new exceptions, or they will enforce a strict reading that inadvertently bottlenecks domestic AI innovation.

The solution lies not in judicial activism, but in legislative intervention. Other global jurisdictions offer viable models for India to consider:

  • The European Union (EU) Model: Article 4 of the EU Digital Single Market Directive (Directive (EU) 2019/790) introduces a specific TDM exception that permits data scraping for commercial purposes, but crucially grants copyright holders an "opt-out" right. If a creator expressly reserves their rights, for instance by tagging online content with a machine-readable opt-out, AI companies must secure a license before using it.
  • The Japanese Model: Japan amended its Copyright Act in 2018 (Article 30-4) to create highly permissive exceptions for machine learning, allowing AI training on copyrighted works regardless of commercial intent, provided it does not "unreasonably prejudice" the interests of the copyright holder.

Conclusion
As India positions itself as a global hub for technological advancement, its legal framework must evolve. Stretching the existing boundaries of Section 52 to allow unchecked, unlicensed commercial AI training would undermine the financial ecosystem that sustains Indian authors, journalists, and artists.

To maintain its commitment to both technological progress and intellectual property rights, the Parliament of India must introduce a dedicated statutory amendment. A balanced regulatory framework—one that perhaps introduces statutory licensing schemes or TDM exemptions with robust opt-out mechanisms—is essential to ensure that the AI revolution does not thrive at the unfair expense of human creativity.