From Copyright Case to AI Data Crisis: How The New York Times v. OpenAI Reshapes Companies’ Data Governance and eDiscovery Strategy

Nelson Mullins Riley & Scarborough LLP

The New York Times v. OpenAI litigation has garnered significant attention as a landmark copyright dispute[1]; however, it has rapidly evolved into a global data privacy conflict that will inform how enterprises approach AI and data governance. When Magistrate Judge Ona T. Wang issued a May 13, 2025 preservation order requiring OpenAI to retain all ChatGPT conversation logs – affecting over 400 million users worldwide – she inadvertently created a watershed moment for AI data management that dominated the past two months of that case.[2] This order, which the presiding judge recently affirmed, represents the first time a court has mandated mass preservation of AI-generated content on such a scale, forcing immediate reconsideration of privacy commitments, international compliance obligations, and the fundamental relationship between AI systems and user data.

The implications extend far beyond OpenAI’s courtroom battles. Enterprise leaders implementing AI solutions now face unprecedented questions about data sovereignty, litigation preparedness, and the balance between innovation and legal compliance. Although the fight over OpenAI’s data preservation obligations has only just begun, this case illuminates critical unresolved questions about AI and data governance frameworks, providing early insights for organizations navigating the complex intersection of AI technology, privacy rights, and legal discovery obligations.

The AI preservation order that shook 400 million users.

When OpenAI launched ChatGPT in November 2022, it set the record as the fastest-growing app of all time. ChatGPT remains the most widely used AI chatbot worldwide, with an average of nearly 800 million weekly users.[3] Since its meteoric rise, OpenAI has endured its fair share of controversy, including an ongoing copyright infringement lawsuit filed by The New York Times in December 2023, which alleges that OpenAI unlawfully used millions of NYT articles to train its AI models, including ChatGPT.[4]

Although The New York Times v. OpenAI raises several novel copyright questions that are worth watching, a broader data preservation controversy began when The New York Times alleged that OpenAI was systematically destroying evidence by deleting user conversations that might demonstrate copyright infringement.[5] On May 13, 2025, Judge Wang responded with a sweeping preservation order directing OpenAI to “preserve and segregate all output log data that would otherwise be deleted on a going forward basis,” regardless of whether deletion was requested by users or mandated by privacy regulations.[6] The order has critical implications for virtually all companies that use AI.

On its face, the scope of this order was unprecedented in AI litigation. OpenAI promptly asked the court to reconsider, explaining the conflict between complying with the preservation order and managing competing global requirements.[7] OpenAI argued that the preservation order would force it to “disregard legal, contractual, regulatory, and ethical commitments to hundreds of millions of people, businesses, educational, and governments around the world – even though there is no reason to believe these drastic measures will advance the litigation.”[8] OpenAI also explained that its “data infrastructure is complex and retaining data that is otherwise slated for deletion requires significant engineering work, infrastructure changes, and compute resources.”[9]

Judge Wang denied OpenAI’s motion for reconsideration; however, she invited OpenAI to submit supplemental briefing on the issue in preparation for a hearing.[10] OpenAI’s key arguments included:

  • The order required preserving 60 billion conversations that can’t be feasibly searched;[11]
  • The Plaintiffs “(over)estimate that only 0.006% of the data might even be relevant”;[12]
  • The technical cost of compliance would require months of engineering and cost millions in hosting infrastructure;[13] and
  • Users reasonably expected deleted chats to stay deleted.[14]
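To give a rough sense of the scale behind the first two figures above, the short calculation below combines them. This is an illustrative sketch only: it assumes the Plaintiffs' 0.006% estimate applies to the same 60 billion conversations cited by OpenAI, which the filings summarized here do not state explicitly, and the variable names are hypothetical.

```python
from fractions import Fraction

# Figures taken from OpenAI's arguments as summarized above.
total_conversations = 60_000_000_000   # "60 billion conversations"
relevant_share = Fraction(6, 100_000)  # 0.006% expressed as an exact fraction

# ASSUMPTION: the percentage is measured against the same 60 billion base.
potentially_relevant = total_conversations * relevant_share
irrelevant = total_conversations - potentially_relevant

print(f"Potentially relevant: {int(potentially_relevant):,}")
print(f"Preserved but likely irrelevant: {int(irrelevant):,}")
```

Under that assumption, the order would sweep in tens of billions of conversations to reach a subset numbering in the low millions, which is the core of the proportionality dispute described here.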

Following the May 27 hearing on these issues, the court clarified that it did not intend “wholesale preservation” and that the preservation requirement only applied to certain ChatGPT plans.[15] OpenAI also noted that it would be exploring geographic exclusions for non-U.S. users to avoid conflicts with international privacy regulations like GDPR.[16] Finally, OpenAI indicated that it would be negotiating a more reasonable sampling approach with the Plaintiffs;[17] however, OpenAI appealed Judge Wang’s decision to District Judge Stein instead.[18]

OpenAI’s public response frames emerging privacy considerations and risks.

Dissatisfied with Judge Wang’s ruling, OpenAI filed an objection to her preservation order, escalating the issue to District Judge Sidney Stein.[19] The objection argues that the breadth of the order violates established proportionality standards in federal discovery,[20] resting on three main pillars:

First, OpenAI argues that there is no substantial evidence that users actually obtain copyrighted news content through ChatGPT, which the Plaintiffs admit is “difficult.”[21] Second, the preservation order creates a disproportionate burden, requiring months of engineering time to override privacy protections for hundreds of millions of users (and at great cost, of course).[22] Finally, OpenAI argues that the preservation order is procedurally unfair, as it was based on now-debunked accusations of litigation-related data destruction.[23]

Beyond its legal arguments, OpenAI issued a public response about the preservation order on June 5, 2025, to explain which users’ data is affected and what OpenAI is doing while it appeals the preservation order.[24] OpenAI’s statement confirms that all ChatGPT Free, Plus, Pro, and Team subscriptions are now being preserved.[25] API usage is also affected for users who do not have a Zero Data Retention[26] agreement in place.[27] ChatGPT Enterprise and ChatGPT Edu customers are not affected.[28] The conversations preserved under the order are “stored separately in a secured system” which is only accessed for the purpose of litigation by “a small, audited OpenAI legal security team.”[29]

Sam Altman, OpenAI’s CEO, also took to social media to discuss a potential “AI Privilege” to protect sensitive conversations from disclosure.[30] To be clear, no such privilege has been recognized by a U.S. court, but that may be a forecast of OpenAI’s strategy if it is ultimately forced to disclose user chat logs in discovery.

Judge Stein’s ruling affirms the preservation order and its stakes.

On June 26, 2025, District Judge Sidney Stein heard oral argument on OpenAI’s objections to Judge Wang’s preservation order.[31] During the hearing, Judge Stein highlighted that OpenAI’s terms of use allowed preservation for legal requirements, and he rejected the argument that user privacy interests should override the needs identified in the preservation order. He also entertained arguments by The New York Times that searching output logs would be important to discover whether users might be engaging in and concealing copyright infringement or generating infringing news articles with ChatGPT. Ultimately, Judge Stein denied OpenAI’s objection and affirmed Judge Wang’s preservation order.[32]

Implications for enterprise AI governance and use.

Although this discovery dispute has now moved beyond its early stage with Judge Stein’s decisive ruling against OpenAI, the preservation order conflict in The New York Times v. OpenAI reveals serious challenges for enterprise AI governance that extend far beyond copyright litigation. Organizations implementing AI solutions must now account for discovery obligations that traditional data governance frameworks did not anticipate. OpenAI’s early difficulties also highlight that AI systems create entirely new categories of electronically stored information that require specialized handling in litigation contexts.

For companies using vendors like OpenAI, the most immediate concern involves understanding the extent to which user data is being retained in the first place. This case shows that vendor privacy commitments may come into conflict with discovery orders, potentially exposing enterprise users to data disclosure requirements that contrast sharply with longstanding privacy commitments between companies and their AI vendors. Retaining information under legal requirements is standard in most vendor agreements, including OpenAI’s policies, which Judge Stein was quick to point out.[33] However, companies should consider additional terms requiring notification when their data becomes subject to a hold, including a court-mandated hold, and providing opportunities to object to disclosure. While ChatGPT’s general enterprise plans were not affected, OpenAI’s API use was impacted in the absence of a Zero Data Retention agreement.[34] Zero Data Retention agreements are important because they generally require that vendors do not retain prompts and responses in the first place, thus avoiding preservation obligations.

This data preservation dispute also highlights steps that companies can take to make discovery or investigations more manageable. Front-end data tagging and classification has always been important in managing ESI, but it becomes a critical risk management tool given the high volume of data that AI systems can create. At the hearing on her preservation order, Judge Wang was concerned that OpenAI did not collect or provide sufficient information to determine whether automatically destroyed output logs were relevant.[35] While litigation holds and preservation orders may not have been top of mind when the major chatbots were developed, every major model provider is now involved in litigation, and they will likely continue to be targets for third-party discovery as well. Most commercially available chatbots and APIs will resist responding to certain topics (such as requests to create harmful content), so similar prompt-based flags may help to identify potentially relevant content earlier. This may be more work at the outset, but it may help create a defensible basis for limiting discovery to a more manageable level, rather than reverse-engineering it mid-lawsuit. Proactive measures can also be a factor in shifting the costs associated with these discovery disputes.
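The front-end tagging idea described above could be sketched roughly as follows. This is a minimal, hypothetical illustration rather than a description of how OpenAI or any other vendor actually classifies logs: the category names, the keyword patterns, and the `LogRecord`, `tag_record`, and `should_preserve` names are all assumptions made for the example, and a production system would likely need far more robust classification than keyword matching.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime, timezone

# HYPOTHETICAL categories and patterns; real ones would come from counsel
# and the scope of the specific legal hold.
HOLD_PATTERNS = {
    "news-content": re.compile(r"\b(paywall|full text of|verbatim article)\b", re.IGNORECASE),
    "source-code": re.compile(r"\b(license header|proprietary code)\b", re.IGNORECASE),
}

@dataclass
class LogRecord:
    prompt: str
    response: str
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    tags: list[str] = field(default_factory=list)

def tag_record(record: LogRecord) -> LogRecord:
    """Attach a tag for every category whose pattern appears in the prompt or response."""
    record.tags = sorted(
        name
        for name, pattern in HOLD_PATTERNS.items()
        if pattern.search(record.prompt) or pattern.search(record.response)
    )
    return record

def should_preserve(record: LogRecord, active_holds: set[str]) -> bool:
    """Exempt a record from routine deletion if any of its tags is under an active hold."""
    return any(tag in active_holds for tag in record.tags)

# Usage: tag at write time, then consult the tags before routine deletion.
flagged = tag_record(LogRecord("Give me the full text of today's front-page story", "..."))
unflagged = tag_record(LogRecord("Write a haiku about autumn", "..."))
active_holds = {"news-content"}
print(flagged.tags, should_preserve(flagged, active_holds))
print(unflagged.tags, should_preserve(unflagged, active_holds))
```

Tagging at the time a record is written means the metadata already exists when a hold arrives, so a company can point to a documented basis for what it preserved and what it let expire.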

Conclusion and Issues to Watch.

The New York Times v. OpenAI preservation order is a significant milestone for AI governance, transforming abstract privacy concerns into immediate operational challenges. For 400 million ChatGPT users and companies who believed their deleted conversations were gone forever, this case shatters basic assumptions about privacy when interacting with AI. For enterprise users, it demands a careful reassessment of vendor relations and AI risk management.

With the New York Times now beginning to search through preserved logs, we can expect the next phase of this litigation to reveal more practical challenges and potential constitutional issues surrounding mass AI data discovery. Such large-scale preservation of user data by AI companies will inevitably invite third-party discovery by litigants seeking to access their adversaries’ preserved AI conversation logs, further implicating data privacy issues and even privilege concerns. OpenAI’s continued appeals also suggest this case may ultimately require resolution at the appellate level, potentially setting nationwide precedents for AI data governance.

The implications extend far beyond courtroom strategy. This preservation order also reveals a new range of issues that will factor into due diligence calculations for M&A transactions, the decision to enter joint ventures, and vendor selections where AI systems process sensitive data. Companies evaluating potential acquisitions must now assess not only an AI system’s capabilities, but its litigation exposure and data retention architecture. Similarly, vendor contracts that seemed prudent yesterday may prove inadequate tomorrow if they lack provisions for discovery cooperation, data segregation, or jurisdiction-specific privacy compliance. The OpenAI case reveals just how connected AI adoption risks are – technical, legal, and reputational – and underscores the fact that traditional governance frameworks may not be fully equipped to handle such complexities.

For organizations navigating this landscape, the lesson is clear: AI integration demands not just innovation strategies, but an understanding of how established legal principles are evolving with emerging technology. Those who recognize this reality and act accordingly will find themselves better prepared for the inevitable discovery challenges ahead.

[1] See generally The New York Times Company, et al. v. Microsoft Corporation, et al., S.D.N.Y. Case No. 1:23-CV-11195 [hereinafter “The New York Times v. OpenAI”], docket information available at https://www.courtlistener.com/docket/68117049/1/the-new-york-times-company-v-microsoft-corporation/; In re OpenAI, Inc., Copyright Infringement Litigation, MDL No. 3143, docket information available at https://www.courtlistener.com/docket/646469/in-re-openai-inc-copyright-infringement-litigation/.

[2] The New York Times v. OpenAI, Order, D.E. 551 (May 13, 2025) [hereinafter “Preservation Order”], available at https://www.courtlistener.com/docket/68117049/551/the-new-york-times-company-v-microsoft-corporation/.

[4] See generally The New York Times v. OpenAI, Complaint, D.E. 1 (Dec. 27, 2023), available at https://www.courtlistener.com/docket/68117049/1/the-new-york-times-company-v-microsoft-corporation/.

[5] Preservation Order, p. 1 (“The deletion of the output log data was first raised with the Court in January 2025 and discussed at the January 22, 2025 conference.”).

[6] Id. p. 2.

[7] The New York Times v. OpenAI, Letter Motion to Reconsider, D.E. 558 (May 15, 2025), available at https://www.courtlistener.com/docket/68117049/558/the-new-york-times-company-v-microsoft-corporation/.

[8] Id. p. 1.

[9] Id. p. 2.

[10] The New York Times v. OpenAI, Order Denying Reconsideration without Prejudice, D.E. 559 (May 16, 2025), available at https://www.courtlistener.com/docket/68117049/559/the-new-york-times-company-v-microsoft-corporation/.

[11] The New York Times v. OpenAI, OpenAI Memo In Opposition (Output Logs), D.E. 578 (May 23, 2025), available at https://www.courtlistener.com/docket/68117049/578/the-new-york-times-company-v-microsoft-corporation/.

[12] Id.

[13] Id.

[14] Id.

[15] The New York Times v. OpenAI, OpenAI Status Letter to the Court, D.E. 587 (May 29, 2025), available at https://www.courtlistener.com/docket/68117049/587/the-new-york-times-company-v-microsoft-corporation/.

[16] Id.

[17] Id.

[18] The New York Times v. OpenAI, OpenAI Objection to the Preservation Order, D.E. 596 (June 3, 2025), available at https://www.courtlistener.com/docket/68117049/596/the-new-york-times-company-v-microsoft-corporation/; see also The New York Times v. OpenAI, News Plaintiffs’ Letter Regarding Sampling Proposal Deadline, D.E. 604 (June 6, 2025) (representing that OpenAI had not provided its sampling proposal by the June 6 deadline), available at https://www.courtlistener.com/docket/68117049/604/the-new-york-times-company-v-microsoft-corporation/.

[19] See OpenAI Objection to the Preservation Order.

[20] Id.

[21] Id.

[22] Id.

[23] Id.

[24] How we’re responding to The New York Times’ data demands in order to protect user privacy, OpenAI (June 5, 2025), available at https://openai.com/index/response-to-nyt-data-demands/.

[25] Id.

[27] How we’re responding to The New York Times’ data demands in order to protect user privacy, OpenAI (June 5, 2025), available at https://openai.com/index/response-to-nyt-data-demands/.

[28] Id.

[29] Id.

[31] See The New York Times v. OpenAI, Order (Oral Argument), D.E. 690 (June 20, 2025), available at https://www.courtlistener.com/docket/68117049/690/the-new-york-times-company-v-microsoft-corporation/.

[32] The New York Times v. OpenAI, Judge Stein Order, D.E. 712 (June 26, 2025), available at https://www.courtlistener.com/docket/68117049/712/the-new-york-times-company-v-microsoft-corporation/.

[34] How we’re responding to The New York Times’ data demands in order to protect user privacy, OpenAI (June 5, 2025), available at https://openai.com/index/response-to-nyt-data-demands/.

[35] US court order preserving OpenAI chats shows importance of discovery planning, Emma Whitford, mLex (May 30, 2025), available at https://www.mlex.com/mlex/artificial-intelligence/articles/2347459/us-court-order-preserving-openai-chats-shows-importance-of-discovery-planning.

DISCLAIMER: Because of the generality of this update, the information provided herein may not be applicable in all situations and should not be acted upon without specific legal advice based on particular situations. Attorney Advertising.

© Nelson Mullins Riley & Scarborough LLP
