Every week, the Array team reviews the latest news and analysis about the evolving field of eDiscovery to bring you the topics and trends you need to know. This week’s post covers the period of August 10-16. Here’s what’s happening.
Replacing the Enron dataset for eDiscovery training
Recently, Craig Ball and Doug Austin both wrote about the Enron Email Corpus: not only why it has been the standard training dataset for eDiscovery for two decades, but also why it’s time (or past time) to find a replacement that prepares attorneys and legal professionals for “contemporary realities,” as Ball puts it.
First, here’s why the Enron emails work as a dataset. As Austin and Ball write, they are:
- Free and legal to use — the company no longer exists
- Large enough to run effective searches, with more than 500,000 messages from over 100 employees
- Written contemporaneously by real humans in corporate jobs, with real-life quirks like typos
But the Enron emails are also outdated. They don’t address modern eDiscovery challenges like:
- Email from cloud-hosted mail systems. In fact, as Austin writes, the email data originated in Lotus Notes before being converted to Outlook.
- Smartphone content and mobile-generated messages
- Chats from collaboration platforms like Slack, Teams, etc.
- Content from video conferencing apps like Zoom or Webex
- Data with hyperlinked, or modern, attachments
Ball and Austin offer two possible replacements. On his eDiscovery Today blog, Austin said he viewed a demo at ILTACON that used documents from the Opioid Industry Documents Archive, created by the University of California at San Francisco (UCSF) and Johns Hopkins University in 2021. However, the collection can’t be downloaded in its entirety, and the files are converted to PDFs, which prevents users from training for some of the modern challenges listed above.
On his blog, Ball says he used the John Podesta emails related to the 2016 U.S. presidential election as a training dataset while teaching a law school course: “Yes, they were stolen and released without consent. Yes, that’s ethically and legally fraught. But they’re from 2015, structured as PSTs with full headers and attachments, and a suitable size for students—around 50,000 messages. In a controlled educational setting, with disclaimers, they supply a realistic glimpse of the formats, quirks, and challenges of modern email collections that Enron simply can’t provide.”
Judge orders meet and confer after request to preserve RAM
Austin also writes on eDiscovery Today about Belvac Prod. Mach., Inc. v. Adonis Acquisition Holdings LLC, in which the plaintiff alleged that the defendant reproduced copyrighted software in random access memory (RAM) or other temporary storage in violation of the Copyright Act. The judge’s decision on a motion shows why parties should negotiate an ESI protocol, including clear definitions for data to be produced, before going to the court.
Belvac contended that “it is undisputed that each time [Defendant] powers on or reboots the Belvac Equipment … a copy of [Plaintiff’s] embedded, copyrighted programmable logic controller (‘PLC’) Software is made from an SD memory card to RAM for execution of the software, with the RAM copy remaining only until the Belvac Equipment is next powered off.” Belvac filed a motion to enter into an ESI protocol that proposed the defendant should preserve RAM data, other temporary files, and system logs “only if it evidences an infringing act by Adonis under one of the known-to-Adonis scenarios in which reproduction occurs.”
The judge denied the motion without prejudice and said Belvac didn’t meet its burden to show good cause for the preservation of ESI as set forth in its proposed protocol. In addition, the judge said the plaintiff did not define phrases such as “real-time retention” or “known-to-Adonis scenarios,” and noted the parties had not met and conferred on the defendant’s compromise proposal. She also ordered the parties to meet and confer and said, “If no compromise is reached during the meet and confer process, any renewal of the instant motion by Plaintiff should include in the briefing: (1) analogous case authorities, to the extent there are any, regarding the preservation of evidence of the alleged reproduction of copyrighted software; and (2) precise and narrowly contoured language defining the scope of the ESI which must be preserved, with each side’s competing proposal clearly identified for any portions of the proposed protocol that remain in dispute.”