We once thought that large, heavy, top-down InfoGov tools would solve our unstructured data problems. We thought that we would have an easily searchable index of all electronically stored information (ESI) created by our companies, and that we could, with minimal stress or effort, go into those systems to cherry-pick the ESI that we needed for any matter. Sounds dreamy, no?
Reality has not played out exactly as we wanted. What we want is a clear, easy map for finding and collecting data; what we get is a vast, swarming sea of unstructured data. On top of that sea floats a thin layer of metadata. Learning how to work with metadata in ediscovery, and understanding the risks that metadata poses to an ediscovery process, is a crucial part of making your ediscovery work in the world as it is, not as we wish it to be.
What is metadata?
Metadata is basically data about data. It is the library card in the Dewey drawers. Savvy IT teams use metadata to give high level structure to unstructured data, for example to make it findable so that other business processes can be applied to it, including preservation and collection, as well as retention policy purges or archiving.
Metadata is embedded in the data sources themselves (it is part of the email or the PDF file), and it contains things like who created the document, the date it was created, where the document was created, and even what version of software created it. Individuals can edit some metadata fields. In addition, organizations can create metadata fields specific to their own needs, for example automatically recording the business function that created a document, or the status of a document (draft or final).
To get a sense of what metadata is in real life, here are some examples of metadata for common file types:
- PDF: Title of document, author, subject, keywords, created on date, modified on date, application version, location, file size, page size, number of pages.
- Outlook Email: From, to, date sent, subject, attachments, compliance label, storage area (called compound path).
- Teams: Conversation ID, conversation name, contains deleted messages, contains edited messages, custodian name, date.
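To make the email example concrete, here is a minimal Python sketch that reads a few of the fields listed above from a raw message using only the standard library. The sample message, the addresses, and the function name `extract_email_metadata` are hypothetical; a real collection would read preserved native files, and platform-specific fields such as compliance labels are not covered here.

```python
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime

# Hypothetical sample message, used only for illustration.
RAW_EMAIL = """\
From: Alice Example <alice@example.com>
To: Bob Example <bob@example.com>, carol@example.com
Date: Tue, 05 Mar 2024 14:30:00 +0000
Subject: Q1 contract draft
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Please see the draft.
"""


def extract_email_metadata(raw: str) -> dict:
    """Return a small dictionary of header metadata from a raw email."""
    msg = message_from_string(raw)
    return {
        # getaddresses splits header values into (display name, address) pairs.
        "from": [addr for _, addr in getaddresses(msg.get_all("From", []))],
        "to": [addr for _, addr in getaddresses(msg.get_all("To", []))],
        "date_sent": parsedate_to_datetime(msg["Date"]),
        "subject": msg["Subject"],
        # Attachment names are taken from any message part that declares a filename.
        "attachments": [
            part.get_filename() for part in msg.walk() if part.get_filename()
        ],
    }


metadata = extract_email_metadata(RAW_EMAIL)
for field, value in metadata.items():
    print(f"{field}: {value}")
```

Note that the script only reads the message; it never writes back to the source, which is the habit you want from anything that touches potentially responsive ESI.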
Why is metadata important in ediscovery?
Metadata is used not only to identify potentially responsive material by looking at parameters like date ranges and custodian names, but also to ensure that the data has had a sound and defensible chain of custody as it moves from preservation to collection to review and into production.
Metadata is commonly used to do early case assessment (ECA) on a matter. Attorneys and paralegals will run initial searches on potentially responsive ESI to estimate the general size and scope of a matter. This information is then used to make proportionality decisions and to negotiate the scope of discovery in the meet-and-confer process.
Because metadata shows document creation information (who created it, when and where, as well as who modified it and when), it is easy to track high-level changes to native files that might indicate responsiveness. This becomes important when identifying potentially responsive material, as well as when demonstrating defensible chain-of-custody practices. If metadata is handled improperly, the result can be data spoliation and potential sanctions, as well as increased discovery time and cost.
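As a rough illustration of that kind of initial scoping search, the sketch below filters a list of item metadata by custodian and date range, then reports the count and total size. The `ItemMetadata` record, the sample items, and the function names are assumptions made for this example; a real ediscovery tool would query its own index rather than an in-memory list.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ItemMetadata:
    """Hypothetical minimal metadata record for one item of ESI."""
    custodian: str
    created: date
    size_bytes: int


def scope_matter(items, custodians, start, end):
    """Return items whose custodian and creation date fall within scope."""
    wanted = {name.lower() for name in custodians}
    return [
        item for item in items
        if item.custodian.lower() in wanted and start <= item.created <= end
    ]


def summarize(items):
    """Estimate the size of a matter from metadata alone."""
    return {"count": len(items), "total_bytes": sum(i.size_bytes for i in items)}


# Hypothetical sample data.
items = [
    ItemMetadata("alice", date(2023, 1, 15), 1000),
    ItemMetadata("bob", date(2023, 2, 1), 2000),
    ItemMetadata("alice", date(2022, 12, 31), 500),
    ItemMetadata("carol", date(2023, 1, 20), 4000),
]

in_scope = scope_matter(items, ["Alice", "Bob"], date(2023, 1, 1), date(2023, 3, 31))
summary = summarize(in_scope)
print(summary)
```

The estimate is only as good as the dates and custodian names it relies on, which is why inaccurate metadata distorts scope and proportionality decisions.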
Finally, metadata can be used as evidence to authenticate authorship, creation date, and creation location. Without metadata, this authentication would have to happen with sworn testimony. Metadata streamlines the process of fact pattern creation in litigation.
What are the metadata risks for ediscovery?
Metadata spoliation is a large risk for ediscovery teams. Identification, preservation, and collection are the basis for all of the review activities that follow. Any mishandling of metadata at earlier stages in ediscovery creates large downstream risk and potential cost. Once metadata is altered or destroyed, it's gone. Metadata spoliation cannot be undone.
Many companies with immature ediscovery practices rely on employees or custodians to self-identify and self-collect potentially responsive ESI. Self-collection is a major cause of data spoliation. Established case law highlights the risks of self-collection, namely Leidig v. BuzzFeed and National Day Laborer Organizing Network, et al v. U.S. Immigration and Customs Enforcement Agency, et al. Based on this case law, some of the negative consequences of spoliation caused by self-collection may be:
- Litigants may be precluded from using metadata as evidence, and will be required to provide independent evidence, through testimony
- Litigants may be directed to complete additional – and costly – discovery
- Fines may be imposed
Frequently, a discovery demand contains language requiring “all ESI in custody or control related to X matter.” Metadata is part of “all ESI.” If there has been metadata spoliation because of poor preservation and collection practices, and you are unable to meet the production obligations, you may be sanctioned.
If accurate metadata is not available to run early case and scope / scale assessments, the result may be over-preservation or over-collection. Not only are over-preservation and over-collection expensive (the billable hours add up in review), but preserving data that should be deleted according to your organization's data retention policies creates more legal risk in the future.
In the review stages, if files don’t have accurate metadata, it will be impossible to effectively search, chronologically sort, or de-duplicate files. This creates problems during the review itself.
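The dependence on accurate metadata is easy to see in a small sketch. The Python below de-duplicates documents by a SHA-256 hash of their content and then sorts the survivors by sent date. The document dictionaries, field names, and `dedupe_and_sort` function are hypothetical, and hashing content alone is an assumption of this example; review platforms typically use their own de-duplication rules.

```python
import hashlib
from datetime import datetime


def content_hash(data: bytes) -> str:
    """Fingerprint a document by its content."""
    return hashlib.sha256(data).hexdigest()


def dedupe_and_sort(docs):
    """Keep the earliest copy of each distinct document, in chronological order.

    Each doc is a dict with 'name', 'sent' (datetime), and 'content' (bytes).
    """
    earliest = {}
    for doc in docs:
        key = content_hash(doc["content"])
        # If two copies share content, the 'sent' metadata decides which to keep.
        if key not in earliest or doc["sent"] < earliest[key]["sent"]:
            earliest[key] = doc
    return sorted(earliest.values(), key=lambda d: d["sent"])


# Hypothetical sample documents; "a" and "b" are duplicates of each other.
docs = [
    {"name": "a", "sent": datetime(2024, 3, 2), "content": b"same text"},
    {"name": "b", "sent": datetime(2024, 3, 1), "content": b"same text"},
    {"name": "c", "sent": datetime(2024, 2, 1), "content": b"other text"},
]

reviewed = dedupe_and_sort(docs)
print([d["name"] for d in reviewed])
```

If the `sent` values were missing or had been overwritten during collection, both the choice of which duplicate to keep and the final ordering would be unreliable.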
Finally, metadata represents a security risk for in-house teams and for outside counsel. Hackers can use metadata to identify vulnerabilities, to find email addresses, to identify who is the attorney and who is the client, or to gather information about a subject's location.
8 Steps to Reduce Risk to Your Metadata
- Never ask employees or custodians to self-collect. Always use a tool that can safely and defensibly collect without altering metadata.
- Ensure that your preservation and collection tools and practices do not modify metadata fields by working with an established, tested ediscovery vendor.
- Ensure that individual custodians cannot manually modify or change metadata fields.
- Work collaboratively with your IT teams to ensure the right metadata is captured for primary ESI sources.
- Work collaboratively with InfoGov teams to ensure that data retention policy fields are created properly.
- Include instructions for metadata preservation in legal hold notices.
- Know when to use forensic collection practices, and when forensics is not needed.
- Know how common systems, like O365, alter metadata in the normal course of business. For instance, the “create date” on an email can be changed by O365 when that email file is moved into the archive. This creates a big problem for ECA as well as for identification of potentially responsive data.
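One way to sanity-check a copy step against the second item in the list above is to snapshot a file's metadata before and after copying and compare the two. The Python sketch below is an illustration only, not a forensic tool: it looks at file size, last-modified time, and a SHA-256 content hash, and it ignores embedded metadata and creation dates. The helper names (`snapshot`, `changed_fields`) and the sample file are assumptions made for this example.

```python
import hashlib
import os
import shutil
import tempfile
from pathlib import Path


def snapshot(path: Path) -> dict:
    """Record content hash, size, and last-modified time for one file."""
    st = path.stat()
    return {
        "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
        "size": st.st_size,
        "modified_ns": st.st_mtime_ns,
    }


def changed_fields(before: dict, after: dict) -> list:
    """List the snapshot fields that differ between two snapshots."""
    return sorted(k for k in before if before[k] != after[k])


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    source = root / "memo.txt"  # hypothetical source file
    source.write_text("Draft terms for review.")
    # Backdate the file so that a changed timestamp is easy to spot.
    os.utime(source, ns=(1_600_000_000 * 10**9, 1_600_000_000 * 10**9))
    before = snapshot(source)

    # shutil.copy2 attempts to preserve timestamps along with the content.
    preserved = Path(shutil.copy2(source, root / "preserved.txt"))
    preserved_diff = changed_fields(before, snapshot(preserved))

    # shutil.copy copies the content but not the timestamps.
    naive = Path(shutil.copy(source, root / "naive.txt"))
    naive_diff = changed_fields(before, snapshot(naive))

print("copy2 changed:", preserved_diff)
print("copy changed:", naive_diff)
```

The content hash matches in both cases, yet the plain copy silently resets the modified time. That is the kind of quiet change that makes self-collection risky.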
How In-House Ediscovery Software Can Help Reduce Risks to Metadata
The primary risk that ediscovery software mitigates is the inadvertent spoliation of data during the preservation, collection, and review stages of ediscovery. Ediscovery software is designed to allow ESI handling and review without altering metadata.
By working with an established ediscovery vendor who has a tool designed for ESI identification, preservation, and collection, you can better and more efficiently perform ECA for scope / scale and proportionality questions, as well as ensure that metadata can be used as evidence and can demonstrate defensible chain-of-custody for your responsive ESI.