What is data mapping?
Data mapping is the process of creating a comprehensive inventory of an organization’s data. Data maps for ediscovery generally include the following:
- what types and formats of data the organization generates, uses, and stores;
- where that data is stored;
- who is in charge of that data; and
- when it should be archived or deleted.
Despite the name, a data map need not be a graphical representation of data. A simple list or a spreadsheet is often more useful.
Data mapping is critical for ediscovery because awareness that information exists is a predicate to any downstream ediscovery process. Simply put, an organization cannot preserve, collect, process, review, or produce information unless it is aware that it has that data. Knowing what data exists, where it is, and which custodians manage it is therefore a key first step in information governance and litigation readiness.
Some degree of data mapping is required by the Federal Rules of Civil Procedure. Rule 26(a)(1)(A)(ii) requires parties to provide, as part of their initial disclosures and without awaiting a discovery request, “a description by category and location of all documents [and] electronically stored information” that the party has and “may use to support its claims or defenses.”
When mapping or inventorying data, organizations should consider three types of questions. First, an organization must assess what types of data it generates or uses. This involves searching for information such as:
- hard-copy documents, handwritten notes, or printed manuals;
- emails and other forms of correspondence;
- voicemails or voice recordings;
- text messages;
- instant messages;
- conversations from collaboration or project-management applications;
- database information;
- text from websites and social media;
- photos and videos; and
- data generated by connected sensors or Internet of Things devices.
Second, the organization should determine where its data lives and where it should look for additional sources of potentially discoverable information. For example, data may be generated and stored on:
- local laptop or desktop computers;
- internal servers or network drives;
- cloud storage accounts;
- file hosting services or vendor-provided storage systems;
- local backup storage devices such as hard drives, thumb drives, backup drives, or CDs;
- mobile applications;
- individually owned devices such as cell phones, tablets, laptops, and even home computers, especially — but not exclusively — if an organization uses a BYOD (bring-your-own-device) policy;
- websites or social media accounts; and
- legacy systems or hardware devices that are no longer in active use.
Finally, as an organization determines the categories of information it generates and uses and the locations where it may find relevant data, it should also assess what it needs to know about that data. The “meta-information” worth tracking on a data map might include:
- as mentioned above, the location of each data type;
- the volume of each data type that is generated;
- the custodians who generate, use, and manage that data;
- the form or format of each type of data;
- the record-retention requirements for each type of data, indicating when it can or should be moved to an archive system or deleted entirely; and
- the purpose or use of each different type of data.
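Because a data map is often kept as a simple list or spreadsheet, the fields listed above can be pictured as the columns of one row per data type. The following is a minimal sketch in Python, not a prescribed format: the class name `DataMapEntry`, the field names, and the sample rows are all hypothetical and chosen only for illustration.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class DataMapEntry:
    """One row of a data map. Field names are illustrative, not a standard."""
    data_type: str        # e.g. email, text messages, database records
    location: str         # where the data is stored
    approx_volume: str    # rough volume generated
    custodians: str       # who generates, uses, and manages the data
    data_format: str      # form or format of the data
    retention_rule: str   # when it can or should be archived or deleted
    purpose: str          # why the organization keeps or uses the data


def to_csv(entries):
    """Render entries as CSV text that can be opened in a spreadsheet tool."""
    buffer = io.StringIO()
    column_names = [f.name for f in fields(DataMapEntry)]
    writer = csv.DictWriter(buffer, fieldnames=column_names)
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))
    return buffer.getvalue()


# Hypothetical sample rows, invented purely for illustration.
sample_entries = [
    DataMapEntry(
        data_type="email",
        location="cloud mail service",
        approx_volume="about 2 GB per month",
        custodians="all employees; IT administers",
        data_format="mailbox export",
        retention_rule="archive after 2 years",
        purpose="business correspondence",
    ),
    DataMapEntry(
        data_type="voicemail",
        location="internal phone server",
        approx_volume="about 100 MB per month",
        custodians="sales team; IT administers",
        data_format="audio files",
        retention_rule="delete after 90 days",
        purpose="customer communication",
    ),
]

data_map_csv = to_csv(sample_entries)
```

Keeping each entry as a flat row is a deliberate choice here: it mirrors the spreadsheet form and makes the map easy to sort or filter by location, custodian, or retention rule.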
A data map, like a store inventory, cannot be created once and forgotten. A functional data map should be a living document that is monitored and maintained over time. Data sources change, custodians retire or switch positions, and new data streams or storage devices are added on an ongoing basis. Organizations that regularly revisit and update their data maps are better prepared to answer external ediscovery or regulatory compliance document requests and can respond promptly to the threat of litigation with their own early case assessment.
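One way to keep a data map from going stale is to record when each entry was last reviewed and to flag entries that have not been revisited recently. The sketch below assumes a hypothetical `last_reviewed` date on each entry and an arbitrary 180-day review interval; neither comes from any rule or standard, and both the names and the sample records are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class ReviewRecord:
    """Hypothetical review-tracking fields for one data map entry."""
    data_source: str
    custodian: str
    last_reviewed: date


def overdue_for_review(records, today, max_age_days=180):
    """Return records last reviewed more than max_age_days before today.

    The 180-day default is an arbitrary assumption, not a requirement.
    """
    cutoff = today - timedelta(days=max_age_days)
    return [record for record in records if record.last_reviewed < cutoff]


# Hypothetical sample records.
records = [
    ReviewRecord("Shared network drive", "IT operations", date(2024, 1, 15)),
    ReviewRecord("Team chat workspace", "Legal operations", date(2024, 9, 1)),
]

stale = overdue_for_review(records, today=date(2024, 10, 1))
for record in stale:
    print(f"Review needed: {record.data_source} (custodian: {record.custodian})")
```

Passing `today` explicitly rather than reading the system clock inside the function keeps the check reproducible, which is useful if the same review is run again later for audit purposes.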
Data mapping involves creating a comprehensive inventory of an organization’s potentially discoverable data. Data maps for ediscovery should generally include the types and formats of data that the organization has as well as the locations, custodians, and record-retention requirements of that data.