Inside 1 Terabyte Per Hour: Building a High-Performance Pipeline for Heterogeneous Data Processing on AWS

In February 2016, we announced that our data processing software, Digital Discovery Pro, had exceeded a processing rate of one terabyte per hour, a more than 20x improvement over previously known performance. The development team worked closely with Amazon Web Services (AWS) architects to reach this rate, using the AWS platform to supply the necessary compute power and solve a significant pain point in the legal industry.

This technical paper provides an overview of the following:

  • Market need for improved processing
  • Phases of data processing for the legal industry
  • Challenges for improving processing speed
  • Methodology for measuring processing performance