This article won the LitigationWorld Pick of the Week.
Of the concepts guaranteed to glaze the eyes of even the most attentive, “data processing” is certainly near the top of the list. Even attorneys who managed to stay awake through such law school gems as Secured Transactions and Estate Planning may struggle with data processing. Part of the problem is that, at first blush, data processing appears more technical than legal.
The prevalence of this myth is unfortunate, and the risks to attorneys who subscribe to it are substantial. This blog post is not intended to be a deep dive into the world of data processing, including the issue of whether all data processing tools are created equal (they are not). More on that topic in a future article, perhaps.
Regardless of which data processing tool you choose to use, there are a few concepts you should understand prior to giving your vendor or outside law firm the go-ahead to begin processing your data. This is clearly not an exhaustive list, but it will get you started:
- Hidden Content
Several applications, notably Microsoft Office applications, allow the user to hide content, including rows, columns and worksheets in Excel, slides in PowerPoint and tracked changes in Word. At a minimum, you will want the processing engine to flag documents with hidden content so you have the option of digging deeper into those documents prior to production. You may also want to “force” hidden content to be visible (at least while you are reviewing documents) to save you the time and cost of manually going through the mouse clicks required to make each piece of hidden content visible.
Often the most confidential or privileged information in a document is located in hidden content. The classic example is the attorney who uses track changes to comment on a draft document. If the track changes view is turned off at the time of collection, the person reviewing the document for relevance may not notice the hidden privileged material. The document is produced, and the receiving party need only turn on track changes to view the attorney’s comments.
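To make the flagging idea concrete, here is a minimal sketch in Python of how a tool might detect tracked changes in a Word document. It is an illustration, not a description of how any particular processing engine works. It assumes the file is a .docx (which is a zip archive), and it only looks for tracked insertions and deletions in the main document body; comments, hidden text, and other Office formats would need their own checks. The function names are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside .docx files.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def count_tracked_changes(docx_bytes: bytes) -> dict:
    """Count tracked insertions and deletions in a .docx file's main body."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    return {
        # <w:ins> marks a tracked insertion, <w:del> a tracked deletion.
        "insertions": sum(1 for _ in root.iter(f"{{{W_NS}}}ins")),
        "deletions": sum(1 for _ in root.iter(f"{{{W_NS}}}del")),
    }


def has_hidden_content(docx_bytes: bytes) -> bool:
    """Flag a document for closer review if any tracked change is present."""
    counts = count_tracked_changes(docx_bytes)
    return counts["insertions"] + counts["deletions"] > 0
```

A reviewer-facing tool could store the result of `has_hidden_content` as a field on each document so flagged files can be searched and routed to a second-level review before production.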
- Exceptions Reporting
So how do attorneys get into trouble here? They assume:
1) we collected everything the employee had;
2) we sent it to the vendor;
3) we reviewed everything the vendor loaded and produced the relevant non-privileged material;
4) therefore we have produced everything that is relevant.
Right? Maybe.
Some of the documents you collect and give to your vendor to load into the review tool will not ultimately make it into the tool. There are several reasons for this, including limitations of the processing or review tools, uncommon file types and corrupted files. Your vendor should be listing files that do not make it into the review tool on an exceptions report and talking through it with you. It is important that someone on your team reviews the exceptions report to determine what errors are occurring and whether the affected files are relevant to your litigation and need to be produced.
For obvious reasons, it would be less than ideal for your vendor to ask you after the close of discovery, “So how would you like us to handle all these files we could not process and load for you to review?”
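The reconciliation behind an exceptions report can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: that you have a list of what was collected, a list of what was loaded, and (optionally) the vendor's stated reason for each failure. The function and field names are hypothetical, and a real report would carry more detail.

```python
def build_exceptions_report(collected, loaded, errors=None):
    """Return one row per collected file that never reached the review tool.

    collected: iterable of file identifiers handed to the vendor
    loaded:    iterable of file identifiers present in the review tool
    errors:    optional dict mapping identifier -> reason given by the vendor
    """
    errors = errors or {}
    loaded_set = set(loaded)
    report = []
    for file_id in collected:
        if file_id not in loaded_set:
            report.append({
                "file": file_id,
                # A missing reason is itself a question to raise with the vendor.
                "reason": errors.get(file_id, "unknown - ask vendor"),
            })
    return report
```

The point of the comparison is that the "collected" and "loaded" counts should reconcile: every file that is in the first list but not the second should appear on the report with an explanation.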
- Embedded Objects
Many applications permit the user to embed data in a file. Examples of embedded content include attachments embedded in the body of an email and an Excel spreadsheet embedded into a PowerPoint. You may choose to flag documents with embedded objects for a later review or choose to extract these embedded objects and attach them to the original “parent” document as attachments. A word of caution if you go with the latter option: you may end up with dozens of “junk” attachments if your data set has a high population of embedded images, e.g. images systematically embedded in the signature block of an email.
It is easy to embed a spreadsheet containing thousands of pages of financial data into a single PowerPoint presentation, even though only one page of that spreadsheet may be visible to the person looking at the slide on the screen. If not treated properly, embedded content has the potential to inadvertently relay privileged or confidential information to the receiving party.
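As a rough illustration of flagging embedded objects, the Python sketch below lists the internal parts of a modern Office file (.pptx, .docx, .xlsx), which are zip archives. It assumes the common layout in which Microsoft Office stores embedded objects in a folder named "embeddings" and images in a folder named "media"; that layout is a convention rather than a guarantee, so treat this as a first-pass flag and not a complete detector. The function name is hypothetical.

```python
import io
import zipfile


def list_embedded_parts(office_bytes: bytes) -> dict:
    """Split an Office file's internal parts into embedded objects and media."""
    with zipfile.ZipFile(io.BytesIO(office_bytes)) as archive:
        names = archive.namelist()
    return {
        # Assumed convention: embedded spreadsheets, documents, etc.
        "embedded_objects": [n for n in names if "/embeddings/" in n],
        # Assumed convention: pictures, including logos and signature images.
        "media": [n for n in names if "/media/" in n],
    }
```

Keeping the two lists separate reflects the caution above: embedded objects usually deserve extraction and review, while extracting every image tends to generate the “junk” attachments.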
- De-Duplication
It is likely your data set contains duplicate documents. You may want to reduce your data volume by de-duplicating (keeping only one copy of each document). De-duplicating will save you from looking at the same document multiple times as you review documents in preparation for production or depositions. If you opt to de-duplicate, consider adding a field that identifies the custodians whose duplicate copies were removed during de-duplication. This is important when you need to know who else may have had a copy of a document, e.g. when preparing for depositions. You may also want to repopulate these duplicates at the time of production so they are included in your production.
Simply put, it is expensive to look at the same document multiple times. However, you should alert the opposing party if you are de-duplicating for review and not re-populating at the time of production.
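A minimal Python sketch of de-duplication with a custodian field follows. It assumes a duplicate means a byte-for-byte identical file, identified by a SHA-256 hash; processing tools may define duplicates differently (for example, by comparing selected email fields), so ask your vendor which method applies. The function and field names are hypothetical.

```python
import hashlib


def deduplicate(documents):
    """Keep one copy per unique content hash and record who else held it.

    documents: iterable of (custodian, name, content_bytes) tuples
    """
    unique = {}
    for custodian, name, content in documents:
        digest = hashlib.sha256(content).hexdigest()
        if digest not in unique:
            # First time this content is seen: keep it as the review copy.
            unique[digest] = {
                "name": name,
                "custodian": custodian,
                "duplicate_custodians": [],
            }
        else:
            kept = unique[digest]
            # Record other custodians whose copies are being suppressed.
            if custodian != kept["custodian"] and custodian not in kept["duplicate_custodians"]:
                kept["duplicate_custodians"].append(custodian)
    return list(unique.values())
```

The `duplicate_custodians` field is what lets you answer "who else had this document?" later, and it is also what you would need in order to repopulate duplicates at production.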
- Parent-Child
Does your processing tool maintain family relationships, i.e. retain the connection between an attachment and the cover document? Sometimes the content of an attachment is less important than the context in which it was emailed from one person to another.
Producing documents without maintaining family relationships is sometimes called “fifty-two card pickup” — for obvious reasons. You can appreciate why your opponent may cry foul when you produce email and attachments separately without a way to marry them up.
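Family relationships are usually carried as a simple link from each attachment back to its parent. The Python sketch below shows one way to group documents into families and to spot attachments whose parent is missing. It assumes each record has a `doc_id` and a `parent_id` (empty for the cover document) and handles only one level of nesting; these field and function names are hypothetical and will differ by tool.

```python
from collections import defaultdict


def group_families(records):
    """Group document IDs into families keyed by the parent document's ID.

    records: list of dicts with "doc_id" and "parent_id" (None for a parent)
    """
    families = defaultdict(list)
    for record in records:
        family_id = record["parent_id"] or record["doc_id"]
        families[family_id].append(record["doc_id"])
    return dict(families)


def orphaned_attachments(records):
    """Return attachments whose parent document is not in the set."""
    ids = {r["doc_id"] for r in records}
    return [
        r["doc_id"]
        for r in records
        if r["parent_id"] and r["parent_id"] not in ids
    ]
```

Running a check like `orphaned_attachments` before production is one way to catch a set that would otherwise go out the door with email and attachments separated.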
- Z-Printing
Look at your data set. If you need to review Excel spreadsheets in image format, perhaps because you need to make manual redactions and do not have access to a native redaction tool, determine whether it is helpful to image the spreadsheets in a “z” pattern. Z-pattern printing means the vendor images the spreadsheet from left to right, first capturing all of the columns associated with a given set of rows, then moving on to the next set of rows. Spreadsheets are nearly always wider than one page, so if you image in a “z” pattern it is faster (and thus cheaper) to review and redact from left to right.
Anecdotally, I would estimate spreadsheets in nine out of ten data sets would be easier to review and redact in Z-printing format. If you get this one wrong, the person drawing redactions will appreciate it all weekend long…
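The difference between the two page orders is easy to see in a small Python sketch. It assumes a spreadsheet is cut into page-sized tiles of a fixed number of rows and columns, and it returns the top-left corner of each page in the order the pages would be imaged. The function name and parameters are hypothetical.

```python
def page_order(total_rows, total_cols, rows_per_page, cols_per_page, z_pattern=True):
    """Return page tiles as (first_row, first_col) pairs in imaging order.

    z_pattern=True  -> across all column blocks for a band of rows, then down
    z_pattern=False -> down all row bands for a block of columns, then across
    """
    row_starts = range(0, total_rows, rows_per_page)
    col_starts = range(0, total_cols, cols_per_page)
    if z_pattern:
        return [(r, c) for r in row_starts for c in col_starts]
    return [(r, c) for c in col_starts for r in row_starts]
```

For a sheet of 100 rows and 20 columns imaged at 50 rows by 10 columns per page, `page_order(100, 20, 50, 10)` returns `[(0, 0), (0, 10), (50, 0), (50, 10)]`, so the two halves of each row sit on consecutive pages. With `z_pattern=False` the result is `[(0, 0), (50, 0), (0, 10), (50, 10)]`, which separates them.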
In summary, data processing is a legal issue and not solely a technical one. As an attorney, if you blindly delegate these data processing decisions to your technical staff or vendor you may later wish you had done your homework first.





Thursday, May 24, 2012