
Towards Developing a Proposed TAR Framework

Recently, ACEDS hosted a webinar entitled “Point|Counterpoint: A Proposed TAR Framework,” featuring a stellar panel of lawyers: Redgrave’s Christine Payne and Kirkland & Ellis’ Michele Six represented the defense bar, while Suzanne Clark and Chad Roberts of eDiscovery CoCounsel represented the plaintiffs’ bar. Retired US Magistrate Judge James C. Francis IV, now with JAMS, agreed to moderate the discussion. The presentation raised the question of whether a scorecard or similar results-oriented framework could give practitioners a way to assess the success of a TAR project.

Because we had so many attendees, there were too many questions to answer during the webcast. The panelists graciously agreed to add their voices to this blog post and answer the questions we did not get to during the webinar. Note that, where appropriate, ACEDS combined similar or related questions so they could be answered more efficiently.


What is the difference between data analytics and TAR?

PAYNE & SIX: The folks who can really answer this question are the vendors who control the marketing of all this.  But to me, what sets TAR apart from general data analytics is the review piece – anything that makes a predictive guess at the way in which a document should be categorized for review (responsive/not responsive, priv/not priv, confidential/not confidential) counts as TAR.

ROBERTS & CLARK: “Analytics” is a non-specific term that has evolved in vendors’ marketing vocabulary to describe a variety of products and functionality. I think of it as a very broad bucket of functions and features with special purposes, for example: conceptual clustering, similar document detection, keyword expansion through semantic similarity, data visualization, PII identification and redaction, etc.  Generally, these are features that help you understand the data you have or that apply a process to your data set. TAR, as Ms. Payne points out, is specific to the machine learning process, where humans interact with the machine and the subjective judgments they make are then replicated across a larger set of documents.

Are you recommending TAR pre or post search terms?

PAYNE & SIX:  A hot debate!  Some courts say no search terms before TAR.  A well-intended but misguided law review article says no search terms before TAR.  Most data scientists, however, will say that you can absolutely use search terms before TAR.  I think that’s the right answer, as long as you are designing everything as a system.  In other words, it doesn’t make sense to design search terms for attorney review and then change your mind and uncritically layer TAR on top of that.  The beauty of the report card model, however, is that it doesn’t matter.  I repeat—it doesn’t matter.  Do search terms, TAR, search terms again and then put a cherry on top.  Just know that at the end of the process, there will be an objective report card waiting for you, and so you will need to demonstrate that your process—whatever it was—actually worked.

ROBERTS & CLARK:  It is a hot debate! If you are approaching the task as an information retrieval scientist, you would never want to use search terms to cull a data set before a TAR process, because it can be shown that the overall (“end-to-end”) recall rate will be degraded, sometimes very significantly. So, don’t use search terms in an attempt to make the outcome “better” by increasing the “richness” (technically speaking, the prevalence) of the data set, thinking you will improve the quality of the production. In the workaday world outside of information retrieval contests, pre-culling the data set with search terms is simply done to reduce the storage costs incurred in hosting the data set in the TAR review platform in the first place.  There is a good explanation of this in Grossman & Cormack, Comments on “The Implications of Rule 26(g) on the Use of Technology-Assisted Review,” 7 F.C.L.R. 1 (2014).

Before TAR, parties never needed to be transparent about how they reviewed documents. What requires more transparency now?

PAYNE & SIX: Case law.  Proponents of TAR offered up unprecedented transparency and cooperation in the early days to expand adoption.  It then became part of the case law.

ROBERTS & CLARK:  Transparency in electronic discovery methodology pre-dates TAR, actually. Here’s a great case about it: Google v. Samsung, 2013 WL 1942163 (N.D. Cal. 2013). To put a fine point on it, though: because the question is about “how they reviewed documents,” review protocols (instructions to document reviewers about coding criteria) are typically couched as work product, but the overall methodology of review (TAR v. linear, etc.) is not necessarily work product or privileged.

In some ways search terms are like a very simple TAR model. Recall and precision can apply to search terms as well, but I’ve never seen those metrics negotiated for terms. What do you think are the primary reasons receiving parties expect and want so much more from TAR?

PAYNE & SIX: Case law.  See above.

ROBERTS & CLARK:  I think it is because TAR workflows were primarily responsible for introducing the notion of quality metrics into electronic discovery production. This happened when the early adopters and their data scientists were defending the methodology during contested hearings, and quality metrics were used as reassuring support for the process.  Quality metrics are now becoming firmly entrenched in search term validation as well. Here’s a wonderfully written opinion on just that subject: City of Rockford v. Mallinckrodt ARD Inc., No. 17 CV 50107, No. 18 CV 379 (N.D. Ill. Aug. 7, 2018).

Why do lawyers believe manual linear review produces a better production?   Recall and precision should be applied to both linear review and TAR.  Linear review gets a pass.

PAYNE & SIX:  I would put one of my attorney reviews up against any TAR tool in the country.  I look at the conditions of the studies that conclude TAR is at least as good as manual review (they don’t go further than that), and I think, “Wow, that was a very poorly designed manual review.”  But to answer your actual question, the big difference is case law.  TAR requires transparency and cooperation; attorney review does not.  The idea of the report card is that everyone would have to fill it out, regardless of method.  So, it evens the playing field.

ROBERTS & CLARK:  Unless you’re willing to send things out the door without looking at them (not as uncommon as you might think), TAR simply generates a smaller, more productive “linear review.” Even TAR 2.0 (Continuous Active Learning) is a linear review of sorts, with the CAL workflow simply helping you decide when you’re finished and when you can defensibly stop looking at more documents. As a practical matter, the best workflows typically have some measure of TAR, search terms, and human review as components. Just like Garry Kasparov v. Deep Blue, Dave the Astronaut v. HAL, or John Henry v. The Steam Drill, TAR v. linear review is just another in a long line of Man v. Machine contests.

What is the best way to describe the difference between precision and recall? Also, what is the recommended way to calculate them?

PAYNE & SIX: Ah … this is where I call my data scientist friends to make sure I’m not talking out of school.  We provided a definition that was checked and rechecked in our article.

ROBERTS & CLARK:  Assume a certain information retrieval strategy is used to challenge a large data set and retrieve responsive items. Precision is the percentage of the retrieved set that is actually responsive; it is a measure of how accurate the retrieval strategy is. Recall is the percentage of all the responsive items in the data set that were retrieved; it is a measure of how complete the retrieval strategy is. Generally, for any given retrieval strategy, precision and recall have an inverse relationship: maximizing precision tends to diminish recall, and maximizing recall tends to diminish precision.

Imagine a search strategy in a business dispute that uses a single search term to identify responsive documents. You could choose a term (like the name of the adverse party) that would have high precision (i.e., nearly all of the documents retrieved are responsive). But that methodology would certainly leave very many responsive documents un-retrieved (low recall). Requesting parties tend to be interested in recall (completeness).  Producing parties tend to be interested in precision (cost reduction).
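To make the arithmetic concrete, here is a minimal sketch, in Python, of how those two percentages are computed from a retrieval result. The counts are invented purely for illustration and do not come from any real matter.

```python
# Hypothetical counts, invented purely for illustration -- not from any real matter.
total_responsive_in_collection = 10_000  # all responsive documents in the data set
retrieved = 12_000                       # documents the retrieval strategy returned
responsive_retrieved = 9_000             # retrieved documents that are actually responsive

precision = responsive_retrieved / retrieved                    # how "clean" the retrieved set is
recall = responsive_retrieved / total_responsive_in_collection  # how complete the retrieval is

print(f"Precision: {precision:.0%}")  # 75%
print(f"Recall:    {recall:.0%}")     # 90%
```

In a real matter, of course, the total number of responsive documents in the collection is unknown and has to be estimated by sampling, which is what the validation discussion below is about.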

Can the presenters explore the idea that almost all of these metrics are on a “curve” – precision falls (sometimes fast) as recall increases; recall plateaus (sometimes uncomfortably early) if review is for several concepts at once.

PAYNE & SIX:  Someone smarter than me will have to say definitively, but I really don’t know if you can calculate recall effectively with multiple concepts.  And yes, you can get 100% recall by just selecting the entire data universe—your precision will be terrible.  Ideally, with an effective review, you’d have close to 100% recall (getting all the good stuff) and also 100% precision (keeping out all the junk).  But that’s not realistic under any model, and each case is going to be different.  Some parties may be willing to wade through more junk to ensure they are getting everything they need.  Other parties may want a more precise set.  It’s going to be a case-specific question.

ROBERTS & CLARK:  There is definitely a sweet spot in the trade-off between precision and recall. There is another metric, known as “F1,” that attempts to give an indication of this; it is the harmonic mean of precision and recall. Let your machine do the math, but the math is not as scary as it looks, as explained on its Wikipedia page: https://en.wikipedia.org/wiki/F1_score
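For readers who want to see the calculation the panelists are pointing to, this is a small illustrative sketch of the harmonic mean; the input numbers are the invented ones from the earlier example.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (defined as 0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Using the invented figures from the example above:
print(f"F1: {f1_score(0.75, 0.90):.2f}")  # 0.82
```

The harmonic mean punishes imbalance: a strategy with 99% recall but only 10% precision scores far worse on F1 than one with 80% of each.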

For recall/precision, are you only concerned with the results of the machine learning algorithm, or can you calculate these metrics to validate the efficacy of your overall workflow? Sometimes it makes sense to use more than one methodology.

PAYNE & SIX: Overall.

ROBERTS & CLARK:  Yes, it should be overall or “end-to-end,” but oftentimes a producing party can never truly calculate overall recall if they use a culling methodology that actually discards information from the data set, because that makes it impossible for them to sample from the entire data set.
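To illustrate why the distinction matters, here is a rough, hedged sketch: end-to-end recall is approximately the product of each stage’s recall, so a keyword cull in front of TAR drags the overall number down even when each stage looks respectable on its own. The stage-level figures below are invented for illustration.

```python
# Invented stage-level recall figures, for illustration only.
recall_keyword_cull = 0.80  # share of responsive documents that survive the search-term cull
recall_tar = 0.85           # share of the surviving responsive documents the TAR process finds

end_to_end_recall = recall_keyword_cull * recall_tar
print(f"End-to-end recall: {end_to_end_recall:.0%}")  # 68%, even though each stage looks respectable
```

And, as noted above, if the documents cut by the keyword stage are discarded outright, the first factor can never actually be measured.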

Is the sample the elusion sample at the end, a.k.a. the validation sample, taken after review is completed or proposed to be completed?

PAYNE & SIX: I assume most practitioners would be doing in-process testing, but that would not be for the report card.  The report-card sampling would be final validation only. 

ROBERTS & CLARK:  The best outcomes have mid-course assessments (including elusion testing) established in the workflow.
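For readers unfamiliar with elusion testing, here is a simplified sketch of the arithmetic behind a validation sample drawn from the null set (the documents the process predicts are non-responsive). The numbers and the simple point-estimate approach are illustrative assumptions only; a real validation protocol would use case-specific sample sizes and confidence intervals.

```python
# Illustrative numbers only -- not a prescribed protocol.
null_set_size = 500_000    # documents the process predicts are non-responsive
sample_size = 2_000        # random sample drawn from the null set and reviewed by humans
responsive_in_sample = 10  # responsive documents the reviewers find in that sample

# Elusion rate: estimated share of responsive documents hiding in the null set.
elusion_rate = responsive_in_sample / sample_size  # 0.5%
estimated_missed = elusion_rate * null_set_size    # ~2,500 documents left behind

# Point estimate of recall, given the documents already identified as responsive.
produced_responsive = 50_000
estimated_recall = produced_responsive / (produced_responsive + estimated_missed)

print(f"Elusion rate:     {elusion_rate:.2%}")      # 0.50%
print(f"Estimated recall: {estimated_recall:.0%}")  # ~95%
```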

Shouldn’t any producing party be required to test the null set, disclose any marginally relevant documents, and engage in a dialogue as to whether additional search refinement is required?

PAYNE & SIX:  Maybe?  The idea of the report card is to allow both parties to have objective metrics and drive dialogue that way.  And they may agree, or the court may order, that any responsive documents in the sample set be produced and reviewed further.  But I don’t think there’s a one-size-fits-all answer to this question. 

ROBERTS & CLARK:  Any responsive document should obviously be produced, regardless of where in the workflow it’s found. However, TAR does not necessarily try to leave behind documents of diminished relevance; it only leaves behind documents that it predicts its human trainer is less likely to tag as “responsive.”  So very highly relevant documents can have just so-so predictive rankings, and vice versa. This is one of the least understood notions about TAR.

My understanding is that it’s unwise to make promises about the recall rate a producing party will achieve when you don’t know at the outset what the data set contains. I think it’s a worrisome approach.

PAYNE & SIX:  Agree completely.

ROBERTS & CLARK:  Yep. This is why it should be an iterative approach undertaken by reasonable people.

Could you say that effective/passing grade TAR is dependent to some extent on the application? Is improvement of AI expected to improve reliability/report card score?

PAYNE:  I don’t know, ask me in 10 years when we’re all driving flying cars.

ROBERTS & CLARK:  I have less confidence now with the proliferation of “TAR” features in a lot of platforms. The original applications were facing tremendous scrutiny and had quality features cooked into them that tempered the way in which the machine learned to avoid bias and dead-ends. New applications may be more “juiced up” with more of an emphasis on precision as opposed to recall integrity. Theoretically, a solid validation procedure would compensate for this, but these platforms remain largely unregulated and without objective measurements of accuracy. The core math engines of an AI application are in the public domain; most Information Science grad students could build some type of crude TAR application in their mom’s garage.

What value would go into the cell that corresponds to precision horizontally and sampling method vertically?

PAYNE & SIX: That would be a text-based answer describing the sampling method for selecting the set of documents designed to test for precision.  It comes from the responsive set, not the null set.

Marginally relevant documents could mean thousands more documents to deal with. When do you know where to draw the line?

PAYNE & SIX:  That’s a question that every review has to wrestle with, regardless of the methodology used.  You have to have a strong, defensible approach to determining responsiveness, and you have to train your people/computer thoroughly.

ROBERTS & CLARK:  It is the stuff of probabilities and likelihoods, not absolutes. Which is why “defensibility” is an easier lift if the requesting party has a seat at the table when the methodology is being designed. Just sayin’.

Can you use TAR as an ECA strategy to fine-tune keywords and document demands?

ROBERTS & CLARK:  Yes! The use of TAR workflows to exploit evidence is limited only by your creativity and imagination.

Don’t TAR and the use of the report card (or any technical analysis of outcomes) place an inordinate amount of pressure on judges, many of whom are not knowledgeable about such technical analysis?

HON. JAMES C. FRANCIS IV: Sure, it creates pressure, but pressure to develop technical competence is not a bad thing. My concern would be with judges who don’t appreciate the limits of their knowledge and who might, for example, assume that the failure to meet a pre-set recall rate necessarily demonstrates that the producing party conducted an inadequate search or, worse, acted in bad faith. Judges need to understand that there are complexities behind the simple numbers on a report card and to be prepared to address them.

PAYNE & SIX: That would be a great question for Judge Joe Brown in Nashville, who is semi-retired and became a folklore hero of the eDiscovery world for his frequent use of animal metaphors in TAR-related rulings.  I don’t get the sense that he ever thought he’d preside over TAR-related litigation, but he did and we have the cougar/raccoon/horse stories to prove it.  The truth is that TAR is headed for judges no matter what.  The report card is designed to give everyone—judges included—an objective framework to cling to.

ROBERTS & CLARK:  I love the notion of the report card.  As a complete substitute for transparency and collaboration, not so much.


Thank you again to our presenters for taking the time to present not only on the webinar, but to answer these questions here on the ACEDS blog.


Legal Tech: The Intersection of E-Discovery and Cybersecurity: You’ve Come a Long Way, Baby

Data is an asset and a liability. It fits into both accounting columns, and it will surely be used against a corporate entity if it is not secured properly. Databases contain trade secrets, personally identifiable information, HIPAA-protected health care information, proprietary information and classified data. They also house sensitive information and evidence of liability or criminal behavior. As the size of databases grew, one thing became apparent: the information stored in those repositories had to be kept secure. As the importance of data became more evident, so did the importance of information security and cybersecurity.

Lawyers and cybersecurity experts were forced together as soon as employees had access to the internet. Before data breaches became the norm, the ugly secret in the IT closet was the amount of pornography in databases. Employees were searching for pornographic materials at work, from their work desktops, and they seemed to believe that no one would ever find out. Unfortunately for them, when lawyers conducted ediscovery for investigations and litigation, they uncovered large volumes of pornography in their clients’ databases. Attorneys were obligated to inform corporate executives of this behavior, including the who, what and when. It was not long before firewalls were installed to block pornographic websites and other nefarious sites.

Lawyers routinely battled over the discovery of electronic data and how to get more data from adversaries in court. Receiving more data also meant reviewing more data. Lawyers reviewed data by looking at every document for relevance and privilege. But what good is it to pore over documents and strategically produce data if a hacker can breach your client’s database, exfiltrate all of the most sensitive data and post it on the dark net? Lawyers needed information security and cybersecurity experts to help block that kind of access.

Meanwhile, the military and intelligence community were light-years ahead of lawyers. They compiled classified data and kept it from being compromised. The IC was aware of the value of sensitive intelligence data and the hazards of that data falling into the wrong hands. Thus, the military created cybersecurity tools and protocols within the Air Force Computer Emergency Response Team in the late 1990s — primarily network defense tools. Lawyers were largely unaware of and had no access to them, but as corporations and other governmental agencies started looking for ways to protect their most valuable assets, they had to turn to the U.S. government for help. The two professions rarely speak the same language but have the same goals and are often in the room at the same time. For information security and data security, the federal government led the way, with corporations following closely behind, leaving only law firms still lagging.

In 2002, Congress enacted the Federal Information Security Management Act (FISMA), 44 U.S.C. § 3541, et seq. As part of the E-Government Act of 2002, FISMA created the foundation for information security in the federal government and recognized the importance of InfoSEC to the economy and national security.

As data grew, federal CIOs recommended moving data to the cloud to reduce the government’s on-site data storage and risk. Lawyers, however, did not agree. Lawyers tend to be risk averse, unfamiliar with cybersecurity and very busy. They had no intention of pushing their data outside of their agencies. There was one exception: the Department of Justice has had a contract known as Mega for litigation support for over 20 years. Early on, the Mega contractors were primarily defense companies like Lockheed Martin and CACI. The DOJ controlled the environment and worked seamlessly with the Mega contractors for a couple of decades. All federal agencies could use that contract for litigation support help. It was convenient because the security component was handled by the DOJ and the contractors were in the defense business.

However, by 2010, federal agencies were looking to upgrade their ediscovery platforms to more modern and robust tools only available in the cloud. Law firms and corporations were using ediscovery vendors to host robust and revolutionary software applications in their environment. Technology-assisted review, computer-assisted review and predictive coding became the norm for the private sector. These tools were innovative and saved time and money, but for the private sector, there was no standard security protocol for hosting third-party data. In fact, while each vendor follows some form of security protocol today, there is still no standard in the private sector. Vendors cobble their security programs together based on ISO and NIST publications.

In 2011, the Office of Management and Budget authorized, via memorandum, the Federal Risk and Authorization Management Program (FedRamp), and the FedRamp Program Management Office was established in 2012. The purpose of FedRamp was to provide a set of guidelines and protocols for securing government data in the cloud. A FedRamp authorization consists of 170+ controls and subcontrols that secure cloud infrastructures, networks and databases. Many of these controls are policies. The bulk of data in agencies that investigate and litigate is used by attorneys. To avoid breaches of legal data, federal agencies locked down their data behind firewalls.

FedRamp authorization allowed federal agencies to put their data in the cloud, but it was an expensive and painful process for those with no knowledge of cybersecurity. Until this year, only three ediscovery companies had made it through the FedRamp authorization process. Meanwhile, data breaches were becoming a common occurrence.

If you are an attorney and you need ediscovery tools, having them behind the firewall of your corporation, firm or agency is no longer the best option. Maintaining the technical expertise, budget and infrastructure needed to manage terabytes and petabytes of legal data in-house is not usually feasible. Multinational organizations and financial institutions are the only entities that can support such infrastructure, and most of them still use cloud-based vendors for ediscovery.

The best cybersecurity experts come straight out of the government. They are in our armed forces, the intelligence community and entities that include DHS and the White House, and they have been dedicated to protecting our government networks from attack. Therefore, using a FedRamp-authorized vendor is turning out to be the best option for agencies. The FedRamp guidelines work as private sector guidelines too. Legal departments, CISOs and vendors are working together to meet the FedRamp guidelines to build secure environments for the tools of their choice.

The fight to keep data safe has become an extremely complex and expensive endeavor. A 2019 study by Emsisoft reported that in 2019 at least 966 health care providers, government agencies and educational institutions in the U.S. were targeted by ransomware attacks. See The State of Ransomware in the US: Report and Statistics 2019 (Dec. 12, 2019). The cumulative cost of those attacks to taxpayers was more than $7.5 billion. Id. The number of attacks on law firms and corporate legal departments is also increasing and jeopardizing attorney-client privilege. Let us look at some recent data breaches and what could have prevented them.

Federal Breach: OPM

In 2014 and again in 2015, the U.S. government discovered the theft of all personnel security clearance information, including background investigation files and fingerprints. The attackers gained valid user credentials and employed malware that installed itself onto the Office of Personnel Management’s network and established a back door. More than 20 million records were exfiltrated. The Chinese government reportedly stole the entire database. The fallout from this breach is so wide-reaching that we may not know just how many Americans were targeted after China analyzed the data. Basic cyber hygiene could have helped prevent, identify and detect the initial attack in the early stages, before the hackers had open access to OPM’s network for almost 18 months. Routine patching, user awareness and trained network defenders would have significantly reduced risk. Also, enhanced protections and monitoring around the OPM security file database could have reduced the damage and the exposure of millions of U.S. government employees’ security files.

State Breach: IDES

The Illinois Department of Employment Security contracted with a vendor to launch the Pandemic Unemployment Assistance Portal as an add-on to its unemployment system. The new PUA went live in May 2020. A few days later an outside entity discovered that a spreadsheet with the names, addresses and Social Security numbers of Illinois unemployment applicants was publicly visible on the website. Approximately 32,500 applicants’ personally identifiable information was exposed. This breach has been referred to by officials as a “glitch.” Free credit monitoring services are being offered to the victims.

New IT projects need to be put through an information assurance process, and data projects require quality assurance processes. A good IA process checks all the risks associated with the hardware, the software and the implementation of both; it is during the IA process that any open portals should have been discovered. A good quality assurance program checks all permissions and access for data, and it would have discovered PII that was public facing. Neither process worked on this project. Contractors need to include these assurances before turning over a new system. The client must be involved and needs to see the results of both processes before going live.

Law Firm Breach: GSMS

Recently, Grubman Shire Meiselas & Sacks, a New York entertainment law firm to the stars, was hit with a ransomware attack. The attackers allegedly demanded 12 bitcoins for the decryption key. At the time of this writing, 12 bitcoins convert to about $111,265 — not a lot of money to a New York law firm. However, approximately 750 GB of attorney-client privileged data was also being offered on the Internet to the highest bidder. Ransomware is a particularly vicious cyberattack because it shuts your business down, destroys goodwill and breaches client trust. Law firms have been especially slow to seek out cybersecurity and information security experts before they get attacked. At least five law firms were hit with the so-called Maze ransomware in January 2020 alone.

Basic user awareness can help block ransomware. Initial attacks usually come in via phishing messages, phone calls and text messages. Never give up sensitive information or click on links or attachments from unknown senders. Security filtering and scanning of inbound email to law firms should be in place and should only allow trusted file types. Finally, routine security updates for endpoint machines, mobile devices and servers need to be performed to close vulnerabilities.

E-Discovery Vendor Breach: Epiq Global

In February 2020, Epiq Global — an ediscovery vendor with 80 offices worldwide — was the victim of big-game hunting, a practice where Ryuk ransomware attackers go after large enterprises. Epiq Global hosts client data and third-party data for law firms and corporations. The attack followed a format usually used by the Ryuk attackers: A phishing scheme gathers administrator and user credentials to gain access to the network. This opens the door to spying, encrypting data and exfiltrating it or demanding a ransom and extorting the victims. Law firms and corporate legal departments around the world were impacted. The big question for law firms is whether these vendor breaches violate attorney-client privilege.

It comes down to end-user awareness, basic cyber hygiene and information separation as well as partitioned access for sensitive data. End users need to be able to identify potentially malicious messages and alert their cybersecurity team. If one user identifies a malicious message, there are likely nine other staff receiving the same message. Building an alert culture is key to helping secure sensitive data. Additionally, separating key databases and putting up enhanced protections such as access control and monitoring will help detect and identify anomalous behavior. Administrators must use a separate account and a separate machine for troubleshooting and maintenance of the crown jewel datasets. Finally, two-factor authentication for all users greatly reduces risk of user and administrator accounts being compromised.

As we see the ransomware attacks against law firms, state and local governments and corporations increase, the need for a set of cybersecurity standards for law firms that host client data also intensifies. The Association of Corporate Counsel is working on a new Data Steward program that will create a baseline for law firms and corporate counsel. In the meantime, lawyers would be wise to follow the FedRamp Moderate authorization requirements for the hosting of client data. In the long run, it is less expensive than paying a ransom and losing the goodwill and trust of your clientele. Moreover, some of these breaches may eventually constitute a breach of attorney-client privilege and lead the courts to start sanctioning lawyers. The intersection of cybersecurity and ediscovery is complete.


Originally appeared in Cybersecurity Law & Strategy. © 2020 ALM Media LLC. Reprinted with permission.


The Increasing Promise of Technology-Assisted Review: How to Tame the Vulgar Expense of E-Discovery

In my first major case using technology-assisted review, our team had to review documents in Korean, which brought with them privacy and cross-border transfer concerns. The technology was very helpful, but we still had to employ two rooms filled with Korean-speaking lawyers to support the effort. Needless to say, it was a very expensive production.

During a more recent matter in the second half of 2019, we collected nine million documents and applied basic, broad keyword searches at the outset to quickly reduce that dataset to two and a half million records. We then applied Brainspace and its continuous active learning functionality to the remaining information and were able to quickly categorize each document, including into the categories that were uniquely valuable to our case, and to immediately and painlessly eliminate millions of documents from consideration.

The contrast between the two experiences was striking. Instead of multiple rooms filled with reviewing lawyers, we enlisted a skilled, but relatively small team of contract attorneys to code 25,000 records in two weeks. When all was said and done, the client told us that this project was far less expensive than the similarly sized project he had just completed on another case, and I am very comfortable that we identified the correct documents in a highly defensible manner.

As a result, leveraging artificial intelligence in this way is not just an option, it is the only one if you want to tame the vulgar expense of e-discovery.

This more effective model is not without its challenges, which include the following:

You Need Skilled Lead Counsel

Given that lead lawyers on matters of this type rely heavily on technology to determine which documents are relevant, it is essential that they have the requisite skill and understanding of the current technology to complement their legal talent. While they once simply designated documents as privileged or responsive in a linear manner, mapping and other visualization tools now allow them to highlight conversations, issues, windows in time, and specific types of documents, all in a manner that can quickly identify the most important documents related to a specific issue and cleave out those with no relevance. In other words, to fully harness the technology, counsel must not only have deep knowledge of the case, but must understand what can be done with the AI and how to do it.

Contract Attorneys Require Training

While contract attorneys may have fewer documents to review because of the technology, human eyes still need to review whatever the technology identifies as relevant.  Accordingly, and possibly even to a greater extent than when they reviewed “everything,” contract attorneys must be deeply trained on the matter in order to optimize their efforts. Insufficient preparation may result in inconsistent document coding, i.e., responsive vs. non-responsive, which could materially delay the process. In fact, the more you rely on computers to perform key tasks, the more disciplined the human interaction and input need to be.

Client Collaboration is Critical

Full transparency and client buy-in about the process are critical.  New tools are launching regularly, so even sophisticated, large organizations may not understand the significant benefits and savings on the back end that usually result from the slightly higher front-end cost of the initial computerized data analysis. This may require preparing a cost-benefit analysis demonstrating the overall savings, which, again, I have found to be increasingly substantial.  We were certain that our advanced approach would result in a substantial cost reduction, and it turned out to be one of the smoothest productions we had ever completed. Our collaboration ensured that the client’s sophisticated team collected the data efficiently and transferred it to the host.  With well-documented culling followed by the AI analysis, we were able to save thousands in monthly hosting fees alone.

Choose the Right Technology

In our case, the head of litigation support technology at our firm recommended Brainspace because it integrated with our existing portfolio of tools. What I derisively call the golden age of big-law document review, with teams of associates reviewing every document in a linear manner, thankfully no longer exists. Increasingly, even the more restrained days of law firms simply supervising lower-cost contract reviewers are also in the rearview mirror. Now, the law firm’s role is to optimize the use of AI-driven review tools, manage the technology, ensure the contract lawyers are well trained, and produce a defensible production.

While the firm associates still must participate in reviews, often performing quality-control aspects of the job, they are now supported by our manager of technology-assisted review. That manager can compare what the reviewers are finding with the broader database as a whole, essentially performing a statistical QC of the overall findings that further validates the integrity of the production.
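As a rough illustration of what that kind of statistical QC can look like, here is a hedged sketch: draw a random sample of the reviewers’ calls, have a senior reviewer re-code it, and estimate the coding disagreement rate with a simple confidence interval. The figures and the normal-approximation interval are assumptions for illustration only, not a description of our actual protocol.

```python
import math
import random

# Hypothetical reviewer calls: (document_id, reviewer_call_matches_senior_recheck).
# The ~3% disagreement rate is simulated purely for illustration.
decisions = [(doc_id, random.random() > 0.03) for doc_id in range(25_000)]

qc_sample = random.sample(decisions, 1_000)  # random quality-control sample
disagreements = sum(1 for _, agrees in qc_sample if not agrees)

error_rate = disagreements / len(qc_sample)
# 95% normal-approximation confidence interval on the observed error rate.
margin = 1.96 * math.sqrt(error_rate * (1 - error_rate) / len(qc_sample))

print(f"Observed disagreement rate: {error_rate:.1%} (+/- {margin:.1%})")
```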

And the emerging new model is not a bad thing for those like me in “big law.”  Although the “golden age” ended—as it should have—with much of the rote review work being outsourced to contract reviewers, when AI is involved on large cases, the tech-savvy partners and associates are reemerging with new roles that actually create the kinds of efficiencies that really justify their fees.

Key Best Practices

To maximize the value of your efforts and optimize efficiency:

  1. Remember that data security is the most important issue; it must be addressed with every vendor, contract lawyer, and team member.
  2. Surround yourself with the right people; people who truly understand the technology are worth their hourly rates and contribute to real savings overall.
  3. Carefully consider the roles of each member of the team. Often, it will be important to have a chief technologist liaising with both the client and professionals handling other aspects of the case.
  4. Hire the most talented contract lawyers, train them well, QC their work, and immediately let go of those who are not working out. Document review cannot be forgiving. One bad reviewer can infect the entire process.
  5. Quality control is key and must be done in a rigorous and consistent manner.
  6. Memorialize everything, from search terms, to AI processes, to the metrics on each stage of the review. I put everything into a defensibility memorandum so that if needed in two years I can explain to a court or tribunal exactly what was done and why it was reasonable.

Promoting the Promise of TAR

We have been discussing the promise of technology-assisted review for years.  Whether called TAR or AI, I believe the technology is now well in the mainstream, and I am very impressed with its effectiveness. The challenge for junior lawyers is that technology is limiting the work that formerly provided them with foundational experience. Document review, though arduous, helps one learn about the business of a client. I remember spending many months as a young lawyer sitting at document repositories flipping moldy pages of old client files.  It’s a great, if expensive, way for young lawyers to learn not only about the case, but about the ways of the business world.  While automated review is better for clients in the long run, it does reduce the amount of work for human lawyers, so supply and demand will have to re-balance over time.