
Active Learning for Luddites



Contents

  • The long-standing disconnect between legal and tech
  • The Active Learning glossary
  • How does Active Learning work?
  • The benefits of Active Learning
  • When Active Learning is suitable
  • FAQs

The long-standing disconnect between legal and tech

The legal sector hasn’t always maintained the greatest relationship with technology. 

The Luddite lawyer, struggling to adapt to the latest innovations, is more than just an amusing stereotype — it’s a well-documented reality. 

Tools and technologies that have been at the centre of the conversation for several years can still baffle even the most capable legal executives. Here at Altlaw, one thing that has repeatedly puzzled our partners and clients is Active Learning.

Lawyers' inability to wrap their heads around Active Learning (or TAR — Technology Assisted Review) has been a trend in eDiscovery for many years.

The legal sector is already one of the most demanding industries, even without the added pressure to understand the ins and outs of complex technologies.

With that in mind, we thought we’d create this guide to offer a helping hand. We’ll run through everything you need to know about Active Learning — what it is, how it works, when to use it and why you shouldn’t overlook its value. 

So, let’s begin your journey to become an Active Learning aficionado.

The Active Learning glossary 

Before we dive into the details of Active Learning, let’s run through some important terms and phrases.

Technical jargon is one of the biggest barriers to understanding the processes of eDiscovery. Where Active Learning is concerned, general confusion can stem from countless terms, acronyms and initialisms being used interchangeably (and sometimes incorrectly). 

Let’s break down the key terms you need to know for Active Learning. 

TAR

TAR stands for Technology Assisted Review. TAR is an umbrella term for several processes where technology is used to help support, streamline or strengthen a document review project. 

Specifically, TAR tends to be used to refer to predictive coding technologies (explained further below), meaning systems that aim to imitate the decision-making ability required of a human reviewer when classifying case documents.

A review team can then leverage these classifications and replicate them en masse, completing the discovery process in less time and at a lower cost. Over time, as TAR has grown and progressed, different iterative terms have been used to refer to the evolution of TAR capabilities.

TAR 1.0 refers to technologies that function through ‘simple Active Learning,’ or a process of one-time training before use. The systems in question must be trained and developed by a senior legal executive or subject matter expert to interpret and classify data in the manner they wish.

This involves coding a control set for relevance before training the system algorithm against that control set. While it’s less exhaustive than manual review, it can still be a fairly laborious process, requiring a senior team member to configure the algorithms by conducting thousands of document classifications before the system can rank them effectively and consistently.
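For readers who'd like to see the mechanics laid bare, here's a minimal, purely illustrative sketch of a TAR 1.0-style workflow: a classifier is trained once on a coded control set and then ranks the rest of the collection by predicted relevance. The document texts, labels and the scikit-learn model choice are our own assumptions for illustration, not how any particular eDiscovery platform implements it.

```python
# Illustrative TAR 1.0-style workflow: train once on a coded control set,
# then rank the remaining collection by predicted relevance.
# (Hypothetical data and model choices; real platforms differ.)
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Control set coded up front by a senior reviewer: 1 = relevant, 0 = not relevant.
control_texts = [
    "draft share purchase agreement and completion timetable",
    "minutes of the board discussion on the disputed contract",
    "canteen menu for the week commencing 4 March",
    "invitation to the summer staff party",
]
control_labels = [1, 1, 0, 0]

# One-time training against the control set.
vectorizer = TfidfVectorizer()
model = LogisticRegression()
model.fit(vectorizer.fit_transform(control_texts), control_labels)

# The trained model then scores the wider collection in a single pass;
# reviewers work through the documents in descending order of score.
collection = [
    "email chain about amendments to the purchase agreement",
    "newsletter: changes to the car park barrier",
]
scores = model.predict_proba(vectorizer.transform(collection))[:, 1]
for doc, score in sorted(zip(collection, scores), key=lambda pair: -pair[1]):
    print(f"{score:.2f}  {doc}")
```

Note how everything hinges on that initial control set: once training is done, the system's understanding of relevance is fixed.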

Predictive coding

Predictive coding was the first form of Technology Assisted Review, but it’s now often referred to as a core component of TAR 1.0. 

Predictive coding is a process which reduces the number of non-responsive documents a review team has to look at. It supports the review process by intelligently labelling documents and classifying them as either relevant or non-relevant. This process typically uses a combination of machine learning, keyword searches and filtering.

As previously mentioned, its understanding of relevance must stem from how a senior lawyer or subject matter expert codes a training set of documents.

Machine learning

Machine learning is a subset of artificial intelligence. It refers to systems that can automatically learn to improve their performance of a task from experience, without being explicitly programmed for that task by a human.

In eDiscovery, machine learning algorithms are used by advanced forms of TAR to process large batches of case documentation, continuously training the model and improving the accuracy of the review process as it goes on.

Continuous Active Learning (CAL)

Continuous Active Learning is an advanced form of predictive coding and, by extension, of Technology Assisted Review. CAL has replaced the ‘predictive coding’ used by earlier eDiscovery solutions as the most widely applied form of TAR, using machine learning to make document review processes simple, swift and cost-effective.

As it’s now the most popular and commonly used form, when most people in the legal sector refer to TAR, they tend to mean Active Learning.

TAR 2.0 refers to the technologies developed to address the shortcomings of TAR’s initial iteration. The main difference with TAR 2.0 is that, unlike previous iterations of technology-assisted review, the input of senior lawyers or subject matter experts is no longer required for training purposes.

How does Active Learning work?

The Active Learning system provides reviewers with what it judges to be the most relevant documents to their case. Unlike the earlier versions of predictive coding (TAR 1.0), Active Learning technology learns and adapts during the review process rather than being trained or configured beforehand. 

Its judgement of relevance is instead based on patterns in the coding decisions made by a review team. Each tagged document delivers an iterative improvement in the system’s understanding of your case and the relevant documents you’re reviewing. 

So, the more documents you code, the more consistent and accurate the Active Learning engine becomes. The documents the Active Learning system identifies as highly relevant to your case are pushed to the front of the review queue in real-time, helping review teams get to the most critical case information as quickly as possible. But, as far as the benefits of Active Learning go, these are just the basics. 
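If you're curious what that feedback loop looks like under the bonnet, here's a simplified sketch. It isn't how Relativity or any other platform actually implements Continuous Active Learning; it just shows the general pattern of retraining on every coding decision and re-ranking the remaining queue. The scikit-learn model and the helper names are assumptions for illustration.

```python
# Simplified Continuous Active Learning loop (illustration only):
# every coding decision feeds the model, which re-ranks what's left.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def run_cal(documents, review_one, batch_size=10):
    """documents: list of document texts; review_one: callable that returns
    the reviewer's decision (1 = relevant, 0 = not relevant) for one text."""
    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(documents)
    coded = {}                              # document index -> reviewer's decision
    queue = list(range(len(documents)))     # indices still awaiting review
    while queue:
        # Serve the current front of the queue to the review team.
        for idx in queue[:batch_size]:
            coded[idx] = review_one(documents[idx])
        queue = [i for i in queue if i not in coded]
        # Retrain on everything coded so far (needs both classes present).
        labels = list(coded.values())
        if queue and len(set(labels)) > 1:
            model = LogisticRegression().fit(X[list(coded)], labels)
            scores = model.predict_proba(X[queue])[:, 1]
            # Most-likely-relevant documents move to the front of the queue.
            queue = [i for _, i in sorted(zip(scores, queue), reverse=True)]
    return coded
```

The loop makes the point above concrete: every tagged document feeds straight back into the ranking, so the queue keeps reshuffling itself towards the most relevant material.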

Read on to discover how Active Learning can offer legal teams greater efficiency, speed and process automation, allowing you extra time and resources to do more.

The benefits of Active Learning

What benefits can it offer you? How can it add value to law firms and in-house counsel? You may have inferred some of the benefits of Active Learning already from reading about how it works and how it differs from earlier forms of TAR. But for the sake of ease and clarity (this is a guide for Luddites, after all), we’ve laid the key benefits out for you below.

Reduce time spent reviewing irrelevant documents 

The traditional process of manual review is exhaustive. It requires a review team to look through every last document in a given batch before considering a review project complete. Active Learning’s powerful engine can assess the relevance of documents in your batch based on your previous coding decisions and then push the most relevant documents to the front of the queue. 

This means you can easily find relevant information and define your case strategy earlier in the project.

Know when a project is done 

Because Active Learning works to show you the most relevant documents first, you’ll eventually reach a point where the relevancy of what you’re reviewing begins to taper off. But we know that no ordinary lawyer would be able to sleep at night having ended a project just because the relevancy line on a chart started to descend.

Active Learning has an elusion testing feature, which can provide intelligent, data-driven and substantiated confirmation that a review project has been completed to a satisfactory level. 

Minimise set-up and admin time 

As we alluded to earlier in the glossary, a major downside of earlier forms of predictive coding was the need to train the system before a review project could even begin. This not only required a lot of time, but it also typically required the attention of a more senior legal executive, representing a cost and potential productivity trade-off.   

But with Active Learning, there’s no need for training sets or the manual batching of documents — human input during the setup process (and in general) is minimal, making it an incredibly budget-friendly solution. Reviewers can simply log in, click a button, enter a review queue and start working through the most relevant data. 

According to Jeff Gilles of Relativity, “You can take a 100,000-document project from set-up to review in under 10 minutes.”

Integrate with other leading legal technology 

Active Learning is highly compatible with other tools, technologies and platforms used for eDiscovery. Active Learning systems boast a flexible API (application programming interface) which allows them to be easily integrated and used in combination with numerous other tech-driven eDiscovery activities, such as email threading, clustering and sample-based learning.

All review projects and cases come with their own unique needs and nuances. By leveraging Active Learning’s integrability, you can use it as part of a unique workflow that combines many powerful, leading-edge eDiscovery tools tailored to your situational needs and objectives.

When Active Learning is suitable

There are a lot of positives that can come from knowing how to use Active Learning effectively, but ultimately, this is a helpful guide, not a sales brochure. Getting the most out of Active Learning technology means knowing its limitations and what tasks and project dynamics it isn’t suited for. 

Being aware of the limitations of Active Learning technology is vital if you’re going to get the best out of this powerful technology, and it’s the best way to safeguard yourself from the pitfalls of misusing it, too. 

Active Learning isn’t for an exhaustive review 

Active Learning prioritises relevant documents, leaving those it deems non-relevant on a discard pile. This means it isn’t a tool for comprehensive review projects. If your goal is to review a data set in its entirety and ensure every single document in a batch is looked at, you’ll need to apply a different method to your review process. 

Active Learning isn’t for nuanced review protocols 

The intelligent machine learning capabilities that drive Continuous Active Learning mean it’s an exceptional tool for classification tasks. When your team can identify relevance based on fairly binary criteria, you’ll likely get the precise results you want, with those impressive cost savings and speed to boot. 

What Active Learning isn’t suitable for, however, is a review project which aims to perform complex analysis of issues. If you’re investigating a case where multiple conclusions may be drawn from each document, you may need to opt for a different approach.

Active Learning isn’t for media-rich file formats 

Just because a document is stored electronically, it doesn’t mean Active Learning can review it. Most Active Learning systems are only built to work with text-rich file formats, such as emails and word processing documents. Active Learning shouldn’t be used to review images, spreadsheets, videos or other media-rich files. 

If there’s no text information in a document, then there’s nothing for the Active Learning engine to identify. So, if the majority of the data in your project isn’t in the right format, those documents will need to be looked at separately, and you’ll have to employ a different strategy to review and code them effectively.

Also, if you wish to interrogate the metadata of the files in your document batch, you’ll need a different protocol or workflow. Because metadata is composed of varying text characters, including it in an Active Learning review can seriously muddy the waters and compromise the system’s ability to identify relevance effectively.

FAQs

Does using Active Learning mean we no longer have to QC the documents in question? 

The short answer is no. Active Learning is a potent tool capable of creating new efficiencies and reducing workload, but it isn’t a substitute for human involvement or sound eDiscovery practice. 

You should still do everything you’d normally do in a first-pass review alongside Active Learning: daily quality control, constructive feedback, conformity validation and reporting, to name a few. Technologies can significantly support your team’s processes, but they’re a long way from being able to replace them altogether.

Documents from priority custodians are often some of the last we receive. How can Active Learning ensure the most relevant files are at the front of the review queue when documents are added after the process has started? 

This question highlights one of Active Learning’s most powerful features. Due to its sophisticated automated capabilities and its ability to process masses of documents in real-time, Active Learning can take on case information in a rolling fashion. 

If a sudden batch of new documents is sent over for review once the process has started (as often happens in the unpredictable world of litigation), Active Learning can take them on without detriment to the process or its effectiveness. As the system continually refreshes its rankings, new documents can be added at any time.

Note: the above is true of Active Learning. Earlier iterations of TAR require training on the basis that all case documents have been received before they’re ranked, so rolling document addition isn’t possible.

Is Active Learning more accurate than human review? 

Numerous studies have shown that, if managed and used effectively, Active Learning systems will outperform exhaustive manual review.

While the reasons for this can’t be proven objectively by studies of this nature, we can speculate that factors like human fatigue play a part in the discrepancy. However, as mentioned above, the effectiveness of Active Learning ultimately rests on the practices of the people who leverage it. If it’s used for tasks it isn’t suited to, or if it isn’t properly supervised and configured, Active Learning can present risks, leading to tangible losses.

In the case of United Airlines in 2018, for example, their faulty use of TAR produced millions of unresponsive documents. This cost them significant time and money, and they had to request a six-month extension for their litigation.

Completing a project without having reviewed all of the documents sounds risky. How does elusion testing work to account for this? 

We understand that ending a review without having looked at a large number of the documents in your pile might bring you out in a cold sweat, but the elusion testing functionality provided by most CAL systems is highly defensible and designed to be risk-averse. 

It’s also worth mentioning that the decision to conduct an elusion test and thus begin bringing a project to a close is down to the review team. This will be different for everyone, but generally, this should be when the review queue appears to have reached a point of consistently low relevance. 

When the cost and time required to keep reviewing likely irrelevant documents outweighs the positive impact they will bring to your case, you can run an elusion test to determine whether you’re satisfied with the results. Elusion testing gives the review team a statistical sample of documents deemed ‘not likely relevant’ to code (typically between 1,000 and 1,500 documents).

You’re still tagging them as either responsive or unresponsive as you did before, but now they’re coming from the discard pile rather than the review queue. After coding this sample of files, the review team feeds the results into the Active Learning system. 

From there, you’ll be presented with an elusion rate and an estimate of how many relevant documents may remain in your data set. Now, it’s a judgement call based on the percentage of documents you’ve already worked through, what percentage of the remaining documents may be ‘eluded documents’ and to what extent the goal of your investigation process has already been fulfilled.
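The arithmetic behind that estimate is refreshingly simple. The sketch below shows one common way an elusion rate and a remaining-relevant-documents figure could be derived from the coded sample; the numbers are invented for illustration, and real platforms layer confidence intervals and other statistics on top.

```python
# Back-of-the-envelope elusion estimate (illustrative figures only;
# real platforms add confidence intervals and other statistics).
def elusion_estimate(sample_size, relevant_in_sample, discard_pile_size):
    elusion_rate = relevant_in_sample / sample_size
    estimated_remaining = round(elusion_rate * discard_pile_size)
    return elusion_rate, estimated_remaining

# e.g. 1,200 sampled discard-pile documents, 6 coded relevant,
# 80,000 documents left unreviewed:
rate, remaining = elusion_estimate(1200, 6, 80000)
print(f"Elusion rate: {rate:.2%}")                              # 0.50%
print(f"Estimated relevant documents remaining: {remaining}")   # 400
```

In that made-up example, the judgement call is whether roughly 400 potentially relevant documents left in an 80,000-document discard pile is an acceptable residue given how far the investigation has already progressed.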

Want to learn more about any of the topics discussed in this guide? Altlaw can help. 

Contact a member of our team today for an informal chat.

eDiscovery Services: 020 7566 7566 

Print/Hard Copy Services: 020 7490 1646 

Email us: enquiries@altlaw.co.uk 

Contact us