
Competition & EU law insights

Keeping you up to date on Competition & EU law developments in Europe and beyond.


Forensic Finesse: Navigating Competition and Regulatory Dawn Raids and Requests for Information

Professionals navigating competition and regulatory investigations - whether triggered by complaints, self-referral, market monitoring, merger reviews, or sector inquiries - now face a surge in technical complexity. The IT challenges in these investigations have shifted from sheer data volume to the diversity of data sources. Dealing with competition investigations now requires technology experts and lawyers to train continuously to keep abreast of emerging communication channels, their integration with traditional ones, and the accompanying challenges spanning the e-discovery process from data collection through review to production. In this article, we discuss the best ways to address these challenges and their ultimate impact on process time and costs for clients.

The Electronic Discovery Reference Model (EDRM) represents the standard phases of an e-discovery project and serves as guidance for the identification and gathering of electronic data during the legal process. The process is iterative, and allows forensic teams to see how it unfolds across the following key stages:

  • Identification
  • Preservation and Collection
  • Processing and Analysis
  • Review
  • Production and Presentation


Identification

The initial stage of the e-discovery process involves identifying all potentially relevant data sources. Collaboration between IT and other departments is vital to grasp data storage nuances and communication channels.

Along with the integration of IT systems into unified environments, which allows for the connection of multiple disciplines, there is now a prevalence of informal communication channels such as WhatsApp, Signal, Telegram, iMessage, and data sharing platforms. This creates additional challenges in data collection, as their use by those being investigated may only come to light during the review process. A robust data governance program can help companies protect against the challenges faced during these initial stages of an investigation by ensuring compliance with retention policies and mapping of repositories or assets. See our article on data protection issues for further detail on this point.

The following points are important during the initial investigative phase from an IT and data perspective:

  • Individuals should not copy, modify or destroy any data or devices;
  • ‘Legal hold’ functionalities should be applied on network-based data sources, such as email, cloud drives etc.;
  • Companies and individuals should not perform any actions requiring data interference (e.g., hard drive decryption); and
  • Companies should involve and consult forensic IT specialists – preferably instructed by external counsel with legal privilege.

Preservation and Collection

Preservation and Collection are pivotal stages in the e-discovery process, requiring meticulous planning to mitigate any data loss risks. Data sources range from end-user devices to corporate environments, each necessitating tailored collection approaches. Typical data sources include:

  • Endpoint user devices (laptops, desktops, mobiles);
  • Shared drives and online repositories (such as Google Drive, OneDrive, SharePoint);
  • Mailboxes and corporate instant messengers (legacy Exchange, Microsoft 365, Gmail, Teams, Google Chat); and
  • Alternative and emerging communication channels (Slack, WhatsApp, Telegram, Signal, Viber).

Data stored within a corporate environment (i.e., shared drives, online repositories, mailboxes and corporate instant messengers) can usually be exported in full by an IT department. Modern IT environments not only facilitate data filtering and exporting, but also offer legal hold functionality which can protect data from deletion. By contrast, data from end-user devices requires specialist knowledge and e-discovery or forensic tools. The aim is to preserve data integrity (securing metadata) so as to produce defensible electronic evidence accepted in litigation or by regulators.
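In practice, preserving data integrity usually involves computing a cryptographic hash of each collected item at acquisition time, so that any later copy can be verified bit-for-bit. The following is a minimal sketch of that idea in Python; the function name and the idea of logging digests into a chain-of-custody record are illustrative, not a description of any specific forensic tool.

```python
import hashlib
from pathlib import Path

def hash_file(path: Path, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Compute a cryptographic digest of a file in chunks, so large
    evidence files (e.g., mailbox exports) are never loaded into
    memory at once. The digest recorded at collection time lets any
    later copy be verified against the original."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

A collection team would typically record each digest alongside the file path, custodian, and collection timestamp, and re-hash the data after every transfer between systems to confirm nothing changed in transit.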

When planning, it is important to consider which sources require full collection, such as forensic images, and where collection of certain sources can be selective. Chat correspondence from WhatsApp can, for example, be exported along with media using the app's built-in functionality.

If system logs that contain user activity and potentially deleted data need to be recovered or located, it is recommended to opt for a more extensive collection process, such as forensic imaging. However, the recovery of deleted data is contingent on various circumstances and requires advanced knowledge of forensic technology and tools.

Understanding how IT systems and applications work is critical. For user device data, it is increasingly rare to rely solely on what is stored locally. Syncing with OneDrive, backing up to iCloud, or the increasingly popular use of modern attachments in email correspondence all merit consideration in the data collection process, and failing to do so may result in incomplete data being collected.

Data collection is also an area where initial assumptions and stakeholder expectations often do not align with reality and technology expert practice. While exporting, transferring, and preparing limited email data can take a few days, securing large amounts of data can be time-consuming and require several actions to be taken from both sides, such as arranging the organization's resources, meeting with employees to collect data from their devices, transferring data between systems, requesting data from external vendors, or restoring offline backups. 

Processing and Analysis

This is the stage where the power and utility of e-discovery tools and platforms are most apparent, as they can be used to organize unstructured data by means of:

  • Indexing;
  • Optical Character Recognition (OCR);
  • Deduplication;
  • Exclusion of immaterial files;
  • Structured analytics (language identification, name normalisation, email threading, near duplicates); and
  • Unstructured analytics (Clustering, Concept Searching).
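Of the steps above, deduplication is the most mechanical: documents are compared by digest and only the first occurrence of each exact duplicate is kept. The sketch below shows the principle over raw document bytes; real platforms typically hash normalized metadata fields (participants, subject, body for email) rather than raw content, so that copies collected from multiple custodians collapse correctly.

```python
import hashlib

def deduplicate(documents: list[bytes]) -> list[bytes]:
    """Keep the first occurrence of each exact-duplicate document,
    using a SHA-256 digest as the comparison key."""
    seen: set[str] = set()
    unique: list[bytes] = []
    for doc in documents:
        digest = hashlib.sha256(doc).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique
```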

If the right assumptions are made, this stage can yield gains by reducing the number of documents and the time spent reviewing them. However, incorrect assumptions can extend the process to a much longer period than anticipated, for example, if adjustments for unexpected results are necessary.

Even default settings such as the source time zone, index or OCR language can have an impact and result in incomplete keyword results: 

  • The wrong time zone setting can mislead the person reviewing the documents or fail to capture documents from borderline dates. When collecting data from different time zones and jurisdictions, it is therefore important to act with due diligence and process sources in separate groups, each with the correct source time zone applied.
  • Failing to include all relevant languages in the OCR settings will produce an incomplete or incorrect text layer. Similarly, the complexity of the keywords needs to be considered when deciding whether to create case-sensitive indexes or to include special characters and punctuation.
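The time zone point can be made concrete with a small Python sketch using the standard-library zoneinfo module: timestamps are given their custodian's source time zone and normalized to UTC before any date-range filter is applied. The specific zone and date below are assumptions chosen to show a borderline case.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

def to_utc(local: datetime, source_tz: str) -> datetime:
    """Attach the custodian's source time zone and convert to UTC, so a
    date-range filter treats documents from all jurisdictions consistently."""
    return local.replace(tzinfo=ZoneInfo(source_tz)).astimezone(ZoneInfo("UTC"))

# An email sent at 00:30 on 1 March 2024 in Warsaw (CET, UTC+1) falls on
# 29 February in UTC - a filter applied to naive local dates under the
# wrong time zone assumption would place it on the wrong day.
sent = to_utc(datetime(2024, 3, 1, 0, 30), "Europe/Warsaw")
```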

Some of this can be reworked at a later stage (if not picked up at an earlier stage) but preparing data to produce the correct results will take additional time.

The same is true of structured analytics such as email threading, which can reduce the number of documents requiring manual review. Applied incorrectly, however, search terms that include sender and recipient criteria, or a narrowed time range, can miss important messages buried within a longer email chain.
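The core of email threading is grouping messages that belong to one conversation. A minimal, illustrative version groups by normalized subject line, as sketched below; production tools also use Message-ID and References headers and identify the "inclusive" message that contains the whole chain, so this is a simplification.

```python
import re
from collections import defaultdict

def thread_key(subject: str) -> str:
    """Strip reply/forward prefixes and normalize case, so all replies
    in a chain share one key ('Re: RE: Pricing call' -> 'pricing call')."""
    return re.sub(r"^(\s*(re|fw|fwd)\s*:\s*)+", "", subject, flags=re.I).strip().lower()

def group_threads(subjects: list[str]) -> dict[str, list[str]]:
    """Group message subjects into threads by their normalized key."""
    threads: dict[str, list[str]] = defaultdict(list)
    for subject in subjects:
        threads[thread_key(subject)].append(subject)
    return dict(threads)
```

Reviewing only the most inclusive message per thread cuts volume considerably; the risk described above arises when per-message sender/recipient or date criteria are applied first, excluding earlier messages that only survive inside a longer chain.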


Review

Documents selected using keywords or other criteria are reviewed manually. There are several approaches to reviewing documents and assessing their relevance to the investigation. Technology experts predominantly use the following two approaches:

  • Linear review, whereby all selected documents are reviewed for relevance, normally using sets of documents (review jobs/batches) sorted chronologically. Granularity can be increased by including custodians or topics based on clusters or keywords.
  • Continuous Active Learning (CAL), which uses a predictive statistical model to ensure that the review covers the most relevant documents, while indicating when it can be stopped before all initially selected documents have been reviewed. The criteria for applying CAL require agreement with the opposing party in litigation or the regulator.
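The CAL loop can be summarized as: learn from the documents reviewers have already coded, score the unreviewed population, and route the highest-scoring documents to reviewers next, repeating until few new relevant documents appear. The toy scorer below illustrates one iteration of that ranking step; real platforms use trained statistical classifiers, not this word-count heuristic, and all sample texts are invented.

```python
from collections import Counter

def relevance_ranking(reviewed: dict[str, bool], unreviewed: list[str]) -> list[tuple[float, str]]:
    """One CAL-style ranking pass: learn which words distinguish
    documents coded relevant from those coded irrelevant, then rank the
    unreviewed documents so the most promising go to reviewers next."""
    relevant_words: Counter = Counter()
    irrelevant_words: Counter = Counter()
    for text, is_relevant in reviewed.items():
        (relevant_words if is_relevant else irrelevant_words).update(text.lower().split())

    def score(text: str) -> float:
        words = text.lower().split()
        return sum(relevant_words[w] - irrelevant_words[w] for w in words) / max(len(words), 1)

    return sorted(((score(t), t) for t in unreviewed), reverse=True)
```

In a full CAL workflow this ranking is recomputed after each review batch, and the process stops when the rate of newly found relevant documents drops below an agreed threshold, which is why the stopping criteria need agreement with the opposing party or regulator.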

Regardless of the approach taken, it is important to consider:

  • Whether to review entire families of documents, only documents hit by keywords, or documents hit by keywords and related top-level items at the first level.
  • Whether to use effective email threading to reduce the number of documents to review or follow entire threads of relevant documents.
  • Whether to use Near Duplicates to exclude documents with substantially similar content, for example, 99% similarity.
  • Whether a significant number of additional documents are expected to appear later in the review.
  • Whether to use machine translations or involve native speakers for foreign language documents.
  • Coding layout, including fields, choices, and level of complexity, as higher complexity can impact review speed.

It is also possible to augment initial searches with advanced analytics, such as the sentiment analysis implemented by some e-discovery platforms that allows for the selection of documents with a certain sentiment. It is particularly useful in cases involving compliance matters, such as harassment.

The document review process is multi-tiered. The first level identifies potentially relevant documents for specific topics and flags those for redaction or potential withholding based on privilege. The second level, led by more experienced team members, reviews the first level assessment and confirms or modifies choices. Various streams are then created to complete the project.

Along with the algorithms currently being used, there is growing interest in the potential of generative AI in e-discovery. Soon, tools will be able to efficiently identify documents with a high probability of privilege, while others in development will analyze data using a large language model (LLM). This will make it easier to interpret selected sets of documents or select specific documents using prompts instead of relying solely on syntax and technical language such as keywords and metadata.

Production and Presentation

This stage entails the export and delivery of information and documents which are identified as relevant. The format of the data, known as production, is set by a protocol and can vary by project, stakeholder requirements, arrangements with opposing parties in litigation, or requirements set by courts, tribunals or regulators.

Production usually involves the transfer of documents in a format which allows the recipient to upload them into its e-discovery platform for similar search and review processes. The data set will therefore include native documents, text layers, images for redacted documents, and a structured metadata file (loadfile).
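A loadfile is, at its simplest, a delimited file mapping each produced document to its native file, text layer, image, and key metadata. The sketch below writes a minimal CSV-style loadfile; the field names are illustrative assumptions, and real productions often use other formats (such as Concordance DAT with non-standard delimiters) as dictated by the production protocol.

```python
import csv
import io

def write_loadfile(records: list[dict[str, str]]) -> str:
    """Write a minimal CSV loadfile mapping each produced document to
    its native file, extracted-text path, and key metadata fields.
    Field names are illustrative, not a standard."""
    fields = ["BATES_BEGIN", "CUSTODIAN", "DATE_SENT", "NATIVE_PATH", "TEXT_PATH"]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields)
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()
```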

Exporting partial production and documentation in PDF format is also possible.


The scope of investigative e-discovery approaches can vary and may not include all the steps described above. However, the growing complexity of projects driven by the rapid pace of change in technology and forms of communication requires professionals and project managers to have a comprehensive understanding of IT systems, organizational structures, and business practices to address technical challenges effectively and ensure proportionality of effort.

For more information, please contact Jakub Kur, Marcin Miazga or Anne Federle.

This article is part of a special edition on Investigations of our monthly newsletter Competitive Edge.




competition, competition law, eu, eu law, antitrust, dawn raids, europe, competition & eu law, investigations