Acumen Import  

Acumen Import delivers comprehensive, scalable, and customizable pre-processing
for over 400 file types in five automated steps:
> Explode compound files > Extract text and metadata > De-duplicate > Index

Leverage Acumen’s pre-import analysis and data culling utilities, in addition to de-duplication, to reduce a large document population to a much smaller, responsive set. Acumen accounts for every file that was processed, including files that were segregated due to filtering criteria, de-duplication, or import errors.

Collection analysis

Preview and optionally filter by folder, file type, or size. Use Acumen's intuitive pie and bar charts to analyze the collection's data at a glance.

Analyze and query email collections using Acumen reports about "Who is talking to who?" and "What are they talking about?"

Use Acumen's interactive query utility to determine responsive files based on native file full text indices.

Filter out known system files

System files can be filtered from the processed batch when their MD5 hash value matches that of a known software application file. A list of over 45 million known software application files is published by the National Software Reference Library (NSRL), and Acumen receives an updated list quarterly. The “known” tab of Acumen’s detailed file-level processed batch spreadsheet report lists all files that were filtered during processing due to a match with a known MD5. Filtering by MD5 may be turned on or off in the Acumen case wizard and processed batch wizard.
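
The known-file filter described above can be sketched in a few lines of Python. This is a minimal illustration and not Acumen's implementation: the function names `md5_of_file` and `split_known_files` are hypothetical, and the set of known hashes is assumed to have been loaded beforehand from an NSRL hash list.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the uppercase hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def split_known_files(paths, known_md5s, filter_enabled=True):
    """Separate files whose MD5 matches a known hash from the rest.

    Returns (kept, known) so that every input file is accounted for.
    `known_md5s` is assumed to be a set of uppercase hex digests.
    """
    kept, known = [], []
    for path in paths:
        if filter_enabled and md5_of_file(path) in known_md5s:
            known.append(Path(path))
        else:
            kept.append(Path(path))
    return kept, known
```

Returning both lists, rather than silently dropping matches, mirrors the idea that filtered files are still reported. The `filter_enabled` flag stands in for the on/off switch in the wizards.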

Reports account for every file

Use Acumen’s processed batch summary report to get processing statistics at a glance, or drill down into a detailed file-level accounting.

Some files are segregated from the processed batch due to import errors. The “import errors” tab of the processed batch spreadsheet report includes all files that were held as errors, along with a description of the error. For example, a file may have failed to import because it was password protected.

View and reprocess import errors

Files that error on import can be identified visually in the context of the processed batch, along with reference to the first (winner) instance of the document. Acumen users may optionally browse and reprocess files that error on import without leaving the application.

Audit trail of workflow

Furthermore, Acumen’s audit trail substantiates all events in a file's lifecycle. Track the file as it moves from processing to review to production by event, by user, and by date/time.

Parent/child hierarchy

Acumen preserves the legal significance and context of data relationships between "parent" files and "child" attachments by visually displaying the relationship in processed and culled batches, optionally displaying data relationships during review, and making the relationship available as metadata throughout processing, review, and production.

Point Acumen's import module to restored data on any external or internal drive. Acumen uses Stellent Outside In to view and extract searchable text for over 400 file extensions, covering thousands of file types across decades of versions. Acumen supports most of the formats Outside In handles. In addition to word processing formats, spreadsheets, presentations, graphics, and compressed formats like .zip, Acumen extracts searchable full text and metadata from email formats including Outlook messages, calendar, notes, tasks, journal items, and folders (PST); Lotus Notes messages (NSF); and instant messenger archives.


Metadata is characterized by the Federal Rules of Civil Procedure as the historical, managerial, and tracking components of a file. Acumen stores up to 111 different metadata values per document in its database. Since metadata values are subjective and depend on context, only some of these 111 values will be populated for any given document. Metadata is extracted during processing and stored for later use in sophisticated searching and filtering in Acumen's review module and in production from Acumen's export module.

De-duplication is the process of identifying and segregating those files that are exact duplicates of one another.

Exact de-duplication by MD5 or metadata

Acumen identifies exact duplicates using the mathematical hashing algorithm MD5 or a user-defined combination of metadata. (De-duplication of exact duplicates is the most precise way to de-duplicate a data set, and Acumen does not offer "near de-duplication" at this time.)

Of the 111 metadata types extracted by Acumen, the following 19 may be used for de-duplication in Acumen:

  • Document MD5
  • Folder MD5
  • Email Subject MD5
  • Email From MD5
  • Email To MD5
  • Email CC MD5
  • Email BCC MD5
  • Mail All Recipients MD5
  • Mail Reply Recipients MD5
  • File Comments MD5
  • File Key Words MD5
  • Task Contact Names MD5
  • Task Status Recipients Completion MD5
  • Task Status Recipients Updated MD5
  • Contact Email Addresses MD5
  • Distribution List Members MD5
  • Attachment File Names MD5
  • Mail Body MD5
  • Mail Linked Contacts MD5
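
One way to picture a "user-defined combination of metadata" is as a composite key built from the hashes of the selected fields. The sketch below is an assumption about how such a key could work, not a description of Acumen's internals. The helper names `md5_text`, `dedup_key`, and `find_duplicates` are hypothetical, as are the record field names, and trimming and lowercasing values before hashing is an illustrative choice.

```python
import hashlib


def md5_text(value):
    """MD5 of a metadata value, normalized to trimmed lowercase text (assumed)."""
    return hashlib.md5(value.strip().lower().encode("utf-8")).hexdigest()


def dedup_key(record, fields):
    """Build a composite key from the MD5 of each selected field."""
    return tuple(md5_text(record.get(field, "")) for field in fields)


def find_duplicates(records, fields):
    """Return (winners, duplicates); the first record seen per key wins.

    Each entry in `duplicates` pairs a record with the winner it matched.
    """
    seen = {}
    winners, duplicates = [], []
    for record in records:
        key = dedup_key(record, fields)
        if key in seen:
            duplicates.append((record, seen[key]))
        else:
            seen[key] = record
            winners.append(record)
    return winners, duplicates
```

Choosing fewer fields makes the match broader: keying on subject and body alone would treat two messages from different senders as duplicates, while adding the sender keeps them apart.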

De-duplication by email family

Acumen allows case admins to de-duplicate email families (an email and its attachments) across batches and cases.

User-defined de-duplication level

The scope of de-duplication may be across batches, custodian sources, or cases. For example, data from file servers may be de-duplicated across sources, while email may be de-duplicated across batches. In the processed batch wizard, an Acumen user may choose to turn off duplicate checking, to display only one duplicate for review, or to display all duplicates for review and link them to each other. Choosing to display all duplicates but link them allows duplicates to be reviewed and produced together. The "duplicates" tab of the processed batch spreadsheet report includes all files that were flagged as duplicates. Duplicates may also be identified visually in the context of the processed batch, along with reference to the first (winner) instance of the document.
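
The effect of scope on duplicate linking can be illustrated with a short Python sketch. Everything here is hypothetical: the `ProcessedFile` fields, the `link_duplicates` helper, and the `within` parameter are illustrative stand-ins and do not correspond to Acumen's wizard options. The idea is only that two files are compared when they share the attributes that bound the chosen scope.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessedFile:
    name: str
    md5: str
    case: str
    custodian: str
    batch: str


def link_duplicates(files, within=()):
    """Group exact duplicates by MD5 inside the boundary named by `within`.

    `within` lists the attributes two files must share to be compared;
    an empty tuple compares every file against every other.
    Returns {winner: [later duplicates]} with the first file seen as winner.
    """
    winners = {}
    links = {}
    for f in files:
        key = tuple(getattr(f, attr) for attr in within) + (f.md5,)
        if key in winners:
            links[winners[key]].append(f)
        else:
            winners[key] = f
            links[f] = []
    return links
```

With `within=()` identical files are linked wherever they occur; with `within=("case",)` the same file appearing in two cases yields two separate winners. Keeping the duplicates attached to their winner, rather than discarding them, is what lets them be reviewed and produced together.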

Auto-tagging of duplicates

The first copy of any document loaded into Acumen is identified as the master, and subsequent identical documents are identified as duplicates. When a reviewer designates (tags) the master, subsequent duplicates are automatically designated (tagged) and redacted the same way. This master/duplicate auto-tagging and redaction reduces inconsistencies among duplicates reviewed by different reviewers.
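
The master/duplicate propagation can be pictured with a small Python sketch. It is a simplified model under stated assumptions, not Acumen's data model: `ReviewDoc`, `tag_master`, and the tuple used to represent a redaction are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewDoc:
    doc_id: str
    tags: set = field(default_factory=set)
    redactions: list = field(default_factory=list)
    duplicates: list = field(default_factory=list)  # populated on the master only


def tag_master(master, tag, redaction=None):
    """Apply a tag (and optional redaction) to a master and all its duplicates."""
    for doc in [master, *master.duplicates]:
        doc.tags.add(tag)
        if redaction is not None:
            doc.redactions.append(redaction)
```

Because the reviewer's decision is applied once and copied to every linked duplicate, two reviewers cannot tag identical documents differently.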

Search to create responsive set

One of the most common culling strategies is the use of search terms. Acumen’s search is powered by dtSearch Corp., allowing users to search the full text of files and their embedded documents. Users may also narrow a data set for review based on file-level attributes and criteria. Acumen then displays documents with responsive terms highlighted.


Choose from a menu of powerful Acumen import options to optimize speed. Process 1 gigabyte of data in approximately 1.5 hours. Leverage additional filters from Acumen's rich tool set by allowing 2 hours per gigabyte.
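
For rough planning, the per-gigabyte figures above translate into a simple estimate. The helper below is a hypothetical sketch that assumes processing time scales linearly with data size, which may not hold across different hardware or option sets.

```python
def estimate_hours(gigabytes, hours_per_gb=1.5):
    """Estimate processing time from data size and an hours-per-gigabyte rate."""
    if gigabytes < 0 or hours_per_gb <= 0:
        raise ValueError("size must be non-negative and rate must be positive")
    return gigabytes * hours_per_gb
```

Under that assumption, a 40 gigabyte collection would take about 60 hours at 1.5 hours per gigabyte, or about 80 hours at 2 hours per gigabyte with the additional filters enabled.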

Performance benchmarks from HP's Partner Technology Access Center confirm Acumen Import's performance when exercising the complete menu of Import's options on single or multi-server environments and when using SAN.

Performance results on HP BL460c. Jan 31, 2007

Performance results on HP DL380. Dec 22, 2006