Document Tracking: why DRM is needed to track PDF files

Why PDF document tracking needs DRM.

There are many reasons why document publishers may want to track PDF documents, but there are also many ways users can use documents without being tracked. That is why DRM is needed to enforce document use.

PDF document tracking (knowing where documents have been and when they have been used) has become increasingly important in the electronic world. Digital documents flow around the Internet the way water flows around the sea: in great quantities and quickly.

With little ownership or control vested in the documents and the information they contain, it might just be a matter of opinion whether they are in fact worth tracking at all.

What are the reasons for needing to track PDF documents and their use?

Some reasons are simply commercial: document owners want to know whether their documents are being used only by authorized people and only in approved locations. They also want to know whether documents are being printed, because printing may mean the documents are being distributed beyond their control. They may also want to actively prevent distribution in some countries or regions.

Other reasons can be regulatory. Companies may distribute procedures, processes and guidance to their staff as PDF documents for regulatory purposes, where they must be able to demonstrate that the documents were at least looked at, even if not understood.

But without any form of control over the PDF documents themselves, even information about where they may have been used is of little value.

PDF tracking issues to resolve

The first issue to resolve is ensuring that document content is controlled, so that recipients cannot alter it and pass on ‘fake news’ as if it were the real thing (see document tracking & control). If the content can be altered, who knows what is actually being monitored and tracked?

Prevent tracking ‘fake news’

The only technical method of protecting PDF documents successfully is Digital Rights Management (DRM). Its advantage is that all of a document’s content is protected from alteration and misrepresentation, so that a regulatory requirement cannot be forged or removed without destroying the document itself.

This goes some way to explaining why, unless you work with protected PDF information, you have little idea what is actually being tracked.

Authenticating the user being tracked

The second issue is knowing who is an authorized user, and for which protected PDF documents. This is very difficult: even the best artificial intelligence visual recognition systems need a whole infrastructure to support user recognition. They usually require a ‘registration’ process to learn who is to be recognized, willingness on the part of the user, and suitable camera technology on the device where the document is to be used.

Now that turns out to be a big deal. Users are already concerned about what their electronic identity can be used for, never mind what their visual identity could be put to. Even ignoring questions of reliability, the potential for ‘fake news’ is bigger and more convincing than ever before. After all, you saw it on video, so it must be true. There are also implementation problems where visual recognition meets data privacy protection: it is considered invasive, and the data it relies on must be treated as sensitive personal data.

DRM document protection takes a less zingy approach to identifying users, focusing on authorized machine identities, which are quite hard to forge and therefore reasonably reliable. The approach is flexible enough to let the same authorized user work on several different machines, each registered against their ‘user identity’, while avoiding many of the problems and complexities of privacy regulation, because machines are not people.
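To make the idea of machine-bound identities more concrete, here is a minimal Python sketch. It assumes that a machine identity is a SHA-256 hash of some hardware identifiers and that the publisher sets a per-user device limit. The names `machine_fingerprint`, `UserAccount` and `max_devices` are hypothetical and chosen for illustration; this is not how Locklizard or any particular DRM product is implemented.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


def machine_fingerprint(hardware_ids: list[str]) -> str:
    """Derive a stable machine identity by hashing hardware identifiers.

    Which identifiers a real DRM client reads is product-specific; the
    strings passed in here are purely illustrative.
    """
    canonical = "|".join(sorted(hardware_ids))  # order-independent input
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass
class UserAccount:
    user_id: str
    max_devices: int = 2  # hypothetical publisher-set device limit
    devices: set[str] = field(default_factory=set)

    def register_device(self, fingerprint: str) -> bool:
        """Register a machine against this user identity, up to the limit."""
        if fingerprint in self.devices:
            return True  # already registered
        if len(self.devices) >= self.max_devices:
            return False  # limit reached; the publisher would have to approve more
        self.devices.add(fingerprint)
        return True

    def is_authorized(self, fingerprint: str) -> bool:
        """Only registered machines may open this user's protected documents."""
        return fingerprint in self.devices


alice = UserAccount("alice")
laptop = machine_fingerprint(["cpu:1234", "disk:abcd"])
desktop = machine_fingerprint(["cpu:5678", "disk:ef01"])
tablet = machine_fingerprint(["cpu:9999", "disk:2222"])
print(alice.register_device(laptop))   # True
print(alice.register_device(desktop))  # True
print(alice.register_device(tablet))   # False: device limit of 2 reached
print(alice.is_authorized(tablet))     # False
```

Note that nothing in the account record describes the person: it holds only an opaque user ID and machine hashes, which is the sense in which machine identities sidestep the sensitivity of visual recognition data.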

Location tracking

The Internet uses Internet Protocol (IP) addresses to locate devices so that it can communicate with them. To make things more interesting, IP addresses can be static or dynamic. But they are administered and controlled, and they can tell you where they have been assigned: region, country and location, though not down to the level of identifying an individual. Using IP addresses as a further refinement therefore allows a different, and perhaps subtler, control for detecting and preventing unauthorized use.
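As a rough illustration of an IP-based location control, the Python sketch below maps address blocks to countries with the standard `ipaddress` module and refuses access from unapproved or unknown locations. The `ALLOCATIONS` table and the function names are hypothetical; a real system would query a maintained IP geolocation database, and the ranges used here are reserved documentation ranges, not real country allocations.

```python
from __future__ import annotations

import ipaddress

# Hypothetical allocation table for illustration only. A real deployment would
# query a maintained IP geolocation database; these are the reserved
# documentation ranges from RFC 5737, not real country allocations.
ALLOCATIONS = {
    ipaddress.ip_network("192.0.2.0/24"): "GB",
    ipaddress.ip_network("198.51.100.0/24"): "US",
    ipaddress.ip_network("203.0.113.0/24"): "FR",
}


def country_for(ip: str) -> str | None:
    """Return the country an address block is assigned to, if known."""
    addr = ipaddress.ip_address(ip)
    for network, country in ALLOCATIONS.items():
        if addr in network:
            return country
    return None


def access_allowed(ip: str, allowed_countries: set[str]) -> bool:
    """Permit use only from approved countries; unknown locations are refused."""
    country = country_for(ip)
    return country is not None and country in allowed_countries


print(access_allowed("192.0.2.15", {"GB"}))    # True
print(access_allowed("198.51.100.7", {"GB"}))  # False: assigned to "US" in the table
print(access_allowed("10.0.0.1", {"GB"}))      # False: not in the table
```

Refusing unknown addresses is a design choice that errs on the side of control; because addresses can be dynamic, a production system would also need to keep its allocation data up to date.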

DRM provides better administration and control for tracking PDF files

DRM technology relies on encryption to protect the information it controls. Exchanges between the administration system and protected document users are themselves encrypted. This makes it difficult to confuse the administration system with false messages. It also means that if tracking cannot take place, for example because the administration system is not available, access to the DRM-protected document can be denied, printing refused or other controls applied.
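The two properties described above (forged messages are rejected, and access fails closed when tracking is impossible) can be sketched in Python. Note a swapped-in technique: the sketch uses HMAC-SHA256 from the standard library, which authenticates messages but does not encrypt them; confidentiality would need a separate cipher or an encrypted channel such as TLS. The key, the message fields and the function names are all hypothetical.

```python
from __future__ import annotations

import hashlib
import hmac
import json

# Hypothetical shared key; a real DRM system uses its own key management.
SHARED_KEY = b"example-key-not-for-production"


def sign(message: dict, key: bytes = SHARED_KEY) -> str:
    """Compute an HMAC-SHA256 tag over a canonical JSON encoding of the message."""
    payload = json.dumps(message, sort_keys=True).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify(message: dict, tag: str, key: bytes = SHARED_KEY) -> bool:
    """Constant-time check that the tag matches, so forged replies are rejected."""
    return hmac.compare_digest(sign(message, key), tag)


def decide_access(reply: tuple[dict, str] | None) -> str:
    """Fail closed: allow use only when the server confirmed it logged the event."""
    if reply is None:  # administration server unreachable: no tracking possible
        return "deny"
    message, tag = reply
    if not verify(message, tag):  # forged or tampered reply
        return "deny"
    return "allow" if message.get("logged") is True else "deny"


confirmation = {"document": "policy.pdf", "event": "open", "logged": True}
print(decide_access((confirmation, sign(confirmation))))  # allow
print(decide_access((confirmation, "0" * 64)))            # deny: bad tag
print(decide_access(None))                                # deny: server unavailable
```

Failing closed trades convenience for control: an offline user is locked out, which is exactly the behaviour described above when the administration system cannot carry out tracking.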

What can PDF DRM based tracking achieve?

PDF DRM-based tracking lets document publishers track reasonable activities without becoming invasive for the end user. Nor does it require a massive back-end administration system to collect permissions from document recipients, record those permissions being granted and revoked, and maintain extended log files that would themselves need additional security. Such a system only becomes necessary if controls are added that are not fundamental to the performance of a contract (most likely, a contract to supply protected PDF information and track when it is used, for the purpose of verifying the identity of the user and their contracted use of the agreed documents).

That is a bit of a mouthful, but the point is to avoid monitoring which individual pages are viewed or which parts of the PDF DRM document are printed. These levels of monitoring may seem attractive, but they require several things to be in place to work:

The user must always be online to the administration server
The application must continuously poll the administration server to establish how long each page is read
You must take account of relevant privacy regulations, such as the EU General Data Protection Regulation (GDPR), which apply to collecting and storing such information

The results will need some interpretation because factors that cannot be controlled include:

More than one instance of the app used to view the protected PDF file may be running, confusing measurements of the time spent on a single page or across multiple pages
It is not always possible to interrogate a print job to find out what proportion of a document is being printed, or with which printer facilities

Web based DRM systems and document tracking

Be aware that any tracking system has limitations, some of which are imposed by the operating system provider or browser environment rather than by application design. For example, some web-based document DRM systems let you limit and track prints, but this control is effectively useless:

while you can count each print action made from the browser or viewer (i.e. File > Print or a print button click), you cannot tell whether the user then cancelled printing via the print queue, so the print allocation is used up even if the job was cancelled.
you cannot count the number of copies the user has set in the print dialog. So you might give someone, say, 1 print, but they set the number of copies to 5 in the print dialog and have already got around the ‘limited copies’ feature. The tracking control would count this as only one print.
users can print directly to PDF printers and other file printers, producing unprotected files that give them unlimited printing with no tracking.
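The first two limitations can be illustrated with a small Python model of a browser-based print counter that sees only the Print click. Everything here (`PrintRequest`, `BrowserPrintCounter`, `simulate`) is a hypothetical model written for this explanation, not code from any web DRM product, and it does not model printing to PDF or file printers.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PrintRequest:
    copies: int      # set in the printer dialog, which the web viewer cannot see
    cancelled: bool  # cancelled later in the OS print queue, also invisible


class BrowserPrintCounter:
    """Models a web viewer that only sees the Print action itself."""

    def __init__(self, allowance: int) -> None:
        self.allowance = allowance
        self.used = 0

    def on_print_clicked(self) -> bool:
        """Count one print per click and refuse once the allowance is used."""
        if self.used >= self.allowance:
            return False
        self.used += 1
        return True


def simulate(requests: list[PrintRequest], allowance: int) -> tuple[int, int]:
    """Return (prints counted by the viewer, copies actually produced)."""
    counter = BrowserPrintCounter(allowance)
    copies_produced = 0
    for request in requests:
        if counter.on_print_clicked() and not request.cancelled:
            copies_produced += request.copies
    return counter.used, copies_produced


print(simulate([PrintRequest(copies=5, cancelled=False)], allowance=1))
# (1, 5): counted as one print, yet five copies came out
print(simulate([PrintRequest(1, True), PrintRequest(1, False)], allowance=1))
# (1, 0): the cancelled job used up the allowance, so the retry is refused
```

The gap between the two numbers in each result is the information the browser never receives, which is why enforcement needs software on the client.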

So while web-based document tracking systems (for example, online or virtual secure data rooms) seem to be an easy and quick solution, they have many pitfalls, because there is no software installed on the client to fully control and enforce document use.

Matching Locklizard PDF DRM to tracking requirements

Locklizard, a premier supplier of PDF DRM control systems, has products that are well placed to address the issues identified above when tracking protected PDF documents and linking them to authorized users.

Not only are controls available to ensure that the correct document is being tracked, they also ensure that realistic data are collected and that the collection will not give rise to complaints about the potential use of personal data. Locklizard do not collect any personal data, either for their own processing or for processing by third parties. Any data that are collected are identified by the publisher of the protected PDF documents, and Locklizard make them available only to that publisher and no other.

So tracking document usage by user is readily achieved with a reliable system using PDF DRM technology.
