Managing secure distribution of confidential documents & PDF files: use of DRM in content life cycle
Defining content lifecycle
Content lifecycle is defined by some as the series of changes in the life of any piece of content, including reproduction, from creation onward. This definition is somewhat vague because the apparent importance of content may change to reflect changes in the collection of the content or changes to the perception of the value/importance/secrecy to be applied to the content. For instance, a person’s name without any other data is not confidential, but if it is connected to their medical records it certainly is.
Who is responsible for defining what content is confidential & secure document distribution?
So managing secure document distribution for confidential documents (use of Digital Rights Management) is a dynamic activity. There have been many attempts to determine by automated means when documents are confidential (some by text examination and others by analysing metadata) but these have all had their problems since classification is difficult, even for experts. The word ‘acquisition’ for example, may be confidential in the context of buying a corporation but uninteresting when buying a software package. Context may not help – “This will be an important acquisition,” does not tell us much unless a monetary sum is conveniently nearby.
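The difficulty can be illustrated with a minimal sketch of the "keyword plus nearby monetary sum" idea mentioned above. The patterns, the 60-character window and the function name `looks_confidential` are assumptions made for illustration, not a description of any real classification product.

```python
import re

# Illustrative patterns only (assumptions): a keyword and a rough monetary sum.
KEYWORD = re.compile(r"\bacquisition\b", re.IGNORECASE)
MONEY = re.compile(r"[$€£]\s?\d[\d,.]*\s?(?:million|billion|m|bn)?", re.IGNORECASE)


def looks_confidential(text: str, window: int = 60) -> bool:
    """Return True if a monetary sum appears within `window` characters of the keyword."""
    for match in KEYWORD.finditer(text):
        start = max(0, match.start() - window)
        end = match.end() + window
        if MONEY.search(text[start:end]):
            return True
    return False
```

A sentence such as "This will be an important acquisition." is not flagged, while "The acquisition of Acme for $40 million closes in May." is. The same rule also flags "Acquisition of a $50 software package", which is exactly the kind of misclassification that makes automated approaches unreliable.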
Generally, where confidential documents are to be distributed, it is known at a management level whether the content should be controlled or not. Responsibility for managing the situation falls either to IT systems managers, where the distribution is from computer application to computer application, or to departmental management, where the documents themselves are being distributed.
Distributing documents securely and the role of encryption
Exchanges between computer-based applications generally use encryption to protect the transfer from being intercepted or misused. Managing the document content itself has proved rather more difficult. Encryption on its own has proved inadequate for this: although it preserves the original content perfectly, it has no continuing control over how a file is used once it has been decrypted and returned to its original form.
A simpler approach to exerting continuing content control has emerged with the increasing availability of Digital Rights Management technologies that can be applied to documents in order to protect confidential content. This works because by using DRM it is possible to control subsequent use of the whole document instead of relying upon ‘conventions’ or ‘understandings’ as to how the content is to be controlled once it has been disclosed.
Is the content confidential?
The best example of the confidential content control problem is not, as many would think, the fact of the disclosures by Snowden in June 2013, but the results of what was disclosed: the extent to which US technology firms were using personal data, and the extent of US government access to such data. These revelations led the Court of Justice of the European Union to invalidate, in October 2015, the Safe Harbor Agreement between the United States and the European Union, which governed the transfer and protection of personal data and affected the commercial operations of many businesses.
Had DRM enforcing technologies been in use, Snowden would not have been able to publish all those PDF documents, and so the actual content and its confidential meaning (deduced from the presence of the content rather than being the content itself) would have remained under control.
Confidentiality through redaction
A commonly used alternative, although rather less successful in the world of digital documents, is the use of data redaction to control access to confidential information.
Data redaction renders only specific data items inaccessible; it does not seek to control the subsequent use of the document content. It works by blanking over the part of the document that is to be redacted while leaving the surrounding information untouched. In many systems this is achieved by putting a mask over the visible content without removing the underlying content. Because the underlying data remains, anyone able to access it without further control can both read and process the unredacted text.
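The difference between masking and true removal can be shown with a toy model. This is a sketch under stated assumptions: the `Page` class, its `masks` list and the two helper functions are hypothetical and do not correspond to any real PDF library.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """Toy page model: stored text plus (start, end) ranges drawn as black boxes."""
    text: str
    masks: list[tuple[int, int]] = field(default_factory=list)

    def render(self) -> str:
        """What a viewer displays: masked ranges appear as solid blocks."""
        chars = list(self.text)
        for start, end in self.masks:
            chars[start:end] = "█" * (end - start)
        return "".join(chars)

    def extract_text(self) -> str:
        """What copy/paste or a text-extraction tool sees: the stored text."""
        return self.text


def mask_only(page: Page, start: int, end: int) -> Page:
    """Weak redaction: add a visual mask but keep the underlying text."""
    return Page(page.text, page.masks + [(start, end)])


def remove_and_mask(page: Page, start: int, end: int) -> Page:
    """Stronger redaction: overwrite the underlying text as well as masking it."""
    cleaned = page.text[:start] + " " * (end - start) + page.text[end:]
    return Page(cleaned, page.masks + [(start, end)])
```

Both variants look identical when rendered, but only `remove_and_mask` stops the redacted words from being recovered through `extract_text()`.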
DRM controls are much more successful because they are able to enforce redaction by preventing access to the underlying code that creates it, as well as being able to prevent onwards use of controlled documents.
Content available for a limited time
According to James Bond, “Diamonds are forever,” but content is often not to keep and use forever. Frequently, controlled content has limited accessibility. This may be because the information has been superseded and is no longer valid, or because it was only given a certain lifetime to begin with. Controls are also needed to prevent content from being available before it is officially released.
So content management controls need to take into account a number of dates and limits: the date before which the content may not be used; the length of time for which the content can be used (some legislatures mandate a maximum of 6 years from collection for the use of personal information); the date on which the content ceases to be available, or on which it becomes publicly available; the length of time content can be used after first being used; and the number of times that the content may be used by a specific user. Some of these content controls can be technically difficult to enforce.
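The dates and limits listed above can be combined into a single access check. The following is a minimal sketch, not a description of how any particular DRM product works; the names `UsagePolicy`, `UsageState`, `may_open` and `record_open` are assumptions introduced for illustration, and all timestamps are assumed to be timezone-aware.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class UsagePolicy:
    not_before: Optional[datetime] = None   # date before which content may not be used
    not_after: Optional[datetime] = None    # date on which content ceases to be available
    valid_for: Optional[timedelta] = None   # permitted period after first use
    max_opens: Optional[int] = None         # number of uses allowed for this user


@dataclass
class UsageState:
    first_opened: Optional[datetime] = None
    opens: int = 0


def may_open(policy: UsagePolicy, state: UsageState, now: datetime) -> tuple[bool, str]:
    """Check each control in turn and report the first one that blocks access."""
    if policy.not_before is not None and now < policy.not_before:
        return False, "not yet released"
    if policy.not_after is not None and now >= policy.not_after:
        return False, "expired"
    if (policy.valid_for is not None and state.first_opened is not None
            and now >= state.first_opened + policy.valid_for):
        return False, "usage period ended"
    if policy.max_opens is not None and state.opens >= policy.max_opens:
        return False, "open limit reached"
    return True, "ok"


def record_open(state: UsageState, now: datetime) -> None:
    """Update per-user state after a permitted open."""
    if state.first_opened is None:
        state.first_opened = now
    state.opens += 1
```

The sketch keeps the per-user state separate from the policy so that the content owner can later change a date or limit without losing the record of what has already been used.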
When talking about time (as far as computers are concerned), many people regard ‘network time’ as highly accurate and reliable. If you are working to the nearest hour it is probably good enough, as long as there are not too many possible routings. Aiming for sub-second accuracy requires specific hardware on the computer checking the time. Another time factor is the time zone being used. Most people consider the time to be the time where they are located, but in a global operation this is impractical and there has to be more structure when deciding which ‘day’ you are in. A typical solution is to use Coordinated Universal Time (UTC). It is kept within a second of mean solar time at 0 degrees longitude and does not use daylight saving time, so it provides a stable reference for measuring the hour or the day when you are controlling global access by time and date. This is an ideal approach to setting start and stop dates that will be consistent.
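A short sketch shows why a release date expressed in UTC gives one answer everywhere, whatever the local calendar says. The release instant, the function name `is_released` and the fixed offsets standing in for local time zones are assumptions for illustration; real zones also change offset with daylight saving.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical release instant, stored once in UTC.
RELEASE_UTC = datetime(2024, 6, 1, 0, 0, tzinfo=timezone.utc)


def is_released(local_time: datetime) -> bool:
    """Compare a timezone-aware local timestamp against the UTC release instant."""
    if local_time.tzinfo is None:
        raise ValueError("timestamp must carry a UTC offset")
    return local_time.astimezone(timezone.utc) >= RELEASE_UTC


# Fixed offsets used as stand-ins for two user locations (assumed values).
UTC_PLUS_9 = timezone(timedelta(hours=9))
UTC_MINUS_7 = timezone(timedelta(hours=-7))

# Local date is already 1 June, but in UTC it is still 31 May, 23:00.
early_user = datetime(2024, 6, 1, 8, 0, tzinfo=UTC_PLUS_9)

# Local date is still 31 May, but in UTC it is 1 June, 00:30.
late_user = datetime(2024, 5, 31, 17, 30, tzinfo=UTC_MINUS_7)
```

Here `is_released(early_user)` is false and `is_released(late_user)` is true, even though the local calendar dates suggest the opposite.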
Content kept confidential by location
Further content controls may be applied at the IP address level. At one level, this kind of content control is used to prevent data leakage outside of specific networks or groups of networks. Either all documents are restricted to one or more individual network addresses or groups of addresses, or specific users are restricted to a range of acceptable addresses from which they can use protected content.
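Both variants, a document-wide list of networks and a per-user range, can be sketched with Python's standard `ipaddress` module. The network ranges, the user name and the function `ip_allowed` are hypothetical values chosen for illustration.

```python
import ipaddress

# Hypothetical default: networks from which any protected document may be used.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("10.20.0.0/16"),
    ipaddress.ip_network("192.0.2.0/24"),
]

# Hypothetical per-user restriction that overrides the default list.
USER_NETWORKS = {
    "alice": [ipaddress.ip_network("10.20.5.0/24")],
}


def ip_allowed(user: str, address: str) -> bool:
    """Return True if `address` falls inside a network permitted for `user`."""
    ip = ipaddress.ip_address(address)
    networks = USER_NETWORKS.get(user, ALLOWED_NETWORKS)
    return any(ip in network for network in networks)
```

With these values, `ip_allowed("alice", "10.20.6.1")` is false because of her narrower range, while the same address is accepted for a user who has no individual restriction.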
IP-level controls may also be extended to determine the geographic locations that are permitted. This may prove to be an essential technique when some documents are embargoed in some countries, or where content is country-specific. For instance, French content may be restricted to France and Canada and not available in Belgium, where English content may be provided instead. Also, some countries may forbid or censor content and prosecute publishers who fail to prevent content use when it is not approved.
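Once an IP address has been resolved to a country (assumed here to be done by a separate geolocation lookup that is not shown), the country rule itself is a simple table. The edition codes, country sets and function name `editions_for` below are illustrative assumptions that mirror the example in the text.

```python
# Hypothetical mapping of content editions to the ISO country codes
# in which each edition may be used.
EDITION_COUNTRIES = {
    "fr": {"FR", "CA"},   # French content: France and Canada
    "en": {"BE"},         # English content: Belgium
}


def editions_for(country_code: str) -> list[str]:
    """Return the editions that may be opened from the given country."""
    code = country_code.upper()
    return sorted(
        edition
        for edition, countries in EDITION_COUNTRIES.items()
        if code in countries
    )
```

A country that appears in no set, such as one where the content is embargoed, receives an empty list and therefore no access.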
Auditing use of confidential content
Underpinning content control is the provision of audit information about the use of content. It is all well and good having applied content controls, but how do you know they are working? Where content is being used, logs need to be maintained that list which users have opened the documents, at what time (UTC is simplest), and which IP address they used at the time. This may also enable content controllers to identify content use in locations that are inconsistent (a user appearing in a country they are not expected to be in, for instance, or being in several places at the same time) and that might require investigation. When printing is allowed, controllers may require a log of each time particular content was printed, as well as control over the number of prints that may be made. In combination these are valuable tools for auditing user activity or as a basis for charging for use.
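A minimal sketch of such a log and two checks on it follows. The record layout `AuditEvent`, the one-hour window and the function names are assumptions for illustration; the `country` field is assumed to come from a geolocation lookup on the logged IP address.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class AuditEvent:
    user: str
    document: str
    action: str       # "open" or "print"
    when: datetime    # recorded in UTC
    ip: str
    country: str      # assumed result of a geolocation lookup on `ip`


def print_counts(events: list[AuditEvent]) -> Counter:
    """Count prints per (user, document), e.g. for print limits or charging."""
    return Counter((e.user, e.document) for e in events if e.action == "print")


def location_conflicts(
    events: list[AuditEvent], window: timedelta = timedelta(hours=1)
) -> list[tuple[AuditEvent, AuditEvent]]:
    """Find consecutive events by one user from different countries within `window`."""
    by_user: dict[str, list[AuditEvent]] = {}
    for event in sorted(events, key=lambda e: e.when):
        by_user.setdefault(event.user, []).append(event)

    conflicts = []
    for user_events in by_user.values():
        for earlier, later in zip(user_events, user_events[1:]):
            if earlier.country != later.country and later.when - earlier.when < window:
                conflicts.append((earlier, later))
    return conflicts
```

A pair returned by `location_conflicts` does not prove misuse, since a VPN or proxy can produce the same pattern, but it identifies the activity that might require investigation.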
Summary: Secure document distribution and content management
In summary, secure document distribution and content management for confidential documents may be achieved by using DRM-class technologies. These enable you to bind persistent controls to the content. Confidential content in a lifecycle management sense has many dimensions beyond file access controls: controls are needed by date, number of accesses, number of printed copies made, and location in order to allow the content owner to be effective in managing confidentiality. Further, there must be provision for the content owner to change their mind about controls such as dates or prints.