Thoughts On Data Privacy

I do not believe that a correct and appropriately flexible privacy regime starts with redaction and filtering. In my opinion, those are details of the presentation layer, if you will: the part of the system that *acts* on the lease. The correct foundation is the definition of a system that conveys the intent of the data owner to all recipients and users of the data.

Our job has several parts: providing a language for the expression of that intent; reorganizing the data model and infrastructure to facilitate that (and, probably, to encode defaults sufficient to prevent grievous violations of local policy); providing mechanisms for administrators to encode local policy; and, crucially, allowing the owner (or originator, if that's the correct entity) to specify a particular intention about a specific datum. For example, a parent should be able to say, differently from every other parent, that the child's middle name is too embarrassing to be revealed and that he or she, as the owner of that data, requires that it never be shown to anyone, or be shown only to psychiatric personnel.
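
To make the layering concrete, here is a minimal sketch of how defaults, local policy, and owner intent might be resolved for a single datum. Everything in it is an assumption for illustration: the class `PolicyLayers`, the audience names, the keys such as `child-1`, and in particular the precedence order (owner over local policy over defaults), which this note does not settle.

```python
from dataclasses import dataclass, field

# Audience categories here ("staff", "psychiatric") are made-up examples.
NOBODY: frozenset = frozenset()


@dataclass
class PolicyLayers:
    """Three layers of intent, from most general to most specific."""
    defaults: dict = field(default_factory=dict)  # field name -> audiences
    local: dict = field(default_factory=dict)     # administrator policy
    owner: dict = field(default_factory=dict)     # (datum key, field) -> audiences

    def audiences(self, datum_key: str, field_name: str) -> frozenset:
        # Assumed precedence: owner intent beats local policy beats defaults.
        if (datum_key, field_name) in self.owner:
            return self.owner[(datum_key, field_name)]
        if field_name in self.local:
            return self.local[field_name]
        # A field with no rule at all is shown to nobody (assumed safe default).
        return self.defaults.get(field_name, NOBODY)


layers = PolicyLayers(
    defaults={"middleName": frozenset({"staff"})},
    local={"middleName": frozenset({"staff", "nurse"})},
)
# One parent, unlike every other, restricts one child's middle name.
layers.owner[("child-1", "middleName")] = frozenset({"psychiatric"})
```

With this setup, `layers.audiences("child-1", "middleName")` yields only the psychiatric audience, while any other child's middle name falls back to the local policy. An owner entry of `frozenset()` would express "never show this to anyone."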

Further, assuming that the lease is the central construct, I think that it should probably not identify specific targets (absolutely zero refIds) but instead identify them by category. For comparison, most rental leases have text that specifies that "the renter shall...", with the actual name of the individual specified in a header at the time of instantiation (i.e., when the contract is signed). RefIds should be specified in an instance-specific mechanism that connects to the abstracted spec ("the renter") to instruct the output mechanism what to do/filter/redact. (Perhaps we should define a mechanism for this interpretation, but I am skeptical.)
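
The split between the abstract lease and its instantiation could look something like the sketch below. It is only an illustration under stated assumptions: `Lease`, `LeaseClause`, `LeaseBinding`, the role names, the `show`/`redact` vocabulary, and the refId strings are all hypothetical, as is the choice to redact when nothing matches.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LeaseClause:
    role: str        # categorical party, e.g. "counselor"; never a refId
    field_name: str
    action: str      # "show" or "redact" (assumed vocabulary)


@dataclass(frozen=True)
class Lease:
    """The abstract contract text: it speaks only of roles."""
    clauses: tuple


@dataclass(frozen=True)
class LeaseBinding:
    """The 'header' filled in at signing time: refId -> role."""
    role_by_ref_id: dict


def action_for(lease: Lease, binding: LeaseBinding,
               ref_id: str, field_name: str) -> str:
    role = binding.role_by_ref_id.get(ref_id)
    for clause in lease.clauses:
        if clause.role == role and clause.field_name == field_name:
            return clause.action
    # Unknown party or unlisted field: redact (assumed safe default).
    return "redact"


lease = Lease(clauses=(
    LeaseClause("counselor", "middleName", "show"),
    LeaseClause("teacher", "middleName", "redact"),
))
# Hypothetical refIds, bound to roles outside the lease itself.
binding = LeaseBinding({"REF-A": "counselor", "REF-B": "teacher"})
```

The same `lease` can be reused with a different `LeaseBinding` at another site, which is the point of keeping refIds out of it.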

I also think that the specification of the lease should be built into the structure of the data model (though I also think it must include a mechanism that allows administrators to override it, e.g., with XPath metadata). One way to do this would be to add an element to each complex type that tells us about the privacy components of its subordinates, which, in turn, have such an element, too. Perhaps a sort of inline CSS for privacy.
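
As a rough illustration of the "inline CSS" idea, the sketch below lets each element inherit its privacy level from its nearest ancestor unless it declares its own, and lets an administrator force a level by path. The sample document, the `privacy` attribute (used instead of a child element only for brevity), the level names, and the fallback of `restricted` are all assumptions. The override paths use the limited XPath subset supported by Python's `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; "privacy" is an invented attribute.
DOC = """
<Student privacy="staff">
  <Name>
    <First>Ada</First>
    <Middle privacy="owner-only">Hortense</Middle>
  </Name>
  <Address><City>Springfield</City></Address>
</Student>
"""


def effective_privacy(root, overrides=None):
    """Return {path: level}, cascading levels down the tree like CSS."""
    forced = {}
    for path, level in (overrides or {}).items():
        # Administrator overrides, addressed by path relative to the root.
        for element in root.findall(path):
            forced[element] = level

    result = {}

    def walk(element, inherited, parent_path):
        # Order of preference: admin override, own declaration, inherited.
        level = forced.get(element, element.get("privacy", inherited))
        here = f"{parent_path}/{element.tag}"
        result[here] = level  # note: repeated sibling tags would collide
        for child in element:
            walk(child, level, here)

    # With no declaration anywhere, fall back to "restricted" (assumed).
    walk(root, "restricted", "")
    return result


levels = effective_privacy(ET.fromstring(DOC))
overridden = effective_privacy(ET.fromstring(DOC), {"./Address": "district-only"})
```

Here `Middle` keeps its own `owner-only` declaration, `First` and `City` inherit `staff` from `Student`, and the override on `./Address` cascades to `City` in the second call.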

Filtering/redaction should, I think, be considered a ubiquitous part of the rendering system, with the definition of rendering expanded to include processing the data for transmission to another system. That is, the lease should be considered the essence of the privacy regime, and filtering/redacting merely *one* of the ways that systems respond to it.

A use case that exemplifies my thinking: a system receives an object that includes sensitive data that it is allowed to have. Some bonehead thinks, "I have this and am allowed to do what I need to do." Then he or she grabs the object and transmits it to someone else. The software tool that unpacks that received object for viewing by the recipient should be able to look at the privacy data implicit in the object and say to itself, "Holy cow! My user is not X. I will only show him or her the stuff that is allowed." The privacy regime should be built to be as robust as possible against errors, including helping systems prevent violations even if someone else did a bad thing.
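
A sketch of what that receiving tool might do follows. It assumes, purely for illustration, that the object arrives as an envelope carrying both the data and a lease that maps each field to the roles allowed to see it; the envelope shape, the role names, and `render_for` are hypothetical.

```python
REDACTED = "[redacted]"


def render_for(viewer_roles, envelope):
    """Show only what the lease travelling with the object allows."""
    lease = envelope.get("lease", {})
    visible = {}
    for name, value in envelope["data"].items():
        # A field with no lease entry is shown to nobody (assumed default).
        allowed = set(lease.get(name, ()))
        visible[name] = value if allowed & set(viewer_roles) else REDACTED
    return visible


# The object as forwarded by the careless sender, lease still attached.
envelope = {
    "data": {"firstName": "Ada", "middleName": "Hortense", "notes": "n/a"},
    "lease": {
        "firstName": ["teacher", "counselor"],
        "middleName": ["counselor"],
    },
}

teacher_view = render_for(["teacher"], envelope)
counselor_view = render_for(["counselor"], envelope)
```

Because the lease travels with the data, the viewer can do the right thing for its own user without knowing anything about how or why the object was forwarded.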

The current perspective being applied in much of this conversation is inward-looking. The lease proposed by Jon (and I apologize if this is too blunt and hope you are sincere about this being a strawman) talks only about how he, as the operator of a system, can protect himself from violating the rules. A good privacy regime should rise above those concerns and provide the information that lets him, and everyone else, protect themselves.

Which is to say that the lease and the information it conveys should allow Jon to say, "I have to filter," but it should also allow the system that receives the data (and the ones after that) to decide what they should do.