I recently found myself in a surreal discussion in which it was suggested to me that metadata was like poetry.
I was asked to consider the line 'I wandered lonely as a cloud' and it was put to me that 'I' am the data, 'lonely' is the metadata, and 'as a cloud' is the meta-metadata. My interlocutor beamed at me with pride, and waited for my confirmation of their brilliance.
My heart sank as I realised that analogies are really not very helpful.
Like many in Enterprise Information Management (EIM), I have been guilty of using them far too much. I have now promised myself to stop using analogies, and instead to make the effort to understand how others see the world, and to explain Enterprise Information Management in terms they understand, about data they really use.
Why do we use analogies?
A lot of the material in EIM is pretty abstract, and it only really makes sense once you have already done it a couple of times.
Anyone working in EIM will be familiar with the challenge of explaining what it is and why you should do it. Metadata is a great example, and the simple definition of 'data about data' doesn't really help anyone new to the topic. So there's a temptation to introduce analogies. As well as poetry, a recent project I worked on threw up analogies including fingerprints, traffic rules and photography. We use them to try to explain concepts we understand to someone who doesn't.
But analogies don't help
There are two problems with analogies. The first is that they eventually break down. Once you have introduced an analogy, the discussion inevitably explores it further and you end up debating where it is valid and where not. The second problem is that an analogy only really works for the person who came up with it. One of my favourites is weeding a garden as an analogy for Data Quality improvement.
To me, it makes sense, because the weeds will always come back, whatever I do. So although I can aim for a weed-free garden, I know I’ll never achieve it. It's the same with Data Quality: while you may aim for zero defects, you can never actually achieve this. For those who don't garden, it doesn't help.
For those who do garden, the discussion moves on quickly to dandelions, moss and bindweed. Either way, we're not getting very far with Data Quality, and the analogy breaks down because no one ever creates a weeding dashboard or assigns gardening stewards.
From the specific to the general – and back again
One of the problems comes from the way that we in Enterprise Information Management think. As a group, we tend to look for patterns and are constantly seeking general abstractions in a sea of specific examples.
A good example is the party data model, which was born from the observation that customers, suppliers, employees, representatives and so on share common attributes, can be generalised as persons or organisations, and can be related to one another. Other generalised data models have been developed over the years. It's what we do; we can't help it. That's why we ended up in EIM in the first place.
The problem is that the rest of the world doesn't think like this. Most people consider customers and suppliers to be fundamentally different, and a party is an event where people celebrate. We need to get back to specifics that are relevant to our stakeholders and give them concrete examples of what we’re talking about.
From poetry to SOAP
To move the conversation away from poetry and on to something more useful, I dug a little deeper to find something more specific and relevant to someone seeking to understand metadata. The guy I was talking to had experience of integrating systems, so we talked about exchanging data between two or more systems. He suggested SOAP as his preferred protocol. Then we discussed how a SOAP message is specified as an XML Information Set, and that this is an example of metadata.
As we were on familiar ground, I could explain to him why such an Information Set would need to be owned, why it should be approved, why changes should be carefully managed, and what the risks of an incomplete definition would be. From this example, which he understood, the definition of 'data about data' made sense, the need for formally managing it made sense, and he could begin to understand that these principles would apply in other scenarios.
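For readers who also come from an integration background, the distinction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `GetCustomerResponse` payload, its `http://example.com/customer` namespace and the helper `split_data_and_metadata` are all invented for the example and are not taken from the project described above. The element names and namespaces stand in for the metadata that would need an owner and change control; the text values are the data they describe.

```python
import xml.etree.ElementTree as ET

# Hypothetical SOAP 1.2 message. The envelope namespace is the standard one;
# the GetCustomerResponse payload and its namespace are invented for illustration.
SOAP_MESSAGE = """
<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope">
  <env:Body>
    <c:GetCustomerResponse xmlns:c="http://example.com/customer">
      <c:CustomerId>C-1001</c:CustomerId>
      <c:Name>Ada Lovelace</c:Name>
    </c:GetCustomerResponse>
  </env:Body>
</env:Envelope>
""".strip()


def split_data_and_metadata(xml_text):
    """Return (metadata, data): qualified element names and the values they carry."""
    root = ET.fromstring(xml_text)
    metadata, data = [], []
    for element in root.iter():
        # The qualified name says what a value means: this is the agreed structure.
        metadata.append(element.tag)
        text = (element.text or "").strip()
        if text:
            # The text content is the actual business data being exchanged.
            data.append(text)
    return metadata, data


metadata, data = split_data_and_metadata(SOAP_MESSAGE)
for name in metadata:
    print("metadata:", name)
for value in data:
    print("data:", value)
```

If the sending system renamed `CustomerId` without telling anyone, the values would still arrive but the receiver would no longer know what they meant, which is the risk of unmanaged metadata in concrete form.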