Sunday, January 11, 2009

Metadata or data

I get quite irritated (sorry, no patience) when I hear others draw a sharp distinction between metadata and transactional data. I really don't agree.

The arguments generally go something like this: "Metadata is mostly read-only. Transactional data is written much more frequently. Metadata has different access patterns." (I don't even know what that last one means :) )

I find that to be hogwash. That describes usage, not kinds of data. I do not like categorizing data. It's like nominal typing: it limits the data's broader viability and usability after the fact. Any data at any given time can be more like metadata or more like transactional data. For example:

- To an engineer, a bill of materials is transactional data when designing a product. But to a resource planner, the bill of materials is metadata that drives materials planning, purchasing and manufacturing scheduling.
- A web page is transactional data during development, but metadata at runtime (unless it is self-modifying code).
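To make the bill of materials example concrete, here is a minimal Python sketch. The product, component names, and both functions are hypothetical and exist only for illustration. The point is that the same `bom` structure is written by one piece of code (the engineer's design-time edits) and only read by another (the planner's calculation), so whether it "is" metadata depends on who is using it.

```python
from collections import Counter

# Hypothetical bill of materials: product -> {component: quantity per unit}.
bom = {
    "bicycle": {"frame": 1, "wheel": 2, "pedal": 2},
}

def engineer_revise(bom, product, component, qty):
    """Design-time usage: the BOM itself is what gets written (transactional)."""
    bom.setdefault(product, {})[component] = qty

def plan_materials(bom, orders):
    """Planning usage: the BOM is read-only input that drives the computation."""
    needs = Counter()
    for product, units in orders.items():
        for component, qty in bom[product].items():
            needs[component] += qty * units
    return dict(needs)

# The engineer adds a component; the planner then uses the same data to
# work out what to purchase for an order of 10 bicycles.
engineer_revise(bom, "bicycle", "chain", 1)
print(plan_materials(bom, {"bicycle": 10}))
# prints {'frame': 10, 'wheel': 20, 'pedal': 20, 'chain': 10}
```

Nothing in the data structure marks it as one kind or the other; only the two usages differ.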

So, it just depends on the usage. Don't categorize the data, just understand the usage.

As for Oslo, I assert that we are building a broad set of capabilities to describe, validate, transform, access, and store data. Sure, Oslo's primary scenarios and our current investments are targeted at data that describes runtimes. However, our ambitions are bigger, and our architecture and designs are not limited or myopic in their thinking. If they are, please help us. After all, data is just data.

3 comments:

Jon said...

Your point about "one man's data is another man's metadata" is a good one, but not unique.
For example, to the Social Security Administration, an SSN is a primary key - a unique (hopefully) identifier without intrinsic meaning. Ditto for your account number at your bank.
But to someone else, those are deeply endowed with meaning and so make TERRIBLE primary keys (leaving aside privacy and legal considerations).

I've actually never heard of the apparent war between transactional data and metadata. Usually you hear about transactional data versus analytic data, as in "happening now" versus "analyzed/reported on later". But then I'm a classic database dude, which is not really your scope in this discussion.

Metadata is data, but it is data describing other data. In my DB world, the number of rows in a table is metadata, the column data types are metadata.

Maybe you could say data is the thing you are interested in (whoever you are at whatever level in the process) and metadata is anything "above" that level.
Data to a database user is the content of a table.
Data to a database designer is the arrangement of columns and datatypes.
Data to an operating system programmer is the arrangement of bits in the file system.

And of course, anything "below" your level, well that is technically termed "irrelevant crap". :-)

Pinky said...

Yep - the problem is that sometimes we myopically only think about one or the other.

Colin Jack said...

"Any data at any given time can be more like metadata or more like transactional data."

Definitely, but usually within one model data is either metadata or transactional data, and separating them can help. Slightly different case, but I'd recommend looking at the Knowledge Level in DDD; introducing it can really help to clarify your model.



"To an engineer, bill of materials is transactional data when designing a product. But, to a resource planner, the bill of materials is metadata that drives materials planning, purchasing and manufacturing scheduling."

The question is: can one model serve both, and should it? It depends on the situation, but it's quite possible you have multiple bounded contexts here and multiple models in play.