[Rant] MarkTime's Metadata Wack Attack | ANN.lu
Posted on 02-Dec-2003 18:00 GMT by MarkTime (Edited on 2003-12-02 21:51:39 GMT by Teemu I. Yliselä) | 13 comments
The unfortunately titled MarkTime's Metadata Wack Attack aims to be an introductory discussion of metadata.
Good communication is an art form, not a science. And yet science provides cues about the makeup of good communication. I found an example of this conundrum when reading the introduction page for the government's metadata standards committee. The government’s premier experts on naming conventions define themselves this way:
>>NCITS L8 (formerly X3L8), on Data Representation, is a technical committee of National Committee on Information Technology Standards (NCITS) Accredited Standards Committee X3, which is accredited by ANSI, the American National Standards Institute.
The description is technically accurate, and yet I found it nearly meaningless. The world's foremost group of scientists working on naming data named themselves 'NCITS L8'. The term NCITS L8 is not descriptive, even with the helpful information that the group was formerly called X3L8. It has a number at the end, implying it is a member of a series; I felt stupid for not knowing about NCITS L7 either. To put it bluntly, this is exactly the type of name expected from a government agency: cryptic and nearly impossible to understand. Thankfully, most people in the industry refer to NCITS L8 as the ‘Metadata Standards Committee’. That term is descriptive, and because it is so easy to understand, it is also easier to remember.
Here is the issue for an organization: putting too much effort into naming standards is not only a waste of time, but can also produce comical and bureaucratic results. However, a lack of attention to naming standards may have even more dire consequences: incongruent names, outdated or inaccurate information, and, as a data system grows larger, a system that becomes unmanageable.
The solution must always be a compromise: an organization must define standards and enforce a reasonable set of rules. Beyond these rules, a skilled communication artisan can distinguish himself or herself. It is an art, and with practice and attention, the information architect can noticeably improve their skills, to the benefit of the group. An organization that fosters an environment of good communication achieves better communication.
The purpose of this article is to outline suggestions for better metadata naming standards.
The Amiga community includes some organizations that have data collection needs (customers, users, sales, etc.). If some thought is given to standards at the time the data is collected (right now, while all these groups are small), the result will be better-formed data that is easier to exchange and whose meaning is well known. Data collected without any specific reasoning will become an unmanageable expense over time.
Thankfully, if a company becomes that successful, it can afford to pay for its lack of preparation by hiring an expert, so give me a call.
Another purpose of this article is just the usual rant, and Amigans are geeks who love this sort of thing.
Main
----
"Humans are aware of anything that exists in the natural world through its properties. Data represents the properties of these things."
--from the introduction to: ISO/IEC 11179-1:1999(E)
The international standards community, primarily through ISO (the International Organization for Standardization) and IEC (the International Electrotechnical Commission), is developing international standards for data representation.
In this way, information may be exchanged globally, with less confusion resulting from language barriers, cultural differences, or industry differences, or from data that adheres to undocumented or nonexistent standards.
ISO/IEC 11179 is a work in progress that has been in development for over a decade. It represents some of the best thinking on defining metadata standards.
In addition, large software corporations such as Microsoft and Oracle currently publish their own standards, which are commonly followed.
The purposes of these standards are:
- allow data definitions to support the needs of multiple users, departments, companies, and countries
- reduce redundancy of definitions
- recognize standard data, for example, addresses and identifiers
- give uniform guidance for the development and description of data
- allow data to be exchanged with data modeling software
Large corporations already deal with standards in some areas. Interfaces to shipping systems require address standardization. Government regulatory requirements often impose a need for standard data.
An organization will benefit from defining standards across the whole organization, both to expedite data exchanges with other organizations, companies, and governments and to ease exchanges between its own employees and departments. For example, when standards are defined and applied, a future employee will more easily understand the work of a current employee.
Concepts
--------
Developing Definitions
The metadata standards committee gives some recommendations for defining a data element. A definition should:
- be unique
- be stated in the singular
- state what the concept is, not only what it is not
- be stated in a descriptive way
- use only abbreviations that are commonly understood
- be expressed without relying on the definition of another data element
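Two of these rules lend themselves to a mechanical check. The following is a minimal Python sketch, not anything defined by ISO/IEC 11179-4: it assumes definitions are kept in a plain dictionary keyed by data element name, and the function `lint_definitions` and the sample entries are hypothetical. Uniqueness is approximated by comparing normalized definition text, and a definition that opens with "not" is treated as purely negative; both are crude heuristics.

```python
from collections import defaultdict


def lint_definitions(definitions: dict[str, str]) -> list[str]:
    """Flag definitions that break two mechanically checkable rules (heuristics only)."""
    problems = []

    # "Each definition should be unique": group element names by normalized text
    # (lower case, collapsed whitespace) and report any group with more than one name.
    by_text = defaultdict(list)
    for element, text in definitions.items():
        by_text[" ".join(text.lower().split())].append(element)
    for elements in by_text.values():
        if len(elements) > 1:
            problems.append("duplicate definition: " + ", ".join(sorted(elements)))

    # "State what the concept is, not only what it is not": a definition that
    # opens with "not" is assumed to be purely negative.
    for element, text in definitions.items():
        if text.strip().lower().startswith("not "):
            problems.append(f"negative-only definition: {element}")

    return problems


# Hypothetical data dictionary entries.
sample = {
    "CAR_TOTAL_AMOUNT": "The total purchase price of a car.",
    "AUTO_TOTAL_AMOUNT": "The total  purchase price of a car.",
    "CAR_USED_INDICATOR": "Not a new car.",
}
for problem in lint_definitions(sample):
    print(problem)
# duplicate definition: AUTO_TOTAL_AMOUNT, CAR_TOTAL_AMOUNT
# negative-only definition: CAR_USED_INDICATOR
```

Rules such as "be stated in a descriptive way" still need a human reviewer; a script like this only catches the obvious slips.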
I recommend reading ISO/IEC 11179-4 for further discussion of these topics. Here, briefly, I will discuss the concept of defining data in the singular.
It is especially interesting because Oracle Corporation recommends that table names always be stated in the plural.
Who is correct?
There are good arguments for both sides. In Oracle8i: The Complete Reference (3rd edition), Koch and Loney argue in favor of the singular naming convention on the grounds that it is good English. They use the examples phone book, restaurant list, and address book. We don't say, for example, ‘addresses book’. In English we most commonly use the singular.
However, if we were to label a container, for example a jar of pickles, we would write the word 'pickles' on the label, not the word ‘pickle.’ Writing a singular noun on a container seems to imply that it holds only one of that item, which may be inaccurate.
It is an interesting discussion, but the important thing is simply to be consistent, so that you never have to recall whether a table name is singular or plural. Inconsistency causes many coding errors and inefficiencies.
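Consistency is easy to audit once a convention is chosen. A minimal sketch, assuming table names are upper-case words joined by underscores; the helper names are hypothetical, and the plural test (a last word ending in S but not SS) is a deliberately rough heuristic that will misjudge words such as STATUS or irregular plurals such as PEOPLE.

```python
def guess_plurality(table_name: str) -> str:
    """Rough guess: a last word ending in S (but not SS) is treated as plural."""
    last_word = table_name.upper().split("_")[-1]
    if last_word.endswith("S") and not last_word.endswith("SS"):
        return "plural"
    return "singular"


def inconsistent_tables(table_names: list[str]) -> list[str]:
    """Return the tables whose guessed plurality differs from the majority convention."""
    labels = {name: guess_plurality(name) for name in table_names}
    plural_count = sum(1 for label in labels.values() if label == "plural")
    # Ties fall back to singular; a real project would simply declare its convention.
    majority = "plural" if plural_count * 2 > len(labels) else "singular"
    return [name for name, label in labels.items() if label != majority]


tables = ["CUSTOMER", "ORDER", "ADDRESS", "INVOICES"]
print(inconsistent_tables(tables))  # ['INVOICES']
```

Declaring the convention explicitly, rather than inferring it from a majority vote, is the more robust choice; the vote here only keeps the sketch self-contained.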
Naming Principles
The number one rule of naming conventions is that you name something only once.
For example, Oracle recommends that if a department number is named deptno in one table, it should not be named dept_num in another table.
This may seem like a simple rule, but following it is extremely difficult: it takes a great deal of attention, a good memory, and an understanding of the data. It can be especially hard when, for example, the same information arrives from different data feeds in different forms.
For this purpose, a repository that records the relationships between entities is recommended for larger organizations.
This principle of naming something only once can be extended to different levels of abstraction.
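A repository of this kind can start out very small. The sketch below assumes a hypothetical registry that maps each canonical column name (deptno, following the Oracle example) to the variant spellings seen in incoming feeds, and reports every column that uses a variant instead of the canonical name. The registry contents, the schema, and the function names are illustrative assumptions.

```python
# Hypothetical registry: canonical column name -> variant spellings seen in data feeds.
REGISTRY = {
    "deptno": {"dept_num", "department_number", "deptnum"},
    "empno": {"emp_id", "employee_number"},
}


def build_lookup(registry: dict[str, set[str]]) -> dict[str, str]:
    """Map every known spelling (canonical or variant) to its canonical name."""
    lookup = {}
    for canonical, variants in registry.items():
        lookup[canonical] = canonical
        for variant in variants:
            # A variant registered under two canonical names is itself a naming conflict.
            if variant in lookup and lookup[variant] != canonical:
                raise ValueError(f"{variant} is registered under two names")
            lookup[variant] = canonical
    return lookup


def renamed_columns(
    schema: dict[str, list[str]], registry: dict[str, set[str]]
) -> list[tuple[str, str, str]]:
    """List (table, column, canonical name) for each column using a non-canonical name."""
    lookup = build_lookup(registry)
    findings = []
    for table, columns in schema.items():
        for column in columns:
            canonical = lookup.get(column.lower())
            if canonical is not None and column.lower() != canonical:
                findings.append((table, column, canonical))
    return findings


schema = {
    "emp": ["empno", "ename", "deptno"],
    "payroll": ["empno", "dept_num", "salary"],
}
print(renamed_columns(schema, REGISTRY))  # [('payroll', 'dept_num', 'deptno')]
```

Columns the registry has never seen (ename, salary) pass silently; in practice they would be reviewed and added to the registry as the data grows.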
At the most basic level of semantics, we can define an object class. In the column name CAR_TOTAL_AMOUNT, for example, the word CAR is the object class; it defines the realm in which this description exists.
The recommendation is to have one, and only one, object class defined for each concept. A CAR should not be called an AUTO somewhere else. That leads to confusion: are columns named CAR_TOTAL_AMOUNT and AUTO_TOTAL_AMOUNT the same or different? The names imply that they are different, because we do not duplicate column names, but they also appear to be the same, because CAR and AUTO have the same meaning in English.
In the example CAR_TOTAL_AMOUNT, the word TOTAL is the property term; there should be one, and only one, word to represent this property. Again, having both CAR_TOTAL_AMOUNT and CAR_COMPLETE_AMOUNT would be confusing.
The last term in CAR_TOTAL_AMOUNT is AMOUNT. It is called the representation term.
If you guessed that you should have only one representation term for a concept, you are correct! AMOUNT should always be AMOUNT, not, for example, COST somewhere else.
So, in summary, everything should be consistent: not just specific column names, but also concepts and the order of terms. You should not have AMOUNT_TOTAL_CAR, even if it is spelled AMOUNT_TOTAL_CAR consistently everywhere, because its terms are in a different order from the rest of your names.
The standard order is object class, then property term, then representation term.
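The vocabulary rules and the ordering rule can be checked together once each role has a controlled vocabulary. The following Python sketch is a simplification under stated assumptions: every column name has exactly three underscore-separated terms, each vocabulary maps a canonical term to the synonyms that must not replace it (AUTO for CAR, COMPLETE for TOTAL, COST for AMOUNT, taken from the examples above), and the vocabularies and function names are hypothetical. Real naming schemes often allow qualifier terms and multi-word terms, which this sketch ignores.

```python
from typing import Optional

# Hypothetical controlled vocabularies, one per role, listed in the standard order.
# Each canonical term maps to synonyms that must not be used in its place.
ROLES = [
    ("object class", {"CAR": {"AUTO"}}),
    ("property term", {"TOTAL": {"COMPLETE"}}),
    ("representation term", {"AMOUNT": {"COST"}}),
]


def classify(word: str) -> Optional[tuple[str, str]]:
    """Return (role, canonical term) if the word is a known term or synonym."""
    for role, vocabulary in ROLES:
        for canonical, synonyms in vocabulary.items():
            if word == canonical or word in synonyms:
                return role, canonical
    return None


def check_column_name(name: str) -> list[str]:
    """Check one column name against the vocabularies and the term order."""
    words = name.upper().split("_")
    if len(words) != len(ROLES):
        return [f"{name}: expected {len(ROLES)} terms, found {len(words)}"]
    problems = []
    for (expected_role, _), word in zip(ROLES, words):
        found = classify(word)
        if found is None:
            problems.append(f"{name}: unknown term {word}")
            continue
        role, canonical = found
        if role != expected_role:
            # Right vocabulary, wrong position: the order rule is broken.
            problems.append(f"{name}: {word} ({role}) is in the {expected_role} position")
        elif word != canonical:
            # Right position, but a synonym was used instead of the one agreed term.
            problems.append(f"{name}: use {canonical} instead of {word}")
    return problems


for column in ["CAR_TOTAL_AMOUNT", "AUTO_TOTAL_AMOUNT", "CAR_TOTAL_COST", "AMOUNT_TOTAL_CAR"]:
    print(column, check_column_name(column))
```

Keeping one vocabulary per role, rather than one flat list of words, is what lets the same check catch both a synonym (AUTO) and a term in the wrong position (AMOUNT first).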
Conclusion
----------
I hope this background information sparked some interest in metadata concepts. The subject of metadata is somewhat nebulous; I am not aware of a comprehensive reference on it.
Still, a well-organized data collection is obvious when you see it, and it results in a more efficient operation. Databases are inherently complex, but part of the complexity typically seen in a database is due to design choices that, while appropriate for a computer, often give misleading signals to human beings. I have a whole collection of real-world, very specific choices for metadata collection, developed over years of experience, but that is for me to know, nanny nanny boo boo; this isn't a scientific article, after all.
Few people understand metadata, but a metadata expert must understand people. By constructing a database that is consistent, simple, concise, non-redundant, and with naming conventions that are intuitive, everyone from a programmer to an end user can more effectively work with the information store.
This article was written by Robert Dupuy, aka MarkTime. It is released to the public domain. References are given where appropriate.
Comment 1 of 13
Posted by MarkTime on 02-Dec-2003 17:05 GMT
In the phrase CAR_TOTAL_AMOUNT the word BOOK is an object class?
Summamabeeotch
I meant the word 'CAR'... oh well. Who read that far, anyway?
LOL
Comment 2 of 13
Posted by bbrv on 02-Dec-2003 17:49 GMT, in reply to Comment 1 (MarkTime):
MarkTime,
:-o
Need time to answer this.
In the meantime, metaMorphOSis OK!
:-D
Comment 3 of 13
Posted by MarkTime on 02-Dec-2003 18:58 GMT, in reply to Comment 2 (bbrv):
metamorphosis ok... :-)
Oh gawd, I really am a geek... I thought that was funny.
Comment 4 of 13
Posted by bbrv on 02-Dec-2003 19:49 GMT, in reply to Comment 3 (MarkTime):
:-D
MarkTime, can we twist that a little into something like this:
One of the most dramatic impacts of computerization has been the orders-of-magnitude increase in productivity that organizations experience today in developing, articulating, and disseminating information. Because of competitive demands and the creation of the Internet, organizations have used this greatly enhanced capability to generate an explosion of information about their organizations, processes, products, etc. (“digital content”). Unfortunately, because their existing systems are often relatively inflexible and have limited abilities to communicate, the digital content that is created is typically recreated and/or repackaged many times in order to present it usefully to all of the information consumers, e.g. customers, partners, vendors, and employees. As a result, despite increased productivity, organizations experience (1) excessive costs, including duplicate entry, reconciliation of different versions of data, rekeying of data into spreadsheets, etc., (2) decreased responsiveness, and (3) lost opportunities, as the difficulty of managing and understanding information keeps organizations from initiating new methods and communications, either because the costs don’t appear justified or simply because the opportunity is invisible under the mass of data!
Given these issues, what would be possible if organizations could collect all of their content in one place, present it over the Internet and easily customize views and documents, and easily export these to office productivity tools? What would be the impact of being able to manage all of the different types of digital content, e.g. specifications, pictures, drawings, and even video, within one repository? How much cost savings and flexibility would be available if an environment existed where all of the content was aggregated, but security allowed customers, partners, officers, employees, and vendors to only see what they are entitled to see? What is needed is a highly scalable, flexible architecture for aggregating and disseminating information throughout the Web in just this way.
Suppose we could provide a simple, non-programmatic method for defining the content to be collected, one that then automatically creates an intuitive interface for capturing that content. We should also provide a non-programmatic method for defining content presentation, and then deliver the content to information users everywhere, but with tight security that ensures users only get the information they are authorized to see! Moreover, we could then provide personalization capabilities to let information users see what they want, and not what they don’t.
We should take advantage of a highly targeted architecture for the productive management of “aggregated digital content”. The heart of this architecture would be a “DSS” (Directory Sub-System). The DSS allows this method to aggregate digital content of any type, and makes it easy to define both its collection and presentation. We could provide the solution for organizations that need to aggregate rich content and provide it to multiple types of users flexibly and securely.
Of course, there are already a number of solutions created to address the information management problem outlined above. In response to the wealth of systems many organizations are using, e.g. “ERP” (Enterprise Resource Planning) systems and “CRM” (Customer Relationship Management) systems, several new software companies, “EAI’s” (Enterprise Application Integrators), developed tools that allow organizations to more easily integrate their data across all of their systems. Where these systems are already in place, we can further leverage their capabilities by providing a flexible repository for securely collecting and presenting all of this content to all information users.
Moreover, in response to the overwhelming amount of data available in today’s systems, “BI” (Business Intelligence) solutions were created to turn this data into “information”. These solutions perform valuable “OLAP” (On-Line Analytical Processing) by summarizing collected system data and then presenting the results in a tabular or graphical way. By investing in both EAI and BI offerings, organizations can go a long way towards optimizing the usage of their data, but they will still lack an environment for managing all of the rich digital content they have probably developed, e.g. pictures, specifications, drawings, and even digital video and audio, and they will still need the capability to present that content to everyone flexibly and securely. We should meet that need. Moreover, because the system would manage relational content and be based on relational storage, it could leverage existing BI solutions in two ways: first, by providing an environment for collecting data on which BI solutions can operate, and second, by presenting BI results through a controlled content-management interface. This would be the missing piece that allows organizations to manage and disseminate all of their digital content.
Later phases of the offering could include (II) interoperability with typical relational-database-based systems, (III) infinite scalability, and (IV) integrated management of disparate but highly related content (“metadata-aware”).
See, we were listening! :-) Are we in step with you, MarkTime? Can we march ahead now?!
R&B
Comment 5 of 13
Posted by Anonymous on 02-Dec-2003 21:08 GMT, in reply to Comment 4 (bbrv):
>How much cost savings and flexibility would be available if an environment
>existed where all of the content was aggregated, but security allowed customers,
>partners, officers, employees, and vendors to only see what they are entitled
>to see?
This coming from an FBI man? :-7
Comment 6 of 13
Posted by MarkTime on 02-Dec-2003 22:32 GMT, in reply to Comment 4 (bbrv):
@bbrv,
I doubt you were waiting for me to give you your marching orders, but march on!
I did find this article of yours interesting, and different from ones I've read before. Furthermore, it not only outlines a good market to be in, because there will be billions in revenue from the paradigm shift, but in it I also see a vision for a better world... something I personally like, because that is what I do every day: create a better world.
Well, that and rant, and nearly take people's heads off if they sass me.
:-)
Comment 7 of 13
Posted by katos1 on 03-Dec-2003 01:19 GMT
I didn't understand a thing you said... Memory overrun error??
Well, once I upgrade I will try reading that again ;-)
Comment 8 of 13
Posted by Raffaele on 03-Dec-2003 07:37 GMT
BAH...
Another useless committee for international standards.
By the way, IMHO they have reinvented the concept of "hot water" and how to make it...
They could learn from our platform, because on the Amiga we already have:
1) The IFF data structure, stored in FORMs and chunks...
Add a metadata info chunk to an IFF structure...
...and any program capable of handling that chunk will give every file it saves a flexible way to manage metadata...
2) Datatypes and MultiView
It would not be difficult for an Amiga programmer to add another datatype able to read metadata from inside other files, and to copy, store, and edit it, etc...
Even if no Amiga program can yet manage metadata inside files (spreadsheet files, database files, etc.), such a solution based on datatypes could be useful and elegant.
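Raffaele's first idea, a metadata chunk inside an IFF FORM, can be sketched in a few lines. IFF (EA IFF 85) files are built from chunks, each a four-character ID, a 32-bit big-endian length, and the data, padded to an even length; a FORM chunk wraps a four-character form type followed by further chunks. The Python below is a minimal sketch of that layout, not Amiga iffparse.library code, and the form type DEMO and the chunk IDs META and DATA are made-up placeholders rather than registered IDs.

```python
import struct


def make_chunk(chunk_id: bytes, data: bytes) -> bytes:
    """One IFF chunk: 4-byte ID, 32-bit big-endian length, data, pad byte if odd."""
    if len(chunk_id) != 4:
        raise ValueError("IFF chunk IDs are exactly four bytes")
    pad = b"\x00" if len(data) % 2 else b""
    # The length field counts the data only, not the pad byte.
    return struct.pack(">4sI", chunk_id, len(data)) + data + pad


def make_form(form_type: bytes, chunks: list[bytes]) -> bytes:
    """Wrap already-built chunks in a FORM; its length covers the type and all chunks."""
    return make_chunk(b"FORM", form_type + b"".join(chunks))


def read_form(form: bytes) -> tuple[bytes, dict[bytes, bytes]]:
    """Return (form type, {chunk ID: data}); later duplicate IDs overwrite earlier ones."""
    tag, size = struct.unpack_from(">4sI", form, 0)
    if tag != b"FORM":
        raise ValueError("not an IFF FORM")
    form_type = form[8:12]
    end = 8 + size
    offset = 12  # skip "FORM", the length field, and the form type
    chunks = {}
    while offset + 8 <= end:
        chunk_id, length = struct.unpack_from(">4sI", form, offset)
        offset += 8
        chunks[chunk_id] = form[offset:offset + length]
        offset += length + (length % 2)  # step over the pad byte, if any
    return form_type, chunks


# DEMO, META and DATA are made-up placeholder IDs, not registered IFF types.
meta = make_chunk(b"META", b"author=Raffaele;subject=metadata")
data = make_chunk(b"DATA", b"Hello, Amiga!")  # 13 bytes, so a pad byte is added
form = make_form(b"DEMO", [meta, data])
form_type, chunks = read_form(form)
print(form_type, chunks[b"META"])  # b'DEMO' b'author=Raffaele;subject=metadata'
```

Because every chunk carries its own length, a reader that does not understand the META chunk can simply skip over it, which is what makes this approach attractive for adding metadata to existing formats.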
Unfortunately, not many people in the world have heard anything about the Amiga these days; otherwise they could have learned a lot more, and been influenced in making international standards, and the rules for managing those standards, from any OS...
Bye,
Raffaele
P.S.: REMEMBER that this still remains a fact:
If metadata is a way to add unique names to call certain procedures within database and spreadsheet programs, there is only one way to implement it: build metadata structure info into the core of these programs.
Comment 9 of 13
Posted by MarkTime on 03-Dec-2003 15:49 GMT, in reply to Comment 8 (Raffaele):
Raffaele,
I understand your skepticism, and I truly tried to reflect a healthy dose of skepticism in my own article, because, as I said, these international standards committees tend to be a bit of a bureaucratic joke: they claim to be experts on naming conventions, but they can't even name themselves in any meaningful way.
However, the real-world problem is that large organizations are constantly renaming the same things over and over again, and then constantly paying for their own lack of understanding later: with confusion, with reports that contain inaccurate information (and decisions made on this confused information), and finally by paying someone to do it right.
I just wanted to present some metadata ideas for people to think about. You know, when you set up your information store, doing it with *some kind* of reasoning usually ends up better than what most people do, which is charge ahead blindly without any information at all.
I agree, there is no panacea or perfect solution, and metadata aware programs would be helpful.
I also don't think metadata is the be-all and end-all; it's just one subject I like, one of many. I agree, too, that Amiga OS did some things very right in the past.
However, when all I hear about now is talk about just getting Netscape and a copy of Quake III to run, I realize that later efforts at Amiga OS improvements are stalled at 24 bit PNG icons.
Well, bbrv's response was a good one, in that he is thinking of some real ideas that would make a difference in the world; they make a difference by aiding universal communication and the efficiency of groups.
But listen, I don't think there is one solution to solve everything; I just think people have to set out goals and have an idea of what they are trying to accomplish.
I forget who said it, but in the long run, you hit only what you aim at.
Comment 10 of 13
Posted by bbrv on 03-Dec-2003 17:23 GMT, in reply to Comment 9 (MarkTime):
MarkTime, this thread and the concepts you started this discussion with are very interesting to us. Through a task-oriented user interface, it would be very possible to provide any digital content (documents, specifications, pictures, audio, etc.) to partners, suppliers, customers, and/or employees. For a provider, this could be done with:
- Full Security Control: what each should see, and when they should see it
- Configurable/Extensible Form Views: no programming required
- Single Point of Access: collect all open content in one place
- Automatic Update on Change: aggregation, no duplication
- Internet Interoperability: leveraging Partner and Supplier content
The "free" internet will not go away, but there is a compelling reason for something new and completely innovative.
R&B
Comment 11 of 13
Posted by bbrv on 04-Dec-2003 03:59 GMT, in reply to Comment 10 (bbrv):
Robert, your Comcast email address no longer works...
Comment 12 of 13
Message removed by Christian Kemp for violation of ANN's posting rules. Specific reason from moderator: Impersonation
Comment 13 of 13
Posted by Alan LM Buxey on 08-Dec-2003 13:58 GMT, in reply to Comment 9 (MarkTime):
>later efforts at Amiga OS improvements are stalled at 24 bit PNG icons.
Stalled? One simple patch and 24-bit PNG icons are all yours.
(The others you raise, e.g. Mozilla, are valid points: the system, be it native or emulated, needs them to be adopted by users at the desktop level.)
alan
The ANN archives are powered by #AmigaZeux and were updated daily (news last: 22-Oct-2004; comments last: 18-May-2005).
ANN.lu was created, and previously owned and maintained, by Christian Kemp, www.ckemp.com.