XMDR

eXtended MetaData Registry (XMDR) Project

Scope
This project is concerned with the development of improved standards and technology for storing and retrieving the semantics of data elements, terminologies, and concept structures in metadata registries.

Existing metadata registry standards include the ISO/IEC 11179 family of Metadata Registry standards (e.g., ISO/IEC 11179, ISO/IEC 20943, and ISO/IEC 20944). We intend to propose extensions of the ISO/IEC 11179 family of metadata registry standards to support more diverse types of metadata and enhanced capabilities for semantics specification and queries.

  • Propose revisions to the ISO/IEC 11179 Metadata Registry (MDR) Standard.
  • Create a prototype extended metadata registry.
  • Load some terminologies / ontologies into the prototype XMDR.
  • Explore technologies for providing access to the XMDR across the Internet.

We welcome additional parties who will actively contribute to the project (funds, data, code, labor). See Contact Information.

Uses of Extended Metadata Registries
Extended metadata registries can be used for a variety of purposes including:
  • Data administration for databases - e.g., database design, documentation, applications, user help, query planning and optimization, federated databases/data integration, data warehousing extraction/translation/loading, etc.
  • Data administration for datasets - for (scientific) data not kept in databases
  • Data administration for messaging and data exchange - e.g., EDI, e-commerce, military messaging, HL7, XML message exchange, ...
  • Support for scientific workflow management systems, scientific and statistical software "workbenches"
  • Support the use and enforcement of controlled vocabularies, and authority lists (for proper nouns such as places, organizations, bibliographic citations, or persons).
  • Support for interoperable annotation systems, e.g., annotation of documents, programs, DNA sequences, genes, ...
  • Support for natural language processing - entity extraction, etc.
  • Agent support - provide a mechanism for software agents to discover, retrieve, and update metadata.
  • Support for semantic help facilities (e.g., data element definitions) for data entry systems.
  • Metadata support for case based reasoning (CBR) systems (used in medicine, equipment failure analysis, software problem resolution, architectural and mechanical design, and law enforcement applications)
  • Bootstrap facility for Semantic Web applications. XMDR will provide a mechanism for open sharing of ontologies and/or ontology metadata, thus promoting sharing, interchange, merging and adoption of ontologies.
  • Centralized storage for enterprise metadata that allows decentralized stewardship of models (e.g. schemas, ontologies, etc.).
Proposed Revisions of ISO/IEC 11179 Metadata Registry Standard(s)

The proposed revisions to ISO/IEC 11179 Metadata Registry Standard(s) concerning enhanced capabilities with respect to terminologies, classification schemes, and ontologies should be normative.

We would like (ultimately) to register semantically accurate, ontologically well structured terminologies, classification schemes and ontologies. This might well lead to a requirement to capture the semantic specification in terms of a description logic.

We also expect to capture many existing terminologies, classification schemes, etc. which may not meet the standards of a "well-formed" terminology / classification scheme. Many existing terminologies / classification schemes lack a description logic (DL) encoding, or even a sufficiently precise semantic specification to permit such a DL encoding without much additional work (many person-years per terminology).

Current thinking is that we may adopt a multi-level registration process for terminologies / classification schemes akin to that used for data element registration. Lower levels of registration would be merely descriptive. Higher levels of registration would be required to meet additional "well-formedness" requirements.
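
The multi-level idea can be illustrated with a minimal sketch. The level names, the fields of `Terminology`, and the criteria for each level are all hypothetical, chosen only to show lower levels being merely descriptive and higher levels demanding more; the project has not fixed any of them.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class RegistrationLevel(IntEnum):
    """Hypothetical levels; names and criteria are assumptions."""
    DESCRIPTIVE = 1   # a name and a description are enough
    STRUCTURED = 2    # every concept also carries a definition
    WELL_FORMED = 3   # additionally has a formal (e.g., DL) encoding


@dataclass
class Terminology:
    name: str
    description: str = ""
    # concept name -> natural language definition (may be an empty string)
    definitions: dict = field(default_factory=dict)
    has_formal_encoding: bool = False


def highest_level(t):
    """Return the highest level whose assumed requirements t meets, or None."""
    if not (t.name and t.description):
        return None
    level = RegistrationLevel.DESCRIPTIVE
    if t.definitions and all(t.definitions.values()):
        level = RegistrationLevel.STRUCTURED
        if t.has_formal_encoding:
            level = RegistrationLevel.WELL_FORMED
    return level
```

Under these assumptions, a legacy terminology with only a name and a description would still register at the lowest level, while one with complete definitions and a formal encoding would qualify for the highest.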

We are aware of efforts in ISO JTC1 SC32 WG2 led by Hajime Horiuchi (Tokyo International Univ, ISO JTC1 SC32 WG2 Japan) to develop a Metamodel Framework (MMF), which is intended to support registration of ontologies, such as those specified in the OMG ODM approach. We expect to coordinate our work with their efforts.

Prototype Extended Metadata Registry (XMDR)

The purpose of the prototype is to demonstrate the feasibility and utility of constructing an extended metadata registry, i.e., one which encompasses richer classification support, facilities for including terminologies, and better support for formal specification of semantics. The prototype registry will also serve as a reference implementation for the revised versions of ISO 11179, Parts 2 and 3 to help guide production implementations.

Technologies for XMDR Prototype

We are currently debating which technology to use for encoding semantic metadata. There is an inevitable tradeoff between expressiveness and computational tractability (scalability). A much more complete enumeration of candidate technologies will be found on the web page XMDR Platform Technologies.

Technologies that have been used for similar purposes in other projects include:

  • Relational databases - used for current ISO 11179 metadata registries
  • Frame-based representations - used in older knowledge representation systems - easily mapped into database representations. Examples include Ontolingua, Ecocyc, and Riboweb.
  • Object oriented databases, e.g., Objectivity, Cache, etc.
  • Semantic networks - a graph-based representation in which nodes are concepts and directed edges represent binary relationships (is-a, part-of, ...), e.g., RDF and the UMLS Semantic Network at NLM
  • RDF (Resource Description Framework) - a graph-based data model used for encoding metadata on the web. A kind of semantic network.
  • Description logic - a restricted subset of first order logic widely used in knowledge representation applications and more recently in large scale terminology systems, e.g., Galen, SNOMED, ...
  • OWL-Lite, OWL-DL, OWL-Full - the three sublanguages of the Web Ontology Language (OWL) standardized by the W3C Web Ontology Working Group. Built atop RDF. OWL-DL is the description logic version (also the most popular version). Precursors include DAML+OIL.
  • Prolog - Horn clause logic, e.g., XSB
  • First order logic - full first order predicate logic - very expressive, but computationally problematic.
  • Frame Logic (F-logic) - e.g., Flora system
  • Default logic - allows specification of default (normal) predicates and exceptions.
  • Modal logic - used to incorporate temporality and multiple contexts into logic.
  • Semantic Web Rule Language (SWRL) - OWL + RuleML; allows constraint specifications via RuleML
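
To make the semantic network entry in the list above concrete, here is a minimal sketch that stores a network as (subject, relation, object) triples and follows is-a edges transitively. The sample concepts and the function name `ancestors` are illustrative assumptions, not XMDR content or an XMDR API.

```python
def ancestors(triples, concept, relation="is-a"):
    """Return every concept reachable from `concept` via `relation` edges."""
    # Index the chosen relation: subject -> set of direct objects.
    parents = {}
    for subject, predicate, obj in triples:
        if predicate == relation:
            parents.setdefault(subject, set()).add(obj)

    found, stack = set(), [concept]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in found:   # also guards against cycles
                found.add(parent)
                stack.append(parent)
    return found


# Illustrative triples only.
TRIPLES = [
    ("beagle", "is-a", "dog"),
    ("dog", "is-a", "mammal"),
    ("mammal", "is-a", "animal"),
    ("paw", "part-of", "dog"),
]

print(sorted(ancestors(TRIPLES, "beagle")))  # ['animal', 'dog', 'mammal']
```

A plain graph traversal like this is cheap to compute but can only answer questions stated directly as edges. The more expressive formalisms in the list trade that tractability for richer inference.
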
Potential XMDR Prototype Content

We plan to load the prototype extended metadata registry (XMDR) with several metadata / terminology datasets. The purpose of loading these metadata datasets into the prototype registry is twofold: to provide sample metadata for users to explore the functionality of the XMDR, and to test the ability of the XMDR infrastructure to successfully capture the metadata semantics.

Candidate metadata / terminology datasets will likely be selected from the following lists. It is unclear whether we will have the resources to load all of these datasets.

Metadata content may include:

  • Data Element Characterization
    • definitions - natural language, logic-based
    • types
    • dimensionality / measurement units
  • Taxonomies (is-a relationships) - linguistic, biological, ...
  • Partonomies (part-of relationships) - geographic, organizational, anatomic, manufactured, ...
  • Code Sets - e.g., country codes, CAS numbers, airport codes
  • Terminologies - controlled vocabularies
  • Schemas - e.g., for databases, messages, file formats, etc.
  • Ontologies
  • Matchings, Mappings - across terminologies, schemas, ...

From a graph-theoretic perspective, potential metadata content can be classified as either:

Platforms for the XMDR Prototype

Our principal development platform at LBNL is Linux on x86 boxes (either Intel or AMD). We are currently running Mandrake 9.1, but systems staff are urging us to move to SUSE 9.1. We expect to deploy a web-based system, using Apache for the web server and various browsers (Mozilla 1.7, Firefox, Microsoft Internet Explorer 6.x) for the front ends.

As noted earlier, we intend to provide programmatic access for remote programs (or software agents) to the XMDR, in addition to human user interfaces. We will implement a REST interface first, on the theory that it is easier to wrap a REST API with SOAP than vice versa.
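
As a rough illustration of what REST-style read access could look like, the sketch below maps a GET path to a JSON representation of a registered item. The URL layout (`/data-elements/<id>`), the sample entry, and the function names are assumptions made for this example and do not describe the actual XMDR interface.

```python
import json

# Hypothetical registry content, keyed by a made-up identifier.
REGISTRY = {
    "country-code": {
        "name": "Country code",
        "definition": "A code identifying a country.",
    },
}

REASONS = {200: "OK", 404: "Not Found", 405: "Method Not Allowed"}


def handle_get(path):
    """Map a request path to (status, JSON body) for the assumed URL layout."""
    parts = [p for p in path.split("/") if p]
    if len(parts) == 2 and parts[0] == "data-elements":
        item = REGISTRY.get(parts[1])
        if item is not None:
            return 200, json.dumps(item)
    return 404, json.dumps({"error": "not found"})


def app(environ, start_response):
    """WSGI entry point: read-only, so anything other than GET is rejected."""
    if environ["REQUEST_METHOD"] != "GET":
        status, body = 405, json.dumps({"error": "method not allowed"})
    else:
        status, body = handle_get(environ.get("PATH_INFO", "/"))
    start_response(f"{status} {REASONS[status]}",
                   [("Content-Type", "application/json")])
    return [body.encode("utf-8")]
```

For local experimentation, `app` could be served with the standard library's `wsgiref.simple_server.make_server`. Each registered item is addressed by its own URL, which is what lets a software agent retrieve it with an ordinary HTTP GET.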

See also the forthcoming web page on candidate XMDR Platforms ...

Schemas and Ontologies

As part of this project, some XML schemas, OWL Ontologies, and possibly other reusable metadata resources will be developed and published. So far we have developed an OWL ontology for 11179 Part 3 version 2, which is available on this web server:

http://hpcrd.lbl.gov/SDM/XMDR/ont/iso11179-3v2.owl

Additional schemas and ontologies produced for XMDR will also be made available here as they are completed.

Participants
  • Bruce E. Bargmeyer (LBNL, Chair of ISO SC32) - Principal Investigator
  • Frank Olken (LBNL) - content survey, ISO/IEC 11179 Edition 3 Revisions
  • Kevin D. Keck (LBNL) - technology survey, architecture, ISO/IEC 11179 Metamodel Revisions, programming
  • John L. McCarthy (consultant)
  • Harold Solbrig (Apelon, Inc.) - medical terminology, content preprocessing
  • Karlo Berket (LBNL) - programmer
  • Gail Hodge (Information International Associates, USGS / National Biological Information Infrastructure) - environmental terminology
  • Larry Fitzwater (EPA, convener of ISO SC32 WG2) - EPA metadata registries
  • Linda Spencer (EPA) - environmental terminology, EPA data standards
  • Nancy K. Lawler (DoD) - editor of ISO/IEC 11179 Part 2 Version 3
  • LCDR Sam Chance (DoD) - semantic web technologies, agent-based applications
Inter-agency Collaboration

This project involves inter-agency collaboration among:

ISO/IEC Metadata Standards Resources

The ISO/IEC 11179, 20943, 20944, and 19763 standards are products of Working Group 2 (WG 2) of Subcommittee 32 (SC 32) of Joint Technical Committee 1 (JTC 1) of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC) (see SC32/WG2 Organization Chart).


© 2007, Lawrence Berkeley National Laboratory
maintained by Karlo Berket
Credits: The research and development of the eXtended MetaData Registry is supported by a variety of participating organizations.