| Scope |
This project is concerned with the development of
improved standards and technology for storing and retrieving the semantics of data elements,
terminologies, and concept structures in metadata registries.
Existing metadata registry standards include the ISO/IEC 11179 family of Metadata Registry
standards (e.g., ISO/IEC 11179, ISO/IEC 20943, and ISO/IEC 20944). We intend to propose
extensions of the ISO/IEC 11179 family of metadata registry standards to support more diverse
types of metadata and enhanced capabilities for semantics specification and queries.
- Propose revisions to the ISO/IEC 11179 Metadata Registry (MDR) standard.
- Create a prototype extended metadata registry (XMDR).
- Load some terminologies / ontologies into the prototype XMDR.
- Explore technologies for providing access to the XMDR across the Internet.
We welcome the participation of additional parties who will actively contribute
(funds, data, code, or labor). See Contact Information.
|
| Uses of Extended Metadata Registries |
Extended metadata registries can be used for a variety of purposes including:
- Data administration for databases - e.g.,
database design, documentation, applications,
user help, query planning and optimization,
federated databases/data integration,
data warehousing extraction/translation/loading,
etc.
- Data administration for datasets -
for (scientific) data not kept in databases
- Data administration for messaging and data exchange -
e.g., EDI, e-commerce, military messaging, HL7, XML message exchange, ...
- Support for scientific workflow management systems,
scientific and statistical software "workbenches"
- Support the use and enforcement of controlled vocabularies,
and authority lists (for proper nouns such as places,
organizations, bibliographic citations, or persons).
- Support for interoperable annotation systems, e.g.,
annotation of documents, programs, DNA sequences, genes, ....
- Support for natural language processing - entity extraction, etc.
- Agent support - provide a mechanism for software agents to
discover, retrieve, and update metadata.
- Support for semantic help facilities
(e.g., data element definitions)
for data entry systems.
- Metadata support for case based reasoning (CBR) systems
(used in medicine, equipment failure analysis,
software problem resolution,
architectural and mechanical design,
and law enforcement applications)
- Bootstrap facility for Semantic Web applications. XMDR
will provide a mechanism for open sharing of
ontologies and/or ontology metadata, thus promoting
sharing, interchange, merging and adoption of
ontologies.
- Centralized storage for enterprise metadata that allows
decentralized stewardship of models (e.g. schemas,
ontologies, etc.).
|
| Proposed Revisions of ISO/IEC 11179 Metadata Registry Standard(s) |
The proposed revisions to ISO/IEC 11179 Metadata Registry Standard(s)
concerning enhanced capabilities with respect to terminologies,
classification schemes, and ontologies should be normative.
We would like (ultimately) to register semantically
accurate, ontologically well structured
terminologies, classification schemes and ontologies.
This might well lead to a requirement to capture the semantic
specification in terms of a description logic.
We also expect to capture many existing terminologies,
classification schemes, etc. which may not meet the standards
of a "well-formed" terminology / classification scheme.
Many existing terminologies / classification schemes lack a
description logic (DL) encoding, or even a sufficiently precise semantic
specification to permit such a DL encoding without much additional
work (many person years per terminology).
Current thinking is that we may adopt a multi-level registration
process for terminologies / classification schemes akin to that
used for data element registration.
Lower levels of registration would be merely descriptive.
Higher levels of registration would be required to meet
additional "well-formedness" requirements.
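The idea of descriptive versus "well-formed" registration levels can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two level names, the `Terminology` class, and the two checks (every term has a definition, and the is-a graph has no cycles) are hypothetical and are not taken from ISO/IEC 11179 or from any XMDR design decision.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Terminology:
    """Toy terminology: term definitions plus is-a links (child -> parents)."""
    name: str
    definitions: Dict[str, str] = field(default_factory=dict)
    is_a: Dict[str, List[str]] = field(default_factory=dict)


def has_cycle(is_a: Dict[str, List[str]]) -> bool:
    """Return True if the is-a graph contains a directed cycle (depth-first search)."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color: Dict[str, int] = {}

    def visit(node: str) -> bool:
        color[node] = GREY
        for parent in is_a.get(node, []):
            state = color.get(parent, WHITE)
            if state == GREY:  # back edge: we returned to a node on the current path
                return True
            if state == WHITE and visit(parent):
                return True
        color[node] = BLACK
        return False

    return any(color.get(n, WHITE) == WHITE and visit(n) for n in list(is_a))


def registration_level(t: Terminology) -> str:
    """Hypothetical two-level scheme: 'well-formed' if all checks pass, else 'descriptive'."""
    terms = set(t.is_a) | {p for parents in t.is_a.values() for p in parents}
    all_defined = bool(terms) and all(t.definitions.get(term, "").strip() for term in terms)
    if all_defined and not has_cycle(t.is_a):
        return "well-formed"
    return "descriptive"


# Illustrative data only.
animals = Terminology(
    "animals",
    {"dog": "A domesticated canine.", "mammal": "A warm-blooded vertebrate."},
    {"dog": ["mammal"]},
)
circular = Terminology("circular", {"a": "Term a.", "b": "Term b."}, {"a": ["b"], "b": ["a"]})
```

In this sketch any terminology can be registered at the descriptive level, while the higher level is granted only when the structural checks succeed. Real well-formedness requirements would presumably be much richer than these two checks.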
We are aware of efforts in ISO JTC1 SC32 WG2
led by Hajime Horiuchi (Tokyo International Univ, ISO JTC1 SC32 WG2 Japan)
to develop a Metamodel Framework (MMF), which is intended
to support registration of ontologies, such as those
specified in the OMG ODM approach.
We expect to coordinate our work with their efforts.
|
| Prototype Extended Metadata Registry (XMDR) |
The purpose of the prototype is to demonstrate the feasibility
and utility of constructing an extended metadata registry, i.e.,
one which encompasses richer classification support, facilities
for including terminologies, and better support for formal
specification of semantics.
The prototype registry will also serve as a reference implementation
for the revised versions of ISO/IEC 11179, Parts 2 and 3, to help guide
production implementations.
|
| Technologies for XMDR Prototype |
We are currently debating which technology
to use for encoding semantic metadata.
There is an inevitable tradeoff between
expressiveness and computational tractability (scalability).
A much more complete enumeration of candidate technologies
can be found on the web page
XMDR Platform Technologies.
Among the technologies which have been
used for similar purposes in other projects are:
- Relational databases - used for current ISO 11179 metadata registries
- Frame-based representations - used in older knowledge
representation systems; easily mapped into
database representations.
Examples include Ontolingua, Ecocyc, and Riboweb.
- Object oriented databases, e.g., Objectivity, Cache, etc.
- Semantic networks - a graph-based representation in which nodes are concepts and directed edges
represent binary relationships (is-a, part-of, ...), e.g., RDF
and the UMLS Semantic Network at NLM
- RDF (Resource Description Framework) -
a graph-based data model used for encoding metadata on the web.
A kind of semantic network.
- Description logic - a restricted subset of first order logic
widely used in knowledge representation applications
and more recently in large scale terminology systems,
e.g., Galen, SNOMED, ...
- OWL-Lite, OWL-DL, OWL-Full - the Web Ontology Language (OWL), standardized by
the W3C as part of its Semantic Web Activity. Built atop RDF.
OWL-DL is the description logic version (also the most popular version).
Precursors include DAML+OIL.
- Prolog - Horn clause logic, e.g.,
XSB
- First order logic - full first order predicate logic - very expressive, but computationally problematic.
- Frame Logic (F-logic) -
e.g., Flora system
- Default logic - allows specification of default (normal) predicates
and exceptions.
- Modal logic - used to incorporate temporality and multiple
contexts into logic.
- Semantic Web Rule Language (SWRL) - OWL + RuleML; allows constraint specifications via RuleML
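To make the semantic network entry in the list above concrete, here is a minimal sketch of a graph whose nodes are concepts and whose labeled, directed edges are binary relationships such as is-a and part-of. The `SemanticNetwork` class and the sample concepts are hypothetical and serve only as illustration; a real registry would more likely sit on an RDF store or a database.

```python
from collections import defaultdict


class SemanticNetwork:
    """Concept graph stored as (subject, relation) -> set of objects."""

    def __init__(self):
        self._edges = defaultdict(set)

    def add(self, subject, relation, obj):
        """Record one directed, labeled edge: subject --relation--> obj."""
        self._edges[(subject, relation)].add(obj)

    def objects(self, subject, relation):
        """Direct neighbors of subject along relation."""
        return set(self._edges.get((subject, relation), set()))

    def closure(self, subject, relation):
        """All nodes reachable from subject by following relation one or more times."""
        seen = set()
        stack = [subject]
        while stack:
            node = stack.pop()
            for nxt in self._edges.get((node, relation), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen


# Illustrative content only.
net = SemanticNetwork()
net.add("beagle", "is-a", "dog")
net.add("dog", "is-a", "mammal")
net.add("Berkeley", "part-of", "California")
net.add("California", "part-of", "United States")

ancestors = net.closure("beagle", "is-a")         # taxonomy query
containers = net.closure("Berkeley", "part-of")   # partonomy query
```

The same triple shape (subject, relation, object) is what RDF uses, which is why the list describes RDF as a kind of semantic network. The closure query treats is-a and part-of as transitive, which is an assumption of this sketch rather than a property of every relationship.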
|
| Potential XMDR Prototype Content |
We plan to load the prototype extended metadata registry (XMDR)
with several metadata / terminology datasets. The purpose of
loading these metadata datasets into the prototype registry
is twofold: to provide sample metadata for users to explore
the functionality of the XMDR, and to test the ability
of the XMDR infrastructure to successfully capture the
metadata semantics.
Candidate metadata / terminology datasets will likely be selected from the following lists.
It is unclear whether we will have the resources to load all of these datasets.
- Environmental Domain (primarily of interest to EPA and DOE)
- Biomedical (Biological, Medical) Domain (for NCI, EPA, DOE Biology Programs, DOD bioterrorism). Candidate datasets may include:
- Chemical Domain (used in environmental, medical, biological applications)
- e.g., chemical nomenclature, chemical code sets (CAS registry numbers),
chemical reactions, chemical properties
- GIS (Geographic Information Systems) (for EPA, DOD, and DOE)
Candidate datasets may include:
- Bibliographic Ontologies / Metadata Standards
- General Terminologies / Ontologies
- Economic Code Sets
- Miscellaneous Metadatasets, Terminologies, etc.
- DTIC Thesaurus (from U.S. Defense Technical Information Center)
- Energy Technology and Data Exchange (ETDE) Thesaurus from DOE OSTI (U.S. Dept. of Energy Office of Scientific and Technical Information)
- Telecommunications Ontology
Metadata content may include:
- Data Element Characterization
- definitions - natural language, logic-based
- types
- dimensionality / measurement units
- Taxonomies (is-a relationships) - linguistic, biological, ...
- Partonomies (part-of relationships) - geographic, organizational, anatomic,
manufactured, ...
- Code Sets - e.g., country codes, CAS numbers, airport codes
- Terminologies - controlled vocabularies
- Schemas - e.g., for databases, messages, file formats, etc.
- Ontologies
- Matchings, Mappings - across terminologies, schemas, ...
From a graph-theoretic perspective, potential metadata content can be
classified as either:
|
| Platforms for the XMDR Prototype |
Our principal development platform at LBNL is Linux on x86 boxes (either Intel or AMD).
We are currently running Mandrake 9.1, but are being urged to move
to SUSE 9.1 by systems staff.
We expect to deploy a web-based system, using Apache for the web-server
and various browsers (Mozilla 1.7, Firefox, Microsoft Internet Explorer 6.x) for the front ends.
As noted earlier, we intend to provide programmatic access for remote
programs (or software agents) to the XMDR, in addition to human user
interfaces. We will implement a REST
interface first, on the theory that it is easier to wrap a REST API
with SOAP than vice versa.
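As a rough illustration of what read-only REST access to a registry might look like, the sketch below serves registry entries as JSON using only the Python standard library. The URL layout (`/data-elements` and `/data-elements/<name>`), the `REGISTRY` content, and the function names are all assumptions made for this example; they do not describe the actual XMDR interface.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical registry content, keyed by data element name.
REGISTRY = {
    "country-code": {
        "name": "country-code",
        "definition": "A code identifying a country.",
        "datatype": "string",
    },
}


def handle_get(path):
    """Map a request path to (HTTP status, JSON-serializable body)."""
    parts = [p for p in path.split("/") if p]
    if parts == ["data-elements"]:
        return 200, sorted(REGISTRY)  # list the names of all registered elements
    if len(parts) == 2 and parts[0] == "data-elements":
        element = REGISTRY.get(parts[1])
        if element is not None:
            return 200, element
    return 404, {"error": "not found"}


class RegistryHandler(BaseHTTPRequestHandler):
    """Thin HTTP wrapper around handle_get."""

    def do_GET(self):
        status, body = handle_get(self.path)
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8000):
    """Run the server until interrupted (call explicitly; not run on import)."""
    HTTPServer(("localhost", port), RegistryHandler).serve_forever()
```

Keeping the routing logic in a plain function (`handle_get`) separate from the HTTP handler is one way to make later wrapping easier: a SOAP or other front end could call the same function. Calling `serve()` and then requesting `http://localhost:8000/data-elements/country-code` would return the sample entry.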
See also the forthcoming web page on candidate XMDR Platforms ...
|
| Schemas and Ontologies |
As part of this project, some XML schemas, OWL Ontologies, and
possibly other reusable metadata resources will be developed and
published. So far we have developed an OWL ontology for 11179 Part
3 version 2, which is available on this web server:
http://hpcrd.lbl.gov/SDM/XMDR/ont/iso11179-3v2.owl
Additional schemas and ontologies produced for XMDR will also be
made available here as they are completed.
|
| Participants |
- Bruce E. Bargmeyer (LBNL, Chair of ISO SC32) - Principal Investigator
- Frank Olken (LBNL) - content survey, ISO/IEC 11179 Edition 3 Revisions
- Kevin D. Keck (LBNL) - technology survey, architecture, ISO/IEC 11179 Metamodel Revisions, programming
- John L. McCarthy (consultant)
- Harold Solbrig (Apelon, Inc.) - medical terminology, content preprocessing
- Karlo Berket (LBNL) - programmer
- Gail Hodge (Information International Associates, USGS / National Biological Information Infrastructure) - environmental terminology
- Larry Fitzwater (EPA, convener of ISO SC32 WG2) - EPA metadata registries
- Linda Spencer (EPA) - environmental terminology, EPA data standards
- Nancy K. Lawler (DoD) - editor of ISO/IEC 11179 Part 2 Version 3
- LCDR Sam Chance (DoD) - semantic web technologies, agent-based applications
|
|