Ontology Database: A New Method for Semantic Modeling and an Application to Brainwave Data

Paea LePendu¹, Dejing Dou¹, Gwen A. Frishkoff², and Jiawei Rong¹

¹ Computer and Information Science, University of Oregon, USA
{paea,dou,jrong}@cs.uoregon.edu
² Learning Research and Development Center, University of Pittsburgh, USA
gwenf@pitt.edu

Abstract. We propose an automatic method for modeling a relational database that uses SQL triggers and foreign keys to efficiently answer positive semantic queries about ground instances for a Semantic Web ontology. In contrast with existing knowledge-based approaches, we expend additional space in the database to reduce reasoning at query time. This implementation significantly improves query response time by allowing the system to disregard integrity constraints and other kinds of inferences at run-time. The surprising result of our approach is that load-time appears unaffected, even for medium-sized ontologies. We applied our methodology to the study of brain electroencephalographic (EEG and ERP) data. This case study demonstrates how our methodology can be used to proactively drive the design, storage and exchange of knowledge based on EEG/ERP ontologies.

1 Introduction

With recent advances in data modeling and increased use of the Semantic Web, scientific communities are increasingly looking to ontologies to support web-based management and exchange of scientific data. Ontologies can be used to formally specify concepts and relationships between concepts within a domain. The resulting logic-based representations form a conceptual model that can help with storage, management and sharing of data among different research groups.
In addition to the representation of classes and properties, ontologies can store intensional knowledge in the form of general facts, often called rules, axioms or formulae, such as, “All Sisters are Siblings.” Extensional data include specific facts, or ground terms, such as, “Mary and Jane are Sisters.” Relational databases can effectively store and retrieve extensional data, but they lack obvious mechanisms to perform the inferences necessary to answer extensional queries over intensional data, as in, “Which individuals are Siblings?” Unlike a typical relational database, a knowledge base can support the deduction that Mary and Jane are siblings by using an inference engine.

B. Ludäscher and Nikos Mamoulis (Eds.): SSDBM 2008, LNCS 5069, pp. 313–330, 2008. © Springer-Verlag Berlin Heidelberg 2008

Intensional knowledge reduces the need to store large amounts of extensional data. For example, we do not need to store the fact, “Mary and Jane are Siblings,” to know that it is true. The trade-off, however, is that inferences are required at run-time to generate this fact. What we have, therefore, is an example of the classical trade-off between time and space: the more extensional data we store, the less time it will take to answer queries about them.

In this paper, we challenge traditional approaches for modeling knowledge-based or deductive database systems of this sort, which typically aim to find a balance between space and time requirements. Instead we propose that space is expendable and that a great deal of inference (time) can be saved through the use of triggers and foreign keys to forward-propagate inferences at load-time. Interestingly, when we compared our methods against existing benchmarks, we found that query performance improved significantly, as expected, while load-time was remarkably unaffected.
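The load-time forward propagation proposed above can be sketched with an in-memory SQLite database driven from Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the choice of SQLite, the table layout, and the trigger name sisters_imply_siblings are all hypothetical. It shows the intensional rule “All Sisters are Siblings” being applied once, when a fact is loaded, so that the query “Which individuals are Siblings?” becomes a plain lookup.

```python
import sqlite3

# Hypothetical schema for illustration; names are not taken from the paper.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Siblings (subj TEXT, obj TEXT, PRIMARY KEY (subj, obj));
    CREATE TABLE Sisters  (subj TEXT, obj TEXT, PRIMARY KEY (subj, obj));

    -- Intensional rule "all Sisters are Siblings", applied eagerly at load time.
    CREATE TRIGGER sisters_imply_siblings
    BEFORE INSERT ON Sisters
    BEGIN
        INSERT OR IGNORE INTO Siblings (subj, obj) VALUES (NEW.subj, NEW.obj);
    END;
""")

# Load extensional facts; only the Sisters fact fires the trigger.
conn.execute("INSERT INTO Siblings VALUES ('Paul', 'Mary')")
conn.execute("INSERT INTO Sisters VALUES ('Mary', 'Jane')")

# Query time: no inference engine is involved, just a table scan.
siblings = sorted(conn.execute("SELECT subj, obj FROM Siblings").fetchall())
print(siblings)
```

The extra row stored in Siblings is the space cost; the absence of any rule evaluation in the final SELECT is the time saved.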
In addition to these performance gains, we demonstrate that semantics can play an essential role in data management and query answering. In fact, both ontologies and database systems are important, leading us to propose a new methodology for database design, which we will call ontology databases. To illustrate this idea, we describe the application of our methodology to brain electroencephalographic (EEG and ERP) data. In this application, we describe a database design that is ontology-driven. Moreover, we demonstrate how queries can be posed by domain experts at the ontology level rather than using SQL directly. Database projects like ZFIN [8] and MGI [1], housing large central repositories for zebrafish and mouse genetic data, respectively, were later reinforced by the Gene Ontology [25] to help normalize knowledge across these kinds of repositories. By contrast, our Neural ElectroMagnetic Ontology (NEMO) project uses expert knowledge in the form of EEG/ERP ontologies to drive the data modeling and information storage and retrieval process.

The paper is organized as follows. We begin with related work (Section 2), followed by a description of our ontology-based modeling methodology and a performance analysis (Section 3). We then present a case study in which we applied our methodology to develop ontology databases for EEG/ERP query answering (Section 4). We conclude with a discussion and an outline of future work in Section 5.

2 Related Work

Ontologies can be regarded as a conceptual or semantic model for database design. Hull and King [19] provide a nice summary of semantic models of all kinds: Entity-Relational, Object-Oriented, Ontological and so on.
While the notions in their survey make clear that there are firm connections between models, database implementations, and logics, we have been interested in exploring the question, “What is a semantic data model?” In particular, we wish to explore it from an ontology-based perspective that addresses practical issues in collaborative scientific research, especially biomedical research. Increasingly, biomedical researchers are looking to develop ontologies to support cross-laboratory data sharing and integration. These ontologies can be found at ontology repositories around the world [34]. For example, more than 62 biomedical ontologies can be found at the National Center for Biomedical Ontology (NCBO) [6].

Pan and Heflin proposed a similar approach, which they call description logic databases (DLDB) [26]. DLDB is a storage and reasoning support mechanism for knowledge base facts (RDF triples), which has been compared to well-known systems such as Sesame [10]. We structure the database relations in a way that is similar to DLDB (i.e., unary and binary predicates become unary or binary relations). However, where DLDB supports reasoning with SQL views, our implementation uses triggers and foreign keys. By eagerly forward-propagating data at load-time, it trades space for time and achieves a significant performance gain. In this context, it is informative to consider the recent work by Paton and Díaz [27], which examines rules and triggers in active database systems.

Recent research on bridging the gap between OWL and relational databases by Motik, Horrocks and Sattler [24] provides unique insight into the expressiveness of description logics versus relational databases. The integrity constraints in databases can be described with extended OWL statements (axioms).
An important contribution of this research is to show that the constraints can be disregarded while answering positive queries, if the constraints are satisfied by the database.

The idea of balancing space and time when we couple databases and reasoning mechanisms comes from seminal works by Reiter [28,30]. Reiter proposed a system that uses conventional databases for handling ground instances, and a deductive counterpart for general formulae. Since no reasoning is performed on ground terms, Reiter argues convincingly that in such a system queries can be answered efficiently while retaining correctness. OntoGrate [13] is precisely such a system for semantic query translation using ontologies. The key question that motivated our trigger-based approach was, “Since disk space is rarely an issue these days, what would happen if we use even more space?”

The neuroscience community is a recognized leader in the development of biomedical ontologies. For example, the Human Brain Project has supported the development of a common data model and meta-description language [17] for neuroscience data exchange and interoperability. BrainMap [22] has designed a Talairach coordinate-based archive for sharing and meta-analysis of brain mapping studies and literature, as well as a sharable schema for expression of cognitive-behavioral and experiment concepts. The fBIRN project [20] has pioneered several areas for neuroscience data sharing, including distributed storage resources and taxonomies of neuroscience terms (called BIRNlex). Our project will build on this prior work and extend it to incorporate ontology-based methods for reasoning. In addition to incorporating cognitive-behavioral and anatomy concepts represented in BrainMap and in fBIRN, NEMO will develop ontologies for temporal, spatial, and spectral concepts that are used to describe EEG and ERP patterns. In line with OBO “best practices,” we will reuse ontology concepts from relevant domains.
In fact, we are collaborating directly with ontology engineers and domain experts in the fMRI, as well as the EEG and ERP, communities.

The NEMO project brings some distinctive methods to bear on the problem of data sharing. Whereas most prior work on data sharing in the neurosciences has focused on the development of simple taxonomies or relational databases, NEMO uses ontologies to design databases that can support semantically based queries. What this means is that NEMO databases can be used to answer more complex queries, which cannot be handled by traditional (purely syntactic) database structures. For example, the popular Gene Ontology (GO) [25] provides a standard vocabulary and concept model for molecular functions, biological processes and cellular components in genetic research. The OWL [7] specification of GO is over 40 megabytes in size [25], and terabytes of research data stored in model organism databases around the world, such as ZFIN [8] and MGI [1], are all being marked up according to GO. The NEMO working group is borrowing from this idea and taking it a step further [12,15]. More than a standard vocabulary of terms, the ontologies NEMO is developing will capture knowledge ranging from the experimental methods used to gather ERP data down to instrument calibration settings, so that results can be shared and interpreted semantically during large-scale meta-analysis across laboratories.

3 Ontology-Based Data Modeling

We first present a new and general methodology, which takes a Semantic Web ontology as input and outputs a relational database schema. We call such a database an “ontology database,” which is an ontology-based, semantic database model. As we will show in Section 4, after we load ERP data into the NEMO ontology database, we can answer queries based on the ontology while automatically accounting for subsumption hierarchies and other logical structures within each set of data.
In other words, the database system is ontology-driven, completely hiding underlying data storage and retrieval details from domain experts, whose only point of interaction is at the ontology (conceptual) level.

3.1 The Procedural Extension

Although Description Logics (DL) [9] provide the formal logical foundation for OWL and Semantic Web ontologies, we do not require the full expressiveness of this logic for data modeling purposes in most scenarios we have encountered. It suffices to use rules of the form (reads “if C then D”):

    C ⇒ D,

which exclude the analysis-by-cases and contrapositive reasoning provided by full DL inclusion axioms of the form (reads “C is subsumed by D”):

    C ⊑ D.

What this means is that we are drawing a line between databases and knowledge bases. For example, while it may be taken for granted in a knowledge-based system that, “X is either a Rock or it is not a Rock, no matter what X is,” a database has no such reasoning capability. It can only say which is actually the case. As such, we technically only allow epistemic inclusion axioms with the K operator [9], which stands for “know” in the following rule (reads “Only when we know that C is true can we conclude D”):

    KC ⊑ D.

The difference is evidenced by the fact that we can immediately conclude D (without any positive or negative witnesses of C) in:

    (C ⊔ ¬C) ⊑ D,

but not necessarily in:

    (KC ⊔ K¬C) ⊑ D.

This restriction makes knowledge maintenance (reasoning) much easier: all we need to calculate is the procedural extension of a given set of facts and rules [9]. This can easily be done using database triggers and foreign keys with cascading deletes, the basic idea of which we outline below.

3.2 Triggers

Triggers are used for each rule to propagate data in a forward-chaining manner as facts are loaded into the ontology database.
For example, suppose we have the following first-order rule (reads “all Sisters are Siblings”):

    ∀x, y : Sisters(x, y) → Siblings(x, y).

Whenever a new pair of sisters is inserted into the ontology database, such as Sisters(Mary, Jane), a trigger fires, eagerly inserting Siblings(Mary, Jane) as well. This process is depicted in Figure 1.

[Figure 1: the Sisters(subj, obj) and Siblings(subj, obj) property tables, connected by a trigger and by foreign keys.]

Fig. 1. This figure shows that upon asserting Sisters(Mary, Jane), which means inserting (Mary, Jane) into the Sisters-property table, the trigger causes (Mary, Jane) to first be inserted into the Siblings-property table. Triggers generate knowledge in a forward-chaining manner for the Sisters-Siblings rule, ∀x, y : Sisters(x, y) → Siblings(x, y). Implicitly understood in this sub-property rule is also the contrapositive, ∀x, y : ¬Siblings(x, y) → ¬Sisters(x, y), an integrity check that foreign keys can enforce, shown here as the dotted line.

Although the above is an example of a sub-property (Sisters is a sub-property of Siblings), triggers can be used for both sub-class and sub-property hierarchies. Each trigger is a straightforward encoding of the epistemic rule, in SQL (shown schematically):

    CREATE TRIGGER subPropertyOf-Sisters-Siblings
    SUCH THAT UPON DETECTING EVENT
        INSERT (x,y) INTO Sisters(subject,object)
    FIRST EXECUTE
        INSERT (x,y) INTO Siblings(subject,object)

3.3 Foreign Keys with Cascading Delete

Foreign keys are used to check integrity constraints as usual, but by using the “on delete cascade” option, they also propagate deletions whenever facts are negated (which is not uncommon in scientific domains). For example, in the Sisters-Siblings sub-property rule of Figure 1 it is understood implicitly that if two people are not Siblings, then they cannot be Sisters either:

    ∀x, y : ¬Siblings(x, y) → ¬Sisters(x, y).

Semantically, we interpret the contrapositive to mean two things.
First of all, it is an integrity constraint: if Siblings(Mary, Jane) is not true, then it cannot be the case that Sisters(Mary, Jane) is true, so an integrity check is performed to validate that Siblings(Mary, Jane) is true before inserting Sisters(Mary, Jane). Of course, care must be taken to ensure triggers and integrity checks happen in the correct order (note the “FIRST” keyword in the SQL trigger). Secondly, if deletions (negations) are performed, they must be propagated to ensure consistency is maintained, thus explaining the “on delete cascade” option. Indeed, this is the pattern for all sub-class and sub-property rules: they are both triggers (knowledge generating) and integrity constraints (knowledge checking), consistent with the semantics of inclusion axioms.

Integrity constraints also occur in domain and range restrictions on properties. In this case, we have foreign keys but no triggers. For example, when we assert Sisters(x, y) we generally presume that x and y are People. That is, we mean:

    ∀x, y : [¬Person(x) ∨ ¬Person(y)] → ¬Sisters(x, y),

but not necessarily:

    ∀x, y : Sisters(x, y) → [Person(x) ∧ Person(y)].

In other words, given the statement Sisters(Mary, buddyTheFrog), we do not intend to automatically conclude that buddyTheFrog is a Person but rather hope the assertion is rejected unless we know for sure that buddyTheFrog is a Person (and not a Frog). This kind of reasoning is due in large part to the notion common in database systems that any fact not known to be true is presumed false, known as the closed world assumption [29].

Table 1. The ontology database methodology is summarized in this table. Here, respectively, subj and obj refer to the subject and object of a property, MinCard and MaxCard refer to cardinality, and f-key and p-key stand for foreign key (with an “on delete cascade” option) and primary key.
| Logical Feature | FOL Formalism | Ontology DB Implementation |
|---|---|---|
| Structure | | |
| Class(A), Class(B) | A(x), B(y) | relation: A(id), B(id) |
| Property(P) | P(x, y) | relation: P(subj, obj) |
| Restrictions | | |
| Domain(P, A) | ∀x, y : P(x, y) → A(x) | f-key: P(subj) ref A(id) |
| Range(P, B) | ∀x, y : P(x, y) → B(y) | f-key: P(obj) ref B(id) |
| MaxCard(P, 1) | ∀x, y, z : P(x, y) ∧ P(x, z) → y = z | p-key: P(subj) |
| MinCard(P, A, 1) | Domain(P, A) → (∀x : A(x) → ∃y : P(x, y)) | f-key: P(subj) ref A(id); trigger: on insert on A(id) insert ignore P(id, null) |
| Subsumption | | |
| subClassOf(B, A) | ∀x : B(x) → A(x) | trigger: before insert on B(id) insert ignore A(id); f-key: B(id) ref A(id); |
| subPropertyOf(Q, P) | ∀x, y : Q(x, y) → P(x, y) | trigger: before insert on Q(subj, obj) insert ignore P(subj, obj); f-key: Q(subj, obj) ref P(subj, obj); |
| Horn Rules & GMP | $\forall x_1, x_2 \ldots x_m : P_1(x_1, x_2) \land \ldots \land P_n(x_{m-1}, x_m) \rightarrow Q(x_i, x_j)$, $(1 \le i, h \le m,\ 1 \le j, h \le m)$ | $\forall k \in [1..n]$ trigger(rule premise-k): on insert on $P_k(x_{h-1}, x_h)$ update [rule-premise-table with $P_k$]; trigger(rule activate): on update on [rule-premise-table] if [all premises satisfied] then insert ignore $Q(x_i, x_j)$, $(1 \le i, h \le m,\ 1 \le j, h \le m)$ |

3.4 Modeling Summary

Table 1 summarizes the main logical features we implement in the ontology database methodology. These features can be categorized according to structures, restrictions and subsumptions, which come from OWL, RDF [3] and general first-order logic. The database relational structure we have chosen (unary and binary predicates become unary and binary relations) is almost identical to the hybrid approach of DLDB [26], which combines approaches from prior works to effectively store RDF triples.

3.5 Logical Justification

Our ontologies are generally restricted to Horn Normal Form (HNF) [32], which is a disjunction with only one positive literal as in:

$$\neg p_1 \lor \neg p_2 \lor \ldots \lor \neg p_n \lor q.$$
These formulae can be written as implications without disjunctions on the right-hand side, like Datalog [33] rules, which we call implicative normal form (INF):

$$p_1 \land p_2 \land \ldots \land p_n \rightarrow q.$$

Generalized Modus Ponens (GMP) [32] is an inference rule based on the well-known modus ponens rule:

$$\frac{p'_1 \land p'_2 \land \ldots \land p'_n \qquad p_1 \land p_2 \land \ldots \land p_n \rightarrow q}{\mathrm{SUBST}(\theta, q)}\ \text{(GMP)}$$

Here $\theta$ is a substitution that unifies each known fact $p'_i$ with the corresponding premise $p_i$. GMP allows us to unify several antecedents simultaneously to prove a conclusion. It is well-known that GMP is sound and complete for knowledge bases in HNF (and therefore INF) [32]. A trigger is essentially a forward-chaining implementation of GMP, recursively calling other triggers as necessary. Because all definitions are acyclic, the procedure is guaranteed to terminate. Foreign keys and null-valued triggers together provide the machinery for skolemization under existential constraints (such as, “All Employees have an SSN” [31]). According to this method, an ontology database therefore produces and maintains the procedural extension, guaranteeing that the database is a Herbrand Model for the given set of facts (see [32] for details on the Herbrand universe, interpretation and model).

3.6 Gene
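As a rough illustration of the argument in Section 3.5 that triggers chain through other triggers and still terminate when the rules are acyclic, the following sketch stacks two sub-property rules in SQLite and also exercises the “on delete cascade” foreign keys of Section 3.3. It is a sketch under stated assumptions, not the authors' implementation: Relatives is a hypothetical super-property of Siblings introduced only for this example, and the trigger names and the use of SQLite are likewise assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign keys off by default
conn.executescript("""
    -- Relatives is a hypothetical super-property added only for this sketch.
    CREATE TABLE Relatives (subj TEXT, obj TEXT, PRIMARY KEY (subj, obj));
    CREATE TABLE Siblings (
        subj TEXT, obj TEXT, PRIMARY KEY (subj, obj),
        FOREIGN KEY (subj, obj) REFERENCES Relatives (subj, obj) ON DELETE CASCADE
    );
    CREATE TABLE Sisters (
        subj TEXT, obj TEXT, PRIMARY KEY (subj, obj),
        FOREIGN KEY (subj, obj) REFERENCES Siblings (subj, obj) ON DELETE CASCADE
    );

    -- Rule: Siblings(x, y) -> Relatives(x, y)
    CREATE TRIGGER siblings_imply_relatives BEFORE INSERT ON Siblings
    BEGIN
        INSERT OR IGNORE INTO Relatives (subj, obj) VALUES (NEW.subj, NEW.obj);
    END;

    -- Rule: Sisters(x, y) -> Siblings(x, y); its insert fires the trigger above.
    CREATE TRIGGER sisters_imply_siblings BEFORE INSERT ON Sisters
    BEGIN
        INSERT OR IGNORE INTO Siblings (subj, obj) VALUES (NEW.subj, NEW.obj);
    END;
""")

# One asserted fact; the two triggers chain and then stop (the rules are acyclic).
conn.execute("INSERT INTO Sisters VALUES ('Mary', 'Jane')")
relatives = conn.execute("SELECT subj, obj FROM Relatives").fetchall()

# Retracting the most general fact cascades down through both foreign keys.
conn.execute("DELETE FROM Relatives WHERE subj = 'Mary' AND obj = 'Jane'")
remaining_sisters = conn.execute("SELECT COUNT(*) FROM Sisters").fetchone()[0]

print(relatives, remaining_sisters)
```

The BEFORE INSERT timing plays the role of the “FIRST” keyword in the schematic trigger: the more general fact exists before the foreign key on the more specific table is checked.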