Frequently Asked Questions
on the Concept-Oriented Data Model (COM)
vs. Other Models
Alexandr Savinov
http://conceptoriented.org/
First started: 17.11.2005
Last updated: 14.04.2006
1 Relational Model
1.1 What are the main differences between COM and the relational
model?
1.2 How do references differ from primary keys?
1.3 Is it possible to model the concept-oriented model by means of
secondary keys?
1.4 Do normal forms exist in COM?
1.5 How can primary keys be modeled in COM?
2 Object-Oriented Model
2.1 Is there inheritance in COM?
2.2 What are the differences between inheritance and the subconcept-superconcept
relation?
2.3 What is the difference in object definition?
3 Formal Concept Analysis
3.1 What is formal concept analysis?
3.2 What is FCA syntax?
3.3 What is FCA semantics?
3.4 What is an FCA concept?
3.5 How are FCA concepts specified?
4 OLAP and Multidimensional Databases
4.1 What is an OLAP dimension?
4.2 What is an OLAP measure?
4.3 What is drill down and roll up?
4.4 What is an OLAP cube?
4.5 What is an OLAP representation?
5 Ontologies
5.1 How is knowledge shared in COM?
6 Semantic Web and RDF
6.1 What are the main differences between RDF and COM?
7 Object-Role Modeling (ORM)
7.1 What are the main differences between ORM and COM?
8 Universal Relation Model (URM)
8.1 What are the main assumptions of URM?
8.2 What are the main differences between URM and COM?
9 Functional Data Model (FDM)
9.1 What are the main properties of FDM?
9.2 What are the main differences between FDM and COM?
10 Multi-Value Model (MV)
10.1 How are multi-valued attributes modeled in COM?
10.2 How are multi-valued attributes modeled in MV?
10.3 Links
Below we describe some of the most important differences between these two approaches:
A primary key is a normal column adapted to a special purpose. A reference is a special mechanism from the outset, which has much less to do with object properties. In other words, a reference is NOT a column or dimension. It is a separable part of an object, which is intended to represent the source object in another space. A reference may in fact have properties (both its own and the object's), but the difference from a primary key lies in the role of the identifier.
Yes, it is possible to some extent, but it requires significant effort as well as manual implementation and control of many features. In addition, it requires following a strict discipline in using this model. Although such a simulation is possible for restricted use, in the general case this approach does not guarantee that all aspects will be taken into account. The situation is similar to object-oriented programming using only the facilities of a procedural programming language, where we need to implement some convenience techniques manually and follow the object-oriented discipline. The relational model and its implementations in state-of-the-art DBMSs are rich enough to simulate other data models. However, the main advantage of the concept-oriented approach is its simplicity, reliability, flexibility and expressiveness in describing real problem domains.
In the relational model there is a problem of normalization, which consists in making the data schema satisfy particular criteria. In this form the problem does not exist in COM, because COM enforces a concrete structure of concepts and can therefore be viewed as normalized by definition. However, nothing prevents us from formulating the problem of normalization in some other, modified form. In the relational model the problem of normalization is especially relevant because this model does not separate object representation from semantic characterization and treats primary keys as normal columns. In other words, we define columns and then specify the role of some of them as representatives of the whole record. In COM we explicitly separate these functions in a principled manner by using the mechanism of references for object representation and object characteristics, or dimensions, for semantic characterization. Although it is quite possible to use some semantic characteristics to represent an object, this is considered a kind of optimization or a trick, which is not encouraged. By postulating such a separation of functions we solve many serious problems, in particular the problem of normalization. More precisely, the normalization problem does not disappear altogether but rather takes on another interpretation.
In the two-level concept-oriented data model there exists only one root element and a number of concepts, each having a number of items, where items are identified via their references. Item references have a system default format. If the default format is too restrictive for a model, it is possible to define an arbitrary format of item references as a set of fields. This new format is analogous to primary keys. However, in contrast to primary keys in the relational model, the format of references is dual to the format of the items themselves. In concept-oriented programming (COP) the format of references is defined by the reference class while the format of items is described by the object class. In the relational model a primary key is defined by assigning a special role to columns.
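The duality between the reference format and the item format can be sketched in Python. This is only an illustration under stated assumptions and not COP syntax: the names `OrderRef`, `Order` and `Concept` are hypothetical, and a dictionary stands in for whatever storage a real system would use.

```python
from dataclasses import dataclass

# Hypothetical reference format: a custom set of fields that identifies an item.
@dataclass(frozen=True)
class OrderRef:
    region: str
    number: int

# Hypothetical item format: the characteristics (dimensions) of the item itself.
@dataclass
class Order:
    customer: str
    total: float

class Concept:
    """A concept stores items and resolves references to them."""

    def __init__(self):
        self._items = {}

    def add(self, ref, item):
        # A reference must identify exactly one item within the concept.
        if ref in self._items:
            raise KeyError(f"reference already in use: {ref}")
        self._items[ref] = item
        return ref

    def resolve(self, ref):
        return self._items[ref]

orders = Concept()
ref = orders.add(OrderRef("EU", 17), Order("Smith", 99.5))
print(orders.resolve(ref).customer)  # Smith
```

Note that the identifying fields live only in `OrderRef`; the `Order` item carries no key columns of its own, which is the point of contrast with a relational primary key.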
Strictly speaking, no, there is no inheritance in COM. Each concept has a number of superconcepts, which are not considered base concepts as they are in the object-oriented paradigm. In particular, a subconcept is not formally supposed to extend or specialize its superconcepts. The main role of the subconcept consists in defining the possible structure of its items as a combination of items taken from the superconcepts (and not from any other concept).
There exist the following interconnections between conventional inheritance and the concept-oriented model:
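The idea that a subconcept item is a combination of superconcept items can be sketched as follows. The concepts and item names are hypothetical, and the sketch assumes that every item takes exactly one value from each superconcept.

```python
# Two hypothetical superconcepts, each represented as a set of items.
orders = {"o1", "o2"}
products = {"p1", "p2", "p3"}

# The subconcept declares which superconcepts its items are drawn from.
superconcepts = {"order": orders, "product": products}

def make_item(**coords):
    """Build a subconcept item as a combination of superconcept items."""
    if set(coords) != set(superconcepts):
        raise ValueError("an item needs exactly one value per superconcept")
    for name, value in coords.items():
        if value not in superconcepts[name]:
            raise ValueError(f"{value!r} is not an item of superconcept {name!r}")
    return dict(coords)

order_items = [
    make_item(order="o1", product="p2"),
    make_item(order="o2", product="p2"),
]
```

Nothing is inherited here: the subconcept does not extend `orders` or `products`, it only restricts where the parts of its items may come from.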
In the object-oriented approach an object is a piece of information with its own identity, which distinguishes it from other objects. This is also true in COM because each data item has its identity represented by the reference. However, a subtle difference is that in COM an object does not have its own (private or internal) semantics; in this sense it is distributed all over the concept structure. In other words, the object semantics is expressed via other objects, and generally we can find parts of this object in very different concepts represented by other objects. This is a consequence of the principle that an object is distributed in the space horizontally (multidimensionally) and vertically (hierarchically). In particular, we can produce an abstract view of an object as well as a more specific and detailed view of the very same object. So in the concept-oriented approach we essentially do not know what an object is. For example, is an order item its reference, its date of issue or the address of its customer? We do not know. It depends on what we need at the current moment.
In the object-oriented approach we have arbitrary references between objects but we do not know how to interpret them. Thus each reference has its own interpretation, which is not an integral part of the model. The only thing we can do with a reference is to follow it. In the CO approach the database knows precisely the meaning of references and this is why they cannot be used arbitrarily: creating one new reference adds a concrete piece of semantics to the whole database and we have to understand what it means.
The goal of Formal Concept Analysis (FCA) consists in developing methods for the representation and analysis of data. Its specific feature is that the data is structured into units, which are formal abstractions of concepts of human thought allowing meaningful and comprehensive interpretation. FCA is heavily based on lattice theory and essentially it can be viewed as an adaptation of lattice theory to the needs of data representation and analysis.
FCA syntax is described by a set of attributes M.
FCA attributes are analogous to COM primitive concepts of binary type. Such a COM concept might have two items, 0 and 1, or, alternatively (and better), it might have a single item 1 while the second value is denoted by null. Thus an object takes either the only concept value 1 or null.
FCA semantics is described by a set of objects G where each object is characterized by a subset of attributes. This information is represented by a so-called incidence relation I, which is a subset of G x M. A triple <G,M,I> is called an FCA formal context. An alternative (equivalent) approach is to represent each object as a set of its binary characteristics corresponding to FCA attributes. If an object is characterized by an attribute then the attribute value is 1, otherwise the value is 0. All objects are characterized by a common set of attributes and live in one space G. This space is a COM concept (not an FCA concept). FCA objects are analogous to COM items.
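The two equivalent representations of a formal context can be illustrated with a small sketch. The objects and attributes below are made up for the example; only the symbols G, M and I come from the text.

```python
# A small hypothetical formal context <G, M, I>.
G = ["duck", "dog", "carp"]            # objects
M = ["flies", "swims", "has_legs"]     # attributes
I = {                                  # incidence relation, a subset of G x M
    ("duck", "flies"), ("duck", "swims"), ("duck", "has_legs"),
    ("dog", "swims"), ("dog", "has_legs"),
    ("carp", "swims"),
}

# Equivalent view: each object as a row of binary attribute values.
rows = {g: {m: int((g, m) in I) for m in M} for g in G}

# The incidence relation can be recovered from the binary rows.
I_again = {(g, m) for g in G for m in M if rows[g][m] == 1}

print(rows["dog"])  # {'flies': 0, 'swims': 1, 'has_legs': 1}
```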
An FCA concept is a pair consisting of a subset of objects and a subset of attributes, where the subset of attributes precisely describes the set of objects (and dually, the subset of objects precisely covers the subset of attributes). The subset of objects is called the concept extent while the subset of attributes is called the concept intent. This definition means that if an additional object is added to the concept extent, then the intent generally has to be adjusted as well, since it must contain exactly the attributes shared by all objects of the new extent (otherwise the pair is not an FCA concept). Dually, if we remove some attribute from the concept intent, then the extent has to be adjusted so that it contains exactly the objects that have all remaining attributes (otherwise the pair is not an FCA concept).
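The condition that extent and intent determine each other can be checked mechanically. The following sketch uses the standard FCA derivation operators on a hypothetical context; the function names are chosen for this example only.

```python
# Hypothetical context: each object mapped to the set of attributes it has.
context = {
    "duck": {"flies", "swims", "has_legs"},
    "dog": {"swims", "has_legs"},
    "carp": {"swims"},
}
all_attributes = set().union(*context.values())

def common_attributes(objects):
    """Attributes shared by every object in the given set."""
    shared = set(all_attributes)
    for g in objects:
        shared &= context[g]
    return shared

def common_objects(attributes):
    """Objects that have every attribute in the given set."""
    return {g for g, attrs in context.items() if attributes <= attrs}

def is_concept(extent, intent):
    """A pair is an FCA concept when each side determines the other."""
    return common_attributes(extent) == intent and common_objects(intent) == extent

print(is_concept({"duck", "dog"}, {"swims", "has_legs"}))  # True
print(is_concept({"duck"}, {"swims", "has_legs"}))         # False
```

The second pair fails because the attributes listed do not precisely describe the single object `duck`, which also has `flies`.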
FCA concepts cannot be directly specified because they are derived from FCA semantics and depend on this semantics, i.e., if the FCA semantics changes then the set of FCA concepts also changes. For example, if objects change their attribute values or objects are deleted or added, then the set of concepts may change. Thus the only way to change the set of FCA concepts consists in changing the FCA semantics.
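That concepts are derived rather than specified can be seen in a brute-force sketch that computes all concepts from a context and then recomputes them after one attribute value changes. The function `concepts` and the sample data are assumptions of this example; the enumeration over all subsets of objects is exponential and only suitable for tiny contexts.

```python
from itertools import combinations

def concepts(context):
    """Derive all (extent, intent) pairs by closing every subset of objects."""
    objects = list(context)
    attributes = set().union(*context.values()) if context else set()
    found = set()
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            # Intent: attributes shared by all objects of the subset.
            intent = set(attributes)
            for g in subset:
                intent &= context[g]
            # Extent: all objects that have every attribute of the intent.
            extent = {g for g in objects if intent <= context[g]}
            found.add((frozenset(extent), frozenset(intent)))
    return found

context = {"duck": {"flies", "swims"}, "carp": {"swims"}}
before = concepts(context)

# Changing the semantics (here, one attribute value) changes the derived concepts.
context["carp"] = {"swims", "flies"}
after = concepts(context)

print(len(before), len(after))  # 2 1
```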
An OLAP dimension is a concept hierarchy where superconcepts correspond to an abstract representation with little detail while subconcepts correspond to a detailed representation. The items are values of this OLAP dimension representing objects at different levels of detail.
An OLAP measure is defined precisely like an OLAP dimension, i.e., it is a concept hierarchy. Normally, however, an OLAP measure has a numeric type so that its items can be arithmetically aggregated. In the general case the measure could be non-numeric or even multidimensional, where it is a product of several concepts (like dimensions).
Drill down corresponds to choosing a subconcept of the current concept involved in the OLAP dimension. Thus we move to a more detailed view because subconcepts have more detailed items. Roll up is the opposite operation of moving to a superconcept of the current concept. Thus by changing the current concept we can change the level of detail. Generally there is more than one superconcept and subconcept defined for an OLAP dimension, so the choice may be more complex.
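Drill down and roll up can be sketched as moves along a concept hierarchy. The hierarchy Country, City, Address and the `choice` parameter are hypothetical; the parameter only reflects that a concept may have several subconcepts or superconcepts to choose from.

```python
# Hypothetical dimension: each concept lists its direct subconcepts (more detail).
subconcepts = {"Country": ["City"], "City": ["Address"], "Address": []}

# Superconcepts are obtained by inverting the same structure.
superconcepts = {c: [] for c in subconcepts}
for parent, children in subconcepts.items():
    for child in children:
        superconcepts[child].append(parent)

def drill_down(current, choice=0):
    """Move to a subconcept of the current concept (more detailed view)."""
    options = subconcepts[current]
    if not options:
        raise ValueError(f"{current} has no subconcepts")
    return options[choice]

def roll_up(current, choice=0):
    """Move to a superconcept of the current concept (less detailed view)."""
    options = superconcepts[current]
    if not options:
        raise ValueError(f"{current} has no superconcepts")
    return options[choice]

level = "City"
level = drill_down(level)         # now "Address"
level = roll_up(roll_up(level))   # now "Country"
```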
An OLAP cube is the product of the current concepts chosen for each dimension, i.e., it consists of all combinations of items from the current concepts. Each such combination of items is called a cell. By applying drill down and roll up we can change the current concepts and thus change the OLAP cube and its granularity.
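The cube as a product of the current concepts can be written down directly. The dimensions and items below are invented for this sketch.

```python
from itertools import product

# Hypothetical current concepts chosen for two dimensions.
current = {
    "time": ["2005", "2006"],
    "place": ["Bonn", "Berlin", "Munich"],
}

def cube(current_concepts):
    """All cells: every combination of one item per dimension."""
    names = list(current_concepts)
    return [
        dict(zip(names, combo))
        for combo in product(*(current_concepts[n] for n in names))
    ]

cells = cube(current)
print(len(cells))  # 6

# Rolling up "place" to a coarser concept changes the cube's granularity.
current["place"] = ["Germany"]
print(len(cube(current)))  # 2
```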
An OLAP representation is a function that assigns one measure value to each cell of the OLAP cube. This means that for each combination of items from the concepts selected in the OLAP dimensions this function finds some item from the measure concept. In the general case the measure is not necessarily numeric, and then the function assigns an arbitrary item of the measure concept to each cell of the OLAP cube.
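A representation function can be sketched for a numeric measure. The facts are made up, and summation is only one possible way to arrive at the measure value; the text does not prescribe how the value is computed.

```python
from itertools import product

# Hypothetical detailed facts: (year, city, amount).
facts = [
    ("2005", "Bonn", 10.0),
    ("2005", "Bonn", 5.0),
    ("2005", "Berlin", 7.0),
    ("2006", "Berlin", 3.0),
]

years = ["2005", "2006"]
cities = ["Bonn", "Berlin"]

def representation(cell):
    """Assign one measure value to a cell; here the sum of matching amounts."""
    year, city = cell
    return sum(a for (y, c, a) in facts if y == year and c == city)

# The function is defined for every cell of the cube, including empty ones.
table = {cell: representation(cell) for cell in product(years, cities)}

print(table[("2005", "Bonn")])  # 15.0
```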
In the concept-oriented data model higher-level concepts are supposed to be more stable because they describe common terms used in subconcepts. In other words, the syntax and semantics of superconcepts change much less frequently than those of the subconcepts. The terms in the concept-oriented model are items (not concepts) from the common superconcepts. Having a common set of superconcepts with their term items allows us to position our own custom items in one and the same common space. In other words, such an approach allows sharing the space structure and its coordinate system. In order to define a new item we have to specify its position in this common space by specifying its coordinates from the superconcepts. Thus we can link our new custom terms to the system of common terms.
An example of such a common system of terms might be a product categorization schema, which consists of a set of concepts along with their semantics describing a list of products. Once such a schema has been fixed it can be shared among participating agents. This guarantees then that their own more specific databases will be linked against one system of product categorization. As a consequence the agents can exchange information and understand it because understanding means having a common system of terms or living in one space.
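The product categorization example can be sketched as follows. The category items, the product names and the helper `link` are all hypothetical; the sketch only shows that items from independent agents become comparable once they are positioned against the same shared superconcept.

```python
# Hypothetical shared superconcept: a fixed product categorization.
categories = {"laptop", "phone", "printer"}

def link(item_name, category):
    """Position a custom item in the common space via a shared category."""
    if category not in categories:
        raise ValueError(f"unknown shared category: {category!r}")
    return {"name": item_name, "category": category}

# Two agents keep their own items but use the same coordinates.
agent_a = [link("model A1", "laptop"), link("model A2", "printer")]
agent_b = [link("model B7", "laptop")]

def by_category(items, category):
    """Names of all items positioned at the given shared category."""
    return [i["name"] for i in items if i["category"] == category]

# Because the coordinate system is shared, both agents' items can be queried together.
laptops = by_category(agent_a + agent_b, "laptop")
print(laptops)  # ['model A1', 'model B7']
```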
The most primitive knowledge sharing system is that of numbers. Nobody asks what the number 10 means because everybody has the primitive concept of real numbers stored in their knowledge base. The concept-oriented approach allows sharing much more complex knowledge. This is equivalent to standardizing the structure of the space in which we live.
URM was started as the independent work of several researchers with the goal of studying interrelational data dependencies. The relational model successfully solved the problem of physical navigation but it failed to solve the problem of logical navigation, where the user still has to specify concrete access paths in order to get a correct result set. The main goal of URM consists in achieving complete access path independence and freeing the user from specifying the access path for each query.
Here are the main assumptions of URM:
In the concept-oriented data model there is no dedicated mechanism for modeling multi-valued attributes. Instead, they are modeled by means of inverse dimensions. In other words, normal dimensions are always one-valued while inverse dimensions are always multi-valued. This property can be used to determine the direction of a dimension and the mutual position of concepts.
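The relationship between a one-valued dimension and its multi-valued inverse can be sketched as follows. The concepts `OrderItems` and `Orders`, the field names and the helper `inverse_dimension` are assumptions of this example.

```python
from collections import defaultdict

# Hypothetical items of a subconcept OrderItems; "order" is a normal
# (one-valued) dimension pointing to an item of the superconcept Orders.
order_items = [
    {"id": "i1", "order": "o1"},
    {"id": "i2", "order": "o1"},
    {"id": "i3", "order": "o2"},
]

def inverse_dimension(items, dimension):
    """Map each superconcept item to all subconcept items that reference it."""
    result = defaultdict(list)
    for item in items:
        result[item[dimension]].append(item["id"])
    return dict(result)

items_of = inverse_dimension(order_items, "order")
print(items_of["o1"])  # ['i1', 'i2']
```

Each order item has exactly one order, while each order yields a list of order items, so the multi-valued "attribute" of an order is never stored in the order itself.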
Records have multi-valued fields where values are separated by value marks. The values within these fields are handled by the query language.
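A multi-valued field delimited by value marks can be imitated with plain strings. This is a sketch only: the record layout and helper names are hypothetical, and the delimiter is an assumption (Pick-style systems conventionally use character 253 as the value mark).

```python
# Assumed delimiter; Pick-style systems conventionally use character 253.
VALUE_MARK = chr(253)

def pack(values):
    """Store several values in one field, separated by value marks."""
    return VALUE_MARK.join(values)

def unpack(field):
    """Split a multi-valued field back into its individual values."""
    return field.split(VALUE_MARK) if field else []

record = {"name": "Smith", "phones": pack(["555-0101", "555-0102"])}
phones = unpack(record["phones"])
print(phones)  # ['555-0101', '555-0102']
```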