Testwiki:Data model/en


Wikidata represents Template:Wikipedia as data items (e.g. Template:Q and Template:Q are data items). Knowledge about data items is represented via statements, whose basic structure consists of a subject, a predicate and an object. For example, Template:Statement.

  • The subject of a statement is usually a data item — in this case, Template:Q.
  • The predicate of a statement is always a property — in this case, Template:P.
  • The object of a statement is a value of the data type of the property — in this case, an item, Template:Q.

The property used in a statement determines both the meaning of the statement (i.e. the nature of the relationship between the subject and the object) and which values may be used, as specified by its data type.

In the example above, we used the property Template:P, whose values must have the data type Template:Datatype, allowing a data item to be set as the object of the statement (in our example, Template:Q).

An example of a property with a different data type is Template:P, whose values must be of data type Template:Datatype, so it can only be used to state a point in time.

Wikidata also allows statements to be qualified with further properties, which are called Template:Ll. For example, we might state Template:Statement.
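The subject, predicate, object and qualifier structure described above can be sketched as a small data structure. This is only an illustration, not how Wikibase stores data: the class and field names are invented for this sketch, values are simplified to plain strings, and the IDs are illustrative and not necessarily the ones used in the examples on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Qualifier:
    property_id: str  # a property ID, e.g. "P580"
    value: str        # simplified: every value is a plain string here


@dataclass(frozen=True)
class Statement:
    subject: str      # usually an item ID, e.g. "Q42"
    predicate: str    # always a property ID, e.g. "P31"
    obj: str          # a value of the property's data type
    qualifiers: tuple = ()  # zero or more Qualifier objects


# A plain statement: subject Q42, property P31, object Q5 (illustrative IDs).
plain = Statement("Q42", "P31", "Q5")

# A qualified statement: the qualifier P580 narrows the main claim.
# "Q691283" is used purely as a placeholder item ID for the object.
qualified = Statement(
    "Q42", "P69", "Q691283",
    qualifiers=(Qualifier("P580", "1971"),),
)
```

Keeping qualifiers attached to the statement, rather than as separate triples, mirrors the point made above that a qualifier refines one specific statement.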

The information on this page is not required to contribute to Wikidata or to consume Wikidata. To learn about contributing to or consuming Wikidata, please refer to the pages Template:Ll and Template:Ll respectively.

Three levels of data models

Wikidata is powered by the Template:Ll software. While Wikibase defines Template:Data data types by default, it does not come with any Template:Ll out of the box. Wikidata, however, has [[Special:ListProperties|Template:Pages in namespace properties]], which have all been created specifically for Wikidata and are defined within Wikidata itself. (Don't worry about that large number: a large share of these properties are just Template:Ll, i.e. links to items in other databases.)

When we speak of a "Template:Wikipedia" in the context of Wikidata, it can actually refer to one of three things:

  • the data model of the Wikibase software (which is actually more elaborate than just semantic triples[1])
  • the fundamental data model that Wikidata establishes on top of the Wikibase model, which includes the core properties such as Template:P, Template:P and Template:P
  • any of the topic-specific data models (e.g. for instances of Template:Q, there are the properties Template:P and Template:P)

All of these different data models are described on different pages:

  • the data model of the Wikibase software is described on mediawiki.org very technically in the specification and more accessibly in the primer to the Wikibase data model
  • the fundamental data model of Wikidata is not strictly defined; nonetheless, this page attempts to describe it
  • the various topic-specific data models are loosely described via Template:P and more formally via entity schemas.

Note that Wikidata has no central authority that decides how data should be modeled. Instead, that question is decided collaboratively by Template:Ll through public discussion. The data model of Wikidata has evolved over time and is still evolving: new data types can be introduced, new properties are being proposed and created, problematic properties get deprecated, and there is an ongoing effort to better describe how properties are meant to be used via property constraints and entity schemas.


Data model of Wikibase


The data model of Wikidata is based on the data model of Template:Ll, which is described very technically in the specification and more accessibly in the primer to the Wikibase data model.

Wikidata extends the Wikibase data model via extensions. Most notably WikibaseLexeme adds three entity types for Template:Ll (Lexeme, Form and Sense), as described in the WikibaseLexeme data model. Wikidata uses several extensions to add more data types to Wikibase, as described in Template:Ll.

Data types

The data types of Wikidata are described at Template:Ll and listed at Special:ListDatatypes. Wikidata extends the data types of Wikibase via the following three extensions:

This is possible because the data types of Wikibase are extensible. The introduction of more data types can be proposed on Phabricator.

The Wikibase data model has a canonical representation in JSON, which is further described at Template:Ll.
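As a rough illustration of the JSON representation, the sketch below parses an abridged, hand-written statement fragment and reads its main parts. The fragment follows the commonly seen layout (`mainsnak`, `snaktype`, `datavalue`, `rank`) but is not copied from a real item; the helper name `describe` is invented for this sketch, and the linked page remains the authoritative description of the format.

```python
import json

# Abridged, hand-written fragment; real statements carry more fields
# (for example an "id", and optionally "qualifiers" and "references").
raw = """
{
  "mainsnak": {
    "snaktype": "value",
    "property": "P31",
    "datavalue": {
      "value": {"entity-type": "item", "id": "Q5"},
      "type": "wikibase-entityid"
    },
    "datatype": "wikibase-item"
  },
  "type": "statement",
  "rank": "normal"
}
"""


def describe(statement: dict) -> tuple:
    """Return (property, datatype, rank, value) for one statement dict."""
    snak = statement["mainsnak"]
    value = None
    # Only snaks of type "value" carry a "datavalue" entry.
    if snak["snaktype"] == "value":
        value = snak["datavalue"]["value"]
    return snak["property"], snak["datatype"], statement["rank"], value


prop, datatype, rank, value = describe(json.loads(raw))
```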

Note that several data types have limitations, which are listed at Template:Ll.

Also note that there is no clear semantic difference between Template:Datatype and Template:Datatype: several string properties are external identifiers, and Template:P works for both.

Ranks

Every statement in Wikibase has one of three ranks (normal, deprecated or preferred). For the semantics of these ranks please refer to Template:Ll.
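A data consumer typically has to decide which ranks to take into account. The sketch below assumes one common convention, sometimes called "best rank": use the preferred statements if there are any, otherwise the normal ones, and never the deprecated ones. The function name and the dictionary layout are invented for this illustration; the page linked above is the authority on rank semantics.

```python
def best_statements(statements):
    """Return preferred statements if any exist, otherwise normal ones.

    Deprecated statements are never returned.
    """
    preferred = [s for s in statements if s["rank"] == "preferred"]
    if preferred:
        return preferred
    return [s for s in statements if s["rank"] == "normal"]


# Hypothetical values for a single property of a single item.
with_preferred = [
    {"value": "A", "rank": "normal"},
    {"value": "B", "rank": "preferred"},
    {"value": "C", "rank": "deprecated"},
]
without_preferred = [
    {"value": "A", "rank": "normal"},
    {"value": "C", "rank": "deprecated"},
]

best_a = [s["value"] for s in best_statements(with_preferred)]
best_b = [s["value"] for s in best_statements(without_preferred)]
```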

No value and unknown value


  • Template:Statement means that no such value exists (≡ ¬∃ X (Template:Statement))
  • Template:Statement can mean any of the following:
    • the value was once known but has been lost to time (e.g. Template:Statement)
    • the exact value has never been known and might not ever be known (e.g. Template:Statement)
    • the Wikidata contributor who made the statement knows the value exists but doesn't know it personally
    • the value is a known object, but there's no Wikidata item about the object (perhaps because it's not notable).
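The distinction above amounts to three cases that a consumer has to tell apart. A minimal sketch, assuming the snak type names used in the Wikibase JSON format (`value`, `novalue`, `somevalue`); the enum and helper names are invented for this illustration.

```python
from enum import Enum


class SnakType(Enum):
    VALUE = "value"              # a concrete value is given
    NO_VALUE = "novalue"         # no such value exists
    UNKNOWN_VALUE = "somevalue"  # a value exists but is not given


def value_exists(snak_type: SnakType) -> bool:
    """True unless the statement asserts that no value exists."""
    return snak_type is not SnakType.NO_VALUE


def value_is_known(snak_type: SnakType) -> bool:
    """True only if the statement carries a concrete value."""
    return snak_type is SnakType.VALUE
```

Note that an unknown value still asserts existence, so `value_exists` returns True for it even though `value_is_known` returns False.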


Order of values

While Wikibase always stores values in a specific order (insertion order by default), the order of values generally does not imply any semantics. Semantic order is instead expressed via qualifiers, for example:

Note that the order expressed via qualifiers does not necessarily match the order of values in the user interface or the API because these interfaces simply return values in the serialization order, which may or may not match the semantic order expressed by the qualifiers.[2]
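A consumer that needs the semantic order therefore has to sort the values itself. The sketch below assumes that the order is expressed with a numeric ordinal qualifier; P1545 is used as the qualifier ID, the item IDs are hypothetical, and the dictionary layout is invented for this illustration. Non-numeric ordinals would need extra handling.

```python
ORDINAL_QUALIFIER = "P1545"  # assumption: an ordinal qualifier with numeric values


def semantic_order(statements, ordinal_property=ORDINAL_QUALIFIER):
    """Sort statements by their ordinal qualifier.

    Statements without the qualifier are placed last; because sorted()
    is stable, they keep their serialization order among themselves.
    """
    def sort_key(statement):
        ordinal = statement.get("qualifiers", {}).get(ordinal_property)
        if ordinal is None:
            return (1, 0)
        return (0, int(ordinal))

    return sorted(statements, key=sort_key)


# Serialization order as an interface might return it (hypothetical IDs).
serialized = [
    {"value": "Q_C", "qualifiers": {"P1545": "3"}},
    {"value": "Q_A", "qualifiers": {"P1545": "1"}},
    {"value": "Q_X", "qualifiers": {}},
    {"value": "Q_B", "qualifiers": {"P1545": "2"}},
]

ordered = [s["value"] for s in semantic_order(serialized)]
```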

Fundamental entities

The fundamental properties of Wikidata are described in Template:Ll.

For more information, and to find people interested in the Template:Wikipedia of Wikidata, please refer to the Ontology WikiProject.

Fundamental properties

Note: This section assumes that you are familiar with Template:Wikipedia; for a less technical explanation, please refer to Template:Ll. The three arguably most important properties of Wikidata are based on Template:Wikipedia, which is described in the RDF Schema specification.

These properties have the following semantics:

Please note that Template:P and Template:P are both transitive properties:
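Transitivity means that a relation stated over several hops also holds end to end. A minimal sketch of computing everything reachable over a transitive property follows; the class names are hypothetical labels rather than real item IDs, and the function name is invented for this illustration.

```python
def transitive_closure(direct, start):
    """Return every node reachable from start via one or more direct edges."""
    seen = set()
    stack = list(direct.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:  # guards against cycles in the data
            seen.add(node)
            stack.extend(direct.get(node, ()))
    return seen


# Directly stated edges of a transitive property (hypothetical classes).
stated = {
    "dog": {"mammal"},
    "mammal": {"animal"},
    "animal": set(),
}

ancestors = transitive_closure(stated, "dog")
```

Only the edges "dog to mammal" and "mammal to animal" are stated; "dog to animal" is inferred.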


Another important property is Template:P, which is equivalent to owl:inverseOf and carries the following semantics:
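In the sense of owl:inverseOf, if property P is the inverse of property Q, then a statement "A P B" implies "B Q A". A minimal sketch that derives the implied statements which are not yet stated; the pairing of P361 and P527 is used as an assumed example of an inverse pair, the item IDs are hypothetical, and the function name is invented for this illustration.

```python
def infer_inverse(triples, inverse_of):
    """Return inverse triples that are implied but not already stated."""
    inferred = set()
    for subject, prop, obj in triples:
        inverse = inverse_of.get(prop)
        if inverse is None:
            continue  # this property has no declared inverse
        candidate = (obj, inverse, subject)
        if candidate not in triples:
            inferred.add(candidate)
    return inferred


# Assumed inverse pair; Q_wheel and Q_car are hypothetical item IDs.
inverse_of = {"P361": "P527", "P527": "P361"}
stated = {("Q_wheel", "P361", "Q_car")}

implied = infer_inverse(stated, inverse_of)
```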

Restrictiveness of qualifiers

Template:Ll can be either restrictive or non-restrictive. Restrictive qualifiers change the meaning or scope of a statement, so data consumers that want to interpret Wikidata statements correctly have to take them into account. Non-restrictive qualifiers, on the other hand, only add information that can be safely disregarded without changing the meaning or scope of the statement.

Examples of restrictive qualifiers are:

The restrictiveness of properties when used as a qualifier is currently modeled via Template:Statement and Template:Statement (note that, as always, you have to take the transitivity of Template:P into account).

Unfortunately, some properties aren't clear-cut and can be restrictive as well as non-restrictive when used as a qualifier, so we can group qualifier properties into four categories:


Negation

Wikibase does not have built-in support for Template:Wikipedia, so negation has to be modeled with separate properties. For example, Template:P can be negated with Template:P. Such negating properties only exist for Template:Quickquery. When the need for a new negating property arises, it Template:Ll.

The semantics of negating properties are modeled via Template:P, as follows:

Whether or not a property expresses the absence of something is currently modeled via Template:Statement.

Differences from OOP

Contrary to Template:Wikipedia, there is nothing preventing an entity from being both an instance and a class.

Furthermore, an entity can be an instance of multiple classes, as well as a subclass of multiple classes.

Lastly, you might expect that an instance automatically inherits all statements from its parent classes; however, that is explicitly not the case, as explained in Template:Ll.

Inferring classes

Properties may specify Template:P which has the semantics:

Classes can be defined to be a Template:Wikipedia or a Template:Wikipedia of other classes with Template:P and Template:P respectively. Their concrete semantics are as follows:

Let's define classesOf(X) := {C | X is an instance of C}.

Classes may specify Template:P which has the semantics:

Classes may specify Template:P which has the semantics:

Inheritance

If you are familiar with Template:Wikipedia, you might expect that instances of a class inherit the statements of that class. This is generally not the case. For example, the fact that Template:Statement and Template:Statement does not mean that Template:Statement. However, there are some properties that are likely to be inherited:

Property Inverse property
Template:P Template:P
Template:P none
Template:P Template:P
Template:P Template:P

For example, Template:Statement and Template:Statement can be used to correctly infer Template:Statement.

When attempting to make such inferences, don't forget to take ranks, Template:Ll and negation into account, as explained in Template:Ll.

Does a statement apply?

The following is an attempt at outlining a strategy to decide whether a particular statement applies to a given entity:

  1. Statements Template:Ll have been superseded and therefore no longer apply.
  2. Statements with a Template:Ll only apply with regard to the respective qualifier.
  3. Statements of certain properties are likely to be inherited (see Template:Ll). Note, however, that instances or intermediary classes may negate statements inherited from a parent class, as described in Template:Ll.
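The first two steps of this strategy can be sketched as a filter over an entity's statements. This is a simplified illustration under several assumptions: whether a statement has been superseded is represented by a hypothetical boolean flag, the set of restrictive qualifier properties is passed in by the caller, and the property IDs are placeholders. Inheritance and negation (step 3) are not covered.

```python
def partition_statements(statements, restrictive_properties):
    """Split statements into unconditional and conditional ones.

    Superseded statements are dropped (step 1). Statements carrying a
    restrictive qualifier are returned together with the qualifiers
    that limit their scope (step 2).
    """
    unconditional, conditional = [], []
    for statement in statements:
        if statement.get("superseded"):
            continue
        scope = {
            prop: value
            for prop, value in statement.get("qualifiers", {}).items()
            if prop in restrictive_properties
        }
        if scope:
            conditional.append((statement, scope))
        else:
            unconditional.append(statement)
    return unconditional, conditional


# Placeholder property IDs, not real Wikidata properties.
statements = [
    {"value": "v1", "superseded": True},
    {"value": "v2", "qualifiers": {"P_RESTRICTIVE": "Q_part"}},
    {"value": "v3", "qualifiers": {"P_INFO": "note"}},
]

unconditional, conditional = partition_statements(statements, {"P_RESTRICTIVE"})
```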


Reflexive statements

Template:Statement has unclear semantics if A is a class. It could mean:

  1. an instance of A has a relation P to another instance of A (which may or may not be the same instance)
  2. an instance of A has a relation P to a different instance of A (which cannot be the same instance)
  3. an instance of A has a relation P to itself

See Template:Pps for a proposal to introduce a qualifier property to differentiate these cases.

Format string properties

Wikidata has several format string properties, such as Template:P, Template:P and Template:P.

The formatting mechanism of these properties and the kind of values they produce are currently not stated in a machine-readable manner; however, that might change with the introduction of the proposed Template:Pps properties.
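Because the mechanism is not machine-readable, a consumer has to hard-code it per property. The sketch below assumes the `$1` placeholder convention used by formatter URL style properties, where `$1` is replaced by the statement's value; the example URL and identifier are made up, and other format string properties may use different conventions.

```python
def apply_format(format_string: str, value: str, placeholder: str = "$1") -> str:
    """Substitute value for every occurrence of the placeholder.

    No escaping or URL encoding is applied; whether that is needed
    depends on the property and on the target system.
    """
    return format_string.replace(placeholder, value)


# Made-up format string and identifier.
url = apply_format("https://example.org/id/$1", "abc123")
```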


Property constraints

Wikidata employs Template:Ll to combat property misuse. Property constraints are implemented by Extension:WikibaseQualityConstraints and have been stated on properties via Template:P since 2017.[3] Violations of such property constraints are displayed directly in the Wikidata user interface.

More complex property constraints can be implemented as Template:Ll queries and placed with Template:Tl on property talk pages. Violations of such complex constraints are periodically reported by a bot on pages within the Category:Complex constraint violation reports category.

For more information about property constraints, please refer to Template:Ll and the property constraints WikiProject.

Topic-specific data models

Wikidata covers many topics, such as art, biology, countries, cities, monuments, movies, people, software, websites, writings, etc. All entities of these topics that are Template:Ll somehow need to be represented as Template:Ll with Template:Ll. So which statements should be made for a specific entity type and which properties should be used for these statements? The answers to these questions are subject to the topic-specific data model that should be used for the specific topic. So, which data model should be used for a given topic? That is decided collaboratively by the Wikidata community through public discussion. The discussions and efforts about a specific topic in Wikidata are organized via Template:Ll.

Where can you find topic-specific data models?

Entity schemas

An alternative approach to property constraints is using the Template:Wikipedia data modelling language. For Wikidata such schemas can be stored within the EntitySchema:* namespace on the wikidata.org wiki (which is enabled by the EntitySchema MediaWiki extension). Note that the effort to establish such schemas for Wikidata is very much ongoing: the Shape Expression for class property proposal is currently on hold because the EntitySchema data type is not yet implemented.[5]

For more information about Wikidata Schemas, please refer to the Schemas WikiProject.


See also

References

  1. Items can have labels, descriptions, aliases and sitelinks; statements have a rank and can have qualifiers and references; and values can also be specified as no value or unknown value.
  2. Phabricator task T173432: Sort claims of a property in meaningful way.
  3. Phabricator task T102759: Migrate constraints from property talk pages to statements on properties.
  4. It is possible that such variety will be standardized in the future.
  5. Phabricator task T214884: linking Schemas in statements.
