Difference between revisions of "PO Developers Guide"

From Plant Ontology Wiki

Revision as of 19:58, 4 February 2011

This page is under re-construction

The Plant Ontology (PO) is an OBO Foundry candidate ontology, and the Plant Ontology Consortium (POC) is guided by the OBO Foundry principles. Other standard operating procedures have been developed to ensure consistency among curators.

Plant Ontology Files

Our curation and management model is based on a tested protocol established by the Gene Ontology Consortium.

Ontology files

Plant Ontology (PO) development work was originally based around four separate text files in flat file format.

anatomy.ontology -- Plant structure related terms and relationships

anatomy.defs -- Definitions of Plant structure related terms

development.ontology -- Growth and development stage terms and relationships

development.defs -- Definitions of growth and development stage terms


Definition files were eventually merged into the ontology files, so that the ontologies were maintained in two files using the OBO flat file format:

po_anatomy.obo -- Plant structure related terms, relationships, and definitions

po_temporal.obo -- Growth and development stage terms, relationships, and definitions


In January 2011, the Plant Structure Ontology (PSO) and the Plant Growth and Development Stage Ontology (PGDSO) were merged into one file:

plant_ontology.obo -- includes terms, relationships, and definitions related to plant anatomical entities as well as plant growth and development stages


This merged file allows users to download the entire ontology as a single file, and allows editors to create relations between terms in the two branches of the PO.

In compliance with OBO Foundry principles, the PO is "open and available to be used by all without any constraint other than (a) its origin must be acknowledged and (b) it is not to be altered and subsequently redistributed under the original name or with the same identifiers." The live version of this file is available through our SVN repository at http://palea.cgrb.oregonstate.edu/viewsvn/Poc/tags/live/plant_ontology.obo?view=co as well as on our web browser at [1].

For the immediate future, the POC will maintain separate versions of the po_anatomy.obo and po_temporal.obo files that match the live version of plant_ontology.obo (but obviously will lack links between the two branches) at our SVN site. However, users are encouraged to switch to the merged file as soon as possible.

The editors' version of the plant_ontology.obo file is maintained at the SVN repository at http://palea.cgrb.oregonstate.edu/viewsvn/Poc/trunk/ontology/OBO_format/plant_ontology.obo?view=log . Outside users should be extremely cautious about accessing this version of the file, as it is under regular revision, and changes made to this file may not be incorporated into live releases.

The SVN system allows users and editors to track changes to POC documents, allows members to work on the same documents concurrently, and allows differences between different curators' versions of the file to be merged reliably.

In order to update the ontologies, curators use OBO-Edit. This open source Java application facilitates editing and implements the rules and constraints needed to maintain internal consistency in the ontology.

Annotation files

Plant Ontology Editing Guidelines

In addition to the OBO Foundry principles, the POC has established internal guidelines to ensure consistent editing practices, as described below.

Ontology file syntax

As specified in the OBO Foundry principles, ontologies must be expressed in either the OBO syntax, extensions of this syntax, or OWL. All editing work and the live version of the PO are in the OBO format. For each live release, an OWL version of the plant_ontology file is also available at: http://palea.cgrb.oregonstate.edu/viewsvn/Poc/tags/live/plant_ontology.owl?view=log.
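For readers who have not worked with the OBO flat file format, the following is a minimal sketch of how [Term] stanzas could be read with Python. It is not part of the POC tooling. The sample text, the identifiers PO:0000001 and PO:0000002, and the function name parse_terms are hypothetical, and the reader deliberately ignores details such as trailing "!" comments and escaped characters.

```python
from collections import defaultdict

# Hypothetical sample in OBO syntax; the IDs and names are made up for illustration.
SAMPLE = """\
format-version: 1.2

[Term]
id: PO:0000001
name: example structure
def: "An illustrative definition." [POC:curators]

[Term]
id: PO:0000002
name: example substructure
is_a: PO:0000001
"""


def parse_terms(text):
    """Return one mapping of tag -> list of values for each [Term] stanza."""
    terms = []
    current = None  # None while reading the header or a non-Term stanza
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith("["):
            # A stanza header: only [Term] stanzas are collected here.
            current = defaultdict(list) if line == "[Term]" else None
            if current is not None:
                terms.append(current)
            continue
        if current is None:
            continue
        # Split on the first colon only, so "id: PO:0000001" keeps its value intact.
        tag, _, value = line.partition(":")
        current[tag.strip()].append(value.strip())
    return terms


terms = parse_terms(SAMPLE)
```

Each tag maps to a list because a tag such as is_a or synonym may appear more than once in a stanza.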

Unique identifiers

As specified in the OBO Foundry principles, the PO must possess a unique identifier space within the OBO Foundry, so that the source of a term (i.e. class) from any ontology can be immediately identified by the prefix of its identifier. The prefix for Plant Ontology terms is PO, with the syntax PO:nnnnnnn, where nnnnnnn is a zero-padded unique integer of seven digits. To ensure database integrity, unique identifiers are never removed after they have been published in a live release of the ontology. Instead, terms that are retired from the ontology are moved into the obsolete category (see below).
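The PO:nnnnnnn syntax can be checked and generated mechanically. The sketch below is one way to do so in Python; the function names format_po_id and is_valid_po_id are hypothetical and not part of any POC tool.

```python
import re

# "PO:" followed by exactly seven ASCII digits, as described above.
PO_ID_PATTERN = re.compile(r"PO:[0-9]{7}")


def format_po_id(number: int) -> str:
    """Zero-pad an integer into the PO:nnnnnnn form."""
    if not 0 <= number <= 9_999_999:
        raise ValueError(f"{number} does not fit in seven digits")
    return f"PO:{number:07d}"


def is_valid_po_id(identifier: str) -> bool:
    """Return True only if the whole string has the PO:nnnnnnn form."""
    return PO_ID_PATTERN.fullmatch(identifier) is not None
```

For example, format_po_id(25) yields "PO:0000025", while is_valid_po_id("PO:25") is False because the number is not zero-padded.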

To ensure that the same identifier is not used twice by different editors, each participating group has been given non-overlapping ranges of numbers (see the [[Accession_IDS_Guide]]). These ranges automatically act as internal identifiers for the group that submitted the term.
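The range scheme can be illustrated with a short Python sketch. The group names and numeric ranges below are invented for illustration and are not the real assignments, which are listed in the accession ID guide. The helper names ranges_overlap and next_free_id are likewise hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical inclusive ranges; NOT the real POC assignments.
ID_RANGES: Dict[str, Tuple[int, int]] = {
    "group_a": (1, 999),
    "group_b": (1000, 1999),
}


def ranges_overlap(ranges: Dict[str, Tuple[int, int]]) -> bool:
    """Return True if any two inclusive ranges share at least one number."""
    ordered = sorted(ranges.values())
    # After sorting by start, checking neighbouring ranges is sufficient.
    return any(prev[1] >= nxt[0] for prev, nxt in zip(ordered, ordered[1:]))


def next_free_id(group: str, used: Set[int],
                 ranges: Dict[str, Tuple[int, int]] = ID_RANGES) -> int:
    """Return the lowest unused number inside the group's own range."""
    low, high = ranges[group]
    for candidate in range(low, high + 1):
        if candidate not in used:
            return candidate
    raise RuntimeError(f"range for {group} is exhausted")
```

Because every group draws only from its own range, two editors in different groups cannot mint the same identifier even when they work offline.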

Adding terms

New terms can often be added to the tips of the ontology (leaf terms) without disturbing the structure of the graph. For example, creating a new specific instance of a more general term will not disturb sibling or cousin terms. However, introducing a new root term or a term into the middle of the graph can disturb existing parent/children relationships, and needs to be performed with more care.

All new terms that are added to the PO should be posted as a request on our Source Forge tracker. This provides an opportunity for all curators and collaborators to comment on the proposed term and its definition and relations. The tracker item also provides a record of the decisions that were made about a term, when those decisions were made, and why.

Merging and splitting terms

Merging and splitting of terms that have descendants can have broad ripple effects. Such changes need to be approved by consensus before they are committed. Any proposed merges or splits should be posted on the Source Forge tracker.

Term names

In general, the primary name for a term is the most commonly used name, with alternative names listed as exact synonyms. When the commonly used name is ambiguous, it is important to have an unambiguous name as an exact synonym. For example, it may not be clear to non-experts or automated reasoners that "epidermal initial" is a type of cell, so it has the exact synonym "epidermal initial cell".

For more information on synonyms, see below under Other attributes for PO terms.

The names for some PO terms may appear artificial, such as "collective leaf structure". These artificial names are used for classes in which many of the instances have common names (in this case, whorl or rosette), but there is no one common name that describes all instances.

Sensu terms: The POC, following the GO, has eliminated the use of the word sensu in term names. Sensu qualifiers were originally introduced into the ontology to disambiguate terms that have different meanings in different communities. For example, the term capsule is used to describe a type of angiosperm fruit, but is also used to describe the sporangium in mosses. Rather than term names like "capsule sensu moss" and "capsule sensu Angiosperma," term names should reflect the differences in characteristics between the two groups. In this example, "sporangium capsule" and "fruit capsule" would be preferable names.

All uses of the word sensu have been removed from PO term names and replaced by the taxon name plus the term name. For example, "integument sensu Zea" has been replaced by "Zea integument." In the near future, most or all of the existing taxon-specific terms will be renamed or merged with other terms. Classes that are not distinguishable from their parent class other than by the taxon in which they occur (such as Zea integument) will be merged with the parent. Classes that have unique characteristics (such as Zea ear) will be given new names that do not include the taxon name.

Plant tissue names: For clarity, the different plant tissue classes in the PO should all be preceded by the prefix "portion of," as in "portion of plant tissue" or "portion of ground tissue." Without the prefix "portion of", the name "ground tissue" could refer to all of the ground tissue in every plant in the world, all of the ground tissue in an individual plant, or any bit of ground tissue in any individual plant. However, many of the curators felt that having the "portion of" prefix in all of the plant tissue names was too user-unfriendly. It was decided to use the "portion of" prefix only for the top few levels (e.g. portion of plant tissue and its direct descendants). Furthermore, if a tissue type has a common name that does not include the word tissue (e.g., epidermis), then the "portion of" prefix is not necessary. Regardless of the names, the definitions of different tissue types should specify whether they refer to a portion of plant tissue (any bit of that tissue type in any individual plant), or the maximal portion of tissue in a plant structure or in a whole plant.

Term definitions

As specified in the OBO Foundry principles, each term in the PO must have a textual definition, and terms should be defined so that their precise meaning within the context of a particular ontology is clear to a human reader.

Whenever possible, internationally accepted nomenclature and definitions obtained from standard reference works, journal articles, and other published sources are used, although they may be modified to fit the genus-differentia form. In cases when a published definition is unavailable, or when published definitions disagree with each other, definitions will be written by the scientific curators. This is often the case for upper-level ontology terms, since many published definitions are written with specific taxa in mind, while PO definitions must be appropriate for all plant taxa to which a term can apply.

All definitions must have a reference that indicates the authority for the term. References are typically textbook or journal article citations. For uniformity, the PO uses citation database identifiers, such as PubMed IDs and ISBN numbers, as DBXrefs. If a definition is written by a curator (as is often the case with lower-level terms that have simple genus-differentia definitions), then the definition dbxref should have the form Database:curator's initials. For example: POC:rw or TAIR:tb. Definitions that are written collectively by the group of POC curators are identified as POC:curators.

Images or diagrams can be helpful for supplementing text definitions in cases where words cannot adequately describe anatomic or developmental relationships. Image files cannot be inserted into the OBO file, but links to reference images on the internet may be included. In order to ensure that the images remain available between live releases, curators should only provide links to websites that are approved by the group and that have stable URLs for images.

Renaming and redefining terms

Since the PO is still actively under development, it is often necessary to make changes to existing terms. This can happen if the curators feel that a term name is not clear, or is not the most commonly used name, or if a definition is found to be incorrect, or if two or more terms are redundant with each other. Curators should use the following guidelines:

Renaming: A change in the primary name of a term should be approved by the group, and a tracker item posted on Source Forge. If the name is changed from a less-common to a more-common name, then the term's ID should stay the same, and the original name should be added as a synonym. If the name is changed because the original name was incorrect for the definition, then the term should be obsoleted, and replaced by a new term with the correct name and definition.

Redefining: Any substantive change in the definition of a term, and any change in the definition of an upper-level term (generally 4 or fewer layers into the ontology) should be approved by the group, and a tracker item posted on Source Forge.

There is an ongoing effort to convert all text definitions in the PO to the genus-differentia form (see Term definitions, above). If a new genus-differentia definition is for a lower-level ontology term and does not change the meaning of the original definition but only uses the new wording format, then a curator may change the definition without posting a tracker item. For example, a change to the definition of epidermis (a term with three levels of ancestors) should be posted on Source Forge, while converting all of the definitions of the different types of epidermis (shoot epidermis, root epidermis, leaf epidermis, etc.) to genus-differentia form does not need a Source Forge tracker item. If the conversion to genus-differentia form changes the meaning of the definition, then a tracker item should be posted.

If the original definition for a term was wrong, and the changes substantially alter its meaning, such that a comparison of the original definition and the new definition suggests that they refer to two different classes, then the original term should be made obsolete and replaced by a term with the same name but the correct definition. Having a live term and an obsolete term with the same name will prevent the ontology from loading onto the web browser, so the original, obsolete term should be renamed "obsolete term name", where "term name" is the original name.
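The naming rule for obsolete terms lends itself to an automated check before a release. The following Python sketch assumes each term is available as a dictionary with "id", "name", and "is_obsolete" keys. That representation, the sample identifiers, and the function names are hypothetical and are not taken from the POC release scripts.

```python
def find_name_clashes(terms):
    """Return IDs of obsolete terms whose name is also used by a live term."""
    live_names = {t["name"] for t in terms if not t["is_obsolete"]}
    return [t["id"] for t in terms
            if t["is_obsolete"] and t["name"] in live_names]


def obsolete_name(name):
    """Apply the 'obsolete term name' convention, without doubling the prefix."""
    prefix = "obsolete "
    return name if name.startswith(prefix) else prefix + name


# Hypothetical terms: the IDs are invented for illustration.
example_terms = [
    {"id": "PO:0000010", "name": "epidermis", "is_obsolete": True},
    {"id": "PO:0000011", "name": "epidermis", "is_obsolete": False},
    {"id": "PO:0000012", "name": "obsolete cortex", "is_obsolete": True},
]

clashes = find_name_clashes(example_terms)
```

Here clashes contains only "PO:0000010", the obsolete term that still carries the live name and should be renamed with obsolete_name.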

Term relationships

Under the standard Plant Ontology data structure, terms are allowed to have a limited number of [/docs/otherdocs/poc_file.html parent/child relationships]. We currently use the following three term-to-term relationship types in the PO:

instance of: This relationship is used in both the developmental and plant structure ontologies to indicate the relationship between a specific term and a more general one. For example, achenium is an instance-of dry indehiscent fruit, which in turn is an instance-of fruit.

[relationship type - instance of]
PO id: is_a
name: instance of
definition: 'Instance of' relationship means that the term is a subclass of its parent. It should not be confused with an 'instance', meaning a specific example.

part of: This relationship is used in the plant structure ontology to indicate a subpart/part relationship within a tissue or organ. For example, ectocarp is part-of pericarp, which in turn is part-of fruit.

[relationship type - part of]
PO id: part_of
name: part of
definition: It indicates a subpart/part relationship within a tissue or organ. Used in a non-restrictive manner, i.e., the parent may or may not have the child as a part, and the child may or may not be a part of the parent. Therefore, a child is sometimes part of its parent and not necessarily always part of a parent term.

develops from: This relationship is used in the plant structure ontology to indicate that a tissue, organ, or cell type develops from its parent term. For example, trichome develops from trichoblast.

[relationship type - develops from]
PO id: develops_from
name: develops from
definition: It indicates that a cell, tissue, or organ develops from its parent term. Implies both develops from and a more indirect relationship, derived from.
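The three relationship types can be pictured as labelled edges in a graph. The Python sketch below encodes the examples given above as (child, relationship, parent) triples and collects the ancestors of a term. It uses term names rather than PO identifiers purely for readability, and the function name ancestors is hypothetical.

```python
from collections import deque

# (child, relationship, parent) triples taken from the examples above.
EDGES = [
    ("achenium", "is_a", "dry indehiscent fruit"),
    ("dry indehiscent fruit", "is_a", "fruit"),
    ("ectocarp", "part_of", "pericarp"),
    ("pericarp", "part_of", "fruit"),
    ("trichome", "develops_from", "trichoblast"),
]


def ancestors(term, edges=EDGES, relations=None):
    """Collect every term reachable upwards from `term`.

    If `relations` is given, only edges of those relationship types are followed.
    """
    parents = {}
    for child, relation, parent in edges:
        if relations is None or relation in relations:
            parents.setdefault(child, []).append(parent)

    seen = set()
    queue = deque([term])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen
```

Restricting the traversal with the relations argument makes it possible to ask separate questions, such as which classes a term belongs to (is_a only) versus which structures it may be part of (part_of only).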

Other attributes for PO terms

The Plant Ontology data structure allows terms to have attributes other than their name, ID and definition. The two attributes that we use in the current PO are Synonym, which indicates an alternative name for the term, and Reference, which indicates the authority for the term. References are typically textbook or journal article citations. For uniformity, we use citation database identifiers, such as PubMed IDs and ISBN numbers as [/docs/dbxref/PO_DBXref.txt DBXrefs]. Any term can have multiple attributes, allowing several synonyms or references to be attached to a term. We will also have external IDs for the terms that have been imported from participating databases (e.g. TAIR IDs and Gramene IDs).

Obsoleting and destroying terms

Quality control and consistency checking

As the monocot and dicot ontologies are merged, conflicts and inconsistencies will inevitably arise. We will monitor the developing PO for inconsistencies by applying the true path rule, which insists that semantic coherence is maintained as terms are followed upwards to their ancestors. Equally importantly, we will subject the PO to continuous testing and evaluation as we use it for practical database curation.