PO Developers Guide

From Plant Ontology Wiki

This page is under re-construction

The Plant Ontology (PO) is an OBO Foundry ontology, and the Plant Ontology Consortium (POC) is guided by the OBO Foundry principles. Other standard operating procedures have been developed to ensure consistency among curators.

Plant Ontology Files

Our curation and management model is based on a tested protocol established by the Gene Ontology Consortium.

In order to update the ontologies, curators use OBO-Edit. This open source Java application facilitates editing and implements the rules and constraints needed to maintain internal consistency in the ontology.

current file format:

In January 2011, the Plant Structure Ontology (PSO) and the Plant Growth and Development Stage Ontology (PGDSO) were merged into one file: plant_ontology.obo. This file includes terms, relationships, and definitions related to plant anatomical entities as well as plant growth and development stages. This merged file allows users to download the entire ontology as a single file and allows editors to create relations between terms in the two branches of the PO.

The editors' version of the plant_ontology.obo file is maintained in the Github repository at https://github.com/Planteome/plant-ontology . Outside users should be extremely cautious about accessing this version of the file, as it is under regular revision, and changes made to this file may not be incorporated into live releases. The Github system allows users and editors to track changes to POC documents, allows members to work on the same documents concurrently, and allows differences between different curators' versions of the file to be merged reliably.

Live versions of the merged Plant Ontology files can be downloaded from the Github repository. For the immediate future, the POC will maintain separate versions of the po_anatomy.obo and po_temporal.obo files that match the live version of plant_ontology.obo (but obviously will lack links between the two branches) at our Github site. However, users are encouraged to switch to the merged file as soon as possible.

downloading Plant Ontology files

For each live release, four different versions of the ontology file are provided, three in OBO syntax and one in OWL syntax:

plant_ontology.obo: The editor's version of the live release, in the OBO format. Contains all relations. Relations that can be inferred based on intersection_of relations are not asserted. This version is best for users who want to run their own reasoner on the ontology.

plant_ontology.owl: The editor's version of the live release, in the OWL format. Contains all relations. Relations that can be inferred based on intersection_of relations are not asserted. This version is best for users who require the OWL format.

plant_ontology_assert.obo: The browser's version of the live release, in the OBO format. All relations that can be inferred based on intersection_of relations have been asserted. This matches the version of the Plant Ontology on our web browser and allows users to view all relations in the ontology without running a reasoner.

Note: Implied relations can be asserted either in OboEdit (using the "assert implied relations" panel) or in Oort. Unlike OboEdit, Oort also asserts the relations that are specified in any intersection_of lines. For example, if leaf sinus is defined as "intersection_of is_a sinus" and "intersection_of part_of leaf", Oort will assert "leaf sinus part_of leaf". "Assert implied relations" in OE will NOT do this, but the relation will show up in graphical view if the reasoner is on. As of release 17, we are using Oort to assert implied relations.
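The leaf sinus example can be sketched in Python. This is only an illustration of the idea, not Oort's implementation. It assumes the OBO 1.2 flat-file layout, in which the genus is written as "intersection_of: ID" and each differentia as "intersection_of: relation ID". The function name and the IDs below are placeholders invented for the example.

```python
def assert_from_intersections(stanza_lines):
    """Return the is_a / relationship lines implied by intersection_of lines."""
    prefix = "intersection_of: "
    asserted = []
    for line in stanza_lines:
        if not line.startswith(prefix):
            continue
        value = line[len(prefix):]
        body = value.split(" ! ", 1)[0]   # part before any trailing "! name" comment
        comment = value[len(body):]       # the " ! name" comment, or an empty string
        if len(body.split()) == 1:
            # Genus line: a bare ID becomes an is_a parent.
            asserted.append("is_a: " + body + comment)
        else:
            # Differentia line: "relation ID" becomes a relationship tag.
            asserted.append("relationship: " + body + comment)
    return asserted


# Placeholder IDs, not real PO identifiers.
leaf_sinus = [
    "id: PO:9999900",
    "name: leaf sinus",
    "intersection_of: PO:9999901 ! sinus",
    "intersection_of: part_of PO:9999902 ! leaf",
]
for new_line in assert_from_intersections(leaf_sinus):
    print(new_line)
```

The second printed line corresponds to the "leaf sinus part_of leaf" relation that Oort asserts and that "Assert implied relations" in OboEdit does not.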

plant_ontology_assert_basic.obo: Similar to the browser's version of the live release, but with only is_a and part_of relations. All relations that can be inferred based on intersection_of relations have been asserted. The relations adjacent_to, derives_by_manipulation_from, develops_from, has_part, participates_in, and has_participant have been removed. This version is best for users whose applications may have difficulty handling relations other than is_a and part_of.

Note: When the adjacent_to, derives_by_manipulation_from, develops_from, has_part, participates_in, and has_participant relations are removed, they are also removed from any intersection_of tags. This leaves some terms with a single intersection_of relation (the is_a relation). These will show up as a warning in OboEdit, and must be removed by hand. An alternative would be to create plant_ontology_assert_basic.obo so that ALL intersection_of lines are removed from the obo file. Since the relations have already been asserted, there is no need to keep the intersection_of tags.
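The clean-up described in this note could be automated along the following lines. This is a minimal sketch, assuming each term is available as a list of OBO "tag: value" lines. The function name to_basic and the IDs are hypothetical, and the POC release process may work differently.

```python
# Relations that are stripped from the basic file, as listed above.
REMOVED_RELATIONS = {
    "adjacent_to", "derives_by_manipulation_from", "develops_from",
    "has_part", "participates_in", "has_participant",
}


def to_basic(stanza_lines):
    """Drop non-basic relations, then drop a lone leftover intersection_of line."""
    kept = []
    for line in stanza_lines:
        tag, _, value = line.partition(": ")
        first_word = value.split(" ", 1)[0]
        if tag in ("relationship", "intersection_of") and first_word in REMOVED_RELATIONS:
            continue
        kept.append(line)
    # A single remaining intersection_of line (the is_a genus) triggers a
    # warning in OboEdit, so it is removed as well.
    remaining = [l for l in kept if l.startswith("intersection_of: ")]
    if len(remaining) == 1:
        kept = [l for l in kept if not l.startswith("intersection_of: ")]
    return kept
```

Removing every intersection_of line unconditionally, the alternative mentioned in the note, would only require replacing the final check with an unconditional filter.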


In compliance with OBO Foundry principles, the PO is "open and available to be used by all without any constraint other than (a) its origin must be acknowledged and (b) it is not to be altered and subsequently redistributed under the original name or with the same identifiers." The live version of this file is available through our Github repository at https://raw.githubusercontent.com/Planteome/plant-ontology/master/po-release-files/plant_ontology.obo as well as on our web browser at http://plantontology.org/amigo/go.cgi.
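For users who want to read the live file programmatically, the following Python sketch downloads it from the URL above and splits it into [Term] stanzas. It assumes the usual OBO flat-file layout of "tag: value" lines. The function names are illustrative and are not part of any POC tool, and trailing "! name" comments are left in the values.

```python
import urllib.request

# URL of the live release file, as given above.
PO_URL = ("https://raw.githubusercontent.com/Planteome/plant-ontology/"
          "master/po-release-files/plant_ontology.obo")


def fetch_plant_ontology(url=PO_URL):
    """Download the OBO file and return its text (requires network access)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def parse_obo_terms(text):
    """Return one dict per [Term] stanza, mapping each tag to a list of values."""
    terms = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are collected.
            current = {} if line == "[Term]" else None
            if current is not None:
                terms.append(current)
        elif current is not None and ": " in line:
            tag, value = line.split(": ", 1)
            current.setdefault(tag, []).append(value)
    return terms
```

A typical use would be terms = parse_obo_terms(fetch_plant_ontology()), after which each term's "id", "name", "is_a", and "relationship" values can be looked up by tag.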

old file formats:

Plant Ontology (PO) development work was originally based around four separate text files in flat file format.

anatomy.ontology -- Plant structure related terms and relationships

anatomy.defs -- Definitions of Plant structure related terms

development.ontology -- Growth and development stage terms and relationships

development.defs -- Definitions of growth and development stage terms


Definition files were eventually merged into the ontology files, so that the ontologies were maintained in two files using the OBO flat file format:

po_anatomy.obo -- Plant structure related terms, relationships, and definition

po_temporal.obo -- Growth and development stage terms, relationships, and definitions

Association files

Collaborating databases and projects provide the POC project a tab delimited file, known informally as an "association file". This file carries links between database objects and PO terms. The database object may represent a gene, transcript, protein, protein_structure, complex, germplasm (stock/cultivar), mutant, QTL, etc. Please refer to the Annotation_File_Format page for guidelines on how to prepare annotation files.

Annotation/association files are currently being contributed by Gramene, MaizeGDB, NASC, SGN, and TAIR.

Needs to be updated

Plant Ontology Editing Guidelines

In addition to the OBO Foundry principles, the POC has established internal guidelines, to ensure consistent editing practices, as described below.

Ontology file syntax

As specified in the OBO Foundry principles, ontologies must be expressed in either the OBO syntax, extensions of this syntax, or OWL. All editing work and the live version of the PO are in the OBO format, but both an OBO version and an OWL version are provided with each release.

Since the January 2011 release, the Plant Ontology is provided as a single ontology file, which contains both the Plant Anatomical Entity branch and the Plant Structure Development Stage branch.

The live version of the Plant Ontology can be downloaded from the POC Github site.

Unique identifiers

As specified in the OBO Foundry principles, the PO must possess a unique identifier space within the OBO Foundry, so that the source of a term (i.e. class) can be immediately identified by the prefix of its identifier. The prefix for Plant Ontology terms is PO, with the syntax PO:nnnnnnn, where nnnnnnn is a zero-padded unique integer of seven digits. To ensure database integrity, unique identifiers are never removed after they have been published in a live release of the ontology. Instead, terms that are retired from the ontology are moved into the obsolete category (see below).
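The PO:nnnnnnn syntax is easy to check or generate in code. The helper names in this small Python sketch are illustrative only.

```python
import re

# "PO:" followed by exactly seven ASCII digits.
PO_ID_PATTERN = re.compile(r"PO:[0-9]{7}")


def is_valid_po_id(identifier):
    """Return True if the string has the form PO:nnnnnnn."""
    return PO_ID_PATTERN.fullmatch(identifier) is not None


def format_po_id(number):
    """Zero-pad an integer to seven digits and add the PO prefix."""
    if not 0 <= number <= 9999999:
        raise ValueError("PO identifiers hold at most seven digits")
    return f"PO:{number:07d}"


print(format_po_id(25))          # PO:0000025
print(is_valid_po_id("PO:25"))   # False
```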

To ensure that the same identifier is not used twice by different editors, each contributing group within the POC has been given non-overlapping ranges of numbers (see the Accession_IDS_Guide). These ranges automatically act as internal identifiers for the group that submitted the term. (note: these need to be updated/revised)

A term can have multiple IDs (one primary ID and one or more alternate IDs). Alternate IDs are created whenever two terms are merged internally or when terms have been added to the PO by merging external ontologies (e.g., TAIR and Gramene IDs).

Adding terms

New terms can often be added to the tips of the ontology (leaf terms) without disturbing the structure of the graph. For example, creating a new specific instance of a more general term will not disturb sibling or cousin terms. However, introducing a new root term or a term into the middle of the graph can disturb existing parent/children relationships, and needs to be performed with more care.

All new terms that are added to the PO should be posted as a request on our Github Issues tracker. This provides an opportunity for all curators and collaborators to comment on the proposed term and its definition and relations. The tracker item also provides a record of the decisions that were made about a term, when those decisions were made, and why.

Merging terms

Merging and splitting of terms that have descendants can have broad ripple effects. Such changes will need to be approved by a consensus prior to committing them. Any proposed merges or splits should be posted on the Github Issue tracker.

Two or more terms should be merged when the curators determine that they are describing the same class of objects. This often occurs when two ontologies are merged, and there is some overlap of classes between the ontologies. It can also occur if a decision is made only to include a more general class rather than several specific classes. For example, achene, berry and capsule were merged into the term fruit. Finally, merging may be necessary if the curators determine that two classes are redundant, such as when meristemoid was merged with initial cell.

Merging involves a target term and a source term (or terms). When the two terms are merged, the ID of the target term becomes the primary ID while the ID of the source term becomes a secondary ID. The term name of the target term remains as the primary name while the term name of the source term becomes an exact synonym. Definitions and comments are merged, and the curators must edit them to ensure that they are appropriate for the new merged term.
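The bookkeeping of a merge can be sketched in Python as follows. The dictionary layout, the function name merge_terms, and the IDs are assumptions made for illustration. In practice the merge is done in the ontology editor, and the combined definition and comment still need to be edited by a curator.

```python
def merge_terms(target, source):
    """Merge source into target and return the merged term as a new dict."""
    merged = dict(target)  # the target keeps its primary ID and primary name
    # The source ID becomes a secondary (alternate) ID.
    merged["alt_ids"] = list(target.get("alt_ids", [])) + [source["id"]]
    # The source name becomes an exact synonym.
    merged["exact_synonyms"] = list(target.get("exact_synonyms", [])) + [source["name"]]
    # Definitions and comments are concatenated for later curator editing.
    for field in ("definition", "comment"):
        texts = [term[field] for term in (target, source) if term.get(field)]
        merged[field] = " ".join(texts)
    return merged


# Placeholder IDs and definitions, not taken from the PO.
fruit = {"id": "PO:9999901", "name": "fruit", "definition": "Target definition."}
achene = {"id": "PO:9999902", "name": "achene", "definition": "Source definition."}
merged = merge_terms(fruit, achene)
print(merged["id"], merged["alt_ids"], merged["exact_synonyms"])
```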

Term names

The POC should follow OBO Foundry naming conventions.

In general, the primary name for a term is the most commonly used name, with alternative names listed as exact synonyms. When the commonly used name is ambiguous, it is important to have an unambiguous name as an exact synonym. For example, it may not be clear to non-experts or automated reasoners that "epidermal initial" is a type of cell, so it has the exact synonym "epidermal initial cell".

For more information on synonyms, see below under "Synonyms".

The names for some PO terms may appear artificial, such as "collective leaf structure". These artificial names are used for classes in which many of the instances have common names (in this case, whorl or rosette), but there is no one common name that describes all instances.

taxon-specific terms

All term names in the PO should be taxon neutral. The POC, following the GOC, has eliminated the use of the word sensu in term names. Sensu was originally brought in to the ontology to disambiguate clashes between terms that have different meanings to different communities. For example, the term capsule is used to describe a type of angiosperm fruit, but is also used to describe the sporangium in mosses. Rather than term names like "capsule sensu moss" and "capsule sensu Angiosperma," term names should reflect the differences in characteristics between the two groups. In this example, "sporangium capsule" and "fruit capsule" would be preferable names.

In the fall of 2010, all uses of the word sensu were removed from PO term names and replaced by the taxon name plus the term name. For example, "integument sensu Zea" has been replaced by "Zea integument." In the fall of 2011, all of the existing taxon specific terms were renamed or merged with other terms. Classes that were not distinguishable from their parent class, other than by the taxon in which they occur (such as Zea integument), were merged with the parent. Classes that have unique characteristics (such as Zea ear), were given new names that do not include the taxon name (such as ear inflorescence). See Eliminating_Zea/Poaceae_terms_from_PO for more details.

Plant tissue names

For clarity, the different plant tissue classes in the PO should all be preceded by the prefix "portion of," as in "portion of plant tissue" or "portion of ground tissue." Without the prefix "portion of," the name ground tissue could be referring to all of the ground tissue in every plant in the world, all of the ground tissue in an individual plant, or any bit of ground tissue in any individual plant. However, many of the curators felt that having the portion of prefix in all of the plant tissue names was too user-unfriendly. It was decided to use the portion of prefix only for the top few levels (e.g. portion of plant tissue and its direct descendants). Furthermore, if a tissue type has a common name that does not have the word tissue in the name (e.g., epidermis), then the portion of prefix is not used. Regardless of the names, the definitions of different tissue types should specify whether they refer to a portion of plant tissue (any bit of that tissue type in any individual plant), or the maximal portion of tissue in a plant structure or in a whole plant.

Term definitions

As specified in the OBO Foundry principles, each term in the PO must have a textual definition, and terms should be defined so that their precise meaning within the context of a particular ontology is clear to a human reader.

As of the March 2012 release, we will include the POids on the first usage of any PO terms used in definitions and comments.

Whenever possible, internationally accepted nomenclature and definitions obtained from standard reference works, journal articles, and other published sources are used, although they may be modified to fit the genus-differentia form. In cases when a published definition is unavailable, or when published definitions disagree with each other, definitions will be written by the curators. This is often the case for upper-level ontology terms, since many published definitions are written with specific taxa in mind, while PO definitions must be appropriate for all plant taxa to which a term can apply.

All definitions must have a reference that indicates the authority for the term. References are typically textbook or journal article citations. For uniformity, the PO uses citation database identifiers, such as PubMed IDs and ISBN numbers, as DBXrefs. If a definition is written by a curator (as is often the case with lower-level terms that have simple genus-differentia definitions referring to other PO terms), then the definition DBXref should have the form Database:curator initials. For example: POC:rw or TAIR:tb. Definitions that are written collectively by the POC curators are identified as POC:curators.

Images or diagrams can be helpful for supplementing text definitions, in cases where words cannot adequately describe anatomic or developmental relationships. Image files cannot be inserted into the OBO file, but links to reference images on the internet may be included. In order to ensure that the images remain available between live releases, curators should only provide links to websites that are approved by the group and that have stable URLs for their images.

Use of other ontology id's in definitions

As a general guideline, whenever we use a definition from another ontology for a term, but alter it to fit PO, we should put the other ontology ID plus POC:curators into the definition dbxrefs. This gives credit to the source of the definition.

For example, instead of: plant structure; def'n: An anatomical structure (CARO:0000003) that is or was part of a plant, or was derived from a part of a plant. [source: POC:curators]

use: plant structure; def'n: An anatomical structure that is or was part of a plant, or was derived from a part of a plant. [source: POC:curators; CARO:0000003]
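In the OBO flat file, a definition and its dbxrefs share one def line, with the dbxrefs in a bracketed, comma-separated list. The following Python sketch builds such a line for the example above. The function name is illustrative, and it assumes the standard OBO 1.2 escaping of quotes and backslashes.

```python
def definition_line(text, dbxrefs):
    """Build an OBO 'def:' line from definition text and a list of dbxrefs."""
    if not dbxrefs:
        raise ValueError("every definition needs at least one dbxref")
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return 'def: "%s" [%s]' % (escaped, ", ".join(dbxrefs))


print(definition_line(
    "An anatomical structure that is or was part of a plant, "
    "or was derived from a part of a plant.",
    ["POC:curators", "CARO:0000003"],
))
```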

Note: We should confirm that the browser will automatically create hyperlinks to databases in the GO database registry, and request that GO register databases that are not there (such as CARO).


As of release #17 (April 2012), we have begun including the IDs of PO and GO terms used in PO definitions and comments. This allows users to link directly to those terms from the AmiGO browser and provides additional quality control.

An id is listed in parentheses after the first mention of a term in a definition or comment. All new definitions have ids in them, and ids are being added to older definitions as part of an ongoing process.

Renaming and redefining terms

Since the PO is still actively under development, it is often necessary to make changes to existing terms. This can happen if the curators feel that a term name is not clear, or is not the most commonly used name, or if a definition is found to be incorrect, or if two or more terms are redundant with each other. Curators should use the following guidelines:

Renaming: A change in the primary name of a term should be approved by the group, and a tracker item posted on Github. If the name is changed from a less-common to a more-common name, then the term's ID should stay the same, and the original name should be added as a synonym. If the name is changed because the original name was incorrect for the definition, then the term should be obsoleted, and replaced by a new term with the correct name and definition.

Redefining: Any substantive change in the definition of a term, and any change in the definition of an upper-level term (generally 4 or fewer layers into the ontology) should be approved by the group, and a tracker item posted on Github.

There is an ongoing effort to convert all text definitions in the PO to the genus-differentia form (see Term definitions, above). If a new genus-differentia definition is for a lower-level ontology term and does not change the meaning of the original definition but only uses the new wording format, then a curator may change the definition without posting a tracker item. For example, a change to the definition of epidermis (a term with three levels of ancestors) should be posted on Github, while converting all of the definitions of the different types of epidermis (shoot epidermis, root epidermis, leaf epidermis, etc.) to genus-differentia form does not need a Github Issues tracker item. If the conversion to genus-differentia form changes the meaning of the definition, then a tracker item should be posted.

If the original definition for a term was wrong, and changes to a definition substantially change its meaning, such that a comparison of the original definition and the new definition suggests that they are referring to two different classes, then the original term should be made obsolete, and replaced by a term with the same name but correct definition (see also OBO Foundry principles). Having a live term and an obsolete term with the same name will prevent the ontology from loading onto the web browser, so the original, obsolete term should be renamed "obsolete term name", where "term name" is the original name.

Synonyms

A synonym indicates an alternative name for a term. Terms can have multiple synonyms.

The scope of a synonym may fall into one of four categories:

exact: The definition of the synonym is exactly the same as primary term definition. This is used when the same plant structure can have more than one name. e.g. anther wall has exact synonym ‘pollen sac wall’.

Additionally, translations into other languages are listed as exact synonyms. The PO lists both Spanish and Japanese translations as exact synonyms; e.g. anther wall has exact synonym ‘pared de la antera’ (Spanish) and ‘葯壁’ (Japanese).

narrow: The definition of the synonym is the same as the primary definition, but has additional qualifiers. For example, pod is a narrow synonym of fruit. The definition of fruit accurately describes a pod, but a pod has additional characteristics that not all fruits share. Also used to supply species-specific names so that users such as plant breeders can relate their terms to the PO hierarchy, e.g. subterranean tuber axillary vegetative bud has narrow synonym ‘potato eye’.

broad: The primary definition accurately describes the synonym, but the definition of the synonym may encompass other structures (PO classes) as well. In most cases where a broad synonym is given, it will be a broad synonym for more than one PO term.

For example, 'adventitious root' is a broad synonym of both basal root and shoot-borne root, because the definition of adventitious root can encompass both basal and shoot-borne roots. Also, both awn and trichome have the broad synonym ‘bristle’.

related: This scope is applied when a word or phrase has been used synonymously with the primary term name in the literature, but the usage is not strictly correct. That is, the synonym in fact has a slightly different meaning than the primary term name.

Since users may not be aware that the synonym was being used incorrectly when searching for a term, related synonyms are included. For example, 'carpel septum' is a related synonym of ovary septum and phellem has related synonym ‘cork’.


Please note: The software used to create and edit the PO uses related as the default synonym type. Therefore, many of the existing PO synonyms are listed as related, even though they should be exact or narrow. Work is ongoing to convert inappropriate related synonyms to the correct scope.


Synonyms can also be classified by types. The default is no type. Three new types of synonyms are under development in the PO: Spanish language synonyms, Japanese language synonyms, and plural synonyms. Within each type, synonyms can be defined as exact, narrow, broad, or related. Spanish, Japanese and plural synonyms are specified as exact synonyms.

Whenever possible, database cross-references (dbxrefs) for synonyms should be provided, to indicate the publication that used the synonym. However, dbxrefs for synonyms are not mandatory. For foreign language synonyms, the dbxref will usually be the database that provided the synonym followed by the curator's initials.
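In the OBO 1.2 flat file, the scope, optional type, and dbxrefs of a synonym all appear on one synonym line. The following Python sketch formats such a line. The function name is illustrative, and the type name "Spanish" and the dbxref "POC:xx" in the usage note are placeholders, since the actual type identifiers used in the PO file are not given here.

```python
SCOPES = {"EXACT", "NARROW", "BROAD", "RELATED"}


def synonym_line(text, scope="RELATED", syn_type=None, dbxrefs=()):
    """Build an OBO 'synonym:' line.

    The default scope mirrors the editing software, which uses
    related unless the curator chooses otherwise.
    """
    scope = scope.upper()
    if scope not in SCOPES:
        raise ValueError("scope must be exact, narrow, broad, or related")
    parts = ['synonym: "%s"' % text.replace('"', '\\"'), scope]
    if syn_type:
        parts.append(syn_type)  # optional synonym type name
    parts.append("[" + ", ".join(dbxrefs) + "]")  # dbxrefs are not mandatory
    return " ".join(parts)


print(synonym_line("pollen sac wall", "exact"))
```

A typed synonym with a dbxref would be written as synonym_line("pared de la antera", "exact", syn_type="Spanish", dbxrefs=["POC:xx"]).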

Relations among PO classes

Under the standard Plant Ontology data structure, terms are allowed to have a limited number of relations. Relations are formally defined in the OBO relation ontology.

The PO currently has the following term-to-term relation types:

  • is_a
  • part_of
  • has_part
  • derives_by_manipulation_from
  • develops_from
  • adjacent_to
  • participates_in
  • has_participant

For more details about relations, including formal definitions, see Relations in the Plant Ontology.

Icons for Relations

When a new relation is added to the Ontology, an icon should be developed for it and it should be stored on the GitHub site.

Associations and relations

For is_a and part_of relations, associations for a child term are passed up to the parent term. For example, annotations associated with megasporophyll are also associated with sporophyll and phyllome, and annotations associated with ectocarp are also associated with pericarp and fruit.

For has_part, develops_from, derives_by_manipulation_from, and adjacent_to relations, associations should not be passed from parent to child.

For example, the genes expressed in an inflorescence may or may not be expressed in flowers (inflorescence has_part flower), so the annotations associated with inflorescence should not be passed up to flower.

Likewise, genes expressed in a root hair cell may or may not be expressed in a trichoblast (root hair cell develops_from trichoblast), so the annotations associated with root hair cell should not be passed up to trichoblast.

Genes expressed in an anther wall endothecium may not be expressed in an anther wall exothecium (anther wall endothecium adjacent_to anther wall exothecium), so annotations should not move from anther wall endothecium to anther wall exothecium.
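The propagation rule can be sketched in Python as follows. This is an illustration only, not the code used by the browser. The function name and data layout are hypothetical, and the relation types used for ectocarp, pericarp, and fruit in the usage note are assumed for the sake of the example.

```python
# Only these relations carry annotations from child to parent.
PROPAGATING = {"is_a", "part_of"}


def propagate_annotations(annotations, relations):
    """Return term -> set of annotated objects, after propagation.

    annotations: dict mapping a term name to a set of annotated objects.
    relations: list of (child, relation, parent) triples.
    """
    parents = {}
    for child, relation, parent in relations:
        if relation in PROPAGATING:
            parents.setdefault(child, set()).add(parent)
    result = {term: set(objs) for term, objs in annotations.items()}
    for term, objs in annotations.items():
        stack = list(parents.get(term, ()))
        seen = set()
        while stack:  # walk every ancestor reachable via is_a / part_of
            ancestor = stack.pop()
            if ancestor in seen:
                continue
            seen.add(ancestor)
            result.setdefault(ancestor, set()).update(objs)
            stack.extend(parents.get(ancestor, ()))
    return result
```

Given the triples ("ectocarp", "part_of", "pericarp"), ("pericarp", "part_of", "fruit"), and ("inflorescence", "has_part", "flower"), an object annotated to ectocarp would also be reported for pericarp and fruit, while an object annotated to inflorescence would not be reported for flower.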

References (Dbxrefs)

There may be multiple types of references, or database cross-references (dbxrefs), associated with a PO term.

A definition dbxref is required for any term that has a definition (see above under "Term definitions"). This indicates the authority for the term and should be a textbook, a journal article, or the database and initials of the curator. Definition dbxrefs should have the form database:identifier. Database is either the abbreviation for an external database such as PMID or ISBN (see DBXrefs) or the abbreviation for the ontology that defined the term, such as the Plant Ontology Consortium (POC), The Arabidopsis Information Resource (TAIR), Gramene (GR), or the Common Anatomy Reference Ontology (CARO). The identifier will be a unique number in the case of an external database (e.g., PubMed) or ontology (e.g. CARO), or will be the curator's initials in the case of POC, TAIR, or GR.

A synonym dbxref is the same as a definition dbxref, except that it describes the publication that used the synonym.

A term may have an Xref that is not associated with a definition or synonym. This may either refer to an external ontology that has a similar term (and will take the form of the unique identifier for that term in the external ontology), or it may be a link to an image in an external database (including a URL). Other types of xrefs are also possible, such as references to the Github issue tracker items that discuss terms (OBO_PO_SF).

Obsoleting and destroying terms

Destroying a term eliminates the term's ID from the ontology permanently, making it possible for that ID to be used again by another term. Obsoleting a term eliminates the term from the live version of the ontology, but does not remove the term from the flat file or destroy the term's ID. Therefore, an obsolete ID can never be used again for another term. Destroyed terms do not appear in the ontology flat file, but obsolete terms do appear in the flat file, with an IS_OBSOLETE tag.

Once an ID has been used in a live release version of the PO, it should never be destroyed. Destroying a term should only be done when a term has only been used in the editors' version of the ontology file. Because destroying a term makes its ID available for another term, outside users should never incorporate the editor's version of the PO file into an application, as this could lead to incorrect use of IDs.

Note that the name "deprecate" is used instead of "obsolete" by the ontology editing software Protégé.

when to obsolete a term

A term should be made obsolete if the curators determine the definition of the term is incorrect for the existing name or when redefining a term results in a substantial change in the meaning of the definition (see above under "Renaming and redefining terms" and the relevant OBO Foundry principles).

If a term is renamed, and the original name is not an exact synonym for the new name, the old term should instead be made obsolete and replaced by a new term with the new name. Another common reason for obsoleting a term is when it falls outside the domain of the PO and should be moved to another ontology (e.g., the PO originally had terms for sub-cellular components that have now been made obsolete and moved to GO).

treatment of obsolete terms

Whenever a term is made obsolete, the word "OBSOLETE." (all caps, followed by period) should be added to the beginning of the obsolete term's definition. In addition, a comment should be added to the term, explaining why it was made obsolete. The comment should include the name(s) and id(s) of the replacement or consider term(s) (see below). If the term was moved to another ontology, the name and ID for that ontology should be provided in the comment.

Note that edits to the definition and comment should be made before the term is obsoleted, because obsolete terms cannot be edited in Obo Edit (they can be edited in the text file, if necessary).

Whenever possible, a replaced_by or consider relation should be added to obsolete terms. This allows users to automatically locate the new term that may take the place of the obsolete term.

Replaced_by should be used only when there is a one-to-one correspondence between the obsolete term and the replacement term, that is, the new term can be used for every instance of the old term. Even though replaced_by can be used to automatically move annotations from an obsolete term to another term, annotators are still responsible for checking that the new term is appropriate. Consider is used when there is more than one possible replacement for the obsolete term, or when there is only one replacement, but that replacement may not be appropriate in every case.
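The steps for treating an obsolete term can be summarized in a short Python sketch. The dictionary layout, the function name obsolete_term, and the IDs in the usage note are hypothetical, and in practice these edits are made in the ontology editor before the term is obsoleted.

```python
def obsolete_term(term, reason, replaced_by=None, consider=(), rename=False):
    """Return an obsoleted copy of a term dict.

    reason should name the replacement or consider term(s) and their IDs.
    rename=True applies the "obsolete term name" convention, needed when
    a live term will reuse the original name.
    """
    obsolete = dict(term)
    if rename:
        obsolete["name"] = "obsolete " + term["name"]
    obsolete["definition"] = "OBSOLETE. " + term["definition"]
    obsolete["comment"] = reason
    obsolete["is_obsolete"] = True
    if replaced_by:
        # Only for a one-to-one replacement.
        obsolete["replaced_by"] = replaced_by
    if consider:
        # For several candidates, or a replacement that may not always fit.
        obsolete["consider"] = list(consider)
    return obsolete
```

For example, obsolete_term(term, "Moved to another ontology; consider PO:9999906.", consider=["PO:9999906"]) leaves the ID untouched, so the obsolete ID can never be reused for another term.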

Intersection_of relations and cross-product definitions

In addition to standard relation tags (is_a, part_of, etc.), terms in the PO may have intersection_of tags. These are also referred to as cross-product definitions or logical computable definitions.

Intersection_of relations differ from regular relations in that regular relation tags specify necessary conditions, while intersection_of tags specify necessary and sufficient conditions.

More information can be found at the GO wiki page for logical definitions.

It is not necessary to redundantly assert intersection_of relations, but not doing so can cause confusion in OboEdit. For example, if leaf sinus has the relations "intersection_of is_a sinus" and "intersection_of part_of leaf", you do not need to add the relations "leaf sinus is_a sinus" and "leaf sinus part_of leaf". However, if you do not, then the terms will not display properly in OboEdit unless the reasoner is on. For the PO, we always assert the is_a parent, even when there is an intersection_of relation. In most cases, we have also been asserting the second relation (the part_of relation in this example). This was necessary, because OboEdit did not add that second relationship when the "Assert implied relations" command was used. We are now using Oort to assert implied relations, and Oort does add those second relationships, so it is no longer technically necessary to assert those relations by hand. Nonetheless, not having them can cause confusion during editing, since there are still some issues with keeping the reasoner on all the time in OboEdit. In general, we have been asserting the second relation whenever the intersection_of tags are used to define the primary classification of a term, and not asserting the second relation whenever the intersection_of tags are used for secondary classifications.

Quality control and consistency checking

Editors use the Rule Based Reasoner built into OBOEdit on a regular basis to check for redundancies. As the use of cross-product definitions is expanded, the reasoner will also be used to check for implied relationships that perhaps should be asserted and to look for logical inconsistencies.

An important quality control step is to identify any cycles in the ontology file, because they will prevent the file from loading on AmiGO and some other applications. Often these cycles arise through the use of relations that are biologically and logically correct, but cannot be handled by some reasoners. For example, we can correctly assert that "tracheary element part_of xylem" and "xylem has_part tracheary element". However, most reasoners do not understand the difference between part_of and has_part, and see these two relations as a cycle. The ontology file should be checked for cycles when the reasoner is on, because some cycles only show up through implied relations.
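The way such a cycle appears to a tool that ignores relation types can be sketched in Python. This is a generic depth-first search, not the check performed by OboEdit or AmiGO. The function name is illustrative, and only asserted relations passed in as triples are examined, so implied relations would have to be added to the input first.

```python
def find_cycle(edges):
    """Return a list of terms forming a cycle, or None if there is none.

    edges: list of (subject, relation, object) triples. The relation type
    is deliberately ignored, which is how a part_of / has_part pair ends
    up looking like a cycle.
    """
    graph = {}
    for subject, _relation, obj in edges:
        graph.setdefault(subject, []).append(obj)
        graph.setdefault(obj, [])
    unvisited, in_progress, done = 0, 1, 2
    state = {node: unvisited for node in graph}
    path = []

    def visit(node):
        state[node] = in_progress
        path.append(node)
        for nxt in graph[node]:
            if state[nxt] == in_progress:
                # Reached a term that is already on the current path.
                return path[path.index(nxt):] + [nxt]
            if state[nxt] == unvisited:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        state[node] = done
        return None

    for node in graph:
        if state[node] == unvisited:
            found = visit(node)
            if found:
                return found
    return None


print(find_cycle([
    ("tracheary element", "part_of", "xylem"),
    ("xylem", "has_part", "tracheary element"),
]))
```

The recursive search is adequate for a sketch, but a very deep ontology graph could exceed Python's default recursion limit, in which case an iterative version would be needed.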

Use of Subsets

details to be added...

The ontology file can be filtered by subset using Oort.

Committing changes

  • Always do a git pull before editing, to make sure you are using the current version of the file.
  • As soon as you are finished editing, commit to Github. Provide a brief description of the changes you made. Details of changes (text of new definitions, reference, etc.) should go on the Github issue tracker.
  • Record any new terms, obsolete terms, or term name changes on the appropriate "New and Obsolete terms" wiki page for the upcoming release. If you later destroy a new term, or change its name, be sure to update this list. Maintain the list in numerical order, so it is easy to check for completeness.
  • Record any other changes in the appropriate "Summary of changes page" for the upcoming release.

Other resources

http://obi-ontology.org/page/OBIDeprecationPolicy