Difference between revisions of "PO Developers Guide"
Revision as of 16:56, 4 February 2011
Plant Ontology Developers Guide
This page is under re-construction
Plant Ontology Files
Our curation and management model is based on a tested protocol established by the Gene Ontology Consortium.
PO development work was originally based around four separate text files in flat file format:
anatomy.ontology -- Plant structure related terms and relationships
anatomy.defs -- Definitions of Plant structure related terms
development.ontology -- Growth and development stage terms and relationships
development.defs -- Definitions of growth and development stage terms
Definition files were eventually merged into the ontology files, so that the ontologies were maintained in two files using the OBO flat file format:
po_anatomy.obo -- Plant structure related terms, relationships, and definitions
po_temporal.obo -- Growth and development stage terms, relationships, and definitions
In January 2011, the Plant Structure Ontology (PSO) and the Plant Growth and Development Stage Ontology (PGDSO) were merged into one file:
plant_ontology.obo -- includes plant structure related terms, relationships, and definitions as well as plant growth and development stage terms, relationships, and definitions
This merged file allows users to download the entire ontology as a single file, and allows editors to create relations between terms in the two branches of the PO. The live version of this file is available through our SVN repository at http://palea.cgrb.oregonstate.edu/viewsvn/Poc/tags/live/plant_ontology.obo?view=co . For the immediate future, the POC will maintain separate versions of the po_anatomy.obo and po_temporal.obo files at our SVN site. These match the live version of plant_ontology.obo, except that they lack the links between the two branches. However, users are encouraged to switch to the merged file as soon as possible.
The editors' version of the plant_ontology.obo file is maintained at the SVN repository at http://palea.cgrb.oregonstate.edu/viewsvn/Poc/trunk/ontology/OBO_format/plant_ontology.obo?view=log . Users should be extremely cautious about accessing this version of the file, as it is under regular revision, and changes made to this file may not be incorporated into live releases.
The SVN system allows users and editors to track changes to POC documents, lets members work on the same documents concurrently, and allows differences between curators' versions of the documents to be merged reliably.
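Because the merged file holds both branches, a script that reads it may need to tell the branches apart. The following is a minimal sketch of reading `[Term]` stanzas from OBO flat file text and grouping term IDs by their `namespace` tag. The sample text, the placeholder IDs, and the namespace values `structure_branch` and `stage_branch` are assumptions made for illustration; check the actual values in plant_ontology.obo before relying on them. The parser is deliberately simple and does not handle escapes, trailing `!` comments, or other details of the OBO format.

```python
from collections import defaultdict

# Illustrative OBO text. IDs, names, and namespace values are placeholders,
# not copied from the real plant_ontology.obo file.
SAMPLE_OBO = """\
format-version: 1.2

[Term]
id: PO:0000001
name: example structure term
namespace: structure_branch

[Term]
id: PO:0000002
name: example stage term
namespace: stage_branch

[Typedef]
id: part_of
name: part of
"""


def parse_term_stanzas(text):
    """Return one dict per [Term] stanza, mapping each tag to a list of values."""
    terms = []
    current = None  # None while in the header or in a non-Term stanza
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are collected.
            current = defaultdict(list) if line == "[Term]" else None
            if current is not None:
                terms.append(current)
        elif current is not None and ": " in line:
            tag, value = line.split(": ", 1)
            current[tag].append(value)
    return terms


def ids_by_namespace(terms):
    """Group term IDs by the first namespace value of each term."""
    grouped = defaultdict(list)
    for term in terms:
        namespace = term.get("namespace", ["(none)"])[0]
        grouped[namespace].append(term["id"][0])
    return dict(grouped)


if __name__ == "__main__":
    print(ids_by_namespace(parse_term_stanzas(SAMPLE_OBO)))
```

For real work, an established OBO parser is likely a better choice than a hand-written one; the sketch is only meant to show how the stanzas in a single merged file can be separated by branch.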
In order to update the ontologies, curators use OBO-Edit. This open source Java application facilitates editing and implements the rules and constraints needed to maintain internal consistency in the ontology.
Plant Ontology Guidelines
Because the PO is an OBO Foundry candidate ontology, PO curators are guided by OBO Foundry principles. In addition, the POC has established internal guidelines to ensure consistent practices.
Unique identifiers
As an OBO Foundry ontology, the PO must possess a unique identifier space within the OBO Foundry.
The source of a term (i.e. class) from any ontology can be immediately identified by the prefix of the identifier of each term. It is, therefore, important that this prefix be unique. For more details see the ID policy.
Every term in the Plant Ontology is identified by a [/docs/numbers/number.html unique identifier]. Under the conventions that we have already established, the syntax of a PO identifier is PO:nnnnnnn, where nnnnnnn is a unique integer zero-padded to seven digits. To ensure database integrity, [/docs/numbers/number.html unique identifiers] are never removed. Terms that are retired from the ontology are moved into the obsolete category. To ensure that the same identifier is not used twice, each participating group will be given a non-overlapping [/docs/numbers/number.html ID] range. These ranges will automatically act as (internal) identifiers for the group that submitted the term.
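The identifier conventions above can be checked mechanically. The sketch below formats and validates identifiers of the form PO:nnnnnnn and verifies that a set of ID ranges does not overlap. The group names and range boundaries are hypothetical and chosen only for illustration; they are not the POC's actual allocations.

```python
import re

# "PO:" followed by exactly seven digits, as described in the text.
PO_ID_PATTERN = re.compile(r"PO:\d{7}")

# Hypothetical allocations: names and boundaries are illustrative only.
ID_RANGES = {
    "group_a": range(1, 10_000),       # 1 .. 9999
    "group_b": range(10_000, 20_000),  # 10000 .. 19999
}


def format_po_id(number):
    """Render an integer as a zero-padded, seven-digit PO identifier."""
    if not 0 <= number <= 9_999_999:
        raise ValueError(f"{number} does not fit in seven digits")
    return f"PO:{number:07d}"


def is_valid_po_id(text):
    """True if text matches the PO:nnnnnnn syntax exactly."""
    return PO_ID_PATTERN.fullmatch(text) is not None


def ranges_overlap(a, b):
    """True if two non-empty, step-1 ranges share at least one integer."""
    return a.start < b.stop and b.start < a.stop


def find_overlaps(ranges):
    """Return every pair of group names whose ID ranges overlap."""
    names = sorted(ranges)
    return [
        (first, second)
        for i, first in enumerate(names)
        for second in names[i + 1:]
        if ranges_overlap(ranges[first], ranges[second])
    ]


def owner_of(po_id, ranges):
    """Return the group whose range contains po_id, or None if unallocated."""
    if not is_valid_po_id(po_id):
        raise ValueError(f"not a PO identifier: {po_id!r}")
    number = int(po_id[3:])
    for name, id_range in ranges.items():
        if number in id_range:
            return name
    return None


if __name__ == "__main__":
    print(format_po_id(42))
    print(find_overlaps(ID_RANGES))
    print(owner_of("PO:0010000", ID_RANGES))
```

Because the ranges do not overlap, the numeric part of an identifier is enough to recover which group submitted the term, which is the sense in which the ranges act as internal group identifiers.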
Adding terms
New terms can often be added to the tips of the ontology (leaf terms) without disturbing the structure of the graph. For example, creating a new specific instance of a more general term will not disturb sibling or cousin terms. However, introducing a new root term or a term into the middle of the graph, and thereby disturbing existing parent/children relationships, needs to be performed with more care.
All new terms that are added to the PO should be posted as a request on our SourceForge tracker. This provides an opportunity for all curators and collaborators to comment on the proposed term and its definition and relations, and provides a record of what decisions were made about a term and why.
Merging and splitting terms
Merges and splits of terms that have descendants can have broad ripple effects. Such changes must be approved by consensus before they are committed. Any proposed merges or splits should be posted on the SourceForge tracker.
Term definitions
All terms in the PO will be associated with a human-readable definition that concisely describes the meaning and context of the term. Whenever possible, we will use internationally accepted nomenclature and definitions obtained from standard reference works, journal articles, and other sources. When a published definition is unavailable, the definition will be written by the scientific curators.
We will allow references to images/diagrams to be inserted into term definitions, thereby supplementing text definitions in cases where words cannot adequately describe anatomic or developmental relationships. Diagrams can be maintained in a standard size and format in a directory under CVS control.
Term relationships
Under the standard Plant Ontology data structure, terms are allowed to have a limited number of [/docs/otherdocs/poc_file.html parent/child relationships]. We currently use the following three term-to-term relationship types in the PO:
instance of: This relationship is used in both the developmental and plant structure ontologies to indicate the relationship between a specific term and a more general one. For example, achenium is an instance-of a dry indehiscent fruit, which in turn is an instance-of fruit.
[relationship type - instance of]
PO id: is_a
name: instance of
definition: The 'instance of' relationship means that the term is a subclass of its parent. It should not be confused with an 'instance', meaning a specific example.
part of: This relationship is used in the plant structure ontology to indicate a subpart/part relationship within a tissue or organ. For example, ectocarp is part-of pericarp, which in turn is part-of fruit.
[relationship type - part of]
PO id: part_of
name: part of
definition: It indicates a subpart/part relationship within a tissue or organ. It is used in a non-restrictive manner, i.e., the parent may or may not have the child as a part, and the child may or may not be a part of the parent. In other words, a child term is sometimes, but not necessarily always, part of its parent term.
develops from: This relationship is used in the plant structure ontology to indicate that a tissue/organ/cell type develops from its parent term. For example, a trichome develops from a trichoblast.
[relationship type - develops from]
PO id: develops_from
name: develops from
definition: It indicates that a cell/tissue/organ develops from its parent term. It covers both a direct 'develops from' relationship and a more indirect 'derived from' relationship.
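The three relationship types listed above can be pictured as labeled edges from a child term to its parent. The following is a minimal sketch of such a structure, populated with the examples given in the text. The `TermGraph` class and its methods are hypothetical names introduced for illustration, and terms are keyed by name rather than by PO identifier to avoid inventing IDs.

```python
from collections import defaultdict

# The relationship IDs listed in the guide.
RELATIONSHIP_TYPES = {"is_a", "part_of", "develops_from"}


class TermGraph:
    """Child-to-parent edges, each labeled with a relationship type."""

    def __init__(self):
        self._parents = defaultdict(list)  # child -> [(relation, parent), ...]

    def add(self, child, relation, parent):
        """Record that child stands in the given relation to parent."""
        if relation not in RELATIONSHIP_TYPES:
            raise ValueError(f"unknown relationship type: {relation!r}")
        self._parents[child].append((relation, parent))

    def parents(self, child, relation=None):
        """Parents of child, optionally restricted to one relationship type."""
        return [
            parent
            for rel, parent in self._parents.get(child, [])
            if relation is None or rel == relation
        ]


# Examples taken from the relationship descriptions above.
graph = TermGraph()
graph.add("achenium", "is_a", "dry indehiscent fruit")
graph.add("dry indehiscent fruit", "is_a", "fruit")
graph.add("ectocarp", "part_of", "pericarp")
graph.add("pericarp", "part_of", "fruit")
graph.add("trichome", "develops_from", "trichoblast")

if __name__ == "__main__":
    print(graph.parents("achenium"))
    print(graph.parents("pericarp", relation="part_of"))
```

Rejecting unknown relationship types at insertion time mirrors the guideline that only a limited set of term-to-term relationships is in use.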
Other attributes for PO terms
The Plant Ontology data structure allows terms to have attributes other than their name, ID, and definition. The two attributes that we use in the current PO are Synonym, which indicates an alternative name for the term, and Reference, which indicates the authority for the term. References are typically textbook or journal article citations. For uniformity, we use citation database identifiers, such as PubMed IDs and ISBNs, as [/docs/dbxref/PO_DBXref.txt DBXrefs]. An attribute can occur more than once, allowing several synonyms or references to be attached to a term. We will also have external IDs for terms that have been imported from participating databases (e.g. TAIR IDs and Gramene IDs). There is also the sensu qualifier, which is used to disambiguate terms that have different meanings in different communities. A good example is the incomplete flower of Poaceae, which is called a floret. In Compositae, however, the structures called florets are quite different in organization, and it would be a mistake to conflate them. It is therefore best to create two floret terms, one used sensu Poaceae and the other sensu Compositae.
Quality control and consistency checking
As the monocot and dicot ontologies are merged, conflicts and inconsistencies will inevitably arise. We will monitor the developing PO for inconsistencies by applying the true path rule, which insists that semantic coherence is maintained as terms are followed upwards to their ancestors. Equally important, we will subject the PO to continuous testing and evaluation as we use it for practical database curation.
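Whether a path is semantically coherent is a judgment for curators, so the true path rule cannot be fully automated. A script can, however, support the review by listing every ancestor of a term and by flagging cycles, which always indicate a structural error. The sketch below assumes a plain mapping from each term to its direct parents; the `PARENTS` data reuses the part-of example from this guide and the function names are hypothetical.

```python
# Direct parents of each term; illustrative data based on the examples
# in this guide (ectocarp part-of pericarp part-of fruit).
PARENTS = {
    "ectocarp": ["pericarp"],
    "pericarp": ["fruit"],
    "fruit": [],
}


def ancestors(term, parents):
    """Return the set of all terms reachable by following parent links upwards."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def terms_in_cycles(parents):
    """Return terms that are their own ancestor, which should never happen."""
    return sorted(term for term in parents if term in ancestors(term, parents))


if __name__ == "__main__":
    # Curators would review each listed ancestor for semantic coherence.
    for term in PARENTS:
        print(term, "->", sorted(ancestors(term, PARENTS)))
    print("cycles:", terms_in_cycles(PARENTS))
```

The ancestor listing gives curators the full upward path to inspect for each term, while the cycle check catches one class of inconsistency without any human judgment.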