Difference between revisions of "PO Project Overview"

From Plant Ontology Wiki

Revision as of 23:53, 25 November 2008

Project Description

The main goal of the project is to provide controlled vocabularies for the following plant-specific knowledge domains:

  • Plant Structure: A controlled vocabulary of plant morphological and anatomical structures, representing organ, tissue and cell types and their relationships. Examples are stamen, gynoecium, petal, parenchyma and guard cell.
  • Growth and developmental stages: A controlled vocabulary of growth and developmental stages in various plants and their relationships. Examples are germination, seedling and flowering.

These controlled vocabularies (arranged in ontologies) will be based on internationally published/accepted terminology and their definitions.

So that the controlled vocabularies can be used as attributes in other ontologies being developed by collaborating databases, e.g. Open Biological Ontology (OBO), the PO controlled vocabularies are arranged in ontologies that support attribution and querying at different levels of granularity. The PO controlled vocabularies will also allow uniform queries to be executed across participating databases, improving the interoperability of plant-based databases.

The members of the Plant Ontology Consortium will facilitate the sharing of information and tools developed by the Consortium. However, each database that uses the Plant Ontology information will be able to choose how it uses that information (i.e. a non-federated approach to utilizing PO information will be practiced).

The PO will be developing an independent database of various plant-based ontologies with associated controlled vocabularies. Terminology associated with the anatomy and morphology of plants tends to be taxon-specific at the species level; consequently, these ontologies will also be species-specific. As the range of plant ontologies grows, it is envisaged that more generic ontologies will be developed that cover groups of taxa, e.g. grasses such as rice, maize and wheat (members of the Poaceae), or legumes such as soya and Medicago (members of the Fabaceae). This is similar to the paradigm in plant systematics, where taxonomic hierarchies exist at the genus level, the family level and so on. This hierarchical structure of ontologies and associated controlled vocabularies will facilitate data retrieval via appropriate tools.

It is considered extremely important that the terms used in the controlled vocabularies be derived from internationally published sources and/or usage. Every term in the controlled vocabulary will be accompanied by an appropriate, internationally accepted definition. Each definition will indicate a traceable source (e.g. an ISBN number). If the definition has been modified, the reference will also include a personal signature or identifier of the developer, curator or provider. Synonyms should be indicated wherever available. This standard will contribute to an authoritative product, encouraging widespread usage of the ontologies and fostering consistency in the annotation of data by database curators.

Data Representation

The PO Consortium adopted a simple data structure called the Directed Acyclic Graph (DAG). This structure is a type of hierarchy in which the biological concepts are organized as a tree-like network: the nodes at the top (root) are more general cases of the specific terms at the bottom (leaves) of the structure.

Like a simple hierarchy, children are not allowed to be their own ancestors; hence cycles are forbidden.
However, unlike a simple hierarchy, child nodes are allowed to have more than one parent node, thus allowing multiple child-to-parent relationships.
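The two rules above (no cycles, but more than one parent per child) can be illustrated with a minimal Python sketch. The class name `TermGraph` and its methods are hypothetical names chosen for this example; they are not part of any PO tool.

```python
from collections import defaultdict


class TermGraph:
    """Toy DAG of ontology terms: each child may have several parents."""

    def __init__(self):
        # Maps each child term to the set of its direct parent terms.
        self.parents = defaultdict(set)

    def ancestors(self, term):
        """Return every term reachable by following parent links upward."""
        seen = set()
        stack = [term]
        while stack:
            for parent in self.parents[stack.pop()]:
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def add_edge(self, child, parent):
        """Link child to parent, refusing any edge that would close a cycle."""
        if child == parent or child in self.ancestors(parent):
            raise ValueError(f"{child} -> {parent} would create a cycle")
        self.parents[child].add(parent)
```

With this sketch, calling `add_edge("term1", "term0")` and then `add_edge("term1", "term2")` succeeds, because a second parent is allowed. After `add_edge("term0", "root")`, a call to `add_edge("root", "term1")` raises `ValueError`, because it would make `term1` its own ancestor.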

Use of term-term relationships in the ontology tree

Is_a (or Instance_of): This relationship is used in both the developmental and anatomy ontologies to indicate the relationship between a specific term and a more general one. For example, achenium is an instance_of dry indehiscent fruit, which is an instance_of fruit.

Part_of: This relationship is used in the anatomy ontology to indicate a subpart/part relationship within a tissue or organ. For example ectocarp is part_of pericarp, which in turn is part_of fruit.

Develops_from: This relationship is used in the anatomy ontology to indicate the temporal relationship between a tissue or organ and its developmental predecessor. For example ectocarp develops_from ovary outer.

The use of symbols denoting term-term relationship types in flat files
% is used to represent an is_a relationship,
< is used to represent a part_of relationship,
~ is used to represent a develops_from relationship.
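As a rough illustration, the three symbols and the example relationships given above can be written as typed edges in Python. The names `RELATION_SYMBOLS`, `EDGES` and `reachable` are assumptions made for this sketch, and the edge list contains only the examples quoted in this page.

```python
# Flat-file symbol -> relationship type, as listed above.
RELATION_SYMBOLS = {"%": "is_a", "<": "part_of", "~": "develops_from"}

# (child, relationship, parent) triples taken from the examples in the text.
EDGES = [
    ("achenium", "is_a", "dry indehiscent fruit"),
    ("dry indehiscent fruit", "is_a", "fruit"),
    ("ectocarp", "part_of", "pericarp"),
    ("pericarp", "part_of", "fruit"),
    ("ectocarp", "develops_from", "ovary outer"),
]


def reachable(term, relation, edges=EDGES):
    """Follow one relationship type upward and collect every term reached."""
    found = set()
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for child, rel, parent in edges:
            if child == current and rel == relation and parent not in found:
                found.add(parent)
                frontier.append(parent)
    return found
```

For instance, `reachable("ectocarp", "part_of")` follows only part_of edges and returns `{"pericarp", "fruit"}`, while `reachable("ectocarp", "develops_from")` returns `{"ovary outer"}`.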

Other attributes of terms can also be represented, namely synonym(s), unique identifier and database cross-reference.

A term may have one or more synonyms.

Alphanumeric identifier:

Every term has a unique alphanumeric identifier to be used as a database cross-reference identifier (see /docs/numbers/number.html) in collaborating databases.

Syntax:

Parent-child relationships between terms are represented by indentation:

parent_term
 child_term
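One way to recover parent-child pairs from such indentation is sketched below in Python. It assumes that depth is given by the number of leading spaces, which this page does not specify, and the function name `parents_from_indentation` is hypothetical.

```python
def parents_from_indentation(lines):
    """Return (term, parent) pairs; parent is None for top-level terms.

    Assumption: a term's depth is its number of leading spaces, and its
    parent is the nearest preceding term with a smaller depth.
    """
    stack = []  # (depth, term) for each open ancestor, shallowest first
    pairs = []
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        depth = len(line) - len(line.lstrip(" "))
        term = line.strip()
        # Close every branch that is at the same depth or deeper.
        while stack and stack[-1][0] >= depth:
            stack.pop()
        pairs.append((term, stack[-1][1] if stack else None))
        stack.append((depth, term))
    return pairs
```

Given the lines `"parent_term"` and `" child_term"`, the function returns `[("parent_term", None), ("child_term", "parent_term")]`.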

An instance_of relationship is represented as follows:

%term0
 %term1 %term2

This is read as term1 being an instance of term0 and also an instance of term2.

A combined instance_of and part_of relationship is represented as follows:

%term0
 %term1 < term2 < term3

This is read as term1 being an instance of term0 and also a part of both term2 and term3.

Line syntax is represented as follows:

< | % term [; db cross ref]* [;synonym:text]* [< | % term]*
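The line syntax above can be read by a small parser. The Python sketch below is one possible interpretation, not the Consortium's own parser: it assumes that fields are separated by `;`, that a field beginning with `synonym:` is a synonym and any other field is a database cross-reference, and that leading spaces give the depth. It handles only the two symbols that appear in the grammar (`%` and `<`). The function name `parse_line` and the identifier `XX:0000001` used in the note below are made-up placeholders.

```python
import re

SYMBOLS = {"%": "is_a", "<": "part_of"}


def parse_line(line):
    """Parse one flat-file line into a dict (a sketch under stated assumptions)."""
    depth = len(line) - len(line.lstrip(" "))
    # Each chunk is a relationship symbol followed by the text up to the next symbol.
    chunks = re.findall(r"([%<])([^%<]*)", line)
    if not chunks:
        raise ValueError(f"no relationship symbol found in {line!r}")
    symbol, body = chunks[0]
    # First field is the term name; the rest are cross-references or synonyms.
    name, *fields = [part.strip() for part in body.split(";")]
    fields = [field for field in fields if field]
    synonyms = [f[len("synonym:"):].strip() for f in fields if f.startswith("synonym:")]
    xrefs = [f for f in fields if not f.startswith("synonym:")]
    # Remaining chunks name additional parents on the same line.
    other_parents = [
        (SYMBOLS[sym], text.split(";")[0].strip()) for sym, text in chunks[1:]
    ]
    return {
        "depth": depth,
        "relation": SYMBOLS[symbol],  # relationship to the parent given by indentation
        "name": name,
        "xrefs": xrefs,
        "synonyms": synonyms,
        "other_parents": other_parents,
    }
```

Parsing `" %term1 ; XX:0000001 ; synonym:first term < term2 < term3"` gives the name `term1`, the relation `is_a`, one cross-reference, one synonym (`first term`) and two additional part_of parents, `term2` and `term3`.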

It is considered important to have a clear picture of domain-conceptualization. The domain-conceptualization process provides names and describes/defines the entities that may exist in the ontology for that domain and the relationships among those entities. It therefore provides a vocabulary (controlled vocabulary) for representing and communicating knowledge about the domain.

Ontogenetic and phylogenetic information of various macromorphological systems pertinent to plants (i.e. roots, stems, leaves, inflorescence, fruit, seed) can be reflected in relevant ontologies. Clarification of the structure and placement of these ontologies can be provided by appropriate interpretation from published phylogenetic studies of plants. The information provided about characters/traits with the phylogenetic labels of synapomorphies, symplesiomorphies, apomorphies, could be valuable in determining/clarifying the structure of ontologies (notwithstanding the fact that this phylogenetic information is implicitly hypothetical).

Furthermore, the role of ontogeny information is evaluated regarding its possible use to test information according to the True Path Rule. Such perspectives are used in developing the ontology of stoma development and structure and the ontology of female and male inflorescence and flower structure.