PO-FNA webex meeting June 21st 2011

From Plant Ontology Wiki

In attendance:

POC members: Laurel Cooper (OSU), Pankaj Jaiswal (OSU), Ramona Walls (NYBG).

Collaborators: James Macklin (institution; Flora of North America; james.macklin@gmail.com) and Hong Cui (University of Arizona; hong1.cui@gmail.com)

[Flora of North America]

Link to [webex recording]


Purpose of meeting:

  • To establish a collaborative relationship with the PO/TO so that the extracted terms can be added to the ontologies. The FNA group will use the PO/TO IDs in their annotations.

Their group is interested in text mining applications using the PO and TO, as well as other ontologies.

Two applications: Flora of North America project and Taxonomic Concepts Project (new)

  • The current project, parsing the FNA, is in its 3rd year and ends next year.

See link to [Project overview] and [FNA project site]

  • New project: an ABI(?) proposal is being prepared for parsing literature descriptions for a plant family and a bee(?) family.

This project will be more difficult: it will include specimen descriptions, which are predicted to be messier than genus and species descriptions.

New project: Parsing Taxonomic Concept Analysis:

Project goal: Produce a robust version of the software they have developed on the FNA project and extend it to a new application: Taxonomic Concept Analysis.

Taxonomic Concept: e.g., a published fact such as an accepted plant name and all its synonyms (the simplest case); essentially a checklist.

-difficult for users to make sense of all the names, etc

-Want to move from names to "character spaces"; parsed descriptions will describe what is held by the name

-Extract characters and attributes, convert the words back into matrices, and use those for analyses, both logic-based and entropy-based ("information gain"). Assess: what is the character space held by this name?

Then: Compare character spaces- more quantitative and interesting
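
As a rough illustration of the entropy-based analysis mentioned above, the following Python sketch computes the information gain of a discrete character with respect to the taxon names in a small character matrix. The matrix, the taxon names, the character names and the function names are all invented for illustration; none of them come from the FNA data or software.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(rows, character, label="name"):
    """Reduction in entropy of `label` once `character` is known."""
    base = entropy([row[label] for row in rows])
    # Group the labels by the state of the chosen character.
    by_state = {}
    for row in rows:
        by_state.setdefault(row[character], []).append(row[label])
    remainder = sum(
        len(group) / len(rows) * entropy(group) for group in by_state.values()
    )
    return base - remainder


# Hypothetical character matrix: one row per parsed description.
matrix = [
    {"name": "taxon A", "leaf_shape": "ovate", "petal_color": "white"},
    {"name": "taxon A", "leaf_shape": "ovate", "petal_color": "yellow"},
    {"name": "taxon B", "leaf_shape": "linear", "petal_color": "white"},
    {"name": "taxon B", "leaf_shape": "linear", "petal_color": "yellow"},
]

for character in ("leaf_shape", "petal_color"):
    print(character, information_gain(matrix, character))
```

In this toy matrix, leaf_shape separates the two names completely (a gain of 1 bit), while petal_color tells us nothing about the name (a gain of 0). Continuous or polymorphic characters would need additional handling that is not shown here.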

RW: Discrete characters or continuous? JM: We want to encompass them all. In classical taxonomy, continuous characters have been converted to discrete characters. The methods will allow for polymorphism and for both discrete and continuous characters. The hardest part is defining terms and what those terms hold.

Getting the terms into the PO/TO

recording timestamp: 8:30

HC: Let's discuss specifics: how should the terms be submitted, and in what format, so that the PO can quickly evaluate and add them and give us the PO IDs?

PJ: Does the FNA assign the terms a unique ID #?

HC: Terms go into a relational table. We have a list of terms from the literature (plant structures), some of which have PO IDs. A collaborator (Bob Morris?) and/or his student looked through the PO, and many of the terms were not there.

PJ: We can develop a mapping file to match our terms with theirs; conventions for this exist. An FNA term may match a PO term name or a synonym, may be added as a synonym, or may be considered for addition as a new term. See: [Link to example mapping file]
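
The matching step PJ describes could be sketched roughly as follows. This is only an illustration in Python: the function name, the dictionary layout, the placeholder identifiers (PO:EXAMPLE1, PO:EXAMPLE2) and the toy entries are assumptions made for this example and do not reflect the actual mapping file format or real PO content.

```python
def classify_term(fna_term, po_terms):
    """Return (status, po_id) for an FNA term checked against PO entries.

    status is "name_match", "synonym_match", or "candidate"
    (a candidate for addition as a new term or a new synonym).
    """
    key = fna_term.strip().lower()
    # First pass: exact (case-insensitive) match on a PO term name.
    for po_id, entry in po_terms.items():
        if key == entry["name"].lower():
            return ("name_match", po_id)
    # Second pass: match on an existing PO synonym.
    for po_id, entry in po_terms.items():
        if key in (synonym.lower() for synonym in entry["synonyms"]):
            return ("synonym_match", po_id)
    return ("candidate", None)


# Toy entries with placeholder IDs, for illustration only.
po_terms = {
    "PO:EXAMPLE1": {"name": "leaf", "synonyms": []},
    "PO:EXAMPLE2": {"name": "inflorescence", "synonyms": ["panicle"]},
}

for term in ("Leaf", "panicle", "spinule"):
    print(term, classify_term(term, po_terms))
```

Terms that come back as "candidate" would still need review by a curator to decide whether they become a new synonym or a new term.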

PJ: We need to look over the list and determine whether the terms are 'valid' plant structure terms or phenotype terms.

HC: We have already separated the plant structure terms from the phenotype terms.

PJ: The simplest first step is to examine the list/vocabulary that has been curated.

JM: We have a list or glossary of terms; of these, about 30% have definitions.

HC: During the project, we will be discovering new terms on a daily basis. Is it better to save them and send them as a batch, or to send them individually?

PJ: Either is fine. Individual terms or groups of terms can be submitted through the SourceForge tracker.

It is good if we have background information on the terms: taxonomic characters, literature citations, etc. Can you provide an example of how each term is used (i.e., in a sentence)?

LC: List can be sent as a spreadsheet, in groups if possible.

-If you can provide definitions and suggested parents, it is helpful.

Management of terms and definitions:

recording timestamp:~22:00

PJ: Will the FNA keep on maintaining the glossary of terms that you are extracting from the literature sources, and use the mapping file to enrich your database, rather than using the PO as your primary vocabulary? Or will the natural language processing be based on the PO/TO terms?

JM: We will probably have to maintain our own list, but we really want to connect out to the ontologies or to a controlled vocabulary (e.g., that of Peter Stevens). We may have a pool of terms that don't fit into any of the ontologies; these will be maintained separately.

RW: Will the extracted terms have definitions? JM: This is a very difficult thing to do. FNA has a controlled vocab started, but we do not know if the authors are following it.

This will be even more difficult as we move beyond the FNA in a big way. All we can do is go with the general definition or the synonyms that people generally use.

RW: When terms are incorporated into the PO, they need a text definition, as well as the implicit definition that arises from their relationships to the other terms in the ontology. When we add the new terms for you, we will have to reach an agreement about the definitions. This can be done together.

JM: We have the same issues with PS' glossary: some definitions may not be matched right now. We need to get the users involved and build consensus.

In the long term we want to build digital floras "on the fly" and provide users with the list of PO/TO/PATO/vocab terms with the accepted definitions. If users want to disagree with or change a definition, they would have to provide an explanation. Users could then use the ontology for reasoning.

PJ: The new terms being requested should come with taxonomic, specimen, and literature reference(s), if possible. Best-case scenario: provide the sentence the term comes from. The biggest challenge will be adding the terms describing the character traits or phenotypes. This depends upon the training and education of users and curators as well.

PJ: Character list: Can these go into the Trait Ontology? We anticipate working on this in 2-3 weeks, so it would be great to have a list of terms that are being requested at that time. Then they could be added as a batch.


recording timestamp: ~34:00. Phenotypes vs. traits: can these go into the TO?

HC: We have separate lists of trait terms and characters. For the phenotypes, PATO may be too general; they may not want to add all our specific terms.


PJ: PATO's major concepts come from the metazoan world, and using it adds a third level of involvement. There is a dependency issue: are they able to entertain the plant definitions? PATO may be useful for the upper levels, though.

The PO may consider a new ontology class: a Plant Characters (or Phenotype) Ontology? The FNA would be the "aggregator" that gets it started.

This could also include terms coming from the APweb glossary (PS) and possibly taxonomic pages, phylogenomic tree.

Referencing the FNA

  • It is important to the PO that we provide recognition to the FNA for the terms or the definitions that you provide. We can link out to your site and references

(recording time stamp: ~37:00)

JM: It would be great if the PO could create 2 kinds of reference to the FNA: a ref to the term and also to its use.

The PO could link the ontology terms to the FNA site and provide examples of how each term is used in the FNA: "The FNA used this term/synonym in the following way:..."

  • The FNA could be added as a database cross-reference (dbxref) in the ontology for terms or synonyms.
  • We can also create a subset similar to the one for Traitnet, which can be searched (use "all fields" in the search on the website). A filtered save in OBO-Edit can be used to extract a PO-slim set.

PJ: Synonyms such as panicle could have a species-specific reference attached to them.


Summary of term info needed:

  • term name
  • sentence or paragraph where term appears
  • literature citation(s)
  • taxonomic rank
  • text definition (if available)
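
Since LC suggested sending the list as a spreadsheet, one possible layout for these fields is sketched below in Python. The column names, the TermRequest class and the sample row are hypothetical; the PO did not specify a file format in this meeting.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class TermRequest:
    """One requested term; field names are illustrative, not a PO standard."""
    term_name: str
    context_sentence: str   # sentence or paragraph where the term appears
    citations: str          # literature citation(s), e.g. separated by ";"
    taxonomic_rank: str
    definition: str = ""    # optional: text definition, if available


def to_csv(requests):
    """Serialize term requests to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=[f.name for f in fields(TermRequest)]
    )
    writer.writeheader()
    for request in requests:
        writer.writerow(asdict(request))
    return buffer.getvalue()


# Placeholder row; the values are made up for illustration.
sheet = to_csv([
    TermRequest(
        term_name="example term",
        context_sentence="Sentence from a description, containing the term.",
        citations="Example citation",
        taxonomic_rank="species",
    )
])
print(sheet)
```

A column for suggested parent terms could be added in the same way if the submitters are able to provide them.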

PJ: Start with the existing list of plant structure terms, covering the 30% of the database that has been curated with definitions.

If you have a list of character terms, you can send that as well. Ok if they do not have definitions.

Action Items:

  • HC and JM will send PO the lists they currently have so that we can start looking at them. This week or next.
  • The PO will provide a letter of support for the ABI proposal, stating that we are actively collaborating, creating mappings and integrating your term requests.
  • The PO will also provide a letter of support for the other proposal, with the ??? Biology lab, which uses the same technology for parsing the Lifewise(?).
  • In the future, the PO would be interested in expanding the collaboration on the project for developing natural language processing tools to use the ontologies for extracting information concerning mutants, gene expression etc.