Flora of North America- FNA

From Plant Ontology Wiki

PO-FNA WebEx meeting, June 21st, 2011

In attendance:

POC members: Laurel Cooper (OSU), Pankaj Jaiswal (OSU), Ramona Walls (NYBG).

Collaborators: James Macklin (institution; Flora of North America; james.macklin@gmail.com) and Hong Cui (University of Arizona; hong1.cui@gmail.com)

http://www.efloras.org/flora_page.aspx?flora_id=1


Purpose of meeting:

  • To establish a collaborative relationship with the PO/TO so that the extracted terms can be added to the ontologies. The FNA group will use the PO/TO IDs in their annotations.

Their group is interested in text mining applications using the PO and TO, as well as other ontologies.

Two applications: Flora of North America and a new project, for which funding is being applied for

The current project, parsing FNA, is in its 3rd year and ends next year

New ABI(?) proposal being prepared: parsing literature descriptions for a plant family and a bee(?) family.

This project will be more difficult: it will include specimen descriptions, which are predicted to be messier than genus and species descriptions.

New project: Parsing Taxonomic Concept Analysis:

Project goal: Produce a robust version of the software they have developed on the FNA project and extend it to a new application: Taxonomic Concept Analysis

Taxonomic Concept: e.g. a published fact such as an accepted plant name and all its synonyms (the simplest case); essentially a checklist

-difficult for users to make sense of all the names, etc

-Want to move from names to "character spaces"; parsed descriptions will describe what is held by the name

-Characters and attributes: convert words back into matrices and use those for analyses, both logic-based and entropy-based ("information gain")

Assess: What is the character space held by this name?

Then: Compare character spaces- more quantitative and interesting
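
The notes do not say how the entropy-based analysis is formulated. As a rough illustration only, the sketch below uses the standard information-gain definition (entropy of the name labels minus the weighted entropy after splitting on one character) on a small made-up matrix of parsed descriptions. The row layout, the character names and the taxon names are all hypothetical, not taken from the FNA data.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, character, label_key="name"):
    """Reduction in entropy of the name labels after splitting rows on one character."""
    labels = [row[label_key] for row in rows]
    groups = {}
    for row in rows:
        # Group the name labels by the state this row records for the character.
        groups.setdefault(row[character], []).append(row[label_key])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical character matrix: one row per parsed description.
matrix = [
    {"name": "Taxon A", "leaf_shape": "ovate", "flower_color": "white"},
    {"name": "Taxon A", "leaf_shape": "ovate", "flower_color": "yellow"},
    {"name": "Taxon B", "leaf_shape": "linear", "flower_color": "white"},
    {"name": "Taxon B", "leaf_shape": "linear", "flower_color": "yellow"},
]

for character in ("leaf_shape", "flower_color"):
    print(character, information_gain(matrix, character))
```

In this toy matrix leaf_shape separates the two names completely and flower_color not at all, which is the kind of contrast a character-space comparison would need to surface. Continuous and polymorphic characters would need additional handling that this sketch does not attempt.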


RW: Discrete characters or continuous? JM: Want to encompass them all: in classical taxonomy, continuous characters have been converted to discrete characters. Methods will allow for polymorphism and for both discrete and continuous characters. Hardest part is defining terms and what those terms hold

8:30 Hong: Let's discuss specifics: how, and in what format, do the terms need to be submitted to help the PO quickly evaluate and add them and give us the PO#?

How can we let the software know that some term will not be in the PO/TO? Then they will have to figure out some other way for the software to deal with such terms.


HC: Have a list of terms from the literature (plant structures), with PO IDs. A collaborator (Bob Morris?) and/or his student looked through the PO, and many of the terms were not there.

PJ: Does the FNA assign the terms a unique ID #? Terms go into a relational table

PJ: Can develop a mapping file to match our terms with theirs; conventions exist, see: link to SVN site


PJ: Need to look over the list and determine whether they are 'valid' plant structure terms vs. phenotype terms

The FNA terms may match a PO term name or synonym, may be added as a synonym, or may be considered for addition as a new term
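
A minimal sketch of that first triage step, assuming the ontology is available as a simple table of names and synonyms. The function names, the data layout, the example terms and the "EX:" identifiers are hypothetical placeholders, not real PO IDs or the project's actual mapping-file convention. Deciding whether an unmatched term becomes a synonym or a new term remains a curator decision; the code only flags it.

```python
def normalize(term):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(term.lower().split())


def classify_terms(fna_terms, ontology):
    """Label each FNA term as a name match, a synonym match, or unmatched."""
    by_name = {}
    by_synonym = {}
    for term_id, entry in ontology.items():
        by_name[normalize(entry["name"])] = term_id
        for synonym in entry.get("synonyms", []):
            # Keep the first ID seen if two entries share a synonym.
            by_synonym.setdefault(normalize(synonym), term_id)

    result = {}
    for term in fna_terms:
        key = normalize(term)
        if key in by_name:
            result[term] = ("name", by_name[key])
        elif key in by_synonym:
            result[term] = ("synonym", by_synonym[key])
        else:
            # Candidate for curator review: new synonym or new term.
            result[term] = ("unmatched", None)
    return result


# Hypothetical ontology excerpt with placeholder IDs.
toy_ontology = {
    "EX:0000001": {"name": "structure one", "synonyms": ["first structure"]},
    "EX:0000002": {"name": "structure two", "synonyms": []},
}

for term, status in classify_terms(
    ["Structure  One", "first structure", "structure three"], toy_ontology
).items():
    print(term, status)
```

The unmatched terms are the ones that would be sent to the PO with background information for review.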

HC: We have already extracted the plant structure terms from the phenotype terms

PJ: Simplest first step is to examine the list/vocabulary which has been curated

JM: Have a list or glossary of terms; of these, about 30% have definitions.

18:30


HC: During the project, we will be discovering new terms on a daily basis. Is it better to save and send them as a batch or individually?

PO: Either is fine. It is good if we have background info on the terms: taxonomic characters, literature citations, etc. Can you provide an example of how each term is used (i.e. in a sentence)?

PJ: Character list: Can these go into the TO? PATO may be too general; they may not want to add all our specific terms. It may be useful for the upper levels, though.

PO may consider a new ontology class- Plant Phenotype Ontology?

The PO will link the ontology terms to the FNA site and provide examples of how each term is used. FNA could be added as a dbxref in the ontology, and the PO could also create a subset similar to the one for traitnet.