Flora of North America- FNA
PO-FNA WebEx meeting, June 21st, 2011
In attendance:
POC members: Laurel Cooper (OSU), Pankaj Jaiswal (OSU), Ramona Walls (NYBG).
Collaborators: James Macklin (institution; Flora of North America; james.macklin@gmail.com) and Hong Cui (University of Arizona; hong1.cui@gmail.com)
http://www.efloras.org/flora_page.aspx?flora_id=1
Purpose of meeting:
- To establish a collaborative relationship with the PO/TO so that the extracted terms can be added to the ontologies. The FNA group will use the PO/TO IDs in their annotations.
Their group is interested in text mining applications using the PO and TO, as well as other ontologies.
Two applications: Flora of North America and a new project for which funding is being sought
The current project on parsing FNA is in its 3rd year and ends next year
New ABI(?) proposal being prepared: parsing literature descriptions for a plant family and a bee(?) family
This project will be more difficult: it will involve specimen descriptions, which are predicted to be messier than genus and species descriptions.
New project: Parsing Taxonomic Concept Analysis:
Project goal: Produce a robust version of the software they have developed on the FNA project and extend it to a new application: Taxonomic Concept Analysis
Taxonomic Concept: e.g., a published fact such as an accepted plant name and all its synonyms (simplest case); essentially a checklist
- Difficult for users to make sense of all the names, etc.
- Want to move from names to "character spaces"; the parsed descriptions will describe what is held by each name
- Characters and attributes: convert words back into matrices and use those for analyses, both logic-based and entropy-based ("information gain")
Assess: What is the character space held by this name?
Then: Compare character spaces- more quantitative and interesting
RW: Discrete characters or continuous? JM: We want to encompass them all. In classical taxonomy, continuous characters have been converted to discrete characters. The methods will allow for polymorphism and for both discrete and continuous characters. The hardest part is defining the terms and what those terms hold.
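The entropy-based ("information gain") analysis mentioned above can be illustrated with a minimal sketch. It is not the FNA project's software: the taxon names, characters, and states below are invented, and only discrete characters are handled. It measures how much knowing one character's state reduces uncertainty about which name a description belongs to.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a list of labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def information_gain(rows, character, target="name"):
    """Reduction in entropy of `target` after splitting the rows on `character`."""
    base = entropy([row[target] for row in rows])
    groups = defaultdict(list)
    for row in rows:
        groups[row[character]].append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder

# Hypothetical character matrix: one row per parsed description.
# Both taxa are polymorphic for petal_color (two states each).
matrix = [
    {"name": "Taxon A", "leaf_arrangement": "opposite", "petal_color": "white"},
    {"name": "Taxon A", "leaf_arrangement": "opposite", "petal_color": "yellow"},
    {"name": "Taxon B", "leaf_arrangement": "alternate", "petal_color": "white"},
    {"name": "Taxon B", "leaf_arrangement": "alternate", "petal_color": "yellow"},
]

gain_leaf = information_gain(matrix, "leaf_arrangement")
gain_petal = information_gain(matrix, "petal_color")
```

In this toy matrix, leaf_arrangement fully separates the two names (gain of 1 bit), while petal_color, being polymorphic in both, separates nothing (gain of 0). Continuous characters would first need binning or a different measure.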
8:30 Hong: Let's discuss specifics: how, and in what format, do the terms need to be submitted so that the PO can quickly evaluate and add them and give us the PO#?
How can we let the software know that a term will not be in the PO/TO? In that case they will have to figure out some other way for the software to deal with it.
HC: We have a list of terms from the literature (plant structures), with PO IDs. A collaborator (Bob Morris?) and/or his student looked through the PO, and many of the terms were not there.
PJ: Does the FNA assign the terms a unique ID #? Terms go into a relational table
PJ: We can develop a mapping file to match our terms with theirs; conventions exist (see: link to SVN site)
PJ: Need to look over the list and determine whether the terms are 'valid' plant structure terms vs. phenotype terms
An FNA term may match a PO term name or synonym, may be added as a synonym, or may be considered for addition as a new term
HC: We have already extracted the plant structure terms from the phenotype terms
PJ: The simplest first step is to examine the list/vocabulary that has already been curated
JM: We have a list or glossary of terms; of these, about 30% have definitions.
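The matching outcomes described above (name match, synonym match, or candidate new term) can be sketched as a simple lookup. This is only an illustration under assumptions: the function name, the status labels, the identifiers, and the ontology entries are all invented placeholders, not real PO accessions or the PO's actual mapping-file convention.

```python
def classify_term(term, po_terms):
    """Return (status, po_id) for one extracted term.

    po_terms maps an identifier to {"name": str, "synonyms": [str, ...]}.
    Matching is exact after trimming whitespace and lowercasing.
    """
    key = term.strip().lower()
    # Exact name matches take priority over synonym matches.
    for po_id, entry in po_terms.items():
        if key == entry["name"].lower():
            return ("name_match", po_id)
    for po_id, entry in po_terms.items():
        if key in (s.lower() for s in entry["synonyms"]):
            return ("synonym_match", po_id)
    # Not found: flag for curator review as a possible new term or synonym.
    return ("candidate_new_term", None)

# Placeholder entries for illustration only (not real PO identifiers).
po_terms = {
    "PO:EXAMPLE1": {"name": "leaf", "synonyms": ["foliage leaf"]},
    "PO:EXAMPLE2": {"name": "petal", "synonyms": []},
}

extracted = ["Leaf", "foliage leaf", "ligule"]
mapping = {term: classify_term(term, po_terms) for term in extracted}
```

A status of candidate_new_term with no identifier also gives the parsing software an explicit signal that a term is not yet in the PO/TO, so it can be batched for submission with its definition and usage examples.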
18:30
HC: During the project, we will be discovering new terms on a daily basis. Is it better to save them and send them as a batch, or to send them individually? PO: Either is fine. It is good if we have background information on the terms: taxonomic characters, literature citations, etc. Can you provide an example of how each term is used (i.e., in a sentence)?
PJ: Character list: Can these go into the TO? PATO may be too general; they may not want to add all our specific terms, though it may be useful for the upper levels.
PO may consider a new ontology class- Plant Phenotype Ontology?
The PO will reference the FNA site for the ontology terms through links and provide examples of how each term is used. FNA could be added as a dbxref in the ontology, and a subset could also be created, similar to the one for TraitNet.