Flora of North America (FNA)
PO-FNA webex meeting June 21st 2011
In attendance:
POC members: Laurel Cooper (OSU), Pankaj Jaiswal (OSU), Ramona Walls (NYBG).
Collaborators: James Macklin (institution; Flora of North America; james.macklin@gmail.com) and Hong Cui (University of Arizona; hong1.cui@gmail.com)
http://www.efloras.org/flora_page.aspx?flora_id=1
Link to [webex recording]
Purpose of meeting:
- To establish a collaborative relationship with the PO/TO so that the extracted terms can be added to the ontologies. The FNA group will use the PO/TO IDs in their annotations.
Their group is interested in text mining applications using the PO and TO, as well as other ontologies.
Two applications: Flora of North America project and Taxonomic Concepts Project (new)
- Current project, parsing the FNA, is in its 3rd year and ends next year
- New project: ABI(?) proposal being prepared:
Parsing literature descriptions for a plant family and a bee(?) family.
This project will be more difficult: it will cover specimen descriptions, which are predicted to be messier than genus and species descriptions.
New project: Parsing Taxonomic Concept Analysis:
Project goal: Produce a robust version of the software developed on the FNA project and extend it to a new application, Taxonomic Concept Analysis
Taxonomic Concept: e.g., a published fact such as an accepted plant name and all its synonyms (the simplest case); essentially a checklist
- Difficult for users to make sense of all the names, etc.
- Want to move from names to "character spaces"; parsed descriptions will describe what is held by the name
- Characters and attributes: convert words back into matrices and use those for analyses, both logic-based and entropy-based ("information gain")
Assess: What is the character space held by this name?
Then: Compare character spaces- more quantitative and interesting
RW: Discrete characters or continuous? JM: Want to encompass them all. In classical taxonomy, continuous characters have been converted to discrete characters. Methods will allow for polymorphism and for both discrete and continuous characters. The hardest part is defining the terms and what those terms hold.
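As a rough illustration of the entropy-based ("information gain") analysis mentioned above, the sketch below scores discrete characters in a small taxon-by-character matrix by how much they reduce uncertainty about which taxon is described. The taxa, characters, and states are invented for illustration and do not come from the FNA data; polymorphic and continuous characters, which JM wants the methods to support, are not handled here.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a list of labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def information_gain(matrix, character):
    """Reduction in entropy over taxon names after splitting on one character.

    `matrix` maps a taxon name to a dict of character -> state.
    """
    taxa = list(matrix)
    groups = defaultdict(list)
    for taxon in taxa:
        groups[matrix[taxon][character]].append(taxon)
    # Weighted entropy that remains within each group of taxa sharing a state.
    remainder = sum(len(g) / len(taxa) * entropy(g) for g in groups.values())
    return entropy(taxa) - remainder

# Hypothetical character matrix (illustrative values only).
matrix = {
    "Taxon A": {"leaf arrangement": "opposite", "petal color": "white"},
    "Taxon B": {"leaf arrangement": "alternate", "petal color": "white"},
    "Taxon C": {"leaf arrangement": "opposite", "petal color": "yellow"},
    "Taxon D": {"leaf arrangement": "alternate", "petal color": "white"},
}

gains = {c: information_gain(matrix, c) for c in ("leaf arrangement", "petal color")}
```

In this toy matrix, "leaf arrangement" splits the four taxa evenly and so has the higher gain; "petal color" isolates only one taxon.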
Getting the terms into the PO/TO
recording timestamp: 8:30
Hong: Let's discuss specifics: how, and in what format, do the terms need to be submitted to help the PO quickly evaluate and add them and give us the PO#?
PJ: Does the FNA assign the terms a unique ID #? Terms go into a relational table
HC: We have a list of terms from the literature (plant structures), some of which have PO IDs. A collaborator (Bob Morris?) and/or his student looked through the PO, and many were not there.
PJ: Can develop a mapping file to match our terms with theirs; conventions exist, see: [[1]]
An FNA term may match a PO term name or synonym, may be added as a synonym, or may be considered for addition as a new term.
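The three outcomes described above (name match, synonym match, or no match) could be sorted out with a first pass like the sketch below. It is only an assumption about how such a mapping might be built, not the convention PJ referred to; the function name, the data layout, and the `PO:EXAMPLE…` IDs are placeholders rather than real PO records.

```python
def classify_terms(fna_terms, po_terms):
    """Classify each FNA term against PO names and synonyms.

    `po_terms` maps a PO ID to {"name": str, "synonyms": [str, ...]}.
    Returns a list of (fna_term, status, po_id) tuples.
    """
    by_name = {}
    by_synonym = {}
    for po_id, entry in po_terms.items():
        by_name[entry["name"].lower()] = po_id
        for syn in entry["synonyms"]:
            by_synonym[syn.lower()] = po_id

    results = []
    for term in fna_terms:
        key = term.strip().lower()
        if key in by_name:
            results.append((term, "name match", by_name[key]))
        elif key in by_synonym:
            results.append((term, "synonym match", by_synonym[key]))
        else:
            # Left for curator review: possible new synonym or new term.
            results.append((term, "unmatched", None))
    return results

# Placeholder IDs and terms, not real PO records.
po_terms = {
    "PO:EXAMPLE1": {"name": "leaf", "synonyms": ["foliage leaf"]},
    "PO:EXAMPLE2": {"name": "petal", "synonyms": []},
}
mapping = classify_terms(["Leaf", "foliage leaf", "ligule"], po_terms)
```

Unmatched terms are only flagged here; deciding whether one becomes a synonym or a new term remains a curation step.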
PJ: Need to look over the list and determine whether the terms are 'valid' plant structure terms vs. phenotype terms
HC: We have already extracted the plant structure terms from the phenotype terms
PJ: simplest first step, to examine the list/vocabulary which has been curated
JM: Have a list or glossary of terms; about 30% of these have definitions.
HC: How can we let the software know that some terms will not be in the PO/TO? We will then have to figure out some other way for the software to deal with them.
HC: During the project, we will be discovering new terms on a daily basis- is it better to save and send them as a batch or individually?
PJ: Either is fine. It is good if we have background info on the terms: taxonomic characters, literature citations, etc. Can you provide examples of how each term is used (i.e., in a sentence)?
LC: List can be sent as a spreadsheet, in groups if possible
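One possible shape for such a spreadsheet batch is sketched below as tab-separated text. The column names are assumptions chosen to cover the background information PJ asked for (definition, usage example, citation); no submission format was agreed in the meeting, and the sample row is a placeholder.

```python
import csv
import io

# Hypothetical column layout; not an agreed PO/TO submission format.
COLUMNS = ["term", "definition", "example_usage", "citation"]

def write_batch(rows):
    """Serialize term records as tab-separated text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in rows:
        # Missing fields (e.g. a term without a definition) are left blank.
        writer.writerow({col: row.get(col, "") for col in COLUMNS})
    return buffer.getvalue()

batch = write_batch([
    {
        "term": "example term",
        "example_usage": "An example sentence using the term.",
        "citation": "placeholder citation",
    },
])
```

Leaving the definition blank is allowed here because only about 30% of the glossary terms currently have definitions.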
Giving recognition to the FNA group
recording timestamp: 18:30
PJ: Character list: Can these go into the TO? PATO may be too general, and they may not want to add all our specific terms; it may be useful for the upper levels though.
PO may consider a new ontology class: Plant Phenotype Ontology?
The PO will link the ontology terms back to the FNA site and provide examples of how they are used. FNA could be added as a dbxref in the ontology, and a subset could also be created, similar to the one for TraitNet.
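To make the dbxref and subset idea concrete, the sketch below prints what a term stanza might look like in OBO flat-file syntax, using the standard `xref:` and `subset:` tags. The term ID, the `FNA:` cross-reference, and the subset name are all placeholders, since none of these were decided in the meeting.

```python
def obo_stanza(term_id, name, fna_ref, subset="FNA_example_subset"):
    """Build an OBO-style [Term] stanza carrying an FNA xref and a subset tag.

    All identifiers passed in are illustrative placeholders.
    """
    lines = [
        "[Term]",
        f"id: {term_id}",
        f"name: {name}",
        f"subset: {subset}",
        f"xref: {fna_ref}",
    ]
    return "\n".join(lines)

stanza = obo_stanza("PO:EXAMPLE1", "example structure", "FNA:example_term")
print(stanza)
```

In a real OBO file the subset would also need to be declared in the header with a `subsetdef:` line.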