Difference between revisions of "Flora of North America- FNA"

From Plant Ontology Wiki
'''PO-FNA webex meeting June 21st 2011'''
  
In attendance:
  
POC members:  Laurel Cooper (OSU), Pankaj Jaiswal (OSU),  Ramona Walls (NYBG).
 
  
Collaborators:  
James Macklin (institution; Flora of North America; james.macklin@gmail.com) and Hong Cui (University of Arizona; hong1.cui@gmail.com)
 
  
http://www.efloras.org/flora_page.aspx?flora_id=1
 
  
Link to [https://ontology.webex.com/ontology/ldr.php?AT=pb&SP=MC&rID=54366497&rKey=bb29ee8dbaa8edfc webex recording]
  
  
==Purpose of meeting:==
* To establish a collaborative relationship with the PO/TO so that the extracted terms can be added to the ontologies. The FNA group will use the PO/TO IDs in their annotations.
 
  
Their group is interested in text mining applications using the PO and TO, as well as other ontologies.
 
  
==Two applications:  Flora of North America project and Taxonomic Concepts Project (new)==
  
* The current project on parsing the FNA is in its third year and ends next year.
  
* New project: an ABI(?) proposal is being prepared:
  
Parsing literature descriptions for a plant family and a bee(?) family.
  
This project will be more difficult: it will work with specimen descriptions, which are predicted to be messier than descriptions of genera and species.
  
New project: Parsing Taxonomic Concept Analysis:
  
Project goal: make the software developed on the FNA project robust and extend it to a new application, Taxonomic Concept Analysis.
  
Taxonomic concept: e.g., a published fact such as an accepted plant name and all of its synonyms (the simplest case); essentially a checklist.
  
-It is difficult for users to make sense of all the names, etc.
  
-We want to move from names to "character spaces"; parsed descriptions will describe what is held by each name.
  
-Characters and attributes: convert the words back into matrices and use them for analyses, both logic-based and entropy-based ("information gain").
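As an illustration of the entropy-based ("information gain") analysis mentioned above, here is a minimal Python sketch. It assumes a toy taxon-by-character matrix with one discrete state per cell and a grouping (here a hypothetical genus label) that the characters should discriminate. The taxa, characters and states are invented, and this is the textbook decision-tree definition of information gain, not necessarily the method the FNA team will use.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(matrix, labels, character):
    """Drop in label entropy after grouping taxa by their state for one character."""
    groups = defaultdict(list)
    for taxon, states in matrix.items():
        groups[states[character]].append(labels[taxon])
    n = len(matrix)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy([labels[t] for t in matrix]) - remainder


# Hypothetical toy matrix: taxa x characters, one discrete state per cell.
matrix = {
    "taxon_a": {"leaf_margin": "serrate", "leaf_arrangement": "alternate"},
    "taxon_b": {"leaf_margin": "serrate", "leaf_arrangement": "opposite"},
    "taxon_c": {"leaf_margin": "entire", "leaf_arrangement": "alternate"},
    "taxon_d": {"leaf_margin": "entire", "leaf_arrangement": "opposite"},
}
# Hypothetical grouping that the characters should discriminate (e.g., genus).
labels = {"taxon_a": "genus_1", "taxon_b": "genus_1", "taxon_c": "genus_2", "taxon_d": "genus_2"}

for character in ("leaf_margin", "leaf_arrangement"):
    print(character, information_gain(matrix, labels, character))
```

In this toy matrix, leaf margin separates the two genera perfectly (a gain of 1 bit), while leaf arrangement carries no information about them (a gain of 0).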
 
  
Assess: What is the character space held by this name? 
 
 
Then: compare character spaces, which is more quantitative and more interesting.
 
 
 
RW: Discrete characters or continuous? JM: We want to encompass them all. In classical taxonomy, continuous characters have been converted to discrete characters. The methods will allow for polymorphism and for both discrete and continuous characters. The hardest part is defining the terms and what those terms hold.
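One way to picture "comparing character spaces" that mix polymorphic, discrete and continuous characters is sketched below in Python. It is only an illustration under stated assumptions: discrete characters are coded as sets of states (a polymorphic taxon has several), continuous characters as (min, max) ranges, and the similarity measures (Jaccard for state sets, range overlap for measurements) are choices made for this example, not anything agreed in the meeting. The taxa and values are hypothetical.

```python
def discrete_similarity(a, b):
    """Jaccard similarity of two state sets; a polymorphic taxon simply has more than one state."""
    union = a | b
    return 1.0 if not union else len(a & b) / len(union)


def range_similarity(a, b):
    """Overlap of two (min, max) ranges divided by their combined extent."""
    overlap = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    extent = max(a[1], b[1]) - min(a[0], b[0])
    return 1.0 if extent == 0 else overlap / extent


def compare_spaces(x, y):
    """Mean per-character similarity over the characters both taxa describe (None if none shared)."""
    shared = sorted(x.keys() & y.keys())
    if not shared:
        return None
    scores = []
    for character in shared:
        a, b = x[character], y[character]
        if isinstance(a, frozenset) and isinstance(b, frozenset):
            scores.append(discrete_similarity(a, b))
        elif isinstance(a, tuple) and isinstance(b, tuple):
            scores.append(range_similarity(a, b))
        else:
            raise TypeError(f"{character}: cannot compare discrete and continuous codings")
    return sum(scores) / len(scores)


# Hypothetical character spaces for two taxa.
species_1 = {"leaf_margin": frozenset({"serrate", "crenate"}), "leaf_length_cm": (2.0, 6.0)}
species_2 = {
    "leaf_margin": frozenset({"serrate"}),
    "leaf_length_cm": (4.0, 8.0),
    "petal_color": frozenset({"white"}),
}
print(compare_spaces(species_1, species_2))
```

Characters that only one taxon describes (here `petal_color`) are skipped; whether they should instead count as a mismatch is a design decision left open.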
 
 
 
==Getting the terms into the PO/TO==
 
recording timestamp: 8:30
 
 
 
HC: Let's discuss specifics. How, and in what format, do the terms need to be submitted so that the PO can quickly evaluate and add them and give us the PO IDs?
 
 
 
PJ: Does the FNA assign the terms a unique ID? Terms go into a relational table.
 
 
 
HC: We have a list of terms (plant structures) from the literature, some of which have PO IDs. A collaborator (Bob Morris?) and/or his student looked through the PO, and many of the terms were not there.
 
 
 
PJ: We can develop a mapping file to match our terms with theirs; conventions for this exist.
 
see: [http://palea.cgrb.oregonstate.edu/viewsvn/Poc/trunk/mapping2po/cereal-growth2po-gramene.txt?revision=592&view=co example mapping file]
 
 
 
Each FNA term may match a PO term name or synonym, may be added as a synonym, or may be considered for addition as a new term.
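The possible outcomes for an FNA term can be sketched as a first-pass matching step that produces tab-delimited mapping rows. This Python sketch assumes exact, case-insensitive matching against PO term names and synonyms; anything that does not match is flagged for a curator, who decides whether it becomes a new synonym or a new term. The status values, the placeholder ID PO:XXXXXXX and its toy name and synonym are hypothetical; only PO:0025125 (antheridium) comes from this page, and the real mapping-file convention is the one in the linked example.

```python
def map_terms(fna_terms, po_terms):
    """Classify each FNA term against PO names and synonyms (exact, case-insensitive)."""
    by_name, by_synonym = {}, {}
    for po_id, info in po_terms.items():
        by_name[info["name"].lower()] = po_id
        for synonym in info["synonyms"]:
            by_synonym[synonym.lower()] = po_id
    rows = []
    for term in fna_terms:
        key = term.strip().lower()
        if key in by_name:
            rows.append((term, by_name[key], "name_match"))
        elif key in by_synonym:
            rows.append((term, by_synonym[key], "synonym_match"))
        else:
            # Curator decides: add as a synonym of an existing term, or request a new term.
            rows.append((term, "", "needs_review"))
    return rows


# Toy data; PO:XXXXXXX and its name/synonym are placeholders, not real PO content.
po_terms = {
    "PO:0025125": {"name": "antheridium", "synonyms": []},
    "PO:XXXXXXX": {"name": "example structure", "synonyms": ["blade"]},
}
fna_terms = ["Antheridium", "blade", "phyllary"]

for row in map_terms(fna_terms, po_terms):
    print("\t".join(row))
```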
 
 
 
PJ: We need to look over the list and determine whether the terms are 'valid' plant structure terms or phenotype terms.
 
 
 
HC: We have already separated the plant structure terms from the phenotype terms.
 
 
 
PJ: The simplest first step is to examine the list/vocabulary that has been curated.
 
 
 
JM: We have a list or glossary of terms; about 30% of these have definitions.
 
 
 
HC: During the project, we will be discovering new terms on a daily basis. Is it better to save them and send them as a batch, or to send them individually?
 
 
 
PJ: Either is fine.  Individual terms or groups of terms can be submitted through the SourceForge tracker.
 
 
 
It is good if we have background information on the terms: taxonomic characters, literature citations, etc. Can you provide examples of how each term is used (i.e., in a sentence)?
 
 
 
LC: The list can be sent as a spreadsheet, in groups if possible.
 
 
 
-If you can provide definitions and suggested parents, it is helpful.
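A spreadsheet carrying the information requested above (the term, a definition, a suggested parent, an example sentence and a reference) could be produced as a CSV file, for example with Python's standard `csv` module. The column names below are hypothetical, not a format the PO has specified, and the angle-bracketed row values are placeholders.

```python
import csv
import io

# Hypothetical column layout; the PO curators may ask for something different.
COLUMNS = ["term", "definition", "suggested_parent", "example_sentence", "reference"]


def write_submission(rows):
    """Return CSV text for a batch of term requests; missing fields are left blank."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    for row in rows:
        writer.writerow({column: row.get(column, "") for column in COLUMNS})
    return buffer.getvalue()


# Placeholder request: angle-bracketed values stand in for real content.
batch = [
    {
        "term": "phyllary",
        "suggested_parent": "<suggested PO parent term>",
        "example_sentence": "<sentence from the FNA description>",
        "reference": "<taxon, specimen or literature reference>",
    }
]
print(write_submission(batch))
```

Leaving a column blank (here `definition`) keeps every row the same shape, so a batch can mix terms with and without definitions.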
 
 
 
==Management of terms and definitions:==
 
recording timestamp: ~22:00
 
 
 
PJ: Will the FNA keep maintaining the glossary of terms that you are extracting from the literature sources and use the mapping file to enrich your database, rather than using the PO as your primary vocabulary? Or will the natural language processing be based on the PO/TO terms?
 
 
 
JM: We will probably have to maintain our own list, but we really want to connect out to the ontologies or to a controlled vocabulary (e.g., Peter Stevens' glossary).

We may have a pool of terms that don't fit into any of the ontologies; these will be maintained separately.
 
 
 
RW: Will the extracted terms have definitions? JM: This is a very difficult thing to do. The FNA has started a controlled vocabulary, but we do not know whether the authors are following it.
 
 
 
This will be even more difficult as we move beyond the FNA in a big way.  All we can do is go with the general definition or the synonyms that people generally use.
 
 
 
RW: When terms are incorporated into the PO, they need a text definition, as well as the implicit definition that arises from their relationships to the other terms in the ontology.  When we add the new terms for you, we will have to reach an agreement about the definitions.  This can be done together.
 
 
 
JM: We have the same issues with PS' glossary; some definitions may not match right now. We need to get the users involved and build consensus.
 
 
 
In the long term, we want to build digital floras "on the fly" and provide users with the list of PO/TO/PATO/vocabulary terms with the accepted definitions. Users who want to disagree with or change a definition would have to provide explanations. Users can then use the ontology for reasoning.
 
 
 
PJ: The new terms being requested should come with taxonomic, specimen, or literature references, if possible. Best-case scenario: provide the sentence each term comes from.

The biggest challenge will be adding the terms describing character traits or phenotypes. This also depends on the training and education of users and curators.
 
 
 
PJ: Character list: Can these go into the Trait Ontology? We anticipate working on this in 2-3 weeks, so it would be great to have a list of terms that are being requested at that time.  Then they could be added as a batch. 
 
 
 
 
 
recording timestamp: ~34:00

Phenotypes vs. traits: can these go into the TO?
 
 
 
HC: We have separate lists of trait terms and characters for the phenotypes. PATO may be too general, and they may not want to add all our specific terms.
 
 
 
 
 
PJ: In PATO, the major concepts come from the metazoan world, and using it adds a third level of involvement.

Dependency issue: are they able to entertain the plant definitions? PATO may be useful for the upper levels, though.
 
 
 
The PO may consider a new ontology: a Plant Characters (or Phenotype) Ontology? The FNA would be the "aggregator" to get it started.
 
 
 
PS: APweb glossary, taxonomic glossary
 
 
 
==Referencing the FNA==
 
recording timestamp: ~37:00
 
The PO will link the ontology terms to the FNA site and provide examples of how each term is used.
 
FNA could be added as a dbxref in the ontology, and a subset similar to the one for traitnet could also be created.
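In OBO 1.2 flat-file terms, the dbxref and subset idea might look like the output of the Python sketch below: a `subsetdef:` line in the file header declares the subset, and each FNA-related term carries a `subset:` tag and an `xref:` to its FNA identifier. This is a sketch under assumptions: the `FNA:` prefix, the `fna_subset` name and `PO:XXXXXXX` are placeholders, and the UUID is only the example ID format quoted on this page, not a link to any real PO term.

```python
def subsetdef_line(subset, description):
    """Header line declaring a subset (OBO 1.2 'subsetdef' tag)."""
    return f'subsetdef: {subset} "{description}"'


def term_stanza(po_id, name, fna_id, subset, prefix="FNA"):
    """A [Term] stanza tagged with the subset and cross-referenced to an FNA identifier."""
    return "\n".join([
        "[Term]",
        f"id: {po_id}",
        f"name: {name}",
        f"subset: {subset}",
        f"xref: {prefix}:{fna_id}",
    ]) + "\n"


# Placeholder values: PO:XXXXXXX, the subset name and the FNA: prefix are hypothetical.
print(subsetdef_line("fna_subset", "Terms used in the Flora of North America"))
print()
print(term_stanza("PO:XXXXXXX", "example term", "40b5fa5d-a75b-49d8-8328-b4c28d54ea66", "fna_subset"))
```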
 
 
 
 
 
 
 
HC: How can we let the software know that some terms will not be in the PO/TO? The FNA team will then have to figure out some other way for the software to deal with them.
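One way to let the software know that a term will not be in the PO/TO is to keep an explicit list of "local-only" terms (the separately maintained pool mentioned above) and to give every term lookup a three-way result instead of a simple found/not-found. The Python sketch below is hypothetical throughout: the status names and the local ID scheme are invented for illustration; only antheridium's PO ID comes from this page.

```python
from enum import Enum


class Resolution(Enum):
    ONTOLOGY = "ontology"      # term has a PO/TO ID
    LOCAL_ONLY = "local_only"  # agreed not to go into the PO/TO; kept in the FNA's own pool
    PENDING = "pending"        # not yet mapped or still under review


def resolve(term, ontology_ids, local_only):
    """Return (status, identifier) so downstream code can branch on the status explicitly."""
    key = term.strip().lower()
    if key in ontology_ids:
        return Resolution.ONTOLOGY, ontology_ids[key]
    if key in local_only:
        return Resolution.LOCAL_ONLY, local_only[key]
    return Resolution.PENDING, None


# antheridium's ID is from this page; the local ID scheme is invented.
ontology_ids = {"antheridium": "PO:0025125"}
local_only = {"example local term": "FNA-local:0001"}

for term in ("Antheridium", "example local term", "phyllary"):
    status, identifier = resolve(term, ontology_ids, local_only)
    print(term, status.value, identifier)
```

Separating "will never be in the PO/TO" from "not there yet" keeps the software from repeatedly requesting terms that curators have already ruled out.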
 
 
 
==Action Items:==
 

Latest revision as of 09:19, 30 April 2012

[[PO-FNA webex meeting June 21st 2011]]

[[PO-FNA Conf Call Nov 18th, 2011]]


Link to [https://sourceforge.net/tracker/?func=detail&aid=3376762&group_id=76834&atid=835555 SourceForge tracker]


Link to [http://huntbot.andrew.cmu.edu/HIBD/Departments/DB-INTRO/IntroFNA.shtml FNA online glossary], including an introduction to and explanation of how the database was created.


[[PO-FNA mapping results]] - summary of the initial mapping efforts, plus suggestions on how to proceed.


We now have IDs for FNA terms. We still need to add new terms from the FNA list and create the mapping file.

Are we okay with using these FNA IDs? They are UUIDs, which are very long and contain dashes (e.g., 40b5fa5d-a75b-49d8-8328-b4c28d54ea66). See antheridium (PO:0025125) on the dev browser for an example of how they will look.
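If the UUID form is kept, a small check like the Python sketch below could validate FNA IDs before they are written into the PO file, using the standard `uuid` module. The `FNA:` prefix is an assumption. The same module also gives the 32-character form without dashes, in case dashes turn out to be a problem somewhere.

```python
import uuid


def fna_xref(raw_id, prefix="FNA"):
    """Validate an FNA identifier as a UUID and return it as a prefixed xref (canonical lowercase form)."""
    parsed = uuid.UUID(raw_id)  # raises ValueError for malformed IDs
    return f"{prefix}:{parsed}"


example = "40b5fa5d-a75b-49d8-8328-b4c28d54ea66"  # example ID quoted on this page
print(fna_xref(example))       # dashed form, 36 characters after the prefix
print(uuid.UUID(example).hex)  # same UUID without dashes, 32 characters

try:
    fna_xref("not-a-uuid")
except ValueError as err:
    print("rejected:", err)
```

Because `uuid.UUID` also accepts the undashed or uppercase form, either spelling normalizes to the same canonical string.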

Where did this synonym go? It is still there; it just doesn't show up in AmiGO.

RW is working on modifying a Perl script from JE to insert FNA synonyms into the PO file. RW may need some help from JE.
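The Perl script itself is not shown here, so the following is only a Python sketch of the same kind of task: adding FNA synonyms, each with an FNA dbxref, to matching `[Term]` stanzas of an OBO file. It assumes each stanza lists `id:` before `name:`, uses RELATED as a default synonym scope (the curators would choose EXACT, NARROW, BROAD or RELATED per synonym), and uses a hypothetical `FNA:` prefix and placeholder term. It skips synonym lines that are already present, so rerunning it does not duplicate them.

```python
def add_synonyms(obo_text, synonyms, scope="RELATED", prefix="FNA"):
    """Insert synonym lines after the name: line of matching [Term] stanzas.

    synonyms maps a PO ID to a list of (synonym text, FNA ID) pairs.
    Assumes each stanza's id: line precedes its name: line.
    """
    lines = obo_text.splitlines()

    # First pass: remember synonym lines already present, so reruns do not duplicate them.
    existing, current = {}, None
    for line in lines:
        if line.startswith("["):
            current = None
        elif line.startswith("id: "):
            current = line[4:].strip()
        elif line.startswith("synonym:") and current is not None:
            existing.setdefault(current, set()).add(line.strip())

    # Second pass: copy every line, adding new synonym lines right after name:.
    out, current = [], None
    for line in lines:
        out.append(line)
        if line.startswith("["):
            current = None
        elif line.startswith("id: "):
            current = line[4:].strip()
        elif line.startswith("name: ") and current in synonyms:
            for text, fna_id in synonyms[current]:
                new_line = f'synonym: "{text}" {scope} [{prefix}:{fna_id}]'
                if new_line not in existing.get(current, set()):
                    out.append(new_line)
    return "\n".join(out) + "\n"


# Placeholder stanza and synonym; not real PO content.
obo = "[Term]\nid: PO:XXXXXXX\nname: example term\n"
fna_synonyms = {"PO:XXXXXXX": [("example synonym", "40b5fa5d-a75b-49d8-8328-b4c28d54ea66")]}
updated = add_synonyms(obo, fna_synonyms)
print(updated)
```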

-What are the plans after this?

Once the mappings are done, we want to see how the PO shows up in their database. This will show the utility of the PO for recovering descriptors from free text.

FNA is using natural language processing to map their descriptions to PO. We are starting with plant structures and later will add phenotype descriptors. We will start with leaf characters.
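As a rough picture of what mapping free-text descriptions to the PO involves, the Python sketch below does a greedy longest-match lookup of PO names and synonyms in a description. This is a deliberately naive dictionary lookup, not the FNA team's natural language processing; the lexicon entries and placeholder IDs are hypothetical.

```python
import re


def spot_terms(description, lexicon, max_words=3):
    """Greedy longest-match lookup of lexicon phrases (lowercase, space-separated) in a description."""
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", description.lower())
    found, i = [], 0
    while i < len(tokens):
        # Try the longest phrase first so "leaf blade" wins over "leaf".
        for n in range(min(max_words, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in lexicon:
                found.append((phrase, lexicon[phrase]))
                i += n
                break
        else:
            i += 1  # no lexicon phrase starts at this token
    return found


# Hypothetical lexicon of PO names/synonyms; the IDs are placeholders.
lexicon = {"leaf": "PO:XXXXXXX", "leaf blade": "PO:YYYYYYY", "petiole": "PO:ZZZZZZZ"}
print(spot_terms("Leaf blade ovate; petiole 2-4 cm.", lexicon))
```

Real descriptions need more than this (abbreviations, plurals, structures implied by context), which is part of why unmatched terms end up as candidates for new PO synonyms or terms.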

We will use PATO, as it has terms available, and add PO terms as needed. This will also help show the utility of the PO to systematists.

-Others (These may be longer term than the next release):

* Linking ontology terms to character matrices using MorphoBank.

[[Category:Collaborations]]