Difference between revisions of "PO-FNA Conf Call Nov 18th, 2011"

From Plant Ontology Wiki
 
(20 intermediate revisions by 2 users not shown)
Line 1: Line 1:
 
'''POC-FNA meeting, Webex Conference Call; Date: Monday Nov 14th, 2011 10am (PST)'''
 
'''POC-FNA meeting, Webex Conference Call; Date: Monday Nov 14th, 2011 10am (PST)'''
  
Planned In attendance: ''Laurel Cooper (OSU), Pankaj Jaiswal (OSU), Ramona Walls (NYBG)''   
+
In attendance: ''Laurel Cooper (OSU), Pankaj Jaiswal (OSU), Ramona Walls (NYBG)''   
 
 
 
 
Collaborators: ''Hong Cui, James Macklin''
 
  
 +
Collaborators: ''Hong Cui (UofA), James Macklin (absent)''
  
 
Back to: Main [[Flora_of_North_America-_FNA]] page
 
Back to: Main [[Flora_of_North_America-_FNA]] page
  
 
=PO-FNA mapping=
 
=PO-FNA mapping=
*Received the initial file from HC (FNA Term glossary (PO071811) (HC 7-18-11).csv)- renamed for clarity from "PO071811"
+
'''*Received the initial file from HC''' (FNA Term glossary (PO071811) (HC 7-18-11).csv)- renamed for clarity from "PO071811"
 
From HC (by email)- "Attached please find the set of terms with definitions. This set was extracted from FNA glossary (see below). This CSV file has three columns: term, definition, and limitation. The limitation indicates where the structure is found. Definitions were given by botanists."
 
From HC (by email)- "Attached please find the set of terms with definitions. This set was extracted from FNA glossary (see below). This CSV file has three columns: term, definition, and limitation. The limitation indicates where the structure is found. Definitions were given by botanists."
  
Line 18: Line 16:
 
version is here: http://huntbot.andrew.cmu.edu/HIBD/Departments/DB-INTRO/IntroFNA.shtml so you can cite it.
 
version is here: http://huntbot.andrew.cmu.edu/HIBD/Departments/DB-INTRO/IntroFNA.shtml so you can cite it.
  
*Initial mapping was done using the program "OBOL" with assistance from CM. see:  
+
'''*Initial mapping was done using the program "OBOL"''' with assistance from CM. see: [http://www.hindawi.com/journals/cfg/2004/805603/abs/ link]
 +
 
 +
Mungall, C. J. (2004). Obol: Integrating Language and Meaning in Bio-Ontologies. Comparative and Functional Genomics, 5, 509-520.
 +
 
 
Straight entity-matching to PO, with no TO  
 
Straight entity-matching to PO, with no TO  
  
Line 30: Line 31:
 
4. Match synonym scope
 
4. Match synonym scope
  
In general, if there are multiple matches, only the best one is shown. Sometimes it is hard to disambiguate. E.g.:
+
In general, if there were multiple matches, only the best one was shown. (Sometimes it is hard to disambiguate).  
 +
 
 +
E.g., aerial root > STRUCTURE > match(PO:0000042; shoot-borne root, syn: aerial root, narrow)    match(PO:0009005; root, syn: aerial root, narrow)
 +
 
 +
Note: this may reveal some synonyms that need to be revised in the PO. The synonym for PO:0009005 seems odd.
 +
 
 +
'''*Manual editing of the mapping file by RW to Release #16:'''
 +
See detailed notes in the excel file: "FNA_Term_glossary(HC 7-18-11)_to_PO_release16.xlsx"
 +
 
 +
''RW should add a column to the spreadsheet showing whether a match was made by OBOL and then approved, made by OBOL and then rejected, or made by hand.''
  
aerial root    STRUCTURE              match(PO:0000042,shoot-borne root,aerialroot,narrow)    match(PO:0009005,root,aerialroot,narrow)
+
''How to improve automatic mapping:''
  
The syn for PO:0009005 seems odd, no?
+
''Be more inclusive in lexical matching, e.g., match forms with alternate endings (maybe it already does this to some extent).''
  
It may be possible to do more sophisticated prioritization. However, it may be better to revise some of the synonym assignments in PO. David OS and I are providing better documentation for assigning synonym scope and provenance in obo ontologies.
+
''Incorporate meaning into the match: is there some way to search the meaning in the definition, similar to searching the context of word use in FNA?''
  
 
=Mapping results=
 
=Mapping results=
Line 61: Line 71:
  
 
plus many others
 
plus many others
 +
 +
''All agreed that it would be best to have unique IDs for FNA terms. It is not yet clear whether there should be one ID per definition (synonyms share the same ID) or one ID per term/definition combination (every unique word and every unique concept gets a different ID).''
  
 
===FNA terms that map to >1 PO term===
 
===FNA terms that map to >1 PO term===
Line 69: Line 81:
  
 
FNA:blade matches to PO:lamina and PO:leaf lamina
 
FNA:blade matches to PO:lamina and PO:leaf lamina
 +
 +
''When possible, we should make a general term (e.g., organ apex, lamina) and map the FNA term only to that general term. FNA can use a reasoner to find all of the subtypes. If it is not possible to create one general term (e.g., tendril), we will need to map to each term separately.''
 +
 +
''The mapping file should have a separate line for each unique concept (each line is one FNA ID and one PO ID).''
 +
 +
''For future text mining, HC will try to include the parent structure (e.g., leaf base, rather than just base), so it is clearer what to map to.''
  
 
===FNA terms that map to obsolete PO terms===
 
===FNA terms that map to obsolete PO terms===
Line 78: Line 96:
 
coccus, cocci, mericarp: obsolete. One of the segments of a dehisced schizocarp; usually one-seeded and itself indehiscent.
 
coccus, cocci, mericarp: obsolete. One of the segments of a dehisced schizocarp; usually one-seeded and itself indehiscent.
  
microphyll: obsolete (What about megaphyll? Currently mapped to vascular leaf.)
+
microphyll: obsolete (What about megaphyll? Currently mapped to vascular leaf. Microphyll could map there too.)
  
 
also:
 
also:
Line 84: Line 102:
 
multiple fruit: mapped to obsolete PO:0020086 multiple fruit, but now maps as synonym to PO:fruit
 
multiple fruit: mapped to obsolete PO:0020086 multiple fruit, but now maps as synonym to PO:fruit
  
 +
''POC will discuss these at their weekly meetings.''
  
 
*3 have been replaced by GO terms:
 
*3 have been replaced by GO terms:
Line 91: Line 110:
 
wall: maps to obsolete pollen wall PO:0020059 now GO:0043667 (also maps to 6 other PO terms)
 
wall: maps to obsolete pollen wall PO:0020059 now GO:0043667 (also maps to 6 other PO terms)
  
 +
''FNA should include mappings to GO in future text mining. RW included list of GO terms in the mapping to PO terms, where appropriate.''
 +
 +
''Even if an FNA term maps to an obsolete PO term, that term still exists. There is usually something pointing from the obsolete term to a new term in PO or GO.''
 +
 +
''If we really need to, there are ways to bring obsolete terms back into circulation (by adding a replacement term).''
  
 
===FNA terms that are too general for PO===
 
===FNA terms that are too general for PO===
Line 97: Line 121:
 
12 terms that are too vague, but hard to define as characters: commissure, isthmus, lamella, lamellae, membrane, suture, tubercle, indument, indumentum, indumenta, vestiture, vesture
 
12 terms that are too vague, but hard to define as characters: commissure, isthmus, lamella, lamellae, membrane, suture, tubercle, indument, indumentum, indumenta, vestiture, vesture
  
==Phenotype/character terms form FNA==
+
''Some of these we may be able to map to PO terms (e.g., indument to epidermis or plant substance). Others can be mapped by including the parent structure from the context, e.g., a beak of what or a blotch on what.''
*begin work on phenotype/character terms, inlcuding the 101 from this list plus all of the FNA character terms
+
 
 +
==Phenotype/character terms from FNA==
 +
* HC sent us the file: FNAv19Traits2PTO.xls
 +
 
 +
*begin work on phenotype/character terms, including the 101 from this list plus all of the FNA character terms

Latest revision as of 17:42, 21 November 2011

POC-FNA meeting, Webex Conference Call; Date: Monday Nov 14th, 2011 10am (PST)

In attendance: Laurel Cooper (OSU), Pankaj Jaiswal (OSU), Ramona Walls (NYBG)

Collaborators: Hong Cui (UofA), James Macklin (absent)

Back to: Main Flora_of_North_America-_FNA page

PO-FNA mapping

*Received the initial file from HC (FNA Term glossary (PO071811) (HC 7-18-11).csv)- renamed for clarity from "PO071811"

From HC (by email)- "Attached please find the set of terms with definitions. This set was extracted from FNA glossary (see below). This CSV file has three columns: term, definition, and limitation. The limitation indicates where the structure is found. Definitions were given by botanists."

Note that the initial file we received had three columns: "term, category, limitation", but no definitions.

From JM (by email): The FNA glossary was produced by Bob Kiger at the Hunt Institute in Pittsburgh. It is available in printed form and on-line. The glossary was a first attempt at standardization for the project but was quickly built on but never officially updated. I think Hong says that there are now 70% more terms than the original and we are not even done yet! The on-line version is here: http://huntbot.andrew.cmu.edu/HIBD/Departments/DB-INTRO/IntroFNA.shtml so you can cite it.

*Initial mapping was done using the program "OBOL" with assistance from CM. see: http://www.hindawi.com/journals/cfg/2004/805603/abs/

Mungall, C. J. (2004). Obol: Integrating Language and Meaning in Bio-Ontologies. Comparative and Functional Genomics, 5, 509-520.

Straight entity-matching to PO, with no TO

The set of matches was appended on as additional columns.

Each match is a quad of:

1. PO ID

2. PO Label

3. Match Label (stemmed)

4. Match synonym scope

In general, if there were multiple matches, only the best one was shown. (Sometimes it is hard to disambiguate).

E.g., aerial root > STRUCTURE > match(PO:0000042; shoot-borne root, syn: aerial root, narrow) match(PO:0009005; root, syn: aerial root, narrow)

Note: this may reveal some synonyms that need to be revised in the PO. The synonym for PO:0009005 seems odd.
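The quad format described above can be unpacked programmatically. The following is a minimal Python sketch, not part of OBOL. It assumes each appended match is written as `match(PO ID,PO label,match label,scope)` with comma-separated fields; the names `Match` and `parse_matches` are hypothetical.

```python
import re
from typing import List, NamedTuple


class Match(NamedTuple):
    """One OBOL match quad, as described in the notes."""
    po_id: str        # 1. PO ID
    po_label: str     # 2. PO label
    match_label: str  # 3. Match label (stemmed)
    scope: str        # 4. Match synonym scope


# Assumed cell syntax: match(ID,label,match label,scope), possibly repeated.
MATCH_RE = re.compile(r"match\(([^)]*)\)")


def parse_matches(cell: str) -> List[Match]:
    """Return every well-formed match quad found in one spreadsheet cell."""
    matches = []
    for body in MATCH_RE.findall(cell):
        parts = [part.strip() for part in body.split(",")]
        if len(parts) == 4:  # skip anything that is not a quad
            matches.append(Match(*parts))
    return matches


cell = ("match(PO:0000042,shoot-borne root,aerialroot,narrow)    "
        "match(PO:0009005,root,aerialroot,narrow)")
for quad in parse_matches(cell):
    print(quad.po_id, quad.po_label, quad.scope)
```

Listing all quads for a term, rather than only the best one, makes ambiguous cases such as aerial root easy to spot during manual review.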

*Manual editing of the mapping file by RW to Release #16:

See detailed notes in the Excel file: "FNA_Term_glossary(HC 7-18-11)_to_PO_release16.xlsx"

RW should add a column to the spreadsheet showing whether a match was made by OBOL and then approved, made by OBOL and then rejected, or made by hand.

How to improve automatic mapping:

Be more inclusive in lexical matching, e.g., match forms with alternate endings (maybe it already does this to some extent).

Incorporate meaning into the match: is there some way to search the meaning in the definition, similar to searching the context of word use in FNA?
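The "alternate endings" suggestion could look something like the sketch below. This is an illustrative assumption, not a description of how OBOL stems labels: the suffix rules and the functions `variants` and `lexical_match` are hypothetical, and the label set is a toy stand-in for PO term names and synonyms.

```python
from typing import List, Set, Tuple

# Hypothetical (plural ending, singular ending) pairs for common
# Latin and English forms found in botanical glossaries.
SUFFIX_RULES: List[Tuple[str, str]] = [
    ("ae", "a"),   # lamellae -> lamella
    ("i", "us"),   # cocci -> coccus
    ("a", "um"),   # indumenta -> indumentum
    ("s", ""),     # tendrils -> tendril
]


def variants(term: str) -> Set[str]:
    """Return the term plus every form produced by one suffix rule."""
    forms = {term}
    for suffix, replacement in SUFFIX_RULES:
        if term.endswith(suffix) and len(term) > len(suffix):
            forms.add(term[: -len(suffix)] + replacement)
    return forms


def lexical_match(term: str, labels: Set[str]) -> List[str]:
    """Return the labels that equal the term or one of its variants."""
    return sorted(variants(term.lower()) & labels)


po_labels = {"lamella", "coccus", "indumentum", "tendril"}
for fna_term in ["lamellae", "cocci", "indumenta", "beak"]:
    print(fna_term, lexical_match(fna_term, po_labels))
```

Generating candidate forms and intersecting them with the label set over-generates (for example, "lamella" also yields "lamellum"), but a spurious form only matters if it happens to be a real label, so the approach stays conservative.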

Mapping results

See the Summary of the initial mapping efforts on this page: PO-FNA mapping results

PO follow-up tasks

  • add 364 synonyms to existing terms
  • fix several errors that were discovered while doing the mappings
  • add 143 unique new terms, plus their synonyms

-FNA provides definitions, so this will be relatively easy

-A few of these are already on our list of terms to add, some from other users (e.g. phyloclade).

  • create an official mapping file

Open questions

need unique IDs for FNA terms

in order to deal with duplicates (multiple concepts with same name)

Examples:

FNA:columella has two meanings. One maps to gynophore, the other maps to fruit columella

FNA:ligule has 4 different meanings (3 structures and 1 character)

plus many others

All agreed that it would be best to have unique IDs for FNA terms. It is not yet clear whether there should be one ID per definition (synonyms share the same ID) or one ID per term/definition combination (every unique word and every unique concept gets a different ID).

FNA terms that map to >1 PO term

Examples:

FNA:apex matches to PO:bract apex, leaf apex, petal apex, petiole distal end, phyllome apex, reproductive shoot apex, sepal apex, shoot apex, tepal apex, vegetative shoot apex

FNA:blade matches to PO:lamina and PO:leaf lamina

When possible, we should make a general term (e.g., organ apex, lamina) and map the FNA term only to that general term. FNA can use a reasoner to find all of the subtypes. If it is not possible to create one general term (e.g., tendril), we will need to map to each term separately.

The mapping file should have a separate line for each unique concept (each line is one FNA ID and one PO ID).

For future text mining, HC will try to include the parent structure (e.g., leaf base, rather than just base), so it is clearer what to map to.
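The "one FNA ID and one PO ID per line" layout can be sketched as follows. This is only an illustration under stated assumptions: FNA terms do not yet have IDs, so `FNA:0000001` is a made-up placeholder, and the function names and the tab-separated layout are hypothetical rather than an agreed file format.

```python
import csv
import io
from typing import Dict, List, Tuple


def expand_pairs(mapping: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Flatten a one-to-many mapping into one (FNA ID, PO ID) pair per row."""
    rows = []
    for fna_id, po_ids in mapping.items():
        for po_id in po_ids:
            rows.append((fna_id, po_id))
    return rows


def to_tsv(rows: List[Tuple[str, str]]) -> str:
    """Serialize the pairs as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["fna_id", "po_id"])
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder FNA ID; the two PO IDs are the aerial root matches in the notes.
mapping = {"FNA:0000001": ["PO:0000042", "PO:0009005"]}
print(to_tsv(expand_pairs(mapping)))
```

Keeping one pair per line means that a term needing several separate mappings (e.g., tendril) simply occupies several rows, while a term mapped to a single general PO term occupies one.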

FNA terms that map to obsolete PO terms

  • 8 terms, including duplicate plural forms; 5 unique.

pyrene: obsolete. The hard inner portion of a drupe, consisting of osseous endocarp and included seed.

coccus, cocci, mericarp: obsolete. One of the segments of a dehisced schizocarp; usually one-seeded and itself indehiscent.

microphyll: obsolete (What about megaphyll? Currently mapped to vascular leaf. Microphyll could map there too.)

also:

multiple fruit: mapped to obsolete PO:0020086 multiple fruit, but now maps as synonym to PO:fruit

POC will discuss these at their weekly meetings.

  • 3 have been replaced by GO terms:

papilla, papillae: map to obsolete PO:0020053. Could map to plant cell papilla (GO:0090395) or could be character term (papillate).

wall: maps to obsolete pollen wall PO:0020059 now GO:0043667 (also maps to 6 other PO terms)

FNA should include mappings to GO in future text mining. RW included list of GO terms in the mapping to PO terms, where appropriate.

Even if an FNA term maps to an obsolete PO term, that term still exists. There is usually something pointing from the obsolete term to a new term in PO or GO.

If we really need to, there are ways to bring obsolete terms back into circulation (by adding a replacement term).
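Following the pointer from an obsolete term to its replacement can be sketched as below. This is a minimal illustration, not PO tooling: the function `resolve` and the dictionary layout are hypothetical, and the only pointer shown is the pollen wall replacement recorded in these notes (PO:0020059, now GO:0043667).

```python
from typing import Dict


def resolve(term_id: str, replaced_by: Dict[str, str]) -> str:
    """Follow replacement pointers until a term with no pointer is reached."""
    seen = set()
    while term_id in replaced_by:
        if term_id in seen:
            # Guard against pointers that loop back on themselves.
            raise ValueError(f"cycle in replacement pointers at {term_id}")
        seen.add(term_id)
        term_id = replaced_by[term_id]
    return term_id


# Obsolete ID -> replacement ID, taken from the notes above.
REPLACED_BY = {"PO:0020059": "GO:0043667"}

print(resolve("PO:0020059", REPLACED_BY))
print(resolve("PO:0009005", REPLACED_BY))  # no pointer, returned unchanged
```

Resolving IDs this way at mapping time would let an FNA term keep its original match to the obsolete PO term while still surfacing the current PO or GO term.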

FNA terms that are too general for PO

101 character terms: areole 1, beak 1, diffuse root, blotch, scallop, etc.

12 terms that are too vague, but hard to define as characters: commissure, isthmus, lamella, lamellae, membrane, suture, tubercle, indument, indumentum, indumenta, vestiture, vesture

Some of these we may be able to map to PO terms (e.g., indument to epidermis or plant substance). Others can be mapped by including the parent structure from the context, e.g., a beak of what or a blotch on what.

Phenotype/character terms from FNA

  • HC sent us the file: FNAv19Traits2PTO.xls
  • Begin work on phenotype/character terms, including the 101 from this list plus all of the FNA character terms