PO-FNA Conf Call Nov 18th, 2011
Latest revision as of 17:42, 21 November 2011
POC-FNA meeting, Webex Conference Call; Date: Monday Nov 14th, 2011 10am (PST)
In attendance: Laurel Cooper (OSU), Pankaj Jaiswal (OSU), Ramona Walls (NYBG)
Collaborators: Hong Cui (UofA), James Macklin (absent)
PO-FNA mapping
*Received the initial file from HC (FNA Term glossary (PO071811) (HC 7-18-11).csv), renamed for clarity from "PO071811".
From HC (by email): "Attached please find the set of terms with definitions. This set was extracted from FNA glossary (see below). This CSV file has three columns: term, definition, and limitation. The limitation indicates where the structure is found. Definitions were given by botanists."
Note that the initial file we received had three columns: "term, category, limitation", but no definitions.
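The two column layouts mentioned above can be told apart when the file is loaded. The sketch below is hypothetical and uses only the Python standard library `csv` module. It reports whether a `definition` column is present. The sample rows are illustrative: the pyrene definition comes from these notes, but the limitation and category values are made up.

```python
import csv
import io


def load_glossary(text):
    """Read a glossary CSV; report whether it has a 'definition' column."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    has_definitions = "definition" in (reader.fieldnames or [])
    return rows, has_definitions


# Layout HC described by email (illustrative row; limitation is hypothetical).
described = (
    "term,definition,limitation\n"
    'pyrene,"The hard inner portion of a drupe, consisting of osseous '
    'endocarp and included seed.",drupe\n'
)
# Layout of the file that actually arrived first (no definitions).
received = "term,category,limitation\npyrene,STRUCTURE,drupe\n"

for label, text in [("described", described), ("received", received)]:
    rows, ok = load_glossary(text)
    print(label, ok, rows[0]["term"])
```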
From JM (by email): The FNA glossary was produced by Bob Kiger at the Hunt Institute in Pittsburgh. It is available in printed form and on-line. The glossary was a first attempt at standardization for the project; it was quickly built on but never officially updated. I think Hong says that there are now 70% more terms than the original and we are not even done yet! The on-line version is here: http://huntbot.andrew.cmu.edu/HIBD/Departments/DB-INTRO/IntroFNA.shtml so you can cite it.
*Initial mapping was done using the program "OBOL" with assistance from CM. See: http://www.hindawi.com/journals/cfg/2004/805603/abs/
Mungall, C. J. (2004). Obol: Integrating Language and Meaning in Bio-Ontologies. Comparative and Functional Genomics, 5, 509-520.
Matching was straight entity matching to PO only; TO was not used.
The matches were appended to the file as additional columns.
Each match is a quad of:
1. PO ID
2. PO label
3. Match label (stemmed)
4. Match synonym scope
In general, if there were multiple matches, only the best one was shown (sometimes it is hard to disambiguate).
E.g., aerial root >STRUCTURE > match(PO:0000042; shoot-borne root, syn: aerial root, narrow) match(PO:0009005; root, syn: aerial root, narrow)
Note: this may reveal some synonyms that need to be revised in the PO. The synonym "aerial root" on PO:0009005 (root) seems odd.
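The match quads in the example row above can be parsed and compared programmatically. The sketch below is hypothetical. The regular expression fits only the `match(...)` text shown in the example. The scope ranking (exact before narrow or broad, then related) is an assumption and is not necessarily how OBOL picks its "best" match. Matches that tie on rank are all returned, which mirrors the aerial root case: both candidates are narrow synonyms, so the row is ambiguous.

```python
import re
from typing import NamedTuple


class Match(NamedTuple):
    po_id: str
    po_label: str
    match_label: str
    scope: str


# Pattern for the match(...) cells shown in the example row.
MATCH_RE = re.compile(
    r"match\((PO:\d{7});\s*([^,]+),\s*syn:\s*([^,]+?),\s*(\w+)\)"
)

# Hypothetical preference order; not necessarily how OBOL ranks matches.
SCOPE_RANK = {"exact": 0, "narrow": 1, "broad": 1, "related": 2}


def parse_matches(cell):
    """Extract every match quad from one row of the mapping file."""
    return [Match(*(g.strip() for g in m.groups())) for m in MATCH_RE.finditer(cell)]


def best_matches(matches):
    """Return all matches tied for the best scope rank (a tie means ambiguous)."""
    if not matches:
        return []
    best = min(SCOPE_RANK.get(m.scope.lower(), 99) for m in matches)
    return [m for m in matches if SCOPE_RANK.get(m.scope.lower(), 99) == best]


row = ("aerial root >STRUCTURE > "
       "match(PO:0000042; shoot-borne root, syn: aerial root,narrow) "
       "match(PO:0009005; root, syn: aerial root, narrow)")
ms = parse_matches(row)
print(len(best_matches(ms)))  # 2: both are narrow synonyms, so neither wins
```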
*Manual editing of the mapping file by RW to Release #16:
See detailed notes in the Excel file: "FNA_Term_glossary(HC 7-18-11)_to_PO_release16.xlsx"
RW should add a column to the spreadsheet showing whether a match was made by OBOL and then approved, made by OBOL and then rejected, or made by hand.
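The proposed provenance column could be filled automatically by comparing OBOL's suggestion with the final curated match. The sketch below is hypothetical. The enum labels are invented, and treating any curator choice that differs from OBOL's suggestion as "rejected" is an assumption.

```python
from enum import Enum
from typing import Optional


class MatchSource(Enum):
    # Hypothetical labels for the three cases named in the note.
    OBOL_APPROVED = "OBOL, approved"
    OBOL_REJECTED = "OBOL, rejected"
    MANUAL = "by hand"


def match_source(obol_po_id: Optional[str],
                 final_po_id: Optional[str]) -> Optional[MatchSource]:
    """Classify how the final PO mapping for one FNA term was obtained."""
    if obol_po_id is not None:
        if obol_po_id == final_po_id:
            return MatchSource.OBOL_APPROVED
        return MatchSource.OBOL_REJECTED  # curator kept none or chose another
    if final_po_id is not None:
        return MatchSource.MANUAL
    return None  # no OBOL match and none added by hand


print(match_source("PO:0000042", "PO:0000042").value)  # OBOL, approved
```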
How to improve automatic mapping:
Be more inclusive in lexical matching, e.g., match forms with alternate endings (OBOL may already do this to some extent).
Incorporate meaning into matching: is there some way to search the meaning in the definition, similar to searching the context of word use in FNA?
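Both ideas above can be prototyped in a few lines. The sketch below is hypothetical and is not how OBOL works. `candidate_forms` applies a small, invented set of plural-to-singular suffix rules that cover plural forms appearing in these notes (papillae, lamellae, cocci, indumenta). `definition_overlap` substitutes a simple technique for "searching the meaning": Jaccard overlap of content words between two definitions.

```python
import re

# Hypothetical plural -> singular suffix rules covering plural forms that
# appear in these notes (papillae, lamellae, cocci, indumenta).
SUFFIX_RULES = [
    ("ae", "a"),   # papillae -> papilla, lamellae -> lamella
    ("i", "us"),   # cocci -> coccus
    ("a", "um"),   # indumenta -> indumentum
    ("ies", "y"),
    ("es", ""),
    ("s", ""),
]

# Small illustrative stop-word list for the definition comparison.
STOPWORDS = {"a", "an", "the", "of", "and", "or", "in", "on", "to", "is", "its", "with"}


def candidate_forms(term):
    """Return the term plus singular forms produced by the suffix rules.

    The rules over-generate (e.g. 'papilla' -> 'papillum'); the label lookup
    in lexical_match filters out forms that are not actual ontology labels."""
    term = term.lower().strip()
    forms = {term}
    for plural, singular in SUFFIX_RULES:
        if term.endswith(plural) and len(term) > len(plural) + 2:
            forms.add(term[: -len(plural)] + singular)
    return forms


def lexical_match(term, labels):
    """Labels (or synonyms) equal to any candidate form of the term."""
    return sorted(candidate_forms(term) & labels)


def definition_overlap(def_a, def_b):
    """Jaccard overlap of content words in two definitions (0.0 to 1.0)."""
    a = set(re.findall(r"[a-z]+", def_a.lower())) - STOPWORDS
    b = set(re.findall(r"[a-z]+", def_b.lower())) - STOPWORDS
    return len(a & b) / len(a | b) if (a | b) else 0.0


labels = {"papilla", "root", "pollen wall"}
print(lexical_match("papillae", labels))  # ['papilla']
```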
Mapping results
See the summary of the initial mapping efforts on this page: PO-FNA mapping results
PO follow-up tasks
- add 364 synonyms to existing terms
- fix several errors that were discovered while doing the mappings
- add 143 unique new terms, plus their synonyms
  - FNA provides definitions, so this will be relatively easy.
  - A few of these are already on our list of terms to add, some requested by other users (e.g., phylloclade).
- create an official mapping file
Open questions
Need unique IDs for FNA terms
These are needed to deal with duplicates (multiple concepts with the same name).
Examples:
FNA:columella has two meanings. One maps to gynophore, the other maps to fruit columella
FNA:ligule has 4 different meanings (3 structures and 1 character)
plus many others
All agreed that it would be best to have unique IDs for FNA terms. It is not yet decided whether there should be one ID per definition (synonyms share the same ID) or one ID per term/definition combination (every unique word and every unique concept gets a different ID).
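The two ID schemes under discussion produce different numbers of IDs. The sketch below is hypothetical. The `FNA:0000001` ID format is invented, and the columella definitions are placeholders. The shared coccus/mericarp definition is taken from the obsolete-terms notes on this page. Under one ID per term/definition pair, the four entries get four IDs. Under one ID per definition, the coccus/mericarp synonyms collapse to a single ID, giving three.

```python
def assign_ids(entries, per_definition=False, prefix="FNA"):
    """Assign IDs to (term, definition) entries under either proposed scheme.

    per_definition=False: one ID per unique term/definition pair.
    per_definition=True:  one ID per unique definition (synonyms share an ID).
    The 'FNA:0000001' format is hypothetical."""
    ids = {}
    rows = []
    next_num = 1
    for term, definition in entries:
        key = definition if per_definition else (term, definition)
        if key not in ids:
            ids[key] = f"{prefix}:{next_num:07d}"
            next_num += 1
        rows.append((term, definition, ids[key]))
    return rows


SCHIZOCARP_SEGMENT = ("One of the segments of a dehisced schizocarp; "
                      "usually one-seeded and itself indehiscent.")

entries = [
    ("columella", "<sense 1; maps to gynophore>"),        # placeholder text
    ("columella", "<sense 2; maps to fruit columella>"),  # placeholder text
    ("coccus", SCHIZOCARP_SEGMENT),
    ("mericarp", SCHIZOCARP_SEGMENT),
]

for per_def in (False, True):
    rows = assign_ids(entries, per_definition=per_def)
    print(per_def, len({row[2] for row in rows}))  # False 4, then True 3
```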
FNA terms that map to >1 PO term
Examples:
FNA:apex matches to PO:bract apex, leaf apex, petal apex, petiole distal end, phyllome apex, reproductive shoot apex, sepal apex, shoot apex, tepal apex, vegetative shoot apex
FNA:blade matches to PO:lamina and PO:leaf lamina
When possible, we should create a general term (e.g., organ apex, lamina) and map the FNA term only to that general term. FNA can then use a reasoner to find all of its subtypes. If it is not possible to create a single general term (e.g., tendril), we will need to map to each term separately.
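Mapping FNA:apex to one general term works because its subtypes can be recovered from the is_a hierarchy. A full OWL reasoner does much more. The sketch below shows only the subclass part, as a breadth-first transitive closure over child-to-parent is_a edges. The hierarchy is hypothetical: "organ apex" is the proposed general term, petiole distal end is omitted, and the edges shown are not taken from a PO release.

```python
from collections import deque

# Hypothetical is_a edges (child -> parents). "organ apex" is the general
# term proposed in the notes; the hierarchy below it is illustrative only.
IS_A = {
    "phyllome apex": ["organ apex"],
    "shoot apex": ["organ apex"],
    "leaf apex": ["phyllome apex"],
    "bract apex": ["phyllome apex"],
    "petal apex": ["phyllome apex"],
    "sepal apex": ["phyllome apex"],
    "tepal apex": ["phyllome apex"],
    "reproductive shoot apex": ["shoot apex"],
    "vegetative shoot apex": ["shoot apex"],
}


def subtypes(term, is_a):
    """All direct and indirect subtypes of term (breadth-first search)."""
    children = {}
    for child, parents in is_a.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)
    found, queue = set(), deque([term])
    while queue:
        for child in children.get(queue.popleft(), []):
            if child not in found:
                found.add(child)
                queue.append(child)
    return found


mapping = {"FNA:apex": "organ apex"}  # a single mapping to the general term
print(len(subtypes(mapping["FNA:apex"], IS_A)))  # 9
```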
The mapping file should have a separate line for each unique concept pair (each line contains one FNA ID and one PO ID).
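A one-pair-per-line file can be produced by flattening each FNA term's list of PO targets. The sketch below is hypothetical. It uses the FNA:blade example from above, keeps the PO terms as labels (as these notes do), invents the FNA ID, and writes tab-separated output with the standard `csv` module.

```python
import csv
import io

# One FNA term with two PO targets (labels as written in the notes);
# the FNA ID is hypothetical.
multi = {("FNA:0000001", "blade"): ["PO:lamina", "PO:leaf lamina"]}


def to_rows(mappings):
    """Flatten {(fna_id, fna_term): [po, ...]} into one row per FNA/PO pair."""
    rows = []
    for (fna_id, fna_term), po_terms in mappings.items():
        for po in po_terms:
            rows.append((fna_id, fna_term, po))
    return rows


buf = io.StringIO()
writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
writer.writerow(["fna_id", "fna_term", "po_term"])
writer.writerows(to_rows(multi))
print(buf.getvalue())
```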
For future text mining, HC will try to include the parent structure (e.g., leaf base, rather than just base), so that it is clearer what to map to.
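Including the parent structure lets a lookup try the composed name before the bare term. The sketch below is hypothetical. The label set is a small invented sample. A `None` result flags terms like a bare "base" that cannot be mapped without context.

```python
# Hypothetical set of PO labels, for illustration only.
PO_LABELS = {"leaf base", "leaf apex", "shoot apex", "lamina"}


def contextual_lookup(term, parent, labels):
    """Try '<parent> <term>' first (e.g. 'leaf base'), then the bare term.

    Returns None when neither form is a known label, which flags terms
    such as a bare 'base' that need a parent structure to be mapped."""
    if parent:
        composed = f"{parent} {term}"
        if composed in labels:
            return composed
    return term if term in labels else None


print(contextual_lookup("base", "leaf", PO_LABELS))  # leaf base
print(contextual_lookup("base", None, PO_LABELS))    # None
```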
FNA terms that map to obsolete PO terms
- 8 terms, including duplicate plural forms; 5 are unique.
pyrene: obsolete. The hard inner portion of a drupe, consisting of osseous endocarp and included seed.
coccus, cocci, mericarp: obsolete. One of the segments of a dehisced schizocarp; usually one-seeded and itself indehiscent.
microphyll: obsolete (What about megaphyll? Currently mapped to vascular leaf. Microphyll could map there too.)
also:
multiple fruit: was mapped to obsolete PO:0020086 multiple fruit, but now maps as a synonym to PO:fruit
POC will discuss these at their weekly meetings.
- 3 have been replaced by GO terms:
papilla, papillae: map to obsolete PO:0020053. Could map to plant cell papilla (GO:0090395) or could be a character term (papillate).
wall: maps to obsolete pollen wall PO:0020059, now GO:0043667 (also maps to 6 other PO terms)
FNA should include mappings to GO in future text mining. RW included a list of GO terms in the mapping to PO terms, where appropriate.
Even if an FNA term maps to an obsolete PO term, that term still exists. There is usually something pointing from the obsolete term to a new term in PO or GO (in OBO files, a replaced_by or consider tag).
If we really need to, there are ways to bring obsolete terms back into circulation (by adding a replacement term).
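Redirecting a mapping away from an obsolete term can be automated if the ontology is read in OBO flat-file format. In that format, obsolete terms carry `is_obsolete: true` and point onward with `replaced_by` or `consider` tags. The sketch below is a minimal, hypothetical parser and is not a complete OBO implementation. The sample stanza for pollen wall is illustrative only: whether the real PO file uses `consider` or `replaced_by` for PO:0020059 is not confirmed here.

```python
SAMPLE_OBO = """\
[Term]
id: PO:0020059
name: pollen wall
is_obsolete: true
consider: GO:0043667

[Term]
id: PO:0009005
name: root
"""


def parse_obo_terms(text):
    """Map each [Term] id to a dict of tag -> list of values."""
    terms, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "[Term]":
            current = {}
        elif line.startswith("["):
            current = None  # skip other stanza types, e.g. [Typedef]
        elif current is not None and ":" in line:
            tag, _, value = line.partition(":")
            tag = tag.strip()
            value = value.split(" !")[0].strip()  # drop trailing "! comment"
            current.setdefault(tag, []).append(value)
            if tag == "id":
                terms[value] = current
    return terms


def current_targets(term_id, terms):
    """Return the term itself if live, else its replaced_by/consider targets."""
    tags = terms.get(term_id, {})
    if tags.get("is_obsolete", ["false"])[0] != "true":
        return [term_id]
    return tags.get("replaced_by", []) + tags.get("consider", [])


terms = parse_obo_terms(SAMPLE_OBO)
print(current_targets("PO:0020059", terms))  # ['GO:0043667']
```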
FNA terms that are too general for PO
101 character terms: areole 1, beak 1, diffuse root, blotch, scallop, etc.
12 terms that are too vague, but hard to define as characters: commissure, isthmus, lamella, lamellae, membrane, suture, tubercle, indument, indumentum, indumenta, vestiture, vesture
Some of these we may be able to map to PO terms (e.g., indument to epidermis or plant substance). Others can be mapped by including the parent structure from the context, e.g., a beak of what or a blotch on what.
Phenotype/character terms from FNA
- HC sent us the file: FNAv19Traits2PTO.xls
- begin work on phenotype/character terms, including the 101 from this list plus all of the FNA character terms