PO PlantSystematics conference call 10-21-11
Meeting to discuss technical issues of linking between PO and PS.org.
Present: Kevin Nixon, Pankaj Jaiswal, Justin Elser, Justin Preece, Ramona Walls
notes from meeting
JP: Currently there is no way of automatically adding links to keywords -- this means manual curation, which is time-consuming and can lead to errors.
PJ: if we had a list of keywords, we could create a mapping and just maintain that, then use that mapping to create links both from PO to PS.org and from PS.org to PO
Mapping could be hosted on the PO SVN site. We have a mappings folder already. KN can have access to this.
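A minimal sketch of how one maintained mapping could drive links in both directions. The two-column tab-separated layout and both URL patterns are placeholders invented for illustration (the real mapping format is the one already on SVN, and no PS.org search URL had been agreed on during the call); the PO IDs are shown as examples and should be checked against the ontology.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical mapping content: PS.org keyword <TAB> PO ID, one pair per line.
MAPPING_TSV = "flower\tPO:0009046\nleaf\tPO:0025034\n"

# Placeholder URL patterns; neither is a confirmed endpoint.
PS_SEARCH_URL = "http://www.example.org/ps_search?po_word={keywords}"
PO_TERM_URL = "http://www.example.org/po_term?id={po_id}"


def load_mapping(text):
    """Return two lookups: keyword -> PO ID and PO ID -> list of keywords."""
    kw_to_po = {}
    po_to_kw = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or incomplete lines
        keyword, po_id = row[0].strip().lower(), row[1].strip()
        kw_to_po[keyword] = po_id
        po_to_kw.setdefault(po_id, []).append(keyword)
    return kw_to_po, po_to_kw


def po_to_ps_link(po_id, po_to_kw):
    """Link from a PO term page to a PS.org keyword search, or None if unmapped."""
    keywords = po_to_kw.get(po_id)
    if not keywords:
        return None
    # Percent-encode the joined keywords so the delimiter survives in a URL.
    return PS_SEARCH_URL.format(keywords=quote("&".join(keywords), safe=""))


def ps_to_po_link(keyword, kw_to_po):
    """Link from a PS.org keyword to its PO term page, or None if unmapped."""
    po_id = kw_to_po.get(keyword.strip().lower())
    if po_id is None:
        return None
    return PO_TERM_URL.format(po_id=quote(po_id, safe=":"))
```

Because both lookups are built from the same file, only the mapping itself would need to be maintained.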
If an image on PS.org has an association to PO, we could use that to automatically create associations.
Advantage of mapping/using keywords rather than links to individual images: Don't want to create xrefs for keywords (like flower) that have hundreds of images.
A potential problem is false positives, for example if a keyword is used incorrectly or doesn't exactly match the meaning of the PO definition.
KN: Contributors don't always put in keywords. This could be fixed by having students or others upgrade PS.org by adding keywords.
PS.org presently can't do Boolean searches. PO could input a list of words, which PS.org could then split into separate terms and add wildcards to. No need for any additional association file or mappings.
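A rough sketch of the split-and-wildcard idea, assuming an "&" delimiter, a "*" wildcard, and OR semantics (an image matches if any term matches). None of these choices were settled on the call, and the function names are hypothetical.

```python
import fnmatch


def to_wildcard_terms(word_list, delimiter="&", wildcard="*"):
    """Split a delimited word list into unique terms wrapped in wildcards."""
    terms = []
    for raw in word_list.split(delimiter):
        term = raw.strip().lower()
        if term and term not in terms:
            terms.append(term)
    return [f"{wildcard}{term}{wildcard}" for term in terms]


def matches_any(image_keywords, wildcard_terms):
    """True if any image keyword matches any wildcard pattern (OR semantics)."""
    return any(
        fnmatch.fnmatchcase(keyword.lower(), pattern)
        for keyword in image_keywords
        for pattern in wildcard_terms
    )
```

For example, `to_wildcard_terms("stamen&anther&filament")` yields three patterns, and an image tagged "Stamens" would match the first of them.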
KN could add flags to images that match keywords, then we could click on which images we want to keep -- to avoid false positives.
KN could create a table of keywords and IDs; we would keep it updated with our list of IDs.
PO is trying to encourage people to use standard vocabulary -- this is why we want to have links to PO terms/IDs.
Could begin mappings, but still have fuzzy search that will complement this.
PJ: (side note) working on software (with collaborator at OSU) to build sectors on images and automatically identify what is on image based on a library.
We have two needs: 1. link to PS.org images from PO 2. link PS.org images to PO IDs.
KN: First step is to implement a search that is sufficient given existing data structure, then move toward system that integrates PO by downloading a table from PO that integrates with his terms.
KN can start by sending us a list of his keywords. But many of them won't be relevant for PO.
PO will use this to create a mapping file.
Need to have a mechanism in place so that additions and changes are updated automatically as much as possible.
One option (to start smaller) is to start with only terms associated with e.g. flower and leaf to see how far we can extend from there -- images that have keyword flower and any keyword associated with it.
Start by getting keywords from KN and doing a candidate mapping.
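One possible way to produce a candidate mapping is exact, case-insensitive matching of each PS.org keyword against PO term names and synonyms, leaving everything else for manual curation. The input structure, the function name, and the sample synonyms below are made up for illustration; they are not taken from the PO files.

```python
def candidate_mapping(keywords, po_terms):
    """Match keywords to PO IDs by term name or synonym (case-insensitive).

    po_terms is assumed to look like:
        {"PO:0009046": {"name": "flower", "synonyms": ["..."]}}
    Returns (mapped, unmapped): keyword -> sorted PO IDs, plus leftovers.
    """
    label_index = {}
    for po_id, term in po_terms.items():
        for label in [term["name"], *term.get("synonyms", [])]:
            label_index.setdefault(label.strip().lower(), set()).add(po_id)

    mapped = {}
    unmapped = []
    for keyword in keywords:
        hits = label_index.get(keyword.strip().lower())
        if hits:
            mapped[keyword] = sorted(hits)
        else:
            unmapped.append(keyword)  # needs a curator's decision
    return mapped, unmapped


# Illustrative input only; the synonyms are invented for the example.
sample_terms = {
    "PO:0009046": {"name": "flower", "synonyms": ["example flower synonym"]},
    "PO:0025034": {"name": "leaf", "synonyms": []},
}
mapped, unmapped = candidate_mapping(["Flower", "leaf", "habitat"], sample_terms)
```

Keywords that land in `unmapped` (here "habitat") are the ones likely to be irrelevant for PO or to need a manual match.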
Standard format of mapping file is on SVN -- we will use same format.
PO is using UTF-8 (not another Unicode encoding) because AmiGO requires UTF-8, but we may switch encodings in the future.
KN: The IP address for pages on PS.org will be changing because the site is moving to a gigabit server, so links should stick with the www.plantsystematics.org domain name.
KN: will send a list of keywords, work on the search function, and give us a different URL for it.
We will input a list of words, delimited however we want (& or whatever), and KN will write a query that takes those words, does a fuzzy search on them, and then creates the link to the correct search results page. He already has something like this for taxonomic names that works with missing letters and internal errors; he can transfer it to keyword search, even though he hasn't done so yet.
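To illustrate what a fuzzy keyword lookup tolerant of missing letters and internal errors might look like, here is a small sketch using Python's standard `difflib`. This is a stand-in technique, not KN's existing taxonomic-name matcher; the cutoff value and function name are assumptions.

```python
import difflib


def fuzzy_keyword_matches(query, known_keywords, cutoff=0.8, limit=5):
    """Return known keywords that closely resemble the query, best match first."""
    # Compare in lowercase but return the keywords in their original form.
    lowered = {keyword.lower(): keyword for keyword in known_keywords}
    hits = difflib.get_close_matches(
        query.strip().lower(), list(lowered), n=limit, cutoff=cutoff
    )
    return [lowered[hit] for hit in hits]
```

With the keyword list `["stamen", "anther", "filament"]`, a query with a missing letter such as "stamn" would still find "stamen", while unrelated words fall below the cutoff.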
Will need to figure out what delimiter to use to separate words: e.g., stamen&anther&filament
May want to have something like po_word:
Eventually, we could add other variables, like: po_word:something;taxon:something.
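A sketch of how the proposed query string could be built and parsed, assuming ";" separates fields, ":" separates a field name from its values, and "&" separates words, as in the examples above. The base URL, the `q` parameter name, and the taxon value are placeholders. Since "&" normally separates parameters in a URL query string, the sketch percent-encodes the whole query; that is one option to consider when choosing the delimiter, not a decision from the call.

```python
from urllib.parse import quote, unquote

FIELD_SEP = ";"  # between fields, e.g. po_word:...;taxon:...
KEY_SEP = ":"    # between a field name and its values
WORD_SEP = "&"   # between words within one field


def build_query(**fields):
    """Build e.g. 'po_word:stamen&anther;taxon:Rosa' from keyword arguments."""
    parts = []
    for key, words in fields.items():
        if isinstance(words, str):
            words = [words]
        parts.append(f"{key}{KEY_SEP}{WORD_SEP.join(words)}")
    return FIELD_SEP.join(parts)


def parse_query(query):
    """Inverse of build_query: return a dict of field name -> list of words."""
    result = {}
    for part in query.split(FIELD_SEP):
        if not part:
            continue
        key, _, value = part.partition(KEY_SEP)
        result[key] = [word for word in value.split(WORD_SEP) if word]
    return result


def build_url(base, query):
    """Attach the percent-encoded query to a (placeholder) search URL."""
    return f"{base}?q={quote(query, safe='')}"
```

For instance, `build_query(po_word=["stamen", "anther", "filament"])` gives `po_word:stamen&anther&filament`, and `parse_query` recovers the word list on the receiving side after the URL parameter is decoded.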
May also need to think about window dimension later, if we decide to incorporate a pop-up from PO site.
JE: important to test out the links first on our browser, to make sure they work, before we implement this on a large scale.
tasks
short term:
- KN will send list of all PS.org keywords to RW at PO.
- RW will work with JP to create a mapping file between PS.org keywords and PO IDs. This will go on PO's SVN and will follow standard mapping file format.
- JP, JE and RW will create a few test cases of keyword searches (e.g., some with a single word, some with multiple words) and send these to KN so he can figure out the best format for the URL.
- JE will create the links from pages on the dev version of PO's AmiGO browser, to make sure they work properly.
longer term:
- PO will work on method of automatically adding links to images from PO based on PO term names and synonyms.
- PO will work with KN to devise a strategy for associating PO IDs directly with PS.org images. These can then be used to link back to PO.