PO PlantSystematics conference call 10-21-11
Meeting to discuss technical issues of linking between PO and PS.org.
Present: Kevin Nixon, Pankaj Jaiswal, Justin Elser, Justin Preece, Ramona Walls
JP: Currently no way of automatically putting in links to keywords -- means manual curation, which is time consuming and can lead to errors.
PJ: if we had a list of keywords, we could create a mapping and just maintain that, then use that mapping to create links both from PO to PS.org and from PS.org to PO
Mapping could be hosted on PO SVN site. We have a mappings folder already. SN can have access to this.
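Illustration of the mapping idea (a minimal sketch, not an agreed design): one keyword-to-PO-ID table drives links in both directions. The two-column tab-separated layout, the base URLs, the query parameter names, and the function names are all assumptions made for this example; the real mapping format is the one already on SVN, and the IDs shown are for illustration only.

```python
import csv
import io
from urllib.parse import urlencode

# Hypothetical mapping content: keyword <TAB> PO ID, one pair per line.
MAPPING_TSV = "flower\tPO:0009046\nleaf\tPO:0025034\n"

# Placeholder base URLs; these are NOT the real addresses of either site.
PS_SEARCH_URL = "http://ps.example.org/search"
PO_TERM_URL = "http://po.example.org/term"


def load_mapping(text):
    """Read the tab-separated mapping into a {keyword: po_id} dict."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return {row[0].strip().lower(): row[1].strip()
            for row in reader if len(row) >= 2}


def po_to_ps_link(po_id, mapping):
    """Link from a PO term to a PS.org keyword search (None if unmapped)."""
    keywords = sorted(k for k, v in mapping.items() if v == po_id)
    if not keywords:
        return None
    return PS_SEARCH_URL + "?" + urlencode({"keywords": "&".join(keywords)})


def ps_to_po_link(keyword, mapping):
    """Link from a PS.org keyword to its PO term page (None if unmapped)."""
    po_id = mapping.get(keyword.strip().lower())
    if po_id is None:
        return None
    return PO_TERM_URL + "?" + urlencode({"id": po_id})
```

Because both link directions are derived from the same table, only the table needs to be maintained.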
If an image on PS.org has an association to PO, we could use that to automatically create associations.
Advantage of mapping/using keywords rather than links to individual images: Don't want to create xrefs for keywords (like flower) that have hundreds of images.
Potential problem is false positives, if for example keyword is used incorrectly or doesn't exactly match meaning of PO definition.
KN: Contributors don't always put in keywords. This could be fixed by having students or others upgrade PS.org by adding keywords.
PS.org presently can't do Boolean searches. PO could input a list of words, which PS.org could then split into separate terms and pad with wildcards. No need for any additional association file or mappings.
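Sketch of the split-and-wildcard step described above. The accepted separators, the `*` wildcard character, and the function name are assumptions; the actual PS.org search syntax was not specified on the call.

```python
import re


def to_wildcard_terms(words, wildcard="*"):
    """Split a word list into separate terms and wrap each in wildcards."""
    # Accept commas, ampersands, semicolons or whitespace as separators.
    terms = [t for t in re.split(r"[,&;\s]+", words.strip()) if t]
    return [f"{wildcard}{t.lower()}{wildcard}" for t in terms]
```

For example, `to_wildcard_terms("stamen, anther filament")` gives `["*stamen*", "*anther*", "*filament*"]`.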
KN could add flags to images that match keywords, then we could click on which images we want to keep -- to avoid false positives.
KN could create a table of keywords and IDs; we would keep it updated with our list of IDs.
PO is trying to encourage people to use standard vocabulary -- this is why we want to have links to PO terms/IDs.
Could begin mappings, but still have fuzzy search that will complement this.
PJ: (side note) working on software (with collaborator at OSU) to build sectors on images and automatically identify what is on image based on a library.
We have two needs: 1. link to images from PO; 2. link all PS.org images to PO IDs.
KN: First step is to implement a search that is sufficient given existing data structure, then move toward system that integrates PO by downloading a table from PO that integrates with his terms.
KN can start by sending us a list of his keywords. But many of them won't be relevant for PO.
PO will use this to create a mapping file.
Need to have a mechanism in place so that additions and changes are updated automatically as much as possible.
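One possible piece of such a mechanism, as a sketch only: compare the latest keyword list from PS.org against the mapping file and report what needs attention. The function name and the "unmapped"/"stale" labels are invented for this example, and mapping keys are assumed to be lowercase. Unmapped keywords would still need manual review, since many of them may not be relevant for PO.

```python
def mapping_changes(current_keywords, mapping):
    """Compare the current keyword list with the mapping file contents.

    Returns keywords with no mapping yet ("unmapped") and mapping entries
    whose keyword no longer appears in the list ("stale").
    """
    keywords = {k.strip().lower() for k in current_keywords if k.strip()}
    mapped = set(mapping)  # mapping keys are assumed to be lowercase
    return {
        "unmapped": sorted(keywords - mapped),
        "stale": sorted(mapped - keywords),
    }
```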
One option (to start smaller) is to start with only terms associated with e.g. flower and leaf to see how far we can extend from there -- images that have keyword flower and any keyword associated with it.
Start by getting keywords from KN and doing a candidate mapping.
Standard format of mapping file is on SVN -- we will use same format.
PO is using UTF8 (not unicode) because AmiGO requires UTF, but we may go to unicode in the future.
KN: IP for pages on PS.org will be moving, so we want to stick with www.plantsystematics domain - going onto gigabit server.
KN: will send a list of keywords, work on the search function, and give us a different URL for it.
We will input a list of words, delimited however we want (& or whatever), and KN will write a query that takes those words, does a fuzzy search on them, and then creates the link to the correct search results page. He already has something like this for taxonomic names (it works with missing letters and internal errors) and can transfer it to keyword search, even though he hasn't done so yet.
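To illustrate the kind of fuzzy keyword matching discussed, here is a sketch using Python's standard `difflib` module. This is a stand-in technique, not KN's existing taxonomic-name matcher, whose method was not described. The keyword list, the 0.8 cutoff, and the function name are assumptions.

```python
import difflib

# Hypothetical keyword list; the real one would come from PS.org.
KEYWORDS = ["stamen", "anther", "filament", "flower", "leaf"]


def fuzzy_lookup(word, keywords=KEYWORDS, cutoff=0.8):
    """Return the closest known keyword, or None if nothing is close enough.

    Tolerates a missing letter or a small internal error in the input.
    """
    matches = difflib.get_close_matches(word.lower(), keywords, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

With this list, `fuzzy_lookup("stamn")` resolves to `"stamen"` and `fuzzy_lookup("fillament")` to `"filament"`, while an unrelated string such as `"xyz"` returns `None`. A lower cutoff would catch more misspellings at the cost of more false positives.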
Will need to figure out what delimiter to use to separate words: e.g., stamen&anther&filament
May want to have something like po_word:
Eventually, we could add other variables, like: po_word:something;taxon:something.
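Sketch of how a query in this style could be parsed, assuming `;` separates fields, `:` separates a field name from its value, and `&` separates words within a value. None of these delimiters was settled on the call, and the function name is invented for this example.

```python
def parse_query(query):
    """Parse e.g. 'po_word:stamen&anther;taxon:Rosa' into a dict of word lists."""
    fields = {}
    for part in query.split(";"):
        if not part.strip():
            continue  # ignore empty segments such as a trailing ';'
        key, sep, value = part.partition(":")
        if not sep:
            raise ValueError(f"missing ':' in field {part!r}")
        fields[key.strip()] = [w.strip() for w in value.split("&") if w.strip()]
    return fields
```

One point to weigh when choosing the delimiter: `&` and `;` already have a meaning in URL query strings, so if the word list is passed in a URL these characters would have to be percent-encoded, or a different delimiter chosen.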
May also need to think about window dimension later, if we decide to incorporate a pop-up from PO site.