Initial Meeting - Aug 2011
Notes about the dataset:
From Mary Schaeffer by email:
- These are all new annotations, and there are a lot, as each gene model has some expression. I am lumping the putative splice variants into one model.
- For our 60 tissues in this set, there are some 52 distinct PO terms.
- One way to reduce the data size: look only at gene models that are not expressed in all tissues. This will reduce it by some 50%, but it is still a big dataset.
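The filtering idea in the last bullet can be sketched in a few lines of Python. This is only an illustration, assuming the expression calls are available as a mapping from gene model ID to the set of tissues where the model is expressed; the function name `drop_ubiquitous`, the gene model IDs, and the three toy tissues are hypothetical and not taken from the dataset.

```python
def drop_ubiquitous(expressed_in, all_tissues):
    """Keep only gene models that are NOT expressed in every tissue."""
    all_tissues = set(all_tissues)
    return {
        gene: tissues
        for gene, tissues in expressed_in.items()
        if not all_tissues <= set(tissues)
    }


# Toy input: three tissues instead of the 60 in the real set (hypothetical IDs).
tissues = {"leaf", "root", "silk"}
calls = {
    "GM_A": {"leaf", "root", "silk"},  # expressed everywhere, dropped
    "GM_B": {"leaf"},                  # tissue-specific, kept
    "GM_C": {"root", "silk"},          # kept
}
filtered = drop_ubiquitous(calls, tissues)
print(sorted(filtered))
```

How much the real file shrinks depends on the expression cut-off used, so the "some 50%" figure above should be taken from the data, not from this sketch.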
Mappings between the maize tissues and PO terms
From Mary Schaeffer by email:
Here is the table, pulled from MaizeGDB using a curation interface. The comments column includes both the PO terms and some other stuff. I am still reviewing this a bit, and see that I missed one, but this should be adequate to get some inputs from you on what associations you would like to receive from us.
The first column is the tissue name in MaizeGDB; the second, a description provided by the researchers; the third, the related maize-specific term; the last, other comments, mostly the string of PO accessions and term names used. The latter are also parsed into a separate extDB table so we can link to PO.org.
- General comments about the annotation process:
First, I linked each tissue to the maize-specific terms, which in turn are linked to PO, using PO definitions, and in some cases requesting tweaks to the definitions from your group to make them a better fit for a generic plant.
Then, I used these PO terms as a starting point, with extensive editing and review that included checking all terms against their definitions, especially the temporal ones, and with some requests to the PO to tweak definitions to better fit a generic case. Note: where the maize term silk in MaizeGDB is linked to all the terms for silk at PO, I will only use the Zea one for the association file (even though this seems smelly).
- General Note: The stages of maize growth used by the Kaeppler group are those from the extension booklet: Ritchie et al ‘How a Corn Plant Develops’, reprinted 2010. The stages used by MaizeGDB are from Abbe & Stein, with synonyms to the Ritchie staging, based on work by Pat Byrne at MaizeGDB in the early-mid 90’s. Kiesselbach was often used here, along with general Esau Plant Anatomy (the full version, not the recent watered-down edition). The images were useful for general staging, and I often asked one of the research dudes about kernel stages vs the number of days after pollination they show. They also have embryos in the freezer, and I have inquired about getting better staging on the embryos than days after pollination. In general, their staging was close to that for the Iowa booklet, but a bit earlier, by ~2 days.
- Specific note: The leaf number stage in maize is when the leaf is fully extended. (Typically about 2 more leaves are visible at this time, without pulling apart the plant.) So when it says V3, this matches the 5 leaves visible stage in PO, etc.
- Note, the links to PO from each of the Atlas tissues are mostly not yet in production; most of my finalizing of the annotations was done after July 1.
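The leaf-stage rule in the specific note above (V3 corresponds to 5 leaves visible) can be written as a small conversion. This is a sketch only: the function name `leaves_visible_for_v_stage` is hypothetical, and the offset of 2 is the "typically about 2" from the note, so it is an approximation that can vary with genotype and environment.

```python
import re

# Assumption from the note above: about 2 more leaves are visible than are
# fully extended. The offset is typical, not exact.
VISIBLE_LEAF_OFFSET = 2


def leaves_visible_for_v_stage(v_stage, offset=VISIBLE_LEAF_OFFSET):
    """Convert a maize V stage such as 'V3' to a count of visible leaves."""
    match = re.fullmatch(r"V(\d+)", v_stage.strip().upper())
    if match is None:
        raise ValueError(f"not a V stage: {v_stage!r}")
    return int(match.group(1)) + offset


print(leaves_visible_for_v_stage("V3"))  # 5, matching the example in the note
```

The result is only a leaf count; which PO term carries that count still has to be looked up in the ontology.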
Questions
From MS by email:
- In a few cases, there is a classical gene name for the gene model. I assume these could be supplied as synonyms? Or, would you prefer they be supplied as a separate row?
Need to look at this, and ask PJ
RW: Columns 10 and 11 allow for specifying a gene name and a synonym. Often the gene symbol goes in column 11 as synonym, but column 11 can have cardinality >1, so you could put both symbol and classic gene name in that column.
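As a rough illustration of RW's suggestion, the gene symbol and the classical gene name can both go into column 11, separated by a pipe as in GAF 2.0. The helper name `synonym_field` and the example values are hypothetical.

```python
def synonym_field(*synonyms):
    """Join synonyms for GAF column 11 using '|', dropping blanks and duplicates."""
    seen = []
    for name in synonyms:
        name = (name or "").strip()
        if name and name not in seen:
            seen.append(name)
    return "|".join(seen)


# Hypothetical gene model with a symbol and a classical gene name.
print(synonym_field("abc1", "classical name 1"))
```

Gene models without a classical name simply yield a shorter (or empty) field, so no separate row is needed.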
- Do you still wish to have separate files for anatomy and growth terms?
I think that might be a good idea as well to make the huge file easier to deal with.
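One possible way to produce the separate files is to group rows by the aspect column (column 9 in GAF 2.0). The sketch below assumes a tab-delimited file and groups by whatever value appears in that column; the codes "A" (anatomy) and "G" (growth stage) and the placeholder term ID `PO:0000000` are assumptions made for the example and should be checked against the PO file format page.

```python
from collections import defaultdict

ASPECT_COL = 8  # column 9 (aspect) of a tab-delimited row, counted from 0


def split_by_aspect(lines):
    """Group association rows by the value of the aspect column."""
    groups = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("!"):  # skip blanks and comment lines
            continue
        fields = line.rstrip("\n").split("\t")
        groups[fields[ASPECT_COL]].append(line)
    return dict(groups)


def make_row(term, aspect, width=17):
    """Build a mostly empty 17-column row; only term and aspect are filled."""
    fields = [""] * width
    fields[4] = term  # column 5: ontology term ID
    fields[ASPECT_COL] = aspect
    return "\t".join(fields)


rows = [
    "!gaf-version: 2.0",
    make_row("PO:0000000", "A"),  # placeholder ID, assumed anatomy code
    make_row("PO:0007063", "G"),  # LP.07 7 leaves visible, assumed growth code
    make_row("PO:0000000", "A"),
]
groups = split_by_aspect(rows)
for aspect, members in sorted(groups.items()):
    print(aspect, len(members))
```

Each group can then be written to its own file, which keeps the individual uploads smaller.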
- Note, the instructions on the wiki for field 13 (TAXON) deal with field 12 and should be altered.
I am not sure I understand to which page you are referring. The info shown here: http://wiki.plantontology.org:8080/index.php/Annotation_File_Format looks like it matches the GO page (http://www.geneontology.org/GO.format.gaf-2_0.shtml), as it should. Could you please send the link?
RW: I see what the problem was (wrong description for taxon column), and I have fixed it.
Question about column 16 – annotation extension, in the PO associations file:
- Could this be used for a staging based on number of leaves that corresponds to other stages in maize?
Example: “leaf tip expanding V7 B73” corresponds to maize growth stage: “2 tassel initiation/early whorl stage”.
- Should a term for tassel initiation be added here, for the association of “PO:0007063 LP.07 7 leaves visible”?
From PJ via email:
In the meantime, I think adding the info in column 16 would be a good idea. Unless we had it work in PO, you may want to check for consistency. I know it can change depending on the genotype and growth environment, but this is what we added when we created the growth and development section.
- ear initiation at maize V7 stage: [1]
- tassel initiation is probably part of V4-V6: [2]
RW: Use of column 16 is still fairly new in the ontology community. Based on the practice by GO, PO has defined column 16 for annotation extensions, that is, user-created or on-the-fly cross products. This means that column 16 must contain both a relation and an ontology term. (See PO_Annotation_Extensions_(column_16) for more details). Based on this usage, column 16 should not be used to link two growth stages that occur at the same time.
If two growth stages (e.g., one vegetative and one reproductive) always co-occur, the PO could create a relation to specify that. However, I suspect that such a relation would almost never occur across all species, and, as Pankaj pointed out, probably isn't even universal within one species, depending on genotype and environment.
It looks like the PO already has the synonyms for most of the reproductive stages that co-occur with the vegetative stages for maize (see links above). If not, we could add the synonyms you need. If there actually are two separate growth stages in the PO (or if there should be) and you want to link gene expression in a single tissue that corresponds to both of the stages, I suggest creating a separate line in the association file for each stage, as you have indicated in the mapping spreadsheet.
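The two points in RW's reply can be sketched as follows: a loose check that a column 16 value contains both a relation and an ontology term, and the alternative of writing one association per stage. The pattern is an assumption for illustration and not the official grammar; the relation name `some_relation`, the gene model ID `GM_A`, and the placeholder term `PO:0000000` are hypothetical.

```python
import re

# Loose pattern for a single extension: relation(PREFIX:ID). This is an
# assumption for illustration, not the official grammar.
EXTENSION = re.compile(r"[A-Za-z_]+\([A-Za-z_]+:[A-Za-z0-9_]+\)")


def is_extension(value):
    """True if value has both a relation and an ontology term."""
    return EXTENSION.fullmatch(value) is not None


def one_row_per_stage(gene_model, stage_terms):
    """Return one (gene model, stage term) pair per growth stage."""
    return [(gene_model, term) for term in stage_terms]


print(is_extension("some_relation(PO:0007063)"))  # relation name is hypothetical
print(is_extension("PO:0007063"))                 # bare term, no relation
print(one_row_per_stage("GM_A", ["PO:0007063", "PO:0000000"]))
```

A bare stage term fails the check, which is why co-occurring stages are better expressed as separate rows than squeezed into column 16.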
Issues and concerns
- From the POC conference call 8-2-11:
- Use of column 16 to designate the different stage descriptions in different sources
- Documentation of the statistical analysis and cut-off used for the microarray data: is this published yet?
Plan of action:
- Mary will work with JE to get SVN access set up (done).
- PO will review the mappings between the maize samples (60) and the PO terms (~52). MS sent us the mappings as a spreadsheet and we discussed them on the POC conference call 8-2-11.
- Do we need to add or modify any existing PO terms? Are we going to proceed with getting rid of the Zea "sensu" terms?
- Mary will upload a small file first (perhaps the annotations to the structure terms?) and then upload the larger file.