Difference between revisions of "Initial Meeting- Aug 2011"
Latest revision as of 21:14, 11 April 2012
Back to main page: MaizeGDB
Notes about the dataset:
From Mary Schaeffer by email:
- These are all new annotations – and there are a lot, as each gene-model has some expression. I am lumping the putative splice variants to one model.
- For our 60 tissues for this set, the number of PO terms is some 52 distinct ones.
- one way to reduce the data size: look only at gene models that are not expressed in all tissues – this will reduce by some 50% but it is still a big dataset.
There was concern that leaving out these gene models (e.g., housekeeping genes) would lead to strange results if, for example, a user searches for all genes expressed in a tissue and doesn't find common genes expressed there.
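The concern can be illustrated with a small sketch. The gene names, tissue names, and expression values below are made up for illustration, and treating a log2 value at or above 7 as "expressed" is an assumption borrowed from the cutoff discussed later in these notes.

```python
from typing import Dict, List

CUTOFF = 7.0  # assumed log2 cutoff, for illustration only

Expression = Dict[str, Dict[str, float]]  # gene -> tissue -> log2 value


def drop_ubiquitous(expr: Expression, cutoff: float = CUTOFF) -> Expression:
    """Keep only gene models that are NOT expressed in every tissue."""
    return {
        gene: values
        for gene, values in expr.items()
        if not all(v >= cutoff for v in values.values())
    }


def genes_expressed_in(expr: Expression, tissue: str, cutoff: float = CUTOFF) -> List[str]:
    """Return the gene models whose value in `tissue` meets the cutoff."""
    return sorted(
        gene
        for gene, values in expr.items()
        if values.get(tissue, float("-inf")) >= cutoff
    )


# Hypothetical data: gene_A behaves like a housekeeping gene.
expr = {
    "gene_A": {"anther": 12.1, "leaf": 11.8, "silk": 12.5},
    "gene_B": {"anther": 9.3, "leaf": 5.2, "silk": 6.0},
    "gene_C": {"anther": 5.0, "leaf": 10.4, "silk": 5.5},
}

reduced = drop_ubiquitous(expr)
full_hits = genes_expressed_in(expr, "anther")
reduced_hits = genes_expressed_in(reduced, "anther")
print(full_hits, reduced_hits)
```

In the reduced set, a search for genes expressed in anther no longer returns gene_A, even though it is expressed there. This is the kind of surprising result the group was worried about.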
Mappings between the maize tissues and PO terms
From Mary Schaeffer by email:
Here is the table, pulled from MaizeGDB using a curation interface. The comments column includes both the PO terms and some other stuff. I am still reviewing this a bit, and see that I missed one, but this should be adequate to get some inputs from you on what associations you would like to receive from us.
The first column is the tissue name in MaizeGDB; the second, a description provided by the researchers; the third, the related maize-specific term; the last, the other comments, mostly the string of PO accessions and term names used. These latter are also parsed into a separate extDB table so we can link to PO.org.
- General comments about the annotation process:
First, I linked each tissue to the maize-specific terms, which in turn are linked to PO, using PO definitions, and in some cases requesting tweaks from your group in the definitions to make a better fit to a generic plant.
Then, I used these PO terms as a starting point, with extensive editing and review that included checking all terms against their definitions, especially the temporal ones, and with some requests to the PO to tweak definitions to better fit a generic case. Note, where the maize term silk in MaizeGDB is linked to all the terms for silk at PO, I will only use the Zea one for the association file (even though this seems smelly).
- General Note: The stages of maize growth used by the Kaeppler group are those from the extension booklet: Ritchie et al ‘How a Corn Plant Develops’ reprinted 2010. The stages used by MaizeGDB are from Abbe & Stein, with synonyms to the Ritchie staging, based on work by Pat Byrne at MaizeGDB in the early-mid 90’s. Kiesselbach was often used here, along with general Esau Plant Anatomy (the full version, not the recent watered-down edition). The images were useful for general staging, and I often asked one of the research dudes re kernel stages, vs the #days after pollination they show. They also have embryos in the freezer, and I have inquired about getting better staging on the embryos than days after pollination. In general, their staging was close to that of the Iowa booklet, but a bit earlier, by ~2 days.
- Specific note: The leaf number stage in maize is when the leaf is fully extended. Typically about 2 more leaves are visible at this time (without pulling apart the plant). So when it says V3, this matches the 5 leaves visible stage in PO, etc.
- Note, the links to PO from each of the Atlas tissues are mostly not yet in production; most of my finalizing of the annotations was done after July 1.
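The leaf-number convention in the specific note above amounts to a simple offset. A minimal sketch, assuming the "about 2 more leaves" rule holds exactly (the note itself says it is only typical); the function name is hypothetical.

```python
LEAVES_AHEAD = 2  # "about 2 more leaves are visible"; an approximation


def leaves_visible_for_v_stage(v_stage: int) -> int:
    """Map a maize V stage (fully extended leaves) to the approximate
    number of visible leaves used by the PO 'leaves visible' stages."""
    if v_stage < 1:
        raise ValueError("V stage must be a positive integer")
    return v_stage + LEAVES_AHEAD


print(leaves_visible_for_v_stage(3))
```

Because the offset can vary with genotype and environment, a mapping produced this way would still need manual review.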
PO wanted to have the comments from the Maize Atlas viewable by users. MaizeGDB has a page for each tissue, which includes these comments and more. We need to figure out how to link annotations to those pages, because normally an annotation links only to the gene page in each database. Maybe we should add them as xrefs to the corresponding PO terms?
Here is the link to their tissue pages: Maize Developmental Gene Atlas (http://www.maizegdb.org/cgi-bin/termdoclist.cgi?ref=9021423&type=32466)
There is a page for each tissue, for example: anthers R1 B73 (http://www.maizegdb.org/cgi-bin/termrefs.cgi?id=2366347), with links to the PO terms and photos.
PO should go through the tissue pages and make sure the references to PO terms are correct.
Questions
From MS by email:
- In a few cases, there is a classical gene name for the gene model. I assume these could be supplied as synonyms? Or, would you prefer they be supplied as a separate row?
Need to look at this, and ask PJ
RW: Columns 10 and 11 allow for specifying a gene name and a synonym. Often the gene symbol goes in column 11 as synonym, but column 11 can have cardinality >1, so you could put both symbol and classic gene name in that column.
- Do you still wish to have separate files for anatomy and growth terms?
I think that might be a good idea as well to make the huge file easier to deal with.
- Note, the instructions on the wiki for field 13. TAXON deal with Field 12 and should be altered.
This has been fixed.
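RW's suggestion about columns 10 and 11 can be sketched as follows. This assumes the GAF 2.0 layout of 17 tab-separated columns, with multiple synonyms in column 11 separated by pipes; the helper names and the gene model name are hypothetical, and all other columns are left blank here.

```python
from typing import Iterable, List

GAF_COLUMNS = 17  # GAF 2.0 has 17 tab-separated columns


def synonym_field(names: Iterable[str]) -> str:
    """Join synonyms with '|' (column 11 allows more than one value),
    dropping blanks and duplicates while keeping the original order."""
    seen: List[str] = []
    for name in names:
        name = name.strip()
        if name and name not in seen:
            seen.append(name)
    return "|".join(seen)


def gaf_row(object_name: str, synonyms: Iterable[str]) -> List[str]:
    """Build a row with only columns 10 and 11 filled in."""
    row = [""] * GAF_COLUMNS
    row[9] = object_name               # column 10: object name
    row[10] = synonym_field(synonyms)  # column 11: synonym(s)
    return row


# Gene symbol and classical gene name both go in column 11.
row = gaf_row("GENE_MODEL_1", ["vp1", "viviparous1"])
print("\t".join(row))
```

Putting both values in one column keeps a single row per association, rather than a separate row for the classical name.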
Question about column 16 – annotation extension, in the PO associations file:
- Could this be used for a staging based on numbers of leaves, that corresponds to other stages in maize?
Example: “leaf tip expanding V7 B73” corresponds to maize growth stage: “2 tassel initiation/early whorl stage”.
- Should a term for tassel initiation be added here, for the association of “PO:0007063 LP.07 7 leaves visible”?
From PJ via email:
In the meantime, I think adding the info in column 16 would be a good idea. Unless we had it work in PO, you may want to check for consistency. I know it can change depending on the genotype and growth environment, but this is what we added when we created the growth and development section.
- ear initiation at maize V7 stage: [1]
- tassel initiation is probably part of V4-V6: [2]
RW: Use of column 16 is still fairly new in the ontology community. Based on the practice by GO, PO has defined column 16 for annotation extensions, that is, user-created or on-the-fly cross products. This means that column 16 must contain both a relation and an ontology term. (See PO_Annotation_Extensions_(column_16) for more details). Based on this usage, column 16 should not be used to link two growth stages that occur at the same time.
If two growth stages (e.g., one vegetative and one reproductive) always co-occur, the PO could create a relation to specify that. However, I suspect that such a relation would almost never occur across all species, and, as Pankaj pointed out, probably isn't even universal within one species, depending on genotype and environment.
It looks like the PO already has the synonyms for most of the reproductive stages that co-occur with the vegetative stages for maize (see links above). If not, we could add the synonyms you need. If there actually are two separate growth stages in the PO (or if there should be) and you want to link gene expression in a single tissue that corresponds to both of the stages, I suggest creating a separate line in the association file for each stage, as you have indicated in the mapping spread sheet.
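The requirement that column 16 contain both a relation and an ontology term can be checked mechanically. The sketch below assumes the GO-style syntax `relation(TERM_ID)`, with commas joining entries that apply together and pipes separating alternatives. Whether PO uses exactly this syntax should be confirmed on the PO_Annotation_Extensions_(column_16) page. The example value `part_of(PO:0009074)` is made up for illustration.

```python
import re
from typing import List, Tuple

# One entry: a relation name followed by a term ID in parentheses.
ENTRY = re.compile(r"^(?P<relation>[a-z_]+)\((?P<term>[A-Za-z_]+:[A-Za-z0-9_.\-]+)\)$")


def parse_extension(field: str) -> List[List[Tuple[str, str]]]:
    """Parse a column 16 value into groups of (relation, term) pairs.

    Raises ValueError if any entry lacks a relation or a term.
    """
    if not field:
        return []
    groups = []
    for alternative in field.split("|"):
        group = []
        for entry in alternative.split(","):
            match = ENTRY.match(entry.strip())
            if match is None:
                raise ValueError(f"not a relation(term) entry: {entry!r}")
            group.append((match["relation"], match["term"]))
        groups.append(group)
    return groups


print(parse_extension("part_of(PO:0009074)"))
```

A bare stage ID such as `PO:0007063` is rejected because it carries no relation, which is why a co-occurring growth stage cannot simply be dropped into column 16.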
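Link to Annotation File Format Page (http://wiki.plantontology.org:8080/index.php/Annotation_File_Format) and GAF 2.0 file format (http://www.geneontology.org/GO.format.gaf-2_0.shtml)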
Will create separate lines in association file. Column 16 can't really be used for this purpose at this time.
Issues and concerns
- From the POC conference call 8-2-11:
-use of column 16 to designate the different stage descriptions in different sources
-See response from RW above.
- Documentation of the statistical analysis and cut-off used for the microarray data- is this published yet?
From MS, by email:
The data are published: Sekhon et al. 2011 (http://www.ncbi.nlm.nih.gov/pubmed/21299659) – a project involving Robin Buell and Shawn Kaeppler. In collaboration with that group, MaizeGDB and PLEXdb have updated the dataset to match v2 gene models, as the publication was based on v1 and prior models.
Data are posted to our browser, under: Maize Atlas Nimblegen (http://gbrowse.maizegdb.org/cgi-bin/gbrowse/maize_v2/)
The cutoff they used (log2) was 7. For PO, I was thinking of using a slightly higher cutoff, 7.6, recommended by Jack Gardiner in our group, although this doesn’t really cut too many out.
Here are some examples of expression for known genes: ssu1, ssu2 (small subunit of RuBisCO), and vp1 (viviparous1), mapped to gene models.
- Min values over the dataset are 5.58, 5.36, and 5.1, respectively
- Max values are 14.91, 15.85, and 12.86, respectively
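As a quick check of how the two cutoffs treat these example genes, the sketch below classifies each gene from its min and max log2 values alone. Treating "expressed" as a value at or above the cutoff is an assumption; the notes do not say whether the comparison is inclusive.

```python
from typing import Dict, NamedTuple


class Range(NamedTuple):
    min_log2: float
    max_log2: float


# Min and max values quoted above for the three example genes.
RANGES: Dict[str, Range] = {
    "ssu1": Range(5.58, 14.91),
    "ssu2": Range(5.36, 15.85),
    "vp1": Range(5.10, 12.86),
}


def classify(r: Range, cutoff: float) -> str:
    """Summarize where a gene passes the cutoff, given only its range."""
    if r.min_log2 >= cutoff:
        return "all samples"
    if r.max_log2 >= cutoff:
        return "some samples"
    return "no samples"


for cutoff in (7.0, 7.6):
    for gene, r in RANGES.items():
        print(f"cutoff {cutoff}: {gene} expressed in {classify(r, cutoff)}")
```

For all three genes, both cutoffs give the same answer: each falls below the cutoff in at least one sample and above it in at least one other.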
The data set does not include expression that does not match a gene model.
Information about methods, cutoff values, etc. should also go on the tissue web pages at MaizeGDB.
Plan of action:
- Mary will work with JE to get SVN access set up, done
- PO will review the mappings between the maize samples (60) and the PO terms (~52). MS sent us the mappings as a spreadsheet and we discussed it on the POC conference call 8-2-11.
RW: Three tissues (leaf 1st with sheath V3 B73, leaf 8th V9 B73, and leaf expanding V3 B73) should have annotations on PO:0009025 vascular leaf instead of PO:0025034 leaf.
RW went through the Excel spreadsheet and made recommendations for other changes/additions.
- Do we need to add or modify any existing PO terms? Are we going to proceed with getting rid of the Zea "sensu" terms?
RW: We have requests to merge PO:0006488 silk into PO:0009074 style, and PO:0007014 booting with PO:0007006 inflorescence just detectable.
Silk has no child terms, so merging it will not cause a problem. Once it is merged, the silk part_of Zea carpel relation will go away, and be replaced by the more general style part_of carpel (as will Poaceae style part_of Poaceae carpel). However, removing the silk part_of Zea carpel relation means that annotations will no longer be passed through Zea carpel to gynoecium of ear floret, to ear floret, etc. If you want the annotation to show up on ear floret (and the structures it is part of), users will have to create an additional annotation for ear floret whenever they create one for style in maize. This is very similar to what you have to do for the parts of a leaf.
To get rid of Poaceae style, we will have to get rid of its children Poaceae style epidermis and Poaceae transmitting tissue. Both of these can be merged into their parent terms easily. Neither has annotations or children.
Details of merging booting into inflorescence just detectable are described on SourceForge. This will not cause problems with partonomy.
RW: Associations are already on the more general terms for the following:
anther (PO:0006473 Zea anther)
inflorescence bract (PO:0006337 inflorescence bract of ear)
style (PO:0006488 silk)
This is perfectly valid, but as described above, putting the annotation on the more general term means that it will not get passed up to the specific type of flower or inflorescence.
All present (LC, RW, JE, JP, and PJ from POC and MS from MaizeGDB) agreed to go ahead with merging Zea and Poaceae terms into their parents. We will begin with the terms needed for these annotations (silk and Poaceae style into style and Zea anther and Poaceae anther into anther, plus whatever other terms are directly affected by those merges). Users who create associations for the parts of a floret (e.g., gynoecium, anther, style, ovary, etc.) will have to also create associations for the corresponding floret (tassel floret, ear floret, or the various types of tassel and ear florets). We may need to add a comment to some of the parts of flower to this effect. In the future, we should be able to create these associations automatically using column 16.
We should probably start using column 16 now, so users will know that two associations belong together.
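Until such associations can be generated automatically, the extra floret annotations could be added with a small script. This is only a sketch: the part-to-floret mapping below (style to ear floret, anther to tassel floret) is an illustrative assumption for maize, term names stand in for PO IDs, and the gene model names are hypothetical.

```python
from typing import Dict, List, Tuple

Annotation = Tuple[str, str]  # (gene model, PO term name)

# Assumed mapping from a floret part to the floret it belongs to in maize.
COMPANION_WHOLE: Dict[str, str] = {
    "style": "ear floret",
    "anther": "tassel floret",
}


def with_companions(annotations: List[Annotation]) -> List[Annotation]:
    """Return the annotations plus one extra annotation on the
    corresponding floret for every annotated floret part."""
    result: List[Annotation] = []
    for gene, term in annotations:
        if (gene, term) not in result:
            result.append((gene, term))
        whole = COMPANION_WHOLE.get(term)
        if whole is not None and (gene, whole) not in result:
            result.append((gene, whole))
    return result


expanded = with_companions([("GENE_MODEL_1", "style"), ("GENE_MODEL_2", "leaf")])
print(expanded)
```

Terms that are not in the mapping, such as leaf in this example, pass through unchanged.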
- Mary will upload a small file first, (perhaps the annotations to the structure terms first?) and then upload the larger file.
MS will send a small test file (about 1000 lines) to JE or put it on SVN, so he can try loading it on the beta browser to test. It should include some associations for anther or style so we can test how merging those terms works (see above).