Questions and Issues Dec, 2011
Also see Cosmoss, the Physcomitrella patens resource site.
Go to: Background info on Physco biology and culture
Excerpted from email exchanges:
DbxRefs for Annotations
Need to create stanza
maizeGDB example:
abbreviation: MaizeGDB
database: Maize Genetics and Genomics Database
object: Anything with a MaizeGDB Object ID Number or Gene Model Name
example_id: MaizeGDB:881225
generic_url: http://www.maizegdb.org/
url_syntax: http://www.maizegdb.org/cgi-bin/id_search.cgi?id=[example_id]
url_example: http://www.maizegdb.org/cgi-bin/id_search.cgi?id=881225
Proposed Physcomitrella stanza, based on the association file:
abbreviation: cosmoss_PpV1.2
database: plantco.de|cosmoss.org
object: Anything with a Cosmoss accession number
example_id: cosmoss_PpV1.2:Pp1s47_77V2.1
generic_url: https://www.cosmoss.org/annotation/genonaut
url_syntax: https://www.cosmoss.org/annotation/genonaut?accession=ACCESSION&version=V1.2
url_example: https://www.cosmoss.org/annotation/genonaut?accession=Pp1s53_22V2.1&version=V1.2
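To check that the proposed url_syntax resolves as intended, the substitution can be sketched in a few lines of Python. This is only an illustration of the stanza above: the function names are hypothetical, and the default version string V1.2 is taken from the proposed url_syntax rather than from any Cosmoss documentation.

```python
from urllib.parse import urlencode

# Base URL and prefix copied from the proposed stanza above.
GENONAUT_URL = "https://www.cosmoss.org/annotation/genonaut"
DB_PREFIX = "cosmoss_PpV1.2"


def cosmoss_url(accession, version="V1.2"):
    """Fill the ACCESSION and version slots of the proposed url_syntax."""
    query = urlencode({"accession": accession, "version": version})
    return f"{GENONAUT_URL}?{query}"


def dbxref_to_url(dbxref):
    """Turn 'cosmoss_PpV1.2:<accession>' into a genonaut link (hypothetical helper)."""
    prefix, _, accession = dbxref.partition(":")
    if prefix != DB_PREFIX or not accession:
        raise ValueError(f"not a {DB_PREFIX} identifier: {dbxref!r}")
    return cosmoss_url(accession)


if __name__ == "__main__":
    print(dbxref_to_url("cosmoss_PpV1.2:Pp1s47_77V2.1"))
```

Called with the accession from url_example (Pp1s53_22V2.1), cosmoss_url reproduces that URL exactly.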
Notes from email (12-15-11) about the database versions:
Numbering the Cosmoss Assembly versions:
The convention is that the number before the decimal point refers to the version of the assembly (which is currently still 1).
The number after the decimal point is the version of the annotation (e.g. V6); i.e., V1.6 is assembly 1, annotation 6.
Numbering the Cosmoss Gene ID (CGI):
The trailing \.\d+ (i.e., the .1 in Pp1s2_12V6.1) refers to the splice variant, here splice variant 1.
- V1.6 was the first annotation release to contain splice variants
- In V1.2, all transcripts/proteins have .1 (in theory). The exception to this rule is split genes: if a locus was split into two models in V1.2, the two genes have .1 and .2.
See helpful link to Cosmoss gene ID wiki
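The two numbering conventions above can be sketched as a pair of small parsers. This is a minimal illustration based only on the notes from the email: it splits a release label such as V1.6 into assembly and annotation numbers, and strips the trailing \.\d+ splice-variant suffix from a gene ID. The function names are hypothetical, and nothing else about the ID structure is assumed.

```python
import re


def parse_release(label):
    """Split a release label like 'V1.6' into (assembly, annotation)."""
    match = re.fullmatch(r"V(\d+)\.(\d+)", label)
    if match is None:
        raise ValueError(f"unexpected release label: {label!r}")
    return int(match.group(1)), int(match.group(2))


def split_cgi(gene_id):
    """Split a Cosmoss gene ID into (model ID, splice variant number).

    Only the trailing '.<digits>' suffix is interpreted, as described
    in the notes; the rest of the ID is returned unchanged.
    """
    match = re.fullmatch(r"(.+)\.(\d+)", gene_id)
    if match is None:
        raise ValueError(f"no splice variant suffix in {gene_id!r}")
    return match.group(1), int(match.group(2))


if __name__ == "__main__":
    print(parse_release("V1.6"))      # assembly 1, annotation 6
    print(split_cgi("Pp1s2_12V6.1"))  # model ID plus splice variant 1
```

Stripping the suffix from Pp1s47_77V2.1 yields Pp1s47_77V2, the form used as the Database_Object_ID in the GAF file.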
GAF 2.0 File format
Example from *.gaf2 file
Columns
1. Database: cosmoss_PpV1.2
2. Database_Object_ID: Pp1s47_77V2 (this is fine)
For more info on the Pp1s47_77V2
3. Database_Object_symbol: if you have assigned a gene symbol, then give that; if not, repeat the value from column 2 (Pp1s47_77V2).
4. Qualifier: Optional
5. PO ID: the Plant Ontology (PO) term ID should go here.
6. Database:reference: provide a publication ID for this expression data. If none is available, enter the PMID of the genome paper for now. Note: there may be more than one reference, but only the last one is displayed.
current: GB:PHYPADRAFT_181133|PMID:18762443|PMID:18079367 (this is fine)
PMID:18762443 Lang et al. (2008). Exploring plant biodiversity: the Physcomitrella genome and beyond. Trends in Plant Science, 13, 542-549.
PMID:18079367 Rensing et al. (2008). The Physcomitrella genome reveals evolutionary insights into the conquest of land by plants. Science, 319, 64-69.
7. Evidence code: IEP
8. with or from: leave it blank for now
9. Aspect: 'A' if the PO ID in column 5 is for anatomy (ok)
10. Database_Object_name: the gene name, if you have one; if not, leave it blank (optional).
11. Synonym: list all the aliases (synonyms) and deprecated IDs separated by the pipe (|) character. e.g. Phypa_181133|PHYPADRAFT_181133
12. Database_Object_type: 'mRNA' ok
13. taxon: this is fine as it stands
14. date: this is fine as it stands
15. Assigned_by: plantco.de|cosmoss.org ok
16. Annotation_extension: TBA
17. Gene product form ID: provide the gene model ID if you know one that is specific for the expression data (remember: one model ID per line). By default, this is the ID of the longest canonical/consensus gene model.
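As a cross-check of the column list above, here is a minimal Python sketch that assembles one tab-separated GAF 2.0 line from the values discussed in these notes. Several values are assumptions made only for illustration and are marked as such in the comments: the PO ID is a placeholder rather than a real term, the taxon (NCBI taxon 3218) and date are stand-ins because the notes only say the existing values are fine, and the column 17 value assumes the prefixed .1 gene model ID. The short column keys are hypothetical names, not part of the GAF specification.

```python
# Short keys for the 17 GAF 2.0 columns, in file order (names are illustrative).
GAF_COLUMNS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "po_id",
    "db_reference", "evidence_code", "with_or_from", "aspect",
    "db_object_name", "db_object_synonym", "db_object_type", "taxon",
    "date", "assigned_by", "annotation_extension", "gene_product_form_id",
]


def gaf_line(record):
    """Join a record into one tab-separated line; missing columns stay empty."""
    unknown = set(record) - set(GAF_COLUMNS)
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    return "\t".join(record.get(column, "") for column in GAF_COLUMNS)


example = {
    "db": "cosmoss_PpV1.2",
    "db_object_id": "Pp1s47_77V2",
    "db_object_symbol": "Pp1s47_77V2",   # no gene symbol assigned: repeat column 2
    "po_id": "PO:0000000",               # placeholder, NOT a real PO term
    "db_reference": "GB:PHYPADRAFT_181133|PMID:18762443|PMID:18079367",
    "evidence_code": "IEP",
    "aspect": "A",
    "db_object_synonym": "Phypa_181133|PHYPADRAFT_181133",
    "db_object_type": "mRNA",
    "taxon": "taxon:3218",               # assumption: NCBI taxon ID for P. patens
    "date": "20111215",                  # illustrative date in YYYYMMDD form
    "assigned_by": "plantco.de|cosmoss.org",
    "gene_product_form_id": "cosmoss_PpV1.2:Pp1s47_77V2.1",  # assumed form
}

if __name__ == "__main__":
    print("!gaf-version: 2.0")  # header line that opens a GAF 2.0 file
    print(gaf_line(example))
```

Columns left out of the record (qualifier, with or from, object name, annotation extension) are written as empty fields, so every line still has 17 tab-separated columns.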
Reference page for the microarray data
It would be helpful to have a reference page or published reference to the microarray data cut-off points, based on established protocols.
SR: We are already working on converting all expression data that will go public via Genevestigator after the conference.
What is the status of this?
GO Annotations
DL sent a GOA file, and it looks like the same format that we would need for the structure terms. Need to find out how to get them submitted to the GO.