Difference between revisions of "Questions and Issues Dec, 2011"

From Plant Ontology Wiki

Latest revision as of 22:03, 20 December 2011

Go back to: Cosmoss-_Physcomitrella page

Also see Cosmoss (http://www.cosmoss.org/), the Physcomitrella patens resource site.

Go to: Background info on Physco biology and culture


Info about Physco on Genevestigator (https://www.genevestigator.com):

"In its figures, Genevestigator displays expression values from probesets, because these are the real physical entities that were measured on the microarrays. However, you can enter gene identifiers in a variety of formats, and Genevestigator will search for probe sets that represent these genes on the microarrays.

CLASSIC and ADVANCED user can use asterisks (*) for wildcard searches.

For the selected Organism / Array type, the following gene identifier formats are available:

• NCBI Gene (e.g. phypadraft_128494)
• Phytozome (e.g. Phypa_64616)
• Cosmoss (e.g. Phypa_161460)
• PLAZA (e.g. 152715)"



Excerpted from email exchanges:

DbxRefs for Annotations

Need to create stanza

maizeGDB example:


abbreviation: MaizeGDB

database: Maize Genetics and Genomics Database

object: Anything with a MaizeGDB Object ID Number or Gene Model Name

example_id: MaizeGDB:881225

generic_url: http://www.maizegdb.org/

url_syntax: http://www.maizegdb.org/cgi-bin/id_search.cgi?id=[example_id]

url_example: http://www.maizegdb.org/cgi-bin/id_search.cgi?id=881225


Physcomitrella Proposed, based on the assoc file:

abbreviation: cosmoss_PpV1.2

database: plantco.de|cosmoss.org

object: Anything with a Cosmoss accession number

example_id: cosmoss_PpV1.2:Pp1s47_77V2.1

generic_url: https://www.cosmoss.org/annotation/genonaut

url_syntax: https://www.cosmoss.org/annotation/genonaut?accession=ACCESSION&version=V1.2


url_example: https://www.cosmoss.org/annotation/genonaut?accession=Pp1s53_22V2.1&version=V1.2
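As a rough illustration of how the url_syntax and url_example fields in the two stanzas above relate, here is a minimal Python sketch. The helper names (strip_prefix, expand_url_syntax) are hypothetical and not part of any existing tool; the only assumption is that the placeholder text ("[example_id]" for MaizeGDB, "ACCESSION" for Cosmoss) is replaced literally by the local ID.

```python
def strip_prefix(example_id: str) -> str:
    """Drop the 'abbreviation:' prefix, e.g. 'MaizeGDB:881225' -> '881225'."""
    return example_id.split(":", 1)[1]


def expand_url_syntax(url_syntax: str, placeholder: str, local_id: str) -> str:
    """Return url_syntax with the placeholder replaced by the local ID."""
    if placeholder not in url_syntax:
        raise ValueError(f"placeholder {placeholder!r} not found in {url_syntax!r}")
    return url_syntax.replace(placeholder, local_id)


# MaizeGDB stanza: the placeholder is the literal text "[example_id]".
maizegdb_url = expand_url_syntax(
    "http://www.maizegdb.org/cgi-bin/id_search.cgi?id=[example_id]",
    "[example_id]",
    strip_prefix("MaizeGDB:881225"),
)

# Proposed Cosmoss stanza: the placeholder is the literal text "ACCESSION".
cosmoss_url = expand_url_syntax(
    "https://www.cosmoss.org/annotation/genonaut?accession=ACCESSION&version=V1.2",
    "ACCESSION",
    "Pp1s53_22V2.1",
)
```

With these inputs, both results should reproduce the url_example values given in the stanzas.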


Notes from email (12-15-11) about Accession names and numbers:

Numbering the Cosmoss Assembly versions:

The convention is that the number before the decimal point refers to the version of the assembly (which is currently still 1).

The number after the decimal point is the version of the annotation (V6); i.e., V1.6 is assembly 1, annotation 6.

  • Issue to be aware of:

V1.2 is in the genonaut db only for archival purposes. V1.6 is the current release and hence also the release where manual annotations should go. Therefore you can only access V1.2 entries via the URL directly and not using the genonaut database drop-down menu.

This has the glitch that V1.2 entries are listed as belonging to the "playground" database.


Numbering the Cosmoss Gene ID (CGI):

The trailing \.\d+ (i.e. the .1 in Pp1s2_12V6.1) gives the splice variant, here splice variant 1.

In V1.2 all transcripts/proteins have .1 (in theory). The exception to this rule is split genes: if a locus was split into two models in V1.2, the two genes have .1 and .2. V1.6 was the first annotation release to contain splice variants.


See helpful link to Cosmoss gene ID wiki (https://www.cosmoss.org/physcome_project/wiki/CGI)
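The two numbering conventions described in the email notes can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, release labels are assumed to be written exactly as "V<assembly>.<annotation>", and only the trailing \.\d+ of a gene ID is interpreted (the rest of the ID is left untouched).

```python
import re

# "V1.6" -> assembly 1, annotation 6 (label format assumed from the notes).
RELEASE_RE = re.compile(r"^V(\d+)\.(\d+)$")
# Trailing "\.\d+" of a Cosmoss gene ID, e.g. the ".1" in "Pp1s2_12V6.1".
VARIANT_RE = re.compile(r"^(.+)\.(\d+)$")


def parse_release(label: str) -> tuple[int, int]:
    """Split a release label into (assembly version, annotation version)."""
    match = RELEASE_RE.match(label)
    if match is None:
        raise ValueError(f"not a release label: {label!r}")
    return int(match.group(1)), int(match.group(2))


def split_variant(gene_id: str) -> tuple[str, int]:
    """Split a gene ID into (ID without suffix, trailing variant number)."""
    match = VARIANT_RE.match(gene_id)
    if match is None:
        raise ValueError(f"no trailing variant suffix in {gene_id!r}")
    return match.group(1), int(match.group(2))
```

For example, parse_release("V1.6") gives (1, 6) and split_variant("Pp1s2_12V6.1") gives ("Pp1s2_12V6", 1). Note that in V1.2 the suffix may mark a split gene rather than a splice variant, so the number alone does not say which case applies.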

GAF 2.0 File format

Example from *.gaf2 file

Columns

1. Database: cosmoss_PpV1.2

2. Database_Object_ID: Pp1s47_77V2 (this is fine)

For more info on Pp1s47_77V2, see the genonaut page (https://www.cosmoss.org/annotation/genonaut).

3. Database_Object_symbol: if you have assigned a gene symbol, give that; if not, repeat the value from column 2 (Pp1s47_77V2)

4. Qualifier: Optional

5. PO:ID should go here

6. Database:reference: provide a publication ID for this expression data. If none is available, use a PMID of the genome paper for now. Note: there may be more than one reference, but only the last one is displayed.

current: GB:PHYPADRAFT_181133|PMID:18762443|PMID:18079367 (this is fine)

PMID:18762443 Lang et al. (2008). Exploring plant biodiversity: the Physcomitrella genome and beyond. Trends in Plant Science, 13, 542-549.

PMID:18079367 Rensing et al. (2008). The Physcomitrella genome reveals evolutionary insights into the conquest of land by plants. Science, 319, 64-69.

7. Evidence code: IEP

8. with or from: leave it blank for now

9. Aspect: 'A' if the PO ID in column 5 is for anatomy (ok)

10. Database_Object_name: gene name, if you have one; if not, leave it blank (optional).

11. Synonym: list all the aliases (synonyms) and deprecated IDs separated by the pipe (|) character. e.g. Phypa_181133|PHYPADRAFT_181133

12. Database_Object_type: 'mRNA' ok

13. taxon: this is fine as it stands

14. date: this is fine as it stands

15. Assigned_by: plantco.de|cosmoss.org ok

16. Annotation_extension: TBA

17. Gene product form ID: provide the gene model ID if you know one that is specific for the expression data (remember: one model ID/lane). By default, it is the longest canonical/consensus gene model ID.
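To make the 17-column layout above concrete, here is a minimal Python sketch that assembles one tab-separated GAF 2.0 line from the values discussed in the list. The column keys and the format_gaf_line helper are hypothetical names chosen for this example. The PO ID, taxon, and date values are placeholders (the notes only say those columns are "fine as it stands"), and pairing Pp1s47_77V2 with the PHYPADRAFT_181133 aliases is purely illustrative.

```python
# Column keys in GAF 2.0 order (names follow the list above; hypothetical keys).
GAF_COLUMNS = [
    "Database", "Database_Object_ID", "Database_Object_symbol", "Qualifier",
    "PO_ID", "Database:reference", "Evidence_code", "With_or_from", "Aspect",
    "Database_Object_name", "Synonym", "Database_Object_type", "Taxon",
    "Date", "Assigned_by", "Annotation_extension", "Gene_product_form_ID",
]


def format_gaf_line(record: dict) -> str:
    """Join a record into one tab-separated line; missing columns stay empty."""
    unknown = set(record) - set(GAF_COLUMNS)
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    fields = [record.get(column, "") for column in GAF_COLUMNS]
    for value in fields:
        if "\t" in value or "\n" in value:
            raise ValueError(f"field contains a tab or newline: {value!r}")
    return "\t".join(fields)


example = {
    "Database": "cosmoss_PpV1.2",
    "Database_Object_ID": "Pp1s47_77V2",
    "Database_Object_symbol": "Pp1s47_77V2",   # no symbol assigned: repeat column 2
    "PO_ID": "PO:0000000",                     # placeholder, not a real annotation
    "Database:reference": "GB:PHYPADRAFT_181133|PMID:18762443|PMID:18079367",
    "Evidence_code": "IEP",
    "Aspect": "A",
    "Synonym": "Phypa_181133|PHYPADRAFT_181133",
    "Database_Object_type": "mRNA",
    "Taxon": "taxon:3218",                     # assumed NCBI taxon ID; verify before use
    "Date": "20111220",                        # placeholder date in YYYYMMDD form
    "Assigned_by": "plantco.de|cosmoss.org",
}
gaf_line = format_gaf_line(example)
```

Columns 4, 8, 10, 16, and 17 are left empty here, matching the "optional", "leave it blank for now", and "TBA" notes; the line still carries all 17 tab-separated fields.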

Reference page for the microarray data

It would be helpful to have a reference page or published reference to the microarray data cut-off points, based on established protocols.

SR: We are already working on converting all expression data that will go public via Genevestigator after the conference.

What is the status of this?

GO Annotations

DL sent a GOA file, and it looks like the same format that we would need for the structure terms. Need to find out how to go about getting them submitted to the GO.