Table 1 Examples of hypothetical missed genes that are associated with the strong COMBREX support level

From: Thousands of missed genes found in bacterial genomes and their analysis with COMBREX

Gene

Reasons for association with the strong support level

AE014295_orf00919 from Bifidobacterium longum NCC2705

Assigned to the NCBI curated cluster PRK11770. The ORF has 213 significant sequence homologs in the cluster (BLAST E-values range from 1e-58 to 3e-09). The cluster contains 218 genes from 125 species belonging to 6 different phyla. It has a conserved domain and includes a few cloned and purified members.

CP002334_orf00644 from Helicobacter pylori Lithuania75

Assigned to the NCBI cluster CLSK496073. All other members of the cluster are hypothetical proteins (NCBI annotation), but COMBREX identified 3 experimentally validated genes within the cluster.

AE017354_orf01466 from Legionella pneumophila

Has significant sequence similarity (BLAST E-value 1e-09) to a gene from Aeromonas hydrophila that is included in the gold-standard database in COMBREX (a novel set of genes with experimentally validated molecular function).

BA000023_orf01717 from Sulfolobus tokodaii str. 7

Has significant sequence similarity (BLAST E-value 2e-21) to a protein from Sulfolobus solfataricus that NCBI annotates as a hypothetical protein. However, COMBREX identified this gene as having a known 3D structure (PDB code: 2JTM).