3.3 Other Information
1. Data Source
To collect the proteins that are regulated by hypoxia, we searched the PUBMED database extensively and curated the relevant literature manually. The protein-hypoxia relationships documented in the current release were collected using the following PUBMED search keywords: hypoxia, ischemia, Homo sapiens. This search returned >2500 hypoxia-related publications, most of which were high throughput hypoxia studies or replication studies of earlier findings. After manually scanning the abstracts, we excluded reviews and publications that studied diseases other than hypoxia. We finally retained ~65 publications that studied hypoxia by high throughput experimentation and ~1500 supporting articles that discussed one or more hypoxia candidate variants and hypoxia regulated proteins.
Each entry in the database contains detailed information about the hypoxia-protein relationship, including a basic description of the protein, its expression pattern in humans (up regulated or down regulated), its experimentally validated tissue specific expression, its correlation with hypoxia, its protein-protein interaction (PPI) network, and the main results and conclusions of the publication. Since we aim to construct a hypoxia protein database, we organized these data in a protein-centered manner by manually converting the different protein names used in the publications to unique Genbank Protein GI numbers. We then converted the Genbank Protein GI numbers to the related ENTREZ IDs, which were kept as the standard protein identifiers for retrieving the other protein specific information.
In the current release of HYPOXIA DB, 3500 proteins were selected for their relationship with hypoxia. We then built HYPOXIA DB by integrating the hypoxia specific data that we collected with information from other sources, which makes HYPOXIA DB a one stop knowledge platform for the hypoxia research community.
2. Protein Annotations
Besides the detailed hypoxia related information extracted from publications for each hypoxia regulated gene/protein, HYPOXIA DB also provides the following useful annotations for each gene/protein.
The basic annotations include Protein Name, Protein Symbol, Aliases, Chromosome Location, Organism, Gene ID, HGNC ID, Genbank Protein GI numbers, Unigene ID, Uniprot ID, Ensembl ID, Vega ID, OMIM ID, HPRD ID and Genbank Protein Accessions. Some of the protein/gene information was extracted from the gene_info file downloaded from the NCBI ftp site (ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/GENE_INFO/), while the rest was parsed using the R package org.Hs.eg.db.
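As an illustration of how a gene_info file can be turned into per-gene annotations, the following minimal Python sketch reads the tab-delimited layout and keeps only the human rows (tax_id 9606). The function name parse_gene_info, the abbreviated sample rows and the use of Python are assumptions made for this example; the text above only states that the NCBI file and the R package org.Hs.eg.db were used.

```python
import csv
import io

# Illustrative, abbreviated rows in the tab-delimited gene_info layout.
# The values are for demonstration only; check them against the real file.
GENE_INFO_SAMPLE = (
    "#tax_id\tGeneID\tSymbol\tLocusTag\tSynonyms\tdbXrefs\t"
    "chromosome\tmap_location\tdescription\n"
    "9606\t3091\tHIF1A\t-\tHIF1|MOP1\t"
    "MIM:603348|HGNC:HGNC:4910|Ensembl:ENSG00000100644\t"
    "14\t14q23.2\thypoxia inducible factor 1 subunit alpha\n"
    "10090\t15251\tHif1a\t-\t-\tMGI:MGI:106918\t"
    "12\t12 C3\thypoxia inducible factor 1, alpha subunit\n"
)


def parse_gene_info(handle, tax_id="9606"):
    """Yield one annotation dict per gene of the requested taxon."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = [name.lstrip("#") for name in next(reader)]
    for fields in reader:
        row = dict(zip(header, fields))
        if row["tax_id"] != tax_id:
            continue
        # dbXrefs is a pipe-separated list of "Database:identifier" pairs.
        # Split on the first colon only, because identifiers may contain colons.
        xrefs = {}
        if row["dbXrefs"] != "-":
            for item in row["dbXrefs"].split("|"):
                database, _, identifier = item.partition(":")
                xrefs[database] = identifier
        yield {
            "gene_id": row["GeneID"],
            "symbol": row["Symbol"],
            "aliases": [] if row["Synonyms"] == "-" else row["Synonyms"].split("|"),
            "chromosome": row["chromosome"],
            "map_location": row["map_location"],
            "description": row["description"],
            "xrefs": xrefs,
        }


records = list(parse_gene_info(io.StringIO(GENE_INFO_SAMPLE)))
```

With a downloaded file, the same function could be given an opened (and, if needed, decompressed) text handle instead of the in-memory sample.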
The GO annotations were parsed from the gene2go file, which was downloaded from the NCBI ftp site (ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/). They include GO ID, GO Term, Category and Evidence, along with the PUBMED IDs.
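The gene2go parsing step can be sketched in the same way. The example below assumes the eight-column tab-delimited layout of gene2go (tax_id, GeneID, GO_ID, Evidence, Qualifier, GO_term, PubMed, Category) and groups the human annotations by Gene ID. The function name parse_gene2go is hypothetical, and the sample rows, including the PUBMED IDs, are placeholders rather than real records.

```python
import io
from collections import defaultdict

# Placeholder rows in the gene2go column layout; the PubMed IDs are made up.
GENE2GO_SAMPLE = (
    "#tax_id\tGeneID\tGO_ID\tEvidence\tQualifier\tGO_term\tPubMed\tCategory\n"
    "9606\t3091\tGO:0001666\tIDA\t-\tresponse to hypoxia\t11111111|22222222\tProcess\n"
    "9606\t3091\tGO:0005634\tIEA\t-\tnucleus\t-\tComponent\n"
    "10090\t15251\tGO:0001666\tIEA\t-\tresponse to hypoxia\t-\tProcess\n"
)


def parse_gene2go(handle, tax_id="9606"):
    """Return {gene_id: [annotation, ...]} for the requested taxon."""
    go_by_gene = defaultdict(list)
    for line in handle:
        if line.startswith("#"):
            continue  # skip the header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8 or fields[0] != tax_id:
            continue
        _, gene_id, go_id, evidence, _qualifier, go_term, pubmed, category = fields[:8]
        go_by_gene[gene_id].append({
            "go_id": go_id,
            "go_term": go_term,
            "category": category,
            "evidence": evidence,
            # "-" marks a missing value; several IDs are pipe-separated.
            "pubmed_ids": [] if pubmed == "-" else pubmed.split("|"),
        })
    return dict(go_by_gene)


annotations = parse_gene2go(io.StringIO(GENE2GO_SAMPLE))
```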
The protein pathway data were parsed using the R packages org.Hs.eg.db and KEGG.db, followed by intensive manual curation to make sure that the pathways are linked correctly.
The protein family information was extracted using the R packages org.Hs.eg.db and PFAM.db.
PDB IDs were extracted using bioDBnet, and screenshots of the PDB structures were retrieved using wget.
As HPRD is one of the largest human protein-protein interaction databases, we extracted the protein-protein interaction information of hypoxia related proteins from the current version of HPRD (Release 9). In addition to a table listing all the interactors of each hypoxia related protein, we also provide a graphical view of its interaction network showing up to 20 interactors.
For all the Genbank accessions present in the database, Entrez utilities were used to retrieve the FASTA sequences.
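A retrieval of this kind could look like the following minimal Python sketch, which builds an EFetch request against the NCBI protein database and splits the returned text into FASTA records. The choice of EFetch with db=protein, the function names and the accession in the usage line are assumptions for illustration; the text above does not state which Entrez utility or parameters were used.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_fasta_url(accessions):
    """Build an EFetch URL requesting FASTA text for protein accessions."""
    params = {
        "db": "protein",
        "id": ",".join(accessions),
        "rettype": "fasta",
        "retmode": "text",
    }
    return EFETCH_URL + "?" + urlencode(params)


def fetch_fasta(accessions, timeout=30):
    """Download the FASTA text (requires network access; not called here)."""
    with urlopen(build_fasta_url(accessions), timeout=timeout) as response:
        return response.read().decode("utf-8")


def split_fasta(text):
    """Split multi-record FASTA text into (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical accession, used only to show the shape of the request URL.
url = build_fasta_url(["NP_001521.1"])
```

Requesting accessions in batches rather than one at a time keeps the number of calls to the NCBI servers low.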
The Homologene database describes the homology of a protein/gene in one species with that in other species. The Homologene ID was extracted using bioDBnet, and the corresponding homology information was parsed from the homologene.xml file, which was downloaded from the NCBI ftp site (ftp://ftp.ncbi.nih.gov/pub/HomoloGene/current/). It includes Homologene group ID, Gene ID, Taxon name, Protein GIs and Genbank Protein Accession Numbers.
Some useful database cross links are available for each gene: Entrez database, Gene, Protein database, Ensembl database, HPRD database, OMIM database, Homologene database, PDB, IPI database, PFam database, GO database, KEGG database etc.
3. Web Interface
HYPOXIA DB has a user friendly web interface that makes the database more useful for the research community. The interface is simple and offers options such as Browse, Search, BLAST, Statistics, Facts, Contacts, Feedback and Submit, which help users reach the most relevant information without confusion.
Users can browse the data present in HYPOXIA DB at 3 different levels. The complete protein data in HYPOXIA DB were manually reviewed and divided into categories as follows:
This browse option lists the proteins in alphabetical order. The browse result is presented as a table with two fields, viz. Protein Symbols and Protein Names starting with a particular letter. Each protein symbol is linked to its Protein Page, which contains all the information about that protein present in HYPOXIA DB. Users can also browse the complete alphabetical list of proteins by clicking on the complete list option present below the list of letters.
Using this browse option, the user can browse the list of proteins according to the various categories as listed below.
The browse result is a table consisting of fields such as Protein Symbol, Level of Regulation, Fold Change, % Hypoxia, Tissue Specificity and Reference ID. The Protein Symbol field links the user to the particular Protein Page, whereas the Reference ID field links the user to the specific research publication.
This browse option retrieves the protein entries present in HYPOXIA DB according to their chromosomal location. The result is a table in which the proteins are arranged by Chromosomal Band location, along with other useful information such as Protein Symbol and Protein Name. Each Protein Symbol is linked to the Protein Page that contains the annotations of that protein in HYPOXIA DB. Users can also browse the complete list of proteins arranged by chromosomal location by clicking on the complete list option at the end of the Browse Page.
1. Quick Search
Quick search can be done using any of the following Protein Identifiers:
These Protein Identifiers, viz. Protein Name and Protein Symbol, can be used to search for similar protein entries present in HYPOXIA DB.
2. Search by Category
Users can also search HYPOXIA DB using Search by Category. In this search option, the user can give any of the following protein identifiers as the query term or ID:
1. Gene Ontology ID or Term
Using this search by category option, the user can search HYPOXIA DB using the GO ID or Term. The result is a similarity match of the user query; e.g. the query GO ID "GO:000" will list all the GO IDs starting with the query ID. Similarly, the GO query term "Protein" will return all the GO terms that contain "Protein" in the string. The query result is retrieved as a table with the following fields, viz. Entrez ID, GO ID, GO Term, Category, and Evidence (Reference). The Entrez ID is linked to the particular Protein Page and the GO ID is linked to the Gene Ontology database (AMIGO); the Evidence is linked to the GO Evidence Code, whereas the Reference ID is linked to the particular research paper from which the evidence for the GO Term has been taken.
2. KEGG Pathway ID or Term
Using this search by category option, the user can search HYPOXIA DB using the KEGG ID or Term. The result is a similarity match of the user query; e.g. the query KEGG ID "hsa000" will list all the KEGG IDs starting with the query ID. Similarly, the KEGG query term "Metabolic" will return all the KEGG terms that contain "Metabolic" in the string. The query result is retrieved as a table with the fields Entrez ID, KEGG ID and KEGG Term. The Entrez ID is linked to the particular Protein Page and the KEGG ID is linked to the KEGG Pathway database.
3. PFAM ID
This search by category option can be used to search HYPOXIA DB using the PFAM ID. The result is an exact match of the user query; e.g. the query ID "PF00012" will return only those results that exactly match the user query. The query result is a table with the fields Entrez ID, IPI ID, PFAM ID and Protein Family Identifier. The Entrez ID is linked to the particular Protein Page, the IPI ID is linked to the International Protein Index database introduced by EMBL-EBI, and the PFAM ID is linked to the PFAM database introduced by the Sanger Institute.
4. PDB ID
This search by category option can be used to search HYPOXIA DB using the PDB IDs. The result is an exact match of the user query; e.g. the query ID "3LDO" will return only those results that exactly match the user query. The query result is a table listing the ENTREZ IDs of the proteins that have a structure matching the queried PDB ID. The Entrez ID is linked to the particular Protein Page, which provides all the other information about the protein.
5. OMIM ID
Using this search option, the user can search HYPOXIA DB using the OMIM IDs. The result is an exact match of the user query; e.g. the query ID "138120" will return only those results that exactly match the user query. The query result is a list of ENTREZ IDs that have Mendelian Inheritance specifications matching the queried OMIM ID. The Entrez ID is linked to the particular Protein Page, which provides information about the other protein annotations.
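The three matching rules described for the five search categories (prefix match for GO and KEGG IDs, substring match for GO and KEGG terms, exact match for PFAM, PDB and OMIM IDs) can be summarised in a short Python sketch. This is only an illustration of the described behaviour, not the code behind the web interface: the field names, the sample records with placeholder Entrez IDs, and the case-insensitive comparison are all assumptions.

```python
# Matching rules as described in the text: prefix, substring, or exact.
MATCHERS = {
    "prefix": lambda value, query: value.startswith(query),
    "contains": lambda value, query: query in value,
    "exact": lambda value, query: value == query,
}

# Hypothetical field names mapped to the rule described for each category.
FIELD_MODES = {
    "go_id": "prefix",
    "go_term": "contains",
    "kegg_id": "prefix",
    "kegg_term": "contains",
    "pfam_id": "exact",
    "pdb_id": "exact",
    "omim_id": "exact",
}


def search(records, field, query):
    """Return the records whose `field` matches `query` under the field's rule."""
    matcher = MATCHERS[FIELD_MODES[field]]
    needle = query.strip().lower()  # case-insensitive matching is an assumption
    return [
        record for record in records
        if field in record and matcher(str(record[field]).lower(), needle)
    ]


# Placeholder records; the Entrez IDs are not real identifiers.
RECORDS = [
    {"entrez_id": "E1", "go_id": "GO:0001666", "go_term": "response to hypoxia"},
    {"entrez_id": "E2", "go_id": "GO:0005515", "go_term": "protein binding"},
    {"entrez_id": "E3", "kegg_id": "hsa00010", "kegg_term": "Glycolysis / Gluconeogenesis"},
    {"entrez_id": "E4", "kegg_id": "hsa01100", "kegg_term": "Metabolic pathways"},
    {"entrez_id": "E5", "pfam_id": "PF00012"},
]

go_prefix_hits = search(RECORDS, "go_id", "GO:000")
go_term_hits = search(RECORDS, "go_term", "Protein")
kegg_prefix_hits = search(RECORDS, "kegg_id", "hsa000")
pfam_hits = search(RECORDS, "pfam_id", "PF00012")
```

Keeping the rule per field in one table makes the difference between similarity and exact searches explicit: a partial PFAM ID such as "PF0001" returns nothing, whereas a partial GO ID returns every ID with that prefix.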
The information present on the Protein Page is divided into 3 levels
HYPOXIA DB is augmented with links from each protein to other relevant external resources. These include:
HYPOXIA DB was built after an intensive literature search, and during the course of our study we found that some proteins are more strongly correlated with hypoxia than others. The Correlation of Proteins with hypoxia section provides this information: the level of regulation, fold change, % of hypoxia, tissue in which the protein is expressed, and Reference IDs of the research publications. The last column lists the type of study (Genomics/Proteomics/Transcriptomics/Transcriptomics and Proteomics) performed in the paper to associate the protein with hypoxia.
The individual Protein Page also gives the following information:
1. GO Ontology: By clicking on the GO Ontology link, the user can get the complete Gene Ontology information of the protein. It gives the GO ID, which is externally linked to the AMIGO database, the GO Term, the Category of the GO Term and the Evidence (PUBMED), which is linked to the individual research papers through their PUBMED IDs.
2. KEGG Pathway: By using the KEGG Pathway link, the user can get the complete information about the pathways in which the protein is involved. It gives the KEGG ID, which is externally linked to the KEGG Pathway database, and the KEGG Term.
3. Homologene Information: Homologene information is present for most of the proteins in HYPOXIA DB and is linked to the protein's homology page, which reports the presence of the protein in the genome sequences of other species. The homology page lists the Homologene group ID, the species in which the protein entry has been found, the Entrez ID and the protein accession number.
4. OMIM: It contains the list of OMIM IDs and the names of the associated disorders related to a particular protein; these OMIM IDs are cross linked to the OMIM database.
5. Protein Family Information: A click on the Protein Family Information link gives users the complete information about the protein family. It gives the IPI ID, which is externally linked to the IPI database, the PFAM ID, which is externally linked to the PFAM database, and the Protein Family Identifier.
6. Protein-Protein Interaction: Users can get the complete protein-protein interaction information of a particular protein by clicking the Protein-Protein Interaction link present on the individual Protein Pages. It gives the Interactor's name, HPRD ID, Evidence of Interaction and the PUBMED IDs of the research papers that provide the evidence. It also provides a graphical view of the interactions between the protein of interest and its interactors. Interactors that are not present in HYPOXIA DB (color coded BLUE) are linked to HPRD, whereas interactors coded in RED are linked to the Protein Pages of the proteins that are present in HYPOXIA DB.
7. PDB: All the information regarding the PDB IDs and their corresponding protein structures is present under the PDB link.
8. FASTA Sequence: A single click on the FASTA Sequence link makes the protein FASTA sequence available to the users for further analysis such as BLAST, which has been integrated in HYPOXIA DB along with the other available tools.
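The color coding described for the Protein-Protein Interaction page (item 6) amounts to a membership test against the set of proteins held in HYPOXIA DB. The following minimal Python sketch illustrates that rule; the function name, the output fields and the example symbols are hypothetical, and the real interface may store and render this information differently.

```python
def annotate_interactors(interactors, database_symbols):
    """Assign a display color and link target to each interactor.

    Interactors present in the database are shown in red and point to
    their Protein Page; all others are shown in blue and point to HPRD.
    """
    annotated = []
    for symbol in interactors:
        in_database = symbol in database_symbols
        annotated.append({
            "symbol": symbol,
            "color": "red" if in_database else "blue",
            "link_target": "protein_page" if in_database else "hprd",
        })
    return annotated


# Illustrative input: which symbols are "in the database" is assumed here.
network = annotate_interactors(["VHL", "EP300", "ARNT"], {"VHL", "ARNT"})
```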
Statistics are one of the most common ways to show the complexity and coverage of a database, and these are described on the Statistics and Facts page of HYPOXIA DB. Users can find a statistical analysis of the data presented in the database. The data are distributed according to Biological Processes, Molecular Functions, Cellular Components, KEGG pathways and Homology, based on Gene Ontology, KEGG pathway and related homology studies. The page also shows the distribution of proteins and related entries in the database based on their Chromosomal Location and related references.
Further, a customized BLAST tool has been made available that searches a user-defined query against the sequences available in the database. It may be useful for characterizing orphan sequences and for fishing out homologous protein sequences from the database based on sequence similarity.
The different types of matrices available to perform BLAST are:
Users can also choose among the e-value cutoffs offered by HYPOXIA DB. The available e-value cutoffs are:
The BLAST results are presented as an HTML document according to the matrix and the e-value chosen by the user.
Additionally, an online submission facility has been provided to allow users to submit protein entries that are associated with hypoxia. Once the user enters the protein information in the fields specified in the submission form, the information will be uploaded after validation.
Also, an online feedback form has been provided to help improve HYPOXIA DB and update it to meet the needs and requirements of the scientific community working on hypoxia and related disorders. Hypoxia research continues to grow, and HYPOXIA DB encourages user feedback, including error reports and feature requests, with the hope of making HYPOXIA DB a comprehensive resource to facilitate hypoxia proteomic research, which may lead to some novel treatments.