functional impact of protein mutations
release 3


How it works

This server provides semantic links to each variant's analysis, its annotations, an HTML page with the variant's multiple sequence alignment, and a page with the variant's 3D structure.

Please note that the analysis of submitted variants is done asynchronously. If a new variant falls into a protein domain that does not yet have a multiple sequence alignment (MSA) in the server database, "word [sent]" is returned in the "MSA" field until the MSA is built. You can see the size of the current MSA queue on the about page. The same approach applies when computing Functional Impact scores for new variants.



The server accepts a list of variants, one variant per line, plus optional text describing each variant,

either in genomic coordinates ("+" strand assumed):

  <genome build>,<chromosome>,<position>,<reference allele>,<substituted allele>

Genome build is optional (hg19 is assumed when it is omitted); accepted values are 'hg19' and 'hg38'.


  hg38,13,32338418,G,T   BRCA2

  hg19,7,55211080,G,A   EGFR

  7,55211080,G,A   EGFR
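
The genomic format above can be checked on the client side before submission. The following is a minimal sketch, not the server's own parser: the names GenomicVariant and parse_genomic_line are hypothetical, and it assumes that the comma-separated fields contain no spaces and that everything after the first whitespace is the optional free text.

```python
from typing import NamedTuple

ACCEPTED_BUILDS = {"hg19", "hg38"}  # values listed in the text above


class GenomicVariant(NamedTuple):
    build: str
    chromosome: str
    position: int
    ref: str
    alt: str
    note: str


def parse_genomic_line(line: str) -> GenomicVariant:
    """Parse '<build>,<chromosome>,<position>,<ref>,<alt>  optional text'."""
    parts = line.strip().split(None, 1)
    if not parts:
        raise ValueError("empty line")
    fields = parts[0].split(",")
    note = parts[1].strip() if len(parts) > 1 else ""
    if len(fields) == 4:
        # Genome build omitted: hg19 is assumed, as stated above.
        fields = ["hg19"] + fields
    if len(fields) != 5:
        raise ValueError(f"expected 4 or 5 comma-separated fields: {line!r}")
    build, chromosome, position, ref, alt = fields
    if build not in ACCEPTED_BUILDS:
        raise ValueError(f"unsupported genome build: {build!r}")
    return GenomicVariant(build, chromosome, int(position),
                          ref.upper(), alt.upper(), note)


print(parse_genomic_line("hg38,13,32338418,G,T   BRCA2"))
print(parse_genomic_line("7,55211080,G,A   EGFR").build)  # hg19
```
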


or in protein space:

  <protein ID> <variant> <text>,

where <protein ID> can be:

- Uniprot protein accession (e.g. EGFR_HUMAN)

- NCBI Refseq protein ID (e.g. NP_005219)



  EGFR_HUMAN R98Q Polymorphism

  EGFR_HUMAN G719D disease

  NP_000537 G356A

  NP_000537 G360A dbSNP:rs35993958

  NP_000537 S46A Abolishes phosphorylation


ID types can be mixed in one list in any way.
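
Protein-space lines can be split in a similar way. This is a sketch under stated assumptions rather than the server's actual logic: the names ProteinVariant and parse_protein_line are hypothetical, a variant is assumed to be a single-letter residue, a position, and a single-letter residue (as in the examples above), and any ID that does not look like NP_ followed by digits is treated as a Uniprot ID.

```python
import re
from typing import NamedTuple

# Assumed patterns, inferred only from the examples in the text.
_VARIANT_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")
_REFSEQ_RE = re.compile(r"^NP_\d+$")


class ProteinVariant(NamedTuple):
    protein_id: str
    id_type: str  # "refseq" or "uniprot"
    ref: str
    position: int
    alt: str
    note: str


def parse_protein_line(line: str) -> ProteinVariant:
    """Parse '<protein ID> <variant> <text>' where <text> is optional."""
    parts = line.strip().split(None, 2)
    if len(parts) < 2:
        raise ValueError(f"expected '<protein ID> <variant>': {line!r}")
    protein_id, variant = parts[0], parts[1]
    note = parts[2].strip() if len(parts) > 2 else ""
    match = _VARIANT_RE.match(variant)
    if match is None:
        raise ValueError(f"unrecognized variant: {variant!r}")
    id_type = "refseq" if _REFSEQ_RE.match(protein_id) else "uniprot"
    return ProteinVariant(protein_id, id_type, match.group(1),
                          int(match.group(2)), match.group(3), note)


# ID types can be mixed in one list.
for text in ["EGFR_HUMAN G719D disease", "NP_000537 G356A"]:
    print(parse_protein_line(text))
```
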


The server maps each variant to both Uniprot and Refseq protein sequences (if possible).

If the reference residue in the Uniprot protein sequence is different from the one indicated in your variant, the analysis will not be performed.

For non-human variants, please use Uniprot IDs, as mapping to Refseq is not supported.
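
The reference-residue check described above can be illustrated as follows. This is only a sketch of the idea: reference_matches is a hypothetical helper, the sequence is a made-up toy string rather than a real Uniprot entry, and positions are assumed to be 1-based.

```python
import re

# Assumed variant syntax: reference residue, 1-based position, new residue.
_VARIANT_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def reference_matches(sequence: str, variant: str) -> bool:
    """Return True if the variant's reference residue agrees with the sequence."""
    match = _VARIANT_RE.match(variant)
    if match is None:
        raise ValueError(f"unrecognized variant: {variant!r}")
    ref, position = match.group(1), int(match.group(2))
    if not 1 <= position <= len(sequence):
        return False  # position falls outside the protein
    return sequence[position - 1] == ref


toy_sequence = "MRPSG"  # made-up sequence, for illustration only
print(reference_matches(toy_sequence, "R2Q"))  # True
print(reference_matches(toy_sequence, "G2Q"))  # False
```

A variant for which such a check fails would correspond to the case in which the analysis is not performed.
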


Uniprot IDs are used to extract information about domain boundaries (Pfam, Uniprot), annotated functional regions (Uniprot), and protein-protein interactions (Piana). Refseq protein IDs are used to extract known alterations in cancer (COSMIC), SNPs (dbSNP), and the gene's known role in cancer (CancerGenes).


The server determines domain boundaries (using Pfam or Uniprot) for the region containing the variant, then either builds a multiple sequence alignment from all Uniprot protein sequences or uses an existing one from the repository. To obtain the list of existing alignments in the repository for a given protein, please see the WEBAPI section below.


For each variant, the server provides the following annotations (this description is also available as a tooltip in the main table):

Column name                   Description
Mutation                      Mutation as given by the user
RG variant                    Variant based on the reference genome (for variants submitted in genomic coordinates)
RG variant type               Variant type based on the reference genome: missense, silent, stop loss, nonsense (for variants submitted in genomic coordinates)
User data                     Optional user data
MSA                           Link to the multiple sequence alignment browser
PDB                           Link to the 3D structure browser
Func.Impact                   Functional impact of a variant: predicted functional (high, medium) or predicted non-functional (low, neutral). Please see the paper for details.
FI score                      Functional impact combined score
VC score                      Variant conservation score
VS score                      Variant specificity score
Mapping issue                 Issue with variant/protein mapping
AA variant                    Amino-acid substitution
Gene                          Gene name
Location                      Chromosomal location of the gene
Uniprot                       Uniprot protein accession ID
Refseq                        Refseq protein ID
gaps in MSA                   Fraction of gaps at the variant position in the multiple sequence alignment
MSA height                    Number of diverse sequences in the multiple sequence alignment (identical or highly similar sequences are filtered out)
Codon start position          Start of the codon
Uniprot position              Variant position in the Uniprot protein; can differ from the one in Refseq
Uniprot residue               Reference residue in the Uniprot protein; can differ from the one in Refseq
Refseq position               Variant position in the Refseq protein; can differ from the one in Uniprot
Refseq residue                Reference residue in the Refseq protein; can differ from the one in Uniprot
Func. region                  Variant position is within a region annotated by Uniprot as one of the following: ACT_SITE, BINDING, CARBOHYD, CA_BIND, CROSSLNK, DISULFID, DNA_BIND, METAL, MOD_RES, MOTIF, NON_STD, NP_BIND, SITE, ZN_FING
N.Cosmic                      Number of mutations in COSMIC for this protein
N.SNPs                        Number of SNPs in dbSNP for this protein
Protein                       Variant position maps to a PDB residue that is in a binding site with another protein
DNA/RNA                       Variant position maps to a PDB residue that is in a binding site with a DNA/RNA molecule
small.mol                     Variant position maps to a PDB residue that is in a binding site with a small molecule. Only the first 4 are shown in the main table; browse through the mapped PDB structures to see all small molecules. The following small molecules are ignored: PO4, PI, SO4, SUL, CL, BR, NO3, SCN, NH4, K, NA, LI, MG, DOD, NAG, MAN, GOL, CO3, FS4 (source: Polyphen)
Cosmic@position               COSMIC alterations at the Refseq position ±1
SNPs@position                 SNPs from dbSNP at the Refseq position ±1
gene's known role in cancer   Gene annotations from the CancerGenes database
regions@position              Known functional regions annotated by Uniprot at the variant position
domain@position               Nearby Pfam domains at the Uniprot position
domains                       All Pfam domains in the protein
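
As an illustration of the "gaps in MSA" and "MSA height" columns, the sketch below computes the fraction of gaps in one alignment column. It is not the server's implementation: gap_fraction is a hypothetical helper, the alignment is a made-up toy example, columns are 0-based, and "-" and "." are assumed to be the gap characters. The toy height is simply the number of rows, without the filtering of identical or highly similar sequences described in the table.

```python
from typing import Sequence

GAP_CHARS = frozenset("-.")  # assumed gap symbols


def gap_fraction(alignment: Sequence[str], column: int) -> float:
    """Fraction of aligned sequences that have a gap in the given 0-based column."""
    if not alignment:
        raise ValueError("alignment is empty")
    gaps = sum(1 for row in alignment if row[column] in GAP_CHARS)
    return gaps / len(alignment)


# Made-up alignment of four sequences, five columns each.
toy_msa = [
    "MKT-A",
    "MRT-A",
    "M-TQA",
    "MKTQA",
]

print(len(toy_msa))              # toy "MSA height": 4
print(gap_fraction(toy_msa, 3))  # 0.5
print(gap_fraction(toy_msa, 1))  # 0.25
```
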

Computational Biology Center | Memorial Sloan Kettering Cancer Center