A novel fuzzy C-means approach for uncovering cholesterol consensus motif from human G-protein coupled receptors (GPCR)


Membrane cholesterol plays an important role in modulating the function of several membrane proteins. A special cholesterol-binding motif has been reported in these proteins, to which membrane cholesterol binds and thereby modulates their activity. This consensus motif occurs as a forward pattern known as CRAC (L/V-X(1-5)-Y-X(1-5)-R/K) and/or as a backward pattern known as CARC (R/K-X(1-5)-Y-X(1-5)-L/V). As such, it is a low-consensus motif, because substituting amino acids at the unconserved positions (‘X’) yields many combinations. To obtain a better consensus motif for cholesterol binding, it is worthwhile to look for it within a membrane protein superfamily (ABC transporters, GPCRs, etc.) and assign it as a signature motif. Therefore, in the current work, an attempt was made to identify the distribution of this motif in all seven helices of the GPCR family and to assign a consensus signature motif to each individual helix using a novel Fuzzy C-Means (FCM) approach. The workflow proceeds in four phases. First, GPCR protein sequences containing seven transmembrane (TM) helices were extracted from the UniProt database, and a cholesterol dictionary was designed for different window sizes. Second, sequences that start with R/K or L/V were filtered using both the CRAC and CARC cholesterol recognition patterns, yielding a set of filtered cholesterol motifs. Third, significant cholesterol motifs were identified using the FCM algorithm by computing the membership of sequences to different motifs and pattern matching against the different helices. Finally, the uncovered cholesterol motifs that matched TM helices were analyzed. From the results, we report an algorithm that can efficiently identify and assign cholesterol signature motifs in GPCR protein sequences and that can be further extended to other membrane proteins.
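
The CRAC and CARC patterns quoted above can be illustrated with a short regular-expression scan. This is only a minimal sketch of the pattern-matching idea, not the FCM workflow reported in this work. It assumes that X(1-5) means one to five residues of any type. The names `find_motifs` and `PATTERNS` and the toy sequence are hypothetical and chosen for illustration.

```python
import re

# Assumption: X(1-5) is read as one to five residues of any type.
X = "[A-Z]{1,5}"

# Lookahead wrappers let matches that start at different positions overlap.
# Each start position contributes at most one reported match.
PATTERNS = {
    "CRAC": re.compile(f"(?=([LV]{X}Y{X}[RK]))"),  # forward: L/V ... Y ... R/K
    "CARC": re.compile(f"(?=([RK]{X}Y{X}[LV]))"),  # backward: R/K ... Y ... L/V
}


def find_motifs(sequence, kind):
    """Return (start_index, matched_text) pairs for one motif kind."""
    pattern = PATTERNS[kind]
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]


# Toy sequence for illustration only, not a real GPCR helix.
toy = "AALAAYAAKAA"
print(find_motifs(toy, "CRAC"))  # [(2, 'LAAYAAK')]
print(find_motifs(toy, "CARC"))  # []
```

Scanning each annotated TM helix segment separately with such a function would give per-helix candidate motifs, which is the kind of input the filtering and clustering phases described above operate on.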