The Comparison of Fuzzy Clustering Methods for Symbolic Interval-Valued Data

Marcin Pełka; Andrzej Dudek

doi:10.5604/01.3001.0014.1755

Marcin Pełka, Wrocław University of Economics, Department of Econometrics and Computer Science; Andrzej Dudek, Wrocław University of Economics, Department of Econometrics and Computer Science. Przegląd Statystyczny. Statistical Review, vol. 62, 2015, 3, pages 301-319. Published online: 30 September 2015.


ABSTRACT

Interval-valued data arise in practice when, for example, meteorological stations record monthly temperature ranges or markets report daily stock price ranges. The primary objective of this paper is to compare three existing fuzzy clustering methods for symbolic interval-valued data (fuzzy c-means clustering, adaptive fuzzy c-means clustering and fuzzy k-means clustering) with fuzzy spectral clustering. Fuzzy spectral clustering combines the spectral and fuzzy approaches in order to obtain better results, as measured by the Rand index for fuzzy clustering. Simulation studies on artificial and real data sets confirm that fuzzy spectral clustering is more useful and yields more stable results than the other fuzzy clustering methods for symbolic interval-valued data when the data feature different cluster structures, noisy variables and/or outliers.
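
To make the setting concrete, the following is a minimal sketch of fuzzy c-means applied to interval-valued data. It is not the authors' implementation. It assumes that each object is a list of (lower, upper) pairs, that dissimilarity is the squared Euclidean distance computed on the interval bounds, and that the fuzzifier is m = 2. The function names and the toy temperature intervals are hypothetical and serve only as an illustration.

```python
import random


def interval_sq_dist(x, g):
    """Squared Euclidean distance on interval bounds (an assumed choice)."""
    return sum((xl - gl) ** 2 + (xu - gu) ** 2
               for (xl, xu), (gl, gu) in zip(x, g))


def fuzzy_c_means_intervals(data, c, m=2.0, n_iter=100, tol=1e-9, seed=0):
    """Illustrative fuzzy c-means for objects given as lists of (lower, upper)."""
    rng = random.Random(seed)
    n, p = len(data), len(data[0])

    # Start from random memberships; each row is normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-12 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    protos = []
    for _ in range(n_iter):
        # Prototype step: membership-weighted mean of lower and upper bounds.
        protos = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            protos.append([
                (sum(w[i] * data[i][j][0] for i in range(n)) / total,
                 sum(w[i] * data[i][j][1] for i in range(n)) / total)
                for j in range(p)
            ])

        # Membership step: inverse-distance weighting with exponent 1/(m-1).
        max_change = 0.0
        for i in range(n):
            d = [interval_sq_dist(data[i], protos[k]) for k in range(c)]
            zeros = [k for k in range(c) if d[k] == 0.0]
            if zeros:
                # Object coincides with a prototype: share membership among ties.
                new = [1.0 / len(zeros) if k in zeros else 0.0 for k in range(c)]
            else:
                inv = [dk ** (-1.0 / (m - 1.0)) for dk in d]
                total = sum(inv)
                new = [v / total for v in inv]
            max_change = max(max_change,
                             max(abs(a - b) for a, b in zip(new, u[i])))
            u[i] = new

        if max_change < tol:
            break
    return u, protos


# Hypothetical toy data: two interval variables (e.g. temperature ranges).
cold = [[(-5, 2), (0, 6)], [(-6, 1), (-1, 5)], [(-4, 3), (1, 7)]]
warm = [[(15, 25), (18, 28)], [(16, 26), (17, 27)], [(14, 24), (19, 29)]]
u, protos = fuzzy_c_means_intervals(cold + warm, c=2)
labels = [row.index(max(row)) for row in u]
```

Each row of `u` holds the membership degrees of one object and sums to 1, and `labels` assigns every object to the cluster in which its membership is highest. The adaptive and spectral variants named in the abstract would change the distance or the representation on which this loop runs; they are not shown here.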

KEYWORDS

spectral clustering, fuzzy clustering, fuzzy partition, interval-valued data, symbolic data analysis
