Chapter 1: Introduction to Cluster Analysis
From the book Cluster Analysis and Data Mining.
- Ronald S. King
Machine Learning-Based Clustering Analysis: Foundational Concepts, Methods, and Applications
- Conference paper
- First Online: 04 December 2021
- Miquel Serra-Burriel
- Christopher Ames
Part of the book series: Acta Neurochirurgica Supplement (volume 134)
Unsupervised learning, the task of clustering observations so that observations within a cluster are more similar to one another than to those assigned to other clusters, is one of the central tasks of data science. Its exploratory and descriptive nature makes it one of the most underused and underappreciated methods. In the present chapter, we describe its core function with applied examples, explore different approaches, and discuss meaningful applications for the practicing researcher.
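The following is a minimal sketch of k-means (Lloyd's algorithm), one common clustering method. It illustrates what "more similar within a cluster" means in practice. It is our own example and not necessarily the approach or the applied examples used in the chapter. It assumes numeric observations compared by Euclidean distance. For reproducibility it uses a deterministic farthest-first initialization, which is an assumption; random restarts and k-means++ are common alternatives. The function names are hypothetical.

```python
import math


def farthest_first(points, k):
    """Deterministic seeding (an assumption made here for reproducibility):
    start at the first point, then repeatedly add the point farthest
    from every centroid chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd-style k-means: alternate assignment and centroid-update steps
    until no observation changes cluster (or max_iter is reached)."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each observation joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # converged: assignments are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels, centroids


points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Each pass first assigns every observation to its nearest centroid and then moves each centroid to the mean of its members. As a result, the within-cluster sum of squares never increases. The procedure stops at a local optimum, which is not necessarily the global one.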
Author information
Authors and affiliations
Epidemiology, Biostatistics and Prevention Institute, University of Zurich (UZH), Zurich, Switzerland
Miquel Serra-Burriel
Department of Neurological Surgery, University of California San Francisco (UCSF), San Francisco, CA, USA
Christopher Ames
Corresponding author
Correspondence to Miquel Serra-Burriel .
Editor information
Editors and affiliations
Machine Intelligence in Clinical Neuroscience (MICN) Laboratory, Department of Neurosurgery, Clinical Neuroscience Center, University Hospital Zurich, University of Zurich, Zurich, Switzerland
Victor E. Staartjes
Carlo Serra
Ethics declarations
Dr. Serra-Burriel reports receiving grant funding from the European Commission H2020 program and European Commission EiT Health program.
Dr. Ames reports receiving royalties from Stryker, Biomet Zimmer Spine, DePuy Synthes, NuVasive, Next Orthosurgical, K2M, and Medicrea; being a consultant to DePuy Synthes, Medtronic, Medicrea, and K2M; receiving research support from Titan Spine, DePuy Synthes, and ISSG; being on the editorial board of Operative Neurosurgery; receiving grant funding from SRS; being on the executive committee of ISSG; and being a director of Global Spine Analytics.
None in relation to the present work.
Electronic Supplementary Material
Supplementary content 12.1
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Serra-Burriel, M., Ames, C. (2022). Machine Learning-Based Clustering Analysis: Foundational Concepts, Methods, and Applications. In: Staartjes, V.E., Regli, L., Serra, C. (eds) Machine Learning in Clinical Neuroscience. Acta Neurochirurgica Supplement, vol 134. Springer, Cham. https://doi.org/10.1007/978-3-030-85292-4_12
DOI: https://doi.org/10.1007/978-3-030-85292-4_12
Published: 04 December 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-85291-7
Online ISBN: 978-3-030-85292-4
eBook Packages: Medicine, Medicine (R0)
Chapter 15: Cluster analysis. There are many other clustering methods. For example, a hierarchical divisive method follows the reverse procedure: it begins with a single cluster consisting of all observations, next forms 2, 3, etc. clusters, and ends with as many clusters as there are observations. It is not our intention to ...
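A divisive procedure can be sketched in a few lines. The snippet does not say how a cluster should be split, so this version assumes one common choice: at each step, split the widest non-singleton cluster in two with 2-means (sometimes called bisecting k-means). The function names are illustrative. The sketch records the partition at every level, from one cluster containing all observations down to n singletons.

```python
import math


def diameter(cluster):
    """Largest pairwise Euclidean distance inside a cluster (0 for a singleton)."""
    return max((math.dist(p, q) for p in cluster for q in cluster), default=0.0)


def two_means_split(cluster, n_iter=50):
    """Split one cluster in two with 2-means, seeded by two mutually distant points."""
    a = max(cluster, key=lambda p: math.dist(p, cluster[0]))
    b = max(cluster, key=lambda p: math.dist(p, a))
    centres = [a, b]
    for _ in range(n_iter):
        left = [p for p in cluster if math.dist(p, centres[0]) <= math.dist(p, centres[1])]
        right = [p for p in cluster if math.dist(p, centres[0]) > math.dist(p, centres[1])]
        if not left or not right:
            break
        centres = [tuple(sum(c) / len(g) for c in zip(*g)) for g in (left, right)]
    return left, right


def divisive(points):
    """Top-down hierarchy: begin with one cluster holding every observation
    and split until each observation forms its own cluster."""
    clusters = [list(points)]
    levels = [[list(c) for c in clusters]]
    while len(clusters) < len(points):
        # Assumed rule: split the non-singleton cluster with the largest diameter.
        i = max(range(len(clusters)),
                key=lambda j: (len(clusters[j]) > 1, diameter(clusters[j])))
        target = clusters.pop(i)
        left, right = two_means_split(target)
        if not left or not right:  # e.g. identical points: peel one off
            left, right = target[:1], target[1:]
        clusters += [left, right]
        levels.append([list(c) for c in clusters])
    return levels


points = [(0, 0), (0, 1), (5, 5), (5, 6)]
for level in divisive(points):
    print(level)
```

Running an agglomerative method in reverse would visit the same number of levels, from n clusters up to 1. The result of a divisive method depends heavily on the split rule, which is why it is stated here as an assumption.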
Cluster: a collection of data objects
- similar to one another within the same cluster
- dissimilar to the objects in other clusters
Cluster analysis: grouping a set of data objects into clusters. Clustering is unsupervised classification: there are no predefined classes. Typical applications: as a stand-alone tool to get insight into data distribution.
1. An introduction to cluster analysis. Alexander Novoselsky, Weizmann Institute of Science; Eugene Kagan, Ariel University. The processes of human learning, understanding, and cognition are at ...
Contributing areas of research include data mining, statistics, machine learning, spatial database technology, information retrieval, Web search, biology, marketing, and many other application areas. Owing to the huge amounts of data collected in databases, cluster analysis has recently become ...
8.1.1 What Is Cluster Analysis? Cluster analysis groups data objects based only on information found in the data that describes the objects and their relationships. The goal is that the objects within a group be similar (or related) to one another and different from (or unrelated to) the objects in other groups. The greater the similarity (or
The goal, an optimal application-independent cluster analysis method, is mathematically impossible. No free lunch theorem: every possible clustering method performs equally well on average over all possible substantive applications. Existing methods: there are many choices, including model-based, subspace, spectral, grid-based, graph-
The Handbook of Cluster Analysis provides a comprehensive and unified account of the main research developments in cluster analysis, written by active, distinguished researchers in this area, to help readers make informed choices of the most suitable clustering approach for their problem and make better use of existing cluster analysis tools. Handbook of Cluster Analysis provides a ...
Cluster Analysis: An Introduction. Cluster analysis is the generic name for a variety of mathematical methods for appraising similarities among a set of objects, where each object is described by measurements made on its attributes. The input to a cluster analysis is a data matrix having t columns, one for each object, and n rows, one for each ...
Quality of clustering
- A good clustering method will produce high-quality clusters with
  - high intra-class similarity: cohesive within clusters
  - low inter-class similarity: distinctive between clusters
- The quality of a clustering method depends on
  - the similarity measure used by the method,
  - its implementation, and
  - its ability to discover some or all of the hidden patterns
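One widely used way to score these two criteria together is the silhouette width. It compares each observation's cohesion with its own cluster against its separation from the nearest other cluster. The sketch below is a plain implementation that assumes Euclidean distance and at least two clusters. It is our illustration rather than the method of any source quoted here, and the function name is hypothetical.

```python
import math


def silhouette(points, labels):
    """Mean silhouette width over all observations (needs at least two clusters).

    For observation i: a = mean distance to the other members of its own
    cluster (cohesion), b = smallest mean distance to the members of any
    other cluster (separation), and s = (b - a) / max(a, b), in [-1, 1].
    """
    def mean_dist(p, cluster_label, exclude):
        ds = [math.dist(p, q) for j, q in enumerate(points)
              if labels[j] == cluster_label and j != exclude]
        return sum(ds) / len(ds) if ds else None

    scores = []
    for i, p in enumerate(points):
        a = mean_dist(p, labels[i], exclude=i)
        if a is None:  # singleton cluster: s = 0 by convention
            scores.append(0.0)
            continue
        b = min(mean_dist(p, c, exclude=None) for c in set(labels) if c != labels[i])
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
print(silhouette(points, [0, 0, 1, 1]))  # compact, well-separated clusters: close to 1
print(silhouette(points, [0, 1, 0, 1]))  # labels that mix the two groups: negative
```

Values near 1 indicate high intra-class and low inter-class similarity. Values near 0 or below suggest that observations sit between clusters or have been assigned to the wrong one.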
Cluster analysis is a method for segmentation and identifies homogeneous groups of objects (or cases, observations) called clusters. These objects can be individual customers, groups of customers, companies, or entire countries. Objects in a certain cluster should be as similar as possible to each other, but as distinct as possible from objects in other clusters.
Rosie Cornish. 2007. 1 Introduction. This handout is designed to provide only a brief introduction to cluster analysis and how it is done. Books giving further details are listed at the end. Cluster analysis is a multivariate method which aims to classify a sample of subjects (or objects) on the basis of a set of measured variables into a ...
3 Approaches to Cluster Analysis. Many data mining methods rely on some concept of the similarity between pieces of information encoded in the data of interest. Various names have been applied ...
This article provides an overview of methods used to cluster data, that is, to discover and allocate objects to unknown subgroups. We review cluster analysis techniques for ...
The term cluster analysis denotes a family of unsupervised methods (the training set is not labeled) which are able to identify groups (clusters) in a multidimensional space. A cluster is a collection of similar objects (people, animals, documents, chemical elements, stars, etc.) which are dissimilar to objects in other clusters. These methods belong to a larger group of classification ...
developing a consensus clustering methodology. As another example, a categorization of web pages based on text analysis can be enhanced by using the knowledge of topical document hierarchies available from Yahoo! or DMOZ. (e) Multi-view Clustering: Often the objects to be clustered have multiple aspects or "views", and base clusterings
Educational research has typically used a clustering method which minimises the within-cluster sum of squares. This may be likened to the creation, a posteriori, of maximally different treatment groups in the ANOVA sense. An important development in cluster analysis was the appearance of a model.
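The ANOVA analogy rests on a standard decomposition. For a fixed data set and any partition, the total sum of squares about the grand mean equals the within-cluster sum of squares plus the between-cluster sum of squares. The between-cluster part is each cluster's size times the squared distance from its mean to the grand mean. Because the total is fixed by the data, minimising the within-cluster part is the same as maximising the between-cluster part. The helpers in this minimal sketch are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return [sum(coord) / len(points) for coord in zip(*points)]


def sums_of_squares(points, labels):
    """Return (total, within, between) sums of squares for a labelled partition."""
    grand = centroid(points)
    total = sum(sq_dist(p, grand) for p in points)
    within = between = 0.0
    for c in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == c]
        centre = centroid(members)
        within += sum(sq_dist(p, centre) for p in members)  # spread inside cluster c
        between += len(members) * sq_dist(centre, grand)    # cluster c's offset from the grand mean
    return total, within, between


points = [(0, 0), (0, 2), (10, 0), (10, 2)]
print(sums_of_squares(points, [0, 0, 1, 1]))  # (104.0, 4.0, 100.0)
print(sums_of_squares(points, [0, 1, 0, 1]))  # (104.0, 100.0, 4.0)
```

Both partitions share the same total of 104. The first puts almost all of it into the between-cluster term, which is the "maximally different groups" reading of a within-cluster minimiser.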
Chapter 1: Introduction to Cluster Analysis was published in Cluster Analysis and Data Mining on page 1.
A. Data setting. Research in education that uses open-ended questions and is aimed at quantifying qualitative data usually involves the development of coding procedures. This requires an analysis of student answers in order to reveal (and then examine) patterns and trends, and to find common themes emerging from them.
Data analysis is a common method in modern scientific research, spanning communication science, computer science, and biology. Clustering, as a basic component of data analysis, plays a significant role. On one hand, many tools for cluster analysis have been created as the amount of information has grown and disciplines have intersected. On the other hand, each clustering ...
Clustering algorithms partition data objects into subsets (clusters) based on similarity or dissimilarity. Patterns within a valid cluster are more similar to each other than they are to a pattern belonging to a different cluster. The clustering process can be unsupervised, semisupervised, or supervised.
clustering analysis and visualization. The classification of objects into clusters requires some method for measuring the distance or the (dis)similarity between the objects. Chapter 3 covers the common distance measures used for assessing similarity between observations. Part II starts with partitioning clustering methods, which include:
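To illustrate how the choice of distance measure matters, here are three common options: Euclidean, Manhattan, and a correlation-based distance (1 minus Pearson's r). These are our selection and not necessarily the measures covered in the book quoted above. The correlation distance assumes profiles with non-zero variance. The function names are hypothetical.

```python
import math


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))


def correlation_distance(x, y):
    """1 - Pearson correlation: 0 for perfectly correlated profiles,
    2 for perfectly anti-correlated ones (undefined for constant profiles)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1 - cov / (sx * sy)


def distance_matrix(objects, metric):
    """Symmetric matrix of pairwise dissimilarities, a common input to hierarchical methods."""
    return [[metric(p, q) for q in objects] for p in objects]


profiles = [(1, 2, 3), (2, 4, 6), (3, 2, 1)]
for metric in (euclidean, manhattan, correlation_distance):
    print(metric.__name__, [[round(d, 2) for d in row] for row in distance_matrix(profiles, metric)])
```

The first two profiles are about 3.74 apart in Euclidean terms but have a correlation distance of (numerically) zero, because they share the same shape. Which objects count as "similar" therefore depends on the measure chosen.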
For future use and to inform methodological practices in second language research, we briefly report on a sample study of cluster analysis that uses open data. Open Research This article has earned Open Data and Open Materials badges for making publicly available the digitally‐shareable data and the components of the research methods needed ...
The gap statistic method is similar to the silhouette method; however, it compares the within-cluster variation obtained for each candidate number of clusters with the variation expected under a reference distribution simulated by Monte Carlo sampling. Figure 12.4 presents the optimal number of clusters selected by each of the described methods.
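Below is a minimal Monte Carlo sketch of the gap statistic of Tibshirani, Walther and Hastie (2001), with Gap(k) = E*[log W_k] - log W_k. The following choices are all assumptions of this sketch:

- W_k is the within-cluster sum of squares from a simple Lloyd k-means with farthest-first seeds. For squared Euclidean distance this equals their pooled pairwise-distance definition.
- Reference data sets are drawn uniformly over the bounding box of the observed data.
- n_ref = 20 reference data sets are used.
- k is chosen as the smallest value with Gap(k) >= Gap(k+1) - s_(k+1).

The function names are hypothetical.

```python
import math
import random


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_wcss(points, k, n_iter=50):
    """Within-cluster sum of squares W_k after Lloyd's k-means with farthest-first seeds."""
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centres)))
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda j: sq_dist(p, centres[j]))].append(p)
        # Move each centre to its group mean; an empty group keeps its old centre.
        centres = [tuple(sum(c) / len(g) for c in zip(*g)) if g else centres[j]
                   for j, g in enumerate(groups)]
    return sum(sq_dist(p, centres[j]) for j, g in enumerate(groups) for p in g)


def gap_statistic(points, k_max, n_ref=20, seed=0):
    """Return (chosen k, [Gap(1), ..., Gap(k_max)])."""
    rng = random.Random(seed)
    lo = [min(col) for col in zip(*points)]
    hi = [max(col) for col in zip(*points)]
    gaps, s = [], []
    for k in range(1, k_max + 1):
        log_w = math.log(kmeans_wcss(points, k))
        # Monte Carlo reference: uniform samples over the data's bounding box.
        ref = [math.log(kmeans_wcss(
                   [tuple(rng.uniform(l, h) for l, h in zip(lo, hi)) for _ in points], k))
               for _ in range(n_ref)]
        mean_ref = sum(ref) / n_ref
        sd = math.sqrt(sum((r - mean_ref) ** 2 for r in ref) / n_ref)
        gaps.append(mean_ref - log_w)
        s.append(sd * math.sqrt(1 + 1 / n_ref))  # simulation-error adjustment
    # Selection rule: smallest k with Gap(k) >= Gap(k+1) - s_(k+1).
    for k in range(1, k_max):
        if gaps[k - 1] >= gaps[k] - s[k]:
            return k, gaps
    return k_max, gaps


offsets = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5)]
points = [(cx + dx, cy + dy) for cx, cy in [(0, 0), (10, 10)] for dx, dy in offsets]
k, gaps = gap_statistic(points, k_max=4)
print(k, [round(g, 2) for g in gaps])
```

With two tight, well-separated groups, Gap(2) should clearly exceed Gap(1). Because the reference samples are random, the exact gap values depend on the seed and on n_ref. The sketch also assumes W_k > 0 for every k tried, so the logarithm is defined.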