Easily segment your audience using preference data and other survey answers with the Clustering Demo.
The Clustering Demo automatically groups your participants based on similar responses, helping you uncover potential market segments, identify patterns, and design strategies tailored to participants’ needs. With this demo, you can:
- Automatically group participants based on shared preferences.
- Uncover hidden market segments.
- Design targeted strategies based on participants’ unique needs and preferences.
- Make data-driven decisions with comprehensive market insights, optimizing product offerings and marketing campaigns.
Get started with the Clustering Demo
How to use the Clustering Demo
1: Prepare your data set
Please ensure the input data is in CSV format with the first column labelled participant_id. You can include any desired questions for the clustering exercise, except for Gabor-Granger results and open-ended responses.
In the example below, columns BB to BF can be included in the clustering exercise, whereas columns BH and CJ need to be excluded.
Additionally, you can incorporate individual preferences in the clustering exercise. Please note that for multiple-choice questions, it’s necessary to include a column with text responses instead of binary columns.
For illustration, in the following data set, columns AH to AL need to be excluded as they contain binary responses, whereas column AM can be included in the clustering exercise.
In summary:
- Start the first column with participant_id.
- Include any desired questions for the clustering exercise (except for Gabor-Granger results and open-ended responses).
- You have the option to incorporate individual preferences in the clustering exercise.
- Use a column with text responses instead of binary columns for multiple-choice questions.
- Save your data set in .csv format.
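As a rough illustration of the checklist above, the following sketch uses Python's standard csv module to move participant_id to the first column and drop columns that should not be clustered. The column names (brand_Acme, brands_text, comments, and so on) and the helper prepare_for_clustering are hypothetical and only stand in for your own survey export.

```python
import csv
import io


def prepare_for_clustering(rows, exclude):
    """Put participant_id first and drop the excluded columns."""
    kept = [c for c in rows[0] if c != "participant_id" and c not in exclude]
    header = ["participant_id"] + kept
    return header, [{c: row[c] for c in header} for row in rows]


# Hypothetical survey export; the column names are illustrative only.
raw = io.StringIO(
    "age_group,participant_id,brand_Acme,brand_Bolt,brands_text,comments\n"
    "18-24,p1,1,0,Acme,Too pricey\n"
    "25-34,p2,0,1,Bolt,\n"
)
rows = list(csv.DictReader(raw))

# Exclude the binary multiple-choice columns and the open-ended answers,
# keeping the single text column for the multiple-choice question.
header, records = prepare_for_clustering(
    rows, exclude={"brand_Acme", "brand_Bolt", "comments"}
)

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=header, lineterminator="\n")
writer.writeheader()
writer.writerows(records)
prepared_csv = out.getvalue()
print(prepared_csv)
```

To work with real files, replace the io.StringIO objects with files opened via open(..., newline="").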
The final data structure should look like:
2: Upload your data set
Open the Clustering Demo and upload your .csv file using the browse button. Once the file is uploaded, you can start performing the cluster analysis.
3: Select the parameters for the analysis
The demo offers three independent methodologies to compute solutions based on your data and research objectives.
The available methods are:
- Gaussian Mixture Models (GMM): GMMs are statistical models that cluster data by fitting a mixture of Gaussian distributions and estimating each respondent’s probability of belonging to each group. This method is well suited to complex and overlapping groups.
- K-means: The most popular algorithm. K-means splits data into K clusters by assigning points to the nearest centroids. It is efficient for large datasets but works best when groups are roughly spherical. Its performance also depends on choosing appropriate input features, for example by using domain knowledge and feature importance scores to identify influential features. Experimenting iteratively with different feature combinations can lead to more meaningful and interpretable clusters.
- Hierarchical Clustering: This algorithm builds a tree of clusters (a dendrogram) by successively merging the most similar groups. It’s well suited to exploring customer segmentation in retail based on purchasing behaviour, identifying distinct customer personas in marketing based on demographics and preferences, and grouping similar user profiles in social networks based on interaction patterns.
Each methodology offers distinct advantages, allowing you to tailor your clustering approach to best suit the nature of your data. By default, the demo employs GMM to group participants based on their shared features.
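To make the K-means description above more concrete, here is a minimal sketch of the assign-and-update loop in plain Python. It is not the demo's implementation: the function name kmeans, the seeding with the first k points, and the numeric example data (imagine two rating-scale answers per participant) are all assumptions made for illustration.

```python
import math


def kmeans(points, k, max_iter=100):
    """Basic K-means on a list of numeric tuples.

    Assumption: centroids are seeded with the first k points, which keeps
    the example deterministic; real tools usually use smarter seeding.
    """
    centroids = list(points[:k])
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the solution is stable
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return labels, centroids


# Hypothetical numeric answers for six participants.
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.1, 4.9), (0.9, 1.1), (4.8, 5.2)]
labels, centroids = kmeans(points, k=2)
print(labels)
```

Because every point is pulled towards a single mean, this procedure favours compact, roughly round groups, which is the limitation noted in the list above.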
4: Set up additional parameters for the analysis
You can also adjust the number of iterations for computing cluster solutions. The algorithm is run repeatedly, and each participant’s final classification is the group to which they were assigned most often (i.e. the mode obtained after n iterations).
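The idea of taking the mode across iterations can be sketched as follows. This is an illustration only: the function modal_assignment and the example labels are hypothetical, and the sketch assumes that cluster labels mean the same thing in every run (in practice, labels from separate runs may first need to be matched to each other).

```python
from collections import Counter


def modal_assignment(runs):
    """Return each participant's most frequent label across runs.

    runs: one list of cluster labels per iteration, all in the same
    participant order. Ties go to the label seen first.
    """
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*runs)]


# Three hypothetical iterations for four participants.
runs = [
    [1, 1, 2, 2],
    [1, 2, 2, 2],
    [1, 1, 2, 1],
]
final = modal_assignment(runs)
print(final)
```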
In addition, you can adjust the number of clusters when using the K-means and hierarchical methods. (GMM automatically computes the optimal number of groups.) The platform also suggests a maximum number of clusters based on your data, calculated as the average of more than ten different indices. This value is only a recommendation, and you are free to exceed it if you wish.
5: Perform the cluster analysis
The first output is a doughnut chart showing the clusters’ distribution, that is, the membership of each group within the overall sample. This chart will help you understand your data’s composition, identifying dominant segments, potential outliers, and areas that may require further exploration or targeted strategies.
This initial output can help you comprehensively explore your data’s underlying patterns and insights.
In addition, the mosaic plot allows you to profile your respondents. It is designed for categorical variables (numeric values are not supported) and highlights differences between groups by comparing the observed counts with those expected if cluster membership and the variable were statistically independent. In the example, the “monthly or less frequent” category is coloured red and blue. Blue (cluster 2) indicates that more respondents selected this option than would be expected if there were no relationship between cluster and answer, and red (cluster 1) indicates fewer. The mosaic plot helps you understand how categorical variables relate to the clusters in your dataset.
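The comparison behind the colouring can be illustrated with a small calculation. Shaded mosaic plots commonly colour cells by Pearson residuals, that is, (observed - expected) / sqrt(expected), where the expected count assumes cluster and answer are independent. Whether the demo uses exactly this statistic is an assumption, and the function name and counts below are made up for illustration.

```python
import math
from collections import Counter


def pearson_residuals(pairs):
    """Map each (cluster, category) cell to its Pearson residual.

    pairs: one (cluster, category) tuple per respondent.
    """
    pairs = list(pairs)
    n = len(pairs)
    observed = Counter(pairs)
    cluster_totals = Counter(cluster for cluster, _ in pairs)
    category_totals = Counter(category for _, category in pairs)
    residuals = {}
    for cluster, cluster_n in cluster_totals.items():
        for category, category_n in category_totals.items():
            # Count expected if cluster and category were independent.
            expected = cluster_n * category_n / n
            cell = (cluster, category)
            residuals[cell] = (observed[cell] - expected) / math.sqrt(expected)
    return residuals


# Hypothetical answers to a purchase-frequency question.
monthly = "monthly or less frequent"
pairs = (
    [(1, monthly)] * 2 + [(1, "weekly")] * 8
    + [(2, monthly)] * 8 + [(2, "weekly")] * 2
)
residuals = pearson_residuals(pairs)
for cell, value in sorted(residuals.items()):
    print(cell, round(value, 2))
```

A positive residual corresponds to a cell that is over-represented relative to independence, and a negative residual to one that is under-represented.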
6: Export cluster solutions
Finally, with this demo, you can download the complete dataset or only the cluster solutions, allowing you to conduct further analysis. When downloading the cluster solutions, you will obtain a CSV file containing all the clusters computed by the demo. This functionality offers the flexibility and freedom to delve deeper into the data, enabling you to explore additional insights and conduct customised analyses.
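If you download only the cluster solutions, you may want to join them back to your survey data for further analysis. The sketch below does this with Python's standard csv module. The layout of the exported file is an assumption here: the column name cluster and the sample rows are hypothetical, so adjust them to match the file you actually download.

```python
import csv
import io

# Hypothetical file contents; the real export's column names may differ.
survey_csv = "participant_id,age_group\np1,18-24\np2,25-34\n"
clusters_csv = "participant_id,cluster\np1,1\np2,2\n"

# Look up each participant's cluster by ID.
cluster_by_id = {
    row["participant_id"]: row["cluster"]
    for row in csv.DictReader(io.StringIO(clusters_csv))
}

# Append the cluster to every survey row (blank if the ID is missing).
merged = [
    dict(row, cluster=cluster_by_id.get(row["participant_id"], ""))
    for row in csv.DictReader(io.StringIO(survey_csv))
]
print(merged)
```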
Further analysis
A great way to get more insight from these cluster memberships is to use them to define segments in a Conjointly report. You can then:
- Run crosstabs of results by segment.
- Run preference share simulations by segment.