Customer Segmentation Using Machine Learning — AI Insight Architect

Customer segmentation is foundational work that comes before almost everything else. It groups your users into meaningful clusters so you can tailor your product and marketing intelligently.

Talvinder Singh, from a Pragmatic Leaders AI Insight Architect session

Customer segmentation divides your customer base into distinct groups with shared characteristics, behaviors, or needs. Its actual job is to enable precise targeting and resource allocation. Without clear segments, your marketing and product efforts become scattershot: you try to please everyone and end up pleasing no one.

This lesson focuses on K-Means clustering, one of the most common machine learning algorithms for segmentation. You will see how to implement it practically in Python, interpret the results, and apply the insights strategically. Ethical considerations are also critical — segmentation can reinforce biases or violate privacy if not handled responsibly.

Customer segmentation is the foundation for personalization and growth

Think about your users or customers as a large, diverse group. They vary by age, income, preferences, and behavior. Treating them all the same is a recipe for wasted effort.

Segmentation answers the question: Who are the distinct groups within your user base that will respond differently to your product or marketing?

For example, a food delivery app might find clusters of:

Young professionals ordering late-night snacks
Families ordering dinner on weekends
Health-conscious users ordering salads and juices

Each segment has different needs and responds to different incentives. Your campaigns, features, and messaging must align with these differences.

Why segmentation matters for Indian companies

India’s diversity is vast — across languages, cultures, income levels, and digital literacy. A one-size-fits-all approach rarely works.

Companies like Swiggy and Meesho have succeeded by deeply understanding their segments:

Swiggy targets urban young adults and office workers with quick delivery and meal combos.
Meesho focuses on tier-2/3 resellers who need vernacular support and social commerce features.

If you ignore segmentation, you risk building features or campaigns that fail to resonate or scale.

How K-Means clustering works: grouping customers by similarity

K-Means is a simple, intuitive algorithm that clusters data points into ‘k’ groups based on feature similarity.

Imagine you have 100 customers, each described by features like age, annual income, and spending score. With k=3, K-Means divides these customers into three segments, placing customers who are similar on these features in the same segment.

The algorithm works in four steps:

Initialization: Randomly select k data points as initial centroids (cluster centers).
Assignment: Assign each data point to the nearest centroid based on distance (usually Euclidean).
Update: Recalculate each centroid as the mean of all points assigned to that cluster.
Repeat: Repeat the assignment and update steps until centroids stabilize (stop changing significantly).

This iterative process finds clusters where customers within a cluster are more similar to each other than to customers in other clusters.
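
The four steps above can be sketched in plain NumPy. This is a minimal illustration, not how scikit-learn implements K-Means (scikit-learn uses a smarter initialization by default). The function name kmeans_sketch, its parameters, and the toy data are assumptions made for this example.

```python
import numpy as np

def kmeans_sketch(X, k, n_iter=100, tol=1e-6, seed=0):
    """Plain K-Means: random init, assign, update, repeat until stable."""
    rng = np.random.default_rng(seed)
    # Initialization: pick k distinct data points as starting centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment: Euclidean distance from every point to every centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update: each centroid becomes the mean of its assigned points
        # (an empty cluster keeps its previous centroid)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Repeat: stop once the centroids barely move
        moved = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if moved < tol:
            break
    return labels, centroids

# Toy data (hypothetical): three groups of (age, income, spending score)
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal([25, 30, 80], 3, size=(30, 3)),
    rng.normal([45, 90, 50], 3, size=(30, 3)),
    rng.normal([60, 40, 20], 3, size=(30, 3)),
])
labels, centroids = kmeans_sketch(X, k=3)
print(centroids.round(1))
```

Because the starting centroids are random, different seeds can settle on different groupings. That is one reason library implementations typically run several initializations and keep the best result.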

What features to use for clustering?

Choosing the right features is key. Common choices include:

Demographics: age, income, location
Behavioral: purchase frequency, product categories used, session duration
Psychographics: interests, lifestyle indicators (if available)

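The features must be numeric or converted to numeric form for K-Means. Because the algorithm relies on distance, features measured on larger scales dominate the result unless you standardize them first.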
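
One possible way to prepare mixed features is sketched here: one-hot encode the categorical column, then standardize every column. The column names and values are hypothetical. Treating one-hot columns as ordinary numeric inputs is a simplification that may not suit every dataset.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical raw data: one categorical and two numeric features
raw = pd.DataFrame({
    "city_tier": ["tier1", "tier2", "tier3", "tier1"],
    "age": [24, 41, 35, 52],
    "monthly_orders": [12, 3, 5, 1],
})

# One-hot encode the categorical column so every feature is numeric
numeric = pd.get_dummies(raw, columns=["city_tier"], dtype=float)

# Standardize so each column has mean 0 and unit variance
X = StandardScaler().fit_transform(numeric)

print(numeric.columns.tolist())
print(X.shape)
```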

Implementing customer segmentation in Python with K-Means

Here is a practical example using Python’s scikit-learn library on synthetic data representing 100 customers.

import pandas as pd
import numpy as np
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

# Reproducibility
np.random.seed(42)

# Generate synthetic customer data
num_samples = 100
data = {
    'CustomerID': np.arange(1, num_samples + 1),
    'Age': np.random.randint(18, 70, num_samples),
    'Annual Income (k$)': np.random.randint(15, 120, num_samples),
    'Spending Score (1-100)': np.random.randint(1, 101, num_samples)  # upper bound is exclusive
}

df = pd.DataFrame(data)

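# Standardize features so no single scale dominates the distance calculation
feature_cols = ['Age', 'Annual Income (k$)', 'Spending Score (1-100)']
scaled = (df[feature_cols] - df[feature_cols].mean()) / df[feature_cols].std()

# Apply K-Means clustering with 3 clusters
# (n_init is set explicitly so behavior is consistent across scikit-learn versions)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
df['Segment'] = kmeans.fit_predict(scaled)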

# Visualize the clusters by income and spending score
plt.figure(figsize=(10, 6))
plt.scatter(df['Annual Income (k$)'], df['Spending Score (1-100)'], c=df['Segment'], cmap='viridis')
plt.title('Customer Segmentation')
plt.xlabel('Annual Income (k$)')
plt.ylabel('Spending Score (1-100)')
plt.colorbar(label='Segment')
plt.show()

This code:

Creates a DataFrame with synthetic customer features.
Runs K-Means to assign each customer to one of three segments.
Plots the clusters so you can visually inspect the grouping.

You can adapt this to your real data by selecting relevant features and tuning the number of clusters.
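
Tuning the number of clusters usually means comparing several values of k. The sketch below shows two common heuristics, the elbow method (inertia) and the silhouette score, on synthetic stand-in data. The range of k values and the data are assumptions. On random data like this, neither metric is expected to point to a clear winner.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for real features (age, income, spending score)
rng = np.random.default_rng(42)
X = StandardScaler().fit_transform(np.column_stack([
    rng.integers(18, 70, 100),
    rng.integers(15, 120, 100),
    rng.integers(1, 101, 100),
]))

results = {}
for k in range(2, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    # inertia_: within-cluster sum of squared distances (lower is tighter)
    # silhouette: ranges from -1 to 1, higher means better-separated clusters
    results[k] = (model.inertia_, silhouette_score(X, model.labels_))
    print(f"k={k}  inertia={results[k][0]:.1f}  silhouette={results[k][1]:.3f}")
```

Inertia almost always falls as k grows, so look for the point where the drop flattens rather than for the minimum. Treat both numbers as inputs to a judgment call that also weighs whether the segments are actionable for the business.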

How to interpret and use segmentation results

Assigning customers to segments is only the start. The actual job is to translate these clusters into meaningful personas and strategies.

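Look at each segment’s average characteristics. With real data, the profiles might look like this (illustrative only; randomly generated synthetic data will not produce such clean groups):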

Segment A: Younger customers, moderate income, high spending score — possibly aspirational urban millennials.
Segment B: Older, higher income, moderate spending — maybe established professionals.
Segment C: Lower income, low spending — price-sensitive or occasional users.
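
A per-segment average table is one simple way to build such profiles. The sketch below uses a small hand-made DataFrame as a stand-in for data that already carries a Segment column. The values are invented for illustration.

```python
import pandas as pd

# Hypothetical segmented customers; in practice this is your DataFrame
# after K-Means has added a 'Segment' column
df = pd.DataFrame({
    "Age": [23, 27, 25, 52, 58, 55, 40, 44],
    "Annual Income (k$)": [45, 50, 48, 95, 110, 102, 22, 25],
    "Spending Score (1-100)": [82, 90, 77, 50, 55, 48, 15, 20],
    "Segment": [0, 0, 0, 1, 1, 1, 2, 2],
})

# Mean of each feature per segment, plus the number of customers in each
profile = df.groupby("Segment").mean().round(1)
profile["Customers"] = df.groupby("Segment").size()
print(profile)
```

Segment size matters as much as the averages: a striking profile that covers only a handful of customers may not justify its own campaign.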

Use these profiles to:

Develop targeted marketing campaigns: discounts for price-sensitive segments, premium offerings for high spenders.
Tailor product features: simplified UI for less tech-savvy segments, loyalty programs for frequent buyers.
Prioritize resources: focus on segments that drive revenue or have growth potential.

Collaboration across teams

As a PM, your job is to:

Work with data scientists or engineers who can run and refine segmentation models.
Partner with marketing to design segment-specific campaigns.
Coordinate with product design to customize user journeys per segment.

This cross-functional teamwork is critical to turn segmentation insights into business impact.

Ethical considerations in customer segmentation

Segmentation involves collecting and analyzing personal data. Here is what you must keep in mind:

Data privacy and consent: Ensure you have explicit permission to collect and use customer data. Follow laws like India’s IT Act and the Digital Personal Data Protection Act, 2023.
Avoid bias and discrimination: Segments should not reinforce stereotypes or exclude groups unfairly. For example, avoid using sensitive attributes like caste or religion in clustering.
Transparency: Be clear internally and externally about how segmentation is used. Customers should not be surprised by targeting they find intrusive.
Data security: Protect customer data from breaches or misuse.
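
Some of these points can be partly enforced in the data preparation step. The sketch below drops sensitive columns and replaces raw customer IDs with salted hashes before clustering. The column names, the helper prepare_for_clustering, and the salt value are all hypothetical. Hashing is pseudonymization, not full anonymization, and dropping sensitive columns does not remove other features that may act as proxies for them.

```python
import hashlib
import pandas as pd

# Hypothetical sensitive column names; adapt to your own schema
SENSITIVE = ["religion", "caste"]

def prepare_for_clustering(df, id_col="customer_id", salt="replace-with-secret"):
    """Drop sensitive attributes and replace raw IDs with salted hashes."""
    # drop() returns a new DataFrame, so the original is left untouched
    out = df.drop(columns=[c for c in SENSITIVE if c in df.columns])
    out[id_col] = out[id_col].astype(str).map(
        lambda v: hashlib.sha256((salt + v).encode()).hexdigest()[:16]
    )
    return out

customers = pd.DataFrame({
    "customer_id": [101, 102],
    "religion": ["A", "B"],
    "order_frequency": [4, 9],
})
safe = prepare_for_clustering(customers)
print(safe.columns.tolist())
```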

Ignoring these ethical aspects can damage trust and invite regulatory action.

From the field: How Indian startups use segmentation

Swiggy uses segmentation extensively to optimize offers and delivery logistics. They group users by order frequency, cuisine preference, and time of day to tailor promotions.

Meesho segments resellers by geography and language to provide vernacular support and relevant product catalogs.

These examples show that segmentation is not just academic — it drives real revenue and user satisfaction.

Field exercise: Build a simple segmentation model

Take 20 minutes to apply what you learned:

Collect a sample dataset (your company’s anonymized user data or a public dataset).
Choose 3-5 numeric features relevant to your business.
Use Python and scikit-learn to run K-Means clustering with k=3 or 4.
Analyze the cluster centroids and profile each segment.
Write down two concrete marketing or product actions for each segment.
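
For the centroid analysis step, a starting point might look like the sketch below. It assumes the features were standardized before clustering, so the centroids are mapped back to original units to make them readable. The feature names and the random data are placeholders for your own dataset.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Placeholder features and random data; swap in your own dataset
features = ["order_frequency", "avg_order_value", "days_since_last_order"]
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "order_frequency": rng.integers(1, 30, 200),
    "avg_order_value": rng.integers(100, 2000, 200),
    "days_since_last_order": rng.integers(0, 90, 200),
})

scaler = StandardScaler()
X = scaler.fit_transform(df[features])
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)

# Centroids live in scaled space; map them back to original units to read them
centroids = pd.DataFrame(
    scaler.inverse_transform(kmeans.cluster_centers_), columns=features
).round(1)
centroids["customers"] = np.bincount(kmeans.labels_, minlength=4)
print(centroids)
```

Each row is one segment: read across it to name the segment, then use the customers column to decide which segments are large enough to act on.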

This hands-on practice will build your confidence working with data teams and applying ML techniques.

Test yourself: The segmentation strategy call


You are a PM at a Series A Indian e-commerce startup based in Bangalore. The marketing team wants to launch personalized discount campaigns but lacks clear customer segments. You have access to data on customer age, order frequency, average order value, and preferred product categories.

The call: How do you approach building customer segments to support marketing? What features do you select, how do you validate segments, and how do you ensure ethical data use?

Your reasoning:

Where to go next

If you want to deepen your understanding of data analysis: Data Analysis for Product Managers
If you want to learn how to design experiments with segments: A/B Testing and Experimentation
If you want to explore ethical AI and responsible data use: Ethical AI and Data Governance
If you want to practice collaborating with data teams: Working with Data Science Teams
If you want to build predictive models beyond segmentation: Machine Learning Concepts for PMs
