How to Choose an ML Problem — Artificial Intelligence for Managers

The key to solving an AI problem is to understand the problem clearly, break it down, and pick the right algorithm for each part.

Talvinder Singh, from a Pragmatic Leaders AI for Managers session

Choosing the right machine learning problem is the foundation of any successful AI project. Many teams jump straight into building models without fully grasping what problem they are solving or which algorithm fits best. This leads to wasted effort, poor results, and missed opportunities.

The actual job is to understand the business context, analyze the available data, and then identify which machine learning approach aligns with your goals. Not every problem requires a complex deep learning model. Sometimes, a simpler classification or clustering algorithm will do.

Understand the business problem before the ML problem

Before you start thinking about algorithms, you must have a clear understanding of the business problem. What outcome matters? What decision will the model support?

For example, if you are in e-commerce and want to increase sales, your problem might be: "How do we recommend products that the user is most likely to buy?" This framing guides the choice of algorithm and data.

A common mistake is to focus on what AI can do rather than what the business needs. The trap is building a model that is technically impressive but irrelevant to the customer or company goals.

The main types of machine learning problems

Machine learning problems generally fall into a few categories. Each corresponds to different business questions and requires different algorithms.

Problem Type	What it does	Example Use Case	Indian Company Example
Classification	Assigns data points to discrete classes or categories	Detect if an email is spam or not	Swiggy classifying customer complaints into types
Regression	Predicts continuous numeric values	Forecast demand for a product next month	Flipkart predicting inventory needs for Diwali sales
Clustering	Groups similar data points without predefined labels	Segment customers into groups for targeted marketing	Meesho segmenting resellers by buying behavior
Dimensionality Reduction	Reduces the number of features while preserving most of the information	Visualize high-dimensional customer data	CRED simplifying credit score factors for analysis
Reinforcement Learning	Learns optimal actions via trial and error	Optimize delivery routes dynamically	Dunzo improving delivery efficiency through routing

Classification algorithms in action

Classification is one of the most common ML problems. The goal is to accurately differentiate between two or more classes.

For example, an image classification task might label photos as "cat" or "dog." In fintech, classification could detect fraudulent transactions versus legitimate ones.

Classification algorithms include logistic regression, decision trees, random forests, and neural networks. The choice depends on data size, interpretability needs, and accuracy requirements.
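To make the idea concrete, here is a minimal sketch of logistic regression trained by gradient descent on a single feature. The transaction amounts, the fraud labels, and the learning-rate settings are all invented for illustration; a real team would normally use an established library and many more features.

```python
import math

# Hypothetical toy data: (transaction amount in thousands of rupees, is_fraud).
# The values are invented purely for illustration.
transactions = [
    (0.5, 0), (1.0, 0), (1.5, 0), (2.0, 0),
    (8.0, 1), (9.0, 1), (10.0, 1), (12.0, 1),
]


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(data, learning_rate=0.05, epochs=2000):
    """Fit one weight and one bias by batch gradient descent on log loss."""
    weight, bias = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in data:
            error = sigmoid(weight * x + bias) - y
            grad_w += error * x
            grad_b += error
        weight -= learning_rate * grad_w / n
        bias -= learning_rate * grad_b / n
    return weight, bias


def predict_fraud_probability(amount, weight, bias):
    """Return the model's estimated probability that a transaction is fraud."""
    return sigmoid(weight * amount + bias)


weight, bias = train_logistic_regression(transactions)
low = predict_fraud_probability(1.0, weight, bias)
high = predict_fraud_probability(10.0, weight, bias)
print(f"P(fraud | amount = 1k)  = {low:.2f}")
print(f"P(fraud | amount = 10k) = {high:.2f}")
```

The model outputs a probability rather than a hard label, so the business still has to choose the threshold at which a transaction is flagged. The learned weight is also easy to read, which is one reason simple models are attractive when interpretability matters.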

Clustering algorithms for segmentation

Clustering is unsupervised—you don't have predefined labels. Instead, the algorithm finds natural groupings in data.

For instance, you might cluster users based on browsing behavior to discover segments for personalized marketing. This is widely used in customer segmentation, recommendation systems, and targeted campaigns.

Common clustering algorithms include K-means, hierarchical clustering, and DBSCAN.
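As a rough illustration of K-means, the sketch below groups users by two made-up features. The data, the feature names, and the choice of starting centroids are assumptions; production code would typically rely on a library implementation with better initialization.

```python
# Hypothetical users: (sessions per week, average order value in hundreds of rupees).
users = [(1, 2), (2, 1), (1, 1), (2, 2), (9, 8), (8, 9), (9, 9), (8, 8)]


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_clusters(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
        for p in points
    ]


def update_centroids(points, labels, centroids):
    """Move each centroid to the mean of its members (unchanged if it has none)."""
    updated = []
    for index, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == index]
        if not members:
            updated.append(old)
            continue
        updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
    return updated


def k_means(points, initial_centroids, max_iterations=100):
    """Alternate assignment and update steps until the labels stop changing."""
    centroids = list(initial_centroids)
    labels = assign_clusters(points, centroids)
    for _ in range(max_iterations):
        centroids = update_centroids(points, labels, centroids)
        new_labels = assign_clusters(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


# Starting centroids are picked by hand here to keep the demo deterministic.
labels, centroids = k_means(users, initial_centroids=[users[0], users[-1]])
print(labels)
print(centroids)
```

Note that the algorithm only returns group numbers. Deciding what each segment means for the business, for example "light browsers" versus "heavy buyers", remains a human judgment.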

Regression for forecasting and prediction

Regression predicts continuous outcomes. You might forecast sales, predict customer lifetime value, or estimate delivery times.

Linear regression is the simplest form, but more complex models like polynomial regression, support vector regression, and neural nets can capture nonlinear relationships.
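A small sketch of simple linear regression, fitted with the standard least-squares formulas, shows what "predicting a continuous value" looks like in practice. The monthly demand figures are invented, and the single feature (month number) is an assumption chosen to keep the example short.

```python
# Hypothetical history: units sold in months 1 to 5.
months = [1, 2, 3, 4, 5]
demand = [118, 142, 160, 178, 202]


def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(months, demand)
forecast = intercept + slope * 6  # predicted demand for month 6
print(f"demand grows by about {slope:.1f} units per month")
print(f"forecast for month 6: {forecast:.1f} units")
```

The slope has a direct business reading (extra units per month), which makes a linear model a useful first baseline before trying anything more complex.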

Reinforcement learning for sequential decision-making

Reinforcement learning is less common but powerful for problems involving sequential actions and feedback, like robotics, game-playing, or dynamic pricing.

It requires a framework where the model learns from rewards and penalties over time.
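The reward-driven loop can be sketched with the simplest reinforcement-learning setting, a multi-armed bandit with an epsilon-greedy policy, applied to a pricing choice. The price points and revenue figures are hypothetical, and the rewards are kept deterministic so the demo is reproducible; real rewards would be noisy and would depend on the situation the system is in.

```python
import random

# Hypothetical price points and the revenue per visitor each one yields.
# The learner never reads this table directly; it only observes rewards.
TRUE_REVENUE = {99: 4.0, 149: 6.5, 199: 5.0}


def observe_reward(price):
    """Stand-in for the real environment (deterministic to keep the demo simple)."""
    return TRUE_REVENUE[price]


def epsilon_greedy(prices, steps=200, epsilon=0.1, seed=0):
    """Mostly exploit the best-known price, but explore a random one sometimes."""
    rng = random.Random(seed)
    estimates = {p: 0.0 for p in prices}
    counts = {p: 0 for p in prices}

    def try_price(price):
        reward = observe_reward(price)
        counts[price] += 1
        # Incremental update of the running average reward for this price.
        estimates[price] += (reward - estimates[price]) / counts[price]

    for price in prices:  # try every price once before acting greedily
        try_price(price)
    for _ in range(steps):
        if rng.random() < epsilon:
            price = rng.choice(prices)  # explore
        else:
            price = max(prices, key=lambda p: estimates[p])  # exploit
        try_price(price)
    return estimates, counts


estimates, counts = epsilon_greedy(list(TRUE_REVENUE))
best_price = max(estimates, key=estimates.get)
print(f"best price found: {best_price}")
print(f"times each price was tried: {counts}")
```

The key difference from the earlier problem types is that there is no labelled dataset up front. The system generates its own data by acting, which is why reinforcement learning needs a safe environment in which to experiment.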

Breaking down complex problems into smaller parts

Most real-world problems are too complex to solve with a single algorithm or approach. A reliable pattern is to break the problem into smaller, manageable subproblems and apply the right algorithm to each.

For example, a recommendation system might involve:

Clustering users to identify segments
Classifying user interactions as positive or negative
Predicting the rating a user might give a product (regression)
Optimizing the order of recommendations (reinforcement learning)

Each step uses a different algorithm type, but together they solve the overall problem.
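The decomposition above can be sketched as a chain of small functions, one per subproblem. Every function here is a deliberately trivial, hypothetical stand-in for a real model, and the final ordering step is a plain sort rather than reinforcement learning; the point is only to show how the pieces fit together.

```python
def assign_segment(user):
    """Clustering stand-in: place the user in a segment."""
    return "frequent" if user["orders_per_month"] >= 4 else "occasional"


def is_positive_interaction(event):
    """Classification stand-in: was this interaction a good signal?"""
    return event["type"] in {"purchase", "add_to_cart"}


def predict_rating(segment, product):
    """Regression stand-in: estimate the rating this segment would give."""
    boost = 0.5 if segment == product["popular_with"] else 0.0
    return min(5.0, product["average_rating"] + boost)


def recommend(user, products):
    """Combine the steps: segment, filter out disliked items, then rank."""
    segment = assign_segment(user)
    disliked = {
        event["product"]
        for event in user["events"]
        if not is_positive_interaction(event)
    }
    candidates = [p for p in products if p["name"] not in disliked]
    ranked = sorted(
        candidates, key=lambda p: predict_rating(segment, p), reverse=True
    )
    return [p["name"] for p in ranked]


# Hypothetical user and catalogue.
user = {
    "orders_per_month": 5,
    "events": [
        {"product": "kurta", "type": "purchase"},
        {"product": "headphones", "type": "return"},
    ],
}
products = [
    {"name": "sneakers", "average_rating": 4.2, "popular_with": "occasional"},
    {"name": "headphones", "average_rating": 4.8, "popular_with": "occasional"},
    {"name": "kurta", "average_rating": 4.0, "popular_with": "frequent"},
]

recommendations = recommend(user, products)
print(recommendations)
```

Because each stage has its own input and output, a team can improve or replace one stage, and measure it separately, without rebuilding the whole system.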

The algorithmic journey and its importance

Understanding the algorithmic journey means recognizing that machine learning is not just about picking an algorithm but about a sequence of steps:

1. Define the problem clearly.
2. Collect and prepare data.
3. Choose the right algorithm(s).
4. Train and validate models.
5. Deploy and monitor performance.

Managers must grasp this journey to set realistic expectations, allocate resources effectively, and communicate clearly with technical teams.

Indian context: examples and considerations

Indian companies use machine learning in diverse ways. For instance:

Swiggy uses classification algorithms to categorize customer feedback and prioritize operational fixes.
Meesho applies clustering to segment resellers across different regions and tailor marketing campaigns.
Flipkart uses regression models to forecast demand spikes during festival seasons and optimize inventory.

In India, data quality and availability can be challenging due to regional languages, inconsistent formats, and sparse labels. This affects algorithm choice and model performance.

Meeting the challenge of data representation

Your role as a manager includes understanding how data is represented for algorithms.

A dataset is made up of instances (individual records). Each instance has features (attributes) and, in supervised problems, a label (the target outcome). For example, in a customer churn model:

Features: age, transaction frequency, last login date
Instance: a single customer record
Label: churned or not churned

Representing data well is critical to ML success.
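A short sketch shows how one churn instance might be turned into the numbers an algorithm actually consumes. The field names, the example values, and the choice to encode the login date as "days since last login" are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class CustomerRecord:
    """One instance: a single customer, with raw attributes and a label."""
    age: int
    transactions_per_month: float
    last_login: date
    churned: bool  # the label (target outcome)


def to_feature_vector(record, as_of):
    """Convert raw attributes to numbers; the date becomes days of inactivity."""
    days_since_login = (as_of - record.last_login).days
    return [
        float(record.age),
        record.transactions_per_month,
        float(days_since_login),
    ]


# Hypothetical customer.
record = CustomerRecord(
    age=34,
    transactions_per_month=2.5,
    last_login=date(2024, 1, 1),
    churned=True,
)
features = to_feature_vector(record, as_of=date(2024, 3, 1))
label = int(record.churned)
print(features, label)
```

Choices like how to encode a date are part of data representation, and they often matter as much as the choice of algorithm.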

Avoiding common pitfalls: overfitting and underfitting

Two common failure modes in ML:

Overfitting: The model learns the training data too well, including its noise, so it scores well on training data but performs poorly on new data.
Underfitting: The model is too simple to capture the underlying pattern, so it performs poorly on both training data and new data.

Understanding these helps in selecting the right model complexity and validation strategy.
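Both failure modes show up when training error is compared with error on held-out validation data. The sketch below fits three models of different complexity to invented data: a constant (too simple), a straight line, and a nearest-neighbour model that effectively memorizes the training points. The data and the choice of these three models are assumptions for illustration.

```python
# Hypothetical data that roughly follows y = 2x with some noise.
train = [(1, 2.3), (2, 3.8), (3, 6.4), (4, 7.7), (5, 10.2), (6, 11.8)]
validation = [(1.5, 3.1), (2.5, 4.9), (3.5, 7.2), (4.5, 8.8), (5.5, 11.1)]


def mean_squared_error(model, data):
    """Average squared gap between predictions and actual values."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


def fit_constant(data):
    """Too simple: always predict the average of the training targets."""
    mean_y = sum(y for _, y in data) / len(data)
    return lambda x: mean_y


def fit_line(data):
    """Least-squares straight line."""
    mean_x = sum(x for x, _ in data) / len(data)
    mean_y = sum(y for _, y in data) / len(data)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in data) / sum(
        (x - mean_x) ** 2 for x, _ in data
    )
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def fit_nearest_neighbour(data):
    """Memorizes the data: predict the target of the closest training point."""
    return lambda x: min(data, key=lambda point: abs(point[0] - x))[1]


models = {
    "constant": fit_constant(train),
    "line": fit_line(train),
    "nearest neighbour": fit_nearest_neighbour(train),
}
errors = {
    name: (mean_squared_error(m, train), mean_squared_error(m, validation))
    for name, m in models.items()
}
for name, (train_mse, validation_mse) in errors.items():
    print(f"{name:18s} train MSE = {train_mse:.3f}  validation MSE = {validation_mse:.3f}")
```

The constant model is poor on both sets (underfitting). The nearest-neighbour model is perfect on the training set but clearly worse than the line on validation data (overfitting). A large gap between training and validation error is the signal a manager can ask the team to report.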

Video: How to choose an ML Problem

Slack conversation: Clarifying the ML problem with the data science team

Thread: #ml-project, aligning product and data science on problem definition

Priya (Product Manager): We want to improve customer retention. Should we build a churn prediction model?

Rahul (Data Scientist): Yes, but first we need to define churn clearly and check if we have enough labeled data.

Priya: What algorithms do you recommend for this?

Rahul: Classification algorithms like logistic regression or random forests are good starting points. We'll also explore feature importance.

Priya: Sounds good. Let's start with data exploration and define success metrics.
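Rahul's first point, defining churn clearly, can be made concrete in a few lines. The sketch below assumes a hypothetical rule ("no activity for more than 30 days") and invented customer dates; the actual threshold is a business decision the team would need to agree on.

```python
from datetime import date

# Assumption: a customer counts as churned after more than 30 inactive days.
CHURN_WINDOW_DAYS = 30


def is_churned(last_active, as_of, window_days=CHURN_WINDOW_DAYS):
    """Apply the agreed churn definition to one customer."""
    return (as_of - last_active).days > window_days


# Hypothetical last-activity dates per customer.
last_active_by_customer = {
    "c1": date(2024, 5, 28),
    "c2": date(2024, 3, 10),
    "c3": date(2024, 5, 20),
    "c4": date(2024, 4, 15),
}
as_of = date(2024, 6, 1)

labels = {
    customer: is_churned(last_active, as_of)
    for customer, last_active in last_active_by_customer.items()
}
churn_rate = sum(labels.values()) / len(labels)
print(labels)
print(f"churn rate under this definition: {churn_rate:.0%}")
```

Changing the window changes who counts as churned, and therefore how many labeled examples of each class the model can learn from. That is why the definition has to be settled before any algorithm is chosen.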

Field Exercise: Identify your ML problem type

Choose your ML problem type (15 min)

Pick a business problem you want to solve with AI. Follow these steps:

1. Write down the specific business outcome you want to impact.
2. Describe the data you have or can collect related to this problem.
3. Decide if the problem is best framed as classification, regression, clustering, or another ML type.
4. List possible algorithms you might use for this problem type.
5. Note any challenges you foresee with data quality or availability.

Reflect on how breaking the problem into smaller parts might help.

Judgment Exercise

Scenario: You are a PM at a Series A fintech startup in Bangalore. The team wants to build an AI-powered fraud detection feature. The data scientist suggests a complex deep neural network, but the data is limited and noisy. You have to decide the approach.

Question: What is your recommendation for choosing the ML problem and algorithm? How do you communicate this to the team and leadership?

Expert reasoning: Advise starting with a simpler classification algorithm like logistic regression or decision trees to establish baseline performance. Emphasize the importance of data quality and problem definition before investing in complex models. Communicate that simpler models can be more interpretable and faster to deploy, reducing risk. Suggest iterative improvement based on initial results.

Common mistake: Approving a complex model upfront without sufficient data or problem clarity, leading to wasted time and confusion. Overlooking the business problem in favor of technical complexity.

Meeting scene: The algorithm choice debate

Scene: AI strategy meeting at a mid-stage SaaS startup in Pune.

CEO: “We need the most advanced AI model to impress investors.”

CTO: “Our data is limited. Starting with a complex model may backfire.”

You (PM): “Let's focus on the business problem first. What outcome do we want and what data do we have? A simpler algorithm might give us faster feedback.”

Data Scientist: “Agreed. We can prototype with classification or clustering algorithms and iterate.”

CEO: “I see. So the model choice depends on problem clarity, not just tech buzz.”

This conversation clarified expectations and aligned the team on a pragmatic approach.

Tension: Choosing the right ML algorithm requires balancing business goals, data constraints, and technical capability.

Where to go next

If you want to understand how to frame problems for AI: AI Product Strategy
If you want to learn how to gather and prepare data: Data Collection and Preparation
If you want to measure AI impact and success: Metrics and KPIs for AI Products
If you want to explore hands-on AI project workflows: Building AI Products
