Model Families and Performance — Course 2: LLM Architectures, Ethics, and Governance

Choosing the right model family is about balancing speed, control, and compliance — not just chasing the fanciest accuracy number.

Talvinder Singh, from a Pragmatic Leaders AI Product Leadership cohort, 2024

You are the CTO of a telemedicine startup. Off-the-shelf large language models like ChatGPT hallucinate medical facts, putting patient safety and compliance at risk. The actual job is to choose a model family that balances clinical accuracy, regulatory adherence, and cost-effectiveness. This lesson teaches you how to evaluate closed, open, and specialized models with Indian enterprise realities in mind.

Choosing an AI model is not about picking the one with the highest benchmark score. It is about matching the model’s strengths and weaknesses to your product’s constraints and your customers’ needs.

Closed Models Are the “Luxury Cars” of AI — Fast, Polished, But Opaque

Closed models are proprietary AI systems hosted exclusively by vendors like OpenAI or Google. You access them via APIs, paying per token or request.

Strengths:

State-of-the-art performance: GPT-4 scores 86.4% on the MMLU benchmark, making it top-tier for general knowledge tasks.
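Turnkey compliance: Vendors supply much of the compliance scaffolding (data processing agreements for GDPR and, in some cases, a Business Associate Agreement for HIPAA), which reduces your legal overhead. You remain responsible for how your own product handles patient data.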
Rapid integration: API-based access allows deployment in 1–2 days, ideal for prototyping or augmenting existing products.

Weaknesses:

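Cost: Per-token pricing adds up at scale. At launch, GPT-4 Turbo was priced at about $0.01 per 1,000 input tokens and $0.03 per 1,000 output tokens; the original GPT-4 32K tier cost $0.06 and $0.12 respectively.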
Black-box training: You cannot audit or adjust the model’s internal logic, making explainability and bias mitigation challenging.
Vendor lock-in: Dependence on a third-party API risks outages and pricing changes.
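
To see how per-token pricing adds up at scale, here is a minimal sketch of a monthly cost estimate. The function name monthly_api_cost, the workload figures, and the rates of $0.06 and $0.12 per 1,000 tokens are placeholder assumptions; substitute the vendor's current price sheet and your own traffic.

```python
def monthly_api_cost(requests_per_day, input_tokens, output_tokens,
                     usd_per_1k_input, usd_per_1k_output, days=30):
    """Estimate monthly spend for an API that bills per 1,000 tokens."""
    per_request = ((input_tokens / 1000) * usd_per_1k_input
                   + (output_tokens / 1000) * usd_per_1k_output)
    return per_request * requests_per_day * days


# Hypothetical workload: 10,000 consultations a day,
# 1,500 prompt tokens and 500 reply tokens per consultation.
cost = monthly_api_cost(
    requests_per_day=10_000,
    input_tokens=1_500,
    output_tokens=500,
    usd_per_1k_input=0.06,   # placeholder rate
    usd_per_1k_output=0.12,  # placeholder rate
)
print(f"Estimated monthly API spend: ${cost:,.0f}")
```

At these assumed rates a single request costs about $0.15, so the bill is driven almost entirely by volume and prompt length.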

Use cases: Rapid prototyping, applications requiring multimodal inputs (text + images), or products with strict compliance needs but limited AI engineering bandwidth.

Indian example: A telemedicine startup could send an X-ray together with the patient’s medical history to Google’s multimodal Gemini API and receive a draft assessment for a clinician to review, all through a closed, vendor-managed API.

Analogy: Closed models are like a five-star hotel — luxurious and easy to use, but you cannot remodel the kitchen or see what’s going on behind the scenes.

Open Models Are the “DIY Kits” — Flexible, Transparent, But Require Sweat Equity

Open models have publicly available weights and architectures. Examples include Meta’s LLaMA 2 and Mistral-7B. You can self-host, fine-tune, and audit these models.

Strengths:

Customizability: Retrain models like CodeLlama on your internal codebase or domain-specific data.
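Transparency: With the weights in hand, you can inspect model behavior directly, run your own bias evaluations on open datasets, and correct problems through fine-tuning with libraries like trl.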
Cost control: Self-hosting can reduce inference costs to $0.02–$0.04 per query, far cheaper than closed APIs at scale.

Weaknesses:

Inference costs: Running a large model like LLaMA 70B requires expensive GPUs and cloud infrastructure.
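Licensing complexity: “Open” does not always mean unrestricted. LLaMA 2 ships under Meta’s Llama 2 Community License, which permits commercial use but requires a separate license from Meta for products with more than 700 million monthly active users and imposes an acceptable use policy; violating the terms can lead to legal action.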
Operational complexity: You must manage deployment, scaling, and compliance audits yourself.

Use cases: Cost-sensitive projects with an in-house AI team, products requiring high transparency or domain-specific customization.

Indian example: CodeLlama-34B can be fine-tuned to generate SQL queries tailored to a company’s proprietary database schema, enabling automation without vendor lock-in.

Analogy: Open models are like IKEA furniture — affordable and modifiable, but you’ll sweat assembling it yourself.

Specialized Models Are the “Expert Surgeons” — Precision Tools for Niche Domains

Specialized models are fine-tuned or trained for narrow, high-stakes tasks. Examples include Med-PaLM 2 for medical question answering and BloombergGPT for financial analysis.

Strengths:

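Domain expertise: Med-PaLM 2 scores above 85% on USMLE-style medical exam questions (MedQA), which is evidence of strong medical knowledge but not a substitute for clinical validation.
Regulatory alignment: Vendors typically build and document these models with HIPAA, GxP, or financial regulations in mind, but you still need to validate your own deployment.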
Explainability and auditability: Designed to meet compliance and traceability requirements.

Weaknesses:

Narrow scope: A radiology-focused model cannot be repurposed for legal document analysis.
Vendor lock-in and cost: Premium pricing and dependency on vendor ecosystems (e.g., Med-PaLM 2, which Google has offered only to selected Google Cloud customers).
Limited flexibility: Cannot easily customize beyond the specialized domain.

Use cases: Healthcare, legal, and finance applications where errors have high consequences and audit trails are mandatory.

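Indian example: A finance-tuned model in the mold of BloombergGPT could analyze SEC filings to flag financial risk factors for Indian institutional investors. BloombergGPT itself is used inside Bloomberg’s own products and has not been released publicly.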

Analogy: Specialized models are expert surgeons — highly skilled at specific procedures but not general practitioners.

// scene:

CTO decision meeting at a telemedicine startup in Bangalore

You (CTO): “Our off-the-shelf models hallucinate medical facts. We need clinical accuracy, HIPAA compliance, and explainability.”

ML Lead: “We can use GPT-4 with Retrieval-Augmented Generation — ground responses with medical journals.”

Engineering Head: “Or we can fine-tune LLaMA 2 on PubMedQA with AWS BAA for compliance.”

Product Lead: “Med-PaLM 2 API is pre-validated but expensive. Cost is a concern.”

You (CTO): “Let’s compare costs and control. Fine-tuning LLaMA 2 gives us data control and cuts inference cost by 40% versus Med-PaLM.”

The decision leans toward fine-tuning an open model for its balance of compliance, cost, and control.

// tension:

Choosing between vendor-managed compliance and in-house control under cost constraints.

Economic Impact: Model Total Cost of Ownership (TCO) Comparison

Model Type	Upfront Cost	Inference Cost / Query	Compliance Effort
Closed (GPT-4)	$0 (API-based)	$0.06 – $0.12	Low (vendor-managed)
Open (LLaMA 2)	$5,000 – $50,000	$0.02 – $0.04	High (in-house audits)
Specialized	$10,000+ (license fee)	$0.10 – $0.20	Medium (vendor support)
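
The table can be turned into a rough break-even calculation: at what query volume does the upfront cost of self-hosting pay for itself? The sketch below uses only the ranges from the table; the function names are illustrative, and it deliberately ignores staffing and compliance effort, which can dominate in practice.

```python
def total_cost(upfront_usd, cost_per_query_usd, queries):
    """Total cost of ownership for a given query volume."""
    return upfront_usd + cost_per_query_usd * queries


def break_even_queries(upfront_a, per_query_a, upfront_b, per_query_b):
    """Volume at which option A (higher upfront, cheaper per query)
    costs the same as option B. Returns None if A never catches up."""
    if per_query_a >= per_query_b:
        return None
    return (upfront_a - upfront_b) / (per_query_b - per_query_a)


# Pessimistic for open: top of its ranges vs. the cheapest closed figure.
pessimistic = break_even_queries(50_000, 0.04, 0, 0.06)
# Optimistic for open: bottom of its ranges vs. the priciest closed figure.
optimistic = break_even_queries(5_000, 0.02, 0, 0.12)

print(f"Break-even between {optimistic:,.0f} and {pessimistic:,.0f} queries")
```

With the table's figures, self-hosting breaks even somewhere between roughly 50,000 and 2.5 million queries, a wide range that shows how sensitive the decision is to your actual volume.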

Case study: Mayo Clinic reduced diagnosis errors by 22% using Med-PaLM 2 but spent $2 million on integration and compliance efforts in 2023.

Step-by-Step Model Evaluation Strategy

Requirements Gathering
- Run benchmarks on your test dataset (e.g., 100 anonymized patient transcripts).
- Measure latency: GPT-4 responds in ~200ms; LLaMA 70B can take ~800ms on 8xA100 GPUs.
Compliance Check
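- Verify GDPR constraints: Do not train or fine-tune on EU patient data without a lawful basis such as explicit consent. This applies whichever model family you choose, but with open models the responsibility is entirely yours.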
- For HIPAA compliance, use cloud providers with Business Associate Agreements (BAA), such as AWS.
Deployment Planning
- Closed models integrate via API in 1–2 days.
- Open models require optimization: quantization using llama.cpp, GPU clustering, and monitoring.
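
The benchmarking and latency steps above can be sketched as a small harness that runs any model over your own test cases. This is a minimal illustration: model_fn stands for whatever callable wraps your API or local model, stub_model is a hypothetical stand-in, and substring matching is a deliberately crude scoring rule that a real medical evaluation would replace with clinician review or a proper metric.

```python
import math
import statistics
import time


def evaluate(model_fn, test_cases):
    """Run model_fn over (prompt, expected) pairs.
    Returns accuracy plus median and 95th-percentile latency in ms.
    Assumes test_cases is non-empty."""
    latencies_ms, correct = [], 0
    for prompt, expected in test_cases:
        start = time.perf_counter()
        answer = model_fn(prompt)
        latencies_ms.append((time.perf_counter() - start) * 1000)
        # Crude scoring: does the expected term appear in the answer?
        if expected.lower() in answer.lower():
            correct += 1
    latencies_ms.sort()
    p95_index = max(0, math.ceil(0.95 * len(latencies_ms)) - 1)
    return {
        "accuracy": correct / len(test_cases),
        "median_ms": statistics.median(latencies_ms),
        "p95_ms": latencies_ms[p95_index],
    }


def stub_model(prompt):
    """Hypothetical stand-in for a real API call or local model."""
    if "fever" in prompt:
        return "Paracetamol is commonly used for fever."
    return "I am not sure."


cases = [
    ("What is a common treatment for fever?", "paracetamol"),
    ("What is the first-line drug for type 2 diabetes?", "metformin"),
]
report = evaluate(stub_model, cases)
print(report)
```

Running the same cases through each candidate model gives you comparable accuracy and latency numbers from your own data instead of published benchmarks.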

// thread: #model-selection — Cross-functional team clarifying deployment feasibility and compliance.

Anjali (Compliance): Do we have a BAA in place with AWS for training LLaMA 2?

Rahul (ML Engineer): Yes, and we can encrypt all PHI during training.

Meera (Product): How does latency compare? Our doctors need sub-second responses.

Rahul (ML Engineer): GPT-4 is around 200ms; LLaMA 70B can hit 800ms on 8xA100 GPUs, but quantization can help.

Anjali (Compliance): Are we sure open-source licensing covers commercial use?

Rahul (ML Engineer): The Llama 2 Community License allows commercial use unless we pass 700 million monthly active users, and it comes with an acceptable use policy. We’ll audit compliance carefully.

Quiz: Match Model to Use Case

A startup needs a low-cost coding assistant.
- a) GPT-4
- b) Codellama-34b
A hospital prioritizes diagnostic accuracy.
- a) Med-PaLM 2
- b) GPT-4
A GDPR-compliant model with full transparency.
- a) LLaMA 2
- b) Mistral-7B (Apache 2.0 license)

Homework: Model Selection Simulation

// exercise: · 20 min

Model Selection Simulation

Scenario: Deploy a legal document reviewer for a European Union firm.
- Needs: GDPR compliance, explainability (e.g., SHAP values), multilingual support.
Compare these options:
- Closed: Claude 2 (constitutional AI).
- Open: Mixtral-8x7B (Apache 2.0 licensed).
- Specialized: Lexion AI (trained on SEC filings).
Choose a model and justify your decision based on cost, compliance, and performance.
Reflect on your trade-offs: Did you prioritize transparency over ease of use? Did cost influence your choice?
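
One way to structure the justification is a weighted decision matrix over cost, compliance, and performance. The sketch below is only a scaffold: the weights and the 1 to 5 scores are placeholder assumptions, not measurements of any real model, and the function name rank_options is illustrative. Replace every number with your own assessment before drawing a conclusion.

```python
def rank_options(scores, weights):
    """Rank options by weighted average score, highest first."""
    total_weight = sum(weights.values())
    ranked = []
    for name, criteria in scores.items():
        weighted = sum(criteria[c] * w for c, w in weights.items())
        ranked.append((name, round(weighted / total_weight, 2)))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Placeholder weights: how much each criterion matters to this firm.
weights = {"compliance": 5, "performance": 3, "cost": 2}

# Placeholder scores (1 = poor, 5 = excellent); fill in your own.
scores = {
    "closed": {"compliance": 3, "performance": 5, "cost": 2},
    "open": {"compliance": 4, "performance": 4, "cost": 4},
    "specialized": {"compliance": 4, "performance": 3, "cost": 2},
}

for name, score in rank_options(scores, weights):
    print(f"{name}: {score}")
```

Changing the weights and rerunning makes the trade-offs explicit, which is the point of the reflection question.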

Notes on Critical Tools and Risks

Hugging Face Transformers: Provides access to open models like Mistral-7B and LLaMA 2 with fine-tuning libraries.
AWS BAA: A signed Business Associate Agreement with AWS is essential for HIPAA-compliant healthcare deployments on its infrastructure.
llama.cpp: Enables quantization to reduce GPU costs and improve inference speed.
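
The saving from quantization follows from simple arithmetic: the memory needed for the weights is the parameter count times the bits per weight. The sketch below is a back-of-the-envelope estimate with an illustrative function name; it covers the weights only, and real quantized formats store a little extra per weight for scaling factors, so treat the numbers as lower bounds.

```python
def weight_memory_gb(params_billion, bits_per_weight):
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes).
    Excludes the KV cache, activations, and runtime overhead."""
    return params_billion * bits_per_weight / 8


for bits in (16, 8, 4):
    gb = weight_memory_gb(70, bits)
    print(f"70B parameters at {bits}-bit: ~{gb:.0f} GB")
```

Going from 16-bit to 4-bit weights cuts a 70B model from about 140 GB to about 35 GB, which is the difference between needing a multi-GPU server and fitting on far less hardware.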

Red Flags:

Skipping compliance audits for self-hosted open models risks GDPR or HIPAA fines.
Using closed models for sensitive data requires verifying vendor data policies (e.g., whether OpenAI retains your data or uses it for training, and how to opt out).
Ignoring inference latency can degrade user experience; patient chatbots need under 500ms response times.

Test yourself: Model Selection at a Fintech Startup

// learn the judgment

You are CTO at a Series B Indian fintech startup processing loan applications. The engineering lead proposes fine-tuning LLaMA 2 on proprietary credit data. The compliance officer warns about GDPR and RBI regulations. The CEO wants a fast-to-market solution and suggests using GPT-4 API despite costs.

The call: Which model family do you choose and why? How do you balance compliance, cost, and speed?

Your reasoning:


Where to go next

If you want to reduce inference costs while maintaining accuracy: LLM Optimization for Production
If you want to deploy AI responsibly in regulated sectors: Ethics, Bias, and Compliance
If you want hands-on practice with fine-tuning and RAG: Hands-On Labs: AI Deployment
If you want to understand transformer internals deeply: Transformer Architecture Deep Dive
