ES
$ cat ~/articles/optimizacion-multiobjetivo-pareto.md ← back to articles

You're designing a distributed cache system. You want low latency, high consistency, and maximum availability. The CAP theorem tells you that you can only choose two. You're training a neural network for production. You need high accuracy, low inference latency, and a small model for edge devices. You can't have all three simultaneously. You're optimizing a REST API. You want low response time, high throughput, and minimal memory usage. Each objective conflicts with at least one other.

Multi-objective optimization formalizes these problems where improving one objective necessarily worsens another. There is no single "best solution". There exists a set of Pareto-optimal solutions where you cannot improve any objective without sacrificing at least one other. This article implements the fundamental metrics for working with these solutions: Pareto dominance, optimal frontier, hypervolume, spacing, coverage, and crowding distance.

Why there is no best solution

A classic optimization problem has a single objective to maximize or minimize. Finding the shortest path between two points in a graph. Minimizing the mean squared error of a model. Maximizing the throughput of a processing pipeline. These problems have a unique optimal solution or a set of solutions with the same objective value.

Multi-objective problems have multiple objective functions that we want to optimize simultaneously. Formally, given a decision vector x in the search space, we want to optimize:

minimize f₁(x), f₂(x), ..., fₖ(x)

Where each fᵢ is a different objective function. We write "minimize" by convention, but maximizing f(x) is equivalent to minimizing -f(x).

The problem arises when the objectives are in conflict. Improving f₁ worsens f₂. A concrete example: designing an image compressor.

  • Objective 1: Minimize compressed file size
  • Objective 2: Minimize quality loss (measured with PSNR)

Increasing the compression rate reduces the size but degrades the quality. Preserving perfect quality requires lossless compression with larger sizes. There is no configuration that simultaneously minimizes both objectives. The solution is not a point, it's a curve of trade-offs.

Another real example: configuring a Kubernetes cluster.

  • Objective 1: Minimize p99 latency of requests
  • Objective 2: Minimize infrastructure cost (number of nodes)
  • Objective 3: Maximize CPU utilization (avoid waste)

Horizontal scaling reduces latency but increases cost and reduces per-node utilization. Reducing nodes increases utilization but can saturate CPU causing high latencies. Each configuration (number of nodes, resource limits, autoscaling policies) represents a different point in the objective space.

Pareto dominance: comparing without weighting

Example: Choosing a cloud server

You have to choose an instance type for your application. Two options:

  • Server A: 100ms latency, 50€/month
  • Server B: 150ms latency, 80€/month

Server A dominates B because it's better in both aspects: faster AND cheaper. No one would rationally choose B. But if you compare:

  • Server A: 100ms latency, 50€/month
  • Server C: 80ms latency, 70€/month

Neither dominates the other. A is cheaper but C is faster. It depends on your priorities: if budget is critical you choose A, if speed is vital you choose C. There's no single correct answer.


When you have conflicting objectives, you can't simply add them together. The expression latency_ms + cost_eur doesn't make dimensional sense. Adding milliseconds with euros is invalid. Weighting with arbitrary weights w₁·latency + w₂·cost requires deciding the relative importance in advance, which is exactly what we want to avoid.

Pareto dominance defines a partial order relation between solutions without needing to weight objectives. A solution a dominates another solution b (written ab) if and only if:

  1. a is equal or better than b in all objectives: fᵢ(a) ≤ fᵢ(b) for all i
  2. a is strictly better than b in at least one objective: there exists some j where fⱼ(a) < fⱼ(b)

The first condition guarantees that a is not worse in any aspect. The second guarantees that a is genuinely better, not simply equal. If both solutions have exactly the same values in all objectives, neither dominates the other even though they are identical.

Concrete examples with two objectives (minimizing both):

Solution A: f₁=10, f₂=20
Solution B: f₁=15, f₂=25

A dominates B because 10 < 15 and 20 < 25. A is better in both objectives.

Solution A: f₁=10, f₂=30
Solution B: f₁=15, f₂=25

Neither dominates the other. A is better in f₁ but worse in f₂. B is better in f₂ but worse in f₁. These solutions are incomparable under Pareto dominance.

Solution A: f₁=10, f₂=20
Solution B: f₁=10, f₂=25

A dominates B because both are equal in f₁ but A is strictly better in f₂.

Python implementation:

import numpy as np
from typing import List

def dominates(a: np.ndarray, b: np.ndarray) -> bool:
    """
    Verifies if solution a dominates solution b.
    Assumes minimization in all objectives.

    Args:
        a: Array of objectives for solution a
        b: Array of objectives for solution b

    Returns:
        True if a dominates b, False otherwise
    """
    # Verify that a is equal or better in all objectives
    all_better_or_equal = np.all(a <= b)

    # Verify that a is strictly better in at least one objective
    any_strictly_better = np.any(a < b)

    return all_better_or_equal and any_strictly_better


# Usage example
sol_a = np.array([10, 20])  # f1=10, f2=20
sol_b = np.array([15, 25])  # f1=15, f2=25
sol_c = np.array([10, 30])  # f1=10, f2=30

print(f"A dominates B: {dominates(sol_a, sol_b)}")  # True
print(f"A dominates C: {dominates(sol_a, sol_c)}")  # True
print(f"C dominates A: {dominates(sol_c, sol_a)}")  # False
print(f"B dominates C: {dominates(sol_b, sol_c)}")  # False

This function assumes that all objectives are minimized. If you have objectives that are maximized, multiply those values by -1 before calling the function. For example, if f₂ is maximized, convert the problem using f₂' = -f₂ and minimize f₂'.

Pareto frontier: non-dominated solutions

Example: Configuring your database

You test 10 different configurations of your PostgreSQL: different buffer sizes, maximum connections, vacuum policies. You measure queries per second and memory used for each one. Of those 10 configurations, only 4 are on the Pareto frontier:

  • Config 1: 1000 qps, 2GB RAM (speed extreme)
  • Config 2: 800 qps, 1.5GB RAM (balance)
  • Config 3: 600 qps, 1GB RAM (balance)
  • Config 4: 400 qps, 0.5GB RAM (memory extreme)

The other 6 configurations are dominated: there is always a config that gives more qps with equal or less RAM. Those 6 can be automatically discarded. Only the 4 on the frontier deserve consideration because each represents a different optimal trade-off.


The Pareto set is the subset of all candidate solutions where no solution is dominated by another in the same set. Formally:

P = { x ∈ S | there does not exist y ∈ S such that y ≺ x }

Where S is the set of all candidate solutions. In other words, a solution is in the Pareto set if and only if no other solution dominates it.

The Pareto frontier is the image of this set in the objective space. If we graph all solutions with axes corresponding to each objective, the Pareto frontier is the curve formed by the non-dominated points.

The most straightforward algorithm to calculate the Pareto set is to compare each solution against all others:

def pareto_frontier(solutions: np.ndarray) -> np.ndarray:
    """
    Calculates the Pareto frontier of a set of solutions.

    Args:
        solutions: Array of shape (n_solutions, n_objectives)

    Returns:
        Boolean array of size n_solutions where True indicates
        that the solution is on the Pareto frontier
    """
    n_solutions = solutions.shape[0]
    is_pareto = np.ones(n_solutions, dtype=bool)

    for i in range(n_solutions):
        if not is_pareto[i]:
            # If we already know that i is dominated, skip it
            continue

        for j in range(n_solutions):
            if i == j:
                continue

            # If j dominates i, mark i as dominated
            if dominates(solutions[j], solutions[i]):
                is_pareto[i] = False
                break

    return is_pareto


# Generate example solutions
np.random.seed(42)
solutions = np.random.rand(100, 2) * 100

# Calculate frontier
pareto_mask = pareto_frontier(solutions)
pareto_solutions = solutions[pareto_mask]

print(f"Total solutions: {len(solutions)}")
print(f"Solutions on Pareto frontier: {len(pareto_solutions)}")
print(f"Percentage: {100 * len(pareto_solutions) / len(solutions):.1f}%")

This algorithm has O(n²) complexity where n is the number of solutions. For each solution i, it potentially compares against all other n-1 solutions. In problems with tens of thousands of solutions, this can become slow.

A common optimization is to sort the solutions by the first objective before comparing. If the solutions are sorted in ascending order by f₁, when we process solution i, we only need to compare against previous solutions that have f₁ ≤ f₁(i). This optimization reduces the number of comparisons in practice, although the asymptotic complexity remains O(n²).

def pareto_frontier_optimized(solutions: np.ndarray) -> np.ndarray:
    """
    Optimized version that sorts by the first objective.
    """
    # Sort by first objective
    sorted_indices = np.argsort(solutions[:, 0])
    sorted_solutions = solutions[sorted_indices]

    n_solutions = solutions.shape[0]
    is_pareto = np.ones(n_solutions, dtype=bool)

    for i in range(n_solutions):
        if not is_pareto[i]:
            continue

        # Only compare against previous solutions (f1 less than or equal)
        for j in range(i):
            if is_pareto[j] and dominates(sorted_solutions[j], sorted_solutions[i]):
                is_pareto[i] = False
                break

    # Reverse the sorting to return in original order
    original_order = np.empty(n_solutions, dtype=bool)
    original_order[sorted_indices] = is_pareto

    return original_order

Hypervolume: measuring frontier quality

Example: Comparing two tuning algorithms

You run two hyperparameter optimization frameworks for your model: Optuna and Ray Tune. Both give you Pareto frontiers optimizing accuracy vs training time. Which one worked better?

Hypervolume answers this. You define a "worst acceptable" reference point: 70% accuracy, 1 hour time. Hypervolume measures the area of space that each frontier dominates with respect to that reference point. If Optuna has hypervolume 0.45 and Ray Tune has 0.38, Optuna won: it covers more space of good solutions.

It's like measuring "how much territory conquered" each algorithm in the solution space. More dominated area = better solutions found.


Once you have a Pareto frontier, you need to measure how good it is. This is essential for comparing different optimization algorithms or for monitoring convergence during search.

Hypervolume (also called S-metric or Lebesgue measure) is the most robust metric for evaluating Pareto frontiers. It measures the volume of the objective space that is dominated by the frontier with respect to a reference point.

Formally, given a set of solutions P and a reference point r, the hypervolume is:

HV(P, r) = λ( ⋃_{x∈P} [f(x), r] )

Where λ is the Lebesgue measure (volume) and [f(x), r] is the hyperrectangle from point f(x) to reference point r.

In two dimensions, this is easily visualized. Each solution on the frontier defines a rectangle from that point to the reference point. Hypervolume is the total area covered by the union of these rectangles.

The reference point must be worse than any solution of interest in all objectives. Typically it is chosen as a point slightly worse than the nadir point (the point with the worst values of each objective on the frontier).

For two objectives, the calculation is efficient by sorting the points:

def hypervolume_2d(pareto_front: np.ndarray, reference_point: np.ndarray) -> float:
    """
    Calculates the hypervolume of a Pareto frontier in 2D.

    Args:
        pareto_front: Array of shape (n_points, 2) with points on the frontier
        reference_point: Array of shape (2,) with the reference point

    Returns:
        Hypervolume (area) dominated by the frontier
    """
    # Sort by first objective (ascending)
    sorted_front = pareto_front[np.argsort(pareto_front[:, 0])]

    # Calculate accumulated area
    hypervolume = 0.0
    prev_x = reference_point[0]

    for point in sorted_front:
        # Width of rectangle: from previous point to this point
        width = prev_x - point[0]

        # Height of rectangle: from this point to reference point
        height = reference_point[1] - point[1]

        # Accumulate area
        hypervolume += width * height

        # Update X position for next rectangle
        prev_x = point[0]

    return hypervolume


# Example: two frontiers to compare
front_a = np.array([[1, 5], [2, 3], [3, 2], [5, 1]])
front_b = np.array([[1, 6], [3, 4], [4, 3], [6, 1]])
ref_point = np.array([7, 7])

hv_a = hypervolume_2d(front_a, ref_point)
hv_b = hypervolume_2d(front_b, ref_point)

print(f"Hypervolume frontier A: {hv_a:.2f}")
print(f"Hypervolume frontier B: {hv_b:.2f}")
print(f"Frontier {'A' if hv_a > hv_b else 'B'} dominates more space")

The most important property of hypervolume is monotonicity with respect to dominance: if a frontier P' completely dominates another frontier P (i.e., for each point in P there exists at least one point in P' that dominates it), then HV(P', r) ≥ HV(P, r). This makes hypervolume a theoretically sound metric for comparing frontiers.

For problems with three or more objectives, calculating exact hypervolume is computationally expensive (NP-hard). The most practical approximation is Monte Carlo:

def hypervolume_monte_carlo(pareto_front: np.ndarray,
                           reference_point: np.ndarray,
                           n_samples: int = 10000) -> float:
    """
    Approximates hypervolume using Monte Carlo for any number of objectives.

    Args:
        pareto_front: Array of shape (n_points, n_objectives)
        reference_point: Array of shape (n_objectives,)
        n_samples: Number of random points to sample

    Returns:
        Hypervolume estimation
    """
    n_objectives = pareto_front.shape[1]

    # Find bounds of the sampling hyperrectangle
    min_bounds = np.min(pareto_front, axis=0)
    max_bounds = reference_point

    # Generate random points uniformly in the hyperrectangle
    random_points = np.random.uniform(
        low=min_bounds,
        high=max_bounds,
        size=(n_samples, n_objectives)
    )

    # Count how many points are dominated by at least one frontier point
    dominated_count = 0

    for random_point in random_points:
        # Check if any frontier point dominates this random point
        is_dominated = False

        for pareto_point in pareto_front:
            # A frontier point dominates the random point if it is
            # better or equal in all objectives
            if np.all(pareto_point <= random_point):
                is_dominated = True
                break

        if is_dominated:
            dominated_count += 1

    # Hypervolume is the proportion of dominated points
    # multiplied by the total volume of the hyperrectangle
    total_volume = np.prod(max_bounds - min_bounds)
    hypervolume = (dominated_count / n_samples) * total_volume

    return hypervolume


# Example with 3 objectives
front_3d = np.array([
    [1, 5, 3],
    [2, 3, 4],
    [3, 2, 5],
    [4, 1, 6]
])
ref_3d = np.array([7, 7, 7])

hv_3d = hypervolume_monte_carlo(front_3d, ref_3d, n_samples=50000)
print(f"3D Hypervolume (Monte Carlo): {hv_3d:.2f}")

The Monte Carlo method converges to the real value with O(1/√n) where n is the number of samples. With 10,000 samples you get approximately 1% error. For practical applications where you only need to compare frontiers relatively (A is better than B), this is sufficiently accurate.

Spacing: measuring uniform distribution

Example: Offering options to the user

Your e-commerce application optimizes delivery speed vs shipping cost. You find 10 Pareto-optimal solutions, but 8 are in the "very fast, very expensive" zone and only 2 in "slow, economical". Technically they are optimal, but you offer unbalanced options: the user looking for balance has no intermediate options.

Spacing measures this. Low spacing means uniform distribution: there are options distributed evenly across the entire frontier. High spacing means clustering: solutions are grouped leaving gaps. For UX, you want low spacing: give the user truly diverse options, not 8 variations of "express premium".


Hypervolume measures how close the frontier is to the theoretical optimum, but it doesn't measure how uniformly distributed the solutions are along the frontier. A frontier can have high hypervolume but all solutions concentrated in one region, leaving other regions without coverage.

Spacing quantifies the uniformity of the distribution. It measures the standard deviation of the distances to the nearest neighbor of each point:

S = √( (1/n) Σᵢ(dᵢ - d̄)² )

Where:

  • n is the number of points on the frontier
  • dᵢ is the minimum distance from point i to any other point
  • d̄ is the mean of all minimum distances

Low spacing indicates uniform distribution (all solutions are approximately at the same distance from their neighbors). High spacing indicates clustering (some solutions are very close together and others very separated).

from scipy.spatial import distance_matrix

def spacing(pareto_front: np.ndarray) -> float:
    """
    Calculates the spacing metric (distribution uniformity).

    Args:
        pareto_front: Array of shape (n_points, n_objectives)

    Returns:
        Spacing value (lower is better = more uniform)
    """
    n_points = len(pareto_front)

    if n_points < 2:
        return 0.0

    # Calculate distance matrix between all points
    dist_matrix = distance_matrix(pareto_front, pareto_front)

    # Put infinity on diagonal to ignore self-distance
    np.fill_diagonal(dist_matrix, np.inf)

    # Find minimum distance for each point
    min_distances = np.min(dist_matrix, axis=1)

    # Calculate standard deviation of these distances
    mean_dist = np.mean(min_distances)
    variance = np.mean((min_distances - mean_dist) ** 2)
    spacing_value = np.sqrt(variance)

    return spacing_value


# Compare two frontiers: one uniform and one with clustering
uniform_front = np.array([
    [1, 5], [2, 4], [3, 3], [4, 2], [5, 1]
])

clustered_front = np.array([
    [1, 5], [1.1, 4.9], [1.2, 4.8],  # Cluster at one extreme
    [4, 2], [5, 1]                     # Isolated points at the other
])

print(f"Spacing uniform frontier: {spacing(uniform_front):.4f}")
print(f"Spacing clustered frontier: {spacing(clustered_front):.4f}")

In practical applications, you want low spacing because it means you're giving the user diverse options. If all your solutions are in a small region of the frontier, the user can only choose between very similar trade-offs. A well-distributed frontier offers options from "optimize almost exclusively objective 1" to "optimize almost exclusively objective 2" with intermediate gradations.

Coverage: comparing two frontiers

Example: Compiler benchmark

You optimize compilation flags for GCC vs Clang for your codebase: smaller binary vs faster execution. GCC produces 12 Pareto-optimal configurations, Clang produces 15. Which is better?

Coverage(GCC, Clang) = 0.80 means that 80% of Clang's solutions are dominated by at least one from GCC. Coverage(Clang, GCC) = 0.25 means that only 25% of GCC's are dominated by Clang. Conclusion: GCC found better solutions, it dominates most of what Clang offers.

It's like a direct combat: "how many of your points are worse than at least one of mine?". High coverage in one direction = clear superiority.


When you compare two multi-objective optimization algorithms, you need to determine which produces better frontiers. Hypervolume tells you which dominates more space, but it doesn't tell you how many solutions from one frontier are dominated by the other.

The coverage metric C(A, B) measures what percentage of solutions in B are dominated by at least one solution in A:

C(A, B) = |{b ∈ B | ∃a ∈ A : a ≺ b}| / |B|

This metric is asymmetric: C(A, B) is not equal to C(B, A). If C(A, B) = 1, it means all solutions of B are dominated by at least one solution of A. If C(A, B) = 0, no solution of B is dominated by A.

def coverage(front_a: np.ndarray, front_b: np.ndarray) -> float:
    """
    Calculates C(A, B): what percentage of B is dominated by A.

    Args:
        front_a: Array of shape (n_a, n_objectives)
        front_b: Array of shape (n_b, n_objectives)

    Returns:
        Value between 0 and 1 (higher means A dominates more of B)
    """
    n_b = len(front_b)
    dominated_count = 0

    for sol_b in front_b:
        # Check if any solution from A dominates this solution from B
        for sol_a in front_a:
            if dominates(sol_a, sol_b):
                dominated_count += 1
                break  # No need to keep checking other A solutions

    return dominated_count / n_b if n_b > 0 else 0.0


# Example: compare two frontiers
front_a = np.array([[1, 5], [2, 3], [3, 2], [5, 1]])
front_b = np.array([[2, 6], [3, 4], [4, 3], [6, 2]])

c_ab = coverage(front_a, front_b)
c_ba = coverage(front_b, front_a)

print(f"C(A, B) = {c_ab:.2f}")  # What percentage of B does A dominate
print(f"C(B, A) = {c_ba:.2f}")  # What percentage of A does B dominate

if c_ab > c_ba:
    print("Frontier A dominates more solutions of B than vice versa")
elif c_ba > c_ab:
    print("Frontier B dominates more solutions of A than vice versa")
else:
    print("Both frontiers have similar coverage")

In algorithm benchmarking, you typically run each algorithm multiple times with different random seeds, get one frontier per run, and calculate C(A, B) between all pairs of frontiers from both algorithms. Then you report mean and standard deviation of C(A, B) and C(B, A).

Crowding distance: preserving diversity in NSGA-II

Example: Evolving neural network architectures

You use a genetic algorithm to find optimal neural network architectures: accuracy vs number of parameters. After 100 generations, all solutions converge to the same region: networks of ~50M parameters with 92-93% accuracy. You lost diversity: there are no options for small mobile networks or large networks for maximum accuracy.

Crowding distance solves this. It assigns a "loneliness score" to each solution: solutions in dense regions receive low score, isolated solutions receive high score. During selection, the algorithm prefers lonely solutions for reproduction, incentivizing exploration of less populated regions of the frontier.

It's like distributing resources uniformly in a territory: if there are many people in one zone, you incentivize migration to empty zones to maintain balanced coverage of space.


The NSGA-II (Non-dominated Sorting Genetic Algorithm II) algorithm is one of the most popular evolutionary algorithms for multi-objective optimization. One of its key components is crowding distance, which estimates the density of solutions around each frontier point.

The problem it solves is the following: during evolution, the algorithm tends to converge toward frontier regions where solutions are locally optimal according to genetic operators. This causes clustering, leaving other regions unexplored. Crowding distance penalizes solutions in dense regions, incentivizing the algorithm to maintain diversity.

The crowding distance of a solution is calculated as the sum of distances to its neighbors in each objective, normalized by the objective's range:

CD(i) = Σⱼ [ (fⱼ(i+1) - fⱼ(i-1)) / (fⱼᵐᵃˣ - fⱼᵐⁱⁿ) ]

Where:

  • i+1 is the immediate upper neighbor in objective j
  • i-1 is the immediate lower neighbor in objective j
  • fⱼᵐᵃˣ and fⱼᵐⁱⁿ are the maximum and minimum values of objective j across the entire frontier

Solutions at the extremes of each objective receive infinite crowding distance because they are unique (they have no neighbor in that direction).

def crowding_distance(pareto_front: np.ndarray) -> np.ndarray:
    """
    Calculates the crowding distance for each point on the Pareto frontier.

    Args:
        pareto_front: Array of shape (n_points, n_objectives)

    Returns:
        Array of crowding distances of size n_points
    """
    n_points, n_objectives = pareto_front.shape

    # Initialize all distances to 0
    distances = np.zeros(n_points)

    # Calculate for each objective
    for obj_idx in range(n_objectives):
        # Sort by this objective
        sorted_indices = np.argsort(pareto_front[:, obj_idx])

        # Extract values of this objective sorted
        obj_values = pareto_front[sorted_indices, obj_idx]

        # Objective range (for normalization)
        obj_range = obj_values[-1] - obj_values[0]

        # Special cases: if all values are equal, range is 0
        if obj_range == 0:
            continue

        # Assign infinite distance to extremes
        distances[sorted_indices[0]] = np.inf
        distances[sorted_indices[-1]] = np.inf

        # Calculate distance for intermediate points
        for i in range(1, n_points - 1):
            # Distance to upper neighbor minus distance to lower neighbor
            distance_contribution = (obj_values[i + 1] - obj_values[i - 1]) / obj_range

            # Accumulate distance (sum contributions from all objectives)
            distances[sorted_indices[i]] += distance_contribution

    return distances


# Example: frontier with non-uniform distribution
front = np.array([
    [1, 10],   # Extreme objective 1
    [2, 8],
    [2.1, 7.9],  # Very close to previous (should have low CD)
    [5, 5],
    [10, 1]    # Extreme objective 2
])

cd = crowding_distance(front)

print("Crowding distances:")
for i, (point, distance) in enumerate(zip(front, cd)):
    print(f"Point {i}: {point} -> CD = {distance:.3f if distance != np.inf else 'inf'}")

print("\nPoints sorted by crowding distance (from most isolated to most dense):")
sorted_indices = np.argsort(cd)[::-1]
for idx in sorted_indices:
    cd_str = "inf" if cd[idx] == np.inf else f"{cd[idx]:.3f}"
    print(f"Point {idx}: {front[idx]} -> CD = {cd_str}")

In NSGA-II, during parent selection for reproduction, when two solutions are on the same dominance front (same rank), the one with higher crowding distance is preferred. This incentivizes exploring less populated regions of the frontier.

The complete NSGA-II algorithm combines:

  1. Non-dominated sorting: classify solutions into multiple fronts (front 1 = Pareto frontier, front 2 = Pareto frontier of the rest, etc.)
  2. Crowding distance: calculate density within each front
  3. Tournament selection: compare solutions by rank first, by crowding distance if they have the same rank
  4. Standard genetic operators: crossover and mutation

Complete implementation: reusable class

Putting all metrics together in a cohesive class for use in real projects:

from dataclasses import dataclass
from typing import Tuple
import numpy as np
from scipy.spatial import distance_matrix


@dataclass
class ParetoMetrics:
    """Results of Pareto frontier analysis."""
    hypervolume: float
    spacing: float
    n_points: int
    crowding_distances: np.ndarray


class MultiObjectiveProblem:
    """
    Class for working with multi-objective optimization problems.
    """

    def __init__(self, solutions: np.ndarray, reference_point: np.ndarray = None):
        """
        Args:
            solutions: Array (n_solutions, n_objectives)
            reference_point: Reference point for hypervolume.
                           If None, calculated automatically.
        """
        self.solutions = solutions
        self.n_solutions, self.n_objectives = solutions.shape

        if reference_point is None:
            # Reference point: slightly worse than the worst in each objective
            self.reference_point = np.max(solutions, axis=0) * 1.1
        else:
            self.reference_point = reference_point

    def get_pareto_front(self) -> Tuple[np.ndarray, np.ndarray]:
        """
        Calculates the Pareto frontier.

        Returns:
            Tuple of (solutions on frontier, original indices)
        """
        pareto_mask = self._pareto_frontier()
        pareto_solutions = self.solutions[pareto_mask]
        pareto_indices = np.where(pareto_mask)[0]

        return pareto_solutions, pareto_indices

    def _pareto_frontier(self) -> np.ndarray:
        """Pareto frontier algorithm implementation."""
        is_pareto = np.ones(self.n_solutions, dtype=bool)

        for i in range(self.n_solutions):
            if not is_pareto[i]:
                continue

            for j in range(self.n_solutions):
                if i == j:
                    continue

                if self._dominates(self.solutions[j], self.solutions[i]):
                    is_pareto[i] = False
                    break

        return is_pareto

    @staticmethod
    def _dominates(a: np.ndarray, b: np.ndarray) -> bool:
        """Verifies Pareto dominance."""
        return np.all(a <= b) and np.any(a < b)

    def calculate_metrics(self) -> ParetoMetrics:
        """
        Calculates all metrics for the Pareto frontier.

        Returns:
            ParetoMetrics object with all metrics
        """
        pareto_front, _ = self.get_pareto_front()

        if self.n_objectives == 2:
            hv = self._hypervolume_2d(pareto_front)
        else:
            hv = self._hypervolume_monte_carlo(pareto_front)

        sp = self._spacing(pareto_front)
        cd = self._crowding_distance(pareto_front)

        return ParetoMetrics(
            hypervolume=hv,
            spacing=sp,
            n_points=len(pareto_front),
            crowding_distances=cd
        )

    def _hypervolume_2d(self, front: np.ndarray) -> float:
        """Exact hypervolume for 2 objectives."""
        sorted_front = front[np.argsort(front[:, 0])]

        hypervolume = 0.0
        prev_x = self.reference_point[0]

        for point in sorted_front:
            width = prev_x - point[0]
            height = self.reference_point[1] - point[1]
            hypervolume += width * height
            prev_x = point[0]

        return hypervolume

    def _hypervolume_monte_carlo(self, front: np.ndarray, n_samples: int = 10000) -> float:
        """Monte Carlo hypervolume for 3+ objectives."""
        min_bounds = np.min(front, axis=0)
        max_bounds = self.reference_point

        random_points = np.random.uniform(
            low=min_bounds,
            high=max_bounds,
            size=(n_samples, self.n_objectives)
        )

        dominated_count = 0
        for random_point in random_points:
            for pareto_point in front:
                if np.all(pareto_point <= random_point):
                    dominated_count += 1
                    break

        total_volume = np.prod(max_bounds - min_bounds)
        return (dominated_count / n_samples) * total_volume

    def _spacing(self, front: np.ndarray) -> float:
        """Calculates spacing."""
        if len(front) < 2:
            return 0.0

        dist_matrix = distance_matrix(front, front)
        np.fill_diagonal(dist_matrix, np.inf)

        min_distances = np.min(dist_matrix, axis=1)
        mean_dist = np.mean(min_distances)
        variance = np.mean((min_distances - mean_dist) ** 2)

        return np.sqrt(variance)

    def _crowding_distance(self, front: np.ndarray) -> np.ndarray:
        """Calculates crowding distance."""
        n_points = len(front)
        distances = np.zeros(n_points)

        for obj_idx in range(self.n_objectives):
            sorted_indices = np.argsort(front[:, obj_idx])
            obj_values = front[sorted_indices, obj_idx]

            obj_range = obj_values[-1] - obj_values[0]
            if obj_range == 0:
                continue

            distances[sorted_indices[0]] = np.inf
            distances[sorted_indices[-1]] = np.inf

            for i in range(1, n_points - 1):
                distance_contribution = (obj_values[i + 1] - obj_values[i - 1]) / obj_range
                distances[sorted_indices[i]] += distance_contribution

        return distances

    @staticmethod
    def compare_fronts(front_a: np.ndarray, front_b: np.ndarray) -> Tuple[float, float]:
        """
        Compares two frontiers using coverage.

        Returns:
            Tuple (C(A,B), C(B,A))
        """
        def coverage(fa: np.ndarray, fb: np.ndarray) -> float:
            dominated = 0
            for sb in fb:
                for sa in fa:
                    if MultiObjectiveProblem._dominates(sa, sb):
                        dominated += 1
                        break
            return dominated / len(fb) if len(fb) > 0 else 0.0

        return coverage(front_a, front_b), coverage(front_b, front_a)


# Complete usage example
if __name__ == "__main__":
    # Generate test problem
    np.random.seed(42)
    solutions = np.random.rand(200, 2) * 100

    # Create problem
    problem = MultiObjectiveProblem(solutions)

    # Get frontier
    pareto_front, indices = problem.get_pareto_front()
    print(f"Found {len(pareto_front)} solutions on Pareto frontier")

    # Calculate metrics
    metrics = problem.calculate_metrics()
    print(f"\nFrontier metrics:")
    print(f"  Hypervolume: {metrics.hypervolume:.2f}")
    print(f"  Spacing: {metrics.spacing:.4f}")
    print(f"  Points on frontier: {metrics.n_points}")

    # Show crowding distances
    print(f"\nTop 5 most isolated points (highest crowding distance):")
    sorted_cd = np.argsort(metrics.crowding_distances)[::-1]
    for i in sorted_cd[:5]:
        cd_str = "inf" if metrics.crowding_distances[i] == np.inf else f"{metrics.crowding_distances[i]:.3f}"
        print(f"  {pareto_front[i]} -> CD = {cd_str}")

This implementation is ready to use in production projects. It includes handling of special cases (frontiers with a single point, objectives with zero range, duplicate solutions) and uses NumPy efficiently for vectorized operations where possible.

Practical applications

Multi-objective optimization techniques are used in multiple domains:

Machine Learning: Hyperparameter tuning optimizing accuracy vs training time vs model size. Neural Architecture Search balancing accuracy vs latency vs FLOPs.

Software engineering: Compiler optimization balancing compilation time vs binary size vs execution speed. Resource allocation in clusters balancing latency vs cost vs utilization.

Neural networks: Network pruning optimizing accuracy vs sparsity vs inference speed. Quantization balancing precision vs size vs latency on edge devices.

Distributed systems: Cache configuration balancing hit rate vs memory used vs synchronization overhead. Processing pipeline design balancing throughput vs latency vs cost.

Finance: Portfolio optimization maximizing expected return while minimizing risk and respecting liquidity constraints.

In all these cases, the Pareto frontier provides a set of optimal solutions from which the user or system can choose according to context-specific preferences. The metrics implemented here allow rigorous quantification and comparison of the quality of these frontiers.