You're designing a distributed cache system. You want low latency, high consistency, and maximum availability. The CAP theorem tells you that you can only choose two. You're training a neural network for production. You need high accuracy, low inference latency, and a small model for edge devices. You can't have all three simultaneously. You're optimizing a REST API. You want low response time, high throughput, and minimal memory usage. Each objective conflicts with at least one other.
Multi-objective optimization formalizes these problems where improving one objective necessarily worsens another. There is no single "best solution". There exists a set of Pareto-optimal solutions where you cannot improve any objective without sacrificing at least one other. This article implements the fundamental metrics for working with these solutions: Pareto dominance, optimal frontier, hypervolume, spacing, coverage, and crowding distance.
Why there is no best solution
A classic optimization problem has a single objective to maximize or minimize. Finding the shortest path between two points in a graph. Minimizing the mean squared error of a model. Maximizing the throughput of a processing pipeline. These problems have a unique optimal solution or a set of solutions with the same objective value.
Multi-objective problems have multiple objective functions that we want to optimize simultaneously. Formally, given a decision vector x in the search space, we want to optimize:
minimize f₁(x), f₂(x), ..., fₖ(x)
Where each fᵢ is a different objective function. We write "minimize" by convention, but maximizing f(x) is equivalent to minimizing -f(x).
The problem arises when the objectives are in conflict. Improving f₁ worsens f₂. A concrete example: designing an image compressor.
- Objective 1: Minimize compressed file size
- Objective 2: Minimize quality loss (measured with PSNR)
Increasing the compression rate reduces the size but degrades the quality. Preserving perfect quality requires lossless compression with larger sizes. There is no configuration that simultaneously minimizes both objectives. The solution is not a point, it's a curve of trade-offs.
Another real example: configuring a Kubernetes cluster.
- Objective 1: Minimize p99 latency of requests
- Objective 2: Minimize infrastructure cost (number of nodes)
- Objective 3: Maximize CPU utilization (avoid waste)
Horizontal scaling reduces latency but increases cost and reduces per-node utilization. Reducing nodes increases utilization but can saturate CPU causing high latencies. Each configuration (number of nodes, resource limits, autoscaling policies) represents a different point in the objective space.
Pareto dominance: comparing without weighting
Example: Choosing a cloud server
You have to choose an instance type for your application. Two options:
- Server A: 100ms latency, 50€/month
- Server B: 150ms latency, 80€/month
Server A dominates B because it's better in both aspects: faster AND cheaper. No one would rationally choose B. But if you compare:
- Server A: 100ms latency, 50€/month
- Server C: 80ms latency, 70€/month
Neither dominates the other. A is cheaper but C is faster. It depends on your priorities: if budget is critical you choose A, if speed is vital you choose C. There's no single correct answer.
When you have conflicting objectives, you can't simply add them together. The expression latency_ms + cost_eur doesn't make dimensional sense. Adding milliseconds with euros is invalid. Weighting with arbitrary weights w₁·latency + w₂·cost requires deciding the relative importance in advance, which is exactly what we want to avoid.
Pareto dominance defines a partial order relation between solutions without needing to weight objectives. A solution a dominates another solution b (written a ≺ b) if and only if:
- a is equal or better than b in all objectives: fᵢ(a) ≤ fᵢ(b) for all i
- a is strictly better than b in at least one objective: there exists some j where fⱼ(a) < fⱼ(b)
The first condition guarantees that a is not worse in any aspect. The second guarantees that a is genuinely better, not simply equal. If both solutions have exactly the same values in all objectives, neither dominates the other even though they are identical.
Concrete examples with two objectives (minimizing both):
Solution A: f₁=10, f₂=20
Solution B: f₁=15, f₂=25
A dominates B because 10 < 15 and 20 < 25. A is better in both objectives.
Solution A: f₁=10, f₂=30
Solution B: f₁=15, f₂=25
Neither dominates the other. A is better in f₁ but worse in f₂. B is better in f₂ but worse in f₁. These solutions are incomparable under Pareto dominance.
Solution A: f₁=10, f₂=20
Solution B: f₁=10, f₂=25
A dominates B because both are equal in f₁ but A is strictly better in f₂.
Python implementation:
import numpy as np
from typing import List
def dominates(a: np.ndarray, b: np.ndarray) -> bool:
"""
Verifies if solution a dominates solution b.
Assumes minimization in all objectives.
Args:
a: Array of objectives for solution a
b: Array of objectives for solution b
Returns:
True if a dominates b, False otherwise
"""
# Verify that a is equal or better in all objectives
all_better_or_equal = np.all(a <= b)
# Verify that a is strictly better in at least one objective
any_strictly_better = np.any(a < b)
return all_better_or_equal and any_strictly_better
# Usage example
sol_a = np.array([10, 20]) # f1=10, f2=20
sol_b = np.array([15, 25]) # f1=15, f2=25
sol_c = np.array([10, 30]) # f1=10, f2=30
print(f"A dominates B: {dominates(sol_a, sol_b)}") # True
print(f"A dominates C: {dominates(sol_a, sol_c)}") # True
print(f"C dominates A: {dominates(sol_c, sol_a)}") # False
print(f"B dominates C: {dominates(sol_b, sol_c)}") # False
This function assumes that all objectives are minimized. If you have objectives that are maximized, multiply those values by -1 before calling the function. For example, if f₂ is maximized, convert the problem using f₂' = -f₂ and minimize f₂'.
Pareto frontier: non-dominated solutions
Example: Configuring your database
You test 10 different configurations of your PostgreSQL: different buffer sizes, maximum connections, vacuum policies. You measure queries per second and memory used for each one. Of those 10 configurations, only 4 are on the Pareto frontier:
- Config 1: 1000 qps, 2GB RAM (speed extreme)
- Config 2: 800 qps, 1.5GB RAM (balance)
- Config 3: 600 qps, 1GB RAM (balance)
- Config 4: 400 qps, 0.5GB RAM (memory extreme)
The other 6 configurations are dominated: there is always a config that gives more qps with equal or less RAM. Those 6 can be automatically discarded. Only the 4 on the frontier deserve consideration because each represents a different optimal trade-off.
The Pareto set is the subset of all candidate solutions where no solution is dominated by another in the same set. Formally:
P = { x ∈ S | there does not exist y ∈ S such that y ≺ x }
Where S is the set of all candidate solutions. In other words, a solution is in the Pareto set if and only if no other solution dominates it.
The Pareto frontier is the image of this set in the objective space. If we graph all solutions with axes corresponding to each objective, the Pareto frontier is the curve formed by the non-dominated points.
The most straightforward algorithm to calculate the Pareto set is to compare each solution against all others:
def pareto_frontier(solutions: np.ndarray) -> np.ndarray:
"""
Calculates the Pareto frontier of a set of solutions.
Args:
solutions: Array of shape (n_solutions, n_objectives)
Returns:
Boolean array of size n_solutions where True indicates
that the solution is on the Pareto frontier
"""
n_solutions = solutions.shape[0]
is_pareto = np.ones(n_solutions, dtype=bool)
for i in range(n_solutions):
if not is_pareto[i]:
# If we already know that i is dominated, skip it
continue
for j in range(n_solutions):
if i == j:
continue
# If j dominates i, mark i as dominated
if dominates(solutions[j], solutions[i]):
is_pareto[i] = False
break
return is_pareto
# Generate example solutions
np.random.seed(42)
solutions = np.random.rand(100, 2) * 100
# Calculate frontier
pareto_mask = pareto_frontier(solutions)
pareto_solutions = solutions[pareto_mask]
print(f"Total solutions: {len(solutions)}")
print(f"Solutions on Pareto frontier: {len(pareto_solutions)}")
print(f"Percentage: {100 * len(pareto_solutions) / len(solutions):.1f}%")
This algorithm has O(n²) complexity where n is the number of solutions. For each solution i, it potentially compares against all other n-1 solutions. In problems with tens of thousands of solutions, this can become slow.
A common optimization is to sort the solutions by the first objective before comparing. If the solutions are sorted in ascending order by f₁, when we process solution i, we only need to compare against previous solutions that have f₁ ≤ f₁(i). This optimization reduces the number of comparisons in practice, although the asymptotic complexity remains O(n²).
def pareto_frontier_optimized(solutions: np.ndarray) -> np.ndarray:
"""
Optimized version that sorts by the first objective.
"""
# Sort by first objective
sorted_indices = np.argsort(solutions[:, 0])
sorted_solutions = solutions[sorted_indices]
n_solutions = solutions.shape[0]
is_pareto = np.ones(n_solutions, dtype=bool)
for i in range(n_solutions):
if not is_pareto[i]:
continue
# Only compare against previous solutions (f1 less than or equal)
for j in range(i):
if is_pareto[j] and dominates(sorted_solutions[j], sorted_solutions[i]):
is_pareto[i] = False
break
# Reverse the sorting to return in original order
original_order = np.empty(n_solutions, dtype=bool)
original_order[sorted_indices] = is_pareto
return original_order
Hypervolume: measuring frontier quality
Example: Comparing two tuning algorithms
You run two hyperparameter optimization frameworks for your model: Optuna and Ray Tune. Both give you Pareto frontiers optimizing accuracy vs training time. Which one worked better?
Hypervolume answers this. You define a "worst acceptable" reference point: 70% accuracy, 1 hour time. Hypervolume measures the area of space that each frontier dominates with respect to that reference point. If Optuna has hypervolume 0.45 and Ray Tune has 0.38, Optuna won: it covers more space of good solutions.
It's like measuring "how much territory conquered" each algorithm in the solution space. More dominated area = better solutions found.
Once you have a Pareto frontier, you need to measure how good it is. This is essential for comparing different optimization algorithms or for monitoring convergence during search.
Hypervolume (also called S-metric or Lebesgue measure) is the most robust metric for evaluating Pareto frontiers. It measures the volume of the objective space that is dominated by the frontier with respect to a reference point.
Formally, given a set of solutions P and a reference point r, the hypervolume is:
HV(P, r) = λ( ⋃_{x∈P} [f(x), r] )
Where λ is the Lebesgue measure (volume) and [f(x), r] is the hyperrectangle from point f(x) to reference point r.
In two dimensions, this is easily visualized. Each solution on the frontier defines a rectangle from that point to the reference point. Hypervolume is the total area covered by the union of these rectangles.
The reference point must be worse than any solution of interest in all objectives. Typically it is chosen as a point slightly worse than the nadir point (the point with the worst values of each objective on the frontier).
For two objectives, the calculation is efficient by sorting the points:
def hypervolume_2d(pareto_front: np.ndarray, reference_point: np.ndarray) -> float:
"""
Calculates the hypervolume of a Pareto frontier in 2D.
Args:
pareto_front: Array of shape (n_points, 2) with points on the frontier
reference_point: Array of shape (2,) with the reference point
Returns:
Hypervolume (area) dominated by the frontier
"""
# Sort by first objective (ascending)
sorted_front = pareto_front[np.argsort(pareto_front[:, 0])]
# Calculate accumulated area
hypervolume = 0.0
prev_x = reference_point[0]
for point in sorted_front:
# Width of rectangle: from previous point to this point
width = prev_x - point[0]
# Height of rectangle: from this point to reference point
height = reference_point[1] - point[1]
# Accumulate area
hypervolume += width * height
# Update X position for next rectangle
prev_x = point[0]
return hypervolume
# Example: two frontiers to compare
front_a = np.array([[1, 5], [2, 3], [3, 2], [5, 1]])
front_b = np.array([[1, 6], [3, 4], [4, 3], [6, 1]])
ref_point = np.array([7, 7])
hv_a = hypervolume_2d(front_a, ref_point)
hv_b = hypervolume_2d(front_b, ref_point)
print(f"Hypervolume frontier A: {hv_a:.2f}")
print(f"Hypervolume frontier B: {hv_b:.2f}")
print(f"Frontier {'A' if hv_a > hv_b else 'B'} dominates more space")
The most important property of hypervolume is monotonicity with respect to dominance: if a frontier P' completely dominates another frontier P (i.e., for each point in P there exists at least one point in P' that dominates it), then HV(P', r) ≥ HV(P, r). This makes hypervolume a theoretically sound metric for comparing frontiers.
For problems with three or more objectives, calculating exact hypervolume is computationally expensive (NP-hard). The most practical approximation is Monte Carlo:
def hypervolume_monte_carlo(pareto_front: np.ndarray,
reference_point: np.ndarray,
n_samples: int = 10000) -> float:
"""
Approximates hypervolume using Monte Carlo for any number of objectives.
Args:
pareto_front: Array of shape (n_points, n_objectives)
reference_point: Array of shape (n_objectives,)
n_samples: Number of random points to sample
Returns:
Hypervolume estimation
"""
n_objectives = pareto_front.shape[1]
# Find bounds of the sampling hyperrectangle
min_bounds = np.min(pareto_front, axis=0)
max_bounds = reference_point
# Generate random points uniformly in the hyperrectangle
random_points = np.random.uniform(
low=min_bounds,
high=max_bounds,
size=(n_samples, n_objectives)
)
# Count how many points are dominated by at least one frontier point
dominated_count = 0
for random_point in random_points:
# Check if any frontier point dominates this random point
is_dominated = False
for pareto_point in pareto_front:
# A frontier point dominates the random point if it is
# better or equal in all objectives
if np.all(pareto_point <= random_point):
is_dominated = True
break
if is_dominated:
dominated_count += 1
# Hypervolume is the proportion of dominated points
# multiplied by the total volume of the hyperrectangle
total_volume = np.prod(max_bounds - min_bounds)
hypervolume = (dominated_count / n_samples) * total_volume
return hypervolume
# Example with 3 objectives
front_3d = np.array([
[1, 5, 3],
[2, 3, 4],
[3, 2, 5],
[4, 1, 6]
])
ref_3d = np.array([7, 7, 7])
hv_3d = hypervolume_monte_carlo(front_3d, ref_3d, n_samples=50000)
print(f"3D Hypervolume (Monte Carlo): {hv_3d:.2f}")
The Monte Carlo method converges to the real value with O(1/√n) where n is the number of samples. With 10,000 samples you get approximately 1% error. For practical applications where you only need to compare frontiers relatively (A is better than B), this is sufficiently accurate.
Spacing: measuring uniform distribution
Example: Offering options to the user
Your e-commerce application optimizes delivery speed vs shipping cost. You find 10 Pareto-optimal solutions, but 8 are in the "very fast, very expensive" zone and only 2 in "slow, economical". Technically they are optimal, but you offer unbalanced options: the user looking for balance has no intermediate options.
Spacing measures this. Low spacing means uniform distribution: there are options distributed evenly across the entire frontier. High spacing means clustering: solutions are grouped leaving gaps. For UX, you want low spacing: give the user truly diverse options, not 8 variations of "express premium".
Hypervolume measures how close the frontier is to the theoretical optimum, but it doesn't measure how uniformly distributed the solutions are along the frontier. A frontier can have high hypervolume but all solutions concentrated in one region, leaving other regions without coverage.
Spacing quantifies the uniformity of the distribution. It measures the standard deviation of the distances to the nearest neighbor of each point:
S = √( (1/n) Σᵢ(dᵢ - d̄)² )
Where:
- n is the number of points on the frontier
- dᵢ is the minimum distance from point i to any other point
- d̄ is the mean of all minimum distances
Low spacing indicates uniform distribution (all solutions are approximately at the same distance from their neighbors). High spacing indicates clustering (some solutions are very close together and others very separated).
from scipy.spatial import distance_matrix
def spacing(pareto_front: np.ndarray) -> float:
"""
Calculates the spacing metric (distribution uniformity).
Args:
pareto_front: Array of shape (n_points, n_objectives)
Returns:
Spacing value (lower is better = more uniform)
"""
n_points = len(pareto_front)
if n_points < 2:
return 0.0
# Calculate distance matrix between all points
dist_matrix = distance_matrix(pareto_front, pareto_front)
# Put infinity on diagonal to ignore self-distance
np.fill_diagonal(dist_matrix, np.inf)
# Find minimum distance for each point
min_distances = np.min(dist_matrix, axis=1)
# Calculate standard deviation of these distances
mean_dist = np.mean(min_distances)
variance = np.mean((min_distances - mean_dist) ** 2)
spacing_value = np.sqrt(variance)
return spacing_value
# Compare two frontiers: one uniform and one with clustering
uniform_front = np.array([
[1, 5], [2, 4], [3, 3], [4, 2], [5, 1]
])
clustered_front = np.array([
[1, 5], [1.1, 4.9], [1.2, 4.8], # Cluster at one extreme
[4, 2], [5, 1] # Isolated points at the other
])
print(f"Spacing uniform frontier: {spacing(uniform_front):.4f}")
print(f"Spacing clustered frontier: {spacing(clustered_front):.4f}")
In practical applications, you want low spacing because it means you're giving the user diverse options. If all your solutions are in a small region of the frontier, the user can only choose between very similar trade-offs. A well-distributed frontier offers options from "optimize almost exclusively objective 1" to "optimize almost exclusively objective 2" with intermediate gradations.
Coverage: comparing two frontiers
Example: Compiler benchmark
You optimize compilation flags for GCC vs Clang for your codebase: smaller binary vs faster execution. GCC produces 12 Pareto-optimal configurations, Clang produces 15. Which is better?
Coverage(GCC, Clang) = 0.80 means that 80% of Clang's solutions are dominated by at least one from GCC. Coverage(Clang, GCC) = 0.25 means that only 25% of GCC's are dominated by Clang. Conclusion: GCC found better solutions, it dominates most of what Clang offers.
It's like a direct combat: "how many of your points are worse than at least one of mine?". High coverage in one direction = clear superiority.
When you compare two multi-objective optimization algorithms, you need to determine which produces better frontiers. Hypervolume tells you which dominates more space, but it doesn't tell you how many solutions from one frontier are dominated by the other.
The coverage metric C(A, B) measures what percentage of solutions in B are dominated by at least one solution in A:
C(A, B) = |{b ∈ B | ∃a ∈ A : a ≺ b}| / |B|
This metric is asymmetric: C(A, B) is not equal to C(B, A). If C(A, B) = 1, it means all solutions of B are dominated by at least one solution of A. If C(A, B) = 0, no solution of B is dominated by A.
def coverage(front_a: np.ndarray, front_b: np.ndarray) -> float:
"""
Calculates C(A, B): what percentage of B is dominated by A.
Args:
front_a: Array of shape (n_a, n_objectives)
front_b: Array of shape (n_b, n_objectives)
Returns:
Value between 0 and 1 (higher means A dominates more of B)
"""
n_b = len(front_b)
dominated_count = 0
for sol_b in front_b:
# Check if any solution from A dominates this solution from B
for sol_a in front_a:
if dominates(sol_a, sol_b):
dominated_count += 1
break # No need to keep checking other A solutions
return dominated_count / n_b if n_b > 0 else 0.0
# Example: compare two frontiers
front_a = np.array([[1, 5], [2, 3], [3, 2], [5, 1]])
front_b = np.array([[2, 6], [3, 4], [4, 3], [6, 2]])
c_ab = coverage(front_a, front_b)
c_ba = coverage(front_b, front_a)
print(f"C(A, B) = {c_ab:.2f}") # What percentage of B does A dominate
print(f"C(B, A) = {c_ba:.2f}") # What percentage of A does B dominate
if c_ab > c_ba:
print("Frontier A dominates more solutions of B than vice versa")
elif c_ba > c_ab:
print("Frontier B dominates more solutions of A than vice versa")
else:
print("Both frontiers have similar coverage")
In algorithm benchmarking, you typically run each algorithm multiple times with different random seeds, get one frontier per run, and calculate C(A, B) between all pairs of frontiers from both algorithms. Then you report mean and standard deviation of C(A, B) and C(B, A).
Crowding distance: preserving diversity in NSGA-II
Example: Evolving neural network architectures
You use a genetic algorithm to find optimal neural network architectures: accuracy vs number of parameters. After 100 generations, all solutions converge to the same region: networks of ~50M parameters with 92-93% accuracy. You lost diversity: there are no options for small mobile networks or large networks for maximum accuracy.
Crowding distance solves this. It assigns a "loneliness score" to each solution: solutions in dense regions receive low score, isolated solutions receive high score. During selection, the algorithm prefers lonely solutions for reproduction, incentivizing exploration of less populated regions of the frontier.
It's like distributing resources uniformly in a territory: if there are many people in one zone, you incentivize migration to empty zones to maintain balanced coverage of space.
The NSGA-II (Non-dominated Sorting Genetic Algorithm II) algorithm is one of the most popular evolutionary algorithms for multi-objective optimization. One of its key components is crowding distance, which estimates the density of solutions around each frontier point.
The problem it solves is the following: during evolution, the algorithm tends to converge toward frontier regions where solutions are locally optimal according to genetic operators. This causes clustering, leaving other regions unexplored. Crowding distance penalizes solutions in dense regions, incentivizing the algorithm to maintain diversity.
The crowding distance of a solution is calculated as the sum of distances to its neighbors in each objective, normalized by the objective's range:
CD(i) = Σⱼ [ (fⱼ(i+1) - fⱼ(i-1)) / (fⱼᵐᵃˣ - fⱼᵐⁱⁿ) ]
Where:
- i+1 is the immediate upper neighbor in objective j
- i-1 is the immediate lower neighbor in objective j
- fⱼᵐᵃˣ and fⱼᵐⁱⁿ are the maximum and minimum values of objective j across the entire frontier
Solutions at the extremes of each objective receive infinite crowding distance because they are unique (they have no neighbor in that direction).
def crowding_distance(pareto_front: np.ndarray) -> np.ndarray:
"""
Calculates the crowding distance for each point on the Pareto frontier.
Args:
pareto_front: Array of shape (n_points, n_objectives)
Returns:
Array of crowding distances of size n_points
"""
n_points, n_objectives = pareto_front.shape
# Initialize all distances to 0
distances = np.zeros(n_points)
# Calculate for each objective
for obj_idx in range(n_objectives):
# Sort by this objective
sorted_indices = np.argsort(pareto_front[:, obj_idx])
# Extract values of this objective sorted
obj_values = pareto_front[sorted_indices, obj_idx]
# Objective range (for normalization)
obj_range = obj_values[-1] - obj_values[0]
# Special cases: if all values are equal, range is 0
if obj_range == 0:
continue
# Assign infinite distance to extremes
distances[sorted_indices[0]] = np.inf
distances[sorted_indices[-1]] = np.inf
# Calculate distance for intermediate points
for i in range(1, n_points - 1):
# Distance to upper neighbor minus distance to lower neighbor
distance_contribution = (obj_values[i + 1] - obj_values[i - 1]) / obj_range
# Accumulate distance (sum contributions from all objectives)
distances[sorted_indices[i]] += distance_contribution
return distances
# Example: frontier with non-uniform distribution
front = np.array([
[1, 10], # Extreme objective 1
[2, 8],
[2.1, 7.9], # Very close to previous (should have low CD)
[5, 5],
[10, 1] # Extreme objective 2
])
cd = crowding_distance(front)
print("Crowding distances:")
for i, (point, distance) in enumerate(zip(front, cd)):
print(f"Point {i}: {point} -> CD = {distance:.3f if distance != np.inf else 'inf'}")
print("\nPoints sorted by crowding distance (from most isolated to most dense):")
sorted_indices = np.argsort(cd)[::-1]
for idx in sorted_indices:
cd_str = "inf" if cd[idx] == np.inf else f"{cd[idx]:.3f}"
print(f"Point {idx}: {front[idx]} -> CD = {cd_str}")
In NSGA-II, during parent selection for reproduction, when two solutions are on the same dominance front (same rank), the one with higher crowding distance is preferred. This incentivizes exploring less populated regions of the frontier.
The complete NSGA-II algorithm combines:
- Non-dominated sorting: classify solutions into multiple fronts (front 1 = Pareto frontier, front 2 = Pareto frontier of the rest, etc.)
- Crowding distance: calculate density within each front
- Tournament selection: compare solutions by rank first, by crowding distance if they have the same rank
- Standard genetic operators: crossover and mutation
Complete implementation: reusable class
Putting all metrics together in a cohesive class for use in real projects:
from dataclasses import dataclass
from typing import Tuple
import numpy as np
from scipy.spatial import distance_matrix
@dataclass
class ParetoMetrics:
"""Results of Pareto frontier analysis."""
hypervolume: float
spacing: float
n_points: int
crowding_distances: np.ndarray
class MultiObjectiveProblem:
"""
Class for working with multi-objective optimization problems.
"""
def __init__(self, solutions: np.ndarray, reference_point: np.ndarray = None):
"""
Args:
solutions: Array (n_solutions, n_objectives)
reference_point: Reference point for hypervolume.
If None, calculated automatically.
"""
self.solutions = solutions
self.n_solutions, self.n_objectives = solutions.shape
if reference_point is None:
# Reference point: slightly worse than the worst in each objective
self.reference_point = np.max(solutions, axis=0) * 1.1
else:
self.reference_point = reference_point
def get_pareto_front(self) -> Tuple[np.ndarray, np.ndarray]:
"""
Calculates the Pareto frontier.
Returns:
Tuple of (solutions on frontier, original indices)
"""
pareto_mask = self._pareto_frontier()
pareto_solutions = self.solutions[pareto_mask]
pareto_indices = np.where(pareto_mask)[0]
return pareto_solutions, pareto_indices
def _pareto_frontier(self) -> np.ndarray:
"""Pareto frontier algorithm implementation."""
is_pareto = np.ones(self.n_solutions, dtype=bool)
for i in range(self.n_solutions):
if not is_pareto[i]:
continue
for j in range(self.n_solutions):
if i == j:
continue
if self._dominates(self.solutions[j], self.solutions[i]):
is_pareto[i] = False
break
return is_pareto
@staticmethod
def _dominates(a: np.ndarray, b: np.ndarray) -> bool:
"""Verifies Pareto dominance."""
return np.all(a <= b) and np.any(a < b)
def calculate_metrics(self) -> ParetoMetrics:
"""
Calculates all metrics for the Pareto frontier.
Returns:
ParetoMetrics object with all metrics
"""
pareto_front, _ = self.get_pareto_front()
if self.n_objectives == 2:
hv = self._hypervolume_2d(pareto_front)
else:
hv = self._hypervolume_monte_carlo(pareto_front)
sp = self._spacing(pareto_front)
cd = self._crowding_distance(pareto_front)
return ParetoMetrics(
hypervolume=hv,
spacing=sp,
n_points=len(pareto_front),
crowding_distances=cd
)
def _hypervolume_2d(self, front: np.ndarray) -> float:
"""Exact hypervolume for 2 objectives."""
sorted_front = front[np.argsort(front[:, 0])]
hypervolume = 0.0
prev_x = self.reference_point[0]
for point in sorted_front:
width = prev_x - point[0]
height = self.reference_point[1] - point[1]
hypervolume += width * height
prev_x = point[0]
return hypervolume
def _hypervolume_monte_carlo(self, front: np.ndarray, n_samples: int = 10000) -> float:
"""Monte Carlo hypervolume for 3+ objectives."""
min_bounds = np.min(front, axis=0)
max_bounds = self.reference_point
random_points = np.random.uniform(
low=min_bounds,
high=max_bounds,
size=(n_samples, self.n_objectives)
)
dominated_count = 0
for random_point in random_points:
for pareto_point in front:
if np.all(pareto_point <= random_point):
dominated_count += 1
break
total_volume = np.prod(max_bounds - min_bounds)
return (dominated_count / n_samples) * total_volume
def _spacing(self, front: np.ndarray) -> float:
"""Calculates spacing."""
if len(front) < 2:
return 0.0
dist_matrix = distance_matrix(front, front)
np.fill_diagonal(dist_matrix, np.inf)
min_distances = np.min(dist_matrix, axis=1)
mean_dist = np.mean(min_distances)
variance = np.mean((min_distances - mean_dist) ** 2)
return np.sqrt(variance)
def _crowding_distance(self, front: np.ndarray) -> np.ndarray:
"""Calculates crowding distance."""
n_points = len(front)
distances = np.zeros(n_points)
for obj_idx in range(self.n_objectives):
sorted_indices = np.argsort(front[:, obj_idx])
obj_values = front[sorted_indices, obj_idx]
obj_range = obj_values[-1] - obj_values[0]
if obj_range == 0:
continue
distances[sorted_indices[0]] = np.inf
distances[sorted_indices[-1]] = np.inf
for i in range(1, n_points - 1):
distance_contribution = (obj_values[i + 1] - obj_values[i - 1]) / obj_range
distances[sorted_indices[i]] += distance_contribution
return distances
@staticmethod
def compare_fronts(front_a: np.ndarray, front_b: np.ndarray) -> Tuple[float, float]:
"""
Compares two frontiers using coverage.
Returns:
Tuple (C(A,B), C(B,A))
"""
def coverage(fa: np.ndarray, fb: np.ndarray) -> float:
dominated = 0
for sb in fb:
for sa in fa:
if MultiObjectiveProblem._dominates(sa, sb):
dominated += 1
break
return dominated / len(fb) if len(fb) > 0 else 0.0
return coverage(front_a, front_b), coverage(front_b, front_a)
# Complete usage example
if __name__ == "__main__":
# Generate test problem
np.random.seed(42)
solutions = np.random.rand(200, 2) * 100
# Create problem
problem = MultiObjectiveProblem(solutions)
# Get frontier
pareto_front, indices = problem.get_pareto_front()
print(f"Found {len(pareto_front)} solutions on Pareto frontier")
# Calculate metrics
metrics = problem.calculate_metrics()
print(f"\nFrontier metrics:")
print(f" Hypervolume: {metrics.hypervolume:.2f}")
print(f" Spacing: {metrics.spacing:.4f}")
print(f" Points on frontier: {metrics.n_points}")
# Show crowding distances
print(f"\nTop 5 most isolated points (highest crowding distance):")
sorted_cd = np.argsort(metrics.crowding_distances)[::-1]
for i in sorted_cd[:5]:
cd_str = "inf" if metrics.crowding_distances[i] == np.inf else f"{metrics.crowding_distances[i]:.3f}"
print(f" {pareto_front[i]} -> CD = {cd_str}")
This implementation is ready to use in production projects. It includes handling of special cases (frontiers with a single point, objectives with zero range, duplicate solutions) and uses NumPy efficiently for vectorized operations where possible.
Practical applications
Multi-objective optimization techniques are used in multiple domains:
Machine Learning: Hyperparameter tuning optimizing accuracy vs training time vs model size. Neural Architecture Search balancing accuracy vs latency vs FLOPs.
Software engineering: Compiler optimization balancing compilation time vs binary size vs execution speed. Resource allocation in clusters balancing latency vs cost vs utilization.
Neural networks: Network pruning optimizing accuracy vs sparsity vs inference speed. Quantization balancing precision vs size vs latency on edge devices.
Distributed systems: Cache configuration balancing hit rate vs memory used vs synchronization overhead. Processing pipeline design balancing throughput vs latency vs cost.
Finance: Portfolio optimization maximizing expected return while minimizing risk and respecting liquidity constraints.
In all these cases, the Pareto frontier provides a set of optimal solutions from which the user or system can choose according to context-specific preferences. The metrics implemented here allow rigorous quantification and comparison of the quality of these frontiers.