Contents

Preface v

1 Introduction 1
  Mathematical Formulation 2
  Example: A Transportation Problem 4
  Continuous versus Discrete Optimization 4
  Constrained and Unconstrained Optimization 6
  Global and Local Optimization 6
  Stochastic and Deterministic Optimization 7
  Optimization Algorithms 7
  Convexity 8
  Notes and References 9

2 Fundamentals of Unconstrained Optimization 10
  2.1 What Is a Solution? 13
    Recognizing a Local Minimum 15
    Nonsmooth Problems 18
  2.2 Overview of Algorithms 19
    Two Strategies: Line Search and Trust Region 19
    Search Directions for Line Search Methods 21
    Models for Trust-Region Methods 26
    Scaling 27
    Rates of Convergence 28
    R-Rates of Convergence 29
  Notes and References 30
  Exercises 30

3 Line Search Methods 34
  3.1 Step Length 36
    The Wolfe Conditions 37
    The Goldstein Conditions 41
    Sufficient Decrease and Backtracking 41
  3.2 Convergence of Line Search Methods 43
  3.3 Rate of Convergence 46
    Convergence Rate of Steepest Descent 47
    Quasi-Newton Methods 49
    Newton's Method 51
    Coordinate Descent Methods 53
  3.4 Step-Length Selection Algorithms 55
    Interpolation 56
    The Initial Step Length 58
    A Line Search Algorithm for the Wolfe Conditions 58
  Notes and References 61
  Exercises 62

4 Trust-Region Methods 64
  Outline of the Algorithm 67
  4.1 The Cauchy Point and Related Algorithms 69
    The Cauchy Point 69
    Improving on the Cauchy Point 70
    The Dogleg Method 71
    Two-Dimensional Subspace Minimization 74
    Steihaug's Approach 75
  4.2 Using Nearly Exact Solutions to the Subproblem 77
    Characterizing Exact Solutions 77
    Calculating Nearly Exact Solutions 78
    The Hard Case 82
    Proof of Theorem 4.3 84
  4.3 Global Convergence 87
    Reduction Obtained by the Cauchy Point 87
    Convergence to Stationary Points 89
    Convergence of Algorithms Based on Nearly Exact Solutions 93
  4.4 Other Enhancements 94
    Scaling 94
    Non-Euclidean Trust Regions 96
  Notes and References 97
  Exercises 97

5 Conjugate Gradient Methods 100
  5.1 The Linear Conjugate Gradient Method 102
    Conjugate Direction Methods 102
    Basic Properties of the Conjugate Gradient Method 107
    A Practical Form of the Conjugate Gradient Method 111
    Rate of Convergence 112
    Preconditioning 118
    Practical Preconditioners 119
  5.2 Nonlinear Conjugate Gradient Methods 120
    The Fletcher-Reeves Method 120
    The Polak-Ribière Method 121
    Quadratic Termination and Restarts 122
    Numerical Performance 124
    Behavior of the Fletcher-Reeves Method 124
    Global Convergence 127
  Notes and References 131
  Exercises 132

6 Practical Newton Methods 134
  6.1 Inexact Newton Steps 136
  6.2 Line Search Newton Methods 139
    Line Search Newton-CG Method 139
    Modified Newton's Method 141
  6.3 Hessian Modifications 142
    Eigenvalue Modification 143
    Adding a Multiple of the Identity 144
    Modified Cholesky Factorization 145
    Gershgorin Modification 150
    Modified Symmetric Indefinite Factorization 151
  6.4 Trust-Region Newton Methods 154
    Newton-Dogleg and Subspace-Minimization Methods 154
    Accurate Solution of the Trust-Region Problem 155
    Trust-Region Newton-CG Method 156
    Preconditioning the Newton-CG Method 157
    Local Convergence of Trust-Region Newton Methods 159
  Notes and References 162
  Exercises 162

7 Calculating Derivatives 164
  7.1 Finite-Difference Derivative Approximations 166
    Approximating the Gradient 166
    Approximating a Sparse Jacobian 169
    Approximating the Hessian 173
    Approximating a Sparse Hessian 174
  7.2 Automatic Differentiation 176
    An Example 177
    The Forward Mode 178
    The Reverse Mode 179
    Vector Functions and Partial Separability 183
    Calculating Jacobians of Vector Functions 184
    Calculating Hessians: Forward Mode 185
    Calculating Hessians: Reverse Mode 187
    Current Limitations 188
  Notes and References 189
  Exercises 189

8 Quasi-Newton Methods 192
  8.1 The BFGS Method 194
    Properties of the BFGS Method 199
    Implementation 200
  8.2 The SR1 Method 202
    Properties of SR1 Updating 205
  8.3 The Broyden Class 207
    Properties of the Broyden Class 209
  8.4 Convergence Analysis 211
    Global Convergence of the BFGS Method 211
    Superlinear Convergence of BFGS 214
    Convergence Analysis of the SR1 Method 218
  Notes and References 219
  Exercises 220

9 Large-Scale Quasi-Newton and Partially Separable Optimization 222
  9.1 Limited-Memory BFGS 224
    Relationship with Conjugate Gradient Methods 227
  9.2 General Limited-Memory Updating 229
    Compact Representation of BFGS Updating 230
    SR1 Matrices 232
    Unrolling the Update 232
  9.3 Sparse Quasi-Newton Updates 233
  9.4 Partially Separable Functions 235
    A Simple Example 236
    Internal Variables 237
  9.5 Invariant Subspaces and Partial Separability 240
    Sparsity vs. Partial Separability 242
    Group Partial Separability 243
  9.6 Algorithms for Partially Separable Functions 244
    Exploiting Partial Separability in Newton's Method 244
    Quasi-Newton Methods for Partially Separable Functions 245
  Notes and References 247
  Exercises 248

10 Nonlinear Least-Squares Problems 250
  10.1 Background 253
    Modeling, Regression, Statistics 253
    Linear Least-Squares Problems 256
  10.2 Algorithms for Nonlinear Least-Squares Problems 259
    The Gauss-Newton Method 259
    The Levenberg-Marquardt Method 262
    Implementation of the Levenberg-Marquardt Method 264
    Large-Residual Problems 266
    Large-Scale Problems 269
  10.3 Orthogonal Distance Regression 271
  Notes and References 273
  Exercises 274

11 Nonlinear Equations 276
  11.1 Local Algorithms 281
    Newton's Method for Nonlinear Equations 281
    Inexact Newton Methods 284
    Broyden's Method 286
    Tensor Methods 290
  11.2 Practical Methods 292
    Merit Functions 292
    Line Search Methods 294
    Trust-Region Methods 298
  11.3 Continuation/Homotopy Methods 304
    Motivation 304
    Practical Continuation Methods 306
  Notes and References 310
  Exercises 311

12 Theory of Constrained Optimization 314
  Local and Global Solutions 316
  Smoothness 317
  12.1 Examples 319
    A Single Equality Constraint 319
    A Single Inequality Constraint 321
    Two Inequality Constraints 324
  12.2 First-Order Optimality Conditions 327
    Statement of First-Order Necessary Conditions 327
    Sensitivity 330
  12.3 Derivation of the First-Order Conditions 331
    Feasible Sequences 332
    Characterizing Limiting Directions: Constraint Qualifications 336
    Introducing Lagrange Multipliers 339
    Proof of Theorem 12.1 341
  12.4 Second-Order Conditions 342
    Second-Order Conditions and Projected Hessians 348
    Convex Programs 350
  12.5 Other Constraint Qualifications 351
  12.6 A Geometric Viewpoint 354
  Notes and References 357
  Exercises 358

13 Linear Programming: The Simplex Method 362
  Linear Programming 364
  13.1 Optimality and Duality 366
    Optimality Conditions 366
    The Dual Problem 367
  13.2 Geometry of the Feasible Set 370
    Basic Feasible Points 370
    Vertices of the Feasible Polytope 372
  13.3 The Simplex Method 374
    Outline of the Method 374
    Finite Termination of the Simplex Method 377
    A Single Step of the Method 378
  13.4 Linear Algebra in the Simplex Method 379
  13.5 Other (Important) Details 383
    Pricing and Selection of the Entering Index 383
    Starting the Simplex Method 386
    Degenerate Steps and Cycling 389
  13.6 Where Does the Simplex Method Fit? 391
  Notes and References 392
  Exercises 393

14 Linear Programming: Interior-Point Methods 394
  14.1 Primal-Dual Methods 396
    Outline 396
    The Central Path 399
    A Primal-Dual Framework 401
    Path-Following Methods 402
  14.2 A Practical Primal-Dual Algorithm 404
    Solving the Linear Systems 408
  14.3 Other Primal-Dual Algorithms and Extensions 409
    Other Path-Following Methods 409
    Potential-Reduction Methods 409
    Extensions 410
  14.4 Analysis of Algorithm 14.2 411
  Notes and References 416
  Exercises 417

15 Fundamentals of Algorithms for Nonlinear Constrained Optimization 420
  Initial Study of a Problem 422
  15.1 Categorizing Optimization Algorithms 423
  15.2 Elimination of Variables 426
    Simple Elimination for Linear Constraints 427
    General Reduction Strategies for Linear Constraints 430
    The Effect of Inequality Constraints 434
  15.3 Measuring Progress: Merit Functions 434
  Notes and References 437
  Exercises 438

16 Quadratic Programming 440
  An Example: Portfolio Optimization 442
  16.1 Equality-Constrained Quadratic Programs 443
    Properties of Equality-Constrained QPs 444
  16.2 Solving the KKT System 447
    Direct Solution of the KKT System 448
    Range-Space Method 449
    Null-Space Method 450
    A Method Based on Conjugacy 452
  16.3 Inequality-Constrained Problems 453
    Optimality Conditions for Inequality-Constrained Problems 454
    Degeneracy 455
  16.4 Active-Set Methods for Convex QP 457
    Specification of the Active-Set Method for Convex QP 461
    An Example 463
    Further Remarks on the Active-Set Method 465
    Finite Termination of the Convex QP Algorithm 466
    Updating Factorizations 467
  16.5 Active-Set Methods for Indefinite QP 470
    Illustration 472
    Choice of Starting Point 474
    Failure of the Active-Set Method 475
    Detecting Indefiniteness Using the LBL^T Factorization 475
  16.6 The Gradient-Projection Method 476
    Cauchy Point Computation 477
    Subspace Minimization 480
  16.7 Interior-Point Methods 481
    Extensions and Comparison with Active-Set Methods 484
  16.8 Duality 484
  Notes and References 485
  Exercises 486

17 Penalty, Barrier, and Augmented Lagrangian Methods 490
  17.1 The Quadratic Penalty Method 492
    Motivation 492
    Algorithmic Framework 494
    Convergence of the Quadratic Penalty Function 495
  17.2 The Logarithmic Barrier Method 500
    Properties of Logarithmic Barrier Functions 500
    Algorithms Based on the Log-Barrier Function 505
    Properties of the Log-Barrier Function and Framework 17.2 507
    Handling Equality Constraints 509
    Relationship to Primal-Dual Methods 510
  17.3 Exact Penalty Functions 512
  17.4 Augmented Lagrangian Method 513
    Motivation and Algorithmic Framework 513
    Extension to Inequality Constraints 516
    Properties of the Augmented Lagrangian 518
    Practical Implementation 521
  17.5 Sequential Linearly Constrained Methods 523
  Notes and References 525
  Exercises 526

18 Sequential Quadratic Programming 528
  18.1 Local SQP Method 530
    SQP Framework 531
    Inequality Constraints 533
    IQP vs. EQP 534
  18.2 Preview of Practical SQP Methods 534
  18.3 Step Computation 536
    Equality Constraints 536
    Inequality Constraints 538
  18.4 The Hessian of the Quadratic Model 539
    Full Quasi-Newton Approximations 540
    Hessian of Augmented Lagrangian 541
    Reduced-Hessian Approximations 542
  18.5 Merit Functions and Descent 544
  18.6 A Line Search SQP Method 547
  18.7 Reduced-Hessian SQP Methods 548
    Some Properties of Reduced-Hessian Methods 549
    Update Criteria for Reduced-Hessian Updating 550
    Changes of Bases 551
    A Practical Reduced-Hessian Method 552
  18.8 Trust-Region SQP Methods 553
    Approach I: Shifting the Constraints 555
    Approach II: Two Elliptical Constraints 556
    Approach III: Sℓ1QP (Sequential ℓ1 Quadratic Programming) 557
  18.9 A Practical Trust-Region SQP Algorithm 560
  18.10 Rate of Convergence 563
    Convergence Rate of Reduced-Hessian Methods 565
  18.11 The Maratos Effect 567
    Second-Order Correction 570
    Watchdog (Nonmonotone) Strategy 571
  Notes and References 573
  Exercises 574

A Background Material 576
  A.1 Elements of Analysis, Geometry, Topology 577
    Topology of the Euclidean Space R^n 577
    Continuity and Limits 580
    Derivatives 581
    Directional Derivatives 583
    Mean Value Theorem 584
    Implicit Function Theorem 585
    Geometry of Feasible Sets 586
    Order Notation 591
    Root-Finding for Scalar Equations 592
  A.2 Elements of Linear Algebra 593
    Vectors and Matrices 593
    Norms 594
    Subspaces 597
    Eigenvalues, Eigenvectors, and the Singular-Value Decomposition 598
    Determinant and Trace 599
    Matrix Factorizations: Cholesky, LU, QR 600
    Sherman-Morrison-Woodbury Formula 605
    Interlacing Eigenvalue Theorem 605
    Error Analysis and Floating-Point Arithmetic 606
    Conditioning and Stability 608

References 611

Index 625