Chapter 5 Moment Relaxation

In the last chapter, we learned a lot about polynomials, in particular the cone of nonnegative polynomials \(P_{n,2d}\) and the cone of SOS polynomials \(\Sigma_{n,2d}\), with the key inclusion relationship \[ \Sigma_{n,2d} \subseteq P_{n,2d}. \]

A natural question is then to characterize the duals of \(P_{n,2d}\) and \(\Sigma_{n,2d}\). Let me first restate the dual of a convex cone here from Definition 1.4.

Definition 5.1 (Dual Cone) The dual of a convex cone \(S\) is \[ S^* := \{ y \mid \langle y, x \rangle \geq 0, \forall x \in S \}. \]

5.1 Dual Cones of Polynomials

What are the dual cones of \(P_{n,2d}\) and \(\Sigma_{n,2d}\)?

For a cone of polynomials, we first need to make sense of the pairing \(\langle y, x \rangle\) in Definition 5.1, i.e., we need maps that take in a polynomial and return a real number. The space of all such maps is the dual space of \(\mathbb{R}[x]_{2d}\), which we denote \(\mathbb{R}[x]^*_{2d}\). The elements of this vector space are linear functionals of polynomials \[\begin{equation} \ell: \mathbb{R}[x]_{2d} \rightarrow \mathbb{R}^{}. \tag{5.1} \end{equation}\] There are many such functionals and they can superficially look quite different. Given \(p(x) \in \mathbb{R}[x]_{2d}\), here are some examples of linear functionals:

  • Evaluation of \(p(x)\) at a point \(x_0 \in \mathbb{R}^{n}\), i.e., \(p \mapsto p(x_0)\)

  • Integration of \(p(x)\) over a subset \(S \subset \mathbb{R}^{n}\), i.e., \(p \mapsto \int_S p(x) dx\)

  • Evaluation of derivatives of \(p\) at a point \(x_0 \in \mathbb{R}^{n}\), i.e., \(p \mapsto \frac{\partial^k p}{\partial x_{i_1} \cdots \partial x_{i_k}} (x_0)\)

  • Extraction of coefficients, i.e., \(p \mapsto p_{\alpha}\), where \(p_{\alpha}\) is the coefficient corresponding to the monomial \(x^{\alpha}\)

For all the examples above, it is easy to verify linearity \[ \ell(\lambda_1 p_1 + \lambda_2 p_2) = \lambda_1 \ell(p_1) + \lambda_2 \ell(p_2), \quad \forall \lambda_1, \lambda_2 \in \mathbb{R}^{}, \ p_1, p_2 \in \mathbb{R}[x]_{2d}, \] where we will also write the application \(\ell(p)\) as a pairing \(\langle \ell, p \rangle\).

A distinguished class of linear functionals are the point evaluations (our first example above): given any point \(v \in \mathbb{R}^{n}\), we can generate a linear functional \[ \ell_v \in \mathbb{R}[x]^*_{2d}: \ell_v(p) = p(v), \forall p \in \mathbb{R}[x]_{2d}. \] Then, we can generate additional linear functionals by taking linear combinations of point evaluations, i.e., maps of the form \[ p \mapsto \sum_{i} \lambda_i \ell_{v_i} (p) = \sum_{i} \lambda_i p(v_i), \quad \lambda_i \in \mathbb{R}^{}, v_i \in \mathbb{R}^{n}, \forall i. \] It turns out that all linear functionals can be generated in this form.

Theorem 5.1 (Dual Space of Polynomials) Every linear functional in \(\mathbb{R}[x]^*_{2d}\) is a linear combination of point evaluations, i.e., for any \(\ell \in \mathbb{R}[x]^*_{2d}\), there exist \(\lambda_1,\dots,\lambda_K \in \mathbb{R}^{}\) and \(v_1,\dots,v_K \in \mathbb{R}^{n}\) such that \[ \ell(p) = \sum_{i=1}^K \lambda_i p(v_i), \quad \forall p \in \mathbb{R}[x]_{2d}. \]

Truncated Multi-Sequence (TMS). The above characterization of \(\mathbb{R}[x]^*_{2d}\) is purely geometric and coordinate-free. Now we are going to fix the basis of \(\mathbb{R}[x]_{2d}\) to be the standard monomial basis \([x]_{2d}\) with length \(s(n,2d)\). In this coordinate, any linear functional \(\ell\) in (5.1) can be equivalently identified with its values on the monomial basis: \[ \ell: \mathbb{R}[x]_{2d} \rightarrow \mathbb{R}^{}, \quad x^{\alpha} \mapsto y_{\alpha}, \alpha \in \mathcal{F}_{n,2d}. \] This set of real numbers \[ y:=(y_\alpha)_{\alpha \in \mathcal{F}_{n,2d}} \in \mathbb{R}^{s(n,2d)} \] is called a truncated multi-sequence (TMS) of degree \(2d\). Denoting the linear functional associated with the TMS \(y\) as \(\ell_y \in \mathbb{R}[x]^*_{2d}\), applying \(\ell_y\) to any polynomial \(p \in \mathbb{R}[x]_{2d}\) is simply \[ \ell_y(p) = \langle \ell_y, p \rangle = \langle y, \mathrm{vec}(p) \rangle = \sum_{\alpha \in \mathcal{F}_{n,2d}} p_\alpha y_\alpha, \] i.e., the inner product between the TMS \(y\) and the vector of coefficients of \(p\): \[ \mathrm{vec}(p) = (p_\alpha)_{\alpha \in \mathcal{F}_{n,2d}}. \]
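
To make this concrete, here is a minimal numerical sketch (Python/NumPy; a toy univariate instance of my own, not from the text) of applying \(\ell_y\) as an inner product with the coefficient vector. The TMS of the point evaluation \(\ell_v\) is simply \([v]_{2d} = (1, v, \dots, v^{2d})\):

```python
import numpy as np

d = 2                                       # half-degree, so deg(p) <= 2d = 4
v = 0.7                                     # evaluation point
y = v ** np.arange(2 * d + 1)               # TMS of the point evaluation: y_a = v^a
p = np.array([1.0, -2.0, 0.5, 0.0, 3.0])    # vec(p): coefficients in basis (1, x, ..., x^4)

# applying l_y to p is the inner product <y, vec(p)>
ell_y_of_p = y @ p
assert np.isclose(ell_y_of_p, np.polyval(p[::-1], v))   # equals p(v), as expected
```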

Moment Matrix. We now assemble the TMS \(y \in \mathbb{R}^{s(n,2d)}\) into symmetric matrices. In particular, for any nonnegative integer \(\kappa \in [0,d]\), the \(\kappa\)-th order moment matrix of the TMS \(y\) is the symmetric matrix \[ M_\kappa[y]:= [y_{\alpha + \beta}]_{\alpha,\beta \in \mathcal{F}_{n,\kappa}} \in \mathbb{S}^{s(n,\kappa)}, \] whose rows and columns are indexed by the exponents \(\alpha\) and \(\beta\), according to the graded lexicographic monomial ordering. Clearly, if \(\kappa_1 \leq \kappa_2\), then \(M_{\kappa_1}[y]\) is a leading principal submatrix of \(M_{\kappa_2}[y]\). Another way to view the moment matrix is \[ M_{\kappa}[y] = \ell_y ([x]_{\kappa} [x]_{\kappa}^\top) = \langle \ell_y, [x]_{\kappa} [x]_{\kappa}^\top \rangle, \] where applying the linear functional \(\ell_y\) to the polynomial matrix \([x]_{\kappa} [x]_{\kappa}^\top\) is understood element-wise.

Let’s see some examples.

Example 5.1 (Moment Matrix) Consider the univariate case \(n=1\), and a TMS \[ y = (y_0,y_1,\dots,y_{2d}), \] the \(d\)-th order moment matrix of \(y\) is the following matrix \[ M_d[y] = \begin{bmatrix} y_0 & y_1 & y_2 & y_3 & \cdots & y_d \\ y_1 & y_2 & y_3 & y_4 & \cdots & y_{d+1} \\ y_2 & y_3 & y_4 & y_5 & \cdots & y_{d+2} \\ y_3 & y_4 & y_5 & y_6 & \cdots & y_{d+3} \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ y_d & y_{d+1} & y_{d+2} & y_{d+3} & \cdots & y_{2d} \end{bmatrix}, \] which is also a Hankel matrix.

For the bivariate case \(n=2\) and \(d=2\), we have \[ \mathcal{F}_{2,2} = \{ (00),(10),(01),(20),(11),(02) \}, \] and hence the second-order moment matrix is \[\begin{equation} M_2[y] = [y_{\alpha + \beta}]_{\alpha,\beta \in \mathcal{F}_{2,2}} = \begin{bmatrix} y_{00} & y_{10} & y_{01} & y_{20} & y_{11} & y_{02} \\ y_{10} & y_{20} & y_{11} & y_{30} & y_{21} & y_{12} \\ y_{01} & y_{11} & y_{02} & y_{21} & y_{12} & y_{03} \\ y_{20} & y_{30} & y_{21} & y_{40} & y_{31} & y_{22} \\ y_{11} & y_{21} & y_{12} & y_{31} & y_{22} & y_{13} \\ y_{02} & y_{12} & y_{03} & y_{22} & y_{13} & y_{04} \end{bmatrix}. \tag{5.2} \end{equation}\] Equivalently, one can check that \(M_2[y]\) is obtained by applying \(\ell_y\) element-wise to the polynomial matrix \([x]_2 [x]_2^\top\).
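
A short Python/NumPy sketch of this construction (the helpers `exponents` and `moment_matrix` are hypothetical names of mine, not a library API); it builds \(M_2[y]\) for \(n=2\) laid out exactly as in (5.2), here for the TMS of a Dirac measure at a point \(v\), in which case \(M_2[y] = [v]_2 [v]_2^\top\) is rank one:

```python
import numpy as np
from itertools import product

def exponents(n, deg):
    """Exponents alpha with |alpha| <= deg, in graded lexicographic order."""
    alphas = [a for a in product(range(deg + 1), repeat=n) if sum(a) <= deg]
    return sorted(alphas, key=lambda a: (sum(a), tuple(-ai for ai in a)))

def moment_matrix(y, n, kappa):
    """M_kappa[y] for a TMS y given as a dict {alpha: y_alpha}."""
    basis = exponents(n, kappa)
    return np.array([[y[tuple(a + b for a, b in zip(al, be))] for be in basis]
                     for al in basis])

# TMS of the Dirac measure at v = (2, 3): y_alpha = v^alpha
v = np.array([2.0, 3.0])
y = {a: float(np.prod(v ** np.array(a))) for a in exponents(2, 4)}
M = moment_matrix(y, n=2, kappa=2)          # 6 x 6, laid out exactly as in (5.2)
assert np.linalg.matrix_rank(M) == 1        # M_2[y] = [v]_2 [v]_2^T is rank one
```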

In general, the moment matrix \(M_d[y]\) (given a TMS \(y \in \mathbb{R}^{s(n,2d)}\)) defines a bilinear form \(\langle \cdot, \cdot \rangle_y\) on \(\mathbb{R}[x]_d\). In particular, given any two polynomials \(p,q \in \mathbb{R}[x]_d\), we have \[\begin{equation} \langle p, q \rangle_y := \ell_y(p q) = \langle \mathrm{vec}(p), M_d[y] \mathrm{vec}(q) \rangle = \mathrm{vec}(p)^\top M_d[y] \mathrm{vec}(q). \tag{5.3} \end{equation}\] To see why the equation above is true, note that \(p(x) = \mathrm{vec}(p)^\top[x]_d\), \(q(x) = \mathrm{vec}(q)^\top[x]_d\), so \[ \ell_y(pq) = \ell_y(\mathrm{vec}(p)^\top[x]_d [x]_d^\top\mathrm{vec}(q)) = \mathrm{vec}(p)^\top\left( \ell_y ([x]_d [x]_d^\top) \right) \mathrm{vec}(q) = \mathrm{vec}(p)^\top M_d[y] \mathrm{vec}(q). \]

Dual Cone of SOS Polynomials. With the moment matrix, we can characterize the dual cone of SOS polynomials. Recall that every SOS polynomial of degree up to \(2d\) can be written as \[ p = [x]_d^\top Q [x]_d, \] where \(Q \in \mathbb{S}^{s(n,d)}_{+}\) is the associated Gram matrix. The dual cone of \(\Sigma_{n,2d}\) is therefore \[ \Sigma_{n,2d}^* = \left\{ y \in \mathbb{R}^{s(n,2d)} \mid \ell_y (p) \geq 0, \forall p \in \Sigma_{n,2d} \right\} . \] Now observe that \[ \ell_y (p) \geq 0, \forall p \in \Sigma_{n,2d} \Leftrightarrow \ell_y ([x]_d^\top Q [x]_d) \geq 0, \forall Q \succeq 0 \Leftrightarrow \langle Q, M_d[y] \rangle \geq 0, \forall Q \succeq 0 \Leftrightarrow M_d[y] \succeq 0. \] Therefore, we conclude that the dual cone to \(\Sigma_{n,2d}\) is the moment cone.

Theorem 5.2 (Moment Cone) The dual cone to the cone of SOS polynomials is the moment cone \[ \Sigma_{n,2d}^* = \mathcal{M}_{n,2d} := \left\{ y \in \mathbb{R}^{s(n,2d)} \mid M_d[y] \succeq 0 \right\} . \]

Dual Cone of Nonnegative Polynomials. The dual cone to the cone of nonnegative polynomials is \[ P_{n,2d}^* = \left\{ y \in \mathbb{R}^{s(n,2d)} \mid \langle \mathrm{vec}(p), y \rangle \geq 0, \forall p \in P_{n,2d} \right\} . \] To describe the dual cone, we consider the conic hull \[\begin{equation} \mathcal{R}_{n,2d} = \left\{ \sum_{i=1}^K \lambda_i [v_i]_{2d} \mid \lambda_i \geq 0, v_i \in \mathbb{R}^{n}, K \in \mathbb{N} \right\} . \tag{5.4} \end{equation}\] We can clearly see that \[ \mathcal{R}_{n,2d} \subseteq P_{n,2d}^*, \] because for any \(p \in P_{n,2d}\), we have \[ \left\langle \mathrm{vec}(p), \sum_{i=1}^K \lambda_i [v_i]_{2d} \right\rangle = \sum_{i=1}^K \lambda_i \langle \mathrm{vec}(p), [v_i]_{2d} \rangle = \sum_{i=1}^K \lambda_i p(v_i) \geq 0. \] It turns out the closure of \(\mathcal{R}_{n,2d}\) is the dual cone of nonnegative polynomials.

Theorem 5.3 (Dual Cone of Nonnegative Polynomials) The dual cone of \(P_{n,2d}\) is the closure of \(\mathcal{R}_{n,2d}\): \[ P_{n,2d}^* = \mathrm{cl}(\mathcal{R}_{n,2d}), \quad \mathcal{R}_{n,2d}^* = P_{n,2d}. \]

That the closure is needed can be seen from the following exercise.

Exercise 5.1 Consider the vector space of univariate quadratic polynomials \[ \mathbb{R}[x]_2 = \left\{ p_2 x^2 + p_1 x + p_0 \mid (p_2,p_1,p_0) \in \mathbb{R}^{3} \right\} . \]

  • Express the linear functional \[ p_2 x^2 + p_1 x + p_0 \mapsto p_2 \] as a linear combination of point evaluations. (Hint: consider \(p(0) = p_0\), \(p(1)=p_2 + p_1 + p_0\), \(p(-1) = p_2 - p_1 + p_0\). )

  • Show that this linear functional is in the dual cone \(P^*_{1,2}\) but cannot be written as a conic combination of point evaluations.

5.2 Dual of Quadratic Modules and Ideals

Localizing Matrix. Let \(y \in \mathbb{R}^{s(n,2d)}\) be a truncated multi-sequence (TMS) and \(g(x) \in \mathbb{R}[x]_{2d}\) be a given polynomial. Let \(s\) be the maximum integer such that \(2s + \deg(g) \leq 2d\), then the \(d\)-th order localizing matrix of \(g\) generated by \(y\) is the following symmetric matrix \[\begin{equation} L_{g}^d [y] = \ell_y\left( g(x)\cdot [x]_s [x]_s^\top\right), \tag{5.5} \end{equation}\] where the application of the linear functional \(\ell_y\) to the symmetric polynomial matrix \(g(x) \cdot [x]_s [x]_s^\top\) is element-wise. For example, let \(g = 1- x_1^2 - x_2^2\), then \[\begin{equation} \begin{split} g\cdot [x]_1 [x]_1^\top=& (1-x_1^2 - x_2^2) \begin{bmatrix} 1 & x_1 & x_2 \\ x_1 & x_1^2 & x_1 x_2 \\ x_2 & x_1 x_2 & x_2^2 \end{bmatrix} \\ = & \begin{bmatrix} 1 - x_1^2 - x_2^2 & x_1 - x_1^3 - x_1 x_2^2 & x_2 - x_1^2 x_2 - x_2^3 \\ x_1 - x_1^3 - x_1 x_2^2 & x_1^2 - x_1^4 - x_1^2 x_2^2 & x_1 x_2 - x_1^3 x_2 - x_1 x_2^3 \\ x_2 - x_1^2 x_2 - x_2^3 & x_1 x_2 - x_1^3 x_2 - x_1 x_2^3 & x_2^2 - x_1^2 x_2^2 - x_2^4 \end{bmatrix} \end{split} \end{equation}\] and \[\begin{equation} L_g^d[y] = \ell_y (g\cdot [x]_1 [x]_1^\top) = \begin{bmatrix} y_{00} - y_{20} - y_{02} & y_{10} - y_{30} - y_{12} & y_{01} - y_{21} - y_{03} \\ y_{10} - y_{30} - y_{12} & y_{20} - y_{40} - y_{22} & y_{11} - y_{31} - y_{13} \\ y_{01} - y_{21} - y_{03} & y_{11} - y_{31} - y_{13} & y_{02} - y_{22} - y_{04} \end{bmatrix}. \tag{5.6} \end{equation}\] A key observation is that the entries of \(L_g^d[y]\) are affine functions of the TMS \(y\).
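
Here is a Python/NumPy sketch of the same computation (hypothetical helper names; `exponents` is the graded-lex helper from the moment-matrix sketch above, repeated for self-containment). It builds \(L_g^d[y]\) entrywise from \((L_g^d[y])_{\alpha\beta} = \sum_{\gamma} g_\gamma\, y_{\alpha+\beta+\gamma}\), and checks it is PSD for the TMS of a point inside the unit disk:

```python
import numpy as np
from itertools import product

def exponents(n, deg):   # same helper as in the moment-matrix sketch
    alphas = [a for a in product(range(deg + 1), repeat=n) if sum(a) <= deg]
    return sorted(alphas, key=lambda a: (sum(a), tuple(-ai for ai in a)))

def localizing_matrix(y, g, n, d):
    """L_g^d[y]: entry (alpha, beta) is l_y(g * x^(alpha+beta))."""
    s = (2 * d - max(sum(gam) for gam in g)) // 2   # largest s with 2s + deg(g) <= 2d
    basis = exponents(n, s)
    return np.array([[sum(c * y[tuple(a + b + e for a, b, e in zip(al, be, gam))]
                          for gam, c in g.items())
                      for be in basis] for al in basis])

g = {(0, 0): 1.0, (2, 0): -1.0, (0, 2): -1.0}   # g = 1 - x1^2 - x2^2
v = np.array([0.3, 0.4])                        # a point with g(v) > 0
y = {a: float(np.prod(v ** np.array(a))) for a in exponents(2, 4)}
L = localizing_matrix(y, g, n=2, d=2)           # 3 x 3, matching (5.6) entrywise
assert np.all(np.linalg.eigvalsh(L) >= -1e-12)  # PSD, since L = g(v) [v]_1 [v]_1^T here
```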

Dual of Quadratic Module. Given a tuple of polynomials \(g = (g_1,\dots,g_{l_g})\), recall (from Definition 4.20) that the quadratic module associated with \(g\) is \[ \mathrm{Qmodule}[g] := \left\{ \sum_{i=0}^{l_g} \sigma_i g_i \mid \sigma_i \in \Sigma[x],i=0,\dots,l_g \right\} \] where \(g_0(x):=1\) and \(\Sigma[x]\) denotes the set of SOS polynomials. The degree-\(2d\) truncation of the quadratic module is \[\begin{equation} \mathrm{Qmodule}[g]_{2d} := \left\{ \sum_{i=0}^{l_g} \sigma_i g_i \mid \sigma_i \in \Sigma[x], \deg(\sigma_i g_i) \leq 2d, i=0,\dots,l_g \right\} . \tag{5.7} \end{equation}\] Now consider the convex cone defined by the PSD conditions of the localizing matrices: \[\begin{equation} \mathcal{M}[g]_{2d} := \left\{ y \in \mathbb{R}^{s(n,2d)} \mid M_d[y] \succeq 0, L_{g_i}^d[y] \succeq 0, i=1,\dots,l_g \right\} . \tag{5.8} \end{equation}\] The following result shows that \(\mathcal{M}[g]_{2d}\) is the dual cone of \(\mathrm{Qmodule}[g]_{2d}\).

Theorem 5.4 (Dual of Quadratic Module) Given a set of polynomials \(g=(g_1,\dots,g_{l_g})\), we have

  1. The cone \(\mathcal{M}[g]_{2d}\) is closed and convex, and it pairs nonnegatively with \(\mathrm{Qmodule}[g]_{2d}\), i.e., \[ \langle f, y \rangle \geq 0, \quad \forall f \in \mathrm{Qmodule}[g]_{2d}, y \in \mathcal{M}[g]_{2d}. \]

  2. If the set \(S_{\geq 0} := \{ x \in \mathbb{R}^{n} \mid g_i(x) \geq 0, i=1,\dots,l_g \}\) has nonempty interior, then both \(\mathcal{M}[g]_{2d}\) and \(\mathrm{Qmodule}[g]_{2d}\) are closed convex cones and they are dual to each other: \[ (\mathcal{M}[g]_{2d})^* = \mathrm{Qmodule}[g]_{2d}, \quad (\mathrm{Qmodule}[g]_{2d})^* = \mathcal{M}[g]_{2d}. \]

To see claim 1 of Theorem 5.4, pick any \(f \in \mathrm{Qmodule}[g]_{2d}\) as in (5.7) and any \(y \in \mathcal{M}[g]_{2d}\) as in (5.8); we have \[\begin{equation} \begin{split} \langle f, y \rangle = & \langle \sigma_0, y \rangle + \sum_{i=1}^{l_g} \langle \sigma_i g_i, y \rangle \\ = & \left\langle [x]_d^\top Q_0 [x]_d, y \right\rangle + \sum_{i=1}^{l_g} \left\langle g_i \cdot [x]_{s_i}^\top Q_i [x]_{s_i}, y \right\rangle \\ = & \left\langle M_d[y], Q_0 \right\rangle + \sum_{i=1}^{l_g} \left\langle L^d_{g_i}[y], Q_i \right\rangle \geq 0 \end{split} \end{equation}\] where \(Q_0,Q_1,\dots,Q_{l_g}\) are the Gram matrices associated with the SOS multipliers \(\sigma_0,\sigma_1,\dots,\sigma_{l_g}\), and \(s_i\) is the maximum integer such that \(2 s_i + \deg (g_i) \leq 2d\). The equation above also shows that, in order for a TMS \(y\) to be in the dual cone of \(\mathrm{Qmodule}[g]_{2d}\), it must hold that \(M_d[y] \succeq 0, L^d_{g_i}[y] \succeq 0,i=1,\dots,l_g\), and hence \(y\) must belong to \(\mathcal{M}[g]_{2d}\).

To see claim 2 of Theorem 5.4, note that when \(S_{\geq 0}\) has empty interior, the cone \(\mathrm{Qmodule}[g]_{2d}\) may not be closed. For example, consider \(g = -x^2\); the set \(S_{\geq 0}\) is the singleton \(\{ 0 \}\) and has empty interior. The polynomial \(x + \epsilon\), for any \(\epsilon > 0\), is in \(\mathrm{Qmodule}[g]_2\) because \[ x + \epsilon = \left( \sqrt{\epsilon} + \frac{x}{2 \sqrt{\epsilon}} \right)^2 + (-x^2) \left( \frac{1}{2 \sqrt{\epsilon}} \right)^2. \] However, the limit polynomial at \(\epsilon = 0\) satisfies \(x \not\in \mathrm{Qmodule}[g]_{2}\), so the cone is not closed.

Dual of the Sum of Ideal and Quadratic Module. Recall that given a tuple of polynomials \(h=(h_1,\dots,h_{l_h})\), the ideal generated by \(h\) is \[ \mathrm{Ideal}[h] = \left\{ \sum_{i=1}^{l_h} \lambda_i h_i \mid \lambda_i \in \mathbb{R}[x],i=1,\dots,l_h \right\} , \] where \(\lambda_i\) are the polynomial multipliers. Similarly, the degree-\(2d\) truncation of \(\mathrm{Ideal}[h]\) is \[\begin{equation} \mathrm{Ideal}[h]_{2d} = \left\{ \sum_{i=1}^{l_h} \lambda_i h_i \mid \lambda_i \in \mathbb{R}[x], \deg(\lambda_i h_i) \leq 2d,i=1,\dots,l_h \right\} . \tag{5.9} \end{equation}\] For each \(\lambda_i \in \mathbb{R}[x]_{2d - \deg(h_i)}\) such that \(\deg(\lambda_i h_i) \leq 2d\), the coefficients of \(\lambda_i h_i\) are linear in those of \(\lambda_i\): \[ \mathrm{vec}(\lambda_i h_i) = H_i^{2d} \mathrm{vec}(\lambda_i), \] where \(\mathrm{vec}(\cdot)\) denotes the coefficient vector of a polynomial. For example, suppose \(h = x^2-1\) and \(d = 2\), then \(\lambda(x) = \lambda_2 x^2 + \lambda_1 x + \lambda_0\), and \[ \lambda(x) h(x) = (\lambda_2 x^2 + \lambda_1 x + \lambda_0)(x^2 - 1) = \lambda_2 x^4 + \lambda_1 x^3 + (\lambda_0 - \lambda_2) x^2 - \lambda_1 x - \lambda_0. \] Then, \[ \mathrm{vec}(\lambda h) = \begin{bmatrix} - \lambda_0 \\ - \lambda_1 \\ \lambda_0 - \lambda_2 \\ \lambda_1 \\ \lambda_2 \end{bmatrix} = \underbrace{\begin{bmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 1 & 0 & -1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}}_{H^4} \begin{bmatrix} \lambda_0 \\ \lambda_1 \\ \lambda_2 \end{bmatrix}. \]
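
Since multiplying by a fixed \(h\) is a linear map on coefficients, the matrix \(H^{2d}\) can be assembled column by column; in the univariate case this is just polynomial convolution. A Python/NumPy sketch reproducing the \(H^4\) above:

```python
import numpy as np

h = np.array([-1.0, 0.0, 1.0])     # h = x^2 - 1, coefficients in basis (1, x, x^2)
deg_lam = 2                        # lambda in R[x]_2, so lambda * h in R[x]_4

# column j of H holds the coefficients of x^j * h(x)
H = np.column_stack([np.convolve(np.eye(deg_lam + 1)[j], h)
                     for j in range(deg_lam + 1)])
print(H.astype(int))               # reproduces the H^4 displayed above

lam = np.array([1.0, 2.0, 3.0])    # lambda = 1 + 2x + 3x^2
assert np.allclose(H @ lam, np.convolve(lam, h))   # vec(lambda * h) = H vec(lambda)
```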

Localizing Vector. Given a TMS \(y \in \mathbb{R}^{s(n,2d)}\) and its associated linear functional \(\ell_y\), applying \(\ell_y\) to \(\lambda_i h_i\) with \(\deg(\lambda_i h_i) \leq 2d\) can be written as \[ \ell_y(\lambda_i h_i) = \langle y, \mathrm{vec}(\lambda_i h_i) \rangle = \langle y, H_i^{2d} \mathrm{vec}(\lambda_i) \rangle = \left\langle (H_i^{2d})^\top y, \mathrm{vec}(\lambda_i) \right\rangle, \] and the vector \[\begin{equation} L_{h_i}^{2d}[y] := (H_i^{2d})^\top y \tag{5.10} \end{equation}\] is called the localizing vector of \(h_i\) generated by the TMS \(y\). We are interested in linear functionals that vanish on \(\mathrm{Ideal}[h]_{2d}\). Clearly, this is the subspace \[\begin{equation} \mathcal{Z}[h]_{2d} := \left\{ y \in \mathbb{R}^{s(n,2d)} \mid L_{h_i}^{2d}[y] = 0,i=1,\dots,l_h \right\} . \tag{5.11} \end{equation}\]

The next result gives the dual of the sum of ideal and quadratic module.

Theorem 5.5 (Dual of the Sum of Ideal and Quadratic Module) Let \(g=(g_1,\dots,g_{l_g})\) and \(h=(h_1,\dots,h_{l_h})\) be tuples of polynomials. For any \(2d > 0\), we have \[ \left( \mathrm{Ideal}[h]_{2d} + \mathrm{Qmodule}[g]_{2d} \right)^* = \mathcal{Z}[h]_{2d} \cap \mathcal{M}[g]_{2d}, \] where \(\mathcal{Z}[h]_{2d}\) is the linear subspace in (5.11) and \(\mathcal{M}[g]_{2d}\) is the convex cone in (5.8).

To see this, pick any element \(f \in \mathrm{Ideal}[h]_{2d} + \mathrm{Qmodule}[g]_{2d}\) and any element \(y \in \mathcal{Z}[h]_{2d} \cap \mathcal{M}[g]_{2d}\), and perform the inner product \[\begin{equation} \begin{split} \langle f, \ell_y \rangle = \left\langle \sum_{i=1}^{l_h} \lambda_i h_i + \sum_{i=0}^{l_g} \sigma_i g_i , \ell_y \right\rangle = & \sum_{i=1}^{l_h} \langle \mathrm{vec}(\lambda_i h_i), y \rangle + \sum_{i=0}^{l_g} \langle g_i \cdot [x]_{s_i}^\top Q_i [x]_{s_i}, \ell_y \rangle \\ = & \sum_{i=1}^{l_h} \langle L_{h_i}^{2d}[y], \mathrm{vec}(\lambda_i) \rangle + \sum_{i=0}^{l_g} \langle L_{g_i}^d[y], Q_i \rangle, \end{split} \end{equation}\] where \(g_0 := 1\), so that \(L_{g_0}^d[y] = M_d[y]\). This is clearly nonnegative since \(L_{h_i}^{2d}[y] = 0\) and \(L_{g_i}^d[y] \succeq 0\) are implied by \(y \in \mathcal{Z}[h]_{2d} \cap \mathcal{M}[g]_{2d}\).

5.3 The Moment-SOS Hierarchy

In Chapter 4.6.1, we introduced an SOS relaxation for solving polynomial optimization problems (POPs). In particular, we consider the following POP, restated from (4.19) \[\begin{equation} \begin{split} p^\star = \min_{x \in \mathbb{R}^{n}} & \quad p(x) \\ \mathrm{s.t.}& \quad h_i(x) = 0, i=1,\dots,l_h \\ & \quad g_i(x) \geq 0, i=1,\dots,l_g \end{split} \tag{5.12} \end{equation}\] Using Putinar’s Positivstellensatz, for any \(\kappa \in \mathbb{N}\), we have the following SOS program \[\begin{equation} \boxed{ \begin{split} \gamma_{\kappa}^\star = \max & \quad \gamma \\ \mathrm{s.t.}& \quad p(x) - \gamma \in \mathrm{Ideal}[h]_{2\kappa} + \mathrm{Qmodule}[g]_{2\kappa} \end{split} } \tag{5.13} \end{equation}\] whose optimal value produces a lower bound to \(p^\star\), i.e., \(\gamma_{\kappa}^\star \leq p^\star\).

With the machinery introduced above on the dual of the cone of polynomials, we can write down the dual problem of the SOS program (5.13) \[\begin{equation} \boxed{ \begin{split} \beta_{\kappa}^\star = \min_{y \in \mathbb{R}^{s(n,2\kappa)}} & \quad \langle \ell_y, p \rangle \\ \mathrm{s.t.}& \quad y \in \mathcal{Z}[h]_{2\kappa} \cap \mathcal{M}[g]_{2\kappa} \\ & \quad \langle \ell_y, 1 \rangle = 1 \end{split} } \tag{5.14} \end{equation}\] In problem (5.14), \(\langle \ell_y, p \rangle\) and \(\langle \ell_y, 1 \rangle\) should be interpreted as applying the linear functional associated with the TMS \(y\) to the polynomial \(p\) and the polynomial “\(1\)”. As a result, the constraint \(\langle \ell_y, 1 \rangle=1\) implies \(y_0 = 1\). To see why (5.14) is the dual of (5.13), pick any \(\gamma\) that is feasible for (5.13) and any \(y\) that is feasible for (5.14); then \[ \langle \ell_y, p \rangle - \gamma = \langle \ell_y, p \rangle - \langle \ell_y, \gamma \rangle = \langle \ell_y, p - \gamma \rangle \geq 0, \] where, again, \(\langle \ell_y, \gamma \rangle\) should be interpreted as applying \(\ell_y\) to the degree-\(0\) polynomial “\(\gamma\)”. The last inequality holds because \(y\) and \(p - \gamma\) live in two cones that are dual to each other. Consequently, we have the weak duality \[ \gamma_{\kappa}^\star \leq \beta_{\kappa}^\star, \quad \forall \kappa \in \mathbb{N}. \] The pair of SDPs (5.14) and (5.13) is called the moment-SOS hierarchy and was first proposed in (Lasserre 2001).
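
To make the moment side concrete, here is a minimal sketch of the moment relaxation (5.14) at \(\kappa = 1\) on a toy univariate POP of my own (not from the text): \(\min x\) subject to \(g(x) = 1 - x^2 \geq 0\), whose true minimum is \(p^\star = -1\). It assumes the CVXPY package with any SDP-capable solver (e.g., SCS or MOSEK) installed. Since \(\deg(g) = 2 = 2\kappa\), the localizing “matrix” of \(g\) is the scalar \(\ell_y(g) = y_0 - y_2\):

```python
import cvxpy as cp

y0, y1, y2 = cp.Variable(), cp.Variable(), cp.Variable()   # TMS (y_0, y_1, y_2)
M1 = cp.bmat([[y0, y1], [y1, y2]])                         # moment matrix M_1[y]
constraints = [
    y0 == 1,        # <l_y, 1> = 1
    M1 >> 0,        # M_1[y] PSD
    y0 - y2 >= 0,   # scalar localizing constraint l_y(1 - x^2) >= 0
]
prob = cp.Problem(cp.Minimize(y1), constraints)            # minimize <l_y, p> with p = x
prob.solve()
print(prob.value)   # approximately -1.0 = p*: the relaxation is tight on this instance
```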

One may wonder how to directly write down the primal problem (5.14) given the dual problem (5.13), or vice versa. Next, we give a brief discussion of general conic duality, from which the primal-dual relationship should become clear.

Conic Duality

Let \(V\) be a vector space and \(\mathcal{K}\subset V\) be a proper cone. On the dual side, let \(V^*\) be the dual space of \(V\) and \(\mathcal{K}^*\) be the dual cone of \(\mathcal{K}\). Note that \(V^*\), the dual space to \(V\), is the vector space of real-valued linear functionals. For example,

  • if \(V=\mathbb{R}^{n}\) (think of this as the space of points), then \(V^* = \mathbb{R}^{n}\) (think of this as the space of linear functionals on points).

  • if \(V = \mathbb{R}[x]_{2d}\) is the space of polynomials of degree up to \(2d\) (which is isomorphic to just \(\mathbb{R}^{s(n,2d)}\) after identifying each polynomial \(f\) with its vector of coefficients \(\mathrm{vec}(f)\)), then \(V^* = \mathbb{R}[x]_{2d}^*\) is the space of linear functionals (which, again, is isomorphic to just \(\mathbb{R}^{s(n,2d)}\) after identifying each linear functional with its associated truncated multi-sequence \(y\)).

For any \(x \in V\) and \(f \in V^*\), we can define an inner product \(\langle f, x \rangle = f(x)\), which is simply applying the linear functional \(f\) to the point \(x\) and returning a real number.

We then consider a linear map \(\mathcal{A}: V \rightarrow T\) where \(T\) is another vector space. The adjoint map \(\mathcal{A}^*: T^* \rightarrow V^*\) is defined via \[ \langle x, \mathcal{A}^* y \rangle_V = \langle y, \mathcal{A}x \rangle_T, \quad \forall x \in V, y \in T^*, \] where the subscript \(V,T\) implies the inner product is written in their respective primal-dual vector spaces.

We can then define the general primal-dual conic pair \[\begin{equation} \boxed{ \begin{split} \min_{x \in V} & \quad \langle c, x \rangle_V \\ \mathrm{s.t.}& \quad \mathcal{A}x = b \\ & \quad x \in \mathcal{K} \end{split}} \tag{5.15} \end{equation}\] and \[\begin{equation} \boxed{ \begin{split} \max_{y \in T^*} & \quad \langle b, y \rangle_T \\ \mathrm{s.t.}& \quad c - \mathcal{A}^* y \in \mathcal{K}^* \end{split}} \tag{5.16} \end{equation}\] where \(b \in T\) and \(c \in V^*\) are given. The proof of weak duality can be done by picking any \(x\) feasible for the primal and \(y\) feasible for the dual, leading to \[\begin{equation} \begin{split} \langle c, x \rangle_V - \langle b, y \rangle_T & = \langle c, x \rangle_V - \langle \mathcal{A}x, y \rangle_T \\ & = \langle c, x \rangle_V - \langle x, \mathcal{A}^* y \rangle_V \\ & = \langle c - \mathcal{A}^* y, x \rangle_V \\ & \geq 0 \end{split} \end{equation}\] where the last inequality holds because \(x \in \mathcal{K}\) and \(c - \mathcal{A}^* y \in \mathcal{K}^*\).

We have seen three instantiations of (5.15) and (5.16) so far in the class.

  1. Linear programming (Chapter 1.4). With \(V = V^* = \mathbb{R}^{n}\), \(T = T^* = \mathbb{R}^{m}\), \(\mathcal{K}= \mathcal{K}^* = \mathbb{R}^{n}_{+}\), we recover the standard LP pair (1.15) and (1.16).

  2. Semidefinite programming (Chapter 2). With \(V = V^* = \mathbb{S}^{n}\), \(T = T^* = \mathbb{R}^{m}\), \(\mathcal{K}= \mathcal{K}^* = \mathbb{S}^{n}_{+}\), we recover the standard SDP pair (2.5) and (2.6).

  3. Moment-SOS hierarchy. With

  • \(V^* = \mathbb{R}[x]_{2\kappa}\), \(V = \mathbb{R}[x]_{2\kappa}^*\)
  • \(T = T^* = \mathbb{R}^{}\)
  • \(\mathcal{K}= \mathcal{Z}[h]_{2\kappa} \cap \mathcal{M}[g]_{2\kappa}\), and \(\mathcal{K}^* = \mathrm{Ideal}[h]_{2\kappa} + \mathrm{Qmodule}[g]_{2\kappa}\)

From LP to SDP to the moment-SOS hierarchy, this is a (in my opinion, beautiful) sequence of generalizations in conic duality. As we seek higher levels of generalization, we gain more modeling power (i.e., we can solve harder optimization problems), but we also face more computational challenges (i.e., the convex optimization problems become more expensive to solve).

The next theorem collects some known results about the moment-SOS hierarchy in terms of asymptotic convergence and strong duality.

Theorem 5.6 (Moment-SOS Hierarchy) Consider the POP (5.12) and the primal-dual pair (5.14) and (5.13). We have \[ \gamma_{\kappa}^\star \leq \beta_{\kappa}^\star \leq p^\star, \quad \forall \kappa \in \mathbb{N}, \] and both sequences \(\{ \gamma_{\kappa}^\star \}_{\kappa=1}^{\infty}\), \(\{ \beta_{\kappa}^\star \}_{\kappa=1}^{\infty}\) are monotonically increasing. Moreover, let the feasible set of the POP (5.12) be \(S\),

  • Suppose all equality constraints are linear or there are no equality constraints. If there exists \(x_0 \in S\) such that \(g_i(x_0) > 0, i=1,\dots,l_g\), then \(\gamma_{\kappa}^\star = \beta_{\kappa}^\star\) for any \(\kappa\), and the dual problem (5.13) is solvable (i.e., the maximum is attained).

  • Suppose there exists a scalar \(R > 0\) such that \[ R - \Vert x \Vert^{2\kappa_0} \in \mathrm{Ideal}[h]_{2\kappa_0} + \mathrm{Qmodule}[g]_{2\kappa_0} \] for some \(\kappa_0 \in \mathbb{N}\). For any \(\kappa \geq \kappa_0\), if the moment relaxation (5.14) is feasible, then it is solvable (i.e., the minimum is attained).

  • If the constraint set is Archimedean, then the moment-SOS hierarchy has asymptotic convergence, i.e., \[ \lim_{\kappa \rightarrow \infty} \gamma_{\kappa}^\star = \lim_{\kappa \rightarrow \infty} \beta_{\kappa}^\star = p^\star. \]

  • If \(g\) includes a ball constraint \(R - \Vert x \Vert^2\), and the POP feasible set \(S\) is nonempty, then \(\gamma_{\kappa}^\star = \beta_{\kappa}^\star\) for any \(\kappa\), and the moment relaxation (5.14) is solvable (i.e., the minimum is attained).

The last point in Theorem 5.6 is particularly useful for numerical computation. It suggests that adding a redundant ball constraint (whenever a valid bound on the variables is known) is always encouraged, as it guarantees strong duality. In addition, an appropriate scaling technique is needed so that all scaled variables belong to the unit ball. Without scaling, numerical troubles can occur.

Without these assumptions and conditions, it is possible that strong duality fails.

Example 5.2 (Failure of Strong Duality in Moment-SOS Hierarchy) Consider the following POP \[\begin{equation} \begin{split} p^\star = \min_{x \in \mathbb{R}^{2}} & \quad x_1 x_2 \\ \mathrm{s.t.}& \quad -1 \leq x_1 \leq 1 \\ & \quad x_2^2 \leq 0 \end{split} \end{equation}\] Clearly, the global minimum \(p^\star = 0\) and it is attained at \((\alpha,0)\) for any \(\alpha \in [-1,1]\). We now implement the moment-SOS hierarchy and see if strong duality holds.

As shown in Example 4.16, the SOS relaxation can be implemented in SOSTOOLS, see this code. Running the code at \(\kappa = 1\), we obtain the following output from MOSEK:

ITE PFEAS    DFEAS    GFEAS    PRSTATUS   POBJ              DOBJ              MU       TIME  
0   1.0e+00  1.0e+00  1.0e+00  0.00e+00   0.000000000e+00   0.000000000e+00   1.0e+00  0.00  
1   1.5e-01  1.5e-01  7.9e-02  5.56e-01   3.088172318e-01   4.050011330e-01   1.5e-01  0.00  
2   4.1e-02  4.1e-02  2.1e-02  3.13e-01   5.858768891e-01   7.658826639e-01   4.1e-02  0.00  
3   1.1e-02  1.1e-02  4.5e-03  2.25e-01   4.180497936e-01   5.431122321e-01   1.1e-02  0.00  
4   4.2e-03  4.2e-03  2.2e-03  -7.15e-02  7.821867345e-01   1.021805063e+00   4.2e-03  0.00  
5   8.7e-04  8.7e-04  3.1e-04  8.58e-02   3.753027688e-01   4.921723327e-01   8.7e-04  0.00  
6   2.0e-04  2.0e-04  1.1e-04  -3.30e-02  8.469613700e-01   1.114348467e+00   2.0e-04  0.00  
7   3.6e-05  3.6e-05  1.5e-05  6.62e-02   5.132062526e-01   6.764418357e-01   3.6e-05  0.00  
8   8.5e-06  8.5e-06  3.9e-06  2.07e-02   6.478873186e-01   8.551888421e-01   8.5e-06  0.00  
9   2.3e-06  2.3e-06  9.1e-07  6.40e-02   5.033974907e-01   6.654248395e-01   2.3e-06  0.00  
10  8.0e-07  8.0e-07  4.2e-07  -1.66e-01  8.695267485e-01   1.150524931e+00   8.0e-07  0.00  
11  1.7e-07  1.7e-07  6.1e-08  8.16e-02   3.666850704e-01   4.865006085e-01   1.7e-07  0.00  
12  3.8e-08  3.8e-08  2.2e-08  -1.05e-01  1.019156252e+00   1.353211111e+00   3.8e-08  0.00  
13  5.8e-09  5.8e-09  2.6e-09  3.57e-02   5.915896244e-01   7.859525244e-01   5.8e-09  0.00  
14  1.4e-09  1.4e-09  5.8e-10  9.40e-02   5.523832993e-01   7.340489667e-01   1.4e-09  0.00  
15  4.3e-10  4.3e-10  2.3e-10  -1.39e-01  8.826739207e-01   1.173406707e+00   4.3e-10  0.00  
16  1.1e-10  1.1e-10  3.7e-11  1.31e-01   3.711852473e-01   4.939043249e-01   1.1e-10  0.00  
17  7.0e-11  2.6e-11  1.5e-11  -1.55e-01  1.056341629e+00   1.406000814e+00   2.6e-11  0.00  
18  7.1e-12  4.6e-12  2.1e-12  -2.75e-03  6.282269209e-01   8.364103066e-01   4.6e-12  0.00  
19  1.5e-12  9.9e-13  4.1e-13  1.33e-01   5.056202080e-01   6.732103175e-01   9.9e-13  0.00  
20  4.2e-13  2.7e-13  1.5e-13  -1.07e-01  8.738865371e-01   1.163743093e+00   2.7e-13  0.00  
21  9.4e-14  6.1e-14  2.3e-14  1.05e-01   4.311662782e-01   5.743514027e-01   6.1e-14  0.00  
22  2.4e-14  1.6e-14  8.5e-15  -1.21e-01  8.898674342e-01   1.185531234e+00   1.6e-14  0.00  
23  7.1e-15  5.8e-15  2.2e-15  -3.77e-02  6.963438000e-01   9.278150049e-01   4.3e-15  0.00  
24  1.2e-15  7.6e-16  3.3e-16  1.78e-01   5.434018296e-01   7.241064462e-01   8.9e-16  0.00  
25  8.9e-16  1.8e-15  1.0e-16  -1.35e-01  6.204719258e-01   8.268596881e-01   2.4e-16  0.00  
26  2.2e-16  4.4e-16  2.2e-17  7.87e-02   5.651101237e-01   7.531366036e-01   6.0e-17  0.00  
27  4.4e-16  2.7e-15  1.9e-17  -1.32e-01  5.677222556e-01   7.566247465e-01   5.1e-17  0.00  
28  6.5e-17  3.6e-15  1.8e-17  -8.42e-02  5.674944086e-01   7.563223512e-01   4.9e-17  0.00  
29  6.0e-17  2.7e-15  1.7e-17  -8.28e-02  5.669921471e-01   7.556553290e-01   4.5e-17  0.00  
30  4.4e-16  2.7e-15  1.7e-17  -6.79e-02  5.669663151e-01   7.556209582e-01   4.5e-17  0.00  
31  4.4e-16  2.7e-15  1.4e-17  -6.99e-02  5.652526235e-01   7.533411196e-01   3.8e-17  0.00  
32  3.0e-17  8.9e-16  8.5e-18  -4.43e-02  5.579125501e-01   7.435651031e-01   2.4e-17  0.00  
33  3.0e-17  8.9e-16  8.5e-18  -8.34e-03  5.579035284e-01   7.435530824e-01   2.4e-17  0.00  
34  3.0e-17  8.9e-16  8.5e-18  -8.34e-03  5.579035284e-01   7.435530824e-01   2.4e-17  0.00  
35  2.5e-17  1.6e-17  6.8e-18  -1.15e-02  5.529270780e-01   7.369224772e-01   1.9e-17  0.00  
36  2.5e-17  2.2e-15  6.8e-18  -1.79e-03  5.529095013e-01   7.368990567e-01   1.9e-17  0.00  
37  2.5e-17  2.2e-15  6.8e-18  -1.79e-03  5.529095013e-01   7.368990567e-01   1.9e-17  0.00  
38  2.5e-17  2.2e-15  6.8e-18  -1.76e-03  5.529007155e-01   7.368873498e-01   1.9e-17  0.00  
39  1.1e-17  4.4e-15  4.0e-18  -1.74e-03  5.409849593e-01   7.210098998e-01   1.1e-17  0.01  
40  1.1e-17  4.9e-15  4.0e-18  1.44e-02   5.408403061e-01   7.208171399e-01   1.1e-17  0.01  
41  1.1e-17  4.9e-15  4.0e-18  1.44e-02   5.408403061e-01   7.208171399e-01   1.1e-17  0.01  
42  1.1e-17  4.9e-15  4.0e-18  1.44e-02   5.408403061e-01   7.208171399e-01   1.1e-17  0.01  
Optimizer terminated. Time: 0.01    


Interior-point solution summary
  Problem status  : UNKNOWN
  Solution status : UNKNOWN
  Primal.  obj: 5.4084030611e-01    nrm: 3e+08    Viol.  con: 1e-09    var: 0e+00    barvar: 0e+00  
  Dual.    obj: 7.2081713987e-01    nrm: 4e+08    Viol.  con: 0e+00    var: 1e-09    barvar: 7e-07 

which shows that the solver did not converge.

If instead, we add a redundant ball constraint \(4 - x_1^2 - x_2^2 \geq 0\) and solve again at \(\kappa=1\), we see the following result:

ITE PFEAS    DFEAS    GFEAS    PRSTATUS   POBJ              DOBJ              MU       TIME  
0   1.0e+00  1.0e+00  1.0e+00  0.00e+00   0.000000000e+00   0.000000000e+00   1.0e+00  0.00  
1   1.8e-01  1.8e-01  1.4e-01  3.28e-01   2.041575331e-01   5.267481899e-01   1.8e-01  0.00  
2   3.5e-02  3.5e-02  1.4e-02  9.46e-01   4.043057656e-01   4.966964156e-01   3.5e-02  0.00  
3   1.3e-02  1.3e-02  3.8e-03  5.26e-01   2.214064092e-01   2.805094594e-01   1.3e-02  0.00  
4   3.5e-03  3.5e-03  6.1e-04  7.62e-01   1.925510959e-01   2.134013995e-01   3.5e-03  0.00  
5   1.1e-03  1.1e-03  1.7e-04  4.01e-01   1.154109271e-01   1.350598297e-01   1.1e-03  0.00  
6   2.9e-04  2.9e-04  2.8e-05  7.10e-01   8.188641629e-02   8.975466685e-02   2.9e-04  0.00  
7   9.1e-05  9.1e-05  8.2e-06  4.07e-01   5.245561798e-02   5.978149022e-02   9.1e-05  0.00  
8   2.7e-05  2.7e-05  1.6e-06  6.84e-01   3.727529406e-02   4.054196962e-02   2.7e-05  0.00  
9   9.9e-06  9.9e-06  6.1e-07  3.09e-01   2.492316391e-02   2.855972552e-02   9.9e-06  0.00  
10  2.5e-06  2.5e-06  9.4e-08  6.93e-01   1.662754698e-02   1.801019802e-02   2.5e-06  0.00  
11  7.4e-07  7.4e-07  2.7e-08  3.57e-01   1.066174340e-02   1.199307778e-02   7.4e-07  0.00  
12  2.2e-07  2.2e-07  5.4e-09  6.43e-01   7.435651393e-03   8.053182585e-03   2.2e-07  0.00  
13  7.8e-08  7.8e-08  2.1e-09  2.63e-01   5.027346675e-03   5.735405751e-03   7.8e-08  0.00  
14  1.9e-08  1.9e-08  3.1e-10  6.78e-01   3.274376941e-03   3.539771530e-03   1.9e-08  0.00  
15  5.4e-09  5.4e-09  8.6e-11  3.51e-01   2.090217524e-03   2.340150880e-03   5.4e-09  0.00  
16  1.6e-09  1.6e-09  1.8e-11  6.25e-01   1.460073639e-03   1.580348192e-03   1.6e-09  0.00  
17  6.0e-10  6.0e-10  7.2e-12  2.35e-01   1.000023643e-03   1.142129878e-03   6.0e-10  0.00  
18  1.4e-10  1.4e-10  1.0e-12  6.79e-01   6.400941650e-04   6.913204662e-04   1.4e-10  0.00  
19  1.0e-10  3.9e-11  2.7e-13  3.56e-01   4.051653495e-04   4.517858240e-04   3.9e-11  0.00  
20  1.2e-11  1.2e-11  5.8e-14  6.05e-01   2.840240254e-04   3.074642454e-04   1.2e-11  0.00  
21  4.6e-12  4.5e-12  2.4e-14  2.10e-01   1.967647207e-04   2.251266137e-04   4.5e-12  0.00  
22  1.0e-12  1.0e-12  3.3e-15  6.81e-01   1.242068557e-04   1.340389165e-04   1.0e-12  0.00  
23  2.7e-13  2.7e-13  8.0e-16  3.56e-01   7.772043572e-05   8.646338802e-05   2.7e-13  0.00  
24  8.5e-14  8.5e-14  1.8e-16  5.92e-01   5.463905229e-05   5.916694930e-05   8.5e-14  0.00  
25  3.3e-14  3.3e-14  7.7e-17  1.91e-01   3.813146090e-05   4.367749095e-05   3.3e-14  0.00  
26  7.5e-15  7.5e-15  1.0e-17  6.82e-01   2.388667090e-05   2.575875793e-05   7.5e-15  0.00  
27  9.5e-13  6.1e-15  8.2e-18  3.63e-01   2.249020870e-05   2.430458202e-05   6.1e-15  0.00  
28  2.7e-13  1.3e-15  1.5e-18  3.14e-01   1.318081011e-05   1.456985641e-05   1.3e-15  0.00  
29  7.4e-13  1.2e-15  1.4e-18  1.00e+00   1.295450582e-05   1.429001140e-05   1.2e-15  0.00  
30  9.7e-13  1.2e-15  1.4e-18  1.00e+00   1.283897715e-05   1.414828202e-05   1.2e-15  0.00  
31  1.0e-12  1.2e-15  1.4e-18  1.00e+00   1.280983848e-05   1.411266897e-05   1.2e-15  0.00  
32  1.0e-12  1.2e-15  1.4e-18  1.00e+00   1.280618882e-05   1.410821245e-05   1.2e-15  0.00  
33  1.0e-12  1.2e-15  1.4e-18  1.00e+00   1.280436352e-05   1.410598387e-05   1.2e-15  0.00  
34  7.1e-13  3.3e-16  2.8e-19  4.71e-01   8.481144013e-06   9.212971618e-06   3.3e-16  0.00  
35  7.2e-13  3.3e-16  2.8e-19  1.00e+00   8.474714028e-06   9.205554497e-06   3.3e-16  0.00  
36  7.3e-13  3.3e-16  2.8e-19  1.00e+00   8.473094330e-06   9.203686456e-06   3.3e-16  0.01  
37  4.4e-14  1.2e-16  1.1e-19  1.73e-01   5.812012110e-06   6.610489568e-06   1.2e-16  0.01  
38  3.2e-13  7.8e-17  5.6e-20  1.00e+00   4.907525789e-06   5.428890141e-06   8.3e-17  0.01  
39  3.9e-13  7.1e-17  4.9e-20  1.00e+00   4.737143451e-06   5.216312950e-06   7.7e-17  0.01  
40  1.0e-13  1.9e-17  9.0e-21  5.68e-01   3.163202225e-06   3.391803637e-06   2.0e-17  0.01  
41  2.1e-13  1.8e-17  8.3e-21  1.00e+00   3.079800819e-06   3.297015112e-06   2.0e-17  0.01  
42  3.0e-13  1.7e-17  7.7e-21  1.00e+00   2.985096669e-06   3.190159883e-06   2.0e-17  0.01  
43  3.1e-13  1.7e-17  7.6e-21  1.00e+00   2.972167540e-06   3.175652455e-06   2.0e-17  0.01  
44  1.2e-13  3.9e-18  1.6e-21  1.93e-01   1.802355527e-06   1.972166368e-06   4.4e-18  0.01  
45  1.7e-13  1.5e-18  3.7e-22  1.00e+00   1.272549881e-06   1.334918948e-06   1.7e-18  0.01  
46  1.7e-13  1.5e-18  3.7e-22  1.00e+00   1.272549881e-06   1.334918948e-06   1.7e-18  0.01  
47  1.7e-13  1.5e-18  3.7e-22  1.00e+00   1.272549881e-06   1.334918948e-06   1.7e-18  0.01  
Optimizer terminated. Time: 0.01    


Interior-point solution summary
  Problem status  : PRIMAL_AND_DUAL_FEASIBLE
  Solution status : OPTIMAL

which shows the solver converged and the optimal value is numerically close to zero. This illustrates the last point of Theorem 5.6.

We can also write the constraint \(x_2^2 \leq 0\) equivalently as \(x_2 = 0\), in which case the POP satisfies the first point of Theorem 5.6 and strong duality holds. Implementing this in code, MOSEK produces the following output

ITE PFEAS    DFEAS    GFEAS    PRSTATUS   POBJ              DOBJ              MU       TIME  
0   1.0e+00  1.0e+00  1.0e+00  0.00e+00   0.000000000e+00   0.000000000e+00   1.0e+00  0.00  
1   2.8e-01  2.8e-01  1.7e-01  6.80e-01   1.784287181e-01   2.250737385e-01   2.8e-01  0.00  
2   1.6e-02  1.6e-02  2.4e-03  1.10e+00   -1.896781377e-02  -1.421782353e-02  1.6e-02  0.00  
3   1.8e-04  1.8e-04  2.7e-06  1.00e+00   -2.173561193e-04  -1.728424914e-04  1.8e-04  0.00  
4   1.0e-06  1.0e-06  1.2e-09  1.00e+00   -1.291256020e-06  -1.041546182e-06  1.0e-06  0.00  
5   1.9e-08  1.9e-08  3.0e-12  1.00e+00   -2.316455946e-08  -1.858439993e-08  1.9e-08  0.00  
Optimizer terminated. Time: 0.00    


Interior-point solution summary
  Problem status  : PRIMAL_AND_DUAL_FEASIBLE
  Solution status : OPTIMAL

with optimal value numerically zero.

In all three cases, the geometry of the feasible set never changes; what changes is the algebraic description of the same geometric set. This shows that the moment-SOS hierarchy can be sensitive to how the user describes the problem.

5.4 Measure-Theoretic Interpretation

Let us zoom out a bit and see what we have done. We started with Putinar’s Positivstellensatz, applied it to polynomial optimization (POP) and obtained the SOS relaxation of POPs. Then, by taking the dual of \(\mathbb{R}[x]_{2d}\) and \(\mathrm{Ideal}[h] + \mathrm{Qmodule}[g]\), and applying standard conic duality, we obtained the primal moment relaxation, which is an optimization over the truncated multi-sequence \(y\).

However, recall that in the discussion of Shor’s SDP relaxation, the dual of the dual can be directly obtained by first writing the QCQP as a rank-constrained matrix optimization problem and then dropping the rank constraint – an approach that gives us the primal relaxation without needing to go through the dual relaxation. Can we do the same for POPs, i.e., obtain the moment relaxation without going through the SOS relaxation? The answer is affirmative.

Consider the polynomial optimization (POP) in (5.12) and denote its feasible set as \[ S = \{ x \in \mathbb{R}^{n} \mid h_i(x)=0,i=1,\dots,l_h, g_i(x)\geq 0, i=1,\dots,l_g \}. \] We assume \(S\) is compact and Archimedean (e.g., there is a ball constraint \(R - x^\top x \geq 0\) in \(g\)). The POP (5.12) is equivalent to \[\begin{equation} p^\star = \min_{x \in S} \quad p(x). \tag{5.17} \end{equation}\]

A key observation is that problem (5.17) is equivalent to an infinite-dimensional convex optimization problem! Let \(\mathcal{P}(S)\) be the set of all nonnegative Borel measures supported on \(S\); the following result shows that the POP (5.17) is equivalent to a linear optimization over \(\mathcal{P}(S)\).

Theorem 5.7 (Optimization over Measures) The following optimization \[\begin{equation} \begin{split} \rho^\star = \min_{ \mu \in \mathcal{P}(S)} & \quad \int_S p(x) d \mu(x) \\ \mathrm{s.t.}& \quad \int_S 1 d \mu(x) = 1. \end{split} \tag{5.18} \end{equation}\] is equivalent to the POP (5.17), i.e., \(p^\star = \rho^\star\).

Proof. To see this, note that if \(p^\star = -\infty\), then trivially \(\rho^\star = -\infty\) (this is left as an exercise for you). We focus on the case where \(p^\star > - \infty\) is finite. Since \(p^\star\) is the global minimum, we have \[ p(x) \geq p^\star, \forall x \in S. \] Therefore, \[ \int_S p(x) d\mu(x) \geq \int_S p^\star d \mu(x) = p^\star, \quad \forall \mu \in \mathcal{P}(S), \int_S 1 d\mu(x) = 1. \] This shows that \[ \rho^\star \geq p^\star. \] Conversely, for any \(x \in S\), let \(\delta_{x}\) be the Dirac measure at \(x\). The measure \(\delta_{x}\) is feasible for problem (5.18) and attains the same cost as \(x\) does for the POP. This shows that \[ \rho^\star \leq p^\star. \] As a result, \(\rho^\star = p^\star\).

The proof shows that, if \(x_\star\) is a global optimizer of the POP, then the Dirac measure \(\mu_\star = \delta_{x_\star}\) is an optimal solution to (5.18).

Now let us inspect the objective function of (5.18). Suppose \(p(x) \in \mathbb{R}[x]_{2d}\), then \[ p(x) = \sum_{\alpha \in \mathcal{F}_{n,2d}} p_{\alpha} x^{\alpha} = \langle \mathrm{vec}(p), [x]_{2d} \rangle. \] Plugging this into the objective function, we see \[ \int_S p(x) d \mu(x) = \int_S \left(\sum_{\alpha} p_{\alpha} x^{\alpha}\right) d \mu(x) = \sum_{\alpha} p_{\alpha} \left(\int_S x^{\alpha} d \mu(x)\right). \] Denoting \[ y_{\alpha} = \int_S x^{\alpha} d \mu(x) \] as the \(\alpha\)-th moment of \(\mu\) on \(S\), and \[\begin{equation} y = (y_\alpha)_{\alpha \in \mathcal{F}_{n,2d}} = \int_S [x]_{2d} d \mu(x) \tag{5.19} \end{equation}\] as the truncated moment sequence of degree up to \(2d\), we can see that problem (5.18) is a linear optimization over the moments of \(\mu\): \[\begin{equation} \begin{split} \min_{\mu \in \mathcal{P}(S)} & \quad \langle \mathrm{vec}(p), y \rangle \\ \mathrm{s.t.}& \quad y_0 = 1 \end{split} \tag{5.20} \end{equation}\] where \(y_0\) is the zero-order moment.

Now in order to solve problem (5.20), the key question boils down to: given a TMS \(y = (y_{\alpha})\), under what conditions can we ensure that \(y\) has a representing measure supported on \(S\), i.e., equation (5.19) holds for some measure \(\mu\)?

As we will introduce soon, we will derive necessary conditions on the TMS \(y\) for it to admit a representing measure, which, perhaps not surprisingly, are exactly the dual conic constraint \(y \in \mathcal{Z}[h]_{2d} \cap \mathcal{M}[g]_{2d}\) we introduced in Theorem 5.5! Therefore, the moment relaxation can be derived by solving (5.20) with the representing-measure requirement replaced by these necessary conditions on \(y\).

However, the measure-theoretic perspective gives us one extra advantage: via checkable sufficient conditions for \(y\) to admit a representing measure, we can certify convergence of the moment-SOS hierarchy and extract global minimizers of the original POP!

5.5 Truncated Moment Problem

Given a truncated multi-sequence \(y = (y_\alpha)_{\alpha \in \mathcal{F}_{n,2d}}\), it is said to admit a representing measure supported on \(S\) if (5.19) holds for some Borel measure \(\mu\) supported on \(S\) (the support of a measure on \(\mathbb{R}^{n}\) is the smallest closed set \(T \subseteq \mathbb{R}^{n}\) such that \(\mu(\mathbb{R}^{n} \backslash T) = 0\)), in which case \(\mu\) is called an \(S\)-representing measure of \(y\).

A measure \(\mu\) whose support is a finite set is called finitely atomic. A measure is called \(r\)-atomic if its support has cardinality \(r\).

The following result, due to (Bayer and Teichmann 2006), shows that a representing measure can always be chosen to be finitely atomic if it exists.

Theorem 5.8 (Representing Measure) If a TMS \(y \in \mathbb{R}^{s(n,2d)}\) admits a representing measure \(\mu\), then it also admits a finitely atomic representing measure \(\nu\) such that \[ \mathrm{supp}(\nu) \subseteq \mathrm{supp}(\mu), \quad \left\vert \mathrm{supp}(\nu) \right\vert \leq s(n,2d). \]

The truncated \(S\)-moment problem concerns the following: given a TMS \(y\), does it admit a representing measure \(\mu\) supported on \(S\)?

To answer this question, let us pick any polynomial \(p(x) \in \mathbb{R}[x]_{2d}\) that is SOS, then it is easy to see that, for any measure \(\mu\), \[ \int_S p(x) d\mu(x) \geq 0 \] must hold. Using the fact that \(p(x)\) is SOS, we can write \[ p(x) = [x]_d^\top Q [x]_d = \left\langle Q, [x]_d [x]_d^\top \right\rangle, \quad Q \succeq 0 \] and therefore \[\begin{equation} \int_S p(x) d\mu(x) = \int_S \left\langle Q, [x]_d [x]_d^\top \right\rangle d \mu(x) = \left\langle Q, \int_S [x]_d [x]_d^\top d \mu(x) \right\rangle = \langle Q, M_d[y] \rangle \tag{5.21} \end{equation}\] where \(M_d[y] \in \mathbb{S}^{s(n,d)}\) is the moment matrix of \(\mu\)’s truncated moment sequence \(y \in \mathbb{R}^{s(n,2d)}\). In order for (5.21) to be nonnegative for any \(Q \succeq 0\), the moment matrix \(M_d[y]\) must be positive semidefinite.

We can keep doing this using the defining polynomials of \(S\). Pick any constraint \(g_i(x) \geq 0\) that defines \(S\), and any SOS polynomial \(\sigma_i(x)\) whose degree \(2s_i\) is chosen such that \(2 s_i + \deg(g_i) \leq 2d\); then \[ \int_S g_i(x) \sigma_i(x) d\mu(x) \geq 0 \] must hold. Since \(\sigma_i(x)\) is SOS, it can be written as \(\sigma_i(x) = [x]_{s_i}^\top Q_i [x]_{s_i}\) for some \(Q_i \succeq 0\). Then we have \[ \int_S g_i(x) \sigma_i(x) d\mu(x) = \int_S g_i(x) [x]_{s_i}^\top Q_i [x]_{s_i} d\mu(x) = \left\langle Q_i, \int_S g_i(x) [x]_{s_i}[x]_{s_i}^\top d\mu(x) \right\rangle = \langle Q_i, L_{g_i}^d[y] \rangle \] where \(L_{g_i}^d [y]\) is the localizing matrix of \(g_i(x)\) generated by the moment vector \(y\). In order for the above equation to be nonnegative for any \(Q_i \succeq 0\), we must have \(L_{g_i}^d[y] \succeq 0\) for every \(g_i\).

Similarly, we can pick any equality constraint \(h_i(x) = 0\) that defines \(S\) and any polynomial \(\lambda_i(x)\) whose degree is chosen such that \(\deg(\lambda_i) + \deg(h_i) \leq 2d\), then \[ \int_S h_i(x) \lambda_i(x) d\mu(x) = 0. \] In order for this equation to hold for any multiplier \(\lambda_i(x)\), one can verify that we must have the localizing vector \(L_{h_i}^{2d}[y] = 0\). This gives us a set of necessary conditions for \(y\) to admit a representing measure.

Proposition 5.1 (Necessary Condition for Existence of Representing Measure) Let \(y \in \mathbb{R}^{s(n,2d)}\) be a truncated multi-sequence. If \(y\) admits an \(S\)-representing measure, i.e., (5.19) holds for some measure \(\mu\) supported on \(S\), then \(y\) must satisfy \[ M_d[y] \succeq 0, \quad L_{g_i}^d[y] \succeq 0, i=1,\dots,l_g, \quad L_{h_i}^{2d}[y] = 0, i=1,\dots,l_h. \] This is equivalent to \(y \in \mathcal{Z}[h]_{2d} \cap \mathcal{M}[g]_{2d}\).
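
A quick numerical sanity check of Proposition 5.1 (a Python/NumPy/SciPy sketch of my own) in the univariate case with \(S = [-1,1]\) described by \(g = 1 - x^2\): the moments of any atomic measure supported in \(S\) yield PSD moment and localizing matrices, both of which are Hankel here:

```python
import numpy as np
from scipy.linalg import hankel

rng = np.random.default_rng(0)
d = 3
atoms = rng.uniform(-1.0, 1.0, size=5)       # support inside S = [-1, 1]
weights = rng.uniform(0.1, 1.0, size=5)      # nonnegative weights

# moments y_a = sum_i w_i v_i^a for a = 0, ..., 2d
y = np.array([np.sum(weights * atoms ** a) for a in range(2 * d + 1)])

M = hankel(y[: d + 1], y[d:])                # moment matrix M_d[y]
# localizing matrix of g = 1 - x^2: entries y_{a+b} - y_{a+b+2}, a, b = 0, ..., d-1
L = hankel(y[:d], y[d - 1 : 2 * d - 1]) - hankel(y[2 : d + 2], y[d + 1 :])

assert np.all(np.linalg.eigvalsh(M) >= -1e-10)   # necessary: M_d[y] PSD
assert np.all(np.linalg.eigvalsh(L) >= -1e-10)   # necessary: L_g^d[y] PSD
```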

The first corollary of Proposition 5.1 is that problem (5.14) is a convex relaxation for the infinite-dimensional problem (5.18).

Corollary 5.1 (Moment Relaxation) The convex optimization problem (5.14) is a convex relaxation of problem (5.18) (and hence is also a relaxation of the original POP): \[ \beta^\star_\kappa \leq \rho^\star = p^\star. \]

The second corollary of Proposition 5.1, which is the extra advantage offered via this measure-theoretic interpretation, is that if we solve the moment relaxation (5.14) and obtain an optimal solution \(y_\star\) that indeed admits a representing measure \(\mu_\star\), then \(\mu_\star\) is in fact a global optimal solution to problem (5.18). Moreover, if \(y_\star\) admits a representing measure \(\mu_\star\), then by Theorem 5.8 it also admits a finitely atomic measure \(\nu_\star\) and every point in the support of \(\nu_\star\) is a global optimal solution to the POP. In this case, we say the moment-SOS relaxation is exact or tight.

Corollary 5.2 (Exactness of Moment Relaxation) Let the moment relaxation (5.14) be solvable and \(y_\star\) be an optimal solution. If \(y_\star\) admits an \(r\)-atomic representing measure \(\nu_\star\) supported on \(S\), then every point \(x_\star \in \mathrm{supp}(\nu_\star)\) is a globally optimal solution to the POP (5.17).

Proof. Since \(y_\star\) admits an \(r\)-atomic representing measure \(\nu_\star\) we have \[ \nu_\star = \lambda_1 \delta_{u_1} + \dots + \lambda_r \delta_{u_r}, \quad \lambda_i \geq 0, i=1,\dots,r, \] where \(\mathrm{supp}(\nu_\star) = \{ u_1,\dots,u_r \} \subset S\). This is equivalent to \[ y_\star = \lambda_1 [u_1]_{2\kappa} + \cdots + \lambda_r [u_r]_{2\kappa}, \quad \lambda_i \geq 0, i=1,\dots,r. \] The fact that \(y_\star\) needs to satisfy the constraint \(y_0 = 1\) implies \(\sum_{i=1}^r \lambda_i = 1\).

By the nature of relaxation, \[\begin{equation} \langle \ell_{y_\star}, p(x) \rangle \leq p^\star \Rightarrow \sum_{i=1}^r \lambda_i p(u_i) \leq p^\star. \tag{5.22} \end{equation}\] However, since each \(u_i\) is a feasible point of the POP, we have \[\begin{equation} p(u_i) \geq p^\star \Rightarrow \sum_{i=1}^r \lambda_i p(u_i) \geq p^\star. \tag{5.23} \end{equation}\] Combining (5.22) and (5.23), we have \[ p(u_i) = p^\star, \quad \forall i=1,\dots,r, \] and every \(u_i\) is globally optimal for the POP.

This is an extremely powerful result. It shows that, if given a TMS \(y\) one can check sufficient conditions on \(y\) admitting an \(r\)-atomic representing measure \(\nu_\star\), then it certifies convergence and exactness of the moment-SOS hierarchy. Moreover, every point in the support of \(\nu_\star\) is a globally optimal solution of the original POP.

5.6 Flat Extension: Detecting Global Optimality and Extracting Solutions

Fortunately, there exist sufficient conditions for a TMS \(y \in \mathbb{R}^{s(n,2d)}\) to admit an \(r\)-atomic representing measure. Suppose we solve the moment relaxation at order \(\kappa\) and let \(y_\star = (y_\alpha)_{\alpha \in \mathcal{F}_{n,2\kappa}}\) be an optimal solution. Denote \[ d_c := \max \left\{ 1, \lceil \frac{\deg(h_i)}{2} \rceil,i=1,\dots,l_h, \lceil \frac{\deg(g_i)}{2} \rceil,i=1,\dots,l_g \right\} \] and \[ d_0 := \max \left\{ d_c, \lceil \frac{\deg(p)}{2} \rceil \right\} . \] Clearly we have \(\kappa \geq d_0\), because otherwise the TMS \(y\) does not have enough degree to include all monomials in the objective and constraint polynomials.

The following result gives a sufficient condition for the exactness of the moment-SOS relaxation.

Theorem 5.9 (Flat Truncation) Suppose there exists \(t \in [d_0, \kappa]\) such that \[\begin{equation} \mathrm{rank}(M_t[y_\star]) = \mathrm{rank}(M_{t-d_c}[y_\star]) = r, \tag{5.24} \end{equation}\] then the truncated optimal solution \(y_{\star,2t}:=(y_\alpha)_{\alpha \in \mathcal{F}_{n,2t}}\) admits an \(r\)-atomic representing measure \(\nu\) such that each point in \(\mathrm{supp}(\nu)\) is a global optimizer of the original POP.

Condition (5.24) is called flat extension for the truncated TMS \(y_{\star,2t}\). The proof that the flat extension condition is sufficient for an \(r\)-atomic representing measure is due to (Curto and Fialkow 2005). The proof of exactness given an \(r\)-atomic representing measure of \(y_{\star,2t}\) is given by Corollary 5.2.

Next, we will present an algorithm that can find the \(r\)-atomic representing measure of \(y_{\star,2t}\) when it satisfies the flat extension condition (5.24); the algorithm was first proposed in (Henrion and Lasserre 2005). For ease of notation, I will remove the “\(\star\)” subscript in the TMS \(y_{\star,2t}\). Given \(y \in \mathbb{R}^{s(n,2t)}\), the goal is to extract \(r = \mathrm{rank}(M_t[y])\) optimal solutions of the POP.

Since \(y\) admits an \(r\)-atomic representing measure \(\nu\), we have \[ \nu = \lambda_1 \delta_{x_1} + \dots + \lambda_r \delta_{x_r}, \quad \lambda_i \geq 0, x_i \in S, i=1,\dots,r. \] This implies \[\begin{equation} \begin{split} M_t[y] & = \lambda_1 [x_1]_t [x_1]_t^\top+ \dots + \lambda_r [x_r]_t [x_r]_t^\top\\ & = \underbrace{\begin{pmatrix} [x_1]_t & \dots & [x_r]_t \end{pmatrix}}_{:= V_\star \in \mathbb{R}^{s(n,t) \times r}} \underbrace{\begin{pmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_r \end{pmatrix}}_{\in \mathbb{R}^{r \times r}} \underbrace{\begin{pmatrix} [x_1]_t^\top\\ \vdots \\ [x_r]_t^\top \end{pmatrix}}_{:= V_\star^\top\in \mathbb{R}^{r \times s(n,t)}} \end{split} \end{equation}\]

A polynomial \(p \in \mathbb{R}[x]_t\) vanishes identically on \(\mathrm{supp}(\nu)\) (i.e., \(p \equiv 0\) on \(\mathrm{supp}(\nu)\)) if and only if \(M_t[y] \mathrm{vec}(p) = 0\), because \[ 0 = \int p(x)^2 d\nu(x) = \mathrm{vec}(p)^\top M_t[y] \mathrm{vec}(p). \] We now look for polynomials from the null space (or kernel) of \(M_t[y]\), which we denote \(\mathrm{ker}(M_t[y])\). Consider the ideal \[ I := \mathrm{Ideal}[ \{ p \in \mathbb{R}[x]_t \mid \mathrm{vec}(p) \in \mathrm{ker}(M_t[y]) \} ]. \] We will show that we can construct a basis for the quotient space \(\mathbb{R}[x] / I\).

Let the spectral decomposition of \(M_t[y]\) be \[ M_t[y] = V \Lambda V^\top, \quad V \in \mathbb{R}^{s(n,t) \times r}. \] Clearly, the kernel of \(M_t[y]\) is the same as the null space of \(V^\top\): \[ \mathrm{ker}(M_t[y]) = \mathrm{ker}(V^\top). \] Therefore, to get the kernel of \(M_t[y]\), we can first put \(V^\top\) into reduced row echelon form (RREF; for a square matrix, the RREF can be thought of as an upper triangular form), e.g., using the Matlab rref function: \[ U = \begin{bmatrix} 1 & * & 0 & 0 & * & \cdots & 0 & * & * & * \\ & & 1 & 0 & * & \cdots & 0 & * & * & * \\ & & & 1 & * & \cdots & 0 & * & * & * \\ & & & & & \cdots & 0 & * & * & * \\ & & & & & \ddots & 0 & * & * & * \\ & & & & & & 1 & * & * & * \end{bmatrix} \in \mathbb{R}^{r \times s(n,t)}. \] The columns of \(U\) are labeled by the standard monomials \(x^{\alpha}, \alpha \in \mathcal{F}_{n,t}\). The RREF \(U\) has \(r\) pivot columns (the columns containing the first nonzero element of each row), say labelled by \[ B_0 = \{ \beta_1, \dots, \beta_r \}, \quad \beta_i \in \mathcal{F}_{n,t}, i=1,\dots,r. \] Let \(e_i \in \mathbb{N}^{n}\) be the \(i\)-th standard unit vector (all zeros except a \(1\) in the \(i\)-th entry), and form the new set \[ B_1 := \left( (e_1 + B_0) \cup \dots \cup (e_n + B_0) \right) \backslash B_0. \]
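
The pivot bookkeeping above can be mechanized; besides Matlab's rref, SymPy's Matrix.rref, for instance, returns both the RREF and the pivot column indices. A tiny sketch with a toy \(V^\top\) (made-up numbers, just to show the call):

```python
import sympy as sp

Vt = sp.Matrix([[1, 2, 0, 3],     # a toy V^T of rank 2 (hypothetical numbers)
                [2, 4, 1, 7]])
U, pivots = Vt.rref()             # RREF and the tuple of pivot column indices
print(pivots)                     # (0, 2): these columns index the monomials in B_0
print(U)                          # Matrix([[1, 2, 0, 3], [0, 0, 1, 1]])
```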

The following result shows that we can obtain all the solutions \(x_1,\dots,x_r\) by solving a system of polynomial equations.

Lemma 5.1 (Extracting Solutions) Suppose the flat extension condition holds for \(y \in \mathbb{R}^{s(n,2t)}\), then, \[ \mathbb{R}[x]/I = \mathrm{span}\{ [x^{\beta_1}], \dots, [x^{\beta_r}] \}, \] and the points \(x_1,\dots,x_r\) in \(\mathrm{supp}(\nu)\) are precisely solutions to the following polynomial system of equations \[\begin{equation} p_{\alpha}(x) = x^{\alpha} - \sum_{j=1}^r U(j,\alpha) x^{\beta_j} = 0, \quad \alpha \in B_1. \tag{5.25} \end{equation}\]

The fact that all the polynomials \(p_{\alpha}(x), \alpha \in B_1\) are in the null space of \(M_t[y]\) can be deduced from the RREF \(U\), e.g., you can see this webpage about how to find the null space of an RREF.

Our goal now becomes to find all the roots of the system (5.25). This is something we learned how to solve in Chapter 4.1.5! Specifically, the basic strategy is to first find the multiplication matrices, and then simultaneously diagonalize them, or use a more robust procedure suggested in (Corless, Gianni, and Trager 1997b), which I asked you to practice in one of the problem sets.

Let us make this solution extraction procedure concrete by going through an example.

Example 5.3 (Solution Extraction from Moment Matrix) Consider the polynomial optimization problem \[\begin{equation} \begin{split} \min_{x \in \mathbb{R}^{2}} & \quad -(x_1 - 1)^2 - (x_1 - x_2)^2 - (x_2 - 3)^2 \\ \mathrm{s.t.}& \quad 1 - (x_1 - 1)^2 \geq 0 \\ & \quad 1 - (x_1 - x_2)^2 \geq 0 \\ & \quad 1 - (x_2 - 3)^2 \geq 0 \end{split} \end{equation}\]

We can implement the primal moment relaxation using the conversion approach in Chapter 5.7.2 and solve the resulting SDP using MOSEK. You can check out the code, which is quite easy to use.

Solving the moment relaxation with \(\kappa =2\), we obtain \[ \mathrm{rank}(M_1[y]) = \mathrm{rank}(M_2[y]) = 3. \] Therefore, the flat extension condition (5.24) holds with \(r = 3\), and we can extract three global optimizers. The moment matrix reads \[ M_2[y] = \begin{bmatrix} 1.0000 & 1.5868 & 2.2477 & 2.7603 & 3.6690 & 5.2387 \\ 1.5868 & 2.7603 & 3.6690 & 5.1073 & 6.5115 & 8.8245 \\ 2.2477 & 3.6690 & 5.2387 & 6.5115 & 8.8245 & 12.7072 \\ 2.7603 & 5.1073 & 6.5115 & 9.8013 & 12.1965 & 15.9960 \\ 3.6690 & 6.5115 & 8.8245 & 12.1965 & 15.9960 & 22.1084 \\ 5.2387 & 8.8245 & 12.7072 & 15.9960 & 22.1084 & 32.1036 \end{bmatrix} \] whose rows and columns are indexed by the standard monomial basis \[ [x]_2 = \begin{bmatrix} 1 \\ x_1 \\ x_2 \\ x_1^2 \\ x_1 x_2 \\ x_2^2 \end{bmatrix}. \] We can factorize \(M_2[y]\) as \(VV^\top\) with \[ V = \begin{bmatrix} -0.9384 & -0.0247 & 0.3447 \\ -1.6188 & 0.3036 & 0.2182 \\ -2.2486 & -0.1822 & 0.3864 \\ -2.9796 & 0.9603 & -0.0348 \\ -3.9813 & 0.3417 & -0.1697 \\ -5.6128 & -0.7627 & -0.1365 \end{bmatrix}. \] Reduce \(V\) to column echelon form (which is equivalent to reducing \(V^\top\) to row echelon form): \[ U = \begin{bmatrix} 1 & & \\ 0 & 1 & \\ 0 & 0 & 1 \\ -2 & 3 & 0 \\ -4 & 2 & 2 \\ -6 & 0 & 5 \end{bmatrix}. \] Pivot entries of \(U\) correspond to the following monomial basis \[ w(x) = \begin{bmatrix} 1\\ x_1 \\ x_2 \end{bmatrix}. \] The polynomial system of equations \([x]_2 = U w(x)\) reads \[\begin{equation} \begin{split} x_1^2 &= -2 + 3x_1 \\ x_1 x_2 &= -4 + 2x_1 + 2x_2 \\ x_2^2 &= -6 + 5x_2 \end{split} \end{equation}\] Multiplication matrices, by \(x_1\) and \(x_2\), can be extracted from \(U\): \[ x_1 \cdot w(x) = \begin{bmatrix} 0 & 1 & 0 \\ -2 & 3 & 0 \\ -4 & 2 & 2 \end{bmatrix} w(x) = M_1 w(x) \] \[ x_2 \cdot w(x) = \begin{bmatrix} 0 & 0 & 1 \\ -4 & 2 & 2\\ -6 & 0 & 5 \end{bmatrix} w(x) = M_2 w(x). \] Form a generic linear combination of \(M_1\) and \(M_2\): \[ M = 3 M_1 + 4 M_2, \] and then find its Schur decomposition (we write the orthogonal factor as \(Z\) to avoid clashing with the echelon form \(U\) above): \[ Z^* M Z = T = \begin{bmatrix} 18.0000 & -36.3731 & -30.0892\\ 0 & 11.0000 & -0.8018 \\ 0 & 0 & 14.0000 \end{bmatrix}, \quad Z = \begin{bmatrix} -0.2673 & -0.7715 & -0.5774\\ -0.5345 & 0.6172 & -0.5774\\ -0.8018 & -0.1543 & 0.5774 \end{bmatrix}. \] Then we perform \[ Z^* M_1 Z = \begin{bmatrix} 2.0000 & -5.1962 & -1.3887 \\ 0.0000 & 1.0000 & -0.2673\\ 0.0000 & 0.0000 & 2.0000 \end{bmatrix}, \quad Z^* M_2 Z = \begin{bmatrix} 3.0000 & -5.1962 & -6.4807\\ 0.0000 & 2.0000 & 0.0000\\ 0.0000 & -0.0000 & 2.0000 \end{bmatrix} \] from which the three global optimizers can be read off the diagonals: \[ x(1) = \begin{bmatrix} 2 \\ 3 \end{bmatrix}, \quad x(2) = \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \quad x(3) = \begin{bmatrix} 2 \\ 2 \end{bmatrix}. \]
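
The eigenvalue computation at the end of Example 5.3 is easy to reproduce; here is a Python/NumPy/SciPy sketch starting from the multiplication matrices given above (`scipy.linalg.schur` computes the real Schur decomposition \(M = Z T Z^\top\)):

```python
import numpy as np
from scipy.linalg import schur

# multiplication matrices by x1 and x2 from Example 5.3
M1 = np.array([[ 0., 1., 0.],
               [-2., 3., 0.],
               [-4., 2., 2.]])
M2 = np.array([[ 0., 0., 1.],
               [-4., 2., 2.],
               [-6., 0., 5.]])

M = 3 * M1 + 4 * M2                # generic linear combination
T, Z = schur(M, output='real')     # M = Z T Z^T, Z orthogonal

# M1 and M2 commute, so Z (quasi-)triangularizes both;
# the diagonals carry the coordinates of the solutions
x1 = np.diag(Z.T @ M1 @ Z)
x2 = np.diag(Z.T @ M2 @ Z)
print(np.column_stack([x1, x2]))   # rows are (2, 3), (1, 2), (2, 2), up to ordering
```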

5.7 Conversion to Standard SDP

5.7.1 The First Approach

The moment relaxation (5.14) at the \(\kappa\)-th relaxation order can be written as \[\begin{equation} \begin{split} \max_{y \in \mathbb{R}^{s(n,2\kappa)}} & \quad \langle - \mathrm{vec}(p), y \rangle \\ \mathrm{s.t.}& \quad y_0 = 1 \\ & \quad M_\kappa[y] \succeq 0 \\ & \quad L_{g_i}^{\kappa}[y] \succeq 0, i=1,\dots,l_g \\ & \quad L_{h_i}^{2\kappa}[y] = 0, i=1,\dots,l_h \end{split}, \tag{5.26} \end{equation}\] where I have converted the \(\min\) to \(\max\). We now sketch how we can write problem (5.26) as a standard conic optimization problem described in Chapter 2.3, in particular the dual conic problem (2.8).

First, we can write the moment matrix as \[ M_{\kappa}[y] = \sum_{\alpha \in \mathcal{F}_{n,2\kappa}} y_\alpha B_{\alpha}, \quad B_{\alpha} \in \mathbb{S}^{s(n,\kappa)}, \forall \alpha \in \mathcal{F}_{n,2\kappa}, \] where the \(B_\alpha\) are constant (sparse) matrices. For example, for the moment matrix (5.2) in Example 5.1, the \(B_\alpha\) corresponding to the exponent \((1,1)\) is \[ B_{11} = \begin{bmatrix} 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}. \] One interesting property is \[ \langle B_\alpha, B_\beta \rangle = 0, \quad \forall \alpha \neq \beta, \] simply because \(y_\alpha\) and \(y_\beta\) do not appear in the same locations of the moment matrix.
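These coefficient matrices are easy to generate programmatically. Below is a small Python sketch (the helper names are mine, not from any library) that builds all \(B_\alpha\) for given \(n\) and \(\kappa\), ordering the basis as in the display above, and verifies the orthogonality property.

```python
import itertools
import numpy as np

def monomials(n, d):
    """Exponents of the monomial basis [x]_d, graded, x1-major within each degree."""
    return [a for t in range(d + 1)
            for a in sorted(itertools.product(range(t + 1), repeat=n), reverse=True)
            if sum(a) == t]

def moment_coeff_matrices(n, kappa):
    """The matrices B_alpha such that M_kappa[y] = sum_alpha y_alpha * B_alpha."""
    basis = monomials(n, kappa)           # indexes rows/columns of M_kappa[y]
    s = len(basis)
    B = {}
    for i, a in enumerate(basis):
        for j, b in enumerate(basis):
            key = tuple(u + v for u, v in zip(a, b))
            B.setdefault(key, np.zeros((s, s)))[i, j] = 1.0
    return B                              # keys range over F_{n,2kappa}

B = moment_coeff_matrices(2, 2)
print(B[(1, 1)])                          # reproduces B_11 displayed above
# Distinct exponents occupy disjoint entries, hence <B_alpha, B_beta> = 0
assert all(np.sum(B[a] * B[b]) == 0 for a in B for b in B if a != b)
```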

Similarly, we can write each localizing matrix \(L_{g_i}^\kappa[y]\) as \[ L_{g_i}^{\kappa}[y] = \sum_{\alpha \in \mathcal{F}_{n,2\kappa}} y_{\alpha} B^{g_i}_\alpha, \quad B_{\alpha}^{g_i} \in \mathbb{S}^{s(n,s_i)}, s_i = \lfloor (2\kappa - \deg(g_i))/2 \rfloor,i=1,\dots,l_g, \] where \(B_{\alpha}^{g_i}\) are constant matrices. For example, for the localizing matrix in (5.6), \(B_\alpha^g\) of the exponent \(\alpha = (2,2)\) is \[ B_{22}^{g} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{bmatrix}. \]
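The same pattern produces the \(B^{g_i}_\alpha\). Here is a sketch (again my own helper, reusing `monomials` from above) that represents a polynomial as a dict from exponent tuples to coefficients; for instance, \(g(x) = 1 - x_1^2 - x_2^2\) at \(\kappa = 2\) yields \(B^{g}_{(2,2)} = \mathrm{diag}(0,-1,-1)\), matching the display above.

```python
def localizing_coeff_matrices(g, n, kappa):
    """Matrices B^g_alpha such that L_g^kappa[y] = sum_alpha y_alpha * B^g_alpha."""
    s_i = (2 * kappa - max(sum(a) for a in g)) // 2   # degree of the basis [x]_{s_i}
    basis = monomials(n, s_i)
    s = len(basis)
    Bg = {}
    # entry (a, b) of L_g^kappa[y] is sum_gamma g_gamma * y_{a+b+gamma}
    for i, a in enumerate(basis):
        for j, b in enumerate(basis):
            for gamma, coeff in g.items():
                key = tuple(u + v + w for u, v, w in zip(a, b, gamma))
                Bg.setdefault(key, np.zeros((s, s)))[i, j] += coeff
    return Bg

# g(x) = 1 - x1^2 - x2^2 at kappa = 2: 3x3 coefficient matrices
Bg = localizing_coeff_matrices({(0, 0): 1.0, (2, 0): -1.0, (0, 2): -1.0}, 2, 2)
print(Bg[(2, 2)])   # diag(0, -1, -1)
```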

Lastly, the localizing vector is simply \[ L_{h_i}^{2\kappa}[y] = (H^{2\kappa}_i)^\top y, \quad H^{2\kappa}_i \in \mathbb{R}^{s(n,2\kappa) \times s(n,t_i)}, t_i = 2\kappa - \deg(h_i), i=1,\dots,l_h. \]

With these, we can write problem (5.26) as \[\begin{equation} \begin{split} \max_{y \in \mathbb{R}^{s(n,2\kappa)}} & \quad \langle - \mathrm{vec}(p), y \rangle \\ \mathrm{s.t.}& \quad \sum_{\alpha \in \mathcal{F}_{n,2\kappa}} y_\alpha B_\alpha \succeq 0 \\ & \quad \sum_{\alpha \in \mathcal{F}_{n,2\kappa}} y_\alpha B_\alpha^{g_i} \succeq 0, \quad i=1,\dots,l_g \\ & \quad \underbrace{\begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}}_{f} - \underbrace{\begin{bmatrix} e_1^\top\\ (H^{2\kappa}_1)^\top\\ \vdots \\ (H^{2\kappa}_{l_h})^\top \end{bmatrix}}_{F} y = 0 \end{split} \tag{5.27} \end{equation}\] where \(e_1\) denotes the first standard basis vector (first entry \(1\), all other entries \(0\)).

Now consider the vector space \[ \mathcal{V}= \mathbb{S}^{s(n,\kappa)} \times \mathbb{S}^{s(n,s_1)} \times \cdots \times \mathbb{S}^{s(n,s_{l_g})} \times \mathbb{R}^{t}, \quad t:= 1 + s(n,t_1) + \dots + s(n,t_{l_h}), \] and the cone within \(\mathcal{V}\) \[ \mathcal{K}= \mathbb{S}^{s(n,\kappa)}_{+} \times \mathbb{S}^{s(n,s_1)}_{+} \times \cdots \times \mathbb{S}^{s(n,s_{l_g})}_{+} \times 0^{t}. \]

Consider the function \(\phi: \mathbb{R}^{s(n,2\kappa)} \rightarrow \mathcal{V}\) defined as \[\begin{equation} \phi(y) = \left( -\sum_{\alpha} y_\alpha B_\alpha, -\sum_{\alpha} y_\alpha B_\alpha^{g_1}, \cdots, -\sum_{\alpha} y_\alpha B_{\alpha}^{g_{l_g}}, Fy \right) \in \mathcal{V} \tag{5.28} \end{equation}\] and the vector \[ C = \left(0,0,\cdots,0,f \right) \in \mathcal{V}. \]

Observe that problem (5.27) is now in standard dual conic form \[\begin{equation} \begin{split} \max_{y \in \mathbb{R}^{s(n,2\kappa)}} & \quad \langle -\mathrm{vec}(p), y \rangle \\ \mathrm{s.t.}& \quad C - \phi(y) \in \mathcal{K}. \end{split} \tag{5.29} \end{equation}\]

We can also write the primal conic form of (5.29) using standard conic duality. The linear map \(\phi(y)\) in (5.28) can be written as \[ \phi(y) = \sum_{\alpha} y_{\alpha} A_{\alpha}, \] with \[ A_{\alpha} = \left( -B_{\alpha}, - B_{\alpha}^{g_1}, \cdots, - B_{\alpha}^{g_{l_g}}, f_{\alpha} \right) \in \mathcal{V}, \quad \alpha \in \mathcal{F}_{n,2\kappa}, \] where \(f_{\alpha}\) is the column of \(F\) indexed by the exponent \(\alpha\). With the dual cone of \(\mathcal{K}\) being \[ \mathcal{K}^* = \mathbb{S}^{s(n,\kappa)}_{+} \times \mathbb{S}^{s(n,s_1)}_{+} \times \cdots \times \mathbb{S}^{s(n,s_{l_g})}_{+} \times \mathbb{R}^{t}, \] the primal conic form of (5.29) reads \[\begin{equation} \begin{split} \min_{X \in \mathcal{V}} & \quad \langle C, X \rangle \\ \mathrm{s.t.}& \quad \langle A_\alpha, X \rangle = - p_{\alpha}, \quad \alpha \in \mathcal{F}_{n,2\kappa} \\ & \quad X \in \mathcal{K}^* \end{split} \tag{5.30} \end{equation}\] It is not too difficult to observe that the primal conic form (5.30) is just the conic representation of the dual SOS program (5.13), because the linear constraints in (5.30) are exactly those that match the coefficients of the two polynomials “\(p(x)\)” and “\(\gamma + \mathrm{Ideal}[h]_{2\kappa} + \mathrm{Qmodule}[g]_{2\kappa}\)”.

5.7.2 The Second Approach

The first conversion approach produces a conic program whose primal variable \(X \in \mathcal{K}^*\) contains not only PSD blocks but also free variables. In the second approach, we generate a conic program with only PSD variables.

Rewriting the moment matrix. To do so, I will first create a variable \(X_{\mathrm{mom}} \in \mathbb{S}^{s(n,\kappa)}\) and enforce it to be a valid moment matrix with linear constraints. This is equivalent to saying \[\begin{equation} X_{\mathrm{mom}} = \sum_{\alpha} y_{\alpha} B_{\alpha}, \tag{5.31} \end{equation}\] for some \(y \in \mathbb{R}^{s(n,2\kappa)}\), or in other words \[ X_{\mathrm{mom}} \in B := \mathrm{span}\{ B_{\alpha} \}_{\alpha \in \mathcal{F}_{n,2\kappa}} \subset \mathbb{S}^{s(n,\kappa)} \] i.e., \(X_{\mathrm{mom}}\) lives in the subspace \(B\) of dimension \(s(n,2\kappa)\). The dimension of \(\mathbb{S}^{s(n,\kappa)}\) is \[ s(n,\kappa)^{\Delta} := \frac{s(n,\kappa) (s(n,\kappa) +1)}{2}. \] Therefore, denote the orthogonal complement of \(B\) as \(B^{\perp}\) with dimension \[ d(n,\kappa) := \dim (B^{\perp}) = s(n,\kappa)^{\Delta} - s(n,2\kappa), \] and a basis of \(B^{\perp}\) (which we may take to be orthonormal) as \[ \{ B^{\perp}_{1}, \dots, B^{\perp}_{d(n,\kappa)} \}. \] Then \(X_{\mathrm{mom}}\) admits the moment matrix representation (5.31) if and only if it satisfies the linear constraints \[\begin{equation} \langle B^{\perp}_{i}, X_{\mathrm{mom}} \rangle = 0, i=1,\dots, d(n,\kappa). \tag{5.32} \end{equation}\] In addition, we have \[\begin{equation} \langle B_{\alpha}, X_{\mathrm{mom}} \rangle = \left\langle B_{\alpha}, \sum_{\beta}y_{\beta} B_{\beta} \right\rangle = c_{\alpha} y_{\alpha}, \quad \alpha \in \mathcal{F}_{n,2\kappa}, \tag{5.33} \end{equation}\] where \(c_{\alpha} := \Vert B_{\alpha} \Vert_\mathrm{F}^2\) is a constant. Therefore, the constraint \(y_0 = 1\) can be easily written as \[\begin{equation} \langle B_0, X_{\mathrm{mom}} \rangle = 1 \tag{5.34} \end{equation}\] because \(\Vert B_0 \Vert_\mathrm{F}^2 = 1\).
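Computing a basis of \(B^{\perp}\) is a small linear-algebra exercise: vectorize the \(B_\alpha\) over the symmetric matrices and take the orthogonal complement of their span, e.g., via an SVD. A sketch, reusing `moment_coeff_matrices` from above (the \(\sqrt{2}\) scaling in `svec` preserves Frobenius inner products):

```python
import numpy as np

def svec(M):
    """Vectorize a symmetric matrix so that <A, B>_F = svec(A) . svec(B)."""
    iu = np.triu_indices(M.shape[0])
    return np.where(iu[0] == iu[1], 1.0, np.sqrt(2.0)) * M[iu]

def complement_basis(n, kappa):
    B = moment_coeff_matrices(n, kappa)
    A = np.stack([svec(M) for M in B.values()], axis=1)  # s^Delta x s(n,2kappa)
    U, _, _ = np.linalg.svd(A)      # the B_alpha are linearly independent
    return U[:, A.shape[1]:]        # columns: svec's of a basis of B^perp

Bperp = complement_basis(2, 2)
print(Bperp.shape)                  # (21, 6): d(2,2) = 21 - 15 = 6
```

Each column is the `svec` of one \(B^{\perp}_i\), so constraint (5.32) reads `Bperp[:, i] @ svec(X_mom) == 0`.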

Adding localizing vector constraints. Next we look at the constraints that the localizing vectors must be zero: \[ \bar{F}y = \begin{bmatrix} (H_1^{2\kappa})^\top\\ \vdots \\ (H_{l_h}^{2\kappa})^\top \end{bmatrix} y = 0, \quad \bar{F} \in \mathbb{R}^{\bar{t} \times s(n,2\kappa)}, \bar{t}:=t-1. \] Using (5.33), i.e., \(y_{\alpha} = \langle B_{\alpha}, X_{\mathrm{mom}} \rangle / c_{\alpha}\), this is equivalent to another set of linear constraints on the moment matrix \[\begin{equation} \bar{F} \begin{bmatrix} \vdots \\ \langle B_{\alpha}, X_{\mathrm{mom}} \rangle / c_{\alpha}\\ \vdots \end{bmatrix} = 0 \quad \Leftrightarrow \quad \langle H_i, X_{\mathrm{mom}} \rangle = 0, i=1,\dots,\bar{t} \tag{5.35} \end{equation}\] with (if you want explicit expressions of the \(H_i\)’s) \[ H_i := \sum_{\alpha \in \mathcal{F}_{n,2\kappa}} \frac{\bar{F}_{i,\alpha}}{c_{\alpha}} B_{\alpha}, \quad i=1,\dots,\bar{t} \] where \(\bar{F}_{i,\alpha}\) is the \((i,\alpha)\) entry of \(\bar{F}\).
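In code, continuing the sketch above (with the dict `B` of coefficient matrices, an exponent list `exps` ordering the columns of \(\bar F\), and \(\bar F\) given as a dense array), the \(H_i\) could be formed as:

```python
def h_matrices(Fbar, exps, B):
    """H_i = sum_alpha (Fbar[i, alpha] / c_alpha) * B_alpha."""
    c = {a: np.sum(B[a] ** 2) for a in exps}   # c_alpha = ||B_alpha||_F^2
    return [sum((Fbar[i, k] / c[a]) * B[a] for k, a in enumerate(exps))
            for i in range(Fbar.shape[0])]
```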

Adding localizing matrices. Finally, we create additional variables for the localizing matrices \[ X_{\mathrm{loc},1},\dots,X_{\mathrm{loc},l_g}, \] and enforce that their entries are linear combinations of the moments. Recall that a localizing matrix is \[ L_{g_i}^{\kappa}[y] = \ell_y \left( g_i(x)\cdot [x]_{s_i}[x]_{s_i}^\top\right). \] Writing \(g_i(x) = \sum_\gamma g_{i,\gamma} x^{\gamma}\), this means the \((\alpha,\beta)\) entry of \(L_{g_i}^{\kappa}[y]\) is \[ \sum_{\gamma} g_{i,\gamma} y_{\alpha+\beta+\gamma}, \quad \alpha,\beta \in \mathcal{F}_{n,s_i}. \] Therefore, for each localizing matrix, we have the constraint \[\begin{equation} \langle B_{i,\alpha,\beta}, X_{\mathrm{loc},i} \rangle = X_{\mathrm{loc},i,\alpha,\beta} = \langle G_{i,\alpha,\beta}, X_{\mathrm{mom}} \rangle, \quad i=1,\dots,l_g, \alpha,\beta \in \mathcal{F}_{n,s_i}, \alpha \succeq \beta, \tag{5.36} \end{equation}\] where \(B_{i,\alpha,\beta}\) denotes the symmetric matrix that selects the \((\alpha,\beta)\) entry, i.e., \(\langle B_{i,\alpha,\beta}, X \rangle = X_{\alpha\beta}\) (entries \(1/2\) at positions \((\alpha,\beta)\) and \((\beta,\alpha)\) off the diagonal, and \(1\) when \(\alpha = \beta\)), and where (if you want explicit expressions of the \(G_{i,\alpha,\beta}\)’s) \[ G_{i,\alpha,\beta} = \sum_{\gamma} \frac{1}{c_{\alpha+\beta+\gamma}} g_{i,\gamma} B_{\alpha+\beta+\gamma}. \]
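A matching sketch for the \(G_{i,\alpha,\beta}\), with polynomials in the same dict representation as before:

```python
def G_matrix(g, alpha, beta, B):
    """G_{alpha,beta} = sum_gamma (g_gamma / c_{alpha+beta+gamma}) * B_{alpha+beta+gamma}."""
    G = 0
    for gamma, coeff in g.items():
        key = tuple(u + v + w for u, v, w in zip(alpha, beta, gamma))
        G = G + (coeff / np.sum(B[key] ** 2)) * B[key]
    return G
```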

The cost function. It remains to rewrite the cost: \[ \sum_{\alpha} p_{\alpha} y_{\alpha} = \sum_{\alpha} \frac{p_{\alpha}}{c_{\alpha}} \langle B_{\alpha}, X_{\mathrm{mom}} \rangle = \left\langle \sum_{\alpha} \frac{p_{\alpha}}{c_{\alpha}} B_{\alpha} , X_{\mathrm{mom}} \right\rangle =: \langle C_{\mathrm{mom}}, X_{\mathrm{mom}} \rangle. \]

Putting everything together. The moment relaxation (5.26) can be written as a standard primal SDP \[\begin{equation} \begin{split} \min_{\substack{X_{\mathrm{mom}} \\ X_{\mathrm{loc},1},\dots,X_{\mathrm{loc},l_g}} } & \quad \langle C_{\mathrm{mom}}, X_{\mathrm{mom}} \rangle \\ \mathrm{s.t.}& \quad \langle B_0, X_{\mathrm{mom}} \rangle = 1 \\ & \quad \langle B_i^{\perp}, X_{\mathrm{mom}} \rangle = 0, \quad i=1,\dots,d(n,\kappa) \\ & \quad \langle H_i, X_{\mathrm{mom}} \rangle = 0, \quad i=1,\dots,\bar{t} \\ & \quad \langle B_{i,\alpha,\beta}, X_{\mathrm{loc},i} \rangle - \langle G_{i,\alpha,\beta}, X_{\mathrm{mom}} \rangle = 0, \quad i=1,\dots,l_g, \alpha\succeq\beta \in \mathcal{F}_{n,s_i} \\ & \quad X_{\mathrm{mom}},X_{\mathrm{loc},1},\dots,X_{\mathrm{loc},l_g} \succeq 0 \end{split} \tag{5.37} \end{equation}\]
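To close the loop, here is a minimal CVXPY sketch of (5.37) for the toy unconstrained problem \(\min_{x \in \mathbb{R}} x^4 - 3x^2 + 2\) (\(n=1\), \(\kappa=2\); with no \(g\)'s or \(h\)'s, only the moment-matrix constraints remain). Rather than materializing \(B^{\perp}\), it imposes \(X_{\mathrm{mom}} \in B\) equivalently by pinning entries that share an exponent; CVXPY with its bundled SCS solver is assumed, and `moment_coeff_matrices` is reused from the earlier sketch.

```python
import cvxpy as cp
import numpy as np

n, kappa = 1, 2
B = moment_coeff_matrices(n, kappa)               # keys (0,), (1,), ..., (4,)
c = {a: np.sum(M * M) for a, M in B.items()}      # c_alpha = ||B_alpha||_F^2
p = {(4,): 1.0, (2,): -3.0, (0,): 2.0}            # p(x) = x^4 - 3x^2 + 2

C_mom = sum((coeff / c[a]) * B[a] for a, coeff in p.items())
X = cp.Variable((kappa + 1, kappa + 1), PSD=True) # s(1, kappa) = kappa + 1

constraints = [cp.trace(B[(0,)] @ X) == 1]        # y_0 = 1, i.e., (5.34)
# X in span{B_alpha}, i.e., (5.32): entries sharing an exponent must agree
# (Hankel structure for n = 1)
for M in B.values():
    (i0, j0), *rest = map(tuple, np.argwhere(M == 1.0))
    constraints += [X[i, j] == X[i0, j0] for i, j in rest]

prob = cp.Problem(cp.Minimize(cp.trace(C_mom @ X)), constraints)
prob.solve()
print(round(prob.value, 4))   # approx -0.25
```

The value \(-0.25\) is the global minimum of \(p\); the relaxation is tight here because univariate nonnegative polynomials are sums of squares.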

References

Bayer, Christian, and Josef Teichmann. 2006. “The Proof of Tchakaloff’s Theorem.” Proceedings of the American Mathematical Society 134 (10): 3035–40.
Corless, Robert M., Patrizia M. Gianni, and Barry M. Trager. 1997b. “A Reordered Schur Factorization Method for Zero-Dimensional Polynomial Systems with Multiple Roots.” In Proceedings of the 1997 International Symposium on Symbolic and Algebraic Computation, 133–40.
Curto, Raúl E., and Lawrence A. Fialkow. 2005. “Truncated k-Moment Problems in Several Variables.” Journal of Operator Theory, 189–226.
Henrion, Didier, and Jean-Bernard Lasserre. 2005. “Detecting Global Optimality and Extracting Solutions in GloptiPoly.” In Positive Polynomials in Control, 293–310. Springer.
Lasserre, Jean B. 2001. “Global Optimization with Polynomials and the Problem of Moments.” SIAM Journal on Optimization 11 (3): 796–817.