An attempt to build a fully differentiable alternative to the (non-negative) matching pursuit algorithm for solving L0-sparsity dictionary learning
machine learning/dictionary learning
Introduction
In sparse dictionary learning, sparse coding and dictionary update are solved in an alternating manner (Aharon et al., 2006).
In the sparse coding stage, the following problem is solved given the dictionary $D$:

$$\min_{x} \; \lVert y - Dx \rVert_2^2 \quad \text{s.t.} \quad \lVert x \rVert_0 \le K, \tag{1}$$

where $y$ is the signal to encode, $x$ is its sparse code, $\lVert x \rVert_0$ counts the nonzero entries of $x$, and $K$ is the target sparsity.
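For concreteness, the greedy baseline that this post tries to replace can be sketched in a few lines. The function below is a minimal orthogonal matching pursuit under names of my choosing, not the Batch-OMP implementation used later in the experiments; restricting atom selection to positive correlations (and the refit to non-negative least squares) would give its non-negative variant.

```python
import numpy as np

def omp(D, y, K):
    """Minimal orthogonal matching pursuit: greedily select up to K atoms
    (columns of D, assumed unit-norm) and refit their coefficients on y."""
    residual = y.copy()
    support = []
    x = np.zeros(D.shape[1])
    for _ in range(K):
        # Pick the atom most correlated with the current residual.
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in support:
            support.append(j)
        # Refit the coefficients on the selected support by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        x[:] = 0.0
        x[support] = coef
        residual = y - D @ x
    return x

rng = np.random.default_rng(0)
D = rng.standard_normal((16, 32))
D /= np.linalg.norm(D, axis=0)         # unit-norm atoms
x_true = np.zeros(32)
x_true[[3, 17]] = [1.5, -2.0]
y = D @ x_true                         # an exactly 2-sparse signal
x_hat = omp(D, y, K=2)
```

The `argmax` step is the discrete, non-differentiable choice that the rest of this post tries to replace with a continuous relaxation.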
Proposed method
Louizos et al. (2018) suggest a novel approach to handle the intractability of the $\ell_0$ term: attach to each coefficient a stochastic gate drawn from a hard concrete distribution, writing $x = z \odot w$ with gates $z \sim q(z \mid \phi)$, so that Problem (1) relaxes to the constrained minimization

$$\min_{w,\,\phi} \; \mathbb{E}_{z \sim q(z \mid \phi)}\!\left[ \lVert y - D(z \odot w) \rVert_2^2 \right] \quad \text{s.t.} \quad \mathbb{E}\!\left[ \lVert z \rVert_0 \right] \le K, \tag{2}$$

where $\mathbb{E}[\lVert z \rVert_0] = \sum_j \Pr(z_j \neq 0 \mid \phi_j)$ has a closed form that is differentiable in the gate parameters $\phi$.
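The hard concrete gate itself can be sketched as follows. The stretch parameters $\beta = 2/3$, $\gamma = -0.1$, $\zeta = 1.1$ follow Louizos et al. (2018); the function names are mine.

```python
import numpy as np

BETA, GAMMA, ZETA = 2.0 / 3.0, -0.1, 1.1   # stretch parameters from the paper

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def sample_hard_concrete(log_alpha, rng):
    """Sample gates z in [0, 1]; exact zeros (and ones) occur with
    positive probability thanks to the stretch-and-clip."""
    u = rng.uniform(1e-6, 1.0 - 1e-6, size=log_alpha.shape)
    s = sigmoid((np.log(u) - np.log(1.0 - u) + log_alpha) / BETA)
    s_bar = s * (ZETA - GAMMA) + GAMMA       # stretch to (gamma, zeta)
    return np.clip(s_bar, 0.0, 1.0)          # hard clip back to [0, 1]

def expected_l0(log_alpha):
    """Closed-form E[||z||_0]: the probability that each stretched gate
    exceeds zero, summed over gates; differentiable in log_alpha."""
    return sigmoid(log_alpha - BETA * np.log(-GAMMA / ZETA)).sum()

rng = np.random.default_rng(0)
log_alpha = np.array([-4.0, 0.0, 4.0])
z = sample_hard_concrete(log_alpha, rng)
```

Because each gate lands on exactly $0$ with positive probability, sampled codes are genuinely sparse, while `expected_l0` remains a smooth function of `log_alpha`.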
To solve the constrained minimization in Equation (2), it's natural to introduce a Lagrange multiplier $\lambda$ and optimize under the bound constraint $\lambda \ge 0$ only:

$$\max_{\lambda \ge 0} \; \min_{w,\,\phi} \; \mathbb{E}_{z \sim q(z \mid \phi)}\!\left[ \lVert y - D(z \odot w) \rVert_2^2 \right] + \lambda \left( \mathbb{E}[\lVert z \rVert_0] - K \right). \tag{3}$$
On the one hand, one may optimize Equation (3) independently for each signal, keeping a separate set of variables $(w, \phi)$ per signal. In practice, potentially a great number of signals are required to be sparse coded given the dictionary:

$$\max_{\lambda \ge 0} \; \min_{\{w_i,\,\phi_i\}} \; \sum_i \left( \mathbb{E}\!\left[ \lVert y_i - D(z_i \odot w_i) \rVert_2^2 \right] + \lambda \left( \mathbb{E}[\lVert z_i \rVert_0] - K \right) \right). \tag{4}$$

It's not uncommon that all the variables to optimize, especially the per-signal gate parameters $\phi_i$, grow linearly with the number of signals. It is therefore attractive to amortize sparse coding with a network $g_\theta$ that predicts the variables from the signal, $(w_i, \phi_i) = g_\theta(y_i)$, turning Equation (4) into an objective over the network parameters:

$$\max_{\lambda \ge 0} \; \min_{\theta} \; \sum_i \left( \mathbb{E}\!\left[ \lVert y_i - D(z_i \odot w_i) \rVert_2^2 \right] + \lambda \left( \mathbb{E}[\lVert z_i \rVert_0] - K \right) \right), \qquad (w_i, \phi_i) = g_\theta(y_i). \tag{5}$$
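A hypothetical amortized encoder for this setting could be as small as the sketch below: a one-hidden-layer network mapping a signal to per-atom magnitudes and gate logits. The architecture, dimensions, and all names are my illustration; the post does not pin down the actual network.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, hidden = 16, 32, 64   # signal dim, number of atoms, hidden width (illustrative)

# One-hidden-layer encoder: y -> (w, log_alpha), one magnitude and one
# gate logit per dictionary atom. Weights here are random, i.e. untrained.
W1 = 0.1 * rng.standard_normal((hidden, m))
W2 = 0.1 * rng.standard_normal((2 * n, hidden))

def encode(y):
    h = np.tanh(W1 @ y)
    out = W2 @ h
    return out[:n], out[n:]   # magnitudes w, gate logits log_alpha

y = rng.standard_normal(m)
w, log_alpha = encode(y)
```

At training time, `log_alpha` would parameterize the hard concrete gates, and gradients of objective (5) would flow back into `W1` and `W2` instead of into per-signal variables.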
With dictionary learning, the dictionary $D$ itself needs to be learned. Using the objective (5), I found it preferable to optimize using the procedure below:
- Given $\lambda$, reinitialize the amortized network, and jointly learn the network and $D$ until a stationary point is reached.
- Given the network and $D$, perform one-step gradient ascent on $\lambda$.
- Iterate.
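The alternating procedure might be sketched as follows. To keep the gradients hand-derivable, this numpy sketch replaces the amortized network with per-signal variables and the stochastic hard concrete gates with deterministic sigmoid gates, so it illustrates the schedule (reinitialize the codes, jointly fit codes and dictionary given $\lambda$, then one dual-ascent step on $\lambda$) rather than the full method.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def fit_joint(D, Y, lam, steps=200, lr=0.005):
    """Given lam: reinitialize per-signal codes, then jointly fit magnitudes w,
    gate logits a, and dictionary D by gradient descent on
    sum_i ||y_i - D(z_i * w_i)||^2 + lam * sum(z), with z = sigmoid(a)."""
    n, N = D.shape[1], Y.shape[1]
    w = np.zeros((n, N))
    a = np.full((n, N), -2.0)           # reinitialization: gates start mostly closed
    for _ in range(steps):
        z = sigmoid(a)
        X = z * w                       # effective codes
        R = Y - D @ X                   # residuals
        G = -2.0 * D.T @ R              # gradient of the loss w.r.t. X
        gw = G * z
        ga = (G * w + lam) * z * (1.0 - z)
        gD = -2.0 * R @ X.T
        w -= lr * gw
        a -= lr * ga
        D -= lr * gD
        D /= np.linalg.norm(D, axis=0)  # keep atoms unit-norm
    return w, a, D

def dict_learn(Y, n, K, outer=5, eta=0.1, seed=0):
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((Y.shape[0], n))
    D /= np.linalg.norm(D, axis=0)      # the dictionary is initialized only once
    lam = 0.0
    for _ in range(outer):
        # Step 1: given lam, reinitialize and jointly fit codes and dictionary.
        w, a, D = fit_joint(D, Y, lam)
        z = sigmoid(a)
        # Step 2: one-step dual ascent on lam toward mean sparsity <= K.
        lam = max(0.0, lam + eta * (z.sum(axis=0).mean() - K))
    return D, z * w, lam

Y = np.random.default_rng(1).standard_normal((16, 20))
D, X, lam = dict_learn(Y, n=32, K=4)
```

The `max(0.0, ...)` projection enforces the bound constraint $\lambda \ge 0$, and the reinitialization of `a` inside `fit_joint` plays the role of reinitializing the amortized network at each outer iteration.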
I found the reinitialization step on the amortized network critically important. Without it, the network tends to predict all-zero codes and eventually learns nothing. The dictionary, however, needs to be initialized only at the very beginning.
Experiments
For dictionary learning without the non-negativity constraint on sparse coding, I compared against (Rubinstein et al., 2008) in image denoising. My proposed fully differentiable solution converges more slowly and denoises worse than K-SVD backed by Batch-OMP.
For dictionary learning with the non-negativity constraint on sparse coding, I compared against (Nguyen et al., 2019) on exploring the atoms of discourse, a task known to admit a non-negative sparse coding formulation (Arora et al., 2018). While faster, my proposed method still performs worse than non-negative OMP, in that the learned dictionary atoms are mostly not atoms of discourse.
This is the main reason why I record my attempt here in a post rather than in a paper. The proposed method may be promising, but it's not well-prepared yet.