CrossNet: An End-to-end Reference-based Super Resolution Network using Cross-scale Warping

 H. Zheng, M. Ji, H. Wang, Y. Liu, L. Fang

Fig. 1 Framework of CrossNet for RefSR.


Reference-based super-resolution (RefSR) super-resolves a low-resolution (LR) image given an external high-resolution (HR) reference image, where the reference and LR images share a similar viewpoint but differ significantly in resolution (by a factor of 8). Existing RefSR methods work in a cascaded way, such as patch matching followed by a synthesis pipeline, with two independently defined objective functions, leading to inter-patch misalignment, grid effects, and inefficient optimization. In this paper, we present CrossNet, an end-to-end CNN model consisting of an encoder, a cross-scale warping module, and a decoder. More specifically, the encoder extracts multi-scale features from both the LR and the reference images. The cross-scale warping then aligns the LR and reference images in the feature domain. Finally, the decoder aggregates the features to synthesize the HR output. The strength of this end-to-end, fully convolutional pipeline lies in its efficient inference. Moreover, the cross-scale warping itself predicts more precise alignment than conventional patch matching when dealing with parallax. Extensive experiments on several large-scale datasets demonstrate the superior performance of CrossNet (a gain of around 2 dB to 4 dB) compared to previous methods. More importantly, CrossNet achieves a speedup of more than 100 times over existing RefSR approaches, making the model applicable to real-time applications.


Our reference-based super-resolution scheme, named CrossNet, is built on a fully convolutional cross-scale alignment module that spatially aligns the reference image information to the LR image domain. Along with the cross-scale alignment module, an encoder-decoder structure directly synthesizes the RefSR output in an end-to-end, fully convolutional fashion. The entire network is shown in Fig. 2.
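To make the idea of spatial alignment concrete, here is a minimal sketch of backward warping with bilinear sampling. It assumes that alignment is expressed as a per-pixel displacement (flow) field, which this page does not specify, and it operates on a single-channel 2D grid in plain Python. The names `bilinear_sample`, `backward_warp`, `ref_feat`, and `shift_right` are hypothetical and chosen for illustration; this is not the CrossNet implementation.

```python
import math


def bilinear_sample(feat, x, y):
    """Sample a 2D grid (list of rows) at a real-valued (x, y) position."""
    h, w = len(feat), len(feat[0])
    # Clamp the position so out-of-range lookups reuse border values.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    # Interpolate along x on the two neighbouring rows, then along y.
    top = (1 - ax) * feat[y0][x0] + ax * feat[y0][x1]
    bottom = (1 - ax) * feat[y1][x0] + ax * feat[y1][x1]
    return (1 - ay) * top + ay * bottom


def backward_warp(ref_feat, flow):
    """Warp ref_feat into the target view.

    flow[y][x] = (dx, dy) is an assumed displacement field: the output at
    (x, y) is ref_feat sampled at (x + dx, y + dy).
    """
    h, w = len(ref_feat), len(ref_feat[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            dx, dy = flow[y][x]
            row.append(bilinear_sample(ref_feat, x + dx, y + dy))
        out.append(row)
    return out


# Toy reference feature map and a uniform one-pixel displacement in x.
ref_feat = [[0.0, 1.0, 2.0, 3.0],
            [4.0, 5.0, 6.0, 7.0],
            [8.0, 9.0, 10.0, 11.0]]
shift_right = [[(1.0, 0.0)] * 4 for _ in range(3)]
warped = backward_warp(ref_feat, shift_right)
```

In this toy case every output pixel takes the value one column to its right in `ref_feat`, with the last column repeated because of border clamping. Because bilinear sampling is differentiable with respect to the displacement, an operation of this kind can sit inside a network that is trained end to end.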

Fig. 2 Network structure of the proposed model.


We train and evaluate CrossNet on the representative Flower dataset and the Light Field Video (LFVideo) dataset (results are shown in Fig. 3 and Table 1).

Fig. 3 PSNR under different parallax settings: the reference images are selected at position (0, 0) of the light field (LF) grid, while the LR images are selected at positions (i, i) of the LF grid, 0 < i <= 8.

Table 1 Quantitative evaluation of state-of-the-art SISR and RefSR algorithms in terms of PSNR/SSIM/IFC, for scale factors of 4 times and 8 times, respectively.
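For readers unfamiliar with the PSNR metric reported in Fig. 3 and Table 1, the following is a minimal sketch of how it is commonly computed. It assumes 8-bit intensities (peak value 255) and images given as flat lists of pixel values; the function name `psnr` and the toy data are hypothetical, and the evaluation protocol actually used for the reported numbers (color space, border cropping) is not stated on this page.

```python
import math


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two flat pixel lists."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


# Toy data: two estimates with a constant error of 20 and 10 intensity levels.
ground_truth = [100.0, 120.0, 140.0, 160.0]
coarse = [v + 20.0 for v in ground_truth]
fine = [v + 10.0 for v in ground_truth]
gain_db = psnr(ground_truth, fine) - psnr(ground_truth, coarse)
```

Here `gain_db` is 20·log10(2), roughly 6.02 dB, since halving the per-pixel error divides the mean squared error by four. By the same formula, a PSNR gain of 2 dB to 4 dB corresponds to a mean squared error reduced by a factor of roughly 1.6 to 2.5.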

Video: Experiments on popular datasets.

Source Code




CrossNet: An End-to-end Reference-based Super Resolution Network using Cross-scale Warping

  title={CrossNet: An End-to-end Reference-based Super Resolution Network using Cross-scale Warping}, 
  author={Haitian Zheng and Mengqi Ji and Haoqian Wang and Yebin Liu and Lu Fang},
  organization={European Conference on Computer Vision (ECCV)},
  year={2018} }


Supplementary Material