fanglu at sz.tsinghua.edu.cn


I am currently an Associate Professor at Tsinghua University, Tsinghua-Berkeley Shenzhen Institute (TBSI). I received my Ph.D. from HKUST in 2011 and my B.E. from USTC in 2007. I spent 6 months in Prof. Aggelos K. Katsaggelos's lab at Northwestern University during the best season of Evanston in 2010, and 8 months at Microsoft Research Asia (Beijing) in 2013, hosted by Prof. Feng Wu. Later, in 2014, I visited Prof. Eckehard Steinbach's group at the Technical University of Munich (TUM), supported by a “Humboldt Research Fellowship for Experienced Researchers”.

My research interests are Computational Photography and 3D Vision, including Gigapixel Videography, Multi-Camera Array System, Multi-View Perception and 3D Reconstruction.

RESEARCH PROJECTS (Complete Publication List)

FlashFusion: Real-time Globally Consistent Dense 3D Reconstruction using CPU Computing

Robotics Science and Systems, 2018 Project Page

Aiming at practical dense 3D reconstruction on portable devices, we propose FlashFusion, a Fast LArge-Scale High-resolution (sub-centimeter level) 3D reconstruction system that does not use GPU computing. It enables globally consistent localization through a robust yet fast global bundle adjustment scheme, and realizes spatial-hashing-based volumetric fusion running at 300Hz and rendering at 25Hz. Extensive experiments on both real-world and synthetic datasets demonstrate that FlashFusion achieves real-time, globally consistent, high-resolution (5mm), large-scale dense 3D reconstruction under highly constrained computation, i.e., CPU computing on a portable device.
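As a rough illustration of what spatial-hashing-based volumetric fusion means in general, the sketch below stores truncated signed distance (TSDF) values only for voxels that have been observed, keyed by integer voxel coordinates in a hash map, and merges new observations with a running weighted average. This is a generic textbook-style sketch, not FlashFusion's implementation: the class name `SparseTSDF`, the truncation distance, and the per-sample interface are all assumptions made for illustration; only the 5mm voxel size is taken from the description above.

```python
import math


class SparseTSDF:
    """Sparse TSDF volume backed by a hash map (hypothetical, for illustration)."""

    def __init__(self, voxel_size=0.005, truncation=0.02):
        self.voxel_size = voxel_size    # 5 mm voxels, as quoted in the text
        self.truncation = truncation    # assumed truncation distance in meters
        self.voxels = {}                # (i, j, k) -> (tsdf, weight)

    def key(self, point):
        # Map a 3D point in meters to the integer coordinates of its voxel.
        return tuple(math.floor(c / self.voxel_size) for c in point)

    def integrate(self, point, signed_distance, weight=1.0):
        # Normalize and truncate the signed distance to [-1, 1].
        d = max(-1.0, min(1.0, signed_distance / self.truncation))
        k = self.key(point)
        # Unobserved voxels are allocated lazily on first touch.
        tsdf, w = self.voxels.get(k, (0.0, 0.0))
        new_w = w + weight
        # Running weighted average of all observations of this voxel.
        self.voxels[k] = ((tsdf * w + d * weight) / new_w, new_w)


volume = SparseTSDF()
volume.integrate((0.012, 0.0, -0.001), 0.01)
volume.integrate((0.012, 0.0, -0.001), 0.02)
```

Because memory is allocated only for voxels near observed surfaces, a hash map of this kind scales with the surface area of the scene rather than with its bounding volume.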


CrossNet: An End-to-end Reference-based Super Resolution Network using Cross-scale Warping

European Conference on Computer Vision (ECCV) 2018 Project Page

Learning Cross-scale Correspondence and Patch-based Synthesis for Reference-based Super-Resolution

British Machine Vision Conference (BMVC) 2017


We present CrossNet, an end-to-end CNN model containing an encoder, cross-scale warping, and a decoder. The encoder extracts multi-scale features from both the low-resolution (LR) and reference images. The cross-scale warping then aligns the LR and reference images in the feature domain. Finally, the decoder aggregates the features to synthesize the high-resolution (HR) output.


MILD: Multi-Index hashing for appearance based Loop closure Detection

IEEE Int. Conf. on Multimedia & Expo. (ICME) 2017 Best Student Paper Award

Beyond SIFT using Binary Features in Loop Closure Detection

IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS) 2017


A binary feature based Loop Closure Detection method is proposed, which achieves higher precision-recall performance than state-of-the-art SIFT feature based approaches. The proposed scheme employs Multi-Index Hashing for Approximate Nearest Neighbor search of binary features, and it runs at 30Hz for databases containing thousands of images.
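The idea behind multi-index hashing can be sketched as follows: each binary descriptor is split into several disjoint substrings, and each substring is indexed in its own hash table. A query collects every stored descriptor that matches it exactly in at least one substring, then verifies the candidates with the full Hamming distance. By the pigeonhole principle, exact substring lookup finds every neighbor whose Hamming distance is smaller than the number of substrings; farther neighbors may be missed, which is what makes the search approximate. This is a minimal generic sketch, not the MILD implementation; the class name `MultiIndexHash`, the 16-bit codes, and the four-way split are assumptions chosen to keep the example small.

```python
from collections import defaultdict


def split_code(code, n_bits, n_chunks):
    """Split an n_bits-wide integer code into n_chunks equal-width substrings."""
    chunk_bits = n_bits // n_chunks
    mask = (1 << chunk_bits) - 1
    return [(code >> (i * chunk_bits)) & mask for i in range(n_chunks)]


def hamming(a, b):
    """Number of differing bits between two integer codes."""
    return bin(a ^ b).count("1")


class MultiIndexHash:
    """One hash table per substring position (hypothetical, for illustration)."""

    def __init__(self, n_bits=16, n_chunks=4):
        assert n_bits % n_chunks == 0
        self.n_bits = n_bits
        self.n_chunks = n_chunks
        self.tables = [defaultdict(list) for _ in range(n_chunks)]
        self.codes = []

    def add(self, code):
        index = len(self.codes)
        self.codes.append(code)
        chunks = split_code(code, self.n_bits, self.n_chunks)
        for table, chunk in zip(self.tables, chunks):
            table[chunk].append(index)
        return index

    def query(self, code, max_dist):
        # Candidates share at least one identical substring with the query.
        candidates = set()
        chunks = split_code(code, self.n_bits, self.n_chunks)
        for table, chunk in zip(self.tables, chunks):
            candidates.update(table.get(chunk, ()))
        # Verify candidates with the full Hamming distance.
        hits = [(hamming(self.codes[i], code), i) for i in candidates]
        return sorted(h for h in hits if h[0] <= max_dist)


index = MultiIndexHash()
index.add(0xF0A5)  # stored as item 0
index.add(0xF0A4)  # item 1, one bit away from item 0
index.add(0x0F5A)  # item 2, bitwise complement of item 0
matches = index.query(0xF0A5, max_dist=3)
```

Here `matches` holds `(distance, item)` pairs for items 0 and 1; item 2 shares no substring with the query and is never examined, which is where the speedup over a linear scan comes from.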


SurfaceNet: An End-to-end 3D Neural Network for Multiview Stereopsis

Int. Conf. on Computer Vision (ICCV) 2017 Project Page

We propose SurfaceNet, an end-to-end learning framework for multiview stereopsis. It takes a set of images and their corresponding camera parameters as input and directly infers the 3D model. The key advantage of the framework is that both photo-consistency and the geometric relations of the surface structure can be learned directly, in an end-to-end fashion, for the purpose of multiview stereopsis. SurfaceNet is a fully 3D convolutional network, which is achieved by encoding the camera parameters together with the images in a 3D voxel representation.


Multiscale Gigapixel Video: A Cross Resolution Image Matching and Warping Approach

Int. Conf. on Computational Photography (ICCP) 2017 Project Page

We present a multi-scale camera array to capture and synthesize gigapixel videos in an efficient way. Our acquisition setup contains a reference camera with a short-focus lens to get a large field-of-view video and a number of unstructured long-focus cameras to capture local-view details.


FlyCap: Markerless Motion Capture Using Multiple Autonomous Flying Cameras

IEEE Trans. on Visualization and Computer Graphics, 2017 Project Page

We present a new end-to-end markerless motion capture system that uses multiple autonomous flying cameras to capture moving targets in a wide space, without extra constraints such as a fixed capture volume.


Monocular Long-term Target Following on UAVs

CVPR workshop 2016

We investigate the challenging long-term visual tracking problem and its implementation on Unmanned Aerial Vehicles (UAVs).


Learning High-level Prior with Convolutional Neural Networks for Semantic Segmentation

under review, IEEE Trans. on Image Processing

This paper proposes a convolutional neural network that can fuse a high-level prior for semantic image segmentation.


Deep Learning for Surface Material Classification Using Haptic and Visual Information

IEEE Trans. on Multimedia, 2016

This paper deals with the surface material classification problem based on a Fully Convolutional Network (FCN), taking an acceleration signal and a corresponding image of the surface texture as inputs.


Computation and Memory Efficient Image Segmentation

IEEE Trans. on CSVT, 2016

This paper addresses the segmentation problem under limited computation and memory resources. Given a segmentation algorithm, we propose a framework that can reduce its computation time and memory requirement simultaneously, while preserving its accuracy.


Magic Glasses: From 2D to 3D

IEEE Trans. on CSVT, 2016

This paper proposes a virtual 3D eyeglasses try-on system driven by a 2D Internet image of a human face wearing a pair of eyeglasses.


Estimation of Virtual View Synthesis Distortion in 3D Video

IEEE Trans. on Image Processing, 2016 & IEEE TIP 2014.

This paper proposes an analytical model to estimate the synthesized view quality in 3D video. The model relates errors in the depth images to the synthesis quality, taking into account texture image characteristics, texture image quality, the virtual view position, and the rendering process.


Stereo Matching with Optimal Local Adaptive Radiometric Compensation

IEEE Signal Processing Letters, 2014

We propose a radiometrically invariant stereo matching method by approximating the spatially varying Pixel Value Correspondence Function (PVCF) between a corresponding pixel pair as a locally consistent polynomial within an optimal local adaptive window.


Robust Blur Kernel Estimation for License Plate Images from Fast Moving Vehicles

IEEE Trans. on Image Processing, 2016

This paper proposes a sparse representation scheme to deal with snapshots of speeding vehicles captured by surveillance cameras, which are frequently blurred due to fast motion.


Deblurring Saturated Night Images with Function-form Kernel

IEEE Trans. on Image Processing, 2015

This paper deals with the deblurring of night images that suffer from low contrast, heavy noise, and saturated regions. The key idea is to deduce blur kernels from saturated regions via a novel kernel representation and advanced algorithms.


Separable Kernel for Image Deblurring

IEEE Computer Vision and Pattern Recognition (CVPR), 2014

This paper approaches the image deblurring problem from a new perspective by proposing a separable kernel (trajectory, intensity, and defocus) to represent the inherent properties of the camera and scene system.


Adaptive Multispectral Demosaicking Based on Frequency Domain Analysis of Spectral Correlation

IEEE Trans. on Image Processing, 2017

This paper deals with multispectral demosaicking, where each band is significantly undersampled due to the increase in the number of bands.


Multichannel Non-Local Means Fusion for Color Image Denoising

IEEE Trans. on CSVT, 2013

This paper proposes a multichannel nonlocal means fusion (MNLF) scheme based on the inherent strong interchannel correlation feature of color images.

Subpixel Rendering: From Font Rendering to Image Subsampling

IEEE Signal Processing Magazine, 2012

This paper introduces subpixel arrangement in color displays, how subpixel rendering works, and several practical subpixel rendering applications in font rendering and image subsampling.