I'm a CMU grad student whose interests focus on Computer Vision, Deep Learning and Network Security.
During my undergraduate studies I was fortunate to intern with Jian Sun (Megvii/Face++), Song Han (MIT) and Alan Yuille (JHU). I have a track record of contributing to efficient CNN inference. In particular, I designed a channel pruning method that accelerates very deep CNNs. I further proposed AMC, which uses reinforcement learning to sample the design space of channel pruning and greatly improves model compression quality.
I served as a reviewer for CVPR'19, ICLR'19, NIPS'18 and TIP.
We introduce a novel bounding box regression loss for learning bounding box transformation and localization variance together. The resulting localization variance is used in our new non-maximum suppression method to improve localization accuracy for object detection. On MS-COCO, we boost the AP of VGG-16 Faster R-CNN from 23.6% to 29.1% with a single model and nearly no additional computational overhead. More importantly, our method improves the AP of ResNet-50 FPN Fast R-CNN from 36.8% to 37.8%, a state-of-the-art bounding box refinement result.
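As a rough illustration of learning a box offset and its localization variance together, the sketch below weights a smooth-L1 error by a predicted log-variance. The exact form of the loss, the function name `variance_aware_loss`, and the `log_var` parameterization are assumptions made for this example; they are not taken from the text above.

```python
import numpy as np

def variance_aware_loss(pred, target, log_var):
    """Smooth-L1 error scaled by exp(-log_var), plus a penalty on log_var.

    Assumed form for illustration: a large predicted variance down-weights
    the regression error but is itself penalised by the 0.5 * log_var term.
    """
    err = np.abs(target - pred)
    smooth = np.where(err <= 1.0, 0.5 * err ** 2, err - 0.5)
    return float(np.mean(np.exp(-log_var) * smooth + 0.5 * log_var))

# One badly localized coordinate (error 4) and one exact coordinate.
pred = np.array([0.0, 0.0])
target = np.array([4.0, 0.0])
confident = variance_aware_loss(pred, target, np.array([0.0, 0.0]))
hedged = variance_aware_loss(pred, target, np.array([1.0, 0.0]))
```

Under this assumed form, predicting a larger variance for the badly localized coordinate lowers the loss, while predicting a large variance for an accurate coordinate raises it.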
We propose a collection of three shift-based primitives for building efficient, compact CNNs. These three primitives (channel shift, address shift, shortcut shift) reduce inference time on GPU while maintaining prediction accuracy. They only move a pointer and avoid memory copies, which makes them very fast.
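The idea of shifting by moving a pointer rather than copying memory can be illustrated with NumPy views. This is only an analogy on the CPU, not the GPU implementation; the helper `address_shift` and the one-element padding are hypothetical choices for this sketch.

```python
import numpy as np

def address_shift(buffer, offset, shape):
    """Return a view of `buffer` starting `offset` elements in (no copy)."""
    size = int(np.prod(shape))
    return buffer[offset:offset + size].reshape(shape)

# A flattened C x H x W feature map with one padding element on each side.
feature = np.arange(2 * 3 * 4, dtype=np.float32)
pad = np.zeros(1, dtype=np.float32)
padded = np.concatenate([pad, feature, pad])

original = address_shift(padded, 1, (2, 3, 4))  # the unshifted feature map
shifted = address_shift(padded, 2, (2, 3, 4))   # same data, start moved by one

# Both results alias the same storage, so no element was copied.
no_copy = np.shares_memory(padded, shifted)
```

Because the offset is applied to the flattened buffer, values at row ends come from the start of the next row; a real primitive would have to account for that boundary behavior.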
In this paper, we propose AMC: AutoML for Model Compression, which leverages reinforcement learning to efficiently sample the design space and greatly improves model compression quality.
In this paper, we introduce a new channel pruning method to accelerate very deep convolutional neural networks. Given a trained CNN model, we propose an iterative two-step algorithm to effectively prune each layer, using LASSO-regression-based channel selection followed by least-squares reconstruction. We further generalize this algorithm to multi-layer and multi-branch cases. Our method reduces the accumulated error and enhances compatibility with various architectures. Our pruned VGG-16 achieves state-of-the-art results, with a 5x speed-up and only a 0.3% increase in error. More importantly, our method is able to accelerate modern networks like ResNet and Xception, which suffer only 1.4% and 1.0% accuracy loss respectively under a 2x speed-up.
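The two steps can be sketched for a single layer as follows. This is a minimal illustration under simplifying assumptions: the layer is treated as a matrix product (as for a fully connected or 1x1 layer), the LASSO is solved with a plain proximal-gradient loop, and the penalty is increased until few enough channels survive. The names `lasso_ista` and `prune_layer` and all hyperparameters are hypothetical.

```python
import numpy as np

def lasso_ista(A, y, lam, n_iter=500):
    """Minimise 0.5 * ||A b - y||^2 + lam * ||b||_1 by proximal gradient."""
    b = np.zeros(A.shape[1])
    step = 1.0 / (np.linalg.norm(A, 2) ** 2 + 1e-12)  # 1 / Lipschitz constant
    for _ in range(n_iter):
        z = b - step * (A.T @ (A @ b - y))                        # gradient step
        b = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold
    return b

def prune_layer(X, W, n_keep, grow=1.5, max_rounds=60):
    """X: (N, c) layer inputs, W: (c, n) weights. Returns (kept channels, new weights)."""
    c = X.shape[1]
    Y = X @ W  # responses of the unpruned layer
    # Column i holds channel i's flattened contribution to Y.
    A = np.stack([np.outer(X[:, i], W[i]).ravel() for i in range(c)], axis=1)
    y = Y.ravel()
    lam = 1e-3 * np.abs(A.T @ y).max()
    # Step 1: channel selection. Raise the penalty until at most n_keep survive.
    for _ in range(max_rounds):
        beta = lasso_ista(A, y, lam)
        if np.count_nonzero(beta) <= n_keep:
            break
        lam *= grow
    keep = np.flatnonzero(beta)
    if keep.size == 0:
        raise ValueError("all channels were pruned; lower the penalty")
    # Step 2: least-squares reconstruction of the weights on the kept channels.
    W_new, *_ = np.linalg.lstsq(X[:, keep], Y, rcond=None)
    return keep, W_new

# Toy layer in which channels 4 and 5 contribute nothing to the output.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
W = rng.normal(size=(6, 3))
W[4:] = 0.0
keep, W_new = prune_layer(X, W, n_keep=4)
```

In this toy case the two dead channels are dropped and the reconstructed weights reproduce the original responses. With real layers the selection can return fewer than `n_keep` channels, and some reconstruction error remains.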
Security surveillance is one of the most important issues in smart cities, especially in an era of terrorism. Deploying a number of (video) cameras is a common approach for surveillance information retrieval. Given the never-ending power offered by vehicles to a metropolis, exploiting vehicle traffic to design camera placement strategies could potentially facilitate physical-world security surveillance. We take the first step towards exploring the linkage between vehicle traffic and camera placement in favor of physical-world security surveillance from a network perspective.
Pruning Very Deep Neural Network Channels for Efficient Inference
Yihui He, Xiangyu Zhang, Jian Sun, TPAMI, Major Revision
Channel pruning is further extended to filter-wise pruning, with extensive experiments.
Single Image Super-resolution with a Parameter Economic Residual-like Convolutional Neural Network
Yudong Liang, Ze Yang, Kai Zhang, Yihui He, Jinjun Wang, Nanning Zheng [arXiv]
This paper aims to extend the merits of residual networks, such as the fast training induced by skip connections, to a typical low-level vision problem: single image super-resolution. In general, the two main challenges of existing deep CNNs for super-resolution are the gradient exploding/vanishing problem and the large number of parameters or computational cost as the CNN goes deeper. To address the first, skip connections or identity mapping shortcuts are used to avoid the gradient exploding/vanishing problem. To tackle the second, a parameter-economic CNN architecture with carefully designed width, depth and skip connections is proposed. Different residual-like architectures for image super-resolution are also compared. Experimental results demonstrate that the proposed CNN model not only achieves state-of-the-art PSNR and SSIM results for single image super-resolution but also produces visually pleasing results.
We perform an evaluation and analysis of cornerstone algorithms for the metric TSP. We evaluate greedy, 2-opt, and genetic algorithms. We use several datasets as input for the algorithms, including a small dataset, a medium-sized dataset representing cities in the United States, and a synthetic dataset consisting of 200 cities to test algorithm scalability.
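For reference, two of the heuristics named above can be sketched in a few lines. Here "greedy" is taken to mean nearest-neighbour construction, which is an assumption, and the random 100-city instance is a stand-in for illustration rather than one of the datasets described above.

```python
import math
import random

def tour_length(pts, tour):
    """Total length of the closed tour visiting pts in the given order."""
    n = len(tour)
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % n]]) for i in range(n))

def greedy_tour(pts, start=0):
    """Nearest-neighbour construction: always visit the closest unvisited city."""
    unvisited = set(range(len(pts))) - {start}
    tour = [start]
    while unvisited:
        last = pts[tour[-1]]
        nxt = min(unvisited, key=lambda j: math.dist(last, pts[j]))
        unvisited.remove(nxt)
        tour.append(nxt)
    return tour

def two_opt(pts, tour):
    """Reverse tour segments while doing so shortens the tour."""
    tour = tour[:]
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                if i == 0 and j == n - 1:
                    continue  # these two edges share a city
                a, b = pts[tour[i]], pts[tour[i + 1]]
                c, d = pts[tour[j]], pts[tour[(j + 1) % n]]
                delta = (math.dist(a, c) + math.dist(b, d)
                         - math.dist(a, b) - math.dist(c, d))
                if delta < -1e-12:
                    tour[i + 1:j + 1] = tour[i + 1:j + 1][::-1]
                    improved = True
    return tour

random.seed(0)
cities = [(random.random(), random.random()) for _ in range(100)]
greedy = greedy_tour(cities)
refined = two_opt(cities, greedy)
```

Running 2-opt on top of the greedy tour never lengthens it, since only strictly improving reversals are accepted.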
We consider image classification with estimated depth. This problem falls into the domain of transfer learning, since we use a model trained on a set of depth images to generate depth maps (additional features) for another classification problem on a disjoint set of images. It is challenging because no direct depth information is provided. Though depth estimation has been well studied, none have attempted to aid image classification with estimated depth. Therefore, we present a way of transferring domain knowledge on depth estimation to a separate image classification task over a disjoint set of training and test data. We build an RGBD dataset from an RGB dataset and perform image classification on it. We then evaluate the performance of neural networks on the RGBD dataset compared to the RGB dataset. In our experiments, the benefit is significant for both shallow and deep networks: it improves ResNet-20 by 0.55% and ResNet-56 by 0.53%.
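Building an RGBD dataset from an RGB one can be sketched as appending an estimated depth map as a fourth channel. In the sketch below, `placeholder_depth` is a hypothetical stand-in for a pretrained depth estimator (it just averages the colour channels), and the image sizes are arbitrary.

```python
import numpy as np

def placeholder_depth(rgb):
    """Hypothetical stand-in for a depth estimator: (N, H, W, 3) -> (N, H, W) in [0, 1]."""
    return rgb.mean(axis=-1) / 255.0

def build_rgbd(rgb, depth_fn):
    """Append the estimated depth as a fourth channel to each RGB image."""
    rgb = rgb.astype(np.float32)
    depth = depth_fn(rgb)
    return np.concatenate([rgb / 255.0, depth[..., None]], axis=-1)

rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(8, 32, 32, 3), dtype=np.uint8)
rgbd = build_rgbd(images, placeholder_depth)  # shape (8, 32, 32, 4)
```

A classifier trained on such data only needs its first layer widened from three to four input channels.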
We evaluate whether features extracted from the activation of a deep convolutional network trained in a fully supervised fashion on a large, fixed set of object recognition tasks can be re-purposed to novel generic tasks. We also released the software and pre-trained network to do large-scale image classification.
I designed a small object detection network, simplified from the YOLO (You Only Look Once) network. It is trained on PASCAL VOC and evaluated on an artwork dataset (the Picasso dataset). With the best parameters, it reaches 40% precision and 35% recall.
Shuttlecock Detection and Tracking [Code]
Using a Gaussian mixture model, I extract shuttlecock proposals, then refine them with a particle filter. From multi-view cameras, I employ structure from motion to predict the shuttlecock's 3D location. Combined with physical laws, the landing location is predicted to within about 5 cm. (This system runs on embedded Linux with OpenCV.)