I'm a final-year Computer Science undergraduate at Xi'an Jiaotong University. My interests focus on computer vision, deep learning, and network security.
Currently, I'm a research assistant with Song Han at MIT. I've interned at JHU, supervised by Alan Yuille; at Megvii (Face++), supervised by Jian Sun; and at DeepGlint. I also studied at the UCSB graduate school as an exchange student.
I've served as a reviewer for TIP (IEEE Transactions on Image Processing).
I'm currently applying for Ph.D. programs and am also open to a one-year research position. If you're interested, don't hesitate to contact me. Here's my CV.
Link to [Google Scholar]
In this paper, we propose ADC: Automated Deep Compression, which leverages reinforcement learning to efficiently sample the design space and greatly improve model compression quality.
In this paper, we introduce a new channel pruning method to accelerate very deep convolutional neural networks. Given a trained CNN model, we propose an iterative two-step algorithm that effectively prunes each layer through LASSO-regression-based channel selection followed by least-squares reconstruction. We further generalize this algorithm to multi-layer and multi-branch cases. Our method reduces the accumulated error and enhances compatibility with various architectures. Our pruned VGG-16 achieves state-of-the-art results, with a 5x speed-up and only a 0.3% increase in error. More importantly, our method is able to accelerate modern networks such as ResNet and Xception, with only 1.4% and 1.0% accuracy loss under a 2x speed-up, respectively, which is significant.
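As a rough illustration of the two-step idea described above, here is a minimal NumPy sketch for a single linear (1x1 convolution) layer. It is not the paper's implementation. The function names `lasso_channel_scores` and `prune_layer`, the ISTA solver, the fixed penalty `lam`, and the top-`k` selection rule are all assumptions made for this example; in particular, keeping the `k` largest coefficients is a simplification standing in for tuning the penalty until the desired number of channels remains.

```python
import numpy as np

def lasso_channel_scores(X, W, Y, lam=0.01, iters=300):
    """Step 1 (assumed solver: ISTA). Solve
    min_beta 1/(2m) * ||A @ beta - y||^2 + lam * ||beta||_1,
    where column j of A is the flattened contribution of input channel j."""
    c = X.shape[1]
    A = np.stack([np.outer(X[:, j], W[j]).ravel() for j in range(c)], axis=1)
    y = Y.ravel()
    m = y.size
    # Step size 1/L, with L the Lipschitz constant of the smooth term.
    step = m / (np.linalg.norm(A, 2) ** 2)
    beta = np.zeros(c)
    for _ in range(iters):
        z = beta - step * (A.T @ (A @ beta - y)) / m
        # Soft-thresholding is the proximal operator of the L1 penalty.
        beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return beta

def prune_layer(X, W, k, lam=0.01):
    """Keep k input channels, then refit the weights by least squares.
    X: (N, c) sampled inputs, W: (c, n) original weights."""
    Y = X @ W  # responses of the unpruned layer
    beta = lasso_channel_scores(X, W, Y, lam)
    # Simplification: keep the k channels with the largest |beta|.
    keep = np.sort(np.argsort(-np.abs(beta))[:k])
    # Step 2: least-squares reconstruction on the kept channels only.
    W_new, *_ = np.linalg.lstsq(X[:, keep], Y, rcond=None)
    return keep, W_new

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(64, 8))
    W = rng.normal(size=(8, 5))
    keep, W_new = prune_layer(X, W, k=4)
    err = np.linalg.norm(X[:, keep] @ W_new - X @ W) / np.linalg.norm(X @ W)
    print("kept channels:", keep.tolist(), "relative error:", err)
```

For a real convolutional layer, each channel's contribution would be computed from sampled input patches and the corresponding filter slices rather than from a single column and row.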
Pruning Very Deep Neural Network Channels for Efficient Inference
Yihui He, Xiangyu Zhang, Jian Sun, TPAMI, Major Revision
Channel pruning is further extended to filter-wise pruning, with extensive experiments.
Single Image Super-resolution with a Parameter Economic Residual-like Convolutional Neural Network
Yudong Liang, Ze Yang, Kai Zhang, Yihui He, Jinjun Wang, Nanning Zheng, TMM, in submission [arXiv]
This paper aims to extend the merits of residual networks, such as the fast training induced by skip connections, to a typical low-level vision problem, i.e., single image super-resolution. In general, the two main challenges of existing deep CNNs for super-resolution are the gradient exploding/vanishing problem and the large number of parameters or high computational cost as the CNN goes deeper. Correspondingly, skip connections or identity mapping shortcuts are used to avoid the gradient exploding/vanishing problem. To tackle the second problem, we propose a parameter-economic CNN architecture with carefully designed width, depth, and skip connections. Different residual-like architectures for image super-resolution are also compared. Experimental results demonstrate that the proposed CNN model not only achieves state-of-the-art PSNR and SSIM results for single image super-resolution but also produces visually pleasing results.
Vehicle Traffic Driven Camera Placement for Better Metropolis Security
Yihui He*, Xiaobo Ma*, Xiapu Luo, Jianfeng Li, Xiaohong Guan, IEEE Intelligent Systems, Major Revision [arXiv] [Code]
Security surveillance is one of the most important issues in smart cities, especially in an era of terrorism. Deploying a number of (video) cameras is a common approach for surveillance information retrieval. Given the never-ending power offered by vehicles to a metropolis, exploiting vehicle traffic to design camera placement strategies could potentially facilitate physical-world security surveillance. We take the first step towards exploring the linkage between vehicle traffic and camera placement in favor of physical-world security surveillance from a network perspective.
We perform an evaluation and analysis of cornerstone algorithms for the metric TSP. We evaluate greedy, 2-opt, and genetic algorithms. We use several datasets as input for the algorithms, including a small dataset, a medium-sized dataset representing cities in the United States, and a synthetic dataset consisting of 200 cities to test algorithm scalability.
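To make two of the evaluated heuristics concrete, here is a minimal pure-Python sketch of a greedy construction followed by 2-opt improvement on Euclidean points. It is an illustration only, not the code used in the evaluation: "greedy" is interpreted here as nearest-neighbor construction, which is an assumption, and the function names are made up for this example.

```python
import math

def tour_length(points, tour):
    """Total length of the closed tour visiting points in the given order."""
    n = len(tour)
    return sum(math.dist(points[tour[i]], points[tour[(i + 1) % n]])
               for i in range(n))

def nearest_neighbor_tour(points, start=0):
    """Greedy construction: always move to the closest unvisited city."""
    unvisited = set(range(len(points))) - {start}
    tour = [start]
    while unvisited:
        last = points[tour[-1]]
        nxt = min(unvisited, key=lambda j: math.dist(last, points[j]))
        unvisited.remove(nxt)
        tour.append(nxt)
    return tour

def two_opt(points, tour):
    """Repeatedly reverse a segment whenever that shortens the tour."""
    tour = list(tour)
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                if i == 0 and j == n - 1:
                    continue  # the two edges share a city; nothing to swap
                a, b = points[tour[i]], points[tour[i + 1]]
                c, d = points[tour[j]], points[tour[(j + 1) % n]]
                # Replace edges (a,b) and (c,d) with (a,c) and (b,d).
                delta = (math.dist(a, c) + math.dist(b, d)
                         - math.dist(a, b) - math.dist(c, d))
                if delta < -1e-12:
                    tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
                    improved = True
    return tour

if __name__ == "__main__":
    import random
    random.seed(0)
    cities = [(random.random(), random.random()) for _ in range(200)]
    greedy = nearest_neighbor_tour(cities)
    refined = two_opt(cities, greedy)
    print(tour_length(cities, greedy), tour_length(cities, refined))
```

Because 2-opt only accepts strictly improving swaps, the refined tour is never longer than the greedy tour it starts from.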
We consider image classification with estimated depth. This problem falls into the domain of transfer learning, since we use a model trained on a set of depth images to generate depth maps (additional features) for another classification problem on a disjoint set of images. It is challenging because no direct depth information is provided. Though depth estimation has been well studied, none have attempted to aid image classification with estimated depth. Therefore, we present a way of transferring domain knowledge on depth estimation to a separate image classification task over a disjoint set of training and test data. We build an RGBD dataset from an RGB dataset and perform image classification on it. We then evaluate the performance of neural networks on the RGBD dataset compared to the RGB dataset. In our experiments, the benefit is significant for both shallow and deep networks: it improves ResNet-20 by 0.55% and ResNet-56 by 0.53%.
We evaluate whether features extracted from the activation of a deep convolutional network trained in a fully supervised fashion on a large, fixed set of object recognition tasks can be re-purposed to novel generic tasks. We also released the software and pre-trained network to do large-scale image classification.
I designed a small object detection network, simplified from the YOLO (You Only Look Once) network. It is trained on PASCAL VOC, and I evaluated it on an artwork dataset (the Picasso dataset). With the best parameters, I got 40% precision and 35% recall.
Shuttlecock detection and tracking [Code]
With a Gaussian mixture model, I extract shuttlecock proposals. Then I use a particle filter to refine the proposals. From multi-view cameras, I employ structure from motion to predict the shuttlecock's 3D location. Combined with the laws of physics, the landing location prediction accuracy is around 5 cm. (This system runs on embedded Linux with OpenCV.)
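The physics-based landing prediction can be sketched as a simple trajectory integration. The example below is an assumption-laden illustration, not the system's actual model: it assumes gravity plus quadratic air drag, a hypothetical drag constant `drag` (in 1/m), a ground plane at z = 0, and a made-up function name `predict_landing`. The initial 3D position and velocity would come from the multi-view tracking step.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2

def predict_landing(pos, vel, drag=0.0, dt=1e-3, max_time=10.0):
    """Integrate the flight until the object reaches z = 0.

    pos, vel: initial (x, y, z) position in m and velocity in m/s.
    drag: hypothetical quadratic drag constant; acceleration is -drag*|v|*v.
    Returns the predicted (x, y) landing point.
    """
    x, y, z = pos
    vx, vy, vz = vel
    if z <= 0.0:
        raise ValueError("initial height must be above the ground")
    t = 0.0
    while t < max_time:
        speed = math.sqrt(vx * vx + vy * vy + vz * vz)
        ax = -drag * speed * vx
        ay = -drag * speed * vy
        az = -G - drag * speed * vz
        # Second-order position update using the current acceleration.
        nx = x + vx * dt + 0.5 * ax * dt * dt
        ny = y + vy * dt + 0.5 * ay * dt * dt
        nz = z + vz * dt + 0.5 * az * dt * dt
        if nz <= 0.0:
            # Interpolate linearly inside the last step to hit z = 0.
            frac = z / (z - nz)
            return (x + frac * (nx - x), y + frac * (ny - y))
        x, y, z = nx, ny, nz
        vx += ax * dt
        vy += ay * dt
        vz += az * dt
        t += dt
    raise ValueError("no landing within max_time")

if __name__ == "__main__":
    # Hypothetical initial state: 2 m high, moving forward and upward.
    print(predict_landing((0.0, 0.0, 2.0), (3.0, 0.0, 4.0), drag=0.2))
```

With `drag=0.0` this reduces to an ordinary parabola, which makes the routine easy to check against the closed-form range before fitting a drag constant to measured trajectories.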