2020 Vol. 46, No. 9

Construction of network security situation indicator system for video private network
LI Xin, DUAN Yongcheng, HUANG Shuhua, FAN Zhijie
2020, 46(9): 1625-1634. doi: 10.13700/j.bh.1001-5965.2020.0043
Abstract:

Current network security situation assessment for video private networks lacks a complete indicator system, so this paper studies how to construct one. First, the paper analyzes the network security risks faced by video private networks. Then, drawing on the basic evaluation indicators used in existing network security situation assessment techniques, it constructs a network security situation indicator system suited to the characteristics of video private networks. Finally, it gives the calculation method for the indicator values, which lays a foundation for network security situation assessment of video private networks. The work offers a reference for refining and enriching the network security situation indicator system for video private networks.

Cross-modal object tracking algorithm based on pedestrian attribute
ZHOU Qianli, ZHANG Wenjing, ZHAO Luping, TIAN Naiqian, WANG Rong
2020, 46(9): 1635-1642. doi: 10.13700/j.bh.1001-5965.2020.0042
Abstract:

The accuracy and robustness of object tracking algorithms are degraded by intra-class interference when tracking pedestrians. In this paper, we analyze the drawbacks of current tracking algorithms and propose a model that combines visual features with a language prior to improve tracker performance. A language-guided branch is added to supervise the visual tracking branch by generating attention, which alleviates intra-class interference. We also propose a method that improves the accuracy of cross-modal object tracking for Siamese trackers by using location confidence instead of classification confidence. To validate our method, we build a dataset specialized for pedestrian tracking. The experiments show the effectiveness of the model.

Lane semantic analysis based on road feature information
LUO Sheng, ZHAO Li, WANG Muchou
2020, 46(9): 1643-1649. doi: 10.13700/j.bh.1001-5965.2020.0079
Abstract:

Law enforcement from a moving car on expressways requires semantic analysis of the road by a lane detection algorithm, but the accuracy and recall of algorithms based on hand-crafted features are not good enough, and algorithms based on deep learning require too many computing resources. Therefore, this paper proposes a semantic analysis algorithm based on road feature information. The proposed algorithm uses gradient statistics of edge points to filter candidate points in Hough space, and dynamic programming to find the most reasonable combination of lane lines among the remaining candidate points. It can thus accurately find all lane markings on a road with fewer computing resources. Experiments on self-collected data show that the proposed method can find all lanes on both structured and unstructured roads. In a comparative experiment against several traditional lane detection methods and several deep learning networks, the proposed algorithm shows improvements in accuracy, recall, and computing speed.

Vehicle re-identification optimization algorithm based on high-confidence local features
DOU Xinze, SHENG Hao, LYU Kai, LIU Yang, ZHANG Yang, WU Yubin, KE Wei
2020, 46(9): 1650-1659. doi: 10.13700/j.bh.1001-5965.2020.0067
Abstract:

In vehicle re-identification, different vehicle regions have different degrees of recognition confidence. Based on this observation, we propose a vehicle re-identification optimization algorithm that takes advantage of high-confidence local features. First, a vehicle key point detection algorithm is used to obtain the coordinates of multiple key points on the vehicle, and to divide the vehicle into the brand extension region and other prominent local regions. As the brand extension region is the most salient region, we propose to raise the degree of confidence of this local region in the testing phase. We also use a multi-layer convolutional neural network to process the input images, cutting the convolutional features into several parts based on the obtained local regions and acquiring feature tensors that represent global and key regional information. Then, a fully connected layer combines these features and outputs a one-dimensional vector for computing the loss function. In the testing phase, to reduce the distances between vehicles with the same local identification, we propose to use the global features together with the high-confidence local features obtained by the trained brand extension region extraction branch. Experiments on the widely used VehicleID vehicle re-identification dataset show that the proposed algorithm is effective.

White-box cryptographic video data sharing system based on SM4 algorithm
WU Zhen, BAI Jian, LI Dashuang, LI Bin, ZENG Bing, ZHANG Zhengqiang
2020, 46(9): 1660-1669. doi: 10.13700/j.bh.1001-5965.2020.0080
Abstract:

The white-box cryptographic video data sharing system based on the SM4 algorithm guarantees the security of cross-level and cross-domain sharing of surveillance video data. In this paper, we propose a white-box cipher implementation method based on the SM4 cryptographic algorithm and analyze its security, which addresses the problem of how the SM4 algorithm can compute safely in untrusted hardware environments. In addition, we developed a video data security sharing software system based on a background permission control mechanism. It includes shared video data upload/download, sharing audit, white-box data encryption, access control functions, and a shared video decryption player based on the white-box cryptographic algorithm, and it provides security management and control over the entire process of video data sharing. We then set up an experimental environment and tested the system's functional performance. The experimental results show that the system's functional performance meets the design requirements.

High-performance multi-core video stream transmission model based on PF_RING
LI Xin, FAN Zhijie, CAO Zhiwei, HU Zhengliang, CHEN Guoliang
2020, 46(9): 1670-1676. doi: 10.13700/j.bh.1001-5965.2020.0076
Abstract:

Video traffic transmission has become a research focus in high-performance video stream transmission. In this paper, a model based on PF_RING is proposed to address this problem. The model uses PF_RING+TNAPI technology, a memory routing table, multi-core processing, multi-queue multi-threading, and other related technologies to ensure high-performance transmission of video streams. At the same time, to ensure the safe transmission of shared video data among different network domains, a video transmission model with a dual channel for video control signaling and a single channel for video data is proposed. The experimental results show that the proposed method improves effective throughput, CPU utilization, and average bit error rate by at least 10%. Therefore, by fusing PF_RING+TNAPI technology with the video transmission system model, the proposed method ensures the security and efficiency of shared video data transmission.

Automatic recognition for terrorism related image based on transfer learning
CHEN Mengfu
2020, 46(9): 1677-1681. doi: 10.13700/j.bh.1001-5965.2020.0046
Abstract:

Using AI and deep learning to automatically analyze massive numbers of Internet pictures, quickly and accurately identify harmful terrorism-related images, and deal with them in time is an important means of anti-terrorism work. This paper studies how to use deep learning and transfer learning to classify and recognize terrorism-related images. First, we define the main concept features of terrorism-related images and collect relevant positive samples to construct a dataset. Second, we design a suitable deep neural network model and transfer learning method to address the scarcity of positive samples of terrorism-related images. Finally, we use the constructed training dataset to fine-tune the final model. The results show that the proposed method can classify and recognize Internet pictures with terrorism content quickly and accurately, with an average classification accuracy of 96.7%. It thereby effectively reduces the labor intensity of manual monitoring and can support decision-making in anti-terrorism early warning.

Cross-domain person re-identification based on partial semantic feature invariance
ZHANG Xiaowei, LYU Mingqiang, LI Hui
2020, 46(9): 1682-1690. doi: 10.13700/j.bh.1001-5965.2020.0072
Abstract:

Person re-identification is an important investigative method in criminal cases, and cross-domain adaptation is one of its main challenges, which restricts its practical application. In this paper, a cross-domain invariant feature model based on pedestrian partial semantics is learned from the labeled source domain and the unlabeled target domain. First, features of person parts are learned by supervised learning using only pedestrian identity labels, without part labels, and pedestrian parts are aligned by unsupervised learning in the source and target domains. Second, based on the aligned global and local features, feature template pooling is introduced to store the aligned global and local partial features of the target domain, and a cross-domain invariance loss function is designed to impose feature invariance constraints and improve the cross-domain adaptability of person re-identification. Finally, verification experiments on cross-domain person re-identification are conducted on the Market-1501, DukeMTMC-reID, and MSMT17 datasets. The experimental results show that the proposed method achieves significant performance improvements in cross-domain person re-identification.

Monocular image based 3D model retrieval using triplet network
DU Yujia, LI Haisheng, YAO Chunlian, CAI Qiang
2020, 46(9): 1691-1700. doi: 10.13700/j.bh.1001-5965.2020.0057
Abstract:

With the diversified development of media data, cross-domain retrieval between images and 3D models has become a new challenge for 3D model retrieval. Because images and 3D models are extremely different and hard to match, a cross-domain retrieval algorithm based on a triplet network is proposed to construct a joint embedding space for real images and 3D shapes in an end-to-end manner. The similarity between data of different modalities can then be computed effectively as a distance in this space, leading to accurate retrieval of similar 3D models from a single image. To improve the accuracy of cross-domain retrieval, each 3D model is represented by a set of sequential views, and a Gated Recurrent Unit (GRU) aggregates the view-level features into a global feature. In addition, an attention mechanism is introduced to extract image features and bridge the semantic gap between the real image and the rendered 3D views. Experimental results show that the mean average precision is improved by at least 2.98%-3.05% on two cross-domain datasets compared with other similar algorithms.
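The joint embedding idea in the abstract above can be illustrated with a minimal sketch of a standard triplet margin loss and distance-based retrieval. The function names, the margin value, and the toy two-dimensional embeddings are assumptions for illustration only; the abstract does not state the exact loss formulation or distance used in the paper.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin).

    anchor   -- embedding of a query image (hypothetical values)
    positive -- embedding of the matching 3D model
    negative -- embedding of a non-matching 3D model
    margin   -- assumed hyperparameter, not taken from the paper
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def retrieve(query, gallery):
    """Return gallery indices sorted by increasing distance to the query."""
    return sorted(range(len(gallery)), key=lambda i: euclidean(query, gallery[i]))


# Toy embeddings: one image and three 3D models in a shared 2-D space.
image_embedding = [0.0, 1.0]
model_embeddings = [[3.0, 4.0], [0.0, 1.5], [1.0, 1.0]]

ranking = retrieve(image_embedding, model_embeddings)
loss = triplet_loss(image_embedding, model_embeddings[1], model_embeddings[0])
```

Here the loss is zero because the matching model is already closer to the image than the non-matching one by more than the margin; training only pushes on triplets that violate this condition.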

Structural weighted low-rank approximation for Poisson image deblurring
WU Qingbo, REN Wenqi
2020, 46(9): 1701-1710. doi: 10.13700/j.bh.1001-5965.2020.0061
Abstract:

To address image quality degradation caused by Gaussian blur and Poisson noise, an image deblurring method based on structural weighted low-rank approximation is proposed. First, a structural transformation is introduced by sequentially combining the four basic operations of scaling, rotation, shearing, and flipping in order to boost the similarity of candidate patches in the search space. Then, a novel objective function is proposed by carefully designing the regularization term. To this end, we perform the structural transformation on image patches and then penalize the transformed results with the Weighted Nuclear Norm (WNN), based on the low-rank assumption among non-local similar patches, which suppresses Poisson noise while deblurring. Finally, an alternating optimization algorithm based on the Half-Quadratic Splitting (HQS) method is presented to solve the proposed objective function for Poisson image deblurring. Experimental results demonstrate that, under multiple intensities of Poisson noise, the proposed algorithm achieves higher Peak Signal to Noise Ratio (PSNR) and Structural Similarity (SSIM) than state-of-the-art deblurring methods.
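For readers unfamiliar with the PSNR metric named in the abstract above, the following minimal sketch computes it from its usual definition, 10·log10(MAX² / MSE). Representing images as nested lists and assuming an 8-bit peak value of 255 are choices made only to keep the example dependency-free; the pixel values are made up.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak Signal to Noise Ratio in dB between two equal-size grayscale images.

    Images are nested lists of pixel intensities (an assumption of this sketch).
    """
    ref = [p for row in reference for p in row]
    est = [p for row in estimate for p in row]
    if len(ref) != len(est):
        raise ValueError("images must have the same number of pixels")
    mse = sum((r - e) ** 2 for r, e in zip(ref, est)) / len(ref)
    if mse == 0:
        return math.inf  # identical images have unbounded PSNR
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical 2x2 ground-truth image and a restored version off by one level.
sharp = [[100, 120], [140, 160]]
restored = [[101, 119], [141, 159]]
score = psnr(sharp, restored)
```

A higher PSNR means the restored image is closer to the reference in the mean-squared-error sense; SSIM, the other metric mentioned, measures structural agreement and is not covered by this sketch.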

Learning shrinkage fields for low-light image enhancement via Retinex
WU Qingbo, WANG Rui, REN Wenqi
2020, 46(9): 1711-1720. doi: 10.13700/j.bh.1001-5965.2020.0064
Abstract:

To enhance a low-light image and suppress noise simultaneously, we propose an image enhancement algorithm that learns Shrinkage Fields (SFs) to improve the reflectance image and the illumination map in the Retinex model. First, we design a novel objective function that constrains the latent reflectance image and the illumination map with two different groups of high-order filters in the regularization terms. These filters can be learned to have various activation patterns, and thus help recover the reflectance image and suppress noise at the same time. Then, we update the latent variables during optimization of the objective function by calculating SFs, in which parameterized shrinkage functions scale the responses of the corresponding high-order filters convolved with the reflectance image and the illumination map. Finally, we embed an auxiliary SF model before the update of the illumination map in each cascade to suppress the propagation of noise and undesirable artifacts and to further refine the estimate of the illumination map. Experimental results demonstrate that the proposed algorithm outperforms state-of-the-art low-light image enhancement methods in terms of Peak Signal to Noise Ratio (PSNR) and Structural Similarity (SSIM).

Extraction of foreground area of pedestrian objects under thermal infrared video surveillance
ZHANG Yugui, SHEN Liuqing, HU Haimiao
2020, 46(9): 1721-1729. doi: 10.13700/j.bh.1001-5965.2020.0068
Abstract:

In thermal infrared video surveillance, changes in ambient temperature can invert the gray values in thermal infrared images. To address this problem, this paper proposes a method that extracts the foreground area of pedestrian objects by fusing boundary features and motion features in thermal infrared images. First, the boundary features of pedestrian objects are extracted using the significant differences between the pedestrian objects and the surrounding environment. The extracted boundary features are then subjected to area filling, and false detections are eliminated with a thermal infrared pedestrian classifier, which yields the final boundary feature extraction results. Second, the motion features of pedestrian objects are obtained from the motion information between adjacent frames. The resulting motion features are subjected to morphological processing, and false detections are again eliminated with the thermal infrared pedestrian classifier, which yields the final motion feature extraction results. Finally, the boundary feature extraction results and the motion feature extraction results are fused to obtain the final detection results. Our experiments on the OSU and LSI thermal infrared pedestrian detection datasets show that the proposed method effectively reduces the adverse effects of changes in ambient temperature and further improves the extraction accuracy of the foreground area of pedestrian objects.

Intelligent criminal investigation system based on both footprint recognition and surveillance video analysis
TAO Yining, SU Feng, YUAN Peijiang, WANG Tianmiao, ZHONG Tao, HAO Jingru
2020, 46(9): 1730-1738. doi: 10.13700/j.bh.1001-5965.2020.0062
Abstract:

Criminal investigation is a basic requirement for cracking down on crime and ensuring the long-term security of the country. One of its key problems is how to use the collected information effectively. To better support criminal investigation, an intelligent criminal investigation system based on both footprint recognition and surveillance video analysis is proposed. The system is powered by deep learning methods. First, using a convolutional neural network, it predicts the characteristics of suspected individuals from footprint information, including footprint length, width, distribution of force, step length, stride, etc. Second, it uses big data from surrounding surveillance video for intelligent comparison, to quickly screen investigation targets and analyze the personal characteristics of pedestrians. Finally, virtual reality simulation technology is used to construct a finite element model for foot pressure and sole stress analysis, and the model is used to obtain simulated footprints in various complex scenarios. The three components corroborate each other and are combined to quickly select investigation targets. Experimental results show that the system can efficiently and accurately predict a person's height from footprint characteristics. Combined with video surveillance big data, the proposed system can quickly narrow the scope of the investigation and identify the suspect.

Dual-spectrum intelligent temperature detection and health big data management system
ZHANG Jieru, SU Feng, YUAN Peijiang, WANG Tianmiao, TAO Yining, DING Dong
2020, 46(9): 1739-1746. doi: 10.13700/j.bh.1001-5965.2020.0063
Abstract:

Public safety video surveillance has played an important role in the battle against coronavirus disease 2019 (COVID-19). Given China's high population density, large flows of people, and the ease with which COVID-19 spreads, an intelligent temperature detection and health big data management system combining visible and infrared dual-spectrum imaging is established. It provides contactless rapid temperature detection and face recognition while a mask is worn, and quickly completes the registration of personal information. The system has been deployed in multiple places and has passed verification of its effectiveness and reliability. Measurement is fast, with a response time within 30 ms. Measurement is accurate, with a temperature error within ±0.3 °C. The measurement range is wide, with a monitoring distance of 0.1-10 m. The face capture rate is over 99%, and the recognition rate is over 95%. The health big data management system can monitor and track back-to-back personnel movements in real time, perform multi-dimensional statistical analysis of personnel information and epidemic development big data, model and predict epidemic development trends, improve epidemic prevention and control strategies based on the analysis results, and support accurate and efficient epidemic prevention and control.

Pedestrian re-identification method based on spatial attention mechanism
ZHANG Zihao, ZHOU Qianli, WANG Rong
2020, 46(9): 1747-1755. doi: 10.13700/j.bh.1001-5965.2020.0075
Abstract:

Pedestrian re-identification has always been an important part of image retrieval. However, because of varying pedestrian poses and complex backgrounds, the extracted pedestrian features are not robust or representative, which in turn lowers the accuracy of pedestrian re-identification. In this paper, building on the AlignedReID++ algorithm, we propose a pedestrian re-identification method based on a spatial attention mechanism. First, in the feature extraction part, a spatial attention mechanism is introduced to enhance feature expression while suppressing possible noise. Second, an Instance-Normalization (IN) layer is introduced in the convolution layer to assist the Batch-Normalization (BN) layer in normalizing the features. This addresses the insensitivity of a single BN layer to tonal and illumination changes and makes feature extraction more robust to such changes. Finally, to validate the proposed method, extensive experiments are carried out on the Market1501, DukeMTMC, and CUHK03 pedestrian re-identification datasets. The experimental results show that the recognition accuracy of the improved model on the three datasets is higher by 2%, 2.9%, and 5.1%, respectively, than that of the model before modification, which indicates that the proposed method achieves higher accuracy and greater robustness.
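The difference between the two normalization layers mentioned in the abstract above can be sketched in a few lines. This is a generic illustration of BN versus IN statistics on toy data, not the layer arrangement of AlignedReID++ or of the paper; learnable scale and shift parameters are omitted, and the epsilon value and data layout are assumptions.

```python
from statistics import fmean, pvariance

EPS = 1e-5  # small constant for numerical stability (assumed value)


def _normalize(values, mean, var):
    """Shift and scale a list of values to zero mean and unit variance."""
    return [(v - mean) / (var + EPS) ** 0.5 for v in values]


def batch_norm(batch):
    """Normalize each channel with statistics pooled over the whole batch.

    batch[n][c] is the flattened spatial response of channel c in sample n.
    """
    n_channels = len(batch[0])
    out = [[None] * n_channels for _ in batch]
    for c in range(n_channels):
        pooled = [v for sample in batch for v in sample[c]]
        mean, var = fmean(pooled), pvariance(pooled)
        for n, sample in enumerate(batch):
            out[n][c] = _normalize(sample[c], mean, var)
    return out


def instance_norm(batch):
    """Normalize each channel of each sample with that sample's own statistics."""
    return [
        [_normalize(ch, fmean(ch), pvariance(ch)) for ch in sample]
        for sample in batch
    ]


# Two single-channel samples with the same pattern under dark and bright lighting.
dark = [[1.0, 2.0, 3.0]]
bright = [[11.0, 12.0, 13.0]]
bn_out = batch_norm([dark, bright])
in_out = instance_norm([dark, bright])
```

In this toy case IN maps the dark and bright samples to the same normalized response, because the brightness offset is removed per sample, whereas BN keeps them apart because its statistics are shared across the batch.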

Improved face recognition method based on MobileFaceNet network
ZHANG Zihao, WANG Rong
2020, 46(9): 1756-1762. doi: 10.13700/j.bh.1001-5965.2020.0049
Abstract:

To address the large number of parameters in convolutional models and their slow convergence during training, an improved face recognition method based on the MobileFaceNet network is proposed. First, we use the MobileFaceNet network to extract facial features. During feature extraction, the number of convolutional layer parameters in the model is reduced by introducing separable convolution. Then, a style attention mechanism is introduced in the MobileFaceNet network to enhance the expression of features. At the same time, the AdaCos face loss function is used to train the model, and the adaptive scaling factor in the AdaCos loss function dynamically adjusts the hyperparameters, which avoids the effect of manually set hyperparameters on the model. Finally, we evaluate the trained model on the LFW, AgeDB, and CFP-FF test datasets. The experimental results show that the recognition accuracy of the improved model on the LFW, AgeDB, and CFP-FF test datasets increases by 0.25%, 0.16%, and 0.3%, respectively, indicating that the improved model has higher accuracy and robustness than the model before improvement.

An improved ORB algorithm based on region division
SUN Hao, WANG Peng
2020, 46(9): 1763-1769. doi: 10.13700/j.bh.1001-5965.2020.0054
Abstract:

The feature points extracted by the traditional ORB algorithm are not evenly distributed, are redundant and have no scale invariance. To solve this problem, this paper proposes an improved ORB algorithm based on region division. According to the total number of feature points to be extracted and the number of regions to be divided, the algorithm calculates the number of feature points to be extracted for each region, which solves the problem of feature point overlap and redundancy in the feature point extraction process. By constructing the image pyramid and extracting feature points on each layer, the problem that the feature points extracted by ORB algorithm do not have scale invariance is solved. The experimental results show that the feature points extracted by our algorithm are more uniform and reasonable without losing the accuracy of image matching, and the extraction speed is about 16% faster than that of the traditional ORB algorithm.
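The per-region allocation described in the abstract above can be sketched as follows. The even split, the row-major handling of the remainder, and the rule of keeping the strongest responses in each region are assumptions for illustration; the abstract does not give the paper's exact allocation formula, and the function names are hypothetical.

```python
def features_per_region(total_features, rows, cols):
    """Split a total feature-point budget evenly across a rows x cols grid.

    Any remainder is handed out one point at a time to the first regions in
    row-major order (an assumed tie-breaking rule).
    """
    base, extra = divmod(total_features, rows * cols)
    return [[base + (1 if r * cols + c < extra else 0) for c in range(cols)]
            for r in range(rows)]


def select_keypoints(keypoints, width, height, rows, cols, total_features):
    """Keep the strongest keypoints in each region, up to that region's quota.

    keypoints is a list of (x, y, response) tuples, e.g. from a corner detector.
    """
    quotas = features_per_region(total_features, rows, cols)
    buckets = {}
    for x, y, response in keypoints:
        r = min(int(y * rows / height), rows - 1)
        c = min(int(x * cols / width), cols - 1)
        buckets.setdefault((r, c), []).append((x, y, response))
    selected = []
    for (r, c), points in buckets.items():
        points.sort(key=lambda p: p[2], reverse=True)
        selected.extend(points[:quotas[r][c]])
    return selected


quotas = features_per_region(total_features=10, rows=2, cols=2)
# Three clustered candidates in the top-left region and one weak point elsewhere.
candidates = [(5, 5, 0.9), (6, 6, 0.8), (7, 7, 0.7), (90, 90, 0.1)]
kept = select_keypoints(candidates, width=100, height=100, rows=2, cols=2,
                        total_features=4)
```

With a quota of one point per region, only the strongest of the three clustered candidates survives, while the isolated weak point is kept, which is the kind of more uniform spatial distribution the abstract describes.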

Vital signs detection via facial video analysis
CHEN Hui, ZHENG Xiujuan, NI Zongjun, ZHANG Yun, YANG Xiaomei
2020, 46(9): 1770-1777. doi: 10.13700/j.bh.1001-5965.2020.0065
Abstract:

Detection of physiological signals related to vital signs from facial video is easily affected by ambient light and head motion. To reduce this disturbance and increase the accuracy of vital sign estimates, this paper proposes a facial video analysis method that combines the Ensemble Empirical Mode Decomposition (EEMD) algorithm with signal quality detection to accurately detect vital signs such as heart rate and respiratory rate. The performance of the proposed method is validated by comparing it with existing signal processing techniques on a public dataset. The experimental results show that the proposed method obtains more accurate estimates of heart rate and respiratory rate than the existing methods. The correlation coefficients between the estimates and the gold standards are higher than 0.9 and 0.8, respectively. The vital signs detection method has the potential to benefit real-time face liveness recognition and intelligent surveillance video analysis.
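The agreement measure reported in the abstract above can be illustrated with a short sketch. It assumes the Pearson correlation coefficient, which the abstract does not name explicitly, and the heart-rate values are made up for illustration rather than taken from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical heart-rate values in beats per minute.
reference_bpm = [62.0, 70.0, 75.0, 81.0, 90.0]   # contact sensor (reference)
estimated_bpm = [63.0, 69.0, 77.0, 80.0, 91.0]   # video-based estimate
r = pearson(reference_bpm, estimated_bpm)
```

A coefficient close to 1 indicates that the video-based estimates rise and fall with the reference measurements; it does not by itself show that the two agree in absolute value.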

A lightweight multi-target real-time detection model
QIU Bo, LIU Xiang, SHI Yunyu, SHANG Yanfeng
2020, 46(9): 1778-1785. doi: 10.13700/j.bh.1001-5965.2020.0066
Abstract:

A lightweight multi-target real-time detection algorithm is proposed for public security monitoring systems, in order to make content analysis accurate and intelligent and to improve practical service capability. First, the multi-fusion gradient cascade structure of CBNet is added to the CenterNet detection network, which effectively addresses the insufficient feature extraction capability of the backbone network on daily monitoring videos. Second, the number of parameters is reduced through model pruning and compression, which speeds up the analysis of monitoring videos. In the experiments, the dataset for training and testing consists of part of the COCO dataset and field data collected by ourselves. The ablation experiments are conducted against other mainstream detection algorithms (YOLO, Faster R-CNN, SSD, etc.). The experimental results show that the presented model effectively balances speed and precision in the analysis of public security monitoring videos and generalizes better.

Trans-scale feature aggregation network for multiscale pedestrian detection
CAO Shuai, ZHANG Xiaowei, MA Jianwei
2020, 46(9): 1786-1796. doi: 10.13700/j.bh.1001-5965.2020.0069
Abstract:

Spatial scale variation of pedestrian instances is one of the main bottlenecks affecting pedestrian detection performance. To address this issue, a Trans-Scale Feature Aggregation Network (TS-FAN) is proposed for multi-scale pedestrian detection. First, in view of the feature differences among different scale spaces, we introduce a scale compensation strategy based on a multi-path Region Proposal Network (RPN). According to the effectiveness of the convolutional feature layers at different scales, a series of candidate region scale sets is generated adaptively from the feature maps corresponding to the size of the receptive field. Second, considering the semantic complementarity of convolutional features at different levels, a trans-scale feature aggregation module is proposed. It aggregates semantically robust high-level features with low-level features that carry accurate location information, and enhances the representation ability of convolutional features by combining horizontal connections, a top-down path, and a bottom-up path. Finally, combining the multi-path RPN scale compensation strategy and the trans-scale feature aggregation module, we construct a multi-scale pedestrian detection network with adaptive scale perception. The experimental results show that, compared with the state-of-the-art method TLL-TFA, the log-average miss rate of pedestrian detection on the widely used Caltech dataset is reduced to 26.21% (an improvement of 11.94%) for whole-scale pedestrians (above 20 pixels in height), and to 47.30% (an improvement of 12.79%) for small-scale pedestrians (20-30 pixels in height). A similar improvement is also achieved on the ETH dataset, which has drastic scale variation.

Facial expression recognition method based on a joint normalization strategy
LAN Lingqiang, LI Xin, LIU Qiyuan, LU Shuhua
2020, 46(9): 1797-1806. doi: 10.13700/j.bh.1001-5965.2020.0073
Abstract:

For the end-to-end feature extraction and classification based on deep learning that is often used in facial expression recognition, a new method of deep model optimization is proposed. This paper proposes joint optimization strategies that build on the ResNet18 residual network and on normalization ideas. Three pairings, namely filter response normalization with batch normalization, instance normalization with group normalization, and group normalization with batch normalization, are each embedded in the network to balance and improve the distribution of feature data, make up for the shortcomings of a single normalization method, and improve model performance. Validation and testing are carried out on the two public datasets FER2013 and CK+, and the highest accuracy rates are 73.558% and 94.9%, respectively. The experimental results indicate that the joint optimization strategy enhances the performance of the basic network, which is better than most of the latest facial expression recognition methods.

Palmprint enhancement and ROI extraction based on U-Net
LU Zhanhong, SHAN Lubin, SU Lixun, JIAO Yuxin, WANG Jiahua, WANG Haixia
2020, 46(9): 1807-1816. doi: 10.13700/j.bh.1001-5965.2020.0309
Abstract:

Palmprint is a biometric feature with great application potential due to its stability, uniqueness, resistance to copying, and ease of acquisition. In palmprint recognition, palmprint Region of Interest (ROI) acquisition and palmprint enhancement usually suffer from high time cost and strong dependency between methods. This paper proposes a palmprint preprocessing method based on the U-Net neural network structure. The experiments are carried out on the palmprint database from Hong Kong Polytechnic University. The results show that the proposed method eliminates the mutual influence between the preprocessing methods. Both denoising and enhancement of palmprint images are achieved, and the region of interest can be extracted quickly and accurately; palmprint