2022 Vol. 48, No. 8

Double drive adaptive super-resolution reconstruction method of remote sensing images for object detection
CHENG Keyang, RONG Lan, JIANG Senlin, ZHAN Yongzhao
2022, 48(8): 1343-1352. doi: 10.13700/j.bh.1001-5965.2021.0517
Abstract:

Existing super-resolution reconstruction methods for optical remote sensing images mainly aim to generate visually satisfactory images and do not take into account the particularity of the subsequent target detection task, so they cannot be effectively applied to target detection. Therefore, a double drive adaptive multi-scale super-resolution reconstruction method for optical remote sensing images, oriented to target detection, is proposed. The super-resolution reconstruction network and the target detection network are combined for joint optimization. According to the characteristics of optical remote sensing images, an adaptive multi-scale remote sensing image super-resolution reconstruction network is designed. The selective kernel network and adaptive gating unit are integrated to extract and fuse features, and the primary remote sensing image is reconstructed. Through the double drive module, the task-driven loss and the feature-prior-driven loss are passed back to the super-resolution network to improve the performance of target detection. The proposed method was tested on the UCAS-AOD and NWPU VHR-10 datasets and compared with five mainstream algorithms; compared with the FDSR algorithm, the peak signal-to-noise ratio and average accuracy improved by 1.86 dB and 3.73%, respectively. Experimental results show that, compared with other methods, combining the proposed algorithm with optical remote sensing image target detection achieves better results and the best comprehensive performance.

Traffic signal timing method based on deep reinforcement learning and extended Kalman filter
WU Lan, WU Yuanming, KONG Fanshi, LI Binquan
2022, 48(8): 1353-1363. doi: 10.13700/j.bh.1001-5965.2021.0529
Abstract:

The deep Q-learning network (DQN) has become an effective method to solve the traffic signal timing problem because of its strong perception and decision-making ability. However, in the field of traffic signal timing systems, the problem of parameter uncertainty caused by external environment disturbance and internal parameter fluctuation limits its further development. Based on this, a traffic signal timing method combining DQN and the extended Kalman filter (DQN-EKF) is proposed. In this method, the uncertain parameters of the estimated network are taken as the state variables, and the target network values with uncertain parameters are taken as the observed variables. The EKF system equation is constructed by combining the process noise, the estimated network values with uncertain parameters, and the system observation noise. The optimal estimate of the parameters in the DQN model is obtained through the iterative updating of the EKF. The experimental results show that the DQN-EKF timing algorithm is suitable for different traffic environments and can effectively improve the traffic efficiency of vehicles.

Single image dehazing method based on improved atmospheric scattering model
YANG Yong, QIU Genying, HUANG Shuying, WAN Weiguo, HU Wei
2022, 48(8): 1364-1375. doi: 10.13700/j.bh.1001-5965.2021.0532
Abstract:

Images obtained in foggy conditions often suffer from low contrast, color loss, and noise. At present, many traditional dehazing methods mainly focus on solving problems such as low contrast and color loss, but do not consider the hidden noise light scattered by dust particles in the air, resulting in a large amount of noise in the dehazing results. This work provides an image dehazing algorithm based on an enhanced atmospheric scattering model to address the mentioned problems. Firstly, according to the characteristics of haze, the traditional atmospheric scattering model of hazy imaging is improved by adding the noise light reflected by the medium in the air. Then, in order to address the transmission calculation inaccuracy problem for the dark channel prior, a refined calculation method of transmission is constructed according to the improved model. Finally, combined with the idea of edge preservation and noise suppression of the total variation model, a new objective function is constructed and solved iteratively to obtain the final defogging image. A large number of experimental results and comparative analyses show that the proposed method can effectively remove the haze in the image, reduce the noise in the dehazing results, and retain the rich texture information in the image.

Region-hierarchical predictive coding for quantized block compressive sensing
LIU Hao, ZHENG Haoran, HUANG Rong
2022, 48(8): 1376-1382. doi: 10.13700/j.bh.1001-5965.2021.0511
Abstract:

During the predictive coding of quantized block compressive sensing, a large quantity of inefficient candidates will lead to low rate-distortion performance. To efficiently reduce the encoding distortion, this paper proposes a region-hierarchical predictive coding method for quantized block compressive sensing, which is based on the block-by-block spiral scan. After all blocks are measured at a subrate, the measurement vector of each block is numbered and encoded in spiral scan order. For the current measurement vector, its prediction vector is the inverse quantization vector with maximum similarity from its context-aware candidate set. According to its hierarchical correlation, each measurement vector is classified into one of three regions. The block coding model is used to determine adaptive quality factors for different regions, where the key region is assigned a larger quality factor. As compared with the existing predictive coding methods, the proposed method jointly utilizes the local correlation and hierarchical correlation among these vectors, and the experimental results show that at least 0.12 dB rate-distortion gain is obtained.

Dual coding unit partition optimization algorithm of HEVC
LIU Meiqin, XU Chenming, YAO Chao, LIN Chunyu, ZHAO Yao
2022, 48(8): 1383-1389. doi: 10.13700/j.bh.1001-5965.2021.0528
Abstract:

To resolve the conflict between the increasing amount of video data and the demand for a high-quality video experience, HEVC improves compression performance by 50% over H.264/AVC, but at the cost of dramatically increased complexity. In this paper, a fast coding unit (CU) partition algorithm is proposed to reduce the computational complexity of HEVC intra coding. To define the partition criteria, we design a convolutional neural network named dual neural networks (DualNet). DualNet consists of two subnetworks, a prediction network and a target network. The prediction network determines the partition actions by extracting statistical features of the image, which skips the traversal search of the quadtree and improves the time efficiency of the CU partition. The target network optimizes the performance of the CU partition based on rate-distortion to achieve model complementarity. Experimental results show that the proposed algorithm can save 64.06% of the compression time with compression performance similar to HEVC.

Object tracking method based on IoU-constrained Siamese network
ZHOU Lifang, LIU Jinlan, LI Weisheng, LEI Bangjun, HE Yu, WANG Yihan
2022, 48(8): 1390-1398. doi: 10.13700/j.bh.1001-5965.2021.0533
Abstract:

Tracking methods based on the Siamese network train the tracking model offline and therefore maintain a good balance between tracking accuracy and speed, which has recently attracted the interest of a growing number of researchers. Existing Siamese network object tracking methods use a fixed threshold to select positive and negative training samples, which can easily cause training samples to be missed. Such methods also have low correlation between the classification branch and the regression branch during training, which is not conducive to training a high-precision tracking model. To this end, an object tracking method based on an intersection over union (IoU)-constrained Siamese network is proposed. Using a dynamic threshold strategy, the thresholds of positive and negative training samples are dynamically adjusted according to the relevant statistical characteristics of the predefined anchor boxes and the ground-truth boxes, thereby improving the tracking accuracy. In addition, the proposed method replaces the classification branch with an IoU quality assessment branch and reflects the position of the target through the IoU between the anchor box and the target ground-truth box, which improves the tracking accuracy and reduces the number of model parameters. The proposed method has been compared and tested on four datasets: VOT2016, OTB-100, VOT2019, and UAV123, and achieves good results on all of them. The tracking accuracy of the proposed method is 0.017 higher than that of SiamRPN on the VOT2016 dataset. With a real-time running speed of 220 frame/s, the expected average overlap is 0.463, which is only 0.001 lower than that of SiamRPN++.
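The IoU between an anchor box and a ground-truth box, which the abstract above uses as its quality signal, can be illustrated with a minimal sketch. The `(x1, y1, x2, y2)` corner format and the function name `iou` are assumptions made for illustration; the abstract does not specify the paper's box encoding or implementation.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Boxes are assumed to be (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
    This format is an illustrative assumption, not taken from the paper.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    # Union = sum of the two areas minus the shared part.
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    anchor = (0, 0, 2, 2)
    ground_truth = (1, 1, 3, 3)
    print(iou(anchor, ground_truth))
```

A dynamic threshold strategy of the kind described would then compare such IoU values against thresholds derived from statistics over all anchors, rather than against a fixed constant.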

Long-tail image captioning with dynamic semantic memory network
LIU Hao, YANG Xiaoshan, XU Changsheng
2022, 48(8): 1399-1408. doi: 10.13700/j.bh.1001-5965.2021.0518
Abstract:

Image captioning takes an image as input and outputs a text sequence. Most images in image captioning datasets are captured from the daily life of internet users, so their captions are composed of a few common words and many rare words. Most existing studies focus on improving captioning performance over the whole dataset, regardless of performance on rare words. To solve this problem, we introduce long-tail image captioning with a dynamic semantic memory network (DSMN). Long-tail image captioning requires the model to improve the generation of rare words while maintaining good performance on common words. The DSMN model dynamically mines the global semantic relationship between rare words and common words, enabling knowledge transfer from common words to rare words. Results show that DSMN improves the semantic representation of rare words by combining the global semantic relations among words with the local semantic information of the input image and the generated words. For better evaluation of long-tail image captioning, we organized a task-specific test split, Few-COCO, from the original MS COCO Captioning dataset. In quantitative and qualitative experiments on the Few-COCO dataset, the DSMN model achieves a rare-word description precision of 0.602 8%, a recall of 0.323 4%, and an F-1 value of 0.356 7%, a significant improvement over baseline methods.

Medical image segmentation based on multi-layer features and spatial information distillation
ZHENG Yuxiang, HAO Pengyi, WU Dong'en, BAI Cong
2022, 48(8): 1409-1417. doi: 10.13700/j.bh.1001-5965.2021.0504
Abstract:

U-Net is currently the most widely used segmentation model, and its "coding-decoding" structure has also become the most commonly used structure for building medical image segmentation models. Although U-Net has achieved very high segmentation accuracy in many fields, it suffers from high computational complexity, slow inference speed, and high memory consumption, which make it difficult to deploy on mobile application platforms. To solve this problem, a medical image segmentation method combining multi-layer features and spatial information distillation, named TinyUnet, is proposed in this paper. This method uses a U-Net with fewer parameters as the student network, which is smaller and lighter than the original U-Net. Considering that the small model does not have enough learning ability, this method distils the multi-layer teacher feature maps by selecting appropriate distillation positions. At the same time, it strengthens the edges of the deep feature map of the teacher network, constructs an edge key point graph structure, and uses a graph convolution network to distil spatial information to the student network, so as to guide the student network to obtain more effective edge information and spatial information. Experiments show that TinyUnet maintains 98.3% to 99.7% of the segmentation accuracy of U-Net on three medical datasets, while reducing the number of parameters of U-Net by 99.6% on average and increasing the computing speed by about 110 times. Meanwhile, compared with other advanced compact medical image segmentation models, TinyUnet not only achieves good segmentation accuracy but also occupies less memory and runs faster.

No reference quality assessment method for contrast-distorted images based on three elements of color
DING Yingqiu, YANG Yang, CHENG Ming, ZHANG Weiming
2022, 48(8): 1418-1427. doi: 10.13700/j.bh.1001-5965.2021.0509
Abstract:

Image quality assessment is a basic and challenging problem in the field of image processing, among which the contrast distortion has a greater impact on the perception of image quality. However, there is relatively little research on the no-reference image quality assessment of contrast-distorted images. This paper proposes a no-reference contrast-distorted image quality assessment method based on the three elements of color. The three parameters of brightness, hue and saturation of the three elements of color are used to realize the assessment of contrast-distorted images. First, in terms of brightness, the moment feature and the Kullback-Leibler divergence between the image histogram and the uniform distribution are extracted. Secondly, in terms of hue and saturation, the color-weighted local binary patterns (LBP) histogram features are extracted from the H and S channels of the HSV space, respectively. Finally, the AdaBoosting BP neural network is used to train the prediction model. Through extensive experimental analysis and cross-validation in five standard image databases, the experimental results show that the performance of this method is significantly improved compared with the existing contrast-distorted image quality assessment methods.
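The brightness feature mentioned above, the Kullback-Leibler divergence between the image histogram and the uniform distribution, can be sketched as follows. This is a minimal illustration under stated assumptions: an 8-bit grayscale input, 256 bins, and the direction KL(P || U) with P the normalized histogram. The abstract does not state the direction, bin count, or implementation, and the function name `kl_to_uniform` is hypothetical.

```python
import numpy as np


def kl_to_uniform(gray, bins=256):
    """KL(P || U) between a brightness histogram P and the uniform distribution U.

    Assumes `gray` is a non-empty array of 8-bit intensities in [0, 255].
    The divergence direction and the bin count are illustrative assumptions.
    """
    hist, _ = np.histogram(gray, bins=bins, range=(0, 256))
    p = hist / hist.sum()          # normalized histogram
    u = 1.0 / bins                 # uniform probability per bin
    nz = p > 0                     # 0 * log(0) is treated as 0
    return float(np.sum(p[nz] * np.log(p[nz] / u)))


if __name__ == "__main__":
    flat = np.full((64, 64), 128, dtype=np.uint8)             # single gray level
    spread = np.tile(np.arange(256, dtype=np.uint8), (4, 1))  # every level equally often
    print(kl_to_uniform(flat), kl_to_uniform(spread))
```

Under these assumptions, an image whose intensities are spread evenly over the full range gives a divergence near zero, while an image concentrated on one gray level gives the maximum value, log(256).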

Knowledge graph completion based on graph contrastive attention network
LIU Danyang, FANG Quan, ZHANG Xiaowei, HU Jun, QIAN Shengsheng, XU Changsheng
2022, 48(8): 1428-1435. doi: 10.13700/j.bh.1001-5965.2021.0523
Abstract:

Knowledge graph (KG) completion aims to predict missing links based on the known triples in a knowledge base. Most KG completion methods deal with triples independently, without capturing the heterogeneous structure of the KG and the rich information inherent in neighbor nodes, which results in incomplete mining of triple features. This study revisits the end-to-end KG completion task and proposes a novel graph contrastive attention network (GCAT), which can capture latent representations of entities and relations simultaneously through an attention mechanism and encapsulate more neighborhood context information from the entity. Specifically, to effectively encapsulate the features of triples, a subgraph-level contrastive training objective is introduced, enhancing the quality of the generated entity representations. To verify the effectiveness of GCAT, the proposed model is evaluated on link prediction tasks. Experimental results show that on the dataset FB15k-237, the MRR of the model is 0.005 and 0.042 higher than that of InteractE and A2N, respectively, and that on the dataset WN18RR, the MRR is 0.019 and 0.032 higher than that of InteractE and A2N, respectively. Experiments show that the proposed model can effectively predict the missing links in KGs.

Image difference caption generation with text information assistance
CHEN Weijing, WANG Weiying, JIN Qin
2022, 48(8): 1436-1444. doi: 10.13700/j.bh.1001-5965.2021.0526
Abstract:

The image captioning task requires the machine to automatically generate natural language text to describe the semantic content of the image, thus transforming visual information into textual descriptions that facilitate image management, retrieval, classification, and other tasks. Image difference captioning is an extension of the image captioning task, which requires generating natural language sentences to describe the differences between two similar images. The difficulty of this task is how to determine the visual semantic difference between two images and convert the visual difference information into the corresponding textual descriptions. Previous studies do not make full use of textual information in the training stage to model cross-modal semantic associations between visual difference information and text. In this regard, the proposed framework, named TA-IDC, uses textual information to assist training. It adopts a multi-task learning method, adding a text encoder to the encoder-decoder structure and introducing textual information by text-assisted decoding and mixed decoding during the training stage. This aids in the modeling of semantic relationships between the visual and text modalities, resulting in more accurate image difference captions. Experimentally, TA-IDC outperforms the best results of existing models on the main metrics by 12%, 2%, and 3% on three image difference caption datasets, respectively.

A high-speed spectral clustering method in Fourier domain for massive data
ZHANG Man, XU Zhaorui, SHEN Xiangjun
2022, 48(8): 1445-1454. doi: 10.13700/j.bh.1001-5965.2021.0537
Abstract:

Spectral clustering is widely used in data mining and pattern recognition. However, due to the high computational cost of eigenvector solutions and the huge memory requirements brought by big data, the spectral clustering algorithm is greatly limited when it is applied to large-scale data. Therefore, this paper studies a high-speed spectral clustering method for massive data in the Fourier domain. This method makes full use of the repeatability of data patterns and uses this characteristic to model in the Fourier domain. To get the final eigenvectors, the time-consuming eigenvector pursuit can be transformed into the selection of a pre-determined discriminant basis in the Fourier domain. The calculation process only needs simple multiplication and addition, so the calculation time is greatly reduced. In addition, due to the characteristics of calculation in the Fourier domain, another advantage of this method is that it can train on the samples in batches; that is, using only part of the samples can estimate the eigenvector distribution of the whole data well. The experimental results on large-scale data such as Ijcnn1, RCV1, Covtype-mult, Poker and MNIST-8M show that the training of the proposed method is up to 810.58 times faster than that of the algorithms FastESC, LSSHC, SC_RB, SSEIGS and USPEC, while the clustering accuracy and other indicators are basically maintained, which shows that the proposed method has significant advantages in processing large-scale data.

Crowd density estimation for fisheye images
YANG Jialin, LIN Chunyu, NIE Lang, LIU Meiqin, ZHAO Yao
2022, 48(8): 1455-1463. doi: 10.13700/j.bh.1001-5965.2021.0520
Abstract:

Aiming at the problem that traditional crowd density estimation methods are not applicable under the distortion of fisheye images, this paper presents a crowd density estimation method for fisheye images, which realizes the monitoring of human traffic in scenes that use fisheye lenses. For the model structure, we introduced deformable convolution to improve the adaptability of the model to fisheye distortion. For generating the training targets, we used a Gaussian transform to perform a distribution match on the density maps of annotations, which depends on the features of fisheye distortion. For training, we optimized the loss function to prevent the model from falling into local optimal solutions. In addition, we collected and labeled a corresponding dataset due to the lack of datasets for fisheye crowd estimation. Finally, through subjective and objective comparison experiments with classical algorithms, we demonstrated the superiority of the proposed crowd estimation method for fisheye images, which achieves a mean absolute error of 3.78 on the test dataset, lower than the other methods.

A full-scale feature aggregation network for remote sensing image change detection
LIU Guoqiang, FANG Sheng, LI Zhe
2022, 48(8): 1464-1470. doi: 10.13700/j.bh.1001-5965.2021.0522
Abstract:

Change detection (CD) is an important task of remote sensing, always facing many pseudo changes and large scale variations. However, existing methods mainly focus on modeling difference features and neglect extracting sufficient information from the original images, which affects feature discrimination and makes it difficult to distinguish change regions stably. To address these problems, a full-scale feature aggregation network (FFANet) is proposed to make fuller use of the original image features, which drives the generated feature representations to be semantically richer and spatially more precise, thus improving the network's detection performance for small targets and target edges. Deep supervision is also extended to combine multi-scale prediction maps to drive the detection of different objects at more appropriate scales, thus improving the robustness of the network to object scale variations. On the CDD dataset, our proposed method improves the F1-score by 0.034 compared to the baseline network by increasing the number of parameters by only 1.01×10^6.

Improved spatial and channel information based global smoke attention network
DONG Zeshu, YUAN Feiniu, XIA Xue
2022, 48(8): 1471-1479. doi: 10.13700/j.bh.1001-5965.2021.0549
Abstract:

Smoke has the characteristics of semi-transparency, irregularity and blurry boundaries, leading to the challenging task of image smoke segmentation. To solve these problems, we propose an attention modeling method to extract the correlation of long-distance information. The attention method can capture the long-distance dependency of pixels and continuity of regions, so as to reduce the misclassification of discontinuous smoke regions. To avoid large memory consumption of large matrix multiplication and high computational complexity, we modify both spatial and channel attention structures to design a bi-direction attention (BDA) and a multi-scale channel attention (MSCA), which are used to compensate for lost spatial information by global pooling in attention methods. In addition, we propose a global smoke attention network, which combines residual networks with attention models to reduce memory consumption and computational complexity without sacrificing global correlation information. Experimental results show that the proposed network achieves the mean intersection over union of 73.13%, 73.81% and 74.25% on the three virtual smoke test datasets of DS01, DS02 and DS03, respectively, and it outperforms most of the existing state-of-the-art methods.

Hypernymy detection based on graph contrast
ZHANG Yali, FANG Quan, WANG Yunxin, HU Jun, QIAN Shengsheng, XU Changsheng
2022, 48(8): 1480-1486. doi: 10.13700/j.bh.1001-5965.2021.0524
Abstract:

Hypernymy is the foundation of many downstream tasks in natural language processing (NLP), so hypernymy detection has received considerable attention in the field of NLP. Adopting random initialization word vectors, existing word embedding methods cannot well capture the asymmetry and transferability of hypernymy, or make full use of the relationship between the prediction vector and the real projection. To address these problems, a novel method is proposed for detecting hypernymy based on graph contrastive learning (HyperCL). Firstly, HyperCL is introduced for data enhancement, and robust word feature representations are learned based on maximizing mutual information between local and global representations. Secondly, the proposed method learns how to project the hyponym vector to its hypernym and non-hypernym, and better distinguish the hypernym and non-hypernym in the embedded space, thus improving the detection accuracy. Experimental results on two benchmark datasets show that the proposed model increases the accuracy by more than 0.03, compared with the existing methods.

3D object detection based on multi-path feature pyramid network for stereo images
SU Kaiqi, YAN Weiqing, XU Jindong
2022, 48(8): 1487-1494. doi: 10.13700/j.bh.1001-5965.2021.0525
Abstract:

3D object detection is an important scene understanding task in computer vision and autonomous driving. However, most existing methods do not fully consider the large differences in scale between multiple objects. Thus, objects with a small scale are easily ignored, resulting in low detection accuracy. To address this problem, this paper proposes a 3D object detection method based on a multi-path feature pyramid network (MpFPN) for stereo images. MpFPN extends the feature pyramid network, adding a bottom-up path, a top-down path, and connections between input and output features. It provides multi-scale feature information with higher semantic information and finer-grained spatial information for the union region proposal network. Experimental results show that the proposed method achieves better results than comparative methods in easy, moderate and hard scenarios on the 3D object detection dataset KITTI.

Prediction model of COVID-19 based on spatiotemporal attention mechanism
BAO Xin, TAN Zhiyi, BAO Bingkun, XU Changsheng
2022, 48(8): 1495-1504. doi: 10.13700/j.bh.1001-5965.2021.0535
Abstract:

The continuous spread of COVID-19 has had a profound impact on human society. For the prevention and control of virus spreading, it is critical to predict the future trend of the epidemic situation. Existing studies on COVID-19 spread prediction, based on classic SEIR models or naive time-series prediction models, rarely consider the complex regional correlation and strong time-series dependence in the process of epidemic spread, which limits the performance of epidemic prediction. To this end, we propose a COVID-19 prediction model based on an auto-encoder and a spatiotemporal attention mechanism. The proposed model estimates the trend of COVID-19 by capturing the dynamic spatiotemporal dependence between the epidemic situation sequences of different regions. In particular, a spatial attention mechanism is implemented in the encoder section for every given region to capture the dynamic correlation between the epidemic situation time series of the region and those of the related regions. Based on the learnt correlation, a long short-term memory (LSTM) network is then applied to extract the epidemic sequential features for the given region by combining the recent epidemic situations of the region and the related regions. In addition, to better predict the dynamics of the future epidemic situation, temporal attention is introduced into an LSTM network-based decoder to capture the temporal dependence of the epidemic situation sequence. We evaluate the proposed model on several open datasets of COVID-19, and experimental results show that the proposed model outperforms the state-of-the-art models. The RMSE and MAE of the proposed model on the COVID-19 epidemic dataset of some European countries decreased by 22.3% and 25.0%, respectively. On the COVID-19 epidemic dataset of some Chinese provinces, they decreased by 10.1% and 10.4%, respectively.
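For readers unfamiliar with the two error metrics reported above, the following minimal sketch computes RMSE and MAE for a forecast and the relative decrease of a metric against a baseline. The function names and the sample numbers are hypothetical, and reading the reported percentages as relative decreases against a baseline model is an assumption; the abstract does not define how the decrease was calculated.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted case counts."""
    err = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))


def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted case counts."""
    err = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return float(np.mean(np.abs(err)))


def relative_decrease(baseline, proposed):
    """Fractional reduction of an error metric relative to a baseline (assumed definition)."""
    return (baseline - proposed) / baseline


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    observed = [100, 120, 150]
    predicted = [100, 120, 156]
    print(rmse(observed, predicted), mae(observed, predicted))
    print(relative_decrease(baseline=10.0, proposed=8.0))
```

RMSE penalizes large individual errors more heavily than MAE, which is why the two metrics are usually reported together.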

Hyperspectral image compression method based on 3D Saab transform
XU Aiming, HUANG Yuxing, SHEN Qiu
2022, 48(8): 1505-1514. doi: 10.13700/j.bh.1001-5965.2021.0521
Abstract:

Hyperspectral images contain rich and valuable spectral information, which brings great challenges to storage and transmission. However, most current hyperspectral image compression methods cannot consider spatial and spectral redundancy simultaneously, resulting in limited compression performance. We present a hyperspectral image compression method based on the 3D subspace approximation with adjusted bias (Saab) transform. The 3D Saab transform is first applied to hyperspectral image blocks, performing spatial-spectral fusion and dimensionality reduction on the blocks to remove spectral redundancy and local spatial redundancy simultaneously. Then, we use the intra mode of high efficiency video coding (HEVC) to further remove spatial and statistical redundancy. Experimental results demonstrate that the proposed method can improve the signal-to-noise ratio (SNR) by at least 0.62 dB as compared with the principal component analysis (PCA) based algorithm. At a high bit rate, the proposed method outperforms the state-of-the-art tensor decomposition compression method. We also evaluate the impact of different dimensionality reduction methods on classification, which demonstrates that the proposed method can better retain important features, with improved classification accuracy at a low bit rate.

A real scene underwater semantic segmentation method and related dataset
MA Zhiwei, LI Haojie, FAN Xin, LUO Zhongxuan, LI Jianjun, WANG Zhihui
2022, 48(8): 1515-1524. doi: 10.13700/j.bh.1001-5965.2021.0527
Abstract:

Underwater object recognition and segmentation with high accuracy have become a challenge with the development of underwater object grabbing technology. The existing underwater object detection technology can only give the general position of an object, unable to give more detailed information such as the outline of the object, which seriously affects the grabbing efficiency. To address this problem, we label and establish underwater semantic segmentation dataset of a real scene (DUT-USEG). The DUT-USEG dataset includes 6 617 images, 1 487 of which have semantic segmentation and instance segmentation annotations, and the remaining 5 130 images have object detection box annotations. Based on this dataset, we propose a semi-supervised underwater semantic segmentation network (US-Net) focusing on the boundaries. By designing a pseudo label generator and a boundary detection subnetwork, this network realizes the fine learning of boundaries between underwater objects and background, and improves the segmentation effect of boundary areas. Experiments show that the proposed method improves by 6.7% in three categories of holothurian, echinus, and starfish in DUT-USEG dataset, and achieves state-of-the-art results.

Appearance and action adaptive target tracking method
XIONG Junyao, WANG Rong, SUN Yibo
2022, 48(8): 1525-1533. doi: 10.13700/j.bh.1001-5965.2021.0597
Abstract:

On the basis of DaSiamese-RPN, a target tracking approach with appearance and action adaptation is proposed to limit the effect of appearance deformation on target tracking when the target is moving. First, the appearance and action adaptive module is introduced in the subnet of the Siamese network, which integrates the object's spatial information and action features. Secondly, the global and local divergences between the actual and predicted feature maps are measured by using two Euclidean distances, and the loss function is constructed by a weighted fusion of the two, so as to strengthen the correlation between the global and local information. Finally, tests were conducted on the VOT2016, VOT2018, VOT2019, and OTB100 datasets. The experimental results showed that the expected average overlap was improved by 4.5% and 6.1% on the VOT2016 and VOT2018 datasets, respectively. On the VOT2019 dataset, accuracy increased by 0.4% and expected average overlap decreased by 1%. On the OTB100 dataset, the tracking success rate was improved by 0.3% and accuracy increased by 0.2%.

Multi-label cooperative learning for cross domain person re-identification
LI Hui, ZHANG Xiaowei, ZHAO Xinpeng, LU Xinyu
2022, 48(8): 1534-1542. doi: 10.13700/j.bh.1001-5965.2021.0600
Abstract:

Cross-domain is an important application scenario in person re-identification, but the apparent differences in person images between the source domain and the target domain, in illumination condition, shooting angle, imaging background and style, are the most important factor that leads to the decline of the generalization ability of person re-identification models. A cross-domain person re-identification method based on multi-label cooperative learning is proposed to solve this problem. Firstly, a semantic parsing model is used to construct multi-label data based on semantic alignment, which guides the construction of global features that pay more attention to the person area, achieves semantic alignment, and reduces the influence of the background on cross-domain person re-identification. Furthermore, the collaborative learning average model is used to generate a multi-label representation of the person re-identification model based on global and local features after semantic alignment, reducing the interference of noisy hard labels in the cross-domain scenario. Finally, the multi-label semantic alignment model is combined with a collaborative learning network framework to improve the identification ability of the re-identification model. The experimental results show that on the Market-1501→DukeMTMC-reID, DukeMTMC-reID→Market-1501, Market-1501→MSMT17, and DukeMTMC-reID→MSMT17 cross-domain person re-identification datasets, compared with the current state-of-the-art cross-domain person re-identification method NRMT, the mean average precision of this method is increased by 8.3%, 8.9%, 7.6% and 7.9%, respectively. The multi-label cooperative learning method has obvious advantages.

Player movement data analysis on soccer field reconstruction
JI Xiaoqi, SONG Zikai, YU Junqing
2022, 48(8): 1543-1552. doi: 10.13700/j.bh.1001-5965.2022.0131
Abstract:

In soccer matches, player data analysis is crucial for improving the viewing experience and for helping coaches evaluate performance. The difficulty lies in locating the coordinates of players on the soccer field, i.e., in determining the mapping between the incomplete field that appears in a frame of soccer video and the standard two-dimensional field. To cope with the high-speed camera movement and sharp changes in viewing angle that occur in soccer matches, we propose a method of player motion analysis based on field reconstruction and player tracking. For field reconstruction, the field in the soccer video is grouped into three parts: left, center, and right. Each group is mapped from the incomplete field to the standard field through soccer field segmentation, straight-line detection, straight-line grouping, center circle point set identification, and key point matching. The kernelized correlation filter (KCF) tracking algorithm is used for player tracking. By combining field reconstruction with player tracking, we determine the standard coordinates of players and generate a set of player motion data and visualization results. The proposed player data analysis method can accurately and effectively compute player statistics, including player coordinates, motion trajectory, running speed, activity range, and player spacing. Field reconstruction is evaluated by image intersection: the intersection ratio of our algorithm reaches 87%, which is 3.7 percentage points higher than that of the traditional dictionary-based reconstruction method (83.3%). The experimental results suggest that our field reconstruction method describes the field mapping more precisely and provides better support for the statistical analysis of player data. In this paper, we design and propose a complete algorithm for player data analysis based on soccer field reconstruction and obtain visualization results of player statistics. By incorporating soccer domain knowledge, the field reconstruction method improves in both accuracy and efficiency. The player data analysis can provide data support for soccer fans and practitioners, and the field reconstruction method lays a solid foundation for further research on player analysis.
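
The mapping from a video frame to the standard field, and the statistics derived from it, can be sketched as follows. The abstract does not name the form of the mapping or the exact evaluation formula, so this sketch assumes a 3x3 planar homography `H` (as could be estimated from matched key points) and reads the "intersection ratio" as intersection over union of binary field masks. All function names and the example matrix are hypothetical.

```python
import math


def apply_homography(H, point):
    """Map a frame pixel (x, y) to standard-field coordinates using a 3x3 matrix H."""
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)  # divide by w to leave homogeneous coordinates


def running_speed(p0, p1, dt):
    """Average speed between two standard-field positions taken dt seconds apart."""
    return math.dist(p0, p1) / dt


def mask_iou(a, b):
    """Intersection over union of two equally sized binary masks (lists of 0/1 rows)."""
    inter = sum(1 for ra, rb in zip(a, b) for x, y in zip(ra, rb) if x and y)
    union = sum(1 for ra, rb in zip(a, b) for x, y in zip(ra, rb) if x or y)
    return inter / union if union else 1.0


if __name__ == "__main__":
    # Assumed example: a pure scaling from pixels to field units.
    H = [[0.5, 0.0, 0.0],
         [0.0, 0.5, 0.0],
         [0.0, 0.0, 1.0]]
    before = apply_homography(H, (100, 200))
    after = apply_homography(H, (106, 208))
    print(before, after, running_speed(before, after, dt=0.5))
    print(mask_iou([[1, 1], [0, 0]], [[1, 0], [1, 0]]))
```

Once tracked player positions are expressed in standard-field coordinates, trajectory, activity range, and player spacing follow from the same per-frame points.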