Transformer-Based 3D Object Detection

Jiayin Li; Yixin Ma; Jiagu Pan; Xing Xu

doi:10.31686/ijier.vol12.iss4.4220

Authors

Jiayin Li, Shanghai University of Engineering Science
Yixin Ma, Shanghai University of Engineering Science
Jiagu Pan, Shanghai University of Engineering Science
Xing Xu, Shanghai University of Engineering Science

Keywords:

Transformer, Object Detection, Computer Vision, Point Cloud, Self-Attention Mechanism

Abstract

This paper studies Transformer-based object detection methods. The Transformer, originally developed for natural language processing, is now widely used in computer vision tasks such as image classification and object detection. The paper introduces an object detection method based on a scale point cloud Transformer, offering a new research direction for future work on object detection.

Author Biographies

Jiayin Li, Shanghai University of Engineering Science
School of Electronic and Electrical Engineering

Yixin Ma, Shanghai University of Engineering Science
School of Electronic and Electrical Engineering

Jiagu Pan, Shanghai University of Engineering Science
School of Electronic and Electrical Engineering

Xing Xu, Shanghai University of Engineering Science
School of Electronic and Electrical Engineering
