Transformer-Based 3D Object Detection

Authors

  • Jiayin Li, Shanghai University of Engineering Science
  • Yixin Ma, Shanghai University of Engineering Science
  • Jiagu Pan, Shanghai University of Engineering Science
  • Xing Xu, Shanghai University of Engineering Science

DOI:

https://doi.org/10.31686/ijier.vol12.iss4.4220

Keywords:

Transformer, Object Detection, Computer Vision, Point Cloud, Self-Attention Mechanism

Abstract

This paper studies Transformer-based object detection methods. The Transformer, originally developed for natural language processing, is now widely used in computer vision tasks such as image classification and object detection. The paper introduces an object detection method based on a scale point cloud Transformer, which offers a new research direction for future work on object detection.

Author Biographies

  • Jiayin Li, Shanghai University of Engineering Science

    School of Electronic and Electrical Engineering

  • Yixin Ma, Shanghai University of Engineering Science

    School of Electronic and Electrical Engineering

  • Jiagu Pan, Shanghai University of Engineering Science

    School of Electronic and Electrical Engineering

  • Xing Xu, Shanghai University of Engineering Science

    School of Electronic and Electrical Engineering

References

[1] Liu S., Cao Y., Huang W., et al. Radar point cloud segmentation integrating sparse attention and instance enhancement [J]. Chinese Journal of Image and Graphics, 2023, 28(02): 483-494. DOI: https://doi.org/10.11834/jig.210787

[2] Zhou J., Hu Y., Hu C., et al. Weakly perceptual target detection method based on point cloud completion and multi-resolution Transformer [J/OL]. Computer Applications: 1-13 [2023-03-27].

[3] Han L., Gao Y., Shi Z. Radar point cloud three-dimensional target detection based on sparse Transformer [J]. Computer Engineering, 2022, 48(11): 104-110+144.

[4] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., et al. Attention is all you need. In Advances in neural information processing systems, 2017:5998-6008.

[5] Devlin, J., Chang, M.-W., Lee, K., et al. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019, 1:4171-4186.

[6] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. In International Conference on Learning Representations, 2021.

Published

2025-03-25

How to Cite

Li, J., Ma, Y., Pan, J., & Xu, X. (2025). Transformer-Based 3D Object Detection. International Journal for Innovation Education and Research, 12(4), 1-5. https://doi.org/10.31686/ijier.vol12.iss4.4220
