Abstract
Classifying document images is an essential stage in any digital application that works with them, such as retrieval. Conventionally, the input dataset consists of the full, uncompressed document images, which is a burden because of the large storage volume they require. It would therefore be beneficial to accomplish the same classification task directly on the compressed representation of the documents, with only partial decompression, so that the whole process becomes computationally more efficient. In this research work, a novel deep learning model, DWT-CompCNN, is proposed for classifying documents compressed with the High Throughput JPEG 2000 (HTJ2K) algorithm. The proposed DWT-CompCNN comprises five convolutional layers with filter sizes of 16, 32, 64, 128, and 256, increasing with each successive layer, to improve learning from the wavelet coefficients extracted from the compressed images. Experiments on two benchmark datasets, Tobacco-3482 and RVL-CDIP, demonstrate that the proposed model is time and space efficient and also achieves better classification accuracy in the compressed domain.
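To give a rough sense of how small a five-layer stack of this kind can be, the sketch below counts the trainable parameters of the convolutional layers. It reads the 16, 32, 64, 128, 256 progression as the number of filters per layer, and it assumes 3x3 kernels, one bias per filter, and a single-channel input of wavelet coefficients. None of these assumptions is stated in the abstract, and the function name is hypothetical, so the figures are illustrative only and are not the authors' reported model size.

```python
# Illustrative only: the filter progression comes from the abstract,
# read here as filters per layer (an assumption).
FILTERS = [16, 32, 64, 128, 256]


def conv_param_counts(in_channels, filters, kernel=3):
    """Return the trainable parameter count of each stacked conv layer.

    Assumes square kernels (3x3 by default) and one bias per filter;
    both are assumptions, not details taken from the paper.
    """
    counts = []
    for out_channels in filters:
        weights = kernel * kernel * in_channels * out_channels
        counts.append(weights + out_channels)  # weights plus biases
        in_channels = out_channels  # next layer consumes this layer's output
    return counts


if __name__ == "__main__":
    per_layer = conv_param_counts(1, FILTERS)
    for index, count in enumerate(per_layer, start=1):
        print(f"conv layer {index}: {count} parameters")
    print(f"total: {sum(per_layer)} parameters")
```

Under these assumptions the last layer dominates the total, which is typical when the filter count doubles at every layer.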
Availability of data and materials
Data set and materials will be made available on request.
Funding
No funding was received for conducting this study.
Ethics declarations
Conflict of interest
The authors have no relevant financial or nonfinancial interests to disclose.
Ethical approval
Not applicable.
Consent to participate
Not applicable.
Consent for publication
All authors have agreed with the content and give explicit consent to submit the work. All authors have obtained consent from the responsible authorities at the institute/organization where the work has been carried out.
Ethical conduct
The submitted work is original and has not been published or submitted elsewhere in any form or language.
About this article
Cite this article
Bisen, T., Javed, M., Kirtania, S. et al. DWT-CompCNN: deep image classification network for high throughput JPEG 2000 compressed documents. Pattern Anal Applic 26, 1641–1655 (2023). https://doi.org/10.1007/s10044-023-01190-8
DOI: https://doi.org/10.1007/s10044-023-01190-8