
Query-based denormalization using hypergraph (QBDNH): a schema transformation model for migrating relational to NoSQL databases

  • Regular Paper
  • Published in: Knowledge and Information Systems

Abstract

With the emergence of NoSQL databases, many large applications have migrated away from relational databases (RDB) to gain the flexibility and performance that NoSQL offers. Migrating from an RDB to a NoSQL database involves both schema transformation and data migration, neither of which is straightforward. The challenge is that an RDB stores data in normalized form, whereas NoSQL databases favor denormalization. To address schema transformation, this paper proposes a model called query-based denormalization using hypergraph (QBDNH) for migrating from an RDB to a NoSQL database. The model takes the existing relational tables and queries as input and transforms them into a denormalized NoSQL model using hypergraphs. The approach overcomes limitations of existing graph-based denormalization techniques, such as the representation of complex relationships and the coverage of data access patterns. The proposed model reduces the overall time, cost, and effort compared with transforming the schema manually. To validate the effectiveness of QBDNH, experiments were conducted on the TPC-H dataset, and its performance was compared with that of existing graph-based denormalization models: TLD, CLDA, and Kuszera. The evaluation was carried out in two parts: the first analyzed the query speedup factor, and the second measured efficiency improvement based on query pipeline execution. The results show that QBDNH improved query performance, with speedup factors of 1.29, 1.35, and 1.40 over the TLD, CLDA, and Kuszera models, respectively. Furthermore, QBDNH significantly enhanced pipeline utilization compared to TLD and Kuszera.
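To make the hypergraph idea concrete, the following is a minimal sketch and not the QBDNH algorithm itself, which is not reproduced on this page. It assumes that each relational table is a vertex and that each query contributes one hyperedge containing the tables it accesses. The query labels, the table sets, and the helper names (`build_hypergraph`, `incidence`, `candidate_groups`) are hypothetical; only the TPC-H table names come from the benchmark.

```python
from collections import defaultdict

# Hypothetical workload: each query is reduced to the set of tables it touches.
# Table names follow TPC-H; the query labels and table sets are illustrative only.
QUERIES = {
    "qA": {"customer", "orders", "lineitem"},
    "qB": {"orders", "lineitem"},
    "qC": {"supplier", "nation"},
}


def build_hypergraph(queries):
    """Return (vertices, hyperedges): tables are vertices, each query is one hyperedge."""
    vertices = set().union(*queries.values())
    hyperedges = {name: frozenset(tables) for name, tables in queries.items()}
    return vertices, hyperedges


def incidence(hyperedges):
    """Map each table to the queries whose hyperedge contains it (in sorted query order)."""
    table_to_queries = defaultdict(list)
    for name in sorted(hyperedges):
        for table in hyperedges[name]:
            table_to_queries[table].append(name)
    return dict(table_to_queries)


def candidate_groups(hyperedges):
    """Keep only maximal hyperedges, i.e. table sets not strictly contained in another.

    Assumption for illustration: a maximal set of co-accessed tables is a
    candidate for one denormalized collection.
    """
    edges = set(hyperedges.values())
    maximal = [e for e in edges if not any(e < other for other in edges)]
    return sorted(sorted(e) for e in maximal)


if __name__ == "__main__":
    vertices, hyperedges = build_hypergraph(QUERIES)
    print(sorted(vertices))
    print(incidence(hyperedges))
    print(candidate_groups(hyperedges))
```

Unlike an ordinary graph edge, a hyperedge can join more than two tables at once, which is why a single three-table query maps to one edge here rather than to several pairwise edges.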


Notes

  1. https://www.mongodb.com/.

  2. https://db-engines.com/en/ranking.

  3. https://www.mongodb.com/atlas/database.

  4. https://studio3t.com/.

References

  1. Atzeni P, Jensen CS, Orsi G et al (2013) The relational model is dead, SQL is dead, and I don’t feel so good myself. SIGMOD Record 42:64–68. https://doi.org/10.1145/2503792.2503808

  2. Stonebraker M (2010) SQL databases v NoSQL databases. Commun ACM 53:10–11. https://doi.org/10.1145/1721654.1721659

  3. Masataka H, Yutaka W (2022) Making software based on human-driven design case study: SQL for non-experts. Proceedings—2022 IEEE 15th international symposium on embedded multicore/many-core systems-on-chip, MCSoC 2022:264–270. https://doi.org/10.1109/MCSoC57363.2022.00049

  4. Floratou A, Teletia N, DeWitt DJ et al (2012) Can the elephants handle the NoSQL onslaught? In: Proceedings of the VLDB endowment 5:1712–1723. https://doi.org/10.14778/2367502.2367511

  5. Cattell R (2010) Scalable SQL and NoSQL data stores. SIGMOD Record 39:12–27. https://doi.org/10.1145/1978915.1978919

  6. Ali D, Liu C, Mengchi L (2018) A survey on NoSQL stores. ACM Comput Surv (CSUR) 51. https://doi.org/10.1145/3158661

  7. Stonebraker M, Abadi DJ, Batkin A et al (2005) C-Store: A column-oriented DBMS. In: VLDB 2005—Proceedings of 31st international conference on very large data bases 2:553–564. https://doi.org/10.1145/3226595.3226638

  8. Störl U, Klettke M, Scherzinger S (2020) NoSQL schema evolution and data migration: State-of-the-art and opportunities. Adv Database Technol EDBT 2020-March, pp 655–658. https://doi.org/10.5441/002/edbt.2020.87

  9. Lee T, Chams M, Nado R et al (2001) System for detecting migration differences of a customized database schema. Google Patents 17:552–560

  10. Wang Y, Shah R, Criswell A et al (2020) Data migration using datalog program synthesis. In: Proceedings of the VLDB endowment 13:1006–1019. https://doi.org/10.14778/3384345.3384350

  11. Gómez P, Casallas R, Roncancio C (2016) Data schema does matter, even in NoSQL systems! In: Proceedings—international conference on research challenges in information science 2016-August, pp 1–6. https://doi.org/10.1109/RCIS.2016.7549340

  12. Kaur K, Rani R (2013) Modeling and querying data in NoSQL databases. In: Proceedings—2013 IEEE international conference on big data, big data 2013, pp 1–7. https://doi.org/10.1109/BigData.2013.6691765

  13. Kuszera EM, Peres LM, Didonet Del Fabro M (2022) Exploring data structure alternatives in the RDB to NoSQL document store conversion process. Inf Syst 105:101941. https://doi.org/10.1016/j.is.2021.101941

  14. Karnitis G, Arnicans G (2015) Migration of relational database to document-oriented database: structure denormalization and data transformation. In: Proceedings—7th International Conference on Computational Intelligence, Communication Systems and Networks, CICSyN, pp 113–118. https://doi.org/10.1109/CICSYN.2015.30

  15. Yoo J, Lee KH, Jeon YH (2018) Migration from RDBMS to NoSQL using column-level denormalization and atomic aggregates. J Inf Sci Eng 34:243–259. https://doi.org/10.6688/JISE.2018.34.1.15

  16. Chebotko A, Kashlev A, Lu S (2015) A big data modeling methodology for Apache Cassandra. Proceedings—2015 IEEE international congress on big data, bigdata congress 2015:238–245. https://doi.org/10.1109/BigDataCongress.2015.41

  17. Hewasinghage M, Abelló A, Varga J, Zimányi E (2020) DocDesign: cost-based database design for document stores. In: 32nd International conference on scientific and statistical database management (SSDBM), ACM, pp 1–4. https://doi.org/10.1145/3400903.3401689

  18. Hewasinghage M, Abelló A, Varga J, Zimányi E (2021) A cost model for random access queries in document stores. VLDB J 30:559–578. https://doi.org/10.1007/s00778-021-00660-x

  19. Wolf MM, Klinvex AM, Dunlavy DM (2016) Advantages to modeling relational data using hypergraphs versus graphs. In: 2016 IEEE high performance extreme computing conference, HPEC 2016 0–6. https://doi.org/10.1109/HPEC.2016.7761624

  20. TPC-H benchmark. http://www.tpc.org/tpch/

  21. (2016) A MongoDB White Paper RDBMS to MongoDB Migration Guide (White paper). MongoDB White Paper

  22. Whang JJ, Du R, Jung S et al (2020) MEGA: Multi-view semi-supervised clustering of hypergraphs. In: Proceedings of the VLDB endowment 13:698–711. https://doi.org/10.14778/3377369.3377378

  23. Lee G, Ko J, Shin K (2020) Hypergraph motifs: concepts, algorithms, and discoveries. In: Proceedings of the VLDB endowment 13:2256–2269. https://doi.org/10.14778/3407790.3407823

  24. Ghaleb FFM, Taha AA, Hazman M et al (2020) RDF-BF-Hypergraph representation for relational database. Int J Math Comput Sci 15:41–64

  25. Hewasinghage M, Abelló A, Varga J, Zimányi E (2021) Managing polyglot systems metadata with hypergraphs. Data Knowl Eng 134:101896. https://doi.org/10.1016/j.datak.2021.101896

  26. Mok WY, Embley DW (2006) Generating compact redundancy-free XML documents from conceptual-model hypergraphs. IEEE Trans Knowl Data Eng 18:1082–1096. https://doi.org/10.1109/TKDE.2006.125

  27. Vera-Olivera H, Guo R, Huacarpuma RC et al (2021) Data Modeling and NoSQL Databases-A Systematic Mapping Review. ACM Comput Surv 54. https://doi.org/10.1145/3457608

  28. Shin SK, Sanders GL (2006) Denormalization strategies for data retrieval from data warehouses. Decis Support Syst 42:267–282. https://doi.org/10.1016/j.dss.2004.12.004

  29. Imam AA, Basri S, Ahmad R et al (2018) Automatic schema suggestion model for NoSQL document-stores databases. Journal of Big Data 5:1–17. https://doi.org/10.1186/s40537-018-0156-1

  30. Imam AA, Basri S, Ahmad R, González-Aparicio MT (2019) Schema proposition model for NoSQL applications. Adv Intell Syst Comput 843:30–39. https://doi.org/10.1007/978-3-319-99007-1_3

  31. Ceresnak R, Dudas A, Matiasko K, Kvet M (2021) Mapping rules for schema transformation: SQL to NoSQL and back. In: International conference on information and digital technologies 2021, IDT 2021 52–58. https://doi.org/10.1109/IDT52577.2021.9497629

  32. Ramzan S, Bajwa IS, Ramzan B, Anwar W (2019) Intelligent data engineering for migration to NoSQL based secure environments. IEEE Access 7:69042–69057. https://doi.org/10.1109/ACCESS.2019.2916912

  33. Serrano D, Han D, Stroulia E (2015) From relations to multi-dimensional maps: towards an SQL-to-HBase transformation methodology. In: Proceedings—2015 IEEE 8th international conference on cloud computing, CLOUD 2015 81–89. https://doi.org/10.1109/CLOUD.2015.21

  34. Shichkina Y, Ha VM (2020) Method for creating collections with embedded documents for document-oriented databases taking into account executable queries. In: SPIIRAS proceedings 19:829–854. https://doi.org/10.15622/sp.2020.19.4.5

  35. Li C (2010) Transforming relational database into HBase: a case study. In: Proceedings 2010 IEEE international conference on software engineering and service sciences, ICSESS 2010, pp 683–687. https://doi.org/10.1109/ICSESS.2010.5552465

  36. Lee CH, Zheng YL (2016) SQL-To-NoSQL Schema Denormalization and Migration: A Study on Content Management Systems. Proceedings - 2015 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2015 2022–2026. https://doi.org/10.1109/SMC.2015.353

  37. Zhao G, Lin Q, Li L, Li Z (2014) Schema conversion model of SQL database to NoSQL. In: Proceedings—2014 9th international conference on P2P, parallel, grid, cloud and internet computing, 3PGCIC 2014 355–362. https://doi.org/10.1109/3PGCIC.2014.137

  38. Ko HKE, Lee YJK (2020) Techniques and guidelines for effective migration from RDBMS to NoSQL. J Supercomput 76:7936–7950. https://doi.org/10.1007/s11227-018-2361-2

  39. Jia T, Zhao X, Wang Z et al (2016) Model transformation and data migration from relational database to MongoDB. In: Proceedings—2016 IEEE international congress on big data, bigdata congress, pp 60–67. https://doi.org/10.1109/BIGDATACONGRESS.2016.16

  40. Mior MJ, Salem K, Aboulnaga A, Liu R (2017) NoSE: Schema design for NoSQL applications. IEEE Trans Knowl Data Eng 29:2275–2289. https://doi.org/10.1109/TKDE.2017.2722412

  41. Imam AA, Basri S, Ahmad R et al (2018) Data modeling guidelines for NoSQL document-store databases. Int J Adv Comput Sci Appl 9:544–555. https://doi.org/10.14569/IJACSA.2018.091066

  42. The Professional Client, IDE and GUI for MongoDB | Studio 3T. https://studio3t.com/. Accessed 8 Jun 2023

  43. Fleming PJ, Wallace JJ (1986) How not to lie with statistics: The correct way to summarize benchmark results. Commun ACM 29:218–221. https://doi.org/10.1145/5666.5673

  44. Dreseler M, Boissier M, Rabl T, Uflacker M (2020) Quantifying TPC-H choke points and their optimizations. In: Proceedings of the VLDB endowment 13:1206–1220. https://doi.org/10.14778/3389133.3389138

  45. Henry OB (2019) MongoDB aggregation stages and pipelining. White paper, pp 1–38

Funding

Not applicable.

Author information

Contributions

Neha Bansal was involved in writing the original draft, reviewing and editing, conceptualization, methodology, programming, and validation. Shelly Sachdeva helped in supervision, validation, and reviewing and editing. Lalit K. Awasthi contributed to supervision, validation, and reviewing and editing.

Corresponding author

Correspondence to Shelly Sachdeva.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

This section lists the names of the pipeline stages (PS) and the total count (TC) of pipeline stages used for each TPC-H query in the TLD, CLDA, Kuszera, and QBDNH models.

See Table 14.
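As an illustration of how PS and TC can be read off a query, the sketch below treats a MongoDB-style aggregation pipeline as a list of single-key stage documents and reports the stage names and their total count. The pipeline, its field names, and the helpers `stage_names` and `stage_summary` are hypothetical and are not taken from Table 14.

```python
from collections import Counter

# Hypothetical aggregation pipeline over a denormalized "orders" document that
# embeds its line items; field names are illustrative only.
pipeline = [
    {"$unwind": "$lineitems"},
    {"$match": {"lineitems.l_shipdate": {"$lte": "1998-09-02"}}},
    {"$group": {"_id": "$lineitems.l_returnflag",
                "total_qty": {"$sum": "$lineitems.l_quantity"}}},
    {"$sort": {"_id": 1}},
]


def stage_names(pipeline):
    """Each stage is a document with exactly one key; that key is the stage name."""
    return [next(iter(stage)) for stage in pipeline]


def stage_summary(pipeline):
    """Return the stage names (PS), the total count (TC), and a per-stage tally."""
    names = stage_names(pipeline)
    return {"PS": names, "TC": len(names), "per_stage": dict(Counter(names))}


if __name__ == "__main__":
    print(stage_summary(pipeline))
```

Under this reading, a schema that avoids join-like stages for a given query yields a shorter pipeline and therefore a lower TC for that query.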

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Bansal, N., Sachdeva, S. & Awasthi, L.K. Query-based denormalization using hypergraph (QBDNH): a schema transformation model for migrating relational to NoSQL databases. Knowl Inf Syst 66, 681–722 (2024). https://doi.org/10.1007/s10115-023-02017-y

  • DOI: https://doi.org/10.1007/s10115-023-02017-y
