Examine Heuristic Data Lake Management Using AWS: A Big Data Handling Approach
Abstract
In this era of technology, data is arguably the most valued asset, and its value grows as its volume grows. The need to store and manipulate data to meet particular goals or business requirements is also growing, and storing that data has become a complex and tedious task. Advanced technologies such as Hadoop have simplified data storage, but the rapid development and heavy use of AI and ML mean that very large amounts of data are now collected. The essential need is therefore a more cost-effective storage alternative. This paper presents a solution for storing data in the cloud, with numerous benefits over traditional storage methods, by developing a data lake on AWS using a Cost Effective Data Lake Management Algorithm (CEDLMA). The data lake's functions include managing and storing sorted as well as unsorted data and gathering analytics from the data lake according to business requirements. The proposed work is evaluated with the AWS IAM and S3 services.
Article Details
This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.