Examine Heuristic Data Lake Management Using AWS: A Big Data Handling Approach

Main Article Content

Wasudeo Purushottam Rahane, Pramod D. Patil, Rajesh D. Bharti


In this era of technology, data is arguably the most valued asset, and its value grows as its volume grows. The demands for storing and manipulating data to meet particular goals or business requirements keep increasing, and storing that data has become a complex and tedious task. Technologies such as Hadoop have simplified data storage, but the rapid development and heavy use of AI and ML mean that enormous amounts of data are now collected. The central need is to find a more cost-effective storage alternative. This paper presents a solution for storing data in the cloud that offers numerous benefits over traditional data storage methods: a data lake built on AWS using a Cost Effective Data Lake Management Algorithm (CEDLMA). The data lake manages and stores sorted as well as unsorted data and supports gathering analytics from it according to business requirements. The proposed work is evaluated with AWS’s IAM and S3 services.
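As a rough illustration of how S3 and IAM can fit together in a data lake of this kind, the sketch below lays out object keys under separate prefixes for unsorted and sorted data and builds an IAM policy document that grants read-only access to one prefix. It is not the authors' CEDLMA, which the abstract does not describe in detail. The zone prefixes (`raw/`, `curated/`), the bucket name `example-data-lake`, and both function names are hypothetical and chosen only for this example.

```python
import json

# Hypothetical zone prefixes; the abstract does not specify a bucket layout.
ZONES = {"unsorted": "raw/", "sorted": "curated/"}


def object_key(zone: str, dataset: str, filename: str) -> str:
    """Build an S3 object key that places a file under its zone prefix."""
    if zone not in ZONES:
        raise ValueError(f"unknown zone: {zone}")
    return f"{ZONES[zone]}{dataset}/{filename}"


def read_only_policy(bucket: str, zone: str) -> dict:
    """Return an IAM policy document granting read access to one zone."""
    if zone not in ZONES:
        raise ValueError(f"unknown zone: {zone}")
    prefix = ZONES[zone]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Allow listing only the keys that fall under the zone prefix.
                "Sid": "ListZone",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}*"]}},
            },
            {
                # Allow reading objects stored under the zone prefix.
                "Sid": "ReadZoneObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}*"],
            },
        ],
    }


if __name__ == "__main__":
    print(object_key("unsorted", "weblogs", "2024-01-01.log"))
    print(json.dumps(read_only_policy("example-data-lake", "sorted"), indent=2))
```

The policy document is plain JSON, so it could be attached to an IAM user or role through the console or an SDK; the sketch stops short of making any AWS calls so that it runs without credentials.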

Article Details

Author Biography

Wasudeo Purushottam Rahane [1], Pramod D. Patil [2], Rajesh D. Bharti [3]

[1] Research Scholar, Dr. D. Y. Patil Institute of Technology, Savitribai Phule Pune University, Pune, India

[2] Professor, Dr. D. Y. Patil Institute of Technology, Savitribai Phule Pune University, Pune, India

[3] Professor, Dr. D. Y. Patil Institute of Technology, Savitribai Phule Pune University, Pune, India



