Optimizing Data Science Applications using Static Analysis

Singh, Bhushan Pal; Sahu, Mudra; Sudarshan, S. (2021) Optimizing Data Science Applications using Static Analysis. In: 18th International Symposium on Database Programming Languages.

Full text not available from this repository.

Official URL: http://doi.org/10.1145/3475726.3475729

Abstract

Data science applications are often coded in Python, using Pandas and similar APIs. Pandas requires data to be in memory, so when these applications are run on larger datasets they may run out of memory or suffer from poor performance. We describe the SCIRPy system, which optimizes such applications through source-to-source transformations, using static analysis and transformation rules. SCIRPy implements a number of optimizations, such as data selection, drop column removal, multistage data fetch, and efficient data representation based on metadata analysis. The application source code is transformed into a custom-built intermediate representation (IR), on which these optimizations are performed. The optimized IR is then transformed back to Python source. Our experiments show that our approach reduces the memory footprint and execution time of a number of data science applications.
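The abstract does not describe SCIRPy's IR or its transformation rules, so the following is only a hypothetical, much-simplified sketch of what one listed optimization, data selection (here, column pruning), can look like as a source-to-source rewrite. It works directly on Python's standard `ast` module (Python 3.9+ for `ast.unparse`) rather than a custom IR. It assumes a single DataFrame variable (`df`) loaded with `pd.read_csv`, accessed only through string-literal column subscripts, and it rewrites the load to pass pandas' `usecols` parameter. The function and class names `used_columns`, `AddUsecols`, and `prune_columns` are illustrative inventions, not the paper's code.

```python
import ast


def used_columns(tree: ast.AST, var: str):
    """Return the set of column names read from DataFrame `var`, or None when
    some use of `var` is too complex for this toy analysis to prove safe."""
    cols, subscript_bases = set(), set()
    for node in ast.walk(tree):
        if (isinstance(node, ast.Subscript) and isinstance(node.value, ast.Name)
                and node.value.id == var):
            if not isinstance(node.ctx, ast.Load):
                return None                              # column writes/deletes: give up
            key = node.slice                             # Python 3.9+: no ast.Index wrapper
            if isinstance(key, ast.Constant) and isinstance(key.value, str):
                cols.add(key.value)                      # df["col"]
            elif isinstance(key, ast.List) and all(
                    isinstance(e, ast.Constant) and isinstance(e.value, str)
                    for e in key.elts):
                cols.update(e.value for e in key.elts)   # df[["a", "b"]]
            elif not isinstance(key, ast.Compare):
                return None                              # unknown key, e.g. df[name]
            # A Compare key such as df[df["x"] > 1] is treated as a row filter;
            # the columns it reads are collected when ast.walk reaches them.
            subscript_bases.add(id(node.value))
    for node in ast.walk(tree):
        if (isinstance(node, ast.Name) and node.id == var
                and isinstance(node.ctx, ast.Load)
                and id(node) not in subscript_bases):
            return None                                  # e.g. df.columns or f(df)
    return cols


class AddUsecols(ast.NodeTransformer):
    """Add usecols=[...] to assignments of the form `var = <mod>.read_csv(...)`."""

    def __init__(self, var: str, cols):
        self.var, self.cols = var, sorted(cols)

    def visit_Assign(self, node: ast.Assign) -> ast.Assign:
        call = node.value
        if (len(node.targets) == 1 and isinstance(node.targets[0], ast.Name)
                and node.targets[0].id == self.var
                and isinstance(call, ast.Call)
                and isinstance(call.func, ast.Attribute)
                and call.func.attr == "read_csv"
                and not any(k.arg == "usecols" for k in call.keywords)):
            call.keywords.append(ast.keyword(
                arg="usecols",
                value=ast.List(elts=[ast.Constant(value=c) for c in self.cols],
                               ctx=ast.Load())))
        return node


def prune_columns(source: str, var: str = "df") -> str:
    """Hypothetical data-selection pass: read only the columns the script uses."""
    tree = ast.parse(source)
    cols = used_columns(tree, var)
    if not cols:                                         # None (unsafe) or nothing used
        return source
    tree = AddUsecols(var, cols).visit(tree)
    return ast.unparse(ast.fix_missing_locations(tree))


before = '''
import pandas as pd
df = pd.read_csv("trips.csv")
df = df[df["fare"] > 10]
print(df["city"].value_counts())
'''
print(prune_columns(before))
# the read_csv call now carries usecols=['city', 'fare']
```

The analysis is deliberately conservative: any use of `df` it cannot classify (attribute access such as `df.columns`, passing `df` to a function, assigning a new column) makes it return the source unchanged, because reading fewer columns is only safe when every column the program touches is known in advance.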

Item Type: Conference or Workshop Item (Paper)
Source: Copyright of this article belongs to the Association for Computing Machinery.
ID Code: 128446
Deposited On: 21 Oct 2022 06:19
Last Modified: 14 Nov 2022 12:28
