Run your Java applications on Big Data Frameworks

Casper is a compiler that can automatically retarget sequential Java programs to Big Data processing frameworks such as Spark, Hadoop or Flink.



MapReduce is a popular programming paradigm for running large-scale, data-intensive computations. Recently, many frameworks that implement this paradigm have been developed. To leverage such frameworks, however, developers need to familiarize themselves with each framework’s API and rewrite their code. We present Casper, a new tool that automatically translates sequential Java programs to the MapReduce paradigm. Casper automatically identifies potential code fragments to rewrite and translates them in two steps. First, Casper uses program synthesis to search for a program summary (i.e., a functional specification) of each fragment. The summary is expressed in a high-level intermediate language resembling the MapReduce paradigm. Next, each summary found is verified to be semantically equivalent to the original fragment using a theorem prover. Casper then generates executable code from the summary, using either the Hadoop, Spark, or Flink API. We have evaluated Casper by automatically converting real-world sequential Java benchmarks to MapReduce. The resulting benchmarks perform up to 32.2x faster than the originals.
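To make the idea of retargeting a sequential fragment concrete, here is a minimal sketch of a word-count loop next to a map/reduce-style equivalent. It is not Casper's output and does not use Casper's intermediate language: the class name `WordCountSketch` and both method names are hypothetical, and Java's standard stream API stands in for a Hadoop, Spark, or Flink API purely for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical illustration only; not generated by Casper.
public class WordCountSketch {

    // Sequential version: the kind of loop a translator would take as input.
    static Map<String, Integer> countSequential(List<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) {
            counts.put(w, counts.getOrDefault(w, 0) + 1);
        }
        return counts;
    }

    // Map/reduce-style version: map each word to the pair (word, 1),
    // then reduce the values that share a key with addition.
    static Map<String, Integer> countMapReduce(List<String> words) {
        return words.stream()
                .collect(Collectors.toMap(w -> w, w -> 1, Integer::sum));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "b", "a", "c", "b", "a");
        // Both versions should produce the same counts for the same input.
        System.out.println(countSequential(words).equals(countMapReduce(words)));
    }
}
```

Comparing the two results on a handful of inputs, as `main` does, is only a spot check. The abstract describes a stronger guarantee, in which each summary is verified with a theorem prover to be semantically equivalent to the original fragment.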

Legacy Applications and Changing Demands

Data Application Results

Big Data Processing Frameworks can help!

Process Large Volumes of Data

Scalable and Fault Tolerant

Optimized Parallel Execution

Inertia in Adopting New Technology

Performance Comparison: Casper vs Manual (75GB, 10 Nodes)


[1]  Automatically Leveraging MapReduce Frameworks for Data-Intensive Applications
      Maaz Bin Safeer Ahmad and Alvin Cheung
      SIGMOD 2018 (To appear)

[2]  Optimizing Data-Intensive Applications Automatically By Leveraging Parallel Data Processing Frameworks
      Maaz Bin Safeer Ahmad and Alvin Cheung
      SIGMOD 2017 Demo
      Honorable Mention for Best Demo Award

[3]  Leveraging Parallel Data Processing Frameworks with Verified Lifting
      Maaz Bin Safeer Ahmad and Alvin Cheung
      SYNT 2016
      Best Student Paper Award