Run your Java applications on Big Data Frameworks

Casper is a compiler that can automatically retarget sequential Java programs to Big Data processing frameworks such as Spark, Hadoop or Flink.


Numerous large-scale data processing frameworks have been developed in recent years. To leverage them, however, developers must familiarize themselves with each framework’s API and painstakingly rewrite existing code. While compilers can be developed to automate such tasks, doing so is often tedious. We present a new tool called Casper that enables sequential programs written in a general-purpose language to leverage large-scale data processing frameworks completely automatically. Rather than relying on syntax-driven translation, Casper first converts the input code to a high-level specification that can then be translated easily to the target language. Unlike prior work, Casper uses a novel combination of formal verification and syntax-driven search to effectively prune the space of candidate specifications. In addition, it comes with a novel cost model and a runtime monitoring module that select the optimal implementation for the given program inputs. We have evaluated Casper by automatically retargeting sequential Java benchmarks to three popular large-scale data processing frameworks: Hadoop, Spark, and Flink. The retargeted benchmarks run 17.6× faster on average (and up to 32.2× faster) than the original sequential versions, and they are competitive with implementations written by domain experts.
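To make the idea of retargeting more concrete, here is a minimal, hypothetical sketch of the kind of equivalence involved. It is not Casper's actual input format or generated output. The class and method names (`WordCountSketch`, `sequential`, `mapReduce`) are made up for illustration, and Java's standard parallel streams stand in for a framework such as Spark, Hadoop, or Flink. The first method is an ordinary sequential loop; the second expresses the same computation as a map step (emit a count of 1 per word) followed by a reduce step (sum the counts per key).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class WordCountSketch {

    // Sequential version: a plain loop of the kind a developer would write by hand.
    public static Map<String, Integer> sequential(List<String> words) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String w : words) {
            counts.merge(w, 1, Integer::sum); // increment the running count for w
        }
        return counts;
    }

    // Map/reduce version (illustrative stand-in, not Casper output):
    //   map:    word -> (word, 1)
    //   reduce: sum the values that share a key
    public static Map<String, Integer> mapReduce(List<String> words) {
        return words.parallelStream()
                .collect(Collectors.toMap(w -> w, w -> 1, Integer::sum, TreeMap::new));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "b", "a", "c", "b", "a");
        System.out.println(sequential(words)); // prints {a=3, b=2, c=1}
        System.out.println(mapReduce(words));  // prints {a=3, b=2, c=1}
    }
}
```

Because the reduce step only uses an associative, commutative sum, the second form can be evaluated in parallel over partitions of the input, which is the property that makes a program in this shape a natural fit for distributed frameworks.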

Legacy Applications and Changing Demands

Data Application Results

Big Data Processing Frameworks can help!

Process Large Volumes of Data

Scalable and Fault Tolerant

Optimized Parallel Execution

Inertia to Adopting New Technology

Performance Comparison: Casper vs Manual (75GB, 10 Nodes)


[1]  Optimizing Data-Intensive Applications Automatically By Leveraging Parallel Data Processing Frameworks
      Maaz Bin Safeer Ahmad and Alvin Cheung
      SIGMOD 2017 Demo
      Honorable Mention for Best Demo Award

[2]  Leveraging Parallel Data Processing Frameworks with Verified Lifting
      Maaz Bin Safeer Ahmad and Alvin Cheung
      SYNT 2016 – Presentation Slides
      Best Student Paper Award