
How we saved a ton of effort on data pipeline code migration!

By Dev | Dec 02, 2021


We worked on a medical records analytics project that processed a monthly batch of data arriving as CCDA (Consolidated Clinical Document Architecture) files, which are very complex XML documents. You don't want to analyze these with traditional tools like Excel, because even parsing the file format takes a custom parser. We used many complex parsing techniques in Java and stored the cleansed, processed data in a good old Oracle database.
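The post doesn't show the original Java parsing code. As a rough illustration of what pulling structured fields out of a CCDA file involves, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. It assumes the standard HL7 v3 namespace (`urn:hl7-org:v3`) and reads only a few demographics from the `recordTarget/patientRole` header; `extract_patient` and the sample document are hypothetical, and real CCDA files carry many more sections (problems, medications, results) than this handles.

```python
import xml.etree.ElementTree as ET

# CCDA documents use the HL7 v3 default namespace.
NS = {"hl7": "urn:hl7-org:v3"}

def extract_patient(ccda_xml: str) -> dict:
    """Hypothetical helper: read a few demographic fields from a CCDA header."""
    root = ET.fromstring(ccda_xml)
    role = root.find("hl7:recordTarget/hl7:patientRole", NS)
    if role is None:
        return {}  # no patient header found
    pid = role.find("hl7:id", NS)
    patient = role.find("hl7:patient", NS)
    name = patient.find("hl7:name", NS) if patient is not None else None
    gender = patient.find("hl7:administrativeGenderCode", NS) if patient is not None else None
    birth = patient.find("hl7:birthTime", NS) if patient is not None else None
    return {
        "patient_id": pid.get("extension") if pid is not None else None,
        "given": name.findtext("hl7:given", "", NS).strip() if name is not None else "",
        "family": name.findtext("hl7:family", "", NS).strip() if name is not None else "",
        "gender": gender.get("code") if gender is not None else None,
        "birth_time": birth.get("value") if birth is not None else None,
    }

# A heavily trimmed, made-up sample document for illustration.
SAMPLE = """<ClinicalDocument xmlns="urn:hl7-org:v3">
  <recordTarget>
    <patientRole>
      <id root="2.16.840.1.113883.19.5" extension="12345"/>
      <patient>
        <name><given>Jane</given><family>Doe</family></name>
        <administrativeGenderCode code="F" codeSystem="2.16.840.1.113883.5.1"/>
        <birthTime value="19800101"/>
      </patient>
    </patientRole>
  </recordTarget>
</ClinicalDocument>"""

print(extract_patient(SAMPLE))
```

Multiply this by dozens of sections and 40 million files, and it becomes clear why the parsing layer was such a large part of the codebase.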

A typical monthly batch contained records for 40 million patients. That meant 40 million such files to process, and the batch took between 8 and 10 days to complete. Same story every month!

One fine day we had had enough of supporting this beast and decided to move to Spark for processing and HDFS for storage. The whole plan rested on developers efficiently and accurately converting the Java and PL/SQL code to Spark/Scala. "And don't forget to check that you get the same output, and obviously the batch would end within a day!" said the project manager.

Nevertheless, we started the code conversion drill and soon realized that the only way to do it was iteratively. We developers needed something to tell us whether every nudge and tweak we made was having the right impact on the data. That's why we built DataQ DataCompare.
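How DataQ DataCompare works internally isn't covered in this post. The general technique behind this kind of check, though, is a key-based diff between the legacy output and the new pipeline's output: rows present on only one side, plus rows whose columns disagree. Below is a minimal in-memory sketch; `compare`, `DiffReport`, and the `patient_id` key are hypothetical names for illustration, not DataCompare's API.

```python
from dataclasses import dataclass, field

@dataclass
class DiffReport:
    only_in_legacy: list = field(default_factory=list)  # keys missing from the new output
    only_in_new: list = field(default_factory=list)     # keys the new output added
    mismatched: dict = field(default_factory=dict)      # key -> {column: (legacy, new)}
    matched: int = 0                                     # rows identical on both sides

    @property
    def match_rate(self) -> float:
        total = self.matched + len(self.mismatched) + len(self.only_in_legacy) + len(self.only_in_new)
        return self.matched / total if total else 1.0

def compare(legacy_rows, new_rows, key):
    """Hypothetical key-based diff of two lists of row dicts."""
    legacy = {row[key]: row for row in legacy_rows}
    new = {row[key]: row for row in new_rows}
    report = DiffReport()
    report.only_in_legacy = sorted(legacy.keys() - new.keys())
    report.only_in_new = sorted(new.keys() - legacy.keys())
    for k in sorted(legacy.keys() & new.keys()):
        old_row, new_row = legacy[k], new[k]
        columns = sorted(set(old_row) | set(new_row))
        diffs = {c: (old_row.get(c), new_row.get(c)) for c in columns if old_row.get(c) != new_row.get(c)}
        if diffs:
            report.mismatched[k] = diffs
        else:
            report.matched += 1
    return report

legacy_out = [
    {"patient_id": 1, "age": 40, "zip": "10001"},
    {"patient_id": 2, "age": 35, "zip": "94105"},
    {"patient_id": 3, "age": 50, "zip": "60601"},
]
new_out = [
    {"patient_id": 1, "age": 40, "zip": "10001"},
    {"patient_id": 2, "age": 36, "zip": "94105"},
    {"patient_id": 4, "age": 28, "zip": "73301"},
]
report = compare(legacy_out, new_out, "patient_id")
print(report, report.match_rate)
```

At 40 million records you would not do this in memory; the same logic maps onto a full outer join on the key column in Spark, followed by a column-by-column comparison of the joined rows.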

After every code change, we used the data difference report to see how far we had progressed toward converting all the pipelines. The scrum master and project manager had actual numbers to report without having to ask us for them.
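One way such numbers can be produced is by rolling each pipeline's latest diff result up into a single progress figure. The sketch below assumes a pipeline counts as "converted" once its match rate reaches a threshold (100% by default); `conversion_progress` and the pipeline names are hypothetical, and the post does not say how DataCompare actually summarizes its reports.

```python
from typing import Dict, Tuple

def conversion_progress(results: Dict[str, Tuple[int, int]], threshold: float = 1.0) -> dict:
    # results: pipeline name -> (rows matching the legacy output, total rows compared)
    rates = {name: (matched / total if total else 1.0) for name, (matched, total) in results.items()}
    # A pipeline counts as converted once its match rate reaches the threshold.
    converted = sorted(name for name, rate in rates.items() if rate >= threshold)
    pending = sorted(name for name in rates if name not in converted)
    percent = round(100 * len(converted) / len(rates), 1) if rates else 0.0
    return {"converted": converted, "pending": pending, "percent_converted": percent}

# Made-up results from a hypothetical latest diff run.
latest_run = {
    "demographics": (1000, 1000),
    "encounters": (990, 1000),
    "lab_results": (500, 1000),
    "medications": (2000, 2000),
}
summary = conversion_progress(latest_run)
print(summary)
```

A strict 100% threshold fits the project manager's "same output" requirement; a looser threshold would only make sense for columns where small, explained differences are acceptable.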

In the end, we converted the entire codebase and released it to production within four months, and guess what: the same monthly batch now takes just over 7 hours to complete. It made for a great Christmas party after a feat like that.
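To put those numbers in perspective, here is a quick back-of-the-envelope calculation. It assumes a constant processing rate over the whole run and treats 7 hours as the new runtime; since the batch takes "just over" 7 hours, the real speedup is slightly lower than what this prints.

```python
FILES_PER_BATCH = 40_000_000  # one file per patient, per the post

def throughput(files: int, hours: float) -> float:
    """Files processed per second for a batch that runs for `hours`."""
    return files / (hours * 3600)

legacy_best = throughput(FILES_PER_BATCH, 8 * 24)    # 8-day legacy run
legacy_worst = throughput(FILES_PER_BATCH, 10 * 24)  # 10-day legacy run
spark_run = throughput(FILES_PER_BATCH, 7)           # "just over 7 hours", so an upper bound

print(f"legacy: {legacy_worst:.1f} to {legacy_best:.1f} files/s")
print(f"spark: ~{spark_run:.1f} files/s")
print(f"speedup: ~{spark_run / legacy_best:.1f}x to ~{spark_run / legacy_worst:.1f}x")
```

In other words, the migration took throughput from roughly 46 to 58 files per second to roughly 1,587 files per second, a speedup of about 27x to 34x.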

Before I forget to mention it: DataQ Inc was born!
