
How we saved a ton of effort on data pipeline code migration!

By Admin | Dec 02, 2021


We worked on a Medical Records Analytics project that processed a monthly batch of data arriving as CCDA files, which are very complex XML documents. You don’t want to analyze these in a traditional tool like Excel, because even parsing the file format takes a proprietary parser. We used many complex parsing techniques in Java and stored the cleansed, processed data in a good old Oracle database.

A typical monthly batch held the records of 40 million patients. That meant 40 million such files to process, and our batch would take between 8 and 10 days to complete. Same story every month!

One fine day we were done with supporting this beast and decided to make way for Spark processing and HDFS storage. The whole idea rested on developers being able to efficiently and accurately convert the Java and PL/SQL code to Spark/Scala code. “And don’t forget to check that you get the same output, and obviously the batch should finish within a day!” said the Project Manager.

Nevertheless, we started the code conversion drill and soon realized that the only way to do it was iteratively. We developers needed something to tell us that every nudge and tweak was having the right effect on the data. That’s why we built DataQ DataCompare.

After every code change, we used the data difference report to see how far we had marched towards converting all the pipelines. The scrum master and project manager had actual numbers to report without having to ask us for help.
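To make the idea of a data difference report concrete, here is a minimal sketch in Python. It is not how DataQ DataCompare works internally. The names `DiffReport`, `compare_outputs`, and the `patient_id` key are hypothetical, chosen only for illustration. It assumes both pipelines emit rows that can be keyed by a unique identifier and that both outputs fit in memory, which would not hold for a real 40-million-record batch.

```python
from dataclasses import dataclass, field


@dataclass
class DiffReport:
    """Summary of how the new pipeline's output differs from the legacy output."""
    matched: int = 0                                     # rows identical in both outputs
    missing_in_new: list = field(default_factory=list)   # keys only in the legacy output
    extra_in_new: list = field(default_factory=list)     # keys only in the new output
    mismatched: dict = field(default_factory=dict)       # key -> {column: (legacy, new)}

    @property
    def match_rate(self) -> float:
        """Fraction of legacy rows reproduced exactly by the new pipeline."""
        total = self.matched + len(self.missing_in_new) + len(self.mismatched)
        return self.matched / total if total else 1.0


def compare_outputs(legacy_rows, new_rows, key):
    """Compare two lists of dict rows, joined on the column named by `key`."""
    legacy = {row[key]: row for row in legacy_rows}
    new = {row[key]: row for row in new_rows}
    report = DiffReport()

    for k, old_row in legacy.items():
        if k not in new:
            report.missing_in_new.append(k)
            continue
        new_row = new[k]
        # Collect every column whose value differs between the two outputs.
        diffs = {
            col: (old_row.get(col), new_row.get(col))
            for col in old_row.keys() | new_row.keys()
            if old_row.get(col) != new_row.get(col)
        }
        if diffs:
            report.mismatched[k] = diffs
        else:
            report.matched += 1

    report.extra_in_new = [k for k in new if k not in legacy]
    return report


if __name__ == "__main__":
    # Made-up sample rows, purely for illustration.
    legacy = [{"patient_id": 1, "dx": "A"}, {"patient_id": 2, "dx": "B"}]
    new = [{"patient_id": 1, "dx": "A"}, {"patient_id": 2, "dx": "X"}]
    report = compare_outputs(legacy, new, key="patient_id")
    print(f"match rate: {report.match_rate:.0%}, mismatched: {report.mismatched}")
```

Running a comparison like this after each change gives a single match rate to track from one iteration to the next, which is the kind of number a project manager can report without asking the developers.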

At the end of it all, we converted the whole codebase and released it to production within four months. And guess what: the same monthly batch now takes just over 7 hours to complete. It made for a great Christmas party after a feat like that.

Before I forget to tell you, DataQ Inc was born!




