Work Carried Out
As a part of the client’s team, Montrose Software was tasked with analysis and improvement of a large and complex ETL system. After analyzing the system, we grouped the ETL steps based on their nature, and unified each of the groups by replacing dozens of stand alone scripts and queries with one common metadata driven tool. This, along with SQL schema and queries optimization, improved the overall system performance by over 40%. To ensure results integrity, we wrote an automated regression tool, and made it a part of the nightly runs.
After successful ETL process improvement, Montrose Software’s team was asked to examine an existing bucketing tool driven by a flat file metadata structure. This tool was initially designed to be human-friendly, but over years of development, the metadata structure evolved to a custom interpreted programming language, allowing sub-file inclusion, parameters override, logical conditions, and much more. A large amount of flexibility, with a lack of constraints, made this metadata hard to work with, and vulnerable to human errors. Complex data dependencies, and a significant number of files, required an automated process to process and compare the full data structure. To accomplish that, we extracted the custom metadata language interpreter from the bucketing tool, and loaded all parsed data to a human-readable structure. By doing this, we found all configuration inconsistencies and duplications. With this knowledge, we designed and implemented a new system which enforced data integrity.