Migration and Collection of Data

Collecting and moving large volumes of data is difficult in any big data ecosystem. When data play a crucial role in logistical planning and analysis, having the right data at the right time can mean the difference between a successful campaign and a failed one. One example is tracking and reporting the vaccinations given to members of our armed forces, a task made more sensitive because it involves confidential health workload data. With the recent rise in epidemic- and pandemic-causing illnesses and viral strains, accurate and effective immunization tracking and reporting are increasingly important. Protecting service members from these and other infectious diseases helps the military maintain operational efficiency and reduces the likelihood that members become disease carriers at home or abroad.

The Defense Health Agency (DHA) needed to quickly replace the outdated, dispersed immunization tracking and reporting systems for military personnel and their families with a modern, centralized system accessible to members of all branches of the armed forces. To support that effort, we used the Multiplatform Data Acquisition, Collection, and Analytics (MDACA) Data Flow (“Data Flow”), running on Amazon Web Services (AWS) GovCloud, to manage the collection, migration, and centralization of immunization records from all military branches into shared data repositories and enterprise information systems. Data Flow is a directed-graph engine with hundreds of ready-made components for moving data between systems using the most prevalent protocols, schemas, and data formats, as shown in Figure 1. Thanks to its building-blocks approach, we modeled and deployed a functioning system in a fraction of the time it would have taken to design and code the necessary capabilities from scratch.
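The directed-graph, building-blocks idea described above can be illustrated with a toy example. The sketch below is not the MDACA Data Flow API. The `Flow` class, its methods, the processor names, and the sample record are all hypothetical, and the graph is assumed to be acyclic. It only shows how small, reusable processors can be wired together so that each record flows along the edges from one component to the next.

```python
from collections import defaultdict, deque
from typing import Callable, Dict, List

Record = dict
Processor = Callable[[Record], List[Record]]  # one record in, zero or more out


class Flow:
    """Toy directed graph of processors (hypothetical; not a real product API)."""

    def __init__(self) -> None:
        self.processors: Dict[str, Processor] = {}
        self.edges: Dict[str, List[str]] = defaultdict(list)

    def add(self, name: str, processor: Processor) -> None:
        """Register a named processing component (a node in the graph)."""
        self.processors[name] = processor

    def connect(self, source: str, target: str) -> None:
        """Add a directed edge: output of `source` becomes input of `target`."""
        self.edges[source].append(target)

    def run(self, start: str, record: Record) -> Dict[str, List[Record]]:
        """Push one record through the graph, assuming it has no cycles.

        Returns the records emitted by terminal nodes (nodes with no
        outgoing edges), keyed by node name.
        """
        outputs: Dict[str, List[Record]] = defaultdict(list)
        queue = deque([(start, record)])
        while queue:
            name, rec = queue.popleft()
            for out in self.processors[name](rec):
                targets = self.edges.get(name, [])
                if not targets:
                    outputs[name].append(out)
                for target in targets:
                    queue.append((target, out))
        return dict(outputs)


# Wire three hypothetical components into a linear flow.
flow = Flow()
flow.add("receive", lambda r: [r])
flow.add("normalize", lambda r: [{**r, "vaccine": r["vaccine"].strip().upper()}])
flow.add("store", lambda r: [r])
flow.connect("receive", "normalize")
flow.connect("normalize", "store")

result = flow.run("receive", {"member_id": "0001", "vaccine": " mmr "})
print(result)
```

In a flow engine of this kind, most of the work lies in choosing and configuring existing components rather than writing them, which is where the time savings described above would come from.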
The initial phase of the project required the new solution’s interfaces to be functionally identical to those of the legacy system, so that client applications could continue tracking and reporting without modification. As a result, data had to be collected and delivered to the modernized back end using a variety of legacy communication protocols and messaging schemas. The requirements included ingesting raw personnel and immunization records sent in a specialized subset of Health Level 7 (HL7) and in proprietary fixed-length messaging schemas; receiving immunization records, individually and in batches of thousands, through streamed and flat-file delivery over HTTP(S), SFTP, and Amazon S3; and feeding the data into back-end databases through ETL pipelines that convert the raw HL7 and proprietary messages into JSON and Apache Parquet.
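To make the conversion step concrete, here is a minimal sketch of turning an HL7 v2 style message and a fixed-length record into the same JSON structure. It rests on several assumptions. The project's actual HL7 subset and proprietary layout are not described here, so the sample message, the `FIXED_LAYOUT` column positions, and the output field names are invented for illustration. The parser handles only the default `|` field separator and carriage-return segment terminator, and it ignores escape sequences, repetitions, and the special field numbering of the MSH segment.

```python
import json

# Hypothetical fixed-length layout: (field name, start, end). Not the real schema.
FIXED_LAYOUT = [("member_id", 0, 10), ("vaccine_code", 10, 13), ("date_given", 13, 21)]


def parse_fixed_length(line: str) -> dict:
    """Slice a fixed-length record into named, whitespace-trimmed fields."""
    return {name: line[start:end].strip() for name, start, end in FIXED_LAYOUT}


def parse_hl7(message: str) -> dict:
    """Split a pipe-delimited HL7 v2 message into {segment_id: [field lists]}.

    Each field list holds the fields that follow the segment ID, so index 0
    is field 1 of that segment (for every segment except MSH).
    """
    segments: dict = {}
    normalized = message.replace("\r\n", "\r").replace("\n", "\r")
    for raw in normalized.split("\r"):
        if not raw.strip():
            continue
        fields = raw.split("|")
        segments.setdefault(fields[0], []).append(fields[1:])
    return segments


def immunization_record(message: str) -> dict:
    """Extract a small, illustrative subset of fields from PID and RXA."""
    seg = parse_hl7(message)
    pid = seg["PID"][0]
    rxa = seg["RXA"][0]
    code = rxa[4].split("^")[0]  # RXA-5 administered code, first component
    return {
        "member_id": pid[2],     # PID-3 patient identifier
        "vaccine_code": code,
        "date_given": rxa[2],    # RXA-3 date/time start of administration
    }


# Made-up sample inputs carrying the same immunization event.
sample_hl7 = "\r".join([
    "MSH|^~\\&|SENDER|FACILITY|RECEIVER|FACILITY|20230115||VXU^V04|MSG0001|P|2.5.1",
    "PID|1||0000012345",
    "RXA|0|1|20230115|20230115|03^MMR^CVX",
])
sample_fixed = "0000012345" + "03 " + "20230115"

record = immunization_record(sample_hl7)
fixed = parse_fixed_length(sample_fixed)
json_line = json.dumps(record, sort_keys=True)
print(json_line)
```

Both inputs normalize to the same dictionary, which is the point of the conversion: downstream consumers see one structure regardless of the legacy format it arrived in. Writing the same records to Apache Parquet would need a third-party library such as pyarrow and is omitted here.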