ETL pipelines use SQL to communicate directly with the back-end databases. MDACA Data Flow proved to be an existing, accredited, and natural fit for these requirements, as it is currently used to ingest nearly 9 billion DHA transactions daily (Figure 2). Using its drag-and-drop tools and extensive component library, we quickly constructed pipelines to handle three primary data movement and conversion tasks.

Real-time ingestion of vaccination records from client sites in either proprietary message formats or HL7. The records, received via HTTP, had to be delivered to S3 in both their raw form and Parquet format without losing any information. We developed a single pipeline to receive the raw HL7 record over HTTP, transform it into JSON and Parquet, upload it to S3 for extract-transform-load (ETL) pipelines to move it to the SQL back end, and return an HTTP status indicating upload success or failure. Because Data Flow supports HL7 out of the box, no coding, scripting, or third-party libraries were required to parse and convert the data. For the proprietary format, we created a similar pipeline that receives data already converted to JSON by our client-facing web services. A sketch of the equivalent receive-convert-upload logic appears below.

Periodic ingestion of batches of immunization records delivered as flat files containing thousands of HL7 records. The batch files had to be transferred to S3 via SFTP. Although Data Flow has components for working with SFTP servers, the site's administrative and security requirements necessitated a separate process connected to the SFTP server to move the batch files to S3. Our solution involved creating parallel pipelines to periodically scan S3 for new batch files, download and archive them, split the batches into individual records, and feed those records through a conversion and routing process similar to the one used for the real-time stream. However, to verify and validate each record's structure and content prior to conversion, the pipeline had to send the records via HTTP to our middle-tier web services. Data Flow's ready support for the S3 and HTTP protocols, together with its rapid processing and scaling on AWS GovCloud, made this straightforward to accomplish; a sketch of the scan-archive-split-validate loop also follows below.
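The real-time pipeline itself is assembled from Data Flow's drag-and-drop components and involves no code; the following Python sketch is offered only to illustrate the receive-convert-upload-respond steps described above. The endpoint path, bucket name, key prefixes, and the choice of the python-hl7, pyarrow (7+), boto3, and Flask packages are illustrative assumptions, not part of the actual solution.

```python
# Minimal sketch, not the actual solution: the MDACA Data Flow pipeline is built
# from drag-and-drop components with no code. Endpoint, bucket, and key prefixes
# are hypothetical; assumes python-hl7, pyarrow (>=7), boto3, and Flask.
import json
import uuid

import boto3
import hl7                      # python-hl7, for HL7 v2.x parsing
import pyarrow as pa
import pyarrow.parquet as pq
from flask import Flask, jsonify, request

app = Flask(__name__)
s3 = boto3.client("s3")
BUCKET = "immunization-landing"                 # hypothetical landing bucket

def hl7_to_record(raw: str) -> dict:
    """Flatten an HL7 v2 message into a dict keyed by segment name."""
    message = hl7.parse(raw.replace("\r\n", "\r").replace("\n", "\r"))
    record: dict = {}
    for segment in message:
        record.setdefault(str(segment[0]), []).append(str(segment))  # MSH, PID, RXA, ...
    return record

def to_parquet_bytes(record: dict) -> bytes:
    """Serialize a single record as a one-row Parquet table."""
    table = pa.Table.from_pylist([{k: json.dumps(v) for k, v in record.items()}])
    sink = pa.BufferOutputStream()
    pq.write_table(table, sink)
    return sink.getvalue().to_pybytes()

@app.route("/vaccinations/hl7", methods=["POST"])   # hypothetical endpoint
def ingest():
    raw = request.get_data(as_text=True)
    # A production pipeline would more likely key objects on MSH-10 (message control ID).
    object_id = uuid.uuid4().hex
    try:
        record = hl7_to_record(raw)
        # Keep the raw message alongside the converted copies so no information is lost;
        # downstream ETL moves the Parquet data into the SQL back end.
        s3.put_object(Bucket=BUCKET, Key=f"raw/{object_id}.hl7",
                      Body=raw.encode("utf-8"))
        s3.put_object(Bucket=BUCKET, Key=f"json/{object_id}.json",
                      Body=json.dumps(record).encode("utf-8"))
        s3.put_object(Bucket=BUCKET, Key=f"parquet/{object_id}.parquet",
                      Body=to_parquet_bytes(record))
    except Exception as exc:                        # report failure back to the sender
        return jsonify({"status": "error", "detail": str(exc)}), 500
    return jsonify({"status": "uploaded", "id": object_id}), 200
```

Storing the untouched raw payload next to the JSON and Parquet conversions mirrors the no-information-loss requirement; the HTTP status returned to the client site reflects upload success or failure, as in the Data Flow pipeline.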
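Likewise, the batch-handling pipelines are built from Data Flow's S3 and HTTP components rather than code. The sketch below only illustrates the scan-archive-split-validate loop they perform; the bucket, prefixes, validation URL, polling interval, and the MSH-boundary splitting rule are assumptions made for illustration (real batch files may also carry FHS/BHS/BTS/FTS envelope segments).

```python
# Minimal sketch, not the actual solution: the real work is done by parallel
# Data Flow pipelines. Bucket names, prefixes, and the validation URL are hypothetical.
import time

import boto3
import requests

s3 = boto3.client("s3")
BUCKET = "immunization-landing"                          # hypothetical bucket
INCOMING, ARCHIVE = "batches/incoming/", "batches/archive/"
VALIDATE_URL = "https://middle-tier.example.mil/validate"  # hypothetical middle-tier service

def split_hl7_batch(text: str) -> list:
    """Split a flat file of HL7 messages on MSH segment boundaries,
    dropping simple batch envelope segments (FHS/BHS/BTS/FTS)."""
    messages, current = [], []
    for line in text.splitlines():
        if line.startswith("MSH|") and current:
            messages.append("\r".join(current))
            current = []
        if line and not line.startswith(("FHS|", "BHS|", "BTS|", "FTS|")):
            current.append(line)
    if current:
        messages.append("\r".join(current))
    return messages

def process_new_batches():
    """Scan for new batch files, archive the originals, and route each record to validation."""
    listing = s3.list_objects_v2(Bucket=BUCKET, Prefix=INCOMING)
    for obj in listing.get("Contents", []):
        key = obj["Key"]
        if key.endswith("/"):
            continue
        body = s3.get_object(Bucket=BUCKET, Key=key)["Body"].read().decode("utf-8")
        # Archive the original batch before processing so nothing is lost.
        archive_key = ARCHIVE + key[len(INCOMING):]
        s3.copy_object(Bucket=BUCKET, Key=archive_key,
                       CopySource={"Bucket": BUCKET, "Key": key})
        s3.delete_object(Bucket=BUCKET, Key=key)
        for record in split_hl7_batch(body):
            # Middle-tier web services verify structure and content before the record
            # re-enters the same conversion and routing path as the real-time stream.
            resp = requests.post(VALIDATE_URL, data=record.encode("utf-8"),
                                 headers={"Content-Type": "text/plain"}, timeout=30)
            if resp.status_code != 200:
                print(f"validation rejected a record from {key}: {resp.status_code}")

if __name__ == "__main__":
    while True:                  # periodic scan, analogous to a scheduled pipeline
        process_new_batches()
        time.sleep(300)
```

Archiving each batch before splitting it, and validating every record through the middle tier before conversion, follows the ordering described above; in Data Flow these steps are separate, parallel pipelines rather than a single polling loop.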