

Efficient database migration: tools and processes for a smooth transition


Author: Victor Staack · Reading time: 7 minutes · Category: Data Science · Updated on: 30.8.2024

Introduction

In today's data-driven landscape, both SMEs and large enterprises are experiencing a dramatic increase in data volume, placing an unprecedented burden on older analytical databases. Legacy systems such as Oracle, IBM DB2, and Sybase struggle to handle the sheer volume of data, leading to performance bottlenecks. This is particularly problematic because customer analytics has become critical to corporate decision making. The rise of data-driven strategies requires robust, high-performance analytics engines that can process large amounts of data efficiently. In addition, ensuring data integrity and accuracy during migrations is more important than ever in light of increasingly stringent regulatory requirements and compliance standards.

[Figure: Increase in Hadoop's popularity]

Tools and processes for seamless database migration

During a recent database migration project, several tools and processes proved helpful in maintaining focus and ensuring a seamless, timely, and efficient migration. The following is an overview of the methods and tools that enabled a successful migration.

Template creation

One of the most important tools was a Cookiecutter template, which was used to create a structured workflow for each table to be migrated. This template included:

Dynamic variables: Placeholder values for each table migration, defined as follows:

JSON

{
    "uc_name": "default_migration_use_case",
    "script_name": "default_script_name",
    "hive_table_name": "standard_hive_table_name",
    "hive_write_schema": "default_hive_write_schema",
    "es_index_name": "default_es_index_name"
}
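
For illustration, such a template could then be instantiated once per table via Cookiecutter's Python API (a minimal sketch; the template path and context values are hypothetical, not from the project):

Python

from cookiecutter.main import cookiecutter

# Generate a per-table migration workflow from the template.
# Template path and context values are illustrative.
cookiecutter(
    "migration-template/",
    no_input=True,  # take all values from extra_context instead of prompting
    extra_context={
        "uc_name": "customer_orders_migration",
        "script_name": "migrate_customer_orders",
        "hive_table_name": "customer_orders",
        "hive_write_schema": "analytics",
        "es_index_name": "customer_orders_v1",
    },
)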

Old code directory (`sas_code`): A central location for storing the legacy scripts.

R-Shiny server integration (`shiny_code_translator`): This script starts an R-Shiny server that breaks the old code down into manageable chunks. Each chunk is presented to the user for review, with the option to describe the functionality of the code; AI-generated code translations are then provided for developer approval.

Here is an example of legacy code as presented in the ainovi Code Translator's original-code view:

SAS

proc sql ;
create table work.antragsparameter as select
a.* ,
b.parameter ,
b.wert ,
b.beginn ,
b.ende
from work.funksystem a left join daten.param_ubg b
on   a.ubg_id eq b.id
    and b.beginn le a.ne_ende
    and b.ende gt a.ne_start
    and b.ende gt today()
order by id, ne_start, ubg_id, parameter, beginn, ende ;
drop table work.funksystem ;
quit ;

data work.antragsparameter ;
set work.antragsparameter ;
by id ne_start ubg_id parameter ;
if first.id or first.ne_start or first.ubg_id or first.parameter ;
run ;

proc transpose data=work.antragsparameter out=work.funksystem_param ;
var wert ;
by id ne_start ubg_id funksystem_adr1;
id parameter ;
run ;
                    

Script compilation: After approval, the translated code modules are compiled into a complete script, which is stored in the `src` directory.

Deployment structure: Templates for deploying the new scripts with tools such as Airflow and Ansible, and for setting up Elasticsearch and Kibana index patterns (a sketch of such a DAG follows below).

This approach ensures a standardized process for handling each table, making the migration more manageable and consistent.
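
To give a rough idea of the Airflow part, a generated deployment template might look like this (a sketch under assumptions: the DAG id, schedule, and spark-submit command are illustrative, not the project's actual template):

Python

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Sketch of a generated per-table migration DAG; all names and paths are illustrative.
with DAG(
    dag_id="migrate_customer_orders",  # would be filled in from the template variables
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    run_migration = BashOperator(
        task_id="run_migration_script",
        # Submit the compiled script from the src directory to the cluster
        bash_command="spark-submit src/migrate_customer_orders.py",
    )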

Deterministic testing

Deterministic testing was a crucial step to ensure the accuracy of the translated scripts. This comprised several components:

SonarQube integration: Early refactoring of the translated scripts with SonarQube to improve code quality and maintainability.

Comparison script: A script was developed to compare the original and translated tables. For each row in the original table, the script generated a report with three columns per original column:

Left: The original entry.

Right: The translated entry.

Result: An indicator of whether the two entries match.
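
To make this concrete, the report logic might look roughly like this in PySpark (a sketch under assumptions: both tables are accessible from the cluster, share a key column `id`, and all table names are illustrative):

Python

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Illustrative table names; in practice these would come from the template variables.
original = spark.table("legacy_export.antragsparameter").alias("orig")
translated = spark.table("analytics.antragsparameter").alias("trans")

joined = original.join(translated, on="id", how="inner")

# For every shared column, emit left / right / match-indicator columns.
shared = sorted((set(original.columns) & set(translated.columns)) - {"id"})
report_cols = [F.col("id")]
for c in shared:
    report_cols += [
        F.col(f"orig.{c}").alias(f"{c}_left"),
        F.col(f"trans.{c}").alias(f"{c}_right"),
        (F.col(f"orig.{c}") == F.col(f"trans.{c}")).alias(f"{c}_match"),
    ]

report = joined.select(report_cols)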


In addition, an anti-join script was created to identify entries missing from the translated table. Here's a simple pseudo-code snippet for the mismatch query:
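
(Rendered here as a PySpark sketch rather than raw pseudo-code, continuing from the comparison example above; the shared key column `id` is an assumption.)

Python

# Rows present in the original table but missing from the translated one.
# A left anti-join keeps exactly the unmatched left-hand rows.
missing = original.join(translated, on="id", how="left_anti")
missing.show()  # or persist the mismatches for the migration report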

As a result of this comparison process, discrepancies were identified at an early stage so that timely corrections were possible.

Debugging sessions

Automated debugging sessions played an important role in identifying and resolving issues. The process included:

Automatic breakpoints: Breakpoints were added to the code, along with expectations for intermediate results.

Comparing content and counts: At each breakpoint, the content and row counts of the data were compared against those expectations.

Identifying issues: This step helped to quickly pinpoint where discrepancies occurred.

Here's an example of setting expectations using SAS macros:
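
(A minimal sketch; the macro name, the table, and the expected row count are illustrative, not the project's actual checks.)

SAS

%macro expect_rows(table, expected);
    /* Count the rows currently in &table */
    proc sql noprint;
        select count(*) into :actual trimmed from &table.;
    quit;

    /* Breakpoint: stop the run if the expectation is not met */
    %if &actual. ne &expected. %then %do;
        %put ERROR: &table. has &actual. rows, expected &expected.;
        %abort cancel;
    %end;
    %else %put NOTE: &table. matches expectation (&actual. rows).;
%mend expect_rows;

/* Hypothetical expectation after the join step */
%expect_rows(work.antragsparameter, 125000);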

These automated checks ensured that issues were identified and resolved immediately, minimizing downtime and ensuring data integrity.

Here is the full process flow diagram:

Mermaid

flowchart TB
    B[Create Cookiecutter Template]
    subgraph templating
    direction TB
    B --> C[Store Old Code in Repository]
    end
    subgraph translator
    direction TB
    D[Spawn R-Shiny Server]
    D --> E[Separate Old Code into Chunks]
    E --> F[User Explains Code Chunks]
    F --> G[Generate AI Code Translations]
    G --> H[Developer Reviews and Approves Translations]
    end
    subgraph scripting
    direction TB
    I[Compile Approved Translations into New Script]
    I --> J[Deploy New Script using Deployment Templates]
    end
    subgraph debugging
    direction TB
    subgraph refactor
    direction TB
    L[Refactor Code with SonarQube]
    end
    subgraph table_comp
    direction TB
    M[Compare Original and Translated Tables]
    M --> N[Generate Comparison Report]
    N --> O[Identify Missing Entries with Anti-Join Script]
    end
    subgraph breakpoints
    direction TB
    P[Conduct Debugging Sessions]
    P --> Q[Insert Breakpoints and Set Expectations]
    Q --> R[Compare Content and Counts at Breakpoints]
    R --> S[Pinpoint Issues]
    end
    S --> T[Resolve Issues and Update Code]
    end
    templating --> translator
    translator --> scripting
    scripting --> debugging
    refactor --> table_comp
    table_comp --> breakpoints

Victor Staack
Senior Data Scientist