SAP HANA is becoming the database of choice for organizations to run their mission-critical systems. Contractually committed SLAs reinforce how important it is that these systems remain working and available at all times. SAP HANA offers several mechanisms to ensure availability:
- Storage replication - Data replication is achieved by means of storage mirroring independent from the database software. Disks are mirrored without a control process from the SAP HANA system. This is typically offered by the hardware manufacturer.
- Host auto-failover - Uses a hot standby system: in case of a failure, the data and log volumes of the failed worker node are taken over by a standby node.
- SAP HANA System Replication - SAP HANA constantly replicates all data to a secondary SAP HANA system.
In this post we will focus on SAP HANA system replication and how monitoring using syslink Xandria ensures that it is working properly and will be ready once needed.
SAP HANA system replication
SAP HANA System Replication is controlled by the SAP HANA database kernel. It uses two separate systems with exactly the same number of active HANA nodes. Once system replication is set up between the two HANA systems, one is defined as the primary system and the other as the secondary system.
All data is then replicated from the primary HANA system to the secondary, creating an initial baseline. After this, any logged changes in the primary system are also sent to the secondary system, but these log entries are not replayed. In addition, at a predefined interval, the system sends data snapshots from the primary to the secondary.
If the primary fails and the secondary needs to take over, the secondary uses the last data snapshot and replays the log entries written since the snapshot timestamp. With the data snapshot, the primary system also sends information about the tables loaded in memory if the parameter 'preload_column_tables' is set to 'true'. If this parameter is also set to 'true' on the secondary system, these tables are preloaded into the memory of the secondary database. This reduces the recovery time, making HANA system replication a faster high availability solution in terms of recovery.
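To make the takeover step concrete, here is a minimal sketch of how the secondary could work out which shipped log entries still need replaying on top of the last data snapshot. The `LogEntry` type, the integer timestamps and the `entries_to_replay` function are hypothetical names used for illustration only; this is not how the HANA kernel implements it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    timestamp: int  # illustrative timestamp (e.g. seconds); an assumption
    change: str     # free-text description of the logged change


def entries_to_replay(shipped_log, snapshot_timestamp):
    """Return shipped entries newer than the last snapshot, oldest first."""
    newer = (e for e in shipped_log if e.timestamp > snapshot_timestamp)
    return sorted(newer, key=lambda e: e.timestamp)


# Hypothetical log shipped from the primary; the last snapshot was at t=200.
shipped = [
    LogEntry(100, "insert A"),
    LogEntry(250, "update A"),
    LogEntry(310, "delete B"),
]
pending = entries_to_replay(shipped, snapshot_timestamp=200)
print([e.change for e in pending])  # ['update A', 'delete B']
```

The entry at t=100 is already contained in the snapshot, so only the two later changes are replayed. The more recent the snapshot, the shorter this list and the faster the takeover.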
Since SPS11, HANA replication has offered two different operation modes. The first, delta_datashipping, is similar to past versions of HANA replication, with the addition of delta data shipping every 10 minutes. These additional deltas continually build on top of the initial snapshot, so fewer logs need to be replayed in the case of failover, which means a faster recovery time. The second, logreplay, uses the original methodology of HANA System Replication, with one snapshot and logs being sent over. The difference is that the logs are replayed on the secondary system immediately after being shipped, creating a near High Availability (HA), or ‘hot standby’, scenario.
There are various synchronization modes (including Synchronous on disk, Synchronous in memory, Asynchronous and Full Sync). You can read more about them in the SAP technical documentation.
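The difference between the two operation modes can be illustrated with a rough, idealized model of how much log the secondary still has to replay at takeover. The `backlog_minutes` function is a hypothetical helper, and the model ignores network lag and replay speed; only the 10-minute delta interval comes from the description above.

```python
DELTA_INTERVAL_MIN = 10  # delta shipping interval mentioned in the text


def backlog_minutes(mode, minutes_since_initial_snapshot,
                    delta_interval=DELTA_INTERVAL_MIN):
    """Idealized minutes of log left to replay when the secondary takes over."""
    if mode == "delta_datashipping":
        # Replay starts from the most recent delta, shipped every
        # delta_interval minutes, so only the remainder is pending.
        return minutes_since_initial_snapshot % delta_interval
    if mode == "logreplay":
        # Logs are applied as soon as they arrive; in this simplified
        # model nothing is left to replay at takeover.
        return 0
    raise ValueError(f"unknown operation mode: {mode}")


# Failure 47 minutes after the initial snapshot (illustrative value).
print(backlog_minutes("delta_datashipping", 47))  # 7
print(backlog_minutes("logreplay", 47))           # 0
```

In this simplified view, delta_datashipping leaves at most one interval's worth of log to replay, while logreplay leaves close to none, which is why it behaves like a hot standby.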
Challenges in monitoring HANA system replication
Declaring a disaster recovery situation sends SAP support teams into a high-stress environment. While the operation sounds bulletproof, there are a lot of worries about making sure all the logs were shipped. Even one missed log will cause an issue when attempting to use the secondary system. Unfortunately, without a good monitoring solution in place, you won’t know any files are missing until you attempt to use the secondary system. Support teams usually set up replication services, hope the process is working successfully, then pray they never need to take advantage of the secondary system. While it is best practice to run a DR test at least once a year, it usually requires a lot of coordination and a maintenance window, and is often unsuccessful or takes longer than anticipated. Because of all of this, DR tests are usually put on the back burner or completely forgotten about.
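The "one missed log" problem is easy to picture as a gap check. The sketch below assumes, purely for illustration, that every shipped log segment carries a consecutive integer sequence number; `find_missing_logs` is a hypothetical helper and does not represent how HANA or Xandria track shipped logs.

```python
def find_missing_logs(received_sequence_numbers, first_expected, last_expected):
    """Return the expected sequence numbers that never reached the secondary."""
    received = set(received_sequence_numbers)
    return [n for n in range(first_expected, last_expected + 1)
            if n not in received]


# Hypothetical sequence numbers seen on the secondary; 4 and 7 never arrived.
missing = find_missing_logs([1, 2, 3, 5, 6, 8], first_expected=1, last_expected=8)
print(missing)  # [4, 7]
```

Running a check like this continuously, rather than during a takeover, is what turns a silent gap into something the team can fix ahead of time.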
How Xandria addresses HANA system replication challenges
Using Xandria, IT teams know, rather than just hope, that their system replication will work when needed. Xandria can clearly identify potential issues with the replication process, allowing the team to resolve them as they arise and ensure that the secondary system is ready to be turned on.
Using Xandria, they get complete visibility, via a single pane of glass, into whether their replication process is working and will continue to work. DR scenarios, whether tests or real, are stressful enough for support teams. Xandria gives SAP support teams confidence going into a DR scenario that all necessary technical components will be operational and ready for secondary system use.