Installing an ‘IP-less’ Exchange Server 2019 Database Availability Group

Yesterday, I posted on how Exchange now uses the Resilient File System (ReFS) to optimize and protect Exchange critical files. Another layer of protection is using a database availability group (DAG) for redundancy and is a necessary factor when designing an Exchange Enterprise Environment.
In this example, I will walk you through the installation of an Exchange Server 2019 DAG as I configured in my environment. This DAG will contain two Exchange Servers in the same site with a third Windows Server 2019 server being the File Share Witness (FSW).

Two Server Exchange DAG Configuration

For my configuration, I configured two identical Windows Server 2019 VMs (same procs, RAM, vhdx drives, partitions, etc…). I configured the Exchange Data Volume using ReFS and mounted them to the same folder on the C: Drive on each server. This is very important for replication to take place successfully when the databases are added to the DAG.


I next went to the Admin server where the FSW would be hosted and added the Exchange Trusted Subsystem Account to the local Administrators group on that server:

IMPORTANT!
Add the Exchange Trusted Subsystem Account to the Local Administrators Group on the FSW.

NOTE: The reason that this is an ‘IP-less’ DAG is that I’m creating a DAG with no cluster administrative access point (CAAP). The DAG has no IP address of its own, and no computer object in Active Directory. The main implication of this is that backup software that relies on the CAAP or backup operations won’t work. This option of an ‘IP-less’ DAG was first introduced in Exchange Server 2013 SP1/CU4, so by now any decent backup products should support this configuration. But you should always verify this with your backup vendor of choice. Also be aware that this is only supported for DAGs that are running on Windows Server 2012 R2 (or later).

Next, we create the DAG from Exchange PowerShell using the New-DatabaseAvailabilityGroup cmdlet. Now remember that since you are using the ReFS system for your database volumes, you will need to specify the -FileSystem parameter within the cmdlet to assure proper setup and replication of the data files.

Next, we add the Exchange Servers that hold the databases that will be replicated within the DAG:

The DAG will now show the two servers as Operational Member Servers:

The FSW Directory was created on the admin01 server when the DAG was created. We can verify that with the following cmdlet:

Next, we add the databases that we want replicated to the DAG as replicated databases. I want all my Databases on EX01 to replicate to EX02 and vice versa for the EX02 Databases. I want the activation preference to remain on the server that the databases were originally created on so I will use the -ActivationPreference parameter to accomplish that. I will go into more detail on Activation Preference in another post.

Now we verify that the Database Copies are healthy on each replication member using the Get-MailboxDatabaseCopyStatus cmdlet. You will see a Healthy Status on the replicated copies:

POSITIVE ENERGY!
KILL NARCISSISM!
HAPPY TROUBLESHOOTING!

REFERENCES:
Installing an Exchange Server 2016 Database Availability Group

Using the Resilient File System for Exchange Server

In my ongoing effort for becoming more knowledgeable on Exchange Server, I found that the preferred new file system for Exchange Databases and Log files is the ReFS.
ReFS is not that new. Microsoft’s Resilient File System (ReFS) was introduced with Windows Server 2012. ReFS is not a direct replacement for NTFS, and is missing some underlying NTFS features, but is designed to be (as the name suggests) a more resilient file system for extremely large amounts of data.

Support for ReFS with Exchange Server

From Exchange Server 2013 and upwards (which includes Exchange Server 2019 today) Microsoft supports the use of ReFS for Exchange servers, and in fact they now recommend it as the preferred file system for Exchange Server 2019, within the following guidelines.

For Exchange Server 2013:

  • ReFS is supported for volumes containing Exchange database files, log files, and content index files.
  • ReFS is not supported for volumes containing Exchange binaries (the program files).
  • ReFS is not supported for volumes containing the system partition.
  • ReFS data integrity features must be disabled for the database (.edb) files or the entire volume that hosts database files.
  • Hotfix KB2853418 must be installed.
  • For Windows 2012, the following hotfixes must be installed:

This means that you should continue to use NTFS for your operating system and Exchange Server 2013 installation volume, but you can consider using ReFS for the volumes hosting Exchange databases, log files, and index files.

For Exchange Server 2016:

  • ReFS is supported for volumes containing Exchange database files, log files, and content index files.
  • ReFS is not supported for volumes containing Exchange binaries (the program files).
  • ReFS is not supported for volumes containing the system partition.
  • ReFS data integrity features are recommended to be disabled.
  • For Windows 2012, the following hotfixes must be installed:

This means that you should continue to use NTFS for your operating system and Exchange Server 2016 installation volume, and it is recommended ReFS for the volumes hosting Exchange databases, log files, and index files.

For Exchange Server 2019:

  • ReFS is supported for volumes containing Exchange database files, log files, and content index files.
  • ReFS is not supported for volumes containing Exchange binaries (the program files).
  • ReFS is not supported for volumes containing the system partition.
  • ReFS data integrity features are recommended to be disabled.

This means that you should continue to use NTFS for your operating system and Exchange Server 2019 installation volume, and it is recommended ReFS for the volumes hosting Exchange databases, log files, and index files.

Creating an ReFS Formatted Volume

In Windows Server during the New Volume Wizard when you get to the step for configuring File System Settings change the file system from NTFS to ReFS.

exchange-server-refs

NOTE: Using the New Volume Wizard does not give you the option to disable data integrity at the volume level. To set it at the volume level itself use PowerShell when configuring new volumes. I found this out the hard way and am now re-configuring my volumes to disable the Integrity Streams.

I needed to create the mount point to mount the volume to:

I then got a list of my available disks:

In my case, disk 2 was the one I needed to format and change. I had to create a new partition and then format it:

Once formatted, I mount the volume to the Directory created earlier:

NOTE: Partition 1 on a disk is always reserved for system files on the drive volume. So the active partitions will always start at 2.

Lastly, verify that the partition is online and that the Integrity Streams are turned off:

Additional Considerations

When you are deploying an Exchange 2016 or 2019 DAG and using Autoreseed, the disk reclaimer needs to know which file system to use when formatting spare disks. So when, creating a DAG in Exchange PowerShell, make sure to set the -FileSystem parameter. For Exchange Server 2013 DAGs, manually format the spare volumes with ReFS.

More coming soon. I will post how I setup the “IP-less” DAG for my environment and got replication functional for my Exchange Databases.

REFERENCES:
Exchange 2013 storage configuration options
Exchange 2016 Preferred Architecture
Exchange Storage for Insiders: It’s ESE (Ignite video)
ReFS Exchange Server Volumes
Preparing ReFS Volumes for Exchange

Exchange DAG Replication Problem: An established connection was aborted by the software in your host machine

I had an issue with a four node DAG where the DR site with two of the DAG members were having replication issues. It was only technically affecting one DAG Member though. The copy queue length was really high and the logs were not committing to the database. A Test-ReplicationHealth cmdlet test told that the copy queue length for the affected database copy was high. No other databases were affected as there were eight databases on this DAG Node. The issue was that the log files were not replicating properly to the one DAG member for that database, causing the log file drives on all the other DAG members to build and become full:

EX04 DAG member has high Copy Queue Length
The purple member (EX04) free space is different from the other three DAG members

Circular Logging was turned on, but since the db was NOT in sync, the logs could NOT truncate properly which rendered CL useless. What was being done to stave the issue was to suspend the database copy of the affected DAG member (EX04), then resume the copy. The logs would replay and commit to the database copy on the DAG member, but over a short period of time, the same issue would arise again, as shown in this graph:

You can see the other DAG members start dropping in free space

There were absolutely no errors in the Event Viewer showing this replication issue. After some research, I ran the following cmdlet showing a particular output parameter that gave me the actual problem:

Get-MailboxDatabaseCopyStatus DAG1DB01 | ft -a -wr Name, Status, IncomingLogCopyingNetwork

Output with the actual error listed for the DR DAG members.

The operative error here was: {An error occurred while communicating with server ‘EX01’. Error: Unable to read data from the transport connection: An established connection was aborted by the software in your host machine.} 

Now even though only EX04 was actually having problems with its log replication, both DR members EX03 & EX04 were having the same problem. Again, there were NO events in event viewer showing this issue. I next did some connectivity tests to EX01 from EX04 even though the error said there was an established connection that was broken.

Ping EX01 -f -l 1472

Now the -f states do NOT fragment the packet and send it as a whole to the destination.
The -l states the packet/buffer size you want sent. In this case 1472 bits.
By doing this, you are able to assure that a router or switch is NOT segmenting the packets, packet segmentation of replication logs can cause data corruption and replication issues.

That test passed successfully. I also did a trace route to assure there was no packet loss on the route to the replicating server. That test passed successfully.

I next checked the DAG Network to assure that all networks were working for replication. Now, in this scenario, there was only ONE DAG Network, there was NOT a separate Replication Network. I did not design the DAG and limitations most likely came into play during the design. From my experience, you setup a separate replication network for replication only, but if your network has enough bandwidth, and the design calls for simplification, you can use one DAG network in your design.

Get-DatabaseAvailabilityGroupNetwork | fl 

RunspaceId : a1600003-8074-4000-9150-c7800000207f 
Name : MapiDagNetwork 
Description : 
Subnets : {{192.168.1.0/24,Up}, {192.168.2.0/24,Up}} 
Interfaces : {{EX01,Up,192.168.1.25}, {EX02,Up,192.168.1.26},{EX03,Up,192.168.2.25}, {EX04,Up,192.168.2.26}} 
MapiAccessEnabled : True 
ReplicationEnabled : True 
IgnoreNetwork : False 
Identity : DAG1\MapiDagNetwork 
IsValid : True 
ObjectState : New 

All the DAG Network Members were up and not showing errors. I next did a telnet session to EX01 over the default DAG replication port 64327 to see if there would be any connectivity issues to EX01:

telnet EX01 64327

That test was successful and there were no connectivity issues to EX01 from EX04. Again, there was only ONE database out of eight that was having replication problems. After mulling over the problem, it was decided to restart the MSExchangeRepl service on EX03 AND EX04 since the error was present on both DAG members. We would then, suspend the database copy and resume the database copy on the affected servers.

Run on EX03:
Restart-Service MSExchangeRepl
Suspend-MailboxDatabaseCopy DAG1DB01/EX03 -Confirm:$False
Resume-MailboxDatabaseCopy DAG1DB01/EX03 -Confirm:$False

Run on EX04:
Restart-Service MSExchangeRepl
Suspend-MailboxDatabaseCopy DAG1DB01/EX04 -Confirm:$False
Resume-MailboxDatabaseCopy DAG1DB01/EX04 -Confirm:$False

After monitoring the databases and log drives, the issue was resolved and replication started functioning properly.

Log Drive Available Space Returned to Normal for DAG members

PLEASE COMMENT! I WELCOME SUGGESTIONS, TIPS, ALTERNATIVE TROUBLESHOOTING! HAVE A GREAT DAY!