{"id":233,"date":"2019-02-19T21:10:41","date_gmt":"2019-02-20T02:10:41","guid":{"rendered":"http:\/\/itblog.ldlnet.net\/?p=233"},"modified":"2019-02-20T00:15:38","modified_gmt":"2019-02-20T05:15:38","slug":"exchange-dag-replication-problem-an-established-connection-was-aborted-by-the-software-in-your-host-machine","status":"publish","type":"post","link":"https:\/\/itblog.ldlnet.net\/index.php\/2019\/02\/19\/exchange-dag-replication-problem-an-established-connection-was-aborted-by-the-software-in-your-host-machine\/","title":{"rendered":"Exchange DAG Replication Problem: An established connection was aborted by the software in your host machine"},"content":{"rendered":"\n<p>I had an issue with a four-node DAG in which the two DAG members at the DR site were having replication issues, although only one DAG member was technically affected. The copy queue length was very high and the logs were not committing to the database. Running the <em>Test-ReplicationHealth<\/em> cmdlet reported that the copy queue length for the affected database copy was high. Of the eight databases on this DAG node, no others were affected. 
The log files were not replicating properly to that one DAG member for that database, which caused the log drives on all the other DAG members to fill up:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"758\" height=\"180\" src=\"http:\/\/itblog.ldlnet.net\/wp-content\/uploads\/2019\/02\/GMDCS_DAG_Issue_001.png\" alt=\"\" class=\"wp-image-236\" srcset=\"https:\/\/itblog.ldlnet.net\/wp-content\/uploads\/2019\/02\/GMDCS_DAG_Issue_001.png 758w, https:\/\/itblog.ldlnet.net\/wp-content\/uploads\/2019\/02\/GMDCS_DAG_Issue_001-300x71.png 300w\" sizes=\"auto, (max-width: 758px) 100vw, 758px\" \/><figcaption>EX04 DAG member has high Copy Queue Length<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/itblog.ldlnet.net\/wp-content\/uploads\/2019\/02\/DAG_Log_Drive_01-1024x431.png\" alt=\"\" class=\"wp-image-234\" width=\"854\" height=\"359\" srcset=\"https:\/\/itblog.ldlnet.net\/wp-content\/uploads\/2019\/02\/DAG_Log_Drive_01-1024x431.png 1024w, https:\/\/itblog.ldlnet.net\/wp-content\/uploads\/2019\/02\/DAG_Log_Drive_01-300x126.png 300w, https:\/\/itblog.ldlnet.net\/wp-content\/uploads\/2019\/02\/DAG_Log_Drive_01-768x323.png 768w, https:\/\/itblog.ldlnet.net\/wp-content\/uploads\/2019\/02\/DAG_Log_Drive_01.png 1157w\" sizes=\"auto, (max-width: 854px) 100vw, 854px\" \/><figcaption>The purple member (EX04) free space is different from the other three DAG members<\/figcaption><\/figure>\n\n\n\n<p>Circular Logging was turned on, but since the database copy was NOT in sync, the logs could NOT truncate, which rendered circular logging useless. The stopgap being used was to suspend the database copy on the affected DAG member (EX04), then resume it. 
The logs would replay and commit to the database copy on the DAG member, but over a short period of time, the same issue would arise again, as shown in this graph:<\/p>\n\n\n\n<figure class=\"wp-block-image is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/itblog.ldlnet.net\/wp-content\/uploads\/2019\/02\/DAG_Log_Drive_02-1024x432.png\" alt=\"\" class=\"wp-image-237\" width=\"861\" height=\"363\" srcset=\"https:\/\/itblog.ldlnet.net\/wp-content\/uploads\/2019\/02\/DAG_Log_Drive_02-1024x432.png 1024w, https:\/\/itblog.ldlnet.net\/wp-content\/uploads\/2019\/02\/DAG_Log_Drive_02-300x127.png 300w, https:\/\/itblog.ldlnet.net\/wp-content\/uploads\/2019\/02\/DAG_Log_Drive_02-768x324.png 768w, https:\/\/itblog.ldlnet.net\/wp-content\/uploads\/2019\/02\/DAG_Log_Drive_02.png 1149w\" sizes=\"auto, (max-width: 861px) 100vw, 861px\" \/><figcaption>You can see the other DAG members start dropping in free space<\/figcaption><\/figure>\n\n\n\n<p>There were absolutely no errors in the Event Viewer showing this replication issue. 
After some research, I ran the following cmdlet, selecting one particular output property that revealed the actual problem:<\/p>\n\n\n\n<p class=\"has-text-color has-small-font-size has-medium-pink-color\"><strong>Get-MailboxDatabaseCopyStatus<\/strong> <strong>DAG1DB01 | ft -a -wr Name, Status, IncomingLogCopyingNetwork<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-image is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/itblog.ldlnet.net\/wp-content\/uploads\/2019\/02\/GMDCS_DAG_Issue_02-1024x121.png\" alt=\"\" class=\"wp-image-239\" width=\"891\" height=\"104\" srcset=\"https:\/\/itblog.ldlnet.net\/wp-content\/uploads\/2019\/02\/GMDCS_DAG_Issue_02-1024x121.png 1024w, https:\/\/itblog.ldlnet.net\/wp-content\/uploads\/2019\/02\/GMDCS_DAG_Issue_02-300x36.png 300w, https:\/\/itblog.ldlnet.net\/wp-content\/uploads\/2019\/02\/GMDCS_DAG_Issue_02-768x91.png 768w, https:\/\/itblog.ldlnet.net\/wp-content\/uploads\/2019\/02\/GMDCS_DAG_Issue_02.png 1343w\" sizes=\"auto, (max-width: 891px) 100vw, 891px\" \/><figcaption>Output with the actual error listed for the DR DAG members.<br><\/figcaption><\/figure>\n\n\n\n<p>The operative error here was:<em> {An error occurred while communicating with server &#8216;EX01&#8217;. Error: Unable to read data from the transport connection: An established connection was aborted by the software in your host machine.}&nbsp; <\/em><\/p>\n\n\n\n<p>Even though only EX04 was actually having problems with its log replication, both DR members, EX03 &amp; EX04, were reporting the same error. Again, there were NO events in Event Viewer showing this issue. 
I next ran some connectivity tests from EX04 to EX01, even though the error said that an established connection had been aborted.<\/p>\n\n\n\n<p class=\"has-text-color has-small-font-size has-medium-pink-color\"><strong>Ping&nbsp;EX01&nbsp;-f&nbsp;-l&nbsp;1472<\/strong><\/p>\n\n\n\n<p>The <strong><em>-f<\/em><\/strong> switch sets the Don&#8217;t Fragment flag, so the packet must reach the destination whole. <br>The <strong><em>-l<\/em><\/strong> switch sets the size of the send buffer, in this case 1472 bytes, which together with the 28 bytes of IP and ICMP headers makes up a standard 1500-byte MTU.<br>A successful reply shows that no router or switch on the path needs to fragment full-size packets; packet fragmentation of replication traffic can cause replication issues.<\/p>\n\n\n\n<p>That test passed successfully. I also ran a trace route to check for problems along the route to the replicating server. That test passed as well.<\/p>\n\n\n\n<p>I next checked the DAG network to ensure that all networks were working for replication. In this scenario, there was only ONE DAG network; there was NOT a separate replication network. I did not design the DAG, and limitations most likely came into play during the design. In my experience, you set up a separate network dedicated to replication, but if your network has enough bandwidth and the design calls for simplification, you can use a single DAG network. 
<\/p>\n\n\n\n<p class=\"has-text-color has-small-font-size has-medium-pink-color\"><strong>Get-DatabaseAvailabilityGroupNetwork | fl\u00a0<br><\/strong><br>RunspaceId : a1600003-8074-4000-9150-c7800000207f\u00a0<br>Name : MapiDagNetwork\u00a0<br>Description :\u00a0<br>Subnets : {{192.168.1.0\/24,Up}, {192.168.2.0\/24,Up}}\u00a0<br>Interfaces : {{EX01,Up,192.168.1.25}, {EX02,Up,192.168.1.26},{EX03,Up,192.168.2.25}, {EX04,Up,192.168.2.26}}\u00a0<br>MapiAccessEnabled : True\u00a0<br>ReplicationEnabled : True\u00a0<br>IgnoreNetwork : False\u00a0<br>Identity : DAG1\\MapiDagNetwork\u00a0<br>IsValid : True\u00a0<br>ObjectState : New\u00a0 <\/p>\n\n\n\n<p>All the DAG Network Members were up and not showing errors. I next did a telnet session to EX01 over the default DAG replication port 64327 to see if there would be any connectivity issues to EX01:<\/p>\n\n\n\n<p class=\"has-text-color has-small-font-size has-medium-pink-color\"><strong>telnet&nbsp;EX01&nbsp;64327<\/strong><\/p>\n\n\n\n<p>That test was successful and there were no connectivity issues to EX01 from EX04. Again, there was only ONE database out of eight that was having replication problems. After mulling over the problem, it was decided to restart the MSExchangeRepl service on EX03 <strong>AND<\/strong> EX04 since the error was present on both DAG members. 
We would then suspend and resume the database copy on each of those servers.<\/p>\n\n\n\n<p class=\"has-text-color has-small-font-size has-medium-pink-color\"><em>Run on EX03:<\/em><br><strong>Restart-Service&nbsp;MSExchangeRepl<\/strong><br><strong>Suspend-MailboxDatabaseCopy&nbsp;DAG1DB01\/EX03&nbsp;-Confirm:$False<\/strong><br><strong>Resume-MailboxDatabaseCopy&nbsp;DAG1DB01\/EX03&nbsp;-Confirm:$False<\/strong><br><br><em>Run on EX04:<\/em><br><strong>Restart-Service&nbsp;MSExchangeRepl<\/strong><br><strong>Suspend-MailboxDatabaseCopy&nbsp;DAG1DB01\/EX04&nbsp;-Confirm:$False<\/strong><br><strong>Resume-MailboxDatabaseCopy&nbsp;DAG1DB01\/EX04&nbsp;-Confirm:$False<\/strong> <\/p>\n\n\n\n<p>I then monitored the databases and log drives: the issue was resolved and replication was functioning properly again.<\/p>\n\n\n\n<figure class=\"wp-block-image is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/itblog.ldlnet.net\/wp-content\/uploads\/2019\/02\/DAG_Log_Drive_03-1024x428.png\" alt=\"\" class=\"wp-image-240\" width=\"828\" height=\"345\" srcset=\"https:\/\/itblog.ldlnet.net\/wp-content\/uploads\/2019\/02\/DAG_Log_Drive_03-1024x428.png 1024w, https:\/\/itblog.ldlnet.net\/wp-content\/uploads\/2019\/02\/DAG_Log_Drive_03-300x125.png 300w, https:\/\/itblog.ldlnet.net\/wp-content\/uploads\/2019\/02\/DAG_Log_Drive_03-768x321.png 768w, https:\/\/itblog.ldlnet.net\/wp-content\/uploads\/2019\/02\/DAG_Log_Drive_03.png 1154w\" sizes=\"auto, (max-width: 828px) 100vw, 828px\" \/><figcaption>Log Drive Available Space Returned to Normal for DAG members<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" style=\"text-align:center\">PLEASE COMMENT! I WELCOME SUGGESTIONS, TIPS, ALTERNATIVE TROUBLESHOOTING! 
HAVE A GREAT DAY!<\/h2>\n","protected":false},"excerpt":{"rendered":"<p>I had an issue with a four node DAG where the DR site with two of the DAG members were having replication<\/p>\n<p class=\"link-more\"><a class=\"myButt \" href=\"https:\/\/itblog.ldlnet.net\/index.php\/2019\/02\/19\/exchange-dag-replication-problem-an-established-connection-was-aborted-by-the-software-in-your-host-machine\/\">Read More<\/a><\/p>\n","protected":false},"author":1,"featured_media":161,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[4,3],"tags":[106,9,8,107,13],"class_list":["post-233","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-exchange","category-powershell","tag-dag","tag-exchange","tag-powershell","tag-replication","tag-script","odd"],"_links":{"self":[{"href":"https:\/\/itblog.ldlnet.net\/index.php\/wp-json\/wp\/v2\/posts\/233","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/itblog.ldlnet.net\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/itblog.ldlnet.net\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/itblog.ldlnet.net\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/itblog.ldlnet.net\/index.php\/wp-json\/wp\/v2\/comments?post=233"}],"version-history":[{"count":3,"href":"https:\/\/itblog.ldlnet.net\/index.php\/wp-json\/wp\/v2\/posts\/233\/revisions"}],"predecessor-version":[{"id":243,"href":"https:\/\/itblog.ldlnet.net\/index.php\/wp-json\/wp\/v2\/posts\/233\/revisions\/243"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/itblog.ldlnet.net\/index.php\/wp-json\/wp\/v2\/media\/161"}],"wp:attachment":[{"href":"https:\/\/itblog.ldlnet.net\/index.php\/wp-json\/wp\/v2\/media?parent=233"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/itblog.ldlnet.net\/index.php\/wp-json\/wp\/v2\/categories?post=
233"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/itblog.ldlnet.net\/index.php\/wp-json\/wp\/v2\/tags?post=233"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}