Exchange DAG Replication Problem: An established connection was aborted by the software in your host machine

I had an issue with a four-node DAG in which the two DAG members at the DR site were having replication issues, although technically only one member was affected. The copy queue length was very high and the logs were not committing to the database. The Test-ReplicationHealth cmdlet reported that the copy queue length for the affected database copy was high. None of the other databases were affected, even though this DAG node hosted eight. The log files for that one database were not replicating properly to that one DAG member, which caused the log drives on all the other DAG members to fill up:

EX04 DAG member has high Copy Queue Length
The purple member (EX04) free space is different from the other three DAG members

Circular logging was turned on, but because the database copy was NOT in sync, the logs could NOT truncate properly, which rendered circular logging useless. The workaround used to stave off the issue was to suspend the database copy on the affected DAG member (EX04) and then resume it. The logs would replay and commit to the database copy on that member, but after a short period the same issue would arise again, as shown in this graph:

You can see the other DAG members start dropping in free space

There were absolutely no errors in the Event Viewer pointing to this replication issue. After some research, I ran the following cmdlet, which includes one particular output property that showed me the actual problem:

Get-MailboxDatabaseCopyStatus DAG1DB01 | ft -a -wr Name, Status, IncomingLogCopyingNetwork

Output with the actual error listed for the DR DAG members.

The operative error here was: {An error occurred while communicating with server ‘EX01’. Error: Unable to read data from the transport connection: An established connection was aborted by the software in your host machine.} 

Now even though only EX04 was actually having problems with its log replication, both DR members EX03 & EX04 were having the same problem. Again, there were NO events in event viewer showing this issue. I next did some connectivity tests to EX01 from EX04 even though the error said there was an established connection that was broken.

Ping EX01 -f -l 1472

The -f switch sets the Don't Fragment flag, so the packet must reach the destination whole.
The -l switch sets the size of the payload (buffer) to send, in this case 1472 bytes. Adding the 20-byte IP header and the 8-byte ICMP header gives 1500 bytes, the standard Ethernet MTU.
If this ping succeeds, no router or switch along the path needs to fragment full-size packets. Fragmented or dropped packets on the replication path can cause replication issues.
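
The 1472 figure can be derived from the MTU rather than memorized. The following is a small sketch of that arithmetic; the function name Get-MaxPingPayload is only illustrative, and it assumes an IPv4 header without options.

```powershell
# Illustrative helper (not a built-in cmdlet): largest ICMP payload that fits
# in a single unfragmented IPv4 packet for a given MTU.
function Get-MaxPingPayload {
    param(
        [int]$Mtu = 1500   # standard Ethernet MTU, in bytes
    )
    $ipv4Header = 20       # bytes, assuming no IP options
    $icmpHeader = 8        # bytes
    return $Mtu - $ipv4Header - $icmpHeader
}

# Build the ping command line for a standard 1500-byte MTU
$payload = Get-MaxPingPayload -Mtu 1500
"ping EX01 -f -l $payload"
```

If the path uses a smaller MTU (a VPN tunnel, for example), passing that MTU to the helper gives the largest payload that should still pass with -f set.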

That test passed. I also ran a trace route to make sure there was no packet loss on the route to the replicating server. That test passed as well.

I next checked the DAG network to make sure that all networks were working for replication. In this scenario there was only ONE DAG network; there was NOT a separate replication network. I did not design the DAG, and limitations most likely came into play during the design. In my experience, you set up a separate network for replication only, but if your network has enough bandwidth and the design calls for simplification, you can use a single DAG network.

Get-DatabaseAvailabilityGroupNetwork | fl 

RunspaceId : a1600003-8074-4000-9150-c7800000207f 
Name : MapiDagNetwork 
Description : 
Subnets : {{192.168.1.0/24,Up}, {192.168.2.0/24,Up}} 
Interfaces : {{EX01,Up,192.168.1.25}, {EX02,Up,192.168.1.26},{EX03,Up,192.168.2.25}, {EX04,Up,192.168.2.26}} 
MapiAccessEnabled : True 
ReplicationEnabled : True 
IgnoreNetwork : False 
Identity : DAG1\MapiDagNetwork 
IsValid : True 
ObjectState : New 

All the DAG Network Members were up and not showing errors. I next did a telnet session to EX01 over the default DAG replication port 64327 to see if there would be any connectivity issues to EX01:

telnet EX01 64327

That test was successful and there were no connectivity issues to EX01 from EX04. Again, only ONE database out of eight was having replication problems. After mulling over the problem, we decided to restart the MSExchangeRepl service on EX03 AND EX04, since the error was present on both DAG members. We would then suspend and resume the database copy on the affected servers.

Run on EX03:
Restart-Service MSExchangeRepl
Suspend-MailboxDatabaseCopy DAG1DB01\EX03 -Confirm:$False
Resume-MailboxDatabaseCopy DAG1DB01\EX03 -Confirm:$False

Run on EX04:
Restart-Service MSExchangeRepl
Suspend-MailboxDatabaseCopy DAG1DB01\EX04 -Confirm:$False
Resume-MailboxDatabaseCopy DAG1DB01\EX04 -Confirm:$False

After monitoring the databases and log drives, the issue was resolved and replication started functioning properly.

Log Drive Available Space Returned to Normal for DAG members

PLEASE COMMENT! I WELCOME SUGGESTIONS, TIPS, ALTERNATIVE TROUBLESHOOTING! HAVE A GREAT DAY!

Customize your Default PowerShell CLI Prompt

We all like to customize our Windows desktop: custom colors, icons, wallpaper, etc. Well, IT guy/gal, why not do the same for your PowerShell CLI? I’ve looked around at a few blogs and got some ideas to share with you on customizing your PowerShell CLI.

Now, by default, PowerShell looks in the following directory for your customization file:

C:\Users\(username)\Documents\WindowsPowerShell

It looks for a file called profile.ps1 and, once that file exists, runs it every time you start PowerShell.
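
If the profile file does not exist yet, it can be created from the console. Here is a minimal sketch, assuming Windows PowerShell 5.1 or later, where $PROFILE.CurrentUserAllHosts points to profile.ps1 in that folder. The function name New-ProfileIfMissing is made up for this example.

```powershell
# Illustrative helper: create the profile script (and its folder) if it is missing.
function New-ProfileIfMissing {
    param(
        # Defaults to the "current user, all hosts" profile, i.e. profile.ps1
        [string]$Path = $PROFILE.CurrentUserAllHosts
    )
    if (-not (Test-Path -LiteralPath $Path)) {
        # -Force also creates the parent folder when it does not exist
        New-Item -ItemType File -Path $Path -Force | Out-Null
    }
    return $Path
}

# Usage: create the file if needed, then open it in an editor
# notepad (New-ProfileIfMissing)
```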

You can construct the script within PowerShell ISE or your favorite editor. The following is how I programmed the script to customize my PowerShell prompt:

First, we want to clear the slate on the PowerShell CLI. 
I like to add my contact information for instance.

Second, I run a script that I came across and added to my customization.
It gets me the weather for the local area that I am from. Click the link for details.

How To Uniquify Your PowerShell Console (Scroll to Getting the Weather)

Next, I customize the PowerShell Window Settings so that it is the size and shape that I want.
I set the Directory Location, The Colors, and The Window Sizes.

Next, I wanted to write some text in the window before I get my PowerShell Prompt.
I work at Avanade, so I wanted to put a welcome message, today’s date, and what PS version I was running just because I could.

Lastly, we actually configure the prompt. There are many ways to do this, and I will leave references for you at the bottom of this post so that you can get more details on the commands actually run.
I wanted to have a colorful prompt that stated the company motto and showed the current time.
I also configured the Window Title at the top to show text, the current user, and the current directory with different colors, as that is the expressive part of my brain. 🙂
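
As a rough illustration of those last two steps, here is a minimal sketch of a profile fragment that writes a banner, sets the window title, and defines a colored prompt. The wording, colors, and layout are placeholders of my choosing, not the original script.

```powershell
# Window title: some text, the current user, and the current directory.
# Wrapped in try/catch because not every host exposes a console window.
try {
    $Host.UI.RawUI.WindowTitle = "PowerShell - $([Environment]::UserName) - $(Get-Location)"
} catch { }

# Banner written once, when the profile loads
Write-Host "Welcome! Today is $(Get-Date -Format D)" -ForegroundColor Cyan
Write-Host "PowerShell version $($PSVersionTable.PSVersion)" -ForegroundColor Yellow

# PowerShell calls the function named 'prompt' every time it draws the prompt
function prompt {
    Write-Host "[$(Get-Date -Format 'HH:mm:ss')] " -ForegroundColor Green -NoNewline
    Write-Host "$(Get-Location)" -ForegroundColor Magenta -NoNewline
    return '> '
}
```

The colored parts are written with Write-Host, and only the final '> ' is returned as the prompt string itself.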

Here is the script in its entirety:

Here is the final product when opening PowerShell:

Avanade Custom PowerShell Prompt
My Customized PS Prompt

HAPPY SCRIPTING!
HAPPY VALENTINES DAY!

References:
Customizing your PowerShell Profile
Get-Weather.ps1
Modify your PowerShell Prompt
PowerShell Basics: Console Configuration

Veeam Backup Validation Tool

I ran across an issue with my VM backups saying that they were failing validation and not backing up properly, even though each VM showed success when checking the logs. I was getting a specific error in the backup logs:

Backup files health check has been completed
Failed to perform backup file verification Error: Data error (cyclic redundancy check). Failed to read data from the file [B:\Backups\LDLNET Other Backup\LDLNET BackupD2019-01-12T001234.vbk]. Agent failed to process method {Signature.FullRecheckBackup}.

So, I did some research and found a little known tool that is used to manually validate the Veeam backup files, basically because it’s a tool usually executed only by the technical support staff. It is located in the following folder (Version 9.5.0.1922):

C:\Program Files\Veeam\Backup and Replication\Backup\Veeam.Backup.Validator.exe

Its main use case is to verify the consistency of a backup created with Veeam Backup & Replication. It is NOT SureBackup, which performs a different kind of check by starting the VM from the backup file and is certainly more reliable. But if you do not want to start a SureBackup activity, or if you only have a Standard license that lacks SureBackup, this tool can be a good alternative. You can also use it to check a backup file after it has been moved, or after a consistency problem on the storage holding those files.

The command switches are listed here for the executable

Now since I had a specific file that was showing an error, I wanted to run the command against that file to validate the backup. Here is the command that I ran:

.\Veeam.Backup.Validator.exe /file:"B:\Backups\LDLNET Other Backup\LDLNET BackupD2019-01-12T001234.vbk"

It started running against the file and it did fail as it did in the log files from the backup process. Here is the error message:

Skipping VM ‘B:\Backups\LDLNET Other Backup\LDLNET BackupD2019-01-12T001234.vbk ‘: File “LDLNET-VM01-flat.vmdk” is corrupted. Data error (cyclic redundancy check).
Failed to read data from the file [B:\Backups\LDLNET Other Backup\LDLNET BackupD2019-01-12T001234.vbk ].

Now, when looking at the backup job, I found that the file listed was the original full backup that I had completed when I originally changed the job for all the new VMs that were now listed in the backup job. Since that was the case and I did not have any VMs that were in a bad state, I deleted all the backup files from my storage and started another full backup of the VMs in the job.

Using the CLI tool to manually validate the backup file was very helpful in this case. It helped me decide to clear out a backup chain that would not restore properly, even with the incremental backups, since the base full file was corrupt.

References:
Veeam Backup Validator: check the consistency of your backup files

Connect to all PowerShell Modules in O365 with one script

Let’s say you’re an admin that needs to connect to Office365 via PowerShell often. Now, there are many different websites or blogs that will show you how to connect to each session via PowerShell. That can cause a headache since you can end up having five different PowerShell sessions running in five different windows. You end up having to enter a username and password all those times, which can become time consuming.

I want to show you here how to combine all those sessions into one script where, if your security is tight enough on your computer, you don’t even have to enter credentials. This way, you can click on one icon and pull up all the O365 PowerShell commands that you’ll need to manage your organization.

First you need to download the following PowerShell module installation files so that your PowerShell environment will have the correct modules installed:

Microsoft Online Service Sign-in Assistant for IT Professionals RTW
Windows Azure Active Directory Module for Windows PowerShell v2
SharePoint Online Management Shell
Skype for Business Online, Windows PowerShell Module

Next, we want to set up the CLI (Command Line Interface) to be too cool for school. I have learned it helps to know how to customize the CLI window. You can do all of this in PowerShell ISE or Notepad, whichever you prefer. Here are the commands for the script that I use to set up the CLI:

Next, you want to set your Execution Policy and put in your credentials so that you won’t be prompted to enter the user credentials when you run the script.

NOTE: MAKE SURE YOU KEEP YOUR SCRIPT SAFE AS THE CREDENTIALS ARE VISIBLE WITHIN THE SCRIPT IN PLAIN TEXT!

You can, alternatively, set your script to prompt for credentials every time by using the following:

$LiveCred = Get-Credential

Here is that part of the script:
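
A minimal sketch of what the stored-credential approach could look like follows. The user name and password are hypothetical placeholders, and the variable name $LiveCred matches the Get-Credential alternative above. Setting the execution policy is left out of the sketch.

```powershell
# Hypothetical placeholder values: replace them with your own account.
# WARNING: anyone who can read this file can read the password.
$User     = 'admin@contoso.onmicrosoft.com'
$PlainPwd = 'P@ssw0rd-placeholder'

# Convert the plain-text password to a SecureString, then wrap the user name
# and password in a PSCredential object that the Connect-* cmdlets accept.
$SecurePwd = ConvertTo-SecureString -String $PlainPwd -AsPlainText -Force
$LiveCred  = New-Object System.Management.Automation.PSCredential ($User, $SecurePwd)
```

Either way, the rest of the script only needs $LiveCred, so switching between stored and prompted credentials is a one-line change.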

Now we get into the importing of the modules for each O365 service:

Get the MSOnline Module:

Connect to the MSOnline Service:

Connect to Azure AD PowerShell:

Connect to SharePoint Online PowerShell:
NOTE – MAKE SURE YOU CHANGE TO YOUR COMPANY NAME IN THE URL!!

Connect to Exchange Online PowerShell:

Connect to Skype For Business Online PowerShell:

Connect to the Security & Compliance PowerShell:
NOTE – This one I still get “Access Denied” when trying to connect. I have looked for an answer to that issue, but have not found one. Please comment with a link if you have an answer so that I can update this script!
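
The connection steps above can be sketched roughly as follows. This assumes the modules listed earlier are installed, that $LiveCred holds a PSCredential, and that basic-authentication remote PowerShell is still allowed for the tenant. The helper name Get-SpoAdminUrl and the variable $ConnectAll are made up for this example, and the Security & Compliance connection is left out because it did not work for me.

```powershell
# Illustrative helper: build the SharePoint Online admin URL from the tenant (company) name
function Get-SpoAdminUrl {
    param([Parameter(Mandatory)][string]$TenantName)
    return "https://${TenantName}-admin.sharepoint.com"
}

# Nothing inside this script block runs until it is dot-sourced with:  . $ConnectAll
# It expects $LiveCred and $TenantName to exist in the calling scope.
$ConnectAll = {
    # MSOnline and Azure AD
    Import-Module MSOnline
    Connect-MsolService -Credential $LiveCred
    Connect-AzureAD -Credential $LiveCred

    # SharePoint Online
    Connect-SPOService -Url (Get-SpoAdminUrl -TenantName $TenantName) -Credential $LiveCred

    # Exchange Online through a remote PowerShell session
    $exoSession = New-PSSession -ConfigurationName Microsoft.Exchange `
        -ConnectionUri 'https://outlook.office365.com/powershell-liveid/' `
        -Credential $LiveCred -Authentication Basic -AllowRedirection
    Import-PSSession $exoSession -DisableNameChecking | Out-Null

    # Skype for Business Online
    Import-Module SkypeOnlineConnector
    $sfbSession = New-CsOnlineSession -Credential $LiveCred
    Import-PSSession $sfbSession | Out-Null

    Write-Host 'All O365 sessions loaded.' -ForegroundColor Green
}
```

Dot-sourcing the block (a dot, a space, then $ConnectAll) runs it in the current scope, so the imported commands stay available in the window afterwards.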

Lastly, put in a note to show that the PS load is completed:

So Here is the final script in its entirety:

Now you can create your icon for your desktop so that you can easily access the script. I would save the script to your Scripts directory.

That will usually be C:\Users\’username’\Documents\WindowsPowerShell\Scripts or whichever directory you choose.

To start, right click the desktop and choose New > Shortcut
In the Target Field, enter the following for your PowerShell Shortcut, pointing to the path of your script:

C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe -noexit -ExecutionPolicy Unrestricted -File "C:\Users\username\Documents\WindowsPowerShell\Scripts\ConnectO365All.ps1"

Click on the Advanced button and check the box: Run As Administrator
Under the General Tab, name your shortcut: (CompanyName) O365 All PowerShell
Click OK to save the shortcut to your desktop.

LAST BUT NOT LEAST, RUN THE FOLLOWING COMMAND BEFORE EXITING OR CLOSING YOUR POWERSHELL WINDOW. THIS WILL REMOVE ALL THE SESSIONS YOU’VE CONNECTED TO:

Get-PSSession | Remove-PSSession

HAPPY SCRIPTING!
LEARN, DO, LIVE!

References:
Connect to all O365 Services in one PowerShell Window
How to connect to all O365 Services through PowerShell
Connecting to Office 365 “Everything” via PowerShell

Checking Drive Space Volumes for DAG DB members through PowerShell

I had received a weird alert for a DB volume for a DAG member being below threshold. This was odd to me due to the fact that there were four DAG members and we only received an alert for one. I went into Azure Log Analytics and ran the following query to render a graph for the past 14 days showing the percent free space of the volume for all the DAG members.

Thanks Georges Moua for the query script!

Now the reason I can run the query this way is due to the fact that the Design of the DAG was correctly done and the DB folders are identical on all DAG members. The query rendered the following chart:

As you can see the Green DAG member is way below the other DAG members.

I next went to an Exchange Server in the DAG and got the volume data for all the members in the DAG:

EX02’s volume free space is far below the other DAG members

I went on EX02 and found that there was a subfolder named “Restore” that was not present on the other servers. I ran the following script to get the size of that folder in GB:
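
A minimal sketch of such a folder-size check is shown here. The function name Get-FolderSizeGB and the example path are made up for this illustration.

```powershell
# Illustrative helper: total size of a folder, including subfolders, in GB.
function Get-FolderSizeGB {
    param([Parameter(Mandatory)][string]$Path)

    # Sum the Length of every file under the folder (hidden files included)
    $bytes = (Get-ChildItem -LiteralPath $Path -Recurse -File -Force -ErrorAction SilentlyContinue |
              Measure-Object -Property Length -Sum).Sum
    if ($null -eq $bytes) { $bytes = 0 }   # empty folder

    return [math]::Round($bytes / 1GB, 2)
}

# Usage (hypothetical path):
# Get-FolderSizeGB -Path 'D:\DAG1DB01\Restore'
```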

The folder size was 185 GB. Removing that folder, along with all subfolders/files, would balance the free space to the other DAG members. I ran the following cmdlet to remove the folder and all subfolders/files:
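
A cautious way to do the removal is sketched here. The wrapper name Remove-FolderTree is made up for this example; the actual work is done by Remove-Item with -Recurse and -Force. Supporting -WhatIf makes it possible to preview the deletion before committing to it.

```powershell
# Illustrative wrapper around Remove-Item that supports -WhatIf and -Confirm.
function Remove-FolderTree {
    [CmdletBinding(SupportsShouldProcess)]
    param([Parameter(Mandatory)][string]$Path)

    if (-not (Test-Path -LiteralPath $Path)) { return }

    if ($PSCmdlet.ShouldProcess($Path, 'Remove folder and all of its contents')) {
        # -Recurse removes subfolders and files; -Force includes hidden and read-only items
        Remove-Item -LiteralPath $Path -Recurse -Force
    }
}

# Usage (hypothetical path): preview first, then delete
# Remove-FolderTree -Path 'D:\DAG1DB01\Restore' -WhatIf
# Remove-FolderTree -Path 'D:\DAG1DB01\Restore'
```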

This remediated the alert and balanced the drive space across all DAG members.

POST YOUR COMMENTS OR QUESTIONS!
HAPPY TROUBLESHOOTING!

Event 11022 with MSExchangeTransport – Easy Validation Test

In a hybrid environment, mail is transported between the cloud and on premises through connectors. By default, this is done over a TLS (Transport Layer Security) connection. TLS is the successor to SSL: it uses certificates to authenticate the servers and encrypt the data between the two organizations in a hybrid configuration.

Because you are using certificates, each certificate must be validated properly and checked to see whether it has expired or been revoked by the issuing certificate authority. A certificate revocation list (CRL) is published and updated regularly for this purpose. If the receiving organization cannot check the revocation status of the certificate, it will not establish a TLS connection with the connecting organization. You will then get the following event:

Event 11022
MSExchangeTransport
Error:
Failed to confirm domain capabilities ‘mail.protection.outlook.com:AcceptOorgProtocol’ on connector ‘Inbound from Office 365’ because validation of the Transport Layer Security (TLS) certificate failed with status ‘RevocationOffline’. Contact the administrator of ‘mail.protection.outlook.com’ to resolve the problem, or remove the domain from the TlsDomainCapabilities list of the Receive connector.

Most likely, a network issue is preventing the on-premises organization from retrieving the revocation file with the certificate information. Since it cannot retrieve that file, it stops the transport connection and throws the error.

A simple way to validate the connector and confirm transport from Office365 is to run the following cmdlet from the on-premises server that performs the connection:

Again, I like to add the other cmdlets write-host, hostname, and date to make it easy to document when working an incident.

From the highlighted text, we can see the test was successful.

The test runs a connection for each connector and tests the validity of each connector. If a success is returned, then we know that the certificate was validated and the connection was established through the connector from Office365.

If you get a failure though, you will need to run tests to see if you can pull the revocation list for the certificate as well as a simple test to connect to Office365:

Connect to Exchange Online via Powershell

IMPORTANT NOTE

I wanted to put some information on how to pull the CRL Distribution Point for the Office365 so that you could run an Invoke-WebRequest to pull the CRL file from the Distribution Point, but I have NOT found a single way through Powershell to pull that information. I have searched multiple posts and articles showing all these advanced methods of using certutil and PowerShell to get a bunch of other information, but NOTHING on how to pull the URL for the CRL file from the certificate. Doing a Get-ChildItem for the certificate using the Thumbprint does NOT pull that property from the certificate. Now, if you have a cmdlet that WILL do that, PLEASE POST!

So, in essence, to troubleshoot if you can get to the CRL file, you get the URL for the CRL Distribution Point from the GUI Properties of the certificate. Then you run the following cmdlet in PowerShell:
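
A minimal sketch of that download test follows, assuming the CRL URL has been copied from the certificate's properties. The function name Test-CrlDownload and the example URL are hypothetical.

```powershell
# Illustrative helper: try to download a CRL file and report success or failure.
function Test-CrlDownload {
    param(
        [Parameter(Mandatory)][string]$Url,
        [string]$OutFile = (Join-Path ([System.IO.Path]::GetTempPath()) 'downloaded.crl')
    )
    try {
        # -ErrorAction Stop turns download failures into catchable errors
        Invoke-WebRequest -Uri $Url -OutFile $OutFile -UseBasicParsing -TimeoutSec 15 -ErrorAction Stop
        return $true
    }
    catch {
        Write-Warning "Could not retrieve CRL from ${Url}: $($_.Exception.Message)"
        return $false
    }
}

# Usage (replace with the URL from the certificate's CRL Distribution Points field):
# Test-CrlDownload -Url 'http://crl.example.com/example.crl'
```

A result of $false from the server that owns the connector points to the kind of network or proxy problem that produces the RevocationOffline status.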

POST COMMENTS!
HAPPY TROUBLESHOOTING!

What the Hybrid Configuration Wizard Performs in the background and configuring Hybrid Co-Existence with Exchange Online

I’m working on getting certified in Exchange Hybrid Scenarios and Exchange Online configuration as part of my skill set for Exchange. In doing so, I had successfully implemented a complete Full Hybrid Exchange Environment between my Exchange Online Tenant and my On Premises Exchange 2019 Environment last evening.

I wanted to give an update that was posted to my LinkedIn Posting on this. Thank you Brian Day for the vote of confidence and caution that running these cmdlets manually is not supported by Microsoft and that the HCW, like all the Online Microsoft Products, is constantly changing and being updated.

Important Note

As preparation, I bought some Exchange Online Plan 1 licenses, which give me a 50 GB mailbox limit and basic mailbox functionality. The plan does not include the more advanced features such as ATP or DLP; I am running most of those features through my On Premises Environment. I mainly wanted to be able to place mailboxes in the cloud and have a hybrid setup. My plan was to have mail flow continue through my On Premises environment so that my Exchange Server features would be used and I would not have to change any MX or SPF records. I also had my certificates in place for SSL and OWA, so I wanted to keep mail flow routed that way, through on premises. I do want to be able to have Free/Busy lookups cross-premises, so federation would have to be enabled as well. I would also have to enable the MRS Proxy on my Exchange Server so that mailbox migration could be implemented cross-premises. I had also previously configured Azure AD Sync along with ADFS for Single Sign On. In my case, another server was not needed, as I didn’t have enough mailboxes or a real need to split my frontend and backend deployment. Running the Hybrid Configuration Wizard would not open any new ports or change any existing port traffic that was already configured on my firewall. These are just a few of the considerations that need to be looked at when considering a hybrid integration.

Here is a great article to read for the prerequisites
Exchange Hybrid Deployment Pre-requisites

So, once I had all those considerations handled in my design, I ran the Hybrid Configuration Wizard. What I want to do in this blog post is go through the steps that the wizard performs in the background to set up the Hybrid Environment as you go through the Wizard.

I mainly used the following blog post as a reference, but have approached it differently by diving into the cmdlets that are run during the process:

1. The HCW validates the On-premises and Online Exchange Connection.

The Hybrid Configuration Wizard checks if it is possible to connect to both servers with PowerShell. It runs the Get-ExchangeServer cmdlet on premises after resolving the server in DNS. It then connects to Exchange Online, authorizing the connection:

Authority=https://login.windows.net/common Resource=https://outlook.office365.com ClientId=abcdefgh-a123-4566-9abc-2bdflancelin

2. The HCW collects data about Exchange configuration from the on-premises Active Directory

The Wizard gathers information about the local domain. In order to do that, the HCW executes a series of cmdlets.

These include, in order:

3. The HCW collects information on the Exchange online (Office 365) configuration

This task repeats what has been done in the previous step, only for the Exchange online, instead of the on-premises one.

The cmdlets include, in order:

4. Federation Trust is determined. If not present, a new Federation Trust and the required certificate will be created on the local Exchange Server

You will be prompted in the Wizard to create a Federation Trust if not present. The following articles explain Federation and its requirements:

Understanding Federation – Link Here
Understanding Federated Delegation – Link Here
Create a Federation Trust – Link Here

If the activity is finished successfully, a new certificate should appear on the on-premises Exchange Certificates list. The new certificate includes “Federation” in its Subject field. To make sure the certificate is there, you can run a cmdlet: Get-ExchangeCertificate | ft -a -wr


The results will look like this

5. The HCW creates a new Hybrid Configuration Object in the local Active Directory

The HCW will run cmdlets based on the information you provide in the HCW for the certificate, the on premises Exchange Server, the domain(s), and what features you want turned on:

It then checks the settings through the following cmdlets:

It then enables Organization Customization for both environments through this cmdlet:

6. Configuration is then completed to modify the settings on the on premises Exchange environment 

EmailAddressPolicy – HCW adds address @tenant.mail.onmicrosoft.com
The HCW configures remote domains – adds tenant.mail.onmicrosoft.com and tenant.onmicrosoft.com
The HCW adds a new accepted domain – adds tenant.mail.onmicrosoft.com

Some of the cmdlets run:

7. The HCW Configures the Organization Relationship between the local server and the cloud.

This configuration is not necessary in minimal hybrid deployment. Since I have a full hybrid deployment configured, the cmdlets were run as needed to configure it. Thanks to the correct configuration, it is possible to synchronize free/busy status of mailboxes and their elements between the on-premises Exchange Environment and Exchange online. 

Some of the cmdlets run in the process:

8. The HCW and setting connectors on both Exchange servers

The HCW checks to see whether the connectors are there; if not, it sets them up. During this workflow, four connectors are set: one receive and one send connector for each server. Those connectors guarantee the mail flow between the on-premises environment and Exchange Online.

Some of the cmdlets run in the process:

The Intra-Organization is set as well:

9. The HCW configures OAuth Authentication across the Hybrid

This LINK explains how OAuth is configured between Exchange On Premises and Exchange Online. It’s a very good article to read as it shows how to get the Modern Authentication style working. Now the HCW does this for you and at the end of the article, you can run cmdlets to test the validity of the configuration.

If you want to go into a deep dive about how the Hybrid Authentication works, see the following:
Deep Dive Into Hybrid Authentication – from the MS Exchange Team Blog

Here are some of the cmdlets run during this workflow:

Again, look at both of those links to get a little more detail as to what each cmdlet does and how it sets up OAuth. Here are the two cmdlets used to test OAuth:

10. Enable MRS Proxy for Migration

In order to move mailboxes between Exchange On Premises and Exchange Online, you have to enable the MRSProxy (Mailbox Replication Service proxy) on the Exchange Web Services Virtual Directory. You also have to set your EWS Virtual Directory to use Basic Authentication. You’ll want to do this before running the HCW, or else you will receive the following error when the HCW validates the Migration setup and configuration:

Microsoft.Exchange.Migration.MigrationServerConnectionFailedException: The connection to the server ‘mail.ldlnet.net’ could not be completed. —> Microsoft.Exchange.MailboxReplicationService.RemoteTransientException: The call to ‘https://mail.ldlnet.net/EWS/mrsproxy.svc’ failed. Error details: The HTTP request was forbidden with client authentication scheme ‘Negotiate’. –> The remote server returned an error: (403) Forbidden.. —> Microsoft.Exchange.MailboxReplicationService.RemotePermanentException: The HTTP request was forbidden with client authentication scheme ‘Negotiate’. —> Microsoft.Exchange.MailboxReplicationService.RemotePermanentException: The remote server returned an error: (403) Forbidden.
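
Enabling the MRS Proxy and Basic Authentication on the EWS virtual directory can be sketched as follows. It is wrapped in a script block so that nothing runs until it is invoked in the on-premises Exchange Management Shell. The variable name $EnableMrsProxy and the server name EX01 are placeholders.

```powershell
# Invoke in the on-premises Exchange Management Shell with:  & $EnableMrsProxy 'EX01'
$EnableMrsProxy = {
    param([string]$Server)

    # Default EWS virtual directory on the given server
    $ews = "$Server\EWS (Default Web Site)"

    # Turn on the MRS Proxy endpoint and Basic Authentication
    Set-WebServicesVirtualDirectory -Identity $ews -MRSProxyEnabled $true -BasicAuthentication $true

    # Read the settings back to confirm the change
    Get-WebServicesVirtualDirectory -Identity $ews |
        Format-List Identity, MRSProxyEnabled, BasicAuthentication
}
```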

Some of the cmdlets run to test Migration and MRS Proxy Settings are as follows:

11. Final HCW Configuration and cleanup.

The HCW runs some final cmdlets to finish up the installation of the Hybrid environment. Here are the cmdlets run:

All this information was found in the setup logs that are in the following directory
C:\Users\%username%\AppData\Roaming\Microsoft\Exchange Hybrid Configuration

REFERENCES
Understanding Federation
Understanding Federated Delegation
Create a Federation Trust
Hybrid deployment prerequisites
Exchange Specific OAuth 2.0 Protocol Specification
Understanding WS-Security
JSON Web Tokens
Using OAuth2 to access Calendar, Contact and Mail API in Office 365 Exchange Online
Configurable token lifetimes in Azure Active Directory (Public Preview)
OAuth Troubleshooting
Principles of Token Validation
Troubleshooting free/busy issues in Exchange hybrid environment
How to configure Exchange Server on-premises to use Hybrid Modern Authentication
Microsoft 365 Messaging Administrator Certification Transition (beta)
Microsoft 365 certification exams
Exchange Server build numbers and release dates

PLEASE LEAVE QUESTIONS, COMMENTS, UPDATES! I WOULD LOVE TO HEAR FROM YOU!

The Complete Guide To PowerShell Punctuation

In this short little snippet, I wanted to provide a good download that I had found. In my searching for more knowledge through PowerShell, I found this great PDF file that has listed all the different symbols used in PowerShell Scripting and their meanings.

Here is the download:

Example of the available file.

Please feel free to download it and use it in your education of PowerShell. I know that I will be using it. The website I got it from has some good explanations of functions and punctuation for PowerShell as well. See below…

Reference:
PowerShell Punctuation

Measuring CPU Processor Times Per Core Across Multiple Servers through PowerShell.

I want to thank Jason Field for the bulk of this script!

Our team was presented with an issue where we needed to measure the CPU percentage processor time for each core within the physical processor and be able to output that data quickly through PowerShell. We all know that Performance Monitor can do this through a GUI, but it is very difficult to output that data to a file in a form that can be easily read. We had an original PowerShell cmdlet that would accomplish this for the total percent processor time for all cores over a one-minute period:

Our challenge was to be able to do this per core, over the same time period, and get the average for each core, so that we could measure the output accordingly. I had been working on a script that was able to run the command, but not in a parallel fashion. The script was running in sequence and was taking way longer than one minute to complete. Frustrating to say the least.

In comes Jason Field, showing me the meaning and value of the back tick, as well as how to run and monitor jobs in a PowerShell script so that the task could be completed, as needed, across a server array.

The main purpose of the back tick (`) here is to escape the $ in front of a variable name inside a double-quoted string. That way the variable is evaluated in the script block on the remote server instead of being filled in before the script block is created.
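
A small sketch of that escaping behavior follows. The variable names are made up for this example, and the script block is built from a string with [scriptblock]::Create.

```powershell
$server = 'EX01'

# Without a back tick the variable is expanded as soon as the string is built
$expanded = "Server is $server"     # Server is EX01

# With a back tick the $ is kept as literal text
$escaped  = "Server is `$server"    # Server is $server

# The same idea when building a script block from a string:
# $samples is filled in now, while `$total is left for the block to evaluate when it runs.
$samples = 5
$sb = [scriptblock]::Create("`$total = $samples * 2; `$total")

$sb.ToString()   # $total = 5 * 2; $total
& $sb            # 10
```

A block built this way can be handed to Invoke-Command -ScriptBlock, so the escaped variables are only resolved on the machine that runs it.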

Here is the script:

Sample Output from the Script.

What I learned from this script is that the back tick ( ` ) keeps selected variables from being expanded while the script text is being built, so the commands inside the job can create and use their own variables when the job actually runs. This gets past the multi-threading issue I was having with my original script. The script can then be run across multiple servers using the Invoke-Command cmdlet or over a remote PowerShell session as a Job. The jobs can then be monitored as they finish across the servers within the time period given for the samples. I modified the script to take multiple samples and then average the CookedValue, as in the original cmdlet. I could not, however, get the ExpandProperty parameter to work with the script.
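
The general shape of the approach described above can be sketched as follows. This is not Jason's script. It assumes Windows servers with PowerShell remoting enabled, the function name Get-AverageByInstance is made up, and the server names are placeholders.

```powershell
# Illustrative helper: average CookedValue per counter instance
# (one instance per core, plus _Total).
function Get-AverageByInstance {
    param([Parameter(Mandatory)][object[]]$Samples)

    $Samples | Group-Object -Property InstanceName | ForEach-Object {
        [pscustomobject]@{
            Instance   = $_.Name
            AveragePct = [math]::Round(
                ($_.Group | Measure-Object -Property CookedValue -Average).Average, 2)
        }
    }
}

# Script block for each server: sample every core once per second for one minute
$PerCoreSampler = {
    (Get-Counter -Counter '\Processor(*)\% Processor Time' -SampleInterval 1 -MaxSamples 60).CounterSamples
}

# Usage: run on all servers in parallel as a job, then average per server
# $job     = Invoke-Command -ComputerName 'EX01', 'EX02' -ScriptBlock $PerCoreSampler -AsJob
# $samples = Receive-Job -Job $job -Wait
# $samples | Group-Object PSComputerName | ForEach-Object {
#     "Server: $($_.Name)"
#     Get-AverageByInstance -Samples $_.Group
# }
```

Because Invoke-Command -AsJob starts the sampling on every server at the same time, the whole run takes about as long as one server's sample window instead of one window per server.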

Please, as I am still learning, if you see an error with this script, please alert me with a comment or contact me directly so that I can update the script properly. 

Thanks again Jason! It really was like a light bulb going “DING” when I figured it out with concern to the back tick. I had also been having issues with the active job monitoring process. It was a real help!

PLEASE COMMENT, SHARE, AND HAPPY TROUBLESHOOTING!

Update Edge Server Certificate in a Hybrid Exchange Environment

At work, our group was updating the Exchange Edge Server certificates and having mail flow problems causing messages to be in the Poison Queue and not transfer to Office365 properly. We finally got the procedure down to where it started working. I wanted to post that procedure here since I had never really worked with Edge Servers in the past. If this post can help you in the future, then “I done good!”

Now, everywhere I had read said that you have to remove and then re-create the Edge Subscription between your Transport Servers and the Edge Servers when changing the certificate.

Here is why:
When we subscribe the Edge server, an AD LDS account called the EdgeSync Bootstrap Replication Account (ESBRA) is created. It is created using the private key of the default certificate assigned to the SMTP service. As long as that certificate is in place, the Transport servers can authenticate to the Edge server and replicate the required information to the AD LDS (ADAM) database.

When we install a third-party certificate and assign the SMTP service to it, we overwrite the current certificate; in other words, we change the default SMTP certificate. Once that happens, the current Edge Subscription fails, because the Edge server can no longer use the new certificate's private key to decrypt the ESRA account credentials passed to it when communicating with the Transport servers.

So, once you have your new 3rd party certificate, you install it to your edge servers:
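
A minimal sketch of the import step, assuming the certificate was delivered as a password-protected PFX file. The file path is a placeholder, not the one used in this environment, and the `-Encoding Byte` form assumes the Windows PowerShell based Exchange Management Shell.

```powershell
# Run in the Exchange Management Shell on each Edge server.
# 'C:\Certs\edge-cert.pfx' is a hypothetical path.
$pfxPassword = Read-Host -Prompt 'PFX password' -AsSecureString

Import-ExchangeCertificate `
    -FileData ([byte[]](Get-Content -Path 'C:\Certs\edge-cert.pfx' -Encoding Byte -ReadCount 0)) `
    -Password $pfxPassword
```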

Then, you enable the Exchange Certificate to be used for SMTP:
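
As a sketch, enabling the certificate for SMTP could look like the following. The thumbprint is a placeholder you would copy from the `Get-ExchangeCertificate` output.

```powershell
# List installed certificates to find the thumbprint of the new one.
Get-ExchangeCertificate | Format-List Thumbprint, Subject, NotAfter, Services

# '<thumbprint>' is a placeholder for the value returned above.
Enable-ExchangeCertificate -Thumbprint '<thumbprint>' -Services SMTP
```

Exchange will prompt before overwriting the existing default SMTP certificate; answering yes is what triggers the subscription break described above.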

Mail flow will be broken at this point. Since messages were going to the poison queue due to the ESBRA account encryption failing when authenticating with the internal Transport Servers, I had to completely stop transport by disabling the Send Connectors between the internal Transport Servers and the Edge servers from the Transport Server.
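
A hedged sketch of disabling the connectors from the Transport server. The `EdgeSync*` name filter is an assumption based on the default connector naming; adjust it to match the connector names in your environment.

```powershell
# Run on the internal Transport server.
# The 'EdgeSync*' pattern is an assumption; check Get-SendConnector output first.
Get-SendConnector |
    Where-Object { $_.Name -like 'EdgeSync*' } |
    ForEach-Object { Set-SendConnector -Identity $_.Identity -Enabled $false }
```

The same loop with `-Enabled $true` turns the connectors back on once synchronization has completed.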

There were two servers in the Edge farm. Since one of the servers had not had a proper sync in a while, I decided to remove the recipient database that had been replicated to the failing server when removing its Edge Subscription. On the other server, I left the recipient database in place so that we could get one server up and running quickly, since transport was stopped at this point.

Here is the command that was run to remove the Edge Subscriptions. This needed to be completed on both the Edge Servers and the corresponding Transport Server:
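
A minimal sketch of the removal, assuming an Edge server named `EDGE01` (a hypothetical name):

```powershell
# Run on the Edge server and again on the Transport server.
# 'EDGE01' is a placeholder for the Edge server's name.
Get-EdgeSubscription | Format-List Name, Site, Domain

Remove-EdgeSubscription -Identity 'EDGE01'
```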

I then had to create a new Edge Subscription file on each Edge Server to copy to the Transport Server. I already had connectors set so I did not need to recreate those connectors.
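
On each Edge server, creating the subscription file could look like this sketch. The output path is a placeholder.

```powershell
# Run on each Edge server. The file path is hypothetical.
New-EdgeSubscription -FileName 'C:\EdgeSubscriptions\EDGE01.xml'
```

The subscription file contains time-limited bootstrap credentials, so it needs to be imported on the Transport server soon after it is created.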

I copied the XML file from each Edge Server to the Transport Server and ran the following cmdlet to create the Edge Subscription to the Edge Servers. I then had the Edge Servers rebooted for good measure before redoing a full manual EdgeSync.

I next had to perform a full manual EdgeSync from the Transport Server to the Edge Servers to ensure that the recipient database on the AD LDS instance was up to date and that the send connectors were replicated properly.
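
A sketch of the import on the Transport server, assuming the file was copied to a local folder and that the Active Directory site is named `Default-First-Site-Name`; both values are placeholders.

```powershell
# Run on the internal Transport server, once per Edge subscription file.
# The path and site name are assumptions for illustration.
New-EdgeSubscription `
    -FileData ([byte[]](Get-Content -Path 'C:\EdgeSubscriptions\EDGE01.xml' -Encoding Byte -ReadCount 0)) `
    -Site 'Default-First-Site-Name'
```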
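
As a sketch, a forced full synchronization from the Transport server could be started and checked like this:

```powershell
# Run on the internal Transport server.
# -ForceFullSync pushes all configuration and recipient data again
# instead of only the changes since the last sync.
Start-EdgeSynchronization -ForceFullSync

# Check the result of the synchronization afterwards.
Test-EdgeSynchronization
```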

I next had to re-run the Hybrid Configuration Wizard so that I could configure the Edge Servers as the transport for Hybrid cloud-bound Messages. Once the Edge Servers were chosen to transport Hybrid cloud-bound messages, I selected the new Edge Certificate so that transport would work properly when re-enabled and O365 would recognize the new certificate for Hybrid messages bound for the cloud.

I next re-enabled the Edge Send Connectors so that mail flow would begin working once the Full Edge Synchronization was completed. You have to let that complete before you can begin mail flow again so that messages won’t be delivered to the Poison Queue.

Mail flow began working. It took about 90 minutes for all the queues to clear properly that had queued messages waiting to transport. Any Poison Queued messages were removed with NDRs sent to the senders.

It was a doozy to say the least. Happy Troubleshooting!
Leave Comments or Questions you may have!

References:
Exchange 2010 Edge Transport Server: Configuring EdgeSync
Mail flow breaks after renewing SSL Certificate on Edge server with Edge Subscription
Start-EdgeSynchronization

PowerDNS Script

I was compiling some scripts to be able to modify DNS records in my previous post. While browsing through different scripts in the TechNet Gallery, I came across the following Script that provides a menu, options, and different settings which really make it a great script to use if you do a lot of DNS Modification and want to do it through PowerShell.

Here is the link to the original script page, but I have updated and modified the script to include being able to add/remove DNS Zones as well.

The DNS Zone functions have not been tested as of yet. I still have to get on my server farm at home and run this. It will save time though with me having to switch servers when adding a bulk list of DNS zones for my website farm. Play with the script and let me know what you think!

Removing a DNS Record through Powershell

In most environments, an admin usually just jumps on the server that they need to work from and does their work from there. An example of this would be an admin working on an IIS web server who needs to remove a DNS A record without having to log on to the DNS server itself, so that they can quickly make their changes in IIS.

A quick way to do this would be to run the following ps1 script in PowerShell in order to be able to remove the record quickly:
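
A minimal sketch of such a script, assuming the DnsServer PowerShell module (part of the RSAT DNS tools) is installed on the machine you are working from. The DNS server name is a placeholder; the zone and record follow the test.ldlnet.local example used in this post.

```powershell
Import-Module DnsServer

# 'dns01.ldlnet.local' is a hypothetical DNS server name.
$DnsServer  = 'dns01.ldlnet.local'
$ZoneName   = 'ldlnet.local'
$RecordName = 'test'

# Show the record first so you can confirm what will be removed.
Get-DnsServerResourceRecord -ComputerName $DnsServer -ZoneName $ZoneName -Name $RecordName -RRType A

# Remove the A record without a confirmation prompt.
Remove-DnsServerResourceRecord -ComputerName $DnsServer -ZoneName $ZoneName -Name $RecordName -RRType A -Force
```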

Sample Output from the Script removing DNS A Record: test.ldlnet.local

Now this works for a single DNS A Record. If there are multiple IPs for the same DNS record, for example, test.ldlnet.local points to both 192.168.1.23 and 192.168.1.24, then you probably need to run the following script listed here to keep the script from failing with an error. I have also expanded the entries to help the input be more specific:
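
One way to target a single IP, sketched with the DnsServer module's `-RecordData` parameter. The DNS server name is again a placeholder; the IP addresses are the ones from the example above.

```powershell
Import-Module DnsServer

# 'dns01.ldlnet.local' is a hypothetical DNS server name.
$DnsServer  = 'dns01.ldlnet.local'
$ZoneName   = 'ldlnet.local'
$RecordName = 'test'
$RecordIP   = '192.168.1.24'

# Only the A record pointing at $RecordIP is removed;
# the record for 192.168.1.23 is left in place.
Remove-DnsServerResourceRecord -ComputerName $DnsServer -ZoneName $ZoneName `
    -Name $RecordName -RRType A -RecordData $RecordIP -Force
```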

Output from RemoveDNSRecord.ps1 for removing DNS A Record test.ldlnet.local with IP of 192.168.1.24

I have found some other good scripts that I will post to the blog to help manage DNS records through PowerShell. This should get things started for now. Happy Troubleshooting!

How to log off a RDP session remotely.

Have you ever tried to log on to a Remote Desktop session on a Windows Server and gotten stuck on the following screen?

Stuck Logging Off

Well, here is a simple way you can remotely kill that RDP session through PowerShell so that you can logon to the server again…
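
A sketch of listing the sessions from a PowerShell prompt, assuming a server named `SERVER01` (a placeholder):

```powershell
# 'SERVER01' is a hypothetical server name.
$Server = 'SERVER01'

# qwinsta (query session) lists the sessions on the remote server,
# including the ID column needed for the next step.
qwinsta /server:$Server
```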

Sample Output:

Output from qwinsta command…

Once you get the session ID, you can run the following to kick off the user’s session completely so that you can log into the server again:
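
As a sketch, with the server name and session ID as placeholders taken from the qwinsta output:

```powershell
# 'SERVER01' and session ID 2 are placeholders.
$Server    = 'SERVER01'
$SessionId = 2

# rwinsta (reset session) forcibly removes the session.
rwinsta $SessionId /server:$Server
```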

Note: The session will be completely removed from RDP and anything running in it will be lost. Most of the time you don’t have to worry about losing anything, since the whole reason to kill the session is that you cannot log off of it normally.

Life is then good again as you can log into your RDP session. Yay!

MaxConcurrentAPI Script for Netlogon Issues

I get incidents from time to time that deal with Netlogon Service Issues. For example: Semaphore Waiters, Semaphore Timeouts, Semaphore Acquires, etc…

Here is a script I got from the Microsoft Gallery. Its description reads:
In some enterprise environments the sheer volume of NTLM authentication can produce performance bottlenecks on servers. To help make the problem easier to detect, this PowerShell script was written.

Execution:

Now, I modified this script by taking out the clear screen command so that it could be run against multiple servers. Place the script in your Scripts directory and name it CheckMaxConcurrentApiScript.ps1

First, in PowerShell, gather your list of servers:

Or

Next, run the command to run the ps1 against those servers:

Or
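
A sketch of the two steps, assuming PowerShell remoting is enabled on the targets and the script lives in `C:\Scripts`. The text file path is a placeholder; the server names are the ones from the sample output below.

```powershell
# Option 1: read the server list from a text file (hypothetical path).
$Servers = Get-Content -Path 'C:\Scripts\servers.txt'

# Option 2: list the servers by hand.
$Servers = 'DC03', 'EXCH02'

# Run the local script on every server in the list.
Invoke-Command -ComputerName $Servers -FilePath 'C:\Scripts\CheckMaxConcurrentApiScript.ps1'
```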

Sample Output:

DC03
Detection Time : 12/13/2018 7:56:16 PM
Problem Detected : False
Server Name : DC03
Server Role : Domain Controller
Domain Name : ldlnet.org
Operating System : Microsoft Windows Server 2008 R2 Enterprise
Time Since Last Reboot : 4 days 22 hours
Current Effective MaxConcurrentApi Setting : 10
Suggested MaxConcurrentApi Setting (may be same as current) : 10
Current Threads in Use (Semaphore Holders) : 0
Clients Currently Waiting (Semaphore Waiters) : 0
Cumulative Client Timeouts (Semaphore Timeouts) : 17
Cumulative MaxConcurrentApi Thread Uses (Semaphore Acquires) : 3493999
Duration of Calls (Avg Semaphore Hold Time) : 0

EXCH02
Detection Time : 12/13/2018 8:00:53 PM
Problem Detected : False
Server Name : EXCH02
Server Role : Member Server
Domain Name : ldlnet.org
Operating System : Microsoft Windows Server 2008 R2 Standard
Time Since Last Reboot : 4 days 23 hours
Current Effective MaxConcurrentApi Setting : 10
Suggested MaxConcurrentApi Setting (may be same as current) : 10
Current Threads in Use (Semaphore Holders) : 0
Clients Currently Waiting (Semaphore Waiters) : 0
Cumulative Client Timeouts (Semaphore Timeouts) : 570
Cumulative MaxConcurrentApi Thread Uses (Semaphore Acquires) : 1682257
Duration of Calls (Avg Semaphore Hold Time) : 0

Hopefully, this script will assist you with gathering the needed information to help you balance the netlogon load between your servers when needed in your environment.

HAPPY TROUBLESHOOTING!

Exchange Back Pressure and Transport Issues

Sometimes you’ll get a situation where email stops flowing on one of your Exchange servers. Most of the time, we’re worried about our database and log file drives becoming full, but we don’t necessarily look at the configuration of our Transport servers to see whether the resources behind the transport directories have become full or taxed to the point of causing “Back Pressure”. Exchange logs events when a threshold is crossed and transport functionality is hindered:

  • Event ID 15004: Increase in the utilization level for any resource (e.g. from Normal to Medium)
  • Event ID 15005: Decrease in the utilization level for any resource (e.g. from High to Medium)
  • Event ID 15006: High utilization for disk space (i.e. critically low free disk space)
  • Event ID 15007: High utilization for memory (i.e. critically low available memory)

If you think that your server is experiencing a back pressure event, you can look quickly through event viewer for these events with the following script:
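
A minimal sketch of such a check using Get-WinEvent. The server name is a placeholder, and the provider name matches the Source shown in the events below.

```powershell
# 'EX01' is a placeholder for the Exchange server to check.
$Server = 'EX01'

# Pull the most recent back pressure events from the Application log.
# -ErrorAction SilentlyContinue suppresses the error Get-WinEvent
# raises when no matching events exist.
Get-WinEvent -ComputerName $Server -MaxEvents 50 -ErrorAction SilentlyContinue -FilterHashtable @{
    LogName      = 'Application'
    ProviderName = 'MSExchangeTransport'
    Id           = 15004, 15005, 15006, 15007
} | Select-Object TimeCreated, Id, LevelDisplayName, Message | Format-List
```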

In most cases, you get the 15006 event:

Event 15006, MSExchangeTransport
Microsoft Exchange Transport is rejecting message submissions because the available disk space has dropped below the configured threshold. The following resources are under pressure:
Used disk space (“C:\Microsoft\Exchange Server\V15\TransportRoles\data\Queue”)
Used disk space (“C:\Microsoft\Exchange Server\V15\TransportRoles\data”)

Overall Resources
The following components are disabled due to back pressure:
Mail resubmission from the Message Resubmission component.
Mail submission from Pickup directory
Mail submission from Replay directory
Mail resubmission from the Shadow Redundancy Component
Inbound mail submission from the Internet

Exchange uses the following formula to calculate the threshold at which these events fire:

100 * (hard disk size - fixed constant) / hard disk size
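
To make the formula concrete, here is a small worked sketch. The 500 MB fixed constant is an assumption taken from the older Exchange 2010/2013 back pressure documentation; verify the value and threshold behavior for your Exchange version before relying on it. The function name is made up for illustration.

```powershell
# Hypothetical helper that evaluates the formula above.
# The 500 MB default is an assumption, not a value stated in this post.
function Get-BackPressureHighThreshold {
    param(
        [Parameter(Mandatory = $true)][double]$DiskSizeMB,
        [double]$FixedConstantMB = 500
    )
    # 100 * (hard disk size - fixed constant) / hard disk size
    return 100 * ($DiskSizeMB - $FixedConstantMB) / $DiskSizeMB
}

# A 100 GB (102400 MB) volume.
$threshold = Get-BackPressureHighThreshold -DiskSizeMB 102400
'Percent used at which high utilization is reached: {0:N2}' -f $threshold
```

Under that assumption, a 100 GB volume hits the threshold at roughly 99.5 percent used, which is the same thing as having only about 500 MB free.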

So, in order to get transport running again, get your C: Drive cleared so that back pressure is lifted off of the server and transport can run again. You should then get a 15005 Event:

Log Name: Application 
Source: MSExchangeTransport 
Date: 10/19/2017 2:21:52 PM 
Event ID: 15005 
Task Category: ResourceManager 
Level: Information 
Keywords: Classic 
User: N/A 
Computer: EX01.ldlnet.org 
Description: 
The resource pressure decreased from Medium to Low.No components disabled due to back pressure. 
The following resources are in normal state: 
Private bytes 
System memory 
Version buckets[C:\Program Files\Microsoft\Exchange Server\V15\TransportRoles\data\Queue\mail.que] 
Jet Sessions[C:\Program Files\Microsoft\Exchange Server\V15\TransportRoles\data\Queue\mail.que] 
Checkpoint Depth[C:\Program Files\Microsoft\Exchange Server\V15\TransportRoles\data\Queue\mail.que] 
Queue database and disk space (“C:\Program Files\Microsoft\Exchange Server\V15\TransportRoles\data\Queue\mail.que”) 
Used disk space (“C:\Program Files\Microsoft\Exchange Server\V15\TransportRoles\data\Queue”) 
Used disk space (“C:\Program Files\Microsoft\Exchange Server\V15\TransportRoles\data”) 
Overall Resources

Now that you’ve completed this, how do you set up your Exchange environment to keep this from happening again? Well, I would pick a drive volume that you’d never have to worry about filling up, or give transport its own drive volume. This can be accomplished with a .ps1 script that is installed in the default Scripts directory of your Exchange Server installation:

‘C:\Program Files\Microsoft\Exchange Server\Vxx\scripts’

The name of that file is Move-TransportDatabase.ps1, and it changes the location of the transport directories, moves the Queue Database, and restarts the Transport service automatically. Here is an example of how the script is executed when running Exchange PowerShell with elevated privileges and wanting to move all the services to the E: Drive:
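
A sketch of such a call, assuming the folder layout under `E:\Queue` shown here; the folder names are placeholders you can change.

```powershell
# Run from an elevated Exchange Management Shell.
# $exscripts is defined by the Exchange Management Shell and points at the Scripts directory.
Set-Location $exscripts

# The E:\Queue\... folder names are assumptions for illustration.
.\Move-TransportDatabase.ps1 `
    -queueDatabasePath 'E:\Queue\QueueDB' `
    -queueDatabaseLoggingPath 'E:\Queue\QueueLogs' `
    -iPFilterDatabasePath 'E:\Queue\IPFilterDB' `
    -iPFilterDatabaseLoggingPath 'E:\Queue\IPFilterLogs' `
    -temporaryStoragePath 'E:\Queue\Temp'
```

Mail flow on that server pauses while the Transport service restarts, so this is best done in a maintenance window.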

So, that’s how you get your transport directories configured to relieve “back pressure”. In my experience, somebody was doing a PST export of a mailbox to the local C: Drive instead of a specific drive volume that wouldn’t affect the OS, Exchange, and Transport. That’s for another time though! Happy Troubleshooting!

Reference: Exchange 2016 – Back Pressure
Reference: A Guide To Back Pressure.
Reference: Change Exchange Server 2013/2016 Mail Queue Database Location

Running Test-MailFlow on remote Exchange Servers

In my job I try to make the process as efficient as possible so that I can determine the issue quickly and then resolve it as quickly as possible. I was having an issue with the Test-Mailflow cmdlet when running it remotely against the servers. I was getting the following error:

MapiExceptionSendAsDenied: Unable to submit message. (hr=0x80070005, ec=1244)

If I had multiple servers to test, I would have to log on to each server and run the test, which is not efficient at all. I wanted to automate it without having to change permissions to do so. I wanted to run an Invoke-Command and place the PSSession for Exchange in that command so that I could run the Test-Mailflow cmdlet and get the results.

Paul Cunningham wrote a great article and script to resolve this.

His script allows you to input the server name when running the PS1 from the PowerShell Command Prompt:

I was able to take the Test-MailflowRemote.ps1 script and set it to run on all the mailbox servers for the environment I was monitoring. Now, we can only run the Test-MailFlow cmdlet against Exchange Mailbox Servers that have active databases mounted on them. So, I run the following first to get the list of Mailbox Servers that contain at least 1 active database:

I then run the ps1 script using the array I created with the $Svrs variable:
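
A hedged sketch of both steps. It assumes the script sits in the current directory and takes the server name through a `-Server` parameter; check the script's own help for the exact parameter name. The way `$Svrs` is built here is one possible approach, not necessarily the original one.

```powershell
# Build a unique list of servers that currently host a mounted (active) database.
$Svrs = Get-MailboxDatabase -Status |
    Where-Object { $_.Mounted } |
    ForEach-Object { [string]$_.Server } |
    Sort-Object -Unique

# Run the remote mail flow test against each of those servers.
foreach ($Svr in $Svrs) {
    # -Server is assumed to be the script's parameter name.
    .\Test-MailflowRemote.ps1 -Server $Svr
}
```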

Output:

This helps a bunch when you need to run on multiple servers and get the test information quickly. Please comment! Happy Troubleshooting!

Protected AD Groups and the problems they can cause accounts

I have run into this issue over the years with accounts being in the Domain Admins group and having issues running PowerShell cmdlets as well as not being able to connect to ActiveSync from a mobile device with the account.

These issues are due to the AdminSDHolder Template in AD and the SDProp Process that is run every 60 Minutes in AD.
This is explained in fantastic detail through the following Microsoft article: Protected Accounts & Groups In Active Directory

Here is an example of an issue that occurred in one of the environments that I was managing. A user was trying to run the following AD cmdlet in PowerShell on DC01:

The user got the following error when the cmdlet was executed:

Set-ADUser : Insufficient access rights to perform the operation
At line:1 char:1
+ Set-ADUser lancel -Server dc01.ldlnet.org -Replace @{title=”Senior O …
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo: NotSpecified: (lancel:ADUser) [Set-ADUser], ADException
+ FullyQualifiedErrorId : ActiveDirectoryServer:8344,Microsoft.ActiveDirectory.Management.Commands.SetADUser

The issue was that the admin account used to run the cmdlet was in the Domain Admins group and was not inheriting permissions per the AdminSDHolder template that was applied to the account:

I checked to see that the admin account was in a protected group:
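
The same check can be sketched from PowerShell, assuming the ActiveDirectory module is available. `adminaccount` is a placeholder for the account that ran the cmdlet.

```powershell
Import-Module ActiveDirectory

# 'adminaccount' is a hypothetical account name.
# An adminCount value of 1 indicates that SDProp has stamped the account as protected.
Get-ADUser -Identity 'adminaccount' -Properties adminCount, MemberOf |
    Select-Object SamAccountName, adminCount, MemberOf
```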

I next went to the Security Tab > Advanced Button and saw that the Enable Inheritance button was visible:

I’ve circled where to look in the window.

This verifies that the account is protected due to being in the Domain Admins group. Now, there are two workarounds for this particular error that we were experiencing.

  1. Click the Enable Inheritance button. This will cause the permissions to be inherited temporarily. When SDProp is cycled again, the account will lose any inherited permissions and will be essentially “broken” again. This is not good if you’re going to be running cmdlets regularly to modify AD Accounts.
  2. The preferred method to work around this issue is to set the -Server parameter to point to a different DC than the one you are on. So, essentially, we tell the cmdlet to execute on DC02 when running the cmdlet from DC01.
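
Workaround two could be sketched like this when working on DC01. The title value is a placeholder, since the original value is truncated in the error output above.

```powershell
Import-Module ActiveDirectory

# Point the cmdlet at DC02 while running it from DC01.
# '<new title>' is a placeholder for the value being set.
Set-ADUser -Identity 'lancel' -Server 'dc02.ldlnet.org' -Replace @{ title = '<new title>' }
```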

Either method will allow the cmdlet to execute successfully and modify the object. You would think that Microsoft would have noticed this issue with running an admin cmdlet for Active Directory, but they have not fixed it as of yet, nor do I think they plan to. I would just go with workaround number two and remain sane.

Another example of this Protected Group issue comes with an account in a Protected Group that has a mailbox not being able to connect to Exchange ActiveSync when setting up their mobile device.

  • You usually get a 500 error on the device that you cannot connect.
  • You will also see event 1053 in Event Viewer alluding to not having sufficient access to create the container for the user in AD.

Read this page for more information: Exchange ActiveSync Permissions Issue with Protected Groups

So, in your endeavors admins, keep this in mind when running into these types of problems. Happy Troubleshooting!

Exchange Server HealthSets

This is a monitoring feature included with Exchange that, until recently, I did not know existed, as it wasn’t really mentioned in any of my dealings with Exchange Server. The HealthSets feature monitors every aspect of a running Exchange Server and is broken down into three monitoring components:

  • Probe: used to determine if Exchange components are active.
  • Monitor: when probes signal a different state than the one stored in the patterns of the monitoring engine, the monitoring engine determines whether a component or feature is unhealthy.
  • Responder: takes action when a monitor alerts it to an unhealthy state. Responders take different actions depending on the type of component or feature. Actions can start with just recycling the application pool and can go as far as restarting the server or, even worse, taking the server offline so that it won’t accept any connections.

From what I have experienced in the past year with these HealthSets, an alert will be thrown due to a change in a service, or a restart of a service, a failed monitoring probe result, or something of the like. The healthset will become “unhealthy” in state at that time. You can run the following on a server in order to get the healthset status of that server:
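
A minimal sketch for a single server. `EX01` is a placeholder, and on some older Exchange 2013 builds the parameter is `-Server` rather than `-Identity`.

```powershell
# 'EX01' is a placeholder server name.
Get-HealthReport -Identity 'EX01' |
    Where-Object { $_.AlertValue -ne 'Healthy' } |
    Format-Table Server, State, HealthSet, AlertValue, LastTransitionTime, MonitorCount -AutoSize
```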

If you get alerts for multiple Exchange Servers, let’s say for instance, the transport array, you can run the following cmdlets to get the status of all the Transport Servers in the array:
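
One way to sketch this, assuming Exchange 2013 or later where `Get-TransportService` returns the servers running the Transport service:

```powershell
# Collect the health report from every server running the Transport service
# and keep only the health sets that are not healthy.
Get-TransportService | ForEach-Object {
    Get-HealthReport -Identity $_.Name
} | Where-Object { $_.AlertValue -ne 'Healthy' } |
    Format-Table Server, State, HealthSet, AlertValue, LastTransitionTime, MonitorCount -AutoSize
```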

HealthSet PowerShell Output

Now, a lot of times, the Unhealthy value in the HealthSet will have corrected itself as per the Responder, even though the AlertValue will remain Unhealthy. To clear the cache quickly and have the monitor probes run again for verification, perform the following restarts of services from this cmdlet in this order:
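
A sketch of the restarts, run locally on the affected server. The two services are the Exchange Diagnostics and Health Manager services; the order shown here (Diagnostics first) is an assumption.

```powershell
# Run in an elevated PowerShell session on the affected Exchange server.
# The order is an assumption: Diagnostics first, then Health Manager.
Restart-Service -Name MSExchangeDiagnostics
Restart-Service -Name MSExchangeHM
```

Give the probes a few minutes to run again before re-checking the health set.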

That should clear the probe results and let them run again. Now, should they again return an error, we will need to dig deeper to figure out the issue.
What you will want to do first is get the monitor definition. In this example, the Mapi.Submit.Monitor was the component that was unhealthy in the healthset. I had to run the following cmdlet to get the Monitor Definition: