Sunday, February 24, 2019

Performing a Rolling Upgrade of Windows Server 2012R2 to 2016

In January I performed a rolling cluster upgrade on my organization's Microsoft Hyper-V cluster, going from Server 2012R2 to Server 2016.  I highly recommend reading the following resources, as they were a great reference for me.

https://docs.microsoft.com/en-us/windows-server/failover-clustering/cluster-operating-system-rolling-upgrade
https://redmondmag.com/articles/2016/12/15/clustered-hyperv-deployment.aspx
https://redmondmag.com/articles/2016/12/19/hyperv-deployment-2.aspx

There were a couple of caveats with the upgrade as outlined below.
  • Move from seven clustered nodes to a two-node failover cluster
  • Consolidate 19 VMs to 12
  • Upgrade NAS and cluster servers to 10 Gig networking
  • Virtualize all AD domain controllers
What is being kept:
  • 2 X FreeNAS storage Servers
What is being removed:
  • 6 X AMD Piledriver 15H Microcloud Server
What was being upgraded and kept:
  • 2 X AMD Bulldozer 1U Servers
Upgrading the core to 10 gig

The migration to the 2 x AMD Bulldozer 1U servers was due to the lack of availability of new hardware and an urgent need to remove the Microcloud system.  I felt it was a good idea to start with the storage, as I had 4 servers I needed to upgrade to 10 Gig and with FreeNAS it is pretty straightforward.  We went with Intel X540-T2 cards because they were within our project budget, and having purchased Meraki MS350-24X switches, which have 10 Gig ports, I thought it was best to make use of them.  Unfortunately they don't support iWARP/RDMA, but we had been running 19 virtual servers off 2 FreeNAS boxes on dual gigabit LACP LAGG.  Any limit we reach should, I would think, be caused by the disks (spinning rust), as they are not high-speed SAS or NVMe but enterprise SATA drives.

The upgrade on the FreeNAS servers went great.  All that was really required was good documentation of the IP addresses on the storage and making sure the switches had all the required networks.  On the cluster side it was a matter of shutting down the VMs, taking the cluster storage offline (maintenance mode), popping in the new network cards and configuring the networks to what they were before.  Everything came back without a hitch.  For verification we ran with the 10 Gig in the systems for a month with no issues.  We then started with the rolling upgrade.

Step 1. - Run a cluster validation report.

It's hard to see where you're going without knowing where you've been and where you are.  This is why I recommend running a cluster validation report before you start anything.  That way you know what errors you were getting beforehand, and you will be able to tell whether something happened because of the upgrade, because of a mistake, or because you fixed something that was already broken.  Below is the validation report from before I started the upgrade from 2012R2 to 2016.


Server 2012R2 Cluster Validation Report
As you can see, I have several warnings on the validation report.  The issues flagged included some network problems with dropped packets and other things such as the Cluster-Aware Updating service being configured to run in a separate monitor.
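The same baseline report can also be produced from PowerShell, which makes it easy to keep a dated copy for comparison after the upgrade. This is only a minimal sketch: the report path is a hypothetical example (the folder is assumed to exist), and it assumes the command is run on one of the cluster nodes.

```powershell
# Assumption: run on a cluster node with the Failover Clustering tools installed.
Import-Module FailoverClusters

# Hypothetical report location; the date stamp keeps pre- and post-upgrade reports apart.
$reportName = "C:\ClusterReports\validation-$(Get-Date -Format yyyyMMdd)"

# With no -Node or -Cluster argument, Test-Cluster validates the cluster
# that the local machine belongs to and writes the report to $reportName.
Test-Cluster -ReportName $reportName
```

Running it again after the upgrade gives a second report to compare against the warnings above.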



With that report being done, we can now start our Rolling upgrade.

Rolling Upgrade

I had to come up with a plan of more or less what I was going to do for the rolling upgrade.  This is it in a nutshell.



IMPORTANT NOTE


When the first Windows Server 2016 node joins the cluster, the cluster enters "Mixed-OS" mode, and the cluster core resources are moved to the Windows Server 2016 node. A "Mixed-OS" mode cluster is a fully functional cluster where the new nodes run in a compatibility mode with the old nodes. "Mixed-OS" mode is a transitory mode for the cluster. It is not intended to be permanent and customers are expected to update all nodes of their cluster within four weeks.


Cluster Upgrade Process
  1. Disable Cluster-Aware Updating (CAU). Check whether CAU is currently running by using the Cluster-Aware Updating UI or the Get-CauRun cmdlet. Stop CAU using the Disable-CauClusterRole cmdlet to prevent any nodes from being paused and drained by CAU during the Cluster OS Rolling Upgrade process. Whether this is necessary depends on how much spare capacity you have to keep the cluster going; in our case we had enough resources, so it was not required.
  2. Back up the cluster database (requires third-party software)
  3. Drain/remove roles from the node being upgraded for use in the 2-node cluster. Back up (full export) the workload data (VMs).
  4. After moving the VMs off the server, upgrade the RAM and reconfigure the disks (1 SSD mirror for the OS and 1 HDD mirror for data), then reinstall with Server 2016
  5. Upgrade and configure networking
  6. Add Hyper-V and cluster roles
  7. Add node to the cluster
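Steps 1, 3 and 7 of the plan above map onto a handful of cmdlets. The following is a rough sketch rather than the exact commands I ran; `CLUSTER01` and `HV-NODE01` are hypothetical names standing in for your cluster and the node being rebuilt.

```powershell
Import-Module FailoverClusters
Import-Module ClusterAwareUpdating

$cluster = "CLUSTER01"   # hypothetical cluster name
$node    = "HV-NODE01"   # hypothetical name of the node being upgraded

# Step 1: check whether a CAU updating run is in progress.
Get-CauRun -ClusterName $cluster

# Optional: disable self-updating for the duration of the upgrade.
Disable-CauClusterRole -ClusterName $cluster -Force

# Step 3: pause the node and drain its roles to the remaining nodes.
Suspend-ClusterNode -Name $node -Cluster $cluster -Drain -Wait

# Confirm that no cluster group is still owned by the node (should return nothing).
Get-ClusterGroup -Cluster $cluster | Where-Object { $_.OwnerNode.Name -eq $node }

# Evict the node so it can be rebuilt with Server 2016.
Remove-ClusterNode -Name $node -Cluster $cluster -Force

# Step 7, after the reinstall and network setup: join the rebuilt node.
# Add-ClusterNode -Name $node -Cluster $cluster
```

The last line is left commented out because it belongs much later in the process, once the node has been reinstalled and its networking and storage are back in place.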

Verify cluster roles and nodes in the hybrid 2012R2 and 2016 setup. Once verified, evict the other 6 2012R2 nodes, ensuring all roles are transferred to the new 2016 nodes, and verify the cluster is operating properly.  Then upgrade the cluster functional level.
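A quick way to see the mixed state and where everything is running is to list the nodes with their OS version and the groups with their owners. This is a sketch only; `HV-NODE01` is a hypothetical name for one of the new 2016 nodes, and "Cluster Group" is the default name of the group holding the cluster core resources.

```powershell
Import-Module FailoverClusters

# List each node with its state and OS version to confirm the mixed 2012R2/2016 state.
Get-ClusterNode | Format-Table Name, State, MajorVersion, MinorVersion, BuildNumber -AutoSize

# Show which node currently owns each cluster group (roles and core resources).
Get-ClusterGroup | Format-Table Name, OwnerNode, State -AutoSize

# Move the cluster core resources onto a 2016 node before evicting old nodes.
Move-ClusterGroup -Name "Cluster Group" -Node "HV-NODE01"
```

Anything still owned by a node that is about to be decommissioned should be moved before that node is evicted.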

  1. Re-enable CAU; restart it using the CAU UI or the Enable-CauClusterRole cmdlet
  2. Open PowerShell -> Run the Update-ClusterFunctionalLevel cmdlet - no errors should be returned. Fix any errors.
  3. Upgrade the VM’s Hyper-V VM Configuration (Version)

Upgrade the Hyper-V storage pool. Storage pools can be upgraded using the PowerShell command Update-StoragePool, which is an online operation.


Cluster Changes


It is always easier and safer to start fresh; however, that was not possible at this time.  When we upgrade the cluster infrastructure, we will be running a Mixed-OS cluster until we remove all 2012R2 servers and upgrade the cluster functional level. As noted above, "Mixed-OS" mode is transitory, and all nodes are expected to be updated within four weeks.



The two nodes I am upgrading from Server 2012R2 to Server 2016 are almost identical, with two major differences: one has 2 x 2.8 GHz quad-core processors and the other 2 x 3.2 GHz quad-core processors, and one requires an old Adaptec RAID driver while the other uses a Dot Hill RAID controller. Both machines have 80 GB of DDR3 ECC RAM.

TIPS

  • All Nodes need to have the exact same software installed.
  • All Nodes need to have the exact same Virtual Switches and Named the same.
  • All Nodes Should Have RDS Enabled
  • Virtual Network Adapter TEAM: AdapterName ( I typically put the Cluster/Client communication on the default VLAN)
  • Cluster Name: $CLUSTERNAME


Cluster Upgrade Process


Step 1. Disable/Stop Cluster-Aware Updating
  • If you are running Cluster-Aware Updating (CAU), check whether an updating run is in progress using the Cluster-Aware Updating UI or the Get-CauRun cmdlet. Stop CAU using the Disable-CauClusterRole cmdlet to prevent any nodes from being paused and drained by CAU during the Cluster OS Rolling Upgrade process. Whether this is necessary depends on how much spare capacity you have to keep the cluster going; in our case we had enough resources, so it was not required.
  • Back up the cluster database (use Veeam, Altaro, Windows Server Backup)
    Backing Up Failover Cluster Configuration
  • Drain the roles from the cluster node being updated. Ensure all roles have been removed from the node and transferred, then evict it from the cluster
  • Upgrade the node to Server 2016. I made sure that as many of the roles and features that were on the other nodes as possible were set up again on the new 2016 node.
  • Roles Required
    • File and Storage Services everything but work folders
    • Hyper-V

  • Features Required
    • .NET 3.5
    • .NET 4.5
    • Client for NFS
    • Data Center Bridging
    • Enhanced Storage
    • Failover Clustering
    • Ink and Handwriting Services
    • Media Foundation
    • Multipath I/O
    • Remote Admin Tools
    • Hyper-V management
    • File services
    • Failover cluster tools
    • SMB1
    • SMB Bandwidth Limit
    • User Interfaces
    • Windows Internal Database
    • Windows Server Backup
    • Windows Server Migration Tools
    • Windows Standards-Based Storage Management
    • WoW64 Support
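Rather than ticking these off by hand, the installed features on an existing node can be compared against the new node. The snippet below is a sketch under assumptions: `HV-OLD01` is a hypothetical name for a remaining 2012R2 node that can be reached remotely, and only a few feature names that I am sure of are installed explicitly. The rest of the list can be looked up with Get-WindowsFeature.

```powershell
# Run on the new 2016 node. "HV-OLD01" is a hypothetical 2012R2 node name.
$old = Get-WindowsFeature -ComputerName "HV-OLD01" |
    Where-Object Installed | Select-Object -ExpandProperty Name
$new = Get-WindowsFeature |
    Where-Object Installed | Select-Object -ExpandProperty Name

# "<=" marks features only on the old node, "=>" features only on the new one.
Compare-Object -ReferenceObject $old -DifferenceObject $new

# Install the core roles and features; extend the list from the comparison above.
$features = "Hyper-V", "Failover-Clustering", "Multipath-IO",
            "Data-Center-Bridging", "Windows-Server-Backup"
Install-WindowsFeature -Name $features -IncludeManagementTools -Restart
```

Some feature names differ between 2012R2 and 2016, so expect a few entries in the comparison that do not need action.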
  • Set up the network (Intel network setup). Cluster traffic is only allowed on the cluster network. To do this we are setting up NIC Teaming to have easy access to VLAN tagging.

Once done, create the LAGG, adding the active NIC to the team.
Repeat the steps as many times as you need until you have all the VLANs you need for the server.
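The teaming and VLAN steps can also be scripted, which helps keep both nodes identical. This is a hedged sketch and not my exact configuration: the team name, adapter name, VLAN IDs and the switch-independent teaming mode are all assumptions to adjust for your environment.

```powershell
$team    = "TEAM"           # hypothetical team name
$members = "Ethernet 3"     # hypothetical name of the active 10 Gig NIC
$vlans   = 20, 30           # hypothetical VLAN IDs

# Create the team (LAGG). SwitchIndependent is an assumption; use Lacp if the
# switch ports are configured for it.
New-NetLbfoTeam -Name $team -TeamMembers $members `
    -TeamingMode SwitchIndependent -LoadBalancingAlgorithm Dynamic -Confirm:$false

# Add one team interface per VLAN; repeat for every VLAN the server needs.
foreach ($vlan in $vlans) {
    Add-NetLbfoTeamNic -Team $team -VlanID $vlan -Name "$team - VLAN $vlan" -Confirm:$false
}

# Review the resulting team interfaces and their VLAN IDs.
Get-NetLbfoTeamNic -Team $team | Format-Table Name, VlanID, Primary -AutoSize
```

Each team interface shows up as its own Microsoft Network Adapter Multiplexor Driver adapter, which is what the Hyper-V switches are bound to later.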

  • Add the Hyper-V switches to the network. I set up only one network for VM and cluster traffic; it is recommended that you have more than one network for this. Make sure you match the proper Microsoft Multiplexor adapter with the proper VLAN you have set up on your host interface, otherwise your VMs won't be able to communicate with anything.

VM and Host Network for cluster communication

VM network.  No Host/Cluster Communication Permitted.
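The two kinds of switch described above differ only in whether the host (management OS) shares the adapter. A minimal sketch, assuming hypothetical switch names and the hypothetical team interface names "TEAM - VLAN 20" and "TEAM - VLAN 30":

```powershell
# Switch shared with the host: carries VM traffic and cluster communication.
New-VMSwitch -Name "VM-Cluster" -NetAdapterName "TEAM - VLAN 20" -AllowManagementOS $true

# VM-only switch: the host gets no virtual adapter on this network.
New-VMSwitch -Name "VM-Only" -NetAdapterName "TEAM - VLAN 30" -AllowManagementOS $false

# Check that each switch is bound to the intended Multiplexor adapter.
Get-VMSwitch | Format-Table Name, SwitchType, NetAdapterInterfaceDescription, AllowManagementOS -AutoSize
```

The switch names must be exactly the same on every node, otherwise VMs will not be able to fail over or migrate between them.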


    • Reconnect the iSCSI targets on the host node after all networking is set up and configured.
      iSCSI Initiator
      IP: PORT - SSD
      IP: PORT - HDD
      IP: PORT - Quorum
      IP: PORT - SSD
      IP: PORT - HDD
      IP: PORT - Backup Quorum
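Reconnecting the targets can be done from PowerShell instead of the iSCSI Initiator UI. The following is only a sketch: the portal addresses are hypothetical documentation addresses standing in for the two storage servers, and port 3260 is the iSCSI default, which may not match your setup.

```powershell
# Hypothetical portal addresses for the two storage servers.
$portals = "192.0.2.11", "192.0.2.12"
$port    = 3260   # iSCSI default port; an assumption

# Make sure the initiator service is running and starts with the host.
Set-Service -Name MSiSCSI -StartupType Automatic
Start-Service -Name MSiSCSI

# Register each storage server as a target portal.
foreach ($portal in $portals) {
    New-IscsiTargetPortal -TargetPortalAddress $portal -TargetPortalPortNumber $port
}

# Connect every discovered target that is not yet connected, persistently.
Get-IscsiTarget | Where-Object { -not $_.IsConnected } | ForEach-Object {
    Connect-IscsiTarget -NodeAddress $_.NodeAddress -IsPersistent $true
}

# Verify the sessions.
Get-IscsiSession | Format-Table TargetNodeAddress, IsConnected, IsPersistent -AutoSize
```

Connecting persistently means the sessions are restored after a reboot, which matters for a node that will be patched and restarted regularly.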
      • Rejoin cluster
          Verify cluster roles and nodes in the hybrid 2012R2 and 2016 setup.  Once verified, evict the other 6 2012R2 nodes and verify the cluster is operating properly. Transfer roles such as the Cluster Core Resources and ensure they are not running on nodes that you are decommissioning.
        1. Re-enable CAU; restart it using the CAU UI or the Enable-CauClusterRole cmdlet
        2. Upgrade the cluster functional level
        3. Open PowerShell -> Run the Update-ClusterFunctionalLevel cmdlet - no errors should be returned. Fix any errors.
        4. Upgrade the VM’s Hyper-V VM Configuration (Version)

        Upgrade the Hyper-V storage pool. Storage pools can be upgraded using the PowerShell command Update-StoragePool, which is an online operation.
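The finishing steps above look roughly like this in PowerShell. Treat it as a sketch: `TestVM01` is a hypothetical VM name, and both the functional level update and the VM configuration version upgrade are one-way, so only run them once every node is on Server 2016 and the cluster has been verified.

```powershell
Import-Module FailoverClusters

# Dry run first: -WhatIf reports what would happen without changing anything.
Update-ClusterFunctionalLevel -WhatIf

# The real thing. This cannot be undone, and 2012R2 nodes can no longer join afterwards.
Update-ClusterFunctionalLevel

# List VMs with their current configuration version.
Get-VM | Format-Table Name, State, Version -AutoSize

# Upgrade one VM at a time; the VM must be shut down first ("TestVM01" is hypothetical).
Update-VMVersion -Name "TestVM01"

# Upgrade any Storage Spaces pools (returns nothing if the host has none).
Get-StoragePool -IsPrimordial $false | Update-StoragePool
```

Starting with a single non-critical VM is a cheap way to confirm the configuration upgrade behaves before doing the rest.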

        Verify Cluster Operating Level
        Get-Cluster | Select ClusterFunctionalLevel (Server 2016 and above only)
        https://serverfault.com/questions/848529/hyper-v-2012-r2-cluster-functional-level-missing

          Run cluster verification and fix any issues that crop up.




          Some tips from my Server 2012R2 to 2016 Upgrade:

          • I had to limit the number of networks the cluster host uses due to some DNS resolution issues: a setup that worked under Server 2012R2 did not work under 2016.  DNS Error 1966 was showing up and, under Server 2016, causing cluster communication problems, so I had to remove the cluster host from those networks.  To do this I disabled host communication on them using the Hyper-V Virtual Switch Manager with the setting shown below.
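For reference, the same kind of change can be made from PowerShell. This sketch assumes the setting in question is "Allow management operating system to share this network adapter", and `VM-Only` is a hypothetical switch name.

```powershell
# Remove the host's virtual adapter from the switch so the host (and therefore
# the cluster) no longer communicates on that network. "VM-Only" is hypothetical.
Set-VMSwitch -Name "VM-Only" -AllowManagementOS $false

# Confirm which networks the cluster still sees and what role each one has.
Get-ClusterNetwork | Format-Table Name, Role, Address, State -AutoSize
```

After the change, the network should drop out of the cluster network list on that node.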

          • I did have to remove and re-add the Cluster-Aware Updating role.
          • Clean up AD and remove any cluster nodes that did not get automatically removed.
          Other than that, I had a really successful cluster upgrade to Server 2016, and everything has been working well since.
