Book of Nutanix Clusters

Nutanix Clusters on AWS

Nutanix Clusters on AWS provides on-demand clusters running in target cloud environments using bare metal resources. This allows for true on-demand capacity with the simplicity of the Nutanix platform you know. Once provisioned, the cluster appears like any traditional AHV cluster, just running in a cloud provider's datacenters.

Supported Configurations

The solution is applicable to the configurations below (the list may be incomplete; refer to the documentation for the full list of supported configurations):

Core Use Case(s):

Management interface(s):

Supported Environment(s):

Upgrades:

Compatible Features:

Key terms / Constructs

The following key items are used throughout this section and are defined below:

Architecture

At a high level, the Nutanix Clusters Portal is the main interface for provisioning Nutanix Clusters on AWS and interacting with AWS.

The provisioning process can be summarized with the following high-level steps:

  1. Create the cluster in the Nutanix Clusters Portal
  2. Provide deployment-specific inputs (e.g. region, AZ, instance type, VPC/subnets, etc.; see the sketch after this list)
  3. The Nutanix Cluster Orchestrator creates the associated resources
  4. The host agent in the Nutanix AMI checks in with Nutanix Clusters on AWS
  5. Once all hosts are up, the cluster is created
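
As a rough illustration of the deployment-specific inputs from step 2, the sketch below represents such a request as a simple Python structure. The field names are hypothetical and do not reflect the actual Nutanix Clusters Portal API; they simply mirror the inputs listed above.

```python
# Hypothetical illustration only - these field names do not reflect the actual
# Nutanix Clusters Portal API; they mirror the deployment-specific inputs above.
cluster_request = {
    "name": "demo-cluster",
    "region": "eu-west-1",                          # AWS region
    "availability_zone": "eu-west-1a",              # AZ within the region
    "instance_type": "i3.metal",                    # bare metal instance type
    "node_count": 3,
    "vpc_id": "vpc-0123456789abcdef0",              # example VPC ID
    "management_subnet_id": "subnet-0123456789abcdef0",  # example subnet ID
}
```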

The following shows a high-level overview of the Nutanix Clusters on AWS interaction:

Nutanix Clusters on AWS - Overview

The following shows a high-level overview of the inputs taken by the cluster orchestrator and some of the created resources:

Nutanix Clusters on AWS - Cluster Orchestrator Inputs

The following shows a high-level overview of a node in AWS:

Nutanix Clusters on AWS - Node Architecture

Given the hosts are bare metal, we have full control over storage and network resources, similar to a typical on-premises deployment. For the CVM and AHV host boot, EBS volumes are used. NOTE: certain resources like EBS interaction run through the AWS Nitro card, which appears as an NVMe controller in the AHV host.

Placement policy

Nutanix Clusters on AWS uses a partition placement policy with 7 partitions by default. Hosts are striped across these partitions, which correspond with racks in Nutanix. This ensures you can have 1-2 full “rack” failures and still maintain availability.
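
For reference, the sketch below shows how a partition placement group with 7 partitions (the AWS maximum per AZ) can be created with boto3. This is illustrative of the underlying AWS construct only; it is not necessarily the exact call the cluster orchestrator makes, and the group name is made up.

```python
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")

# Create a partition placement group with 7 partitions, mirroring the
# default partition count used by Nutanix Clusters on AWS.
ec2.create_placement_group(
    GroupName="nutanix-cluster-pg",   # example name
    Strategy="partition",
    PartitionCount=7,
)

# Bare metal instances launched into this group are striped across the
# partitions, giving the rack-like failure domains described above.
```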

The following shows a high-level overview of the partition placement strategy and host striping:

Nutanix Clusters on AWS - Partition Placement

In cases where multiple node types are leveraged (e.g. i3.metal and m5d.metal, etc.), each node type has its own 7 partitions across which its nodes are striped.

The following shows a high-level overview of the partition placement strategy and host striping when multiple instance types are used:

Nutanix Clusters on AWS - Partition Placement (Multi)

Storage

Storage for Nutanix Clusters on AWS can be broken down into two core areas:

  1. Core / Active
  2. Hibernation

Core storage is the exact same as you’d expect on any Nutanix cluster, passing the “local” storage devices to the CVM to be leveraged by Stargate.

Note
Instance Storage

Given that the "local" storage is backed by the AWS instance store, which isn't fully resilient in the event of a power outage / node failure, additional considerations must be handled.

For example, in a local Nutanix cluster, in the event of a power outage or node failure the data is persisted on the local devices and will be available when the node / power comes back online. With the AWS instance store, this is not the case.

In most cases it is highly unlikely that a full AZ will lose power / go down; however, for sensitive workloads it is recommended to:

  • Leverage a backup solution to persist to S3 or any durable storage
  • Replicate data to another Nutanix cluster in a different AZ/Region/Cloud (on-prem or remote)

One unique ability with Nutanix Clusters on AWS is the ability to “hibernate” a cluster, allowing you to persist the data while spinning down the EC2 compute instances. This can be useful when you don’t need the compute resources and don’t want to continue paying for them, but want to persist the data and have the ability to restore at a later point.

When a cluster is hibernated, the data will be backed up from the cluster to S3. Once the data is backed up, the EC2 instances will be terminated. Upon a resume / restore, new EC2 instances will be provisioned and data will be loaded into the cluster from S3.

Networking

Networking can be broken down into a few core areas:

Note
Native vs. Overlay

Instead of running our own overlay network, we decided to run natively on AWS subnets. This allows VMs running on the platform to communicate natively with AWS services with zero performance degradation.

Nutanix Clusters on AWS are provisioned into an AWS VPC. The following shows a high-level overview of an AWS VPC:

Nutanix Clusters on AWS - AWS VPC

Note
New vs. Default VPC

AWS will create a default VPC/subnet/etc. with a 172.31.0.0/16 IP scheme for each region.

It is recommended to create a new VPC with associated subnets, NAT/Internet Gateways, etc. that fits into your corporate IP scheme. This is important if you ever plan to extend networks between VPCs (VPC peering), or to your existing WAN. I treat this as I would any site on the WAN.
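
As a sketch of what creating a new VPC (rather than using the default) might look like with boto3, assuming a 10.0.0.0/16 range; adapt the CIDRs to your own corporate IP scheme and avoid overlaps with other VPCs or the WAN.

```python
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")

# Create a new VPC with a CIDR that fits the corporate IP scheme
# (10.0.0.0/16 is just an example range).
vpc = ec2.create_vpc(CidrBlock="10.0.0.0/16")
vpc_id = vpc["Vpc"]["VpcId"]

# Attach an internet gateway for outbound connectivity
igw = ec2.create_internet_gateway()
igw_id = igw["InternetGateway"]["InternetGatewayId"]
ec2.attach_internet_gateway(InternetGatewayId=igw_id, VpcId=vpc_id)

# Add a default route to the internet gateway in the VPC's main route table
rt = ec2.describe_route_tables(
    Filters=[{"Name": "vpc-id", "Values": [vpc_id]}]
)["RouteTables"][0]
ec2.create_route(
    RouteTableId=rt["RouteTableId"],
    DestinationCidrBlock="0.0.0.0/0",
    GatewayId=igw_id,
)
```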

Host Networking

The hosts running on bare metal in AWS are traditional AHV hosts, and thus leverage the same OVS-based network stack.

The following shows a high-level overview of an AWS AHV host’s OVS stack:

Nutanix Clusters on AWS - OVS Architecture

The OVS stack is relatively the same as any AHV host except for the addition of the L3 uplink bridge.

For UVM (Guest VM) networking, VPC subnets are used. A UVM network can be created during the cluster creation process or via the following steps:

From the AWS VPC dashboard, click on ‘subnets’ then click on ‘Create Subnet’ and input the network details:

Nutanix Clusters on AWS - Create Subnet

NOTE: the CIDR block should be a subset of the VPC CIDR range.
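
The same subnet creation can also be scripted. Below is a sketch using boto3, assuming the VPC uses 10.0.0.0/16 so the new UVM subnet is a subset of that range; the VPC ID and AZ are examples.

```python
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")

# Create a UVM subnet whose CIDR is a subset of the VPC CIDR (10.0.0.0/16 here)
subnet = ec2.create_subnet(
    VpcId="vpc-0123456789abcdef0",   # example VPC ID - replace with your own
    CidrBlock="10.0.100.0/24",       # must fall within the VPC CIDR range
    AvailabilityZone="eu-west-1a",   # should match the cluster's AZ
)
print(subnet["Subnet"]["SubnetId"])
```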

The subnet will inherit the route table from the VPC:

Nutanix Clusters on AWS - Route Table

In this case you can see any traffic in the peered VPC will go over the VPC peering link and any external traffic will go over the internet gateway.

Once complete, you will see the network is available in Prism.

WAN / L3 Networking

In most cases deployments will not be just in AWS and will need to communicate with the external world (other VPCs, the Internet, or the WAN).

For connecting VPCs (in the same or different regions), you can use VPC peering, which allows you to tunnel between VPCs. NOTE: you will need to ensure you follow WAN IP scheme best practices and that there are no CIDR range overlaps between VPCs / subnets.

The following shows a VPC peering connection between VPCs in the eu-west-1 and eu-west-2 regions:

Nutanix Clusters on AWS - VPC Peering

The route table for each VPC will then route traffic going to the other VPC over the peering connection (this will need to exist on both sides if communication needs to be bi-directional):

Nutanix Clusters on AWS - Route Table
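
A cross-region peering setup like the one above could be scripted as follows with boto3; the VPC IDs, route table IDs, and CIDRs are examples only, and the CIDRs must not overlap.

```python
import boto3

ec2_west1 = boto3.client("ec2", region_name="eu-west-1")
ec2_west2 = boto3.client("ec2", region_name="eu-west-2")

# Request a peering connection from the eu-west-1 VPC to the eu-west-2 VPC
# (example CIDRs: 10.0.0.0/16 and 10.1.0.0/16 - no overlap)
peering = ec2_west1.create_vpc_peering_connection(
    VpcId="vpc-aaaa1111",            # example requester VPC in eu-west-1
    PeerVpcId="vpc-bbbb2222",        # example accepter VPC in eu-west-2
    PeerRegion="eu-west-2",
)
pcx_id = peering["VpcPeeringConnection"]["VpcPeeringConnectionId"]

# The accepter side must accept the peering request
ec2_west2.accept_vpc_peering_connection(VpcPeeringConnectionId=pcx_id)

# Add a route on each side so traffic destined for the other VPC
# goes over the peering connection (needed for bi-directional traffic)
ec2_west1.create_route(
    RouteTableId="rtb-aaaa1111",     # example route table IDs
    DestinationCidrBlock="10.1.0.0/16",
    VpcPeeringConnectionId=pcx_id,
)
ec2_west2.create_route(
    RouteTableId="rtb-bbbb2222",
    DestinationCidrBlock="10.0.0.0/16",
    VpcPeeringConnectionId=pcx_id,
)
```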

For network expansion to on-premises / WAN, either a VPN gateway (tunnel) or AWS Direct Connect can be leveraged.
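
For the VPN gateway option, a minimal sketch of the AWS-side setup with boto3 is shown below; the VPC ID, ASN, and on-premises public IP are placeholders, and the on-premises router configuration is not covered here.

```python
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")

# Create a virtual private gateway and attach it to the VPC
vgw = ec2.create_vpn_gateway(Type="ipsec.1")
vgw_id = vgw["VpnGateway"]["VpnGatewayId"]
ec2.attach_vpn_gateway(VpnGatewayId=vgw_id, VpcId="vpc-0123456789abcdef0")

# Register the on-premises router as a customer gateway
cgw = ec2.create_customer_gateway(
    BgpAsn=65000,                     # example ASN
    PublicIp="203.0.113.10",          # example on-prem public IP
    Type="ipsec.1",
)

# Create the site-to-site VPN connection between the two gateways
ec2.create_vpn_connection(
    Type="ipsec.1",
    CustomerGatewayId=cgw["CustomerGateway"]["CustomerGatewayId"],
    VpnGatewayId=vgw_id,
)
```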

Security

Given these resources are running in a cloud outside our full control, security, data encryption, and compliance are critical considerations.

The recommendations can be characterized with the following:

Usage and Configuration

The following sections cover how to configure and leverage Nutanix Clusters on AWS.

The process can be characterized into the following high-level steps:

  1. Create AWS Account(s)
  2. Configure AWS network resources (if necessary)
  3. Provision cluster(s) via Nutanix Clusters Portal
  4. Leverage cluster resources once provisioning is complete

More to come!