AWS Private Installation Instructions
Overview
This document provides a comprehensive guide for installing Yellowbrick in a private Amazon Web Services (AWS) environment using a bring-your-own-VPC approach. This guide assumes that you will configure the necessary infrastructure within AWS, including the Virtual Private Cloud (VPC), to meet the requirements of a private deployment that creates no public IP addresses. This reference serves as a baseline but can be customized to fit your specific environment.
NOTE
A private installation requires advanced knowledge of cloud infrastructure and networking. Please contact your internal IT and operations teams as needed, or reach out to Yellowbrick Support for assistance.
Understanding a Private EKS Cluster
A private Amazon Elastic Kubernetes Service (EKS) cluster is configured to restrict access to both the control plane and worker nodes within a Virtual Private Cloud (VPC), with no public internet exposure. The key characteristics of a private EKS cluster include:
Private EKS API: The EKS Kubernetes API is accessed via a private endpoint within the VPC, with no public IP address assigned.
Private Worker Nodes: Worker nodes are deployed in private subnets without public IP addresses, ensuring that they are only accessible within the VPC.
VPC Endpoints for AWS Services: Use of VPC interface and gateway endpoints (VPCE) to securely connect to AWS services like S3, EC2, and ECR, without requiring internet access.
Outbound Connectivity: Outbound traffic can be managed through a NAT Gateway, VPC interface endpoints, or a custom gateway allowing secure access to AWS services and external resources.
For more details, refer to the AWS documentation on private EKS clusters.
Infrastructure Preparation
Creating the VPC Network
Before deploying Yellowbrick, create a VPC network that satisfies the requirements for a private installation. The network should be designed to support private communication between EKS nodes and AWS services while minimizing cross-AZ data transfer costs.
VPC Configuration:
Create a VPC with an appropriate address space. The AWS VPC CNI for EKS allocates an IP address per Kubernetes pod and therefore requires a large number of IP addresses in the VPC. The VPC should be large enough to account for all subnets, including a primary subnet of recommended size /19, which will host the EKS nodes and EKS pod workloads. Smaller subnets can be used for installations with smaller Yellowbrick cluster sizes, though creating subnets smaller than /24 is not recommended.
Configure the VPC to allow access to AWS service APIs. This may include:
Internet Gateway: Required if you plan to use a NAT Gateway for outbound internet access.
NAT Gateway: Deploy in a small, public subnet to allow outbound access to AWS services.
Route Tables: Associate route tables with the subnets to manage traffic routing within the VPC. Configure ACL rules as appropriate.
VPC Endpoints: Configure AWS VPC Endpoints for internal communication with AWS services. The Yellowbrick installation will always create a VPC Gateway Endpoint to S3 automatically. It is not recommended to use Amazon S3 interface endpoints (PrivateLink) due to additional data transfer costs. VPC interface endpoints to other services do not have the same concern. For the list of AWS services that require access, see Access to AWS Service APIs.
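The address-space guidance above can be checked with a little arithmetic before any infrastructure is created. The following is a minimal sketch using Python's standard `ipaddress` module. The VPC range `10.0.0.0/16`, the secondary `/24`, and the `/28` public subnet are illustrative assumptions and not values prescribed by Yellowbrick. The count of five reserved addresses per subnet reflects standard AWS VPC behavior.

```python
import ipaddress

# AWS reserves the first four addresses and the last address of every subnet.
AWS_RESERVED_PER_SUBNET = 5


def usable_addresses(cidr):
    """Number of addresses AWS can hand out to nodes and pods in this subnet."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET


def carve(vpc_cidr, prefixes):
    """Allocate non-overlapping subnets from the start of the VPC range.

    Prefix lengths are processed largest subnet first, so every block starts
    on a correctly aligned boundary. The result is in that same order.
    """
    vpc = ipaddress.ip_network(vpc_cidr)
    cursor = int(vpc.network_address)
    end = int(vpc.broadcast_address) + 1
    subnets = []
    for prefix in sorted(prefixes):
        if prefix < vpc.prefixlen:
            raise ValueError(f"/{prefix} is larger than the VPC itself")
        size = 2 ** (32 - prefix)
        if cursor + size > end:
            raise ValueError("VPC address space is too small for this layout")
        subnets.append(ipaddress.ip_network((cursor, prefix)))
        cursor += size
    return subnets


if __name__ == "__main__":
    # Hypothetical layout: primary /19, secondary /24, small public /28 for a NAT Gateway.
    for net in carve("10.0.0.0/16", [19, 24, 28]):
        print(net, usable_addresses(str(net)))
```

Under these assumptions a /19 primary subnet leaves 8187 assignable addresses for nodes and pods, while a /24 leaves only 251, which illustrates why smaller subnets are discouraged.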
Subnet Configuration:
Private Subnets: Create two private subnets, each in a different availability zone (AZ). These subnets should not assign public IP addresses and will host the EKS nodes.
Primary Subnet: An EKS installation requires at least two subnets. The Deployer will place the majority of the EKS nodes in a single primary subnet to minimize cross-AZ data transfer costs. This primary subnet should have a /19 CIDR block to support the anticipated pod IP requirements. When limited address space is available, bias the majority of addresses to the primary subnet. You must tag this subnet with primary: true so that the Deployer can identify it.
Tagging:
- Apply the additional tag cluster_yellowbrick_io_owner = yb-install to all manually created network and EKS infrastructure. This allows the Deployer to identify the correct components during deployment.
For examples of setting up a customized private infrastructure, refer to the Terraform reference architecture.
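The subnet and tagging requirements in this section can be expressed as a small pre-flight check. The sketch below is not part of the Deployer. It assumes subnet descriptions shaped like the entries returned by the EC2 DescribeSubnets API (`SubnetId`, `AvailabilityZone`, `CidrBlock`, `MapPublicIpOnLaunch`, `Tags`), and the function name `check_private_subnets` is hypothetical. Treating "exactly one primary subnet" as a hard rule is an interpretation of the guidance above.

```python
import ipaddress

OWNER_TAG_KEY = "cluster_yellowbrick_io_owner"
OWNER_TAG_VALUE = "yb-install"


def tags_of(resource):
    """Flatten an AWS-style [{'Key': ..., 'Value': ...}] tag list into a dict."""
    return {t["Key"]: t["Value"] for t in resource.get("Tags", [])}


def check_private_subnets(subnets):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if len(subnets) < 2:
        problems.append("need at least two subnets")
    if len({s["AvailabilityZone"] for s in subnets}) < 2:
        problems.append("subnets must span at least two availability zones")

    primaries = []
    for s in subnets:
        sid = s["SubnetId"]
        tags = tags_of(s)
        if s.get("MapPublicIpOnLaunch", False):
            problems.append(f"{sid}: auto-assigns public IP addresses")
        if tags.get(OWNER_TAG_KEY) != OWNER_TAG_VALUE:
            problems.append(f"{sid}: missing tag {OWNER_TAG_KEY} = {OWNER_TAG_VALUE}")
        if ipaddress.ip_network(s["CidrBlock"]).prefixlen > 24:
            problems.append(f"{sid}: smaller than /24")
        if tags.get("primary") == "true":
            primaries.append(s)

    if len(primaries) != 1:
        problems.append(
            f"expected exactly one subnet tagged primary: true, found {len(primaries)}"
        )
    elif ipaddress.ip_network(primaries[0]["CidrBlock"]).prefixlen > 19:
        sid = primaries[0]["SubnetId"]
        problems.append(f"{sid}: primary subnet is smaller than the recommended /19")
    return problems
```

A primary subnet smaller than /19 is reported as a problem here even though the guide permits it for smaller clusters; relax that check if your sizing is deliberate.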
Access to AWS Service APIs
The Deployer and the Yellowbrick Operator require access to several AWS services. You can choose between two approaches:
Outbound NAT Gateway:
- Create a third small public subnet to host a NAT Gateway, allowing outbound access to AWS APIs through the Internet Gateway. This method is simple and effective for most scenarios.
AWS VPC Interface Endpoints:
- Use AWS PrivateLink to create Interface Endpoints for the following services:
autoscaling
ec2
ecr.api
ecr.dkr
eks
elasticloadbalancing
logs
sts
The Deployer will create a VPC Gateway Endpoint to S3, so you do not need to create one yourself. If you use a custom DNS domain, be aware that there is no Interface Endpoint for Route 53, which may necessitate a NAT Gateway for DNS management from within the EKS cluster.
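When creating the Interface Endpoints, each short service name listed above has to be expanded into a full endpoint service name. The sketch below assumes the `com.amazonaws.<region>.<service>` naming pattern used in standard commercial AWS regions; other partitions may use a different prefix. The region `us-east-1` is only an example, and the helper name is hypothetical.

```python
# Short service names taken from the list in this section.
INTERFACE_SERVICES = [
    "autoscaling",
    "ec2",
    "ecr.api",
    "ecr.dkr",
    "eks",
    "elasticloadbalancing",
    "logs",
    "sts",
]


def endpoint_service_names(region):
    """Expand the short names into full endpoint service names for one region.

    Assumes the standard commercial-region pattern com.amazonaws.<region>.<service>.
    """
    return [f"com.amazonaws.{region}.{service}" for service in INTERFACE_SERVICES]


if __name__ == "__main__":
    # Example region only; substitute the region of your VPC.
    for name in endpoint_service_names("us-east-1"):
        print(name)
```

S3 is intentionally absent from the list because the Deployer creates the S3 Gateway Endpoint itself.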
Executing the Deployer
The Deployer will utilize an EC2 instance within the same VPC where Yellowbrick will be deployed. This instance will require access to the EKS cluster via the private endpoint and will manage the deployment process, including the creation of additional resources, ECR uploads, and Kubernetes workload installations.
Installation Process
NOTE
By installing Yellowbrick Enterprise Edition software into your Cloud Account, you agree to Yellowbrick’s Enterprise Edition EULA.
Subscribe to the Yellowbrick Data Warehouse Enterprise Edition AMI.
Create the base infrastructure as outlined in this deployment guide.
Follow either the CloudFormation instructions or the instructions for manually launching the Yellowbrick Deployer custom AMI. Once one of those two options is complete, access the Deployer from a web browser to continue with the installation.
Option 1: CloudFormation
Launch the Yellowbrick CloudFormation Deployer template.
Along with a stack name and basic parameters of the Deployer, the CloudFormation parameters offer a drop-down choice for a VPC and subnet to launch the Deployer into. It is very important to select the correct VPC and subnet from the previously created infrastructure. This ensures that when the Deployer creates the EKS cluster, it has access to the EKS cluster's API via the private endpoint during installation.
After setting all remaining CloudFormation parameters as appropriate for your installation, proceed to create the stack.
Upon completion of the stack creation, locate the Deployer URL on the outputs of the CloudFormation stack. Navigate to this URL to continue with the installation using the Deployer. Please note this instance may not be accessible from the internet, and you may need to perform additional steps to ensure HTTPS access.
Option 2: Custom AMI
Deploy an EC2 instance using the AMI in the VPC and subnet from the previously created infrastructure. This ensures that when the Deployer creates the EKS cluster, it has access to the EKS cluster's API via the private endpoint during installation. Please note this instance may not be accessible from the internet, and you may need to perform additional steps to gain SSH and HTTPS access.
The instance must be launched with an instance profile that assumes an IAM role containing the policies listed here. This IAM policy uses attribute-based access control (ABAC). Please ensure that the IAM role includes the following tags:
cluster_yellowbrick_io_owner = yb-install
cluster_yellowbrick_io_creator = yb-install
These tags are essential for the proper functioning of access control, enabling the instance to manage resources securely within the Yellowbrick infrastructure.
Create an SSH connection to the instance as the ubuntu user using the SSH keypair specified during the launch of the AMI.
The EC2 instance is configured to automatically start the interactive web UI for the deployment process. Accessing this UI requires an access key that can be retrieved by executing /opt/ybd/get-access-key from the remote shell.
From a web browser, access the EC2 instance over HTTPS port 443. Use the DNS or IP address of the EC2 instance as the hostname. Web traffic will be encrypted over TLS and a self-signed certificate will be used.
When accessing the Yellowbrick Deployer UI, you will need to provide the Deployer access key retrieved from the previous step.
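Because the instance is private and presents a self-signed certificate, it can help to confirm from a host inside the VPC that port 443 answers before opening a browser. The following is a minimal connectivity sketch using only the Python standard library; it is not a Yellowbrick tool, and the function names are hypothetical. Certificate verification is deliberately disabled here solely because the certificate is self-signed, so the check confirms reachability and nothing about the server's identity.

```python
import socket
import ssl


def self_signed_context():
    """TLS context that accepts a self-signed certificate (no identity check)."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be disabled before verify_mode is relaxed
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def deployer_reachable(host, port=443, timeout=5.0):
    """Return True if a TLS handshake with host:port completes, False otherwise."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with self_signed_context().wrap_socket(raw, server_hostname=host) as tls:
                return tls.version() is not None
    except OSError:
        # Covers refused connections, timeouts, DNS failures, and TLS errors.
        return False
```

For example, `deployer_reachable("10.0.0.25")` with the private IP address of your instance (the address shown is a placeholder) returns `False` when a security group or route blocks HTTPS, which is the most common cause of an unreachable Deployer UI in a private VPC.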
Accessing Yellowbrick Deployer UI
With a web browser, access the Deployer by following the instructions for the option you chose above.
On the "Restrict Access" step, indicate this is a private installation and click Next.
On the "Network" step, choose the correct VPC network previously created.
Continue with the deployment process as normal. The Deployer will configure the cluster, set up necessary IAM roles, create additional node groups, and deploy the Yellowbrick Operator and related workloads.
Terraform Reference
For a Terraform reference of this infrastructure, please see deployer-contrib.
Conclusion
By following this guide, you can establish a private IP environment for Yellowbrick within AWS and tailor the infrastructure to your specific requirements.