AWS Private Installation Instructions
Overview
This document provides a comprehensive guide for installing the Yellowbrick Data Warehouse in a private Amazon Web Services (AWS) environment using a bring-your-own-Kubernetes (BYOK) approach. This guide assumes that you will configure the necessary infrastructure within AWS, including the Virtual Private Cloud (VPC) and Elastic Kubernetes Service (EKS) cluster, to meet the requirements of a secure and private deployment. The reference architecture provided serves as a baseline but can be customized to fit your specific environment.
Understanding a Private EKS Cluster
A private Amazon Elastic Kubernetes Service (EKS) cluster is configured to restrict access to both the control plane and worker nodes within a Virtual Private Cloud (VPC), with no public internet exposure. The key characteristics of a private EKS cluster include:
Private EKS API: The EKS control plane is accessed via a private endpoint within the VPC, with no public IP address assigned.
Private Worker Nodes: Worker nodes are deployed in private subnets without public IP addresses, ensuring that they are only accessible within the VPC.
VPC Endpoints for AWS Services: Use of VPC interface and gateway endpoints (VPCE) to securely connect to AWS services like S3, EC2, and ECR, without requiring internet access.
Outbound Connectivity: Outbound traffic can be managed through a NAT Gateway or VPC interface endpoints, allowing secure access to AWS services and external resources.
For more details, refer to the AWS documentation on private EKS clusters.
Infrastructure Preparation
Creating the VPC Network
Before deploying the EKS cluster, you must create a VPC network that satisfies the requirements for a private installation. The network should be designed to support private communication between EKS nodes and AWS services while minimizing cross-AZ data transfer costs.
VPC Configuration:
Create a VPC with an appropriate address space. The AWS VPC CNI for EKS will allocate an IP address per Kubernetes pod, and thus requires a large number of IP addresses in the VPC. The VPC should be sufficiently large to account for all subnets, including a primary subnet of recommended size /19 which will host the EKS nodes and EKS pod workloads.
Configure the VPC with the following components:
Internet Gateway: Required if you plan to use a NAT Gateway for outbound internet access.
NAT Gateway: Deploy in a small, public subnet to allow outbound access to AWS services if using AWS service APIs.
Route Tables: Associate route tables with the subnets to manage traffic routing within the VPC. Configure ACL rules as appropriate.
VPC Endpoints: Configure AWS VPC Endpoints for internal communication with AWS services.
Subnet Configuration:
Private Subnets: Create two private subnets, each in a different availability zone (AZ). These subnets should not assign public IP addresses and will host the EKS nodes.
Primary Subnet: An EKS installation requires at least two subnets. The deployment will place the majority of the EKS nodes in a single primary subnet to minimize cross-AZ data transfer costs. This primary subnet should have a /19 CIDR block to support the anticipated pod IP requirements. When limited address space is available, bias the majority of addresses to the primary subnet. You must tag this subnet with `primary: true`.
Service Endpoints: Configure an S3 Gateway Endpoint within the VPC to allow direct access to S3. While AWS PrivateLink for S3 is available, it is not recommended due to the additional data transfer costs.
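The subnet sizing above can be checked with a little address arithmetic. The following is a minimal sketch, not a prescribed layout: the VPC range `10.0.0.0/18`, the /20 secondary subnet, and the /28 public subnet are illustrative assumptions, and only the /19 primary subnet comes from this guide. It uses Python's standard `ipaddress` module and accounts for the five addresses AWS reserves in every subnet.

```python
import ipaddress

# Hypothetical VPC address space; substitute your own range.
vpc = ipaddress.ip_network("10.0.0.0/18")

# Split the /18 into two /19 halves; the first becomes the primary subnet.
primary, remainder = vpc.subnets(new_prefix=19)

# Carve the second half into /20 blocks: one for the second private subnet,
# and one from which a small public subnet (for a NAT Gateway) is taken.
secondary, spare = remainder.subnets(new_prefix=20)
public = next(spare.subnets(new_prefix=28))

AWS_RESERVED_PER_SUBNET = 5  # AWS reserves 5 addresses in every subnet


def usable(net):
    """Number of addresses left for nodes and pods in a subnet."""
    return net.num_addresses - AWS_RESERVED_PER_SUBNET


for name, net in [("primary", primary), ("secondary", secondary), ("public", public)]:
    print(f"{name:9} {net}  usable addresses: {usable(net)}")
```

Because the AWS VPC CNI assigns one VPC address per pod, the usable count of the primary subnet is effectively the ceiling on nodes plus pods placed there.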
Tagging:
- Apply the additional tag `cluster_yellowbrick_io_owner = yb-install` to all manually created network and EKS infrastructure. This allows the Deployer to identify the correct components during deployment.
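As an illustration of the tagging requirements, the sketch below keeps the tags in one place and converts them to the `Key`/`Value` list shape that EC2 tagging APIs accept (EKS, by contrast, takes tags as a plain key-to-value map). The helper name `ec2_tag_list` is hypothetical, and treating `primary: true` as a tag with key `primary` and string value `true` is an assumption based on the wording of this guide.

```python
# Tag requested by this guide for all manually created network and EKS resources.
OWNER_TAGS = {"cluster_yellowbrick_io_owner": "yb-install"}

# The primary subnet carries the owner tag plus the primary marker
# (assumed here to be key "primary" with the string value "true").
PRIMARY_SUBNET_TAGS = {**OWNER_TAGS, "primary": "true"}


def ec2_tag_list(tags):
    """Convert a plain dict into the [{"Key": ..., "Value": ...}] list used by EC2 APIs."""
    return [{"Key": key, "Value": value} for key, value in tags.items()]


print(ec2_tag_list(PRIMARY_SUBNET_TAGS))
```

Keeping the tags in a single dictionary makes it harder for one resource to drift from the others when the infrastructure is created by script.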
Access to AWS Service APIs
The deployment process and the Yellowbrick Operator require access to several AWS services. You can choose between two approaches:
Outbound NAT Gateway:
- Create a third small public subnet to host a NAT Gateway, allowing outbound access to AWS APIs through the Internet Gateway. This method is simple and effective for most scenarios.
AWS VPC Interface Endpoints:
- Use AWS PrivateLink to create Interface Endpoints for the following services:
autoscaling
ec2
ecr.api
ecr.dkr
eks
elasticloadbalancing
logs
sts
The Deployer will create a VPC Gateway Endpoint to S3, so you do not need to create one yourself. If you use a custom DNS domain, be aware that there is no Interface Endpoint for Route 53, which may necessitate the use of a NAT Gateway for DNS management from within the EKS cluster.
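If you choose Interface Endpoints, each service in the list above corresponds to a PrivateLink service name. In standard commercial regions these follow the pattern `com.amazonaws.<region>.<service>`; the sketch below expands the list for a given region. The region `us-east-1` and the function name are illustrative assumptions, and other partitions may use a different prefix.

```python
# Services this guide lists for Interface Endpoints.
INTERFACE_SERVICES = [
    "autoscaling",
    "ec2",
    "ecr.api",
    "ecr.dkr",
    "eks",
    "elasticloadbalancing",
    "logs",
    "sts",
]


def endpoint_service_names(region):
    """Expand the short service names into PrivateLink service names for a region."""
    return [f"com.amazonaws.{region}.{service}" for service in INTERFACE_SERVICES]


# Hypothetical region; substitute the region of your VPC.
for name in endpoint_service_names("us-east-1"):
    print(name)
```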
Creating the EKS Cluster
When deploying the EKS cluster, the following configurations are required:
Cluster Version: The EKS cluster must use Kubernetes version 1.30 or later.
Cluster Name: Name the EKS cluster to match the name of the Yellowbrick instance.
Private Endpoint: Enable the EKS private endpoint to restrict API access to within the VPC. Disable the public endpoint to ensure all communication remains within the private network.
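The three requirements above can be checked mechanically before starting the Deployer. The following sketch validates a dictionary shaped like the `cluster` object returned by the EKS DescribeCluster API (fields `name`, `version`, and `resourcesVpcConfig`). The function `check_cluster` and the sample values, including the instance name `yb-prod`, are hypothetical.

```python
MINIMUM_VERSION = (1, 30)


def check_cluster(cluster, instance_name):
    """Return a list of problems found in an EKS cluster description."""
    problems = []

    # EKS reports the version as a string such as "1.30".
    major, minor = (int(part) for part in cluster["version"].split(".")[:2])
    if (major, minor) < MINIMUM_VERSION:
        problems.append(f"Kubernetes {cluster['version']} is older than 1.30")

    if cluster["name"] != instance_name:
        problems.append("cluster name does not match the Yellowbrick instance name")

    vpc_config = cluster["resourcesVpcConfig"]
    if not vpc_config["endpointPrivateAccess"]:
        problems.append("private endpoint is not enabled")
    if vpc_config["endpointPublicAccess"]:
        problems.append("public endpoint is still enabled")

    return problems


# Hypothetical cluster description for illustration.
sample = {
    "name": "yb-prod",
    "version": "1.30",
    "resourcesVpcConfig": {"endpointPrivateAccess": True, "endpointPublicAccess": False},
}
print(check_cluster(sample, "yb-prod") or "cluster meets the listed requirements")
```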
EC2 Instance for Running the Deployer
Deployment uses an EC2 instance within the same VPC in which the Yellowbrick Data Warehouse will be deployed. This instance requires access to the EKS cluster via the private endpoint and manages the deployment process, including the creation of additional resources, ECR uploads, and Kubernetes workload installations.
Installation Process
Subscribe to the Yellowbrick Data Warehouse Enterprise Edition AMI.
Create the base infrastructure as outlined in this deployment guide.
Next, follow either the CloudFormation instructions or the instructions for launching the Yellowbrick Deployer custom AMI manually. Once one of those two options is complete, access the Yellowbrick Deployer from a web browser to continue with the installation.
Option 1: CloudFormation
Launch the Yellowbrick CloudFormation Deployer template.
Along with a stack name and basic parameters of the Deployer, the CloudFormation parameters offer a drop-down choice of VPC and subnet to launch the Deployer into. It is very important to select the correct VPC and subnet from the previously created infrastructure. This will ensure that when the Deployer creates the EKS cluster, it will have access to the EKS cluster's API via the private endpoint during installation.
After setting all remaining CloudFormation parameters as appropriate for your installation, proceed to create the stack.
Upon completion of the stack creation, locate the Yellowbrick Deployer URL on the outputs of the CloudFormation stack. Navigate to this URL to continue with the installation using the Yellowbrick Deployer. Please note this instance may not be accessible from the internet, and you may need to perform additional steps to ensure HTTPS access.
Option 2: Custom AMI
Deploy an EC2 instance using the AMI in the VPC and subnet from the previously created infrastructure. This will ensure that when the Deployer creates the EKS cluster, it will have access to the EKS cluster's API via the private endpoint during installation. Please note this instance may not be accessible from the internet, and you may need to perform additional steps to ensure SSH and HTTPS access.
The instance must be launched with an instance profile that assumes an IAM role containing the policies listed here. This IAM policy uses attribute-based access control (ABAC). Please ensure that the IAM role includes the following tags:
cluster_yellowbrick_io_owner = yb-install
cluster_yellowbrick_io_creator = yb-install
These tags are essential for the proper functioning of access control, enabling the instance to manage resources securely within the Yellowbrick infrastructure.
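Because the ABAC policy depends on both tags being present, it can be worth verifying the role before launching the instance. The sketch below checks a tag list in the `Key`/`Value` shape returned by the IAM ListRoleTags API. The function `missing_role_tags` and the sample input are hypothetical.

```python
# Tags this guide requires on the IAM role.
REQUIRED_ROLE_TAGS = {
    "cluster_yellowbrick_io_owner": "yb-install",
    "cluster_yellowbrick_io_creator": "yb-install",
}


def missing_role_tags(tag_list):
    """Return the required tags that are absent or carry a different value."""
    present = {tag["Key"]: tag["Value"] for tag in tag_list}
    return {
        key: value
        for key, value in REQUIRED_ROLE_TAGS.items()
        if present.get(key) != value
    }


# Hypothetical role that is missing the creator tag.
role_tags = [{"Key": "cluster_yellowbrick_io_owner", "Value": "yb-install"}]
print(missing_role_tags(role_tags))
```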
Create an SSH connection to the instance as the `ubuntu` user with the SSH keypair as given in the launch of the AMI.
The EC2 instance is configured to automatically start the interactive web UI for the deployment process. Accessing this UI requires an access key that can be retrieved by executing `/opt/ybd/get-access-key` from the remote shell.
From a web browser, access the EC2 instance over HTTPS port 443. Use the DNS or IP address of the EC2 instance as the hostname. Web traffic will be encrypted over TLS and a self-signed certificate will be used.
When accessing the Yellowbrick Deployer UI, you will need to provide the Deployer access key retrieved from the previous step.
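Since the instance may not be reachable from the internet, a quick reachability probe from a host inside the VPC (or across a VPN) can save time before opening a browser. The sketch below is an assumption-laden convenience, not part of the Deployer: the helper names are hypothetical, and certificate verification is deliberately disabled only because the guide states that the Deployer presents a self-signed certificate. Do not reuse that context for other endpoints.

```python
import ssl
import urllib.error
import urllib.request


def deployer_url(host):
    """Build the HTTPS URL for the Deployer UI (port 443 is the HTTPS default)."""
    return f"https://{host}/"


def self_signed_context():
    """TLS context that accepts the Deployer's self-signed certificate."""
    context = ssl.create_default_context()
    context.check_hostname = False  # must be turned off before relaxing verify_mode
    context.verify_mode = ssl.CERT_NONE
    return context


def is_reachable(host, timeout=5):
    """Return True if anything answers over HTTPS on the given host."""
    try:
        with urllib.request.urlopen(
            deployer_url(host), context=self_signed_context(), timeout=timeout
        ):
            return True
    except urllib.error.HTTPError:
        return True  # the server answered, even if with an error status
    except OSError:
        return False  # connection refused, timed out, or name not resolved


# Example (hypothetical private address): is_reachable("10.0.1.25")
```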
Accessing Yellowbrick Deployer UI
With a web browser, access the Deployer by following the instructions given for the option you used above.
On the "Restrict Access" step, indicate this is a private installation and input the name of the EKS cluster previously created. The existence of the EKS cluster will be validated, and the network configuration will be shown. Please verify those values are correct.
Continue with the deployment process as normal. The Deployer will configure the cluster, set up necessary IAM roles, create additional node groups, and deploy the Yellowbrick Operator and related workloads.
Conclusion
This guide provides a blueprint for deploying the Yellowbrick Data Warehouse in a secure AWS environment. By leveraging the flexibility of AWS services and following the bring-your-own-Kubernetes approach, you can tailor the infrastructure to your enterprise’s specific requirements.