GCP Private Installation Instructions
Overview
This document outlines the process for installing the Yellowbrick Data Warehouse on Google Cloud Platform (GCP) using a private Google Kubernetes Engine (GKE) cluster. GCP offers flexibility for deploying infrastructure with varying levels of privacy and customization. This guide supports a bring-your-own-Kubernetes (BYOK) approach, allowing customers to create custom networking configurations that meet their specific requirements, while still leveraging the Yellowbrick installer to manage the deployment process.
Understanding a Private GKE Cluster
In GCP, a private GKE cluster is a Kubernetes cluster where the control plane (master) and nodes have restricted access, with no public IP addresses assigned to them. The key features of a private GKE cluster include:
- Private Nodes: Nodes are deployed without public IP addresses, ensuring that they are only accessible from within the VPC.
- Private Control Plane (Master): The GKE control plane is accessed through internal IP addresses within the VPC, with no public endpoint.
- Private Google Access and Private Service Access: Nodes without public IP addresses reach Google APIs such as Cloud Storage or Pub/Sub through Private Google Access. Google-managed services hosted in a separate VPC are reached over VPC peering with Private Service Access.
For more details, refer to the Google Cloud documentation on private GKE clusters.
Infrastructure Preparation
Network Configuration
Before deploying the GKE cluster, you must configure the Virtual Private Cloud (VPC) network. This network will support the private cluster and provide secure communication with Google Cloud services.
VPC Setup:
- Create a VPC with sufficient IP ranges to accommodate your cluster and future expansion.
- Configure subnets within the VPC, ensuring that the primary subnet for GKE nodes is large enough to handle pod IP allocations. A /22 subnet is recommended for the primary node subnet to allow for growth.
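A little arithmetic shows what the /22 recommendation provides. The Python sketch below uses only the standard library. It assumes Google Cloud's documented reservation of four addresses in each primary IPv4 range, and the CIDR values are hypothetical examples.

```python
import ipaddress

# Google Cloud reserves four addresses in every primary IPv4 subnet range:
# the network address, the default gateway, the second-to-last address,
# and the broadcast address.
GCP_RESERVED_PER_SUBNET = 4


def usable_addresses(cidr: str) -> int:
    """Return how many addresses in a primary range remain usable."""
    network = ipaddress.ip_network(cidr)  # strict: host bits must be zero
    return network.num_addresses - GCP_RESERVED_PER_SUBNET


def meets_recommendation(cidr: str, max_prefix: int = 22) -> bool:
    """True if the range is at least as large as the recommended /22."""
    return ipaddress.ip_network(cidr).prefixlen <= max_prefix


if __name__ == "__main__":
    for candidate in ("10.10.0.0/22", "10.10.0.0/24"):
        print(candidate, usable_addresses(candidate), meets_recommendation(candidate))
```

Run as a script, it prints "10.10.0.0/22 1020 True" and "10.10.0.0/24 252 False". A /22 leaves 1,020 usable addresses, while a /24 falls below the recommended size.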
Subnet Configuration:
- Primary Subnet: Allocate a /22 subnet for GKE nodes, ensuring that all nodes reside within this subnet to avoid cross-subnet communication costs.
- Private Service Access: Enable Private Service Access to connect the VPC to Google-managed services without exposing data to the public internet.
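The subnet settings above can be expressed as gcloud commands. The Python helper below only builds and prints the argument lists, so running it creates nothing. The network, subnet, region, range name, and /16 peering prefix are hypothetical placeholders. The helper also enables Private Google Access on the subnet, which the Deployer VM relies on to reach Google Cloud services.

```python
import shlex


def subnet_and_psa_commands(network: str, subnet: str, region: str, cidr: str,
                            psa_range: str, psa_prefix: int = 16) -> list:
    """Build (but do not run) gcloud commands for the node subnet and Private Service Access."""
    return [
        # Primary subnet for GKE nodes. Private Google Access lets resources
        # without external IPs reach Google APIs.
        ["gcloud", "compute", "networks", "subnets", "create", subnet,
         f"--network={network}", f"--region={region}", f"--range={cidr}",
         "--enable-private-ip-google-access"],
        # Reserve an internal range for Google-managed services.
        ["gcloud", "compute", "addresses", "create", psa_range,
         "--global", "--purpose=VPC_PEERING", f"--prefix-length={psa_prefix}",
         f"--network={network}"],
        # Peer the VPC with the Service Networking producer network using that range.
        ["gcloud", "services", "vpc-peerings", "connect",
         "--service=servicenetworking.googleapis.com",
         f"--ranges={psa_range}", f"--network={network}"],
    ]


if __name__ == "__main__":
    for cmd in subnet_and_psa_commands("yb-vpc", "yb-nodes", "us-central1",
                                       "10.10.0.0/22", "yb-psa-range"):
        print(shlex.join(cmd))
```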
Firewall Rules:
- Create firewall rules to allow internal communication between GKE nodes and necessary Google Cloud services. For private installations, ensure that no public ingress is allowed, and restrict egress as needed.
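The sketch below shows a firewall rule that admits traffic only from internal ranges. It assumes the node subnet and any secondary ranges use RFC 1918 space and rejects anything else, such as 0.0.0.0/0. The rule and network names are hypothetical. Every VPC has an implied rule that denies all ingress, so ingress stays private as long as you never create a rule with a public source range.

```python
import ipaddress
import shlex

# RFC 1918 private IPv4 blocks; this sketch treats anything else as public.
RFC1918_BLOCKS = [ipaddress.ip_network(n)
                  for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def is_rfc1918(cidr: str) -> bool:
    """True if the CIDR lies entirely inside an RFC 1918 block."""
    net = ipaddress.ip_network(cidr)
    # subnet_of() raises TypeError on mixed IP versions, so check IPv4 first.
    return net.version == 4 and any(net.subnet_of(block) for block in RFC1918_BLOCKS)


def internal_allow_rule(name: str, network: str, source_ranges: list) -> list:
    """Build (but do not run) a gcloud command allowing ingress from internal ranges only."""
    for cidr in source_ranges:
        if not is_rfc1918(cidr):
            raise ValueError(f"{cidr} is not an internal range; public ingress is not allowed")
    return ["gcloud", "compute", "firewall-rules", "create", name,
            f"--network={network}", "--direction=INGRESS", "--action=ALLOW",
            "--rules=tcp,udp,icmp", f"--source-ranges={','.join(source_ranges)}"]


if __name__ == "__main__":
    print(shlex.join(internal_allow_rule("yb-allow-internal", "yb-vpc", ["10.10.0.0/22"])))
```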
GKE Cluster Deployment
The GKE cluster must not be created in Autopilot mode, because Autopilot does not allow the custom node pools required to run the Yellowbrick Data Warehouse. Only Standard GKE clusters are supported.
GKE Cluster Configuration
When deploying the GKE cluster, the following settings are required:
Cluster Version:
- The GKE cluster must be created with Kubernetes version 1.30 or later.
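To confirm the version requirement on an existing cluster, you can read the control plane version, for example with gcloud container clusters describe CLUSTER --region REGION --format="value(currentMasterVersion)", and compare it numerically. The Python sketch below does that comparison. The version strings are hypothetical examples in GKE's usual major.minor.patch-gke.N form.

```python
import re

MIN_VERSION = (1, 30)


def parse_minor(version: str) -> tuple:
    """Return (major, minor) from a GKE version such as '1.30.5-gke.1014000'."""
    match = re.match(r"v?(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_supported(version: str) -> bool:
    # Compare integer tuples, not strings: as strings, "1.9" would sort after "1.30".
    return parse_minor(version) >= MIN_VERSION


if __name__ == "__main__":
    for v in ("1.29.8-gke.1031000", "1.30.5-gke.1014000", "1.31.1-gke.1678000"):
        print(v, is_supported(v))
```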
Regional Cluster:
- The GKE cluster must be regional, with its nodes placed in a single zone.
Private Endpoints:
- Enable private control plane and nodes, ensuring that all communication occurs within the VPC. Disable the public endpoint to ensure full privacy.
Node Configuration:
- Place all GKE nodes within the primary private subnet. This helps to minimize data transfer costs and improve security by keeping all communication within the VPC.
- At least one node pool must have the Kubernetes label cluster.yellowbrick.io/node_type: yb-operator, and that node pool must have at least one node available or have autoscaling enabled.
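You can check the node pool requirement against the JSON returned by gcloud container node-pools list --cluster CLUSTER --region REGION --format=json. The sketch below assumes the GKE API's NodePool field names (config.labels, initialNodeCount, autoscaling.enabled). It uses initialNodeCount as a rough proxy for "at least one node available", which may not match the pool's current size after a resize. The sample pools are hypothetical.

```python
import json

LABEL_KEY = "cluster.yellowbrick.io/node_type"
LABEL_VALUE = "yb-operator"


def is_operator_pool(pool: dict) -> bool:
    """True if the pool has the operator label and either nodes or autoscaling."""
    labels = pool.get("config", {}).get("labels", {})
    if labels.get(LABEL_KEY) != LABEL_VALUE:
        return False
    autoscaling_enabled = pool.get("autoscaling", {}).get("enabled", False)
    # initialNodeCount approximates "at least one node available".
    return autoscaling_enabled or pool.get("initialNodeCount", 0) >= 1


def operator_pools(node_pools_json: str) -> list:
    """Return the names of node pools that satisfy the yb-operator requirement."""
    return [p["name"] for p in json.loads(node_pools_json) if is_operator_pool(p)]


if __name__ == "__main__":
    sample = json.dumps([
        {"name": "default-pool", "config": {"labels": {}}, "initialNodeCount": 3},
        {"name": "yb-operator-pool",
         "config": {"labels": {LABEL_KEY: LABEL_VALUE}},
         "initialNodeCount": 0,
         "autoscaling": {"enabled": True, "minNodeCount": 0, "maxNodeCount": 2}},
    ])
    print(operator_pools(sample))  # ['yb-operator-pool']
```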
Networking:
- Network access must be private.
- The cluster DNS provider must be kube-dns.
Security:
- Workload Identity must be enabled.
The following settings are recommended but not mandatory:
- Enable NodeLocal DNSCache. The Yellowbrick Data Warehouse performs better with this local DNS cache.
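Together, the required and recommended settings map onto a single gcloud container clusters create command. The Python sketch below builds that command without running it. The project, names, region, zone, and /28 control plane range are hypothetical. As one way to satisfy the node pool requirement, it labels the default node pool as the yb-operator pool. The clusters create command produces a Standard cluster; Autopilot clusters come from clusters create-auto. Check each flag against the gcloud reference for your SDK version before use.

```python
import shlex


def cluster_create_command(project: str, name: str, region: str, zone: str,
                           network: str, subnet: str, subnet_cidr: str,
                           master_cidr: str = "172.16.0.32/28",
                           version: str = "1.30") -> list:
    """Build (but do not run) a gcloud command for a private, regional Standard cluster."""
    if not zone.startswith(region + "-"):
        raise ValueError(f"zone {zone} is not in region {region}")
    return [
        # 'clusters create' builds a Standard cluster (not Autopilot).
        "gcloud", "container", "clusters", "create", name,
        f"--project={project}",
        # Regional cluster whose nodes are restricted to a single zone.
        f"--region={region}", f"--node-locations={zone}",
        f"--cluster-version={version}",
        # VPC-native networking in the primary subnet, private nodes and endpoint.
        f"--network={network}", f"--subnetwork={subnet}", "--enable-ip-alias",
        "--enable-private-nodes", "--enable-private-endpoint",
        f"--master-ipv4-cidr={master_cidr}",
        "--enable-master-authorized-networks",
        f"--master-authorized-networks={subnet_cidr}",
        # kube-dns, Workload Identity, and the recommended NodeLocal DNSCache.
        "--cluster-dns=kubedns",
        f"--workload-pool={project}.svc.id.goog",
        "--addons=NodeLocalDNSCache",
        # The default pool doubles as the yb-operator pool with one node.
        "--node-labels=cluster.yellowbrick.io/node_type=yb-operator",
        "--num-nodes=1",
    ]


if __name__ == "__main__":
    print(shlex.join(cluster_create_command(
        "my-project", "yb-gke", "us-central1", "us-central1-a",
        "yb-vpc", "yb-nodes", "10.10.0.0/22")))
```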
Virtual Machine Deployment
Deploy a VM instance within the primary private subnet where the Deployer will be launched. This instance will need access to the GKE cluster via private endpoints and to Google Cloud services through Private Google Access. The Deployer will then complete the deployment process, including the creation of any additional resources and the installation of Kubernetes workloads.
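You could create a Deployer VM with only a private IP address along the lines of the sketch below, which builds the gcloud compute instances create arguments. The instance name, image, image project, and service account are hypothetical placeholders; substitute the values from your Marketplace subscription. The subnet needs Private Google Access enabled so that the VM can reach Google Cloud APIs without an external address.

```python
import shlex


def deployer_vm_command(name: str, zone: str, subnet: str, image: str,
                        image_project: str, service_account: str) -> list:
    """Build (but do not run) a gcloud command for a private-only Deployer VM."""
    return ["gcloud", "compute", "instances", "create", name,
            f"--zone={zone}",
            f"--subnet={subnet}",
            # No external IP: the VM is reachable only from inside the VPC.
            "--no-address",
            f"--image={image}", f"--image-project={image_project}",
            f"--service-account={service_account}",
            # Broad OAuth scope; the service account's IAM roles limit what it can do.
            "--scopes=cloud-platform"]


if __name__ == "__main__":
    print(shlex.join(deployer_vm_command(
        "yb-deployer", "us-central1-a", "yb-nodes",
        "YELLOWBRICK_IMAGE", "YELLOWBRICK_IMAGE_PROJECT",
        "yb-deployer@my-project.iam.gserviceaccount.com")))
```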
Installation Process
1. Subscribe to the Yellowbrick Data Warehouse Enterprise Edition image in the Google Cloud Marketplace.
2. Create the base infrastructure as outlined in this deployment guide.
3. Launch a Google Compute Engine VM using the Yellowbrick image in the target subnet within the VPC. Assign the necessary roles listed here to the VM's service account. Because this VM is not accessible from the internet, you may need to perform additional steps to enable SSH and HTTPS access.
4. Create an SSH connection to the VM as the ybdadmin user, using the SSH key pair specified during the launch. The VM is configured to start the interactive web UI for the deployment process automatically.
5. Retrieve the access key by running /opt/ybd/get-access-key from the remote shell.
6. From a web browser, access the VM over HTTPS on port 443 using the VM's private IP address or DNS name. The connection is encrypted with a self-signed certificate.
7. Enter the access key retrieved earlier to proceed with the installation.
8. During the installation, specify that this is a private installation and provide the name of the GKE cluster created earlier. The Deployer validates that the GKE cluster exists and displays its network configuration. Verify that these values are correct.
9. The Deployer completes the deployment by configuring the cluster, assigning the necessary IAM roles, creating additional node pools, and deploying the Yellowbrick Operator and related workloads.
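The installation requires SSH and HTTPS access to the private Deployer VM. If no VPN or bastion host is available, Identity-Aware Proxy (IAP) TCP forwarding can provide that access. The Python sketch below builds three gcloud commands: a firewall rule admitting Google's IAP forwarding range (35.235.240.0/20) on ports 22 and 443, an IAP-tunnelled SSH session that runs /opt/ybd/get-access-key as ybdadmin, and a tunnel that exposes the VM's port 443 on a local port. The VM, zone, network, rule name, key path, and local port are hypothetical.

```python
import shlex

# Source range Google uses for IAP TCP forwarding.
IAP_RANGE = "35.235.240.0/20"


def iap_access_commands(vm: str, zone: str, network: str, key_file: str,
                        local_port: int = 8443) -> list:
    """Build (but do not run) gcloud commands for reaching a private VM through IAP."""
    return [
        # Allow SSH and HTTPS only from IAP's forwarding range.
        ["gcloud", "compute", "firewall-rules", "create", "allow-iap-to-deployer",
         f"--network={network}", "--direction=INGRESS", "--action=ALLOW",
         "--rules=tcp:22,tcp:443", f"--source-ranges={IAP_RANGE}"],
        # Retrieve the access key over an IAP-tunnelled SSH session as ybdadmin.
        ["gcloud", "compute", "ssh", f"ybdadmin@{vm}", f"--zone={zone}",
         "--tunnel-through-iap", f"--ssh-key-file={key_file}",
         "--command=/opt/ybd/get-access-key"],
        # Forward the VM's port 443 to https://localhost:<local_port>.
        ["gcloud", "compute", "start-iap-tunnel", vm, "443",
         f"--local-host-port=localhost:{local_port}", f"--zone={zone}"],
    ]


if __name__ == "__main__":
    for cmd in iap_access_commands("yb-deployer", "us-central1-a", "yb-vpc",
                                   "./yb-deployer-key"):
        print(shlex.join(cmd))
```

With the tunnel running, the browser connects to https://localhost:8443 rather than the VM's private IP address, and still sees the VM's self-signed certificate.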
Conclusion
By following this guide, you can establish a secure environment for the Yellowbrick Data Warehouse within GCP using a private GKE cluster.