System Maintenance
The Yellowbrick appliance requires very little maintenance; however, you may need to verify that various components are working as intended.
On a healthy system, ybcli returns information like this at startup:
Yellowbrick Data CLI vx.x.x (Genx)
Copyright (c) 2016 Yellowbrick Data, Inc.
All rights reserved
-----------------------------------------
Redundant manager node detected
YBCLI is currently running on the PRIMARY manager node.
Local manager node : yb100-mgr0.yellowbrick.io (PRIMARY ACTIVE)
Remote manager node: yb100-mgr1.yellowbrick.io (SECONDARY ACTIVE)
Type 'help' for a list of supported commands
YBCLI (PRIMARY)>
The output includes detailed information about which nodes are present in the system, their role, and current status. Any errors, such as when no primary node is found or if the system is in maintenance mode, are reported during ybcli startup.
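For scripted checks, the manager node lines in the startup banner can be picked apart with a regular expression. The following is a minimal Python sketch, not a Yellowbrick-provided tool: it assumes the banner text has already been captured into a string and that the node lines look exactly like the sample above. The names BANNER, NODE_LINE, and parse_manager_nodes are hypothetical.

```python
import re

# Sample banner lines copied from the example ybcli output above.
BANNER = """\
Redundant manager node detected
YBCLI is currently running on the PRIMARY manager node.
Local manager node : yb100-mgr0.yellowbrick.io (PRIMARY ACTIVE)
Remote manager node: yb100-mgr1.yellowbrick.io (SECONDARY ACTIVE)
"""

# Assumed line shape: "<Local|Remote> manager node : <host> (<ROLE> <STATE>)".
NODE_LINE = re.compile(
    r"^(Local|Remote) manager node\s*:\s*(\S+)\s+\((\w+)\s+(\w+)\)\s*$"
)


def parse_manager_nodes(banner: str) -> dict:
    """Return a dict keyed by "local"/"remote" for each node line found."""
    nodes = {}
    for line in banner.splitlines():
        match = NODE_LINE.match(line.strip())
        if match:
            where, host, role, state = match.groups()
            nodes[where.lower()] = {"host": host, "role": role, "state": state}
    return nodes


nodes = parse_manager_nodes(BANNER)
for where, info in nodes.items():
    print(where, info["host"], info["role"], info["state"])
```

Lines that do not match the assumed shape are ignored, so a banner from a system with a different layout simply yields fewer entries rather than an error.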
The best way to get a complete overview of system health is to run the health all and status all commands. These two commands return the status of all components, including blades, power supplies, fans, external processors, and so on.
YBCLI (PRIMARY)> health blade 1
Bay: 1 Status: ok Power: on LED: [ OFF OFF ON ] FRU: C4-0005-01 R01 Serial: TAA160802003DE CPU: Booted (YBOS)
Retrieving blade alerts...
Blade alerts reported: 1
Bay: 1: volt_00_pvccin_cpu_mV → 2234 (max error)
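If you collect this output for monitoring, the blade summary line and the alert line can be turned into structured data. The sketch below is illustrative only and assumes the two line formats shown in the example above; other ybcli versions may print different fields. FIELD, ALERT, parse_blade_line, and parse_alert_line are hypothetical names, and the alert value is kept as a string because its type is not documented here.

```python
import re

# Lines copied from the example "health blade 1" output above.
BLADE_LINE = (
    "Bay: 1 Status: ok Power: on LED: [ OFF OFF ON ] "
    "FRU: C4-0005-01 R01 Serial: TAA160802003DE CPU: Booted (YBOS)"
)
ALERT_LINE = "Bay: 1: volt_00_pvccin_cpu_mV → 2234 (max error)"

# A field is "Key: value", where the value runs until the next "Key: " or end of line.
FIELD = re.compile(r"(\w+):\s*(.*?)(?=\s+\w+:\s|$)")

# Assumed alert shape: "Bay: <n>: <sensor> → <value> (<severity>)".
ALERT = re.compile(r"^Bay:\s*(\d+):\s*(\S+)\s*(?:→|->)\s*(\S+)\s*\((.+)\)$")


def parse_blade_line(line: str) -> dict:
    """Split a blade summary line into a {field name: value} dict."""
    return dict(FIELD.findall(line.strip()))


def parse_alert_line(line: str):
    """Return the parts of an alert line, or None if the line is not an alert."""
    match = ALERT.match(line.strip())
    if match is None:
        return None
    bay, sensor, value, severity = match.groups()
    return {"bay": int(bay), "sensor": sensor, "value": value, "severity": severity}


blade = parse_blade_line(BLADE_LINE)
alert = parse_alert_line(ALERT_LINE)
print(blade["Status"], blade["CPU"])
print(alert)
```

Note that in this example the blade reports Status: ok while still raising one alert, so a check that looks only at the Status field would miss the voltage warning.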
Run the health storage command on any node that is not in standby state:
...
Replicated block device mounted: OK
Data replication active: OK
Data replication in progress: NO
...
In this example, the command was run on the primary node. It shows that block replication between the manager nodes is running and healthy, and that replication is not currently in progress.
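The same three checks can be evaluated in a script. This is a minimal sketch that assumes the check names and the OK/NO values appear exactly as in the excerpt above; the values printed in a failing or resynchronizing state are not shown in this document, so the code only compares against the healthy values. parse_storage_checks and replication_is_healthy are hypothetical helper names.

```python
# Excerpt copied from the example "health storage" output above.
STORAGE_OUTPUT = """\
...
Replicated block device mounted: OK
Data replication active: OK
Data replication in progress: NO
...
"""


def parse_storage_checks(output: str) -> dict:
    """Map each "name: value" line to {name: value}; other lines are skipped."""
    checks = {}
    for line in output.splitlines():
        name, sep, value = line.rpartition(":")
        if sep and name.strip():
            checks[name.strip()] = value.strip()
    return checks


def replication_is_healthy(checks: dict) -> bool:
    """True only when the device is mounted and replication is active."""
    return (
        checks.get("Replicated block device mounted") == "OK"
        and checks.get("Data replication active") == "OK"
    )


checks = parse_storage_checks(STORAGE_OUTPUT)
healthy = replication_is_healthy(checks)
# "NO" means no resynchronization is currently running, as in the example.
resync_idle = checks.get("Data replication in progress") == "NO"
print(healthy, resync_idle)
```

A missing check is treated as unhealthy, which is the conservative choice when the output format differs from what the sketch expects.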
You can take a manager node out of the cluster by putting it into standby mode. This means that the node will stop participating in the cluster and will not be considered for automatic system failover. However, you can still initiate a manual system failover to the node.
The health storage command provides accurate information about the progress of resynchronizing the block device.
Disk Health
The manager nodes each have 4 block devices: 2 are mirrored for the operating system, and 2 are mirrored for the data in the Yellowbrick database. The latter is also actively synchronized to the secondary manager node.
You can check the status and health of all manager node disk drives by looking at the output of the health storage command.
A drive can be removed at any time and replaced with a drive provided by Yellowbrick. If possible, put the system into maintenance mode before any planned hardware replacement.
Restarting the Database After a Power Failure
After a power failure that brings down both manager nodes, once the system is powered back on, use the system status command to check the overall state of the system. The database may not come back up, in which case you can start it with the database start command.