
Nutanix /home directory is full

Category: HCI / Nutanix, 2020. 3. 3. 16:39

The /home directory on a Nutanix CVM can become full.

 

This alert is raised when disk usage on the CVM (/home/nutanix) exceeds the threshold. Typical causes:

-      AOS image files uploaded for a One-click upgrade were left behind

-      Log Collector output

-      Old files were left behind

 

 

For details on this issue, see the KB article below.

What to do when /home partition or /home/nutanix directory is full

 

 

Article # 000001540
Last modified on Jan 13th 2020

Summary: This article describes ways to safely free up space if /home or /home/nutanix becomes full or does not contain enough space to facilitate an AOS upgrade.

Versions affected: ALL AOS Version
Troubleshooting Upgrade

Description

WARNING: DO NOT treat the Nutanix CVM (Controller VM) as a normal Linux machine. DO NOT perform "rm -rf /home" on any of the CVMs. It could lead to data loss scenarios. Contact Nutanix Support in case you have any doubts.

This condition can be reported in two scenarios:

  • The NCC health check disk_usage_check reports that the /home partition usage is above a certain threshold (by default 75%)
  • The pre-upgrade check test_nutanix_partition_space checks if all nodes have a minimum of 5.6 GB space on the /home/nutanix directory before performing an upgrade
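
As a rough illustration of the two conditions above, the sketch below evaluates `df -P` output against the 75% usage threshold and the 5.6 GB free-space minimum quoted in this article. The helper name `evaluate_df` is hypothetical, and treating "5.6 GB" as GiB is an assumption; this is not how NCC or the pre-upgrade check is actually implemented.

```shell
#!/usr/bin/env bash
# Sketch only: thresholds are the values quoted in the article.
USAGE_LIMIT_PCT=75                       # disk_usage_check default threshold
MIN_FREE_KB=$((56 * 1024 * 1024 / 10))   # 5.6 GB (assumed GiB) in 1K blocks

# evaluate_df (hypothetical helper): reads `df -P <path>` output on stdin
# and prints one verdict line for the first filesystem row.
evaluate_df() {
  awk -v limit="$USAGE_LIMIT_PCT" -v min_free="$MIN_FREE_KB" '
    NR == 2 {
      used_pct = $5
      sub(/%/, "", used_pct)
      usage = (used_pct + 0 > limit) ? "HIGH" : "OK"
      free  = ($4 + 0 < min_free)    ? "LOW"  : "OK"
      printf "usage=%s free=%s used=%s%% avail_kb=%s\n", usage, free, used_pct, $4
    }'
}

# Example (run on a CVM): df -P /home/nutanix | evaluate_df
```

The check is read-only: it only parses `df` output and never touches files under /home.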

The following error messages will be generated in Prism by the test_nutanix_partition_space pre-upgrade check:

  • Not enough space on /home/nutanix directory on Controller VM [ip]. Available = x GB : Expected = x GB
  • Failed to calculate minimum space required
  • Failed to get disk usage for cvm [ip], most likely because of failure to ssh into cvm
  • Unexpected output from df on Controller VM [ip]. Please refer to preupgrade.out for further information

Nutanix reserves space on the SSD-tier of each CVM for its infrastructure. These files and directories are located in the /home folder that you see when you log in to a CVM. The size of the /home folder is capped at 40 GB so that the majority of the space on SSD is available for user data.

Due to the limited size of the /home partition, it is possible for it to run low on free space and trigger Prism Alerts, NCC Health Check failures or warnings, or Pre-Upgrade Check failures. These guardrails exist to prevent /home from becoming completely full, as this causes data processing services like Stargate to become unresponsive. Clusters with multiple CVMs having 100% full /home partition will often result in downtime for user VMs.

The Scavenger service running on each CVM is responsible for the automated clean-up of old logs in /home and improvements to its scope were made in AOS 5.5.9, 5.10.1, and later releases. For customers running earlier AOS releases, or in special circumstances, it may be necessary to manually clean up files out of certain directories in order to bring space usage in /home down to a level that will allow future AOS upgrades.

When cleaning up unused binaries and old logs on a CVM, it is important to note that all the user data partitions on each drive associated with a given node are also mounted within /home. This is why we strongly advise against using undocumented commands like “rm -rf /home”, since this will also wipe the user data directories mounted within this path. The purpose of this article is to guide you through identifying the files that are causing the CVM to run low on free space and removing only those which can be safely deleted.

Solution

WARNING: DO NOT treat the Nutanix CVM (Controller VM) as a normal Linux machine. DO NOT perform "rm -rf /home" on any of the CVMs. It could lead to data loss scenarios. Contact Nutanix Support in case you have any doubts.

Step 1: Parsing the space usage for "/home".

Log in to a CVM, download KB-1540_clean.sh to the /home/nutanix/tmp directory, make it executable, and run it.

KB-1540_clean.sh runs some checks (MD5, compatibility, etc.) and then deploys the nutanix_home_clean.sh script accordingly.

nutanix@cvm:~$ cd ~/tmp
nutanix@cvm:~/tmp$ wget http://download.nutanix.com/kbattachments/1540/KB-1540_clean.sh
nutanix@cvm:~/tmp$ chmod +x KB-1540_clean.sh
nutanix@cvm:~/tmp$ ./KB-1540_clean.sh

You can select to deploy the script to the local CVM or all CVMs.

======== Select package to deploy
1 : Deploy the tool only to the local CVM
2 : Deploy the tool to all of the CVMs in the cluster
Selection (Cancel="c"):

Run the script to get a clear distribution of partition space usage in /home.

nutanix@cvm:~/tmp$ ./nutanix_home_clean.sh


Step 2: Check for files that can be deleted from within the list of approved directories.

PLEASE READ: The following are the ONLY directories within which it is safe to remove files. Take note of the specific guidance for removing files from each directory. Do not use any other commands or scripts to remove files. Do not use "rm -rf" under any circumstances.

  1. Removing Old Logs and Core Files

    Before removing old logs, check to see if you have any open cases with pending RCAs (Root Cause Analysis). The existing logs might be necessary for resolving those cases and you should check with the owner from Nutanix Support before cleaning up /home.

    Only delete the files inside these directories. Do not delete the directories themselves.

    • /home/nutanix/data/cores/
    • /home/nutanix/data/binary_logs/
    • /home/nutanix/data/ncc/installer/
    • /home/nutanix/data/log_collector/

    Use this syntax for deleting files within each of these directories:

    nutanix@cvm:~$ rm /home/nutanix/data/cores/*

  2. Removing Old ISOs and Software Binaries

    Begin by confirming the version of AOS that is currently installed on your cluster by running the command below. Make sure never to remove any files that are associated with your current AOS version. You will find this under the "Cluster Version" field in the output of the command.

    nutanix@cvm:~$ ncli cluster info

    Example output:

    Cluster Name : Athena
    Cluster Version : 5.10.2

    Only delete the files inside these directories. Do not delete the directories themselves.

    • /home/nutanix/software_uncompressed/ - Delete any old versions other than the version you are currently upgrading to. The software_uncompressed folder is only in use while the pre-upgrade is running and should be removed after a successful upgrade. If the cluster is running and not currently upgrading, it is safe to remove everything underneath software_uncompressed.
    • /home/nutanix/foundation/isos/ - Old ISOs of hypervisors or Phoenix.
    • /home/nutanix/foundation/tmp/ - Temporary files that can be deleted.

    Use this syntax for deleting files within each of these directories:

    nutanix@cvm:~$ rm /home/nutanix/foundation/isos/*

    If you see large files in the software_downloads directory that are not needed for any planned upgrades, do not remove those from the command line. Instead, use the Prism Upgrade Software UI to remove them. For example, the AOS tab may list multiple versions of AOS that consume around 5 GB each; click the 'X' next to each one to delete its files. Then check each of the other tabs, including File Server, Hypervisor, NCC, and Foundation, for further downloads you may not require.

    Also check whether Enable Automatic Download is selected on the AOS tab. Left unmonitored, the cluster will download multiple versions, consuming more space in the home directory.
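
Before deleting anything, it can help to see which files in the approved directories are actually large. The sketch below is a read-only listing, assuming GNU `find` and `du` are available on the CVM; the helper name `list_candidates` and the `APPROVED_DIRS` array are hypothetical. It deliberately removes nothing, and it leaves out software_uncompressed because that directory has its own version-specific guidance above.

```shell
#!/usr/bin/env bash
# Directories the article lists as safe to clean (files only, never the
# directories themselves). software_uncompressed is intentionally omitted.
APPROVED_DIRS=(
  /home/nutanix/data/cores
  /home/nutanix/data/binary_logs
  /home/nutanix/data/ncc/installer
  /home/nutanix/data/log_collector
  /home/nutanix/foundation/isos
  /home/nutanix/foundation/tmp
)

# list_candidates (hypothetical helper): prints "size_in_KB<TAB>path" for each
# regular file directly inside the given directories, largest first.
# It only reads metadata; nothing is deleted.
list_candidates() {
  local dir
  for dir in "$@"; do
    [ -d "$dir" ] || continue            # skip directories that do not exist
    find "$dir" -maxdepth 1 -type f -exec du -k {} +
  done | sort -rn
}

# Example (run on a CVM): list_candidates "${APPROVED_DIRS[@]}"
```

Review the listing against the guidance for each directory, then delete with the `rm <directory>/*` syntax shown above rather than scripting the removal.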


Step 3: Check space usage in /home to see that it is now below 70%.

You can use the "df -h" command to check on the amount of free space in /home. To accommodate a potential AOS upgrade, usage should ideally be below 70%.

nutanix@cvm:~$ allssh "df -h /home"

Example output:

================== x.x.x.x =================
/dev/md2 40G 8.4G 31G 22% /home
================== x.x.x.x =================
/dev/md2 40G 8.5G 31G 22% /home
================== x.x.x.x =================
/dev/md2 40G 19G 21G 49% /home
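
On larger clusters, scanning this output by eye gets tedious. The following sketch filters it down to the CVMs whose /home usage is at or above a limit (70% by default, per Step 3). It assumes the output has the shape shown in the example above, with a `====== <ip> ======` banner before each CVM's `df` row; the helper name `flag_full_cvms` is hypothetical.

```shell
#!/usr/bin/env bash
# flag_full_cvms (hypothetical helper): reads `allssh "df -h /home"` output on
# stdin and prints "<ip> <use%>" for every CVM at or above the given limit.
flag_full_cvms() {
  awk -v limit="${1:-70}" '
    /^=+ / { host = $2; next }           # banner line: remember the CVM address
    $NF == "/home" {                     # df row for the /home mount
      pct = $(NF - 1)
      sub(/%/, "", pct)
      if (pct + 0 >= limit) print host, pct "%"
    }'
}

# Example (run on a CVM): allssh "df -h /home" | flag_full_cvms 70
```

An empty result means every CVM is below the limit.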


Cleaned up files from the approved directories but still see high usage in /home?

Contact Nutanix Support and submit the script log bundle (/tmp/home_kb1540_<cvm_name>_<timestamp>.tar.gz). One of our Systems Reliability Engineers (SREs) will promptly assist you with identifying the source of and solution to the problem at hand. Under no circumstances should you remove files from any other directories aside from those found here as these may be critical to the CVM infrastructure or may contain user data.

 

 


  • KB article

https://portal.nutanix.com/#/page/kbs/details?targetId=kA0600000008dpDCAQ

 


 


  • /home space usage analysis tool

http://download.nutanix.com/kbattachments/1540/KB-1540_clean.sh


 

- END -