SMARTCTL

◆ smartctl is a tool for checking hard disks that support S.M.A.R.T. (Self-Monitoring, Analysis, and Reporting Technology).

◆ Because it relies on S.M.A.R.T., it cannot be used with hard disks that lack this feature.





Usage

◆ smartctl [options] [device]

- smartctl -a /dev/sda

- smartctl -a -d cciss,0 /dev/cciss/c0d0 <HP Disk>

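◆ Options

-i : show identity information for the device

-g : get a device setting

-a : show all SMART information for the device

-x : show all information for the device

--scan : scan for devices

--scan-open : scan for devices and try to open each one

-r : report transactions (debug output)

-s : enable / disable SMART on the device (on/off)

-o : enable / disable automatic offline testing (on/off)

-S : enable / disable attribute autosave (on/off)

-H : show the device's SMART health status

-A : show the device's vendor-specific SMART attributes and values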
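
As a small illustration of the -H option, the sketch below pulls the verdict word out of the health line. It assumes ATA-style output like the samples in this post ("SMART overall-health self-assessment test result: PASSED"); health_verdict is a hypothetical helper name, not part of smartmontools.

```shell
# Sketch: extract the verdict from `smartctl -H` output read on stdin.
# health_verdict is a hypothetical helper name, not part of smartmontools.
health_verdict() {
    # Split each line on ": " and print what follows the overall-health label.
    awk -F': *' '/overall-health self-assessment test result/ { print $2 }'
}

# Usage on a real system (requires root and smartmontools):
#   smartctl -H /dev/sda | health_verdict
```

Reading from stdin keeps the helper independent of how smartctl is invoked, so the same filter works on saved output.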


[root@Home ~]# smartctl --all /dev/sda

smartctl version 5.38 [x86_64-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Second Generation Serial ATA family
Device Model:     WDC WD5000AAKS-00A7B2
Serial Number:    WD-WCASY7521049
Firmware Version: 01.03B01
User Capacity:    500,107,862,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Wed Jun  1 19:55:45 2011 KST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (11160) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 131) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x303f) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   163   162   021    Pre-fail  Always       -       4825
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       259
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   085   085   000    Old_age   Always       -       11533
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       254
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       237
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       259
194 Temperature_Celsius     0x0022   109   086   000    Old_age   Always       -       38
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


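For the items highlighted in red in the original post, a healthy hard disk must report a raw value of 0; the higher the number, the worse the disk's condition.

Raw_Read_Error_Rate
Errors that occurred while reading data from the disk surface (usually caused by physical shock)
Spin_Up_Time
Average time for the platters to spin up from 0 rpm to full rpm
Start_Stop_Count
Number of times the platters have started and stopped spinning
Reallocated_Sector_Ct
Number of faulty sectors that were replaced with sectors from the spare area
Seek_Error_Rate
Rate of seek errors
Power_On_Hours
Total time the drive has been powered on
Spin_Retry_Count
Number of retries needed to reach full rpm (a healthy drive succeeds on the first attempt)
Power_Cycle_Count
Number of power on/off cycles
Power-Off_Retract_Count
Number of times the heads were retracted from the platters on power-off (simply put, moved to the parking position)
Load_Cycle_Count
Number of times the heads were loaded onto the platters
Temperature_Celsius
Drive temperature
Reallocated_Event_Count
Number of remap operations, i.e. attempts to move data from a faulty sector to the spare area
Hardware_ECC_Recovered
Number of errors detected and recovered by ECC
Current_Pending_Sector
Unstable sectors that had read problems and are waiting to be remapped to the spare area (near-bad sectors)
Offline_Uncorrectable
Sectors with uncorrectable read/write errors, meaning the disk surface is damaged (in short, bad sectors)
UDMA_CRC_Error_Count
Number of CRC errors that occurred during data transfer over the disk interface
Multi_Zone_Error_Rate
Number of errors detected while writing sectors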
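
The "raw value should be 0" rule can be checked with a small filter over the attribute table, sketched below. Which attributes to watch is an assumption made for this sketch (IDs 5, 197, 198 and 199); nonzero_attrs is a hypothetical helper name, and the column layout is assumed to match the ten-column table shown in this post.

```shell
# Sketch: read `smartctl -A` output on stdin and print "NAME RAW_VALUE"
# for each watched attribute whose raw value is not zero.
# Assumption: the watched IDs (5, 197, 198, 199) are chosen for illustration.
nonzero_attrs() {
    # $1 = ID, $2 = attribute name, $10 = first token of RAW_VALUE.
    awk '$1 ~ /^(5|197|198|199)$/ && NF >= 10 && $10 + 0 != 0 { print $2, $10 }'
}

# Usage on a real system (requires root and smartmontools):
#   smartctl -A /dev/sda | nonzero_attrs
```

No output means none of the watched attributes has a non-zero raw value.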





Example

Consider a Linux system that logs the error messages below. We analyze this system with smartctl.

ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata1.00: irq_stat 0x40000001
ata1.00: cmd 06/01:01:00:00:00/00:00:00:00:00/a0 tag 0 dma 69632 out
         res 51/04:01:00:00:00/00:00:00:00:00/a0 Emask 0x1 (device error)
ata1.00: status: { DRDY ERR }
ata1.00: error: { ABRT }
ata1.00: device reported invalid CHS sector 0
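end_request: I/O error, dev sda, sector 533248

The log reports "DRDY ERR": in the disk's status register the DRDY (drive ready) bit is set together with the ERR (error) bit, and the error register shows ABRT, meaning the device aborted the command.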

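To find which disk to hand to smartctl, the device and sector can be pulled from the "I/O error" line of the kernel log. This is a minimal sketch that assumes the message format shown above; io_error_sectors is a hypothetical helper name.

```shell
# Sketch: print "<device> <sector>" for each kernel I/O error line on stdin.
# io_error_sectors is a hypothetical helper name.
io_error_sectors() {
    # Capture the device name after "dev " and the number after "sector ".
    sed -n 's|.*I/O error, dev \([^,]*\), sector \([0-9][0-9]*\).*|\1 \2|p'
}

# Usage on a real system:
#   dmesg | io_error_sectors
```
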
The smartctl output for /dev/sda is shown below. In the original post the problem rows (Offline_Uncorrectable and UDMA_CRC_Error_Count) are highlighted in red.


# sudo smartctl --all /dev/sda
smartctl 5.40 2010-03-16 r3077 [x86_64-unknown-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     SSD128GNOB-HSM1
Serial Number:    <censored>
Firmware Version: 1571
User Capacity:    128.035.676.160 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Fri Jan  7 23:14:19 2011 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (   0) seconds.
Offline data collection
capabilities:                    (0x1d) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Abort Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x00) Error logging NOT supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   0) minutes.
Extended self-test routine
recommended polling time:        (   0) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0000   007   000   000    Old_age   Offline      -       0
  9 Power_On_Hours          0x0000   202   001   000    Old_age   Offline      -       0
 12 Power_Cycle_Count       0x0000   169   000   000    Old_age   Offline      -       0
184 End-to-End_Error        0x0000   018   000   000    Old_age   Offline      -       0
195 Hardware_ECC_Recovered  0x0000   000   000   000    Old_age   Offline      -       0
196 Reallocated_Event_Count 0x0000   000   000   000    Old_age   Offline      -       0
197 Current_Pending_Sector  0x0000   000   000   000    Old_age   Offline      -       0
198 Offline_Uncorrectable   0x0000   131   213   000    Old_age   Offline      -       38503
199 UDMA_CRC_Error_Count    0x0000   021   007   000    Old_age   Offline      -       39935
200 Multi_Zone_Error_Rate   0x0000   016   197   000    Old_age   Offline      -       401
201 Soft_Read_Error_Rate    0x0000   197   047   000    Old_age   Offline      -       173
202 Data_Address_Mark_Errs  0x0000   164   115   000    Old_age   Offline      -       2
203 Run_Out_Cancel          0x0000   030   103   000    Old_age   Offline      -       2
204 Soft_ECC_Correction     0x0000   000   000   000    Old_age   Offline      -       0
205 Thermal_Asperity_Rate   0x0000   160   134   000    Old_age   Offline      -       1
206 Flying_Height           0x0000   001   000   000    Old_age   Offline      -       0
207 Spin_High_Current       0x0000   219   006   000    Old_age   Offline      -       0
208 Spin_Buzz               0x0000   067   000   000    Old_age   Offline      -       0
209 Offline_Seek_Performnce 0x0000   100   000   000    Old_age   Offline      -       0
210 Unknown_Attribute       0x0000   238   000   000    Old_age   Offline      -       0
211 Unknown_Attribute       0x0000   000   000   000    Old_age   Offline      -       0

Warning: device does not support Error Logging
Warning! SMART ATA Error Log Structure error: invalid SMART checksum.
SMART Error Log Version: 1
No Errors Logged

Warning! SMART Self-Test Log Structure error: invalid SMART checksum.
SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


Device does not support Selective Self Tests/Logging

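According to the smartctl output, Offline_Uncorrectable reports 38503 bad sectors, and the UDMA_CRC_Error_Count of 39935 shows that errors are also occurring on the cable.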