Disk file operations I/O

September 5th, 2013


22-38-37-706_950x510

Database is getting high waits on buffer busy waits. Here is an example period where 5 sessions are completely blocked on buffer busy waits for 4 minutes:

  select
       to_char(min(sample_time),'HH24:MI') maxst,
       to_char(max(sample_time),'HH24:MI') minst,
       count(*),
       session_id,
       ash.p1,
       ash.p2,
       ash.SQL_ID,
       blocking_session bsid
  from DBA_HIST_ACTIVE_SESS_HISTORY ash
  where event='buffer busy waits'
     and session_state='WAITING'
group by
       session_id
       ,sql_id
       ,blocking_session
       ,ash.p1
       ,ash.p2
Order by  1
/

gives

MAXST MINST   COUNT(*) SESSION_ID         P1         P2 SQL_ID           BSID
----- ----- ---------- ---------- ---------- ---------- ------------- -------
21:54 21:58         26       1526       1812     278732 3gbsbw6w8jdb3      11
21:54 21:58         25        528       1812     278732 3gbsbw6w8jdb3      11
21:54 21:58         25       1514       1812     278732 3gbsbw6w8jdb3      11
21:54 21:58         26        777       1812     278732 3gbsbw6w8jdb3      11
21:54 21:58         25         33       1812     278732 3gbsbw6w8jdb3      11

All are waiting on the same file and block held by one session and all are executing the same statement which is a select for update

The blocker is executing the same SQL statement and spends those 4 minutes waiting for “Disk file operations I/O”:

select
       min(to_char(sample_time,'HH24:MI')) minst,
       max(to_char(sample_time,'HH24:MI')) maxst,
       session_id,
       substr(event,0,30) event,
       count(*),
       ash.SQL_ID,
       blocking_session bsid
from DBA_HIST_ACTIVE_SESS_HISTORY ash
where
       session_id in  ( 11)
group by
     event,sql_id,session_id, blocking_session
order by 1
/

gives

MINST MAXST EVENT                            COUNT(*) SQL_ID           BSID
----- ----- ------------------------------ ---------- ------------- -------
21:54 21:58 Disk file operations I/O               26 3gbsbw6w8jdb3

What are “Disk file operations I/O”?

From the docs , http://docs.oracle.com/cd/E18283_01/server.112/e17110/waitevents003.htm#insertedID40

Disk file operations I/O

This event is used to wait for disk file operations (for example, open, close, seek, and resize). It is also used for miscellaneous I/O operations such as block dumps and password file accesses.

FileOperation Type of file operation
fileno File identification number
filetype Type of file (for example, log file, data file, and so on)
What kind of FileOperations and filetype are occuring:
select p1,p3, count(*) from
dba_hist_active_sess_history
where event ='Disk file operations I/O'
group by p1,p3
/
        P1         P3   COUNT(*)
---------- ---------- ----------
         2          1        193
         2          3         14
         4          4          1
         2          2       4459
         3          4        160
         1         18        103
So mainly FileOperation type 2 and filetype 2.
What are the file types? Not sure, but thanks to Andy Klock on Oracle-L  this looks like a possibility:
select distinct filetype_id, filetype_name from DBA_HIST_IOSTAT_FILETYPE order by 1;

FILETYPE_ID FILETYPE_NAME
----------- ------------------------------
0 Other
1 Control File
2 Data File
3 Log File
4 Archive Log
6 Temp File
9 Data File Backup
10 Data File Incremental Backup
11 Archive Log Backup
12 Data File Copy
17 Flashback Log
18 Data Pump Dump File

What about FileOperation=2? After a call to Oracle support, it looks like on this version, 11.2.0.3 the values are:

    1 file creation
    2 file open
    3 file resize
    4 file deletion
    5 file close
    6 wait for all aio requests to finish
    7 write verification
    8 wait for miscellaneous io (ftp, block dump, passwd file)
    9 read from snapshot files

putting this together gives a script like:

ol file_type for a20
col file_operation for a20
select
    decode(p3,0 ,'Other',
              1 ,'Control File',
              2 ,'Data File',
              3 ,'Log File',
              4 ,'Archive Log',
              6 ,'Temp File',
              9 ,'Data File Backup',
              10,'Data File Incremental Backup',
              11,'Archive Log Backup',
              12,'Data File Copy',
              17,'Flashback Log',
              18,'Data Pump Dump File',
                  'unknown '||p1)  file_type,
    decode(p1,1 ,'file creation',
              2 ,'file open',
              3 ,'file resize',
              4 ,'file deletion',
              5 ,'file close',
              6 ,'wait for all aio requests to finish',
              7 ,'write verification',
              8 ,'wait for miscellaneous io (ftp, block dump, passwd file)',
              9 ,'read from snapshot files',
                 'unknown '||p3) file_operation,
    decode(p3,2,-1,p2) file#,
    count(*)
from dba_hist_active_sess_history
where event ='Disk file operations I/O'
group by p1,p3,
    decode(p3,2,-1,p2)
/

with output like

FILE_TYPE            FILE_OPERATION            FILE#   COUNT(*)
-------------------- -------------------- ---------- ----------
Control File         file open                     0        193
Data File            file open                    -1       4460
Archive Log          file deletion                 0          1
Log File             file open                     0         14
Data Pump Dump File  file creation                 0        103
Archive Log          file resize                   8        160

The “-1″ for datafiles is to group all the datafiles in one line, otherwise in the above case there were over 200 lines of output
For the datafiles what are the I/O latencies looking like compared to the Disk file operations I/O ?

BHOU EVENT_NAME                        AVG_MS           CT
---- ------------------------------ --------- ------------
1054 Disk file operations I/O            2.00       13,547
1130 Disk file operations I/O            1.52       10,658
1200 Disk file operations I/O            1.57        9,846
1230 Disk file operations I/O            2.45        8,704
1300 Disk file operations I/O            3.84        9,526
1330 Disk file operations I/O            2.39       11,989
1400 Disk file operations I/O            1.68       14,698
1430 Disk file operations I/O            2.89       14,863
1500 Disk file operations I/O          860.85       10,577
1530 Disk file operations I/O           12.97       11,783
1600 Disk file operations I/O          623.88       10,902
1630 Disk file operations I/O          357.75       12,428
1700 Disk file operations I/O          294.84       10,543
1730 Disk file operations I/O           12.97       10,623
1800 Disk file operations I/O          461.91       14,443
1830 Disk file operations I/O           12.83       18,504
1900 Disk file operations I/O          443.37        9,563
1930 Disk file operations I/O          237.39       11,737
2000 Disk file operations I/O          542.44       13,027
2033 Disk file operations I/O            6.11        8,389
2100 Disk file operations I/O           16.85       10,561
2130 Disk file operations I/O          306.17        9,873
2200 Disk file operations I/O           20.83       11,335
2230 Disk file operations I/O           12.92       10,158
2300 Disk file operations I/O           13.42       11,025
2330 Disk file operations I/O           15.01       10,883
0000 Disk file operations I/O            5.33        8,533
1054 db file scattered read              1.50       92,394
1130 db file scattered read              1.33       73,243
1200 db file scattered read              1.82      122,988
1230 db file scattered read              2.53      255,474
1300 db file scattered read              4.26      288,144
1330 db file scattered read              2.47      308,045
1400 db file scattered read              2.60       91,684
1430 db file scattered read              3.56      176,324
1500 db file scattered read              4.95      621,658
1530 db file scattered read              5.11      227,565
1600 db file scattered read              5.86      472,804
1630 db file scattered read              9.44      224,984
1700 db file scattered read              9.40      165,238
1730 db file scattered read              7.78      349,003
1800 db file scattered read              6.93      252,761
1830 db file scattered read              7.79      151,760
1900 db file scattered read              5.48      165,369
1930 db file scattered read              3.09      200,868
2000 db file scattered read              3.45      136,647
2033 db file scattered read              5.17      136,330
2100 db file scattered read             11.16      103,799
2130 db file scattered read             10.44      118,025
2200 db file scattered read             20.02      127,638
2230 db file scattered read             13.66      157,210
2300 db file scattered read             10.95       98,493
2330 db file scattered read              8.39      149,606
0000 db file scattered read              4.16      230,075
1054 db file sequential read             3.04    1,152,102
1130 db file sequential read             7.75      165,262
1200 db file sequential read             6.74       23,876
1230 db file sequential read             5.30       10,026
1300 db file sequential read             3.34      496,681
1330 db file sequential read             1.58    1,253,208
1400 db file sequential read             8.86      239,247
1430 db file sequential read            12.91      191,376
1500 db file sequential read            19.97       73,061
1530 db file sequential read            17.80       43,662
1600 db file sequential read            12.41      144,741
1630 db file sequential read             8.99      411,254
1700 db file sequential read             8.03      540,138
1730 db file sequential read             9.26      422,317
1800 db file sequential read            19.16      155,787
1830 db file sequential read             6.01      641,517
1900 db file sequential read             4.79      573,674
1930 db file sequential read             2.72      824,991
2000 db file sequential read             1.59      504,650
2033 db file sequential read             1.88      324,741
2100 db file sequential read            24.32       74,026
2130 db file sequential read            16.05       67,545
2200 db file sequential read            15.52      219,928
2230 db file sequential read             9.87      259,956
2300 db file sequential read            15.18      122,362
2330 db file sequential read             9.97       94,124
0000 db file sequential read            14.19       50,264

Tough to see what is going on
Let’s lay it out in a bubble chart. Latency in ms is on the Y-Axis, time on the X-Axis (AWR snapshots every half hour) and size of the ball is the amount of requests:

The Disk file operations I/O doesn’t seem to correlate with disk activity nor latency. When disk activity is high, on the left, with much data coming off spindle (ie 8ms average) Disk file operations I/O is fast, ie 2ms. But at 20:00,  there is medium I/O activity but much of it is coming from cache, ie 2ms average and the Disk file operations are slow.

The database is not using ASM. The number of datafiles is 900. Open file descriptors is set at 8192.

Might be time to run some DTrace on the file operations to get a better idea of what is going on.

.
UPDATE:

Businessman Touching Domino Pieces Arranged in a LineFull disclosure: this database was running over NFS on Delphix OS which supports DTrace. I ran DTrace on all the NFS operations and none took over 2 seconds, which is far from the 4 minutes reported by the database.

Ended writing a script to run pstack on sessions that were waiting for Disk file operation I/O for more than 10 seconds.  When the pstack was taken the first time, there is no guarentee the process was still on the same wait, but if a second pstack was taken for the same process, then it guarenteed that the first pstack was during the wait.  All such waiters that had 2 pstacks  were waiting  in the first pstack for

_close_sys()

So, although Oracle looks to be reporting slow file open operations, the pstacks are showing problems on close file.

Googling for on HP ( the database was running on HP), I found a couple of links, but nothing definitive

http://www.unix.com/hp-ux/30396-socket-close-hangs-cpu-go-100-a.html
http://h30499.www3.hp.com/t5/Networking/CPU-usage-is-100-when-close-sys-call-is-made/td-p/4609196

Oracle reports a few bugs with the wait event “Disk file operation I/O” but none see applicable to this install

Bug 10361651 – File open may hang/spin forever if dNFS is unresponsive [ID 10361651.8]

This bug is suppose to be fixed on 11.2.0.3 and applies to dNFS, though the issues sounds like it would fit – if an I/O operation takes “long” then Oracle’s wait times get corrupted and the process waits much much too long on the event.

 


Uncategorized

  1. Trackbacks

  1. Comments

  2. | #1

    Great work, thanks for sharing

  3. swapan chakrabarty
    | #2

    just wondering, the last output you presented, the average time it took to complete a particular operation, can you publish that query? that would seem like useful info to track over time.


1 × three =