COAWST Performance Analysis and Optimization v1

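System background

A large-scale COAWST (WRF, ROMS, SWAN) coupled case running on a cluster.

Problems

  • Each component program has too many build options, and they are complex to configure
  • When an I/O feature does not work, there is no easy way to tell whether it was never compiled in (an environment variable was not set correctly) or was compiled in but never enabled (a runtime parameter was not set)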

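I/O-related parameters

WRF
# I/O format: 2, 11, 13, 102 ...
io_form_history     
io_form_restart      

# Quilt I/O server
nio_tasks_per_group 
nio_groups

Build settings (environment variables):

PNETCDF_QUILT  # enables the PnetCDF-based Quilt I/O server implementation

# PNETCDF_QUILT is worth enabling, provided the PnetCDF toolchain is complete and stable
# Use it together with nio_tasks_per_group = n
# The Quilt server also works without it (set nio_tasks_per_group and nio_groups)

ROMS
PIO-related

Build settings: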
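Analysis

From the output files

The size and number of output files give a first indication of where the bottleneck is.

# number of files
find . -maxdepth 1 -type f -name 'ocean_his_*' | wc -l

# total size
du -ch ocean_his_* | tail -1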

The three programs produce the following files:

SWAN: swaninit*

ROMS: ocean_his_* ocean_avg_*

WRF: wrfout_*

Most of the output volume comes from ROMS and WRF.
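
The same count and size check can be scripted so that all output families are compared in one pass. The following is a minimal sketch, assuming the run directory is the current directory and using the file patterns listed above. The function name `summarize_outputs` is made up for this example. It sums apparent file sizes, so the totals can differ slightly from what `du` reports.

```python
from pathlib import Path


def summarize_outputs(run_dir, patterns):
    """Return {pattern: (file_count, total_bytes)} for files directly in run_dir."""
    summary = {}
    for pattern in patterns:
        # Path.glob is not recursive here, which mirrors `find -maxdepth 1`.
        files = [p for p in Path(run_dir).glob(pattern) if p.is_file()]
        summary[pattern] = (len(files), sum(p.stat().st_size for p in files))
    return summary


if __name__ == "__main__":
    # "." is an assumed run directory; the patterns come from the list above.
    patterns = ["ocean_his_*", "ocean_avg_*", "wrfout_*"]
    for pattern, (count, size) in summarize_outputs(".", patterns).items():
        print(f"{pattern:<14} {count:6d} files {size / 2**30:10.2f} GiB")
```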

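Log analysis

  • The Slurm log provides a lot of information
  • Darshan can also be used to evaluate parallel I/O

Basic information

MPI ranks (called nodes in the log) allocated to each program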
Model Coupling: 

       Ocean Model MPI nodes: 00000 - 00255

       Waves Model MPI nodes: 00256 - 00511

       Atmos Model MPI nodes: 00512 - 01535
ROMS grid tiling
Resolution, Grid 01: 970x1198x50,  Parallel Nodes: 256,  Tiling: 16x16
WRF grid decomposition
Ntasks in X           31 , ntasks in Y           32

The number of ranks assigned to I/O then follows as: WRF ranks - NtasksX*NtasksY
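
For the log lines quoted above, the arithmetic is 1024 WRF ranks (00512 to 01535) minus 31 x 32 = 992 compute ranks, which leaves 32 ranks for I/O. A small sketch of the same calculation follows. The function name `wrf_io_ranks` is hypothetical, and the regular expressions only assume the two log line formats shown above.

```python
import re


def wrf_io_ranks(log_text):
    """Return (wrf_ranks, compute_ranks, io_ranks) parsed from the log text."""
    first, last = map(int, re.search(
        r"Atmos Model MPI nodes:\s*(\d+)\s*-\s*(\d+)", log_text).groups())
    nx, ny = map(int, re.search(
        r"Ntasks in X\s+(\d+)\s*,\s*ntasks in Y\s+(\d+)", log_text).groups())
    wrf_ranks = last - first + 1   # the rank range is inclusive
    compute_ranks = nx * ny        # ranks that hold a patch of the WRF grid
    return wrf_ranks, compute_ranks, wrf_ranks - compute_ranks


sample = """
       Atmos Model MPI nodes: 00512 - 01535
Ntasks in X           31 , ntasks in Y           32
"""
print(wrf_io_ranks(sample))  # (1024, 992, 32)
```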

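Timing metrics

SWAN takes the least time of the three, so it is not examined here.

WRF
  • Timing for Writing (time spent writing an output file)
  • Timing for main (time spent on one compute step)
Timing for main: time 2026-04-03_23:50:00 on domain   1:    3.69922 elapsed seconds
Timing for Writing wrfout_d01_2026-04-04_00:00:00 for domain        1:    8.47534 elapsed seconds

ROMS

Program run time ≈ Total / number of processes allocated to ROMS (the Average value in the log)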

Average:               578.752
 Minimum:               577.427
 Maximum:               578.860

 Nonlinear model elapsed CPU time profile, Grid: 01

  Allocation and array initialization ..............        20.930  ( 0.0188 %)
  Ocean state initialization .......................        51.577  ( 0.0464 %)
  Reading of input data ............................       400.877  ( 0.3608 %)
  Processing of input data .........................        29.871  ( 0.0269 %)
  Processing of output time averaged data ..........        67.112  ( 0.0604 %)
  Computation of vertical boundary conditions ......        21.002  ( 0.0189 %)
  Computation of global information integrals ......        17.770  ( 0.0160 %)
  Writing of output data ...........................     14673.920  (13.2054 %)
  Model 2D kernel ..................................      5109.128  ( 4.5978 %)
  2D/3D coupling, vertical metrics .................       405.235  ( 0.3647 %)
  Omega vertical velocity ..........................       285.565  ( 0.2570 %)
  Equation of state for seawater ...................       961.461  ( 0.8652 %)
  Atmosphere-Ocean bulk flux parameterization ......        62.334  ( 0.0561 %)
  KPP vertical mixing parameterization .............      2940.584  ( 2.6463 %)
  3D equations right-side terms ....................       856.016  ( 0.7704 %)
  3D equations predictor step ......................      1448.507  ( 1.3035 %)
  Pressure gradient ................................       444.194  ( 0.3997 %)
  Harmonic mixing of tracers, geopotentials ........       816.935  ( 0.7352 %)
  Biharmonic mixing of tracers, geopotentials ......       393.041  ( 0.3537 %)
  Harmonic stress tensor, geopotentials ............      2031.193  ( 1.8279 %)
  Corrector time-step for 3D momentum ..............       921.772  ( 0.8295 %)
  Corrector time-step for tracers ..................       937.017  ( 0.8432 %)
  Unused 07 ........................................     41156.609  (37.0379 %)
                                              Total:     74052.650   66.6419 %

  Unique kernel(s) regions profiled ................     74052.650   66.6419 %
  Residual, non-profiled code ......................     37067.639   33.3581 %


 All percentages are with respect to total time =       111120.289


 MPI communications profile, Grid: 01

  Message Passage: 2D halo exchanges ...............      3098.897  ( 2.7888 %)
  Message Passage: 3D halo exchanges ...............      1388.548  ( 1.2496 %)
  Message Passage: 4D halo exchanges ...............       539.617  ( 0.4856 %)
  Message Passage: data broadcast ..................     13393.653  (12.0533 %)
  Message Passage: data reduction ..................       253.303  ( 0.2280 %)
  Message Passage: data gathering ..................      3194.237  ( 2.8746 %)
  Message Passage: data scattering..................      2440.529  ( 2.1963 %)
  Message Passage: point data gathering ............         0.254  ( 0.0002 %)
  Message Passage: synchronization barrier .........         7.261  ( 0.0065 %)
                                              Total:     24316.300   21.8829 %

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

The entries that deserve attention are:

  • Writing of output data
  • Unused 07 (idle time)
  • MPI data broadcast
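
Picking these entries out by eye gets tedious when several runs are compared. The sketch below pulls every named region and its share of total time out of a ROMS profile block. The name `parse_roms_profile` and the regular expression are assumptions written against the line layout shown above, not part of ROMS.

```python
import re

# name, a run of dots, seconds, then "(percent %)" as printed in the profile.
ENTRY_RE = re.compile(
    r"^\s*(.+?)\s*\.{2,}\s*([0-9.]+)\s*\(\s*([0-9.]+)\s*%\)")


def parse_roms_profile(text):
    """Return {region_name: (seconds, percent)} for each profiled region."""
    regions = {}
    for line in text.splitlines():
        m = ENTRY_RE.match(line)
        if m:
            regions[m.group(1)] = (float(m.group(2)), float(m.group(3)))
    return regions


sample = """
  Writing of output data ...........................     14673.920  (13.2054 %)
  Model 2D kernel ..................................      5109.128  ( 4.5978 %)
  Unused 07 ........................................     41156.609  (37.0379 %)
  Message Passage: data broadcast ..................     13393.653  (12.0533 %)
"""
regions = parse_roms_profile(sample)
# List the regions from the largest share of total time to the smallest.
for name, (seconds, percent) in sorted(
        regions.items(), key=lambda item: item[1][1], reverse=True):
    print(f"{percent:8.4f} %  {seconds:12.3f}  {name}")
```

Summary lines such as `Total:` carry no run of dots before the number, so the pattern skips them.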

Log analysis and optimization ideas

Optimization ideas

  1. Linear scaling
  2. WRF history I/O mode: 102
  3. PnetCDF parallel I/O

A complete run

ROMS:

Computation of global information integrals ......      8438.950  ( 0.0370 %)
  Writing of output data ...........................   2007518.132  ( 8.8010 %)
  Model 2D kernel ..................................   2813099.155  (12.3327 %)

 All percentages are with respect to total time =     22810024.664
 
 MPI communications profile, Grid: 01
  Message Passage: 2D halo exchanges ...............   2572137.150  (11.2763 %)
  Message Passage: data broadcast ..................   1064456.268  ( 4.6666 %)
                                              Total:   5866359.029   25.7183 %
  1. Unused (ROMS idle time) ≈ 60%: ROMS is waiting for WRF results, so the allocation parameters still have room for tuning

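WRF:

(metrics collected with a script)

                      total (s)    avg (s)
Timing for Writing       831.46      20.28
Timing for main        21017.38     525.43

  • Timing for main includes computation, waiting for the I/O processes, and waiting for coupling
  • Here each step waits about 20 s on average (the Timing for Writing average is 20.28 s)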
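
The script used for the table is not shown in this post. The following is a rough sketch of how such totals and averages could be collected from the `Timing for ...` lines of a WRF log. The name `summarize_wrf_timing` is hypothetical, and the pattern only assumes the two line formats quoted in the timing metrics section.

```python
import re
from collections import defaultdict

# Matches the final "<number> elapsed seconds" after the last colon,
# so the colons inside timestamps do not confuse the pattern.
TIMING_RE = re.compile(
    r"Timing for (main|Writing)\b.*:\s*([0-9.]+) elapsed seconds")


def summarize_wrf_timing(lines):
    """Return {kind: (count, total_seconds, average_seconds)}."""
    samples = defaultdict(list)
    for line in lines:
        m = TIMING_RE.search(line)
        if m:
            samples[m.group(1)].append(float(m.group(2)))
    return {kind: (len(v), sum(v), sum(v) / len(v))
            for kind, v in samples.items()}


sample_lines = [
    "Timing for main: time 2026-04-03_23:50:00 on domain   1:    3.69922 elapsed seconds",
    "Timing for Writing wrfout_d01_2026-04-04_00:00:00 for domain        1:    8.47534 elapsed seconds",
]
for kind, (count, total, avg) in summarize_wrf_timing(sample_lines).items():
    print(f"Timing for {kind:<8} n={count} total={total:.2f} avg={avg:.2f}")
```

On a real run the lines would come from the WRF log file instead of `sample_lines`, for example by iterating over an open file handle.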

Designing a quick test with a 30-minute turnaround

Linear scaling

Darshan logs (two runs: nprocs 128, then nprocs 1024)

# darshan log version: 3.41
# compression method: ZLIB
# exe: ./coawstM.backup coupling.in 
# uid: 66756
# jobid: 1471163
# start_time: 1779956254
# start_time_asci: Thu May 28 16:17:34 2026
# end_time: 1779957375
# end_time_asci: Thu May 28 16:36:15 2026
# nprocs: 128
# run time: 1120.9576
# metadata: lib_ver = 3.5.0
# metadata: h = romio_no_indep_rw=true;cb_nodes=4

# log file regions
# -------------------------------------------------------
# header: 1328 bytes (uncompressed)
# job data: 243 bytes (compressed)
# record table: 24007 bytes (compressed)
# POSIX module: 33758 bytes (compressed), ver=4
# LUSTRE module: 6917 bytes (compressed), ver=2
# STDIO module: 773 bytes (compressed), ver=2
# HEATMAP module: 9885 bytes (compressed), ver=1

# mounted file systems (mount point and fs type)
# -------------------------------------------------------
# mount entry:        /proc/sys/fs/binfmt_misc        autofs
# mount entry:        /sys/fs/cgroup/unified        cgroup2
# mount entry:        /sys/kernel/tracing        tracefs
# mount entry:        /sys/kernel/config        configfs
# mount entry:        /sys/fs/bpf        bpf
# mount entry:        /dev/mqueue        mqueue
# mount entry:        /vol8        lustre
# mount entry:        /dev        devtmpfs

# *******************************************************
# POSIX module data
# *******************************************************

# description of POSIX counters:
#   POSIX_*: posix operation counts.
#   READS,WRITES,OPENS,SEEKS,STATS,MMAPS,SYNCS,FILENOS,DUPS are types of operations.
#   POSIX_RENAME_SOURCES/TARGETS: total count file was source or target of a rename operation
#   POSIX_RENAMED_FROM: Darshan record ID of the first rename source, if file was a rename target
#   POSIX_MODE: mode that file was opened in.
#   POSIX_BYTES_*: total bytes read and written.
#   POSIX_MAX_BYTE_*: highest offset byte read and written.
#   POSIX_CONSEC_*: number of exactly adjacent reads and writes.
#   POSIX_SEQ_*: number of reads and writes from increasing offsets.
#   POSIX_RW_SWITCHES: number of times access alternated between read and write.
#   POSIX_*_ALIGNMENT: memory and file alignment.
#   POSIX_*_NOT_ALIGNED: number of reads and writes that were not aligned.
#   POSIX_MAX_*_TIME_SIZE: size of the slowest read and write operations.
#   POSIX_SIZE_*_*: histogram of read and write access sizes.
#   POSIX_STRIDE*_STRIDE: the four most common strides detected.
#   POSIX_STRIDE*_COUNT: count of the four most common strides.
#   POSIX_ACCESS*_ACCESS: the four most common access sizes.
#   POSIX_ACCESS*_COUNT: count of the four most common access sizes.
#   POSIX_*_RANK: rank of the processes that were the fastest and slowest at I/O (for shared files).
#   POSIX_*_RANK_BYTES: bytes transferred by the fastest and slowest ranks (for shared files).
#   POSIX_F_*_START_TIMESTAMP: timestamp of first open/read/write/close.
#   POSIX_F_*_END_TIMESTAMP: timestamp of last open/read/write/close.
#   POSIX_F_READ/WRITE/META_TIME: cumulative time spent in read, write, or metadata operations.
#   POSIX_F_MAX_*_TIME: duration of the slowest read and write operations.
#   POSIX_F_*_RANK_TIME: fastest and slowest I/O time for a single rank (for shared files).
#   POSIX_F_VARIANCE_RANK_*: variance of total I/O time and bytes moved for all ranks (for shared files).

# WARNING: POSIX_OPENS counter includes both POSIX_FILENOS and POSIX_DUPS counts

# WARNING: POSIX counters related to file offsets may be incorrect if a file is simultaneously accessed by both POSIX and STDIO (e.g., using fileno())
#         - Affected counters include: MAX_BYTE_{READ|WRITTEN}, CONSEC_{READS|WRITES}, SEQ_{READS|WRITES}, {MEM|FILE}_NOT_ALIGNED, STRIDE*_STRIDE

total_POSIX_OPENS: 1167
total_POSIX_FILENOS: 13
total_POSIX_DUPS: 0
total_POSIX_READS: 107443
total_POSIX_WRITES: 759720
total_POSIX_SEEKS: 5894
total_POSIX_STATS: 1477
total_POSIX_MMAPS: -1
total_POSIX_FSYNCS: 0
total_POSIX_FDSYNCS: 0
total_POSIX_RENAME_SOURCES: 0
total_POSIX_RENAME_TARGETS: 0
total_POSIX_RENAMED_FROM: 0
total_POSIX_MODE: 0
total_POSIX_BYTES_READ: 4642184152
total_POSIX_BYTES_WRITTEN: 12596970403
total_POSIX_MAX_BYTE_READ: 6624238151
total_POSIX_MAX_BYTE_WRITTEN: 5010776075
total_POSIX_CONSEC_READS: 103785
total_POSIX_CONSEC_WRITES: 753463
total_POSIX_SEQ_READS: 105843
total_POSIX_SEQ_WRITES: 758835
total_POSIX_RW_SWITCHES: 2439
total_POSIX_MEM_NOT_ALIGNED: 0
total_POSIX_MEM_ALIGNMENT: 8
total_POSIX_FILE_NOT_ALIGNED: 862089
total_POSIX_FILE_ALIGNMENT: 1048576
total_POSIX_MAX_READ_TIME_SIZE: 28834614
total_POSIX_MAX_WRITE_TIME_SIZE: 28996093
total_POSIX_SIZE_READ_0_100: 3548
total_POSIX_SIZE_READ_100_1K: 915
total_POSIX_SIZE_READ_1K_10K: 101545
total_POSIX_SIZE_READ_10K_100K: 144
total_POSIX_SIZE_READ_100K_1M: 572
total_POSIX_SIZE_READ_1M_4M: 667
total_POSIX_SIZE_READ_4M_10M: 11
total_POSIX_SIZE_READ_10M_100M: 41
total_POSIX_SIZE_READ_100M_1G: 0
total_POSIX_SIZE_READ_1G_PLUS: 0
total_POSIX_SIZE_WRITE_0_100: 2234
total_POSIX_SIZE_WRITE_100_1K: 2425
total_POSIX_SIZE_WRITE_1K_10K: 751992
total_POSIX_SIZE_WRITE_10K_100K: 969
total_POSIX_SIZE_WRITE_100K_1M: 643
total_POSIX_SIZE_WRITE_1M_4M: 1276
total_POSIX_SIZE_WRITE_4M_10M: 29
total_POSIX_SIZE_WRITE_10M_100M: 152
total_POSIX_SIZE_WRITE_100M_1G: 0
total_POSIX_SIZE_WRITE_1G_PLUS: 0
total_POSIX_STRIDE1_STRIDE: 4194304
total_POSIX_STRIDE2_STRIDE: 4
total_POSIX_STRIDE3_STRIDE: 4096
total_POSIX_STRIDE4_STRIDE: 1
total_POSIX_STRIDE1_COUNT: 1218
total_POSIX_STRIDE2_COUNT: 288
total_POSIX_STRIDE3_COUNT: 124
total_POSIX_STRIDE4_COUNT: 24
total_POSIX_ACCESS1_ACCESS: 4189
total_POSIX_ACCESS2_ACCESS: 4163
total_POSIX_ACCESS3_ACCESS: 4194304
total_POSIX_ACCESS4_ACCESS: 8192
total_POSIX_ACCESS1_COUNT: 619252
total_POSIX_ACCESS2_COUNT: 109913
total_POSIX_ACCESS3_COUNT: 1849
total_POSIX_ACCESS4_COUNT: 99909
total_POSIX_FASTEST_RANK: -1
total_POSIX_FASTEST_RANK_BYTES: -1
total_POSIX_SLOWEST_RANK: -1
total_POSIX_SLOWEST_RANK_BYTES: -1
total_POSIX_F_OPEN_START_TIMESTAMP: 0.011120
total_POSIX_F_READ_START_TIMESTAMP: 0.014384
total_POSIX_F_WRITE_START_TIMESTAMP: 1.191473
total_POSIX_F_CLOSE_START_TIMESTAMP: 0.021221
total_POSIX_F_OPEN_END_TIMESTAMP: 1035.282613
total_POSIX_F_READ_END_TIMESTAMP: 1035.839145
total_POSIX_F_WRITE_END_TIMESTAMP: 1122.177334
total_POSIX_F_CLOSE_END_TIMESTAMP: 1122.201477
total_POSIX_F_READ_TIME: 7.116889
total_POSIX_F_WRITE_TIME: 56.739679
total_POSIX_F_META_TIME: 6.663915
total_POSIX_F_MAX_READ_TIME: 0.119348
total_POSIX_F_MAX_WRITE_TIME: 0.111346
total_POSIX_F_FASTEST_RANK_TIME: 0.000000
total_POSIX_F_SLOWEST_RANK_TIME: 0.000000
total_POSIX_F_VARIANCE_RANK_TIME: 0.000000
total_POSIX_F_VARIANCE_RANK_BYTES: 0.000000

# *******************************************************
# STDIO module data
# *******************************************************

# description of STDIO counters:
#   STDIO_{OPENS|FDOPENS|WRITES|READS|SEEKS|FLUSHES} are types of operations.
#   STDIO_BYTES_*: total bytes read and written.
#   STDIO_MAX_BYTE_*: highest offset byte read and written.
#   STDIO_*_RANK: rank of the processes that were the fastest and slowest at I/O (for shared files).
#   STDIO_*_RANK_BYTES: bytes transferred by the fastest and slowest ranks (for shared files).
#   STDIO_F_*_START_TIMESTAMP: timestamp of the first call to that type of function.
#   STDIO_F_*_END_TIMESTAMP: timestamp of the completion of the last call to that type of function.
#   STDIO_F_*_TIME: cumulative time spent in different types of functions.
#   STDIO_F_*_RANK_TIME: fastest and slowest I/O time for a single rank (for shared files).
#   STDIO_F_VARIANCE_RANK_*: variance of total I/O time and bytes moved for all ranks (for shared files).

# WARNING: STDIO_OPENS counter includes STDIO_FDOPENS count

# WARNING: STDIO counters related to file offsets may be incorrect if a file is simultaneously accessed by both STDIO and POSIX (e.g., using fdopen())
#         - Affected counters include: MAX_BYTE_{READ|WRITTEN}

total_STDIO_OPENS: 19
total_STDIO_FDOPENS: 0
total_STDIO_READS: 16
total_STDIO_WRITES: 0
total_STDIO_SEEKS: 26
total_STDIO_FLUSHES: 0
total_STDIO_BYTES_WRITTEN: 0
total_STDIO_BYTES_READ: 54526051
total_STDIO_MAX_BYTE_READ: 4194303
total_STDIO_MAX_BYTE_WRITTEN: 0
total_STDIO_FASTEST_RANK: -1
total_STDIO_FASTEST_RANK_BYTES: -1
total_STDIO_SLOWEST_RANK: -1
total_STDIO_SLOWEST_RANK_BYTES: -1
total_STDIO_F_META_TIME: 0.025318
total_STDIO_F_WRITE_TIME: 0.000000
total_STDIO_F_READ_TIME: 0.107740
total_STDIO_F_OPEN_START_TIMESTAMP: 2.131395
total_STDIO_F_CLOSE_START_TIMESTAMP: 2.132253
total_STDIO_F_WRITE_START_TIMESTAMP: 0.000000
total_STDIO_F_READ_START_TIMESTAMP: 2.136211
total_STDIO_F_OPEN_END_TIMESTAMP: 1035.230478
total_STDIO_F_CLOSE_END_TIMESTAMP: 1035.235342
total_STDIO_F_WRITE_END_TIMESTAMP: 0.000000
total_STDIO_F_READ_END_TIMESTAMP: 1035.233100
total_STDIO_F_FASTEST_RANK_TIME: 0.000000
total_STDIO_F_SLOWEST_RANK_TIME: 0.000000
total_STDIO_F_VARIANCE_RANK_TIME: 0.000000
total_STDIO_F_VARIANCE_RANK_BYTES: 0.000000

# darshan log version: 3.41
# compression method: ZLIB
# exe: ./coawstM.backup coupling.in 
# uid: 66756
# jobid: 1134026
# start_time: 1779960614
# start_time_asci: Thu May 28 17:30:14 2026
# end_time: 1779961290
# end_time_asci: Thu May 28 17:41:30 2026
# nprocs: 1024
# run time: 675.4815
# metadata: lib_ver = 3.5.0
# metadata: h = romio_no_indep_rw=true;cb_nodes=4

# log file regions
# -------------------------------------------------------
# header: 1328 bytes (uncompressed)
# job data: 243 bytes (compressed)
# record table: 192120 bytes (compressed)
# POSIX module: 252901 bytes (compressed), ver=4
# LUSTRE module: 54221 bytes (compressed), ver=2
# STDIO module: 758 bytes (compressed), ver=2
# HEATMAP module: 68184 bytes (compressed), ver=1

# mounted file systems (mount point and fs type)
# -------------------------------------------------------
# mount entry:        /proc/sys/fs/binfmt_misc        autofs
# mount entry:        /sys/fs/cgroup/unified        cgroup2
# mount entry:        /sys/kernel/tracing        tracefs
# mount entry:        /sys/kernel/config        configfs
# mount entry:        /sys/fs/bpf        bpf
# mount entry:        /dev/mqueue        mqueue
# mount entry:        /vol8        lustre
# mount entry:        /dev        devtmpfs

# *******************************************************
# POSIX module data
# *******************************************************

# description of POSIX counters:
#   POSIX_*: posix operation counts.
#   READS,WRITES,OPENS,SEEKS,STATS,MMAPS,SYNCS,FILENOS,DUPS are types of operations.
#   POSIX_RENAME_SOURCES/TARGETS: total count file was source or target of a rename operation
#   POSIX_RENAMED_FROM: Darshan record ID of the first rename source, if file was a rename target
#   POSIX_MODE: mode that file was opened in.
#   POSIX_BYTES_*: total bytes read and written.
#   POSIX_MAX_BYTE_*: highest offset byte read and written.
#   POSIX_CONSEC_*: number of exactly adjacent reads and writes.
#   POSIX_SEQ_*: number of reads and writes from increasing offsets.
#   POSIX_RW_SWITCHES: number of times access alternated between read and write.
#   POSIX_*_ALIGNMENT: memory and file alignment.
#   POSIX_*_NOT_ALIGNED: number of reads and writes that were not aligned.
#   POSIX_MAX_*_TIME_SIZE: size of the slowest read and write operations.
#   POSIX_SIZE_*_*: histogram of read and write access sizes.
#   POSIX_STRIDE*_STRIDE: the four most common strides detected.
#   POSIX_STRIDE*_COUNT: count of the four most common strides.
#   POSIX_ACCESS*_ACCESS: the four most common access sizes.
#   POSIX_ACCESS*_COUNT: count of the four most common access sizes.
#   POSIX_*_RANK: rank of the processes that were the fastest and slowest at I/O (for shared files).
#   POSIX_*_RANK_BYTES: bytes transferred by the fastest and slowest ranks (for shared files).
#   POSIX_F_*_START_TIMESTAMP: timestamp of first open/read/write/close.
#   POSIX_F_*_END_TIMESTAMP: timestamp of last open/read/write/close.
#   POSIX_F_READ/WRITE/META_TIME: cumulative time spent in read, write, or metadata operations.
#   POSIX_F_MAX_*_TIME: duration of the slowest read and write operations.
#   POSIX_F_*_RANK_TIME: fastest and slowest I/O time for a single rank (for shared files).
#   POSIX_F_VARIANCE_RANK_*: variance of total I/O time and bytes moved for all ranks (for shared files).

# WARNING: POSIX_OPENS counter includes both POSIX_FILENOS and POSIX_DUPS counts

# WARNING: POSIX counters related to file offsets may be incorrect if a file is simultaneously accessed by both POSIX and STDIO (e.g., using fileno())
#         - Affected counters include: MAX_BYTE_{READ|WRITTEN}, CONSEC_{READS|WRITES}, SEQ_{READS|WRITES}, {MEM|FILE}_NOT_ALIGNED, STRIDE*_STRIDE

total_POSIX_OPENS: 8943
total_POSIX_FILENOS: 13
total_POSIX_DUPS: 0
total_POSIX_READS: 830835
total_POSIX_WRITES: 899851
total_POSIX_SEEKS: 5606
total_POSIX_STATS: 11461
total_POSIX_MMAPS: -1
total_POSIX_FSYNCS: 0
total_POSIX_FDSYNCS: 0
total_POSIX_RENAME_SOURCES: 0
total_POSIX_RENAME_TARGETS: 0
total_POSIX_RENAMED_FROM: 0
total_POSIX_MODE: 0
total_POSIX_BYTES_READ: 10420613385
total_POSIX_BYTES_WRITTEN: 13183610642
total_POSIX_MAX_BYTE_READ: 6624238151
total_POSIX_MAX_BYTE_WRITTEN: 5010776075
total_POSIX_CONSEC_READS: 820249
total_POSIX_CONSEC_WRITES: 893034
total_POSIX_SEQ_READS: 822019
total_POSIX_SEQ_WRITES: 898406
total_POSIX_RW_SWITCHES: 2439
total_POSIX_MEM_NOT_ALIGNED: 0
total_POSIX_MEM_ALIGNMENT: 8
total_POSIX_FILE_NOT_ALIGNED: 1712797
total_POSIX_FILE_ALIGNMENT: 1048576
total_POSIX_MAX_READ_TIME_SIZE: 8192
total_POSIX_MAX_WRITE_TIME_SIZE: 28834614
total_POSIX_SIZE_READ_0_100: 17852
total_POSIX_SIZE_READ_100_1K: 915
total_POSIX_SIZE_READ_1K_10K: 810585
total_POSIX_SIZE_READ_10K_100K: 99
total_POSIX_SIZE_READ_100K_1M: 665
total_POSIX_SIZE_READ_1M_4M: 667
total_POSIX_SIZE_READ_4M_10M: 11
total_POSIX_SIZE_READ_10M_100M: 41
total_POSIX_SIZE_READ_100M_1G: 0
total_POSIX_SIZE_READ_1G_PLUS: 0
total_POSIX_SIZE_WRITE_0_100: 2221
total_POSIX_SIZE_WRITE_100_1K: 2448
total_POSIX_SIZE_WRITE_1K_10K: 892017
total_POSIX_SIZE_WRITE_10K_100K: 1065
total_POSIX_SIZE_WRITE_100K_1M: 643
total_POSIX_SIZE_WRITE_1M_4M: 1276
total_POSIX_SIZE_WRITE_4M_10M: 29
total_POSIX_SIZE_WRITE_10M_100M: 152
total_POSIX_SIZE_WRITE_100M_1G: 0
total_POSIX_SIZE_WRITE_1G_PLUS: 0
total_POSIX_STRIDE1_STRIDE: 4194304
total_POSIX_STRIDE2_STRIDE: 4096
total_POSIX_STRIDE3_STRIDE: 1
total_POSIX_STRIDE4_STRIDE: 26
total_POSIX_STRIDE1_COUNT: 1218
total_POSIX_STRIDE2_COUNT: 124
total_POSIX_STRIDE3_COUNT: 24
total_POSIX_STRIDE4_COUNT: 20
total_POSIX_ACCESS1_ACCESS: 4189
total_POSIX_ACCESS2_ACCESS: 8192
total_POSIX_ACCESS3_ACCESS: 4530
total_POSIX_ACCESS4_ACCESS: 2334
total_POSIX_ACCESS1_COUNT: 676442
total_POSIX_ACCESS2_COUNT: 761680
total_POSIX_ACCESS3_COUNT: 2193
total_POSIX_ACCESS4_COUNT: 2176
total_POSIX_FASTEST_RANK: -1
total_POSIX_FASTEST_RANK_BYTES: -1
total_POSIX_SLOWEST_RANK: -1
total_POSIX_SLOWEST_RANK_BYTES: -1
total_POSIX_F_OPEN_START_TIMESTAMP: 0.090209
total_POSIX_F_READ_START_TIMESTAMP: 0.097283
total_POSIX_F_WRITE_START_TIMESTAMP: 0.853514
total_POSIX_F_CLOSE_START_TIMESTAMP: 0.101990
total_POSIX_F_OPEN_END_TIMESTAMP: 565.882341
total_POSIX_F_READ_END_TIMESTAMP: 570.081440
total_POSIX_F_WRITE_END_TIMESTAMP: 675.602195
total_POSIX_F_CLOSE_END_TIMESTAMP: 675.626367
total_POSIX_F_READ_TIME: 29.532315
total_POSIX_F_WRITE_TIME: 64.455771
total_POSIX_F_META_TIME: 286.078577
total_POSIX_F_MAX_READ_TIME: 0.102825
total_POSIX_F_MAX_WRITE_TIME: 0.078899
total_POSIX_F_FASTEST_RANK_TIME: 0.000000
total_POSIX_F_SLOWEST_RANK_TIME: 0.000000
total_POSIX_F_VARIANCE_RANK_TIME: 0.000000
total_POSIX_F_VARIANCE_RANK_BYTES: 0.000000

# *******************************************************
# STDIO module data
# *******************************************************

# description of STDIO counters:
#   STDIO_{OPENS|FDOPENS|WRITES|READS|SEEKS|FLUSHES} are types of operations.
#   STDIO_BYTES_*: total bytes read and written.
#   STDIO_MAX_BYTE_*: highest offset byte read and written.
#   STDIO_*_RANK: rank of the processes that were the fastest and slowest at I/O (for shared files).
#   STDIO_*_RANK_BYTES: bytes transferred by the fastest and slowest ranks (for shared files).
#   STDIO_F_*_START_TIMESTAMP: timestamp of the first call to that type of function.
#   STDIO_F_*_END_TIMESTAMP: timestamp of the completion of the last call to that type of function.
#   STDIO_F_*_TIME: cumulative time spent in different types of functions.
#   STDIO_F_*_RANK_TIME: fastest and slowest I/O time for a single rank (for shared files).
#   STDIO_F_VARIANCE_RANK_*: variance of total I/O time and bytes moved for all ranks (for shared files).

# WARNING: STDIO_OPENS counter includes STDIO_FDOPENS count

# WARNING: STDIO counters related to file offsets may be incorrect if a file is simultaneously accessed by both STDIO and POSIX (e.g., using fdopen())
#         - Affected counters include: MAX_BYTE_{READ|WRITTEN}

total_STDIO_OPENS: 19
total_STDIO_FDOPENS: 0
total_STDIO_READS: 16
total_STDIO_WRITES: 0
total_STDIO_SEEKS: 26
total_STDIO_FLUSHES: 0
total_STDIO_BYTES_WRITTEN: 0
total_STDIO_BYTES_READ: 54526051
total_STDIO_MAX_BYTE_READ: 4194303
total_STDIO_MAX_BYTE_WRITTEN: 0
total_STDIO_FASTEST_RANK: -1
total_STDIO_FASTEST_RANK_BYTES: -1
total_STDIO_SLOWEST_RANK: -1
total_STDIO_SLOWEST_RANK_BYTES: -1
total_STDIO_F_META_TIME: 0.019924
total_STDIO_F_WRITE_TIME: 0.000000
total_STDIO_F_READ_TIME: 0.106629
total_STDIO_F_OPEN_START_TIMESTAMP: 1.386794
total_STDIO_F_CLOSE_START_TIMESTAMP: 1.387560
total_STDIO_F_WRITE_START_TIMESTAMP: 0.000000
total_STDIO_F_READ_START_TIMESTAMP: 1.391299
total_STDIO_F_OPEN_END_TIMESTAMP: 565.481624
total_STDIO_F_CLOSE_END_TIMESTAMP: 565.487266
total_STDIO_F_WRITE_END_TIMESTAMP: 0.000000
total_STDIO_F_READ_END_TIMESTAMP: 565.484678
total_STDIO_F_FASTEST_RANK_TIME: 0.000000
total_STDIO_F_SLOWEST_RANK_TIME: 0.000000
total_STDIO_F_VARIANCE_RANK_TIME: 0.000000
total_STDIO_F_VARIANCE_RANK_BYTES: 0.000000

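Side-by-side comparison of the key metrics

  • With the serial I/O model, scaling up the core count congests the Lustre metadata server (MDS): going from 128 to 1024 processes, POSIX_F_META_TIME rises from 6.66 s to 286.08 s, while POSIX_BYTES_WRITTEN barely changes (about 12.6 GB versus 13.2 GB)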
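
A comparison like this can be generated directly from the two dumps above. The sketch below reads the `total_<COUNTER>: <value>` lines and prints the two runs with their ratio. The name `parse_darshan_totals` is made up for this example, and the sample values are copied from the 128-process and 1024-process logs.

```python
def parse_darshan_totals(text):
    """Return {counter_name: float} from 'total_<NAME>: <value>' lines."""
    totals = {}
    for line in text.splitlines():
        if line.startswith("total_") and ":" in line:
            name, value = line.split(":", 1)
            totals[name[len("total_"):]] = float(value)
    return totals


run_128 = parse_darshan_totals("""\
total_POSIX_OPENS: 1167
total_POSIX_BYTES_WRITTEN: 12596970403
total_POSIX_F_WRITE_TIME: 56.739679
total_POSIX_F_META_TIME: 6.663915
""")
run_1024 = parse_darshan_totals("""\
total_POSIX_OPENS: 8943
total_POSIX_BYTES_WRITTEN: 13183610642
total_POSIX_F_WRITE_TIME: 64.455771
total_POSIX_F_META_TIME: 286.078577
""")

print(f"{'counter':<22} {'128 procs':>16} {'1024 procs':>16} {'ratio':>8}")
for name, small in run_128.items():
    large = run_1024[name]
    print(f"{name:<22} {small:>16.2f} {large:>16.2f} {large / small:>7.2f}x")
```

The bytes written grow by only a few percent, while the metadata time grows by roughly 43 times, which is what points at the MDS rather than at data bandwidth.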
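
WRF I/O mode 102

In this mode each WRF process writes its own file, which avoids the congestion caused by more than a thousand processes writing output. It is also the approach recommended on the WRF forum for grids with more than 500 points per side.

ROMS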

Average:               257.028
 Minimum:               253.560
 Maximum:               257.158


  Writing of output data ...........................     14544.184  (29.4719 %)
  Model 2D kernel ..................................      4990.394  (10.1124 %)
  Unused 07 ........................................       330.811  ( 0.6703 %)
                                              Total:     33027.832   66.9266 %

  Unique kernel(s) regions profiled ................     33027.832   66.9266 %
  Residual, non-profiled code ......................     16321.481   33.0734 %


 All percentages are with respect to total time =        49349.313


 MPI communications profile, Grid: 01

  Message Passage: 2D halo exchanges ...............      2917.842  ( 5.9126 %)
  Message Passage: 3D halo exchanges ...............      1340.847  ( 2.7171 %)
  Message Passage: 4D halo exchanges ...............       515.030  ( 1.0436 %)
  Message Passage: data broadcast ..................     13709.517  (27.7806 %)
  Message Passage: data reduction ..................       286.763  ( 0.5811 %)
  Message Passage: data gathering ..................      2557.551  ( 5.1825 %)
  Message Passage: data scattering..................      2209.861  ( 4.4780 %)
  Message Passage: point data gathering ............         0.976  ( 0.0020 %)
  Message Passage: synchronization barrier .........         3.564  ( 0.0072 %)
                                              Total:     23541.949   47.7047 %

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

WRF

             Timing for Writing      Timing for main
             total (s)   avg (s)     total (s)   avg (s)
io_mode 2       158.38     39.59        472.45         -
io_mode 102       4.03      1.01        189.08         -

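Both the Unused 07 (idle) time in ROMS and the writing time in WRF drop sharply: Unused 07 is down to 0.67 % of total time, and the WRF writing total falls from 158.38 s to 4.03 s.

  • This mode produces a large number of fragment files that need post-processing (a separate program can merge them while new data is still being generated); the overall run is also significantly faster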
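
If merging runs alongside the model, the merge program first has to know which output times already have all of their fragments. The sketch below groups fragment names by output time and reports the complete sets. It assumes that each fragment is named like the ordinary wrfout file plus a four-digit rank suffix (for example `wrfout_d01_2026-04-04_00:00:00_0000`), which should be checked against the files actually produced. The name `complete_groups` is hypothetical, and the merge step itself is left to whatever joiner tool is used.

```python
import re
from collections import defaultdict

# Assumed fragment naming: <wrfout base name>_<4-digit rank>.
TILE_RE = re.compile(r"^(?P<base>wrfout_d\d+_.+)_(?P<rank>\d{4})$")


def complete_groups(filenames, expected_tiles):
    """Return base names whose fragments 0..expected_tiles-1 are all present."""
    ranks = defaultdict(set)
    for name in filenames:
        m = TILE_RE.match(name)
        if m:
            ranks[m.group("base")].add(int(m.group("rank")))
    wanted = set(range(expected_tiles))
    return sorted(base for base, seen in ranks.items() if seen >= wanted)


names = [
    "wrfout_d01_2026-04-04_00:00:00_0000",
    "wrfout_d01_2026-04-04_00:00:00_0001",
    "wrfout_d01_2026-04-04_01:00:00_0000",  # rank 0001 not written yet
]
print(complete_groups(names, expected_tiles=2))
```

A file that exists may still be in the middle of being written, so a real merge loop would also want to wait until WRF has moved on to the next output time before touching a set.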
