Chuanqiz’s blog


1. Tools

  • Nsight on Windows (Visual Studio) or Linux (Eclipse)
  • nvprof on the Linux command line
    
    # on a GTX 1060
    nvprof --metrics ipc,issued_ipc,achieved_occupancy,global_hit_rate,local_hit_rate,l2_tex_read_hit_rate,gld_transactions,gst_transactions,local_load_transactions,local_store_transactions,l2_tex_read_transactions,l2_tex_write_transactions,l2_read_transactions,l2_write_transactions,dram_read_transactions,dram_write_transactions,sysmem_read_transactions,sysmem_write_transactions ./wave
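
The metric list in the command above is long enough that a typo is easy to miss. As a minimal sketch, the same command line can be assembled from a Python list. The helper name `build_nvprof_command` is hypothetical, and `./wave` is simply the binary used in the command above.

```python
# Metric names copied from the nvprof command above (GTX 1060, Pascal).
METRICS = [
    "ipc", "issued_ipc", "achieved_occupancy",
    "global_hit_rate", "local_hit_rate", "l2_tex_read_hit_rate",
    "gld_transactions", "gst_transactions",
    "local_load_transactions", "local_store_transactions",
    "l2_tex_read_transactions", "l2_tex_write_transactions",
    "l2_read_transactions", "l2_write_transactions",
    "dram_read_transactions", "dram_write_transactions",
    "sysmem_read_transactions", "sysmem_write_transactions",
]


def build_nvprof_command(binary, metrics=METRICS):
    """Return the argv list for `nvprof --metrics m1,m2,... binary`.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    return ["nvprof", "--metrics", ",".join(metrics), binary]


if __name__ == "__main__":
    # Print the command; pass the list to subprocess.run() to execute it.
    print(" ".join(build_nvprof_command("./wave")))
```

Keeping the metrics in a list makes it easy to swap in the Fermi/Kepler names listed later in this post.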
    

2. Metrics

2.1 Performance

  • ipc
    • Instructions executed per cycle
  • issued_ipc
    • Instructions issued per cycle
  • achieved_occupancy
    • Ratio of the average active warps per active cycle to the maximum number of warps supported on a multiprocessor
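The definition of achieved_occupancy above is a plain ratio, which a short sketch can make concrete. The function name is hypothetical, and the default of 64 warps per multiprocessor is an assumption that matches compute capability 6.1 parts such as the GTX 1060; check the limit for your own GPU.

```python
def achieved_occupancy(avg_active_warps, max_warps_per_sm=64):
    """Ratio of average active warps per active cycle to the SM's warp limit.

    max_warps_per_sm=64 is an assumption (compute capability 6.1);
    other architectures may use a different limit.
    """
    if max_warps_per_sm <= 0:
        raise ValueError("max_warps_per_sm must be positive")
    return avg_active_warps / max_warps_per_sm


if __name__ == "__main__":
    # 48 active warps on average out of 64 possible.
    print(achieved_occupancy(48))  # 0.75
```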

Note: this post focuses on the data cache, so every mention of the L1 cache below refers to the L1 data cache.

2.2 Cache Hit Rate

L1 Cache

Fermi/Kepler (Capability 2.x/3.x)

  • l1_cache_global_hit_rate
    • Hit rate in L1 cache for global loads
  • l1_cache_local_hit_rate
    • Hit rate in L1 cache for local loads and stores
  • nc_cache_global_hit_rate
    • only for Kepler
    • Hit rate in non coherent cache for global loads

Maxwell/Pascal (Capability 5.x/6.x)

  • global_hit_rate
    • Hit rate for global loads
  • local_hit_rate
    • Hit rate for local loads and stores

L2 Cache

Fermi/Kepler (Capability 2.x/3.x)

  • l2_l1_read_hit_rate
    • Hit rate at L2 cache for all read requests from L1 cache
  • l2_tex_read_hit_rate
    • Hit rate at L2 cache for all read requests from texture cache

Maxwell/Pascal (Capability 5.x/6.x)

  • l2_tex_read_hit_rate
    • Hit rate at L2 cache for all read requests from texture cache
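Because the hit-rate metric names differ between architecture groups, a small lookup keyed on the compute capability major version can help when scripting nvprof runs. This is only a sketch of the lists in this section; the function name `hit_rate_metrics` is hypothetical.

```python
def hit_rate_metrics(cc_major):
    """Return the cache hit-rate metric names listed above for a GPU.

    cc_major is the compute capability major version
    (2 = Fermi, 3 = Kepler, 5 = Maxwell, 6 = Pascal).
    """
    if cc_major in (2, 3):
        names = ["l1_cache_global_hit_rate", "l1_cache_local_hit_rate"]
        if cc_major == 3:
            # The non-coherent cache metric exists only on Kepler.
            names.append("nc_cache_global_hit_rate")
        names += ["l2_l1_read_hit_rate", "l2_tex_read_hit_rate"]
        return names
    if cc_major in (5, 6):
        return ["global_hit_rate", "local_hit_rate", "l2_tex_read_hit_rate"]
    raise ValueError(f"compute capability {cc_major}.x is not covered here")


if __name__ == "__main__":
    # A GTX 1060 is compute capability 6.1.
    print(",".join(hit_rate_metrics(6)))
```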

2.3 Transactions

L1 Cache

Global data

  • gld_transactions
    • Number of global memory load transactions
  • gld_transactions_per_request
    • Average number of global memory load transactions performed for each global memory load
  • gst_transactions
    • Number of global memory store transactions
  • gst_transactions_per_request
    • Average number of global memory store transactions performed for each global memory store

Local data

  • local_load_transactions
    • Number of local memory load transactions
  • local_load_transactions_per_request
    • Average number of local memory load transactions performed for each local memory load
  • local_store_transactions
    • Number of local memory store transactions
  • local_store_transactions_per_request
    • Average number of local memory store transactions performed for each local memory store
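The *_transactions_per_request metrics above are easier to interpret with a toy model: one request is a single warp-wide load or store, and the number of transactions is roughly the number of distinct memory segments the warp touches. The sketch below assumes 32-byte segments, a 32-thread warp, and 4-byte elements. These values and both function names are illustrative assumptions, and real hardware counts also depend on the architecture and cache configuration.

```python
def warp_addresses(base, stride_elems, elem_bytes=4, warp_size=32):
    """Byte address accessed by each lane of a warp for a strided access."""
    return [base + lane * stride_elems * elem_bytes for lane in range(warp_size)]


def transactions_per_request(addresses, segment_bytes=32):
    """Count the distinct segments touched by one warp-wide request.

    Idealized model: assumes no element straddles a segment boundary,
    which holds for aligned 4-byte elements and 32-byte segments.
    """
    return len({addr // segment_bytes for addr in addresses})


if __name__ == "__main__":
    coalesced = warp_addresses(base=0, stride_elems=1)
    misaligned = warp_addresses(base=16, stride_elems=1)
    strided = warp_addresses(base=0, stride_elems=8)
    print(transactions_per_request(coalesced))   # 4
    print(transactions_per_request(misaligned))  # 5
    print(transactions_per_request(strided))     # 32
```

Under this model a fully coalesced access needs 4 transactions per request, while a stride of 8 elements needs one transaction per thread, which is the pattern a high gld_transactions_per_request value usually points to.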

L2 Cache

Fermi/Kepler (Capability 2.x/3.x)

  • l2_l1_read_transactions
    • Memory read transactions seen at L2 cache for all read requests from L1 cache
  • l2_l1_write_transactions
    • Memory write transactions seen at L2 cache for all write requests from L1 cache

Maxwell/Pascal(Capability 5.x/6.x)

  • l2_tex_read_transactions
    • Memory read transactions seen at L2 cache for read requests from the texture cache
  • l2_tex_write_transactions
    • Memory write transactions seen at L2 cache for write requests from the texture cache

Both

  • l2_read_transactions
    • Memory read transactions seen at L2 cache for all read requests
  • l2_write_transactions
    • Memory write transactions seen at L2 cache for all write requests

Only in Kepler

  • nc_l2_read_transactions
    • Memory read transactions seen at L2 cache for non coherent global read requests

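Notes

  • Since Kepler, the default L1 cache policy for global data has been bypassing. Only on Fermi can the L1 cache be both read and written for global data, but it cannot maintain cache coherence.
  • To guarantee cache coherence, NVIDIA took a rather drastic approach: global data bypasses the L1 cache, and on Maxwell and Pascal the L1 cache is merged with the texture cache and made read-only. In my opinion this does not work well. The newest architecture, Volta, returns to the Fermi-style design in which the L1 cache and shared memory are configurable.
  • Consequently, on Maxwell and Pascal we treat the tex cache as the L1 data cache.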

Device memory (GDRAM)

  • dram_read_transactions
    • Device memory read transactions
  • dram_write_transactions
    • Device memory write transactions

System memory (host DRAM)

  • sysmem_read_transactions
    • System memory read transactions
  • sysmem_write_transactions
    • System memory write transactions

Influenced by the L2 hit rate
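One way to read the heading above is that traffic to device memory depends on how often requests miss in L2. As a first-order sketch only, the number of read transactions that miss in L2 can be estimated from a transaction count and a hit rate. The function name is hypothetical, the hit rate is taken as a fraction rather than the percentage nvprof prints, and the model ignores writes, prefetching, and any difference in transaction size between L2 and DRAM.

```python
def estimated_l2_read_misses(l2_read_transactions, l2_read_hit_rate):
    """Estimate L2 read misses as transactions * (1 - hit rate).

    l2_read_hit_rate is a fraction in [0, 1]; divide nvprof's
    percentage by 100 first. This is a rough model, not a formula
    taken from the profiler documentation.
    """
    if not 0.0 <= l2_read_hit_rate <= 1.0:
        raise ValueError("l2_read_hit_rate must be between 0 and 1")
    return l2_read_transactions * (1.0 - l2_read_hit_rate)


if __name__ == "__main__":
    # 1000 read transactions at a 75% hit rate leave about 250 misses.
    print(estimated_l2_read_misses(1000, 0.75))  # 250.0
```

Comparing this estimate with the measured dram_read_transactions gives a quick sanity check on whether the L2 hit rate explains the observed device memory traffic.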

Reference

Read more at: http://docs.nvidia.com/cuda/profiler-users-guide/index.html#ixzz4t4vGKod8
