Chuanqiz’s blog

Yearly archive: 2017

Alibaba HR interview

Finally got here, glad I didn't give up: Alibaba HR interview 1…

1. Tools

  • Nsight: Nsight Visual Studio Edition on Windows, or Nsight Eclipse Edition on Linux
  • nvprof: from the Linux command line
    
    # on a GTX 1060 (Pascal, compute capability 6.1)
    nvprof --metrics ipc,issued_ipc,achieved_occupancy,global_hit_rate,local_hit_rate,l2_tex_read_hit_rate,gld_transactions,gst_transactions,local_load_transactions,local_store_transactions,l2_tex_read_transactions,l2_tex_write_transactions,l2_read_transactions,l2_write_transactions,dram_read_transactions,dram_write_transactions,sysmem_read_transactions,sysmem_write_transactions ./wave
    
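When switching between GPUs it helps to keep the metric list in one place instead of editing one very long command line. Below is a minimal Python sketch, assuming Python 3.7+ and that `nvprof` is on `PATH`. The helper names `nvprof_command` and `run_nvprof` are hypothetical; only the `nvprof --metrics ... ./wave` invocation comes from the command above.

```python
import subprocess

# Metric names copied from the GTX 1060 command above (Pascal, compute capability 6.1).
PASCAL_METRICS = [
    "ipc", "issued_ipc", "achieved_occupancy",
    "global_hit_rate", "local_hit_rate", "l2_tex_read_hit_rate",
    "gld_transactions", "gst_transactions",
    "local_load_transactions", "local_store_transactions",
    "l2_tex_read_transactions", "l2_tex_write_transactions",
    "l2_read_transactions", "l2_write_transactions",
    "dram_read_transactions", "dram_write_transactions",
    "sysmem_read_transactions", "sysmem_write_transactions",
]


def nvprof_command(binary, metrics):
    """Return the argv list for: nvprof --metrics m1,m2,... <binary>."""
    return ["nvprof", "--metrics", ",".join(metrics), binary]


def run_nvprof(binary, metrics):
    """Run nvprof and return the completed process.

    Needs nvprof on PATH and a GPU that nvprof supports. Both output
    streams are captured so the report is kept whichever one is used.
    """
    return subprocess.run(nvprof_command(binary, metrics),
                          capture_output=True, text=True, check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so this works without a GPU.
    print(" ".join(nvprof_command("./wave", PASCAL_METRICS)))
```

Passing an argument list (no shell) avoids quoting problems with the long comma-separated metric string.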

2. Metrics

2.1 Performance

  • ipc
    • Instructions executed per cycle
  • issued_ipc
    • Instructions issued per cycle
  • achieved_occupancy
    • Ratio of the average active warps per active cycle to the maximum number of warps supported on a multiprocessor
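The definitions above reduce to simple arithmetic. A sketch, assuming a GTX 1060 (compute capability 6.1) allows at most 64 resident warps per SM, and assuming `ipc` and `issued_ipc` are normalized by the same cycle count, so that issued instructions in excess of executed ones reflect replays. The function names and the readings are hypothetical.

```python
def achieved_occupancy(avg_active_warps, max_warps_per_sm):
    """Average active warps per active cycle divided by max warps per SM."""
    if max_warps_per_sm <= 0:
        raise ValueError("max_warps_per_sm must be positive")
    return avg_active_warps / max_warps_per_sm


def replay_overhead(ipc, issued_ipc):
    """Extra issued instructions per executed instruction.

    Assumes both rates share the same cycle count, so the cycles cancel:
    (issued - executed) / executed == (issued_ipc - ipc) / ipc.
    """
    if ipc <= 0:
        raise ValueError("ipc must be positive")
    return (issued_ipc - ipc) / ipc


# Hypothetical readings, not measured values.
print(achieved_occupancy(48, 64))   # 0.75
print(replay_overhead(1.0, 1.25))   # 0.25
```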

Note: this post focuses on the data cache, so every mention of the L1 cache below refers to the L1 data cache.

2.2 Cache Hit Rate

L1 Cache

Fermi/Kepler (Capability 2.x/3.x)

  • l1_cache_global_hit_rate
    • Hit rate in L1 cache for global loads
  • l1_cache_local_hit_rate
    • Hit rate in L1 cache for local loads and stores
  • nc_cache_global_hit_rate
    • Kepler only
    • Hit rate in non-coherent cache for global loads

Maxwell/Pascal (Capability 5.x/6.x)

  • global_hit_rate
    • Hit rate for global loads
  • local_hit_rate
    • Hit rate for local loads and stores

L2 Cache

Fermi/Kepler (Capability 2.x/3.x)

  • l2_l1_read_hit_rate
    • Hit rate at L2 cache for all read requests from L1 cache
  • l2_tex_read_hit_rate
    • Hit rate at L2 cache for all read requests from texture cache

Maxwell/Pascal (Capability 5.x/6.x)

  • l2_tex_read_hit_rate
    • Hit rate at L2 cache for all read requests from texture cache
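Because the hit-rate metric names differ between generations, a script that profiles several GPUs has to pick names by compute capability. A sketch that simply encodes the lists in this section; the function name is hypothetical, and whether a given metric exists on a particular GPU is best confirmed with `nvprof --query-metrics`.

```python
def hit_rate_metrics(cc_major):
    """Hit-rate metric names for a compute capability major version.

    Encodes the lists above: Fermi/Kepler (2.x/3.x) versus
    Maxwell/Pascal (5.x/6.x). Other generations are not covered.
    """
    if cc_major in (2, 3):
        names = ["l1_cache_global_hit_rate", "l1_cache_local_hit_rate",
                 "l2_l1_read_hit_rate", "l2_tex_read_hit_rate"]
        if cc_major == 3:
            names.append("nc_cache_global_hit_rate")  # Kepler only
        return names
    if cc_major in (5, 6):
        # On Maxwell/Pascal the texture cache plays the role of the L1 data cache.
        return ["global_hit_rate", "local_hit_rate", "l2_tex_read_hit_rate"]
    raise ValueError(f"compute capability {cc_major}.x is not covered")


print(",".join(hit_rate_metrics(6)))
```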

2.3 Transactions

L1 Cache

Global data

  • gld_transactions
    • Number of global memory load transactions
  • gld_transactions_per_request
    • Average number of global memory load transactions performed for each global memory load
  • gst_transactions
    • Number of global memory store transactions
  • gst_transactions_per_request
    • Average number of global memory store transactions performed for each global memory store

Local data

  • local_load_transactions
    • Number of local memory load transactions
  • local_load_transactions_per_request
    • Average number of local memory load transactions performed for each local memory load
  • local_store_transactions
    • Number of local memory store transactions
  • local_store_transactions_per_request
    • Average number of local memory store transactions performed for each local memory store
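The `*_transactions_per_request` metrics are essentially a coalescing measure: fewer transactions per warp-wide request is better. The following is a simplified model, not NVIDIA's actual counting rule. It assumes each transaction covers one aligned segment of `segment_bytes` bytes (32 and 128 are used below only as example sizes) and counts how many segments the 32 threads of a warp touch. The function name is hypothetical.

```python
def transactions_per_request(addresses, segment_bytes, access_bytes=4):
    """Count the aligned segments touched by one warp-wide load or store.

    addresses: starting byte address accessed by each active thread.
    segment_bytes: assumed transaction size in bytes.
    access_bytes: bytes accessed per thread (4 for float/int).
    """
    segments = set()
    for addr in addresses:
        first = addr // segment_bytes
        last = (addr + access_bytes - 1) // segment_bytes
        segments.update(range(first, last + 1))
    return len(segments)


WARP = 32
# Base address 0 is assumed to be aligned to the segment size.
coalesced = [4 * tid for tid in range(WARP)]   # a[tid], 4-byte elements
strided = [8 * tid for tid in range(WARP)]     # a[2 * tid]
print(transactions_per_request(coalesced, 128))  # 1
print(transactions_per_request(coalesced, 32))   # 4
print(transactions_per_request(strided, 32))     # 8
```

In this model, doubling the stride doubles the number of segments touched, which is the kind of change that shows up as a higher `gld_transactions_per_request`.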

L2 Cache

Fermi/Kepler (Capability 2.x/3.x)

  • l2_l1_read_transactions
    • Memory read transactions seen at L2 cache for all read requests from L1 cache
  • l2_l1_write_transactions
    • Memory write transactions seen at L2 cache for all write requests from L1 cache

Maxwell/Pascal (Capability 5.x/6.x)

  • l2_tex_read_transactions
    • Memory read transactions seen at L2 cache for read requests from the texture cache
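  • l2_tex_write_transactions
    • Memory write transactions seen at L2 cache for write requests from the texture cache

Both (Fermi/Kepler and Maxwell/Pascal)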

  • l2_read_transactions
    • Memory read transactions seen at L2 cache for all read requests
  • l2_write_transactions
    • Memory write transactions seen at L2 cache for all write requests

Only in Kepler

  • nc_l2_read_transactions
    • Memory read transactions seen at L2 cache for non-coherent global read requests

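Remarks

  • Starting with Kepler, the L1 cache's default policy for global data is to bypass it. Only on Fermi is the L1 cache used for both reads and writes of global data, and even there it does not maintain cache coherence.
  • To guarantee cache coherence, NVIDIA took a rather extreme approach: bypassing the L1 cache. In the Maxwell and Pascal architectures, L1 was also merged with the texture cache and made read-only, although in my view this did not work well. The latest architecture, Volta, returns to a Fermi-like design in which the split between L1 cache and shared memory is configurable.
  • Therefore, on Maxwell and Pascal we treat the texture cache as the L1 data cache.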

Device memory (GDDR)

  • dram_read_transactions
    • Device memory read transactions
  • dram_write_transactions
    • Device memory write transactions

System memory (host DRAM)

  • sysmem_read_transactions
    • System memory read transactions
  • sysmem_write_transactions
    • System memory write transactions
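To compare these counters with bandwidth, or to reason about how the L2 hit rate shapes device-memory traffic, a rough conversion helps. A sketch under two simplifying assumptions: every L2/DRAM transaction moves 32 bytes, and every L2 read miss becomes one device-memory read transaction. The hit rate is taken as a percentage. Function names and counter values are hypothetical.

```python
BYTES_PER_TRANSACTION = 32  # assumption: 32-byte L2/DRAM transactions


def transactions_to_bytes(transactions, bytes_per_transaction=BYTES_PER_TRANSACTION):
    """Total bytes moved by a number of transactions."""
    return transactions * bytes_per_transaction


def throughput_gb_s(transactions, kernel_seconds,
                    bytes_per_transaction=BYTES_PER_TRANSACTION):
    """Average throughput in GB/s (1 GB = 1e9 bytes) over the kernel's runtime."""
    if kernel_seconds <= 0:
        raise ValueError("kernel_seconds must be positive")
    return transactions_to_bytes(transactions, bytes_per_transaction) / kernel_seconds / 1e9


def estimated_dram_reads(l2_read_transactions, l2_hit_rate_percent):
    """First-order estimate: only L2 read misses reach device memory."""
    return l2_read_transactions * (1.0 - l2_hit_rate_percent / 100.0)


# Hypothetical counter values, not measurements.
print(transactions_to_bytes(1000))        # 32000
print(estimated_dram_reads(1000, 75.0))   # 250.0
```

Under this model, raising the L2 hit rate from 50% to 75% halves the estimated DRAM read transactions.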

Influence of the L2 Hit Rate

Reference

NVIDIA Profiler User's Guide: http://docs.nvidia.com/cuda/profiler-users-guide/index.html

Exit and entry procedures for graduate students attending conferences and exchanges abroad

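As required by the Graduate School's training office, when handling exit and entry procedures for conferences or exchanges abroad, please, at the end of each month…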

Frequently used locations on the Central Campus

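Frequently used locations on the Central Campus. Transcript printing: Mingde Building, Block B, 1st-floor lobby. Finance hall: Mingde…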

Effective C++ Item 27: Casting

Effective C++: Casting. It has been a long time since the Alibaba interview; unfortunately…

Alibaba second-round interview (cross interview)

After bombing the written test on the evening of 8.25, I didn't hold out much hope and went back to running experiments,…

How to calculate occupancy

How to calculate occupancy. This question, in fact…

Notes on an NVIDIA phone interview (Compute Arch track)

Notes on an NVIDIA phone interview (Compute Arch track). Résumé…

Handy guide to the Software Park Campus

Handy guide to the Software Park Campus. Office building: official stamps (school), Room 312, teacher Yang Hang, or 3…

Notes

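If you cannot find the registration or password-reset email, check your spam folder. Everyone is welcome to share useful experience on…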