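12.4 TensorRT Utility Tools

Introduction

Production deployment is a complex task with many stages, so good tools are needed to inspect and analyze it. NVIDIA provides a set of tools for analyzing, debugging, and optimizing each stage of deployment. This section introduces two of them: Nsight Systems and Polygraphy.

Nsight Systems profiles CPU and GPU performance and helps locate the bottlenecks of an application.

Polygraphy runs and debugs deep learning models across several frameworks and helps analyze bottlenecks in model conversion.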

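Nsight Systems

NVIDIA Nsight Systems is a system-wide profiling tool. It reports performance metrics such as CPU and GPU utilization, memory usage, and data throughput, which helps locate the bottlenecks of an application. See the user documentation for details.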

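Installation

Open the official website and download the installer that matches your operating system and version.

  • NsightSystems-2023.3.1.92-3314722.msi: double-click to install and accept the defaults throughout
  • Add the target directory to the PATH environment variable: C:\Program Files\NVIDIA Corporation\Nsight Systems 2023.3.1\target-windows-x64
  • Add the GUI directory to the PATH environment variable as well: C:\Program Files\NVIDIA Corporation\Nsight Systems 2023.3.1\host-windows-x64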
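
After updating the environment variables, one quick way to confirm the change took effect is to check whether both executables can be found on PATH. The sketch below is only an illustration and not part of the original setup: the helper name find_nsight_tools is hypothetical, and it assumes nothing beyond the executable names nsys and nsys-ui mentioned in this section.

```python
import shutil


def find_nsight_tools(names=("nsys", "nsys-ui")):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    # shutil.which honours PATHEXT on Windows, so "nsys" also matches nsys.exe.
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for tool, path in find_nsight_tools().items():
        print(f"{tool}: {path or 'not found on PATH'}")
```

If either tool is reported as not found, open a new cmd window first, since an already open window may not see the updated variables.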

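Running

nsys comes with both a command-line tool and a UI; the UI is used for the demonstration here.

  • The command-line tool is C:\Program Files\NVIDIA Corporation\Nsight Systems 2023.3.1\target-windows-x64\nsys.exe
  • The UI is C:\Program Files\NVIDIA Corporation\Nsight Systems 2023.3.1\host-windows-x64\nsys-ui.exe

nsys works by launching the task itself: the program is started from nsys, and nsys then monitors its performance automatically.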
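
The same launch-and-monitor flow applies to the command-line tool, where nsys profile starts the target program and records a report. As a minimal sketch, the command can be assembled and run from Python. The helper build_nsys_command and the report name resnet50_report are hypothetical, and the script is assumed to sit in the current directory.

```python
import os
import shutil
import subprocess
import sys


def build_nsys_command(script, report_name="resnet50_report"):
    """Assemble an `nsys profile` command that launches `script` under the profiler."""
    # -o sets the name of the report file that nsys writes.
    return ["nsys", "profile", "-o", report_name, sys.executable, script]


if __name__ == "__main__":
    script = "01_trt_resnet50_cuda.py"  # assumed to be in the current directory
    cmd = build_nsys_command(script)
    print(" ".join(cmd))
    # Only launch when both nsys and the script are actually available.
    if shutil.which("nsys") and os.path.exists(script):
        subprocess.run(cmd, check=True)
    else:
        print("nsys or the script was not found; command not executed")
```

The resulting report can then be opened in the UI for inspection.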

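Step 1: Start nsys. Type nsys-ui in cmd, or double-click nsys-ui.exe in the installation directory.

Step 2: Create a project and configure the program to run. Here we run 01_trt_resnet50_cuda.py from this chapter's accompanying code. The procedure is shown in the figure below.

[Figure: creating a project in nsys-ui and configuring the program to run]

Step 3: View the statistics.

[Figure: statistics view in nsys-ui]

Nsight Systems is a powerful tool. For how to use it effectively, and how to analyze time costs at a finer granularity and closer to the hardware, refer to the official documentation and study it as your needs require.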

polygraphy

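Polygraphy is an important debugging tool in the TensorRT ecosystem. It can:

  • Run inference with several backends, including TensorRT, onnxruntime, and TensorFlow;
  • Compare layer-by-layer results across backends;
  • Build a TensorRT engine from a model file and serialize it as a .plan file;
  • Inspect layer-by-layer information about a model's network;
  • Modify ONNX models, for example by extracting subgraphs or simplifying the computation graph;
  • Analyze why an ONNX-to-TensorRT conversion fails, splitting the original graph into the subgraphs that can and cannot be converted to TensorRT and saving them;
  • Isolate faulty tactics in TensorRT.

The most commonly used functions are:

  • Checking the correctness and precision of results computed with TensorRT
  • Finding the layers that compute wrong results or lose precision
  • Performing simple computation graph optimizations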

Installation

pip install nvidia-pyindex

pip install polygraphy

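Verification

Polygraphy is installed into a Python virtual environment, so activate the corresponding environment first and then run polygraphy -h.

Polygraphy provides several subtools. In the version shown below they are {run,convert,inspect,check,surgeon,template,debug,data}; see the documentation for what each one does.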

(pt112) C:\Users\yts32>polygraphy -h

usage: polygraphy [-h] [-v] {run,convert,inspect,check,surgeon,template,debug,data} ...

 

Polygraphy: A Deep Learning Debugging Toolkit

 

optional arguments:

  -h, --help            show this help message and exit

  -v, --version         show program's version number and exit

 

Tools:

  {run,convert,inspect,check,surgeon,template,debug,data}

    run                 Run inference and compare results across backends.

    convert             Convert models to other formats.

    inspect             View information about various types of files.

    check               Check and validate various aspects of a model

    surgeon             Modify ONNX models.

    template            [EXPERIMENTAL] Generate template files.

    debug               [EXPERIMENTAL] Debug a wide variety of model issues.

    data                Manipulate input and output data generated by other Polygraphy subtools.

Case 1: Running ONNX and TensorRT models

polygraphy run resnet50_bs_1.onnx --onnxrt     

 

polygraphy run resnet50_bs_1.engine --trt --input-shapes "input:[1,3,224,224]" --verbose

The following log output shows that inference ran successfully on both backends:

......

| Completed 1 iteration(s) in 1958 ms | Average inference time: 1958 ms.

......

| Completed 1 iteration(s) in 120.1 ms | Average inference time: 120.1 ms.
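
To compare the two runs numerically, the average inference times can be pulled out of the log lines. The following is a small sketch that assumes the log line format shown above and assumes the first line belongs to the onnxruntime run and the second to the TensorRT run, in the order the commands were issued. parse_avg_ms is a hypothetical helper, not part of Polygraphy.

```python
import re

# Matches the "Average inference time: <number> ms" part of a Polygraphy log line.
LOG_PATTERN = re.compile(r"Average inference time:\s*([\d.]+)\s*ms")


def parse_avg_ms(line):
    """Return the average inference time in milliseconds, or None if absent."""
    match = LOG_PATTERN.search(line)
    return float(match.group(1)) if match else None


onnxrt_line = "| Completed 1 iteration(s) in 1958 ms | Average inference time: 1958 ms."
trt_line = "| Completed 1 iteration(s) in 120.1 ms | Average inference time: 120.1 ms."

onnxrt_ms = parse_avg_ms(onnxrt_line)
trt_ms = parse_avg_ms(trt_line)
print(f"onnxruntime: {onnxrt_ms} ms, TensorRT: {trt_ms} ms, ratio: {onnxrt_ms / trt_ms:.1f}x")
```

Each run here is a single iteration, so the ratio is only a rough indication and should not be read as a benchmark.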

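Case 2: Comparing ONNX and TensorRT outputs (commonly used)

Polygraphy can also do the job of trtexec: it can build a TensorRT engine from an ONNX model and compare the results layer by layer.

Here atol is the absolute error tolerance and rtol is the relative error tolerance.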

polygraphy run resnet50_bs_1.onnx --onnxrt --trt ^
--save-engine=resnet50_bs_1_fp32_polygraphy.engine ^
--onnx-outputs mark all --trt-outputs mark all ^
--input-shapes "input:[1,3,224,224]" ^
--atol 1e-3 --rtol 1e-3 --verbose > onnx-trt-compare.log

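The output log looks like this:

For each network layer, the log prints a histogram of the ONNX output and of the TensorRT output, followed by histograms of the absolute difference and the relative difference.

At the end, an accuracy summary reports the share of iterations in which all outputs matched within the configured atol and rtol. In this case, Pass Rate: 100.0%.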

[I]     Comparing Output: 'input.4' (dtype=float32, shape=(1, 64, 112, 112)) with 'input.4' (dtype=float32, shape=(1, 64, 112, 112))

[I]         Tolerance: [abs=0.001, rel=0.001] | Checking elemwise error

[I]         onnxrt-runner-N0-08/20/23-23:08:36: input.4 | Stats: mean=0.20451, std-dev=0.31748, var=0.10079, median=0.19194, min=-1.3369 at (0, 33, 13, 104), max=2.0691 at (0, 33, 64, 77), avg-magnitude=0.29563

[V]             ---- Histogram ----

                Bin Range        |  Num Elems | Visualization

                (-1.34 , -0.996) |         29 |

                (-0.996, -0.656) |      11765 | #

                (-0.656, -0.315) |      31237 | ###

                (-0.315, 0.0255) |     140832 | #############

                (0.0255, 0.366 ) |     411249 | ########################################

                (0.366 , 0.707 ) |     166138 | ################

                (0.707 , 1.05  ) |      40956 | ###

                (1.05  , 1.39  ) |        551 |

                (1.39  , 1.73  ) |         54 |

                (1.73  , 2.07  ) |          5 |

[I]         trt-runner-N0-08/20/23-23:08:36: input.4 | Stats: mean=0.20451, std-dev=0.31748, var=0.10079, median=0.19194, min=-1.3369 at (0, 33, 13, 104), max=2.0691 at (0, 33, 64, 77), avg-magnitude=0.29563

[V]             ---- Histogram ----

                Bin Range        |  Num Elems | Visualization

                (-1.34 , -0.996) |         29 |

                (-0.996, -0.656) |      11765 | #

                (-0.656, -0.315) |      31237 | ###

                (-0.315, 0.0255) |     140832 | #############

                (0.0255, 0.366 ) |     411249 | ########################################

                (0.366 , 0.707 ) |     166138 | ################

                (0.707 , 1.05  ) |      40956 | ###

                (1.05  , 1.39  ) |        551 |

                (1.39  , 1.73  ) |         54 |

                (1.73  , 2.07  ) |          5 |

[I]         Error Metrics: input.4

[I]             Minimum Required Tolerance: elemwise error | [abs=8.3447e-07] OR [rel=0.037037] (requirements may be lower if both abs/rel tolerances are set)

[I]             Absolute Difference | Stats: mean=2.6075e-08, std-dev=3.2558e-08, var=1.06e-15, median=1.4901e-08, min=0 at (0, 0, 0, 3), max=8.3447e-07 at (0, 33, 86, 43), avg-magnitude=2.6075e-08

[V]                 ---- Histogram ----

                    Bin Range            |  Num Elems | Visualization

                    (0       , 8.34e-08) |     757931 | ########################################

                    (8.34e-08, 1.67e-07) |      40058 | ##

                    (1.67e-07, 2.5e-07 ) |       4334 |

                    (2.5e-07 , 3.34e-07) |        249 |

                    (3.34e-07, 4.17e-07) |        181 |

                    (4.17e-07, 5.01e-07) |         53 |

                    (5.01e-07, 5.84e-07) |          0 |

                    (5.84e-07, 6.68e-07) |          8 |

                    (6.68e-07, 7.51e-07) |          1 |

                    (7.51e-07, 8.34e-07) |          1 |

[I]             Relative Difference | Stats: mean=6.039e-07, std-dev=5.4597e-05, var=2.9809e-09, median=8.7838e-08, min=0 at (0, 0, 0, 3), max=0.037037 at (0, 4, 15, 12), avg-magnitude=6.039e-07

[V]                 ---- Histogram ----

                    Bin Range          |  Num Elems | Visualization

                    (0      , 0.0037 ) |     802806 | ########################################

                    (0.0037 , 0.00741) |          7 |

                    (0.00741, 0.0111 ) |          1 |

                    (0.0111 , 0.0148 ) |          0 |

                    (0.0148 , 0.0185 ) |          0 |

                    (0.0185 , 0.0222 ) |          0 |

                    (0.0222 , 0.0259 ) |          1 |

                    (0.0259 , 0.0296 ) |          0 |

                    (0.0296 , 0.0333 ) |          0 |

                    (0.0333 , 0.037  ) |          1 |

[I]         PASSED | Output: 'input.4' | Difference is within tolerance (rel=0.001, abs=0.001)

[I]     PASSED | All outputs matched | Outputs: ['input.4', 'onnx::MaxPool_323', 'input.8', 'input.16', 'onnx::Conv_327', 'input.24', 'onnx::Conv_330', 'onnx::Add_505', 'onnx::Add_508', 'onnx::Relu_335', 'input.36', 'input.44', 'onnx::Conv_339', 'input.52', 'onnx::Conv_342', 'onnx::Add_517', 'onnx::Relu_345', 'input.60', 'input.68', 'onnx::Conv_349', 'input.76', 'onnx::Conv_352', 'onnx::Add_526', 'onnx::Relu_355', 'input.84', 'input.92', 'onnx::Conv_359', 'input.100', 'onnx::Conv_362', 'onnx::Add_535', 'onnx::Add_538', 'onnx::Relu_367', 'input.112', 'input.120', 'onnx::Conv_371', 'input.128', 'onnx::Conv_374', 'onnx::Add_547', 'onnx::Relu_377', 'input.136', 'input.144', 'onnx::Conv_381', 'input.152', 'onnx::Conv_384', 'onnx::Add_556', 'onnx::Relu_387', 'input.160', 'input.168', 'onnx::Conv_391', 'input.176', 'onnx::Conv_394', 'onnx::Add_565', 'onnx::Relu_397', 'input.184', 'input.192', 'onnx::Conv_401', 'input.200', 'onnx::Conv_404', 'onnx::Add_574', 'onnx::Add_577', 'onnx::Relu_409', 'input.212', 'input.220', 'onnx::Conv_413', 'input.228', 'onnx::Conv_416', 'onnx::Add_586', 'onnx::Relu_419', 'input.236', 'input.244', 'onnx::Conv_423', 'input.252', 'onnx::Conv_426', 'onnx::Add_595', 'onnx::Relu_429', 'input.260', 'input.268', 'onnx::Conv_433', 'input.276', 'onnx::Conv_436', 'onnx::Add_604', 'onnx::Relu_439', 'input.284', 'input.292', 'onnx::Conv_443', 'input.300', 'onnx::Conv_446', 'onnx::Add_613', 'onnx::Relu_449', 'input.308', 'input.316', 'onnx::Conv_453', 'input.324', 'onnx::Conv_456', 'onnx::Add_622', 'onnx::Relu_459', 'input.332', 'input.340', 'onnx::Conv_463', 'input.348', 'onnx::Conv_466', 'onnx::Add_631', 'onnx::Add_634', 'onnx::Relu_471', 'input.360', 'input.368', 'onnx::Conv_475', 'input.376', 'onnx::Conv_478', 'onnx::Add_643', 'onnx::Relu_481', 'input.384', 'input.392', 'onnx::Conv_485', 'input.400', 'onnx::Conv_488', 'onnx::Add_652', 'onnx::Relu_491', 'input.408', 'onnx::Flatten_493', 'onnx::Gemm_494', 'output']

 

 

[I] Accuracy Summary | onnxrt-runner-N0-08/20/23-23:08:36 vs. trt-runner-N0-08/20/23-23:08:36 | Passed: 1/1 iterations | Pass Rate: 100.0%
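
The log shows that 'input.4' passes even though its largest relative difference (0.037037) exceeds rtol, because the absolute difference stays far below atol. As a rough illustration of that reading, the sketch below treats an element as matching when either tolerance is met, as the "Minimum Required Tolerance ... [abs=...] OR [rel=...]" line suggests. This rule is an assumption written in plain Python for clarity; it is not Polygraphy's implementation and may differ from it in detail. The helper elementwise_pass and the sample values are hypothetical.

```python
def elementwise_pass(reference, candidate, atol=1e-3, rtol=1e-3):
    """Return True if every element pair is within atol OR within rtol (assumed rule)."""
    for ref, cand in zip(reference, candidate):
        abs_diff = abs(ref - cand)
        if abs_diff == 0:
            continue
        # Relative difference is taken against the candidate's magnitude (an assumption).
        rel_diff = abs_diff / abs(cand) if cand != 0 else float("inf")
        # An element mismatches only when it violates both tolerances.
        if abs_diff > atol and rel_diff > rtol:
            return False
    return True


# Tiny values: large relative difference, but absolute difference far below atol.
print(elementwise_pass([2.25e-05], [2.17e-05]))
# Large values: absolute difference above atol, but relative difference below rtol.
print(elementwise_pass([1000.0], [1000.5]))
# Both tolerances violated.
print(elementwise_pass([1.0], [1.5]))
```

The first case mirrors what happens in the log: activations close to zero produce large relative differences that carry little meaning, which is why setting both tolerances is useful.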

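For more usage examples, the cookbook on GitHub is recommended reading.

Summary

This section introduced how Nsight Systems and Polygraphy are used. There is much more to explore across the full model deployment pipeline; the tools directory of the TensorRT GitHub repository is a good place to look next.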

 


Posted by HCHUNGW on PIXNET