Basic Matmul
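catlass is the operator template library of CANN. It provides template examples of high-performance matrix multiplication and related fused operators on NPU. Project address: https://gitcode.com/cann/catlass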
Code location
Function description
A basic matrix multiplication: a cube operator with no AIV computation, implemented without TLA.
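Class template overview
- Template parameters
  - class BlockMmad_: the blockMmad class, the matrix multiplication component
  - class BlockEpilogue_: the blockEpilogue class, the post-processing component; not actually used
  - class BlockScheduler_: the blockScheduler class; only Gemm::Block::GemmIdentityBlockSwizzle is supported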
- Params:
    struct Params {
        GemmCoord problemShape;  // shape of the test case
        GM_ADDR ptrA;            // GM start address of input matA
        LayoutA layoutA;         // layout of input matA
        GM_ADDR ptrB;            // GM start address of input matB
        LayoutB layoutB;         // layout of input matB
        GM_ADDR ptrC;            // GM start address of output matC
        LayoutC layoutC;         // layout of output matC
        ...
    };
- Arguments:
    struct Arguments {
        GemmCoord problemShape;  // shape of the test case
        GM_ADDR ptrA;            // GM start address of input matA
        GM_ADDR ptrB;            // GM start address of input matB
        GM_ADDR ptrC;            // GM start address of output matC
    };
Usage example
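Params carries three layouts that Arguments does not, which suggests that the layouts are derived from problemShape before the kernel is launched. The following is a minimal host-side sketch of that relationship, not the catlass implementation. It assumes row-major matrices, and every type in it (GemmCoord, RowMajor, GM_ADDR, Arguments, Params) is a simplified stand-in defined locally. The helper names MakeRowMajor and ToParams are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-ins for illustration only; these are NOT the catlass types.
struct GemmCoord {
    std::uint32_t m, n, k;
};

// Assumed layout type: row-major, where stride is the distance between rows.
struct RowMajor {
    std::uint32_t rows, cols, stride;
};

using GM_ADDR = std::uint8_t*;  // stand-in for a global-memory address

struct Arguments {
    GemmCoord problemShape;
    GM_ADDR ptrA;
    GM_ADDR ptrB;
    GM_ADDR ptrC;
};

struct Params {
    GemmCoord problemShape;
    GM_ADDR ptrA;
    RowMajor layoutA;
    GM_ADDR ptrB;
    RowMajor layoutB;
    GM_ADDR ptrC;
    RowMajor layoutC;
};

// Hypothetical helper: a densely packed row-major layout.
inline RowMajor MakeRowMajor(std::uint32_t rows, std::uint32_t cols) {
    return RowMajor{rows, cols, cols};
}

// Hypothetical helper: derive the three layouts from the problem shape.
// A is M x K, B is K x N, C is M x N.
inline Params ToParams(const Arguments& args) {
    const GemmCoord& s = args.problemShape;
    assert(s.m > 0 && s.n > 0 && s.k > 0);
    return Params{s,
                  args.ptrA, MakeRowMajor(s.m, s.k),
                  args.ptrB, MakeRowMajor(s.k, s.n),
                  args.ptrC, MakeRowMajor(s.m, s.n)};
}
```

Under these assumptions the caller only supplies the shape and three pointers, and a 128 x 256 x 64 problem would yield a 128 x 64 layout for A, 64 x 256 for B, and 128 x 256 for C.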
Kernel assembly
    // block level
    using BlockMmad = Gemm::Block::BlockMmad<DispatchPolicy, L1TileShape, L0TileShape, AType, BType, CType>;
    using BlockEpilogue = void;  // the post-processing component is not used
    using BlockScheduler = typename Gemm::Block::GemmIdentityBlockSwizzle<3, 0>;
    // kernel level
    using MatmulKernel = Gemm::Kernel::BasicMatmul<BlockMmad, BlockEpilogue, BlockScheduler>;
Constraints
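In the void operator()<AscendC::AIC> kernel function, this kernel calls blockMmad in a way that involves neither asynchronous execution nor Preload, so it supports only simple blockMmad components such as block_mmad_pingpong.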
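The constraint above can be pictured as a plain loop in which every blockMmad call completes before the next one starts. The following CPU-only sketch illustrates that call pattern under stated assumptions and is not the catlass kernel. SimpleBlockMmad, RunCore, and BasicMatmulSketch are hypothetical names. Cores are emulated sequentially, and each core is assumed to take block indices in strides of the core count. Block indices map to tiles in plain row-major order rather than through the actual GemmIdentityBlockSwizzle mapping.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

using u32 = std::uint32_t;

inline u32 CeilDiv(u32 x, u32 y) { return (x + y - 1) / y; }

// Hypothetical stand-in for a simple block-level matmul component.
// It computes one tile of C = A * B over the full K dimension and returns
// only when the tile is finished (no asynchronous issue, no preload).
struct SimpleBlockMmad {
    void operator()(const std::vector<float>& a, const std::vector<float>& b,
                    std::vector<float>& c, u32 n, u32 k,
                    u32 m0, u32 n0, u32 tileM, u32 tileN) const {
        for (u32 i = m0; i < m0 + tileM; ++i) {
            for (u32 j = n0; j < n0 + tileN; ++j) {
                float acc = 0.0f;
                for (u32 kk = 0; kk < k; ++kk) {
                    acc += a[i * k + kk] * b[kk * n + j];  // row-major A and B
                }
                c[i * n + j] = acc;
            }
        }
    }
};

// Emulates the work of one core: visit blocks coreIdx, coreIdx + coreNum, ...
inline void RunCore(u32 coreIdx, u32 coreNum,
                    const std::vector<float>& a, const std::vector<float>& b,
                    std::vector<float>& c, u32 m, u32 n, u32 k,
                    u32 tileM, u32 tileN) {
    const u32 blocksN = CeilDiv(n, tileN);
    const u32 totalBlocks = CeilDiv(m, tileM) * blocksN;
    SimpleBlockMmad blockMmad;
    for (u32 idx = coreIdx; idx < totalBlocks; idx += coreNum) {
        // Row-major block mapping: a simplification, not the real swizzle.
        const u32 m0 = (idx / blocksN) * tileM;
        const u32 n0 = (idx % blocksN) * tileN;
        // Clamp the tile at the matrix boundary.
        const u32 actualM = std::min(tileM, m - m0);
        const u32 actualN = std::min(tileN, n - n0);
        blockMmad(a, b, c, n, k, m0, n0, actualM, actualN);  // synchronous call
    }
}

// Runs all emulated cores one after another and returns C (M x N, row-major).
inline std::vector<float> BasicMatmulSketch(const std::vector<float>& a,
                                            const std::vector<float>& b,
                                            u32 m, u32 n, u32 k,
                                            u32 tileM, u32 tileN, u32 coreNum) {
    assert(tileM > 0 && tileN > 0 && coreNum > 0);
    assert(a.size() == static_cast<std::size_t>(m) * k);
    assert(b.size() == static_cast<std::size_t>(k) * n);
    std::vector<float> c(static_cast<std::size_t>(m) * n, 0.0f);
    for (u32 coreIdx = 0; coreIdx < coreNum; ++coreIdx) {
        RunCore(coreIdx, coreNum, a, b, c, m, n, k, tileM, tileN);
    }
    return c;
}
```

Because each call in this pattern returns a finished tile, the loop has no place to overlap the loading of the next tile with the current computation. That is one way to read why a component that expects asynchronous calls or Preload would not fit here.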
Author's declaration: parts of this article were generated with AI assistance (AIGC) and are for reference only.