news 2026/5/22 17:11:32

CANN asc-devkit Basic API Guide

张小明

Front-end Development Engineer

Basic API Contribution Guide

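[Free download link] asc-devkit: This project is the operator programming language that CANN provides for Ascend AI processors. It natively supports the C and C++ standards, consists mainly of a class library and a language extension layer, and offers multi-level APIs to meet operator development needs across varied scenarios. Project address: https://gitcode.com/cann/asc-devkit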

Overview

Basic API is the instruction-level API layer in the Ascend C programming framework. It directly wraps hardware instructions of Ascend AI processors and uses C++ style function interfaces. Basic API serves as the foundation for building high-level APIs. Developers can implement complex algorithm logic by combining basic APIs.

Core Features of Basic API:

  • Instruction-level encapsulation: Each API maps to one or more hardware instructions.
  • LocalTensor abstraction: Uses the LocalTensor<T> type to operate on memory.
  • Template design: Supports multiple data types (half, float, int16_t, int32_t, and so on).
  • Dual interfaces: High-dimensional tiling computation (fine control) and first-n elements computation (simplified invocation).
  • Architecture adaptation: Supports different NPU architectures through architecture macro definitions.

Development Process

Requirement Analysis

  • Define API functionality (for example, Add, Mul, Relu).
  • Determine supported data types.
  • Analyze hardware instruction support.

API Design

  • Define function prototypes (using LocalTensor).
  • Design high-dimensional tiling computation and first-n elements computation interfaces.
  • Define parameter specifications (mask, repeat, stride, and so on).

Implementation Development

  • Write interface declarations (include/basic_api/).
  • Implement core logic (impl/basic_api/).
  • Handle architecture differences.

Test and Verification

  • Write unit tests.
  • Verify functional correctness.
  • Check boundary conditions.

Documentation

  • Complete API documentation.
  • Provide usage examples.
  • Explain constraints.

API Introduction

High-dimensional Tiling Computation vs First-n Elements Computation Interface

High-dimensional Tiling Computation (Fine Control)
// Requires manual setting of mask and repeat parameters
template <typename T, bool isSetMask = true>
__aicore__ inline void Add(const LocalTensor<T>& dst,
                           const LocalTensor<T>& src0,
                           const LocalTensor<T>& src1,
                           uint64_t mask[],                          // mask array
                           const uint8_t repeatTime,                 // repeat count
                           const BinaryRepeatParams& repeatParams);  // stride parameters

Applicable Scenarios:

  • Require fine control over computation process.
  • Non-contiguous memory access.
  • Performance optimization.
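
To make the roles of mask, repeatTime, and the stride fields more concrete, the following host-side C++ sketch models one possible reading of the high-dimensional interface for float data. It is not asc-devkit code: AddModel, RepeatParamsModel, Offset, and the constants are hypothetical names. The model assumes that each repeat covers 8 blocks of 32 bytes, that strides are counted in 32-byte blocks, and that bit i of the mask enables lane i of a repeat. These assumptions should be checked against the official API documentation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side model; none of these names are asc-devkit APIs.
struct RepeatParamsModel {
    uint8_t dstBlkStride;
    uint8_t src0BlkStride;
    uint8_t src1BlkStride;
    uint8_t dstRepStride;
    uint8_t src0RepStride;
    uint8_t src1RepStride;
};

constexpr std::size_t kBlockBytes = 32;      // assumed block size in bytes
constexpr std::size_t kBlocksPerRepeat = 8;  // assumed blocks per repeat
constexpr std::size_t kFloatsPerBlock = kBlockBytes / sizeof(float);

// Element offset of lane e in block blk of repeat rep (strides in blocks).
inline std::size_t Offset(std::size_t rep, std::size_t blk, std::size_t e,
                          uint8_t repStride, uint8_t blkStride)
{
    return (rep * repStride + blk * blkStride) * kFloatsPerBlock + e;
}

inline void AddModel(std::vector<float>& dst, const std::vector<float>& src0,
                     const std::vector<float>& src1, uint64_t mask,
                     uint8_t repeatTime, const RepeatParamsModel& p)
{
    for (std::size_t rep = 0; rep < repeatTime; ++rep) {
        for (std::size_t blk = 0; blk < kBlocksPerRepeat; ++blk) {
            for (std::size_t e = 0; e < kFloatsPerBlock; ++e) {
                const std::size_t lane = blk * kFloatsPerBlock + e;
                if (((mask >> lane) & 1ULL) == 0) {
                    continue;  // lane disabled by the mask
                }
                // at() throws instead of accessing memory out of bounds.
                dst.at(Offset(rep, blk, e, p.dstRepStride, p.dstBlkStride)) =
                    src0.at(Offset(rep, blk, e, p.src0RepStride, p.src0BlkStride)) +
                    src1.at(Offset(rep, blk, e, p.src1RepStride, p.src1BlkStride));
            }
        }
    }
}
```

In this model, block strides of 1 and repeat strides of 8 make two repeats cover 128 contiguous floats, while narrowing the mask to 0xFF touches only the first block of each repeat.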
First-n Elements Computation (Simplified Invocation)
// Automatically handles mask and repeat
template <typename T>
__aicore__ inline void Add(const LocalTensor<T>& dst,
                           const LocalTensor<T>& src0,
                           const LocalTensor<T>& src1,
                           const int32_t& count);  // only element count needed

Applicable Scenarios:

  • Contiguous memory block computation.
  • Simplified code.
  • Rapid development.
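
As an illustration of what "automatically handles mask and repeat" spares the caller, the host-side sketch below splits an element count into full repeats plus a tail mask. It is only a sketch under stated assumptions: PlanRepeats, RepeatPlan, and LowBits are hypothetical helpers, and a repeat is assumed to cover 256 bytes. The AddImpl listing in this guide does not perform this arithmetic itself; it switches the mask to counter mode instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical description of how a count maps to repeats; not an asc-devkit type.
struct RepeatPlan {
    uint32_t fullRepeats;   // repeats in which every lane is active
    uint32_t tailElems;     // elements left over for one partial repeat
    uint64_t tailMaskHigh;  // mask bits for lanes 64..127 of the tail
    uint64_t tailMaskLow;   // mask bits for lanes 0..63 of the tail
};

// Returns a value with the lowest n bits set (n is clamped to 64).
inline uint64_t LowBits(uint32_t n)
{
    return n >= 64 ? ~0ULL : ((1ULL << n) - 1ULL);
}

// Assumes 256 bytes per repeat, i.e. 128 lanes for 2-byte and 64 lanes for 4-byte types.
inline RepeatPlan PlanRepeats(int32_t count, std::size_t elemBytes)
{
    assert(count >= 0);
    assert(elemBytes == 2 || elemBytes == 4);
    const uint32_t perRepeat = static_cast<uint32_t>(256 / elemBytes);
    RepeatPlan plan{};
    plan.fullRepeats = static_cast<uint32_t>(count) / perRepeat;
    plan.tailElems = static_cast<uint32_t>(count) % perRepeat;
    plan.tailMaskLow = LowBits(plan.tailElems < 64 ? plan.tailElems : 64);
    plan.tailMaskHigh = plan.tailElems > 64 ? LowBits(plan.tailElems - 64) : 0;
    return plan;
}
```

For example, under these assumptions 300 two-byte elements would need two full repeats and a 44-lane tail, which is the bookkeeping a caller of the high-dimensional interface would otherwise do by hand.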

Directory Planning

Directory Structure

asc-devkit/
├── include/
│   └── basic_api/                              # Basic API header files
│       ├── kernel_operator_common_intf.h       # Common interface
│       ├── kernel_operator_vec_binary_intf.h   # Vector binary operations
│       ├── kernel_operator_vec_unary_intf.h    # Vector unary operations
│       ├── kernel_operator_data_copy_intf.h    # Data movement
│       ├── kernel_operator_fixpipe_intf.h      # Fixpipe
│       ├── kernel_operator_mm_intf.h           # Matrix multiplication
│       ├── kernel_operator_scalar_intf.h       # Scalar operations
│       ├── kernel_operator_sys_var_intf.h      # System variables
│       ├── kernel_operator_atomic_intf.h       # Atomic operations
│       ├── kernel_tensor.h                     # Tensor definition
│       └── kernel_struct_*.h                   # Parameter structures
│
├── impl/
│   └── basic_api/                              # Basic API implementation
│       ├── dav_m200/                           # NPU ARCH 200x architecture
│       │   ├── kernel_operator_vec_binary_impl.h
│       │   └── ...
│       ├── dav_c220/                           # NPU ARCH 220x architecture
│       │   ├── kernel_operator_vec_binary_impl.h
│       │   └── ...
│       └── CMakeLists.txt
│
├── tests/
│   └── api/
│       └── basic_api/                          # Basic API tests
│           ├── tikcpp_case_common/
│           │   └── test_operator_axpy.cpp
│           ├── tikcpp_case_ascend910/
│           │   └── ...
│           └── tikcpp_case_ascend910b1/
│               └── ...
│
└── docs/
    └── api/
        └── context/
            └── ...                             # Basic API documentation

File Naming Conventions

File Type | Naming Convention | Example
Interface header | kernel_operator_<category>_intf.h | kernel_operator_vec_binary_intf.h
Implementation file | kernel_operator_<category>_impl.h | kernel_operator_vec_binary_impl.h
Test file | test_operator_<category>.cpp | test_operator_vec_binary.cpp
Documentation file | <api>.md | Add.md

API Categories

Category | Description | Example APIs
vec_binary | Vector binary operations | Add, Sub, Mul, Div, Max, Min
vec_unary | Vector unary operations | Relu, Exp, Cast, Abs
vec_reduce | Vector reduction | Sum, Max, Mean
data_copy | Data movement | DataCopy, LoadData
fixpipe | Pipeline control | Fixpipe
mm | Matrix multiplication | Mmad, Conv2D
scalar | Scalar operations | ToFloat
atomic | Atomic operations | AtomicAdd, AtomicCAS

Architecture Design

Implementation Layers

Layer 1: Interface Declaration Layer (include/basic_api/)
// include/basic_api/kernel_operator_vec_binary_intf.h
#ifndef ASCENDC_MODULE_OPERATOR_VEC_BINARY_INTERFACE_H
#define ASCENDC_MODULE_OPERATOR_VEC_BINARY_INTERFACE_H

#include "kernel_tensor.h"
#include "kernel_struct_binary.h"

namespace AscendC {

// Add - High-dimensional tiling computation
template <typename T, bool isSetMask = true>
__aicore__ inline void Add(const LocalTensor<T>& dst,
                           const LocalTensor<T>& src0,
                           const LocalTensor<T>& src1,
                           uint64_t mask[],
                           const uint8_t repeatTime,
                           const BinaryRepeatParams& repeatParams);

// Add - First-n elements computation
template <typename T>
__aicore__ inline void Add(const LocalTensor<T>& dst,
                           const LocalTensor<T>& src0,
                           const LocalTensor<T>& src1,
                           const int32_t& count);

} // namespace AscendC

#include "impl/basic_api/kernel_operator_vec_binary_intf_impl.h"

#endif
Layer 2: Instruction Implementation Layer (impl/basic_api/)
// impl/basic_api/dav_c220/kernel_operator_vec_binary_impl.h
#ifndef ASCENDC_MODULE_OPERATOR_VEC_BINARY_IMPL_H
#define ASCENDC_MODULE_OPERATOR_VEC_BINARY_IMPL_H

namespace AscendC {

// Add implementation - First-n elements computation
template <typename T>
__aicore__ inline void AddImpl(__ubuf__ T* dst, __ubuf__ T* src0, __ubuf__ T* src1, const int32_t& count)
{
    if ASCEND_IS_AIV {
        // 1. Set mask
        set_mask_count();
        set_vector_mask(0, count);

        // 2. Call underlying instruction
        vadd(dst, src0, src1, 1,
             DEFAULT_BLK_STRIDE, DEFAULT_BLK_STRIDE, DEFAULT_BLK_STRIDE,
             DEFAULT_REPEAT_STRIDE, DEFAULT_REPEAT_STRIDE, DEFAULT_REPEAT_STRIDE);

        // 3. Restore mask
        set_mask_norm();
        set_vector_mask(static_cast<uint64_t>(-1), static_cast<uint64_t>(-1));
    }
}

// Add implementation - High-dimensional tiling computation
template <typename T, bool isSetMask = true>
__aicore__ inline void AddImpl(__ubuf__ T* dst, __ubuf__ T* src0, __ubuf__ T* src1,
                               const uint64_t mask[], const uint8_t repeatTime,
                               const BinaryRepeatParams& repeatParams)
{
    if ASCEND_IS_AIV {
        // Set mask (if needed)
        if (isSetMask) {
            AscendCUtils::SetMask<T, isSetMask>(mask[1], mask[0]);
        }

        // Call underlying instruction
        vadd(dst, src0, src1, repeatTime,
             repeatParams.dstBlkStride, repeatParams.src0BlkStride, repeatParams.src1BlkStride,
             repeatParams.dstRepStride, repeatParams.src0RepStride, repeatParams.src1RepStride);
    }
}

} // namespace AscendC

#endif
Layer 3: Interface Wrapper Layer
// impl/basic_api/kernel_operator_vec_binary_intf_impl.h
namespace AscendC {

// First-n elements computation interface wrapper
template <typename T>
__aicore__ inline void Add(const LocalTensor<T>& dst,
                           const LocalTensor<T>& src0,
                           const LocalTensor<T>& src1,
                           const int32_t& count)
{
    AddImpl<T>(dst.GetPtr(), src0.GetPtr(), src1.GetPtr(), count);
}

// High-dimensional tiling computation interface wrapper
template <typename T, bool isSetMask = true>
__aicore__ inline void Add(const LocalTensor<T>& dst,
                           const LocalTensor<T>& src0,
                           const LocalTensor<T>& src1,
                           uint64_t mask[],
                           const uint8_t repeatTime,
                           const BinaryRepeatParams& repeatParams)
{
    AddImpl<T, isSetMask>(dst.GetPtr(), src0.GetPtr(), src1.GetPtr(), mask, repeatTime, repeatParams);
}

} // namespace AscendC
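
The three layers can be mimicked on a host machine to see how the pieces fit together. The sketch below is a rough analogy only: demo::TensorView is a hypothetical stand-in for LocalTensor<T>, a scalar loop replaces the vector instruction, and the __aicore__ and __ubuf__ qualifiers are omitted because they are specific to the device toolchain.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

namespace demo {

// Hypothetical stand-in for LocalTensor<T>: a non-owning view over host memory.
template <typename T>
class TensorView {
public:
    TensorView(T* data, std::size_t size) : data_(data), size_(size) {}
    T* GetPtr() const { return data_; }
    std::size_t GetSize() const { return size_; }

private:
    T* data_;
    std::size_t size_;
};

// Layer 1: interface declaration (what the *_intf.h header would expose).
template <typename T>
void Add(const TensorView<T>& dst, const TensorView<T>& src0,
         const TensorView<T>& src1, int32_t count);

// Layer 2: implementation on raw pointers (a scalar loop instead of a vector instruction).
template <typename T>
void AddImpl(T* dst, const T* src0, const T* src1, int32_t count)
{
    for (int32_t i = 0; i < count; ++i) {
        dst[i] = src0[i] + src1[i];
    }
}

// Layer 3: wrapper that unpacks the tensor views and forwards to the implementation.
template <typename T>
void Add(const TensorView<T>& dst, const TensorView<T>& src0,
         const TensorView<T>& src1, int32_t count)
{
    assert(count >= 0);
    assert(static_cast<std::size_t>(count) <= dst.GetSize());
    assert(static_cast<std::size_t>(count) <= src0.GetSize());
    assert(static_cast<std::size_t>(count) <= src1.GetSize());
    AddImpl<T>(dst.GetPtr(), src0.GetPtr(), src1.GetPtr(), count);
}

} // namespace demo
```

Keeping the wrapper thin means that only the raw-pointer layer needs to change when the underlying instruction differs.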

Architecture Adaptation

Hardware instructions may differ across NPU architectures, so an API may need a separate implementation in each architecture directory (for example, dav_m200/ and dav_c220/).
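
One common way to select an architecture-specific implementation is conditional compilation on an architecture macro. The sketch below illustrates the idea on the host. DEMO_NPU_ARCH is a hypothetical stand-in for the compiler-provided macro, and the pairing of the values 2002 and 2201 with the dav_m200 and dav_c220 directories is an assumption made for illustration, not something this guide states.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for the architecture macro set by the device compiler.
#ifndef DEMO_NPU_ARCH
#define DEMO_NPU_ARCH 2201
#endif

#if DEMO_NPU_ARCH == 2002
// In a real tree this branch would include the dav_m200 implementation header.
namespace arch {
inline const char* ImplDir() { return "dav_m200"; }
} // namespace arch
#elif DEMO_NPU_ARCH == 2201
// In a real tree this branch would include the dav_c220 implementation header.
namespace arch {
inline const char* ImplDir() { return "dav_c220"; }
} // namespace arch
#else
#error "Unsupported DEMO_NPU_ARCH value"
#endif
```

Each branch exposes the same function name, so code above this layer does not need to know which architecture was selected.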


Development Example: Implementing Axpy Basic API

API Requirement Analysis

Implement vector multiply-add: dst = src * scalar + dst

  • Supported data types: half, float
  • Interface type: First-n elements computation (simplified invocation)
  • Hardware support: Confirm that the target NPU architecture provides the required instruction.
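
A host-side reference for this formula is useful later as the expected result in tests. The following is a minimal sketch, not the asc-devkit implementation: AxpyReference is a hypothetical name, std::vector stands in for LocalTensor, and float or double stands in for half, which standard C++ does not provide.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side reference: dst[i] = dst[i] + src[i] * scalar for the first count elements.
template <typename T, typename U>
void AxpyReference(std::vector<T>& dst, const std::vector<U>& src, U scalar, int32_t count)
{
    assert(count >= 0);
    assert(static_cast<std::size_t>(count) <= dst.size());
    assert(static_cast<std::size_t>(count) <= src.size());
    for (int32_t i = 0; i < count; ++i) {
        // Convert to the destination type before multiplying and accumulating.
        dst[i] = static_cast<T>(dst[i] + static_cast<T>(src[i]) * static_cast<T>(scalar));
    }
}
```

Elements at index count and beyond are left untouched, which matches the first-n elements semantics described above.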

Review Existing API Structure

Basic APIs take LocalTensor<T> parameters. The first-n elements computation interface requires only the count parameter in addition to the tensors:

// Reference existing Add interface
template <typename T>
__aicore__ inline void Add(const LocalTensor<T>& dst,
                           const LocalTensor<T>& src0,
                           const LocalTensor<T>& src1,
                           const int32_t& count);

Interface Design

Add the following to include/basic_api/kernel_operator_vec_binary_intf.h:

/* **************************************************************************************************
 * Axpy                                                                                             *
 * ************************************************************************************************* */
/*
 * @ingroup Axpy
 * @brief dst = dst + src * scalar
 * @param [in,out] dst input/output LocalTensor (accumulated in place)
 * @param [in] src input LocalTensor
 * @param [in] scalar scalar value
 * @param [in] count number of elements involved in the calculation
 */
template <typename T, typename U>
__aicore__ inline void Axpy(const LocalTensor<T>& dst,
                            const LocalTensor<U>& src,
                            const U scalar,
                            const int32_t& count);

Implementation Code

Implement AxpyImpl in the architecture-specific implementation file, using existing implementations such as AddImpl as a reference.

Interface Wrapper

Add the following to impl/basic_api/kernel_operator_vec_binary_intf_impl.h:

template <typename T, typename U>
__aicore__ inline void Axpy(const LocalTensor<T>& dst,
                            const LocalTensor<U>& src,
                            const U scalar,
                            const int32_t& count)
{
    AxpyImpl<T, U>(dst.GetPtr(), src.GetPtr(), scalar, count);
}

Test Code

Add test code for the new interface, for example in tests/api/basic_api/tikcpp_case_common/test_operator_axpy.cpp.


Test and Verification Requirements

Functional Testing

Verify API computation correctness.
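
Correctness is typically checked by comparing the API output with a reference result within a tolerance, because reduced-precision types rarely match bit for bit. The helper below is a sketch of such a comparison. AllClose is a hypothetical name, and suitable rtol and atol values depend on the data type and the operation; the project's own test utilities may use a different criterion.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: true when every element satisfies
// |actual - expected| <= atol + rtol * |expected|.
inline bool AllClose(const std::vector<float>& actual, const std::vector<float>& expected,
                     float rtol, float atol)
{
    if (actual.size() != expected.size()) {
        return false;
    }
    for (std::size_t i = 0; i < actual.size(); ++i) {
        const float diff = std::fabs(actual[i] - expected[i]);
        // Written as a negated <= so that a NaN difference counts as a mismatch.
        if (!(diff <= atol + rtol * std::fabs(expected[i]))) {
            return false;
        }
    }
    return true;
}
```

A functional test would fill the inputs, run the API, compute the expected output on the host, and assert that AllClose returns true.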

Boundary Testing

TEST_F(TestAxpy, BoundaryTest) {
    // Test boundary values: count=0, 1, 256, 257
    // Test different data type combinations
    // Test special values (NaN, Inf)
}
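
One way to fill in such a boundary test is to pad the destination with guard elements and confirm that nothing beyond count is modified. The sketch below shows the idea on the host. AxpyStandIn is a hypothetical scalar stand-in for the operator under test, and CheckCount and SameValue are illustrative helpers rather than part of the project's test framework.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical host stand-in for the operator under test.
inline void AxpyStandIn(float* dst, const float* src, float scalar, int32_t count)
{
    for (int32_t i = 0; i < count; ++i) {
        dst[i] += src[i] * scalar;
    }
}

// True when the first count elements are updated and the guard elements keep the sentinel.
inline bool CheckCount(int32_t count)
{
    const std::size_t n = static_cast<std::size_t>(count);
    const std::size_t guard = 8;
    const float sentinel = -7.0f;
    std::vector<float> src(n + guard, 2.0f);
    std::vector<float> dst(n + guard, sentinel);
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = static_cast<float>(i);
    }
    AxpyStandIn(dst.data(), src.data(), 0.5f, count);
    for (std::size_t i = 0; i < n; ++i) {
        if (dst[i] != static_cast<float>(i) + 1.0f) {
            return false;  // element inside the range was not updated correctly
        }
    }
    for (std::size_t i = n; i < n + guard; ++i) {
        if (dst[i] != sentinel) {
            return false;  // element beyond count was overwritten
        }
    }
    return true;
}

// NaN-aware equality for special-value checks (NaN compares equal to NaN here).
inline bool SameValue(float a, float b)
{
    if (std::isnan(a) || std::isnan(b)) {
        return std::isnan(a) && std::isnan(b);
    }
    return a == b;
}
```

The counts 256 and 257 are worth keeping because they sit on either side of a likely repeat boundary, where tail handling is most error-prone.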

Data Type Testing

INSTANTIATE_TEST_CASE_P(TEST_AXPY_TYPES, AxpyTestsuite,
    ::testing::Values(
        BinaryTestParams { 256, 2, 2, main_axpy<half, half> },
        BinaryTestParams { 256, 4, 2, main_axpy<float, half> },
        BinaryTestParams { 256, 4, 4, main_axpy<float, float> }
    )
);

Code Standards

Naming Conventions

// Function name: PascalCase, first letter uppercase
void Add(...);
void Relu(...);
void Axpy(...);

// Parameter name: camelCase
LocalTensor<T> dstTensor;
int32_t elementCount;

// Macro definition: UPPERCASE_WITH_UNDERSCORES
#define ASCENDC_ASSERT(cond, msg) ...

// Type name: PascalCase
struct BinaryRepeatParams;
class LocalTensor;

Code Style

// 1. Indentation: 4 spaces
// 2. Braces: K&R style
// 3. Spaces: Spaces around operators
// 4. Comments: Doxygen style

/**
 * @brief Vector addition operation
 * @param dst Destination LocalTensor
 * @param src0 Source LocalTensor 0
 * @param src1 Source LocalTensor 1
 * @param count Element count
 */
template <typename T>
__aicore__ inline void Add(const LocalTensor<T>& dst,
                           const LocalTensor<T>& src0,
                           const LocalTensor<T>& src1,
                           const int32_t& count)
{
    // Parameter validation
    ASCENDC_ASSERT(count > 0, "count must be positive");

    // Call implementation
    AddImpl<T>(dst.GetPtr(), src0.GetPtr(), src1.GetPtr(), count);
}

Error Handling

// 1. Parameter validation (Debug mode)
ASCENDC_ASSERT(count > 0, "count must be greater than 0");
ASCENDC_ASSERT(dst != nullptr, "dst cannot be nullptr");
ASCENDC_ASSERT(src != nullptr, "src cannot be nullptr");

// 2. Type checking
static_assert(SupportType<T, half, float, int16_t, int32_t>(), "Unsupported data type");

// 3. Architecture checking
#if !defined(__NPU_ARCH__) || (__NPU_ARCH__ != 2201 && __NPU_ARCH__ != 3510)
#error "Unsupported NPU architecture"
#endif
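
The compile-time type check can be reproduced on the host with a small trait. The sketch below shows one way such a check might work. IsOneOf and CheckedAdd are hypothetical names, and this is not claimed to be how SupportType is implemented in asc-devkit; half is left out of the type list because standard C++ has no such type.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical trait: true when T is the same type as one of Ts.
template <typename T, typename... Ts>
struct IsOneOf : std::false_type {};

template <typename T, typename First, typename... Rest>
struct IsOneOf<T, First, Rest...>
    : std::conditional<std::is_same<T, First>::value,
                       std::true_type,
                       IsOneOf<T, Rest...>>::type {};

// Example of guarding a template so that unsupported types fail at compile time.
template <typename T>
T CheckedAdd(T a, T b)
{
    static_assert(IsOneOf<T, float, int16_t, int32_t>::value, "Unsupported data type");
    return static_cast<T>(a + b);
}
```

Instantiating CheckedAdd<double> would stop compilation with the "Unsupported data type" message, so the mistake surfaces before any kernel runs.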

Creation statement: Parts of this article were generated with AI assistance (AIGC) and are for reference only.
