Basic API Contribution Guide
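asc-devkit: This project is the operator development language that CANN provides for Ascend AI processors. It natively supports the C and C++ standards, consists mainly of a class library and a language extension layer, and offers multi-level APIs to meet operator development needs across a range of scenarios. Project address: https://gitcode.com/cann/asc-devkit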
Overview
Basic API is the instruction-level API layer in the Ascend C programming framework. It directly wraps hardware instructions of Ascend AI processors and uses C++ style function interfaces. Basic API serves as the foundation for building high-level APIs. Developers can implement complex algorithm logic by combining basic APIs.
Core Features of Basic API:
- Instruction-level encapsulation: Each API maps to one or more hardware instructions.
- LocalTensor abstraction: Uses the LocalTensor<T> type to operate on memory.
- Template design: Supports multiple data types (half, float, int16_t, int32_t, and so on).
- Dual interfaces: High-dimensional tiling computation (fine control) and first-n elements computation (simplified invocation).
- Architecture adaptation: Supports different NPU architectures through architecture macro definitions.
Development Process
Requirement Analysis
- Define API functionality (for example, Add, Mul, Relu).
- Determine supported data types.
- Analyze hardware instruction support.
API Design
- Define function prototypes (using LocalTensor).
- Design high-dimensional tiling computation and first-n elements computation interfaces.
- Define parameter specifications (mask, repeat, stride, and so on).
Implementation Development
- Write interface declarations (include/basic_api/).
- Implement core logic (impl/basic_api/).
- Handle architecture differences.
Test and Verification
- Write unit tests.
- Verify functional correctness.
- Check boundary conditions.
Documentation
- Complete API documentation.
- Provide usage examples.
- Explain constraints.
API Introduction
High-dimensional Tiling Computation vs First-n Elements Computation Interface
High-dimensional Tiling Computation (Fine Control)
```cpp
// Requires manual setting of mask and repeat parameters
template <typename T, bool isSetMask = true>
__aicore__ inline void Add(const LocalTensor<T>& dst,
                           const LocalTensor<T>& src0,
                           const LocalTensor<T>& src1,
                           uint64_t mask[],                          // mask array
                           const uint8_t repeatTime,                 // repeat count
                           const BinaryRepeatParams& repeatParams);  // stride parameters
```
Applicable Scenarios:
- Fine-grained control over the computation process.
- Non-contiguous memory access.
- Performance optimization.
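To make the roles of mask, repeatTime, and the stride parameters more concrete, the following host-side sketch emulates a masked, strided add on plain buffers. It is not Ascend C code and does not run on the NPU. SimulateAdd and DemoRepeatParams are hypothetical names introduced for this illustration, and the sketch assumes 32-byte blocks, 8 blocks per repeat, strides counted in blocks, and mask[0] covering the low 64 lanes of a repeat.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side stand-in for BinaryRepeatParams.
// Assumption: all strides are counted in 32-byte blocks.
struct DemoRepeatParams {
    uint8_t dstBlkStride = 1, src0BlkStride = 1, src1BlkStride = 1;
    uint8_t dstRepStride = 8, src0RepStride = 8, src1RepStride = 8;
};

constexpr std::size_t kBlockBytes = 32;      // assumed block size
constexpr std::size_t kBlocksPerRepeat = 8;  // assumed: 8 blocks (256 bytes) per repeat

// Emulates a masked, strided vector add on host memory.
// mask[0] selects lanes 0..63 of each repeat, mask[1] selects lanes 64..127.
template <typename T>
void SimulateAdd(std::vector<T>& dst, const std::vector<T>& src0, const std::vector<T>& src1,
                 const uint64_t mask[2], uint8_t repeatTime, const DemoRepeatParams& p)
{
    constexpr std::size_t elemsPerBlock = kBlockBytes / sizeof(T);
    for (std::size_t r = 0; r < repeatTime; ++r) {
        for (std::size_t b = 0; b < kBlocksPerRepeat; ++b) {
            for (std::size_t e = 0; e < elemsPerBlock; ++e) {
                const std::size_t lane = b * elemsPerBlock + e;  // position inside the repeat
                const bool enabled = ((mask[lane / 64] >> (lane % 64)) & 1ULL) != 0;
                if (!enabled) {
                    continue;  // masked-off lanes leave dst untouched
                }
                const std::size_t d = (r * p.dstRepStride + b * p.dstBlkStride) * elemsPerBlock + e;
                const std::size_t s0 = (r * p.src0RepStride + b * p.src0BlkStride) * elemsPerBlock + e;
                const std::size_t s1 = (r * p.src1RepStride + b * p.src1BlkStride) * elemsPerBlock + e;
                // at() throws on out-of-range access, which makes stride mistakes visible.
                dst.at(d) = src0.at(s0) + src1.at(s1);
            }
        }
    }
}
```

With float data (64 lanes per repeat under these assumptions), a mask of {0xFFFFFFFF, 0} and repeatTime = 2 would update the first 32 elements of each 64-element group and leave the rest unchanged.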
First-n Elements Computation (Simplified Invocation)
```cpp
// Automatically handles mask and repeat
template <typename T>
__aicore__ inline void Add(const LocalTensor<T>& dst,
                           const LocalTensor<T>& src0,
                           const LocalTensor<T>& src1,
                           const int32_t& count);  // only element count needed
```
Applicable Scenarios:
- Contiguous memory block computation.
- Simplified code.
- Rapid development.
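Conceptually, a count-based call covers the same work as some number of fully enabled repeats plus one masked tail repeat. The host-side sketch below shows that arithmetic only; it is not how the library necessarily implements the interface. SplitCount and DemoSplit are hypothetical names, and the sketch assumes 256 bytes per repeat and a two-word mask in which index 0 holds the low 64 lanes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kRepeatBytes = 256;  // assumed bytes processed per repeat

// Hypothetical result type for this illustration.
struct DemoSplit {
    uint32_t fullRepeats;   // repeats with every lane enabled
    uint32_t tailElements;  // leftover elements for one masked repeat
    uint64_t tailMask[2];   // {low 64 lanes, high 64 lanes}
};

// Returns a word with the lowest n bits set (n is clamped to 64).
inline uint64_t LowBits(uint32_t n)
{
    return n >= 64 ? ~0ULL : ((1ULL << n) - 1ULL);
}

// Splits an element count into full repeats and a tail mask for element type T.
template <typename T>
DemoSplit SplitCount(int32_t count)
{
    assert(count >= 0);
    constexpr uint32_t perRepeat = static_cast<uint32_t>(kRepeatBytes / sizeof(T));
    DemoSplit s{};
    s.fullRepeats = static_cast<uint32_t>(count) / perRepeat;
    s.tailElements = static_cast<uint32_t>(count) % perRepeat;
    const uint32_t low = s.tailElements < 64 ? s.tailElements : 64;
    const uint32_t high = s.tailElements - low;
    s.tailMask[0] = LowBits(low);
    s.tailMask[1] = LowBits(high);
    return s;
}
```

For example, under these assumptions SplitCount<float>(257) yields 4 full repeats and a 1-element tail, which is the bookkeeping the first-n elements interface spares the caller.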
Directory Planning
Directory Structure
```
asc-devkit/
├── include/
│   └── basic_api/                                  # Basic API header files
│       ├── kernel_operator_common_intf.h           # Common interface
│       ├── kernel_operator_vec_binary_intf.h       # Vector binary operations
│       ├── kernel_operator_vec_unary_intf.h        # Vector unary operations
│       ├── kernel_operator_data_copy_intf.h        # Data movement
│       ├── kernel_operator_fixpipe_intf.h          # Fixpipe
│       ├── kernel_operator_mm_intf.h               # Matrix multiplication
│       ├── kernel_operator_scalar_intf.h           # Scalar operations
│       ├── kernel_operator_sys_var_intf.h          # System variables
│       ├── kernel_operator_atomic_intf.h           # Atomic operations
│       ├── kernel_tensor.h                         # Tensor definition
│       └── kernel_struct_*.h                       # Parameter structures
│
├── impl/
│   └── basic_api/                                  # Basic API implementation
│       ├── dav_m200/                               # NPU ARCH 200x architecture
│       │   ├── kernel_operator_vec_binary_impl.h
│       │   └── ...
│       ├── dav_c220/                               # NPU ARCH 220x architecture
│       │   ├── kernel_operator_vec_binary_impl.h
│       │   └── ...
│       └── CMakeLists.txt
│
├── tests/
│   └── api/
│       └── basic_api/                              # Basic API tests
│           ├── tikcpp_case_common/
│           │   └── test_operator_axpy.cpp
│           ├── tikcpp_case_ascend910/
│           │   └── ...
│           └── tikcpp_case_ascend910b1/
│               └── ...
│
└── docs/
    └── api/
        └── context/
            └── ...                                 # Basic API documentation
```
File Naming Conventions
| File Type | Naming Convention | Example |
|---|---|---|
| Interface header | kernel_operator_<category>_intf.h | kernel_operator_vec_binary_intf.h |
| Implementation file | kernel_operator_<category>_impl.h | kernel_operator_vec_binary_impl.h |
| Test file | test_operator_<category>.cpp | test_operator_vec_binary.cpp |
| Documentation file | <api>.md | Add.md |
API Categories
| Category | Description | Example APIs |
|---|---|---|
| vec_binary | Vector binary operations | Add, Sub, Mul, Div, Max, Min |
| vec_unary | Vector unary operations | Relu, Exp, Cast, Abs |
| vec_reduce | Vector reduction | Sum, Max, Mean |
| data_copy | Data movement | DataCopy, LoadData |
| fixpipe | Moving matrix results out of L0C with on-the-fly post-processing | Fixpipe |
| mm | Matrix multiplication | Mmad, Conv2D |
| scalar | Scalar operations | ToFloat |
| atomic | Atomic operations | AtomicAdd, AtomicCAS |
Architecture Design
Implementation Layers
Layer 1: Interface Declaration Layer (include/basic_api/)
```cpp
// include/basic_api/kernel_operator_vec_binary_intf.h
#ifndef ASCENDC_MODULE_OPERATOR_VEC_BINARY_INTERFACE_H
#define ASCENDC_MODULE_OPERATOR_VEC_BINARY_INTERFACE_H

#include "kernel_tensor.h"
#include "kernel_struct_binary.h"

namespace AscendC {
// Add - High-dimensional tiling computation
template <typename T, bool isSetMask = true>
__aicore__ inline void Add(const LocalTensor<T>& dst,
                           const LocalTensor<T>& src0,
                           const LocalTensor<T>& src1,
                           uint64_t mask[],
                           const uint8_t repeatTime,
                           const BinaryRepeatParams& repeatParams);

// Add - First-n elements computation
template <typename T>
__aicore__ inline void Add(const LocalTensor<T>& dst,
                           const LocalTensor<T>& src0,
                           const LocalTensor<T>& src1,
                           const int32_t& count);
} // namespace AscendC

#include "impl/basic_api/kernel_operator_vec_binary_intf_impl.h"
#endif
```
Layer 2: Instruction Implementation Layer (impl/basic_api/)
```cpp
// impl/basic_api/dav_c220/kernel_operator_vec_binary_impl.h
#ifndef ASCENDC_MODULE_OPERATOR_VEC_BINARY_IMPL_H
#define ASCENDC_MODULE_OPERATOR_VEC_BINARY_IMPL_H

namespace AscendC {
// Add implementation - First-n elements computation
template <typename T>
__aicore__ inline void AddImpl(__ubuf__ T* dst, __ubuf__ T* src0, __ubuf__ T* src1,
                               const int32_t& count)
{
    if ASCEND_IS_AIV {
        // 1. Set mask
        set_mask_count();
        set_vector_mask(0, count);

        // 2. Call underlying instruction
        vadd(dst, src0, src1, 1,
             DEFAULT_BLK_STRIDE, DEFAULT_BLK_STRIDE, DEFAULT_BLK_STRIDE,
             DEFAULT_REPEAT_STRIDE, DEFAULT_REPEAT_STRIDE, DEFAULT_REPEAT_STRIDE);

        // 3. Restore mask
        set_mask_norm();
        set_vector_mask(static_cast<uint64_t>(-1), static_cast<uint64_t>(-1));
    }
}

// Add implementation - High-dimensional tiling computation
template <typename T, bool isSetMask = true>
__aicore__ inline void AddImpl(__ubuf__ T* dst, __ubuf__ T* src0, __ubuf__ T* src1,
                               const uint64_t mask[],
                               const uint8_t repeatTime,
                               const BinaryRepeatParams& repeatParams)
{
    if ASCEND_IS_AIV {
        // Set mask (if needed)
        if (isSetMask) {
            AscendCUtils::SetMask<T, isSetMask>(mask[1], mask[0]);
        }

        // Call underlying instruction
        vadd(dst, src0, src1, repeatTime,
             repeatParams.dstBlkStride, repeatParams.src0BlkStride, repeatParams.src1BlkStride,
             repeatParams.dstRepStride, repeatParams.src0RepStride, repeatParams.src1RepStride);
    }
}
} // namespace AscendC

#endif
```
Layer 3: Interface Wrapper Layer
```cpp
// impl/basic_api/kernel_operator_vec_binary_intf_impl.h
namespace AscendC {
// First-n elements computation interface wrapper
template <typename T>
__aicore__ inline void Add(const LocalTensor<T>& dst,
                           const LocalTensor<T>& src0,
                           const LocalTensor<T>& src1,
                           const int32_t& count)
{
    AddImpl<T>(dst.GetPtr(), src0.GetPtr(), src1.GetPtr(), count);
}

// High-dimensional tiling computation interface wrapper
template <typename T, bool isSetMask = true>
__aicore__ inline void Add(const LocalTensor<T>& dst,
                           const LocalTensor<T>& src0,
                           const LocalTensor<T>& src1,
                           uint64_t mask[],
                           const uint8_t repeatTime,
                           const BinaryRepeatParams& repeatParams)
{
    AddImpl<T, isSetMask>(dst.GetPtr(), src0.GetPtr(), src1.GetPtr(), mask, repeatTime, repeatParams);
}
} // namespace AscendC
```
Architecture Adaptation
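Hardware instructions may differ across NPU architectures, so an API may need a separate implementation for each architecture directory under impl/basic_api/ (for example, dav_m200/ and dav_c220/).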
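One way to picture this is a compile-time switch that selects an architecture-specific implementation behind a shared wrapper. The sketch below is a host-side illustration in plain C++, not the project's actual build wiring. DEMO_NPU_ARCH, the demo namespace, and ImplName are hypothetical, and the mapping of 200x and 220x values to the dav_m200 and dav_c220 directories is assumed from the directory comments in this guide.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// DEMO_NPU_ARCH is a hypothetical stand-in for the real architecture macro.
#ifndef DEMO_NPU_ARCH
#define DEMO_NPU_ARCH 2201
#endif

namespace demo {
#if DEMO_NPU_ARCH / 10 == 220
// Would live in a dav_c220-style directory (assumed mapping for 220x).
template <typename T>
inline void AddImpl(T* dst, const T* src0, const T* src1, int32_t count)
{
    for (int32_t i = 0; i < count; ++i) {
        dst[i] = src0[i] + src1[i];
    }
}
inline const char* ImplName() { return "dav_c220"; }
#elif DEMO_NPU_ARCH / 10 == 200
// Would live in a dav_m200-style directory (assumed mapping for 200x).
// Same result, different loop shape, standing in for a different instruction sequence.
template <typename T>
inline void AddImpl(T* dst, const T* src0, const T* src1, int32_t count)
{
    for (int32_t i = count - 1; i >= 0; --i) {
        dst[i] = src0[i] + src1[i];
    }
}
inline const char* ImplName() { return "dav_m200"; }
#else
#error "Unsupported architecture in this sketch"
#endif

// Architecture-independent wrapper: callers never see which AddImpl was selected.
template <typename T>
inline void Add(T* dst, const T* src0, const T* src1, int32_t count)
{
    assert(count >= 0);
    AddImpl<T>(dst, src0, src1, count);
}
} // namespace demo
```

The wrapper signature stays identical across architectures, so only the implementation file for a given architecture has to change when the underlying instructions differ.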
Development Example: Implementing Axpy Basic API
API Requirement Analysis
Implement vector multiply-add: dst = src * scalar + dst
- Supported data types: half, float
- Interface type: First-n elements computation (simplified invocation)
- Hardware support: Confirm that the target architecture provides a matching instruction
Review Existing API Structure
Basic API uses LocalTensor<T> as parameters. The first-n elements computation interface only requires the count parameter:
```cpp
// Reference existing Add interface
template <typename T>
__aicore__ inline void Add(const LocalTensor<T>& dst,
                           const LocalTensor<T>& src0,
                           const LocalTensor<T>& src1,
                           const int32_t& count);
```
Interface Design
Add in include/basic_api/kernel_operator_vec_binary_intf.h:
```cpp
/* **************************************************************************************************
 * Axpy                                                                                             *
 * ************************************************************************************************* */
/*
 * @ingroup Axpy
 * @brief dst = dst + src * scalar
 * @param [out] dst output LocalTensor
 * @param [in] src input LocalTensor
 * @param [in] scalar scalar value
 * @param [in] count number of elements involved in the calculation
 */
template <typename T, typename U>
__aicore__ inline void Axpy(const LocalTensor<T>& dst,
                            const LocalTensor<U>& src,
                            const U scalar,
                            const int32_t& count);
```
Implementation Code
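Implement AxpyImpl in the architecture-specific implementation file (impl/basic_api/<arch>/kernel_operator_vec_binary_impl.h), following the structure of existing implementations such as AddImpl.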
Interface Wrapper
Add in impl/basic_api/kernel_operator_vec_binary_intf_impl.h:
```cpp
template <typename T, typename U>
__aicore__ inline void Axpy(const LocalTensor<T>& dst,
                            const LocalTensor<U>& src,
                            const U scalar,
                            const int32_t& count)
{
    AxpyImpl<T, U>(dst.GetPtr(), src.GetPtr(), scalar, count);
}
```
Test Code
Add test code for the new interface under tests/api/basic_api/ (for example, tikcpp_case_common/test_operator_axpy.cpp).
Test and Verification Requirements
Functional Testing
Verify API computation correctness.
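A common way to check correctness is to compare the API output against a simple host-side reference within a tolerance. The sketch below shows one such reference for the Axpy semantics described in this guide (dst = src * scalar + dst). AxpyReference and AllClose are hypothetical helper names, not part of the project's test framework, and because the host has no native half type, double and float stand in here for the wider and narrower types.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Host-side reference: dst[i] = src[i] * scalar + dst[i] for the first count elements.
// T is the destination type, U the source and scalar type (mirrors Axpy<T, U>).
template <typename T, typename U>
void AxpyReference(std::vector<T>& dst, const std::vector<U>& src, U scalar, std::size_t count)
{
    assert(count <= dst.size() && count <= src.size());
    for (std::size_t i = 0; i < count; ++i) {
        dst[i] = static_cast<T>(src[i]) * static_cast<T>(scalar) + dst[i];
    }
}

// Returns true when every element of actual is within absTol + relTol * |expected|.
// A NaN on either side fails the comparison.
template <typename T>
bool AllClose(const std::vector<T>& actual, const std::vector<T>& expected,
              double relTol, double absTol)
{
    if (actual.size() != expected.size()) {
        return false;
    }
    for (std::size_t i = 0; i < actual.size(); ++i) {
        const double a = static_cast<double>(actual[i]);
        const double e = static_cast<double>(expected[i]);
        const double bound = absTol + relTol * std::fabs(e);
        if (!(std::fabs(a - e) <= bound)) {
            return false;
        }
    }
    return true;
}
```

In a real test, the expected vector would come from AxpyReference and the actual vector from the kernel under test; the tolerances should be chosen per data type, with looser bounds for half than for float.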
Boundary Testing
```cpp
TEST_F(TestAxpy, BoundaryTest)
{
    // Test boundary values: count=0, 1, 256, 257
    // Test different data type combinations
    // Test special values (NaN, Inf)
}
```
Data Type Testing
```cpp
INSTANTIATE_TEST_CASE_P(TEST_AXPY_TYPES, AxpyTestsuite,
    ::testing::Values(
        BinaryTestParams { 256, 2, 2, main_axpy<half, half> },
        BinaryTestParams { 256, 4, 2, main_axpy<float, half> },
        BinaryTestParams { 256, 4, 4, main_axpy<float, float> }
    )
);
```
Code Standards
Naming Conventions
```cpp
// Function name: PascalCase, first letter uppercase
void Add(...);
void Relu(...);
void Axpy(...);

// Parameter name: camelCase
LocalTensor<T> dstTensor;
int32_t elementCount;

// Macro definition: UPPERCASE_WITH_UNDERSCORES
#define ASCENDC_ASSERT(cond, msg) ...

// Type name: PascalCase
struct BinaryRepeatParams;
class LocalTensor;
```
Code Style
```cpp
// 1. Indentation: 4 spaces
// 2. Braces: K&R style
// 3. Spaces: Spaces around operators
// 4. Comments: Doxygen style

/**
 * @brief Vector addition operation
 * @param dst Destination LocalTensor
 * @param src0 Source LocalTensor 0
 * @param src1 Source LocalTensor 1
 * @param count Element count
 */
template <typename T>
__aicore__ inline void Add(const LocalTensor<T>& dst,
                           const LocalTensor<T>& src0,
                           const LocalTensor<T>& src1,
                           const int32_t& count)
{
    // Parameter validation
    ASCENDC_ASSERT(count > 0, "count must be positive");

    // Call implementation
    AddImpl<T>(dst.GetPtr(), src0.GetPtr(), src1.GetPtr(), count);
}
```
Error Handling
```cpp
// 1. Parameter validation (Debug mode)
ASCENDC_ASSERT(count > 0, "count must be greater than 0");
ASCENDC_ASSERT(dst != nullptr, "dst cannot be nullptr");
ASCENDC_ASSERT(src != nullptr, "src cannot be nullptr");

// 2. Type checking
static_assert(SupportType<T, half, float, int16_t, int32_t>(), "Unsupported data type");

// 3. Architecture checking
#if !defined(__NPU_ARCH__) || (__NPU_ARCH__ != 2201 && __NPU_ARCH__ != 3510)
#error "Unsupported NPU architecture"
#endif
```
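To illustrate how a compile-time type check of this kind can work, the following host-side sketch builds a small trait in plain C++ and combines it with a runtime count check. IsOneOf and CheckedAdd are hypothetical names for this illustration; the sketch does not reproduce the library's own SupportType or ASCENDC_ASSERT, and half is omitted because the host has no native half type.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Base case: an empty list of allowed types never matches.
template <typename T>
constexpr bool IsOneOf()
{
    return false;
}

// Recursive case: true when T is First or any of the remaining allowed types.
template <typename T, typename First, typename... Rest>
constexpr bool IsOneOf()
{
    return std::is_same<T, First>::value || IsOneOf<T, Rest...>();
}

// Element-wise add with the two kinds of checks shown above:
// a compile-time type check and a runtime parameter check.
template <typename T>
void CheckedAdd(T* dst, const T* src0, const T* src1, int32_t count)
{
    static_assert(IsOneOf<T, float, int16_t, int32_t>(), "Unsupported data type");
    assert(count > 0 && "count must be greater than 0");
    assert(dst != nullptr && src0 != nullptr && src1 != nullptr);
    for (int32_t i = 0; i < count; ++i) {
        dst[i] = src0[i] + src1[i];
    }
}
```

Instantiating CheckedAdd with an unlisted type such as double fails at compile time, while an invalid count is only caught at run time in builds where assert is enabled.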
Authoring statement: Parts of this article were generated with AI assistance (AIGC) and are for reference only.