On Handling Precalculated Hierarchical Data in Power BI

Original article: towardsdatascience.com/on-handling-precalculated-hierarchical-data-in-power-bi-4a215b96b99c?source=collection_archive---------12-----------------------#2024-05-03

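While hierarchies are a well-known concept in data, some sources deliver their data in an unusual format. Usually, we get the values at the lowest level. But what happens when we get pre-aggregated values? Here, I will dive into this topic.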

Salvatore Cagliari

Published in Towards Data Science · 8 min read · May 3, 2024


https://github.com/OpenDocCN/towardsdatascience-blog-zh-2024/raw/master/docs/img/45a1401ef7a0f5ec3fdf498000e02d6b.png

Photo by ThisisEngineering on Unsplash

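Introduction and data

Let's set the scene: we have an organization with administrative expenses.

Expenses can occur at the country, state, and store levels.

Look at the following table: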

https://github.com/OpenDocCN/towardsdatascience-blog-zh-2024/raw/master/docs/img/dce67aeea8c01e4ae3050904b2f33e34.png

Figure 1: Data with the values where they are expected (Figure by the Author)

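We see two rows with the expenses of two stores and one row with the organizational expenses of South Carolina.

I can sum these rows to get the total expenses for South Carolina, covering both stores and the state's own expenses.

But what if the source system delivers the data in a different form?

For example, like this: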

https://github.com/OpenDocCN/towardsdatascience-blog-zh-2024/raw/master/docs/img/ddddf77439bedfee5575a5da2680a495.png

Figure 2: Data with the pre-aggregated value for South Carolina (Figure by the Author)

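The third row contains the pre-aggregated sum of the two stores in South Carolina plus the organizational expenses of South Carolina itself.

Simply adding up the three rows gives a wrong result, because the expenses of the two stores are counted twice: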

https://github.com/OpenDocCN/towardsdatascience-blog-zh-2024/raw/master/docs/img/134946f9f906a1cabb2f97d570761f53.png

Figure 3: Wrong result when aggregating data that contains pre-aggregated values (Figure by the Author)
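To make the double counting concrete, here is a minimal sketch of the arithmetic in Python, outside of Power BI. The store names and amounts are hypothetical, since the article's real figures only appear in the screenshots.

```python
# Hypothetical amounts; the article's real figures are only in the screenshots.
store_expenses = {"Store A": 100, "Store B": 150}
state_own_expenses = 50  # organizational expenses of the state itself

# Layout of Figure 1: each row holds only its own value, so a plain sum is right.
correct_total = sum(store_expenses.values()) + state_own_expenses

# Layout of Figure 2: the state row already includes both stores.
preaggregated_state_row = sum(store_expenses.values()) + state_own_expenses
naive_total = sum(store_expenses.values()) + preaggregated_state_row

print(correct_total)                # 300
print(naive_total)                  # 550
print(naive_total - correct_total)  # 250, the store expenses counted twice
```

The surplus of the naive sum equals the sum of the store rows, which is exactly the part that the pre-aggregated row already contains.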

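The challenge is: how do I calculate the correct result at each level?

My solution has to respect the following points:

I cannot change the data in the source.
I must add calculations to the data model to correct the result.
I must perform a different calculation at each level of the hierarchy.

But where and how do I do it?

I have three options to solve this:

Add a calculated column to get the correct result.
Add a measure to calculate the correct result.
Use visual calculations.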

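Calculated column

OK, let's start by adding some calculated columns.

First, I need to know the level of each row in the hierarchy. For this, I need a column called "Path Length". Such a column is commonly used when working with parent-child hierarchies.

Therefore, I add two new columns to make navigating the hierarchy easier: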

https://github.com/OpenDocCN/towardsdatascience-blog-zh-2024/raw/master/docs/img/b632fd803b5bd49043bcd3f945b5a8c1.png

Figure 4: Additional calculated columns for navigating the hierarchy (Figure by the Author)

I used the following expression to calculate the HierarchyPath column:

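HierarchyPath =
    'Cost Data'[Country]
        -- Append the State only when the row is not a country-level row
        & IF('Cost Data'[State] <> 'Cost Data'[Country], "|" & 'Cost Data'[State])
        -- Append the Store only when the row is not a state-level row
        & IF('Cost Data'[Store] <> 'Cost Data'[State], "|" & 'Cost Data'[Store])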

Then, I used the PATHLENGTH() function to calculate the "Path Length" column:

Path Length = PATHLENGTH('Cost Data'[HierarchyPath])
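The two columns can be mimicked in a few lines of Python to see what they produce. This is only a sketch under one assumption: aggregated rows repeat the parent's name in the lower-level columns (for example, State and Store both hold "South Carolina" on the state row), which is what the comparisons in the expression above suggest. The sample names are hypothetical.

```python
def hierarchy_path(country: str, state: str, store: str) -> str:
    """Join the levels with '|', skipping a level that repeats its parent's name."""
    parts = [country]
    if state != country:
        parts.append(state)
    if store != state:
        parts.append(store)
    return "|".join(parts)


def path_length(path: str) -> int:
    """Count the '|'-separated items, as PATHLENGTH() does for a path string."""
    return len(path.split("|")) if path else 0


# Hypothetical rows: one store, the state itself, and the country itself.
store_row = hierarchy_path("USA", "South Carolina", "Store A")
state_row = hierarchy_path("USA", "South Carolina", "South Carolina")
country_row = hierarchy_path("USA", "USA", "USA")

print(store_row, path_length(store_row))      # USA|South Carolina|Store A 3
print(state_row, path_length(state_row))      # USA|South Carolina 2
print(country_row, path_length(country_row))  # USA 1
```

The path length therefore doubles as the level number: 1 for the country, 2 for a state, and 3 for a store.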

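Next, I can write an expression that performs the following steps for each row in the table:

1. Get the value of the current row.
2. Get the sum of the values of the rows directly below the current row in the hierarchy.
3. Subtract the sum from step 2 from the value of the current row.

The result is a column containing the values shown in the first figure above.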

Corrected Expenses =
VAR CurrentExp = 'Cost Data'[Expenses]
VAR CurrentLevel = 'Cost Data'[Path Length]
VAR CurrentPath = 'Cost Data'[HierarchyPath]
VAR ChildExpenses =
    -- Sum of the rows one level below the current row whose path contains the current path
    CALCULATE(
        SUM('Cost Data'[Expenses]),
        REMOVEFILTERS('Cost Data'),
        'Cost Data'[Path Length] = CurrentLevel + 1,
        CONTAINSSTRING('Cost Data'[HierarchyPath], CurrentPath)
    )
RETURN
    CurrentExp - ChildExpenses

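The key is the expression for the ChildExpenses variable. It sums all rows that sit one level below the current row and whose path contains the current row's path, that is, the direct children of the current row.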
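The same row-by-row logic can be traced with a small Python model. The amounts are hypothetical, and the country row is added only to show that the rule works one level up as well. The substring test stands in for CONTAINSSTRING(), with the caveat that the DAX function is case-insensitive while Python's `in` is not.

```python
# Hypothetical pre-aggregated rows: each parent already includes its children.
rows = [
    {"path": "USA", "expenses": 320},
    {"path": "USA|South Carolina", "expenses": 300},
    {"path": "USA|South Carolina|Store A", "expenses": 100},
    {"path": "USA|South Carolina|Store B", "expenses": 150},
]


def level(path: str) -> int:
    """Level in the hierarchy, equivalent to the Path Length column."""
    return path.count("|") + 1


def corrected_expenses(row: dict, all_rows: list) -> int:
    """Current value minus the sum of the rows exactly one level below it."""
    child_level = level(row["path"]) + 1
    child_sum = sum(
        r["expenses"]
        for r in all_rows
        if level(r["path"]) == child_level and row["path"] in r["path"]
    )
    return row["expenses"] - child_sum


corrected = [corrected_expenses(r, rows) for r in rows]
print(corrected)       # [20, 50, 100, 150]
print(sum(corrected))  # 320
```

Only the direct children are subtracted, because each child row already carries the values of its own descendants. After the correction, a plain sum over all rows matches the top-level pre-aggregated value.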

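Be aware that calling CALCULATE() in a calculated column triggers a context transition.

If you are not familiar with context transition, make sure to read my article explaining it: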

## What's fancy about context transition in DAX

Row and filter contexts are well-known concepts in DAX. But we can switch between the two with context transition.

towardsdatascience.com

This is the result of the column:

https://github.com/OpenDocCN/towardsdatascience-blog-zh-2024/raw/master/docs/img/27ec7282310a6d995e3c4357a76feb3a.png

Figure 5: Result of the calculated column with the correct values (Figure by the Author)

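This column replaces the original Expenses column.

I renamed the original Expenses column to "Expense_Original" and the calculated column to "Expenses". As the Expense_Original column is of no use for reporting, it is hidden in the data model.

Now, I can build the visual in the report: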

https://github.com/OpenDocCN/towardsdatascience-blog-zh-2024/raw/master/docs/img/e76f5adc863aa25672ceb98b9dcdfcd4.png

Figure 6: The renamed original Expenses column and the calculated column side by side in Power BI (Figure by the Author)

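This is the desired result.

But let's see if I can create a measure that calculates the correct result.

Measure

To write a measure, I must handle each level of the hierarchy separately.

I cannot use the same approach as for the calculated column, because at each upper level (like country or state), a row in the visual covers multiple data rows at the store level.

The result is the following DAX code: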

Expenses(Corrected) =
VAR CurrentExp = [Expenses(Original)]
VAR CurrentLevel = SELECTEDVALUE('Cost Data'[Path Length])
VAR CurrentPath = SELECTEDVALUE('Cost Data'[HierarchyPath])
VAR CurrentCountry = SELECTEDVALUE('Cost Data'[Country])
VAR CurrentState = SELECTEDVALUE('Cost Data'[State])
VAR CurrentStore = SELECTEDVALUE('Cost Data'[Store])
VAR StateExpenses =
    -- Get the pre-aggregated value of the Expenses for the State
    CALCULATE([Expenses(Original)]
        , REMOVEFILTERS('Cost Data')
        , 'Cost Data'[Path Length] = CurrentLevel + 1
        , CONTAINSSTRING('Cost Data'[HierarchyPath], CurrentPath)
    )
RETURN
SWITCH(TRUE()
    -- Calculation at the lowest level (Store)
    -- But only when the Store has a different name than the State
    , NOT ISBLANK(CurrentStore) && CurrentStore <> CurrentState
        , CurrentExp
    -- Detract the Expenses from the sum at the State level when the "Store" has the same name as the State
    -- These are the rows with the Expenses for the State
    , NOT ISBLANK(CurrentStore) && CurrentStore = CurrentState
        , CurrentExp - StateExpenses
    -- Calculate the Sum at the state level
    , NOT ISBLANK(CurrentState) && ISBLANK(CurrentStore)
        -- First, calculate the Sum for all Stores
        -- But only when the Stores have a different name than the State
        , CALCULATE([Expenses(Original)]
            , REMOVEFILTERS('Cost Data')
            , 'Cost Data'[Country] = CurrentCountry
            , 'Cost Data'[State] = CurrentState
            , 'Cost Data'[Store] <> CurrentState
          )
          -- At this stage, each row in the Visual has multiple Data rows.
          -- Therefore, SELECTEDVALUE() for the path doesn't return any value.
          -- Now add the sum for all Stores, detracting the duplicate value for the "Stores" with the same name as the State
          + (
              CALCULATE([Expenses(Original)]
                  , REMOVEFILTERS('Cost Data')
                  , 'Cost Data'[Country] = CurrentCountry
                  , 'Cost Data'[State] = CurrentState
                  , 'Cost Data'[Store] = CurrentState
              )
              - CALCULATE([Expenses(Original)]
                  , REMOVEFILTERS('Cost Data')
                  , 'Cost Data'[Country] = CurrentCountry
                  , 'Cost Data'[State] = CurrentState
                  , 'Cost Data'[Store] <> CurrentState
              )
          )
    -- Calculate the corrected Sum for the Country
    -- Must use the same logic as above, but by moving one level above, considering only the Country and the State
    , CALCULATE([Expenses(Original)]
        , REMOVEFILTERS('Cost Data')
        , 'Cost Data'[Country] = CurrentCountry
        , 'Cost Data'[State] <> CurrentCountry
      )
      + (
          CALCULATE([Expenses(Original)]
              , REMOVEFILTERS('Cost Data')
              , 'Cost Data'[Country] = CurrentCountry
              , 'Cost Data'[State] = CurrentCountry
          )
          - CALCULATE([Expenses(Original)]
              , REMOVEFILTERS('Cost Data')
              , 'Cost Data'[Country] = CurrentCountry
              , 'Cost Data'[State] <> CurrentCountry
          )
      )
)

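I added a lot of comments to the code.

Therefore, I will not explain each step of the measure in detail.

However, this approach is far more complex than the calculated column and cannot match its simplicity.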
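As a rough illustration of what the state-level branch of the measure computes, the following Python sketch replays its arithmetic on hypothetical rows. The helper names `total` and `state_level_result` are made up for this example, and the sketch assumes that the state's pre-aggregated row is the one where Store equals State.

```python
# (country, state, store, pre-aggregated expenses); amounts are hypothetical.
rows = [
    ("USA", "South Carolina", "Store A", 100),
    ("USA", "South Carolina", "Store B", 150),
    ("USA", "South Carolina", "South Carolina", 300),  # pre-aggregated state row
]


def total(rows, country, state, same_name):
    """Sum the rows of one state, either the state row itself or the real stores."""
    return sum(
        value
        for c, s, store, value in rows
        if c == country and s == state and (store == state) == same_name
    )


def state_level_result(rows, country, state):
    """Stores plus the state's own share, as in the state branch of the measure."""
    stores = total(rows, country, state, same_name=False)
    state_row = total(rows, country, state, same_name=True)
    own_expenses = state_row - stores  # remove the stores from the pre-aggregated row
    return stores + own_expenses


print(total(rows, "USA", "South Carolina", same_name=False))  # 250
print(state_level_result(rows, "USA", "South Carolina"))      # 300
```

With these numbers, the store sum cancels out and the branch returns the value of the pre-aggregated state row, which is the expected total for the state.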

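Visual calculations

Lastly, I can use one of the newest features in Power BI: visual calculations.

Visual calculations add calculations directly to a visual without the need to add a measure to the data model.

This opens up some exciting possibilities and removes the need to write measures that serve only one specific visual.

I added some links on this topic in the References section below.

Here, I tried to use this new feature to implement a simple solution.

However, after a lot of research and trial and error, I did not find a complete working solution.

I found a way to calculate the correct result for each store, but not for the state and country levels: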

Visual calculation =
VAR CurrentCountry = [Country]
VAR CurrentState = [State]
RETURN
SWITCH(TRUE()
    , [State] <> [Store] && ISATLEVEL([Store])
        , [Expenses(Original)]
    , [State] = [Store] && ISATLEVEL([Store])
        , [Expenses(Original)]
            - CALCULATE(SUM([Expenses(Original)])
                , [State] <> [Store]
                , [Country] = CurrentCountry
                , [State] = CurrentState
              )
)