Optimization | Vojtěch Šíma

Fabric Dataflow Gen2 Partitioned Compute: Setup and Benchmark

Partitioned Compute in Dataflow Gen2 is a feature designed to process multiple files at the same time, potentially slashing your refresh durations. But how do you actually get it working with your data? In this post, I break down the exact setup process for ADLS Gen2, explain how the engine streams requests under the hood, and put it to the test with a massive benchmark. Read on to learn how to configure your partition keys and see the raw numbers for yourself.

Data Engineering

Vojtěch Šíma

Mar 15 · 6 min read

Fabric Dataflow Gen2 Partitioned Compute: Setup and Benchmark (Czech version)

Partitioned Compute in Dataflow Gen2 can process multiple files in parallel and, in theory, significantly shorten refresh times. But how do you actually get it running with your data? In this article, I walk through the exact setup for ADLS Gen2, explain how the engine works under the hood, and put it through a stress test. Read on to learn how to set the partition key correctly and see from my benchmark how this new feature performs in practice.

Data Engineering

Vojtěch Šíma

Mar 15 · 6 min read

Automated Delta Table Maintenance in Fabric Lakehouse (Without PySpark) (Czech version)

To maintain Delta tables correctly, you first need to understand the transaction log, checkpoints, and the effect of write amplification on storage. The article examines the physical architecture of Delta tables and shows how to automate health checks and the VACUUM and OPTIMIZE operations with the delta-rs Python library in Microsoft Fabric, offering a lightweight alternative to Spark-based maintenance.

Microsoft Fabric

Vojtěch Šíma

Feb 15 · 10 min read

Automated Delta Table Maintenance in Fabric Lakehouse (Without PySpark)

To properly maintain Delta Tables, you first need to understand the transaction log, checkpoints, and how write amplification affects storage. This post breaks down the physical architecture of Delta Tables and demonstrates how to automate health checks, VACUUM, and OPTIMIZE operations using the delta-rs Python library in Microsoft Fabric, offering a lightweight alternative to Spark-based maintenance.

Microsoft Fabric

Vojtěch Šíma

Feb 14 · 13 min read

Why Is List.Contains in Power Query Slow? Faster Lookup Alternatives (Czech version)

A comparison of lookup methods in Power Query: List.Contains vs Table.Join vs Record.FieldOrDefault. Benchmarks on 1M rows show that scanning a list is slow and its speed depends on position, while mapping through a record built with Record.FromList keeps the time practically constant. The tests also cover typed expansion via Record.FieldOrDefault.

Data Engineering

Vojtěch Šíma

Aug 24, 2025 · 9 min read

Comparison of Power Query lookup methods: List.Contains vs Table.Join vs Record.FieldOrDefault. Benchmarks on 1M rows show that list scans get slower the deeper the match sits in the list, while a record map built via Record.FromList stays near constant time. Includes typed expansion with Record.FieldOrDefault.

Why Is Power Query List.Contains Slow? Faster Lookup Alternatives

List.Contains scans and stalls at scale. This post benchmarks lookup patterns on real tables and shows why buffering only helps a little, why joins are solid, and how a record map built with Record.FromList plus Record.FieldOrDefault delivers quick, clean lookups. Learn when to keep the merge and when to build the map.

Data Engineering

Vojtěch Šíma

Aug 21, 2025 · 10 min read

DAX: AVERAGEX and SUMX example for calculating an average across categories in Power BI without repeated use of CALCULATE

No More CALCULATE Spam: Iterating over Categories in DAX

How to replace repeated CALCULATE calls in Power BI with DAX iterators such as SUMX and AVERAGEX. Cleaner, more flexible, and more scalable calculations over categories.

Semantic Modeling

Vojtěch Šíma

Jul 1, 2025 · 7 min read