<?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet href="/scripts/pretty-feed-v3.xsl" type="text/xsl"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:h="http://www.w3.org/TR/html4/"><channel><title>TheUnknownBlog</title><description>Stay Hungry, Stay Foolish</description><link>https://20051110.xyz</link><item><title>装一台新电脑</title><link>https://20051110.xyz/blog/arch-install</link><guid isPermaLink="true">https://20051110.xyz/blog/arch-install</guid><description>I use arch btw.</description><pubDate>Sat, 21 Feb 2026 21:57:00 GMT</pubDate><content:encoded>&lt;p&gt;本来以为装一台新电脑没什么好写的，后来仔细一想，发现其实还是有不少坑的。从选硬件，到装系统，然后我又折腾了好久才把 Headless mode 的 game streaming 搞定，再是桌面美化。索性就把整个过程写下来，给以后自己和大家做个参考。&lt;/p&gt;
&lt;h2&gt;Picking the hardware&lt;/h2&gt;
&lt;p&gt;The parts list first:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;CPU: Intel Core i5-13490F&lt;/li&gt;
&lt;li&gt;Motherboard: Colorful B760M-T WIFI DDR4&lt;/li&gt;
&lt;li&gt;RAM: Corsair Vengeance Pro DDR4 3200MHz 16GB + ADATA 万紫千红 DDR4 2400MHz 8GB&lt;/li&gt;
&lt;li&gt;GPU: ASRock RX 9070 GRE Steel Legend&lt;/li&gt;
&lt;li&gt;Storage: WD SN570 1TB NVMe SSD&lt;/li&gt;
&lt;li&gt;PSU: Thermalright SP750 750W 80PLUS Platinum, fully modular&lt;/li&gt;
&lt;li&gt;Case: just grabbed something random&lt;/li&gt;
&lt;li&gt;Cooler: just grabbed something random&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;All of the above cost me about ¥4900. Note that this figure already subtracts what I earned by selling two RAM sticks from home; without that, the total is around ¥5450.&lt;/p&gt;
&lt;h3&gt;Answering some questions&lt;/h3&gt;
&lt;p&gt;To preempt the nitpickers, let me answer a few questions I expect to get.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Why build a new PC at all?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;RAM prices are insane as I write this: a 16GB DDR5 6400MHz stick costs ¥1300, and even a 16GB DDR4 3200MHz stick runs ¥600 on JD.com. So why build now? As you may know, my daily machine is a MacBook Air M3 and I normally develop over SSH. The old dev box is an i5-4440S, thirteen years old now; not exactly unusable, but not a pleasant experience either. &lt;del&gt;Actually I mainly wanted a PC that can play games; everything above is just an excuse.&lt;/del&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Why not just buy a prebuilt?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Only fools buy prebuilts.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Why mix RAM sticks?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Because it&apos;s cheap! The 8GB stick came out of an old PC at home and had no other use anyway. You might object: &quot;doesn&apos;t mixing sticks cap the frequency?&quot; Good question! But somehow that ADATA stick of mine is a golden sample. I flipped the one-click XMP switch in the BIOS and it ran straight at 3200MHz with 18-20-20-40 timings, no problems at all. (The timings are a bit loose, but still better than running at 2400MHz.) After it passed MemTest86 I stopped worrying.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Where did the SSD come from?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The NVMe SSD was the expansion drive I bought for my MacBook Air M3 over the summer for ¥330. Hilariously, the same drive now goes for ¥899.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;13th-gen Core degrades and you still bought one?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Good question! But guess what? The i5-13490F is a China-market-only SKU, and to save cost Intel built it on the same design as 12th-gen Alder Lake, so it doesn&apos;t degrade 🙂‍↔️ Its core configuration is nearly identical to the i5-12600KF. (The i5-13400F shipped with a mix of Alder Lake and Raptor Lake dies, and people used to hope to win the &quot;lottery&quot; and get a Raptor Lake die, which ironically turned out to be the one that degrades. Who saw that twist coming!)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Why not AMD?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;On AMD, the only architecture that still takes DDR4 is Zen 3, which feels a bit too old by now. Also, AMD platform performance is tightly coupled to memory frequency (the Infinity Fabric clock is tied to the memory clock), and I simply cannot afford DDR5.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;With those questions answered, you should have a decent picture of how I picked parts. A few more things I paid attention to while choosing:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Prefer a PSU that supports the ATX 3.1 standard. GPU power draw keeps climbing, and ATX 3.1 specifies supply requirements for the 12VHPWR connector as well as peak-power behavior, so it protects your hardware better.&lt;/li&gt;
&lt;li&gt;Prefer a motherboard with built-in WiFi and Bluetooth. This GPU is an upper-midrange model and physically large (it takes up 2.9 slots), which effectively blocks the bottom PCIe x1 slot on the board. Without onboard WiFi/Bluetooth there would be no slot left for a wireless card. (A USB WiFi adapter would still work, but I&apos;m not a fan of those.)&lt;/li&gt;
&lt;li&gt;Prefer a GPU whose warranty accepts RMAs directly from individuals. Not many GPU vendors do nowadays; ASUS, Gigabyte, MSI, Colorful and a few others do, but ASRock does not, so you would have to go through a reseller. (Which is why I bought from JD.com first-party: JD handles the RMA for me, with no finger-pointing.)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Assembling the PC&lt;/h2&gt;
&lt;p&gt;Assembly itself isn&apos;t hard; there are plenty of build videos and guides online. But the CPU installation tripped me up, because it differs from the machines I had built before.&lt;/p&gt;
&lt;p&gt;This build uses a 13th-gen Intel CPU on the LGA 1700 socket. I had only ever installed LGA 115x (6th-gen Core) and LGA 3647 (1st-gen Xeon Scalable) CPUs. Two differences stood out: (1) you don&apos;t simply drop the CPU in and close the retention lever; while pressing the lever down you also have to hold down the top-left corner of the bracket, or it won&apos;t latch. My guess is this is because the CPU is no longer square but rectangular. (2) When mounting the cooler bracket, you need to pull the four corners of the backplate outward to widen the standoff spacing before they will seat into the motherboard holes.&lt;/p&gt;
&lt;p&gt;Let&apos;s just admire the finished build 🤗
&lt;img src=&quot;https://20051110.xyz/_astro/finish.BiooLjn8_Z1Uxr7X.webp&quot; alt=&quot;The finished build&quot;&gt;&lt;/p&gt;
&lt;p&gt;btw, I genuinely don&apos;t like RGB, but the industry consensus is that a GPU without RGB is roughly the most stripped-down card of its tier, and I didn&apos;t want the bargain-bin model either. I&apos;ll figure out later how to turn off the RGB on the CPU cooler and the GPU; otherwise this PC becomes a walking disco.&lt;/p&gt;
&lt;p&gt;The glass side panel isn&apos;t on yet (I still plan to add three fans), so it looks a bit messy; I&apos;ll post another photo once the fans are in. The cables also need a proper tidy, but not now: I built this at home, and when I move it to school I&apos;ll have to pull the GPU and CPU cooler anyway, so cable management can wait.&lt;/p&gt;
&lt;h2&gt;Installing the OS&lt;/h2&gt;
&lt;p&gt;I installed Arch Linux, and tried something different this time: PXE Boot plus the Arch Linux Netboot image, with no USB stick involved at any point. Very satisfying. Let me share how I set up the PXE Boot environment.&lt;/p&gt;
&lt;h3&gt;Setting up a PXE Boot environment&lt;/h3&gt;
&lt;p&gt;PXE Boot is basically booting a computer over the network. I won&apos;t belabor how it works here (if you&apos;re curious: it is built on DHCP and TFTP, and I covered what DHCP is in my previous blogpost if you&apos;re interested). Operationally: first enable a firmware option called &quot;Network Stack&quot; (a name that sounds like it has nothing to do with network booting), reboot back into the BIOS, and you will most likely find a &quot;PXE Boot&quot; entry in the boot tab.&lt;/p&gt;
&lt;p&gt;That option alone does nothing, though; we still have to feed a specific bootloader file to the machine. In short, we need to run a server whose job is to serve that bootloader to the computer.&lt;/p&gt;
&lt;p&gt;I followed &lt;a href=&quot;https://jah.io/easy-mode-pxe-boot&quot;&gt;this article&lt;/a&gt;, with one tweak. The article says:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Go Version 1.18 changed the way that go get works. As of that version it now manages module dependencies and no longer fetches, compiles and installs tools/binaries built in Go. You want go install for that as of 1.18, however, pixiecore won&apos;t build in 1.18, so you need to run go install using go 1.17, no newer. This is because the project is basically abandonware now for whatever reason.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;There&apos;s actually no need to go to the trouble of installing a Go 1.17 environment. Given that most of us run a modern toolchain (mine is 1.24), we can clone the source ourselves and build it with a current Go. Concretely:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;git clone https://github.com/danderson/netboot.git
cd netboot
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then build it:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;go build -o pixiecore-bin ./cmd/pixiecore
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With pixiecore built, we can use netboot.xyz, a very handy PXE boot menu. First download its UEFI bootloader:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;wget https://boot.netboot.xyz/ipxe/netboot.xyz.efi
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then run pixiecore:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;sudo ./pixiecore-bin boot &quot;netboot.xyz.efi&quot; --bootmsg &quot;booting from pxe&quot; -d --ipxe-efi64 &quot;netboot.xyz.efi&quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And that&apos;s the whole PXE Boot server. Now just pick PXE Boot on the target machine (both machines must be on the same LAN, and the machine being booted needs a wired connection). In the terminal running pixiecore you will see something like this:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://20051110.xyz/_astro/netboot.D_rfV1B3_299HnA.webp&quot; alt=&quot;PXE Boot logs&quot;&gt;&lt;/p&gt;
&lt;p&gt;And on the target machine you will see a screen like this:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://20051110.xyz/_astro/server.DpRktS2s_ZX1SQK.webp&quot; alt=&quot;PXE Boot screen&quot;&gt;&lt;/p&gt;
&lt;p&gt;Once the progress bar finishes, you land in the netboot.xyz main menu:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://20051110.xyz/_astro/netboot.Bspj4UvV_Z21g2Ug.webp&quot; alt=&quot;netboot.xyz main menu&quot;&gt;&lt;/p&gt;
&lt;p&gt;Select &quot;Linux Network Installs&quot; -&gt; &quot;Arch Linux&quot;; it downloads the Arch Linux Netboot image and boots it, and from there you follow the installation steps on the Arch Wiki. You can of course install other distributions the same way, such as Ubuntu or Fedora. I haven&apos;t tried installing Windows with it (it does offer a Windows option); if you have, tell me in the comments how it went.&lt;/p&gt;
&lt;p&gt;If downloading the Arch Linux Netboot image is slow or fails for you, that is a mirror problem. I haven&apos;t found a way to point netboot.xyz&apos;s Arch netboot image at a Chinese mirror. If your connectivity is fine, just download it; it isn&apos;t big (about 700MB). If your network is poor, I&apos;d recommend Arch&apos;s official PXE instead: its front page asks you to pick a region, and choosing China offers domestic mirrors such as Aliyun, Tsinghua University, and Nanjing University, which are much faster. For details, see the &lt;a href=&quot;https://wiki.archlinux.org/title/Netboot&quot;&gt;Netboot page on the Arch Wiki&lt;/a&gt;. The logic is exactly the same; just replace &lt;code&gt;netboot.xyz.efi&lt;/code&gt; in the command with the .efi file downloaded from Arch.&lt;/p&gt;
&lt;p&gt;Under the Utilities tab there is also a &quot;memtest86+ (v8.0.0)&quot; entry for testing whether your RAM is healthy, which is very handy. That is what I used to validate my mixed-stick memory.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://20051110.xyz/_astro/memtest.6hp9_tVl_ZTtGv.webp&quot; alt=&quot;Memtest Pass&quot;&gt;&lt;/p&gt;
&lt;p&gt;Here&apos;s a screenshot after installing Arch Linux (ricing comes later; this is not the final look):&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://20051110.xyz/_astro/fastfetch.BMF_lr-d_Z1l4MOl.webp&quot; alt=&quot;Arch Linux&quot;&gt;&lt;/p&gt;
&lt;h2&gt;玩游戏&lt;/h2&gt;
&lt;p&gt;Obviously a GPU this nice is meant for gaming, and I deliberately picked an AMD card for the sake of Linux (So NVIDIA, Fuck You). AMD cards enjoy excellent community driver support; archinstall even prompts you to choose between AMD&apos;s proprietary driver and the community driver. The community driver usually performs better; the proprietary one targets big enterprise customers and is (perhaps) more stable.&lt;/p&gt;
&lt;h3&gt;Undervolting and raising the power limit&lt;/h3&gt;
&lt;p&gt;We want to both undervolt and raise the power limit here. As for why I don&apos;t call it &quot;overclocking&quot;, let me explain:&lt;/p&gt;
&lt;p&gt;Modern AMD GPUs ship with very conservative base clocks (my 9070 GRE&apos;s base clock tops out at 2350MHz), and at that frequency the card cannot come close to its power budget (mine draws only around 160~200W against a 240W TDP). With the Performance Level set to &quot;Automatic&quot; (see the screenshot below), the GPU will try to boost as high as it can in order to &lt;strong&gt;use up the power budget&lt;/strong&gt;. In demanding games (Cyberpunk 2077, say), the GPU never reaches the maximum clock the board vendor advertises. So what really limits the frequency is usually not &quot;how far you overclocked&quot; but &quot;whether the GPU can reach that frequency at all&quot;, i.e. whether the power budget allows it.&lt;/p&gt;
&lt;p&gt;With that in mind, all we need is undervolting plus a higher power limit. Undervolting lowers power draw at any given frequency, so the card can clock higher; raising the power limit serves the same goal.&lt;/p&gt;
&lt;p&gt;The exact values differ from card to card, so do some research on Bilibili first: search for &quot;your GPU model + overclocking&quot;, watch five or six videos plus their comment sections, and you&apos;ll get a sense of the &quot;average silicon quality&quot; of your model. Start from those settings and adjust.&lt;/p&gt;
&lt;p&gt;On Arch I use LACT. I do not recommend hand-editing the GPU&apos;s profile. A ready-made tuning tool brings two benefits:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;It ships with a daemon, so your settings apply automatically at startup and won&apos;t get overridden.&lt;/li&gt;
&lt;li&gt;It lists the options that are actually available. Editing the config files directly is easy to get wrong, and not every knob applies to every GPU generation.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For my card, the configuration that ended up game-stable is: VRAM +100MHz, GPU voltage -85mV, power limit 240W -&gt; 264W. It also passed tests at VRAM +150MHz and -100mV, but I don&apos;t recommend daily-driving that. Passing one benchmark (Superposition, or 3DMark) does not mean every game is stable. On Windows, instability shows up as the infamous &quot;AMD driver timeout&quot;; on Linux, the desktop freezes, then recovers a moment later with a &quot;GPU has been reset&quot; notification. If that happens to you mid-game, your settings are not stable. I hit it a few times in the first days, and backing off the undervolt a little fixed it. Stability comes first; passing a stress test is only a reference point.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://20051110.xyz/_astro/lact.eIpXKMzD_mIo8g.webp&quot; alt=&quot;Undervolt and overclock in LACT&quot;&gt;&lt;/p&gt;
&lt;h3&gt;Enabling FSR4 support&lt;/h3&gt;
&lt;p&gt;The first time I played Cyberpunk 2077, the game offered FSR only up to version 3.1. A quick search showed that AMD announced FSR 4.0 support for Cyberpunk 2077 long ago (reports date back to April 2025), so why was 3.1 my ceiling?&lt;/p&gt;
&lt;p&gt;It turned out Steam&apos;s Proton was too old. Proton is a compatibility layer that lets Windows games run on Linux. It updates fairly quickly, but it can still lag behind the games&apos; own updates on Windows. Following community advice, we need to switch to CachyOS&apos;s Proton build (CachyOS is an Arch-based distribution optimized for gaming). After switching, FSR4 became selectable.&lt;/p&gt;
&lt;p&gt;Install the &lt;code&gt;proton-cachyos&lt;/code&gt; package:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;yay -S proton-cachyos
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The build takes about an hour... don&apos;t start it right before bed, or it will hold you hostage. Once it&apos;s done, open the game&apos;s &quot;Properties&quot; in Steam and switch the Proton version to the CachyOS one.&lt;/p&gt;
&lt;h3&gt;Headless game streaming&lt;/h3&gt;
&lt;p&gt;With the system installed, I started on game streaming in headless mode. I went with the Sunshine + Moonlight combo. Sunshine is an open-source, self-hosted game streaming server; Moonlight is an open-source NVIDIA GameStream client that runs on many kinds of devices.&lt;/p&gt;
&lt;p&gt;Installing Sunshine is easy; Arch Linux users can grab it straight from the AUR:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;yay -S sunshine-git
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Once installed, run Sunshine and set your password and codecs in the web UI (note: access it over https). Then install Moonlight on your client device, pair it with your Sunshine server (the pairing PIN is also entered in the web UI), and start playing.&lt;/p&gt;
&lt;p&gt;But this way the monitor has to stay on while you play. Turn it off and Sunshine goes on strike with &quot;Streaming Error,... Is the host display turned on?&quot;. Leaving the monitor on forever is no solution; we must save energy! So we need Sunshine to work in headless mode too. There are two ways to solve this:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Option 1: buy an HDMI dummy plug from your favorite shopping site and stick it into the GPU&apos;s HDMI port. The machine then believes a monitor is attached, and Sunshine stops complaining.&lt;/li&gt;
&lt;li&gt;Option 2: fake a display via kernel parameters. This one is a bit more technical, so let&apos;s go through it in detail:&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The idea is to make the Linux kernel believe, from boot, that &quot;a monitor is plugged into this connector&quot;. First, prepare an EDID file; you need one to tell the GPU &quot;this is the kind of monitor connected to me&quot;. Usually there is no need to generate one from scratch: the easiest route is to dump the EDID of the monitor you already own.&lt;/p&gt;
&lt;p&gt;Install &lt;code&gt;edid-decode&lt;/code&gt; first if you don&apos;t have it:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;yay -S edid-decode
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then take a look in this directory:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;ls /sys/class/drm
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Everyone&apos;s layout will differ here. Mine looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;card1  card1-DP-1  card1-DP-2  card1-DP-3  card1-HDMI-A-1  card1-Writeback-1  renderD128  version
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;card1&lt;/code&gt; holds some GPU-level configuration we don&apos;t care about. The rest: &lt;code&gt;card1-DP-1&lt;/code&gt;, &lt;code&gt;card1-DP-2&lt;/code&gt;, &lt;code&gt;card1-DP-3&lt;/code&gt; are the three DisplayPort connectors, and &lt;code&gt;card1-HDMI-A-1&lt;/code&gt; is the HDMI connector. My real monitor hangs off HDMI, so I dump the EDID from the &lt;code&gt;card1-HDMI-A-1&lt;/code&gt; directory:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;cat /sys/class/drm/card1-HDMI-A-1/edid | edid-decode
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If you see your monitor&apos;s details (resolution, refresh rate, vendor info, and so on), it worked. Now save the EDID to a file. Note that a plain &lt;code&gt;sudo cat ... &gt; file&lt;/code&gt; would fail, because the shell performs the redirect as your unprivileged user; use tee instead:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;sudo mkdir -p /usr/lib/firmware/edid
cat /sys/class/drm/card1-HDMI-A-1/edid | sudo tee /usr/lib/firmware/edid/my_monitor.bin &gt; /dev/null
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Next, let&apos;s find an idle connector on the GPU. You could test each one with the &lt;code&gt;cat edid&lt;/code&gt; approach above (if &lt;code&gt;edid-decode&lt;/code&gt; complains about empty stdin, the connector is empty), but it is simpler to run:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;for p in /sys/class/drm/*/status; do con=${p%/status}; echo -n &quot;${con#*/card?-}: &quot;; cat $p; done
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We just need a connector whose status is disconnected. For example, my output:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;DP-1: disconnected
DP-2: disconnected
DP-3: disconnected
HDMI-A-1: connected
Writeback-1: unknown
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;So I&apos;ll use the DP-3 connector for the fake display. Note: you&apos;d better not plug a real monitor into that port afterwards, or the two may conflict. Next we need to tell the kernel where the EDID file lives.&lt;/p&gt;
&lt;p&gt;Edit your bootloader configuration (GRUB or systemd-boot) and add the parameters that force-enable the display.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;If you use GRUB:
edit /etc/default/grub and append inside the GRUB_CMDLINE_LINUX_DEFAULT quotes:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-plaintext&quot;&gt;drm.edid_firmware=DP-3:edid/my_monitor.bin video=DP-3:e
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Remember to replace DP-3: with the connector name you found above. Likewise, &lt;code&gt;edid/my_monitor.bin&lt;/code&gt; is the file you saved in the first step (the path is relative to /usr/lib/firmware/ by default). The &lt;code&gt;video=DP-3:e&lt;/code&gt; part is the crucial bit: the :e means &quot;Enable&quot; (force-enable), telling the kernel to ignore the physical connection state.&lt;/p&gt;
&lt;p&gt;Then update GRUB:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;sudo grub-mkconfig -o /boot/grub/grub.cfg
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If you use systemd-boot:
edit the matching .conf file under &lt;code&gt;/boot/loader/entries/&lt;/code&gt; and append the same string to the end of the options line. The file names and directory layout differ per person and per system, so you will need to look around. Mine is /boot/loader/entries/2026-01-20_14-25-25_linux-zen.conf, and I append to the end of its options line:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-plaintext&quot;&gt;drm.edid_firmware=DP-3:edid/my_monitor.bin video=DP-3:e
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Then, important ⚠️: regenerate the initramfs! The EDID file must be baked into the boot image so the GPU driver can read it when it loads.
Edit &lt;code&gt;/etc/mkinitcpio.conf&lt;/code&gt;, find the FILES=() line, and add your EDID file:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-plaintext&quot;&gt;FILES=(/usr/lib/firmware/edid/my_monitor.bin)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then regenerate:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;sudo mkinitcpio -P
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;After a reboot, Sunshine keeps working even with the monitor switched off, and you can happily game in headless mode! Better yet, a clever desktop environment like KDE detects display changes automatically: with the real monitor attached you can disable the virtual display in the display settings, and when you switch the monitor off, KDE notices the current display is gone and fails over to the virtual one by itself. No manual intervention at all; very smart.&lt;/p&gt;
&lt;h2&gt;Switching desktops&lt;/h2&gt;
&lt;p&gt;Honestly, when I ran archinstall I picked niri as the desktop environment right away. (A big part of why I use Linux at all is that I covet niri! macOS has an analogue, PaperWM.spoon, but its animations aren&apos;t pretty and it is constrained by macOS&apos;s window management; it just can&apos;t be done well there.) But fresh out of the box niri was painfully barebones 😅 with all navigation relying entirely on the keyboard; unconfigured, it was simply unusable, so I went back and installed KDE Plasma 😅. After a round of ricing, though, I find today&apos;s niri perfectly usable, and it has become my daily-driver desktop. Here is how it looks now:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://20051110.xyz/_astro/niri.By4jVghK_ZjfaP6.webp&quot; alt=&quot;niri&quot;&gt;&lt;/p&gt;
&lt;p&gt;When the Linux community shows off desktops, there is always a btop, a fastfetch, and something extra in the bottom-right corner. Honoring the tradition, I arranged mine the same way. Bottom-right is the Minecraft server my classmates and I run 😁.&lt;/p&gt;
&lt;p&gt;Taste differs from person to person, but let me get the ball rolling with a few ricing ideas I consider good:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Background blur: to me this is one of the most important ricing elements. It gives the UI a sense of depth and reduces visual fatigue. However, as of February 2026, niri does not support background blur; the terminal blur in the screenshot above comes from setting &quot;draw-border-with-background false&quot; in niri and letting the terminal draw its own blurred background. The good news: as I write this, niri&apos;s developer has already implemented background blur on his development branch; see this commit: &lt;a href=&quot;https://github.com/niri-wm/niri/commit/1d92c18aac07dc83e08e470ed315a6d36da3c19e&quot;&gt;Link to GitHub commit&lt;/a&gt;. If you want to try it, you can build the code on the https://github.com/niri-wm/niri/tree/wip/branch branch. Native background blur is coming to niri soon; stay tuned!&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Waybar: a superb status bar tool, highly customizable. You can adjust its looks and features through its config files: system info, network status, battery status, and so on. There are plenty of ready-made themes online, or you can design your own; theming is just a pile of CSS, so in principle anyone who writes CSS can build the exact bar they want. A very nice base is &lt;a href=&quot;https://github.com/catppuccin/waybar&quot;&gt;this theme&lt;/a&gt;, which gives you that &quot;rounded pill&quot; look. Don&apos;t take my word for it; here is their official preview:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;preview.webp&quot; alt=&quot;Waybar&quot;&gt;&lt;/p&gt;
&lt;p&gt;You can add whatever modules you like; I added CPU usage, memory usage, and network status. You could add the currently playing track, a date-and-time module, even the current weather. Waybar&apos;s customizability is enormous; design the bar your needs call for. This is my Waybar right now:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://20051110.xyz/_astro/my_waybar.CXyaqSH__Z1auyxd.webp&quot; alt=&quot;My Waybar&quot;&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Fuzzel: niri&apos;s default application launcher is fuzzel, and its default look is plain ugly. Thankfully its theming is simple: a standard .ini file defining a handful of color and font options. I went with the Tokyo Night palette, tweaked the fonts and borders a little, added background blur, and got this:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://20051110.xyz/_astro/fuzzel.CUlOvdfb_Z18Gzqx.webp&quot; alt=&quot;Fuzzel&quot;&gt;&lt;/p&gt;
&lt;p&gt;You can argue it still isn&apos;t pretty, but I think it is quite decent, and it is only a few lines of config. Feel free to copy my config file, or tweak it to fit your own taste.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-ini&quot;&gt;[main]
font=JetBrainsMono Nerd Font:size=13
prompt=&quot;❯   &quot;
icon-theme=Papirus-Dark
icons-enabled=yes
width=45
lines=10
horizontal-pad=20
vertical-pad=20
inner-pad=10
layer=overlay

[colors]
background=1a1b26e6
text=c0caf5ff
match=f7768eff
selection=414868ff
selection-text=c0caf5ff
selection-match=ff899dff
border=7aa2f7ff

[border]
width=2
radius=10
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Alacritty: niri&apos;s default terminal. Alacritty is a very fast GPU-accelerated terminal emulator and highly customizable. Its default &quot;white-on-black, zero padding&quot; look is spartan, but the config file fixes that. Note that recent Alacritty versions have dropped the old YAML format entirely in favor of TOML, so many older tutorials no longer work. If you want my config file, here it is:&lt;/p&gt;
&lt;p&gt;Alacritty automatically reads &lt;code&gt;~/.config/alacritty/alacritty.toml&lt;/code&gt;. Also remember to install the font (I use JetBrains Mono Nerd Font), or you may see garbled text or missing icons. Use&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;sudo pacman -S ttf-jetbrains-mono-nerd
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;to install it.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-toml&quot;&gt;[window]
padding = { x = 16, y = 16 }
opacity = 0.90
decorations = &quot;None&quot;
dynamic_title = true

[font]
normal = { family = &quot;JetBrainsMono Nerd Font&quot;, style = &quot;Regular&quot; }
bold = { family = &quot;JetBrainsMono Nerd Font&quot;, style = &quot;Bold&quot; }
italic = { family = &quot;JetBrainsMono Nerd Font&quot;, style = &quot;Italic&quot; }
size = 12.0

[cursor]
style = { shape = &quot;Beam&quot;, blinking = &quot;On&quot; }

[colors.primary]
background = &quot;#1a1b26&quot;
foreground = &quot;#c0caf5&quot;

[colors.normal]
black   = &quot;#15161e&quot;
red     = &quot;#f7768e&quot;
green   = &quot;#9ece6a&quot;
yellow  = &quot;#e0af68&quot;
blue    = &quot;#7aa2f7&quot;
magenta = &quot;#bb9af7&quot;
cyan    = &quot;#7dcfff&quot;
white   = &quot;#a9b1d6&quot;

[colors.bright]
black   = &quot;#414868&quot;
red     = &quot;#ff899d&quot;
green   = &quot;#b1e37b&quot;
yellow  = &quot;#f3c27b&quot;
blue    = &quot;#8cb6ff&quot;
magenta = &quot;#ceafff&quot;
cyan    = &quot;#8fe2ff&quot;
white   = &quot;#c0caf5&quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Or maybe, by the time you read this far, you are already tired: &quot;I don&apos;t actually care about ricing&quot;, or &quot;why are there so many config files to write?&quot; Then I have one last trick: just use &lt;code&gt;dankmaterialshell&lt;/code&gt;. This project packages desktop ricing into a one-shot install script that makes your desktop gorgeous instantly, with a consistent Google Material 3 design style. Check the screenshots at the &lt;a href=&quot;https://github.com/AvengeMedia/DankMaterialShell&quot;&gt;project page&lt;/a&gt;. Installation is just one command:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;curl -fsSL https://install.danklinux.com | sh
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It lets you pick your window manager (niri, hyprland, ...) and your terminal (ghostty, alacritty, ...). If you are not an artist, or you just need an &quot;out-of-the-box&quot; ricing solution, this project is an excellent choice. My own desktop is this project plus a few small tweaks.&lt;/p&gt;
&lt;p&gt;Ricing is a never-ending hobby. Once niri&apos;s native background blur lands, I want to rework my desktop again; stay tuned for the next blogpost!&lt;/p&gt;</content:encoded></item><item><title>What is the Internet Protocol?</title><link>https://20051110.xyz/blog/internet-protocol</link><guid isPermaLink="true">https://20051110.xyz/blog/internet-protocol</guid><description>Understanding the Invisible Envelope that Delivers Data Across the Globe</description><pubDate>Tue, 27 Jan 2026 18:32:00 GMT</pubDate><content:encoded>&lt;p&gt;Before you read further, take a moment to ask yourself: &lt;strong&gt;What is the Internet Protocol (IP)?&lt;/strong&gt; Can you explain its purpose &amp;#x26; what it consists of beyond just &quot;it&apos;s how devices get IP addresses&quot;?&lt;/p&gt;
&lt;p&gt;Like many developers, I had a functional understanding of the stack: I knew TCP provides reliable connections, UDP is for fast messages, and IP addresses are where packets go. But when I really stopped to ask, &lt;strong&gt;&quot;What &lt;em&gt;is&lt;/em&gt; the Internet Protocol?&quot;&lt;/strong&gt;, I realized I was stuck.&lt;/p&gt;
&lt;p&gt;For IP, the only thing I could picture was an IP Address. This led to a fundamental confusion: &lt;strong&gt;Is the Internet Protocol just a standard for addresses?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The answer, of course, is no. After digging into the architecture, I finally built a mental model that clicks. Here is what I learned.&lt;/p&gt;
&lt;h2&gt;The Protocol is Not the Address&lt;/h2&gt;
&lt;p&gt;First, let&apos;s look at the well-known OSI Model:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;+---------------------+
| Application Layer   |  (HTTP, FTP, DNS)
+---------------------+
| Transport Layer     |  (TCP, UDP)
+---------------------+
| Network Layer       |  (IP)
+---------------------+
| Data Link Layer     |  (Ethernet, Wi-Fi)
+---------------------+
| Physical Layer      |  (Cables, Radio Waves)
+---------------------+
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If you are confused about why the diagram above contains only 5 layers instead of the usual 7, check out &lt;a href=&quot;/channel/osi&quot;&gt;my channel post&lt;/a&gt;. TL;DR: people often use a simplified “Internet stack” view where OSI’s Session/Presentation layers are folded into the Application layer, and Physical + Data Link are commonly treated together as “the link” (even though they’re distinct in the strict OSI model).&lt;/p&gt;
&lt;p&gt;From this model, it&apos;s easy to see the functionality of IP: it operates at the &lt;strong&gt;Network Layer&lt;/strong&gt;, responsible for routing packets from one host to another across different networks.&lt;/p&gt;
&lt;p&gt;The biggest mental block was separating the &lt;em&gt;address name&lt;/em&gt; (the IP address) from the &lt;em&gt;logic&lt;/em&gt; (the IP protocol). Once you start understanding IP as &lt;strong&gt;a set of rules and procedures&lt;/strong&gt; rather than just a label, the rest falls into place.&lt;/p&gt;
&lt;p&gt;The internet is a series of wires; &lt;strong&gt;IP is the logic that navigates the wires.&lt;/strong&gt; The IP address, on the other hand, is part of the protocol (like a street address in a mailing system) but not the whole story. I gave a short talk about &lt;a href=&quot;/src/assets/teaching/slides/Route.pdf&quot;&gt;How Computer Networks Route Your Packets&lt;/a&gt;, and the logic of &lt;code&gt;Routing&lt;/code&gt; is in fact also part of the Internet Protocol. What&apos;s more, IP also defines how packets are structured (the IP Header), how fragmentation works (large packets may need to be broken up; the Ethernet MTU is typically 1500 bytes), and how IP addresses are delegated (CIDR, DHCP, SLAAC, etc.).&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;post office&lt;/code&gt; analogy is one of the best mental models in networking (from my perspective). When you send a letter, the postal system functions much like IP does:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Envelope: The IP protocol defines how to package data into packets with headers containing source and destination addresses.&lt;/li&gt;
&lt;li&gt;Addressing: The national addressing system ensures each location has a unique identifier, just like IP addresses.&lt;/li&gt;
&lt;li&gt;Routing: The postal system routes letters through various post offices; similarly, IP routes packets through routers across networks.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;And let&apos;s talk about routing in more detail with the above 3 components in mind.&lt;/p&gt;
&lt;h2&gt;The IP Header&lt;/h2&gt;
&lt;p&gt;The Internet Protocol has a single focus: getting a packet from Computer A to Computer B. It is completely agnostic about what is inside that packet. It doesn&apos;t care if it&apos;s a fragment of a 4K video, a move in a competitive game, or a simple text file.&lt;/p&gt;
&lt;p&gt;Let&apos;s first look at the IPv4 header (credit: UC Berkeley CS168):&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://20051110.xyz/_astro/ipv4_header.5l95QlQQ_ZYaTk0.webp&quot; alt=&quot;IPv4 Header&quot;&gt;&lt;/p&gt;
&lt;p&gt;There are a lot of fields in the header, but when you take a closer look, you&apos;ll find many of them are pretty useful (a small parsing sketch follows the list). For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Version:&lt;/strong&gt; Indicates whether it&apos;s IPv4 or IPv6.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Source &amp;#x26; Destination IP:&lt;/strong&gt; The &quot;From&quot; and &quot;To&quot; addresses. We need these to know where to send the packet &amp;#x26; where to send back the response.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;TTL (Time To Live):&lt;/strong&gt; Prevents packets from circulating forever.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Protocol:&lt;/strong&gt; Indicates whether the payload is TCP, UDP, ICMP, etc. This is for demultiplexing the Layer 4 protocol at the destination. Otherwise the payload would be gibberish.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Header Checksum:&lt;/strong&gt; Ensures the integrity of the header data. &lt;strong&gt;NOTE:&lt;/strong&gt; This checksum only covers the header, not the payload. This is due to the &lt;em&gt;end-to-end principle&lt;/em&gt;: payload integrity is handled by the end hosts, not the routers.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Fragmentation Fields:&lt;/strong&gt; Allow large packets to be broken down into smaller fragments for transmission and reassembled at the destination.&lt;/li&gt;
&lt;/ul&gt;
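&lt;p&gt;To make the byte layout concrete, here is a minimal Go sketch (mine, not from the slide) that pulls a few of these fields out of a raw IPv4 header. The sample bytes are hand-crafted for illustration: a 20-byte header with no options, carrying the offsets from the diagram above.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-go&quot;&gt;package main

import (
	&quot;encoding/binary&quot;
	&quot;fmt&quot;
)

func main() {
	// A hand-crafted 20-byte IPv4 header (no options), for illustration only.
	h := []byte{
		0x45, 0x00, 0x00, 0x54, // Version+IHL, DSCP/ECN, Total Length
		0x00, 0x00, 0x40, 0x00, // Identification, Flags+Fragment Offset
		0x40, 0x01, 0x00, 0x00, // TTL, Protocol, Header Checksum
		10, 0, 0, 1, // Source IP
		10, 0, 0, 2, // Destination IP
	}

	fmt.Println(&quot;version:&quot;, h[0]&gt;&gt;4)               // 4
	fmt.Println(&quot;header bytes:&quot;, int(h[0]&amp;#x26;0x0f)*4) // IHL of 5 means 20 bytes
	fmt.Println(&quot;total length:&quot;, binary.BigEndian.Uint16(h[2:4]))
	fmt.Println(&quot;ttl:&quot;, h[8])      // 64 hops left
	fmt.Println(&quot;protocol:&quot;, h[9]) // 1 = ICMP payload
	fmt.Printf(&quot;src: %d.%d.%d.%d\n&quot;, h[12], h[13], h[14], h[15])
	fmt.Printf(&quot;dst: %d.%d.%d.%d\n&quot;, h[16], h[17], h[18], h[19])
}
&lt;/code&gt;&lt;/pre&gt;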
&lt;p&gt;The IPv6 header is simpler and more efficient (credit: UC Berkeley CS168):&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://20051110.xyz/_astro/ipv6_header.DlbAZf7m_jD0kI.webp&quot; alt=&quot;IPv6 Header&quot;&gt;&lt;/p&gt;
&lt;p&gt;You can view it as an evolution of the IPv4 header that pushes the &lt;code&gt;End-to-End Principle&lt;/code&gt; even further by:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Removing the Header Checksum (relying entirely on upper-layer protocols for error checking).&lt;/li&gt;
&lt;li&gt;Simplifying fragmentation (only the source handles fragmentation; intermediate routers don’t fragment).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It also ensures scalability with the &lt;code&gt;next header&lt;/code&gt; field, allowing extension headers without complicating the base header.&lt;/p&gt;
&lt;h2&gt;Addressing&lt;/h2&gt;
&lt;p&gt;Before we talk about routing, we must ask: how do devices get these addresses in the first place?&lt;/p&gt;
&lt;p&gt;We have moved away from the rigid &quot;Class A/B/C&quot; system of the 1980s to Classless Inter-Domain Routing (CIDR). If you want to learn about the old-school Class A/B/C system, you could view this &lt;a href=&quot;/channel/classful_addressing&quot;&gt;channel post&lt;/a&gt;. CIDR allows us to slice IP space at any bit boundary, creating subnets of any size to fit the need perfectly.&lt;/p&gt;
&lt;p&gt;The CIDR notation is nothing fancy; if you are not familiar with it, here is a quick refresher:&lt;/p&gt;
&lt;p&gt;An IP address is a 32-bit number (IPv4) or a 128-bit number (IPv6). For simplicity, let&apos;s take IPv4 as the example. What if we want to represent a block of addresses (e.g. 10.0.0.0 to 10.0.0.255)? We could write it as 10.0.0.*, of course, but this fails for cases such as 10.0.0.0 to 10.0.0.127. Instead, we use CIDR notation: &lt;code&gt;10.0.0.0/24&lt;/code&gt; for the first example and &lt;code&gt;10.0.0.0/25&lt;/code&gt; for the second. The &lt;code&gt;/24&lt;/code&gt; or &lt;code&gt;/25&lt;/code&gt; indicates how many bits are fixed as the network portion of the address. If you are still confused, let&apos;s write it in binary:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;10.0.0.0/24 means:
00001010.00000000.00000000.00000000
|----- 24 bits fixed -----| Hosts |

The first 24 bits define the network, fixed.
The last 8 bits are free for host addresses.
This gives us 2^8 = 256 addresses (10.0.0.0 to 10.0.0.255).
We write `/24` because 24 bits are fixed for the network.

10.0.0.0/25 means:
00001010.00000000.00000000.0|0000000
|------ 25 bits fixed ------| Hosts |

The first 25 bits define the network, fixed.
The last 7 bits are free for host addresses.
This gives us 2^7 = 128 addresses (10.0.0.0 to 10.0.0.127).
We write `/25` because 25 bits are fixed for the network.
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The beauty of CIDR is its flexibility. You can carve up address space at any bit boundary, creating subnets of exactly the size you need. A &lt;code&gt;/30&lt;/code&gt; gives you 4 addresses, while a &lt;code&gt;/22&lt;/code&gt; gives you $2^{32-22}=1024$ addresses. This efficiency is what allows the internet to scale beyond the rigid Class A/B/C system. Similarly, for IPv6, the total bit length is 128 bits, and a very common LAN subnet size is &lt;code&gt;/64&lt;/code&gt;.&lt;/p&gt;
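&lt;p&gt;If you want to poke at CIDR blocks yourself, Go&apos;s standard library already does the mask arithmetic. A small sketch (the addresses below are arbitrary examples):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-go&quot;&gt;package main

import (
	&quot;fmt&quot;
	&quot;net&quot;
)

func main() {
	// 10.0.0.0/25 covers 10.0.0.0 - 10.0.0.127.
	_, block, _ := net.ParseCIDR(&quot;10.0.0.0/25&quot;)

	fmt.Println(block.Contains(net.ParseIP(&quot;10.0.0.100&quot;))) // true
	fmt.Println(block.Contains(net.ParseIP(&quot;10.0.0.200&quot;))) // false

	// Mask.Size() returns the fixed (network) bits and the total bits.
	ones, bits := block.Mask.Size()
	fmt.Printf(&quot;%d host addresses\n&quot;, 1&lt;&lt;(bits-ones)) // 2^7 = 128
}
&lt;/code&gt;&lt;/pre&gt;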
&lt;p&gt;Now the practical question: &lt;strong&gt;who hands out addresses to hosts?&lt;/strong&gt; This is where DHCP and SLAAC come in, and they solve &lt;em&gt;slightly different problems&lt;/em&gt;.&lt;/p&gt;
&lt;h3&gt;DHCP (IPv4)&lt;/h3&gt;
&lt;p&gt;In IPv4, the default model is &lt;strong&gt;DHCP (Dynamic Host Configuration Protocol)&lt;/strong&gt;: a server “leases” configuration to clients for a limited time.&lt;/p&gt;
&lt;p&gt;What the client gets is usually more than just an IP:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;IPv4 address&lt;/li&gt;
&lt;li&gt;Subnet mask&lt;/li&gt;
&lt;li&gt;Default gateway&lt;/li&gt;
&lt;li&gt;DNS servers&lt;/li&gt;
&lt;li&gt;Lease time (and other options)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The handshake you’ll often see described is &lt;strong&gt;DORA&lt;/strong&gt;:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Discover&lt;/strong&gt;: client broadcasts “is there a DHCP server?”&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Offer&lt;/strong&gt;: server offers an address + options&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Request&lt;/strong&gt;: client requests that offer&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Ack&lt;/strong&gt;: server confirms (lease is now active)&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Two important operational details are easy to miss:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Leases expire and renew.&lt;/strong&gt; Clients try to renew partway through the lease (often around 50%), and if renewal fails they can “rebind” by broadcasting to any DHCP server.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;DHCP works across subnets using relays.&lt;/strong&gt; Broadcasts don’t cross routers, so networks commonly deploy a &lt;strong&gt;DHCP relay&lt;/strong&gt; (often on the router) that forwards DHCP requests to a centralized server.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This “stateful” model is simple for admins: one place to audit, reserve static leases, and manage options.&lt;/p&gt;
&lt;h3&gt;SLAAC (IPv6)&lt;/h3&gt;
&lt;p&gt;IPv6 introduced &lt;strong&gt;SLAAC (Stateless Address Autoconfiguration)&lt;/strong&gt; because the world needed to number &lt;em&gt;billions&lt;/em&gt; of devices without a central server tracking every single assignment.&lt;/p&gt;
&lt;p&gt;In a SLAAC environment, the router doesn’t hand out individual addresses. Instead, it periodically sends &lt;strong&gt;Router Advertisements (RA)&lt;/strong&gt; (part of NDP/ICMPv6) that say:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;“Here is the &lt;strong&gt;prefix&lt;/strong&gt; for this link (often &lt;code&gt;/64&lt;/code&gt;).”&lt;/li&gt;
&lt;li&gt;“Here is the default gateway.”&lt;/li&gt;
&lt;li&gt;“Here are timing parameters (valid lifetime, preferred lifetime).”&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Each host then creates its own address by combining:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;the advertised &lt;strong&gt;prefix&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;a self-generated &lt;strong&gt;Interface ID&lt;/strong&gt; (lower 64 bits)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That Interface ID is &lt;em&gt;not&lt;/em&gt; always derived from the MAC address, due to privacy concerns. Modern OSes commonly use temporary, randomized addresses that rotate over time to reduce tracking. Before a host starts using the address, it runs &lt;strong&gt;Duplicate Address Detection (DAD)&lt;/strong&gt; to ensure no one else on the link is already using it.&lt;/p&gt;
&lt;p&gt;So SLAAC is “stateless” in the sense that &lt;em&gt;no server maintains a lease database of host addresses&lt;/em&gt;. The network advertises the prefix; hosts pick their own. (It’s also common to mix SLAAC + DHCPv6: SLAAC for the address, DHCPv6 for DNS/search domains, depending on network policy.)&lt;/p&gt;
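&lt;p&gt;The &quot;advertised prefix + self-generated Interface ID&quot; combination is easy to mimic in code. Here is a Go sketch that joins a /64 prefix (a documentation prefix, chosen arbitrarily) with random lower 64 bits, roughly in the spirit of RFC 4941 temporary addresses; real SLAAC also involves Router Advertisements and DAD, which are not shown:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-go&quot;&gt;package main

import (
	&quot;crypto/rand&quot;
	&quot;fmt&quot;
	&quot;net/netip&quot;
)

func main() {
	// The prefix a router would advertise in an RA (example value).
	prefix := netip.MustParsePrefix(&quot;2001:db8:abcd:12::/64&quot;)

	// Upper 64 bits come from the prefix...
	addr := prefix.Addr().As16()

	// ...and the host fills the lower 64 bits (the Interface ID) itself.
	if _, err := rand.Read(addr[8:]); err != nil {
		panic(err)
	}

	fmt.Println(netip.AddrFrom16(addr)) // e.g. 2001:db8:abcd:12:xxxx:... (random)
}
&lt;/code&gt;&lt;/pre&gt;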
&lt;h3&gt;DHCPv6 Prefix Delegation (PD)&lt;/h3&gt;
&lt;p&gt;Here’s the missing piece that makes home IPv6 feel different from home IPv4. With IPv4, home networks often relied on &lt;strong&gt;NAT&lt;/strong&gt;: your router got &lt;em&gt;one&lt;/em&gt; public IP and hid everyone else behind it. In IPv6, NAT is discouraged: while it IS possible to set up NAT66 on OpenWRT or similar routers, it is not common practice (I mean, IPv6 has enough addresses, why bother?).&lt;/p&gt;
&lt;p&gt;IPv6 aims to restore &lt;strong&gt;end-to-end addressing&lt;/strong&gt;, so your home router needs not just one address, but a &lt;strong&gt;block&lt;/strong&gt; large enough to carve into multiple &lt;code&gt;/64&lt;/code&gt; LANs (guest Wi‑Fi, IoT VLAN, etc). That’s what PD is for.&lt;/p&gt;
&lt;p&gt;Think of it as a two-level process:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;ISP -&gt; Router (DHCPv6-PD):&lt;/strong&gt; your router requests a prefix using an &lt;strong&gt;IA_PD&lt;/strong&gt; (Identity Association for Prefix Delegation). The ISP “delegates” something like a &lt;code&gt;/56&lt;/code&gt; or &lt;code&gt;/60&lt;/code&gt; to your router.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Router -&gt; LAN Hosts (SLAAC via RA):&lt;/strong&gt; your router takes that delegated block, selects one &lt;code&gt;/64&lt;/code&gt; per LAN segment, and advertises each &lt;code&gt;/64&lt;/code&gt; via &lt;strong&gt;Router Advertisements&lt;/strong&gt;. Hosts then self-assign addresses using SLAAC.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;Routing&lt;/h2&gt;
&lt;p&gt;So how does a packet find its way across the ocean? Routers make decisions based on a rule called Longest Prefix Match (LPM). A router doesn&apos;t memorize every single IP address on earth. Instead, it memorizes &quot;prefixes&quot; (recall the CIDR discussion above). If a packet matches multiple entries in the routing table, the router always picks the most specific one (e.g. if both a /24 and a /22 match, it picks the /24 route).&lt;/p&gt;
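&lt;p&gt;Here is a toy Go sketch of that rule: a linear scan that keeps the longest matching prefix. (Real routers use tries or TCAM for speed; the routes and next hops below are invented for illustration.)&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-go&quot;&gt;package main

import (
	&quot;fmt&quot;
	&quot;net/netip&quot;
)

func main() {
	// A toy routing table: prefix -&gt; next hop.
	routes := map[netip.Prefix]string{
		netip.MustParsePrefix(&quot;10.0.0.0/22&quot;): &quot;via router A&quot;,
		netip.MustParsePrefix(&quot;10.0.1.0/24&quot;): &quot;via router B&quot;,
	}

	dst := netip.MustParseAddr(&quot;10.0.1.42&quot;)

	best, nextHop := netip.Prefix{}, &quot;default route&quot;
	for p, hop := range routes {
		// Keep the most specific (longest) matching prefix.
		if p.Contains(dst) &amp;#x26;&amp;#x26; p.Bits() &gt; best.Bits() {
			best, nextHop = p, hop
		}
	}
	fmt.Println(dst, &quot;goes&quot;, nextHop) // both prefixes match; the /24 wins
}
&lt;/code&gt;&lt;/pre&gt;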
&lt;p&gt;But &lt;strong&gt;where do these tables come from&lt;/strong&gt;? This is the domain of routing protocols, which fall into two distinct families based on their scope: IGPs and BGP.&lt;/p&gt;
&lt;h3&gt;IGP&lt;/h3&gt;
&lt;p&gt;Interior Gateway Protocols (IGP) operate within a single AS (Autonomous System).&lt;/p&gt;
&lt;p&gt;Remember: we view the whole Internet as a network of networks. It is not a centralized system, but rather a federated system. Within each small network we may have our own routing policies. To manage this complexity, the concept of an Autonomous System (AS) is introduced.&lt;/p&gt;
&lt;p&gt;An Autonomous System (AS) is a collection of IP networks and routers under the control of a single organization that presents a common routing policy to the internet. Each AS is assigned a unique AS number (ASN) for identification.&lt;/p&gt;
&lt;p&gt;There are 2 major IGP protocols:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Distance-Vector Protocols&lt;/li&gt;
&lt;li&gt;Link-State Protocols&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It is pretty common to have no idea what these two families are about. Before I dive deeper into them, let me give you a quick summary:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Distance-Vector Protocols build their view of the network by periodically receiving their neighbors&apos; routing tables.&lt;/li&gt;
&lt;li&gt;Link-State Protocols build a full view of the network by flooding link-state advertisements across it, then computing routes locally.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Maybe the above explanation is still too dense 😢 and the point hasn&apos;t clicked yet. Don&apos;t worry; I hope the details below will make it click.&lt;/p&gt;
&lt;h4&gt;Distance-Vector Protocols&lt;/h4&gt;
&lt;p&gt;Let’s first look at a picture to understand how distance-vector protocols work:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://20051110.xyz/_astro/distance_vector.DjxJHieC_14mT77.webp&quot; alt=&quot;Distance Vector&quot;&gt;&lt;/p&gt;
&lt;p&gt;In distance-vector protocols, each router maintains a table (vector) that lists the best known distance to each destination and the next hop to reach that destination. Periodically, each router shares its table with its immediate neighbors. Formally, we can state the update process as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;If you hear about a path to some destination, update the table if:
&lt;ul&gt;
&lt;li&gt;You don&apos;t have a path to that destination yet, or&lt;/li&gt;
&lt;li&gt;The new path is shorter than your current known path.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Tell your neighbors about your updated table.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This process is exactly what we stated above: &quot;receiving their neighbors&apos; routing tables periodically&quot;. A router computes &quot;how far is each destination&quot; as &quot;the neighbor&apos;s advertised distance&quot; + &quot;the cost to reach that neighbor&quot;.&lt;/p&gt;
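&lt;p&gt;Here is a compact Go sketch of that update rule, on an invented three-router topology (A-B costs 1, B-C costs 2, A-C costs 10). Each &quot;round&quot; stands in for one periodic exchange of tables between neighbors:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-go&quot;&gt;package main

import &quot;fmt&quot;

func main() {
	const INF = 1 &lt;&lt; 30
	routers := []string{&quot;A&quot;, &quot;B&quot;, &quot;C&quot;}
	// Direct link costs (undirected).
	links := map[[2]string]int{
		{&quot;A&quot;, &quot;B&quot;}: 1, {&quot;B&quot;, &quot;C&quot;}: 2, {&quot;A&quot;, &quot;C&quot;}: 10,
	}
	cost := func(a, b string) (int, bool) {
		if c, ok := links[[2]string{a, b}]; ok {
			return c, true
		}
		c, ok := links[[2]string{b, a}]
		return c, ok
	}

	// dist[r][d] = r's best known cost to reach d.
	dist := map[string]map[string]int{}
	for _, r := range routers {
		dist[r] = map[string]int{}
		for _, d := range routers {
			dist[r][d] = INF
		}
		dist[r][r] = 0
	}

	// Each round, every router relaxes its table using its neighbors' tables.
	for round := 0; round &lt; len(routers); round++ {
		for _, r := range routers {
			for _, n := range routers {
				c, adjacent := cost(r, n)
				if !adjacent {
					continue
				}
				for _, d := range routers {
					if c+dist[n][d] &lt; dist[r][d] {
						dist[r][d] = c + dist[n][d] // shorter path via n
					}
				}
			}
		}
	}
	fmt.Println(dist[&quot;A&quot;]) // A reaches C at cost 3 (via B), not 10
}
&lt;/code&gt;&lt;/pre&gt;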
&lt;p&gt;But as you might notice, this process has some problems. The major problem is: if a link goes down, it can take a long time for all routers to realize that the path is no longer valid and find the new optimal path.&lt;/p&gt;
&lt;p&gt;One simple optimization that addresses this problem is the &lt;code&gt;poison packets&lt;/code&gt; mechanism. When a router detects that a link is down, it immediately tells its neighbors that the distance to the affected destination is infinite (or some very large number). But poisoning also introduces new problems of its own (which I don&apos;t want to cover here). If you want the final algorithm with poison packets added, here it is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;If you hear an advertisement for that destination, update the table and reset the timer if:
&lt;ul&gt;
&lt;li&gt;The destination isn&apos;t in the table, or&lt;/li&gt;
&lt;li&gt;The advertised cost + link cost is better than the best-known cost, or&lt;/li&gt;
&lt;li&gt;The advertisement is from the current next-hop (includes poison advertisements).&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Advertise updates to neighbors when the table changes, and periodically.
&lt;ul&gt;
&lt;li&gt;Don’t advertise back to the next-hop (split horizon), &lt;strong&gt;or&lt;/strong&gt; advertise poison back (poison reverse).&lt;/li&gt;
&lt;li&gt;Any cost ≥ a threshold (e.g., 16 in RIP) is treated as infinity.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;If a table entry expires, mark it poison and advertise it.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;Link-State Protocols&lt;/h4&gt;
&lt;p&gt;Distance-vector spreads &lt;em&gt;results&lt;/em&gt; (“here’s my best distance”). Link-state spreads &lt;em&gt;facts&lt;/em&gt; (“here’s what I’m directly connected to”). See the picture below:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://20051110.xyz/_astro/link_state.DOb10hgh_Zi6Kcf.webp&quot; alt=&quot;Link State&quot;&gt;&lt;/p&gt;
&lt;p&gt;In a link-state protocol (OSPF / IS-IS), each router does three big things (the last step is sketched in Go after this list):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Discover neighbors&lt;/strong&gt;&lt;br&gt;
Routers exchange hello messages to find adjacent routers and form neighbor relationships.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Flood link-state advertisements&lt;/strong&gt;&lt;br&gt;
Each router advertises the state and cost of its &lt;em&gt;local links&lt;/em&gt; (e.g., “I have a link to R2 with cost 10”).&lt;br&gt;
These LSAs are &lt;strong&gt;flooded&lt;/strong&gt; (forwarded onward) until convergence.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Run Dijkstra&lt;/strong&gt;&lt;br&gt;
Once everyone has the same topology database, each router independently runs &lt;strong&gt;Dijkstra&lt;/strong&gt; to compute a shortest-path tree rooted at itself.&lt;br&gt;
The result becomes the router’s forwarding entries (“to reach prefix X, next hop is Y”).&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
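&lt;p&gt;For the third step, here is a minimal Go sketch of Dijkstra over a flooded topology database. The three-router topology and costs are invented, and real OSPF implementations use priority queues and carry far more state; this is just the core idea:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-go&quot;&gt;package main

import &quot;fmt&quot;

func main() {
	// The topology database all routers converge on: node -&gt; neighbor -&gt; cost.
	topo := map[string]map[string]int{
		&quot;R1&quot;: {&quot;R2&quot;: 10, &quot;R3&quot;: 1},
		&quot;R2&quot;: {&quot;R1&quot;: 10, &quot;R3&quot;: 2},
		&quot;R3&quot;: {&quot;R1&quot;: 1, &quot;R2&quot;: 2},
	}

	const INF = 1 &lt;&lt; 30
	src := &quot;R1&quot; // each router roots the computation at itself
	dist := map[string]int{}
	for n := range topo {
		dist[n] = INF
	}
	dist[src] = 0
	visited := map[string]bool{}

	for range topo {
		// Pick the closest unvisited node (simple O(V^2) Dijkstra).
		u, best := &quot;&quot;, INF
		for n, d := range dist {
			if !visited[n] &amp;#x26;&amp;#x26; d &lt; best {
				u, best = n, d
			}
		}
		if u == &quot;&quot; {
			break
		}
		visited[u] = true
		for v, c := range topo[u] {
			if dist[u]+c &lt; dist[v] {
				dist[v] = dist[u] + c
			}
		}
	}
	fmt.Println(dist) // R1 reaches R2 at cost 3 (via R3), not 10
}
&lt;/code&gt;&lt;/pre&gt;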
&lt;p&gt;This approach has several advantages over distance-vector:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Faster convergence:&lt;/strong&gt; link-state can react to changes more quickly since routers have a full view of the topology.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Routing Logic is consistent:&lt;/strong&gt; all routers compute the same shortest-path tree, it is less likely to have routing loops.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;However, there are tradeoffs. The major one is that it consumes &lt;strong&gt;more CPU/RAM&lt;/strong&gt;: maintaining a topology database and running Dijkstra is heavier than basic distance-vector.&lt;/p&gt;
&lt;h3&gt;BGP&lt;/h3&gt;
&lt;p&gt;IGPs are about “best path by metric” inside one organization. &lt;strong&gt;BGP is about policy&lt;/strong&gt; across organizations. Note that the network is federated: no one entity controls the whole Internet. Each AS has its own policies about which routes to accept, prefer, or advertise. Transferring packets across AS boundaries requires obeying these policies.&lt;/p&gt;
&lt;p&gt;BGP (Border Gateway Protocol) is the Internet’s inter-domain routing protocol. It’s often described as a &lt;strong&gt;path-vector&lt;/strong&gt; protocol:&lt;/p&gt;
&lt;p&gt;Remember that in a link-state protocol, each router floods the network with link-state advertisements to build a complete topology map. From a privacy perspective, this is not acceptable for BGP: each AS wants to keep its internal topology &amp;#x26; customers private. Instead, BGP shares only reachability information (which prefixes can be reached) along with path attributes, without revealing the internal structure of the AS.&lt;/p&gt;
&lt;p&gt;At a high level, the global Internet is many ASes (ISPs, cloud providers, enterprises, universities). Each AS can run its own IGP internally (OSPF/IS-IS/etc.). At the edges, ASes use &lt;strong&gt;BGP&lt;/strong&gt; to exchange which prefixes they can reach.&lt;/p&gt;
&lt;p&gt;For more details about &lt;code&gt;peering &amp;#x26; transit&lt;/code&gt;, &lt;code&gt;iBGP &amp;#x26; eBGP&lt;/code&gt;, and &lt;code&gt;hot-potato routing&lt;/code&gt;, please check my short talk about &lt;a href=&quot;/src/assets/teaching/slides/Route.pdf&quot;&gt;How Computer Networks Route Your Packets&lt;/a&gt;. I&apos;ll omit those details here.&lt;/p&gt;</content:encoded></item><item><title>How to Make Hot Chocolate in Your Dorm (WIP)</title><link>https://20051110.xyz/blog/hot-chocolate</link><guid isPermaLink="true">https://20051110.xyz/blog/hot-chocolate</guid><description>On a cold night, enjoy the warmth and sweetness of a homemade hot chocolate.</description><pubDate>Mon, 29 Dec 2025 21:27:55 GMT</pubDate><content:encoded>&lt;p&gt;This follows up on this Channel post: &lt;a href=&quot;/channel/hot_chocolate&quot;&gt;How to make hot chocolate in your dorm&lt;/a&gt;. Here I&apos;m updating the complete steps for making hot chocolate (complete? I&apos;m still exploring) along with some tips.&lt;/p&gt;
&lt;h2&gt;Choosing a container&lt;/h2&gt;
&lt;p&gt;Making hot chocolate in a dorm requires a microwave-safe mug or bowl. Make sure it is big enough to hold the drink plus room for stirring.&lt;/p&gt;
&lt;h3&gt;Caveats&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Use a microwave-safe container and avoid plastic cups; never, ever use a metal container! That will wreck the microwave!&lt;/li&gt;
&lt;li&gt;Can you use those disposable coffee cups / oden cups? The inner wall of almost every paper cup has a plastic lining. If it is PE-lined (the most common), it tolerates only about 90-100°C; microwaving milk easily produces local hot spots that melt the coating, treating you to a &quot;plastic-flavored&quot; hot chocolate and possibly some harmful substances. A PP lining tolerates around 120°C and is relatively safe, but such cups cost more and convenience stores don&apos;t necessarily use them.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Given all that, my top recommendation remains a ceramic or glass mug (ideally one whose product page explicitly says &quot;microwave safe&quot;, since those tend to have thicker walls that won&apos;t crack). I bought one on PDD for ¥12.1 shipped, stirring spoon included. Don&apos;t pinch pennies here; spending a little for peace of mind is what matters most.&lt;/p&gt;
&lt;h3&gt;How to wash the container?&lt;/h3&gt;
&lt;p&gt;Dorm dwellers rarely own dish soap (nobody cooks, after all), let alone baking soda or white vinegar (this isn&apos;t a chemistry lab 🤣). For a freshly bought mug, if the bacteria / leftover dust really worry you, consider cleaning it with toothpaste.&lt;/p&gt;
&lt;p&gt;Before I learned the &quot;toothpaste method&quot; I also just soaked mugs in boiling water. Soaking alone gets you 90% of the way there (mainly sterilizing and deodorizing), but a truly thorough clean (removing oil and fine dust) still needs toothpaste. Why?&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Boiling water does kill the vast majority of surface bacteria; no argument there.
But a new mug may carry a thin layer of protective wax or industrial oil film. Boiling water can melt it, yet if the mug just sits soaking, the film floats on the surface and re-coats the walls as you pour the water out, so effectively nothing was removed.
Also, fine dust held on by static is hard to dislodge by soaking alone; it needs physical scrubbing.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Given those two points, the toothpaste method is the simplest effective option. Recalling high-school chemistry, toothpaste contains abrasives (calcium carbonate or silica) and surfactants, so it cleans powerfully and leaves a fresh scent. Better yet, toothpaste is meant to go into your mouth, so it is fairly safe; even if you don&apos;t rinse perfectly, the residue won&apos;t harm you.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Rinse the mug with water first.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Squeeze out a soybean-sized dab of toothpaste onto the walls and bottom.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Rub it all over the inside with a finger (or a clean toothbrush / face towel), especially the rim where your lips touch.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;You will feel the friction; that is the toothpaste carrying away the grime. When done, rinse thoroughly with water.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Mugs cleaned this way come out genuinely spotless, I promise. I suggest combining the toothpaste method with the hot-water soak: toothpaste first, then a boiling-water soak (I did two rounds, each time filling with boiling water to near overflowing, waiting for the 90-something°C water to cool to about 40°C, then pouring it out).&lt;/p&gt;
&lt;p&gt;After this two-step treatment the mug is extremely clean and ready to use with confidence.&lt;/p&gt;
&lt;h2&gt;Choosing ingredients&lt;/h2&gt;
&lt;p&gt;For this experiment I used the following ingredients:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Plain milk (whole milk is better; nicer mouthfeel). 200ml single-serve cartons.&lt;/li&gt;
&lt;li&gt;Chocolate (I used 85% cocoa dark chocolate). You can substitute milk chocolate (then reduce the sugar accordingly) or cocoa powder (then add extra sugar and fat).&lt;/li&gt;
&lt;li&gt;Sugar (adjust to taste). More on this shortly.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;AI also suggested plenty of flavor-boosting extras: cinnamon / vanilla extract / marshmallows / whipped cream / mint syrup and so on. Given the limits of dorm life, I stuck to the most basic ingredients this time.&lt;/p&gt;
&lt;h3&gt;Which sugar?&lt;/h3&gt;
&lt;p&gt;A dorm is unlikely to have white granulated sugar lying around, and any sugar that is there is probably brown (which I don&apos;t think suits hot cocoa). Better to buy some. On e-commerce platforms your options form four quadrants:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;&lt;/th&gt;&lt;th&gt;Granulated sugar&lt;/th&gt;&lt;th&gt;Sugar cubes&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;White sugar&lt;/td&gt;&lt;td&gt;White / caster sugar&lt;/td&gt;&lt;td&gt;Coffee sugar cubes&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Brown sugar&lt;/td&gt;&lt;td&gt;Brown / yellow sugar&lt;/td&gt;&lt;td&gt;Yellow sugar cubes&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Here is my buying advice:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;If you need very little (say, you&apos;re making a single cup of hot chocolate), don&apos;t buy any; just mooch off McDonald&apos;s. Put on a brave face and ask the staff for a few sugar packets; it is completely fine. I tried twice (at McDonald&apos;s both on and off campus) and the staff were happy to hand them over 🍬. On quantity, spoiler: with 200mL of milk + 85% dark chocolate, 12g of sugar is sweet enough (4 packets is definitely too much! I&apos;ve tried). With milk chocolate, reduce further. McDonald&apos;s packets are 4g each, so 3 packets will do.&lt;/li&gt;
&lt;li&gt;If you need a lot, granulated sugar is the better buy. It dissolves in milk more readily (cubes take time to melt). It isn&apos;t necessarily cheaper than cubes, but it is more practical. Comparing my purchases: I bought 25 packets of 5g &quot;Taikoo golden coffee sugar&quot; at ¥0.22 per packet, ¥5.5 total shipped, a real bargain; &lt;del&gt;at that price the shipping alone must eat their margin&lt;/del&gt;. By contrast, three 250g boxes of cubes cost ¥20.0 shipped, a better deal per gram, but you&apos;ll never make that much hot chocolate 🤣.&lt;/li&gt;
&lt;li&gt;White or brown sugar?
&lt;ul&gt;
&lt;li&gt;White / caster sugar: pure sweetness that won&apos;t interfere with the chocolate&apos;s flavor.&lt;/li&gt;
&lt;li&gt;Yellow sugar: carries a caramel note that adds complexity to the hot chocolate. Try it if you like that sort of thing; I think it is great. White sugar is just &quot;plain sweet&quot;, so why not a sugar with character?&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Which chocolate?&lt;/h3&gt;
&lt;p&gt;If you like an intense chocolate flavor, go for dark chocolate at 70% cocoa or above. If you prefer it sweeter, milk chocolate (roughly 30%-40% cocoa) works; note that milk chocolate carries more sugar, so use less added sugar when making the drink.&lt;/p&gt;
&lt;p&gt;Also note: since milk chocolate has less cocoa, if you later want a &quot;more intense flavor&quot;, increase the amount of chocolate a bit accordingly.&lt;/p&gt;
&lt;h2&gt;Experiment: Attempt 1&lt;/h2&gt;
&lt;h3&gt;Planned steps&lt;/h3&gt;
&lt;h4&gt;Making the &quot;chocolate paste&quot;&lt;/h4&gt;
&lt;ol&gt;
&lt;li&gt;Break it up: snap the chocolate bar into pieces, the smaller the better. Suggested ratio: about 20g of chocolate per 200mL of milk.&lt;/li&gt;
&lt;li&gt;Add a splash of milk, just barely covering the chocolate pieces. Add the sugar at this point too.&lt;/li&gt;
&lt;li&gt;First heating: microwave at medium-high power for 30-40 seconds.&lt;/li&gt;
&lt;li&gt;Stir to emulsify: take it out. The chocolate may not look melted yet, but the milk is hot. Stir continuously with a spoon, letting the residual heat melt the chocolate completely into a thick chocolate paste.&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;If stubborn lumps remain, heat for another 10 seconds. Keep stirring until it is smooth with no graininess.&lt;/li&gt;
&lt;li&gt;Why add the sugar at this step? Heated together, the sugar blends into the milk better and develops a kind of &quot;cooked milk&quot; aroma. Sugar added after heating still dissolves, but the flavors don&apos;t marry as well.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;Incorporating the milk&lt;/h4&gt;
&lt;ol&gt;
&lt;li&gt;Top up with milk: pour the remaining milk into the chocolate paste (leave a little headroom against boil-over).&lt;/li&gt;
&lt;li&gt;Mix: stir briefly so the paste and milk combine.&lt;/li&gt;
&lt;li&gt;Second heating: microwave on high for 1 minute to 1 minute 30 seconds.&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;The actual experiment&lt;/h3&gt;
&lt;p&gt;&lt;img src=&quot;https://20051110.xyz/_astro/0.B1LVnnTd_q0h5N.webp&quot; alt=&quot;Chocolate portion&quot;&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;As shown, a full bar is 100g of chocolate; I broke off roughly 20g into the mug.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;img src=&quot;https://20051110.xyz/_astro/0.5.D9lRomDl_ZDX0L5.webp&quot; alt=&quot;Adding milk&quot;&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Note: here came the first problem. I still hadn&apos;t broken the chocolate up enough; the chunks were far too big, and as a result they didn&apos;t melt particularly well later!&lt;/strong&gt;&lt;/p&gt;
&lt;ol start=&quot;2&quot;&gt;
&lt;li&gt;
&lt;p&gt;Add a splash of milk. Do not fill it up! Pour just enough to barely cover the chocolate pieces (about 30-50mL). Here I filled only about 1/4 of my mug&apos;s height.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Microwave on high for 1 minute. (&lt;strong&gt;That is what I did. Don&apos;t copy it! One minute is far, far too long!&lt;/strong&gt;) The milk boiled over! (I did wipe the microwave down afterwards, but still: watch out for boil-over!)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Take it out and stir. The chocolate still hadn&apos;t fully melted! (&lt;strong&gt;Here came problem number two!&lt;/strong&gt;) I think the chunks were simply too big to melt completely. Note: even when the chocolate looks &quot;seemingly melted&quot; like the picture below, keep stirring! There are still chunks in there! (See the next two pictures.)&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The chocolate paste that looked done:
&lt;img src=&quot;https://20051110.xyz/_astro/1.QdXpLtJ0_Z2osEsW.webp&quot; alt=&quot;Seemingly well stirred&quot;&gt;&lt;/p&gt;
&lt;p&gt;But in fact tiny chocolate bits remained (I only discovered this near the last sip):
&lt;img src=&quot;https://20051110.xyz/_astro/2.Cs3Ljb54_4yFnY.webp&quot; alt=&quot;Tiny chocolate bits remain&quot;&gt;&lt;/p&gt;
&lt;p&gt;Please ignore how extremely dirty the rim looks in the photo above 🤣. If it conjures up any unpleasant imagery, don&apos;t dwell on it; just trust that this cup of hot chocolate was delicious 😋.&lt;/p&gt;
&lt;h3&gt;Post-experiment reflections&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;Break the chocolate as small as possible! Aim for roughly pea-sized pieces or smaller, to make sure they can melt completely.&lt;/li&gt;
&lt;li&gt;Keep the first heating short! Next time I plan to drop to medium-high power and cut the time to 20-30 seconds.&lt;/li&gt;
&lt;li&gt;Stir thoroughly! Make sure the chocolate is fully melted with no graininess.&lt;/li&gt;
&lt;li&gt;Moderate the sugar! This time I added 4 McDonald&apos;s packets (16g) to 200mL of milk, and it came out too sweet! Next time I&apos;ll cut back to 3 packets (12g). A sugar-to-milk ratio of about 6% seems right.&lt;/li&gt;
&lt;li&gt;Because I was busy cleaning the microwave, I skipped the &quot;second heating to incorporate the milk&quot; step this time. I&apos;ll try it in the next experiment and see how it goes; and of course, I&apos;ll be careful not to let the milk boil over 😅.&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;Washing the cup&lt;/h3&gt;
&lt;p&gt;Again, I highly recommend the &quot;toothpaste method&quot; for washing the cup. After steeping in hot chocolate, the inside will keep some chocolate residue. Toothpaste + hand scrubbing gets it spotless in 30 seconds, far better than rinsing with water alone. Behold:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://20051110.xyz/_astro/3.D5YnxTu0_1QME5W.webp&quot; alt=&quot;Cup before and after washing&quot;&gt;&lt;/p&gt;</content:encoded></item><item><title>What is a Socket?</title><link>https://20051110.xyz/blog/socket</link><guid isPermaLink="true">https://20051110.xyz/blog/socket</guid><description>Understanding Sockets in Go Networking and the Underlying OS Mechanics</description><pubDate>Sat, 06 Dec 2025 21:57:00 GMT</pubDate><content:encoded>&lt;p&gt;If you’ve spent any time learning network programming in Go, you’ve likely marveled at how simple the &lt;code&gt;net&lt;/code&gt; package is. With just three lines of code, you can create a performant TCP server. I mean&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-go&quot;&gt;ln, _ := net.Listen(&quot;tcp&quot;, &quot;:8080&quot;)
for {
    conn, _ := ln.Accept()
    go handle(conn)
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;But this simplicity often hides the mechanics. You know &lt;em&gt;how&lt;/em&gt; to use &lt;code&gt;net.Dial&lt;/code&gt; or &lt;code&gt;net.Listen&lt;/code&gt;, but do you know what a &quot;socket&quot; actually is?&lt;/p&gt;
&lt;p&gt;Here is what I learned about file descriptors, the OS kernel, and why your listener socket is &quot;blind.&quot;&lt;/p&gt;
&lt;h2&gt;The Socket is Just a File&lt;/h2&gt;
&lt;p&gt;In Unix-like operating systems (Linux, macOS), there is a golden rule: &lt;strong&gt;Everything is a file.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;When you write this in Go:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-go&quot;&gt;conn, _ := net.Dial(&quot;tcp&quot;, &quot;google.com:80&quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You aren&apos;t magically holding a physical wire connected to Google. You are asking the Operating System to create a network endpoint. The OS does the heavy lifting in kernel memory and hands you back a simple integer, known as a &lt;strong&gt;File Descriptor&lt;/strong&gt;. Yeah, just like anything else you read or write on your computer: a file on a disk, or stdin/stdout.&lt;/p&gt;
&lt;p&gt;Your &lt;code&gt;net.Conn&lt;/code&gt; object is essentially a wrapper around that number.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;When you call &lt;code&gt;conn.Write()&lt;/code&gt;, you are writing bytes to a file buffer.&lt;/li&gt;
&lt;li&gt;When you call &lt;code&gt;conn.Read()&lt;/code&gt;, you are reading bytes from a file buffer.&lt;/li&gt;
&lt;li&gt;The OS kernel takes care of actually pushing that data across the physical wires.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Why Do Listeners Create New Sockets?&lt;/h2&gt;
&lt;p&gt;When I first tried to write my SOCKS5 server in Go, the case below was the most confusing part. Look at this standard Go pattern:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-go&quot;&gt;ln, _ := net.Listen(&quot;tcp&quot;, &quot;:8080&quot;)

for {
    conn, _ := ln.Accept()
    go handle(conn)
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;So why does &lt;code&gt;ln.Accept()&lt;/code&gt; return a &lt;em&gt;new&lt;/em&gt; connection (&lt;code&gt;conn&lt;/code&gt;)? Why doesn&apos;t it just use the &lt;code&gt;ln&lt;/code&gt; object to talk to the client?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The Answer: Concurrency.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Think of your server as a busy hotel. If the Receptionist had to personally escort every guest to their room and stay there to chat, the front desk would be empty. No new guests could check in.&lt;/p&gt;
&lt;p&gt;By design, the OS separates these roles. The Listener stays bound to Port 8080. When a Client arrives, the Listener performs the handshake, creates a &lt;em&gt;new&lt;/em&gt; file descriptor for that specific conversation, and immediately goes back to watching the door for the next guest.&lt;/p&gt;
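&lt;p&gt;To make the pattern concrete, here is a complete, runnable version of the loop above. The snippets leave &lt;code&gt;handle&lt;/code&gt; undefined, so I&apos;ve filled it in with the simplest possible service, an echo server; that choice is mine, purely for illustration:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-go&quot;&gt;package main

import (
	&quot;io&quot;
	&quot;log&quot;
	&quot;net&quot;
)

// handle owns exactly one conversation: it echoes bytes
// back until the client hangs up.
func handle(conn net.Conn) {
	defer conn.Close()
	io.Copy(conn, conn)
}

func main() {
	ln, err := net.Listen(&quot;tcp&quot;, &quot;:8080&quot;)
	if err != nil {
		log.Fatal(err)
	}
	for {
		conn, err := ln.Accept() // a brand-new fd per client
		if err != nil {
			continue
		}
		go handle(conn) // the listener goes straight back to the door
	}
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Try it with &lt;code&gt;nc localhost 8080&lt;/code&gt; from two terminals at once: each client gets its own conversation, while &lt;code&gt;ln&lt;/code&gt; itself never talks to either of them.&lt;/p&gt;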
&lt;p&gt;The same thing happens if you create sockets in C. Implementing your server in C just takes slightly more steps: &lt;code&gt;socket()&lt;/code&gt;, &lt;code&gt;bind()&lt;/code&gt;, &lt;code&gt;listen()&lt;/code&gt;, and then &lt;code&gt;accept()&lt;/code&gt;. The &lt;code&gt;accept()&lt;/code&gt; call is what creates a new socket file descriptor for the specific client connection.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;int server_fd = socket(AF_INET, SOCK_STREAM, 0);
bind(server_fd, (struct sockaddr *)&amp;#x26;address, sizeof(address));
listen(server_fd, 3);
int new_socket = accept(server_fd, (struct sockaddr *)&amp;#x26;address, (socklen_t*)&amp;#x26;addrlen);
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;The Listener is Blind&lt;/h2&gt;
&lt;p&gt;A common misconception is that because the Listener creates the connection, it must &quot;see&quot; all the traffic. But is that true? For example, in the C code above, can you read data from &lt;code&gt;server_fd&lt;/code&gt;? Is something like this possible?&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;char buffer[1024] = {0};
read(server_fd, buffer, 1024); // Is this valid?
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If you tried to read data from the listening socket, what would happen? &lt;strong&gt;It would fail.&lt;/strong&gt; The listening socket is essentially &lt;strong&gt;blind&lt;/strong&gt; to data payloads. It only understands one thing: &lt;strong&gt;the handshake&lt;/strong&gt; (SYN packets).&lt;/p&gt;
&lt;p&gt;When a packet of data arrives at your server&apos;s IP, the Operating System does the following:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Is this a new handshake?&lt;/strong&gt; Send it to the &lt;strong&gt;Listener&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Is this data for an active conversation?&lt;/strong&gt; Look up the specific &lt;strong&gt;Child Socket&lt;/strong&gt; (that &lt;code&gt;conn&lt;/code&gt; object you got earlier) and send the data there.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The Listener is just a forwarder for new connections. It doesn&apos;t handle data itself.&lt;/p&gt;
&lt;h2&gt;Why Does This Matter?&lt;/h2&gt;
&lt;p&gt;Understanding that &lt;code&gt;Accept()&lt;/code&gt; generates a new, independent file descriptor is exactly why Go is so good at networking.&lt;/p&gt;
&lt;p&gt;Because the new connection (&lt;code&gt;conn&lt;/code&gt;) is completely decoupled from the listener (&lt;code&gt;ln&lt;/code&gt;), we can immediately hand &lt;code&gt;conn&lt;/code&gt; over to a Goroutine.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-go&quot;&gt;go handle(conn) // This runs in the background
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The main loop stays unblocked, the Listener stays at the front desk, and Go&apos;s runtime manages thousands of these &quot;Room Keys&quot; concurrently.&lt;/p&gt;</content:encoded></item><item><title>What Stops L1 Cache from Being Larger?</title><link>https://20051110.xyz/blog/l1cache</link><guid isPermaLink="true">https://20051110.xyz/blog/l1cache</guid><description>Have you ever wondered why L1 cache sizes are relatively small compared to L2 and L3 caches? Actually, they stopped growing a long time ago.</description><pubDate>Thu, 27 Nov 2025 14:55:00 GMT</pubDate><content:encoded>&lt;p&gt;Let&apos;s take a look at a modern CPU, the AMD Ryzen 7950X: (all the below CPU-Z images are from &lt;a href=&quot;https://valid.x86.fr&quot;&gt;valid.x86.fr&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://20051110.xyz/_astro/955htt.Cc7QM509_2bDzj.webp&quot; alt=&quot;Ryzen 7950X&quot;&gt;&lt;/p&gt;
&lt;p&gt;And you might wonder: huh, this looks normal. L1 cache is 32 + 32KB per core, L2 is 1MB per core, and L3 is 64MB shared. This seems reasonable, with each level being larger than the previous one.&lt;/p&gt;
&lt;p&gt;But what if I show you an older CPU, the Intel Core 2 Duo E8400 from 2008?&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://20051110.xyz/_astro/7s2acr.CEbABV53_Cx75I.webp&quot; alt=&quot;Core 2 Duo E8400&quot;&gt;&lt;/p&gt;
&lt;p&gt;Surprisingly, it has the same L1 cache size: 32 + 32KB per core, L2 is 6MB shared, and there is no L3 cache. This means that for over 15 years, L1 cache sizes have not increased at all! Why is that?&lt;/p&gt;
&lt;p&gt;We all want larger caches to reduce memory latency and improve performance. AMD even came up with its 3D V-Cache technology to stack more cache on top of existing cache dies. So, what stops L1 cache from being larger?&lt;/p&gt;
&lt;p&gt;This question came to mind while I was working through the virtual memory chapters of CS:APP (Computer Systems: A Programmer&apos;s Perspective). As you can see in the course slide below, modern CPUs use a &quot;cute trick&quot; to speed up L1 cache access (credit: &lt;a href=&quot;http://csapp.cs.cmu.edu/&quot;&gt;CS:APP3e Slide&lt;/a&gt;):&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://20051110.xyz/_astro/csapp.CWxw_0ph_c4mDO.webp&quot; alt=&quot;L1 Cache Cute Trick&quot;&gt;&lt;/p&gt;
&lt;p&gt;TL;DR of the trick is that L1 cache is &quot;&lt;strong&gt;Virtually Indexed, Physically Tagged&lt;/strong&gt;&quot;. It allows the processor to start looking for data in the L1 Cache before it has even finished translating the address from Virtual to Physical.&lt;/p&gt;
&lt;p&gt;Normally, if a CPU is built without this trick, accessing memory follows a strict sequence:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;The CPU translates the Virtual Address (VA) to a Physical Address (PA) using the TLB.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The CPU uses the Physical Address to check the L1 Cache.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This is safe, but slow, because step 2 cannot start until step 1 is finished. With the VIPT trick, however, the CPU can do both things simultaneously, because the cache index comes from address bits that translation leaves unchanged. By the time the L1 Cache has found the candidate row using the Index, the Address Translation has finished, and the hardware compares the resulting physical tag against the tag stored in the cache slot we just looked up. Really smart!&lt;/p&gt;
&lt;p&gt;But wait... For this trick to work, the bits required for the &lt;strong&gt;Cache Index (CI) plus the Cache Offset (CO) must fit inside the Page Offset&lt;/strong&gt;. If the cache were larger, the CI bits would spill over into the VPN. If that happened, we couldn&apos;t use the virtual bits to index the cache because that part of the address does change during translation.&lt;/p&gt;
&lt;p&gt;Note that the cache size is calculated as:&lt;/p&gt;
&lt;p&gt;$$
\text{Cache Size} = 2^{CI + CO} \times \text{Associativity}
$$&lt;/p&gt;
&lt;p&gt;where $2^{CI}$ denotes the number of cache sets, $2^{CO}$ denotes the block size, and Associativity is how many blocks are in each set. As Page Offset size is fixed by the system architecture (e.g., 12 bits for 4KB pages), this actually &lt;em&gt;limits&lt;/em&gt; how large the L1 cache can be.&lt;/p&gt;
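&lt;p&gt;To make the limit concrete, take 4KB pages, so $CI + CO \le 12$, and the 8-way associativity that is common for x86 L1 data caches (illustrative numbers, but typical ones). The ceiling works out to:&lt;/p&gt;
&lt;p&gt;$$
\text{Max L1 Size} = 2^{12} \times 8 = 32\text{KB}
$$&lt;/p&gt;
&lt;p&gt;That is exactly the 32KB figure we keep seeing.&lt;/p&gt;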
&lt;p&gt;The standard way to cheat this limit is to increase Associativity. If we double the associativity, we can double the cache size without increasing CI. However, increasing associativity is not free. Remember that what we want from L1 is access speed? The more associative a cache is, the longer it takes to look up data.&lt;/p&gt;
&lt;p&gt;To select one entry from a cache, we need:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;$N$ comparators for an $N$-way set associative cache, running in parallel. So a 16-way associative cache needs 2x the comparator power and 2x the area of an 8-way one.&lt;/li&gt;
&lt;li&gt;A multiplexer to select the right data output from the $N$ entries. But note: This mux or its control logic is &lt;strong&gt;often on the critical path&lt;/strong&gt; for L1 hit latency. This mux consumes area and power, and more importantly, &lt;strong&gt;increases the access time&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Hardware is physical. You need to consider fan-out: the input address tag must be fanned out to 16 locations instead of 8.&lt;/li&gt;
&lt;li&gt;You also need to deal with the replacement policy! You need to track usage for 16 blocks instead of 8 to determine which one to evict. 16-way caches almost never use true LRU; they often use approximations (Pseudo-LRU) or random replacement.&lt;/li&gt;
&lt;li&gt;Once you have to physically stretch the cache across more of the core, the wire delays and clock tree load start dominating. This is one big reason designers would rather keep L1 small and very close to the pipelines.&lt;/li&gt;
&lt;li&gt;...&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;All these overheads add up. So back to our question: why has L1 cache not increased in size for 15 years? The answer is clear now: CPU designers have to make a trade-off between cache size and access speed. A 32 KB L1 is full of parallel tag comparators and big mux trees and is hit on nearly every cycle, so it’s a power hotspot. Still, it is a very reasonable sweet spot for many modern CPUs, and its size has stayed flat mostly because other things scaled instead (L2/L3, prefetching, OoO machinery, etc.). Modern CPUs leaned into this idea:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Keep L1 tiny but extremely fast.&lt;/li&gt;
&lt;li&gt;Grow L2/L3 aggressively for capacity and hit-rate.&lt;/li&gt;
&lt;li&gt;Use prefetchers, better branch prediction, bigger OoO windows, etc. to hide latency.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So instead of &quot;make L1 bigger&quot;, architects made the rest of the machine smarter. They bring things into L1 just in time with prefetch.&lt;/p&gt;
&lt;p&gt;But wait, before you close this article, let me show you one more thing. I have only told you part of the story. Let&apos;s look at another CPU:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://20051110.xyz/_astro/69c8c6.BdurLi2k_1jicc2.webp&quot; alt=&quot;Intel Ultra 9 285K&quot;&gt;&lt;/p&gt;
&lt;p&gt;It has a 48KB L1 Data Cache and a 64KB L1 Instruction Cache per performance core!&lt;/p&gt;
&lt;p&gt;The Apple M5 CPU (I do not have a CPU-Z picture of it) is even more interesting: it has a &lt;strong&gt;192KB L1 Instruction Cache&lt;/strong&gt; 🤯 and a &lt;strong&gt;128KB L1 Data Cache&lt;/strong&gt; 🤯 per performance core! How? Well... let&apos;s break them down.&lt;/p&gt;
&lt;p&gt;Intel decided 32KB wasn&apos;t enough for their Data Cache. But remember the VIPT Limit (Page Size = 4KB)? To get 48KB without breaking VIPT, Intel had to pay the &quot;associativity tax&quot; we discussed earlier. They made the L1 Data Cache 12-way set associative (instead of the common 8-way).&lt;/p&gt;
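&lt;p&gt;The formula checks out: the VIPT limit still caps $2^{CI+CO}$ at $2^{12}$ bytes, so&lt;/p&gt;
&lt;p&gt;$$
2^{12} \times 12 = 48\text{KB}
$$&lt;/p&gt;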
&lt;p&gt;But what about Apple? Apple&apos;s M-series chips have L1 caches that are &lt;strong&gt;3x–6x larger than Intel or AMD&apos;s&lt;/strong&gt;. How???? Dude, there is no magic here.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Apple runs its CPUs at lower frequencies (~4.0 GHz) than AMD/Intel (~5.7 GHz, and they are pushing even higher!). A lower frequency makes it easier to access a large cache in 3 cycles.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Apple uses ARM, which has a fixed instruction length (mostly). This makes indexing and decoding slightly more predictable than x86&apos;s variable-length chaos, allowing them to optimize large cache access differently.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Most importantly: While x86 is stuck with standard 4KB memory pages, Apple Silicon is optimized for &lt;strong&gt;16KB pages&lt;/strong&gt;. By using a 16KB page size, the &apos;Page Offset&apos; becomes larger, effectively quadrupling the VIPT limit. This allows Apple to build massive L1 caches without needing complex hardware tricks or excessive associativity. (The arithmetic follows this list.)&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
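&lt;p&gt;Run the numbers: a 16KB page means a 14-bit Page Offset, so $CI + CO \le 14$. Assuming the same common 8-way associativity (an illustrative assumption on my part; I have not verified Apple&apos;s actual cache geometry), the ceiling becomes&lt;/p&gt;
&lt;p&gt;$$
2^{14} \times 8 = 128\text{KB}
$$&lt;/p&gt;
&lt;p&gt;which lines up with the M5&apos;s 128KB L1 Data Cache quoted above.&lt;/p&gt;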
&lt;p&gt;So what about AMD then? ~~Would increasing the L1 cache size to 48KB make Ryzen CPUs even better than Intel&apos;s?~~ Well, it turns out that in their latest chip, the Ryzen 9 9950X, they increased the L1 DCache to 48KB.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://20051110.xyz/_astro/x5e28p.d7vVVWVn_aVEl4.webp&quot; alt=&quot;AMD Ryzen 9 9950X&quot;&gt;&lt;/p&gt;
&lt;p&gt;Is that the complete story? Not really. Remember we said &quot;Apple uses ARM, which has a fixed instruction length (mostly). This makes indexing and decoding slightly more predictable than x86&apos;s variable-length chaos&quot;? So what about x86? Modern x86 instructions are complex and &quot;ugly.&quot; Before the CPU can execute them, it must decode them into simpler internal commands called &quot;micro-ops.&quot; It turns out that the Micro-op cache is a critical component in modern x86 CPUs.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Both Zen 4 and Zen 5 architectures feature an Op Cache, but Zen 5 has upgraded the design by utilizing two 6-wide Op Caches, as opposed to Zen 4’s single 9-wide Op Cache. The Op Cache is crucial because it stores pre-decoded micro-operations (uOps). When instructions are fetched repeatedly (such as in loops), the CPU can pull these uOps directly from the Op Cache instead of decoding the instructions again, which saves time and power.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The above text is from &lt;a href=&quot;https://medium.com/@jason890418123/exploring-zen-5-and-zen-4-microarchitectures-dive-into-op-cache-branch-prediction-and-more-f9da2469fb5e&quot;&gt;here&lt;/a&gt;. Because this exists, the Icache doesn&apos;t need to be huge; it just serves as a backup for the Op-Cache.&lt;/p&gt;
&lt;p&gt;Hope you enjoyed this article! If you have any questions or corrections, feel free to comment below.&lt;/p&gt;</content:encoded></item><item><title>A Simple Hack to Use Touch ID for sudo on macOS</title><link>https://20051110.xyz/blog/touchid-sudo</link><guid isPermaLink="true">https://20051110.xyz/blog/touchid-sudo</guid><description>Tired of typing your sudo password on macOS? Learn how to enable Touch ID for sudo commands in just a few simple steps.</description><pubDate>Sat, 06 Sep 2025 14:24:00 GMT</pubDate><content:encoded>&lt;p&gt;If you spend any amount of time in the macOS Terminal, you know the drill. You type a command with &lt;code&gt;sudo&lt;/code&gt;, press Enter, you type your long, secure password for the tenth time today, and you think, &quot;There has to be a better way.&quot;&lt;/p&gt;
&lt;p&gt;There is. And it&apos;s right at your fingertips.
It&apos;s a simple, reversible, and game-changing tweak that you&apos;ll appreciate every single day.&lt;/p&gt;
&lt;h2&gt;How It Works (The Quick Version)&lt;/h2&gt;
&lt;p&gt;macOS uses a flexible system called PAM (Pluggable Authentication Modules) to handle authentication. All we&apos;re going to do is edit the configuration file for &lt;code&gt;sudo&lt;/code&gt; to tell it: &quot;Hey, before you ask for a password, just check for a valid fingerprint from Touch ID first. If that works, we&apos;re good to go.&quot;&lt;/p&gt;
&lt;h2&gt;The 2-Minute Setup Guide&lt;/h2&gt;
&lt;h3&gt;Open the Terminal&lt;/h3&gt;
&lt;p&gt;You can find it in &lt;code&gt;Applications/Utilities&lt;/code&gt; or just search for it with Spotlight (&lt;code&gt;⌘ + Space&lt;/code&gt;).&lt;/p&gt;
&lt;h3&gt;Open the PAM Configuration File&lt;/h3&gt;
&lt;p&gt;We need to edit a protected system file, so we&apos;ll use the simple command-line editor &lt;code&gt;nano&lt;/code&gt; with &lt;code&gt;sudo&lt;/code&gt; privileges. Copy and paste the following command and press Enter. It will ask for your password (likely for the last time!).&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;sudo nano /etc/pam.d/sudo
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;nano&lt;/code&gt; editor will open inside your Terminal window. You&apos;ll see a few lines of configuration text.&lt;/p&gt;
&lt;p&gt;The most important part is getting this next step right. On a &lt;strong&gt;new line right after the first commented line&lt;/strong&gt; (the one starting with &lt;code&gt;#&lt;/code&gt;), add the following:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;auth       sufficient     pam_tid.so
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Make sure it is the &lt;strong&gt;very first active rule&lt;/strong&gt;. For my system, the file looks like this after the edit:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# sudo: auth account password session

auth       sufficient     pam_tid.so   # &amp;#x3C;-- This is the line we added
auth       include        sudo_local
auth       sufficient     pam_smartcard.so
auth       required       pam_opendirectory.so
account    required       pam_permit.so
password   required       pam_deny.so
session    required       pam_permit.so
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The keyword &lt;code&gt;sufficient&lt;/code&gt; is what makes this work. It tells the system that if Touch ID authentication succeeds, it&apos;s enough to grant permission, and no other authentication methods (like your password) are needed.&lt;/p&gt;
&lt;h3&gt;Save and Exit&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;Press &lt;code&gt;Control + O&lt;/code&gt; to Write Out (save) the file.&lt;/li&gt;
&lt;li&gt;Press &lt;code&gt;Enter&lt;/code&gt; to confirm the filename.&lt;/li&gt;
&lt;li&gt;Press &lt;code&gt;Control + X&lt;/code&gt; to exit &lt;code&gt;nano&lt;/code&gt; and return to your prompt.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;Time to Test It!&lt;/h2&gt;
&lt;p&gt;For the change to take effect, you &lt;strong&gt;must open a new Terminal window or tab&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;In your new Terminal session, type a simple &lt;code&gt;sudo&lt;/code&gt; command, like:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;sudo ls
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Instead of a password prompt, you should be greeted by a Touch ID verification pop-up. Place your finger on the sensor, and your command will run. Welcome to the good life.&lt;/p&gt;
&lt;h2&gt;Good to Know&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;How do I undo this?&lt;/strong&gt; Simply edit the &lt;code&gt;/etc/pam.d/sudo&lt;/code&gt; file again and delete the &lt;code&gt;auth sufficient pam_tid.so&lt;/code&gt; line you added.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;What if it still asks for my password?&lt;/strong&gt; You likely put the new line in the wrong place. Go back to Step 3 and make absolutely sure it&apos;s the first non-commented line in the file.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;What about macOS updates?&lt;/strong&gt; Major system updates can sometimes overwrite this file, reverting it to the default. If Touch ID suddenly stops working for &lt;code&gt;sudo&lt;/code&gt; after an update, just repeat these steps.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That’s it! Enjoy the precious seconds you’ve reclaimed. Happy coding! 👍&lt;/p&gt;</content:encoded></item><item><title>Overview of the RISC-V Design with Tomasulo&apos;s Algorithm</title><link>https://20051110.xyz/blog/tomasulo-cpu</link><guid isPermaLink="true">https://20051110.xyz/blog/tomasulo-cpu</guid><description>An introduction to the RISC-V ISA, Verilog, and Tomasulo’s algorithm.</description><pubDate>Sun, 24 Aug 2025 18:09:00 GMT</pubDate><content:encoded>&lt;h2&gt;Disclaimer&lt;/h2&gt;
&lt;p&gt;A large portion of this blog post is &lt;strong&gt;AI-generated text&lt;/strong&gt; (Google Gemini 2.5 Pro Deep Research). Although I have reviewed and edited the text and fact-checked it, I cannot guarantee that it is 100% accurate or free of errors. Please use this content as a starting point for your own research and understanding, and verify any critical information independently.&lt;/p&gt;
&lt;p&gt;With that said, I believe this post is &lt;strong&gt;super well-written and informative&lt;/strong&gt;, and what really fascinates me is the &quot;problem-solving&quot; learning curve, which highlights the flaws and problems in every design choice and segues into the components that solve them.&lt;/p&gt;
&lt;h2&gt;Part I: The Language of Hardware - Verilog Fundamentals&lt;/h2&gt;
&lt;p&gt;The study of processor design requires a fundamental shift in perspective. The tools and languages used to design hardware, such as Verilog, represent a different paradigm of computation. Understanding this paradigm is the first and most crucial step toward grasping the intricate workings of a modern CPU.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;Section 1.1: Thinking in Parallel&lt;/h3&gt;
&lt;p&gt;The most significant conceptual leap from software to hardware is the transition from sequential to concurrent execution. A typical software program, written in a language like C++, is a sequence of instructions executed one after another by a processor. The program&apos;s state changes in a predictable, linear progression. In contrast, a physical hardware circuit is a collection of components—gates, flip-flops, memory blocks—that, once powered on, operate continuously and in parallel. A thousand logic gates do not wait their turn; they all compute their output based on their current inputs simultaneously, every moment in time.&lt;/p&gt;
&lt;p&gt;It is for this reason that Verilog is classified as a &lt;strong&gt;Hardware Description Language (HDL)&lt;/strong&gt;, not a programming language in the traditional sense. Its primary purpose is not to provide a list of commands for a processor to execute, but to &lt;strong&gt;describe&lt;/strong&gt; the physical structure and behavior of a digital electronic circuit. This description serves two main purposes: it can be fed into a simulation tool to model how the described circuit will behave over time, or it can be used by a synthesis tool to generate a netlist, which is a detailed blueprint for manufacturing an Application-Specific Integrated Circuit (ASIC) or configuring a Field-Programmable Gate Array (FPGA).&lt;/p&gt;
&lt;p&gt;The fundamental unit of design in Verilog is the &lt;strong&gt;module&lt;/strong&gt;. A module is a self-contained block of hardware logic, analogous to a class in C++ or a physical integrated circuit (IC) chip. It encapsulates internal logic and defines a clear interface to the outside world through a set of ports, which are declared as &lt;strong&gt;input&lt;/strong&gt;, &lt;strong&gt;output&lt;/strong&gt;, or &lt;strong&gt;inout&lt;/strong&gt;. This modularity is essential for hierarchical design, allowing complex systems like an entire CPU to be built by connecting smaller, well-defined modules such as an Arithmetic Logic Unit (ALU), a register file, and a control unit.&lt;/p&gt;
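&lt;p&gt;As a minimal sketch (a hypothetical 2-to-1 multiplexer, invented here purely to make the port syntax concrete):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-verilog&quot;&gt;// A 2-to-1 multiplexer: selects between two 8-bit inputs.
module mux2 (
  input  wire       sel,   // select line
  input  wire [7:0] a,     // input chosen when sel == 0
  input  wire [7:0] b,     // input chosen when sel == 1
  output wire [7:0] y      // selected output
);
  // Continuous assignment: y follows the inputs at all times.
  assign y = sel ? b : a;
endmodule
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A larger design would instantiate &lt;code&gt;mux2&lt;/code&gt; and wire its ports to other modules&apos; signals, much like composing objects in C++.&lt;/p&gt;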
&lt;hr&gt;
&lt;h3&gt;Section 1.2: Describing Behavior - &lt;code&gt;initial&lt;/code&gt; and &lt;code&gt;always&lt;/code&gt; Blocks&lt;/h3&gt;
&lt;p&gt;Within a Verilog module, the behavior of the circuit is described primarily within two types of procedural blocks: &lt;code&gt;initial&lt;/code&gt; and &lt;code&gt;always&lt;/code&gt;. These blocks contain statements that define how the outputs and internal state of the module should change in response to inputs and time.&lt;/p&gt;
&lt;h4&gt;The &lt;code&gt;initial&lt;/code&gt; Block&lt;/h4&gt;
&lt;p&gt;The &lt;code&gt;initial&lt;/code&gt; block is the simpler of the two. As its name suggests, it contains a block of code that begins execution only once, at the very start of a simulation, at time zero. If multiple &lt;code&gt;initial&lt;/code&gt; blocks are defined within a module, they all start concurrently at time zero.&lt;/p&gt;
&lt;p&gt;This &quot;run-once&quot; behavior has a critical implication: &lt;code&gt;initial&lt;/code&gt; blocks are generally &lt;strong&gt;not synthesizable&lt;/strong&gt;. Real hardware does not have a concept of a &quot;beginning of time&quot; in the same way a simulation does; once powered on, it operates continuously. Therefore, an &lt;code&gt;initial&lt;/code&gt; block cannot be translated into a physical circuit that performs an action only at power-on. Its primary role is within the realm of simulation, specifically in the construction of a &lt;strong&gt;testbench&lt;/strong&gt;. A testbench is a separate Verilog module written to test the design (often called the &quot;Design Under Test&quot; or DUT). Within a testbench, &lt;code&gt;initial&lt;/code&gt; blocks are indispensable for generating clock signals, providing a sequence of input stimuli to the DUT, and setting up initial memory states to verify the design&apos;s correctness.&lt;/p&gt;
&lt;h4&gt;The &lt;code&gt;always&lt;/code&gt; Block&lt;/h4&gt;
&lt;p&gt;The &lt;code&gt;always&lt;/code&gt; block is the cornerstone of synthesizable Verilog code. It contains a block of statements that execute repeatedly throughout the simulation. The execution of an &lt;code&gt;always&lt;/code&gt; block is triggered by events specified in its &lt;strong&gt;sensitivity list&lt;/strong&gt;, denoted by &lt;code&gt;@(...)&lt;/code&gt;. This behavior directly models the nature of real hardware, which continuously reacts to changes in its input signals or to clock edges.&lt;/p&gt;
&lt;p&gt;The sensitivity list dictates what kind of hardware the &lt;code&gt;always&lt;/code&gt; block describes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;always @(posedge clk)&lt;/code&gt;: This syntax specifies that the block should execute only on the positive (rising) edge of the signal named &lt;code&gt;clk&lt;/code&gt;. This is the standard way to describe &lt;strong&gt;sequential logic&lt;/strong&gt;, such as flip-flops and registers, which are memory elements that capture and store a value at a specific moment defined by a clock signal.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;always @(*)&lt;/code&gt;: The asterisk is a shorthand that tells the simulator to execute the block whenever &lt;strong&gt;any&lt;/strong&gt; of the signals read on the right-hand side of assignments within the block changes its value. This describes &lt;strong&gt;combinational logic&lt;/strong&gt;—circuits like adders, multiplexers, or decoders whose outputs depend solely on their current inputs, with no memory of past states.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Because these constructs map directly to physical hardware components (clocked registers and logic gates), &lt;code&gt;always&lt;/code&gt; blocks are the primary tool for describing the synthesizable behavior of a digital design.&lt;/p&gt;
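&lt;p&gt;A pair of toy examples makes the mapping concrete (the signal names here are mine, chosen for illustration):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-verilog&quot;&gt;// Combinational: sum is recomputed whenever a or b changes (an adder).
always @(*) begin
  sum = a + b;
end

// Sequential: q captures d only on the rising edge of clk (a flip-flop).
always @(posedge clk) begin
  q &amp;#x3C;= d;
end
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Note the two different assignment operators; the next section explains why that distinction matters.&lt;/p&gt;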
&lt;hr&gt;
&lt;h3&gt;Section 1.3: The Heart of Synthesis - Blocking vs. Non-blocking Assignments&lt;/h3&gt;
&lt;p&gt;Perhaps the most frequent and critical point of confusion for those transitioning from software to Verilog is the distinction between the two types of assignment operators: &lt;strong&gt;blocking (&lt;code&gt;=&lt;/code&gt;)&lt;/strong&gt; and &lt;strong&gt;non-blocking (&lt;code&gt;&amp;#x3C;=&lt;/code&gt;)&lt;/strong&gt;. This is not a matter of stylistic preference; the choice of operator is a direct instruction to the synthesis tool about the type of hardware circuit to create. Misunderstanding this distinction is the leading cause of simulation-synthesis mismatches, where a design works perfectly in simulation but fails when implemented in actual hardware.&lt;/p&gt;
&lt;h4&gt;Blocking Assignments (&lt;code&gt;=&lt;/code&gt;)&lt;/h4&gt;
&lt;p&gt;A blocking assignment is executed in the order it appears within a procedural block, much like in a C program. The execution of the current statement &quot;blocks&quot; the execution of any subsequent statements in the same &lt;code&gt;begin...end&lt;/code&gt; block until it is complete. The variable on the left-hand side is updated immediately, and this new value is used by all subsequent statements in the block.&lt;/p&gt;
&lt;p&gt;This immediate-update behavior models a chain of &lt;strong&gt;combinational logic&lt;/strong&gt;. Imagine a series of logic gates connected by wires. The output of the first gate is instantaneously available as the input to the second gate. Blocking assignments are therefore the correct choice for describing this type of logic, typically within an &lt;code&gt;always @(*)&lt;/code&gt; block.&lt;/p&gt;
&lt;h4&gt;Non-blocking Assignments (&lt;code&gt;&amp;#x3C;=&lt;/code&gt;)&lt;/h4&gt;
&lt;p&gt;A non-blocking assignment operates in a two-phase manner that is fundamentally different from any software assignment. Within a block triggered by an event (like a clock edge), all the right-hand side (RHS) expressions of the non-blocking assignments are evaluated and stored in temporary variables &lt;strong&gt;first&lt;/strong&gt;. Only after all RHS expressions have been evaluated does the second phase begin, where the left-hand side (LHS) variables are all updated &lt;strong&gt;simultaneously&lt;/strong&gt; with their corresponding temporary values. The execution of one non-blocking assignment does not block the evaluation of the next.&lt;/p&gt;
&lt;p&gt;This two-phase mechanism perfectly models the behavior of a bank of &lt;strong&gt;sequential logic&lt;/strong&gt; elements, such as D-type flip-flops, that share a common clock. On a clock edge, all the flip-flops simultaneously sample the data at their D inputs. A short time later (the clock-to-Q delay), all their Q outputs change to reflect the newly captured values. The value of one flip-flop&apos;s output at the beginning of the clock cycle determines the input of the next flip-flop, but the update doesn&apos;t happen until the end of the cycle. Non-blocking assignments are therefore the correct and safe way to model state changes in sequential logic, and they should be used exclusively for assignments within a clocked &lt;code&gt;always @(posedge clk)&lt;/code&gt; block.&lt;/p&gt;
&lt;h4&gt;Example: The Shift Register&lt;/h4&gt;
&lt;p&gt;The difference becomes crystal clear with a simple 3-bit shift register example. The goal is to have a value at &lt;code&gt;data_in&lt;/code&gt; shift one position to the right on each clock cycle: &lt;code&gt;data_in -&gt; q1 -&gt; q2 -&gt; q3&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Incorrect Version (using Blocking &lt;code&gt;=&lt;/code&gt;)&lt;/strong&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-verilog&quot;&gt;always @(posedge clk) begin
  q1 = data_in;
  q2 = q1;
  q3 = q2;
end
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In simulation, on a single rising clock edge, &lt;code&gt;q1&lt;/code&gt; is immediately updated with &lt;code&gt;data_in&lt;/code&gt;. Because this is a blocking assignment, that new value of &lt;code&gt;q1&lt;/code&gt; is then immediately used to update &lt;code&gt;q2&lt;/code&gt;. And that new value of &lt;code&gt;q2&lt;/code&gt; is immediately used to update &lt;code&gt;q3&lt;/code&gt;. The result is that the value from &lt;code&gt;data_in&lt;/code&gt; propagates all the way to &lt;code&gt;q3&lt;/code&gt; within a single clock cycle. The synthesis tool will interpret this as a direct wire from &lt;code&gt;data_in&lt;/code&gt; to &lt;code&gt;q3&lt;/code&gt;, not a series of registers. This is not a shift register.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Correct Version (using Non-blocking &lt;code&gt;&amp;#x3C;=&lt;/code&gt;)&lt;/strong&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-verilog&quot;&gt;always @(posedge clk) begin
  q1 &amp;#x3C;= data_in;
  q2 &amp;#x3C;= q1;
  q3 &amp;#x3C;= q2;
end
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;On a rising clock edge, the simulator evaluates all RHS expressions first: &lt;code&gt;data_in&lt;/code&gt;, the &lt;strong&gt;current&lt;/strong&gt; value of &lt;code&gt;q1&lt;/code&gt;, and the &lt;strong&gt;current&lt;/strong&gt; value of &lt;code&gt;q2&lt;/code&gt;. Then, at the end of the simulation time step, it updates the LHS variables simultaneously. &lt;code&gt;q1&lt;/code&gt; gets the value of &lt;code&gt;data_in&lt;/code&gt;, &lt;code&gt;q2&lt;/code&gt; gets the &lt;strong&gt;old&lt;/strong&gt; value of &lt;code&gt;q1&lt;/code&gt;, and &lt;code&gt;q3&lt;/code&gt; gets the &lt;strong&gt;old&lt;/strong&gt; value of &lt;code&gt;q2&lt;/code&gt;. This correctly models three separate flip-flops, and it takes three clock cycles for a value to propagate from &lt;code&gt;data_in&lt;/code&gt; to &lt;code&gt;q3&lt;/code&gt;. This is a true shift register.&lt;/p&gt;
&lt;h4&gt;Pitfalls and Best Practices&lt;/h4&gt;
&lt;p&gt;Mixing blocking and non-blocking assignments in the same &lt;code&gt;always&lt;/code&gt; block, or using the wrong type for the logic intended, can lead to indeterminate behavior known as a &lt;strong&gt;race condition&lt;/strong&gt;. This occurs when the final state of a variable depends on the unpredictable order in which a simulator evaluates concurrent events. To avoid these issues and ensure a design that is both simulatable and synthesizable, designers adhere to strict rules of thumb:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;When modeling sequential logic (clocked &lt;code&gt;always&lt;/code&gt; blocks), use &lt;strong&gt;non-blocking&lt;/strong&gt; assignments (&lt;code&gt;&amp;#x3C;=&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;When modeling combinational logic (&lt;code&gt;always @(*)&lt;/code&gt; blocks), use &lt;strong&gt;blocking&lt;/strong&gt; assignments (&lt;code&gt;=&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Do not mix blocking and non-blocking assignments in the same &lt;code&gt;always&lt;/code&gt; block.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The underlying reason for these rules is to bridge the gap between the discrete event-scheduling model of a simulator and the continuous, physical reality of hardware. A non-blocking assignment is a directive to the simulator to &lt;strong&gt;schedule&lt;/strong&gt; an update for the end of the current time step, which is how a synthesis tool understands the need for a memory element (a flip-flop) that holds a value across clock cycles. A blocking assignment directs the simulator to update a value &lt;strong&gt;immediately&lt;/strong&gt;, which is how a synthesis tool understands a direct connection of logic gates whose output changes as soon as the input changes. Using the wrong operator creates a mismatch between what is simulated and what is built, which is the root cause of many hardware design bugs.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;Feature&lt;/th&gt;&lt;th&gt;Blocking Assignment (&lt;code&gt;=&lt;/code&gt;)&lt;/th&gt;&lt;th&gt;Non-blocking Assignment (&lt;code&gt;&amp;#x3C;=&lt;/code&gt;)&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Operator&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;&lt;code&gt;=&lt;/code&gt;&lt;/td&gt;&lt;td&gt;&lt;code&gt;&amp;#x3C;=&lt;/code&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Execution Model&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Sequential, in-order execution within a block. Updates are immediate.&lt;/td&gt;&lt;td&gt;Parallel evaluation of RHS, followed by simultaneous update of LHS.&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Hardware Inference&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Combinational logic (wires, gates).&lt;/td&gt;&lt;td&gt;Sequential logic (flip-flops, registers).&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Typical &lt;code&gt;always&lt;/code&gt; Block&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;&lt;code&gt;always @(*)&lt;/code&gt;&lt;/td&gt;&lt;td&gt;&lt;code&gt;always @(posedge clk)&lt;/code&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Use Case Example&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;&lt;code&gt;always @(*) begin y = sel ? b : a; end&lt;/code&gt;&lt;/td&gt;&lt;td&gt;&lt;code&gt;always @(posedge clk) begin q &amp;#x3C;= d; end&lt;/code&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h2&gt;Part II: The Blueprint of a CPU - The RISC-V ISA&lt;/h2&gt;
&lt;p&gt;Having established the language for describing hardware, the next step is to understand the vocabulary that a processor speaks. This vocabulary is its Instruction Set Architecture (ISA), the fundamental interface between software and hardware. For this exploration, the RISC-V ISA provides an ideal foundation due to its modern, clean, and extensible design.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;Section 2.1: An Introduction to Instruction Set Architectures (ISA)&lt;/h3&gt;
&lt;p&gt;An ISA is the abstract model of a computer that is visible to a machine-language programmer or compiler. It is the definitive contract between the software that runs on a processor and the hardware that executes it. This contract specifies a set of critical elements, including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The set of available instructions (the &quot;opcodes&quot;).&lt;/li&gt;
&lt;li&gt;The native data types.&lt;/li&gt;
&lt;li&gt;The programmer-visible registers.&lt;/li&gt;
&lt;li&gt;The memory addressing modes.&lt;/li&gt;
&lt;li&gt;The handling of events like interrupts and exceptions.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Any processor that correctly implements a given ISA will execute the same machine code and produce the same results, regardless of its internal microarchitectural design. An Intel Core i9 and an AMD Ryzen processor, for example, have vastly different internal designs but can both run Windows because they both implement the x86-64 ISA.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;Section 2.2: The RISC-V Revolution - Openness and Modularity&lt;/h3&gt;
&lt;p&gt;RISC-V (pronounced &quot;risk-five&quot;) is not just another ISA; it represents a paradigm shift in how ISAs are developed and used. It was born at the University of California, Berkeley, in 2010 with the goal of creating a practical, high-quality ISA that was open, free, and suitable for a wide range of computing applications, from academic research to industrial deployment.&lt;/p&gt;
&lt;h4&gt;The RISC Philosophy&lt;/h4&gt;
&lt;p&gt;At its core, RISC-V is a pure embodiment of the &lt;strong&gt;Reduced Instruction Set Computer (RISC)&lt;/strong&gt; philosophy. This design approach contrasts with the Complex Instruction Set Computer (CISC) paradigm of architectures like x86. The core tenets of RISC, and by extension RISC-V, are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;A small number of simple instructions:&lt;/strong&gt; The instruction set is kept minimal, focusing on fundamental operations. More complex operations are built by combining these simple instructions.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Fixed-length instruction encoding:&lt;/strong&gt; All base instructions are the same length (32 bits), which dramatically simplifies the hardware required for instruction fetching and decoding.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Load/Store architecture:&lt;/strong&gt; The only instructions that access memory are explicit &lt;strong&gt;load&lt;/strong&gt; and &lt;strong&gt;store&lt;/strong&gt; operations. All arithmetic and logical operations are performed on operands held in processor registers. This simplifies the control logic and encourages efficient register usage by compilers.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;One instruction per cycle:&lt;/strong&gt; The simplicity of the instructions is designed to allow for execution in a single clock cycle in a basic pipeline, which is key to achieving high performance.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This adherence to simplicity results in a more streamlined processor design, leading to improved performance, lower power consumption, and reduced design complexity.&lt;/p&gt;
&lt;h4&gt;Open and Free&lt;/h4&gt;
&lt;p&gt;Unlike proprietary ISAs such as x86 and ARM, the RISC-V specification is developed and maintained by the non-profit RISC-V International and is available under open-source licenses. This means anyone can design, manufacture, and sell RISC-V chips and software without paying royalties. This openness has catalyzed a global wave of innovation, enabling startups, academic institutions, and even large corporations to develop custom processors tailored for specific applications without the barrier of licensing fees or vendor lock-in.&lt;/p&gt;
&lt;h4&gt;Modular Design&lt;/h4&gt;
&lt;p&gt;A defining feature of RISC-V is its inherent modularity. The ISA is not a monolithic entity but is structured as a small, mandatory &lt;strong&gt;base integer ISA&lt;/strong&gt; with a rich set of optional &lt;strong&gt;standard extensions&lt;/strong&gt;. A processor&apos;s full ISA is specified by its base and the extensions it implements. For instance, a common configuration for a 64-bit general-purpose processor is denoted &lt;strong&gt;RV64GC&lt;/strong&gt;, which stands for &lt;strong&gt;RV64IMAFDC&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;The base integer ISAs are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;RV32I&lt;/strong&gt;: The base 32-bit integer instruction set with 32 integer registers (x0-x31).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;RV64I&lt;/strong&gt;: The base 64-bit integer instruction set, extending the registers and operations to 64 bits.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;RV32E&lt;/strong&gt;: An embedded variant of &lt;code&gt;RV32I&lt;/code&gt; with only 16 integer registers, designed for the smallest microcontrollers.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The most common standard extensions, often grouped under the letter &apos;G&apos; for &quot;General-Purpose,&quot; are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;M&lt;/strong&gt;: Standard Extension for Integer Multiplication and Division. Adds instructions like &lt;code&gt;mul&lt;/code&gt;, &lt;code&gt;div&lt;/code&gt;, and &lt;code&gt;rem&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;A&lt;/strong&gt;: Standard Extension for Atomic Instructions. Provides instructions for atomic memory operations (e.g., &lt;code&gt;amoswap&lt;/code&gt;), essential for synchronization in multi-core systems.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;F&lt;/strong&gt;: Standard Extension for Single-Precision Floating-Point. Adds a separate floating-point register file (f0-f31) and instructions for 32-bit floating-point arithmetic.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;D&lt;/strong&gt;: Standard Extension for Double-Precision Floating-Point. Extends the &lt;code&gt;F&lt;/code&gt; extension with support for 64-bit floating-point operations.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;C&lt;/strong&gt;: Standard Extension for Compressed Instructions. Defines 16-bit versions of the most common 32-bit instructions. This can significantly reduce code size and improve instruction fetch bandwidth, which is critical in memory-constrained embedded systems and for performance in high-end cores.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This modularity allows designers to create highly optimized processors. A tiny microcontroller for an IoT sensor might only implement &lt;code&gt;RV32EMC&lt;/code&gt;, while a high-performance application processor in a data center might implement &lt;code&gt;RV64G&lt;/code&gt; plus extensions for vector processing (V) and bit manipulation (B).&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;Section 2.3: Anatomy of a RISC-V Instruction&lt;/h3&gt;
&lt;p&gt;All base RISC-V instructions are 32 bits long and fall into one of a few well-defined formats. The regularity of these formats is a key design feature that enables the simple, high-performance pipelines for which RISC architectures are known. The primary formats are as follows (a worked encoding example comes after the list):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;R-type (Register):&lt;/strong&gt; Used for register-to-register operations like &lt;code&gt;add&lt;/code&gt;, &lt;code&gt;sub&lt;/code&gt;, &lt;code&gt;and&lt;/code&gt;, &lt;code&gt;or&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;| funct7 (7) | rs2 (5) | rs1 (5) | funct3 (3) | rd (5) | opcode (7) |
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;opcode&lt;/code&gt;: Defines the instruction type (e.g., &lt;code&gt;OP&lt;/code&gt; for register-register arithmetic).&lt;/li&gt;
&lt;li&gt;&lt;code&gt;rd&lt;/code&gt;: The destination register.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;funct3&lt;/code&gt;: Further specifies the operation (e.g., &lt;code&gt;ADD&lt;/code&gt;/&lt;code&gt;SUB&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;&lt;code&gt;rs1&lt;/code&gt;, &lt;code&gt;rs2&lt;/code&gt;: The two source registers.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;funct7&lt;/code&gt;: An additional field to differentiate operations (e.g., &lt;code&gt;ADD&lt;/code&gt; from &lt;code&gt;SUB&lt;/code&gt;).&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;I-type (Immediate):&lt;/strong&gt; Used for operations with an immediate value, including &lt;code&gt;addi&lt;/code&gt;, and for load instructions like &lt;code&gt;lw&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;| imm[11:0] (12) | rs1 (5) | funct3 (3) | rd (5) | opcode (7) |
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;imm[11:0]&lt;/code&gt;: A 12-bit signed immediate value.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;rs1&lt;/code&gt;: The source register.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;rd&lt;/code&gt;: The destination register.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;S-type (Store):&lt;/strong&gt; Used for store instructions like &lt;code&gt;sw&lt;/code&gt; (store word).&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;| imm[11:5] (7) | rs2 (5) | rs1 (5) | funct3 (3) | imm[4:0] (5) | opcode (7) |
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;The 12-bit immediate is split to accommodate the two source register fields.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;rs1&lt;/code&gt;: The base address register.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;rs2&lt;/code&gt;: The register containing the data to be stored.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;B-type (Branch):&lt;/strong&gt; Used for conditional branch instructions like &lt;code&gt;beq&lt;/code&gt; (branch if equal). Similar to S-type, the immediate is split.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;| imm[12|10:5] (7) | rs2 (5) | rs1 (5) | funct3 (3) | imm[4:1|11] (5) | opcode (7) |
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;rs1&lt;/code&gt;, &lt;code&gt;rs2&lt;/code&gt;: The registers to be compared.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;imm&lt;/code&gt;: The signed branch offset, encoded in multiples of 2 bytes and added to the PC.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;U-type (Upper Immediate):&lt;/strong&gt; Used for loading a 20-bit upper immediate value, as in &lt;code&gt;lui&lt;/code&gt; (load upper immediate).&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;| imm[31:12] (20) | rd (5) | opcode (7) |
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;J-type (Jump):&lt;/strong&gt; Used for unconditional jumps like &lt;code&gt;jal&lt;/code&gt; (jump and link).&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;| imm[20|10:1|11|19:12] (20) | rd (5) | opcode (7) |
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;/ul&gt;
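&lt;p&gt;To tie the formats together, here is a worked R-type example, hand-assembled by me (worth double-checking against a real assembler): &lt;code&gt;add x5, x1, x2&lt;/code&gt; uses &lt;code&gt;opcode = 0110011&lt;/code&gt;, &lt;code&gt;funct3 = 000&lt;/code&gt;, and &lt;code&gt;funct7 = 0000000&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;| funct7  | rs2 (x2) | rs1 (x1) | funct3 | rd (x5) | opcode  |
| 0000000 | 00010    | 00001    | 000    | 00101   | 0110011 |

=&gt; 0000 0000 0010 0000 1000 0010 1011 0011  =  0x002082B3
&lt;/code&gt;&lt;/pre&gt;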
&lt;p&gt;The deliberate and consistent placement of the &lt;code&gt;opcode&lt;/code&gt;, &lt;code&gt;rs1&lt;/code&gt;, &lt;code&gt;rs2&lt;/code&gt;, and &lt;code&gt;rd&lt;/code&gt; fields across these formats is not an accident. It is a cornerstone of efficient RISC design. In a pipelined processor, the Instruction Decode (ID) stage must identify the source registers and read their values from the register file. Because &lt;code&gt;rs1&lt;/code&gt; and &lt;code&gt;rs2&lt;/code&gt; are always in the same bit positions for all instruction formats that use them (R, I, S, B), the decoder hardware is greatly simplified. It can begin reading from the register file before it has even finished fully decoding the instruction to determine the exact operation. This parallelism within the ID stage is a crucial enabler of the classic 5-stage RISC pipeline, a concept that forms the foundation of modern processor execution.&lt;/p&gt;
&lt;h2&gt;Part III: The Assembly Line - Pipelined Execution and Its Perils&lt;/h2&gt;
&lt;p&gt;To achieve high performance, modern processors do not execute instructions one at a time, waiting for each to complete before starting the next. Instead, they use a technique called &lt;strong&gt;pipelining&lt;/strong&gt;, which overlaps the execution of multiple instructions, much like an assembly line in a factory. This approach is fundamental to all high-performance CPUs, and the RISC-V ISA is explicitly designed to facilitate it.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;Section 3.1: The Classic 5-Stage RISC Pipeline&lt;/h3&gt;
&lt;p&gt;Pipelining increases the &lt;strong&gt;instruction throughput&lt;/strong&gt;—the number of instructions completed per unit of time—without necessarily decreasing the &lt;strong&gt;latency&lt;/strong&gt; of any single instruction. The concept is best understood through the analogy of doing laundry. A sequential approach would be to wash, dry, fold, and put away one load of laundry completely before starting the next. A pipelined approach starts the washer on the second load as soon as the first load moves to the dryer. By keeping all stages (washer, dryer, folding table) busy, the total time to complete many loads is significantly reduced.&lt;/p&gt;
&lt;p&gt;Similarly, the execution of a RISC instruction can be broken down into a series of uniform steps. The classic RISC pipeline consists of five stages:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;IF (Instruction Fetch):&lt;/strong&gt; The processor fetches the 32-bit instruction from the instruction memory (or cache) at the address currently held by the Program Counter (PC). Concurrently, the PC is updated to point to the next instruction, which is typically at address $PC+4$ since each instruction is 4 bytes long.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;ID (Instruction Decode and Register Fetch):&lt;/strong&gt; The fetched instruction is decoded by the control unit to determine what operation to perform. The format of the instruction is identified, and the required control signals for subsequent stages are generated. Simultaneously, the source register identifiers (&lt;code&gt;rs1&lt;/code&gt; and &lt;code&gt;rs2&lt;/code&gt;) are used to read their corresponding values from the processor&apos;s register file. Any immediate value in the instruction is also sign-extended and prepared for use.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;EX (Execute):&lt;/strong&gt; This is where the actual computation occurs. The Arithmetic Logic Unit (ALU) performs the operation specified by the instruction. This could be an arithmetic operation (&lt;code&gt;add&lt;/code&gt;, &lt;code&gt;sub&lt;/code&gt;), a logical operation (&lt;code&gt;and&lt;/code&gt;, &lt;code&gt;or&lt;/code&gt;), a memory address calculation for a load or store (by adding the base register and the immediate offset), or a comparison for a branch instruction.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;MEM (Memory Access):&lt;/strong&gt; This stage is active only for load and store instructions. For a &lt;strong&gt;load&lt;/strong&gt; instruction (&lt;code&gt;lw&lt;/code&gt;), the address calculated in the EX stage is used to read data from the data memory (or cache). For a &lt;strong&gt;store&lt;/strong&gt; instruction (&lt;code&gt;sw&lt;/code&gt;), the address and data are used to write to the data memory. For all other instructions (e.g., arithmetic or branch), this stage performs no operation.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;WB (Write-Back):&lt;/strong&gt; The final stage writes the result of the operation back into the register file. For an arithmetic instruction, the result comes from the ALU. For a &lt;strong&gt;load&lt;/strong&gt; instruction, the result is the data read from memory. The destination register identifier (&lt;code&gt;rd&lt;/code&gt;) from the instruction determines which register is written.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In an ideal scenario, a new instruction enters the IF stage every clock cycle. After five cycles, the pipeline is full, and one instruction completes every cycle, achieving an ideal throughput of one instruction per cycle (IPC).&lt;/p&gt;
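&lt;p&gt;A simple timing diagram shows the overlap (each column is one clock cycle):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Cycle:   1    2    3    4    5    6    7
Inst 1:  IF   ID   EX   MEM  WB
Inst 2:       IF   ID   EX   MEM  WB
Inst 3:            IF   ID   EX   MEM  WB
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;From cycle 5 onward, one instruction completes every cycle, even though each individual instruction still takes five cycles from start to finish.&lt;/p&gt;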
&lt;hr&gt;
&lt;h3&gt;Section 3.2: When the Assembly Line Breaks - Pipeline Hazards&lt;/h3&gt;
&lt;p&gt;The simple, elegant model of the 5-stage pipeline breaks down when dependencies between instructions conflict with the overlapped execution model. These conflicts are known as &lt;strong&gt;pipeline hazards&lt;/strong&gt;, and they are the primary challenge in processor design. Hazards force the pipeline to stall, inserting &quot;bubbles&quot; where no useful work is done, thereby degrading performance. There are three main types of hazards.&lt;/p&gt;
&lt;h4&gt;Structural Hazards&lt;/h4&gt;
&lt;p&gt;A &lt;strong&gt;structural hazard&lt;/strong&gt; occurs when two or more instructions in the pipeline require the same hardware resource at the same time. A classic example is a processor with a single, unified memory for both instructions and data. In such a design, a &lt;strong&gt;load&lt;/strong&gt; instruction in its MEM stage would need to access memory simultaneously with a later instruction in its IF stage, which also needs to access memory to be fetched. This resource conflict would force one of the instructions to wait. The standard solution in RISC processors is to use a &lt;strong&gt;Harvard architecture&lt;/strong&gt;, which employs separate, independent memories or caches for instructions and data, thus eliminating this specific hazard. Another potential structural hazard is in the register file, which is accessed for reads in the ID stage and for writes in the WB stage. This is typically resolved by designing the register file with separate read and write ports, or by performing writes in the first half of the clock cycle and reads in the second half.&lt;/p&gt;
&lt;h4&gt;Data Hazards&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Data hazards&lt;/strong&gt; arise from data dependencies between instructions. They occur when an instruction&apos;s execution depends on the result of a preceding instruction that is still in the pipeline.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Read-After-Write (RAW):&lt;/strong&gt; This is the most common and intuitive data hazard. An instruction attempts to read a source register before a previous instruction has written its result back to that register. Consider the sequence:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;add x5, x1, x2  // Instruction 1
sub x6, x5, x3  // Instruction 2
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;sub&lt;/code&gt; instruction needs the value of &lt;code&gt;x5&lt;/code&gt;, but the &lt;code&gt;add&lt;/code&gt; instruction only calculates it in its EX stage and writes it back in its WB stage. By the time the &lt;code&gt;sub&lt;/code&gt; instruction is in its ID stage ready to read &lt;code&gt;x5&lt;/code&gt;, the &lt;code&gt;add&lt;/code&gt; instruction has not yet completed its WB stage, so the register file contains an old, stale value for &lt;code&gt;x5&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Write-After-Read (WAR):&lt;/strong&gt; An instruction tries to write to a destination register before a preceding instruction has finished reading that register&apos;s original value. This is not a problem in the simple 5-stage pipeline because reads always happen in an earlier stage (ID) than writes (WB). However, it becomes a major issue in processors with out-of-order execution.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Write-After-Write (WAW):&lt;/strong&gt; Two instructions in the pipeline are scheduled to write to the same destination register. Similar to WAR, this is not an issue in a simple in-order pipeline where writes happen in program order, but it is a critical hazard that must be managed in more complex designs.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;Control Hazards&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Control hazards&lt;/strong&gt;, also known as branch hazards, are caused by branch and jump instructions that change the normal flow of program execution. The processor does not know the outcome of a conditional branch (whether it is taken or not taken) until the comparison is performed in the EX stage. By that time, the processor has already fetched and started decoding the instructions that sequentially follow the branch (at $PC+4$). If the branch is taken, these fetched instructions are incorrect and must be flushed from the pipeline, and the fetch must restart from the branch target address. This flushing process introduces stalls, or bubbles, into the pipeline, reducing performance.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;Section 3.3: Basic Hazard Resolution - Stalling and Forwarding&lt;/h3&gt;
&lt;p&gt;To ensure correct program execution, hazards must be detected and resolved by the processor&apos;s control logic.&lt;/p&gt;
&lt;h4&gt;Stalling (Pipeline Bubbles)&lt;/h4&gt;
&lt;p&gt;The most straightforward solution to a hazard is to &lt;strong&gt;stall&lt;/strong&gt; the pipeline. When the hazard detection logic in the ID stage identifies a dependency (e.g., a RAW hazard), it can freeze the early stages of the pipeline and insert no-operation instructions, or &quot;bubbles,&quot; into the later stages. For the &lt;code&gt;add&lt;/code&gt;/&lt;code&gt;sub&lt;/code&gt; example above, the &lt;code&gt;sub&lt;/code&gt; instruction would be held in the ID stage for several cycles until the &lt;code&gt;add&lt;/code&gt; instruction completes its WB stage and the new value of &lt;code&gt;x5&lt;/code&gt; is available in the register file. While simple and effective, stalling is inefficient as it directly reduces the pipeline&apos;s throughput.&lt;/p&gt;
&lt;h4&gt;Forwarding (Bypassing)&lt;/h4&gt;
&lt;p&gt;A much more efficient solution for most data hazards is &lt;strong&gt;forwarding&lt;/strong&gt;, also known as &lt;strong&gt;bypassing&lt;/strong&gt;. The key observation is that the result of an operation is often available within the pipeline long before it is written back to the register file. For example, the result of the &lt;code&gt;add&lt;/code&gt; instruction is available at the output of the ALU at the end of the EX stage. Forwarding logic adds extra data paths to send this result directly from the output of a later stage (like EX or MEM) back to the input of an earlier stage (like EX) for a subsequent, dependent instruction. This bypasses the need to wait for the result to be written to and then read from the register file. In the &lt;code&gt;add&lt;/code&gt;/&lt;code&gt;sub&lt;/code&gt; example, the result from the &lt;code&gt;add&lt;/code&gt; instruction&apos;s EX stage can be forwarded directly to the input of the &lt;code&gt;sub&lt;/code&gt; instruction&apos;s EX stage, completely eliminating the stall.&lt;/p&gt;
&lt;p&gt;However, forwarding cannot solve all data hazards. A classic case is the &lt;strong&gt;load-use hazard&lt;/strong&gt;. Consider this sequence:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;lw  x5, 0(x1)   // Instruction 1
add x6, x5, x2  // Instruction 2
&lt;/code&gt;&lt;/pre&gt;
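&lt;p&gt;Sketched as a timing diagram (the &lt;code&gt;**&lt;/code&gt; marks the unavoidable one-cycle bubble):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Cycle:           1    2    3    4    5    6    7
lw  x5, 0(x1):   IF   ID   EX   MEM  WB
add x6, x5, x2:       IF   ID   **   EX   MEM  WB
&lt;/code&gt;&lt;/pre&gt;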
&lt;p&gt;The &lt;code&gt;lw&lt;/code&gt; instruction only has the data from memory available at the end of its MEM stage. The &lt;code&gt;add&lt;/code&gt; instruction needs this data at the beginning of its EX stage. Even with a forwarding path from the MEM stage back to the EX stage, the data arrives one cycle too late. The &lt;code&gt;add&lt;/code&gt; instruction must be stalled for one cycle. This limitation, along with the performance penalty from control hazards and the inefficiency of handling long-latency operations like floating-point division, reveals the inherent performance ceiling of a rigid, in-order pipeline. It is this ceiling that motivates the development of more sophisticated, dynamic execution techniques that can look further ahead in the instruction stream to find independent work to do.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;Hazard Type&lt;/th&gt;&lt;th&gt;Description&lt;/th&gt;&lt;th&gt;Example RISC-V Sequence&lt;/th&gt;&lt;th&gt;Simple Pipeline Effect&lt;/th&gt;&lt;th&gt;Solution(s)&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;Structural&lt;/td&gt;&lt;td&gt;Two instructions need the same resource in the same cycle.&lt;/td&gt;&lt;td&gt;&lt;code&gt;lw&lt;/code&gt; in MEM stage, &lt;code&gt;add&lt;/code&gt; in IF stage, both needing a unified memory.&lt;/td&gt;&lt;td&gt;One instruction must stall.&lt;/td&gt;&lt;td&gt;Separate Instruction/Data Memories (Harvard Architecture).&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Data (RAW)&lt;/td&gt;&lt;td&gt;An instruction needs the result of a previous, unfinished instruction.&lt;/td&gt;&lt;td&gt;&lt;code&gt;add x5, x1, x2&lt;/code&gt; followed by &lt;code&gt;sub x6, x5, x3&lt;/code&gt;&lt;/td&gt;&lt;td&gt;&lt;code&gt;sub&lt;/code&gt; reads a stale value of &lt;code&gt;x5&lt;/code&gt; from the register file.&lt;/td&gt;&lt;td&gt;Stalling, Forwarding (Bypassing).&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Control&lt;/td&gt;&lt;td&gt;The address of the next instruction is unknown due to a branch.&lt;/td&gt;&lt;td&gt;&lt;code&gt;beq x1, x2, L1&lt;/code&gt; followed by &lt;code&gt;add x3, x4, x5&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Processor fetches &lt;code&gt;add&lt;/code&gt; before knowing if the branch to &lt;code&gt;L1&lt;/code&gt; is taken.&lt;/td&gt;&lt;td&gt;Stall until branch resolves, Branch Prediction, Flush incorrect path.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h2&gt;Part IV: The Brains of the Operation - Dynamic Scheduling with Tomasulo&apos;s Algorithm&lt;/h2&gt;
&lt;p&gt;The limitations of in-order pipelining become severe in the presence of long-latency operations (like floating-point arithmetic or cache misses) and frequent data dependencies. Stalls can quickly dominate the execution time, leaving valuable functional units idle. To overcome this, high-performance processors employ &lt;strong&gt;dynamic scheduling&lt;/strong&gt;, a technique that allows instructions to execute out of their original program order. The seminal hardware algorithm for this is Tomasulo&apos;s algorithm, first implemented in the IBM System/360 Model 91.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;Section 4.1: Beyond In-Order Execution&lt;/h3&gt;
&lt;p&gt;The core idea behind dynamic scheduling is to shift from a control-flow-driven execution model to a &lt;strong&gt;dataflow-driven&lt;/strong&gt; one. In a simple pipeline, an instruction executes when it reaches the front of the line. In a dynamically scheduled machine, an instruction is allowed to execute as soon as all of its required operands are available, regardless of its position in the original program sequence. This decoupling of instruction issue (fetching and decoding) from execution allows the processor to look ahead in the instruction stream, find independent instructions, and execute them while a prior, dependent instruction is stalled waiting for its data. This significantly increases the utilization of the processor&apos;s multiple execution units and improves overall performance.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;Section 4.2: Core Components of the Tomasulo Machine&lt;/h3&gt;
&lt;p&gt;Tomasulo&apos;s algorithm achieves this dataflow execution through three key hardware components that work in concert.&lt;/p&gt;
&lt;h4&gt;Reservation Stations (RS)&lt;/h4&gt;
&lt;p&gt;Instead of a single pipeline, a Tomasulo-based processor has a set of functional units (e.g., one or more adders, multipliers, load/store units), each equipped with its own set of buffers called &lt;strong&gt;Reservation Stations (RS)&lt;/strong&gt;. When an instruction is decoded, it is issued to a free reservation station associated with the required functional unit. The RS acts as a waiting area, holding the instruction until it is ready to execute.&lt;/p&gt;
&lt;p&gt;Each entry in a reservation station contains the following fields:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Busy:&lt;/strong&gt; A bit indicating whether the station is in use.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Op:&lt;/strong&gt; The operation to be performed (e.g., &lt;code&gt;ADD&lt;/code&gt;, &lt;code&gt;MUL&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Vj, Vk:&lt;/strong&gt; The actual values of the two source operands. These fields are filled if the operand values are already available in the register file when the instruction is issued.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Qj, Qk:&lt;/strong&gt; The source operand &lt;strong&gt;tags&lt;/strong&gt;. If an operand is not yet available because it is being produced by another instruction currently in-flight, these fields will hold a tag that identifies which reservation station will produce the required result. A value of zero or null in these fields indicates that the corresponding &lt;code&gt;V&lt;/code&gt; field holds a valid operand.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Dest:&lt;/strong&gt; A tag identifying the destination of the result (in modern implementations, this is a pointer to a Reorder Buffer entry).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The RS continuously monitors for its required operands. Once both &lt;code&gt;Qj&lt;/code&gt; and &lt;code&gt;Qk&lt;/code&gt; are zero (meaning &lt;code&gt;Vj&lt;/code&gt; and &lt;code&gt;Vk&lt;/code&gt; are both valid), the instruction is ready to be dispatched to its functional unit for execution.&lt;/p&gt;
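&lt;p&gt;As a rough illustration of these fields (a minimal sketch with my own naming, not any particular textbook&apos;s), a reservation-station entry and its readiness check might be modeled like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;from dataclasses import dataclass

@dataclass
class RSEntry:
    busy: bool = False
    op: str = &quot;&quot;              # e.g. &quot;ADD&quot;, &quot;MUL&quot;
    vj: float = 0.0             # operand values, valid only when the matching Q is None
    vk: float = 0.0
    qj: str | None = None       # tag of the RS that will produce Vj (None = Vj valid)
    qk: str | None = None       # tag of the RS that will produce Vk (None = Vk valid)
    dest: str = &quot;&quot;            # result tag (a ROB entry in modern designs)

    def ready(self):
        # Dispatch to the functional unit once both operands are values, not tags.
        return self.busy and self.qj is None and self.qk is None
&lt;/code&gt;&lt;/pre&gt;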
&lt;h4&gt;The Common Data Bus (CDB)&lt;/h4&gt;
&lt;p&gt;The &lt;strong&gt;Common Data Bus (CDB)&lt;/strong&gt; is a broadcast bus that connects the outputs of all functional units to the inputs of all reservation stations and the register file. When a functional unit finishes its computation, it does not just write the result to a register. Instead, it places both the computed &lt;strong&gt;value&lt;/strong&gt; and its unique &lt;strong&gt;tag&lt;/strong&gt; (the name of the reservation station that produced it) onto the CDB.&lt;/p&gt;
&lt;p&gt;All reservation stations are &quot;snooping&quot; (monitoring) the CDB in every cycle. If an RS sees a tag on the CDB that matches a tag in its &lt;code&gt;Qj&lt;/code&gt; or &lt;code&gt;Qk&lt;/code&gt; field, it knows its long-awaited operand is now available. It grabs the value from the CDB, places it into the corresponding &lt;code&gt;Vj&lt;/code&gt; or &lt;code&gt;Vk&lt;/code&gt; field, and clears the &lt;code&gt;Qj&lt;/code&gt; or &lt;code&gt;Qk&lt;/code&gt; field to zero. This mechanism allows results to be forwarded directly from producer to consumer without ever needing to pass through the register file, dramatically reducing stalls from RAW dependencies.&lt;/p&gt;
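&lt;p&gt;Continuing the sketch above, a CDB broadcast is then just a loop in which every busy station compares the broadcast tag against its &lt;code&gt;Q&lt;/code&gt; fields and captures the value on a match:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def broadcast(cdb_tag, cdb_value, stations):
    # Every reservation station snoops the CDB each cycle.
    for rs in stations:
        if not rs.busy:
            continue
        if rs.qj == cdb_tag:
            rs.vj, rs.qj = cdb_value, None   # operand arrives, tag is cleared
        if rs.qk == cdb_tag:
            rs.vk, rs.qk = cdb_value, None
&lt;/code&gt;&lt;/pre&gt;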
&lt;h4&gt;Hardware Register Renaming&lt;/h4&gt;
&lt;p&gt;Out-of-order execution introduces the possibility of WAR and WAW hazards, which were not a problem in the simple in-order pipeline. Tomasulo&apos;s algorithm elegantly eliminates these hazards through a mechanism called &lt;strong&gt;hardware register renaming&lt;/strong&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Why are WAR and WAW hazards a problem in out-of-order execution? You can think about it yourself, or read &lt;a href=&quot;#appendix-a&quot;&gt;Appendix A&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The key is to decouple the architectural registers (the names visible to the programmer, e.g., F0, F2, F4) from the physical storage locations (the reservation stations). A mapping table, often called the &lt;strong&gt;Register Alias Table (RAT)&lt;/strong&gt; or Register Result Status, maintains the current mapping. For each architectural register, this table stores the tag of the reservation station that will produce the next value for that register.&lt;/p&gt;
&lt;p&gt;The process works as follows:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Issue:&lt;/strong&gt; When an instruction like &lt;code&gt;ADD.D F6, F8, F2&lt;/code&gt; is issued, the control logic looks up &lt;code&gt;F8&lt;/code&gt; and &lt;code&gt;F2&lt;/code&gt; in the RAT.
&lt;ul&gt;
&lt;li&gt;If the RAT entry for a source register is empty, the value is ready in the main register file. This value is copied to the &lt;code&gt;V&lt;/code&gt; field of the reservation station.&lt;/li&gt;
&lt;li&gt;If the RAT entry contains a tag (e.g., &lt;code&gt;Add1&lt;/code&gt;), it means another instruction is currently computing the value. This tag is copied into the &lt;code&gt;Q&lt;/code&gt; field of the new reservation station.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Rename:&lt;/strong&gt; After reading the source tags, the logic updates the RAT entry for the destination register, &lt;code&gt;F6&lt;/code&gt;, with the tag of the newly allocated reservation station (e.g., &lt;code&gt;Add2&lt;/code&gt;). Now, any subsequent instruction that needs &lt;code&gt;F6&lt;/code&gt; will be directed to get its value from &lt;code&gt;Add2&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This renaming process breaks false dependencies. If a later instruction also writes to &lt;code&gt;F6&lt;/code&gt; (a WAW hazard), it will simply be allocated a new reservation station (&lt;code&gt;Add3&lt;/code&gt;), and the RAT will be updated to point to &lt;code&gt;Add3&lt;/code&gt;. The original &lt;code&gt;ADD.D&lt;/code&gt; instruction is unaffected because it is already linked to &lt;code&gt;Add2&lt;/code&gt;. Similarly, WAR hazards are eliminated because source operands either get their value immediately or are linked to a specific producer via a tag; a subsequent write to that source register will be renamed to a new physical location and will not affect the original value needed by the earlier instruction.&lt;/p&gt;
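&lt;p&gt;To make the issue/rename steps concrete, here is a small sketch in the same spirit as the earlier one, where the RAT is a plain dict from architectural register name to producer tag (absent means the register file holds the value); again, the names are mine:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def issue(rat, regfile, rs, dest, src1, src2, new_tag):
    # Step 1: read sources, each either a value or a producer tag.
    for field, src in ((&quot;j&quot;, src1), (&quot;k&quot;, src2)):
        tag = rat.get(src)
        if tag is None:
            setattr(rs, &quot;v&quot; + field, regfile[src])   # value is ready now
        else:
            setattr(rs, &quot;q&quot; + field, tag)              # wait for the producer
    # Step 2: rename; later readers of dest are directed to this station.
    rat[dest] = new_tag
    rs.busy, rs.dest = True, new_tag
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Note how a second write to the same destination would simply install a newer tag in the table, which is exactly how the WAW case described above is broken.&lt;/p&gt;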
&lt;hr&gt;
&lt;h3&gt;Section 4.3: A Cycle-by-Cycle Walkthrough of Tomasulo&apos;s Algorithm&lt;/h3&gt;
&lt;p&gt;To solidify these concepts, a detailed, cycle-by-cycle trace of a sequence of dependent instructions is invaluable. This walkthrough will demonstrate the dynamic interplay between the reservation stations, the RAT, and the CDB.&lt;/p&gt;
&lt;h4&gt;Simulation Setup:&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Functional Units:&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;1 Integer Unit (for effective address calculation): 1 cycle latency.&lt;/li&gt;
&lt;li&gt;2 FP Adders (for &lt;code&gt;ADD.D&lt;/code&gt;, &lt;code&gt;SUB.D&lt;/code&gt;): 2 cycles latency.&lt;/li&gt;
&lt;li&gt;2 FP Multipliers (for &lt;code&gt;MUL.D&lt;/code&gt;): 10 cycles latency.&lt;/li&gt;
&lt;li&gt;1 FP Divider (for &lt;code&gt;DIV.D&lt;/code&gt;): 40 cycles latency.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Instruction Issue:&lt;/strong&gt; 1 instruction per cycle.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;CDB:&lt;/strong&gt; 1 result can be broadcast per cycle.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Reservation Stations:&lt;/strong&gt; 3 for Add/Sub, 2 for Mult/Div.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;Example Instruction Sequence:&lt;/h4&gt;
&lt;pre&gt;&lt;code&gt;L.D F6, 34(R2)
L.D F2, 45(R3)
MUL.D F0, F2, F4
SUB.D F8, F6, F2
DIV.D F10, F0, F6
ADD.D F6, F8, F2
&lt;/code&gt;&lt;/pre&gt;
&lt;h4&gt;Initial State:&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Register File:&lt;/strong&gt; R2=100, R3=200, F4=2.0. All other FP registers have some initial value.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Memory:&lt;/strong&gt; Mem[134]=10.0, Mem[245]=5.0.&lt;/li&gt;
&lt;li&gt;All Reservation Stations are empty.&lt;/li&gt;
&lt;li&gt;Register Result Status is empty (all values are in the Register File).&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h4&gt;Cycle 1:&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Events:&lt;/strong&gt; &lt;code&gt;L.D F6, 34(R2)&lt;/code&gt; is issued.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Actions:&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;A Load buffer (Load1) is allocated.&lt;/li&gt;
&lt;li&gt;The value of &lt;code&gt;R2&lt;/code&gt; (100) is read from the integer register file.&lt;/li&gt;
&lt;li&gt;The effective address is calculated immediately: $100 + 34 = 134$.&lt;/li&gt;
&lt;li&gt;The Register Result Status for &lt;code&gt;F6&lt;/code&gt; is updated to point to &lt;code&gt;Load1&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;| Instruction      | Issue | Execute | Write Result |
| :--------------- | :---- | :------ | :----------- |
| &lt;code&gt;L.D F6, 34(R2)&lt;/code&gt; | 1     |         |              |&lt;/p&gt;
&lt;p&gt;| Reservation Stations | Busy | Op   | Vj  | Vk  | Qj  | Qk  | Address |
| :------------------- | :--- | :--- | :-- | :-- | :-- | :-- | :------ |
| Load1                | Yes  | Load | 100 |     |     |     | 34      |
| Load2                | No   |      |     |     |     |     |         |&lt;/p&gt;
&lt;p&gt;| Register Result Status | F0  | F2  | F4  | F6    | F8  | F10 |
| :--------------------- | :-- | :-- | :-- | :---- | :-- | :-- |
| Qi                     |     |     |     | Load1 |     |     |&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;CDB Activity:&lt;/strong&gt; None.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h4&gt;Cycle 2:&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Events:&lt;/strong&gt; &lt;code&gt;L.D F2, 45(R3)&lt;/code&gt; is issued. &lt;code&gt;Load1&lt;/code&gt; begins memory access.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Actions:&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;Load2 buffer is allocated.&lt;/li&gt;
&lt;li&gt;Value of &lt;code&gt;R3&lt;/code&gt; (200) is read. Effective address $200 + 45 = 245$ is calculated.&lt;/li&gt;
&lt;li&gt;Register Result Status for &lt;code&gt;F2&lt;/code&gt; is updated to &lt;code&gt;Load2&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;| Instruction      | Issue | Execute | Write Result |
| :--------------- | :---- | :------ | :----------- |
| &lt;code&gt;L.D F6, 34(R2)&lt;/code&gt; | 1     | 2       |              |
| &lt;code&gt;L.D F2, 45(R3)&lt;/code&gt; | 2     |         |              |&lt;/p&gt;
&lt;p&gt;| Reservation Stations | Busy | Op   | Vj  | Vk  | Qj  | Qk  | Address |
| :------------------- | :--- | :--- | :-- | :-- | :-- | :-- | :------ |
| Load1                | Yes  | Load | 100 |     |     |     | 34      |
| Load2                | Yes  | Load | 200 |     |     |     | 45      |&lt;/p&gt;
&lt;p&gt;| Register Result Status | F0  | F2    | F4  | F6    | F8  | F10 |
| :--------------------- | :-- | :---- | :-- | :---- | :-- | :-- |
| Qi                     |     | Load2 |     | Load1 |     |     |&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;CDB Activity:&lt;/strong&gt; None.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h4&gt;Cycle 3:&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Events:&lt;/strong&gt; &lt;code&gt;MUL.D F0, F2, F4&lt;/code&gt; is issued. &lt;code&gt;Load1&lt;/code&gt; completes memory access. &lt;code&gt;Load2&lt;/code&gt; begins memory access.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Actions:&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;A multiplier RS (Mult1) is allocated.&lt;/li&gt;
&lt;li&gt;RAT is checked for sources &lt;code&gt;F2&lt;/code&gt; and &lt;code&gt;F4&lt;/code&gt;. &lt;code&gt;F2&lt;/code&gt; is being produced by &lt;code&gt;Load2&lt;/code&gt;, so &lt;code&gt;Qj&lt;/code&gt; of &lt;code&gt;Mult1&lt;/code&gt; gets tag &lt;code&gt;Load2&lt;/code&gt;. &lt;code&gt;F4&lt;/code&gt; is ready in the register file, so its value (2.0) is copied to &lt;code&gt;Vk&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;RAT for destination &lt;code&gt;F0&lt;/code&gt; is updated to &lt;code&gt;Mult1&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;| Instruction        | Issue | Execute | Write Result |
| :----------------- | :---- | :------ | :----------- |
| &lt;code&gt;L.D F6, 34(R2)&lt;/code&gt;   | 1     | 2       | 3            |
| &lt;code&gt;L.D F2, 45(R3)&lt;/code&gt;   | 2     | 3       |              |
| &lt;code&gt;MUL.D F0, F2, F4&lt;/code&gt; | 3     |         |              |&lt;/p&gt;
&lt;p&gt;| Reservation Stations | Busy | Op    | Vj  | Vk  | Qj    | Qk  | Address |
| :------------------- | :--- | :---- | :-- | :-- | :---- | :-- | :------ |
| Load1                | Yes  | Load  | ... |     |       |     |         |
| Load2                | Yes  | Load  | ... |     |       |     |         |
| Mult1                | Yes  | MUL.D |     | 2.0 | Load2 |     |         |
| Add1                 | No   |       |     |     |       |     |         |&lt;/p&gt;
&lt;p&gt;| Register Result Status | F0    | F2    | F4  | F6    | F8  | F10 |
| :--------------------- | :---- | :---- | :-- | :---- | :-- | :-- |
| Qi                     | Mult1 | Load2 |     | Load1 |     |     |&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;CDB Activity:&lt;/strong&gt; &lt;code&gt;Load1&lt;/code&gt; broadcasts result Mem[134] (value 10.0) with tag &lt;code&gt;Load1&lt;/code&gt;.
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Snooping:&lt;/strong&gt; No waiting RS needs &lt;code&gt;Load1&lt;/code&gt; yet. The value is written into &lt;code&gt;F6&lt;/code&gt; in the register file, and the RAT tag for &lt;code&gt;F6&lt;/code&gt; is cleared.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h4&gt;Cycle 4:&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Events:&lt;/strong&gt; &lt;code&gt;SUB.D F8, F6, F2&lt;/code&gt; is issued. &lt;code&gt;Load2&lt;/code&gt; completes memory access.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Actions:&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;An adder RS (Add1) is allocated.&lt;/li&gt;
&lt;li&gt;RAT is checked for &lt;code&gt;F6&lt;/code&gt; and &lt;code&gt;F2&lt;/code&gt;. &lt;code&gt;F6&lt;/code&gt; is now ready (value 10.0 from &lt;code&gt;Load1&lt;/code&gt;&apos;s broadcast), so &lt;code&gt;Vj&lt;/code&gt; gets 10.0. &lt;code&gt;F2&lt;/code&gt; is still being produced by &lt;code&gt;Load2&lt;/code&gt;, so &lt;code&gt;Qk&lt;/code&gt; gets tag &lt;code&gt;Load2&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;RAT for &lt;code&gt;F8&lt;/code&gt; is updated to &lt;code&gt;Add1&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;| Instruction        | Issue | Execute | Write Result |
| :----------------- | :---- | :------ | :----------- |
| &lt;code&gt;L.D F6, 34(R2)&lt;/code&gt;   | 1     | 2       | 3            |
| &lt;code&gt;L.D F2, 45(R3)&lt;/code&gt;   | 2     | 3       | 4            |
| &lt;code&gt;MUL.D F0, F2, F4&lt;/code&gt; | 3     |         |              |
| &lt;code&gt;SUB.D F8, F6, F2&lt;/code&gt; | 4     |         |              |&lt;/p&gt;
&lt;p&gt;| Reservation Stations | Busy | Op    | Vj   | Vk  | Qj    | Qk    |
| :------------------- | :--- | :---- | :--- | :-- | :---- | :---- |
| Mult1                | Yes  | MUL.D |      | 2.0 | Load2 |       |
| Add1                 | Yes  | SUB.D | 10.0 |     |       | Load2 |&lt;/p&gt;
&lt;p&gt;| Register Result Status | F0    | F2    | F4  | F6  | F8   | F10 |
| :--------------------- | :---- | :---- | :-- | :-- | :--- | :-- |
| Qi                     | Mult1 | Load2 |     |     | Add1 |     |&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;CDB Activity:&lt;/strong&gt; &lt;code&gt;Load2&lt;/code&gt; broadcasts result Mem[245] (value 5.0) with tag &lt;code&gt;Load2&lt;/code&gt;.
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Snooping:&lt;/strong&gt; Both &lt;code&gt;Mult1&lt;/code&gt; and &lt;code&gt;Add1&lt;/code&gt; are waiting for &lt;code&gt;Load2&lt;/code&gt;. They both snoop the CDB, capture the value 5.0, and clear their &lt;code&gt;Q&lt;/code&gt; fields. &lt;code&gt;Mult1&lt;/code&gt;&apos;s &lt;code&gt;Vj&lt;/code&gt; becomes 5.0. &lt;code&gt;Add1&lt;/code&gt;&apos;s &lt;code&gt;Vk&lt;/code&gt; becomes 5.0.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h4&gt;Cycle 5:&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Events:&lt;/strong&gt; &lt;code&gt;DIV.D F10, F0, F6&lt;/code&gt; is issued. Both &lt;code&gt;Mult1&lt;/code&gt; and &lt;code&gt;Add1&lt;/code&gt; are now ready to execute.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Actions:&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;A divider RS (Div1) is allocated.&lt;/li&gt;
&lt;li&gt;RAT is checked for &lt;code&gt;F0&lt;/code&gt; and &lt;code&gt;F6&lt;/code&gt;. &lt;code&gt;F0&lt;/code&gt; is being produced by &lt;code&gt;Mult1&lt;/code&gt;. &lt;code&gt;F6&lt;/code&gt; is ready. &lt;code&gt;Div1&lt;/code&gt; gets tag &lt;code&gt;Mult1&lt;/code&gt; in &lt;code&gt;Qj&lt;/code&gt; and value 10.0 in &lt;code&gt;Vk&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;RAT for &lt;code&gt;F10&lt;/code&gt; is updated to &lt;code&gt;Div1&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Mult1&lt;/code&gt; begins its 10-cycle execution (5.0 * 2.0).&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Add1&lt;/code&gt; begins its 2-cycle execution (10.0 - 5.0). Note the out-of-order execution: &lt;code&gt;SUB.D&lt;/code&gt; starts before &lt;code&gt;MUL.D&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;| Instruction         | Issue | Execute | Write Result |
| :------------------ | :---- | :------ | :----------- |
| ...                 | ...   | ...     | ...          |
| &lt;code&gt;MUL.D F0, F2, F4&lt;/code&gt;  | 3     | 5       |              |
| &lt;code&gt;SUB.D F8, F6, F2&lt;/code&gt;  | 4     | 5       |              |
| &lt;code&gt;DIV.D F10, F0, F6&lt;/code&gt; | 5     |         |              |&lt;/p&gt;
&lt;p&gt;| Reservation Stations | Busy | Op    | Vj   | Vk   | Qj    | Qk  |
| :------------------- | :--- | :---- | :--- | :--- | :---- | :-- |
| Mult1                | Yes  | MUL.D | 5.0  | 2.0  |       |     |
| Add1                 | Yes  | SUB.D | 10.0 | 5.0  |       |     |
| Div1                 | Yes  | DIV.D |      | 10.0 | Mult1 |     |&lt;/p&gt;
&lt;p&gt;| Register Result Status | F0    | F2  | F4  | F6  | F8   | F10  |
| :--------------------- | :---- | :-- | :-- | :-- | :--- | :--- |
| Qi                     | Mult1 |     |     |     | Add1 | Div1 |&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;CDB Activity:&lt;/strong&gt; None.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;...This process continues. The &lt;code&gt;SUB.D&lt;/code&gt; will finish in cycle 6 and broadcast its result. The &lt;code&gt;ADD.D&lt;/code&gt; (instruction 6) will issue and wait for results from &lt;code&gt;Add1&lt;/code&gt; and &lt;code&gt;Load2&lt;/code&gt;. The &lt;code&gt;MUL.D&lt;/code&gt; will finish in cycle 14 and broadcast, allowing the &lt;code&gt;DIV.D&lt;/code&gt; to start its long 40-cycle execution. This detailed trace reveals how the hardware dynamically resolves dependencies and executes instructions as soon as their data is ready, maximizing parallelism.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;Section 4.4: Taming the Chaos with the Reorder Buffer (ROB)&lt;/h3&gt;
&lt;p&gt;While Tomasulo&apos;s algorithm is brilliant at extracting instruction-level parallelism, its out-of-order completion creates a significant problem: it makes handling exceptions and branch mispredictions incredibly difficult. If &lt;code&gt;SUB.D&lt;/code&gt; completes before the earlier &lt;code&gt;MUL.D&lt;/code&gt;, and the &lt;code&gt;MUL.D&lt;/code&gt; then raises an arithmetic exception, the machine state is inconsistent: the processor has already modified state (F8) from an instruction that is logically &lt;strong&gt;after&lt;/strong&gt; the faulting instruction. This is called an &lt;strong&gt;imprecise exception&lt;/strong&gt;, and it makes operating systems and recovery mechanisms nearly impossible to implement correctly.&lt;/p&gt;
&lt;p&gt;The solution is to add a new hardware structure, the &lt;strong&gt;Reorder Buffer (ROB)&lt;/strong&gt;, which extends the original algorithm to ensure that while instructions may &lt;strong&gt;execute&lt;/strong&gt; out of order, they &lt;strong&gt;commit&lt;/strong&gt; their results to the architectural state (the main register file and memory) in strict program order.&lt;/p&gt;
&lt;h4&gt;ROB Mechanism&lt;/h4&gt;
&lt;p&gt;The ROB is a circular buffer that operates on a First-In, First-Out (FIFO) basis. It bridges the gap between out-of-order execution completion and in-order architectural update.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Issue:&lt;/strong&gt; When an instruction is decoded, it is allocated an entry at the tail of the ROB. This ROB entry number becomes the instruction&apos;s new tag. The register renaming table (RAT) now points to ROB entries, not reservation stations.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Execute:&lt;/strong&gt; Instructions are still sent to reservation stations and execute out-of-order as before.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Write Result:&lt;/strong&gt; When a functional unit completes, it broadcasts its result and its ROB tag on the CDB. The result is written into the corresponding entry in the &lt;strong&gt;ROB&lt;/strong&gt;, not the register file. The ROB entry is marked as &quot;ready&quot;. Any waiting reservation stations also snoop the CDB and grab the result.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Commit:&lt;/strong&gt; The processor examines the instruction at the &lt;strong&gt;head&lt;/strong&gt; of the ROB. If its entry is marked &quot;ready,&quot; the instruction is committed. This means its result is finally written from the ROB to the architectural register file or memory. The instruction is then removed from the ROB (the head pointer advances). If the instruction at the head is not yet ready, the commit stage stalls, and no subsequent instructions can be committed, thus enforcing in-order retirement.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Each entry in the ROB typically contains these fields:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Busy:&lt;/strong&gt; Indicates if the entry is valid.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Instruction Type:&lt;/strong&gt; Specifies if it&apos;s a branch, store, or register operation.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;State:&lt;/strong&gt; Tracks the instruction&apos;s progress (e.g., &lt;code&gt;Issue&lt;/code&gt;, &lt;code&gt;Execute&lt;/code&gt;, &lt;code&gt;WriteResult&lt;/code&gt;, &lt;code&gt;Commit&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Destination:&lt;/strong&gt; The architectural register number or memory address to be written.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Value:&lt;/strong&gt; The computed result, held here until commit.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Ready:&lt;/strong&gt; A bit indicating the result is valid in the &lt;code&gt;Value&lt;/code&gt; field.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Exception:&lt;/strong&gt; Stores any exception information generated during execution.&lt;/li&gt;
&lt;/ul&gt;
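&lt;p&gt;As a minimal sketch of the commit stage built on those fields (made-up names, ignoring stores and branches), the key property is that only the head of the FIFO may update architectural state:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;from collections import deque

def commit_one(rob, regfile):
    # Commit at most one instruction per cycle, strictly in program order.
    if rob and rob[0][&quot;ready&quot;]:
        entry = rob.popleft()                       # head of the ROB
        if entry[&quot;exception&quot;] is not None:
            raise RuntimeError(entry[&quot;exception&quot;])  # precise: oldest fault first
        regfile[entry[&quot;dest&quot;]] = entry[&quot;value&quot;]
    # If the head is not ready, nothing commits this cycle, even if
    # younger entries further back already hold finished results.

rob = deque([
    {&quot;ready&quot;: True,  &quot;exception&quot;: None, &quot;dest&quot;: &quot;F8&quot;, &quot;value&quot;: 5.0},
    {&quot;ready&quot;: False, &quot;exception&quot;: None, &quot;dest&quot;: &quot;F0&quot;, &quot;value&quot;: None},
])
regs = {}
commit_one(rob, regs)   # commits the ready entry at the head
commit_one(rob, regs)   # head not ready: stalls, commits nothing
&lt;/code&gt;&lt;/pre&gt;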
&lt;hr&gt;
&lt;h3&gt;Section 4.5: Achieving Precise Exceptions and Speculation&lt;/h3&gt;
&lt;p&gt;The addition of the ROB is the key that unlocks two of the most powerful features of modern high-performance processors: precise exceptions and speculative execution.&lt;/p&gt;
&lt;h4&gt;Precise Exceptions&lt;/h4&gt;
&lt;p&gt;The ROB provides a simple and elegant mechanism for handling exceptions precisely. When an instruction (e.g., a &lt;code&gt;DIV.D&lt;/code&gt; by zero) encounters an exception during its execution, the exception is not acted upon immediately. Instead, the exception status is simply recorded in the &lt;code&gt;Exception&lt;/code&gt; field of the instruction&apos;s entry in the ROB. The processor continues to execute and complete other instructions out of order. The exception is only handled when the faulting instruction reaches the head of the ROB and is ready to be committed. At that point, the processor knows the exception is not speculative and is the next one to occur in the program&apos;s sequential order. It can then flush the entire pipeline and ROB, save a precise state, and jump to the operating system&apos;s exception handler.&lt;/p&gt;
&lt;h4&gt;Branch Speculation&lt;/h4&gt;
&lt;p&gt;The ROB is also the enabler of efficient &lt;strong&gt;branch speculation&lt;/strong&gt;. When the processor encounters a branch, a branch predictor guesses the outcome. The processor then &lt;strong&gt;speculatively&lt;/strong&gt; fetches, issues, and executes instructions from the predicted path, filling the ROB with these speculative instructions.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;If the prediction is correct:&lt;/strong&gt; The branch instruction eventually reaches the head of the ROB and is committed. The speculative instructions that follow it then commit normally as they reach the head. No time was lost.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;If the prediction is incorrect:&lt;/strong&gt; When the branch instruction is finally executed and the misprediction is discovered, the processor performs a recovery. It flushes all speculative instructions from the pipeline, reservation stations, and the ROB (in the simplest scheme, where recovery waits until the mispredicted branch reaches the head of the ROB, this amounts to clearing the ROB by resetting its tail pointer to its head pointer). No architectural state was corrupted because none of the speculative instructions were ever committed. The processor then begins fetching from the correct path. (A small sketch of this recovery follows below.)&lt;/li&gt;
&lt;/ul&gt;
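&lt;p&gt;In that simplified recovery scheme, squashing the speculative work is nothing more than discarding every ROB entry younger than the branch; none of them ever touched architectural state, so there is nothing to undo. A sketch, reusing the FIFO from above:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def recover_from_mispredict(rob, branch_pos):
    # Drop everything issued after the mispredicted branch (the tail of
    # the FIFO). No undo is needed: speculative entries never committed.
    while len(rob) &gt; branch_pos + 1:
        rob.pop()   # retract the youngest entry
&lt;/code&gt;&lt;/pre&gt;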
&lt;p&gt;The combination of a Tomasulo-style dataflow execution core with a Reorder Buffer for in-order commit forms the foundation of virtually all modern high-performance, out-of-order (OOO) processors. This two-part architecture elegantly solves the problems of data dependencies, false dependencies, and imprecise state, allowing for a massive increase in instruction-level parallelism.&lt;/p&gt;
&lt;h2&gt;Appendix A&lt;/h2&gt;
&lt;h3&gt;WAR Hazard&lt;/h3&gt;
&lt;p&gt;A WAR hazard, or &quot;anti-dependence,&quot; happens when an instruction wants to write to a register before an earlier instruction has finished reading that register&apos;s original value.&lt;/p&gt;
&lt;p&gt;Here is a simple example:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Instruction 1 w/ long latency
1.  FMUL.D  F2, F4, F6   // Multiplies F4 and F6, result goes to F2

# Instruction 2 w/ short latency, independent operands
2.  FADD.D  F4, F8, F10  // Adds F8 and F10, result goes to F4
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A Tomasulo processor&apos;s goal is to maximize performance by executing instructions as soon as their operands are ready.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Instruction 1 (FMUL.D) is issued. Let&apos;s say it&apos;s a long operation that will take 10 cycles. It needs to read F4.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Instruction 2 (FADD.D) is issued right after. The processor sees its source operands (F8 and F10) are ready and that the addition functional unit is free. It&apos;s a short 2-cycle operation.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The processor executes FADD.D immediately, without waiting for the FMUL.D to finish.&lt;/p&gt;
&lt;p&gt;So here is the &lt;strong&gt;WAR&lt;/strong&gt; hazard: The FADD.D finishes in 2 cycles and wants to write its result to register F4. But the FMUL.D instruction hasn&apos;t even started its long execution yet and still needs the original value from F4! If the FADD.D were allowed to write to the actual architectural register F4, it would corrupt the input for the FMUL.D, leading to an incorrect program result.&lt;/p&gt;
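&lt;p&gt;Renaming (Section 4.2) avoids this precisely because the FMUL.D&apos;s reservation station captured its operands at issue time. A toy illustration of that value capture (the values here are hypothetical):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# At issue time the FMUL.D station copies F4&apos;s current value into Vj
# (or records a producer tag), so a later write to F4 cannot corrupt it.
regfile = {&quot;F4&quot;: 3.0, &quot;F6&quot;: 7.0, &quot;F8&quot;: 1.0, &quot;F10&quot;: 2.0}
mul_station = {&quot;vj&quot;: regfile[&quot;F4&quot;], &quot;vk&quot;: regfile[&quot;F6&quot;]}  # snapshot: 3.0, 7.0
regfile[&quot;F4&quot;] = regfile[&quot;F8&quot;] + regfile[&quot;F10&quot;]   # FADD.D writes F4 &quot;early&quot;
print(mul_station[&quot;vj&quot;])   # still 3.0: the multiply&apos;s input is safe
&lt;/code&gt;&lt;/pre&gt;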
&lt;h3&gt;WAW Hazard&lt;/h3&gt;
&lt;p&gt;A WAW hazard, or &quot;output dependence,&quot; happens when two different instructions want to write to the same destination register, and the instruction that came later in the program finishes execution first.&lt;/p&gt;
&lt;p&gt;Here is a simple example:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;1.  FMUL.D  F2, F4, F6    // Writes to F2

2.  FADD.D  F2, F8, F10   // Also writes to F2
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The correct final value in F2 should be the result of the FADD.D instruction, since it comes later in the program.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The FADD.D (Instruction 2) is short and finishes in 2 cycles. It&apos;s ready to write its result.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The FMUL.D (Instruction 1) is long and finishes 10 cycles later.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So here is the &lt;strong&gt;WAW&lt;/strong&gt; hazard: If the FADD.D writes its result to F2, and then 8 cycles later the FMUL.D also writes its result to F2, the final value in the register will be from the FMUL.D. This is incorrect! The result from the instruction that was supposed to happen first has overwritten the result from the instruction that was supposed to happen last.&lt;/p&gt;</content:encoded></item><item><title>Enabling KVM GPU Passthrough</title><link>https://20051110.xyz/blog/gpu-passthrough</link><guid isPermaLink="true">https://20051110.xyz/blog/gpu-passthrough</guid><description>How to enable GPU passthrough for KVM on Linux</description><pubDate>Sun, 27 Apr 2025 10:55:00 GMT</pubDate><content:encoded>&lt;h2&gt;Credits&lt;/h2&gt;
&lt;p&gt;In this article, the &quot;Enabling IOMMU&quot; and the &quot;GPU Passthrough&quot; sections are adapted from &lt;a href=&quot;https://drakeor.com/2022/02/16/kvm-gpu-passthrough-tutorial/&quot;&gt;Drakeor&apos;s Blog&lt;/a&gt; with some clarifications and modifications. The original article is very well written and I highly recommend reading it.&lt;/p&gt;
&lt;p&gt;If this article is helpful, make sure to check out Drakeor&apos;s blog and support him. Thanks to Drakeor for the great work!&lt;/p&gt;
&lt;h2&gt;Enabling IOMMU&lt;/h2&gt;
&lt;h3&gt;Setup&lt;/h3&gt;
&lt;p&gt;In my setup, I have a host machine with an NVIDIA GeForce RTX 4090 GPU and a guest machine running Ubuntu 24.04 Server for AI training. The host runs Ubuntu 24.04 LTS with the 6.11 kernel and also has an integrated Intel UHD Graphics 770, which drives the host display; the NVIDIA GPU is passed through to the guest machine.&lt;/p&gt;
&lt;p&gt;The host machine has the following hardware:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;CPU: Intel Core i9-14900K&lt;/li&gt;
&lt;li&gt;Motherboard: Gigabyte Z790 AORUS XTREME&lt;/li&gt;
&lt;li&gt;GPU: ZOTAC GeForce RTX 4090&lt;/li&gt;
&lt;li&gt;RAM: 64GB DDR5&lt;/li&gt;
&lt;li&gt;Storage: 2TB NVMe SSD&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Enabling IOMMU is a crucial step for GPU passthrough: it provides the DMA remapping and device isolation that let a PCI device be handed directly to a guest VM. It takes two steps to enable IOMMU: enabling it in the BIOS and enabling it in Linux.&lt;/p&gt;
&lt;h3&gt;BIOS Settings&lt;/h3&gt;
&lt;p&gt;This tutorial assumes that you have IOMMU support on both your motherboard and CPU. Most modern server motherboards should support it, but your mileage may vary with desktop motherboards. Here are the BIOS options corresponding to IOMMU-related features:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Intel Based: Enable &quot;Intel VT-d&quot;. May also be called &quot;Intel Virtualization Technology&quot; or simply &quot;VT-d&quot; on some motherboards.&lt;/li&gt;
&lt;li&gt;AMD Based: Enable &quot;SVM&quot;. May also be called &quot;AMD Virtualization&quot; or simply &quot;AMD-V&quot;.
Note: I&apos;ve seen &quot;IOMMU&quot; as its own separate option on one of my motherboards, but not on any of my other motherboards. Make sure it&apos;s enabled if you do see it. If you don&apos;t see it, it&apos;s likely rolled into one of the VT-d or AMD-V options listed above.&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote&gt;
&lt;p&gt;Some modern computers may have IOMMU enabled by default, so you may first verify whether it is enabled or not. If you are not sure, you can check the BIOS settings.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h4&gt;Checking for IOMMU Support on your CPU&lt;/h4&gt;
&lt;p&gt;On Ubuntu/Debian for my Intel processor, it&apos;s as easy as this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;cat /proc/cpuinfo | grep --color vmx
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If you see &lt;code&gt;vmx&lt;/code&gt; highlighted in the output, your CPU has the required virtualization support; if you see nothing, it does not. (Strictly speaking, &lt;code&gt;vmx&lt;/code&gt;/&lt;code&gt;svm&lt;/code&gt; indicate the VT-x/AMD-V virtualization extensions; VT-d/AMD-Vi IOMMU support is a separate platform feature, so check your CPU and motherboard documentation if you are unsure.)&lt;/p&gt;
&lt;p&gt;The AMD equivalent is this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;cat /proc/cpuinfo | grep --color svm
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;There is one other BIOS setting that I recommend changing before you move on to the next section.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Make sure the &lt;strong&gt;Primary GPU is set to integrated and not using your passthrough graphics card&lt;/strong&gt;. This is called &quot;Boot GPU&quot; and &quot;Primary Graphics&quot; in my BIOS. Also remember to plug your monitor into the integrated graphics port on your motherboard. This is important because the host machine will use the integrated graphics for display and the passthrough graphics card will be used by the guest machine.&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote&gt;
&lt;p&gt;It is also worth noting that some motherboards have settings called &quot;Above 4G Decoding&quot; and &quot;Resizable BAR Support&quot;. These are not the same as IOMMU; they are used for PCIe devices that require more than 4GB of address space. They are not required for IOMMU to work, but enabling them is recommended if you have a GPU with more than 4GB of VRAM.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Once you&apos;ve enabled the above settings, save and exit the BIOS. This is a one-time operation. You will not need to do this again unless you reset your BIOS settings.&lt;/p&gt;
&lt;h3&gt;Linux GRUB Settings&lt;/h3&gt;
&lt;p&gt;Add the following options to your GRUB_CMDLINE_LINUX option in the &lt;code&gt;/etc/default/grub&lt;/code&gt; file:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;sudo nano /etc/default/grub
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For Intel CPUs, add the following options:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;GRUB_CMDLINE_LINUX=&quot;... intel_iommu=on iommu=pt video=efifb:off&quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;...&lt;/code&gt; in the lines above stands for your existing options; make sure to keep them.&lt;/p&gt;
&lt;p&gt;For AMD CPUs, add the following options:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;GRUB_CMDLINE_LINUX=&quot;... amd_iommu=on iommu=pt video=efifb:off&quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And then update GRUB:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;sudo grub-mkconfig -o /boot/grub/grub.cfg
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Make sure to reboot your system.&lt;/p&gt;
&lt;p&gt;Then, to check that IOMMU is enabled, run the following command:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;sudo dmesg | grep -i -e DMAR -e IOMMU
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You should see at least a message or two about it loading like below:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;Feb 10 17:55:23.119993 opaleye kernel: pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported
Feb 10 17:55:23.123622 opaleye kernel: pci 0000:00:00.2: AMD-Vi: Found IOMMU cap 0x40
Feb 10 17:55:23.123691 opaleye kernel: perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank)
Feb 10 17:55:23.124108 opaleye kernel: AMD-Vi: AMD IOMMUv2 loaded and initialized
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;GPU Passthrough&lt;/h2&gt;
&lt;h3&gt;Find IOMMU Groups&lt;/h3&gt;
&lt;p&gt;Before looking at the IOMMU Groups, I want to make sure that my graphics card is visible to the OS. I run the following command:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;lspci -nnk | grep VGA
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For me, this results in 2 graphics controllers being shown:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;00:02.0 VGA compatible controller [0300]: Intel Corporation Raptor Lake-S GT1 [UHD Graphics 770] [8086:a780] (rev 04)
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation AD102 [GeForce RTX 4090] [10de:2684] (rev a1)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The first one is the integrated graphics card and the second one is the NVIDIA GPU. To list all the IOMMU groups they are part of, I&apos;ll run the following command (TheUnknownThing notes: I&apos;ve modified the command because the original one from drakeor&apos;s blog was not working for me):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;for d in /sys/kernel/iommu_groups/*/devices/*; do
  n=${d#*/iommu_groups/*}; n=${n%%/*}
  printf &apos;IOMMU Group %s &apos; &quot;$n&quot;
  lspci -nns &quot;${d##*/}&quot;
done | sort -V
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&quot;https://20051110.xyz/_astro/IOMMU.B9WXifC2_ZI6v4s.webp&quot; alt=&quot;IOMMU Groups&quot;&gt;&lt;/p&gt;
&lt;p&gt;As is shown in the figure, my RTX 4090 is in IOMMU group 12.&lt;/p&gt;
&lt;h3&gt;Loading the Correct Kernel Modules&lt;/h3&gt;
&lt;p&gt;Okay, so now that we have IOMMU all set, we need to make sure to load the correct modules for our passthrough graphics card. By default, nouveau will try to grab the graphics card when we boot.&lt;/p&gt;
&lt;p&gt;I created a new file called &lt;code&gt;/etc/modprobe.d/vfio.conf&lt;/code&gt; and added the following lines:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;blacklist nouveau
options vfio_pci ids=10de:2684,10de:22ba
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&quot;https://20051110.xyz/_astro/ID.CzJ56Oe7_226Akw.webp&quot; alt=&quot;ID&quot;&gt;&lt;/p&gt;
&lt;p&gt;Note that I got the IDs from the IOMMU Group above. I need to pass in EVERY device in that IOMMU group or it won&apos;t work! Even though I&apos;m not using audio, I still need to pass in the audio device in that group.&lt;/p&gt;
&lt;p&gt;Side note: why do we need to blacklist nouveau? Because otherwise it would grab the graphics card at boot, and we want vfio-pci to claim it instead.&lt;/p&gt;
&lt;p&gt;In &lt;code&gt;/etc/modules-load.d/modules.conf&lt;/code&gt;, we&apos;ll ensure vfio_pci is loaded at boot:&lt;/p&gt;
&lt;p&gt;Add &lt;code&gt;vfio_pci&lt;/code&gt; to the file:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;echo &quot;vfio_pci&quot; | sudo tee -a /etc/modules-load.d/modules.conf
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Now reboot your system.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Now run the following to make sure the correct module is being used:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;lspci -nnk
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Make sure &lt;code&gt;vfio-pci&lt;/code&gt; is shown in the &quot;Kernel driver in use&quot; line for your graphics card.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://20051110.xyz/_astro/VFIO.--QXy68v_1e9DcJ.webp&quot; alt=&quot;VFIO&quot;&gt;&lt;/p&gt;
&lt;h3&gt;Passing the GPU to the Guest VM&lt;/h3&gt;
&lt;p&gt;If you haven&apos;t installed the virt-manager or created your VM yet, please move on to the &lt;a href=&quot;#creating-a-vm&quot;&gt;Creating a VM&lt;/a&gt; section.&lt;/p&gt;
&lt;p&gt;So recall that the PCI address is on the left side of the &lt;code&gt;lspci&lt;/code&gt; output from earlier (run &lt;code&gt;lspci -Dnn&lt;/code&gt; to include the PCI domain prefix):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;0000:01:00.0 VGA compatible controller [0300]: NVIDIA Corporation AD102 [GeForce RTX 4090] [10de:2684] (rev a1)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We want to take that value (0000:01:00.0) and convert all the colons and dots into underscores. So for 0000:01:00.0, it will be 0000_01_00_0.&lt;/p&gt;
&lt;p&gt;Now we need to detach the PCI device from the host machine. We can do this with the following virsh command:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;virsh nodedev-detach pci_0000_01_00_0
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then we&apos;ll edit the VM we want to attach the GPU to with the following virsh command:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;virsh edit &amp;#x3C;vm_name&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Under the devices tag, we&apos;ll add the GPU. Note that address, bus, slot, and function matches the PCI address we saw earlier. You could add the following to wherever you want in the devices section, but I like to put it at the end.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;..
&amp;#x3C;devices&gt;
...
    &amp;#x3C;hostdev mode=&apos;subsystem&apos; type=&apos;pci&apos; managed=&apos;yes&apos;&gt;
        &amp;#x3C;driver name=&apos;vfio&apos;/&gt;
        &amp;#x3C;source&gt;
        &amp;#x3C;address domain=&apos;0x0000&apos; bus=&apos;0x01&apos; slot=&apos;0x00&apos; function=&apos;0x0&apos;/&gt;
        &amp;#x3C;/source&gt;
    &amp;#x3C;/hostdev&gt;
...
&amp;#x3C;/devices&gt;
...
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now save the file and reboot your VM, and you should see the NVIDIA GPU in the VM. Remember to install the NVIDIA drivers in the guest machine. For a quick test, I will run the following command in the guest machine:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;sudo apt update
sudo ubuntu-drivers autoinstall
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And run the following command to check if the NVIDIA drivers are installed correctly:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;sudo nvidia-smi
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Creating a VM&lt;/h2&gt;
&lt;h3&gt;Prerequisites: Check Hardware Virtualization Support&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;KVM requires hardware virtualization extensions (Intel VT-x or AMD-V) to be enabled in your system&apos;s BIOS/UEFI.  As we discussed earlier, I&apos;ll assume you have this enabled.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Check if the KVM modules are loaded (after installation step below):&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;lsmod | grep kvm
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You should see &lt;code&gt;kvm_intel&lt;/code&gt; or &lt;code&gt;kvm_amd&lt;/code&gt; listed.&lt;/p&gt;
&lt;h3&gt;Install Libvirt&lt;/h3&gt;
&lt;p&gt;Ensure your package list is up-to-date:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;sudo apt update
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You&apos;ll need the Libvirt daemon, the QEMU/KVM hypervisor, and management tools.&lt;/p&gt;
&lt;p&gt;The Libvirt package installation includes several components:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;qemu-kvm&lt;/code&gt;: The KVM hypervisor backend.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;libvirt-daemon-system&lt;/code&gt;: The main Libvirt daemon that runs as a system service.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;libvirt-clients&lt;/code&gt;: Command-line tools for managing Libvirt (like &lt;code&gt;virsh&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;&lt;code&gt;bridge-utils&lt;/code&gt;: Utilities for creating and managing network bridges (often needed for VM networking).&lt;/li&gt;
&lt;li&gt;&lt;code&gt;virtinst&lt;/code&gt;: Tools to create virtual machines (like &lt;code&gt;virt-install&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;&lt;code&gt;virt-manager&lt;/code&gt;: (Optional, but Recommended) A graphical user interface for managing VMs.&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;sudo apt install qemu-kvm libvirt-daemon-system libvirt-clients bridge-utils virtinst virt-manager
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This command installs all the essential components, including the graphical &lt;code&gt;virt-manager&lt;/code&gt;. If you are setting up a headless server, you can omit &lt;code&gt;virt-manager&lt;/code&gt;.&lt;/p&gt;
&lt;h3&gt;Add Your User to the &lt;code&gt;libvirt&lt;/code&gt; Group&lt;/h3&gt;
&lt;p&gt;By default, only the &lt;code&gt;root&lt;/code&gt; user can manage system-wide Libvirt virtual machines. To allow your regular user account to manage VMs without using &lt;code&gt;sudo&lt;/code&gt; for every command, add it to the &lt;code&gt;libvirt&lt;/code&gt; group.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;sudo adduser &amp;#x3C;your_username&gt; libvirt
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Replace &lt;code&gt;&amp;#x3C;your_username&gt;&lt;/code&gt; with your actual username.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Important:&lt;/strong&gt; You need to &lt;strong&gt;log out and log back in&lt;/strong&gt; for this group change to take effect. Alternatively, you can activate the group membership for your current shell session using &lt;code&gt;newgrp libvirt&lt;/code&gt; (but logging out/in is generally recommended).&lt;/p&gt;
&lt;h3&gt;Verify the Installation&lt;/h3&gt;
&lt;p&gt;Check the Libvirt daemon status by executing the following command:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;sudo systemctl status libvirtd
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It should show as &lt;code&gt;active (running)&lt;/code&gt;. If not, try starting and enabling it:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;sudo systemctl start libvirtd
sudo systemctl enable libvirtd
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And check Libvirt connection (as your user, after logging back in):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;virsh list --all
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This command should run without errors (even if it shows an empty list of VMs). If you get a permission error, double-check that you&apos;ve logged out and back in after adding your user to the &lt;code&gt;libvirt&lt;/code&gt; group.&lt;/p&gt;
&lt;h3&gt;Create a Virtual Machine&lt;/h3&gt;
&lt;p&gt;First, download the ISO image for the OS you want to install. For this tutorial, I will use Ubuntu 24.04 Server. You can download it from the &lt;a href=&quot;https://ubuntu.com/download/server&quot;&gt;official Ubuntu website&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I recommend using the &lt;code&gt;virt-manager&lt;/code&gt; GUI for creating and managing VMs, as it simplifies the process significantly. If you prefer command-line tools, you can use &lt;code&gt;virt-install&lt;/code&gt; instead; for this tutorial I will use &lt;code&gt;virt-manager&lt;/code&gt;.&lt;/p&gt;
&lt;h3&gt;Launching &lt;code&gt;virt-manager&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;To launch &lt;code&gt;virt-manager&lt;/code&gt;, run the following command in your terminal:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;virt-manager
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This will open the graphical interface for managing virtual machines. And the experience is quite straightforward, so I won&apos;t go into detail here. Just follow the prompts to create a new VM.&lt;/p&gt;
&lt;h2&gt;Accessing VM through &lt;code&gt;virsh console&lt;/code&gt;&lt;/h2&gt;
&lt;p&gt;The &lt;code&gt;virsh console&lt;/code&gt; command connects you to a &lt;em&gt;serial console&lt;/em&gt; device that libvirt exposes to the virtual machine. For this to work bidirectionally (input and output), two things need to be properly configured:&lt;/p&gt;
&lt;h3&gt;Virsh console&lt;/h3&gt;
&lt;p&gt;In the Virtual Machine&apos;s Libvirt XML, it needs to have a &lt;code&gt;&amp;#x3C;console type=&apos;pty&apos;&gt;&lt;/code&gt; or similar device defined, connected to a serial port (like &lt;code&gt;target port=&apos;0&apos;&lt;/code&gt;). You can double-check this by running &lt;code&gt;virsh dumpxml ubuntu24.04&lt;/code&gt; and looking within the &lt;code&gt;&amp;#x3C;devices&gt;&lt;/code&gt; section for a &lt;code&gt;&amp;#x3C;console&gt;&lt;/code&gt; or &lt;code&gt;&amp;#x3C;serial&gt;&lt;/code&gt; entry.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&amp;#x3C;serial type=&apos;pty&apos;&gt;
  &amp;#x3C;source path=&apos;/dev/pts/3&apos;/&gt;
  &amp;#x3C;target type=&apos;isa-serial&apos; port=&apos;0&apos;&gt;
    &amp;#x3C;model name=&apos;isa-serial&apos;/&gt;
  &amp;#x3C;/target&gt;
  &amp;#x3C;alias name=&apos;serial0&apos;/&gt;
&amp;#x3C;/serial&gt;
&amp;#x3C;console type=&apos;pty&apos; tty=&apos;/dev/pts/3&apos;&gt;
  &amp;#x3C;source path=&apos;/dev/pts/3&apos;/&gt;
  &amp;#x3C;target type=&apos;serial&apos; port=&apos;0&apos;/&gt;
  &amp;#x3C;alias name=&apos;serial0&apos;/&gt;
&amp;#x3C;/console&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If this is missing, you&apos;ll need to add it using &lt;code&gt;virsh edit ubuntu24.04&lt;/code&gt;.&lt;/p&gt;
&lt;h3&gt;Inside the Guest VM&lt;/h3&gt;
&lt;p&gt;Edit the GRUB configuration:&lt;/p&gt;
&lt;p&gt;Open the GRUB default file in a text editor:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;sudo nano /etc/default/grub
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Find the line that starts with &lt;code&gt;GRUB_CMDLINE_LINUX_DEFAULT&lt;/code&gt;. It might look something like:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;GRUB_CMDLINE_LINUX_DEFAULT=&quot;quiet splash&quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You need to add console redirection parameters: &lt;code&gt;console=tty0 console=ttyS0,115200&lt;/code&gt;.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;console=tty0&lt;/code&gt;: Ensures output also goes to the primary virtual console (if you still have one, which you likely do for initial setup).&lt;/li&gt;
&lt;li&gt;&lt;code&gt;console=ttyS0,115200&lt;/code&gt;: Directs kernel and boot messages to the first serial port (&lt;code&gt;ttyS0&lt;/code&gt;) at a baud rate of 115200. This corresponds to the &lt;code&gt;port=&apos;0&apos;&lt;/code&gt; in the libvirt XML.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The line should become something like:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;GRUB_CMDLINE_LINUX_DEFAULT=&quot;quiet splash console=tty0 console=ttyS0,115200&quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If you already have other parameters in this line, just add the &lt;code&gt;console=...&lt;/code&gt; parts inside the quotes, separated by spaces.&lt;/p&gt;
&lt;p&gt;Enable a Serial Getty Service:&lt;/p&gt;
&lt;p&gt;Ubuntu uses &lt;code&gt;systemd&lt;/code&gt; to manage services. You need to enable the service that provides a login prompt on the serial port.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;sudo systemctl enable serial-getty@ttyS0.service
sudo systemctl start serial-getty@ttyS0.service
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;enable&lt;/code&gt; command ensures it starts on boot, and &lt;code&gt;start&lt;/code&gt; attempts to start it immediately.&lt;/p&gt;
&lt;p&gt;After editing the GRUB configuration file, you must update the GRUB bootloader:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;sudo update-grub
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Also update Initramfs (necessary for console changes to take full effect early in boot):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;sudo update-initramfs -u
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And remember to reboot your VM, and it should now be accessible via the &lt;code&gt;virsh console&lt;/code&gt; command.&lt;/p&gt;</content:encoded></item><item><title>CS188 Notes 4 - Reinforcement Learning</title><link>https://20051110.xyz/blog/cs188-notes-4</link><guid isPermaLink="true">https://20051110.xyz/blog/cs188-notes-4</guid><description>Notes from UC Berkeley&apos;s CS188 course on Artificial Intelligence.</description><pubDate>Sun, 20 Apr 2025 00:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;Note:&lt;/h2&gt;
&lt;p&gt;You could view previous notes on &lt;a href=&quot;/blog/cs188-notes-3&quot;&gt;CS188: Lecture 9 - Markov Decision Processes (MDPs)&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Also note that my notes are based on the &lt;strong&gt;Spring 2025&lt;/strong&gt; version of the course, and my understanding of the material. So they MAY NOT be 100% accurate or complete. Also, THIS IS NOT A SUBSTITUTE FOR THE COURSE MATERIAL. I would only take notes on parts of the lecture that I find interesting or confusing. I will NOT be taking notes on every single detail of the lecture.&lt;/p&gt;
&lt;h2&gt;Reinforcement Learning&lt;/h2&gt;
&lt;p&gt;In this note I will go through the key concepts in the Reinforcement Learning (RL) lecture. I will also try to clarify my understanding of the Q-learning algorithm, which is a key concept in RL.&lt;/p&gt;
&lt;p&gt;First let&apos;s categorize the topics. I&apos;ll use the same categories as in the lecture slides, adding some of my own notes.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Passive Learning
&lt;ul&gt;
&lt;li&gt;Model-based&lt;/li&gt;
&lt;li&gt;Model-free&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Active Learning&lt;/li&gt;
&lt;li&gt;Approximate Q-learning&lt;/li&gt;
&lt;li&gt;Policy Gradient&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;ONE SENTENCE SUMMARY:
Passive learning involves evaluating a fixed policy (the agent just follows actions chosen for it), while active learning seeks to improve the policy through exploration (the agent chooses its own actions); model-based methods learn and use environment models, model-free methods learn directly from experience, approximate Q-learning generalizes learning to large state spaces, and policy gradient methods optimize policies directly using gradient ascent.&lt;/p&gt;
&lt;p&gt;I believe this is a good summary of the key concepts in RL. I will go through each of these categories in detail below. Also, I will use the structure of &quot;HOW? -&gt; WHY? -&gt; PROBLEM&quot; to explain each concept.&lt;/p&gt;
&lt;h2&gt;Passive RL&lt;/h2&gt;
&lt;h3&gt;Model-based&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;How?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The agent learns a model of the environment (e.g., transition probabilities, rewards) and uses this model to evaluate the policy. This is done by estimating the expected value of each action in each state based on the model.&lt;/p&gt;
&lt;p&gt;Then solve for the values as if the learned model were correct. (Trust the model.)&lt;/p&gt;
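&lt;p&gt;As a minimal sketch of that counting idea (the naming is mine, assuming experience is given as (s, a, r, s&apos;) tuples):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;from collections import defaultdict

def estimate_model(transitions):
    &quot;&quot;&quot;Estimate T-hat and R-hat from observed (s, a, r, s2) samples.&quot;&quot;&quot;
    counts = defaultdict(lambda: defaultdict(int))   # (s, a)  =&gt;  s2  =&gt;  count
    rewards = defaultdict(list)                      # (s, a, s2)  =&gt;  rewards seen
    for s, a, r, s2 in transitions:
        counts[(s, a)][s2] += 1
        rewards[(s, a, s2)].append(r)
    T = {sa: {s2: c / sum(nxt.values()) for s2, c in nxt.items()}
         for sa, nxt in counts.items()}              # normalize counts
    R = {sas: sum(rs) / len(rs) for sas, rs in rewards.items()}  # average rewards
    return T, R   # then evaluate the policy as if (T, R) were the true MDP
&lt;/code&gt;&lt;/pre&gt;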
&lt;p&gt;&lt;strong&gt;Why?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Answering &quot;why&quot; in this section is basically answering &quot;why do we need a model?&quot; The answer is that we do not have a model of the environment, so we need to learn it. This is done by estimating the transition probabilities and rewards based on the observed data.
This is a key concept in RL, as it allows the agent to learn from its experiences and improve its policy over time.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Problem&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The problem with this approach is that it requires a lot of data to learn the model accurately. If the model is not accurate, the agent may make suboptimal decisions based on the learned model.&lt;/p&gt;
&lt;h3&gt;Model-free&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;How?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;In this case there is no model to tell us &quot;what to do&quot;. We need to learn the value function directly from the data.&lt;/p&gt;
&lt;p&gt;The simplest thought is to &lt;strong&gt;average together observed sample values&lt;/strong&gt;. Every time you visit a state, write down what the sum of discounted rewards turned out to be, and average it out. What&apos;s bad about this is that it does not take the connections between states into account. For example, consider the chain A -&gt; B -&gt; C (end). How do we calculate $V$ for states $A$ and $B$? We would evaluate every single starting state separately: when evaluating A, we would NOT take the previous evaluation of B into account; we only care about the final outcome and average it. This is not a good idea, because we are wasting a lot of data. We could use the data from state B to help us evaluate state A. So we need to take the connections between states into account.&lt;/p&gt;
&lt;p&gt;So an evolution of this is to use the &lt;strong&gt;Bellman equation&lt;/strong&gt;. The idea is to use the value of the next state to help us evaluate the current state, with a Bellman equation similar to the one we used in the MDP lecture. However, it needs some modifications.&lt;/p&gt;
&lt;p&gt;The ORIGINAL Bellman equation is:&lt;/p&gt;
&lt;p&gt;$$
V(s) = \sum_{s&apos;} T(s, a, s&apos;)[R(s, a, s&apos;) + \gamma V(s&apos;)]
$$&lt;/p&gt;
&lt;p&gt;And its ADAPTED version is:&lt;/p&gt;
&lt;p&gt;$$
V(s) = \frac{1}{n}\sum_{s&apos;} \mathrm{sample}_{s&apos;} \quad \text{where} \ \mathrm{sample}_{s&apos;} = R(s, a, s&apos;) + \gamma V(s&apos;)
$$&lt;/p&gt;
&lt;p&gt;What&apos;s improved over the naive version is that we are utilizing the existing estimates when evaluating. However, there is still a problem:
we are waiting until the end of an episode to update values, since we are averaging over all samples. We could &lt;strong&gt;update values more frequently&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;So this is where the &lt;strong&gt;Temporal Difference (TD) Learning&lt;/strong&gt; comes in. The idea is to update the value of the current state based on the value of the next state, without waiting for the end of the episode.
Because updates happen after every transition, states and transitions that are experienced more frequently will have a greater influence on the learned values over time.&lt;/p&gt;
&lt;p&gt;The specific type of TD learning shown here is for &lt;strong&gt;policy evaluation&lt;/strong&gt;. This means we have a &lt;em&gt;fixed policy&lt;/em&gt; &lt;code&gt;π&lt;/code&gt; (a fixed way of choosing actions in each state), and we want to figure out the value function &lt;code&gt;Vπ(s)&lt;/code&gt; for that policy. We are &lt;em&gt;not&lt;/em&gt; trying to find the &lt;em&gt;best&lt;/em&gt; policy yet, just evaluating the current one.&lt;/p&gt;
&lt;p&gt;In TD, we have samples, and the update rule.&lt;/p&gt;
&lt;p&gt;$$
\mathrm{sample} = R(s, \pi(s), s&apos;) + \gamma V^\pi(s&apos;)
$$&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;sample&lt;/code&gt; (or TD Target) is: &quot;the reward I just got, plus the discounted value of where I landed (according to my current beliefs)&quot;.&lt;/p&gt;
&lt;p&gt;The update rule is:&lt;/p&gt;
&lt;p&gt;$$
V^\pi(s) \leftarrow (1 - \alpha)V^\pi(s) + \alpha \ \mathrm{sample}
$$&lt;/p&gt;
&lt;p&gt;We calculate the &lt;strong&gt;TD Error&lt;/strong&gt;: &lt;code&gt;sample - Vπ(s)&lt;/code&gt;. This error represents the difference between our target (&lt;code&gt;sample&lt;/code&gt;) and our current estimate (&lt;code&gt;Vπ(s)&lt;/code&gt;). We then adjust our current estimate &lt;code&gt;Vπ(s)&lt;/code&gt; by moving it a small step (&lt;code&gt;α&lt;/code&gt;) in the direction of that error.&lt;/p&gt;
&lt;p&gt;This shows that TD learning is essentially maintaining a running average of the TD targets it observes for each state.
It gradually &quot;forgets&quot; older, potentially less accurate information, which is desirable because the initial value estimates might be far off.
Using a learning rate &lt;code&gt;α&lt;/code&gt; that decreases over time can help the value estimates converge more stably.&lt;/p&gt;
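&lt;p&gt;Here is a minimal sketch of this update in code (my own illustration; the transition &lt;code&gt;(s, r, s_next)&lt;/code&gt; is assumed to come from following the fixed policy):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;from collections import defaultdict

alpha, gamma = 0.1, 0.9   # learning rate and discount
V = defaultdict(float)    # V^pi(s), initialized to 0.0

def td_update(s, r, s_next):
    # One TD(0) update after observing (s, r, s&apos;) under the fixed policy.
    sample = r + gamma * V[s_next]  # the TD target
    td_error = sample - V[s]        # target minus current estimate
    V[s] += alpha * td_error        # move a small step toward the target

# Example: we were in A, received reward -1, and landed in B.
td_update(&quot;A&quot;, -1.0, &quot;B&quot;)
&lt;/code&gt;&lt;/pre&gt;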
&lt;p&gt;However, there are still problems. As mentioned in the previous lecture, what really GUIDES the agent is the $Q$-values. So we need to learn $Q$-values instead of $V$-values.&lt;/p&gt;
&lt;h2&gt;Q-Learning (Active RL)&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;How?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Q-Learning is a model-free reinforcement learning algorithm used to learn the optimal action-value function (Q-values). Unlike TD learning which focuses on state values, Q-learning focuses on (state, action) pairs.&lt;/p&gt;
&lt;p&gt;The Q-learning update rule is:&lt;/p&gt;
&lt;p&gt;$$
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a&apos;} Q(s&apos;, a&apos;) - Q(s, a) \right]
$$&lt;/p&gt;
&lt;p&gt;Similar to above, where: $Q(s, a)$ is the current estimate of the Q-value for state $s$ and action $a$, $\alpha$ is the learning rate, and the term $r + \gamma \max_{a&apos;} Q(s&apos;, a&apos;) - Q(s, a)$ is the TD error. You might wonder &quot;why do we need to use the max operator here?&quot; The answer is that we are trying to learn the optimal Q-value for each state-action pair. The max operator allows us to select the best action in the next state $s&apos;$ based on the current Q-values.&lt;/p&gt;
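&lt;p&gt;A minimal sketch of the tabular update (my own illustration; the action set is an assumption):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;from collections import defaultdict

alpha, gamma = 0.1, 0.9
ACTIONS = [&quot;up&quot;, &quot;down&quot;, &quot;left&quot;, &quot;right&quot;]  # assumed action set
Q = defaultdict(float)                     # Q[(s, a)], initialized to 0.0

def q_update(s, a, r, s_next):
    # max over a&apos; of Q(s&apos;, a&apos;): the best we believe we can do from s&apos;
    best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
    td_error = r + gamma * best_next - Q[(s, a)]
    Q[(s, a)] += alpha * td_error
&lt;/code&gt;&lt;/pre&gt;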
&lt;p&gt;&lt;strong&gt;Why?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Q-learning allows us to select the best action in each state, unlike TD learning which only evaluates a fixed policy. It&apos;s called &quot;off-policy&quot; because it learns the optimal policy regardless of how the agent is currently behaving (exploration). The agent can follow any exploratory policy during training while still learning the greedy optimal policy.&lt;/p&gt;
&lt;p&gt;With Q-values, we can derive our policy directly:&lt;/p&gt;
&lt;p&gt;$$
\pi(s) = \arg\max_a Q(s, a)
$$&lt;/p&gt;
&lt;p&gt;This means choosing the action that maximizes the expected future rewards for each state.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Problem&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The main challenge in Q-learning is balancing exploration and exploitation, i.e., balancing &quot;trying new actions to discover potentially better rewards&quot; against &quot;using known Q-values to maximize rewards based on past experience&quot;.&lt;/p&gt;
&lt;p&gt;This is typically addressed using an &lt;strong&gt;$\epsilon$-greedy policy&lt;/strong&gt; or &lt;strong&gt;exploration functions&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;The $\epsilon$-greedy policy works as follows:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;With probability $1-\epsilon$, choose the best action (exploit)&lt;/li&gt;
&lt;li&gt;With probability $\epsilon$, choose a random action (explore)&lt;/li&gt;
&lt;li&gt;Gradually decrease $\epsilon$ over time to favor exploitation as learning progresses&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;And the exploration function works as follows:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Define an &quot;exploration bonus&quot; based on the uncertainty of Q-values. Let $n(s, a)$ be the number of times action $a$ has been taken in state $s$. The exploration bonus can be defined as $\frac{1}{n(s, a)}$.&lt;/li&gt;
&lt;li&gt;When choosing actions, add the exploration bonus to the Q-value: $Q(s, a) + \frac{1}{n(s, a)}$.&lt;/li&gt;
&lt;li&gt;This encourages the agent to explore less frequently visited actions, balancing exploration and exploitation.&lt;/li&gt;
&lt;li&gt;Gradually decrease the exploration bonus over time to favor exploitation as learning progresses&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This approach can be more efficient than $\epsilon$-greedy, as it focuses exploration on less certain actions rather than uniformly random actions. So it IS used in practice.&lt;/p&gt;
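&lt;p&gt;To make both strategies concrete, here is a minimal sketch (my own illustration; I add 1 to the visit count so we never divide by zero):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import random
from collections import defaultdict

ACTIONS = [&quot;up&quot;, &quot;down&quot;, &quot;left&quot;, &quot;right&quot;]  # assumed action set
Q = defaultdict(float)   # learned Q-values
N = defaultdict(int)     # visit counts n(s, a)
epsilon = 0.1

def epsilon_greedy(s):
    # With probability epsilon explore randomly, otherwise exploit.
    if random.random() &amp;#x3C; epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(s, a)])

def exploration_function(s):
    # Optimistic value: Q plus a bonus that shrinks as (s, a) is visited more.
    return max(ACTIONS, key=lambda a: Q[(s, a)] + 1.0 / (N[(s, a)] + 1))
&lt;/code&gt;&lt;/pre&gt;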
&lt;h3&gt;Experience Replay&lt;/h3&gt;
&lt;p&gt;Experience replay is an optimization technique used in reinforcement learning, particularly in deep Q-learning, so I&apos;ll add it as a subtopic here.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;How?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Experience replay enhances Q-learning by storing the agent&apos;s experiences (transitions) in a replay buffer. Instead of updating Q-values using only the most recent experience, the agent &lt;strong&gt;stores the recent experience to buffer, and randomly samples batches of past experiences&lt;/strong&gt; from this buffer for training.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Why?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;In plain Q-learning, the agent learns from its experiences sequentially. Consecutive experiences are often similar and correlated, which makes learning inefficient. By using experience replay, the agent can break these correlations and learn from a more diverse set of experiences: random sampling produces more independent training examples.&lt;/p&gt;
&lt;p&gt;This is especially important in deep reinforcement learning where neural networks are used to approximate Q-values.&lt;/p&gt;
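&lt;p&gt;A minimal replay-buffer sketch (my own illustration, reusing the &lt;code&gt;q_update&lt;/code&gt; sketch from above):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import random
from collections import deque

buffer = deque(maxlen=10000)  # old transitions fall off the end

def store(s, a, r, s_next):
    buffer.append((s, a, r, s_next))

def replay(batch_size=32):
    # Uniform random sampling breaks the correlation between consecutive steps.
    for s, a, r, s_next in random.sample(buffer, batch_size):
        q_update(s, a, r, s_next)
&lt;/code&gt;&lt;/pre&gt;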
&lt;p&gt;Even with all the problems addressed above, we still cannot put the Q-learning algorithm into practice on real problems. The issue is that the state space is too large: we cannot store a Q-value for every single state-action pair. So we need function approximation to generalize across similar states, and this is where &lt;strong&gt;Approximate Q-Learning&lt;/strong&gt; comes in.&lt;/p&gt;
&lt;h2&gt;Hold on a second&lt;/h2&gt;
&lt;p&gt;But before that, I do believe I need to clarify some points here.&lt;/p&gt;
&lt;p&gt;You might think: &quot;Why is Q-learning discussed under active learning? Q-learning could be used in passive learning, while TD could also be used in active learning, is that correct?&quot;&lt;/p&gt;
&lt;p&gt;Yes, you are right. In CS188 (and many RL courses), the algorithms are typically presented in this order:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;TD Learning is introduced first as a way to learn value functions for passive learning.&lt;/li&gt;
&lt;li&gt;Q-Learning is introduced next as a way to extend these ideas to active learning.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This pedagogical approach sometimes creates the impression that these algorithms are strictly tied to their respective learning categories, but they&apos;re more flexible than that.&lt;/p&gt;
&lt;p&gt;The main difference is that TD learning (as typically presented) learns state values V(s) while Q-learning learns state-action values Q(s,a). Q-values naturally lend themselves to policy improvement (just take argmax), which is why Q-learning is often presented in the active learning context.&lt;/p&gt;
&lt;h2&gt;Approximate Q-Learning&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;How?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;In environments with large or continuous state spaces, it&apos;s impractical to maintain a separate Q-value for each state-action pair. Approximate Q-learning uses function approximation to generalize across similar states.&lt;/p&gt;
&lt;p&gt;A simple solution is to recall the &quot;feature function&quot; we discussed in the game tree lecture. We describe a state using a vector of features (properties) $f_1, f_2, \ldots, f_n$ and learn a linear function of these features:&lt;/p&gt;
&lt;p&gt;$$
Q(s, a) = w_1 f_1(s, a) + w_2 f_2(s, a) + \ldots + w_n f_n(s, a)
$$&lt;/p&gt;
&lt;p&gt;Where $w_1, w_2, \ldots, w_n$ are weights that we learn through experience.
This is a linear function approximation. We can also use non-linear function approximators like neural networks, but the basic idea is the same: learn a function that maps states (and actions) to Q-values.&lt;/p&gt;
&lt;p&gt;And you might wonder: &quot;How do we learn the weights?&quot; The answer is that we can use the same Q-learning update rule, but instead of updating the Q-value directly, we update the weights. The trick is a simple notion: &quot;if something unexpectedly bad happens, blame the features that were on: disprefer all states with that state&apos;s features&quot;.&lt;/p&gt;
&lt;p&gt;So the update rule becomes:&lt;/p&gt;
&lt;p&gt;$$
w_i \leftarrow w_i + \alpha \left[ r + \gamma \max_{a&apos;} Q(s&apos;, a&apos;) - Q(s, a) \right] f_i(s, a)
$$&lt;/p&gt;
&lt;p&gt;Where $f_i(s, a)$ is the value of the $i$-th feature for state $s$ and action $a$. This means we are updating the weights based on the features that were present in the current state-action pair.&lt;/p&gt;
&lt;p&gt;The update rule of $Q$ is still the same:&lt;/p&gt;
&lt;p&gt;$$
Q(s, a) \leftarrow Q(s, a) + \alpha \ \mathrm{Difference}
$$&lt;/p&gt;
&lt;p&gt;Where $\mathrm{Difference} = r + \gamma \max_{a&apos;} Q(s&apos;, a&apos;) - Q(s, a)$&lt;/p&gt;
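&lt;p&gt;In code, the weight update is a one-liner per feature (my own sketch; the feature values are made up):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;alpha, gamma = 0.01, 0.9

def q_value(w, f):
    # Linear approximation: Q(s, a) = sum_i w_i * f_i(s, a)
    return sum(wi * fi for wi, fi in zip(w, f))

def approx_q_update(w, f_sa, r, best_next_q):
    # difference = [r + gamma * max_a&apos; Q(s&apos;, a&apos;)] - Q(s, a)
    difference = r + gamma * best_next_q - q_value(w, f_sa)
    # Each weight moves in proportion to how &quot;on&quot; its feature was.
    return [wi + alpha * difference * fi for wi, fi in zip(w, f_sa)]

w = [0.0, 0.0]
w = approx_q_update(w, f_sa=[1.0, 0.5], r=-1.0, best_next_q=0.0)
&lt;/code&gt;&lt;/pre&gt;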
&lt;p&gt;&lt;strong&gt;Why?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Approximate Q-learning allows reinforcement learning to scale to complex environments with huge state spaces (like Atari games, robotics, etc.) where tabular methods would be impossible.&lt;/p&gt;
&lt;p&gt;It enables generalization across similar states, so learning in one state can improve performance in similar states, even those the agent hasn&apos;t encountered yet.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Problem&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Approximate Q-learning faces these two challenges:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Forgetting - learning in one region of the state space can undo learning in another region&lt;/li&gt;
&lt;li&gt;Feature selection - choosing the right representation for states is critical for good generalization&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;Policy Gradient Methods&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;How?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Instead of learning a value function and deriving a policy from it, policy gradient methods directly parameterize the policy itself. That is, the agent&apos;s behavior is described by a function $\pi(a|s; \theta)$, where $\theta$ are the parameters (often the weights of a neural network). The goal is to adjust $\theta$ so that the expected return (the sum of rewards) is maximized.&lt;/p&gt;
&lt;p&gt;The core idea is to use gradient ascent: we estimate how changing the parameters would affect the expected return, and then nudge the parameters in that direction. The update rule looks like this:&lt;/p&gt;
&lt;p&gt;$$
\theta \leftarrow \theta + \alpha \nabla_\theta J(\theta)
$$&lt;/p&gt;
&lt;p&gt;where $J(\theta)$ is the expected return under the current policy.&lt;/p&gt;
&lt;p&gt;But how do we compute this gradient? The answer is the &lt;strong&gt;policy gradient theorem&lt;/strong&gt;, which tells us that the gradient of the expected return can be estimated using samples from the environment:&lt;/p&gt;
&lt;p&gt;$$
\nabla_\theta J(\theta) \approx \mathbb{E}_{\pi_\theta} \left[ \nabla_\theta \log \pi_\theta(a|s) \cdot G \right]
$$&lt;/p&gt;
&lt;p&gt;Here, $G$ is the return (sum of discounted rewards) following the action $a$ in state $s$. In practice, we run episodes, collect rewards, and use these samples to estimate the gradient.&lt;/p&gt;
&lt;p&gt;This approach is called &lt;strong&gt;REINFORCE&lt;/strong&gt;, the simplest policy gradient algorithm. Each time the agent takes an action, it computes the gradient of the log-probability of that action, multiplies it by the return, and uses that as the update direction.&lt;/p&gt;
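&lt;p&gt;Here is a minimal REINFORCE sketch for a tabular softmax policy (my own illustration; for a softmax policy, $\nabla_\theta \log \pi_\theta(a|s)$ works out to $1\{b = a\} - \pi(b|s)$ for each parameter $\theta_{s,b}$):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import math
from collections import defaultdict

ACTIONS = [&quot;left&quot;, &quot;right&quot;]  # assumed action set
theta = defaultdict(float)   # policy parameters theta[(s, a)]
alpha, gamma = 0.01, 0.99

def pi(s):
    # Softmax policy: pi(a|s) is proportional to exp(theta[s, a]).
    exps = {a: math.exp(theta[(s, a)]) for a in ACTIONS}
    z = sum(exps.values())
    return {a: e / z for a, e in exps.items()}

def reinforce_update(episode):
    # episode is a list of (s, a, r); G is the return following each step.
    G = 0.0
    for s, a, r in reversed(episode):
        G = r + gamma * G
        probs = pi(s)
        for b in ACTIONS:
            # grad of log pi(a|s) w.r.t. theta[(s, b)]
            grad = (1.0 if b == a else 0.0) - probs[b]
            theta[(s, b)] += alpha * G * grad
&lt;/code&gt;&lt;/pre&gt;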
&lt;p&gt;&lt;strong&gt;Why?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Policy gradient methods are powerful for several reasons. First, they allow us to optimize the policy directly, which is what we ultimately care about. This is especially useful in environments with continuous or high-dimensional action spaces, where value-based methods struggle. Policy gradients can also learn stochastic policies, which can be optimal in environments with inherent randomness or partial observability.&lt;/p&gt;
&lt;p&gt;Another advantage is that policy gradient methods can be combined with function approximation (e.g., neural networks) to handle very large or continuous state spaces. This is the foundation of modern deep reinforcement learning algorithms.&lt;/p&gt;</content:encoded><h:img src="undefined"/><enclosure url="undefined"/></item><item><title>CS188 Notes 2 - Markov Decision Processes (MDPs)</title><link>https://20051110.xyz/blog/cs188-notes-2</link><guid isPermaLink="true">https://20051110.xyz/blog/cs188-notes-2</guid><description>Notes from UC Berkeley&apos;s CS188 course on Artificial Intelligence.</description><pubDate>Sat, 19 Apr 2025 00:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;Note:&lt;/h2&gt;
&lt;p&gt;You could view previous notes on &lt;a href=&quot;/blog/cs188-notes-1&quot;&gt;CS188: Lecture 4 - Constraint Satisfaction Problems (CSPs)&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Also note that my notes are based on the &lt;strong&gt;Spring 2025&lt;/strong&gt; version of the course, and my understanding of the material. So they MAY NOT be 100% accurate or complete. Also, THIS IS NOT A SUBSTITUTE FOR THE COURSE MATERIAL. I would only take notes on parts of the lecture that I find interesting or confusing. I will NOT be taking notes on every single detail of the lecture.&lt;/p&gt;
&lt;h2&gt;CS188: Lecture 8 - Markov Decision Processes (MDPs)&lt;/h2&gt;
&lt;h3&gt;Markov Decision Processes (MDPs)&lt;/h3&gt;
&lt;p&gt;A Markov Decision Process (MDP) represents sequential decision-making in environments where actions produce stochastic (random) outcomes, and an agent&apos;s goal is to maximize its cumulative reward over time. In an MDP, the agent faces uncertainty: it cannot always predict the result of its actions, but it must still try to act optimally.&lt;/p&gt;
&lt;p&gt;The key components of an MDP are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;States&lt;/strong&gt; $S$: Possible situations the agent can find itself in.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Actions&lt;/strong&gt; $A$: The set of possible moves or decisions the agent can make in each state.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Transition Function&lt;/strong&gt; $T(s, a, s&apos;)$: The probability that action $a$ in state $s$ leads to state $s&apos;$.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Reward Function&lt;/strong&gt; $R(s, a, s&apos;)$: The reward received after transitioning from $s$ to $s&apos;$ via action $a$.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Discount Factor&lt;/strong&gt; $\gamma$: How much the agent values future rewards compared to immediate rewards.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We obtain the value function $V(s)$ for each &lt;strong&gt;state&lt;/strong&gt; $s$, which represents the expected cumulative reward starting from state $s$ and following the optimal policy thereafter, and the action-value function $Q(s, a)$ for each &lt;strong&gt;state-action pair&lt;/strong&gt; $(s,a)$, which represents the expected cumulative reward starting from state $s$, taking action $a$, and then following the optimal policy thereafter.&lt;/p&gt;
&lt;p&gt;You might think &quot;why not just use the value function $V(s)$?&quot; The reason is actions are easier to select from $Q$-values than values! You will see this in the following part of this lecture.&lt;/p&gt;
&lt;p&gt;The &lt;strong&gt;goal&lt;/strong&gt; is to find an optimal &lt;strong&gt;policy&lt;/strong&gt; $\pi^*$, which is a mapping from states to actions ($\pi(s) = a$), maximizing the expected cumulative (usually discounted) reward from any state. In this sense, an MDP defines both the &quot;game rules&quot; and what it means to &quot;play well&quot; in that environment.&lt;/p&gt;
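&lt;p&gt;As a concrete picture, here is one way to write an MDP down as data (my own sketch, loosely based on the lecture&apos;s racing-car example; all the numbers are made up):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# T[(s, a)] is a list of (probability, s_next, reward) outcomes.
T = {
    (&quot;cool&quot;, &quot;fast&quot;): [(0.5, &quot;cool&quot;, 2.0), (0.5, &quot;warm&quot;, 2.0)],
    (&quot;cool&quot;, &quot;slow&quot;): [(1.0, &quot;cool&quot;, 1.0)],
    (&quot;warm&quot;, &quot;slow&quot;): [(0.5, &quot;cool&quot;, 1.0), (0.5, &quot;warm&quot;, 1.0)],
    (&quot;warm&quot;, &quot;fast&quot;): [(1.0, &quot;overheated&quot;, -10.0)],
}
gamma = 0.9

def q_value(V, s, a):
    # Q(s, a) = sum over s&apos; of T(s, a, s&apos;) * [R(s, a, s&apos;) + gamma * V(s&apos;)]
    return sum(p * (r + gamma * V.get(s2, 0.0)) for p, s2, r in T[(s, a)])
&lt;/code&gt;&lt;/pre&gt;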
&lt;h4&gt;Stationary Preferences&lt;/h4&gt;
&lt;p&gt;The assumption of &lt;strong&gt;stationary preferences&lt;/strong&gt; means that your relative preference between two future sequences of rewards doesn&apos;t change just because you receive the same immediate reward before both. This property imposes a recursive structure on the utility function for reward sequences.&lt;/p&gt;
&lt;p&gt;Formally, the utility $U$ of a sequence $[r_0, r_1, r_2, ...]$ must satisfy:&lt;/p&gt;
&lt;p&gt;$$
U([r_0, r_1, r_2, ...]) = f(r_0, U([r_1, r_2, ...]))
$$&lt;/p&gt;
&lt;p&gt;where $f$ is some consistent function. If we assume $f$ is linear, this recursion unrolls to only two possible forms for the utility function (after appropriate normalization):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Additive Utility:&lt;/strong&gt; $U = r_0 + r_1 + r_2 + \cdots$ (corresponds to $\gamma = 1$)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Discounted Utility:&lt;/strong&gt; $U = r_0 + \gamma r_1 + \gamma^2 r_2 + \cdots$ (where $0 \leq \gamma &amp;#x3C; 1$)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The discounted utility is the standard in MDPs, as it ensures convergence for infinite horizons and reflects the diminishing importance of rewards further in the future.&lt;/p&gt;
&lt;h4&gt;Why MDPs?&lt;/h4&gt;
&lt;p&gt;MDPs are particularly suitable when:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The environment is &lt;strong&gt;stochastic&lt;/strong&gt;: the same action in the same state can yield different results.&lt;/li&gt;
&lt;li&gt;Rewards may be &lt;strong&gt;delayed&lt;/strong&gt;: the value of an action now may be realized only after multiple future steps.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Unlike simple search algorithms (e.g., greedy or expectimax), MDPs explicitly model both uncertainty (via $T$) and the accumulation of rewards over time (via $R$ and $\gamma$). While solving an MDP requires knowledge of $T$ and $R$, reinforcement learning (RL) methods learn optimal policies directly from experience, using the MDP framework as a theoretical foundation.&lt;/p&gt;
&lt;h4&gt;MDPs vs Expectimax&lt;/h4&gt;
&lt;p&gt;Both MDPs and expectimax handle uncertainty and aim for maximum expected utility. Expectimax, however, is typically used to compute the expected value of actions from a specific starting point, often with a tree structure and a finite horizon. MDPs, in contrast, compute a &lt;strong&gt;policy&lt;/strong&gt;—the best action for every possible state—naturally handling cycles and infinite (discounted) horizons.&lt;/p&gt;
&lt;p&gt;In short: expectimax is a limited lookahead from the current state; solving an MDP finds a full strategy for all states.&lt;/p&gt;
&lt;h4&gt;MDPs and Multi-Agent Games&lt;/h4&gt;
&lt;p&gt;Standard MDPs are designed for a &lt;strong&gt;single agent&lt;/strong&gt; interacting with a stochastic environment. They do not directly accommodate multiple strategic agents whose actions affect each other&apos;s outcomes. Multi-agent situations typically require other formalisms, such as stochastic games or Markov games.&lt;/p&gt;
&lt;h4&gt;MDPs vs Greedy Search&lt;/h4&gt;
&lt;p&gt;Greedy algorithms make decisions based solely on immediate rewards, without considering long-term consequences. MDPs, by calculating the expected sum of (possibly discounted) future rewards, are inherently long-sighted. Optimizing for the value function $V^*(s)$ or the action-value function $Q^*(s,a)$, MDPs look ahead through the space of future possibilities, not just the next step.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;strong&gt;ONE SENTENCE SUMMARY:&lt;/strong&gt;&lt;br&gt;
Markov Decision Processes are mathematical models for sequential decision-making under uncertainty, aiming to find policies that maximize expected (possibly discounted) cumulative reward, and forming the theoretical foundation for reinforcement learning.&lt;/p&gt;</content:encoded><h:img src="undefined"/><enclosure url="undefined"/></item><item><title>CS188 Notes 3 - Markov Decision Processes (MDPs) II</title><link>https://20051110.xyz/blog/cs188-notes-3</link><guid isPermaLink="true">https://20051110.xyz/blog/cs188-notes-3</guid><description>Notes from UC Berkeley&apos;s CS188 course on Artificial Intelligence.</description><pubDate>Sat, 19 Apr 2025 00:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;Note:&lt;/h2&gt;
&lt;p&gt;You could view previous notes on &lt;a href=&quot;/blog/cs188-notes-2&quot;&gt;CS188: Lecture 8 - Markov Decision Processes (MDPs)&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Also note that my notes are based on the &lt;strong&gt;Spring 2025&lt;/strong&gt; version of the course, and my understanding of the material. So they MAY NOT be 100% accurate or complete. Also, THIS IS NOT A SUBSTITUTE FOR THE COURSE MATERIAL. I would only take notes on parts of the lecture that I find interesting or confusing. I will NOT be taking notes on every single detail of the lecture.&lt;/p&gt;
&lt;h2&gt;Markov Decision Processes (MDPs)&lt;/h2&gt;
&lt;p&gt;After the previous lecture, I realized I had some misunderstandings about the Policy Iteration algorithm, especially when compared to Value Iteration. So here, I&apos;ll clarify my understanding of these two core approaches for solving MDPs.&lt;/p&gt;
&lt;h3&gt;Why use a &quot;fixed policy&quot; in Policy Iteration?&lt;/h3&gt;
&lt;p&gt;It can be confusing at first that Policy Iteration evaluates a fixed policy. You might ask: does using a fixed, possibly non-optimal policy ever lead to the optimal one?&lt;/p&gt;
&lt;p&gt;The answer is that evaluating a fixed policy is an essential &lt;em&gt;intermediate&lt;/em&gt; step towards finding the optimal policy. We might &quot;evaluate&quot; a policy that is not optimal, but doing so yields valuable information about the expected future rewards of that policy, so what we finally act on is the optimal policy.&lt;/p&gt;
&lt;p&gt;In Policy Iteration, we loop between two key phases:&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;Step 1: Policy Evaluation&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;We begin with an initial policy $\pi$ (random, greedy, whatever). For this $\pi$, we compute the exact utility $V^{\pi}(s)$ for each state $s$ under the assumption that we &lt;em&gt;always&lt;/em&gt; follow $\pi$. The Bellman equation for this is:&lt;/p&gt;
&lt;p&gt;$$
V^{\pi}(s) = \sum_{s&apos;} T(s, \pi(s), s&apos;) [ R(s, \pi(s), s&apos;) + \gamma V^{\pi}(s&apos;) ]
$$&lt;/p&gt;
&lt;p&gt;This evaluates the policy&apos;s long-term value at every state, given that policy.&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;Step 2: Policy Improvement&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Now that we have $V^{\pi}$, we look at each state $s$ and ask: &quot;Is there an action $a$ that would improve my expected future rewards if I took it immediately, then continued with $\pi$?&quot;&lt;/p&gt;
&lt;p&gt;For each state, we consider:&lt;/p&gt;
&lt;p&gt;$$
Q^{\pi}(s, a) = \sum_{s&apos;} T(s, a, s&apos;) [ R(s, a, s&apos;) + \gamma V^{\pi}(s&apos;) ]
$$&lt;/p&gt;
&lt;p&gt;We then build a new policy by setting:&lt;/p&gt;
&lt;p&gt;$$
\pi_{\text{new}}(s) = \arg\max_a Q^{\pi}(s, a)
$$&lt;/p&gt;
&lt;p&gt;That is, for each state, choose the action that looks best based on the values under the old policy. This is the &lt;em&gt;policy improvement&lt;/em&gt; step.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Repeat:&lt;/strong&gt; We now re-evaluate the new policy $\pi_{\text{new}}$, and the process continues until the policy stops changing. This guarantees convergence to the optimal policy $\pi^*$ and optimal value function $V^*$. Evaluating a fixed policy at each stage is essential for knowing both how good our current strategy is and how to improve it.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;What is the difference between Policy Iteration and Value Iteration?&lt;/h2&gt;
&lt;p&gt;In short:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Value Iteration&lt;/strong&gt; is always searching for the best action at each step, directly refining the estimate of the &lt;em&gt;optimal&lt;/em&gt; value function.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Policy Evaluation&lt;/strong&gt; (as used in Policy Iteration) simply calculates the consequences of following a &lt;em&gt;predefined&lt;/em&gt; plan $\pi$, without improvement during evaluation itself. Policy improvement occurs as a separate step.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Let&apos;s break down the differences in detail.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Value Iteration Equation:&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;$$
V_{k+1}(s) = \max_{a} \sum_{s&apos;} T(s, a, s&apos;) [ R(s, a, s&apos;) + \gamma V_{k}(s&apos;) ]
$$&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Goal:&lt;/strong&gt; Directly compute the optimal value function $V^*(s)$.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;How:&lt;/strong&gt; Each iteration, for each state $s$, considers all possible actions $a$. For each action, it calculates the expected value (reward + discounted future value), then takes the maximum over all actions.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Policy:&lt;/strong&gt; Implicit. The $\max$ operation is finding the best action, and the final optimal policy $\pi^*$ is extracted after $V_k$ converges.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;What it computes:&lt;/strong&gt; Iteratively refines the best possible long-term value from each state.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;strong&gt;Policy Evaluation Equation (for a fixed policy $\pi$):&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;$$
V^{\pi}_{k+1}(s) = \sum_{s&apos;} T(s, \pi(s), s&apos;) [ R(s, \pi(s), s&apos;) + \gamma V^{\pi}_k(s&apos;) ]
$$&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Goal:&lt;/strong&gt; Compute the value function $V^\pi(s)$ for the given, fixed policy $\pi$ (which may not be optimal).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;How:&lt;/strong&gt; Each iteration, for each state $s$, uses only the action prescribed by $\pi$: $a = \pi(s)$. Calculates the expected value (reward + discounted future value) following this fixed action. There is no $\max$ because the action is predetermined by $\pi$.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Policy:&lt;/strong&gt; Explicit and fixed throughout evaluation.&lt;/li&gt;
&lt;/ul&gt;
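&lt;p&gt;The contrast is easiest to see in code. Here is a minimal sketch (my own illustration, using the same &lt;code&gt;(probability, s_next, reward)&lt;/code&gt; transition encoding as in my previous MDP note): the only difference between the two steps is the &lt;code&gt;max&lt;/code&gt; versus the fixed $\pi(s)$.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;gamma = 0.9

def lookahead(T, V, s, a):
    # One-step expectation: sum over s&apos; of T(s,a,s&apos;) * [R(s,a,s&apos;) + gamma * V(s&apos;)]
    return sum(p * (r + gamma * V[s2]) for p, s2, r in T[(s, a)])

def value_iteration_step(T, V, states, actions):
    # The max over actions is what makes this compute OPTIMAL values.
    return {s: max(lookahead(T, V, s, a) for a in actions) for s in states}

def policy_evaluation_step(T, V, states, pi):
    # No max: the fixed policy pi dictates the action in every state.
    return {s: lookahead(T, V, s, pi[s]) for s in states}
&lt;/code&gt;&lt;/pre&gt;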
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;Comparison Table&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;| Feature           | Value Iteration (VI)                        | Policy Evaluation (PE for fixed $\pi$)                    |
| :---------------- | :------------------------------------------ | :-------------------------------------------------------- |
| Equation Core     | $\max_a \sum T(s,a,s&apos;)[R + \gamma V_k(s&apos;)]$ | $\sum T(s, \pi(s), s&apos;)[R + \gamma V^\pi_k(s&apos;)]$           |
| $\max_a$ Present? | &lt;strong&gt;Yes&lt;/strong&gt;                                     | &lt;strong&gt;No&lt;/strong&gt;                                                    |
| Action Choice     | Considers all $a$, picks the best           | Only the action $\pi(s)$ given by policy                  |
| Policy Role       | Policy is implicit (via $\max$)             | Policy is explicit and fixed                              |
| Goal              | Compute optimal value function $V^*$        | Compute value function $V^\pi$ for the given policy $\pi$ |
| Used Where?       | Standalone algorithm to find $V^*$          | Subroutine within Policy Iteration                        |
| Convergence       | $V_k$ converges to $V^*$                    | $V^\pi_k$ converges to $V^\pi$                            |&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;Does Policy Evaluation converge after more iterations than Value Iteration?&lt;/h3&gt;
&lt;p&gt;It&apos;s tempting to think that Policy Evaluation takes more iterations to converge, since it does not optimize at every step, but in practice, Policy Iteration often &lt;strong&gt;converges in fewer&lt;/strong&gt; outer iterations (policy updates) than Value Iteration, though the work per iteration can differ.&lt;/p&gt;
&lt;p&gt;The real power of Policy Iteration comes after Policy Evaluation. Once we have $V^\pi$ for our current policy, we can often make a large jump to a better policy by improving all states at once:&lt;/p&gt;
&lt;p&gt;$$
\pi_{\text{new}}(s) = \arg\max_a \sum_{s&apos;} T(s, a, s&apos;) [ R(s, a, s&apos;) + \gamma V^\pi(s&apos;) ]
$$&lt;/p&gt;
&lt;p&gt;We only repeat this process until the policy stops changing, which often happens quickly and requires fewer overall iterations than Value Iteration.&lt;/p&gt;</content:encoded><h:img src="undefined"/><enclosure url="undefined"/></item><item><title>CS188 Notes 1 - Constraint Satisfaction Problems (CSPs)</title><link>https://20051110.xyz/blog/cs188-notes-1</link><guid isPermaLink="true">https://20051110.xyz/blog/cs188-notes-1</guid><description>Notes from UC Berkeley&apos;s CS188 course on Artificial Intelligence.</description><pubDate>Fri, 18 Apr 2025 00:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;Note:&lt;/h2&gt;
&lt;p&gt;This is a work in progress. I will be adding more notes and examples as I go through the course. The course is available on the &lt;a href=&quot;https://inst.eecs.berkeley.edu/~cs188/sp25/&quot;&gt;Berkeley website&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Note that my notes are based on the &lt;strong&gt;Spring 2025&lt;/strong&gt; version of the course, and my understanding of the material. So they MAY NOT be 100% accurate or complete. Also, THIS IS NOT A SUBSTITUTE FOR THE COURSE MATERIAL. I would only take notes on parts of the lecture that I find interesting or confusing. I will NOT be taking notes on every single detail of the lecture.&lt;/p&gt;
&lt;p&gt;I will begin my notes with Lec.4 (CSPs I) and continue from there.&lt;/p&gt;
&lt;h2&gt;CS188: Lecture 4 - Constraint Satisfaction Problems (CSPs)&lt;/h2&gt;
&lt;p&gt;The &lt;strong&gt;goal&lt;/strong&gt; is to find a &lt;strong&gt;complete assignment&lt;/strong&gt; (every variable has a value from its domain) such that &lt;strong&gt;all constraints&lt;/strong&gt; are satisfied. CSPs are a special kind of search problem where the path to the goal doesn&apos;t matter, only the final state.&lt;/p&gt;
&lt;h3&gt;Backtracking Search&lt;/h3&gt;
&lt;p&gt;The fundamental algorithm for solving CSPs systematically is &lt;strong&gt;Backtracking Search&lt;/strong&gt;. It works as follows:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Start with an empty assignment.&lt;/li&gt;
&lt;li&gt;Select an unassigned variable.&lt;/li&gt;
&lt;li&gt;Try assigning a value from its domain.&lt;/li&gt;
&lt;li&gt;Check if this assignment violates any constraints with already assigned variables.
&lt;ul&gt;
&lt;li&gt;If &lt;strong&gt;no violation&lt;/strong&gt;, recursively call backtracking for the next variable. If the recursive call succeeds, we&apos;re done (or continue if finding all solutions).&lt;/li&gt;
&lt;li&gt;If &lt;strong&gt;violation&lt;/strong&gt;, or if the recursive call returns failure, try the next value for the current variable.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;If all values for the current variable have been tried and failed, &lt;strong&gt;backtrack&lt;/strong&gt;: return failure to the previous call, forcing it to try a different value.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This explores the space of partial assignments in a depth-first manner. While complete (guaranteed to find a solution if one exists), basic backtracking can be very slow.&lt;/p&gt;
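&lt;p&gt;A minimal sketch of the algorithm (my own illustration; &lt;code&gt;consistent(var, val, assignment)&lt;/code&gt; is a placeholder that checks the constraints of &lt;code&gt;var=val&lt;/code&gt; against the already-assigned variables):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def backtracking(assignment, variables, domains, consistent):
    if len(assignment) == len(variables):
        return assignment                # complete and consistent: done
    var = next(v for v in variables if v not in assignment)
    for val in domains[var]:
        if consistent(var, val, assignment):
            assignment[var] = val
            result = backtracking(assignment, variables, domains, consistent)
            if result is not None:
                return result
            del assignment[var]          # undo, then try the next value
    return None                          # all values failed: backtrack
&lt;/code&gt;&lt;/pre&gt;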
&lt;h3&gt;Filtering (Constraint Propagation)&lt;/h3&gt;
&lt;p&gt;Filtering techniques aim to prune the search space &lt;em&gt;before&lt;/em&gt; or &lt;em&gt;during&lt;/em&gt; backtracking by removing values from domains that cannot possibly lead to a solution.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Forward Checking:&lt;/strong&gt;
When a variable &lt;code&gt;X&lt;/code&gt; is assigned a value &lt;code&gt;v&lt;/code&gt;, look at all unassigned neighboring variables &lt;code&gt;Y&lt;/code&gt; connected to &lt;code&gt;X&lt;/code&gt; by a constraint. Remove any value &lt;code&gt;y&lt;/code&gt; from &lt;code&gt;Y&lt;/code&gt;&apos;s domain that is inconsistent with &lt;code&gt;X=v&lt;/code&gt;.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Why:&lt;/strong&gt; Simple, relatively cheap check that prevents immediate failures down the line.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Limitation:&lt;/strong&gt; It only checks constraints between the &lt;em&gt;newly assigned&lt;/em&gt; variable and its &lt;em&gt;future&lt;/em&gt; neighbors. It doesn&apos;t detect inconsistencies &lt;em&gt;between two unassigned variables&lt;/em&gt;, even if their domains have been reduced (e.g., if both NT and SA are reduced to only {Blue}, Forward Checking won&apos;t notice the NT-SA conflict until one of them is assigned).&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Arc Consistency (2-Consistency):&lt;/strong&gt;
An arc &lt;code&gt;X -&gt; Y&lt;/code&gt; is consistent if &lt;em&gt;for every&lt;/em&gt; value &lt;code&gt;x&lt;/code&gt; remaining in &lt;code&gt;X&lt;/code&gt;&apos;s domain, there exists &lt;em&gt;at least one&lt;/em&gt; value &lt;code&gt;y&lt;/code&gt; remaining in &lt;code&gt;Y&lt;/code&gt;&apos;s domain such that &lt;code&gt;(x, y)&lt;/code&gt; satisfies the constraint between &lt;code&gt;X&lt;/code&gt; and &lt;code&gt;Y&lt;/code&gt;. So you could think of it as a &quot;two-way&quot; check, an upgrade of the previously mentioned Forward Checking.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;How (AC-3 Algorithm Idea):&lt;/strong&gt; Maintain a queue of all arcs. While the queue is not empty, pop an arc &lt;code&gt;X -&gt; Y&lt;/code&gt;. Check if it&apos;s consistent. If not, remove the inconsistent value(s) &lt;code&gt;x&lt;/code&gt; from &lt;code&gt;X&lt;/code&gt;&apos;s domain (&quot;delete from the tail&quot;). &lt;strong&gt;Crucially:&lt;/strong&gt; If any value was removed from &lt;code&gt;X&lt;/code&gt;, add all arcs &lt;code&gt;Z -&gt; X&lt;/code&gt; (where &lt;code&gt;Z&lt;/code&gt; is a neighbor of &lt;code&gt;X&lt;/code&gt;, other than &lt;code&gt;Y&lt;/code&gt;) back into the queue, because the removal might make some values in &lt;code&gt;Z&lt;/code&gt; inconsistent. Repeat until the queue is empty (no more values can be removed).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Why:&lt;/strong&gt; More powerful than Forward Checking. It propagates constraints between variables, potentially detecting failures much earlier (like the NT-SA {Blue} conflict). Can be used as preprocessing or maintained during search. It is more computationally expensive than Forward Checking, but often worth it. (A minimal sketch of AC-3 follows this list.)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;K-Consistency &amp;#x26; Strong K-Consistency:&lt;/strong&gt;
Generalizes consistency checks to &lt;code&gt;k&lt;/code&gt; variables. K-Consistency means any consistent assignment to &lt;code&gt;k-1&lt;/code&gt; variables can be extended to a &lt;code&gt;k&lt;/code&gt;-th variable.
1-Consistency = Node Consistency (unary constraints).
2-Consistency = Arc Consistency.
3-Consistency = Path Consistency.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Strong K-Consistency:&lt;/strong&gt; Means the CSP is J-Consistent for all &lt;code&gt;J&lt;/code&gt; from 1 to &lt;code&gt;K&lt;/code&gt;.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Fact&lt;/strong&gt;: &lt;strong&gt;Strong&lt;/strong&gt; n-Consistency (where n is the number of variables) guarantees a solution can be found &lt;strong&gt;without backtracking&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;My misunderstanding&lt;/strong&gt;: Why &quot;&lt;strong&gt;Strong&lt;/strong&gt;&quot;? Because the backtrack-free construction process requires the guarantee at &lt;em&gt;every&lt;/em&gt; step &lt;code&gt;k&lt;/code&gt;. Step &lt;code&gt;k&lt;/code&gt; requires k-Consistency &lt;em&gt;assuming&lt;/em&gt; the first &lt;code&gt;k-1&lt;/code&gt; assignments were consistent. Plain n-Consistency only guarantees the &lt;em&gt;last&lt;/em&gt; step (n-1 to n) works, but doesn&apos;t guarantee the intermediate steps (like 2 to 3) are possible if the problem isn&apos;t also 3-Consistent, etc. A problem could be n-Consistent (vacuously, if no consistent n-1 assignments exist) but fail lower levels of consistency, requiring backtracking or even having no solution. Strong n-Consistency ensures all necessary intermediate guarantees hold.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
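&lt;p&gt;As promised, here is a minimal AC-3 sketch (my own illustration; for simplicity it assumes a single binary constraint &lt;code&gt;satisfies&lt;/code&gt; shared by all arcs, e.g. &quot;neighboring regions must differ&quot; in map coloring):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;from collections import deque

def ac3(domains, neighbors, satisfies):
    # domains: var -&gt; set of values; neighbors: var -&gt; iterable of vars.
    queue = deque((x, y) for x in domains for y in neighbors[x])
    while queue:
        x, y = queue.popleft()
        # Remove values of X with no supporting value in Y&apos;s domain.
        removed = {vx for vx in domains[x]
                   if not any(satisfies(vx, vy) for vy in domains[y])}
        if removed:
            domains[x] -= removed        # &quot;delete from the tail&quot; (X)
            if not domains[x]:
                return False             # a domain emptied: no solution
            for z in neighbors[x]:
                if z != y:
                    queue.append((z, x)) # re-check arcs pointing into X
    return True

# Map coloring usage: ac3(domains, neighbors, satisfies=lambda a, b: a != b)
&lt;/code&gt;&lt;/pre&gt;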
&lt;h3&gt;Speeding Up Backtracking&lt;/h3&gt;
&lt;p&gt;These heuristics don&apos;t prune the search space but guide the backtracking search to potentially find solutions faster or detect failures earlier.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Variable Ordering: Minimum Remaining Values (MRV):&lt;/strong&gt;
Choose the &lt;em&gt;next unassigned variable&lt;/em&gt; that has the &lt;strong&gt;fewest&lt;/strong&gt; legal values left in its domain.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Why (&quot;Fail-Fast&quot;):&lt;/strong&gt; If a variable has 0 values, failure is detected immediately. If it has 1 value, it&apos;s forced, simplifying the problem. Variables with few values are often bottlenecks; dealing with them early is likely to prune large parts of the search tree quickly if they lead to failure. Also called &quot;most constrained variable&quot;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Value Ordering: Least Constraining Value (LCV):&lt;/strong&gt;
Once a variable is selected (e.g., by MRV), try assigning values from its domain in an order. Choose the value that &lt;strong&gt;rules out the fewest&lt;/strong&gt; values in the domains of &lt;em&gt;neighboring unassigned variables&lt;/em&gt;.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Why (&quot;Succeed-First&quot;):&lt;/strong&gt; Tries to keep options open for the future, increasing the chance that the current path leads to a solution without immediate backtracking. It prioritizes choices that seem less likely to cause conflicts later.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;MRV and LCV often work very well together.&lt;/p&gt;</content:encoded><h:img src="undefined"/><enclosure url="undefined"/></item><item><title>The Hidden Cost of try-catch</title><link>https://20051110.xyz/blog/try-catch</link><guid isPermaLink="true">https://20051110.xyz/blog/try-catch</guid><description>Profiling revealed that using exceptions in C++ for expected control flow can lead to significant performance degradation.</description><pubDate>Sat, 12 Apr 2025 14:44:00 GMT</pubDate><content:encoded>&lt;h2&gt;The Problem&lt;/h2&gt;
&lt;p&gt;So I was implementing my own version of standard library containers like &lt;code&gt;std::map&lt;/code&gt;. It&apos;s a fantastic learning exercise! I got to &lt;code&gt;operator[]&lt;/code&gt;, the &lt;code&gt;access-or-insert&lt;/code&gt; function, looked at the existing &lt;code&gt;at()&lt;/code&gt; method (which provides bounds-checked access) and thought, &quot;Aha! I can reuse &lt;code&gt;at()&lt;/code&gt; and just catch the exception if the key isn&apos;t there!&quot;&lt;/p&gt;
&lt;p&gt;It seems elegant, right? I wrote something like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-cpp&quot;&gt;T &amp;#x26;at(const Key &amp;#x26;key) {
    if (root == nullptr) {
        throw index_out_of_bound();
    }
    // find() also throws index_out_of_bound when the key is absent
    return find(key, root);
}

T &amp;#x26;operator[](const Key &amp;#x26;key) {
    try {
        return at(key);
    } catch (index_out_of_bound &amp;#x26;) {
        // insert
        value_type value(key, T());
        pair&amp;#x3C;iterator, bool&gt; result = insert(value);
        return result.first-&gt;second;
    }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I compiled it, feeling pretty good about the code reuse. Then I ran the benchmarks, comparing my &lt;code&gt;sjtu::operator[]&lt;/code&gt; against &lt;code&gt;std::map::operator[]&lt;/code&gt;, especially focusing on scenarios involving insertions (where the key doesn&apos;t initially exist), and boom - Time Limit Exceeded. Why? So I looked at the benchmark script, and it contained something like&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-cpp&quot;&gt;	//	test: erase()
	while (map.begin() != map.end()) {
		map.erase(map.begin());
	}
	assert(map.empty() &amp;#x26;&amp;#x26; map.size() == 0);
	//	test: operator[]
	for (int i = 0; i &amp;#x3C; 100000; ++i) {
		std::cout &amp;#x3C;&amp;#x3C; map[Integer(i)];
	}
	std::cout &amp;#x3C;&amp;#x3C; map.size() &amp;#x3C;&amp;#x3C; std::endl;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;So you have probably already identified the problem by now, but I wasn&apos;t so lucky. I just thought, &quot;Oh, maybe the &lt;code&gt;insert&lt;/code&gt; function is slow.&quot;&lt;/p&gt;
&lt;h2&gt;The Profiling&lt;/h2&gt;
&lt;p&gt;The benchmark results are shocking. This implementation is &lt;em&gt;dramatically&lt;/em&gt; slower (in my case, 88% slower) than &lt;code&gt;std::map&lt;/code&gt; specifically when &lt;code&gt;operator[]&lt;/code&gt; results in inserting a new element. Accessing existing elements might be fine, but the insert path is killing performance.&lt;/p&gt;
&lt;p&gt;What gives? Is my tree balancing algorithm inefficient? Is memory allocation slow? This is where debugging tools become essential. Simple code inspection doesn&apos;t immediately reveal &lt;em&gt;why&lt;/em&gt; it&apos;s so much slower, as it DOES achieve $O(\log N)$ time complexity.&lt;/p&gt;
&lt;p&gt;Time to bring out the &lt;strong&gt;profilers&lt;/strong&gt;. Tools like &lt;code&gt;perf&lt;/code&gt; (on Linux) and &lt;code&gt;callgrind&lt;/code&gt; (part of the Valgrind suite) are designed to answer the question: &quot;Where is my program &lt;em&gt;actually&lt;/em&gt; spending its time?&quot;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Beginning with &lt;code&gt;perf record ./code&lt;/code&gt; followed by &lt;code&gt;perf report&lt;/code&gt; is a great start, as it already provides simple CLI views to see which functions are &quot;hot&quot; (consuming the most CPU cycles). The &lt;code&gt;perf report&lt;/code&gt; points towards functions with names like &lt;code&gt;_Unwind_Find_FDE&lt;/code&gt;, and various functions involved in stack unwinding and exception handling. This already hinted that the problem was how I was using the language rather than my tree algorithm itself. However, I was unfamiliar with things like &lt;code&gt;_Unwind_Find_FDE&lt;/code&gt;, so I used &lt;code&gt;callgrind&lt;/code&gt; to further inspect the instruction counts.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;img src=&quot;https://20051110.xyz/_astro/perf-report-1.CJQNxDZ7_g2E47.webp&quot; alt=&quot;Perf Report&quot;&gt;&lt;/p&gt;
&lt;ol start=&quot;2&quot;&gt;
&lt;li&gt;&lt;strong&gt;Running &lt;code&gt;callgrind&lt;/code&gt;:&lt;/strong&gt; I run &lt;code&gt;valgrind --tool=callgrind ./code&lt;/code&gt;. And I am using macOS, so I use &lt;code&gt;qcachegrind&lt;/code&gt; to visualize the results.
&lt;ul&gt;
&lt;li&gt;The visualization confirms &lt;code&gt;perf&lt;/code&gt;&apos;s findings but with more detail. I can see that when &lt;code&gt;sjtu::operator[]&lt;/code&gt; calls &lt;code&gt;sjtu::at&lt;/code&gt; and &lt;code&gt;at&lt;/code&gt; executes &lt;code&gt;throw&lt;/code&gt;, a massive cascade of function calls related to exception handling follows - costing 87% of execution time!!!!!&lt;/li&gt;
&lt;li&gt;Crucially, &lt;code&gt;callgrind&lt;/code&gt; shows the &lt;em&gt;cost&lt;/em&gt; associated not just with the &lt;code&gt;throw&lt;/code&gt; itself, but with the entire &lt;strong&gt;stack unwinding&lt;/strong&gt; process – the runtime searching for the &lt;code&gt;catch&lt;/code&gt; block and meticulously destroying any local objects created within the &lt;code&gt;try&lt;/code&gt; block and intervening function calls.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;img src=&quot;https://20051110.xyz/_astro/qcachegrind-0.Cffbs33F_Z24XwlU.webp&quot; alt=&quot;Callgrind Report&quot;&gt;&lt;/p&gt;
&lt;h3&gt;The &quot;Aha!&quot; Moment&lt;/h3&gt;
&lt;p&gt;The profilers leave no doubt. The performance bottleneck &lt;strong&gt;is&lt;/strong&gt; the deliberate, designed-in overhead of the C++ exception handling mechanism being triggered repeatedly for a &lt;em&gt;non-exceptional&lt;/em&gt; condition (key not found during an insertion).&lt;/p&gt;
&lt;h2&gt;What &lt;em&gt;Actually&lt;/em&gt; Happens When C++ Throws an Exception? (And Why Profilers Flag It)&lt;/h2&gt;
&lt;p&gt;After chatting with some AI Chatbots and doing some googling, I realize that throwing and catching an exception isn&apos;t just a fancy &lt;code&gt;goto&lt;/code&gt;. Instead, it involves a complex runtime process that the profilers pick up as costly operations:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Exception Object Creation:&lt;/strong&gt; &lt;code&gt;throw std::out_of_range(...)&lt;/code&gt; creates an object, often involving dynamic memory allocation (heap allocation shows up in profilers).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Stack Unwinding:&lt;/strong&gt; (The main cost flagged by profilers) The runtime walks backward up the call stack.
&lt;ul&gt;
&lt;li&gt;It destroys local objects (RAII cleanup). Profilers show time spent in destructors &lt;em&gt;during&lt;/em&gt; unwinding.&lt;/li&gt;
&lt;li&gt;It consults compiler-generated &quot;unwinding tables&quot;. Accessing and processing this data takes time/instructions.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Handler Matching:&lt;/strong&gt; The runtime checks &lt;code&gt;catch&lt;/code&gt; blocks using RTTI, adding overhead.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Control Transfer:&lt;/strong&gt; Jumping to the &lt;code&gt;catch&lt;/code&gt; block disrupts linear execution flow, potentially causing instruction cache misses and branch mispredictions (subtler effects seen in very low-level profiling).&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The profiling results, combined with understanding the mechanics, paint a clear picture:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Stack Unwinding Overhead:&lt;/strong&gt; As &lt;code&gt;callgrind&lt;/code&gt; showed, walking the stack, looking up cleanup actions, and calling destructors is expensive, especially compared to a simple &lt;code&gt;if&lt;/code&gt; check.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Runtime Machinery:&lt;/strong&gt; The hidden machinery (dynamic allocation, RTTI, table lookups) adds significant overhead absent in direct conditional logic.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Optimization Barriers:&lt;/strong&gt; Exception handling constructs can limit compiler optimizations compared to simpler control flow, contributing to higher instruction counts seen in &lt;code&gt;callgrind&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In our &lt;code&gt;operator[]&lt;/code&gt; example, the case where the key &lt;em&gt;doesn&apos;t&lt;/em&gt; exist is expected. By using exceptions here, we frequently trigger the heavyweight process the profilers flagged, leading to poor performance.&lt;/p&gt;
&lt;p&gt;So what does a normal &lt;code&gt;operator[]&lt;/code&gt; look like? It should be something like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-cpp&quot;&gt;    T &amp;#x26;operator[](const Key &amp;#x26;key) {
        Node *node = find_node(key, root);
        if (node != nullptr) {
            return node-&gt;data.second;
        } else {
            // Insert new element
            value_type value(key, T());
            pair&amp;#x3C;iterator, bool&gt; result = insert(value);
            return result.first-&gt;second;
        }
    }
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;and the profiler results should look like something like this:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://20051110.xyz/_astro/perf-report-0.Dl7yfGEs_2mxqnc.webp&quot; alt=&quot;Normal Report&quot;&gt;&lt;/p&gt;
&lt;p&gt;As you can see in the image, the top CPU-consuming functions are now actual functions from my code, not the exception handling machinery. The &lt;code&gt;find_node&lt;/code&gt; function is now the most expensive operation, which is expected since it involves
$O(\log N)$ tree traversal.&lt;/p&gt;</content:encoded><h:img src="undefined"/><enclosure url="undefined"/></item><item><title>My First VSCode Extension - ACMOJ Helper from Scratch</title><link>https://20051110.xyz/blog/vscode-extension</link><guid isPermaLink="true">https://20051110.xyz/blog/vscode-extension</guid><description>As a student frequently using ACMOJ, constantly switching between VSCode and browser was tedious. Could I complete all these operations within VSCode?</description><pubDate>Sun, 06 Apr 2025 06:25:00 GMT</pubDate><content:encoded>&lt;p&gt;Constantly switching between the editor (VS Code) and browser was incredibly tedious. Looking at problem descriptions, examples, and outputs in the browser, then comparing results, copying code to VS Code, writing and debugging, copying back to browser for submission, and finally switching back to browser to check results... Although I could use split screen, the Stage Manager experience on macOS wasn&apos;t great. This process not only interrupted my thought flow but was also inefficient.&lt;/p&gt;
&lt;p&gt;Could I complete all these operations within VSCode? Seeing classmates in my class developing plugins, it didn&apos;t seem that difficult. Having recently learned Golang, TypeScript didn&apos;t seem too hard to learn either 😋 With this idea in mind, I began my first VSCode extension development journey, aiming to create a convenient assistant for ACMOJ. This article documents the process from conception to implementation, through pitfalls to the final working product.&lt;/p&gt;
&lt;h2&gt;Getting Started&lt;/h2&gt;
&lt;p&gt;VS Code extensions are primarily written in TypeScript (or JavaScript) and run in a Node.js environment. Before starting, the essential tools are:&lt;/p&gt;
&lt;p&gt;Node.js &amp;#x26; npm/yarn serve as the basic runtime environment and package manager. Yeoman &amp;#x26; generator-code are the official scaffolding tools recommended by VS Code for quickly generating project structure. Simply run &lt;code&gt;npm install -g yo generator-code&lt;/code&gt; followed by &lt;code&gt;yo code&lt;/code&gt; and select TypeScript Extension. VS Code itself is needed for developing and debugging the plugin.&lt;/p&gt;
&lt;p&gt;The generated project structure is clear and straightforward. The &lt;code&gt;src/extension.ts&lt;/code&gt; file serves as the plugin&apos;s entry point, containing &lt;code&gt;activate&lt;/code&gt; (called when activated) and &lt;code&gt;deactivate&lt;/code&gt; (called when deactivated) functions. The &lt;code&gt;package.json&lt;/code&gt; file is the core manifest file, defining the plugin&apos;s metadata, &lt;strong&gt;contributions&lt;/strong&gt; (such as commands, views, configurations), and &lt;strong&gt;activation events&lt;/strong&gt; (determining when to load the plugin). The &lt;code&gt;tsconfig.json&lt;/code&gt; file contains TypeScript configuration.&lt;/p&gt;
&lt;p&gt;My initial blueprint was to implement these core features:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Authentication:&lt;/strong&gt; Connect to the ACMOJ API.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Problem/Assignment Browsing:&lt;/strong&gt; View problem lists in VS Code&apos;s sidebar.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Problem Details:&lt;/strong&gt; Display problem descriptions, examples, etc. in a Webview.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Code Submission:&lt;/strong&gt; Quickly submit code from the current editor.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Result Viewing:&lt;/strong&gt; View submission status and results in the sidebar or a Webview.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;API Interaction and Authentication&lt;/h2&gt;
&lt;p&gt;ACMOJ provides an OpenAPI-compliant API, which forms the foundation for implementing functionality.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;API Client Setup&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I chose &lt;code&gt;axios&lt;/code&gt; as the HTTP request library and encapsulated an &lt;code&gt;ApiClient&lt;/code&gt; class to uniformly handle request sending, Base URL configuration, and error handling. The key was setting up request interceptors to automatically attach &lt;code&gt;Bearer &amp;#x3C;token&gt;&lt;/code&gt; in the &lt;code&gt;Authorization&lt;/code&gt; Header.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Authentication &quot;Episode&quot; - OAuth vs PAT&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The API documentation mentioned both OAuth2 (Authorization Code Flow) and Personal Access Token (PAT) authentication methods.&lt;/p&gt;
&lt;p&gt;Initially, I tried implementing the OAuth2 flow. This involved directing users to browser authorization, then starting a temporary HTTP server locally to listen for callback URIs to obtain the &lt;code&gt;code&lt;/code&gt;, then using the &lt;code&gt;code&lt;/code&gt; and &lt;code&gt;client_secret&lt;/code&gt; to exchange for an &lt;code&gt;access_token&lt;/code&gt;. While this flow is standard for applications requiring multi-user authorization, it&apos;s quite complex to implement, especially handling &lt;code&gt;client_secret&lt;/code&gt; and local callbacks securely in a VS Code extension environment. (Actually, what stopped me initially was needing a &lt;code&gt;client secret&lt;/code&gt; from the admin team. At that time, I didn&apos;t know anyone on the admin team, though they seem to know me now after developing this plugin XD)&lt;/p&gt;
&lt;p&gt;Considering that target users (mainly myself and classmates) could easily generate PATs on the ACMOJ website, I decided to switch to the simpler PAT authentication. This greatly simplified the flow:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Create an &lt;code&gt;AuthService&lt;/code&gt; (or &lt;code&gt;TokenManager&lt;/code&gt;), and provide an &lt;code&gt;acmoj.setToken&lt;/code&gt; command that uses &lt;code&gt;vscode.window.showInputBox({ password: true })&lt;/code&gt; to prompt users for PAT input.&lt;/li&gt;
&lt;li&gt;Use VS Code&apos;s &lt;code&gt;SecretStorage&lt;/code&gt; API (&lt;code&gt;context.secrets.store&lt;/code&gt; / &lt;code&gt;context.secrets.get&lt;/code&gt;) to securely store and read PATs.&lt;/li&gt;
&lt;li&gt;Provide an &lt;code&gt;acmoj.clearToken&lt;/code&gt; command to clear stored PATs.&lt;/li&gt;
&lt;li&gt;In &lt;code&gt;ApiClient&lt;/code&gt;&apos;s request interceptor, read the stored PAT from &lt;code&gt;AuthService&lt;/code&gt; and add it to the request headers.&lt;/li&gt;
&lt;li&gt;In the response interceptor, on a 401 Unauthorized error, call &lt;code&gt;AuthService&lt;/code&gt; methods to clear the invalid token and prompt the user to set it again.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Building User Interface with TreeView and Webview&lt;/h2&gt;
&lt;p&gt;To display information and provide interaction in VS Code, I mainly used TreeView and Webview.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TreeView (Sidebar)&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I used the &lt;code&gt;vscode.TreeDataProvider&lt;/code&gt; interface to create two views for the Activity Bar:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Problemsets (Contests/Assignments):&lt;/strong&gt; Initially, I simply listed all problems but quickly found the information overwhelming. I improved it to display Problemsets that users joined. Further improvement involved categorizing Problemsets into &quot;Ongoing&quot;, &quot;Upcoming&quot;, and &quot;Passed&quot; top-level nodes based on their start/end times. This required fetching all Problemsets, then filtering and sorting them in the &lt;code&gt;getChildren&lt;/code&gt; method based on current time and category nodes. I used two custom &lt;code&gt;TreeItem&lt;/code&gt; types: &lt;code&gt;CategoryTreeItem&lt;/code&gt; and &lt;code&gt;ProblemsetTreeItem&lt;/code&gt;. Each Problemset node was set as expandable (&lt;code&gt;vscode.TreeItemCollapsibleState.Collapsed&lt;/code&gt;), loading its contained problem list (&lt;code&gt;ProblemBriefTreeItem&lt;/code&gt;) when clicked.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Submissions (Submission Records):&lt;/strong&gt; This displays the user&apos;s submission list, including ID, problem, status, language, time, etc. I set different icons (&lt;code&gt;ThemeIcon&lt;/code&gt;) for different submission statuses (AC, WA, TLE, RE...) to make them more intuitive.&lt;/p&gt;
&lt;p&gt;The key to implementing TreeView lies in the &lt;code&gt;getChildren&lt;/code&gt; (get child nodes) and &lt;code&gt;getTreeItem&lt;/code&gt; (define node appearance and behavior) methods. Through &lt;code&gt;EventEmitter&lt;/code&gt; and &lt;code&gt;onDidChangeTreeData&lt;/code&gt; events, you can notify VS Code to refresh the view.&lt;/p&gt;
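&lt;p&gt;In stripped-down form, a provider looks roughly like this (the item shape and the API stub are illustrative, not the project&apos;s real code):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-typescript&quot;&gt;import * as vscode from &apos;vscode&apos;;

interface Submission { id: number; problem: string; status: string; }

// Stand-in for the real call through ApiClient.
async function fetchSubmissions(): Promise&amp;#x3C;Submission[]&gt; {
  return []; // illustrative stub
}

export class SubmissionProvider
  implements vscode.TreeDataProvider&amp;#x3C;vscode.TreeItem&gt; {
  private readonly emitter =
    new vscode.EventEmitter&amp;#x3C;vscode.TreeItem | undefined | null | void&gt;();
  readonly onDidChangeTreeData = this.emitter.event;

  // Call after submitting: VS Code will re-query getChildren.
  refresh(): void {
    this.emitter.fire();
  }

  getTreeItem(element: vscode.TreeItem): vscode.TreeItem {
    return element;
  }

  async getChildren(): Promise&amp;#x3C;vscode.TreeItem[]&gt; {
    const submissions = await fetchSubmissions();
    return submissions.map((s) =&gt; {
      const item = new vscode.TreeItem(`#${s.id} ${s.problem}`);
      item.description = s.status;
      // Different ThemeIcons for AC vs. everything else.
      item.iconPath = new vscode.ThemeIcon(
        s.status === &apos;accepted&apos; ? &apos;check&apos; : &apos;error&apos;,
      );
      return item;
    });
  }
}
&lt;/code&gt;&lt;/pre&gt;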
&lt;p&gt;&lt;strong&gt;Webview (Detail Display)&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;When users click on problems or submission records in the TreeView, I use &lt;code&gt;vscode.window.createWebviewPanel&lt;/code&gt; to create a Webview for displaying detailed information. Why use a &lt;code&gt;webview&lt;/code&gt;? Because I needed to render TeX formulas, and the API&apos;s JSON responses return the problem content as Markdown.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Content Rendering:&lt;/strong&gt; Webview is essentially an embedded browser environment with HTML content. I used the &lt;code&gt;markdown-it&lt;/code&gt; library to convert Markdown-formatted problem descriptions, input/output formats, etc. obtained from the API into HTML.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Challenge: Mathematical Formula Rendering:&lt;/strong&gt; OJ problem descriptions often contain LaTeX formulas.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Attempt One (Failed):&lt;/strong&gt; Initially, I tried including KaTeX JS library and auto-render script in the Webview HTML for client-side rendering. However, this caused the strange issue of formulas being rendered twice (once as original text, once as KaTeX rendered result).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Attempt Two (Success):&lt;/strong&gt; I realized the problem was the duplicated rendering flow. The final solution was to use &lt;code&gt;markdown-it&lt;/code&gt;&apos;s KaTeX plugin, &lt;code&gt;@vscode/markdown-it-katex&lt;/code&gt;. (When installing via npm there was also another developer&apos;s package under this name, which was outdated and had security risks, but the good news is that VS Code officially noticed the project and made subsequent fixes, so that&apos;s the one I used.) When &lt;code&gt;md.render()&lt;/code&gt; runs on the &lt;strong&gt;extension side&lt;/strong&gt; (the Node.js environment), the plugin converts the LaTeX in the Markdown (&lt;code&gt;$...$&lt;/code&gt;, &lt;code&gt;$$...$$&lt;/code&gt;) directly into the final KaTeX HTML structure. The HTML sent to the Webview is therefore already pre-rendered, and the Webview side &lt;strong&gt;only needs&lt;/strong&gt; to include the KaTeX CSS (&lt;code&gt;katex.min.css&lt;/code&gt;) for the styles to display correctly; KaTeX JS and the auto-render script are no longer needed.&lt;/p&gt;
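&lt;p&gt;In essence, the extension-side rendering boils down to something like this (a sketch; the plugin&apos;s exact import shape may differ between versions):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-typescript&quot;&gt;import MarkdownIt from &apos;markdown-it&apos;;
// Note: the plugin&apos;s export shape may vary between versions.
import katex from &apos;@vscode/markdown-it-katex&apos;;

const md = new MarkdownIt({ html: true }).use(katex);

// Runs on the extension side (Node.js). LaTeX becomes KaTeX HTML here,
// so the webview only needs katex.min.css, no KaTeX JS.
export function renderProblemHtml(markdown: string): string {
  return md.render(markdown);
}
&lt;/code&gt;&lt;/pre&gt;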
&lt;p&gt;&lt;strong&gt;Commands and Status Bar&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I used &lt;code&gt;vscode.commands.registerCommand&lt;/code&gt; to register various user operations (set Token, refresh views, submit code, view problems by ID, etc.). I used &lt;code&gt;vscode.window.createStatusBarItem&lt;/code&gt; to display current login status and username on the left side of the status bar, which can trigger corresponding commands (like showing user info or setting Token) when clicked.&lt;/p&gt;
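&lt;p&gt;For example, a status bar item wired to a command takes only a few lines (a sketch; the text and priority here are arbitrary):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-typescript&quot;&gt;import * as vscode from &apos;vscode&apos;;

export function createStatusBar(context: vscode.ExtensionContext) {
  const item = vscode.window.createStatusBarItem(
    vscode.StatusBarAlignment.Left,
    100, // priority: higher values sit further to the left
  );
  item.text = &apos;$(account) ACMOJ: not logged in&apos;;
  item.command = &apos;acmoj.setToken&apos;; // clicking triggers this command
  item.show();
  context.subscriptions.push(item);
}
&lt;/code&gt;&lt;/pre&gt;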
&lt;h2&gt;Packaging and Publishing&lt;/h2&gt;
&lt;p&gt;Everything worked smoothly during development and debugging (&lt;code&gt;F5&lt;/code&gt;), but when I used &lt;code&gt;vsce package&lt;/code&gt; to package into a VSIX file and installed it on another computer, I encountered the classic problem: &lt;code&gt;Command &apos;acmoj.setToken&apos; not found&lt;/code&gt; or &lt;code&gt;Cannot find module &apos;axios&apos;&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Debugging Process&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;On the test computer, I opened the VS Code developer tools (&lt;code&gt;Developer: Toggle Developer Tools&lt;/code&gt;) and checked the Console: activating the extension immediately reported &lt;code&gt;Cannot find module &apos;axios&apos;&lt;/code&gt;. I then inspected the VSIX contents with the &lt;code&gt;vsce ls&lt;/code&gt; command (or by renaming the &lt;code&gt;.vsix&lt;/code&gt; to &lt;code&gt;.zip&lt;/code&gt; and extracting it) and discovered that the &lt;code&gt;node_modules&lt;/code&gt; folder wasn&apos;t packaged at all!&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Root Cause&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I mistakenly placed runtime-required libraries (like &lt;code&gt;axios&lt;/code&gt;, &lt;code&gt;markdown-it&lt;/code&gt;, &lt;code&gt;katex&lt;/code&gt;, &lt;code&gt;@vscode/markdown-it-katex&lt;/code&gt;) under &lt;code&gt;devDependencies&lt;/code&gt; instead of &lt;code&gt;dependencies&lt;/code&gt; in &lt;code&gt;package.json&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Dependencies&lt;/strong&gt; are libraries required for extension &lt;strong&gt;runtime&lt;/strong&gt; and will be packaged by &lt;code&gt;vsce package&lt;/code&gt;. &lt;strong&gt;DevDependencies&lt;/strong&gt; are libraries used during &lt;strong&gt;development&lt;/strong&gt; (compilers, type definitions, linters, packaging tools, etc.) and will &lt;strong&gt;not&lt;/strong&gt; be packaged.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Solution&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I carefully checked &lt;code&gt;package.json&lt;/code&gt; and moved all runtime dependencies (&lt;code&gt;axios&lt;/code&gt;, etc.) to the &lt;code&gt;dependencies&lt;/code&gt; section, while keeping development tools (&lt;code&gt;typescript&lt;/code&gt;, &lt;code&gt;@types/*&lt;/code&gt;, &lt;code&gt;eslint&lt;/code&gt;, &lt;code&gt;@vscode/vsce&lt;/code&gt;, etc.) in &lt;code&gt;devDependencies&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-json&quot;&gt;{
  &quot;dependencies&quot;: {
    &quot;@vscode/markdown-it-katex&quot;: &quot;...&quot;,
    &quot;axios&quot;: &quot;...&quot;,
    &quot;katex&quot;: &quot;...&quot;,
    &quot;markdown-it&quot;: &quot;...&quot;
  },
  &quot;devDependencies&quot;: {
    &quot;@types/vscode&quot;: &quot;...&quot;,
    &quot;@types/node&quot;: &quot;...&quot;,
    &quot;@types/markdown-it&quot;: &quot;...&quot;,
    &quot;@vscode/vsce&quot;: &quot;...&quot;, // The packaging tool itself is a dev dependency
    &quot;typescript&quot;: &quot;...&quot;,
    &quot;eslint&quot;: &quot;...&quot;
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Key Step:&lt;/strong&gt; After modifying &lt;code&gt;package.json&lt;/code&gt;, it&apos;s essential to perform a &lt;strong&gt;&quot;clean &amp;#x26; reinstall&quot;&lt;/strong&gt;: delete &lt;code&gt;node_modules&lt;/code&gt; and &lt;code&gt;package-lock.json&lt;/code&gt;, then run &lt;code&gt;npm install&lt;/code&gt; again. I kept getting errors at first precisely because I hadn&apos;t cleared them.&lt;/p&gt;
&lt;p&gt;This time, the generated VSIX file finally contained the correct &lt;code&gt;node_modules&lt;/code&gt;, and after installation, commands could be found normally and the extension activated successfully.&lt;/p&gt;
&lt;h2&gt;TypeScript Interlude&lt;/h2&gt;
&lt;p&gt;As a TypeScript project, I also encountered some typical type issues:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Module/Type Not Found:&lt;/strong&gt; Errors like &lt;code&gt;Cannot find module &apos;vscode&apos;&lt;/code&gt;, caused by missing &lt;code&gt;@types&lt;/code&gt; packages and usually resolved by &lt;code&gt;npm install --save-dev @types/vscode @types/node ...&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Implicit &lt;code&gt;any&lt;/code&gt;:&lt;/strong&gt; After enabling &lt;code&gt;strict&lt;/code&gt; mode, I needed to explicitly add types for callback function parameters (like &lt;code&gt;progress&lt;/code&gt; in &lt;code&gt;withProgress&lt;/code&gt;, &lt;code&gt;text&lt;/code&gt; in &lt;code&gt;validateInput&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;API Signature Mismatch:&lt;/strong&gt; When calling &lt;code&gt;vscode.window.showQuickPick&lt;/code&gt; with option objects, you must pass &lt;code&gt;QuickPickItem[]&lt;/code&gt; rather than &lt;code&gt;string[]&lt;/code&gt;, which means mapping your raw values first, as in the sketch below.&lt;/p&gt;
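&lt;p&gt;For example (a hypothetical language picker):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-typescript&quot;&gt;import * as vscode from &apos;vscode&apos;;

async function pickLanguage(): Promise&amp;#x3C;string | undefined&gt; {
  const languages = [&apos;cpp&apos;, &apos;python&apos;, &apos;git&apos;]; // illustrative values
  // Map raw strings into QuickPickItem objects.
  const items: vscode.QuickPickItem[] = languages.map((lang) =&gt; ({
    label: lang,
    description: `submit as ${lang}`,
  }));
  const picked = await vscode.window.showQuickPick(items, {
    placeHolder: &apos;Select a submission language&apos;,
  });
  return picked?.label;
}
&lt;/code&gt;&lt;/pre&gt;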
&lt;h2&gt;Is This the End?&lt;/h2&gt;
&lt;p&gt;While acmoj-helper can already run and has helped me considerably in daily use, during the development process, I gradually felt some &quot;growing pains.&quot; As features iterated (even with minor adjustments), I found the code becoming somewhat messy:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Unclear Responsibilities:&lt;/strong&gt; The &lt;code&gt;commands.ts&lt;/code&gt; file not only handled command registration but also contained substantial, complex business logic such as &lt;code&gt;submitCurrentFile&lt;/code&gt;. The file grew abnormally bloated, and any modification risked rippling into unrelated functionality.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;High Coupling:&lt;/strong&gt; Modifying one module (like &lt;code&gt;cache.ts&lt;/code&gt; handling API caching) might unexpectedly affect views (&lt;code&gt;submissionProvider.ts&lt;/code&gt;) or command handling. When I mentioned rewriting &lt;code&gt;submissionProvider&lt;/code&gt; earlier, that was a typical example - the view layer was too tightly coupled with data fetching and business logic.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Registration Chaos:&lt;/strong&gt; Command registration was scattered across &lt;code&gt;extension.ts&lt;/code&gt; and &lt;code&gt;commands.ts&lt;/code&gt;, lacking centralization and clarity.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Extension Difficulties:&lt;/strong&gt; If I wanted to add new features like &quot;Contest&quot; view or more complex problem filtering logic, it would be extremely painful under the existing structure, requiring careful navigation through various files to ensure existing functionality wasn&apos;t broken.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Testing Obstacles:&lt;/strong&gt; Code mixing UI logic, API calls, and business processing was very difficult to unit test.&lt;/p&gt;
&lt;p&gt;These issues made me realize that while the current architecture works, it&apos;s not &quot;elegant&quot; and lacks long-term viability. To ensure this project can develop healthily and to improve my own code design skills, I decided to conduct a thorough refactoring.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Refactoring Goals: Decoupling, Layering, Single Responsibility&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The new architecture I&apos;m currently working on is roughly divided into these layers:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;VS Code Integration Layer (&lt;code&gt;extension.ts&lt;/code&gt;, &lt;code&gt;src/commands/index.ts&lt;/code&gt;)&lt;/strong&gt; - The entry point: activates the extension and centrally registers commands, views, and services.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Service Layer (&lt;code&gt;src/services/&lt;/code&gt;)&lt;/strong&gt; - Responsible for encapsulating core business logic and interactions with external resources (like APIs, caching). Each service corresponds to a clear domain.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Command Handling Layer (&lt;code&gt;src/commands/&lt;/code&gt;)&lt;/strong&gt; - Command handlers receive calls from VS Code and then &lt;strong&gt;use the service layer&lt;/strong&gt; to complete specific tasks. They serve as bridges between VS Code commands and business logic. Complex logic (like &lt;code&gt;submitCurrentFile&lt;/code&gt;) is now clearly encapsulated in corresponding command handlers.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;UI Layer (&lt;code&gt;src/views/&lt;/code&gt;, &lt;code&gt;src/webviews/&lt;/code&gt;)&lt;/strong&gt; - Responsible for data display and UI interaction. The &lt;code&gt;views/&lt;/code&gt; directory contains TreeDataProviders (like &lt;code&gt;ProblemsetProvider&lt;/code&gt;, &lt;code&gt;SubmissionProvider&lt;/code&gt;) that get data from the &lt;strong&gt;service layer&lt;/strong&gt; and format it into structures needed by VS Code TreeView. The &lt;code&gt;webviews/&lt;/code&gt; directory contains Webview Panel logic. After refactoring, I created dedicated classes for problem details and submission details (&lt;code&gt;ProblemDetailPanel&lt;/code&gt;, &lt;code&gt;SubmissionDetailPanel&lt;/code&gt;), encapsulating their respective HTML generation, message handling, and lifecycle management. They also get data through the &lt;strong&gt;service layer&lt;/strong&gt;, and Webview operations (like &quot;copy code&quot;) now typically send messages to VS Code via &lt;code&gt;postMessage&lt;/code&gt;, responded to by corresponding &lt;strong&gt;command handlers&lt;/strong&gt;.&lt;/p&gt;
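&lt;p&gt;The message passing itself is tiny; here&apos;s a sketch of the extension-side half (the function and message names are illustrative):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-typescript&quot;&gt;import * as vscode from &apos;vscode&apos;;

// Extension side: respond to messages posted from the webview.
export function wireWebviewMessages(
  panel: vscode.WebviewPanel,
  context: vscode.ExtensionContext,
) {
  panel.webview.onDidReceiveMessage(
    async (message: { command: string; text?: string }) =&gt; {
      if (message.command === &apos;copyCode&apos; &amp;#x26;&amp;#x26; message.text) {
        await vscode.env.clipboard.writeText(message.text);
        vscode.window.showInformationMessage(&apos;Code copied.&apos;);
      }
    },
    undefined,
    context.subscriptions,
  );
}

// Webview side (inside the generated HTML) it would be something like:
//   const api = acquireVsCodeApi();
//   api.postMessage({ command: &apos;copyCode&apos;, text: code });
&lt;/code&gt;&lt;/pre&gt;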
&lt;p&gt;&lt;strong&gt;Core/Data Layer (&lt;code&gt;src/core/&lt;/code&gt;, &lt;code&gt;src/types.ts&lt;/code&gt;)&lt;/strong&gt; - Provides the most basic components and definitions. A typical example during refactoring was &lt;strong&gt;&lt;code&gt;core/apiClient.ts&lt;/code&gt;&lt;/strong&gt;: a purer HTTP client only responsible for sending requests, handling authentication headers, retry logic, and basic error interpretation. It no longer contains specific business endpoint logic. Previously, getUserProfile, getSubmission, etc. were all in there.&lt;/p&gt;
&lt;p&gt;While the refactoring process was quite challenging and temporarily introduced new bugs, it laid a solid foundation for ACMOJ Helper&apos;s long-term development. Now I can more confidently implement those more comprehensive features I envisioned at the end of version 1.0.&lt;/p&gt;
&lt;p&gt;If you&apos;re also interested in VSCode extension development or want to build integrations for tools or platforms you frequently use, don&apos;t hesitate - just start doing it! Begin with &lt;code&gt;yo code&lt;/code&gt;, encounter problems, solve problems - this process itself is the best learning experience.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Project Repository:&lt;/strong&gt; &lt;a href=&quot;https://github.com/TheUnknownThing/vscode-acmoj&quot;&gt;TheUnknownThing/vscode-acmoj&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Thanks for reading! I hope my experience can be helpful to you.&lt;/p&gt;</content:encoded><h:img src="undefined"/><enclosure url="undefined"/></item><item><title>I&apos;ll Never Use memset Again...</title><link>https://20051110.xyz/blog/memset</link><guid isPermaLink="true">https://20051110.xyz/blog/memset</guid><description>The Pitfalls of the memset function</description><pubDate>Mon, 10 Mar 2025 05:11:40 GMT</pubDate><content:encoded>&lt;h2&gt;0. Foreword&lt;/h2&gt;
&lt;p&gt;This problem originated from my first programming exam during my freshman year... It was a question involving block decomposition (data chunking), and in my program, I had an operation like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-cpp&quot;&gt;memset(mul_tag, 1, sizeof(mul_tag));
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Unsurprisingly, the program resulted in a WA (Wrong Answer). I spent a very, very long time debugging. This line looked completely harmless, didn&apos;t it? But as it turned out, simply changing this line fixed the program! Why??? The answer becomes clear when we look at the &lt;code&gt;memset&lt;/code&gt; function prototype.&lt;/p&gt;
&lt;h2&gt;1. &lt;code&gt;memset&lt;/code&gt; Function Introduction&lt;/h2&gt;
&lt;p&gt;The prototype for the &lt;code&gt;memset&lt;/code&gt; function is as follows:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-c&quot;&gt;void *memset(void *s, int c, size_t n);
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;s&lt;/code&gt;: A pointer to the block of memory to fill.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;c&lt;/code&gt;: The value to be set. &lt;strong&gt;Note:&lt;/strong&gt; Although &lt;code&gt;c&lt;/code&gt; is of type &lt;code&gt;int&lt;/code&gt;, &lt;code&gt;memset&lt;/code&gt; actually converts &lt;code&gt;c&lt;/code&gt; to an &lt;code&gt;unsigned char&lt;/code&gt; before filling.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;n&lt;/code&gt;: The number of bytes to be set to the value.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The purpose of &lt;code&gt;memset&lt;/code&gt; is to set the first &lt;code&gt;n&lt;/code&gt; bytes of the memory block pointed to by &lt;code&gt;s&lt;/code&gt; to the value specified by &lt;code&gt;c&lt;/code&gt;.&lt;/p&gt;
&lt;h2&gt;2. The Trap&lt;/h2&gt;
&lt;p&gt;&lt;code&gt;memset&lt;/code&gt; performs its filling operation &lt;strong&gt;byte by byte&lt;/strong&gt;. When &lt;code&gt;a&lt;/code&gt; is an &lt;code&gt;int&lt;/code&gt; array (assuming &lt;code&gt;int&lt;/code&gt; occupies 4 bytes), &lt;code&gt;memset(a, 1, sizeof(a))&lt;/code&gt; will set &lt;em&gt;each byte&lt;/em&gt; of &lt;em&gt;each &lt;code&gt;int&lt;/code&gt; element&lt;/em&gt; to &lt;code&gt;1&lt;/code&gt;. This results in each &lt;code&gt;int&lt;/code&gt; element having the value &lt;code&gt;0x01010101&lt;/code&gt;, which is &lt;code&gt;16843009&lt;/code&gt; in decimal, not the &lt;code&gt;1&lt;/code&gt; we were hoping for.&lt;/p&gt;
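&lt;p&gt;To see where that number comes from, just expand the four bytes:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;0x01010101 = 1*2^24 + 1*2^16 + 1*2^8 + 1
           = 16777216 + 65536 + 256 + 1
           = 16843009
&lt;/code&gt;&lt;/pre&gt;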
&lt;h2&gt;3. Exceptions&lt;/h2&gt;
&lt;p&gt;Using &lt;code&gt;memset(a, 1, sizeof(a))&lt;/code&gt; is dangerous in most scenarios. However, there are a few exceptional cases where it works as expected or is safe:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;If &lt;code&gt;a&lt;/code&gt; is a &lt;code&gt;char&lt;/code&gt; array, &lt;code&gt;memset(a, 1, sizeof(a))&lt;/code&gt; is correct because the &lt;code&gt;char&lt;/code&gt; type occupies only one byte.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;memset(a, 0, sizeof(a))&lt;/code&gt; can be safely used for arrays of any type to initialize the entire array to 0. (This is what we typically do! And it&apos;s precisely why I initially thought &lt;code&gt;memset(a, 1, sizeof(a))&lt;/code&gt; would be fine!)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;memset(a, -1, sizeof(a))&lt;/code&gt; is safe for &lt;code&gt;int&lt;/code&gt; arrays and will correctly initialize the elements to -1. Why? Hint: Computers store negative numbers using two&apos;s complement representation. The two&apos;s complement of -1 (for a 32-bit int) is &lt;code&gt;11111111 11111111 11111111 11111111&lt;/code&gt;, which means every byte is &lt;code&gt;0xFF&lt;/code&gt;. Therefore, &lt;code&gt;memset(a, -1, sizeof(a))&lt;/code&gt; fills every byte with &lt;code&gt;0xFF&lt;/code&gt;, effectively setting each &lt;code&gt;int&lt;/code&gt; element to -1.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;4. You Should Use &lt;code&gt;std::fill&lt;/code&gt;&lt;/h2&gt;
&lt;p&gt;For initializations other than 0 and -1 (especially in C++), you should use &lt;code&gt;std::fill&lt;/code&gt; instead of &lt;code&gt;memset&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;std::fill&lt;/code&gt; Example (C++):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-cpp&quot;&gt;#include &amp;#x3C;algorithm&gt;
#include &amp;#x3C;array&gt; // Or use raw arrays

std::array&amp;#x3C;int, 10&gt; a;  // Or: int a[10];
std::fill(a.begin(), a.end(), 1); // Or: std::fill(a, a + 10, 1);
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;std::fill&lt;/code&gt; operates on elements of the container or array, assigning the specified value (&lt;code&gt;1&lt;/code&gt; in this case) correctly to each element, regardless of its underlying byte representation.&lt;/p&gt;</content:encoded><h:img src="undefined"/><enclosure url="undefined"/></item><item><title>Installing Windows on an IPv6 VPS</title><link>https://20051110.xyz/blog/ipv6-vps-windows</link><guid isPermaLink="true">https://20051110.xyz/blog/ipv6-vps-windows</guid><description>If you happen to have a cloud server that does not provide Windows images, you might want to try installing Windows yourself.</description><pubDate>Wed, 15 Jan 2025 14:59:00 GMT</pubDate><content:encoded>&lt;p&gt;If you happen to have a high-configuration cloud server (like my Afly Black Friday VPS) that doesn&apos;t provide Windows images, you might want to try installing Windows yourself using the DD method.&lt;/p&gt;
&lt;h2&gt;What is DD System Installation?&lt;/h2&gt;
&lt;p&gt;As the name suggests, DD system installation uses the dd command to transfer a vhd file to a specific partition, then configures boot files to make it bootable. As scripts have evolved, many features have been added (like installation from img or iso images, system rescue). However, this isn&apos;t the main focus - this tutorial aims to cover the pitfalls I encountered while using such scripts, and how to solve them.&lt;/p&gt;
&lt;h2&gt;My Environment Configuration&lt;/h2&gt;
&lt;p&gt;First, let me introduce my environment (these configurations might seem unusual, but these specific characteristics led to some interesting problems):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;CPU: 3 Core AMD Ryzen 9 9950X&lt;/li&gt;
&lt;li&gt;RAM: 4.5GB&lt;/li&gt;
&lt;li&gt;SSD: 125GB&lt;/li&gt;
&lt;li&gt;Network: IPv6 /128 Only (Yes, pure IPv6 environment with no IPv4 access! And only a /128 IPv6 allocation, which becomes important later)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Preparation&lt;/h2&gt;
&lt;h3&gt;Script Used&lt;/h3&gt;
&lt;p&gt;I chose this script: &lt;a href=&quot;https://github.com/bin456789/reinstall&quot;&gt;https://github.com/bin456789/reinstall&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I strongly recommend carefully reading the README first, as the repository contains detailed instructions on how to use the script.&lt;/p&gt;
&lt;h3&gt;System Image Selection&lt;/h3&gt;
&lt;p&gt;I used an image from TeddySun&apos;s collection, which you can find by searching &lt;a href=&quot;https://teddysun.com/?s=DD&quot;&gt;https://teddysun.com/?s=DD&lt;/a&gt; to find your preferred image. I selected Windows 10 LTSC because it&apos;s relatively clean.&lt;/p&gt;
&lt;h3&gt;Quick Installation Commands&lt;/h3&gt;
&lt;p&gt;If you&apos;re in a hurry, here are the basic installation commands:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;# Download the script
curl -O https://raw.githubusercontent.com/bin456789/reinstall/main/reinstall.sh || wget -O reinstall.sh https://raw.githubusercontent.com/bin456789/reinstall/main/reinstall.sh

# Execute the installation
bash reinstall.sh dd --img https://dl.lamp.sh/vhd/zh-cn_windows10_ltsc.xz
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Remember to install curl beforehand (if your system doesn&apos;t have it)&lt;/p&gt;
&lt;h3&gt;First Issue: Incorrect DNS Configuration&lt;/h3&gt;
&lt;p&gt;This problem was mainly caused by my specific network environment. The DNS configuration in the script&apos;s Alpine environment was incorrect, preventing files from being downloaded. Here&apos;s my solution:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;#!/bin/sh

# Modify /etc/resolv.conf file
echo &quot;nameserver 2001:4860:4860::8888&quot; &gt; /etc/resolv.conf
echo &quot;nameserver 2001:4860:4860::8844&quot; &gt;&gt; /etc/resolv.conf

if [ -f /etc/systemd/resolved.conf ]; then
    echo &quot;[Resolve]&quot; &gt;&gt; /etc/systemd/resolved.conf
    echo &quot;DNS=2001:4860:4860::8888&quot; &gt;&gt; /etc/systemd/resolved.conf
    echo &quot;DNS=2001:4860:4860::8844&quot; &gt;&gt; /etc/systemd/resolved.conf
    systemctl restart systemd-resolved
fi

echo &quot;DNS successfully changed to Google IPv6 DNS&quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Of course, if you have a normal dual-stack environment, you probably won&apos;t encounter this issue.&lt;/p&gt;
&lt;h3&gt;Second Issue: Password Setup&lt;/h3&gt;
&lt;p&gt;I found this particularly interesting: when the script first runs, it prompts you to enter a password, but this password is not the one you&apos;ll use to log into Windows! Despite the script&apos;s README mentioning this, I missed it.&lt;/p&gt;
&lt;p&gt;In fact, the Windows login password is determined by the image. For TeddySun&apos;s image that I used:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Username: Administrator&lt;/li&gt;
&lt;li&gt;Password: Teddysun.com&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Third Issue: Windows IPv6 Privacy Protection&lt;/h3&gt;
&lt;p&gt;This problem puzzled me for a long time. If you run &lt;code&gt;ipconfig /all&lt;/code&gt; on a Windows computer, you might notice something called &quot;temporary address.&quot; This is because Windows &quot;protects your online privacy,&quot; but in my environment with only a /128 IPv6 allocation, this became a problem: external access to your machine is through that fixed IP address, but your machine accesses external websites using a temporary address. This means you can connect via Remote Desktop but can&apos;t access the internet.&lt;/p&gt;
&lt;p&gt;The solution is simple - open Command Prompt as administrator:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-cmd&quot;&gt;netsh interface ipv6 set privacy state=disabled
:: Then restart the network adapter
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;Fourth Issue: Workarounds for Pure IPv6 Environment&lt;/h3&gt;
&lt;p&gt;This issue also stems from my special network environment. Not having IPv4 access is quite inconvenient, so I used Cloudflare WARP to provide IPv4 access. However, note that if you directly use the Windows version of WARP, after enabling it, your IPv6 address will also change to WARP&apos;s address, preventing you from connecting to Remote Desktop!&lt;/p&gt;
&lt;p&gt;I used a solution provided by a user on the Nodeseek forum (&lt;a href=&quot;https://www.nodeseek.com/post-128008-1&quot;&gt;original post&lt;/a&gt;):&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Download and install the official CloudFlare WARP client&lt;/li&gt;
&lt;li&gt;In WARP settings:
&lt;ul&gt;
&lt;li&gt;Click the gear icon in the bottom right → Preferences&lt;/li&gt;
&lt;li&gt;Advanced → Configure Proxy Mode&lt;/li&gt;
&lt;li&gt;Enable proxy mode and set a memorable port&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This effectively gives you a locally available Cloudflare-provided IPv4 exit socks proxy, which you can use however you like - with SwitchyOmega or other tools, configure as you prefer. This way, you can maintain Remote Desktop connections while gaining IPv4 access.&lt;/p&gt;
&lt;h3&gt;Fifth Issue: LTSC Minor Problem&lt;/h3&gt;
&lt;p&gt;If you chose the LTSC 2021 version of Windows like I did, you might notice that the &lt;code&gt;wsappx&lt;/code&gt; service is always running in the background. This issue has a solution on the PCbeta forum; if you&apos;re interested, check out this post: &lt;a href=&quot;https://bbs.pcbeta.com/viewthread-1912382-1-1.html&quot;&gt;LTSC Optimization Guide&lt;/a&gt;&lt;/p&gt;</content:encoded><h:img src="undefined"/><enclosure url="undefined"/></item><item><title>Caddy Configuration for Typecho — Revisited</title><link>https://20051110.xyz/blog/caddy-typecho</link><guid isPermaLink="true">https://20051110.xyz/blog/caddy-typecho</guid><description>I decided to stop using pre-built Docker images and instead manually configure PHP + Caddy + Typecho.</description><pubDate>Tue, 14 Jan 2025 17:12:00 GMT</pubDate><content:encoded>&lt;p&gt;It seems the very first post on this blog showed how to set up Caddy, but at that time I used someone else&apos;s Docker image which bundled nginx, PHP, and Typecho, and I simply reverse-proxied Caddy to that port.&lt;/p&gt;
&lt;p&gt;Now I rented a server on Alibaba Cloud with only 512MB RAM, which is a bit tight. To avoid the extra memory overhead of nginx and Docker, I decided to hand-build the Typecho environment.&lt;/p&gt;
&lt;h2&gt;Install the world&apos;s best programming language&lt;/h2&gt;
&lt;h3&gt;Add the Sury PPA repository&lt;/h3&gt;
&lt;p&gt;First, add the PPA that contains the latest PHP packages. You need to install some prerequisite packages.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;sudo apt update
sudo apt install lsb-release apt-transport-https ca-certificates software-properties-common -y
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;After installing the tools, import the Sury GPG key. Sury provides almost every PHP version. Typecho requires PHP &gt; 7.4, so we&apos;ll install 8.2.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;sudo wget -O /etc/apt/trusted.gpg.d/php.gpg https://packages.sury.org/php/apt.gpg
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then add the repository to your sources list.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;sudo sh -c &apos;echo &quot;deb https://packages.sury.org/php/ $(lsb_release -sc) main&quot; &gt; /etc/apt/sources.list.d/php.list&apos;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Update the package list to verify it&apos;s working.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;sudo apt update
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;Install PHP 8.2 packages&lt;/h3&gt;
&lt;p&gt;Install PHP 8.2 and common extensions.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;sudo apt install php8.2 php8.2-cli php8.2-fpm php8.2-mysql php8.2-curl php8.2-gd php8.2-mbstring php8.2-xml php8.2-zip php8.2-opcache php8.2-sqlite3 -y
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Install Caddy v2&lt;/h2&gt;
&lt;p&gt;I found many guides using Caddy v1 plus special rewrite rules, but one important upgrade in Caddy v2 is that you don&apos;t need extra rewrite rules for typical setups — v2 is simply more convenient. Don&apos;t try to force old patterns.&lt;/p&gt;
&lt;p&gt;Caddy provides an official script; this is what I recommend:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;sudo apt install -y debian-keyring debian-archive-keyring apt-transport-https
curl -1sLf &apos;https://dl.cloudsmith.io/public/caddy/stable/gpg.key&apos; | sudo gpg --dearmor -o /usr/share/keyrings/caddy-stable-archive-keyring.gpg
curl -1sLf &apos;https://dl.cloudsmith.io/public/caddy/stable/debian.deb.txt&apos; | sudo tee /etc/apt/sources.list.d/caddy-stable.list
sudo apt update
sudo apt install caddy
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If you need extra plugins (for example, DNS providers), you can use xcaddy to build your own binary, but that&apos;s out of scope for this post.&lt;/p&gt;
&lt;h2&gt;Configure the Caddyfile&lt;/h2&gt;
&lt;p&gt;Again: I prefer you learn to use the Caddyfile rather than the JSON config.&lt;/p&gt;
&lt;p&gt;The following Caddyfile has been tested — paste and use it as-is:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;YOUR WEBSITE {
            encode gzip
            log
            tls YOUR EMAIL
            header Strict-Transport-Security max-age=31536000
            root * /var/www/YOUR WEBSITE
            php_fastcgi unix//run/php/php8.2-fpm.sock
            file_server
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Do you now appreciate Caddy v2&apos;s convenience? You don&apos;t need to configure php-fpm details or rewrite rules — it&apos;s basically ready out of the box.&lt;/p&gt;
&lt;p&gt;Everything is self-explanatory, but remember to replace &lt;code&gt;YOUR WEBSITE&lt;/code&gt; and &lt;code&gt;YOUR EMAIL&lt;/code&gt; with your actual domain and email address. If you want to read more about Caddyfile options, check the &lt;a href=&quot;https://caddyserver.com/docs/caddyfile&quot;&gt;official documentation&lt;/a&gt;, &quot;it is amazingly easy to read&quot; (my peer who studies Medical Sciences has said so).&lt;/p&gt;
&lt;h2&gt;Final step: add your Typecho site files&lt;/h2&gt;
&lt;p&gt;Download the latest Typecho release with wget:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;wget https://github.com/typecho/typecho/releases/latest/download/typecho.zip
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Unzip Typecho into /var/www:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Make sure /var/www exists:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;sudo mkdir -p /var/www
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then create your site directory; for example, if your site is 20051110.xyz, create:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;sudo mkdir /var/www/20051110.xyz
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Unzip typecho.zip into /var/www/your-site-directory:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;cd /var/www/your-site-directory
sudo unzip /root/typecho.zip # remember to replace with the actual download location of Typecho
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Final step: change file ownership of /var/www and its subdirectories to the www-data user and group (the user typically used by web servers):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;sudo chown -R www-data:www-data /var/www
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You can check that your directory structure looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;/var/www/your-site
├── admin/
├── install/
├── usr/
├── var/
├── index.php
├── install.php
└── ...
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If everything is correct, start Caddy:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;caddy run --config=Caddyfile # I ran this because my Caddyfile is in the same directory
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Caddy will automatically obtain certificates for you. Once that&apos;s done, visit the site and proceed with the Typecho installation (don&apos;t worry — it&apos;s a GUI, just click through).&lt;/p&gt;
&lt;p&gt;Thanks for reading! Hope this post helps.&lt;/p&gt;</content:encoded><h:img src="undefined"/><enclosure url="undefined"/></item><item><title>A Few Things About OpenWRT Compilation</title><link>https://20051110.xyz/blog/openwrt-compile</link><guid isPermaLink="true">https://20051110.xyz/blog/openwrt-compile</guid><description>I encountered some issues during OpenWRT compilation that are worth documenting.</description><pubDate>Sun, 20 Oct 2024 01:05:00 GMT</pubDate><content:encoded>&lt;p&gt;Let me answer a few questions first&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Why compile it myself? I&apos;m a mature computer science student (lol).&lt;/li&gt;
&lt;li&gt;Why use Github Actions? Because Github is really convenient. &lt;del&gt;I originally wanted to use my dedicated AMD 9950X as the build machine, but it failed after 15 minutes and I was too lazy to troubleshoot.&lt;/del&gt;&lt;/li&gt;
&lt;li&gt;Why can&apos;t I understand this? &lt;del&gt;If you don&apos;t understand, don&apos;t read it.&lt;/del&gt; Just download pre-compiled packages from others.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The following is based on the latest OpenWRT (23.05) + Github Actions online compilation&lt;/p&gt;
&lt;h2&gt;1. Preparation&lt;/h2&gt;
&lt;h3&gt;Clone the repository locally&lt;/h3&gt;
&lt;p&gt;First, you need to fork the LEDE source code: the &lt;a href=&quot;https://github.com/coolsnowwolf/lede&quot;&gt;LEDE repository on Github&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Clone the repository you just forked to your local machine:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;git clone https://github.com/your-username/lede
&lt;/code&gt;&lt;/pre&gt;
&lt;blockquote&gt;
&lt;p&gt;Don&apos;t download ZIP! The ZIP file is not a Git repository and doesn&apos;t contain the &lt;code&gt;.git&lt;/code&gt; folder, so you can&apos;t use &lt;code&gt;git&lt;/code&gt; commands on it.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;Update Feeds&lt;/h3&gt;
&lt;pre&gt;&lt;code&gt;cd lede
./scripts/feeds update -a
./scripts/feeds install -a
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If you don&apos;t update the feeds, you won&apos;t see Luci apps later! &lt;strong&gt;This step is mandatory!&lt;/strong&gt;&lt;/p&gt;
&lt;h3&gt;Enter the configuration menu&lt;/h3&gt;
&lt;p&gt;Use the following command to enter the configuration menu:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;make menuconfig
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;Configuration menu explanation&lt;/h3&gt;
&lt;p&gt;Generally, you only need to modify these:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Target System: Processor architecture&lt;/li&gt;
&lt;li&gt;Subtarget: Select processor&lt;/li&gt;
&lt;li&gt;Target Profile: Preconfigured profile&lt;/li&gt;
&lt;li&gt;LuCI: LuCI plugins
&lt;ul&gt;
&lt;li&gt;Applications: Applications&lt;/li&gt;
&lt;li&gt;Themes: Themes&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For example, I selected:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Target System: Mediatek-ARM&lt;/li&gt;
&lt;li&gt;Subtarget: Filogic&lt;/li&gt;
&lt;li&gt;Target Profile: ASR3000&lt;/li&gt;
&lt;li&gt;LuCI: LuCI plugins
&lt;ul&gt;
&lt;li&gt;Applications: Many fun plugins for you to explore!&lt;/li&gt;
&lt;li&gt;Themes: luci-theme-material&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;After making changes, select &lt;code&gt;Save&lt;/code&gt; to save as a &lt;code&gt;.config&lt;/code&gt; file.&lt;/p&gt;
&lt;p&gt;For LuCI plugins, please refer to &lt;a href=&quot;https://www.right.com.cn/forum/thread-3682029-1-1.html&quot;&gt;this article on the Enshan forum&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;Commit to your forked repository&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Delete the &lt;code&gt;/.config&lt;/code&gt; line in the &lt;code&gt;.gitignore&lt;/code&gt; file to stop ignoring the config file.&lt;/strong&gt; Very important!!! Otherwise, the &lt;code&gt;.config&lt;/code&gt; file won&apos;t be included when you &lt;code&gt;commit&lt;/code&gt;!!!&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Commit changes to GitHub:&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;git add .
git commit -m &quot;upd: personal config&quot;
git push origin master
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;del&gt;The branch is called master, which has a master-servant flavor to it.&lt;/del&gt;&lt;/p&gt;
&lt;h2&gt;Pitfalls&lt;/h2&gt;
&lt;h3&gt;Enable WIFI before compilation&lt;/h3&gt;
&lt;p&gt;If you need to enable WIFI by default for easy management, I searched many tutorials online but they were useless, mostly from around 2015 with no reference value. I figured it out myself:
Go to the &lt;code&gt;package/lean/default-settings/files/&lt;/code&gt; directory, edit the file &lt;code&gt;zzz-default-settings&lt;/code&gt;
Comment out these two lines by adding &lt;code&gt;#&lt;/code&gt; at the beginning:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-sh&quot;&gt;sed -i &apos;/option disabled/d&apos; /etc/config/wireless
sed -i &apos;/set wireless.radio${devidx}.disabled/d&apos; /lib/wifi/mac80211.sh
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;Github Actions Compilation&lt;/h3&gt;
&lt;p&gt;Online guides still say &quot;submit a Release and it will automatically trigger Github Actions&quot; but that didn&apos;t work for me, so I needed to make some changes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;When using Github Actions for compilation, remember to go to the Workflow page and enable the Workflow, and also enable OpenWrt-CI (because Workflows in forked repositories are disabled by default)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Also modify the repository&apos;s &lt;code&gt;.github/workflows/openwrt-ci.yml&lt;/code&gt;, changing the &lt;code&gt;cron&lt;/code&gt; task at the beginning (line 10) to the following to allow manual workflow triggering:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;on:
  repository_dispatch:
  workflow_dispatch:
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It will take about two hours, &lt;del&gt;but what does that have to do with me since I&apos;m using Github&apos;s resources&lt;/del&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Modifying various miscellaneous settings&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Change the default theme:
&lt;pre&gt;&lt;code&gt;sed -i &quot;s/luci-theme-bootstrap/luci-theme-material/g&quot; feeds/luci/collections/luci/Makefile
&lt;/code&gt;&lt;/pre&gt;
(Nowadays people&apos;s aesthetics seem to prefer the argon theme; in any case, this should match what you installed in the &lt;code&gt;luci-themes&lt;/code&gt; section of your &lt;code&gt;.config&lt;/code&gt;.)
&lt;/li&gt;
&lt;li&gt;Add compiler information:
&lt;pre&gt;&lt;code&gt;sed -i &quot;s/OpenWrt /TheUnknownThing build $(TZ=UTC-8 date &quot;+%Y.%m.%d&quot;) @ OpenWrt /g&quot; package/lean/default-settings/files/zzz-default-settings
&lt;/code&gt;&lt;/pre&gt;
You probably don&apos;t want to keep &quot;TheUnknownThing&quot; as your builder name; change it to something else.
&lt;/li&gt;
&lt;li&gt;Modify the default management address:
The default management address is &lt;code&gt;192.168.1.1&lt;/code&gt;; if it conflicts with your upstream network segment, you can modify it:
&lt;pre&gt;&lt;code&gt;sed -i &apos;s/192.168.1.1/192.168.2.1/g&apos; package/base-files/files/bin/config_generate
&lt;/code&gt;&lt;/pre&gt;
This changes it to &lt;code&gt;192.168.2.1&lt;/code&gt;.
&lt;/li&gt;
&lt;/ul&gt;</content:encoded><h:img src="undefined"/><enclosure url="undefined"/></item><item><title>How to Elegantly Annotate PDFs with LaTeX</title><link>https://20051110.xyz/blog/latex-annotation-1</link><guid isPermaLink="true">https://20051110.xyz/blog/latex-annotation-1</guid><description>My professor shared a PDF Beamer file, and I want to add mathematical annotations to it. How can I do this elegantly?</description><pubDate>Wed, 09 Oct 2024 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Let me cut to the chase—it&apos;s getting late and I need some sleep!&lt;/p&gt;
&lt;p&gt;The inspiration for this solution comes from &lt;a href=&quot;https://tex.stackexchange.com/questions/85651/is-there-are-way-to-annotate-pdfs-with-latex#:~:text=Okular%20can%20annotate%20PDFs%20nicely,%20e.g.&quot;&gt;Stackexchange&lt;/a&gt;. I&apos;ve tried several annotation software options before, but I really want to bring only my iPad to class. Using VNC or xrdp to remotely access Linux just feels clunky and has terrible latency. So I&apos;ve settled on using Overleaf + LaTeX with PDF page inclusion for annotations.&lt;/p&gt;
&lt;p&gt;The Stackexchange author provided this solution:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;\documentclass{article}
%\url{http://tex.stackexchange.com/q/85651/86}
\usepackage[svgnames]{xcolor}
\usepackage{pdfpages}
\usepackage{tikz}

\tikzset{
  every node/.style={
    anchor=mid west,
  }
}

\makeatletter
\pgfkeys{/form field/.code 2 args={\expandafter\global\expandafter\def\csname field@#1\expandafter\endcsname\expandafter{#2}}}

\newcommand{\place}[3][]{\node[#1] at (#2) {\csname field@#3\endcsname};}
\makeatother
\newcommand{\xmark}[1]{\node at (#1) {X};}

\begin{document}

\foreach \mykey/\myvalue in {
  ctsfn/{Defined in Week 1},
  metsp/{Defined in Week 3},
} {
  \pgfkeys{/form field={\mykey}{\myvalue}}
}

\includepdf[
  pages=1,
  picturecommand={%
    \begin{tikzpicture}[remember picture,overlay]
%%% The next lines draw a useful grid - get rid of them (comment them out) on the final version
    \draw[gray] (current page.south west) grid (current page.north east);
\foreach \k in {1,...,28} {
      \path (current page.south east) ++(-2,\k) node {\k};
}
\foreach \k in {1,...,20} {
      \path (current page.south west) ++(\k,2) node {\k};
}
%%% grid code ends here
\tikzset{every node/.append style={fill=Honeydew,font=\large}}
\place[name=ctsfn]{14cm,17cm}{ctsfn}
\place[name=metsp]{11cm,9cm}{metsp}
\draw[ultra thick,blue,-&gt;] (ctsfn) to[out=135,in=90] (9cm,17.3cm);
\draw[ultra thick,blue,-&gt;] (metsp) to[out=155,in=70] (6cm,9cm);
    \end{tikzpicture}
  }
]{tikzmark_example.pdf}

\end{document}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The original author&apos;s result:
&lt;img src=&quot;https://20051110.xyz/_astro/Z7cqd.DiFwxJ0H_Z4hInz.webp&quot; alt=&quot;Original author&amp;#x27;s result&quot;&gt;&lt;/p&gt;
&lt;p&gt;This immediately caught my eye because:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;It has a coordinate grid, making annotation placement super convenient&lt;/li&gt;
&lt;li&gt;It&apos;s highly extensible—you can mix text and graphics, insert TikZ diagrams, mathematical formulas, you name it!&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;However, there were several issues to address:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The professor&apos;s Beamer slides are in landscape format, but this code produces portrait output&lt;/li&gt;
&lt;li&gt;The macro definitions are somewhat messy, and I don&apos;t need fancy connecting lines. Plus, the &lt;code&gt;includepdf&lt;/code&gt; call is too verbose and inelegant for repeated use&lt;/li&gt;
&lt;li&gt;The coordinate grid looks pretty ugly&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So here&apos;s how I solved these problems:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Fix the orientation&lt;/strong&gt;: Use &lt;code&gt;\usepackage[paperwidth=12cm, paperheight=16cm, landscape]{geometry}&lt;/code&gt; to make it landscape format.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Create a clean macro&lt;/strong&gt; to simplify &lt;code&gt;includepdf&lt;/code&gt; usage and support multiple annotations:&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;pre&gt;&lt;code&gt;\newcommand{\includePDFWithAnnotations}[2]{
\includepdf[
  pages=#1,
  picturecommand={%
    \begin{tikzpicture}[remember picture,overlay]
    %%% The next lines draw a useful grid - get rid of them (comment them out) on the final version
    \draw[very thin, lightgray] (current page.south west) grid (current page.north east);
    \foreach \k in {0,...,11} {
      \path (current page.south east) ++(-0.55,\k + 0.2) node[font=\tiny] {\k};
    }
    \foreach \k in {0,...,14} {
      \path (current page.south west) ++(\k,0.2) node[font=\tiny] {\k};
    }
    %%% grid code ends here
    \tikzset{every node/.append style={fill=Honeydew,font=\huge}}
    % Iterate through annotation list and place annotations
    #2
    \end{tikzpicture}
  }
]{YOUR PDF NAME.pdf}
}
&lt;/code&gt;&lt;/pre&gt;
&lt;ol start=&quot;3&quot;&gt;
&lt;li&gt;&lt;strong&gt;Use the macro elegantly&lt;/strong&gt; to insert multiple annotations:&lt;/li&gt;
&lt;/ol&gt;
&lt;pre&gt;&lt;code&gt;\includePDFWithAnnotations{1}{
\place{5, 4}{$123avd$}
\place{7, 8}{$456xyz$}
}

\includePDFWithAnnotations{7}{
\place{5, 4}{$123avd$}
\place{7, 8}{$456xyz$}
}
&lt;/code&gt;&lt;/pre&gt;
&lt;ol start=&quot;4&quot;&gt;
&lt;li&gt;&lt;strong&gt;Improve the aesthetics&lt;/strong&gt;: Move the coordinate grid to the page edges, use tiny font size, make the lines thinner and lighter colored. Much more visually appealing!&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;img src=&quot;https://20051110.xyz/_astro/1.Bn-d4_uE_ZkrUiy.webp&quot; alt=&quot;Final result&quot;&gt;&lt;/p&gt;
&lt;p&gt;Isn&apos;t that satisfying?&lt;/p&gt;
&lt;p&gt;Here&apos;s the complete TeX example:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;\documentclass[UTF8]{ctexart}
\usepackage[svgnames]{xcolor}
\usepackage[paperwidth=12cm, paperheight=16cm, landscape]{geometry}
\usepackage{pdfpages}
\usepackage{tikz}
\usepackage{amsmath,amsfonts,amssymb,amsthm}

\tikzset{
  every node/.style={
    anchor=mid west,
  }
}

\makeatletter
\pgfkeys{/form field/.code 2 args={\expandafter\global\expandafter\def\csname field@#1\expandafter\endcsname\expandafter{#2}}}

\newcommand{\place}[2]{\node at (#1) {\large #2};}
\makeatother

\newcommand{\xmark}[1]{\node at (#1) {X};}

\newcommand{\NotePage}[2]{
  \includepdf[
    pages=#1,
    picturecommand={%
      \begin{tikzpicture}[remember picture,overlay]
      %%% The next lines draw a useful grid - get rid of them (comment them out) on the final version
      \draw[very thin, lightgray] (current page.south west) grid (current page.north east);
      \foreach \k in {0,...,11} {
        \path (current page.south east) ++(-0.45,\k + 0.2) node[font=\tiny] {\k};
      }
      \foreach \k in {0,...,14} {
        \path (current page.south west) ++(\k,0.2) node[font=\tiny] {\k};
      }
      \place{0,11.25}{Page #1}
      %%% grid code ends here
      \tikzset{every node/.append style={fill=Honeydew,font=\huge}}
      #2
      \end{tikzpicture}
    }
  ]{LA14.pdf}
}

\begin{document}

\NotePage{1}{
  \place{1,4.5}{That is because $\det{A} = \det{A^\top}$}
}
\NotePage{2}{}

\end{document}
&lt;/code&gt;&lt;/pre&gt;</content:encoded><h:img src="undefined"/><enclosure url="undefined"/></item><item><title>Use lsyncd for Real-Time File Synchronization</title><link>https://20051110.xyz/blog/rsync-lsyncd</link><guid isPermaLink="true">https://20051110.xyz/blog/rsync-lsyncd</guid><description>The limitations of rsync + inotify forced us to look for better solutions.</description><pubDate>Sun, 06 Oct 2024 10:56:00 GMT</pubDate><content:encoded>&lt;p&gt;Ever since I became an mjj (server hoarder), I’ve accumulated a lot of VPSs—I just can’t resist buying more. But to keep my blog data synchronized across multiple servers, I’ve put in quite a bit of effort. I got tired of using cron to automatically package my blog directory and then manually back it up. Ultimately, it’s laziness—I want a fully automated solution. Since my Typecho blog is deployed via Docker images, I figured I’d go the extra mile and tackle multi-end data synchronization, so that any change on one site is reflected on all sites.&lt;/p&gt;
&lt;p&gt;When it comes to synchronizing files between multiple servers, rsync is a commonly used tool. It achieves efficient directory synchronization through incremental transfers, compression, and deletion operations. However, the classic working mode of rsync is “manual or scheduled trigger,” which falls short for scenarios requiring real-time synchronization.&lt;/p&gt;
&lt;h2&gt;How rsync Works&lt;/h2&gt;
&lt;p&gt;rsync compares the differences between the source and target directories and only transfers changed files or data blocks, reducing bandwidth usage. This method is ideal for backing up and synchronizing large amounts of data, especially in bandwidth-constrained environments. However, rsync usually needs to be triggered manually or via scheduled tasks (like cron). For applications that require real-time updates, this approach leads to data lag and resource waste.&lt;/p&gt;
&lt;h2&gt;The Shortcomings of rsync + inotify&lt;/h2&gt;
&lt;p&gt;To address real-time synchronization, you can use inotify to monitor file system changes and trigger rsync when changes occur. However, this approach has several obvious drawbacks:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;inotify requires additional scripts to work with rsync, increasing system complexity.&lt;/li&gt;
&lt;li&gt;This solution is usually one-way and cannot achieve multi-source real-time synchronization, which goes against my goals.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;The Advantages of lsyncd&lt;/h2&gt;
&lt;p&gt;To solve the above problems, lsyncd combines inotify’s real-time monitoring with rsync’s efficient transfer capabilities, providing a simple yet powerful solution for real-time synchronization. The advantages of lsyncd include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;lsyncd can handle complex real-time synchronization tasks with a simple configuration file, eliminating the need for extra scripts.&lt;/li&gt;
&lt;li&gt;It supports one-way synchronization between multiple servers, ensuring that data on every server is up to date. &lt;em&gt;Note: lsyncd does not natively support true bidirectional or multi-master sync with conflict resolution. But it doesn&apos;t matter in my case because I only need one-way sync.&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Step-by-Step Guide to Configuring lsyncd&lt;/h2&gt;
&lt;p&gt;Here’s how to use lsyncd for real-time synchronization:&lt;/p&gt;
&lt;h3&gt;Install lsyncd and rsync:&lt;/h3&gt;
&lt;p&gt;On all servers involved in synchronization, run the following command to install the necessary tools:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;sudo apt-get install lsyncd rsync
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;Configure lsyncd:&lt;/h3&gt;
&lt;p&gt;On each server, create the configuration file &lt;code&gt;/etc/lsyncd.conf&lt;/code&gt; with the following content:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-lua&quot;&gt;settings {
    logfile = &quot;/var/log/lsyncd/lsyncd.log&quot;,
    statusFile = &quot;/var/log/lsyncd/lsyncd.status&quot;,
    inotifyMode  = &quot;CloseWrite or Modify&quot;,
    maxProcesses = 1,
    -- nodaemon = true,
}

sync {
    default.rsyncssh,
    source = &quot;/var/www&quot;,
    targetdir = &quot;/var/www&quot;,
    host = &quot;45.*.*.*&quot;,
    delete = true,
    rsync = {
        binary = &quot;/usr/bin/rsync&quot;,
        archive = true,
        compress = true,
        verbose = true,
    },
    delay = 1,
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Explanation:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;source&lt;/code&gt;: The local directory to monitor, &lt;code&gt;/var/www/&lt;/code&gt; (replace with your own).&lt;/li&gt;
&lt;li&gt;&lt;code&gt;host&lt;/code&gt;: The remote target server (excluding itself).&lt;/li&gt;
&lt;li&gt;&lt;code&gt;targetdir&lt;/code&gt;: The remote target directory, &lt;code&gt;/var/www/&lt;/code&gt; (replace with your own).&lt;/li&gt;
&lt;li&gt;&lt;code&gt;delay&lt;/code&gt;: Sets the synchronization delay (in seconds) to prevent excessive syncing during frequent changes.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;delete&lt;/code&gt;: Deletes files on the target server that have been deleted on the source server.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Note: When using &lt;code&gt;rsyncssh&lt;/code&gt;, &lt;code&gt;maxProcesses&lt;/code&gt; must be 1. If using &lt;code&gt;rsync&lt;/code&gt;, you can set a higher value (e.g., 5).&lt;/p&gt;
&lt;p&gt;Tip: For troubleshooting, it’s recommended to start with &lt;code&gt;lsyncd /etc/lsyncd.conf&lt;/code&gt; to check for errors. Also, make sure to create the log directory first: &lt;code&gt;mkdir -p /var/log/lsyncd&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;One more thing: To allow servers to log in to each other without a password, you need to set up SSH key-based authentication.&lt;/p&gt;
&lt;p&gt;To automate real-time synchronization, ensure the source server can log in to the target server via SSH without a password.&lt;/p&gt;
&lt;p&gt;On the source server, generate an SSH key:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;ssh-keygen -t ed25519
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Follow the prompts; usually, you don’t set a passphrase.&lt;/p&gt;
&lt;p&gt;Copy the public key to the target server:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;ssh-copy-id user@target-server
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This copies the generated public key to the target server, enabling passwordless login. Note: You need to configure this on both servers if you want mutual access.&lt;/p&gt;
&lt;p&gt;Once configured, start verification.&lt;/p&gt;
&lt;h3&gt;Start lsyncd:&lt;/h3&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;lsyncd /etc/lsyncd.conf
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;Verify the Configuration:&lt;/h3&gt;
&lt;p&gt;Perform file operations in the &lt;code&gt;/var/www/&lt;/code&gt; directory on any server and check synchronization on the others.&lt;/p&gt;
&lt;h2&gt;Handling Conflicts&lt;/h2&gt;
&lt;p&gt;When multiple servers modify the same file at the same time, conflicts may occur. However, my use case probably won’t encounter conflicts, so I’m leaving it as is :D&lt;/p&gt;</content:encoded><h:img src="undefined"/><enclosure url="undefined"/></item><item><title>Disabling Adobe Acrobat&apos;s OCR Feature</title><link>https://20051110.xyz/blog/delete-acrobat-ocr</link><guid isPermaLink="true">https://20051110.xyz/blog/delete-acrobat-ocr</guid><description>Acrobat’s OCR really annoys me. Every time I edit a PDF, it freezes for a moment. So I’m just going to disable it once and for all.</description><pubDate>Wed, 11 Sep 2024 19:11:00 GMT</pubDate><content:encoded>&lt;p&gt;Acrobat’s OCR really annoys me. Every time I edit a PDF, it freezes for a moment—I have to wait for the current page’s OCR to finish before I can turn off automatic text recognition. So now, I’m just going to disable it once and for all. Go to this directory:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;C:\Program Files (x86)\Adobe\Acrobat DC\Acrobat\plug_ins&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Do you see &quot;PaperCapture&quot; there? Just rename it to &quot;PaperCapture_disabled&quot; and you&apos;re done.&lt;/p&gt;</content:encoded><h:img src="undefined"/><enclosure url="undefined"/></item><item><title>方正书版 (Founder BookMaker) 10.0: From Installation to Getting Started</title><link>https://20051110.xyz/blog/founder-book-10</link><guid isPermaLink="true">https://20051110.xyz/blog/founder-book-10</guid><description>方正书版 (Founder BookMaker) 10.0: from installation to getting started</description><pubDate>Sat, 16 Mar 2024 21:05:00 GMT</pubDate><content:encoded>&lt;p&gt;I finally got 方正书版 (Founder BookMaker) 10.0 working after spending a whole afternoon on it today.&lt;/p&gt;
&lt;h2&gt;1. 安装PDF Creator&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;方正PDFCreator 3.0&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;重要提示：请务必在系统装完后第一时间就安装字库和PDF Creator（虽然我不信这个邪，但是确实这样会少很多乱七八糟字体的干扰或者你从另外什么地方安装了字库的干扰）&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;安装顺序如下：&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Install PDFCreator3108;&lt;/li&gt;
&lt;li&gt;Copy the crack files over the files in the installation directory, C:\Program Files\Founder\PDFCreator\Bin;&lt;/li&gt;
&lt;li&gt;Import the registry file (adjust the drive letter and path according to where you installed it). Only import this registry file if you have NOT installed the RIP software; its purpose is to “trick” the system into believing the RIP software is installed;&lt;/li&gt;
&lt;li&gt;First install the CID 5.01 (748_GB) font library, “方正CID V5.00 (full set)”: serial number 000000000, installation password 42C2D35B4735036B, font password 5918347506891A57 (the same for both GBK and GB/748!). Then install the CID 5.0 (GBK) font library: serial number 000000000, installation password ce9d84241294e529, font password 2e4965af7e74ad68. When installing the font libraries, choose “方正世纪RIP” (that option never popped up for me, but the installation still completed fine);&lt;/li&gt;
&lt;li&gt;The font path is C:\Program Files\Founder\PDFCreator\Font; at this point a Fonts directory should be generated at C:\Program Files\Founder\PDFCreator\Fonts. (It may not be generated! In that case you need another file to help out. Installing the two font libraries usually produces a FONTS folder, but on plenty of machines it simply never appears, and then some basic back-end fonts won’t be recognized, so you have to borrow the fonts from PSPNT’s FONTS folder to fill the gap.)&lt;/li&gt;
&lt;li&gt;Open PDFCreator 3108;&lt;/li&gt;
&lt;li&gt;Reset the fonts: the font path is C:\Program Files\Founder\PDFCreator\Resource\CIDfont, and the TrueType font path is C:\WINDOWS\Fonts;&lt;/li&gt;
&lt;li&gt;While PDFCreator is resetting the font library, do not click on it or touch it at all, or it will hang and you’ll have to reboot;&lt;/li&gt;
&lt;li&gt;It reported a little over 1,100 fonts installed successfully for me, and in the end it could output PDFs normally.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;2. Install 书版10.0&lt;/h2&gt;
&lt;p&gt;Not much to say here: after the installation finishes, copy the patched files into the installation directory.&lt;/p&gt;
&lt;p&gt;Note: at this step, install only 书版10.0 itself plus 女娲补字 (Founder’s tool for patching in missing characters), and nothing else, including any of the bundled fonts, to avoid problems later.&lt;/p&gt;
&lt;h2&gt;3. PS Output Settings&lt;/h2&gt;
&lt;p&gt;When outputting PS/EPS from the FBD file, open “Options” in the lower left. My approach here is fairly brute-force: I checked “All installed” for both the back-end 748 font library and the back-end GBK font library. In my testing, as long as your PDFCreator is set up properly, this is enough to output PDFs normally. Before outputting, it’s worth grabbing a known-good PS file from the internet and testing with it first, so you don’t blame PDFCreator for a problem in your own PS file.&lt;/p&gt;
&lt;h2&gt;4. Issues When Converting Word Files to FBD Proofs&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;书版10.0’s built-in doc conversion is known to be broken; don’t use it.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;When using the converter that an expert on the internet developed, the proof output has a few issues, summarized below:&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;If you’re converting Word to FBD with version 6.0 of the tool, leave every MathType formula in the doc file exactly as it is; don’t convert them the way the online tutorials tell you to. If you’re on version 5.6, then do follow the online tutorials.&lt;/p&gt;
&lt;p&gt;After the MathType conversion, math notation such as sin, cos, &lt;code&gt;π&lt;/code&gt;, ln, and so on comes out italic and has to be corrected by hand. I wrote a regex for VSCode you can use as a reference (Ctrl+F find-and-replace, with regex mode enabled). (The replacement text looks garbled here, but what you actually paste in is that circled-z control character.)&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;In the Find box:    (cos|sin|tan|π|lim|ln|i)
In the Replace box: $1
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Likewise, where a radical sign and a character end up crammed together, you need to insert a circled 1/2 before &lt;code&gt;〖KF(〗&lt;/code&gt;, which can also be done with find-and-replace.&lt;/p&gt;
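&lt;p&gt;A sketch of that replacement (the circled character itself doesn’t survive this page’s encoding, so the bracketed placeholder below has to be swapped for the real symbol when you paste it in):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;In the Find box:    〖KF(〗
In the Replace box: [circled 1/2]〖KF(〗
&lt;/code&gt;&lt;/pre&gt;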
&lt;p&gt;For typesetting the options of multiple-choice questions, I wrote a small Python program; you only need to set up the WB for the first question yourself. It does the following:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Adds 〖ZK(〗 at the beginning and 〖ZK)〗 at the end (ZK plus a line break)&lt;/li&gt;
&lt;li&gt;Replaces each paragraph-break symbol, except the first, with 〖DW1〗 through 〖DW3〗&lt;/li&gt;
&lt;/ol&gt;
&lt;pre&gt;&lt;code&gt;import pyperclip

# The paragraph-break symbol from the original post was lost in this feed&apos;s
# encoding; swap the placeholder below for the actual character before running.
BREAK = &apos;&lt;BREAK&gt;&apos;

def transform_text(input_text):
    # Wrap the whole block in 〖ZK(〗 ... 〖ZK)〗
    transformed_text = &apos;〖ZK(〗&apos; + input_text + &apos;〖ZK)〗&apos;

    # Replace every paragraph-break symbol except the first with 〖DW1〗, 〖DW2〗, ...
    parts = transformed_text.split(BREAK)
    transformed_text = BREAK.join(parts[:2])  # keep the first break symbol as-is
    for i, part in enumerate(parts[2:], 1):   # number the remaining breaks
        transformed_text += &apos;〖DW&apos; + str(i) + &apos;〗&apos; + part

    return transformed_text

# Read the source text from the clipboard, transform it, and copy it back
original_text = pyperclip.paste()
transformed_text = transform_text(original_text)

print(transformed_text)
pyperclip.copy(transformed_text)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And with that, go and enjoy the start of your typesetting career!&lt;/p&gt;</content:encoded></item><item><title>Starting to Write Again</title><link>https://20051110.xyz/blog/hello-world</link><guid isPermaLink="true">https://20051110.xyz/blog/hello-world</guid><description>It probably began with a sudden inspiration during the winter break, preparing to take care of my blog again.</description><pubDate>Sat, 09 Mar 2024 22:44:00 GMT</pubDate><content:encoded>&lt;p&gt;It probably began with a sudden inspiration during the winter break: I was getting ready to take care of my blog again.&lt;/p&gt;
&lt;p&gt;The last time I seriously ran a blog was perhaps in middle school. Back then, I thought having a blog on Blogger was cool; having your own space for your thoughts on the internet felt great and trendy. I guess it&apos;s not like that anymore?&lt;/p&gt;
&lt;p&gt;CNBlogs urges everyone to turn off their ad blockers, the atmosphere on CSDN keeps getting worse, and bloggers have turned into vloggers. Why am I starting to take care of my blog again at a time like this?&lt;/p&gt;
&lt;p&gt;I didn&apos;t expect to encounter so many difficulties deploying Typecho... My previous setup was Nginx+MySQL+BaoTa, so you can understand that BaoTa had taken care of everything; all I needed to do was click the deployment button and it was ready to go.&lt;/p&gt;
&lt;p&gt;So maybe the first thing to do to become a &lt;em&gt;True&lt;/em&gt; Blogger was to put all the pieces together myself.&lt;/p&gt;
&lt;p&gt;I dropped Nginx for Caddy2 (isn&apos;t this just asking for trouble? After days of searching I couldn&apos;t find a single working URL-rewrite configuration online!). Of course I thought about giving up (php-fpm was configured fine, MySQL was configured fine, the Caddy rewrite looked fine and the homepage was accessible, yet articles wouldn&apos;t open? The login page loaded but I couldn&apos;t log in?). In the end, the almighty Docker solved the problem, and I think the setup is worth pasting here for backup:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;docker run -d \
--name=typecho-blog \
--restart always \
--mount type=tmpfs,destination=/tmp \
-v /root/Typecho-Files:/data \
-e PHP_TZ=Asia/Shanghai \
-e PHP_MAX_EXECUTION_TIME=600 \
-p 127.0.0.1:9080:80 \
80x86/typecho:latest
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here I bound the container port to 127.0.0.1 only, rather than exposing it publicly, because I planned to use Caddy as a reverse proxy. When using Caddy as a reverse proxy, watch out for the two pitfalls I ran into (initially I could only access the homepage, and clicking on any content redirected me to localhost:9080, which was unreachable; it turned out I hadn&apos;t properly set X-Forwarded-Proto and X-Forwarded-Port):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Check Typecho&apos;s config.inc.php file to ensure that &lt;strong&gt;TYPECHO_SITE_URL&lt;/strong&gt; is set to your public domain.&lt;/li&gt;
&lt;li&gt;In the Caddy configuration, make sure to set the correct X-Forwarded-For and X-Forwarded-Proto headers so Typecho knows the actual request protocol and client IP.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Your Caddy configuration should look like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;YOUR_DOMAIN_GOES_HERE {
    reverse_proxy http://localhost:9080 {
        header_up Host {host}
        header_up X-Forwarded-Host {host}
        header_up X-Forwarded-For {remote_host}
        header_up X-Forwarded-Proto {scheme}
    }
    tls YOUR_EMAIL_GOES_HERE
}
&lt;/code&gt;&lt;/pre&gt;
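&lt;p&gt;Once Caddy has reloaded, a quick sanity check of the proxy (a sketch; substitute your actual domain for the placeholder):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;curl -sI https://YOUR_DOMAIN_GOES_HERE | head -n 1
&lt;/code&gt;&lt;/pre&gt;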
&lt;p&gt;Great, I finally have my own blog again. Hope I can write more in the future.&lt;/p&gt;</content:encoded></item></channel></rss>