Building Pakistan Notice Helper: A Small AI Tool for a Very Local Safety Problem
Team Article, published June 8, 2026
Abid Ali Awan (kingabzpro), build-small-hackathon

For the Hugging Face Build Small Hackathon, I wanted to build something practical, local, and useful beyond a demo. The result is Pakistan Notice Helper, a safety-focused AI tool that helps people in Pakistan understand suspicious messages before they click a link, call a number, share an OTP, or make a payment.

The idea came from a common problem: people regularly receive messages that look like they are from banks, couriers, tax authorities, traffic police, utilities, mobile operators, or government departments. Some are real. Many are scams. The hard part is not always reading the message. The hard part is knowing what to do next.

Pakistan Notice Helper is not an authenticity checker. It does not claim that a message is officially genuine or fraudulent. Instead, it works as a triage tool. It accepts text or a screenshot and returns a risk label, a short explanation, visible red flags, and safe next steps.

Why this fits Build Small

The project fits the Backyard AI track because it focuses on a specific local problem: scam-style notices and suspicious messages in Pakistan. Instead of building a large general-purpose assistant, I wanted to see how far a small model could go when the scope was clear, the product behavior was well-defined, and the interface was designed around real users.

I initially tested a larger Qwen model, but the final production choice became Qwen3.5 4B Q8 through llama.cpp. It passed all high-risk scam cases and both screenshot cases in my ten-case evaluation. That made it a practical choice for a small-model safety assistant.
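
The triage output described above (a risk label, a short explanation, visible red flags, and safe next steps) can be pictured as a small typed structure that the app checks before showing anything to the user. The following is only a minimal sketch under stated assumptions: it assumes the model replies with a JSON object, and the field names, the `ALLOWED_LABELS` set, and the `parse_triage` helper are hypothetical illustrations, not the app's actual schema or code.

```python
import json
from dataclasses import dataclass, field

# Hypothetical label set; the article does not list the app's real labels.
ALLOWED_LABELS = {"low", "medium", "high"}


@dataclass
class TriageResult:
    """Assumed shape of one assessment (field names are illustrative)."""
    risk_label: str
    explanation: str
    red_flags: list[str] = field(default_factory=list)
    next_steps: list[str] = field(default_factory=list)


def parse_triage(raw: str) -> TriageResult:
    """Parse a model reply and raise ValueError if it breaks the assumed contract."""
    # json.JSONDecodeError is a subclass of ValueError, so truncated output
    # is reported the same way as a structurally wrong reply.
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    label = data.get("risk_label")
    if not isinstance(label, str) or label not in ALLOWED_LABELS:
        raise ValueError(f"unexpected risk label: {label!r}")

    explanation = data.get("explanation")
    if not isinstance(explanation, str) or not explanation.strip():
        raise ValueError("missing explanation")

    lists = {}
    for key in ("red_flags", "next_steps"):
        value = data.get(key, [])
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            raise ValueError(f"{key} must be a list of strings")
        lists[key] = value

    return TriageResult(label, explanation.strip(), lists["red_flags"], lists["next_steps"])
```

Rejecting a malformed reply outright, rather than displaying whatever could be salvaged, keeps a half-finished assessment from ever being shown as safety advice.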
The project uses:

Hugging Face Space → custom Gradio frontend → queued Gradio Server endpoint → Modal endpoint → CUDA llama.cpp → Qwen3.5 4B Q8 MTP GGUF + vision projector

This gave me a small-model stack that could handle both text and screenshots while staying below the hackathon’s 32B model limit.

What the app does

Pakistan Notice Helper supports both English and Urdu. This was one of the most important product decisions because suspicious messages in Pakistan are often written in English, Urdu, Roman Urdu, or a mix of all three.

Urdu mode is not just a translated interface. When a user switches to Urdu, the app changes the layout to right-to-left, translates the headings, labels, risk cards, validation messages, and result controls, and also asks the model to generate the assessment in clear Urdu script. This means the user can submit a suspicious message and receive the full safety response in Urdu, including the risk label, explanation, red flags, safe next steps, and optional reply draft when appropriate. For a local safety tool, that matters because advice is easier to trust and act on when it is written in the language people are most comfortable using.

The app looks for warning signs such as: urgent threats or account suspension language; requests for OTPs, PINs, passwords, CVVs, CNIC details, or card data; suspicious payment links or personal mobile numbers; impersonation of banks, telecom companies, couriers, tax authorities, or police; prizes, refunds, jobs, or benefits that require an advance fee.

The tool then gives users safer next steps, such as verifying through independently found official channels instead of using the link or phone number inside the suspicious message.

What I learned while building it

This project taught me that building with small models is less about chasing the highest benchmark score and more about finding the right balance between quality, speed, cost, and product safety.

1.
Small models work best when the scope is clear

One of the biggest lessons was that small models can work surprisingly well when the task is carefully bounded. Pakistan Notice Helper does not need to be a general scam investigator. It needs to identify visible risk signals, avoid overclaiming, and give safe next steps. That made the product scope, prompt design, and output contract just as important as the model itself.

The app is designed to say: this looks risky, here are the warning signs, and here is what you should do safely next. It is not designed to say: this is definitely real or definitely fake.

2. Starting with a larger model

I started with Qwen3.6 27B, and the quality was excellent. In my testing, it handled suspicious messages very well and produced strong, reliable explanations. The problem was deployment cost and practicality. The model required much more VRAM, a larger GPU machine, and longer recovery time during cold starts. For a hackathon demo with irregular traffic, that was not ideal. It worked, but it was too expensive and heavy for the kind of small, focused tool I wanted to build.

In terms of quality, I would rate the larger model around 95/100 for this task. But quality alone was not enough. I also had to think about cost, speed, cold starts, and whether the app could stay responsive.

3. Testing smaller local options

After that, I tried moving to a much smaller vision-language model, MiniCPM-V 4.6 Q8, with the hope that it could run more locally and reduce serving cost. That experiment did not work well. It was very slow on GPU, and when I tried running it through ZeroGPU, I ran into quota and runtime issues. Even when the interface showed that I still had around 35 minutes of quota left, the app did not behave reliably. I am still not fully sure what caused those issues, but it made the deployment unstable.

I then reverted and deployed the model through Modal.
The deployment itself was fast and started responding within a few seconds, but the model quality was not good enough. It struggled with detecting suspicious messages and failed too many of my test cases, so I had to drop it.

4. Finding the “Goldilocks” model

I then looked through the small open-source model rankings on Artificial Analysis and found what became the best fit for this project: Qwen3.5 4B. It was small enough to stay aligned with the Build Small spirit, fast enough for the app experience, and capable enough for the safety behavior I needed.

Compared with Qwen3.6 27B, I would rate it around 80/100 for this task, while the larger model was closer to 95/100. But the tradeoff made sense. The 4B model was cheaper to serve, faster to load, easier to deploy, and practical on a smaller Modal machine. That balance of model quality, speed, cost, and cold-start behavior made it the “Goldilocks” model for Pakistan Notice Helper.

5. Prompting and output contracts mattered a lot

Some early versions failed in useful ways. Thinking mode consumed the 500-token output budget before returning the final structured JSON, so I disabled thinking for production. One dense Roman Urdu screenshot reached the original completion limit, so image requests now receive a larger token budget. Another model response suggested an official-looking domain that had not been verified. That was a serious product issue, so I updated the system prompt to forbid invented URLs, phone numbers, organizations, and facts.

These fixes made the system safer and more predictable. The model was not just being asked to “detect scams.” It was being asked to follow a strict safety contract.

6. Urdu UX needed real product work

The Urdu interface also needed more work than I expected. Direct translations sounded unnatural. Some headings needed different line heights. Mixed Urdu and Latin model names could reorder unexpectedly. Mobile controls needed more vertical space, especially in right-to-left layout.
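
One general way to keep Latin names from reordering inside right-to-left Urdu text is to wrap each Latin run in Unicode directional isolates. The sketch below illustrates that technique only; it is an assumption about how such a problem could be handled, not the fix the app uses, and the `isolate_latin_runs` helper and its regular expression are hypothetical.

```python
import re

LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE

# A run that starts and ends with an ASCII letter or digit and may contain
# spaces and a few joining symbols in between, e.g. "Qwen3.5 4B Q8".
_LATIN_RUN = re.compile(r"[A-Za-z0-9][A-Za-z0-9 .\-_/+]*[A-Za-z0-9]|[A-Za-z0-9]")


def isolate_latin_runs(text: str) -> str:
    """Wrap each Latin run in LRI...PDI so it is laid out as one left-to-right unit."""
    return _LATIN_RUN.sub(lambda m: f"{LRI}{m.group(0)}{PDI}", text)


if __name__ == "__main__":
    label = isolate_latin_runs("ماڈل: Qwen3.5 4B Q8")
    # Show the code points so the invisible isolate characters are visible.
    print([f"U+{ord(ch):04X}" for ch in label if ch in (LRI, PDI)])
```

In an HTML interface the same effect is usually achieved with the `bdi` element or `dir="auto"` rather than raw control characters; the isolates are mainly useful where only plain strings can be passed through.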
I also tested a bundled Nastaliq webfont. It looked beautiful in isolation, but inside the product UI it reduced readability and made the interface feel less consistent. I removed it and returned to a system Arabic font stack while keeping the improved Urdu copy and RTL layout. These were not just design details. They affected whether the app felt clear, usab