BibiGPT v4.318.0 Update: PPT Extraction, Hard Subtitle OCR & Local Privacy Mode

BibiGPT v4.318.0 brings PPT keyframe extraction, hard subtitle OCR, local privacy mode on desktop, Google Gemma 4 31B model, and screenshot visual analysis — evolving from listening to truly seeing your videos.

BibiGPT Team

Dear BibiGPT users,

This update focuses on Quick View, Easy Search, and Better Use — we gave our AI "eyes" so it can now read PPT slides and burned-in subtitles directly from video frames. Plus, local privacy mode is now available on desktop. Here's what's new.

Try BibiGPT Now

Want to try these powerful new features? Visit BibiGPT now and start your smart audio and video summarization journey!

Get Started

👀 Quick View

Local Privacy Mode — Now on Desktop

Worried about uploading sensitive meeting recordings or personal memos to the cloud?

Local privacy mode has expanded from web to macOS and Windows clients. When enabled, speech recognition and summary generation run entirely on your local machine — no server uploads, no database storage. Physical-level privacy isolation, perfect for classified interviews, internal training recordings, or personal voice memos.

BibiGPT desktop client local privacy mode upload toggle

Google Gemma 4 31B Model

We've added Google Gemma 4 (31B) to the model selector — one of the most talked-about open-source models right now.

Fully open-sourced under the Apache 2.0 license, this 31-billion-parameter model excels at logical reasoning and long-context understanding, supports 140+ languages, and comes with native multimodal capabilities. Try running a few videos through Gemma 4 — different models bring genuinely different perspectives.

BibiGPT model selector searching for Google Gemma 4 31B

See BibiGPT's AI Summaries in Action

Bilibili: GPT-4 & Workflow Revolution

A deep-dive explainer on how GPT-4 transforms work, covering model internals, training stages, and the societal shift ahead.

Summary

This long-form explainer demystifies how ChatGPT works, why large language models are disruptive, and how individuals and nations can respond. It traces the autoregressive core of GPT, unpacks the three-stage training pipeline, and highlights emergent abilities such as in-context learning and chain-of-thought reasoning. The video also stresses governance, education reform, and lifelong learning as essential countermeasures.

Highlights

  • 💡 Autoregressive core: GPT predicts the next token rather than searching a database, which enables creative synthesis but also leads to hallucinations.
  • 🧠 Three phases of training: Pre-training, supervised fine-tuning, and reinforcement learning with human feedback transform the model from raw parrot to aligned assistant.
  • 🚀 Emergent abilities: At scale, LLMs surprise us with instruction-following, chain-of-thought reasoning, and tool use.
  • 🌍 Societal impact: Knowledge work, media, and education will change fundamentally as language processing costs collapse.
  • 🛡️ Preparing for change: Adoption requires risk management, ethical guardrails, and a renewed focus on learning how to learn.

#ChatGPT #LargeLanguageModel #FutureOfWork #LifelongLearning

Questions

  1. How does a generative model differ from a search engine?
    • Generative models learn statistical relationships and create new text token by token. Search engines retrieve existing passages from indexes.
  2. Why will education be disrupted?
    • Any memorizable fact or template is now on demand, so schools must emphasize higher-order thinking, creativity, and tool literacy.
  3. How should individuals respond?
    • Stay curious about tools, rehearse defensible workflows, and invest in meta-learning skills that complement automation.

Key Terms

  • Autoregression: Predicting the next token given previous context.
  • Chain-of-thought: Prompting a model to reason step by step, improving reliability on complex questions.
  • RLHF: Reinforcement learning from human feedback aligns the model with human preferences.
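
The "Autoregression" entry can be made concrete with a toy next-token predictor: a bigram table that greedily emits the most frequent follower of the current token. Real LLMs replace the count table with a learned neural distribution over a huge vocabulary; this sketch only shows the shape of the loop.

```python
from collections import Counter, defaultdict

# Toy autoregressive model: at each step, emit the token most often
# seen after the current token in a tiny training corpus.

def train_bigrams(tokens):
    table = defaultdict(Counter)
    for a, b in zip(tokens, tokens[1:]):
        table[a][b] += 1
    return table

def generate(table, start, steps):
    out = [start]
    for _ in range(steps):
        followers = table.get(out[-1])
        if not followers:
            break  # no continuation seen in training data
        out.append(followers.most_common(1)[0][0])  # greedy next-token choice
    return out

corpus = "the model predicts the next token and the next token".split()
table = train_bigrams(corpus)
print(generate(table, "the", 3))
```

Prediction here is pure counting, which is why the model can only echo patterns from its corpus; scale and learned representations are what give real LLMs their emergent abilities.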

Want to summarize your own videos?

BibiGPT supports 30+ platforms, including YouTube, Bilibili, and Douyin. Get an AI-powered summary in one click.

Try BibiGPT for Free

Hard Subtitle OCR Extraction (Beta)

Some videos have subtitles burned directly into the frames: there is no closed-caption track to download, and traditional speech recognition chokes on the background noise.

BibiGPT can now read them directly from video frames using OCR. Great for noisy street interviews, lectures with heavy accents, or any video where on-screen text is clear but audio quality isn't. Currently supports Chinese, English, Japanese, French, German, and Spanish.

BibiGPT hard subtitle OCR recognition process
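
A pipeline like this typically samples frames, runs OCR on the subtitle band, and then merges repeated readings into timed lines. The sketch below is illustrative only: the frame-level OCR itself is assumed to come from an external engine, and we show just the merging step.

```python
# Sketch: collapse per-frame OCR readings of a burned-in subtitle band
# into timed subtitle segments. Input is one OCR string per sampled
# frame (repeated while the subtitle stays on screen, '' when absent).

def merge_ocr_frames(readings, fps=2.0):
    """Return (start_sec, end_sec, text) segments, dropping empty frames."""
    segments = []
    current, start = None, 0.0
    for i, text in enumerate(readings):
        t = i / fps
        text = text.strip()
        if text != current:
            if current:  # close the previous non-empty segment
                segments.append((start, t, current))
            current, start = text, t
    if current:  # flush the last open segment
        segments.append((start, len(readings) / fps, current))
    return segments

# A subtitle that persists across several sampled frames should come
# out as one timed segment, not one line per frame.
frames = ["", "Hello world", "Hello world", "Hello world", "", "Next line", "Next line"]
print(merge_ocr_frames(frames))
```

De-duplicating across frames is what turns noisy per-frame OCR into a clean transcript-like output.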

BibiGPT already understood video visuals — now it goes further by reading on-screen text directly.

🛠️ Better Use

PPT Keyframe Extraction (Beta)

The real value of educational videos often lives on the slides, not in the narration. But finding that one slide means scrubbing through the timeline endlessly.

BibiGPT's PPT keyframe extraction now automatically detects scene changes, captures unique keyframes, and groups subtitle text underneath each corresponding slide. The result is a visual outline — browse an entire video's key visuals like flipping through a PDF.

BibiGPT PPT keyframe extraction results in Keynote-style page browser
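
Scene-change detection of this kind can be sketched with a simple frame-differencing heuristic. This is a minimal illustration, not BibiGPT's actual algorithm; production pipelines typically use perceptual hashes or histogram comparison on sampled frames.

```python
# Sketch: slide-change detection via mean absolute pixel difference
# between consecutive frames. Frames are plain lists of grayscale
# values; the threshold is illustrative.

def detect_keyframes(frames, threshold=20.0):
    """Return indices of frames that start a new 'slide': frame 0, plus
    any frame whose mean pixel delta from its predecessor exceeds the
    threshold."""
    keyframes = [0] if frames else []
    for i in range(1, len(frames)):
        prev, cur = frames[i - 1], frames[i]
        delta = sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur)
        if delta > threshold:  # large change => likely a slide transition
            keyframes.append(i)
    return keyframes

# Three tiny "frames": slide A held for two samples, then slide B.
slide_a = [10] * 16
slide_b = [200] * 16
print(detect_keyframes([slide_a, slide_a, slide_b]))  # -> [0, 2]
```

Subtitle lines timestamped between one detected keyframe and the next would then be grouped under the earlier slide, producing the visual outline described above.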

Screenshot Keyframe Analysis

BibiGPT has supported visual understanding for a while — AI can already analyze video frames. This update adds screenshot keyframe analysis on top of that: after extracting keyframes, you can have AI deeply analyze each screenshot for complex charts, code snippets, or presentation content, filling gaps that audio alone would miss.

Multiple vision models are available including GLM-5V Turbo and Qwen 3.5 Omni — switch freely based on your needs.

BibiGPT keyframe screenshot analysis panel showing visual analysis results

BibiGPT screenshot analysis model selector with GLM-5V Turbo and other vision models
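
At the API level, per-keyframe analysis usually amounts to sending the frame plus an analysis prompt to a vision model. The endpoint shape and field names below are hypothetical, assumed for illustration; only the model names come from this release.

```python
import base64
import json

# Sketch: package one extracted keyframe and a prompt into a JSON
# payload for a vision model. The payload schema here is invented for
# illustration and is not BibiGPT's actual API.

def build_vision_request(image_bytes, prompt, model="GLM-5V Turbo"):
    return json.dumps({
        "model": model,  # e.g. "GLM-5V Turbo" or "Qwen 3.5 Omni"
        "prompt": prompt,
        # Raw image bytes are base64-encoded so they fit in JSON text.
        "image_base64": base64.b64encode(image_bytes).decode("ascii"),
    })

payload = build_vision_request(b"\x89PNG...", "Describe the chart on this slide.")
print(json.loads(payload)["model"])  # -> GLM-5V Turbo
```

Switching models then means changing one field per request, which is why a per-task model selector is cheap to expose.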

More Recent Improvements

Beyond the major features above, here's what else we've shipped:

  • X/Twitter video fix: Pasting X video links used to play audio only — now fixed
  • Wan 2.7 video generation: New text-to-video, image-to-video modes (Pro exclusive)
  • Smart renewal reminders: Sidebar shows personalized reminders as your plan nears expiration
  • Subscription channel icons: YouTube, Bilibili, podcast icons now show in your subscription feed
  • Usage page upgrade: View historical usage by week/month/quarter with separate credit and API balance
  • Batch operation improvements: Better button naming and auto-validation when adding to collections

Feedback or suggestions?

We value your input! If you run into problems or have ideas for improvement, please let us know.

Submit Feedback

Summary

This update takes BibiGPT's visual understanding to the next level: local privacy mode keeps sensitive content on your machine, hard subtitle OCR solves the classic "clear subtitles but bad audio" problem, and PPT extraction with screenshot analysis turns video slides into a browsable knowledge base.

Start your AI-powered learning journey now:

Try BibiGPT Now

Want to try these powerful new features? Visit BibiGPT now and start your smart audio and video summarization journey!

Get Started

Enjoy!

BibiGPT Team