BibiGPT v4.318.0 Update: PPT Extraction, Hard Subtitle OCR & Local Privacy Mode

BibiGPT v4.318.0 brings PPT keyframe extraction, hard subtitle OCR, local privacy mode on desktop, Google Gemma 4 31B model, and screenshot visual analysis — evolving from listening to truly seeing your videos.

BibiGPT Team

Dear BibiGPT users,

This update focuses on Quick View, Easy Search, and Better Use — we gave our AI "eyes" so it can now read PPT slides and burned-in subtitles directly from video frames. Plus, local privacy mode is now available on desktop. Here's what's new.

Experience BibiGPT now

Ready to try these powerful features? Visit BibiGPT and start your intelligent audio/video summarization journey!

Get started

👀 Quick View

Local Privacy Mode — Now on Desktop

Worried about uploading sensitive meeting recordings or personal memos to the cloud?

Local privacy mode has expanded from the web to the macOS and Windows clients. When enabled, speech recognition and summary generation run entirely on your local machine, with no server uploads and no database storage. Your recordings never leave the device, which makes this mode a good fit for confidential interviews, internal training recordings, or personal voice memos.

BibiGPT desktop client local privacy mode upload toggle

Google Gemma 4 31B Model

We've added Google Gemma 4 (31B) to the model selector — one of the most talked-about open-source models right now.

Fully open-sourced under the Apache 2.0 license, this 31-billion-parameter model excels at logical reasoning and long-context understanding, supports 140+ languages, and comes with native multimodal capabilities. Try running a few videos through Gemma 4 — different models bring genuinely different perspectives.

BibiGPT model selector searching for Google Gemma 4 31B

See BibiGPT's AI Summary in Action

Let's build GPT: from scratch, in code, spelled out

Andrej Karpathy walks through building a tiny GPT in PyTorch — tokenizer, attention, transformer block, training loop.

Summary

Andrej Karpathy spends two hours rebuilding a tiny but architecturally faithful version of GPT in a single Jupyter notebook. He starts from a 1MB Shakespeare text file with a character-level tokenizer, derives self-attention from a humble running average, layers in queries/keys/values, scales up to multi-head attention, and stacks the canonical transformer block. By the end the model produces uncanny pseudo-Shakespeare and the audience has a complete mental map of pretraining, supervised fine-tuning, and RLHF — the three stages that turn a next-token predictor into ChatGPT.

Highlights

  • 🧱 Build the dumbest version first. A bigram baseline gives a working training loop and a loss number to beat before any attention is introduced.
  • 🧮 Self-attention rederived three times. Explicit loop → triangular matmul → softmax-weighted matmul: the progression makes the formula click rather than leaving it as something to memorise.
  • 🎯 Queries, keys, values are just learned linear projections. Once you see them as that, the famous attention diagram stops being magical.
  • 🩺 Residuals + LayerNorm are what make depth trainable. Karpathy shows how each one earns its place in a transformer block.
  • 🌍 Pretraining is only stage one. The toy model is what we built; supervised fine-tuning and RLHF are what turn it into an assistant.
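The "rederived three times" progression can be sketched in a few lines. This is a minimal illustration in plain Python rather than the PyTorch used in the video, and the function names are our own. All three versions compute the same thing: for each position, the average of that row and every earlier row.

```python
import math


def matmul(a, b):
    """Plain list-of-lists matrix product."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def average_with_loop(x):
    """Version 1: for each position t, average rows 0..t explicitly."""
    out = []
    for t in range(len(x)):
        prefix = x[: t + 1]
        out.append([sum(col) / len(prefix) for col in zip(*prefix)])
    return out


def average_with_triangular_matmul(x):
    """Version 2: multiply by a lower-triangular matrix whose rows sum to 1."""
    n = len(x)
    weights = [[1.0 / (t + 1) if s <= t else 0.0 for s in range(n)] for t in range(n)]
    return matmul(weights, x)


def average_with_softmax(x):
    """Version 3: mask future positions with -inf, then softmax each row."""
    n = len(x)
    scores = [[0.0 if s <= t else float("-inf") for s in range(n)] for t in range(n)]
    weights = [softmax(row) for row in scores]
    return matmul(weights, x)


# Three tokens, each a 2-dimensional vector (made-up numbers).
x = [[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]]
loop_out = average_with_loop(x)
tri_out = average_with_triangular_matmul(x)
soft_out = average_with_softmax(x)
```

In the third version every allowed score is zero, so the softmax reproduces a uniform average. Real self-attention replaces those zeros with query-key dot products, which is what lets each token weight earlier tokens unevenly.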

#GPT #Transformer #Attention #LLM #AndrejKarpathy

Questions

  1. Why start with character-level tokens instead of BPE?
    • To keep the vocabulary tiny (65 symbols) and the focus on the model. Production GPTs use BPE for efficiency, but the architecture is identical.
  2. Why scale dot-product attention by 1/√d_k?
    • It keeps the variance of the scores roughly constant as the head dimension grows, so the softmax does not collapse to a one-hot distribution.
  3. What separates the toy GPT from ChatGPT?
    • Scale (billions vs. tens of millions of parameters), data, and two extra training stages: supervised fine-tuning on conversation data and reinforcement learning from human feedback.
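The answer to question 2 can be checked numerically. The sketch below is our own illustration, not code from the video: it assumes query and key entries are independent with zero mean and unit variance, and the sample sizes and seeds are arbitrary.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def softmax(row):
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def score_variances(d_k, trials=2000, seed=0):
    """Estimate the variance of q.k with and without the 1/sqrt(d_k) scaling."""
    rng = random.Random(seed)
    raw = []
    for _ in range(trials):
        q = [rng.gauss(0.0, 1.0) for _ in range(d_k)]
        k = [rng.gauss(0.0, 1.0) for _ in range(d_k)]
        raw.append(dot(q, k))
    scaled = [s / math.sqrt(d_k) for s in raw]
    return variance(raw), variance(scaled)


d_k = 64
raw_var, scaled_var = score_variances(d_k)

# One query scored against eight keys: compare how peaked the softmax is.
rng = random.Random(1)
query = [rng.gauss(0.0, 1.0) for _ in range(d_k)]
keys = [[rng.gauss(0.0, 1.0) for _ in range(d_k)] for _ in range(8)]
raw_scores = [dot(query, key) for key in keys]
scaled_scores = [s / math.sqrt(d_k) for s in raw_scores]
peak_raw = max(softmax(raw_scores))
peak_scaled = max(softmax(scaled_scores))
```

Under these assumptions `raw_var` should land near `d_k` and `scaled_var` near 1, and `peak_raw` is never smaller than `peak_scaled`: dividing the scores by a number greater than 1 can only flatten the softmax.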

Key Terms

  • Bigram model: A baseline language model that predicts the next token using only the previous token, implemented as a single embedding lookup.
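  • Self-attention: A mechanism where each token builds its output as a softmax-weighted mix of value projections, with the weights computed from query-key dot products. In a causal decoder like GPT, each token attends only to itself and earlier tokens.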
  • LayerNorm (pre-norm): Normalisation applied before each sublayer in modern transformers; keeps activations well-conditioned and lets you train deeper.
  • RLHF: Reinforcement learning from human feedback — the alignment stage that nudges a pretrained model toward responses humans actually prefer.
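To make the bigram entry concrete, here is a small sketch of the "one row of next-token scores per token" idea. Note the swap: the model in the video learns its table by gradient descent, while this version simply counts character pairs. The function names are illustrative only.

```python
from collections import Counter, defaultdict


def fit_bigram_counts(text):
    """Build a table: previous character -> counts of the character that follows."""
    table = defaultdict(Counter)
    for prev, nxt in zip(text, text[1:]):
        table[prev][nxt] += 1
    return table


def next_char_probs(table, prev):
    """Look up one row of the table and normalise it into probabilities."""
    counts = table[prev]
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}


table = fit_bigram_counts("to be or not to be")
probs = next_char_probs(table, "t")
```

In this phrase "o" follows "t" twice and a space follows it once, so the lookup for "t" gives "o" a probability of two thirds. Prediction never looks further back than one character, which is the limitation attention is introduced to fix.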

Want to summarize your own videos?

BibiGPT supports YouTube, Bilibili, TikTok and 30+ platforms with one-click AI summaries

Try BibiGPT Free

Hard Subtitle OCR Extraction (Beta)

Some videos have subtitles burned directly into the frames — no CC track, and traditional ASR chokes on background noise.

BibiGPT can now read them directly from video frames using OCR. Great for noisy street interviews, lectures with heavy accents, or any video where on-screen text is clear but audio quality isn't. Currently supports Chinese, English, Japanese, French, German, and Spanish.

BibiGPT hard subtitle OCR recognition process

BibiGPT already understood video visuals — now it goes further by reading on-screen text directly.
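For the curious, here is one way per-frame OCR output could be turned into timed subtitle cues. This is a hypothetical sketch, not a description of BibiGPT's actual pipeline: it assumes an OCR step has already produced a `(timestamp, text)` pair for each sampled frame, and the `Cue` class and `merge_frame_text` function are names we made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def merge_frame_text(frames):
    """Merge consecutive frames showing the same text into one cue.

    frames: list of (timestamp_seconds, ocr_text), sorted by time.
    Frames with no text close the current cue.
    """
    cues = []
    open_text = None
    for timestamp, raw in frames:
        text = " ".join(raw.split())  # collapse stray whitespace from OCR
        if text and text == open_text:
            cues[-1].end = timestamp
        elif text:
            cues.append(Cue(start=timestamp, end=timestamp, text=text))
        open_text = text or None
    return cues


# Hypothetical OCR results for frames sampled every half second.
frames = [
    (0.0, "Hello"),
    (0.5, "Hello "),
    (1.0, ""),
    (1.5, "How are you?"),
    (2.0, "How are you?"),
]
cues = merge_frame_text(frames)
```

A production version would likely need fuzzy matching, since OCR rarely returns exactly the same string for the same subtitle on every frame.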

🛠️ Better Use

PPT Keyframe Extraction (Beta)

The real value of educational videos often lives on the slides, not in the narration. But finding that one slide means scrubbing through the timeline endlessly.

BibiGPT's PPT keyframe extraction now automatically detects scene changes, captures unique keyframes, and groups subtitle text underneath each corresponding slide. The result is a visual outline — browse an entire video's key visuals like flipping through a PDF.

BibiGPT PPT keyframe extraction results in Keynote-style page browser
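The two steps described above, detecting scene changes and grouping subtitle text under each slide, can be sketched roughly as follows. This is a simplified illustration under our own assumptions, not BibiGPT's implementation: frames are tiny grayscale pixel lists, a slide change is a large mean pixel difference from the last kept keyframe, and the threshold of 20 is arbitrary.

```python
from bisect import bisect_right


def mean_abs_diff(a, b):
    """Average per-pixel difference between two equally sized grayscale frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def detect_keyframes(frames, threshold=20.0):
    """Return the timestamps where the picture changes enough to count as a new slide.

    frames: list of (timestamp_seconds, pixels), sorted by time.
    """
    keyframe_times = []
    last_kept = None
    for timestamp, pixels in frames:
        if last_kept is None or mean_abs_diff(pixels, last_kept) > threshold:
            keyframe_times.append(timestamp)
            last_kept = pixels
    return keyframe_times


def group_subtitles(keyframe_times, subtitles):
    """Attach each (timestamp, text) subtitle to the latest keyframe at or before it."""
    groups = {timestamp: [] for timestamp in keyframe_times}
    for timestamp, text in subtitles:
        index = bisect_right(keyframe_times, timestamp) - 1
        if index >= 0:
            groups[keyframe_times[index]].append(text)
    return groups


# Four made-up 4-pixel frames: two nearly identical dark ones, then two bright ones.
frames = [
    (0.0, [10, 10, 10, 10]),
    (1.0, [12, 10, 9, 10]),
    (2.0, [200, 200, 200, 200]),
    (3.0, [201, 199, 200, 200]),
]
subtitles = [(0.5, "Welcome"), (1.5, "Agenda"), (2.5, "Results")]

keyframe_times = detect_keyframes(frames)
outline = group_subtitles(keyframe_times, subtitles)
```

Comparing against the last kept keyframe, rather than the previous frame, keeps slow fades from slipping under the threshold. This sketch does not recognise a slide that reappears later in the video; that would need an extra comparison against all earlier keyframes.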

Screenshot Keyframe Analysis

BibiGPT has supported visual understanding for a while — AI can already analyze video frames. This update adds screenshot keyframe analysis on top of that: after extracting keyframes, you can have AI deeply analyze each screenshot for complex charts, code snippets, or presentation content, filling gaps that audio alone would miss.

Multiple vision models are available including GLM-5V Turbo and Qwen 3.5 Omni — switch freely based on your needs.

BibiGPT keyframe screenshot analysis panel showing visual analysis results

BibiGPT screenshot analysis model selector with GLM-5V Turbo and other vision models

More Recent Improvements

Beyond the major features above, here's what else we've shipped:

  • X/Twitter video fix: Pasted X video links used to play audio only; they now play video as well
  • Wan 2.7 video generation: New text-to-video and image-to-video modes (Pro exclusive)
  • Smart renewal reminders: Sidebar shows personalized reminders as your plan nears expiration
  • Subscription channel icons: YouTube, Bilibili, podcast icons now show in your subscription feed
  • Usage page upgrade: View historical usage by week/month/quarter with separate credit and API balance
  • Batch operation improvements: Better button naming and auto-validation when adding to collections

Have feedback or ideas?

We value your input! If you encounter issues or have suggestions, please let us know anytime.

Submit feedback

Summary

This update takes BibiGPT's visual understanding to the next level: local privacy mode keeps sensitive content on your machine, hard subtitle OCR solves the classic "clear subtitles but bad audio" problem, and PPT extraction with screenshot analysis turns video slides into a browsable knowledge base.

Start learning more efficiently with AI:

Get started

Enjoy!

BibiGPT Team