DeepSeek V4 Preview × BibiGPT — Pro + Flash dual SKU
DeepSeek published the V4 Preview lineup on 2026-04-24 — V4-Pro (1.6T MoE / 49B active) and V4-Flash (284B / 13B active), both at 1M context, with a new Hybrid CSA+HCA attention scheme and three API modes (Fast / Expert / Vision). BibiGPT users can put this preview lineup to work on hour-long video, podcast and multi-document summarization once it lands behind the routing layer.
Key facts (90-second read)
As of 2026-05-08: DeepSeek released the V4 Preview lineup on 2026-04-24. Two SKUs ship together — V4-Pro (1.6T MoE / 49B active) and V4-Flash (284B / 13B active) — both at a 1M token context window, both running on the new Hybrid CSA + HCA attention scheme, both reachable through Fast / Expert / Vision API modes. Compared with the earlier V4 release (covered separately at /features/deepseek-v4-1m-context-explained), V4 Preview's news is the dual SKU split, the Hybrid CSA+HCA attention upgrade, and the explicit three-mode API surface — not the 1M jump itself. For BibiGPT users, V4-Flash is the cheap default for hour-long video and podcast summarization, V4-Pro is reserved for harder reasoning passes on the same transcript, and Vision mode pairs cleanly with BibiGPT's frame-extraction workflow. Authoritative sources: api-docs.deepseek.com news260424 and the deepseek-ai collection on Hugging Face.
Features
What ships in DeepSeek V4 Preview?
Two SKUs released together on 2026-04-24 — V4-Pro and V4-Flash — both at a 1M token context window, both running on the new Hybrid CSA+HCA attention scheme, both reachable through three distinct API modes.
Pro vs Flash dual SKU
V4-Pro is a 1.6T MoE checkpoint with 49B parameters firing per token. V4-Flash is a 284B MoE checkpoint with only 13B active per token — same context window, same attention scheme, but a much lighter inference footprint at a fraction of the per-token cost.
Hybrid CSA + HCA attention
V4 Preview replaces prior MoE-only attention with Hybrid CSA + HCA — cross-shared attention plus hierarchical-causal attention. The hybrid scheme is designed to keep semantic coherence intact across long documents instead of degrading toward the tail of the context window.
Three API modes — Fast / Expert / Vision
Each preview SKU is reachable through three modes. Fast prioritizes throughput; Expert prioritizes reasoning quality; Vision adds multimodal input on top of the same backbone — one API surface, three knobs to dial cost-vs-quality and modality.
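As a rough sketch of what "one API surface, three knobs" could look like in practice: the helper below assembles a chat-completion payload for one SKU/mode pair. The model-ID naming scheme (`deepseek-v4-flash-expert` etc.) and the idea that the mode is selected via the model string are assumptions for illustration, not confirmed API details.

```python
# Hypothetical request builder. The "deepseek-v4-{sku}-{mode}" naming and
# mode-via-model-string selection are assumptions, not documented behavior.
def build_request(sku: str, mode: str, prompt: str) -> dict:
    """Assemble an OpenAI-style chat-completion payload for one SKU/mode pair."""
    assert sku in {"pro", "flash"}, "V4 Preview ships two SKUs"
    assert mode in {"fast", "expert", "vision"}, "three API modes"
    return {
        "model": f"deepseek-v4-{sku}-{mode}",  # hypothetical model ID
        "messages": [{"role": "user", "content": prompt}],
    }

# Cheap long-context summarization: Flash SKU, Expert mode.
req = build_request("flash", "expert", "Outline this transcript: ...")
```

Swapping a single string flips between the throughput, reasoning and multimodal knobs without changing the rest of the call.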
What V4 Preview means for BibiGPT users
BibiGPT turns hour-long videos and podcasts into structured notes. V4-Flash sharply cuts the per-token cost of a 1M-context summarization run, V4-Pro offers the highest reasoning ceiling, and Vision opens the door to screenshot-grade frame analysis — all on the same context budget.
1M context — 8h podcast end-to-end
1,000,000 tokens fit a full 8-hour conference recording, an entire multi-episode course, or a stack of related papers in one prompt. BibiGPT's chunk-and-stitch pipeline can collapse to a single inference for content that previously needed retrieval, cutting cross-chunk reference loss between hours one and eight.
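A back-of-envelope check makes the 8-hour claim concrete. The speech rate (150 words/min) and tokenization ratio (1.3 tokens/word) below are rough assumptions for English speech, not figures from DeepSeek:

```python
# Rough token budget for an 8-hour recording inside a 1M context window.
# 150 words/min and 1.3 tokens/word are ballpark assumptions.
HOURS = 8
WORDS_PER_MIN = 150
TOKENS_PER_WORD = 1.3
CONTEXT_WINDOW = 1_000_000

words = HOURS * 60 * WORDS_PER_MIN      # 72,000 words
tokens = int(words * TOKENS_PER_WORD)   # ~93,600 tokens
headroom = CONTEXT_WINDOW // tokens     # how many such recordings fit

print(tokens, headroom)
```

Under these assumptions an 8-hour transcript uses under a tenth of the window, which is why a multi-episode course series or a stack of related papers can ride along in the same prompt.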
V4-Flash unlocks cheap long-context summary
Only 13B parameters fire per token on V4-Flash. For BibiGPT-style summarization workloads — long transcript in, structured outline out — Flash is the dominant cost-quality point inside the 1M-context tier. Pro is reserved for harder reasoning hops on the same transcript.
Vision mode + BibiGPT visual analysis
V4-Vision takes screenshots and frames as input. BibiGPT's existing visual analysis workflow — extract key frames from a video, then ask the model what's on screen — can pair directly with V4-Vision once exposed in the routing layer. Frame-level Q&A becomes one inference, not a separate captioner pass.
5 key changes (90-second read)
Headline shifts from the DeepSeek V4 Preview release on 2026-04-24.
1. Pro vs Flash dual SKU
V4-Pro is 1.6T MoE / 49B active per token. V4-Flash is 284B / 13B active — same context window, same attention, much lighter inference. Pick Flash for cheap long-context summarization, Pro for harder reasoning passes on the same transcript.
2. Hybrid CSA + HCA attention
Cross-shared attention plus hierarchical-causal attention replaces V4's MoE-only attention. The hybrid scheme is designed to preserve semantic coherence across the full 1M-token context — the failure mode that hour-long-video summarization runs into.
3. Three API modes — Fast / Expert / Vision
Each Preview SKU exposes Fast (throughput), Expert (reasoning quality) and Vision (multimodal input) on the same API surface. One context budget, three knobs to dial cost-vs-quality and modality.
4. 1M context, 8h-podcast friendly
Both Pro and Flash keep the V4 family's 1M token context window. A full 8-hour conference recording or a multi-episode course series fits in one prompt — BibiGPT's chunk-and-stitch pipeline can collapse to a single inference for content that previously needed retrieval.
5. Open weights on Hugging Face
V4 Preview checkpoints land in the deepseek-ai collection on Hugging Face the same week. Self-hostable for privacy-sensitive workloads — paywalled course content, internal meeting recordings — without sending audio or transcripts to a third-party API.
3 typical scenarios for BibiGPT users
Grounded in real BibiGPT user personas — all actionable today by extracting a transcript with BibiGPT and calling V4 Preview directly until native routing lands.
Creator — 8-hour podcast, single-prompt outline
Use BibiGPT to extract an 8-hour podcast or all-day conference recording transcript, then route the outline-and-summary step through V4-Flash on Expert mode. Full transcript fits in 1M context, so chapter references stay coherent end-to-end without chunk-stitch artifacts.
Student — multi-episode course cross-Q&A
Concatenate BibiGPT-extracted transcripts from a multi-episode lecture series. With 1M headroom, ask 'which episode covered topic X?' and resolve directly on V4-Flash without an external retrieval index that drops citations between episode boundaries.
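One hedged sketch of the concatenation step: with explicit episode markers in the prompt, the model can cite the episode number directly instead of relying on a retrieval index. The marker format and closing instruction are illustrative choices, not a BibiGPT or DeepSeek convention.

```python
def build_course_prompt(episodes: list[tuple[str, str]], question: str) -> str:
    """Concatenate (title, transcript) pairs with explicit episode markers
    so a cross-episode question can be answered with a citation.
    Marker format is an illustrative choice, not a documented convention."""
    parts = []
    for i, (title, transcript) in enumerate(episodes, start=1):
        parts.append(f"=== Episode {i}: {title} ===\n{transcript}")
    parts.append(f"Question: {question}\n"
                 "Answer with the episode number and title.")
    return "\n\n".join(parts)

prompt = build_course_prompt(
    [("Intro", "welcome to the course ..."),
     ("Gradient Descent", "today we cover optimization ...")],
    "which episode covered gradient descent?",
)
```

The resulting string is what gets sent as a single V4-Flash prompt; the 1M window is what makes skipping the retrieval index viable at all.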
Power user — frame-level visual analysis with V4-Vision
Extract key frames from a slide-deck talk or chart-heavy video with BibiGPT, then send the frames to V4-Vision alongside the transcript. Frame-level Q&A — 'what's the y-axis on slide 14?' — collapses to one inference, no separate captioner pass.
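A minimal sketch of the "one inference" message, assuming V4-Vision accepts the OpenAI-compatible image-content convention (a `text` part plus base64 `image_url` data URIs). That payload format is an assumption here; the source does not specify V4-Vision's wire format.

```python
import base64

def frame_message(question: str, frames: list[bytes]) -> dict:
    """One user message pairing a frame-level question with key frames.
    The image_url/data-URI content shape mirrors the OpenAI-compatible
    convention; whether V4-Vision accepts it is an assumption."""
    content = [{"type": "text", "text": question}]
    for jpeg in frames:  # raw JPEG bytes from BibiGPT frame extraction
        b64 = base64.b64encode(jpeg).decode("ascii")
        content.append({
            "type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{b64}"},
        })
    return {"role": "user", "content": content}

msg = frame_message("what's the y-axis on slide 14?", [b"\xff\xd8fake-jpeg"])
```

Transcript context can ride in the same request as an earlier message, so the question, the frames and the spoken context all hit the model in a single call.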
Frequently Asked Questions
Run V4-Flash on a 1M-context podcast — start with BibiGPT transcript extraction
BibiGPT extracts long-form transcripts from YouTube, Bilibili and podcast URLs in 5 languages. Pair the transcript with V4-Flash for the cheapest 1M-context summary point in this tier, V4-Pro for hardest reasoning, V4-Vision for frame-level analysis. Once V4 Preview is routed inside BibiGPT, the same workflow runs end-to-end behind a single URL.