Personal homepage

AI product engineer / multimodal systems

I code.

Vague product intent becomes controllable systems: pixels, prompts, data flows, services, deployments, and tools that make teams move faster. The same instinct applies to perception: name the pattern, expose the state, then make it adjustable.

A builder for the messy middle of AI products.

Product judgment, model behavior, workflow design, and production engineering meet here. The useful part is rarely one model; it is the system around it.

HA: Distributed, highly available services
Data: Structured and unstructured processing
Infra: Model serving, containers, internal tools
Lead/SRE: Delivery ownership and operations reliability

Product lead: Break down ambiguous AIGC needs, scope capability boundaries, and keep product, design, frontend, backend, and workflow teams moving together.
Agent systems: Build LangChain-based agents with search, reading, image generation, video retrieval, and structured output for stable frontend rendering.
Multimodal AI: Ship text, image, voice, and visual workflow systems using Stable Diffusion, LoRA, GPT-SoVITS, ComfyUI, FastAPI, Docker, and Gradio.
Infra / SRE: Turn repeated operations into reliable tools, from model-serving infrastructure to database operations automation, container optimization, and production remediation.

Private lab work, grouped by control surface.

Some work starts from scratch; some starts as a fork, clone, or borrowed scaffold. The source matters less than the control surface: what becomes observable, programmable, and reliable enough to use.

Personal AI OS

Devices as interfaces

Phones, NFC, USB, voice, local events, message channels, and always-on gateways as one personal control plane.

devices / voice / events / gateways

Knowledge engines

Memory as infrastructure

Persistent ingest, source traceability, schema-constrained extraction, graph relevance, and epistemic scoring.
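
As a rough illustration of schema-constrained extraction with source traceability, here is a minimal sketch. The schema, the field names, and the `validate_fact` helper are all hypothetical and not taken from any real project; the point is only that an extracted record is either rejected or stored together with a pointer back to where it came from.

```python
from dataclasses import dataclass

# Hypothetical schema: field name -> required Python type.
FACT_SCHEMA = {"subject": str, "relation": str, "object": str, "confidence": float}


@dataclass(frozen=True)
class Fact:
    subject: str
    relation: str
    object: str
    confidence: float
    source_id: str  # which ingested document the fact came from
    span: tuple     # (start, end) character offsets inside that document


def validate_fact(raw: dict, source_id: str, span: tuple) -> Fact:
    """Accept a raw extraction only if it matches FACT_SCHEMA exactly."""
    missing = FACT_SCHEMA.keys() - raw.keys()
    extra = raw.keys() - FACT_SCHEMA.keys()
    if missing or extra:
        raise ValueError(f"schema mismatch: missing={sorted(missing)} extra={sorted(extra)}")
    for name, expected in FACT_SCHEMA.items():
        if not isinstance(raw[name], expected):
            raise ValueError(f"{name} must be {expected.__name__}")
    if not 0.0 <= raw["confidence"] <= 1.0:
        raise ValueError("confidence must be within [0, 1]")
    # Traceability: the source reference travels with the validated fact.
    return Fact(**raw, source_id=source_id, span=span)
```

A model's raw output would pass through `validate_fact` before it reaches a queue or graph, so anything downstream can assume both the shape and the provenance.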

schemas / queues / graphs

Agent interfaces

Messy platforms, clean tools

Real-world surfaces wrapped as CLI, MCP, skills, JSON-first contracts, and safety rails that agents can actually use.
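
One way to read "JSON-first contracts" and "safety rails" is sketched below, assuming a hypothetical `notes` CLI with `list` and `delete` commands. None of these names come from a real tool. Every call returns the same JSON envelope, and the destructive command refuses to run without an explicit flag.

```python
import argparse
import json

# Hypothetical set of commands that need explicit confirmation.
DESTRUCTIVE = {"delete"}


def envelope(ok: bool, data=None, error=None) -> str:
    """Every response has the same three keys, so agents can parse it blindly."""
    return json.dumps({"ok": ok, "data": data, "error": error}, sort_keys=True)


def run(argv: list) -> str:
    parser = argparse.ArgumentParser(prog="notes")
    parser.add_argument("command", choices=["list", "delete"])
    parser.add_argument("--id")
    parser.add_argument("--confirm", action="store_true")
    args = parser.parse_args(argv)

    # Safety rail: destructive commands are refused unless confirmed.
    if args.command in DESTRUCTIVE and not args.confirm:
        return envelope(False, error="refused: pass --confirm to delete")
    if args.command == "delete":
        return envelope(True, data={"deleted": args.id})
    # Placeholder payload; a real tool would query its backing store here.
    return envelope(True, data={"notes": []})
```

Calling `run(["delete", "--id", "n1"])` returns a refusal envelope rather than raising, which keeps failure handling on the agent side as simple as checking `ok`.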

CLI / MCP / agent I/O

Generative canvases

Pixels under direction

Whiteboards, timelines, scene protocols, TTS, animation, GPU scheduling, and model behavior control.

pixels / timelines / model control

Art is a timeline you can scrub.

A small scroll-driven video study: motion, text, shade, and progress all map to one controlled time axis. The feeling is soft; the system underneath is exact.

ScrollVideo study

Scroll becomes time.

Frame, annotation, and exit state share one progress value.
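
A minimal sketch of that idea, written in Python for clarity rather than as the page's actual code: one clamped progress value is computed from the scroll position, and the video time, the active annotation, and the exit flag are all derived from it. The names `scroll_state` and `readout`, the annotation tuple layout, and the `exit_at` threshold are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScrollState:
    progress: float            # clamped to 0.0 .. 1.0
    time_s: float              # video time derived from progress
    annotation: Optional[str]  # text shown at this progress, if any
    exited: bool               # True once the study has scrolled past its end


def scroll_state(scroll_y, scroll_range, duration_s, annotations, exit_at=1.0):
    """Derive every visible state from a single progress value.

    annotations: list of (start, end, text) with start/end expressed as progress.
    """
    raw = scroll_y / scroll_range if scroll_range > 0 else 0.0
    progress = min(max(raw, 0.0), 1.0)
    active = None
    for start, end, text in annotations:
        if start <= progress < end:
            active = text
            break
    return ScrollState(progress, progress * duration_s, active, progress >= exit_at)


def readout(state: ScrollState) -> str:
    """Format progress and time in the same style as a '0% / 0.00s' counter."""
    return f"{state.progress:.0%} / {state.time_s:.2f}s"
```

Because nothing stores its own clock, scrubbing backwards needs no special handling: recomputing from the new scroll position yields a consistent frame, annotation, and exit state.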

0% / 0.00s

Proof in products, not pitch decks.

PageOn

AI PPT generation with a visual representation language, structured LLM output, and agents that compose multimodal presentation content.

app.pageon.ai

Cyber Space

Multimodal AI chat product on the App Store, combining text, voice, image input, Stable Diffusion generation, and voice conversion services.

App Store

Tencent SRE systems

Automated operations, container optimization, and SRE tooling for TDSQL: faster single-machine deployments, reusable Python ops packages, and alert remediation at scale.

Automated operations / container optimization / SRE

Every pixel and every bit is controllable.

The ontology is simple: people do not store reality as causal chains first. Memory stores co-occurrence: scene, mood, body, words, timing. Prediction turns those bundles into expectation; narrative often upgrades expectation into causality.

Co-occurrence

Scene, feeling, language, and timing are encoded together.

Expectation

Repeated patterns become the way a person anticipates the next frame.

Control

Code makes the hidden map explicit: pixels for perception, bits for state, systems for behavior.