Local LLM Visual Benchmark

Vibe testing local LLM's web code abilities

Apr 28, 2025

Inspired by the recent heptagon vibe code and the release of GLM-4 0414 test to create a few more visual code tests for local models to allow a simple in browser vibe benchmarking.

Stop Guessing, Start SEEING!

Open Source Local LLM Visual Code Test

No more wondering... did it actually work? I mean, the model spat out something that looks like HTML, CSS, and JavaScript. But does it RENDER? Is that a functional beauty, or is it a beige rectangle of despair or error page? Is that animation smooth, or does it seizure across the screen?

No more code roulette when picking your local LLM, feed a whole gang of them your choices of prompts for some cool visual thing, it spits back a wall of HTML/CSS/JS, and you hold your breath... will it render? Or will it just summon a blank page haunted by console errors?

So, what is this glorious beast?

It's a benchmark testing suite focused on visual code generation using local LLMs running on your own hardware.

VISUAL Benchmarking
Forget just checking syntax. This tool is about the render. Does the code hold up? Do the animations... animate? We want to SEE the results!

Backend Freedom
Use either KoboldCpp or LlamaCpp. Plug your favorite local inference engine in and let the benchmarking commence. (Go ahead, pit that 7B model against the fine-tuned 70B beast. FOR SCIENCE!)

Your Prompts, Your Playground
Don't like my tests? Got a burning desire to see if an LLM can generate a convincing simulation of competitive cheese rolling? Use any prompt you want! The stage is yours. Unleash your creative (or slightly unhinged) benchmarking ideas!

Smart Result Snipping
There are a few handy utility scripts to automatically gatehr the htmla nd create a comparison dashboard output (currently focusing on HTML-based goodies). Less digital dumpster diving for you

Side-by-Side Showdown Dashboard
My results are up at github

What Tests are Included?

My current set of test prompts:

Aquarium
Generating a serene screen saver styles fish tank simulation with HTML, CSS, and JS. Will the fish swim or just... listlessly float?

Bouncing Balls
Tasking models with a classic bouncing ball physics simulation, can they handle gravity and boundaries?

Happy Mrs. Chicken
A simulation challenge demanding a chicken that wanders randomly and lays eggs. Peak visual computing right here. Let's see if the AI cracks under pressure!

Fireworks Display
Can the LLM light up the digital sky with an autonomous fireworks display? Or will it just fizzle out?

The Heptagon
Every viber’s favorite, a spinning heptagon containing bouncing balls. A delightful test of geometry, animation, and collision logic. What could possibly go wrong?

Lava Lamp
Groovy! Generating a '60s-style lava lamp simulation. Will it be a mesmerizing flow, a blobby mess or just fail to light up?

Vehicle Crash
Simple vehicles moving around and crashing into each other. Surprisingly tricky for testing basic object animation and interaction logic. Let the demolision begin.

Words are cheap, pixels are priceless! Check out the latest results from my own runs. Witness the triumphs and the glorious failures:

Prepare Your Eyeballs:

Current Results from 28 April 2025

Feeling brave? Want to run your own models through this visual gauntlet? Peek at the code? Add your own prompt for generating sentient toast? Hit up the repo:

Source Code on GitHub
https://github.com/electricazimuth/LocalLLM_VisualCodeTest/

Load up your favorite models, and put their visual coding chops to the test. Let me know what you discover, what breaks, and what amazing/horrifying things your LLMs create

Happy Benchmarking

Electric Azimuth, Machine Learning and Technical Engineering

Discussion about this post

Ready for more?