ML Experiments with MediaPipe and TensorFlow.js in OpenCode using Ollama's LLMs
A sign language recognition system built with MediaPipe and TensorFlow.js, using real-time hand tracking to recognize ASL alphabet signs: the pure vibe coding results
My First Crazy Attempt with OpenCode, Ollama, Local LLMs, and Machine Learning
I have been working with code-editor-based vibe coding for the past year. I particularly enjoy using GitHub Copilot and Cursor. My preference leans towards Claude models, but I avoid using Claude Code because I like to keep an eye on the code and understand what’s happening. Therefore, I prefer to use the models alongside code editors.
I’m considering trying out Claude Code at some point, when I feel I have enough patience. In the meantime, I’ve noticed that Ollama has been consistently shipping many LLMs for local use, which has piqued my interest.
So, I had some questions about them. I’m aware of Ollama’s capabilities, which let us download all the free LLMs. I opted for llama3.2, deepseek-coder, and qwen3-coder-next for my local setup.
Initially, I thought about making them work together with code editors like Cursor, but I’ve decided to give them a chance with OpenCode to see what they can accomplish.
In case you’re not familiar, Ollama is an open-source tool that makes it easy to run, manage, and interact with large language models (LLMs) on your own hardware. It streamlines the process of downloading and running models like Llama 3 while providing privacy and low latency. It now also supports cloud models, and since I plan to do some development, I chose Qwen3-Coder, llama3.2, and deepseek-coder.
How to make these work together:
- Download Ollama and pull the required models locally, e.g. `ollama pull qwen3-coder-next`, then host them via the local Ollama server
- Update the config file of the corresponding application with the model, base URL, and related settings
- Then install OpenCode with `npm install -g opencode` and keep it ready; a simple command like `opencode --model ollama/qwen3-coder-next "create a REST API in Node.js"` can start application development
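Put together, the setup looks roughly like this. It assumes Ollama and Node.js are already installed; the model names and the final prompt are just the ones from my setup, so swap in your own:

```shell
# Pull the models locally (one-time download per model)
ollama pull qwen3-coder-next
ollama pull llama3.2
ollama pull deepseek-coder

# Start the local Ollama server (listens on http://localhost:11434 by default)
ollama serve

# Install OpenCode globally and point it at a local model
npm install -g opencode
opencode --model ollama/qwen3-coder-next "create a REST API in Node.js"
```

Everything after this runs against the local server, so there are no API keys or per-token costs involved.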
Why Machine Learning?
I had a crazy lineup on my to-do list:
- Full-stack applications with Solid, TanStack, Astro, and Golang
- Data Science & Analytics using Scala on Databricks
- Mobile app development with Flutter/Dart (not even React Native)
- Automation and E2E testing with Cucumber and Cypress/TestCafe
- Exploring Kubernetes with DevOps
I definitely don’t want to stick with the typical web application that interacts with a REST API and database, which is my usual routine. Instead, I’m eager to explore something different, to take a path I’ve never ventured down before, even if just for a while.
Back in 2016, while I was at L&T Infotech, I was part of the Automation Center of Excellence Team. We were developing mosaic applications that utilized ITSM data to perform analytics and automation. Most of my teammates were well-versed in Angular, Node, Python, C++, and Big Data, but R in Azure was the domain of my senior, which others didn’t know much about. However, we understood that it was used for machine learning, so today I decided to dive into that realm. After all, machine learning is a crucial stage for all the AI agents we utilize.
This idea of implementing machine learning has been occupying my thoughts for over a decade. I never really had the chance to study or explore it — I’ve always wanted to get involved — but life, work, and countless other priorities kept pushing it aside. There was no formal course, no dedicated study time, and no weekend projects. Just a quiet “someday” that kept getting postponed.
So, when the chance arose to experiment with OpenCode and a local LLM, I thought — why not seize the moment? Why not use this as the opportunity I’ve been waiting for? Instead of sticking to the safe full-stack territory where I can easily identify the AI’s mistakes, I want to step into something entirely new and see what unfolds when the AI takes the lead in thinking.
The main reason I downloaded Ollama even before OpenCode made its announcement is that, if you keep up with news regarding AI-driven development, you may have already heard about the Ralph Wiggum AI loop technique. I considered experimenting with that using local models rather than opting for a subscription model, which would ultimately be quite expensive. However, I got caught up with work and didn't pursue it. Now that I have some free time, the trend isn't generating enough excitement for me to dive in, so I plan to focus on something else instead.
The System Configuration
When running a local LLM, the GPU matters a lot. I first tried my mini PC (32 GB RAM, AMD Ryzen 7 8745HS with Radeon 780M integrated graphics), with the thought of falling back to my gaming laptop and its dedicated GPU if that didn't work. Fortunately, it worked on the mini PC, though it took some time.
Models Downloaded
- deepseek-coder
- llama3.2
- qwen3-coder-next
Picking the Project: ASL Sign Language Recognition
I didn't want to be reckless about it though. Machine learning is vast — computer vision, NLP, reinforcement learning, generative models — and I didn't want to pick up something that would require infrastructure I don't have or datasets that don't exist.
So, I played it safe: ASL (American Sign Language) alphabet recognition. Not full sentence detection, not real-time translation of conversations — just the 24 static hand signs for the alphabet (J and Z involve motion, so they're typically excluded). A well-studied problem with plenty of reference material and existing datasets.
I found the Sign Language Dataset on Kaggle — about 860 samples across 24 classes, stored as .npy files with 300x300 pixel images. Good enough for a learning experience.
The Stack That Emerged
I say "emerged" because I didn't architect this. The LLM did. Here's what OpenCode, powered by local Llama and Qwen3-Coder, decided to build:
- Next.js 15 with the App Router and React 19 for the frontend
- Tailwind CSS v4 for styling (dark theme, gradient text, the whole modern look)
- MediaPipe HandLandmarker for real-time hand detection through the webcam
- TensorFlow.js running a custom CNN model entirely in the browser
- Firebase Hosting for deployment as a static site (my choice to host)
The entire ML pipeline runs client-side. No backend server, no API calls for inference. Your webcam feed goes through MediaPipe to detect 21 hand landmarks, those landmarks define a bounding box that gets cropped and resized to 64x64 grayscale, and then a CNN makes the prediction. All in your browser.
Webcam → MediaPipe Hand Detection → Crop Hand Region (64x64)
→ TensorFlow.js CNN → Softmax over 24 classes → Predicted Letter
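The crop step in that pipeline is simple geometry: MediaPipe's HandLandmarker reports 21 landmarks as normalized (x, y) coordinates in [0, 1], and the bounding box is just their min/max plus some padding. Here's a minimal sketch in plain Python for readability (the real code runs in TensorFlow.js in the browser; the margin value and helper name are my own, not from the project):

```python
def hand_bounding_box(landmarks, frame_w, frame_h, margin=0.1):
    """Compute a padded pixel bounding box from normalized hand landmarks.

    `landmarks` is a list of (x, y) pairs in [0, 1], as MediaPipe's
    HandLandmarker reports them; `margin` adds padding around the hand
    so fingertips are not clipped at the crop edges.
    """
    xs = [x for x, _ in landmarks]
    ys = [y for _, y in landmarks]
    pad_x = (max(xs) - min(xs)) * margin
    pad_y = (max(ys) - min(ys)) * margin
    left   = max(0.0, min(xs) - pad_x) * frame_w
    top    = max(0.0, min(ys) - pad_y) * frame_h
    right  = min(1.0, max(xs) + pad_x) * frame_w
    bottom = min(1.0, max(ys) + pad_y) * frame_h
    return int(left), int(top), int(right), int(bottom)

# Illustrative example (3 points, not a full 21-landmark hand),
# for a hand roughly in the center of a 640x480 frame:
landmarks = [(0.4, 0.3), (0.6, 0.3), (0.5, 0.7)]
print(hand_bounding_box(landmarks, 640, 480))  # → (243, 124, 396, 355)
```

The resulting region is then resized to 64x64 and converted to grayscale before it reaches the CNN.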
The Model
The CNN architecture is straightforward — three convolutional layers with max pooling, followed by a dense layer and softmax output:
- Conv2D(32 filters) → ReLU → MaxPool
- Conv2D(64 filters) → ReLU → MaxPool
- Conv2D(64 filters) → ReLU
- Flatten → Dense(64) → Dense(24) → Softmax
Trained for 20 epochs with Adam optimizer on the Kaggle dataset. Nothing fancy. The training scripts exist in both Python (train_cnn.py using Keras) and JavaScript (train.js using TensorFlow.js) — the LLM apparently hedged its bets on which runtime I'd prefer.
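As a sanity check on that architecture, here's a quick shape trace in plain Python, assuming 3x3 kernels, 'valid' padding, and stride-2 max pooling (the kernel size and padding are my assumptions — Keras defaults — since I haven't read the generated scripts):

```python
def conv_valid(size, kernel=3):
    """Spatial size after a 'valid' convolution with stride 1."""
    return size - kernel + 1

def maxpool(size, pool=2):
    """Spatial size after a stride-2 max pool (floor division)."""
    return size // pool

s = 64                       # 64x64 grayscale input
s = maxpool(conv_valid(s))   # Conv2D(32) -> MaxPool : 62 -> 31
s = maxpool(conv_valid(s))   # Conv2D(64) -> MaxPool : 29 -> 14
s = conv_valid(s)            # Conv2D(64)            : 12
flat = s * s * 64            # Flatten feeds Dense(64) -> Dense(24)
print(s, flat)               # → 12 9216
```

So the flattened feature vector going into the dense head is 9,216 values — small enough that the whole model loads and runs comfortably in a browser tab.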
The Vibe Coding Reality
Here's the part that still feels surreal to me: I did not read a single line of code.
Not one. Not the React component that handles the webcam feed. Not the TensorFlow.js inference logic. Not the MediaPipe initialization. Not the training scripts. Not the Firebase configuration. Nothing.
This is pure vibe coding in every sense of the term. I described what I wanted, pointed OpenCode at the dataset, and let the local LLMs figure out the rest. The SignLanguageDetector.tsx component — all 295 lines of it handling webcam capture, hand skeleton visualization, canvas manipulation, tensor preprocessing, and real-time prediction display — came entirely from the AI.
Does that make me nervous? A little. Do I know if the tensor normalization is correct? No. Do I know if the MediaPipe WASM initialization is optimal? Also no. But it works.
Does It Actually Work?
Yes. Sort of.
You open the app, grant camera access, hold up a hand sign, and it shows you a predicted letter with a confidence percentage. The hand skeleton overlay draws in real-time — green connectors between landmarks, red dots at each joint. There's a 40% confidence threshold, so it won't show wild guesses.
Is it accurate? Not really. The dataset is small (~860 samples), the model is basic, and the gap between training data (static .npy images) and real-world webcam input (varying lighting, backgrounds, hand sizes, angles) is significant. Some letters it gets right consistently. Others, it confidently gets wrong. Classic ML problems that would need more data, augmentation, and a better architecture to solve.
But here's the thing — accuracy wasn't the point.
What Was the Point Then?
This project was never about building a production-grade sign language recognition system. It was never about pushing OpenCode to its limits either. It was about three things:
- Finally touching machine learning with my own hands (well, the AI's hands, but you know what I mean) after wanting to for over a decade.
- Seeing what local LLMs can do when you throw them at a domain you're completely unfamiliar with.
- Experiencing pure vibe coding — giving up control entirely and seeing where it leads.
And on all three counts, it delivered. I now have a deployed web app at signlangrecweb1.web.app that does real-time hand sign recognition in the browser. I have training scripts I can iterate on. I have a mental model of how a CNN pipeline works end-to-end, from raw data to browser inference — even if I absorbed it through osmosis rather than study.
Reflections on OpenCode with Local LLMs
Running Llama and Qwen3-Coder locally via OpenCode turned out to be surprisingly effective for this type of project. The models grasped the ML pipeline, structured TensorFlow.js code, managed MediaPipe integration, established Firebase deployment, and even devised various training methods (both Python and Node.js versions).
Were there any issues? Likely. I can't say for sure — after all, I didn't examine the code. However, it's noteworthy that a locally-operated LLM was able to scaffold an entire ML web application from description to deployment. There were no cloud API calls, no subscription fees, just local computing handling everything.
What's Next?
Honestly? I might actually read the code now. Maybe. Or maybe I'll just keep vibing and see if the next iteration can handle J and Z (the motion-based letters), or bump accuracy with data augmentation, or try a landmark-based model instead of raw pixels.
Or maybe I'll move on to the next shiny thing. That's the beauty of experiments — there's no obligation to finish, only to learn.