NuMarkdown-8B-Thinking is a reasoning-powered OCR Vision-Language Model (VLM) built to transform documents into clean, structured Markdown. Fine-tuned from Qwen2.5-VL-7B, it introduces thinking tokens that help the model analyze complex layouts, tables, and unusual document structures before generating output. This makes it especially useful for RAG pipelines, document extraction, and knowledge organization. With its reasoning-first approach, NuMarkdown-8B-Thinking consistently outperforms generic OCR and even rivals large closed-source reasoning models in accuracy and layout understanding.
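At inference time the model first emits its reasoning inside <think> tags and then the final Markdown inside <answer> tags. Conceptually, the raw output looks like this (an illustrative sketch, not verbatim model output):

<think>
The page has a two-column header followed by a table with merged cells; render the header as a level-1 heading and the table as a Markdown table...
</think>
<answer>
# Document Title
| Item | Value |
|---|---|
| ... | ... |
</answer>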
Arena ranking against popular alternatives (using the TrueSkill-2 ranking system with around 500 model-anonymized votes; μ − 3σ is the conservative skill estimate used for ordering):
| Rank | Model | μ | σ | μ − 3σ |
|---|---|---|---|---|
| 🥇 1 | gemini-flash-reasoning | 26.75 | 0.80 | 24.35 |
| 🥈 2 | NuMarkdown-reasoning | 26.10 | 0.79 | 23.72 |
| 🥉 3 | NuMarkdown-reasoning-w/o_grpo | 25.32 | 0.80 | 22.93 |
| 4 | OCRFlux-3B | 24.63 | 0.80 | 22.22 |
| 5 | gpt-4o | 24.48 | 0.80 | 22.08 |
| 6 | gemini-flash-w/o_reasoning | 24.11 | 0.79 | 21.74 |
| 7 | RolmoOCR | 23.53 | 0.82 | 21.07 |
Win/Draw/Lose rate against other models:
| Model | Win (%) | Draw (%) | Lose (%) |
|---|---|---|---|
| RolmOCR | 62% | 13% | 26% |
| gemini-flash-w/o_reasoning | 59% | 23% | 18% |
| OCRflux-3B | 57% | 24% | 18% |
| GPT-4o | 52% | 19% | 29% |
| NuMarkdown-reasoning-w/o_GRPO | 50% | 36% | 14% |
| gemini-flash-reasoning | 25% | 40% | 35% |
GPU Configuration Table – NuMarkdown-8B-Thinking
| Deployment Type | Recommended GPU(s) | VRAM (per GPU) | System RAM | vCPU | Notes |
|---|---|---|---|---|---|
| Minimum (Demo / Testing) | 1× NVIDIA A100 40GB | 40 GB | 64 GB | 16 | Will run, but may need reduced context length (shorter docs). |
| Recommended (Full Inference) | 1× NVIDIA A100 80GB / H100 80GB | 80 GB | 128 GB | 32 | Smooth inference with full reasoning tokens (Markdown extraction from complex docs). |
| High-Performance / Multi-user | 2× A100 80GB / 2× H100 80GB | 80 GB each | 256 GB | 48 | Parallel inference or batch OCR for large document pipelines. |
| Consumer-Grade (Experimental) | 1× RTX 4090 (24 GB) | 24 GB | 64 GB | 16 | Possible only with 4-bit quantization (GGUF/QLoRA). Limited context & speed. |
| Cloud Lightweight Setup | 1× L4 (24 GB) | 24 GB | 64 GB | 12 | Works with quantized weights; slower but cost-efficient for dev pipelines. |
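For the consumer-grade and L4 rows, 4-bit quantization is the realistic path. As a minimal sketch of one such option (bitsandbytes rather than GGUF; this is our assumption, not the model card's documented loading path, so treat it as experimental and install bitsandbytes first):

import torch
from transformers import Qwen2_5_VLForConditionalGeneration, BitsAndBytesConfig

# Hypothetical 4-bit load for ~24 GB GPUs (RTX 4090 / L4); requires `pip install bitsandbytes`.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "numind/NuMarkdown-8B-Thinking",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)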
Step-by-Step Process to Install & Run NuMarkdown-8B-Thinking Locally
For the purpose of this tutorial, we will use a GPU-powered Virtual Machine offered by NodeShift; however, you can replicate the same steps with any other cloud provider of your choice. NodeShift provides the most affordable Virtual Machines at a scale that meets GDPR, SOC2, and ISO27001 requirements.
Step 1: Sign Up and Set Up a NodeShift Cloud Account
Visit the NodeShift Platform and create an account. Once you’ve signed up, log into your account.
Follow the account setup process and provide the necessary details and information.
Step 2: Create a GPU Node (Virtual Machine)
GPU Nodes are NodeShift’s GPU Virtual Machines, on-demand resources equipped with diverse GPUs ranging from H100s to A100s. These GPU-powered VMs give you full control over the environment, letting you adjust GPU, CPU, RAM, and storage configurations to your specific requirements.
Navigate to the menu on the left side, select the GPU Nodes option, click the Create GPU Node button on the Dashboard, and deploy your first Virtual Machine.
Step 3: Select a Model, Region, and Storage
In the “GPU Nodes” tab, select a GPU Model and Storage according to your needs and the geographical region where you want to launch your model.
We will use 1 x H100 SXM GPU for this tutorial to achieve the fastest performance. However, you can choose a more affordable GPU with less VRAM if that better suits your requirements.
Step 4: Select Authentication Method
There are two authentication methods available: Password and SSH Key. SSH keys are a more secure option. To create them, please refer to our official documentation.
Step 5: Choose an Image
In our previous blogs, we used pre-built images from the Templates tab when creating a Virtual Machine. However, for running NuMarkdown-8B-Thinking, we need a more customized environment with full CUDA development capabilities. That’s why, in this case, we switched to the Custom Image tab and selected a specific Docker image that meets all runtime and compatibility requirements.
We chose the following image:
nvidia/cuda:12.1.1-devel-ubuntu22.04
This image is essential because it includes:
- The full CUDA toolkit (including nvcc)
- Proper support for building and running GPU-based applications like NuMarkdown-8B-Thinking
- Compatibility with CUDA 12.1.1, required by certain model operations
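If you would rather prototype locally with Docker before renting a VM, an equivalent environment can be started with a single command (assuming the NVIDIA Container Toolkit is installed on your machine):

docker run --gpus all -it --rm nvidia/cuda:12.1.1-devel-ubuntu22.04 /bin/bash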
Launch Mode
We selected:
Interactive shell server
This gives us SSH access and full control over terminal operations — perfect for installing dependencies, running benchmarks, and launching models like NuMarkdown-8B-Thinking.
Docker Repository Authentication
We left all fields empty here.
Since the Docker image is publicly available on Docker Hub, no login credentials are required.
Identification
nvidia/cuda:12.1.1-devel-ubuntu22.04
CUDA and cuDNN images from gitlab.com/nvidia/cuda. The devel variant contains the full CUDA toolkit, including nvcc.
This setup ensures that NuMarkdown-8B-Thinking runs in a GPU-enabled environment with proper CUDA access and high compute performance.
After choosing the image, click the ‘Create’ button, and your Virtual Machine will be deployed.
Step 6: Virtual Machine Successfully Deployed
You will get visual confirmation that your node is up and running.
Step 7: Connect to GPUs using SSH
NodeShift GPUs can be connected to and controlled through a terminal using the SSH key provided during GPU creation.
Once your GPU Node deployment is successfully created and has reached the ‘RUNNING’ status, you can navigate to the page of your GPU Deployment Instance. Then, click the ‘Connect’ button in the top right corner.
Now open your terminal and paste the proxy SSH IP or direct SSH IP.
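The exact command is shown on your instance’s Connect page; a generic example (your key path, username, and IP will differ) looks like:
ssh -i ~/.ssh/<your-key> <user>@<your-ssh-ip>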
Next, if you want to check the GPU details, run the command below:
nvidia-smi
Step 8: Check the Available Python Version and Install a Newer One
Run the following command to check the Python version currently available:
python3 --version
The system ships with Python 3.8.1 by default. To install a higher version, you’ll need to use the deadsnakes PPA.
Run the following commands to add the deadsnakes PPA:
sudo apt update
sudo apt install -y software-properties-common
sudo add-apt-repository -y ppa:deadsnakes/ppa
sudo apt update
Step 9: Install Python 3.11
Now, run the following command to install Python 3.11 or another desired version:
sudo apt install -y python3.11 python3.11-venv python3.11-dev
Step 10: Update the Default Python3 Version
Now, run the following commands to link the new Python version as the default python3:
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.8 1
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.11 2
sudo update-alternatives --config python3
Then, run the following command to verify that the new Python version is active:
python3 --version
Step 11: Install and Update Pip
Run the following commands to install and update pip:
curl -O https://bootstrap.pypa.io/get-pip.py
python3.11 get-pip.py
Then, run the following command to check the version of pip:
pip --version
Step 12: Create and Activate a Python 3.11 Virtual Environment
Run the following commands to create and activate a Python 3.11 virtual environment:
apt update && apt install -y python3.11-venv git wget
python3.11 -m venv numarkdown
source numarkdown/bin/activate
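To confirm the environment is active, check the interpreter version; it should report Python 3.11.x:
python --version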
Step 13: Install Torch
Run the following command to install PyTorch and torchvision built for CUDA 12.1 (torchvision 0.18.1 pairs with torch 2.3.1):
pip install "torch==2.3.1+cu121" "torchvision==0.18.1+cu121" --index-url https://download.pytorch.org/whl/cu121
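Before moving on, it’s worth confirming that PyTorch can see the GPU. This one-liner (run inside the venv) prints the installed version and whether CUDA is available:
python3 -c "import torch; print(torch.__version__, torch.cuda.is_available())"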
Step 14: Install Dependencies
Run the following command to install dependencies:
pip install -U pillow transformers accelerate
Step 15: Connect to your GPU VM using Remote SSH
- Open VS Code, Cursor, or your code editor of choice on your Mac.
- Press Cmd + Shift + P, then choose Remote-SSH: Connect to Host.
- Select your configured host.
- Once connected, you’ll see SSH: 149.7.4.3 (your VM IP) in the bottom-left status bar (like in the image).
Step 16: Create a New Python Script numarkdown.py and Add the Following Code
Create a new Python script (for example, numarkdown.py) and add the following code:
import os
import sys
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

# --- Force stable attention backend (avoid FlashAttention-2) ---
os.environ["TRANSFORMERS_ATTENTION_IMPLEMENTATION"] = "sdpa"
os.environ["HF_USE_FLASH_ATTENTION_2"] = "0"

# --- Model & processor setup ---
model_id = "numind/NuMarkdown-8B-Thinking"

# Use the slow processor to silence "fast vs slow" warnings (optional)
processor = AutoProcessor.from_pretrained(
    model_id,
    trust_remote_code=True,
    use_fast=False,  # keep legacy processor
    min_pixels=100 * 28 * 28,
    max_pixels=5000 * 28 * 28,
)

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # efficient on modern GPUs
    device_map="auto",           # auto-GPU placement
    trust_remote_code=True,
    attn_implementation="sdpa",  # force PyTorch SDPA attention
)

# --- Input image: take a filename from the CLI, or fall back to sample.png ---
img_path = sys.argv[1] if len(sys.argv) > 1 else "sample.png"
img = Image.open(img_path).convert("RGB")

# Optional downscale: keep under ~3–4 MP to save VRAM
MAX_SIDE = 2200
img.thumbnail((MAX_SIDE, MAX_SIDE))

# --- Prompt & inputs ---
messages = [{"role": "user", "content": [{"type": "image"}]}]
prompt = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = processor(text=prompt, images=[img], return_tensors="pt").to(model.device)

# --- Run inference ---
with torch.no_grad():
    out = model.generate(
        **inputs,
        temperature=1e-5,
        max_new_tokens=2000,  # adjust if you need longer Markdown
    )
result = processor.decode(out[0])

# --- Extract the <answer> block cleanly ---
def between(s, a, b):
    i = s.find(a)
    j = s.find(b, i + len(a))
    return s[i + len(a):j] if i != -1 and j != -1 else s

answer = between(result, "<answer>", "</answer>")
print(answer)
Step 17: Upload Image via the Editor & Run the Script
17.1 Open the VM workspace in your editor
- In VS Code: Remote Explorer → SSH Targets → connect to your VM → open /root (or your chosen project folder).
- You should see your project files (numarkdown.py, etc.) in the left Explorer.
17.2 Upload your local image to the VM (drag & drop)
- In the VS Code Explorer (connected to the VM), right-click the folder where numarkdown.py lives (e.g., /root) and choose “Reveal in File Explorer” (optional) just to confirm the location.
- Drag your local image file (e.g., sample.png or myscan.jpg) from your laptop’s file manager into the VS Code Explorer for the VM workspace.
- Confirm the upload when prompted. You should now see the image in the remote file list (e.g., /root/sample.png).
17.3 (Optional) Rename the file to match the script
If your script expects the default filename (sample.png):
- In VS Code Explorer: right-click the uploaded file → Rename → sample.png.
(Or skip this if your script accepts a CLI argument.)
17.4 Activate the venv in the editor’s terminal (remote)
In VS Code, open a terminal (Terminal → New Terminal). It’s already running on the VM.
source ~/numarkdown/bin/activate
cd ~
17.5 Run the extractor
If your image uses the default filename (sample.png):
python3 numarkdown.py
If you pass a filename as an argument:
python3 numarkdown.py myscan.jpg
You’ll see the Markdown printed in the terminal.
17.6 Save the Markdown to a file (so you can open it in the editor)
# default-filename route (sample.png)
python3 numarkdown.py > output.md
# argument route
python3 numarkdown.py myscan.jpg > output.md
In VS Code Explorer, click output.md to preview the formatted result right in your editor.
17.7 Quick checks & common fixes
- Don’t see the image in VS Code on the VM? You likely uploaded it to a different folder. Check in the terminal:
pwd && ls -lh
Make sure the image sits next to numarkdown.py (or pass its full path).
- FileNotFoundError: 'sample.png': rename your uploaded file to sample.png, or run python3 numarkdown.py <yourfile>.
- Large scans / VRAM: if you hit OOM, downscale locally before upload, or let the script handle it (our script already thumbnails to ~3–4 MP).
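If you prefer to shrink a scan on your laptop before uploading, a quick Pillow snippet does it (the file names here are placeholders):

from PIL import Image

img = Image.open("scan.png")   # your original scan
img.thumbnail((2200, 2200))    # cap the longest side at ~2200 px, matching the script
img.save("scan_small.png")     # upload this smaller copy instead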
Up until now, we’ve been running and interacting with our model directly from the terminal. That worked fine for quick tests, but now let’s make things smoother and more user-friendly by running it inside a browser interface. For that, we’ll use Streamlit, a lightweight Python framework that lets us build interactive web apps in just a few lines of code.
Step 18: Install Required Libraries for Browser App
First, install Streamlit along with a few other helper libraries we’ll need:
pip install streamlit pillow pdf2image pypdf transformers accelerate timm
This command installs:
- streamlit → runs the browser app
- pillow → handles image processing
- pdf2image & pypdf → process PDFs
- transformers, accelerate, timm → load and run the model efficiently
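A quick sanity check that everything imported cleanly (a simple one-liner; the package list mirrors the install command above):
python3 -c "import streamlit, pdf2image, pypdf, transformers, accelerate, timm; print('all imports OK')"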
Step 19: Fix APT Sources, Update, and Install Poppler Utils
We’ll switch the Ubuntu mirror to the official archive, clean bad apt lists, update the package indexes with retries, and finally install poppler-utils (which provides pdftoppm/pdftocairo) in one command.
sudo sed -i 's|http://mirror.serverion.com/ubuntu|http://archive.ubuntu.com/ubuntu|g' /etc/apt/sources.list && \
sudo apt-get clean && \
sudo rm -rf /var/lib/apt/lists/* && \
sudo apt-get update -o Acquire::Retries=3 --fix-missing && \
sudo apt-get install -y poppler-utils
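To confirm Poppler is installed, print the converter’s version:
pdftoppm -v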
Step 20: Create the Streamlit App Script (app.py)
We’ll write a full Streamlit UI that lets you upload an image or PDF, runs NuMarkdown-8B-Thinking, and returns clean Markdown (with an option to view the raw output that contains <think>).
Create app.py in your VM (inside your project folder) and add the following code:
import os
import time
from typing import List, Tuple

import streamlit as st
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

# --- Force stable attention backend (avoid FlashAttention-2) ---
os.environ["TRANSFORMERS_ATTENTION_IMPLEMENTATION"] = "sdpa"
os.environ["HF_USE_FLASH_ATTENTION_2"] = "0"

MODEL_ID = "numind/NuMarkdown-8B-Thinking"
MAX_SIDE = 2200              # ~3–4 MP safety
MIN_PIXELS = 100 * 28 * 28   # model hint
MAX_PIXELS = 5000 * 28 * 28  # model hint
DEFAULT_MAX_NEW_TOKENS = 2000

st.set_page_config(page_title="NuMarkdown-8B-Thinking UI", layout="wide")

@st.cache_resource(show_spinner=True)
def load_model_and_processor():
    processor = AutoProcessor.from_pretrained(
        MODEL_ID,
        trust_remote_code=True,
        use_fast=False,  # quiet warnings, stable behavior
        min_pixels=MIN_PIXELS,
        max_pixels=MAX_PIXELS,
    )
    model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,
        device_map="auto",
        trust_remote_code=True,
        attn_implementation="sdpa",
    )
    model.eval()
    return processor, model

def pil_from_upload(file) -> Image.Image:
    img = Image.open(file).convert("RGB")
    img.thumbnail((MAX_SIDE, MAX_SIDE))
    return img

def pdf_to_images(file_bytes: bytes, dpi: int = 200) -> List[Image.Image]:
    # Convert PDF bytes to a list of PIL images (requires poppler-utils)
    try:
        from pdf2image import convert_from_bytes
    except Exception as e:
        raise RuntimeError(
            "pdf2image is not available or Poppler is missing. "
            "Install with `pip install pdf2image` and `sudo apt-get install poppler-utils`."
        ) from e
    images = convert_from_bytes(file_bytes, dpi=dpi)
    # Downscale each page to ~3–4 MP max
    for i in range(len(images)):
        images[i] = images[i].convert("RGB")
        images[i].thumbnail((MAX_SIDE, MAX_SIDE))
    return images

def between(s: str, a: str, b: str) -> str:
    i = s.find(a)
    j = s.find(b, i + len(a))
    return s[i + len(a):j] if i != -1 and j != -1 else s

@torch.inference_mode()
def run_single_image(processor, model, img: Image.Image, temperature: float, max_new_tokens: int) -> Tuple[str, str]:
    messages = [{"role": "user", "content": [{"type": "image"}]}]
    prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = processor(text=prompt, images=[img], return_tensors="pt").to(model.device)
    out = model.generate(
        **inputs,
        temperature=max(temperature, 1e-5),  # must be > 0 in recent transformers
        max_new_tokens=max_new_tokens,
    )
    text = processor.decode(out[0])
    answer = between(text, "<answer>", "</answer>")
    return answer, text  # (markdown, raw_with_think)

def concat_markdown(pages_md: List[str]) -> str:
    # Add page separators for clarity
    parts = []
    for i, md in enumerate(pages_md, 1):
        parts.append(f"\n\n---\n\n<!-- Page {i} -->\n\n{md.strip()}\n")
    return "".join(parts).strip()

# ----------------- UI -----------------
st.title("🧠 NuMarkdown-8B-Thinking — Document → Markdown")
st.caption("Upload a scanned page (PNG/JPG) or a PDF. The model reasons about layout, tables, etc., then returns clean Markdown.")

col_left, col_right = st.columns([2, 1])

with col_right:
    st.subheader("Settings")
    temperature = st.number_input("Temperature", value=0.00001, min_value=0.00001, max_value=2.0, step=0.00001, format="%.5f")
    max_new_tokens = st.number_input("Max new tokens", value=DEFAULT_MAX_NEW_TOKENS, min_value=200, max_value=6000, step=100)
    show_think = st.toggle("Show <think> (reasoning) raw output", value=False)
    run_button = st.button("Run Extraction", type="primary", use_container_width=True)

with col_left:
    upload = st.file_uploader("Upload an image or a PDF", type=["png", "jpg", "jpeg", "pdf"])

st.divider()

if run_button:
    if not upload:
        st.error("Please upload a PNG/JPG or PDF first.")
        st.stop()

    processor, model = load_model_and_processor()
    filetype = (upload.type or "").lower()
    start_time = time.time()

    if "pdf" in filetype or upload.name.lower().endswith(".pdf"):
        # PDF → images
        with st.status("Converting PDF to images…", expanded=False):
            pdf_bytes = upload.read()
            images = pdf_to_images(pdf_bytes, dpi=200)
        st.success(f"PDF pages: {len(images)}")

        pages_md = []
        progress = st.progress(0, text="Running model on pages…")
        for i, img in enumerate(images, 1):
            md, raw = run_single_image(processor, model, img, temperature, max_new_tokens)
            pages_md.append(md)
            progress.progress(i / len(images), text=f"Processed page {i}/{len(images)}")
            if show_think:
                with st.expander(f"Raw output (page {i})"):
                    st.code(raw)

        markdown_all = concat_markdown(pages_md)
        dur = time.time() - start_time

        st.subheader("📄 Markdown (all pages)")
        st.code(markdown_all, language="markdown")
        st.download_button("Download Markdown", data=markdown_all.encode("utf-8"),
                           file_name=f"{upload.name.rsplit('.', 1)[0]}_extracted.md", mime="text/markdown")
        st.caption(f"Done in {dur:.1f}s")
    else:
        # Single image
        img = pil_from_upload(upload)
        st.image(img, caption="Input image", use_column_width=True)
        with st.status("Running model…", expanded=False):
            md, raw = run_single_image(processor, model, img, temperature, max_new_tokens)
        dur = time.time() - start_time

        st.subheader("📝 Markdown")
        st.code(md, language="markdown")
        st.download_button("Download Markdown", data=md.encode("utf-8"),
                           file_name=f"{upload.name.rsplit('.', 1)[0]}_extracted.md", mime="text/markdown")
        if show_think:
            st.subheader("🧩 Raw output (with <think>)")
            st.code(raw)
        st.caption(f"Done in {dur:.1f}s")
Step 21: Launch the Streamlit App
Now that we’ve written our app.py Streamlit script, the next step is to launch the app from the terminal.
Run the following command inside your VM:
streamlit run app.py --server.port 7860 --server.address 0.0.0.0
- --server.port 7860 → runs the app on port 7860 (you can change it if needed).
- --server.address 0.0.0.0 → makes the app accessible externally (not just inside the VM).
Once executed, Streamlit will start the web server and you’ll see a message:
You can now view your Streamlit app in your browser.
URL: http://0.0.0.0:7860
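If your provider doesn’t expose port 7860 publicly, a standard SSH tunnel works just as well (a generic sketch; substitute your own user, key, and VM IP):
ssh -L 7860:localhost:7860 <user>@<your-vm-ip>
Then open http://localhost:7860 in your local browser.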
Step 22: Access the Streamlit App in Browser
After launching the app, open it in your browser. Replace 0.0.0.0 with your VM’s public IP (or use localhost if you set up an SSH tunnel):
http://0.0.0.0:7860/
Step 23: Upload and Extract Documents
- Use the Drag and Drop or Browse files button to upload a scanned image (.jpg/.png) or a PDF.
- Adjust Settings on the right:
  - Temperature → controls randomness (keep it very low, like 0.00001, for OCR).
  - Max new tokens → length of the output (default: 2000).
  - Show <think> reasoning → optional; shows the model’s reasoning process.
- Click Run Extraction.
The model will process your input file, convert images/PDF pages into clean Markdown output, and display it below. You can copy or download this Markdown directly.
For example, here is the Markdown output the model produced from a sample three-page resume:

---
<!-- Page 1 -->
# Ayush Kumar
+91-998-4219-294 | ayushknj3@gmail.com | linktr.ee/Ayush7614
[in] ayush-kumar-984443191 | [Chat] Ayush7614 | [Twitter] @AyushKu38757918
Noida, Uttar Pradesh, India
### Objective
Developer Relations Engineer and Full-Stack Developer with deep expertise in open-source, cloud, LLMs, AI/ML, DevOps, and technical community building. Adept at creating large-scale developer education content and tools that empower engineers globally.
### Education
* ABES Engineering College
* B.Tech in Electronics and Communication Engineering
* – GPA: 7.7 / 10
* – Courses: Operating Systems, Data Structures, Algorithms, AI, ML, Networking, Databases
* July 2019 – August 2023
* Ghaziabad, India
### Experience
* NodeShift AI Cloud
* Lead Developer Relations Engineer
* – Authored 150+ blogs on AI, LLMs, MCP, APIs, Web3, Gaming, Cloud, and TAK Server.
* – Worked on the Dubai UAE Government’s TAK Server deployment project using NodeShift GPU and compute VMs.
* – Designed and implemented marketing strategies to enhance brand visibility and audience engagement.
* – Created developer-focused content in multiple formats (blogs, guides, videos) to educate and captivate our global community.
* – Actively engaged with users across platforms to increase awareness and adoption of NodeShift services.
* – Explored and initiated sponsorship and partnership opportunities across technical and developer communities.
* – Reviewed customer feedback and usage patterns to refine developer experience and improve product documentation.
* – Led efforts to improve and expand technical documentation to ensure a smoother onboarding experience and increased retention.
* July 2024 – Present
* Remote
* Techlatest.net
* DevRel Engineer Consultant
* – Content Lead – Developed strategy for AI/ML, DevOps, and GUI-based content.
* – Authored 150+ blogs and tutorials across Cloud, Linux, Stable Diffusion, Flowise, Superset, etc.
* – Built GUI Linux (Ubuntu, Kali, Rocky, Tails), Redash, VSCode, RStudio-based developer VMs.
* – Created newsletters, video courses, and product documentation.
* – Lead social media presence and SEO optimization; grow Discord and Twitter community.
* – Worked across AWS, GCP, and Azure ecosystems for product testing and publishing.
* March 2023 – July 2024
* Estonia, Remote
* DEVs Dungeon
* DevRel Engineer, Community Work (Part Time)
* – Writing blogs for the DEVs Dungeon Community blog.
* – Organizing Meetups and Hackathons in my Region.
* – Participating in Events to Represent DEVs Dungeon.
* – Social media marketing for DEVs Dungeon.
* – Creating Content on GitHub, Twitter, and LinkedIn.
* – Building and managing the community.
* March 2023 – December 2023
* Remote
* Google Summer of Code - Fossology
* Student Developer
* – Built REST APIs using ReactJs and improved legacy APIs.
* – Created new endpoints with PHP and Slim Framework.
* – Updated documentation using YAML files for API clarity.
* May 2022 – August 2022
* Remote
---
<!-- Page 2 -->
* **Humalect**
* **DevRel Engineer (Intern)**
– Content Lead for Humalect on social platforms.
– Wrote blogs, newsletters, and planned podcasts.
– Represented Humalect at events and built community.
December 2022 – January 2023
Remote
* **QwikSkills**
* **Community Manager (Intern)**
– Onboarded 300+ community members, hosted online events.
– Managed Discord/Telegram and wrote community blogs.
– Designed campaigns and handled technical support.
August 2022 – January 2023
Remote
* **NimbleEdge**
* **Community Manager (Intern)**
– Engaged OSS community and hosted global events.
– Managed dev communities across GitHub, Discord, Meetup.
– Created support content, handled social media and code issues.
September 2022 – November 2022
Remote
* **Keploy**
* **Open Source Engineer (Intern)**
– Set up CI/CD pipelines using GitHub Actions.
– Built UI for Keploy website with ReactJs.
– Contributed to the main platform.
May 2022 – August 2022
Remote
* **Keploy**
* **DevRel Engineer (Intern)**
– Provided API guidance and SDK support.
– Built demo apps and participated in technical forums.
April 2022 – July 2022
Remote
* **CryptoCapable**
* **DevRel Engineer (Intern)**
– Promoted Web3, Crypto, Blockchain technologies.
– Delivered talks and guided developer onboarding.
February 2022 – April 2022
Remote
* **Hyathi Technologies**
* **Full Stack Developer (Intern)**
– Built website MVP with React, Tailwind, NodeJS, MongoDB.
– Implemented CI/CD using GitHub Actions.
December 2021 – January 2022
Remote
* **OneGo**
* **Full Stack Developer (Intern)**
– Developed startup site using HTML, CSS, Bootstrap.
– Integrated Firebase backend, deployed via GitHub Actions.
September 2021 – November 2021
Ghaziabad, India
## Projects
* **Paanch-Editor**
* **Responsive image editing tool using JS, HTML/CSS with 5+ effects**
– Allows users to apply effects and download edited images directly in-browser.
Remote
* **Etihaas Chrome Extension**
* **Displays 'On this day' historical facts using public APIs**
– Chrome extension shows history events for today’s date from API.
Remote
* **Foody-Moody**
* **Fusion food recipe site using React, Node, MongoDB**
– Dynamic full-stack web app offering unique cuisine recipes.
Remote
* **Tutorhuntz (Freelance)**
* **Platform connecting tutors and students in 100+ subjects**
– Built with React, Node.js, Express.js, Minimal UI, designed for academic support.
Remote
* **Zipify**
* **File compression web app built in Node.js**
– Compress files into ZIPs using jszip and Express server.
Remote
* **Women-Help Tracker**
* **Health tracking web app for menstrual wellness**
– Developed using HTML/CSS, Node.js, Python to support women’s wellness.
Remote
---
<!-- Page 3 -->
## Honors and Awards
* Winner – Smart India Hackathon 2022, led team of 5 to national victory.
* First in college to become GitHub Campus Expert and GSoC contributor.
* AWS Machine Learning and SUSE Cloud Native Scholarship by Udacity.
* Top ranks: 3rd in KWOC, 5th SWOC, 17th JWOC, 81st DWOC, 6th CWOC.
* Best Mentor Award – HSSOC, PSOC, DevicePT open source programs.
## Volunteer Experience
* Founder – Nexus What The Hack: national-level hackathon community.
* GitHub Campus Expert – Conducted 20+ technical events, meetups, and hackathons.
* Auth0 Ambassador – Delivered tech sessions, supported community growth.
* Mentor – SigmaHacks, CalHacks, Hack This November, HackVolunteer, Garuda Hacks.
* Organized 15+ community bootcamps and mentored 2000+ budding OSS contributors.
Conclusion
NuMarkdown-8B-Thinking brings reasoning into OCR like never before. By combining the power of Qwen2.5-VL with fine-tuned thinking tokens, it doesn’t just extract text — it understands layouts, tables, and complex structures before producing clean Markdown. This reasoning-first approach makes it a strong choice for document extraction, RAG pipelines, and knowledge organization, often rivaling even closed-source models in accuracy.
With the setup steps we walked through — from provisioning a GPU VM to running the model inside an intuitive Streamlit interface — you now have a complete end-to-end workflow. You can upload PDFs or images, watch them convert into structured Markdown in real time, and immediately use that output in your own applications.
Whether you’re a researcher, developer, or enterprise team, NuMarkdown-8B-Thinking offers a practical, open, and high-performing solution for document intelligence. Try it on your own documents, plug it into your pipelines, and experience what reasoning-powered OCR can unlock.