NuMarkdown-8B-Thinking is a reasoning-powered OCR Vision-Language Model (VLM) built to transform documents into clean, structured Markdown. Fine-tuned from Qwen2.5-VL-7B, it introduces thinking tokens that help the model analyze complex layouts, tables, and unusual document structures before generating output. This makes it especially useful for RAG pipelines, document extraction, and knowledge organization. With its reasoning-first approach, NuMarkdown-8B-Thinking consistently outperforms generic OCR and even rivals large closed-source reasoning models in accuracy and layout understanding.
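At inference time the model first emits its reasoning inside <think> tags and then the final Markdown inside <answer> tags. Conceptually, the raw output looks like this (an illustrative sketch, not verbatim model output):

<think>
The page has a two-column header followed by a table with merged cells; render the header as a level-1 heading and the table as a Markdown table...
</think>
<answer>
# Document Title
| Item | Value |
|---|---|
| ... | ... |
</answer>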
Arena ranking against popular alternatives (using the TrueSkill-2 ranking system with around 500 model-anonymized votes; μ − 3σ is the conservative skill estimate used for ordering):
| Rank | Model | μ | σ | μ − 3σ |
|---|---|---|---|---|
| 🥇 1 | gemini-flash-reasoning | 26.75 | 0.80 | 24.35 |
| 🥈 2 | NuMarkdown-reasoning | 26.10 | 0.79 | 23.72 |
| 🥉 3 | NuMarkdown-reasoning-w/o_grpo | 25.32 | 0.80 | 22.93 |
| 4 | OCRFlux-3B | 24.63 | 0.80 | 22.22 |
| 5 | gpt-4o | 24.48 | 0.80 | 22.08 |
| 6 | gemini-flash-w/o_reasoning | 24.11 | 0.79 | 21.74 |
| 7 | RolmoOCR | 23.53 | 0.82 | 21.07 |
Win/Draw/Lose rate against other models:
| Model | Win (%) | Draw (%) | Lose (%) |
|---|---|---|---|
| RolmOCR | 62% | 13% | 26% |
| gemini-flash-w/o_reasoning | 59% | 23% | 18% |
| OCRflux-3B | 57% | 24% | 18% |
| GPT-4o | 52% | 19% | 29% |
| NuMarkdown-reasoning-w/o_GRPO | 50% | 36% | 14% |
| gemini-flash-reasoning | 25% | 40% | 35% |
GPU Configuration Table – NuMarkdown-8B-Thinking
| Deployment Type | Recommended GPU(s) | VRAM (per GPU) | System RAM | vCPU | Notes |
|---|---|---|---|---|---|
| Minimum (Demo / Testing) | 1× NVIDIA A100 40GB | 40 GB | 64 GB | 16 | Will run, but may need reduced context length (shorter docs). |
| Recommended (Full Inference) | 1× NVIDIA A100 80GB / H100 80GB | 80 GB | 128 GB | 32 | Smooth inference with full reasoning tokens (Markdown extraction from complex docs). |
| High-Performance / Multi-user | 2× A100 80GB / 2× H100 80GB | 80 GB each | 256 GB | 48 | Parallel inference or batch OCR for large document pipelines. |
| Consumer-Grade (Experimental) | 1× RTX 4090 (24 GB) | 24 GB | 64 GB | 16 | Possible only with 4-bit quantization (GGUF/QLoRA). Limited context & speed. |
| Cloud Lightweight Setup | 1× L4 (24 GB) | 24 GB | 64 GB | 12 | Works with quantized weights; slower but cost-efficient for dev pipelines. |
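For the consumer-grade and L4 rows, 4-bit quantization is the realistic path. As a minimal sketch of one such option (bitsandbytes rather than GGUF; this is our assumption, not the model card's documented loading path, so treat it as experimental and install bitsandbytes first):

import torch
from transformers import Qwen2_5_VLForConditionalGeneration, BitsAndBytesConfig

# Hypothetical 4-bit load for ~24 GB GPUs (RTX 4090 / L4); requires `pip install bitsandbytes`.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "numind/NuMarkdown-8B-Thinking",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)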
Step-by-Step Process to Install & Run NuMarkdown-8B-Thinking Locally
For the purpose of this tutorial, we will use a GPU-powered Virtual Machine offered by NodeShift; however, you can replicate the same steps with any other cloud provider of your choice. NodeShift provides the most affordable Virtual Machines at a scale that meets GDPR, SOC2, and ISO27001 requirements.
Step 1: Sign Up and Set Up a NodeShift Cloud Account
Visit the NodeShift Platform and create an account. Once you’ve signed up, log into your account.
Follow the account setup process and provide the necessary details and information.
Step 2: Create a GPU Node (Virtual Machine)
GPU Nodes are NodeShift’s GPU Virtual Machines, on-demand resources equipped with diverse GPUs ranging from H100s to A100s. These GPU-powered VMs give you full control over the environment, letting you adjust GPU, CPU, RAM, and storage configurations to your specific requirements.
Navigate to the menu on the left side, select the GPU Nodes option, click the Create GPU Node button on the Dashboard, and deploy your first Virtual Machine.
Step 3: Select a Model, Region, and Storage
In the “GPU Nodes” tab, select a GPU Model and Storage according to your needs and the geographical region where you want to launch your model.
We will use 1 x H100 SXM GPU for this tutorial to achieve the fastest performance. However, you can choose a more affordable GPU with less VRAM if that better suits your requirements.
Step 4: Select Authentication Method
There are two authentication methods available: Password and SSH Key. SSH keys are a more secure option. To create them, please refer to our official documentation.
Step 5: Choose an Image
In our previous blogs, we used pre-built images from the Templates tab when creating a Virtual Machine. However, for running NuMarkdown-8B-Thinking, we need a more customized environment with full CUDA development capabilities. That’s why, in this case, we switched to the Custom Image tab and selected a specific Docker image that meets all runtime and compatibility requirements.
We chose the following image:
nvidia/cuda:12.1.1-devel-ubuntu22.04
This image is essential because it includes:
- The full CUDA toolkit (including nvcc)
- Proper support for building and running GPU-based applications like NuMarkdown-8B-Thinking
- Compatibility with CUDA 12.1.1, required by certain model operations
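If you would rather prototype locally with Docker before renting a VM, an equivalent environment can be started with a single command (assuming the NVIDIA Container Toolkit is installed on your machine):

docker run --gpus all -it --rm nvidia/cuda:12.1.1-devel-ubuntu22.04 /bin/bash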
Launch Mode
We selected:
Interactive shell server
This gives us SSH access and full control over terminal operations — perfect for installing dependencies, running benchmarks, and launching models like NuMarkdown-8B-Thinking.
Docker Repository Authentication
We left all fields empty here.
Since the Docker image is publicly available on Docker Hub, no login credentials are required.
Identification
nvidia/cuda:12.1.1-devel-ubuntu22.04
CUDA and cuDNN images from gitlab.com/nvidia/cuda. The devel variant contains the full CUDA toolkit, including nvcc.
This setup ensures that NuMarkdown-8B-Thinking runs in a GPU-enabled environment with proper CUDA access and high compute performance.
After choosing the image, click the ‘Create’ button, and your Virtual Machine will be deployed.
Step 6: Virtual Machine Successfully Deployed
You will get visual confirmation that your node is up and running.
Step 7: Connect to GPUs using SSH
NodeShift GPUs can be connected to and controlled through a terminal using the SSH key provided during GPU creation.
Once your GPU Node deployment is successfully created and has reached the ‘RUNNING’ status, you can navigate to the page of your GPU Deployment Instance. Then, click the ‘Connect’ button in the top right corner.
Now open your terminal and paste the proxy SSH IP or direct SSH IP.
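The exact command is shown on your instance’s Connect page; a generic example (your key path, username, and IP will differ) looks like:
ssh -i ~/.ssh/<your-key> <user>@<your-ssh-ip>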
Next, if you want to check the GPU details, run the command below:
nvidia-smi
Step 8: Check the Available Python Version and Install a Newer One
Run the following command to check the Python version currently available:
python3 --version
The system ships with Python 3.8.1 by default. To install a higher version, you’ll need to use the deadsnakes PPA.
Run the following commands to add the deadsnakes PPA:
sudo apt update
sudo apt install -y software-properties-common
sudo add-apt-repository -y ppa:deadsnakes/ppa
sudo apt update
Step 9: Install Python 3.11
Now, run the following command to install Python 3.11 or another desired version:
sudo apt install -y python3.11 python3.11-venv python3.11-dev
Step 10: Update the Default Python3 Version
Now, run the following commands to link the new Python version as the default python3:
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.8 1
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.11 2
sudo update-alternatives --config python3
Then, run the following command to verify that the new Python version is active:
python3 --version
Step 11: Install and Update Pip
Run the following commands to install and update pip:
curl -O https://bootstrap.pypa.io/get-pip.py
python3.11 get-pip.py
Then, run the following command to check the version of pip:
pip --version
Step 12: Create and Activate a Python 3.11 Virtual Environment
Run the following commands to create and activate a Python 3.11 virtual environment:
apt update && apt install -y python3.11-venv git wget
python3.11 -m venv numarkdown
source numarkdown/bin/activate
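To confirm the environment is active, check the interpreter version; it should report Python 3.11.x:
python --version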
Step 13: Install Torch
Run the following command to install PyTorch and torchvision built for CUDA 12.1 (torchvision 0.18.1 pairs with torch 2.3.1):
pip install "torch==2.3.1+cu121" "torchvision==0.18.1+cu121" --index-url https://download.pytorch.org/whl/cu121
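Before moving on, it’s worth confirming that PyTorch can see the GPU. This one-liner (run inside the venv) prints the installed version and whether CUDA is available:
python3 -c "import torch; print(torch.__version__, torch.cuda.is_available())"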
Step 14: Install Dependencies
Run the following command to install dependencies:
pip install -U pillow transformers accelerate
Step 15: Connect to your GPU VM using Remote SSH
- Open VS Code, Cursor, or your code editor of choice on your Mac.
- Press Cmd + Shift + P, then choose Remote-SSH: Connect to Host.
- Select your configured host.
- Once connected, you’ll see SSH: 149.7.4.3 (your VM IP) in the bottom-left status bar (like in the image).
Step 16: Create a New Python Script numarkdown.py and Add the Following Code
Create a new Python script (for example, numarkdown.py) and add the following code:
import os
import sys
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

# --- Force stable attention backend (avoid FlashAttention-2) ---
os.environ["TRANSFORMERS_ATTENTION_IMPLEMENTATION"] = "sdpa"
os.environ["HF_USE_FLASH_ATTENTION_2"] = "0"

# --- Model & processor setup ---
model_id = "numind/NuMarkdown-8B-Thinking"

# Use the slow processor to silence "fast vs slow" warnings (optional)
processor = AutoProcessor.from_pretrained(
    model_id,
    trust_remote_code=True,
    use_fast=False,  # keep legacy processor
    min_pixels=100 * 28 * 28,
    max_pixels=5000 * 28 * 28,
)

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # efficient on modern GPUs
    device_map="auto",           # auto-GPU placement
    trust_remote_code=True,
    attn_implementation="sdpa",  # force PyTorch SDPA attention
)

# --- Input image: take a filename from the CLI, or fall back to sample.png ---
img_path = sys.argv[1] if len(sys.argv) > 1 else "sample.png"
img = Image.open(img_path).convert("RGB")

# Optional downscale: keep under ~3–4 MP to save VRAM
MAX_SIDE = 2200
img.thumbnail((MAX_SIDE, MAX_SIDE))

# --- Prompt & inputs ---
messages = [{"role": "user", "content": [{"type": "image"}]}]
prompt = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = processor(text=prompt, images=[img], return_tensors="pt").to(model.device)

# --- Run inference ---
with torch.no_grad():
    out = model.generate(
        **inputs,
        temperature=1e-5,
        max_new_tokens=2000,  # adjust if you need longer Markdown
    )
result = processor.decode(out[0])

# --- Extract the <answer> block cleanly ---
def between(s, a, b):
    i = s.find(a)
    j = s.find(b, i + len(a))
    return s[i + len(a):j] if i != -1 and j != -1 else s

answer = between(result, "<answer>", "</answer>")
print(answer)
Step 17: Upload Image via the Editor & Run the Script
17.1 Open the VM workspace in your editor
- In VS Code: Remote Explorer → SSH Targets → connect to your VM → open /root (or your chosen project folder).
- You should see your project files (numarkdown.py, etc.) in the left Explorer.
17.2 Upload your local image to the VM (drag & drop)
- In the VS Code Explorer (connected to the VM), right-click the folder where numarkdown.py lives (e.g., /root) and choose “Reveal in File Explorer” (optional) just to confirm the location.
- Drag your local image file (e.g., sample.png or myscan.jpg) from your laptop’s file manager into the VS Code Explorer for the VM workspace.
- Confirm the upload when prompted. You should now see the image in the remote file list (e.g., /root/sample.png).
17.3 (Optional) Rename the file to match the script
If your script expects the default filename (sample.png):
- In VS Code Explorer: right-click the uploaded file → Rename → sample.png.
(Or skip this if your script accepts a CLI argument.)
17.4 Activate the venv in the editor’s terminal (remote)
In VS Code, open a terminal (Terminal → New Terminal). It’s already running on the VM.
source ~/numarkdown/bin/activate
cd ~
17.5 Run the extractor
If your image uses the default filename (sample.png):
python3 numarkdown.py
If you pass a filename as an argument:
python3 numarkdown.py myscan.jpg
You’ll see the Markdown printed in the terminal.
17.6 Save the Markdown to a file (so you can open it in the editor)
# default-filename route (sample.png)
python3 numarkdown.py > output.md
# argument route
python3 numarkdown.py myscan.jpg > output.md
In VS Code Explorer, click output.md to preview the formatted result right in your editor.
17.7 Quick checks & common fixes
- Don’t see the image in VS Code on the VM? You likely uploaded it to a different folder. Check in the terminal:
pwd && ls -lh
Make sure the image sits next to numarkdown.py (or pass its full path).
- FileNotFoundError: 'sample.png': rename your uploaded file to sample.png, or run python3 numarkdown.py <yourfile>.
- Large scans / VRAM: if you hit OOM, downscale locally before upload, or let the script handle it (our script already thumbnails to ~3–4 MP).
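If you prefer to shrink a scan on your laptop before uploading, a quick Pillow snippet does it (the file names here are placeholders):

from PIL import Image

img = Image.open("scan.png")   # your original scan
img.thumbnail((2200, 2200))    # cap the longest side at ~2200 px, matching the script
img.save("scan_small.png")     # upload this smaller copy instead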
Up until now, we’ve been running and interacting with our model directly from the terminal. That worked fine for quick tests, but now let’s make things smoother and more user-friendly by running it inside a browser interface. For that, we’ll use Streamlit, a lightweight Python framework that lets us build interactive web apps in just a few lines of code.
Step 18: Install Required Libraries for Browser App
First, install Streamlit along with a few other helper libraries we’ll need:
pip install streamlit pillow pdf2image pypdf transformers accelerate timm
This command installs:
- streamlit → runs the browser app
- pillow → handles image processing
- pdf2image & pypdf → process PDFs
- transformers, accelerate, timm → load and run the model efficiently
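A quick sanity check that everything imported cleanly (a simple one-liner; the package list mirrors the install command above):
python3 -c "import streamlit, pdf2image, pypdf, transformers, accelerate, timm; print('all imports OK')"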
Step 19: Fix APT Sources, Update, and Install Poppler Utils
We’ll switch the Ubuntu mirror to the official archive, clean bad apt lists, update the package indexes with retries, and finally install poppler-utils (which provides pdftoppm/pdftocairo) in one command.
sudo sed -i 's|http://mirror.serverion.com/ubuntu|http://archive.ubuntu.com/ubuntu|g' /etc/apt/sources.list && \
sudo apt-get clean && \
sudo rm -rf /var/lib/apt/lists/* && \
sudo apt-get update -o Acquire::Retries=3 --fix-missing && \
sudo apt-get install -y poppler-utils
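To confirm Poppler is installed, print the converter’s version:
pdftoppm -v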
Step 20: Create the Streamlit App Script (app.py)
We’ll write a full Streamlit UI that lets you upload an image or PDF, runs NuMarkdown-8B-Thinking, and returns clean Markdown (with an option to view the raw output that contains <think>).
Create app.py in your VM (inside your project folder) and add the following code:
import os
import time
from typing import List, Tuple

import streamlit as st
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

# --- Force stable attention backend (avoid FlashAttention-2) ---
os.environ["TRANSFORMERS_ATTENTION_IMPLEMENTATION"] = "sdpa"
os.environ["HF_USE_FLASH_ATTENTION_2"] = "0"

MODEL_ID = "numind/NuMarkdown-8B-Thinking"
MAX_SIDE = 2200              # ~3–4 MP safety
MIN_PIXELS = 100 * 28 * 28   # model hint
MAX_PIXELS = 5000 * 28 * 28  # model hint
DEFAULT_MAX_NEW_TOKENS = 2000

st.set_page_config(page_title="NuMarkdown-8B-Thinking UI", layout="wide")

@st.cache_resource(show_spinner=True)
def load_model_and_processor():
    processor = AutoProcessor.from_pretrained(
        MODEL_ID,
        trust_remote_code=True,
        use_fast=False,  # quiet warnings, stable behavior
        min_pixels=MIN_PIXELS,
        max_pixels=MAX_PIXELS,
    )
    model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,
        device_map="auto",
        trust_remote_code=True,
        attn_implementation="sdpa",
    )
    model.eval()
    return processor, model

def pil_from_upload(file) -> Image.Image:
    img = Image.open(file).convert("RGB")
    img.thumbnail((MAX_SIDE, MAX_SIDE))
    return img

def pdf_to_images(file_bytes: bytes, dpi: int = 200) -> List[Image.Image]:
    # Convert PDF bytes to a list of PIL images (requires poppler-utils)
    try:
        from pdf2image import convert_from_bytes
    except Exception as e:
        raise RuntimeError(
            "pdf2image is not available or Poppler is missing. "
            "Install with `pip install pdf2image` and `sudo apt-get install poppler-utils`."
        ) from e
    images = convert_from_bytes(file_bytes, dpi=dpi)
    # Downscale each page to ~3–4 MP max
    for i in range(len(images)):
        images[i] = images[i].convert("RGB")
        images[i].thumbnail((MAX_SIDE, MAX_SIDE))
    return images

def between(s: str, a: str, b: str) -> str:
    i = s.find(a)
    j = s.find(b, i + len(a))
    return s[i + len(a):j] if i != -1 and j != -1 else s

@torch.inference_mode()
def run_single_image(processor, model, img: Image.Image, temperature: float, max_new_tokens: int) -> Tuple[str, str]:
    messages = [{"role": "user", "content": [{"type": "image"}]}]
    prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = processor(text=prompt, images=[img], return_tensors="pt").to(model.device)
    out = model.generate(
        **inputs,
        temperature=max(temperature, 1e-5),  # must be > 0 in recent transformers
        max_new_tokens=max_new_tokens,
    )
    text = processor.decode(out[0])
    answer = between(text, "<answer>", "</answer>")
    return answer, text  # (markdown, raw_with_think)

def concat_markdown(pages_md: List[str]) -> str:
    # Add page separators for clarity
    parts = []
    for i, md in enumerate(pages_md, 1):
        parts.append(f"\n\n---\n\n<!-- Page {i} -->\n\n{md.strip()}\n")
    return "".join(parts).strip()

# ----------------- UI -----------------
st.title("🧠 NuMarkdown-8B-Thinking — Document → Markdown")
st.caption("Upload a scanned page (PNG/JPG) or a PDF. The model reasons about layout, tables, etc., then returns clean Markdown.")

col_left, col_right = st.columns([2, 1])

with col_right:
    st.subheader("Settings")
    temperature = st.number_input("Temperature", value=0.00001, min_value=0.00001, max_value=2.0, step=0.00001, format="%.5f")
    max_new_tokens = st.number_input("Max new tokens", value=DEFAULT_MAX_NEW_TOKENS, min_value=200, max_value=6000, step=100)
    show_think = st.toggle("Show <think> (reasoning) raw output", value=False)
    run_button = st.button("Run Extraction", type="primary", use_container_width=True)

with col_left:
    upload = st.file_uploader("Upload an image or a PDF", type=["png", "jpg", "jpeg", "pdf"])

st.divider()

if run_button:
    if not upload:
        st.error("Please upload a PNG/JPG or PDF first.")
        st.stop()

    processor, model = load_model_and_processor()
    filetype = (upload.type or "").lower()
    start_time = time.time()

    if "pdf" in filetype or upload.name.lower().endswith(".pdf"):
        # PDF → images
        with st.status("Converting PDF to images…", expanded=False):
            pdf_bytes = upload.read()
            images = pdf_to_images(pdf_bytes, dpi=200)
        st.success(f"PDF pages: {len(images)}")

        pages_md = []
        progress = st.progress(0, text="Running model on pages…")
        for i, img in enumerate(images, 1):
            md, raw = run_single_image(processor, model, img, temperature, max_new_tokens)
            pages_md.append(md)
            progress.progress(i / len(images), text=f"Processed page {i}/{len(images)}")
            if show_think:
                with st.expander(f"Raw output (page {i})"):
                    st.code(raw)

        markdown_all = concat_markdown(pages_md)
        dur = time.time() - start_time

        st.subheader("📄 Markdown (all pages)")
        st.code(markdown_all, language="markdown")
        st.download_button("Download Markdown", data=markdown_all.encode("utf-8"),
                           file_name=f"{upload.name.rsplit('.', 1)[0]}_extracted.md", mime="text/markdown")
        st.caption(f"Done in {dur:.1f}s")
    else:
        # Single image
        img = pil_from_upload(upload)
        st.image(img, caption="Input image", use_column_width=True)
        with st.status("Running model…", expanded=False):
            md, raw = run_single_image(processor, model, img, temperature, max_new_tokens)
        dur = time.time() - start_time

        st.subheader("📝 Markdown")
        st.code(md, language="markdown")
        st.download_button("Download Markdown", data=md.encode("utf-8"),
                           file_name=f"{upload.name.rsplit('.', 1)[0]}_extracted.md", mime="text/markdown")
        if show_think:
            st.subheader("🧩 Raw output (with <think>)")
            st.code(raw)
        st.caption(f"Done in {dur:.1f}s")
Step 21: Launch the Streamlit App
Now that we’ve written our app.py Streamlit script, the next step is to launch the app from the terminal.
Run the following command inside your VM:
streamlit run app.py --server.port 7860 --server.address 0.0.0.0
- --server.port 7860 → runs the app on port 7860 (you can change it if needed).
- --server.address 0.0.0.0 → makes the app accessible externally (not just inside the VM).
Once executed, Streamlit will start the web server and you’ll see a message:
You can now view your Streamlit app in your browser.
URL: http://0.0.0.0:7860
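If your provider doesn’t expose port 7860 publicly, a standard SSH tunnel works just as well (a generic sketch; substitute your own user, key, and VM IP):
ssh -L 7860:localhost:7860 <user>@<your-vm-ip>
Then open http://localhost:7860 in your local browser.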
Step 22: Access the Streamlit App in Browser
After launching the app, open it in your browser. Replace 0.0.0.0 with your VM’s public IP (or use localhost if you set up an SSH tunnel):
http://0.0.0.0:7860/
Step 23: Upload and Extract Documents
- Use the Drag and Drop or Browse files button to upload a scanned image (.jpg/.png) or a PDF.
- Adjust Settings on the right:
  - Temperature → controls randomness (keep it very low, like 0.00001, for OCR).
  - Max new tokens → length of the output (default: 2000).
  - Show <think> reasoning → optional; shows the model’s reasoning process.
- Click Run Extraction.
The model will process your input file, convert images/PDF pages into clean Markdown output, and display it below. You can copy or download this Markdown directly.
For example, here is the Markdown output the model produced from a sample three-page resume:

---
<!-- Page 1 -->
# Ayush Kumar
+91-998-4219-294 | ayushknj3@gmail.com | linktr.ee/Ayush7614
[in] ayush-kumar-984443191 | [Chat] Ayush7614 | [Twitter] @AyushKu38757918
Noida, Uttar Pradesh, India
### Objective
Developer Relations Engineer and Full-Stack Developer with deep expertise in open-source, cloud, LLMs, AI/ML, DevOps, and technical community building. Adept at creating large-scale developer education content and tools that empower engineers globally.
### Education
* ABES Engineering College
* B.Tech in Electronics and Communication Engineering
* – GPA: 7.7 / 10
* – Courses: Operating Systems, Data Structures, Algorithms, AI, ML, Networking, Databases
* July 2019 – August 2023
* Ghaziabad, India
### Experience
* NodeShift AI Cloud
* Lead Developer Relations Engineer
* – Authored 150+ blogs on AI, LLMs, MCP, APIs, Web3, Gaming, Cloud, and TAK Server.
* – Worked on the Dubai UAE Government’s TAK Server deployment project using NodeShift GPU and compute VMs.
* – Designed and implemented marketing strategies to enhance brand visibility and audience engagement.
* – Created developer-focused content in multiple formats (blogs, guides, videos) to educate and captivate our global community.
* – Actively engaged with users across platforms to increase awareness and adoption of NodeShift services.
* – Explored and initiated sponsorship and partnership opportunities across technical and developer communities.
* – Reviewed customer feedback and usage patterns to refine developer experience and improve product documentation.
* – Led efforts to improve and expand technical documentation to ensure a smoother onboarding experience and increased retention.
* July 2024 – Present
* Remote
* Techlatest.net
* DevRel Engineer Consultant
* – Content Lead – Developed strategy for AI/ML, DevOps, and GUI-based content.
* – Authored 150+ blogs and tutorials across Cloud, Linux, Stable Diffusion, Flowise, Superset, etc.
* – Built GUI Linux (Ubuntu, Kali, Rocky, Tails), Redash, VSCode, RStudio-based developer VMs.
* – Created newsletters, video courses, and product documentation.
* – Lead social media presence and SEO optimization; grow Discord and Twitter community.
* – Worked across AWS, GCP, and Azure ecosystems for product testing and publishing.
* March 2023 – July 2024
* Estonia, Remote
* DEVs Dungeon
* DevRel Engineer, Community Work (Part Time)
* – Writing blogs for the DEVs Dungeon Community blog.
* – Organizing Meetups and Hackathons in my Region.
* – Participating in Events to Represent DEVs Dungeon.
* – Social media marketing for DEVs Dungeon.
* – Creating Content on GitHub, Twitter, and LinkedIn.
* – Building and managing the community.
* March 2023 – December 2023
* Remote
* Google Summer of Code - Fossology
* Student Developer
* – Built REST APIs using ReactJs and improved legacy APIs.
* – Created new endpoints with PHP and Slim Framework.
* – Updated documentation using YAML files for API clarity.
* May 2022 – August 2022
* Remote
---
<!-- Page 2 -->
* **Humalect**
* **DevRel Engineer (Intern)**
– Content Lead for Humalect on social platforms.
– Wrote blogs, newsletters, and planned podcasts.
– Represented Humalect at events and built community.
December 2022 – January 2023
Remote
* **QwikSkills**
* **Community Manager (Intern)**
– Onboarded 300+ community members, hosted online events.
– Managed Discord/Telegram and wrote community blogs.
– Designed campaigns and handled technical support.
August 2022 – January 2023
Remote
* **NimbleEdge**
* **Community Manager (Intern)**
– Engaged OSS community and hosted global events.
– Managed dev communities across GitHub, Discord, Meetup.
– Created support content, handled social media and code issues.
September 2022 – November 2022
Remote
* **Keploy**
* **Open Source Engineer (Intern)**
– Set up CI/CD pipelines using GitHub Actions.
– Built UI for Keploy website with ReactJs.
– Contributed to the main platform.
May 2022 – August 2022
Remote
* **Keploy**
* **DevRel Engineer (Intern)**
– Provided API guidance and SDK support.
– Built demo apps and participated in technical forums.
April 2022 – July 2022
Remote
* **CryptoCapable**
* **DevRel Engineer (Intern)**
– Promoted Web3, Crypto, Blockchain technologies.
– Delivered talks and guided developer onboarding.
February 2022 – April 2022
Remote
* **Hyathi Technologies**
* **Full Stack Developer (Intern)**
– Built website MVP with React, Tailwind, NodeJS, MongoDB.
– Implemented CI/CD using GitHub Actions.
December 2021 – January 2022
Remote
* **OneGo**
* **Full Stack Developer (Intern)**
– Developed startup site using HTML, CSS, Bootstrap.
– Integrated Firebase backend, deployed via GitHub Actions.
September 2021 – November 2021
Ghaziabad, India
## Projects
* **Paanch-Editor**
* **Responsive image editing tool using JS, HTML/CSS with 5+ effects**
– Allows users to apply effects and download edited images directly in-browser.
Remote
* **Etihaas Chrome Extension**
* **Displays 'On this day' historical facts using public APIs**
– Chrome extension shows history events for today’s date from API.
Remote
* **Foody-Moody**
* **Fusion food recipe site using React, Node, MongoDB**
– Dynamic full-stack web app offering unique cuisine recipes.
Remote
* **Tutorhuntz (Freelance)**
* **Platform connecting tutors and students in 100+ subjects**
– Built with React, Node.js, Express.js, Minimal UI, designed for academic support.
Remote
* **Zipify**
* **File compression web app built in Node.js**
– Compress files into ZIPs using jszip and Express server.
Remote
* **Women-Help Tracker**
* **Health tracking web app for menstrual wellness**
– Developed using HTML/CSS, Node.js, Python to support women’s wellness.
Remote
---
<!-- Page 3 -->
## Honors and Awards
* Winner – Smart India Hackathon 2022, led team of 5 to national victory.
* First in college to become GitHub Campus Expert and GSoC contributor.
* AWS Machine Learning and SUSE Cloud Native Scholarship by Udacity.
* Top ranks: 3rd in KWOC, 5th SWOC, 17th JWOC, 81st DWOC, 6th CWOC.
* Best Mentor Award – HSSOC, PSOC, DevicePT open source programs.
## Volunteer Experience
* Founder – Nexus What The Hack: national-level hackathon community.
* GitHub Campus Expert – Conducted 20+ technical events, meetups, and hackathons.
* Auth0 Ambassador – Delivered tech sessions, supported community growth.
* Mentor – SigmaHacks, CalHacks, Hack This November, HackVolunteer, Garuda Hacks.
* Organized 15+ community bootcamps and mentored 2000+ budding OSS contributors.
Conclusion
NuMarkdown-8B-Thinking brings reasoning into OCR like never before. By combining the power of Qwen2.5-VL with fine-tuned thinking tokens, it doesn’t just extract text — it understands layouts, tables, and complex structures before producing clean Markdown. This reasoning-first approach makes it a strong choice for document extraction, RAG pipelines, and knowledge organization, often rivaling even closed-source models in accuracy.
With the setup steps we walked through — from provisioning a GPU VM to running the model inside an intuitive Streamlit interface — you now have a complete end-to-end workflow. You can upload PDFs or images, watch them convert into structured Markdown in real time, and immediately use that output in your own applications.
Whether you’re a researcher, developer, or enterprise team, NuMarkdown-8B-Thinking offers a practical, open, and high-performing solution for document intelligence. Try it on your own documents, plug it into your pipelines, and experience what reasoning-powered OCR can unlock.