Promtec Innovation • Mérida, Yucatán • Posted June 27, 2026
About the Role
We're building something that doesn't exist yet in
Latin America: a domain-specific large language model
trained on a large Spanish-language corpus, deployed
on proprietary on-premise hardware, solving a real
problem for clients who are already waiting.
We can't tell you exactly what it is yet.
What we can tell you:
→ The corpus is real and large
→ The clients are real and paying
→ The hardware is ready
→ The team is small and the decisions matter
→ The person who joins now shapes the architecture
This is a fully on-site role in Mérida, Yucatán.
We want someone in the room, not because we don't
trust remote work, but because the knowledge needs
to live in the team, not on one person's laptop.
──────────────────────────────
WHAT YOU'LL BUILD
──────────────────────────────
→ Large-scale Spanish text dataset pipeline:
cleaning, deduplication, tokenization
→ Continual Pre-Training (CPT) on open-source
base models (Llama/Qwen family) on a dedicated
GPU workstation in our office
→ Supervised fine-tuning (SFT) with LoRA/QLoRA,
using HuggingFace + TRL
→ RLHF/DPO pipeline with domain expert annotators
→ Model quantization for on-premise deployment:
GGUF, MLX, llama.cpp
→ RAG system on PostgreSQL + pgvector
→ Evaluation suite + hallucination monitoring
──────────────────────────────
✅ YOU NEED THIS
──────────────────────────────
→ Python — advanced and demonstrable
→ Real ML framework experience: PyTorch,
TensorFlow, or Scikit-learn with actual
projects, not just certifications
→ Mathematical foundation: linear algebra,
stats, calculus — things you actually use
→ Linux CLI — the training workstation runs
Linux, full stop
→ English — reading ML papers and HuggingFace
docs is part of the daily job
→ Spanish native or C2 — the corpus is in
Spanish
→ Based in Mérida, Yucatán — fully on-site, no exceptions
──────────────────────────────
⭐ BONUS POINTS
──────────────────────────────
→ HuggingFace Transformers / TRL / PEFT
→ LLM fine-tuning: SFT, LoRA, QLoRA, DPO
→ RAG pipelines and vector databases (pgvector)
→ Ollama, llama.cpp, MLX (Apple Silicon)
→ Data engineering: ETL, scraping, text pipelines
→ Docker + CI/CD basics
──────────────────────────────
WHAT YOU GET
──────────────────────────────
→ Competitive salary: discussed directly on the
first call, no invented ranges
→ On-site in Mérida: real team, real
collaboration, real knowledge transfer
→ Dedicated GPU hardware, not your laptop