
If you’ve ever tried to integrate Large Language Model (LLM) inference into your application, you’ve likely run into some recurring headaches.
Developers face issues like:
Inconsistent model names across inference providers (e.g., llama3.2:3b in Ollama vs. meta-llama/Llama-3.2-3B-Instruct on Hugging Face).
Difficulty switching between inference providers without changing app code.
Local vs. cloud mismatch, making development and production environments behave differently.
Cost and latency issues with serverless inference: it can be expensive or slow during development.
Hardware limitations with local inference: large, accurate models don’t fit on consumer hardware.
Today we’re thrilled to announce new LLM inference capabilities in Tower, designed to eliminate the friction in moving from prototype to production.
With Tower, you can:
Prototype locally with Ollama to avoid cloud costs and latency.
Deploy seamlessly to serverless inference via Hugging Face Hub, Together.ai, or other providers.
Use the same code for dev and prod, thanks to smart model name resolution and environment-based secrets.
Unified LLM Interface
The heart of this improvement is the Llm class, instantiated by the llms() helper function, which now gives developers a single, unified interface for all LLM inference tasks.
Whether you are testing inference on a small model locally, powered by Ollama, or routing inference requests via Hugging Face Hub to specialized providers like Together.ai, your Tower code stays the same.
Here’s all it takes to create an LLM in Tower:
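The snippet below is a minimal sketch rather than verbatim SDK documentation; it assumes the llms() helper is exposed directly on the tower package and takes the model family name as its argument.

```python
# Create an LLM handle by model family name.
# Assumption: llms() is available on the tower package; the exact import
# path and signature may differ slightly in your SDK version.
import tower

llm = tower.llms("llama3.2")
```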
From here, you can use the familiar chat-based completion interface to interact with the LLM:
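Here is a hedged sketch of what that interaction could look like. The message format follows the common chat-completion convention, and the method name complete_chat is an assumption for illustration, not the confirmed Tower API.

```python
# Chat-style completion against the LLM handle created above.
# Assumption: the method is named complete_chat and accepts a list of
# role/content messages; adjust to the actual Tower SDK method.
messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Explain what Tower's llms() helper does in one sentence."},
]

response = llm.complete_chat(messages)
print(response)
```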
From Local Prototyping to Cloud Production
A common challenge for developers is moving from local experiments to production deployments. With Tower’s improved LLM inference support, this transition is frictionless.
Local development is powered by Ollama, making it easy to iterate without incurring cloud costs or hitting API rate limits:
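For example, here is a sketch that assumes an Ollama server is running on your machine with a small model already pulled (e.g. by running ollama pull llama3.2:3b), and that the llms() helper shown earlier is the relevant SDK call:

```python
# Local run sketch: with Ollama serving on this machine and a small model
# pulled (e.g. ollama pull llama3.2:3b), the same app code from above
# resolves the "llama3.2" family to the local llama3.2:3b model.
import tower

llm = tower.llms("llama3.2")  # dev: served by local Ollama, no cloud calls
```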
Once you’re ready for production, switch to remote inference in the Tower cloud; no app code changes are required. Simply configure inference secrets for Hugging Face Hub in your Tower environment.
Tower will automatically direct requests to the right environment, letting you develop locally and deploy remotely with confidence.
Smarter Model Name Resolution
One of the most frustrating aspects of working with multiple LLM providers is model name inconsistency. Tower now abstracts away this complexity.
You can specify:
Model families like "deepseek-r1" or "llama3.2"
Exact model identifiers if you need a specific variant
During development, Tower resolves the model family to any locally available model version. In production, Tower resolves it to a servable remote model via your configured provider.
For example, the following app code:
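The original snippet is reconstructed below as a sketch; it assumes that passing only the model family name to llms() is sufficient.

```python
import tower

# Request the "llama3.2" model family; Tower resolves the concrete model
# per environment (a local Ollama tag in dev, a provider identifier in prod).
llm = tower.llms("llama3.2")
```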
Will resolve as follows:
Dev (Ollama) → llama3.2:3b
Prod (Hugging Face Hub) → meta-llama/Llama-3.2-3B-Instruct
This is the power of automatic model name resolution in Tower: you can use simple, human-friendly model family names, and Tower seamlessly maps them to the correct local model in development or cloud model in production.
Why This Matters
These improvements make LLM inference in Tower:
Simpler – One interface for local and remote models
Flexible – Swap inference providers without changing code
Production-ready – Clean separation of dev and prod environments with secrets
Cost-efficient – Use local inference to save GPU hours during development
For data and AI teams, this means faster prototyping, smoother deployment, and less operational overhead.
Get Started
An example of this in action is our DeepSeek-Summarize-Github app. It demonstrates the local-to-cloud transition using DeepSeek R1, moving from local inference during prototyping to Hugging Face + Together.ai in production.
If you’re ready to supercharge your LLM-powered workflows in Tower:
Get an intro to Model Inference
Learn more about how Inference works
Explore our examples on GitHub
Sign Up for Tower Beta
With Tower’s improved LLM inference, the path from idea to production has never been smoother.