
If you’ve ever tried to integrate Large Language Model (LLM) inference into your application, you’ve likely run into some recurring headaches.
Developers face issues like:
Inconsistent model names across inference providers (e.g., llama3.2:3b in Ollama vs. meta-llama/Llama-3.2-3B-Instruct on Hugging Face).
Difficulty switching between inference providers without changing app code.
Local vs. cloud mismatch, making development and production environments behave differently.
Cost and latency issues with serverless inference: it can be expensive or slow during development.
Hardware limitations with local inference: large, accurate models don’t fit on consumer hardware.
Today we’re thrilled to announce new LLM inference capabilities in Tower, designed to eliminate the friction in moving from prototype to production.
With Tower, you can:
Prototype locally with Ollama to avoid cloud costs and latency.
Deploy seamlessly to serverless inference via Hugging Face Hub, Together.ai, or other providers.
Use the same code for dev and prod, thanks to smart model name resolution and environment-based secrets.
Unified LLM Interface
The heart of this improvement is the Llm class, instantiated by the llms() helper function, which now gives developers a single, unified interface for all LLM inference tasks.
Whether you are testing inference on a small model locally, powered by Ollama, or routing inference requests via Hugging Face Hub to specialized providers like Together.ai, your Tower code stays the same.
Here’s all it takes to create an LLM in Tower:
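The snippet below is a minimal sketch rather than verbatim SDK documentation; it assumes the llms() helper is exposed directly on the tower package and takes the model family name as its argument.

```python
# Create an LLM handle by model family name.
# Assumption: llms() is available on the tower package; the exact import
# path and signature may differ slightly in your SDK version.
import tower

llm = tower.llms("llama3.2")
```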
From here, you can use the familiar chat-based completion interface to interact with the LLM:
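Here is a hedged sketch of what that interaction could look like. The message format follows the common chat-completion convention, and the method name complete_chat is an assumption for illustration, not the confirmed Tower API.

```python
# Chat-style completion against the LLM handle created above.
# Assumption: the method is named complete_chat and accepts a list of
# role/content messages; adjust to the actual Tower SDK method.
messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Explain what Tower's llms() helper does in one sentence."},
]

response = llm.complete_chat(messages)
print(response)
```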
From Local Prototyping to Cloud Production
A common challenge for developers is moving from local experiments to production deployments. With Tower’s improved LLM inference support, this transition is frictionless.
Local development is powered by Ollama, making it easy to iterate without incurring cloud costs or hitting API rate limits:
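For example, here is a sketch that assumes an Ollama server is running on your machine with a small model already pulled (e.g. by running ollama pull llama3.2:3b), and that the llms() helper shown earlier is the relevant SDK call:

```python
# Local run sketch: with Ollama serving on this machine and a small model
# pulled (e.g. ollama pull llama3.2:3b), the same app code from above
# resolves the "llama3.2" family to the local llama3.2:3b model.
import tower

llm = tower.llms("llama3.2")  # dev: served by local Ollama, no cloud calls
```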
Once you’re ready for production, switch to remote inference in the Tower cloud; no app code changes are required. Simply configure inference secrets for Hugging Face Hub in your Tower environment.
Tower will automatically direct requests to the right environment, letting you develop locally and deploy remotely with confidence.
Smarter Model Name Resolution
One of the most frustrating aspects of working with multiple LLM providers is model name inconsistency. Tower now abstracts away this complexity.
You can specify:
Model families like "deepseek-r1" or "llama3.2"
Exact model identifiers if you need a specific variant
During development, Tower resolves the model family to any locally available model version. In production, Tower resolves it to a servable remote model via your configured provider.
For example, the following app code:
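The original snippet is reconstructed below as a sketch; it assumes that passing only the model family name to llms() is sufficient.

```python
import tower

# Request the "llama3.2" model family; Tower resolves the concrete model
# per environment (a local Ollama tag in dev, a provider identifier in prod).
llm = tower.llms("llama3.2")
```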
Will resolve as follows:
Dev (Ollama) → llama3.2:3b
Prod (Hugging Face Hub) → meta-llama/Llama-3.2-3B-Instruct
This is the power of automatic model name resolution in Tower: you can use simple, human-friendly model family names, and Tower seamlessly maps them to the correct local model in development or cloud model in production.
Why This Matters
These improvements make LLM inference in Tower:
Simpler – One interface for local and remote models
Flexible – Swap inference providers without changing code
Production-ready – Clean separation of dev and prod environments with secrets
Cost-efficient – Use local inference to save GPU hours during development
For data and AI teams, this means faster prototyping, smoother deployment, and less operational overhead.
Get Started
An example of this in action is our DeepSeek-Summarize-Github app. It demonstrates the local-to-cloud transition using DeepSeek R1, moving from local inference during prototyping to Hugging Face + Together.ai in production.
If you’re ready to supercharge your LLM-powered workflows in Tower:
Get an intro to Model Inference
Learn more about how Inference works
Explore our examples on GitHub
Sign Up for Tower Beta
With Tower’s improved LLM inference, the path from idea to production has never been smoother.