VYROX builds a private Large Language Model on hardware you own - not a cloud server. 100% data privacy, full offline use, and zero monthly subscriptions. We size the GPU, install Ollama / vLLM / LM Studio with Qwen3.6, Kimi K2.6, GLM-5.1 or DeepSeek V4, tune it, and hand it over working.
Private. Offline-capable. No tokens metered. Runs on a single workstation in your office.
Engineered in Kuala Lumpur · deployed across Malaysia & Southeast Asia · MY · SG · TH · ID · PH · VN · BN
Founded 2015 · led by Ts. Dr. Leong Yee Rock - engineers who size, build & commission the system, then hand it over working. No black box, no lock-in.
Running a Large Language Model locally means the AI runs on your own computer or server - not a remote cloud like OpenAI or Anthropic. Your data never leaves the building.
Contracts, financials, source code and customer records are processed on a machine you own. Nothing is uploaded, logged by a vendor, or used to train someone else's model. PDPA-aligned by default.
No internet, no problem. The AI keeps running on the factory floor, in a clinic, at a remote site, or during an outage. No API downtime, no rate limits, no "service unavailable."
Cloud AI charges per user, per month, per token - forever. A local model is a one-time setup that then serves your whole team for the price of electricity. No per-seat licence.
A local model isn't a chatbot toy - it's the engine behind real systems your team uses daily. Filter by function; each shows the model + tool stack we'd deploy.
The same private model on the same hardware runs in three modes - along a spectrum of autonomy. Start with a chatbot for quick wins, add workflows for high-volume operations, then deploy agents for complex 24/7 work.
| At a glance | 💬 Chatbot | 🔀 Workflow | 🤖 Agentic |
|---|---|---|---|
| Autonomy | Low - human every turn | Medium - runs unattended | High - decides its own steps |
| Path | Free conversation | Fixed pipeline (DAG) | Planned at runtime |
| Determinism | n/a | High - repeatable | Lower - reasoned each run |
| Human oversight | Reads every answer | Reviews exceptions | Approves irreversible actions |
| Build effort | Days | 1-3 weeks | 3-8 weeks |
| Run cost (local) | Lowest | Low | Higher (many calls/run) |
| Best local models | Qwen3.6, Gemma 4, Llama | Qwen3.6, Qwen3-VL, Mistral | Kimi K2.6, GLM-5.1, DeepSeek V4, Qwen3.6 |
| Frameworks | Open WebUI, LibreChat | n8n, LangGraph, Flowise, Dify | LangGraph, CrewAI, AutoGen, OpenHands, MCP |
| Best for | Q&A, support, knowledge | High-volume repetitive ops | Complex multi-step, 24/7 automation |
The mature path: most clients start with a chatbot (value in days), automate their highest-volume process as a workflow, then add agents where the work is genuinely multi-step. All three run on the same local box - and VYROX engineers tool-calling, MCP connectors, guardrails and an eval harness so agents are reliable, not just impressive.
These three professions handle data that is legally confidential - patient records, privileged files, financial accounts. For them, local AI isn't just cheaper; it's often the only compliant option. Each is a co-pilot that keeps the licensed professional firmly in the loop.
⚕️ Decision-support only. A registered medical practitioner makes every clinical decision; the system does not diagnose or treat and is not a registered medical device. It assists documentation and retrieval, with the clinician reviewing all output.
📊 Assists; does not replace professional judgment. A qualified accountant reviews and signs off all output. Not tax or audit advice - it supports the work your licensed professional remains responsible for.
⚖️ Assists qualified legal professionals; it does not provide legal advice. The advocate & solicitor retains full professional responsibility and reviews all output before use.
A general model knows the public internet. It does not know your contracts, SOPs, pricing or customer history. RAG (Retrieval-Augmented Generation) grounds every answer in your own documents - with citations - and on a local deployment that data never leaves the building.
The best open models now land within a few points of frontier cloud on knowledge, science and coding benchmarks - and run on hardware you own. Scores are approximate, drawn from public leaderboards and model cards (mid-2026); harnesses differ, so treat ±2-3 points as noise.
| Model | Type | MMLU-Pro knowledge | GPQA science | SWE-bench coding |
|---|
On knowledge (MMLU-Pro ~84-90), graduate science (GPQA ~80) and coding (SWE-bench ~65-67), top open models match or beat older cloud models like GPT-4o - running privately on your hardware, with no per-token bill.
The hardest agentic, long-horizon tasks - large-repo autonomous coding, sustained multi-step planning - still favour the latest frontier Claude/Gemini/OpenAI. That's exactly what our hybrid setups route to the cloud, on demand.
For drafting, summarising, extraction, Q&A, translation and everyday coding - the bulk of business work - open models are more than good enough. You keep ~all the quality and stop paying for everything.
Now including the April-May 2026 wave - Qwen3.6, Kimi K2.6, GLM-5.1, DeepSeek V4, MiniMax M2.7 and Gemma 4. Every model below runs locally with Ollama, LM Studio or vLLM. Filter by task, sort by size, and see the VRAM each needs at Q4.
The practical accelerators for genuine local LLM hosting. Prices are indicative mid-2026 street estimates (an active DRAM/GPU shortage is inflating prices - we confirm exact figures at quote time).
| GPU | VRAM | Bandwidth | Approx RM | Biggest model @ Q4 | Class |
|---|---|---|---|---|---|
| Intel Arc B580 | 12 GB | 456 GB/s | RM 1.2k-1.6k | 7-8B (IPEX/Vulkan) | Budget |
| RTX 3060 | 12 GB | 360 GB/s | RM 1.3k-1.8k | 7-8B | Budget |
| Intel Arc A770 | 16 GB | 560 GB/s | RM 1.4k-1.9k | 13-14B | Budget |
| RTX 4060 Ti 16GB | 16 GB | 288 GB/s | RM 2.2k-2.8k | 13-14B | Budget |
| RTX 4070 | 12 GB | 504 GB/s | RM 2.8k-3.2k | 13B | Budget |
| RTX 5070 | 12 GB | 672 GB/s | RM 3k-3.7k | 13B | Consumer |
| RTX 5070 Ti | 16 GB | 896 GB/s | RM 4.2k-5.5k | 14B; GPT-OSS 20B | Consumer |
| RTX 3090 / Ti | 24 GB | 936 GB/s | RM 4.1k-6k (used) | 32B dense / 30B-A3B | Budget |
| RTX 4080 / Super | 16 GB | 717 GB/s | RM 4.6k-6k | 14B; GPT-OSS 20B | Consumer |
| RTX 4090 | 24 GB | ~1008 GB/s | RM 8.7k-12k | 32B dense / Gemma 3 27B | Prosumer |
| RTX 5080 | 16 GB | ~960 GB/s | RM 5.5k-8k | 14B; GPT-OSS 20B | Prosumer |
| RTX 5090 | 32 GB | 1792 GB/s | RM 14k+ | 32B comfortably; 70B Q3 tight | Prosumer |
| RTX PRO 5000 Blackwell | 48 / 72 GB | 1344 GB/s | RM 19k+ | 70B dense Q4 | Workstation |
| RTX 6000 Ada | 48 GB | 960 GB/s | RM 31k-37k | 70B dense Q4 | Workstation |
| AMD Radeon PRO W7900 | 48 GB | 864 GB/s | RM 16k-18k | 70B (ROCm) | Workstation |
| RTX PRO 6000 Blackwell | 96 GB | 1792 GB/s | RM 39k-44k | 120B-class on ONE card | Workstation |
| NVIDIA L4 | 24 GB | 300 GB/s | RM 11k+ | 32B (low-power, slow) | Datacenter |
| NVIDIA L40S | 48 GB | 864 GB/s | RM 32k-41k | 70B dense Q4 | Datacenter |
| A100 80GB | 80 GB | 2039 GB/s | RM 41k-69k | 120B-class; 235B (2×) | Datacenter |
| H100 | 80 GB | 3.35 TB/s | RM 115k-147k | 120B single; frontier multi | Datacenter |
| H200 | 141 GB | ~4.8 TB/s | RM 115k-161k | 235B single | Datacenter |
| B200 (Blackwell) | 180 GB | ~8.0 TB/s | RM 161k+ | 235B+ single; frontier | Datacenter |
| AMD Radeon 7900 XTX | 24 GB | 960 GB/s | RM 4.1k-5k | 32B (ROCm/Vulkan) | Budget |
| AMD Instinct MI300X | 192 GB | 5.3 TB/s | RM 46k-69k | 235B+ single GPU | Datacenter |
| AMD Instinct MI325X | 256 GB | 6.0 TB/s | RM 92k+ | 300B+ single GPU | Datacenter |
Multi-GPU note: consumer 40/50-series have no NVLink - GPUs talk over PCIe (tensor-parallel via vLLM, with overhead). 2× 24GB ≈ 48GB → 70B Q4; 4× 3090 ≈ 96GB → 120B-class. A single RTX PRO 6000 96GB often beats multi-GPU on simplicity and power.
| Chip | Max memory | Bandwidth | Product | Approx RM | Biggest model @ Q4 |
|---|---|---|---|---|---|
| M4 | 16-32 GB | 120 GB/s | Mac Mini / Air | RM 2.8k+ | 14B-30B-A3B |
| M4 Pro | 64 GB | 273 GB/s | Mac Mini Pro / MBP | RM 6.4k+ | 32B; 70B Q4 tight |
| M4 Max | 128 GB | 546 GB/s | Mac Studio / MBP 16 | RM 9.2k+ | 70B dense; GPT-OSS 120B |
| M5 Max NEW 2026 | 128 GB | ~546 GB/s | Mac Studio / MBP 16 | RM 10.5k+ | 70B dense; faster GPU + NPU, better tok/s |
| M3 Ultra | 256 GB | 819 GB/s | Mac Studio | RM 18.4k+ | 235B-class; Qwen3-235B Q4 |
Apple's unified memory lets the GPU address all RAM - a cheap path to huge models, limited by bandwidth not capacity. A 256GB M3 Ultra Mac Studio is the most popular single-box big-model machine. (Note: M4 Ultra was never released; Apple withdrew the 512GB option in 2026.)
| Device | Chip | Memory | What it runs | Approx RM |
|---|---|---|---|---|
| NVIDIA DGX Spark | GB10 Grace-Blackwell | 128 GB | up to ~200B Q4; CUDA-native dev box | RM 21.6k |
| Framework Desktop | Ryzen AI Max+ 395 | 128 GB unified | 70B, GPT-OSS 120B, Qwen3-235B Q4 | RM 9.2k-13k |
| GMKtec / Minisforum / Corsair | Ryzen AI Max+ 395 | 128 GB | same Strix Halo class | RM 11k-16k |
| Jetson Thor (AGX) | Blackwell edge | 128 GB | 70B-class at the edge | RM 16k |
| Jetson Orin Nano/AGX | Ampere edge | 8-64 GB | 7B-13B (robotics / CCTV) | RM 1.1k-9.2k |
DGX Spark vs Strix Halo: DGX Spark wins on CUDA software compatibility; Strix Halo wins on price (~half) and x86/Linux. Both are bandwidth-limited (~256-273 GB/s) - great for memory-heavy MoE inference, weaker on fast prefill.
| Accelerator | Local LLM? | Reality |
|---|---|---|
| Google Coral Edge TPU | ✗ No | Built for tiny vision CNNs. No DRAM, int8 only, no transformer/attention support - cannot run even a 1B LLM. |
| Google Cloud TPU (Trillium/Ironwood) | ⚠ Cloud-only | Powerful for training/serving via JAX, but rented hourly - not on-prem hardware you own. |
| Groq LPU | ⚠ Cloud | Ultra-low-latency inference as a cloud API; real deployments need racks. Not a consumer-local box. |
| Cerebras WSE-3 | ✗ No | Wafer-scale, $2M+ datacenter systems. Not local in any SME sense. |
| Hailo-8/10 NPU | ⚠ Niche | Edge vision / very small on-device models only. |
Takeaway: for genuine local LLM hosting, the practical accelerators are NVIDIA GPUs, AMD Instinct/Radeon, Apple Silicon, and unified-memory mini-PCs. We'll tell you honestly which fits - never sell you a Coral stick for an LLM.
| Tier | GPUs | Host | Use case | Approx RM |
|---|---|---|---|---|
| Entry rig | 2× RTX 3090/4090 | Threadripper, 128GB, 1500W | 70B Q4, small team | RM 23k-41k |
| Prosumer WS | 1-2× RTX PRO 6000 96GB | TR PRO, 256GB ECC | 120B single-box | RM 55k-100k |
| 4-GPU server | 4× L40S / RTX 6000 Ada | Dual EPYC, 512GB-1TB ECC, 4U | 235B Q4, multi-user vLLM | RM 184k-322k |
| 8-GPU HGX | 8× H100/H200 SXM | NVLink/NVSwitch, 2TB RAM, liquid | Frontier inference + training | RM 1.4M+ |
Engineering: 8× H100 ≈ 5.6kW (needs 3-phase power); 4U GPU servers are loud (liquid cooling above 4× SXM); EPYC/Threadripper PRO for PCIe 5.0 lanes; platinum/titanium PSUs with N+1 redundancy. VYROX sources, assembles, commissions and supports the whole node.
Choose a model, quantization and context length - we compute the memory and light up the hardware that fits.
Rule of thumb: Q4 VRAM (GB) ≈ params(B) × 0.6. MoE models size to total params for memory, but run at the speed of their active params. Always add KV-cache for long context.
| Model size | Q4_K_M | Q8 | FP16 | Example hardware (Q4) |
|---|---|---|---|---|
| 1-3B | ~1-2 GB | ~3 GB | ~6 GB | Any iGPU, phone, Jetson, 8GB GPU |
| 7-8B | ~5-6 GB | ~9 GB | ~16 GB | RTX 4060 8GB, M4 16GB |
| 13-14B | ~9-10 GB | ~16 GB | ~28 GB | RTX 4070/4080, M4 |
| 24-32B | ~16-20 GB | ~34 GB | ~65 GB | RTX 3090/4090/5090, M4 Pro |
| 70B dense | ~40-43 GB | ~75 GB | ~140 GB | 2× 3090/4090, RTX 6000 Ada, M4 Max |
| 120B MoE (GPT-OSS/Scout) | ~60-65 GB | ~120 GB | - | RTX PRO 6000 96GB, A100, M3 Ultra, DGX Spark |
| 235B MoE | ~135-145 GB | ~250 GB | - | 2× A100, MI300X, M3 Ultra 256GB |
| 671B (DeepSeek) | ~380-400 GB | ~700 GB | - | 8× H100/H200, multi-MI300X |
| ~1T MoE (Kimi K2.6 / DeepSeek V4) | ~550-880 GB | ~1-1.6 TB | - | 8× H200/B200 server |
Context cost: KV-cache grows with context length × layers. A 7-8B model adds ~0.5-1 GB per 8K tokens; a 70B at full 128K context can add 20-40GB+ - often the hidden cost that blows a VRAM budget. We size for weights plus your real context window.
Open weights don't all mean "free for business." And a multi-GPU box has real power, heat and uptime needs. We handle all of it - here's the honest picture.
| License | Models | Commercial use |
|---|---|---|
| Apache-2.0 | Qwen3 & Qwen3-Coder, GPT-OSS, Mistral Small 3.1 / Nemo / Devstral / Magistral, Gemma-adjacent | Free, permissive - yes |
| MIT | DeepSeek R1 / V3.1 (+ distills), Phi-4, GLM-4.5/4.6, bge-m3, Whisper | Free, permissive - yes |
| Gemma | Gemma 3 (1B-27B) | Yes, under Google's Gemma terms (prohibited-use policy applies) |
| Llama Community / Llama 4 | Llama 3.x, Llama 4 Scout / Maverick | Yes - but a special licence is required above 700M monthly active users |
| Mistral Research (MRL) | Mistral Large 2, Ministral 8B | Research/non-commercial weights - commercial needs a Mistral licence |
| MNPL (non-production) | Codestral 22B | Not for production without a commercial agreement |
VYROX defaults your build to permissively-licensed models (Apache/MIT) so you own your deployment outright - and flags any model with commercial restrictions before it's used.
| Tier | Peak draw | Heat output | Power / cooling | Noise |
|---|---|---|---|---|
| Desk AI | ~0.4-0.6 kW | ~2,000 BTU/h | Standard 13A wall socket, room air | Quiet desktop |
| Studio AI | ~0.6-0.9 kW | ~3,000 BTU/h | Dedicated 13-15A circuit, ventilated | Low-moderate |
| Engine AI | ~1.2-1.8 kW | ~6,000 BTU/h | 20A circuit, UPS, server cupboard / aircon | Server-grade fans |
| Rack AI (8-GPU) | ~6-7 kW | ~22,000 BTU/h | 3-phase power + room cooling / liquid, N+1 PSU | Loud - data-centre/room |
A Desk/Studio build sips a few ringgit of electricity a day. Engine tiers want a UPS and a ventilated cupboard. Rack tiers need real facilities - we assess your site and spec power, cooling and UPS as part of delivery.
Staff reach it on your LAN via a browser (Open WebUI) or an OpenAI-compatible API. Remote access over VPN; a reverse proxy + gateway handles auth, SSO and rate-limiting; vLLM load-balances many users across GPUs.
Models, the vector DB and configs are backed up and reproducible. For mission-critical use we add a warm spare, a cloud-failover line, or a second node so a single box is never a silent point of failure.
Hardware carries manufacturer warranty (typically 3 years on workstation/datacentre parts); we handle RMA. Optional managed-service tier adds monitoring, model upgrades and a same-business-day SLA. Leasing / instalment options available.
Pick a model and a machine - we'll stream sample text at the estimated tokens/sec so you can feel it.
Move the sliders to your team. Watch the cumulative cost diverge and find your break-even month.
Illustrative estimate. Cloud ≈ ChatGPT Team / Claude Team seats (~RM130-280/user/mo) at ≈ RM4.6/USD; local figure includes hardware + commissioning; electricity ~RM1-6/day. Your exact crossover is calculated in the free audit.
No single answer is right for everyone. Here's the comparison without the spin - including when cloud or hybrid is genuinely the better call.
| Dimension | Local / On-prem | Cloud API | Hybrid |
|---|---|---|---|
| Data privacy | Highest - never leaves you | Lowest - sent to a third party | High - sensitive stays local |
| Recurring cost | Low & fixed (power + support) | Variable - can balloon | Mixed - base + overflow |
| Upfront cost | Higher - hardware | Near zero | Moderate |
| Latency | Low, predictable (your LAN) | Internet + provider load | Depends on path |
| Offline operation | Yes | No | Partial |
| Frontier reasoning ceiling | Very good (hardware-capped) | Highest | Best of both |
| Scaling to spikes | Hardware-limited | Elastic | Burst to cloud |
| Maintenance | Managed by VYROX | Vendor-managed | Most complex |
For most SMEs handling private or regulated data with steady daily volume, local pays for itself in months and removes per-token billing risk. If cloud or hybrid fits you better, we'll tell you - and build that instead.
Complete, VYROX-commissioned systems - hardware sized, models loaded, runtime and agents wired in, staff trained.
How "users supported" is calculated: concurrent users = free VRAM after model weights ÷ KV-cache per session (≈ 2 × layers × kv-dim × context × precision), capped by GPU throughput ÷ a 15 tok/s per-user floor and by the runtime's parallel slots (Ollama ~8, vLLM many). "Team size" assumes typical intermittent office use (~1 active generation per 5 staff). Numbers shown are at ~8K context with FP16 KV-cache; Q8 KV-cache roughly doubles concurrency and shorter context increases it further. Use the cost configurator to model your exact model, context and concurrency.
Pick what you'll use AI for and your team - the recommendation, hardware, model and 5-year savings update instantly.
Two ways to use it: set how many concurrent users you need and leave hardware on Auto - we pick the cheapest build that serves them - or drive every variable yourself (LLM size, quantization, GPU, context, hours/day). The itemised build cost, capacity, payback and cloud savings update live. Indicative estimates; your exact quote comes from the audit.
Indicative estimate. Hardware ≈ Malaysia street pricing during the 2026 GPU/DRAM shortage; commissioning, electricity (TNB commercial ~RM0.50/kWh) and optional support included as shown. Cloud baseline ≈ Team-tier seats (+ agent API where selected). VYROX confirms exact specs, prices and a measured savings projection in the free audit.
A free 45-minute Local-AI Audit: your real cloud spend today, the right hardware, the models, and the costed payback - no obligation, no sales deck.
Built for the privacy-driven buyer. Choose a deployment mode, and layer on the controls your auditor expects.
The LLM runs on a server physically inside your office, factory or data centre. Reachable over your LAN/VPN; the public internet cannot touch it. Best balance of control and convenience.
No internet connection at all. Updates applied manually via controlled media. For the most sensitive environments - defence-adjacent, critical infrastructure, regulated health/finance.
Deployed inside your own cloud tenancy or a Malaysian data centre. You keep data residency and isolation, with cloud scalability - local-grade control without owning physical servers.
| PDPA alignment | Architected so personal data stays within your control and within Malaysia, supporting your PDPA 2010 obligations. |
| Data residency | You choose exactly where data physically lives - it can stay entirely on your premises / in Malaysia. |
| No third-party sharing | No prompts, documents or outputs are sent to any external AI provider. Full stop. |
| Role-based access (RBAC) | Users and teams only see the data and tools they're permitted to. |
| Audit logging | Every query and access is logged for traceability and incident review. |
| Encryption | At rest (documents, vector DB, model data on disk) and in transit (TLS across LAN/VPN). |
| Single Sign-On | Integrates with Microsoft Entra/Azure AD, Google Workspace, or LDAP. |
| ISO 27001-aligned process | We follow ISO 27001-aligned practices for access, change and key management during delivery. |
VYROX implements ISO 27001-aligned and PDPA-aligned controls and supports your compliance posture; formal certification of your organisation remains with you and your auditor.
The most polished, beginner-friendly graphical app. Browse, download and chat with models - no command line.
Developer favourite. One command - ollama run qwen3 - pulls and serves a model in the background.
Drop-in replacement for the OpenAI API. Point existing apps at your local server - text, audio and image, fully offline.
High-throughput serving for teams: tensor-parallel multi-GPU, batching, OpenAI-compatible endpoints.
| Runtime | Best for | Interface | Multi-GPU | Scale |
|---|---|---|---|---|
| Ollama | Easiest one-dev prototyping, any OS | CLI + API | Weak (1 GPU/req) | Single user |
| LM Studio | GUI-first desktop for non-CLI users | GUI + server | Limited | Solo → small team |
| vLLM | Production multi-user serving | Server + API | Strong (TP/PP) | Production team |
| llama.cpp | Edge / embedded / max format control | CLI + server | Yes (layers) | Single / edge |
| LocalAI | OpenAI-compatible API gateway over many backends | REST API | Via backend | Team / gateway |
| Open WebUI | Shared ChatGPT-style team interface | Web GUI | N/A (front-end) | Team chat |
Rule of thumb: solo + simplicity → Ollama / LM Studio · edge/embedded → llama.cpp · concurrent production serving → vLLM · one API over mixed backends → LocalAI · team chat UI → Open WebUI on top. VYROX picks and configures the right combination for your build.
A clear, six-phase engagement. Indicative timelines for a typical SME deployment; you get something concrete at every step.
We map use cases, review your data and infrastructure, and assess fit. You get: a written findings summary and an honest local-vs-cloud-vs-hybrid recommendation. No cost, no obligation.
We design models, hardware sizing, deployment mode, integrations and security controls. You get: a detailed spec and a fixed-price quote in RM with timeline.
We source and configure hardware (or provision your VPC), install the stack, and build your RAG/data pipeline. You get: a configured, tested system; procurement handled for you.
We deploy on your premises (or tenancy) and connect data sources, SSO and existing tools. You get: a live, integrated, security-hardened system.
We tune retrieval and prompts on your real data, run accuracy checks, and train your staff. You get: a validated system, trained users, and docs (EN/BM).
Monitoring, updates, model upgrades and a support line. You get: SLA-backed support and a roadmap for new use cases.
Fact: for everyday work - drafting, summarising, extraction, coding, internal Q&A - Qwen3.6, Kimi K2.6, GLM-5.1 and DeepSeek V4 run at quality very close to the big clouds. We build hybrids that call cloud only when it genuinely wins.
Fact: new open models ship monthly and are free. Swapping today's model for next year's best is a one-line change on the same hardware. Your rig gets smarter over time, for RM0.
Fact: every build ships with remote monitoring, free model upgrades, and a same-business-day SLA. Standard open-source, documented, your IT trained. No black box, no lock-in.
If your measured first-year savings don't beat the subscriptions it replaced, we re-tune the system at our cost until they do - and you keep the hardware either way. Every build also includes remote health monitoring, free model & runtime upgrades, and a same-business-day response SLA.
Book a free 45-minute Local-AI Audit. We measure your current cloud spend, spec the exact build, and give you the costed break-even date - in writing, no obligation.
No deck pitch. Just engineers sizing your build.