The Energy Bill Nobody Talks About
Running AI workloads is expensive. Most organisations focus on the compute costs showing up in their cloud invoices, but there is a parallel cost that gets far less attention: the carbon emissions and energy consumption baked into every training run, every inference call, and every under-utilised GPU ticking over at 15% utilisation while it waits for the next job.
Australian data centres consumed roughly 4-5% of the country's electricity in recent years, and that share is climbing as AI adoption accelerates. If your organisation is running significant machine learning workloads - whether on-premises, in a hyperscaler environment, or through a managed hosting provider - you almost certainly have room to reduce both your energy spend and your emissions without sacrificing performance.
Green AI data centre management is not about virtue signalling. It is about operational discipline applied to a resource that most engineering teams treat as essentially free.
Why AI Workloads Are Particularly Energy-Intensive
Standard web applications have relatively predictable, modest compute profiles. AI workloads are different in three ways that make energy management harder.
Spiky, unpredictable demand. A training job might run for six hours at near-100% GPU utilisation, then the cluster sits idle overnight. Batch inference pipelines often follow similar patterns. This creates situations where you are paying for - and generating emissions from - hardware that is not doing useful work.
Memory bandwidth bottlenecks. At inference time, modern large language models and vision transformers are frequently memory-bandwidth-constrained rather than compute-constrained: the GPU draws close to full power while waiting for data to arrive from memory, which is thermally and energetically inefficient.
Cooling amplification. Every watt of compute generates heat that requires cooling to remove. In a typical data centre with a Power Usage Effectiveness (PUE) ratio of 1.4, for every 1 kW of IT load you are actually consuming 1.4 kW total. High-density AI racks can push PUE-equivalent overhead higher if cooling infrastructure was not designed for them.
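The overhead implied by a PUE figure is simple arithmetic, and it is worth making explicit when sizing racks. A minimal illustrative sketch (the function name and example figures are ours, not from any standard):

```python
def total_facility_power_kw(it_load_kw: float, pue: float) -> float:
    """Total facility power for a given IT load.

    PUE = total facility power / IT equipment power,
    so total = IT load * PUE.
    """
    return it_load_kw * pue

# A 100 kW AI rack in a facility with PUE 1.4 draws 140 kW in total;
# the 40 kW difference is cooling and other non-IT overhead.
overhead_kw = total_facility_power_kw(100, 1.4) - 100
```

Every watt saved at the workload level therefore saves roughly 1.4 watts at the meter in such a facility.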
Understanding these characteristics is the starting point for any serious green AI data centre strategy.
Intelligent Workload Scheduling: The Biggest Lever You Have
The single highest-impact change most organisations can make is shifting non-time-sensitive workloads to run when energy is cheaper, cleaner, or both.
Time-shifting to low-carbon windows
Renewable penetration in Australia's National Electricity Market (NEM) varies significantly throughout the day. In South Australia, and increasingly in Victoria and Queensland, midday solar generation frequently pushes grid emissions intensity below 200 gCO2e/kWh. Overnight, when solar drops off and demand is lower, the mix shifts depending on state and season.
AEMO's public data feeds and third-party services such as Electricity Maps provide real-time and forecast grid intensity data. You can feed this into a job scheduler - whether that is a custom script wrapping Slurm, a Kubernetes-based system like Volcano or Kueue, or a managed ML platform - to preferentially queue large training runs into low-intensity windows.
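The scheduling gate itself can be very simple. The sketch below assumes you already have some callable that returns a forecast intensity figure (here a hypothetical `get_forecast_g_per_kwh`); it blocks a job until intensity drops below a threshold, with a deadline fallback so the job still runs on time regardless:

```python
import time

def wait_for_low_carbon_window(
    get_forecast_g_per_kwh,          # callable returning forecast grid intensity
    threshold_g_per_kwh: float = 350.0,
    deadline_s: float = 12 * 3600,   # run anyway once this much time has passed
    poll_interval_s: float = 900,
) -> bool:
    """Block until forecast grid intensity drops below the threshold.

    Returns True if a low-carbon window was found, False if the deadline
    forced the job to proceed regardless.
    """
    start = time.monotonic()
    while time.monotonic() - start < deadline_s:
        if get_forecast_g_per_kwh() <= threshold_g_per_kwh:
            return True
        time.sleep(poll_interval_s)
    return False
```

In production you would wrap this around your job submission step (e.g. before `sbatch` or a Kueue admission), but the deadline fallback is the important design choice: emissions optimisation should never be allowed to break a delivery SLA.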
Concrete example: A Sydney-based financial services firm running weekly model retraining jobs moved their pipeline from a fixed 11pm Tuesday schedule to a dynamic scheduler that targets periods when the NSW grid intensity forecast drops below 350 gCO2e/kWh. The jobs still complete within a 12-hour window, but average emissions per training run dropped by approximately 22% with zero change to the underlying model or infrastructure.
Right-sizing GPU allocation
Most ML engineers over-request GPU resources to avoid job failures. This is rational from a reliability standpoint but wasteful from an energy standpoint. Implementing a profiling step before resource allocation - using tools like NVIDIA's Nsight Systems, or even simple utilisation logging via nvidia-smi - lets you match GPU count and memory allocation to actual job requirements.
A job that genuinely needs 4 x A100s should get them. A job that historically runs at 30% GPU utilisation on 4 cards should be re-examined before you provision it that way again.
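The simple nvidia-smi approach can be sketched in a few lines. This assumes you periodically capture the output of `nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits` (one percentage per GPU, one line each); the helper names and the 50% threshold are illustrative:

```python
def parse_gpu_utilisation(csv_output: str) -> list[int]:
    """Parse one sample of per-GPU utilisation percentages from
    nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits
    """
    return [int(line.strip()) for line in csv_output.strip().splitlines() if line.strip()]

def flag_underutilised(samples: list[list[int]], threshold: int = 50) -> list[int]:
    """Return indices of GPUs whose mean utilisation across all samples
    falls below the threshold - candidates for a smaller allocation next run.
    """
    n_gpus = len(samples[0])
    means = [sum(s[i] for s in samples) / len(samples) for i in range(n_gpus)]
    return [i for i, m in enumerate(means) if m < threshold]
```

Logging a sample every minute for the duration of a job costs essentially nothing and gives you the historical utilisation evidence to justify a smaller request next time.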
Optimising Model Serving Infrastructure
Training gets most of the attention, but inference is where the ongoing energy cost lives for production systems. A model that is queried millions of times per day has a very different energy profile to a weekly training run.
Batching and request coalescing
Serving individual inference requests one at a time is energy-inefficient. Batching multiple requests together allows the GPU to do more useful work per watt. Most production serving frameworks - TorchServe, Triton Inference Server, vLLM for language models - support dynamic batching. If you are not using it, you are leaving efficiency on the table.
The trade-off is latency. For interactive applications, you need to tune batch window sizes carefully. For asynchronous workloads like document processing pipelines, aggressive batching is almost always the right call.
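The batching trade-off is easiest to see in a toy collector. The sketch below (our own illustration, not the internals of any particular serving framework) drains requests until either a maximum batch size or a time window is hit, which is the essential mechanic behind dynamic batching:

```python
import time
from collections import deque

def collect_batch(queue: deque, max_batch: int, window_s: float) -> list:
    """Drain up to max_batch requests, waiting at most window_s for more.

    A larger window yields bigger batches (more useful work per watt) at
    the cost of added latency for the first request in the batch.
    """
    batch = []
    deadline = time.monotonic() + window_s
    while len(batch) < max_batch:
        if queue:
            batch.append(queue.popleft())
        elif time.monotonic() >= deadline:
            break
        else:
            time.sleep(0.001)  # queue empty: wait briefly for more arrivals
    return batch
```

Interactive endpoints typically run this with windows of a few milliseconds; asynchronous pipelines can afford windows of seconds and correspondingly larger batches.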
Model quantisation and distillation
Running a full-precision FP32 model when an INT8 or FP16 quantised version delivers acceptable accuracy is wasteful. Quantisation typically reduces memory bandwidth requirements by 50-75%, which translates directly to lower energy consumption per inference.
For organisations running proprietary models, knowledge distillation - training a smaller student model to replicate the behaviour of a larger teacher - can reduce serving costs by an order of magnitude for well-scoped tasks. A distilled model handling a specific classification task will almost always be more energy-efficient than routing that task through a general-purpose large model.
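The bandwidth arithmetic behind those quantisation figures is worth spelling out. As a first-order sketch (assuming every weight is read once per forward pass, which is roughly the situation in bandwidth-bound autoregressive decoding; the figures are illustrative):

```python
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}

def weight_bytes(n_params: int, dtype: str) -> int:
    """Bytes of weight traffic per forward pass, assuming each
    parameter is read from memory once."""
    return n_params * BYTES_PER_PARAM[dtype]

# A 7B-parameter model moves 28 GB of weights per token in FP32
# but only 7 GB in INT8 - a 75% reduction in memory traffic.
reduction = 1 - weight_bytes(7_000_000_000, "int8") / weight_bytes(7_000_000_000, "fp32")
```

Because energy per inference in the bandwidth-bound regime tracks bytes moved, that 50% (FP16) to 75% (INT8) traffic reduction is where the energy saving comes from.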
Choosing and Evaluating Your Hosting Provider
If you are using cloud or colocation infrastructure, your provider's sustainability commitments and physical infrastructure choices matter as much as your own operational practices.
Questions to ask your hosting provider
- What is the current PUE of the facility your workloads run in? Anything above 1.5 is worth scrutinising.
- What percentage of energy consumption is matched with renewable energy certificates (RECs) or power purchase agreements (PPAs)? RECs and PPAs are not the same thing - RECs can be purchased separately from actual generation, while PPAs represent a direct relationship with a renewable generator.
- Does the provider offer carbon-aware routing or scheduling tools? AWS, Google Cloud, and Azure all have some version of this, though the granularity varies significantly.
- Where are the facilities physically located? Facilities in cooler climates or locations with access to hydro or geothermal power have structural efficiency advantages.
Australian organisations should also be aware of the federal government's Safeguard Mechanism reforms, which are progressively tightening emissions baselines for large facilities. If your data centre operations fall under this mechanism, or if your provider's operations do, this has direct compliance implications that are worth tracking.
Measuring What You Actually Emit
You cannot manage what you do not measure. Most organisations have surprisingly poor visibility into the actual emissions associated with their AI workloads.
Building a workload-level emissions model
A practical approach involves three data inputs:
- Energy consumption per workload - GPU hours consumed, multiplied by the power draw profile of the hardware. Cloud providers increasingly expose this through billing APIs or sustainability dashboards.
- Grid emissions intensity - The average or marginal emissions intensity of the grid at the time and location the workload ran.
- Upstream infrastructure emissions - The embodied carbon in hardware manufacturing and the operational emissions of cooling and ancillary systems, typically represented through the PUE multiplier.
Multiplying these together gives you a reasonable estimate of Scope 2 emissions (and some Scope 3 upstream emissions) attributable to specific workloads.
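The three-input model above reduces to one multiplication. A minimal sketch (function name, default PUE, and example figures are ours; substitute measured power draw and your provider's actual PUE where available):

```python
def workload_emissions_kg(
    gpu_hours: float,
    avg_gpu_power_kw: float,        # measured draw per GPU, or TDP-derived
    grid_intensity_g_per_kwh: float,
    pue: float = 1.4,
) -> float:
    """Estimate Scope 2 emissions attributable to a workload.

    energy (kWh)       = GPU hours * power per GPU * PUE overhead
    emissions (kg CO2e) = energy * grid intensity / 1000
    """
    energy_kwh = gpu_hours * avg_gpu_power_kw * pue
    return energy_kwh * grid_intensity_g_per_kwh / 1000.0

# e.g. a 6-hour job on 4 GPUs drawing 0.3 kW each (24 GPU hours),
# PUE 1.4, grid at 350 gCO2e/kWh -> roughly 3.5 kg CO2e.
estimate = workload_emissions_kg(24, 0.3, 350)
```

This deliberately ignores embodied hardware carbon, which needs a separate amortisation model, but it is enough to compare workloads against each other and track trends over time.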
Tools like CodeCarbon (open source, Python library) can instrument training runs directly. For inference workloads, you will likely need to build something custom or use your cloud provider's emissions reporting tools combined with your own grid intensity data.
Setting meaningful targets
Once you have baseline measurements, you can set targets that are grounded in operational reality rather than aspirational statements. A reasonable starting point for most organisations is a 20-30% reduction in emissions intensity per unit of useful work over 12 months, achievable through the scheduling and optimisation steps described above without significant infrastructure investment.
Practical Priorities for Australian Organisations
Given Australia's specific grid characteristics, regulatory environment, and typical hosting options, the following prioritisation makes sense for most organisations:
- Start with scheduling. The NEM's renewable variability makes time-shifting genuinely impactful, more so than in grids with flatter emissions profiles.
- Audit your inference serving setup. Batching and quantisation are low-risk, high-return changes that do not require new infrastructure.
- Engage your hosting provider on PUE and energy sourcing. Australian colocation providers vary significantly in their sustainability credentials, and the conversation is worth having before contract renewal.
- Instrument your workloads. Even rough emissions estimates are better than none. You need a baseline to improve against.
- Avoid carbon offsets as a primary strategy. Offsets have their place, but they should not substitute for operational efficiency improvements. Reducing actual consumption is always preferable.
What to Do Next
If you are running AI workloads at any meaningful scale and have not yet done a systematic review of your energy consumption and emissions profile, that is the logical starting point.
At Exponential Tech, we work with Australian organisations to audit their existing AI infrastructure, identify the highest-impact efficiency opportunities, and implement workload management practices that reduce both cost and emissions. This is not a theoretical exercise - the changes described in this article are operational, measurable, and achievable with existing tooling.
The green AI data centre conversation in Australia is moving from optional to expected, driven by both regulatory pressure and genuine cost incentives. Organisations that build operational discipline around energy efficiency now will be better positioned as those pressures increase.
If you want to understand where your AI workloads stand today, get in touch with the Exponential Tech team to discuss a workload emissions audit.