Server Architecture Changing After Six Decades?

Amar Kapadia

March 29, 2024

At the Partner Reception at GTC last week, NVIDIA CEO Jensen Huang stated that, with the advent of accelerated computing, computer architecture is changing for the first time since 1964.

I tend to agree. After 60 years, server architecture is changing from retrieval-based to generative for the first time. The diagram below captures my thinking, which is centered around the Human Machine Interface (HMI).

From the mid-1960s to the mid-1980s, the HMI was the CLI, and the “servers” were minicomputers and mainframes built by companies such as IBM, Digital Equipment Corporation, and HP using highly proprietary architectures. The focal point was the CPU. From the mid-1980s to the mid-2020s, the HMI of choice has been a GUI (largely based on Xerox PARC research) or REST APIs, which led to client-server computing and its variations, such as the current front-end↔back-end split. This era has been dominated by industry-standard servers with a CPU focus. The winner has been the x86 CPU and its ecosystem. Networking, memory, I/O, storage, and datacenters have undergone a tremendous renaissance during this era.

Moving forward, the interface will be GenAI. We will no longer interface with computers in highly structured ways to retrieve information; instead, we will use human-like communication based on dynamic generation of responses. Both input and output will be based on GenAI. After all, when we talk to humans, we don’t provide inputs through point-and-click screens and view outputs through dashboards. This era will be dominated by accelerated computing, where the winner will be the GPU and its ecosystem. This doesn’t mean the CPU disappears. In fact, the CPU will always be needed; it will simply take a back seat.

In my mind, there are three key tenets to this new architecture:

CPU, memory, I/O, networking, storage, and the datacenter all have to cater to the GPU and will change in fundamental ways
Utilization has to be 100% given the cost of GPUs; until now, utilization has not been a concern
Use of a completely greenfield technology stack

This new world creates massive new opportunities for us (Aarna):

The infra needs to be orchestrated and managed in new ways. In the NVIDIA context that could take the shape of DGX-as-a-service and MGX-as-a-service.
Workload orchestration and management will take a front-row seat given the utilization concern. This requires sophisticated techniques, such as bringing in secondary workloads on the same GPU cluster when the primary workload is easing up. The GPU owner may also need to sell off GPU capacity to aggregators as a “spot instance” during periods of underutilization.
Given our use of the Kubernetes-Nephio framework, greenfield is music to our ears. We don’t have to worry about VMs or bare metal instances based on old operating systems.
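
To make the utilization point concrete, here is a minimal sketch of the backfill idea: when the primary workload eases up, queued secondary workloads are placed on the idle GPUs, and whatever is still idle becomes capacity that could be offered as spot instances. The names (`Workload`, `backfill`) and the numbers are hypothetical and purely illustrative; a real scheduler would also have to deal with preemption, GPU topology, and pricing.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    gpus: int  # number of GPUs the workload needs


def backfill(total_gpus, primary_in_use, secondary_queue):
    """Greedily place queued secondary workloads on GPUs the primary is not using.

    Returns (placed workloads, GPUs left over for spot sale, utilization).
    Illustrative only: first-fit in queue order, no preemption or topology.
    """
    idle = total_gpus - primary_in_use
    if idle < 0:
        raise ValueError("primary workload exceeds cluster capacity")

    placed = []
    for workload in secondary_queue:
        if workload.gpus <= idle:  # fits in the remaining idle GPUs
            placed.append(workload)
            idle -= workload.gpus

    spot_gpus = idle  # still idle: candidate spot capacity
    utilization = (total_gpus - idle) / total_gpus
    return placed, spot_gpus, utilization


# Hypothetical 64-GPU cluster whose primary workload has dropped to 40 GPUs.
queue = [
    Workload("fine-tune", 16),
    Workload("batch-inference", 12),
    Workload("eval", 4),
]
placed, spot_gpus, utilization = backfill(64, 40, queue)
print([w.name for w in placed], spot_gpus, utilization)
```

In this example the 12-GPU job does not fit after the 16-GPU job is placed, so it stays queued, and the remaining idle GPUs are exactly the capacity the owner might sell to an aggregator.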

I’d love to hear your thoughts on these topics. Do reach out to me for a discussion.