Advances in computing technologies — and in particular, in specialized hardware like GPUs and TPUs — have led to a boom in the development of deep neural networks. These powerful algorithms are the key to many successful applications in artificial intelligence, such as large language models and text-to-image generators. However, when it comes to deep neural networks, large never seems to be quite large enough. These models may have architectures composed of many billions of parameters, or even over a trillion. At some point, even the latest and greatest hardware will be brought to its knees by the continued expansion of these networks.
Yet many research papers have suggested that very large models are in fact crucial to deep learning, so trimming them back to fit within the bounds of available hardware resources is likely to stymie technological progress. A team led by researchers at Colorado State University has proposed a new type of deep learning architecture that allows more parameters to be handled with less hardware, opening the door to larger models without requiring corresponding advances in hardware.