Thesis: Quantized Training For Graph Neural Networks
Graph Neural Networks (GNNs) have gained significant attention in recent years due to their exceptional performance in various domains including social networks, recommender systems, and drug discovery. These models have demonstrated remarkable capabilities in capturing complex relationships and structures within geometric data. Modern applications of GNNs rely on large datasets, which are crucial for capturing the intricate relationships and structures present in complex real-world systems. However, training these models requires substantial memory and computational resources, and these requirements grow with input graph size.
Although there has been some work investigating quantization during inference, there has been little research into the quantized training of GNNs. In this project, the quantized training of GNNs is explored using fixed-point quantization and Microsoft Floating Point (MSFP). To achieve optimal results, a range of strategies is employed, including dynamic quantization. The results demonstrate that 8-bit fixed-point quantization can achieve model accuracies within 1% of baseline FP32-trained models, offering arithmetic and memory density increases of up to 7.7x and 4x respectively. These results can be replicated using MSFP quantization, offering arithmetic and memory density increases of up to 18.3x and 5.8x respectively. By employing dynamic quantization schemes built on MSFP, accuracies within 1% of baseline 32-bit floating point are obtained, with potential gains in arithmetic and memory density of up to 26x and 6.4x respectively.
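To make the two number formats concrete, the following is a minimal illustrative sketch of fixed-point quantization and an MSFP-style block floating point scheme (in which a block of values shares a single exponent while each element keeps a narrow mantissa). The function names, parameter choices, and rounding details here are assumptions for illustration only and are not taken from the project's actual implementation.

```python
import numpy as np


def quantize_fixed_point(x, total_bits=8, frac_bits=4):
    """Simulate signed fixed-point quantization (illustrative sketch).

    Values are scaled by 2**frac_bits, rounded to integers, clipped to
    the representable signed range, then rescaled back to floats.
    """
    scale = 2.0 ** frac_bits
    qmin = -(2 ** (total_bits - 1))
    qmax = 2 ** (total_bits - 1) - 1
    q = np.clip(np.round(np.asarray(x, dtype=np.float64) * scale), qmin, qmax)
    return q / scale


def quantize_msfp_block(x, mantissa_bits=4, block_size=16):
    """Simulate MSFP-style block floating point (illustrative sketch).

    Each block of `block_size` values shares one exponent, derived from
    the block's largest magnitude; each element keeps `mantissa_bits`
    bits of mantissa plus a sign.
    """
    x = np.asarray(x, dtype=np.float64)
    out = np.empty_like(x)
    flat, res = x.ravel(), out.ravel()
    for start in range(0, flat.size, block_size):
        block = flat[start:start + block_size]
        max_mag = np.max(np.abs(block))
        if max_mag == 0:
            res[start:start + block_size] = 0.0
            continue
        # Shared exponent chosen so the largest value fits the mantissa range.
        shared_exp = np.floor(np.log2(max_mag))
        scale = 2.0 ** (shared_exp - (mantissa_bits - 1))
        lim = 2 ** mantissa_bits - 1
        q = np.clip(np.round(block / scale), -lim, lim)
        res[start:start + block_size] = q * scale
    return out
```

A dynamic quantization scheme, as referenced above, would typically recompute the scale or shared exponent from the live tensor statistics at each training step rather than fixing them in advance; the per-block maximum used here is one simple way to do that.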
Please see the attached slides for a high-level overview and the repository for full access to the code.