Berserq's AI Artist combines multiple AI models to generate images from text. At its core is a latent diffusion model, which tackles the high dimensionality of image pixel space by working in a compressed latent space instead. The latent diffusion model also reduces the resources needed during the training phase and avoids the large number of parameters typically required by GANs and autoregressive models.
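To give a feel for the dimensionality gap mentioned above, the short sketch below counts the scalar values in a pixel-space image and in a latent tensor. The image size (512 x 512 x 3), the downsampling factor (8), and the number of latent channels (4) are assumptions chosen for illustration only, not Berserq's actual configuration.

```python
# Illustrative arithmetic only: all sizes below are assumed, not Berserq's real settings.

def num_values(height, width, channels):
    """Number of scalar values in an image-like tensor."""
    return height * width * channels

def latent_shape(height, width, downsample_factor, latent_channels):
    """Shape of the latent tensor when the encoder downsamples by a fixed factor."""
    return (height // downsample_factor, width // downsample_factor, latent_channels)

# Assumed RGB image of 512 x 512 pixels.
pixel_dims = num_values(512, 512, 3)

# Assumed encoder: downsamples by 8 in each spatial direction, 4 latent channels.
latent_dims = num_values(*latent_shape(512, 512, 8, 4))

# How many times fewer values the diffusion model has to handle.
compression = pixel_dims / latent_dims

print(pixel_dims, latent_dims, compression)
```

Under these assumed sizes the diffusion model operates on far fewer values per image than a pixel-space model would, which is where the training savings come from.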
First, an autoencoder is pretrained to learn a compact latent space that is perceptually equivalent to pixel space. The encoder outputs a spatial grid (a tensor) of latent codes. The latent embedding is regularised with a vector quantisation layer within the decoder.
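The vector quantisation step can be sketched as a nearest-neighbour lookup: each latent vector is replaced by the closest entry in a learned codebook. The sketch below is a minimal illustration under stated assumptions; the codebook, the latent vectors, and the function names `squared_distance` and `quantise` are hypothetical and are not taken from Berserq's implementation.

```python
# Minimal sketch of vector quantisation (hypothetical names and toy values).

def squared_distance(a, b):
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quantise(latents, codebook):
    """Replace each latent vector with its nearest codebook entry.

    Returns the chosen codebook indices and the quantised vectors.
    """
    indices = []
    quantised = []
    for vec in latents:
        idx = min(range(len(codebook)),
                  key=lambda k: squared_distance(vec, codebook[k]))
        indices.append(idx)
        quantised.append(codebook[idx])
    return indices, quantised

# Toy codebook with three 2-dimensional entries (assumed values).
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]

# Three latent vectors, as if read from three positions of the latent grid.
latents = [[0.9, 1.2], [0.1, -0.2], [-0.8, 0.7]]

indices, quantised = quantise(latents, codebook)
print(indices)
print(quantised)
```

In a trained model the codebook would be learned jointly with the autoencoder; here it is fixed by hand so that the lookup itself is easy to follow.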
A diffusion model is then trained inside the latent space of the autoencoder, which is more computationally efficient than training in pixel space. The diffusion model sits between the convolutional encoder and decoder. The denoising model is a U-Net (a convolutional neural network) that predicts the noise that was added to the latent codes in the preceding step of the diffusion process.
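The noise-prediction idea can be illustrated with a small sketch of the forward (noising) step and the training loss. It assumes a linear beta schedule with commonly used default values and treats the latent codes as a flat list of numbers. No U-Net is included: a perfect noise prediction stands in for the network's output, so the code shows only what the denoiser is trained to match. All function names are hypothetical.

```python
import math
import random

def alpha_bar_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for an assumed linear beta schedule."""
    alpha_bars = []
    running = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(z0, noise, alpha_bar):
    """Forward process: z_t = sqrt(alpha_bar) * z0 + sqrt(1 - alpha_bar) * noise."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * z + b * e for z, e in zip(z0, noise)]

def noise_prediction_loss(predicted, actual):
    """Mean squared error between the predicted noise and the noise actually added."""
    return sum((p - q) ** 2 for p, q in zip(predicted, actual)) / len(actual)

def recover_z0(zt, predicted_noise, alpha_bar):
    """Invert the forward step, given a noise prediction."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(z - b * e) / a for z, e in zip(zt, predicted_noise)]

rng = random.Random(0)
alpha_bars = alpha_bar_schedule(1000)

z0 = [0.5, -1.0, 0.25, 2.0]                  # toy latent codes (assumed values)
noise = [rng.gauss(0.0, 1.0) for _ in z0]    # Gaussian noise to be added
zt = add_noise(z0, noise, alpha_bars[500])   # noised latents at step 500

# Stand-in for the U-Net: a perfect prediction equals the true noise.
predicted_noise = list(noise)
loss = noise_prediction_loss(predicted_noise, noise)
z0_hat = recover_z0(zt, predicted_noise, alpha_bars[500])
```

With a perfect prediction the loss is zero and the clean latent codes are recovered exactly (up to floating-point error); a trained U-Net only approximates this, which is why sampling proceeds over many small denoising steps.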
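Finally, the generative process is conditioned on the text prompt via cross-attention, which operates on flattened features from the intermediate layers of the U-Net. The decoder then maps the denoised latent codes back to pixel space to create the output images.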
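The cross-attention step can be sketched in a few lines. The sketch assumes the arrangement commonly used in latent diffusion models: queries come from the flattened U-Net features, while keys and values come from the text embeddings. The toy feature map, the text embeddings, and the identity projection matrices are all hypothetical values chosen to keep the example readable.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def flatten(feature_map):
    """Flatten an H x W grid of channel vectors into a list of H*W vectors."""
    return [cell for row in feature_map for cell in row]

def cross_attention(image_features, text_embeddings, w_q, w_k, w_v):
    """Scaled dot-product attention: image positions attend to text tokens."""
    queries = matmul(image_features, w_q)
    keys = matmul(text_embeddings, w_k)
    values = matmul(text_embeddings, w_v)
    scale = math.sqrt(len(queries[0]))
    scores = matmul(queries, transpose(keys))
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, values), weights

# Toy 2 x 2 feature map with 2 channels per position (assumed values).
feature_map = [[[1.0, 0.0], [0.0, 1.0]],
               [[1.0, 1.0], [0.5, -0.5]]]

# Toy embeddings for three text tokens, 2 dimensions each (assumed values).
text_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Identity projections stand in for learned weight matrices.
identity = [[1.0, 0.0], [0.0, 1.0]]

image_features = flatten(feature_map)
attended, weights = cross_attention(image_features, text_embeddings,
                                    identity, identity, identity)
```

Each row of `weights` describes how strongly one image position attends to each text token, and `attended` has one output vector per image position, so it can be reshaped back into the feature map's spatial layout.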
One challenge for Berserq's AI Artist is to avoid exacerbating societal biases. This is addressed by ensuring that the training data is diverse and representative. One of the datasets used is the non-curated LAION-400M dataset.