When high-dimensional text is adopted as input, images generated by the previously proposed deep convolutional generative adversarial network (DCGAN) model often suffer from distortion and structural degradation due to the sparsity of the text, which seriously degrades generative performance. To address this issue, an improved deep convolutional generative adversarial network model, CA-DCGAN, is proposed. Technically, a deep convolutional network and a recurrent text encoder are employed jointly to encode the input text and obtain the corresponding text embedding. A conditioning augmentation (CA) module is then introduced to generate an additional conditioning variable that replaces the original high-dimensional text feature. Finally, the conditioning variable and random noise are combined as the input to the generator. Meanwhile, to avoid over-fitting and promote convergence, a KL regularization term is introduced into the generator's loss. Moreover, a spectral normalization (SN) layer is adopted in the discriminator to prevent the mode collapse caused by unbalanced training resulting from the discriminator's overly fast gradient descent. Experimental results on the Oxford-102-flowers and CUB-200 datasets show that the proposed model generates higher-quality images than alignDRAW, GAN-CLS, GAN-INT-CLS, StackGAN (64×64), and StackGAN-v1 (64×64): the inception score improves by 10.9%–41.4% on Oxford-102-flowers and 5.6%–37.5% on CUB-200, while the FID decreases by 11.4%–43.9% and 8.4%–42.5% respectively, further validating the effectiveness of the proposed method.
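
To make the described pipeline concrete, the following is a minimal PyTorch sketch of the conditioning augmentation step with its KL regularizer, a DCGAN-style generator fed with the concatenated noise and conditioning variable, and a discriminator whose convolutional layers use spectral normalization. All module names, layer dimensions, and the KL weight are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConditioningAugmentation(nn.Module):
    """Map a high-dimensional text embedding to a low-dimensional Gaussian
    conditioning variable c ~ N(mu(e), diag(sigma(e)^2))."""

    def __init__(self, embed_dim=1024, cond_dim=128):
        super().__init__()
        self.fc = nn.Linear(embed_dim, cond_dim * 2)  # predicts mu and log-variance

    def forward(self, text_embedding):
        mu, logvar = self.fc(text_embedding).chunk(2, dim=1)
        std = torch.exp(0.5 * logvar)
        c = mu + std * torch.randn_like(std)          # reparameterization trick
        # KL(N(mu, sigma^2) || N(0, I)), added to the generator loss as a regularizer
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return c, kl


class Generator(nn.Module):
    """DCGAN-style generator fed with [noise, conditioning variable]."""

    def __init__(self, noise_dim=100, cond_dim=128, feat=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(noise_dim + cond_dim, feat * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(feat * 8), nn.ReLU(True),
            nn.ConvTranspose2d(feat * 8, feat * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(feat * 4), nn.ReLU(True),
            nn.ConvTranspose2d(feat * 4, feat * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(feat * 2), nn.ReLU(True),
            nn.ConvTranspose2d(feat * 2, feat, 4, 2, 1, bias=False),
            nn.BatchNorm2d(feat), nn.ReLU(True),
            nn.ConvTranspose2d(feat, 3, 4, 2, 1, bias=False),
            nn.Tanh(),                                 # 64x64 RGB output
        )

    def forward(self, z, c):
        x = torch.cat([z, c], dim=1).unsqueeze(-1).unsqueeze(-1)
        return self.net(x)


class Discriminator(nn.Module):
    """DCGAN-style discriminator with spectral normalization on every conv layer."""

    def __init__(self, feat=64):
        super().__init__()
        sn = nn.utils.spectral_norm
        self.net = nn.Sequential(
            sn(nn.Conv2d(3, feat, 4, 2, 1)), nn.LeakyReLU(0.2, True),
            sn(nn.Conv2d(feat, feat * 2, 4, 2, 1)), nn.LeakyReLU(0.2, True),
            sn(nn.Conv2d(feat * 2, feat * 4, 4, 2, 1)), nn.LeakyReLU(0.2, True),
            sn(nn.Conv2d(feat * 4, feat * 8, 4, 2, 1)), nn.LeakyReLU(0.2, True),
            sn(nn.Conv2d(feat * 8, 1, 4, 1, 0)),
        )

    def forward(self, img):
        return self.net(img).view(-1)


# Example generator-side loss: adversarial term plus weighted KL regularizer.
text_embedding = torch.randn(8, 1024)                 # placeholder text-encoder output
z = torch.randn(8, 100)
ca, gen, disc = ConditioningAugmentation(), Generator(), Discriminator()
c, kl = ca(text_embedding)
fake = gen(z, c)
g_loss = F.binary_cross_entropy_with_logits(
    disc(fake), torch.ones(8)) + 2.0 * kl             # KL weight of 2.0 is an assumed value
```

Under these assumptions, the conditioning variable c is sampled from a Gaussian parameterized by the text embedding, so nearby text embeddings yield overlapping conditioning distributions, while the KL term keeps that distribution close to a standard normal and spectral normalization bounds the discriminator's Lipschitz constant to slow its gradient relative to the generator.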