PixelCraft: AI-Powered Artistic Innovation
Abstract
Background: Recent breakthroughs in Artificial Intelligence (AI) have significantly advanced text-to-image generation, enabling machines to convert natural language descriptions into realistic visual outputs. Stable Diffusion has emerged as a promising solution, offering high-fidelity results with improved controllability and accessibility. Building on these strengths, this study introduces PixelCraft, an AI-powered text-to-image generation system designed to support creative, educational, and industrial applications.
Aims: The purpose of this paper is to design, develop, and evaluate PixelCraft, an intuitive AI system that generates coherent images from textual prompts using Stable Diffusion.
Methods: PixelCraft integrates a Stable Diffusion pipeline implemented using Hugging Face libraries and wrapped in a Tkinter-based graphical interface for seamless user interaction. The system processes user prompts, executes diffusion-based denoising stages, and outputs generated images that can be viewed and saved. A structured evaluation was conducted using widely accepted performance metrics, including CLIP similarity scores, Fréchet Inception Distance (FID), and Structural Similarity Index Measure (SSIM). Comparative analyses were performed against models such as BigGAN, VQ-VAE-2, and DALL·E-2.
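The wiring described above — a Hugging Face diffusion pipeline behind a Tkinter front end — can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the model checkpoint (`runwayml/stable-diffusion-v1-5`), the helper names, and the single-button layout are assumptions for the sketch, and the heavy libraries are imported lazily so the GUI code stays importable without a GPU stack.

```python
def clean_prompt(text: str) -> str:
    """Normalise a user prompt (collapse whitespace) before generation."""
    return " ".join(text.split())


def generate(prompt: str, steps: int = 30):
    """Run the diffusion denoising loop and return a PIL image.

    Assumes the `diffusers` StableDiffusionPipeline API; the checkpoint
    name is an illustrative choice, not necessarily the one used by PixelCraft.
    """
    import torch
    from diffusers import StableDiffusionPipeline

    device = "cuda" if torch.cuda.is_available() else "cpu"
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",
        torch_dtype=torch.float16 if device == "cuda" else torch.float32,
    ).to(device)
    return pipe(clean_prompt(prompt), num_inference_steps=steps).images[0]


def main():
    """Tkinter front end: prompt entry, generate button, save dialog."""
    import tkinter as tk
    from tkinter import filedialog

    root = tk.Tk()
    root.title("PixelCraft (sketch)")
    entry = tk.Entry(root, width=60)
    entry.pack(padx=8, pady=8)

    def on_generate():
        image = generate(entry.get())
        path = filedialog.asksaveasfilename(defaultextension=".png")
        if path:
            image.save(path)

    tk.Button(root, text="Generate", command=on_generate).pack(pady=8)
    root.mainloop()
```

Calling `main()` launches the window; the generate/view/save loop mirrors the user flow described in the Methods.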
Results: Experimental findings show that PixelCraft achieves strong semantic alignment and visual coherence, yielding an average CLIP score of 0.95, an FID of approximately 15, and an SSIM of 0.91. These results outperform several benchmark models, demonstrating superior consistency across both simple and moderately complex prompts.
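To make the SSIM figure above concrete, the following is a minimal single-window SSIM in NumPy, using the standard stabilising constants (K1 = 0.01, K2 = 0.03). This is an illustrative sketch only — the paper's evaluation presumably uses the full windowed SSIM, which averages this statistic over local sliding windows rather than computing it once globally.

```python
import numpy as np


def ssim_global(x: np.ndarray, y: np.ndarray, data_range: float = 255.0) -> float:
    """Single-window SSIM between two images of equal shape.

    The full SSIM averages this quantity over local sliding windows;
    computing it once over the whole image is a simplification.
    """
    c1 = (0.01 * data_range) ** 2  # stabilises the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilises the contrast/structure term
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return float(
        (2 * mx * my + c1) * (2 * cov + c2)
        / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
    )
```

For identical images the score is exactly 1.0, and it decreases as luminance, contrast, or structure diverge, which is why an SSIM of 0.91 indicates close structural agreement between generated and reference imagery.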
Conclusion: PixelCraft effectively demonstrates Stable Diffusion's ability to generate high-quality images from natural-language descriptions. The system provides a practical, accessible platform for artists, educators, and digital content creators, significantly reducing barriers associated with traditional design tools.
Article Details
Copyright (c) 2025 MD Fouziya, Avula Sruthi, Paila Prathina, Parsha Sushma, Madas Rithika, Helmie Arif Wibawa

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.