Pixelcraft: AI-Powered Artistic Innovation

Main Article Content

  MD Fouziya
  Avula Sruthi
  Paila Prathina
  Parsha Sushma
  Madas Rithika
  Helmie Arif Wibawa

Abstract

Background of study: Recent breakthroughs in Artificial Intelligence (AI) have significantly advanced text-to-image generation, enabling machines to convert natural language descriptions into realistic visual outputs. Stable Diffusion has emerged as a promising solution, offering high-fidelity results with improved controllability and accessibility. To leverage these strengths, this study introduces PixelCraft, an AI-powered text-to-image generation system designed to support creative, educational, and industrial applications.
Aims: The purpose of this paper is to design, develop, and evaluate PixelCraft, an intuitive AI system that generates coherent images from textual prompts using Stable Diffusion.
Methods: PixelCraft integrates a Stable Diffusion pipeline implemented using Hugging Face libraries and wrapped in a Tkinter-based graphical interface for seamless user interaction. The system processes user prompts, executes diffusion-based denoising stages, and outputs generated images that can be viewed and saved. A structured evaluation was conducted using widely accepted performance metrics, including CLIP similarity scores, Fréchet Inception Distance (FID), and Structural Similarity Index Measure (SSIM). Comparative analyses were performed against models such as BigGAN, VQ-VAE-2, and DALL·E-2.
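The diffusion-based denoising stages mentioned above can be illustrated with a toy sketch (plain NumPy, not the actual PixelCraft implementation, which the abstract says uses Hugging Face's Stable Diffusion pipeline): sampling starts from pure Gaussian noise and each step removes part of the estimated noise until a clean image-like array remains. The schedule and oracle noise estimate below are illustrative assumptions; a real model predicts the noise with a text-conditioned U-Net.

```python
import numpy as np

def toy_denoise(target, steps=50, seed=0):
    """Toy sketch of iterative denoising: start from Gaussian noise
    and at each step move a scheduled fraction of the way toward the
    clean image. A real diffusion model predicts the noise with a
    U-Net conditioned on the text prompt; here the clean target is
    known, so the update is an oracle stand-in for that prediction."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(target.shape)   # x_T: pure noise
    for t in range(steps, 0, -1):           # t = T, T-1, ..., 1
        x = x + (target - x) / t            # partial denoising step
    return x                                # x_0: fully denoised

# Example: recover a small 8x8 "image" from noise
target = np.linspace(0.0, 1.0, 64).reshape(8, 8)
result = toy_denoise(target)
```

At the final step (t = 1) the update lands exactly on the clean image, mirroring how a diffusion sampler's last step yields the generated output.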
Results: Experimental findings show that PixelCraft achieves strong semantic alignment and visual coherence, yielding an average CLIP similarity score of 0.95, an FID of approximately 15, and an SSIM of 0.91. These results outperform several benchmark models, demonstrating superior consistency across both simple and moderately complex prompts.
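The CLIP score reported above measures text-image agreement as the cosine similarity between CLIP embeddings of the prompt and the generated image. A minimal NumPy sketch of that computation, using placeholder vectors instead of real CLIP encoder outputs (which in practice would come from a pretrained model, e.g. via Hugging Face `transformers`):

```python
import numpy as np

def clip_style_similarity(text_emb, image_emb):
    """Cosine similarity between a text embedding and an image
    embedding -- the quantity behind a CLIP similarity score.
    In a real evaluation both vectors come from a pretrained CLIP
    encoder; here they are placeholder arrays for illustration."""
    text_emb = np.asarray(text_emb, dtype=float)
    image_emb = np.asarray(image_emb, dtype=float)
    return float(
        text_emb @ image_emb
        / (np.linalg.norm(text_emb) * np.linalg.norm(image_emb))
    )

# Identical embeddings give a perfect score of 1.0;
# orthogonal embeddings give 0.0.
v = np.array([0.2, 0.5, 0.8])
score_same = clip_style_similarity(v, v)
score_orth = clip_style_similarity([1.0, 0.0], [0.0, 1.0])
```

Averaging this score over many prompt-image pairs gives the aggregate figure (0.95) reported in the results.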
Conclusion: PixelCraft effectively demonstrates Stable Diffusion's ability to generate high-quality images from natural-language descriptions. The system provides a practical, accessible platform for artists, educators, and digital content creators, significantly reducing barriers associated with traditional design tools.

Article Details

How to Cite
Fouziya, M., Sruthi, A., Prathina, P., Sushma, P., Rithika, M., & Wibawa, H. A. (2025). Pixelcraft: AI-Powered Artistic Innovation. International Journal of Advances in Artificial Intelligence and Machine Learning, 2(3), 199–207. https://doi.org/10.58723/ijaaiml.v2i3.460
Section
Articles

References

Avrahami, O., Fried, O., & Lischinski, D. (2023). Blended Latent Diffusion. ACM Transactions on Graphics, 42(4), 1–11. https://doi.org/10.1145/3592450

Bansal, G., Nawal, A., Chamola, V., & Herencsar, N. (2024). Revolutionizing Visuals: The Role of Generative AI in Modern Image Generation. ACM Transactions on Multimedia Computing, Communications and Applications, 20(11). https://doi.org/10.1145/3689641

Brade, S., Wang, B., Sousa, M., Oore, S., & Grossman, T. (2023). Promptify: Text-to-Image Generation through Interactive Prompt Exploration with Large Language Models. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology (UIST '23). Association for Computing Machinery. https://doi.org/10.1145/3586183.3606725

Cai, L. (2023). Comparative Analysis the Super-Resolution Image Generation Performance Based on BigGAN and VQ-VAE-2. Highlights in Science, Engineering and Technology, 41, 202–210. https://doi.org/10.54097/hset.v41i.6812

Cao, W., Zhang, S., Li, Q., & Xu, R. (2023). STEP: Generating Semantic Text Embeddings with Prompt. 2023 Eleventh International Conference on Advanced Cloud and Big Data (CBD), 180–185. https://doi.org/10.1109/CBD63341.2023.00040

Faez, S., & Anwer, A. (2024). An Improved Image Generation Conditioned on Text Using Stable Diffusion Model. Journal of Al-Qadisiyah for Computer Science and Mathematics, 16(4), 1–14. https://doi.org/10.29304/jqcsm.2024.16.41772

Fang, S. (2024). A Comprehensive Survey of Text Encoders for Text-to-Image Diffusion Models. EAI Endorsed Transactions on AI and Robotics, 3, 1–11. https://doi.org/10.4108/airo.5566

Frolov, S., Hinz, T., Raue, F., Hees, J., & Dengel, A. (2021). Adversarial text-to-image synthesis: A review. Neural Networks, 144, 187–209. https://doi.org/10.1016/j.neunet.2021.07.019

Guo, H., Xie, F., Soong, F. K., Wu, X., & Meng, H. (2023). A Multi-Stage Multi-Codebook VQ-VAE Approach to High-Performance Neural TTS. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 31, 1811–1824. https://doi.org/10.1109/TASLP.2023.3272470

Indumathi, D., & Tharani, S. (2024). Evaluating Text-to-Image Generation Methods: Stable Diffusion vs Generative Adversarial Networks (GANs). International Journal for Research in Applied Science & Engineering Technology (IJRASET), 12(XI), 2523–2533. https://doi.org/10.22214/ijraset.2024.65677

Ivezić, D., & Babac, M. B. (2023). Trends and Challenges of Text-to-Image Generation: Sustainability Perspective. Croatian Regional Development Journal, 4(1), 56–77. https://doi.org/10.2478/crdj-2023-0004

Jamal, S., & Wimmer, H. (2024). Perception and evaluation of text-to-image generative AI models: a comparative study of DALL-E, Google Imagen, GROK, and Stable Diffusion. Issues in Information Systems, 25(2), 277–292. https://doi.org/10.48009/2_iis_2024_123

Kang, M., Shin, J., & Park, J. (2023). StudioGAN: A Taxonomy and Benchmark of GANs for Image Synthesis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(12), 15725–15742. https://doi.org/10.1109/TPAMI.2023.3306436

Li, J., Wang, H., Li, Y., & Zhang, H. (2025). A Comprehensive Review of Image Restoration Research Based on Diffusion Models. Mathematics, 13(13), 1–37. https://doi.org/10.3390/math13132079

Li, Y., Chen, M., Yang, W., Wang, K., Ma, J., Bovik, A. C., & Zhang, Y. (2023). SAMScore: A Semantic Structural Similarity Metric for Image Translation Evaluation. arXiv. https://doi.org/10.48550/arXiv.2305.15367

Po, R., Yifan, W., Golyanik, V., Aberman, K., Barron, J. T., Bermano, A., Chan, E., Dekel, T., Holynski, A., Kanazawa, A., Liu, C. K., Liu, L., Mildenhall, B., Nießner, M., Ommer, B., Theobalt, C., Wonka, P., & Wetzstein, G. (2024). State of the Art on Diffusion Models for Visual Computing. Computer Graphics Forum, 43(2). https://doi.org/10.1111/cgf.15063

Rombach, R., Blattmann, A., Lorenz, D., Esser, P., & Ommer, B. (2022). High-Resolution Image Synthesis with Latent Diffusion Models. arXiv, 1–45. https://doi.org/10.48550/arXiv.2112.10752

Sai, P. C., Karthik, K., Prasad, K. B., & Pranav, C. V. S. (2024). Real-Time Task Manager: A Python-Based Approach Using Psutil and Tkinter. 2024 8th International Conference on Computational System and Information Technology for Sustainable Solutions (CSITSS). https://doi.org/10.1109/CSITSS64042.2024.10816758

Schuhmann, C., Beaumont, R., Vencu, R., Gordon, C., Wightman, R., Cherti, M., Coombes, T., Katta, A., Mullis, C., Wortsman, M., Schramowski, P., Kundurthy, S., & Crowson, K. (2022). LAION-5B: An open large-scale dataset for training next generation image-text models. NIPS '22: Proceedings of the 36th International Conference on Neural Information Processing Systems, 25278–25294. https://doi.org/10.48550/arXiv.2210.08402

Shivani, J., Sanika, P., Vijay, Z., & Pachhade, R. C. (2025). Gesture-Based Air Writing System Utilizing Computer Vision. International Research Journal on Advanced Engineering Hub (IRJAEH), 03(May), 2309–2312. https://doi.org/10.47392/IRJAEH.2025.0340

Wang, F., Zhang, Z., Li, L., & Long, S. (2024). Virtual Reality and Augmented Reality in Artistic Expression: A Comprehensive Study of Innovative Technologies. International Journal of Advanced Computer Science and Applications (IJACSA), 15(3), 641–649. https://doi.org/10.14569/IJACSA.2024.0150365

Wang, Y., & Zhang, G. (2025). Lightweight Text-to-Image Generation Model Based on Contrastive Language-Image Pre-Training Embeddings and Conditional Variational Autoencoders. Electronics (Switzerland), 14(11), 1–31. https://doi.org/10.3390/electronics14112185

Wo, Z. (2025). A Review of Generative Adversarial Networks for Text to Image Tasks. In Proceedings of the 2nd International Conference on Data Science and Engineering (ICDSE 2025), 487–491. https://doi.org/10.5220/0013699800004670

Zhou, R., Jiang, C., & Xu, Q. (2021). A survey on generative adversarial network-based text-to-image synthesis. Neurocomputing, 451, 316–336. https://doi.org/10.1016/j.neucom.2021.04.069

Zuo, Q., Gu, X., Dong, Y., Zhao, Z., & Yuan, W. (2024). High-Fidelity 3D Textured Shapes Generation by Sparse Encoding and Adversarial Decoding. Lecture Notes in Computer Science, 52–69. https://doi.org/10.1007/978-3-031-72684-2_4