
SketchWhisper

Yash Dileep Barai, Yash Manish Chavda, Md Endan Mustadun Mollick, Krish Deepak Savla, Bhagyashree Patil

Abstract


This paper presents a method for building a system that transforms spoken input into vector sketch-style images, with the aim of bridging the semantic gap between natural language descriptions and visual representations. The approach departs from conventional paradigms in two respects. First, we employ stable diffusion models, customized and fine-tuned on specialized datasets for vector sketch generation. This directly addresses the challenge of keeping generated sketches coherent and stable, with smooth transitions and accurate stroke patterns. Second, we investigate combining two transformer models, one dedicated to processing textual input and the other specialized in synthesizing vector stroke data for images. The combination is intended to pair the strengths of transformer architectures with state-of-the-art techniques in vector sketch generation, improving the system's capacity to capture linguistic semantics and translate them into visually cohesive sketches. Our experiments and evaluation indicate that these methods are effective and adaptable. The system has potential applications in digital art creation, educational support, and assistive technologies, and the work contributes to speech-to-draw systems that make human-computer interaction more intuitive and expressive.
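To make "vector stroke data" concrete, the following is a minimal sketch of one common encoding, not necessarily the one used in this work. It assumes the stroke-3 convention popularized by Sketch-RNN, in which each step is a triple (dx, dy, pen_lifted) and pen_lifted = 1 means the pen leaves the canvas after that point. The function name `strokes_to_svg_path` and the choice of SVG path output are illustrative assumptions.

```python
from typing import List, Tuple

# Assumed stroke-3 step: (dx, dy, pen_lifted); pen_lifted is 1 when the pen
# is raised after reaching this point, otherwise 0.
Stroke3 = Tuple[float, float, int]


def strokes_to_svg_path(strokes: List[Stroke3],
                        start: Tuple[float, float] = (0.0, 0.0)) -> str:
    """Convert a stroke-3 sequence into absolute SVG path commands."""
    x, y = start
    commands = []
    pen_down = False  # the first point only positions the pen
    for dx, dy, lifted in strokes:
        x += dx
        y += dy
        # 'M' moves without drawing; 'L' draws a line from the previous point.
        commands.append(f"{'L' if pen_down else 'M'} {x:g} {y:g}")
        pen_down = (lifted == 0)
    return " ".join(commands)


if __name__ == "__main__":
    # Two short strokes: a vertical segment, then a horizontal one.
    demo = [(10, 0, 0), (0, 10, 1), (5, 5, 0), (5, 0, 1)]
    print(strokes_to_svg_path(demo))
```

The returned string can be placed in the `d` attribute of an SVG `<path>` element to render the sketch.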




