Kan Jen Cheng

I'm a student at UC Berkeley, where I am currently doing audio research in the Berkeley Speech Group.

My research interests center on auditory perception, sound synthesis, texture editing, and computer vision. Because human perception of the environment relies heavily on the interplay between auditory and visual cues, I aim to build multi-modal systems that integrate audio-visual information to help people better understand, interpret, and interact with the world.

Email  /  CV  /  Github

profile photo

Research

I'm interested in deep learning, generative AI, and audio processing. Most of my research is about inferring the physical world (speech, sound, etc.) from audio. Some papers are highlighted.

EMO-Reasoning: Benchmarking Emotional Reasoning Capabilities in Spoken Dialogue Systems
Jingwen Liu*, Kan Jen Cheng*, Jiachen Lian, Akshay Anand, Rishi Jain, Faith Qiao, Robin Netzorg, Huang-Cheng Chou, Tingle Li, Guan-Ting Lin, Gopala Anumanchipalli
ASRU, 2025  
project page / arXiv

A holistic benchmark for assessing emotional coherence in spoken dialogue systems using continuous, categorical, and perceptual metrics.

Audio Texture Manipulation by Exemplar-Based Analogy
Kan Jen Cheng*, Tingle Li*, Gopala Anumanchipalli
ICASSP, 2025  
project page / arXiv

An exemplar-based analogy model for audio texture manipulation that learns transformations from paired speech examples.

Last updated:

Template from Jon Barron.