wav headers for PCM streaming doesn't work #894

devpras22 opened this issue Nov 5, 2024 · 9 comments
@devpras22

devpras22 commented Nov 5, 2024

When streaming WAV format:

  1. OpenAI sends 24kHz PCM chunks
  2. We add WAV headers specifying 24kHz
  3. Library reads 24kHz from WAV header
  4. But plays too fast

Same library works correctly for:

  • Complete WAV files (plays at correct 24kHz)
  • MP3 streaming (auto-detects rate)
  • Any non-streaming WAV playback

Root cause appears to be:

  1. I2S initialized at 44.1kHz in constructor
  2. During WAV streaming, sample rate change doesn't stick
  3. Resulting in 24kHz audio playing at 44.1kHz speed

This suggests the issue is specific to how Audio.h handles sample rates during WAV/PCM streaming, not with regular WAV file playback.
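
To put a number on the symptom described above: if 24 kHz samples are clocked out at 44.1 kHz, playback runs roughly 1.84 times too fast. The sketch below only does that arithmetic. `playbackSpeedFactor` and `playedDurationSeconds` are hypothetical helper names for illustration, not part of any library.

```javascript
// Ratio between the rate samples are clocked out at and the rate
// they were recorded at. 1.0 means playback at the correct speed.
function playbackSpeedFactor(sourceRateHz, outputRateHz) {
  return outputRateHz / sourceRateHz;
}

// How long a clip actually lasts when it is played at the wrong rate.
function playedDurationSeconds(clipSeconds, sourceRateHz, outputRateHz) {
  return clipSeconds / playbackSpeedFactor(sourceRateHz, outputRateHz);
}

console.log(playbackSpeedFactor(24000, 44100)); // 1.8375
console.log(playedDurationSeconds(10, 24000, 44100).toFixed(2)); // "5.44"
```

Timing a clip of known length against these numbers may help narrow things down: a factor near 1.84 points at the sample rate, while a factor of exactly 2 would instead be consistent with mono data being consumed as stereo frames.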

@schreibfaul1
Owner

It may be that the RIFF header is not always read correctly. Do you have an example for me?

@devpras22
Author

Thank you for your response. I’ve done more testing and found:

  1. When we send a complete WAV file with proper RIFF structure (using Whisper API):

     const fullAudioBuffer = Buffer.concat(chunks);
     const wavHeader = generateWavHeader(fullAudioBuffer.length, 24000, 1, 16);
     const fullWavBuffer = Buffer.concat([wavHeader, fullAudioBuffer]);

  • This plays at correct speed
  • Sample rate is properly set to 24000Hz from the WAV header
  • WAV format is recognized (codec = 1)

  2. Then with realtime streaming:

     // First send WAV header
     const wavHeader = generateWavHeader(100000, 24000, 1, 16);  // Estimated size
     res.write(wavHeader);

     // Then stream PCM chunks
     audioStream.write(audioData);  // Raw PCM chunks

  • Sample rate STAYS at 44100Hz (default) - library doesn't read/set rate from our WAV header
  • WAV is recognized (codec changes to 1 during playback)

  3. Interesting observation:

  • If we run the Whisper code first (which sets rate to 24000Hz)
  • Then run realtime streaming WITHOUT restarting ESP32
  • Sample rate stays at 24000Hz (from previous Whisper run)
  • But audio STILL plays faster than it should

This suggests the issue isn't just about sample rate, but how the library handles the RIFF header during streaming. Could you advise on the correct way to maintain WAV/RIFF structure during streaming? Or should we handle streaming PCM differently?

@schreibfaul1
Owner

The sample rate is not changed within a file. If you have a data stream, have a look at the first RIFF header, where you will find the file length. You will probably find another RIFF header at the end of the first file.
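
To check what the first RIFF header in a stream actually declares, the bytes can be parsed on the server side before they are sent. The following is a minimal sketch, assuming a standard RIFF/WAVE layout; `parseWavHeader` is a hypothetical helper and does not reflect how the library itself reads the header.

```javascript
// Walk the RIFF chunks at the start of a buffer and report the format fields.
// Returns null if the buffer does not start with a RIFF/WAVE header.
function parseWavHeader(buf) {
  if (buf.length < 12) return null;
  if (buf.toString("ascii", 0, 4) !== "RIFF") return null;
  if (buf.toString("ascii", 8, 12) !== "WAVE") return null;

  const info = { riffSize: buf.readUInt32LE(4) };
  let offset = 12;
  while (offset + 8 <= buf.length) {
    const id = buf.toString("ascii", offset, offset + 4);
    const size = buf.readUInt32LE(offset + 4);
    const body = offset + 8;
    if (id === "fmt " && body + 16 <= buf.length) {
      info.audioFormat = buf.readUInt16LE(body); // 1 = PCM
      info.channels = buf.readUInt16LE(body + 2);
      info.sampleRate = buf.readUInt32LE(body + 4);
      info.byteRate = buf.readUInt32LE(body + 8);
      info.blockAlign = buf.readUInt16LE(body + 12);
      info.bitsPerSample = buf.readUInt16LE(body + 14);
    } else if (id === "data") {
      info.dataSize = size; // declared length of the PCM payload
      info.dataOffset = body; // PCM samples start at this byte
      return info;
    }
    offset = body + size + (size % 2); // chunks are padded to an even length
  }
  return info; // no "data" chunk found in the bytes seen so far
}
```

Passing the first bytes the server writes to this function should report `sampleRate: 24000` and `channels: 1` if the header is intact and really is the first thing on the wire.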

@devpras22
Author

I tried starting the stream with one RIFF header at 24kHz, but the ESP32 still defaults to 44.1kHz during playback, causing the audio to play too fast.

Based on your last comment, could you clarify:

  1. Header Structure for Streaming: Should I periodically send new RIFF headers during streaming, or is there a specific structure that the library requires to correctly interpret and set the 24kHz rate?
  2. File Length in RIFF Header: Since this is a real-time stream, we don’t have a predefined file size. Is there an ideal way to handle the file length in the initial RIFF header to ensure the library interprets and sets the 24kHz rate?
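
On the file-length question: for streams of unknown length, one convention used by some encoders when writing WAV to a non-seekable output is to set both size fields to the maximum 32-bit value (0xFFFFFFFF). Whether this library accepts that value, or stops after the declared size, is not established in this thread, so the following is only a sketch to experiment with. `streamingWavHeader` and `UNKNOWN_SIZE` are hypothetical names.

```javascript
// 44-byte PCM WAV header for a stream whose final length is not known up front.
// Assumption: the decoder tolerates 0xFFFFFFFF as an "unknown size" marker.
const UNKNOWN_SIZE = 0xffffffff;

function streamingWavHeader(sampleRate = 24000, channels = 1, bitsPerSample = 16) {
  const blockAlign = (channels * bitsPerSample) / 8; // bytes per sample frame
  const header = Buffer.alloc(44);
  header.write("RIFF", 0, "ascii");
  header.writeUInt32LE(UNKNOWN_SIZE, 4); // RIFF chunk size: unknown
  header.write("WAVE", 8, "ascii");
  header.write("fmt ", 12, "ascii");
  header.writeUInt32LE(16, 16); // fmt chunk size for PCM
  header.writeUInt16LE(1, 20); // audio format 1 = PCM
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(sampleRate * blockAlign, 28); // byte rate
  header.writeUInt16LE(blockAlign, 32);
  header.writeUInt16LE(bitsPerSample, 34);
  header.write("data", 36, "ascii");
  header.writeUInt32LE(UNKNOWN_SIZE, 40); // data chunk size: unknown
  return header;
}
```

The format fields are identical to a normal header; only the two size fields differ. A small declared size is the riskier choice with a decoder that stops reading once the declared data length has been consumed.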

@schreibfaul1
Owner

It may be that there was a problem with wav mono. I had previously doubled the individual samples for the channels. This is now done by the I2S itself. Hope this works better and you can play your stream at the right speed.
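
For readers unfamiliar with what "doubling the individual samples" means, here is a small illustration of the idea in JavaScript. It is not the library's code (the library is C++ and now leaves this step to the I2S peripheral); `monoToStereo` is a hypothetical name.

```javascript
// Duplicate each mono sample into a left/right pair (interleaved stereo).
function monoToStereo(mono) {
  const stereo = new Int16Array(mono.length * 2);
  for (let i = 0; i < mono.length; i++) {
    stereo[2 * i] = mono[i]; // left channel
    stereo[2 * i + 1] = mono[i]; // right channel
  }
  return stereo;
}

console.log(Array.from(monoToStereo(Int16Array.of(100, -200, 300))));
// [ 100, 100, -200, -200, 300, 300 ]
```

If this step is applied twice, or skipped while the output expects stereo frames, playback speed is off by a factor of two, which is why mono handling is a plausible suspect for speed problems.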

@devpras22
Author

devpras22 commented Nov 7, 2024

Thanks for your recent update to the library for handling mono playback. I've tried a few approaches to get the ESP32 to recognize the 24kHz sample rate for streaming, but so far, it’s still defaulting to 44.1kHz even with this new update.

I have tried the following three things:

  1. Sending the RIFF Header separately on its own before the audio stream hoping the ESP32 would recognize the sample rate setup before the audio data.
  2. Sending the RIFF Header with the First Audio Chunk so the ESP32 received both the header and the audio data in one go.
  3. Sending the RIFF Header with Every Audio Chunk as a last attempt, even though this isn’t typical for a continuous stream. The sample rate still didn’t apply.

Please help! I have exhausted all options here.

Here are the three ways (in code) I tried to send the RIFF header:

// Approach 1: header sent on its own, before the audio
const wavHeader = generateWavHeader(4800, 24000, 1, 16);
res.write(wavHeader);
audioStream.write(audioData);

// Approach 2: header prepended to the first audio chunk only
if (isFirstChunk) {
    const wavHeader = generateWavHeader(audioData.length, 24000, 1, 16);
    const fullFirstChunk = Buffer.concat([wavHeader, audioData]);
    audioStream.write(fullFirstChunk);  // Send combined header + first chunk
    isFirstChunk = false;
} else {
    audioStream.write(audioData);  // Send subsequent chunks as raw PCM
}

// Approach 3: header prepended to every audio chunk
const wavHeader = generateWavHeader(audioData.length, 24000, 1, 16);
const chunkWithHeader = Buffer.concat([wavHeader, audioData]);
audioStream.write(chunkWithHeader);  // Send header with every chunk
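
One detail in the first snippet may be worth checking: the header is written to `res` while the PCM is written to `audioStream`. If those are not the same underlying stream, the header is not guaranteed to arrive immediately before the samples. The sketch below keeps the ordering explicit by writing both to a single writable. `createWavStreamWriter` is a hypothetical helper, and `out` stands for whatever stream the ESP32 actually reads.

```javascript
// Returns a function that forwards PCM chunks to `out`, writing the
// WAV header exactly once, immediately before the first chunk.
function createWavStreamWriter(out, header) {
  let headerSent = false;
  return function writePcm(chunk) {
    if (!headerSent) {
      out.write(header);
      headerSent = true;
    }
    return out.write(chunk);
  };
}

// Demo with an in-memory sink standing in for the HTTP response.
const sink = {
  chunks: [],
  write(chunk) {
    this.chunks.push(chunk);
    return true;
  },
};
const writePcm = createWavStreamWriter(sink, Buffer.alloc(44)); // placeholder header bytes
writePcm(Buffer.from([1, 2]));
writePcm(Buffer.from([3, 4]));
console.log(Buffer.concat(sink.chunks).length); // 48
```

With a Node HTTP response as `out`, setting `Content-Type: audio/wav` before the first write and leaving out `Content-Length` makes Node use chunked transfer encoding. Whether the ESP32 side handles that correctly for WAV is an assumption to verify.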

Also note that if I combine all chunks and then stream the result, the ESP32 reads the RIFF header and changes the sample rate correctly. Here's the code for that:

      const fullAudioBuffer = Buffer.concat(chunks);
      const wavHeader = generateWavHeader(fullAudioBuffer.length, 24000, 1, 16);
      const fullWavBuffer = Buffer.concat([wavHeader, fullAudioBuffer]);
      res.setHeader('Content-Type', 'audio/wav');
      res.setHeader('Content-Length', fullWavBuffer.length);
      res.write(fullWavBuffer);

The generateWavHeader function:

function generateWavHeader(dataSize, sampleRate = 24000, channels = 1, bitsPerSample = 16) {
    const byteRate = sampleRate * channels * bitsPerSample / 8;  // bytes per second
    const blockAlign = channels * bitsPerSample / 8;             // bytes per sample frame
    const header = Buffer.alloc(44);                             // canonical 44-byte PCM header
    header.write("RIFF", 0);
    header.writeUInt32LE(36 + dataSize, 4);   // RIFF chunk size: total file size minus 8
    header.write("WAVE", 8);
    header.write("fmt ", 12);
    header.writeUInt32LE(16, 16);             // fmt chunk size (16 for PCM)
    header.writeUInt16LE(1, 20);              // audio format: 1 = PCM
    header.writeUInt16LE(channels, 22);
    header.writeUInt32LE(sampleRate, 24);
    header.writeUInt32LE(byteRate, 28);
    header.writeUInt16LE(blockAlign, 32);
    header.writeUInt16LE(bitsPerSample, 34);
    header.write("data", 36);
    header.writeUInt32LE(dataSize, 40);       // length of the PCM payload in bytes
    return header;
}


github-actions bot commented Dec 8, 2024

This issue is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale label Dec 8, 2024
@JeffryCA

I am having trouble even playing audio from
https://storage.googleapis.com/app-toy2life-eu-3/audio_responses/45f717a7-6687-454c-917f-39f39e45d0a5.wav

It is 24 kHz PCM data generated by ElevenLabs; before saving the file, I added the WAV header:

import wave
from contextlib import closing
from io import BytesIO


def create_wav_buffer(data: bytes, sample_width: int, sample_rate: int) -> BytesIO:
    """Wrap raw mono PCM in a WAV container; sample_width is given in bits."""
    wav_buffer = BytesIO()
    with closing(wave.open(wav_buffer, "wb")) as wav_file:
        wav_file.setnchannels(1)  # Mono
        wav_file.setsampwidth(sample_width // 8)  # bits -> bytes per sample
        wav_file.setframerate(sample_rate)  # sample rate in Hz
        wav_file.writeframes(data)
    wav_buffer.seek(0)
    return wav_buffer


# audio_data holds the raw PCM bytes received from the TTS service
wav_buffer = create_wav_buffer(data=audio_data, sample_width=16, sample_rate=24000)

@github-actions github-actions bot removed the stale label Dec 21, 2024

This issue is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale label Jan 20, 2025