Skip to content

bring DAC sample rate closer to 48kHz#500

Open
evnoj wants to merge 1 commit intomonome:mainfrom
evnoj:dac-clock-fix
Open

bring DAC sample rate closer to 48kHz#500
evnoj wants to merge 1 commit intomonome:mainfrom
evnoj:dac-clock-fix

Conversation

@evnoj
Copy link
Copy Markdown
Contributor

@evnoj evnoj commented Apr 30, 2026

This PR brings the output sample rate as close to 48kHz as I believe is possible. Currently, the sample rate of the outputs is not actually 48kHz, it it is 46.875kHz. This is caused by the clock configuration for the I2S peripheral that drives the DAC. The effect is that the rate of change of output voltage is slower than expected (a.k.a. it takes longer for a voltage to reach its target than expected). This is because lib/slopes.h:8 uses a sample rate of 48kHz to calculate the number of samples per ms, and lib/slopes.c:112 uses this value to calculate the per-sample voltage delta.

LLM disclosure

This issue was initially found by an LLM, when I was hacking on the firmware. I couldn't figure out why I was unable to accurately change voltage over time at a specified speed by calculating the delta that should be applied per sample. This calculation relied on the sample rate being 48kHz, but the voltage's rate of change was always slightly slower than expected.

This issue went significantly deeper into the workings of a microcontroller than I was familiar with, so I have spent a lot of time learning about clocking on microcontrollers, reading the STM32F72 manual, and reading the crow code and HAL library code. I have asked an LLM questions about the microcontroller and about specifics of how crow uses the I2S peripheral. I've tried very hard to ensure that I understand the issue and factors at play, and come up with a good solution, but this is new territory for me and I apologize if I've missed something or not made sense.

This report and all code is entirely human written. Sorry if it is too verbose, I wasn't sure what parts I could gloss over and didn't want to make assumptions.

I understand if you are uninterested in engaging with this due to the LLM assistance, if that is the case feel free to close this.

explanation

To achieve 48kHz on each of the 4 outputs, we need to be sending I2S frames (where a frame is a sample for one output) at $4\ channels*48kHz=192kHz$.

The STM32F72 reference manual describes the sample frequency (frequency at which a frame is generated) on pg. 991. This is the formula for PCM mode with MCKOE enabled (which is configured in ll/dac8565.c via I2S_STANDARD_PCM_SHORT and I2S_MCLKOUTPUT_ENABLE):

$$ Fs = \frac{F_{\text{I2SxCLK}}}{128 \times ((2 \times \text{I2SDIV}) + \text{ODD})} $$

  • $Fs$ is the sample rate
  • $F_{\text{I2SxCLK}}$ is the output rate of the PLLI2S clock
  • $\text{I2SDIV}$ is 2-255, $\text{ODD}$ is 0 or 1, so $(2 \times \text{I2SDIV}) + \text{ODD}$ is 4-511

We can consider $(2 \times \text{I2SDIV}) + \text{ODD}$ as a single term, an integer divider in the range 4-511, simplifying the sample rate formula:

$$ Fs=\frac{F_{\text{I2SxCLK}}}{128 \times \text{div}} $$

The PLL multiplier ($N$) and divider ($R$) for the PLLI2S clock is set in HAL_I2S_MspInit() within ll/dac8565.c:

RCC_ExCLKInitStruct.PLLI2S.PLLI2SN = 384; // mul factor: 50~432
RCC_ExCLKInitStruct.PLLI2S.PLLI2SR = 2; // div factor: 2~7

And the input to the PLLI2S clock is the either the HSE or HSI, divided by PLLM (see the clock tree on pg. 131 of the manual). This is set in ll/system.c:

osc.PLL.PLLSource  = RCC_PLLSOURCE_HSE;
osc.PLL.PLLM       = 8;

With an HSE of 8MHz and a PLLM of 8, PLLI2S receives a 1MHz clock. Using PLLI2SN = 384 and PLLI2SR = 2, the PLLI2S output clock is $1\text{MHz} \times 384 \div 2 = 192\text{MHz}$

So, to achieve our desired sample rate of 192kHz, we must find a valid divider that plugs into our formula given $F_{\text{I2SxCLK}} = 192\text{MHz}$. The HAL allows us to specify our desired $Fs$, and it will set the divider as close as it can. This is done in ll/dac8565.c:

dac_i2s.Init.AudioFreq    = I2S_AUDIOFREQ_192K; // manipulates mclk to hit 48kHz on 4 chans

This calculation is at STM32F7xx_HAL_Driver/Src/stm32f7xx_hal_i2s.c:354 (since we have MCLK enabled, and our DataFormat is 24B):

tmp = (uint32_t)(((((i2sclk / (packetlength * 4U)) * 10U) / hi2s->Init.AudioFreq)) + 5U);
// ... later, on line 368
tmp = tmp / 10U;
  • i2sclk is 192,000,000
  • packetlength is 32
  • AudioFreq is 192,000

The multiplication by 10, adding 5, and dividing by 10 rounds to the nearest integer. Without the rounding, the simplified formula is:

$$ \text{div} = \frac{F_{\text{I2SxCLK}}}{128 \times F_s} $$

Plugging in our values:

$$ \text{div} = \frac{192000000}{128 \times 192000} = \frac{1000}{128} = 7.8125 $$

$\text{div}$ must be an integer, so it is rounded to 8. Plugging this back into the sample rate formula:

$$ Fs = \frac{F_{\text{I2SxCLK}}}{128 \times \text{div}} = \frac{192000000}{128 \times 8} = \frac{192000000}{1024} = 187500 = 187.5 \text{kHz} $$

Spread across four outputs, the sample rate of each output is 187.5kHz / 4 = 46.875kHz

fix

The HAL is already finding the divider that gets the closest possible sample rate to 192kHz, given the $F_{\text{I2SxCLK}}$ of 192MHz. We want to find the value of $F_{\text{I2SxCLK}}$ for which an integer divider in the range 4-511 can be found that gets the result of this calculation as close as possible to 192000:

$$ \frac{F_{\text{I2SxCLK}}}{128 \times \text{div}} $$

Where $F_{\text{I2SxCLK}} = \frac{N}{R} \times 10^6$ , $N$ is an integer 50-432 and $R$ is an integer 2-7.

This lua script simply tests every possible value to find the best combination of N and R:

-- find the integer "div" in range 4-511 that makes the result of the equation
-- in_clock_freq / (128*div)
-- as close as possible to 192000
-- returns:
    -- the best value of div
    -- the calculated actual sample rate
    -- the difference from 192000
function find_best_div(in_clock_freq)
    local best_div = 4
    local best_rate = in_clock_freq / (128*4)
    local best_diff = 192000 - best_rate

    for div=5,511 do
        local rate = in_clock_freq / (128*div)
        local diff = 192000 - rate
        if math.abs(diff) < math.abs(best_diff) then
            best_rate = rate
            best_diff = diff
            best_div = div
        end
    end

    return best_div,best_rate,best_diff
end

best = {
    n = 0,
    r = 0,
    div = 0,
    rate = 0,
    diff = math.maxinteger
}

for n=50,432 do
    for r=2,7 do
        local clock_freq = (n/r) * 1000000
        div,rate,diff = find_best_div(clock_freq)

        if math.abs(diff) < math.abs(best.diff) then
            best.n = n
            best.r = r
            best.div = div
            best.rate = rate
            best.diff = diff
        end
    end
end

print("best values:")
print("n: "..best.n)
print("r: "..best.r)
print("best div: "..best.div)
print("actual sample rate: "..best.rate)
print("error: "..string.format("%.4f", 100*math.abs(best.diff / 192000)).."%")

output:

best values:
n: 344
r: 2
best div: 7
actual sample rate: 191964.28571428571
error: 0.0186%

So we cannot land on 192kHz exactly, but we can get very close. The actual output sample rate we can achieve is $191.964 \text{kHz} / 4 = 47.992 \text{kHz}$

This is the same sample rate that Mutable Instruments Rings is able to hit.

testing

I made two lua scripts to verify the issue and test its resolution. One tests error with long envelopes by measuring elapsed time, and the other measures audio-rate frequencies and requires using a tuner/scope. Given the expected sample rate of 48000, and the actual sample rate of 46875, we'd expect error in the tests of $(48000-46875)/48000 \times 100 = 2.34$%

This is a script that measures the actual time it takes for the voltage to reach its target with several different durations:

results = {}
durations = {1, 10, 100} -- run tests over these durations

coro = coroutine.create(function()
    for _,duration in ipairs(durations) do
        results[duration] = {}
        local pending = 4

        for i=1,4 do
            output[i].action = {to(0,0), to(5,duration)}
            output[i].done = function()
                results[duration][i].time_done = time()
                pending = pending - 1
                if pending == 0 then
                    -- once all outputs have completed, start the next duration test
                    coroutine.resume(coro)
                end
            end

            results[duration][i] = {}
            results[duration][i].time_start = time()
            output[i]()
        end

        -- wait for ASL stages to complete before testing with the next duration
        coroutine.yield()
    end

    -- all tests done, print the results
    for duration,result in pairs(results) do
        print("\nresults for duration of "..duration.." seconds")
        print("---------")
        for i,t in ipairs(result) do
            local elapsed = (t.time_done - t.time_start) / 1000
            local error = string.format("%.2f%%", ((elapsed - duration) / duration) * 100)
            print("output "..i..":")
            print("  elapsed: "..elapsed)
            print("  error: "..error)
        end
    end
end)

-- start the tests
coroutine.resume(coro)

This is the output of that script on my crow w/ the current fw:

results for duration of 1 seconds
---------
output 1:
  elapsed: 1.024
  error: 2.40%
output 2:
  elapsed: 1.024
  error: 2.40%
output 3:
  elapsed: 1.023
  error: 2.30%
output 4:
  elapsed: 1.024
  error: 2.40%
 
results for duration of 10 seconds
---------
output 1:
  elapsed: 10.237
  error: 2.37%
output 2:
  elapsed: 10.238
  error: 2.38%
output 3:
  elapsed: 10.237
  error: 2.37%
output 4:
  elapsed: 10.238
  error: 2.38%
 
results for duration of 100 seconds
---------
output 1:
  elapsed: 102.397
  error: 2.40%
output 2:
  elapsed: 102.398
  error: 2.40%
output 3:
  elapsed: 102.397
  error: 2.40%
output 4:
  elapsed: 102.398
  error: 2.40%

The errors are near the expected error of 2.34%

This is the output with the updated clock configuration:

results for duration of 1 seconds
---------
output 1:
  elapsed: 1.0
  error: 0.00%
output 2:
  elapsed: 1.0
  error: 0.00%
output 3:
  elapsed: 1.0
  error: 0.00%
output 4:
  elapsed: 1.0
  error: 0.00%
 
results for duration of 10 seconds
---------
output 1:
  elapsed: 9.999
  error: -0.01%
output 2:
  elapsed: 9.999
  error: -0.01%
output 3:
  elapsed: 10.0
  error: 0.00%
output 4:
  elapsed: 9.999
  error: -0.01%
 
results for duration of 100 seconds
---------
output 1:
  elapsed: 100.016
  error: 0.02%
output 2:
  elapsed: 100.017
  error: 0.02%
output 3:
  elapsed: 100.016
  error: 0.02%
output 4:
  elapsed: 100.016
  error: 0.02%

The errors are now close to 0%


This is a script that sets each of the 4 outputs to an audio-rate cycle, they should be plugged into a tuner/scope to measure their actual frequency:

function make_oscillator(freq)
    return loop{ to(-5,0)
               , to(5, 1/freq)
               }
end

output[1](make_oscillator(120))
output[2](make_oscillator(222))
output[3](make_oscillator(440))
output[4](make_oscillator(666))

My measurements:

  • current firmware
    • output 1
      • expected: 120Hz
      • actual: 117.19Hz
      • error: 2.34%
    • output 2
      • expected: 222Hz
      • actual: 216.80Hz
      • error: 2.342%
    • output 3
      • expected: 440Hz
      • actual: 429.68Hz
      • error: 2.345%
    • output 4
      • expected: 666Hz
      • actual: 650.38Hz
      • error: 2.345%
  • firmware w/ fix
    • output 1
      • expected: 120Hz
      • actual: 119.98Hz
      • error: 0.016%
    • output 2
      • expected: 222Hz
      • actual: 221.96Hz
      • error: 0.018%
    • output 3
      • expected: 440Hz
      • actual: 439.91Hz
      • error: 0.02%
    • output 4
      • expected: 666Hz
      • actual: 665.86Hz
      • error: 0.021%

final notes

Fixing this issue may impact scripts which have compensated for the sample rate error. The only one I know of is nb_drumcrow, which has this line that multiplies the cycle time by 1.023996 (close to the 2.34% sample rate error) to achieve the expected frequency:

cyc = 1/(min_(max_(cyc * 1.023996 * s.detune, 0.1), FREQ_LIMIT[s.shape] * max_freq)) -- for correct (de)tuning

Let me know if you have any questions, thoughts, or concerns. And I get it if you're not interested in changing the MCU clock configuration for a module that people haven't had a problem with for years and is no longer in production.

The DAC is driven by the I2S peripheral. The STM32F72 reference manual describes the formula for the sample rate of the I2S in PCM mode with MCLK enabled on pg. 991. To have 4 channels outputting at the desired 48kHz, our I2S sample frequency must be 48kHz*4=192kHz.

To achieve this, we need to set PLLI2SN and PLLI2SR such that a divider can be found that gets the sample rate as close to 192kHz as possible. The best values are N=344 and R=2, which results in an actual sample rate of 191.964kHz, so a per-output sample rate of 47.992kHz
@tehn
Copy link
Copy Markdown
Member

tehn commented Apr 30, 2026

thank you for digging into this! paging @trentgill for review

@tehn tehn requested a review from trentgill April 30, 2026 21:47
@trentgill
Copy link
Copy Markdown
Collaborator

looks good! there is a kind of cursed beauty in seeing the amount of work / investigation / research you did to change a single constant — such is the embedded engineer’s life! this would be a great case study to show to a manager of software engineers to explain why lines of code is a ridiculous metric for performance.

i don’t have time to test it myself, but your testing methodology looks sound.

regarding release & breaking changes, this feels like the only challenge. we do have a few other small changes/fixes (including clock details from @evnoj ) but little that feels like a compelling reason to upgrade. i’m hesitant to add this fix, knowing that it will cause friction for folks who do rely on nb_drumcrow for perhaps no benefit. what are you thoughts on this @tehn ?

@evnoj thanks for digging in here!

@tehn
Copy link
Copy Markdown
Member

tehn commented May 1, 2026

my impulse is to merge it. for anyone using nb_drumcrow, only a small fix to that script is needed. but to me there is likely a much greater number of people impacted by the timing issues who are just navigating their own scripts, for example i'd really like this script to be accurate:

function make_oscillator(freq)
    return loop{ to(-5,0)
               , to(5, 1/freq)
               }
end

output[1](make_oscillator(120))

i understand it's a hassle to build another release, however. in any case, i feel like it's worth merging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants