[Bug]: reference channels shifts (disordered) over multiple recordings #309

Open
gusido opened this issue Aug 19, 2021 · 46 comments

@gusido

gusido commented Aug 19, 2021

Describe the bug

To Reproduce
Steps to reproduce the behavior:

  1. download provided image for hardware testing from: [https://files.seeedstudio.com/linux/Raspberry%20Pi%204%20reSpeaker/2021-05-07-raspios-buster-armhf-lite-respeaker.img.xz]
  2. burn image to an sd card
  3. boot kit (rpi4 with respeaker 5)
  4. login and run the following command:
    arecord -D hw:CARD=seeed8micvoicec,DEV=0 -d 3 -r 48000 -c 8 -f s32_le test.wav
  5. repeat 2-3 times
  6. review recordings in audacity or similar software and see reference channels (2 channels that contain no signal) shift places (reference channel may appear at channels other than 6,7)

Expected behavior

reference channels should always be channels 6 and 7 (count starting from 0)
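
To check a recording without opening it in Audacity, a small script can report which channels are quietest. This is only a sketch using the Python standard library: the function names `channel_rms` and `quietest_channels` are made up for illustration, and it assumes a plain PCM WAV with 16-bit or 32-bit samples and that nothing is being played back, so the two reference channels carry no signal.

```python
import math
import struct
import wave


def channel_rms(path):
    """Return the RMS level of each channel in a PCM WAV file."""
    with wave.open(path, "rb") as wf:
        n_channels = wf.getnchannels()
        width = wf.getsampwidth()
        raw = wf.readframes(wf.getnframes())
    # Map the sample width in bytes to a struct code (signed, little-endian).
    code = {2: "h", 4: "i"}[width]
    samples = struct.unpack("<%d%s" % (len(raw) // width, code), raw)
    rms = []
    for ch in range(n_channels):
        values = samples[ch::n_channels]  # de-interleave one channel
        rms.append(math.sqrt(sum(v * v for v in values) / max(len(values), 1)))
    return rms


def quietest_channels(rms, count=2):
    """Return the indices of the `count` lowest-energy channels, sorted."""
    return sorted(sorted(range(len(rms)), key=lambda i: rms[i])[:count])
```

Calling `quietest_channels(channel_rms("test.wav"))` on each recording from step 4 should give `[6, 7]` when the order is as expected; any other pair indicates a shifted recording.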

Platform

Relevant log output

No response

@AIWintermuteAI
Contributor

hi, @gusido !
I was able to reproduce the issue on the latest dd9391f commit version.

I looked briefly through the https://github.com/respeaker/seeed-voicecard/blob/master/ac108.c code, and there have been quite a few changes that might affect channel order. I'm almost done with the issue backlog, and after that I will be spending time on issues that we were able to reproduce during internal testing.

@HinTak do you have any idea what might be causing the channel shift? It looks similar to #301, which I wasn't able to reproduce. But this one seems to affect the reference channels rather than the recording channels.

@HinTak
Contributor

HinTak commented Aug 20, 2021

Yes, #301 and quite a few closed-without-resolution ones. AFAIK this is generic to multichannel (>2) capture and playback on the Pi. See the same problem on a different device, with more and better discussion: Audio-Injector/Octo#1. The audio-injector people at least leave the issue open for years, for other people to read about it...

@thmacoem

Hi there.
Had the same issue on both Pi 3 and Pi 4, 32bits OS.
I opened a topic thinking about a bad config but now I know it's a bug...
As I found some previous version of the driver without this bug, does anybody know the most recent release without the bug ?
Thanks in advance

@AIWintermuteAI AIWintermuteAI removed their assignment Nov 25, 2021
@egaznep

egaznep commented Dec 8, 2021

Hi, I am having a similar problem with the Respeaker-4-mic-array for Raspberry Pi, though not with the "reference" mics but with the recordings. Is there any active development/troubleshooting going on?

I am using the 64-bit kernel, and tested on 2 different Raspberry Pis and arrays with Audacity. I get two different permutations: 1-2-3-4 or 3-4-1-2. I couldn't find a reliable way to induce the switch between these permutations, but it does happen if I try hard enough (basically restarting the capture within Audacity, or the script at https://github.com/spatialaudio/python-sounddevice/blob/0.4.1/examples/plot_input.py, until it happens). Please let me know if it would be appropriate to open a new issue. A stable microphone order is crucial for our application, and we would be really happy to reach a solution as fast as possible.

Best regards,

@JaPhoton

JaPhoton commented Dec 9, 2021

Hi, I had the same problem on Respeaker-4-mic-array and made this workaround.
channel_order_fix.zip

In the zip file, ac108.c and seeed-voicecard.c are modified
This patch only works for the Respeaker-4-mic-array, but I think it can be modified for any device. The key point is to start generating the clock after "spin_unlock_irqrestore" (the additional "mdelay(10);" in ac108.c is not needed and should be removed from my patch). I did it today, so I'm not sure if it's working properly.

Yours faithfully,

@egaznep

egaznep commented Dec 10, 2021

@JaPhoton this solved my problem. Thanks a lot!

@dacsantillan

Hello, can you help out with the Respeaker-6-mic-array? Thank you so much

@rnehrboss

@JaPhoton Great work. Did this fix get merged in?
If not, how do we build the new files from the C source files you provided?
@egaznep Looks like you got it working too. Do you mind sharing the recompile and installation steps using @JaPhoton's code?

Thanks!

@JaPhoton

@rnehrboss
There is a Makefile in the repository directory, so you can probably build these files using the "make" command in the console.

@StuartIanNaylor

The 6 mic is a nightmare, as the order seems totally random, and it is currently not much good for the DelaySum/TDOA beamformer I have hacked together.
https://github.com/StuartIanNaylor/ProjectEars It's my 1st C++ project starting from scratch, but I couldn't find another lightweight realtime Pi3-capable beamformer anywhere.

Is anyone else, @JaPhoton, hosting a repo with the channel fixes? As said above, it's impossible to use with rotating channels.

@StuartIanNaylor

#251 (comment)

You have stated the ac108 is EOL; would the http://www.everest-semi.com/pdf/ES7210%20PB.pdf be an alternative?

@aaronAtAgrisound

Is anyone still looking at this? it's been close to a year, and the respeaker 6 is still essentially unusable because of the inconsistent channel order.

@StuartIanNaylor

StuartIanNaylor commented Jul 30, 2022

Only thing I can say is make sure you buy with paypal and at least then you can get a refund as yeah completely unusable if you have a random channel order.

PS Respeaker please fix these with a new revision and supply mic daughter boards as why limit the board to what is a bad choice of the geometry you supply and near impossible to isolate the onboard mics.
Just a straight board 4/8 channel ADC with dupont jumpers for analogue inputs with one being analogue mics from your store.

Or at least be honest and remove them from the store.

@beitong95

I also have the same problem.
Try an older kernel and an older driver.
Branch rel-v5.5 works for me.
I use sudo ./install.sh --compat-kernel to install the driver. It will use the hardcoded FORCE_KERNEL in the install.sh, which is 1.20200819-1 a.k.a. 5.4.51-v7l+.
The mic array I am using is a 4-mic linear array.
The raspberry pi I am using is:
Revision : c03111
SoC : BCM2711
RAM : 4GB

@StuartIanNaylor

I ordered via paypal and got a refund as seemed a better idea.

@egaznep

egaznep commented Oct 11, 2022 via email

@jacopomaroli

hey folks, I did a PR which cleans up and improves the above patch since it was breaking the output for my respeaker 6 mic.
The original patch was getting rid of the clock changes for the ac101 (output) without doing it anywhere else.

Let me know if you had the same problem and if this fixes it :)

P.S. I did a PR against HinTak repo as it's more updated and we might want to batch multiple changes. let me know if you prefer a PR against this repo instead.

@jacopomaroli

jacopomaroli commented Feb 22, 2023

Well... it turns out the previous solution worked only about 80% of the time, so I bit the bullet and implemented automatic loopback channel detection in the ec project (that's realistically how most of us would use it anyway).
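
For readers wondering what such detection could look like in principle, here is a rough sketch. It is not the code from the ec project or the PR. It assumes the loopback pair has already been located by some other means (for example by finding the channels that match the playback signal), that the misordering is a pure rotation of the channel list, and the names `rotation_from_reference` and `undo_rotation` are invented for this example.

```python
def rotation_from_reference(found, expected=(6, 7), n_channels=8):
    """Return how many positions the channels are rotated by.

    `found` holds the channel indices where the loopback pair was detected.
    Returns None if the positions do not correspond to a pure rotation.
    """
    for shift in range(n_channels):
        if sorted((e + shift) % n_channels for e in expected) == sorted(found):
            return shift
    return None


def undo_rotation(frame, shift):
    """Move every sample of one frame back by `shift` channel positions."""
    n = len(frame)
    return [frame[(i + shift) % n] for i in range(n)]
```

For example, if the loopback pair shows up on channels 0 and 1, `rotation_from_reference([0, 1])` returns 2, and applying `undo_rotation(frame, 2)` to every frame puts the pair back on channels 6 and 7.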

please check the PR above this comment and play with the other features I added. Let me know how it goes :)

@changxuding

Is anyone still looking at this? it's been close to a year, and the respeaker 6 is still essentially unusable because of the inconsistent channel order.

I checked out the branch linux-4.19-or-less instead of master and used sudo ./install.sh --compat-kernel; the kernel version then changes from 5.10.17-v7l+ to 4.19, and the order becomes correct...

@HinTak
Contributor

HinTak commented Apr 12, 2023

FWIW, even "placebo style" random white-space changes are guaranteed to be correct at least 25% of the time, since there are only 4 sync positions. Besides, I don't think the original was as poor as 25%? More like occasionally wrong (i.e. 80% correct). So I think "80%" correct is just placebo.

@beitong95

One more comment on this issue. I think many users need to remotely log in to the Raspberry Pi through their laptop (not 24/7 on, you might close your laptop), start a screen session, then run their script in the screen session, and lastly, use Ctrl-A Ctrl-D to exit the screen session. In this way, your script will keep running even if you disconnect the SSH session.

However, in our tests, this process may lead to a channel shift after you use Ctrl-A Ctrl-D to exit the screen session. The solution is to not use screen on Raspberry Pi to keep your remote command alive.

@HinTak
Contributor

HinTak commented Apr 28, 2023

Disclaimer: I don't work for Seeed Studio. FWIW, comments like "this issue happens in this other situation I care about too" aren't helpful.

The problem is well understood, I think: various components of the hardware just fake it and pack 8-channel 44k data to and from 2-channel 176k data. There are 4 ways of doing it. The driver starts and stops the components together, so most of the time it is correct. However, when the system is busy (in any situation, hence naming your "favourite" situation is not helpful) and stutters a bit, they go out of sync and you get one of the other 3 of the 4 ways of packing 2x176k to 8x44k.
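
The "4 ways" can be illustrated with a toy simulation. This is only a sketch of the packing idea described above, not driver code: samples are labelled with their true channel number, the function name `capture_with_offset` is invented, and the model assumes that the only thing that can go wrong is regrouping the stereo pairs starting a whole number of pairs late.

```python
def capture_with_offset(n_frames, pair_offset, n_channels=8):
    """Simulate 8-channel audio carried as stereo pairs and regrouped
    starting `pair_offset` pairs late.

    Each sample is just its true channel number, so the order is visible.
    """
    # The wire carries ch0,ch1 | ch2,ch3 | ch4,ch5 | ch6,ch7 | ch0,ch1 | ...
    wire = [ch for _ in range(n_frames) for ch in range(n_channels)]
    start = 2 * pair_offset  # two samples per stereo pair
    usable = (len(wire) - start) // n_channels * n_channels
    stream = wire[start:start + usable]
    # Regroup into 8-channel frames, as the receiving side would.
    return [stream[i:i + n_channels] for i in range(0, usable, n_channels)]


for offset in range(4):
    # First regrouped frame for each of the four possible start positions.
    print(offset, capture_with_offset(3, offset)[0])
```

Only offset 0 leaves the reference pair on channels 6 and 7; the other three offsets rotate the channel list by 2, 4 and 6 positions, and an offset of 4 pairs is indistinguishable from offset 0.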

I think the only correct way to fix this, is to fix the other bug about kernel panic with spinlocks. That addresses the scheduling problem.

@HinTak
Contributor

HinTak commented Apr 28, 2023

The spinlock issue is #251

@codepainters

This exact issue has bitten me in my current project. With 4 mic ReSpeaker card I observe occasional channel swap while recording.

While reading the issue and the BCM I2S block description, I came across the following:

If a FIFO error occurs in a two channel frame, then channel synchronisation may be lost
which may result in a left right audio channel swap. RXSYNC and TXSYNC status bits are provided to help determine if channel slip has occurred. They indicate if the number of words in the FIFO is a multiple of a full frame (taking into account where we are in the current frame being transferred). This assumes that an integer number of frames data has been sent/read from the FIFOs.

It's the only way I can imagine things going out of sync: swapping the L/R parts of an I2S frame would permute mics 1-2-3-4 into 3-4-1-2, which is what I observe. A sample scenario would be the FIFO overflowing at start, e.g. if the codec starts pushing data before DMA is started. Is this what's happening here?

Is there anything else (github issue, forum thread, whatever) that sheds some light on this topic?

@rnehrboss

rnehrboss commented Sep 18, 2023 via email

@codepainters

Do you refer to this particular comment? #309 (comment)

I will certainly give it a try then.

@rnehrboss

rnehrboss commented Sep 18, 2023 via email

@HinTak
Contributor

HinTak commented Sep 18, 2023

Do you refer to this particular comment? #309 (comment)

I will certainly give it a try then.

Already explained that the code change is rubbish:
#309 (comment)

@rnehrboss

Working for us. Previously the channels were totally random; after the driver change they seem to be 100% accurate. We now have many, many units in the field.

@codepainters

Hmm, I'm still missing any explanation of what exactly is causing the issue, I'd love to gain a deeper understanding.

@HinTak you write about "4 sync positions" - how can there be 4 sync positions, anyway? I must be missing something important here, but my understanding so far was that the mis-synchronization is due to I2S input FIFO going out of sync, but the FIFO is 32-bit wide, so that would only explain 1-2-3-4 to 3-4-1-2 swap. I'm confused here.

@rnehrboss do you remember, if you have tried the original code from the zip from JaPhoton, or the one from jacopomaroli pull request?

@rnehrboss

rnehrboss commented Sep 18, 2023 via email

@HinTak
Contributor

HinTak commented Sep 18, 2023

@codepainters AFAIK it is an artifact of trying to pack and unpack 8-channel audio as 2-channel at 4x frequency. (The 4-channel device has 8 channels with 4 empty). So there are 4 ways of doing it, with a bias to the sync position. So even if you do it without any synchronisation, you would still be 25% correct.

@codepainters

@codepainters AFAIK it is an artifact of trying to pack and unpack 8-channel audio as 2-channel at 4x frequency. (The 4-channel device has 8 channels with 4 empty). So there are 4 ways of doing it, with a bias to the sync position. So even if you do it without any synchronisation, you would still be 25% correct.

I'm still confused about where shall such a (mis)synchronization happen:

  • codec is programmed to output its channels at particular TDM slots in the I2S/PCM frame
  • I2S receiver synchronizes to frame starts, by means of the LRCK signal from codec (no chance for channel misalignment here)
  • unpacking TDM frames happens inside the bcm2835-i2s module, according to the following config (part of the device tree overlay), and it's fully deterministic:

        cpu_dai: seeed-voice-card,cpu {
            sound-dai = <&i2s>;
            dai-tdm-slot-num     = <2>;
            dai-tdm-slot-width   = <32>;
            dai-tdm-slot-tx-mask = <1 1 0 0>;
            dai-tdm-slot-rx-mask = <1 1 0 0>;
        };

The only place in this chain that is susceptible to misalignment is the FIFO, as documented by Broadcom (of course we have 2 TDM slots per I2S channel):

If a FIFO error occurs in a two channel frame, then channel synchronisation may be lost
which may result in a left right audio channel swap.

@codepainters

Ok, I've done my homework, I think I understand a bit more.

@HinTak wrote:

AFAIK it is an artifact of trying to pack and unpack 8-channel audio as 2-channel at 4x frequency. (The 4-channel device has 8 channels with 4 empty). So there are 4 ways of doing it, with a bias to the sync position. So even if you do it without any synchronisation, you would still be 25% correct.

AFAIK the trick of running stereo I2S at 4x the nominal frequency that you refer to is what e.g. the Audio Injector Octo does. It overcomes the sync issue using an additional CPLD as the BCLK and LRCK source (i.e. both the Raspberry Pi and the codec are configured as slaves); as far as I understand, the CPLD takes care of starting the stream at the right moment.

However, with 4-Mic ReSpeaker it is slightly different.

The codec itself is configured with LRCK frequency equal to sampling frequency. Each frame is 128 bits long, with 4 slots, 32 bits each. I've confirmed it by checking codec registers, as well as with the scope (yellow - data, blue - LRCK):

[image: s1, oscilloscope capture of data and LRCK]

You can clearly see 4 slots, 32 bits each, with last 8 bits in each slot zeroed (the codec seems to produce 24 bit samples).
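
To make the frame layout concrete, here is a small sketch that splits one such frame into samples. It is an illustration only: it models the bit order on the wire (MSB first, so each slot is read big-endian), not the S32_LE buffers that ALSA hands to applications, and the function names `unpack_frame` and `bclk_hz` are invented here.

```python
import struct


def unpack_frame(frame):
    """Split one 128-bit frame into four signed 24-bit samples.

    Assumes MSB-first serial data, so each 32-bit slot is read big-endian,
    with the sample in the top 24 bits and the low 8 bits zeroed.
    """
    if len(frame) != 16:
        raise ValueError("expected 16 bytes (4 slots x 32 bits)")
    slots = struct.unpack(">4i", frame)  # four signed 32-bit slots
    return [slot >> 8 for slot in slots]  # arithmetic shift keeps the sign


def bclk_hz(sample_rate, slots=4, slot_bits=32):
    """Bit clock needed when LRCK runs at the sample rate."""
    return sample_rate * slots * slot_bits
```

With LRCK at the sampling frequency, a 48 kHz capture needs `bclk_hz(48000)`, i.e. 6.144 MHz, on BCLK.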

Actually it puzzled me for a while - Pi's I2S interface can handle up to 2 slots of 32 bits per frame, so how is that even possible to handle 4 slots per frame?

Here's the tricky part (from "BCM2711 ARM Peripherals" document):

Note that in frame sync slave mode there are two synchronising methods. The legacy method is used when the frame
length = 0. In this case the internal frame logic has to detect the incoming PCM_FS signal and reset the internal frame
counter at the start of every frame. The logic relies on the PCM_FS to indicate the length of the frame and so can cope
with adjacent frames of different lengths. However, this creates a short timing path that will corrupt the PCM_DOUT for
one specific frame/channel setting.
The preferred method is to set the frame length to the expected length. Here the incoming PCM_FS is used to
resynchronise the internal frame counter and this eliminates the short timing path.

And the ReSpeaker driver sets I2S frame length to 64 bits. Thus, each 128 bit frame from codec is in fact consumed as 2 consecutive 64 bit frames, 2 slots each. The receiver effectively re-synchronizes every second 64-bit frame.

It has interesting implications:

  • channel rotation is not caused at the I2S transport level (as is the case with 4x fs trick) - receiver always synchronizes at the same slot (where the 128 bit frame starts).
  • I've found no information on how the I2S receiver behaves right after enabling it - if it waits for the first LRCK pulse before receiving anything, or if it starts deserializing the bitstream at random place, and only regains the synchronization on the first LRCK pulse.
    • in the first case, it should be enough to enable LRCK only after the receiver is ready (with FIFO emptied), to get a proper channel order. I suppose that's exactly what the patch discussed above tries to achieve.
    • in the second case, if any 32-bit words are written to the FIFO before the first LRCK pulse, then there's no way to reliably synchronize channels without extra hardware.

Unfortunately Broadcom's document doesn't give enough detail; some more experimentation is necessary.

@rnehrboss

rnehrboss commented Sep 23, 2023 via email

@StuartIanNaylor

Dunno does anyone even know what TDM format is in place or is even a true TDM format?
https://docs.espressif.com/projects/esp-idf/en/latest/esp32s3/api-reference/peripherals/i2s.html

@codepainters

Dunno does anyone even know what TDM format is in place or is even a true TDM format? https://docs.espressif.com/projects/esp-idf/en/latest/esp32s3/api-reference/peripherals/i2s.html

I'm not sure what you mean. I'm quite confident now about the format used by the codec. I've checked the registers (I got a full AC108 datasheet from X-Powers) and done some oscilloscope measurements; everything matches.

As stated before: it's 128 bits per frame, 4 slots, 32 bits per slot, LRCK pulse width = 1 BCLK period, 1 BCLK period delay. The AC108 manual calls it PCM mode A (SR = WORD_SIZE = 32, LRCK mode Short):

[image: Screenshot_20230923_194103]

Here's a trace that confirms the LRCK and BCLK polarities:

[image: d7b8ff52-5363-455b-a919-82248f63e34b]

That's pretty much all you need to know about the format.

@codepainters

Wow great analysis.

Thanks. I just wanted to understand the root cause.

I'll be curious to know what your experimentation reveals.

I'm not sure if we will go down the rabbit hole, it's quite deep :) We need a reliable solution, and all the multichannel Raspberry interfaces seem to be a hack now. I'm not sure if it is worth the effort.

@StuartIanNaylor

PCM Short Format: Data has one-bit shift and the WS signal becomes a pulse lasting one BCLK cycle for every frame.

Dunno, the Espressif doc has good examples, but I have always wondered which I2S ports do and don't support TDM mode, and what the difference is. I noticed this with the ESP32, which doesn't support TDM mode...
Is it simply that if the hardware doesn't support TDM mode then you will just lose sync, and, irrespective of software, especially if the master is non-TDM hardware, there is not much you can do about it?

@codepainters

I've done a simple experiment - see https://github.com/codepainters/rpi-i2s-experiments

Basically I send I2S frames to Pi, only enabling LRCK after some number of frames, to check how the receiver behaves. This confirmed my understanding - I2S receiver starts deserializing at a random place in the stream, and only regains synchronization on the first LRCK pulse.
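
The consequence of that behaviour can be shown with a toy model. This is not taken from the rpi-i2s-experiments repository. It simply assumes that some number of 32-bit words (called `stray_words` here, a hypothetical parameter) reach the FIFO before the first LRCK pulse, that everything after the pulse is aligned, and that software groups FIFO words into 4-slot frames counting from the very first word.

```python
def steady_state_order(stray_words, slots=4):
    """Return which slot lands on each channel once the stray words are past.

    Stray words are marked "?"; aligned words carry their slot number.
    """
    aligned = [s for _ in range(stray_words + 2) for s in range(slots)]
    fifo = ["?"] * stray_words + aligned
    # Index of the first software frame that contains no stray words
    # (ceiling division of stray_words by slots).
    first_clean = -(-stray_words // slots)
    start = first_clean * slots
    return fifo[start:start + slots]
```

In this model `steady_state_order(0)` gives `[0, 1, 2, 3]`, while two stray words give `[2, 3, 0, 1]`, which corresponds to the 3-4-1-2 microphone order reported earlier in the thread.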

Given the above, I've no idea how to solve this very issue. With a CPLD/FPGA it could be possible to precisely gate the BCLK and LRCK clocks - but that's a hardware mod.

At that point I decided to give up - even if there's any software-only solution, it would be an ugly hack. For our project we've decided to build a simple I2S to USB interface and use regular USB Audio Device drivers.

@rnehrboss

rnehrboss commented Sep 26, 2023 via email

@wshanmu

wshanmu commented Jul 24, 2024

Branch rel-v5.5 works for me

This also works for the 6-mic array. Thanks for your sharing.

@beitong95

Branch rel-v5.5 works for me

This also works for the 6-mic array. Thanks for your sharing.

I think there are still other reasons that can cause the channel shift problem. But for the software version, I think we should use the rel-v5.5.

@wshanmu

wshanmu commented Jul 25, 2024

For those who just want to avoid this problem: I tried the following steps with a Raspberry Pi 4B and the 6-Mic array and they worked (following what @beitong95 shared):

  1. Flash the image supplied on the readme page (https://files.seeedstudio.com/linux/Raspberry%20Pi%204%20reSpeaker/2021-05-07-raspios-buster-armhf-lite-respeaker.img.xz)
  2. cd seeed-voicecard, then run sudo ./uninstall.sh, and sudo reboot to uninstall the preinstalled version
  3. git checkout -b rel-v5.5 remotes/origin/rel-v5.5, then run sudo ./install.sh --compat-kernel and reboot again.

I also found that the Python script provided by the wiki (which uses the PyAudio module) can get stuck in some cases, so I wrote a script based on sounddevice:

play_and_record.py
import wave
import argparse
import numpy as np
import sounddevice as sd
import os

class Recorder:
    def __init__(self, channels, samplerate, chunk_size):
        self.channels = channels
        self.samplerate = samplerate
        self.chunk_size = chunk_size
        self.frames = []

    def callback(self, indata, frames, time, status):
        if status:
            print(status)
        self.frames.append(indata.copy())

# Setting up argument parser
parser = argparse.ArgumentParser()
parser.add_argument("--filename", type=str, default="output")
parser.add_argument("--playPath", type=str, default="./sequence.npy") # audio to be played
parser.add_argument("--savePath", type=str, default="./recordings/")
parser.add_argument("--playDevice", type=int, default=5)
parser.add_argument("--recDevice", type=int, default=0)

args = parser.parse_args()
WAVE_OUTPUT_FILENAME = args.filename.strip()
file_path = args.playPath.strip()
chunk_size = 1024
data = np.load(file_path)
RECORD_SECONDS = len(data) / 48000 + 0.1

recorder = Recorder(channels=8, samplerate=48000, chunk_size=chunk_size)

stream = sd.InputStream(
    samplerate=48000,
    channels=8,
    dtype='int16',
    blocksize=chunk_size,
    callback=recorder.callback,
    device=args.recDevice
)

print("Recording...")
with stream:
    sd.play(data, samplerate=48000, device=args.playDevice)
    sd.wait()

base_filename = os.path.splitext(WAVE_OUTPUT_FILENAME)[0].strip()
filename = f"{base_filename}.wav"
full_save_path = os.path.join(args.savePath.strip(), filename)
print(f"Saving to {full_save_path}")
os.makedirs(args.savePath.strip(), exist_ok=True)  # create the output directory if it is missing
with wave.open(full_save_path, 'wb') as wf:
    wf.setnchannels(8)
    wf.setsampwidth(2)  # 16-bit resolution
    wf.setframerate(48000)
    wf.writeframes(b''.join(recorder.frames))

print("Done recording")

Hope this will be useful :)

@HinTak
Contributor

HinTak commented Jul 25, 2024

I am not convinced about just downgrading. I think there was a kernel bug between 5.4 and 5.10 where different parts of the hardware were initialized and deinitialized in the wrong order. As this bug is sensitive to how the driver is initialized, that kernel bug may give better sync precisely because it is wrong... Anyway, I haven't seen a "convincing" answer yet, just a lot of voodoo, i.e. "dance naked under the next full moon in an open grass field and your device will work" :-).

@StuartIanNaylor

StuartIanNaylor commented Aug 14, 2024

Essentially you are bit-banging TDM mode over an I2S channel that doesn't support hardware TDM.
Time-division multiplexing does have a standard, and a hardware mode that does control initialisation.
I noticed this on the Espressif site, as the mode overview table they give excludes the very popular esp32, but the bigger esp32-s3 does have hardware support.
https://docs.espressif.com/projects/esp-idf/en/stable/esp32/api-reference/peripherals/i2s.html#overview-of-all-modes

From Espressif to TI, you can get devices that do or don't support hardware TDM, and none suggest bit-banging TDM on a standard I2S.

Here is an old Cirrus Logic app note https://gab.wallawalla.edu/~larry.aamodt/engr432/cirrus_logic_TDM_AN301.pdf

Time Division Multiplexed Audio Interface: A Tutorial

As far as I can gather, the L/R clock in TDM mode is not an L/R clock but the frame sync. Yes, many things that don't have hardware support can be bit-banged, but there are reasons why these have specific hardware needs to run as expected.
There is an actual difference in the timing of the L/R clock, because it's not an L/R clock, it's the TDM frame sync.
There is no kernel bug on initialisation with hardware that doesn't support TDM mode; it just changed to where bit-banging TDM no longer worked.

It clearly states how the Frame Synchronization Pulse should be timed, and it is the same with all hardware TDM I2S, where the L/R clock works in a totally different manner because it isn't an L/R clock but a pulse denoting the 1st frame in the multichannel audio...
