Merge/sound upstream 20241009 #5204

Merged
merged 10,000 commits into from
Oct 14, 2024

Conversation

bardliao
Collaborator

@bardliao bardliao commented Oct 9, 2024

@vijendarmukunda Could you double check the AMD part? I fixed some conflicts. Not sure if I did it right.

Alex Hung and others added 30 commits October 1, 2024 18:10
[WHY & HOW]
Some eDP panels flicker when HDR is enabled in KDE. This quirk works
around it by skipping VSC that is incompatible with eDP panels.

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3151
Cc: Mario Limonciello <[email protected]>
Cc: Alex Deucher <[email protected]>
Reviewed-by: Rodrigo Siqueira <[email protected]>
Signed-off-by: Alex Hung <[email protected]>
Signed-off-by: Aurabindo Pillai <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
(cherry picked from commit 4d42572)
Cc: [email protected]
[Why]

There are more IPS modes other than DMUB_IPS_ENABLE that enable IPS. We
need to enable the hotplug detect idle workqueue for those modes as
well.

[How]

Modify the if condition to initialize the workqueue in all IPS modes
except for DMUB_IPS_DISABLE_ALL.

Fixes: 6544458 ("drm/amd/display: Determine IPS mode by ASIC and PMFW versions")
Signed-off-by: Leo Li <[email protected]>
Reviewed-by: Roman Li <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
(cherry picked from commit 181db30)
Cc: [email protected]
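
The condition change described in this commit can be sketched as follows. This is a standalone illustration, not the driver code: the enum `ips_mode`, the `IPS_PARTIAL_EXAMPLE` member, and both helper functions are hypothetical names; only `IPS_ENABLE` and `IPS_DISABLE_ALL` mirror the identifiers mentioned in the message.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the driver's IPS mode setting. Only IPS_ENABLE
 * and IPS_DISABLE_ALL mirror names from the commit message; the partial
 * mode is invented for illustration. */
enum ips_mode {
    IPS_ENABLE,
    IPS_DISABLE_ALL,
    IPS_PARTIAL_EXAMPLE,
};

/* Old check: only one mode is treated as "IPS is on". */
static bool needs_idle_workqueue_old(enum ips_mode mode)
{
    return mode == IPS_ENABLE;
}

/* New check: every mode except "disable all" gets the workqueue. */
static bool needs_idle_workqueue_new(enum ips_mode mode)
{
    return mode != IPS_DISABLE_ALL;
}
```

With the old check a partially enabled mode is skipped, while the new check covers it.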
[Why]
When a Thunderbolt monitor is connected and the system is suspended,
the system may hang on resume.

The TBT monitor HPD will be triggered during the resume procedure
and call the drm_client_modeset_probe() while
struct drm_connector connector->dev->master is NULL.

It will mess up the pipe topology after resume.

[How]
Skip the TBT monitor HPD during the resume procedure because we
currently will probe the connectors after resume by default.

Reviewed-by: Wayne Lin <[email protected]>
Signed-off-by: Tom Chung <[email protected]>
Signed-off-by: Fangzhi Zuo <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
(cherry picked from commit 453f86a)
Cc: [email protected]
Merge series from Jinjie Ruan <[email protected]>:

Fix pm_runtime_set_suspended() with runtime pm enabled, and fix the missing
check for spi-cadence.

Jinjie Ruan (3):
  spi: spi-imx: Fix pm_runtime_set_suspended() with runtime pm enabled
  spi: spi-cadence: Fix pm_runtime_set_suspended() with runtime pm
    enabled
  spi: spi-cadence: Fix missing spi_controller_is_target() check

 drivers/spi/spi-cadence.c | 8 +++++---
 drivers/spi/spi-imx.c     | 2 +-
 2 files changed, 6 insertions(+), 4 deletions(-)

--
2.34.1
Due to server permission control, the client does not have access to
the shared root directory, but can access subdirectories normally, so
users usually mount the shared subdirectories directly. In this case,
queryfs should use the actual path instead of the root directory to
avoid the call returning an error (EACCES).

Signed-off-by: wangrong <[email protected]>
Reviewed-by: Paulo Alcantara (Red Hat) <[email protected]>
Cc: [email protected]
Signed-off-by: Steve French <[email protected]>
Declarations local to arch/*/kernel/*.c are better off *not* in a public
header - arch/parisc/kernel/unaligned.h is just fine for those
bits.

With that done parisc asm/unaligned.h is reduced to include
of asm-generic/unaligned.h and can be removed - unaligned.h is in
mandatory-y in include/asm-generic/Kbuild.

Acked-by: Helge Deller <[email protected]>
Signed-off-by: Al Viro <[email protected]>
new_dir does *NOT* point into dir_folio - it's an inode, not a pointer
to ufs directory entry.

Fixes: 516b97c "ufs: Convert directory handling to kmap_local"
Acked-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Al Viro <[email protected]>
Some time ago, we introduced the obey_preferred_dacs flag for choosing
the DAC/pin pairs specified by the driver instead of parsing the
paths.  This works as expected, per se, but there have been a few
cases where we forgot to set this flag while the preferred_dacs table
was already set up.  This resulted in incorrect wiring and left us
wondering why it didn't work.

Basically, when the preferred_dacs table is provided, it means that
the driver really wants to wire up to follow that.  That is, the
presence of the preferred_dacs table itself is already a "do-it"
flag.

In this patch, we simply replace the evaluation of the obey_preferred_dacs
flag with a check for the presence of the preferred_dacs table to fix the
misbehavior.  Another patch to drop the obsolete flag will follow.

Fixes: 242d990 ("ALSA: hda/generic: Add option to enforce preferred_dacs pairs")
Link: https://bugzilla.suse.com/show_bug.cgi?id=1219803
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>
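
A minimal sketch of the idea, assuming a much-simplified structure: `example_spec` and both helpers are hypothetical and only borrow the field names `preferred_dacs` and `obey_preferred_dacs` from the message. It shows why treating the table's presence as the "do-it" flag removes the chance of forgetting a separate boolean.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified spec; the real driver state has many more fields. */
struct example_spec {
    const int *preferred_dacs;   /* pin/DAC pairs, or NULL when not provided */
    bool obey_preferred_dacs;    /* legacy flag that is easy to forget */
};

/* Old behavior: the table is honored only if the separate flag is set. */
static bool use_preferred_old(const struct example_spec *spec)
{
    return spec->obey_preferred_dacs;
}

/* New behavior: a provided table is itself the request to use it. */
static bool use_preferred_new(const struct example_spec *spec)
{
    return spec->preferred_dacs != NULL;
}
```

A spec that supplies a table but leaves the flag unset is ignored by the old check and honored by the new one.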
Now that we evaluate the preferred_dacs table directly, the flag is no
longer used and is merely a placeholder.
Let's drop the definition and its users.

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>
Add the quirk for HP Pavilion Gaming laptop 15z-ec200 for
enabling the mute LED. The fix applies the ALC285_FIXUP_HP_MUTE_LED
quirk for this model.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=219303
Signed-off-by: Abhishek Tamboli <[email protected]>
Cc: <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>
If you enable "Option -> Show Debug Info" and click a link, the program
terminates with the following error:

    *** buffer overflow detected ***: terminated

The buffer overflow is caused by the following line:

    strcat(data, "$");

The buffer needs one more byte to accommodate the additional character.

Fixes: c4f7398 ("kconfig: qconf: make debug links work again")
Signed-off-by: Masahiro Yamada <[email protected]>
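
The sizing rule behind this fix can be sketched in plain C. This is an illustration under assumptions, not the qconf code: `append_dollar` is a hypothetical helper, and the point is only that the buffer must hold the original text, the appended character, and the terminating NUL.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated copy of `src` with "$" appended, or NULL if the
 * allocation fails. The caller frees the result. Hypothetical helper. */
static char *append_dollar(const char *src)
{
    size_t len = strlen(src);
    /* len bytes of text + 1 for '$' + 1 for the terminating NUL. */
    char *data = malloc(len + 2);

    if (!data)
        return NULL;
    strcpy(data, src);
    strcat(data, "$");   /* safe: the extra byte was reserved above */
    return data;
}
```

Allocating only `len + 1` bytes would leave no room for the NUL after the appended character, which is the kind of off-by-one the commit describes.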
Since commit d29e741 ("gpio: davinci: drop platform data support"),
irqchip is no longer being registered on platforms that don't use
unbanked gpios. Fix this.

Reported-by: Sabeeh Khan <[email protected]>
Fixes: d29e741 ("gpio: davinci: drop platform data support")
Signed-off-by: Vignesh Raghavendra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Bartosz Golaszewski <[email protected]>
As Jan suggested in the link below, refactor udf_current_aext() to
differentiate between error, EOF, and success. It now takes a pointer to
etype to store the extent type, returns 1 when the extent type was read
successfully, returns 0 on EOF, and returns -errno on error.

Link: https://lore.kernel.org/all/20240912111235.6nr3wuqvktecy3vh@quack3/
Signed-off-by: Zhao Mengmeng <[email protected]>
Suggested-by: Jan Kara <[email protected]>
Signed-off-by: Jan Kara <[email protected]>
Link: https://patch.msgid.link/[email protected]
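
The three-way return convention described above can be sketched as follows. This is not the UDF code: `ext_cursor` and `current_ext` are hypothetical, and an in-memory array stands in for on-disk extents.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical cursor over an in-memory array of extent types. */
struct ext_cursor {
    const int *types;
    size_t count;
    size_t pos;
};

/* Returns 1 and fills *etype on success, 0 when the end is reached,
 * and a negative errno value on error. */
static int current_ext(const struct ext_cursor *c, int *etype)
{
    if (!c || !etype)
        return -EINVAL;
    if (c->pos >= c->count)
        return 0;
    *etype = c->types[c->pos];
    return 1;
}
```

Because the extent type travels through the out-parameter, a caller can tell "no more extents" apart from "something went wrong" without inspecting offsets.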
When the trigger_tstamp_latched flag is set, the PCM core code assumes that
the low-level driver handles the trigger timestamping itself. Ensure that
runtime->trigger_tstamp is always updated.

Buglink: alsa-project/alsa-lib#387
Reported-by: Zeno Endemann <[email protected]>
Signed-off-by: Jaroslav Kysela <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>
Since udf_current_aext() has error handling, udf_next_aext() should have
error handling too. In addition, return -EFSCORRUPTED when too many
indirect extents are found in one inode, and -EIO when reading a block fails.

Signed-off-by: Zhao Mengmeng <[email protected]>
Suggested-by: Jan Kara <[email protected]>
Signed-off-by: Jan Kara <[email protected]>
Link: https://patch.msgid.link/[email protected]
Refactor inode_bmap() to handle error since udf_next_aext() can return
error now. On situations like ftruncate, udf_extend_file() can now
detect errors and bail out early without resorting to checking for
particular offsets and assuming internal behavior of these functions.

Reported-by: [email protected]
Closes: https://syzkaller.appspot.com/bug?extid=7a4842f0b1801230a989
Tested-by: [email protected]
Signed-off-by: Zhao Mengmeng <[email protected]>
Suggested-by: Jan Kara <[email protected]>
Signed-off-by: Jan Kara <[email protected]>
Link: https://patch.msgid.link/[email protected]
Check for overflow when computing alen in udf_current_aext to mitigate
later uninit-value use in udf_get_fileshortad KMSAN bug[1].
After applying the patch reproducer did not trigger any issue[2].

[1] https://syzkaller.appspot.com/bug?extid=8901c4560b7ab5c2f9df
[2] https://syzkaller.appspot.com/x/log.txt?x=10242227980000

Reported-by: [email protected]
Closes: https://syzkaller.appspot.com/bug?extid=8901c4560b7ab5c2f9df
Tested-by: [email protected]
Suggested-by: Jan Kara <[email protected]>
Signed-off-by: Gianfranco Trad <[email protected]>
Signed-off-by: Jan Kara <[email protected]>
Link: https://patch.msgid.link/[email protected]
Add adsp-backed soundcard compatible for QRB4210 RB2 platform,
which as of now looks fully compatible with SM8250.

Signed-off-by: Alexey Klimov <[email protected]>
Reviewed-by: Krzysztof Kozlowski <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Mark Brown <[email protected]>
Add "qcom,qrb4210-rb2-sndcard" to the list of recognizable
devices.

Signed-off-by: Alexey Klimov <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Mark Brown <[email protected]>
My understanding of the interrupts property is that it can either be:
1/ - TX
2/ - TX
   - RX
3/ - Common/combined.

There is very little chance that either:
   - TX
   - Common/combined
or even
   - TX
   - RX
   - Common/combined
could be a thing.

Looking at the interrupt-names definition (which uses oneOf instead of
anyOf), it makes indeed little sense to use anyOf in the interrupts
definition. I believe this is just a mistake, hence let's fix it.

Fixes: 8be9064 ("ASoC: dt-bindings: davinci-mcasp: convert McASP bindings to yaml schema")
Signed-off-by: Miquel Raynal <[email protected]>
Reviewed-by: Krzysztof Kozlowski <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Mark Brown <[email protected]>
In most Linux distribution kernels, SND is set to m. In that case,
booting the kernel on the i.MX8MP EVK board produces a warning
call trace like the one below:
 Call trace:
 snd_card_init+0x484/0x4cc [snd]
 snd_card_new+0x70/0xa8 [snd]
 snd_soc_bind_card+0x310/0xbd0 [snd_soc_core]
 snd_soc_register_card+0xf0/0x108 [snd_soc_core]
 devm_snd_soc_register_card+0x4c/0xa4 [snd_soc_core]

That is because card.owner is not set, which makes snd_card_init()
raise a warning call trace.

Fixes: aa73670 ("ASoC: imx-card: Add imx-card machine driver")
Signed-off-by: Hui Wang <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Mark Brown <[email protected]>
devm_kasprintf() can return a NULL pointer on failure but this
returned value is not checked.

Fixes: b359760 ("ASoC: intel: sof_sdw: Add simple DAI link creation helper")
Signed-off-by: Charles Han <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Mark Brown <[email protected]>
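
A userspace sketch of the same pattern, assuming nothing about the actual driver: `format_name` is a hypothetical stand-in for an allocating formatter such as devm_kasprintf(), and `setup_link` is a hypothetical caller that checks the result before using it.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical allocating formatter: returns NULL on failure. */
static char *format_name(const char *prefix, int index)
{
    int len = snprintf(NULL, 0, "%s-%d", prefix, index);
    char *buf;

    if (len < 0)
        return NULL;
    buf = malloc((size_t)len + 1);
    if (!buf)
        return NULL;
    snprintf(buf, (size_t)len + 1, "%s-%d", prefix, index);
    return buf;
}

/* Hypothetical caller: propagates -ENOMEM instead of using a NULL name. */
static int setup_link(const char *prefix, int index, char **out)
{
    char *name = format_name(prefix, index);

    if (!name)
        return -ENOMEM;
    *out = name;   /* ownership passes to the caller, who frees it */
    return 0;
}
```

The check keeps a failed allocation from turning into a NULL dereference later in the setup path.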
When i2s_irq_handler is called, it's guaranteed that adata is not NULL,
since IRQ handlers are guaranteed to be provided with a valid data pointer.
Moreover, adata pointer is being dereferenced right before the NULL check,
which makes the check pointless, even if adata could be NULL.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Signed-off-by: Murad Masimov <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Mark Brown <[email protected]>
…_object_watched()

When __fsnotify_recalc_mask() recomputes the mask on the watched object,
the compiler can "optimize" the code to perform partial updates to the
mask (including zeroing it at the beginning). Thus places checking
the object mask without conn->lock such as fsnotify_object_watched()
could see invalid states of the mask. Make sure the mask update is
performed by one memory store using WRITE_ONCE().

Reported-by: [email protected]
Reported-by: Dmitry Vyukov <[email protected]>
Link: https://lore.kernel.org/all/CACT4Y+Zk0ohwwwHSD63U2-PQ=UuamXczr1mKBD6xtj2dyYKBvA@mail.gmail.com
Signed-off-by: Jan Kara <[email protected]>
Reviewed-by: Josef Bacik <[email protected]>
Link: https://patch.msgid.link/[email protected]
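
A userspace analogue of the single-store update, under stated assumptions: C11 relaxed atomics are used here in place of the kernel's WRITE_ONCE()/READ_ONCE(), and `watched_object`, `recalc_mask`, and `object_watched` are hypothetical names. The mask is accumulated in a local variable and published with one store, so a lockless reader sees either the old or the new value.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical watched object whose mask is read without a lock. */
struct watched_object {
    _Atomic uint32_t mask;
};

/* Recompute the mask from all marks, then publish it with a single store. */
static void recalc_mask(struct watched_object *obj,
                        const uint32_t *mark_masks, size_t n)
{
    uint32_t new_mask = 0;

    for (size_t i = 0; i < n; i++)
        new_mask |= mark_masks[i];
    /* One store: readers never observe a zeroed or half-built mask. */
    atomic_store_explicit(&obj->mask, new_mask, memory_order_relaxed);
}

/* Lockless check, analogous in spirit to a READ_ONCE() of the mask. */
static int object_watched(struct watched_object *obj, uint32_t event)
{
    return (atomic_load_explicit(&obj->mask, memory_order_relaxed) & event) != 0;
}
```

Updating `obj->mask` in place inside the loop would be the pattern the commit warns about, since a concurrent reader could then see an intermediate value.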
[Syzbot reported]
WARNING: possible circular locking dependency detected
6.11.0-rc4-syzkaller-00019-gb311c1b497e5 #0 Not tainted
------------------------------------------------------
kswapd0/78 is trying to acquire lock:
ffff88801b8d8930 (&group->mark_mutex){+.+.}-{3:3}, at: fsnotify_group_lock include/linux/fsnotify_backend.h:270 [inline]
ffff88801b8d8930 (&group->mark_mutex){+.+.}-{3:3}, at: fsnotify_destroy_mark+0x38/0x3c0 fs/notify/mark.c:578

but task is already holding lock:
ffffffff8ea2fd60 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat mm/vmscan.c:6841 [inline]
ffffffff8ea2fd60 (fs_reclaim){+.+.}-{0:0}, at: kswapd+0xbb4/0x35a0 mm/vmscan.c:7223

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (fs_reclaim){+.+.}-{0:0}:
       ...
       kmem_cache_alloc_noprof+0x3d/0x2a0 mm/slub.c:4044
       inotify_new_watch fs/notify/inotify/inotify_user.c:599 [inline]
       inotify_update_watch fs/notify/inotify/inotify_user.c:647 [inline]
       __do_sys_inotify_add_watch fs/notify/inotify/inotify_user.c:786 [inline]
       __se_sys_inotify_add_watch+0x72e/0x1070 fs/notify/inotify/inotify_user.c:729
       do_syscall_x64 arch/x86/entry/common.c:52 [inline]
       do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

-> #0 (&group->mark_mutex){+.+.}-{3:3}:
       ...
       __mutex_lock+0x136/0xd70 kernel/locking/mutex.c:752
       fsnotify_group_lock include/linux/fsnotify_backend.h:270 [inline]
       fsnotify_destroy_mark+0x38/0x3c0 fs/notify/mark.c:578
       fsnotify_destroy_marks+0x14a/0x660 fs/notify/mark.c:934
       fsnotify_inoderemove include/linux/fsnotify.h:264 [inline]
       dentry_unlink_inode+0x2e0/0x430 fs/dcache.c:403
       __dentry_kill+0x20d/0x630 fs/dcache.c:610
       shrink_kill+0xa9/0x2c0 fs/dcache.c:1055
       shrink_dentry_list+0x2c0/0x5b0 fs/dcache.c:1082
       prune_dcache_sb+0x10f/0x180 fs/dcache.c:1163
       super_cache_scan+0x34f/0x4b0 fs/super.c:221
       do_shrink_slab+0x701/0x1160 mm/shrinker.c:435
       shrink_slab+0x1093/0x14d0 mm/shrinker.c:662
       shrink_one+0x43b/0x850 mm/vmscan.c:4815
       shrink_many mm/vmscan.c:4876 [inline]
       lru_gen_shrink_node mm/vmscan.c:4954 [inline]
       shrink_node+0x3799/0x3de0 mm/vmscan.c:5934
       kswapd_shrink_node mm/vmscan.c:6762 [inline]
       balance_pgdat mm/vmscan.c:6954 [inline]
       kswapd+0x1bcd/0x35a0 mm/vmscan.c:7223

[Analysis]
The problem is that inotify_new_watch() is using GFP_KERNEL to allocate
new watches under group->mark_mutex, however if dentry reclaim races
with unlinking of an inode, it can end up dropping the last dentry reference
for an unlinked inode resulting in removal of fsnotify mark from reclaim
context which wants to acquire group->mark_mutex as well.

This scenario shows that all notification groups are in principle prone
to this kind of a deadlock (previously, we considered only fanotify and
dnotify to be problematic for other reasons) so make sure all
allocations under group->mark_mutex happen with GFP_NOFS.

Reported-and-tested-by: [email protected]
Closes: https://syzkaller.appspot.com/bug?extid=c679f13773f295d2da53
Signed-off-by: Lizhi Xu <[email protected]>
Reviewed-by: Amir Goldstein <[email protected]>
Signed-off-by: Jan Kara <[email protected]>
Link: https://patch.msgid.link/[email protected]
Fix the documentation to match the new function signature.

Fixes: 76c313f ("blk-integrity: improved sg segment mapping")
Signed-off-by: Keith Busch <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
These are called from blkcg_print_blkgs() which already disables IRQs so
disabling it again is wrong.  It means that IRQs will be enabled slightly
earlier than intended, however, so far as I can see, this bug is harmless.

Fixes: 35198e3 ("blk-iocost: read params inside lock in sysfs apis")
Signed-off-by: Dan Carpenter <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
To fix CVE-2023-6270, f98364e ("aoe: fix the potential
use-after-free problem in aoecmd_cfg_pkts") makes tx() call dev_put()
instead of doing so in aoecmd_cfg_pkts(). This keeps tx() from running
into a use-after-free.

Nicolai Stange then found more places in aoe with a potential
use-after-free problem involving tx(), e.g. revalidate(), aoecmd_ata_rw(),
resend(), probe() and aoecmd_cfg_rsp(). Those functions also use
aoenet_xmit() to push packets to the tx queue, so they should also call
dev_hold() to increase the refcnt of skb->dev.

On the other hand, moving dev_put() to tx() causes the refcnt of
skb->dev to drop to a negative value, because the corresponding
dev_hold() calls are missing in revalidate(), aoecmd_ata_rw(), resend(),
probe(), and aoecmd_cfg_rsp(). This patch fixes the issue.

Cc: [email protected]
Link: https://nvd.nist.gov/vuln/detail/CVE-2023-6270
Fixes: f98364e ("aoe: fix the potential use-after-free problem in aoecmd_cfg_pkts")
Reported-by: Nicolai Stange <[email protected]>
Signed-off-by: Chun-Yi Lee <[email protected]>
Link: https://lore.kernel.org/stable/20240624064418.27043-1-jlee%40suse.com
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
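
The reference-count pairing described above can be sketched with a toy model. All names here (`fake_netdev`, `fake_skb`, `dev_hold_ex`, `dev_put_ex`, `queue_for_tx`, `tx_one`) are hypothetical; the sketch only shows that every path that queues a packet must take the reference that the transmit side later drops.

```c
/* Toy stand-ins for a network device and a packet that points at it. */
struct fake_netdev {
    int refcnt;
};

struct fake_skb {
    struct fake_netdev *dev;
};

static void dev_hold_ex(struct fake_netdev *d) { d->refcnt++; }
static void dev_put_ex(struct fake_netdev *d)  { d->refcnt--; }

/* Producer side: take a reference before handing the packet to the queue. */
static void queue_for_tx(struct fake_skb *skb, struct fake_netdev *dev)
{
    dev_hold_ex(dev);
    skb->dev = dev;
}

/* Consumer side: drop the reference once the packet has been sent. */
static void tx_one(struct fake_skb *skb)
{
    /* ... transmit the packet ... */
    dev_put_ex(skb->dev);
}
```

If a producer skipped the hold, each transmit would still perform the put and the count would fall below its starting value, which matches the imbalance the commit describes.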
Commit 2fae6bb ("xen/privcmd: Add new syscall to get gsi from dev")
adds a weak reverse dependency to the config XEN_PRIVCMD definition,
referring to CONFIG_XEN_PCIDEV_BACKEND. In Kconfig files, one refers to
config options without the CONFIG prefix, though. So in its current form,
this does not create the reverse dependency as intended, but is an
attribute with no effect.

Refer to the intended config option XEN_PCIDEV_BACKEND in the XEN_PRIVCMD
definition.

Fixes: 2fae6bb ("xen/privcmd: Add new syscall to get gsi from dev")
Signed-off-by: Lukas Bulwahn <[email protected]>
Reviewed-by: Juergen Gross <[email protected]>
Message-ID: <[email protected]>
Signed-off-by: Juergen Gross <[email protected]>
This reverts commit e6a3531.

The problem that commit e6a3531 fixes was reported as a security bug,
but Google engineers working on Android and ChromeOS didn't want to
change the default behavior; they want to get -EIO rather than
restarting the system, so I am reverting that commit.

Note also that calling machine_restart from the I/O handling code is
potentially unsafe (the reboot notifiers may wait for the bio that
triggered the restart), but Android uses the reboot notifiers to store
the reboot reason into the PMU microcontroller, so machine_restart must
be used.

Signed-off-by: Mikulas Patocka <[email protected]>
Cc: [email protected]
Fixes: e6a3531 ("dm-verity: restart or panic on an I/O error")
Suggested-by: Sami Tolvanen <[email protected]>
Suggested-by: Will Drewry <[email protected]>
@vijendarmukunda

@bardliao : could you please point me commits which you have resolved the merge conflicts?

@bardliao
Collaborator Author

bardliao commented Oct 9, 2024

@bardliao : could you please point me commits which you have resolved the merge conflicts?

It is bardliao@2ffd47d. The conflict is in sound/soc/amd/acp/acp-sdw-sof-mach.c.


@vijendarmukunda vijendarmukunda left a comment


Checked AMD part soundwire machine driver changes. LGTM.

@bardliao
Collaborator Author

SOFCI TEST

@ujfalusi
Collaborator

stable-2.2 has regression:
platform cml_rt5682_def: deferred probe pending: (reason unknown)
platform glk_da7219_def: deferred probe pending: (reason unknown)

and a new (for me) on LNL:

[  853.302082] kernel: snd_sof:sof_pcm_trigger: sof-audio-pci-intel-lnl 0000:00:1f.3: pcm: trigger stream 7 dir 0 cmd 1
[  853.302087] kernel: snd_sof:sof_ipc4_trigger_pipelines: sof-audio-pci-intel-lnl 0000:00:1f.3: trigger cmd: 1 state: 4
[  853.302092] kernel: snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc tx      : 0xe070001|0x180: GLB_CHAIN_DMA
[  853.303958] kernel: snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc rx      : 0x1b0a0000|0x0: GLB_NOTIFICATION|EXCEPTION_CAUGHT
[  853.303967] kernel: sof-audio-pci-intel-lnl 0000:00:1f.3: ------------[ DSP dump start ]------------
[  853.304053] kernel: sof-audio-pci-intel-lnl 0000:00:1f.3: DSP panic!
[  853.304092] kernel: sof-audio-pci-intel-lnl 0000:00:1f.3: fw_state: SOF_FW_BOOT_COMPLETE (7)
[  853.304144] kernel: sof-audio-pci-intel-lnl 0000:00:1f.3: 0x50000005: module: ROM_EXT, state: FW_ENTERED, running
[  853.304208] kernel: sof-audio-pci-intel-lnl 0000:00:1f.3: Firmware state: 0x0, status/error code: 0x0
[  853.304272] kernel: sof-audio-pci-intel-lnl 0000:00:1f.3: Core dump is not available due to invalid separator 0xc0de
[  853.304331] kernel: sof-audio-pci-intel-lnl 0000:00:1f.3: ------------[ DSP dump end ]------------
[  853.304384] kernel: snd_sof:sof_set_fw_state: sof-audio-pci-intel-lnl 0000:00:1f.3: fw_state change: 7 -> 8
[  853.304405] kernel: snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc rx done : 0x1b0a0000|0x0: GLB_NOTIFICATION|EXCEPTION_CAUGHT
[  853.806657] kernel: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc timed out for 0xe070001|0x180
[  853.806749] kernel: sof-audio-pci-intel-lnl 0000:00:1f.3: Attempting to prevent DSP from entering D3 state to preserve context
[  853.806759] kernel: sof-audio-pci-intel-lnl 0000:00:1f.3: ------------[ IPC dump start ]------------
[  853.806819] kernel: sof-audio-pci-intel-lnl 0000:00:1f.3: Host IPC initiator: 0x8e070001|0x180|0x0, target: 0x1b0a0000|0x0|0x0, ctl: 0x3
[  853.806890] kernel: sof-audio-pci-intel-lnl 0000:00:1f.3: ------------[ IPC dump end ]------------
[  853.806939] kernel: sof-audio-pci-intel-lnl 0000:00:1f.3: IPC timeout
[  853.806997] kernel: sof-audio-pci-intel-lnl 0000:00:1f.3: ASoC: error at soc_component_trigger on 0000:00:1f.3: -110
[  853.807065] kernel:  HDMI3: ASoC: trigger FE cmd: 1 failed: -110
[  853.302056]  dma: dma_get: dma_get() ID 0 sref = 2 busy channels 0
[  853.302070]  dma: dma_get: dma_get() ID 0 sref = 2 busy channels 0
[  853.302090]  chain_dma: chain_init: comp:0 0x0 chain_init(): dma_request_channel() failed
[  853.302098]  chain_dma: chain_task_start: comp:129 0x81 chain_task_start(), host_dma_id = 0x00000001
[  853.302103]  os: print_fatal_exception:  ** FATAL EXCEPTION
[  853.302110]  os: print_fatal_exception:  ** CPU 0 EXCCAUSE 13 (load/store PIF data error)
[  853.302115]  os: print_fatal_exception:  **  PC 0xa0079eb5 VADDR (nil)
[  853.302118]  os: print_fatal_exception:  **  PS 0x60720
[  853.302121]  os: print_fatal_exception:  **    (INTLEVEL:0 EXCM: 0 UM:1 RING:0 WOE:1 OWB:7 CALLINC:2)
[  853.302125]  os: xtensa_dump_stack:  **  A0 0xa0052325  SP 0xa010b820  A2 (nil)  A3 0x4011cc80
[  853.302130]  os: xtensa_dump_stack:  **  A4 0xa011cd40  A5 0x18  A6 0x401111a0  A7 0xa010b820
[  853.302133]  os: xtensa_dump_stack:  **  A8 0xa0062ab5  A9 0xa010b7e0 A10 0x401111a0 A11 0xa007bfd0
[  853.302136]  os: xtensa_dump_stack:  ** A12 0xa0062cd8 A13 0x1 A14 0xa A15 0xa010b760
[  853.302140]  os: xtensa_dump_stack:  ** LBEG 0xa0037405 LEND 0xa0037414 LCOUNT 0xa00626cb
[  853.302143]  os: xtensa_dump_stack:  ** SAR 0x1d
[  853.302146]  os: xtensa_dump_stack:  **  THREADPTR (nil)

@ujfalusi
Collaborator

@bardliao, what happens is:
After the last iteration on PCM 6, the test executes kill -9 "$pid" (and does not wait for the termination) and moves to the next PCM (7). For some reason the kill does not happen right away. PCM7 is started (with host/link DMA id 1), then the stop for PCM6 arrives, which places the (host/link 0) ChainDMA in PAUSED. Then we stop PCM7, whose (host/link 1) ChainDMA goes to PAUSED and then RESET, but PCM6 (host/link 0) is not moved to RESET ????
When we start the PCM7 (host/link 1) -> firmware crash.

I don't see anything like this happening with other PRs...

@ujfalusi
Collaborator

Logging the test result before re-triggering the test: 46626

@ujfalusi
Collaborator

SOFCI TEST

@bardliao
Collaborator Author

@ujfalusi Is it possible to bisect it? Can the issue be reproduced with linux-next kernel?

@ujfalusi
Collaborator

I think the i2c bus is not probing, and thus the two Chromebooks are without an audio card as the codec is not probed.

@ujfalusi
Collaborator

@ujfalusi Is it possible to bisect it? Can the issue be reproduced with linux-next kernel?

I would go with mainline first, it should work fine on LNL...

@bardliao
Collaborator Author

I checked the stable v2.2 issue on ubuntu@jf-cml-hel-rt5682-05, the same device as the CI test, and see the error below.

[    5.350869] sof-audio-pci-intel-cnl 0000:00:1f.3: ------------[ DSP dump start ]------------
[    5.350913] sof-audio-pci-intel-cnl 0000:00:1f.3: Firmware boot failure due to timeout
[    5.350933] sof-audio-pci-intel-cnl 0000:00:1f.3: fw_state: SOF_FW_BOOT_IN_PROGRESS (3)
[    5.350957] sof-audio-pci-intel-cnl 0000:00:1f.3: 0x80000005: module: ROM, state: FW_ENTERED, not running
[    5.350981] sof-audio-pci-intel-cnl 0000:00:1f.3: status code: 0xbeef0000 (error: user exception)
[    5.351062] sof-audio-pci-intel-cnl 0000:00:1f.3: invalid header size 0x1010e0e. FW oops is bogus
[    5.351098] sof-audio-pci-intel-cnl 0000:00:1f.3: unexpected fault 0xbeef0000 trace 0x00000220
[    5.351119] sof-audio-pci-intel-cnl 0000:00:1f.3: ------------[ DSP dump end ]------------
[    5.351139] sof-audio-pci-intel-cnl 0000:00:1f.3: error: failed to boot DSP firmware -5

That is because an incorrect sof-cml.ri is used. The md5sum of the incorrect sof-cml.ri is dee17a3c329e560c08f104f0e35a59f6, and the file was updated on Oct 12th. Not sure what happened. After using the sof-cml.ri from jf-cml-hel-rt5682-01, the issue is gone.

@bardliao
Collaborator Author

There is another issue on the stable-v2.2 test. snd_sof_load_topology is not called by the 6.12-rc2 kernel. The logs below are seen with the 6.11-rc6 kernel but not with the 6.12-rc2 kernel.

snd_sof:snd_sof_load_topology: sof-audio-pci-intel-apl 0000:00:0e.0: loading topology:intel/sof-tplg/sof-glk-da7219.tplg
snd_sof:snd_sof_load_topology: sof-audio-pci-intel-cnl 0000:00:1f.3: loading topology:intel/sof-tplg/sof-cml-rt1011-rt5682.tplg

I will look into it.

@bardliao
Collaborator Author

I think the i2c bus is not probing, and thus the two Chromebooks are without an audio card as the codec is not probed.

You are right. There is no i2c-10EC5682:00 when I check ls /sys/bus/i2c/devices/ with 6.12-rc2 kernel.
ls /sys/bus/i2c/devices/ on 6.11-rc6 kernel:
i2c-0 i2c-1 i2c-10 i2c-10EC1011:00 i2c-10EC1011:01 i2c-10EC1011:02 i2c-10EC1011:03 i2c-10EC5682:00 i2c-2 i2c-3 i2c-4 i2c-5 i2c-6 i2c-7 i2c-8 i2c-9 i2c-ELAN0000:00 i2c-GDIX0000:00
ls /sys/bus/i2c/devices/ on 6.12-rc2 kernel:
i2c-0 i2c-1 i2c-2 i2c-3 i2c-4 i2c-5 i2c-6 i2c-7

@bardliao
Collaborator Author

If I test with the https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git,
the issue happens on the next-20240919 tag and not on the next-20240918 tag. So, the bad commit should be between next-20240918 and next-20240919.

@ujfalusi
Collaborator

It is the i2c_designware which is not probing

@ujfalusi
Collaborator

@bardliao, the i2c issue will be fixed by: thesofproject/kconfig#101

@bardliao
Collaborator Author

@bardliao, the i2c issue will be fixed by: thesofproject/kconfig#101

Thanks @ujfalusi I just found the same. haha.

@ujfalusi
Collaborator

SOFCI TEST

@ujfalusi
Collaborator

Let's see with the updated sof-kconfig...

@bardliao
Collaborator Author

Test result looks good to me. Let's merge.

@bardliao bardliao merged commit 4cbac21 into thesofproject:topic/sof-dev Oct 14, 2024
13 of 15 checks passed