wrapfs-2.6.36.y.git
3 years agoALSA: als4000: Fix the missing snd_card_free() call at probe error
Takashi Iwai [Tue, 12 Apr 2022 10:26:02 +0000 (12:26 +0200)]
ALSA: als4000: Fix the missing snd_card_free() call at probe error

commit d616a0246da88d811f9f4c3aa83003c05efd3af0 upstream.

The previous cleanup with devres may lead to the incorrect release
orders at the probe error handling due to the devres's nature.  Until
we register the card, snd_card_free() has to be called at first for
releasing the stuff properly when the driver tries to manage and
release the stuff via card->private_free().

This patch fixes it by calling snd_card_free() on the error from the
probe callback using a new helper function.

Fixes: 0e175f665960 ("ALSA: als4000: Allocate resources with device-managed APIs")
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20220412102636.16000-6-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agoALSA: als300: Fix the missing snd_card_free() call at probe error
Takashi Iwai [Tue, 12 Apr 2022 10:26:27 +0000 (12:26 +0200)]
ALSA: als300: Fix the missing snd_card_free() call at probe error

commit ab8bce9da6102c575c473c053672547589bc4c59 upstream.

The previous cleanup with devres may lead to the incorrect release
orders at the probe error handling due to the devres's nature.  Until
we register the card, snd_card_free() has to be called at first for
releasing the stuff properly when the driver tries to manage and
release the stuff via card->private_free().

This patch fixes it by calling snd_card_free() manually on the error
from the probe callback.

Fixes: 21a9314cf93b ("ALSA: als300: Allocate resources with device-managed APIs")
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20220412102636.16000-31-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agoALSA: ali5451: Fix the missing snd_card_free() call at probe error
Takashi Iwai [Tue, 12 Apr 2022 10:26:01 +0000 (12:26 +0200)]
ALSA: ali5451: Fix the missing snd_card_free() call at probe error

commit 19401a9441236cfbbbeb1bef4ef4c8668db45dfc upstream.

The recent cleanup with devres may lead to the incorrect release
orders at the probe error handling due to the devres's nature.  Until
we register the card, snd_card_free() has to be called at first for
releasing the stuff properly when the driver tries to manage and
release the stuff via card->private_free().

This patch fixes it by calling snd_card_free() on the error from the
probe callback using a new helper function.

Fixes: 1f0819979248 ("ALSA: ali5451: Allocate resources with device-managed APIs")
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20220412102636.16000-5-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agoALSA: sis7019: Fix the missing error handling
Takashi Iwai [Tue, 12 Apr 2022 10:26:24 +0000 (12:26 +0200)]
ALSA: sis7019: Fix the missing error handling

commit 2236a3243ff8291e97c70097dd11a0fdb8904380 upstream.

The previous cleanup with devres forgot to replace the snd_card_free()
call with the devm version.  Moreover, it still needs the manual call
of snd_card_free() at the probe error path, otherwise the reverse
order of the releases may happen.  This patch addresses those issues.

Fixes: 499ddc16394c ("ALSA: sis7019: Allocate resources with device-managed APIs")
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20220412102636.16000-28-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agoALSA: core: Add snd_card_free_on_error() helper
Takashi Iwai [Tue, 12 Apr 2022 09:31:40 +0000 (11:31 +0200)]
ALSA: core: Add snd_card_free_on_error() helper

commit fee2b871d8d6389c9b4bdf9346a99ccc1c98c9b8 upstream.

This is a small helper function to handle the error path more easily
when an error happens during the probe for the device with the
device-managed card.  Since devres releases in the reverser order of
the creations, usually snd_card_free() gets called at the last in the
probe error path unless it already reached snd_card_register() calls.
Due to this nature, when a driver expects the resource releases in
card->private_free, this might be called too lately.

As a workaround, one should call the probe like:

 static int __some_probe(...) { // do real probe.... }

 static int some_probe(...)
 {
return snd_card_free_on_error(dev, __some_probe(dev, ...));
 }

so that the snd_card_free() is called explicitly at the beginning of
the error path from the probe.

This function will be used in the upcoming fixes to address the
regressions by devres usages.

Fixes: e8ad415b7a55 ("ALSA: core: Add managed card creation")
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20220412093141.8008-2-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agobtrfs: return allocated block group from do_chunk_alloc()
Naohiro Aota [Tue, 22 Mar 2022 09:11:33 +0000 (18:11 +0900)]
btrfs: return allocated block group from do_chunk_alloc()

commit 820c363bd526ec8e133e4b84e6ad1fda12023b4b upstream.

Return the allocated block group from do_chunk_alloc(). This is a
preparation patch for the next patch.

CC: stable@vger.kernel.org # 5.16+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Tested-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agobtrfs: fix btrfs_submit_compressed_write cgroup attribution
Dennis Zhou [Thu, 31 Mar 2022 21:58:28 +0000 (14:58 -0700)]
btrfs: fix btrfs_submit_compressed_write cgroup attribution

commit acee08aaf6d158d03668dc82b0a0eef41100531b upstream.

This restores the logic from commit 46bcff2bfc5e ("btrfs: fix compressed
write bio blkcg attribution") which added cgroup attribution to btrfs
writeback. It also adds back the REQ_CGROUP_PUNT flag for these ios.

Fixes: 91507240482e ("btrfs: determine stripe boundary at bio allocation time in btrfs_submit_compressed_write")
CC: stable@vger.kernel.org # 5.16+
Signed-off-by: Dennis Zhou <dennis@kernel.org>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agobtrfs: release correct delalloc amount in direct IO write path
Naohiro Aota [Mon, 28 Mar 2022 12:32:05 +0000 (21:32 +0900)]
btrfs: release correct delalloc amount in direct IO write path

commit 6d82ad13c4110e73c7b0392f00534a1502a1b520 upstream.

Running generic/406 causes the following WARNING in btrfs_destroy_inode()
which tells there are outstanding extents left.

In btrfs_get_blocks_direct_write(), we reserve a temporary outstanding
extents with btrfs_delalloc_reserve_metadata() (or indirectly from
btrfs_delalloc_reserve_space(()). We then release the outstanding extents
with btrfs_delalloc_release_extents(). However, the "len" can be modified
in the COW case, which releases fewer outstanding extents than expected.

Fix it by calling btrfs_delalloc_release_extents() for the original length.

To reproduce the warning, the filesystem should be 1 GiB.  It's
triggering a short-write, due to not being able to allocate a large
extent and instead allocating a smaller one.

  WARNING: CPU: 0 PID: 757 at fs/btrfs/inode.c:8848 btrfs_destroy_inode+0x1e6/0x210 [btrfs]
  Modules linked in: btrfs blake2b_generic xor lzo_compress
  lzo_decompress raid6_pq zstd zstd_decompress zstd_compress xxhash zram
  zsmalloc
  CPU: 0 PID: 757 Comm: umount Not tainted 5.17.0-rc8+ #101
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS d55cb5a 04/01/2014
  RIP: 0010:btrfs_destroy_inode+0x1e6/0x210 [btrfs]
  RSP: 0018:ffffc9000327bda8 EFLAGS: 00010206
  RAX: 0000000000000000 RBX: ffff888100548b78 RCX: 0000000000000000
  RDX: 0000000000026900 RSI: 0000000000000000 RDI: ffff888100548b78
  RBP: ffff888100548940 R08: 0000000000000000 R09: ffff88810b48aba8
  R10: 0000000000000001 R11: ffff8881004eb240 R12: ffff88810b48a800
  R13: ffff88810b48ec08 R14: ffff88810b48ed00 R15: ffff888100490c68
  FS:  00007f8549ea0b80(0000) GS:ffff888237c00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f854a09e733 CR3: 000000010a2e9003 CR4: 0000000000370eb0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   <TASK>
   destroy_inode+0x33/0x70
   dispose_list+0x43/0x60
   evict_inodes+0x161/0x1b0
   generic_shutdown_super+0x2d/0x110
   kill_anon_super+0xf/0x20
   btrfs_kill_super+0xd/0x20 [btrfs]
   deactivate_locked_super+0x27/0x90
   cleanup_mnt+0x12c/0x180
   task_work_run+0x54/0x80
   exit_to_user_mode_prepare+0x152/0x160
   syscall_exit_to_user_mode+0x12/0x30
   do_syscall_64+0x42/0x80
   entry_SYSCALL_64_after_hwframe+0x44/0xae
   RIP: 0033:0x7f854a000fb7

Fixes: f0bfa76a11e9 ("btrfs: fix ENOSPC failure when attempting direct IO write into NOCOW range")
CC: stable@vger.kernel.org # 5.17
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Tested-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agodrm/amdgpu: Ensure HDA function is suspended before ASIC reset
Kai-Heng Feng [Thu, 7 Apr 2022 12:12:28 +0000 (20:12 +0800)]
drm/amdgpu: Ensure HDA function is suspended before ASIC reset

commit 887f75cfd0da44c19dda93b2ff9e70ca8792cdc1 upstream.

DP/HDMI audio on AMD PRO VII stops working after S3:
[  149.450391] amdgpu 0000:63:00.0: amdgpu: MODE1 reset
[  149.450395] amdgpu 0000:63:00.0: amdgpu: GPU mode1 reset
[  149.450494] amdgpu 0000:63:00.0: amdgpu: GPU psp mode1 reset
[  149.983693] snd_hda_intel 0000:63:00.1: refused to change power state from D0 to D3hot
[  150.003439] amdgpu 0000:63:00.0: refused to change power state from D0 to D3hot
...
[  155.432975] snd_hda_intel 0000:63:00.1: CORB reset timeout#2, CORBRP = 65535

The offending commit is daf8de0874ab5b ("drm/amdgpu: always reset the asic in
suspend (v2)"). Commit 34452ac3038a7 ("drm/amdgpu: don't use BACO for
reset in S3 ") doesn't help, so the issue is something different.

Assuming that to make HDA resume to D0 fully realized, it needs to be
successfully put to D3 first. And this guesswork proves working, by
moving amdgpu_asic_reset() to noirq callback, so it's called after HDA
function is in D3.

Fixes: daf8de0874ab5b ("drm/amdgpu: always reset the asic in suspend (v2)")
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agouapi/linux/stddef.h: Add include guards
Tadeusz Struk [Tue, 29 Mar 2022 17:12:52 +0000 (10:12 -0700)]
uapi/linux/stddef.h: Add include guards

commit 55037ed7bdc62151a726f5685f88afa6a82959b1 upstream.

Add include guard wrapper define to uapi/linux/stddef.h to prevent macro
redefinition errors when stddef.h is included more than once. This was not
needed before since the only contents already used a redefinition test.

Signed-off-by: Tadeusz Struk <tadeusz.struk@linaro.org>
Link: https://lore.kernel.org/r/20220329171252.57279-1-tadeusz.struk@linaro.org
Fixes: 50d7bd38c3aa ("stddef: Introduce struct_group() helper macro")
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agomedia: si2157: unknown chip version Si2147-A30 ROM 0x50
Piotr Chmura [Thu, 31 Mar 2022 15:55:50 +0000 (17:55 +0200)]
media: si2157: unknown chip version Si2147-A30 ROM 0x50

commit 3ae87d2f25c0e998da2721ce332e2b80d3d53c39 upstream.

Fix firmware file names assignment in si2157 tuner, allow for running
devices without firmware files needed.

modprobe gives error: unknown chip version Si2147-A30 ROM 0x50
Device initialization is interrupted.

Caused by:
1. table si2157_tuners has swapped fields rom_id and required vs struct
   si2157_tuner_info.
2. both firmware file names can be null for devices with
   required == false - device uses build-in firmware in this case

Tested on this device:
m07ca:1871 AVerMedia Technologies, Inc. TD310 DVB-T/T2/C dongle

[mchehab: fix mangled patch]
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215726
Link: https://lore.kernel.org/lkml/5f660108-8812-383c-83e4-29ee0558d623@leemhuis.info/
Link: https://lore.kernel.org/linux-media/c4bcaff8-fbad-969e-ad47-e2c487ac02a1@gmail.com
Fixes: 1c35ba3bf972 ("media: si2157: use a different namespace for firmware")
Cc: stable@vger.kernel.org # 5.17.x
Signed-off-by: Piotr Chmura <chmooreck@gmail.com>
Tested-by: Robert Schlabbach <robert_s@gmx.net>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agoRISC-V: KVM: Don't clear hgatp CSR in kvm_arch_vcpu_put()
Anup Patel [Sat, 9 Apr 2022 03:45:33 +0000 (09:15 +0530)]
RISC-V: KVM: Don't clear hgatp CSR in kvm_arch_vcpu_put()

commit 8c3ce496bd612bd21679e445f75fcabb6be997b2 upstream.

We might have RISC-V systems (such as QEMU) where VMID is not part
of the TLB entry tag so these systems will have to flush all TLB
entries upon any change in hgatp.VMID.

Currently, we zero-out hgatp CSR in kvm_arch_vcpu_put() and we
re-program hgatp CSR in kvm_arch_vcpu_load(). For above described
systems, this will flush all TLB entries whenever VCPU exits to
user-space hence reducing performance.

This patch fixes above described performance issue by not clearing
hgatp CSR in kvm_arch_vcpu_put().

Fixes: 34bde9d8b9e6 ("RISC-V: KVM: Implement VCPU world-switch")
Cc: stable@vger.kernel.org
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agobtrfs: remove unused variable in btrfs_{start,write}_dirty_block_groups()
Nathan Chancellor [Thu, 24 Mar 2022 15:36:45 +0000 (08:36 -0700)]
btrfs: remove unused variable in btrfs_{start,write}_dirty_block_groups()

commit 6d4a6b515c39f1f8763093e0f828959b2fbc2f45 upstream.

Clang's version of -Wunused-but-set-variable recently gained support for
unary operations, which reveals two unused variables:

  fs/btrfs/block-group.c:2949:6: error: variable 'num_started' set but not used [-Werror,-Wunused-but-set-variable]
          int num_started = 0;
              ^
  fs/btrfs/block-group.c:3116:6: error: variable 'num_started' set but not used [-Werror,-Wunused-but-set-variable]
          int num_started = 0;
              ^
  2 errors generated.

These variables appear to be unused from their introduction, so just
remove them to silence the warnings.

Fixes: c9dc4c657850 ("Btrfs: two stage dirty block group writeout")
Fixes: 1bbc621ef284 ("Btrfs: allow block group cache writeout outside critical section in commit")
CC: stable@vger.kernel.org # 5.4+
Link: https://github.com/ClangBuiltLinux/linux/issues/1614
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agobtrfs: remove no longer used counter when reading data page
Filipe Manana [Thu, 3 Feb 2022 15:36:44 +0000 (15:36 +0000)]
btrfs: remove no longer used counter when reading data page

commit ad3fc7946b1829213bbdbb2b9ad0d124b31ae4a7 upstream.

After commit 92082d40976ed0 ("btrfs: integrate page status update for
data read path into begin/end_page_read"), the 'nr' counter at
btrfs_do_readpage() is no longer used, we increment it but we never
read from it. So just remove it.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agonet: dsa: realtek: make interface drivers depend on OF
Alvin Šipraga [Wed, 23 Mar 2022 12:42:25 +0000 (13:42 +0100)]
net: dsa: realtek: make interface drivers depend on OF

commit 109d899452ba17996eccec7ae8249fb1f8900a16 upstream.

The kernel test robot reported build warnings with a randconfig that
built realtek-{smi,mdio} without CONFIG_OF set. Since both interface
drivers are using OF and will not probe without, add the corresponding
dependency to Kconfig.

Link: https://lore.kernel.org/all/202203231233.Xx73Y40o-lkp@intel.com/
Link: https://lore.kernel.org/all/202203231439.ycl0jg50-lkp@intel.com/
Fixes: aac94001067d ("net: dsa: realtek: add new mdio interface for drivers")
Fixes: 765c39a4fafe ("net: dsa: realtek: convert subdrivers into modules")
Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
Link: https://lore.kernel.org/r/20220323124225.91763-1-alvin@pqrs.dk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
[alsi: backport to 5.16: remove mdio part]
Cc: stable@vger.kernel.org # v5.16+
Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agonet: dsa: realtek: rtl8365mb: serialize indirect PHY register access
Alvin Šipraga [Mon, 21 Feb 2022 18:46:31 +0000 (19:46 +0100)]
net: dsa: realtek: rtl8365mb: serialize indirect PHY register access

commit 2796728460b822d549841e0341752b263dc265c4 upstream.

Realtek switches in the rtl8365mb family can access the PHY registers of
the internal PHYs via the switch registers. This method is called
indirect access. At a high level, the indirect PHY register access
method involves reading and writing some special switch registers in a
particular sequence. This works for both SMI and MDIO connected
switches.

Currently the rtl8365mb driver does not take any care to serialize the
aforementioned access to the switch registers. In particular, it is
permitted for other driver code to access other switch registers while
the indirect PHY register access is ongoing. Locking is only done at the
regmap level. This, however, is a bug: concurrent register access, even
to unrelated switch registers, risks corrupting the PHY register value
read back via the indirect access method described above.

Arınç reported that the switch sometimes returns nonsense data when
reading the PHY registers. In particular, a value of 0 causes the
kernel's PHY subsystem to think that the link is down, but since most
reads return correct data, the link then flip-flops between up and down
over a period of time.

The aforementioned bug can be readily observed by:

 1. Enabling ftrace events for regmap and mdio
 2. Polling BSMR PHY register for a connected port;
    it should always read the same (e.g. 0x79ed)
 3. Wait for step 2 to give a different value

Example command for step 2:

    while true; do phytool read swp2/2/0x01; done

On my i.MX8MM, the above steps will yield a bogus value for the BSMR PHY
register within a matter of seconds. The interleaved register access it
then evident in the trace log:

 kworker/3:4-70      [003] .......  1927.139849: regmap_reg_write: ethernet-switch reg=1004 val=bd
     phytool-16816   [002] .......  1927.139979: regmap_reg_read: ethernet-switch reg=1f01 val=0
 kworker/3:4-70      [003] .......  1927.140381: regmap_reg_read: ethernet-switch reg=1005 val=0
     phytool-16816   [002] .......  1927.140468: regmap_reg_read: ethernet-switch reg=1d15 val=a69
 kworker/3:4-70      [003] .......  1927.140864: regmap_reg_read: ethernet-switch reg=1003 val=0
     phytool-16816   [002] .......  1927.140955: regmap_reg_write: ethernet-switch reg=1f02 val=2041
 kworker/3:4-70      [003] .......  1927.141390: regmap_reg_read: ethernet-switch reg=1002 val=0
     phytool-16816   [002] .......  1927.141479: regmap_reg_write: ethernet-switch reg=1f00 val=1
 kworker/3:4-70      [003] .......  1927.142311: regmap_reg_write: ethernet-switch reg=1004 val=be
     phytool-16816   [002] .......  1927.142410: regmap_reg_read: ethernet-switch reg=1f01 val=0
 kworker/3:4-70      [003] .......  1927.142534: regmap_reg_read: ethernet-switch reg=1005 val=0
     phytool-16816   [002] .......  1927.142618: regmap_reg_read: ethernet-switch reg=1f04 val=0
     phytool-16816   [002] .......  1927.142641: mdio_access: SMI-0 read  phy:0x02 reg:0x01 val:0x0000 <- ?!
 kworker/3:4-70      [003] .......  1927.143037: regmap_reg_read: ethernet-switch reg=1001 val=0
 kworker/3:4-70      [003] .......  1927.143133: regmap_reg_read: ethernet-switch reg=1000 val=2d89
 kworker/3:4-70      [003] .......  1927.143213: regmap_reg_write: ethernet-switch reg=1004 val=be
 kworker/3:4-70      [003] .......  1927.143291: regmap_reg_read: ethernet-switch reg=1005 val=0
 kworker/3:4-70      [003] .......  1927.143368: regmap_reg_read: ethernet-switch reg=1003 val=0
 kworker/3:4-70      [003] .......  1927.143443: regmap_reg_read: ethernet-switch reg=1002 val=6

The kworker here is polling MIB counters for stats, as evidenced by the
register 0x1004 that we are writing to (RTL8365MB_MIB_ADDRESS_REG). This
polling is performed every 3 seconds, but is just one example of such
unsynchronized access. In Arınç's case, the driver was not using the
switch IRQ, so the PHY subsystem was itself doing polling analogous to
phytool in the above example.

A test module was created [see second Link] to simulate such spurious
switch register accesses while performing indirect PHY register reads
and writes. Realtek was also consulted to confirm whether this is a
known issue or not. The conclusion of these lines of inquiry is as
follows:

1. Reading of PHY registers via indirect access will be aborted if,
   after executing the read operation (via a write to the
   INDIRECT_ACCESS_CTRL_REG), any register is accessed, other than
   INDIRECT_ACCESS_STATUS_REG.

2. The PHY register indirect read is only complete when
   INDIRECT_ACCESS_STATUS_REG reads zero.

3. The INDIRECT_ACCESS_DATA_REG, which is read to get the result of the
   PHY read, will contain the result of the last successful read
   operation. If there was spurious register access and the indirect
   read was aborted, then this register is not guaranteed to hold
   anything meaningful and the PHY read will silently fail.

4. PHY writes do not appear to be affected by this mechanism.

5. Other similar access routines, such as for MIB counters, although
   similar to the PHY indirect access method, are actually table access.
   Table access is not affected by spurious reads or writes of other
   registers. However, concurrent table access is not allowed. Currently
   this is protected via mib_lock, so there is nothing to fix.

The above statements are corroborated both via the test module and
through consultation with Realtek. In particular, Realtek states that
this is simply a property of the hardware design and is not a hardware
bug.

To fix this problem, one must guard against regmap access while the
PHY indirect register read is executing. Fix this by using the newly
introduced "nolock" regmap in all PHY-related functions, and by aquiring
the regmap mutex at the top level of the PHY register access callbacks.
Although no issue has been observed with PHY register _writes_, this
change also serializes the indirect access method there. This is done
purely as a matter of convenience and for reasons of symmetry.

Fixes: 4af2950c50c8 ("net: dsa: realtek-smi: add rtl8365mb subdriver for RTL8365MB-VC")
Link: https://lore.kernel.org/netdev/CAJq09z5FCgG-+jVT7uxh1a-0CiiFsoKoHYsAWJtiKwv7LXKofQ@mail.gmail.com/
Link: https://lore.kernel.org/netdev/871qzwjmtv.fsf@bang-olufsen.dk/
Reported-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Reported-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
[alsi: backport to 5.16: s/priv/smi/g]
Cc: stable@vger.kernel.org # v5.16+
Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agonet: dsa: realtek: allow subdrivers to externally lock regmap
Alvin Šipraga [Mon, 21 Feb 2022 18:46:30 +0000 (19:46 +0100)]
net: dsa: realtek: allow subdrivers to externally lock regmap

commit 907e772f6f6debb610ea28298ab57b31019a4edb upstream.

Currently there is no way for Realtek DSA subdrivers to serialize
consecutive regmap accesses. In preparation for a bugfix relating to
indirect PHY register access - which involves a series of regmap
reads and writes - add a facility for subdrivers to serialize their
regmap access.

Specifically, a mutex is added to the driver private data structure and
the standard regmap is initialized with custom lock/unlock ops which use
this mutex. Then, a "nolock" variant of the regmap is added, which is
functionally equivalent to the existing regmap except that regmap
locking is disabled. Functions that wish to serialize a sequence of
regmap accesses may then lock the newly introduced driver-owned mutex
before using the nolock regmap.

Doing things this way means that subdriver code that doesn't care about
serialized register access - i.e. the vast majority of code - needn't
worry about synchronizing register access with an external lock: it can
just continue to use the original regmap.

Another advantage of this design is that, while regmaps with locking
disabled do not expose a debugfs interface for obvious reasons, there
still exists the original regmap which does expose this interface. This
interface remains safe to use even combined with driver codepaths that
use the nolock regmap, because said codepaths will use the same mutex
to synchronize access.

With respect to disadvantages, it can be argued that having
near-duplicate regmaps is confusing. However, the naming is rather
explicit, and examples will abound.

Finally, while we are at it, rename realtek_smi_mdio_regmap_config to
realtek_smi_regmap_config. This makes it consistent with the naming
realtek_mdio_regmap_config in realtek-mdio.c.

Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[alsi: backport to 5.16: s/priv/smi/g and remove realtek-mdio changes]
Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agoACPI: processor idle: Check for architectural support for LPI
Mario Limonciello [Fri, 25 Feb 2022 19:06:46 +0000 (13:06 -0600)]
ACPI: processor idle: Check for architectural support for LPI

commit eb087f305919ee8169ad65665610313e74260463 upstream.

When `osc_pc_lpi_support_confirmed` is set through `_OSC` and `_LPI` is
populated then the cpuidle driver assumes that LPI is fully functional.

However currently the kernel only provides architectural support for LPI
on ARM.  This leads to high power consumption on X86 platforms that
otherwise try to enable LPI.

So probe whether or not LPI support is implemented before enabling LPI in
the kernel.  This is done by overloading `acpi_processor_ffh_lpi_probe` to
check whether it returns `-EOPNOTSUPP`. It also means that all future
implementations of `acpi_processor_ffh_lpi_probe` will need to follow
these semantics as well.

Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agocpuidle: PSCI: Move the `has_lpi` check to the beginning of the function
Mario Limonciello [Fri, 25 Feb 2022 19:06:45 +0000 (13:06 -0600)]
cpuidle: PSCI: Move the `has_lpi` check to the beginning of the function

commit 01f6c7338ce267959975da65d86ba34f44d54220 upstream.

Currently the first thing checked is whether the PCSI cpu_suspend function
has been initialized.

Another change will be overloading `acpi_processor_ffh_lpi_probe` and
calling it sooner.  So make the `has_lpi` check the first thing checked
to prepare for that change.

Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agodrm/amd/display: Fix p-state allow debug index on dcn31
Nicholas Kazlauskas [Fri, 18 Mar 2022 15:10:34 +0000 (11:10 -0400)]
drm/amd/display: Fix p-state allow debug index on dcn31

commit 3107e1a7ae088ee94323fe9ab05dbefd65b3077f upstream.

[Why]
It changed since dcn30 but the hubbub31 constructor hasn't been
modified to reflect this.

[How]
Update the value in the constructor to 0x6 so we're checking the right
bits for p-state allow.

It worked before by accident, but can falsely assert 0 depending on HW
state transitions. The most frequent of which appears to be when
all pipes turn off during IGT tests.

Cc: Harry Wentland <harry.wentland@amd.com>
Fixes: e7031d8258f1b4 ("drm/amd/display: Add pstate verification and recovery for DCN31")
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: Eric Yang <Eric.Yang2@amd.com>
Acked-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agodrm/amd/display: Add pstate verification and recovery for DCN31
Nicholas Kazlauskas [Fri, 17 Dec 2021 14:55:54 +0000 (09:55 -0500)]
drm/amd/display: Add pstate verification and recovery for DCN31

commit e7031d8258f1b4d6d50e5e5b5d92ba16f66eb8b4 upstream.

[Why]
To debug when p-state is being blocked and avoid PMFW hangs when
it does occur.

[How]
Re-use the DCN10 hardware sequencer by adding a new interface for
verifying p-state high on the hubbub. The interface is mostly the
same as the DCN10 interface, but the bit definitions have changed for
the debug bus.

Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: Eric Yang <Eric.Yang2@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agoLinux 5.17.3 v5.17.3
Greg Kroah-Hartman [Wed, 13 Apr 2022 17:27:43 +0000 (19:27 +0200)]
Linux 5.17.3

Link: https://lore.kernel.org/r/20220412062951.095765152@linuxfoundation.org
Tested-by: Ronald Warsow <rwarsow@gmx.de>
Tested-by: Ron Economos <re@w6rz.net>
Tested-by: Justin M. Forbes <jforbes@fedoraproject.org>
Tested-by: Shuah Khan <skhan@linuxfoundation.org>
Tested-by: Fox Chen <foxhlchen@gmail.com>
Tested-by: Zan Aziz <zanaziz313@gmail.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
Tested-by: Jiri Slaby <jirislaby@kernel.org>
Tested-by: Rudi Heitbaum <rudi@heitbaum.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agoio_uring: drop the old style inflight file tracking
Jens Axboe [Thu, 31 Mar 2022 18:38:46 +0000 (12:38 -0600)]
io_uring: drop the old style inflight file tracking

commit d5361233e9ab920e135819f73dd8466355f1fddd upstream.

io_uring tracks requests that are referencing an io_uring descriptor to
be able to cancel without worrying about loops in the references. Since
we now assign the file at execution time, the easier approach is to drop
a potentially problematic reference before we punt the request. This
eliminates the need to special case these types of files beyond just
marking them as such, and simplifies cancelation quite a bit.

This also fixes a recent issue where an async punted tee operation would
with the io_uring descriptor as the output file would crash when
attempting to get a reference to the file from the io-wq worker. We
could have worked around that, but this is the much cleaner fix.

Fixes: 6bf9c47a3989 ("io_uring: defer file assignment")
Reported-by: syzbot+c4b9303500a21750b250@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agoio_uring: defer file assignment
Jens Axboe [Tue, 29 Mar 2022 16:10:08 +0000 (10:10 -0600)]
io_uring: defer file assignment

commit 6bf9c47a398911e0ab920e362115153596c80432 upstream.

If an application uses direct open or accept, it knows in advance what
direct descriptor value it will get as it picks it itself. This allows
combined requests such as:

sqe = io_uring_get_sqe(ring);
io_uring_prep_openat_direct(sqe, ..., file_slot);
sqe->flags |= IOSQE_IO_LINK | IOSQE_CQE_SKIP_SUCCESS;

sqe = io_uring_get_sqe(ring);
io_uring_prep_read(sqe,file_slot, buf, buf_size, 0);
sqe->flags |= IOSQE_FIXED_FILE;

io_uring_submit(ring);

where we prepare both a file open and read, and only get a completion
event for the read when both have completed successfully.

Currently links are fully prepared before the head is issued, but that
fails if the dependent link needs a file assigned that isn't valid until
the head has completed.

Conversely, if the same chain is performed but the fixed file slot is
already valid, then we would be unexpectedly returning data from the
old file slot rather than the newly opened one. Make sure we're
consistent here.

Allow deferral of file setup, which makes this documented case work.

Cc: stable@vger.kernel.org # v5.15+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agoio_uring: propagate issue_flags state down to file assignment
Jens Axboe [Mon, 4 Apr 2022 23:18:43 +0000 (17:18 -0600)]
io_uring: propagate issue_flags state down to file assignment

commit 5106dd6e74ab6c94daac1c357094f11e6934b36f upstream.

We'll need this in a future patch, when we could be assigning the file
after the prep stage. While at it, get rid of the io_file_get() helper,
it just makes the code harder to read.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agoio_uring: move read/write file prep state into actual opcode handler
Jens Axboe [Tue, 29 Mar 2022 16:48:05 +0000 (10:48 -0600)]
io_uring: move read/write file prep state into actual opcode handler

commit 584b0180f0f4d67d7145950fe68c625f06c88b10 upstream.

In preparation for not necessarily having a file assigned at prep time,
defer any initialization associated with the file to when the opcode
handler is run.

Cc: stable@vger.kernel.org # v5.15+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agostatic_call: Don't make __static_call_return0 static
Christophe Leroy [Mon, 14 Mar 2022 11:49:36 +0000 (12:49 +0100)]
static_call: Don't make __static_call_return0 static

commit 8fd4ddda2f49a66bf5dd3d0c01966c4b1971308b upstream.

System.map shows that vmlinux contains several instances of
__static_call_return0():

c0004fc0 t __static_call_return0
c0011518 t __static_call_return0
c00d8160 t __static_call_return0

arch_static_call_transform() uses the middle one to check whether we are
setting a call to __static_call_return0 or not:

c0011520 <arch_static_call_transform>:
c0011520:       3d 20 c0 01     lis     r9,-16383 <== r9 =  0xc001 << 16
c0011524:       39 29 15 18     addi    r9,r9,5400 <== r9 += 0x1518
c0011528:       7c 05 48 00     cmpw    r5,r9 <== r9 has value 0xc0011518 here

So if static_call_update() is called with one of the other instances of
__static_call_return0(), arch_static_call_transform() won't recognise it.

In order to work properly, global single instance of __static_call_return0() is required.

Fixes: 3f2a8fc4b15d ("static_call/x86: Add __static_call_return0()")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/30821468a0e7d28251954b578e5051dc09300d04.1647258493.git.christophe.leroy@csgroup.eu
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agomm/sparsemem: fix 'mem_section' will never be NULL gcc 12 warning
Waiman Long [Fri, 8 Apr 2022 20:09:01 +0000 (13:09 -0700)]
mm/sparsemem: fix 'mem_section' will never be NULL gcc 12 warning

commit a431dbbc540532b7465eae4fc8b56a85a9fc7d17 upstream.

The gcc 12 compiler reports a "'mem_section' will never be NULL" warning
on the following code:

    static inline struct mem_section *__nr_to_section(unsigned long nr)
    {
    #ifdef CONFIG_SPARSEMEM_EXTREME
        if (!mem_section)
                return NULL;
    #endif
        if (!mem_section[SECTION_NR_TO_ROOT(nr)])
                return NULL;
       :

It happens with CONFIG_SPARSEMEM_EXTREME off.  The mem_section definition
is

    #ifdef CONFIG_SPARSEMEM_EXTREME
    extern struct mem_section **mem_section;
    #else
    extern struct mem_section mem_section[NR_SECTION_ROOTS][SECTIONS_PER_ROOT];
    #endif

In the !CONFIG_SPARSEMEM_EXTREME case, mem_section is a static
2-dimensional array and so the check "!mem_section[SECTION_NR_TO_ROOT(nr)]"
doesn't make sense.

Fix this warning by moving the "!mem_section[SECTION_NR_TO_ROOT(nr)]"
check up inside the CONFIG_SPARSEMEM_EXTREME block and adding an
explicit NR_SECTION_ROOTS check to make sure that there is no
out-of-bound array access.

Link: https://lkml.kernel.org/r/20220331180246.2746210-1-longman@redhat.com
Fixes: 3e347261a80b ("sparsemem extreme implementation")
Signed-off-by: Waiman Long <longman@redhat.com>
Reported-by: Justin Forbes <jforbes@redhat.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Rafael Aquini <aquini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agoirqchip/gic, gic-v3: Prevent GSI to SGI translations
Andre Przywara [Mon, 4 Apr 2022 11:08:42 +0000 (12:08 +0100)]
irqchip/gic, gic-v3: Prevent GSI to SGI translations

commit 544808f7e21cb9ccdb8f3aa7de594c05b1419061 upstream.

At the moment the GIC IRQ domain translation routine happily converts
ACPI table GSI numbers below 16 to GIC SGIs (Software Generated
Interrupts aka IPIs). On the Devicetree side we explicitly forbid this
translation, actually the function will never return HWIRQs below 16 when
using a DT based domain translation.

We expect SGIs to be handled in the first part of the function, and any
further occurrence should be treated as a firmware bug, so add a check
and print to report this explicitly and avoid lengthy debug sessions.

Fixes: 64b499d8df40 ("irqchip/gic-v3: Configure SGIs as standard interrupts")
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220404110842.2882446-1-andre.przywara@arm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agopowerpc/64: Fix build failure with allyesconfig in book3s_64_entry.S
Christophe Leroy [Sun, 27 Mar 2022 07:32:26 +0000 (09:32 +0200)]
powerpc/64: Fix build failure with allyesconfig in book3s_64_entry.S

commit af41d2866f7d75bbb38d487f6ec7770425d70e45 upstream.

Using conditional branches between two files is hasardous,
they may get linked too far from each other.

  arch/powerpc/kvm/book3s_64_entry.o:(.text+0x3ec): relocation truncated
  to fit: R_PPC64_REL14 (stub) against symbol `system_reset_common'
  defined in .text section in arch/powerpc/kernel/head_64.o

Reorganise the code to use non conditional branches.

Fixes: 89d35b239101 ("KVM: PPC: Book3S HV P9: Implement the rest of the P9 path in C")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Avoid odd-looking bne ., use named local labels]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/89cf27bf43ee07a0b2879b9e8e2f5cd6386a3645.1648366338.git.christophe.leroy@csgroup.eu
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agoirqchip/gic-v4: Wait for GICR_VPENDBASER.Dirty to clear before descheduling
Marc Zyngier [Thu, 17 Mar 2022 09:49:02 +0000 (09:49 +0000)]
irqchip/gic-v4: Wait for GICR_VPENDBASER.Dirty to clear before descheduling

commit af27e41612ec7e5b4783f589b753a7c31a37aac8 upstream.

The way KVM drives GICv4.{0,1} is as follows:
- vcpu_load() makes the VPE resident, instructing the RD to start
  scanning for interrupts
- just before entering the guest, we check that the RD has finished
  scanning and that we can start running the vcpu
- on preemption, we deschedule the VPE by making it invalid on
  the RD

However, we are preemptible between the first two steps. If it so
happens *and* that the RD was still scanning, we nonetheless write
to the GICR_VPENDBASER register while Dirty is set, and bad things
happen (we're in UNPRED land).

This affects both the 4.0 and 4.1 implementations.

Make sure Dirty is cleared before performing the deschedule,
meaning that its_clear_vpend_valid() becomes a sort of full VPE
residency barrier.

Reported-by: Jingyi Wang <wangjingyi11@huawei.com>
Tested-by: Nianyao Tang <tangnianyao@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Fixes: 57e3cebd022f ("KVM: arm64: Delay the polling of the GICR_VPENDBASER.Dirty bit")
Link: https://lore.kernel.org/r/4aae10ba-b39a-5f84-754b-69c2eb0a2c03@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agox86/extable: Prefer local labels in .set directives
Nick Desaulniers [Tue, 29 Mar 2022 20:21:45 +0000 (13:21 -0700)]
x86/extable: Prefer local labels in .set directives

commit 334865b2915c33080624e0d06f1c3e917036472c upstream.

Bernardo reported an error that Nathan bisected down to
(x86_64) defconfig+LTO_CLANG_FULL+X86_PMEM_LEGACY.

    LTO     vmlinux.o
  ld.lld: error: <instantiation>:1:13: redefinition of 'found'
  .set found, 0
              ^

  <inline asm>:29:1: while in macro instantiation
  extable_type_reg reg=%eax, type=(17 | ((0) << 16))
  ^

This appears to be another LTO specific issue similar to what was folded
into commit 4b5305decc84 ("x86/extable: Extend extable functionality"),
where the `.set found, 0` in DEFINE_EXTABLE_TYPE_REG in
arch/x86/include/asm/asm.h conflicts with the symbol for the static
function `found` in arch/x86/kernel/pmem.c.

Assembler .set directive declare symbols with global visibility, so the
assembler may not rename such symbols in the event of a conflict. LTO
could rename static functions if there was a conflict in C sources, but
it cannot see into symbols defined in inline asm.

The symbols are also retained in the symbol table, regardless of LTO.

Give the symbols .L prefixes making them locally visible, so that they
may be renamed for LTO to avoid conflicts, and to drop them from the
symbol table regardless of LTO.

Fixes: 4b5305decc84 ("x86/extable: Extend extable functionality")
Reported-by: Bernardo Meurer Costa <beme@google.com>
Debugged-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20220329202148.2379697-1-ndesaulniers@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agox86,static_call: Fix __static_call_return0 for i386
Peter Zijlstra [Fri, 18 Mar 2022 20:24:38 +0000 (21:24 +0100)]
x86,static_call: Fix __static_call_return0 for i386

commit 1cd5f059d956e6f614ba6666ecdbcf95db05d5f5 upstream.

Paolo reported that the instruction sequence that is used to replace:

    call __static_call_return0

namely:

    66 66 48 31 c0 data16 data16 xor %rax,%rax

decodes to something else on i386, namely:

    66 66 48 data16 dec %ax
    31 c0 xor    %eax,%eax

Which is a nonsensical sequence that happens to have the same outcome.
*However* an important distinction is that it consists of 2
instructions which is a problem when the thing needs to be overwriten
with a regular call instruction again.

As such, replace the instruction with something that decodes the same
on both i386 and x86_64.

Fixes: 3f2a8fc4b15d ("static_call/x86: Add __static_call_return0()")
Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220318204419.GT8939@worktop.programming.kicks-ass.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agosched: Teach the forced-newidle balancer about CPU affinity limitation.
Sebastian Andrzej Siewior [Thu, 17 Mar 2022 14:51:32 +0000 (15:51 +0100)]
sched: Teach the forced-newidle balancer about CPU affinity limitation.

commit 386ef214c3c6ab111d05e1790e79475363abaa05 upstream.

try_steal_cookie() looks at task_struct::cpus_mask to decide if the
task could be moved to `this' CPU. It ignores that the task might be in
a migration disabled section while not on the CPU. In this case the task
must not be moved otherwise per-CPU assumption are broken.

Use is_cpu_allowed(), as suggested by Peter Zijlstra, to decide if the a
task can be moved.

Fixes: d2dfa17bc7de6 ("sched: Trivial forced-newidle balancer")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/YjNK9El+3fzGmswf@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agosched/core: Fix forceidle balancing
Peter Zijlstra [Wed, 16 Mar 2022 21:03:41 +0000 (22:03 +0100)]
sched/core: Fix forceidle balancing

commit 5b6547ed97f4f5dfc23f8e3970af6d11d7b7ed7e upstream.

Steve reported that ChromeOS encounters the forceidle balancer being
ran from rt_mutex_setprio()'s balance_callback() invocation and
explodes.

Now, the forceidle balancer gets queued every time the idle task gets
selected, set_next_task(), which is strictly too often.
rt_mutex_setprio() also uses set_next_task() in the 'change' pattern:

queued = task_on_rq_queued(p); /* p->on_rq == TASK_ON_RQ_QUEUED */
running = task_current(rq, p); /* rq->curr == p */

if (queued)
dequeue_task(...);
if (running)
put_prev_task(...);

/* change task properties */

if (queued)
enqueue_task(...);
if (running)
set_next_task(...);

However, rt_mutex_setprio() will explicitly not run this pattern on
the idle task (since priority boosting the idle task is quite insane).
Most other 'change' pattern users are pidhash based and would also not
apply to idle.

Also, the change pattern doesn't contain a __balance_callback()
invocation and hence we could have an out-of-band balance-callback,
which *should* trigger the WARN in rq_pin_lock() (which guards against
this exact anti-pattern).

So while none of that explains how this happens, it does indicate that
having it in set_next_task() might not be the most robust option.

Instead, explicitly queue the forceidle balancer from pick_next_task()
when it does indeed result in forceidle selection. Having it here,
ensures it can only be triggered under the __schedule() rq->lock
instance, and hence must be ran from that context.

This also happens to clean up the code a little, so win-win.

Fixes: d2dfa17bc7de ("sched: Trivial forced-newidle balancer")
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: T.J. Alumbaugh <talumbau@chromium.org>
Link: https://lkml.kernel.org/r/20220330160535.GN8939@worktop.programming.kicks-ass.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agoobjtool: Fix SLS validation for kcov tail-call replacement
Peter Zijlstra [Wed, 23 Mar 2022 22:35:01 +0000 (23:35 +0100)]
objtool: Fix SLS validation for kcov tail-call replacement

commit 7a53f408902d913cd541b4f8ad7dbcd4961f5b82 upstream.

Since not all compilers have a function attribute to disable KCOV
instrumentation, objtool can rewrite KCOV instrumentation in noinstr
functions as per commit:

  f56dae88a81f ("objtool: Handle __sanitize_cov*() tail calls")

However, this has subtle interaction with the SLS validation from
commit:

  1cc1e4c8aab4 ("objtool: Add straight-line-speculation validation")

In that when a tail-call instrucion is replaced with a RET an
additional INT3 instruction is also written, but is not represented in
the decoded instruction stream.

This then leads to false positive missing INT3 objtool warnings in
noinstr code.

Instead of adding additional struct instruction objects, mark the RET
instruction with retpoline_safe to suppress the warning (since we know
there really is an INT3).

Fixes: 1cc1e4c8aab4 ("objtool: Add straight-line-speculation validation")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220323230712.GA8939@worktop.programming.kicks-ass.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agox86/bug: Prevent shadowing in __WARN_FLAGS
Vincent Mailhol [Thu, 24 Mar 2022 02:37:42 +0000 (11:37 +0900)]
x86/bug: Prevent shadowing in __WARN_FLAGS

commit 9ce02f0fc68326dd1f87a0a3a4c6ae7fdd39e6f6 upstream.

The macro __WARN_FLAGS() uses a local variable named "f". This being a
common name, there is a risk of shadowing other variables.

For example, GCC would yield:

| In file included from ./include/linux/bug.h:5,
|                  from ./include/linux/cpumask.h:14,
|                  from ./arch/x86/include/asm/cpumask.h:5,
|                  from ./arch/x86/include/asm/msr.h:11,
|                  from ./arch/x86/include/asm/processor.h:22,
|                  from ./arch/x86/include/asm/timex.h:5,
|                  from ./include/linux/timex.h:65,
|                  from ./include/linux/time32.h:13,
|                  from ./include/linux/time.h:60,
|                  from ./include/linux/stat.h:19,
|                  from ./include/linux/module.h:13,
|                  from virt/lib/irqbypass.mod.c:1:
| ./include/linux/rcupdate.h: In function 'rcu_head_after_call_rcu':
| ./arch/x86/include/asm/bug.h:80:21: warning: declaration of 'f' shadows a parameter [-Wshadow]
|    80 |         __auto_type f = BUGFLAG_WARNING|(flags);                \
|       |                     ^
| ./include/asm-generic/bug.h:106:17: note: in expansion of macro '__WARN_FLAGS'
|   106 |                 __WARN_FLAGS(BUGFLAG_ONCE |                     \
|       |                 ^~~~~~~~~~~~
| ./include/linux/rcupdate.h:1007:9: note: in expansion of macro 'WARN_ON_ONCE'
|  1007 |         WARN_ON_ONCE(func != (rcu_callback_t)~0L);
|       |         ^~~~~~~~~~~~
| In file included from ./include/linux/rbtree.h:24,
|                  from ./include/linux/mm_types.h:11,
|                  from ./include/linux/buildid.h:5,
|                  from ./include/linux/module.h:14,
|                  from virt/lib/irqbypass.mod.c:1:
| ./include/linux/rcupdate.h:1001:62: note: shadowed declaration is here
|  1001 | rcu_head_after_call_rcu(struct rcu_head *rhp, rcu_callback_t f)
|       |                                               ~~~~~~~~~~~~~~~^

For reference, sparse also warns about it, c.f. [1].

This patch renames the variable from f to __flags (with two underscore
prefixes as suggested in the Linux kernel coding style [2]) in order
to prevent collisions.

[1] https://lore.kernel.org/all/CAFGhKbyifH1a+nAMCvWM88TK6fpNPdzFtUXPmRGnnQeePV+1sw@mail.gmail.com/

[2] Linux kernel coding style, section 12) Macros, Enums and RTL,
paragraph 5) namespace collisions when defining local variables in
macros resembling functions
https://www.kernel.org/doc/html/latest/process/coding-style.html#macros-enums-and-rtl

Fixes: bfb1a7c91fb7 ("x86/bug: Merge annotate_reachable() into_BUG_FLAGS() asm")
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20220324023742.106546-1-mailhol.vincent@wanadoo.fr
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agoRevert "powerpc: Set max_mapnr correctly"
Kefeng Wang [Wed, 6 Apr 2022 14:57:58 +0000 (00:57 +1000)]
Revert "powerpc: Set max_mapnr correctly"

commit 1ff5c8e8c835e8a81c0868e3050c76563dd56a2c upstream.

This reverts commit 602946ec2f90d5bd965857753880db29d2d9a1e9.

If CONFIG_HIGHMEM is enabled, no highmem will be added with max_mapnr
set to max_low_pfn, see mem_init():

  for (pfn = highmem_mapnr; pfn < max_mapnr; ++pfn) {
        ...
        free_highmem_page();
  }

Now that virt_addr_valid() has been fixed in the previous commit, we can
revert the change to max_mapnr.

Fixes: 602946ec2f90 ("powerpc: Set max_mapnr correctly")
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reported-by: Erhard F. <erhard_f@mailbox.org>
[mpe: Update change log to reflect series reordering]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220406145802.538416-2-mpe@ellerman.id.au
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agopowerpc: Fix virt_addr_valid() for 64-bit Book3E & 32-bit
Kefeng Wang [Wed, 6 Apr 2022 14:57:57 +0000 (00:57 +1000)]
powerpc: Fix virt_addr_valid() for 64-bit Book3E & 32-bit

commit ffa0b64e3be58519ae472ea29a1a1ad681e32f48 upstream.

mpe: On 64-bit Book3E vmalloc space starts at 0x8000000000000000.

Because of the way __pa() works we have:
  __pa(0x8000000000000000) == 0, and therefore
  virt_to_pfn(0x8000000000000000) == 0, and therefore
  virt_addr_valid(0x8000000000000000) == true

Which is wrong, virt_addr_valid() should be false for vmalloc space.
In fact all vmalloc addresses that alias with a valid PFN will return
true from virt_addr_valid(). That can cause bugs with hardened usercopy
as described below by Kefeng Wang:

  When running ethtool eth0 on 64-bit Book3E, a BUG occurred:

    usercopy: Kernel memory exposure attempt detected from SLUB object not in SLUB page?! (offset 0, size 1048)!
    kernel BUG at mm/usercopy.c:99
    ...
    usercopy_abort+0x64/0xa0 (unreliable)
    __check_heap_object+0x168/0x190
    __check_object_size+0x1a0/0x200
    dev_ethtool+0x2494/0x2b20
    dev_ioctl+0x5d0/0x770
    sock_do_ioctl+0xf0/0x1d0
    sock_ioctl+0x3ec/0x5a0
    __se_sys_ioctl+0xf0/0x160
    system_call_exception+0xfc/0x1f0
    system_call_common+0xf8/0x200

  The code shows below,

    data = vzalloc(array_size(gstrings.len, ETH_GSTRING_LEN));
    copy_to_user(useraddr, data, gstrings.len * ETH_GSTRING_LEN))

  The data is alloced by vmalloc(), virt_addr_valid(ptr) will return true
  on 64-bit Book3E, which leads to the panic.

  As commit 4dd7554a6456 ("powerpc/64: Add VIRTUAL_BUG_ON checks for __va
  and __pa addresses") does, make sure the virt addr above PAGE_OFFSET in
  the virt_addr_valid() for 64-bit, also add upper limit check to make
  sure the virt is below high_memory.

  Meanwhile, for 32-bit PAGE_OFFSET is the virtual address of the start
  of lowmem, high_memory is the upper low virtual address, the check is
  suitable for 32-bit, this will fix the issue mentioned in commit
  602946ec2f90 ("powerpc: Set max_mapnr correctly") too.

On 32-bit there is a similar problem with high memory, that was fixed in
commit 602946ec2f90 ("powerpc: Set max_mapnr correctly"), but that
commit breaks highmem and needs to be reverted.

We can't easily fix __pa(), we have code that relies on its current
behaviour. So for now add extra checks to virt_addr_valid().

For 64-bit Book3S the extra checks are not necessary, the combination of
virt_to_pfn() and pfn_valid() should yield the correct result, but they
are harmless.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Add additional change log detail]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220406145802.538416-1-mpe@ellerman.id.au
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agoDrivers: hv: vmbus: Replace smp_store_mb() with virt_store_mb()
Andrea Parri (Microsoft) [Mon, 28 Mar 2022 15:44:57 +0000 (17:44 +0200)]
Drivers: hv: vmbus: Replace smp_store_mb() with virt_store_mb()

commit eaa03d34535872d29004cb5cf77dc9dec1ba9a25 upstream.

Following the recommendation in Documentation/memory-barriers.txt for
virtual machine guests.

Fixes: 8b6a877c060ed ("Drivers: hv: vmbus: Replace the per-CPU channel lists with a global array of channels")
Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
Link: https://lore.kernel.org/r/20220328154457.100872-1-parri.andrea@gmail.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agoKVM: avoid NULL pointer dereference in kvm_dirty_ring_push
Paolo Bonzini [Wed, 6 Apr 2022 17:13:42 +0000 (13:13 -0400)]
KVM: avoid NULL pointer dereference in kvm_dirty_ring_push

commit 5593473a1e6c743764b08e3b6071cb43b5cfa6c4 upstream.

kvm_vcpu_release() will call kvm_dirty_ring_free(), freeing
ring->dirty_gfns and setting it to NULL.  Afterwards, it calls
kvm_arch_vcpu_destroy().

However, if closing the file descriptor races with KVM_RUN in such away
that vcpu->arch.st.preempted == 0, the following call stack leads to a
NULL pointer dereference in kvm_dirty_run_push():

 mark_page_dirty_in_slot+0x192/0x270 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3171
 kvm_steal_time_set_preempted arch/x86/kvm/x86.c:4600 [inline]
 kvm_arch_vcpu_put+0x34e/0x5b0 arch/x86/kvm/x86.c:4618
 vcpu_put+0x1b/0x70 arch/x86/kvm/../../../virt/kvm/kvm_main.c:211
 vmx_free_vcpu+0xcb/0x130 arch/x86/kvm/vmx/vmx.c:6985
 kvm_arch_vcpu_destroy+0x76/0x290 arch/x86/kvm/x86.c:11219
 kvm_vcpu_destroy arch/x86/kvm/../../../virt/kvm/kvm_main.c:441 [inline]

The fix is to release the dirty page ring after kvm_arch_vcpu_destroy
has run.

Reported-by: Qiuhao Li <qiuhao@sysec.org>
Reported-by: Gaoning Pan <pgn@zju.edu.cn>
Reported-by: Yongkang Jia <kangel@zju.edu.cn>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agodmaengine: Revert "dmaengine: shdma: Fix runtime PM imbalance on error"
Vinod Koul [Thu, 10 Mar 2022 04:43:20 +0000 (10:13 +0530)]
dmaengine: Revert "dmaengine: shdma: Fix runtime PM imbalance on error"

commit d143f939a95696d38ff800ada14402fa50ebbd6c upstream.

This reverts commit 455896c53d5b ("dmaengine: shdma: Fix runtime PM
imbalance on error") as the patch wrongly reduced the count on error and
did not bail out. So drop the count by reverting the patch .

Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agotools build: Use $(shell ) instead of `` to get embedded libperl's ccopts
Arnaldo Carvalho de Melo [Mon, 4 Apr 2022 20:28:48 +0000 (17:28 -0300)]
tools build: Use $(shell ) instead of `` to get embedded libperl's ccopts

commit 541f695cbcb6932c22638b06e0cbe1d56177e2e9 upstream.

Just like its done for ldopts and for both in tools/perf/Makefile.config.

Using `` to initialize PERL_EMBED_CCOPTS somehow precludes using:

  $(filter-out SOMETHING_TO_FILTER,$(PERL_EMBED_CCOPTS))

And we need to do it to allow for building with versions of clang where
some gcc options selected by distros are not available.

Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # Debian/Selfmade LLVM-14 (x86-64)
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Fangrui Song <maskray@google.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Keeping <john@metanate.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Link: http://lore.kernel.org/lkml/YktYX2OnLtyobRYD@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agotools build: Filter out options and warnings not supported by clang
Arnaldo Carvalho de Melo [Tue, 5 Apr 2022 13:33:21 +0000 (10:33 -0300)]
tools build: Filter out options and warnings not supported by clang

commit 41caff459a5b956b3e23ba9ca759dd0629ad3dda upstream.

These make the feature check fail when using clang, so remove them just
like is done in tools/perf/Makefile.config to build perf itself.

Adding -Wno-compound-token-split-by-macro to tools/perf/Makefile.config
when building with clang is also necessary to avoid these warnings
turned into errors (-Werror):

    CC      /tmp/build/perf/util/scripting-engines/trace-event-perl.o
  In file included from util/scripting-engines/trace-event-perl.c:35:
  In file included from /usr/lib64/perl5/CORE/perl.h:4085:
  In file included from /usr/lib64/perl5/CORE/hv.h:659:
  In file included from /usr/lib64/perl5/CORE/hv_func.h:34:
  In file included from /usr/lib64/perl5/CORE/sbox32_hash.h:4:
  /usr/lib64/perl5/CORE/zaphod32_hash.h:150:5: error: '(' and '{' tokens introducing statement expression appear in different macro expansion contexts [-Werror,-Wcompound-token-split-by-macro]
      ZAPHOD32_SCRAMBLE32(state[0],0x9fade23b);
      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  /usr/lib64/perl5/CORE/zaphod32_hash.h:80:38: note: expanded from macro 'ZAPHOD32_SCRAMBLE32'
  #define ZAPHOD32_SCRAMBLE32(v,prime) STMT_START {  \
                                       ^~~~~~~~~~
  /usr/lib64/perl5/CORE/perl.h:737:29: note: expanded from macro 'STMT_START'
  #   define STMT_START   (void)( /* gcc supports "({ STATEMENTS; })" */
                                ^
  /usr/lib64/perl5/CORE/zaphod32_hash.h:150:5: note: '{' token is here
      ZAPHOD32_SCRAMBLE32(state[0],0x9fade23b);
      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  /usr/lib64/perl5/CORE/zaphod32_hash.h:80:49: note: expanded from macro 'ZAPHOD32_SCRAMBLE32'
  #define ZAPHOD32_SCRAMBLE32(v,prime) STMT_START {  \
                                                  ^
  /usr/lib64/perl5/CORE/zaphod32_hash.h:150:5: error: '}' and ')' tokens terminating statement expression appear in different macro expansion contexts [-Werror,-Wcompound-token-split-by-macro]
      ZAPHOD32_SCRAMBLE32(state[0],0x9fade23b);
      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  /usr/lib64/perl5/CORE/zaphod32_hash.h:87:41: note: expanded from macro 'ZAPHOD32_SCRAMBLE32'
      v ^= (v>>23);                       \
                                          ^
  /usr/lib64/perl5/CORE/zaphod32_hash.h:150:5: note: ')' token is here
      ZAPHOD32_SCRAMBLE32(state[0],0x9fade23b);
      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  /usr/lib64/perl5/CORE/zaphod32_hash.h:88:3: note: expanded from macro 'ZAPHOD32_SCRAMBLE32'
  } STMT_END
    ^~~~~~~~
  /usr/lib64/perl5/CORE/perl.h:738:21: note: expanded from macro 'STMT_END'
  #   define STMT_END     )
                          ^

Please refer to the discussion on the Link: tag below, where Nathan
clarifies the situation:

<quote>
acme> And then get to the problems at the end of this message, which seem
acme> similar to the problem described here:
acme>
acme> From  Nathan Chancellor <>
acme> Subject [PATCH] mwifiex: Remove unnecessary braces from HostCmd_SET_SEQ_NO_BSS_INFO
acme>
acme> https://lkml.org/lkml/2020/9/1/135
acme>
acme> So perhaps in this case its better to disable that
acme> -Werror,-Wcompound-token-split-by-macro when building with clang?

Yes, I think that is probably the best solution. As far as I can tell,
at least in this file and context, the warning appears harmless, as the
"create a GNU C statement expression from two different macros" is very
much intentional, based on the presence of PERL_USE_GCC_BRACE_GROUPS.
The warning is fixed in upstream Perl by just avoiding creating GNU C
statement expressions using STMT_START and STMT_END:

  https://github.com/Perl/perl5/issues/18780
  https://github.com/Perl/perl5/pull/18984

If I am reading the source code correctly, an alternative to disabling
the warning would be specifying -DPERL_GCC_BRACE_GROUPS_FORBIDDEN but it
seems like that might end up impacting more than just this site,
according to the issue discussion above.
</quote>

Based-on-a-patch-by: Sedat Dilek <sedat.dilek@gmail.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # Debian/Selfmade LLVM-14 (x86-64)
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Fangrui Song <maskray@google.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Keeping <john@metanate.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Link: http://lore.kernel.org/lkml/YkxWcYzph5pC1EK8@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agoperf python: Fix probing for some clang command line options
Arnaldo Carvalho de Melo [Fri, 8 Apr 2022 13:08:07 +0000 (10:08 -0300)]
perf python: Fix probing for some clang command line options

commit dd6e1fe91cdd52774ca642d1da75b58a86356b56 upstream.

The clang compiler complains about some options even without a source
file being available, while others require one, so use the simple
tools/build/feature/test-hello.c file.

Then check for the "is not supported" string in its output, in addition
to the "unknown argument" already being looked for.

This was noticed when building with clang-13 where -ffat-lto-objects
isn't supported and since we were looking just for "unknown argument"
and not providing a source code to clang, was mistakenly assumed as
being available and not being filtered to set of command line options
provided to clang, leading to a build failure.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Fangrui Song <maskray@google.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Keeping <john@metanate.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Sedat Dilek <sedat.dilek@gmail.com>
Link: http://lore.kernel.org/lkml/
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agoperf build: Don't use -ffat-lto-objects in the python feature test when building...
Arnaldo Carvalho de Melo [Thu, 7 Apr 2022 14:04:20 +0000 (11:04 -0300)]
perf build: Don't use -ffat-lto-objects in the python feature test when building with clang-13

commit 3a8a0475861a443f02e3a9b57d044fe2a0a99291 upstream.

Using -ffat-lto-objects in the python feature test when building with
clang-13 results in:

  clang-13: error: optimization flag '-ffat-lto-objects' is not supported [-Werror,-Wignored-optimization-argument]
  error: command '/usr/sbin/clang' failed with exit code 1
  cp: cannot stat '/tmp/build/perf/python_ext_build/lib/perf*.so': No such file or directory
  make[2]: *** [Makefile.perf:639: /tmp/build/perf/python/perf.so] Error 1

Noticed when building on a docker.io/library/archlinux:base container.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Fangrui Song <maskray@google.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Keeping <john@metanate.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agobpf: Treat bpf_sk_lookup remote_port as a 2-byte field
Jakub Sitnicki [Sat, 19 Mar 2022 18:33:54 +0000 (19:33 +0100)]
bpf: Treat bpf_sk_lookup remote_port as a 2-byte field

commit 058ec4a7d9cf77238c73ad9f1e1a3ed9a29afcab upstream.

In commit 9a69e2b385f4 ("bpf: Make remote_port field in struct
bpf_sk_lookup 16-bit wide") the remote_port field has been split up and
re-declared from u32 to be16.

However, the accompanying changes to the context access converter have not
been well thought through when it comes big-endian platforms.

Today 2-byte wide loads from offsetof(struct bpf_sk_lookup, remote_port)
are handled as narrow loads from a 4-byte wide field.

This by itself is not enough to create a problem, but when we combine

 1. 32-bit wide access to ->remote_port backed by a 16-wide wide load, with
 2. inherent difference between litte- and big-endian in how narrow loads
    need have to be handled (see bpf_ctx_narrow_access_offset),

we get inconsistent results for a 2-byte loads from &ctx->remote_port on LE
and BE architectures. This in turn makes BPF C code for the common case of
2-byte load from ctx->remote_port not portable.

To rectify it, inform the context access converter that remote_port is
2-byte wide field, and only 1-byte loads need to be treated as narrow
loads.

At the same time, we special-case the 4-byte load from &ctx->remote_port to
continue handling it the same way as do today, in order to keep the
existing BPF programs working.

Fixes: 9a69e2b385f4 ("bpf: Make remote_port field in struct bpf_sk_lookup 16-bit wide")
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20220319183356.233666-2-jakub@cloudflare.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agoselftests/bpf: Fix u8 narrow load checks for bpf_sk_lookup remote_port
Jakub Sitnicki [Sat, 19 Mar 2022 18:33:55 +0000 (19:33 +0100)]
selftests/bpf: Fix u8 narrow load checks for bpf_sk_lookup remote_port

commit 3c69611b8926f8e74fcf76bd97ae0e5dafbeb26a upstream.

In commit 9a69e2b385f4 ("bpf: Make remote_port field in struct
bpf_sk_lookup 16-bit wide") ->remote_port field changed from __u32 to
__be16.

However, narrow load tests which exercise 1-byte sized loads from
offsetof(struct bpf_sk_lookup, remote_port) were not adopted to reflect the
change.

As a result, on little-endian we continue testing loads from addresses:

 - (__u8 *)&ctx->remote_port + 3
 - (__u8 *)&ctx->remote_port + 4

which map to the zero padding following the remote_port field, and don't
break the tests because there is no observable change.

While on big-endian, we observe breakage because tests expect to see zeros
for values loaded from:

 - (__u8 *)&ctx->remote_port - 1
 - (__u8 *)&ctx->remote_port - 2

Above addresses map to ->remote_ip6 field, which precedes ->remote_port,
and are populated during the bpf_sk_lookup IPv6 tests.

Unsurprisingly, on s390x we observe:

  #136/38 sk_lookup/narrow access to ctx v4:OK
  #136/39 sk_lookup/narrow access to ctx v6:FAIL

Fix it by removing the checks for 1-byte loads from offsets outside of the
->remote_port field.

Fixes: 9a69e2b385f4 ("bpf: Make remote_port field in struct bpf_sk_lookup 16-bit wide")
Suggested-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20220319183356.233666-3-jakub@cloudflare.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agobpf: Make remote_port field in struct bpf_sk_lookup 16-bit wide
Jakub Sitnicki [Wed, 9 Feb 2022 18:43:32 +0000 (19:43 +0100)]
bpf: Make remote_port field in struct bpf_sk_lookup 16-bit wide

commit 9a69e2b385f443f244a7e8b8bcafe5ccfb0866b4 upstream.

remote_port is another case of a BPF context field documented as a 32-bit
value in network byte order for which the BPF context access converter
generates a load of a zero-padded 16-bit integer in network byte order.

First such case was dst_port in bpf_sock which got addressed in commit
4421a582718a ("bpf: Make dst_port field in struct bpf_sock 16-bit wide").

Loading 4-bytes from the remote_port offset and converting the value with
bpf_ntohl() leads to surprising results, as the expected value is shifted
by 16 bits.

Reduce the confusion by splitting the field in two - a 16-bit field holding
a big-endian integer, and a 16-bit zero-padding anonymous field that
follows it.

Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220209184333.654927-2-jakub@cloudflare.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agoRevert "selftests: net: Add tls config dependency for tls selftests"
Jakub Kicinski [Mon, 28 Mar 2022 21:29:04 +0000 (14:29 -0700)]
Revert "selftests: net: Add tls config dependency for tls selftests"

commit 20695e9a9fd39103d1b0669470ae74030b7aa196 upstream.

This reverts commit d9142e1cf3bbdaf21337767114ecab26fe702d47.

The test is supposed to run cleanly with TLS is disabled,
to test compatibility with TCP behavior. I can't repro
the failure [1], the problem should be debugged rather
than papered over.

Link: https://lore.kernel.org/all/20220325161203.7000698c@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com/
Fixes: d9142e1cf3bb ("selftests: net: Add tls config dependency for tls selftests")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20220328212904.2685395-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agonet/smc: send directly on setting TCP_NODELAY
Dust Li [Tue, 1 Mar 2022 09:43:59 +0000 (17:43 +0800)]
net/smc: send directly on setting TCP_NODELAY

commit b70a5cc045197aad9c159042621baf3c015f6cc7 upstream.

In commit ea785a1a573b("net/smc: Send directly when
TCP_CORK is cleared"), we don't use delayed work
to implement cork.

This patch use the same algorithm, removes the
delayed work when setting TCP_NODELAY and send
directly in setsockopt(). This also makes the
TCP_NODELAY the same as TCP.

Cc: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agodrm/amdkfd: Fix variable set but not used warning
Philip Yang [Fri, 28 Jan 2022 14:26:30 +0000 (09:26 -0500)]
drm/amdkfd: Fix variable set but not used warning

commit 90c44207cdd18091ac9aa7cab8a3e7b0ef00e847 upstream.

All warnings (new ones prefixed by >>):

   drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_svm.c: In function
'svm_range_deferred_list_work':
>> drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_svm.c:2067:22: warning:
variable 'p' set but not used [-Wunused-but-set-variable]
    2067 |  struct kfd_process *p;
         |

Fixes: 367c9b0f1b8750 ("drm/amdkfd: Ensure mm remain valid in svm deferred_list work")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-By: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agoRevert "ACPI: processor: idle: Only flush cache on entering C3"
Akihiko Odaki [Sun, 3 Apr 2022 06:23:22 +0000 (15:23 +0900)]
Revert "ACPI: processor: idle: Only flush cache on entering C3"

commit dfbba2518aac4204203b0697a894d3b2f80134d3 upstream.

Revert commit 87ebbb8c612b ("ACPI: processor: idle: Only flush cache
on entering C3") that broke the assumptions of the acpi_idle_play_dead()
callers.

Namely, the CPU cache must always be flushed in acpi_idle_play_dead(),
regardless of the target C-state that is going to be requested, because
this is likely to be part of a CPU offline procedure or preparation for
entering a system-wide sleep state and the lack of synchronization
between the CPU cache and RAM may lead to problems going forward, for
example when the CPU is brought back online.

In particular, it breaks resume from suspend-to-RAM on Lenovo ThinkPad
C13 which fails occasionally until the problematic commit is reverted.

Signed-off-by: Akihiko Odaki <akihiko.odaki@gmail.com>
[ rjw: Changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Ketsui <esgwpl@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agoSUNRPC: Ensure we flush any closed sockets before xs_xprt_free()
Trond Myklebust [Sun, 3 Apr 2022 19:58:11 +0000 (15:58 -0400)]
SUNRPC: Ensure we flush any closed sockets before xs_xprt_free()

commit f00432063db1a0db484e85193eccc6845435b80e upstream.

We must ensure that all sockets are closed before we call xprt_free()
and release the reference to the net namespace. The problem is that
calling fput() will defer closing the socket until delayed_fput() gets
called.
Let's fix the situation by allowing rpciod and the transport teardown
code (which runs on the system wq) to call __fput_sync(), and directly
close the socket.

Reported-by: Felix Fu <foyjog@gmail.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Fixes: a73881c96d73 ("SUNRPC: Fix an Oops in udp_poll()")
Cc: stable@vger.kernel.org # 5.1.x: 3be232f11a3c: SUNRPC: Prevent immediate close+reconnect
Cc: stable@vger.kernel.org # 5.1.x: 89f42494f92f: SUNRPC: Don't call connect() more than once on a TCP socket
Cc: stable@vger.kernel.org # 5.1.x
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agodrm/amdgpu: don't use BACO for reset in S3
Alex Deucher [Fri, 25 Mar 2022 15:53:39 +0000 (11:53 -0400)]
drm/amdgpu: don't use BACO for reset in S3

commit ebc002e3ee78409c42156e62e4e27ad1d09c5a75 upstream.

Seems to cause a reboots or hangs on some systems.

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1924
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1953
Fixes: daf8de0874ab5b ("drm/amdgpu: always reset the asic in suspend (v2)")
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agodrm/amdkfd: Create file descriptor after client is added to smi_clients list
Lee Jones [Thu, 31 Mar 2022 12:21:17 +0000 (13:21 +0100)]
drm/amdkfd: Create file descriptor after client is added to smi_clients list

commit e79a2398e1b2d47060474dca291542368183bc0f upstream.

This ensures userspace cannot prematurely clean-up the client before
it is fully initialised which has been proven to cause issues in the
past.

Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agodrm/nouveau/pmu: Add missing callbacks for Tegra devices
Karol Herbst [Tue, 22 Mar 2022 12:48:00 +0000 (13:48 +0100)]
drm/nouveau/pmu: Add missing callbacks for Tegra devices

commit 38d4e5cf5b08798f093374e53c2f4609d5382dd5 upstream.

Fixes a crash booting on those platforms with nouveau.

Fixes: 4cdd2450bf73 ("drm/nouveau/pmu/gm200-: use alternate falcon reset sequence")
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Karol Herbst <kherbst@redhat.com>
Cc: dri-devel@lists.freedesktop.org
Cc: nouveau@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v5.17+
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220322124800.2605463-1-kherbst@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agodrm/amdgpu/vcn: Fix the register setting for vcn1
Emily Deng [Mon, 21 Mar 2022 08:25:24 +0000 (16:25 +0800)]
drm/amdgpu/vcn: Fix the register setting for vcn1

commit 02fc996d5098f4c3f65bdf6cdb6b28e3f29ba789 upstream.

Correct the code error for setting register UVD_GFX10_ADDR_CONFIG.
Need to use inst_idx, or it only will set VCN0.

Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agodrm/amdgpu/smu10: fix SoC/fclk units in auto mode
Alex Deucher [Fri, 1 Apr 2022 15:08:48 +0000 (11:08 -0400)]
drm/amdgpu/smu10: fix SoC/fclk units in auto mode

commit 2f25d8ce09b7ba5d769c132ba3d4eb84a941d2cb upstream.

SMU takes clock limits in Mhz units.  socclk and fclk were
using 10 khz units in some cases.  Switch to Mhz units.
Fixes higher than required SoC clocks.

Fixes: 97cf32996c46d9 ("drm/amd/pm: Removed fixed clock in auto mode DPM")
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agodrm/amdgpu/display: change pipe policy for DCN 2.1
Benjamin Marty [Wed, 23 Mar 2022 21:08:26 +0000 (22:08 +0100)]
drm/amdgpu/display: change pipe policy for DCN 2.1

commit 879791ad8bf3dc5453061cad74776a617b6e3319 upstream.

Fixes crash on MST Hub disconnect.

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1849
Fixes: ee2698cf79cc ("drm/amd/display: Changed pipe split policy to allow for multi-display pipe split")
Signed-off-by: Benjamin Marty <info@benjaminmarty.ch>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agodrm/amd/display: Fix by adding FPU protection for dcn30_internal_validate_bw
CHANDAN VURDIGERE NATARAJ [Tue, 29 Mar 2022 07:40:31 +0000 (13:10 +0530)]
drm/amd/display: Fix by adding FPU protection for dcn30_internal_validate_bw

commit ca1198849ab0e7af5efb392ef6baf1138f6fc086 upstream.

[Why]
Below general protection fault observed when WebGL Aquarium is run for
longer duration. If drm debug logs are enabled and set to 0x1f then the
issue is observed within 10 minutes of run.

[  100.717056] general protection fault, probably for non-canonical address 0x2d33302d32323032: 0000 [#1] PREEMPT SMP NOPTI
[  100.727921] CPU: 3 PID: 1906 Comm: DrmThread Tainted: G        W         5.15.30 #12 d726c6a2d6ebe5cf9223931cbca6892f916fe18b
[  100.754419] RIP: 0010:CalculateSwathWidth+0x1f7/0x44f
[  100.767109] Code: 00 00 00 f2 42 0f 11 04 f0 48 8b 85 88 00 00 00 f2 42 0f 10 04 f0 48 8b 85 98 00 00 00 f2 42 0f 11 04 f0 48 8b 45 10 0f 57 c0 <f3> 42 0f 2a 04 b0 0f 57 c9 f3 43 0f 2a 0c b4 e8 8c e2 f3 ff 48 8b
[  100.781269] RSP: 0018:ffffa9230079eeb0 EFLAGS: 00010246
[  100.812528] RAX: 2d33302d32323032 RBX: 0000000000000500 RCX: 0000000000000000
[  100.819656] RDX: 0000000000000001 RSI: ffff99deb712c49c RDI: 0000000000000000
[  100.826781] RBP: ffffa9230079ef50 R08: ffff99deb712460c R09: ffff99deb712462c
[  100.833907] R10: ffff99deb7124940 R11: ffff99deb7124d70 R12: ffff99deb712ae44
[  100.841033] R13: 0000000000000001 R14: 0000000000000000 R15: ffffa9230079f0a0
[  100.848159] FS:  00007af121212640(0000) GS:ffff99deba780000(0000) knlGS:0000000000000000
[  100.856240] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  100.861980] CR2: 0000209000fe1000 CR3: 000000011b18c000 CR4: 0000000000350ee0
[  100.869106] Call Trace:
[  100.871555]  <TASK>
[  100.873655]  ? asm_sysvec_reschedule_ipi+0x12/0x20
[  100.878449]  CalculateSwathAndDETConfiguration+0x1a3/0x6dd
[  100.883937]  dml31_ModeSupportAndSystemConfigurationFull+0x2ce4/0x76da
[  100.890467]  ? kallsyms_lookup_buildid+0xc8/0x163
[  100.895173]  ? kallsyms_lookup_buildid+0xc8/0x163
[  100.899874]  ? __sprint_symbol+0x80/0x135
[  100.903883]  ? dm_update_plane_state+0x3f9/0x4d2
[  100.908500]  ? symbol_string+0xb7/0xde
[  100.912250]  ? number+0x145/0x29b
[  100.915566]  ? vsnprintf+0x341/0x5ff
[  100.919141]  ? desc_read_finalized_seq+0x39/0x87
[  100.923755]  ? update_load_avg+0x1b9/0x607
[  100.927849]  ? compute_mst_dsc_configs_for_state+0x7d/0xd5b
[  100.933416]  ? fetch_pipe_params+0xa4d/0xd0c
[  100.937686]  ? dc_fpu_end+0x3d/0xa8
[  100.941175]  dml_get_voltage_level+0x16b/0x180
[  100.945619]  dcn30_internal_validate_bw+0x10e/0x89b
[  100.950495]  ? dcn31_validate_bandwidth+0x68/0x1fc
[  100.955285]  ? resource_build_scaling_params+0x98b/0xb8c
[  100.960595]  ? dcn31_validate_bandwidth+0x68/0x1fc
[  100.965384]  dcn31_validate_bandwidth+0x9a/0x1fc
[  100.970001]  dc_validate_global_state+0x238/0x295
[  100.974703]  amdgpu_dm_atomic_check+0x9c1/0xbce
[  100.979235]  ? _printk+0x59/0x73
[  100.982467]  drm_atomic_check_only+0x403/0x78b
[  100.986912]  drm_mode_atomic_ioctl+0x49b/0x546
[  100.991358]  ? drm_ioctl+0x1c1/0x3b3
[  100.994936]  ? drm_atomic_set_property+0x92a/0x92a
[  100.999725]  drm_ioctl_kernel+0xdc/0x149
[  101.003648]  drm_ioctl+0x27f/0x3b3
[  101.007051]  ? drm_atomic_set_property+0x92a/0x92a
[  101.011842]  amdgpu_drm_ioctl+0x49/0x7d
[  101.015679]  __se_sys_ioctl+0x7c/0xb8
[  101.015685]  do_syscall_64+0x5f/0xb8
[  101.015690]  ? __irq_exit_rcu+0x34/0x96

[How]
It calles populate_dml_pipes which uses doubles to initialize.
Adding FPU protection avoids context switch and probable loss of vba context
as there is potential contention while drm debug logs are enabled.

Signed-off-by: CHANDAN VURDIGERE NATARAJ <chandan.vurdigerenataraj@amd.com>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agodrm/panel: ili9341: fix optional regulator handling
Daniel Mack [Thu, 17 Mar 2022 22:55:37 +0000 (23:55 +0100)]
drm/panel: ili9341: fix optional regulator handling

commit d14eb80e27795b7b20060f7b151cdfe39722a813 upstream.

If the optional regulator lookup fails, reset the pointer to NULL.
Other functions such as mipi_dbi_poweron_reset_conditional() only do
a NULL pointer check and will otherwise dereference the error pointer.

Fixes: 5a04227326b04c15 ("drm/panel: Add ilitek ili9341 panel driver")
Signed-off-by: Daniel Mack <daniel@zonque.org>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20220317225537.826302-1-daniel@zonque.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agoamd/display: set backlight only if required
Shirish S [Fri, 11 Mar 2022 15:00:17 +0000 (20:30 +0530)]
amd/display: set backlight only if required

commit 4052287a75eb3fc0f487fcc5f768a38bede455c8 upstream.

[Why]
comparing pwm bl values (coverted) with user brightness(converted)
levels in commit_tail leads to continuous setting of backlight via dmub
as they don't to match.
This leads overdrive in queuing of commands to DMCU that sometimes lead
to depending on load on DMCU fw:

"[drm:dc_dmub_srv_wait_idle] *ERROR* Error waiting for DMUB idle: status=3"

[How]
Store last successfully set backlight value and compare with it instead
of pwm reads which is not what we should compare with.

Signed-off-by: Shirish S <shirish.s@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agofbdev: Fix unregistering of framebuffers without device
Thomas Zimmermann [Mon, 4 Apr 2022 19:44:02 +0000 (21:44 +0200)]
fbdev: Fix unregistering of framebuffers without device

commit 0f525289ff0ddeb380813bd81e0f9bdaaa1c9078 upstream.

OF framebuffers do not have an underlying device in the Linux
device hierarchy. Do a regular unregister call instead of hot
unplugging such a non-existing device. Fixes a NULL dereference.
An example error message on ppc64le is shown below.

  BUG: Kernel NULL pointer dereference on read at 0x00000060
  Faulting instruction address: 0xc00000000080dfa4
  Oops: Kernel access of bad area, sig: 11 [#1]
  LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
  [...]
  CPU: 2 PID: 139 Comm: systemd-udevd Not tainted 5.17.0-ae085d7f9365 #1
  NIP:  c00000000080dfa4 LR: c00000000080df9c CTR: c000000000797430
  REGS: c000000004132fe0 TRAP: 0300   Not tainted  (5.17.0-ae085d7f9365)
  MSR:  8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE>  CR: 28228282  XER: 20000000
  CFAR: c00000000000c80c DAR: 0000000000000060 DSISR: 40000000 IRQMASK: 0
  GPR00: c00000000080df9c c000000004133280 c00000000169d200 0000000000000029
  GPR04: 00000000ffffefff c000000004132f90 c000000004132f88 0000000000000000
  GPR08: c0000000015658f8 c0000000015cd200 c0000000014f57d0 0000000048228283
  GPR12: 0000000000000000 c00000003fffe300 0000000020000000 0000000000000000
  GPR16: 0000000000000000 0000000113fc4a40 0000000000000005 0000000113fcfb80
  GPR20: 000001000f7283b0 0000000000000000 c000000000e4a588 c000000000e4a5b0
  GPR24: 0000000000000001 00000000000a0000 c008000000db0168 c0000000021f6ec0
  GPR28: c0000000016d65a8 c000000004b36460 0000000000000000 c0000000016d64b0
  NIP [c00000000080dfa4] do_remove_conflicting_framebuffers+0x184/0x1d0
  [c000000004133280] [c00000000080df9c] do_remove_conflicting_framebuffers+0x17c/0x1d0 (unreliable)
  [c000000004133350] [c00000000080e4d0] remove_conflicting_framebuffers+0x60/0x150
  [c0000000041333a0] [c00000000080e6f4] remove_conflicting_pci_framebuffers+0x134/0x1b0
  [c000000004133450] [c008000000e70438] drm_aperture_remove_conflicting_pci_framebuffers+0x90/0x100 [drm]
  [c000000004133490] [c008000000da0ce4] bochs_pci_probe+0x6c/0xa64 [bochs]
  [...]
  [c000000004133db0] [c00000000002aaa0] system_call_exception+0x170/0x2d0
  [c000000004133e10] [c00000000000c3cc] system_call_common+0xec/0x250

The bug [1] was introduced by commit 27599aacbaef ("fbdev: Hot-unplug
firmware fb devices on forced removal"). Most firmware framebuffers
have an underlying platform device, which can be hot-unplugged
before loading the native graphics driver. OF framebuffers do not
(yet) have that device. Fix the code by unregistering the framebuffer
as before without a hot unplug.

Tested with 5.17 on qemu ppc64le emulation.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Fixes: 27599aacbaef ("fbdev: Hot-unplug firmware fb devices on forced removal")
Reported-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Tested-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk>
Cc: Zack Rusin <zackr@vmware.com>
Cc: Javier Martinez Canillas <javierm@redhat.com>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: stable@vger.kernel.org # v5.11+
Cc: Helge Deller <deller@gmx.de>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Zheyu Ma <zheyuma97@gmail.com>
Cc: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Cc: Zhen Lei <thunder.leizhen@huawei.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: linux-fbdev@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Link: https://lore.kernel.org/all/YkHXO6LGHAN0p1pq@debian/
Link: https://patchwork.freedesktop.org/patch/msgid/20220404194402.29974-1-tzimmermann@suse.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agoirqchip/gic-v3: Fix GICR_CTLR.RWP polling
Marc Zyngier [Tue, 15 Mar 2022 16:50:32 +0000 (16:50 +0000)]
irqchip/gic-v3: Fix GICR_CTLR.RWP polling

commit 0df6664531a12cdd8fc873f0cac0dcb40243d3e9 upstream.

It turns out that our polling of RWP is totally wrong when checking
for it in the redistributors, as we test the *distributor* bit index,
whereas it is a different bit number in the RDs... Oopsie boo.

This is embarassing. Not only because it is wrong, but also because
it took *8 years* to notice the blunder...

Just fix the damn thing.

Fixes: 021f653791ad ("irqchip: gic-v3: Initial support for GICv3")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Link: https://lore.kernel.org/r/20220315165034.794482-2-maz@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agoperf/core: Inherit event_caps
Namhyung Kim [Mon, 28 Mar 2022 20:01:12 +0000 (13:01 -0700)]
perf/core: Inherit event_caps

commit e3265a4386428d3d157d9565bb520aabff8b4bf0 upstream.

It was reported that some perf event setup can make fork failed on
ARM64.  It was the case of a group of mixed hw and sw events and it
failed in perf_event_init_task() due to armpmu_event_init().

The ARM PMU code checks if all the events in a group belong to the
same PMU except for software events.  But it didn't set the event_caps
of inherited events and no longer identify them as software events.
Therefore the test failed in a child process.

A simple reproducer is:

  $ perf stat -e '{cycles,cs,instructions}' perf bench sched messaging
  # Running 'sched/messaging' benchmark:
  perf: fork(): Invalid argument

The perf stat was fine but the perf bench failed in fork().  Let's
inherit the event caps from the parent.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20220328200112.457740-1-namhyung@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agoperf: qcom_l2_pmu: fix an incorrect NULL check on list iterator
Xiaomeng Tong [Sun, 27 Mar 2022 05:57:33 +0000 (13:57 +0800)]
perf: qcom_l2_pmu: fix an incorrect NULL check on list iterator

commit 2012a9e279013933885983cbe0a5fe828052563b upstream.

The bug is here:
return cluster;

The list iterator value 'cluster' will *always* be set and non-NULL
by list_for_each_entry(), so it is incorrect to assume that the
iterator value will be NULL if the list is empty or no element
is found.

To fix the bug, return 'cluster' when found, otherwise return NULL.

Cc: stable@vger.kernel.org
Fixes: 21bdbb7102ed ("perf: add qcom l2 cache perf events driver")
Signed-off-by: Xiaomeng Tong <xiam0nd.tong@gmail.com>
Link: https://lore.kernel.org/r/20220327055733.4070-1-xiam0nd.tong@gmail.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agoata: sata_dwc_460ex: Fix crash due to OOB write
Christian Lamparter [Sat, 19 Mar 2022 20:11:02 +0000 (21:11 +0100)]
ata: sata_dwc_460ex: Fix crash due to OOB write

commit 7aa8104a554713b685db729e66511b93d989dd6a upstream.

the driver uses libata's "tag" values from in various arrays.
Since the mentioned patch bumped the ATA_TAG_INTERNAL to 32,
the value of the SATA_DWC_QCMD_MAX needs to account for that.

Otherwise ATA_TAG_INTERNAL usage cause similar crashes like
this as reported by Tice Rex on the OpenWrt Forum and
reproduced (with symbols) here:

| BUG: Kernel NULL pointer dereference at 0x00000000
| Faulting instruction address: 0xc03ed4b8
| Oops: Kernel access of bad area, sig: 11 [#1]
| BE PAGE_SIZE=4K PowerPC 44x Platform
| CPU: 0 PID: 362 Comm: scsi_eh_1 Not tainted 5.4.163 #0
| NIP:  c03ed4b8 LR: c03d27e8 CTR: c03ed36c
| REGS: cfa59950 TRAP: 0300   Not tainted  (5.4.163)
| MSR:  00021000 <CE,ME>  CR: 42000222  XER: 00000000
| DEAR: 00000000 ESR: 00000000
| GPR00: c03d27e8 cfa59a08 cfa55fe0 00000000 0fa46bc0 [...]
| [..]
| NIP [c03ed4b8] sata_dwc_qc_issue+0x14c/0x254
| LR [c03d27e8] ata_qc_issue+0x1c8/0x2dc
| Call Trace:
| [cfa59a08] [c003f4e0] __cancel_work_timer+0x124/0x194 (unreliable)
| [cfa59a78] [c03d27e8] ata_qc_issue+0x1c8/0x2dc
| [cfa59a98] [c03d2b3c] ata_exec_internal_sg+0x240/0x524
| [cfa59b08] [c03d2e98] ata_exec_internal+0x78/0xe0
| [cfa59b58] [c03d30fc] ata_read_log_page.part.38+0x1dc/0x204
| [cfa59bc8] [c03d324c] ata_identify_page_supported+0x68/0x130
| [...]

This is because sata_dwc_dma_xfer_complete() NULLs the
dma_pending's next neighbour "chan" (a *dma_chan struct) in
this '32' case right here (line ~735):
> hsdevp->dma_pending[tag] = SATA_DWC_DMA_PENDING_NONE;

Then the next time, a dma gets issued; dma_dwc_xfer_setup() passes
the NULL'd hsdevp->chan to the dmaengine_slave_config() which then
causes the crash.

With this patch, SATA_DWC_QCMD_MAX is now set to ATA_MAX_QUEUE + 1.
This avoids the OOB. But please note, there was a worthwhile discussion
on what ATA_TAG_INTERNAL and ATA_MAX_QUEUE is. And why there should not
be a "fake" 33 command-long queue size.

Ideally, the dw driver should account for the ATA_TAG_INTERNAL.
In Damien Le Moal's words: "... having looked at the driver, it
is a bigger change than just faking a 33rd "tag" that is in fact
not a command tag at all."

Fixes: 28361c403683c ("libata: add extra internal command")
Cc: stable@kernel.org # 4.18+
BugLink: https://github.com/openwrt/openwrt/issues/9505
Signed-off-by: Christian Lamparter <chunkeey@gmail.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agoperf/x86/intel: Don't extend the pseudo-encoding to GP counters
Kan Liang [Mon, 28 Mar 2022 15:49:02 +0000 (08:49 -0700)]
perf/x86/intel: Don't extend the pseudo-encoding to GP counters

commit 4a263bf331c512849062805ef1b4ac40301a9829 upstream.

The INST_RETIRED.PREC_DIST event (0x0100) doesn't count on SPR.
perf stat -e cpu/event=0xc0,umask=0x0/,cpu/event=0x0,umask=0x1/ -C0

 Performance counter stats for 'CPU(s) 0':

           607,246      cpu/event=0xc0,umask=0x0/
                 0      cpu/event=0x0,umask=0x1/

The encoding for INST_RETIRED.PREC_DIST is pseudo-encoding, which
doesn't work on the generic counters. However, current perf extends its
mask to the generic counters.

The pseudo event-code for a fixed counter must be 0x00. Check and avoid
extending the mask for the fixed counter event which using the
pseudo-encoding, e.g., ref-cycles and PREC_DIST event.

With the patch,
perf stat -e cpu/event=0xc0,umask=0x0/,cpu/event=0x0,umask=0x1/ -C0

 Performance counter stats for 'CPU(s) 0':

           583,184      cpu/event=0xc0,umask=0x0/
           583,048      cpu/event=0x0,umask=0x1/

Fixes: 2de71ee153ef ("perf/x86/intel: Fix ICL/SPR INST_RETIRED.PREC_DIST encodings")
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1648482543-14923-1-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agox86/mm/tlb: Revert retpoline avoidance approach
Dave Hansen [Fri, 18 Mar 2022 13:52:59 +0000 (06:52 -0700)]
x86/mm/tlb: Revert retpoline avoidance approach

commit d39268ad24c0fd0665d0c5cf55a7c1a0ebf94766 upstream.

0day reported a regression on a microbenchmark which is intended to
stress the TLB flushing path:

https://lore.kernel.org/all/20220317090415.GE735@xsang-OptiPlex-9020/

It pointed at a commit from Nadav which intended to remove retpoline
overhead in the TLB flushing path by taking the 'cond'-ition in
on_each_cpu_cond_mask(), pre-calculating it, and incorporating it into
'cpumask'.  That allowed the code to use a bunch of earlier direct
calls instead of later indirect calls that need a retpoline.

But, in practice, threads can go idle (and into lazy TLB mode where
they don't need to flush their TLB) between the early and late calls.
It works in this direction and not in the other because TLB-flushing
threads tend to hold mmap_lock for write.  Contention on that lock
causes threads to _go_ idle right in this early/late window.

There was not any performance data in the original commit specific
to the retpoline overhead.  I did a few tests on a system with
retpolines:

https://lore.kernel.org/all/dd8be93c-ded6-b962-50d4-96b1c3afb2b7@intel.com/

which showed a possible small win.  But, that small win pales in
comparison with the bigger loss induced on non-retpoline systems.

Revert the patch that removed the retpolines.  This was not a
clean revert, but it was self-contained enough not to be too painful.

Fixes: 6035152d8eeb ("x86/mm/tlb: Open-code on_each_cpu_cond_mask() for tlb_is_not_lazy()")
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Nadav Amit <namit@vmware.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/164874672286.389.7021457716635788197.tip-bot2@tip-bot2
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agox86/msi: Fix msi message data shadow struct
Reto Buerki [Thu, 7 Apr 2022 11:06:47 +0000 (13:06 +0200)]
x86/msi: Fix msi message data shadow struct

commit 59b18a1e65b7a2134814106d0860010e10babe18 upstream.

The x86 MSI message data is 32 bits in total and is either in
compatibility or remappable format, see Intel Virtualization Technology
for Directed I/O, section 5.1.2.

Fixes: 6285aa50736 ("x86/msi: Provide msi message shadow structs")
Co-developed-by: Adrian-Ken Rueegsegger <ken@codelabs.ch>
Signed-off-by: Adrian-Ken Rueegsegger <ken@codelabs.ch>
Signed-off-by: Reto Buerki <reet@codelabs.ch>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20220407110647.67372-1-reet@codelabs.ch
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agogpio: Restrict usage of GPIO chip irq members before initialization
Shreeya Patel [Mon, 21 Mar 2022 13:32:41 +0000 (19:02 +0530)]
gpio: Restrict usage of GPIO chip irq members before initialization

commit 5467801f1fcbdc46bc7298a84dbf3ca1ff2a7320 upstream.

GPIO chip irq members are exposed before they could be completely
initialized and this leads to race conditions.

One such issue was observed for the gc->irq.domain variable which
was accessed through the I2C interface in gpiochip_to_irq() before
it could be initialized by gpiochip_add_irqchip(). This resulted in
Kernel NULL pointer dereference.

Following are the logs for reference :-

kernel: Call Trace:
kernel:  gpiod_to_irq+0x53/0x70
kernel:  acpi_dev_gpio_irq_get_by+0x113/0x1f0
kernel:  i2c_acpi_get_irq+0xc0/0xd0
kernel:  i2c_device_probe+0x28a/0x2a0
kernel:  really_probe+0xf2/0x460
kernel: RIP: 0010:gpiochip_to_irq+0x47/0xc0

To avoid such scenarios, restrict usage of GPIO chip irq members before
they are completely initialized.

Signed-off-by: Shreeya Patel <shreeya.patel@collabora.com>
Cc: stable@vger.kernel.org
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agodrbd: fix an invalid memory access caused by incorrect use of list iterator
Xiaomeng Tong [Wed, 6 Apr 2022 19:04:44 +0000 (21:04 +0200)]
drbd: fix an invalid memory access caused by incorrect use of list iterator

commit ae4d37b5df749926891583d42a6801b5da11e3c1 upstream.

The bug is here:
idr_remove(&connection->peer_devices, vnr);

If the previous for_each_connection() don't exit early (no goto hit
inside the loop), the iterator 'connection' after the loop will be a
bogus pointer to an invalid structure object containing the HEAD
(&resource->connections). As a result, the use of 'connection' above
will lead to a invalid memory access (including a possible invalid free
as idr_remove could call free_layer).

The original intention should have been to remove all peer_devices,
but the following lines have already done the work. So just remove
this line and the unneeded label, to fix this bug.

Cc: stable@vger.kernel.org
Fixes: c06ece6ba6f1b ("drbd: Turn connection->volumes into connection->peer_devices")
Signed-off-by: Xiaomeng Tong <xiam0nd.tong@gmail.com>
Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Reviewed-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agoRDMA/hfi1: Fix use-after-free bug for mm struct
Douglas Miller [Fri, 8 Apr 2022 13:35:23 +0000 (09:35 -0400)]
RDMA/hfi1: Fix use-after-free bug for mm struct

commit 2bbac98d0930e8161b1957dc0ec99de39ade1b3c upstream.

Under certain conditions, such as MPI_Abort, the hfi1 cleanup code may
represent the last reference held on the task mm.
hfi1_mmu_rb_unregister() then drops the last reference and the mm is freed
before the final use in hfi1_release_user_pages().  A new task may
allocate the mm structure while it is still being used, resulting in
problems. One manifestation is corruption of the mmap_sem counter leading
to a hang in down_write().  Another is corruption of an mm struct that is
in use by another task.

Fixes: 3d2a9d642512 ("IB/hfi1: Ensure correct mm is used at all times")
Link: https://lore.kernel.org/r/20220408133523.122165.72975.stgit@awfm-01.cornelisnetworks.com
Cc: <stable@vger.kernel.org>
Signed-off-by: Douglas Miller <doug.miller@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agoarm64: patch_text: Fixup last cpu should be master
Guo Ren [Thu, 7 Apr 2022 07:33:20 +0000 (15:33 +0800)]
arm64: patch_text: Fixup last cpu should be master

commit 31a099dbd91e69fcab55eef4be15ed7a8c984918 upstream.

These patch_text implementations are using stop_machine_cpuslocked
infrastructure with atomic cpu_count. The original idea: When the
master CPU patch_text, the others should wait for it. But current
implementation is using the first CPU as master, which couldn't
guarantee the remaining CPUs are waiting. This patch changes the
last CPU as the master to solve the potential risk.

Fixes: ae16480785de ("arm64: introduce interfaces to hotpatch kernel and module code")
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20220407073323.743224-2-guoren@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agoqed: fix ethtool register dump
Manish Chopra [Fri, 1 Apr 2022 18:53:04 +0000 (11:53 -0700)]
qed: fix ethtool register dump

commit 20921c0c86092b4082c91bd7c88305da74e5520b upstream.

To fix a coverity complain, commit d5ac07dfbd2b
("qed: Initialize debug string array") removed "sw-platform"
(one of the common global parameters) from the dump as this
was used in the dump with an uninitialized string, however
it did not reduce the number of common global parameters
which caused the incorrect (unable to parse) register dump

this patch fixes it with reducing NUM_COMMON_GLOBAL_PARAMS
bye one.

Cc: stable@vger.kernel.org
Cc: Tim Gardner <tim.gardner@canonical.com>
Cc: "David S. Miller" <davem@davemloft.net>
Fixes: d5ac07dfbd2b ("qed: Initialize debug string array")
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Alok Prasad <palok@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Manish Chopra <manishc@marvell.com>
Reviewed-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agocifs: force new session setup and tcon for dfs
Paulo Alcantara [Fri, 1 Apr 2022 16:51:34 +0000 (13:51 -0300)]
cifs: force new session setup and tcon for dfs

commit fb39d30e227233498c8debe6a9fe3e7cf575c85f upstream.

Do not reuse existing sessions and tcons in DFS failover as it might
connect to different servers and shares.

Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Cc: stable@vger.kernel.org
Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agospi: core: add dma_map_dev for __spi_unmap_msg()
Vinod Koul [Wed, 6 Apr 2022 13:22:38 +0000 (18:52 +0530)]
spi: core: add dma_map_dev for __spi_unmap_msg()

commit 409543cec01a84610029d6440c480c3fdd7214fb upstream.

Commit b470e10eb43f ("spi: core: add dma_map_dev for dma device") added
dma_map_dev for _spi_map_msg() but missed to add for unmap routine,
__spi_unmap_msg(), so add it now.

Fixes: b470e10eb43f ("spi: core: add dma_map_dev for dma device")
Cc: stable@vger.kernel.org # v5.14+
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Link: https://lore.kernel.org/r/20220406132238.1029249-1-vkoul@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agobtrfs: prevent subvol with swapfile from being deleted
Kaiwen Hu [Wed, 23 Mar 2022 07:10:32 +0000 (15:10 +0800)]
btrfs: prevent subvol with swapfile from being deleted

commit 60021bd754c6ca0addc6817994f20290a321d8d6 upstream.

A subvolume with an active swapfile must not be deleted otherwise it
would not be possible to deactivate it.

After the subvolume is deleted, we cannot swapoff the swapfile in this
deleted subvolume because the path is unreachable.  The swapfile is
still active and holding references, the filesystem cannot be unmounted.

The test looks like this:

  mkfs.btrfs -f $dev > /dev/null
  mount $dev $mnt

  btrfs sub create $mnt/subvol
  touch $mnt/subvol/swapfile
  chmod 600 $mnt/subvol/swapfile
  chattr +C $mnt/subvol/swapfile
  dd if=/dev/zero of=$mnt/subvol/swapfile bs=1K count=4096
  mkswap $mnt/subvol/swapfile
  swapon $mnt/subvol/swapfile

  btrfs sub delete $mnt/subvol
  swapoff $mnt/subvol/swapfile  # failed: No such file or directory
  swapoff --all

  unmount $mnt                  # target is busy.

To prevent above issue, we simply check that whether the subvolume
contains any active swapfile, and stop the deleting process.  This
behavior is like snapshot ioctl dealing with a swapfile.

CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Robbie Ko <robbieko@synology.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Kaiwen Hu <kevinhu@synology.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agobtrfs: avoid defragging extents whose next extents are not targets
Qu Wenruo [Tue, 15 Mar 2022 11:28:05 +0000 (19:28 +0800)]
btrfs: avoid defragging extents whose next extents are not targets

commit 75a36a7d3ea904cef2e5b56af0c58cc60dcf947a upstream.

[BUG]
There is a report that autodefrag is defragging single sector, which
is completely waste of IO, and no help for defragging:

   btrfs-cleaner-808 defrag_one_locked_range: root=256 ino=651122 start=0 len=4096

[CAUSE]
In defrag_collect_targets(), we check if the current range (A) can be merged
with next one (B).

If mergeable, we will add range A into target for defrag.

However there is a catch for autodefrag, when checking mergeability
against range B, we intentionally pass 0 as @newer_than, hoping to get a
higher chance to merge with the next extent.

But in the next iteration, range B will looked up by defrag_lookup_extent(),
with non-zero @newer_than.

And if range B is not really newer, it will rejected directly, causing
only range A being defragged, while we expect to defrag both range A and
B.

[FIX]
Since the root cause is the difference in check condition of
defrag_check_next_extent() and defrag_collect_targets(), we fix it by:

1. Pass @newer_than to defrag_check_next_extent()
2. Pass @extent_thresh to defrag_check_next_extent()

This makes the check between defrag_collect_targets() and
defrag_check_next_extent() more consistent.

While there is still some minor difference, the remaining checks are
focus on runtime flags like writeback/delalloc, which are mostly
transient and safe to be checked only in defrag_collect_targets().

Link: https://github.com/btrfs/linux/issues/423#issuecomment-1066981856
CC: stable@vger.kernel.org # 5.16+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agobtrfs: remove device item and update super block in the same transaction
Qu Wenruo [Tue, 8 Mar 2022 05:36:38 +0000 (13:36 +0800)]
btrfs: remove device item and update super block in the same transaction

commit bbac58698a55cc0a6f0c0d69a6dcd3f9f3134c11 upstream.

[BUG]
There is a report that a btrfs has a bad super block num devices.

This makes btrfs to reject the fs completely.

  BTRFS error (device sdd3): super_num_devices 3 mismatch with num_devices 2 found here
  BTRFS error (device sdd3): failed to read chunk tree: -22
  BTRFS error (device sdd3): open_ctree failed

[CAUSE]
During btrfs device removal, chunk tree and super block num devs are
updated in two different transactions:

  btrfs_rm_device()
  |- btrfs_rm_dev_item(device)
  |  |- trans = btrfs_start_transaction()
  |  |  Now we got transaction X
  |  |
  |  |- btrfs_del_item()
  |  |  Now device item is removed from chunk tree
  |  |
  |  |- btrfs_commit_transaction()
  |     Transaction X got committed, super num devs untouched,
  |     but device item removed from chunk tree.
  |     (AKA, super num devs is already incorrect)
  |
  |- cur_devices->num_devices--;
  |- cur_devices->total_devices--;
  |- btrfs_set_super_num_devices()
     All those operations are not in transaction X, thus it will
     only be written back to disk in next transaction.

So after the transaction X in btrfs_rm_dev_item() committed, but before
transaction X+1 (which can be minutes away), a power loss happen, then
we got the super num mismatch.

[FIX]
Instead of starting and committing a transaction inside
btrfs_rm_dev_item(), start a transaction in side btrfs_rm_device() and
pass it to btrfs_rm_dev_item().

And only commit the transaction after everything is done.

Reported-by: Luca Béla Palkovics <luca.bela.palkovics@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/CA+8xDSpvdm_U0QLBAnrH=zqDq_cWCOH5TiV46CKmp3igr44okQ@mail.gmail.com/
CC: stable@vger.kernel.org # 4.14+
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agobtrfs: zoned: traverse devices under chunk_mutex in btrfs_can_activate_zone
Johannes Thumshirn [Mon, 7 Mar 2022 10:47:17 +0000 (02:47 -0800)]
btrfs: zoned: traverse devices under chunk_mutex in btrfs_can_activate_zone

commit 0b9e66762aa0cda2a9c2d5542d64e04dac528fa6 upstream.

btrfs_can_activate_zone() can be called with the device_list_mutex already
held, which will lead to a deadlock:

insert_dev_extents() // Takes device_list_mutex
`-> insert_dev_extent()
 `-> btrfs_insert_empty_item()
  `-> btrfs_insert_empty_items()
   `-> btrfs_search_slot()
    `-> btrfs_cow_block()
     `-> __btrfs_cow_block()
      `-> btrfs_alloc_tree_block()
       `-> btrfs_reserve_extent()
        `-> find_free_extent()
         `-> find_free_extent_update_loop()
          `-> can_allocate_chunk()
           `-> btrfs_can_activate_zone() // Takes device_list_mutex again

Instead of using the RCU on fs_devices->device_list we
can use fs_devices->alloc_list, protected by the chunk_mutex to traverse
the list of active devices.

We are in the chunk allocation thread. The newer chunk allocation
happens from the devices in the fs_device->alloc_list protected by the
chunk_mutex.

  btrfs_create_chunk()
    lockdep_assert_held(&info->chunk_mutex);
    gather_device_info
      list_for_each_entry(device, &fs_devices->alloc_list, dev_alloc_list)

Also, a device that reappears after the mount won't join the alloc_list
yet and, it will be in the dev_list, which we don't want to consider in
the context of the chunk alloc.

  [15.166572] WARNING: possible recursive locking detected
  [15.167117] 5.17.0-rc6-dennis #79 Not tainted
  [15.167487] --------------------------------------------
  [15.167733] kworker/u8:3/146 is trying to acquire lock:
  [15.167733] ffff888102962ee0 (&fs_devs->device_list_mutex){+.+.}-{3:3}, at: find_free_extent+0x15a/0x14f0 [btrfs]
  [15.167733]
  [15.167733] but task is already holding lock:
  [15.167733] ffff888102962ee0 (&fs_devs->device_list_mutex){+.+.}-{3:3}, at: btrfs_create_pending_block_groups+0x20a/0x560 [btrfs]
  [15.167733]
  [15.167733] other info that might help us debug this:
  [15.167733]  Possible unsafe locking scenario:
  [15.167733]
  [15.171834]        CPU0
  [15.171834]        ----
  [15.171834]   lock(&fs_devs->device_list_mutex);
  [15.171834]   lock(&fs_devs->device_list_mutex);
  [15.171834]
  [15.171834]  *** DEADLOCK ***
  [15.171834]
  [15.171834]  May be due to missing lock nesting notation
  [15.171834]
  [15.171834] 5 locks held by kworker/u8:3/146:
  [15.171834]  #0: ffff888100050938 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x1c3/0x5a0
  [15.171834]  #1: ffffc9000067be80 ((work_completion)(&fs_info->async_data_reclaim_work)){+.+.}-{0:0}, at: process_one_work+0x1c3/0x5a0
  [15.176244]  #2: ffff88810521e620 (sb_internal){.+.+}-{0:0}, at: flush_space+0x335/0x600 [btrfs]
  [15.176244]  #3: ffff888102962ee0 (&fs_devs->device_list_mutex){+.+.}-{3:3}, at: btrfs_create_pending_block_groups+0x20a/0x560 [btrfs]
  [15.176244]  #4: ffff8881152e4b78 (btrfs-dev-00){++++}-{3:3}, at: __btrfs_tree_lock+0x27/0x130 [btrfs]
  [15.179641]
  [15.179641] stack backtrace:
  [15.179641] CPU: 1 PID: 146 Comm: kworker/u8:3 Not tainted 5.17.0-rc6-dennis #79
  [15.179641] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1.fc35 04/01/2014
  [15.179641] Workqueue: events_unbound btrfs_async_reclaim_data_space [btrfs]
  [15.179641] Call Trace:
  [15.179641]  <TASK>
  [15.179641]  dump_stack_lvl+0x45/0x59
  [15.179641]  __lock_acquire.cold+0x217/0x2b2
  [15.179641]  lock_acquire+0xbf/0x2b0
  [15.183838]  ? find_free_extent+0x15a/0x14f0 [btrfs]
  [15.183838]  __mutex_lock+0x8e/0x970
  [15.183838]  ? find_free_extent+0x15a/0x14f0 [btrfs]
  [15.183838]  ? find_free_extent+0x15a/0x14f0 [btrfs]
  [15.183838]  ? lock_is_held_type+0xd7/0x130
  [15.183838]  ? find_free_extent+0x15a/0x14f0 [btrfs]
  [15.183838]  find_free_extent+0x15a/0x14f0 [btrfs]
  [15.183838]  ? _raw_spin_unlock+0x24/0x40
  [15.183838]  ? btrfs_get_alloc_profile+0x106/0x230 [btrfs]
  [15.187601]  btrfs_reserve_extent+0x131/0x260 [btrfs]
  [15.187601]  btrfs_alloc_tree_block+0xb5/0x3b0 [btrfs]
  [15.187601]  __btrfs_cow_block+0x138/0x600 [btrfs]
  [15.187601]  btrfs_cow_block+0x10f/0x230 [btrfs]
  [15.187601]  btrfs_search_slot+0x55f/0xbc0 [btrfs]
  [15.187601]  ? lock_is_held_type+0xd7/0x130
  [15.187601]  btrfs_insert_empty_items+0x2d/0x60 [btrfs]
  [15.187601]  btrfs_create_pending_block_groups+0x2b3/0x560 [btrfs]
  [15.187601]  __btrfs_end_transaction+0x36/0x2a0 [btrfs]
  [15.192037]  flush_space+0x374/0x600 [btrfs]
  [15.192037]  ? find_held_lock+0x2b/0x80
  [15.192037]  ? btrfs_async_reclaim_data_space+0x49/0x180 [btrfs]
  [15.192037]  ? lock_release+0x131/0x2b0
  [15.192037]  btrfs_async_reclaim_data_space+0x70/0x180 [btrfs]
  [15.192037]  process_one_work+0x24c/0x5a0
  [15.192037]  worker_thread+0x4a/0x3d0

Fixes: a85f05e59bc1 ("btrfs: zoned: avoid chunk allocation if active block group has enough space")
CC: stable@vger.kernel.org # 5.16+
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agobtrfs: fix qgroup reserve overflow the qgroup limit
Ethan Lien [Mon, 7 Mar 2022 10:00:04 +0000 (18:00 +0800)]
btrfs: fix qgroup reserve overflow the qgroup limit

commit b642b52d0b50f4d398cb4293f64992d0eed2e2ce upstream.

We use extent_changeset->bytes_changed in qgroup_reserve_data() to record
how many bytes we set for EXTENT_QGROUP_RESERVED state. Currently the
bytes_changed is set as "unsigned int", and it will overflow if we try to
fallocate a range larger than 4GiB. The result is we reserve less bytes
and eventually break the qgroup limit.

Unlike regular buffered/direct write, which we use one changeset for
each ordered extent, which can never be larger than 256M.  For
fallocate, we use one changeset for the whole range, thus it no longer
respects the 256M per extent limit, and caused the problem.

The following example test script reproduces the problem:

  $ cat qgroup-overflow.sh
  #!/bin/bash

  DEV=/dev/sdj
  MNT=/mnt/sdj

  mkfs.btrfs -f $DEV
  mount $DEV $MNT

  # Set qgroup limit to 2GiB.
  btrfs quota enable $MNT
  btrfs qgroup limit 2G $MNT

  # Try to fallocate a 3GiB file. This should fail.
  echo
  echo "Try to fallocate a 3GiB file..."
  fallocate -l 3G $MNT/3G.file

  # Try to fallocate a 5GiB file.
  echo
  echo "Try to fallocate a 5GiB file..."
  fallocate -l 5G $MNT/5G.file

  # See we break the qgroup limit.
  echo
  sync
  btrfs qgroup show -r $MNT

  umount $MNT

When running the test:

  $ ./qgroup-overflow.sh
  (...)

  Try to fallocate a 3GiB file...
  fallocate: fallocate failed: Disk quota exceeded

  Try to fallocate a 5GiB file...

  qgroupid         rfer         excl     max_rfer
  --------         ----         ----     --------
  0/5           5.00GiB      5.00GiB      2.00GiB

Since we have no control of how bytes_changed is used, it's better to
set it to u64.

CC: stable@vger.kernel.org # 4.14+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Ethan Lien <ethanlien@synology.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agoperf/x86/intel: Update the FRONTEND MSR mask on Sapphire Rapids
Kan Liang [Mon, 28 Mar 2022 15:49:03 +0000 (08:49 -0700)]
perf/x86/intel: Update the FRONTEND MSR mask on Sapphire Rapids

commit e590928de7547454469693da9bc7ffd562e54b7e upstream.

On Sapphire Rapids, the FRONTEND_RETIRED.MS_FLOWS event requires the
FRONTEND MSR value 0x8. However, the current FRONTEND MSR mask doesn't
support it.

Update intel_spr_extra_regs[] to support it.

Fixes: 61b985e3e775 ("perf/x86/intel: Add perf core PMU support for Sapphire Rapids")
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/1648482543-14923-2-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agox86/speculation: Restore speculation related MSRs during S3 resume
Pawan Gupta [Tue, 5 Apr 2022 00:35:45 +0000 (17:35 -0700)]
x86/speculation: Restore speculation related MSRs during S3 resume

commit e2a1256b17b16f9b9adf1b6fea56819e7b68e463 upstream.

After resuming from suspend-to-RAM, the MSRs that control CPU's
speculative execution behavior are not being restored on the boot CPU.

These MSRs are used to mitigate speculative execution vulnerabilities.
Not restoring them correctly may leave the CPU vulnerable.  Secondary
CPU's MSRs are correctly being restored at S3 resume by
identify_secondary_cpu().

During S3 resume, restore these MSRs for boot CPU when restoring its
processor state.

Fixes: 772439717dbf ("x86/bugs/intel: Set proper CPU features and setup RDS")
Reported-by: Neelima Krishnan <neelima.krishnan@intel.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Tested-by: Neelima Krishnan <neelima.krishnan@intel.com>
Acked-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agox86/pm: Save the MSR validity status at context setup
Pawan Gupta [Tue, 5 Apr 2022 00:34:19 +0000 (17:34 -0700)]
x86/pm: Save the MSR validity status at context setup

commit 73924ec4d560257004d5b5116b22a3647661e364 upstream.

The mechanism to save/restore MSRs during S3 suspend/resume checks for
the MSR validity during suspend, and only restores the MSR if its a
valid MSR.  This is not optimal, as an invalid MSR will unnecessarily
throw an exception for every suspend cycle.  The more invalid MSRs,
higher the impact will be.

Check and save the MSR validity at setup.  This ensures that only valid
MSRs that are guaranteed to not throw an exception will be attempted
during suspend.

Fixes: 7a9c2dd08ead ("x86/pm: Introduce quirk framework to save/restore extra MSR registers around suspend/resume")
Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agoio_uring: fix race between timeout flush and removal
Jens Axboe [Fri, 8 Apr 2022 17:08:58 +0000 (11:08 -0600)]
io_uring: fix race between timeout flush and removal

commit e677edbcabee849bfdd43f1602bccbecf736a646 upstream.

io_flush_timeouts() assumes the timeout isn't in progress of triggering
or being removed/canceled, so it unconditionally removes it from the
timeout list and attempts to cancel it.

Leave it on the list and let the normal timeout cancelation take care
of it.

Cc: stable@vger.kernel.org # 5.5+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agoio_uring: implement compat handling for IORING_REGISTER_IOWQ_AFF
Eugene Syromiatnikov [Wed, 6 Apr 2022 11:55:33 +0000 (13:55 +0200)]
io_uring: implement compat handling for IORING_REGISTER_IOWQ_AFF

commit 0f5e4b83b37a96e3643951588ed7176b9b187c0a upstream.

Similarly to the way it is done im mbind syscall.

Cc: stable@vger.kernel.org # 5.14
Fixes: fe76421d1da1dcdb ("io_uring: allow user configurable IO thread CPU affinity")
Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agoio_uring: defer splice/tee file validity check until command issue
Jens Axboe [Tue, 29 Mar 2022 16:59:20 +0000 (10:59 -0600)]
io_uring: defer splice/tee file validity check until command issue

commit a3e4bc23d5470b2beb7cc42a86b6a3e75b704c15 upstream.

In preparation for not using the file at prep time, defer checking if this
file refers to a valid io_uring instance until issue time.

This also means we can get rid of the cleanup flag for splice and tee.

Cc: stable@vger.kernel.org # v5.15+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agoio_uring: don't check req->file in io_fsync_prep()
Jens Axboe [Wed, 30 Mar 2022 17:06:02 +0000 (11:06 -0600)]
io_uring: don't check req->file in io_fsync_prep()

commit ec858afda857e361182ceafc3d2ba2b164b8e889 upstream.

This is a leftover from the really old days where we weren't able to
track and error early if we need a file and it wasn't assigned. Kill
the check.

Cc: stable@vger.kernel.org # v5.15+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agomm/mempolicy: fix mpol_new leak in shared_policy_replace
Miaohe Lin [Fri, 8 Apr 2022 20:09:07 +0000 (13:09 -0700)]
mm/mempolicy: fix mpol_new leak in shared_policy_replace

commit 4ad099559b00ac01c3726e5c95dc3108ef47d03e upstream.

If mpol_new is allocated but not used in restart loop, mpol_new will be
freed via mpol_put before returning to the caller.  But refcnt is not
initialized yet, so mpol_put could not do the right things and might
leak the unused mpol_new.  This would happen if mempolicy was updated on
the shared shmem file while the sp->lock has been dropped during the
memory allocation.

This issue could be triggered easily with the below code snippet if
there are many processes doing the below work at the same time:

  shmid = shmget((key_t)5566, 1024 * PAGE_SIZE, 0666|IPC_CREAT);
  shm = shmat(shmid, 0, 0);
  loop many times {
    mbind(shm, 1024 * PAGE_SIZE, MPOL_LOCAL, mask, maxnode, 0);
    mbind(shm + 128 * PAGE_SIZE, 128 * PAGE_SIZE, MPOL_DEFAULT, mask,
          maxnode, 0);
  }

Link: https://lkml.kernel.org/r/20220329111416.27954-1-linmiaohe@huawei.com
Fixes: 42288fe366c4 ("mm: mempolicy: Convert shared_policy mutex to spinlock")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: <stable@vger.kernel.org> [3.8]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agommmremap.c: avoid pointless invalidate_range_start/end on mremap(old_size=0)
Paolo Bonzini [Fri, 8 Apr 2022 20:09:04 +0000 (13:09 -0700)]
mmmremap.c: avoid pointless invalidate_range_start/end on mremap(old_size=0)

commit 01e67e04c28170c47700c2c226d732bbfedb1ad0 upstream.

If an mremap() syscall with old_size=0 ends up in move_page_tables(), it
will call invalidate_range_start()/invalidate_range_end() unnecessarily,
i.e.  with an empty range.

This causes a WARN in KVM's mmu_notifier.  In the past, empty ranges
have been diagnosed to be off-by-one bugs, hence the WARNing.  Given the
low (so far) number of unique reports, the benefits of detecting more
buggy callers seem to outweigh the cost of having to fix cases such as
this one, where userspace is doing something silly.  In this particular
case, an early return from move_page_tables() is enough to fix the
issue.

Link: https://lkml.kernel.org/r/20220329173155.172439-1-pbonzini@redhat.com
Reported-by: syzbot+6bde52d89cfdf9f61425@syzkaller.appspotmail.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agohighmem: fix checks in __kmap_local_sched_{in,out}
Max Filippov [Fri, 8 Apr 2022 20:08:55 +0000 (13:08 -0700)]
highmem: fix checks in __kmap_local_sched_{in,out}

commit 66f133ceab7456c789f70a242991ed1b27ba1c3d upstream.

When CONFIG_DEBUG_KMAP_LOCAL is enabled __kmap_local_sched_{in,out} check
that even slots in the tsk->kmap_ctrl.pteval are unmapped.  The slots are
initialized with 0 value, but the check is done with pte_none.  0 pte
however does not necessarily mean that pte_none will return true.  e.g.
on xtensa it returns false, resulting in the following runtime warnings:

 WARNING: CPU: 0 PID: 101 at mm/highmem.c:627 __kmap_local_sched_out+0x51/0x108
 CPU: 0 PID: 101 Comm: touch Not tainted 5.17.0-rc7-00010-gd3a1cdde80d2-dirty #13
 Call Trace:
   dump_stack+0xc/0x40
   __warn+0x8f/0x174
   warn_slowpath_fmt+0x48/0xac
   __kmap_local_sched_out+0x51/0x108
   __schedule+0x71a/0x9c4
   preempt_schedule_irq+0xa0/0xe0
   common_exception_return+0x5c/0x93
   do_wp_page+0x30e/0x330
   handle_mm_fault+0xa70/0xc3c
   do_page_fault+0x1d8/0x3c4
   common_exception+0x7f/0x7f

 WARNING: CPU: 0 PID: 101 at mm/highmem.c:664 __kmap_local_sched_in+0x50/0xe0
 CPU: 0 PID: 101 Comm: touch Tainted: G        W         5.17.0-rc7-00010-gd3a1cdde80d2-dirty #13
 Call Trace:
   dump_stack+0xc/0x40
   __warn+0x8f/0x174
   warn_slowpath_fmt+0x48/0xac
   __kmap_local_sched_in+0x50/0xe0
   finish_task_switch$isra$0+0x1ce/0x2f8
   __schedule+0x86e/0x9c4
   preempt_schedule_irq+0xa0/0xe0
   common_exception_return+0x5c/0x93
   do_wp_page+0x30e/0x330
   handle_mm_fault+0xa70/0xc3c
   do_page_fault+0x1d8/0x3c4
   common_exception+0x7f/0x7f

Fix it by replacing !pte_none(pteval) with pte_val(pteval) != 0.

Link: https://lkml.kernel.org/r/20220403235159.3498065-1-jcmvbkbc@gmail.com
Fixes: 5fbda3ecd14a ("sched: highmem: Store local kmaps in task struct")
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agolz4: fix LZ4_decompress_safe_partial read out of bound
Guo Xuenan [Fri, 8 Apr 2022 20:08:58 +0000 (13:08 -0700)]
lz4: fix LZ4_decompress_safe_partial read out of bound

commit eafc0a02391b7b36617b36c97c4b5d6832cf5e24 upstream.

When partialDecoding, it is EOF if we've either filled the output buffer
or can't proceed with reading an offset for following match.

In some extreme corner cases when compressed data is suitably corrupted,
UAF will occur.  As reported by KASAN [1], LZ4_decompress_safe_partial
may lead to read out of bound problem during decoding.  lz4 upstream has
fixed it [2] and this issue has been disscussed here [3] before.

current decompression routine was ported from lz4 v1.8.3, bumping
lib/lz4 to v1.9.+ is certainly a huge work to be done later, so, we'd
better fix it first.

[1] https://lore.kernel.org/all/000000000000830d1205cf7f0477@google.com/
[2] https://github.com/lz4/lz4/commit/c5d6f8a8be3927c0bec91bcc58667a6cfad244ad#
[3] https://lore.kernel.org/all/CC666AE8-4CA4-4951-B6FB-A2EFDE3AC03B@fb.com/

Link: https://lkml.kernel.org/r/20211111105048.2006070-1-guoxuenan@huawei.com
Reported-by: syzbot+63d688f1d899c588fb71@syzkaller.appspotmail.com
Signed-off-by: Guo Xuenan <guoxuenan@huawei.com>
Reviewed-by: Nick Terrell <terrelln@fb.com>
Acked-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Cc: Yann Collet <cyan@fb.com>
Cc: Chengyang Fan <cy.fan@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agommc: core: Fixup support for writeback-cache for eMMC and SD
Michael Wu [Thu, 31 Mar 2022 07:32:23 +0000 (15:32 +0800)]
mmc: core: Fixup support for writeback-cache for eMMC and SD

commit 08ebf903af57cda6d773f3dd1671b64f73b432b8 upstream.

During the card initialization process, the mmc core checks whether the
eMMC/SD card supports an internal writeback-cache and then enables it
inside the card.

Unfortunately, this isn't according to what the mmc core reports to the
upper block layer. Instead, the writeback-cache support with REQ_FLUSH and
REQ_FUA, are being enabled depending on whether the host supports the CMD23
(MMC_CAP_CMD23) and whether an eMMC supports the reliable-write command.

This is wrong and it may also sound awkward. In fact, it's a remnant
from when both eMMC/SD cards didn't have dedicated commands/support to
control the internal writeback-cache. In other words, it was the best we
could do at that point in time.

To fix the problem, but also without breaking backwards compatibility,
let's align the REQ_FLUSH support with whether the writeback-cache became
successfully enabled - for both eMMC and SD cards.

Cc: stable@kernel.org
Fixes: 881d1c25f765 ("mmc: core: Add cache control for eMMC4.5 device")
Fixes: 130206a615a9 ("mmc: core: Add support for cache ctrl for SD cards")
Depends-on: 97fce126e279 ("mmc: block: Issue a cache flush only when it's enabled")
Reviewed-by: Avri Altman <Avri.Altman@wdc.com>
Signed-off-by: Michael Wu <michael@allwinnertech.com>
Link: https://lore.kernel.org/r/20220331073223.106415-1-michael@allwinnertech.com
[Ulf: Re-wrote the commit message]
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agommc: renesas_sdhi: don't overwrite TAP settings when HS400 tuning is complete
Wolfram Sang [Mon, 4 Apr 2022 11:49:02 +0000 (13:49 +0200)]
mmc: renesas_sdhi: don't overwrite TAP settings when HS400 tuning is complete

commit 03e59b1e2f56245163b14c69e0a830c24b1a3a47 upstream.

When HS400 tuning is complete and HS400 is going to be activated, we
have to keep the current number of TAPs and should not overwrite them
with a hardcoded value. This was probably a copy&paste mistake when
upporting HS400 support from the BSP.

Fixes: 26eb2607fa28 ("mmc: renesas_sdhi: add eMMC HS400 mode support")
Reported-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20220404114902.12175-1-wsa+renesas@sang-engineering.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agommc: renesas_sdhi: special 4tap settings only apply to HS400
Wolfram Sang [Mon, 4 Apr 2022 10:05:08 +0000 (12:05 +0200)]
mmc: renesas_sdhi: special 4tap settings only apply to HS400

commit 46d4820f949a3030b19ee482c68a50b06dd27590 upstream.

Previous documentation was vague, so we included SDR104 for slow SDnH
clock settings. It turns out now, that it is only needed for HS400.

Fixes: bb6d3fa98a41 ("clk: renesas: rcar-gen3: Switch to new SD clock handling")
Cc: stable@vger.kernel.org
Reported-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Link: https://lore.kernel.org/r/20220404100508.3209-1-wsa+renesas@sang-engineering.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agommc: mmci: stm32: correctly check all elements of sg list
Yann Gautier [Thu, 17 Mar 2022 11:19:43 +0000 (12:19 +0100)]
mmc: mmci: stm32: correctly check all elements of sg list

commit 0d319dd5a27183b75d984e3dc495248e59f99334 upstream.

Use sg and not data->sg when checking sg list elements. Else only the
first element alignment is checked.
The last element should be checked the same way, for_each_sg already set
sg to sg_next(sg).

Fixes: 46b723dd867d ("mmc: mmci: add stm32 sdmmc variant")
Cc: stable@vger.kernel.org
Signed-off-by: Yann Gautier <yann.gautier@foss.st.com>
Link: https://lore.kernel.org/r/20220317111944.116148-2-yann.gautier@foss.st.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agommc: block: Check for errors after write on SPI
Christian Löhle [Thu, 24 Mar 2022 14:18:41 +0000 (14:18 +0000)]
mmc: block: Check for errors after write on SPI

commit 5d435933376962b107bd76970912e7e80247dcc7 upstream.

Introduce a SEND_STATUS check for writes through SPI to not mark
an unsuccessful write as successful.

Since SPI SD/MMC does not have states, after a write, the card will
just hold the line LOW until it is ready again. The driver marks the
write therefore as completed as soon as it reads something other than
all zeroes.
The driver does not distinguish from a card no longer signalling busy
and it being disconnected (and the line being pulled-up by the host).
This lead to writes being marked as successful when disconnecting
a busy card.
Now the card is ensured to be still connected by an additional CMD13,
just like non-SPI is ensured to go back to TRAN state.

While at it and since we already poll for the post-write status anyway,
we might as well check for SPIs error bits (any of them).

The disconnecting card problem is reproducable for me after continuous
write activity and randomly disconnecting, around every 20-50 tries
on SPI DS for some card.

Fixes: 7213d175e3b6f ("MMC/SD card driver learns SPI")
Cc: stable@vger.kernel.org
Signed-off-by: Christian Loehle <cloehle@hyperstone.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/76f6f5d2b35543bab3dfe438f268609c@hyperstone.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agoRevert "mmc: sdhci-xenon: fix annoying 1.8V regulator warning"
Pali Rohár [Fri, 18 Mar 2022 14:14:41 +0000 (15:14 +0100)]
Revert "mmc: sdhci-xenon: fix annoying 1.8V regulator warning"

commit 7e2646ed47542123168d43916b84b954532e5386 upstream.

This reverts commit bb32e1987bc55ce1db400faf47d85891da3c9b9f.

Commit 1a3ed0dc3594 ("mmc: sdhci-xenon: fix 1.8v regulator stabilization")
contains proper fix for the issue described in commit bb32e1987bc5 ("mmc:
sdhci-xenon: fix annoying 1.8V regulator warning").

Fixes: 8d876bf472db ("mmc: sdhci-xenon: wait 5ms after set 1.8V signal enable")
Cc: stable@vger.kernel.org # 1a3ed0dc3594 ("mmc: sdhci-xenon: fix 1.8v regulator stabilization")
Signed-off-by: Pali Rohár <pali@kernel.org>
Reviewed-by: Marek Behún <kabel@kernel.org>
Reviewed-by: Marcin Wojtas <mw@semihalf.com>
Link: https://lore.kernel.org/r/20220318141441.32329-1-pali@kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>