WANG Cong [Wed, 6 Jul 2016 05:12:36 +0000 (22:12 -0700)]
ppp: defer netns reference release for ppp channel
[ Upstream commit 205e1e255c479f3fd77446415706463b282f94e4 ]
Matt reported that we have a NULL pointer dereference
in ppp_pernet() from ppp_connect_channel(),
i.e. pch->chan_net is NULL.
This happens because a parallel ppp_unregister_channel()
can run while we are in ppp_connect_channel(), during
which pch->chan_net is set to NULL. Since we need a reference
to net per channel, it makes sense to sync the refcnt
with the lifetime of the channel; therefore we should
release this reference when we destroy it.
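
A minimal user-space sketch of the idea, not the kernel code: the channel takes its netns reference at registration and drops it only when the channel is destroyed, so unregistering never clears chan_net. All names here (net_sketch, channel_sketch, channel_register and so on) are hypothetical stand-ins, and the sketch is single-threaded, so it illustrates the lifetime rule rather than the race itself.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for struct net: only the reference count matters here. */
struct net_sketch {
    int refcnt;
};

/* Stand-in for the ppp channel; chan_net holds one reference on the netns. */
struct channel_sketch {
    struct net_sketch *chan_net;
};

static struct channel_sketch *channel_register(struct net_sketch *net)
{
    struct channel_sketch *pch = malloc(sizeof(*pch));

    if (!pch)
        return NULL;
    net->refcnt++;              /* take the per-channel reference */
    pch->chan_net = net;
    return pch;
}

/*
 * Unregister does not drop the reference or clear chan_net, so a
 * connect path running in parallel can still use pch->chan_net.
 */
static void channel_unregister(struct channel_sketch *pch)
{
    (void)pch;
}

/* The reference is released only when the channel itself is destroyed. */
static void channel_destroy(struct channel_sketch *pch)
{
    pch->chan_net->refcnt--;
    pch->chan_net = NULL;
    free(pch);
}
```

The design choice is that the reference count follows the object's lifetime instead of one particular teardown step.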
Fixes: 1f461dcdd296 ("ppp: take reference on channels netns")
Reported-by: Matt Bennett <Matt.Bennett@alliedtelesis.co.nz>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linux-ppp@vger.kernel.org
Cc: Guillaume Nault <g.nault@alphalink.fr>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Dan Williams [Fri, 24 Jun 2016 00:50:39 +0000 (17:50 -0700)]
libnvdimm, pfn, dax: fix initialization vs autodetect for mode + alignment
commit 1ee6667cd8d183b2fed12f97285f184431d2caf9 upstream.
The updated ndctl unit tests discovered that if a pfn configuration with
a 4K alignment is read from the namespace, that alignment will be
ignored in favor of the default 2M alignment. The result is that the
configuration will fail initialization with a message like:
dax6.1: bad offset: 0x22000 dax disabled align: 0x200000
Fix this by allowing the alignment read from the info block to override
the default (which is 2M, not 0) in the autodetect path. This also fixes a
similar problem with the mode and alignment settings silently being
overwritten by the kernel when userspace has changed them. We now will
either overwrite the info block if userspace changes the uuid or fail
and warn if a live setting disagrees with the info block.
Cc: Micah Parrish <micah.parrish@hpe.com>
Cc: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Dan Williams [Thu, 31 Mar 2016 22:41:18 +0000 (15:41 -0700)]
libnvdimm, dax: record the specified alignment of a dax-device instance
commit 45a0dac0451136fa7ae34a6fea53ef6a136287ce upstream.
We want to use the alignment as the allocation and mapping unit.
Previously this information was only useful for establishing the data
offset, but now it is important to remember the granularity for
later use.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Greg Kroah-Hartman [Wed, 10 Aug 2016 10:54:57 +0000 (12:54 +0200)]
Linux 4.6.6
Paul Burton [Wed, 3 Feb 2016 03:15:26 +0000 (03:15 +0000)]
MIPS: CM: Fix mips_cm_max_vp_width for UP kernels
commit a60ae81e5e5918138703f22427dd8f2445985b55 upstream.
Fix mips_cm_max_vp_width for UP kernels where it previously referenced
smp_num_siblings, which is not declared for UP kernels. This led to
build errors such as the following:
drivers/built-in.o: In function `$L446':
irq-mips-gic.c:(.text+0x1994): undefined reference to `smp_num_siblings'
drivers/built-in.o:irq-mips-gic.c:(.text+0x199c): more undefined references to `smp_num_siblings' follow
On UP kernels simply return 1, leaving the reference to smp_num_siblings
in place only for SMP kernels.
Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: Matt Redfearn <matt.redfearn@imgtec.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/12332/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Miklos Szeredi [Wed, 3 Aug 2016 11:44:27 +0000 (13:44 +0200)]
vfs: fix deadlock in file_remove_privs() on overlayfs
commit c1892c37769cf89c7e7ba57528ae2ccb5d153c9b upstream.
file_remove_privs() is called with the inode lock held on file_inode(), and
proceeds to call notify_change() on file->f_path.dentry. This triggers
the WARN_ON_ONCE(!inode_is_locked(inode)), in addition to deadlocking later
when ovl_setattr tries to lock the underlying inode again.
Fix this mess by not mixing the layers, but doing everything on underlying
dentry/inode.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 07a2daab49c5 ("ovl: Copy up underlying inode's ->i_mode to overlay inode")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Scott Bauer [Thu, 28 Jul 2016 01:11:29 +0000 (19:11 -0600)]
vfs: ioctl: prevent double-fetch in dedupe ioctl
commit 10eec60ce79187686e052092e5383c99b4420a20 upstream.
This prevents a double-fetch from user space that can lead to an
undersized allocation and heap overflow.
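
A minimal user-space sketch of the double-fetch pattern and one way to neutralize it. It assumes a hypothetical request layout (req_sketch) and models copy_from_user() with a plain memcpy; the optional racer callback stands in for user space rewriting the buffer between the two fetches. The key step is that the count used to size the allocation is written back over whatever the second fetch copied.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical request layout: a header followed by dest_count entries. */
struct req_sketch {
    uint16_t dest_count;
    uint64_t info[];
};

/* Stand-in for copy_from_user(): a plain memcpy from "user" memory. */
static void fetch(void *dst, const void *user_src, size_t len)
{
    memcpy(dst, user_src, len);
}

/*
 * Copy a request in. 'racer', if non-NULL, runs between the two fetches
 * to model user space changing the buffer concurrently.
 */
static struct req_sketch *copy_request(struct req_sketch *user,
                                       void (*racer)(struct req_sketch *))
{
    uint16_t count;
    size_t size;
    struct req_sketch *req;

    fetch(&count, &user->dest_count, sizeof(count));    /* first fetch */
    size = sizeof(*req) + (size_t)count * sizeof(req->info[0]);

    req = malloc(size);
    if (!req)
        return NULL;

    if (racer)
        racer(user);

    fetch(req, user, size);                             /* second fetch */

    /* Trust only the count the allocation was sized with. */
    req->dest_count = count;
    return req;
}

/* Models a malicious writer inflating the count after the first fetch. */
static void bump_count(struct req_sketch *user)
{
    user->dest_count = 1000;
}
```

Without the final assignment, later code iterating up to req->dest_count would walk past the end of the allocation.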
Fixes: 54dbc1517237 ("vfs: hoist the btrfs deduplication ioctl to the vfs")
Signed-off-by: Scott Bauer <sbauer@plzdonthack.me>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Alexander Shishkin [Thu, 30 Jun 2016 08:51:44 +0000 (11:51 +0300)]
intel_th: Fix a deadlock in modprobing
commit a36aa80f3cb2540fb1dbad6240852de4365a2e82 upstream.
Driver initialization tries to request a hub (GTH) driver module from
its probe callback, resulting in a deadlock.
This patch solves the problem by adding a deferred work for requesting
the hub module.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Alexander Shishkin [Tue, 28 Jun 2016 15:55:23 +0000 (18:55 +0300)]
intel_th: pci: Add Kaby Lake PCH-H support
commit 7a1a47ce35821b40f5b2ce46379ba14393bc3873 upstream.
This adds Intel(R) Trace Hub PCI ID for Kaby Lake PCH-H.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Gregory Greenman [Tue, 5 Jul 2016 12:23:10 +0000 (15:23 +0300)]
cfg80211: handle failed skb allocation
commit 16a910a6722b7a8680409e634c7c0dac073c01e4 upstream.
Handle the case when dev_alloc_skb returns NULL.
Fixes: 2b67f944f88c2 ("cfg80211: reuse existing page fragments in A-MSDU rx")
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Dmitri Epshtein [Wed, 6 Jul 2016 02:18:58 +0000 (04:18 +0200)]
net: mvneta: set real interrupt per packet for tx_done
commit 06708f81528725148473c0869d6af5f809c6824b upstream.
Commit aebea2ba0f74 ("net: mvneta: fix Tx interrupt delay") intended to
set the coalescing threshold to a value guaranteeing interrupt generation
for each sent packet, so that buffers can be released with no delay.
In fact, setting the threshold to '1' was wrong, because it causes an
interrupt every two packets. According to the documentation, the reason
is that an interrupt occurs once the sent buffers counter reaches a value
higher than the one specified in MVNETA_TXQ_SIZE_REG(q). This
behavior was confirmed during tests. Also, when testing the SoC working
as a NAS device, better performance was observed with int-per-packet,
as it strongly depends on the fact that all transmitted packets are
released immediately.
This commit makes the NETA controller work in interrupt-per-sent-packet
mode by setting the coalescing threshold to 0.
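
The threshold semantics described above can be illustrated with a small model. This is a sketch under assumptions, not driver code: tx_done_interrupts() is a hypothetical helper, and it assumes the sent-buffers counter restarts from zero after each interrupt. The only point it makes is the "strictly higher than" comparison.

```c
#include <assert.h>

/*
 * Model of the described behavior: an interrupt fires once the count of
 * sent buffers becomes higher than the programmed threshold.
 * Returns how many interrupts are raised for 'packets' sent packets.
 */
static unsigned int tx_done_interrupts(unsigned int packets,
                                       unsigned int threshold)
{
    unsigned int pending = 0;
    unsigned int irqs = 0;

    for (unsigned int i = 0; i < packets; i++) {
        pending++;
        if (pending > threshold) {      /* strictly higher, not equal */
            irqs++;
            pending = 0;                /* assumed: counter restarts */
        }
    }
    return irqs;
}
```

In this model a threshold of 1 yields one interrupt per two packets, while a threshold of 0 yields one interrupt per packet.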
Signed-off-by: Dmitri Epshtein <dima@marvell.com>
Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Fixes: aebea2ba0f74 ("net: mvneta: fix Tx interrupt delay")
Acked-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Ilya Dryomov [Tue, 19 Jul 2016 01:50:28 +0000 (03:50 +0200)]
libceph: apply new_state before new_up_client on incrementals
commit 930c532869774ebf8af9efe9484c597f896a7d46 upstream.
Currently, osd_weight and osd_state fields are updated in the encoding
order. This is wrong, because an incremental map may look like e.g.
new_up_client: { osd=6, addr=... } # set osd_state and addr
new_state: { osd=6, xorstate=EXISTS } # clear osd_state
Suppose osd6's current osd_state is EXISTS (i.e. osd6 is down). After
applying new_up_client, osd_state is changed to EXISTS | UP. Carrying
on with the new_state update, we flip EXISTS and leave osd6 in a weird
"!EXISTS but UP" state. A non-existent OSD is considered down by the
mapping code
    for (i = 0; i < pg->pg_temp.len; i++) {
        if (ceph_osd_is_down(osdmap, pg->pg_temp.osds[i])) {
            if (ceph_can_shift_osds(pi))
                continue;

            temp->osds[temp->size++] = CRUSH_ITEM_NONE;
and so requests get directed to the second OSD in the set instead of
the first, resulting in OSD-side errors like:
[WRN] : client.4239 192.168.122.21:0/2444980242 misdirected client.4239.1:2827 pg 2.5df899f2 to osd.4 not [1,4,6] in e680/680
and hung rbds on the client:
[ 493.566367] rbd: rbd0: write 400000 at 11cc00000 (0)
[ 493.566805] rbd: rbd0: result -6 xferred 400000
[ 493.567011] blk_update_request: I/O error, dev rbd0, sector 9330688
The fix is to decouple application from the decoding and:
- apply new_weight first
- apply new_state before new_up_client
- twiddle osd_state flags if marking in
- clear out some of the state if osd is destroyed
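
The ordering problem for the osd6 example can be reproduced with a small flag model. This is a simplified sketch, not the libceph code: the flag values and helper names are hypothetical, and it covers only the new_state and new_up_client steps, not the weight handling or the clean-up for destroyed OSDs.

```c
#include <assert.h>

#define OSD_EXISTS 0x1u     /* hypothetical flag values for this sketch */
#define OSD_UP     0x2u

/* new_state carries a mask that is XORed into the current state. */
static unsigned int apply_new_state(unsigned int state, unsigned int xorstate)
{
    return state ^ xorstate;
}

/* new_up_client marks the OSD as existing and up. */
static unsigned int apply_new_up_client(unsigned int state)
{
    return state | OSD_EXISTS | OSD_UP;
}

/* Encoding order: new_up_client first, then new_state (the buggy order). */
static unsigned int apply_encoding_order(unsigned int state,
                                         unsigned int xorstate)
{
    return apply_new_state(apply_new_up_client(state), xorstate);
}

/* Fixed order: new_state first, then new_up_client. */
static unsigned int apply_fixed_order(unsigned int state,
                                      unsigned int xorstate)
{
    return apply_new_up_client(apply_new_state(state, xorstate));
}
```

Starting from EXISTS with xorstate=EXISTS, the encoding order ends in UP without EXISTS, while the fixed order ends in EXISTS | UP.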
Fixes: http://tracker.ceph.com/issues/14901
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tejun Heo [Mon, 18 Jul 2016 22:40:00 +0000 (18:40 -0400)]
libata: LITE-ON CX1-JB256-HP needs lower max_sectors
commit 1488a1e3828d60d74c9b802a05e24c0487babe4e upstream.
Since 34b48db66e08 ("block: remove artifical max_hw_sectors cap"),
max_sectors is no longer limited to BLK_DEF_MAX_SECTORS and LITE-ON
CX1-JB256-HP keeps timing out with higher max_sectors. Revert it to
the previous value.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: dgerasimov@gmail.com
Link: https://bugzilla.kernel.org/show_bug.cgi?id=121671
Fixes: 34b48db66e08 ("block: remove artifical max_hw_sectors cap")
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Lukasz Gemborowski [Mon, 27 Jun 2016 10:57:47 +0000 (12:57 +0200)]
i2c: mux: reg: wrong condition checked for of_address_to_resource return value
commit 22ebf00eb56fe77922de8138aa9af9996582c2b3 upstream.
of_address_to_resource() returns 0 on success, but
devm_ioremap_resource() is called only if it returns a non-zero value.
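
A minimal sketch of the corrected condition, using hypothetical stand-ins (lookup_resource, probe_sketch) rather than the real OF and devm helpers. It assumes only the convention stated above: the lookup returns 0 on success and a negative errno on failure, so the mapping step must run when the result is 0.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct resource_sketch {
    unsigned long start;
    unsigned long size;
};

/* Hypothetical lookup: 0 on success, negative errno on failure. */
static int lookup_resource(int have_node, struct resource_sketch *res)
{
    if (!have_node)
        return -EINVAL;
    res->start = 0x1000;
    res->size = 0x100;
    return 0;
}

/* Returns 1 if the resource was found and "mapped", 0 otherwise. */
static int probe_sketch(int have_node, unsigned long *mapped_at)
{
    struct resource_sketch res;

    /* Correct test: proceed only when the lookup returned 0. */
    if (lookup_resource(have_node, &res) == 0) {
        *mapped_at = res.start;     /* stand-in for the ioremap step */
        return 1;
    }
    return 0;
}
```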
Signed-off-by: Lukasz Gemborowski <lukasz.gemborowski@nokia.com>
Reviewed-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sricharan R [Fri, 10 Jun 2016 18:08:20 +0000 (23:38 +0530)]
i2c: qup: Fix wrong value of index variable
commit d4f56c7773483b8829e89cfc739b7a5a071f6da0 upstream.
index gets incremented during the check that determines whether the
messages can be transferred with DMA, but it is not reset after
that, resulting in a wrong start value in the subsequent loop and
causing failure. Fix it.
Signed-off-by: Sricharan R <sricharan@codeaurora.org>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Laurent Pinchart [Tue, 24 May 2016 12:09:33 +0000 (09:09 -0300)]
adv7604: Don't ignore pad number in subdev DV timings pad operations
commit 6519c3d7b8621c9f4333c98ed4b703029b51ba79 upstream.
The dv_timings_cap() and enum_dv_timings() pad operations take a pad
number as an input argument and return the DV timings capabilities and
list of supported DV timings for that pad.
Commit bd3e275f3ec0 ("[media] media: i2c: adv7604: Use v4l2-dv-timings
helpers") broke this as it started ignoring the pad number, always
returning the information associated with the currently selected input.
Fix it.
Fixes: bd3e275f3ec0 ("[media] media: i2c: adv7604: Use v4l2-dv-timings helpers")
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Thomas Gleixner [Tue, 12 Jul 2016 19:59:23 +0000 (21:59 +0200)]
cpu/hotplug: Keep enough storage space if SMP=n to avoid array out of bounds scribble
commit a7c734140aa36413944eef0f8c660e0e2256357d upstream.
Xiaolong Ye reported lock debug warnings triggered by the following commit:
8de4a0066106 ("perf/x86: Convert the core to the hotplug state machine")
The bug is the following: the cpuhp_bp_states[] array is cut short when
CONFIG_SMP=n, but the dynamically registered callbacks are stored nevertheless
and happily scribble outside of the array bounds...
We need to store them in case that the state is unregistered so we can invoke
the teardown function. That's independent of CONFIG_SMP. Make sure the array
is large enough.
Reported-by: kernel test robot <xiaolong.ye@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Adam Borowski <kilobyte@angband.pl>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: lkp@01.org
Cc: tipbuild@zytor.com
Fixes: cff7d378d3fd "cpu/hotplug: Convert to a state machine for the control processor"
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1607122144560.4083@nanos
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Alexey Dobriyan [Thu, 7 Jul 2016 22:39:11 +0000 (01:39 +0300)]
posix_cpu_timer: Exit early when process has been reaped
commit 2c13ce8f6b2f6fd9ba2f9261b1939fc0f62d1307 upstream.
Variable "now" seems to be genuinely used uninitialized
if branch
if (CPUCLOCK_PERTHREAD(timer->it_clock)) {
is not taken and branch
if (unlikely(sighand == NULL)) {
is taken. In this case the process has been reaped and the timer is marked as
disarmed anyway. So none of the postprocessing of the sample is
required. Return right away.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Link: http://lkml.kernel.org/r/20160707223911.GA26483@p183.telecom.by
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
James Patrick-Evans [Fri, 15 Jul 2016 15:40:45 +0000 (16:40 +0100)]
media: fix airspy usb probe error path
commit aa93d1fee85c890a34f2510a310e55ee76a27848 upstream.
Fix a memory leak on probe error of the airspy usb device driver.
The problem is triggered when more than 64 usb devices register with
v4l2 of type VFL_TYPE_SDR or VFL_TYPE_SUBDEV.
The memory leak is caused by the probe function of the airspy driver
mishandling errors and not freeing the corresponding control structures
when an error occurs registering the device to v4l2 core.
A BadUSB device can emulate 64 of these devices, and then through
continual emulated connect/disconnect of the 65th device, cause the
kernel to run out of RAM and crash, thus causing a local DoS
vulnerability.
Fixes CVE-2016-5400
Signed-off-by: James Patrick-Evans <james@jmp-e.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Brian King [Mon, 27 Jun 2016 14:09:40 +0000 (09:09 -0500)]
ipr: Clear interrupt on croc/crocodile when running with LSI
commit 54e430bbd490e18ab116afa4cd90dcc45787b3df upstream.
If we fall back to using LSI on the Croc or Crocodile chip we need to
clear the interrupt so we don't hang the system.
Tested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Alan Stern [Thu, 23 Jun 2016 19:05:26 +0000 (15:05 -0400)]
SCSI: fix new bug in scsi_dev_info_list string matching
commit 5e7ff2ca7f2da55fe777167849d0c93403bd0dc8 upstream.
Commit b704f70ce200 ("SCSI: fix bug in scsi_dev_info_list matching")
changed the way vendor- and model-string matching was carried out in the
routine that looks up entries in a SCSI devinfo list. The new matching
code failed to take into account the case of a maximum-length string; in
such cases it could end up testing for a terminating '\0' byte beyond
the end of the memory allocated to the string. This out-of-bounds bug
was detected by UBSAN.
I don't know if anybody has actually encountered this bug. The symptom
would be that a device entry in the blacklist might not be matched
properly if it contained an 8-character vendor name or a 16-character
model name. Such entries certainly exist in scsi_static_device_list.
This patch fixes the problem by adding a check for a maximum-length
string before the '\0' test.
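
A minimal sketch of the check described above, using a hypothetical fixed-size entry (entry_sketch) and the 8-character vendor limit mentioned in the text. It is an illustration of the technique, not the scsi_devinfo code: the terminator is only examined when it lies inside the array, so a maximum-length entry never triggers an out-of-bounds read.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define VENDOR_LEN 8    /* maximum vendor-string length, per the text above */

/*
 * A list entry stores the vendor in a fixed array; a maximum-length name
 * fills it completely and leaves no room for a terminating '\0'.
 */
struct entry_sketch {
    char vendor[VENDOR_LEN];
};

/* Returns 1 when the 'len'-byte string 'vendor' equals the entry's vendor. */
static int vendor_matches(const struct entry_sketch *e,
                          const char *vendor, size_t len)
{
    if (len > sizeof(e->vendor))
        return 0;
    if (memcmp(e->vendor, vendor, len) != 0)
        return 0;
    /* Only look for the terminator when it lies inside the array. */
    if (len < sizeof(e->vendor) && e->vendor[len] != '\0')
        return 0;
    return 1;
}
```

The length test comes first in the final condition, so the array index is evaluated only when it is in bounds.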
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Fixes: b704f70ce200 ("SCSI: fix bug in scsi_dev_info_list matching")
Tested-by: Wilfried Klaebe <linux-kernel@lebenslange-mailadresse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bruno Prémont [Thu, 30 Jun 2016 15:00:32 +0000 (17:00 +0200)]
qla2xxx: Fix NULL pointer deref in QLA interrupt
commit 262e2bfd7d1e1f1ee48b870e5dfabb87c06b975e upstream.
In qla24xx_process_response_queue() rsp->msix->cpuid may trigger NULL
pointer dereference when rsp->msix is NULL:
[ 5.622457] NULL pointer dereference at 0000000000000050
[ 5.622457] IP: [<ffffffff8155e614>] qla24xx_process_response_queue+0x44/0x4b0
[ 5.622457] PGD 0
[ 5.622457] Oops: 0000 [#1] SMP
[ 5.622457] Modules linked in:
[ 5.622457] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.6.3-x86_64 #1
[ 5.622457] Hardware name: HP ProLiant DL360 G5, BIOS P58 05/02/2011
[ 5.622457] task: ffff8801a88f3740 ti: ffff8801a8954000 task.ti: ffff8801a8954000
[ 5.622457] RIP: 0010:[<ffffffff8155e614>] [<ffffffff8155e614>] qla24xx_process_response_queue+0x44/0x4b0
[ 5.622457] RSP: 0000:ffff8801afb03de8 EFLAGS: 00010002
[ 5.622457] RAX: 0000000000000000 RBX: 0000000000000032 RCX: 00000000ffffffff
[ 5.622457] RDX: 0000000000000002 RSI: ffff8801a79bf8c8 RDI: ffff8800c8f7e7c0
[ 5.622457] RBP: ffff8801afb03e68 R08: 0000000000000000 R09: 0000000000000000
[ 5.622457] R10: 00000000ffff8c47 R11: 0000000000000002 R12: ffff8801a79bf8c8
[ 5.622457] R13: ffff8800c8f7e7c0 R14: ffff8800c8f60000 R15: 0000000000018013
[ 5.622457] FS: 0000000000000000(0000) GS:ffff8801afb00000(0000) knlGS:0000000000000000
[ 5.622457] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5.622457] CR2: 0000000000000050 CR3: 0000000001e07000 CR4: 00000000000006e0
[ 5.622457] Stack:
[ 5.622457] ffff8801afb03e30 ffffffff810c0f2d 0000000000000086 0000000000000002
[ 5.622457] ffff8801afb03e28 ffffffff816570e1 ffff8800c8994628 0000000000000002
[ 5.622457] ffff8801afb03e60 ffffffff816772d4 b47c472ad6955e68 0000000000000032
[ 5.622457] Call Trace:
[ 5.622457] <IRQ>
[ 5.622457] [<ffffffff810c0f2d>] ? __wake_up_common+0x4d/0x80
[ 5.622457] [<ffffffff816570e1>] ? usb_hcd_resume_root_hub+0x51/0x60
[ 5.622457] [<ffffffff816772d4>] ? uhci_hub_status_data+0x64/0x240
[ 5.622457] [<ffffffff81560d00>] qla24xx_intr_handler+0xf0/0x2e0
[ 5.622457] [<ffffffff810d569e>] ? get_next_timer_interrupt+0xce/0x200
[ 5.622457] [<ffffffff810c89b4>] handle_irq_event_percpu+0x64/0x100
[ 5.622457] [<ffffffff810c8a77>] handle_irq_event+0x27/0x50
[ 5.622457] [<ffffffff810cb965>] handle_edge_irq+0x65/0x140
[ 5.622457] [<ffffffff8101a498>] handle_irq+0x18/0x30
[ 5.622457] [<ffffffff8101a276>] do_IRQ+0x46/0xd0
[ 5.622457] [<ffffffff817f8fff>] common_interrupt+0x7f/0x7f
[ 5.622457] <EOI>
[ 5.622457] [<ffffffff81020d38>] ? mwait_idle+0x68/0x80
[ 5.622457] [<ffffffff8102114a>] arch_cpu_idle+0xa/0x10
[ 5.622457] [<ffffffff810c1b97>] default_idle_call+0x27/0x30
[ 5.622457] [<ffffffff810c1d3b>] cpu_startup_entry+0x19b/0x230
[ 5.622457] [<ffffffff810324c6>] start_secondary+0x136/0x140
[ 5.622457] Code: 00 00 65 48 8b 04 25 28 00 00 00 48 89 45 d0 31 c0 48 8b 47 58 a8 02 0f 84 c5 00 00 00 48 8b 46 50 49 89 f4 65 8b 15 34 bb aa 7e <39> 50 50 74 11 89 50 50 48 8b 46 50 8b 40 50 41 89 86 60 8b 00
[ 5.622457] RIP [<ffffffff8155e614>] qla24xx_process_response_queue+0x44/0x4b0
[ 5.622457] RSP <ffff8801afb03de8>
[ 5.622457] CR2: 0000000000000050
[ 5.622457] ---[ end trace fa2b19c25106d42b ]---
[ 5.622457] Kernel panic - not syncing: Fatal exception in interrupt
The affected code was introduced by commit cdb898c52d1dfad4b4800b83a58b3fe5d352edde
(qla2xxx: Add irq affinity notification).
Only dereference rsp->msix when it has been set so the machine can boot
fine. Possibly rsp->msix is unset because:
[ 3.479679] qla2xxx [0000:00:00.0]-0005: : QLogic Fibre Channel HBA Driver: 8.07.00.33-k.
[ 3.481839] qla2xxx [0000:13:00.0]-001d: : Found an ISP2432 irq 17 iobase 0xffffc90000038000.
[ 3.484081] qla2xxx [0000:13:00.0]-0035:0: MSI-X; Unsupported ISP2432 (0x2, 0x3).
[ 3.485804] qla2xxx [0000:13:00.0]-0037:0: Falling back-to MSI mode -258.
[ 3.890145] scsi host0: qla2xxx
[ 3.891956] qla2xxx [0000:13:00.0]-00fb:0: QLogic QLE2460 - PCI-Express Single Channel 4Gb Fibre Channel HBA.
[ 3.894207] qla2xxx [0000:13:00.0]-00fc:0: ISP2432: PCIe (2.5GT/s x4) @ 0000:13:00.0 hdma+ host#=0 fw=7.03.00 (9496).
[ 5.714774] qla2xxx [0000:13:00.0]-500a:0: LOOP UP detected (4 Gbps).
Signed-off-by: Bruno Prémont <bonbons@linux-vserver.org>
Acked-by: Quinn Tran <quinn.tran@qlogic.com>
Fixes: cdb898c52d1dfad4b4800b83a58b3fe5d352edde
Signed-off-by: James Bottomley <jejb@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Paul Burton [Tue, 5 Jul 2016 13:26:00 +0000 (14:26 +0100)]
irqchip/mips-gic: Match IPI IRQ domain by bus token only
commit 547aefc4db877e65245c3d95fcce703701bf3a0c upstream.
Commit fbde2d7d8290 ("MIPS: Add generic SMP IPI support") introduced
code which calls irq_find_matching_host with a NULL node parameter in
order to discover IPI IRQ domains which are not associated with the DT
root node's interrupt parent. This suggests that implementations of IPI
IRQ domains should effectively ignore the node parameter if it is NULL
and search purely based upon the bus token. Commit 2af70a962070
("irqchip/mips-gic: Add a IPI hierarchy domain") did not do this when
implementing the GIC IPI IRQ domain, and on MIPS Boston boards this
leads to no IPI domain being discovered and a NULL pointer dereference
when attempting to send an IPI:
CPU 0 Unable to handle kernel paging request at virtual address 0000000000000040, epc == ffffffff8016e70c, ra == ffffffff8010ff5c
Oops[#1]:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.7.0-rc6-00223-gad0d1b6 #945
task: a8000000ff066fc0 ti: a8000000ff068000 task.ti: a8000000ff068000
$ 0 : 0000000000000000 0000000000000001 ffffffff80730000 0000000000000003
$ 4 : 0000000000000000 ffffffff8057e5b0 a800000001e3ee00 0000000000000000
$ 8 : 0000000000000000 0000000000000023 0000000000000001 0000000000000001
$12 : 0000000000000000 ffffffff803323d0 0000000000000000 0000000000000000
$16 : 0000000000000000 0000000000000000 0000000000000001 ffffffff801108fc
$20 : 0000000000000000 ffffffff8057e5b0 0000000000000001 0000000000000000
$24 : 0000000000000000 ffffffff8012de28
$28 : a8000000ff068000 a8000000ff06fbc0 0000000000000000 ffffffff8010ff5c
Hi : ffffffff8014c174
Lo : a800000001e1e140
epc : ffffffff8016e70c __ipi_send_mask+0x24/0x11c
ra : ffffffff8010ff5c mips_smp_send_ipi_mask+0x68/0x178
Status: 140084e2 KX SX UX KERNEL EXL
Cause : 00800008 (ExcCode 02)
BadVA : 0000000000000040
PrId : 0001a920 (MIPS I6400)
Process swapper/0 (pid: 1, threadinfo=a8000000ff068000, task=a8000000ff066fc0, tls=0000000000000000)
Stack : 0000000000000000 0000000000000000 0000000000000001 ffffffff801108fc
0000000000000000 ffffffff8057e5b0 0000000000000001 ffffffff8010ff5c
0000000000000001 0000000000000020 0000000000000000 0000000000000000
0000000000000000 ffffffff801108fc 0000000000000000 0000000000000001
0000000000000001 0000000000000000 0000000000000000 ffffffff801865e8
a8000000ff0c7500 a8000000ff06fc90 0000000000000001 0000000000000002
ffffffff801108fc ffffffff801868b8 0000000000000000 ffffffff801108fc
0000000000000000 0000000000000003 ffffffff8068c700 0000000000000001
ffffffff80730000 0000000000000001 a8000000ff00a290 ffffffff80110c50
0000000000000003 a800000001e48308 0000000000000003 0000000000000008
...
Call Trace:
[<ffffffff8016e70c>] __ipi_send_mask+0x24/0x11c
[<ffffffff8010ff5c>] mips_smp_send_ipi_mask+0x68/0x178
[<ffffffff801865e8>] generic_exec_single+0x150/0x170
[<ffffffff801868b8>] smp_call_function_single+0x108/0x160
[<ffffffff80110c50>] cps_boot_secondary+0x328/0x394
[<ffffffff80110534>] __cpu_up+0x38/0x90
[<ffffffff8012de4c>] bringup_cpu+0x24/0xac
[<ffffffff8012df40>] cpuhp_up_callbacks+0x58/0xdc
[<ffffffff8012e648>] cpu_up+0x118/0x18c
[<ffffffff806dc158>] smp_init+0xbc/0xe8
[<ffffffff806d4c18>] kernel_init_freeable+0xa0/0x228
[<ffffffff8056c908>] kernel_init+0x10/0xf0
[<ffffffff80105098>] ret_from_kernel_thread+0x14/0x1c
Fix this by allowing the GIC IPI IRQ domain to match purely based upon
the bus token if the node provided is NULL.
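
The matching rule can be sketched as a small predicate. This is a hypothetical model (domain_sketch, domain_match), not the irqchip callback itself: it assumes a domain records an owning node and a bus token, and treats a NULL node in the query as "any node".

```c
#include <assert.h>
#include <stddef.h>

enum bus_token_sketch { TOKEN_ANY, TOKEN_IPI };

/* Hypothetical domain: identified by its owning node and its bus token. */
struct domain_sketch {
    const void *node;
    enum bus_token_sketch bus_token;
};

/* Returns 1 when the domain matches the query, 0 otherwise. */
static int domain_match(const struct domain_sketch *d, const void *node,
                        enum bus_token_sketch bus_token)
{
    if (bus_token != d->bus_token)
        return 0;
    /* A NULL node means "any node": match on the bus token alone. */
    return node == NULL || node == d->node;
}
```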
Fixes: 2af70a962070 ("irqchip/mips-gic: Add a IPI hierarchy domain")
Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Qais Yousef <qsyousef@gmail.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Link: http://lkml.kernel.org/r/20160705132600.27730-2-paul.burton@imgtec.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Paul Burton [Tue, 5 Jul 2016 13:25:59 +0000 (14:25 +0100)]
irqchip/mips-gic: Map to VPs using HW VPNum
commit 99ec8a3608330d202448085185cf28389b789b7b upstream.
When mapping an interrupt to a VP(E) we must use the identifier for the
VP that the hardware expects, and this does not always match up with the
Linux CPU number. Commit d46812bb0bef ("irqchip: mips-gic: Use HW IDs
for VPE_OTHER_ADDR") corrected this for the cases that existed at the
time it was written, but commit 2af70a962070 ("irqchip/mips-gic: Add a
IPI hierarchy domain") added another case before the former patch was
merged. This leads to incorrectly using Linux CPU numbers when mapping
interrupts to VPs, which breaks on certain systems such as those with
multi-core I6400 CPUs. Fix by adding the appropriate call to
mips_cm_vp_id() to retrieve the expected VP identifier.
Fixes: d46812bb0bef ("irqchip: mips-gic: Use HW IDs for VPE_OTHER_ADDR")
Fixes: 2af70a962070 ("irqchip/mips-gic: Add a IPI hierarchy domain")
Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Qais Yousef <qsyousef@gmail.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Link: http://lkml.kernel.org/r/20160705132600.27730-1-paul.burton@imgtec.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Vegard Nossum [Sun, 3 Jul 2016 08:54:54 +0000 (10:54 +0200)]
RDS: fix rds_tcp_init() error path
commit 3dad5424adfb346c871847d467f97dcdca64ea97 upstream.
If register_pernet_subsys() fails, we shouldn't try to call
unregister_pernet_subsys().
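
A minimal sketch of the error-path shape this implies, with hypothetical stand-ins (register_subsys, unregister_subsys, later_step) in place of the real RDS and pernet calls. Each failure jumps to a label that undoes only what has already succeeded, so a failed registration never reaches the unregister call.

```c
#include <assert.h>
#include <errno.h>

static int registered;          /* 1 while the hypothetical subsystem is registered */
static int unregister_calls;    /* counts calls to unregister_subsys() */

static int register_subsys(int fail)
{
    if (fail)
        return -ENOMEM;
    registered = 1;
    return 0;
}

static void unregister_subsys(void)
{
    unregister_calls++;
    registered = 0;
}

/* Stand-in for any init step that runs after registration. */
static int later_step(int fail)
{
    return fail ? -ENOMEM : 0;
}

static int init_sketch(int fail_register, int fail_later)
{
    int ret;

    ret = register_subsys(fail_register);
    if (ret)
        goto out;               /* nothing registered, nothing to undo */

    ret = later_step(fail_later);
    if (ret)
        goto out_unregister;

    return 0;

out_unregister:
    unregister_subsys();
out:
    return ret;
}
```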
Fixes: 467fa15356 ("RDS-TCP: Support multiple RDS-TCP listen endpoints, one per netns.")
Cc: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Oliver Hartkopp [Tue, 21 Jun 2016 13:45:47 +0000 (15:45 +0200)]
can: fix oops caused by wrong rtnl dellink usage
commit 25e1ed6e64f52a692ba3191c4fde650aab3ecc07 upstream.
For 'real' hardware CAN devices the netlink interface is used to set CAN
specific communication parameters. Real CAN hardware cannot be created or
removed with the ip tool ...
This patch adds a private dellink function for the CAN device driver interface
that does nothing.
It's a follow-up to commit 993e6f2fd ("can: fix oops caused by wrong rtnl
newlink usage") but for dellink.
Reported-by: ajneu <ajneu1@gmail.com>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Oliver Hartkopp [Tue, 21 Jun 2016 10:14:07 +0000 (12:14 +0200)]
can: fix handling of unmodifiable configuration options fix
commit bce271f255dae8335dc4d2ee2c4531e09cc67f5a upstream.
With upstream commit bb208f144cf3f59 (can: fix handling of unmodifiable
configuration options) a new can_validate() function was introduced.
When invoking 'ip link set can0 type can' without any configuration data,
can_validate() tries to validate the content without taking into account that
there is no content at all. This patch adds a check for missing content.
Reported-by: ajneu <ajneu1@gmail.com>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Thor Thayer [Thu, 16 Jun 2016 16:10:19 +0000 (11:10 -0500)]
can: c_can: Update D_CAN TX and RX functions to 32 bit - fix Altera Cyclone access
commit 427460c83cdf55069eee49799a0caef7dde8df69 upstream.
When testing CAN write floods on Altera's CycloneV, the first 2 bytes
are sometimes 0x00, 0x00 or corrupted instead of the values sent. Bytes
4 and 5 were also observed to be corrupted in some cases.
The D_CAN Data registers are 32 bits and changing from 16 bit writes to
32 bit writes fixes the problem.
Testing performed on Altera CycloneV (D_CAN). Requesting tests on other
C_CAN & D_CAN platforms.
Reported-by: Richard Andrysek <richard.andrysek@gomtec.de>
Signed-off-by: Thor Thayer <tthayer@opensource.altera.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Wolfgang Grandegger [Mon, 13 Jun 2016 13:44:19 +0000 (15:44 +0200)]
can: at91_can: RX queue could get stuck at high bus load
commit 43200a4480cbbe660309621817f54cbb93907108 upstream.
At high bus load it could happen that "at91_poll()" enters with all RX
message boxes filled up. If then at the end the "quota" is exceeded as
well, "rx_next" will not be reset to the first RX mailbox and hence the
interrupts remain disabled.
Signed-off-by: Wolfgang Grandegger <wg@grandegger.com>
Tested-by: Amr Bekhit <amrbekhit@gmail.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Zijlstra [Fri, 24 Jun 2016 13:53:54 +0000 (15:53 +0200)]
sched/fair: Fix effective_load() to consistently use smoothed load
commit 7dd4912594daf769a46744848b05bd5bc6d62469 upstream.
Starting with the following commit:
fde7d22e01aa ("sched/fair: Fix overly small weight for interactive group entities")
calc_tg_weight() doesn't compute the right value as expected by effective_load().
The difference is in the 'correction' term. In order to ensure \Sum
rw_j >= rw_i we cannot use tg->load_avg directly, since that might be
lagging a correction on the current cfs_rq->avg.load_avg value.
Therefore we use tg->load_avg - cfs_rq->tg_load_avg_contrib +
cfs_rq->avg.load_avg.
Now, per the referenced commit, calc_tg_weight() doesn't use
cfs_rq->avg.load_avg, as is later used in @w, but uses
cfs_rq->load.weight instead.
So stop using calc_tg_weight() and do it explicitly.
The effects of this bug are wake_affine() making randomly
poor choices in cgroup-intense workloads.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: fde7d22e01aa ("sched/fair: Fix overly small weight for interactive group entities")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Taras Kondratiuk [Wed, 13 Jul 2016 22:05:38 +0000 (22:05 +0000)]
mmc: block: fix packed command header endianness
commit
f68381a70bb2b26c31b13fdaf67c778f92fd32b4 upstream.
The code that fills the packed command header assumes that the CPU runs
in little-endian mode. Hence the header is malformed in big-endian mode
and causes MMC data transfer errors:
[ 563.200828] mmcblk0: error -110 transferring data, sector 2048, nr 8, cmd response 0x900, card status 0xc40
[ 563.219647] mmcblk0: packed cmd failed, nr 2, sectors 16, failure index: -1
Convert header data to LE.
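A minimal user-space sketch of the idea, assuming a header made of 32-bit words: serialise each word byte by byte so the layout is little-endian on any host. The helpers put_le32() and fill_header_le() are hypothetical names for this example; in the kernel the usual tool for this is cpu_to_le32().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit value as little-endian bytes regardless of host byte order. */
static void put_le32(uint8_t *dst, uint32_t v)
{
	dst[0] = (uint8_t)(v & 0xff);
	dst[1] = (uint8_t)((v >> 8) & 0xff);
	dst[2] = (uint8_t)((v >> 16) & 0xff);
	dst[3] = (uint8_t)((v >> 24) & 0xff);
}

/*
 * Hypothetical header builder: write 'n' host-order words into a byte
 * buffer in little-endian layout.
 */
static void fill_header_le(uint8_t *hdr, const uint32_t *words, size_t n)
{
	size_t i;

	assert(hdr != NULL && words != NULL);
	for (i = 0; i < n; i++)
		put_le32(hdr + 4 * i, words[i]);
}
```

Because the bytes are produced with shifts rather than by copying memory, the result does not depend on the endianness of the CPU running the code.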
Signed-off-by: Taras Kondratiuk <takondra@cisco.com>
Fixes: ce39f9d17c14 ("mmc: support packed write command for eMMC4.5 devices")
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Ville Viinikka [Fri, 8 Jul 2016 15:27:02 +0000 (18:27 +0300)]
mmc: block: fix free of uninitialized 'idata->buf'
commit
bfe5b1b1e013f7b1c0fd2ac3b3c8c380114b3fb9 upstream.
Set 'idata->buf' to NULL so that it never gets returned without
initialization. This fixes a bug where mmc_blk_ioctl_cmd() would
free both 'idata' and 'idata->buf' but 'idata->buf' was returned
uninitialized.
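The pattern can be sketched in user space as follows. The struct ioc_data and both helpers are hypothetical stand-ins, not the mmc code; the point is only that a pointer set to NULL at allocation time can always be passed to the free routine, since free(NULL) (like kfree(NULL)) does nothing.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct ioc_data {		/* hypothetical stand-in for the ioctl data */
	unsigned char *buf;
	size_t buf_bytes;
};

/* Allocate the descriptor; buf is NULL until a payload is attached. */
static struct ioc_data *ioc_data_alloc(size_t buf_bytes)
{
	struct ioc_data *idata = malloc(sizeof(*idata));

	if (!idata)
		return NULL;
	idata->buf = NULL;	/* never leave the pointer uninitialised */
	idata->buf_bytes = buf_bytes;
	if (buf_bytes == 0)
		return idata;	/* no payload: buf stays NULL */
	idata->buf = malloc(buf_bytes);
	if (!idata->buf) {
		free(idata);
		return NULL;
	}
	memset(idata->buf, 0, buf_bytes);
	return idata;
}

/* Safe for every descriptor returned above, because free(NULL) is a no-op. */
static void ioc_data_free(struct ioc_data *idata)
{
	assert(idata != NULL);
	free(idata->buf);
	free(idata);
}
```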
Fixes: 1ff8950c0433 ("mmc: block: change to use kmalloc when copy data from userspace")
Signed-off-by: Ville Viinikka <ville@tuxera.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Omar Sandoval [Fri, 1 Jul 2016 07:39:35 +0000 (00:39 -0700)]
block: fix use-after-free in sys_ioprio_get()
commit
8ba8682107ee2ca3347354e018865d8e1967c5f4 upstream.
get_task_ioprio() accesses the task->io_context without holding the task
lock and thus can race with exit_io_context(), leading to a
use-after-free. The reproducer below hits this within a few seconds on
my 4-core QEMU VM:
#define _GNU_SOURCE
#include <assert.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/wait.h>
int main(int argc, char **argv)
{
	pid_t pid, child;
	long nproc, i;
	/* ioprio_set(IOPRIO_WHO_PROCESS, 0, IOPRIO_PRIO_VALUE(IOPRIO_CLASS_IDLE, 0)); */
	syscall(SYS_ioprio_set, 1, 0, 0x6000);
	nproc = sysconf(_SC_NPROCESSORS_ONLN);
	for (i = 0; i < nproc; i++) {
		pid = fork();
		assert(pid != -1);
		if (pid == 0) {
			for (;;) {
				pid = fork();
				assert(pid != -1);
				if (pid == 0) {
					_exit(0);
				} else {
					child = wait(NULL);
					assert(child == pid);
				}
			}
		}
		pid = fork();
		assert(pid != -1);
		if (pid == 0) {
			for (;;) {
				/* ioprio_get(IOPRIO_WHO_PGRP, 0); */
				syscall(SYS_ioprio_get, 2, 0);
			}
		}
	}
	for (;;) {
		/* ioprio_get(IOPRIO_WHO_PGRP, 0); */
		syscall(SYS_ioprio_get, 2, 0);
	}
	return 0;
}
This gets us KASAN dumps like this:
[ 35.526914] ==================================================================
[ 35.526914] BUG: KASAN: out-of-bounds in get_task_ioprio+0x7b/0x90 at addr ffff880066f34e6c
[ 35.530009] Read of size 2 by task ioprio-gpf/363
[ 35.530009] =============================================================================
[ 35.530009] BUG blkdev_ioc (Not tainted): kasan: bad access detected
[ 35.530009] -----------------------------------------------------------------------------
[ 35.530009] Disabling lock debugging due to kernel taint
[ 35.530009] INFO: Allocated in create_task_io_context+0x2b/0x370 age=0 cpu=0 pid=360
[ 35.530009] ___slab_alloc+0x55d/0x5a0
[ 35.530009] __slab_alloc.isra.20+0x2b/0x40
[ 35.530009] kmem_cache_alloc_node+0x84/0x200
[ 35.530009] create_task_io_context+0x2b/0x370
[ 35.530009] get_task_io_context+0x92/0xb0
[ 35.530009] copy_process.part.8+0x5029/0x5660
[ 35.530009] _do_fork+0x155/0x7e0
[ 35.530009] SyS_clone+0x19/0x20
[ 35.530009] do_syscall_64+0x195/0x3a0
[ 35.530009] return_from_SYSCALL_64+0x0/0x6a
[ 35.530009] INFO: Freed in put_io_context+0xe7/0x120 age=0 cpu=0 pid=1060
[ 35.530009] __slab_free+0x27b/0x3d0
[ 35.530009] kmem_cache_free+0x1fb/0x220
[ 35.530009] put_io_context+0xe7/0x120
[ 35.530009] put_io_context_active+0x238/0x380
[ 35.530009] exit_io_context+0x66/0x80
[ 35.530009] do_exit+0x158e/0x2b90
[ 35.530009] do_group_exit+0xe5/0x2b0
[ 35.530009] SyS_exit_group+0x1d/0x20
[ 35.530009] entry_SYSCALL_64_fastpath+0x1a/0xa4
[ 35.530009] INFO: Slab 0xffffea00019bcd00 objects=20 used=4 fp=0xffff880066f34ff0 flags=0x1fffe0000004080
[ 35.530009] INFO: Object 0xffff880066f34e58 @offset=3672 fp=0x0000000000000001
[ 35.530009] ==================================================================
Fix it by grabbing the task lock while we poke at the io_context.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Randy Dunlap [Wed, 6 Jul 2016 23:06:53 +0000 (16:06 -0700)]
init/Kconfig: keep Expert users menu together
commit
076501ff6ba265a473689c112eda9f1f34f620b5 upstream.
The "expert" menu was broken (split) such that all entries in it after
KALLSYMS were displayed in the "General setup" area instead of in the
"Expert users" area. Fix this by adding one kconfig dependency.
Yes, the Expert users menu is fragile. Problems like this have happened
several times in the past. I will attempt to isolate the Expert users
menu if there is interest in that.
Fixes: 4d5d5664c900 ("x86: kallsyms: disable absolute percpu symbols on !SMP")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Ursula Braun [Mon, 4 Jul 2016 12:07:16 +0000 (14:07 +0200)]
qeth: delete napi struct when removing a qeth device
commit
7831b4ff0d926e0deeaabef9db8800ed069a2757 upstream.
A qeth_card contains a napi_struct linked to the net_device during
device probing. This struct must be deleted when removing the qeth
device, otherwise a panic on oops can occur when qeth devices are
repeatedly removed and added.
Fixes: a1c3ed4c9ca ("qeth: NAPI support for l2 and l3 discipline")
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Tested-by: Alexander Klein <ALKL@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Dan Carpenter [Tue, 21 Jun 2016 13:58:46 +0000 (16:58 +0300)]
platform/chrome: cros_ec_dev - double fetch bug in ioctl
commit
096cdc6f52225835ff503f987a0d68ef770bb78e upstream.
We verify "u_cmd.outsize" and "u_cmd.insize" but we need to make sure
that those values have not changed between the two copy_from_user()
calls. Otherwise it could lead to a buffer overflow.
Additionally, cros_ec_cmd_xfer() can set s_cmd->insize to a lower value.
We should use the new smaller value so we don't copy too much data to
the user.
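The double-fetch check can be sketched in user space like this. Everything here is an assumption for illustration: struct cmd_hdr, MAX_PAYLOAD, and the use of memcpy() in place of copy_from_user(). The sizes validated after the first fetch are compared with the sizes seen in the second fetch, and the request is rejected if they differ.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_PAYLOAD 64		/* hypothetical limit for this sketch */

struct cmd_hdr {		/* hypothetical command header */
	uint32_t outsize;
	uint32_t insize;
};

/* Non-zero if the re-fetched header still carries the validated sizes. */
static int sizes_unchanged(const struct cmd_hdr *checked,
			   const struct cmd_hdr *refetched)
{
	return checked->outsize == refetched->outsize &&
	       checked->insize == refetched->insize;
}

/*
 * 'user' stands for memory another thread may modify at any time.
 * Returns a validated private copy (header plus payload), or NULL.
 */
static struct cmd_hdr *fetch_cmd(const void *user)
{
	struct cmd_hdr first, *cmd;

	assert(user != NULL);
	memcpy(&first, user, sizeof(first));		/* first fetch */
	if (first.outsize > MAX_PAYLOAD || first.insize > MAX_PAYLOAD)
		return NULL;

	cmd = malloc(sizeof(*cmd) + first.outsize);
	if (!cmd)
		return NULL;
	memcpy(cmd, user, sizeof(*cmd) + first.outsize);	/* second fetch */

	/* Reject the request if the sizes changed between the two fetches. */
	if (!sizes_unchanged(&first, cmd)) {
		free(cmd);
		return NULL;
	}
	return cmd;
}
```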
Reported-by: Pengfei Wang <wpengfeinudt@gmail.com>
Fixes: a841178445bb ('mfd: cros_ec: Use a zero-length array for command data')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Gwendal Grignou <gwendal@chromium.org>
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Scott Mayhew [Thu, 30 Jun 2016 14:39:32 +0000 (10:39 -0400)]
lockd: unregister notifier blocks if the service fails to come up completely
commit
cb7d224f82e41d82518e7f9ea271d215d4d08e6e upstream.
If the lockd service fails to start up then we need to be sure that the
notifier blocks are not registered, otherwise a subsequent start of the
service could cause the same notifier to be registered twice, leading to
soft lockups.
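A minimal sketch of the unwind pattern, with a counter standing in for the notifier chain. The names service_up(), notifier_register() and notifier_unregister() are hypothetical; they only illustrate that a failed start must leave no registration behind.

```c
#include <assert.h>
#include <stdbool.h>

static int registered;	/* how many times the notifier is on the chain */

static void notifier_register(void)
{
	registered++;
}

static void notifier_unregister(void)
{
	assert(registered > 0);
	registered--;
}

/* Hypothetical service start: 'later_step_ok' models the rest of bring-up. */
static int service_up(bool later_step_ok)
{
	notifier_register();
	if (!later_step_ok)
		goto err_unregister;
	return 0;

err_unregister:
	notifier_unregister();	/* leave no registration behind on failure */
	return -1;
}
```

With the unwind in place, a failed start followed by a successful one leaves the notifier registered exactly once.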
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Fixes: 0751ddf77b6a "lockd: Register callbacks on the inetaddr_chain..."
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Boris Brezillon [Mon, 18 Jul 2016 07:49:12 +0000 (09:49 +0200)]
clk: at91: fix clk_programmable_set_parent()
commit
f96423f483b1a7854270335b319e8d1cdd6f3585 upstream.
Since commit 1bdf02326b71e ("clk: at91: make use of syscon/regmap
internally"), clk_programmable_set_parent() is always selecting the
first parent (AKA slow_clk), no matter what's passed in the 'index'
parameter.
Fix that by initializing the pckr variable to the index value.
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Reported-by: Hans Verkuil <hans.verkuil@cisco.com>
Fixes: 1bdf02326b71e ("clk: at91: make use of syscon/regmap internally")
Signed-off-by: Michael Turquette <mturquette@baylibre.com>
Link: lkml.kernel.org/r/1468828152-18389-1-git-send-email-boris.brezillon@free-electrons.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Heiko Stuebner [Tue, 17 May 2016 18:57:50 +0000 (20:57 +0200)]
clk: rockchip: initialize flags of clk_init_data in mmc-phase clock
commit
595144c1141c951a3c6bb9004ae6a2bc29aad66f upstream.
The flags element of clk_init_data was never initialized for mmc-
phase-clocks resulting in the element containing a random value
and thus possibly enabling unwanted clock flags.
Fixes: 89bf26cbc1a0 ("clk: rockchip: Add support for the mmc clock phases using the framework")
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Michal Suchanek [Mon, 13 Jun 2016 17:46:49 +0000 (17:46 +0000)]
spi: sun4i: fix FIFO limit
commit
6d9fe44bd73d567d04d3a68a2d2fa521ab9532f2 upstream.
When testing SPI without DMA I noticed that filling the FIFO on the
SPI controller causes a timeout.
Always leave room for one byte in the FIFO.
Signed-off-by: Michal Suchanek <hramrach@gmail.com>
Acked-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Michal Suchanek [Mon, 13 Jun 2016 17:46:49 +0000 (17:46 +0000)]
spi: sunxi: fix transfer timeout
commit
719bd6542044efd9b338a53dba1bef45f40ca169 upstream.
The transfer timeout is fixed at 1000 ms. Reading a 4Mbyte flash over
1MHz SPI bus takes way longer than that. Calculate the timeout from the
actual time the transfer is supposed to take and multiply by 2 for good
measure.
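For a sense of scale: 4 MiB is 33554432 bits, which at 1 MHz takes roughly 33.5 seconds, far beyond the fixed 1000 ms. The sketch below shows one way to derive a timeout from length and clock rate. The helper name xfer_timeout_ms(), the round-up, and the lower bound floor_ms are assumptions for this example rather than the driver's exact formula.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: estimate the transfer time in milliseconds from the
 * length and the bus clock, double it, and never go below 'floor_ms'.
 */
static uint64_t xfer_timeout_ms(uint64_t len_bytes, uint32_t speed_hz,
				uint32_t floor_ms)
{
	uint64_t bits, ms;

	assert(speed_hz > 0);
	bits = len_bytes * 8;
	ms = (bits * 1000 + speed_hz - 1) / speed_hz;	/* round up */
	ms *= 2;					/* safety margin */
	return ms < floor_ms ? floor_ms : ms;
}
```

The lower bound keeps very short transfers from getting a timeout of only a few milliseconds.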
Signed-off-by: Michal Suchanek <hramrach@gmail.com>
Acked-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tomeu Vizoso [Wed, 8 Jun 2016 07:32:51 +0000 (09:32 +0200)]
spi: rockchip: Signal unfinished DMA transfers
commit
4dc0dd83603f05dc3ae152af33ecb15104c313f3 upstream.
When using DMA, the transfer_one callback should return 1 because the
transfer hasn't finished yet.
A previous commit changed the function to return 0 when the DMA channels
were correctly prepared.
This manifested in Veyron boards with this message:
[ 1.983605] cros-ec-spi spi0.0: EC failed to respond in time
Fixes: ea9849113343 ("spi: rockchip: check return value of dmaengine_prep_slave_sg")
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Andrey Ulanov [Fri, 15 Apr 2016 21:24:41 +0000 (14:24 -0700)]
namespace: update event counter when umounting a deleted dentry
commit
e06b933e6ded42384164d28a2060b7f89243b895 upstream.
- m_start() in fs/namespace.c expects that ns->event is incremented each
time a mount is added to or removed from ns->list.
- umount_tree() removes items from the list but does not increment event
counter, expecting that it's done before the function is called.
- There are some codepaths that call umount_tree() without updating
"event" counter. e.g. from __detach_mounts().
- When this happens m_start() may reuse a cached mount structure that no
longer belongs to ns->list (i.e. a use after free, which usually leads
to an infinite loop).
This change fixes the above problem by incrementing global event counter
before invoking umount_tree().
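The role of the event counter described in the points above can be sketched with a much simplified model. The structures struct ns and struct iter_cache and both functions are hypothetical; the sketch only shows why the counter has to be bumped on every list change for a cached iterator position to be trusted.

```c
#include <assert.h>
#include <stddef.h>

struct ns {			/* hypothetical, much simplified namespace */
	unsigned long event;	/* bumped on every list change */
	int mounts[8];
	size_t count;
};

struct iter_cache {		/* what an iterator remembers between reads */
	unsigned long cached_event;
	size_t cached_index;
};

static void ns_remove_last(struct ns *ns)
{
	assert(ns->count > 0);
	ns->event++;		/* invalidate cached positions first */
	ns->count--;
}

/* Reuse the cached position only if the list has not changed since. */
static size_t iter_start(const struct ns *ns, struct iter_cache *c)
{
	if (c->cached_event != ns->event) {
		c->cached_event = ns->event;
		c->cached_index = 0;	/* stale cache: restart from the head */
	}
	return c->cached_index;
}
```

If ns_remove_last() skipped the increment, iter_start() would keep returning an index that may no longer refer to a live entry.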
Change-Id: I622c8e84dcb9fb63542372c5dbf0178ee86bb589
Signed-off-by: Andrey Ulanov <andreyu@google.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Colin Ian King [Mon, 20 Jun 2016 14:40:27 +0000 (15:40 +0100)]
devpts: fix null pointer dereference on failed memory allocation
commit
5353ed8deedee9e5acb9f896e9032158f5d998de upstream.
An ENOMEM when creating a pair tty in tty_ldisc_setup causes a null
pointer dereference in devpts_kill_index because tty->link->driver_data
is NULL. The oops was triggered with the pty stressor in stress-ng when
in a low memory condition.
tty_init_dev tries to clean up a tty_ldisc_setup ENOMEM error by calling
release_tty, however, this ultimately tries to clean up the NULL pair'd
tty in pty_unix98_remove, triggering the Oops.
Add a check to pty_unix98_remove to only clean up fsi if it is not NULL.
Oops:
[ 23.020961] Oops: 0000 [#1] SMP
[ 23.020976] Modules linked in: ppdev snd_hda_codec_generic snd_hda_intel snd_hda_codec parport_pc snd_hda_core snd_hwdep parport snd_pcm input_leds joydev snd_timer serio_raw snd soundcore i2c_piix4 mac_hid ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel qxl aes_x86_64 ttm lrw gf128mul glue_helper ablk_helper drm_kms_helper cryptd syscopyarea sysfillrect psmouse sysimgblt floppy fb_sys_fops drm pata_acpi jitterentropy_rng drbg ansi_cprng
[ 23.020978] CPU: 0 PID: 1452 Comm: stress-ng-pty Not tainted 4.7.0-rc4+ #2
[ 23.020978] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[ 23.020979] task: ffff88007ba30000 ti: ffff880078ea8000 task.ti: ffff880078ea8000
[ 23.020981] RIP: 0010:[<ffffffff813f11ff>] [<ffffffff813f11ff>] ida_remove+0x1f/0x120
[ 23.020981] RSP: 0018:ffff880078eabb60 EFLAGS: 00010a03
[ 23.020982] RAX: 4444444444444567 RBX: 0000000000000000 RCX: 000000000000001f
[ 23.020982] RDX: 000000000000014c RSI: 000000000000026f RDI: 0000000000000000
[ 23.020982] RBP: ffff880078eabb70 R08: 0000000000000004 R09: 0000000000000036
[ 23.020983] R10: 000000000000026f R11: 0000000000000000 R12: 000000000000026f
[ 23.020983] R13: 000000000000026f R14: ffff88007c944b40 R15: 000000000000026f
[ 23.020984] FS: 00007f9a2f3cc700(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
[ 23.020984] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 23.020985] CR2: 0000000000000010 CR3: 000000006c81b000 CR4: 00000000001406f0
[ 23.020988] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 23.020988] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 23.020988] Stack:
[ 23.020989] 0000000000000000 000000000000026f ffff880078eabb90 ffffffff812a5a99
[ 23.020990] 0000000000000000 00000000fffffff4 ffff880078eabba8 ffffffff814f9cbe
[ 23.020991] ffff88007965c800 ffff880078eabbc8 ffffffff814eef43 fffffffffffffff4
[ 23.020991] Call Trace:
[ 23.021000] [<ffffffff812a5a99>] devpts_kill_index+0x29/0x50
[ 23.021002] [<ffffffff814f9cbe>] pty_unix98_remove+0x2e/0x50
[ 23.021006] [<ffffffff814eef43>] release_tty+0xb3/0x1b0
[ 23.021007] [<ffffffff814f18d4>] tty_init_dev+0xd4/0x1c0
[ 23.021011] [<ffffffff814f9fae>] ptmx_open+0xae/0x190
[ 23.021013] [<ffffffff812254ef>] chrdev_open+0xbf/0x1b0
[ 23.021015] [<ffffffff8121d973>] do_dentry_open+0x203/0x310
[ 23.021016] [<ffffffff81225430>] ? cdev_put+0x30/0x30
[ 23.021017] [<ffffffff8121ee44>] vfs_open+0x54/0x80
[ 23.021018] [<ffffffff8122b8fc>] ? may_open+0x8c/0x100
[ 23.021019] [<ffffffff8122f26b>] path_openat+0x2eb/0x1440
[ 23.021020] [<ffffffff81230534>] ? putname+0x54/0x60
[ 23.021022] [<ffffffff814f6f97>] ? n_tty_ioctl_helper+0x27/0x100
[ 23.021023] [<ffffffff81231651>] do_filp_open+0x91/0x100
[ 23.021024] [<ffffffff81230596>] ? getname_flags+0x56/0x1f0
[ 23.021026] [<ffffffff8123fc66>] ? __alloc_fd+0x46/0x190
[ 23.021027] [<ffffffff8121f1e4>] do_sys_open+0x124/0x210
[ 23.021028] [<ffffffff8121f2ee>] SyS_open+0x1e/0x20
[ 23.021035] [<ffffffff81845576>] entry_SYSCALL_64_fastpath+0x1e/0xa8
[ 23.021044] Code: 63 28 45 31 e4 eb dd 0f 1f 44 00 00 55 4c 63 d6 48 ba 89 88 88 88 88 88 88 88 4c 89 d0 b9 1f 00 00 00 48 f7 e2 48 89 e5 41 54 53 <8b> 47 10 48 89 fb 8d 3c c5 00 00 00 00 48 c1 ea 09 b8 01 00 00
[ 23.021045] RIP [<ffffffff813f11ff>] ida_remove+0x1f/0x120
[ 23.021045] RSP <ffff880078eabb60>
[ 23.021046] CR2: 0000000000000010
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Rafael J. Wysocki [Tue, 28 Jun 2016 01:29:29 +0000 (03:29 +0200)]
cpufreq: Avoid false-positive WARN_ON()s in cpufreq_update_policy()
commit
742c87bf27d3b715820da6f8a81d6357adbf18f8 upstream.
CPU notifications from the firmware coming in when cpufreq is
suspended cause cpufreq_update_current_freq() to return 0 which
triggers the WARN_ON() in cpufreq_update_policy() for no reason.
Avoid that by checking cpufreq_suspended before calling
cpufreq_update_current_freq().
Fixes: c9d9c929e674 (cpufreq: Abort cpufreq_update_current_freq() for cpufreq_suspended set)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Miklos Szeredi [Wed, 29 Jun 2016 08:54:23 +0000 (10:54 +0200)]
9p: use file_dentry()
commit
b403f0e37a11f84f7ceaf40b0075499e5bcfd220 upstream.
v9fs may be used as lower layer of overlayfs and accessing f_path.dentry
can lead to a crash. In this case it's a NULL pointer dereference in
p9_fid_create().
Fix by replacing direct access of file->f_path.dentry with the
file_dentry() accessor, which will always return a native object.
Reported-by: Alessio Igor Bogani <alessioigorbogani@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Tested-by: Alessio Igor Bogani <alessioigorbogani@gmail.com>
Fixes: 4bacc9c9234c ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay")
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Vegard Nossum [Fri, 15 Jul 2016 04:22:07 +0000 (00:22 -0400)]
ext4: verify extent header depth
commit
7bc9491645118c9461bd21099c31755ff6783593 upstream.
Although an extent tree depth of 5 should be enough for the worst
case of 2**32 extents of length 1, the extent tree code does not
currently merge nodes which are less than half-full with a sibling
node, or shrink the tree depth if possible. So it's possible, at
least in theory, for the tree depth to be greater than 5. However,
even in the worst case, a tree depth of 32 is highly unlikely, and if
the file system is maliciously corrupted, an insanely large eh_depth
can cause memory allocation failures that will trigger kernel warnings
(here, eh_depth = 65280):
JBD2: ext4.exe wants too many credits credits:195849 rsv_credits:0 max:256
------------[ cut here ]------------
WARNING: CPU: 0 PID: 50 at fs/jbd2/transaction.c:293 start_this_handle+0x569/0x580
CPU: 0 PID: 50 Comm: ext4.exe Not tainted 4.7.0-rc5+ #508
Stack:
604a8947 625badd8 0002fd09 00000000
60078643 00000000 62623910 601bf9bc
62623970 6002fc84 626239b0 900000125
Call Trace:
[<6001c2dc>] show_stack+0xdc/0x1a0
[<601bf9bc>] dump_stack+0x2a/0x2e
[<6002fc84>] __warn+0x114/0x140
[<6002fdff>] warn_slowpath_null+0x1f/0x30
[<60165829>] start_this_handle+0x569/0x580
[<60165d4e>] jbd2__journal_start+0x11e/0x220
[<60146690>] __ext4_journal_start_sb+0x60/0xa0
[<60120a81>] ext4_truncate+0x131/0x3a0
[<60123677>] ext4_setattr+0x757/0x840
[<600d5d0f>] notify_change+0x16f/0x2a0
[<600b2b16>] do_truncate+0x76/0xc0
[<600c3e56>] path_openat+0x806/0x1300
[<600c55c9>] do_filp_open+0x89/0xf0
[<600b4074>] do_sys_open+0x134/0x1e0
[<600b4140>] SyS_open+0x20/0x30
[<6001ea68>] handle_syscall+0x88/0x90
[<600295fd>] userspace+0x3fd/0x500
[<6001ac55>] fork_handler+0x85/0x90
---[ end trace 08b0b88b6387a244 ]---
[ Commit message modified and the extent tree depth check changed
from 5 to 32 -- tytso ]
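A minimal sketch of such a sanity check, using the limit of 32 mentioned in the note above. The function name check_extent_depth() and the MAX_EXTENT_DEPTH macro are hypothetical; the real check lives in the ext4 extent header validation.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_EXTENT_DEPTH 32	/* limit stated in the note above */

static_assert(MAX_EXTENT_DEPTH < UINT16_MAX, "limit must fit a 16-bit depth");

/* Return 0 if the on-disk depth is plausible, -1 if it must be rejected. */
static int check_extent_depth(uint16_t eh_depth)
{
	return eh_depth > MAX_EXTENT_DEPTH ? -1 : 0;
}
```

With this check, the corrupted value from the log (eh_depth = 65280) is rejected before any credits are computed from it.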
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jeff Mahoney [Tue, 5 Jul 2016 21:32:30 +0000 (17:32 -0400)]
ecryptfs: don't allow mmap when the lower fs doesn't support it
commit
f0fe970df3838c202ef6c07a4c2b36838ef0a88b upstream.
There are legitimate reasons to disallow mmap on certain files, notably
in sysfs or procfs. We shouldn't emulate mmap support on file systems
that don't offer support natively.
CVE-2016-1583
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
[tyhicks: clean up f_op check by using ecryptfs_file_to_lower()]
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jeff Mahoney [Tue, 5 Jul 2016 21:32:29 +0000 (17:32 -0400)]
Revert "ecryptfs: forbid opening files without mmap handler"
commit
78c4e172412de5d0456dc00d2b34050aa0b683b5 upstream.
This reverts commit 2f36db71009304b3f0b95afacd8eba1f9f046b87.
It fixed a local root exploit but also introduced a dependency on
the lower file system implementing an mmap operation just to open a file,
which is a bit of a heavy hammer. The right fix is to have mmap depend
on the existence of the mmap handler instead.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Miklos Szeredi [Fri, 1 Jul 2016 12:56:07 +0000 (14:56 +0200)]
locks: use file_inode()
commit
6343a2120862f7023006c8091ad95c1f16a32077 upstream.
(Another one for the f_path debacle.)
ltp fcntl33 testcase caused an Oops in selinux_file_send_sigiotask.
The reason is that generic_add_lease() used filp->f_path.dentry->inode
while all the others use file_inode(). This makes a difference for files
opened on overlayfs since the former will point to the overlay inode and
the latter to the underlying inode.
So generic_add_lease() added the lease to the overlay inode and
generic_delete_lease() removed it from the underlying inode. When the file
was released the lease remained on the overlay inode's lock list, resulting
in use after free.
Reported-by: Eryu Guan <eguan@redhat.com>
Fixes: 4bacc9c9234c ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Rhyland Klein [Thu, 9 Jun 2016 21:28:39 +0000 (17:28 -0400)]
power_supply: power_supply_read_temp only if use_cnt > 0
commit
5bc28b93a36e3cb3acc2870fb75cb6ffb182fece upstream.
Change power_supply_read_temp() to use power_supply_get_property()
so that it will check the use_cnt and ensure it is > 0. The use_cnt
will be incremented at the end of __power_supply_register, so this
will block the case where get_property can be called before the supply
is fully registered. This fixes the issue shown in the stack below:
[ 1.452598] power_supply_read_temp+0x78/0x80
[ 1.458680] thermal_zone_get_temp+0x5c/0x11c
[ 1.464765] thermal_zone_device_update+0x34/0xb4
[ 1.471195] thermal_zone_device_register+0x87c/0x8cc
[ 1.477974] __power_supply_register+0x364/0x424
[ 1.484317] power_supply_register_no_ws+0x10/0x18
[ 1.490833] bq27xxx_battery_setup+0x10c/0x164
[ 1.497003] bq27xxx_battery_i2c_probe+0xd0/0x1b0
[ 1.503435] i2c_device_probe+0x174/0x240
[ 1.509172] driver_probe_device+0x1fc/0x29c
[ 1.515167] __driver_attach+0xa4/0xa8
[ 1.520643] bus_for_each_dev+0x58/0x98
[ 1.526204] driver_attach+0x20/0x28
[ 1.531505] bus_add_driver+0x1c8/0x22c
[ 1.537067] driver_register+0x68/0x108
[ 1.542630] i2c_register_driver+0x38/0x7c
[ 1.548457] bq27xxx_battery_i2c_driver_init+0x18/0x20
[ 1.555321] do_one_initcall+0x38/0x12c
[ 1.560886] kernel_init_freeable+0x148/0x1ec
[ 1.566972] kernel_init+0x10/0xfc
[ 1.572101] ret_from_fork+0x10/0x40
Also make the same change to ps_get_max_charge_cntl_limit() and
ps_get_cur_chrage_cntl_limit() to be safe. Lastly, change the return
value of power_supply_get_property() to -EAGAIN from -ENODEV if
use_cnt <= 0.
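The guard can be sketched as follows. struct supply and supply_get_temp() are hypothetical simplifications; the only point is that a read attempted before registration has completed returns -EAGAIN instead of touching the supply.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct supply {		/* hypothetical, simplified power supply */
	int use_cnt;	/* raised once registration has completed */
	int temp;
};

/* Refuse property reads until the supply is fully registered. */
static int supply_get_temp(const struct supply *psy, int *val)
{
	assert(psy != NULL && val != NULL);
	if (psy->use_cnt <= 0)
		return -EAGAIN;
	*val = psy->temp;
	return 0;
}
```

Returning -EAGAIN tells the caller that the value is not available yet and that a later retry may succeed.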
Fixes: 297d716f6260 ("power_supply: Change ownership from driver to core")
Signed-off-by: Rhyland Klein <rklein@nvidia.com>
Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Sebastian Reichel <sre@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Daniel Bristot de Oliveira [Wed, 22 Jun 2016 20:28:41 +0000 (17:28 -0300)]
cgroup: Disable IRQs while holding css_set_lock
commit
82d6489d0fed2ec8a8c48c19e8d8a04ac8e5bb26 upstream.
While testing the deadline scheduler + cgroup setup I hit this
warning.
[ 132.612935] ------------[ cut here ]------------
[ 132.612951] WARNING: CPU: 5 PID: 0 at kernel/softirq.c:150 __local_bh_enable_ip+0x6b/0x80
[ 132.612952] Modules linked in: (a ton of modules...)
[ 132.612981] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 4.7.0-rc2 #2
[ 132.612981] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.2-20150714_191134- 04/01/2014
[ 132.612982] 0000000000000086 45c8bb5effdd088b ffff88013fd43da0 ffffffff813d229e
[ 132.612984] 0000000000000000 0000000000000000 ffff88013fd43de0 ffffffff810a652b
[ 132.612985] 00000096811387b5 0000000000000200 ffff8800bab29d80 ffff880034c54c00
[ 132.612986] Call Trace:
[ 132.612987] <IRQ> [<ffffffff813d229e>] dump_stack+0x63/0x85
[ 132.612994] [<ffffffff810a652b>] __warn+0xcb/0xf0
[ 132.612997] [<ffffffff810e76a0>] ? push_dl_task.part.32+0x170/0x170
[ 132.612999] [<ffffffff810a665d>] warn_slowpath_null+0x1d/0x20
[ 132.613000] [<ffffffff810aba5b>] __local_bh_enable_ip+0x6b/0x80
[ 132.613008] [<ffffffff817d6c8a>] _raw_write_unlock_bh+0x1a/0x20
[ 132.613010] [<ffffffff817d6c9e>] _raw_spin_unlock_bh+0xe/0x10
[ 132.613015] [<ffffffff811388ac>] put_css_set+0x5c/0x60
[ 132.613016] [<ffffffff8113dc7f>] cgroup_free+0x7f/0xa0
[ 132.613017] [<ffffffff810a3912>] __put_task_struct+0x42/0x140
[ 132.613018] [<ffffffff810e776a>] dl_task_timer+0xca/0x250
[ 132.613027] [<ffffffff810e76a0>] ? push_dl_task.part.32+0x170/0x170
[ 132.613030] [<ffffffff8111371e>] __hrtimer_run_queues+0xee/0x270
[ 132.613031] [<ffffffff81113ec8>] hrtimer_interrupt+0xa8/0x190
[ 132.613034] [<ffffffff81051a58>] local_apic_timer_interrupt+0x38/0x60
[ 132.613035] [<ffffffff817d9b0d>] smp_apic_timer_interrupt+0x3d/0x50
[ 132.613037] [<ffffffff817d7c5c>] apic_timer_interrupt+0x8c/0xa0
[ 132.613038] <EOI> [<ffffffff81063466>] ? native_safe_halt+0x6/0x10
[ 132.613043] [<ffffffff81037a4e>] default_idle+0x1e/0xd0
[ 132.613044] [<ffffffff810381cf>] arch_cpu_idle+0xf/0x20
[ 132.613046] [<ffffffff810e8fda>] default_idle_call+0x2a/0x40
[ 132.613047] [<ffffffff810e92d7>] cpu_startup_entry+0x2e7/0x340
[ 132.613048] [<ffffffff81050235>] start_secondary+0x155/0x190
[ 132.613049] ---[ end trace f91934d162ce9977 ]---
The warning is caused by spin_(lock|unlock)_bh(&css_set_lock) being used
in interrupt context. Convert the spin_lock_bh() calls to
spin_lock_irq(save) to avoid this problem, and other problems of sharing
a spinlock with an interrupt.
Cc: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Juri Lelli <juri.lelli@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: cgroups@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: "Luis Claudio R. Goncalves" <lgoncalv@redhat.com>
Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Acked-by: Zefan Li <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tejun Heo [Thu, 26 May 2016 19:42:13 +0000 (15:42 -0400)]
cgroup: set css->id to -1 during init
commit
8fa3b8d689a54d6d04ff7803c724fb7aca6ce98e upstream.
If percpu_ref initialization fails during css_create(), the free path
can end up trying to free css->id of zero. As ID 0 is unused, it
doesn't cause a critical breakage but it does trigger a warning
message. Fix it by setting css->id to -1 from init_and_link_css().
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Wenwei Tao <ww.tao0320@gmail.com>
Fixes: 01e586598b22 ("cgroup: release css->id after css_free")
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Wenwei Tao [Fri, 13 May 2016 14:59:20 +0000 (22:59 +0800)]
cgroup: remove redundant cleanup in css_create
commit
b00c52dae6d9ee8d0f2407118ef6544ae5524781 upstream.
When creating a css fails, we remove the css id and exit the percpu_ref
before calling css_free_rcu_fn, but both are done again in
css_free_work_fn, so they are redundant. Removing the css id twice is
especially problematic, since it may be assigned to another css after
the first removal.
tj: This was broken by two commits updating the free path without
synchronizing the creation failure path. This can be easily
triggered by trying to create more than 64k memory cgroups.
Signed-off-by: Wenwei Tao <ww.tao0320@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Fixes: 9a1049da9bd2 ("percpu-refcount: require percpu_ref to be exited explicitly")
Fixes: 01e586598b22 ("cgroup: release css->id after css_free")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Alexander Shiyan [Wed, 1 Jun 2016 19:21:53 +0000 (22:21 +0300)]
pinctrl: imx: Do not treat a PIN without MUX register as an error
commit
ba562d5e54fd3136bfea0457add3675850247774 upstream.
Some PINs do not have a MUX register; this is not an error.
It is necessary to allow the continuation of the PINs configuration,
otherwise the whole PIN-group will be configured incorrectly.
Signed-off-by: Alexander Shiyan <shc_work@mail.ru>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tony Lindgren [Tue, 31 May 2016 21:17:06 +0000 (14:17 -0700)]
pinctrl: single: Fix missing flush of posted write for a wakeirq
commit
0ac3c0a4025f41748a083bdd4970cb3ede802b15 upstream.
With many repeated suspend resume cycles, the pin specific wakeirq
may not always work on omaps. This is because the write to enable the
pin interrupt may not have reached the device over the interconnect
before suspend happens.
Let's fix the issue with a flush of posted write with a readback.
Reported-by: Nishanth Menon <nm@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Minfei Huang [Fri, 27 May 2016 06:17:10 +0000 (14:17 +0800)]
pvclock: Add CPU barriers to get correct version value
commit
749d088b8e7f4b9826ede02b9a043e417fa84aa1 upstream.
Protocol for the "version" fields is: hypervisor raises it (making it
uneven) before it starts updating the fields and raises it again (making
it even) when it is done. Thus the guest can make sure the time values
it got are consistent by checking the version before and after reading
them.
Add CPU barriers after getting the version value, just as
vread_pvclock() does, because all of the callees in this function are inline.
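The version protocol can be sketched in user space with C11 atomics. This is an analogue under stated assumptions, not the kernel's pvclock code: struct time_info and both functions are hypothetical, and C11 fences stand in for the kernel's own barrier primitives. The reader accepts a value only if the version was even and unchanged across the data read.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct time_info {		/* hypothetical shared time record */
	atomic_uint version;	/* odd while the writer is updating */
	_Atomic uint64_t system_time;
};

/* Writer: make the version odd, update the data, make it even again. */
static void time_info_write(struct time_info *ti, uint64_t now)
{
	unsigned int v = atomic_load_explicit(&ti->version, memory_order_relaxed);

	atomic_store_explicit(&ti->version, v + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&ti->system_time, now, memory_order_relaxed);
	atomic_store_explicit(&ti->version, v + 2, memory_order_release);
}

/* Reader: one attempt; returns false if an update was in progress. */
static bool time_info_try_read(struct time_info *ti, uint64_t *out)
{
	unsigned int v1, v2;
	uint64_t t;

	assert(out != NULL);
	v1 = atomic_load_explicit(&ti->version, memory_order_acquire);
	t = atomic_load_explicit(&ti->system_time, memory_order_relaxed);
	/* Keep the data read ordered before the version re-check. */
	atomic_thread_fence(memory_order_acquire);
	v2 = atomic_load_explicit(&ti->version, memory_order_relaxed);
	if ((v1 & 1) || v1 != v2)
		return false;
	*out = t;
	return true;
}
```

A caller would normally loop on time_info_try_read() until it returns true. Without the ordering around the two version loads, the data read could be reordered outside the window they are meant to protect.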
Fixes: 502dfeff239e8313bfbe906ca0a1a6827ac8481b
Signed-off-by: Minfei Huang <mnghuan@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Michael Welling [Wed, 20 Jul 2016 17:02:07 +0000 (10:02 -0700)]
Input: tsc200x - report proper input_dev name
commit
e9003c9cfaa17d26991688268b04244adb67ee2b upstream.
Pass the input_id struct to the common probe function for the tsc200x drivers
instead of just the bustype.
This allows for the use of the product variable to set the input_dev->name
variable according to the type of touchscreen used. Note that when we
introduced support for TSC2004 we started calling everything TSC200X, so
let's keep this quirk.
Signed-off-by: Michael Welling <mwelling@ieee.org>
Acked-by: Pavel Machek <pavel@ucw.cz>
Acked-by: Pali Rohár <pali.rohar@gmail.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Andrew Duggan [Wed, 20 Jul 2016 00:53:59 +0000 (17:53 -0700)]
Input: synaptics-rmi4 - fix maximum size check for F12 control register 8
commit
e4add7b6beaff4061693d0632bc1dcb306edba10 upstream.
According to the RMI4 spec, the maximum size of F12 control register 8 is
15 bytes. The current code incorrectly reports an error if control 8 is
greater than 14 bytes, making sensors with a 15-byte control register 8
unusable.
Signed-off-by: Andrew Duggan <aduggan@synaptics.com>
Reported-by: Chris Healy <cphealy@gmail.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Dmitry Torokhov [Thu, 14 Jul 2016 16:33:41 +0000 (09:33 -0700)]
Revert "Input: wacom_w8001 - drop use of ABS_MT_TOOL_TYPE"
commit
3e9161bfe0482f26efeaf584d5fd69398c69313c upstream.
This reverts commit 5f7e5445a2de848c66d2d80ba5479197e8287c33 because
removal of input_mt_report_slot_state() means we no longer generate
tracking IDs for the reported contacts.
Acked-by: Peter Hutterer <peter.hutterer@who-t.net>
Acked-by: Ping Cheng <pinglinux@gmail.com>
Cameron Gutman [Wed, 29 Jun 2016 16:51:35 +0000 (09:51 -0700)]
Input: xpad - validate USB endpoint count during probe
commit
caca925fca4fb30c67be88cacbe908eec6721e43 upstream.
This prevents a malicious USB device from causing an oops.
Signed-off-by: Cameron Gutman <aicommander@gmail.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Ping Cheng [Thu, 23 Jun 2016 17:55:11 +0000 (10:55 -0700)]
Input: wacom_w8001 - ignore invalid pen data packets
commit
9e72ac7492149a229ce9039c680849cb682d7092 upstream.
ThinkPad X60 Tablet PC (pen only device) sometimes posts
packets that are larger than W8001_PKTLEN_TPCPEN.
Reported-by: Chris J Arges <christopherarges@gmail.com>
Tested-by: Chris J Arges <christopherarges@gmail.com>
Signed-off-by: Ping Cheng <pingc@wacom.com>
Reviewed-by: Peter Hutterer <peter.hutterer@who-t.net>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Ping Cheng [Thu, 23 Jun 2016 17:54:17 +0000 (10:54 -0700)]
Input: wacom_w8001 - w8001_MAX_LENGTH should be 13
commit
12afb34400eb2b301f06b2aa3535497d14faee59 upstream.
Somehow the patch that added two-finger touch support forgot to update
W8001_MAX_LENGTH from 11 to 13.
Signed-off-by: Ping Cheng <pingc@wacom.com>
Reviewed-by: Peter Hutterer <peter.hutterer@who-t.net>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cameron Gutman [Thu, 23 Jun 2016 17:24:42 +0000 (10:24 -0700)]
Input: xpad - fix oops when attaching an unknown Xbox One gamepad
commit
c7f1429389ec1aa25e042bb13451385fbb596f8c upstream.
Xbox One controllers have multiple interfaces which all have the
same class, subclass, and protocol. One of these interfaces
has only a single endpoint. When Xpad attempts to bind to this
interface, it causes an oops when trying to initialize the output URB
by trying to access the second endpoint's descriptor.
This situation was avoided for known Xbox One devices by checking
the XTYPE constant associated with the VID and PID tuple. However,
this breaks when new or previously unknown Xbox One controllers
are attached to the system.
This change addresses the problem by deriving the XTYPE for Xbox
One controllers based on the interface protocol before checking
the interface number.
Fixes: 1a48ff81b391 ("Input: xpad - add support for Xbox One controllers")
Signed-off-by: Cameron Gutman <aicommander@gmail.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Dmitry Torokhov [Tue, 21 Jun 2016 23:09:00 +0000 (16:09 -0700)]
Input: elantech - add more IC body types to the list
commit
226ba707744a51acb4244724e09caacb1d96aed9 upstream.
The touchpad in HP Pavilion 14-ab057ca reports its version as 12 and
according to Elan both 11 and 12 are valid IC types and should be
identified as hw_version 4.
Reported-by: Patrick Lessard <Patrick.Lessard@cogeco.com>
Tested-by: Patrick Lessard <Patrick.Lessard@cogeco.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sinclair Yeh [Fri, 24 Jun 2016 00:37:34 +0000 (17:37 -0700)]
Input: vmmouse - remove port reservation
commit
60842ef8128e7bf58c024814cd0dc14319232b6c upstream.
The VMWare EFI BIOS will expose port 0x5658 as an ACPI resource. This
causes the port to be reserved by the ACPI module as the system comes up,
making it unavailable to be reserved again by other drivers, thus
preserving this VMWare port for special use in a VMWare guest.
This port is designed to be shared among multiple VMWare services, such as
the VMMOUSE. Because of this, VMMOUSE should not try to reserve this port
on its own.
The VMWare non-EFI BIOS does not do this, to preserve compatibility with
existing/legacy VMs. It is known that there is a small chance a VM may be
configured such that these ports get reserved by other non-VMWare devices,
and if this ever happens, the result is undefined.
Signed-off-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Kangjie Lu [Tue, 3 May 2016 20:44:32 +0000 (16:44 -0400)]
ALSA: timer: Fix leak in events via snd_timer_user_tinterrupt
commit
e4ec8cc8039a7063e24204299b462bd1383184a5 upstream.
The stack object “r1” has a total size of 32 bytes. Its fields
“event” and “val” are each followed by 4 bytes of padding. These 8
padding bytes are sent to user space without being initialized.
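The usual way to avoid this kind of leak is to zero the whole object before filling in its members. A minimal sketch, assuming a hypothetical struct demo_tread whose layout only resembles the one described (on common 64-bit ABIs each 4-byte member is followed by 4 bytes of padding); it is not the ALSA structure itself:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical record used only for illustration. */
struct demo_tread {
    int32_t  event;     /* typically followed by 4 bytes of padding */
    int64_t  tv_sec;
    int64_t  tv_nsec;
    uint32_t val;       /* typically followed by 4 bytes of tail padding */
};

/* Clear every byte, padding included, before setting the members. */
static void demo_fill_tread(struct demo_tread *r, int32_t event, uint32_t val)
{
    memset(r, 0, sizeof(*r));
    r->event = event;
    r->val = val;
}
```

Assigning the members one by one without the memset() would leave whatever was previously on the stack in the padding bytes, and copying sizeof(*r) bytes to user space would then expose it.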
Signed-off-by: Kangjie Lu <kjlu@gatech.edu>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Kangjie Lu [Tue, 3 May 2016 20:44:20 +0000 (16:44 -0400)]
ALSA: timer: Fix leak in events via snd_timer_user_ccallback
commit
9a47e9cff994f37f7f0dbd9ae23740d0f64f9fe6 upstream.
The stack object “r1” has a total size of 32 bytes. Its fields
“event” and “val” are each followed by 4 bytes of padding. These 8
padding bytes are sent to user space without being initialized.
Signed-off-by: Kangjie Lu <kjlu@gatech.edu>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Kangjie Lu [Tue, 3 May 2016 20:44:07 +0000 (16:44 -0400)]
ALSA: timer: Fix leak in SNDRV_TIMER_IOCTL_PARAMS
commit
cec8f96e49d9be372fdb0c3836dcf31ec71e457e upstream.
The stack object “tread” has a total size of 32 bytes. Its fields
“event” and “val” are each followed by 4 bytes of padding. These 8
padding bytes are sent to user space without being initialized.
Signed-off-by: Kangjie Lu <kjlu@gatech.edu>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bob Liu [Tue, 7 Jun 2016 14:43:15 +0000 (10:43 -0400)]
xen-blkfront: don't call talk_to_blkback when already connected to blkback
commit
efd1535270c1deb0487527bf0c3c827301a69c93 upstream.
Sometimes blkfront may twice receive blkback_changed() notification
(XenbusStateConnected) after migration, which will cause
talk_to_blkback() to be called twice too and confuse xen-blkback.
The flow is as follows:
blkfront                                  blkback
blkfront_resume()
 > talk_to_blkback()
  > Set blkfront to XenbusStateInitialised
                                          front changed()
                                           > Connect()
                                            > Set blkback to XenbusStateConnected
blkback_changed()
 > Skip talk_to_blkback()
   because frontstate == XenbusStateInitialised
 > blkfront_connect()
   > Set blkfront to XenbusStateConnected
-----
And here we get another XenbusStateConnected notification leading
to:
-----
blkback_changed()
 > because now frontstate != XenbusStateInitialised
   talk_to_blkback() is also called again
  > blkfront state changed from
    XenbusStateConnected to XenbusStateInitialised
    (Which is not correct!)
                                          front_changed():
                                           > Do nothing because blkback
                                             already in XenbusStateConnected
Now blkback is in XenbusStateConnected but blkfront is still
in XenbusStateInitialised - leading to no disks.
Poking of the XenbusStateConnected state is allowed (to deal with
block disk change) and has to be dealt with. The most likely
cause of this bug are custom udev scripts hooking up the disks
and then validating the size.
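The guard named in the subject line can be sketched as a simple state check. This is a minimal sketch under the assumption that the frontend state is the only input; enum demo_front_state and demo_should_talk_to_back() are hypothetical names, not the driver's code.

```c
#include <stdbool.h>

/* Hypothetical subset of frontend states, for illustration only. */
enum demo_front_state {
    DEMO_INITIALISING,
    DEMO_INITIALISED,
    DEMO_CONNECTED
};

/*
 * On a "backend connected" notification, renegotiate only if the frontend
 * has neither started the handshake nor already completed it.
 */
static bool demo_should_talk_to_back(enum demo_front_state front)
{
    return front != DEMO_INITIALISED && front != DEMO_CONNECTED;
}
```

With such a check, a repeated XenbusStateConnected notification no longer drags a connected frontend back to XenbusStateInitialised.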
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bob Liu [Tue, 31 May 2016 08:59:17 +0000 (16:59 +0800)]
xen-blkfront: fix resume issues after a migration
commit
2a6f71ad99cabe436e70c3f5fcf58072cb3bc07f upstream.
After a migration to another host (which may not have multiqueue
support), the number of rings (block hardware queues)
may change and the ring info structure will also be reallocated.
This patch fixes two related bugs:
* Call blk_mq_update_nr_hw_queues() to let blk-core know that the number
of hardware queues has changed.
* Don't store the rinfo pointer in hctx->driver_data, because rinfo may be
reallocated; use hctx->queue_num to get the rinfo structure instead.
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jan Beulich [Thu, 7 Jul 2016 07:32:04 +0000 (01:32 -0600)]
xenbus: don't bail early from xenbus_dev_request_and_reply()
commit
7469be95a487319514adce2304ad2af3553d2fc9 upstream.
xenbus_dev_request_and_reply() needs to track whether a transaction is
open. For XS_TRANSACTION_START messages it calls transaction_start()
and for XS_TRANSACTION_END messages it calls transaction_end().
If sending an XS_TRANSACTION_START message fails or responds with an
error, the transaction is not open and transaction_end() must be
called.
If sending an XS_TRANSACTION_END message fails, the transaction is
still open, but if an error response is returned the transaction is
closed.
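The rules above can be written down as a small decision function. This is a minimal sketch of the bookkeeping only; the enums and demo_must_end_transaction() are hypothetical and do not correspond to identifiers in the xenbus code.

```c
#include <stdbool.h>

enum demo_msg_type { DEMO_TRANSACTION_START, DEMO_TRANSACTION_END, DEMO_OTHER };

enum demo_outcome {
    DEMO_SEND_FAILED,   /* the request never reached xenstore */
    DEMO_ERROR_REPLY,   /* xenstore answered with an error */
    DEMO_OK_REPLY       /* xenstore answered successfully */
};

/* Returns true when the transaction must be accounted as closed. */
static bool demo_must_end_transaction(enum demo_msg_type type,
                                      enum demo_outcome outcome)
{
    switch (type) {
    case DEMO_TRANSACTION_START:
        /* Counted as open up front; undo that unless the start succeeded. */
        return outcome != DEMO_OK_REPLY;
    case DEMO_TRANSACTION_END:
        /* Any reply, even an error, closes it; a failed send does not. */
        return outcome != DEMO_SEND_FAILED;
    default:
        return false;
    }
}
```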
Commit 027bd7e89906 ("xen/xenbus: Avoid synchronous wait on XenBus
stalling shutdown/restart") introduced a regression where failed
XS_TRANSACTION_START messages were leaving the transaction open. This
can cause problems with suspend (and migration) as all transactions
must be closed before suspending.
It appears that the problematic change was added accidentally, so just
remove it.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jan Beulich [Thu, 7 Jul 2016 07:23:57 +0000 (01:23 -0600)]
xenbus: don't BUG() on user mode induced condition
commit
0beef634b86a1350c31da5fcc2992f0d7c8a622b upstream.
Inability to locate a user mode specified transaction ID should not
lead to a kernel crash. For messages other than XS_TRANSACTION_START,
also don't issue anything to xenbus if the specified ID doesn't match
that of any active transaction.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bob Liu [Mon, 27 Jun 2016 08:33:24 +0000 (16:33 +0800)]
xen-blkfront: save uncompleted reqs in blkfront_resume()
commit
7b427a59538a98161321aa46c13f4ea81b43f4eb upstream.
Uncompleted reqs used to be 'saved and resubmitted' in blkfront_recover() during
migration, but that's too late after multi-queue was introduced.
After a migration to another host (which may not have multiqueue support), the
number of rings (block hardware queues) may change and the ring and shadow
structures will also be reallocated.
blkfront_recover() then can't 'save and resubmit' the real
uncompleted reqs because the shadow structure has been reallocated.
This patch fixes this issue by moving the 'save' logic out of
blkfront_recover() to an earlier place in blkfront_resume().
The 'resubmit' is not changed and is still in blkfront_recover().
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Andrey Grodzovsky [Tue, 21 Jun 2016 18:26:36 +0000 (14:26 -0400)]
xen/pciback: Fix conf_space read/write overlap check.
commit
02ef871ecac290919ea0c783d05da7eedeffc10e upstream.
The current overlap check evaluates to false when a filter
field is fully contained in (a proper subset of) a r/w request. This
change applies the classical overlap check instead, to cover all
scenarios.
More specifically, for the Hilscher GmbH CIFX 50E-DP(M/S) device driver
the logic is such that the entire confspace is read and written in 4
byte chunks. In this case, as an example, CACHE_LINE_SIZE,
LATENCY_TIMER and PCI_BIST arrive together in one call to
xen_pcibk_config_write() with offset == 0xc and size == 4. With the
existing overlap check the LATENCY_TIMER field (offset == 0xd, length
== 1) is fully contained in the write request and hence is excluded
from the write, which is incorrect.
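The worked example above can be checked with a few lines of C. This is a minimal sketch: demo_endpoint_check() is an assumed form of an endpoint-containment test that shows the described failure (the exact old expression is not quoted in this message), and demo_ranges_overlap() is the classical half-open interval overlap test. Both names are hypothetical.

```c
#include <stdbool.h>

/*
 * Assumed shape of the old test: true only if one END of the request
 * falls inside the field. It misses a field lying strictly inside the request.
 */
static bool demo_endpoint_check(unsigned req_start, unsigned req_size,
                                unsigned field_start, unsigned field_size)
{
    unsigned req_end = req_start + req_size;
    unsigned field_end = field_start + field_size;

    return (req_start >= field_start && req_start < field_end) ||
           (req_end > field_start && req_end <= field_end);
}

/* Classical overlap of [req_start, req_end) and [field_start, field_end). */
static bool demo_ranges_overlap(unsigned req_start, unsigned req_size,
                                unsigned field_start, unsigned field_size)
{
    unsigned req_end = req_start + req_size;
    unsigned field_end = field_start + field_size;

    return req_end > field_start && field_end > req_start;
}
```

For the request at offset 0xc with size 4 and the LATENCY_TIMER field at offset 0xd with length 1, the endpoint test yields false while the classical test yields true.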
Signed-off-by: Andrey Grodzovsky <andrey2805@gmail.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Jan Beulich <JBeulich@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Vineet Gupta [Tue, 28 Jun 2016 04:12:25 +0000 (09:42 +0530)]
ARC: unwind: ensure that .debug_frame is generated (vs. .eh_frame)
commit
f52e126cc7476196f44f3c313b7d9f0699a881fc upstream.
With recent binutils update to support dwarf CFI pseudo-ops in gas, we
now get .eh_frame vs. .debug_frame. Although the call frame info is
exactly the same in both, the CIE differs, which the current kernel
unwinder can't cope with.
This broke both the kernel unwinder and loadable modules (the latter
because of a new, unhandled relocation R_ARC_32_PCREL from .rela.eh_frame
in the module loader).
The ideal solution would be to switch the unwinder to .eh_frame.
For now, however, we can make do by just ensuring .debug_frame is
generated by removing -fasynchronous-unwind-tables:
.eh_frame generated with -gdwarf-2 -fasynchronous-unwind-tables
.debug_frame generated with -gdwarf-2
Fixes STAR 9001058196
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Alexey Brodkin [Thu, 23 Jun 2016 08:00:39 +0000 (11:00 +0300)]
arc: unwind: warn only once if DW2_UNWIND is disabled
commit
9bd54517ee86cb164c734f72ea95aeba4804f10b upstream.
If CONFIG_ARC_DW2_UNWIND is disabled, the following message is printed
to the debug console every time arc_unwind_core() is called:
----------------->8---------------
CONFIG_ARC_DW2_UNWIND needs to be enabled
----------------->8---------------
That message makes sense if the user indeed wants to see a backtrace or
get nice function call graphs in perf, but what if the user disabled
the unwinder on purpose? Why pollute the debug console?
So instead, warn the user about the possibly missing feature once and
let them decide whether that was what they really wanted.
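The warn-once behaviour can be sketched in user space with a static flag. This is a minimal sketch, not the ARC code: demo_unwind_stub() is a hypothetical name, and in the kernel the pr_warn_once() macro provides this pattern.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Stub standing in for an unwinder that was configured out.
 * Returns 1 if the warning was printed by this call, 0 otherwise.
 */
static int demo_unwind_stub(void)
{
    static bool warned;     /* persists across calls */

    if (warned)
        return 0;

    warned = true;
    fprintf(stderr, "CONFIG_ARC_DW2_UNWIND needs to be enabled\n");
    return 1;
}
```

Only the first call prints; later calls stay silent, so a user who disabled the unwinder deliberately sees the hint once rather than on every backtrace attempt.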
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Josh Poimboeuf [Mon, 13 Jun 2016 07:32:09 +0000 (02:32 -0500)]
sched/debug: Fix deadlock when enabling sched events
commit
eda8dca519269c92a0771668b3d5678792de7b78 upstream.
I see a hang when enabling sched events:
echo 1 > /sys/kernel/debug/tracing/events/sched/enable
The printk buffer shows:
BUG: spinlock recursion on CPU#1, swapper/1/0
lock: 0xffff88007d5d8c00, .magic: dead4ead, .owner: swapper/1/0, .owner_cpu: 1
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.7.0-rc2+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
...
Call Trace:
<IRQ> [<ffffffff8143d663>] dump_stack+0x85/0xc2
[<ffffffff81115948>] spin_dump+0x78/0xc0
[<ffffffff81115aea>] do_raw_spin_lock+0x11a/0x150
[<ffffffff81891471>] _raw_spin_lock+0x61/0x80
[<ffffffff810e5466>] ? try_to_wake_up+0x256/0x4e0
[<ffffffff810e5466>] try_to_wake_up+0x256/0x4e0
[<ffffffff81891a0a>] ? _raw_spin_unlock_irqrestore+0x4a/0x80
[<ffffffff810e5705>] wake_up_process+0x15/0x20
[<ffffffff810cebb4>] insert_work+0x84/0xc0
[<ffffffff810ced7f>] __queue_work+0x18f/0x660
[<ffffffff810cf9a6>] queue_work_on+0x46/0x90
[<ffffffffa00cd95b>] drm_fb_helper_dirty.isra.11+0xcb/0xe0 [drm_kms_helper]
[<ffffffffa00cdac0>] drm_fb_helper_sys_imageblit+0x30/0x40 [drm_kms_helper]
[<ffffffff814babcd>] soft_cursor+0x1ad/0x230
[<ffffffff814ba379>] bit_cursor+0x649/0x680
[<ffffffff814b9d30>] ? update_attr.isra.2+0x90/0x90
[<ffffffff814b5e6a>] fbcon_cursor+0x14a/0x1c0
[<ffffffff81555ef8>] hide_cursor+0x28/0x90
[<ffffffff81558b6f>] vt_console_print+0x3bf/0x3f0
[<ffffffff81122c63>] call_console_drivers.constprop.24+0x183/0x200
[<ffffffff811241f4>] console_unlock+0x3d4/0x610
[<ffffffff811247f5>] vprintk_emit+0x3c5/0x610
[<ffffffff81124bc9>] vprintk_default+0x29/0x40
[<ffffffff811e965b>] printk+0x57/0x73
[<ffffffff810f7a9e>] enqueue_entity+0xc2e/0xc70
[<ffffffff810f7b39>] enqueue_task_fair+0x59/0xab0
[<ffffffff8106dcd9>] ? kvm_sched_clock_read+0x9/0x20
[<ffffffff8103fb39>] ? sched_clock+0x9/0x10
[<ffffffff810e3fcc>] activate_task+0x5c/0xa0
[<ffffffff810e4514>] ttwu_do_activate+0x54/0xb0
[<ffffffff810e5cea>] sched_ttwu_pending+0x7a/0xb0
[<ffffffff810e5e51>] scheduler_ipi+0x61/0x170
[<ffffffff81059e7f>] smp_trace_reschedule_interrupt+0x4f/0x2a0
[<ffffffff81893ba6>] trace_reschedule_interrupt+0x96/0xa0
<EOI> [<ffffffff8106e0d6>] ? native_safe_halt+0x6/0x10
[<ffffffff8110fb1d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffff81040ac0>] default_idle+0x20/0x1a0
[<ffffffff8104147f>] arch_cpu_idle+0xf/0x20
[<ffffffff81102f8f>] default_idle_call+0x2f/0x50
[<ffffffff8110332e>] cpu_startup_entry+0x37e/0x450
[<ffffffff8105af70>] start_secondary+0x160/0x1a0
Note the hang only occurs when echoing the above from a physical serial
console, not from an ssh session.
The bug is caused by a deadlock where the task is trying to grab the rq
lock twice because printk() calls aren't safe in sched code.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: cb2517653fcc ("sched/debug: Make schedstats a runtime tunable that is disabled by default")
Link: http://lkml.kernel.org/r/20160613073209.gdvdybiruljbkn3p@treble
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Andrey Ryabinin [Thu, 9 Jun 2016 12:20:05 +0000 (15:20 +0300)]
kernel/sysrq, watchdog, sched/core: Reset watchdog on all CPUs while processing sysrq-w
commit
57675cb976eff977aefb428e68e4e0236d48a9ff upstream.
Lengthy output of sysrq-w may take a lot of time on a slow serial console.
Currently we reset the NMI watchdog on the current CPU to avoid spurious
lockup messages. Sometimes this doesn't work since the softlockup watchdog
might trigger on another CPU which is waiting for an IPI to proceed.
We reset softlockup watchdogs on all CPUs, but we do this only after
listing all tasks, and this may be too late on a busy system.
So, reset the watchdogs on all CPUs earlier, in the for_each_process_thread() loop.
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1465474805-14641-1-git-send-email-aryabinin@virtuozzo.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jiri Slaby [Wed, 20 Jul 2016 22:45:08 +0000 (15:45 -0700)]
pps: do not crash when failed to register
commit
368301f2fe4b07e5fb71dba3cc566bc59eb6705f upstream.
With this command sequence:
modprobe plip
modprobe pps_parport
rmmod pps_parport
the pps_parport module causes this crash:
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: parport_detach+0x1d/0x60 [pps_parport]
Oops: 0000 [#1] SMP
...
Call Trace:
parport_unregister_driver+0x65/0xc0 [parport]
SyS_delete_module+0x187/0x210
The sequence that builds up to this is:
1) plip is loaded and takes the parport device for exclusive use:
plip0: Parallel port at 0x378, using IRQ 7.
2) pps_parport then fails to grab the device:
pps_parport: parallel port PPS client
parport0: cannot grant exclusive access for device pps_parport
pps_parport: couldn't register with parport0
3) rmmod of pps_parport is then killed because it tries to access
pardev->name, but pardev (taken from port->cad) is NULL.
So add a check for NULL in the test there too.
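The added NULL test can be sketched as follows. This is a minimal user-space sketch with hypothetical types (struct demo_pardevice, struct demo_port) and a hypothetical helper; it only illustrates checking the pointer before comparing the name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the parport structures. */
struct demo_pardevice { const char *name; };
struct demo_port { struct demo_pardevice *cad; };

/* True only if a device is attached to the port AND it carries our name. */
static bool demo_port_is_ours(const struct demo_port *port, const char *our_name)
{
    const struct demo_pardevice *pardev = port->cad;

    /* Registration may have failed, leaving no device attached. */
    if (pardev == NULL)
        return false;

    return strcmp(pardev->name, our_name) == 0;
}
```

The detach path would return early when this helper yields false, instead of dereferencing a NULL pardev.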
Link: http://lkml.kernel.org/r/20160714115245.12651-1-jslaby@suse.cz
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Acked-by: Rodolfo Giometti <giometti@enneenne.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Andrey Ryabinin [Wed, 20 Jul 2016 22:45:00 +0000 (15:45 -0700)]
radix-tree: fix radix_tree_iter_retry() for tagged iterators.
commit
3cb9185c67304b2a7ea9be73e7d13df6fb2793a1 upstream.
radix_tree_iter_retry() resets slot to NULL, but it doesn't reset tags.
A NULL slot and non-zero iter.tags are then passed to
radix_tree_next_slot(), leading to a crash:
RIP: radix_tree_next_slot include/linux/radix-tree.h:473
find_get_pages_tag+0x334/0x930 mm/filemap.c:1452
....
Call Trace:
pagevec_lookup_tag+0x3a/0x80 mm/swap.c:960
mpage_prepare_extent_to_map+0x321/0xa90 fs/ext4/inode.c:2516
ext4_writepages+0x10be/0x2b20 fs/ext4/inode.c:2736
do_writepages+0x97/0x100 mm/page-writeback.c:2364
__filemap_fdatawrite_range+0x248/0x2e0 mm/filemap.c:300
filemap_write_and_wait_range+0x121/0x1b0 mm/filemap.c:490
ext4_sync_file+0x34d/0xdb0 fs/ext4/fsync.c:115
vfs_fsync_range+0x10a/0x250 fs/sync.c:195
vfs_fsync fs/sync.c:209
do_fsync+0x42/0x70 fs/sync.c:219
SYSC_fdatasync fs/sync.c:232
SyS_fdatasync+0x19/0x20 fs/sync.c:230
entry_SYSCALL_64_fastpath+0x23/0xc1 arch/x86/entry/entry_64.S:207
We must reset the iterator's tags to bail out from radix_tree_next_slot()
and go to the slow path in radix_tree_next_chunk().
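The interaction can be illustrated with a toy iterator. This is a heavily simplified sketch under the assumption that the fast path is taken whenever tag bits remain; struct demo_iter, demo_iter_retry() and demo_next_slot() are hypothetical and much smaller than the real radix-tree helpers.

```c
#include <stddef.h>

/* Toy iterator: a bitmask of remaining tagged slots in the current chunk. */
struct demo_iter { unsigned long tags; };

/* Retry: forget the current position, including any pending tag bits. */
static void **demo_iter_retry(struct demo_iter *iter)
{
    iter->tags = 0;         /* the reset this patch adds */
    return NULL;
}

/* Fast path advances within the chunk; NULL sends the caller to the slow path. */
static void **demo_next_slot(void **slot, struct demo_iter *iter)
{
    if (iter->tags) {
        /* Fast path: assumes slot points into a valid chunk. */
        iter->tags >>= 1;
        return slot + 1;
    }
    return NULL;
}
```

If demo_iter_retry() left tags non-zero, the next call would take the fast path with a NULL slot, which is the crash pattern described above.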
Fixes: 46437f9a554f ("radix-tree: fix race in gang lookup")
Link: http://lkml.kernel.org/r/1468495196-10604-1-git-send-email-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Johannes Weiner [Wed, 20 Jul 2016 22:44:57 +0000 (15:44 -0700)]
mm: memcontrol: fix cgroup creation failure after many small jobs
commit
73f576c04b9410ed19660f74f97521bee6e1c546 upstream.
The memory controller has quite a bit of state that usually outlives the
cgroup and pins its CSS until said state disappears. At the same time
it imposes a 16-bit limit on the CSS ID space to economically store IDs
in the wild. Consequently, when we use cgroups to contain frequent but
small and short-lived jobs that leave behind some page cache, we quickly
run into the 64k limitations of outstanding CSSs. Creating a new cgroup
fails with -ENOSPC while there are only a few, or even no user-visible
cgroups in existence.
Although pinning CSSs past cgroup removal is common, there are only two
instances that actually need an ID after a cgroup is deleted: cache
shadow entries and swapout records.
Cache shadow entries reference the ID weakly and can deal with the CSS
having disappeared when it's looked up later. They pose no hurdle.
Swap-out records do need to pin the css to hierarchically attribute
swapins after the cgroup has been deleted; though the only pages that
remain swapped out after offlining are tmpfs/shmem pages. And those
references are under the user's control, so they are manageable.
This patch introduces a private 16-bit memcg ID and switches swap and
cache shadow entries over to using that. This ID can then be recycled
after offlining when the CSS remains pinned only by objects that don't
specifically need it.
This script demonstrates the problem by faulting one cache page in a new
cgroup and deleting it again:
set -e
mkdir -p pages
for x in `seq 128000`; do
[ $((x % 1000)) -eq 0 ] && echo $x
mkdir /cgroup/foo
echo $$ >/cgroup/foo/cgroup.procs
echo trex >pages/$x
echo $$ >/cgroup/cgroup.procs
rmdir /cgroup/foo
done
When run on an unpatched kernel, we eventually run out of possible IDs
even though there are no visible cgroups:
[root@ham ~]# ./cssidstress.sh
[...]
65000
mkdir: cannot create directory '/cgroup/foo': No space left on device
After this patch, the IDs get released upon cgroup destruction and the
cache and css objects get released once memory reclaim kicks in.
[hannes@cmpxchg.org: init the IDR]
Link: http://lkml.kernel.org/r/20160621154601.GA22431@cmpxchg.org
Fixes: b2052564e66d ("mm: memcontrol: continue cache reclaim from offlined groups")
Link: http://lkml.kernel.org/r/20160617162516.GD19084@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: John Garcia <john.garcia@mesosphere.io>
Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Hugh Dickins [Thu, 14 Jul 2016 19:07:38 +0000 (12:07 -0700)]
mm: thp: refix false positive BUG in page_move_anon_rmap()
commit
5a49973d7143ebbabd76e1dcd69ee42e349bb7b9 upstream.
The VM_BUG_ON_PAGE in page_move_anon_rmap() is more trouble than it's
worth: the syzkaller fuzzer hit it again. It's still wrong for some THP
cases, because linear_page_index() was never intended to apply to
addresses before the start of a vma.
That's easily fixed with a signed long cast inside linear_page_index();
and Dmitry has tested such a patch, to verify the false positive. But
why extend linear_page_index() just for this case? when the avoidance in
page_move_anon_rmap() has already grown ugly, and there's no reason for
the check at all (nothing else there is using address or index).
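The effect of the signed cast mentioned above can be shown with plain arithmetic. This is a minimal sketch assuming 4 KiB pages and a simplified index formula; the helper names are hypothetical, and the signed variant relies on the arithmetic right shift and wrapping conversion that GCC and Clang provide (both are implementation-defined in ISO C).

```c
#define DEMO_PAGE_SHIFT 12

/* Unsigned arithmetic: an address below vm_start wraps to a huge index. */
static unsigned long demo_index_unsigned(unsigned long address,
                                         unsigned long vm_start,
                                         unsigned long vm_pgoff)
{
    return ((address - vm_start) >> DEMO_PAGE_SHIFT) + vm_pgoff;
}

/* Signed cast before the shift: the same address yields a negative offset. */
static long demo_index_signed(unsigned long address,
                              unsigned long vm_start,
                              unsigned long vm_pgoff)
{
    return ((long)(address - vm_start) >> DEMO_PAGE_SHIFT) + (long)vm_pgoff;
}
```

For an address two pages before vm_start, the signed variant gives -2 while the unsigned variant gives a large positive value, which is what made the old check a false positive for such addresses.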
Remove address arg from page_move_anon_rmap(), remove VM_BUG_ON_PAGE,
remove CONFIG_DEBUG_VM PageTransHuge adjustment.
And one more thing: should the compound_head(page) be done inside or
outside page_move_anon_rmap()? It's usually pushed down to the lowest
level nowadays (and mm/memory.c shows no other explicit use of it), so I
think it's better done in page_move_anon_rmap() than by caller.
Fixes: 0798d3c022dc ("mm: thp: avoid false positive VM_BUG_ON_PAGE in page_move_anon_rmap()")
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1607120444540.12528@eggly.anvils
Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Dmitry Vyukov [Thu, 14 Jul 2016 19:07:29 +0000 (12:07 -0700)]
vmlinux.lds: account for destructor sections
commit
e41f501d391265ff568f3e49d6128cc30856a36f upstream.
If CONFIG_KASAN is enabled and gcc is configured with
--disable-initfini-array and/or the gold linker is used, gcc emits
.ctors/.dtors and .text.startup/.text.exit sections instead of
.init_array/.fini_array. The .dtors section is not explicitly accounted
for in the linker script and messes up the vvar/percpu layout.
We want:
ffffffff822bfd80 D _edata
ffffffff822c0000 D __vvar_beginning_hack
ffffffff822c0000 A __vvar_page
ffffffff822c0080 0000000000000098 D vsyscall_gtod_data
ffffffff822c1000 A __init_begin
ffffffff822c1000 D init_per_cpu__irq_stack_union
ffffffff822c1000 A __per_cpu_load
ffffffff822d3000 D init_per_cpu__gdt_page
We got:
ffffffff8279a600 D _edata
ffffffff8279b000 A __vvar_page
ffffffff8279c000 A __init_begin
ffffffff8279c000 D init_per_cpu__irq_stack_union
ffffffff8279c000 A __per_cpu_load
ffffffff8279e000 D __vvar_beginning_hack
ffffffff8279e080 0000000000000098 D vsyscall_gtod_data
ffffffff827ae000 D init_per_cpu__gdt_page
This happens because __vvar_page and .vvar get different addresses in
arch/x86/kernel/vmlinux.lds.S:
. = ALIGN(PAGE_SIZE);
__vvar_page = .;
.vvar : AT(ADDR(.vvar) - LOAD_OFFSET) {
/* work around gold bug 13023 */
__vvar_beginning_hack = .;
Discard .dtors/.fini_array/.text.exit, since we don't call dtors.
Merge .text.startup into init text.
Link: http://lkml.kernel.org/r/1467386363-120030-1-git-send-email-dvyukov@google.com
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Mel Gorman [Thu, 14 Jul 2016 19:07:23 +0000 (12:07 -0700)]
mm, meminit: ensure node is online before checking whether pages are uninitialised
commit
ef70b6f41cda6270165a6f27b2548ed31cfa3cb2 upstream.
early_page_uninitialised looks up an arbitrary PFN. While a machine
without node 0 will boot with "mm, page_alloc: Always return a valid
node from early_pfn_to_nid", it works because it assumes that nodes are
always in PFN order. This is not guaranteed so this patch adds
robustness by always checking if the node being checked is online.
Link: http://lkml.kernel.org/r/1468008031-3848-4-git-send-email-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Mel Gorman [Thu, 14 Jul 2016 19:07:20 +0000 (12:07 -0700)]
mm, meminit: always return a valid node from early_pfn_to_nid
commit
e4568d3803852d00effd41dcdd489e726b998879 upstream.
early_pfn_to_nid can return node 0 if a PFN is invalid on machines that
have no node 0. A machine with only node 1 was observed to crash with
the following message:
BUG: unable to handle kernel paging request at 000000000002a3c8
PGD 0
Modules linked in:
Hardware name: Supermicro H8DSP-8/H8DSP-8, BIOS 080011 06/30/2006
task: ffffffff81c0d500 ti: ffffffff81c00000 task.ti: ffffffff81c00000
RIP: reserve_bootmem_region+0x6a/0xef
CR2: 000000000002a3c8 CR3: 0000000001c06000 CR4: 00000000000006b0
Call Trace:
free_all_bootmem+0x4b/0x12a
mem_init+0x70/0xa3
start_kernel+0x25b/0x49b
The problem is that early_page_uninitialised uses the early_pfn_to_nid
helper which returns node 0 for invalid PFNs. No caller of
early_pfn_to_nid cares except early_page_uninitialised. This patch has
early_pfn_to_nid always return a valid node.
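The fallback described above can be sketched in userspace as follows. This is only an illustrative model, not the kernel code: the node table, the PFN range table and the helper names (lookup_nid, first_online, pfn_to_valid_nid) are hypothetical stand-ins chosen for this example.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 4

/* Hypothetical model: only node 1 is online, as on the reported machine. */
static const bool node_online_map[MAX_NODES] = { false, true, false, false };

struct pfn_range { unsigned long start, end; int nid; };

/* Hypothetical memory map: a single PFN range that belongs to node 1. */
static const struct pfn_range ranges[] = { { 0x1000, 0x2000, 1 } };

/* Return the lowest-numbered online node, or -1 if no node is online. */
static int first_online(void)
{
    for (int nid = 0; nid < MAX_NODES; nid++)
        if (node_online_map[nid])
            return nid;
    return -1;
}

/* Raw lookup: returns -1 when the PFN is not covered by any range. */
static int lookup_nid(unsigned long pfn)
{
    for (size_t i = 0; i < sizeof(ranges) / sizeof(ranges[0]); i++)
        if (pfn >= ranges[i].start && pfn < ranges[i].end)
            return ranges[i].nid;
    return -1;
}

/* Fall back to an online node instead of reporting node 0 for invalid PFNs. */
static int pfn_to_valid_nid(unsigned long pfn)
{
    int nid = lookup_nid(pfn);

    return nid < 0 ? first_online() : nid;
}
```

In this model an invalid PFN resolves to node 1 rather than to the nonexistent node 0, so a caller never dereferences per-node data for an offline node.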
Link: http://lkml.kernel.org/r/1468008031-3848-3-git-send-email-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Mauro Carvalho Chehab [Thu, 14 Jul 2016 19:07:15 +0000 (12:07 -0700)]
uapi: export lirc.h header
commit 12cb22bb8ae9aff9d72a9c0a234f26d641b20eb6 upstream.
This header contains the userspace API for lirc.
This is a fixup for commit b7be755733dc ("[media] bz#75751: Move
internal header file lirc.h to uapi/"). It moved the header to the
right place, but forgot to add it to Kbuild. So, despite being in
uapi, it is not copied to the right place.
Fixes: b7be755733dc44c72 ("[media] bz#75751: Move internal header file lirc.h to uapi/")
Link: http://lkml.kernel.org/r/320c765d32bfc82c582e336d52ffe1026c73c644.1468439021.git.mchehab@s-opensource.com
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Cc: Alec Leamas <leamas.alec@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
David Rientjes [Thu, 14 Jul 2016 19:06:50 +0000 (12:06 -0700)]
mm, compaction: prevent VM_BUG_ON when terminating freeing scanner
commit a46cbf3bc53b6a93fb84a5ffb288c354fa807954 upstream.
It's possible to isolate some freepages in a pageblock and then fail
split_free_page() due to the low watermark check. In this case, we hit
VM_BUG_ON() because the freeing scanner terminated early without a
contended lock or enough freepages.
This should never have been a VM_BUG_ON() since it's not a fatal
condition. It should have been a VM_WARN_ON() at best, or even handled
gracefully.
Regardless, we need to terminate anytime the full pageblock scan was not
done. The logic belongs in isolate_freepages_block(), so handle its
state gracefully by terminating the pageblock loop and making a note to
restart at the same pageblock next time since it was not possible to
complete the scan this time.
[rientjes@google.com: don't rescan pages in a pageblock]
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1607111244150.83138@chino.kir.corp.google.com
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1606291436300.145590@chino.kir.corp.google.com
Signed-off-by: David Rientjes <rientjes@google.com>
Reported-by: Minchan Kim <minchan@kernel.org>
Tested-by: Minchan Kim <minchan@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Torsten Hilbrich [Fri, 24 Jun 2016 21:50:18 +0000 (14:50 -0700)]
fs/nilfs2: fix potential underflow in call to crc32_le
commit 63d2f95d63396059200c391ca87161897b99e74a upstream.
The value `bytes' comes from the filesystem which is about to be
mounted. We cannot trust that the value is always in the range we
expect it to be.
Check its value before using it to calculate the length for the crc32_le
call. Its value must be greater than or equal to sumoff + 4.
This fixes a kernel bug when accidentally mounting an image file which
had the nilfs2 magic value 0x3434 at the right offset 0x406 by chance.
The bytes 0x01 0x00 were stored at 0x408 and were interpreted as an
s_bytes value of 1. This caused an underflow when subtracting sumoff +
4 (20) in the call to crc32_le.
BUG: unable to handle kernel paging request at ffff88021e600000
IP: crc32_le+0x36/0x100
...
Call Trace:
nilfs_valid_sb.part.5+0x52/0x60 [nilfs2]
nilfs_load_super_block+0x142/0x300 [nilfs2]
init_nilfs+0x60/0x390 [nilfs2]
nilfs_mount+0x302/0x520 [nilfs2]
mount_fs+0x38/0x160
vfs_kern_mount+0x67/0x110
do_mount+0x269/0xe00
SyS_mount+0x9f/0x100
entry_SYSCALL_64_fastpath+0x16/0x71
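The underflow can be reproduced with plain unsigned arithmetic. The sketch below is a userspace illustration, not the nilfs2 code: the helper names are hypothetical, the on-disk size is modelled as a 16-bit value, and the constant 20 is the "sumoff + 4" figure quoted above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Offset of the checksum field plus its 4 bytes; 20 in the report above. */
#define SUMOFF_PLUS_CRC 20u

/* Length computed with no validation: wraps around when bytes < 20. */
static size_t crc_tail_len_unchecked(uint16_t bytes)
{
    return (size_t)bytes - SUMOFF_PLUS_CRC;
}

/* Validated variant: reject sizes too small to contain the checksum field. */
static bool crc_tail_len_checked(uint16_t bytes, size_t *len)
{
    if (bytes < SUMOFF_PLUS_CRC)
        return false;
    *len = (size_t)bytes - SUMOFF_PLUS_CRC;
    return true;
}
```

With bytes == 1 the unchecked length wraps to a value close to SIZE_MAX, which is why the checksum routine ran far past the end of the buffer; the checked variant refuses the superblock instead.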
Link: http://lkml.kernel.org/r/1466778587-5184-2-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Torsten Hilbrich <torsten.hilbrich@secunet.com>
Tested-by: Torsten Hilbrich <torsten.hilbrich@secunet.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
David Rientjes [Fri, 24 Jun 2016 21:50:10 +0000 (14:50 -0700)]
mm, compaction: abort free scanner if split fails
commit a4f04f2c6955aff5e2c08dcb40aca247ff4d7370 upstream.
If the memory compaction free scanner cannot successfully split a free
page (only possible due to per-zone low watermark), terminate the free
scanner rather than continuing to scan memory needlessly. If the
watermark is insufficient for a free page of order <= cc->order, then
terminate the scanner since all future splits will also likely fail.
This prevents the compaction freeing scanner from scanning all memory on
very large zones (very noticeable for zones > 128GB, for instance) when
all splits will likely fail while holding zone->lock.
compaction_alloc() iterating a 128GB zone has been benchmarked to take
over 400ms on some systems whereas any free page isolated and ready to
be split ends up failing in split_free_page() because of the low
watermark check and thus the iteration continues.
The next time compaction occurs, the freeing scanner will likely start
at the end of the zone again since no success was made previously and we
get the same lengthy iteration until the zone is brought above the low
watermark. All thp page faults can take >400ms in such a state without
this fix.
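The effect of aborting on the first failed split can be illustrated with a toy model. Everything here is hypothetical (struct zone_model, try_split, free_scan); it only counts how many candidates a scanner visits and does not mirror the real compaction data structures.

```c
#include <stdbool.h>
#include <stddef.h>

struct zone_model {
    size_t nr_free_candidates; /* free pages the scanner would visit */
    bool below_low_watermark;  /* when true, every split attempt fails */
};

/* Hypothetical split: fails whenever the zone is below its low watermark. */
static bool try_split(const struct zone_model *z)
{
    return !z->below_low_watermark;
}

/*
 * Visit candidates until 'wanted' pages are isolated. With abort_on_fail
 * set, stop at the first failed split instead of walking the whole zone.
 * Returns the number of candidates visited.
 */
static size_t free_scan(const struct zone_model *z, size_t wanted,
                        bool abort_on_fail)
{
    size_t visited = 0, isolated = 0;

    while (visited < z->nr_free_candidates && isolated < wanted) {
        visited++;
        if (try_split(z))
            isolated++;
        else if (abort_on_fail)
            break;
    }
    return visited;
}
```

In this model a zone below its watermark costs one visit with the abort and a full walk without it, which is the difference the benchmark figures above describe.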
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1606211820350.97086@chino.kir.corp.google.com
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Lukasz Odzioba [Fri, 24 Jun 2016 21:50:01 +0000 (14:50 -0700)]
mm/swap.c: flush lru pvecs on compound page arrival
commit 8f182270dfec432e93fae14f9208a6b9af01009f upstream.
Currently we can have compound pages held on per cpu pagevecs, which
leads to a lot of memory being unavailable for reclaim when needed. On
systems with hundreds of processors it can be GBs of memory.
One way of reproducing the problem is to not call munmap
explicitly on all mapped regions (i.e. after receiving SIGTERM). After
that some pages (with THP enabled also huge pages) may end up on
lru_add_pvec, example below.
#include <string.h>
#include <sys/mman.h>

int main(void) {
#pragma omp parallel
    {
        size_t size = 55 * 1000 * 1000; // smaller than MEM/CPUS
        void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p != MAP_FAILED)
            memset(p, 0, size);
        //munmap(p, size); // uncomment to make the problem go away
    }
    return 0;
}
When we run it with THP enabled it will leave a significant amount of
memory on lru_add_pvec. This memory will not be reclaimed if we hit
OOM, so when we run the above program in a loop:
for i in `seq 100`; do ./a.out; done
many processes (95% in my case) will be killed by OOM.
The primary point of the LRU add cache is to save the zone lru_lock
contention with a hope that more pages will belong to the same zone and
so their addition can be batched. The huge page is already a form of
batched addition (it will add 512 pages worth of memory in one go) so
skipping the batching seems like a safer option when compared to a
potential excess in the caching which can be quite large and much harder
to fix because lru_add_drain_all is way too expensive and it is not
really clear what would be a good moment to call it.
Similarly we can reproduce the problem on lru_deactivate_pvec by adding:
madvise(p, size, MADV_FREE); after memset.
This patch flushes lru pvecs on compound page arrival, making the problem
less severe - after applying it the kill rate of the above example drops
to 0%, due to reducing the maximum amount of memory held on a pvec from
28MB (with THP) to 56kB per CPU.
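The 28MB and 56kB figures follow from simple arithmetic, sketched below. This is an illustration only: the capacity of 14 entries is an assumption inferred from 56kB / 4kB, the page sizes are the usual x86-64 values, and the helper names are hypothetical.

```c
#include <stdbool.h>

#define PVEC_SLOTS 14u          /* assumed capacity, implied by 56kB / 4kB */
#define BASE_PAGE  (4UL << 10)  /* 4kB base page */
#define HUGE_PAGE  (2UL << 20)  /* 2MB huge page, i.e. 512 base pages */

/* Decide whether to drain after an add; 'nr' counts pages after the add. */
static bool pvec_should_drain(unsigned int nr, bool compound)
{
    return nr == PVEC_SLOTS || compound;
}

/*
 * Upper bound on memory one per-CPU vector can hold: capacity times the
 * largest page that is allowed to stay cached in it.
 */
static unsigned long pvec_max_held(bool flush_on_compound)
{
    unsigned long largest_cached = flush_on_compound ? BASE_PAGE : HUGE_PAGE;

    return PVEC_SLOTS * largest_cached;
}
```

Multiplying the per-CPU bound by the CPU count shows why machines with hundreds of processors could have gigabytes parked outside the reach of reclaim.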
Suggested-by: Michal Hocko <mhocko@suse.com>
Link: http://lkml.kernel.org/r/1466180198-18854-1-git-send-email-lukasz.odzioba@intel.com
Signed-off-by: Lukasz Odzioba <lukasz.odzioba@intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Cc: Ming Li <mingli199x@qq.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tejun Heo [Fri, 24 Jun 2016 21:49:58 +0000 (14:49 -0700)]
memcg: css_alloc should return an ERR_PTR value on error
commit ea3a9645866e12d2b198434f03df3c3e96fb86ce upstream.
mem_cgroup_css_alloc() was returning NULL on failure while cgroup core
expected it to return an ERR_PTR value, leading to the following NULL
deref after a css allocation failure. Fix it by returning
ERR_PTR(-ENOMEM) instead. I'll also update cgroup core so that it
can handle NULL returns.
mkdir: page allocation failure: order:6, mode:0x240c0c0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO)
CPU: 0 PID: 8738 Comm: mkdir Not tainted 4.7.0-rc3+ #123
...
Call Trace:
dump_stack+0x68/0xa1
warn_alloc_failed+0xd6/0x130
__alloc_pages_nodemask+0x4c6/0xf20
alloc_pages_current+0x66/0xe0
alloc_kmem_pages+0x14/0x80
kmalloc_order_trace+0x2a/0x1a0
__kmalloc+0x291/0x310
memcg_update_all_caches+0x6c/0x130
mem_cgroup_css_alloc+0x590/0x610
cgroup_apply_control_enable+0x18b/0x370
cgroup_mkdir+0x1de/0x2e0
kernfs_iop_mkdir+0x55/0x80
vfs_mkdir+0xb9/0x150
SyS_mkdir+0x66/0xd0
do_syscall_64+0x53/0x120
entry_SYSCALL64_slow_path+0x25/0x25
...
BUG: unable to handle kernel NULL pointer dereference at 00000000000000d0
IP: init_and_link_css+0x37/0x220
PGD 34b1e067 PUD 3a109067 PMD 0
Oops: 0002 [#1] SMP
Modules linked in:
CPU: 0 PID: 8738 Comm: mkdir Not tainted 4.7.0-rc3+ #123
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.2-20160422_131301-anatol 04/01/2014
task: ffff88007cbc5200 ti: ffff8800666d4000 task.ti: ffff8800666d4000
RIP: 0010:[<ffffffff810f2ca7>] [<ffffffff810f2ca7>] init_and_link_css+0x37/0x220
RSP: 0018:ffff8800666d7d90 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffffffff810f2499 RSI: 0000000000000000 RDI: 0000000000000008
RBP: ffff8800666d7db8 R08: 0000000000000003 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: ffff88005a5fb400
R13: ffffffff81f0f8a0 R14: ffff88005a5fb400 R15: 0000000000000010
FS: 00007fc944689700(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f3aed0d2b80 CR3: 000000003a1e8000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
cgroup_apply_control_enable+0x1ac/0x370
cgroup_mkdir+0x1de/0x2e0
kernfs_iop_mkdir+0x55/0x80
vfs_mkdir+0xb9/0x150
SyS_mkdir+0x66/0xd0
do_syscall_64+0x53/0x120
entry_SYSCALL64_slow_path+0x25/0x25
Code: 89 f5 48 89 fb 49 89 d4 48 83 ec 08 8b 05 72 3b d8 00 85 c0 0f 85 60 01 00 00 4c 89 e7 e8 72 f7 ff ff 48 8d 7b 08 48 89 d9 31 c0 <48> c7 83 d0 00 00 00 00 00 00 00 48 83 e7 f8 48 29 f9 81 c1 d8
RIP init_and_link_css+0x37/0x220
RSP <ffff8800666d7d90>
CR2: 00000000000000d0
---[ end trace a2d8836ae1e852d1 ]---
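Why a NULL return slips past an error-pointer check can be shown with a small userspace model. The helpers below are simplified re-creations written for this example (err_ptr, ptr_err, is_err, create and the two allocators are hypothetical names), not the kernel definitions themselves.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace re-creations of the error-pointer helpers, illustration only. */
static void *err_ptr(long error) { return (void *)(intptr_t)error; }
static long ptr_err(const void *ptr) { return (long)(intptr_t)ptr; }

/* True only for addresses in the top MAX_ERRNO bytes of the address space. */
static bool is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical allocator callbacks that always fail, in two styles. */
static void *alloc_returns_null(void) { return NULL; }
static void *alloc_returns_err(void) { return err_ptr(-ENOMEM); }

/* Caller that only tests for error pointers, as the core code did. */
static int create(void *(*alloc)(void), void **out)
{
    void *obj = alloc();

    if (is_err(obj))
        return (int)ptr_err(obj);
    *out = obj; /* a NULL return slips through to here */
    return 0;
}
```

NULL is not in the error range, so the caller treats it as a valid object and dereferences it later, which matches the fault address of 0xd0 in the oops above being a small offset from zero.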
Link: http://lkml.kernel.org/r/20160621165740.GJ3262@mtj.duckdns.org
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tejun Heo [Fri, 24 Jun 2016 21:49:54 +0000 (14:49 -0700)]
memcg: mem_cgroup_migrate() may be called with irq disabled
commit d93c4130a7d049b234b5d5a15808eaf5406f2789 upstream.
mem_cgroup_migrate() uses local_irq_disable/enable() but can be called
with irq disabled from migrate_page_copy(). This ends up enabling irq
while holding an irq context lock, triggering the following lockdep
warning. Fix it by using irq_save/restore instead.
=================================
[ INFO: inconsistent lock state ]
4.7.0-rc1+ #52 Tainted: G W
---------------------------------
inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
kcompactd0/151 [HC0[0]:SC0[0]:HE1:SE1] takes:
(&(&ctx->completion_lock)->rlock){+.?.-.}, at: [<000000000038fd96>] aio_migratepage+0x156/0x1e8
{IN-SOFTIRQ-W} state was registered at:
__lock_acquire+0x5b6/0x1930
lock_acquire+0xee/0x270
_raw_spin_lock_irqsave+0x66/0xb0
aio_complete+0x98/0x328
dio_complete+0xe4/0x1e0
blk_update_request+0xd4/0x450
scsi_end_request+0x48/0x1c8
scsi_io_completion+0x272/0x698
blk_done_softirq+0xca/0xe8
__do_softirq+0xc8/0x518
irq_exit+0xee/0x110
do_IRQ+0x6a/0x88
io_int_handler+0x11a/0x25c
__mutex_unlock_slowpath+0x144/0x1d8
__mutex_unlock_slowpath+0x140/0x1d8
kernfs_iop_permission+0x64/0x80
__inode_permission+0x9e/0xf0
link_path_walk+0x6e/0x510
path_lookupat+0xc4/0x1a8
filename_lookup+0x9c/0x160
user_path_at_empty+0x5c/0x70
SyS_readlinkat+0x68/0x140
system_call+0xd6/0x270
irq event stamp: 971410
hardirqs last enabled at (971409): migrate_page_move_mapping+0x3ea/0x588
hardirqs last disabled at (971410): _raw_spin_lock_irqsave+0x3c/0xb0
softirqs last enabled at (970526): __do_softirq+0x460/0x518
softirqs last disabled at (970519): irq_exit+0xee/0x110
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&(&ctx->completion_lock)->rlock);
<Interrupt>
lock(&(&ctx->completion_lock)->rlock);
*** DEADLOCK ***
3 locks held by kcompactd0/151:
#0: (&(&mapping->private_lock)->rlock){+.+.-.}, at: aio_migratepage+0x42/0x1e8
#1: (&ctx->ring_lock){+.+.+.}, at: aio_migratepage+0x5a/0x1e8
#2: (&(&ctx->completion_lock)->rlock){+.?.-.}, at: aio_migratepage+0x156/0x1e8
stack backtrace:
CPU: 20 PID: 151 Comm: kcompactd0 Tainted: G W 4.7.0-rc1+ #52
Call Trace:
show_trace+0xea/0xf0
show_stack+0x72/0xf0
dump_stack+0x9a/0xd8
print_usage_bug.part.27+0x2d4/0x2e8
mark_lock+0x17e/0x758
mark_held_locks+0xa2/0xd0
trace_hardirqs_on_caller+0x140/0x1c0
mem_cgroup_migrate+0x266/0x370
aio_migratepage+0x16a/0x1e8
move_to_new_page+0xb0/0x260
migrate_pages+0x8f4/0x9f0
compact_zone+0x4dc/0xdc8
kcompactd_do_work+0x1aa/0x358
kcompactd+0xba/0x2c8
kthread+0x10a/0x110
kernel_thread_starter+0x6/0xc
kernel_thread_starter+0x0/0xc
INFO: lockdep is turned off.
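The difference between the two patterns can be modelled with a single boolean standing in for the CPU's interrupt-enable state. This is a userspace simulation for illustration; all names (irqs_enabled, sim_irq_*, section_*) are hypothetical and no real interrupt state is touched.

```c
#include <stdbool.h>

/* Simulated interrupt-enable flag; true means interrupts are on. */
static bool irqs_enabled = true;

static void sim_irq_disable(void) { irqs_enabled = false; }
static void sim_irq_enable(void)  { irqs_enabled = true; }

/* Save the current state, then disable. */
static bool sim_irq_save(void)
{
    bool flags = irqs_enabled;

    irqs_enabled = false;
    return flags;
}

/* Put back exactly the state that was saved. */
static void sim_irq_restore(bool flags) { irqs_enabled = flags; }

/* Old pattern: unconditionally re-enables interrupts on exit. */
static void section_disable_enable(void)
{
    sim_irq_disable();
    /* ... critical section ... */
    sim_irq_enable();
}

/* Fixed pattern: leaves the caller's interrupt state untouched. */
static void section_save_restore(void)
{
    bool flags = sim_irq_save();

    /* ... critical section ... */
    sim_irq_restore(flags);
}
```

When the caller enters with interrupts already off, the first pattern returns with them on, which is the state change lockdep flagged while completion_lock was held.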
Link: http://lkml.kernel.org/r/20160620184158.GO3262@mtj.duckdns.org
Link: http://lkml.kernel.org/g/5767CFE5.7080904@de.ibm.com
Fixes: 74485cf2bc85 ("mm: migrate: consolidate mem_cgroup_migrate() calls")
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Mel Gorman [Fri, 24 Jun 2016 21:49:37 +0000 (14:49 -0700)]
mm, sl[au]b: add __GFP_ATOMIC to the GFP reclaim mask
commit e838a45f9392a5bd2be1cd3ab0b16ae85857461c upstream.
Commit d0164adc89f6 ("mm, page_alloc: distinguish between being unable
to sleep, unwilling to sleep and avoiding waking kswapd") modified
__GFP_WAIT to explicitly identify the difference between atomic callers
and those that were unwilling to sleep. Later the definition was
removed entirely.
The GFP_RECLAIM_MASK is the set of flags that affect watermark checking
and reclaim behaviour but __GFP_ATOMIC was never added. Without it,
atomic users of the slab allocator strip the __GFP_ATOMIC flag and
cannot access the page allocator atomic reserves. This patch addresses
the problem.
The user-visible impact depends on the workload but potentially atomic
allocations unnecessarily fail without this patch.
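How a missing bit in a mask silently drops a flag can be sketched as follows. The bit values, type and helper names here are hypothetical and chosen for readability; the real flag encodings differ.

```c
#include <stdbool.h>

typedef unsigned int gfp_model_t;

/* Hypothetical bit values for this model only. */
#define M_GFP_HIGH   0x01u
#define M_GFP_ATOMIC 0x02u
#define M_GFP_ZERO   0x04u /* stands for a flag unrelated to reclaim */

#define RECLAIM_MASK_OLD (M_GFP_HIGH)
#define RECLAIM_MASK_NEW (M_GFP_HIGH | M_GFP_ATOMIC)

/* The slab layer forwards only the flags covered by the reclaim mask. */
static gfp_model_t flags_for_page_alloc(gfp_model_t flags, gfp_model_t mask)
{
    return flags & mask;
}

/* The page allocator grants reserve access only if it sees the atomic bit. */
static bool may_use_atomic_reserves(gfp_model_t flags)
{
    return (flags & M_GFP_ATOMIC) != 0;
}
```

With the old mask the atomic bit never reaches the page allocator, so an atomic slab caller is treated like an ordinary one; adding the bit to the mask is enough to pass it through.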
Link: http://lkml.kernel.org/r/20160610093832.GK2527@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reported-by: Marcin Wojtas <mw@semihalf.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Ludovic Desroches [Thu, 12 May 2016 14:54:10 +0000 (16:54 +0200)]
dmaengine: at_xdmac: double FIFO flush needed to compute residue
commit 9295c41d77ca93aac79cfca6fa09fa1ca5cab66f upstream.
Due to the way the CUBC register is updated, a double flush is needed to
compute an accurate residue. The first flush gets data out of the DMA
FIFO and the second one ensures that we won't report data which is not
in memory.
Signed-off-by: Ludovic Desroches <ludovic.desroches@atmel.com>
Fixes: e1f7c9eee707 ("dmaengine: at_xdmac: creation of the atmel eXtended DMA Controller driver")
Reviewed-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Ludovic Desroches [Thu, 12 May 2016 14:54:09 +0000 (16:54 +0200)]
dmaengine: at_xdmac: fix residue corruption
commit 53398f488821c2b5b15291e3debec6ad33f75d3d upstream.
An unexpected value of CUBC can lead to a corrupted residue. A more
complex sequence is needed to detect an inaccurate value for NCA or CUBC.
Signed-off-by: Ludovic Desroches <ludovic.desroches@atmel.com>
Fixes: e1f7c9eee707 ("dmaengine: at_xdmac: creation of the atmel eXtended DMA Controller driver")
Reviewed-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Ludovic Desroches [Thu, 12 May 2016 14:54:08 +0000 (16:54 +0200)]
dmaengine: at_xdmac: align descriptors on 64 bits
commit 4a9723e8df68cfce4048517ee32e37f78854b6fb upstream.
Having descriptors aligned on 64 bits allows CNDA and CUBC to be updated
atomically.
Signed-off-by: Ludovic Desroches <ludovic.desroches@atmel.com>
Fixes: e1f7c9eee707 ("dmaengine: at_xdmac: creation of the atmel eXtended DMA Controller driver")
Reviewed-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Lukas Wunner [Sun, 12 Jun 2016 10:31:53 +0000 (12:31 +0200)]
x86/quirks: Add early quirk to reset Apple AirPort card
commit abb2bafd295fe962bbadc329dbfb2146457283ac upstream.
The EFI firmware on Macs contains a full-fledged network stack for
downloading OS X images from osrecovery.apple.com. Unfortunately
on Macs introduced in 2011 and 2012, EFI brings up the Broadcom 4331
wireless card on every boot and leaves it enabled even after
ExitBootServices has been called. The card continues to assert its IRQ
line, causing spurious interrupts if the IRQ is shared. It also corrupts
memory by DMAing received packets, allowing for remote code execution
over the air. This only stops when a driver is loaded for the wireless
card, which may be never if the driver is not installed or blacklisted.
The issue seems to be constrained to the Broadcom 4331. Chris Milsted
has verified that the newer Broadcom 4360 built into the MacBookPro11,3
(2013/2014) does not exhibit this behaviour. The chances that Apple will
ever supply a firmware fix for the older machines appear to be zero.
The solution is to reset the card on boot by writing to a reset bit in
its mmio space. This must be done as an early quirk and not as a plain
vanilla PCI quirk to successfully combat memory corruption by DMAed
packets: Matthew Garrett found out in 2012 that the packets are written
to EfiBootServicesData memory (http://mjg59.dreamwidth.org/11235.html).
This type of memory is made available to the page allocator by
efi_free_boot_services(). Plain vanilla PCI quirks run much later, in
subsys initcall level. In-between a time window would be open for memory
corruption. Random crashes occurring in this time window and attributed
to DMAed packets have indeed been observed in the wild by Chris
Bainbridge.
When Matthew Garrett analyzed the memory corruption issue in 2012, he
sought to fix it with a grub quirk which transitions the card to D3hot:
http://git.savannah.gnu.org/cgit/grub.git/commit/?id=9d34bb85da56
This approach does not help users with other bootloaders and while it
may prevent DMAed packets, it does not cure the spurious interrupts
emanating from the card. Unfortunately the card's mmio space is
inaccessible in D3hot, so to reset it, we have to undo the effect of
Matthew's grub patch and transition the card back to D0.
Note that the quirk takes a few shortcuts to reduce the amount of code:
The size of BAR 0 and the location of the PM capability are identical
on all affected machines and therefore hardcoded. Only the address of
BAR 0 differs between models. Also, it is assumed that the BCMA core
currently mapped is the 802.11 core. The EFI driver seems to always take
care of this.
Michael Büsch, Bjorn Helgaas and Matt Fleming contributed feedback
towards finding the best solution to this problem.
The following should be a comprehensive list of affected models:
iMac13,1 2012 21.5" [Root Port 00:1c.3 = 8086:1e16]
iMac13,2 2012 27" [Root Port 00:1c.3 = 8086:1e16]
Macmini5,1 2011 i5 2.3 GHz [Root Port 00:1c.1 = 8086:1c12]
Macmini5,2 2011 i5 2.5 GHz [Root Port 00:1c.1 = 8086:1c12]
Macmini5,3 2011 i7 2.0 GHz [Root Port 00:1c.1 = 8086:1c12]
Macmini6,1 2012 i5 2.5 GHz [Root Port 00:1c.1 = 8086:1e12]
Macmini6,2 2012 i7 2.3 GHz [Root Port 00:1c.1 = 8086:1e12]
MacBookPro8,1 2011 13" [Root Port 00:1c.1 = 8086:1c12]
MacBookPro8,2 2011 15" [Root Port 00:1c.1 = 8086:1c12]
MacBookPro8,3 2011 17" [Root Port 00:1c.1 = 8086:1c12]
MacBookPro9,1 2012 15" [Root Port 00:1c.1 = 8086:1e12]
MacBookPro9,2 2012 13" [Root Port 00:1c.1 = 8086:1e12]
MacBookPro10,1 2012 15" [Root Port 00:1c.1 = 8086:1e12]
MacBookPro10,2 2012 13" [Root Port 00:1c.1 = 8086:1e12]
For posterity, spurious interrupts caused by the Broadcom 4331 wireless
card resulted in splats like this (stacktrace omitted):
irq 17: nobody cared (try booting with the "irqpoll" option)
handlers:
[<ffffffff81374370>] pcie_isr
[<ffffffffc0704550>] sdhci_irq [sdhci] threaded [<ffffffffc07013c0>] sdhci_thread_irq [sdhci]
[<ffffffffc0a0b960>] azx_interrupt [snd_hda_codec]
Disabling IRQ #17
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=79301
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=111781
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=728916
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=895951#c16
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1009819
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1098621
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1149632#c5
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1279130
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1332732
Tested-by: Konstantin Simanov <k.simanov@stlk.ru> # [MacBookPro8,1]
Tested-by: Lukas Wunner <lukas@wunner.de> # [MacBookPro9,1]
Tested-by: Bryan Paradis <bryan.paradis@gmail.com> # [MacBookPro9,2]
Tested-by: Andrew Worsley <amworsley@gmail.com> # [MacBookPro10,1]
Tested-by: Chris Bainbridge <chris.bainbridge@gmail.com> # [MacBookPro10,2]
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Acked-by: Rafał Miłecki <zajec5@gmail.com>
Acked-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chris Milsted <cmilsted@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Michael Buesch <m@bues.ch>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: b43-dev@lists.infradead.org
Cc: linux-pci@vger.kernel.org
Cc: linux-wireless@vger.kernel.org
Link: http://lkml.kernel.org/r/48d0972ac82a53d460e5fce77a07b2560db95203.1465690253.git.lukas@wunner.de
[ Did minor readability edits. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Lukas Wunner [Sun, 12 Jun 2016 10:31:53 +0000 (12:31 +0200)]
x86/quirks: Reintroduce scanning of secondary buses
commit 850c321027c2e31d0afc71588974719a4b565550 upstream.
We used to scan secondary buses until the following commit that
was applied in 2009:
8659c406ade3 ("x86: only scan the root bus in early PCI quirks")
which commit constrained early quirks to the root bus only. Its
motivation was to prevent application of the nvidia_bugs quirk
on secondary buses.
We're about to add a quirk to reset the Broadcom 4331 wireless card on
2011/2012 Macs, which is located on a secondary bus behind a PCIe root
port. To facilitate that, reintroduce scanning of secondary buses.
The commit message of 8659c406ade3 notes that scanning only the root bus
"saves quite some unnecessary scanning work". The algorithm used prior
to 8659c406ade3 was particularly time consuming because it scanned
buses 0 to 31 brute force. To avoid lengthening boot time, employ a
recursive strategy which only scans buses that are actually reachable
from the root bus.
Yinghai Lu pointed out that the secondary bus number read from a
bridge's config space may be invalid, in particular a value of 0 would
cause an infinite loop. The PCI core goes beyond that and recurses to a
child bus only if its bus number is greater than the parent bus number
(see pci_scan_bridge()). Since the root bus is numbered 0, this implies
that secondary buses may not be 0. Do the same on early scanning.
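The recursion rule can be sketched on a simulated topology. This is an illustrative model, not the early-quirks code: the topo table, struct dev_model and the function names are hypothetical, and a counter stands in for applying quirks.

```c
#include <stddef.h>

#define MAX_BUSES 8
#define MAX_DEVS  4

struct dev_model {
    int present;
    int is_bridge;
    int secondary; /* secondary bus number read from the bridge */
};

/* Hypothetical topology, indexed by bus number and then by slot. */
static struct dev_model topo[MAX_BUSES][MAX_DEVS];

/* Number of devices visited by the most recent scan. */
static int visited;

static void scan_bus(int bus)
{
    for (int slot = 0; slot < MAX_DEVS; slot++) {
        const struct dev_model *d = &topo[bus][slot];

        if (!d->present)
            continue;
        visited++; /* a real scan would apply matching quirks here */

        /* Recurse only into a child bus numbered above its parent. */
        if (d->is_bridge && d->secondary > bus && d->secondary < MAX_BUSES)
            scan_bus(d->secondary);
    }
}

static int scan_from_root(void)
{
    visited = 0;
    scan_bus(0);
    return visited;
}
```

Because each recursion step moves to a strictly higher bus number, the walk terminates even when a bridge reports a bogus secondary bus of 0, and buses that no bridge points to are never touched.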
If this algorithm is found to significantly impact boot time or cause
infinite loops on broken hardware, it would be possible to limit its
recursion depth: The Broadcom 4331 quirk applies at depth 1, all others
at depth 0, so the bus need not be scanned deeper than that for now. An
alternative approach would be to revert to scanning only the root bus,
and apply the Broadcom 4331 quirk to the root ports 8086:1c12, 8086:1e12
and 8086:1e16. Apple always positioned the card behind one of these
three ports. The quirk would then check presence of the card in slot 0
below the root port and do its deed.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: linux-pci@vger.kernel.org
Link: http://lkml.kernel.org/r/f0daa70dac1a9b2483abdb31887173eb6ab77bdf.1465690253.git.lukas@wunner.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Lukas Wunner [Sun, 12 Jun 2016 10:31:53 +0000 (12:31 +0200)]
x86/quirks: Apply nvidia_bugs quirk only on root bus
commit 447d29d1d3aed839e74c2401ef63387780ac51ed upstream.
Since the following commit:
8659c406ade3 ("x86: only scan the root bus in early PCI quirks")
... early quirks are only applied to devices on the root bus.
The motivation was to prevent application of the nvidia_bugs quirk on
secondary buses.
We're about to reintroduce scanning of secondary buses for a quirk to
reset the Broadcom 4331 wireless card on 2011/2012 Macs. To prevent
regressions, open code the requirement to apply nvidia_bugs only on the
root bus.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/4d5477c1d76b2f0387a780f2142bbcdd9fee869b.1465690253.git.lukas@wunner.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>