Erez Zadok [Thu, 26 Jun 2014 03:22:55 +0000 (23:22 -0400)]
Wrapfs: support extended attributes (xattr) operations
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Signed-off-by: Mengyang Li <li.mengyang@stonybrook.edu>
Erez Zadok [Sat, 21 Jun 2014 00:28:12 +0000 (20:28 -0400)]
Wrapfs: support asynchronous-IO (AIO) operations
Signed-off-by: Li Mengyang <li.mengyang@stonybrook.edu>
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Erez Zadok [Sat, 21 Jun 2014 00:28:12 +0000 (20:28 -0400)]
Wrapfs: support direct-IO (DIO) operations
Signed-off-by: Li Mengyang <li.mengyang@stonybrook.edu>
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Erez Zadok [Thu, 15 May 2014 04:16:05 +0000 (00:16 -0400)]
Wrapfs: implement vm_ops->page_mkwrite
Some file systems (e.g., ext4) require it. Reported by Ted Ts'o.
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Erez Zadok [Thu, 3 Apr 2014 18:12:09 +0000 (14:12 -0400)]
Wrapfs: update documentation
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Erez Zadok [Thu, 3 Apr 2014 18:12:00 +0000 (14:12 -0400)]
Wrapfs: update maintainers
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Erez Zadok [Tue, 21 Jan 2014 08:23:10 +0000 (03:23 -0500)]
Wrapfs: update documentation
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Erez Zadok [Tue, 21 Jan 2014 06:17:55 +0000 (01:17 -0500)]
Wrapfs: 2014 Copyright update
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Erez Zadok [Wed, 5 Jun 2013 05:44:38 +0000 (01:44 -0400)]
Wrapfs: copy lower inode attributes in ->ioctl
Some ioctls (e.g., EXT2_IOC_SETFLAGS) can change inode attributes, so copy
them from lower inode.
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Erez Zadok [Wed, 5 Jun 2013 05:43:38 +0000 (01:43 -0400)]
Wrapfs: remove unnecessary call to vm_unmap in ->mmap
Code is unnecessary and causes deadlocks in newer kernels.
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Erez Zadok [Mon, 3 Jun 2013 01:55:33 +0000 (21:55 -0400)]
patch copyright-2013.patch
Erez Zadok [Sun, 26 May 2013 03:23:42 +0000 (23:23 -0400)]
Wrapfs: use d_make_root
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Erez Zadok [Tue, 31 Jan 2012 09:40:19 +0000 (04:40 -0500)]
Wrapfs: use mode_t
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Erez Zadok [Mon, 30 Jan 2012 01:34:27 +0000 (20:34 -0500)]
Wrapfs: use set_nlink()
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Erez Zadok [Fri, 9 Sep 2011 04:47:49 +0000 (00:47 -0400)]
Wrapfs: drop our dentry in ->rmdir
Also clear nlinks on our inode.
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Erez Zadok [Tue, 6 Sep 2011 04:10:32 +0000 (00:10 -0400)]
Wrapfs: use d_alloc_root
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Erez Zadok [Tue, 6 Sep 2011 04:10:31 +0000 (00:10 -0400)]
Wrapfs: use d_set_d_op
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Erez Zadok [Tue, 6 Sep 2011 04:10:30 +0000 (00:10 -0400)]
Wrapfs: use updated vfs_path_lookup prototype
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Erez Zadok [Tue, 6 Sep 2011 04:10:30 +0000 (00:10 -0400)]
Wrapfs: ->fsync updates for new prototype
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Erez Zadok [Tue, 6 Sep 2011 04:10:29 +0000 (00:10 -0400)]
Wrapfs: support LOOKUP_RCU in ->d_revalidate
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Erez Zadok [Tue, 6 Sep 2011 04:10:28 +0000 (00:10 -0400)]
Wrapfs: new ->permission prototype and fixes.
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Erez Zadok [Mon, 2 May 2011 06:00:02 +0000 (02:00 -0400)]
Wrapfs: lookup fixes
Don't use lookup_one_len any longer (doesn't work for NFS).
Initialize lower wrapfs_dentry_info so lower_path is NULL.
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Erez Zadok [Fri, 18 Mar 2011 17:14:28 +0000 (13:14 -0400)]
Wrapfs: remove extra debug in rmdir
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Erez Zadok [Fri, 18 Mar 2011 16:38:01 +0000 (12:38 -0400)]
Wrapfs: checkpatch fixes
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Erez Zadok [Fri, 18 Mar 2011 04:45:17 +0000 (00:45 -0400)]
Wrapfs: port to 2.6.39
Remove lock/unlock_kernel in ->fasync.
Convert from ->get_sb to ->mount op.
Remove include to smp_lock.h, added sched.h.
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Erez Zadok [Fri, 18 Mar 2011 03:21:55 +0000 (23:21 -0400)]
Wrapfs: copyright update for 2011
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Erez Zadok [Fri, 18 Mar 2011 03:21:55 +0000 (23:21 -0400)]
Wrapfs: better handling of NFS silly-renamed files
In ->unlink, if we try to unlink an NFS silly-renamed file, NFS returns
-EBUSY. We have to treat it as a success and return 0 to the VFS. NFS will
remove silly-deleted files later on anyway.
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Erez Zadok [Fri, 18 Mar 2011 03:21:55 +0000 (23:21 -0400)]
Wrapfs: update parent directory inode size in inode ops
After ->unlink, ->rmdir, and ->rename, we need to copy the (possibly
changed) inode size of the parent directory(ies) where the operation took
place.
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Erez Zadok [Fri, 18 Mar 2011 03:21:55 +0000 (23:21 -0400)]
Wrapfs: remove unnecessary calls to copy lower inode->n_links
Removed from ->create, ->symlink, and ->mknod.
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Erez Zadok [Tue, 8 Mar 2011 04:20:33 +0000 (23:20 -0500)]
Wrapfs: ->setattr fixes
Call inode_change_ok on our inode, not lower.
Don't copy inode sizes (VFS does it).
Pass lower file in struct iattr passed to notify_change on lower inode.
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Erez Zadok [Sun, 6 Mar 2011 21:23:16 +0000 (16:23 -0500)]
Wrapfs: update ->permission prototye and code for new iperm flag
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Erez Zadok [Fri, 12 Nov 2010 23:15:05 +0000 (18:15 -0500)]
Wrapfs: handle maxbytes properly
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Erez Zadok [Sat, 11 Sep 2010 19:49:33 +0000 (15:49 -0400)]
Wrapfs: support ->unlocked_ioctl and ->compat_ioctl
Old ->ioctl was split into ->unlocked_ioctl and ->compat_ioctl. Compat
version doesn't need to lock_kernel any longer.
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Erez Zadok [Wed, 11 Aug 2010 03:50:14 +0000 (23:50 -0400)]
Wrapfs: new vfs_statfs and ->evict_inode prototypes
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Erez Zadok [Sat, 7 Aug 2010 03:37:29 +0000 (23:37 -0400)]
Wrapfs: update ->fsync prototype
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Erez Zadok [Wed, 21 Apr 2010 01:22:02 +0000 (21:22 -0400)]
Wrapfs: update documentation
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Erez Zadok [Tue, 20 Apr 2010 19:32:09 +0000 (15:32 -0400)]
Wrapfs: include slab.h
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Erez Zadok [Tue, 20 Apr 2010 19:26:02 +0000 (15:26 -0400)]
Wrapfs: avoid an extra path_get/put pair in wrapfs_open
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Erez Zadok [Fri, 26 Feb 2010 08:18:04 +0000 (03:18 -0500)]
Wrapfs: decrement nd_path on follow_link error
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Erez Zadok [Tue, 5 Jan 2010 09:27:00 +0000 (04:27 -0500)]
Wrapfs: don't mention kernel version in modload message
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Erez Zadok [Tue, 5 Jan 2010 09:28:11 +0000 (04:28 -0500)]
Wrapfs/VFS: remove init_lower_nd and unexport release_lower_nd
Only wrapfs_create used it, and it is unnecessary to init a completely new
nameidata for the lower file system.
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Erez Zadok [Tue, 5 Jan 2010 01:45:06 +0000 (20:45 -0500)]
Kconfig: hook to configure Wrapfs
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Erez Zadok [Tue, 5 Jan 2010 01:45:06 +0000 (20:45 -0500)]
Makefile: hook to compile Wrapfs
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Erez Zadok [Tue, 5 Jan 2010 01:45:06 +0000 (20:45 -0500)]
VFS: export release_open_intent symbol
Needed to release the resources of the lower nameidata structures that we
create and pass to lower file systems (e.g., when calling vfs_create).
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Erez Zadok [Tue, 5 Jan 2010 01:45:06 +0000 (20:45 -0500)]
Wrapfs: file system magic number
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Erez Zadok [Tue, 5 Jan 2010 01:45:06 +0000 (20:45 -0500)]
Wrapfs: Kconfig options
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Erez Zadok [Tue, 5 Jan 2010 01:45:06 +0000 (20:45 -0500)]
Wrapfs: main Makefile
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Erez Zadok [Tue, 5 Jan 2010 01:45:06 +0000 (20:45 -0500)]
Wrapfs: vm_ops operations
Includes necessary address_space workaround ops.
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Erez Zadok [Tue, 5 Jan 2010 01:45:06 +0000 (20:45 -0500)]
Wrapfs: mount-time and module-linkage functions
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Erez Zadok [Tue, 5 Jan 2010 01:45:06 +0000 (20:45 -0500)]
Wrapfs: lookup-related functions
Main lookup function, nameidata helpers, and stacking-interposition
functions.
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Erez Zadok [Tue, 5 Jan 2010 01:45:06 +0000 (20:45 -0500)]
Wrapfs: file operations
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Erez Zadok [Tue, 5 Jan 2010 01:45:06 +0000 (20:45 -0500)]
Wrapfs: dentry operations
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Erez Zadok [Tue, 5 Jan 2010 01:45:06 +0000 (20:45 -0500)]
Wrapfs: inode operations
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Erez Zadok [Tue, 5 Jan 2010 01:45:06 +0000 (20:45 -0500)]
Wrapfs: superblock operations
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Erez Zadok [Tue, 5 Jan 2010 01:45:06 +0000 (20:45 -0500)]
Wrapfs: main header file
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Erez Zadok [Tue, 5 Jan 2010 01:45:06 +0000 (20:45 -0500)]
Wrapfs: Maintainers
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Erez Zadok [Tue, 5 Jan 2010 01:45:06 +0000 (20:45 -0500)]
Documentation: index entry for Wrapfs
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Erez Zadok [Tue, 5 Jan 2010 01:45:06 +0000 (20:45 -0500)]
Wrapfs: introduction and usage documentation
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Zefan Li [Wed, 26 Oct 2016 15:15:47 +0000 (23:15 +0800)]
Linux 3.4.113
Arnaldo Carvalho de Melo [Mon, 14 Mar 2016 12:56:35 +0000 (09:56 -0300)]
net: Fix use after free in the recvmmsg exit path
commit
34b88a68f26a75e4fded796f1a49c40f82234b7d upstream.
The syzkaller fuzzer hit the following use-after-free:
Call Trace:
[<
ffffffff8175ea0e>] __asan_report_load8_noabort+0x3e/0x40 mm/kasan/report.c:295
[<
ffffffff851cc31a>] __sys_recvmmsg+0x6fa/0x7f0 net/socket.c:2261
[< inline >] SYSC_recvmmsg net/socket.c:2281
[<
ffffffff851cc57f>] SyS_recvmmsg+0x16f/0x180 net/socket.c:2270
[<
ffffffff86332bb6>] entry_SYSCALL_64_fastpath+0x16/0x7a
arch/x86/entry/entry_64.S:185
And, as Dmitry rightly assessed, that is because we can drop the
reference and then touch it when the underlying recvmsg calls return
some packets and then hit an error, which will make recvmmsg to set
sock->sk->sk_err, oops, fix it.
Reported-and-Tested-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Fixes: a2e2725541fa ("net: Introduce recvmmsg socket syscall")
http://lkml.kernel.org/r/
20160122211644.GC2470@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Zefan Li <lizefan@huawei.com>
Michal Hocko [Sun, 16 Oct 2016 09:55:00 +0000 (11:55 +0200)]
mm, gup: close FOLL MAP_PRIVATE race
commit
19be0eaffa3ac7d8eb6784ad9bdbc7d67ed8e619 upstream.
faultin_page drops FOLL_WRITE after the page fault handler did the CoW
and then we retry follow_page_mask to get our CoWed page. This is racy,
however because the page might have been unmapped by that time and so
we would have to do a page fault again, this time without CoW. This
would cause the page cache corruption for FOLL_FORCE on MAP_PRIVATE
read only mappings with obvious consequences.
This is an ancient bug that was actually already fixed once by Linus
eleven years ago in commit
4ceb5db9757a ("Fix get_user_pages() race
for write access") but that was then undone due to problems on s390
by commit
f33ea7f404e5 ("fix get_user_pages bug") because s390 didn't
have proper dirty pte tracking until
abf09bed3cce ("s390/mm: implement
software dirty bits"). This wasn't a problem at the time as pointed out
by Hugh Dickins because madvise relied on mmap_sem for write up until
0a27a14a6292 ("mm: madvise avoid exclusive mmap_sem") but since then we
can race with madvise which can unmap the fresh COWed page or with KSM
and corrupt the content of the shared page.
This patch is based on the Linus' approach to not clear FOLL_WRITE after
the CoW page fault (aka VM_FAULT_WRITE) but instead introduces FOLL_COW
to note this fact. The flag is then rechecked during follow_pfn_pte to
enforce the page fault again if we do not see the CoWed page. Linus was
suggesting to check pte_dirty again as s390 is OK now. But that would
make backporting to some old kernels harder. So instead let's just make
sure that vm_normal_page sees a pure anonymous page.
This would guarantee we are seeing a real CoW page. Introduce
can_follow_write_pte which checks both pte_write and falls back to
PageAnon on forced write faults which passed CoW already. Thanks to Hugh
to point out that a special care has to be taken for KSM pages because
our COWed page might have been merged with a KSM one and keep its
PageAnon flag.
Fixes: 0a27a14a6292 ("mm: madvise avoid exclusive mmap_sem")
Reported-by: Phil "not Paul" Oester <kernel@linuxace.com>
Disclosed-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Michal Hocko <mhocko@suse.com>
[bwh: Backported to 3.2:
- Adjust filename, context, indentation
- The 'no_page' exit path in follow_page() is different, so open-code the
cleanup
- Delete a now-unused label]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Zefan Li <lizefan@huawei.com>
Nikolay Aleksandrov [Tue, 17 Nov 2015 14:49:06 +0000 (15:49 +0100)]
net/core: revert "net: fix __netdev_update_features return.." and add comment
commit
17b85d29e82cc3c874a497a8bc5764d6a2b043e2 upstream.
This reverts commit
00ee59271777 ("net: fix __netdev_update_features return
on ndo_set_features failure")
and adds a comment explaining why it's okay to return a value other than
0 upon error. Some drivers might actually change flags and return an
error so it's better to fire a spurious notification rather than miss
these.
CC: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Zefan Li <lizefan@huawei.com>
Paul Bolle [Thu, 18 Feb 2016 20:29:08 +0000 (21:29 +0100)]
ser_gigaset: use container_of() instead of detour
commit
8d2c3ab4445640957d136caa3629857d63544a2a upstream.
The purpose of gigaset_device_release() is to kfree() the struct
ser_cardstate that contains our struct device. This is done via a bit of
a detour. First we make our struct device's driver_data point to the
container of our struct ser_cardstate (which is a struct cardstate). In
gigaset_device_release() we then retrieve that driver_data again. And
after that we finally kfree() the struct ser_cardstate that was saved in
the struct cardstate.
All of this can be achieved much easier by using container_of() to get
from our struct device to its container, struct ser_cardstate. Do so.
Note that at the time the detour was implemented commit
b8b2c7d845d5
("base/platform: assert that dev_pm_domain callbacks are called
unconditionally") had just entered the tree. That commit disconnected
our platform_device and our platform_driver. These were reconnected
again in v4.5-rc2 through commit
25cad69f21f5 ("base/platform: Fix
platform drivers with no probe callback"). And one of the consequences
of that fix was that it broke the detour via driver_data. That's because
it made __device_release_driver() stop being a NOP for our struct device
and actually do stuff again. One of the things it now does, is setting
our driver_data to NULL. That, in turn, makes it impossible for
gigaset_device_release() to get to our struct cardstate. Which has the
net effect of leaking a struct ser_cardstate at every call of this
driver's tty close() operation. So using container_of() has the
additional benefit of actually working.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Acked-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Zefan Li <lizefan@huawei.com>
Tilman Schmidt [Tue, 15 Dec 2015 17:11:31 +0000 (18:11 +0100)]
ser_gigaset: remove unnecessary kfree() calls from release method
commit
8aeb3c3d655e22d3aa5ba49f313157bd27354bb4 upstream.
device->platform_data and platform_device->resource are never used
and remain NULL through their entire life. Drops the kfree() calls
for them from the device release method.
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Zefan Li <lizefan@huawei.com>
Konrad Rzeszutek Wilk [Thu, 11 Feb 2016 21:10:24 +0000 (16:10 -0500)]
xen/pciback: Save the number of MSI-X entries to be copied later.
commit
d159457b84395927b5a52adb72f748dd089ad5e5 upstream.
Commit
8135cf8b092723dbfcc611fe6fdcb3a36c9951c5 (xen/pciback: Save
xen_pci_op commands before processing it) broke enabling MSI-X because
it would never copy the resulting vectors into the response. The
number of vectors requested was being overwritten by the return value
(typically zero for success).
Save the number of vectors before processing the op, so the correct
number of vectors are copied afterwards.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Zefan Li <lizefan@huawei.com>
Tetsuo Handa [Fri, 5 Feb 2016 23:36:30 +0000 (15:36 -0800)]
mm, vmstat: fix wrong WQ sleep when memory reclaim doesn't make any progress
commit
564e81a57f9788b1475127012e0fd44e9049e342 upstream.
Jan Stancek has reported that system occasionally hanging after "oom01"
testcase from LTP triggers OOM. Guessing from a result that there is a
kworker thread doing memory allocation and the values between "Node 0
Normal free:" and "Node 0 Normal:" differs when hanging, vmstat is not
up-to-date for some reason.
According to commit
373ccbe59270 ("mm, vmstat: allow WQ concurrency to
discover memory reclaim doesn't make any progress"), it meant to force
the kworker thread to take a short sleep, but it by error used
schedule_timeout(1). We missed that schedule_timeout() in state
TASK_RUNNING doesn't do anything.
Fix it by using schedule_timeout_uninterruptible(1) which forces the
kworker thread to take a short sleep in order to make sure that vmstat
is up-to-date.
Fixes: 373ccbe59270 ("mm, vmstat: allow WQ concurrency to discover memory reclaim doesn't make any progress")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: Jan Stancek <jstancek@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Cristopher Lameter <clameter@sgi.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Arkadiusz Miskiewicz <arekm@maven.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Zefan Li <lizefan@huawei.com>
John Stultz [Thu, 11 Jun 2015 22:54:55 +0000 (15:54 -0700)]
time: Prevent early expiry of hrtimers[CLOCK_REALTIME] at the leap second edge
commit
833f32d763028c1bb371c64f457788b933773b3e upstream.
Currently, leapsecond adjustments are done at tick time. As a result,
the leapsecond was applied at the first timer tick *after* the
leapsecond (~1-10ms late depending on HZ), rather then exactly on the
second edge.
This was in part historical from back when we were always tick based,
but correcting this since has been avoided since it adds extra
conditional checks in the gettime fastpath, which has performance
overhead.
However, it was recently pointed out that ABS_TIME CLOCK_REALTIME
timers set for right after the leapsecond could fire a second early,
since some timers may be expired before we trigger the timekeeping
timer, which then applies the leapsecond.
This isn't quite as bad as it sounds, since behaviorally it is similar
to what is possible w/ ntpd made leapsecond adjustments done w/o using
the kernel discipline. Where due to latencies, timers may fire just
prior to the settimeofday call. (Also, one should note that all
applications using CLOCK_REALTIME timers should always be careful,
since they are prone to quirks from settimeofday() disturbances.)
However, the purpose of having the kernel do the leap adjustment is to
avoid such latencies, so I think this is worth fixing.
So in order to properly keep those timers from firing a second early,
this patch modifies the ntp and timekeeping logic so that we keep
enough state so that the update_base_offsets_now accessor, which
provides the hrtimer core the current time, can check and apply the
leapsecond adjustment on the second edge. This prevents the hrtimer
core from expiring timers too early.
This patch does not modify any other time read path, so no additional
overhead is incurred. However, this also means that the leap-second
continues to be applied at tick time for all other read-paths.
Apologies to Richard Cochran, who pushed for similar changes years
ago, which I resisted due to the concerns about the performance
overhead.
While I suspect this isn't extremely critical, folks who care about
strict leap-second correctness will likely want to watch
this. Potentially a -stable candidate eventually.
Originally-suggested-by: Richard Cochran <richardcochran@gmail.com>
Reported-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Reported-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jiri Bohac <jbohac@suse.cz>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Cc: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/1434063297-28657-4-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[Yadi: Move do_adjtimex to timekeeping.c and solve context issues]
Signed-off-by: Hu <yadi.hu@windriver.com>
Signed-off-by: Zefan Li <lizefan@huawei.com>
Eric Dumazet [Sun, 10 Jul 2016 08:04:02 +0000 (10:04 +0200)]
tcp: make challenge acks less predictable
commit
75ff39ccc1bd5d3c455b6822ab09e533c551f758 upstream.
Yue Cao claims that current host rate limiting of challenge ACKS
(RFC 5961) could leak enough information to allow a patient attacker
to hijack TCP sessions. He will soon provide details in an academic
paper.
This patch increases the default limit from 100 to 1000, and adds
some randomization so that the attacker can no longer hijack
sessions without spending a considerable amount of probes.
Based on initial analysis and patch from Linus.
Note that we also have per socket rate limiting, so it is tempting
to remove the host limit in the future.
v2: randomize the count of challenge acks per second, not the period.
Fixes: 282f23c6ee34 ("tcp: implement RFC 5961 3.2")
Reported-by: Yue Cao <ycao009@ucr.edu>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[lizf: Backported to 3.4:
- adjust context
- use ACCESS_ONCE instead WRITE_ONCE/READ_ONCE
- open-code prandom_u32_max()]
Signed-off-by: Zefan Li <lizefan@huawei.com>
Zefan Li [Sun, 9 Oct 2016 11:23:12 +0000 (19:23 +0800)]
Revert "USB: Add OTG PET device to TPL"
This reverts commit
97fa724b23c3dd22e9c0979ad0e9d260cc6d545d.
Conflicts:
drivers/usb/core/quirks.c
Signed-off-by: Zefan Li <lizefan@huawei.com>
Zefan Li [Sun, 9 Oct 2016 11:20:47 +0000 (19:20 +0800)]
Revert "USB: Add device quirk for ASUS T100 Base Station keyboard"
This reverts commit
eea5a87d270e8d6925063019c3b0f3ff61fcb49a.
Conflicts:
drivers/usb/core/quirks.c
include/linux/usb/quirks.h
Signed-off-by: Zefan Li <lizefan@huawei.com>
Zefan Li [Sun, 9 Oct 2016 11:06:49 +0000 (19:06 +0800)]
Fix incomplete backport of commit
0f792cf949a0
Signed-off-by: Zefan Li <lizefan@huawei.com>
Zefan Li [Sun, 9 Oct 2016 11:00:59 +0000 (19:00 +0800)]
Fix incomplete backport of commit
423f04d63cf4
Signed-off-by: Zefan Li <lizefan@huawei.com>
Nicolas Dichtel [Wed, 5 Sep 2012 02:12:42 +0000 (02:12 +0000)]
ipv6: fix handling of blackhole and prohibit routes
commit
ef2c7d7b59708d54213c7556a82d14de9a7e4475 upstream.
When adding a blackhole or a prohibit route, they were handling like classic
routes. Moreover, it was only possible to add this kind of routes by specifying
an interface.
Bug already reported here:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=498498
Before the patch:
$ ip route add blackhole 2001::1/128
RTNETLINK answers: No such device
$ ip route add blackhole 2001::1/128 dev eth0
$ ip -6 route | grep 2001
2001::1 dev eth0 metric 1024
After:
$ ip route add blackhole 2001::1/128
$ ip -6 route | grep 2001
blackhole 2001::1 dev lo metric 1024 error -22
v2: wrong patch
v3: add a field fc_type in struct fib6_config to store RTN_* type
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Zefan Li <lizefan@huawei.com>
Michal Kubeček [Mon, 9 Sep 2013 19:45:04 +0000 (21:45 +0200)]
ipv6: don't call fib6_run_gc() until routing is ready
commit
2c861cc65ef4604011a0082e4dcdba2819aa191a upstream.
When loading the ipv6 module, ndisc_init() is called before
ip6_route_init(). As the former registers a handler calling
fib6_run_gc(), this opens a window to run the garbage collector
before necessary data structures are initialized. If a network
device is initialized in this window, adding MAC address to it
triggers a NETDEV_CHANGEADDR event, leading to a crash in
fib6_clean_all().
Take the event handler registration out of ndisc_init() into a
separate function ndisc_late_init() and move it after
ip6_route_init().
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Zefan Li <lizefan@huawei.com>
Michal Kubeček [Thu, 1 Aug 2013 08:04:24 +0000 (10:04 +0200)]
ipv6: update ip6_rt_last_gc every time GC is run
commit
49a18d86f66d33a20144ecb5a34bba0d1856b260 upstream.
As pointed out by Eric Dumazet, net->ipv6.ip6_rt_last_gc should
hold the last time garbage collector was run so that we should
update it whenever fib6_run_gc() calls fib6_clean_all(), not only
if we got there from ip6_dst_gc().
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Zefan Li <lizefan@huawei.com>
Karl Heiss [Thu, 24 Sep 2015 16:15:07 +0000 (12:15 -0400)]
sctp: Prevent soft lockup when sctp_accept() is called during a timeout event
commit
635682a14427d241bab7bbdeebb48a7d7b91638e upstream.
A case can occur when sctp_accept() is called by the user during
a heartbeat timeout event after the 4-way handshake. Since
sctp_assoc_migrate() changes both assoc->base.sk and assoc->ep, the
bh_sock_lock in sctp_generate_heartbeat_event() will be taken with
the listening socket but released with the new association socket.
The result is a deadlock on any future attempts to take the listening
socket lock.
Note that this race can occur with other SCTP timeouts that take
the bh_lock_sock() in the event sctp_accept() is called.
BUG: soft lockup - CPU#9 stuck for 67s! [swapper:0]
...
RIP: 0010:[<
ffffffff8152d48e>] [<
ffffffff8152d48e>] _spin_lock+0x1e/0x30
RSP: 0018:
ffff880028323b20 EFLAGS:
00000206
RAX:
0000000000000002 RBX:
ffff880028323b20 RCX:
0000000000000000
RDX:
0000000000000000 RSI:
ffff880028323be0 RDI:
ffff8804632c4b48
RBP:
ffffffff8100bb93 R08:
0000000000000000 R09:
0000000000000000
R10:
ffff880610662280 R11:
0000000000000100 R12:
ffff880028323aa0
R13:
ffff8804383c3880 R14:
ffff880028323a90 R15:
ffffffff81534225
FS:
0000000000000000(0000) GS:
ffff880028320000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0:
000000008005003b
CR2:
00000000006df528 CR3:
0000000001a85000 CR4:
00000000000006e0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000ffff0ff0 DR7:
0000000000000400
Process swapper (pid: 0, threadinfo
ffff880616b70000, task
ffff880616b6cab0)
Stack:
ffff880028323c40 ffffffffa01c2582 ffff880614cfb020 0000000000000000
<d>
0100000000000000 00000014383a6c44 ffff8804383c3880 ffff880614e93c00
<d>
ffff880614e93c00 0000000000000000 ffff8804632c4b00 ffff8804383c38b8
Call Trace:
<IRQ>
[<
ffffffffa01c2582>] ? sctp_rcv+0x492/0xa10 [sctp]
[<
ffffffff8148c559>] ? nf_iterate+0x69/0xb0
[<
ffffffff814974a0>] ? ip_local_deliver_finish+0x0/0x2d0
[<
ffffffff8148c716>] ? nf_hook_slow+0x76/0x120
[<
ffffffff814974a0>] ? ip_local_deliver_finish+0x0/0x2d0
[<
ffffffff8149757d>] ? ip_local_deliver_finish+0xdd/0x2d0
[<
ffffffff81497808>] ? ip_local_deliver+0x98/0xa0
[<
ffffffff81496ccd>] ? ip_rcv_finish+0x12d/0x440
[<
ffffffff81497255>] ? ip_rcv+0x275/0x350
[<
ffffffff8145cfeb>] ? __netif_receive_skb+0x4ab/0x750
...
With lockdep debugging:
=====================================
[ BUG: bad unlock balance detected! ]
-------------------------------------
CslRx/12087 is trying to release lock (slock-AF_INET) at:
[<
ffffffffa01bcae0>] sctp_generate_timeout_event+0x40/0xe0 [sctp]
but there are no more locks to release!
other info that might help us debug this:
2 locks held by CslRx/12087:
#0: (&asoc->timers[i]){+.-...}, at: [<
ffffffff8108ce1f>] run_timer_softirq+0x16f/0x3e0
#1: (slock-AF_INET){+.-...}, at: [<
ffffffffa01bcac3>] sctp_generate_timeout_event+0x23/0xe0 [sctp]
Ensure the socket taken is also the same one that is released by
saving a copy of the socket before entering the timeout event
critical section.
Signed-off-by: Karl Heiss <kheiss@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2:
- Net namespaces are not used
- Keep using sctp_bh_{,un}lock_sock()
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Zefan Li <lizefan@huawei.com>
Dave Airlie [Thu, 20 Aug 2015 00:13:55 +0000 (10:13 +1000)]
drm/radeon: fix hotplug race at startup
commit
7f98ca454ad373fc1b76be804fa7138ff68c1d27 upstream.
We apparantly get a hotplug irq before we've initialised
modesetting,
[drm] Loading R100 Microcode
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<
c125f56f>] __mutex_lock_slowpath+0x23/0x91
*pde =
00000000
Oops: 0002 [#1]
Modules linked in: radeon(+) drm_kms_helper ttm drm i2c_algo_bit backlight pcspkr psmouse evdev sr_mod input_leds led_class cdrom sg parport_pc parport floppy intel_agp intel_gtt lpc_ich acpi_cpufreq processor button mfd_core agpgart uhci_hcd ehci_hcd rng_core snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm usbcore usb_common i2c_i801 i2c_core snd_timer snd soundcore thermal_sys
CPU: 0 PID: 15 Comm: kworker/0:1 Not tainted
4.2.0-rc7-00015-gbf67402 #111
Hardware name: MicroLink /D850MV , BIOS MV85010A.86A.0067.P24.
0304081124 04/08/2003
Workqueue: events radeon_hotplug_work_func [radeon]
task:
f6ca5900 ti:
f6d3e000 task.ti:
f6d3e000
EIP: 0060:[<
c125f56f>] EFLAGS:
00010282 CPU: 0
EIP is at __mutex_lock_slowpath+0x23/0x91
EAX:
00000000 EBX:
f5e900fc ECX:
00000000 EDX:
fffffffe
ESI:
f6ca5900 EDI:
f5e90100 EBP:
f5e90000 ESP:
f6d3ff0c
DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
CR0:
8005003b CR2:
00000000 CR3:
36f61000 CR4:
000006d0
Stack:
f5e90100 00000000 c103c4c1 f6d2a5a0 f5e900fc f6df394c c125f162 f8b0faca
f6d2a5a0 c138ca00 f6df394c f7395600 c1034741 00d40000 00000000 f6d2a5a0
c138ca00 f6d2a5b8 c138ca10 c1034b58 00000001 f6d40000 f6ca5900 f6d0c940
Call Trace:
[<
c103c4c1>] ? dequeue_task_fair+0xa4/0xb7
[<
c125f162>] ? mutex_lock+0x9/0xa
[<
f8b0faca>] ? radeon_hotplug_work_func+0x17/0x57 [radeon]
[<
c1034741>] ? process_one_work+0xfc/0x194
[<
c1034b58>] ? worker_thread+0x18d/0x218
[<
c10349cb>] ? rescuer_thread+0x1d5/0x1d5
[<
c103742a>] ? kthread+0x7b/0x80
[<
c12601c0>] ? ret_from_kernel_thread+0x20/0x30
[<
c10373af>] ? init_completion+0x18/0x18
Code: 42 08 e8 8e a6 dd ff c3 57 56 53 83 ec 0c 8b 35 48 f7 37 c1 8b 10 4a 74 1a 89 c3 8d 78 04 8b 40 08 89 63
Reported-and-Tested-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Zefan Li <lizefan@huawei.com>
Eric Dumazet [Wed, 30 Dec 2015 13:51:12 +0000 (08:51 -0500)]
udp: properly support MSG_PEEK with truncated buffers
commit
197c949e7798fbf28cfadc69d9ca0c2abbf93191 upstream.
Backport of this upstream commit into stable kernels :
89c22d8c3b27 ("net: Fix skb csum races when peeking")
exposed a bug in udp stack vs MSG_PEEK support, when user provides
a buffer smaller than skb payload.
In this case,
skb_copy_and_csum_datagram_iovec(skb, sizeof(struct udphdr),
msg->msg_iov);
returns -EFAULT.
This bug does not happen in upstream kernels since Al Viro did a great
job to replace this into :
skb_copy_and_csum_datagram_msg(skb, sizeof(struct udphdr), msg);
This variant is safe vs short buffers.
For the time being, instead reverting Herbert Xu patch and add back
skb->ip_summed invalid changes, simply store the result of
udp_lib_checksum_complete() so that we avoid computing the checksum a
second time, and avoid the problematic
skb_copy_and_csum_datagram_iovec() call.
This patch can be applied on recent kernels as it avoids a double
checksumming, then backported to stable kernels as a bug fix.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
[lizf: Backported to 3.4: adjust context]
Signed-off-by: Zefan Li <lizefan@huawei.com>
Herbert Xu [Mon, 13 Jul 2015 12:01:42 +0000 (20:01 +0800)]
net: Fix skb csum races when peeking
[ Upstream commit
89c22d8c3b278212eef6a8cc66b570bc840a6f5a ]
When we calculate the checksum on the recv path, we store the
result in the skb as an optimisation in case we need the checksum
again down the line.
This is in fact bogus for the MSG_PEEK case as this is done without
any locking. So multiple threads can peek and then store the result
to the same skb, potentially resulting in bogus skb states.
This patch fixes this by only storing the result if the skb is not
shared. This preserves the optimisations for the few cases where
it can be done safely due to locking or other reasons, e.g., SIOCINQ.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Zefan Li <lizefan@huawei.com>
Ben Hutchings [Wed, 23 Dec 2015 13:25:54 +0000 (13:25 +0000)]
USB: ti_usb_3410_502: Fix ID table size
Commit
35a2fbc941ac ("USB: serial: ti_usb_3410_5052: new device id for
Abbot strip port cable") failed to update the size of the
ti_id_table_3410 array. This doesn't need to be fixed upstream
following commit
d7ece6515e12 ("USB: ti_usb_3410_5052: remove
vendor/product module parameters") but should be fixed in stable
branches older than 3.12.
Backports of commit
c9d09dc7ad10 ("USB: serial: ti_usb_3410_5052: add
Abbott strip port ID to combined table as well.") similarly failed to
update the size of the ti_id_table_combined array.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Zefan Li <lizefan@huawei.com>
Eric Dumazet [Wed, 1 May 2013 05:24:03 +0000 (05:24 +0000)]
af_unix: fix a fatal race with bit fields
commit
60bc851ae59bfe99be6ee89d6bc50008c85ec75d upstream.
Using bit fields is dangerous on ppc64/sparc64, as the compiler [1]
uses 64bit instructions to manipulate them.
If the 64bit word includes any atomic_t or spinlock_t, we can lose
critical concurrent changes.
This is happening in af_unix, where unix_sk(sk)->gc_candidate/
gc_maybe_cycle/lock share the same 64bit word.
This leads to fatal deadlock, as one/several cpus spin forever
on a spinlock that will never be available again.
A safer way would be to use a long to store flags.
This way we are sure compiler/arch wont do bad things.
As we own unix_gc_lock spinlock when clearing or setting bits,
we can use the non atomic __set_bit()/__clear_bit().
recursion_level can share the same 64bit location with the spinlock,
as it is set only with this spinlock held.
[1] bug fixed in gcc-4.8.0 :
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52080
Reported-by: Ambrose Feinstein <ambrose@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: hejianet <hejianet@gmail.com>
Signed-off-by: Zefan Li <lizefan@huawei.com>
Francesco Ruggeri [Wed, 6 Jan 2016 08:18:48 +0000 (00:18 -0800)]
net: possible use after free in dst_release
commit
07a5d38453599052aff0877b16bb9c1585f08609 upstream.
dst_release should not access dst->flags after decrementing
__refcnt to 0. The dst_entry may be in dst_busy_list and
dst_gc_task may dst_destroy it before dst_release gets a chance
to access dst->flags.
Fixes: d69bbf88c8d0 ("net: fix a race in dst_release()")
Fixes: 27b75c95f10d ("net: avoid RCU for NOCACHE dst")
Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[lizf: Backported to 3.4: adjust context]
Signed-off-by: Zefan Li <lizefan@huawei.com>
Colin Ian King [Wed, 30 Dec 2015 23:06:41 +0000 (23:06 +0000)]
ftrace/scripts: Fix incorrect use of sprintf in recordmcount
commit
713a3e4de707fab49d5aa4bceb77db1058572a7b upstream.
Fix build warning:
scripts/recordmcount.c:589:4: warning: format not a string
literal and no format arguments [-Wformat-security]
sprintf("%s: failed\n", file);
Fixes: a50bd43935586 ("ftrace/scripts: Have recordmcount copy the object file")
Link: http://lkml.kernel.org/r/1451516801-16951-1-git-send-email-colin.king@canonical.com
Cc: Li Bin <huawei.libin@huawei.com>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Zefan Li <lizefan@huawei.com>
Andrew Banman [Tue, 29 Dec 2015 22:54:25 +0000 (14:54 -0800)]
mm/memory_hotplug.c: check for missing sections in test_pages_in_a_zone()
commit
5f0f2887f4de9508dcf438deab28f1de8070c271 upstream.
test_pages_in_a_zone() does not account for the possibility of missing
sections in the given pfn range. pfn_valid_within always returns 1 when
CONFIG_HOLES_IN_ZONE is not set, allowing invalid pfns from missing
sections to pass the test, leading to a kernel oops.
Wrap an additional pfn loop with PAGES_PER_SECTION granularity to check
for missing sections before proceeding into the zone-check code.
This also prevents a crash from offlining memory devices with missing
sections. Despite this, it may be a good idea to keep the related patch
'[PATCH 3/3] drivers: memory: prohibit offlining of memory blocks with
missing sections' because missing sections in a memory block may lead to
other problems not covered by the scope of this fix.
Signed-off-by: Andrew Banman <abanman@sgi.com>
Acked-by: Alex Thorlton <athorlton@sgi.com>
Cc: Russ Anderson <rja@sgi.com>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Greg KH <greg@kroah.com>
Cc: Seth Jennings <sjennings@variantweb.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Zefan Li <lizefan@huawei.com>
Joseph Qi [Tue, 29 Dec 2015 22:54:06 +0000 (14:54 -0800)]
ocfs2: fix BUG when calculate new backup super
commit
5c9ee4cbf2a945271f25b89b137f2c03bbc3be33 upstream.
When resizing, it firstly extends the last gd. Once it should backup
super in the gd, it calculates new backup super and update the
corresponding value.
But it currently doesn't consider the situation that the backup super is
already done. And in this case, it still sets the bit in gd bitmap and
then decrease from bg_free_bits_count, which leads to a corrupted gd and
trigger the BUG in ocfs2_block_group_set_bits:
BUG_ON(le16_to_cpu(bg->bg_free_bits_count) < num_bits);
So check whether the backup super is done and then do the updates.
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: Jiufei Xue <xuejiufei@huawei.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[lizf: Backported to 3.4: adjust context]
Signed-off-by: Zefan Li <lizefan@huawei.com>
Andrey Ryabinin [Mon, 21 Dec 2015 09:54:45 +0000 (12:54 +0300)]
ipv6/addrlabel: fix ip6addrlbl_get()
commit
e459dfeeb64008b2d23bdf600f03b3605dbb8152 upstream.
ip6addrlbl_get() has never worked. If ip6addrlbl_hold() succeeded,
ip6addrlbl_get() will exit with '-ESRCH'. If ip6addrlbl_hold() failed,
ip6addrlbl_get() will use about to be free ip6addrlbl_entry pointer.
Fix this by inverting ip6addrlbl_hold() check.
Fixes: 2a8cc6c89039 ("[IPV6] ADDRCONF: Support RFC3484 configurable address selection policy table.")
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reviewed-by: Cong Wang <cwang@twopensource.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Zefan Li <lizefan@huawei.com>
Helge Deller [Mon, 21 Dec 2015 09:03:30 +0000 (10:03 +0100)]
parisc: Fix syscall restarts
commit
71a71fb5374a23be36a91981b5614590b9e722c3 upstream.
On parisc syscalls which are interrupted by signals sometimes failed to
restart and instead returned -ENOSYS which in the worst case lead to
userspace crashes.
A similiar problem existed on MIPS and was fixed by commit
e967ef02
("MIPS: Fix restart of indirect syscalls").
On parisc the current syscall restart code assumes that all syscall
callers load the syscall number in the delay slot of the ble
instruction. That's how it is e.g. done in the unistd.h header file:
ble 0x100(%sr2, %r0)
ldi #syscall_nr, %r20
Because of that assumption the current code never restored %r20 before
returning to userspace.
This assumption is at least not true for code which uses the glibc
syscall() function, which instead uses this syntax:
ble 0x100(%sr2, %r0)
copy regX, %r20
where regX depend on how the compiler optimizes the code and register
usage.
This patch fixes this problem by adding code to analyze how the syscall
number is loaded in the delay branch and - if needed - copy the syscall
number to regX prior returning to userspace for the syscall restart.
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
[lizf: Backported to 3.4: adjust context]
Signed-off-by: Zefan Li <lizefan@huawei.com>
David Howells [Fri, 18 Dec 2015 01:34:26 +0000 (01:34 +0000)]
KEYS: Fix race between read and revoke
commit
b4a1b4f5047e4f54e194681125c74c0aa64d637d upstream.
This fixes CVE-2015-7550.
There's a race between keyctl_read() and keyctl_revoke(). If the revoke
happens between keyctl_read() checking the validity of a key and the key's
semaphore being taken, then the key type read method will see a revoked key.
This causes a problem for the user-defined key type because it assumes in
its read method that there will always be a payload in a non-revoked key
and doesn't check for a NULL pointer.
Fix this by making keyctl_read() check the validity of a key after taking
semaphore instead of before.
I think the bug was introduced with the original keyrings code.
This was discovered by a multithreaded test program generated by syzkaller
(http://github.com/google/syzkaller). Here's a cleaned up version:
#include <sys/types.h>
#include <keyutils.h>
#include <pthread.h>
void *thr0(void *arg)
{
key_serial_t key = (unsigned long)arg;
keyctl_revoke(key);
return 0;
}
void *thr1(void *arg)
{
key_serial_t key = (unsigned long)arg;
char buffer[16];
keyctl_read(key, buffer, 16);
return 0;
}
int main()
{
key_serial_t key = add_key("user", "%", "foo", 3, KEY_SPEC_USER_KEYRING);
pthread_t th[5];
pthread_create(&th[0], 0, thr0, (void *)(unsigned long)key);
pthread_create(&th[1], 0, thr1, (void *)(unsigned long)key);
pthread_create(&th[2], 0, thr0, (void *)(unsigned long)key);
pthread_create(&th[3], 0, thr1, (void *)(unsigned long)key);
pthread_join(th[0], 0);
pthread_join(th[1], 0);
pthread_join(th[2], 0);
pthread_join(th[3], 0);
return 0;
}
Build as:
cc -o keyctl-race keyctl-race.c -lkeyutils -lpthread
Run as:
while keyctl-race; do :; done
as it may need several iterations to crash the kernel. The crash can be
summarised as:
BUG: unable to handle kernel NULL pointer dereference at
0000000000000010
IP: [<
ffffffff81279b08>] user_read+0x56/0xa3
...
Call Trace:
[<
ffffffff81276aa9>] keyctl_read_key+0xb6/0xd7
[<
ffffffff81277815>] SyS_keyctl+0x83/0xe0
[<
ffffffff815dbb97>] entry_SYSCALL_64_fastpath+0x12/0x6f
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Zefan Li <lizefan@huawei.com>
Alan Stern [Wed, 16 Dec 2015 18:32:38 +0000 (13:32 -0500)]
USB: fix invalid memory access in hub_activate()
commit
e50293ef9775c5f1cf3fcc093037dd6a8c5684ea upstream.
Commit
8520f38099cc ("USB: change hub initialization sleeps to
delayed_work") changed the hub_activate() routine to make part of it
run in a workqueue. However, the commit failed to take a reference to
the usb_hub structure or to lock the hub interface while doing so. As
a result, if a hub is plugged in and quickly unplugged before the work
routine can run, the routine will try to access memory that has been
deallocated. Or, if the hub is unplugged while the routine is
running, the memory may be deallocated while it is in active use.
This patch fixes the problem by taking a reference to the usb_hub at
the start of hub_activate() and releasing it at the end (when the work
is finished), and by locking the hub interface while the work routine
is running. It also adds a check at the start of the routine to see
if the hub has already been disconnected, in which nothing should be
done.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: Alexandru Cornea <alexandru.cornea@intel.com>
Tested-by: Alexandru Cornea <alexandru.cornea@intel.com>
Fixes: 8520f38099cc ("USB: change hub initialization sleeps to delayed_work")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[lizf: Backported to 3.4: add forward declaration of hub_release()]
Signed-off-by: Zefan Li <lizefan@huawei.com>
Dan Carpenter [Wed, 16 Dec 2015 11:06:37 +0000 (14:06 +0300)]
USB: ipaq.c: fix a timeout loop
commit
abdc9a3b4bac97add99e1d77dc6d28623afe682b upstream.
The code expects the loop to end with "retries" set to zero but, because
it is a post-op, it will end set to -1. I have fixed this by moving the
decrement inside the loop.
Fixes: 014aa2a3c32e ('USB: ipaq: minor ipaq_open() cleanup.')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zefan Li <lizefan@huawei.com>
Konrad Rzeszutek Wilk [Mon, 2 Nov 2015 23:13:27 +0000 (18:13 -0500)]
xen/pciback: Don't allow MSI-X ops if PCI_COMMAND_MEMORY is not set.
commit
408fb0e5aa7fda0059db282ff58c3b2a4278baa0 upstream.
commit
f598282f51 ("PCI: Fix the NIU MSI-X problem in a better way")
teaches us that dealing with MSI-X can be troublesome.
Further checks in the MSI-X architecture shows that if the
PCI_COMMAND_MEMORY bit is turned of in the PCI_COMMAND we
may not be able to access the BAR (since they are memory regions).
Since the MSI-X tables are located in there.. that can lead
to us causing PCIe errors. Inhibit us performing any
operation on the MSI-X unless the MEMORY bit is set.
Note that Xen hypervisor with:
"x86/MSI-X: access MSI-X table only after having enabled MSI-X"
will return:
xen_pciback: 0000:0a:00.1: error -6 enabling MSI-X for guest 3!
When the generic MSI code tries to setup the PIRQ without
MEMORY bit set. Which means with later versions of Xen
(4.6) this patch is not neccessary.
This is part of XSA-157
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Zefan Li <lizefan@huawei.com>
Konrad Rzeszutek Wilk [Wed, 1 Apr 2015 14:49:47 +0000 (10:49 -0400)]
xen/pciback: For XEN_PCI_OP_disable_msi[|x] only disable if device has MSI(X) enabled.
commit
7cfb905b9638982862f0331b36ccaaca5d383b49 upstream.
Otherwise just continue on, returning the same values as
previously (return of 0, and op->result has the PIRQ value).
This does not change the behavior of XEN_PCI_OP_disable_msi[|x].
The pci_disable_msi or pci_disable_msix have the checks for
msi_enabled or msix_enabled so they will error out immediately.
However the guest can still call these operations and cause
us to disable the 'ack_intr'. That means the backend IRQ handler
for the legacy interrupt will not respond to interrupts anymore.
This will lead to (if the device is causing an interrupt storm)
for the Linux generic code to disable the interrupt line.
Naturally this will only happen if the device in question
is plugged in on the motherboard on shared level interrupt GSI.
This is part of XSA-157
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Zefan Li <lizefan@huawei.com>
Konrad Rzeszutek Wilk [Mon, 2 Nov 2015 22:24:08 +0000 (17:24 -0500)]
xen/pciback: Do not install an IRQ handler for MSI interrupts.
commit
a396f3a210c3a61e94d6b87ec05a75d0be2a60d0 upstream.
Otherwise an guest can subvert the generic MSI code to trigger
an BUG_ON condition during MSI interrupt freeing:
for (i = 0; i < entry->nvec_used; i++)
BUG_ON(irq_has_action(entry->irq + i));
Xen PCI backed installs an IRQ handler (request_irq) for
the dev->irq whenever the guest writes PCI_COMMAND_MEMORY
(or PCI_COMMAND_IO) to the PCI_COMMAND register. This is
done in case the device has legacy interrupts the GSI line
is shared by the backend devices.
To subvert the backend the guest needs to make the backend
to change the dev->irq from the GSI to the MSI interrupt line,
make the backend allocate an interrupt handler, and then command
the backend to free the MSI interrupt and hit the BUG_ON.
Since the backend only calls 'request_irq' when the guest
writes to the PCI_COMMAND register the guest needs to call
XEN_PCI_OP_enable_msi before any other operation. This will
cause the generic MSI code to setup an MSI entry and
populate dev->irq with the new PIRQ value.
Then the guest can write to PCI_COMMAND PCI_COMMAND_MEMORY
and cause the backend to setup an IRQ handler for dev->irq
(which instead of the GSI value has the MSI pirq). See
'xen_pcibk_control_isr'.
Then the guest disables the MSI: XEN_PCI_OP_disable_msi
which ends up triggering the BUG_ON condition in 'free_msi_irqs'
as there is an IRQ handler for the entry->irq (dev->irq).
Note that this cannot be done using MSI-X as the generic
code does not over-write dev->irq with the MSI-X PIRQ values.
The patch inhibits setting up the IRQ handler if MSI or
MSI-X (for symmetry reasons) code had been called successfully.
P.S.
Xen PCIBack when it sets up the device for the guest consumption
ends up writting 0 to the PCI_COMMAND (see xen_pcibk_reset_device).
XSA-120 addendum patch removed that - however when upstreaming said
addendum we found that it caused issues with qemu upstream. That
has now been fixed in qemu upstream.
This is part of XSA-157
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Zefan Li <lizefan@huawei.com>
Konrad Rzeszutek Wilk [Mon, 2 Nov 2015 23:07:44 +0000 (18:07 -0500)]
xen/pciback: Return error on XEN_PCI_OP_enable_msix when device has MSI or MSI-X enabled
commit
5e0ce1455c09dd61d029b8ad45d82e1ac0b6c4c9 upstream.
The guest sequence of:
a) XEN_PCI_OP_enable_msix
b) XEN_PCI_OP_enable_msix
results in hitting an NULL pointer due to using freed pointers.
The device passed in the guest MUST have MSI-X capability.
The a) constructs and SysFS representation of MSI and MSI groups.
The b) adds a second set of them but adding in to SysFS fails (duplicate entry).
'populate_msi_sysfs' frees the newly allocated msi_irq_groups (note that
in a) pdev->msi_irq_groups is still set) and also free's ALL of the
MSI-X entries of the device (the ones allocated in step a) and b)).
The unwind code: 'free_msi_irqs' deletes all the entries and tries to
delete the pdev->msi_irq_groups (which hasn't been set to NULL).
However the pointers in the SysFS are already freed and we hit an
NULL pointer further on when 'strlen' is attempted on a freed pointer.
The patch adds a simple check in the XEN_PCI_OP_enable_msix to guard
against that. The check for msi_enabled is not stricly neccessary.
This is part of XSA-157
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Zefan Li <lizefan@huawei.com>
Konrad Rzeszutek Wilk [Fri, 3 Apr 2015 15:08:22 +0000 (11:08 -0400)]
xen/pciback: Return error on XEN_PCI_OP_enable_msi when device has MSI or MSI-X enabled
commit
56441f3c8e5bd45aab10dd9f8c505dd4bec03b0d upstream.
The guest sequence of:
a) XEN_PCI_OP_enable_msi
b) XEN_PCI_OP_enable_msi
c) XEN_PCI_OP_disable_msi
results in hitting an BUG_ON condition in the msi.c code.
The MSI code uses an dev->msi_list to which it adds MSI entries.
Under the above conditions an BUG_ON() can be hit. The device
passed in the guest MUST have MSI capability.
The a) adds the entry to the dev->msi_list and sets msi_enabled.
The b) adds a second entry but adding in to SysFS fails (duplicate entry)
and deletes all of the entries from msi_list and returns (with msi_enabled
is still set). c) pci_disable_msi passes the msi_enabled checks and hits:
BUG_ON(list_empty(dev_to_msi_list(&dev->dev)));
and blows up.
The patch adds a simple check in the XEN_PCI_OP_enable_msi to guard
against that. The check for msix_enabled is not stricly neccessary.
This is part of XSA-157.
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Zefan Li <lizefan@huawei.com>
Konrad Rzeszutek Wilk [Mon, 16 Nov 2015 17:40:48 +0000 (12:40 -0500)]
xen/pciback: Save xen_pci_op commands before processing it
commit
8135cf8b092723dbfcc611fe6fdcb3a36c9951c5 upstream.
Double fetch vulnerabilities that happen when a variable is
fetched twice from shared memory but a security check is only
performed the first time.
The xen_pcibk_do_op function performs a switch statements on the op->cmd
value which is stored in shared memory. Interestingly this can result
in a double fetch vulnerability depending on the performed compiler
optimization.
This patch fixes it by saving the xen_pci_op command before
processing it. We also use 'barrier' to make sure that the
compiler does not perform any optimization.
This is part of XSA155.
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Jan Beulich <JBeulich@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Zefan Li <lizefan@huawei.com>
Roger Pau Monné [Tue, 3 Nov 2015 16:34:09 +0000 (16:34 +0000)]
xen-blkback: only read request operation from shared ring once
commit
1f13d75ccb806260079e0679d55d9253e370ec8a upstream.
A compiler may load a switch statement value multiple times, which could
be bad when the value is in memory shared with the frontend.
When converting a non-native request to a native one, ensure that
src->operation is only loaded once by using READ_ONCE().
This is part of XSA155.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[lizf: Backported to 3.4:
- adjust context
- call ACCESS_ONCE instead of READ_ONCE]
Signed-off-by: Zefan Li <lizefan@huawei.com>
David Vrabel [Fri, 30 Oct 2015 15:17:06 +0000 (15:17 +0000)]
xen-netback: use RING_COPY_REQUEST() throughout
commit
68a33bfd8403e4e22847165d149823a2e0e67c9c upstream.
Instead of open-coding memcpy()s and directly accessing Tx and Rx
requests, use the new RING_COPY_REQUEST() that ensures the local copy
is correct.
This is more than is strictly necessary for guest Rx requests since
only the id and gref fields are used and it is harmless if the
frontend modifies these.
This is part of XSA155.
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[lizf: Backported to 3.4:
- adjust context
- s/queue/vif/g]
Signed-off-by: Zefan Li <lizefan@huawei.com>
David Vrabel [Fri, 30 Oct 2015 15:16:01 +0000 (15:16 +0000)]
xen-netback: don't use last request to determine minimum Tx credit
commit
0f589967a73f1f30ab4ac4dd9ce0bb399b4d6357 upstream.
The last from guest transmitted request gives no indication about the
minimum amount of credit that the guest might need to send a packet
since the last packet might have been a small one.
Instead allow for the worst case 128 KiB packet.
This is part of XSA155.
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[lizf: Backported to 3.4: s/queue/vif/g]
Signed-off-by: Zefan Li <lizefan@huawei.com>
David Vrabel [Fri, 30 Oct 2015 14:58:08 +0000 (14:58 +0000)]
xen: Add RING_COPY_REQUEST()
commit
454d5d882c7e412b840e3c99010fe81a9862f6fb upstream.
Using RING_GET_REQUEST() on a shared ring is easy to use incorrectly
(i.e., by not considering that the other end may alter the data in the
shared ring while it is being inspected). Safe usage of a request
generally requires taking a local copy.
Provide a RING_COPY_REQUEST() macro to use instead of
RING_GET_REQUEST() and an open-coded memcpy(). This takes care of
ensuring that the copy is done correctly regardless of any possible
compiler optimizations.
Use a volatile source to prevent the compiler from reordering or
omitting the copy.
This is part of XSA155.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Zefan Li <lizefan@huawei.com>