PA-RISC systems with PA8800 and PA8900 processors have had problems with random segmentation faults for many years. Systems with earlier processors are much more stable.

Systems with PA8800 and PA8900 processors have a large L2 cache which needs per page flushing for decent performance when a large range is flushed. The combined cache in these systems is also more sensitive to non-equivalent aliases than the caches in earlier systems.

The majority of random segmentation faults that I have looked at appear to be memory corruption in memory allocated using mmap and malloc.

My first attempt at fixing the random faults didn't work. On reviewing the cache code, I realized that there were two issues which the existing code didn't handle correctly. Both relate to cache move-in. Another issue is that the present bit in PTEs is racy.

1) PA-RISC caches have a mind of their own and they can speculatively load data and instructions for a page as long as there is an entry in the TLB for the page which allows move-in. TLBs are local to each CPU. Thus, the TLB entry for a page must be purged before flushing the page. This is particularly important on SMP systems.

In some of the flush routines, the flush routine would be called and then the TLB entry would be purged. This was because the flush routine needed the TLB entry to do the flush.

2) My initial approach to fixing the random faults was to try to use flush_cache_page_if_present for all flush operations. This actually made things worse and led to a couple of hardware lockups. It finally dawned on me that some lines weren't being flushed because the pte check code was racy. This resulted in random inequivalent mappings to physical pages.

The __flush_cache_page tmpalias flush sets up its own TLB entry and it doesn't need the existing TLB entry. As long as we can find the pte pointer for the vm page, we can get the pfn and physical address of the page. We can also purge the TLB entry for the page before doing the flush.

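To illustrate why the order of the TLB purge and the cache flush matters, here is a minimal user-space sketch. It is a toy model, not the kernel code: the `toy_*` names, the single-page state and the worst-case "hardware speculates at every opportunity" assumption are all invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one page on one CPU. All names are hypothetical. */
struct toy_page {
	bool tlb_valid;   /* a TLB entry that allows move-in exists */
	bool line_cached; /* at least one line of the page is in the cache */
};

/* Assumed worst case: hardware moves lines in whenever a TLB entry exists. */
static void toy_speculative_movein(struct toy_page *p)
{
	if (p->tlb_valid)
		p->line_cached = true;
}

static void toy_purge_tlb(struct toy_page *p)
{
	p->tlb_valid = false;
}

static void toy_flush_cache(struct toy_page *p)
{
	p->line_cached = false;
}

/* Old ordering: flush first, purge afterwards. Returns true if a line survives. */
bool toy_flush_then_purge(struct toy_page *p)
{
	assert(p != NULL);
	toy_flush_cache(p);
	toy_speculative_movein(p); /* TLB entry still valid: line comes back */
	toy_purge_tlb(p);
	return p->line_cached;
}

/* New ordering: purge first, then flush. Returns true if a line survives. */
bool toy_purge_then_flush(struct toy_page *p)
{
	assert(p != NULL);
	toy_purge_tlb(p);
	toy_speculative_movein(p); /* no TLB entry: nothing is moved in */
	toy_flush_cache(p);
	toy_speculative_movein(p); /* still nothing to move in */
	return p->line_cached;
}
```

In this model, a page that starts with a valid TLB entry keeps a cached line after `toy_flush_then_purge()` but not after `toy_purge_then_flush()`. The real flush path additionally relies on the tmpalias mapping, which the sketch does not model.
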
Further, __flush_cache_page uses a special TLB entry that inhibits cache move-in.

When switching page mappings, we need to ensure that lines are removed from the cache. It is not sufficient to just flush the lines to memory as they may come back.

This made it clear that we needed to implement all the required flush operations using tmpalias routines. This includes flushes for user and kernel pages.

After modifying the code to use tmpalias flushes, it became clear that the random segmentation faults were not fully resolved. The frequency of faults was worse on systems with a 64 MB L2 (PA8900) and systems with more CPUs (rp4440).

The warning that I added to flush_cache_page_if_present to detect pages that couldn't be flushed triggered frequently on some systems.

Helge and I looked at the pages that couldn't be flushed and found that the PTE was either cleared or for a swap page. Ignoring pages that were swapped out seemed okay but pages with cleared PTEs seemed problematic.

I looked at routines related to pte_clear and noticed ptep_clear_flush. The default implementation just flushes the TLB entry. However, it was obvious that on parisc we need to flush the cache page as well. If we don't flush the cache page, stale lines will be left in the cache and cause random corruption. Once a PTE is cleared, there is no way to find the physical address associated with the PTE and flush the associated page at a later time.

I implemented an updated change with a parisc specific version of ptep_clear_flush. It fixed the random data corruption on Helge's rp4440 and rp3440, as well as on my c8000.

At this point, I realized that I could restore the code where we only flush in flush_cache_page_if_present if the page has been accessed. However, for this, we also need to flush the cache when the accessed bit is cleared in ptep_clear_flush_young to keep things synchronized. The default implementation only flushes the TLB entry.

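The ptep_clear_flush problem can be sketched in user space as follows. This is only a toy model under stated assumptions: the `toy_*` names, the four-page "cache" array and the simplified PTE layout are hypothetical and do not correspond to the actual parisc implementation. The point it shows is that the pfn must be read from the PTE before it is cleared, because afterwards there is nothing left to flush by.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NPAGES 4

/* Simplified PTE: a present bit and a page frame number (hypothetical). */
struct toy_pte {
	bool present;
	unsigned int pfn;
};

/* One flag per physical page: does the cache still hold lines for it? */
static bool toy_cached[TOY_NPAGES];
static bool toy_tlb_valid;

/* Model an access through the mapping: fills the TLB and the cache. */
void toy_touch(const struct toy_pte *ptep)
{
	assert(ptep != NULL);
	if (ptep->present && ptep->pfn < TOY_NPAGES) {
		toy_tlb_valid = true;
		toy_cached[ptep->pfn] = true;
	}
}

/* Generic-style behaviour: clear the PTE and purge only the TLB entry. */
struct toy_pte toy_clear_flush_tlb_only(struct toy_pte *ptep)
{
	struct toy_pte old = *ptep;

	ptep->present = false;
	ptep->pfn = 0;          /* the physical address is lost from here on */
	toy_tlb_valid = false;
	return old;
}

/* Arch-style behaviour: also flush the cache page, using the saved pfn. */
struct toy_pte toy_clear_flush_with_cache(struct toy_pte *ptep)
{
	struct toy_pte old = *ptep; /* capture the pfn before clearing */

	ptep->present = false;
	ptep->pfn = 0;
	toy_tlb_valid = false;
	if (old.present && old.pfn < TOY_NPAGES)
		toy_cached[old.pfn] = false;
	return old;
}

bool toy_page_cached(unsigned int pfn)
{
	return pfn < TOY_NPAGES && toy_cached[pfn];
}
```

With the TLB-only variant, `toy_page_cached()` still reports lines for the page after the PTE is gone, which corresponds to the stale lines described above. With the second variant the page is clean once the PTE has been cleared.
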
Other changes in this version are:

1) Implement parisc specific version of ptep_get. It's identical to default but needed in arch/parisc/include/asm/pgtable.h.

2) Revise parisc implementation of ptep_test_and_clear_young to use ptep_get (READ_ONCE).

3) Drop parisc implementation of ptep_get_and_clear. We can use default.

4) Revise flush_kernel_vmap_range and invalidate_kernel_vmap_range to use full data cache flush.

5) Move flush_cache_vmap and flush_cache_vunmap to cache.c. Handle VM_IOREMAP case in flush_cache_vmap.

At this time, I don't know whether it is better to always flush when the PTE present bit is set or when both the accessed and present bits are set. The latter saves flushing pages that haven't been accessed, but we need to flush in ptep_clear_flush_young. It also needs a page table lookup to find the PTE pointer. The lpa instruction only needs a page table lookup when the PTE entry isn't in the TLB.

We don't atomically handle setting and clearing the _PAGE_ACCESSED bit. If we miss an update, we may miss a flush and the cache may get corrupted. Whether the current code is effectively atomic depends on process control.

When CONFIG_FLUSH_PAGE_ACCESSED is set to zero, the page will eventually be flushed when the PTE is cleared or in flush_cache_page_if_present. The _PAGE_ACCESSED bit is not used, so the problem is avoided.

The flush method can be selected using the CONFIG_FLUSH_PAGE_ACCESSED define in cache.c. The default is 0. I didn't see a large difference in performance.

Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.6+
Signed-off-by: Helge Deller <deller@gmx.de>

87 lines
3.1 KiB
C
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _PARISC_CACHEFLUSH_H
#define _PARISC_CACHEFLUSH_H

#include <linux/mm.h>
#include <linux/uaccess.h>
#include <asm/tlbflush.h>

/* The usual comment is "Caches aren't brain-dead on the <architecture>".
 * Unfortunately, that doesn't apply to PA-RISC. */

#include <linux/jump_label.h>

DECLARE_STATIC_KEY_TRUE(parisc_has_cache);
DECLARE_STATIC_KEY_TRUE(parisc_has_dcache);
DECLARE_STATIC_KEY_TRUE(parisc_has_icache);

#define flush_cache_dup_mm(mm) flush_cache_mm(mm)

void flush_user_icache_range_asm(unsigned long, unsigned long);
void flush_kernel_icache_range_asm(unsigned long, unsigned long);
void flush_user_dcache_range_asm(unsigned long, unsigned long);
void flush_kernel_dcache_range_asm(unsigned long, unsigned long);
void purge_kernel_dcache_range_asm(unsigned long, unsigned long);
void flush_kernel_dcache_page_asm(const void *addr);
void flush_kernel_icache_page(void *);

/* Cache flush operations */

void flush_cache_all_local(void);
void flush_cache_all(void);
void flush_cache_mm(struct mm_struct *mm);

#define flush_kernel_dcache_range(start,size) \
	flush_kernel_dcache_range_asm((start), (start)+(size));

/* The only way to flush a vmap range is to flush whole cache */
#define ARCH_IMPLEMENTS_FLUSH_KERNEL_VMAP_RANGE 1
void flush_kernel_vmap_range(void *vaddr, int size);
void invalidate_kernel_vmap_range(void *vaddr, int size);

void flush_cache_vmap(unsigned long start, unsigned long end);
#define flush_cache_vmap_early(start, end)	do { } while (0)
void flush_cache_vunmap(unsigned long start, unsigned long end);

void flush_dcache_folio(struct folio *folio);
#define flush_dcache_folio flush_dcache_folio
#define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
static inline void flush_dcache_page(struct page *page)
{
	flush_dcache_folio(page_folio(page));
}

#define flush_dcache_mmap_lock(mapping)		xa_lock_irq(&mapping->i_pages)
#define flush_dcache_mmap_unlock(mapping)	xa_unlock_irq(&mapping->i_pages)
#define flush_dcache_mmap_lock_irqsave(mapping, flags)		\
		xa_lock_irqsave(&mapping->i_pages, flags)
#define flush_dcache_mmap_unlock_irqrestore(mapping, flags)	\
		xa_unlock_irqrestore(&mapping->i_pages, flags)

void flush_icache_pages(struct vm_area_struct *vma, struct page *page,
		unsigned int nr);
#define flush_icache_pages flush_icache_pages

#define flush_icache_range(s,e)		do {		\
	flush_kernel_dcache_range_asm(s,e);		\
	flush_kernel_icache_range_asm(s,e);		\
} while (0)

void copy_to_user_page(struct vm_area_struct *vma, struct page *page,
		unsigned long user_vaddr, void *dst, void *src, int len);
void copy_from_user_page(struct vm_area_struct *vma, struct page *page,
		unsigned long user_vaddr, void *dst, void *src, int len);
void flush_cache_page(struct vm_area_struct *vma, unsigned long vmaddr,
		unsigned long pfn);
void flush_cache_range(struct vm_area_struct *vma,
		unsigned long start, unsigned long end);

#define ARCH_HAS_FLUSH_ANON_PAGE
void flush_anon_page(struct vm_area_struct *vma, struct page *page, unsigned long vmaddr);

#define ARCH_HAS_FLUSH_ON_KUNMAP
void kunmap_flush_on_unmap(const void *addr);

#endif /* _PARISC_CACHEFLUSH_H */