LCOV - code coverage report
Current view: top level - mm - percpu.c (source / functions)
Test: combined.info
Date: 2022-04-01 13:59:58

              Hit    Total    Coverage
Lines:        824     1069      77.1 %
Functions:     35       42      83.3 %
Branches:     355      614      57.8 %

           Branch data     Line data    Source code
       1                 :            : // SPDX-License-Identifier: GPL-2.0-only
       2                 :            : /*
       3                 :            :  * mm/percpu.c - percpu memory allocator
       4                 :            :  *
       5                 :            :  * Copyright (C) 2009           SUSE Linux Products GmbH
       6                 :            :  * Copyright (C) 2009           Tejun Heo <tj@kernel.org>
       7                 :            :  *
       8                 :            :  * Copyright (C) 2017           Facebook Inc.
       9                 :            :  * Copyright (C) 2017           Dennis Zhou <dennisszhou@gmail.com>
      10                 :            :  *
      11                 :            :  * The percpu allocator handles both static and dynamic areas.  Percpu
      12                 :            :  * areas are allocated in chunks which are divided into units.  There is
      13                 :            :  * a 1-to-1 mapping for units to possible cpus.  These units are grouped
      14                 :            :  * based on NUMA properties of the machine.
      15                 :            :  *
      16                 :            :  *  c0                           c1                         c2
      17                 :            :  *  -------------------          -------------------        ------------
      18                 :            :  * | u0 | u1 | u2 | u3 |        | u0 | u1 | u2 | u3 |      | u0 | u1 | u
      19                 :            :  *  -------------------  ......  -------------------  ....  ------------
      20                 :            :  *
      21                 :            :  * Allocation is done by offsets into a unit's address space.  I.e., an
      22                 :            :  * area of 512 bytes at 6k in c1 occupies 512 bytes at 6k in c1:u0,
      23                 :            :  * c1:u1, c1:u2, etc.  On NUMA machines, the mapping may be non-linear
      24                 :            :  * and even sparse.  Access is handled by configuring percpu base
      25                 :            :  * registers according to the cpu to unit mappings and offsetting the
      26                 :            :  * base address using pcpu_unit_size.
      27                 :            :  *
      28                 :            :  * There is special consideration for the first chunk which must handle
      29                 :            :  * the static percpu variables in the kernel image as allocation services
      30                 :            :  * are not online yet.  In short, the first chunk is structured like so:
      31                 :            :  *
      32                 :            :  *                  <Static | [Reserved] | Dynamic>
      33                 :            :  *
      34                 :            :  * The static data is copied from the original section managed by the
      35                 :            :  * linker.  The reserved section, if non-zero, primarily manages static
      36                 :            :  * percpu variables from kernel modules.  Finally, the dynamic section
      37                 :            :  * takes care of normal allocations.
      38                 :            :  *
      39                 :            :  * The allocator organizes chunks into lists according to free size and
      40                 :            :  * tries to allocate from the fullest chunk first.  Each chunk is managed
      41                 :            :  * by a bitmap with metadata blocks.  The allocation map is updated on
      42                 :            :  * every allocation and free to reflect the current state while the boundary
      43                 :            :  * map is only updated on allocation.  Each metadata block contains
      44                 :            :  * information to help mitigate the need to iterate over large portions
      45                 :            :  * of the bitmap.  The reverse mapping from page to chunk is stored in
      46                 :            :  * the page's index.  Lastly, units are lazily backed and grow in unison.
      47                 :            :  *
      48                 :            :  * There is a unique conversion that goes on here between bytes and bits.
      49                 :            :  * Each bit represents a fragment of size PCPU_MIN_ALLOC_SIZE.  The chunk
      50                 :            :  * tracks the number of pages it is responsible for in nr_pages.  Helper
      51                 :            :  * functions are used to convert between bytes, bits, and blocks.
      52                 :            :  * All hints are managed in bits unless explicitly stated.
      53                 :            :  *
      54                 :            :  * To use this allocator, arch code should do the following:
      55                 :            :  *
      56                 :            :  * - define __addr_to_pcpu_ptr() and __pcpu_ptr_to_addr() to translate
      57                 :            :  *   regular address to percpu pointer and back if they need to be
      58                 :            :  *   different from the default
      59                 :            :  *
      60                 :            :  * - use pcpu_setup_first_chunk() during percpu area initialization to
      61                 :            :  *   set up the first chunk containing the kernel static percpu area
      62                 :            :  */
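
The addressing rule described in the header comment (an area at a given offset occupies the same offset in every unit of its chunk) can be sketched in user-space C. This is only an illustration under stated assumptions: the four-unit, 32 KiB linear layout and every `demo_` name are hypothetical and are not taken from percpu.c; on NUMA machines the real unit offsets may be non-linear and sparse.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: 4 units of 32 KiB placed back to back. */
#define DEMO_NR_UNITS  4
#define DEMO_UNIT_SIZE ((size_t)32 * 1024)

/* cpu -> unit offset table, similar in spirit to pcpu_unit_offsets[]. */
static const size_t demo_unit_offsets[DEMO_NR_UNITS] = {
	0 * DEMO_UNIT_SIZE, 1 * DEMO_UNIT_SIZE,
	2 * DEMO_UNIT_SIZE, 3 * DEMO_UNIT_SIZE,
};

/*
 * Address of one cpu's copy of an area:
 * chunk base + that cpu's unit offset + the area's offset within a unit.
 */
uintptr_t demo_per_cpu_addr(uintptr_t chunk_base, unsigned int cpu,
			    size_t area_off)
{
	assert(cpu < DEMO_NR_UNITS);
	assert(area_off < DEMO_UNIT_SIZE);
	return chunk_base + demo_unit_offsets[cpu] + area_off;
}
```

With this layout, the 512-byte area "at 6k" from the comment starts at `chunk_base + 6144` for cpu 0 and exactly one unit size further on for each subsequent cpu.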
      63                 :            : 
      64                 :            : #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
      65                 :            : 
      66                 :            : #include <linux/bitmap.h>
      67                 :            : #include <linux/memblock.h>
      68                 :            : #include <linux/err.h>
      69                 :            : #include <linux/lcm.h>
      70                 :            : #include <linux/list.h>
      71                 :            : #include <linux/log2.h>
      72                 :            : #include <linux/mm.h>
      73                 :            : #include <linux/module.h>
      74                 :            : #include <linux/mutex.h>
      75                 :            : #include <linux/percpu.h>
      76                 :            : #include <linux/pfn.h>
      77                 :            : #include <linux/slab.h>
      78                 :            : #include <linux/spinlock.h>
      79                 :            : #include <linux/vmalloc.h>
      80                 :            : #include <linux/workqueue.h>
      81                 :            : #include <linux/kmemleak.h>
      82                 :            : #include <linux/sched.h>
      83                 :            : 
      84                 :            : #include <asm/cacheflush.h>
      85                 :            : #include <asm/sections.h>
      86                 :            : #include <asm/tlbflush.h>
      87                 :            : #include <asm/io.h>
      88                 :            : 
      89                 :            : #define CREATE_TRACE_POINTS
      90                 :            : #include <trace/events/percpu.h>
      91                 :            : 
      92                 :            : #include "percpu-internal.h"
      93                 :            : 
      94                 :            : /* the slots are sorted by free bytes left, 1-15 bytes share the same slot */
      95                 :            : #define PCPU_SLOT_BASE_SHIFT            5
      96                 :            : /* chunks in slots below this are subject to being sidelined on failed alloc */
      97                 :            : #define PCPU_SLOT_FAIL_THRESHOLD        3
      98                 :            : 
      99                 :            : #define PCPU_EMPTY_POP_PAGES_LOW        2
     100                 :            : #define PCPU_EMPTY_POP_PAGES_HIGH       4
     101                 :            : 
     102                 :            : #ifdef CONFIG_SMP
     103                 :            : /* default addr <-> pcpu_ptr mapping, override in asm/percpu.h if necessary */
     104                 :            : #ifndef __addr_to_pcpu_ptr
     105                 :            : #define __addr_to_pcpu_ptr(addr)                                        \
     106                 :            :         (void __percpu *)((unsigned long)(addr) -                       \
     107                 :            :                           (unsigned long)pcpu_base_addr +               \
     108                 :            :                           (unsigned long)__per_cpu_start)
     109                 :            : #endif
     110                 :            : #ifndef __pcpu_ptr_to_addr
     111                 :            : #define __pcpu_ptr_to_addr(ptr)                                         \
     112                 :            :         (void __force *)((unsigned long)(ptr) +                         \
     113                 :            :                          (unsigned long)pcpu_base_addr -                \
     114                 :            :                          (unsigned long)__per_cpu_start)
     115                 :            : #endif
     116                 :            : #else   /* CONFIG_SMP */
     117                 :            : /* on UP, it's always identity mapped */
     118                 :            : #define __addr_to_pcpu_ptr(addr)        (void __percpu *)(addr)
     119                 :            : #define __pcpu_ptr_to_addr(ptr)         (void __force *)(ptr)
     120                 :            : #endif  /* CONFIG_SMP */
     121                 :            : 
     122                 :            : static int pcpu_unit_pages __ro_after_init;
     123                 :            : static int pcpu_unit_size __ro_after_init;
     124                 :            : static int pcpu_nr_units __ro_after_init;
     125                 :            : static int pcpu_atom_size __ro_after_init;
     126                 :            : int pcpu_nr_slots __ro_after_init;
     127                 :            : static size_t pcpu_chunk_struct_size __ro_after_init;
     128                 :            : 
     129                 :            : /* cpus with the lowest and highest unit addresses */
     130                 :            : static unsigned int pcpu_low_unit_cpu __ro_after_init;
     131                 :            : static unsigned int pcpu_high_unit_cpu __ro_after_init;
     132                 :            : 
     133                 :            : /* the address of the first chunk which starts with the kernel static area */
     134                 :            : void *pcpu_base_addr __ro_after_init;
     135                 :            : EXPORT_SYMBOL_GPL(pcpu_base_addr);
     136                 :            : 
     137                 :            : static const int *pcpu_unit_map __ro_after_init;                /* cpu -> unit */
     138                 :            : const unsigned long *pcpu_unit_offsets __ro_after_init; /* cpu -> unit offset */
     139                 :            : 
     140                 :            : /* group information, used for vm allocation */
     141                 :            : static int pcpu_nr_groups __ro_after_init;
     142                 :            : static const unsigned long *pcpu_group_offsets __ro_after_init;
     143                 :            : static const size_t *pcpu_group_sizes __ro_after_init;
     144                 :            : 
     145                 :            : /*
     146                 :            :  * The first chunk which always exists.  Note that unlike other
     147                 :            :  * chunks, this one can be allocated and mapped in several different
     148                 :            :  * ways and thus often doesn't live in the vmalloc area.
     149                 :            :  */
     150                 :            : struct pcpu_chunk *pcpu_first_chunk __ro_after_init;
     151                 :            : 
     152                 :            : /*
     153                 :            :  * Optional reserved chunk.  This chunk reserves part of the first
     154                 :            :  * chunk and serves it for reserved allocations.  When the reserved
     155                 :            :  * region doesn't exist, the following variable is NULL.
     156                 :            :  */
     157                 :            : struct pcpu_chunk *pcpu_reserved_chunk __ro_after_init;
     158                 :            : 
     159                 :            : DEFINE_SPINLOCK(pcpu_lock);     /* all internal data structures */
     160                 :            : static DEFINE_MUTEX(pcpu_alloc_mutex);  /* chunk create/destroy, [de]pop, map ext */
     161                 :            : 
     162                 :            : struct list_head *pcpu_slot __ro_after_init; /* chunk list slots */
     163                 :            : 
     164                 :            : /* chunks which need their map areas extended, protected by pcpu_lock */
     165                 :            : static LIST_HEAD(pcpu_map_extend_chunks);
     166                 :            : 
     167                 :            : /*
     168                 :            :  * The number of empty populated pages, protected by pcpu_lock.  The
     169                 :            :  * reserved chunk doesn't contribute to the count.
     170                 :            :  */
     171                 :            : int pcpu_nr_empty_pop_pages;
     172                 :            : 
     173                 :            : /*
     174                 :            :  * The number of populated pages in use by the allocator, protected by
     175                 :            :  * pcpu_lock.  This number is kept per unit per chunk (i.e. when a page gets
     176                 :            :  * allocated/deallocated, it is allocated/deallocated in all units of a chunk
     177                 :            :  * and increments/decrements this count by 1).
     178                 :            :  */
     179                 :            : static unsigned long pcpu_nr_populated;
     180                 :            : 
     181                 :            : /*
     182                 :            :  * Balance work is used to populate or destroy chunks asynchronously.  We
     183                 :            :  * try to keep the number of populated free pages between
     184                 :            :  * PCPU_EMPTY_POP_PAGES_LOW and HIGH for atomic allocations and at most one
     185                 :            :  * empty chunk.
     186                 :            :  */
     187                 :            : static void pcpu_balance_workfn(struct work_struct *work);
     188                 :            : static DECLARE_WORK(pcpu_balance_work, pcpu_balance_workfn);
     189                 :            : static bool pcpu_async_enabled __read_mostly;
     190                 :            : static bool pcpu_atomic_alloc_failed;
     191                 :            : 
     192                 :        156 : static void pcpu_schedule_balance_work(void)
     193                 :            : {
     194                 :        156 :         if (pcpu_async_enabled)
     195                 :        156 :                 schedule_work(&pcpu_balance_work);
     196                 :            : }
     197                 :            : 
     198                 :            : /**
     199                 :            :  * pcpu_addr_in_chunk - check if the address is served from this chunk
     200                 :            :  * @chunk: chunk of interest
     201                 :            :  * @addr: percpu address
     202                 :            :  *
     203                 :            :  * RETURNS:
     204                 :            :  * True if the address is served from this chunk.
     205                 :            :  */
     206                 :      10854 : static bool pcpu_addr_in_chunk(struct pcpu_chunk *chunk, void *addr)
     207                 :            : {
     208                 :      10854 :         void *start_addr, *end_addr;
     209                 :            : 
     210                 :      10854 :         if (!chunk)
     211                 :            :                 return false;
     212                 :            : 
     213                 :      10854 :         start_addr = chunk->base_addr + chunk->start_offset;
     214                 :      10854 :         end_addr = chunk->base_addr + chunk->nr_pages * PAGE_SIZE -
     215                 :      10854 :                    chunk->end_offset;
     216                 :            : 
     217                 :      10854 :         return addr >= start_addr && addr < end_addr;
     218                 :            : }
     219                 :            : 
     220                 :     247707 : static int __pcpu_size_to_slot(int size)
     221                 :            : {
     222                 :     247707 :         int highbit = fls(size);        /* size is in bytes */
     223                 :     247629 :         return max(highbit - PCPU_SLOT_BASE_SHIFT + 2, 1);
     224                 :            : }
     225                 :            : 
     226                 :     247785 : static int pcpu_size_to_slot(int size)
     227                 :            : {
     228                 :     247785 :         if (size == pcpu_unit_size)
     229                 :        156 :                 return pcpu_nr_slots - 1;
     230                 :     247629 :         return __pcpu_size_to_slot(size);
     231                 :            : }
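
To make the slot arithmetic above easier to check by hand, here is a minimal user-space sketch of the same computation. It assumes the kernel's `fls()` returns the 1-based index of the highest set bit (0 for an input of 0) and replaces it with a portable loop; the `demo_` names and the `unit_size` / `nr_slots` parameters are hypothetical stand-ins for `pcpu_unit_size` and `pcpu_nr_slots`.

```c
#include <assert.h>

#define DEMO_SLOT_BASE_SHIFT 5	/* mirrors PCPU_SLOT_BASE_SHIFT in the listing */

/* Portable stand-in for fls(): 1-based index of the highest set bit, 0 for 0. */
int demo_fls(unsigned int x)
{
	int bit = 0;

	while (x) {
		bit++;
		x >>= 1;
	}
	return bit;
}

/* Same arithmetic as __pcpu_size_to_slot(): size is in bytes. */
int demo_raw_size_to_slot(int size)
{
	int slot = demo_fls((unsigned int)size) - DEMO_SLOT_BASE_SHIFT + 2;

	return slot > 1 ? slot : 1;
}

/* A completely free unit goes to the last slot, as in pcpu_size_to_slot(). */
int demo_size_to_slot(int size, int unit_size, int nr_slots)
{
	assert(size >= 0 && nr_slots > 0);
	if (size == unit_size)
		return nr_slots - 1;
	return demo_raw_size_to_slot(size);
}
```

Under this arithmetic, sizes 1 to 15 land in slot 1, sizes 16 to 31 in slot 2, and each further doubling of the size moves one slot up.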
     232                 :            : 
     233                 :     169224 : static int pcpu_chunk_slot(const struct pcpu_chunk *chunk)
     234                 :            : {
     235                 :     169224 :         const struct pcpu_block_md *chunk_md = &chunk->chunk_md;
     236                 :            : 
     237                 :     338370 :         if (chunk->free_bytes < PCPU_MIN_ALLOC_SIZE ||
     238   [ +  -  +  -  :     169146 :             chunk_md->contig_hint == 0)
                   +  - ]
     239                 :            :                 return 0;
     240                 :            : 
     241   [ -  +  +  +  :     169146 :         return pcpu_size_to_slot(chunk_md->contig_hint * PCPU_MIN_ALLOC_SIZE);
                   +  + ]
     242                 :            : }
     243                 :            : 
     244                 :            : /* set the pointer to a chunk in a page struct */
     245                 :        468 : static void pcpu_set_page_chunk(struct page *page, struct pcpu_chunk *pcpu)
     246                 :            : {
     247                 :        468 :         page->index = (unsigned long)pcpu;
     248                 :            : }
     249                 :            : 
     250                 :            : /* obtain pointer to a chunk from a page struct */
     251                 :       4725 : static struct pcpu_chunk *pcpu_get_page_chunk(struct page *page)
     252                 :            : {
     253                 :       4725 :         return (struct pcpu_chunk *)page->index;
     254                 :            : }
     255                 :            : 
     256                 :        624 : static int __maybe_unused pcpu_page_idx(unsigned int cpu, int page_idx)
     257                 :            : {
     258                 :        624 :         return pcpu_unit_map[cpu] * pcpu_unit_pages + page_idx;
     259                 :            : }
     260                 :            : 
     261                 :      79887 : static unsigned long pcpu_unit_page_offset(unsigned int cpu, int page_idx)
     262                 :            : {
     263                 :      79887 :         return pcpu_unit_offsets[cpu] + (page_idx << PAGE_SHIFT);
     264                 :            : }
     265                 :            : 
     266                 :      78561 : static unsigned long pcpu_chunk_addr(struct pcpu_chunk *chunk,
     267                 :            :                                      unsigned int cpu, int page_idx)
     268                 :            : {
     269                 :        156 :         return (unsigned long)chunk->base_addr +
     270                 :            :                pcpu_unit_page_offset(cpu, page_idx);
     271                 :            : }
     272                 :            : 
     273                 :            : /*
     274                 :            :  * The following are helper functions to help access bitmaps and convert
     275                 :            :  * between bitmap offsets and address offsets.
     276                 :            :  */
     277                 :     116169 : static unsigned long *pcpu_index_alloc_map(struct pcpu_chunk *chunk, int index)
     278                 :            : {
     279                 :     116169 :         return chunk->alloc_map +
     280                 :     116169 :                (index * PCPU_BITMAP_BLOCK_BITS / BITS_PER_LONG);
     281                 :            : }
     282                 :            : 
     283                 :     273413 : static unsigned long pcpu_off_to_block_index(int off)
     284                 :            : {
     285                 :     273413 :         return off / PCPU_BITMAP_BLOCK_BITS;
     286                 :            : }
     287                 :            : 
     288                 :     273413 : static unsigned long pcpu_off_to_block_off(int off)
     289                 :            : {
     290                 :     273413 :         return off & (PCPU_BITMAP_BLOCK_BITS - 1);
     291                 :            : }
     292                 :            : 
     293                 :      93286 : static unsigned long pcpu_block_off_to_off(int index, int off)
     294                 :            : {
     295                 :      93286 :         return index * PCPU_BITMAP_BLOCK_BITS + off;
     296                 :            : }
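
The offset helpers above can be exercised outside the kernel with a small sketch. The constants here are assumptions rather than values shown in this listing: 4-byte minimum fragments and 4 KiB bitmap blocks, which gives 1024 bits per block. The `demo_` names are hypothetical; the mask in `demo_off_to_block_off()` relies on the block size in bits being a power of two.

```c
#include <assert.h>

/* Assumed values: 4-byte fragments, 4 KiB blocks => 1024 bits per block. */
#define DEMO_MIN_ALLOC_SHIFT 2
#define DEMO_BLOCK_BYTES     4096
#define DEMO_BLOCK_BITS      (DEMO_BLOCK_BYTES >> DEMO_MIN_ALLOC_SHIFT)

/* One bit in the allocation map stands for one minimum-size fragment. */
int demo_bytes_to_bits(int bytes)
{
	assert(bytes >= 0);
	return bytes >> DEMO_MIN_ALLOC_SHIFT;
}

/* Which metadata block a chunk-wide bit offset falls into. */
int demo_off_to_block_index(int off)
{
	return off / DEMO_BLOCK_BITS;
}

/* Bit offset within that block (valid because DEMO_BLOCK_BITS is 2^n). */
int demo_off_to_block_off(int off)
{
	return off & (DEMO_BLOCK_BITS - 1);
}

/* Inverse mapping: block index plus in-block offset back to a chunk offset. */
int demo_block_off_to_off(int index, int off)
{
	return index * DEMO_BLOCK_BITS + off;
}
```

Under these assumptions, a byte offset of 6144 corresponds to bit 1536, which is bit 512 of block 1, and the last helper maps that pair back to 1536.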
     297                 :            : 
     298                 :            : /*
     299                 :            :  * pcpu_next_hint - determine which hint to use
     300                 :            :  * @block: block of interest
     301                 :            :  * @alloc_bits: size of allocation
     302                 :            :  *
     303                 :            :  * This determines if we should scan based on the scan_hint or first_free.
     304                 :            :  * In general, we want to scan from first_free to fulfill allocations by
     305                 :            :  * first fit.  However, if we know a scan_hint at position scan_hint_start
     306                 :            :  * cannot fulfill an allocation, we can begin scanning from there knowing
     307                 :            :  * the contig_hint will be our fallback.
     308                 :            :  */
     309                 :     156574 : static int pcpu_next_hint(struct pcpu_block_md *block, int alloc_bits)
     310                 :            : {
     311                 :            :         /*
     312                 :            :          * The three conditions below determine if we can skip past the
     313                 :            :          * scan_hint.  First, does the scan hint exist.  Second, is the
     314                 :            :          * contig_hint after the scan_hint (possibly not true iff
     315                 :            :          * contig_hint == scan_hint).  Third, is the allocation request
     316                 :            :          * larger than the scan_hint.
     317                 :            :          */
     318                 :     203685 :         if (block->scan_hint &&
     319   [ +  +  +  +  :      47111 :             block->contig_hint_start > block->scan_hint_start &&
             +  +  +  + ]
     320                 :            :             alloc_bits > block->scan_hint)
     321                 :      34852 :                 return block->scan_hint_start + block->scan_hint;
     322                 :            : 
     323                 :     121722 :         return block->first_free;
     324                 :            : }
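
The three-part test in pcpu_next_hint() can be tried on concrete numbers with the following sketch. `struct demo_block_md` is a hypothetical cut-down stand-in that keeps only the fields the function reads; the real `struct pcpu_block_md` is defined elsewhere and has more members.

```c
#include <assert.h>

/* Hypothetical subset of the block metadata; all values are in bits. */
struct demo_block_md {
	int scan_hint;		/* size of the scan hint, 0 if none */
	int scan_hint_start;	/* where the scan hint begins */
	int contig_hint;	/* size of the largest known free area */
	int contig_hint_start;	/* where that area begins */
	int first_free;		/* first free bit in the block */
};

/*
 * Skip past the scan hint only if it exists, the contig hint lies after
 * it, and the request is too large for it; otherwise scan from first_free.
 */
int demo_next_hint(const struct demo_block_md *block, int alloc_bits)
{
	assert(block && alloc_bits > 0);
	if (block->scan_hint &&
	    block->contig_hint_start > block->scan_hint_start &&
	    alloc_bits > block->scan_hint)
		return block->scan_hint_start + block->scan_hint;

	return block->first_free;
}
```

For a block with a 4-bit scan hint at bit 10, a contig hint starting at bit 100, and first_free at bit 2, a request for 8 bits starts scanning at bit 14, while a request for 3 bits still starts at bit 2 because the scan hint might satisfy it.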
     325                 :            : 
     326                 :            : /**
     327                 :            :  * pcpu_next_md_free_region - finds the next hint free area
     328                 :            :  * @chunk: chunk of interest
     329                 :            :  * @bit_off: chunk offset
     330                 :            :  * @bits: size of free area
     331                 :            :  *
     332                 :            :  * Helper function for pcpu_for_each_md_free_region.  It checks
     333                 :            :  * block->contig_hint and performs aggregation across blocks to find the
     334                 :            :  * next hint.  It modifies bit_off and bits in-place to be consumed in the
     335                 :            :  * loop.
     336                 :            :  */
     337                 :     108805 : static void pcpu_next_md_free_region(struct pcpu_chunk *chunk, int *bit_off,
     338                 :            :                                      int *bits)
     339                 :            : {
     340                 :     108805 :         int i = pcpu_off_to_block_index(*bit_off);
     341                 :     108805 :         int block_off = pcpu_off_to_block_off(*bit_off);
     342                 :     108805 :         struct pcpu_block_md *block;
     343                 :            : 
     344                 :     108805 :         *bits = 0;
     345         [ +  + ]:   12362662 :         for (block = chunk->md_blocks + i; i < pcpu_chunk_nr_blocks(chunk);
     346                 :   12253857 :              block++, i++) {
     347                 :            :                 /* handles contig area across blocks */
     348         [ +  + ]:   12255169 :                 if (*bits) {
     349                 :   12193258 :                         *bits += block->left_free;
     350         [ +  + ]:   12193258 :                         if (block->left_free == PCPU_BITMAP_BLOCK_BITS)
     351                 :   12192867 :                                 continue;
     352                 :            :                         return;
     353                 :            :                 }
     354                 :            : 
     355                 :            :                 /*
     356                 :            :                  * This checks three things.  First is there a contig_hint to
     357                 :            :                  * check.  Second, have we checked this hint before by
     358                 :            :                  * comparing the block_off.  Third, is this the same as the
     359                 :            :                  * right contig hint.  In the last case, it spills over into
     360                 :            :                  * the next block and should be handled by the contig area
     361                 :            :                  * across blocks code.
     362                 :            :                  */
     363                 :      61911 :                 *bits = block->contig_hint;
     364   [ +  -  +  + ]:      61911 :                 if (*bits && block->contig_hint_start >= block_off &&
     365         [ +  + ]:      54975 :                     *bits + block->contig_hint_start < PCPU_BITMAP_BLOCK_BITS) {
     366                 :        921 :                         *bit_off = pcpu_block_off_to_off(i,
     367                 :            :                                         block->contig_hint_start);
     368                 :        921 :                         return;
     369                 :            :                 }
     370                 :            :                 /* reset to satisfy the second predicate above */
     371                 :      60990 :                 block_off = 0;
     372                 :            : 
     373                 :      60990 :                 *bits = block->right_free;
     374                 :      60990 :                 *bit_off = (i + 1) * PCPU_BITMAP_BLOCK_BITS - block->right_free;
     375                 :            :         }
     376                 :            : }
     377                 :            : 
     378                 :            : /**
     379                 :            :  * pcpu_next_fit_region - finds fit areas for a given allocation request
     380                 :            :  * @chunk: chunk of interest
     381                 :            :  * @alloc_bits: size of allocation
     382                 :            :  * @align: alignment of area (max PAGE_SIZE)
     383                 :            :  * @bit_off: chunk offset
     384                 :            :  * @bits: size of free area
     385                 :            :  *
     386                 :            :  * Finds the next free region that is viable for use with a given size and
     387                 :            :  * alignment.  This only returns if there is a valid area to be used for this
     388                 :            :  * allocation.  block->first_free is returned if the allocation request fits
     389                 :            :  * within the block to see if the request can be fulfilled prior to the contig
     390                 :            :  * hint.
     391                 :            :  */
     392                 :      78405 : static void pcpu_next_fit_region(struct pcpu_chunk *chunk, int alloc_bits,
     393                 :            :                                  int align, int *bit_off, int *bits)
     394                 :            : {
     395                 :      78405 :         int i = pcpu_off_to_block_index(*bit_off);
     396                 :      78405 :         int block_off = pcpu_off_to_block_off(*bit_off);
     397                 :      78405 :         struct pcpu_block_md *block;
     398                 :            : 
     399                 :      78405 :         *bits = 0;
     400         [ +  - ]:      86473 :         for (block = chunk->md_blocks + i; i < pcpu_chunk_nr_blocks(chunk);
     401                 :       8068 :              block++, i++) {
     402                 :            :                 /* handles contig area across blocks */
     403         [ +  + ]:      86473 :                 if (*bits) {
     404                 :        392 :                         *bits += block->left_free;
     405         [ +  + ]:        392 :                         if (*bits >= alloc_bits)
     406                 :            :                                 return;
     407         [ -  + ]:        156 :                         if (block->left_free == PCPU_BITMAP_BLOCK_BITS)
     408                 :          0 :                                 continue;
     409                 :            :                 }
     410                 :            : 
     411                 :            :                 /* check block->contig_hint */
     412                 :      86237 :                 *bits = ALIGN(block->contig_hint_start, align) -
     413                 :            :                         block->contig_hint_start;
     414                 :            :                 /*
     415                 :            :                  * This uses the block offset to determine if this has been
     416                 :            :                  * checked in the prior iteration.
     417                 :            :                  */
     418         [ +  - ]:      86237 :                 if (block->contig_hint &&
     419         [ +  + ]:      86237 :                     block->contig_hint_start >= block_off &&
     420         [ +  + ]:      79483 :                     block->contig_hint >= *bits + alloc_bits) {
     421         [ +  + ]:      78169 :                         int start = pcpu_next_hint(block, alloc_bits);
     422                 :            : 
     423                 :      78169 :                         *bits += alloc_bits + block->contig_hint_start -
     424                 :            :                                  start;
     425                 :      78169 :                         *bit_off = pcpu_block_off_to_off(i, start);
     426                 :      78169 :                         return;
     427                 :            :                 }
     428                 :            :                 /* reset to satisfy the second predicate above */
     429                 :       8068 :                 block_off = 0;
     430                 :            : 
     431                 :       8068 :                 *bit_off = ALIGN(PCPU_BITMAP_BLOCK_BITS - block->right_free,
     432                 :            :                                  align);
     433                 :       8068 :                 *bits = PCPU_BITMAP_BLOCK_BITS - *bit_off;
     434                 :       8068 :                 *bit_off = pcpu_block_off_to_off(i, *bit_off);
     435         [ +  - ]:       8068 :                 if (*bits >= alloc_bits)
     436                 :            :                         return;
     437                 :            :         }
     438                 :            : 
     439                 :            :         /* no valid offsets were found - fail condition */
     440                 :          0 :         *bit_off = pcpu_chunk_map_bits(chunk);
     441                 :            : }
     442                 :            : 
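The contig_hint test in pcpu_next_fit_region() above comes down to one piece of arithmetic: the padding needed to align the start of the hint, plus the request, must fit inside the hint. A minimal userspace sketch of that check follows. The names align_up() and hint_fits() are hypothetical and not part of mm/percpu.c, and align is assumed to be a power of two.

```c
#include <assert.h>
#include <stdbool.h>

/* Round x up to the next multiple of a; a must be a power of two. */
static int align_up(int x, int a)
{
	assert(a > 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/*
 * Hypothetical helper: can a free run of hint_bits bits that starts at
 * hint_start hold alloc_bits bits once its start is aligned to align?
 */
static bool hint_fits(int hint_start, int hint_bits, int alloc_bits, int align)
{
	/* bits lost in front of the allocation to reach an aligned offset */
	int pad = align_up(hint_start, align) - hint_start;

	return hint_bits != 0 && hint_bits >= pad + alloc_bits;
}
```

For example, a hint of 8 bits starting at offset 5 holds a 5-bit request aligned to 4 (3 bits of padding plus 5), but not a 6-bit one.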
     443                 :            : /*
     444                 :            :  * Metadata free area iterators.  These perform aggregation of free areas
     445                 :            :  * based on the metadata blocks and return the offset @bit_off and size in
     446                 :            :  * bits of the free area @bits.  pcpu_for_each_fit_region only returns when
     447                 :            :  * a fit is found for the allocation request.
     448                 :            :  */
     449                 :            : #define pcpu_for_each_md_free_region(chunk, bit_off, bits)              \
     450                 :            :         for (pcpu_next_md_free_region((chunk), &(bit_off), &(bits));    \
     451                 :            :              (bit_off) < pcpu_chunk_map_bits((chunk));                       \
     452                 :            :              (bit_off) += (bits) + 1,                                   \
     453                 :            :              pcpu_next_md_free_region((chunk), &(bit_off), &(bits)))
     454                 :            : 
     455                 :            : #define pcpu_for_each_fit_region(chunk, alloc_bits, align, bit_off, bits)     \
     456                 :            :         for (pcpu_next_fit_region((chunk), (alloc_bits), (align), &(bit_off), \
     457                 :            :                                   &(bits));                               \
     458                 :            :              (bit_off) < pcpu_chunk_map_bits((chunk));                             \
     459                 :            :              (bit_off) += (bits),                                             \
     460                 :            :              pcpu_next_fit_region((chunk), (alloc_bits), (align), &(bit_off), \
     461                 :            :                                   &(bits)))
     462                 :            : 
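Both iterator macros above have the same shape: a for loop whose init and step clauses call a "next region" function that updates the cursor through pointers. The sketch below reproduces that shape on a single 64-bit word. next_free_region(), for_each_free_region() and count_free_regions() are hypothetical names for illustration only, and the region finder is a simple bit walk rather than the metadata-based kernel version.

```c
#include <stdint.h>

#define MAP_BITS 64

/*
 * Hypothetical stand-in for pcpu_next_md_free_region(): starting at *off,
 * find the next run of clear bits in map.  *off ends up >= MAP_BITS when
 * no clear bit remains.
 */
static void next_free_region(uint64_t map, int *off, int *bits)
{
	int i = *off;

	while (i < MAP_BITS && ((map >> i) & 1))
		i++;			/* skip allocated bits */
	*off = i;
	*bits = 0;
	while (i < MAP_BITS && !((map >> i) & 1)) {
		i++;			/* measure the free run */
		(*bits)++;
	}
}

#define for_each_free_region(map, off, bits)			\
	for (next_free_region((map), &(off), &(bits));		\
	     (off) < MAP_BITS;					\
	     (off) += (bits) + 1,				\
	     next_free_region((map), &(off), &(bits)))

/* Count the free regions in map, scanning from bit 0. */
static int count_free_regions(uint64_t map)
{
	int off = 0, bits = 0, n = 0;

	for_each_free_region(map, off, bits)
		n++;
	return n;
}
```

In this sketch the extra + 1 in the step is safe because each reported run is maximal, so the bit right after it is either allocated or past the end of the map.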
     463                 :            : /**
     464                 :            :  * pcpu_mem_zalloc - allocate memory
     465                 :            :  * @size: bytes to allocate
     466                 :            :  * @gfp: allocation flags
     467                 :            :  *
     468                 :            :  * Allocate @size bytes.  If @size is at most PAGE_SIZE, kzalloc()
     469                 :            :  * is used; otherwise, the equivalent of vzalloc() is used.
     470                 :            :  * This is to facilitate passing through whitelisted flags.  The
     471                 :            :  * returned memory is always zeroed.
     472                 :            :  *
     473                 :            :  * RETURNS:
     474                 :            :  * Pointer to the allocated area on success, NULL on failure.
     475                 :            :  */
     476                 :        390 : static void *pcpu_mem_zalloc(size_t size, gfp_t gfp)
     477                 :            : {
     478   [ -  +  +  - ]:        390 :         if (WARN_ON_ONCE(!slab_is_available()))
     479                 :            :                 return NULL;
     480                 :            : 
     481         [ +  + ]:        390 :         if (size <= PAGE_SIZE)
     482                 :        156 :                 return kzalloc(size, gfp);
     483                 :            :         else
     484                 :        234 :                 return __vmalloc(size, gfp | __GFP_ZERO, PAGE_KERNEL);
     485                 :            : }
     486                 :            : 
     487                 :            : /**
     488                 :            :  * pcpu_mem_free - free memory
     489                 :            :  * @ptr: memory to free
     490                 :            :  *
     491                 :            :  * Free @ptr.  @ptr should have been allocated using pcpu_mem_zalloc().
     492                 :            :  */
     493                 :          0 : static void pcpu_mem_free(void *ptr)
     494                 :            : {
     495                 :          0 :         kvfree(ptr);
     496                 :          0 : }
     497                 :            : 
     498                 :       1146 : static void __pcpu_chunk_move(struct pcpu_chunk *chunk, int slot,
     499                 :            :                               bool move_front)
     500                 :            : {
     501         [ +  - ]:       1146 :         if (chunk != pcpu_reserved_chunk) {
     502         [ +  + ]:       1146 :                 if (move_front)
     503                 :        156 :                         list_move(&chunk->list, &pcpu_slot[slot]);
     504                 :            :                 else
     505                 :        990 :                         list_move_tail(&chunk->list, &pcpu_slot[slot]);
     506                 :            :         }
     507                 :       1146 : }
     508                 :            : 
     509                 :          0 : static void pcpu_chunk_move(struct pcpu_chunk *chunk, int slot)
     510                 :            : {
     511                 :          0 :         __pcpu_chunk_move(chunk, slot, true);
     512                 :          0 : }
     513                 :            : 
     514                 :            : /**
     515                 :            :  * pcpu_chunk_relocate - put chunk in the appropriate chunk slot
     516                 :            :  * @chunk: chunk of interest
     517                 :            :  * @oslot: the previous slot it was on
     518                 :            :  *
     519                 :            :  * This function is called after an allocation or free changed @chunk.
     520                 :            :  * The new slot is determined from the changed state and @chunk is
     521                 :            :  * moved to it.  Note that the reserved chunk is never put on chunk
     522                 :            :  * slots.
     523                 :            :  *
     524                 :            :  * CONTEXT:
     525                 :            :  * pcpu_lock.
     526                 :            :  */
     527                 :      84690 : static void pcpu_chunk_relocate(struct pcpu_chunk *chunk, int oslot)
     528                 :            : {
     529         [ +  + ]:      84690 :         int nslot = pcpu_chunk_slot(chunk);
     530                 :            : 
     531         [ +  + ]:      84690 :         if (oslot != nslot)
     532                 :       1146 :                 __pcpu_chunk_move(chunk, nslot, oslot < nslot);
     533                 :      84690 : }
     534                 :            : 
     535                 :            : /*
     536                 :            :  * pcpu_update_empty_pages - update empty page counters
     537                 :            :  * @chunk: chunk of interest
     538                 :            :  * @nr: nr of empty pages
     539                 :            :  *
     540                 :            :  * This keeps track of the empty pages based on the premise that an
     541                 :            :  * md_block covers a page.  The hint update functions recognize if a block
     542                 :            :  * is made full or broken to calculate deltas for keeping track of free pages.
     543                 :            :  */
     544                 :       1092 : static inline void pcpu_update_empty_pages(struct pcpu_chunk *chunk, int nr)
     545                 :            : {
     546                 :       1092 :         chunk->nr_empty_pop_pages += nr;
     547                 :       1092 :         if (chunk != pcpu_reserved_chunk)
     548                 :       1092 :                 pcpu_nr_empty_pop_pages += nr;
     549                 :            : }
     550                 :            : 
     551                 :            : /*
     552                 :            :  * pcpu_region_overlap - determines if two regions overlap
     553                 :            :  * @a: start of first region, inclusive
     554                 :            :  * @b: end of first region, exclusive
     555                 :            :  * @x: start of second region, inclusive
     556                 :            :  * @y: end of second region, exclusive
     557                 :            :  *
     558                 :            :  * This is used to determine if the hint region [a, b) overlaps with the
     559                 :            :  * allocated region [x, y).
     560                 :            :  */
     561                 :     314556 : static inline bool pcpu_region_overlap(int a, int b, int x, int y)
     562                 :            : {
     563                 :     314556 :         return (a < y) && (x < b);
     564                 :            : }
     565                 :            : 
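The half-open convention used by pcpu_region_overlap() above means that regions which merely touch do not overlap. The sketch below copies the predicate into userspace under the hypothetical name region_overlap() so the boundary cases can be checked directly.

```c
#include <stdbool.h>

/* Same predicate as pcpu_region_overlap(): [a, b) and [x, y) are half-open. */
static bool region_overlap(int a, int b, int x, int y)
{
	return (a < y) && (x < b);
}
```

With these semantics, [0, 4) and [4, 8) do not overlap, while [0, 5) and [4, 8) share bit 4 and do.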
     566                 :            : /**
     567                 :            :  * pcpu_block_update - updates a block given a free area
     568                 :            :  * @block: block of interest
     569                 :            :  * @start: start offset in block
     570                 :            :  * @end: end offset in block
     571                 :            :  *
     572                 :            :  * Updates a block given a known free area.  The region [start, end) is
     573                 :            :  * expected to be the entirety of the free area within a block.  Chooses
     574                 :            :  * the best starting offset if the contig hints are equal.
     575                 :            :  */
     576                 :     136978 : static void pcpu_block_update(struct pcpu_block_md *block, int start, int end)
     577                 :            : {
     578                 :     136978 :         int contig = end - start;
     579                 :            : 
     580                 :     136978 :         block->first_free = min(block->first_free, start);
     581         [ +  + ]:     136978 :         if (start == 0)
     582                 :         79 :                 block->left_free = contig;
     583                 :            : 
     584         [ +  + ]:     136978 :         if (end == block->nr_bits)
     585                 :     109741 :                 block->right_free = contig;
     586                 :            : 
     587         [ +  + ]:     136978 :         if (contig > block->contig_hint) {
     588                 :            :                 /* promote the old contig_hint to be the new scan_hint */
     589         [ +  + ]:     119944 :                 if (start > block->contig_hint_start) {
     590         [ +  + ]:     113210 :                         if (block->contig_hint > block->scan_hint) {
     591                 :      40338 :                                 block->scan_hint_start =
     592                 :            :                                         block->contig_hint_start;
     593                 :      40338 :                                 block->scan_hint = block->contig_hint;
     594         [ +  + ]:      72872 :                         } else if (start < block->scan_hint_start) {
     595                 :            :                                 /*
     596                 :            :                                  * The old contig_hint == scan_hint.  But, the
     597                 :            :                                  * new contig is larger so hold the invariant
     598                 :            :                                  * scan_hint_start < contig_hint_start.
     599                 :            :                                  */
     600                 :        439 :                                 block->scan_hint = 0;
     601                 :            :                         }
     602                 :            :                 } else {
     603                 :       6734 :                         block->scan_hint = 0;
     604                 :            :                 }
     605                 :     119944 :                 block->contig_hint_start = start;
     606                 :     119944 :                 block->contig_hint = contig;
     607         [ +  + ]:      17034 :         } else if (contig == block->contig_hint) {
     608   [ +  -  +  - ]:        640 :                 if (block->contig_hint_start &&
     609         [ +  + ]:        640 :                     (!start ||
     610         [ +  + ]:        640 :                      __ffs(start) > __ffs(block->contig_hint_start))) {
     611                 :            :                         /* start has a better alignment so use it */
     612                 :        135 :                         block->contig_hint_start = start;
     613         [ +  + ]:        135 :                         if (start < block->scan_hint_start &&
     614         [ +  - ]:         93 :                             block->contig_hint > block->scan_hint)
     615                 :         93 :                                 block->scan_hint = 0;
     616         [ +  + ]:        505 :                 } else if (start > block->scan_hint_start ||
     617         [ +  - ]:        100 :                            block->contig_hint > block->scan_hint) {
     618                 :            :                         /*
     619                 :            :                          * Knowing contig == contig_hint, update the scan_hint
     620                 :            :                          * if it is farther than or larger than the current
     621                 :            :                          * scan_hint.
     622                 :            :                          */
     623                 :        505 :                         block->scan_hint_start = start;
     624                 :        505 :                         block->scan_hint = contig;
     625                 :            :                 }
     626                 :            :         } else {
     627                 :            :                 /*
     628                 :            :                  * The region is smaller than the contig_hint.  So only update
     629                 :            :                  * the scan_hint if it is larger than or equal and farther than
     630                 :            :                  * the current scan_hint.
     631                 :            :                  */
     632         [ +  + ]:      16394 :                 if ((start < block->contig_hint_start &&
     633   [ +  +  +  + ]:      11772 :                      (contig > block->scan_hint ||
     634                 :       1277 :                       (contig == block->scan_hint &&
     635         [ +  + ]:       1277 :                        start > block->scan_hint_start)))) {
     636                 :       6979 :                         block->scan_hint_start = start;
     637                 :       6979 :                         block->scan_hint = contig;
     638                 :            :                 }
     639                 :            :         }
     640                 :     136978 : }
     641                 :            : 
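When a new free area is exactly as large as the current contig_hint, pcpu_block_update() above keeps whichever start offset is better aligned, comparing the lowest set bit of each offset with __ffs(). A minimal userspace sketch of that tie-break follows. start_is_better_aligned() is a hypothetical name, and __builtin_ctzl() is a GCC/Clang builtin used here as a stand-in for the kernel's __ffs().

```c
#include <stdbool.h>

/*
 * Hypothetical helper mirroring the tie-break in pcpu_block_update():
 * prefer the start whose lowest set bit is higher, i.e. the one aligned
 * to the larger power of two.  Offset 0 counts as the best alignment.
 */
static bool start_is_better_aligned(unsigned long start, unsigned long cur)
{
	if (cur == 0)
		return false;	/* nothing beats offset 0 */
	if (start == 0)
		return true;
	/* ctz is undefined for 0, which the two checks above rule out */
	return __builtin_ctzl(start) > __builtin_ctzl(cur);
}
```

For example, offset 16 (aligned to 16) replaces offset 24 (aligned to 8), but 12 does not replace 20 because both are only aligned to 4.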
     642                 :            : /*
     643                 :            :  * pcpu_block_update_scan - update a block given a free area from a scan
     644                 :            :  * @chunk: chunk of interest
     645                 :            :  * @bit_off: chunk offset
     646                 :            :  * @bits: size of free area
     647                 :            :  *
     648                 :            :  * Finding the final allocation spot first goes through pcpu_find_block_fit()
     649                 :            :  * to find a block that can hold the allocation and then pcpu_alloc_area()
     650                 :            :  * where a scan is used.  When allocations require specific alignments,
     651                 :            :  * we can inadvertently create holes which will not be seen in the alloc
     652                 :            :  * or free paths.
     653                 :            :  *
     654                 :            :  * This takes a given free area hole and updates a block as it may change the
     655                 :            :  * scan_hint.  We need to scan backwards to ensure we don't miss free bits
     656                 :            :  * from alignment.
     657                 :            :  */
     658                 :       1435 : static void pcpu_block_update_scan(struct pcpu_chunk *chunk, int bit_off,
     659                 :            :                                    int bits)
     660                 :            : {
     661                 :       1435 :         int s_off = pcpu_off_to_block_off(bit_off);
     662                 :       1435 :         int e_off = s_off + bits;
     663                 :       1435 :         int s_index, l_bit;
     664                 :       1435 :         struct pcpu_block_md *block;
     665                 :            : 
     666         [ +  - ]:       1435 :         if (e_off > PCPU_BITMAP_BLOCK_BITS)
     667                 :            :                 return;
     668                 :            : 
     669                 :       1435 :         s_index = pcpu_off_to_block_index(bit_off);
     670                 :       1435 :         block = chunk->md_blocks + s_index;
     671                 :            : 
     672                 :            :         /* scan backwards in case of alignment skipping free bits */
     673                 :       1435 :         l_bit = find_last_bit(pcpu_index_alloc_map(chunk, s_index), s_off);
     674         [ +  - ]:       1435 :         s_off = (s_off == l_bit) ? 0 : l_bit + 1;
     675                 :            : 
     676                 :       1435 :         pcpu_block_update(block, s_off, e_off);
     677                 :            : }
     678                 :            : 
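The backward scan in pcpu_block_update_scan() above widens a hole to the left until it hits the last allocated bit before it. The sketch below shows the same step on a single 64-bit word. last_set_bit_below() is a hypothetical userspace stand-in for find_last_bit() and follows its convention of returning size when no bit is set; free_run_start() is likewise a name invented for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace stand-in for find_last_bit() on one 64-bit word: index of
 * the highest set bit below size, or size if there is none.
 */
static int last_set_bit_below(uint64_t map, int size)
{
	assert(size >= 0 && size <= 64);
	for (int i = size - 1; i >= 0; i--)
		if ((map >> i) & 1)
			return i;
	return size;
}

/*
 * Given an allocation map and the offset s_off where a free hole was
 * noticed, return the offset where the free run containing it starts.
 */
static int free_run_start(uint64_t map, int s_off)
{
	int l_bit = last_set_bit_below(map, s_off);

	/* no allocated bit before s_off: the run starts at bit 0 */
	return (l_bit == s_off) ? 0 : l_bit + 1;
}
```

With bits 0 to 2 allocated and a hole reported at offset 8, the free run actually starts at offset 3.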
     679                 :            : /**
     680                 :            :  * pcpu_chunk_refresh_hint - updates metadata about a chunk
     681                 :            :  * @chunk: chunk of interest
     682                 :            :  * @full_scan: if we should scan from the beginning
     683                 :            :  *
     684                 :            :  * Iterates over the metadata blocks to find the largest contig area.
     685                 :            :  * A full scan can be avoided on the allocation path as this is triggered
     686                 :            :  * if we broke the contig_hint.  In doing so, the scan_hint will be before
     687                 :            :  * the contig_hint or after if the scan_hint == contig_hint.  This cannot
     688                 :            :  * be prevented on freeing as we want to find the largest area possibly
     689                 :            :  * spanning blocks.
     690                 :            :  */
     691                 :      53825 : static void pcpu_chunk_refresh_hint(struct pcpu_chunk *chunk, bool full_scan)
     692                 :            : {
     693                 :      53825 :         struct pcpu_block_md *chunk_md = &chunk->chunk_md;
     694                 :      53825 :         int bit_off, bits;
     695                 :            : 
     696                 :            :         /* promote scan_hint to contig_hint */
     697   [ +  +  +  + ]:      53825 :         if (!full_scan && chunk_md->scan_hint) {
     698                 :       6710 :                 bit_off = chunk_md->scan_hint_start + chunk_md->scan_hint;
     699                 :       6710 :                 chunk_md->contig_hint_start = chunk_md->scan_hint_start;
     700                 :       6710 :                 chunk_md->contig_hint = chunk_md->scan_hint;
     701                 :       6710 :                 chunk_md->scan_hint = 0;
     702                 :            :         } else {
     703                 :      47115 :                 bit_off = chunk_md->first_free;
     704                 :      47115 :                 chunk_md->contig_hint = 0;
     705                 :            :         }
     706                 :            : 
     707                 :      53825 :         bits = 0;
     708         [ +  + ]:     108805 :         pcpu_for_each_md_free_region(chunk, bit_off, bits)
     709                 :      54980 :                 pcpu_block_update(chunk_md, bit_off, bit_off + bits);
     710                 :      53825 : }
     711                 :            : 
     712                 :            : /**
     713                 :            :  * pcpu_block_refresh_hint - rescan a block and refresh its hints
     714                 :            :  * @chunk: chunk of interest
     715                 :            :  * @index: index of the metadata block
     716                 :            :  *
     717                 :            :  * Scans over the block beginning at first_free, or just past the
     718                 :            :  * scan_hint when one is set (it is promoted to the contig_hint first),
     718                 :            :  * and updates the block metadata accordingly.
     719                 :            :  */
     720                 :      59968 : static void pcpu_block_refresh_hint(struct pcpu_chunk *chunk, int index)
     721                 :            : {
     722                 :      59968 :         struct pcpu_block_md *block = chunk->md_blocks + index;
     723                 :      59968 :         unsigned long *alloc_map = pcpu_index_alloc_map(chunk, index);
     724                 :      59968 :         unsigned int rs, re, start;     /* region start, region end */
     725                 :            : 
     726                 :            :         /* promote scan_hint to contig_hint */
     727         [ +  + ]:      59968 :         if (block->scan_hint) {
     728                 :      28301 :                 start = block->scan_hint_start + block->scan_hint;
     729                 :      28301 :                 block->contig_hint_start = block->scan_hint_start;
     730                 :      28301 :                 block->contig_hint = block->scan_hint;
     731                 :      28301 :                 block->scan_hint = 0;
     732                 :            :         } else {
     733                 :      31667 :                 start = block->first_free;
     734                 :      31667 :                 block->contig_hint = 0;
     735                 :            :         }
     736                 :            : 
     737                 :      59968 :         block->right_free = 0;
     738                 :            : 
     739                 :            :         /* iterate over free areas and update the contig hints */
     740         [ +  + ]:     128273 :         bitmap_for_each_clear_region(alloc_map, rs, re, start,
     741                 :            :                                      PCPU_BITMAP_BLOCK_BITS)
     742                 :      68305 :                 pcpu_block_update(block, rs, re);
     743                 :      59968 : }
     744                 :            : 
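The core of pcpu_block_refresh_hint() above is a walk over the clear regions of the block's allocation bitmap, feeding each one to pcpu_block_update(). The sketch below shows a reduced version of that walk which only tracks the largest free run. It is an illustration under stated assumptions: one 64-bit word stands in for the block (BLOCK_BITS is a placeholder for PCPU_BITMAP_BLOCK_BITS), and the scan_hint, left_free/right_free and alignment tie-break handling are left out. struct hint and largest_free_run() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK_BITS 64	/* assumption for this sketch only */

struct hint {
	int start;	/* offset of the largest free run */
	int len;	/* its length in bits, 0 if none */
};

/* Scan map from bit 'from' and report the largest run of clear bits. */
static struct hint largest_free_run(uint64_t map, int from)
{
	struct hint best = { 0, 0 };
	int i = from;

	assert(from >= 0 && from <= BLOCK_BITS);
	while (i < BLOCK_BITS) {
		int rs, re;	/* region start, region end */

		while (i < BLOCK_BITS && ((map >> i) & 1))
			i++;	/* skip allocated bits */
		rs = i;
		while (i < BLOCK_BITS && !((map >> i) & 1))
			i++;	/* walk the free run */
		re = i;

		if (re - rs > best.len) {	/* strictly larger wins */
			best.start = rs;
			best.len = re - rs;
		}
	}
	return best;
}
```

Starting the scan at a later offset, as the kernel does after promoting the scan_hint, simply skips runs that are already accounted for.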
     745                 :            : /**
     746                 :            :  * pcpu_block_update_hint_alloc - update hint on allocation path
     747                 :            :  * @chunk: chunk of interest
     748                 :            :  * @bit_off: chunk offset
     749                 :            :  * @bits: size of request
     750                 :            :  *
     751                 :            :  * Updates metadata for the allocation path.  The metadata only has to be
     752                 :            :  * refreshed by a full scan iff the chunk's contig hint is broken.  Block level
     753                 :            :  * scans are required if the block's contig hint is broken.
     754                 :            :  */
     755                 :      78639 : static void pcpu_block_update_hint_alloc(struct pcpu_chunk *chunk, int bit_off,
     756                 :            :                                          int bits)
     757                 :            : {
     758                 :      78639 :         struct pcpu_block_md *chunk_md = &chunk->chunk_md;
     759                 :      78639 :         int nr_empty_pages = 0;
     760                 :      78639 :         struct pcpu_block_md *s_block, *e_block, *block;
     761                 :      78639 :         int s_index, e_index;   /* block indexes of the allocation */
     762                 :      78639 :         int s_off, e_off;       /* block offsets of the allocation */
     763                 :            : 
     764                 :            :         /*
     765                 :            :          * Calculate per block offsets.
     766                 :            :          * The calculation uses an inclusive range, but the resulting offsets
     767                 :            :          * are [start, end).  e_index always points to the last block in the
     768                 :            :          * range.
     769                 :            :          */
     770                 :      78639 :         s_index = pcpu_off_to_block_index(bit_off);
     771                 :      78639 :         e_index = pcpu_off_to_block_index(bit_off + bits - 1);
     772                 :      78639 :         s_off = pcpu_off_to_block_off(bit_off);
     773                 :      78639 :         e_off = pcpu_off_to_block_off(bit_off + bits - 1) + 1;
     774                 :            : 
     775                 :      78639 :         s_block = chunk->md_blocks + s_index;
     776                 :      78639 :         e_block = chunk->md_blocks + e_index;
     777                 :            : 
     778                 :            :         /*
     779                 :            :          * Update s_block.
     780                 :            :          * block->first_free must be updated if the allocation takes its place.
     781                 :            :          * If the allocation breaks the contig_hint, a scan is required to
     782                 :            :          * restore this hint.
     783                 :            :          */
     784         [ +  + ]:      78639 :         if (s_block->contig_hint == PCPU_BITMAP_BLOCK_BITS)
     785                 :        701 :                 nr_empty_pages++;
     786                 :            : 
     787         [ +  + ]:      78639 :         if (s_off == s_block->first_free)
     788                 :      43285 :                 s_block->first_free = find_next_zero_bit(
     789                 :            :                                         pcpu_index_alloc_map(chunk, s_index),
     790                 :            :                                         PCPU_BITMAP_BLOCK_BITS,
     791                 :      43285 :                                         s_off + bits);
     792                 :            : 
     793                 :      78639 :         if (pcpu_region_overlap(s_block->scan_hint_start,
     794         [ +  + ]:      78639 :                                 s_block->scan_hint_start + s_block->scan_hint,
     795                 :            :                                 s_off,
     796                 :            :                                 s_off + bits))
     797                 :       5036 :                 s_block->scan_hint = 0;
     798                 :            : 
     799                 :      78639 :         if (pcpu_region_overlap(s_block->contig_hint_start,
     800                 :      78639 :                                 s_block->contig_hint_start +
     801         [ +  + ]:      78639 :                                 s_block->contig_hint,
     802                 :            :                                 s_off,
     803                 :            :                                 s_off + bits)) {
     804                 :            :                 /* block contig hint is broken - scan to fix it */
     805         [ +  + ]:      59733 :                 if (!s_off)
     806                 :        623 :                         s_block->left_free = 0;
     807                 :      59733 :                 pcpu_block_refresh_hint(chunk, s_index);
     808                 :            :         } else {
     809                 :            :                 /* update left and right contig manually */
     810                 :      18906 :                 s_block->left_free = min(s_block->left_free, s_off);
     811         [ +  - ]:      18906 :                 if (s_index == e_index)
     812                 :      18906 :                         s_block->right_free = min_t(int, s_block->right_free,
     813                 :            :                                         PCPU_BITMAP_BLOCK_BITS - e_off);
     814                 :            :                 else
     815                 :          0 :                         s_block->right_free = 0;
     816                 :            :         }
     817                 :            : 
     818                 :            :         /*
     819                 :            :          * Update e_block.
     820                 :            :          */
     821         [ +  + ]:      78639 :         if (s_index != e_index) {
     822         [ +  + ]:        236 :                 if (e_block->contig_hint == PCPU_BITMAP_BLOCK_BITS)
     823                 :        235 :                         nr_empty_pages++;
     824                 :            : 
     825                 :            :                 /*
     826                 :            :                  * When the allocation is across blocks, the end is along
     827                 :            :                  * the left part of the e_block.
     828                 :            :                  */
     829                 :        236 :                 e_block->first_free = find_next_zero_bit(
     830                 :            :                                 pcpu_index_alloc_map(chunk, e_index),
     831                 :            :                                 PCPU_BITMAP_BLOCK_BITS, e_off);
     832                 :            : 
     833         [ -  + ]:        236 :                 if (e_off == PCPU_BITMAP_BLOCK_BITS) {
     834                 :            :                         /* reset the block */
     835                 :          0 :                         e_block++;
     836                 :            :                 } else {
     837         [ +  - ]:        236 :                         if (e_off > e_block->scan_hint_start)
     838                 :        236 :                                 e_block->scan_hint = 0;
     839                 :            : 
     840                 :        236 :                         e_block->left_free = 0;
     841         [ +  + ]:        236 :                         if (e_off > e_block->contig_hint_start) {
     842                 :            :                                 /* contig hint is broken - scan to fix it */
     843                 :        235 :                                 pcpu_block_refresh_hint(chunk, e_index);
     844                 :            :                         } else {
     845                 :          1 :                                 e_block->right_free =
     846                 :          1 :                                         min_t(int, e_block->right_free,
     847                 :            :                                               PCPU_BITMAP_BLOCK_BITS - e_off);
     848                 :            :                         }
     849                 :            :                 }
     850                 :            : 
     851                 :            :                 /* update in-between md_blocks */
     852                 :        236 :                 nr_empty_pages += (e_index - s_index - 1);
     853         [ -  + ]:        236 :                 for (block = s_block + 1; block < e_block; block++) {
     854                 :          0 :                         block->scan_hint = 0;
     855                 :          0 :                         block->contig_hint = 0;
     856                 :          0 :                         block->left_free = 0;
     857                 :          0 :                         block->right_free = 0;
     858                 :            :                 }
     859                 :            :         }
     860                 :            : 
     861         [ +  + ]:      78639 :         if (nr_empty_pages)
     862         [ +  - ]:        936 :                 pcpu_update_empty_pages(chunk, -nr_empty_pages);
     863                 :            : 
     864                 :      78639 :         if (pcpu_region_overlap(chunk_md->scan_hint_start,
     865                 :      78639 :                                 chunk_md->scan_hint_start +
     866         [ +  + ]:      78639 :                                 chunk_md->scan_hint,
     867                 :            :                                 bit_off,
     868                 :            :                                 bit_off + bits))
     869                 :       1878 :                 chunk_md->scan_hint = 0;
     870                 :            : 
     871                 :            :         /*
     872                 :            :          * The only time a full chunk scan is required is if the chunk
     873                 :            :          * contig hint is broken.  Otherwise, it means a smaller space
     874                 :            :          * was used and therefore the chunk contig hint is still correct.
     875                 :            :          */
     876                 :      78639 :         if (pcpu_region_overlap(chunk_md->contig_hint_start,
     877                 :      78639 :                                 chunk_md->contig_hint_start +
     878         [ +  + ]:      78639 :                                 chunk_md->contig_hint,
     879                 :            :                                 bit_off,
     880                 :            :                                 bit_off + bits))
     881                 :      53824 :                 pcpu_chunk_refresh_hint(chunk, false);
     882                 :      78639 : }
     883                 :            : 
     884                 :            : /**
     885                 :            :  * pcpu_block_update_hint_free - updates the block hints on the free path
     886                 :            :  * @chunk: chunk of interest
     887                 :            :  * @bit_off: chunk offset
     888                 :            :  * @bits: size of request
     889                 :            :  *
     890                 :            :  * Updates metadata for the free path.  This avoids a blind block
     891                 :            :  * refresh by making use of the block contig hints.  If this fails, it scans
     892                 :            :  * forward and backward to determine the extent of the free area.  This is
     893                 :            :  * capped at the boundary of blocks.
     894                 :            :  *
     895                 :            :  * A chunk update is triggered if a page becomes free, a block becomes free,
     896                 :            :  * or the free spans across blocks.  This tradeoff is to minimize iterating
     897                 :            :  * over the block metadata to update chunk_md->contig_hint.
     898                 :            :  * chunk_md->contig_hint may be off by up to a page, but it will never be more
     899                 :            :  * than the available space.  If the contig hint is contained in one block, it
     900                 :            :  * will be accurate.
     901                 :            :  */
     902                 :       6129 : static void pcpu_block_update_hint_free(struct pcpu_chunk *chunk, int bit_off,
     903                 :            :                                         int bits)
     904                 :            : {
     905                 :       6129 :         int nr_empty_pages = 0;
     906                 :       6129 :         struct pcpu_block_md *s_block, *e_block, *block;
     907                 :       6129 :         int s_index, e_index;   /* block indexes of the freed allocation */
     908                 :       6129 :         int s_off, e_off;       /* block offsets of the freed allocation */
     909                 :       6129 :         int start, end;         /* start and end of the whole free area */
     910                 :            : 
     911                 :            :         /*
     912                 :            :          * Calculate per block offsets.
     913                 :            :          * The calculation uses an inclusive range, but the resulting offsets
     914                 :            :          * are [start, end).  e_index always points to the last block in the
     915                 :            :          * range.
     916                 :            :          */
     917                 :       6129 :         s_index = pcpu_off_to_block_index(bit_off);
     918                 :       6129 :         e_index = pcpu_off_to_block_index(bit_off + bits - 1);
     919                 :       6129 :         s_off = pcpu_off_to_block_off(bit_off);
     920                 :       6129 :         e_off = pcpu_off_to_block_off(bit_off + bits - 1) + 1;
     921                 :            : 
     922                 :       6129 :         s_block = chunk->md_blocks + s_index;
     923                 :       6129 :         e_block = chunk->md_blocks + e_index;
     924                 :            : 
     925                 :            :         /*
     926                 :            :          * Check if the freed area aligns with the block->contig_hint.
     927                 :            :          * If it does, then the scan to find the beginning/end of the
     928                 :            :          * larger free area can be avoided.
     929                 :            :          *
     930                 :            :          * start and end refer to beginning and end of the free area
     931                 :            :          * within their respective blocks.  This is not necessarily
     932                 :            :          * the entire free area as it may span blocks past the beginning
     933                 :            :          * or end of the block.
     934                 :            :          */
     935                 :       6129 :         start = s_off;
     936         [ +  + ]:       6129 :         if (s_off == s_block->contig_hint + s_block->contig_hint_start) {
     937                 :            :                 start = s_block->contig_hint_start;
     938                 :            :         } else {
     939                 :            :                 /*
     940                 :            :                  * Scan backwards to find the extent of the free area.
     941                 :            :                  * find_last_bit returns its size argument (here, start) when
     942                 :            :                  * no bit is set, so a return value of start means every bit
     943                 :            :                  * before it in the block is free.
     944                 :            :                  */
     945                 :       5675 :                 int l_bit = find_last_bit(pcpu_index_alloc_map(chunk, s_index),
     946                 :            :                                           start);
     947         [ +  - ]:       5675 :                 start = (start == l_bit) ? 0 : l_bit + 1;
     948                 :            :         }
     949                 :            : 
     950                 :       6129 :         end = e_off;
     951         [ +  + ]:       6129 :         if (e_off == e_block->contig_hint_start)
     952                 :        559 :                 end = e_block->contig_hint_start + e_block->contig_hint;
     953                 :            :         else
     954                 :       5570 :                 end = find_next_bit(pcpu_index_alloc_map(chunk, e_index),
     955                 :            :                                     PCPU_BITMAP_BLOCK_BITS, end);
     956                 :            : 
     957                 :            :         /* update s_block */
     958         [ +  + ]:       6129 :         e_off = (s_index == e_index) ? end : PCPU_BITMAP_BLOCK_BITS;
     959         [ -  + ]:       6129 :         if (!start && e_off == PCPU_BITMAP_BLOCK_BITS)
     960                 :          0 :                 nr_empty_pages++;
     961                 :       6129 :         pcpu_block_update(s_block, start, e_off);
     962                 :            : 
     963                 :            :         /* freeing across blocks */
     964         [ +  + ]:       6129 :         if (s_index != e_index) {
     965                 :            :                 /* update e_block */
     966         [ -  + ]:          1 :                 if (end == PCPU_BITMAP_BLOCK_BITS)
     967                 :          0 :                         nr_empty_pages++;
     968                 :          1 :                 pcpu_block_update(e_block, 0, end);
     969                 :            : 
     970                 :            :                 /* reset md_blocks in the middle */
     971                 :          1 :                 nr_empty_pages += (e_index - s_index - 1);
     972         [ -  + ]:          1 :                 for (block = s_block + 1; block < e_block; block++) {
     973                 :          0 :                         block->first_free = 0;
     974                 :          0 :                         block->scan_hint = 0;
     975                 :          0 :                         block->contig_hint_start = 0;
     976                 :          0 :                         block->contig_hint = PCPU_BITMAP_BLOCK_BITS;
     977                 :          0 :                         block->left_free = PCPU_BITMAP_BLOCK_BITS;
     978                 :          0 :                         block->right_free = PCPU_BITMAP_BLOCK_BITS;
     979                 :            :                 }
     980                 :            :         }
     981                 :            : 
     982         [ -  + ]:       6129 :         if (nr_empty_pages)
     983         [ #  # ]:          0 :                 pcpu_update_empty_pages(chunk, nr_empty_pages);
     984                 :            : 
     985                 :            :         /*
     986                 :            :          * Refresh chunk metadata when the free makes a block free or spans
     987                 :            :          * across blocks.  The contig_hint may be off by up to a page, but if
     988                 :            :          * the contig_hint is contained in a block, it will be accurate with
     989                 :            :          * the else condition below.
     990                 :            :          */
     991   [ +  +  -  + ]:       6129 :         if (((end - start) >= PCPU_BITMAP_BLOCK_BITS) || s_index != e_index)
     992                 :          1 :                 pcpu_chunk_refresh_hint(chunk, true);
     993                 :            :         else
     994                 :       6128 :                 pcpu_block_update(&chunk->chunk_md,
     995                 :            :                                   pcpu_block_off_to_off(s_index, start),
     996                 :            :                                   end);
     997                 :       6129 : }
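
The per-block offset calculation used by the hint update functions above can be tried out in user space. The following is a minimal sketch, not the kernel code: `SKETCH_BLOCK_BITS`, `struct sketch_span` and `sketch_block_span()` are hypothetical names, and the block size of 1024 bits assumes 4 KiB pages with 4-byte allocation units.

```c
#include <assert.h>

/* Assumed for illustration: 4 KiB pages / 4-byte allocation units. */
#define SKETCH_BLOCK_BITS 1024

struct sketch_span {
	int s_index, e_index;	/* first and last block touched by the range */
	int s_off, e_off;	/* [s_off, e_off) within those two blocks */
};

/* Split the chunk-relative range [bit_off, bit_off + bits) into block terms. */
static struct sketch_span sketch_block_span(int bit_off, int bits)
{
	struct sketch_span sp;
	int last = bit_off + bits - 1;	/* inclusive last bit of the range */

	assert(bit_off >= 0 && bits > 0);

	sp.s_index = bit_off / SKETCH_BLOCK_BITS;
	sp.e_index = last / SKETCH_BLOCK_BITS;
	sp.s_off = bit_off % SKETCH_BLOCK_BITS;
	/* Computing from the inclusive bit keeps e_index on the last block
	 * actually touched; the + 1 makes the offset exclusive again. */
	sp.e_off = last % SKETCH_BLOCK_BITS + 1;
	return sp;
}
```

Working from the inclusive last bit means a range that ends exactly on a block boundary yields `e_off == SKETCH_BLOCK_BITS` in the same block, rather than `e_off == 0` in the following one.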
     998                 :            : 
     999                 :            : /**
    1000                 :            :  * pcpu_is_populated - determines if the region is populated
    1001                 :            :  * @chunk: chunk of interest
    1002                 :            :  * @bit_off: chunk offset
    1003                 :            :  * @bits: size of area
    1004                 :            :  * @next_off: return value for the next offset to start searching
    1005                 :            :  *
    1006                 :            :  * For atomic allocations, check if the backing pages are populated.
    1007                 :            :  *
    1008                 :            :  * RETURNS:
    1009                 :            :  * True if the backing pages are populated.  Otherwise, @next_off is set
    1010                 :            :  * so that pcpu_find_block_fit can skip over the unpopulated blocks.
    1011                 :            :  */
    1012                 :        208 : static bool pcpu_is_populated(struct pcpu_chunk *chunk, int bit_off, int bits,
    1013                 :            :                               int *next_off)
    1014                 :            : {
    1015                 :        208 :         unsigned int page_start, page_end, rs, re;
    1016                 :            : 
    1017                 :        208 :         page_start = PFN_DOWN(bit_off * PCPU_MIN_ALLOC_SIZE);
    1018                 :        208 :         page_end = PFN_UP((bit_off + bits) * PCPU_MIN_ALLOC_SIZE);
    1019                 :            : 
    1020                 :        208 :         rs = page_start;
    1021                 :        208 :         bitmap_next_clear_region(chunk->populated, &rs, &re, page_end);
    1022         [ -  + ]:        208 :         if (rs >= page_end)
    1023                 :            :                 return true;
    1024                 :            : 
    1025                 :          0 :         *next_off = re * PAGE_SIZE / PCPU_MIN_ALLOC_SIZE;
    1026                 :          0 :         return false;
    1027                 :            : }
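
The conversion from allocation-unit offsets to pages in pcpu_is_populated() above can be illustrated with a small user-space sketch. It replaces the kernel bitmap with a plain `bool` array, and `SK_PAGE_SIZE`, `SK_MIN_ALLOC` and `sketch_is_populated()` are hypothetical names with assumed values (4 KiB pages, 4-byte units).

```c
#include <assert.h>
#include <stdbool.h>

#define SK_PAGE_SIZE 4096	/* assumed page size */
#define SK_MIN_ALLOC 4		/* assumed bytes per allocation unit */

/*
 * Returns true if every page backing [bit_off, bit_off + bits) is populated.
 * Otherwise stores in *next_off the first allocation-unit offset past the
 * unpopulated run, so that a caller can resume its search from there.
 */
static bool sketch_is_populated(const bool *populated, int bit_off, int bits,
				int *next_off)
{
	/* Round the byte range outwards to whole pages. */
	int page_start = (bit_off * SK_MIN_ALLOC) / SK_PAGE_SIZE;
	int page_end = ((bit_off + bits) * SK_MIN_ALLOC + SK_PAGE_SIZE - 1) /
		       SK_PAGE_SIZE;
	int rs, re;

	assert(bits > 0 && next_off);

	/* rs: first unpopulated page in [page_start, page_end), if any. */
	for (rs = page_start; rs < page_end && populated[rs]; rs++)
		;
	if (rs >= page_end)
		return true;

	/* re: end of that unpopulated run, capped at page_end. */
	for (re = rs; re < page_end && !populated[re]; re++)
		;
	*next_off = re * SK_PAGE_SIZE / SK_MIN_ALLOC;
	return false;
}
```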
    1028                 :            : 
    1029                 :            : /**
    1030                 :            :  * pcpu_find_block_fit - finds the block index to start searching
    1031                 :            :  * @chunk: chunk of interest
    1032                 :            :  * @alloc_bits: size of request in allocation units
    1033                 :            :  * @align: alignment of area (max PAGE_SIZE bytes)
    1034                 :            :  * @pop_only: use populated regions only
    1035                 :            :  *
    1036                 :            :  * Given a chunk and an allocation spec, find the offset to begin searching
    1037                 :            :  * for a free region.  This iterates over the bitmap metadata blocks to
    1038                 :            :  * find an offset that is guaranteed to fit the requirements.  It is not
    1039                 :            :  * quite first fit: a block or chunk is skipped if the allocation does not
    1040                 :            :  * fit in its contig hint.  This errs on the side of caution
    1041                 :            :  * to prevent excess iteration.  Poor alignment can cause the allocator to
    1042                 :            :  * skip over blocks and chunks that have valid free areas.
    1043                 :            :  *
    1044                 :            :  * RETURNS:
    1045                 :            :  * The offset in the bitmap to begin searching.
    1046                 :            :  * -1 if no offset is found.
    1047                 :            :  */
    1048                 :      78405 : static int pcpu_find_block_fit(struct pcpu_chunk *chunk, int alloc_bits,
    1049                 :            :                                size_t align, bool pop_only)
    1050                 :            : {
    1051                 :      78405 :         struct pcpu_block_md *chunk_md = &chunk->chunk_md;
    1052                 :      78405 :         int bit_off, bits, next_off;
    1053                 :            : 
    1054                 :            :         /*
    1055                 :            :          * Check to see if the allocation can fit in the chunk's contig hint.
    1056                 :            :          * This is an optimization to prevent scanning by assuming if it
    1057                 :            :          * cannot fit in the global hint, there is memory pressure and creating
    1058                 :            :          * a new chunk would happen soon.
    1059                 :            :          */
    1060                 :      78405 :         bit_off = ALIGN(chunk_md->contig_hint_start, align) -
    1061                 :            :                   chunk_md->contig_hint_start;
    1062         [ +  - ]:      78405 :         if (bit_off + alloc_bits > chunk_md->contig_hint)
    1063                 :            :                 return -1;
    1064                 :            : 
    1065         [ +  + ]:      78405 :         bit_off = pcpu_next_hint(chunk_md, alloc_bits);
    1066                 :      78405 :         bits = 0;
    1067         [ +  - ]:      78405 :         pcpu_for_each_fit_region(chunk, alloc_bits, align, bit_off, bits) {
    1068   [ +  +  -  + ]:      78405 :                 if (!pop_only || pcpu_is_populated(chunk, bit_off, bits,
    1069                 :            :                                                    &next_off))
    1070                 :            :                         break;
    1071                 :            : 
    1072                 :          0 :                 bit_off = next_off;
    1073                 :          0 :                 bits = 0;
    1074                 :            :         }
    1075                 :            : 
    1076         [ -  + ]:      78405 :         if (bit_off == pcpu_chunk_map_bits(chunk))
    1077                 :          0 :                 return -1;
    1078                 :            : 
    1079                 :            :         return bit_off;
    1080                 :            : }
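
The early exit at the top of pcpu_find_block_fit() above charges the alignment padding against the contig hint before any scanning happens. A minimal sketch of that check, assuming `align` is a power of two expressed in allocation units; `sk_fits_hint()` is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Does an allocation of alloc_bits with the given alignment fit inside a
 * free run of hint_bits that starts at hint_start?
 */
static bool sk_fits_hint(int hint_start, int hint_bits, int alloc_bits,
			 int align)
{
	int pad;

	/* align must be a non-zero power of two for the mask trick below. */
	assert(align > 0 && (align & (align - 1)) == 0);

	/* Bits lost at the front of the run to reach an aligned offset. */
	pad = ((hint_start + align - 1) & ~(align - 1)) - hint_start;
	return pad + alloc_bits <= hint_bits;
}
```

For example, a free run of 8 bits starting at offset 5 can hold a 4-bit allocation aligned to 4 (3 bits of padding plus 4 bits), but not a 6-bit one.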
    1081                 :            : 
    1082                 :            : /*
    1083                 :            :  * pcpu_find_zero_area - modified from bitmap_find_next_zero_area_off()
    1084                 :            :  * @map: the address to base the search on
    1085                 :            :  * @size: the bitmap size in bits
    1086                 :            :  * @start: the bitnumber to start searching at
    1087                 :            :  * @nr: the number of zeroed bits we're looking for
    1088                 :            :  * @align_mask: alignment mask for zero area
    1089                 :            :  * @largest_off: offset of the largest area skipped
    1090                 :            :  * @largest_bits: size of the largest area skipped
    1091                 :            :  *
    1092                 :            :  * The @align_mask should be one less than a power of 2.
    1093                 :            :  *
    1094                 :            :  * This is a modified version of bitmap_find_next_zero_area_off() to remember
    1095                 :            :  * the largest area that was skipped.  This is imperfect, but in general is
    1096                 :            :  * good enough.  The largest remembered region is the largest failed region
    1097                 :            :  * seen.  This does not include anything we possibly skipped due to alignment.
    1098                 :            :  * pcpu_block_update_scan() does scan backwards to try and recover what was
    1099                 :            :  * lost to alignment.  While this can cause scanning to miss earlier possible
    1100                 :            :  * free areas, smaller allocations will eventually fill those holes.
    1101                 :            :  */
    1102                 :      78405 : static unsigned long pcpu_find_zero_area(unsigned long *map,
    1103                 :            :                                          unsigned long size,
    1104                 :            :                                          unsigned long start,
    1105                 :            :                                          unsigned long nr,
    1106                 :            :                                          unsigned long align_mask,
    1107                 :            :                                          unsigned long *largest_off,
    1108                 :            :                                          unsigned long *largest_bits)
    1109                 :            : {
    1110                 :      87890 :         unsigned long index, end, i, area_off, area_bits;
    1111                 :      87890 : again:
    1112                 :      87890 :         index = find_next_zero_bit(map, size, start);
    1113                 :            : 
    1114                 :            :         /* Align allocation */
    1115                 :      87890 :         index = __ALIGN_MASK(index, align_mask);
    1116                 :      87890 :         area_off = index;
    1117                 :            : 
    1118                 :      87890 :         end = index + nr;
    1119         [ -  + ]:      87890 :         if (end > size)
    1120                 :          0 :                 return end;
    1121                 :      87890 :         i = find_next_bit(map, end, index);
    1122         [ +  + ]:      87890 :         if (i < end) {
    1123                 :       9485 :                 area_bits = i - area_off;
    1124                 :            :                 /* remember largest unused area with best alignment */
    1125   [ +  +  +  + ]:       9485 :                 if (area_bits > *largest_bits ||
    1126   [ +  +  +  - ]:       7500 :                     (area_bits == *largest_bits && *largest_off &&
    1127         [ +  + ]:         54 :                      (!area_off || __ffs(area_off) > __ffs(*largest_off)))) {
    1128                 :       1503 :                         *largest_off = area_off;
    1129                 :       1503 :                         *largest_bits = area_bits;
    1130                 :            :                 }
    1131                 :            : 
    1132                 :       9485 :                 start = i + 1;
    1133                 :       9485 :                 goto again;
    1134                 :            :         }
    1135                 :            :         return index;
    1136                 :            : }
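
The search loop in pcpu_find_zero_area() above can be approximated in user space. This is a simplified sketch rather than the kernel routine: it uses a `bool` array instead of a bitmap, the helper names are hypothetical, and it omits the tie-break that prefers the better-aligned of two equally sized skipped areas.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Index of the first entry equal to val in map[start, size), else size. */
static size_t sk_find_next(const bool *map, size_t size, size_t start,
			   bool val)
{
	while (start < size && map[start] != val)
		start++;
	return start < size ? start : size;
}

/*
 * Find nr consecutive clear entries at an aligned offset in map[0, size).
 * Returns the offset, or a value >= size if nothing fits.  The largest free
 * run that was too small is recorded through largest_off and largest_bits.
 */
static size_t sketch_find_zero_area(const bool *map, size_t size, size_t start,
				    size_t nr, size_t align_mask,
				    size_t *largest_off, size_t *largest_bits)
{
	/* align_mask must be one less than a power of two. */
	assert((align_mask & (align_mask + 1)) == 0);

	for (;;) {
		size_t index = sk_find_next(map, size, start, false);
		size_t end, i;

		index = (index + align_mask) & ~align_mask;	/* align up */
		end = index + nr;
		if (end > size)
			return end;		/* ran off the map: no fit */

		i = sk_find_next(map, end, index, true);
		if (i >= end)
			return index;		/* [index, end) is all clear */

		/* Too small: remember it if it is the largest seen so far. */
		if (i - index > *largest_bits) {
			*largest_off = index;
			*largest_bits = i - index;
		}
		start = i + 1;
	}
}
```

As in the listing, the recorded area starts at the aligned index, so free bits skipped over by the alignment step are not counted.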
    1137                 :            : 
    1138                 :            : /**
    1139                 :            :  * pcpu_alloc_area - allocates an area from a pcpu_chunk
    1140                 :            :  * @chunk: chunk of interest
    1141                 :            :  * @alloc_bits: size of request in allocation units
    1142                 :            :  * @align: alignment of area (max PAGE_SIZE)
    1143                 :            :  * @start: bit_off to start searching
    1144                 :            :  *
    1145                 :            :  * This function takes in a @start offset to begin searching to fit an
    1146                 :            :  * allocation of @alloc_bits with alignment @align.  It needs to scan
    1147                 :            :  * the allocation map because if it fits within the block's contig hint,
    1148                 :            :  * @start will be block->first_free. This is an attempt to fill the
    1149                 :            :  * allocation prior to breaking the contig hint.  The allocation and
    1150                 :            :  * boundary maps are updated accordingly if it confirms a valid
    1151                 :            :  * free area.
    1152                 :            :  *
    1153                 :            :  * RETURNS:
    1154                 :            :  * Allocated addr offset in @chunk on success.
    1155                 :            :  * -1 if no matching area is found.
    1156                 :            :  */
    1157                 :      78405 : static int pcpu_alloc_area(struct pcpu_chunk *chunk, int alloc_bits,
    1158                 :            :                            size_t align, int start)
    1159                 :            : {
    1160                 :      78405 :         struct pcpu_block_md *chunk_md = &chunk->chunk_md;
    1161         [ +  - ]:      78405 :         size_t align_mask = (align) ? (align - 1) : 0;
    1162                 :      78405 :         unsigned long area_off = 0, area_bits = 0;
    1163                 :      78405 :         int bit_off, end, oslot;
    1164                 :            : 
    1165                 :      78405 :         lockdep_assert_held(&pcpu_lock);
    1166                 :            : 
    1167         [ +  - ]:      78405 :         oslot = pcpu_chunk_slot(chunk);
    1168                 :            : 
    1169                 :            :         /*
    1170                 :            :          * Search to find a fit.
    1171                 :            :          */
    1172                 :      78405 :         end = min_t(int, start + alloc_bits + PCPU_BITMAP_BLOCK_BITS,
    1173                 :            :                     pcpu_chunk_map_bits(chunk));
    1174                 :      78405 :         bit_off = pcpu_find_zero_area(chunk->alloc_map, end, start, alloc_bits,
    1175                 :            :                                       align_mask, &area_off, &area_bits);
    1176         [ +  - ]:      78405 :         if (bit_off >= end)
    1177                 :            :                 return -1;
    1178                 :            : 
    1179         [ +  + ]:      78405 :         if (area_bits)
    1180                 :       1435 :                 pcpu_block_update_scan(chunk, area_off, area_bits);
    1181                 :            : 
    1182                 :            :         /* update alloc map */
    1183         [ -  + ]:      78405 :         bitmap_set(chunk->alloc_map, bit_off, alloc_bits);
    1184                 :            : 
    1185                 :            :         /* update boundary map */
    1186                 :      78405 :         set_bit(bit_off, chunk->bound_map);
    1187         [ -  + ]:      78405 :         bitmap_clear(chunk->bound_map, bit_off + 1, alloc_bits - 1);
    1188                 :      78405 :         set_bit(bit_off + alloc_bits, chunk->bound_map);
    1189                 :            : 
    1190                 :      78405 :         chunk->free_bytes -= alloc_bits * PCPU_MIN_ALLOC_SIZE;
    1191                 :            : 
    1192                 :            :         /* update first free bit */
    1193         [ +  + ]:      78405 :         if (bit_off == chunk_md->first_free)
    1194                 :      36127 :                 chunk_md->first_free = find_next_zero_bit(
    1195                 :      36127 :                                         chunk->alloc_map,
    1196                 :            :                                         pcpu_chunk_map_bits(chunk),
    1197                 :            :                                         bit_off + alloc_bits);
    1198                 :            : 
    1199                 :      78405 :         pcpu_block_update_hint_alloc(chunk, bit_off, alloc_bits);
    1200                 :            : 
    1201                 :      78405 :         pcpu_chunk_relocate(chunk, oslot);
    1202                 :            : 
    1203                 :      78405 :         return bit_off * PCPU_MIN_ALLOC_SIZE;
    1204                 :            : }
    1205                 :            : 
    1206                 :            : /**
    1207                 :            :  * pcpu_free_area - frees the corresponding offset
    1208                 :            :  * @chunk: chunk of interest
    1209                 :            :  * @off: addr offset into chunk
    1210                 :            :  *
    1211                 :            :  * This function determines the size of an allocation to free using
    1212                 :            :  * the boundary bitmap and clears the allocation map.
    1213                 :            :  */
    1214                 :       6129 : static void pcpu_free_area(struct pcpu_chunk *chunk, int off)
    1215                 :            : {
    1216                 :       6129 :         struct pcpu_block_md *chunk_md = &chunk->chunk_md;
    1217                 :       6129 :         int bit_off, bits, end, oslot;
    1218                 :            : 
    1219                 :       6129 :         lockdep_assert_held(&pcpu_lock);
    1220         [ +  - ]:       6129 :         pcpu_stats_area_dealloc(chunk);
    1221                 :            : 
    1222         [ +  - ]:       6129 :         oslot = pcpu_chunk_slot(chunk);
    1223                 :            : 
    1224                 :       6129 :         bit_off = off / PCPU_MIN_ALLOC_SIZE;
    1225                 :            : 
    1226                 :            :         /* find end index */
    1227                 :      12258 :         end = find_next_bit(chunk->bound_map, pcpu_chunk_map_bits(chunk),
    1228                 :       6129 :                             bit_off + 1);
    1229                 :       6129 :         bits = end - bit_off;
    1230         [ -  + ]:       6129 :         bitmap_clear(chunk->alloc_map, bit_off, bits);
    1231                 :            : 
    1232                 :            :         /* update metadata */
    1233                 :       6129 :         chunk->free_bytes += bits * PCPU_MIN_ALLOC_SIZE;
    1234                 :            : 
    1235                 :            :         /* update first free bit */
    1236                 :       6129 :         chunk_md->first_free = min(chunk_md->first_free, bit_off);
    1237                 :            : 
    1238                 :       6129 :         pcpu_block_update_hint_free(chunk, bit_off, bits);
    1239                 :            : 
    1240                 :       6129 :         pcpu_chunk_relocate(chunk, oslot);
    1241                 :       6129 : }
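
The kernel-doc for pcpu_free_area() above says the size of an allocation is recovered from the boundary bitmap instead of being stored. The following is a minimal userspace sketch of that idea, not the kernel implementation. The names toy_chunk, toy_init, toy_alloc and toy_free are hypothetical, and one bool per bit stands in for the real bitmaps and for find_next_bit().

```c
#include <stdbool.h>

#define TOY_BITS 64	/* hypothetical map size, in minimum allocation units */

struct toy_chunk {
	bool alloc_map[TOY_BITS];	/* true: unit is allocated */
	bool bound_map[TOY_BITS + 1];	/* true: an allocation starts or ends here */
	int free_units;
};

static void toy_init(struct toy_chunk *c)
{
	for (int i = 0; i < TOY_BITS; i++)
		c->alloc_map[i] = false;
	for (int i = 0; i <= TOY_BITS; i++)
		c->bound_map[i] = false;
	c->free_units = TOY_BITS;
}

/* Mark [bit_off, bit_off + bits) allocated and record both boundaries. */
static void toy_alloc(struct toy_chunk *c, int bit_off, int bits)
{
	for (int i = bit_off; i < bit_off + bits; i++)
		c->alloc_map[i] = true;

	c->bound_map[bit_off] = true;
	/* drop stale boundaries left inside the new allocation */
	for (int i = bit_off + 1; i < bit_off + bits; i++)
		c->bound_map[i] = false;
	c->bound_map[bit_off + bits] = true;

	c->free_units -= bits;
}

/*
 * Free the allocation starting at bit_off.  Its length is the distance to
 * the next set boundary bit, so no size needs to be passed in.
 * Returns the number of units freed.
 */
static int toy_free(struct toy_chunk *c, int bit_off)
{
	int end = bit_off + 1;
	int bits;

	/* linear scan standing in for find_next_bit() */
	while (end < TOY_BITS && !c->bound_map[end])
		end++;

	bits = end - bit_off;
	for (int i = bit_off; i < end; i++)
		c->alloc_map[i] = false;
	c->free_units += bits;

	return bits;
}
```

Allocating units 4..6 and 7..11 and then calling toy_free() on offset 4 releases exactly three units, because the boundary bit at 7 marks where the first allocation ends. The sketch leaves out the hint and slot updates that the real function performs.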
    1242                 :            : 
    1243                 :      41028 : static void pcpu_init_md_block(struct pcpu_block_md *block, int nr_bits)
    1244                 :            : {
    1245                 :      41028 :         block->scan_hint = 0;
    1246                 :      41028 :         block->contig_hint = nr_bits;
    1247                 :      41028 :         block->left_free = nr_bits;
    1248                 :      41028 :         block->right_free = nr_bits;
    1249                 :      41028 :         block->first_free = 0;
    1250                 :      41028 :         block->nr_bits = nr_bits;
    1251                 :            : }
    1252                 :            : 
    1253                 :        234 : static void pcpu_init_md_blocks(struct pcpu_chunk *chunk)
    1254                 :            : {
    1255                 :        234 :         struct pcpu_block_md *md_block;
    1256                 :            : 
    1257                 :            :         /* init the chunk's block */
    1258                 :        234 :         pcpu_init_md_block(&chunk->chunk_md, pcpu_chunk_map_bits(chunk));
    1259                 :            : 
    1260         [ +  + ]:      41028 :         for (md_block = chunk->md_blocks;
    1261         [ +  + ]:      41028 :              md_block != chunk->md_blocks + pcpu_chunk_nr_blocks(chunk);
    1262                 :      40794 :              md_block++)
    1263                 :      40794 :                 pcpu_init_md_block(md_block, PCPU_BITMAP_BLOCK_BITS);
    1264                 :        234 : }
    1265                 :            : 
    1266                 :            : /**
    1267                 :            :  * pcpu_alloc_first_chunk - creates chunks that serve the first chunk
    1268                 :            :  * @tmp_addr: the start of the region served
    1269                 :            :  * @map_size: size of the region served
    1270                 :            :  *
    1271                 :            :  * This is responsible for creating the chunks that serve the first chunk.  The
    1272                 :            :  * base_addr is @tmp_addr aligned down to a page boundary and the region end is
    1273                 :            :  * aligned up.  The start and end offsets track the region actually served, so
    1274                 :            :  * that the bitmap allocator never has to deal with partial blocks.
    1275                 :            :  *
    1276                 :            :  * RETURNS:
    1277                 :            :  * Chunk serving the region at @tmp_addr of @map_size.
    1278                 :            :  */
    1279                 :        156 : static struct pcpu_chunk * __init pcpu_alloc_first_chunk(unsigned long tmp_addr,
    1280                 :            :                                                          int map_size)
    1281                 :            : {
    1282                 :        156 :         struct pcpu_chunk *chunk;
    1283                 :        156 :         unsigned long aligned_addr, lcm_align;
    1284                 :        156 :         int start_offset, offset_bits, region_size, region_bits;
    1285                 :        156 :         size_t alloc_size;
    1286                 :            : 
    1287                 :            :         /* region calculations */
    1288                 :        156 :         aligned_addr = tmp_addr & PAGE_MASK;
    1289                 :            : 
    1290                 :        156 :         start_offset = tmp_addr - aligned_addr;
    1291                 :            : 
    1292                 :            :         /*
    1293                 :            :          * Align the end of the region with the LCM of PAGE_SIZE and
    1294                 :            :          * PCPU_BITMAP_BLOCK_SIZE.  One of these constants is a multiple of
    1295                 :            :          * the other.
    1296                 :            :          */
    1297                 :        156 :         lcm_align = lcm(PAGE_SIZE, PCPU_BITMAP_BLOCK_SIZE);
    1298                 :        156 :         region_size = ALIGN(start_offset + map_size, lcm_align);
    1299                 :            : 
    1300                 :            :         /* allocate chunk */
    1301                 :        156 :         alloc_size = sizeof(struct pcpu_chunk) +
    1302                 :        156 :                 BITS_TO_LONGS(region_size >> PAGE_SHIFT) * sizeof(unsigned long);
    1303                 :        156 :         chunk = memblock_alloc(alloc_size, SMP_CACHE_BYTES);
    1304         [ -  + ]:        156 :         if (!chunk)
    1305                 :          0 :                 panic("%s: Failed to allocate %zu bytes\n", __func__,
    1306                 :            :                       alloc_size);
    1307                 :            : 
    1308                 :        156 :         INIT_LIST_HEAD(&chunk->list);
    1309                 :            : 
    1310                 :        156 :         chunk->base_addr = (void *)aligned_addr;
    1311                 :        156 :         chunk->start_offset = start_offset;
    1312                 :        156 :         chunk->end_offset = region_size - chunk->start_offset - map_size;
    1313                 :            : 
    1314                 :        156 :         chunk->nr_pages = region_size >> PAGE_SHIFT;
    1315                 :        156 :         region_bits = pcpu_chunk_map_bits(chunk);
    1316                 :            : 
    1317                 :        156 :         alloc_size = BITS_TO_LONGS(region_bits) * sizeof(chunk->alloc_map[0]);
    1318                 :        156 :         chunk->alloc_map = memblock_alloc(alloc_size, SMP_CACHE_BYTES);
    1319         [ -  + ]:        156 :         if (!chunk->alloc_map)
    1320                 :          0 :                 panic("%s: Failed to allocate %zu bytes\n", __func__,
    1321                 :            :                       alloc_size);
    1322                 :            : 
    1323                 :        156 :         alloc_size =
    1324                 :        156 :                 BITS_TO_LONGS(region_bits + 1) * sizeof(chunk->bound_map[0]);
    1325                 :        156 :         chunk->bound_map = memblock_alloc(alloc_size, SMP_CACHE_BYTES);
    1326         [ -  + ]:        156 :         if (!chunk->bound_map)
    1327                 :          0 :                 panic("%s: Failed to allocate %zu bytes\n", __func__,
    1328                 :            :                       alloc_size);
    1329                 :            : 
    1330                 :        156 :         alloc_size = pcpu_chunk_nr_blocks(chunk) * sizeof(chunk->md_blocks[0]);
    1331                 :        156 :         chunk->md_blocks = memblock_alloc(alloc_size, SMP_CACHE_BYTES);
    1332         [ -  + ]:        156 :         if (!chunk->md_blocks)
    1333                 :          0 :                 panic("%s: Failed to allocate %zu bytes\n", __func__,
    1334                 :            :                       alloc_size);
    1335                 :            : 
    1336                 :        156 :         pcpu_init_md_blocks(chunk);
    1337                 :            : 
    1338                 :            :         /* manage populated page bitmap */
    1339                 :        156 :         chunk->immutable = true;
    1340         [ +  - ]:        156 :         bitmap_fill(chunk->populated, chunk->nr_pages);
    1341                 :        156 :         chunk->nr_populated = chunk->nr_pages;
    1342                 :        156 :         chunk->nr_empty_pop_pages = chunk->nr_pages;
    1343                 :            : 
    1344                 :        156 :         chunk->free_bytes = map_size;
    1345                 :            : 
    1346         [ +  - ]:        156 :         if (chunk->start_offset) {
    1347                 :            :                 /* hide the beginning of the bitmap */
    1348                 :        156 :                 offset_bits = chunk->start_offset / PCPU_MIN_ALLOC_SIZE;
    1349         [ -  + ]:        156 :                 bitmap_set(chunk->alloc_map, 0, offset_bits);
    1350                 :        156 :                 set_bit(0, chunk->bound_map);
    1351                 :        156 :                 set_bit(offset_bits, chunk->bound_map);
    1352                 :            : 
    1353                 :        156 :                 chunk->chunk_md.first_free = offset_bits;
    1354                 :            : 
    1355                 :        156 :                 pcpu_block_update_hint_alloc(chunk, 0, offset_bits);
    1356                 :            :         }
    1357                 :            : 
    1358         [ +  + ]:        156 :         if (chunk->end_offset) {
    1359                 :            :                 /* hide the end of the bitmap */
    1360                 :         78 :                 offset_bits = chunk->end_offset / PCPU_MIN_ALLOC_SIZE;
    1361         [ -  + ]:         78 :                 bitmap_set(chunk->alloc_map,
    1362         [ -  + ]:         78 :                            pcpu_chunk_map_bits(chunk) - offset_bits,
    1363                 :            :                            offset_bits);
    1364                 :         78 :                 set_bit((start_offset + map_size) / PCPU_MIN_ALLOC_SIZE,
    1365                 :         78 :                         chunk->bound_map);
    1366                 :         78 :                 set_bit(region_bits, chunk->bound_map);
    1367                 :            : 
    1368                 :         78 :                 pcpu_block_update_hint_alloc(chunk, pcpu_chunk_map_bits(chunk)
    1369                 :            :                                              - offset_bits, offset_bits);
    1370                 :            :         }
    1371                 :            : 
    1372                 :        156 :         return chunk;
    1373                 :            : }
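
The region calculations in pcpu_alloc_first_chunk() can be followed with plain arithmetic. The sketch below is a userspace illustration under stated assumptions: a 4096-byte page (TOY_PAGE_SIZE), and a single power-of-two `align` argument standing in for lcm(PAGE_SIZE, PCPU_BITMAP_BLOCK_SIZE). The names toy_region, toy_align_up and toy_first_chunk_region are hypothetical.

```c
#define TOY_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */

struct toy_region {
	unsigned long base_addr;	/* tmp_addr aligned down to a page */
	unsigned long start_offset;	/* hidden bytes before the served area */
	unsigned long end_offset;	/* hidden bytes after the served area */
	unsigned long region_size;	/* total size covered by the bitmap */
};

/* Round x up to a multiple of a; a must be a power of two. */
static unsigned long toy_align_up(unsigned long x, unsigned long a)
{
	return (x + a - 1) & ~(a - 1);
}

static struct toy_region toy_first_chunk_region(unsigned long tmp_addr,
						unsigned long map_size,
						unsigned long align)
{
	struct toy_region r;

	r.base_addr = tmp_addr & ~(TOY_PAGE_SIZE - 1);
	r.start_offset = tmp_addr - r.base_addr;
	r.region_size = toy_align_up(r.start_offset + map_size, align);
	r.end_offset = r.region_size - r.start_offset - map_size;

	return r;
}
```

For tmp_addr 0x10300, map_size 8192 and align 4096, this gives a start_offset of 768, a region_size of 12288 and an end_offset of 3328. The two offsets are the parts of the bitmap that the function then marks as allocated to hide them.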
    1374                 :            : 
    1375                 :         78 : static struct pcpu_chunk *pcpu_alloc_chunk(gfp_t gfp)
    1376                 :            : {
    1377                 :         78 :         struct pcpu_chunk *chunk;
    1378                 :         78 :         int region_bits;
    1379                 :            : 
    1380                 :         78 :         chunk = pcpu_mem_zalloc(pcpu_chunk_struct_size, gfp);
    1381         [ +  - ]:         78 :         if (!chunk)
    1382                 :            :                 return NULL;
    1383                 :            : 
    1384                 :         78 :         INIT_LIST_HEAD(&chunk->list);
    1385                 :         78 :         chunk->nr_pages = pcpu_unit_pages;
    1386                 :         78 :         region_bits = pcpu_chunk_map_bits(chunk);
    1387                 :            : 
    1388                 :         78 :         chunk->alloc_map = pcpu_mem_zalloc(BITS_TO_LONGS(region_bits) *
    1389                 :            :                                            sizeof(chunk->alloc_map[0]), gfp);
    1390         [ -  + ]:         78 :         if (!chunk->alloc_map)
    1391                 :          0 :                 goto alloc_map_fail;
    1392                 :            : 
    1393                 :         78 :         chunk->bound_map = pcpu_mem_zalloc(BITS_TO_LONGS(region_bits + 1) *
    1394                 :            :                                            sizeof(chunk->bound_map[0]), gfp);
    1395         [ -  + ]:         78 :         if (!chunk->bound_map)
    1396                 :          0 :                 goto bound_map_fail;
    1397                 :            : 
    1398                 :         78 :         chunk->md_blocks = pcpu_mem_zalloc(pcpu_chunk_nr_blocks(chunk) *
    1399                 :            :                                            sizeof(chunk->md_blocks[0]), gfp);
    1400         [ -  + ]:         78 :         if (!chunk->md_blocks)
    1401                 :          0 :                 goto md_blocks_fail;
    1402                 :            : 
    1403                 :         78 :         pcpu_init_md_blocks(chunk);
    1404                 :            : 
    1405                 :            :         /* init metadata */
    1406                 :         78 :         chunk->free_bytes = chunk->nr_pages * PAGE_SIZE;
    1407                 :            : 
    1408                 :         78 :         return chunk;
    1409                 :            : 
    1410                 :            : md_blocks_fail:
    1411                 :          0 :         pcpu_mem_free(chunk->bound_map);
    1412                 :          0 : bound_map_fail:
    1413                 :          0 :         pcpu_mem_free(chunk->alloc_map);
    1414                 :          0 : alloc_map_fail:
    1415                 :          0 :         pcpu_mem_free(chunk);
    1416                 :            : 
    1417                 :          0 :         return NULL;
    1418                 :            : }
    1419                 :            : 
    1420                 :          0 : static void pcpu_free_chunk(struct pcpu_chunk *chunk)
    1421                 :            : {
    1422         [ #  # ]:          0 :         if (!chunk)
    1423                 :            :                 return;
    1424                 :          0 :         pcpu_mem_free(chunk->md_blocks);
    1425                 :          0 :         pcpu_mem_free(chunk->bound_map);
    1426                 :          0 :         pcpu_mem_free(chunk->alloc_map);
    1427                 :          0 :         pcpu_mem_free(chunk);
    1428                 :            : }
    1429                 :            : 
    1430                 :            : /**
    1431                 :            :  * pcpu_chunk_populated - post-population bookkeeping
    1432                 :            :  * @chunk: pcpu_chunk which got populated
    1433                 :            :  * @page_start: the start page
    1434                 :            :  * @page_end: the end page
    1435                 :            :  *
    1436                 :            :  * Pages in [@page_start,@page_end) have been populated to @chunk.  Update
    1437                 :            :  * the bookkeeping information accordingly.  Must be called after each
    1438                 :            :  * successful population.
    1439                 :            :  *
    1440                 :            :  * The empty populated page count is then adjusted for the whole range via
    1441                 :            :  * pcpu_update_empty_pages().
    1442                 :            :  */
    1443                 :        156 : static void pcpu_chunk_populated(struct pcpu_chunk *chunk, int page_start,
    1444                 :            :                                  int page_end)
    1445                 :            : {
    1446                 :        156 :         int nr = page_end - page_start;
    1447                 :            : 
    1448                 :        156 :         lockdep_assert_held(&pcpu_lock);
    1449                 :            : 
    1450         [ -  + ]:        156 :         bitmap_set(chunk->populated, page_start, nr);
    1451                 :        156 :         chunk->nr_populated += nr;
    1452                 :        156 :         pcpu_nr_populated += nr;
    1453                 :            : 
    1454         [ +  - ]:        156 :         pcpu_update_empty_pages(chunk, nr);
    1455                 :        156 : }
    1456                 :            : 
    1457                 :            : /**
    1458                 :            :  * pcpu_chunk_depopulated - post-depopulation bookkeeping
    1459                 :            :  * @chunk: pcpu_chunk which got depopulated
    1460                 :            :  * @page_start: the start page
    1461                 :            :  * @page_end: the end page
    1462                 :            :  *
    1463                 :            :  * Pages in [@page_start,@page_end) have been depopulated from @chunk.
    1464                 :            :  * Update the bookkeeping information accordingly.  Must be called after
    1465                 :            :  * each successful depopulation.
    1466                 :            :  */
    1467                 :          0 : static void pcpu_chunk_depopulated(struct pcpu_chunk *chunk,
    1468                 :            :                                    int page_start, int page_end)
    1469                 :            : {
    1470                 :          0 :         int nr = page_end - page_start;
    1471                 :            : 
    1472                 :          0 :         lockdep_assert_held(&pcpu_lock);
    1473                 :            : 
    1474         [ #  # ]:          0 :         bitmap_clear(chunk->populated, page_start, nr);
    1475                 :          0 :         chunk->nr_populated -= nr;
    1476                 :          0 :         pcpu_nr_populated -= nr;
    1477                 :            : 
    1478         [ #  # ]:          0 :         pcpu_update_empty_pages(chunk, -nr);
    1479                 :          0 : }
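
The two bookkeeping helpers above both operate on a half-open page range [page_start, page_end). A minimal userspace sketch of that accounting follows. All names are hypothetical, a bool array stands in for the populated bitmap, toy_total_populated stands in for pcpu_nr_populated, and the empty-page accounting and locking are left out.

```c
#include <stdbool.h>

#define TOY_NR_PAGES 16	/* hypothetical number of pages per chunk */

struct toy_chunk_pages {
	bool populated[TOY_NR_PAGES];
	int nr_populated;
};

static int toy_total_populated;	/* stands in for pcpu_nr_populated */

/* Record that pages in [page_start, page_end) are now populated. */
static void toy_chunk_populated(struct toy_chunk_pages *c,
				int page_start, int page_end)
{
	int nr = page_end - page_start;

	for (int i = page_start; i < page_end; i++)
		c->populated[i] = true;
	c->nr_populated += nr;
	toy_total_populated += nr;
}

/* Record that pages in [page_start, page_end) are no longer populated. */
static void toy_chunk_depopulated(struct toy_chunk_pages *c,
				  int page_start, int page_end)
{
	int nr = page_end - page_start;

	for (int i = page_start; i < page_end; i++)
		c->populated[i] = false;
	c->nr_populated -= nr;
	toy_total_populated -= nr;
}
```

Because the range is half-open, populating [2, 6) touches pages 2 through 5 and leaves page 6 alone, and nr is simply page_end - page_start.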
    1480                 :            : 
    1481                 :            : /*
    1482                 :            :  * Chunk management implementation.
    1483                 :            :  *
    1484                 :            :  * To allow different implementations, chunk alloc/free and
    1485                 :            :  * [de]population are implemented in a separate file which is pulled
    1486                 :            :  * into this file and compiled together.  The following functions
    1487                 :            :  * should be implemented.
    1488                 :            :  *
    1489                 :            :  * pcpu_populate_chunk          - populate the specified range of a chunk
    1490                 :            :  * pcpu_depopulate_chunk        - depopulate the specified range of a chunk
    1491                 :            :  * pcpu_create_chunk            - create a new chunk
    1492                 :            :  * pcpu_destroy_chunk           - destroy a chunk, always preceded by full depop
    1493                 :            :  * pcpu_addr_to_page            - translate address to struct page
    1494                 :            :  * pcpu_verify_alloc_info       - check alloc_info is acceptable during init
    1495                 :            :  */
    1496                 :            : static int pcpu_populate_chunk(struct pcpu_chunk *chunk,
    1497                 :            :                                int page_start, int page_end, gfp_t gfp);
    1498                 :            : static void pcpu_depopulate_chunk(struct pcpu_chunk *chunk,
    1499                 :            :                                   int page_start, int page_end);
    1500                 :            : static struct pcpu_chunk *pcpu_create_chunk(gfp_t gfp);
    1501                 :            : static void pcpu_destroy_chunk(struct pcpu_chunk *chunk);
    1502                 :            : static struct page *pcpu_addr_to_page(void *addr);
    1503                 :            : static int __init pcpu_verify_alloc_info(const struct pcpu_alloc_info *ai);
    1504                 :            : 
    1505                 :            : #ifdef CONFIG_NEED_PER_CPU_KM
    1506                 :            : #include "percpu-km.c"
    1507                 :            : #else
    1508                 :            : #include "percpu-vm.c"
    1509                 :            : #endif
    1510                 :            : 
    1511                 :            : /**
    1512                 :            :  * pcpu_chunk_addr_search - determine chunk containing specified address
    1513                 :            :  * @addr: address for which the chunk needs to be determined.
    1514                 :            :  *
    1515                 :            :  * This is an internal function that handles all but static allocations.
    1516                 :            :  * Static percpu address values should never be passed into the allocator.
    1517                 :            :  *
    1518                 :            :  * RETURNS:
    1519                 :            :  * The address of the found chunk.
    1520                 :            :  */
    1521                 :       6129 : static struct pcpu_chunk *pcpu_chunk_addr_search(void *addr)
    1522                 :            : {
    1523                 :            :         /* is it in the dynamic region (first chunk)? */
    1524   [ +  -  +  + ]:       6129 :         if (pcpu_addr_in_chunk(pcpu_first_chunk, addr))
    1525                 :            :                 return pcpu_first_chunk;
    1526                 :            : 
    1527                 :            :         /* is it in the reserved region? */
    1528   [ +  -  +  - ]:       4725 :         if (pcpu_addr_in_chunk(pcpu_reserved_chunk, addr))
    1529                 :            :                 return pcpu_reserved_chunk;
    1530                 :            : 
    1531                 :            :         /*
    1532                 :            :          * The address is relative to unit0 which might be unused and
    1533                 :            :          * thus unmapped.  Offset the address to the unit space of the
    1534                 :            :          * current processor before looking it up in the vmalloc
    1535                 :            :          * space.  Note that any possible cpu id can be used here, so
    1536                 :            :          * there's no need to worry about preemption or cpu hotplug.
    1537                 :            :          */
    1538                 :       4725 :         addr += pcpu_unit_offsets[raw_smp_processor_id()];
    1539                 :       4725 :         return pcpu_get_page_chunk(pcpu_addr_to_page(addr));
    1540                 :            : }
    1541                 :            : 
    1542                 :            : /**
    1543                 :            :  * pcpu_alloc - the percpu allocator
    1544                 :            :  * @size: size of area to allocate in bytes
    1545                 :            :  * @align: alignment of area (max PAGE_SIZE)
    1546                 :            :  * @reserved: allocate from the reserved chunk if available
    1547                 :            :  * @gfp: allocation flags
    1548                 :            :  *
    1549                 :            :  * Allocate percpu area of @size bytes aligned at @align.  If @gfp doesn't
    1550                 :            :  * contain %GFP_KERNEL, the allocation is atomic. If @gfp has __GFP_NOWARN
    1551                 :            :  * then no warning will be triggered on invalid or failed allocation
    1552                 :            :  * requests.
    1553                 :            :  *
    1554                 :            :  * RETURNS:
    1555                 :            :  * Percpu pointer to the allocated area on success, NULL on failure.
    1556                 :            :  */
    1557                 :      78405 : static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved,
    1558                 :            :                                  gfp_t gfp)
    1559                 :            : {
    1560                 :            :         /* whitelisted flags that can be passed to the backing allocators */
    1561                 :      78405 :         gfp_t pcpu_gfp = gfp & (GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN);
    1562                 :      78405 :         bool is_atomic = (gfp & GFP_KERNEL) != GFP_KERNEL;
    1563                 :      78405 :         bool do_warn = !(gfp & __GFP_NOWARN);
    1564                 :      78405 :         static int warn_limit = 10;
    1565                 :      78405 :         struct pcpu_chunk *chunk, *next;
    1566                 :      78405 :         const char *err;
    1567                 :      78405 :         int slot, off, cpu, ret;
    1568                 :      78405 :         unsigned long flags;
    1569                 :      78405 :         void __percpu *ptr;
    1570                 :      78405 :         size_t bits, bit_align;
    1571                 :            : 
    1572                 :            :         /*
    1573                 :            :          * There is now a minimum allocation size of PCPU_MIN_ALLOC_SIZE,
    1574                 :            :          * therefore alignment must be a minimum of that many bytes.
    1575                 :            :          * An allocation may have internal fragmentation from rounding up
    1576                 :            :          * of up to PCPU_MIN_ALLOC_SIZE - 1 bytes.
    1577                 :            :          */
    1578         [ +  + ]:      78405 :         if (unlikely(align < PCPU_MIN_ALLOC_SIZE))
    1579                 :         78 :                 align = PCPU_MIN_ALLOC_SIZE;
    1580                 :            : 
    1581                 :      78405 :         size = ALIGN(size, PCPU_MIN_ALLOC_SIZE);
    1582                 :      78405 :         bits = size >> PCPU_MIN_ALLOC_SHIFT;
    1583                 :      78405 :         bit_align = align >> PCPU_MIN_ALLOC_SHIFT;
    1584                 :            : 
    1585   [ +  -  +  -  :     156810 :         if (unlikely(!size || size > PCPU_MIN_UNIT_SIZE || align > PAGE_SIZE ||
                   -  + ]
    1586                 :            :                      !is_power_of_2(align))) {
    1587         [ #  # ]:          0 :                 WARN(do_warn, "illegal size (%zu) or align (%zu) for percpu allocation\n",
    1588                 :            :                      size, align);
    1589                 :          0 :                 return NULL;
    1590                 :            :         }
    1591                 :            : 
    1592         [ +  + ]:      78405 :         if (!is_atomic) {
    1593                 :            :                 /*
    1594                 :            :                  * pcpu_balance_workfn() allocates memory under this mutex,
    1595                 :            :                  * and it may wait for memory reclaim.  Take the mutex killably
    1596                 :            :                  * so the current task can become an OOM victim under pressure.
    1597                 :            :                  */
    1598         [ -  + ]:      78197 :                 if (gfp & __GFP_NOFAIL)
    1599                 :          0 :                         mutex_lock(&pcpu_alloc_mutex);
    1600         [ +  - ]:      78197 :                 else if (mutex_lock_killable(&pcpu_alloc_mutex))
    1601                 :            :                         return NULL;
    1602                 :            :         }
    1603                 :            : 
    1604                 :      78405 :         spin_lock_irqsave(&pcpu_lock, flags);
    1605                 :            : 
    1606                 :            :         /* serve reserved allocations from the reserved chunk if available */
    1607   [ +  -  -  - ]:      78405 :         if (reserved && pcpu_reserved_chunk) {
    1608                 :          0 :                 chunk = pcpu_reserved_chunk;
    1609                 :            : 
    1610                 :          0 :                 off = pcpu_find_block_fit(chunk, bits, bit_align, is_atomic);
    1611         [ #  # ]:          0 :                 if (off < 0) {
    1612                 :          0 :                         err = "alloc from reserved chunk failed";
    1613                 :          0 :                         goto fail_unlock;
    1614                 :            :                 }
    1615                 :            : 
    1616                 :          0 :                 off = pcpu_alloc_area(chunk, bits, bit_align, off);
    1617         [ #  # ]:          0 :                 if (off >= 0)
    1618                 :          0 :                         goto area_found;
    1619                 :            : 
    1620                 :          0 :                 err = "alloc from reserved chunk failed";
    1621                 :          0 :                 goto fail_unlock;
    1622                 :            :         }
    1623                 :            : 
    1624                 :      78405 : restart:
    1625                 :            :         /* search through normal chunks */
    1626   [ -  +  +  - ]:    1087402 :         for (slot = pcpu_size_to_slot(size); slot < pcpu_nr_slots; slot++) {
    1627         [ +  + ]:    1008997 :                 list_for_each_entry_safe(chunk, next, &pcpu_slot[slot], list) {
    1628                 :      78405 :                         off = pcpu_find_block_fit(chunk, bits, bit_align,
    1629                 :            :                                                   is_atomic);
    1630         [ -  + ]:      78405 :                         if (off < 0) {
    1631         [ #  # ]:          0 :                                 if (slot < PCPU_SLOT_FAIL_THRESHOLD)
    1632                 :          0 :                                         pcpu_chunk_move(chunk, 0);
    1633                 :          0 :                                 continue;
    1634                 :            :                         }
    1635                 :            : 
    1636                 :      78405 :                         off = pcpu_alloc_area(chunk, bits, bit_align, off);
    1637         [ +  - ]:      78405 :                         if (off >= 0)
    1638                 :      78405 :                                 goto area_found;
    1639                 :            : 
    1640                 :            :                 }
    1641                 :            :         }
    1642                 :            : 
    1643                 :          0 :         spin_unlock_irqrestore(&pcpu_lock, flags);
    1644                 :            : 
    1645                 :            :         /*
    1646                 :            :          * No space left.  Create a new chunk.  We don't want multiple
    1647                 :            :          * tasks to create chunks simultaneously.  Serialize and create iff
    1648                 :            :          * there's still no empty chunk after grabbing the mutex.
    1649                 :            :          */
    1650         [ #  # ]:          0 :         if (is_atomic) {
    1651                 :          0 :                 err = "atomic alloc failed, no space left";
    1652                 :          0 :                 goto fail;
    1653                 :            :         }
    1654                 :            : 
    1655         [ #  # ]:          0 :         if (list_empty(&pcpu_slot[pcpu_nr_slots - 1])) {
    1656                 :          0 :                 chunk = pcpu_create_chunk(pcpu_gfp);
    1657         [ #  # ]:          0 :                 if (!chunk) {
    1658                 :          0 :                         err = "failed to allocate new chunk";
    1659                 :          0 :                         goto fail;
    1660                 :            :                 }
    1661                 :            : 
    1662                 :          0 :                 spin_lock_irqsave(&pcpu_lock, flags);
    1663                 :          0 :                 pcpu_chunk_relocate(chunk, -1);
    1664                 :            :         } else {
    1665                 :          0 :                 spin_lock_irqsave(&pcpu_lock, flags);
    1666                 :            :         }
    1667                 :            : 
    1668                 :          0 :         goto restart;
    1669                 :            : 
    1670                 :      78405 : area_found:
    1671                 :      78405 :         pcpu_stats_area_alloc(chunk, size);
    1672                 :      78405 :         spin_unlock_irqrestore(&pcpu_lock, flags);
    1673                 :            : 
    1674                 :            :         /* populate if not all pages are already there */
    1675         [ +  + ]:      78405 :         if (!is_atomic) {
    1676                 :      78197 :                 unsigned int page_start, page_end, rs, re;
    1677                 :            : 
    1678                 :      78197 :                 page_start = PFN_DOWN(off);
    1679                 :      78197 :                 page_end = PFN_UP(off + size);
    1680                 :            : 
    1681         [ -  + ]:      78197 :                 bitmap_for_each_clear_region(chunk->populated, rs, re,
    1682                 :            :                                              page_start, page_end) {
    1683         [ #  # ]:          0 :                         WARN_ON(chunk->immutable);
    1684                 :            : 
    1685                 :          0 :                         ret = pcpu_populate_chunk(chunk, rs, re, pcpu_gfp);
    1686                 :            : 
    1687                 :          0 :                         spin_lock_irqsave(&pcpu_lock, flags);
    1688         [ #  # ]:          0 :                         if (ret) {
    1689                 :          0 :                                 pcpu_free_area(chunk, off);
    1690                 :          0 :                                 err = "failed to populate";
    1691                 :          0 :                                 goto fail_unlock;
    1692                 :            :                         }
    1693                 :          0 :                         pcpu_chunk_populated(chunk, rs, re);
    1694                 :          0 :                         spin_unlock_irqrestore(&pcpu_lock, flags);
    1695                 :            :                 }
    1696                 :            : 
    1697                 :      78197 :                 mutex_unlock(&pcpu_alloc_mutex);
    1698                 :            :         }
    1699                 :            : 
    1700         [ +  + ]:      78405 :         if (pcpu_nr_empty_pop_pages < PCPU_EMPTY_POP_PAGES_LOW)
    1701         [ +  - ]:        156 :                 pcpu_schedule_balance_work();
    1702                 :            : 
    1703                 :            :         /* clear the areas and return address relative to base address */
    1704         [ +  + ]:     156810 :         for_each_possible_cpu(cpu)
    1705                 :      78405 :                 memset((void *)pcpu_chunk_addr(chunk, cpu, 0) + off, 0, size);
    1706                 :            : 
    1707                 :      78405 :         ptr = __addr_to_pcpu_ptr(chunk->base_addr + off);
    1708                 :      78405 :         kmemleak_alloc_percpu(ptr, size, gfp);
    1709                 :            : 
    1710                 :      78405 :         trace_percpu_alloc_percpu(reserved, is_atomic, size, align,
    1711                 :            :                         chunk->base_addr, off, ptr);
    1712                 :            : 
    1713                 :      78405 :         return ptr;
    1714                 :            : 
    1715                 :          0 : fail_unlock:
    1716                 :          0 :         spin_unlock_irqrestore(&pcpu_lock, flags);
    1717                 :          0 : fail:
    1718                 :          0 :         trace_percpu_alloc_percpu_fail(reserved, is_atomic, size, align);
    1719                 :            : 
    1720   [ #  #  #  # ]:          0 :         if (!is_atomic && do_warn && warn_limit) {
    1721                 :          0 :                 pr_warn("allocation failed, size=%zu align=%zu atomic=%d, %s\n",
    1722                 :            :                         size, align, is_atomic, err);
    1723                 :          0 :                 dump_stack();
    1724         [ #  # ]:          0 :                 if (!--warn_limit)
    1725                 :          0 :                         pr_info("limit reached, disable warning\n");
    1726                 :            :         }
    1727         [ #  # ]:          0 :         if (is_atomic) {
    1728                 :            :                 /* see the flag handling in pcpu_balance_workfn() */
    1729                 :          0 :                 pcpu_atomic_alloc_failed = true;
    1730         [ #  # ]:          0 :                 pcpu_schedule_balance_work();
    1731                 :            :         } else {
    1732                 :          0 :                 mutex_unlock(&pcpu_alloc_mutex);
    1733                 :            :         }
    1734                 :            :         return NULL;
    1735                 :            : }
    1736                 :            : 
    1737                 :            : /**
    1738                 :            :  * __alloc_percpu_gfp - allocate dynamic percpu area
    1739                 :            :  * @size: size of area to allocate in bytes
    1740                 :            :  * @align: alignment of area (max PAGE_SIZE)
    1741                 :            :  * @gfp: allocation flags
    1742                 :            :  *
    1743                 :            :  * Allocate zero-filled percpu area of @size bytes aligned at @align.  If
    1744                 :            :  * @gfp doesn't contain %GFP_KERNEL, the allocation doesn't block and can
    1745                 :            :  * be called from any context but is a lot more likely to fail. If @gfp
    1746                 :            :  * has __GFP_NOWARN then no warning will be triggered on invalid or failed
    1747                 :            :  * allocation requests.
    1748                 :            :  *
    1749                 :            :  * RETURNS:
    1750                 :            :  * Percpu pointer to the allocated area on success, NULL on failure.
    1751                 :            :  */
    1752                 :      24834 : void __percpu *__alloc_percpu_gfp(size_t size, size_t align, gfp_t gfp)
    1753                 :            : {
    1754                 :      24834 :         return pcpu_alloc(size, align, false, gfp);
    1755                 :            : }
    1756                 :            : EXPORT_SYMBOL_GPL(__alloc_percpu_gfp);
    1757                 :            : 
    1758                 :            : /**
    1759                 :            :  * __alloc_percpu - allocate dynamic percpu area
    1760                 :            :  * @size: size of area to allocate in bytes
    1761                 :            :  * @align: alignment of area (max PAGE_SIZE)
    1762                 :            :  *
    1763                 :            :  * Equivalent to __alloc_percpu_gfp(size, align, %GFP_KERNEL).
    1764                 :            :  */
    1765                 :      53571 : void __percpu *__alloc_percpu(size_t size, size_t align)
    1766                 :            : {
    1767                 :      53571 :         return pcpu_alloc(size, align, false, GFP_KERNEL);
    1768                 :            : }
    1769                 :            : EXPORT_SYMBOL_GPL(__alloc_percpu);
    1770                 :            : 
    1771                 :            : /**
    1772                 :            :  * __alloc_reserved_percpu - allocate reserved percpu area
    1773                 :            :  * @size: size of area to allocate in bytes
    1774                 :            :  * @align: alignment of area (max PAGE_SIZE)
    1775                 :            :  *
    1776                 :            :  * Allocate zero-filled percpu area of @size bytes aligned at @align
    1777                 :            :  * from reserved percpu area if arch has set it up; otherwise,
    1778                 :            :  * allocation is served from the same dynamic area.  Might sleep.
    1779                 :            :  * Might trigger writeouts.
    1780                 :            :  *
    1781                 :            :  * CONTEXT:
    1782                 :            :  * Does GFP_KERNEL allocation.
    1783                 :            :  *
    1784                 :            :  * RETURNS:
    1785                 :            :  * Percpu pointer to the allocated area on success, NULL on failure.
    1786                 :            :  */
    1787                 :          0 : void __percpu *__alloc_reserved_percpu(size_t size, size_t align)
    1788                 :            : {
    1789                 :          0 :         return pcpu_alloc(size, align, true, GFP_KERNEL);
    1790                 :            : }
    1791                 :            : 
    1792                 :            : /**
    1793                 :            :  * pcpu_balance_workfn - manage the amount of free chunks and populated pages
    1794                 :            :  * @work: unused
    1795                 :            :  *
    1796                 :            :  * Reclaim all fully free chunks except for the first one.  This is also
    1797                 :            :  * responsible for maintaining the pool of empty populated pages.  However,
    1798                 :            :  * it is possible that this is called when physical memory is scarce causing
    1799                 :            :  * OOM killer to be triggered.  We should avoid doing so until an actual
    1800                 :            :  * allocation causes the failure as it is possible that requests can be
    1801                 :            :  * serviced from already backed regions.
    1802                 :            :  */
    1803                 :        156 : static void pcpu_balance_workfn(struct work_struct *work)
    1804                 :            : {
    1805                 :            :         /* gfp flags passed to underlying allocators */
    1806                 :        156 :         const gfp_t gfp = GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN;
    1807                 :        156 :         LIST_HEAD(to_free);
    1808                 :        156 :         struct list_head *free_head = &pcpu_slot[pcpu_nr_slots - 1];
    1809                 :        156 :         struct pcpu_chunk *chunk, *next;
    1810                 :        156 :         int slot, nr_to_pop, ret;
    1811                 :            : 
    1812                 :            :         /*
    1813                 :            :          * There's no reason to keep around multiple unused chunks and VM
    1814                 :            :          * areas can be scarce.  Destroy all free chunks except for one.
    1815                 :            :          */
    1816                 :        156 :         mutex_lock(&pcpu_alloc_mutex);
    1817                 :        156 :         spin_lock_irq(&pcpu_lock);
    1818                 :            : 
    1819         [ -  + ]:        156 :         list_for_each_entry_safe(chunk, next, free_head, list) {
    1820         [ #  # ]:          0 :                 WARN_ON(chunk->immutable);
    1821                 :            : 
    1822                 :            :                 /* spare the first one */
    1823         [ #  # ]:          0 :                 if (chunk == list_first_entry(free_head, struct pcpu_chunk, list))
    1824                 :          0 :                         continue;
    1825                 :            : 
    1826                 :          0 :                 list_move(&chunk->list, &to_free);
    1827                 :            :         }
    1828                 :            : 
    1829                 :        156 :         spin_unlock_irq(&pcpu_lock);
    1830                 :            : 
    1831         [ -  + ]:        156 :         list_for_each_entry_safe(chunk, next, &to_free, list) {
    1832                 :          0 :                 unsigned int rs, re;
    1833                 :            : 
    1834         [ #  # ]:          0 :                 bitmap_for_each_set_region(chunk->populated, rs, re, 0,
    1835                 :            :                                            chunk->nr_pages) {
    1836                 :          0 :                         pcpu_depopulate_chunk(chunk, rs, re);
    1837                 :          0 :                         spin_lock_irq(&pcpu_lock);
    1838                 :          0 :                         pcpu_chunk_depopulated(chunk, rs, re);
    1839                 :          0 :                         spin_unlock_irq(&pcpu_lock);
    1840                 :            :                 }
    1841                 :          0 :                 pcpu_destroy_chunk(chunk);
    1842                 :          0 :                 cond_resched();
    1843                 :            :         }
    1844                 :            : 
    1845                 :            :         /*
    1846                 :            :          * Ensure there is a certain number of free populated pages for
    1847                 :            :          * atomic allocs.  Fill up from the most packed so that atomic
    1848                 :            :          * allocs don't increase fragmentation.  If atomic allocation
    1849                 :            :          * failed previously, always populate the maximum amount.  This
    1850                 :            :          * should keep atomic allocs larger than PAGE_SIZE from failing
    1851                 :            :          * indefinitely; however, large atomic allocs are not something we
    1852                 :            :          * support properly and they can be highly unreliable and
    1853                 :            :          * inefficient.
    1854                 :            :          */
    1855                 :        156 : retry_pop:
    1856         [ -  + ]:        234 :         if (pcpu_atomic_alloc_failed) {
    1857                 :          0 :                 nr_to_pop = PCPU_EMPTY_POP_PAGES_HIGH;
    1858                 :            :                 /* best effort anyway, don't worry about synchronization */
    1859                 :          0 :                 pcpu_atomic_alloc_failed = false;
    1860                 :            :         } else {
    1861                 :        234 :                 nr_to_pop = clamp(PCPU_EMPTY_POP_PAGES_HIGH -
    1862                 :            :                                   pcpu_nr_empty_pop_pages,
    1863                 :            :                                   0, PCPU_EMPTY_POP_PAGES_HIGH);
    1864                 :            :         }
    1865                 :            : 
    1866   [ -  +  +  + ]:       2886 :         for (slot = pcpu_size_to_slot(PAGE_SIZE); slot < pcpu_nr_slots; slot++) {
    1867                 :       2496 :                 unsigned int nr_unpop = 0, rs, re;
    1868                 :            : 
    1869         [ +  + ]:       2496 :                 if (!nr_to_pop)
    1870                 :            :                         break;
    1871                 :            : 
    1872                 :       2418 :                 spin_lock_irq(&pcpu_lock);
    1873         [ +  + ]:       2574 :                 list_for_each_entry(chunk, &pcpu_slot[slot], list) {
    1874                 :        312 :                         nr_unpop = chunk->nr_pages - chunk->nr_populated;
    1875         [ +  + ]:        312 :                         if (nr_unpop)
    1876                 :            :                                 break;
    1877                 :            :                 }
    1878                 :       2418 :                 spin_unlock_irq(&pcpu_lock);
    1879                 :            : 
    1880         [ +  + ]:       2418 :                 if (!nr_unpop)
    1881                 :       2262 :                         continue;
    1882                 :            : 
    1883                 :            :                 /* @chunk can't go away while pcpu_alloc_mutex is held */
    1884         [ +  - ]:        156 :                 bitmap_for_each_clear_region(chunk->populated, rs, re, 0,
    1885                 :            :                                              chunk->nr_pages) {
    1886                 :        156 :                         int nr = min_t(int, re - rs, nr_to_pop);
    1887                 :            : 
    1888                 :        156 :                         ret = pcpu_populate_chunk(chunk, rs, rs + nr, gfp);
    1889         [ +  - ]:        156 :                         if (!ret) {
    1890                 :        156 :                                 nr_to_pop -= nr;
    1891                 :        156 :                                 spin_lock_irq(&pcpu_lock);
    1892                 :        156 :                                 pcpu_chunk_populated(chunk, rs, rs + nr);
    1893                 :        156 :                                 spin_unlock_irq(&pcpu_lock);
    1894                 :            :                         } else {
    1895                 :            :                                 nr_to_pop = 0;
    1896                 :            :                         }
    1897                 :            : 
    1898         [ -  + ]:        156 :                         if (!nr_to_pop)
    1899                 :            :                                 break;
    1900                 :            :                 }
    1901                 :            :         }
    1902                 :            : 
    1903         [ +  + ]:        234 :         if (nr_to_pop) {
    1904                 :            :                 /* ran out of chunks to populate, create a new one and retry */
    1905                 :         78 :                 chunk = pcpu_create_chunk(gfp);
    1906         [ +  - ]:         78 :                 if (chunk) {
    1907                 :         78 :                         spin_lock_irq(&pcpu_lock);
    1908                 :         78 :                         pcpu_chunk_relocate(chunk, -1);
    1909                 :         78 :                         spin_unlock_irq(&pcpu_lock);
    1910                 :         78 :                         goto retry_pop;
    1911                 :            :                 }
    1912                 :            :         }
    1913                 :            : 
    1914                 :        156 :         mutex_unlock(&pcpu_alloc_mutex);
    1915                 :        156 : }
    1916                 :            : 
    1917                 :            : /**
    1918                 :            :  * free_percpu - free percpu area
    1919                 :            :  * @ptr: pointer to area to free
    1920                 :            :  *
    1921                 :            :  * Free percpu area @ptr.
    1922                 :            :  *
    1923                 :            :  * CONTEXT:
    1924                 :            :  * Can be called from atomic context.
    1925                 :            :  */
    1926                 :       6174 : void free_percpu(void __percpu *ptr)
    1927                 :            : {
    1928                 :       6174 :         void *addr;
    1929                 :       6174 :         struct pcpu_chunk *chunk;
    1930                 :       6174 :         unsigned long flags;
    1931                 :       6174 :         int off;
    1932                 :       6174 :         bool need_balance = false;
    1933                 :            : 
    1934         [ +  + ]:       6174 :         if (!ptr)
    1935                 :            :                 return;
    1936                 :            : 
    1937                 :       6129 :         kmemleak_free_percpu(ptr);
    1938                 :            : 
    1939                 :       6129 :         addr = __pcpu_ptr_to_addr(ptr);
    1940                 :            : 
    1941                 :       6129 :         spin_lock_irqsave(&pcpu_lock, flags);
    1942                 :            : 
    1943                 :       6129 :         chunk = pcpu_chunk_addr_search(addr);
    1944                 :       6129 :         off = addr - chunk->base_addr;
    1945                 :            : 
    1946                 :       6129 :         pcpu_free_area(chunk, off);
    1947                 :            : 
    1948                 :            :         /* if there is more than one fully free chunk, wake up grim reaper */
    1949         [ -  + ]:       6129 :         if (chunk->free_bytes == pcpu_unit_size) {
    1950                 :          0 :                 struct pcpu_chunk *pos;
    1951                 :            : 
    1952         [ #  # ]:          0 :                 list_for_each_entry(pos, &pcpu_slot[pcpu_nr_slots - 1], list)
    1953         [ #  # ]:          0 :                         if (pos != chunk) {
    1954                 :            :                                 need_balance = true;
    1955                 :            :                                 break;
    1956                 :            :                         }
    1957                 :            :         }
    1958                 :            : 
    1959                 :       6129 :         trace_percpu_free_percpu(chunk->base_addr, off, ptr);
    1960                 :            : 
    1961                 :       6129 :         spin_unlock_irqrestore(&pcpu_lock, flags);
    1962                 :            : 
    1963         [ -  + ]:       6129 :         if (need_balance)
    1964         [ #  # ]:          0 :                 pcpu_schedule_balance_work();
    1965                 :            : }
    1966                 :            : EXPORT_SYMBOL_GPL(free_percpu);
    1967                 :            : 
    1968                 :          0 : bool __is_kernel_percpu_address(unsigned long addr, unsigned long *can_addr)
    1969                 :            : {
    1970                 :            : #ifdef CONFIG_SMP
    1971                 :          0 :         const size_t static_size = __per_cpu_end - __per_cpu_start;
    1972                 :          0 :         void __percpu *base = __addr_to_pcpu_ptr(pcpu_base_addr);
    1973                 :          0 :         unsigned int cpu;
    1974                 :            : 
    1975         [ #  # ]:          0 :         for_each_possible_cpu(cpu) {
    1976                 :          0 :                 void *start = per_cpu_ptr(base, cpu);
    1977                 :          0 :                 void *va = (void *)addr;
    1978                 :            : 
    1979   [ #  #  #  # ]:          0 :                 if (va >= start && va < start + static_size) {
    1980         [ #  # ]:          0 :                         if (can_addr) {
    1981                 :          0 :                                 *can_addr = (unsigned long) (va - start);
    1982                 :          0 :                                 *can_addr += (unsigned long)
    1983                 :          0 :                                         per_cpu_ptr(base, get_boot_cpu_id());
    1984                 :            :                         }
    1985                 :          0 :                         return true;
    1986                 :            :                 }
    1987                 :            :         }
    1988                 :            : #endif
    1989                 :            :         /* on UP, can't distinguish from other static vars, always false */
    1990                 :            :         return false;
    1991                 :            : }
    1992                 :            : 
    1993                 :            : /**
    1994                 :            :  * is_kernel_percpu_address - test whether address is from static percpu area
    1995                 :            :  * @addr: address to test
    1996                 :            :  *
    1997                 :            :  * Test whether @addr belongs to in-kernel static percpu area.  Module
    1998                 :            :  * static percpu areas are not considered.  For those, use
    1999                 :            :  * is_module_percpu_address().
    2000                 :            :  *
    2001                 :            :  * RETURNS:
    2002                 :            :  * %true if @addr is from in-kernel static percpu area, %false otherwise.
    2003                 :            :  */
    2004                 :          0 : bool is_kernel_percpu_address(unsigned long addr)
    2005                 :            : {
    2006                 :          0 :         return __is_kernel_percpu_address(addr, NULL);
    2007                 :            : }
    2008                 :            : 
    2009                 :            : /**
    2010                 :            :  * per_cpu_ptr_to_phys - convert translated percpu address to physical address
    2011                 :            :  * @addr: the address to be converted to physical address
    2012                 :            :  *
    2013                 :            :  * Given @addr, which is a dereferenceable address obtained via one of
    2014                 :            :  * the percpu access macros, this function translates it into its
    2015                 :            :  * physical address.  The caller is responsible for ensuring @addr stays
    2016                 :            :  * valid until this function finishes.
    2017                 :            :  *
    2018                 :            :  * percpu allocator has special setup for the first chunk, which currently
    2019                 :            :  * supports either embedding in linear address space or vmalloc mapping,
    2020                 :            :  * and, from the second one, the backing allocator (currently either vm or
    2021                 :            :  * km) provides translation.
    2022                 :            :  *
    2023                 :            :  * The addr can be translated simply without checking if it falls into the
    2024                 :            :  * first chunk. But the current code reflects better how percpu allocator
    2025                 :            :  * actually works, and the verification can discover both bugs in percpu
    2026                 :            :  * allocator itself and per_cpu_ptr_to_phys() callers. So we keep current
    2027                 :            :  * code.
    2028                 :            :  *
    2029                 :            :  * RETURNS:
    2030                 :            :  * The physical address for @addr.
    2031                 :            :  */
    2032                 :       1326 : phys_addr_t per_cpu_ptr_to_phys(void *addr)
    2033                 :            : {
    2034                 :       1326 :         void __percpu *base = __addr_to_pcpu_ptr(pcpu_base_addr);
    2035                 :       1326 :         bool in_first_chunk = false;
    2036                 :       1326 :         unsigned long first_low, first_high;
    2037                 :       1326 :         unsigned int cpu;
    2038                 :            : 
    2039                 :            :         /*
    2040                 :            :          * The following test on first_low/high isn't strictly
    2041                 :            :          * necessary but will speed up lookups of addresses which
    2042                 :            :          * aren't in the first chunk.
    2043                 :            :          *
    2044                 :            :          * The address check is against full chunk sizes.  pcpu_base_addr
    2045                 :            :          * points to the beginning of the first chunk including the
    2046                 :            :          * static region.  Assumes good intent as the first chunk may
    2047                 :            :          * not be full (i.e. < pcpu_unit_pages in size).
    2048                 :            :          */
    2049                 :       1326 :         first_low = (unsigned long)pcpu_base_addr +
    2050                 :       1326 :                     pcpu_unit_page_offset(pcpu_low_unit_cpu, 0);
    2051                 :       1326 :         first_high = (unsigned long)pcpu_base_addr +
    2052                 :       1326 :                      pcpu_unit_page_offset(pcpu_high_unit_cpu, pcpu_unit_pages);
    2053                 :       1326 :         if ((unsigned long)addr >= first_low &&
    2054         [ +  - ]:       1326 :             (unsigned long)addr < first_high) {
    2055         [ +  - ]:       1326 :                 for_each_possible_cpu(cpu) {
    2056                 :       1326 :                         void *start = per_cpu_ptr(base, cpu);
    2057                 :            : 
    2058   [ +  -  -  + ]:       1326 :                         if (addr >= start && addr < start + pcpu_unit_size) {
    2059                 :            :                                 in_first_chunk = true;
    2060                 :            :                                 break;
    2061                 :            :                         }
    2062                 :            :                 }
    2063                 :            :         }
    2064                 :            : 
    2065         [ +  - ]:       1326 :         if (in_first_chunk) {
    2066         [ +  - ]:       1326 :                 if (!is_vmalloc_addr(addr))
    2067         [ +  - ]:       2652 :                         return __pa(addr);
    2068                 :            :                 else
    2069                 :          0 :                         return page_to_phys(vmalloc_to_page(addr)) +
    2070                 :          0 :                                offset_in_page(addr);
    2071                 :            :         } else
    2072                 :          0 :                 return page_to_phys(pcpu_addr_to_page(addr)) +
    2073                 :          0 :                        offset_in_page(addr);
    2074                 :            : }
    2075                 :            : 
    2076                 :            : /**
    2077                 :            :  * pcpu_alloc_alloc_info - allocate percpu allocation info
    2078                 :            :  * @nr_groups: the number of groups
    2079                 :            :  * @nr_units: the number of units
    2080                 :            :  *
    2081                 :            :  * Allocate ai which is large enough for @nr_groups groups containing
    2082                 :            :  * @nr_units units.  The returned ai's groups[0].cpu_map points to the
    2083                 :            :  * cpu_map array which is long enough for @nr_units and filled with
    2084                 :            :  * NR_CPUS.  It's the caller's responsibility to initialize cpu_map
    2085                 :            :  * pointer of other groups.
    2086                 :            :  *
    2087                 :            :  * RETURNS:
    2088                 :            :  * Pointer to the allocated pcpu_alloc_info on success, NULL on
    2089                 :            :  * failure.
    2090                 :            :  */
    2091                 :         78 : struct pcpu_alloc_info * __init pcpu_alloc_alloc_info(int nr_groups,
    2092                 :            :                                                       int nr_units)
    2093                 :            : {
    2094                 :         78 :         struct pcpu_alloc_info *ai;
    2095                 :         78 :         size_t base_size, ai_size;
    2096                 :         78 :         void *ptr;
    2097                 :         78 :         int unit;
    2098                 :            : 
    2099         [ +  - ]:         78 :         base_size = ALIGN(struct_size(ai, groups, nr_groups),
    2100                 :            :                           __alignof__(ai->groups[0].cpu_map[0]));
    2101                 :         78 :         ai_size = base_size + nr_units * sizeof(ai->groups[0].cpu_map[0]);
    2102                 :            : 
    2103                 :         78 :         ptr = memblock_alloc(PFN_ALIGN(ai_size), PAGE_SIZE);
    2104         [ +  - ]:         78 :         if (!ptr)
    2105                 :            :                 return NULL;
    2106                 :         78 :         ai = ptr;
    2107                 :         78 :         ptr += base_size;
    2108                 :            : 
    2109                 :         78 :         ai->groups[0].cpu_map = ptr;
    2110                 :            : 
    2111         [ +  + ]:        156 :         for (unit = 0; unit < nr_units; unit++)
    2112                 :         78 :                 ai->groups[0].cpu_map[unit] = NR_CPUS;
    2113                 :            : 
    2114                 :         78 :         ai->nr_groups = nr_groups;
    2115                 :         78 :         ai->__ai_size = PFN_ALIGN(ai_size);
    2116                 :            : 
    2117                 :         78 :         return ai;
    2118                 :            : }
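
The sizing in pcpu_alloc_alloc_info() can be illustrated outside the kernel. The following is a minimal userspace sketch, not the kernel implementation: the header plus flexible groups array is rounded up to the alignment of a cpu_map element, and the cpu_map storage for all units is appended in the same allocation. The struct layouts, NO_CPU, align_up() and alloc_info_create() are hypothetical stand-ins, and calloc() replaces memblock_alloc().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, trimmed-down stand-ins for the kernel structures. */
struct group_info {
	int nr_units;
	unsigned int *cpu_map;
};

struct alloc_info {
	int nr_groups;
	size_t total_size;
	struct group_info groups[];	/* flexible array, like ai->groups */
};

#define NO_CPU 0xffffffffu	/* placeholder for the NR_CPUS sentinel */

/* Round x up to a multiple of a; a must be a power of two. */
static size_t align_up(size_t x, size_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/* One allocation holds the header, the groups array and the cpu_map array. */
static struct alloc_info *alloc_info_create(int nr_groups, int nr_units)
{
	size_t base_size, total;
	struct alloc_info *ai;
	int unit;

	assert(nr_groups > 0 && nr_units >= 0);
	base_size = align_up(offsetof(struct alloc_info, groups) +
			     (size_t)nr_groups * sizeof(struct group_info),
			     _Alignof(unsigned int));
	total = base_size + (size_t)nr_units * sizeof(unsigned int);

	ai = calloc(1, total);
	if (!ai)
		return NULL;

	/* Only group 0 gets a cpu_map; other groups are left to the caller. */
	ai->groups[0].cpu_map = (unsigned int *)((char *)ai + base_size);
	for (unit = 0; unit < nr_units; unit++)
		ai->groups[0].cpu_map[unit] = NO_CPU;

	ai->nr_groups = nr_groups;
	ai->total_size = total;
	return ai;
}
```

As in the listing, the caller is expected to point the cpu_map of the remaining groups into the tail of the same array; the sketch leaves them NULL.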
    2119                 :            : 
    2120                 :            : /**
    2121                 :            :  * pcpu_free_alloc_info - free percpu allocation info
    2122                 :            :  * @ai: pcpu_alloc_info to free
    2123                 :            :  *
    2124                 :            :  * Free @ai which was allocated by pcpu_alloc_alloc_info().
    2125                 :            :  */
    2126                 :         78 : void __init pcpu_free_alloc_info(struct pcpu_alloc_info *ai)
    2127                 :            : {
    2128         [ +  - ]:         78 :         memblock_free_early(__pa(ai), ai->__ai_size);
    2129                 :         78 : }
    2130                 :            : 
    2131                 :            : /**
    2132                 :            :  * pcpu_dump_alloc_info - print out information about pcpu_alloc_info
    2133                 :            :  * @lvl: loglevel
    2134                 :            :  * @ai: allocation info to dump
    2135                 :            :  *
    2136                 :            :  * Print out information about @ai using loglevel @lvl.
    2137                 :            :  */
    2138                 :         78 : static void pcpu_dump_alloc_info(const char *lvl,
    2139                 :            :                                  const struct pcpu_alloc_info *ai)
    2140                 :            : {
    2141                 :         78 :         int group_width = 1, cpu_width = 1, width;
    2142                 :         78 :         char empty_str[] = "--------";
    2143                 :         78 :         int alloc = 0, alloc_end = 0;
    2144                 :         78 :         int group, v;
    2145                 :         78 :         int upa, apl;   /* units per alloc, allocs per line */
    2146                 :            : 
    2147                 :         78 :         v = ai->nr_groups;
    2148         [ -  + ]:         78 :         while (v /= 10)
    2149                 :          0 :                 group_width++;
    2150                 :            : 
    2151                 :         78 :         v = num_possible_cpus();
    2152         [ -  + ]:         78 :         while (v /= 10)
    2153                 :          0 :                 cpu_width++;
    2154                 :         78 :         empty_str[min_t(int, cpu_width, sizeof(empty_str) - 1)] = '\0';
    2155                 :            : 
    2156                 :         78 :         upa = ai->alloc_size / ai->unit_size;
    2157                 :         78 :         width = upa * (cpu_width + 1) + group_width + 3;
    2158                 :         78 :         apl = rounddown_pow_of_two(max(60 / width, 1));
    2159                 :            : 
    2160                 :         78 :         printk("%spcpu-alloc: s%zu r%zu d%zu u%zu alloc=%zu*%zu",
    2161                 :            :                lvl, ai->static_size, ai->reserved_size, ai->dyn_size,
    2162                 :            :                ai->unit_size, ai->alloc_size / ai->atom_size, ai->atom_size);
    2163                 :            : 
    2164         [ +  + ]:        234 :         for (group = 0; group < ai->nr_groups; group++) {
    2165                 :         78 :                 const struct pcpu_group_info *gi = &ai->groups[group];
    2166                 :         78 :                 int unit = 0, unit_end = 0;
    2167                 :            : 
    2168         [ -  + ]:         78 :                 BUG_ON(gi->nr_units % upa);
    2169                 :         78 :                 for (alloc_end += gi->nr_units / upa;
    2170         [ +  + ]:        156 :                      alloc < alloc_end; alloc++) {
    2171         [ +  - ]:         78 :                         if (!(alloc % apl)) {
    2172                 :         78 :                                 pr_cont("\n");
    2173                 :         78 :                                 printk("%spcpu-alloc: ", lvl);
    2174                 :            :                         }
    2175                 :         78 :                         pr_cont("[%0*d] ", group_width, group);
    2176                 :            : 
    2177         [ +  + ]:        156 :                         for (unit_end += upa; unit < unit_end; unit++)
    2178         [ +  - ]:         78 :                                 if (gi->cpu_map[unit] != NR_CPUS)
    2179                 :         78 :                                         pr_cont("%0*d ",
    2180                 :            :                                                 cpu_width, gi->cpu_map[unit]);
    2181                 :            :                                 else
    2182                 :          0 :                                         pr_cont("%s ", empty_str);
    2183                 :            :                 }
    2184                 :            :         }
    2185                 :         78 :         pr_cont("\n");
    2186                 :         78 : }
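
The column arithmetic in pcpu_dump_alloc_info() is easy to check in isolation. The sketch below is a userspace approximation under stated assumptions: decimal_width(), rounddown_pow2() and allocs_per_line() are hypothetical helper names, and rounddown_pow2() is a simple stand-in for the kernel's rounddown_pow_of_two().

```c
#include <assert.h>

/* Count decimal digits the way the `while (v /= 10)` loops do. */
static int decimal_width(int v)
{
	int width = 1;

	assert(v >= 0);
	while (v /= 10)
		width++;
	return width;
}

/* Largest power of two that is <= n, for n >= 1. */
static int rounddown_pow2(int n)
{
	int p = 1;

	assert(n >= 1);
	while (p <= n / 2)
		p *= 2;
	return p;
}

/*
 * Allocs per line for a 60-column budget: each alloc prints upa cpu
 * fields of cpu_width digits plus a space, and a "[group] " prefix.
 */
static int allocs_per_line(int upa, int cpu_width, int group_width)
{
	int width = upa * (cpu_width + 1) + group_width + 3;
	int per_line = 60 / width;

	if (per_line < 1)
		per_line = 1;
	return rounddown_pow2(per_line);
}
```

With one unit per alloc and single-digit widths, each alloc needs 6 columns, so 10 would fit and the result is rounded down to 8 per line.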
    2187                 :            : 
    2188                 :            : /**
    2189                 :            :  * pcpu_setup_first_chunk - initialize the first percpu chunk
    2190                 :            :  * @ai: pcpu_alloc_info describing how the percpu area is shaped
    2191                 :            :  * @base_addr: mapped address
    2192                 :            :  *
    2193                 :            :  * Initialize the first percpu chunk which contains the kernel static
    2194                 :            :  * percpu area.  This function is to be called from arch percpu area
    2195                 :            :  * setup path.
    2196                 :            :  *
    2197                 :            :  * @ai contains all information necessary to initialize the first
    2198                 :            :  * chunk and prime the dynamic percpu allocator.
    2199                 :            :  *
    2200                 :            :  * @ai->static_size is the size of static percpu area.
    2201                 :            :  *
    2202                 :            :  * @ai->reserved_size, if non-zero, specifies the number of bytes to
    2203                 :            :  * reserve after the static area in the first chunk.  This reserves
    2204                 :            :  * the first chunk such that it's available only through reserved
    2205                 :            :  * percpu allocation.  This is primarily used to serve module percpu
    2206                 :            :  * static areas on architectures where the addressing model has
    2207                 :            :  * limited offset range for symbol relocations to guarantee module
    2208                 :            :  * percpu symbols fall inside the relocatable range.
    2209                 :            :  *
    2210                 :            :  * @ai->dyn_size determines the number of bytes available for dynamic
    2211                 :            :  * allocation in the first chunk.  The area between @ai->static_size +
    2212                 :            :  * @ai->reserved_size + @ai->dyn_size and @ai->unit_size is unused.
    2213                 :            :  *
    2214                 :            :  * @ai->unit_size specifies unit size and must be aligned to PAGE_SIZE
    2215                 :            :  * and equal to or larger than @ai->static_size + @ai->reserved_size +
    2216                 :            :  * @ai->dyn_size.
    2217                 :            :  *
    2218                 :            :  * @ai->atom_size is the allocation atom size and used as alignment
    2219                 :            :  * for vm areas.
    2220                 :            :  *
    2221                 :            :  * @ai->alloc_size is the allocation size and always a multiple of
    2222                 :            :  * @ai->atom_size.  This is larger than @ai->atom_size if
    2223                 :            :  * @ai->unit_size is larger than @ai->atom_size.
    2224                 :            :  *
    2225                 :            :  * @ai->nr_groups and @ai->groups describe virtual memory layout of
    2226                 :            :  * percpu areas.  Units which should be colocated are put into the
    2227                 :            :  * same group.  Dynamic VM areas will be allocated according to these
    2228                 :            :  * groupings.  If @ai->nr_groups is zero, a single group containing
    2229                 :            :  * all units is assumed.
    2230                 :            :  *
    2231                 :            :  * The caller should have mapped the first chunk at @base_addr and
    2232                 :            :  * copied static data to each unit.
    2233                 :            :  *
    2234                 :            :  * The first chunk will always contain a static and a dynamic region.
    2235                 :            :  * However, the static region is not managed by any chunk.  If the first
    2236                 :            :  * chunk also contains a reserved region, it is served by two chunks -
    2237                 :            :  * one for the reserved region and one for the dynamic region.  They
    2238                 :            :  * share the same vm, but use offset regions in the area allocation map.
    2239                 :            :  * The chunk serving the dynamic region is circulated in the chunk slots
    2240                 :            :  * and available for dynamic allocation like any other chunk.
    2241                 :            :  */
    2242                 :         78 : void __init pcpu_setup_first_chunk(const struct pcpu_alloc_info *ai,
    2243                 :            :                                    void *base_addr)
    2244                 :            : {
    2245                 :         78 :         size_t size_sum = ai->static_size + ai->reserved_size + ai->dyn_size;
    2246                 :         78 :         size_t static_size, dyn_size;
    2247                 :         78 :         struct pcpu_chunk *chunk;
    2248                 :         78 :         unsigned long *group_offsets;
    2249                 :         78 :         size_t *group_sizes;
    2250                 :         78 :         unsigned long *unit_off;
    2251                 :         78 :         unsigned int cpu;
    2252                 :         78 :         int *unit_map;
    2253                 :         78 :         int group, unit, i;
    2254                 :         78 :         int map_size;
    2255                 :         78 :         unsigned long tmp_addr;
    2256                 :         78 :         size_t alloc_size;
    2257                 :            : 
    2258                 :            : #define PCPU_SETUP_BUG_ON(cond) do {                                    \
    2259                 :            :         if (unlikely(cond)) {                                           \
    2260                 :            :                 pr_emerg("failed to initialize, %s\n", #cond);                \
    2261                 :            :                 pr_emerg("cpu_possible_mask=%*pb\n",                  \
    2262                 :            :                          cpumask_pr_args(cpu_possible_mask));           \
    2263                 :            :                 pcpu_dump_alloc_info(KERN_EMERG, ai);                   \
    2264                 :            :                 BUG();                                                  \
    2265                 :            :         }                                                               \
    2266                 :            : } while (0)
    2267                 :            : 
    2268                 :            :         /* sanity checks */
    2269         [ -  + ]:         78 :         PCPU_SETUP_BUG_ON(ai->nr_groups <= 0);
    2270                 :            : #ifdef CONFIG_SMP
    2271         [ -  + ]:         78 :         PCPU_SETUP_BUG_ON(!ai->static_size);
    2272         [ -  + ]:         78 :         PCPU_SETUP_BUG_ON(offset_in_page(__per_cpu_start));
    2273                 :            : #endif
    2274         [ -  + ]:         78 :         PCPU_SETUP_BUG_ON(!base_addr);
    2275         [ -  + ]:         78 :         PCPU_SETUP_BUG_ON(offset_in_page(base_addr));
    2276         [ -  + ]:         78 :         PCPU_SETUP_BUG_ON(ai->unit_size < size_sum);
    2277         [ -  + ]:         78 :         PCPU_SETUP_BUG_ON(offset_in_page(ai->unit_size));
    2278         [ -  + ]:         78 :         PCPU_SETUP_BUG_ON(ai->unit_size < PCPU_MIN_UNIT_SIZE);
    2279                 :         78 :         PCPU_SETUP_BUG_ON(!IS_ALIGNED(ai->unit_size, PCPU_BITMAP_BLOCK_SIZE));
    2280         [ -  + ]:         78 :         PCPU_SETUP_BUG_ON(ai->dyn_size < PERCPU_DYNAMIC_EARLY_SIZE);
    2281         [ -  + ]:         78 :         PCPU_SETUP_BUG_ON(!ai->dyn_size);
    2282         [ -  + ]:         78 :         PCPU_SETUP_BUG_ON(!IS_ALIGNED(ai->reserved_size, PCPU_MIN_ALLOC_SIZE));
    2283                 :         78 :         PCPU_SETUP_BUG_ON(!(IS_ALIGNED(PCPU_BITMAP_BLOCK_SIZE, PAGE_SIZE) ||
    2284                 :            :                             IS_ALIGNED(PAGE_SIZE, PCPU_BITMAP_BLOCK_SIZE)));
    2285                 :         78 :         PCPU_SETUP_BUG_ON(pcpu_verify_alloc_info(ai) < 0);
    2286                 :            : 
    2287                 :            :         /* process group information and build config tables accordingly */
    2288                 :         78 :         alloc_size = ai->nr_groups * sizeof(group_offsets[0]);
    2289                 :         78 :         group_offsets = memblock_alloc(alloc_size, SMP_CACHE_BYTES);
    2290         [ -  + ]:         78 :         if (!group_offsets)
    2291                 :          0 :                 panic("%s: Failed to allocate %zu bytes\n", __func__,
    2292                 :            :                       alloc_size);
    2293                 :            : 
    2294                 :         78 :         alloc_size = ai->nr_groups * sizeof(group_sizes[0]);
    2295                 :         78 :         group_sizes = memblock_alloc(alloc_size, SMP_CACHE_BYTES);
    2296         [ -  + ]:         78 :         if (!group_sizes)
    2297                 :          0 :                 panic("%s: Failed to allocate %zu bytes\n", __func__,
    2298                 :            :                       alloc_size);
    2299                 :            : 
    2300                 :         78 :         alloc_size = nr_cpu_ids * sizeof(unit_map[0]);
    2301                 :         78 :         unit_map = memblock_alloc(alloc_size, SMP_CACHE_BYTES);
    2302         [ -  + ]:         78 :         if (!unit_map)
    2303                 :          0 :                 panic("%s: Failed to allocate %zu bytes\n", __func__,
    2304                 :            :                       alloc_size);
    2305                 :            : 
    2306                 :         78 :         alloc_size = nr_cpu_ids * sizeof(unit_off[0]);
    2307                 :         78 :         unit_off = memblock_alloc(alloc_size, SMP_CACHE_BYTES);
    2308         [ -  + ]:         78 :         if (!unit_off)
    2309                 :          0 :                 panic("%s: Failed to allocate %zu bytes\n", __func__,
    2310                 :            :                       alloc_size);
    2311                 :            : 
    2312         [ +  + ]:        156 :         for (cpu = 0; cpu < nr_cpu_ids; cpu++)
    2313                 :         78 :                 unit_map[cpu] = UINT_MAX;
    2314                 :            : 
    2315                 :         78 :         pcpu_low_unit_cpu = NR_CPUS;
    2316                 :         78 :         pcpu_high_unit_cpu = NR_CPUS;
    2317                 :            : 
    2318         [ +  + ]:        156 :         for (group = 0, unit = 0; group < ai->nr_groups; group++, unit += i) {
    2319                 :         78 :                 const struct pcpu_group_info *gi = &ai->groups[group];
    2320                 :            : 
    2321                 :         78 :                 group_offsets[group] = gi->base_offset;
    2322                 :         78 :                 group_sizes[group] = gi->nr_units * ai->unit_size;
    2323                 :            : 
    2324         [ +  + ]:        156 :                 for (i = 0; i < gi->nr_units; i++) {
    2325                 :         78 :                         cpu = gi->cpu_map[i];
    2326         [ -  + ]:         78 :                         if (cpu == NR_CPUS)
    2327                 :          0 :                                 continue;
    2328                 :            : 
    2329         [ -  + ]:         78 :                         PCPU_SETUP_BUG_ON(cpu >= nr_cpu_ids);
    2330         [ -  + ]:         78 :                         PCPU_SETUP_BUG_ON(!cpu_possible(cpu));
    2331         [ -  + ]:         78 :                         PCPU_SETUP_BUG_ON(unit_map[cpu] != UINT_MAX);
    2332                 :            : 
    2333                 :         78 :                         unit_map[cpu] = unit + i;
    2334                 :         78 :                         unit_off[cpu] = gi->base_offset + i * ai->unit_size;
    2335                 :            : 
    2336                 :            :                         /* determine low/high unit_cpu */
    2337         [ -  + ]:         78 :                         if (pcpu_low_unit_cpu == NR_CPUS ||
    2338         [ #  # ]:          0 :                             unit_off[cpu] < unit_off[pcpu_low_unit_cpu])
    2339                 :         78 :                                 pcpu_low_unit_cpu = cpu;
    2340         [ -  + ]:         78 :                         if (pcpu_high_unit_cpu == NR_CPUS ||
    2341         [ #  # ]:          0 :                             unit_off[cpu] > unit_off[pcpu_high_unit_cpu])
    2342                 :         78 :                                 pcpu_high_unit_cpu = cpu;
    2343                 :            :                 }
    2344                 :            :         }
    2345                 :         78 :         pcpu_nr_units = unit;
    2346                 :            : 
    2347         [ +  + ]:        234 :         for_each_possible_cpu(cpu)
    2348         [ -  + ]:        156 :                 PCPU_SETUP_BUG_ON(unit_map[cpu] == UINT_MAX);
    2349                 :            : 
    2350                 :            :         /* we're done parsing the input, undefine BUG macro and dump config */
    2351                 :            : #undef PCPU_SETUP_BUG_ON
    2352                 :         78 :         pcpu_dump_alloc_info(KERN_DEBUG, ai);
    2353                 :            : 
    2354                 :         78 :         pcpu_nr_groups = ai->nr_groups;
    2355                 :         78 :         pcpu_group_offsets = group_offsets;
    2356                 :         78 :         pcpu_group_sizes = group_sizes;
    2357                 :         78 :         pcpu_unit_map = unit_map;
    2358                 :         78 :         pcpu_unit_offsets = unit_off;
    2359                 :            : 
    2360                 :            :         /* determine basic parameters */
    2361                 :         78 :         pcpu_unit_pages = ai->unit_size >> PAGE_SHIFT;
    2362                 :         78 :         pcpu_unit_size = pcpu_unit_pages << PAGE_SHIFT;
    2363                 :         78 :         pcpu_atom_size = ai->atom_size;
    2364                 :         78 :         pcpu_chunk_struct_size = sizeof(struct pcpu_chunk) +
    2365                 :         78 :                 BITS_TO_LONGS(pcpu_unit_pages) * sizeof(unsigned long);
    2366                 :            : 
    2367                 :         78 :         pcpu_stats_save_ai(ai);
    2368                 :            : 
    2369                 :            :         /*
    2370                 :            :          * Allocate chunk slots.  The additional last slot is for
    2371                 :            :          * empty chunks.
    2372                 :            :          */
    2373                 :         78 :         pcpu_nr_slots = __pcpu_size_to_slot(pcpu_unit_size) + 2;
    2374                 :         78 :         pcpu_slot = memblock_alloc(pcpu_nr_slots * sizeof(pcpu_slot[0]),
    2375                 :            :                                    SMP_CACHE_BYTES);
    2376         [ -  + ]:         78 :         if (!pcpu_slot)
    2377                 :          0 :                 panic("%s: Failed to allocate %zu bytes\n", __func__,
    2378                 :            :                       pcpu_nr_slots * sizeof(pcpu_slot[0]));
    2379         [ +  + ]:       1716 :         for (i = 0; i < pcpu_nr_slots; i++)
    2380                 :       1638 :                 INIT_LIST_HEAD(&pcpu_slot[i]);
    2381                 :            : 
    2382                 :            :         /*
    2383                 :            :          * The end of the static region needs to be aligned with the
    2384                 :            :          * minimum allocation size as this offsets the reserved and
    2385                 :            :          * dynamic region.  The first chunk ends page aligned by
    2386                 :            :          * expanding the dynamic region, therefore the dynamic region
    2387                 :            :          * can be shrunk to compensate while still staying above the
    2388                 :            :          * configured sizes.
    2389                 :            :          */
    2390                 :         78 :         static_size = ALIGN(ai->static_size, PCPU_MIN_ALLOC_SIZE);
    2391                 :         78 :         dyn_size = ai->dyn_size - (static_size - ai->static_size);
    2392                 :            : 
    2393                 :            :         /*
    2394                 :            :          * Initialize first chunk.
    2395                 :            :          * If the reserved_size is non-zero, this initializes the reserved
    2396                 :            :          * chunk.  If the reserved_size is zero, the reserved chunk is NULL
    2397                 :            :          * and the dynamic region is initialized here.  The first chunk,
    2398                 :            :          * pcpu_first_chunk, will always point to the chunk that serves
    2399                 :            :          * the dynamic region.
    2400                 :            :          */
    2401                 :         78 :         tmp_addr = (unsigned long)base_addr + static_size;
    2402         [ +  - ]:         78 :         map_size = ai->reserved_size ?: dyn_size;
    2403                 :         78 :         chunk = pcpu_alloc_first_chunk(tmp_addr, map_size);
    2404                 :            : 
    2405                 :            :         /* init dynamic chunk if necessary */
    2406         [ +  - ]:         78 :         if (ai->reserved_size) {
    2407                 :         78 :                 pcpu_reserved_chunk = chunk;
    2408                 :            : 
    2409                 :         78 :                 tmp_addr = (unsigned long)base_addr + static_size +
    2410                 :            :                            ai->reserved_size;
    2411                 :         78 :                 map_size = dyn_size;
    2412                 :         78 :                 chunk = pcpu_alloc_first_chunk(tmp_addr, map_size);
    2413                 :            :         }
    2414                 :            : 
    2415                 :            :         /* link the first chunk in */
    2416                 :         78 :         pcpu_first_chunk = chunk;
    2417                 :         78 :         pcpu_nr_empty_pop_pages = pcpu_first_chunk->nr_empty_pop_pages;
    2418                 :         78 :         pcpu_chunk_relocate(pcpu_first_chunk, -1);
    2419                 :            : 
    2420                 :            :         /* include all regions of the first chunk */
    2421                 :         78 :         pcpu_nr_populated += PFN_DOWN(size_sum);
    2422                 :            : 
    2423                 :         78 :         pcpu_stats_chunk_alloc();
    2424                 :         78 :         trace_percpu_create_chunk(base_addr);
    2425                 :            : 
    2426                 :            :         /* we're done */
    2427                 :         78 :         pcpu_base_addr = base_addr;
    2428                 :         78 : }
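
The static/dynamic size adjustment in pcpu_setup_first_chunk() can be sketched as follows: the static region is padded up to the minimum allocation size, and the dynamic region shrinks by the same padding so the combined size is unchanged. This is a userspace illustration only. MIN_ALLOC_SIZE is an assumed stand-in for PCPU_MIN_ALLOC_SIZE (the value 4 is an assumption here), and struct region_sizes, align_up() and adjust_regions() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define MIN_ALLOC_SIZE 4	/* assumed stand-in for PCPU_MIN_ALLOC_SIZE */

struct region_sizes {
	size_t static_size;
	size_t dyn_size;
};

/* Round x up to a multiple of a; a must be a power of two. */
static size_t align_up(size_t x, size_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/* Pad the static region and take the padding out of the dynamic region. */
static struct region_sizes adjust_regions(size_t static_size, size_t dyn_size)
{
	struct region_sizes r;
	size_t pad;

	r.static_size = align_up(static_size, MIN_ALLOC_SIZE);
	pad = r.static_size - static_size;
	assert(pad <= dyn_size);	/* dynamic region must absorb the padding */
	r.dyn_size = dyn_size - pad;
	return r;
}
```

For example, a static size of 1001 bytes with a 20480-byte dynamic region becomes 1004 and 20477, and the sum stays at 21481.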
    2429                 :            : 
    2430                 :            : #ifdef CONFIG_SMP
    2431                 :            : 
    2432                 :            : const char * const pcpu_fc_names[PCPU_FC_NR] __initconst = {
    2433                 :            :         [PCPU_FC_AUTO]  = "auto",
    2434                 :            :         [PCPU_FC_EMBED] = "embed",
    2435                 :            :         [PCPU_FC_PAGE]  = "page",
    2436                 :            : };
    2437                 :            : 
    2438                 :            : enum pcpu_fc pcpu_chosen_fc __initdata = PCPU_FC_AUTO;
    2439                 :            : 
    2440                 :          0 : static int __init percpu_alloc_setup(char *str)
    2441                 :            : {
    2442         [ #  # ]:          0 :         if (!str)
    2443                 :            :                 return -EINVAL;
    2444                 :            : 
    2445                 :          0 :         if (0)
    2446                 :            :                 /* nada */;
    2447                 :            : #ifdef CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK
    2448         [ #  # ]:          0 :         else if (!strcmp(str, "embed"))
    2449                 :          0 :                 pcpu_chosen_fc = PCPU_FC_EMBED;
    2450                 :            : #endif
    2451                 :            : #ifdef CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK
    2452         [ #  # ]:          0 :         else if (!strcmp(str, "page"))
    2453                 :          0 :                 pcpu_chosen_fc = PCPU_FC_PAGE;
    2454                 :            : #endif
    2455                 :            :         else
    2456                 :          0 :                 pr_warn("unknown allocator %s specified\n", str);
    2457                 :            : 
    2458                 :            :         return 0;
    2459                 :            : }
    2460                 :            : early_param("percpu_alloc", percpu_alloc_setup);
    2461                 :            : 
    2462                 :            : /*
    2463                 :            :  * pcpu_embed_first_chunk() is used by the generic percpu setup.
    2464                 :            :  * Build it if needed by the arch config or the generic setup is going
    2465                 :            :  * to be used.
    2466                 :            :  */
    2467                 :            : #if defined(CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK) || \
    2468                 :            :         !defined(CONFIG_HAVE_SETUP_PER_CPU_AREA)
    2469                 :            : #define BUILD_EMBED_FIRST_CHUNK
    2470                 :            : #endif
    2471                 :            : 
    2472                 :            : /* build pcpu_page_first_chunk() iff needed by the arch config */
    2473                 :            : #if defined(CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK)
    2474                 :            : #define BUILD_PAGE_FIRST_CHUNK
    2475                 :            : #endif
    2476                 :            : 
    2477                 :            : /* pcpu_build_alloc_info() is used by both embed and page first chunk */
    2478                 :            : #if defined(BUILD_EMBED_FIRST_CHUNK) || defined(BUILD_PAGE_FIRST_CHUNK)
    2479                 :            : /**
    2480                 :            :  * pcpu_build_alloc_info - build alloc_info considering distances between CPUs
    2481                 :            :  * @reserved_size: the size of reserved percpu area in bytes
    2482                 :            :  * @dyn_size: minimum free size for dynamic allocation in bytes
    2483                 :            :  * @atom_size: allocation atom size
    2484                 :            :  * @cpu_distance_fn: callback to determine distance between cpus, optional
    2485                 :            :  *
    2486                 :            :  * This function determines grouping of units, their mappings to cpus
    2487                 :            :  * and other parameters considering needed percpu size, allocation
    2488                 :            :  * atom size and distances between CPUs.
    2489                 :            :  *
    2490                 :            :  * Groups are always multiples of atom size and CPUs which are of
    2491                 :            :  * LOCAL_DISTANCE both ways are grouped together and share space for
    2492                 :            :  * units in the same group.  The returned configuration is guaranteed
    2493                 :            :  * to have CPUs on different nodes in different groups and >=75% usage
    2494                 :            :  * of allocated virtual address space.
    2495                 :            :  *
    2496                 :            :  * RETURNS:
    2497                 :            :  * On success, pointer to the new pcpu_alloc_info is returned.  On
    2498                 :            :  * failure, ERR_PTR value is returned.
    2499                 :            :  */
    2500                 :         78 : static struct pcpu_alloc_info * __init pcpu_build_alloc_info(
    2501                 :            :                                 size_t reserved_size, size_t dyn_size,
    2502                 :            :                                 size_t atom_size,
    2503                 :            :                                 pcpu_fc_cpu_distance_fn_t cpu_distance_fn)
    2504                 :            : {
    2505                 :         78 :         static int group_map[NR_CPUS] __initdata;
    2506                 :         78 :         static int group_cnt[NR_CPUS] __initdata;
    2507                 :         78 :         const size_t static_size = __per_cpu_end - __per_cpu_start;
    2508                 :         78 :         int nr_groups = 1, nr_units = 0;
    2509                 :         78 :         size_t size_sum, min_unit_size, alloc_size;
    2510                 :         78 :         int upa, max_upa, uninitialized_var(best_upa);  /* units_per_alloc */
    2511                 :         78 :         int last_allocs, group, unit;
    2512                 :         78 :         unsigned int cpu, tcpu;
    2513                 :         78 :         struct pcpu_alloc_info *ai;
    2514                 :         78 :         unsigned int *cpu_map;
    2515                 :            : 
    2516                 :            :         /* this function may be called multiple times */
    2517                 :         78 :         memset(group_map, 0, sizeof(group_map));
    2518                 :         78 :         memset(group_cnt, 0, sizeof(group_cnt));
    2519                 :            : 
    2520                 :            :         /* calculate size_sum and ensure dyn_size is enough for early alloc */
    2521                 :         78 :         size_sum = PFN_ALIGN(static_size + reserved_size +
    2522                 :            :                             max_t(size_t, dyn_size, PERCPU_DYNAMIC_EARLY_SIZE));
    2523                 :         78 :         dyn_size = size_sum - static_size - reserved_size;
    2524                 :            : 
    2525                 :            :         /*
    2526                 :            :          * Determine min_unit_size, alloc_size and max_upa such that
    2527                 :            :          * alloc_size is a multiple of atom_size and is the smallest
    2528                 :            :          * which can accommodate 4k aligned segments which are equal to
    2529                 :            :          * or larger than min_unit_size.
    2530                 :            :          */
    2531                 :         78 :         min_unit_size = max_t(size_t, size_sum, PCPU_MIN_UNIT_SIZE);
    2532                 :            : 
    2533                 :            :         /* determine the maximum # of units that can fit in an allocation */
    2534                 :         78 :         alloc_size = roundup(min_unit_size, atom_size);
    2535                 :         78 :         upa = alloc_size / min_unit_size;
    2536   [ -  +  -  + ]:         78 :         while (alloc_size % upa || (offset_in_page(alloc_size / upa)))
    2537                 :          0 :                 upa--;
    2538                 :            :         max_upa = upa;
    2539                 :            : 
    2540                 :            :         /* group cpus according to their proximity */
    2541         [ +  + ]:        156 :         for_each_possible_cpu(cpu) {
    2542                 :            :                 group = 0;
    2543                 :         78 :         next_group:
    2544         [ +  - ]:         78 :                 for_each_possible_cpu(tcpu) {
    2545         [ -  + ]:         78 :                         if (cpu == tcpu)
    2546                 :            :                                 break;
    2547   [ #  #  #  #  :          0 :                         if (group_map[tcpu] == group && cpu_distance_fn &&
                   #  # ]
    2548         [ #  # ]:          0 :                             (cpu_distance_fn(cpu, tcpu) > LOCAL_DISTANCE ||
    2549                 :          0 :                              cpu_distance_fn(tcpu, cpu) > LOCAL_DISTANCE)) {
    2550                 :          0 :                                 group++;
    2551                 :          0 :                                 nr_groups = max(nr_groups, group + 1);
    2552                 :          0 :                                 goto next_group;
    2553                 :            :                         }
    2554                 :            :                 }
    2555                 :         78 :                 group_map[cpu] = group;
    2556                 :         78 :                 group_cnt[group]++;
    2557                 :            :         }
    2558                 :            : 
    2559                 :            :         /*
    2560                 :            :          * Wasted space is caused by a ratio imbalance of upa to group_cnt.
    2561                 :            :          * Expand the unit_size until we use >= 75% of the units allocated.
    2562                 :            :          * Related to atom_size, which could be much larger than the unit_size.
    2563                 :            :          */
    2564                 :            :         last_allocs = INT_MAX;
    2565         [ +  + ]:        702 :         for (upa = max_upa; upa; upa--) {
    2566                 :        624 :                 int allocs = 0, wasted = 0;
    2567                 :            : 
    2568   [ +  +  -  + ]:        624 :                 if (alloc_size % upa || (offset_in_page(alloc_size / upa)))
    2569                 :        312 :                         continue;
    2570                 :            : 
    2571         [ +  + ]:        624 :                 for (group = 0; group < nr_groups; group++) {
    2572                 :        312 :                         int this_allocs = DIV_ROUND_UP(group_cnt[group], upa);
    2573                 :        312 :                         allocs += this_allocs;
    2574                 :        312 :                         wasted += this_allocs * upa - group_cnt[group];
    2575                 :            :                 }
    2576                 :            : 
    2577                 :            :                 /*
    2578                 :            :                  * Don't accept if wastage is over 1/3.  The
    2579                 :            :                  * greater-than comparison ensures upa==1 always
    2580                 :            :                  * passes the following check.
    2581                 :            :                  */
    2582         [ +  + ]:        312 :                 if (wasted > num_possible_cpus() / 3)
    2583                 :        234 :                         continue;
    2584                 :            : 
    2585                 :            :                 /* and then don't consume more memory */
    2586         [ +  - ]:         78 :                 if (allocs > last_allocs)
    2587                 :            :                         break;
    2588                 :            :                 last_allocs = allocs;
    2589                 :            :                 best_upa = upa;
    2590                 :            :         }
    2591                 :         78 :         upa = best_upa;
    2592                 :            : 
    2593                 :            :         /* allocate and fill alloc_info */
    2594         [ +  + ]:        156 :         for (group = 0; group < nr_groups; group++)
    2595                 :         78 :                 nr_units += roundup(group_cnt[group], upa);
    2596                 :            : 
    2597                 :         78 :         ai = pcpu_alloc_alloc_info(nr_groups, nr_units);
    2598         [ +  - ]:         78 :         if (!ai)
    2599                 :            :                 return ERR_PTR(-ENOMEM);
    2600                 :         78 :         cpu_map = ai->groups[0].cpu_map;
    2601                 :            : 
    2602         [ +  + ]:        156 :         for (group = 0; group < nr_groups; group++) {
    2603                 :         78 :                 ai->groups[group].cpu_map = cpu_map;
    2604                 :         78 :                 cpu_map += roundup(group_cnt[group], upa);
    2605                 :            :         }
    2606                 :            : 
    2607                 :         78 :         ai->static_size = static_size;
    2608                 :         78 :         ai->reserved_size = reserved_size;
    2609                 :         78 :         ai->dyn_size = dyn_size;
    2610                 :         78 :         ai->unit_size = alloc_size / upa;
    2611                 :         78 :         ai->atom_size = atom_size;
    2612                 :         78 :         ai->alloc_size = alloc_size;
    2613                 :            : 
    2614         [ +  + ]:        156 :         for (group = 0, unit = 0; group < nr_groups; group++) {
    2615                 :         78 :                 struct pcpu_group_info *gi = &ai->groups[group];
    2616                 :            : 
    2617                 :            :                 /*
    2618                 :            :                  * Initialize base_offset as if all groups are located
    2619                 :            :                  * back-to-back.  The caller should update this to
    2620                 :            :                  * reflect actual allocation.
    2621                 :            :                  */
    2622                 :         78 :                 gi->base_offset = unit * ai->unit_size;
    2623                 :            : 
    2624         [ +  + ]:        156 :                 for_each_possible_cpu(cpu)
    2625         [ +  - ]:         78 :                         if (group_map[cpu] == group)
    2626                 :         78 :                                 gi->cpu_map[gi->nr_units++] = cpu;
    2627                 :         78 :                 gi->nr_units = roundup(gi->nr_units, upa);
    2628                 :         78 :                 unit += gi->nr_units;
    2629                 :            :         }
    2630         [ -  + ]:         78 :         BUG_ON(unit != nr_units);
    2631                 :            : 
    2632                 :            :         return ai;
    2633                 :            : }
    2634                 :            : #endif /* BUILD_EMBED_FIRST_CHUNK || BUILD_PAGE_FIRST_CHUNK */
    2635                 :            : 
    2636                 :            : #if defined(BUILD_EMBED_FIRST_CHUNK)
    2637                 :            : /**
    2638                 :            :  * pcpu_embed_first_chunk - embed the first percpu chunk into bootmem
    2639                 :            :  * @reserved_size: the size of reserved percpu area in bytes
    2640                 :            :  * @dyn_size: minimum free size for dynamic allocation in bytes
    2641                 :            :  * @atom_size: allocation atom size
    2642                 :            :  * @cpu_distance_fn: callback to determine distance between cpus, optional
    2643                 :            :  * @alloc_fn: function to allocate percpu page
    2644                 :            :  * @free_fn: function to free percpu page
    2645                 :            :  *
    2646                 :            :  * This is a helper to ease setting up embedded first percpu chunk and
    2647                 :            :  * can be called where pcpu_setup_first_chunk() is expected.
    2648                 :            :  *
    2649                 :            :  * If this function is used to set up the first chunk, it is allocated
    2650                 :            :  * by calling @alloc_fn and used as-is without being mapped into
    2651                 :            :  * vmalloc area.  Allocations are always whole multiples of @atom_size
    2652                 :            :  * aligned to @atom_size.
    2653                 :            :  *
    2654                 :            :  * This enables the first chunk to piggyback on the linear physical
    2655                 :            :  * mapping which often uses larger page size.  Please note that this
    2656                 :            :  * can result in very sparse cpu->unit mapping on NUMA machines thus
    2657                 :            :  * requiring large vmalloc address space.  Don't use this allocator if
    2658                 :            :  * vmalloc space is not orders of magnitude larger than distances
    2659                 :            :  * between node memory addresses (i.e. 32-bit NUMA machines).
    2660                 :            :  *
    2661                 :            :  * @dyn_size specifies the minimum dynamic area size.
    2662                 :            :  *
    2663                 :            :  * If the needed size is smaller than the minimum or specified unit
    2664                 :            :  * size, the leftover is returned using @free_fn.
    2665                 :            :  *
    2666                 :            :  * RETURNS:
    2667                 :            :  * 0 on success, -errno on failure.
    2668                 :            :  */
    2669                 :         78 : int __init pcpu_embed_first_chunk(size_t reserved_size, size_t dyn_size,
    2670                 :            :                                   size_t atom_size,
    2671                 :            :                                   pcpu_fc_cpu_distance_fn_t cpu_distance_fn,
    2672                 :            :                                   pcpu_fc_alloc_fn_t alloc_fn,
    2673                 :            :                                   pcpu_fc_free_fn_t free_fn)
    2674                 :            : {
    2675                 :         78 :         void *base = (void *)ULONG_MAX;
    2676                 :         78 :         void **areas = NULL;
    2677                 :         78 :         struct pcpu_alloc_info *ai;
    2678                 :         78 :         size_t size_sum, areas_size;
    2679                 :         78 :         unsigned long max_distance;
    2680                 :         78 :         int group, i, highest_group, rc = 0;
    2681                 :            : 
    2682                 :         78 :         ai = pcpu_build_alloc_info(reserved_size, dyn_size, atom_size,
    2683                 :            :                                    cpu_distance_fn);
    2684         [ -  + ]:         78 :         if (IS_ERR(ai))
    2685                 :          0 :                 return PTR_ERR(ai);
    2686                 :            : 
    2687                 :         78 :         size_sum = ai->static_size + ai->reserved_size + ai->dyn_size;
    2688                 :         78 :         areas_size = PFN_ALIGN(ai->nr_groups * sizeof(void *));
    2689                 :            : 
    2690                 :         78 :         areas = memblock_alloc(areas_size, SMP_CACHE_BYTES);
    2691         [ -  + ]:         78 :         if (!areas) {
    2692                 :          0 :                 rc = -ENOMEM;
    2693                 :          0 :                 goto out_free;
    2694                 :            :         }
    2695                 :            : 
    2696                 :            :         /* allocate, copy and determine base address & max_distance */
    2697                 :            :         highest_group = 0;
    2698         [ +  + ]:        156 :         for (group = 0; group < ai->nr_groups; group++) {
    2699                 :            :                 struct pcpu_group_info *gi = &ai->groups[group];
    2700                 :            :                 unsigned int cpu = NR_CPUS;
    2701                 :            :                 void *ptr;
    2702                 :            : 
    2703   [ +  +  +  - ]:        156 :                 for (i = 0; i < gi->nr_units && cpu == NR_CPUS; i++)
    2704                 :         78 :                         cpu = gi->cpu_map[i];
    2705         [ -  + ]:         78 :                 BUG_ON(cpu == NR_CPUS);
    2706                 :            : 
    2707                 :            :                 /* allocate space for the whole group */
    2708                 :         78 :                 ptr = alloc_fn(cpu, gi->nr_units * ai->unit_size, atom_size);
    2709         [ -  + ]:         78 :                 if (!ptr) {
    2710                 :          0 :                         rc = -ENOMEM;
    2711                 :          0 :                         goto out_free_areas;
    2712                 :            :                 }
    2713                 :            :                 /* kmemleak tracks the percpu allocations separately */
    2714         [ -  + ]:         78 :                 kmemleak_free(ptr);
    2715                 :         78 :                 areas[group] = ptr;
    2716                 :            : 
    2717                 :         78 :                 base = min(ptr, base);
    2718         [ -  + ]:         78 :                 if (ptr > areas[highest_group])
    2719                 :          0 :                         highest_group = group;
    2720                 :            :         }
    2721                 :         78 :         max_distance = areas[highest_group] - base;
    2722                 :         78 :         max_distance += ai->unit_size * ai->groups[highest_group].nr_units;
    2723                 :            : 
    2724                 :            :         /* warn if maximum distance is further than 75% of vmalloc space */
    2725   [ -  +  -  -  :        156 :         if (max_distance > VMALLOC_TOTAL * 3 / 4) {
                      + ]
    2726      [ #  #  # ]:          0 :                 pr_warn("max_distance=0x%lx too large for vmalloc space 0x%lx\n",
    2727                 :            :                                 max_distance, VMALLOC_TOTAL);
    2728                 :            : #ifdef CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK
    2729                 :            :                 /* and fail if we have fallback */
    2730                 :          0 :                 rc = -EINVAL;
    2731                 :          0 :                 goto out_free_areas;
    2732                 :            : #endif
    2733                 :            :         }
    2734                 :            : 
    2735                 :            :         /*
    2736                 :            :          * Copy data and free unused parts.  This should happen after all
    2737                 :            :          * allocations are complete; otherwise, we may end up with
    2738                 :            :          * overlapping groups.
    2739                 :            :          */
    2740         [ +  + ]:        156 :         for (group = 0; group < ai->nr_groups; group++) {
    2741                 :         78 :                 struct pcpu_group_info *gi = &ai->groups[group];
    2742                 :         78 :                 void *ptr = areas[group];
    2743                 :            : 
    2744         [ +  + ]:        156 :                 for (i = 0; i < gi->nr_units; i++, ptr += ai->unit_size) {
    2745         [ -  + ]:         78 :                         if (gi->cpu_map[i] == NR_CPUS) {
    2746                 :            :                                 /* unused unit, free whole */
    2747                 :          0 :                                 free_fn(ptr, ai->unit_size);
    2748                 :          0 :                                 continue;
    2749                 :            :                         }
    2750                 :            :                         /* copy and return the unused part */
    2751                 :         78 :                         memcpy(ptr, __per_cpu_load, ai->static_size);
    2752                 :         78 :                         free_fn(ptr + size_sum, ai->unit_size - size_sum);
    2753                 :            :                 }
    2754                 :            :         }
    2755                 :            : 
    2756                 :            :         /* base address is now known, determine group base offsets */
    2757         [ +  + ]:        156 :         for (group = 0; group < ai->nr_groups; group++) {
    2758                 :         78 :                 ai->groups[group].base_offset = areas[group] - base;
    2759                 :            :         }
    2760                 :            : 
    2761                 :         78 :         pr_info("Embedded %zu pages/cpu s%zu r%zu d%zu u%zu\n",
    2762                 :            :                 PFN_DOWN(size_sum), ai->static_size, ai->reserved_size,
    2763                 :            :                 ai->dyn_size, ai->unit_size);
    2764                 :            : 
    2765                 :         78 :         pcpu_setup_first_chunk(ai, base);
    2766                 :         78 :         goto out_free;
    2767                 :            : 
    2768                 :          0 : out_free_areas:
    2769         [ #  # ]:          0 :         for (group = 0; group < ai->nr_groups; group++)
    2770         [ #  # ]:          0 :                 if (areas[group])
    2771                 :          0 :                         free_fn(areas[group],
    2772                 :          0 :                                 ai->groups[group].nr_units * ai->unit_size);
    2773                 :          0 : out_free:
    2774                 :         78 :         pcpu_free_alloc_info(ai);
    2775         [ +  - ]:         78 :         if (areas)
    2776         [ +  - ]:         78 :                 memblock_free_early(__pa(areas), areas_size);
    2777                 :            :         return rc;
    2778                 :            : }
    2779                 :            : #endif /* BUILD_EMBED_FIRST_CHUNK */
    2780                 :            : 
    2781                 :            : #ifdef BUILD_PAGE_FIRST_CHUNK
    2782                 :            : /**
    2783                 :            :  * pcpu_page_first_chunk - map the first chunk using PAGE_SIZE pages
    2784                 :            :  * @reserved_size: the size of reserved percpu area in bytes
    2785                 :            :  * @alloc_fn: function to allocate percpu page, always called with PAGE_SIZE
    2786                 :            :  * @free_fn: function to free percpu page, always called with PAGE_SIZE
    2787                 :            :  * @populate_pte_fn: function to populate pte
    2788                 :            :  *
    2789                 :            :  * This is a helper to ease setting up page-remapped first percpu
    2790                 :            :  * chunk and can be called where pcpu_setup_first_chunk() is expected.
    2791                 :            :  *
    2792                 :            :  * This is the basic allocator.  Static percpu area is allocated
    2793                 :            :  * page-by-page into vmalloc area.
    2794                 :            :  *
    2795                 :            :  * RETURNS:
    2796                 :            :  * 0 on success, -errno on failure.
    2797                 :            :  */
    2798                 :          0 : int __init pcpu_page_first_chunk(size_t reserved_size,
    2799                 :            :                                  pcpu_fc_alloc_fn_t alloc_fn,
    2800                 :            :                                  pcpu_fc_free_fn_t free_fn,
    2801                 :            :                                  pcpu_fc_populate_pte_fn_t populate_pte_fn)
    2802                 :            : {
    2803                 :          0 :         static struct vm_struct vm;
    2804                 :          0 :         struct pcpu_alloc_info *ai;
    2805                 :          0 :         char psize_str[16];
    2806                 :          0 :         int unit_pages;
    2807                 :          0 :         size_t pages_size;
    2808                 :          0 :         struct page **pages;
    2809                 :          0 :         int unit, i, j, rc = 0;
    2810                 :          0 :         int upa;
    2811                 :          0 :         int nr_g0_units;
    2812                 :            : 
    2813                 :          0 :         snprintf(psize_str, sizeof(psize_str), "%luK", PAGE_SIZE >> 10);
    2814                 :            : 
    2815                 :          0 :         ai = pcpu_build_alloc_info(reserved_size, 0, PAGE_SIZE, NULL);
    2816         [ #  # ]:          0 :         if (IS_ERR(ai))
    2817                 :          0 :                 return PTR_ERR(ai);
    2818         [ #  # ]:          0 :         BUG_ON(ai->nr_groups != 1);
    2819                 :          0 :         upa = ai->alloc_size/ai->unit_size;
    2820                 :          0 :         nr_g0_units = roundup(num_possible_cpus(), upa);
    2821   [ #  #  #  # ]:          0 :         if (WARN_ON(ai->groups[0].nr_units != nr_g0_units)) {
    2822                 :          0 :                 pcpu_free_alloc_info(ai);
    2823                 :          0 :                 return -EINVAL;
    2824                 :            :         }
    2825                 :            : 
    2826                 :          0 :         unit_pages = ai->unit_size >> PAGE_SHIFT;
    2827                 :            : 
    2828                 :            :         /* unaligned allocations can't be freed, round up to page size */
    2829                 :          0 :         pages_size = PFN_ALIGN(unit_pages * num_possible_cpus() *
    2830                 :            :                                sizeof(pages[0]));
    2831                 :          0 :         pages = memblock_alloc(pages_size, SMP_CACHE_BYTES);
    2832         [ #  # ]:          0 :         if (!pages)
    2833                 :          0 :                 panic("%s: Failed to allocate %zu bytes\n", __func__,
    2834                 :            :                       pages_size);
    2835                 :            : 
    2836                 :            :         /* allocate pages */
    2837                 :            :         j = 0;
    2838         [ #  # ]:          0 :         for (unit = 0; unit < num_possible_cpus(); unit++) {
    2839                 :          0 :                 unsigned int cpu = ai->groups[0].cpu_map[unit];
    2840         [ #  # ]:          0 :                 for (i = 0; i < unit_pages; i++) {
    2841                 :          0 :                         void *ptr;
    2842                 :            : 
    2843                 :          0 :                         ptr = alloc_fn(cpu, PAGE_SIZE, PAGE_SIZE);
    2844         [ #  # ]:          0 :                         if (!ptr) {
    2845                 :          0 :                                 pr_warn("failed to allocate %s page for cpu%u\n",
    2846                 :            :                                                 psize_str, cpu);
    2847                 :          0 :                                 goto enomem;
    2848                 :            :                         }
    2849                 :            :                         /* kmemleak tracks the percpu allocations separately */
    2850         [ #  # ]:          0 :                         kmemleak_free(ptr);
    2851         [ #  # ]:          0 :                         pages[j++] = virt_to_page(ptr);
    2852                 :            :                 }
    2853                 :            :         }
    2854                 :            : 
    2855                 :            :         /* allocate vm area, map the pages and copy static data */
    2856                 :          0 :         vm.flags = VM_ALLOC;
    2857                 :          0 :         vm.size = num_possible_cpus() * ai->unit_size;
    2858                 :          0 :         vm_area_register_early(&vm, PAGE_SIZE);
    2859                 :            : 
    2860         [ #  # ]:          0 :         for (unit = 0; unit < num_possible_cpus(); unit++) {
    2861                 :          0 :                 unsigned long unit_addr =
    2862                 :          0 :                         (unsigned long)vm.addr + unit * ai->unit_size;
    2863                 :            : 
    2864         [ #  # ]:          0 :                 for (i = 0; i < unit_pages; i++)
    2865                 :          0 :                         populate_pte_fn(unit_addr + (i << PAGE_SHIFT));
    2866                 :            : 
    2867                 :            :                 /* pte already populated, the following shouldn't fail */
    2868                 :          0 :                 rc = __pcpu_map_pages(unit_addr, &pages[unit * unit_pages],
    2869                 :            :                                       unit_pages);
    2870         [ #  # ]:          0 :                 if (rc < 0)
    2871                 :          0 :                         panic("failed to map percpu area, err=%d\n", rc);
    2872                 :            : 
    2873                 :            :                 /*
    2874                 :            :                  * FIXME: Archs with virtual cache should flush local
    2875                 :            :                  * cache for the linear mapping here - something
    2876                 :            :                  * equivalent to flush_cache_vmap() on the local cpu.
    2877                 :            :                  * flush_cache_vmap() can't be used as most supporting
    2878                 :            :                  * data structures are not set up yet.
    2879                 :            :                  */
    2880                 :            : 
    2881                 :            :                 /* copy static data */
    2882                 :          0 :                 memcpy((void *)unit_addr, __per_cpu_load, ai->static_size);
    2883                 :            :         }
    2884                 :            : 
    2885                 :            :         /* we're ready, commit */
    2886                 :          0 :         pr_info("%d %s pages/cpu s%zu r%zu d%zu\n",
    2887                 :            :                 unit_pages, psize_str, ai->static_size,
    2888                 :            :                 ai->reserved_size, ai->dyn_size);
    2889                 :            : 
    2890                 :          0 :         pcpu_setup_first_chunk(ai, vm.addr);
    2891                 :          0 :         goto out_free_ar;
    2892                 :            : 
    2893                 :            : enomem:
    2894         [ #  # ]:          0 :         while (--j >= 0)
    2895                 :          0 :                 free_fn(page_address(pages[j]), PAGE_SIZE);
    2896                 :            :         rc = -ENOMEM;
    2897                 :          0 : out_free_ar:
    2898         [ #  # ]:          0 :         memblock_free_early(__pa(pages), pages_size);
    2899                 :          0 :         pcpu_free_alloc_info(ai);
    2900                 :          0 :         return rc;
    2901                 :            : }
    2902                 :            : #endif /* BUILD_PAGE_FIRST_CHUNK */
    2903                 :            : 
    2904                 :            : #ifndef CONFIG_HAVE_SETUP_PER_CPU_AREA
    2905                 :            : /*
    2906                 :            :  * Generic SMP percpu area setup.
    2907                 :            :  *
    2908                 :            :  * The embedding helper is used because its behavior closely resembles
    2909                 :            :  * the original non-dynamic generic percpu area setup.  This is
    2910                 :            :  * important because many archs have addressing restrictions and might
    2911                 :            :  * fail if the percpu area is located far away from the previous
    2912                 :            :  * location.  As an added bonus, in non-NUMA cases, embedding is
    2913                 :            :  * generally a good idea TLB-wise because the percpu area can piggyback
    2914                 :            :  * on the physical linear memory mapping which uses large page
    2915                 :            :  * mappings on applicable archs.
    2916                 :            :  */
    2917                 :            : unsigned long __per_cpu_offset[NR_CPUS] __read_mostly;
    2918                 :            : EXPORT_SYMBOL(__per_cpu_offset);
    2919                 :            : 
    2920                 :            : static void * __init pcpu_dfl_fc_alloc(unsigned int cpu, size_t size,
    2921                 :            :                                        size_t align)
    2922                 :            : {
    2923                 :            :         return memblock_alloc_from(size, align, __pa(MAX_DMA_ADDRESS));
    2924                 :            : }
    2925                 :            : 
    2926                 :            : static void __init pcpu_dfl_fc_free(void *ptr, size_t size)
    2927                 :            : {
    2928                 :            :         memblock_free_early(__pa(ptr), size);
    2929                 :            : }
    2930                 :            : 
    2931                 :            : void __init setup_per_cpu_areas(void)
    2932                 :            : {
    2933                 :            :         unsigned long delta;
    2934                 :            :         unsigned int cpu;
    2935                 :            :         int rc;
    2936                 :            : 
    2937                 :            :         /*
    2938                 :            :          * Always reserve area for module percpu variables.  That's
    2939                 :            :          * what the legacy allocator did.
    2940                 :            :          */
    2941                 :            :         rc = pcpu_embed_first_chunk(PERCPU_MODULE_RESERVE,
    2942                 :            :                                     PERCPU_DYNAMIC_RESERVE, PAGE_SIZE, NULL,
    2943                 :            :                                     pcpu_dfl_fc_alloc, pcpu_dfl_fc_free);
    2944                 :            :         if (rc < 0)
    2945                 :            :                 panic("Failed to initialize percpu areas.");
    2946                 :            : 
    2947                 :            :         delta = (unsigned long)pcpu_base_addr - (unsigned long)__per_cpu_start;
    2948                 :            :         for_each_possible_cpu(cpu)
    2949                 :            :                 __per_cpu_offset[cpu] = delta + pcpu_unit_offsets[cpu];
    2950                 :            : }
    2951                 :            : #endif  /* CONFIG_HAVE_SETUP_PER_CPU_AREA */
    2952                 :            : 
    2953                 :            : #else   /* CONFIG_SMP */
    2954                 :            : 
    2955                 :            : /*
    2956                 :            :  * UP percpu area setup.
    2957                 :            :  *
    2958                 :            :  * UP always uses km-based percpu allocator with identity mapping.
    2959                 :            :  * Static percpu variables are indistinguishable from the usual static
    2960                 :            :  * variables and don't require any special preparation.
    2961                 :            :  */
    2962                 :            : void __init setup_per_cpu_areas(void)
    2963                 :            : {
    2964                 :            :         const size_t unit_size =
    2965                 :            :                 roundup_pow_of_two(max_t(size_t, PCPU_MIN_UNIT_SIZE,
    2966                 :            :                                          PERCPU_DYNAMIC_RESERVE));
    2967                 :            :         struct pcpu_alloc_info *ai;
    2968                 :            :         void *fc;
    2969                 :            : 
    2970                 :            :         ai = pcpu_alloc_alloc_info(1, 1);
    2971                 :            :         fc = memblock_alloc_from(unit_size, PAGE_SIZE, __pa(MAX_DMA_ADDRESS));
    2972                 :            :         if (!ai || !fc)
    2973                 :            :                 panic("Failed to allocate memory for percpu areas.");
    2974                 :            :         /* kmemleak tracks the percpu allocations separately */
    2975                 :            :         kmemleak_free(fc);
    2976                 :            : 
    2977                 :            :         ai->dyn_size = unit_size;
    2978                 :            :         ai->unit_size = unit_size;
    2979                 :            :         ai->atom_size = unit_size;
    2980                 :            :         ai->alloc_size = unit_size;
    2981                 :            :         ai->groups[0].nr_units = 1;
    2982                 :            :         ai->groups[0].cpu_map[0] = 0;
    2983                 :            : 
    2984                 :            :         pcpu_setup_first_chunk(ai, fc);
    2985                 :            :         pcpu_free_alloc_info(ai);
    2986                 :            : }
    2987                 :            : 
    2988                 :            : #endif  /* CONFIG_SMP */
    2989                 :            : 
    2990                 :            : /*
    2991                 :            :  * pcpu_nr_pages - calculate total number of populated backing pages
    2992                 :            :  *
    2993                 :            :  * This reflects the number of pages populated to back chunks.  Metadata is
    2994                 :            :  * excluded from the number exposed in meminfo as the number of backing pages
    2995                 :            :  * scales with the number of cpus and can quickly outweigh the memory used for
    2996                 :            :  * metadata.  It also keeps this calculation nice and simple.
    2997                 :            :  *
    2998                 :            :  * RETURNS:
    2999                 :            :  * Total number of populated backing pages in use by the allocator.
    3000                 :            :  */
    3001                 :         78 : unsigned long pcpu_nr_pages(void)
    3002                 :            : {
    3003                 :         78 :         return pcpu_nr_populated * pcpu_nr_units;
    3004                 :            : }
    3005                 :            : 
    3006                 :            : /*
    3007                 :            :  * Percpu allocator is initialized early during boot when neither slab nor
    3008                 :            :  * workqueue is available.  Plug async management until everything is up
    3009                 :            :  * and running.
    3010                 :            :  */
    3011                 :         78 : static int __init percpu_enable_async(void)
    3012                 :            : {
    3013                 :         78 :         pcpu_async_enabled = true;
    3014                 :         78 :         return 0;
    3015                 :            : }
    3016                 :            : subsys_initcall(percpu_enable_async);
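
The offset arithmetic in the generic SMP setup_per_cpu_areas() above (delta plus the per-unit offset) can be illustrated with a small user-space sketch. This is not kernel code: NR_UNITS, UNIT_SIZE, struct image, load_image, unit_offset[], cpu_offset[], setup_areas() and counter_on() are hypothetical stand-ins chosen for illustration, and malloc() takes the place of the early boot allocator. It assumes a flat address space where integer arithmetic on pointers behaves as on mainstream platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NR_UNITS  4   /* hypothetical count of possible CPUs */
#define UNIT_SIZE 64  /* hypothetical per-unit size in bytes */

/* Stand-in for the static percpu image that __per_cpu_load points at. */
struct image { int counter; };
static struct image load_image = { 7 };

static uintptr_t unit_offset[NR_UNITS]; /* role of pcpu_unit_offsets[] */
static uintptr_t cpu_offset[NR_UNITS];  /* role of __per_cpu_offset[] */

/* Copy the image into every unit and record each CPU's offset. */
static unsigned char *setup_areas(void)
{
	unsigned char *base = malloc((size_t)NR_UNITS * UNIT_SIZE);
	uintptr_t delta;
	int cpu;

	if (!base)
		return NULL;

	/* Unsigned arithmetic: delta may wrap; adding it back undoes the wrap. */
	delta = (uintptr_t)base - (uintptr_t)&load_image;
	for (cpu = 0; cpu < NR_UNITS; cpu++) {
		unit_offset[cpu] = (uintptr_t)cpu * UNIT_SIZE;
		memcpy(base + unit_offset[cpu], &load_image, sizeof(load_image));
		cpu_offset[cpu] = delta + unit_offset[cpu];
	}
	return base;
}

/* Map the counter's address inside the image to the given CPU's copy. */
static int *counter_on(int cpu)
{
	assert(cpu >= 0 && cpu < NR_UNITS);
	return (int *)((uintptr_t)&load_image.counter + cpu_offset[cpu]);
}
```

After setup_areas() returns, counter_on(cpu) yields a private copy per unit, so a write through counter_on(2) leaves the other units and load_image itself unchanged. Keeping a single delta and adding the unit offset mirrors the loop over for_each_possible_cpu() in the listing; the kernel's real accessors are more involved and are not modelled here.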

Generated by: LCOV version 1.14