Python 标准库 itertools 为高效循环而创建迭代器的函数

说明

《Python 教程》持续更新中，提供建议、纠错、催更等加作者微信: gairuo123（备注：pandas教程）和关注公众号「盖若」ID: gairuo。跟作者学习，请进入 Python学习课程。欢迎关注作者出版的书籍：《深入浅出Pandas》和《Python之光》。

itertools 模块实现一系列 iterator ，这些迭代器受到 APL，Haskell 和 SML 的启发。为了适用于 Python，它们都被重新写过。本模块标准化了一个快速、高效利用内存的核心工具集，这些工具本身或组合都很有用。它们一起形成了“迭代器代数”，这使得在纯 Python 中有可能创建简洁又高效的专用工具。注意，这些迭代生成器生成的可迭代对象正如可迭代对象的特点，迭代后就会为空。

这些内置工具同时也能很好地与 operator 模块中的高效函数配合使用。例如，我们可以将两个向量的点积映射到乘法运算符： sum(map(operator.mul, vector1, vector2)) 。

无穷迭代器

这些函数可以创建无限长度的迭代器。无限序列只有在for迭代时才会无限地迭代下去，如果只是创建了一个迭代对象，它不会事先把无限个元素生成出来，事实上也不可能在内存中创建无限多个元素。

点数 itertools.count

语法为 itertools.count(start=0, step=1) 创建一个迭代器，它从 start 值开始，返回均匀间隔的值。常用于 map() 中的实参来生成连续的数据点。此外，还用于 zip() 来添加序列号。

大致相当于：

def count(start=0, step=1):
    # count(10) --> 10 11 12 13 14 ...
    # count(2.5, 0.5) -> 2.5 3.0 3.5 ...
    n = start
    while True:
        yield n
        n += step

当对浮点数计数时，替换为乘法代码有时精度会更好，例如： (start + step * i for i in count()) 。在 3.1 版更改: 增加参数 step ，允许非整型。

例如：

import itertools
foo = itertools.count(2, 2)

for i in foo:
    print(i)

'''
2
4
6
8
10
12
...

# 按Ctrl+C退出
'''

无限长度，只能强制退出。

循环 itertools.cycle

语法为 itertools.cycle(iterable) 创建一个迭代器，返回 iterable 中所有元素并保存一个副本。当取完 iterable 中所有元素，返回副本中的所有元素，无限重复。

大致相当于：

def cycle(iterable):
    # cycle('ABCD') --> A B C D A B C D A B C D ...
    saved = []
    for element in iterable:
        yield element
        saved.append(element)
    while saved:
        for element in saved:
              yield element

注意，该函数可能需要相当大的辅助空间（取决于 iterable 的长度）。

如：

import itertools
foo = itertools.cycle('123')

for i in foo:
    print(i)

'''
1
2
3
1
2
3
1
2
3
...

# 按Ctrl+C退出
'''

重复 itertools.repeat

语法为 itertools.repeat(object[, times]) 负责把一个元素无限重复下去，不过如果提供第二个参数就可以限定重复次数。它创建一个迭代器，不断重复 object 。除非设定参数 times ，否则将无限重复。可用于 map() 函数中的参数，被调用函数可得到一个不变参数。也可用于 zip() 的参数以在元组记录中创建一个不变的部分。

大致相当于：

def repeat(object, times=None):
    # repeat(10, 3) --> 10 10 10
    if times is None:
        while True:
            yield object
    else:
        for i in range(times):
            yield object

repeat 最常见的用途就是在 map 或 zip 提供一个常量流：

list(map(pow, range(10), repeat(2)))
# [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

有限迭代器

这些都是根据最短输入序列长度停止的迭代器。

逐积 itertools.accumulate

语法为 itertools.accumulate(iterable[, func, *, initial=None])，与类似函数 functools.reduce() 机制差不多，但它不返回一个最终累积值，而是把累积的过程结果形成一个可迭代对象，可以理解这个序列里的内容是计算过程。参与的意义可以参考 functools.reduce() （注意：它们的参数 iterable 和 func 位置相反）。

大致相当于：

def accumulate(iterable, func=operator.add, *, initial=None):
    'Return running totals'
    # accumulate([1,2,3,4,5]) --> 1 3 6 10 15
    # accumulate([1,2,3,4,5], initial=100) --> 100 101 103 106 110 115
    # accumulate([1,2,3,4,5], operator.mul) --> 1 2 6 24 120
    it = iter(iterable)
    total = initial
    if initial is None:
        try:
            total = next(it)
        except StopIteration:
            return
    yield total
    for element in it:
        total = func(total, element)
        yield total

例如：

>>> data = [3, 4, 6, 2, 1, 9, 0, 7, 5, 8]
>>> list(accumulate(data, operator.mul))     # running product
[3, 12, 72, 144, 144, 1296, 0, 0, 0, 0]
>>> list(accumulate(data, max))              # running maximum
[3, 4, 6, 6, 6, 9, 9, 9, 9, 9]

# Amortize a 5% loan of 1000 with 4 annual payments of 90
>>> cashflows = [1000, -90, -90, -90, -90]
>>> list(accumulate(cashflows, lambda bal, pmt: bal*1.05 + pmt))
[1000, 960.0, 918.0, 873.9000000000001, 827.5950000000001]

# Chaotic recurrence relation https://en.wikipedia.org/wiki/Logistic_map
>>> logistic_map = lambda x, _:  r * x * (1 - x)
>>> r = 3.8
>>> x0 = 0.4
>>> inputs = repeat(x0, 36)     # 仅使用初始值
>>> [format(x, '.2f') for x in accumulate(inputs, logistic_map)]
['0.40', '0.91', '0.30', '0.81', '0.60', '0.92', '0.29', '0.79', '0.63',
 '0.88', '0.39', '0.90', '0.33', '0.84', '0.52', '0.95', '0.18', '0.57',
 '0.93', '0.25', '0.71', '0.79', '0.63', '0.88', '0.39', '0.91', '0.32',
 '0.83', '0.54', '0.95', '0.20', '0.60', '0.91', '0.30', '0.80', '0.60']

批处理 itertools.batched

这是 Python 3.12 新版功能，语法为 itertools.batched(iterable, n)，来自 iterable 的长度为 n 元组形式的批次数据。最后一个批次可能短于 n。

循环处理输入可迭代对象并将数据积累为长度至多为 n 的元组。输入将被惰性地消耗，能填满一个批次即可。结果将在批次填满或输入可迭代对象被耗尽时产生:

flattened_data = ['roses', 'red', 'violets', 'blue', 'sugar', 'sweet']
unflattened = list(batched(flattened_data, 2))
unflattened
# [('roses', 'red'), ('violets', 'blue'), ('sugar', 'sweet')]

for batch in batched('ABCDEFG', 3):
    print(batch)

'''
('A', 'B', 'C')
('D', 'E', 'F')
('G',)
'''

大致相当于：

def batched(iterable, n):
    # batched('ABCDEFG', 3) --> ABC DEF G
    if n < 1:
        raise ValueError('n must be at least one')
    it = iter(iterable)
    while batch := tuple(islice(it, n)):
        yield batch

链条 itertools.chain

语法为 itertools.chain(*iterables)，创建一个迭代器，它首先返回第一个可迭代对象中所有元素，接着返回下一个可迭代对象中所有元素，直到耗尽所有可迭代对象中的元素。可将多个序列处理为单个序列。大致相当于：

def chain(*iterables):
    # chain('ABC', 'DEF') --> A B C D E F
    for it in iterables:
        for element in it:
            yield element

例如：

import itertools
chains = itertools.chain(range(3), 'bye')

for i in chains:
    print(i)

'''
0
1
2
b
y
e
'''

itertools.chain 对象还有一个方法 classmethod chain.from_iterable(iterable) 从一个单独的可迭代参数中得到链式输入，该参数是延迟计算的。

例：

import itertools
chains = itertools.chain.from_iterable([range(3), 'bye'])

for i in chains:
    print(i)
'''
0
1
2
b
y
e
'''

压缩 itertools.compress

语法为itertools.compress(data, selectors) 创建一个迭代器，它返回 data 中经 selectors 真值测试为 True 的元素。迭代器在两者较短的长度处停止。大致相当于：

def compress(data, selectors):
    # compress('ABCDEF', [1,0,1,0,1,1]) --> A C E F
    return (d for d, s in zip(data, selectors) if s)

如：

import itertools

compress = itertools.compress([*'abc'], [0,1,0])
[*compress]
# ['b']

第二个参数，如同一个蒙板，将想要的数据筛选出来，这在数据科学中非常有用。

丢弃 itertools.dropwhile

语法 itertools.dropwhile(predicate, iterable)创建一个迭代器，如果 predicate (谓语) 为 true，迭代器丢弃这些元素，然后返回其他元素。注意，迭代器在 predicate 首次为 false 之前不会产生任何输出，所以可能需要一定长度的启动时间。

大致相当于：

def dropwhile(predicate, iterable):
    # dropwhile(lambda x: x<5, [1,4,6,4,1]) --> 6 4 1
    iterable = iter(iterable)
    for x in iterable:
        if not predicate(x):
            yield x
            break
    for x in iterable:
        yield x

predicate 是一个可调用对象，将 iterable 中的元素传入 predicate 得到 false 后开始返回后续其他元素（不再判断 predicate）：

import itertools

foo = itertools.dropwhile(lambda x: x < 8, [1,3,1,5,6,8,1])
[*foo]
# [8, 1]

筛出假值 itertools.filterfalse

语法为itertools.filterfalse(predicate, iterable)创建一个迭代器，只返回 iterable 中 predicate 为 False 的元素。如果 predicate 是 None，返回真值测试为 false 的元素。大致相当于：

def filterfalse(predicate, iterable):
    # filterfalse(lambda x: x%2, range(10)) --> 0 2 4 6 8
    if predicate is None:
        predicate = bool
    for x in iterable:
        if not predicate(x):
            yield x

和内置函数的筛选机制相反，如：

import itertools

foo = itertools.filterfalse(lambda x: x%2==0, [1,3,1,5,6,8,1])
[*foo]
# [1, 3, 1, 5, 1]

邻近分组 itertools.groupby

语法为 itertools.groupby(iterable, key=None) 创建一个迭代器，返回 iterable 中连续的键和组。key 是一个计算元素键值函数。如果未指定或为 None，key 缺省为恒等函数（identity function），返回元素不变。一般来说，iterable 需用同一个键值函数预先排序。

这种行为与 SQL 的 GROUP BY 操作不同，SQL 的操作会忽略输入的顺序将相同键值的元素分在同组中。itertools.groupby 将邻近的相同元素分组。

import itertools

foo = itertools.groupby([0,0,2,2,3,3,2,2])
[*foo]
'''
[(0, <itertools._grouper at 0x7f9d31bf7910>),
 (2, <itertools._grouper at 0x7f9d31bf4820>),
 (3, <itertools._grouper at 0x7f9d31bf7fa0>),
 (2, <itertools._grouper at 0x7f9d31bf7c70>)]
'''

由于 itertools.groupby() 返回的组本身也是一个迭代器，它与 groupby() 共享底层的可迭代对象。因为源是共享的，当 groupby() 对象向后迭代时，前一个组将消失。因此如果稍后还需要使用的话，可保存为列表：

import itertools

foo = itertools.groupby([0,0,2,2,3,3,2,2], key=lambda x: str(x+1))

groupskeys, groups = [], []
for gk, g in foo:
    groupskeys.append(gk)
    groups.append([*g])

groupskeys
# ['1', '3', '4', '3']
groups
# [[0, 0], [2, 2], [3, 3], [2, 2]]

groupby() 大致相当于：

class groupby:
    # [k for k, g in groupby('AAAABBBCCDAABBB')] --> A B C D A B
    # [list(g) for k, g in groupby('AAAABBBCCD')] --> AAAA BBB CC D
    def __init__(self, iterable, key=None):
        if key is None:
            key = lambda x: x
        self.keyfunc = key
        self.it = iter(iterable)
        self.tgtkey = self.currkey = self.currvalue = object()
    def __iter__(self):
        return self
    def __next__(self):
        self.id = object()
        while self.currkey == self.tgtkey:
            self.currvalue = next(self.it)    # Exit on StopIteration
            self.currkey = self.keyfunc(self.currvalue)
        self.tgtkey = self.currkey
        return (self.currkey, self._grouper(self.tgtkey, self.id))
    def _grouper(self, tgtkey, id):
        while self.id is id and self.currkey == tgtkey:
            yield self.currvalue
            try:
                self.currvalue = next(self.it)
            except StopIteration:
                return
            self.currkey = self.keyfunc(self.currvalue)

索引切片 itertools.islice

它有两个使用方式：

itertools.islice(iterable, stop)
itertools.islice(iterable, start, stop[, step])

itertools.islice() 创建一个迭代器，返回从 iterable 里选中的元素。

如果 start 不是0，跳过 iterable 中的元素，直到到达 start 这个位置。之后迭代器连续返回元素，除非 step 设置的值很高导致被跳过。
如果 stop 为 None，迭代器耗光为止；否则，在指定的位置停止。
如果 start 为 None，迭代从0开始。如果 step 为 None ，步长缺省为1。
与普通的切片不同，islice() 不支持将 start ， stop ，或 step 设为负值。

可用来从内部数据结构被压平的数据中提取相关字段（例如一个多行报告，它的名称字段出现在每三行上）。大致相当于：

def islice(iterable, *args):
    # islice('ABCDEFG', 2) --> A B
    # islice('ABCDEFG', 2, 4) --> C D
    # islice('ABCDEFG', 2, None) --> C D E F G
    # islice('ABCDEFG', 0, None, 2) --> A C E G
    s = slice(*args)
    start, stop, step = s.start or 0, s.stop or sys.maxsize, s.step or 1
    it = iter(range(start, stop, step))
    try:
        nexti = next(it)
    except StopIteration:
        # Consume *iterable* up to the *start* position.
        for i, element in zip(range(start), iterable):
            pass
        return
    try:
        for i, element in enumerate(iterable):
            if i == nexti:
                yield element
                nexti = next(it)
    except StopIteration:
        # Consume to *stop*.
        for i, element in zip(range(i + 1, stop), iterable):
            pass

例如：

import itertools

foo = itertools.islice(range(10), 1, 6, 2)
[*foo]
# [1, 3, 5]

连续重叠对 itertools.pairwise

3.10 新版功能

语法为itertools.pairwise(iterable)返回从输入 iterable 中获取的连续重叠对。

输出迭代器中 2 元组的数量将比输入的数量少一个。如果输入可迭代对象中少于两个值则它将为空。

大致相当于：

def pairwise(iterable):
    # pairwise('ABCDEFG') --> AB BC CD DE EF FG
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)

例如：

import itertools

foo = itertools.pairwise('aabc')
[*foo]
# [('a', 'a'), ('a', 'b'), ('b', 'c')]

成对映射 itertools.starmap

语法itertools.starmap(function, iterable)创建一个迭代器，使用从可迭代对象中获取的参数来计算该函数。当参数对应的形参已从一个单独可迭代对象组合为元组时（数据已被“预组对”）可用此函数代替 map()。map() 与 starmap() 之间的区别可以类比 function(a,b) 与 function(*c) 的区别。大致相当于:

def starmap(function, iterable):
    # 次幂 starmap(pow, [(2,5), (3,2), (10,3)]) --> 32 9 1000
    for args in iterable:
        yield function(*args)

例如：

import itertools

foo = itertools.starmap(lambda x,y: y/x, [(2,4),(3,9)])
[*foo]
# [2.0, 3.0]

itertools.takewhile

语法 itertools.takewhile(predicate, iterable)，它创建一个迭代器，只要 predicate（谓语，一个函数）为真就从可迭代对象中返回元素。大致相当于:

def takewhile(predicate, iterable):
    # takewhile(lambda x: x<5, [1,4,6,4,1]) --> 1 4
    for x in iterable:
        if predicate(x):
            yield x
        else:
            break

例如：

import itertools

foo = itertools.takewhile(lambda x: x<5, range(10))
[*foo]
# [0, 1, 2, 3, 4]

此时与 filter 效果相同：

foo2 = filter(lambda x: x<5, range(10))
[*foo2]
# [0, 1, 2, 3, 4]

但不同的是，filter 扫描整个迭代器以查找与条件匹配的元素。itertools.takewhile 在发现不符合条件的元素时停止。如：

import itertools

my_list = [1, -2, 3, -4, 5, -6]

foo = itertools.takewhile(lambda x: x>0, my_list)
foo2 = filter(lambda x: x>0, my_list)

[*foo] # outputs [1]
[*foo2] # outputs [1, 3, 5]

生成多迭代器 itertools.tee

它的语法为 itertools.tee(iterable, n=2)，它从一个可迭代对象中返回 n 个独立的迭代器。

下面的 Python 代码能帮助解释 tee 做了什么（尽管实际的实现更复杂，而且仅使用了一个底层的 FIFO 队列）。

大致相当于：

def tee(iterable, n=2):
    it = iter(iterable)
    deques = [collections.deque() for i in range(n)]
    def gen(mydeque):
        while True:
            if not mydeque:             # when the local deque is empty
                try:
                    newval = next(it)   # fetch a new value and
                except StopIteration:
                    return
                for d in deques:        # load it to all the deques
                    d.append(newval)
            yield mydeque.popleft()
    return tuple(gen(d) for d in deques)

一旦 tee() 实施了一次分裂，原有的 iterable 不应再被使用；否则 tee 对象无法得知 iterable 可能已向后迭代。

tee 迭代器不是线程安全的。当同时使用由同一个 tee() 调用所返回的迭代器时可能引发 RuntimeError，即使原本的 iterable 是线程安全的。

该迭代工具可能需要相当大的辅助存储空间（这取决于要保存多少临时数据）。通常，如果一个迭代器在另一个迭代器开始之前就要使用大部份或全部数据，使用 list() 会比 tee() 更快。

案例：

import itertools

t = itertools.tee(['a', 'b', 'c', 'd'], 3)

[*t]
'''
[<itertools._tee at 0x7f9d31802780>,
 <itertools._tee at 0x7f9d31800c00>,
 <itertools._tee at 0x7f9d31800400>]
'''

for i in t:
    print([*i])

'''
['a', 'b', 'c', 'd']
['a', 'b', 'c', 'd']
['a', 'b', 'c', 'd']
'''

最长拉链 itertools.zip_longest

语法为itertools.zip_longest(*iterables, fillvalue=None) 创建一个迭代器，从每个可迭代对象中收集元素。如果可迭代对象的长度未对齐，将根据 fillvalue 填充缺失值。迭代持续到耗光最长的可迭代对象。大致相当于：

def zip_longest(*args, fillvalue=None):
    # zip_longest('ABCD', 'xy', fillvalue='-') --> Ax By C- D-
    iterators = [iter(it) for it in args]
    num_active = len(iterators)
    if not num_active:
        return
    while True:
        values = []
        for i, it in enumerate(iterators):
            try:
                value = next(it)
            except StopIteration:
                num_active -= 1
                if not num_active:
                    return
                iterators[i] = repeat(fillvalue)
                value = fillvalue
            values.append(value)
        yield tuple(values)

如果其中一个可迭代对象有无限长度，zip_longest() 函数应封装在限制调用次数的场景中（例如 islice() 或 takewhile()）。除非指定， fillvalue 默认为 None 。

例如：

import itertools

t = itertools.zip_longest('abcd','xy', fillvalue=3)

[*t]
# [('a', 'x'), ('b', 'y'), ('c', 3), ('d', 3)]

要注意的是，fillvalue 传参时要写参数名。

排列组合迭代器

完全组合 itertools.combinations

语法为 itertools.combinations(iterable, r) 返回由输入 iterable 中元素组成长度为 r 的子序列。每个元素是一个元组，元组的长度是 r 的值。

每个元组的组成先由序列的顺序组合，组合时先变化最后的值，如：

import itertools
chains = itertools.combinations(range(5), 3)

for i in chains:
    print(i)
'''
(0, 1, 2)
(0, 1, 3)
(0, 1, 4)
(0, 2, 3)
(0, 2, 4)
(0, 3, 4)
(1, 2, 3)
(1, 2, 4)
(1, 3, 4)
(2, 3, 4)
'''

即使元素的值相同，不同位置的元素也被认为是不同的。

import itertools
chains = itertools.combinations([0, 0, 7], 2)

for i in chains:
    print(i)

'''
(0, 0)
(0, 7)
(0, 7)
'''

如果元素各自不同，那么每个组合中没有重复元素。

笛卡儿积 itertools.product

此方法的语法为itertools.product(*iterables, repeat=1) ，用于可迭代对象输入的笛卡儿积。

大致相当于生成器表达式中的嵌套循环。例如， product(A, B) 和 ((x,y) for x in A for y in B) 返回结果一样。

嵌套循环像里程表那样循环变动，每次迭代时将最右侧的元素向后迭代。这种模式形成了一种字典序，因此如果输入的可迭代对象是已排序的，笛卡尔积元组依次序发出。

要计算可迭代对象自身的笛卡尔积，将可选参数 repeat 设定为要重复的次数。例如，product(A, repeat=4) 和 product(A, A, A, A) 是一样的。

该函数大致相当于下面的代码，只不过实际实现方案不会在内存中创建中间结果。

def product(*args, repeat=1):
    # product('ABCD', 'xy') --> Ax Ay Bx By Cx Cy Dx Dy
    # product(range(2), repeat=3) --> 000 001 010 011 100 101 110 111
    pools = [tuple(pool) for pool in args] * repeat
    result = [[]]
    for pool in pools:
        result = [x+[y] for x in result for y in pool]
    for prod in result:
        yield tuple(prod)

在 product() 运行之前，它会完全耗尽输入的可迭代对象，在内存中保留值的临时池以生成结果积。相应地，它只适用于有限的输入。

例如：

import itertools

t = itertools.product('abcd','xy')
[*t]
'''
[('a', 'x'),
 ('a', 'y'),
 ('b', 'x'),
 ('b', 'y'),
 ('c', 'x'),
 ('c', 'y'),
 ('d', 'x'),
 ('d', 'y')]
'''

t2 = itertools.product('a','xy', repeat=2)
[*t2]
'''
[('a', 'x', 'a', 'x'),
 ('a', 'x', 'a', 'y'),
 ('a', 'y', 'a', 'x'),
 ('a', 'y', 'a', 'y')]
'''

排列 itertools.permutations

语法为 itertools.permutations(iterable, r=None)，连续返回由 iterable 元素生成长度为 r 的排列。如果 r 未指定或为 None ，r 默认设置为 iterable 的长度，这种情况下，生成所有全长排列。

排列元组会以字典顺序根据所输入 iterable 的顺序发出。因此，如果所输入 iterable 是已排序的，组合元组也将按已排序的顺序生成。

即使元素的值相同，不同位置的元素也被认为是不同的。如果元素值都不同，每个排列中的元素值不会重复。

大致相当于：

def permutations(iterable, r=None):
    # permutations('ABCD', 2) --> AB AC AD BA BC BD CA CB CD DA DB DC
    # permutations(range(3)) --> 012 021 102 120 201 210
    pool = tuple(iterable)
    n = len(pool)
    r = n if r is None else r
    if r > n:
        return
    indices = list(range(n))
    cycles = list(range(n, n-r, -1))
    yield tuple(pool[i] for i in indices[:r])
    while n:
        for i in reversed(range(r)):
            cycles[i] -= 1
            if cycles[i] == 0:
                indices[i:] = indices[i+1:] + indices[i:i+1]
                cycles[i] = n - i
            else:
                j = cycles[i]
                indices[i], indices[-j] = indices[-j], indices[i]
                yield tuple(pool[i] for i in indices[:r])
                break
        else:
            return

permutations() 的代码也可被改写为 product() 的子序列，只要将含有重复元素（来自输入中同一位置的）的项排除。

def permutations(iterable, r=None):
    pool = tuple(iterable)
    n = len(pool)
    r = n if r is None else r
    for indices in product(range(n), repeat=r):
        if len(set(indices)) == r:
            yield tuple(pool[i] for i in indices)

当 0 <= r <= n ，返回项个数为 n! / (n-r)! ；当 r > n ，返回项个数为0。

例如：

import itertools

it1 = itertools.permutations([1,2,3])
[*it1]
# [(1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 1, 2), (3, 2, 1)]

it2 = itertools.permutations([1,2,3], r=2)
[*it2]
# [(1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2)]

子组合 itertools.combinations

语法为 itertools.combinations(iterable, r)，返回由输入 iterable 中元素组成长度为 r 的子序列。

组合元组会以字典顺序根据所输入 iterable 的顺序发出。因此，如果所输入 iterable 是已排序的，组合元组也将按已排序的顺序生成。

即使元素的值相同，不同位置的元素也被认为是不同的。如果元素各自不同，那么每个组合中没有重复元素。

大致相当于：

def combinations(iterable, r):
    # combinations('ABCD', 2) --> AB AC AD BC BD CD
    # combinations(range(4), 3) --> 012 013 023 123
    pool = tuple(iterable)
    n = len(pool)
    if r > n:
        return
    indices = list(range(r))
    yield tuple(pool[i] for i in indices)
    while True:
        for i in reversed(range(r)):
            if indices[i] != i + n - r:
                break
        else:
            return
        indices[i] += 1
        for j in range(i+1, r):
            indices[j] = indices[j-1] + 1
        yield tuple(pool[i] for i in indices)

combinations() 的代码可被改写为 permutations() 过滤后的子序列，（相对于元素在输入中的位置）元素不是有序的。

def combinations(iterable, r):
    pool = tuple(iterable)
    n = len(pool)
    for indices in permutations(range(n), r):
        if sorted(indices) == list(indices):
            yield tuple(pool[i] for i in indices)

当 0 <= r <= n 时，返回项的个数是 n! / r! / (n-r)!；当 r > n 时，返回项个数为 0。

例如：

import itertools

it1 = itertools.combinations([1,2,3], r=2)
[*it1]
# [(1, 2), (1, 3), (2, 3)]

it2 = itertools.combinations([1,2,3], r=3)
[*it2]
# [(1, 2, 3)]

与 itertools.permutations 不同的是，itertools.combinations 不关注元素顺序，所以只关注组合。

替换组合 itertools.combinations_with_replacement

3.1 新版功能

语法为 itertools.combinations_with_replacement(iterable, r) 返回由输入 iterable 中元素组成的长度为 r 的子序列，允许每个元素可重复出现。

组合元组会以字典顺序根据所输入 iterable 的顺序发出。因此，如果所输入 iterable 是已排序的，组合元组也将按已排序的顺序生成。

不同位置的元素是不同的，即使它们的值相同。因此如果输入中的元素都是不同的话，返回的组合中元素也都会不同。

大致相当于：

def combinations_with_replacement(iterable, r):
    # combinations_with_replacement('ABC', 2) --> AA AB AC BB BC CC
    pool = tuple(iterable)
    n = len(pool)
    if not n and r:
        return
    indices = [0] * r
    yield tuple(pool[i] for i in indices)
    while True:
        for i in reversed(range(r)):
            if indices[i] != n - 1:
                break
        else:
            return
        indices[i:] = [indices[i] + 1] * (r - i)
        yield tuple(pool[i] for i in indices)
combinations_with_replacement() 的代码可被改写为 production() 过滤后的子序列，（相对于元素在输入中的位置）元素不是有序的。

def combinations_with_replacement(iterable, r):
    pool = tuple(iterable)
    n = len(pool)
    for indices in product(range(n), repeat=r):
        if sorted(indices) == list(indices):
            yield tuple(pool[i] for i in indices)

当 n > 0 时，返回项个数为 (n+r-1)! / r! / (n-1)!。

例如：

import itertools

it1 = itertools.combinations_with_replacement([1,2,3], r=2)
[*it1]
# [(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3)]

it2 = itertools.combinations_with_replacement([1,2,3], r=3)
[*it2]
'''
[(1, 1, 1),
 (1, 1, 2),
 (1, 1, 3),
 (1, 2, 2),
 (1, 2, 3),
 (1, 3, 3),
 (2, 2, 2),
 (2, 2, 3),
 (2, 3, 3),
 (3, 3, 3)]
'''

more-itertools 库

除了以上内置的功能，还可以使用 Python Package Index 上的 more-itertools 第三方，它提供更加丰富的迭代功能，安装：

pip install more-itertools

扩展的工具提供了与底层工具集相同的高性能。保持了超棒的内存利用率，因为一次只处理一个元素，而不是将整个可迭代对象加载到内存。代码量保持得很小，以函数式风格将这些工具连接在一起，有助于消除临时变量。速度依然很快，因为倾向于使用“矢量化”构件来取代解释器开销大的 for 循环和 generator 。

参考：

https://more-itertools.readthedocs.io/en/stable/api.html
https://pypi.org/project/more-itertools/
https://github.com/more-itertools/more-itertools

参考

https://docs.python.org/zh-cn/3/library/itertools.html

Python 标准库 itertools 为高效循环而创建迭代器的函数

无穷迭代器

点数 itertools.count

循环 itertools.cycle

重复 itertools.repeat

有限迭代器

逐积 itertools.accumulate

批处理 itertools.batched

链条 itertools.chain

压缩 itertools.compress

丢弃 itertools.dropwhile

筛出假值 itertools.filterfalse

邻近分组 itertools.groupby

索引切片 itertools.islice

连续重叠对 itertools.pairwise

成对映射 itertools.starmap

itertools.takewhile

生成多迭代器 itertools.tee

最长拉链 itertools.zip_longest

排列组合迭代器

完全组合 itertools.combinations

笛卡儿积 itertools.product

排列 itertools.permutations

子组合 itertools.combinations

替换组合 itertools.combinations_with_replacement

more-itertools 库

参考

相关内容