How to align mixed Chinese and English strings in Python

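Python provides str.ljust, str.rjust, and str.center for left-aligning, right-aligning, and centering strings. For example, 'hello'.ljust(10, '*') returns 'hello*****', 'hello'.rjust(10, '*') returns '*****hello', and so on. Python counts each Chinese, Japanese, or Korean (CJK) character as one character, yet it is displayed two columns wide. Because of this mismatch, ljust, rjust, and center cannot align CJK text correctly: for example, '你好'.ljust(5, '*') returns '你好***' rather than '你好*'. See also this article.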
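
The mismatch is easy to reproduce. The short sketch below pads an ASCII string and a CJK string to the same character count with plain str.ljust; the variable names are only illustrative. Both results have 7 characters, but in a monospaced terminal the second one is two columns wider, so the trailing bars do not line up.

```python
# str.ljust pads by character count, not by display width.
ascii_row = 'hello'.ljust(7, '*')   # 5 characters + 2 fill characters
cjk_row = '你好'.ljust(7, '*')       # 2 characters + 5 fill characters

# Same length in characters, different width on screen.
print(ascii_row + '|')
print(cjk_row + '|')
print(len(ascii_row), len(cjk_row))
```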

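To show how to solve this, suppose we want to align a string s to a display width of $w$, taking ljust (doc) as the example (the other two work the same way) and assuming fillchar='*'. Clearly we need to append $w-l$ copies of '*' to the right of s, where $l$ is the display width of s (assuming $w \ge l$; otherwise nothing is added). For ljust to add $w-l$ copies of '*', its first argument must be $n+w-l$, where $n$ is the number of characters in s. A simple rearrangement gives $n+w-l = w-(l-n)$. Suppose s contains $a$ characters of display width 1 and $b$ characters of display width 2; then $l=a+2b$ and $n=a+b$, so $l-n=b$, that is, $n+w-l=w-b$. If the only width-2 characters in s are CJK characters, then $b$ is simply the number of CJK characters. The following Python function counts the CJK characters in a string: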

import unicodedata

def count_cjk_chars(string):
    """Count characters whose East Asian Width is F (fullwidth) or W (wide)."""
    return sum(unicodedata.east_asian_width(c) in ('F', 'W') for c in string)
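
As a sanity check on the derivation, the sketch below computes $n$, $l$, and $b$ for a sample string and confirms that padding to $w-b$ characters yields a display width of exactly $w$. The helper display_width is a hypothetical name introduced only for this check. It assumes that every character is either one or two columns wide, so it ignores combining marks and control characters.

```python
import unicodedata

def display_width(string):
    # Hypothetical helper: fullwidth (F) and wide (W) characters count as 2
    # columns, every other character as 1.
    return sum(2 if unicodedata.east_asian_width(c) in ('F', 'W') else 1
               for c in string)

s = '你好world'
w = 10                    # target display width
n = len(s)                # number of characters
l = display_width(s)      # display width
b = l - n                 # number of width-2 characters

padded = s.ljust(w - b, '*')
print(n, l, b, padded, display_width(padded))
```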

From this it is straightforward to write alignment functions that work for strings that may contain CJK characters:

def cjkljust(string, width, fillbyte=' '):
    """
    Left-justify string within the given display width.
    
    >>> cjkljust('hello', 10, '*')
    'hello*****'
    >>> cjkljust('你好world', 10, '*')
    '你好world*'
    >>> cjkljust('你好world', 1, '*')
    '你好world'
    """
    return string.ljust(width - count_cjk_chars(string), fillbyte)


def cjkrjust(string, width, fillbyte=' '):
    """
    Right-justify string within the given display width.
    """
    return string.rjust(width - count_cjk_chars(string), fillbyte)


def cjkcenter(string, width, fillbyte=' '):
    """
    Center string within the given display width.
    """
    return string.center(width - count_cjk_chars(string), fillbyte)
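
One possible use is lining up the columns of a small text table. The sketch below repeats compact versions of the functions above so that it runs on its own. The sample rows and the column widths 8 and 6 are made up for illustration.

```python
import unicodedata

def count_cjk_chars(string):
    return sum(unicodedata.east_asian_width(c) in ('F', 'W') for c in string)

def cjkljust(string, width, fillbyte=' '):
    return string.ljust(width - count_cjk_chars(string), fillbyte)

def cjkrjust(string, width, fillbyte=' '):
    return string.rjust(width - count_cjk_chars(string), fillbyte)

# Illustrative data: a name column (8 columns wide, left-aligned)
# and a quantity column (6 columns wide, right-aligned).
rows = [('名称', '数量'), ('apple', '3'), ('苹果', '12')]
lines = [cjkljust(name, 8, '.') + cjkrjust(qty, 6, '.') for name, qty in rows]

for line in lines:
    print(line)
```

Every printed line should occupy 14 columns in a monospaced terminal whose font renders CJK characters at exactly twice the width of ASCII characters.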

See my Gist for the complete code.

It is also available for download from PyPI.