Working with Word Documents in Python: Creating and Editing Word Files

With the python-docx library (imported as docx), you can perform a variety of tasks:

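  • Create new documents: generate a new Word document from scratch or from a template. This is useful for automatically producing reports, letters, and other kinds of documents.
  • Modify existing documents: open an existing Word document and change its content, formatting, styles, and so on. This is especially convenient for automatically updating documents that follow a fixed structure.
  • Add content: add paragraphs, headings, tables, images, and other elements to a document, which makes it possible to fill a document with data dynamically.
  • Apply formatting: apply formatting options to text and other elements, such as font, color, and alignment.
  • Extract information: pull text, images, tables, and other content out of an existing Word document for further analysis.
  • Docx functions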

    1. Document Creation and Saving

  • Document(): creates a new Word document
  • document.save('filename.docx'): saves the document object to a file (*.docx)
  • 2. Paragraphs and Text

  • add_paragraph('text'): adds a new paragraph containing the given text.
  • paragraph.text: gets or sets the text content of a paragraph.
  • 3. Headings

  • add_heading('text', level=n): adds a heading with the given text and level (0 to 9; level 0 uses the Title style, levels 1 to 9 use Heading 1 to Heading 9).
  • 4. Styles and Formatting

  • paragraph.style = 'StyleName': applies a specific paragraph style
  • run = paragraph.add_run('text'): appends a run, a span of text that can carry its own formatting
  • run.bold, run.italic, etc.: apply character formatting to a run
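  • 5. Tables

  • add_table(rows, cols): adds a table with the given number of rows and columns
  • table.cell(row, col): returns a specific cell of the table (indices are zero-based)
  • cell.text: gets or sets the text content of a cell
  • table.rows, table.columns: access the rows and columns of the table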
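  • 6. Images

  • document.add_picture('image_path'): adds an image to the document in a new paragraph
  • run.add_picture('image_path'): adds an inline image to a specific run, which helps when the image must sit at a fixed place in the text, such as the photo on a resume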
  • 7. Document Properties

  • document.core_properties.title: gets or sets the document title
  • document.core_properties.author: gets or sets the document author
  • document.core_properties.keywords: gets or sets the document keywords
  • 8. Sections and Page Setup

  • section = document.sections[0]: gets the first section of the document
  • section.page_width, section.page_height: get or set the page dimensions
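  • 9. Lists

    These correspond to lists in Markdown: the two items below form an unordered list, while the numbered headings 1, 2, 3… form an ordered list.

  • add_paragraph('text', style='List Bullet'): adds one item of a bulleted (unordered) list
  • add_paragraph('text', style='List Number'): adds one item of a numbered (ordered) list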
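  • 10. Hyperlinks

  • python-docx does not provide a built-in add_hyperlink() method on runs or paragraphs. A hyperlink has to be assembled from low-level XML elements, as the add_hyperlink helper function in the document-modification example later in this article does.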
  • 11. Document Modification

  • document.paragraphs: accesses all top-level paragraphs in the document
  • document.tables: accesses all top-level tables in the document
  • document.styles: accesses and manipulates the document's styles
  • 12. Document Reading

  • Document('filename.docx'): opens an existing Word file
  • document.paragraphs[0].text: returns the text of the first paragraph

  • Short Examples

    1. Installation

    pip install python-docx
    

    2. Creating a New Word Document

    This example creates a document containing text, headings, tables, an image, and formatting:

    1. Create a new document object.
    2. Add a title with centered alignment.
    3. Add a paragraph with bold and italic text.
    4. Add a heading and a bulleted list.
    5. Add tables with custom column widths.
    6. Add an image to the document.
    7. Save the document as 'example_new_document.docx'.

    from docx import Document
    from docx.shared import Pt
    from docx.enum.text import WD_ALIGN_PARAGRAPH
    
    # Create a new document
    doc = Document()
    
    # Add a title
    title = doc.add_heading('Document Creation Example', level=1)
    title.alignment = WD_ALIGN_PARAGRAPH.CENTER
    
    # Add a paragraph with bold and italic text
    paragraph = doc.add_paragraph('This is a sample document created using the python-docx library.')
    run = paragraph.runs[0]
    run.bold = True
    run.italic = True
    
    # Add a heading
    doc.add_heading('Section 1: Introduction', level=2)
    
    # Add a bulleted list (one 'List Bullet' paragraph per item)
    for label, description in [('Bullet 1', 'This is the first bullet point.'),
                               ('Bullet 2', 'This is the second bullet point.')]:
        list_paragraph = doc.add_paragraph(style='List Bullet')
        list_paragraph.add_run(label).bold = True
        list_paragraph.add_run(' - ' + description)
    
    # Add a table
    doc.add_heading('Section 2: Data', level=2)
    table_1 = doc.add_table(rows=1, cols=2)
    table_1.style = 'Table Grid'
    table_1.autofit = False  # keep the fixed cell widths set below
    for row in table_1.rows:
        for cell in row.cells:
            cell.width = Pt(150)
    table_1.cell(0, 0).text = 'cat'
    table_1.cell(0, 1).text = 'dog'
    
    # Second table: one header row plus three data rows
    table_2 = doc.add_table(rows=4, cols=3)
    table_2.style = 'Table Grid'
    table_2.autofit = False
    for row in table_2.rows:
        for cell in row.cells:
            cell.width = Pt(100)
    table_2.cell(0, 0).text = 'Name'
    table_2.cell(0, 1).text = 'Age'
    table_2.cell(0, 2).text = 'City'
    # Start at row 1 so the data does not overwrite the header row
    for i, data in enumerate([('Alice', '25', 'New York'), ('Bob', '30', 'San Francisco'), ('Charlie', '22', 'Los Angeles')], start=1):
        table_2.cell(i, 0).text = data[0]
        table_2.cell(i, 1).text = data[1]
        table_2.cell(i, 2).text = data[2]
    
    # Add an image
    doc.add_heading('Section 3: Image', level=2)
    doc.add_paragraph('Here is an image of cat:')
    doc.add_picture('../imgs/cat.jpg', width=Pt(300))
    
    # Save the document
    doc.save('../word_files/example_new_document.docx')
    

    Result (haha, the styling is a bit ugly; ignore that for now…):

    3. Modifying an Existing Word Document

    1. Open an existing Word document ('example_new_document.docx', created in the previous example).
    2. Modify the text, formatting, and alignment of the first paragraph.
    3. Add a new heading.
    4. Add a new paragraph with a hyperlink.
    5. Add a new table with custom column widths and data.
    6. Save the modified document as 'example_modified_document.docx'.

    from docx import Document
    from docx.enum.text import WD_ALIGN_PARAGRAPH
    from docx.opc.constants import RELATIONSHIP_TYPE
    from docx.oxml import OxmlElement
    from docx.oxml.ns import qn
    from docx.shared import Pt
    
    def add_hyperlink(paragraph, url, text, color, underline):
        """
        Place a hyperlink within a paragraph object.
    
        :param paragraph: The paragraph we are adding the hyperlink to.
        :param url: A string containing the required url
        :param text: The text displayed for the url
        :param color: Hex RGB color string for the link text (e.g. 'FF8822'), or None
        :param underline: If False, underlining is explicitly switched off;
            if True, no underline setting is written
        :return: The hyperlink element
        """
    
        # This gets access to the document.xml.rels file and gets a new relation id value
        part = paragraph.part
        r_id = part.relate_to(url, RELATIONSHIP_TYPE.HYPERLINK, is_external=True)
    
        # Create the w:hyperlink tag and add needed values
        hyperlink = OxmlElement('w:hyperlink')
        hyperlink.set(qn('r:id'), r_id)
    
        # Create a w:r element
        new_run = OxmlElement('w:r')
    
        # Create a new w:rPr element
        rPr = OxmlElement('w:rPr')
    
        # Add color if it is given
        if color is not None:
            c = OxmlElement('w:color')
            c.set(qn('w:val'), color)
            rPr.append(c)
    
        # Remove underlining if it is requested
        if not underline:
            u = OxmlElement('w:u')
            u.set(qn('w:val'), 'none')
            rPr.append(u)
    
        # Join all the xml elements together and add the required text to the w:r element
        new_run.append(rPr)
        new_run.text = text
        hyperlink.append(new_run)
    
        paragraph._p.append(hyperlink)
    
        return hyperlink
    
    # Open an existing document
    doc = Document('../word_files/example_new_document.docx')
    
    # Access the first paragraph and modify its text and formatting
    first_paragraph = doc.paragraphs[0]
    first_paragraph.text = 'Updated Text: 宫廷玉液酒,一百八一杯。'
    run = first_paragraph.runs[0]
    run.bold = True  # bold
    run.italic = True  # italic
    run.font.size = Pt(20)  # font size
    first_paragraph.alignment = WD_ALIGN_PARAGRAPH.CENTER  # center alignment
    
    # Add a new heading
    doc.add_heading('New Section', level=1)
    
    # Add a new paragraph with a hyperlink
    new_paragraph = doc.add_paragraph('Visit my blog website: ')
    hyperlink = add_hyperlink(new_paragraph,
                  'https://blog.csdn.net/weixin_40959890/article/details/137598605?spm=1001.2014.3001.5501',
                  'Python docx:在Python中创建和操作Word文档',
                  'FF8822', True)
    
    # Add a new table: one header row plus three data rows, three columns
    doc.add_heading('Table Section', level=2)
    table = doc.add_table(rows=4, cols=3)
    table.style = 'Table Grid'
    table.autofit = False  # keep the fixed cell widths set below
    for row in table.rows:
        for cell in row.cells:
            cell.width = Pt(100)
    table.cell(0, 0).text = 'Name'
    table.cell(0, 1).text = 'Age'
    table.cell(0, 2).text = 'City'
    for i, data in enumerate([('David', '128', 'London'), ('Emma', '135', 'New York'), ('John', '122', 'Los Angeles')], start=1):
        table.cell(i, 0).text = data[0]
        table.cell(i, 1).text = data[1]
        table.cell(i, 2).text = data[2]
    
    # Save the modified document
    doc.save('../word_files/example_modified_document.docx')
    
    Here is the result (still ugly, haha, but the modification worked):


    Author: 桂花很香,旭很美
