Commit 043ed2d

fix(pptx): handle NotImplementedError from shape.shape_type (#3309)
* fix(pptx): handle NotImplementedError from shape.shape_type

python-pptx raises NotImplementedError from Shape.shape_type for <p:sp> elements that aren't placeholders, autoshapes, textboxes, or freeforms (e.g. shapes with empty <p:spPr> from Google Slides exports, LibreOffice, or Keynote). handle_groups() and handle_shapes() access shape_type without catching this, crashing the entire conversion.

Add a _safe_shape_type() helper that returns None on NotImplementedError, so unrecognized shapes skip only the GROUP recursion and PICTURE extraction while text and table extraction proceed normally.

Fixes #3308

Signed-off-by: Tejas Patel <tejas226@hotmail.com>

* Fix lint

Signed-off-by: Tejas Patel <tejas226@hotmail.com>

---------

Signed-off-by: Tejas Patel <tejas226@hotmail.com>
1 parent 8ec14f2 commit 043ed2d
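
The failure mode described in the commit message can be reproduced without python-pptx. The sketch below is illustrative only: `ShapeType` and `UnrecognizedShape` are hypothetical stand-ins for `MSO_SHAPE_TYPE` and a python-pptx shape, and `safe_shape_type` mirrors the helper from the commit minus its logging.

```python
from enum import Enum


class ShapeType(Enum):
    """Hypothetical stand-in for pptx.enum.shapes.MSO_SHAPE_TYPE."""

    GROUP = "group"
    PICTURE = "picture"


class UnrecognizedShape:
    """Hypothetical stand-in for a shape whose type lookup is unsupported."""

    name = "unrecognized-1"  # made-up name, for illustration only

    @property
    def shape_type(self):
        # Mimics the behaviour the commit message attributes to python-pptx.
        raise NotImplementedError("unrecognized shape type")


def safe_shape_type(shape):
    """Return shape.shape_type, or None when the lookup is not implemented."""
    try:
        return shape.shape_type
    except NotImplementedError:
        return None


shape = UnrecognizedShape()

# Direct access propagates the exception out of the comparison.
try:
    shape.shape_type == ShapeType.PICTURE
    direct_access_failed = False
except NotImplementedError:
    direct_access_failed = True

# The guarded lookup yields None, which is unequal to every enum member,
# so both type checks simply evaluate to False.
is_picture = safe_shape_type(shape) == ShapeType.PICTURE
is_group = safe_shape_type(shape) == ShapeType.GROUP
```

Returning None rather than a sentinel enum member keeps the call sites unchanged: the existing `==` comparisons against `MSO_SHAPE_TYPE.PICTURE` and `MSO_SHAPE_TYPE.GROUP` fall through without extra branches.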

6 files changed: +575 -2 lines changed

docling/backend/mspowerpoint_backend.py

Lines changed: 15 additions & 2 deletions
@@ -688,12 +688,25 @@ def _walk_linear(
             slide_size = Size(width=slide_width, height=slide_height)
             doc.add_page(page_no=slide_ind + 1, size=slide_size)
 
+            def _safe_shape_type(shape):
+                """Return shape.shape_type, or None if unrecognized.
+
+                python-pptx raises NotImplementedError for <p:sp> elements
+                that don't match any known shape category (placeholder,
+                freeform, autoshape, textbox).
+                """
+                try:
+                    return shape.shape_type
+                except NotImplementedError:
+                    _log.debug("Skipping shape with unrecognized type: %s", shape.name)
+                    return None
+
             def handle_shapes(shape, parent_slide, slide_ind, doc, slide_size):
                 handle_groups(shape, parent_slide, slide_ind, doc, slide_size)
                 if shape.has_table:
                     # Handle Tables
                     self._handle_tables(shape, parent_slide, slide_ind, doc, slide_size)
-                if shape.shape_type == MSO_SHAPE_TYPE.PICTURE:
+                if _safe_shape_type(shape) == MSO_SHAPE_TYPE.PICTURE:
                     # Handle Pictures
                     self._handle_pictures(
                         shape, parent_slide, slide_ind, doc, slide_size
@@ -716,7 +729,7 @@ def handle_shapes(shape, parent_slide, slide_ind, doc, slide_size):
                 return
 
             def handle_groups(shape, parent_slide, slide_ind, doc, slide_size):
-                if shape.shape_type == MSO_SHAPE_TYPE.GROUP:
+                if _safe_shape_type(shape) == MSO_SHAPE_TYPE.GROUP:
                     for groupedshape in shape.shapes:
                         handle_shapes(
                             groupedshape, parent_slide, slide_ind, doc, slide_size
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+item-0 at level 0: unspecified: group _root_
+  item-1 at level 1: chapter: group slide-0
+    item-2 at level 2: title: Q3 Revenue Summary
+    item-3 at level 2: list: group list
+      item-4 at level 3: list_item: Total revenue grew 18% year-over-year.
+      item-5 at level 3: list_item: Enterprise segment led growth at 24%.
+      item-6 at level 3: list_item: APAC region exceeded targets by 12%.
+  item-7 at level 1: chapter: group slide-1
+    item-8 at level 2: paragraph: Key Metrics
+    item-9 at level 2: paragraph: Monthly active users: 4.2M
+    item-10 at level 2: paragraph: Net retention rate: 112%
+    item-11 at level 2: paragraph: Text in an unrecognized shape
+  item-12 at level 1: chapter: group slide-2
+    item-13 at level 2: title: Next Steps
+    item-14 at level 2: list: group list
+      item-15 at level 3: list_item: 1. Finalize pricing model
+      item-16 at level 3: list_item: 2. Launch APAC campaign
+      item-17 at level 3: list_item: 3. Hire 3 additional SEs
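
The expected tree above keeps "Text in an unrecognized shape" as an ordinary paragraph (item-11). A rough sketch of why the guarded type check allows that is shown below. Everything in it is an assumption made for illustration: `FakeShape`, `ShapeType`, and `collect_text` are hypothetical, the slide layout is invented rather than taken from the test fixture, and the real backend does considerably more than collect strings.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class ShapeType(Enum):
    """Hypothetical stand-in for MSO_SHAPE_TYPE."""

    GROUP = "group"
    TEXT_BOX = "text_box"


@dataclass
class FakeShape:
    """Hypothetical stand-in for a python-pptx shape."""

    text: str = ""
    kind: Optional[ShapeType] = None  # None: the type lookup raises
    shapes: List["FakeShape"] = field(default_factory=list)

    @property
    def shape_type(self):
        if self.kind is None:
            raise NotImplementedError("unrecognized shape type")
        return self.kind


def safe_shape_type(shape):
    """Return shape.shape_type, or None when the lookup is not implemented."""
    try:
        return shape.shape_type
    except NotImplementedError:
        return None


def collect_text(shape, out):
    """Recurse into groups first, then record the shape's own text."""
    if safe_shape_type(shape) == ShapeType.GROUP:
        for child in shape.shapes:
            collect_text(child, out)
    # Text handling does not depend on the type lookup, so it still runs
    # for shapes whose type could not be determined.
    if shape.text:
        out.append(shape.text)
    return out


# Invented slide layout: a text box, a group, and an unrecognized shape.
slide = [
    FakeShape(text="Key Metrics", kind=ShapeType.TEXT_BOX),
    FakeShape(
        kind=ShapeType.GROUP,
        shapes=[FakeShape(text="Monthly active users: 4.2M", kind=ShapeType.TEXT_BOX)],
    ),
    FakeShape(text="Text in an unrecognized shape"),
]

collected = []
for s in slide:
    collect_text(s, collected)
```

Only the group recursion is gated on the type check, so an unrecognized shape loses nothing but the branches that need a known type.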
