豆豆友情提示:这是一个非官方 GitHub 代理镜像,主要用于网络测试或访问加速。请勿在此进行登录、注册或处理任何敏感信息。进行这些操作请务必访问官方网站 github.com。 Raw 内容也通过此代理提供。
Skip to content

gh-146270: Fix PyMember_SetOne(..., NULL) not being atomic#148800

Open
dpdani wants to merge 5 commits intopython:mainfrom
dpdani:gh/146270-fix-slot-del-atomic
Open

gh-146270: Fix PyMember_SetOne(..., NULL) not being atomic#148800
dpdani wants to merge 5 commits intopython:mainfrom
dpdani:gh/146270-fix-slot-del-atomic

Conversation

@dpdani
Copy link
Copy Markdown
Contributor

@dpdani dpdani commented Apr 20, 2026

This PR fixes a sequential consistency bug (introduced by yours truly) whereby two threads that are deleting a struct member may observe both their deletions to be successful.

In order to test this properly, I couldn't use threading.Barrier because its overhead was enough to mask the bug, making the test flaky. Therefore, a spinning-loop barrier was added in the _testcapi module.

@diegorusso diegorusso requested a review from colesbury April 20, 2026 16:03
Comment thread Python/structmember.c Outdated
Comment on lines +338 to +341
// Other cases are already covered by the above:
// oldv == NULL && v != NULL: pseudo-non-existing attribute is set, ok
// oldv != NULL && v == NULL: existing attribute is deleted, ok
// oldv != NULL && v != NULL: existing attribute is set, ok
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drop these comments. They're not sufficiently helpful to a reader.

Comment thread Python/structmember.c
Comment on lines +329 to +337
if (v == NULL && oldv == NULL && l->type == Py_T_OBJECT_EX) {
// Pseudo-non-existing attribute is deleted: raise AttributeError.
// The attribute doesn't exist to Python, but CPython knows that it
// could have existed because it was declared in __slots__.
// _Py_T_OBJECT does not raise an exception here, and
// PyMember_GetOne will return Py_None instead of NULL.
PyErr_SetString(PyExc_AttributeError, l->name);
return -1;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Keep the comments simple, i.e., "deleting an already deleted attribute raises an exception
  2. Move the decref after the check. Reading a pointer that has been freed, even if it's just a NULL check, is UB
  3. The l->type is redundant
if (v == NULL && oldv == NULL && l->type == Py_T_OBJECT_EX) {
    // Raise an exception when attempting to delete an already deleted attribute
    PyErr_SetString(PyExc_AttributeError, l->name);
    return -1;
}
Py_XDECREF(oldv);

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't understand what you meant with "The l->type is redundant."

I have expanded a comment, hopefully it clarifies why that check is needed?


run_in_threads([writer, reader, reader, reader])

def test_del_object_is_atomic(self):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is too slow. It's not worth trying to catch every sort non-sequential consistency. If you catch the data race under TSan reasonably often, that's fine:

from test.support.threading_helper import run_concurrently

...

        class Spam:
            __slots__ = [ "foo" ]

        def deleter(spam, successes):
            try:
                del spam.foo
                successes.append(True)
            except AttributeError:
                successes.append(False)

        for _ in range(10):
            spam = Spam()
            spam.foo = 0
            successes = []
            run_concurrently(deleter, nthreads=4, args=(spam, successes))
            self.assertEqual(sum(successes), 1)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you say it's too slow, what upper bound do you have in mind?

Btw, with your test, I cannot get it to fail when reverting structmember.c, nor did I get TSan warnings in 5 out of 5 attempts.

I think the reason why that's so is that just starting threads doesn't generate enough contention: they're not actually going to hit the current *(PyObject **)addr == NULL at line 177 concurrently with the FT_ATOMIC_STORE_PTR_RELEASE at line 335. This is why I added SpinningBarrier, to generate more contention and have them hit these lines concurrently, in a reliable way.

I can also just reduce the number of iterations in my test. Even with just iters = 10, TSan reported a race 10 out of 10 times. I kinda put 1000 to begin with and then forgot about it.

With iters = 10, the whole test suite runs in about 35ms on my MacBook, albeit the test doesn't fail reliably with TSan turned off, only failing ~33% of the times.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should aim for <100ms. I get a TSan error on nearly every run

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the current test on 10 iterations, the whole suite runs in <50ms, so I think that should be fine.

It's curious that you're getting more TSan errors, on my macbook I'm seeing 1 TSan warning every ~20 runs. Can I ask you what's your ./configure? Mine is ./configure --config-cache --disable-gil --with-pydebug --with-thread-sanitizer.

Btw, if I use your test code and switch threading.Barrier -> _testcapi.SpinningBarrier inside run_concurrently, I saw a TSan warning on 100 out of 100 runs.

To be clear, I'm not suggesting to use SpinningBarrier in run_concurrently, I'm saying that it definitely gives more determinism to the results, and I think it's worth keeping it.

Comment thread Modules/_testcapimodule.c
Py_RETURN_NONE;
}

/**
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's worth adding this

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without SpinningBarrier the bug becomes significantly harder to reproduce. Also see my comment above.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's still not worth it. I'm more than happy to accept reduced chances of catching one specific regression per test run in exchange for avoiding this.

@colesbury
Copy link
Copy Markdown
Contributor

Thanks for fixing this! I left some comments above.

I think it's worth prioritizing keeping the tests fast. In general, it helps to keep them small and understandable too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants