gh-146270: Fix PyMember_SetOne(..., NULL) not being atomic #148800
dpdani wants to merge 5 commits into python:main

Conversation
```c
// Other cases are already covered by the above:
// oldv == NULL && v != NULL: pseudo-non-existing attribute is set, ok
// oldv != NULL && v == NULL: existing attribute is deleted, ok
// oldv != NULL && v != NULL: existing attribute is set, ok
```
Drop these comments. They're not sufficiently helpful to a reader.
```c
if (v == NULL && oldv == NULL && l->type == Py_T_OBJECT_EX) {
    // Pseudo-non-existing attribute is deleted: raise AttributeError.
    // The attribute doesn't exist to Python, but CPython knows that it
    // could have existed because it was declared in __slots__.
    // _Py_T_OBJECT does not raise an exception here, and
    // PyMember_GetOne will return Py_None instead of NULL.
    PyErr_SetString(PyExc_AttributeError, l->name);
    return -1;
}
```
- Keep the comments simple, i.e., "deleting an already deleted attribute raises an exception"
- Move the decref after the check. Reading a pointer that has been freed, even if it's just a NULL check, is UB
- The `l->type` is redundant

```c
if (v == NULL && oldv == NULL && l->type == Py_T_OBJECT_EX) {
    // Raise an exception when attempting to delete an already deleted attribute
    PyErr_SetString(PyExc_AttributeError, l->name);
    return -1;
}
Py_XDECREF(oldv);
```
I didn't understand what you meant with "The l->type is redundant."
I have expanded a comment, hopefully it clarifies why that check is needed?
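For context, the `Py_T_OBJECT_EX` semantics under discussion are observable from pure Python with a `__slots__` class: deleting a slot that was never set, or was already deleted, raises `AttributeError`. A minimal illustration (not part of the PR):

```python
class Spam:
    __slots__ = ("foo",)

s = Spam()
try:
    del s.foo          # never set: the slot delete raises AttributeError
except AttributeError as e:
    print("unset slot raised:", type(e).__name__)

s.foo = 1
del s.foo              # first delete succeeds
try:
    del s.foo          # second delete raises again
except AttributeError as e:
    print("double delete raised:", type(e).__name__)
```

The fix has to preserve exactly this behavior while making the check-then-delete sequence atomic under free-threading.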
```python
run_in_threads([writer, reader, reader, reader])

def test_del_object_is_atomic(self):
```
This test is too slow. It's not worth trying to catch every sort of non-sequential consistency. If you catch the data race under TSan reasonably often, that's fine:
```python
from test.support.threading_helper import run_concurrently
...
class Spam:
    __slots__ = ["foo"]

def deleter(spam, successes):
    try:
        del spam.foo
        successes.append(True)
    except AttributeError:
        successes.append(False)

for _ in range(10):
    spam = Spam()
    spam.foo = 0
    successes = []
    run_concurrently(deleter, nthreads=4, args=(spam, successes))
    self.assertEqual(sum(successes), 1)
```
When you say it's too slow, what upper bound do you have in mind?
Btw, with your test, I cannot get it to fail when reverting structmember.c, nor did I get TSan warnings in 5 out of 5 attempts.
I think the reason why that's so is that just starting threads doesn't generate enough contention: they're not actually going to hit the current *(PyObject **)addr == NULL at line 177 concurrently with the FT_ATOMIC_STORE_PTR_RELEASE at line 335. This is why I added SpinningBarrier, to generate more contention and have them hit these lines concurrently, in a reliable way.
I can also just reduce the number of iterations in my test. Even with just iters = 10, TSan reported a race 10 out of 10 times. I kinda put 1000 to begin with and then forgot about it.
With iters = 10, the whole test suite runs in about 35ms on my MacBook, albeit the test doesn't fail reliably with TSan turned off, only failing ~33% of the times.
You should aim for <100ms. I get a TSan error on nearly every run
With the current test on 10 iterations, the whole suite runs in <50ms, so I think that should be fine.
It's curious that you're getting more TSan errors, on my macbook I'm seeing 1 TSan warning every ~20 runs. Can I ask you what's your ./configure? Mine is ./configure --config-cache --disable-gil --with-pydebug --with-thread-sanitizer.
Btw, if I use your test code and switch threading.Barrier -> _testcapi.SpinningBarrier inside run_concurrently, I saw a TSan warning on 100 out of 100 runs.
To be clear, I'm not suggesting to use SpinningBarrier in run_concurrently, I'm saying that it definitely gives more determinism to the results, and I think it's worth keeping it.
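For readers unfamiliar with the idea: a spinning barrier busy-waits instead of blocking in the OS, so released threads resume with much lower latency and reach the contended code closer together in time. A rough pure-Python sketch of the concept (a hypothetical `SpinBarrier`, not the actual C implementation added to `_testcapi`):

```python
import threading

class SpinBarrier:
    """Busy-waiting barrier sketch: waiters spin instead of sleeping in the kernel."""

    def __init__(self, parties):
        self._parties = parties
        self._arrived = 0
        self._generation = 0
        self._lock = threading.Lock()

    def wait(self):
        with self._lock:
            gen = self._generation
            self._arrived += 1
            if self._arrived == self._parties:
                # Last arrival releases everyone by starting a new generation.
                self._arrived = 0
                self._generation += 1
                return
        # Spin until the last thread flips the generation.
        while self._generation == gen:
            pass
```

Each thread calls `wait()` immediately before the contended operation, so all of them attempt it at (nearly) the same instant, which is what makes the race reproduce reliably.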
```c
    Py_RETURN_NONE;
}

/**
```
I don't think it's worth adding this
Without SpinningBarrier the bug becomes significantly harder to reproduce. Also see my comment above.
It's still not worth it. I'm more than happy to accept reduced chances of catching one specific regression per test run in exchange for avoiding this.
Thanks for fixing this! I left some comments above. I think it's worth prioritizing keeping the tests fast. In general, it helps to keep them small and understandable too.
This PR fixes a sequential consistency bug (introduced by yours truly) whereby two threads deleting the same struct member may both observe their deletion to be successful.
In order to test this properly, I couldn't use `threading.Barrier` because its overhead was enough to mask the bug, making the test flaky. Therefore, a spinning-loop barrier was added in the `_testcapi` module.

Linked issue: slot.__delete__() behaves as non-atomic in free-threading #146270
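The invariant the fix restores can be stated from Python: of N threads concurrently deleting the same slot, exactly one deletion succeeds and the rest get `AttributeError`. A minimal sketch of that check (using `threading.Barrier` here purely for illustration; as noted, its overhead can mask the actual race on a buggy build):

```python
import threading

class Spam:
    __slots__ = ("foo",)

def check_single_deletion(nthreads=4):
    """Return how many of nthreads concurrent deletions succeeded (should be 1)."""
    spam = Spam()
    spam.foo = 0
    start = threading.Barrier(nthreads)
    results = []
    results_lock = threading.Lock()

    def deleter():
        start.wait()  # line up all threads before deleting
        try:
            del spam.foo
            ok = True
        except AttributeError:
            ok = False
        with results_lock:
            results.append(ok)

    threads = [threading.Thread(target=deleter) for _ in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)
```

On a correct interpreter `check_single_deletion()` returns 1 every time; the bug being fixed could let it return more than 1 under free-threading.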