Vulnerability to pass-pass attack #699
Thanks for the concern! While I have no doubt that all current Go bots have plenty of blind spots and are highly exploitable, the popular reporting and communication around the paper you mention is highly misleading - the paper as it stands doesn't currently achieve this, and you may have been confused by its misleading marketing/reporting. KataGo with typical default settings is not particularly vulnerable to this attack.

A major detail you might have overlooked is that the attack only applies to KataGo's raw policy or very small amounts of search. Nobody expects the raw policy or low-playout play to be robust anyway - even in completely ordinary usage in normal positions, more experienced Go players have for years been cautioning newer players that for various reasons AI analysis can be untrustworthy or unhelpful, especially at low numbers of playouts. If you've ever looked at the probabilities in the raw policy, even in common tactical situations, you quickly realize how much probability mass it can sometimes put on pretty huge blunders, even in normal situations (i.e. ones having to do with real gameplay, rather than details about ending the game under certain rules). That mass is usually smaller than the mass on good moves, but it can also randomly be larger, and obviously if you optimized for it adversarially you could find tons of these.

The paper claims the attack works up to about 100 playouts of search, which is fairly small. Also, I've had difficulty reproducing the attack even at 16 or 32 playouts, and am currently corresponding with the authors to see if there is a bug or some discrepancy in how they configured their experiments, so we should defer any confident judgments until that's resolved. :) Obviously this could change if there is a way to "improve" the attack method.
I'd find it very interesting if there were a reliable method for finding blind spots and genuine weaknesses that aren't correctable by search. There are almost certainly tons of these too! https://github.com/isty2e/Baduk-test-positions has a fun collection of a few of them, some of which KataGo still cannot solve even with lots of search. So let's hold off until the authors or some other team improves the methods so that they do find these things (which people are no doubt working on), rather than "tricks" that aren't actually of great concern and/or aren't actually a problem in realistic usage.
Relevant discussion: AlignmentResearch/go_attack#55
A preprint (Adversarial Policies Beat Professional-Level Go AIs) was recently published about a strategy for tricking KataGo into passing when its position is very strong but its territory is not formally secured, leading to a loss. See this excerpt:
This is related to the discussion in #242 ("katago do a training for star array"), and in general I agree with that issue's conclusion (ignore such specialized attacks).
E.g. @lightvector writes:
The paper's goal is to encourage more robust training techniques for safety-critical applications:
I wonder, though, whether this case is worth addressing. Would it be hard to prevent the "pass-pass" attack noted in the paper? Or would it be overly complex, or beside the point, to evaluate the board position under all possible rule sets to avoid this sort of cheap trick?
As noted in the other issue, presumably a variety of other attacks can be mounted, but they may not be as straightforward as this one, which a human with modest go skills could presumably also use.
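For readers unfamiliar with why a "very strong" position can still lose on a pass, here is a minimal sketch of Tromp-Taylor area scoring, the strict rule set the attack exploits: when both players pass, stones are scored exactly as they sit, with no dead-stone removal, and an empty region only counts for a color if it borders that color exclusively. The function name and board encoding are hypothetical illustrations, not anything from KataGo's codebase.

```python
def tromp_taylor_score(board):
    """Score a position under Tromp-Taylor rules (illustrative sketch).

    board: list of rows, each a list of 'B', 'W', or '.' (empty).
    Returns {'B': points, 'W': points}. Every stone on the board counts
    for its owner - "dead" stones that were never captured still score,
    which is what makes passing early under these rules dangerous.
    """
    n = len(board)
    score = {'B': 0, 'W': 0}
    seen = set()
    for r in range(n):
        for c in range(n):
            cell = board[r][c]
            if cell in ('B', 'W'):
                score[cell] += 1
            elif (r, c) not in seen:
                # Flood-fill this empty region, recording which colors border it.
                region, borders, stack = 0, set(), [(r, c)]
                seen.add((r, c))
                while stack:
                    y, x = stack.pop()
                    region += 1
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < n and 0 <= nx < n:
                            if board[ny][nx] == '.':
                                if (ny, nx) not in seen:
                                    seen.add((ny, nx))
                                    stack.append((ny, nx))
                            else:
                                borders.add(board[ny][nx])
                # An empty region bordering both colors is neutral.
                if len(borders) == 1:
                    score[borders.pop()] += region
    return score


if __name__ == "__main__":
    # A lone, hopeless White stone inside Black's area: under Tromp-Taylor
    # it still scores one point for White and neutralizes nothing was
    # captured, so Black loses territory by passing before capturing it.
    hopeless = [list("WB."), list("BBB"), list(".B.")]
    print(tromp_taylor_score(hopeless))  # {'B': 8, 'W': 1}
```

The toy example shows the mechanism behind the attack: positions that any human would call settled still contain uncaptured opposing stones, and under this scoring they silently flip points if the game ends by two passes.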