Thinking Allowed

PAIRED: A New Multi-agent Approach for Adversarial Environment Generation

This Google AI Blog post describes PAIRED, a machine learning method in which an adversary designs environments for two agents, a protagonist and an antagonist. The gap between the antagonist's reward and the protagonist's reward is the regret. That feeling of 'could have done better' is what pushes the 'unsupervised environment design' algorithm towards a better solution.

"The adversary’s job is to maximize the antagonist’s reward while minimizing the protagonist's reward. This means it must create environments that are feasible (because the antagonist can solve them and get a high score), but challenging to the protagonist (exploit weaknesses in its current policy). The gap between the two rewards is the regret — the adversary tries to maximize the regret, while the protagonist competes to minimize it."
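The idea in the quote can be shown with a toy sketch. Everything here is an assumption made for illustration, not the blog's implementation: the function names are hypothetical, each agent is reduced to a single "skill" number, and an environment to a single "difficulty" number. The point is only that an adversary maximizing regret prefers environments the antagonist can solve but the protagonist cannot.

```python
def episode_reward(skill: float, difficulty: float) -> float:
    """Toy reward (an assumption): 1.0 if the agent can solve the environment, else 0.0."""
    return 1.0 if skill >= difficulty else 0.0


def regret(antagonist_reward: float, protagonist_reward: float) -> float:
    """Regret as described in the quote: the gap between the two rewards."""
    return antagonist_reward - protagonist_reward


def pick_environment(difficulties, protagonist_skill, antagonist_skill):
    """Hypothetical adversary: choose the difficulty that maximizes regret."""
    return max(
        difficulties,
        key=lambda d: regret(
            episode_reward(antagonist_skill, d),
            episode_reward(protagonist_skill, d),
        ),
    )


if __name__ == "__main__":
    # Too easy (both solve it) and impossible (neither solves it) both give
    # zero regret; only the middle environment separates the two agents.
    chosen = pick_environment([0.2, 0.6, 0.9], protagonist_skill=0.4, antagonist_skill=0.7)
    print(chosen)
```

In this sketch the impossible environment earns the adversary nothing, which is why regret steers it towards tasks that are feasible yet challenging. As the protagonist's skill rises, the same rule would push the adversary to pick harder environments.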

What can we learn from the machines? Perhaps their neutral approach to winning or losing, when the purpose of gaming is just to get better at it.


Source: ai.googleblog.com
