Thinking Allowed

medical / technology / education / art / flub

showing posts for 'transformers'

Constructing Transformers For Longer Sequences with Sparse Attention Methods

"We show that carefully designed sparse attention can be as expressive and flexible as the original full attention model. Along with theoretical guarantees, we provide a very efficient implementation which allows us to scale to much longer inputs. As a consequence, we achieve state-of-the-art results...
Source: googleblog.com
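
The core claim here is that a restricted attention pattern can match full attention while scaling to much longer inputs. As a minimal sketch of what a sparse attention pattern looks like, the toy NumPy snippet below builds a mask that combines a local sliding window with a few global tokens and applies it inside ordinary scaled dot-product attention; the function names, window size, global-token count, and dense masking are illustrative simplifications, not the implementation from the post.

```python
# Toy sketch: a sparse attention mask (local sliding window + a few "global"
# tokens). A real implementation computes only the allowed blocks instead of
# masking a dense score matrix; this is just to show the pattern.
import numpy as np

def sparse_attention_mask(seq_len, window=3, n_global=2):
    """Boolean (seq_len, seq_len) mask; True means attention is allowed."""
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    for i in range(seq_len):
        lo, hi = max(0, i - window), min(seq_len, i + window + 1)
        mask[i, lo:hi] = True      # local sliding-window neighbours
    mask[:, :n_global] = True      # every token may attend to the global tokens
    mask[:n_global, :] = True      # global tokens may attend to every token
    return mask

def masked_attention(q, k, v, mask):
    """Scaled dot-product attention with disallowed pairs scored as -inf."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

seq_len, d = 16, 8
rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(seq_len, d)) for _ in range(3))
out = masked_attention(q, k, v, sparse_attention_mask(seq_len))
print(out.shape)  # (16, 8)
```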

Improving Language Understanding with Unsupervised Learning

We've obtained state-of-the-art results on a suite of diverse language tasks with a scalable, task-agnostic system, which we're also releasing. Our approach is a combination of two existing ideas: transformers and unsupervised pre-training....
Source: openai.com
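
For a concrete picture of the recipe the excerpt names, transformers plus unsupervised pre-training, here is a toy PyTorch sketch that first pre-trains a tiny causal language model on stand-in token data and then fine-tunes the same network with a small classification head. Every size, dataset, and design choice in it is an illustrative assumption, not OpenAI's actual system.

```python
# Toy sketch of the two-stage recipe: (1) unsupervised next-token pre-training
# of a small transformer, then (2) supervised fine-tuning of the same network
# with a task head. All sizes and data here are made-up stand-ins.
import torch
import torch.nn as nn

VOCAB, D_MODEL, SEQ_LEN, N_CLASSES = 100, 32, 16, 2

class TinyTransformerLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, D_MODEL)
        layer = nn.TransformerEncoderLayer(D_MODEL, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.lm_head = nn.Linear(D_MODEL, VOCAB)       # used during pre-training
        self.cls_head = nn.Linear(D_MODEL, N_CLASSES)  # used during fine-tuning

    def forward(self, tokens, causal=False):
        x = self.embed(tokens)
        mask = None
        if causal:  # left-to-right mask so the LM cannot peek at future tokens
            n = tokens.size(1)
            mask = torch.triu(torch.full((n, n), float("-inf")), diagonal=1)
        return self.encoder(x, mask=mask)              # (batch, seq, d_model)

model = TinyTransformerLM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Stage 1: unsupervised pre-training, i.e. predict the next token on unlabeled text.
unlabeled = torch.randint(0, VOCAB, (8, SEQ_LEN))      # stand-in for a text corpus
for _ in range(3):
    logits = model.lm_head(model(unlabeled[:, :-1], causal=True))
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, VOCAB), unlabeled[:, 1:].reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()

# Stage 2: supervised fine-tuning, reusing the pre-trained weights for a labeled task.
labeled_x = torch.randint(0, VOCAB, (8, SEQ_LEN))
labeled_y = torch.randint(0, N_CLASSES, (8,))
for _ in range(3):
    logits = model.cls_head(model(labeled_x).mean(dim=1))  # pooled representation
    loss = nn.functional.cross_entropy(logits, labeled_y)
    opt.zero_grad(); loss.backward(); opt.step()
print("fine-tuned loss:", float(loss))
```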