Idea
Visualization shows that the global contexts modeled by Non-local Neural Networks are almost the same for different query positions within an image. (The figure above shows the attention maps for different query positions, marked by red points, in a non-local block on COCO object detection.)
Therefore, we can simplify the non-local block by explicitly using a single query-independent attention map for all query positions. The features aggregated with this attention map are then added to the features of every query position to form the output.
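To make the simplification concrete, here is a minimal sketch in plain Python, not the reference implementation. It assumes the feature map is flattened into a list of N positions with C channels each, and that the attention logits come from a single hypothetical key projection `w_k` (a stand-in for a 1x1 convolution). Any value or output transform that a full block might apply is left out. The names `query_independent_context` and `simplified_block` are illustrative only.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def query_independent_context(x, w_k):
    """Compute one attention map and one context vector for the whole image.

    x   : list of N positions, each a list of C channel values
    w_k : hypothetical key projection, a list of C weights (assumption)
    """
    # One logit per position; no query position is involved.
    logits = [sum(w * v for w, v in zip(w_k, pos)) for pos in x]
    attention = softmax(logits)
    channels = len(x[0])
    # Attention-weighted sum of features over all positions.
    context = [
        sum(a * pos[c] for a, pos in zip(attention, x))
        for c in range(channels)
    ]
    return attention, context


def simplified_block(x, w_k):
    """Add the same aggregated context to the features of every position."""
    _, context = query_independent_context(x, w_k)
    return [[v + g for v, g in zip(pos, context)] for pos in x]


# Toy input: 3 positions, 2 channels (values chosen arbitrarily).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_k = [0.5, -0.5]
attention, context = query_independent_context(x, w_k)
out = simplified_block(x, w_k)
```

Because the attention map does not depend on the query, the context vector is computed once and broadcast to all positions, rather than recomputed for each of the N queries as in the original non-local block.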