Modular Co-Attention Network for Visual Question Answering