The ROS node for Visual Question Answering (VQA).

System Configuration


This node works together with a Docker container that performs inference. Build the container first by following the Setup instructions.



This node requires an NVIDIA GPU with more than 4 GB of GPU RAM to work properly. To use the GPU from Docker, install nvidia-container-toolkit by following the official instructions.
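
The memory requirement above can be checked before starting the container. The following is a minimal sketch, assuming nvidia-smi is on the PATH and reports total memory in MiB; the helper names are hypothetical and not part of jsk_perception.

```python
import subprocess


def parse_memory_mib(nvidia_smi_output):
    """Parse one total-memory value (MiB) per non-empty line of nvidia-smi CSV output."""
    return [int(line.strip()) for line in nvidia_smi_output.splitlines() if line.strip()]


def gpus_with_enough_memory(memory_mib, required_mib=4096):
    """Return the indices of GPUs whose total memory is strictly above the requirement."""
    return [index for index, total in enumerate(memory_mib) if total > required_mib]


def query_gpu_memory_mib():
    """Ask nvidia-smi for the total memory of every visible GPU, in MiB."""
    output = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        universal_newlines=True,
    )
    return parse_memory_mib(output)
```

On the GPU machine, gpus_with_enough_memory(query_gpu_memory_mib()) returns the indices of the GPUs that satisfy the requirement. An empty list suggests the node may not work properly on that machine.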

Build the docker image

You have to build the Docker image of OFA:

roscd jsk_perception/docker

Subscribing topic

  • ~image (sensor_msgs/Image)

    Input image

Publishing topic

  • ~result (jsk_recognition_msgs/VQAResult)

    VQA result

  • ~result/image (sensor_msgs/Image)

    Images used for inference

  • ~visualize (std_msgs/String)

    VQA question and answer to visualize
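
A subscriber for ~result might look like the sketch below. It assumes that the node runs under the name /vqa, and that VQAResult carries a result array whose elements have question and answer string fields. Both are assumptions to verify with rosnode list and rosmsg show jsk_recognition_msgs/VQAResult.

```python
def format_vqa_result(msg):
    """Render each question/answer pair of a VQAResult-like message as one line.

    Assumes msg.result is a sequence of items with .question and .answer fields.
    """
    return ["Q: {} / A: {}".format(qa.question, qa.answer) for qa in msg.result]


def listen(result_topic="/vqa/result"):
    """Log every published answer. Requires a ROS 1 environment.

    The default topic name is an assumption about how the node is launched.
    """
    import rospy
    from jsk_recognition_msgs.msg import VQAResult

    def on_result(msg):
        for line in format_vqa_result(msg):
            rospy.loginfo(line)

    rospy.init_node("vqa_result_listener", anonymous=True)
    rospy.Subscriber(result_topic, VQAResult, on_result)
    rospy.spin()
```

Calling listen() on the ROS machine blocks until shutdown and logs one line per question. The ROS imports live inside listen() so that the formatting helper can be reused and tested without ROS installed.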

Action topic

  • ~inference_server/goal (jsk_recognition_msgs/VQATaskActionGoal)

    VQA request with custom questions and image

  • ~inference_server/result (jsk_recognition_msgs/VQATaskActionResult)

    VQA result of ~inference_server/goal
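
The goal and result topics above belong to an actionlib server, so the usual way to use them is through a SimpleActionClient. The sketch below assumes the node is named /vqa and that the goal has image and questions fields. These names are assumptions and should be checked against the VQATask action definition.

```python
def action_namespace(node_name="/vqa"):
    """Return the action namespace that owns the goal/result topics."""
    return node_name.rstrip("/") + "/inference_server"


def ask(image_msg, questions, node_name="/vqa", timeout_sec=30.0):
    """Send one image with custom questions; return the action result or None on timeout.

    Requires a ROS 1 environment and a prior call to rospy.init_node().
    """
    import actionlib
    import rospy
    from jsk_recognition_msgs.msg import VQATaskAction, VQATaskGoal

    client = actionlib.SimpleActionClient(action_namespace(node_name), VQATaskAction)
    if not client.wait_for_server(rospy.Duration(timeout_sec)):
        return None
    goal = VQATaskGoal()
    goal.image = image_msg            # assumed field name (sensor_msgs/Image)
    goal.questions = list(questions)  # assumed field name (list of strings)
    client.send_goal(goal)
    if not client.wait_for_result(rospy.Duration(timeout_sec)):
        return None
    return client.get_result()
```

For example, ask(image_msg, ["what is on the table?"]) returns the action result for a single image, or None if the server or the answer does not arrive within the timeout.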


Parameters

  • ~host (String, default: localhost)

    The host name or IP address of the inference container

  • ~port (Integer, default: 8080)

    The HTTP port of the inference container
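
Before launching the node, it can help to confirm that the ~host and ~port values point at a listening container. This is a minimal sketch using only a TCP connection attempt, assuming Python 3. It does not call any HTTP endpoint of the container, and the function name is hypothetical.

```python
import socket


def container_reachable(host="localhost", port=8080, timeout_sec=2.0):
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_sec):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

A True result only shows that something is listening on that port. It does not prove that the inference server itself is healthy.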

Dynamic Reconfigure Parameters

  • ~questions (string, default: what does this image describe?)

    Default questions asked about each image received on the subscribed image topic.
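
Because ~questions is a dynamic reconfigure parameter, it can be changed while the node is running. The sketch below uses the dynamic_reconfigure Python client and assumes the node is named /vqa. The helper names are hypothetical.

```python
def questions_config(question):
    """Build the reconfigure request for the ~questions parameter."""
    cleaned = question.strip()
    if not cleaned:
        raise ValueError("question must not be empty")
    return {"questions": cleaned}


def set_default_question(question, node_name="/vqa", timeout_sec=5.0):
    """Update ~questions on a running node and return the resulting configuration.

    Requires a ROS 1 environment and a prior call to rospy.init_node().
    The default node name is an assumption.
    """
    import dynamic_reconfigure.client

    client = dynamic_reconfigure.client.Client(node_name, timeout=timeout_sec)
    return client.update_configuration(questions_config(question))
```

After a successful update, the node should apply the new question to subsequent images from the subscribed topic.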


Run inference container on another host or another terminal

On the remote GPU machine:

cd jsk_recognition/jsk_perception/docker
./run_jsk_vil_api --port (Your vacant port) --ofa_task caption --ofa_model_scale huge

--ofa_task must be either caption or vqa. Empirically, the Caption model in OFA gives more natural answers on VQA tasks than the VQA model does.

If you run out of GPU RAM, set --ofa_model_scale to large.

On the ROS machine:

roslaunch jsk_perception vqa.launch port:=(Your inference container port) host:=(Your inference container host) VQA_INPUT_IMAGE:=(Your image topic name) gui:=true 

Run both the inference container and the ROS node on a single host

roslaunch jsk_perception vqa.launch run_api:=true VQA_INPUT_IMAGE:=(Your image topic name) gui:=true