We propose an automated approach for generating findings from X-Ray images based on multi-layer transformer decoders. Several convolutional neural network architectures were investigated for image extraction. Average cosine similarity based on BERT embeddings is selected as the measure of performance evaluation. This automated approach could be useful in situations where there is a lack of expertise in reading medical images, or when a preliminary result or second opinion is needed.