# Single-image super-resolution based on deep learning for MR brain images

### Background

Given a low-quality 2D image \({\mathbf{y}}\), our goal is to obtain a high-quality counterpart \({\mathbf{x}}\). The relationship between \({\mathbf{x}}\) and \({\mathbf{y}}\) can be modeled as follows:

\begin{aligned} {\mathbf{y}} = {\varvec{{\mathscr{F}}}}_{LR}^{-1} {\mathbf{D}} \, {\varvec{{\mathscr{F}}}}_{HR} {\mathbf{x}} + {\mathbf{n}}, \end{aligned}

(1)

where \({\varvec{{\mathscr{F}}}}_{LR}^{-1}\) is the inverse FFT operator on the LR grid, \({\mathbf{D}}\) is an operator that selects only the low-frequency components of k-space, which in our case yields a matrix of size \(64 \times 64\), \({\varvec{{\mathscr{F}}}}_{HR}\) is the FFT operator on the HR grid (\(128 \times 128\)), and \({\mathbf{n}}\) is an (unknown) noise vector. The aim of super-resolution is to find an approximate inverse of the operator \({\varvec{{\mathscr{F}}}}_{LR}^{-1} {\mathbf{D}} {\varvec{{\mathscr{F}}}}_{HR}\). We note that the standard super-resolution problem is usually posed differently, namely by assuming that the HR image is blurred and subsequently downsampled to yield the LR image. However, Eq. (1) more closely follows the acquisition process of low-resolution MRI.
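As an illustration, the forward model of Eq. (1) can be sketched in a few lines of NumPy. This is a minimal sketch, not the authors' code; the function name `degrade`, the intensity rescaling, and the noise handling are our own assumptions.

```python
import numpy as np

def degrade(x_hr, lr_size=64, noise_std=0.0, rng=None):
    """Apply Eq. (1): FFT to k-space, keep the central low-frequency block,
    optionally add complex Gaussian noise, then inverse FFT on the LR grid.
    (Illustrative sketch; scaling convention is an assumption.)"""
    n = x_hr.shape[0]                          # HR grid size, e.g. 128
    k = np.fft.fftshift(np.fft.fft2(x_hr))     # F_HR: image -> centered k-space
    c = (n - lr_size) // 2
    k_lr = k[c:c + lr_size, c:c + lr_size]     # D: central lr_size x lr_size block
    if noise_std > 0:
        rng = rng or np.random.default_rng()
        k_lr = k_lr + noise_std * (rng.standard_normal(k_lr.shape)
                                   + 1j * rng.standard_normal(k_lr.shape))
    # F_LR^{-1}: inverse FFT on the LR grid; rescale to preserve mean intensity
    y = np.fft.ifft2(np.fft.ifftshift(k_lr)) * (lr_size / n) ** 2
    return np.abs(y)

x = np.random.rand(128, 128)   # stand-in for a 128 x 128 HR image
y = degrade(x, lr_size=64)     # 64 x 64 LR counterpart
```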

### Convolutional neural network

For our implementation, we chose a convolutional neural network based on the SRDenseNet architecture. Our choice is motivated by the good performance of SRDenseNet combined with its moderate number of parameters. We note that, since there is extensive literature on deep-learning methods for super-resolution, other networks may be applicable as well24,26,27. SRDenseNet was introduced by Tong et al.25 It consists of blocks of densely connected convolutional layers ("dense blocks"). Within each dense block, which follows the DenseNet architecture33, every convolutional layer receives as input the concatenated outputs of all preceding convolutional layers, as shown in Figure 1. Reusing feature maps in this way avoids learning redundant features; instead, the current layer is forced to learn something new. As in the original paper, we use 8 dense blocks of 8 convolutional layers each, where every convolutional layer produces 16 feature maps, so that each dense block yields 128 feature maps. The kernel size in every convolutional layer is 3×3. After the final dense block, a bottleneck layer with 1×1 convolution kernels is used to reduce the number of feature maps to 256, followed by a transposed convolutional layer (often called a deconvolution layer) that maps the image onto the HR space. Note that in this work the upscaling factor is 2, so we use only one transposed convolutional layer with stride 2, as opposed to the two deconvolution layers in the original SRDenseNet, which was designed for an upscaling factor of 4. Finally, another convolutional layer with a 3×3 kernel reduces the output to a single channel. All layers except the final convolutional layer use the ReLU (Rectified Linear Unit) activation function. In addition, skip connections pass the output of each dense block to every subsequent dense block, as in the SRDenseNet_All architecture of the original paper25.
The complete architecture, which has 1,910,689 trainable parameters, is shown in Figure 2.
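The architecture described above can be sketched in Keras as follows. This is an illustrative sketch under our own assumptions (the initial low-level feature layer and the exact wiring of the inter-block skip connections may differ from the original implementation), not the authors' code.

```python
import tensorflow as tf
from tensorflow.keras import layers

def dense_block(x, num_layers=8, growth=16):
    """One dense block: each 3x3 conv sees the concatenation of all
    previous layers' outputs and adds `growth` new feature maps."""
    features = [x]
    for _ in range(num_layers):
        inp = layers.Concatenate()(features) if len(features) > 1 else features[0]
        features.append(layers.Conv2D(growth, 3, padding="same",
                                      activation="relu")(inp))
    return layers.Concatenate()(features[1:])   # 8 * 16 = 128 feature maps

inp = tf.keras.Input(shape=(None, None, 1))     # fully convolutional: any size
x = layers.Conv2D(16, 3, padding="same", activation="relu")(inp)  # assumed stem
block_outputs = []
for _ in range(8):                              # 8 dense blocks, SRDenseNet_All-style
    x = dense_block(x)
    block_outputs.append(x)
    # skip connections: feed all previous block outputs to the next block
    x = layers.Concatenate()(block_outputs) if len(block_outputs) > 1 else x
x = layers.Conv2D(256, 1, padding="same", activation="relu")(x)   # 1x1 bottleneck
x = layers.Conv2DTranspose(256, 3, strides=2, padding="same",
                           activation="relu")(x)                  # x2 upscaling
out = layers.Conv2D(1, 3, padding="same")(x)    # single-channel output, no ReLU
model = tf.keras.Model(inp, out)
```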

### Data collection and training

In this work, we focus on 2D images, but we note that the approach can be extended to 3D. We generated training and validation sets using 2D images taken from the NYU fastMRI Initiative public database (fastmri.med.nyu.edu)34,35. As such, NYU fastMRI investigators provided the data but did not participate in the analysis or the writing of this manuscript. A listing of NYU fastMRI investigators, subject to updates, can be found at the website above. The primary goal of fastMRI is to test whether machine learning can aid in the reconstruction of medical images. The database consists of T1-weighted, T2-weighted, and FLAIR (fluid-attenuated inversion recovery) image slices acquired with 1.5 T and 3 T MRI scanners. By training on such a variety of brain MR images, the resulting network should be applicable to images acquired with different sequences, without the need to retrain the network whenever the parameter settings change. We note that even if we intend to apply the trained network, for example, only to T1-weighted low-field MR images, training the network on high-resolution MR images acquired with different types of sequences is still sensible, as it adapts the network to different types of input. This is because relaxation times vary with field strength, so a T1-weighted image acquired with a low-field scanner may differ from one acquired with a high-field scanner. One parameter that does require care is the image size. We use input and output images of \(64 \times 64\) and \(128 \times 128\) pixels, respectively. Owing to the fully convolutional nature of the network, images of different sizes can be used as input, and the network should be able to handle small deviations. However, it is unlikely to generalize to images whose size differs substantially from those in the training set.

The images in the database are of various sizes. Since we are interested in HR images of \(128 \times 128\) pixels, all images were first resized to \(128 \times 128\) pixels. This was done by using the FFT to transform the images to k-space data, selecting the central part of k-space, and applying the inverse fast Fourier transform (FFT), as in Eq. (1). Subsequently, we downsampled these HR images to LR images of \(64 \times 64\) pixels by again applying Eq. (1), i.e., we use the FFT to transform the image to k-space, select the central (\(64 \times 64\)) part of k-space, and apply the inverse FFT to obtain the LR image. To obtain noisier LR images, we add complex Gaussian noise in k-space, with the noise level varying from image to image. We used a range of noise levels consistent with the low-field MR images we have seen in practice. This step is necessary for the convolutional neural network to be able to handle images acquired with a low-field MRI scanner, which yields signals with a relatively low SNR owing to the weaker magnetic field36. In this way, 29,059 and 17,292 image pairs were derived from the training and validation sets provided in the database, respectively. We assigned 10,000 of the 17,292 validation image pairs to our validation set and the remaining 7,292 to our test set. Some examples of image pairs included in the training set are shown in Figure 3. We note that the data were split at the patient level, so no data leakage occurred.

Because SRDenseNet is a fully convolutional neural network, it can be trained on patches instead of full images, which requires less memory during training. In addition, the use of patches allows us to generate more training data. We therefore used the HR-LR image pairs to create 190,000 patch pairs for training and 10,000 patch pairs for validation, with the HR patches and their corresponding LR patches measuring \(32 \times 32\) pixels and \(16 \times 16\) pixels, respectively.
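Extracting matching patch pairs amounts to sampling a window in the LR image and taking the window at doubled coordinates and doubled size in the HR image. A minimal sketch (the function name and random sampling strategy are our own assumptions; the authors' patch selection scheme is not specified):

```python
import numpy as np

def extract_patch_pairs(hr, lr, hr_patch=32, n_patches=10, rng=None):
    """Cut matching random patches from an HR/LR image pair.
    The LR patch is half the HR patch size, at half the coordinates."""
    rng = rng or np.random.default_rng(0)
    lr_patch = hr_patch // 2
    pairs = []
    for _ in range(n_patches):
        i = rng.integers(0, lr.shape[0] - lr_patch + 1)
        j = rng.integers(0, lr.shape[1] - lr_patch + 1)
        p_lr = lr[i:i + lr_patch, j:j + lr_patch]
        p_hr = hr[2 * i:2 * i + hr_patch, 2 * j:2 * j + hr_patch]
        pairs.append((p_lr, p_hr))
    return pairs

hr = np.random.rand(128, 128)   # stand-ins for one HR/LR image pair
lr = np.random.rand(64, 64)
pairs = extract_patch_pairs(hr, lr)
```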

The convolutional neural network was implemented in TensorFlow37. The Adam optimizer38 with a learning rate of \(10^{-3}\) was used to minimize the mean-squared-error loss between the network output and the ground-truth HR image patches. In addition, we investigated two other loss functions: the \(\ell _1\)-loss and the HFEN (high-frequency error norm) loss39. However, after visual inspection of the resulting images, we found the mean-squared-error loss to be superior to the others. We used a batch size of 20 and a total of 74 epochs, which corresponded to the lowest validation loss. Training was performed on a GeForce Titan X GPU (12 GB) and took about 5 hours.
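The training setup described above can be sketched as follows. The stand-in two-layer model and the random dummy patches are assumptions for illustration only; in the paper, the SRDenseNet model and the 190,000 real patch pairs would take their place, with 74 epochs instead of one.

```python
import numpy as np
import tensorflow as tf

# Stand-in model: any fully convolutional x2 SR network would slot in here.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(None, None, 1)),
    tf.keras.layers.Conv2D(16, 3, padding="same", activation="relu"),
    tf.keras.layers.Conv2DTranspose(1, 3, strides=2, padding="same"),
])

# Adam with learning rate 1e-3, minimizing the MSE between output and HR patches.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss="mse")

# Dummy 16x16 LR / 32x32 HR patch pairs; batch size 20 as in the paper.
x = np.random.rand(40, 16, 16, 1).astype("float32")
y = np.random.rand(40, 32, 32, 1).astype("float32")
history = model.fit(x, y, batch_size=20, epochs=1, verbose=0)
```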

### Low-field MR image acquisition

Two 3D in vivo brain scans of two healthy volunteers were acquired with the low-field MRI scanner described by O’Reilly et al.9 We use different 2D slices of the 3D images as network input. Both experiments were performed using a turbo spin-echo sequence. The following parameters were used for the first scan: FoV (field of view) \(224 \times 224 \times 175\) \(\hbox {mm}^3\), voxel size \(1.75 \times 1.75 \times 3.5\) \(\hbox {mm}^3\), \(T_R\)/\(T_E\) (repetition time/echo time) = 500 ms/20 ms, echo train length 4, acquisition bandwidth 20 kHz, no signal averaging, cylindrical k-space coverage. The second scan was performed with a different set of parameters: FoV \(180 \times 240 \times 180\) \(\hbox {mm}^3\), voxel size \(1.5 \times 1.5 \times 3\) \(\hbox {mm}^3\), \(T_R\)/\(T_E\) = 400 ms/20 ms, echo train length 5, acquisition bandwidth 20 kHz, no signal averaging. All methods were carried out in accordance with the relevant guidelines and regulations.