In a previous post, an introduction to optical flow was given, as well as an overview of its architecture based on the FlowNet 2.0 paper. This blog will focus on going deeper into optical flow, by generating optical flow files from both the standard sintel data and a custom dance video. It will be conducted using a fork of the NVIDIA flownet2-pytorch code base, which can be found in the Dancelogue linked repo.
The goal of this blog is to:
- Get the flownet2-pytorch codebase up and running.
- Download the relevant dataset as described by the example provided in the original repository.
- Generate optical flow files and then investigate the structure of the flow files.
- Convert the flow files into the color coding scheme to make them easier for humans to understand.
- Apply optical flow generation to dance videos and analyse the result.
The flownet2-pytorch implementation has been designed to work with a GPU. Unfortunately, this means that if you don't have access to one, it will not be possible to follow this blog completely. To mitigate this, sample data generated by the model is provided, which allows the reader to follow along with the rest of the blog.
The rest of this tutorial is conducted using Ubuntu 18.04 with an NVIDIA GEFORCE GTX 1080 Ti GPU. Docker is required and must be GPU enabled, which can be done using the nvidia-docker package.
Downloading the Codebase and Datasets
Here is a list of all the code and data required to follow through with the blog (downloading the data has been automated so the reader doesn't have to do it manually, please see the Getting Started section):
- The code base for this blog can be cloned from the following repo.
- The sintel data can be downloaded by clicking the following link; the zipped file is 5.63 GB, which increases to 12.24 GB when unzipped.
- Custom data can be downloaded, which includes a sample optical flow .flo file, a color coding scheme image generated from the sample optical flow file, a dance video to conduct optical flow on, and an optical flow video representation of the dance video.
The disk space required to follow through with this blog is approximately 32 GB. The reason for this will be explained later on.
Differences in the Fork
As mentioned, a fork of the original flownet2-pytorch repository was created. This is because, at the time of writing this blog, the original repository had issues when building and running the docker image, e.g. python package version issues, c library compilation issues etc. In addition, enhancements were made to make downloading the datasets easier. The changes that were made are meant to fix these issues and can be seen in the following pull request https://github.com/dancelogue/flownet2-pytorch/pull/1/files.
- The major updates were to the Dockerfile and include: fixing the python package versions, updating the cuda and pytorch versions, automating the build and installation of the correlation layer, adding ffmpeg, and adding a third party github package that allows reading, processing and converting the flow files to the color coding scheme.
- Download scripts were also added for both the datasets and the trained models in order to make it easier to get started. The inspiration for this comes from the vid2vid repository, which is also from NVIDIA.
With this in mind, let's get started. The first thing is to clone the dancelogue fork of the original repository from https://github.com/dancelogue/flownet2-pytorch. Then run the docker script using the following command:
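The clone and launch steps might look like the following. Note that the script name launch_docker.sh is an assumption carried over from the original NVIDIA repository; check the fork's README for the exact entry point.

```shell
# Clone the dancelogue fork and launch the GPU-enabled docker session.
git clone https://github.com/dancelogue/flownet2-pytorch.git
cd flownet2-pytorch

# Assumed script name -- verify against the fork before running.
bash launch_docker.sh
```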
It should take a few minutes to set up, after which the terminal context should change to the docker session.
The next thing is to download the relevant datasets, all the required data for the initial setup can be achieved by running the following command within the docker context:
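As an illustration, the download step is a single script invocation inside the docker session. The script path below is hypothetical; use whatever name the fork's download scripts actually have:

```shell
# Hypothetical script name -- consult the fork for the real one.
# Fetches the FlowNet2 checkpoint, the MPI-Sintel data, the custom dance
# video and the sample .flo file into the checkpoints and datasets folders.
bash scripts/download.sh
```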
This downloads the FlowNet2_checkpoint.pth.tar model weights to the checkpoints folder, as well as the MPI-Sintel data to the datasets folder. This is required in order to follow the instructions for the inference example as indicated in the flownet2-pytorch getting started guide. The custom dance video is also downloaded, as well as a sample optical flow file.
The rest of this blog has been automated and can be run by the following command:
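As a hypothetical sketch (again, the actual script name lives in the fork):

```shell
# Hypothetical wrapper -- runs sintel inference, converts the resulting
# .flo files to color-coded images, then repeats the process for the
# dance video, i.e. the remaining steps of this blog in one go.
bash scripts/run_all.sh
```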
Running the Inference Example
The command for the original inference example is as follows:
python main.py --inference --model FlowNet2 --save_flow \
    --inference_dataset MpiSintelClean \
    --inference_dataset_root /path/to/mpi-sintel/clean/dataset \
    --resume /path/to/checkpoints
However, based on the fork, this has been modified to:
python main.py --inference --model FlowNet2 --save_flow \
    --inference_dataset MpiSintelClean \
    --inference_dataset_root datasets/sintel/training \
    --resume checkpoints/FlowNet2_checkpoint.pth.tar \
    --save datasets/sintel/output
Let's break it down:
- --model indicates which variation of the model to use. From the previous blog we saw this can be one of several variations; for this blog it is set to FlowNet2.
- --resume indicates the location of the trained model weights. These have been downloaded into the checkpoints folder using the download scripts. Note that the trained model weights have certain license restrictions, which you should go through in case you need to use them outside this blog.
- --inference simply means: based on the capabilities learned from the training data, as captured by the model weights, what can the model tell us about a new dataset. This is different from training, where the model weights change.
- --inference_dataset indicates what type of data will be fed in. In the current case it is sintel, as specified by MpiSintelClean. More options for this can be found in https://github.com/dancelogue/flownet2-pytorch/blob/master/datasets.py and are defined as classes, e.g. FlyingChairs. There is also the ImagesFromFolder class, which means we can feed in custom data, e.g. frames from a video, and get inference from that.
- --inference_dataset_root indicates the location of the data that will be used for the inference process; it has been downloaded and unzipped into the datasets/sintel folder.
- --save_flow indicates that the inferred optical flow should be saved as .flo files.
- --save indicates the location to which the inferred optical flow files, as well as the logs, should be saved. It is an optional field and defaults to the work folder.
Running the above command saves the generated optical flow files into the datasets/sintel/output/inference/run.epoch-0-flow-field folder. The generated optical flow files have the extension .flo and are the flow field representations.
Analyzing and Visualizing Optical Flow Files
Now that the optical flow files have been generated, it's time to analyze their structure in order to get a better understanding of the result, as well as to convert them to the flow field color coding scheme. The sample flow file used in this section can be downloaded from the following link.
Analyzing Flow Files
Loading an optical flow file into numpy is a fairly trivial process, which can be conducted as follows:
from pathlib import Path

import numpy as np

path = Path('path/to/flow/file/<filename>.flo')
with path.open(mode='rb') as flo:
    np_flow = np.fromfile(flo, np.float32)
    print(np_flow.shape)
The above syntax is based on python 3, where the file is loaded into a buffer and then fed into numpy. The next step is to understand the basic features of the flow file, which is achieved by the print statement. Assuming you are following along with the sample flow file that was provided, this will give the result (786435,). The implication is that each flow file contains a single array with 786435 elements. The footprint of a single flow file is approximately 3 MB (786,435 float32 values at 4 bytes each), which even though it looks trivial, adds up quite quickly, especially when looking at videos with thousands of frames.
Before proceeding further we need to look at the optical flow specification as defined in http://vision.middlebury.edu/flow/code/flow-code/README.txt. What we care about is the following:
".flo" file format used for optical flow evaluation

Stores 2-band float image for horizontal (u) and vertical (v) flow components.
Floats are stored in little-endian order.
A flow value is considered "unknown" if either |u| or |v| is greater than 1e9.

bytes   contents
0-3     tag: "PIEH" in ASCII, which in little endian happens to be the float 202021.25 (just a sanity check that floats are represented correctly)
4-7     width as an integer
8-11    height as an integer
12-end  data (width*height*2*4 bytes total); the float values for u and v, interleaved, in row order, i.e., u[row0,col0], v[row0,col0], u[row0,col1], v[row0,col1], ...
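As a sanity check of this layout, the following sketch writes a tiny synthetic .flo file to an in-memory buffer and reads it back according to the spec (write_flo and read_flo are illustrative helpers, not part of the flownet codebase):

```python
import io
import numpy as np

TAG_FLOAT = 202021.25  # the bytes "PIEH" read as a little-endian float32

def write_flo(buf, flow):
    # flow must be a (height, width, 2) float32 array.
    height, width, nbands = flow.shape
    assert nbands == 2
    buf.write(np.float32(TAG_FLOAT).tobytes())
    buf.write(np.int32(width).tobytes())
    buf.write(np.int32(height).tobytes())
    buf.write(flow.astype(np.float32).tobytes())

def read_flo(buf):
    # Header: 4-byte tag, 4-byte width, 4-byte height, then interleaved u, v.
    tag = np.frombuffer(buf.read(4), np.float32)[0]
    assert tag == np.float32(TAG_FLOAT), 'invalid .flo file'
    width = int(np.frombuffer(buf.read(4), np.int32)[0])
    height = int(np.frombuffer(buf.read(4), np.int32)[0])
    data = np.frombuffer(buf.read(width * height * 2 * 4), np.float32)
    return data.reshape(height, width, 2)

# Round trip: a 4 x 8 flow field survives write + read unchanged.
flow = np.random.rand(4, 8, 2).astype(np.float32)
buf = io.BytesIO()
write_flo(buf, flow)
buf.seek(0)
assert np.array_equal(read_flo(buf), flow)
```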
Based on the above specification, the following code allows us to read the flow file correctly (adapted from https://github.com/georgegach/flow2image/blob/master/f2i.py).
with path.open(mode='rb') as flo:
    tag = np.fromfile(flo, np.float32, count=1)
    width = np.fromfile(flo, np.int32, count=1)
    height = np.fromfile(flo, np.int32, count=1)
    print('tag', tag, 'width', width, 'height', height)

    nbands = 2
    tmp = np.fromfile(flo, np.float32, count=nbands * int(width) * int(height))
    flow = np.resize(tmp, (int(height), int(width), int(nbands)))
Based on the optical flow format specification, the above code should hopefully make more sense: we read the tag, then the width, followed by the height. The output of the print statement is tag 202021.25 width 1024 height 384. From the given specification we can see that the tag matches the sanity check value, the width of the flow file is 1024 and the height is 384. Note that it is important to read the file buffer in the correct order when loading it into numpy, because bytes are read sequentially in python; otherwise the tag, height and width can get mixed up. Now that we have the width and the height, we can read the rest of the optical flow data and resize it into a more familiar shape, which is done using the np.resize function.
A quick way to understand how the flow vectors have been resized is to print them to the terminal, which is done by running the following code:

>>> print(flow.shape)
(384, 1024, 2)
>>> print(flow[0][0])
[-1.2117167 -1.557275 ]
As expected, the shape of the new representation implies a height of 384, a width of 1024, and a displacement vector consisting of 2 values at each pixel. Focusing on the pixel at location (0, 0), we can see the displacement vector at that point seems to point to the left and to the bottom, i.e. into the bottom-left quadrant of an x, y plot, which means we expect the color code for this location to be a light blue or even a green color, based on the color coding scheme given below.
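The quadrant reasoning above can be verified numerically. A minimal sketch with numpy, using the displacement vector printed earlier:

```python
import numpy as np

# Displacement vector at pixel (0, 0) from the sample flow file.
u, v = -1.2117167, -1.557275

# Magnitude and direction of the flow vector, measured from the
# positive x-axis.
magnitude = np.hypot(u, v)
angle = np.degrees(np.arctan2(v, u))

# Both components are negative, so the vector points left (negative u)
# and down (negative v), i.e. into the bottom-left quadrant, where the
# angle lies between -90 and -180 degrees.
print(magnitude, angle)
```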
Visualizing Flow Files
There are quite a few open source code bases for visualizing optical flow files. The one chosen here can be found in the github repository https://github.com/georgegach/flow2image. The reason for this is that it allows generating video clips from the color coding scheme, which will be useful at a later stage. Assuming the docker context provided at the beginning of this tutorial is used, the following command generates color coded image files from the optical flow.
python /flow2image/f2i.py \
    datasets/sintel/output/inference/run.epoch-0-flow-field/*.flo \
    -o datasets/sintel/output/color_coding
This takes the optical flow files and generates image files where the displacement vector is color coded as shown below.
In order to understand the color coding scheme, please view the previous blog on optical flow. At position (0, 0), i.e. the bottom right portion of the image, we can indeed see a light blue color, which is what we expected from the displacement vector, i.e. it is the color for a vector pointing to the left and the bottom.
Applying Optical Flow To Dance Videos
In this section, we will use a dance video, and generate optical flow files from it. The dance video is:
It consists of a dance choreography class in a real world setting.
As the flownet code base takes in images, the first thing we need to do is convert the video into frames, which can be done with the following ffmpeg command.
ffmpeg -i datasets/dancelogue/sample-video.mp4 \
    datasets/dancelogue/frames/output_%04d.png
It will output the frames in an ordered sequence within the frames folder. The order is important, as the flownet algorithm uses adjacent images to calculate the optical flow between them. The generated frames occupy 1.7 GB, whereas the video is only 11.7 MB; each frame is about 2 MB.
Generating Optical Flow
The optical flow representations can be generated by running the following command.
python main.py --inference --model FlowNet2 --save_flow \
    --inference_dataset ImagesFromFolder \
    --inference_dataset_root datasets/dancelogue/frames/ \
    --resume checkpoints/FlowNet2_checkpoint.pth.tar \
    --save datasets/dancelogue/output
This is similar to the inference command we ran with the sintel dataset. The differences are the --inference_dataset argument, which changes to ImagesFromFolder as defined in the codebase, and --inference_dataset_root, which points to the generated video frames. The generated optical flow files occupy 14.6 GB of space, because each optical flow file is approximately 15.7 MB for this example.
Generating Color Code Scheme
The command to generate the color coding scheme is:
python /flow2image/f2i.py \
    datasets/dancelogue/output/inference/run.epoch-0-flow-field/*.flo \
    -o datasets/dancelogue/output/color_coding -v -r 30
This makes use of the flow2image repository as well as ffmpeg. Not only does it generate the optical flow color encodings as .png files, but the -v -r 30 parameters also generate a video from the image files at 30 fps. The generated color coding frames occupy 422 MB of space, which includes an 8.7 MB video file named 000000.flo.mp4 if you are following along with this blog.
The generated video representation of the optical flow is as follows:
The gist of the choreography can be seen in the generated video, with the different colors indicating the direction of motion. However, there is a lot of background noise, especially around the central dancers, even though there is no apparent motion in those regions of the video. Unfortunately, it is not clear why this is the case.
When running the flownet algorithm, one needs to be aware of the size implications. An 11.7 MB video, for example, generates 1.7 GB of individual frames when extracted, and generating optical flow from those frames produces 14.6 GB of flow representations. This is because each optical flow file occupies about 15.7 MB, whereas each image frame occupies about 2 MB (for the examples provided). Thus, when running optical flow algorithms, one needs to be aware of the computation vs storage tradeoff. This tradeoff will impact the architecture when building deep learning systems for video: either generate optical flow files as needed (i.e. lazily) at the cost of computation time, or generate all the required formats and representations beforehand and save them to the file system at the cost of storage space.
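The per-file sizes follow directly from the .flo layout described earlier: a 12-byte header plus width * height * 2 bands * 4 bytes. A quick sketch, assuming the dance video frames are full HD at 1920 x 1080 (an assumption for illustration):

```python
# Size of one .flo file: 12-byte header + width * height * 2 bands * 4 bytes.
def flo_size_bytes(width, height):
    return 12 + width * height * 2 * 4

# Sintel frames (1024 x 384) work out to roughly 3 MB per flow file,
# matching the sample file inspected earlier.
sintel_mb = flo_size_bytes(1024, 384) / 1e6

# Full HD frames (1920 x 1080, an assumed resolution for the dance video)
# work out to roughly 16.6 MB, i.e. about 15.8 MiB, in line with the
# ~15.7 MB per file observed for this example.
full_hd_mb = flo_size_bytes(1920, 1080) / 1e6
print(sintel_mb, full_hd_mb)
```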
We have seen how to generate optical flow files using a fork of NVIDIA's flownet2-pytorch implementation, and have gained an overview of the structure of optical flow files. The next blog will cover how to use optical flow representations to understand video content, focusing on 2-stream networks.