Michael is an experienced Python, OpenCV, and C++ developer. He is particularly interested in machine learning and computer vision.
Competitions are a great way to level up your machine learning skills. Not only do you get access to quality datasets, you are also given clear goals. This helps you focus on the important part: designing quality solutions for the problems at hand.
A friend of mine and I recently took part in the N+1 fish, N+2 fish competition. This machine learning competition, with lots of image processing, requires you to process video clips of fish being identified, measured, and kept or thrown back into the sea.
In this article, I will walk you through how we approached the problem from the competition using standard image processing techniques and pre-trained neural network models. The performance of the submitted solutions was measured based on a certain formula. With our solution, we secured 11th place.
For a brief introduction to machine learning, you can refer to this article.
We had videos with one or more fish in each segment. These videos were captured on different boats fishing for ground fish in the Gulf of Maine.
The videos were collected from fixed-position cameras placed to look down on a ruler. A fish is placed on the ruler, the fisherman removes his hand from the ruler, and then the fisherman either discards or keeps the fish based on the species and size.
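There were three important tasks in this project. The ultimate goal was to create an algorithm that automatically generates annotations for the video files, where the annotations consist of the sequence of the fish, the species of each fish, and the length of each fish.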
The organizers of the competition created an aggregated metric that gave a general sense of performance on all of these tasks. The metric was a simple weighted combination of an individual metric for each of the tasks. Although the tasks were weighted differently, they recommended that we focus on a well-rounded algorithm that was able to contribute to all of the tasks!
You can learn more about how the overall performance metric is calculated from the performance metrics of each individual task on the official competition webpage.
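To make the idea of a weighted combination concrete, here is a minimal sketch. The task names, weights, and scores are hypothetical placeholders, not the official values, and normalizing by the sum of the weights is an assumption.

```python
def overall_score(task_scores, weights):
    """Weighted combination of per-task scores, normalized by the total weight."""
    total_weight = sum(weights[name] for name in task_scores)
    weighted_sum = sum(task_scores[name] * weights[name] for name in task_scores)
    return weighted_sum / total_weight


# Hypothetical weights and per-task scores, for illustration only
weights = {"sequence": 0.6, "species": 0.3, "length": 0.1}
scores = {"sequence": 0.5, "species": 1.0, "length": 0.0}

print(overall_score(scores, weights))
```

With any weighting of this kind, a score of zero on one task caps the total, which is one reason a well-rounded algorithm tends to do better than one that excels at a single task.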
When working with machine learning projects dealing with pictures or videos, you will most likely be using convolutional neural networks. But before we could use convolutional neural networks, we had to preprocess the frames and solve some other subtasks through different strategies.
For training, we used a single NVIDIA 1080 Ti GPU. A good chunk of our time was lost trying to optimize our code in order to stay relevant in the competition. We did, however, end up spending less time where it would have mattered more.
With silhouette analysis, finding the number of boats became a fairly trivial task. The steps were as follows, and leveraged some very standard techniques:
SURF detects points of interest in an image and generates feature descriptions. This approach is really robust, even with various image transformations.
Once the features of the points of interest in the image are known, K-means clustering is performed, followed by silhouette analysis to determine an approximate number of boats in the images.
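The clustering and silhouette steps can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the competition code: the two-dimensional synthetic points stand in for real SURF descriptors, the farthest-point initialization is a simplification chosen to keep the example deterministic, and all function names are hypothetical.

```python
import math
import random


def kmeans(points, k, iterations=50):
    """Lloyd's algorithm with a deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = []
    for _ in range(iterations):
        # Assign every point to its nearest center
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Move each center to the mean of its members
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (closer to 1 is better)."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    total = 0.0
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # a singleton cluster contributes a silhouette of 0
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for key, other in clusters.items() if key != label)
        total += (b - a) / max(a, b)
    return total / len(points)


# Synthetic stand-in for feature descriptors: three well-separated groups
rng = random.Random(0)
blob_centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
descriptors = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
               for cx, cy in blob_centers for _ in range(20)]

# Try several cluster counts and keep the one with the best silhouette score
scores = {k: silhouette_score(descriptors, kmeans(descriptors, k)) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
print(best_k, scores)
```

The cluster count with the highest mean silhouette is taken as the estimate. On real descriptors the peak is usually less sharp than on this synthetic data, which is why the result is only an approximate number of boats.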
Although the dataset contained separate video files, each video seemed to have some overlaps with other videos in the dataset. This is possibly because the videos were split from one long video and thus ended up having a few common frames at the start or end of each video.
To identify such frames and remove them where necessary, we applied a fast hash function to the frames.
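A minimal sketch of hash-based duplicate detection is shown below. The article does not name the hash function that was used, so the choice of BLAKE2 from Python's `hashlib` is an assumption, as are the function names. Frames are represented here as raw byte strings.

```python
import hashlib


def frame_digest(frame_bytes):
    """Short, fast digest of a frame's raw bytes."""
    return hashlib.blake2b(frame_bytes, digest_size=16).hexdigest()


def find_duplicate_frames(frames):
    """Return (duplicate_index, first_seen_index) pairs for byte-identical frames."""
    seen = {}
    duplicates = []
    for index, frame in enumerate(frames):
        digest = frame_digest(frame)
        if digest in seen:
            duplicates.append((index, seen[digest]))
        else:
            seen[digest] = index
    return duplicates


# Toy example: the third "frame" repeats the first
frames = [b"frame-a", b"frame-b", b"frame-a"]
print(find_duplicate_frames(frames))
```

An exact hash like this only catches frames that are identical byte for byte. If the overlapping frames had been re-encoded, a perceptual hash or a comparison on downscaled frames would be needed instead.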
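By applying some standard image processing methods, we determined the position and orientation of the ruler. We then rotated and cropped the image to position the ruler in a consistent manner across all frames. This also allowed us to reduce the frame size by a factor of ten.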
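The article does not say which image processing methods located the ruler, so the sketch below assumes its two endpoints are already known and only illustrates the geometry of the alignment step: computing the ruler's angle and rotating coordinates so that the ruler becomes horizontal. The function names and sample coordinates are hypothetical.

```python
import math


def ruler_angle(p1, p2):
    """Angle of the ruler, in degrees, relative to the image x-axis."""
    return math.degrees(math.atan2(p2[1] - p1[1], p2[0] - p1[0]))


def rotate_point(point, center, angle_deg):
    """Rotate point about center by -angle_deg, so a ruler at angle_deg ends up horizontal."""
    theta = math.radians(-angle_deg)
    dx, dy = point[0] - center[0], point[1] - center[1]
    return (center[0] + dx * math.cos(theta) - dy * math.sin(theta),
            center[1] + dx * math.sin(theta) + dy * math.cos(theta))


# Hypothetical ruler endpoints in pixel coordinates
start, end = (0.0, 0.0), (10.0, 10.0)
angle = ruler_angle(start, end)
aligned_end = rotate_point(end, start, angle)
print(angle, aligned_end)
```

With OpenCV, the same rotation is typically applied to whole frames using `cv2.getRotationMatrix2D` and `cv2.warpAffine`, followed by slicing the array to crop the band around the ruler.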
Detected ruler (plotted on the mean frame):
Cropped and rotated region:
Implementing this stage to determine the sequence of the fish took a majority of my time during this competition. Training new convolutional neural networks seemed too expensive, so we decided to use pretrained neural networks.
For this, we chose the following neural networks:
These neural network models were trained on the ImageNet dataset.
We extracted only the convolutional layers of the models and passed the competition dataset through them. At the output, we had a fairly compact feature array.
Then, we trained networks consisting only of fully connected dense layers on these features and predicted results for each pretrained model. After that, we averaged the predictions, and the results turned out quite poor.
We decided to use long short-term memory (LSTM) neural networks for better prediction, where the input data was a sequence of five frames that had been transformed with the pretrained models.
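As a rough illustration of how such input sequences can be assembled, the sketch below groups per-frame feature vectors into windows of five consecutive frames. The overlapping windows with a stride of one and the function name are assumptions, since the article does not describe how the sequences were cut.

```python
def frame_windows(features, window=5):
    """Return overlapping windows of consecutive per-frame feature vectors."""
    return [features[i:i + window] for i in range(len(features) - window + 1)]


# Toy example: seven frames, each represented by a tiny feature vector
features = [[float(i), float(i) * 2.0] for i in range(7)]
windows = frame_windows(features)
print(len(windows), len(windows[0]))
```

Each window then becomes one training sample for the sequence model, with a shape of five time steps by the feature size.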
To merge the outputs of all the models, we used the geometric mean.
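A minimal sketch of merging per-model probabilities with a geometric mean follows. The function name and the small `eps` floor, which guards against taking the logarithm of zero, are assumptions added for this example.

```python
import math


def geometric_mean(probabilities, eps=1e-12):
    """Geometric mean of model outputs, computed in log space for stability."""
    logs = [math.log(max(p, eps)) for p in probabilities]
    return math.exp(sum(logs) / len(logs))


# Two hypothetical models scoring the same frame
print(geometric_mean([0.9, 0.4]))
print(geometric_mean([0.9, 0.1]))
```

Compared with an arithmetic mean, the geometric mean is pulled down more strongly when one model disagrees, so a frame only scores high when the models broadly agree.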
The fish detection pipeline was:
The result for one video looked like this:
After spending a majority of the contest duration implementing the previous stage, we tried to make up for the lost time working with models from the previous stage to identify the species of the fish.
Our approach was roughly as follows:
To determine the length of the fish, we used two neural networks. One of them was trained to identify the fish heads and the other was trained to identify fish tails. The lengths of the fish were approximated as the distance between the two points identified by the two neural networks.
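The length approximation reduces to a Euclidean distance between the two predicted points. In the sketch below, the function name and the `pixels_per_unit` scale factor are hypothetical; the article does not describe how pixel distances were converted to physical lengths.

```python
import math


def fish_length(head, tail, pixels_per_unit=1.0):
    """Distance between the predicted head and tail points, scaled to length units."""
    return math.hypot(head[0] - tail[0], head[1] - tail[1]) / pixels_per_unit


# Hypothetical head and tail predictions in pixel coordinates
print(fish_length((0.0, 0.0), (30.0, 40.0)))
print(fish_length((0.0, 0.0), (30.0, 40.0), pixels_per_unit=10.0))
```

A straight-line distance underestimates the length of a fish that is curled on the ruler, which is one reason the result is only an approximation.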
Here is the overall scheme of the stages:
The overall design was fairly simple as video frames were passed through the stages outlined above before combining the separate results.
Silhouette analysis is a technique for measuring how well separated clusters of data points are from each other, which makes it useful for choosing the number of clusters.
A machine learning model is a product of a machine learning algorithm training on data. The model can later be used to produce relevant output for similar inputs.
Batumi, Adjara, Georgia
Member since September 12, 2016