Hot Topics in Computer Vision

In this project, participants are introduced to a current research- or industry-related topic. The aim is not to explore a specific area exhaustively. Instead, participants are confronted with the full complexity of a limited topic and challenged to take their own initiative. This provides insight into research and development in the field.

Please note: The materials for our lectures and exercises are only available through the network of the Bauhaus-Universität Weimar.

Current projects

Learning Robust Object Detection with Soft-Labels from Multiple Annotators

Relying on the predictions of data-driven models means trusting the ground-truth data, since models can only predict what they have learned. But what if the training data is very difficult to annotate because it requires expert knowledge, and the annotators might be wrong?

A possible way to address these issues is to annotate the data multiple times and extract the presumed ground truth via majority voting. However, there are other methods that use such data more effectively.
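
As an illustration, a minimal sketch of majority voting over bounding boxes from several annotators might look as follows; the box format, the IoU threshold, and the greedy clustering are simplifying assumptions, not the method ultimately used in the project.

import numpy as np

def iou(a, b):
    # Intersection over union of two boxes given as (x1, y1, x2, y2).
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def majority_vote(per_annotator_boxes, iou_thr=0.5):
    # Greedily cluster all annotators' boxes by IoU, then keep and average
    # every cluster supported by more than half of the annotators.
    n = len(per_annotator_boxes)
    clusters = []
    for boxes in per_annotator_boxes:
        for box in boxes:
            for cluster in clusters:
                if iou(box, cluster[0]) >= iou_thr:
                    cluster.append(box)
                    break
            else:
                clusters.append([box])
    return [np.mean(c, axis=0) for c in clusters if len(c) > n / 2]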

This project will explore existing methods to merge, vote on, or otherwise extract ground truth from multi-annotated images. For this purpose, the VinDr-CXR and TexBiG datasets are used, since both provide multi-annotated object detection data. Furthermore, deep neural network architectures will be modified and adapted to better utilize such data during training.

Gigapixels of Perfectly Calibrated Vision: Learning To Perform Subpixel-Accurate Calibration for High-End Multi-Camera Vision Systems

Multi-camera vision systems find application in a variety of 3D vision tasks, for example in free-viewpoint video (FVV) acquisition and motion capture. In such systems, multiple imaging sensors work in concert to capture overlapping, synchronized perspectives of the real world. To make use of the captured imagery, the geometric relationships between the cameras must be known with very high accuracy.

As the resolution of imaging sensors increases, so do the requirements on the accuracy of their calibration. Subtle defects, which occur during the manufacturing of virtually every lens and imaging sensor, lead to non-linear distortions that are hard to approximate accurately with classic parametric distortion models.
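
For orientation, the classic parametric baseline that such high-end systems must outperform can be sketched with OpenCV's chessboard-based calibration; the pattern size and image directory below are assumptions.

import glob
import cv2
import numpy as np

pattern = (9, 6)  # inner chessboard corners per row and column (assumed)
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)

obj_points, img_points = [], []
for fname in glob.glob("calib/*.png"):  # hypothetical calibration images
    gray = cv2.imread(fname, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        # Refine the detected corners to subpixel accuracy.
        corners = cv2.cornerSubPix(
            gray, corners, (11, 11), (-1, -1),
            (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
        obj_points.append(objp)
        img_points.append(corners)

# Estimate the intrinsics plus the classic parametric distortion
# coefficients (k1, k2, p1, p2, k3); the RMS reprojection error
# indicates the achieved accuracy.
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print(f"RMS reprojection error: {rms:.3f} px")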

In this project, students will acquire or deepen their knowledge of imaging fundamentals and photogrammetric computer vision, with an in-depth journey into the state of the art in accurate calibration techniques. The gained knowledge and skills will be applied to produce an interactive calibration technique for a scientific multi-camera vision system with a maximum throughput of 7.2 gigapixels per second.

Neural Radiance Fields (NeRF) for 3reCapSL Capturing Device

Neural radiance fields (NeRF) are an emerging technology for photorealistic view synthesis of 3D scenes. Unlike other approaches, the 3D geometry of the scene is not explicitly modelled and stored, but rather implicitly encoded in a multi-layer perceptron (MLP). NeRFs have produced impressive results for a number of applications. In this project, the applicability of NeRFs in the context of our 3reCapSL photo dome is explored.
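
To make the idea concrete, a minimal PyTorch sketch of such an MLP is given below: a 3D sample point is mapped through a positional encoding to a color and a volume density. Layer sizes and the number of encoding frequencies are illustrative assumptions; a complete NeRF additionally requires ray sampling and volume rendering.

import torch
import torch.nn as nn

def positional_encoding(x, n_freqs=10):
    # Map coordinates to sines and cosines of increasing frequency,
    # which lets the MLP represent high-frequency scene detail.
    feats = [x]
    for i in range(n_freqs):
        feats += [torch.sin(2.0 ** i * x), torch.cos(2.0 ** i * x)]
    return torch.cat(feats, dim=-1)

class TinyNeRF(nn.Module):
    def __init__(self, n_freqs=10, hidden=256):
        super().__init__()
        in_dim = 3 + 3 * 2 * n_freqs  # encoded 3D position
        self.n_freqs = n_freqs
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # (r, g, b, sigma)
        )

    def forward(self, xyz):
        out = self.mlp(positional_encoding(xyz, self.n_freqs))
        rgb = torch.sigmoid(out[..., :3])  # color in [0, 1]
        sigma = torch.relu(out[..., 3:])   # non-negative density
        return rgb, sigma

rgb, sigma = TinyNeRF()(torch.rand(1024, 3))  # 1024 sample points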

For that purpose, (1) a 3D reconstruction pipeline is to be scripted, (2) a deep understanding of and solid implementation skills for NeRFs are to be developed, and (3) extensions of NeRFs for recognition tasks are to be explored.

Supervised by Jan Frederick Eick, Paul Debus, and Christian Benz.

Completed projects

Generating 3D Interior Design for Point Cloud Scene Understanding and Room Layout Analysis

Data-driven algorithms require a reasonable amount of data. Especially for 3D scenes, the amount and kind of training data available for learning highly accurate and robust models is still very limited. In the course of the project, this lack of data is addressed by generating appealing 3D interior scenes to facilitate the learning of powerful models for scene understanding and room layout analysis.

The project is managed via the Moodle room.

Image Sharpness Assessment for Non-stationary Image Acquisition Platforms

The relevance of non-stationary platforms for image acquisition -- such as mobile phones or drones -- is steadily increasing for a variety of applications. The resolution (closely related to image sharpness) achievable for a given camera configuration and acquisition distance can be derived theoretically from the pinhole camera model. The practically obtained resolution may, however, be distinctly worse than the theoretically possible one. Factors such as (motion) blur, defocus, noise, and improper camera parameters can impair image quality.
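
For instance, the theoretical ground sampling distance (GSD) follows directly from the pinhole model; the camera parameters below are illustrative assumptions.

def ground_sampling_distance(distance_m, focal_mm, pixel_pitch_um):
    # Size of one pixel projected onto the object plane, in mm:
    # GSD = distance * pixel_pitch / focal_length (pinhole model).
    return distance_m * 1000.0 * (pixel_pitch_um / 1000.0) / focal_mm

# Example: 10 m acquisition distance, 50 mm lens, 4.4 µm pixel pitch.
gsd = ground_sampling_distance(10.0, 50.0, 4.4)
print(f"Theoretical GSD: {gsd:.2f} mm/pixel")  # 0.88 mm/pixel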

In this project, the practical resolution will be measured by means of a Siemens star. The goal is to implement a robust detection pipeline that automatically triggers a camera, transfers the image, detects the Siemens star, measures the ellipse of blur, and estimates the deviation between theoretical and practical resolution. The implementation will be deployed on the Nvidia Jetson Nano platform using, e.g., the Robot Operating System (ROS). By linking sensory information from the Jetson Nano with the estimated image resolution, it finally becomes possible to analyze and quantify the deteriorating effects of motion during acquisition.
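
One possible building block of such a pipeline is sketched below under simplifying assumptions (a known star center and a 10% modulation threshold): toward the center of a Siemens star the spokes become ever finer, so the radius at which their contrast collapses bounds the resolvable detail. Camera triggering, image transfer, and star detection are left out of the sketch.

import numpy as np
import cv2

def modulation_at_radius(gray, cx, cy, r, n_samples=720):
    # Michelson contrast of the intensity profile sampled along a circle.
    theta = np.linspace(0.0, 2.0 * np.pi, n_samples, endpoint=False)
    xs = (cx + r * np.cos(theta)).astype(np.float32)
    ys = (cy + r * np.sin(theta)).astype(np.float32)
    vals = cv2.remap(gray, xs[None], ys[None], cv2.INTER_LINEAR).ravel()
    vmax, vmin = float(vals.max()), float(vals.min())
    return (vmax - vmin) / (vmax + vmin + 1e-9)

def blur_radius(gray, cx, cy, r_max=400, threshold=0.1):
    # Walk inward from the star's rim; return the smallest radius at
    # which the spokes are still resolved above the threshold.
    for r in range(r_max, 2, -2):
        if modulation_at_radius(gray, cx, cy, r) < threshold:
            return r + 2  # last radius above the threshold
    return 2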

The project offers an interesting entry point into computer vision. You will learn the fundamental practical tools in computer vision, such as Python, OpenCV, gPhoto, Git, etc. Moreover, you will learn how to use and configure a real camera in order to take real images. Besides the basics, it will be possible to explore areas such as artificial neural networks or data analytics, if it benefits the project goal.

The project is managed via the Moodle room.

Shape of you: 3D Semantic Segmentation of Point Cloud Data

With the increasing availability and affordability of 3D sensors, such as laser scanners and the RGB-D systems in smartphones, 3D scans are becoming the new digital photograph. In the 2D image domain, we are already able to perform automatic detection of objects and pixel-wise segmentation into different categories. These tasks are dominated by the utilization of convolutional neural networks.
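
As a minimal example of such pixel-wise 2D segmentation with a pretrained convolutional network, one might use torchvision; the model choice and the input image path are assumptions.

import torch
from torchvision import models, transforms
from PIL import Image

# Load a pretrained segmentation CNN and the matching normalization.
model = models.segmentation.deeplabv3_resnet50(weights="DEFAULT").eval()
preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

img = Image.open("room.jpg").convert("RGB")  # hypothetical input image
with torch.no_grad():
    out = model(preprocess(img).unsqueeze(0))["out"]
labels = out.argmax(dim=1)  # one class label per pixel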

In this project, we will demonstrate how to create high-quality 3D scans of indoor environments for visualization tasks, computer games, and virtual reality applications. Using these 3D scans, we will then explore methods to analyze and segment the reconstructed geometric data. The goal is to understand and extend technologies that can be used to identify both basic shapes and complex objects of interest, such as works of art or museum artifacts.

Applied Deep Learning for Computer Vision

Deep learning for computer vision can be applied in different application domains such as autonomous driving, anomaly detection, document layout recognition, and many more. In recent years, these tasks have been solved with ever-evolving techniques, adding to a vast toolbox for deep learning researchers and practitioners alike.

The project is aimed at building a fundamental understanding of current techniques for constructing learning-based models, so that they can be applied to problems in the realm of 2D image segmentation, image retrieval, and 3D point cloud analysis.

Requirements:

  • Successful completion of the course “Image Analysis and Object Recognition” 
  • Good programming skills in Python

Auxiliary Materials:

  • Datacamp Python/Shell (free for course participants)
  • Udacity PyTorch Intro (free course)
  • Deep Learning Specialisation (free course)
  • Deep Learning with PyTorch (free ebook)

Drone Flight Path Planning

Drones have recently been applied more and more to the inspection of infrastructure. This project explores possibilities and approaches for efficient and complete mission planning. The project is coordinated via its Moodle room.

Separation of Reflectance Components

The project is managed via the learning platform Moodle. All documents and further information can be found in the Moodle course Separation of Reflectance Components 2020.

Combined Camera and Projector Calibration for Real-time Tracking and Mapping

The project is managed via the learning platform Moodle. All documents and further information can be found in the Moodle course Combined Camera and Projector Calibration 2020.

Neural Bauhaus Style Transfer

Whereas typical deep learning models only have discriminative capabilities -- basically classifying or regressing images or pixels -- generative adversarial networks (GANs) [1] are capable of generating, i.e. producing or synthesizing, new images. A whole movement has emerged around the CycleGAN [2,3] approach, which tries to apply the style of one image set (say, the paintings of Van Gogh) onto another (say, landscape photographs). The applicability of this approach for transferring Bauhaus style onto objects or buildings in images, or onto whole images, is to be explored. At the end of the project, a minor exploration of a seemingly different but closely related problem takes place: to what extent is the obtained GAN capable of augmenting a dataset of structural defect data?
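
At its core, CycleGAN [2] couples two generators via a cycle-consistency loss: translating an image to the other domain and back should reproduce it. A minimal PyTorch sketch follows, leaving the generator architectures abstract; G, F, and the weighting factor are placeholders.

import torch.nn as nn

l1 = nn.L1Loss()

def cycle_consistency_loss(G, F, real_x, real_y, lam=10.0):
    # G: X -> Y and F: Y -> X are the two generators; lam weights the
    # cycle term against the adversarial losses.
    loss_x = l1(F(G(real_x)), real_x)  # X -> Y -> X
    loss_y = l1(G(F(real_y)), real_y)  # Y -> X -> Y
    return lam * (loss_x + loss_y)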


Necessary requirements:
- IAOR passed
- good Python knowledge (see examples below)


Optional skills:
- deep learning
- PyTorch

References:
[1] Goodfellow, Ian, et al. "Generative adversarial nets." Advances in Neural Information Processing Systems. 2014.
[2] Zhu, Jun-Yan, et al. "Unpaired image-to-image translation using cycle-consistent adversarial networks." Proceedings of the IEEE International Conference on Computer Vision. 2017.
[3] https://junyanz.github.io/CycleGAN/

Python Examples:
If you can answer the following Python questions relatively quickly, you are good to go. Assume numpy has been imported as np throughout.

1) What does the 'self' refer to?
def compute_difference(self, x, y)...

2) What will be the output?
mat = np.array([[1,2,3],[4,5,6]])
print(mat[1,2])


3) Was the example above coded in python 2 or 3?

4) Does that code run? Why?
mat = np.array([[1,2,3],[4,5,6]])
print(mat[...,0])


5) Does that code run? Why?
mat = np.array([[1,2,3],[4,5,6]])
x,y = 5,3
print(mat[0,x/y])

6) How many elements does arr have?
arr = range(1,11,3)

7) What will be the last element of arr?
arr = [elem**2 for elem in range(5) if elem%2==1]