Appendix B. Optimizing Image Processing

Table of Contents

B.1. Optimizing Module Code
B.2. Optimizing Data Flow in Module Networks

The following two sections discuss how to optimize module code and how to optimize image data flow in module networks.

B.1. Optimizing Module Code

  • Use a profiler to analyze your module code.

    Even very simple and inconspicuous code fragments can cost a lot of time. Before optimizing irrelevant code, find out where the time is actually spent.
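
    If no full profiler is at hand, a first coarse estimate can be obtained with standard C++ timers. The following is a minimal sketch using std::chrono; processOneSlice() is only a hypothetical stand-in for your own code fragment:

      #include <chrono>
      #include <iostream>

      // Hypothetical stand-in for the code fragment under suspicion.
      void processOneSlice()
      {
        volatile double sum = 0.0;
        for (int i = 0; i < 100000; ++i) { sum = sum + i * 0.5; }
      }

      int main()
      {
        const auto start = std::chrono::steady_clock::now();
        for (int i = 0; i < 100; ++i) {
          processOneSlice();            // fragment to be measured
        }
        const auto stop = std::chrono::steady_clock::now();
        const auto ms =
          std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
        std::cout << "100 iterations took " << ms << " ms" << std::endl;
        return 0;
      }

    Such ad-hoc timing only narrows down the suspects; a real profiler still gives the more reliable picture.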

  • Make sure that the time is really spent in your module.

    Since an ML module usually does not work alone, the time might actually be spent in another module or in the ML internals. Loading images over a network, badly paged images, implicit data type conversions, changes of page extents, requests for large input subimages, etc. can require a lot of time that is not spent in your module.

  • Make your image processing algorithm inplace.

    This is not a very powerful optimization, but it may result in a slight speed-up if you already have a fast algorithm, because the output page can reuse the buffer of the input page instead of allocating a new one.
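
    A minimal sketch of how inplace computation is typically registered, assuming the setOutputImageInplace()/unsetOutputImageInplace() interface of ml::Module and the two-argument calculateOutputImageProperties() of recent ML versions; MyModule is a hypothetical Module subclass, so check your SDK documentation for the exact names and preconditions:

      void MyModule::calculateOutputImageProperties(int outputIndex, PagedImage* outputImage)
      {
        // Inplace computation only pays off if input and output pages are
        // binary-compatible, i.e., same data type (and unchanged page extent).
        if (getInputImage(0) &&
            (outputImage->getDataType() == getInputImage(0)->getDataType())) {
          setOutputImageInplace(outputIndex, 0);   // reuse the buffer of input 0
        } else {
          unsetOutputImageInplace(outputIndex);
        }
      }

    In addition, the algorithm itself must tolerate that input and output share the same buffer; simple point-wise operations are the typical case.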

  • Enable multithreading for calculateOutputSubImage().

    This allows the ML to call calculateOutputSubImage() in parallel for different pages. However, be sure that the code in calculateOutputSubImage() really is thread-safe (e.g., it does not write to member variables or other shared state) to avoid nasty bugs.
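
    Multithreading is typically switched on once in the module constructor; a sketch, assuming Module::setThreadSupport() and the mode name ML_CALCULATE_OUTPUTSUBIMAGE_ON_STD_TYPES (the exact enumerator may differ between ML versions):

      MyModule::MyModule() : Module(1, 1)   // one input image, one output image
      {
        // Permit parallel calls of calculateOutputSubImage() for different pages.
        // Assumption: enumerator name as in recent ML versions.
        setThreadSupport(ML_CALCULATE_OUTPUTSUBIMAGE_ON_STD_TYPES);
      }

    With this setting, calculateOutputSubImage() must not modify member variables or other shared state without synchronization.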

  • Avoid position calculations with 6D components.

    Often, a straightforward position calculation handles full 6D positions. Methods that take vectors or a number of coordinates as parameters are usually expensive, because they require voxel address calculations in all or many dimensions, which can become quite inefficient in inner loops. Try to set a cursor (setCursor*Pos()) outside the loop and use the moveTo*() commands to move the cursor within the loop; this usually results in a simple and fast pointer-add operation because the compiler normally inlines that code.
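
    As an alternative to per-voxel 6D arithmetic, the voxel address can also be computed once per row and then advanced by plain pointer increments. A sketch in the style of the usual calculateOutputSubImage() template, assuming TSubImage::getImagePointer() and getValidRegion(), and assuming that the input subimage covers the same region as the output page (the default):

      template <typename T>
      void MyModule::calculateOutputSubImage(TSubImage<T>* outSubImg, int /*outIndex*/,
                                             TSubImage<T>* inSubImg)
      {
        const SubImageBox box = outSubImg->getValidRegion();
        ImageVector p;
        for (p.u = box.v1.u; p.u <= box.v2.u; ++p.u)
         for (p.t = box.v1.t; p.t <= box.v2.t; ++p.t)
          for (p.c = box.v1.c; p.c <= box.v2.c; ++p.c)
           for (p.z = box.v1.z; p.z <= box.v2.z; ++p.z)
            for (p.y = box.v1.y; p.y <= box.v2.y; ++p.y) {
              p.x = box.v1.x;
              // Address calculation once per row ...
              const T* in  = inSubImg->getImagePointer(p);
              T*       out = outSubImg->getImagePointer(p);
              // ... and only pointer increments in the innermost loop.
              for (MLint x = box.v1.x; x <= box.v2.x; ++x) {
                *out++ = *in++;
              }
            }
      }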

  • Try to avoid changes of page extents or be careful when selecting a new one.

    Changing page extents can result in a lot of expensive internal copying to compose input subimages for other modules. Try to leave the extent of pages unchanged; then the internal ML optimizations can recycle pages and page references optimally. When setting a new page extent, select one that is neither too big nor too small and whose components are powers of two. If possible, use the helper functions in Module to determine an optimal page extent.
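
    If a new page extent really is required, it is typically set in calculateOutputImageProperties(); a sketch, assuming PagedImage::setPageExtent() and the six-component ImageVector constructor (verify the names against your ML version and prefer the Module helper functions where available):

      void MyModule::calculateOutputImageProperties(int /*outputIndex*/, PagedImage* outputImage)
      {
        // By default the output inherits the page extent of the input image,
        // which allows the ML to recycle pages; override it only when needed.
        // A power-of-two 2D extent is often a good choice for slice-based use.
        outputImage->setPageExtent(ImageVector(128, 128, 1, 1, 1, 1));
      }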

  • Avoid inadequate page extents and inappropriate subimage requests.

    Sometimes, page extents or image requests are not well suited. For example, if you have images with a page extent of (128x128x1x1x1x1) and request a subimage from (10,10,0,0,0,0) to (10,10,50,0,0,0), a line of voxels perpendicular to all pages is requested. Hence, a large number of pages is processed, and only one voxel is copied from each page, which is of course expensive. Think about the (sub)image requests done in the pipeline and use adequate page extents when feeding an image into a module pipeline. A rough estimate of the number of pages touched by such a request is sketched at the end of this item.

    When a module network works mainly slice-based, e.g., with 2D viewers, 2D page extents are usually appropriate. When you work with 3D algorithms that operate volume-based, or when you reformat the image along different dimensions, 3D page extents might be useful; however, a 2D extent is also okay in most cases. To avoid administrative overhead, page extents should not be set too small.

    Avoid page extents with a higher dimension than the image data actually uses, because otherwise the ML host has to manage unused data regions in the pages.
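
    For the line request discussed above, the cost can be estimated from the page extent and the requested box; a small self-contained sketch with plain arithmetic and no ML calls:

      #include <cstdint>
      #include <iostream>

      // Pages touched along one axis when voxels v1..v2 are requested from
      // pages of the given extent along that axis.
      std::int64_t pagesTouched(std::int64_t v1, std::int64_t v2, std::int64_t pageExtent)
      {
        return v2 / pageExtent - v1 / pageExtent + 1;
      }

      int main()
      {
        // Page extent 128x128x1, request from (10,10,0) to (10,10,50):
        const std::int64_t pages = pagesTouched(10, 10, 128)    // x: 1 page
                                 * pagesTouched(10, 10, 128)    // y: 1 page
                                 * pagesTouched(0, 50, 1);      // z: 51 pages
        std::cout << "pages touched: " << pages << std::endl;   // 51, one voxel each
        return 0;
      }

    With a 3D page extent such as (128x128x64x1x1x1), the same request would touch only a single page.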

  • Do not cast between data types, and do not change the data type from module input to output if not really necessary.

    Changing data types requires cast operations that can become quite expensive on some systems, especially when casting floating-point values to integers. It also inhibits inplace calculations and page recycling in the ML core.
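
    In calculateOutputImageProperties(), this simply means leaving the data type as it was inherited from the input image; a sketch (the commented-out call uses setDataType() and the MLfloatType constant, treat both as assumptions about your ML version):

      void MyModule::calculateOutputImageProperties(int /*outputIndex*/, PagedImage* /*outputImage*/)
      {
        // The output image properties are initialized from the first input
        // image, so the data type already matches the input. Leaving it
        // untouched avoids per-voxel casts and keeps inplace computation and
        // page recycling possible.
        //
        // Only force a different type if the algorithm really requires it:
        // outputImage->setDataType(MLfloatType);
      }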

  • Do not scale data if not really necessary.

    When data is requested from the ML, the request often passes voxel value scaling information so that the data is delivered in the desired value range. This can lead to expensive operations, since it usually requires per-voxel scaling and implicit cast operations.

  • Try to implement your algorithm page-based, i.e., select the optimal implementation approach for your algorithm.

    Algorithms that are not page-based (i.e., global image processing approaches) lock a lot of memory; they often force the operating system to perform virtual memory swapping, they fill up the ML cache, and they often change page extents in a module pipeline, i.e., they do not work well with the page-based ML concept. When you need such algorithms, try to use approaches such as the VirtualVolume approach (Section 2.3.7, “VirtualVolume”) to merge global image processing with page-based approaches. Selecting the correct implementation approach can drastically speed up your algorithm. See Chapter 4, Image Processing Concepts, for a detailed discussion of such approaches.

  • Request input subimages in "read only" mode.

    The ML can pass pointers to cache pages directly as input subimages. That reduces memory allocations and copying in some cases. Note that this mode may not be available in some ML versions.
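
    Where this mode exists, it is typically requested per input in calculateOutputImageProperties(); a sketch, assuming a PagedImage method along the lines of setInputSubImageIsReadOnly() (treat the name as an assumption and check whether your ML version offers it):

      void MyModule::calculateOutputImageProperties(int /*outputIndex*/, PagedImage* outputImage)
      {
        // Promise that calculateOutputSubImage() will not modify the voxels of
        // input 0, so the ML may pass a pointer to the cached page directly
        // instead of copying the data into a separate input subimage buffer.
        outputImage->setInputSubImageIsReadOnly(0, true);
      }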