The following two sections discuss how to optimize image data flows in the ML and how to optimize module code.
Use a profiler to analyze your module code.
Even simple, innocuous-looking code fragments can cost a lot of time. Find out where the time is actually spent before you optimize code that does not matter.
Make sure that the time is really spent in your module.
Since an ML module usually does not work alone, it might happen that the time is spent in another module or in the ML internals. Loading images via networks, badly paged images, implicit data type conversions, changes to page extents, requests of big input subimages, etc. can require a lot of time which is not spent in your module.
Make your image processing algorithm inplace.
This is not a very powerful optimization, but it may result in a slight speed-up if you already have a fast algorithm.
Enable multithreading for calculateOutputSubImage().
This enables the ML to call calculateOutputSubImage() in parallel. However, please be sure that your algorithm in calculateOutputSubImage() is really thread-safe to avoid nasty bugs.
Avoid position calculations with 6D components.
Often, a straightforward position calculation handles 6D positions. Methods that take vectors or a number of coordinates as parameters are usually expensive, because they require voxel address calculations in all or many dimensions, which can become quite inefficient in inner loops. Try to set a cursor (setCursor*Pos()) outside a loop and use the moveTo*() commands to move the cursor within the loop. This usually results in a simple and fast pointer-add operation because the compiler normally inlines that code.
Try to avoid changes of page extents or be careful when selecting a new one.
Changing page extents can result in a lot of expensive internal copying to compose input subimages for other modules. Try to leave the extent of pages unchanged; then the internal ML optimizations can recycle pages and page references optimally. When setting a new page extent, try to select one that is neither too big nor too small, and whose edge lengths are powers of two. If possible, use the helper functions in Module to determine an optimal page extent.
Avoid inadequate page extents and inappropriate subimage requests.
Sometimes, page extents or image requests are not well suited. For example, when you have images with a page extent of (128x128x1x1x1x1) and request a subimage from (10,10,0,0,0,0) to (10,10,50,0,0,0), a line of voxels perpendicular to all pages is requested. Hence, a large number of pages is processed, and only one voxel is copied from each page, which is of course expensive. Think about the (sub)image requests done in the pipeline and use adequate page extents when feeding an image into a module pipeline.
When a module network generally works slice-based, e.g., with 2D viewers, 2D page extents are usually appropriate. When you work with 3D algorithms, which usually work volume-based, or when you reformat the image in different dimensions, 3D page extents might be useful; however, a 2D extent is also okay in most cases. To avoid administrative overhead, page extents should not be set too small.
Avoid page extents with dimensions that are higher than the dimension of the used image data, because otherwise the ML host has to manage unused data regions in pages.
Do not cast between data types and do not try to change data types from module inputs to outputs if not really necessary.
When you change data types, you are using cast operations that can become quite expensive on some systems, especially when casting floats to integers. This also inhibits inplace calculations and page recycling in the ML core.
Do not scale data if not really necessary.
When data is requested from the ML, voxel value scaling information is often passed along with the request so that the data is delivered in the right interval range. This can be expensive, since it often requires implicit casting operations.
Try to implement your algorithm page-based, i.e., select the optimal implementation approach for your algorithm.
Algorithms that are not page-based (i.e., global image processing approaches) lock much memory; they often force the operating system to perform virtual memory swapping, they fill up the ML cache, and they often change page extents in a module pipeline, i.e., they do not work optimally with the optimized ML concept. When you need such algorithms, try to use approaches such as the VirtualVolume approach (Section 2.3.7, “VirtualVolume”) to merge global image processing with page-based approaches. Selecting the correct implementation approach can drastically speed up your algorithm. See Chapter 4, Image Processing Concepts, for a detailed discussion of such approaches.
Request input subimages in "read only" mode.
The ML can pass pointers to cache pages directly as input subimages. That reduces memory allocations and copying in some cases. Note that this mode may not be available in some ML versions.
© 2024 MeVis Medical Solutions AG