
Feb 17 2022

How Apache SIS Simplifies the Hidden Complexity of Coordinate Systems in IVAAP

See how Apache SIS helps IVAAP support our clients’ coordinate systems with less code.

With the recent release of IVAAP 2.9, now is a good time to reflect on the help we got along the way. One of the components that made IVAAP possible is the Apache SIS library. The goal of this blog article is to bring more visibility to this awesome Java library.

What Is the Apache SIS Library?

Apache SIS is an open-source library written in Java that makes it easier to develop geospatial applications. Many datasets have metadata designating their location on Earth, and these locations are relative to a datum and a map projection method. There are many datums and many map projection methods. Apache SIS facilitates their identification and the accurate conversion of coordinates between them.

What’s a Datum and What’s a Map Projection Method?

Most people are familiar with latitude and longitude coordinates. This geographic coordinate system has been used for maritime and land-based navigation for centuries. Since the late 1800s, the line defining 0º of longitude has been the prime meridian, which crosses the location of the Royal Observatory in Greenwich, England. This meridian defines one axis, running from south to north; the equator defines the other, running from west to east. The origin point of this system on the Earth’s surface is in the Gulf of Guinea, 600 km off the coast of West Africa.

The traditional geographic coordinate system (Source)

 

Similarly, a datum defines the origin point of the coordinate axes on the Earth’s surface and defines the direction of the axes. To account for the fact that the Earth is not a perfect sphere, a datum also describes the generalized shape of the Earth. For example, WGS 84 (World Geodetic System 1984) is a widely-used global datum based on latitudes and longitudes where the Earth is modeled as an oblate spheroid, centered around its center of mass.

The WGS 84 reference frame. The oblateness of the ellipsoid is exaggerated in this image. (Source)

 

WGS 84 is used by GPS receivers and the longitude 0º of this datum is actually 335 ft east of the Greenwich meridian.

While universal latitude and longitude coordinates are convenient, they are not always practical because land masses drift. Satellite measurements show that the location of Houston relative to the WGS 84 datum changes by about 1 inch each year. A local datum is a more pragmatic choice than a global datum because distances from a local point of reference are smaller and don’t change over the years as long as all locations are on the same tectonic plate. A local datum may also align its spheroid to closely fit the Earth’s surface in that particular area.

A map projection method indicates how the Earth’s surface is flattened into a plane in order to make a 2D map. The most widely known projection method was presented by Gerardus Mercator in 1569. This is a global cylindrical projection method. It preserves local directions and shapes but distorts sizes away from the equator.

An example of global cylindrical projection (Source)

 

In the US, the Lambert Conformal Conic projection has become a standard projection for mapping large areas. This is a projection that requires multiple parameters, defining the longitude and latitude of its center, a distance offset to this center, and the latitude of its southern and northern parallels.

An example of local conical projection (Source)

 

When a datum and a projection are used together, they define a projected coordinate reference system. While local systems limit distortions, they are only valid in a small area, an area known as the “area of use” where a minimum level of accuracy is guaranteed.

A screenshot from INTViewer showing the area of use of NAD27 / Wyoming East Central, a derived projected coordinate reference system

 

How Does Apache SIS Help IVAAP?

To show geoscience datasets on one of IVAAP’s 2D map widgets, you need to use a common visualization coordinate reference system.

An IVAAP screenshot showing the location of wells on a map

 

This is where Apache SIS helps us: It understands the properties of both the data and visualization systems and is able to convert coordinates between them.

The math to perform these conversions is complex; it is not something you want to implement on your own. It requires specialized skills, both as a programmer and as a domain expert. And beyond the math, the number of datums and projection methods is mind-boggling. Many historical surveys are still in use today. For example, there are two datums used for making horizontal measurements in North America: the North American Datum of 1927 (NAD 27) and the North American Datum of 1983 (NAD 83). The two datums are based on two different ellipsoid models, and as a result, coordinates in the two datums can differ by up to 100 meters, depending on location. IVAAP is able to visualize datasets that used NAD 27 as a reference, and it is Apache SIS that makes it possible to accurately reproject these coordinates into modern coordinate systems, accounting for the datum shift.

The datum shift between NAD 27 and NAD 83 (Source)
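
As an illustration (a minimal sketch, not code taken from IVAAP), this is how Apache SIS can be asked to reproject NAD 27 coordinates into NAD 83; the EPSG codes 4267 and 4269 identify the two geographic systems, and the operation returned by Apache SIS embeds whichever datum shift it selected:

CoordinateReferenceSystem nad27 = CRS.forCode("EPSG:4267"); // NAD 27, geographic
CoordinateReferenceSystem nad83 = CRS.forCode("EPSG:4269"); // NAD 83, geographic
CoordinateOperation operation = CRS.findOperation(nad27, nad83, null); // no extent supplied here
double[] point = {29.76, -95.36}; // axis order follows the CRS definition (latitude, longitude)
operation.getMathTransform().transform(point, 0, point, 0, 1); // reprojects the point in place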

 

The oil and gas industry is at the origin of some of these local coordinate systems. Many of today’s new oil fields are in remote areas that initially lack a geographical survey. An organization called the “OGP Surveying and Positioning Committee”, colloquially known as “EPSG” for historical reasons, keeps track of these coordinate systems and regularly provides a database of them to all its members. This database is used by IVAAP, and Apache SIS provides a simple API to take advantage of it. Each record in this database has a numerical WKID (Well Known ID). To instantiate a projection method or a coordinate system defined in this database, you just need to prefix this ID with the “EPSG:” string.

OperationMethod method = getCoordinateOperationFactory().getOperationMethod("EPSG:9807"); // Transverse Mercator method

CoordinateReferenceSystem crs = CRS.forCode("EPSG:32056"); // NAD27 / Wyoming East Central

 

The EPSG database itself is extensive, but it is common for INT customers to use unlisted coordinate reference systems, created for brand new oil fields. In these cases, a WKT (Well Known Text) string can be used instead. This text is a human-readable description of a projection method or coordinate system. Apache SIS provides a clean API to parse WKTs. It also provides an API for formula-based projection methods that can’t be described by a WKT.

PROJCS["NAD27 / Wyoming East Central",
    GEOGCS["NAD27",
        DATUM["North_American_Datum_1927",
            SPHEROID["Clarke 1866",6378206.4,294.9786982139006,
                AUTHORITY["EPSG","7008"]],
            AUTHORITY["EPSG","6267"]],
        PRIMEM["Greenwich",0,
            AUTHORITY["EPSG","8901"]],
        UNIT["degree",0.0174532925199433,
            AUTHORITY["EPSG","9122"]],
        AUTHORITY["EPSG","4267"]],
    PROJECTION["Transverse_Mercator"],
    PARAMETER["latitude_of_origin",40.66666666666666],
    PARAMETER["central_meridian",-107.3333333333333],
    PARAMETER["scale_factor",0.999941177],
    PARAMETER["false_easting",500000],
    PARAMETER["false_northing",0],
    UNIT["US survey foot",0.3048006096012192,
        AUTHORITY["EPSG","9003"]],
    AXIS["X",EAST],
    AXIS["Y",NORTH],
    AUTHORITY["EPSG","32056"]]

The WKT of NAD27 / Wyoming East Central, with the WKID 32056
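
Parsing such a text with Apache SIS is a single call. Here is a minimal sketch, with the WKT shortened for readability:

String wkt = "PROJCS[\"NAD27 / Wyoming East Central\", …"; // the full WKT listed above
CoordinateReferenceSystem crs = CRS.fromWKT(wkt); // throws FactoryException if the text is malformed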

Why Did INT Choose Apache SIS Over Other Options?

INT had previous experience using GeoTools. Similarly to Apache SIS, GeoTools is a Java library dedicated to facilitating the implementation of geographical information systems. Being an older library, it goes much further than Apache SIS: one of its components allows the parsing of shapefiles, for example, something currently outside of the scope of Apache’s library. As a matter of fact, the first versions of IVAAP used GeoTools for coordinate conversions.

One of the issues we encountered with GeoTools is that it only provides fine-grained Java conversion APIs. There are several paths to convert coordinates between two systems, and GeoTools leaves it to the developer to choose the best method. Choosing the “best” method without human interaction is complex; it depends on the extent of the data being manipulated and on the “area of use” of each coordinate reference system involved. It also depends on the availability of well-known transformation algorithms between datums. In North America, the standard for transformations between datums was formerly known as NADCON; the rest of the world uses a standard known as NTv2. Apache SIS works with both datum shift standards, and may elect to use WGS 84 as a hub when no datum shift is applicable. An algorithm to pick the best method would have required a significant amount of code for INT to write and maintain. While Apache SIS allows fine-grained control over the different transformations used when converting from one coordinate reference system into another, it also provides a high-level API that performs this conversion and picks the best algorithm as part of its implementation. This high-level Java API matches the general-purpose microservice that IVAAP exposes for the same function. To pick the right algorithm, it only takes three parameters:

  • A definition of the “from” coordinate system
  • A definition of the “to” coordinate system
  • A description of the “extent” of the coordinates to convert
double x = …
double y = …
GeographicBoundingBox extentInLongLat = …
DirectPosition position = new DirectPosition2D(x, y);
CoordinateReferenceSystem fromCrs = CRS.forCode("EPSG:32056");
CoordinateReferenceSystem toCrs = CRS.forCode("EPSG:3737");
CoordinateReferenceSystem displayOrientedFromCrs = AbstractCRS.castOrCopy(fromCrs).forConvention(AxesConvention.DISPLAY_ORIENTED);
CoordinateReferenceSystem displayOrientedToCrs = AbstractCRS.castOrCopy(toCrs).forConvention(AxesConvention.DISPLAY_ORIENTED);
CoordinateOperation operation = CRS.findOperation(displayOrientedFromCrs, displayOrientedToCrs, extentInLongLat);
MathTransform mathTransform = operation.getMathTransform();
double[] coordinate = mathTransform.transform(position, position).getCoordinate();

Sample code to convert a single x, y position from “NAD27 / Wyoming East Central” to “NAD83 / Wyoming East Central”

We still use GeoTools for other parts, but as a general rule, the Apache SIS Java API tends to be simpler and more modern than GeoTools when it comes to manipulating coordinates and coordinate systems.

After three years of use, we are happy with our decision to move to Apache SIS. This library allows us to support more of our customers’ coordinate systems, with less code. We are also planning to use it to interpret the metadata of GeoTIFF files. The support has been excellent: when we needed help, the members of the Apache SIS development team were keen to assist. This is one of the reasons why INT felt we needed to give back to the open-source community. As a long-time member of OSDU, INT contributed to OSDU a coordinate conversion library built on top of Apache SIS. This library converts GeoJSON and trajectory stations between different coordinate reference systems, and users can specify the transformation steps used in the conversion, either through EPSG codes or WKTs. Behind the scenes, it is Apache SIS’s fine-grained API that does the work.


Filed Under: IVAAP Tagged With: apache, apache sis, ivaap, java, maps

Jan 12 2021

Comparing Storage APIs from Amazon, Microsoft and Google Clouds

One of the unique capabilities of IVAAP is that it works with the cloud infrastructure of multiple vendors. Whether your SEGY file is posted on Microsoft Azure Blob Storage, Amazon S3 or Google Cloud Storage, IVAAP will be capable of visualizing it.

It’s only when administrators register new connectors that vendor-specific details need to be entered. For all other users, the user interface will be identical regardless of the data source. The REST API consumed by IVAAP’s HTML5 client is common to all connectors as well. The key component that does the hard work of “speaking the language of each cloud vendor and hiding their details from the other components” is the IVAAP Data Backend.

While the concept of “storage in the cloud” is similar across all three vendors, they each provide a different API to achieve similar goals. In this article, we will compare how to implement four basic operations. Because the IVAAP Data Backend is written in Java, we’ll only compare Java APIs.
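
To frame the comparison, here is a hypothetical sketch (the names are ours, not IVAAP’s actual API) of the kind of vendor-neutral interface that a component like the IVAAP Data Backend can hide the three SDKs behind; the rest of this article shows how each vendor’s SDK maps to these four operations.

import java.io.InputStream;
import java.time.Instant;
import java.util.List;

// Hypothetical abstraction: each cloud connector implements these four operations.
public interface CloudObjectStore {

    boolean exists(String container, String key) throws Exception;

    Instant lastModified(String container, String key) throws Exception;

    InputStream openStream(String container, String key) throws Exception;

    List<String> listFolders(String container, String parentFolderPath) throws Exception;
}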

 

Checking that an Object or Blob Exists

Amazon S3

String awsAccessKey = …
String awsSecretKey = …
String region = …
String bucketName = …
String keyName = …
AwsCredentials credentials = AwsBasicCredentials.create(awsAccessKey, awsSecretKey);
S3Client s3Client = S3Client.builder().credentialsProvider(StaticCredentialsProvider.create(credentials)).region(Region.of(region)).build();
try {
    HeadObjectRequest request = HeadObjectRequest.builder().bucket(bucketName).key(keyName).build();
    s3Client.headObject(request);
    return true;
} catch (NoSuchKeyException e) {
    return false;
}

Microsoft Azure Blob Storage

String accountName = …
String accountKey = …
String containerName = …
String blobName = ...
StorageSharedKeyCredential credential = new StorageSharedKeyCredential(accountName, accountKey);
String endpoint = String.format(Locale.ROOT, "https://%s.blob.core.windows.net", accountName);
BlobServiceClientBuilder builder = new BlobServiceClientBuilder().endpoint(endpoint).credential(credential);
BlobServiceClient client = builder.buildClient();
BlobContainerClient containerClient = client.getBlobContainerClient(containerName);
BlobClient blobClient = containerClient.getBlobClient(blobName);
return blobClient.exists();

Google Cloud Storage

String authKey = …
String projectId = …
String bucketName = …
String blobName = ...
ObjectMapper mapper = new ObjectMapper();
JsonNode node = mapper.readTree(authKey);
ByteArrayInputStream in = new ByteArrayInputStream(mapper.writeValueAsBytes(node));
GoogleCredentials credentials = GoogleCredentials.fromStream(in);
Storage storage = StorageOptions.newBuilder().setCredentials(credentials)
                        .setProjectId(projectId)
                        .build()
                        .getService();
Blob blob = storage.get(bucketName, blobName, BlobGetOption.fields(BlobField.ID));
return blob != null; // get() returns null when the blob does not exist

 

Getting the Last Modification Date of an Object or Blob

Amazon S3

String awsAccessKey = …
String awsSecretKey = …
String region = …
String bucketName = …
String keyName = …
AwsCredentials credentials = AwsBasicCredentials.create(awsAccessKey, awsSecretKey);
S3Client s3Client = S3Client.builder().credentialsProvider(StaticCredentialsProvider.create(credentials)).region(Region.of(region)).build();
HeadObjectRequest headObjectRequest = HeadObjectRequest.builder()
.bucket(bucketName)
.key(keyName)
.build();
HeadObjectResponse headObjectResponse = s3Client.headObject(headObjectRequest);
return headObjectResponse.lastModified();

Microsoft Azure Blob Storage

String accountName = …
String accountKey = …
String containerName = …
String blobName = …
StorageSharedKeyCredential credential = new StorageSharedKeyCredential(accountName, accountKey);
String endpoint = String.format(Locale.ROOT, "https://%s.blob.core.windows.net", accountName);
BlobServiceClientBuilder builder = new BlobServiceClientBuilder()
.endpoint(endpoint)
.credential(credential);
BlobServiceClient client = builder.buildClient();
BlobContainerClient containerClient = client.getBlobContainerClient(containerName);
BlobClient blob = containerClient.getBlobClient(blobName);
BlobProperties properties = blob.getProperties();
return properties.getLastModified();

Google Cloud Storage

String authKey = …
String projectId = …
String bucketName = …
String blobName = …
ObjectMapper mapper = new ObjectMapper();
JsonNode node = mapper.readTree(authKey);
ByteArrayInputStream in = new ByteArrayInputStream(mapper.writeValueAsBytes(node));
GoogleCredentials credentials = GoogleCredentials.fromStream(in);
Storage storage = StorageOptions.newBuilder().setCredentials(credentials)
                        .setProjectId(projectId)
                        .build()
                        .getService();
Blob blob = storage.get(bucketName, blobName,  BlobGetOption.fields(Storage.BlobField.UPDATED));
return blob.getUpdateTime();

 

Getting an Input Stream out of an Object or Blob

Amazon S3

String awsAccessKey = …
String awsSecretKey = …
String region = …
String bucketName = …
String keyName = …
AwsCredentials credentials = AwsBasicCredentials.create(awsAccessKey, awsSecretKey);
S3Client s3Client = S3Client.builder().credentialsProvider(StaticCredentialsProvider.create(credentials)).region(Region.of(region)).build();
GetObjectRequest getObjectRequest = GetObjectRequest.builder()
.bucket(bucketName)
.key(keyName)
.build();
return s3Client.getObject(getObjectRequest);

Microsoft Azure Blob Storage

String accountName = …
String accountKey = …
String containerName = …
String blobName = …
StorageSharedKeyCredential credential = new StorageSharedKeyCredential(accountName, accountKey);
String endpoint = String.format(Locale.ROOT, "https://%s.blob.core.windows.net", accountName);
BlobServiceClientBuilder builder = new BlobServiceClientBuilder()
.endpoint(endpoint)
.credential(credential);
BlobServiceClient client = builder.buildClient();
BlobContainerClient containerClient = client.getBlobContainerClient(containerName);
BlobClient blob = containerClient.getBlobClient(blobName);
return blob.openInputStream();

Google Cloud Storage

String authKey = …
String projectId = …
String bucketName = …
String blobName = …
ObjectMapper mapper = new ObjectMapper();
JsonNode node = mapper.readTree(authKey);
ByteArrayInputStream in = new ByteArrayInputStream(mapper.writeValueAsBytes(node));
GoogleCredentials credentials = GoogleCredentials.fromStream(in);
Storage storage = StorageOptions.newBuilder().setCredentials(credentials)
                        .setProjectId(projectId)
                        .build()
                        .getService();
Blob blob = storage.get(bucketName, blobName,  BlobGetOption.fields(BlobField.values()));
return Channels.newInputStream(blob.reader());

 

Listing the Objects in a Bucket or Container While Taking into Account Folder Hierarchies

S3

String awsAccessKey = …
String awsSecretKey = …
String region = …
String bucketName = …
String parentFolderPath = ...
AwsCredentials credentials = AwsBasicCredentials.create(awsAccessKey, awsSecretKey);
S3Client s3Client = S3Client.builder().credentialsProvider(StaticCredentialsProvider.create(credentials)).region(Region.of(region)).build();
ListObjectsV2Request.Builder builder = ListObjectsV2Request.builder().bucket(bucketName).delimiter("/").prefix(parentFolderPath + "/");
ListObjectsV2Request request = builder.build();
ListObjectsV2Iterable paginator = s3Client.listObjectsV2Paginator(request);
Iterator<CommonPrefix> foldersIterator = paginator.commonPrefixes().iterator();
while (foldersIterator.hasNext()) {
…
}

Microsoft

String accountName = …
String accountKey = …
String containerName = …
String parentFolderPath = ...
StorageSharedKeyCredential credential = new StorageSharedKeyCredential(accountName, accountKey);
String endpoint = String.format(Locale.ROOT, "https://%s.blob.core.windows.net", accountName);
BlobServiceClientBuilder builder = new BlobServiceClientBuilder()
.endpoint(endpoint)
.credential(credential);
BlobServiceClient client = builder.buildClient();
BlobContainerClient containerClient = client.getBlobContainerClient(containerName);
Iterable<BlobItem> iterable = containerClient.listBlobsByHierarchy(parentFolderPath + "/");
for (BlobItem currentItem : iterable) {
   …
}

Google

String authKey = …
String projectId = …
String bucketName = …
String parentFolderPath = ...
ObjectMapper mapper = new ObjectMapper();
JsonNode node = mapper.readTree(authKey);
ByteArrayInputStream in = new ByteArrayInputStream(mapper.writeValueAsBytes(node));
GoogleCredentials credentials = GoogleCredentials.fromStream(in);
Storage storage = StorageOptions.newBuilder().setCredentials(credentials)
                        .setProjectId(projectId)
                        .build()
                        .getService();
Page<Blob> blobs = storage.list(bucketName, BlobListOption.prefix(parentFolderPath + "/"), BlobListOption.currentDirectory());
for (Blob currentBlob : blobs.iterateAll()) {
 ...
}

 

Most developers will discover these APIs by leveraging their favorite search engine. Driven by innovation and performance, cloud APIs become obsolete quickly. Amazon was the pioneer, and much of the documentation still indexed by Google covers the v1 SDK, even though v2 has been available for more than two years (and is not a complete replacement for v1). This sometimes makes research challenging for the simplest needs. Microsoft migrated from v8 to v12 a bit more recently and has a similar challenge to overcome. Being the most recent major player, the Google SDK is not dragged down much by obsolete articles.

The second way that developers will discover an API is by using the official documentation. I found that the Microsoft documentation is the most accessible. There is a definite feel that the Microsoft Azure documentation is treated as an important part of the product, with lots of high-quality sample code targeted at beginners.

The third way that developers discover an API is by using their IDE’s code completion. All cloud vendors make heavy use of the builder pattern. The builder pattern is a powerful way to provide options without breaking backward compatibility, but slows down the self-discovery of the API. The Amazon S3 API also stays quite close to the HTTP protocol, using terminology such as “GetRequest” and “HeadRequest”. Microsoft had a higher level API in v8 where you were manipulating blobs. The v12 iteration moved away from apparent simplicity by introducing the concept of blob clients instead. Microsoft offers a refreshing explanation of this transition. Overall, I found that the Google SDK tends to offer simpler APIs for performing simple tasks.
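
As a reminder of what this pattern looks like in practice, here is a generic sketch of a builder, unrelated to any specific SDK: new options can be added to the builder over time without breaking existing call sites.

// Usage: StorageRequest request = StorageRequest.builder().bucket("my-bucket").key("my-file.segy").build();
public final class StorageRequest {

    private final String bucket;
    private final String key;

    private StorageRequest(Builder builder) {
        this.bucket = builder.bucket;
        this.key = builder.key;
    }

    public static Builder builder() {
        return new Builder();
    }

    public String bucket() { return bucket; }
    public String key()    { return key; }

    public static final class Builder {
        private String bucket;
        private String key;

        public Builder bucket(String bucket) { this.bucket = bucket; return this; }
        public Builder key(String key)       { this.key = key; return this; }

        public StorageRequest build() { return new StorageRequest(this); }
    }
}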

There are more criteria than simplicity and discoverability when comparing APIs; versatility and performance are two of them. The Amazon S3 Java SDK is probably the most versatile because of the larger number of applications that have used its technology. It even works with S3 clones such as MinIO Object Storage (and so does IVAAP). The space where there are still a lot of changes is asynchronous APIs. Asynchronous APIs tend to offer higher scalability and faster execution, but they can only be compared in the specific use cases where they are actually needed. IVAAP makes heavy use of asynchronous APIs, especially to visualize seismic data. This area evolves rapidly and deserves a more in-depth comparison, which would be the subject of another article.

For more information on IVAAP, please visit www.int.com/products/ivaap/

 


Filed Under: IVAAP Tagged With: API, cloud, Google, ivaap, java, Microsoft

Nov 09 2020

Human Friendly Error Handling in the IVAAP Data Backend

As the use cases of IVAAP grow, the implementation of the data backend evolves. Past releases of IVAAP have been focused on providing data portals to our customers. Since then, a new use case has appeared where IVAAP is used to validate the ingestion of data in the cloud. Both use cases have a lot in common, but they differ in the way errors should be handled.

In a portal, when a dataset fails to load, the reason why needs to stay “hidden” from end users. The inner workings of the portal and its data storage mechanisms should not be exposed, as they are irrelevant to the user trying to open a new dataset. When IVAAP is used to validate the results of an ingestion workflow, many more details about where the data is and how it failed to load need to be communicated. And these details should be expressed in a human friendly way.

To illustrate the difference between a human-friendly message and a non-human-friendly message, let’s take the hypothetical case where a fault file should have been posted as an object in Amazon S3, but the upload part of the ingestion workflow failed for some reason. When trying to open that dataset, the Amazon SDK would report this low-level error: “The specified key does not exist. (Service S3, Status Code: 404, Request ID: XXXXXX)”. In the context of an ingestion workflow, a more human friendly message would be “This fault dataset is backed by a file that is either missing or inaccessible.”

 


 

The IVAAP Data Backend is written in Java. This language has a built-in way to handle errors, so a developer’s first instinct is to use this mechanism to pass human friendly messages back to end users. However, this approach is not as practical as it seems. The Java language doesn’t make a distinction between human-friendly error messages and low-level error messages such as the one sent by the Amazon SDK, which are meant to be read only by developers. Essentially, to differentiate them, we would need to create a HumanFriendlyException class, and use this class in all places where an error with a human-friendly explanation is available.
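
For illustration, a minimal sketch of what such a class could look like (this is not IVAAP’s actual code):

public class HumanFriendlyException extends RuntimeException {

    public HumanFriendlyException(String humanFriendlyMessage) {
        super(humanFriendlyMessage);
    }

    public HumanFriendlyException(String humanFriendlyMessage, Throwable cause) {
        super(humanFriendlyMessage, cause);
    }

    // The message is safe to show to end users as-is.
    public String getHumanFriendlyMessage() {
        return getMessage();
    }
}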

This approach is difficult to scale to a large body of code like IVAAP’s. And the IVAAP Data Backend is not just code; it also comes with a large set of third-party libraries that have their own idea of how to communicate errors. To make matters worse, it’s very common for developers to do this:

 

try {
    // do something here
} catch (Exception ex) {
    throw new RuntimeException(ex);
}

 

This handling wraps the exception, making it difficult to catch by the caller. A “better” implementation would be:

 

     

try {
    // do something here
} catch (HumanFriendlyException ex) {
    throw ex;
} catch (Exception ex) {
    throw new RuntimeException(ex);
}

 

While it is possible to enforce this style for the entirety of IVAAP’s code, you can’t do this for third-party libraries calling IVAAP’s code.

Another issue with Java exceptions is that they tend to occur at a low level, where very little context is known. If a service needs to read a local file, a message like “Can’t read file abc.txt” will only be relevant to end users if the primary function of the service call was to read that file. If reading this file was only incidental to the service’s completion, bubbling up an exception about this file all the way to the end user will not help.

To provide human-friendly error messages, IVAAP uses a layered approach instead:

  • High-level code that catches exceptions reports these exceptions with a human friendly message to a specific logging system
  • When exceptions are thrown in low-level code, issues that can be expressed in a human friendly way are also reported to that same logging system

With this layered approach where there is a high-level “catch all”, IVAAP is likely to return relevant human friendly errors for most service calls. And the quality of the message improves as more low-level logging is added. This continuous improvement effort is more practical than a pure exception-based architecture because it can be done without having to refactor how/when Java exceptions are thrown or caught. 
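
As an illustration of these two layers, here is a minimal, hypothetical sketch; the class and method names are ours, not the actual IVAAP API:

final class HumanFriendlyLog {

    private static final ThreadLocal<String> MESSAGE = new ThreadLocal<>();

    // Low-level or high-level code records a message meant to be shown to end users.
    static void report(String humanFriendlyMessage) {
        MESSAGE.set(humanFriendlyMessage);
    }

    // The service layer checks whether a human friendly message was recorded during the current call.
    static String messageForCurrentCall() {
        return MESSAGE.get();
    }
}

class FaultDatasetLoader {

    Object load(String objectKey) {
        try {
            return readFromStorage(objectKey); // low-level storage access
        } catch (Exception ex) {
            // High-level catch-all: report in human friendly terms, rethrow for developers.
            HumanFriendlyLog.report("This fault dataset is backed by a file that is either missing or inaccessible.");
            throw new RuntimeException(ex);
        }
    }

    private Object readFromStorage(String objectKey) throws Exception {
        // The low-level error stays in the exception, meant for developers only.
        throw new Exception("The specified key does not exist. (Service: S3, Status Code: 404)");
    }
}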

To summarize, the architecture of IVAAP avoids using Java exceptions when human-friendly error messages can be communicated. But this is not just an architecture where human-friendly errors use an alternate path to bubble up all the way to the user. It has some reactive elements to it.

For example, if a user calls a backend service to access a dataset, and this dataset fails to load, a 404/Not Found HTTP status code is sent by default with no further details. However, if a human friendly error was issued during the execution of this service, the status code changes to 500/Internal Server Error, and the content of the human friendly message is included in the JSON output of this service. This content is then picked up by the HTML5 client to show to the user. I call this approach “reactive” because unlike a classic logging system, the presence of logs modifies the visible behavior of the service.

With the 2.7 release of IVAAP, we created two categories of human friendly logs. One is connectivity. When a human friendly connectivity log is present, 404/Not Found errors and empty collections are reported with a 500/Internal Server Error HTTP status code. The other is entitlement. When a human friendly entitlement log is present, 404/Not Found errors and empty collections are reported with a 403/Forbidden HTTP status code.
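
A hypothetical sketch of this “reactive” status code selection (the names are illustrative, not the actual IVAAP implementation) could look like this:

enum LogCategory { CONNECTIVITY, ENTITLEMENT }

final class StatusCodes {

    static int statusForFailedCall(LogCategory humanFriendlyLogCategory) {
        if (humanFriendlyLogCategory == null) {
            return 404; // Not Found, no further details by default
        }
        switch (humanFriendlyLogCategory) {
            case ENTITLEMENT:
                return 403; // Forbidden, human friendly message included in the JSON output
            case CONNECTIVITY:
            default:
                return 500; // Internal Server Error, human friendly message included in the JSON output
        }
    }
}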

The overall decision on which error message to show to users belongs to the front-end. Only the front-end knows the full context of the task a user is performing. The error handling in the IVAAP Data Backend provides a sane default that the viewer can elect to use, depending on context. OSDU is one of the deployments where the error handling of the data backend is key to the user experience. The OSDU platform has ingestion workflows outside of IVAAP, and with the error reporting capabilities introduced in 2.7, IVAAP becomes a much more effective tool to QA the results of these workflows.

For more information on INT’s newest platform, IVAAP, please visit www.int.com/products/ivaap/

 


Filed Under: IVAAP Tagged With: data, ivaap, java, SDK

Nov 06 2020

How to Get the Best Performance out of Your Seismic Web Applications

One of the most challenging data management problems faced in the industry is with seismic files. Some oil and gas companies estimate that they acquire a petabyte of data per day or more. Domain knowledge and specific approaches are required to move, access, and visualize that data.

In this blog post, we will dive deep into the details of modern technology that can be used to achieve speedups. We will also cover: common challenges around seismic visualization, how INT helps solve these challenges with advanced compression and decompression techniques, how INT uses vectorization to speed up compression, and more.

What Is IVAAP?

IVAAP is a data visualization platform that accelerates the delivery of cloud-enabled geoscience, drilling, and production solutions.

  • IVAAP Client offers flexible dashboards, 2D & 3D widgets, sessions, and templates
  • IVAAP Server side connects to multiple data sources, integrates with your workflows, and offers real-time services
  • IVAAP Admin client manages user access and projects


 

Server – Client Interaction

Interaction occurs as follows: the client requests the file list from the server, the server returns the list, the user chooses a file to display, and the server then starts sending chunks of data while the client displays them.


Some issues encountered with this scheme include:

  • Seismic data files are huge in size — they can be hundreds of gigabytes or even terabytes.
  • Because of the file size, it takes too much time to transfer files via network.
  • The network may not have enough bandwidth.

The goals of this scheme are to:

  • Speed up file transfer time
  • Reduce data size for transfer
  • Add user controls for different network bandwidth

And the solution:

  • We decided to implement server-side compression and client-side decompression. We also decided to provide a client parameter that we call the acceptable error level, which bounds the error introduced by the seismic data compression/decompression process.


 

Taking a closer look at compression and decompression, the original seismic data goes through a set of five transformations: AGC, normalization, Haar wavelets, quantization, and Huffman coding. The result is a compressed file that can be sent to clients over the network. On the client side, the decompression process runs in the opposite direction, from inverse Huffman to inverse AGC. The client does not get back the exact original data; it gets the data as it exists after compression and decompression. That is why we added an acceptable error level: we have different scenarios where clients don’t always require the full original data at full precision. For example, sometimes a client only needs to review the seismic data. Using this acceptable error level, they can control how much data is passed over the network and, of course, speed up the process.
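
As an illustration of this sequence, here is a hypothetical Java sketch (the class and stage names are ours, not INT’s actual implementation) of a codec that chains the five stages on the server and reverses them on the client:

interface Stage {
    float[] forward(float[] samples); // applied on the server (compression)
    float[] inverse(float[] samples); // applied on the client (decompression)
}

final class SeismicCodec {

    private final double acceptableError; // would drive how much precision quantization discards
    private final Stage[] stages;         // AGC, normalization, Haar wavelets, quantization, Huffman

    SeismicCodec(double acceptableError, Stage... stages) {
        this.acceptableError = acceptableError;
        this.stages = stages;
    }

    float[] compress(float[] trace) {
        float[] data = trace;
        for (Stage stage : stages) {                     // AGC -> ... -> Huffman
            data = stage.forward(data);
        }
        return data;
    }

    float[] decompress(float[] compressed) {
        float[] data = compressed;
        for (int i = stages.length - 1; i >= 0; i--) {   // inverse Huffman -> ... -> inverse AGC
            data = stages[i].inverse(data);
        }
        return data;
    }
}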

The resulting scheme works as follows:


 

The client requests a file list from the server and the user chooses a file to display. The server compresses chunks of data and sends them to the client; the client decompresses them and displays the data. This is repeated for each tile to display.

So why not use an existing compression scheme like GZIP, LZ/Deflate, etc.? We tried these compressions, but found that this type of general-purpose compression is not as effective as we’d like on our seismic data.

Server-Side Interaction

The primary objective was to speed up the current implementation of compression and decompression on both the server and client side.

The proposal:

  • Server-side compression is implemented in Java, so we decided to create a C++ implementation of the compression sequence and use a JNI layer to call the native methods. Client-side decompression is implemented in JavaScript, so we created a C++ implementation of decompression and used WebAssembly (WASM) to integrate the C++ code into JS.
  • We implemented both compression and decompression algorithms in C++, but after comparing the results and performance of C++ and Java, we discovered that C++ was just 1.5 times faster than a warmed-up JVM. That’s why we decided to go further and apply SIMD instructions for additional speedup.

Single Instruction Multiple Data (SIMD)


A SIMD architecture performs the same operation on multiple data elements in parallel. With scalar operations, you have to perform four separate calculations to get four results; with a SIMD operation, a single vector instruction produces all four results at once.

SIMD benefits:

  • Allows processing of several data values with one single instruction.
  • Much faster computation on predefined computation patterns.

SIMD drawbacks:

  • SIMD operations cannot be used to process multiple data in different ways.
  • SIMD operations can only be applied to predefined processing patterns with independent data handling.

Normalization: C++ scalar implementation


Normalization: C++ SIMD SSE implementation


Server-Side Speedup Results

There are different types of speedup for different algorithms:

  • Normalization is 9 times faster than the scalar C++ version
  • Haar Wavelets is 6 times faster than the scalar C++ version
  • Huffman has no performance increase (not vectorizable algorithm)

Overall, with the SIMD C++ code, server-side compression is around 3 times faster than the Java version. This was good enough for us, so we decided to move on to the client-side speedup.

Client-Side Speedup

For the client-side speedup, we implemented decompression algorithms in C++ and used WASM to integrate the C++ code in JavaScript.

WebAssembly

WASM is:

  • A binary executable format that can run in browsers
  • A low-level virtual machine
  • A compilation target for high-level languages

WASM is not: 

  • A programming language
  • Tied to the web; it can also run outside the browser

Steps to get WASM working:


  • Compile C/C++ code with Emscripten to obtain a WASM binary
  • Bind the WASM binary to the page using JavaScript “glue code”
  • Run the app and let the browser instantiate the WASM module, the memory, and the table of references. Once that is done, the web app is fully operative.

C++ Code to Integrate (TaperFilter.h/cpp)


Emscripten Bindings 


WebAssembly Integration Example


Client-Side Speedup Takeaways:

  • Emscripten supports the WebAssembly SIMD proposal
  • Vectorized code will be executed by browsers
    • The results of vectorization for the decompression algorithms are:
    • Inv Normalization: 6 times speedup
    • Inv Haar Wavelets: 10 times speedup
    • Inv Huffman: no performance improvement (not vectorizable)

Overall, client-side decompression with vectorized C++ code was around 6 times faster than the JavaScript version.

For more information on GeoToolkit, please visit int.com/geotoolkit/ or check out our webinar, “How to Get the Best Performance of Your Seismic Web Applications.”


Filed Under: IVAAP Tagged With: compression, ivaap, java, javascript, seismic

Apr 23 2020

Opening IVAAP to Your Proprietary Data Through the Backend SDK

When doing demos of IVAAP, the wow factor is undeniably its user interface, built on top of GeoToolkit.JS. What users of IVAAP typically don’t see is the part accessing the data itself, the IVAAP backend. When we designed the IVAAP backend, we wanted our customers to be able to extend its functionalities. This is one of the reasons we chose Java for its programming language—customers typically have access to Java programmers.

Java is a well-known, general-purpose language, but the IVAAP Backend Software Development Kit (SDK) is typically only discovered during an IVAAP evaluation. In previous articles, I described the Lookup API (How to Empower Developers with à la Carte Deployment in IVAAP Upstream Data Visualization Platform) and the use of scopes (Using Scopes in IVAAP: Smart Caching and Other Benefits for Developers). As the SDK has grown, I thought it would be a good time to review what else this SDK provides.

One Optimized Use Case: Plugging Your Own Data

The most common question that I get is: “I see that you can access a WITSML data source or a PPDM database. I have my own proprietary store for geoscience data; what do I need to do to make IVAAP visualize the data from my data store?” This is where the SDK comes into play. You do not need to modify the IVAAP backend’s code to add your own data. In a nutshell, you just need to write a few Java classes, compile them, and add them to your IVAAP deployment.

The Java classes you write need to meet the Application Programming Interface (API) that the SDK defines. If you are a developer, this answer is not enough; this is the textbook definition of an SDK. What makes the IVAAP Backend SDK efficient for our use case is that you only need to implement the API for the data you have. While IVAAP’s built-in data model allows the visualization of some 30 different aspects of a well (log curves, deviations, tubing sets, mud logs, raster logs, etc.), you only need to write classes for the data you have. For example, to visualize log curves, regardless of how these curves are stored, you only need to write about a dozen classes for a complete implementation.

The next question I get at this point is: “How do I know what to write?” There is a large amount of documentation available. During the evaluation process, you are granted access to our developers site. This site is a reference used by all INT developers working on the IVAAP backend, whether they are developing IVAAP itself or creating plugins for customers. It’s a wiki and gets updated regularly. When I get support questions about the SDK, I typically write an article in that wiki and share the link. This is not the only piece of documentation available. There is classic Javadoc documentation that details the API in a formal manner. And there is also sample code: we created a sample connector to a SQL database storing well curves, trajectories, well locations, and schematics as a practical example of how to use the SDK.

An Extensive Geoscience Data Model to Leverage

Lots of work has been done in IVAAP to facilitate workflows associated with wells, whether they are drilling workflows, production monitoring workflows, or just to manage an inventory. Specifically, IVAAP has a data model to expose the location of wells, log curves, deviation curves, mud logs, schematics, fracking, core images, raster logs, tops and any type of well documentation. Wells are not the only data models that IVAAP includes. Other models exist for seismic data and reservoirs. Several types of surfaces are also supported such as faults, grid surfaces, triangle meshes and seismic horizons.

These data models were built over time, based upon the common denominator between models coming from different systems. For example, if you are familiar with WITSML, you will find that the definition of a well log resembles what WITSML provides, but is flexible enough to also support LAS and DLIS files. From a developer perspective, the data model is exposed through the SDK’s API, without making any assumption on how this data is stored. The data model works for data stored in the cloud, on a file system, in a SQL database, and even data exposed only through a web service. While most of IVAAP’s connectors access one form of data store at a time, some connectors mix storages to combine data from web services and cloud storages. IVAAP’s data model is storage-agnostic, and the services to expose this data model to the HTML5 client are storage-agnostic as well.

IVAAP covers the most common data types found in geoscience. It provides the services to access this data, and the UI to visualize it. When starting an IVAAP development project, most developers should only have to focus on plugging their data, expressing through the SDK’s API on how to retrieve this data.

An API to Customize Entitlements

There is one more way that the IVAAP SDK makes the developer experience seamless when plugging a proprietary datastore. Not only does no code have to be written to expose this data to the viewer, but no code has to be written to control who has access to which data. Both aspects are built-in into the code that will call your implementation. You only have to write the data access layer, and not worry about entitlements or web services. By default, entitlements are based upon the information entered in the IVAAP Administration application.

This separation between data access and entitlements saves development time, but there are cases when a data store controls both data and access to this data. When IVAAP needs to access such an integrated system, the entitlement checks layer needs to be performed by the data access code. The entitlement API allows these checks to be performed at the data level.

The entitlement API is actually very fine-grained. You can customize the behavior of each service to limit access to specific data points. For example, the default behavior of IVAAP is to grant access to all curves of a well when you have been granted access to that well. Depending on your business rules, you might elect to restrict access to specific log curves. The SDK doesn’t force you into an “all or nothing” decision.

An API to Implement Your Own REST Services

Another typical use case is when you need to give access to data that doesn’t belong to the IVAAP built-in data model. In this particular situation, you need to extend IVAAP by adding custom widgets, and ad-hoc web services are needed to expose the relevant data to this widget. There is of course an API for this. External developers use the same API as INT developers to implement web services. INT has developed more than 500 REST services using this API, and external developers benefit from this experience.

Most services are JSON-based, and IVAAP uses the Jackson libraries to create JSON content. To advertise capabilities to the HTML5 client, the IVAAP backend uses HATEOAS links. For example, if the JSON description of a well has a link to the mud logs service, then this well has mud logs. If this link is not present, the HTML5 client understands that this well doesn’t contain mud logs and will adapt its UI accordingly. If you were to add your own service exposing more data associated with a well, you would typically want to add your own HATEOAS links to the description of wells. Adding HATEOAS links to existing services is possible by plugging so-called Entity classes; you do not need to modify the code of a service to modify its behavior.
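
As an illustration of what such a link can look like, here is a simplified sketch using Jackson; the JSON layout and URL are ours, not IVAAP’s actual wire format:

import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ArrayNode;
import com.fasterxml.jackson.databind.node.ObjectNode;

public class WellLinksExample {

    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();

        // A simplified well description (the real payload is richer).
        ObjectNode well = mapper.createObjectNode();
        well.put("name", "Well A-1");

        // A HATEOAS link advertising that mud logs are available for this well.
        ArrayNode links = well.putArray("links");
        ObjectNode mudLogsLink = links.addObject();
        mudLogsLink.put("rel", "mudlogs");
        mudLogsLink.put("href", "/api/wells/123/mudlogs"); // illustrative URL

        System.out.println(mapper.writerWithDefaultPrettyPrinter().writeValueAsString(well));
    }
}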

IVAAP’s REST services follow the OpenAPI specifications. There is actually a built-in web service whose only purpose is to expose the available services in the classic Swagger format. IVAAP’s SDK uses annotations similar to the Swagger Annotations API. If you are familiar with this API, documenting your own REST services should be a breeze.

Most of the REST services are JSON-based, but sometimes binary streams are used instead for performance reasons. Binary streams are typically used in IVAAP to expose seismic data, but also surfaces. The SDK uses events to implement such streaming services.

An API to Implement Your Own Real Time Feeds

The service API is not limited to REST services. An API is also available to communicate with the IVAAP HTML5 client through websockets. The WebSockets API is typically used to implement real time communications between the client and the server. For example, when a user opens a well, the user interface uses websockets to send a subscription message to the backend, requesting to be notified if this well changes. This enables a whole set of capabilities, such as real time monitoring. This is the API we use to monitor wells from WITSML datasources. The SDK includes an entire set of hooks so that customers can write their own feeds, including subscription, unsubscription and broadcast of messages.

When you write REST services, the container details are abstracted away and you only need to worry about implementing domain-related code. A REST service working in a Tomcat-based development environment will work without any modification in a Play cluster. Likewise, feeds developed with the SDK work seamlessly in both Tomcat and Play. On a developer station, the SDK will use end points from the Servlet API to carry messages. In a Play cluster, the SDK will use ActiveMQ. ActiveMQ provides scalability and reliability features that servlets lack, such as high message rates and reliable message delivery. The use of ActiveMQ is transparent to the developers of feeds.

Utilitarian APIs

There is more to the IVAAP SDK than its APIs to access data, write services or customize entitlements. There are a few other APIs worth mentioning. One of them is the API to perform CRS conversions. Its default implementation uses Apache SIS, but the API itself is generic in nature. CRS conversions are often needed in geoscience, for example to visualize datasets on a map, on top of satellite imagery. Years of work have gone into the Apache SIS library, and virtually no work is needed by IVAAP developers to leverage this library when the SDK is used.

There are also APIs to execute code at startup and to query the environment that IVAAP is running on. The Lookup API gives access to the features that are plugged. The DataSource API indicates which data sources are configured to run in the JVM. The Hosted Services API provides an inventory of the external services that an IVAAP instance needs to interact with. A hosted service could be the REST service that evaluates formulas, or the machine learning system that IVAAP feeds its data to.

A “Developer-Friendly” Development Environment

We made lots of efforts to make sure the development process would be as simple as possible. Developers with experience with Java Servlets will be at ease with their IVAAP development environment. They will use tools they are familiar with, such as Eclipse and Tomcat. A production instance of IVAAP doesn’t use servlets; it uses the Play framework. By following the SDK’s API, it is virtually transparent to developers that their code will be deployed in a cluster.

There are a few instances where awareness of the cluster environment is needed. For example, when caching is involved, you want to make sure that all caches are cleared across all JVMs when data gets updated. The IVAAP SDK includes an API to send and receive cluster events, and to create your own events. Since events are serialized from/to JSON, instances in the cluster do not need to share the same build version to interact with each other. This was a deliberate design choice so that you can upgrade your cluster while it’s running, without service interruption.

Caching is a large topic, outside of the scope of this article. IVAAP’s SDK proposes a “DistributedStore” API that hides the complexity of sharing state across JVMs. As long as you use this API, code that caches data will work without any modification in a single-JVM development environment and a multiple-JVMs production environment.

Finally, the SDK’s API is designed to allow fast iterative development. For example, once you have implemented the two classes that define how to list wells in your datastore, you can test them right away with Postman. Earlier I wrote that plugging your own log curves requires about a dozen classes; there is no need to write all twelve to start seeing results. Actually, you do not need to launch Postman to test your web services: a REST service written with the SDK can be tested with JUnit, which saves time by eliminating the need to launch Tomcat.

When you evaluate IVAAP, you might not have enough time to grasp the depth of the IVAAP SDK. Hopefully, this guide will help you get started.


Filed Under: IVAAP Tagged With: API, geoscience, ivaap, java, REST, SDK
