Property Selection

Select the properties

The “Property Selection” tab is divided into two sections: the left side for selecting properties, and the right side for displaying the correlation matrices between the properties. The left section lets users select or deselect the properties used in modeling. To modify the selection, click on the desired properties and press Update.

By default, the properties are preselected to exclude those with too many no-data values. However, because only points with valid values for every selected property are kept, the selection can still produce an empty dataset. In that case, an error is raised. Deselect the properties causing the issue and press Update again.

As properties are selected, the application prints a message in the console with the number of points available for the current selection, which is the number of data points the predictive models will use. If there are too few positive or negative training points, the model may overfit. If no points are available, an error is raised, and the selection should be adjusted until enough data points remain for training.

_images/console_data_used.png

Figure 6 When properties are selected, a message showing the number of available points is printed.
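The point count reported in the console corresponds to the points that have a valid value for every selected property. The sketch below illustrates that intersection; it is not the application's code. It assumes the properties are held as NumPy arrays with NaN marking no-data values, and the property names, values, and the `count_valid_points` helper are all hypothetical.

```python
import numpy as np

# Hypothetical property values; NaN marks a no-data value.
properties = {
    "Au": np.array([0.1, np.nan, 0.3, 0.4, np.nan]),
    "Cu": np.array([1.0, 2.0, np.nan, 4.0, 5.0]),
    "Mg": np.array([np.nan, np.nan, np.nan, 7.0, 8.0]),
    "Zn": np.array([np.nan, 2.0, np.nan, np.nan, np.nan]),
}


def count_valid_points(properties, selected):
    """Count the points that have a value for every selected property."""
    if not selected:
        raise ValueError("No property selected.")
    valid = np.ones(len(properties[selected[0]]), dtype=bool)
    for name in selected:
        valid &= ~np.isnan(properties[name])
    n_valid = int(valid.sum())
    if n_valid == 0:
        raise ValueError("No point has data for all selected properties.")
    return n_valid


for selection in (["Au", "Cu"], ["Au", "Cu", "Mg"], ["Au", "Zn"]):
    try:
        print(selection, "->", count_valid_points(properties, selection), "points")
    except ValueError as error:
        print(selection, "->", error)
```

Each added property can only shrink the set of usable points. Here the last selection leaves no points, which mirrors the error case described above.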

Correlation Matrices

Three matrices can be visualized in the “Property Selection” tab. Users can navigate between these matrices using the buttons at the top right of the images. The available matrices are:

1. Pearson Correlation Matrix

The Pearson correlation matrix shows the linear relationships between properties across the entire dataset. For any two properties, the correlation is calculated using all common data points, excluding any with missing values. A high correlation value (close to 1 or -1) indicates that the two properties carry similar information. In such cases, it is recommended to deselect one of them to avoid redundancy.

_images/pearson_correlation_matrix.png

Figure 7 Pearson Correlation Matrix
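The pairwise computation described above can be sketched as follows. This is an illustration under stated assumptions, not the application's implementation: the property names and values are made up, NaN is assumed to mark missing values, and the 0.9 threshold is an arbitrary example rather than a recommended setting.

```python
import numpy as np


def pairwise_pearson(x, y):
    """Pearson correlation over the points where both properties have data."""
    both = ~np.isnan(x) & ~np.isnan(y)
    return float(np.corrcoef(x[both], y[both])[0, 1])


# Hypothetical properties; NaN marks a no-data value.
properties = {
    "K": np.array([1.0, 2.0, 3.0, 4.0, np.nan, 6.0]),
    "Rb": np.array([2.1, 3.9, 6.2, np.nan, 10.0, 12.1]),
    "Si": np.array([5.0, 1.0, 4.0, 2.0, 3.0, 2.5]),
}

threshold = 0.9  # assumed cut-off, to be tuned per project
names = list(properties)

# Collect every pair whose absolute correlation exceeds the threshold.
redundant = [
    (a, b)
    for i, a in enumerate(names)
    for b in names[i + 1:]
    if abs(pairwise_pearson(properties[a], properties[b])) > threshold
]
print(redundant)
```

Only the `K` and `Rb` pair is flagged in this example, so one of the two would be a candidate for deselection.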

2. Spearman Correlation Matrix

The Spearman correlation matrix is useful for identifying monotonic relationships, including non-linear ones. It is computed from the ranks of the data points rather than their actual values, which makes it more robust to outliers than the Pearson correlation. This matrix complements the Pearson matrix by revealing additional dependencies between properties.

_images/spearman_correlation_matrix.png

Figure 8 Spearman Correlation Matrix
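To illustrate the difference between the two coefficients, the sketch below computes both on a synthetic relationship that is monotonic but far from linear. The data and the helper functions are hypothetical. The ranking helper assumes there are no tied values, a simplification that a full implementation would not make.

```python
import numpy as np


def rank(values):
    """Rank values from 0 to n-1 (assumes no ties, for brevity)."""
    return np.argsort(np.argsort(values)).astype(float)


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the ranks."""
    return float(np.corrcoef(rank(x), rank(y))[0, 1])


x = np.linspace(0.0, 5.0, 50)
y = np.exp(x)  # monotonic but strongly non-linear

pearson = float(np.corrcoef(x, y)[0, 1])
print(f"Pearson:  {pearson:.2f}")
print(f"Spearman: {spearman(x, y):.2f}")
```

Because `y` always increases with `x`, the ranks agree perfectly and the Spearman coefficient is 1 up to rounding, while the Pearson coefficient is noticeably lower. A pair of properties that looks only moderately correlated in the Pearson matrix can therefore still be redundant.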

3. No-Data Matrix

The No-Data matrix displays the percentage of missing values for each property. It allows users to quickly identify properties with high levels of missing data, which may compromise model performance. Properties with a high proportion of no-data values should be deselected to maintain dataset quality. The matrix also includes the target and non-target properties, providing insight into their respective completeness.

_images/nodata_matrix.png

Figure 9 No-Data Matrix
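The percentage shown for each property can be approximated as below. This sketch assumes NaN marks no-data values. The property names, the values, and the 50 % limit are hypothetical; the application may use a different criterion for its default preselection.

```python
import numpy as np

# Hypothetical properties; NaN marks a no-data value.
properties = {
    "Au": np.array([0.1, np.nan, 0.3, 0.4]),
    "Cu": np.array([np.nan, np.nan, np.nan, 4.0]),
    "Zn": np.array([1.0, 2.0, 3.0, 4.0]),
}

# Percentage of missing values for each property.
nodata_percent = {
    name: float(np.isnan(values).mean() * 100.0)
    for name, values in properties.items()
}

max_nodata = 50.0  # assumed limit, in percent
keep = [name for name, pct in nodata_percent.items() if pct <= max_nodata]

for name, pct in nodata_percent.items():
    print(f"{name}: {pct:.0f} % no-data")
print("Kept:", keep)
```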

Why is property selection important?

Property selection is a critical step in the process: some properties are relevant, while others can introduce bias. In general, an expert should select properties that are geologically relevant and avoid those that are not. The best set of properties is often found by trial and error, until a good balance is reached between the validation score and the relevance of the selected properties.

For example, data that replicate the information of the target (e.g., distance to mineralization) should be deselected. Additionally, data that lack geological relevance (e.g., time, distance to drill holes) should also be omitted.

This approach can be applied at different scales. Some properties are known geologically to be associated with mineralization. Using those properties might yield a better validation score, but the model may then miss areas where only a distal alteration halo is visible. Moreover, predictions built from different sets of properties can yield different results, which may be of interest for different exploration strategies.

For example, in the Flin Flon example, Silver, Copper, Zinc, and Arsenic are not selected because they are known to be associated with the proximal alteration of mineralization, whereas the goal is to detect distal alteration and find new targets. In a different context, these properties could be selected to search for areas associated with proximal alteration.

Handling Missing Values

Some of the selected properties may contain missing values, which can cause errors in the application. An error message such as ‘properties contain no data values’ may appear at startup.

To address this, users should deselect properties with a high percentage of missing values. The No-Data Matrix, available in the “Property Selection” tab, helps identify such properties and should be used to guide selection. Additionally, users can monitor the number of valid points remaining after selecting properties, which is displayed in the console.

Note that the final prediction is not computed for any point with a missing value in one or more of the selected properties. Handling missing values (by removing affected properties, imputing values, or preprocessing upstream in Geoscience ANALYST) is therefore a critical step in ensuring successful and reliable modeling.
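The effect of missing values on the prediction, and one possible upstream fix, can be sketched as follows. The data are hypothetical and NaN is assumed to mark no-data values. Median imputation is used only as a simple example of imputing values: it is not a feature of the application described here, and it can distort the data when many values are missing.

```python
import numpy as np

# Hypothetical selected properties, one column each; NaN marks a no-data value.
data = np.array([
    [0.1, 1.0],
    [np.nan, 2.0],
    [0.3, np.nan],
    [0.5, 4.0],
])

# Points with a missing value in any selected property receive no prediction.
predictable = ~np.isnan(data).any(axis=1)
print(int(predictable.sum()), "of", len(data), "points can be predicted")

# One possible upstream fix: fill each gap with the median of its property.
medians = np.nanmedian(data, axis=0)
imputed = np.where(np.isnan(data), medians, data)
n_after = int((~np.isnan(imputed).any(axis=1)).sum())
print(n_after, "of", len(data), "points can be predicted after imputation")
```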