Property Selection
Select the properties
The “Property Selection” tab is divided into two sections: the left side is for selecting properties, and the right side displays the correlation matrix between the properties. In the left section, users can select or deselect the properties to use in modeling. To modify your selection, click the properties you want to add or remove, then press Update.
Correlation Matrix
A Pearson correlation matrix is displayed in this tab, showing the pairwise relationships between properties, computed across the entire dataset. For any two properties, the correlation is calculated using all data points at which both properties have a value; data points with a missing value in either property are excluded. If two properties are highly correlated (the correlation value is close to 1 or -1), they likely convey similar information. In such cases, it is advisable to deselect one of them.
Property Selection panel and associated correlation matrix.
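The pairwise calculation described in the Correlation Matrix section can be illustrated with a short Python sketch. This is not the application's implementation: the property names, the sample values, the function names, and the 0.9 threshold are all assumptions chosen for illustration, and missing values are represented here as `None`.

```python
import math

def pearson_pairwise(x, y):
    """Pearson r over the positions where both x and y have a value."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return None  # not enough common data points
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs)
    if var_x == 0 or var_y == 0:
        return None  # correlation is undefined for a constant property
    return cov / math.sqrt(var_x * var_y)

def highly_correlated_pairs(properties, threshold=0.9):
    """List (name_a, name_b, r) for every pair with |r| >= threshold."""
    names = list(properties)
    flagged = []
    for i, name_a in enumerate(names):
        for name_b in names[i + 1:]:
            r = pearson_pairwise(properties[name_a], properties[name_b])
            if r is not None and abs(r) >= threshold:
                flagged.append((name_a, name_b, r))
    return flagged

# Hypothetical properties; None marks a missing value.
properties = {
    "Na2O": [1.0, 2.0, 3.0, 4.0, None],
    "K2O": [2.0, 4.0, 6.0, 8.0, 10.0],
    "MgO": [5.0, 3.0, None, 4.0, 1.0],
}

# Each pair is a candidate for deselecting one of its two properties.
flagged = highly_correlated_pairs(properties, threshold=0.9)
for name_a, name_b, r in flagged:
    print(f"{name_a} and {name_b}: r = {r:.2f}")
```

Because each pair is evaluated only on its own common data points, different cells of the matrix may be based on different numbers of points when missing values are unevenly distributed.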
Why is property selection important?
Selecting or deselecting properties is a critical step in the process: some properties are relevant, while others can introduce bias. In general, an expert should select properties that are geologically relevant and avoid those that are not. The best set of properties is often found by trial and error, iterating until a good balance is reached between the validation score and the relevance of the selected properties.
For example, data that replicate the information of the target (e.g., distance to mineralization) should be deselected. Additionally, data that lack geological relevance (e.g., time, distance to drill holes) should also be omitted.
This approach can be applied at different scales. Some properties are known geologically to be associated with mineralization. Using those properties might yield a better validation score, but the prediction may then miss areas where only a potential distal alteration halo is visible. Moreover, predictions based on different sets of properties can yield different results, each of which may be of interest for a different exploration strategy.
For example, in the Flin Flon example, silver, copper, zinc, and arsenic are not selected because they are known to be associated with the proximal alteration of mineralization, whereas the goal there is to detect distal alteration in order to find new targets. In a different context, these properties could be selected to search for areas associated with proximal alteration.
Handling Missing Values
Some of the selected properties may contain missing values, which can cause issues within the application. An error message such as ‘properties contain no data values’ may appear at the start. This situation can arise when the missing values of the selected properties combine so that few or no data points have a value for every selected property. The resulting dataset might be entirely empty, or partially so, with either the positive or the negative dataset lacking data. In such cases, users should deselect properties with excessive missing values.
It is crucial to understand that the final prediction will not be generated for points that have missing values in the selected properties.
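The effect of missing values described above can be reproduced with a small Python sketch. It assumes, as a simplification, that a data point is usable only if it has a value for every selected property. The sample rows, the property names, the `label` key (1 for positive, 0 for negative), and the helper names are all hypothetical and do not come from the application.

```python
def complete_rows(rows, selected):
    """Return the rows that have a value for every selected property."""
    return [row for row in rows
            if all(row.get(prop) is not None for prop in selected)]

def class_counts(rows, label_key="label"):
    """Count positive (label 1) and negative (label 0) rows."""
    positives = sum(1 for row in rows if row[label_key] == 1)
    return {"positive": positives, "negative": len(rows) - positives}

def missing_fraction(rows, prop):
    """Fraction of rows in which the given property is missing."""
    return sum(1 for row in rows if row.get(prop) is None) / len(rows)

# Hypothetical training points; None marks a missing value.
rows = [
    {"label": 1, "Na2O": 1.2, "K2O": None, "MgO": 4.0},
    {"label": 1, "Na2O": 0.8, "K2O": None, "MgO": 3.5},
    {"label": 0, "Na2O": 1.1, "K2O": 2.1, "MgO": 5.1},
    {"label": 0, "Na2O": None, "K2O": 2.4, "MgO": 4.4},
]

# With all three properties selected, no positive point is complete.
with_k2o = class_counts(complete_rows(rows, ["Na2O", "K2O", "MgO"]))

# Deselecting the property with the most missing values restores both classes.
without_k2o = class_counts(complete_rows(rows, ["Na2O", "MgO"]))

for prop in ["Na2O", "K2O", "MgO"]:
    print(prop, "missing fraction:", missing_fraction(rows, prop))
print("with K2O:", with_k2o)
print("without K2O:", without_k2o)
```

Checking the missing fraction of each property before selecting it is one way to spot the properties most likely to empty the positive or negative dataset.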