Workflow
========

The tools within the workflow are designed to be used in conjunction with each
other. Data scientists rarely follow a linear process; instead they move through
the workflow dynamically, revisiting and revising their decisions. They may
adjust parameters, modify the selection of properties, or alter the testing
dataset several times before obtaining results they consider relevant. The
workflow is structured to support this iterative logic, allowing users to move
back and forth among their decisions until they reach a satisfying result. The
interactions that support this dynamic process of analysis and refinement are
detailed in :numref:`figure_workflow`.

.. _figure_workflow:

.. figure:: ./images/workflow/workflow.png
    :align: center
    :scale: 50%

    *The interactions between the different panels of the targeting workflow.*

The interactions labelled A to H in :numref:`figure_workflow` are described below:

A. After `data preparation`_, users must `select the target`_ in the first menu
   opened from `Geoscience ANALYST`_. Once the target is selected and the
   workflow initiated, the only remaining interaction between Geoscience ANALYST
   and the application is exporting the data.

B. As mentioned in the `data preparation`_ section, some data may contain
   missing values. The defined training and testing groups can therefore shrink
   if a property with a large number of missing values is included, leaving
   fewer usable points. Users are encouraged to deselect such data in the
   `properties panel`_ and rerun the separation in the `train-test split`_
   panel (a short sketch of this check follows the list).

C. Histograms allow users to verify that the distributions of the training set,
   the testing set, and the entire dataset are consistent. Significant
   discrepancies between the training and testing datasets for any property can
   lead to overfitting and degrade model performance. A considered decision
   might involve dropping the problematic properties or adjusting the training
   and testing sets (see the distribution check sketched after the list).

D. The `Property EDA`_ panel helps ensure the data used by the models are well
   distributed. For instance, a property that excessively influences the model
   may indicate that it distinctly identifies positive data, which the model
   then relies on heavily for its predictions. If overfitting is suspected,
   examining the histogram of that property may reveal the cause.

E. Properties showing negative, positive, or null importance can be deselected
   or reselected in the `properties panel`_ to assess their impact on the
   prediction score. Removing a property with a positive importance score can
   sometimes lead to better predictions and a more generalized model (see the
   importance sketch after the list).

F. The property selection and Property EDA tabs work hand in hand. Reviewing
   the histograms of the data can inform the decision to select or deselect
   properties. Any property presenting a non-statistical distribution in the
   histograms should be considered for removal to prevent overfitting.

G. The modeling and testing tabs are interconnected. The predictive model's
   effectiveness can vary significantly depending on the chosen training and
   testing datasets. Identifying a pair of datasets that represents both the
   entire dataset and the target object is crucial, so conducting multiple
   tests with different training and testing datasets is advisable (see the
   repeated-split sketch after the list).

H. After running several predictions, the results can be pushed back to
   `Geoscience ANALYST`_. The application remains usable after the data export,
   allowing for continuous iteration and refinement.
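As an illustration of the point raised in item B, the short sketch below shows
how a property with many missing values can shrink the usable dataset. It is
not part of the application: it assumes the properties have been exported to a
pandas ``DataFrame``, and the column names are purely hypothetical.

.. code-block:: python

    import pandas as pd

    # Hypothetical export: one column per property, NaN where a value is missing.
    data = pd.DataFrame(
        {
            "magnetics": [1.2, 0.8, None, 1.1, 0.9],
            "gravity": [0.4, None, None, None, 0.6],
            "target": [1, 0, 0, 1, 0],
        }
    )

    # Count missing values per property to spot sparse columns.
    print(data.isna().sum())

    # Keeping the sparse "gravity" property forces every row with a NaN to be
    # dropped, which shrinks the training and testing groups.
    print(len(data.dropna()))                          # few usable points
    print(len(data.drop(columns="gravity").dropna()))  # more usable points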
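For item C, a complementary way to check that a property is distributed
consistently across the training and testing sets, outside the application's
histograms, is a two-sample Kolmogorov-Smirnov test. The sketch below uses
synthetic values and is only meant to illustrate the idea.

.. code-block:: python

    import numpy as np
    from scipy.stats import ks_2samp

    rng = np.random.default_rng(0)

    # Synthetic stand-ins for one property in the training and testing subsets;
    # the testing values are shifted on purpose to mimic a poor split.
    train_values = rng.normal(loc=0.0, scale=1.0, size=500)
    test_values = rng.normal(loc=0.5, scale=1.0, size=200)

    # A very small p-value suggests the two distributions differ, i.e. the split
    # is not representative for this property.
    statistic, p_value = ks_2samp(train_values, test_values)
    print(f"KS statistic = {statistic:.3f}, p-value = {p_value:.3g}")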
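Items D and E both rely on the notion of property importance. The sketch below
illustrates the general idea with scikit-learn's permutation importance on a
synthetic dataset; it is not the application's own importance computation, and
the property names are hypothetical.

.. code-block:: python

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.inspection import permutation_importance
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)

    # Three synthetic properties, the last one pure noise.
    X = rng.normal(size=(300, 3))
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

    # Negative or near-zero importances point to properties the model does not
    # need; deselecting them and re-scoring shows whether predictions improve.
    result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
    for name, importance in zip(["prop_a", "prop_b", "noise"], result.importances_mean):
        print(f"{name}: {importance:.3f}")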
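Item G recommends trying several training and testing pairs. The sketch below
illustrates why, by repeating a split with different random seeds on synthetic
data and comparing the resulting scores; a large spread indicates that the
result depends strongly on the chosen split.

.. code-block:: python

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)

    # Synthetic stand-ins for the exported properties and target labels.
    X = rng.normal(size=(200, 4))
    y = (X[:, 0] - X[:, 2] > 0).astype(int)

    scores = []
    for seed in range(5):
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.3, random_state=seed
        )
        model = LogisticRegression().fit(X_train, y_train)
        scores.append(model.score(X_test, y_test))
    print([round(score, 3) for score in scores])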
.. _data preparation: data_preparation.rst
.. _select the target: target_selection.rst
.. _Geoscience ANALYST: https://www.mirageoscience.com/mining-industry-software/geoscience-analyst/
.. _train-test split: train_test_split.rst
.. _properties panel: property_table.rst
.. _Property EDA: property_inspection.rst