5.1 Creating Training Data
Overview
The Training Data workflow wizard is designed to generate high-quality, class-labelled spatial data used to train AI models for vegetation & erosion classification. Users define training areas, apply automated segmentation tools (like SAM Wand), and assign vegetation types to build datasets for model training or accuracy testing.
Create Training Data
The Training Data wizard guides you through a single workflow for preparing truth data.
The first step is to define where training data will be created, followed by generating and publishing the data.
Step 1 — Create Training Areas

Training Area Tool

Overview
Training Areas define where in your imagery training data will be created. These areas are now generated using preset spatial extents rather than freehand drawing.
Access
Select the Training Data tool in the top toolbar. A dropdown will appear, allowing you to choose a Training Area size.
How to create a Training Area
- Click the Training Data tool in the top toolbar.
- Select a size from the Training Area dropdown (Small / Medium / Large).
- Click once on the map to place the area.
- The Training Area will appear automatically and can be repositioned if required.
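The placement steps above can be pictured as turning one click and one preset size into a fixed extent. The following is a minimal sketch, not the application's implementation: the side lengths in `PRESET_SIDE_M`, the `TrainingArea` type, the function names, and the choice to centre the area on the click are all assumptions made for illustration, and coordinates are assumed to be in a projected system measured in metres.

```python
from dataclasses import dataclass

# Hypothetical side lengths in metres; the real preset sizes are not documented here.
PRESET_SIDE_M = {"Small": 50.0, "Medium": 200.0, "Large": 1000.0}


@dataclass(frozen=True)
class TrainingArea:
    """Axis-aligned square extent (illustrative type, not a product API)."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def place_training_area(click_x: float, click_y: float, size: str) -> TrainingArea:
    """Return a square of the preset size centred on the clicked point (assumed behaviour)."""
    half = PRESET_SIDE_M[size] / 2.0
    return TrainingArea(click_x - half, click_y - half, click_x + half, click_y + half)


def reposition(area: TrainingArea, dx: float, dy: float) -> TrainingArea:
    """Shift an existing area by (dx, dy) without changing its size."""
    return TrainingArea(area.min_x + dx, area.min_y + dy, area.max_x + dx, area.max_y + dy)


area = place_training_area(1000.0, 2000.0, "Small")
moved = reposition(area, 10.0, 0.0)
print(area)
print(moved)
```

Because the extent comes from a preset rather than a freehand shape, every area of a given size covers the same ground footprint, which keeps areas comparable across a project.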
Import Existing Areas
You may also import Training Areas from existing files in the following formats:
- Shapefile (.zip)
- FlatGeobuf
Tips
- Use Small for highly detailed targets (e.g., erosion scar edges)
- Use Medium for mixed vegetation patches
- Use Large for broad distributions or low-variation classes
- Create multiple Training Areas across varied conditions (slope, aspect, vegetation density)
Step 2 — Create Training Data
Once a Training Area is selected, use any of the tools below to generate training data inside it.

SAM Wand Tool

Overview
Use the SAM Wand tool to rapidly create complex polygons such as trees or vegetation clusters with minimal manual input.
Access
Click the SAM Wand Tool in the toolbar.
How It Works
- Left-Click: Add point to selection
- Right-Click: Add negative point (exclude from selection)
- Shift + Click + Drag: Draw selection box
- Alt + Click + Drag: Pan while drawing
- S: Toggle segmentation preview
- Enter: Confirm current selection
- Space: Clear all points and reset selection
Tips
- Excellent for dense or irregular shapes like canopy areas.
- Minimizes manual drawing for faster dataset generation.
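The shortcuts above build up a selection from positive points, negative points, and an optional box. Segment Anything style models conventionally take such prompts as point coordinates with labels of 1 (include) or 0 (exclude), plus a box. The sketch below only models that bookkeeping; the `WandSelection` class and its method names are hypothetical, and how the tool stores its selection internally is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


@dataclass
class WandSelection:
    """Illustrative prompt state for one selection (not the product's API)."""
    points: List[Point] = field(default_factory=list)
    labels: List[int] = field(default_factory=list)  # 1 = include, 0 = exclude
    box: Optional[Box] = None

    def left_click(self, x: float, y: float) -> None:
        """Add a positive point to the selection."""
        self.points.append((x, y))
        self.labels.append(1)

    def right_click(self, x: float, y: float) -> None:
        """Add a negative point that excludes a region."""
        self.points.append((x, y))
        self.labels.append(0)

    def shift_drag(self, x0: float, y0: float, x1: float, y1: float) -> None:
        """Store a selection box, normalised so the drag direction does not matter."""
        self.box = (min(x0, x1), min(y0, y1), max(x0, x1), max(y0, y1))

    def clear(self) -> None:
        """Space key: drop all points and the box."""
        self.points.clear()
        self.labels.clear()
        self.box = None

    def as_prompt(self) -> Dict[str, object]:
        """Bundle the current state in the shape a segmentation call might expect."""
        return {"point_coords": list(self.points),
                "point_labels": list(self.labels),
                "box": self.box}


selection = WandSelection()
selection.left_click(10, 20)      # include the canopy
selection.right_click(30, 40)     # exclude the neighbouring shadow
selection.shift_drag(50, 60, 5, 6)
print(selection.as_prompt())
```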

Segment Area Tool

Overview
The Segment Area tool allows you to classify an entire Training Area automatically. You can segment using either:
- SAM Mode: AI-driven segmentation using SAM for pixel grouping
- Tyton Mode: classification using a previously trained Tyton model
This tool is ideal for rapidly generating large amounts of training data with minimal manual effort.
Access
Click the Segment Area option inside the Create Training Data panel.
How It Works
1. Select Segmentation Mode
You will see two mode options:
- Tyton: Uses your trained classification model
- SAM: Uses the SAM segmentation engine
Tyton Mode
Use this mode if you have already trained a model for your project.
- Select a Training Area to classify.
- Choose your trained model from the model dropdown.
- Select which classes you want included in the segmentation.
- Click Classify to run segmentation.
This generates polygons inside your Training Area, labelled with the selected classes.
SAM Mode
Use this mode to generate unsupervised segmentation without needing a trained model. This creates SAM-based polygons that can later be assigned to classes manually or refined with other tools.
- Select your Training Area.
- Switch to SAM Mode.
- Choose your segmentation scale (if available).
- Click Segment to generate SAM polygons.
SAM Mode creates a detailed segmentation map that can be cleaned, merged, or class-assigned using tools like the SAM Wand or standard editing tools.
Results
- All generated polygons appear inside the Training Area.
- Polygons are editable using the editing tools.
Tips
- Use SAM Mode early in a project when no trained model exists.
- Use Tyton Mode once your model is trained for higher accuracy and class-specific predictions.
- You can segment multiple Training Areas to build large datasets quickly.
- Always review and clean polygons before publishing for best model performance.
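One simple cleaning check before publishing is to look for very small sliver polygons that automated segmentation can leave behind. The sketch below computes polygon area with the shoelace formula and keeps only polygons at or above a minimum area. It is an illustration of one possible check done outside the application, not a built-in feature; the function names and the `min_area` value are assumptions, and coordinates are assumed to be planar.

```python
from typing import List, Sequence, Tuple

Ring = Sequence[Tuple[float, float]]


def polygon_area(ring: Ring) -> float:
    """Area of a simple polygon via the shoelace formula (closing vertex optional)."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def drop_slivers(polygons: Sequence[Ring], min_area: float) -> List[Ring]:
    """Keep polygons whose area is at least min_area (threshold is hypothetical)."""
    return [ring for ring in polygons if polygon_area(ring) >= min_area]


square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
sliver = [(0.0, 0.0), (1.0, 0.0), (0.0, 0.1)]
kept = drop_slivers([square, sliver], min_area=1.0)
print(len(kept), polygon_area(square), polygon_area(sliver))
```

An area filter only flags candidates; whether a small polygon is noise or a genuine fine feature still needs a visual check.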

Class Predictor (Beta)

Overview
The Class Predictor (Beta) learns from your class-labelled training data and then suggests classes inside a Training Area. As you label more polygons, its suggestions improve.
Access
Open the Create Training Data workflow wizard and click Class Predictor (Beta).
How It Works
1. Process Training Areas
Before predictions can be shown, the Class Predictor must analyse your Training Areas.
- Select one or more Training Areas in the dropdown.
- Click Process Training Areas.
- Wait for your training areas to be processed.
The predictor uses the polygons and classes you have already labelled as examples to learn from.
2. Enable Class Prediction
Once processing is complete, you can enable live predictions for that Training Area.
- Toggle the Class Prediction switch to Activate.
- Predictions will now be available when using supported tools.
3. Use Predictions with Supported Tools
Class predictions are surfaced through two tools:
- Class Painter Tool: paints polygons directly with the Predicted Class.
- SAM Wand Tool: creates new polygons and fills them using the Predicted Class.
With either tool selected:
- Choose Predicted Class from the class dropdown.
- Move your cursor over the Training Area and interact as normal (paint or SAM selection).
- Polygons are assigned the predicted class when they are created with the SAM Wand or hovered over with the Class Painter.
4. Control Confidence Threshold
You can control when predictions are displayed using the confidence slider.
- Use “Show predictions when confidence is above” to set a percentage threshold.
- Only predictions with a confidence equal to or higher than this value will be shown.
This helps you hide low-confidence guesses and focus on reliable suggestions.
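The threshold rule described above (show a prediction only when its confidence is equal to or higher than the chosen percentage) can be sketched as a simple filter. The `Prediction` type, its field names, and the 0 to 1 confidence scale are assumptions for illustration; the application's internal representation is not documented here.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Prediction:
    """Illustrative prediction record (hypothetical fields)."""
    polygon_id: int
    predicted_class: str
    confidence: float  # assumed to be on a 0.0 to 1.0 scale


def visible_predictions(predictions: Sequence[Prediction],
                        threshold_percent: float) -> List[Prediction]:
    """Keep predictions whose confidence is equal to or higher than the slider value."""
    threshold = threshold_percent / 100.0
    return [p for p in predictions if p.confidence >= threshold]


predictions = [
    Prediction(1, "Tree", 0.95),
    Prediction(2, "Shrub", 0.80),
    Prediction(3, "Shrub", 0.42),
]
shown = visible_predictions(predictions, threshold_percent=80)
print([p.polygon_id for p in shown])
```

Raising the threshold trades coverage for reliability: fewer polygons receive a suggestion, but the ones that do are less likely to need correction.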
Behaviour & Learning
- The Class Predictor is updated as you label more polygons.
- Early in a project, predictions may be noisy; they improve as your training data grows.
When to Use
- After you have labelled an initial set of polygons in a Training Area.
- When scaling up from a small manually labelled dataset to a much larger one.
- During iterative refinement, where you periodically add more labels and reprocess.
Limitations (Beta)
- Requires existing labelled data to be effective.
- Quality of predictions depends heavily on the variety and correctness of your labels.
- Very imbalanced datasets (one dominant class) may bias the predictions.
Tips
- Start by carefully labelling a small but representative set of polygons, then run Process Training Areas.
- Use a higher confidence threshold when you want fewer but more reliable predictions.
- Combine the Class Predictor with the Class Painter for fast relabelling of existing polygons, and with the SAM Wand to rapidly create new, labelled polygons.
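Since very imbalanced datasets may bias the predictions, it can help to check the class mix of your labels before reprocessing. The sketch below counts labels and flags any class whose share exceeds a cut-off. It is an external sanity check under stated assumptions, not a feature of the Class Predictor: the function names and the 0.7 cut-off are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def class_shares(labels: Sequence[str]) -> Dict[str, float]:
    """Fraction of labelled polygons belonging to each class."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}


def dominant_classes(labels: Sequence[str], max_share: float = 0.7) -> List[str]:
    """Classes whose share exceeds max_share (the 0.7 default is an arbitrary example)."""
    return [name for name, share in class_shares(labels).items() if share > max_share]


labels = ["Tree"] * 8 + ["Shrub"] * 1 + ["Bare"] * 1
print(class_shares(labels))
print(dominant_classes(labels))
```

If one class is flagged, labelling more examples of the under-represented classes before running Process Training Areas again is one way to rebalance the examples the predictor learns from.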

Layer Correction

Overview
Refine training polygon boundaries using spectral layers and vegetation indices such as NDVI, VGI, or MSAVI. This tool is essential for improving class accuracy where visual overlap occurs.
Access
Click the Layer Correction icon under the Map Tools section of the toolbar.
How It Works
- Activate the Layer Correction view
- Select from available indices (e.g., NDVI, VGI, MSAVI)
- Use visual cues to identify class boundaries more precisely
- Adjust polygons accordingly using the Edit Tool
Tips
- Use NDVI or MSAVI for differentiating vegetation types
- Combine with the Edit Tool to improve the spatial precision of your training data
- Enhances classification performance when classes like Shrub and Tree visually blend together
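For reference, two of the indices named above have standard published definitions that can be computed per pixel from near-infrared (NIR) and red reflectance. The sketch below uses the usual NDVI formula and the MSAVI2 form of the modified soil-adjusted vegetation index. Whether the application computes its MSAVI layer in exactly this way is an assumption, and inputs are assumed to be reflectance values between 0 and 1.

```python
import math


def ndvi(nir: float, red: float) -> float:
    """Normalised Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    denominator = nir + red
    if denominator == 0:
        return 0.0  # no signal in either band; avoid dividing by zero
    return (nir - red) / denominator


def msavi2(nir: float, red: float) -> float:
    """MSAVI2: (2*NIR + 1 - sqrt((2*NIR + 1)^2 - 8*(NIR - Red))) / 2."""
    term = 2.0 * nir + 1.0
    return (term - math.sqrt(term * term - 8.0 * (nir - red))) / 2.0


# Example pixel with strong NIR reflectance, typical of healthy vegetation.
print(round(ndvi(0.5, 0.1), 3), round(msavi2(0.5, 0.1), 3))
```

MSAVI2 reduces the influence of bare soil on the index, which is one reason it can separate sparse vegetation from exposed ground better than NDVI alone.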
Step 3 — Publish Training Data (Optional)

Publish Training Data

Overview
Publishing finalises your labelled training data so that it can be reused across projects, used in the classification workflow, and used to train Tyton models.
Access
Open the Create Training Data workflow wizard and select Publish Training Data.
How It Works
1. Name Your Training Dataset
Enter a clear and descriptive name in the Training Dataset Name field.
2. Select Training Areas
Choose one or more Training Areas that contain the labelled polygons you want to publish.
3. Publish
Click Publish Training Dataset to finalise and save your training data.
What Publishing Does
- Packages all labelled polygons from the selected Training Areas.
- Saves the dataset so it can be reused across multiple projects.
- Makes the dataset available for classification and training TytonAI models.
- Locks in the training data so it cannot be accidentally changed.
Notes
- Only polygons inside the selected Training Areas are included.
- Ensure your training data is labelled and reviewed before publishing.
- If you create new polygons later, publish a new dataset for updated results.
Tips
- Use clear dataset names so you can recognise versions later.
- Publish after completing a meaningful batch of labelling for best iterative training results.
- You can publish multiple datasets throughout the project lifecycle.
General Tips
- Start with a small number of clean, representative Training Areas before scaling up.
- Spread Training Areas across different terrain types, lighting conditions, and vegetation densities.
- Use Small areas for fine features, Medium for general vegetation, and Large for broad patterns.
- Increase the prediction confidence threshold if you only want high-certainty class suggestions.
- Publish datasets in meaningful batches and use clear versioned names.
- A small amount of high-quality, diverse training data outperforms large inconsistent datasets.
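The publishing behaviour described in Step 3 (package the labelled polygons from the selected Training Areas, then lock the result) can be pictured with the sketch below. It is a conceptual model only: the `Polygon` and `PublishedDataset` types, the `publish` function, and the example dataset name are hypothetical, and the application's actual storage format is not documented here.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set, Tuple


@dataclass(frozen=True)
class Polygon:
    """Illustrative polygon record (hypothetical fields)."""
    polygon_id: int
    training_area: str
    label: Optional[str]  # None means the polygon has not been labelled yet


@dataclass(frozen=True)
class PublishedDataset:
    """Immutable snapshot: frozen so it cannot be changed by accident."""
    name: str
    polygons: Tuple[Polygon, ...]


def publish(name: str, polygons: Iterable[Polygon],
            selected_areas: Set[str]) -> PublishedDataset:
    """Package labelled polygons that sit in the selected Training Areas."""
    if not name.strip():
        raise ValueError("A dataset name is required")
    kept = tuple(p for p in polygons
                 if p.training_area in selected_areas and p.label is not None)
    return PublishedDataset(name.strip(), kept)


polygons = [
    Polygon(1, "Area A", "Tree"),
    Polygon(2, "Area A", None),      # unlabelled, so it is left out
    Polygon(3, "Area B", "Shrub"),   # Area B is not selected
]
dataset = publish("ridge-survey-v1", polygons, {"Area A"})
print(dataset.name, [p.polygon_id for p in dataset.polygons])
```

Because the snapshot is immutable, polygons created later do not alter it; publishing again under a new versioned name produces an updated dataset.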