Build a Dataset

Data is the DNA of AI models. With a quality dataset, you can train different models and test variations with ease. In this guide, we will show you how to import your data, work with it, and expand it.

A dataset for fine-tuning typically consists of both inputs and outputs. When designing a dataset for a particular use case, ask yourself these questions:

  • What do I want to get out of my model? What format should it be in?

  • Which data is actually relevant to the task?

  • If I wanted a human to do the job, what data would I provide them? How would they filter it before starting the task? Can I pre-filter the data to make sure nothing extraneous remains?

  • Am I asking the model to do more than one thing at the same time? How can I make the job easier for the model?
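
As a rough illustration of the questions above, the sketch below reduces a raw export to input/output pairs and drops incomplete rows. The column names (`customer_message`, `internal_notes`, `category`) and the sample rows are hypothetical, and the `input`/`output` keys are an assumption for illustration, not a description of Entry Point's import format.

```python
import csv
import io

# Hypothetical raw export: only some columns are relevant to the task.
RAW_CSV = """ticket_id,customer_message,internal_notes,category
101,My invoice is wrong,checked by J.,billing
102,,no message received,other
103,The app crashes on login,,technical
"""


def build_examples(raw_csv: str) -> list[dict[str, str]]:
    """Keep only the relevant input and output fields and drop incomplete rows."""
    examples = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        text = row["customer_message"].strip()
        label = row["category"].strip()
        if not text or not label:
            continue  # skip rows with a missing input or output
        # ticket_id and internal_notes are extraneous for this task, so they are left out.
        examples.append({"input": text, "output": label})
    return examples


examples = build_examples(RAW_CSV)
for example in examples:
    print(example)
```

Here the filtering happens before any training, so the model only ever sees the fields a human would need to do the same job.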

If you want more than one kind of output from your model, consider using separate models, each dedicated to its own task. In programming, this principle is called "separation of concerns," and it applies to designing AI models too.
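
One way to apply this idea to data, sketched under assumptions: if each record carries two desired outputs, it can be split into two single-task datasets, one per model. The field names (`review`, `sentiment`, `summary`) and the helper `split_by_task` are hypothetical and exist only for this example.

```python
# Hypothetical records in which one example carries two desired outputs.
records = [
    {
        "review": "Great battery, weak camera.",
        "sentiment": "mixed",
        "summary": "Good battery, poor camera.",
    },
    {
        "review": "Stopped working after a week.",
        "sentiment": "negative",
        "summary": "Failed within a week.",
    },
]


def split_by_task(records, input_field, output_fields):
    """Return one input/output dataset per output field."""
    return {
        field: [{"input": r[input_field], "output": r[field]} for r in records]
        for field in output_fields
    }


datasets = split_by_task(records, "review", ["sentiment", "summary"])
for task, rows in datasets.items():
    print(task, len(rows))
```

Each resulting dataset maps the same input to a single output, so each model has exactly one job to learn.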

In general, the simpler you make the task upfront, the better your chances are of creating a high-performing model.

To see how datasets work in Entry Point, we'll start by importing a CSV.
