# Inference Parameters


The available settings vary from provider to provider, but these are the most common ones you can adjust when generating completions.

#### Temperature

Language models typically produce a probability distribution over possible next tokens before selecting one. A higher temperature makes it more likely that the model will select a less likely token, so randomness and variety increase as temperature increases. This is often thought of as a proxy for creativity. At a temperature of 0, the model will always select the most likely token.

{% hint style="info" %}
There are sources of non-determinism that originate from the way GPUs perform matrix multiplication that could result in different outputs at a temperature of 0, but these are negligible compared to the impact of temperature.
{% endhint %}
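To make the effect of temperature concrete, here is a minimal sketch of temperature-scaled sampling. It assumes the common approach of dividing the model's raw scores (logits) by the temperature before converting them to probabilities. The function names and the example scores are made up for illustration, and real providers may implement this differently.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, scaled by a temperature > 0."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, temperature, rng=random):
    """Pick a token index; a temperature of 0 always takes the most likely one."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

# Hypothetical scores for three candidate tokens
logits = [2.0, 1.0, 0.1]
for t in (0.5, 1.0, 2.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

As the temperature rises, the printed probabilities move closer together, so the less likely tokens are chosen more often.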

#### Top P

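Top P (also called nucleus sampling) restricts the choice of next token to the most likely candidates: the smallest set of tokens whose combined probability reaches P. A value of 0.9 keeps the tokens that together account for the top 90% of probability mass, while 0.10 keeps only those that account for the top 10%, which is often a single token. Whether Top P is applied before or after temperature depends on the implementation. You can use Top P and temperature together for interesting results.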
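Here is a rough sketch of Top P filtering under the usual nucleus-sampling definition: keep the smallest set of most likely tokens whose combined probability reaches P, then renormalize. The function name and the example probabilities are illustrative assumptions, not any provider's actual implementation.

```python
def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their total probability reaches top_p."""
    # Rank token indices from most to least likely
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    # Renormalize so the surviving probabilities sum to 1
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

# Hypothetical next-token probabilities for four candidates
probs = [0.5, 0.3, 0.15, 0.05]
print(top_p_filter(probs, 0.9))  # keeps the three most likely tokens
print(top_p_filter(probs, 0.1))  # keeps only the single most likely token
```

The next token is then sampled from the tokens that survive the filter, so low values of Top P cut off the unlikely tail entirely.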

#### Max Tokens

LLMs keep generating text until they generate an "End of Sequence" token, also called a "stop" token. You can also stop them early by setting Max Tokens. It's always a good idea to set a reasonable max limit for token output to prevent excessive token usage, especially with models capable of long outputs.
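The two stopping conditions can be sketched as a simple generation loop. Everything here is a stand-in: `toy_model`, the `<eos>` marker, and the `"stop"` / `"length"` labels are hypothetical names chosen for illustration, although some providers report a similar finish reason.

```python
EOS = "<eos>"  # hypothetical end-of-sequence marker

def generate(next_token, prompt_tokens, max_tokens):
    """Generate until the model emits EOS or max_tokens tokens were produced."""
    output = []
    while len(output) < max_tokens:
        token = next_token(prompt_tokens + output)
        if token == EOS:
            return output, "stop"  # the model finished on its own
        output.append(token)
    return output, "length"  # cut off by the Max Tokens limit

def toy_model(context):
    """Stand-in for a real model: emits five words, then EOS."""
    script = ["the", "quick", "brown", "fox", "jumps"]
    produced = len(context) - 1  # context is one prompt token plus the output so far
    return script[produced] if produced < len(script) else EOS

print(generate(toy_model, ["Hi"], max_tokens=3))
print(generate(toy_model, ["Hi"], max_tokens=10))
```

With a limit of 3 the output is cut short, while a limit of 10 leaves room for the toy model to reach its end-of-sequence token.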

#### Number of Completions

Some [model providers](https://docs.entrypointai.com/key-concepts/model-providers) allow you to generate multiple completions in a single request. This is only useful if you have a temperature greater than 0, because at a temperature of 0 every completion would be essentially identical.
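The toy example below illustrates why multiple completions only pay off with a temperature above 0. It stands in for a whole completion with a single word drawn from a made-up distribution. The function names, the seed parameter, and the probabilities are all assumptions for illustration.

```python
import random

def complete(probs, temperature, rng):
    """Return one 'completion' (a single word here) at the given temperature."""
    if temperature == 0:
        return max(probs, key=probs.get)  # always the most likely option
    # Raising probabilities to 1/temperature matches dividing logits by temperature
    weights = [p ** (1.0 / temperature) for p in probs.values()]
    return rng.choices(list(probs), weights=weights, k=1)[0]

def n_completions(probs, temperature, n, seed=0):
    """Draw n completions, as if requesting n results in one call."""
    rng = random.Random(seed)
    return [complete(probs, temperature, rng) for _ in range(n)]

# Hypothetical distribution over three possible completions
probs = {"blue": 0.6, "grey": 0.3, "green": 0.1}
print(n_completions(probs, 0, 4))    # four identical results
print(n_completions(probs, 1.0, 4))  # results can differ from one another
```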
