In machine learning, hyperparameter optimization [1] or tuning is the problem of choosing a set of optimal hyperparameters for a learning algorithm. A hyperparameter is a parameter whose value is used to control the learning process and which must be set before that process begins.
In machine learning, a hyperparameter is a parameter that can be set in order to define any configurable part of a model's learning process. Hyperparameters can be classified as either model hyperparameters (such as the topology and size of a neural network) or algorithm hyperparameters (such as the learning rate and the batch size of an optimizer).
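A minimal sketch of tuning both kinds of hyperparameters with a plain grid search (the train_and_evaluate function and the value ranges are hypothetical placeholders, not taken from the text above):

from itertools import product

# Model hyperparameters (network topology/size) vs. algorithm hyperparameters
# (settings of the optimizer), following the distinction described above.
model_grid = {"hidden_layers": [1, 2, 3]}
algo_grid = {"learning_rate": [1e-3, 1e-2], "batch_size": [32, 128]}

def train_and_evaluate(hidden_layers, learning_rate, batch_size):
    # Hypothetical stand-in: train a model with these settings and
    # return a validation score.
    return 0.0

best_score, best_config = float("-inf"), None
for hl, lr, bs in product(model_grid["hidden_layers"],
                          algo_grid["learning_rate"],
                          algo_grid["batch_size"]):
    score = train_and_evaluate(hl, lr, bs)
    if score > best_score:
        best_score = score
        best_config = {"hidden_layers": hl, "learning_rate": lr, "batch_size": bs}
print(best_config, best_score)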
In Bayesian statistics, a hyperparameter is a parameter of a prior distribution; the term is used to distinguish it from parameters of the model for the underlying system under analysis. For example, if one is using a beta distribution to model the distribution of the parameter p of a Bernoulli distribution, then p is a parameter of the underlying system (the Bernoulli distribution), while α and β, the parameters of the prior beta distribution, are hyperparameters.
The parameter μ is called the hyperparameter, while its distribution is an example of a hyperprior distribution. The notation of the distribution of Y changes as the additional parameter is added, i.e. Y ∣ θ, μ ∼ N(θ, 1).
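Written out, the hierarchy implied above is (the intermediate prior on θ and the specific hyperprior N(0, 1) are illustrative assumptions, not given in the text):

Y ∣ θ, μ ∼ N(θ, 1),   θ ∣ μ ∼ N(μ, 1),   μ ∼ N(0, 1),

so that μ is a hyperparameter of the prior on θ, and the distribution placed on μ itself is a hyperprior.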
Coordinate descent is an optimization algorithm that successively minimizes along coordinate directions to find the minimum of a function. At each iteration, the algorithm determines a coordinate or coordinate block via a coordinate selection rule, then exactly or inexactly minimizes over the corresponding coordinate hyperplane while fixing all other coordinates or coordinate blocks.
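A minimal sketch of cyclic coordinate descent with exact one-dimensional minimization on a quadratic objective (the matrix A and vector b are arbitrary illustrative choices):

import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])   # positive definite
b = np.array([1.0, -1.0])

def f(x):
    return 0.5 * x @ A @ x - b @ x

x = np.zeros(2)
for _ in range(50):                # repeated sweeps over the coordinates
    for i in range(len(x)):        # coordinate selection rule: cyclic
        # Exact minimization over coordinate i with the others held fixed:
        # df/dx_i = (A x)_i - b_i = 0  =>  x_i = (b_i - sum_{j != i} A[i,j] x_j) / A[i,i]
        x[i] = (b[i] - A[i] @ x + A[i, i] * x[i]) / A[i, i]
print(x, f(x))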
[Figure: Bayesian optimization of a function (black) with Gaussian processes (purple); three acquisition functions (blue) are shown at the bottom. [8]]
Bayesian optimization is typically used on problems of the form max_{x ∈ A} f(x), where A is a set of points x that rely on at most 20 dimensions (x ∈ ℝ^d with d ≤ 20) and whose membership can easily be evaluated.
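A minimal sketch of one Bayesian-optimization loop, assuming scikit-learn's GaussianProcessRegressor as the surrogate and expected improvement as the acquisition function; the objective f and the candidate grid are illustrative choices, not part of the text above:

import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def f(x):                      # black-box objective to maximize
    return -(x - 2.0) ** 2

grid = np.linspace(-5, 5, 500).reshape(-1, 1)   # candidate points (d = 1 <= 20)
X = np.array([[-4.0], [0.0], [4.0]])            # initial evaluations
y = f(X).ravel()

gp = GaussianProcessRegressor()
for _ in range(10):
    gp.fit(X, y)
    mu, sigma = gp.predict(grid, return_std=True)
    best = y.max()
    # Expected-improvement acquisition function over the candidate grid
    with np.errstate(divide="ignore", invalid="ignore"):
        z = (mu - best) / sigma
        ei = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)
        ei[sigma == 0.0] = 0.0
    x_next = grid[np.argmax(ei)].reshape(1, -1)  # evaluate where improvement looks largest
    X = np.vstack([X, x_next])
    y = np.append(y, f(x_next).ravel())

print("best x:", X[np.argmax(y)], "best f:", y.max())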
However, these optimization techniques assumed constant hyperparameters, i.e. a fixed learning rate and momentum parameter. In the 2010s, adaptive approaches to applying SGD with a per-parameter learning rate were introduced with AdaGrad (for "Adaptive Gradient") in 2011 [16] and RMSprop (for "Root Mean Square Propagation") in 2012. [17]
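A minimal sketch of an RMSprop-style per-parameter update (the step size, decay rate, and epsilon are common default choices assumed here, not values taken from the cited papers):

import numpy as np

def rmsprop_step(w, grad, cache, lr=1e-3, decay=0.9, eps=1e-8):
    # cache keeps a running average of squared gradients, one entry per parameter
    cache = decay * cache + (1.0 - decay) * grad ** 2
    # the effective step lr / (sqrt(cache) + eps) therefore differs per parameter
    w = w - lr * grad / (np.sqrt(cache) + eps)
    return w, cache

w = np.array([1.0, -2.0])
cache = np.zeros_like(w)
grad = np.array([0.5, -0.1])          # gradient from one minibatch (illustrative)
w, cache = rmsprop_step(w, grad, cache)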
In hyperparameter optimization with tree-structured Parzen estimators (TPE), the optimizer builds a model of the relationship between hyperparameter settings and the measured performance of the machine learning model being tuned. Areas of the search space with better performance are more likely to be sampled.
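A minimal sketch using a TPE sampler, assuming the Optuna library (one implementation of tree-structured Parzen estimators); the objective function and search ranges are illustrative placeholders for actually training and scoring a model:

import optuna

def objective(trial):
    lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)   # algorithm hyperparameter
    layers = trial.suggest_int("layers", 1, 4)              # model hyperparameter
    # Stand-in for training a model and returning a validation score to maximize
    return -((lr - 0.01) ** 2) - 0.1 * layers

study = optuna.create_study(direction="maximize",
                            sampler=optuna.samplers.TPESampler())
study.optimize(objective, n_trials=50)
print(study.best_params)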