A Complete List of Google’s Autocomplete Signals in Chrome

Google Chrome uses a machine learning model for address bar (omnibox) autocomplete. This model, likely a multilayer perceptron (MLP), processes numerous input signals to predict and rank suggestions.


Here’s a breakdown of these signals:

Input Features:

User Browsing History:

  • log_visit_count: (float32[-1,1]) Logarithmic count of user visits to the URL.
  • log_typed_count: (float32[-1,1]) Logarithmic count of the URL being typed in the address bar.
  • log_shortcut_visit_count: (float32[-1,1]) Logarithmic count of user visits to the URL via a desktop shortcut.
  • elapsed_time_last_visit_days: (float32[-1,1]) Days elapsed since the user last visited the URL.
  • log_elapsed_time_last_visit_secs: (float32[-1,1]) Logarithmic seconds elapsed since the user last visited the URL.
  • elapsed_time_last_shortcut_visit_days: (float32[-1,1]) Days elapsed since the user last visited the URL via a desktop shortcut.
  • log_elapsed_time_last_shortcut_visit_sec: (float32[-1,1]) Logarithmic seconds elapsed since the user last visited the URL via a desktop shortcut.
  • num_bookmarks_of_url: (float32[-1,1]) Count of bookmarks associated with the URL.
  • shortest_shortcut_len: (float32[-1,1]) Length of the shortest desktop shortcut for the URL.
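Several of the history features above are stored as logarithmic counts scaled into the model's float32 [-1, 1] input range. A minimal sketch of such a normalization, assuming a `log1p` transform and a hypothetical scaling constant (Chrome's exact constants are not documented here):

```python
import math

def scaled_log(count, max_log=10.0):
    # Hypothetical normalization: log1p compresses heavy-tailed counts,
    # then the result is scaled and clamped into the model's [-1, 1] range.
    # max_log is an assumed constant, not Chrome's actual value.
    x = math.log1p(count) / max_log
    return max(-1.0, min(1.0, x))

# A URL visited 50 times vs. 2 times:
print(scaled_log(50))  # ≈ 0.393
print(scaled_log(2))   # ≈ 0.110
```

The log transform matters because visit counts are heavy-tailed: without it, a handful of very frequently visited URLs would dominate the feature's dynamic range.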

Website Characteristics:

  • length_of_url: (float32[-1,1]) Length of the URL string.

Match Characteristics:

  • total_title_match_length: (float32[-1,1]) Total length of matches between the user’s input and the website title.
  • total_bookmark_title_match_length: (float32[-1,1]) Total length of matches between the user’s input and the bookmark titles for the URL.
  • total_host_match_length: (float32[-1,1]) Total length of matches between the user’s input and the URL host.
  • total_path_match_length: (float32[-1,1]) Total length of matches between the user’s input and the URL path.
  • total_query_or_ref_match_length: (float32[-1,1]) Total length of matches between the user’s input and the URL query or fragment (“ref”) parts.
  • first_url_match_position: (float32[-1,1]) Position of the first match between the user’s input and the URL.
  • first_bookmark_title_match_position: (float32[-1,1]) Position of the first match between the user’s input and the bookmark titles for the URL.
  • host_match_at_word_boundary: (float32[-1,1]) Boolean indicator of whether the host match occurs at a word boundary.
  • has_non_scheme_www_match: (float32[-1,1]) Boolean indicator of whether a match occurs without considering the scheme (http/https) or “www” prefix.
  • is_host_only: (float32[-1,1]) Boolean indicator of whether the user’s input matches the host only.
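A few of the match features above can be illustrated with a toy reconstruction. This is a simplified sketch using naive substring matching; Chrome's actual matching logic (tokenization, word-boundary handling) is more involved, and the function name is hypothetical:

```python
from urllib.parse import urlparse

def match_features(user_input, url):
    # Simplified stand-ins for a few match signals; real omnibox matching
    # is token-based and handles word boundaries, casing, and schemes.
    parts = urlparse(url)
    host, path = parts.netloc, parts.path
    text = user_input.lower()
    return {
        "total_host_match_length": len(text) if text in host else 0,
        "total_path_match_length": len(text) if text in path else 0,
        "first_url_match_position": url.find(text),
        "is_host_only": text == host.removeprefix("www."),
    }

print(match_features("example", "https://www.example.com/page"))
# → {'total_host_match_length': 7, 'total_path_match_length': 0,
#    'first_url_match_position': 12, 'is_host_only': False}
```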

Model Processing:

These features are fed into the neural network. The network architecture, including specific layers and weights, is defined within the model file.
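Before inference, the scattered signals have to be assembled into a fixed-order vector matching the model's input layer. A minimal sketch, assuming a feature ordering and a zero default for missing signals (both assumptions, not Chrome's documented behavior):

```python
# Assumed ordering; the model expects its 20 inputs in a fixed sequence.
FEATURE_ORDER = [
    "log_visit_count", "log_typed_count", "log_shortcut_visit_count",
    "elapsed_time_last_visit_days", "log_elapsed_time_last_visit_secs",
    # ... remaining features in the order the model expects (assumed) ...
]

def to_input_vector(signals, feature_order=FEATURE_ORDER):
    # Missing signals default to 0.0; every value is assumed to already be
    # scaled into the model's float32 [-1, 1] range.
    return [float(signals.get(name, 0.0)) for name in feature_order]

vec = to_input_vector({"log_visit_count": 0.4, "log_typed_count": 0.2})
print(vec[:3])  # → [0.4, 0.2, 0.0]
```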

Output:

The model outputs a prediction score (float32[-1,1]) representing the relevance of each potential autocomplete suggestion. This score is used to rank suggestions, with higher scores appearing higher in the address bar dropdown.
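The ranking step itself is straightforward once each candidate has a score. A sketch with a stand-in scorer (the real scores come from the MLP):

```python
def rank_suggestions(candidates, score_fn):
    """candidates: iterable of (url, feature_vector) pairs.
    score_fn: callable mapping a feature vector to a relevance score."""
    # Higher scores rank higher in the omnibox dropdown.
    return sorted(candidates, key=lambda c: score_fn(c[1]), reverse=True)

# Toy example: the scorer just reads a precomputed fake score.
cands = [("a.com", [0.2]), ("b.com", [0.9]), ("c.com", [0.5])]
ranked = rank_suggestions(cands, score_fn=lambda feats: feats[0])
print([url for url, _ in ranked])  # → ['b.com', 'c.com', 'a.com']
```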

Model Architecture:

  1. Input Layer: 20 input features, each represented by a separate node (e.g., elapsed_time_last_shortcut_visit_days, log_visit_count, total_title_match_length).
  2. Concatenation Layer: All 20 input features are concatenated along axis 1, resulting in a single tensor of shape ? x 20. The “?” indicates a variable batch size.
  3. Dense Layer (FullyConnected): A fully connected layer with:
    • Weights: Shape 64 x 20, suggesting 64 neurons in this layer. The weights are quantized as int8 for efficiency.
    • Bias: Shape 64, a bias term for each neuron.
    • Activation Function: ReLU (Rectified Linear Unit).
    • Quantization: Asymmetric quantization of inputs is applied.
  4. Dense Layer (FullyConnected): Another fully connected layer with:
    • Weights: Shape 1 x 64, leading to a single output neuron.
    • Bias: Shape 1, a bias term for the output neuron.
  5. Logistic Layer: This likely represents a sigmoid activation function applied to the output of the previous dense layer, producing a value between 0 and 1.
  6. Output Layer: A single output node (“sigmoid”) representing the predicted score.
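The full forward pass described above (20 inputs → Dense(64) + ReLU → Dense(1) → sigmoid) can be sketched in plain Python. The weights here are random stand-ins just to exercise the shapes; they are not Chrome's trained parameters:

```python
import math, random

def relu(v):
    return [max(0.0, x) for x in v]

def dense(x, W, b):
    # W: list of rows (one row of weights per output neuron); b: bias per neuron.
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi
            for row, bi in zip(W, b)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(features, W1, b1, W2, b2):
    # 20 inputs -> Dense(64) + ReLU -> Dense(1) -> sigmoid score in (0, 1).
    h = relu(dense(features, W1, b1))
    return sigmoid(dense(h, W2, b2)[0])

# Random stand-in weights matching the documented shapes (64x20 and 1x64):
random.seed(0)
W1 = [[random.uniform(-1, 1) for _ in range(20)] for _ in range(64)]
b1 = [0.0] * 64
W2 = [[random.uniform(-1, 1) for _ in range(64)]]
b2 = [0.0]
score = predict([0.1] * 20, W1, b1, W2, b2)
print(0.0 < score < 1.0)  # → True
```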

Key Observations:

  • Simple Architecture: The model consists of two dense layers: one hidden layer with a ReLU activation, followed by a single-neuron output layer with a sigmoid activation.
  • Quantization: The model employs quantization to reduce size and improve performance, using int8 weights for the first dense layer.
  • Feature Engineering: The input features are a combination of raw values and engineered features (e.g., logarithmic transformations, match lengths, boolean indicators).
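The asymmetric int8 quantization noted above can be sketched in plain Python. The scale and zero-point formulas follow the standard affine quantization scheme, not Chrome's exact parameters:

```python
def quantize_asymmetric(values, num_bits=8):
    # Asymmetric (affine) quantization: map the float range [min, max]
    # onto the int8 range [-128, 127] via a scale and a zero point.
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / (qmax - qmin) or 1.0
    zero_point = round(qmin - lo / scale)
    q = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return [(qi - zero_point) * scale for qi in q]

weights = [-0.52, 0.0, 0.31, 0.77]
q, s, zp = quantize_asymmetric(weights)
approx = dequantize(q, s, zp)
print(max(abs(a - b) for a, b in zip(weights, approx)))  # small reconstruction error
```

The point of the asymmetric variant is that the zero point lets an unbalanced float range (here, more positive than negative mass) use all 256 int8 levels, keeping the per-weight rounding error near scale/2.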

Dan Petrovic, the managing director of DEJAN, is Australia’s best-known name in the field of search engine optimisation. Dan is a web author, innovator and a highly regarded search industry event speaker.
ORCID iD: https://orcid.org/0000-0002-6886-3211
