How to query the adjacent chunks given a chunk in AI Search?
02:31 17 Feb 2026

I'm using a skillset that includes the Content Understanding skill to process documents, splitting and vectorizing them into chunks.

I'm dealing with a problem where a table is cut off across chunks, so I'm looking for a way to retrieve the adjacent chunks of a given chunk. So far I've tried:

filter_query = (
    f"document_title eq '{title}' and "
    f"location_metadata/pageNumberTo ge {current_page_from - 1} and "
    f"location_metadata/pageNumberTo le {current_page_from} and "
    # Exclude current chunk by comparing both page ranges
    f"(location_metadata/pageNumberFrom ne {current_page_from} or "
    f"location_metadata/pageNumberTo ne {current_page_to})"
)
order_by = ["location_metadata/pageNumberTo desc"]

But it doesn't work well. For example, with a current chunk spanning pages 4-7:

Result I got: chunks spanning pages 1-3, 3-3, 4-4, 4-4
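For comparison, here is an interval-overlap variant of the filter I'm experimenting with (just a sketch, assuming the same field names from my index; note it still can't distinguish two chunks that share identical page ranges, which is part of why I'm confused):

```python
def adjacent_chunk_filter(title: str, page_from: int, page_to: int) -> str:
    """Build an OData filter matching chunks whose page range overlaps the
    current chunk's range expanded by one page on each side, excluding the
    current chunk itself by its exact page range."""
    lo, hi = page_from - 1, page_to + 1
    return (
        f"document_title eq '{title}' and "
        # interval overlap: chunk.pageNumberFrom <= hi and chunk.pageNumberTo >= lo
        f"location_metadata/pageNumberFrom le {hi} and "
        f"location_metadata/pageNumberTo ge {lo} and "
        # exclude the current chunk (fails if two chunks share the same range)
        f"not (location_metadata/pageNumberFrom eq {page_from} and "
        f"location_metadata/pageNumberTo eq {page_to})"
    )

print(adjacent_chunk_filter("report.pdf", 4, 7))
```

This at least avoids dropping chunks whose `pageNumberTo` falls outside the narrow window in my first attempt, but it doesn't solve the ordering/identity problem.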

The thing that frustrates me most is ordinalPosition: even after reading the documentation, I still don't understand what it means. Does it indicate the chunk's position within the original document?

In my case, I think a single PDF can produce two parent documents (one for text sections and one for images, per the index projections). Here is my skillset:

{
  "@odata.etag": "\"0x8DE694EC5967263\"",
  "name": "content-understanding-skillset",
  "description": "ContentUnderstandingSkill only",
  "skills": [
    {
      "@odata.type": "#Microsoft.Skills.Util.ContentUnderstandingSkill",
      "name": "content-understanding-skill",
      "description": "Extract text and images with location metadata and chunk text",
      "context": "/document",
      "extractionOptions": [
        "images",
        "locationMetadata"
      ],
      "inputs": [
        {
          "name": "file_data",
          "source": "/document/file_data",
          "inputs": []
        }
      ],
      "outputs": [
        {
          "name": "text_sections",
          "targetName": "text_sections"
        },
        {
          "name": "normalized_images",
          "targetName": "normalized_images"
        }
      ],
      "chunkingProperties": {
        "unit": "characters",
        "maximumLength": 8000,
        "overlapLength": 3000
      }
    },
    {
      "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
      "name": "text-embedding-skill",
      "description": "Embedding skill for text chunks",
      "context": "/document/text_sections/*",
      "resourceUri": "https://ptechagent.cognitiveservices.azure.com",
      "apiKey": "",
      "deploymentId": "text-embedding-3-large",
      "dimensions": 3072,
      "modelName": "text-embedding-3-large",
      "inputs": [
        {
          "name": "text",
          "source": "/document/text_sections/*/content",
          "inputs": []
        }
      ],
      "outputs": [
        {
          "name": "embedding",
          "targetName": "text_vector"
        }
      ]
    },
    {
      "@odata.type": "#Microsoft.Skills.Custom.ChatCompletionSkill",
      "name": "image-verbalization-skill",
      "description": "GenAI Prompt skill for image verbalization",
      "context": "/document/normalized_images/*",
      "uri": "",
      "httpMethod": "POST",
      "timeout": "PT1M",
      "batchSize": 1,
      "apiKey": "",
      "inputs": [
        {
          "name": "systemMessage",
          "source": "='You are tasked with generating concise, accurate descriptions of images, figures, diagrams, or charts in documents. The goal is to capture the key information and meaning conveyed by the image without including extraneous details like style, colors, visual aesthetics, or size.\n\nInstructions:\nContent Focus: Describe the core content and relationships depicted in the image.\n\nFor diagrams, specify the main elements and how they are connected or interact.\nFor charts, highlight key data points, trends, comparisons, or conclusions.\nFor figures or technical illustrations, identify the components and their significance.\nClarity & Precision: Use concise language to ensure clarity and technical accuracy. Avoid subjective or interpretive statements.\n\nAvoid Visual Descriptors: Exclude details about:\n\nColors, shading, and visual styles.\nImage size, layout, or decorative elements.\nFonts, borders, and stylistic embellishments.\nContext: If relevant, relate the image to the broader content of the technical document or the topic it supports.\n\nExample Descriptions:\nDiagram: \"A flowchart showing the four stages of a machine learning pipeline: data collection, preprocessing, model training, and evaluation, with arrows indicating the sequential flow of tasks.\"\n\nChart: \"A bar chart comparing the performance of four algorithms on three datasets, showing that Algorithm A consistently outperforms the others on Dataset 1.\"\n\nFigure: \"A labeled diagram illustrating the components of a transformer model, including the encoder, decoder, self-attention mechanism, and feedforward layers.\"'",
          "inputs": []
        },
        {
          "name": "userMessage",
          "source": "='Please describe this image.'",
          "inputs": []
        },
        {
          "name": "image",
          "source": "/document/normalized_images/*/data",
          "inputs": []
        }
      ],
      "outputs": [
        {
          "name": "response",
          "targetName": "verbalizedImage"
        }
      ],
      "httpHeaders": {}
    },
    {
      "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
      "name": "verbalized-image-embedding-skill",
      "description": "Embedding skill for verbalized images",
      "context": "/document/normalized_images/*",
      "resourceUri": "",
      "apiKey": "",
      "deploymentId": "text-embedding-3-large",
      "dimensions": 3072,
      "modelName": "text-embedding-3-large",
      "inputs": [
        {
          "name": "text",
          "source": "/document/normalized_images/*/verbalizedImage",
          "inputs": []
        }
      ],
      "outputs": [
        {
          "name": "embedding",
          "targetName": "verbalizedImage_vector"
        }
      ]
    }
  ],
  "cognitiveServices": {
    "@odata.type": "#Microsoft.Azure.Search.AIServicesByKey",
    "description": "Cognitive Services resource for Content Understanding Skill",
    "key": "",
    "subdomainUrl": "https://ptechagent.services.ai.azure.com/"
  },
  "knowledgeStore": {
    "storageConnectionString": "",
    "projections": [
      {
        "tables": [],
        "objects": [],
        "files": [
          {
            "storageContainer": "marketbot-documents-images",
            "generatedKeyName": "marketbot-documents-imagesKey",
            "source": "/document/normalized_images/*",
            "inputs": []
          }
        ]
      }
    ],
    "parameters": {
      "synthesizeGeneratedKeyName": true
    }
  },
  "indexProjections": {
    "selectors": [
      {
        "targetIndexName": "content-understanding-index",
        "parentKeyFieldName": "text_document_id",
        "sourceContext": "/document/text_sections/*",
        "mappings": [
          {
            "name": "content_text",
            "source": "/document/text_sections/*/content",
            "inputs": []
          },
          {
            "name": "content_embedding",
            "source": "/document/text_sections/*/text_vector",
            "inputs": []
          },
          {
            "name": "location_metadata",
            "source": "/document/text_sections/*/locationMetadata",
            "inputs": []
          },
          {
            "name": "document_title",
            "source": "/document/document_title",
            "inputs": []
          }
        ]
      },
      {
        "targetIndexName": "content-understanding-index",
        "parentKeyFieldName": "image_document_id",
        "sourceContext": "/document/normalized_images/*",
        "mappings": [
          {
            "name": "content_text",
            "source": "/document/normalized_images/*/verbalizedImage",
            "inputs": []
          },
          {
            "name": "content_embedding",
            "source": "/document/normalized_images/*/verbalizedImage_vector",
            "inputs": []
          },
          {
            "name": "content_path",
            "source": "/document/normalized_images/*/imagePath",
            "inputs": []
          },
          {
            "name": "location_metadata",
            "source": "/document/normalized_images/*/locationMetadata",
            "inputs": []
          },
          {
            "name": "document_title",
            "source": "/document/document_title",
            "inputs": []
          }
        ]
      }
    ],
    "parameters": {
      "projectionMode": "skipIndexingParentDocuments"
    }
  }
}

Thank you

azure-ai-search