AgentQL is a powerful and intuitive query language designed to simplify web data extraction and automation. Unlike traditional methods like XPath and CSS selectors, which are prone to breaking and complex to maintain, AgentQL allows developers to locate web elements using natural language-like queries. This robust tool adapts to website structure changes, offers context-aware element selection, and reduces maintenance efforts. AgentQL is ideal for a variety of use cases, including data scraping, web automation, and end-to-end testing, making it easier to extract structured data, automate workflows, and build reliable test suites.
Table of Contents
- Introducing AgentQL
- Understanding AgentQL’s Architecture
- Hands-on Implementation
- Testing on AgentQL’s Playground
Let’s start by understanding what AgentQL is.
Introducing AgentQL
AgentQL is an AI-powered query language that transforms how developers interact with web elements and extract data. Unlike traditional selectors, it uses natural language queries to precisely locate web elements, making web automation more intuitive and resilient to site updates. The system consists of three core components: the AgentQL Query Language for human-readable element location, the Python SDK for seamless integration with automation frameworks like Playwright, and the AgentQL Debugger Chrome Extension for real-time query testing.
Developers can interact with web content through element queries (query_elements() and get_by_prompt()) for page manipulation, and data queries (query_data()) for structured information extraction. This versatile toolset simplifies common web automation tasks like testing, scraping, and data collection while providing a more maintainable and robust solution compared to conventional methods.
Understanding AgentQL’s Architecture
AgentQL’s architecture is built on a sophisticated pipeline that transforms natural language queries into precise web element selection and data extraction. The system begins by processing two primary input sources: the page’s HTML structure and its accessibility tree, providing a comprehensive understanding of the web content. Upon receiving a query, AgentQL first simplifies the input by removing unnecessary noise and metadata, then dynamically selects between two specialized pipelines: one optimized for data scraping with high accuracy, and another for web automation focusing on speed and reliability.
At its core, AgentQL leverages various Large Language Models (including GPT-4, Llama, and Gemini) alongside its proprietary model, choosing the most appropriate one based on task complexity. The results undergo rigorous grounding and validation processes to ensure accuracy and alignment with the original context.
Hands-on Implementation
Step 1: Set Up Your Environment
Create a new file called search_automation.py:
import agentql
from playwright.sync_api import sync_playwright
from agentql.ext.playwright.sync_api import Page
URL = "https://scrapeme.live/shop"
Step 2: Define Your Search and Data Queries
Define queries to locate the search box, fetch product details, and add the “Qwilfish” product to the cart:
SEARCH_BOX_QUERY = """
{
    search_product_box
}
"""
PRODUCT_DATA_QUERY = """
{
    price_currency
    products[] {
        name
        price
    }
}
"""
NATURAL_LANGUAGE_PROMPT = "Button to display Qwilfish page"
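For reference, query_data() returns a plain Python dict whose shape mirrors the query: each top-level field becomes a key, and products[] becomes a list of dicts. A minimal sketch of that shape, using made-up values rather than real scraped data:

```python
# Hypothetical response shaped like PRODUCT_DATA_QUERY above;
# the values are illustrative, not real scraped data.
sample_response = {
    "price_currency": "£",
    "products": [
        {"name": "Qwilfish", "price": 77.0},
        {"name": "Huntail", "price": 52.0},
    ],
}

# Top-level query fields map to dict keys...
assert set(sample_response) == {"price_currency", "products"}
# ...and products[] maps to a list of dicts with the nested fields.
assert all(set(p) == {"name", "price"} for p in sample_response["products"])
```

Keeping this mapping in mind makes it easier to write the downstream code that consumes the extracted data.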
Step 3: Extract Product Data
Create a function to navigate to the site, perform the search, and extract data on products listed:
def _extract_product_data(page: Page, search_key_word: str) -> dict:
    # Locate the search box with an AgentQL element query
    response = page.query_elements(SEARCH_BOX_QUERY)
    response.search_product_box.type(search_key_word, delay=200)
    page.keyboard.press("Enter")
    # Extract structured product data from the results page
    data = page.query_data(PRODUCT_DATA_QUERY)
    return data
Step 4: Interact with Elements
Define a function to interact with the “Qwilfish” product and add it to the cart:
def _add_qwilfish_to_cart(page: Page):
    # Locate the product button using a natural-language prompt
    qwilfish_page_btn = page.get_by_prompt(NATURAL_LANGUAGE_PROMPT)
    if qwilfish_page_btn:
        qwilfish_page_btn.click()
    # Pause for 10 seconds so the page action can complete
    page.wait_for_timeout(10000)
Step 5: Create the Main Function
Combine the functions to launch the browser, execute the search, retrieve product data, and interact with elements:
def main():
    with sync_playwright() as playwright, playwright.chromium.launch(headless=False) as browser:
        # Wrap the Playwright page so it supports AgentQL queries
        page = agentql.wrap(browser.new_page())
        page.goto(URL)
        product_data = _extract_product_data(page, search_key_word="fish")
        print(product_data)
        _add_qwilfish_to_cart(page)

if __name__ == "__main__":
    main()
Step 6: Run Your Script
Run the complete script with the command below:
python search_automation.py
Output
{'price_currency': '£', 'products': [{'name': 'Qwilfish', 'price': 77.0}, {'name': 'Huntail', 'price': 52.0}, {'name': 'Marill', 'price': 127.0}, {'name': 'Croconaw', 'price': 175.0}, {'name': 'Seadra', 'price': 51.0}, {'name': 'Slowbro', 'price': 179.0}]}
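Because query_data() returns an ordinary Python dict, the result can be post-processed with plain Python. For example, to find the cheapest product in the output above:

```python
# The dict returned by page.query_data(), copied from the output above
data = {
    "price_currency": "£",
    "products": [
        {"name": "Qwilfish", "price": 77.0},
        {"name": "Huntail", "price": 52.0},
        {"name": "Marill", "price": 127.0},
        {"name": "Croconaw", "price": 175.0},
        {"name": "Seadra", "price": 51.0},
        {"name": "Slowbro", "price": 179.0},
    ],
}

# Pick the product with the lowest price
cheapest = min(data["products"], key=lambda p: p["price"])
print(f"Cheapest: {cheapest['name']} at {data['price_currency']}{cheapest['price']}")
# Cheapest: Seadra at £51.0
```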
Testing on AgentQL’s Playground
AgentQL Playground offers a user-friendly interface for quick and effective web scraping. Start by visiting the Playground, entering the URL of the website you want to scrape, and clicking on “Suggest a Query.” Simply use natural language to describe what you need—for example, “Find all product prices on the page”—and AgentQL will generate an appropriate query.
This interactive approach allows you to experiment with different queries, view real-time results, and refine your queries before implementing them, helping you understand how AgentQL interprets and converts requests into effective scraping queries.
Final Words
AgentQL offers an intuitive alternative to traditional selector-based approaches. By harnessing natural language processing, it bridges the gap between human intent and web interaction, making automation tasks simpler and more reliable. Whether you’re building tests, scrapers, or automation scripts, AgentQL’s intelligent query system, SDK, and debugging tools streamline your development process. As web applications grow increasingly complex, AgentQL’s adaptive approach and AI-powered understanding ensure your automation remains resilient while maintaining clean, maintainable code.