How does Selenium 2 (WebDriver) implement WebElement?


The scenario I'm considering is basic:

page = driver.open_page(URL)
linkElement = page.find_elements(XPATH)[0]

(I'm assuming a remote machine, i.e., a remote driver.) How does the server know which element to click?

More generally, I couldn't find an overview of the Selenium 2 implementation: something that tells the story without going through the code line by line, but in much more detail than the API documentation.

7/28/2017 8:33:19 PM

Accepted Answer

You're right, there isn't a good one-size-fits-all "under-the-covers" look at how WebDriver implements the various parts of its API, largely because the actual implementation can be very different depending on the browser and the operating system. The closest you'll come is the various pages on the project wiki.

To answer your specific question, the remote server creates a local instance of the client-side driver, and uses it to locate and click on the element. The driver (InternetExplorerDriver, FirefoxDriver, ChromeDriver, etc.) commonly uses JavaScript to find the element and get its dimensions and location on the page. The element is scrolled into view, if needed, and an OS-level mouse event is sent to the browser window to simulate the click.

However, this is just the common case, and there are exceptions. For instance, some browsers may find the element using means other than the JavaScript automation atoms. Likewise, some drivers on some operating systems rely on synthetic events rather than OS-level, or so-called "native", events. The important thing to remember is that the remote server instantiates the same object you would if you invoked the driver locally, without using the Selenium remote server.
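On the narrower question of how the server knows which element to click: in the WebDriver wire protocol, a find command returns an opaque element ID rather than the element itself, and later commands such as click send that ID back. Here is a minimal sketch of that bookkeeping. `ToyElementStore` and `FakeLink` are hypothetical names made up for illustration, not Selenium classes, and real drivers hold their element references in driver-specific ways.

```python
import uuid


class ToyElementStore:
    """Hypothetical stand-in for the server-side table of located elements."""

    def __init__(self):
        self._elements = {}  # opaque id -> live element handle

    def register(self, element):
        # Hand out an opaque id; the client only ever sees this string.
        element_id = str(uuid.uuid4())
        self._elements[element_id] = element
        return element_id

    def resolve(self, element_id):
        # A later command (e.g. click) carries the id back to the server.
        return self._elements[element_id]


class FakeLink:
    """Stands in for a real DOM node held by the driver."""

    def __init__(self, href):
        self.href = href
        self.clicked = False

    def click(self):
        self.clicked = True


store = ToyElementStore()
link = FakeLink("/next")

# "find element": the server locates the node and returns only its id.
element_id = store.register(link)

# "click element": the client sends the id back; the server looks up the node.
store.resolve(element_id).click()
print(link.clicked)  # True
```

The client-side WebElement object is therefore little more than a wrapper around that ID plus a reference to the session it belongs to.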

5/3/2011 2:58:17 PM

If I understand your question right:

You start the remote WebDriver server. Through a remote driver client (that is, your test), the server is told to fetch a page ('URL' is a variable set previously, for instance: String URL = "";).

Then the server is told to find all the elements on the fetched page that match some XPath ('XPATH' is something like By.xpath("//div[@class = 'some_button_class']") - this is the Java implementation of WebDriver and I'm not sure how it's used in Ruby). This command returns a list of WebElements: all the 'div' elements that look like <div class="some_button_class"></div>. Check out how XPath works if you're not familiar with it.

Since you're using [0] at the end of your command, you're telling the server to return the first element from that list (the first div that looks like <div class="some_button_class"></div>).
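To see the "match a list, then take [0]" idea without a browser, here is a small sketch using Python's standard-library ElementTree, which understands a subset of XPath. The page fragment is a made-up example, and this is not how WebDriver itself evaluates XPath; it only illustrates the selection logic.

```python
import xml.etree.ElementTree as ET

# Hypothetical page fragment, well-formed so the stdlib parser accepts it.
page = ET.fromstring(
    "<body>"
    "<div class='some_button_class'>first</div>"
    "<div class='other'>ignored</div>"
    "<div class='some_button_class'>second</div>"
    "</body>"
)

# ElementTree supports only a subset of XPath, including [@attr='value'].
matches = page.findall(".//div[@class='some_button_class']")

first = matches[0]  # same idea as find_elements(XPATH)[0]
print(len(matches), first.text)  # 2 first
```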

Finally, you're telling the server to perform a click on that element. The server receives that command, translates it to JavaScript, and injects that JavaScript into the page. The injected JavaScript triggers a 'click' event on the page, which does what would happen if a real user clicked the element.
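The steps above can be sketched as the sequence of HTTP requests a remote client sends. The paths follow the WebDriver wire protocol as I understand it; the session ID, element ID, and URL are placeholders invented for the example, and `wire_commands` is a hypothetical helper, not part of any Selenium client.

```python
import json


def wire_commands(session_id, url, xpath, element_id):
    """Return (method, path, body) tuples for open page, find elements, click."""
    base = f"/session/{session_id}"
    return [
        ("POST", f"{base}/url", json.dumps({"url": url})),
        ("POST", f"{base}/elements", json.dumps({"using": "xpath", "value": xpath})),
        # element_id is whatever the server returned for the first match.
        ("POST", f"{base}/element/{element_id}/click", json.dumps({})),
    ]


# Placeholder values; nothing is sent over the network here.
for method, path, body in wire_commands(
    "s1", "http://example.test/", "//div[@class='some_button_class']", "e0"
):
    print(method, path, body)
```

Note that the click request carries only the element ID in its path. How the click is then carried out (injected JavaScript or an OS-level event) is up to the driver on the server side.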

Hope this helps.

Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow