I am trying to crawl and parse the dynamic content of a website using Selenium. The website I am crawling loads its content on the page's scroll event, so I trigger the scroll event through Selenium until the end of the page is reached.
In the product phase I fetch each product's details in a loop, and that also works fine, but once the iteration count goes above 280 it starts to fail.
Here is my code:
// Imports required by this method
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;

import org.openqa.selenium.By;
import org.openqa.selenium.JavascriptExecutor;
import org.openqa.selenium.Keys;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.openqa.selenium.firefox.FirefoxProfile;

private void init() throws IOException {
    FirefoxProfile profile = new FirefoxProfile(); // Create Firefox profile
    profile.setPreference("javascript.enabled", true); // Allow JavaScript in the browser
    WebDriver htmDriver = new FirefoxDriver(profile); // Attach the profile to the FirefoxDriver
    htmDriver.get(urlTextField.getText()); // Load the URL from the URL text field
    htmDriver.manage().window().maximize(); // Maximize the browser window
    String count = htmDriver.findElement(By.cssSelector("#numbFound > #no-of-results-filter")).getText(); // Total product count for the category
    //System.out.println("Total Category Count : "+count);
    htmDriver.findElement(By.cssSelector(".list")).click(); // Switch to the list view
    htmDriver.manage().timeouts().implicitlyWait(10, TimeUnit.SECONDS); // Set the implicit wait
    int lCount = Integer.parseInt(count); // Used to calculate the number of scroll passes
    for (int i = 1; i <= Math.ceil(lCount / 5.0); i++) { // 5.0 avoids integer division before ceil
        // Send Arrow Down key presses
        htmDriver.findElement(By.id("products-main4")).sendKeys(Keys.ARROW_DOWN);
        htmDriver.findElement(By.id("products-main4")).sendKeys(Keys.ARROW_DOWN);
        htmDriver.findElement(By.id("products-main4")).sendKeys(Keys.ARROW_DOWN);
        htmDriver.findElement(By.id("products-main4")).sendKeys(Keys.ARROW_DOWN);
        // Scroll the window down by the full document height
        ((JavascriptExecutor) htmDriver).executeScript(
                "window.scrollBy(0,document.body.scrollHeight)", "");
        htmDriver.manage().timeouts().implicitlyWait(5, TimeUnit.SECONDS);
    }
    // Product phase
    int row2 = 0;
    List<WebElement> rdata = htmDriver.findElements(By.className("product_list_view_cont")); // Select each product row
    for (WebElement data : rdata) {
        String title = data.findElement(By.cssSelector(".product_list_view_heading")).getText(); // Get product title
        System.out.println(title);
        // Check whether a product price is available
        boolean priceMissing = data.findElements(By.cssSelector(".product_list_view_price_outer span")).isEmpty();
        if (!priceMissing) {
            // Get the price of the product
            String price = data.findElement(By.cssSelector(".product_list_view_price_outer var[id^=selling-price-id-]")).getText().trim();
            System.out.println(price);
        } else {
            // No price available for this product
            System.out.println("No price");
        }
        String brand = data.findElement(By.cssSelector("ul.key-features li")).getText(); // Fetch brand
        System.out.println(brand);
        String brandUrl = data.findElement(By.cssSelector(".product_list_view_info_cont a")).getAttribute("href"); // Fetch brand URL
        System.out.println(brandUrl);
        String status = data.findElement(By.cssSelector(".product_list_view_buy-outer .lfloat")).getText(); // Fetch product status
        System.out.println(status);
    }
}
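A side note on the scroll loop's upper bound: it is meant to be the product count divided by 5, rounded up. In Java, dividing one int by another truncates before Math.ceil ever sees the value, so one way to get the rounded-up figure is plain integer arithmetic. The following is only a minimal sketch: ScrollMath and scrollPasses are hypothetical names, and 5 products per pass is simply the figure used in the loop, not something confirmed about the site.

```java
final class ScrollMath {
    private ScrollMath() {}

    /**
     * Returns how many scroll passes are needed to reveal totalProducts
     * when each pass reveals productsPerPass of them, rounding up.
     */
    static int scrollPasses(int totalProducts, int productsPerPass) {
        if (totalProducts < 0 || productsPerPass <= 0) {
            throw new IllegalArgumentException("counts must be non-negative and the divisor positive");
        }
        // Integer ceiling division: no floating point and no truncation surprise.
        return (totalProducts + productsPerPass - 1) / productsPerPass;
    }
}
```

For 281 products at 5 per pass this gives 57, whereas Math.ceil(281 / 5) evaluates to 56.0 because the int division has already dropped the remainder.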
Selenium then throws the following exception:
Feb 18, 2015 10:00:10 AM org.apache.http.impl.execchain.RetryExec execute
INFO: I/O exception (java.net.BindException) caught when processing request to {}->http://localhost:7055: Address already in use: connect
Feb 18, 2015 10:00:10 AM org.apache.http.impl.execchain.RetryExec execute
INFO: Retrying request to {}->http://localhost:7055
Feb 18, 2015 10:00:10 AM org.apache.http.impl.execchain.RetryExec execute
INFO: I/O exception (java.net.BindException) caught when processing request to {}->http://localhost:7055: Address already in use: connect
Feb 18, 2015 10:00:10 AM org.apache.http.impl.execchain.RetryExec execute
INFO: Retrying request to {}->http://localhost:7055
Feb 18, 2015 10:00:10 AM org.apache.http.impl.execchain.RetryExec execute
INFO: I/O exception (java.net.BindException) caught when processing request to {}->http://localhost:7055: Address already in use: connect
Feb 18, 2015 10:00:10 AM org.apache.http.impl.execchain.RetryExec execute
INFO: Retrying request to {}->http://localhost:7055
Feb 18, 2015 10:00:12 AM org.apache.http.impl.execchain.RetryExec execute
INFO: I/O exception (java.net.BindException) caught when processing request to {}->http://localhost:7055: Address already in use: connect
Feb 18, 2015 10:00:12 AM org.apache.http.impl.execchain.RetryExec execute
INFO: Retrying request to {}->http://localhost:7055
Feb 18, 2015 10:00:12 AM org.apache.http.impl.execchain.RetryExec execute
INFO: I/O exception (java.net.BindException) caught when processing request to {}->http://localhost:7055: Address already in use: connect
Feb 18, 2015 10:00:12 AM org.apache.http.impl.execchain.RetryExec execute
INFO: Retrying request to {}->http://localhost:7055
Feb 18, 2015 10:00:12 AM org.apache.http.impl.execchain.RetryExec execute
INFO: I/O exception (java.net.BindException) caught when processing request to {}->http://localhost:7055: Address already in use: connect
Feb 18, 2015 10:00:12 AM org.apache.http.impl.execchain.RetryExec execute
INFO: Retrying request to {}->http://localhost:7055
The same messages are logged on every iteration after a while.
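A note on the log above: every findElement, getText and getAttribute call is a separate HTTP request from the Java client to the Firefox driver on localhost:7055, so a few hundred product rows turn into thousands of short-lived connections. One assumption, not confirmed by anything in this post, is that the machine temporarily runs out of free local ports, which is a common cause of "BindException: Address already in use: connect". Under that assumption, a minimal sketch of a workaround is to retry a failing lookup after a short pause. RetryingCall and withRetry are hypothetical names and not part of Selenium.

```java
import java.util.function.Supplier;

final class RetryingCall {
    private RetryingCall() {}

    /**
     * Runs the action and returns its result. If it throws an unchecked
     * exception, waits and tries again, up to maxAttempts attempts in total.
     * The pause grows linearly: pauseMillis, 2 * pauseMillis, and so on.
     */
    static <T> T withRetry(Supplier<T> action, int maxAttempts, long pauseMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt < maxAttempts) {
                    pause(pauseMillis * attempt);
                }
            }
        }
        // Every attempt failed: rethrow the most recent failure.
        throw last;
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            // Restore the interrupt flag and stop retrying.
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting to retry", e);
        }
    }
}
```

In the product loop this could wrap a single lookup, for example RetryingCall.withRetry(() -> data.findElement(By.cssSelector(".product_list_view_heading")).getText(), 5, 200L), which needs Java 8 lambdas. Retrying only hides the symptom; if the port assumption holds, reducing the number of driver calls per row would be the more direct fix.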