Scraping
Scraping a web page.
Endpoint
GET or POST https://api.browserku.com
Your First Request
Let's make a GET request to scrape https://example.com:
bash
curl https://api.browserku.com \
-G \
--data-urlencode "url=https://example.com"
The result will be a JSON object which looks like this:
json
{
"meta": {
"lang": "en",
"title": "NASA",
"description": "NASA.gov brings you the latest news, images and videos from America's space agency, pioneering the future in space exploration, scientific discovery and aeronautics research."
}
}
Browserku automatically extracts important information from this web page for you:
- meta.title: The page title
- meta.description: The page description
- meta.lang: The page language, from the lang attribute on the <html> element
- meta.themeColor: The page theme color, from the <meta name="theme-color"> element
If you want to retrieve the entire HTML of this page, use the includeHtml option. A new html property will be added to the result object.
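As a rough TypeScript sketch of the same GET request, the query string can be assembled with the standard URLSearchParams class. The `buildScrapeUrl` helper is a hypothetical name used for illustration and is not part of any Browserku SDK; only the endpoint and the `url` and `includeHtml` parameters come from this page.

```typescript
// Endpoint as documented above.
const ENDPOINT = "https://api.browserku.com";

// Hypothetical helper: builds the GET URL for a scrape request.
// includeHtml is only added when requested, so the default request stays minimal.
function buildScrapeUrl(target: string, includeHtml: boolean = false): string {
  const params = new URLSearchParams({ url: target });
  if (includeHtml) {
    params.set("includeHtml", "true");
  }
  return ENDPOINT + "?" + params.toString();
}

const requestUrl = buildScrapeUrl("https://example.com", true);
console.log(requestUrl);
```

URLSearchParams percent-encodes the target URL, which plays the same role as `--data-urlencode` in the curl example.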
Taking Screenshots
To take a screenshot, use the screenshot option. The easiest way is to set it to true:
bash
curl https://api.browserku.com \
-G \
--data-urlencode "url=https://example.com" \
--data-urlencode "screenshot=true"
Now the result will be a JSON object which looks like this:
json
{
"meta": {
"title": "NASA",
"description": "NASA.gov brings you the latest news, images and videos from America's space agency, pioneering the future in space exploration, scientific discovery and aeronautics research.",
"lang": "en"
},
"screenshot": {
"url": "https://r2.browserku.com/beGjNFfLCLSanjFqIawPx.png",
"width": 1280,
"height": 720,
"size": 794292
}
}
The saved screenshot URL is valid for 7 days, after which the screenshot is automatically deleted. See the screenshot option to learn more.
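If you store screenshot URLs, it can help to track when they stop working. The sketch below is a minimal illustration; it assumes the 7-day window starts when you receive the response, which this page does not state explicitly. The function names are hypothetical.

```typescript
// 7 days in milliseconds, per the retention period described above.
const SCREENSHOT_TTL_MS = 7 * 24 * 60 * 60 * 1000;

// Assumption: the retention window is counted from the moment the response was received.
function screenshotExpiresAt(receivedAt: Date): Date {
  return new Date(receivedAt.getTime() + SCREENSHOT_TTL_MS);
}

// Returns true once the assumed retention window has passed.
function isScreenshotExpired(receivedAt: Date, now: Date = new Date()): boolean {
  return now.getTime() >= screenshotExpiresAt(receivedAt).getTime();
}
```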
If you want Browserku to return the screenshot image directly instead of the JSON object, set the response option to screenshot.url:
bash
curl https://api.browserku.com \
-G \
--data-urlencode "url=https://example.com" \
--data-urlencode "screenshot=true" \
--data-urlencode "response=screenshot.url"
ts
const { result } = await browserku.scrape({
url: "https://example.com",
screenshot: true,
response: "screenshot.url",
})
You can use the response option to reference any value in the result object and have Browserku use it as the actual response. Make sure the value you reference is a URL.
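To make the idea of "referencing a value" concrete, here is a small sketch of a dotted-path lookup such as `screenshot.url`. It only illustrates the concept; `resolvePath` is a hypothetical helper, and how Browserku resolves the path internally is not documented here.

```typescript
// Illustrative only: walks a dotted path such as "screenshot.url" through nested objects.
function resolvePath(data: unknown, path: string): unknown {
  let current: unknown = data;
  for (const key of path.split(".")) {
    // Stop as soon as the path runs into a non-object value.
    if (typeof current !== "object" || current === null) {
      return undefined;
    }
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

const sampleResult = {
  screenshot: { url: "https://r2.browserku.com/some_random_id.png" },
};
console.log(resolvePath(sampleResult, "screenshot.url"));
```

A path that does not exist in the result, such as `meta.title` in this sample, resolves to `undefined`.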
Query Parameters / Body
url
- Type: string?
The URL you want to scrape.
source
- Type: string?
Instead of using an external url, you can provide source HTML to scrape.
Learn more about Using Custom HTML.
includeHtml
- Type: boolean?
Whether to include the scraped HTML in the response.
waitUntil
- Type: One of load, domcontentloaded, networkidle0, networkidle2
- Default: networkidle0
Define when the page is considered ready.
timeout
- Type: int? (>= 0, <= 60,000)
Page navigation timeout in milliseconds.
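The waitUntil values and the timeout range documented above can be checked on the client before sending a request. This is a minimal sketch with hypothetical helper names; Browserku's own validation behavior and error responses are not described on this page.

```typescript
// The four waitUntil values listed above.
const WAIT_UNTIL_VALUES = ["load", "domcontentloaded", "networkidle0", "networkidle2"] as const;
type WaitUntil = (typeof WAIT_UNTIL_VALUES)[number];

// Type guard: narrows an arbitrary string to one of the documented values.
function isWaitUntil(value: string): value is WaitUntil {
  return (WAIT_UNTIL_VALUES as readonly string[]).indexOf(value) !== -1;
}

// timeout must be an integer between 0 and 60,000 milliseconds inclusive.
function isValidTimeout(ms: number): boolean {
  return Math.floor(ms) === ms && ms >= 0 && ms <= 60000;
}
```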
blockAds
- Type: boolean?
- Default: true
Whether to block ads.
proxy
- Type: string?
Proxy server to use.
device
- Type: string?
Simulate the viewport and user agent of the given device.
userAgent
- Type: string?
Override the user agent.
viewport
- Type: object?

| Property | Type | Description |
|---|---|---|
| width | int | |
| height | int | |
| deviceScaleFactor | int? | Must be between 0 and 10 |
| isMobile | boolean? | |
| hasTouch | boolean? | |
| isLandscape | boolean? | |

Override the viewport.
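Because viewport is an object, one natural way to send it is in the body of a POST request. The sketch below assumes the endpoint accepts a JSON body with a `Content-Type: application/json` header; this page only says options may be passed as query parameters or body, so treat the encoding as an assumption. `buildViewportRequest` and `ScrapeRequest` are hypothetical names.

```typescript
// Mirrors the viewport properties in the table above.
interface Viewport {
  width: number;
  height: number;
  deviceScaleFactor?: number;
  isMobile?: boolean;
  hasTouch?: boolean;
  isLandscape?: boolean;
}

// Plain description of an HTTP request, compatible with the init argument of fetch().
interface ScrapeRequest {
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Assumption: object-valued options are sent as JSON in a POST body.
function buildViewportRequest(url: string, viewport: Viewport): ScrapeRequest {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ url: url, viewport: viewport }),
  };
}

const mobileRequest = buildViewportRequest("https://example.com", {
  width: 390,
  height: 844,
  deviceScaleFactor: 3,
  isMobile: true,
  hasTouch: true,
});
// mobileRequest could then be passed as the second argument to fetch("https://api.browserku.com", ...).
```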
auth
- Type: object?

| Property | Type | Description |
|---|---|---|
| username | string | |
| password | string | |

Basic authentication.
waitForSelector
- Type: string?
Wait for an element matching this selector to appear before considering the page ready.
waitForTimeout
- Type: int? (>= 0, <= 60,000)
Wait for the given time (in milliseconds) before considering the page ready.
animations
- Type: boolean?
- Default: false
Whether to enable CSS animations.
scripts
- Type: string[]?
Inject custom scripts into the page.
json
[
"document.body.style.backgroundColor = 'red'",
"https://your-website.com/custom-script.js"
]
Can be either a JavaScript string or a URL to your script.
styles
- Type: string[]?
Inject custom styles into the page.
json
["body {background: red}", "https://your-website.com/custom-style.css"]
Can be either a CSS string or a URL to your CSS file.
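Since each scripts or styles entry can be either inline code or a URL, it can be useful to check how an entry is likely to be interpreted before sending it. The sketch below uses one plausible rule (an entry counts as a URL if it parses as an absolute http or https URL); this is an assumption, and Browserku's actual detection logic is not documented here. `classifyEntry` is a hypothetical helper.

```typescript
type InjectionKind = "url" | "inline";

// Assumption: an entry is a URL only if it parses as an absolute http(s) URL.
// Anything else, including CSS such as "a:hover {...}", is treated as inline code.
function classifyEntry(entry: string): InjectionKind {
  try {
    const parsedEntry = new URL(entry);
    const isHttp = parsedEntry.protocol === "http:" || parsedEntry.protocol === "https:";
    return isHttp ? "url" : "inline";
  } catch {
    // new URL() throws for strings that are not absolute URLs.
    return "inline";
  }
}

const entries = [
  "document.body.style.backgroundColor = 'red'",
  "https://your-website.com/custom-script.js",
];
for (const entry of entries) {
  console.log(classifyEntry(entry), entry);
}
```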
click
- Type: string?
Click an element when the page is ready.
cookies
- Type: object[]?

| Property | Type | Description |
|---|---|---|
| name | string | |
| value | string | |
| domain | string? | |
| path | string? | |
| secure | boolean? | |
| httpOnly | boolean? | |
| sameSite | string? | Can be either Lax or Strict |

Attach cookies to the request.
ttl
- Type: string?
- Default: 1d
Define how long the response should be cached for.
We use the ms library to parse human-readable durations such as 1d into milliseconds.
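To show what a ttl string such as `1d` amounts to, here is a simplified stand-in for that kind of parsing. It is not the ms library itself and handles only the short "number plus unit" form; the set of units below is an assumption made for this sketch, and `parseDuration` is a hypothetical name.

```typescript
// Milliseconds per unit for the short unit suffixes handled by this sketch.
const UNIT_MS: Record<string, number> = {
  ms: 1,
  s: 1000,
  m: 60 * 1000,
  h: 60 * 60 * 1000,
  d: 24 * 60 * 60 * 1000,
  w: 7 * 24 * 60 * 60 * 1000,
};

// Simplified stand-in for a duration parser: accepts strings like "1d", "2h" or "500ms".
// Returns undefined for anything it does not recognize.
function parseDuration(input: string): number | undefined {
  const match = /^(\d+(?:\.\d+)?)\s*(ms|s|m|h|d|w)$/.exec(input.trim());
  if (!match) {
    return undefined;
  }
  return parseFloat(match[1]) * UNIT_MS[match[2]];
}

console.log(parseDuration("1d"));
```

With the default ttl of `1d`, this works out to 24 × 60 × 60 × 1000 = 86,400,000 milliseconds.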
noCache
- Type: boolean
Skip the cache, if a cached response exists.
rejectResourceTypes
- Type: string[]?
Reject requests to the given resource types.
Available resource types:
ts
type ResourceType =
| "document"
| "stylesheet"
| "image"
| "media"
| "font"
| "script"
| "texttrack"
| "xhr"
| "fetch"
| "eventsource"
| "websocket"
| "manifest"
| "other"
allowResourceTypes
- Type: string[]?
The reverse of rejectResourceTypes: only requests to the given resource types are fetched. By default, all requests are allowed.
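The difference between the two options can be sketched as a simple predicate. This only illustrates the semantics described above; the behavior when both options are supplied together is an assumption, and `shouldFetch` is a hypothetical helper rather than Browserku code.

```typescript
type ResourceType =
  | "document" | "stylesheet" | "image" | "media" | "font" | "script"
  | "texttrack" | "xhr" | "fetch" | "eventsource" | "websocket"
  | "manifest" | "other";

interface ResourceFilter {
  rejectResourceTypes?: ResourceType[];
  allowResourceTypes?: ResourceType[];
}

// Illustrative semantics: an allow list admits only the listed types,
// a reject list blocks the listed types, and no lists means everything is fetched.
function shouldFetch(type: ResourceType, filter: ResourceFilter): boolean {
  if (filter.allowResourceTypes && filter.allowResourceTypes.indexOf(type) === -1) {
    return false;
  }
  if (filter.rejectResourceTypes && filter.rejectResourceTypes.indexOf(type) !== -1) {
    return false;
  }
  return true;
}

console.log(shouldFetch("image", { rejectResourceTypes: ["image", "font"] }));
console.log(shouldFetch("script", { allowResourceTypes: ["document", "script"] }));
```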
javascript
- Type: boolean?
- Default: true
Whether to enable JavaScript.
screenshot
- Type: boolean? or object?

| Property | Type | Description |
|---|---|---|
| type | string? | Either jpeg or png; defaults to png |
| quality | int? | Must be between 0 and 100 |
| selector | string? | Only screenshot the specified element |
| fullPage | boolean? | |
| omitBackground | boolean? | |

Take a screenshot of the page, or of the element matching the specified selector.
A new screenshot property will be added to the response:
json
{
"screenshot": {
"url": "https://r2.browserku.com/some_random_id.png",
"width": 800,
"height": 600,
"size": 1024
}
}
- Type: boolean? or object?

| Property | Type | Description |
|---|---|---|
| scale | int? | Must be between 0 and 10 |
| displayHeaderFooter | boolean? | |
| headerTemplate | string? | |
| footerTemplate | string? | |
| format | string? | One of "Letter", "Legal", "Tabloid", "Ledger", "A0", "A1", "A2", "A3", "A4", "A5", "A6" |
| margin | object? | |
| landscape | boolean? | |
| pageRanges | string? | |
| printBackground | boolean? | |
| omitBackground | boolean? | |

response
- Type: string?
Use a property in the response data as the actual response.
For example, let's say you make this request:
text
GET /?screenshot=true&url=https://example.com
You get a response like this:
json
{
"screenshot": {
"url": "https://r2.browserku.com/some_random_id.png"
}
}
If you want to serve the screenshot image directly as the response, append &response=screenshot.url to the request URL:
text
GET /?screenshot=true&url=https://example.com&response=screenshot.url
debug
- Type: boolean?
Add debug information to the result.