# Delivery Format

We provide flexible access to our data, so your team gets the information in the right format at the right time. Datasets are delivered as flat files in one of four formats: JSON, JSONL, CSV, or Parquet.

| Format | Pros | Cons |
| ---------------------- | ---- | ---- |
| **JSON** | - Human-readable and widely used<br>- Nested structures supported<br>- Great for APIs and web data | - Large file sizes (not space-efficient)<br>- Slower to parse on big datasets<br>- Harder to process in batch (not line-delimited) |
| **JSONL** (JSON Lines) | - Line-delimited, easy to stream and process<br>- Works well with big data pipelines (can split files easily)<br>- Still human-readable | - Less widely adopted than plain JSON<br>- No built-in schema enforcement<br>- Slightly less convenient for nested arrays than JSON |
| **CSV** | - Very simple and widely supported<br>- Compact for tabular data<br>- Easy to open in Excel/Sheets | - No support for nested/complex data<br>- No types (everything is text)<br>- Ambiguities with delimiters, quoting, encoding |
| **Parquet** | - Highly compressed and columnar (great for analytics)<br>- Efficient for large-scale queries (only reads needed columns)<br>- Schema + data types preserved<br>- Optimized for big data tools (Spark, Hive, Snowflake) | - Not human-readable<br>- Higher overhead for small datasets<br>- More complex libraries needed to read/write |

***

### What does the data look like? <a href="#what-does-the-data-look-like" id="what-does-the-data-look-like"></a>

Each format represents the same records in a different way. The examples below show what to expect from each one:

#### **1. JSON (JavaScript Object Notation)**

**What it looks like:**

```json
{
  "id": 1,
  "name": "Alice",
  "email": "alice@example.com",
  "skills": ["Python", "SQL"]
}
```

**Key points:**

* A **human-readable text format** supported by virtually every programming language and data tool.
* Stores data as objects with keys (`"name"`, `"email"`) and values.
* Can handle lists, nested objects, and complex structures.

**When to use it:**

* Great for structured data that needs to be read or shared easily.
* Widely used in APIs and integrations.
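
As a minimal sketch of how such a record might be consumed, the snippet below parses the example object with Python's standard-library `json` module. The inline `raw` string is an assumption standing in for the contents of a delivered file.

```python
import json

# Sample record copied from the example above; in practice this text
# would come from a delivered .json file (assumption for illustration).
raw = '{"id": 1, "name": "Alice", "email": "alice@example.com", "skills": ["Python", "SQL"]}'

record = json.loads(raw)       # parse the whole document into Python objects
name = record["name"]          # values are looked up by key
skills = record["skills"]      # the nested array becomes a Python list
first_skill = skills[0]

print(name, first_skill)
```

Note that the whole document is parsed in one call, which is why very large JSON files can be slow and memory-hungry to load.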

#### **2. JSONL (JSON Lines)**

**What it looks like:**

```json
{"id": 1, "name": "Alice", "email": "alice@example.com"}
{"id": 2, "name": "Bob", "email": "bob@example.com"}
{"id": 3, "name": "Charlie", "email": "charlie@example.com"}
```

**Key points:**

* Each **line in the file** is its own JSON object.
* Easy to process line by line, especially with large files.
* Records can be read and written one at a time, so memory use stays low, whereas a single large JSON array usually has to be parsed in full.

**When to use it:**

* Perfect for **large-scale data processing** (Spark, Elasticsearch, etc.).
* Useful for **logs, events, or streaming-style data**.
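
The line-by-line processing described above can be sketched as follows. The helper name `iter_jsonl` is hypothetical, and the in-memory `sample` stream is an assumption standing in for an opened `.jsonl` file.

```python
import io
import json
from typing import Iterator, TextIO


def iter_jsonl(stream: TextIO) -> Iterator[dict]:
    """Yield one parsed record per non-empty line, without loading the whole file."""
    for line in stream:
        line = line.strip()
        if line:  # skip blank lines
            yield json.loads(line)


# In-memory stand-in for open("data.jsonl", encoding="utf-8") (hypothetical file name).
sample = io.StringIO(
    '{"id": 1, "name": "Alice", "email": "alice@example.com"}\n'
    '{"id": 2, "name": "Bob", "email": "bob@example.com"}\n'
    '{"id": 3, "name": "Charlie", "email": "charlie@example.com"}\n'
)

names = [rec["name"] for rec in iter_jsonl(sample)]
print(names)
```

Because the generator holds only one line in memory at a time, the same loop should work unchanged on files far larger than available RAM.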

#### **3. CSV (Comma-Separated Values)**

**What it looks like:**

```csv
id,name,email
1,Alice,alice@example.com
2,Bob,bob@example.com
3,Charlie,charlie@example.com
```

**Key points:**

* A **simple text table**, with each row as a record and each column separated by a comma (sometimes semicolon).
* Opens easily in Excel, Google Sheets, or any spreadsheet tool.
* Doesn’t support nested or complex data: everything is flattened into rows and columns.
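
To illustrate what flattening means, here is a rough sketch that turns a nested record into a single CSV row. Joining list values with `|` is an assumption chosen for illustration, not a documented convention of these datasets, and the `flatten` helper is hypothetical.

```python
import csv
import io

# Nested record taken from the JSON example; "skills" is a list.
record = {"id": 1, "name": "Alice", "email": "alice@example.com", "skills": ["Python", "SQL"]}


def flatten(rec: dict, sep: str = "|") -> dict:
    """Join list values into one string so the record fits a single CSV row.

    The separator is an illustrative assumption, not a dataset convention.
    """
    return {
        key: sep.join(map(str, value)) if isinstance(value, list) else value
        for key, value in rec.items()
    }


buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(record), lineterminator="\n")
writer.writeheader()
writer.writerow(flatten(record))

csv_text = buffer.getvalue()
print(csv_text)
```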

**When to use it:**

* Best for **simple tabular data**.
* Ideal if you want to quickly view and manipulate your data in Excel.
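
Outside a spreadsheet, a CSV file can be read with Python's standard-library `csv` module. The sketch below uses an in-memory `sample` as an assumed stand-in for a delivered file, and shows that every value arrives as text and must be converted explicitly.

```python
import csv
import io

# In-memory stand-in for open("data.csv", newline="", encoding="utf-8") (hypothetical file name).
sample = io.StringIO(
    "id,name,email\n"
    "1,Alice,alice@example.com\n"
    "2,Bob,bob@example.com\n"
    "3,Charlie,charlie@example.com\n"
)

rows = list(csv.DictReader(sample))   # one dict per row, keyed by the header line

raw_id = rows[0]["id"]                # CSV carries no types: this is the string "1"
ids = [int(row["id"]) for row in rows]  # convert to integers explicitly

print(type(raw_id).__name__, ids)
```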

#### **4. Parquet**

**What it looks like:**

* You **cannot open a Parquet file in a normal text editor**.
* It’s a **binary, compressed format** optimized for storage and analytics.

**Example (using Python/Pandas):**

```python
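import pandas as pd

# Requires a Parquet engine such as pyarrow or fastparquet to be installed.
df = pd.read_parquet("data.parquet")
print(df.head())  # show the first rows as a table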
```

Output (table view):

```
   id     name                email
0   1    Alice    alice@example.com
1   2      Bob      bob@example.com
2   3  Charlie  charlie@example.com
```

**Key points:**

* Stores data with proper types (numbers, strings, dates, etc.).
* Very efficient in terms of **size** and **query performance**.
* Supported by most major data tools (Snowflake, BigQuery, Spark, etc.).

**When to use it:**

* Best for **large datasets**.
* Ideal for **analytics pipelines** and **machine learning workloads**.
