# Delivery Format

We provide flexible access to our data, so your team always gets the information in the right format at the right time. You can choose between four flat-file dataset formats.

| Format                 | Pros                                                                                                                                                                                                        | Cons                                                                                                                        |
| ---------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------- |
| **JSON** | - Human-readable and widely used<br>- Nested structures supported<br>- Great for APIs and web data | - Large file sizes (not space-efficient)<br>- Slower to parse on big datasets<br>- Harder to process in batch (not line-delimited) |
| **JSONL** (JSON Lines) | - Line-delimited, easy to stream and process<br>- Works well with big data pipelines (can split files easily)<br>- Still human-readable | - Less widely adopted than plain JSON<br>- No built-in schema enforcement<br>- Slightly less convenient for nested arrays than JSON |
| **CSV** | - Very simple and widely supported<br>- Compact for tabular data<br>- Easy to open in Excel/Sheets | - No support for nested/complex data<br>- No types (everything is text)<br>- Ambiguities with delimiters, quoting, encoding |
| **Parquet** | - Highly compressed and columnar (great for analytics)<br>- Efficient for large-scale queries (only reads needed columns)<br>- Schema + data types preserved<br>- Optimized for big data tools (Spark, Hive, Snowflake) | - Not human-readable<br>- Higher overhead for small datasets<br>- More complex libraries needed to read/write |

***

### What does the data look like? <a href="#what-does-the-data-look-like" id="what-does-the-data-look-like"></a>

When you choose to export or receive data, you can select different formats. Each format represents the data in a different way. Here’s what you can expect:

#### **1. JSON (JavaScript Object Notation)**

**What it looks like:**

```json
{
  "id": 1,
  "name": "Alice",
  "email": "alice@example.com",
  "skills": ["Python", "SQL"]
}
```

**Key points:**

* A **human-readable text format** that virtually every programming language and tool can read.
* Stores data as objects with keys (`"name"`, `"email"`) and values.
* Can handle lists, nested objects, and complex structures.

**When to use it:**

* Great for structured data that needs to be read or shared easily.
* Widely used in APIs and integrations.
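
As a minimal sketch of how such a record might be consumed, the snippet below parses the sample object with Python's standard `json` module. The record is embedded as a string purely for illustration; in practice it would come from a file or an API response.

```python
import json

# The sample record shown above, embedded as a string for illustration.
raw = """
{
  "id": 1,
  "name": "Alice",
  "email": "alice@example.com",
  "skills": ["Python", "SQL"]
}
"""

record = json.loads(raw)

# Values keep their JSON types: numbers stay numbers, arrays become lists.
print(record["name"])               # Alice
print(record["skills"][0])          # Python
print(type(record["id"]).__name__)  # int
```

Nested values such as `skills` are reached with ordinary indexing, which is what makes JSON convenient for structured records.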

#### **2. JSONL (JSON Lines)**

**What it looks like:**

```json
{"id": 1, "name": "Alice", "email": "alice@example.com"}
{"id": 2, "name": "Bob", "email": "bob@example.com"}
{"id": 3, "name": "Charlie", "email": "charlie@example.com"}
```

**Key points:**

* Each **line in the file** is its own JSON object.
* Easy to process line by line, especially with large files.
* Can be read and written one record at a time, so a consumer does not need to load the whole file into memory as it would with a single large JSON array.

**When to use it:**

* Perfect for **large-scale data processing** (Spark, Elasticsearch, etc.).
* Useful for **logs, events, or streaming-style data**.
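
The line-by-line processing described above can be sketched as follows. This is an illustrative example only: the `iter_records` helper is a hypothetical name, and an in-memory `io.StringIO` stands in for a real file so the snippet runs on its own.

```python
import io
import json

# Stand-in for an open file. With a real export you would use something like
# open("data.jsonl", encoding="utf-8"), where "data.jsonl" is a hypothetical name.
jsonl_file = io.StringIO(
    '{"id": 1, "name": "Alice", "email": "alice@example.com"}\n'
    '{"id": 2, "name": "Bob", "email": "bob@example.com"}\n'
    '{"id": 3, "name": "Charlie", "email": "charlie@example.com"}\n'
)


def iter_records(lines):
    """Yield one parsed record per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# Only one record is held in memory at a time while iterating.
names = [record["name"] for record in iter_records(jsonl_file)]
print(names)  # ['Alice', 'Bob', 'Charlie']
```

Because each line is independent, a large file can also be split at line boundaries and the pieces processed in parallel.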

#### **3. CSV (Comma-Separated Values)**

**What it looks like:**

```csv
id,name,email
1,Alice,alice@example.com
2,Bob,bob@example.com
3,Charlie,charlie@example.com
```

**Key points:**

* A **simple text table**, with each row as a record and each column separated by a comma (sometimes semicolon).
* Opens easily in Excel, Google Sheets, or any spreadsheet tool.
* Doesn’t support nested or complex data; everything is flattened into rows and columns.

**When to use it:**

* Best for **simple tabular data**.
* Ideal if you want to quickly view and manipulate your data in Excel.
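
A short sketch of reading the sample above with Python's standard `csv` module. It also illustrates the "no types" limitation from the comparison table: every value arrives as a string and has to be converted explicitly. The in-memory `io.StringIO` is only a stand-in for a real file.

```python
import csv
import io

# Stand-in for an open CSV file containing the sample rows shown above.
csv_file = io.StringIO(
    "id,name,email\n"
    "1,Alice,alice@example.com\n"
    "2,Bob,bob@example.com\n"
    "3,Charlie,charlie@example.com\n"
)

# DictReader uses the header row as keys for each record.
rows = list(csv.DictReader(csv_file))

# Every value is read as text, so numeric columns need explicit conversion.
print(type(rows[0]["id"]).__name__)  # str
ids = [int(row["id"]) for row in rows]
print(ids)  # [1, 2, 3]
```

If a file uses semicolons instead of commas, passing `delimiter=";"` to `csv.DictReader` handles it.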

#### **4. Parquet**

**What it looks like:**

* You **cannot open a Parquet file in a normal text editor**.
* It’s a **binary, compressed format** optimized for storage and analytics.

**Example (using Python/Pandas):**

```python
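import pandas as pd

# Requires a Parquet engine such as pyarrow or fastparquet to be installed.
df = pd.read_parquet("data.parquet")

# Show the first rows as a table.
print(df.head())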
```

Output (table view):

```
   id     name                email
0   1    Alice    alice@example.com
1   2      Bob      bob@example.com
2   3  Charlie  charlie@example.com
```

**Key points:**

* Stores data with proper types (numbers, strings, dates, etc.).
* Very efficient in terms of **size** and **query performance**.
* Supported by most major data tools (Snowflake, BigQuery, Spark, etc.).

**When to use it:**

* Best for **large datasets**.
* Ideal for **analytics pipelines** and **machine learning workloads**.

***
