Delivery Format

We provide flexible access to our data, so your team always gets the information in the right format, at the right time. You can choose between four flat-file dataset formats.

| Format | Pros | Cons |
| --- | --- | --- |
| JSON | Human-readable and widely used; supports nested structures; great for APIs and web data | Large file sizes (not space-efficient); slower to parse on big datasets; harder to process in batches (not line-delimited) |
| JSONL (JSON Lines) | Line-delimited, easy to stream and process; works well with big data pipelines (files can be split easily); still human-readable | Less widely adopted than plain JSON; no built-in schema enforcement; slightly less convenient for nested arrays than JSON |
| CSV | Very simple and widely supported; compact for tabular data; easy to open in Excel/Sheets | No support for nested/complex data; no types (everything is text); ambiguities with delimiters, quoting, and encoding |
| Parquet | Highly compressed and columnar (great for analytics); efficient for large-scale queries (reads only the columns needed); schema and data types preserved; optimized for big data tools (Spark, Hive, Snowflake) | Not human-readable; higher overhead for small datasets; requires more complex libraries to read/write |


What does the data look like?

When you choose to export or receive data, you can select different formats. Each format represents the data in a different way. Here’s what you can expect:

1. JSON (JavaScript Object Notation)

What it looks like:

{
  "id": 1,
  "name": "Alice",
  "email": "[email protected]",
  "skills": ["Python", "SQL"]
}

Key points:

  • A human-readable text file used almost everywhere.

  • Stores data as objects with keys ("name", "email") and values.

  • Can handle lists, nested objects, and complex structures.

When to use it:

  • Great for structured data that needs to be read or shared easily.

  • Widely used in APIs and integrations.
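As a quick illustration (not part of the delivered files), the sample record above can be parsed with Python's standard `json` module; the `raw` string here simply repeats the example:

```python
import json

# The sample record shown above, as a string
raw = '''
{
  "id": 1,
  "name": "Alice",
  "email": "[email protected]",
  "skills": ["Python", "SQL"]
}
'''

record = json.loads(raw)  # parse the text into a Python dict
print(record["name"])     # keys map directly to values
print(record["skills"])   # nested arrays come through as Python lists
```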

2. JSONL (JSON Lines)

What it looks like:

{"id": 1, "name": "Alice", "email": "[email protected]"}
{"id": 2, "name": "Bob", "email": "[email protected]"}
{"id": 3, "name": "Charlie", "email": "[email protected]"}

Key points:

  • Each line in the file is its own JSON object.

  • Easy to process line by line, especially with large files.

  • Lighter and faster than a big JSON array for bulk data.

When to use it:

  • Perfect for large-scale data processing (Spark, Elasticsearch, etc.).

  • Useful for logs, events, or streaming-style data.
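To sketch the line-by-line processing mentioned above, here is a minimal Python example using the sample records; in practice you would iterate over an open file instead of the inline `jsonl_text` string used here for illustration:

```python
import json

# The three sample lines shown above, inlined for illustration
jsonl_text = (
    '{"id": 1, "name": "Alice", "email": "[email protected]"}\n'
    '{"id": 2, "name": "Bob", "email": "[email protected]"}\n'
    '{"id": 3, "name": "Charlie", "email": "[email protected]"}\n'
)

# Each line is an independent JSON object, so a file can be
# streamed record by record without loading it all into memory.
records = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
```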

3. CSV (Comma-Separated Values)

What it looks like:

id,name,email
1,Alice,[email protected]
2,Bob,[email protected]
3,Charlie,[email protected]

Key points:

  • A simple text table, with each row as a record and each column separated by a comma (sometimes semicolon).

  • Opens easily in Excel, Google Sheets, or any spreadsheet tool.

  • Doesn’t support nested or complex data—everything is flattened into rows and columns.

When to use it:

  • Best for simple tabular data.

  • Ideal if you want to quickly view and manipulate your data in Excel.
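For programmatic access, the table above can be read with Python's standard `csv` module; this sketch inlines the sample data via `io.StringIO`, where a real pipeline would open the delivered file:

```python
import csv
import io

# The sample table shown above, inlined for illustration
csv_text = """id,name,email
1,Alice,[email protected]
2,Bob,[email protected]
3,Charlie,[email protected]
"""

# DictReader maps each row to a dict keyed by the header row.
rows = list(csv.DictReader(io.StringIO(csv_text)))
# Note: every value is a string -- CSV carries no type information.
```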

4. Parquet

What it looks like:

  • You cannot open a Parquet file in a normal text editor.

  • It’s a binary, compressed format optimized for storage and analytics.

Example (using Python/Pandas):

import pandas as pd

df = pd.read_parquet("data.parquet")
print(df.head())

Output (table view):

   id     name               email
0   1    Alice   [email protected]
1   2      Bob     [email protected]
2   3  Charlie  [email protected]

Key points:

  • Stores data with proper types (numbers, strings, dates, etc.).

  • Very efficient in terms of size and query performance.

  • Supported by all major data tools (Snowflake, BigQuery, Spark, etc.).

When to use it:

  • Best for large datasets.

  • Ideal for analytics pipelines and machine learning workloads.

