Delivery Format
We provide flexible access to our data, so your team always gets the information in the right format, at the right time. You can choose between four flat-file dataset formats.
JSON
Pros:
- Human-readable and widely used
- Nested structures supported
- Great for APIs and web data

Cons:
- Large file sizes (not space-efficient)
- Slower to parse on big datasets
- Harder to process in batch (not line-delimited)
JSONL (JSON Lines)
Pros:
- Line-delimited, easy to stream and process
- Works well with big data pipelines (files can be split easily)
- Still human-readable

Cons:
- Less widely adopted than plain JSON
- No built-in schema enforcement
- Slightly less convenient for nested arrays than JSON
CSV
Pros:
- Very simple and widely supported
- Compact for tabular data
- Easy to open in Excel/Sheets

Cons:
- No support for nested/complex data
- No types (everything is text)
- Ambiguities with delimiters, quoting, encoding
Parquet
Pros:
- Highly compressed and columnar (great for analytics)
- Efficient for large-scale queries (only reads needed columns)
- Schema and data types preserved
- Optimized for big data tools (Spark, Hive, Snowflake)

Cons:
- Not human-readable
- Higher overhead for small datasets
- More complex libraries needed to read/write
What does the data look like?
When you choose to export or receive data, you can select different formats. Each format represents the data in a different way. Here’s what you can expect:
1. JSON (JavaScript Object Notation)
What it looks like:
{
"id": 1,
"name": "Alice",
"email": "[email protected]",
"skills": ["Python", "SQL"]
}
Key points:
A human-readable text file used almost everywhere.
Stores data as objects with keys ("name", "email") and values.
Can handle lists, nested objects, and complex structures.
When to use it:
Great for structured data that needs to be read or shared easily.
Widely used in APIs and integrations.
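As a minimal sketch, here is how a record like the one above can be parsed in Python with the standard-library `json` module (the field names come from the example):

```python
import json

# Parse a JSON object from text into a Python dict
record = json.loads('''
{
  "id": 1,
  "name": "Alice",
  "email": "[email protected]",
  "skills": ["Python", "SQL"]
}
''')

print(record["name"])      # keys map to dict keys
print(record["skills"])    # JSON arrays map to Python lists
```

Nested objects and lists come back as native dicts and lists, which is what makes JSON convenient for structured data.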
2. JSONL (JSON Lines)
What it looks like:
{"id": 1, "name": "Alice", "email": "[email protected]"}
{"id": 2, "name": "Bob", "email": "[email protected]"}
{"id": 3, "name": "Charlie", "email": "[email protected]"}
Key points:
Each line in the file is its own JSON object.
Easy to process line by line, especially with large files.
Lighter and faster than a big JSON array for bulk data.
When to use it:
Perfect for large-scale data processing (Spark, Elasticsearch, etc.).
Useful for logs, events, or streaming-style data.
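To illustrate line-by-line processing, this sketch streams the example JSONL payload one record at a time (an in-memory string stands in for a file here):

```python
import io
import json

# JSONL payload: one complete JSON object per line
jsonl_data = (
    '{"id": 1, "name": "Alice", "email": "[email protected]"}\n'
    '{"id": 2, "name": "Bob", "email": "[email protected]"}\n'
    '{"id": 3, "name": "Charlie", "email": "[email protected]"}\n'
)

# Iterate line by line -- no need to load the whole file into memory
records = []
for line in io.StringIO(jsonl_data):
    records.append(json.loads(line))

print(len(records))  # 3
```

Because each line is independent, the same loop works on files of any size, and a large file can be split at any newline and processed in parallel.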
3. CSV (Comma-Separated Values)
What it looks like:
id,name,email
1,Alice,[email protected]
2,Bob,[email protected]
3,Charlie,[email protected]
Key points:
A simple text table, with each row as a record and each column separated by a comma (sometimes semicolon).
Opens easily in Excel, Google Sheets, or any spreadsheet tool.
Doesn’t support nested or complex data—everything is flattened into rows and columns.
When to use it:
Best for simple tabular data.
Ideal if you want to quickly view and manipulate your data in Excel.
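The standard-library `csv` module reads the example table directly; note in the sketch below that every value, including `id`, comes back as a string, since CSV carries no type information:

```python
import csv
import io

# The CSV example from above, as an in-memory string
csv_data = (
    "id,name,email\n"
    "1,Alice,[email protected]\n"
    "2,Bob,[email protected]\n"
    "3,Charlie,[email protected]\n"
)

# DictReader maps each row to a dict keyed by the header row
rows = list(csv.DictReader(io.StringIO(csv_data)))

print(rows[0]["name"])     # Alice
print(type(rows[0]["id"])) # <class 'str'> -- CSV has no types
```

If you need numeric types or dates, you must convert the strings yourself after reading.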
4. Parquet
What it looks like:
You cannot open a Parquet file in a normal text editor.
It’s a binary, compressed format optimized for storage and analytics.
Example (using Python/Pandas):
import pandas as pd

# Read the Parquet file into a DataFrame; column types are restored automatically
df = pd.read_parquet("data.parquet")
print(df.head())
Output (table view):
id name email
0 1 Alice [email protected]
1 2 Bob [email protected]
2 3 Charlie [email protected]
Key points:
Stores data with proper types (numbers, strings, dates, etc.).
Very efficient in terms of size and query performance.
Supported by all major data tools (Snowflake, BigQuery, Spark, etc.).
When to use it:
Best for large datasets.
Ideal for analytics pipelines and machine learning workloads.