Why would I want to use Parquet?
We at Columns love the Parquet file format, and we think it should replace CSV, Excel, and similar formats as the industry standard for saving datasets.
This is because it offers:
- Efficient Storage: Parquet uses columnar storage, which means it stores data by column instead of by row as CSV does. Because all values in a column share one type and are often similar, encodings like dictionary and run-length encoding compress them well, substantially reducing storage requirements for large datasets.
- Fast Query Performance: The columnar layout of Parquet files enables faster queries and analysis than row-oriented formats allow. A reader fetches only the columns a query needs rather than scanning every full row, which can significantly improve query speed (see the first sketch after this list).
- Support for Large Datasets: Parquet is designed to handle large datasets efficiently. It can scale to billions of rows without significant performance degradation, making it well-suited for big data applications.
- Schema Evolution: Parquet supports schema evolution: new files in a dataset can add or drop columns without rewriting the files already written, and readers can reconcile the old and new schemas at query time (see the schema-evolution sketch after this list). This provides flexibility and easier data management over time.
- Interoperability: Parquet is supported by a wide range of data processing and analytics tools, including Apache Spark, Apache Hive, and Apache Impala. This makes it easy to slot Parquet into your existing data pipelines and workflows.
- Partitioning and Indexing: Parquet datasets can be partitioned by column values, and each file carries min/max statistics per row group that act as a lightweight index. Query engines use both to skip data that cannot match a filter, which can significantly improve query performance on large datasets (see the partitioning sketch after this list).
- Compatibility with Cloud Storage: Parquet files are well-suited to cloud object stores like Amazon S3, Google Cloud Storage, and Azure Blob Storage, which makes the format a great choice for cloud-based data processing and analytics (see the final sketch after this list).
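
To make the columnar-read point concrete, here is a minimal sketch using pandas with pyarrow installed as the Parquet engine. The DataFrame contents and the file name `events.parquet` are hypothetical; any columns work the same way.

```python
import pandas as pd

# Hypothetical example data; any DataFrame works the same way.
df = pd.DataFrame({
    "user_id": range(100_000),
    "country": ["US", "DE"] * 50_000,
    "revenue": [9.99, 4.99] * 50_000,
})

# Write to Parquet (pandas delegates to pyarrow when it is installed).
df.to_parquet("events.parquet", index=False)

# Read back only the columns a query needs; the other columns are
# never deserialized, unlike a CSV, which must be parsed row by row.
subset = pd.read_parquet("events.parquet", columns=["country", "revenue"])
print(subset.head())
```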
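
Schema evolution in practice: a sketch with pyarrow, assuming two hypothetical files where the newer one adds a `score` column. `pa.unify_schemas` merges the two schemas, and rows from the older file come back with nulls in the new column.

```python
import pyarrow as pa
import pyarrow.dataset as ds
import pyarrow.parquet as pq

# Hypothetical v1 file, written before the schema changed.
pq.write_table(pa.table({"id": [1, 2], "name": ["a", "b"]}), "v1.parquet")

# A later v2 file adds a "score" column; v1 stays untouched on disk.
pq.write_table(pa.table({"id": [3], "name": ["c"], "score": [0.5]}),
               "v2.parquet")

# Merge the two schemas and read both files as one dataset;
# rows from v1.parquet get null in the "score" column.
unified = pa.unify_schemas(
    [pq.read_schema(p) for p in ("v1.parquet", "v2.parquet")]
)
table = ds.dataset(["v1.parquet", "v2.parquet"], schema=unified).to_table()
print(table.to_pandas())
```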
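
And the partitioning point: a sketch with pyarrow that writes a Hive-style partitioned dataset under a hypothetical `events/` directory, then filters on the partition column so files for other countries are never opened.

```python
import pyarrow as pa
import pyarrow.dataset as ds
import pyarrow.parquet as pq

table = pa.table({
    "country": ["US", "US", "DE", "DE"],
    "revenue": [9.99, 4.99, 7.49, 1.99],
})

# One subdirectory per distinct "country" value
# (Hive-style layout: events/country=US/..., events/country=DE/...).
pq.write_to_dataset(table, root_path="events", partition_cols=["country"])

# A filter on the partition column prunes whole directories up front.
dataset = ds.dataset("events", partitioning="hive")
us_only = dataset.to_table(filter=ds.field("country") == "US")
print(us_only.to_pandas())
```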
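
Finally, reading from cloud storage looks the same as reading a local file. A sketch assuming the s3fs package is installed so pandas can resolve `s3://` URLs; the bucket and key here are hypothetical.

```python
import pandas as pd

# Hypothetical bucket and key; s3fs must be installed for pandas to
# read s3:// paths. Column pruning applies over the network too, so
# only the requested column chunks need to be fetched.
df = pd.read_parquet(
    "s3://my-bucket/events/part-0000.parquet",
    columns=["country", "revenue"],
)
```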