Python Data Structures

Python Data Structures: Choosing the Right Container for Your Data

Every Python program handles data — but how you store that data matters enormously. Python gives you five essential containers: lists, dictionaries, sets, generators, and tuples. Knowing when to reach for each one is what separates clean, efficient code from messy, slow code.

List — your everyday workhorse

The list is Python’s most flexible container. It holds an ordered sequence of items, allows duplicates, and lets you freely add, remove, or reorder elements whenever you like. Think of it as a dynamic array that grows as needed.

You’ll use lists for collecting user input, storing search results, building queues, or any time the order of items actually matters to you.

fruits = ["mango", "banana", "lychee"] fruits.append("guava") # add to the end fruits[1] = "jackfruit" # replace by index print(fruits[0]) # → mango

Lists preserve insertion order. If you need to keep track of when something was added, a list is your friend.

Dictionary — look things up by name

Dictionaries store data as key-value pairs. Instead of accessing items by position like a list, you access them by a meaningful label — a key. This makes lookups fast regardless of how much data you have stored.

Dictionaries shine in real-world apps: storing user profiles, configuration settings, API responses, or any structured record where fields have names.

student = {"name": "Rahim", "age": 20, "gpa": 3.8} print(student["name"]) # → Rahim student["age"] = 21 # update a value student["city"] = "Dhaka" # add a new key

Keys must be unique — adding the same key twice just updates its value. Keys are also immutable, so strings and numbers work; lists do not.

Set — when uniqueness is the point

A set is an unordered collection that automatically discards duplicates. There’s no index, no guarantee of order — just a bag of distinct values. That simplicity is precisely what makes it so useful for deduplication and membership checks.

Common use cases: finding unique visitors, comparing two datasets, removing repeated entries from a list in one line.

tags = {"python", "data", "python", "code"} print(tags) # → {"python", "data", "code"} a = {1, 2, 3} b = {2, 3, 4} print(a & b) # intersection → {2, 3}

Heads up: {} creates an empty dict, not an empty set. Use set() for that. It’s a common gotcha for beginners.

Generator — laziness as a superpower

A generator produces values one at a time, only when asked. It doesn’t compute and store everything upfront — it generates values on demand. For huge sequences, this saves significant memory compared to a list.

Generators are ideal when working with large files, infinite sequences, or data pipelines where you process items one by one.

# list builds everything in memory at once squares_list = [x**2 for x in range(1_000_000)] # generator computes each value only when needed squares_gen = (x**2 for x in range(1_000_000)) print(next(squares_gen)) # → 0 print(next(squares_gen)) # → 1

Generators are single-use — once exhausted, you can’t rewind. If you need to iterate more than once, use a list instead.

Tuple — when data should stay frozen

A tuple is like a list that you’ve sealed shut. Once created, its contents cannot be changed. That immutability isn’t a limitation — it’s a deliberate signal to anyone reading your code that this data is not meant to change.

Use tuples for things like coordinates, RGB colors, function return values with multiple components, or database rows. They’re also slightly faster than lists and can be used as dictionary keys.

location = (23.8103, 90.4125) # Dhaka coordinates red = (255, 0, 0) # RGB color # tuples can be used as dict keys; lists cannot distances = {(0, 0): "origin"}

Quick comparison

Structure	Ordered	Mutable	Duplicates	Best for
`list`	yes	yes	yes	General sequences
`dict`	yes	yes	Keys: no	Named lookups
`set`	no	yes	no	Unique items
`generator`	one-pass	no	yes	Large / lazy data
`tuple`	yes	no	yes	Fixed records

The takeaway

There’s no single “best” structure — each one exists for a reason. Reach for a list by default. Switch to a dict when items have names. Use a set when uniqueness matters. Choose a tuple to signal that data is fixed. And lean on a generator whenever memory is a concern. Picking the right container from the start makes your code faster, cleaner, and easier to reason about.