Essential CLI Tools for Shell Scripting and Automation
Beyond the text-processing trinity of grep, sed, and awk, Bash scripts regularly lean on a whole ecosystem of small, focused command-line utilities. Each one does one thing well. Here's a tour of the essential ones, with practical examples of where they actually show up.
find: The File Locator
find is far more powerful than most people realize. Most folks use it for find . -name "*.txt" and stop there — but it can traverse directory trees filtering by type, size, modification time, permissions, and more, and then execute commands on whatever it finds:
# Basic usage
find /path -name "*.txt" # Find .txt files
find . -type f # Regular files only
find . -type d # Directories only
find . -name "*.log" -mtime -7 # .log files modified in last 7 days
find . -size +10M # Files larger than 10MB
find . -name "*.tmp" -delete # Find and delete temp files
find . -perm 644 # Files with exactly these permissions
# Execute a command on each found file
find . -name "*.jpg" -exec convert {} {}.webp \;
# Use with xargs for efficiency
find . -name "*.log" | xargs grep "ERROR"
# Null-delimited for filenames with spaces
find . -name "*.txt" -print0 | xargs -0 wc -l
# Find files modified in the last 24 hours
find . -type f -mtime -1
# Find files modified since a specific date (GNU find)
find . -type f -newermt '2024-01-01'
# Find empty files and directories
find . -empty
# Find files and print with details
find . -name "*.py" -exec ls -lh {} \;
The -print0 and xargs -0 pair is the safe way to handle filenames that contain spaces or special characters. -print0 separates results with null bytes instead of newlines, and xargs -0 reads null-separated input. It's a bit more verbose, but you'll be glad you reached for it the first time a filename with a space would otherwise have broken your script.
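A quick sandbox run makes the point concrete (the scratch directory and filenames here are made up for illustration):

```shell
# Create a scratch directory containing a filename with a space
dir=$(mktemp -d)
printf 'one\n' > "$dir/plain.txt"
printf 'two\nthree\n' > "$dir/with space.txt"

# Newline-delimited output would split "with space.txt" into two bogus
# arguments; the null-delimited pair passes it through intact:
find "$dir" -name "*.txt" -print0 | xargs -0 wc -l

rm -rf "$dir"
```

With the plain `find | xargs` form, wc would complain that `./with` and `space.txt` don't exist; the null-delimited version counts both files correctly.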
xargs: Turning stdin into Arguments
xargs reads items from stdin and passes them as arguments to another command. Without it, you'd need a full loop to process each item individually. With it, you can do the same thing in a one-liner:
# Delete all .tmp files
find . -name "*.tmp" | xargs rm
# Process in batches of 5
echo "a b c d e f g h i j" | xargs -n 5 echo
# Run commands in parallel (4 processes at once)
find . -name "*.png" -print0 | xargs -0 -P 4 -I {} convert {} {}.jpg
# Build commands with placeholder
cat urls.txt | xargs -I {} curl -O {}
# Preview what xargs would do (add echo)
find . -name "*.log" | xargs echo rm
-P 4 runs 4 processes in parallel — genuinely useful when you're doing something like bulk image conversion. -I {} lets you specify exactly where in the command the argument should go, rather than always appending it at the end. And always pair -print0 with find and -0 with xargs when filenames are involved.
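The -n batching behavior is easiest to see with a toy input (the numbers are arbitrary sample data):

```shell
# -n 3 splits ten items into argument groups of at most three,
# so echo runs four times, once per group:
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 | xargs -n 3 echo
# 1 2 3
# 4 5 6
# 7 8 9
# 10

# -I {} substitutes each item into the command at the placeholder:
printf '%s\n' alpha beta | xargs -I {} echo "item: {}"
# item: alpha
# item: beta
```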
sort and uniq: Ordering and De-duplicating
sort file.txt # Alphabetical sort
sort -n numbers.txt # Numeric sort
sort -rn numbers.txt # Reverse numeric sort
sort -k2 data.txt # Sort by second field
sort -k2 -k3n data.txt # Sort by second field, then third numerically
sort -t',' -k2 data.csv # Sort CSV by second column
sort -u file.txt # Sort and remove duplicates
uniq file.txt # Remove consecutive duplicate lines (sort first to catch all duplicates)
uniq -c file.txt # Count consecutive occurrences
uniq -d file.txt # Show only duplicates
uniq -u file.txt # Show only unique lines
The canonical "frequency counter" pattern comes up constantly:
cat data.txt | sort | uniq -c | sort -rn
It works because sort groups identical lines together, uniq -c counts consecutive duplicates, and the second sort -rn orders by frequency, highest first. Once you've used this pattern a few times, it becomes muscle memory.
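Here's the pattern run against a small made-up word list, so you can see each stage's contribution:

```shell
# Sample input: one word per line (illustrative data)
tmp=$(mktemp)
printf '%s\n' apple banana apple cherry apple banana > "$tmp"

sort "$tmp" | uniq -c | sort -rn
#   3 apple
#   2 banana
#   1 cherry

rm -f "$tmp"
```

The first sort puts the three "apple" lines next to each other, uniq -c collapses each run into a count, and sort -rn puts the biggest count on top.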
cut: Extracting Fields
cut extracts specific fields or character ranges from text — simpler than awk when you just need to grab a column:
# Extract fields by delimiter
cut -d: -f1 /etc/passwd # First field from colon-separated file
cut -d, -f1,3 data.csv # Fields 1 and 3 from CSV
cut -d: -f1-3 /etc/passwd # Fields 1 through 3
# Extract by character position
cut -c1-10 file.txt # First 10 characters of each line
cut -c-5 file.txt # Characters up to position 5
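Both modes on one sample record (the CSV line is invented for illustration):

```shell
line="alice,30,engineer"

echo "$line" | cut -d, -f1    # alice     (first comma-separated field)
echo "$line" | cut -d, -f3    # engineer  (third field)
echo "$line" | cut -c1-5      # alice     (first five characters, delimiter-blind)
```

Note the difference: -f counts delimiter-separated fields, while -c counts raw character positions and ignores delimiters entirely.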
tr: Character Translation
tr translates or deletes characters in a stream — character by character, no regex needed:
echo "hello world" | tr 'a-z' 'A-Z' # Uppercase
echo "hello world" | tr -d 'aeiou' # Delete vowels
echo "hello world" | tr -s ' ' # Squeeze repeated spaces
echo "one:two:three" | tr ':' '\n' # Replace colons with newlines
cat file.txt | tr -d '\r' # Remove Windows carriage returns
That last one, removing Windows carriage returns, is something you'll need more often than you'd expect. Files transferred from Windows have \r\n line endings; Linux tools expect just \n. Strange characters at the end of lines, mysterious command failures, variables with garbage values that shouldn't have garbage values — often a tr -d '\r' fixes the whole thing.
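You can reproduce the symptom and the fix in a few lines (the file contents are a stand-in for a real Windows-edited file):

```shell
# Simulate a file with Windows \r\n line endings
tmp=$(mktemp)
printf 'first\r\nsecond\r\n' > "$tmp"

# The invisible \r survives into the variable, so the comparison fails:
line=$(head -1 "$tmp")
[ "$line" = "first" ] || echo "mismatch: line contains a hidden \\r"

# Stripping carriage returns fixes it:
line=$(head -1 "$tmp" | tr -d '\r')
[ "$line" = "first" ] && echo "match after tr -d"

rm -f "$tmp"
```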
head and tail: Top and Bottom of Files
head -5 file.txt # First 5 lines
tail -5 file.txt # Last 5 lines
tail -f /var/log/syslog # Follow a file (watch for new lines)
tail -f app.log | grep "ERROR" # Watch log in real-time, filter errors
head -1 file.txt # Just the first line (e.g., CSV header)
tail -n +2 file.txt # Everything except the first line
tail -f is an essential debugging tool — it keeps reading and displaying new content as it's appended to a file. You'll use it constantly when watching server logs during a deployment or investigating a running process.
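head and tail also compose: piping one into the other extracts an arbitrary line or range from the middle of a file. A small sketch with made-up contents:

```shell
tmp=$(mktemp)
printf '%s\n' one two three four five > "$tmp"

# Line 3 only: take the first three lines, then the last of those
head -3 "$tmp" | tail -1      # three

# Lines 2 through 4: first four lines, then the last three of those
head -4 "$tmp" | tail -3      # two, three, four

rm -f "$tmp"
```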
date: Time Handling in Scripts
date # Current date and time
date +%Y-%m-%d # Just the date (2024-01-15)
date +%Y%m%d_%H%M%S # Compact timestamp (20240115_143022)
date +%s # Unix timestamp (seconds since epoch)
date -d "2024-01-01" # Parse a date string (GNU date)
date -d "yesterday" # Yesterday's date
date -d "+7 days" # One week from now
Timestamps are incredibly useful in scripts for naming log files and backups:
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
backup_file="backup_${TIMESTAMP}.tar.gz"
Every backup file gets a unique name automatically. Simple and effective.
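The conversion also runs in reverse: GNU date can turn epoch seconds back into a readable date with the @ prefix (on macOS/BSD the equivalent is date -r). A quick sanity check against the epoch itself:

```shell
# Epoch seconds -> readable UTC timestamp (GNU date syntax)
date -u -d "@0" +%Y-%m-%dT%H:%M:%SZ    # 1970-01-01T00:00:00Z

# Round-trip the current time through an epoch value
now=$(date +%s)
date -u -d "@$now" +%Y-%m-%d           # today's UTC date
```

The %Y%m%d_%H%M%S format used for backup names has another nice property: because it goes from biggest unit to smallest, a plain alphabetical sort of the filenames is also a chronological sort.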
wc: Counting Things
wc -l file.txt # Count lines
wc -w file.txt # Count words
wc -c file.txt # Count bytes
wc -m file.txt # Count characters
# Practical: count lines in script output
ls | wc -l # How many files are here?
cat /etc/passwd | wc -l # How many user accounts?
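One wrinkle worth knowing when you capture a count in a variable: wc -l file prints the filename after the number, but reading from stdin prints the number alone. A small sketch with throwaway data:

```shell
tmp=$(mktemp)
printf '%s\n' a b c > "$tmp"

wc -l "$tmp"        # prints the count AND the filename
wc -l < "$tmp"      # prints just the count: 3

# The redirection form is what you want inside $(...):
count=$(wc -l < "$tmp")
echo "lines: $count"

rm -f "$tmp"
```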
curl and jq: HTTP and JSON on the Command Line
curl is the command-line HTTP client, and jq is the command-line JSON processor. Together, they let you interact with APIs directly from your scripts:
# Basic HTTP GET
curl https://api.example.com/data
# POST with JSON body
curl -X POST \
-H "Content-Type: application/json" \
-d '{"key": "value"}' \
https://api.example.com/endpoint
# Follow redirects, save to file
curl -L -o output.html https://example.com
# Download with progress bar
curl -# -O https://example.com/largefile.zip
# With authentication
curl -u username:password https://api.example.com/protected
# Check if URL is reachable (silent, just exit code)
curl -s -f https://api.example.com/health > /dev/null
jq for processing JSON:
# Pretty-print JSON
curl https://api.example.com/data | jq '.'
# Extract a field
curl https://api.example.com/user/1 | jq '.name'
# Filter an array
curl https://api.example.com/users | jq '.[] | select(.active == true) | .email'
# Create a new JSON object
jq -n '{name: "Alice", age: 30}'
# Use in a script
name=$(curl -s https://api.example.com/user/1 | jq -r '.name')
echo "User: $name"
The -r flag in jq outputs "raw" strings without JSON quotes — essential when you're using jq output in shell variables. Without it, your variable ends up with literal quote characters in it, which causes all kinds of subtle breakage downstream.
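A side-by-side run shows the difference; the JSON payload here is made up so the example works offline:

```shell
payload='{"name": "Alice", "age": 30}'

echo "$payload" | jq '.name'      # "Alice"  (JSON-encoded, with quotes)
echo "$payload" | jq -r '.name'   # Alice    (raw string, safe for variables)

name=$(echo "$payload" | jq -r '.name')
echo "User: $name"                # User: Alice
```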