Do Not Use '>' in Your Command Prompt (and How to Stay Safe in Shell)

2021-03-16

Over years of troubleshooting performance problems in the Unix/Linux world, I have seen multiple cases where a regularly used command line tool on a customer server just stops working for some reason. The tool returns immediately, doing absolutely nothing: no output printed, no coredumps, and an exit code of zero (success!).

This article walks you through a couple of such incidents and in the end I explain how I avoid accidentally doing bad stuff in production in general.

  1. The mysterious case of a broken application binary
  2. The mysterious case of a broken OS binary
  3. How to avoid the file clobbering problem?
  4. How to stay safe in shell?
  5. Limiting damage when accidentally pasting / typing in bad stuff
  6. Reducing the chance of accidentally pasting / typing in bad stuff
  7. Summary

The mysterious case of a broken application binary

Here’s a (manually reproduced) example of such a problem on a Linux server. The expdp command is Oracle Database’s high-speed data export tool, but this can happen to any file. Normally its output looks like this:

oracle@oel7l bin> pwd
/u01/app/oracle/product/18.0.0/dbhome_1/bin
oracle@oel7l bin> 
oracle@oel7l bin> expdp help=y

Export: Release 18.0.0.0.0 - Production on Tue Mar 16 17:55:35 2021
Version 18.3.0.0.0

Copyright (c) 1982, 2018, Oracle and/or its affiliates.  All rights reserved.

The Data Pump export utility provides a mechanism for transferring data objects
between Oracle databases. The utility is invoked with the following command:

   Example: expdp scott/tiger DIRECTORY=dmpdir DUMPFILE=scott.dmp

... lots of output removed...

However, when a customer tried to run that command in their environment one morning, this happened:

oracle@oel7l bin> expdp help=y
oracle@oel7l bin> 
oracle@oel7l bin> echo $?
0

The expdp command had just stopped working overnight! It immediately returned, doing nothing, yet there were no error messages and even the shell command return code $? showed 0 - success.

Time to dig deeper! Let’s make sure that we are trying to execute the right command in its correct location:

oracle@oel7l bin> pwd
/u01/app/oracle/product/18.0.0/dbhome_1/bin
oracle@oel7l bin> which expdp
/u01/app/oracle/product/18.0.0/dbhome_1/bin/expdp
oracle@oel7l bin> 

All looks correct so far, let’s take a look into the binary itself:

oracle@oel7l bin> file expdp
expdp: empty
oracle@oel7l bin> ls -l expdp 
-rwxr-x--x. 1 oracle oinstall 0 Mar 16 17:55 expdp

Boom! Something has truncated the file to zero bytes! The listing also shows the file’s last modification time, which may give extra clues about what or who did it (for example, was someone manually patching or releasing database software at that time?).
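A check like this can be widened from a single file to a whole directory. The following is only a minimal sketch and not something taken from the incident: the function name find_empty_executables and the scratch files are made up for illustration, and it relies on nothing beyond standard find options.

```shell
# Hypothetical helper: list regular files under directory $1 that are
# executable by their owner and yet zero bytes long.
find_empty_executables() {
    find "$1" -type f -perm -100 -size 0
}

# Demonstration in a throwaway directory (all names here are illustrative).
demo_dir=$(mktemp -d)
: > "$demo_dir/expdp"                      # empty and executable: suspicious
printf 'payload\n' > "$demo_dir/impdp"     # non-empty and executable: fine
: > "$demo_dir/notes.txt"                  # empty but not executable: ignored
chmod 755 "$demo_dir/expdp" "$demo_dir/impdp"
chmod 644 "$demo_dir/notes.txt"

empty_found=$(find_empty_executables "$demo_dir")
echo "$empty_found"
```

On a real server the function would be pointed at a directory such as the bin directory shown above, instead of the scratch directory.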

An easy quick check is to look at the shell history files (like .bash_history in users’ home directories) on that server. I’ll query my own user’s history with fc -l:

oracle@oel7l bin> fc -l
1044   rm ls
1045   cd bin/
1046   pwd
1047   expdp help=y
1048   oracle@oel7l bin> pwd
1049   /u01/app/oracle/product/18.0.0/dbhome_1/bin
1050   oracle@oel7l bin>
1051   oracle@oel7l bin> expdp help=y
1052   Export: Release 18.0.0.0.0 - Production on Tue Mar 16 17:55:35 2021
1053   Version 18.3.0.0.0
1054   Copyright (c) 1982, 2018, Oracle and/or its affiliates.  All rights reserved.
1055   expdp help=y
1056   which expdp
1057   file expdp
1058   ls -l expdp 
1059   pwd
oracle@oel7l bin> 

Whoa, entries 1048 to 1054 don’t look like proper shell commands at all! They look as if someone had accidentally pasted random junk from their terminal screen back into the shell as commands!

Now, all of these junk “commands” would have just errored out, as they are random terminal output and not valid shell commands, right? For example, trying to execute this command would give you an error…

$ oracle@oel7l bin
-bash: oracle@oel7l: command not found

… BUT, some of our commands have “>” signs in them too!

$ oracle@oel7l bin> expdp help=y

The above command itself did not succeed because of the “command not found” shell error mentioned earlier, but the shell sets up the redirection before it tries to run the command, so the output of the failed command (zero bytes) is still written into whatever file name comes after the “>” redirection character. If you happen to be in your home directory or ‘/tmp’, you just accidentally create a new empty file called expdp there. However, if you happen to be in your application binary directory (like I was in this case), or you used the full executable path in your previous commands, you end up clobbering the existing binary: the file is truncated to zero bytes. From then on, your shell happily executes the zero-byte “shell script” and returns success for doing nothing.
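The mechanism can be reproduced safely outside any real binary directory. This is a minimal sketch under stated assumptions: the scratch directory and the stand-in file contents are made up, and the pasted “prompt” is the one from this article.

```shell
# Throwaway directory so that nothing real gets truncated.
workdir=$(mktemp -d)

# Stand-in for the real binary: any non-empty file will do.
printf 'pretend this is the expdp binary\n' > "$workdir/expdp"
if [ -s "$workdir/expdp" ]; then state_before="non-empty"; else state_before="empty"; fi

# Simulate pasting a prompt line back into the shell. The word before ">"
# is not a real command, but the shell still opens (and truncates) the
# redirection target while setting the command up.
oracle@oel7l bin > "$workdir/expdp" help=y 2>/dev/null || true

if [ -s "$workdir/expdp" ]; then state_after="non-empty"; else state_after="empty"; fi
echo "before: $state_before, after: $state_after"
```

The “|| true” only keeps the sketch going after the expected “command not found” failure; the damage to the file has already happened by then.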

From the terminal scroll-back of my reproduction, you can see me accidentally pasting lots of junk from the screen back as commands - and not all of it is harmless:

oracle@oel7l bin> 
oracle@oel7l bin> 
oracle@oel7l bin> oracle@oel7l bin> pwd
-bash: oracle@oel7l: command not found
oracle@oel7l bin> /u01/app/oracle/product/18.0.0/dbhome_1/bin
-bash: /u01/app/oracle/product/18.0.0/dbhome_1/bin: Is a directory
oracle@oel7l bin> oracle@oel7l bin> 
-bash: syntax error near unexpected token `newline'
oracle@oel7l bin> oracle@oel7l bin> 
-bash: syntax error near unexpected token `newline'
oracle@oel7l bin> oracle@oel7l bin> expdp help=y
-bash: oracle@oel7l: command not found
oracle@oel7l bin> 
oracle@oel7l bin> Export: Release 18.0.0.0.0 - Production on Tue Mar 16 17:55:35 2021
-bash: Export:: command not found
oracle@oel7l bin> Version 18.3.0.0.0
-bash: Version: command not found
oracle@oel7l bin> 
oracle@oel7l bin> Copyright (c) 1982, 2018, Oracle and/or its affiliates.  All rights reserved.
-bash: syntax error near unexpected token `c'
oracle@oel7l bin> 

It can actually get even worse! :-)

The mysterious case of a broken OS binary

In one system I once looked at, someone had managed to truncate /bin/ls as root! It started as an exciting “OMG did they delete the entire filesystem?!” exercise:

root@oel7l bin> ls /home
root@oel7l bin> 

Ok, the home dir is gone? What about root?

root@oel7l bin> ls /
root@oel7l bin> 

What, the root directory is gone too? How could I even log in if root is gone? Is someone still deleting stuff? Did we get hacked?!

Luckily you don’t have to rely only on ls to list file and directory names. Besides find, you can use the shell’s built-in wildcard expansion:

root@oel7l bin> echo /*
/bin /boot /dev /etc /home /lib /lib64 /media /mnt /opt /proc /root /run /sbin /srv /sys /tmp /u01 /u02 /u03 /u04 /usr /var
root@oel7l bin>

The files are still there! We can further examine file metadata with file or stat commands (and the glob wildcard expansion is supported):

root@oel7l bin> file /
/: directory
root@oel7l bin> 
root@oel7l bin> stat /
  File: ‘/’
  Size: 4096        Blocks: 8          IO Block: 4096   directory
Device: fc00h/64512d  Inode: 128         Links: 21
Access: (0555/dr-xr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Context: system_u:object_r:root_t:s0
Access: 2021-03-16 19:14:49.953914433 -0400
Modify: 2018-09-14 19:01:29.695056064 -0400
Change: 2018-09-14 19:01:29.695056064 -0400
 Birth: -

We even see the last modification timestamp with the stat command (ls gets its info from the same place as stat under the hood).

So, there’s something wrong specifically with the ls binary. The next logical command to run is which ls. It shows in which directory of your PATH it found a file with that name that is accessible to you and has an “x” bit set. It is less well known, but which also shows if someone has aliased your ls command to shutdown as a prank.

Let’s see where my ls binary is:

root@oel7l bin> which ls
/usr/bin/ls

Hmm, I had always thought that ls was in /bin not /usr/bin. Let’s check whether the /bin directory is just a symlink:

root@oel7l bin> ls -ld /bin
root@oel7l bin> 

Oops, I had already forgotten that our ls command is broken!

root@oel7l bin> file /bin
/bin: symbolic link to `usr/bin'
root@oel7l bin> 
root@oel7l bin> stat /bin
  File: ‘/bin’ -> ‘usr/bin’
  Size: 7           Blocks: 0          IO Block: 4096   symbolic link
Device: fc00h/64512d  Inode: 773         Links: 1
Access: (0777/lrwxrwxrwx)  Uid: (    0/    root)   Gid: (    0/    root)
Context: system_u:object_r:bin_t:s0
Access: 2021-03-16 17:38:36.389289020 -0400
Modify: 2018-08-19 11:29:23.860005567 -0400
Change: 2018-08-19 11:29:23.860005567 -0400
 Birth: -

Both file and stat can tell us whether we are dealing with a link and where it points to. Using the echo * pattern trick, I can see which other ls* files we still have in /usr/bin:

root@oel7l bin> echo /usr/bin/ls*
/usr/bin/ls /usr/bin/lsattr /usr/bin/lsblk /usr/bin/lscpu /usr/bin/lsinitrd /usr/bin/lsipc /usr/bin/lslocks /usr/bin/lslogins /usr/bin/lsmem /usr/bin/lsns /usr/bin/lsscsi

root@oel7l bin> file /usr/bin/ls
/usr/bin/ls: empty

Indeed, the /usr/bin/ls file has been truncated to zero bytes by someone!

root@oel7l bin> stat /usr/bin/ls
  File: ‘/usr/bin/ls’
  Size: 0           Blocks: 0          IO Block: 4096   regular empty file
Device: fc00h/64512d  Inode: 104356      Links: 1
Access: (0755/-rwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Context: system_u:object_r:bin_t:s0
Access: 2021-03-16 17:57:58.944003009 -0400
Modify: 2021-03-16 17:57:54.594064500 -0400
Change: 2021-03-16 17:57:54.594064500 -0400
 Birth: -

The file’s last modification date may give me extra insight into what/who might have accidentally overwritten the file. The .bash_history shows a similar pattern of “someone” pasting in junk from the terminal screen:

root@oel7l bin> fc -l
1005   ls /var/run
1006   ls /var/log
1007   ls /var/local
1008   root@oel7l bin> ls /var/local
1009   root@oel7l bin> 
1010   ls /var/local
1011   ls /
1012   which ls
1013   pwd
1014   ls -ld /bin
root@oel7l bin> 

You would now need to restore your ls binary or reinstall it from the install package. Or, if not having a working ls command drives you nuts while doing the restore operation, you could create a shell wildcard expansion-based temporary shell script or even an alias that just uses echo * :-)

root@oel7l bin> alias ls=echo
root@oel7l bin> ls /*
/bin /boot /dev /etc /home /lib /lib64 /media /mnt /opt /proc /root /run /sbin /srv /sys /tmp /u01 /u02 /u03 /u04 /usr /var
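If the one-line echo output gets hard to read, the same wildcard trick can be wrapped in a small function. The sketch below is an assumption-laden stand-in, not a replacement for ls: the name ls_fallback and its helper variables are hypothetical, it prints names only, and like plain ls it skips dot files.

```shell
# Hypothetical stand-in for a broken ls: print the entries of directory $1
# (default: current directory), one per line, using only shell builtins.
ls_fallback() {
    lsf_dir=${1:-.}
    for lsf_entry in "$lsf_dir"/*; do
        # An unmatched glob stays as a literal "*", so skip non-existent names.
        [ -e "$lsf_entry" ] || [ -L "$lsf_entry" ] || continue
        printf '%s\n' "${lsf_entry##*/}"
    done
}

# Demonstration in a scratch directory (names are illustrative).
listing_dir=$(mktemp -d)
: > "$listing_dir/alpha"
mkdir "$listing_dir/beta"
listing=$(ls_fallback "$listing_dir")
echo "$listing"
```

Because it uses only globbing and printf, it keeps working while the real binary is being restored.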

How to avoid the file clobbering problem?

In bash you can just set -o noclobber! This tells the shell not to clobber (overwrite the contents of) existing files when you use the > redirection operator.

Let’s check the current value, create a test file and enable noclobber, to see how it helps:

$ set -o | grep clob
noclobber       off
$ 
$ echo hello > a
$ 
$ cat a
hello
$ 
$ set -o noclobber
$ set -o | grep clob
noclobber       on
$ 

Ok, now let’s try to overwrite that file with redirection:

$ echo hello > a
-bash: a: cannot overwrite existing file
$

Nice! Bash doesn’t allow overwriting the file a. Similarly, accidentally pasting in bad terminal output doesn’t overwrite the file:

$ badprompt> a
-bash: a: cannot overwrite existing file
$ 

However, the noclobber option is not very fool-proof, if your goal is to avoid accidental file modifications. For example, bash allows you to override the general noclobber setting and use >| to say that you really do want to clobber that file:

$ badprompt>| a
bash: badprompt: command not found...
$ 
$ cat a
$

It’s less likely that someone’s prompt ends with >|, but when you accidentally paste in random junk from the terminal, such character sequences may well occur. What’s more, noclobber does not prevent anyone from appending to the end of a file with >>:

$ set -o noclobber
$ cat a
$ 
$ echo hello > a
-bash: a: cannot overwrite existing file
$ 
$ cat a
$ 

We couldn’t overwrite the existing empty file with anything so far, thanks to noclobber. But let’s try to append:

$ echo hello >> a
$ echo hello >> a
$ 
$ cat a
hello
hello
$ 

No clobber, no problem!

This requires slightly more exotic circumstances: the accidentally executed command must actually exist and print something to its standard output for that output to be appended to whatever filename comes after the >> redirection. Some examples from the Linux world would be the mysql or pcp users (the executable command name == username in typical installations, thus some people may have prompts looking like mysql> or pcp> when logged in as these users). Nevertheless, pasting in a bunch of sufficiently unlucky junk from the terminal with >> in it may cause you to append random stuff to your existing binaries and scripts, instead of truncating them to zero. (Which one is better? 🤔)
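The three noclobber behaviours covered in this section can be checked in one go. This is a minimal sketch in a scratch directory; the file name and the deliberately non-existent command nosuchcommand_for_demo are assumptions for illustration.

```shell
scratch=$(mktemp -d)
target="$scratch/a"
echo first > "$target"

set -o noclobber

# Plain > is refused for an existing file ...
if { echo second > "$target"; } 2>/dev/null; then plain="overwrote"; else plain="refused"; fi

# ... but >> still appends ...
echo third >> "$target"
after_append=$(cat "$target")

# ... and >| overrides noclobber, truncating the file even though the
# command in front of it does not exist.
{ nosuchcommand_for_demo >| "$target"; } 2>/dev/null || true
if [ -s "$target" ]; then forced="kept"; else forced="truncated"; fi

set +o noclobber
rm -rf "$scratch"
printf 'plain >: %s\nafter >>: %s\nafter >|: %s\n' "$plain" "$after_append" "$forced"
```

In other words, noclobber narrows the window for accidents but does not close it.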

What about the # in a typical root prompt? Everything that comes after a # character, is gonna be a comment, right? I have allowed clobbering and am using just echo commands here to keep things simpler:

$ cat a
hello
hello
$ 
$ echo zzz #> a
zzz
$ 
$ cat a
hello
hello

The above example didn’t try to overwrite my file, as the #> a was treated as a comment. The “zzz” printed after the echo command above was just its standard output, displayed on the terminal screen (redirection to a file did not kick in, thanks to the comment character #). My file’s contents still say “hello”.

Now let’s make one last tiny change as I want my awesome-shell-prompt to be more compact. “I just trimmed some whitespace, I don’t think we need to test that”:

$ cat a
hello
hello
$ echo zzz#> a
$ 
$ cat a
zzz#
$

Oh crap, I should have tested that! A tiny change in whitespace changed the meaning of the echo command. Without the space between “zzz” and “#”, the shell treats it all as a single argument passed to the echo command (echo zzz#), and the output of that command (zzz#) was then redirected to my file a.

Hopefully by now it is pretty evident that, for the sake of the sanity of our production systems (and ourselves), you shouldn’t use > in your shell prompts. However, this alone does not protect you from accidentally pasting other bad commands. One example is an unintended space between the desired directory/filename prefix and *, as in the command below - don’t run it!

# rm -rif /some/app/dir/oldlog_ *

The above command will try to erase a single file /some/app/dir/oldlog_ and anything that matches * in the current working directory!
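One low-tech way to catch this kind of typo is to let the shell show the expanded command before running it, by putting echo in front. The sketch below is only an illustration: the sandbox layout and file names are made up, and nothing is actually deleted.

```shell
# Scratch layout: an "app" directory with old logs, plus an unrelated file
# in the directory the command is run from (all names are illustrative).
sandbox=$(mktemp -d)
mkdir "$sandbox/app"
: > "$sandbox/app/oldlog_1"
: > "$sandbox/app/oldlog_2"
: > "$sandbox/important.conf"

# Intended command: the glob is attached to the prefix.
intended=$(cd "$sandbox" && echo rm -- app/oldlog_*)
# Typo: a stray space detaches the *, which now matches the whole current directory.
typo=$(cd "$sandbox" && echo rm -- app/oldlog_ *)

echo "$intended"   # rm -- app/oldlog_1 app/oldlog_2
echo "$typo"       # rm -- app/oldlog_ app important.conf
```

Seeing important.conf and the app directory itself in the preview is the cue to stop and fix the command.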

How to stay safe in shell?

This stuff is complex! Small mistakes can come back to bite you in a variety of ways. We are running critical production systems after all - how do we avoid these problems reliably, so that we don’t depend on human errors never happening and on always being lucky?

I am not addressing any higher level solutions here, like various immutable infrastructure-as-code platforms - they eliminate much of the “manual everyday typing” human error risk and shift the remaining risk to different layers.

If you actually need to log in to the servers manually, then the best solution I know is to not use privileged access when you don’t need it. This will reduce convenience, but will increase safety.

Limiting damage when accidentally pasting / typing in bad stuff

  1. Do not log in as root or sudo to an interactive root shell, even on development machines - they’re someone’s production too!
  2. In production, don’t even log in as the database/application owner at OS level (so that the malformed rm command above can’t erase important files)
  3. In production, don’t configure universal passwordless sudo for yourself
  4. In production, you can enable typical (diagnostic) commands via passwordless sudo (but everything else still needs a password)
  5. In production, disable sudo credential caching (via /etc/sudoers)
  6. When applying OS changes in production, write them all into a script, test it and run that exact script with sudo + password
  7. sudo and /etc/sudoers aren’t meant only for granting selective access to the root user; they can grant access to any other user too (dba, app owner)

This way, even if you do paste in some accidental junk from your clipboard, you cannot mess things up under those other OS users, unless you are deliberate or get extremely unlucky.
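Items 3, 4, 5 and 7 of the list above could translate into sudoers rules along the following lines. This is a hedged sketch, not a recommended policy: the user name alice, the target user oracle and the command paths are assumptions, and the rules are written to a scratch file rather than to /etc/sudoers.d.

```shell
# Write an illustrative sudoers drop-in to a scratch file (NOT to /etc/sudoers.d).
# The user "alice", the target user "oracle" and the command paths are assumptions.
sudoers_sketch=$(mktemp)
cat > "$sudoers_sketch" <<'EOF'
# Item 5: never cache alice's sudo credentials.
Defaults:alice timestamp_timeout=0
# Item 4: a few diagnostic commands without a password.
alice ALL=(root) NOPASSWD: /usr/bin/iostat, /usr/bin/vmstat
# Items 3 and 7: anything else, run as the application owner, still needs a password.
alice ALL=(oracle) ALL
EOF

cat "$sudoers_sketch"
```

Before installing anything like this for real, the file can be syntax-checked with visudo -c -f, and it should be adapted to the commands and accounts that actually exist on the system.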

Reducing the chance of accidentally pasting / typing in bad stuff

Going back to the original topic in this article - pasting in random stuff from your computer’s clipboard is bad!

  1. For me, the first step of avoiding some of the paste horror is to not use a terminal which immediately pastes the clipboard on just a right mouse click. I accidentally right click my mouse multiple times per day! I’ve used this terminal for the last 13 years as it gives me horizontal scrolling and doesn’t have the insane right-mouse-click pasting. I paste with a deliberate CMD+V, nothing else
  2. I use various notes.txt style files. When working interactively in production servers (having live performance troubleshooting fun!), I tend to write any non-trivial command into an editor in a separate window first (often testing the commands out in a similar test environment)
  3. I don’t copy&paste commands. I typically cut&paste back to the same text editor window, to make sure that the latest command was definitely put into the clipboard (I have had many occurrences of some browser or MS Word window just silently ignoring my copy commands).
  4. Once I am sure that the clipboard contains what I intended, I will immediately paste it to the production terminal window. Sometimes with a manually typed # prefix just to double-check it before hitting “go”
  5. In extra paranoid mode, I double-check my clipboard contents even when visiting any browser windows (you know, because of pastejacking)
  6. You might think that you’d just mitigate all risk by always pasting stuff to a vim editor on the production server side, but what if your clipboard buffer contains an ^ESC:q!\nsomeverybadcommand\n or an unfortunate VIM macro? So I tend to cut & paste clipboard to my local editor, immediately before pasting it to the server
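The # prefix from step 4 has one limit worth knowing: it protects only the line it is typed on. The sketch below illustrates this with sh -c standing in for a terminal paste; that substitution, the scratch directory and the file names are assumptions, and in an interactive bash the same behaviour depends on the default interactive_comments setting.

```shell
probe_dir=$(mktemp -d)

# Case 1: a single pasted line with a hand-typed "# " in front is a comment.
single_line="# touch '$probe_dir/one'"
sh -c "$single_line"

# Case 2: the same prefix on a multi-line paste protects the first line only.
multi_line="# touch '$probe_dir/two-a'
touch '$probe_dir/two-b'"
sh -c "$multi_line"

if [ -e "$probe_dir/one" ];   then single_result="executed"; else single_result="inert"; fi
if [ -e "$probe_dir/two-b" ]; then multi_result="executed";  else multi_result="inert";  fi
echo "single line: $single_result, second line of multi-line paste: $multi_result"
```

So the prefix is a useful double-check for one-liners, which is one more reason to look at the clipboard in a local editor first.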

Summary

I hope that this was an entertaining read… and maybe it helps to explain that old mysterious incident when you had to restore only the mysql or sqlplus binary from a backup, while everything else seemed to be fine (other than your shell prompt with the > suffix ;-)

This file clobbering problem is just one example of how accidental input can mess things up in your servers, even if you don’t hit some of the worst case rm -rf or shutdown commands. There are multiple reliable options for avoiding trouble and greatly reducing the local blast radius. Having good command prompt hygiene, especially in important systems, reduces the number of headaches as well. And after that, you’ll need good backups.
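As a concrete take on prompt hygiene, a prompt can be checked for the > character before it goes into a profile. This is a minimal sketch: the function name prompt_is_redirect_free and both example prompts are assumptions rather than the prompts of any particular system.

```shell
# Hypothetical check: succeeds only if the prompt string contains no ">".
prompt_is_redirect_free() {
    case $1 in
        *'>'*) return 1 ;;
        *)     return 0 ;;
    esac
}

# A prompt similar to the ones in this article, and an alternative without ">".
# \u, \h, \W and \$ are bash prompt escapes; \$ shows "#" for root and "$" otherwise.
risky_ps1='\u@\h \W> '
safer_ps1='\u@\h \W \$ '

for candidate in "$risky_ps1" "$safer_ps1"; do
    if prompt_is_redirect_free "$candidate"; then
        printf 'ok:    %s\n' "$candidate"
    else
        printf 'risky: %s\n' "$candidate"
    fi
done
```

In bash the safer variant could be set with PS1="$safer_ps1" in ~/.bashrc. The space before \$ matters for root prompts, because # starts a comment only at the beginning of a word, as the echo zzz# example showed.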

If you want further reading - one more way to look into misbehaving shell binaries is explained in my earlier blog entry. It talks about troubleshooting sudden SSH logon delays via system call tracing using strace. While strace doesn’t trace the application’s internal user-space logic directly, it can still be very useful in cases like immediate, abrupt exits or in scenarios like “my config file changes are not picked up by the application”.
