In this exercise we will practice with the Unix filters cut and tr. We will also practice using paste even though, strictly speaking, it is not a filter. In addition, we will expand our use of grep to add the options -c, -v, -e, -i, -n, and -l and the use of simple regular expressions. Last, we will do a handful of real-life problems that put these together to address some practical problems.
Do the best you can in this exercise. Pay more attention to the final part where all the filters are used together.
For the duration of these exercises, you will be using test files from the directory samples/Data beneath the public data area on hills. You will also need to create temporary files from time to time, so you can either get personal copies of the files you need and work in your own area, or you can work in the Data directory and place your temporary files in your home directory. The answer key takes the second approach. Instead of data files, some of the problems use the output of various Linux commands as input to work on.
The filter cut slices its input vertically, either by character position or by using a delimiter and a field number. cut cannot rearrange fields. If you want to generate output that takes fields #3 and #5 of one file and displays them as field #5 followed by field #3, you must cut the fields out individually and then use paste to paste them together. If you are using fields, the default delimiter for both cut and paste is a tab.
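The point about rearranging fields can be sketched on a small scratch file. This is only an illustration: the file name sample.txt and its contents are made up, and the scratch directory comes from mktemp so nothing in the Data directory is touched.

```shell
# Work in a scratch directory (hypothetical data, not one of the class files).
tmpdir=$(mktemp -d)

# Two lines of five tab-separated fields each.
printf 'a1\ta2\ta3\ta4\ta5\nb1\tb2\tb3\tb4\tb5\n' > "$tmpdir/sample.txt"

# cut always emits fields in file order, so -f5,3 gives the same result as -f3,5.
in_order=$(cut -f5,3 "$tmpdir/sample.txt")

# To get field 5 ahead of field 3, cut each field into its own file,
# then paste the two files side by side (tab is the default delimiter).
cut -f5 "$tmpdir/sample.txt" > "$tmpdir/f5"
cut -f3 "$tmpdir/sample.txt" > "$tmpdir/f3"
swapped=$(paste "$tmpdir/f5" "$tmpdir/f3")

printf '%s\n' "$in_order"
printf '%s\n' "$swapped"

rm -r "$tmpdir"
```

If the file uses a delimiter other than a tab, both cut and paste accept -d to name it.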
Look at the output of the date command. Using cut and column positions, create a file that contains the month and another that contains the day of the month. Then output the date as day month with a space between the day and the month.
Look at the file samples/Data/Hired_Data beneath the class public directory. Its format is Name:Dept:Job. Then use a command to output:
Look at the file Emp_Data. Output the Names field as First Last rather than Last, First.
Look at the files st2 and st3. Note that they are different lengths. Output a list that has the first field of st3 followed by the third field of st2, keeping the same delimiter. How was the difference in length resolved?
The tr command translates, deletes, complements and squeezes characters. Try the following command on a text file:
cat file | tr -cs '[[:alnum:]]' '[\012*]'
tr needs one or two character strings to specify the characters that it is to translate. A shorthand for enumerating these characters is to use a character set. Thus, the character string 'abcde' could be specified as '[a-e]'. Just as with wildcards, you can also use a character class in a character set. Character sets (and classes) are part of modern regular expressions, although there are a few minor differences.
Examples:
tr -d '[[:alpha:][:punct:]]' - This set contains all characters that are members of either the class alpha or the class punct (delete all alphabetic and punctuation characters).
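Sets like these can be tried on short strings with echo. This is a minimal sketch: the input strings are made up, and it keeps the bracketed notation used in this handout. It assumes GNU tr, the usual version on Linux, which takes the outer brackets as literal characters. That is harmless here because none of the inputs contain brackets.

```shell
# A range: delete the letters a through e.
r1=$(echo 'abcdefgh' | tr -d '[a-e]')

# A class: delete every digit.
r2=$(echo 'room 101, floor 3' | tr -d '[[:digit:]]')

# Two classes in one set: delete alphabetic and punctuation characters,
# leaving only the digits and the blanks.
r3=$(echo 'Call 555-1234, ext. 9!' | tr -d '[[:alpha:][:punct:]]')

# Angle brackets make the leading and trailing blanks visible.
printf '<%s>\n' "$r1" "$r2" "$r3"
```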
Using echo, send tr the string "UPPER lower", telling it to delete all blanks.
Redo the last command, telling tr to squeeze out repeated blanks.
Tell it to squeeze out repeated blanks and change those remaining to pound symbols (#).
Last, squeeze out leading blanks, then translate uppercase characters to lowercase characters. (This makes use of the fact that tr can translate one sequence of characters to another using 1-for-1 substitution.)
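The 1-for-1 substitution can be seen on strings other than the one used in the problems above. The inputs below are made up for illustration, and the bracketed ranges follow this handout's notation (assuming GNU tr, where the brackets simply map to themselves).

```shell
# Position by position: a becomes x, b becomes y, c becomes z.
t1=$(echo 'cabbage' | tr 'abc' 'xyz')

# Two ranges of equal length pair up letter for letter.
t2=$(echo 'hello world' | tr '[a-z]' '[A-Z]')

# With -s and two strings, tr translates first and then squeezes
# repeats of the characters in the second string.
t3=$(echo 'a..b...c' | tr -s '.' ',')

printf '%s\n' "$t1" "$t2" "$t3"
```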
Examine the file sorttest. Output the file on the screen after changing the # delimiter to a comma.
Look at an ls -l listing of a directory. You want to cut out the size and name fields, but cut relies on a single delimiter between fields. Let's design this solution.
Take the ls -l output and squeeze successive blanks to a single one.
Now add cut so that only the size and name are output.
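The two steps above can be sketched as follows. The field numbers are an assumption: they match the usual ls -l layout (permissions, links, owner, group, size, month, day, time, name), and LC_ALL=C is set only to pin the date to three fields. The scratch directory and the file hello.txt are made up so the result is predictable.

```shell
# Scratch directory holding one file of exactly 5 bytes (hypothetical data).
tmpdir=$(mktemp -d)
printf 'hello' > "$tmpdir/hello.txt"

# Step 1: squeeze runs of blanks so a single space separates the fields.
squeezed=$(LC_ALL=C ls -l "$tmpdir" | tr -s ' ')

# Step 2: cut on the space delimiter; size is field 5 and name is field 9.
size_name=$(LC_ALL=C ls -l "$tmpdir" | tr -s ' ' | cut -d' ' -f5,9)

printf '%s\n' "$size_name"

rm -r "$tmpdir"
```

The "total" line at the top of the listing has neither field, so it comes out as an empty line. A file name that contains blanks would be split across several fields; -f5,9- keeps everything from field 9 onward.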
This part covers the grep command, adding the options -e, -n, and -l. The later problems use simple regular expressions. The files can be found in the directory samples/Data beneath the public data area. Using grep only, display the lines in the file u2 that:
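For reference, the grep options named in this exercise can be tried on small scratch files first. The files menu1 and menu2 and their contents are made up for illustration and are not part of the class data.

```shell
tmpdir=$(mktemp -d)
printf 'Apple pie\nbanana split\napple tart\ncherry cake\n' > "$tmpdir/menu1"
printf 'carrot soup\nbanana bread\n' > "$tmpdir/menu2"

c=$(grep -c 'apple' "$tmpdir/menu1")           # count of matching lines
v=$(grep -v 'apple' "$tmpdir/menu1")           # lines that do NOT match
i=$(grep -i 'apple' "$tmpdir/menu1")           # match regardless of case
n=$(grep -n 'banana' "$tmpdir/menu1")          # prefix each match with its line number
e=$(grep -e 'pie' -e 'cake' "$tmpdir/menu1")   # lines matching either pattern
l=$(cd "$tmpdir" && grep -l 'banana' menu1 menu2)   # names of files with a match

printf '%s\n' "$c" "$v" "$i" "$n" "$e" "$l"

rm -r "$tmpdir"
```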
This last section has problems that put all the filters together to solve some real-world problems. Some of them are challenging and a good reason to use our class Slack channel for help.
Note that we are only making use of simple regular expressions in this set, so you may have to do a bit more work on these problems.
For the original version of this exercise as well as solutions, refer to Greg Boyd's handout.
Submit your answers as an ordinary text file on Canvas.
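As an illustration of the extra work, here is a sketch of two common workarounds, assuming "simple" means grep's basic regular expressions, where + and | have no special meaning. The file ids and its contents are made up.

```shell
tmpdir=$(mktemp -d)
printf 'id 7\nid 42\nno id\nid\n' > "$tmpdir/ids"

# "One or more digits" without +: one digit followed by zero or more digits.
d=$(grep '^id [0-9][0-9]*$' "$tmpdir/ids")

# "This or that" without |: give grep two patterns with -e.
a=$(grep -e '^no' -e '42$' "$tmpdir/ids")

printf '%s\n' "$d" "$a"

rm -r "$tmpdir"
```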