In-Class Lab 10: Filters 2

In this exercise we will practice with the Unix filters cut and tr. We will also practice using paste even though, strictly speaking, it is not a filter. In addition, we will expand our use of grep with the options -c, -v, -e, -i, -n, and -l and with simple regular expressions. Last, we will work through a handful of real-life problems that put these tools together.

Do the best you can in this exercise. Pay more attention to the final part where all the filters are used together.

For the duration of these exercises, you will be using test files from the directory samples/Data beneath the public data area on hills. You will also need to create temporary files from time to time, so you can either get personal copies of the files you need and work in your own area, or you can work in the Data directory and place your temporary files in your home directory, which is what the answer key does. Instead of data files, some of the problems use the output of various Linux commands as input.

Part One - cut/paste

The filter cut slices its input vertically, either by character position or by a delimiter and a field number. cut cannot rearrange fields. If you want output that takes fields #3 and #5 of one file and displays them as field #5 followed by field #3, you must cut the fields out individually and then use paste to join them. When you work with fields, the default delimiter for both cut and paste is a tab.
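The cut-then-paste idea described above can be sketched as follows. The file name fruit and its contents are made up for illustration and are not part of the class data; treat this as one possible way to do it, not the answer key's method.

```shell
# Hypothetical sample data: three colon-separated fields per line.
tmpdir=$(mktemp -d)
printf 'apple:red:3\nbanana:yellow:7\n' > "$tmpdir/fruit"

# cut always emits fields in their original order, so asking for
# "-f3,1" still prints field 1 before field 3.
cut -d: -f3,1 "$tmpdir/fruit"

# To swap them, cut each field into its own temporary file ...
cut -d: -f1 "$tmpdir/fruit" > "$tmpdir/names"
cut -d: -f3 "$tmpdir/fruit" > "$tmpdir/counts"

# ... then paste the files side by side in the order you want.
# -d: tells paste to join with a colon instead of its default tab.
swapped=$(paste -d: "$tmpdir/counts" "$tmpdir/names")
printf '%s\n' "$swapped"

# Clean up the temporary files.
rm -r "$tmpdir"
```

Note that both commands need -d: here, because each would otherwise fall back to the tab delimiter.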

  1. Look at the output of the date command. Using cut and column positions, create a file that contains the month and another that contains the day of the month. Then output the date as day month with a space between the day and the month.

  2. Look at the file samples/Data/Hired_Data beneath the class public directory. Its format is Name:Dept:Job. Then use a command to output:

  3. Names only
  4. Names and Jobs, separated by a :
  5. Jobs then Names, separated by a : (this takes several commands)
  6. Look at the file Emp_Data. Output the Names field as First Last rather than Last, First

  7. Look at the files st2 and st3. Note that they are different lengths. Output a list that has the first field of st3 followed by the third field of st2, keeping the same delimiter. How was the difference in length resolved?

Part Two - tr

The tr command translates, deletes, complements and squeezes characters. Try the following command on a text file:

cat file | tr -cs '[[:alnum:]]' '[\012*]'

tr needs one or two character strings to specify the characters that it is to translate. A shorthand for enumerating these characters is to use a character set. Thus, the character string 'abcde' could be specified as '[a-e]'. Just like wildcards, you can also use a character class in a character set. Character sets (and classes) are part of modern regular expressions, although there are a few minor differences.

Examples:

tr -d '[[:alpha:][:punct:]]'
- This set contains all characters that are members of either the class alpha or the class punct, so the command deletes all alphabetic and punctuation characters.
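As a further illustration, here is a minimal sketch of the four things tr can do (translate, delete, squeeze, complement). It assumes a POSIX tr such as GNU tr, where a class is written as '[:lower:]' inside the quotes; the input string is made up for the example.

```shell
# Hypothetical input string, chosen only for illustration.
text='aaa=bbb==ccc 123'

# Translate: map each lowercase letter to its uppercase counterpart.
upper=$(printf '%s\n' "$text" | tr '[:lower:]' '[:upper:]')
printf '%s\n' "$upper"        # AAA=BBB==CCC 123

# Delete: remove every digit (the trailing blank is left alone).
nodigits=$(printf '%s\n' "$text" | tr -d '[:digit:]')
printf '[%s]\n' "$nodigits"   # [aaa=bbb==ccc ]

# Squeeze: collapse each run of '=' to a single '='.
squeezed=$(printf '%s\n' "$text" | tr -s '=')
printf '%s\n' "$squeezed"     # aaa=bbb=ccc 123

# Complement + delete: remove everything that is NOT a letter or a newline.
letters=$(printf '%s\n' "$text" | tr -cd '[:alpha:]\n')
printf '%s\n' "$letters"      # aaabbbccc
```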
  1. Using echo, send tr the string "UPPER lower", telling it to delete all blanks.

  2. Redo the last command, telling tr to squeeze out repeated blanks.

  3. Tell it to squeeze out repeated blanks and change those remaining to pound symbols (#)

  4. Last, squeeze out leading blanks, then translate uppercase characters to lowercase characters. (This makes use of the fact that tr can translate one sequence of characters to another sequence using 1-for-1 substitution.)

  5. Examine the file sorttest. Output the file on the screen after changing the # delimiter to a comma.

    Look at an ls -l listing of a directory. You want to cut out the size and name field, but cut relies on a single delimiter between fields. Let's design this solution.

  6. Take the ls -l output and squeeze successive blanks to a single one.

  7. Now add cut so that only the size and name are output.
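The reason for squeezing before cutting can be seen on a small made-up report (not ls -l output, and not one of the class files). This is only a sketch of the general technique.

```shell
# Hypothetical column-aligned report; the fields are padded with a
# varying number of blanks.
report=$(printf 'alice    42   staff\nbob       7   admin\n')

# With a single blank as the delimiter, every extra blank starts a new,
# empty field, so "field 2" comes out empty on both lines.
broken=$(printf '%s\n' "$report" | cut -d' ' -f2)
printf '[%s]\n' "$broken"     # []

# Squeezing runs of blanks first leaves exactly one delimiter between
# fields, and cut can then pick fields by number.
fixed=$(printf '%s\n' "$report" | tr -s ' ' | cut -d' ' -f1,3)
printf '%s\n' "$fixed"
```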

Part Three - grep

This part covers the grep command, adding the options -c, -v, -e, -i, -n, and -l. The later problems use simple regular expressions. The files can be found in the directory samples/Data beneath the public data area. Using grep only, display the lines in the file u2 that:

  1. start with cow
  2. start with the word It
  3. consist of exactly cow (the whole line is cow)
  4. contain either cow or animal
  5. contain both cow and animal, anywhere on the line
  6. output the lines in u2 that start with cow, each with its line number in the file
  7. output the number of lines in u2 that contain cow
  8. output the names of the files in the Data directory that contain cow
  9. output the number of lines in each file in Data that contain cow
  10. output the number of lines in u2 that don't contain cow
  11. output the lines of the passwd file whose shell field (the last field) is /usr/bin/bash
  12. output the lines of passwd whose shell field is neither /usr/bin/bash nor /usr/bin/ksh
  13. output the lines of passwd whose login field (the first field) is three characters long
  14. output the lines of st.bad that have an empty field
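For reference, the grep options used in this part behave as sketched below. The files pets and other and their contents are invented for the example; they are not in the Data directory.

```shell
# Hypothetical sample files, created in a temporary directory.
tmpdir=$(mktemp -d)
printf 'dog food\nDog park\nhot dog\ncat\n' > "$tmpdir/pets"
printf 'no match here\n' > "$tmpdir/other"

# -c prints the number of matching lines instead of the lines.
count=$(grep -c 'dog' "$tmpdir/pets")                 # 2

# -i ignores case, so "Dog park" now matches too.
count_i=$(grep -c -i 'dog' "$tmpdir/pets")            # 3

# -v inverts the test: print the lines that do NOT match.
others=$(grep -v -i 'dog' "$tmpdir/pets")             # cat

# -n prefixes each line with its line number; ^ anchors the pattern
# to the start of the line.
numbered=$(grep -n '^dog' "$tmpdir/pets")             # 1:dog food

# -e may be repeated; a line matches if ANY of the patterns matches.
either=$(grep -c -e 'cat' -e 'park' "$tmpdir/pets")   # 2

# -l prints only the names of the files that contain a match.
files=$(cd "$tmpdir" && grep -l 'dog' pets other)     # pets

printf '%s\n' "$count" "$count_i" "$others" "$numbered" "$either" "$files"
rm -r "$tmpdir"
```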

Part Four

This last section has problems that put all the filters together to solve some real-world problems. Some of them are challenging and are a good reason to use our class Slack channel for help.

Note that we are only making use of simple regular expressions in this set, so you may have to do a bit more work on these problems.

  1. Output the login (field #1) of all entries in passwd whose default group (field #4) is 200
  2. Output a list of the different shells used in the file passwd (the last field)
  3. How many different default groups (field #4) are there in passwd?
  4. How many members (members are in field #4, separated by commas) are there in the line in the group file whose group id (field #3) is 3021?
  5. Create a file (in your home directory) named E14dept with only the names (field 2) of each person in the file Emp_Manager1 whose dept field (field 3) is E14. The list should be sorted by the person's id number (field 1)
  6. Student accounts on hills have the default group of 506. Output the number of lines of /etc/passwd whose default group (field 4) is not 506 (i.e. who are not students). Compare this to the number of students in the file. Any predictions?

Turning in your exercise

For the original version of this exercise as well as solutions, refer to Greg Boyd's handout.

Submit your answers as an ordinary text file on Canvas.