Lab 5
Some remarks
Repositories and directories for this course: Most course materials are in the course repo. We recommend that you clone the repo into a directory called stat159-f17-reference. First, move into your directory for this course (e.g. stat159) and clone the repo:
git clone https://github.com/berkeley-stat159-f17/stat159-f17.git stat159-f17-reference
Then copy the contents of the repo into a new directory called stat159-f17-work:
cp -r stat159-f17-reference stat159-f17-work
Now you can make changes to notebooks etc. in the stat159-f17-work directory. When we add course materials, you can run git pull in the stat159-f17-reference directory and then copy again.
Absolute paths vs relative paths:
Repositories are meant to be shared. If your code reads data from a path that looks like /Users/username/repo/data.csv, will it run on another computer? How can we change the path so that the code runs from inside the repo directory?
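A minimal sketch of the difference, using the standard-library pathlib module. The file name data.csv is hypothetical and assumed to sit at the top level of the repo; PurePosixPath is used only so the absolute example behaves the same on every operating system.

```python
from pathlib import Path, PurePosixPath

# Hypothetical absolute path (POSIX style, as on macOS): it only works on a
# machine that has exactly this directory layout.
absolute = PurePosixPath("/Users/username/repo/data.csv")

# A relative path is resolved against the current working directory, so it
# works on any machine as long as the code is run from inside the repo.
relative = Path("data.csv")

print(absolute.is_absolute())  # True
print(relative.is_absolute())  # False
print(Path.cwd() / relative)   # the full path Python would actually open
```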
File IO
This section is slightly modified from the Python documentation (the tutorial section “Reading and Writing Files”).
Opening a file
The function open() returns a file object, and is most commonly used with two arguments: open(filename, mode).
In [2]:
f = open('workfile', 'w')
The first argument is a string containing the filename. The second argument is another string containing a few characters describing the way in which the file will be used. mode can be 'r' when the file will only be read, 'w' for only writing (an existing file with the same name will be erased), and 'a' opens the file for appending; any data written to the file is automatically added to the end. 'r+' opens the file for both reading and writing. The mode argument is optional; 'r' will be assumed if it’s omitted.
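A short sketch of the 'w', 'a' and default 'r' modes in sequence. The scratch file lives in a temporary directory and its name is hypothetical:

```python
import os
import tempfile

# Hypothetical scratch file inside a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")

with open(path, "w") as f:   # 'w': create the file (or truncate it) and write
    f.write("first\n")

with open(path, "a") as f:   # 'a': every write is added to the end
    f.write("second\n")

with open(path) as f:        # mode omitted, so 'r' is assumed
    after_append = f.read()

with open(path, "w") as f:   # 'w' again erases the previous contents
    f.write("replaced\n")

with open(path) as f:
    after_truncate = f.read()

print(repr(after_append))    # 'first\nsecond\n'
print(repr(after_truncate))  # 'replaced\n'
```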
Normally, files are opened in text mode, which means that you read and write strings from and to the file, and those strings are encoded in a specific encoding. If encoding is not specified, the default is platform dependent (see open). Appending 'b' to the mode opens the file in binary mode: the data is then read and written in the form of bytes objects. This mode should be used for all files that don’t contain text.
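A sketch contrasting the two modes, assuming UTF-8 is the encoding you want: the text is written in text mode with an explicit encoding (so the platform default does not matter), then read back in binary mode as raw bytes. The file name is hypothetical.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "encoding_demo.txt")

# Text mode: write a str, naming the encoding instead of relying on the default
with open(path, "w", encoding="utf-8") as f:
    f.write("café")

# Binary mode: read the raw bytes back without any decoding
with open(path, "rb") as f:
    raw = f.read()

print(raw)                  # b'caf\xc3\xa9' (é takes two bytes in UTF-8)
print(raw.decode("utf-8"))  # café
```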
In text mode, the default when reading is to convert platform-specific line endings (\n on Unix, \r\n on Windows) to just \n. When writing in text mode, the default is to convert occurrences of \n back to platform-specific line endings. This behind-the-scenes modification to file data is fine for text files, but will corrupt binary data like that in JPEG or EXE files. Be very careful to use binary mode when reading and writing such files.
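A sketch of the translation on the reading side: Windows-style line endings are written as raw bytes, then read back once in text mode and once in binary mode. With the default newline handling, text mode turns \r\n into \n on every platform, while binary mode leaves the bytes untouched. The file name is hypothetical.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "crlf_demo.txt")

# Binary mode writes the bytes exactly as given (no newline translation)
with open(path, "wb") as f:
    f.write(b"one\r\ntwo\r\n")

# Text mode: '\r\n' is translated to '\n' when reading
with open(path, "r") as f:
    text = f.read()

# Binary mode: the stored bytes come back unchanged
with open(path, "rb") as f:
    data = f.read()

print(repr(text))  # 'one\ntwo\n'
print(data)        # b'one\r\ntwo\r\n'
```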
It is good practice to use the with keyword when dealing with file objects. The advantage is that the file is properly closed after its suite finishes, even if an exception is raised at some point. Using with is also much shorter than writing equivalent try-finally blocks:
In [3]:
with open('workfile') as f:
    read_data = f.read()
f.closed
Out[3]:
True
If you’re not using the with keyword, then you should call f.close()
to close the file and immediately free up any system resources used by
it. If you don’t explicitly close a file, Python’s garbage collector
will eventually destroy the object and close the open file for you, but
the file may stay open for a while. Another risk is that different
Python implementations will do this clean-up at different times.
After a file object is closed, either by a with statement or by calling f.close(), attempts to use the file object will automatically fail with a ValueError:
In [ ]:
f.close()
f.read()
Exercise: Write the equivalent logic of the with statement using try-finally blocks.
In [ ]:
# Your code here
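One possible answer, as a sketch: for a file, the try-finally version opens the file, does the work inside try, and closes the file in finally. (The full with protocol calls the object's __enter__ and __exit__ methods; for a file object, __exit__ closes it, which is the part reproduced here.) The temporary file and its contents are hypothetical, so the sketch runs anywhere.

```python
import os
import tempfile

# Hypothetical file with known contents, also written with try-finally
path = os.path.join(tempfile.mkdtemp(), "workfile")
out = open(path, "w")
try:
    out.write("some text\n")
finally:
    out.close()

# Equivalent of:
#     with open(path) as f:
#         read_data = f.read()
f = open(path)
try:
    read_data = f.read()
finally:
    f.close()  # runs whether or not f.read() raised an exception

print(f.closed)  # True
```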
Methods for file objects
First, let’s create a file object for example.txt:
In [12]:
f = open("lab5-files/example.txt", "r")
Reading
To read a file’s contents, call f.read(size), which reads some quantity of data and returns it as a string (in text mode) or bytes object (in binary mode). size is an optional numeric argument. When size is omitted or negative, the entire contents of the file will be read and returned; it’s your problem if the file is twice as large as your machine’s memory. Otherwise, at most size characters (in text mode) or size bytes (in binary mode) are read and returned. If the end of the file has been reached, f.read() will return an empty string ('').
In [13]:
print(f.read())
This is a temporary text file.
We'll parse this file.
In [14]:
f.read()
Out[14]:
''
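A sketch of the size argument: reading a file in fixed-size chunks and stopping when f.read() returns the empty string. The chunk size of 4 and the file contents are arbitrary choices for illustration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "chunks_demo.txt")
with open(path, "w") as f:
    f.write("abcdefghij")

chunks = []
with open(path) as f:
    while True:
        chunk = f.read(4)  # at most 4 characters per call (text mode)
        if chunk == "":    # empty string: end of file reached
            break
        chunks.append(chunk)

print(chunks)  # ['abcd', 'efgh', 'ij']
```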
f.readline() reads a single line from the file; a newline character (\n) is left at the end of the string, and is only omitted on the last line of the file if the file doesn’t end in a newline. This makes the return value unambiguous: if f.readline() returns an empty string, the end of the file has been reached, while a blank line is represented by '\n', a string containing only a single newline.
In [15]:
f = open("lab5-files/example.txt", "r")
f.readline()
Out[15]:
'This is a temporary text file.\n'
In [16]:
f.readline()
Out[16]:
"We'll parse this file.\n"
In [17]:
f.readline()
Out[17]:
''
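A sketch of why the return value is unambiguous: the hypothetical file below contains a blank line and does not end in a newline, so f.readline() returns '\n' for the blank line, 'last' without a newline for the final line, and '' only at the end of the file.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "readline_demo.txt")
with open(path, "w") as f:
    f.write("first\n\nlast")  # a blank line, and no newline at the very end

with open(path) as f:
    lines = [f.readline() for _ in range(4)]

print(lines)  # ['first\n', '\n', 'last', '']
```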
For reading lines from a file, you can loop over the file object. This is memory efficient, fast, and leads to simple code:
In [21]:
f = open("lab5-files/example.txt", "r")
for line in f:
    print(line, end='')
This is a temporary text file.
We'll parse this file.
If you want to read all the lines of a file in a list you can also use list(f) or f.readlines().
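A quick sketch showing that both approaches give the same list, with each line keeping its trailing newline (the file contents are hypothetical):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "lines_demo.txt")
with open(path, "w") as f:
    f.write("alpha\nbeta\n")

with open(path) as f:
    as_list = list(f)            # iterate the file object into a list
with open(path) as f:
    as_readlines = f.readlines() # same result via the method

print(as_list)                   # ['alpha\n', 'beta\n']
print(as_list == as_readlines)   # True
```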
Writing
Now let’s create a new file to write to:
In [22]:
f = open("our_file.txt", "w")
f.write(string) writes the contents of string to the file, returning the number of characters written.
In [23]:
f.write('This is a test\n')
Out[23]:
15
Other types of objects need to be converted – either to a string (in text mode) or a bytes object (in binary mode) – before writing them:
In [25]:
value = ('the answer', 42)
s = str(value) # convert the tuple to string
f.write(s)
f.close()
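A sketch of what happens without the conversion, and of what the return value counts in each mode: text mode rejects an int with a TypeError and counts characters, while binary mode needs bytes and counts bytes. The word "café" is chosen only because é is one character but two bytes in UTF-8.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "write_demo.txt")

with open(path, "w", encoding="utf-8") as f:
    try:
        f.write(42)            # an int is not a str, so text mode rejects it
        raised = False
    except TypeError:
        raised = True
    n_chars = f.write("café")  # text mode returns the number of characters

with open(path, "wb") as f:
    n_bytes = f.write("café".encode("utf-8"))  # binary mode returns bytes written

print(raised, n_chars, n_bytes)  # True 4 5
```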
f.tell() returns an integer giving the file object’s current position in the file, represented as the number of bytes from the beginning of the file when in binary mode and as an opaque number when in text mode.
To change the file object’s position, use f.seek(offset, whence). The position is computed by adding offset to a reference point; the reference point is selected by the whence argument. A whence value of 0 measures from the beginning of the file, 1 uses the current file position, and 2 uses the end of the file as the reference point. whence can be omitted and defaults to 0, using the beginning of the file as the reference point.
In [40]:
f = open('our_file.txt', 'rb+')
f.write(b'0123456789abcdef')
Out[40]:
16
In [41]:
f.seek(5) # Go to the 6th byte in the file
Out[41]:
5
In [42]:
f.read(1)
Out[42]:
b'5'
In [43]:
f.seek(-3, 2) # Go to the 3rd byte before the end
Out[43]:
30
In [44]:
f.read(1)
Out[44]:
b'4'
In text files (those opened without a b in the mode string), only seeks relative to the beginning of the file are allowed (the exception being seeking to the very file end with seek(0, 2)) and the only valid offset values are those returned from f.tell(), or zero. Any other offset value produces undefined behaviour.
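A sketch of the allowed text-mode moves: saving a position with f.tell() and returning to it with f.seek(), and jumping to the end with seek(0, 2). It also shows that, in CPython, a nonzero seek relative to the end is refused outright with io.UnsupportedOperation. The file contents are hypothetical.

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "seek_demo.txt")
with open(path, "w") as f:
    f.write("header\nbody 1\nbody 2\n")

with open(path) as f:            # text mode
    f.readline()                 # skip the header line
    bookmark = f.tell()          # opaque value: only hand it back to seek()
    first_pass = f.read()
    f.seek(bookmark)             # allowed: a value returned by tell()
    second_pass = f.read()
    f.seek(0, 2)                 # allowed: jump to the very end
    at_end = f.read()            # nothing left, so ''
    try:
        f.seek(-3, 2)            # nonzero seek relative to the end
        end_seek_refused = False
    except io.UnsupportedOperation:
        end_seek_refused = True

print(first_pass == second_pass, repr(at_end), end_seek_refused)  # True '' True
```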
Calisthenics
Exception handling
Using a try-except-finally block, write a function which takes in a list of numbers and returns a list of all the elements up until the first negative number.
In [ ]:
# Type your code here
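One possible sketch: the loop raises an exception at the first negative number, except stops the scan, and finally runs in every case. The exception class NegativeNumberFound and the message printed in finally are hypothetical choices, not part of the exercise.

```python
class NegativeNumberFound(Exception):
    """Hypothetical exception used to stop the scan early."""


def up_to_first_negative(numbers):
    """Return the elements of `numbers` that come before the first negative one."""
    result = []
    try:
        for x in numbers:
            if x < 0:
                raise NegativeNumberFound(x)
            result.append(x)
    except NegativeNumberFound:
        pass  # stop collecting at the first negative number
    finally:
        # Runs whether or not a negative number was found
        print(f"collected {len(result)} element(s)")
    return result


print(up_to_first_negative([3, 1, 4, -1, 5]))  # [3, 1, 4]
```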
Quantiles
Write a function to compute the median of a list of numbers.
In [1]:
# Type your code here
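A minimal sketch: sort a copy of the list, take the middle element for an odd length, and average the two middle elements for an even length. Raising ValueError on an empty list is an assumed convention.

```python
def median(numbers):
    """Median of a non-empty list of numbers."""
    values = sorted(numbers)  # sorted copy; the input list is left unchanged
    n = len(values)
    if n == 0:
        raise ValueError("median of an empty list is undefined")
    mid = n // 2
    if n % 2 == 1:
        return values[mid]                     # odd length: middle element
    return (values[mid - 1] + values[mid]) / 2  # even length: mean of the two middles


print(median([3, 1, 2]))     # 2
print(median([4, 1, 3, 2]))  # 2.5
```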
Now write a function to compute the \(p^\text{th}\) percentile.
In [2]:
# Type your code here
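Percentiles have several competing definitions. This sketch assumes one of them, linear interpolation between the two nearest ranks at position \((n-1)\,p/100\) in the sorted list (the same convention as NumPy's default), with \(p\) between 0 and 100. Under this convention the 50th percentile equals the median.

```python
import math


def percentile(numbers, p):
    """p-th percentile (0 <= p <= 100) using linear interpolation between ranks."""
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    values = sorted(numbers)
    if not values:
        raise ValueError("percentile of an empty list is undefined")
    rank = (len(values) - 1) * p / 100  # fractional position in the sorted list
    lo, hi = math.floor(rank), math.ceil(rank)
    frac = rank - lo
    # Interpolate between the two neighbouring sorted values
    return values[lo] + (values[hi] - values[lo]) * frac


print(percentile([1, 2, 3, 4], 50))          # 2.5
print(percentile([10, 20, 30, 40, 50], 25))  # 20.0
```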
File I/O
Write a function which creates a file with \(n\) numbered lines.
In [3]:
# Type your code here
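A minimal sketch, assuming a line format of "Line 1", "Line 2", and so on (the format, the function name write_numbered_lines and the file name are all hypothetical). Mode 'w' creates the file or overwrites an existing one.

```python
import os
import tempfile


def write_numbered_lines(path, n):
    """Create (or overwrite) `path` with the lines 'Line 1' ... 'Line n'."""
    with open(path, "w") as f:
        for i in range(1, n + 1):
            f.write(f"Line {i}\n")


path = os.path.join(tempfile.mkdtemp(), "numbered.txt")
write_numbered_lines(path, 3)
with open(path) as f:
    contents = f.read()
print(contents, end="")  # Line 1, Line 2, Line 3 on separate lines
```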
Write a function which appends \(m\) extra lines to that file.
In [4]:
# Type your code here
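A sketch that continues the numbering: it counts the lines already in the file, then opens it in 'a' mode so the new lines go to the end. It assumes the same hypothetical "Line i" format and that the existing file ends with a newline; the setup below builds a small starting file so the block runs on its own.

```python
import os
import tempfile


def append_numbered_lines(path, m):
    """Append m lines to `path`, continuing the numbering of the existing lines."""
    with open(path) as f:
        start = sum(1 for _ in f)  # number of lines already in the file
    with open(path, "a") as f:     # 'a' mode: every write goes to the end
        for i in range(start + 1, start + m + 1):
            f.write(f"Line {i}\n")


# Usage: start from a hypothetical file that already holds two numbered lines
path = os.path.join(tempfile.mkdtemp(), "numbered.txt")
with open(path, "w") as f:
    f.write("Line 1\nLine 2\n")
append_numbered_lines(path, 2)
with open(path) as f:
    contents = f.read()
print(contents, end="")  # Line 1 through Line 4 on separate lines
```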