dc.description.abstract | The Linux operating system provides the infrastructure that powers the vast majority of the
Internet. This course introduces you to Linux, with a focus on the operating system features
that are accessible from the command line and programming or scripting languages such as
Python and AWK. Through significant programming projects, you will learn how to develop and
manage software within the Linux environment, with emphasis on applications that manipulate
real-life datasets and interface with data obtained live from the Internet. Prerequisite: COMP
1300C. This course assumes that you already know Python 3.
We will cover the following topics in this course: the history of Linux and other “free software”
and open source systems, the Linux command line (aka “the Shell”) and built-in features,
permissions and processes, regular expressions, the “gawk” variant of the AWK programming
language, version control and automated application building, HTML, and advanced Python features
(such as generators) and libraries. We will also cover Hadoop and Spark, which are systems
used by Google, Yahoo, Yelp, and many other companies to do massively scalable computations
on clusters of Linux computers.
A significant part of this course involves actually writing AWK scripts and Python programs that
address the above topics – in labs, for homework, on exams, and for a semester project. We
will use publicly available databases as interesting and realistic examples similar to data that
might be encountered in business settings. | en_US |