Apache
We talked about it in CSCI110. Now it's time for you to configure an Apache web server.
You won't configure the entire Apache in the labs; there you will simply configure your own public_html directory.
Apache
• What did you learn about it back in CSCI110?
  – That it is the most common web server
  – That it has a highly modular structure
    • By selecting modules, you can build an Apache with different capabilities
  – That its logs tell you about traffic on your site and that log analysis can warn you of hacker attacks
  – That its run-time behaviour is controlled by a configuration file
  – And that it comes with lots of detailed documentation!
You have forgotten it all? Sigh! Go find the notes in www.uow.edu.au/~nabg/110
Web server usage from Netcraft
[Figure: Netcraft chart of web server usage]
What’s changing?
Nothing very obvious; nginx is growing a little.
Something odd happening
Something odd in 2013: Apache down, Microsoft up?
Modularity
• Unfortunately, the way that this is achieved does vary quite a bit between different versions of Apache, and between the different implementations for Windows and Linux etc.
  – And Debian/Ubuntu Linux systems add their own twist
• But basically
  – There is a core web server
    • Handles delivery of static files
  – With modules, almost always included, that implement HTTP features like content MIME type negotiation and authorization
  – And additional modules, selectively included, that support specialized features like WebDAV
    • Web-based distributed authoring and versioning adds document editing and management functionality – it's used to support web
Modularity
• Traditional
  – Apache has a configure script (./configure); you ran this with command-line arguments naming the modules that you wanted enabled (or disabled if they were enabled by default)
  – The configure script created a makefile
  – You ran make, and it built a statically linked Apache with the modules that you specified
  – If you wanted to change modules, you had to reconfigure, recompile and relink
• Modern Apaches
  – You build your Apache with "shared" objects for the different modules
  – You edit the configuration file to specify the modules that you want included, leaving other modules around in case you want to add them later
    • Just a matter of changing the config file and restarting Apache
  – The code gets linked in dynamically when needed at run-time
    • Windows versions of Apache always did this
Windows (xampp)
• C:\xampp\apache\conf\httpd.conf is the Apache configuration file
• Near the head, there is a section where modules are listed in "LoadModule" directives; just comment/un-comment these as needed to get your desired Apache
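To see which modules a given httpd.conf would load, the LoadModule lines can be scanned with a few lines of script. The following is only a sketch: the sample configuration text is made up for illustration, and the function name `list_modules` is hypothetical, not part of any Apache tool.

```python
import re

# Matches "LoadModule <name> <path>", optionally commented out with '#'.
LOADMODULE = re.compile(r"^\s*(#?)\s*LoadModule\s+(\S+)\s+(\S+)")

def list_modules(conf_text):
    """Return (enabled, disabled) lists of module names found in httpd.conf text."""
    enabled, disabled = [], []
    for line in conf_text.splitlines():
        m = LOADMODULE.match(line)
        if not m:
            continue  # ordinary comment or some other directive
        (disabled if m.group(1) else enabled).append(m.group(2))
    return enabled, disabled

# Made-up extract in the style of an xampp httpd.conf (illustrative only).
SAMPLE = """\
LoadModule dir_module modules/mod_dir.so
#LoadModule dav_module modules/mod_dav.so
# An ordinary comment
LoadModule rewrite_module modules/mod_rewrite.so
"""

enabled, disabled = list_modules(SAMPLE)
print("enabled:", enabled)
print("disabled:", disabled)
```

Un-commenting a line moves its module from the second list to the first; Apache still has to be restarted before the change takes effect.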
Debian/Ubuntu
• Apache configured and built (make run) with all but core modules specified as shared
  – Two directories created for modules: mods-available and mods-enabled
    • mods-available contains .load files (each naming a module's "shared object" .so file with the code) and .conf files with configuration data for those modules that require such data
    • mods-enabled has links to files in mods-available (or, sometimes, copies); these are the modules in your actually running Apache
  – Helper programs "a2enmod" and "a2dismod" should keep these directories set up correctly
  – But you can just create links at shell level
    • If you need to change .conf files, I suggest that you create a copy in mods-enabled and just edit the copy (that way, the standard configurations are still recorded in the mods-available files)
http://www.control-escape.com/web/configuring-apache2-debian.html
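The mods-available / mods-enabled arrangement can be mimicked in a scratch directory to see how "enabled" is just a matter of which links exist. This is a sketch under assumptions: it builds a throwaway directory tree rather than touching /etc/apache2, the file contents are placeholders, and `enabled_modules` is a hypothetical helper name. It needs a system where ordinary users can create symbolic links (e.g. Linux).

```python
import os
import tempfile

def enabled_modules(mods_enabled_dir):
    """Return sorted module names that have a .load entry in the directory."""
    names = []
    for entry in os.listdir(mods_enabled_dir):
        base, ext = os.path.splitext(entry)
        if ext == ".load":
            names.append(base)
    return sorted(names)

def demo():
    # Build a throwaway layout that mimics mods-available / mods-enabled.
    with tempfile.TemporaryDirectory() as root:
        available = os.path.join(root, "mods-available")
        enabled = os.path.join(root, "mods-enabled")
        os.mkdir(available)
        os.mkdir(enabled)
        for name in ("userdir.load", "userdir.conf", "rewrite.load", "dav.load"):
            with open(os.path.join(available, name), "w") as f:
                f.write("# placeholder\n")
        # "Enable" userdir and rewrite by linking; a2enmod also works by creating links.
        for name in ("userdir.load", "userdir.conf", "rewrite.load"):
            os.symlink(os.path.join(available, name), os.path.join(enabled, name))
        return enabled_modules(enabled)

print(demo())
```

Here dav stays in mods-available only, so it is not reported as enabled.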
Ubuntu - /etc/apache2
• httpd.conf – no content, required placeholder file
• apache2.conf – root configuration file
  – Some of the settings
  – And "Include /etc/apache2/mods-enabled/*.load" etc
Apache2
ServerRoot "/etc/apache2"
Timeout 300
KeepAlive On
MaxKeepAliveRequests 100
KeepAliveTimeout 15
<IfModule mpm_prefork_module>
    StartServers 5
    MinSpareServers 5
    MaxSpareServers 10
    MaxClients 150
    MaxRequestsPerChild 0
</IfModule>
User ${APACHE_RUN_USER}
…
AccessFileName .htaccess
DefaultType text/plain
HostnameLookups Off
ErrorLog /var/log/apache2/error.log
…
Performance tuning? You may need to adjust values of parameters like MaxClients
Enabled mods
Apaches in lab will have userdir in mods-enabled, which allows you to use a public_html directory in your home directory.
What modules are available?
• Core features
  – core
    • The basic web server; also deals with configuration aspects like location of htdocs, keep alive, etc.
  – mpm_common and variants like mpm_winnt
    • Code for the multi-processing parts (threaded parts)
  – prefork
    • Code for the old style non-threaded server
Details at http://httpd.apache.org/docs/current/mod/
What modules are available?
• Some of the others!
  – actions
    • Allows you to specify that requests for particular file types should trigger execution of a specified CGI program
      – So could specify that any request for an image/gif file would result in Apache running a specified CGI program (which maybe gets the image from a database, or generates it) rather than looking for an actual image file
  – cgi
    • Adds in the standard CGI mechanism
      – Files in the "ScriptAlias" directory (usually cgi-bin) and those whose types have been defined in an AddHandler directive in the main configuration file will be executed as programs rather than returned as files
  – (cgid)
    • Specialization needed for some Unix operating systems running a threaded Apache
There are more than 80 modules; some are a little exotic
More modules
– dav (also dav_fs, dav_lock)
  • Support for WebDAV functionality
– cache (along with disk_cache and mem_cache)
  • Supports caching of local (or proxied) content by URI
    – Can have schemes where the first request for some expensive-to-generate dynamic page results in it being generated and given a time to live; this version is saved in a disk file or in memory. Subsequent requests for the same content are delivered from the cache until the time to live has expired.
      » Cautions – hacker exploits
– suexec
  • Allows "set user" scripts that run with userid other than that of web-server
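The time-to-live caching scheme described for the cache module can be illustrated in miniature. This is a sketch of the idea only, not how Apache's cache modules are implemented; the class `TTLCache`, the `expensive_page` function and the 60-second lifetime are all assumptions made for the example.

```python
import time

class TTLCache:
    """Cache generated content by URI, each entry valid for ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable so the demo can fake time
        self.store = {}             # uri -> (expiry_time, content)

    def get(self, uri, generate):
        now = self.clock()
        hit = self.store.get(uri)
        if hit is not None and hit[0] > now:
            return hit[1]           # still fresh: serve from cache
        content = generate(uri)     # missing or expired: regenerate
        self.store[uri] = (now + self.ttl, content)
        return content

calls = []

def expensive_page(uri):
    # Stand-in for a costly dynamic page; records each real generation.
    calls.append(uri)
    return "<html>page for %s, build %d</html>" % (uri, len(calls))

fake_now = [0.0]
cache = TTLCache(ttl_seconds=60, clock=lambda: fake_now[0])
first = cache.get("/report", expensive_page)
second = cache.get("/report", expensive_page)   # served from the cache
fake_now[0] = 61.0                              # time to live has expired
third = cache.get("/report", expensive_page)    # generated again
print(len(calls), "real generations for 3 requests")
```

The caution about exploits applies to this kind of scheme too: anything that lets an outsider influence what gets stored under a URI will then be served to everyone until the entry expires.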
More modules
– so
  • Module needed if this Apache will use dynamic linking
– ssl
  • Encryption etc. for use with SSL and TLS (https)
– uniqueid
  • "Generate a unique identifier for each request"
    – Maybe, for Unix clusters etc. Still listed as a module but description seems to relate to single threaded Apaches.
– rewrite
  • Using regex, rewrite request URLs on the fly.
    – One place you will encounter this is with Zend Framework, where all requests have to be routed through the code of the framework engine.
– ident
  • Call back to client running identd process; supposedly returns identifier of user who submitted request. (Most clients don't run identd; those that do could always lie)
More modules
– mime
  • Associating filename extensions with handlers and filters
– mime_magic
  • Guess file type based on first few bytes
– negotiation
  • Content negotiation
– deflate
  • Compress content before return to client
– auth_basic, auth_digest, authn_dbd, authn_dbm, authn_file
Non Apache modules
– php, perl, …
  • Third party modules can be added
  • mod_php and mod_perl both incorporate interpreter code into the Apache so that scripts in these languages can be handled directly rather than involving the more costly fork-process, exec-interpreter mechanism
    – Set up can be a little involved, which is why pre-built software stacks like lamp and xampp are popular
Modules
• Include your modules
• Configure your modules
  – Some require their own "environment variables" with data such as location of required directories
Editing a .conf file, e.g. userdir
• userdir.load
  – Apache code to make use of individual public_html directories (e.g. ~nabg/Hello.html)
• userdir.conf
  – Where are the user directories?
  – What changes can be made in .htaccess files?
• Now default userdir.conf is probably "wrong"
  – It doesn't match local directory hierarchy
  – It doesn't allow enough overrides in .htaccess file
userdir.conf
Running Apache
• Windows/xampp
  – xampp-control.exe
    • Allows you to manually control your Apache or have it as a Windows service
      – If a Windows service, it will start automatically with Windows
    • Runs at port 80 (can change in config file)
• Linux
  – Can launch manually; have to edit /etc/apache2/ports.conf to specify a port like 8080
  – Can be configured as a Linux daemon process
    • Will need to read Linux documentation to find out how these are configured
Your Apache is running
• Logs – by default in /var/log/apache2 (or C:\xampp\apache\logs)
• Access log and error log
  – Apache automatically handles rollover of log files; older files get compressed; current access.log and error.log
• Permissions in labs will prevent you from viewing log files, but there is a utility command to show the tail of error log file (apacheerrortail)
Logs
• Illustrated in CSCI110
• Access – all requests
  – Who (IP), when, what, result (success/fail), size, browser, referrer, …
127.0.0.1 - - [18/Nov/2011:16:25:07 +1100] "GET /C2012GeoLoc/geo1.html HTTP/1.1" 200 2571 "-" "Mozilla/5.0 (Windows NT 5.1; rv:8.0) Gecko/20100101 Firefox/8.0"
127.0.0.1 - - [18/Nov/2011:16:26:25 +1100] "GET /C2012GeoLoc/geo1.html HTTP/1.1" 304 - "-" "Mozilla/5.0 (Windows NT 5.1; rv:8.0) Gecko/20100101 Firefox/8.0"
• Error
Nov 09 15:01:06 2011] [error] [client 127.0.0.1] File does not exist: C:/xampp/htdocs/js, referer: http://localhost/C2012DnD/remy0.html
Logs
• Analyse your logs
  – Requests for non-existent files, e.g. phpMyAdmin/*, robots.txt, …
  – Repeated failed login attempts to controlled access sub-directories
  – Inappropriate referrer
  – …
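A first pass at this kind of log analysis is easy to script. The sketch below parses access log lines in the layout shown above and counts "not found" (404) responses per client; the regular expression is an assumption that fits that layout, and the three probe lines from 203.0.113.9 (a documentation-only address) are made up for illustration.

```python
import re
from collections import Counter

# One access log line: ip ident user [when] "request" status size "referrer" "agent"
LOG_LINE = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<when>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def not_found_by_client(lines):
    """Count 404 responses per client IP; many misses from one IP suggest probing."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m.group("status") == "404":
            counts[m.group("ip")] += 1
    return counts

SAMPLE = [
    # Two genuine-looking entries in the style shown in the slides.
    '127.0.0.1 - - [18/Nov/2011:16:25:07 +1100] "GET /C2012GeoLoc/geo1.html HTTP/1.1" 200 2571 "-" "Mozilla/5.0 (Windows NT 5.1; rv:8.0) Gecko/20100101 Firefox/8.0"',
    '127.0.0.1 - - [18/Nov/2011:16:26:25 +1100] "GET /C2012GeoLoc/geo1.html HTTP/1.1" 304 - "-" "Mozilla/5.0 (Windows NT 5.1; rv:8.0) Gecko/20100101 Firefox/8.0"',
    # Invented probe requests (hypothetical data, not from a real log).
    '203.0.113.9 - - [18/Nov/2011:17:00:01 +1100] "GET /phpMyAdmin/index.php HTTP/1.1" 404 209 "-" "-"',
    '203.0.113.9 - - [18/Nov/2011:17:00:02 +1100] "GET /pma/index.php HTTP/1.1" 404 209 "-" "-"',
    '203.0.113.9 - - [18/Nov/2011:17:00:03 +1100] "GET /mysql/index.php HTTP/1.1" 404 209 "-" "-"',
]

for ip, misses in not_found_by_client(SAMPLE).most_common():
    print(ip, misses)
```

The same pattern extends to the other checks in the list, for example counting 401 responses per client to spot repeated failed logins.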
Example
• Minor cs.uow server used for 321 projects
[Figure: extract from the access log]
These 2 are legit; the rest are from hackers
Same sequence of requests submitted on same day from 195.29.116.194, 93.153.215.26, 212.88.119.242, 211.100.49.194, 61.233.9.58, 61.222.216.248, 164.77.88.42, 118.123.96.89, 211.143.113.2, 116.242.178.84, 89.208.153.2
This hacking script must be popular with the script-kiddy types
Who done it?
• host a.b.c.d
  – E.g.
$ host 195.29.116.194
194.116.29.195.in-addr.arpa domain name pointer ZET-ZH-ZGOZ-P.net.t-com.hr.
Host 26.215.153.93.in-addr.arpa. not found: 3(NXDOMAIN)
210.209.190.79.in-addr.arpa domain name pointer isc210.internetdsl.tpnet.pl.
242.119.88.212.in-addr.arpa domain name pointer mail.aclaimafrica.com.
248.216.222.61.in-addr.arpa domain name pointer 61-222-216-248.HINET-IP.hinet.net.
Host 42.88.77.164.in-addr.arpa. not found: 3(NXDOMAIN)
• Sometimes worth adding "deny from a.b.c.d" to your httpd.conf files!
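The in-addr.arpa names in the host output are simply the client's IP address with its four parts reversed. A minimal sketch using Python's standard ipaddress module shows the mapping; it only builds the name that would be looked up and does not contact any DNS server (the helper name `reverse_name` is just for this example).

```python
import ipaddress

def reverse_name(ip):
    """Return the DNS name that a reverse lookup of this IP address queries."""
    return ipaddress.ip_address(ip).reverse_pointer

# Two of the addresses from the log extract above.
for ip in ("195.29.116.194", "93.153.215.26"):
    print(ip, "->", reverse_name(ip))
```

Whether a lookup of that name succeeds depends entirely on the PTR records that the owner of the address block has chosen to publish.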
Why did many reverse DNS lookups fail?
• Possibly deliberate
  – The DNS reverse lookup relies on data registered in the DNS configuration files for the domain containing the machine
    • You don't want to be easily recognised? You simply don't include all the data that you should have put in the DNS configuration file for your domain!
• But can you check further?
  – Yes – by using an IP location service such as that at geobytes.com
    • They have tables mapping IP addresses to locations
Who done it?
[Figure: IP location lookups for the addresses, with comments]
– Oh, it was a Chinese hacker, possibly somewhere in Beijing; whoever would have thought of that
– This b****** was in Chile, probably Santiago
– Another Beijing boy
– Muscovite
– Beijing must be a big city filled with computer geeks
– Taiwan
– Uganda
Lab exercises
• Lab machines have Apache configured and running at port 80.
• You control your public_html directory
  – You can place a .htaccess file there
    • Use it to do things like have subdirectories with restricted access or support multiple language types
Limited Exercises
• Multiple language versions of a web site
  – Need negotiation and mime modules (should be in lab setup)
  – Works with language preferences set in browser
    • Browser might send request like Accept-Language: fr; q=1.0, en; q=0.5
      ("I want French but if you don't have it I will abase myself and read English")
  – Needs to handle cases where language preference cannot be satisfied
• Sub-directory with restricted access
  – Just "basic authentication" in the lab
Language preferences
• Create multiple versions of those static pages that are to be delivered in chosen language
  – Welcome.html.en, Welcome.html.fr, Welcome.html.de
  – Pages should (obviously) have content in appropriate language
• Will usually need an extra directive in the <head> section of the page that specifies the character set:
  – <meta http-equiv="Content-type" content="text/html; charset=utf-8">
  – (Charset value obviously needs to be appropriate to the page; utf-8 should suffice for most European languages)
Best matching language
• Language preference system will try to find a page
  – You have pages in English, French, German
  – Your Bulgarian visitor will end up with the German page
No match
• If the user's language requirements cannot be met, Apache will return an error response (406 Not Acceptable) that might puzzle the browser
Providing a default
• Rather than returning an error, it is better to return a page in some default language
  – Old style (doesn't appear to work anymore)
    • You had a page Welcome.html.html; this would get returned if language preferences couldn't be satisfied.
  – New style
    • Specify what is to happen by directives in the .htaccess file
      – LanguagePriority and ForceLanguagePriority directives
      – Example
LanguagePriority en fr de
ForceLanguagePriority Fallback
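The effect of combining the browser's Accept-Language header with a LanguagePriority fallback can be sketched as follows. This is a deliberately simplified model, not Apache's actual negotiation algorithm: it ignores wildcards and regional variants such as en-GB, and the function names are made up for the example.

```python
def parse_accept_language(header):
    """Return [(language, q), ...] ordered from most to least preferred."""
    prefs = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        lang, _, params = item.partition(";")
        q = 1.0                      # q defaults to 1.0 when omitted
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0              # unreadable q value: treat as unwanted
        prefs.append((lang.strip().lower(), q))
    # sorted() is stable, so equal q values keep their order in the header
    return sorted(prefs, key=lambda p: -p[1])

def choose_variant(header, available, language_priority):
    """Pick a language for the response, falling back to the priority list."""
    for lang, q in parse_accept_language(header):
        if q > 0 and lang in available:
            return lang
    for lang in language_priority:   # behaviour modelled on "Fallback"
        if lang in available:
            return lang
    return None

available = {"en", "fr", "de"}       # Welcome.html.en, .fr and .de exist
priority = ["en", "fr", "de"]        # LanguagePriority en fr de
print(choose_variant("fr; q=1.0, en; q=0.5", available, priority))
print(choose_variant("bg", available, priority))
```

With these settings a browser asking only for Bulgarian is given the first available language in the priority list instead of an error.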
Another way …
• The standard Apache ways of negotiating languages are described at http://httpd.apache.org/docs/2.2/content-negotiation.html
• But can instead use Apache rewrite rules
  – Test the language preference header
  – Rewrite the URL as appropriate
    • Typically would have different language sub-directories; rewrite rule would identify the browser's language preference and insert the appropriate sub-directory name into the pathname for file
  – Advice
    • http://www.askapache.com/htaccess/modrewrite-tips-tricks.html
    • http://answers.oreilly.com/topic/2680-apache-redirect-to-language-path-by-browser-language-setting/
Controlled Access
• Very common to have base directory "Allow from all" but sub-directories to which you wish to restrict access
• Restriction by IP?
  – Useful for
    • Limiting access to machines within a specified domain (identified by the IP address)
    • Blocking access from known hacker locations
• Name/password controls more generally useful
Reference http://httpd.apache.org/docs/2.0/howto/auth.html
Limits on domain or ip
• Edit the .htaccess file in the sub-directory that you wish to control; e.g. add "deny" lines
  – (Or edit the <Directory> segment in a httpd.conf file)
  – Example
Deny from 212.88.119.242 211.100.49.194 61.233.9.58 61.222.216.248
  – Can use domain name, but a bit more costly at run-time
Deny from uow.edu.au
  – (cost is that Apache must do a reverse DNS lookup on client's IP to check the domain)
Limits on domain or ip
• If you want to limit access to a particular domain, edit the .htaccess file so that it has both "deny" and "allow" directives:
Order deny,allow
Deny from all
Allow from uow.edu.au
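How "Order deny,allow" combines the two lists can be modelled in a few lines: a request is refused only if some Deny entry matches and no Allow entry does. The sketch below is an illustration of that rule, not Apache code; the client's host name is passed in as a parameter (Apache would obtain it by reverse DNS), and the sample clients are invented.

```python
import ipaddress

def matches(spec, client_ip, client_host):
    """True if one 'from' argument (all, IP, network or domain) matches the client."""
    if spec.lower() == "all":
        return True
    try:
        network = ipaddress.ip_network(spec, strict=False)
    except ValueError:
        # Not an address or network, so treat it as a domain name.
        host = client_host.lower()
        domain = spec.lower()
        return host == domain or host.endswith("." + domain)
    return ipaddress.ip_address(client_ip) in network

def allowed_deny_allow(deny, allow, client_ip, client_host=""):
    """Order deny,allow: refuse only if a Deny matches and no Allow matches."""
    denied = any(matches(s, client_ip, client_host) for s in deny)
    permitted = any(matches(s, client_ip, client_host) for s in allow)
    return permitted or not denied

# "Deny from all" plus "Allow from uow.edu.au"
print(allowed_deny_allow(["all"], ["uow.edu.au"], "130.130.64.204", "pc1.uow.edu.au"))
print(allowed_deny_allow(["all"], ["uow.edu.au"], "203.0.113.9", "host.example.net"))
# A plain blacklist: everyone except the listed address gets in
print(allowed_deny_allow(["212.88.119.242"], [], "212.88.119.242"))
```

The host name pc1.uow.edu.au and the address 203.0.113.9 are hypothetical values chosen for the demonstration.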
Password access
• Need to create collections of user-names and passwords, and optionally "groups"
• Apache allows various options for storing names and passwords – text file, dbm file, sql database
  – Lab setup will only support the text file option; this suffices if you have fewer than ~100 users
    • If setting up your own Apache then try configuring with the dbm options and using that to store names and passwords
• Apache provides helper programs – htpasswd and dbmmanage
Create password file
• Suggestions
  – Put these files in a directory quite separate from the htdocs (or public_html) directory (but it has to be accessible to the Apache process "www-data" user)
  – Use names like .htpasses, .htgroups – the Files directive in the Apache configuration file should help stop such files being accessed by hackers
• Commands
htpasswd -c /home/e/ead432/apacheother/.htpasses tom
New password: cleartext
Re-type new password: cleartext
Adding new user tom
Groups
• Simple text file, one or more groups defined
  – Format:
Groupname: user1 user2 user3
  – Just create in text editor and save (e.g. .htgroups)
• Example
Friends: tom dick harry sue
Directory hierarchy
[Figure: htdocs (or your public_html) containing the sub-directory forfriendsonly with its .htaccess file; the password file and groups file are kept elsewhere]
– Password file created using an Apache utility
– Groups file created using a simple text editor
– These files are in a different part of your file space, with controls set to prevent web access!
Editing the .htaccess file
• The .htaccess file will require directives specifying things like the location of the password file:
AuthType Basic
AuthName "For my friends"
AuthUserFile /home/e/ead432/apacheother/.htpasses
AuthGroupFile /home/e/ead432/apacheother/.htgroups
Require group Friends
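What "Require group Friends" means for an already authenticated user can be shown by reading the groups file format directly. This sketch only mirrors the membership test; the function names are hypothetical, the sample text repeats the Friends example, and password checking is deliberately left out.

```python
def parse_groups(text):
    """Parse 'Groupname: user1 user2 ...' lines into {group: set_of_users}."""
    groups = {}
    for line in text.splitlines():
        name, sep, members = line.strip().partition(":")
        if sep and name.strip():
            groups[name.strip()] = set(members.split())
    return groups

def in_group(groups, group, user):
    """True if the (already authenticated) user belongs to the required group."""
    return user in groups.get(group, set())

# Same content as the example .htgroups file.
GROUPS_TEXT = "Friends: tom dick harry sue\n"

groups = parse_groups(GROUPS_TEXT)
print(in_group(groups, "Friends", "harry"))    # a listed member
print(in_group(groups, "Friends", "mallory"))  # hypothetical outsider
```

A user who supplies a valid name and password but is not listed in the required group is still refused.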
Access control
• On first access to the controlled directory, Apache will return an "authentication required" response (status 401) and keep the connection open.
Access control
• Browser will display a default prompt for the user name and password
Basic access
• Password sent "base 64" encoded – simply a fixed character substitution scheme (easily reversed, so it is not encryption)
• Apache and http support the more sophisticated "Digest" mechanism, but not all browsers implement digest (enough have it, so you could probably try using Digest)
  – Digest – an MD5 hash of the "realm", "username" and "password"
Alternatively, set things up so that https is used and all data are encrypted; this can be achieved in Apache with a "rewrite rule" that changes the URLs; one suggestion
Browser fakes logged in status
• Every subsequent request sent by browser to same "realm" (i.e. sub-directory on web-site) will contain the name/password data in the HTTP headers
  – Every subsequent access does involve Apache checking the authentication data (open .htpasses file and read it etc); this may slow operations
• Action is invisible to user who simply experiences a "logged in" state where they can read any pages in the controlled sub-directory.
• Logs show data on user access
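The name/password data that the browser resends with each request under Basic authentication is an Authorization header holding the base64 encoding of "name:password". The sketch below builds and decodes such a header with Python's standard base64 module; the user name and password are invented, and the helper names are just for this example.

```python
import base64

def basic_header(username, password):
    """Build the header a browser sends for HTTP Basic authentication."""
    token = base64.b64encode(("%s:%s" % (username, password)).encode("utf-8"))
    return "Authorization: Basic " + token.decode("ascii")

def decode_basic(header_line):
    """Recover (username, password) from such a header; no secret is needed."""
    _, _, value = header_line.partition(": ")
    scheme, _, token = value.partition(" ")
    if scheme != "Basic":
        raise ValueError("not a Basic authorization header")
    user, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return user, password

header = basic_header("harry", "secret")   # hypothetical credentials
print(header)
print(decode_basic(header))
```

Because anyone who sees the header can decode it this way, Basic authentication only makes sense over https or on a network you trust.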
Log while using authentication
• 130.130.64.204 - - [03/Jan/2012:13:40:36 +1100] "GET /~nabg/ForMyFriends/Friend.html HTTP/1.1" 401 668 "-" "Mozilla/5.0 (Windows NT 5.1; rv:8.0) Gecko/20100101 Firefox/8.0"
• 130.130.64.204 - harry [03/Jan/2012:13:42:06 +1100] "GET /~nabg/ForMyFriends/Friend.html HTTP/1.1" 304 211 "-" "Mozilla/5.0 (Windows NT 5.1; rv:8.0) Gecko/20100101 Firefox/8.0"
• 130.130.64.204 - harry [03/Jan/2012:13:52:18 +1100] "GET /~nabg/ForMyFriends/Friend2.html HTTP/1.1" 200 457 "-" "Mozilla/5.0 (Windows NT 5.1; rv:8.0) Gecko/20100101 Firefox/8.0"
• 130.130.64.204 - harry [03/Jan/2012:13:52:24 +1100] "GET /~nabg/ForMyFriends/Friend3.html HTTP/1.1" 200 458 "-" "Mozilla/5.0 (Windows NT 5.1; rv:8.0) Gecko/20100101 Firefox/8.0"
Access control
• Access control
  – It's defined in HTTP – all those "authentication" responses etc
  – Apache implements it.
    • Handled through special programs used by sys-admin
• But it is also done in application code
  – Your own database access code
  – Probably a web component (with some restrictions on use) for administering users and their passwords
  – Possibly utilising the built-in support provided by HTTP, but more likely login state handled through cookies and session state variables
When is it worth using the HTTP/Apache version?
• Uhm?
  – Really not that convenient
  – Most likely to be used when you have a sub-directory with lots of static content where access should be restricted
    • Think of something like teaching materials for a 100-level subject at a small University; access to be limited to enrolled students
  – But, but, but …
    • In modern usage, such content would be better handled using a content management system such as Drupal (where you will have a convenient administrator interface for updating content and for administering users)