In simple terms, sockets are a programmable interface to connections between programs, possibly running on different computers of a network. They allow data formatted as byte strings to be passed between processes and machines. Sockets also form the basis and low-level “plumbing” of the Internet itself: all of the familiar higher-level Net protocols, like FTP, web pages, and email, ultimately occur over sockets. Sockets are also sometimes called communications endpoints because they are the portals through which programs send and receive bytes during a conversation.
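The English word socket originally means a “hole” or an “electrical outlet”; in Chinese it is usually rendered as “套接字”. A socket describes an IP address and a port, and serves as the handle for one end of a communication link. A host on the Internet generally runs several server programs at once, providing several services at the same time. Each service opens a socket and binds it to a port, and different ports correspond to different services. True to its name, a socket works like a multi-outlet panel: a host is like a room full of outlets, each with its own number. Some outlets supply 220-volt alternating current, some supply 110-volt alternating current, and some supply cable television. Client software plugs into the outlet with the right number to get the service it wants.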
Although often used for network conversations, sockets may also be used as a communication mechanism between programs running on the same computer, taking the form of a general Inter-Process Communication (IPC) mechanism. We saw this socket usage mode briefly in Chapter 5. Unlike some IPC devices, sockets are bidirectional data streams: programs may both send and receive data through them.
In short: sockets also work for IPC, and data flows in both directions.
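The bidirectional IPC point can be tried without any network at all. The following is a minimal sketch, assuming Python 3 and the standard library's socket.socketpair call, which returns two already-connected socket objects; the function name ping_pong is made up for this illustration.

```python
import socket

def ping_pong():
    """Send one message in each direction over a connected socket pair."""
    left, right = socket.socketpair()    # two already-connected endpoints
    try:
        left.sendall(b'ping')            # left -> right
        request = right.recv(1024)
        right.sendall(b'pong')           # right -> left, over the same pair
        reply = left.recv(1024)
    finally:
        left.close()
        right.close()
    return request, reply

print(ping_pong())
```

Both ends send and receive on the same objects, which is the difference from one-way devices such as anonymous pipes.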
To programmers, sockets take the form of a handful of calls available in a library. These socket calls know how to send bytes between machines, using lower-level operations such as the TCP network transmission control protocol. At the bottom, TCP knows how to transfer bytes, but it doesn’t care what those bytes mean. For the purposes of this text, we will generally ignore how bytes sent to sockets are physically transferred. To understand sockets fully, though, we need to know a bit about how computers are named.
Python provides support for standard protocols, which automates most of the socket and message formatting details. Standard Internet protocols define a structured way to talk over sockets. They generally standardize both message formats and socket port numbers:
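Message formats provide structure for the bytes exchanged over sockets during conversations.
Port numbers are reserved numeric identifiers for the underlying sockets over which messages are exchanged.
To make it easier for programs to locate the standard protocols, port numbers in the range of 0 to 1023 are reserved and preassigned to the standard higher-level protocols.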
Table 12-1. Port Numbers Reserved for Common Protocols

| Protocol | Common Function | Port Number | Python Module |
|---|---|---|---|
| HTTP | Web pages | 80 | http.client, http.server |
| NNTP | Usenet news | 119 | nntplib |
| FTP data default | File transfers | 20 | ftplib |
| FTP control | File transfers | 21 | ftplib |
| SMTP | Sending email | 25 | smtplib |
| POP3 | Fetching email | 110 | poplib |
| IMAP4 | Fetching email | 143 | imaplib |
| Finger | Informational | 79 | n/a |
| Telnet | Command lines | 23 | telnetlib |
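As a small illustration of the reserved range, the sketch below copies the port numbers from Table 12-1 into a dictionary and checks them against the 0 to 1023 range. The names RESERVED_PORTS, STANDARD_PORTS, and is_reserved are hypothetical helpers for this note, not part of any library.

```python
RESERVED_PORTS = range(0, 1024)        # 0..1023 inclusive

STANDARD_PORTS = {                     # protocol -> port, copied from Table 12-1
    'HTTP': 80, 'NNTP': 119, 'FTP data': 20, 'FTP control': 21,
    'SMTP': 25, 'POP3': 110, 'IMAP4': 143, 'Finger': 79, 'Telnet': 23,
}

def is_reserved(port):
    """True if port falls in the range set aside for standard protocols."""
    return port in RESERVED_PORTS

all_reserved = all(is_reserved(port) for port in STANDARD_PORTS.values())
print(all_reserved, is_reserved(50007))
```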
On one side of a conversation, machines that support standard protocols perpetually run a set of programs that listen for connection requests on the reserved ports. On the other end of a dialog, other machines contact those programs to use the services they export. We usually call the perpetually running listener program a server and the connecting program a client. Let’s use the familiar web browsing model as an example. As shown in Table 12-1, the HTTP protocol used by the Web allows clients and servers to talk over sockets on port 80:
Server
A web server program listens for incoming connection requests, on a socket bound to port 80. Often, the server itself does nothing but watch for requests on its port perpetually; handling requests is delegated to spawned processes or threads.
Clients
Client programs specify the server machine’s name and port 80 to initiate a connection. For web servers, typical clients are web browsers like Firefox, Internet Explorer, or Chrome, but any script can open a client-side connection on port 80 to fetch web pages from the server. The server’s machine name can also be simply “localhost” if it’s the same as the client’s.
The structure of those message bytes varies from protocol to protocol, and is hidden by the Python library. For example, the FTP protocol prevents deadlock by conversing over two sockets: one for control messages only and one to transfer file data. An FTP server listens for control messages (e.g., “send me a file”) on one port, and transfers file data over another. FTP clients open socket connections to the server machine’s control port, send requests, and send or receive file data over a socket connected to a data port on the server machine. FTP also defines standard message structures passed between client and server. The control message used to request a file, for instance, must follow a standard format.
In fact, each supported protocol is represented in Python’s standard library by either a module package of the same name as the protocol or a module file with a name of the form xxxlib.py.
Although sockets themselves transfer only byte strings, we can also transfer Python objects through them by using Python’s pickle module. Because this module converts Python objects such as lists, dictionaries, and class instances to and from byte strings, it provides the extra step needed to ship higher-level objects through sockets when required.
Beyond basic data communication tasks, the socket module also includes a variety of more advanced tools. For instance, it has calls for the following and more:
Server side: open a TCP/IP socket on a port, listen for a message from a client, and send an echo reply; this is a simple one-shot listen/reply conversation per client, but it goes into an infinite loop to listen for more clients as long as this server script runs; the client may run on a remote machine, or on same computer if it uses 'localhost' for server
from socket import *                     # get socket constructor and constants
myHost = ''                              # '' = all available interfaces on host
myPort = 50007                           # listen on a non-reserved port number

sockobj = socket(AF_INET, SOCK_STREAM)   # make a TCP socket object
sockobj.bind((myHost, myPort))           # bind it to server port number
sockobj.listen(5)                        # listen, allow 5 pending connects

while True:                                    # listen until process killed
    connection, address = sockobj.accept()     # wait for next client connect
    print('Server connected by', address)      # connection is a new socket
    while True:
        data = connection.recv(1024)           # read next line on client socket
        if not data: break                     # until eof when socket closed
        connection.send(b'Echo=>' + data)      # send a reply line to the client
    connection.close()
Client side: use sockets to send data to the server, and print server's reply to each message line; 'localhost' means that the server is running on the same machine as the client, which lets us test client and server on one machine; to test over the Internet, run a server on a remote machine, and set serverHost or argv[1] to machine's domain name or IP addr; Python sockets are a portable BSD socket interface, with object methods for the standard socket calls available in the system's C library;
import sys
from socket import *              # portable socket interface plus constants
serverHost = 'localhost'          # server name, or: 'starship.python.net'
serverPort = 50007                # non-reserved port used by the server

message = [b'Hello network world']           # default text to send to server
                                             # requires bytes: b'' or str.encode()
if len(sys.argv) > 1:
    serverHost = sys.argv[1]                 # server from cmd line arg 1
    if len(sys.argv) > 2:                    # text from cmd line args 2..n
        message = (x.encode() for x in sys.argv[2:])

sockobj = socket(AF_INET, SOCK_STREAM)       # make a TCP/IP socket object
sockobj.connect((serverHost, serverPort))    # connect to server machine + port

for line in message:
    sockobj.send(line)                       # send line to server over socket
    data = sockobj.recv(1024)                # receive line from server: up to 1k
    print('Client received:', data)          # bytes are quoted, was `x`, repr(x)

sockobj.close()                              # close socket to send eof to server
Uses the Python socket module to create a TCP socket object. The names AF_INET and SOCK_STREAM are preassigned variables defined by and imported from the socket module; using them in combination means “create a TCP/IP socket,” the standard communication device for the Internet. More specifically, AF_INET means the IP address protocol, and SOCK_STREAM means the TCP transfer protocol. The AF_INET / SOCK_STREAM combination is the default because it is so common, but it’s typical to make this explicit.
sockobj = socket(AF_INET, SOCK_STREAM)
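The claim that AF_INET plus SOCK_STREAM is the default can be checked directly. This is a minimal sketch, assuming Python 3.7 or later (where the type attribute reports the plain socket kind); the variable names are made up for the illustration.

```python
import socket

explicit = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
default = socket.socket()               # no arguments: default family and type

same = (explicit.family == default.family == socket.AF_INET and
        explicit.type == default.type == socket.SOCK_STREAM)

explicit.close()
default.close()
print(same)
```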
Associates the socket object with an address: for IP addresses, we pass a server machine name and port number on that machine. This is where the server identifies the machine and port associated with the socket. In server programs, the hostname is typically an empty string (“”), which means the machine that the script runs on (formally, all available local and remote interfaces on the machine), and the port is a number outside the range 0 to 1023 (which is reserved for standard protocols, described earlier). Note that each unique socket dialog you support must have its own port number; if you try to open a socket on a port already in use, Python will raise an exception. Also notice the nested parentheses in this call: for the AF_INET address protocol socket here, we pass the host/port socket address to bind as a two-item tuple object (pass a string for AF_UNIX).
sockobj.bind((myHost, myPort))
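The "port already in use" exception can be provoked on purpose. The sketch below is an illustration under a few assumptions: it binds to 127.0.0.1 with port 0 so the operating system picks a free port, it relies on default socket options (no address-reuse options set), and the function name second_bind_fails is hypothetical.

```python
import socket

def second_bind_fails():
    """Bind one socket, then try to bind another to the same host/port tuple."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind(('127.0.0.1', 0))     # port 0: let the OS choose a free port
        first.listen(5)
        address = first.getsockname()    # the (host, port) tuple actually bound
        try:
            second.bind(address)         # same two-item tuple again
        except OSError:                  # address already in use
            return True
        return False
    finally:
        first.close()
        second.close()

print(second_bind_fails())
```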
Starts listening for incoming client connections and allows for a backlog of up to five pending requests. The value passed sets the number of incoming client requests queued by the operating system before new requests are denied (which happens only if a server isn’t fast enough to process requests before the queues fill up). A value of 5 is usually enough for most socket-based programs; the value must be at least 1.
sockobj.listen(5)
At this point, the server is ready to accept connection requests from client programs running on remote machines (or the same machine) and falls into an infinite loop— while True (or the equivalent while 1 for older Pythons and ex-C programmers)— waiting for them to arrive:
connection, address = sockobj.accept()
Waits for the next client connection request to occur; when it does, the accept call returns a brand-new socket object over which data can be transferred from and to the connected client. Connections are accepted on sockobj , but communication with a client happens on connection , the new socket. This call actually returns a two-item tuple— address is the connecting client’s Internet address. We can call accept more than one time, to service multiple client connections; that’s why each call returns a new, distinct socket for talking to a particular client.
Once we have a client connection, we fall into another loop to receive data from the client in blocks of up to 1,024 bytes at a time, and echo each block back to the client:
data = connection.recv(1024)
Reads at most 1,024 more bytes of the next message sent from a client (i.e., coming across the network or IPC connection), and returns it to the script as a byte string. We get back an empty byte string when the client has finished—end-of-file is triggered when the client closes its end of the socket.
connection.send(b'Echo=>' + data)
Sends the latest byte string data block back to the client program, prepending the string 'Echo=>' to it first. The client program can then recv what we send here— the next reply line. Technically this call sends as much data as possible, and returns the number of bytes actually sent. To be fully robust, some programs may need to resend unsent portions or use connection.sendall to force all bytes to be sent.
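To see accept, recv, and sendall working together, here is a self-contained sketch that runs a one-client echo server in a thread and talks to it over the loopback interface. It is an illustration only: the helper names echo_once and round_trip are made up, and port 0 is used so that the operating system picks a free port instead of the 50007 used in these notes.

```python
import socket
import threading

def echo_once(listener):
    """Accept one client and echo its data until it closes its end."""
    connection, address = listener.accept()       # new socket for this client
    with connection:
        while True:
            data = connection.recv(1024)
            if not data:                          # b'' means the client closed
                break
            connection.sendall(b'Echo=>' + data)  # sendall retries partial sends

def round_trip(message):
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(('127.0.0.1', 0))               # port 0: OS picks a free port
    listener.listen(5)
    worker = threading.Thread(target=echo_once, args=(listener,))
    worker.start()
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(listener.getsockname())
    client.sendall(message)
    reply = client.recv(1024)
    client.close()                                # triggers end-of-file on server
    worker.join()
    listener.close()
    return reply

print(round_trip(b'Hello network world'))
```

For short messages on one machine a single recv is normally enough; a robust client would loop on recv, since a stream socket makes no promise about how bytes are grouped.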
Although the socket model is limited to transferring byte strings, you can send and receive nearly arbitrary Python objects with the standard library pickle object serialization module. Its dumps and loads calls convert Python objects to and from byte strings, ready for direct socket transfer:
import pickle
x = pickle.dumps([99,100]) # on sending end... convert to byte strings
x # string passed to send, returned by recv
'(lp0\nI99\naI100\na.'
pickle.loads(x) # on receiving end... convert back to object
[99, 100]
For simpler types that correspond to those in the C language, the struct module provides the byte-string conversion we need as well:
import struct
x = struct.pack('>ii', 99 ,100) # convert simpler types for transmission
x
'\x00\x00\x00c\x00\x00\x00d'
struct.unpack('>ii',x)
(99, 100)
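The two modules combine naturally: struct can frame a pickled object with a fixed-size length prefix, so the receiver knows how many bytes to read. The following is a sketch under stated assumptions, not a standard protocol: the 4-byte big-endian header and the helper names send_object, recv_exactly, and recv_object are choices made for this example. Note also that pickle.loads should only be applied to data from a trusted peer, because unpickling can run arbitrary code.

```python
import pickle
import socket
import struct

HEADER = struct.Struct('>I')     # assumed framing: 4-byte big-endian length

def send_object(sock, obj):
    payload = pickle.dumps(obj)
    sock.sendall(HEADER.pack(len(payload)) + payload)

def recv_exactly(sock, count):
    """Keep calling recv until exactly count bytes have arrived."""
    chunks = []
    while count > 0:
        chunk = sock.recv(count)
        if not chunk:
            raise EOFError('socket closed before message was complete')
        chunks.append(chunk)
        count -= len(chunk)
    return b''.join(chunks)

def recv_object(sock):
    (length,) = HEADER.unpack(recv_exactly(sock, HEADER.size))
    return pickle.loads(recv_exactly(sock, length))

left, right = socket.socketpair()            # stand-in for a real connection
send_object(left, {'numbers': [99, 100]})
received = recv_object(right)
left.close()
right.close()
print(received)
```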
sockobj.connect((serverHost, serverPort))
Opens a connection to the machine and port on which the server program is listening for client connections. This is where the client specifies the string name of the service to be contacted. In the client, we can either specify the name of the remote machine as a domain name (e.g., starship.python.net) or numeric IP address. We can also give the server name as localhost (or the equivalent IP address 127.0.0.1) to specify that the server program is running on the same machine as the client; that comes in handy for debugging servers without having to connect to the Net. And again, the client’s port number must match the server’s exactly. Note the nested parentheses again: just as in server bind calls, we really pass the server’s host/port address to connect in a tuple object.
Once the client establishes a connection to the server, it falls into a loop, sending a message one line at a time and printing whatever the server sends back after each line is sent:
sockobj.send(line)
Transfers the next byte-string message line to the server over the socket. Notice that the default list of lines contains bytes strings (b'...'). Just as on the server, data passed through the socket must be a byte string, though it can be the result of a manual str.encode encoding call or an object conversion with pickle or struct if desired. When lines to be sent are given as command-line arguments instead, they must be converted from str to bytes; the client arranges this by encoding in a generator expression (a call map(str.encode, sys.argv[2:]) would have the same effect).
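The equivalence between the generator expression and the map call is easy to confirm. In this sketch the list args is a made-up stand-in for sys.argv[2:].

```python
args = ['spam', 'Spam', 'SPAM']              # stand-in for sys.argv[2:]

with_genexp = list(x.encode() for x in args)
with_map = list(map(str.encode, args))

print(with_genexp == with_map)
```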
The server keeps running and responds to requests made each time you run the client script in the other window.
First, upload the server’s source file to a remote machine where you have an account and a Python. The & syntax in Unix/Linux shells can be used to run the server script in the background. Now that the server is listening for connections on the Net, run the client on your local computer multiple times again. This time, the client runs on a different machine than the server, so we pass in the server’s domain or IP name as a client command-line argument. The server still uses a machine name of "" because it always listens on whatever machine it runs on.
!ping learning-python.com
PING learning-python.com (97.74.215.115) 56(84) bytes of data.
64 bytes from p3nlh266.shr.prod.phx3.secureserver.net (97.74.215.115): icmp_seq=1 ttl=38 time=210 ms
64 bytes from p3nlh266.shr.prod.phx3.secureserver.net (97.74.215.115): icmp_seq=2 ttl=38 time=212 ms
64 bytes from p3nlh266.shr.prod.phx3.secureserver.net (97.74.215.115): icmp_seq=3 ttl=38 time=214 ms
64 bytes from p3nlh266.shr.prod.phx3.secureserver.net (97.74.215.115): icmp_seq=4 ttl=38 time=209 ms
64 bytes from p3nlh266.shr.prod.phx3.secureserver.net (97.74.215.115): icmp_seq=5 ttl=38 time=217 ms
^C
--- learning-python.com ping statistics ---
6 packets transmitted, 5 received, 16% packet loss, time 5004ms
rtt min/avg/max/mdev = 209.249/212.888/217.931/3.055 ms
import sys
from PP4E.launchmodes import QuietPortableLauncher

numclients = 8
def start(cmdline):
    QuietPortableLauncher(cmdline, cmdline)()

# start('echo-server.py')              # spawn server locally if not yet started

args = ' '.join(sys.argv[1:])          # pass server name if running remotely
for i in range(numclients):
    start('echo-client.py %s' % args)  # spawn 8? clients to test the server
It’s also important to know that this client and server engage in a proprietary sort of discussion, and so use the port number 50007 outside the range reserved for standard protocols (0 to 1023). There’s nothing preventing a client from opening a socket on one of these special ports, however. For instance, the following client-side code connects to programs listening on the standard email, FTP, and HTTP web server ports on three different server machines:
from socket import *
sock = socket(AF_INET,SOCK_STREAM)
sock.connect(('pop.secureserver.net', 110))
print sock.recv(70)
+OK <28789.1401261144@p3plpop05-10.prod.phx3.secureserver.net>
sock.close()
sock = socket(AF_INET,SOCK_STREAM)
sock.connect(('learning-python.com', 21))
print sock.recv(70)
220---------- Welcome to Pure-FTPd [privsep] [TLS] ---------- 220-You
sock.close()
sock = socket(AF_INET,SOCK_STREAM)
sock.connect(('www.python.net', 80))
sock.send(b'GET /\r\n') # fetch root page reply
7
sock.recv(70)
'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"\r\n "http://'
sock.recv(70)
'www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\r\n<html xmlns="http://www.'
sock.close()
Python’s poplib , ftplib , and http.client and urllib.request modules provide higher-level interfaces for talking to servers on these ports.
Speaking of reserved ports, it’s all right to open client-side connections on reserved ports as in the prior section, but you can’t install your own server-side scripts for these ports unless you have special permission.
sock = socket(AF_INET,SOCK_STREAM)
sock.bind(('',80))
---------------------------------------------------------------------------
error                                     Traceback (most recent call last)
<ipython-input-88-0c574f672e0b> in <module>()
----> 1 sock.bind(('',80))

/usr/lib/python2.7/socket.pyc in meth(name, self, *args)
    222
    223 def meth(name,self,*args):
--> 224     return getattr(self._sock,name)(*args)
    225
    226 for _m in _socketmethods:

error: [Errno 13] Permission denied
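A script can handle this failure instead of crashing. The sketch below assumes Python 3, where the permission error surfaces as PermissionError (a subclass of OSError); the function name try_bind and its result strings are made up for the illustration. The outcome depends on the platform and on the privileges of the account running it, so no particular result is implied.

```python
import socket

def try_bind(port):
    """Try to bind a server socket to port and describe what happened."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind(('', port))
    except PermissionError:
        return 'permission denied'       # reserved port, no special permission
    except OSError:
        return 'unavailable'             # for example, port already in use
    else:
        return 'bound'
    finally:
        sock.close()

print(try_bind(80))
```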
In real-world client/server programs, it’s far more typical to code a server so as to avoid blocking new requests while handling a current client’s request. Perhaps the easiest way to do so is to service each client’s request in parallel—in a new process, in a new thread, or by manually switching (multiplexing) between clients in an event loop. This isn’t a socket issue per se, and we already learned how to start processes and threads in Chapter 5. But since these schemes are so typical of socket server programming, let’s explore all three ways to handle client requests in parallel here.
Server side: open a socket on a port, listen for a message from a client, and send an echo reply; forks a process to handle each client connection; child processes share parent's socket descriptors; fork is less portable than threads--not yet on Windows, unless Cygwin or similar installed
import os, time, sys
from socket import *                      # get socket constructor and constants
myHost = ''                               # '' = all available interfaces on host
myPort = 50007                            # listen on a non-reserved port number

sockobj = socket(AF_INET, SOCK_STREAM)    # make a TCP socket object
sockobj.bind((myHost, myPort))            # bind it to server port number
sockobj.listen(5)                         # listen, allow 5 pending connects

def now():                                # current time on the server
    return time.ctime(time.time())

activeChildren = []
def reapChildren():                              # reap any dead child processes
    while activeChildren:                        # else may fill up system table
        pid, stat = os.waitpid(0, os.WNOHANG)    # don't hang if no child exited
        if not pid: break
        activeChildren.remove(pid)

def handleClient(connection):                    # child process: reply, exit
    time.sleep(5)                                # simulate a blocking activity
    while True:                                  # read, write a client socket
        data = connection.recv(1024)             # till eof when socket closed
        if not data: break
        reply = 'Echo=>%s at %s' % (data, now())
        connection.send(reply.encode())
    connection.close()
    os._exit(0)

def dispatcher():                                # listen until process killed
    while True:                                  # wait for next connection,
        connection, address = sockobj.accept()   # pass to process for service
        print('Server connected by', address, 'at', now())
        reapChildren()                           # clean up exited children now
        childPid = os.fork()                     # copy this process
        if childPid == 0:                        # if in child process: handle
            handleClient(connection)
        else:                                    # else: go accept next connect
            activeChildren.append(childPid)      # add to active child pid list

dispatcher()
!netstat -pant | grep 50007 #show 50007 port
!kill -9 pid # kill python server
The test proceeds as follows: on the server side, each client connection triggers a forked child process, which immediately goes to sleep for five seconds (to simulate being busy doing something useful). Each client then waits until the server replies, which happens five seconds after its initial request.
In a more realistic application, that delay could be fatal if many clients were trying to connect at once—the server would be stuck in the action we’re simulating with time.sleep , and not get back to the main loop to accept new client requests. With process forks per request, clients can be serviced in parallel.
With the reapChildren call disabled, the ps -af full process listing command shows that all the dead child processes stay in the system process table (shown as <defunct>).
When the reapChildren call is reactivated, dead child zombie entries are cleaned up each time the server gets a new client connection request, by calling the Python os.waitpid function. A few zombies may accumulate if the server is heavily loaded, but they will remain only until the next client connection is received (you get only as many zombies as processes served in parallel since the last accept).
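The reaping step can be isolated in a few lines. This is a minimal sketch for Unix-like systems only (os.fork is not available on standard Windows Python); the function name spawn_and_reap and the exit code 7 are arbitrary choices for the illustration.

```python
import os
import time

def spawn_and_reap(exit_code=7):
    """Fork a child that exits at once, then reap it without blocking."""
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)              # child: exit immediately
    while True:                          # parent: poll until the child is gone
        reaped, status = os.waitpid(pid, os.WNOHANG)
        if reaped == pid:                # entry removed from process table
            return os.WEXITSTATUS(status)
        time.sleep(0.01)                 # (0, 0) came back: child not done yet

print(spawn_and_reap())
```

Between the child's exit and the successful waitpid call, the child is exactly the kind of zombie entry described above.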
In fact, if you type fast enough, you can actually see a child process morph from a real running program into a zombie: a child spawned to handle a new request changes to a <defunct> entry in the ps listing when it exits.
On some systems, it’s also possible to clean up zombie child processes by resetting the signal handler for the SIGCHLD signal delivered to a parent process by the operating system when a child process stops or exits. If a Python script assigns the SIG_IGN (ignore) action as the SIGCHLD signal handler, zombies will be removed automatically and immediately by the operating system as child processes exit; the parent need not issue wait calls to clean up after them. Because of that, this scheme is a simpler alternative to manually reaping zombies on platforms where it is supported.
# Demo Python's signal module; pass signal number as a command-line arg, and use
# a "kill -N pid" shell command to send this process a signal; on my Linux machine,
# SIGUSR1=10, SIGUSR2=12, SIGCHLD=17, and SIGCHLD handler stays in effect even if
# not restored: all other handlers are restored by Python after caught, but SIGCHLD
# behavior is left to the platform's implementation; signal works on Windows too,
# but defines only a few signal types; signals are not very portable in general
import sys, signal, time
def now():
    return time.asctime()

def onSignal(signum, stackframe):                 # Python signal handler
    print('Got signal', signum, 'at', now())      # most handlers stay in effect
    if signum == signal.SIGCHLD:                  # but sigchld handler is not
        print('sigchld caught')
        #signal.signal(signal.SIGCHLD, onSignal)

signum = int(sys.argv[1])
signal.signal(signum, onSignal)                   # install signal handler
while True: signal.pause()                        # sleep waiting for signals
To run this script, simply put it in the background and send it signals by typing the kill -signal-number process-id shell command line; this is the shell’s equivalent of Python’s os.kill function available on Unix-like platforms only. Process IDs are listed in the PID column of ps command results. Here is this script in action catching signal numbers 10 (reserved for general use) and 9 (the unavoidable terminate signal).
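The same handler mechanism can be exercised from inside one process, with os.kill standing in for the shell's kill command. This is a sketch under assumptions: it is Unix-only (SIGUSR1 is not defined on Windows), it must run in the main thread because signal.signal requires that, and the names caught, on_signal, and send_self are made up for the example.

```python
import os
import signal
import time

caught = []

def on_signal(signum, stackframe):       # Python-level signal handler
    caught.append(signum)

def send_self(signum=signal.SIGUSR1):
    """Install a handler, signal this process, and report if it was caught."""
    previous = signal.signal(signum, on_signal)
    try:
        os.kill(os.getpid(), signum)     # like "kill -N pid" from a shell
        deadline = time.time() + 2.0
        while signum not in caught and time.time() < deadline:
            time.sleep(0.01)             # give the handler a chance to run
    finally:
        signal.signal(signum, previous)  # restore the earlier handler
    return signum in caught

print(send_self())
```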
import os, time, sys, signal
from socket import *                      # get socket constructor and constants
myHost = ''                               # '' = all available interfaces on host
myPort = 50007                            # listen on a non-reserved port number

sockobj = socket(AF_INET, SOCK_STREAM)    # make a TCP socket object
sockobj.bind((myHost, myPort))            # bind it to server port number
sockobj.listen(5)                         # listen, allow 5 pending connects
signal.signal(signal.SIGCHLD, signal.SIG_IGN)    # avoid child zombie processes

def now():                                # current time on the server
    return time.ctime(time.time())

def handleClient(connection):                    # child process: reply, exit
    time.sleep(5)                                # simulate a blocking activity
    while True:                                  # read, write a client socket
        data = connection.recv(1024)             # till eof when socket closed
        if not data: break
        reply = 'Echo=>%s at %s' % (data, now())
        connection.send(reply.encode())
    connection.close()
    os._exit(0)

def dispatcher():                                # listen until process killed
    while True:                                  # wait for next connection,
        connection, address = sockobj.accept()   # pass to process for service
        print('Server connected by', address, 'at', now())
        childPid = os.fork()                     # copy this process
        if childPid == 0:                        # if in child process: handle
            handleClient(connection)

dispatcher()
This technique is not universally supported across all flavors of Unix. If you care about portability, manually reaping children as we did in Example 12-4 may still be desirable.
Note that passing open sockets as arguments into a newly spawned process crashes on Windows, because the open sockets are not correctly pickled there; it works on Linux.
Because threads all run in the same process and memory space, they automatically share sockets passed between them, similar in spirit to the way that child processes inherit socket descriptors. Unlike processes, though, threads are usually less expensive to start, and work on both Unix-like machines and Windows under standard Python today. Furthermore, many (though not all) see threads as simpler to program—child threads die silently on exit, without leaving behind zombies to haunt the server.
# Server side: open a socket on a port, listen for a message from a client,
# and send an echo reply; echoes lines until eof when client closes socket;
# spawns a thread to handle each client connection; threads share global
# memory space with main thread; this is more portable than fork: threads
# work on standard Windows systems, but process forks do not
import time, _thread as thread           # or use threading.Thread().start()
from socket import *                     # get socket constructor and constants
myHost = ''                              # server machine, '' means local host
myPort = 50007                           # listen on a non-reserved port number

sockobj = socket(AF_INET, SOCK_STREAM)   # make a TCP socket object
sockobj.bind((myHost, myPort))           # bind it to server port number
sockobj.listen(5)                        # allow up to 5 pending connects

def now():
    return time.ctime(time.time())       # current time on the server

def handleClient(connection):            # in spawned thread: reply
    time.sleep(5)                        # simulate a blocking activity
    while True:                          # read, write a client socket
        data = connection.recv(1024)
        if not data: break
        reply = 'Echo=>%s at %s' % (data, now())
        connection.send(reply.encode())
    connection.close()

def dispatcher():                                # listen until process killed
    while True:                                  # wait for next connection,
        connection, address = sockobj.accept()   # pass to thread for service
        print('Server connected by', address, 'at', now())
        thread.start_new_thread(handleClient, (connection,))

dispatcher()
Remember that a thread silently exits when the function it is running returns; unlike the process fork version, we don’t call anything like os._exit in the client handler function (and we shouldn’t: it may kill all threads in the process, including the main loop watching for new connections!). Because of this, the thread version is not only more portable, but also simpler.
The socketserver module defines classes that implement all flavors of forking and threading servers that you are likely to be interested in.
"""
Server side: open a socket on a port, listen for a message from a client, and
send an echo reply; this version uses the standard library module socketserver to
do its work; socketserver provides TCPServer, ThreadingTCPServer, ForkingTCPServer,
UDP variants of these, and more, and routes each client connect request to a new
instance of a passed-in request handler object's handle method; socketserver also
supports Unix domain sockets, but only on Unixen; see the Python library manual.
"""
import socketserver, time               # get socket server, handler objects
myHost = ''                             # server machine, '' means local host
myPort = 50007                          # listen on a non-reserved port number

def now():
    return time.ctime(time.time())

class MyClientHandler(socketserver.BaseRequestHandler):
    def handle(self):                           # on each client connect
        print(self.client_address, now())       # show this client's address
        time.sleep(5)                           # simulate a blocking activity
        while True:                             # self.request is client socket
            data = self.request.recv(1024)      # read, write a client socket
            if not data: break
            reply = 'Echo=>%s at %s' % (data, now())
            self.request.send(reply.encode())
        self.request.close()

# make a threaded server, listen/handle clients forever
myaddr = (myHost, myPort)
server = socketserver.ThreadingTCPServer(myaddr, MyClientHandler)
server.serve_forever()
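For quick experiments it helps to run such a server in the background of the same script and shut it down afterward. The following sketch assumes Python 3.6 or later (where socketserver servers work as context managers); EchoHandler and ask are hypothetical names, and port 0 asks the operating system for any free port instead of 50007.

```python
import socket
import socketserver
import threading

class EchoHandler(socketserver.BaseRequestHandler):
    def handle(self):                        # runs in a new thread per client
        while True:
            data = self.request.recv(1024)
            if not data:                     # client closed its end
                break
            self.request.sendall(b'Echo=>' + data)

def ask(message):
    """Start a threading server, send one message, and return the reply."""
    with socketserver.ThreadingTCPServer(('127.0.0.1', 0), EchoHandler) as server:
        runner = threading.Thread(target=server.serve_forever)
        runner.start()
        with socket.create_connection(server.server_address) as client:
            client.sendall(message)
            reply = client.recv(1024)
        server.shutdown()                    # stop the serve_forever loop
        runner.join()
    return reply

print(ask(b'Hello network world'))
```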
Technically, though, threads and processes don’t really run in parallel, unless you’re lucky enough to have a machine with many CPUs. Instead, your operating system performs a juggling act—it divides the computer’s processing power among all active tasks. It runs part of one, then part of another, and so on. All the tasks appear to run in parallel, but only because the operating system switches focus between tasks so fast that you don’t usually notice. This process of switching between tasks is sometimes called time-slicing when done by an operating system; it is more generally known as multiplexing.
In select-asynchronous servers, a single main loop run in a single process and thread decides which clients should get a bit of attention each time through. Client requests and the main dispatcher loop are each given a small slice of the server’s attention if they are ready to converse.
That is, when the sources passed to select are sockets, we can be sure that socket calls like accept , recv , and send will not block (pause) the server when applied to objects returned by select . Because of that, a single-loop server that uses select need not get stuck communicating with one client or waiting for new ones while other clients are starved for the server’s attention.
Because this type of server does not need to start threads or processes, it can be efficient when transactions with clients are relatively short-lived. However, it also requires that these transactions be quick; if they are not, it still runs the risk of becoming bogged down waiting for a dialog with a particular client to end, unless augmented with threads or forks for long-running transactions.
Confusingly, select-based servers are often called asynchronous, to describe their multiplexing of short-lived transactions. Really, though, the classic forking and threading servers we met earlier are asynchronous, too, as they do not wait for completion of a given client’s request. There is a clearer distinction between serial and parallel servers.
The select-based server below can handle multiple clients without ever starting new processes or threads.
"""
Server: handle multiple clients in parallel with select. use the select
module to manually multiplex among a set of sockets: main sockets which
accept new client connections, and input sockets connected to accepted
clients; select can take an optional 4th arg--0 to poll, n.m to wait n.m
seconds, or omitted to wait till any socket is ready for processing.
"""
import sys
import time
from select import select
from socket import socket, AF_INET, SOCK_STREAM

def now():
    return time.ctime(time.time())

myHost = ''                             # server machine, '' means local host
myPort = 50007                          # listen on a non-reserved port number
if len(sys.argv) == 3:                  # allow host/port as cmdline args too
    myHost, myPort = sys.argv[1:]
    myPort = int(myPort)                # cmdline args are strings: port must be int
numPortSocks = 2                        # number of ports for client connects

# make main sockets for accepting new client requests
mainsocks, readsocks, writesocks = [], [], []
for i in range(numPortSocks):
    portsock = socket(AF_INET, SOCK_STREAM)   # make a TCP/IP socket object
    portsock.bind((myHost, myPort))           # bind it to server port number
    portsock.listen(5)                        # listen, allow 5 pending connects
    mainsocks.append(portsock)                # add to main list to identify
    readsocks.append(portsock)                # add to select inputs list
    myPort += 1                               # bind on consecutive ports

# event loop: listen and multiplex until server process killed
print('select-server loop starting')
while True:
    # print(readsocks)
    readables, writeables, exceptions = select(readsocks, writesocks, [])
    for sockobj in readables:
        if sockobj in mainsocks:                     # for ready input sockets
            # port socket: accept new client
            newsock, address = sockobj.accept()      # accept should not block
            print('Connect:', address, id(newsock))  # newsock is a new socket
            readsocks.append(newsock)                # add to select list, wait
        else:
            # client socket: read next line
            data = sockobj.recv(1024)                # recv should not block
            print('\tgot', data, 'on', id(sockobj))
            if not data:                             # if closed by the clients
                sockobj.close()                      # close here and remv from
                readsocks.remove(sockobj)            # del list else reselected
            else:
                # this may block: should really select for writes too
                reply = 'Echo=>%s at %s' % (data, now())
                sockobj.send(reply.encode())
Formally, select is called with three lists of selectable objects (input sources, output sources, and exceptional condition sources), plus an optional timeout. The timeout argument may be a real wait expiration value in seconds (use floating-point numbers to express fractions of a second), a zero value to mean simply poll and return immediately, or omitted to mean wait until at least one object is ready (as done in our server script). The call returns a triple of ready objects (subsets of the first three arguments), any or all of which may be empty if the timeout expired before sources became ready.
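The timeout behavior described above can be tried without a real network by using a connected socket pair inside one process. This is only a small sketch; the names left, right, idle_result, and ready_data are made up for illustration and do not come from the server script.

```python
import socket
from select import select

# A connected pair of sockets in one process stands in for client and server.
left, right = socket.socketpair()

# Timeout of 0 means poll and return immediately. Nothing has been sent yet,
# so all three result lists come back empty.
readables, writeables, exceptions = select([right], [], [], 0)
idle_result = (readables, writeables, exceptions)

# After the peer sends data, the same socket shows up in the readable list.
left.send(b'spam')
readables, _, _ = select([right], [], [], 1.5)    # wait at most 1.5 seconds
ready_data = readables[0].recv(1024) if readables else None

left.close()
right.close()
```

Omitting the fourth argument altogether, as the server script does, would make the call wait until at least one socket is ready.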
If you’re interested in using select, you will probably also be interested in checking out the asyncore.py module in the standard Python library (deprecated since Python 3.6 and removed in Python 3.12). It implements a class-based callback model, where input and output callbacks are dispatched to class methods by a precoded select event loop. As such, it allows servers to be constructed without threads or forks, and it is a select-based alternative to the socketserver module’s threading and forking classes we met in the prior sections. As for this type of server in general, asyncore is best when transactions are short: what it describes as “I/O bound” instead of “CPU bound” programs, the latter of which still require threads or forks. See the Python library manual for details and a usage example.
For other server options, see also the open source Twisted system (http://twistedmatrix.com). Twisted is an asynchronous networking framework written in Python that supports TCP, UDP, multicast, SSL/TLS, serial communication, and more. It supports both clients and servers and includes implementations of a number of commonly used network services such as a web server, an IRC chat server, a mail server, a relational database interface, and an object broker. Although Twisted supports processes and threads for longer-running actions, it also uses an asynchronous, event-driven model to handle clients, which is similar to the event loop of GUI libraries like tkinter. It abstracts an event loop, which multiplexes among open socket connections, automates many of the details inherent in an asynchronous server, and provides an event-driven framework for scripts to use to accomplish application tasks. Twisted’s internal event engine is similar in spirit to our select-based server and the asyncore module, but it is regarded as much more advanced. Twisted is a third-party system, not a standard library tool; see its website and documentation for more details.
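select: may not be the best way to construct a server by itself if a client’s request cannot be split up in such a way that it can be multiplexed with other requests and not block the server’s main loop. It is also more complex than spawning threads or processes, because we need to manually transfer control among all tasks (for instance, compare the threaded and select versions of our echo server, even without write selects).
threads or forks: useful when clients require long-running processing above and beyond the socket calls used to pass data.
The asyncore standard library module and Twisted: precoded event loops that automate much of the detail of a select-based server.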
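The echo server's comment notes that its send call may block and that it "should really select for writes too". The following is one possible sketch of that idea, not the book's code: replies are queued per client and sent only when select reports the socket writable. The function name echo_step and the pending dictionary are assumptions invented for this illustration, and error handling (such as connection resets) is left out.

```python
import socket
from select import select

def echo_step(listener, readsocks, pending, timeout=0.5):
    """Run one pass of a select loop that also waits for writability.

    readsocks: sockets watched for input (must include listener)
    pending:   dict mapping client socket -> reply bytes not yet sent
    """
    # only ask about writability for clients that have queued output
    writesocks = [sock for sock, buf in pending.items() if buf]
    readables, writeables, _ = select(readsocks, writesocks, [], timeout)

    for sock in readables:
        if sock is listener:                  # new client: accept and track it
            conn, _addr = listener.accept()
            conn.setblocking(False)
            readsocks.append(conn)
            pending[conn] = b''
        else:
            data = sock.recv(1024)
            if data:                          # queue the reply, send it later
                pending[sock] += b'Echo=>' + data
            else:                             # closed by client: forget it
                readsocks.remove(sock)
                pending.pop(sock, None)
                sock.close()

    for sock in writeables:
        if pending.get(sock):                 # skip sockets closed just above
            sent = sock.send(pending[sock])   # may accept only part of it
            pending[sock] = pending[sock][sent:]
```

A server would call echo_step in a while True loop. The extra bookkeeping illustrates the point made above about the cost in program clarity compared with the threaded version.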
Wrapping sockets in file objects would allow a script to use standard stream tools such as the print and input built-in functions and sys module file calls (e.g., sys.stdout.write), and connect them to sockets only when needed.
The socket object makefile method comes in handy anytime you wish to process a socket with normal file object methods or need to pass a socket to an existing interface or program that expects a file.
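As a rough illustration of passing a socket to code that expects a file, the sketch below routes print output into a socket wrapper file. It uses a socket pair in one process instead of a real client and server, and the variable names are invented for the example.

```python
import contextlib
import socket

near, far = socket.socketpair()
wfile = near.makefile('w')          # file-like wrapper for writing text
rfile = far.makefile('r')           # file-like wrapper for reading text

# Temporarily route print output into the socket wrapper.
with contextlib.redirect_stdout(wfile):
    print('spam', 42)
wfile.flush()                       # wrapper output is buffered: push it out

line = rfile.readline()             # arrives as ordinary text on the far end

for obj in (wfile, rfile, near, far):
    obj.close()
```

The print call itself knows nothing about sockets; it only sees an object with a write method.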
The makefile method also allows us to treat normally binary socket data as text instead of byte strings, and has additional arguments such as encoding that let us specify non-default Unicode encodings for the transferred text.
Although text can always be encoded and decoded with manual calls after binary mode socket transfers, makefile shifts the burden of text encodings from your code to the file wrapper object.
Even when line buffering is requested, socket wrapper file writes (and by association, prints) are buffered until the program exits, manual flushes are requested, or the buffer becomes full.
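The buffering behavior can be observed directly with a socket pair and a zero-timeout select poll. This is a minimal sketch assuming CPython 3 text-mode wrappers; the names before_flush and after_flush are made up for the example.

```python
import socket
from select import select

sender, receiver = socket.socketpair()

# Text-mode wrapper: str goes in, encoded bytes come out the other side.
wfile = sender.makefile('w', encoding='utf-8')
wfile.write('spam\n')

# Poll the receiving end: the line is still held in the wrapper's buffer.
before_flush = bool(select([receiver], [], [], 0)[0])

wfile.flush()                                    # force the buffered text out
after_flush = bool(select([receiver], [], [], 2.0)[0])

# A text-mode wrapper on the receiving side decodes bytes back to str.
rfile = receiver.makefile('r', encoding='utf-8')
line = rfile.readline()

wfile.close()
rfile.close()
sender.close()
receiver.close()
```

Without the flush call, a reader waiting on the other end would see nothing until the writer exits or its buffer fills, which is how deadlocks can arise when both sides wait on each other.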
# socket-unbuff-server.py
from __future__ import print_function
from socket import *          # read three messages over a raw socket
sock = socket()
sock.bind(('', 60000))
sock.listen(5)
print('accepting...')
conn, id = sock.accept()      # blocks till client connect
for i in range(3):
    print('receiving...')
    msg = conn.recv(1024)     # blocks till data received
    print(msg)                # gets all print lines at once unless flushed
# socket-unbuff-client.py
# send three msgs over wrapped and raw socket
from __future__ import print_function
import time
from socket import *
sock = socket() # default=AF_INET, SOCK_STREAM (tcp/ip)
sock.connect(('localhost', 60000))
# default=full buff, 0=error, 1 not linebuff!
file = sock.makefile('w', buffering=1)
print('sending data1')
file.write('spam\n')
time.sleep(5) # must follow with flush() to truly send now
# file.flush() # uncomment flush lines to see the difference
print('sending data2')
# adding more file prints does not flush buffer either
print('eggs', file=file)
time.sleep(5)
# file.flush() # output appears at server recv only upon
# flush or exit
print('sending data3')
sock.send(b'ham\n') # low-level byte string interface sends immediately
time.sleep(5) # received first if don't flush other two!
Buffered streams and deadlock are general issues that go beyond socket wrapper files.
# pipe-unbuff-writer.py
# output line buffered (unbuffered) if stdout is a terminal, buffered by default for
# other devices: use -u or sys.stdout.flush() to avoid delayed output on pipe/socket
import time, sys
for i in range(5):
    print(time.asctime())           # print transfers per stream buffering
    sys.stdout.write('spam\n')      # ditto for direct stream file access
    time.sleep(2)                   # unless sys.stdout reset to other file

# no output for 10 seconds unless Python -u flag used or sys.stdout.flush()
# but writer's output appears here every 2 seconds when either option is used
from __future__ import print_function
import os
for line in os.popen('python -u pipe-unbuff-writer.py'):    # iterator reads lines
    print(line, end='')                                     # blocks without -u!
Why use sockets in this redirection role at all? Command pipes require a direct spawning relationship between programs, and they do not support longer-lived or remotely running servers the way that sockets do.
With sockets, a server can keep running perpetually to serve multiple clients (albeit with some changes to our utility module’s listener initialization code). Moreover, passing in remote machine names to our socket redirection tools would allow a client to connect to a server running on a completely different machine.
Named pipes (fifos) accessed with the open call support stronger independence of client and server, too, but unlike sockets, they are usually limited to the local machine, and are not supported on all platforms.
The getfile.py script implements both the server-side and the client-side logic needed to ship a requested file from server to client machines over a raw socket.
"""
#############################################################################
implement client and server-side logic to transfer an arbitrary file from
server to client over a socket; uses a simple control-info protocol rather
than separate sockets for control and data (as in ftp), dispatches each
client request to a handler thread, and loops to transfer the entire file
by blocks; see ftplib examples for a higher-level transport scheme;
#############################################################################
"""
from __future__ import print_function    # print() calls also work on Python 2
import sys, os, time
try:
    import _thread as thread             # Python 3 module name
except ImportError:
    import thread                        # Python 2 module name
from socket import *

blksz = 1024
defaultHost = 'localhost'
defaultPort = 50001
helptext = """
Usage...
server=> getfile.py -mode server [-port nnn] [-host hhh|localhost]
client=> getfile.py [-mode client] -file fff [-port nnn] [-host hhh|localhost]
"""

def now():
    return time.asctime()

def parsecommandline():
    dict = {}                        # put in dictionary for easy lookup
    args = sys.argv[1:]              # skip program name at front of args
    while len(args) >= 2:            # example: dict['-mode'] = 'server'
        dict[args[0]] = args[1]
        args = args[2:]
    return dict

def client(host, port, filename):
    sock = socket(AF_INET, SOCK_STREAM)
    sock.connect((host, port))
    sock.send((filename + '\n').encode())      # send remote name with dir: bytes
    dropdir = os.path.split(filename)[1]       # filename at end of dir path
    file = open(dropdir, 'wb')                 # create local file in cwd
    while True:
        data = sock.recv(blksz)                # get up to 1K at a time
        if not data: break                     # till closed on server side
        file.write(data)                       # store data in local file
    sock.close()
    file.close()
    print('Client got', filename, 'at', now())

def serverthread(clientsock):
    sockfile = clientsock.makefile('r')        # wrap socket in dup file obj
    filename = sockfile.readline()[:-1]        # get filename up to end-line
    try:
        file = open(filename, 'rb')
        while True:
            bytes = file.read(blksz)           # read/send 1K at a time
            if not bytes: break                # until file totally sent
            sent = clientsock.send(bytes)
            assert sent == len(bytes)
    except:
        print('Error downloading file on server:', filename)
    clientsock.close()

def server(host, port):
    serversock = socket(AF_INET, SOCK_STREAM)  # listen on TCP/IP socket
    serversock.bind((host, port))              # serve clients in threads
    serversock.listen(5)
    while True:
        clientsock, clientaddr = serversock.accept()
        print('Server connected by', clientaddr, 'at', now())
        thread.start_new_thread(serverthread, (clientsock,))

def main(args):
    host = args.get('-host', defaultHost)          # use args or defaults
    port = int(args.get('-port', defaultPort))     # is a string in argv
    if args.get('-mode') == 'server':              # None if no -mode: client
        if host == 'localhost': host = ''          # else fails remotely
        server(host, port)
    elif args.get('-file'):                        # client mode needs -file
        client(host, port, args['-file'])
    else:
        print(helptext)

if __name__ == '__main__':
    args = parsecommandline()
    main(args)
1. The server function dispatches each incoming client request to a handler thread that transfers the requested file’s bytes.
2. The client function sends the server a file’s name and stores all the bytes it gets back in a local file of the same name.
3. The most novel feature here is the protocol between client and server: the client starts the conversation by shipping a filename string up to the server, terminated with an end-of-line character, and including the file’s directory path on the server. At the server, a spawned thread extracts the requested file’s name by reading the client socket, and opens and transfers the requested file back to the client, one chunk of bytes at a time.
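The protocol described above can be exercised in a single process. The sketch below is an illustration only, not the getfile script itself: it swaps in a socket pair and a worker thread for the real network connection, uses sendall in place of the original send-plus-assert, and the names serve_one, fetch, and BLKSZ are invented for the example.

```python
import os
import socket
import tempfile
import threading

BLKSZ = 1024

def serve_one(conn):
    """Server side: read a newline-terminated filename, send the file back."""
    with conn:
        with conn.makefile('r', encoding='utf-8') as sockfile:
            filename = sockfile.readline().rstrip('\n')
        with open(filename, 'rb') as f:
            while True:
                block = f.read(BLKSZ)        # read and send one block at a time
                if not block:
                    break
                conn.sendall(block)
    # leaving the with block closes conn, which signals end-of-file to the client

def fetch(conn, filename):
    """Client side: send the filename line, then read bytes until the close."""
    conn.sendall((filename + '\n').encode('utf-8'))
    chunks = []
    while True:
        data = conn.recv(BLKSZ)
        if not data:                         # empty bytes: server closed
            break
        chunks.append(data)
    conn.close()
    return b''.join(chunks)

# demo: ship a temporary file across a socket pair
payload = os.urandom(5000)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
server_end, client_end = socket.socketpair()
worker = threading.Thread(target=serve_one, args=(server_end,))
worker.start()
received = fetch(client_end, tmp.name)
worker.join()
os.remove(tmp.name)
```

Because the server closes the socket when the file is exhausted, the client needs no length header: an empty recv result marks the end of the transfer.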
One subtle security point here: the server instance code is happy to send any server-side file whose pathname is sent from a client, as long as the server is run with a username that has read access to the requested file. If you care about keeping some of your server-side files private, you should add logic to suppress downloads of restricted files. I’ll leave this as a suggested exercise here, but we will implement such filename checks in a different getfile download tool later in this book.
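One way to approach the suggested exercise is to serve only files that resolve to a location inside a single public directory. The helper below is a hedged sketch of that idea, not the check implemented later in the book; is_allowed and the BASE directory are hypothetical names chosen for the example.

```python
import os

# Hypothetical public directory, invented for illustration.
BASE = os.path.join(os.path.abspath(os.sep), 'srv', 'public')

def is_allowed(filename, basedir=BASE):
    """Return True only if filename resolves to a path inside basedir."""
    base = os.path.realpath(basedir)
    target = os.path.realpath(os.path.join(base, filename))
    try:
        return os.path.commonpath([base, target]) == base
    except ValueError:            # e.g., paths on different Windows drives
        return False
```

A server thread could call is_allowed(filename) before opening the file and close the client socket when it returns False. Resolving with realpath first means that '..' components and symbolic links are taken into account before the comparison.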
For instance, it would be easy to implement a simple tkinter GUI frontend to the client-side portion of the getfile script we just met. Such a tool, run on the client machine, may simply pop up a window with Entry widgets for typing the desired filename, server, and so on. Once download parameters have been input, the user interface could either import and call the getfile.client function with appropriate option arguments, or build and run the implied getfile.py command line using tools such as os.system, os.popen, subprocess, and so on.
"""
launch getfile script client from simple tkinter GUI;
could also use os.fork+exec, os.spawnv (see Launcher);
windows: replace 'python' with 'start' if not on path;
"""
import os
from tkinter import *
from tkinter.messagebox import showinfo

def onReturnKey():
    cmdline = ('python getfile.py -mode client -file %s -port %s -host %s' %
                      (content['File'].get(),
                       content['Port'].get(),
                       content['Server'].get()))
    os.system(cmdline)
    showinfo('getfilegui-1', 'Download complete')

box = Tk()
labels = ['Server', 'Port', 'File']
content = {}
for label in labels:
    row = Frame(box)
    row.pack(fill=X)
    Label(row, text=label, width=6).pack(side=LEFT)
    entry = Entry(row)
    entry.pack(side=RIGHT, expand=YES, fill=X)
    content[label] = entry

box.title('getfilegui-1')
box.bind('<Return>', (lambda event: onReturnKey()))
mainloop()
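The os.system call in the script above interpolates user input into a shell command string, so a filename containing spaces or shell metacharacters can break or alter the command. As a possible alternative, the subprocess module accepts an argument list that bypasses the shell. The helper name build_getfile_command below is made up for this sketch.

```python
import sys

def build_getfile_command(server, port, filename, script='getfile.py'):
    """Build an argument list for the getfile client, one item per argument."""
    return [sys.executable, script,        # run with the current interpreter
            '-mode', 'client',
            '-file', filename,             # passed intact, even with spaces
            '-port', str(port),
            '-host', server]
```

The GUI callback could then run subprocess.run(build_getfile_command(server, port, filename)) in place of os.system(cmdline).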
"""
same, but with grids and import+call, not packs and cmdline;
direct function calls are usually faster than running files;
"""
import getfile
from tkinter import *
from tkinter.messagebox import showinfo

def onSubmit():
    getfile.client(content['Server'].get(),
                   int(content['Port'].get()),
                   content['File'].get())
    showinfo('getfilegui-2', 'Download complete')

box = Tk()
labels = ['Server', 'Port', 'File']
rownum = 0
content = {}
for label in labels:
    Label(box, text=label).grid(column=0, row=rownum)
    entry = Entry(box)
    entry.grid(column=1, row=rownum, sticky=E+W)
    content[label] = entry
    rownum += 1

box.columnconfigure(0, weight=0)     # make expandable
box.columnconfigure(1, weight=1)
Button(text='Submit', command=onSubmit).grid(row=rownum, column=0, columnspan=2)
box.title('getfilegui-2')
box.bind('<Return>', (lambda event: onSubmit()))
mainloop()
If you’re like me, though, writing all the GUI form layout code in those two scripts can seem a bit tedious, whether you use packing or grids. In fact, it became so tedious to me that I decided to write a general-purpose form-layout class, shown in Example 12-20, which handles most of the GUI layout grunt work.
"""
##################################################################
a reusable form class, used by getfilegui (and others)
##################################################################
"""
from tkinter import *
entrysize = 40

class Form:                                           # add non-modal form box
    def __init__(self, labels, parent=None):          # pass field labels list
        labelsize = max(len(x) for x in labels) + 2
        box = Frame(parent)                           # box has rows, buttons
        box.pack(expand=YES, fill=X)                  # rows has row frames
        rows = Frame(box, bd=2, relief=GROOVE)        # go=button or return key
        rows.pack(side=TOP, expand=YES, fill=X)       # runs onSubmit method
        self.content = {}
        for label in labels:
            row = Frame(rows)
            row.pack(fill=X)
            Label(row, text=label, width=labelsize).pack(side=LEFT)
            entry = Entry(row, width=entrysize)
            entry.pack(side=RIGHT, expand=YES, fill=X)
            self.content[label] = entry
        Button(box, text='Cancel', command=self.onCancel).pack(side=RIGHT)
        Button(box, text='Submit', command=self.onSubmit).pack(side=RIGHT)
        box.master.bind('<Return>', (lambda event: self.onSubmit()))

    def onSubmit(self):                                      # override this
        for key in self.content:                             # user inputs in
            print(key, '\t=>\t', self.content[key].get())    # self.content[k]

    def onCancel(self):                                      # override if need
        Tk().quit()                                          # default is exit

class DynamicForm(Form):
    def __init__(self, labels=None):
        labels = input('Enter field names: ').split()
        Form.__init__(self, labels)

    def onSubmit(self):
        print('Field values...')
        Form.onSubmit(self)
        self.onCancel()

if __name__ == '__main__':
    import sys
    if len(sys.argv) == 1:
        Form(['Name', 'Age', 'Job'])     # precoded fields, stay after submit
    else:
        DynamicForm()                    # input fields, go away after submit
    mainloop()
"""
launch getfile client with a reusable GUI form class;
os.chdir to target local dir if input (getfile stores in cwd);
to do: use threads, show download status and getfile prints;
"""
from form import Form
from tkinter import Tk, mainloop
from tkinter.messagebox import showinfo
import getfile, os

class GetfileForm(Form):
    def __init__(self, oneshot=False):
        root = Tk()
        root.title('getfilegui')
        labels = ['Server Name', 'Port Number', 'File Name', 'Local Dir?']
        Form.__init__(self, labels, root)
        self.oneshot = oneshot

    def onSubmit(self):
        Form.onSubmit(self)
        localdir = self.content['Local Dir?'].get()
        portnumber = self.content['Port Number'].get()
        servername = self.content['Server Name'].get()
        filename = self.content['File Name'].get()
        if localdir:
            os.chdir(localdir)
        portnumber = int(portnumber)
        getfile.client(servername, portnumber, filename)
        showinfo('getfilegui', 'Download complete')
        if self.oneshot: Tk().quit()     # else stay in last localdir

if __name__ == '__main__':
    GetfileForm()
    mainloop()
One caveat worth pointing out here: the GUI is essentially dead while the download is in progress (even screen redraws aren’t handled; try covering and uncovering the window and you’ll see what I mean). We could make this better by running the download in a thread, but since we’ll see how to do that in the next chapter when we explore the FTP protocol, you should consider this problem a preview.
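As a rough preview of the threading idea, and not the technique the next chapter actually uses, a download can run in a worker thread that posts its outcome to a queue, while the GUI thread polls that queue with the widget after method. The function names start_download and poll_results are assumptions invented for this sketch; download stands for any callable such as getfile.client.

```python
import queue
import threading

def start_download(download, args, results):
    """Run download(*args) in a worker thread; post the outcome to a queue."""
    def worker():
        try:
            download(*args)
            results.put(('done', args))
        except Exception as exc:
            results.put(('error', exc))
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread

def poll_results(widget, results, on_result, interval_ms=100):
    """Check the queue from the GUI thread; reschedule if nothing arrived."""
    try:
        status, info = results.get_nowait()
    except queue.Empty:
        # try again later without blocking the GUI event loop
        widget.after(interval_ms, poll_results,
                     widget, results, on_result, interval_ms)
    else:
        on_result(status, info)
```

In an onSubmit handler, one might call start_download(getfile.client, (servername, portnumber, filename), results) followed by poll_results(root, results, callback), where callback pops up the completion dialog. Only the GUI thread touches widgets, which keeps the window responsive during the transfer.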
In particular, getfile clients can talk only to machines that are running a getfile server. In the next chapter, we’ll discover another way to download files, FTP, which also runs on sockets but provides a higher-level interface and is available as a standard service on many machines on the Net. We don’t generally need to start up a custom server to transfer files over FTP, the way we do with getfile. In fact, the user-interface scripts in this chapter could be easily changed to fetch the desired file with Python’s FTP tools, instead of the getfile module. But instead of spilling all the beans here, I’ll just say, “Read on.”
If you’re looking for a lower-level way to communicate with devices in general, though, you may also be interested in the topic of Python’s serial port interfaces. This isn’t quite related to Internet scripting, but it’s similar enough in spirit and is discussed often enough on the Net to merit a few words here.
In brief, scripts can use serial port interfaces to engage in low-level communication with things like mice, modems, and a wide variety of serial devices and hardware. Serial port interfaces are also used to communicate with devices connected over infrared ports (e.g., hand-held computers and remote modems). Such interfaces let scripts tap into raw data streams and implement device protocols of their own. Other Python tools such as the ctypes and struct modules may provide additional tools for creating and extracting the packed binary data these ports transfer.
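To give a feel for the packed binary data mentioned above, here is a small struct sketch. The frame layout is hypothetical, invented purely for illustration, and does not describe any real device protocol.

```python
import struct

# Hypothetical frame layout, invented for illustration:
# 1-byte command, 2-byte unsigned length, 4-byte float reading.
# '<' selects little-endian byte order with no padding between fields.
FRAME = struct.Struct('<BHf')

packed = FRAME.pack(0x01, 512, 2.5)                 # Python values -> bytes
command, length, reading = FRAME.unpack(packed)     # bytes -> Python values
```

Bytes built this way could be written to a serial port or socket, and bytes read back could be unpacked with the same format string.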
At this writing, there are a variety of ways to send and receive data over serial ports in Python scripts. Notable among these options is an open source extension package known as pySerial, which allows Python scripts to control serial ports on both Windows and Linux, as well as BSD Unix, Jython (for Java), and IronPython (for .Net and Mono). Unfortunately, there is not enough space to cover this or any other serial port option in any sort of detail in this text. As always, see your favorite web search engine for up-to-date details on this front.