A First Look at Processes

Processes

A process is an instance of a program in execution. Conceptually, a process is a program in the midst of execution together with its associated resources. A process is not limited to a section of executable code; it usually also owns other resources: kernel data objects, open files, pending signals, processor state, a memory address space, one or more threads of execution, and so on.

程序本身并不是進(jìn)程,程序可以認(rèn)為是磁盤上二進(jìn)制代碼的集合。當(dāng)操作系統(tǒng)加載運(yùn)行程序的那一時(shí)刻,即創(chuàng)建了新的進(jìn)程。操作系統(tǒng)可以加載運(yùn)行同一個(gè)程序多次,另一層意思是兩個(gè)或多個(gè)進(jìn)程可以共享同一程序代碼。

On Linux, a new process is normally created with the fork() system call. The process that calls fork() is the parent, and the newly created process is the child. The fork() system call returns from the kernel twice: once in the parent and once in the child.

A program exits via the exit() system call, which terminates the process and releases the resources it holds. A parent can wait for a child and obtain its exit status through the wait4() system call. After a process exits, it is placed in a zombie state until its parent calls wait() or waitpid().

Process States

Every process has a life cycle, from creation to termination, and its state reflects where it is in that cycle. Process states fall into three broad categories:

  • Running: the process currently occupies the CPU
  • Ready: the process is ready to run but is not currently being executed by the CPU
  • Blocked: the process is waiting for some external event

In Linux, a process is in one of five states:

  • TASK_RUNNING: the process is either waiting to run (ready, sitting on a run queue) or currently executing (running)
  • TASK_INTERRUPTIBLE: the process is waiting (blocked); the kernel wakes it when the condition it waits for is met, and it can also be woken early by receiving a signal
  • TASK_UNINTERRUPTIBLE: the process is waiting (blocked) and cannot be interrupted; it does not respond to signals and is woken only when the event it is waiting for occurs
  • TASK_TRACED: the process is being traced by another process, e.g. being debugged via ptrace
  • TASK_STOPPED: the process has stopped executing; it enters this state upon receiving SIGSTOP, SIGTSTP, SIGTTIN, or SIGTTOU

The Process Hierarchy

當(dāng)進(jìn)程創(chuàng)建子進(jìn)程后,父子進(jìn)程會(huì)以某種形式保持關(guān)聯(lián),而子進(jìn)程又可以創(chuàng)建更多的子進(jìn)程,這樣就組成一個(gè)進(jìn)程的層次結(jié)構(gòu)。

Every process has exactly one parent, but may have zero or more children. Processes that share the same parent are called siblings.

On Linux, all processes are descendants of the process with PID 1.

The Process Descriptor

The kernel keeps process information in a task list built as a circular doubly linked list. Each element of the list is of type struct task_struct, known as the process descriptor, defined in <linux/sched.h>. The process descriptor holds all of the information about a specific process.

The information in the process descriptor falls roughly into the following categories:

  • Scheduling parameters: process priority, CPU time consumed recently, time recently spent sleeping, etc.
  • Memory map: pointers to the text, data, and stack segments, or to page tables.
  • Signals: masks showing which signals are ignored, which are to be caught, which are temporarily blocked, and which are in the process of being delivered.
  • Machine registers: register contents saved when a context switch occurs.
  • System call state: information about the current system call, including its parameters and return values.
  • File descriptor table: when a file is opened, the file descriptor is used as an index into this table to locate the file's i-node data structure.
  • Accounting: user and system CPU time consumed.
  • Kernel stack: a fixed stack for use by the kernel part of the process.
  • Miscellaneous: process state, PID, parent/child relationships, user and group identifiers, etc.

struct task_struct is fairly large; the complete structure (from linux-4.9.44) is shown below:

struct task_struct {
#ifdef CONFIG_THREAD_INFO_IN_TASK
    /*
     * For reasons of header soup (see current_thread_info()), this
     * must be the first element of task_struct.
     */
    struct thread_info thread_info;
#endif
    volatile long state;    /* -1 unrunnable, 0 runnable, >0 stopped */
    void *stack;
    atomic_t usage;
    unsigned int flags; /* per process flags, defined below */
    unsigned int ptrace;

#ifdef CONFIG_SMP
    struct llist_node wake_entry;
    int on_cpu;
#ifdef CONFIG_THREAD_INFO_IN_TASK
    unsigned int cpu;   /* current CPU */
#endif
    unsigned int wakee_flips;
    unsigned long wakee_flip_decay_ts;
    struct task_struct *last_wakee;

    int wake_cpu;
#endif
    int on_rq;

    /* scheduling-related information */
    int prio, static_prio, normal_prio;
    unsigned int rt_priority;
    const struct sched_class *sched_class;
    struct sched_entity se;
    struct sched_rt_entity rt;
#ifdef CONFIG_CGROUP_SCHED
    struct task_group *sched_task_group;
#endif
    struct sched_dl_entity dl;

#ifdef CONFIG_PREEMPT_NOTIFIERS
    /* list of struct preempt_notifier: */
    struct hlist_head preempt_notifiers;
#endif

#ifdef CONFIG_BLK_DEV_IO_TRACE
    unsigned int btrace_seq;
#endif

    unsigned int policy;
    int nr_cpus_allowed;
    cpumask_t cpus_allowed;

#ifdef CONFIG_PREEMPT_RCU
    int rcu_read_lock_nesting;
    union rcu_special rcu_read_unlock_special;
    struct list_head rcu_node_entry;
    struct rcu_node *rcu_blocked_node;
#endif /* #ifdef CONFIG_PREEMPT_RCU */
#ifdef CONFIG_TASKS_RCU
    unsigned long rcu_tasks_nvcsw;
    bool rcu_tasks_holdout;
    struct list_head rcu_tasks_holdout_list;
    int rcu_tasks_idle_cpu;
#endif /* #ifdef CONFIG_TASKS_RCU */

#ifdef CONFIG_SCHED_INFO
    struct sched_info sched_info;
#endif
    /* task list linkage */
    struct list_head tasks;
#ifdef CONFIG_SMP
    struct plist_node pushable_tasks;
    struct rb_node pushable_dl_tasks;
#endif
    /* virtual memory (address space) */
    struct mm_struct *mm, *active_mm;
    /* per-thread vma caching */
    u32 vmacache_seqnum;
    struct vm_area_struct *vmacache[VMACACHE_SIZE];
#if defined(SPLIT_RSS_COUNTING)
    struct task_rss_stat    rss_stat;
#endif
/* task state */
    int exit_state;
    int exit_code, exit_signal;
    int pdeath_signal;  /*  The signal sent when the parent dies  */
    unsigned long jobctl;   /* JOBCTL_*, siglock protected */

    /* Used for emulating ABI behavior of previous Linux versions */
    unsigned int personality;

    /* scheduler bits, serialized by scheduler locks */
    unsigned sched_reset_on_fork:1;
    unsigned sched_contributes_to_load:1;
    unsigned sched_migrated:1;
    unsigned sched_remote_wakeup:1;
    unsigned :0; /* force alignment to the next boundary */

    /* unserialized, strictly 'current' */
    unsigned in_execve:1; /* bit to tell LSMs we're in execve */
    unsigned in_iowait:1;
#if !defined(TIF_RESTORE_SIGMASK)
    unsigned restore_sigmask:1;
#endif
#ifdef CONFIG_MEMCG
    unsigned memcg_may_oom:1;
#ifndef CONFIG_SLOB
    unsigned memcg_kmem_skip_account:1;
#endif
#endif
#ifdef CONFIG_COMPAT_BRK
    unsigned brk_randomized:1;
#endif
#ifdef CONFIG_CGROUPS
    /* disallow userland-initiated cgroup migration */
    unsigned no_cgroup_migration:1;
#endif

    unsigned long atomic_flags; /* Flags needing atomic access. */

    struct restart_block restart_block;
    /* process identifiers */
    pid_t pid;
    pid_t tgid;

#ifdef CONFIG_CC_STACKPROTECTOR
    /* Canary value for the -fstack-protector gcc feature */
    unsigned long stack_canary;
#endif
    /*
     * pointers to (original) parent process, youngest child, younger sibling,
     * older sibling, respectively.  (p->father can be replaced with
     * p->real_parent->pid)
     */
    struct task_struct __rcu *real_parent; /* real parent process */
    struct task_struct __rcu *parent; /* recipient of SIGCHLD, wait4() reports */
    /*
     * children/sibling forms the list of my natural children
     */
    struct list_head children;  /* list of my children */
    struct list_head sibling;   /* linkage in my parent's children list */
    struct task_struct *group_leader;   /* threadgroup leader */

    /*
     * ptraced is the list of tasks this task is using ptrace on.
     * This includes both natural children and PTRACE_ATTACH targets.
     * p->ptrace_entry is p's link on the p->parent->ptraced list.
     */
    struct list_head ptraced;
    struct list_head ptrace_entry;

    /* PID/PID hash table linkage. */
    struct pid_link pids[PIDTYPE_MAX];
    struct list_head thread_group;
    struct list_head thread_node;

    struct completion *vfork_done;      /* for vfork() */
    int __user *set_child_tid;      /* CLONE_CHILD_SETTID */
    int __user *clear_child_tid;        /* CLONE_CHILD_CLEARTID */

    cputime_t utime, stime, utimescaled, stimescaled;
    cputime_t gtime;
    struct prev_cputime prev_cputime;
#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
    seqcount_t vtime_seqcount;
    unsigned long long vtime_snap;
    enum {
        /* Task is sleeping or running in a CPU with VTIME inactive */
        VTIME_INACTIVE = 0,
        /* Task runs in userspace in a CPU with VTIME active */
        VTIME_USER,
        /* Task runs in kernelspace in a CPU with VTIME active */
        VTIME_SYS,
    } vtime_snap_whence;
#endif

#ifdef CONFIG_NO_HZ_FULL
    atomic_t tick_dep_mask;
#endif
    unsigned long nvcsw, nivcsw; /* context switch counts */
    u64 start_time;     /* monotonic time in nsec */
    u64 real_start_time;    /* boot based time in nsec */
/* mm fault and swap info: this can arguably be seen as either mm-specific or thread-specific */
    unsigned long min_flt, maj_flt;

    struct task_cputime cputime_expires;
    struct list_head cpu_timers[3];

/* process credentials */
    const struct cred __rcu *ptracer_cred; /* Tracer's credentials at attach */
    const struct cred __rcu *real_cred; /* objective and real subjective task
                     * credentials (COW) */
    const struct cred __rcu *cred;  /* effective (overridable) subjective task
                     * credentials (COW) */
    char comm[TASK_COMM_LEN]; /* executable name excluding path
                     - access with [gs]et_task_comm (which lock
                       it with task_lock())
                     - initialized normally by setup_new_exec */
/* file system info */
    struct nameidata *nameidata;
#ifdef CONFIG_SYSVIPC
/* ipc stuff */
    struct sysv_sem sysvsem;
    struct sysv_shm sysvshm;
#endif
#ifdef CONFIG_DETECT_HUNG_TASK
/* hung task detection */
    unsigned long last_switch_count;
#endif
/* filesystem information */
    struct fs_struct *fs;
/* open file information */
    struct files_struct *files;
/* namespaces */
    struct nsproxy *nsproxy;
/* signal handlers */
    struct signal_struct *signal;
    struct sighand_struct *sighand;

    sigset_t blocked, real_blocked;
    sigset_t saved_sigmask; /* restored if set_restore_sigmask() was used */
    struct sigpending pending;

    unsigned long sas_ss_sp;
    size_t sas_ss_size;
    unsigned sas_ss_flags;

    struct callback_head *task_works;

    struct audit_context *audit_context;
#ifdef CONFIG_AUDITSYSCALL
    kuid_t loginuid;
    unsigned int sessionid;
#endif
    struct seccomp seccomp;

/* Thread group tracking */
    u32 parent_exec_id;
    u32 self_exec_id;
/* Protection of (de-)allocation: mm, files, fs, tty, keyrings, mems_allowed,
 * mempolicy */
    spinlock_t alloc_lock;

    /* Protection of the PI data structures: */
    raw_spinlock_t pi_lock;

    struct wake_q_node wake_q;

#ifdef CONFIG_RT_MUTEXES
    /* PI waiters blocked on a rt_mutex held by this task */
    struct rb_root pi_waiters;
    struct rb_node *pi_waiters_leftmost;
    /* Deadlock detection and priority inheritance handling */
    struct rt_mutex_waiter *pi_blocked_on;
#endif

#ifdef CONFIG_DEBUG_MUTEXES
    /* mutex deadlock detection */
    struct mutex_waiter *blocked_on;
#endif
#ifdef CONFIG_TRACE_IRQFLAGS
    unsigned int irq_events;
    unsigned long hardirq_enable_ip;
    unsigned long hardirq_disable_ip;
    unsigned int hardirq_enable_event;
    unsigned int hardirq_disable_event;
    int hardirqs_enabled;
    int hardirq_context;
    unsigned long softirq_disable_ip;
    unsigned long softirq_enable_ip;
    unsigned int softirq_disable_event;
    unsigned int softirq_enable_event;
    int softirqs_enabled;
    int softirq_context;
#endif
#ifdef CONFIG_LOCKDEP
# define MAX_LOCK_DEPTH 48UL
    u64 curr_chain_key;
    int lockdep_depth;
    unsigned int lockdep_recursion;
    struct held_lock held_locks[MAX_LOCK_DEPTH];
    gfp_t lockdep_reclaim_gfp;
#endif
#ifdef CONFIG_UBSAN
    unsigned int in_ubsan;
#endif

/* journalling filesystem info */
    void *journal_info;

/* stacked block device info */
    struct bio_list *bio_list;

#ifdef CONFIG_BLOCK
/* stack plugging */
    struct blk_plug *plug;
#endif

/* VM state */
    struct reclaim_state *reclaim_state;

    struct backing_dev_info *backing_dev_info;

    struct io_context *io_context;

    unsigned long ptrace_message;
    siginfo_t *last_siginfo; /* For ptrace use.  */
    struct task_io_accounting ioac;
#if defined(CONFIG_TASK_XACCT)
    u64 acct_rss_mem1;  /* accumulated rss usage */
    u64 acct_vm_mem1;   /* accumulated virtual memory usage */
    cputime_t acct_timexpd; /* stime + utime since last update */
#endif
#ifdef CONFIG_CPUSETS
    nodemask_t mems_allowed;    /* Protected by alloc_lock */
    seqcount_t mems_allowed_seq;    /* Seqence no to catch updates */
    int cpuset_mem_spread_rotor;
    int cpuset_slab_spread_rotor;
#endif
#ifdef CONFIG_CGROUPS
    /* Control Group info protected by css_set_lock */
    struct css_set __rcu *cgroups;
    /* cg_list protected by css_set_lock and tsk->alloc_lock */
    struct list_head cg_list;
#endif
#ifdef CONFIG_FUTEX
    struct robust_list_head __user *robust_list;
#ifdef CONFIG_COMPAT
    struct compat_robust_list_head __user *compat_robust_list;
#endif
    struct list_head pi_state_list;
    struct futex_pi_state *pi_state_cache;
#endif
#ifdef CONFIG_PERF_EVENTS
    struct perf_event_context *perf_event_ctxp[perf_nr_task_contexts];
    struct mutex perf_event_mutex;
    struct list_head perf_event_list;
#endif
#ifdef CONFIG_DEBUG_PREEMPT
    unsigned long preempt_disable_ip;
#endif
#ifdef CONFIG_NUMA
    struct mempolicy *mempolicy;    /* Protected by alloc_lock */
    short il_next;
    short pref_node_fork;
#endif
#ifdef CONFIG_NUMA_BALANCING
    int numa_scan_seq;
    unsigned int numa_scan_period;
    unsigned int numa_scan_period_max;
    int numa_preferred_nid;
    unsigned long numa_migrate_retry;
    u64 node_stamp;         /* migration stamp  */
    u64 last_task_numa_placement;
    u64 last_sum_exec_runtime;
    struct callback_head numa_work;

    struct list_head numa_entry;
    struct numa_group *numa_group;

    /*
     * numa_faults is an array split into four regions:
     * faults_memory, faults_cpu, faults_memory_buffer, faults_cpu_buffer
     * in this precise order.
     *
     * faults_memory: Exponential decaying average of faults on a per-node
     * basis. Scheduling placement decisions are made based on these
     * counts. The values remain static for the duration of a PTE scan.
     * faults_cpu: Track the nodes the process was running on when a NUMA
     * hinting fault was incurred.
     * faults_memory_buffer and faults_cpu_buffer: Record faults per node
     * during the current scan window. When the scan completes, the counts
     * in faults_memory and faults_cpu decay and these values are copied.
     */
    unsigned long *numa_faults;
    unsigned long total_numa_faults;

    /*
     * numa_faults_locality tracks if faults recorded during the last
     * scan window were remote/local or failed to migrate. The task scan
     * period is adapted based on the locality of the faults with different
     * weights depending on whether they were shared or private faults
     */
    unsigned long numa_faults_locality[3];

    unsigned long numa_pages_migrated;
#endif /* CONFIG_NUMA_BALANCING */

#ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
    struct tlbflush_unmap_batch tlb_ubc;
#endif

    struct rcu_head rcu;

    /*
     * cache last used pipe for splice
     */
    struct pipe_inode_info *splice_pipe;

    struct page_frag task_frag;

#ifdef  CONFIG_TASK_DELAY_ACCT
    struct task_delay_info *delays;
#endif
#ifdef CONFIG_FAULT_INJECTION
    int make_it_fail;
#endif
    /*
     * when (nr_dirtied >= nr_dirtied_pause), it's time to call
     * balance_dirty_pages() for some dirty throttling pause
     */
    int nr_dirtied;
    int nr_dirtied_pause;
    unsigned long dirty_paused_when; /* start of a write-and-pause period */

#ifdef CONFIG_LATENCYTOP
    int latency_record_count;
    struct latency_record latency_record[LT_SAVECOUNT];
#endif
    /*
     * time slack values; these are used to round up poll() and
     * select() etc timeout values. These are in nanoseconds.
     */
    u64 timer_slack_ns;
    u64 default_timer_slack_ns;

#ifdef CONFIG_KASAN
    unsigned int kasan_depth;
#endif
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
    /* Index of current stored address in ret_stack */
    int curr_ret_stack;
    /* Stack of return addresses for return function tracing */
    struct ftrace_ret_stack *ret_stack;
    /* time stamp for last schedule */
    unsigned long long ftrace_timestamp;
    /*
     * Number of functions that haven't been traced
     * because of depth overrun.
     */
    atomic_t trace_overrun;
    /* Pause for the tracing */
    atomic_t tracing_graph_pause;
#endif
#ifdef CONFIG_TRACING
    /* state flags for use by tracers */
    unsigned long trace;
    /* bitmask and counter of trace recursion */
    unsigned long trace_recursion;
#endif /* CONFIG_TRACING */
#ifdef CONFIG_KCOV
    /* Coverage collection mode enabled for this task (0 if disabled). */
    enum kcov_mode kcov_mode;
    /* Size of the kcov_area. */
    unsigned    kcov_size;
    /* Buffer for coverage collection. */
    void        *kcov_area;
    /* kcov desciptor wired with this task or NULL. */
    struct kcov *kcov;
#endif
#ifdef CONFIG_MEMCG
    struct mem_cgroup *memcg_in_oom;
    gfp_t memcg_oom_gfp_mask;
    int memcg_oom_order;

    /* number of pages to reclaim on returning to userland */
    unsigned int memcg_nr_pages_over_high;
#endif
#ifdef CONFIG_UPROBES
    struct uprobe_task *utask;
#endif
#if defined(CONFIG_BCACHE) || defined(CONFIG_BCACHE_MODULE)
    unsigned int    sequential_io;
    unsigned int    sequential_io_avg;
#endif
#ifdef CONFIG_DEBUG_ATOMIC_SLEEP
    unsigned long   task_state_change;
#endif
    int pagefault_disabled;
#ifdef CONFIG_MMU
    struct task_struct *oom_reaper_list;
#endif
#ifdef CONFIG_VMAP_STACK
    struct vm_struct *stack_vm_area;
#endif
#ifdef CONFIG_THREAD_INFO_IN_TASK
    /* A live task holds one reference. */
    atomic_t stack_refcount;
#endif
/* CPU-specific state of this task */
    struct thread_struct thread;
/*
 * WARNING: on x86, 'thread_struct' contains a variable-sized
 * structure.  It *MUST* be at the end of 'task_struct'.
 *
 * Do not put anything below here!
 */
};

Process Identifiers

The kernel identifies every process by a unique process identifier (PID). The PID's type is pid_t, which is actually an int. The kernel's upper limit on PID values can be read or modified through /proc/sys/kernel/pid_max; on some machines this file contains 4194304.

When allocating and reclaiming PID values, the kernel maintains a bitmap array, pidmap[PIDMAP_ENTRIES], to track which PIDs are in use and which are free.

The struct thread_info Structure

For each process the kernel places a struct thread_info object at the bottom of the kernel stack (assuming the stack grows downward); its main purpose is to let the kernel locate the current process descriptor quickly. struct thread_info is a CPU-architecture-specific structure, defined in <asm/thread_info.h>.

The struct thread_info structure looks like this (the IA-64 version is shown):

/*
 * On IA-64, we want to keep the task structure and kernel stack together, so they can be
 * mapped by a single TLB entry and so they can be addressed by the "current" pointer
 * without having to do pointer masking.
 */
struct thread_info {
        struct task_struct *task;       /* XXX not really needed, except for dup_task_struct() */
        __u32 flags;                    /* thread_info flags (see TIF_*) */
        __u32 cpu;                      /* current CPU */
        __u32 last_cpu;                 /* Last CPU thread ran on */
        __u32 status;                   /* Thread synchronous flags */
        mm_segment_t addr_limit;        /* user-level address space limit */
        int preempt_count;              /* 0=premptable, <0=BUG; will also serve as bh-counter */
#ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
        __u64 ac_stamp;
        __u64 ac_leave;
        __u64 ac_stime;
        __u64 ac_utime;
#endif
};

On x86, the kernel stack has a fixed size and location, so the stack-bottom pointer, i.e. the location of the struct thread_info object, is easy to compute; through that object the task's address can then be obtained:

current_thread_info()->task

Different CPU architectures obtain the task in different ways; some CPUs keep the current task_struct in a dedicated register.

Note: this description of struct thread_info may no longer apply to current Linux kernels, since the kernel may now place the structure inside task_struct itself (see the CONFIG_THREAD_INFO_IN_TASK blocks in the listing above). In short, thread_info exists as an optimization, and that optimization can be superseded by better approaches.

Process Creation

Linux creates a new child process by duplicating the current process in fork(); to speed up creation, it does not copy the whole process address space. Process creation on Linux is famously fast, chiefly thanks to copy-on-write, a technique that defers or entirely avoids copying data.

Copy-on-Write

The main costs of fork() are duplicating the parent's page tables and creating a new process descriptor for the child. Parent and child share data pages through their page tables, with the pages marked read-only. When either process writes to such a page, the read-only marking triggers a page fault; the kernel catches the fault, copies the page, and updates that process's page table entry.

Process Management Commands

ps: display the status of current processes

ice@ice-VirtualBox:~/linux/linux-4.9.44$ ps aux 
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.4 225712  9132 ?        Ss   1月15   2:24 /sbin/init splash
root         2  0.0  0.0      0     0 ?        S    1月15   0:00 [kthreadd]
root         4  0.0  0.0      0     0 ?        I<   1月15   0:00 [kworker/0:0H]
root         6  0.0  0.0      0     0 ?        I<   1月15   0:00 [mm_percpu_wq]
root         7  0.0  0.0      0     0 ?        S    1月15   1:21 [ksoftirqd/0]
root         8  0.0  0.0      0     0 ?        I    1月15   6:34 [rcu_sched]
root         9  0.0  0.0      0     0 ?        I    1月15   0:00 [rcu_bh]
root        10  0.0  0.0      0     0 ?        S    1月15   0:00 [migration/0]
root        11  0.0  0.0      0     0 ?        S    1月15   0:08 [watchdog/0]
root        12  0.0  0.0      0     0 ?        S    1月15   0:00 [cpuhp/0]
...

The columns displayed:

  • USER: user name
  • PID: process ID
  • PPID: parent process ID (shown with -f)
  • %CPU: CPU usage
  • %MEM: memory usage
  • VSZ: virtual memory size (in KB)
  • RSS: resident (physical) memory in use
  • TTY: controlling terminal (? indicates a daemon)
  • STAT: state (S: interruptible sleep, D: uninterruptible sleep, R: running, Z: zombie)
  • START: start time
  • TIME: cumulative CPU time
  • COMMAND: the command being executed

Common options:

  • -e: show all processes; same as -A
  • -l: long format
  • -f: full format
  • -u <userlist>: show processes of the specified users
  • --sort spec: specify the sort order
  • -C cmdlist: show processes with the given command names
  • -L: show thread information

Examples:

  • Show only processes owned by user ice
ice@ice-VirtualBox:~/linux/linux-4.9.44$ ps -f -u ice
UID        PID  PPID  C STIME TTY          TIME CMD
ice       1290 17950  0 10:17 pts/1    00:01:13 emacs sched.h
ice       2007     1  0 1月15 ?       00:00:00 /lib/systemd/systemd --user
ice       2015  2007  0 1月15 ?       00:00:00 (sd-pam)
ice       2031  2003  0 1月15 ?       00:00:00 /bin/sh /etc/xdg/xfce4/xinitrc -- /etc/X11/xinit/xserverrc
ice       2045  2007  0 1月15 ?       00:00:00 /usr/bin/dbus-daemon --session --address=systemd: --nofork --nopidfile --systemd-activat
ice       2119     1  0 1月15 ?       00:00:00 /usr/bin/VBoxClient --clipboard
ice       2120  2119  0 1月15 ?       00:00:12 /usr/bin/VBoxClient --clipboard
ice       2129     1  0 1月15 ?       00:00:00 /usr/bin/VBoxClient --display
...
  • Sort by CPU usage
ice@ice-VirtualBox:~/linux/linux-4.9.44$ ps -aux --sort -pcpu
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
ice       1290  0.7 14.3 610100 292848 pts/1   Tl   10:17   1:13 emacs sched.h
root      5919  0.7  4.7 210340 97760 ?        SLsl 1月25  69:16 /usr/sbin/corosync -f
root     19477  0.3  2.8 718880 58496 ?        Ssl  1月26  28:05 /usr/bin/dockerd -H fd://
root     19498  0.3  1.6 663636 32848 ?        Ssl  1月26  24:35 docker-containerd --config /var/run/docker/containerd/containerd.toml
ice       2142  0.2  0.0 126232  1608 ?        Sl   1月15  68:23 /usr/bin/VBoxClient --draganddrop
root      1283  0.1  1.7 248092 35304 ?        S<Lsl 1月15  36:24 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslo
root         1  0.0  0.4 225712  9144 ?        Ss   1月15   2:25 /sbin/init splash
...
  • Sort by memory usage
ice@ice-VirtualBox:~/linux/linux-4.9.44$ ps -aux --sort -pmem
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
ice       1290  0.6 14.3 610100 292848 pts/1   Tl   10:17   1:13 emacs sched.h
root      5919  0.7  4.7 210340 97760 ?        SLsl 1月25  69:17 /usr/sbin/corosync -f
ice       4609  0.0  4.6 761340 95344 ?        SNl  1月27   0:17 /usr/bin/python3 /usr/bin/update-manager --no-update --no-focus-on-map
root       254  0.0  3.6 194960 73980 ?        S<s  1月15   2:47 /lib/systemd/systemd-journald
root     19477  0.3  2.8 718880 58496 ?        Ssl  1月26  28:05 /usr/bin/dockerd -H fd://
ice       2992  0.0  2.2 921676 45764 ?        Sl   1月15   2:49 /usr/bin/xfce4-terminal
root      1645  0.0  2.1 363052 44760 tty7     Rsl+ 1月15   5:47 /usr/lib/xorg/Xorg -core :0 -seat seat0 -auth /var/run/lightdm/root/:0
root      1283  0.1  1.7 248092 35304 ?        S<Lsl 1月15  36:24 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslo
root     19498  0.3  1.6 663636 32848 ?        Ssl  1月26  24:35 docker-containerd --config /var/run/docker/containerd/containerd.toml
ice       2175  0.0  1.1 436592 23904 ?        Sl   1月15   0:38 xfwm4 --replace
...
  • Show the sshd process
ice@ice-VirtualBox:~/linux/linux-4.9.44$ ps -f -C sshd
UID        PID  PPID  C STIME TTY          TIME CMD
root      1431     1  0 1月15 ?       00:00:00 /usr/sbin/sshd -D
  • Show the threads of a given process
ice@ice-VirtualBox:~/linux/linux-4.9.44$ ps -L 19498
  PID   LWP TTY      STAT   TIME COMMAND
19498 19498 ?        Ssl    0:00 docker-containerd --config /var/run/docker/containerd/containerd.toml
19498 19516 ?        Ssl    6:28 docker-containerd --config /var/run/docker/containerd/containerd.toml
19498 19517 ?        Ssl    0:00 docker-containerd --config /var/run/docker/containerd/containerd.toml
19498 19518 ?        Ssl    4:28 docker-containerd --config /var/run/docker/containerd/containerd.toml
19498 19519 ?        Ssl    0:00 docker-containerd --config /var/run/docker/containerd/containerd.toml
19498 19520 ?        Ssl    0:00 docker-containerd --config /var/run/docker/containerd/containerd.toml
19498 19521 ?        Ssl    1:48 docker-containerd --config /var/run/docker/containerd/containerd.toml
19498 19530 ?        Ssl    4:07 docker-containerd --config /var/run/docker/containerd/containerd.toml
19498 19532 ?        Ssl    4:48 docker-containerd --config /var/run/docker/containerd/containerd.toml
19498 16227 ?        Ssl    2:53 docker-containerd --config /var/run/docker/containerd/containerd.toml
  • Combine with watch to monitor process state in near real time
watch -n 1 'ps -aux --sort -pmem'

top: dynamically display real-time system and process information

ice@ice-VirtualBox:~/linux/linux-4.9.44$ top
top - 13:35:42 up 16 days, 13:11,  1 user,  load average: 0.07, 0.02, 0.00
Tasks: 181 total,   1 running, 129 sleeping,   1 stopped,   1 zombie
%Cpu(s): 11.6 us,  5.3 sy,  0.3 ni, 82.5 id,  0.0 wa,  0.0 hi,  0.3 si,  0.0 st
KiB Mem :  2041304 total,   177336 free,   781632 used,  1082336 buff/cache
KiB Swap:  2097148 total,  2063908 free,    33240 used.  1000428 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND                                                              
 1645 root      20   0  363052  44760  12216 S  7.9  2.2   5:56.07 Xorg                                                                 
 2992 ice       20   0  922068  45764  23984 S  4.6  2.2   2:54.71 xfce4-terminal                                                       
 2175 ice       20   0  436592  23904  15824 S  1.0  1.2   0:39.85 xfwm4                                                                
 3804 ice       20   0   49348   4048   3392 R  1.0  0.2   0:00.25 top                                                                  
 2411 ice       20   0  220776   5096   4384 S  0.7  0.2   0:02.69 at-spi2-registr                                                      
 5919 root      rt   0  210340  97760  74932 S  0.7  4.8  69:27.79 corosync                                                             
 2142 ice       20   0  126232   1608   1532 S  0.3  0.1  68:25.90 VBoxClient                                                           
 2200 ice       20   0  588228  19912  10488 S  0.3  1.0   0:04.20 polkit-gnome-au                                                      
 2223 ice       20   0  535352  16180   7944 S  0.3  0.8   0:01.17 light-locker                                                         
 2393 ice       20   0   49928   4180   3720 S  0.3  0.2   0:00.44 dbus-daemon
...
  • Uptime and load averages
top - 13:35:42 up 16 days, 13:11,  1 user,  load average: 0.07, 0.02, 0.00
  • Task statistics
Tasks: 181 total,   1 running, 129 sleeping,   1 stopped,   1 zombie
  • CPU statistics
%Cpu(s): 11.6 us,  5.3 sy,  0.3 ni, 82.5 id,  0.0 wa,  0.0 hi,  0.3 si,  0.0 st
  • Memory statistics
KiB Mem :  2041304 total,   177336 free,   781632 used,  1082336 buff/cache
KiB Swap:  2097148 total,  2063908 free,    33240 used.  1000428 avail Mem 

kill: send a signal to a process

Sending and receiving signals is one mechanism for communication between processes. Linux has a fixed set of built-in signals with default dispositions.
Command format:

 kill [options] <pid> [...]

Common options:

  • -<signal> or -s <signal>: the signal to send
  • -l or -L: list the available signals

Examples:

  • List the signals
ice@ice-VirtualBox:~/linux/linux-4.9.44$ kill -l
 1) SIGHUP   2) SIGINT   3) SIGQUIT  4) SIGILL   5) SIGTRAP
 6) SIGABRT  7) SIGBUS   8) SIGFPE   9) SIGKILL 10) SIGUSR1
11) SIGSEGV 12) SIGUSR2 13) SIGPIPE 14) SIGALRM 15) SIGTERM
16) SIGSTKFLT   17) SIGCHLD 18) SIGCONT 19) SIGSTOP 20) SIGTSTP
21) SIGTTIN 22) SIGTTOU 23) SIGURG  24) SIGXCPU 25) SIGXFSZ
26) SIGVTALRM   27) SIGPROF 28) SIGWINCH    29) SIGIO   30) SIGPWR
31) SIGSYS  34) SIGRTMIN    35) SIGRTMIN+1  36) SIGRTMIN+2  37) SIGRTMIN+3
38) SIGRTMIN+4  39) SIGRTMIN+5  40) SIGRTMIN+6  41) SIGRTMIN+7  42) SIGRTMIN+8
43) SIGRTMIN+9  44) SIGRTMIN+10 45) SIGRTMIN+11 46) SIGRTMIN+12 47) SIGRTMIN+13
48) SIGRTMIN+14 49) SIGRTMIN+15 50) SIGRTMAX-14 51) SIGRTMAX-13 52) SIGRTMAX-12
53) SIGRTMAX-11 54) SIGRTMAX-10 55) SIGRTMAX-9  56) SIGRTMAX-8  57) SIGRTMAX-7
58) SIGRTMAX-6  59) SIGRTMAX-5  60) SIGRTMAX-4  61) SIGRTMAX-3  62) SIGRTMAX-2
63) SIGRTMAX-1  64) SIGRTMAX
  • Terminate a process
kill <pid> [...]    # terminate gracefully; sends SIGTERM (15) by default
kill -9 <pid> [...] # force-kill the process (SIGKILL)
最后編輯于
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請聯(lián)系作者
【社區(qū)內(nèi)容提示】社區(qū)部分內(nèi)容疑似由AI輔助生成,瀏覽時(shí)請結(jié)合常識(shí)與多方信息審慎甄別。
平臺(tái)聲明:文章內(nèi)容(如有圖片或視頻亦包括在內(nèi))由作者上傳并發(fā)布,文章內(nèi)容僅代表作者本人觀點(diǎn),簡書系信息發(fā)布平臺(tái),僅提供信息存儲(chǔ)服務(wù)。

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容